Reza Rad is an Author, Trainer, Speaker and Senior DW/BI Consultant. He has a BSc in Computer engineering; he has more than 10 years experience in databases, programming and development mostly on Microsoft technologies. He is a SQL Server MVP and his writing on different aspects of technologies, particularly MS BI, can be found on his blog.
Reza participates in a variety of community forums, and is a moderator of MSDN SQL Server forums, the leader of the New Zealand BI users group and speaks at a variety of SQL Server conferences including SQL Saturday.
Greg Low: Introducing Show 59 with guest Reza Rad.
Welcome, our guest today is Reza Rad. Reza Rad is an Author, Trainer, Speaker and Senior DW/BI Consultant. He has a BSc in Computer engineering; he has more than 10 years’‘ experience in databases, programming and development mostly on Microsoft technologies. He is a SQL Server MVP and his writing on different aspects of technologies, particularly MS BI, can be found on his blog that I will put a link to in his show notes. He participates in a variety of community forums and is the moderator of the MSDN SQL forums, the leader of New Zealand BI users group and speaks at a variety of SQL Server conferences including SQL Saturdays. So welcome Reza!
Rosa Rad: Thank you Greg and thank you for your time it is nice to be here.
Greg Low: Lovely and so what I will get you to do is tell us as I do with everyone just how did you come to be involved with SQL Server in the first place?
Reza Rad: I started more than 10 years ago, actually as a junior Microsoft Visual Basic. Visual Basic 6 developer and I worked with SQL Server 7. That was the first time that I started to actual programming and working with databases as a graduate person who comes from university and after that I worked on different projects and SQL Server 2000. After that I started BI stuff and data warehousing, ETL and it comes to my current stuff.
Greg Low: Great, while listen I first came across you first at PASS and involvement with the MVP deep dives I think as well at the time
Reza Rad: Yes yes yes you were also writing in that book as well.
Greg Low: Indeed and the other one that I came across is of course one of your books, the SQL Server 2012 Integration Services expert cookbook as well. It is one that I have read in recent times and that was a useful book as well. What I thought I would do is start off around some aspects of Integration Services that I don’‘t often see people tackling all that much. So the first one of those is extensibility which I know you have done some work around.
Reza Rad: Yes, yes actually.
Greg Low: I would love to get your thoughts on what parts of the extensibility model for Integration Services you have used and what you have learnt?
Reza Rad: You know SSIS 2012 comes with some new features like SSIS Catalog which provides useful logging and information systems for deployment of packages and also logging of different parts of logging which previously in Integration Services 2008 and also 2008R2 we did all of this stuff with some scripts and some actually exec SQL tasks. In 2012 we have many of these stuff built-in already but also this staff need some more extensibility features for example in many SSIS projects and many ETL projects and I actually feel that there are some requirements for frameworks, for SSIS frameworks, for ETL packages to do for example this kind of task for example to provide consolidation. For example for data warehouse and SSIS packages, yes I worked on this type of framework. I strongly recommend others to use this kind of framework for their work.
Greg Low: Yes I have often said to people that I think there is some maturity curve people go through in using Integration Services. I think the first phase is where they tend to use everything out of the box as it is supplied. I think the next level they tend to start to get involved in script tasks and things like that. I think the lurks level is probably where they start to then build components that drop into that Toolbox of Integration Services and prepackaged their logic. Probably the last or the higher level of that is when they start programmatically using the object model to create packages directly like packages.create, package.task.add and so on and literally doing it in VB code or C sharp code. What parts of those have you played with?
Reza Rad: I have played with that part of programming exercise which is for example programming in script component and script task which is actually basic SSIS programming. I assume because the component and task is ready, you just develop what you want in documents C sharp or.net. For example if you want to apply some regular expression on a specific string you can do this in SSIS script component and create some couple columns based on that. These are actually kind of basic programming stuff in SSIS and also as you said there is capability working with SSIS component to create task and output, create packages programming actually which is very handy and useful. I actually use this part in some of my projects to create for example, in some of my projects I created some components and task to just use them from Toolbox to drag and drop. To use this in different packages and also I use some stuff to create packages from scratch, actually C sharp programming.
Greg Low: Yes I think one of the things I liked with script components is that they seem quite useful for prototyping things that you might create as components yourself anyway. They are not a bad starting point for that, but the problem I think with script components that’‘s right you only had have what do they call that copy and paste inheritance. If you are going to reuse it you just copying code from somewhere else.
Reza Rad: Yes that’‘s rights.
Greg Low: Yes I think it is desirable to try and repackage those capabilities. A good example I saw that recently, I had a client who is doing work with a progress data source and they were using basically a ODBC connection but then they were applying every time they did that they were applying a lot of rules. Progress the driver if it had a value that was null, it would actually send it out as question mark, they would then have to write logic to reinterpret that as a value having null and so on.
It just struck me that you don’‘t want to be doing that all the time you want to build yourself a progress data source or something instead, is a much better way of using the product.
Reza Rad: Yes that is right, we create custom components and this sort of components to actually put some part of logic in a component task then in all packages that we want to use this logic drag-and-drop and when we want to change that logic we just change it once and all the other packages use this logic again.
Greg Low: So I suppose one of the concerns people have with this is that once you start packaging it up you are going to package it up as a DLL but then there is the issue of what is involved in deployment?
Reza Rad: Do you mean?
Greg Low: As I said I think one of the concerns people have is that you know as well is a package they now have something else that needs to be shipped out as well?
Reza Rad: Actually do you mean the documentation of the packages?
Greg Low: No I was meaning, if I create for example a new data source that is actually a DLL, and that DLL itself had to be deployed to both the developers doing the development work and to the server where the packages is going to run as well.
Reza Rad: Actually this is a requirement but I think this will not affect, there is a requirement for deploying this DLL for packages but this is not actually kind of tedious work or something that would be hard to do actually when use that DLL the first time for deploying that DLL it might be hard but it is a procedure that you can use.
Greg Low: Yes I think that is the thing and again you have to understand that they need it both when they are doing development and they needed in the run time environment as well. The other thing is in 2012 the way you added to the Toolbox changed as well.
Reza Rad: Yes, yes in 2008 it was completely different and not completely different in a different way, in 2012 it has changed you could use that.
Greg Low: I think in 2008 it they had the visual studio normal Toolbox that was used in the other languages and you had to go in sort of at the item to the Toolbox basically registering that but in 2012 it is quite nice in that way where you could just drop it in the folder and it automatically pops up into the Toolbox.
Reza Rad: Yes.
Greg Low: Have you found any problems when you are building components or anything you have learnt along the way that you found different in this version or are they all pretty much the same?
Reza Rad: Actually I have found them mainly much the same. The main things in 2012 were that we can use framework 4, stuff which was really quite good. In terms of deployment and putting that script, component and that task it is pretty much the same just a few small changes and all of those. I remember there are a few small changes.
Greg Low: One of the things that I wasn’‘t aware of is that when you are in that debugging environment inside visual studio of course when you are using the designer you are using a 32-bit application. But if by default when you are on a 64-bit system, the minute you click that debug button and you are actually running the 64-bit version. That was a little bit tricky, what through me was the error message that came up said something like cannot the call to perform upgrade method fail. And I was like what? I was like looking at this method in the code and I looked and looked and looked it is interesting it is just a subroutine it doesn’‘t even return a value and put breakpoints into it and I could step right through the whole thing and I couldn’‘t see any error at all. It was driving me absolutely crazy and I ended up sending the example off to Matt in the SQL team and shout out to Matt thank you. He basically came back and said hard then he pointed to the fact that that was a problem to the 64-bit registration not with a 32. It was really weird where it was registered for properly for 32-bit. You can go in, you could debug, you could set properties and do all that but the minute you try to debug it, it came up with this weird error. Just very peculiar the error that came up but it was just about making sure that you had the 64-bit stuff registered as well not just the 32-bit.
Reza Rad: Yes, yes that’‘s right actually I remember when you had this issue you asked for this and I also try to find out the solution but you found out the solution earlier.
Greg Low: Listen what sort of logic do you think is worth packaging up in your own components?
Reza Rad: What do you mean?
Greg Low: I am trying to think of, how do you decide whether to do something in a script task or whether to do it in your own components or is it just the amount of times you will reuse it?
Reza Rad: First thing is the amount of time you are going to reuse it actually I say that if I want to reuse it just only another package if you have read in a separate component and the other things is actually timeframe of project for example in some projects it does not have enough time to actually consider developing custom tasks all that sort of thing. For example in earlier phase of the project we might reuse script tasks all specific logic but for example in the next phase we change the execute task in the custom component. The amount of reuse is important and also this is actually based on the logic, for example it if it is something that can be done, combination of a built-in task so I prefer to create a package for that rather than creating a custom component.
Greg Low: Okay so you would prefer to create a child package out of a built-in task rather than lumping it together yes that make sense that is good.
Reza Rad: But this is only for example when you can do that in the current built-in task.
Greg Low: Listen another one that appeared in 2012, is the DQS client component, the Data Quality Services component, have you done anything serious with that as yet?
Reza Rad: Yes I have used it in a few projects, it has actually problems. Firstly let’‘s say what it is doing, it is actually connecting to the DQS engine. Data quality engine in DQS 2012, it is a data cleansing project it actually creates data cleansing projects and works with that and gets results. It provides the results into a data stream but the problem with this component is it is a single transact component.
Greg Low: Of so it is single threaded, yes okay.
Reza Rad: If you have a big data stream then you will face performance issues. This is highly recommended if you have big data stream that you should need to split it. For example conditional split task into multiple DQS cleansing components and then do it that way.
Greg Low: Oh so you are saying in the package itself, split it out into multiple streams that will hit different DQS component instances so that each of those yes so of course that is part of the data flow task in the first place, so the threading is controlled by the engine threats property on that task.
Reza Rad: Yes that is right there is also another trick for that because every time the DQS cleansing will create a data quality project, a number of data quality projects will arise by that time, so there is a requirement to actually clean this type a project, unnecessary projects. In DQS and Microsoft published a script we can simply use DQS as a cleansing projects to remove a script like that which actually remove unnecessary projects which is quite handy.
Greg Low: Now in terms of DQS, the basic knowledge base that you build the thing that you connect to, just running through the things that you can do inside that. The first one of those is things like’‘s stuff to tuition of data is in it?
Reza Rad: Yes actually you can create a knowledge base and in that knowledge base you can have different dummies. In each domain you can have say for example domain you can have this kind of value, this value has kind of synonyms.
Greg Low: So if I have a domain that deals with the names of sports, I mean somebody says football and here we might have preferred soccer instead or is something the other way around. So every time we see this we could substitute a different value or another example I saw simply things like here we have the AFL, there Auskick, there are lots of different names for things they end up actually relating to the same sport. Yes those sorts of substitutions on their fly are very useful, but all other sorts of rules and things that you can apply?
Reza Rad: Yes you can create also business rules. Some rules for example, some kind of cleansing rules for example you have gender information, female or male and also you have married status which is for example married or single and you want to get a title based on these two. For example you say that gender is female and is married and it is Mrs. If it is male it will be Mr and this kind of rule can be created because it has use will expressions. You can create a specific will with that expression for each domain also for company domain. This is a company domain because it creates different dummies and creates rules based on it.
Greg Low: I suppose one of the questions so is that I’‘m sure it will come to the mind of the developers and so on is that they can easily do that sort of deriving value again. Generally I would rather the user type that in but anyway or pick it out of a list but regardless if we are going to derive something like that out of the data that could be done just routinely inside and Integration Services package. To me I think the real value of DQS is that the person who has got the domain knowledge can use a client tool which is completely separate to the person doing the development.
Reza Rad: Yes that person or data can use that DQS schedule or data quality client with different definitions, create synonyms, change that knowledge base as desired and then that DQS component in SSIS can use that knowledge base.
Greg Low: Yes to me that this is the real value to me I often find people seem confused because they go I can do most of these things quite easily inside and Integration Services package but that is not the point. The point is you’‘d I want to be the one doing that, you want to hook it up to a knowledge base and somebody has declined tool who understands that business stuff, who actually does the editing of the knowledge base instead of you.
Reza Rad: Yes that is right because, the knowledge base is actually provided by the business and the data is taken from the business is actually the one who would do that. SSIS developer would just use that knowledge base with that DQS component.
Greg Low: It also the domain owner can also do all these updates and things without having to come back and get something changed. That is nice from that point of view.
Reza Rad: Yes that is right, also the part of data quality which doesn’‘t actually interact with SSIS is matching project. Matching project actually interacts with master data services not with SSIS.
Greg Low: Yes.
Reza Rad: The data can also use that matching project directly from DQS studio to apply some fuzzy matching data for example to see how these day two is similar to each other and this kind of stuff. Also it can be applied to before loading it into master data services then added to excel.
Greg Low: Another component that is an interesting one, do you tend to make use of the fuzzy lookups at all? Have you found them useful or not?
Reza Rad: Yes they are very useful the fuzzy components: fuzzy lookups and fuzzy grouping.
Greg Low: Maybe for those haven’‘t looked at fuzzy lookups, maybe if you could spend a moment to just explain what they are there for?
Reza Rad: Yes for example for the look up, it looks for the information based on the similarity threshold of those values which are reference tables. For example we have referenced tables and we have data incoming, this incoming data will be checked in the similarity threshold not the exact similarity. The exact similarity is the actual look up component.
Greg Low: Yes that’‘s right the look up component is the, I want the exact match and so this is just I want to match but I am going to determine how similar it is. I suppose the nice thing about that is you can tune that over time if you think too many false or negative results coming out of it yes.
Reza Rad: Yes based on the number of results you can adjust the threshold and you can also. Yes there are some tricks to using it for example. The fuzzy component is also, this also affects performance of the package because it is very time-consuming. But for example we can first apply the look up component to see for example how much exact match we have and then pass the non-matched stuff to fuzzy look up and that sort of thing and after the time we adjust the threshold and create better knowledge base for our reference tables.
Greg Low: Yes, that is great! Now in terms of the changes we started to mention before as well Integration Services now has its own database but one of the nice aspects of that is that there is a whole series of use and stored procedures, the catalogue views that are a programming interface directly to Integration Services from T-SQL.
Reza Rad: Yes, yes for example, that exercise catalogue is very handy for development, not for development actually deployment or logging information and for actually storing environment variables for example some stored procedure and some tables for that environment variables. These are all our stuff that we actually didn’‘t have in previous versions of SSIS. In current version all these tables, stored procedures are coming to create integration, a better logging system. For it example in 2008 I remember that we could do multiple configurations with XML configurations files but this environment variables are much better. We can create environment variables with different variables, map those variables, all package parameters and all of these actually happen in the backend database in the SSIS catalogue.
We can do all of these stuff with some stored procedure. For example I can create T-SQL script just to do all of this stuff deployment and pass it to for example the DBA to run this. This is much better for actually deployment, logging and this kind of stuff.
Greg Low: That is great, so listen I was also going to ask you BIDS helper, BIDS has also been updated for the current version. Which things in that do you find useful for Integration Services work?
Reza Rad: Actually I have used this for kind of documentation, good documentation. I use this from 2008 and then the new version of this which came with 2012. I used it again and it also has some other features, previously it had some features I think for version controlling. I actually didn’‘t use that.
Greg Low: Yes I think they had basically some work that they had done towards a smart comparison of packages and that was horrible in earlier versions. Even though a package is XML or the structure of the XML was just completely horrible and I think it is one of the very big step forwards in 2012, the way they have structured the XML in the package files now.
Reza Rad: Yes the structure of the XML it is completely changed. Previously you was in 2008, in in 2008R2 based on actually each property value kind of attribute value but in 2012 it is XML structure. Both are XML, this 2012 version is much better structure we have different elements, different attributes in XML for each property, each component. Which makes it..
Greg Low: Yes I like the way they are all now all nicely nested insight which it didn’‘t have a very good structure before. They also with all the attributes and things they list in alphabetical order even though they are case sensitive alphabetical order but at least they are the all in order. They have done away with the entire lineage IDs which were horrible they just all absolute paths to everything. The other big one is just the fact that they are only store values if they are not the default values, so that makes the packages much smaller in terms of the content. They can layout information and pushed that into a C data section at the end and it is discarding able so you could just compare only up until you hit that C data section if you are comparing two packages. Now it just XML comparison tools do a great job of finding what is different, whereas before it was just a lost cause trying to do that.
Reza Rad: Yes, yes and also with all this stuff copying something from a package and paste into another package based on the old structure was hard but this new structure is much easier to do this.
Greg Low: Yes indeed, yes actually another one I do remember that was in BIDS helper to that I used in the previous one was there was an option there to reset the GUIDs, the IDs for the packages as well, because internally they have a GUID. I did come across a bug when I was at one of my client’‘s sites where they had duplicated packages but they still had the same ID, same internal ID and what was really scary is that when one package was running it was accessing the memory of the other packages variables in the memory when they were running at the same time. That sort of thing is so hard to debug, just crazy hard to debug. I remember friend Neil who were sort of debugging that I think that sort of thing is completely horrible. It is always really important that duplicating and just like when copying a package to make sure it has different internal IDs and again BIDS helper does that by default. I think it has an option there for that.
Reza Rad: Yes this is a great option.
Greg Low: Yes look in terms of getting reasonable performance out of Integration Services packages are there particular things you try and do? I remember in your book you sort of mentioning a number of things you try and do. I suppose the data flow is where most of the action happens, so I suppose the first thing you listed was optimizing queries. I suppose they make sense that the queries that it is executing needs to be running as well as they possibly can.
Reza Rad: Yes.
Greg Low: But I think also probably not returning to much data is probably the.
Reza Rad: Yes for example, selecting those columns which are, which we want to use for example not SELECT * or that sort of thing. Just the columns that we want to use, actually using for example components like direct columns, transformation, all for example would just this kind of expression transformation. This kind of road by row transformation, does not affect anything but for example we have components like stored components. Stored components is actually performance consuming, this needs all the information to be loaded. We order all of this and send it to the next transformation so this kind of transformation also needs to be used carefully. For example if we can do that sort of stuff in there source script it would perform much better rather than stored components. Yes using for example components like fuzzy components, again carefully for example if you want to do that, first do for example look up components to narrow that data stream and once you narrow that data.
Greg Low: Yes you are saying to the initial search in case it is right and then if only it is not right then considered doing a fuzzy yes that gets rid of all the basic once to start with.
Reza Rad: Yes, yes that’‘s right.
Greg Low: Yes that’‘s good and I suppose the other thing would lookups to is you can control the query that is used against the database for the lookups as well.
Reza Rad: Yes, yes for example, for lookups you can do many things actually. For lookups we can do a look up based on for example on full cache, partial cache, or this kind of information for example if we have a small reference table, or a small lookup table so we can use a full cache because it is load everything.
Greg Low: Actually I should just take you back one second for those who are not very familiar with that, there are three caching options that you get inside those lookup components. We should go through each one of those, so full cache, so the it action is to go off and load all the data from the target.
Reza Rad: Yes.
Greg Low: Before the component starts and so often see symptoms where that is being used where shouldn’‘t be where a package runs until it gets to that point and then it sits there for ages and people think what is this doing and of course it is going loading all of the data from the.
Reza Rad: Yes that is right, it is that full cache is good for small reference tables but not full big reference tables. If you have a big reference table then if you use that full cache you will wait a long time to load everything in memory and it actually consumes all memory and then other stuff will run. Yes but you can for example use partial cache for this or.
Greg Low: So that is the next option: partial cache, so maybe describe what that does?
Reza Rad: Partial cache, actually we can set the cache size for example in advanced tab of that component we can say that my cache size is this, so load this partially and just check this partially with that incoming data partially and check that reference table or not.
Greg Low: And of course the third option is no cache at all?
Reza Rad: Yes this is no cache is actually time-consuming for each incoming data rows because goes back to SQL Server or any underlying database and searches for that value and then comes back again and goes for the next record. This is good when we have our low number of incoming records and we have a number of records in the referenced table.
Greg Low: Yes I normally, the advice I normally give to people on that one is it is also decision as to whether you are likely to see the same row again as well. So if I am looking up a row and I am never going to look it up again it is completely pointless chewing up memory and things caching those rows.
Reza Rad: Yes and one thing which is quite tricky in the look up component, for example when we have an incoming data row that has duplicate records. That we want is check that for example that this record exists in the destination, in this reference table or not. When we use for example the full cache it checked that it is not exist and pass it on and the next record comes, the next record is not exist again and this goes on so we will have duplicate entries because it has loaded everything into cache first.
Greg Low: Yes it and it didn’‘t find the duplicate. Yes that is one that I have often fallen foul of myself, you think it in terms of single row logic and you perform the look up but the value is not there. But what you are missing is that the fact that you have two copies of the incoming data that you are trying to put into something that would be you unique. So yes that can be a real challenge with that as well.
Reza Rad: Yes with that one it is important to check does this do because it differ when sending it to look up for that type of solution.
Greg Low: In some of those cases what are your thoughts about alternatively pushing that into a staging table and using a MERGE statement in T-SQL?
Reza Rad: Yes that is also another solution for doing that. In data warehousing stuff, always not me, actually data warehousing experts says that you should load the information into a staging table even if you can do all of this stuff in SSIS packages. Just first load it in staging table is to at least keep, these staging record of everything. This is good for reconciliation processes, for lineage processes and then does a transform package that actually brings the staging information and check it with the existing fact or dimension structure and apply transformation and load it back in.
Greg Low: In fact I completely agree with that, one of the things I am a big fan of is I tend to more of they would say ELT rather than ETL, I like sort of extracts load and then transform before I then loaded again. I think a large part of that is I like to have the minimal impact I possibly can on the source system and I want to if anything is going to fall over I don’‘t want to doing so on the way from the source system to where it is going. I want to get it in with the least possible constraints and then check the data and fix it in bulk and I have a strong preference to do a majority of that in T-SQL because again if you look at the lookups, if I have to look at 10 dimensions for a fact I would much rather bring the facts into a staging table and just do one big update statement with 10 left joins or something.
Reza Rad: Yes you are right.
Greg Low: That is going to completely outperform the other options anyway.
Reza Rad: For example if you have this kind of stuff in data warehouse, big challenges for example for incremental loads. We can use SSIS components for incremental loads, it can be also easier with helping T-SQL statements or different types of stuff for example. With slowly changing dimension component but it can be also made easier with T-SQL scripts. There are some best practices, this is actually why I am saying that you need to use a framework for SSIS packages is always recommended. For example if we use this method for a slowly changing dimension, then we can use this as a framework for all of our packages.
Greg Low: Yes it is a good point I probably tend to err on the side of way more T-SQL be used in my packages then what I see typically around the town but again I find it makes a big difference to the performance of the packages as well so.
Reza Rad: Yes this actually depends, yes is sometimes depends on your environment. For example if you have SSIS server and first database server and destination database server then that SSIS database server will consume memory for loading data into SSIS memory and that sort of thing. If you do not want to use that SSIS server memory or you have that SSIS server part of one of those source or destination then some T-SQL statement might work much better.
Greg Low: You also noted in the book I remember avoiding unnecessary sorting in the data flow. One of the things I have come across lately is people have tried to use like a merge join component and it requires a sorted input and so what they have done they have put a sort in front of it even though it was already sorted. I think it is really important to realize there are those options in the advanced edit from the data source where you can indicate that this data is already sorted.
Reza Rad: Yes that option is quite handy. Yes it is much better to sort this information, in I mean T-SQL or source an actual script, order by something it will perform much better than a third component. After that we just need to say to SSIS that ok this sort is based on this column. Just mention that it is sorted and do not use that use sort component heavily, actually performance.
Greg Low: Another thing maybe experiences you have had around trying to do incremental data loads? Everybody eventually gets to a point most of the time anyway where they need to not keep getting all the data were they just need to incrementally have become in. What have you found works and doesn’‘t work for you?
Reza Rad: There are some of best practices for this kind of thing. Incremental loads, there are some articles about how to do incremental loads and that sort of stuff in SSIS packages. There are different options, for example one option is to actually tracking information in a table. For example say that in this table, we have the staging table loaded after this date, in each in ETL round, in each SSIS job run we check that date and the information from that date again.
Greg Low: Yes even that is tricky right? This is a good example of where one of the things that I come across all the time is people have source tables that have like last modified date or something like that and it seems obvious that you would just keep track of where you up to, asked for everything since that and get the biggest value that you just got and use that is just the starting point next time.
Reza Rad: Yes, that is right.
Greg Low: But I run into trouble with this sort of thing all the time when this occurs. The first time I was really bitten by that it was a DB to source system and what was happening was that they were using a trigger to set the last modified date and what would happen is the trigger would fire when they are updated the record all the row. It was some short time later that it would then finish running the trigger but when it finished running it would set the date to the days when the trigger started running. So the problem with that is that what you had they were loading millions of rows the day but then all of a sudden every now and then two or three rows in a day wouldn’‘t work properly. It was just a small timeframe where that was a problem now in the end I think the solution for those. I often say to people in the classes is that how would you deal with this sort of scenario? I think it is really easy to say look I am going to use a last modified date but the thing you have to be so sure about is how that date get set. You know it is that a date that comes from the application at and has taken some time to get into the database all is that some proc layer or something and even that how long can that run before. When do the work out the value and when do that update the value, in my not be the same time I think there is a quite a bit you have to be careful with that.
Now in the case of something that is updated by a trigger, the thing that I found that is is good solution for that was never to ask for everything right up to the last moment. The trick is always to asked why everything up until 30 seconds ago or something like that and just keep doing that and that way hopefully whatever the period of the trigger is you need something wider than that, you only ever ask for up to that point then you tend not to miss things.
Reza Rad: Yes that is right and also, in new version of SSIS there is a task that helps a bit in this kind of solution.
Greg Low: Yes the Change Data Capture control and tasks inside there. Have you had a look how that is implemented though? Because it has been on my list of things to go and have a look at but what makes me nervous is when I had a look previously at the examples in BooksOnLine Change Data Capture works with log sequence numbers are reading out of the transaction log which they are a binary 10, they are bit ugly to work with and what they kept doing in the examples is that they kept flipping that and converting that back to an actual dates and time and then working with that and kept going backwards and forwards. I just think with the speed at which that things happen in a log, I am not so comfortable with doing it that way. I think you would be much better off to work with the actual log sequence numbers and just make sure you have every sequence number that you meant to have. I haven’‘t looked at the components yet, the commands they are issuing underneath the covers to find out how they are actually doing it.
Reza Rad: Yes that is actually right. I read this kind of information also in MSDN articles, I have worked and played with it CDC and that type of component. It can be better with some cases and best practices but this is actually a new component with some kind of improvement but it is quite handy. It is good to use it and you some best practices with it.
Greg Low: Yes I think the concept is great, I mean they added Change Data Capture in 2008. It didn’‘t ever feel to me like it was totally finished that part of the product that. I mean for a start they didn’‘t have DDL statements, so there was no CREATE, ALTER, DROP any of those things. It was all just system stored procedures and I talk to the guys in that group and they told me they would of liked to include DDL but they just couldn’‘t make it happen in time. The other thing is I find that when you have a change in source database schema and so on those sorts of scenarios you need to work through very carefully, to make sure you work through what is going to happen.
Reza Rad: Yes that is right because it these are very sensitive based. Those kind of metadata structure change and that is perfect.
Greg Low: I think that one thing that is really good about it though is the way it is reading from the transaction log instead of picking up the changes as they occur in the database so that takes the load off the process that is causing the change in the first place.
Reza Rad: Yes that is right.
Greg Low: So I think that is an interesting component. Listen another one I want to ask your thoughts on is the different types of destinations. I often get asked questions about, you know you look here and there is an OLEDB one and an ADO.net one and an ODBC one and they go ok which shall I be using.
Reza Rad: Yes this is I think depends on the data, I mean the destination type for example. If we use SQL Server as a destination also we have two kinds of destination for SQL Server. SQL Server destination and also OLEDB destination which are quite different. SQL Server destination is very good in terms of performance but the problem with that it only works with local instance of SQL Server.
Greg Low: Yes I think that has a terrible name, I always say to people when you first doing this that it seems obvious that if you are going to SQL Server you would use the SQL Server destination.
Reza Rad: Yes that is right.
Greg Low: I can understand why people think that would be the right choice but yes it uses that in-memory provider and that means that is cool right up to when you want to run the package on a different machine to the server.
Reza Rad: Yes, you will find lots of actually questions in MSDN forums about this SQL Server destination because of this kind of error when they face when the developer worked with it. If you have that kind of SQL Server instance, local instance then using that SQL Server destination is good if we do not have that or OLEDB destination is very good for every database engine provider that you can use OLEDB. For example with Oracle we can use OLEDB with some databases, we can use that for example with MYSQL I am not sure right now but originally when I worked with MYSQL there was no OLEDB provider for MYSQL so I actually first used for example ODBC provider or ADO.net provider for these kinds of providers are a bit slower than OLEDB. There are some kinds of differences between the connectors, data connectors’‘ type of connection between SSIS component to the database engine.
Greg Low: Yes I find I think the ADO is a slower destination. The thing that is kind of nice about that of course is you get to use named parameters inside the destination instead of the OLEDB one you just have a series of question marks as place holders. I find that makes the queries a lot harder to read and so on. For performance most people tend to use the OLEDB ones and of course the SQL team themselves though seem to be heavily invested in moving to ODBC in the future. I have got a feeling that down the track that is probably where we are going to end up being. The other thing I find people tend to get wrong with those OLEDB destination is when they first bring it out, if they connected to a table it is the fact that the defaults are like TABLE LOCK and so on as well.
Reza Rad: Yes, yes, actually we can use that, do you mean the fast load of information?
Greg Low: Yes, so the default often when people pick a table is the option that does only fast load into the table which is good and fast but only if nobody else is trying to use that data at the same time.
Reza Rad: Yes that is right. You need to use this very carefully, this fast load is very good in terms of performance but you need to think about when to use it, how to use it.
Greg Low: And so look are there any guidance things that you have around control flow? Anything that you see people often do wrong?
Reza Rad: In control flow, do you mean kind of best practices in control flow?
Greg Low: Oh just anything you see people do that you wish they would do differently?
Reza Rad: Yes there are multiple things, for example if I remember one of them. One of them for example you know in previous versions of SSIS there was no expression task, for example if a developer wants to create a variable which is coming based on an expression you just put that expression in the expression property of that variable and evaluate that expression.
Greg Low: So we don’‘t need to use a script task to do that.
Reza Rad: Or using a script task, if we use the evaluate expression variable task that variable will be solved by that expression every time you refer that way. So for example if we use that variable at the start of a package, in the starting of a package for example at 5 AM, this is something else but if you do the same at the end of the package for example 5:30 or 6 this actually provides some script task. For example that loads per value into a variable with that script all with new SSIS 2012 there is an expression task. We put that expression task in every place that we put this, this will be the location that variable will resolved by that expression. This means that this will be consistent and we can use it for logging and everything.
Greg Low: Yes that is great Reza, listen just before we finish up, what where will people see you coming up? Or see or hear from you?
Reza Rad: Actually in my blog.
Greg Low: Yes I will put a link to the blog.
Reza Rad: And some speaking sessions, for example 25 of this month I am coming to Sydney.
Greg Low: Ah very good.
Reza Rad: For SQL Saturday.
Greg Low: For SQL Saturday in Sydney. Excellent!
Reza Rad: Yes, I intend to attend this year at SQL PASS. I am not sure if you are going there.
Greg Low: I still undecided at the moment actually. Every year that I cannot be there, I am so sad if I can’‘t be there. I love being at the main PASS summit if I can.
Reza Rad: Yes these are my upcoming speaking sessions, they can email to me. This are a few kinds of ways they can contact me. Also on MSDB forums, I will be answering questions.
Greg Low: Yes on the forums answering questions, that is great and I will also put a link to the book you had for Integration Services as well.
Reza Rad: Thank you.
Greg Low: Terrific, listen that you so very much for you time today Reza and we will talk to you again soon.
Reza Rad: Thank you for your time!