Andy Leonard is a Solid Quality Mentor, a SQL Server database and Integration Services developer, a SQL Server MVP, and an engineer. He's coauthor of Professional SQL Server 2005 Integration Services; Professional Software Testing with Visual Studio 2005 Team System: Tools for Software Developers and Test Engineers; the MCITP Self-Paced Training Kit for Exam 70-441, Designing Database Solutions by Using Microsoft SQL Server 2005; and the recently released eBook Mastering Visual Studio Team System Database Edition, Volume 1. Andy founded and manages VSTeamSystemCentral.com and maintains two blogs there, Applied Team System and Applied Business Intelligence. He also blogs for SQLblog.com, and his background includes web application architecture and development in VB and ASP, SQL Server Integration Services, data warehouse development using SQL Server 2000 and 2005, and test-driven database development.
Greg Low: Introducing Show 38 with guest Andy Leonard.
Greg Low: Our guest today is Andy Leonard. Andy is a Solid Quality Mentor, a SQL Server database and Integration Services developer, a SQL Server MVP, and an engineer. He's coauthor of Professional SQL Server 2005 Integration Services; Professional Software Testing with Visual Studio 2005 Team System: Tools for Software Developers and Test Engineers; the MCITP Self-Paced Training Kit for Exam 70-441, Designing Database Solutions by Using Microsoft SQL Server 2005; and the recently released eBook Mastering Visual Studio Team System Database Edition, Volume 1. Andy founded and manages VSTeamSystemCentral.com and maintains two blogs there, Applied Team System and Applied Business Intelligence. He also blogs for SQLblog.com, and his background includes web application architecture and development in VB and ASP, SQL Server Integration Services, data warehouse development using SQL Server 2000 and 2005, and test-driven database development. So welcome, Andy.
Andy Leonard: Thank you Greg.
Greg Low: Can you describe how you became involved with SQL Server?
Andy Leonard: I was standing close to the server when they let the SQL Server guy go. That's sort of true; he was reassigned off a data warehouse team I was working on. I was the web developer, handling report administration, and I was done with my part of the project. We knew we needed to replace the SQL Server guy; his passion was C#. I was the only guy with the words "SQL Server" on his resume, so I qualified as the SQL Server person. It was a large data warehouse project for then, and for now: 1.6 TB in SQL Server 2000, about five years ago.
Greg Low: Whenever I see discussion of large versus small, the terabyte boundary is still where people start talking about large.
Andy Leonard: Yeah, I agree. I think there's a threshold in SQL Server where a database performs differently above it than below it. I had no previous experience with large databases before that, so I gained experience quickly. I was temp-to-perm; I was told that if I made it work, they'd hire me.
Greg Low: Makes you think before making minor changes.
Andy Leonard: There aren't any minor changes in a database that size. A clustered key across five columns taking 24 hours to build wasn't unusual. I learned a lot, got a lot of experience, and I'm still learning. I enjoy the field; it moves at a different pace than application development. I never thought I'd do database work, but I've gotten into it and I enjoy it.
Greg Low: The thing we were going to talk about is database testing, specifically unit testing. If I look at them individually, the developer camps and the database or DBA camps: in the developer camps it's game over. They're heavily involved in unit testing. Not all of it is test-driven development; that's a different story. But most are into it, recognizing the point and value of unit testing. When I put the other hat on, the database side, it's rare to find anyone doing decent unit testing, unless it's through the application layer down to the database. They tend not to test the database layer as a separate thing. Do you see this?
Andy Leonard: Absolutely. You've seen the same things I have, and you describe them well. When I speak to .NET user groups, I get a chance to talk about testing databases, and I ask at the beginning how many do unit testing of databases. Almost all hands go up. I then ask how many are running those tests through a data access layer, something they've built or bought. Almost as many hands go up. Then I'll ask about T-SQL and get one or two out of a hundred, and those aren't .NET developers but database developers who showed up because I was in town. It's rare. No disrespect or harm meant to fellow database developers; some are brilliant, and advocate testing in that manner. I trained as an engineer, and that's where my testing comes from: you always test at the lowest possible level. If you're not testing the database standalone, that portion, the T-SQL, in Query Analyzer or in a database unit-testing tool, however you're accomplishing it, then you're testing more than one thing. We all know failures happen, but where's the source? The data access layer? The database? You never know from the results of those tests.
Greg Low: For many who listen here, we must presume most don't do unit testing, so we need to define it. We're not talking about testing the application; everyone says they test what they do. Can I get you to define unit testing?
Andy Leonard: I'll give my understanding of unit testing, and you jump in with yours. My understanding is that you are not in search of values per se, but in search of data types and column names. Metadata is more what I'm after when I unit test. I do exhaustive unit tests on stored procedures. Stored procedures need unit testing most, and they're the most suited to what software developers talk about in a unit test: did it return something or nothing to me, in the format I expected? If you look at the values and whether the right thing was returned, that's functional testing. The lines get blurred in the software world. I don't have a problem with that, but I like to make the distinction so people start with the right terminology and the right idea. You can blur them; both are important. Unit testing typically looks for: did the column name I expected come back? With the proper data type?
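That metadata-first idea can be sketched outside any particular framework. The snippet below uses Python's sqlite3 standard library as a stand-in for a real database and result set; the table, columns, and values are hypothetical, and the T-SQL tooling discussed in the show works differently, but the shape of a metadata assertion is the same.

```python
import sqlite3

# Hypothetical schema standing in for a stored procedure's result set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Customer VALUES (1, 'Acme')")

# The unit test asserts on metadata (column names and order), not values.
cur = conn.execute("SELECT CustomerID, Name FROM Customer")
returned_columns = [d[0] for d in cur.description]
assert returned_columns == ["CustomerID", "Name"], returned_columns

# A light data-type check: the first column should carry integers.
row = cur.fetchone()
types_ok = isinstance(row[0], int) and isinstance(row[1], str)
assert types_ok
```

The functional test Andy contrasts with this would instead assert on the values in `row`.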
Greg Low: The point to make also is that we're talking about detailed automated testing, as distinct from people testing the application, or testing a proc as they build it. The key is that as you get more code in a system, with every change you still need to be confident nothing is broken. If you can express what should happen in each case as a test, that's an automated test you keep. You run suites of tests, so when you change things, it's done with confidence.
Andy Leonard: I agree. Those suites of tests have a value that is just now being realized, even in the application world, where people store them and use them as regression suites. They know they haven't broken backwards compatibility with a new version.
Greg Low: It also gets around the fact that if you're trying to test at a more manual or higher level, it can be very hard to be thorough, particularly at the UI level. You may test your application, but it's hard to do a very detailed test. As you get more complexity and more possibilities, the number of tests needed increases exponentially.
Andy Leonard: Yes. There are great applications for managing these suites of tests. I enjoy working with Team Foundation Server, and I find it does a great job automating test runs. There's a whole lot of functionality in there for testing, and more on the way: with the SQL Server 2008 release, more will come in a service pack for Visual Studio. Good stuff on the way, and good stuff now. But it's early in the game. That's why a lot of database professionals don't have a lot of testing experience; it's relatively new to their world, and that's fine. Your listeners shouldn't feel they're missing out, that they've missed a new trend, or that they're behind the times. The trend isn't old; it just started in the past year. We're not close to a settled discussion of database testing as a practice. It's young. There's plenty of room and lots of need for people to get involved and add their ideas. I follow the blogs of giants in the field, and of those who blog occasionally. Interesting things are generated by people who have application development experience, such as yourself, who have seen both sides and understand that database development is in a state very much like application development was 10 to 12 years ago, when test-driven development was first proposed.
Greg Low: We should make a shout-out to Adam Machanic, a friend of ours, who runs SQLblog.com and is an active MVP. Adam has given conference sessions where he talked about his T-SQL unit-testing framework, one of the frameworks out there. There are many people starting to push at looking at how this should be done. Another point is that detailed unit testing of databases has also been considered hard, or harder than in application development. What's your feeling on this?
Andy Leonard: I agree with that assessment, largely because to test the metadata you receive from a stored procedure, T-SQL is the language you want to use if you're running in native form. It's difficult to query those results and check whether you got the right columns or data back. Those are tough things to do.
Greg Low: That's a good point. We should cover some terminology. When you talk about those things, testing folk would consider them assertions. An assertion could be things like: did I get the right number of columns back? Do they have the right names? Did someone rearrange the SELECT statement so they're now in the wrong order, and does it matter? Did I get multiple sets of rows back? Did I get an expected value? Did this query return no rows?
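Each of those assertions can be written as a one-liner against a result set. Here's a hypothetical sketch using Python's sqlite3 standard library as a stand-in; this is the concept, not the Database Edition syntax, and the schema and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Total REAL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# Right number of columns?
cur = conn.execute("SELECT OrderID, Total FROM Orders")
column_count = len(cur.description)
assert column_count == 2

# Expected number of rows?
rows = cur.fetchall()
assert len(rows) == 2

# An expected scalar value?
(grand_total,) = conn.execute("SELECT SUM(Total) FROM Orders").fetchone()
assert grand_total == 29.5

# A query that must return no rows?
empty = conn.execute("SELECT * FROM Orders WHERE Total < 0").fetchall()
assert empty == []
```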
Andy Leonard: Or it gets worse, though in some ways easier. If you're asserting an error, say that you violated a key constraint on an insert statement, that's easier to do in T-SQL than in the metadata returned to you from an actual query or stored procedure. I get excited about the frameworks out there. I focus on Database Edition because that's what I have experience with; it's the best I've worked with.
Greg Low: We’re talking about Visual Studio Team Edition for Database Professionals.
Andy Leonard: Yes. But it was renamed for 2008 to Database Edition.
Greg Low: It all flows from the group Gert Drapers is in.
Andy Leonard: They've done fantastic work from a raw software development perspective. The work from their first CTP release in May 2006 until this June: that's a phenomenal amount of work in 25 or 26 months. Hats off to that group. I don't know when they sleep. They are cranking out fantastic code.
Greg Low: One thing we should also talk about is why it matters. To me, I find on the DBA side people are hesitant to ever make changes. On the development side, with refactoring, people are more prepared to endlessly revise systems.
Andy Leonard: There is excess on both sides, and room for both mindsets to move to the middle. When I think of that, I think of what makes a good production or operations DBA good at their job. I understand that change can be bad. The enemy. Those of that mindset make good operations and production DBAs; they're wired that way. Being in a position where they resist change, where the knee-jerk response is no, is bad if you need to get something done, but not bad overall. It's conducive to the long-term stability of the enterprise data architecture. On the other side, developers do need to make changes: tweak things, make improvements. That's the goal. Sometimes what they change makes things worse, or, what's more common, nothing happens. They make a change here that does improve the code or the application in one spot, but brings the server to its knees later on trying to do another query. So we have a natural contention that's built up between database administrators, database developers, and application developers. Some database developers do a good job bridging the two, but database development as a field is young. I agree with you, but I don't see it as a bad thing. It's diversity in a technical sense.
Greg Low: The ability to insist on keeping stability in the enterprise can be a very important thing. For many that I talk to, part of the reason they're resistant to change is that when they make a change, they have no idea what they're about to break.
Andy Leonard: I agree. That's one of the more valuable things that testing can bring to the enterprise and the individual. It helps both sides, if application developers can see what happens when they make changes. And performance testing is huge. We've been doing it for decades in SQL Server. We didn't call it that, but that's what we were doing: checking to see whether, when we executed a suite of statements, it returned in 30 seconds, or each individual one returned in a second.
Greg Low: Good point. That's a very valid assertion: this statement must return within a certain period of time. That fixes the scenario I often see where a DBA has built a covering index or something, and then someone has added an additional column to the query. The query still runs fine, but takes forever to return.
Andy Leonard: Yes. That's a default built-in assertion in Database Edition: that the query executes within a defined time. They're aware of that. The built-in assertions are powerful, and you can add your own conditions. You mentioned one I wrote. It seems dumb until you run into the case where it breaks everything: the number of columns, and their order. I wrote that test condition. I think I put it in the testing book: how to extend Database Edition with custom test conditions.
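An execution-time assertion can be sketched by timing the query and failing if it runs over budget. The threshold, schema, and data below are all hypothetical, with Python's sqlite3 standing in for a real server.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sale (SaleID INTEGER PRIMARY KEY, Amount REAL)")
conn.executemany("INSERT INTO Sale VALUES (?, ?)",
                 [(i, float(i)) for i in range(1, 10001)])

MAX_SECONDS = 5.0  # hypothetical budget agreed for this query

start = time.perf_counter()
count, total = conn.execute("SELECT COUNT(*), SUM(Amount) FROM Sale").fetchone()
elapsed = time.perf_counter() - start

# The assertion: the query must come back within the defined time.
within_budget = elapsed < MAX_SECONDS
assert within_budget, f"query took {elapsed:.3f}s, budget {MAX_SECONDS}s"
```

This is exactly the guard for Greg's covering-index scenario: the query still returns correct data, but the test fails the moment it stops returning quickly.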
Greg Low: I was pleased about that one. In the early versions I did tell the team my frustration with the inability to extend parts of the product, and one thing we talked about was building assertions. I know Ricky went to a lot of trouble making sure the ability to build assertions was built in. On his blog he has good examples of building assertions. You're not limited to the built-in choices of what you can check; you can build your own assertion, say that this column will relate to another column.
Andy Leonard: Absolutely, and I'll second your shout-out. He taught me how to write my own custom conditions; that's where I learned, from his blog. There's great information out there. That example I just gave, the proper columns with the names I expected, in the order I expected, I use a lot when testing the results of ETL. I've done a lot of work with SQL Server Integration Services, which is typically used to load data warehouses, and I found that I could use Database Edition unit testing to test the results of my ETL.
Greg Low: Yes. In fact I'll have you pursue that in the second half of the show, the specifics of what you've done. We're saying a main reason to do this is that we want to be confident our changes don't break everything else in the system. Part of the reason is to give us the ability to make changes: we have our interface to the external world, which we can define in terms of tests. That allows us to make changes while making sure we don't break all the clients. There haven't been a lot of books published on this. I read Scott Ambler's book, Refactoring Databases. Did you read that?
Andy Leonard: I haven’t. I saw your review on it. I read the first bit of it. I haven’t had time to read it.
Greg Low: When I look around, it's an area without many books or texts to refer to. It's a bit in its infancy. In Scott's book, he describes different refactorings that might happen with a database. For those not familiar with refactoring, let's talk about it from the application development side.
Andy Leonard: I'll give my understanding. I started doing it with web applications. Typically I would start by building an Active Server Page with some database and JScript code. As the page developed, I would see portions of code that could be reused. First I'd have 50 functions; then I'd pick some functions from that list which, with a couple of arguments added, I could reduce to one function. That's one example of refactoring. Another is, let's say I change the name of a variable on my web page and want that name change to propagate throughout the rest of the page: I could do a rename refactor there. Typically, the real intent is that somewhere in development I see that code could be served best by going from the presentation tier to a business or middle tier, so I pull the code out, plug it into a DLL, and call it from there. There are parallels in databases, but they're not straightforward. Rename refactoring makes sense: if you rename a column in a table, you don't want to break stored procedures addressing it by the old name. But it's not as easy as search and replace, because column names can be the same. Say there's an ID column in the Customers table. You could script out the stored procedures and do a blind search and replace, changing all ID columns to CustomerID, but you've also changed the ID column on the Products table. Blind search and replace doesn't do justice to rename refactoring in databases. The same goes for table names, though there's not a big risk of tables having the same name in SQL Server; schemas are relatively new.
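The hazard Andy describes is easy to demonstrate. Here is a tiny sketch (the procedure text is made up) of why a blind textual replace is not a rename refactoring:

```python
# Two scripted-out procedure bodies, both containing a column named ID.
customer_proc = "SELECT ID, Name FROM Customers WHERE ID = @CustomerID"
product_proc = "SELECT ID, Price FROM Products WHERE ID = @ProductID"

# Intent: rename Customers.ID to CustomerID. A blind replace applied to
# every script also rewrites the Products procedure...
blind = product_proc.replace("ID", "CustomerID")

# ...renaming Products.ID and even mangling @ProductID, because plain
# text has no notion of which table a column belongs to.
assert "@ProductCustomerID" in blind

# Even the intended target is damaged: its parameter gets mangled too.
also_mangled = customer_proc.replace("ID", "CustomerID")
assert "@CustomerCustomerID" in also_mangled
```

A real rename refactoring has to parse the T-SQL and resolve which identifiers actually refer to the renamed column, which is what the tooling discussed next does.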
Greg Low: We could have tables in schemas with same name.
Andy Leonard: True. You've done database development against other platforms; I've done some with DB2 and Oracle. I was pleasantly surprised to see the schema as a formal object in SQL Server 2005. It adds to the architecture, to code reusability, and especially to security.
Greg Low: I'd still try to avoid tables with the same name in different schemas, but it's a possibility.
Andy Leonard: Refactoring is also cleaning up code that came from brainstorming, written just to get things running. You go back and clean it up, sometimes just for aesthetics, because it's hard to read. And in a web page or a stored procedure, more code than needed will perform worse than if you can shrink the code, so there are performance reasons for refactoring too. Last, there's code review. There's been interesting work in the past few months, statements that say don't write code for reuse. An interesting take on that.
Greg Low: We should define that most refactorings involve improving the quality of code without changing the behavior of the code.
Andy Leonard: Great definition.
Greg Low: Here's an example. I was at a consulting site a few days back. When I looked at the worst-performing code, I found a proc that I could see was far from optimal. What struck me was how many people had hacked around and made changes to it over time. Eventually, the entire thing needs to be rewritten. With unit tests, if I go in and want to rewrite it, I can have good confidence that it behaves the same at the end as when I started. Many refactorings are about constantly improving code. Rename refactoring is an interesting one: you're changing the database model and improving it, but potentially changing behavior, as you might return different column names.
Andy Leonard: Database applications are different: they are never the presentation layer. The end user isn't directly reading from SSIS or Query Analyzer, but through some presentation layer; there can be several layers on top of the database between the user and the data. That's why database unit testing is critical. A lot of engines, including Reporting Services, have a semantic layer, a device that maps the names of fields users can interact with and build criteria with to the names of database columns. It sounds like it should be a no-brainer, don't ever change column names, but sometimes you're adding a column to a table or to the data being collected, or you might be supplied with data, and if names are similar you need to rename. Unit testing will tell you what will break.
Greg Low: Simple search and replace isn't enough once you get any level of complexity in the database. Database Edition uses an instance of SQL Server as its parsing engine and does deep parsing, not just a simple search and replace. T-SQL is messy to parse, but it does a great job of drilling down and finding the appropriate objects.
Andy Leonard: Yes, though that's one of the changes they've made in the June CTP of the General Distribution Release, the GDR. They've moved away from the requirement of a local instance of SQL Server to manage that. I don't know how they parse now, but they were relying on the relational engine for it.
Greg Low: One thing I've seen talked about is that they seem to have built a standalone, code-based parsing engine for T-SQL. There are a number of products which find the need to parse offline or disconnected from SQL Server. Reporting Services, for instance, parses the T-SQL when you go to configure a query in a report. I've found it frustrating in Reporting Services that the parsing is different from SQL Server itself. I can add statements in Reporting Services that are valid, but Reporting Services will tell me they're not. I often want to tell it to just go with me and send it to the database, but it's trying to interpret and deal with it itself. For example, an IN predicate: in T-SQL you can't use a variable for the list, but in Reporting Services you can. It will take the variable, pull the statement apart, and rearrange it before sending it to SQL Server, applying different rules than if the same statement were sent to SQL Server itself. There seems to be a push to build a separate standalone parsing engine, as there are different parts of the product requiring one. It wouldn't surprise me if the intent is to use the same thing everywhere, but I'm simply guessing.
Andy Leonard: I'd agree. I have no knowledge of that, but it would be wonderful. I've seen tons of uses for it, and similar issues in other services. For standalone query generation, it would be nice to add that to Visual Studio development, so anyone writing a T-SQL statement could validate it.
Greg Low: Another area is the move of the product towards supporting other database engines.
Andy Leonard: Not sure if you know, but they made a change in the GDR to take it in that direction, a provider-based model. They've announced they're partnering with IBM to support DB2.
Greg Low: I’m glad you mentioned that.
Andy Leonard: I blogged about it. It's nice to see them doing that. We were talking with the teams back in 2006, and that was something that came up even then: some functions could be useful standalone, or aimed at other engines via a provider base. It's beyond the scope of this talk, but data generation is one. I remember thinking that would be great.
Greg Low: We'll talk in a bit about data generation; it comes up pretty regularly. Let's take a break and then delve into some specifics.
Greg Low: Is there a life outside SQL Server?
Andy Leonard: Absolutely not. Are you kidding? I do have a life contrary to popular belief. I’m the proud father to five children. Here’s the shocker. They range in age from 26 years to 11 months. That’s the big shocker. I have two daughters from my first marriage, ages 26 and 24, out on their own, pursuing gainful lives. My second wife and I have been married six years and we have a five year old son, a three year old daughter, and an 11 month old son. But this is it. No more.
Greg Low: With an 11-month-old, I suppose you're at the sleeping-through-the-night stage?
Andy Leonard: Goodness, no. We don't have sleeping kids; they all wake up. My wife and I joked the other day that she usually gets up with them. The older two are usually up and down between when they go to bed, about 8:00 p.m., and about 4:00 a.m. I'm in a little town, Farmville, Virginia, a quaint community with a couple of colleges in town. I live in the woods on about three acres of land. Across the road is a couple-hundred-acre field and pasture. Rural. But I'm half a mile from one of the colleges, so I get 10Mb DSL and pizza delivery here. It's perfect. I grew up in the country; that's why I talk this way. I'm not faking my accent. Life is good. I do a lot with SQL Server, and sort of a personal life with user groups. Recently I was honored to serve as a PASS Regional Mentor. You're my boss, so we're working together regularly now. It's a fantastic opportunity, and neat to see PASS focusing on this as they have. It's something that's been needed for a while. They've tried before, and it's growing, how all good things come to be.
Greg Low: A quick shout-out to the PASS folk. There is a major change this year in the amount of focus placed on chapters. If anyone has a SQL Server user group that isn't a chapter of PASS, certainly drop us a line or make contact; we should talk. One criticism years ago was that people with user groups would join PASS and little would happen. A lot of that was resource constraints. But there are changes happening, and people should take a look.
Andy Leonard: I agree. There are exciting changes in and around PASS. It's a large developer and user community of database professionals, more so than any other organization, and driven from the ground up. That doesn't happen in isolation. The growth and resource constraints were very real, and the gripes legitimate, but the work in the past few months has been astounding. Please check it out.
Greg Low: There is the main Summit coming up this year; I hadn't really mentioned much about it. November in Seattle promises to be the biggest Summit ever, and I'm looking forward to it. Being in Seattle means huge involvement from Microsoft and the SQL product group. Many MVPs are going, and lots of people are coming. It's looking amazing. The other thing to watch for, in September or October, is a whole series of things related to, not the launch of SQL Server 2008, as that's already happened, but the phase where the RTM appears, and the events surrounding it. Many will be coming from PASS. Keep an eye out for that.
Andy Leonard: Last night we had the July meeting of the Richmond, VA SQL Server User Group, which I cofounded, and we did an event as well. We never did a local Heroes Happen Here event; we postponed, thinking we'd do it with the SQL Server launch. Now there's talk of doing even more than just the normal Heroes Happen Here event. We're excited. I won't go into detail, but we're excited to hear the chatter of more stuff coming out. There will be lots of launch-like events. A lot of SQL Server people were disappointed by the hoopla around Visual Studio and Windows Server 2008, and that SQL Server was rolled in even though it was four to six months from code. At the same time, that decision was made months before. My take is, I don't do marketing. It's not my bag. I won't tell them how to do that if they don't tell me how to write T-SQL.
Greg Low: Back to unit testing and the testing of databases. One difficulty that comes up is that you need to build tests so they can be rerun endlessly, and you can pick up knowing that every run won't have leftover shrapnel from previous tests.
Andy Leonard: There are a couple of approaches that are popular. You can generate data and do so every time, even tearing down the database and recreating it every time. In the test and development phases, that works well. I recommend redeploying and populating the database with data however you want to derive it. More and more, especially in the U.S., we have regulations for health and financial data that require us not to use production data anymore.
Greg Low: That's a good point. The endless question is: when doing testing, where do you get your data? The most common response is a periodic copy of the production database. Another answer is getting a scrubbed copy of the production database. Another is a pool of data typed in.
Andy Leonard: Those options are all valid. Technically speaking, production data is the best to test and work with, but where that is not legal or not best practice, you need an alternative. There are good data generation tools. Redgate has one, you can use a data generation tool with SSIS or even generate data from SSIS, and I've used T-SQL. Mostly I use the one built into Database Edition. It's flexible, and since it's built into the environment I develop the database project in, I can deploy the project and have the target automatically tear down the database, redeploy, and then populate, or do an incremental build, truncate all the tables, and drop data in from the tool. It preserves relationships, and the data can be as gibberish as you want, Unicode characters and all. You get lots of control over the data. And it's a feature of Database Edition that is extensible: you can write your own data generators.
Greg Low: Yes, that's another area with good power. A specific example: rather than saying here is a text column, I can say this is, perhaps, a U.S. phone number, and build a data generator that generates U.S. phone numbers, or build one with Australian state names. When I generate data, it doesn't have to be gibberish. I can say what type of data it is and what rules apply for this column, and reuse that wherever it makes sense in the database. The other thing is it's not just repetitive. You have options for null values, percentages of nulls, and distribution curves that determine how numbers are allocated. Clever stuff.
Andy Leonard: Yes. And even though the data doesn't have to be repetitive, if you're testing, repeatability is good. It uses a seed value as a starting point. Every time you deploy the database and run a data generation plan, a collection of generators that populate the tables, it will put the same data out there, so you can write tests that go after specific Australian state names or phone numbers and return info.
Greg Low: Good point; the seed does that nicely. The idea is that if I found a problem running a test, next time I don't want different data and to then not be able to find the problem. The system defines the data it generates based on the seed value: use the same seed value and you get the same data every time. Simply change seed values to get different data.
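That seed behavior can be sketched with a hypothetical custom generator, here built on Python's seeded random module; the generator name and phone-number format are made up for illustration, not taken from Database Edition.

```python
import random

def generate_phone_numbers(count, seed):
    """Hypothetical data generator: U.S.-style phone numbers, seed-driven."""
    rng = random.Random(seed)
    return ["%03d-%03d-%04d" % (rng.randint(200, 999),
                                rng.randint(200, 999),
                                rng.randint(0, 9999))
            for _ in range(count)]

# Same seed, same data: tests can safely target specific generated values.
run1 = generate_phone_numbers(5, seed=42)
run2 = generate_phone_numbers(5, seed=42)
assert run1 == run2

# Change the seed only when you deliberately want a different data set.
run3 = generate_phone_numbers(5, seed=7)
assert run3 != run1
```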
Andy Leonard: A point you made: once you find an error, unexpected test results, or an assertion failure, the process to correct it is one thing; it moves to the development arena. For the database tester, a failed assertion means either you need to write another test or improve your test, because you might be missing something, or it can mean something bad happened when a change was made. The process lends itself to iterative development. No matter what method is used to develop the software, the testing portion is iterative, and I use it a lot when I'm doing database development, whether a reporting solution or an integration solution in SSIS. I'll do work, think all is OK, and send it downstream to the next people in the process. Most recently, I did a project building master data for SAP. I handed a data set to the SAP team; they'd load it and find issues with values and fields. The first step I took as the database developer was to write a test that detected those conditions: the assertion had failed on their end, and I moved it down the pipe to my point, at my granularity, at the source of the issue. When I do test-driven development, I write the test first, so when you execute it, it fails. You get red/green: fail first, then succeed after correcting. This project lasted seven months, so I had a suite of tests. With every change, and it happened often towards the end, one little change would affect tests written six weeks earlier, showing I had introduced more error. The value of testing paid off.
Greg Low: Two points. When someone finds a problem, what you do is not modify the code first, but build the test to make sure you can now detect it; that's something that should have been part of the testing. You build the test so it detects the problem, then fix the problem. That way you have a built-in test that forever in the future will keep checking the same thing. If you just dive in and fix the code, you're back in the loop where something else might occur in the future causing a similar problem you might not detect. The aim of building tests when you find problems is to build up the list of things that have gone wrong that you're checking for. Test-driven development is another topic on my list. We've talked about how people build tests, but a different approach is test-driven development. As you said, people build the tests, then the code. The idea is a specification up front: know what it's meant to do, then define the tests, and write just enough code to make the tests pass.
Andy Leonard: Yes. That also is an iterative process; I use iterative and organic interchangeably. It's a way to do software development, not a silver bullet. Silver or lead: is it solving problems, or weighing you down?
Greg Low: What’s your gut feeling on how many people do test driven development compared to unit test?
Andy Leonard: In the academic sense, I'd say a minority, and there are good reasons. In Version 1, test-driven development is an expense; it slows things down. If your application is such that you know up front you'll put out a Version 3 someday, that's a driver to go for test-driven development and take the hit on Version 1. I've found, and have documentation that I can't publish, validating several projects, that by Version 3 test-driven development is at minimum break-even because of the regression and backwards-compatibility capability. Add the quality of the code from Version 3 forward, and it's money in the bank; it more than makes up the difference. Up front, yes, it costs. Unfortunately, on a lot of projects, database and application alike, we don't have time. We're told to get it done yesterday, so we don't do these practices. That project that lasted seven months was still a rush job, as it was so huge, rolling data into SAP from a collection of companies. Seven months was barely enough time to get it done.
Greg Low: An interesting point you raise is whether test-driven development makes more sense at certain points in a project. In Version 1 there is often so much churn in the project that it can be hard to test up front. Once you have a system that is starting to take shape, when you need another proc or function, you know what it needs to do. Rather than describing it in a document, you describe it in a set of tests and then make sure you've built what you need.
Andy Leonard: I've found that when I do this I will write one test at a time, then one proc. That's how I start. It's typical for me, at the end of a phase, a minor release, or some spot where it makes sense, to go back and write things like specific functional tests and tests that check for the proper errors. Things that are important.
Greg Low: Good point, and here's a great example. Let's say I do something that should cause a primary key violation. It's not good enough to just confirm that an error occurred; it has to be that specific error. Maybe the insert failed for another reason, nothing to do with a primary key violation.
Andy Leonard: There's a great mechanism in T-SQL in 2005 for doing that: TRY and CATCH are built into T-SQL. If you catch an error, you can call the error message function and test to see whether this is the error you expected. If so, that's a way to assert errors. It's difficult for people who haven't tested to understand how valuable error asserts are. About 10 percent of the time, in my experience, applications run just as they're supposed to. You can write an application in one-tenth of the time that will perform under ideal circumstances. My experience also says that to do good engineering, software and solution engineering, you need to account for user input errors, bad data, network failures, any of a host of conditions that could happen. Dealing with invalid inputs makes up about 90 percent of my code. We've all been there. When I started writing code I'd write the 10 percent, and when it broke I'd say someone did something they shouldn't have. That's OK if I'm writing for me, but when customers are paying me, they won't accept that. A text box over a date field where someone can input something that's not a date is no solution for them; it's just an error waiting to happen. There's an art to this, engineering involved, and a lot of science. That's the breakdown I see: 10 percent of the code handles the ideal conditions, and 90 percent handles exceptions, dealing with faults gracefully, not taking the server to its knees and shutting down the business. Testing helps with that. It's not that I believe unit testing can achieve zero errors, but I do think it seriously reduces errors, and does so enough to provide a positive return on the investment of time and effort.
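[A minimal sketch of the error-assert pattern described here, using T-SQL TRY/CATCH; the table name is a made-up assumption, and 2627 is the SQL Server error number for a primary key or unique constraint violation:]

```sql
BEGIN TRY
    -- The second insert of the same key should violate the primary key.
    INSERT INTO dbo.Customer (CustomerID, Name) VALUES (1, 'Test');
    INSERT INTO dbo.Customer (CustomerID, Name) VALUES (1, 'Duplicate');
    PRINT 'FAIL: expected a primary key violation, but no error was raised.';
END TRY
BEGIN CATCH
    -- Assert the specific error, not just "an error occurred".
    IF ERROR_NUMBER() = 2627
        PRINT 'PASS: primary key violation raised as expected.';
    ELSE
        PRINT 'FAIL: unexpected error '
              + CAST(ERROR_NUMBER() AS varchar(10))
              + ': ' + ERROR_MESSAGE();
END CATCH;
```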
Greg Low: That's good. With a database, getting back to a known state is harder than it is with an application. Does a proc run in a certain time? That's an irrelevant test until we can get to a known state. Data generation is one option, and it's flexible; the issue is we can't do it on every test, because it would take a long time to run. What I find useful is using database snapshots. Have you had any luck trying that?
Andy Leonard: I have used database snapshots, and as long as I'm not violating a rule or best practice, health-related or financial, they have been effective. There are also techniques for performance test setup that go back a while: ways to load statistics from production into a development or smaller database, so you can get a better idea of production performance. Those go back to the SQL Server 7.0 days and before.
Greg Low: What I find useful is to use data generation to populate the database, but then when I start the tests I'll generate a database snapshot, run the tests, then do a restore from the snapshot, which is a quick operation. I can do an enormous number of tests, always starting with a clean slate.
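[A sketch of the snapshot-based reset workflow described here; the database, logical file, and path names are illustrative assumptions:]

```sql
-- Capture the clean, freshly generated state as a snapshot.
CREATE DATABASE TestDB_Snapshot ON
    (NAME = TestDB_Data, FILENAME = 'C:\Snapshots\TestDB_Snapshot.ss')
AS SNAPSHOT OF TestDB;

-- ... run a batch of destructive tests against TestDB ...

-- Revert the database to the state captured by the snapshot.
USE master;
RESTORE DATABASE TestDB FROM DATABASE_SNAPSHOT = 'TestDB_Snapshot';
```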
Andy Leonard: That's a great way to do it. Erik Veerman has done similar work, using snapshots as a poor man's transaction manager. He'll take a snapshot, run against the database, then do a restore if it fails, and he's had success with that on large data loads with SSIS. It's an interesting use of snapshots; I don't know whether Microsoft knew, when they developed the feature, that transaction management would become a popular use. It's a fantastic tool.
Greg Low: It makes sense when a single user is doing things; otherwise doing a restore from a snapshot is scary. I find it very useful where you have your own copy of the database to work against. In production scenarios database snapshots are interesting, but I find them limited in that it's hard to refresh a snapshot if you want to use it as a point-in-time thing. That's functionality that's needed.
Andy Leonard: I agree. We've all had uses for snapshots, and there could be more there. From a testing point of view, the next use beyond that is when the database you're executing tests against is being used by other users. I've seen people put, and leave, dummy records in production data so they could migrate the data to a quality assurance layer, a stage just below production, and run tests against that; they could then get a profile of how different objects will perform in production. I find this in larger enterprises, with SANs that have the disk space to hold it. People will put a bit field at the end of large tables to indicate test rows, status indicators, so that people reporting against the data know not to bring in the test rows.
Greg Low: You just have to remember to put that predicate in your queries.
Andy Leonard: I’ve had that bite me.
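[A sketch of the test-row flag pattern just described; the table and column names are illustrative assumptions:]

```sql
-- A bit field at the end of the table marks test rows.
ALTER TABLE dbo.FactSales
    ADD IsTestRow bit NOT NULL DEFAULT 0;

-- Every reporting query must carry the predicate that excludes them.
SELECT SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
WHERE IsTestRow = 0;
```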
Greg Low: We touched on TRY/CATCH. I was surprised not to see the abilities of that expanded in this version. In normal .NET development, where we have try/catch, we can re-throw an exception, things like that. There are things missing from the implementation that I was hoping would be enhanced in 2008.
Andy Leonard: I've done that manually. You have to reconstruct the error from the error number and message functions. It would be nice to just write a throw statement, like you can in .NET, and have the error thrown up to the next level.
Greg Low: The problem is you can't re-throw many errors, only user errors. If you get an 823, you can catch it, but you can't re-throw the 823 back to the surrounding code, because you have no way to throw a system error. A lot of good work was done in T-SQL for 2008, but this is an area I expected more in.
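[A sketch of the manual "re-throw" workaround being discussed, as it looked before the THROW statement existed. RAISERROR can only raise user errors, so the original error number and line are carried in the message text:]

```sql
BEGIN TRY
    -- Some statement that may fail, e.g. a divide by zero:
    SELECT 1 / 0;
END TRY
BEGIN CATCH
    -- Reconstruct the error from the error functions, then re-raise
    -- it to the caller as a user error.
    DECLARE @msg nvarchar(2048);
    SET @msg = N'Error ' + CAST(ERROR_NUMBER() AS nvarchar(10))
             + N' at line ' + CAST(ERROR_LINE() AS nvarchar(10))
             + N': ' + ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1);
END CATCH;
```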
Andy Leonard: I miss finally.
Greg Low: Yes, the finally block. I think they presumed that because finally in .NET is used for cleaning up non-managed resources, there was no equivalent use in T-SQL. I think it's clear a finally block would be useful.
Andy Leonard: The response I got was that they thought you'd do the try/catch inside a transaction, and then either you'd roll back or the error would take you to the catch block. OK, I can see that, but I'd like the option. I'm sure there's a use case out there.
Greg Low: A good example would be things in code that aren't transactional, like variables. You can't just roll back and have them be what they were, so there are uses where a finally block would be nice. You also mentioned you were applying testing to Integration Services. What sort of things were you doing?
Andy Leonard: That's interesting. I've done manual testing on ETL for years. If you're building a data warehouse, you want to check that the data in the warehouse is the data you collected from the source, properly manipulated, because you're always manipulating it into summaries or aggregations. It doesn't matter what tool you're using. In ETL testing, the first thing is raw record counts, which is a nice way to start. Sometimes they are equal between source and destination: you have 5,000 customers in the customer table, you run the process to move the data to the data warehouse, and you need to check the customer dimension to ensure that after it runs you still have 5,000 customers. Where it gets complex is when you're applying aggregation and storing data in your data warehouse at a different grain, maybe a summary, maybe rolling up customers by state or country. Then you don't have the actual individual customer in that dimension in the data warehouse, and there's no need for it. I can't imagine a data warehouse without individual customer data, but I have seen odd things in my years of doing data warehousing. The oddest I saw was a time dimension that didn't go below the week. It depends on the company.
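[A sketch of the first-order ETL test described here, comparing raw record counts between source and destination; the database, table, and column names are illustrative assumptions:]

```sql
DECLARE @SourceCount int, @DestCount int;

SELECT @SourceCount = COUNT(*) FROM SourceDB.dbo.Customer;
SELECT @DestCount   = COUNT(*) FROM WarehouseDB.dbo.DimCustomer;

IF @SourceCount = @DestCount
    PRINT 'PASS: row counts match ('
          + CAST(@SourceCount AS varchar(10)) + ').';
ELSE
    PRINT 'FAIL: source has ' + CAST(@SourceCount AS varchar(10))
          + ' rows, destination has '
          + CAST(@DestCount AS varchar(10)) + '.';
```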
Greg Low: I can imagine scenarios where the volume of data is that high. One thing I laughed over a month or so back was a comment from someone who had exceeded bigint for an identity column. There aren't a lot of good workarounds there.
Andy Leonard: My goodness. What do you do? Add another bigint column and make a compound primary key?
Greg Low: Conversely, I had a chuckle in an MVP group where someone had a data model with a gender column defined as bigint. Everyone chuckled over what the possible use could be.
Andy Leonard: I have a story about the gender column. When you're loading data warehouses and using a certain methodology, you can declare that a value should never change. Defining a person dimension, I of course put gender down as not changing. Of course it changed one time and blew up the process. There were about seven million people in the database, and a gender had changed. It was real. I learned then: with fixed attributes, be careful. If it can change, no value judgment, just from a technical standpoint, pay attention. I did develop test conditions that I have started using regularly when testing data warehouses. One seems simple. In data warehousing, you're removed from the source system, and people often forget you're there and make changes to the source system so a line-of-business process runs faster, adding to a source table. Often that will break the load process, the ETL process, or the data warehouse itself; you'll get duplicates where there shouldn't be any. Being able to detect the number of columns, their names, and their order matters, because some reporting applications, off the shelf or homegrown, use ordinals for fields. That's important. A column can change name or type; if it goes from a numeric field with a sum applied to a character field, reports stop working. These problems can propagate and grow through the enterprise, so I apply tests there and schedule them to run periodically. The tests I run often, sometimes after each ETL execution, are those that aggregate. I look for counts, source and destination. I coined the term ETL instrumentation, and I've blogged on it; there are six parts to the series, four of which have been published, about 120 pages, on collecting data from your ETL process. When I test against it, I collect enough information to give me confidence that my ETL has performed correctly and timely, with integrity, and moved the data as I wanted from source to destination. The first order is expected counts; you can read more on that there. The second order is doing sums or some sort of aggregation on numeric fields. Financial data is dollar amounts, and I have exceeded the money data type before.
I'll also do a modulus by some ridiculous random number and look for the result, a way of doing hashes or checksums, a way to ask: does this hash equal its counterpart in the data warehouse? I collect data using ETL instrumentation, then test it with test conditions. It's fascinating to tinker with, and it lends itself to my engineering training; I'm an engineer at heart. You want to know that what you did works, and to have that process be measurable, repeatable, and continue long after you've left the project. That's why I do it. I build one additional data mart for each ETL project I do: tests and reports executed.
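[A sketch of the second-order test described above: an aggregate sum plus a cheap modulus "checksum" over a numeric column. The table, column, and modulus constant are illustrative assumptions; the same query would be run against the warehouse fact table and the results compared:]

```sql
SELECT
    -- A wide decimal avoids overflowing the money type on large sums.
    SUM(CAST(Amount AS decimal(38, 2))) AS TotalAmount,
    -- Modulus by an arbitrary number acts as a crude checksum.
    SUM(CAST(Amount AS bigint) % 9973)  AS AmountChecksum
FROM SourceDB.dbo.Transactions;

-- Matching TotalAmount and AmountChecksum between source and destination
-- give confidence the load moved the data intact.
```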
Greg Low: Great. That brings us to time. Where will we be seeing you next?
Andy Leonard: I'm speaking at the PASS Summit 2008 on SSIS scripting. In 2008 we can now write scripts in C#, which is a big addition, via Visual Studio Tools for Applications. I'll be there.
Greg Low: Excellent. Thanks for your time today. While this area is still in its infancy, I'm encouraging people in the community to get out and look at unit testing.
Andy Leonard: I agree. And thank you for the opportunity. I’ve really enjoyed it.