Richard Tkachuk
SQL Down Under Show 29 - Guest: Richard Tkachuk - Published: 24 Jan 2008
In this show SQL Server Product Group member Richard Tkachuk discusses the SQL CAT team, working with Analysis Services and what is upcoming in SQL Server 2008 Analysis Services.
Details About Our Guest
Richard Tkachuk is a senior program manager on the Microsoft SQL Server Customer Advisory Team, or SQL CAT team. He works with customers on the most challenging Analysis Services applications, whether the challenge is size, user workload, or calculation complexity. Prior to joining the SQL CAT team, he was on the Analysis Services team, where he focused on the Analysis Services engine, with a special focus on the calculation and security models. Richard has been with Microsoft for the last eight years. He lives in Sammamish, Washington with his wife and four kids.
Show Notes And Links
Show Transcript
Greg Low: Introducing Show 29 with guest Richard Tkachuk.
Greg Low: Welcome Richard. As we do with all guests, describe how you came to be here, involved with SQL Server.
Richard Tkachuk: Yes, it is a bit of a long story. I've been in data warehousing for 15 to 20 years at this point. Long before Microsoft, I was a consultant, and we were given an assignment that was completely undoable. Then somebody showed me Excel working with the OLAP products, and my mouth dropped. Having been a consultant on transaction systems for a long time, customers would say, now we'd like some information about the data in these systems, and the answer was always, that's going to cost you this and this. I immediately saw that this made things so much easier. A partner and I started our own business, did that for a couple of years, worked with Microsoft, and ultimately I joined Microsoft working on Analysis Services.
Greg Low: I suppose, at the stage you started in, it was fairly early days.
Richard Tkachuk: I can't claim to have been there at the beginning like some others. I joined Microsoft shortly after the first beta of the Plato project was released, which ultimately turned into Analysis Services. Originals like Sasha, Mira, and Mosha had already been here for a while.
Greg Low: So, your current role with the SQL CAT team, which we haven't talked about. Perhaps describe its role and where it fits in.
Richard Tkachuk: It's a fascinating assignment. On the CAT team we have representatives across SQL Server. I know a little about the engine; these guys are just amazing technically, running the full gamut. We're all over the world. I'm based in Redmond at the headquarters, and we have people in the Bay Area, Denmark, the Philippines, and Asia. What we're looking for is customers working on the most challenging assignments, ones that stretch one or more aspects of usage. We want to work with those people to help make their projects successful, then take that experience back to the product team, take the lessons learned, and integrate them into the product.
Greg Low: Not a huge team, pretty high quality people.
Richard Tkachuk: These people aren't just on the SQL engine; across different aspects of replication and different features of the SQL Server product, they are absolutely amazing. They've been there with customers doing design and learning hard lessons. It's an enjoyable team.
Greg Low: In terms of things that go back to product group, examples of things that you’ve seen done flow back?
Richard Tkachuk: Much of the work I'm doing is taking the experience I have from the product teams, working with customers, and saying: if I have a database of that size, these are the things you want to be worried about, like the size of partitions and how many partitions you want to have. As for what we take back: in calculations, for example, we'll see some usages of calculations that we might not have expected people to do, and the more common they are, the more emphasis we want to put on them, to make them perform well in the engine.
Greg Low: That’s good. In amongst the CAT team, you were on the Analysis Services team itself, what sort of role really?
Richard Tkachuk: I'm an engine guy. The primary role I played was on the semantics of the calculations. MDX is to Analysis Services what SQL is to the database. My job was, as MDX evolves, to make sure it satisfies customers' scenarios and to keep it as clean as possible. The people who know MDX are snickering to themselves, because it can be daunting, but when you get into the finer aspects of calculation semantics there are fascinating problems, some solved and some that still need to be solved. The same goes for the security model: how does dimension security work, how does cell security work, how do these things integrate with MDX and the rest of the engine?
Greg Low: In fact, that's kind of an interesting thing in general with Analysis Services: given that you're trying to aggregate things to keep performance really high, you would think that granular security is almost the direct opposite goal. I just wondered how the tradeoff works in that area.
Richard Tkachuk: Well, great question. There are two aspects to that. First, Analysis Services has two separate security models. One is cell security, where you can't see the value of a certain cell: for example, you couldn't see profit for North America, but you could see profit for Australia. That's cell security. The other is called dimension security. That is prohibiting some users from seeing certain members of a dimension. For example, I could see USA and Canada, but not Australia; it would be as if the country didn't exist in the database, and there would be nothing I could do to see whether it was there. For calculations, we handle dimension security very well; it is handled in the storage engine, so the calculations are evaluated on top of the records from the storage engine. Cell security is harder. You do see differences in performance with cell security applied to a cube versus dimension security, or no security at all.
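A toy sketch of the distinction Richard describes may help. This is not Analysis Services code; all names are hypothetical. The point is that dimension security removes members from the user's view entirely, while cell security leaves the members visible but masks specific cell values:

```python
# Hypothetical illustration: dimension security vs cell security.
profit = {"USA": 100, "Canada": 80, "Australia": 60, "North America": 180}

def apply_dimension_security(cells, denied_members):
    # Dimension security: denied members simply cease to exist for this user.
    return {m: v for m, v in cells.items() if m not in denied_members}

def apply_cell_security(cells, denied_cells):
    # Cell security: members remain visible, but secured values are masked.
    return {m: ("#N/A" if m in denied_cells else v) for m, v in cells.items()}

dim_view = apply_dimension_security(profit, {"Australia"})
cell_view = apply_cell_security(profit, {"North America"})

print("Australia" in dim_view)     # the member is simply gone
print(cell_view["North America"])  # masked, but the member still exists
```

As Richard notes, the removal done by dimension security can happen down in the storage engine before calculations run, whereas cell security has to be checked per cell, which is why it costs more.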
Greg Low: Yes, that does answer the question. Another one in an allied area is increasing amount of encryption being used in transactional databases and I wonder, down the track, does that have any impact on what can be pulled out into Analysis Services. Is there any issue with encryption?
Richard Tkachuk: Analysis Services is the little brother of the database engine; you see it pick up lots of features, and we'll talk about some of these in a few minutes. One of the things the SQL Server engine has incorporated is encryption, and in the next release, though I'm no expert, they've made it an even richer feature, with the ability to encrypt data and manage keys inside and outside the engine. The Analysis Services engine itself doesn't offer a mechanism to encrypt data.
Greg Low: One of the things I find very interesting, and haven't yet seen widely deployed, is column-level security in SQL Server 2005. The beauty to me is the idea that I can encrypt particular columns and then, when someone issues a select statement against the table, depending on which certificates they have open at the time, they either see or don't see the data. I think there's an inherent beauty in that, because the application doesn't have to deal with it, as long as the identity is being propagated down to the database level. Whereas typically in an application you see endless code in the layers above that says: if they're in this role, show them this; otherwise turn this off. There's a kind of beauty in being able to issue a statement against a database and then, depending upon who you are, see different data without modifying the statement at all.
Richard Tkachuk: Analysis Services has something related to that with dimension security: if you're a member of a role in my database, I can configure things such that you don't know that what you think of as a column, and I think of as a member, even exists.
Greg Low: That’s one of the aspects of security, stopping people being curious. “You shouldn’t even know something exists if you don’t have access to it.”
Richard Tkachuk: And now to push that a little bit further: advanced Analysis Services people know something about the concept of visual totals. We aggregate data very well, but if we apply dimension security, what should the value you see for all countries be if you can only see a subset of countries? What we spent a lot of time on is thinking through these problems. It gets more interesting if you look at it multidimensionally: you can see some members of this dimension and some of that dimension, so how do things aggregate up properly? This is something that just works in Analysis Services.
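The visual totals idea can be sketched in a few lines (hypothetical names, not the Analysis Services engine): the "all countries" total is computed only over the members the current role is allowed to see, so the aggregate never leaks the hidden members' values.

```python
# Toy sketch of visual totals under dimension security.
sales = {"USA": 500, "Canada": 300, "Australia": 200}

def visual_total(cells, visible_members):
    # Aggregate only over the members this role can actually see.
    return sum(v for m, v in cells.items() if m in visible_members)

everyone = visual_total(sales, {"USA", "Canada", "Australia"})  # full total
restricted = visual_total(sales, {"USA", "Canada"})  # Australia is hidden
```

Without visual totals, the restricted user would see the full total of 1000 and could infer the hidden Australia value by subtraction; with them, the total matches only the visible members.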
Greg Low: Now in terms of the maturity of product itself, we are now a few versions in. It intrigues me that when I look back at SQL Server product, the introduction of Analysis Services was a significant turning point in the product. I suspect that the introduction of spatial data now could be the same sort of turning point. In terms of where analysis services is, where is it on maturity curve?
Richard Tkachuk: That is a good question, and we could talk for an hour or two about that topic. Let me start this way: it's just the beginning on the analysis side. When you think of what people need to do to derive meaning from their data, it is up to the people in the industry to offer those mechanisms and see what customers accept and what they do not. On the Analysis Services side, it's very entrepreneurial in terms of what we need to add so people can gather meaning from the data in their organization, large or small. There's so much you can do with the features you can provide, and then those features have to be exposed in client applications, via grids, via visualization. It's something that has been booming for the last 10 years, and I really don't think there is an end in sight. It's a very exciting business to be in. On the question of maturity: Analysis Services is a mature product, with the 2000 and 2005 releases and the upcoming 2008 release, but I would continue to look for innovations being incorporated into the product. There's so much room to grow.
Greg Low: In terms of user experience, most people start to become familiar with pivot tables, things like that. Whenever I see visualization, it comes down to either reporting or pivot tables. Do you think there’s still a big gap in how things get presented, used by users?
Richard Tkachuk: You may have seen the recent changes in Office 12. Excel and Analysis Services have been healthy partners ever since the first release of Analysis Services, and I would point to that partnership as one of the reasons Analysis Services has been successful. One of the changes made is that it's not just pivot tables; they've added something called cube formulas, so you can now, on a cell by cell basis, pull information from Analysis Services and merge it with data from other sources outside the pivot table. What you'll see is a richer look: people can apply more formatting and pull from a more diverse, richer set of information to present. Since Excel 12 came out we have continued to work with the Excel team, and there are more exciting things coming as well. Back to the room for innovation: there's no end of opportunity, and fortunately we have a broadly used application like Excel, where a little goes a long way since Excel is so widely used.
Greg Low: One of things one of my colleagues referred to a while ago, he suggested when Microsoft started using technologies in their own products, they tend to jump to another level of maturity or acceptance. We’re just starting to see some implementation of Analysis Services in products like VSTS (Visual Studio Team Suite) and so on. What’s your feeling why it’s been slow to be rolled out amongst Microsoft products?
Richard Tkachuk: It's like asking, why don't those other people at Microsoft see the value in the products you're working on? Great question! I can't speak to all of it. PerformancePoint, for example, is taking off in Office. One reason is that a lot of the people who were on the Analysis Services team, or who worked on business intelligence or SQL Server, are now working with Office directly. We're growing people, and they're spreading through the company. The value of Analysis Services is being seen; the value to customers of incorporating some of the technology is being accepted. For those who haven't incorporated it, give them their own podcast…
Greg Low: It's interesting to see VSTS now implementing a number of cubes and reports based on cubes, and the idea that the more that happens, the more you'll see shrink-wrapped third party applications that include Analysis Services. What I've seen in the past is more custom, one-off implementations whenever I see Analysis Services deployed. What's required to get to the point where more shrink-wrapped applications would take advantage of it?
Richard Tkachuk: Those applications are out there; I've worked with many vendors myself to optimize their use of Analysis Services. I've worked with companies like Inforce and many others who have incorporated Analysis Services into either ad hoc client applications or vertical applications like budgeting or planning applications. We're doing some of that work here at Microsoft. The use of Analysis Services is growing more and more with time.
Greg Low: If I buy the typical business application today, it’s very common it has direct support for SQL Server, dramatically less common that it has analysis services.
Richard Tkachuk: Not 'dramatically'. Analysis Services is part of SQL Server. There's always debate about what drives the business, but one fact is that this is here to stay; it's a great business, with a big opportunity to grow.
Greg Low: In terms of database size, we’re starting to see some very large systems. Amongst CAT team, what size databases are you starting to work with?
Richard Tkachuk: Two answers to that. On the SQL Server side, I’m not the person to answer that.
Greg Low: Lubor is on the team, will ask him in a while on the show.
Richard Tkachuk: Then, on Analysis Services: how much data can you put into a cube? There are no hard and fast rules about size. Taking 5 terabytes of fairly straightforward data from a relational engine, be it SQL Server or one of our competitors, and putting it into Analysis Services is something that requires thought and expertise, but is not daunting.
Greg Low: An approach I see taken by many partners when dealing with existing larger Oracle sites is that a good way to get in the door is to implement Analysis Services against Oracle, rather than threatening the core transactional business that's already there; it tends to be a very good on-sell for people. Analysis Services is one of the best of all application tools to work against Oracle. Are there issues with data type compatibility? I imagine cleansing of data is always a challenge, but when you throw in another platform...
Richard Tkachuk: Every database engine has its own peculiarities in how it handles dates and data types. In Analysis Services, the way we handle that is a cartridge architecture: different cartridges map to different databases, types of providers, and types of drivers; you even get some eccentricities of each driver. We have abstract versions of queries, then cartridges that translate that abstract XML into the SQL we send. That gives us flexibility for, first, simple mapping, and second, opportunities for optimizations. The SQL Server cartridges take advantage of that.
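The cartridge idea can be sketched very roughly (hypothetical code; the real cartridges are XSLT-style transforms over an abstract query representation, not Python): one abstract request, with a per-provider translator that owns each backend's dialect quirks.

```python
# Toy sketch of a per-provider "cartridge" that turns one abstract request
# into backend-specific SQL. Names and dialect details are illustrative only.
CARTRIDGES = {
    "sqlserver": lambda table, cols, n: (
        f"SELECT TOP {n} {', '.join(cols)} FROM {table}"),
    "oracle": lambda table, cols, n: (
        f"SELECT {', '.join(cols)} FROM {table} WHERE ROWNUM <= {n}"),
}

def translate(provider, table, columns, limit):
    # Pick the cartridge for this provider and emit its SQL dialect.
    return CARTRIDGES[provider](table, columns, limit)

sql_mssql = translate("sqlserver", "Sales", ["Region", "Amount"], 10)
sql_oracle = translate("oracle", "Sales", ["Region", "Amount"], 10)
```

The same abstract request (table, columns, row limit) yields `TOP` syntax for one backend and `ROWNUM` for another, which is the kind of per-driver eccentricity Richard says the cartridges absorb.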
Greg Low: In the second half of the show we'll talk about 2008 and what's coming. What are the most common mistakes when deploying or implementing Analysis Services?
Richard Tkachuk: You've put me on the hot seat. Not a mistake exactly, but the types of calculations people do often mimic what they used to do with SQL and relational databases. MDX is unfortunately close enough to SQL to lull one into a sense of complacency. The most common issue is not getting the concept of place that MDX has: MDX is very bound to every cell having a parent, a hierarchy, a previous member, a next member, children, and so on. The most common mistake is not climbing that learning curve of MDX to make the application sing and perform; that's by far the most common. The other thing: the dimensions feature has great analytic capabilities, but it can be abused, and when things grow beyond a certain size, performance can slow down. There are mechanisms to optimize those, and some blogs about how to resolve that.
Greg Low: Very common modeling requirement is some sort of many-to-many relationship.
Richard Tkachuk: The most common is by customer: a customer can belong to multiple demographics, and you want to see data by demographic, so how do you show that? When we aggregate it, we do a distinct sum, so we don't double up. As the data meets, it reconciles; it doesn't aggregate itself more than once.
Greg Low: Yes, when customers are in multiple locations and we view the data by location, we can no longer just add up the locations to get the total.
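The many-to-many problem above can be sketched with toy data (hypothetical, not Analysis Services code): a customer belongs to several demographics, so per-demographic figures legitimately overlap, but the grand total must count each underlying fact only once.

```python
# Toy sketch of many-to-many aggregation with a "distinct sum".
customer_sales = {"Ann": 100, "Bob": 200, "Cal": 300}
demographics = {
    "Urban":   {"Ann", "Bob"},
    "Student": {"Bob", "Cal"},  # Bob belongs to both demographics
}

def sales_for(demo):
    return sum(customer_sales[c] for c in demographics[demo])

per_demo = {d: sales_for(d) for d in demographics}

# Naively summing the per-demographic figures counts Bob twice...
naive_total = sum(per_demo.values())

# ...so the grand total is computed over the distinct set of customers.
distinct_customers = set().union(*demographics.values())
distinct_total = sum(customer_sales[c] for c in distinct_customers)
```

Here each demographic's own figure is correct (Urban 300, Student 500), but their naive sum is 800 while the true total is 600, which is exactly the "don't double up" behavior Richard describes.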
Richard Tkachuk: Aggregation design is also a common issue, in a world of 1,000 attributes in a cube. You can tag attributes with how they participate in aggregations. There's a sample application on CodePlex, and then usage analysis; it's a common area where people need to work.
Greg Low: Anything with time dimensions? Common errors? Things you see people doing in ways you'd prefer they didn't?
Richard Tkachuk: We have done great work on calculation performance for common time calculations that didn't perform as well in 2005 as they do in 2008; the 2008 version just sings, and people might be pleasantly surprised. One thing: people can have multiple calendars. We'll get technical here: every attribute can have a default member. Typically, for a year attribute, the default member is the current year. Someone can also have a fiscal calendar. What can happen is that the default members interact unpleasantly: you slice on a particular month of the fiscal year, and that clobbers the default member on the fiscal year, but not on the calendar year. All of a sudden you're seeing a half or a quarter of the data you expect. Setting up default members can get tricky.
Greg Low: Any lessons on how data would be partitioned?
Richard Tkachuk: Let me plug a performance white paper produced about a year ago; the author did a bang-up job. On partitions: the number is important. You're OK with about 1,000 partitions; more than that can become a problem in the UI, because when we iterate over the metadata, things can slow down, so be cautious. There are recommendations on row counts, about how many rows can be squeezed into a partition. One more thing: how partitions work with distinct count measures. We recommend that the column you bind a distinct count measure to should have non-overlapping values in each of the partitions it's contained in. There's a white paper coming out in the next month on how to model this.
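A toy sketch shows why that non-overlapping guidance matters (hypothetical data, not engine code): if the distinct-count column's values never overlap across partitions, per-partition distinct counts can simply be summed, with no cross-partition merge needed.

```python
# Toy sketch: distinct counts over partitions with disjoint key ranges.
partitions = [
    [1, 2, 2, 3],      # partition A: customer keys 1-3 only
    [4, 5, 5],         # partition B: customer keys 4-5 only
    [6, 7, 8, 8, 8],   # partition C: customer keys 6-8 only
]

# Each partition can compute its distinct count independently...
per_partition = [len(set(p)) for p in partitions]

# ...and because the value ranges are disjoint, summing the partition-level
# distinct counts equals the true global distinct count.
global_distinct = len(set().union(*map(set, partitions)))
```

If key 2 also appeared in partition B, the per-partition sum would overcount, and the engine would have to reconcile values across partitions, which is the expensive case the recommendation avoids.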
Greg Low: Welcome back from the break. Is there life outside SQL Server?
Richard Tkachuk: My four kids keep me busy. Aged six to 17. Took young ones snowboarding last weekend, kind of fun and trying at the same time. We’re a 40 minute drive to the mountains, wonderful place to snowboard.
Greg Low: Any interests of your own?
Richard Tkachuk: Sound trite, but I love this stuff. Problems of working with data are fun. For people who listen to podcast, I’m hoping they’re into it the same way. I’m not just another data geek!
Greg Low: Best thing in life is to work in your hobby. There’s a dividing line between things you have to do, and things you’d like to do. Are there areas you’d like to pursue that you don’t get to?
Richard Tkachuk: In all of our work, there's a piece of it that you have to do; administrative stuff needs to be done. But this afternoon I was looking at a query that took forever, and it was tweak, tweak, tweak to fix it. There's gratification in doing things like that. The next point I look forward to is pointing out: why wasn't this done automatically?
Greg Low: Performance tuning is really gratifying. I find that the dot net guys at the upper levels can tweak all they like and might get a five percent change, but working at the bottom end can have dramatic effects.
Richard Tkachuk: True. Let me describe some changes we made in 2008 to the calculation engine. I'll describe a naive approach, exaggerating purposely to make a point. Say you ask for a cell set and you've got some complex calculation, maybe a ratio of two numbers, or something more sophisticated, like: take the value from last year and the number of employees this year, project down to the number of employees I have, and aggregate back up by multiplying by a factor over here. The point is, calculations can range from simple to enormously complicated, and they can extend over an enormous space; the theoretical space of a cube can be unimaginably large. One approach: evaluate the first cell, do the navigation from one member to its parent or ancestors, get that data, move on to the next cell, and the next, and the next. A couple of things are not good about this approach. Cubes are generally sparse: the actual space populated by cells in the cube is small, so one thing is to be able to ignore the null space wherever we can. There were techniques to do that in 2005 and earlier; some will be familiar with the NON_EMPTY_BEHAVIOR property on calculations, but sometimes you can't give a calculation that property. The other problem is repeating the same navigation again and again: the members don't change over the cell set, yet you repeat the same navigation. What the developers did was move toward a subspace computing model, more like SQL Server: build an execution plan, then navigate in an outer loop, not cell by cell; pull work from the inner loop to the outer loop, figure out all the information needed from the storage engine, retrieve just that data, then apply the calculations only to the data we retrieved. That can be orders of magnitude less data, and simple queries can improve by orders of magnitude. I don't want to overpromise; what we're gathering now is, for a typical user, what performance change to expect, what percentage better, which types of calculations to watch for, and so on. But we see fairly significant performance improvement.
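The contrast Richard draws can be sketched in miniature (hypothetical code, not the 2008 engine): the naive model walks every coordinate in the theoretical space, while the subspace model evaluates the calculation only over the cells that actually hold data.

```python
# Toy sketch: cell-by-cell evaluation vs subspace evaluation on a sparse cube.
import itertools

# Only 3 of the 1,000,000 theoretical cells hold data.
stored = {(10, 20): 5.0, (10, 21): 7.0, (500, 900): 2.0}

def naive(cells, size=1000):
    # Cell-by-cell: visit every coordinate, mostly computing nulls.
    visited, out = 0, {}
    for coord in itertools.product(range(size), range(size)):
        visited += 1
        if coord in cells:
            out[coord] = cells[coord] * 2  # stand-in for a real calculation
    return out, visited

def subspace(cells):
    # Subspace: retrieve just the stored records and calculate over those.
    return {coord: v * 2 for coord, v in cells.items()}, len(cells)

naive_result, naive_visits = naive(stored)      # 1,000,000 cells visited
sub_result, sub_visits = subspace(stored)       # 3 cells visited
```

Both produce identical results, but the subspace version touches orders of magnitude fewer cells, which is the essence of the improvement described above.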
Greg Low: That doesn't surprise me. Ironically, even outside the OLAP world you see the same approach. An example of that: I had a query that was populating details from one table into a temp table, and for each row it had a subquery going off looking for a matching row in another table. The irony was that the number of rows in the main table was huge, while the number of rows that ever existed in the other table was tiny. You were much better off putting the first table in and then separately processing the few rows that were in the other table. It was a dramatic change in performance, just from focusing on the rows that are there.
Richard Tkachuk: There are some techniques you can apply, and we'd like to make it work so the engine can optimize them away. Let me give you a common mistake: taking a calculated member and adding zero to it, because people like to see zeroes. What that does is make the cube, which used to be sparse, dense; it's filled with zeroes. We can't do that optimization for default values other than null. If people want to see zeroes, do it with formatting. Adding zero to the value of all cells will hurt performance.
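A toy sketch of that "+ 0" anti-pattern (hypothetical code): adding zero turns every null cell into a materialized zero, destroying the sparsity the engine exploits, while a display-time format achieves the same look for free.

```python
# Toy sketch: why "measure + 0" destroys sparsity.
sparse = {(0, 1): 12.5, (3, 3): 4.0}   # nulls are simply absent
space = [(x, y) for x in range(100) for y in range(100)]

# Anti-pattern: "+ 0" forces a value into every cell of the 10,000-cell space.
dense = {coord: sparse.get(coord, 0.0) + 0 for coord in space}

# Better: keep the cube sparse and render nulls as "0" only at display time.
def display(cells, coord):
    v = cells.get(coord)
    return f"{v:.1f}" if v is not None else "0"
```

The sparse view still has only 2 cells to evaluate, while the dense one has 10,000; `display` gives users their zeroes without paying that cost, much like a format string would in MDX.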
Greg Low: Things that have changed in terms of analysis services, you mentioned execution plans are dealt with differently, are there things that expose that information?
Richard Tkachuk: It's a sensitive topic, and a matter of some disagreement. We have a new API that can be used to tell us what the engine's up to, and our internal application shows a graphical representation of that, but we're not planning on producing anything publicly. First, it's a new API, and we are going to change it; we change it a ton of times through the development life cycle. You could say, what about a sample application? Customers are ambivalent about those: if it's worth a sample, why not put it in the product? The jury is still out on what we want to do with sample Analysis Services applications.
Greg Low: Do designers have a big shot in the arm with this version?
Richard Tkachuk: I'm an engine guy, but they've done some great things: the tips and design suggestions are wonderful. For example, when designing a cube, there are blogs, performance white papers, all that fragmented information, and the tools guys pulled that into the product. Whenever you violate a design suggestion, you see a squiggly: "Have you thought about this?" If you're smarter than the suggestions, there's a mechanism to do away with them. For example, I was working on a MOLAP rollup and asked myself: what's the impact on the rollup if I turn off a reference dimension? So I did, forgot about it, and got a squiggly reminder the next day.
Greg Low: Like the little squiggly underline in Word for mistakes, suggesting what you should have done.
Richard Tkachuk: Squiggly says, “we recommend that ….”. Concrete suggestions.
Greg Low: Someone half-joked about being intrigued at the number of things that pop up. Subspace calculations are a big deal. Changes in MOLAP write-back as well?
Richard Tkachuk: The way write-back works in 2005 and earlier: we need to know the current value of the cell and the current value of all cells that contribute to that cell, figure out the delta needed to make the value of the write-back cell what you want it to be, figure out the increment to add to each contributing cell, and write that back to the write-back table.
Greg Low: In proportion? In absolute?
Richard Tkachuk: There are a bunch of different mechanisms: absolute value, or related to some other value; there's lots of flexibility. You can optimize it to reduce the number of cells, but taking all the cells that contribute to a value, there can be a lot of them. You're taking a performance hit in two places. First, in figuring out the current value, you have to go to the write-back table and read its contents to see what contributes to that cell. Second, once you figure out the increment values, you do an update to the relational table, which can take a lot of time; there may be thousands of rows to update in the write-back table.
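The allocation step can be sketched with toy numbers (hypothetical code, not the engine): writing a new value to an aggregate cell means computing a delta and spreading it across the contributing leaf cells, here proportionally, which is one of the several mechanisms Richard mentions.

```python
# Toy sketch of proportional write-back allocation to leaf cells.
leaves = {"Jan": 100.0, "Feb": 300.0}   # current leaf values; total = 400

def writeback(leaves, new_total):
    # Scale each leaf so the leaves sum to the value written at the top.
    old_total = sum(leaves.values())
    return {k: v * (new_total / old_total) for k, v in leaves.items()}

updated = writeback(leaves, 500.0)  # user writes 500 over the 400 aggregate

# These per-leaf increments are what gets persisted to the write-back table.
delta_rows = {k: updated[k] - leaves[k] for k in leaves}
```

Even this two-leaf example produces one delta row per contributing cell; with thousands of contributing cells, that relational update is exactly the second performance hit described above.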
Greg Low: Can look like a single value, but a lot went into making it a single value.
Richard Tkachuk: Browse your write-back table sometime. Some people say they're not familiar with how the write-back table works; there was the FoodMart example in 2000, where they put an absolute value in the top cell and complained it was taking too long, but once you see what's going on under the covers, you can see why. The change we made is that we now keep the write-back data in a MOLAP partition. The way write-back now works, reading a cell will get the MOLAP performance you expect; it will not read from the write-back table. That will be a huge improvement, for queries as well. When you do an update, we still do the write-back to the relational table, while in parallel updating the data in the MOLAP partition. It's impossible to say exactly what the performance benefits will be, but what we see in queries is three times better performance in our lab.
Greg Low: You said the query plan equivalents might not surface. What about dynamic management views (DMVs)? Is something like session resources exposed?
Richard Tkachuk: Good question. I don't know the details, because it's pretty sparsely documented so far, but the program manager who owns the feature is anxious to get as much out there as possible. There's some killer information you can get from the new schema rowsets: you can get a deeper understanding of who's connected to your cube, how long they've been there, and so on.
Greg Low: In SQL Server 2005, I find the DMVs and DMFs incredibly powerful, yet still quite underutilized. People are still discovering them; so many were added to the product that it takes a while to get your head around them.
Richard Tkachuk: SQL DMVs?
Greg Low: Yes. When they were added they gave wonderful insight, but it takes people a while to build them into how they work. I like the idea of having it in relational form in the first place, so you can run a query against it; that's excellent.
Richard Tkachuk: That's the choice Analysis Services took as well, with the concept of schema rowsets. Analysis Services does support a subset of the SQL syntax.
Greg Low: Notable things?
Richard Tkachuk: I'm excited about scalable shared databases. You can get multiple read-only versions of your database. There were ways to get this done in 2005, but now we're giving you the building blocks in 2008, and I'm looking forward to that as a great mechanism to scale out among a large number of users: you can get multiple cubes and manage the interaction between them. There's some work we have to do, but you should be able to see applications scaled far beyond what was done earlier.
Greg Low: From here, what’s in the future? Where might people see you?
Richard Tkachuk: I try to get to TechEds when I can, plus the internal presentations to Microsoft people. I'd love to know about any fascinating application you have. We have a program: if you're working on some aspect of data warehousing that you think challenges the design constraints, I'd love to hear more about it. Maybe we can talk.
Greg Low: Thanks for your time today, Richard.
Richard Tkachuk: Thanks for having me.