Conor Cunningham is a principal architect on the SQL Server team. While many listeners will know Conor in relation to query processing and query optimization, lately he has mostly been working in an architect role. Specifically, he has been focusing on Windows Azure SQL Database, formerly known as SQL Azure. Conor has been working with the largest customers and finding ways to make them successful by improving the overall architecture of the platform, and in particular he has been improving customer guidance and tooling.
Greg Low: Introducing Show 51 with guest Conor Cunningham.
Welcome, our guest today is Conor Cunningham. Conor is a principal architect on the SQL Server team. While many listeners will know Conor in relation to query processing and query optimization, lately he has mostly been working in an architect role. Specifically, he has been focusing on Windows Azure SQL Database, formerly known as SQL Azure. Conor has been working with the largest customers and finding ways to make them successful by improving the overall architecture of the platform, and in particular he has been improving customer guidance and tooling. So welcome Conor!
Conor Cunningham: Hi, thanks for having me.
Greg Low: So what I get everyone to do first when they are on is just to tell us: how did you ever come to be involved in this in the first place?
Conor Cunningham: For those of you who are familiar with my prior work, I spent a lot of time working on query processors for Microsoft in SQL Server. I still do that to some extent, but it so happens that at one point a few years ago we had an internal organizational change and we decided we needed to fix a few things to get two different things to merge properly. You kind of look around at one point and say, hey, there is no one to work on this problem, this problem, or this problem. Then before you know it, you have volunteered. I was essentially volunteered to go and work on some areas that were new and also needed my particular skillset.
It is nice in some ways, because you get to try different things, but it also means a lot of the core assumptions you have been thinking about for a long time get to change. It is actually a fun time to be an architect; every day I like to describe it as a target-rich environment. It is one of those things where, when you start looking at different parts of the problem space, there are new things to do. There are some features that we haven't released yet that I was specifically asked to go and shepherd into this new framework and what became Windows Azure SQL Database, or commonly SQL Azure. We will be releasing some of those over the next few months.
My expertise in those particular areas led me to discussions about the customer experience and the surface area. We have this role that we call the application administrator, or the app admin, which is sort of what a DBA does in the regular SQL Server world. I spent a lot of time thinking about: what would that role do? How do they interact with the system? And what should be the layers, boundaries, and contracts that we choose in order to make that successful? Some of that functionality is starting to leak out each month when we release, and other pieces will come out in future months as we complete the individual requirements.
Greg Low: Yes, look, early on with SQL Azure I had a feeling that it was almost a very divergent set of code from the main SQL Server code. But lately I get the feeling that it is very much all back on a single track. Is that pretty much the case?
Conor Cunningham: The original work that was done on SQL Azure was done on a branch of the code, but it was still SQL Server underneath, as you would call it, with layers above and layers below and things like this. We still are not at the point where we would say the architecture is final. We do have one code base, and there are pieces that are added to run our cluster architecture. I think what you will find is that the policies that we have for SQL Azure will be a little bit different compared to what you will see in Enterprise SQL Server, but over time more and more of those things will start to look and smell similar to what you would expect from an equivalent boxed-product installation.
Greg Low: Yes. So look, maybe I will wander through some of the most common questions I get asked when going out and talking about this. The very first question I get from people all the time is: isn't this just a hosted copy of SQL Server?
Conor Cunningham: Right, this is a wonderful question, and it boils down to what you are trying to get out of building a different thing. I think the answer is more complicated than just hosted SQL Server. Look at the different strategies being employed by the different competitors in the market. There are some who are trying to sell you infrastructure and basically give you a VM on some hosted piece of hardware, and you install whatever you want on it. Microsoft is taking a slightly different approach. We used this as an opportunity to rethink what it means to interact with our customers. How do we make it so we can greatly reduce the overall cost to them of using our products? If you have a VM, you end up having to install patches and understand what versions are there and worry about all the things that are associated with a traditional virtualized notion of a machine.
Greg Low: Yes, the same things you have to worry about with a physical machine too. It is kind of intriguing that in general businesses really don't want to know about patches and operating systems and things like that. We have kind of made them want to know about that, or have to know about that. I think they really don't want to.
Conor Cunningham: Yes, it is the cost of doing business, and when you look at Microsoft's strategy with SQL Azure, it is basically a platform story: we are selling you a database. We are going to work on building up the distinction between what is infrastructure and what is a platform. So you can say, hey, I just want to get a database, and then I want you to worry about guaranteeing to me that the patches are installed on the right schedule, that I don't have downtime installing patches, and that I also have the right behavior from the system in terms of physical capacity for each of the resources that I need.
This story is very much being built, but fundamentally we are in the process of building a story that makes it easier for you to go from zero to solution and overall lowers the cost of labour required for people using our software. Ultimately you will be able to pass those savings on to your customers, and that is a good thing in our mind. We think that it is a good time to disrupt the market and we are very excited to do so.
Greg Low: The next thing that people tend to ask I suppose is, is it reliable?
Conor Cunningham: Underneath we have SQL Server, so all the things you know and love about what makes SQL Server great — does it finish transactions, does it do this — still hold. All of the code is the same. Now, there are some pieces of policy code which don't exist in the boxed product, which relate to throttling for example. We put this in while we were building some of the fancier technologies required to be able to support multi-tenancy on our platform.
Greg Low: Yes.
Conor Cunningham: We should probably explain that for a moment so the listeners today understand what multi-tenancy is.
Greg Low: Yes, it is interesting that when I look at it, the pricing tends to encourage multi-tenancy.
Conor Cunningham: Right, there are two levels of multi-tenancy here. The first level is that if you are building a solution on SQL Azure, Microsoft will put more than one database for more than one customer on the same physical back-end machine. There are a number of reasons for this, but many solutions in a virtualized environment don't really need a whole new machine. In the infrastructure world, to continue the analogy, you are going to pay for a VM for each of those solutions essentially. You can get fancy of course and try to combine them yourself, but those are different solutions, and it is harder to combine them in a safe way for different customers.
Microsoft is doing all that for you, by basically giving you a database rather than giving you a virtual machine, and then we are going to pass savings on to you eventually. The other aspect that is interesting is when you get down to — we are talking about the reliability aspect, yes?
Greg Low: Yes.
Conor Cunningham: When we get down to what it means to have fully reliable databases, you need to have some of the technologies that we have built in so far, and you need to add a few more things. There are some features which haven't come out yet that will help improve what it means to be quote-unquote reliable. The multi-tenancy aspect, when you start getting to reliability, is that you have multi-tenancy at the machine level and multi-tenancy at the level at which I build a solution. What is the kind of paradigm I can use to build applications that are interesting? There is a whole section we could talk about here: the types of people building application solutions on SQL Azure, especially the larger ones, are building things that can't easily be built on regular SQL Server. We call these cloud service vendors, and they are kind of the analogue of an ISV in the boxed world.
Greg Low: Yes, well look, I suppose beyond reliability, the other big question in that same area that people usually follow up with is: is it secure? The concern is around security and privacy.
Conor Cunningham: So this is a great question. Let's enumerate some of the concerns that people can have. Some people may not want to, or may not be able to, store sensitive data in someone else's environment, independent of who it is, right? It could be that you need to have it on your physical premises; you are a bank or a government agency. I believe that Microsoft will provide solutions for that space over time, and not all the offerings have been announced yet. Fundamentally we obviously want to work with our partners in those industries to make sure that they have things that work for them.
I can tell you, because I work regularly with the security guys — they sometimes say no to me as I try to do new things and new features in the platform — that Microsoft takes security extraordinarily seriously, and we spend a lot of time with very, very smart people working through the details of how we secure passwords and how we provide fully integrated security experiences to make sure we don't have too many people with access to your data. I think you will find there will be published policies and certifications that Microsoft achieves to demonstrate that its cloud platform follows the highest rigors of corporate behavior in that regard.
Greg Low: Yes, that's good. Actually, it is interesting that often when I see people concerned about this, in the next breath I look at how they have implemented security themselves, and it is usually quite poor. It surprises me, given the level of concern that they seem to show. The same thing happens with reliability. At the vast majority of customers that I go into, I look around and I think the people managing the Azure platform on their worst day are going to do a better job than almost every site I walk into anyway.
Conor Cunningham: I think you are probably right in both regards. People have concerns sometimes about large companies or about storing things in other countries, and ultimately Microsoft is going to have to demonstrate through its track record that it is serious about this. I think that if you look at the things that we have done in terms of security patches over the past several years compared to our competitors, you will find that Microsoft's track record, specifically with SQL Server, is very, very, very good. It is rare that we have to do a security patch anymore, and that's a very strong statement. It is because we take that role very seriously. Some of our competitors have a few more of those, without naming names, and I think you will find that if we keep working in that regard, with the same dedication we have shown over the past five or ten years in that area, we will hopefully continue to gain the trust of more and more customers.
Greg Low: Yes, I think another one that would be interesting to raise at this point is the whole latency story. This is around the idea that I don't have the things on my premises. What does concern me is that I see a lot of material, let's say more marketing-oriented material, that seems to suggest that you can take things, redirect connection strings, and hope for the best. From what I see in most applications that is just not the case. I find that there is architectural work that needs to be done to get a good outcome.
Conor Cunningham: Greg, that is a good point. I think that any time you introduce a long link between any of the tiers in a solution, you are going to find the application is sensitive to it. If you have a very chatty application and that link is now a millisecond each round trip — or assume that it is two milliseconds — you are going to notice that difference, no matter who is selling you solutions or where. I think you are going to find, for many reasons, not just latency, there are actually a lot of things you are going to think about before you decide to use the SQL Azure platform. There are real benefits that can be achieved by using it, but it is unfair to describe it as a one-to-one drop-in replacement for what you have in SQL Server. So describing it as hosted SQL Server probably isn't really what you want to do.
We do have an infrastructure-as-a-service solution, SQL Server in a VM, that is in preview mode right now, which is more of a one-to-one drop-in replacement. Even then you still have to be careful about latency and how your different tiers talk to each other: if my application tier is not co-located with my data tier, you are probably going to notice that. For some of the applications that I work with we have had to spend a lot of time learning about each millisecond, trying to find a way to make sure the thing can be on-boarded properly onto either the SQL IaaS platform or onto SQL Azure itself, and that is an ongoing effort for us. But you are right, you would be wrong to say you can just flip a switch and it will just work, especially when you are putting your database out to some data center that might be two hundred milliseconds away from you.
Greg Low: Yes, I think a good example I saw the other day: I had somebody who was trying it and they had a database creation script, and they said this is terrible, because they ran the creation script against their local database and it took about fifty seconds or something like that. They pointed it at Windows Azure SQL Database and, literally, even to the nearest data center, it took about one hour and fifty-two minutes. They were saying, hey, this is ridiculous. But what was interesting is I then got them to export the same database — it is only about 20MB — as a bacpac, copy that up to Azure storage and then import it, and the whole process took less than a minute.
Conor Cunningham: Yes, so we have this solution for import and export that uses bacpacs, and this is still a new technology; we are still in the process of getting the rough edges cleaned up on it. Fundamentally this is a way for you to pass a blob and avoid all the latency of the round trips of what is effectively a chatty operation — meaning regular BCP or T-SQL execution where every single batch is a GO. So you are right, that is exactly why we have it; we want to make that experience as seamless as possible and ultimately transacted.
Greg Low: Yes. I think what struck me is that people are usually just reproducing exactly the same solution that they used to use, and so you often have to rethink things whenever you have lots and lots of little round trips. Of course, the worst offenders I see — again, at a site I was at a couple of days ago, it was an extremely chatty application — are the endless cursor fetches. I mean, let me go and do 90,000 remote procedure calls to bring up the first page of the app. That sort of thing is not going to work.
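The arithmetic behind this is worth making explicit. As a rough sketch (the numbers are illustrative, taken from the examples in the conversation), round trips dominate everything else over a wide-area link:

```python
# Rough arithmetic, not measurements: the dominant cost of a chatty
# application over a wide-area link is the round trips, not the queries.

def total_latency_ms(round_trips: int, link_ms: float) -> float:
    """Time spent purely on the network for a given number of round trips."""
    return round_trips * link_ms

# The 90,000-RPC page load described above, on a 2 ms LAN link versus a
# 200 ms WAN link, versus a single batched call over the same WAN link:
lan = total_latency_ms(90_000, 2)      # 180,000 ms = 3 minutes
wan = total_latency_ms(90_000, 200)    # 18,000,000 ms = 5 hours
single = total_latency_ms(1, 200)      # one batched call: 200 ms

print(f"LAN: {lan / 1000:.0f}s  WAN: {wan / 3_600_000:.1f}h  batched: {single:.0f}ms")
```

The same arithmetic explains the bacpac anecdote above: one blob upload replaces thousands of per-batch round trips.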
Conor Cunningham: One call, one round trip gets you the page view — that is your ultimate goal for a lot of these. If you can get down to the point where you have a very small number of round trips per page view, you are being very deliberate in your architectural organization, especially across tiers, and you can end up with a solution that looks great. We have several customers who have developed either a greenfield solution, or had an application that happened to be pretty close architecturally, and have been able to throw it up there and make a lot of progress pretty quickly. But people who have legacy applications, sometimes ported from an ISAM application, where they didn't necessarily re-architect the first time they went to SQL Server, are having more trouble, and we have to spend more time working with them to step back, look at the application all up, and say: what is it going to take for you to be successful?
Greg Low: Yes. I think another interesting one with latency is that I have been encouraging people, wherever they are based and whichever ISP they are using to connect to the internet, to literally test latency across the different data centers too, because I have seen people assume the closest physical data center is going to be the lowest latency, and in many cases that is just not so.
Conor Cunningham: Right. In Australia and in several parts of the Pacific Rim we obviously have some links that are faster than others. There is a data center that we have in Singapore, we have a data center in Hong Kong, and we have a data center that has recently opened up in the Western United States. I think you will find that these have different speeds based on what path your ISP has chosen.
Greg Low: Exactly. When I was doing testing off Telstra — one of the local ISPs — in Brisbane the other day, the Singapore one was certainly the quickest, with the Hong Kong one a little bit behind that. But it was interesting: for friends I have in New Zealand, the Western US ones were actually closer than the South East Asian ones in terms of latency.
Conor Cunningham: Right, and obviously it is something we look at and do some regular testing on. If customers find that they think it is out of whack, we encourage them to contact us and we will obviously look at it. I think hopefully you are going to find, as we add more data centers and more capacity, that we will keep trying to find ways to make our customers successful, because ultimately we don't make any money unless the customer is able to build a solution that makes sense to them. Right, so it is to our benefit as well as yours to figure this out.
Greg Low: Yes. I suppose we can get a list of data centers at the moment from the drop-down list when you go and create a server, but for people who haven't tried that, roughly how many data centers are available at present?
Conor Cunningham: Right, we have, I think, 8 data centers that are publicly available for people to use. There are 4 in the United States, labeled North, South, East and West; you can perhaps figure out with a little more precision exactly where they are, but fundamentally they are in different regions of the country. We have 2 in Asia — as I mentioned, one in Singapore and one in Hong Kong. We have 2 in Europe as well, one in Northern Europe and another in Western Europe.
Greg Low: Yes, I think it said North Europe, South Europe, or something like that.
Conor Cunningham: Something along those lines. Basically, if you are in far Eastern Europe, I think there are some countries that are beyond 250 milliseconds from either of those two, but in general most of Western Europe is covered right now. There are also some parts of the rest of the world that have fast links to other spots, so they are at least close to one data center. If you are not in one of those regions of the world and you are interested in trying SQL Azure, please try creating a database in each of them and see what the ping times are. You will find, for example, that some places in South Africa might do best connecting to the one in San Antonio (the one in the Southern United States).
Greg Low: Yes, I found that one of the easiest things was to turn on Client Statistics in Management Studio and execute a query — you know, like SELECT of some constant, which has essentially no query execution time — and then simply look at the times. It becomes very clear when you look at it.
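The same measurement can be sketched in a few lines of client code for people not using Client Statistics. This is only a sketch: `run_query` stands in for whatever driver call executes the trivial query (here it is stubbed with a sleep so the snippet runs on its own); because the query itself costs nothing to execute, the timings approximate pure round-trip latency.

```python
import statistics
import time
from typing import Callable

def measure_round_trips(run_query: Callable[[], None], trials: int = 20) -> dict:
    """Time a trivial query (e.g. SELECT 1) repeatedly and summarize the
    round-trip latency in milliseconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run_query()  # in real code: cursor.execute("SELECT 1"); cursor.fetchall()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

# Stand-in for a real connection: pretend each round trip takes ~5 ms.
stats = measure_round_trips(lambda: time.sleep(0.005))
print(stats)
```

Running this against a server in each candidate data center gives the per-region comparison Greg describes.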
Conor Cunningham: Exactly, a good point Greg. There are a lot of things that are going to be a little bit different about troubleshooting in this new world. One of the areas that we have been looking at very closely and making some investments in has to do with the client-side view of the world, and figuring out how to make that a bit more turnkey. If you have a long link — where is your database server, where is your client — then being able to have great tracing from the client side, on top of the SQL client code for example, is really important to be able to get a handle on things.
What is my view of the world from the perspective of my application, and what is the view of the world from the perspective of the back-end database? If you have both of those nice and clean, then you can understand what the link cost is. We have also had to get a lot better at our statistics, because the behavior of certain networks, especially shared networks, tends to be interesting. You get this exponential distribution of timings for traffic. So you have an average time, and then you have to start thinking about the tail of that distribution, because it is not just about the average; it is actually exponential, meaning that the time at the 95th percentile, the 99th percentile, and the 99.9th percentile might be very interesting and might be very, very large. You have to think about that and understand: is your application able to handle that tail, or is that a fundamental problem, and what does the tail have to be in order for your application to be successful?
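The tail behavior Conor describes can be made concrete with simulated timings. Assuming, purely for illustration, round trips that are exponentially distributed around a 10 ms mean:

```python
import random
import statistics

random.seed(42)

# Simulate round-trip times on a shared network: roughly exponential
# with a 10 ms mean (an assumption for illustration, not a measurement).
samples = [random.expovariate(1 / 10.0) for _ in range(100_000)]

# statistics.quantiles with n=1000 returns 999 per-mille cut points,
# so index 949 is the 95th percentile, 989 the 99th, 998 the 99.9th.
cuts = statistics.quantiles(samples, n=1000)
mean, p95, p99, p999 = statistics.mean(samples), cuts[949], cuts[989], cuts[998]

print(f"mean={mean:.1f}ms  p95={p95:.1f}ms  p99={p99:.1f}ms  p99.9={p999:.1f}ms")
```

For an exponential distribution the 99th percentile is roughly 4.6 times the mean, which is exactly Conor's point: the tail, not the average, is what a latency-sensitive application has to plan for.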
Greg Low: Well, listen, one of the things it would be good to tackle too: a lot of people seem to think this is based on some sort of supercomputer-type thing, but I just try to point out that isn't the case. It is very much more standard hardware that things are based on.
Conor Cunningham: Right. I think one way to describe this — and this tends to work for most of the customers I have worked with so far — is this. Say you were to buy a regular SQL Server box, and let's just say your application is wildly successful. Eventually you run out of capacity on whatever machine you bought, and then you have to throw that machine out and buy another one, typically a larger one. You keep that process going, and every time you hit the capacity limit of your machine you buy a bigger one, and it is not linearly more expensive; it is probably more than linear. SQL Azure is using commodity hardware and bridging the gap with software, and we are trying to figure out how to get to the point where we can use machines that are great on a price-performance basis. They have plenty of disks, they have plenty of memory, but they are fundamentally the pizza boxes you put in the rack. They are not huge database machines, and ultimately this difference will save customers money, because the cost of running these machines is much, much lower than buying one of those big database machines that we have had in the past. So a lot of the differences that we have to deal with these days have to do with the fact that it is a scale-out platform and not a scale-up platform. The machines are commodity, so we have high availability built in via software, giving us multiple redundant copies of your data, with the assumption that machines are going to die more frequently and we are just going to magically move things around to some other machine in the cluster. We have huge racks of these machines now, and the question is how we use them to maximum effect to make our customers successful.
Greg Low: Yes, so maybe one thing that would be interesting, I suppose, is just to spend a few minutes talking about roughly how load balancing happens.
Conor Cunningham: Sure, so let's describe what load balancing is for customers, and we can talk about the current state and what works and what doesn't work with it. If you recall, earlier in the conversation we talked about the fact that we usually store more than one customer database on the same physical machine. What ends up happening is that sometimes a particular database may be completely cold, with no one touching it at all, and that doesn't matter: it takes some disk space and no other resources, other customers can use the resources of the machine, and life is great, because this allows Microsoft to make money and also allows us to offer a lower price point for our customers.
Now, we can't predict exactly who is going to be busy at a given time. It would be great if we could even out the load across all the machines we have in our cluster, so customers all have a decent experience. We don't want one machine to be completely overloaded so that your queries are blocking all the time waiting for CPU; we want to move you to another machine so that you are able to get reasonable performance all the time and no one is being treated unfairly. We have a component called the load balancer that moves things from place to place, and the way it typically does that is by utilizing the fact that the high-availability solution has three copies of your database. Those three copies are stored on three different machines on three different racks. Whenever we decide it is time to balance out the load, we run the load balancer and say, OK, it would be great if we moved these databases over: we switch the primary to what was a secondary, and this process evens out the load on the system. This is great when it works, and there are a few cases where we are still working to make the customer experience better.
One thing I will mention here is that when SQL Azure does this, it basically does a failover. It's like a failover on the box, and you have to reconnect, so we teach customers the best practice: they need to have code that is able to reconnect so that they can take advantage of this. What is fundamentally happening is we are pointing you to a different machine, so you put load on that machine instead of the machine you were on before. This same process is what happens when a machine dies: we just fail over to another machine, and then we build a third replica if the machine was completely dead.
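A toy model of the mechanism just described may help; the machine names here are invented for illustration. Each database keeps three copies on machines in different racks, and both load balancing and machine failure amount to promoting a secondary to primary, which is why the client sees a reconnect:

```python
from dataclasses import dataclass

@dataclass
class Database:
    replicas: list  # machine names; replicas[0] is the current primary

    def primary(self) -> str:
        return self.replicas[0]

    def fail_over(self) -> None:
        """Promote the next secondary; the old primary rotates to the back
        (in the real system it would be repaired or rebuilt elsewhere)."""
        self.replicas.append(self.replicas.pop(0))

# Three copies on three different machines on three different racks.
db = Database(replicas=["rack1-m07", "rack2-m03", "rack3-m11"])
print(db.primary())   # rack1-m07
db.fail_over()        # load balancer moves the primary off a hot machine
print(db.primary())   # rack2-m03
```

Clients connected to the old primary are disconnected at the switch, which is exactly why the reconnect logic discussed next matters.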
Greg Low: As I have been mentioning to a number of people too, they seem surprised by retry logic, but most enterprises I go into should already have something like that as part of their architectural design. I always think it is sad when I see people who have spent a fortune on highly available systems, yet the minute those systems fail over, like they are designed to, every application in the building breaks. I just think that is very poor, because the people doing the development could have dealt with that.
Conor Cunningham: You are right that not a lot of people are great at doing retry logic; I would say even most of the code I have looked at is not written perfectly. On the other hand, we at Microsoft have an opportunity to make it easier for customers so they don't have to write that code. I tend to think a lot about the customer experience: what is the total time to solution, and how much work do they have to do? Maybe in the future we will be able to deliver some solutions where the actual amount of work required is not much different from the boxed SQL Server experience — or even better than it — where you don't have to think about that as much if you write your code the right way.
Greg Low: Rather than just start a transaction, do a bunch of work, hope for the best, and commit, I normally like to see code that says: while we haven't committed this, let's try to do this. It is not all that much different from the original code, but at least then, if things are on a cluster and the server fails over, or even if I get a deadlock or something like that, I can choose to take an action based on that. You just end up with a more reliable system overall if your approach is to try and massage the transactions in rather than just assume they will go in.
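Greg's "while we haven't committed" pattern can be sketched like this. The exception type and the unit of work are hypothetical stand-ins for driver-specific deadlock, failover, and dropped-connection errors; the point is the shape of the loop, not the error taxonomy:

```python
class TransientError(Exception):
    """Stand-in for a deadlock, failover, or dropped-connection error."""

def run_with_retry(work, max_attempts: int = 10):
    """Retry the whole transaction until it commits, rather than starting
    a transaction, doing the work, and hoping for the best."""
    for attempt in range(1, max_attempts + 1):
        try:
            # In real code: open the connection, BEGIN TRAN, do the work,
            # COMMIT -- all inside this try, so a failover mid-transaction
            # reruns the whole unit of work on the new primary.
            return work()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller

# A unit of work that fails twice before succeeding, to exercise the loop.
calls = []
def flaky_commit():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("connection lost during failover")
    return "committed"

print(run_with_retry(flaky_commit))  # committed (on the third attempt)
```

The key design point is that the retry boundary is the transaction, not the individual statement: rerunning half a unit of work after a failover is how you lose data.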
Conor Cunningham: I think one thing we haven't done a great job of lately is teaching customers the best practices they should use. How do they organize the communication between tiers — in some cases because you want to be able to figure out what exact transaction boundaries you are implying? In other cases, what is the policy when you have deadlocks or a failover? How do you avoid losing some of your work and make sure your application isn't broken by that? Finally, how do we make it easy to understand the code and make sure it is technically correct, because not all customers are SQL Server experts, and that's OK. We have to figure out ways to make it so that everyone can use our product. While we obviously have some people like yourself who are experts at using the product and think deeply about the end-to-end solution, we also have some customers who are still learning, and we have to figure out ways so that they don't end up impacting themselves without knowing it. This is an area I have spent a lot of time thinking about, because I think having a beautiful customer experience is really important, and there are a couple of things here in SQL Azure where we still have to improve.
Greg Low: I suppose you started to touch on throttling there, and I suppose we should tackle that one, because it is a topic that tends to come up. More than just the retries: what sorts of things tend to currently lead to potential throttling?
Conor Cunningham: Let's describe what throttling is; not everyone may have experienced it so far. Because SQL Azure is multi-tenant, we don't have the ability to stop everyone from running at the same time, and there are some cases where a particular machine will get overloaded on IO. We are able to run the load balancer to move load around, but we have cases where we might say, you know what, we have too much IO on the system, and in order to protect even the fundamental cluster health we have to prevent new IO from going in. We will start returning errors, or blocking new requests, until the existing load on the server backs down. This is an imperfect solution; I think anyone will tell you it is not what you want customers to see. The way I would describe it is that throttling will always exist in any multi-tenant system, and you will probably have seen the equivalent in the box, where eventually you get an out-of-memory condition and things like this; it just surfaces in different ways.
Long term, we need to provide different options for our customers to be able to write their code so that it works reliably and they don't have a lot of errors to deal with. We are in the process of putting together some offerings that I am excited to be able to tell you about in the future, but not today. I would say that throttling is something you have to deal with right now, and basically you have a big retry loop like you do for reconnects. As soon as you get that into your code, things can work just fine in SQL Azure, with a few exceptions here and there where things are horribly overloaded. We are working very hard on making sure we provide a lot of the internal mechanisms that give you fairness, so that throttling becomes increasingly rare. I think that is the way I would describe it.
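The "big retry loop" for throttling differs from a plain reconnect loop in one respect: retries should back off, so they don't add load to a machine that is rejecting requests precisely because it is overloaded. A hedged sketch follows; the exception type and delays are invented, and a real client would key off the specific throttling error codes the service returns:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for the server rejecting a request because of throttling."""

def call_with_backoff(request, max_attempts: int = 5, base_delay: float = 0.05):
    """Retry with exponential backoff plus jitter, so many throttled clients
    don't all retry in lockstep and re-overload the machine."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # 0.05s, 0.1s, 0.2s, ... scaled by random jitter.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# A request that is throttled twice before going through.
attempts = []
def throttled_twice():
    attempts.append(1)
    if len(attempts) < 3:
        raise ThrottledError("server busy")
    return "ok"

print(call_with_backoff(throttled_twice))  # ok, after backing off twice
```

The jitter matters as much as the backoff: without it, every client that was throttled at the same instant retries at the same instant too.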
The early version of SQL Azure didn’t have much isolation at all; we have been building more and more isolation, which is similar in many ways to what a shared system or mainframe system would provide you. Once you have that in place, there will still be cases where we might over-commit a machine, but we will be able to plan a lot better. In many cases we will be able to give customers options to say, hey, I would like to guarantee that a certain amount of resources is available so that I never get throttled.
Greg Low: Yes, actually I have noticed, even over the time I have been using it, that the number of scenarios that seem to lead to that is getting smaller and smaller. In fact I think I noticed something around October last year saying they were killing off CPU load throttling because they had introduced Resource Governor into it or something. There seem to be more proactive things being done to limit the possibility of those things happening.
Conor Cunningham: Yes, this is an area of much focus for us, and you will hopefully see continued improvement as we get new pieces of this technology out the door. You shouldn’t see much CPU throttling anymore; I think we have pretty much disabled it at this point. You do still have cases where, just like on the box, you see SOS scheduler yields, for example. In the past SQL Azure would get overexuberant and just kill things; now it works more and more like the box product, where you look at your DMVs, see where the blocking is, and tune your application at the logical level, which is probably more to our liking. Even then we can still work on improving that experience, but that is the model, I think, that will translate for people using the product today.
Greg Low: Yes, actually that is something we should then tackle, particularly for people who haven’t started using it much. What do you get when you start provisioning things? Maybe the concept of a logical server, where databases live, the kind of master database that is there, and so on.
Conor Cunningham: Right. I will preface this by saying the SQL Azure model is still evolving and it might change over the course of the next year or two. But you create a subscription and then you create a logical server. A logical server is not a real server to us, it is not a VM; it is just an organizational abstraction. It happens to be where you store some of your security credentials, in a database we call master, the logical master. We set up security credentials there, and then on top of that you can create regular databases. What you are really dealing with are databases, much like the contained database model in the box. We have a similar thing in SQL Azure; they are not exactly identical at this point, but over time they will probably converge.
Greg Low: Actually we should also mention too that you don’t choose the name of the server; that is something that is allocated for you as well.
Conor Cunningham: Right, it is not something that you can pick or cherry-pick. It is generated for you and it points to a logical cluster, so all of those things get hooked up. It is ten characters today with a common prefix. Your server name, instead of being whatever your local NetBIOS or local DNS name is, is actually a DNS name that we provide for you out of our cluster, and the database name, which you can choose, sits within that namespace. That namespace is perhaps the most interesting difference from the regular platform.
Greg Low: So even the databases on a single logical server could conceivably be on different machines.
Conor Cunningham: Yes, in fact there is no guarantee that your master is even on the same machine. You are generally spread around like peanut butter, and you get moved around by the load balancer as machines get busier. You are going to find that it is really a programming contract, not a guaranteed physical location.
Greg Low: Yes, that is great. So the next piece, which we will probably do a lot more on another day, I suppose, is the basic concept of federations, which have been introduced recently as well.
Conor Cunningham: Right, federation is a concept that is also going to evolve more, but it is effectively trying to make it easy for you to deal with the fact that, since these are commodity machines, you are eventually going to reach the limit of what one of these databases can do. The current database size limit is 150GB, and it will never be as big as the largest databases you can create on SQL Server; you need a scale-up machine and a big storage array to handle that, and those are not the kind of machines we are buying. So federation tries to make it easier for you to have a whole bunch of databases that are masquerading as one database, for a typical OLTP application.
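To make the "many databases masquerading as one" idea concrete, here is a hedged sketch of range-based routing, the scheme a federation uses to decide which member database owns a given federation-key value. The class and database names are hypothetical, not the actual SQL Azure Federations API; only the routing idea is the point.

```python
import bisect

class FederationMap:
    """Illustrative range routing: member i owns keys from split_points[i-1]
    (inclusive) up to split_points[i] (exclusive); the first member owns
    everything below the first split point, the last everything above."""

    def __init__(self, member_dbs, split_points):
        assert len(split_points) == len(member_dbs) - 1
        assert split_points == sorted(split_points)
        self.member_dbs = member_dbs
        self.split_points = split_points

    def member_for(self, key):
        # bisect_right routes a key equal to a split point to the higher
        # member, matching [low, high) range semantics.
        return self.member_dbs[bisect.bisect_right(self.split_points, key)]
```

An OLTP request carrying a federation key would route to exactly one member; a whole-federation report would iterate over `member_dbs` instead.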
Greg Low: So one of the things I wanted to ask you about: because the platform is evolving, I have noticed that even the interfaces change every few months. What is the thinking, in terms of guidance, for people who need to plan for working on top of a platform that isn’t completely static?
Conor Cunningham: Right, so we are used to a world where we ship software every three years, people later adopt the version that Microsoft ships, and everything has to be a strong contract. We will still have versioning, if you like. I think you will find we will start shipping features that help customers manage which version of the platform their application programs against. It would probably be similar to how you think about programming against the .NET Framework: you pick a particular version, write code against it, and validate against it later. We are fundamentally shipping software regularly, like every month, and sometimes we are changing some things along the way. I think what you will find is that some portions of our API are fully supported, and those will be fairly stable.
There are other portions where we are still evolving, maybe marked as preview or something like that. In that case, if you want to play with the features you can wait until they are done and fully supported, or you can use them early and potentially use them to your advantage, but you might also have to take a step back occasionally and fix some things. It is not our intent to make it difficult to use the platform. We are just trying to get to a point where we are interacting with our customers on a much quicker cadence. Sometimes we ship features in the box that do 80% of what is needed, but because of our engineering model we weren’t able to see that the last 20% was where the value was. Then it is just too late, because it has had to bake for years and we just can’t fix it. In this model it is very interesting; we are effectively re-writing the engineering book as we go, saying hey, let’s go and work with some customers up front, get it to a point where we are quite happy with it, and then ship it for general availability. You will actually see us do work internally where we partner with just one customer: they build stuff with it for a little while, get it working, and we say ah, we need to change our feature, and then we build the final version of it.
Greg Low: Yes, so maybe we should summarize where the feature set is at in the current version of the product, compared to the boxed product.
Conor Cunningham: Sure. I think you will find that things are not exactly the same, and they are never really going to be, essentially because you are dealing with database concepts and not whole-server concepts. You don’t have the ability to do cross-database queries today. There are a whole lot of things that aren’t there: there is no Service Broker and no distributed queries.
Greg Low: Actually I did get a question from one of the guys in New Zealand specifically about that one: whether that is something that is likely to appear. I mean, you can never talk specifics, but is that on the radar or is it a long, long way away?
Conor Cunningham: Let’s back up; there is a question of whether it will ever be exactly the same as the box, and I think the answer is no. There will be some things we do differently than the box, independent of which feature we are talking about. In Windows Azure today there is already a feature called Service Bus, which effectively does the same thing as Broker queues. I think there is a solution for customers who want to do transactional queuing right now, and it works fine. I talk with that team regularly and they are working really, really hard on it. The way I would phrase it is: first look at the Broker queue stuff, then look at Service Bus. See if there is an actual delta, other than programming surface; if you can’t build the solution you want and there is something missing there, then we would like to talk about it. That is probably the way the Broker question should be answered.
More broadly, there is a set of features we will either have to build replacements for, or for which we will have to figure out other paradigms you can use to solve the same problem, and we will be servicing those over time. I don’t think there is a strict time frame for when a given feature will be out. We prioritize based on what is necessary for us to go to market and be successful, and obviously we want to help our customers with the things that are missing from the platform.
When I first looked at this, as a box guy it was very easy to say, oh, look at all the things that aren’t there. How can the platform be used for anything? I spent a month kind of being angry about that, then I looked at it some more and went, wait a minute: you actually can build this kind of app, and this kind of app, and this kind of app. As soon as you put that together with "it will cost you less money" and "I don’t have to wait for my IT shop to eventually put in a machine, or tell me they are not going to do it", you start thinking about it and you go, this is really just a paradigm difference.
You have to think about it a little differently than just a blow-by-blow feature comparison. There are things the regular boxed product does; it has taken years and years to develop this huge ecosystem of capabilities. We are not really trying to make it so you can just port your applications with a lift and shift. We are trying to make it so you can build your solution and leverage the platform, and that is what you really need to think about. When you get into transactional queuing, to take that example, scale-out platforms actually want you to separate the different functions of your application, and those functions could potentially be on different databases anyway, because you can leverage the fact that a queue can be on a different machine, using isolated computing resources separate from other parts of your data tier. That is really where you start to see the light bulbs go on and people go: I see, I had just assumed that I would put a queue in my database. The real question should be: what is the data flow you want your application to have, and what do the resources need to be for each part of that data flow? Once you start writing all that out, the answer on a scale-out platform can often be quite different from a scale-up platform.
Greg Low: Actually that is an interesting point you raise too. We were mentioning federations a minute ago, and the idea that a table might be spread across a whole lot of underlying databases. It is interesting: when the question comes up of "why can’t I do a fan-out query", I often think that is the wrong question. Part of the reason for pushing the data out across all those databases is that I could be running a query from the app, for example, against each of them in parallel at the same time. I don’t know that I really do want to be sending one query down and having it fan out anyway.
Conor Cunningham: Well, Greg, you remember I am a QP guy and I always want to do fan-out queries. The way I would look at it is that there are some operations that are OLTP commands: in order to get the scale-out to work, you want to touch exactly one node, and you don’t want to go through some routing node. In other cases you might want to do a report across your whole server, say hey, how many customers do I have, and that report might only run once a week. That doesn’t mean you have to touch every node in parallel, because there are a lot of different algorithms you can use to get that answer. Figuring out ways to make it easier to use the scale-out platform for each of those different kinds of use cases is something that hopefully we will be showing you more of in the future.
Greg Low: Yes, listen, another thing I wanted to touch on: I get the question from people all the time about what their options are for backups and restores. They accept that you are doing a great job looking after the data, but they also want to be able to roll things back, or there are just things they want to do, so they want more control in that area.
Conor Cunningham: Today SQL Azure automatically backs up your databases for you, and it provides a high availability guarantee: if something is committed, then even if we lose the machine right away, you keep your data. So that part is there, and we are constantly providing that guarantee from a backup and restore standpoint. But let’s say you want to do other operations, where you want to save a copy of your database just in case the changes you are about to make go bad. Say you are upgrading and changing your database and you mess it up halfway through.
How do you get back to your pre-upgrade state? We have this import/export service, which I think of as our first offering in this space rather than our final one, and it gives you some of the technology you need to take a snapshot at a particular point in time and restore it back later. There is also a technology to create a copy of your database: you can do CREATE DATABASE AS COPY OF, and in a few minutes you will have a copy of your database on the same cluster.
Greg Low: Actually, a subtle difference there too. My understanding, correct me if I am wrong: when you do the export, that is not actually transactionally consistent, but if you do a database copy you do get a transactionally consistent copy. They were saying that if you want a transactionally consistent export, you make a copy and then export the copy instead.
Conor Cunningham: Greg, yes, you are right. As the technology currently stands there is no consistency guarantee directly on the export if you have other operations happening on the database. You want to think about how consistent it needs to be, or do the operation you describe: make a copy so that you can export from it transactionally.
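The copy-then-export recipe Greg describes can be sketched as a small orchestration. CREATE DATABASE ... AS COPY OF is the real T-SQL statement; the `run_sql` and `export_bacpac` callables are stand-ins for whatever client library and import/export service call you actually use.

```python
def export_consistent_snapshot(run_sql, export_bacpac, db, snapshot_name):
    """Make a transactionally consistent copy, export it as a bacpac,
    then drop the copy. `run_sql(stmt)` and `export_bacpac(db)` are
    placeholders for your client library and the import/export service."""
    # The database copy is transactionally consistent as of the moment
    # the copy operation completes.
    run_sql(f"CREATE DATABASE [{snapshot_name}] AS COPY OF [{db}]")
    # A real implementation would poll sys.dm_database_copies and wait
    # for the copy to finish before exporting; omitted here for brevity.
    export_bacpac(snapshot_name)
    # Drop the copy afterwards, since you pay for its storage.
    run_sql(f"DROP DATABASE [{snapshot_name}]")
```

The point of the sequence is that the export runs against a quiesced copy, so concurrent writes to the original database cannot tear the snapshot.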
These technologies are not a full set of solutions. We don’t have the ability to say "I want to restore from last week’s copy" via the backups we keep in the backend, for example, and we don’t have the ability to give you a fully transactional export that works great all the time. We are still in the process of improving those technologies. I think you will find we are also interested in providing that additional level of control for customers, so that they understand what they can do and what their options are.
One point I will make: at the end of the day you have to pay for the storage and the compute resources you are using. Storing an indefinite number of backups is not something we can do, but we will hopefully provide options so that customers who need longer backup retention policies, or who have more specialized requirements around backups and restores, will have the ability to meet them at a reasonable cost.
We are looking at what the model is for how we expose that, so that customers can really get what they need.
Greg Low: I think it is probably worth noting too that when I copy a database I create another database, but when I do an export what I am actually doing is sending things to Azure storage instead.
Conor Cunningham: Right you are creating essentially a bacpac.
Greg Low: A blob, yes.
Conor Cunningham: And putting it out into Windows Azure blob storage. This is a different model; it is effectively a file system, but you are paying for the storage. That is no different from any other hosting model, but it is a change for those who are used to the out-of-the-box product. Figuring out how to manage the storage — how much you store, how big it is, how you make sure every dollar is spent wisely — is going to require some thought to get the model right. We are exposing the cost effectively: this is the cost of our overhead to run the service, and then you have to go and figure out what your business need is. The interesting aspect is that every time I talk to a new business I come up with a different answer, based on their revenue model, their overhead, and their biggest concerns. If you are a financial firm, backups are very, very key. If you are just trading football cards or baseball cards or whatever, maybe it is not quite as important if it is down all day. Once you think about that picture, the answer comes out very differently as a result.
Greg Low: Yes, and I must admit that in recent times they seem to have dropped the price of storage significantly. If anything, the reaction I get from people is that they are just surprised at how low it actually is. People seem quite happy with the storage costs.
Conor Cunningham: I think you are going to find continued competition on price, to make sure we provide a great option for customers. This is ultimately an offering that is intended to save customers money. Obviously there are transition costs, and there is the work of figuring out how to use the platform, but you will have a lot of options that are cheaper, because we have built a lot of software to automate many of the things that would otherwise require a human to go and do the work. If we give you options to guarantee that the database is readily available, so that you can restore a copy, and we do so without requiring people to do anything except replace hard drives when they die, that is going to be a lot cheaper than what most people who are not in the IT business can do for themselves.
Greg Low: Yes, that is another good point. Even earlier, when you were talking about the fact that you keep three copies of the data — it is a very rare site I walk into where anybody does that now. The availability is excellent, but I think the big story for businesses is the agility. When I look at the whole picture, that is one of the things I find such a compelling story. I go into sites where, when they talk about provisioning a server, they are normally talking about weeks or months; here the idea is that I can just spin up another server and a few minutes later there it is, ready to work against. Add the fact that it comes out of recurrent expenditure instead of capital expenditure, and there is just a lot of compelling business argument.
Conor Cunningham: Yes, this is definitely one of the demos we like to show people. We walk up, we run CREATE DATABASE, and less than a few minutes later you are up and running with a database. You didn’t have to do anything except put a credit card number in. That is very, very different for anyone who has had to provision a server before; it can take weeks or months to get approval for the machine and the space, and to get everyone to say yes, sometimes just to do something very basic. Now all we have to say is: look, it will cost you this, and you can have it. If you want a new database, here it is. I think customers today don’t realize how hard and how expensive it is for them to go through that provisioning process.
Greg Low: Yes, I think people should not underestimate how much of an impact that is going to have; that agility is the key thing. Another thing we should mention, I suppose, is connectivity. We do need to make SSL-encrypted connections, and at the moment SQL Server authentication is the only authentication option.
Conor Cunningham: Obviously we require SSL because there are passwords flowing over the internet, and that is for customer safety. It goes back to the point that we have looked through the feature end to end and tried to decide what policy makes more sense for a hosted environment like this. What was the other question you had?
Greg Low: At the moment it is very much SQL authentication, and I suppose the other question I keep getting is: what about if we want to hook up some other authentication provider, or Windows authentication?
Conor Cunningham: I think you will find more will be coming; it is just a question of when we can make it available to you, not a lack of desire on our side. We obviously know this is something customers want, and we want it too; we want customers to be able to use those other options as well. Building full solutions that make sense in a cloud architecture takes a little time to get right, and we are in the process of doing that. Don’t stop looking; you will probably see something in the not too distant future.
Greg Low: There are two others I want to mention. The first is Data Sync, which is another service that is available. I haven’t looked to see if it is still in preview — I think it is — but it is sort of a replication-like technology for moving data around. I haven’t used it that much myself, but it does give you the ability, if I want to tie together my on-premises system and my SQL Azure database, to try and keep them in line.
Yes, it basically seems to be based around change tracking, and there are very straightforward options if you are just syncing Azure database to Azure database; that is very, very easy. They also have an agent service that you install on Windows if you want to connect an on-premises machine into the same setup. A place I was at yesterday was looking at 70 or 80 stores with centralized data, and Data Sync looks like a very strong candidate for that, because at the moment the head office has to deal with connectivity to and from all the different stores. There is something nice about having a hub sitting in the cloud that each of the stores connects to independently and syncs with. I think that is going to end up being a very good story.
Conor Cunningham: Yes, I think customers that have these hybrid solutions will want their on-premises systems to be able to talk to the cloud, and this is going to be a great way for them to do that. Those application scenarios show up all the time, and giving customers options to effectively build these solutions matters, so if you have customers that have problems with it or have feedback, obviously send them my way. Overall I haven’t heard much angst about it; the question is just when it is going to be out, and I don’t have an update on that.
Greg Low: I notice the agents and things have been out, as you say, in a publicly available preview. I think the agent is up to about its seventh revision or something; it is getting fairly mature at this point.
Conor Cunningham: Yes, I wouldn’t characterize it as something that hasn’t been invested in. It is just a question of when they are going to officially release it. I haven’t talked to the person in charge of that team in a little while to see what the current state is, but I would expect it is not too far away.
Greg Low: Yes, I think that one is going to be a pretty good story. The other one rounding out the picture, which was released recently, is Windows Azure SQL Reporting.
Conor Cunningham: Yes, the Reporting Services stuff is new; we put out a version of it not too long ago. I think you are going to find that the whole ecosystem you are used to in the SQL Server boxed product is going to have equivalents in most cases on the platform. It will work a little differently in the sense that you have to think about provisioning, just as you do for the box, and about the costs associated with running each piece. Running Reporting Services near your data is going to make a lot of sense, and you will want great options there. We want to have a whole suite of solutions; there are other parts of the system that are not there right now, and you will probably see those come in the future as well.
Greg Low: I suppose I should mention that at the moment it is very much for reporting just off Azure data, and again there are a number of things the on-premises version does that it doesn’t. What it does lend itself to very nicely is embedding reporting in web apps that are sitting in Azure, based on Azure data. In fact the only downside I have been hearing about is the pricing: where people are really surprised at how low the cost of storage is, I am getting the reverse reaction on Reporting. That is some feedback I must send back to the team. I keep hearing that, the way it is structured at the moment, there is a fee that applies whether you use it or not, and it just seems out of proportion to the pricing of the other products. It is going to be interesting; anyway, I will be feeding that back to them to see how it goes.
I think the idea, again, of having the agility to spin up a Reporting server and have it sitting over the top of your data in a very short period of time is really quite impressive.
Conor Cunningham: I will talk to the team to see what their current thoughts are, but I think you will find that pricing is something we revisit. Even with the regular SQL Azure database offering we have modified the pricing once or twice, because we learn about the market and figure out how to deliver the right value proposition to customers. If you find cases where it doesn’t quite line up with the value to your business, then give that feedback to the company and we will keep taking it into account. This is one of those cases where, in the past, we would set the pricing for our boxed product and that would be it; maybe three years later we might revise it.
Here you will find that we continue to look at the market pressures and at what sorts of things we have to do in order to grow our platform. It is a brand new game and we are having a lot of fun with it, but that doesn’t mean the very first iteration will be perfect. That is the one thing I will ask customers to be patient about: if you find that it doesn’t work for you, show us what you are thinking, and that is obviously great input for us to be able to revise future pricing.
Greg Low: Listen, the final thing I want to ask you about in terms of tooling: I also get asked about the equivalent of Profiler or Extended Events, in terms of tracing and things like that. How do I do something equivalent to that when I am working against these databases, and is there any guidance around anything like that at present?
Conor Cunningham: There is no Extended Events or SQL Profiler story, which is unfortunate. It is a little difficult to get a full solution that works 100% the same way as the box, but there is a ton of value in being able to trace. I would say that at some point in the future you will find there is something that works for you. It may not be exactly perfect, but it will cover many of the same use cases you have now. This is an area where we are acutely aware of the need. There are cases where people use tracing today and we may deliver other technologies instead, because we feel we can solve the underlying problem more directly than by having you trace it. But the ability to trace is very valuable, and the way I would describe it is that there is a need to trace both the back-end database, which is traditional from the box, and also from the client perspective, in SQL client.
Greg Low: Yes.
Conor Cunningham: And this will evolve as well. Figuring out how to effectively trace managed systems is key, especially when you start running at the scales I see at the large customers I work with. If you have one database and you are trying to debug and trace it, you look at it manually and life is good. If you are trying to run a service with a thousand databases, or ten thousand, or a hundred thousand, then all of a sudden no single person can look at all the traces; it would be a little silly. You have to think differently about what it means to be successful in this world, because you really can’t look at every single issue. You have to think about it statistically: worry about the scope of your biggest problem this morning, then work on the next biggest problem, and sometimes problems will just recur until they become big enough to worry about. There is a separation in this world between getting the system working again and understanding the root cause of every issue. Sometimes it is a lot cheaper to solve the former rather than the latter, and separating those two is very difficult for engineers like me, but once you start thinking about those two points distinctly you can make progress.
Sometimes the problem is solved by just failing over, and that is the fastest way to get the customer working again. We can then work on how to catch and solve the root cause statistically, instead of working individually with the debugger on that particular machine right now.
Greg Low: Now that is great. Listen, that has been an excellent summary of where things are at; thanks Conor. For people’s interest, is there a life outside SQL? Is there anything else you are passionate about, apart from the product and family?
Conor Cunningham: I have been spending a lot of time on SQL Server and SQL Azure in the past 12 months, but I will tell you that I am a big sports fan. I like ice hockey, and I like American football, both college and professional. I am actually a big fan of Australian Rules football, although it is not on television much here.
Greg Low: It is an amazing game.
Conor Cunningham: I spent many years completely confused, because they never explain to Americans what exactly the rules are.
Greg Low: You mean there are rules?
Conor Cunningham: It took some time for me to discern any sense of rules. Maybe there are unofficial or gentlemen’s rules. I do like sports, and I am also a big soccer fan; there are a bunch of sports I follow here. I am trying to teach my 6-year-old daughter American football, so each year I take her to one or two games and see if I can teach her the difference between offense and defense. So far she likes the cotton candy you get at football games here.
Greg Low: Excellent. Actually, it is funny, and a bit weird as an Australian, but I grew up playing baseball and that has been a passion of mine for a long time. I finally got to take my wife to a baseball game in the US a year or two back. It was kind of interesting; it was just amazing to me that she had never been to a baseball game.
Conor Cunningham: No, it is great to go and see those things. I like the competitive aspect of it, and I think it is wonderful when your team does well. It is sad when your team doesn't do well, but you have to root for them either way. That is one of the things I like to do to unwind. I am also into gardening here at the house. My wife and I are really big into travelling; we like to go and visit other countries and see what is different about other places.
Greg Low: What is your favorite place you have visited? Without putting you on the spot.
Conor Cunningham: If I have to pick one place that we all like, we are all big fans of Paris. My daughter is in a French preschool, so she can speak French now. We like to take her back there to practice her French; we don't let her eat unless she orders from the waiter in French. I would say that is probably the one place we really enjoy. My family is originally from Ireland, so obviously I like Ireland as well.
Greg Low: I think the south west corner of Ireland is amongst my most favourite parts of the world.
Conor Cunningham: It is beautiful there; I have family in Cork, in the south of Ireland, and some friends and relatives in the north. My parents are both technically from the north, and now some of my extended family lives in the south. It is a place where we just love to go and visit people and see things, and it is something I have a passion for. I plan our trips; in fact, I was just planning our trip for next year. We think about it ahead of time, and it is something we really look forward to.
Greg Low: That is great. Listen, so is there anywhere people will see you coming up? Any events or things?
Conor Cunningham: Right, this is a good question. I believe my next official speaking engagement will be at SQL Rally in Denmark in October. I need to put this on my blog, actually, because I just booked the trip in the last few days to make sure that I get there on time. I have obviously been spending a lot of time working on cloud service architecture, but I will be talking about the Query Optimizer at that conference. I also visit a lot of customers one on one; a lot of the conversation still happens in that space because the things we are building are so new that we are under NDA. I don't believe I will be speaking at PASS this year, but I will probably do talks for MVPs or internal groups or things like that, so I think there will be some things in that category as well. You will typically see me at two or three conferences a year, and I will go out and find a place where I can visit and meet some customers I haven't seen before.
I have been going to SQL Bits for a number of years now.
Greg Low: Yes it is an excellent conference.
Conor Cunningham: I gave one of the keynotes. I really love visiting there; the people are great, it is just a wonderful setup, and I get lots of great questions and feedback from customers that help us drive the product forward. I use it as an opportunity to feed the pulse of our customers back into the product team. I knew that was important, but I didn't really appreciate exactly how important until we started trying to build this new platform, where we as an engineering team fundamentally had to get a lot better at listening to our customers, especially now that we work on monthly cycles. That skill set I developed over the past several years working with customers on the boxed product is one of the reasons they asked me to spend more time helping develop the SQL Azure product now.
Greg Low: Yes, that is great. Well, listen, thank you very much for your time today, Conor.
Conor Cunningham: Hey, it has been great. I appreciate the opportunity to talk to you and your listeners, and I hope to see you again soon. Hopefully I will get down to Australia.
Greg Low: Could do, that’s great ta!