Greg Low: Introducing Show 34 with guest Geoff Hiten.
Our guest today is Geoff Hiten. Geoff is a Senior Database Consultant with IntelliNet. He has over 15 years' experience working with SQL Server, dating back to Version 4.2. He's worked in many roles, from programming to systems administration, database administration, and consulting. Welcome, Geoff.
Geoff Hiten: Thank you, Greg.
Greg Low: As I ask everyone, how did you come to be involved with SQL Server?
Geoff Hiten: This is an odd one. I was working on process automation software at a steel mill in Alabama. For those not familiar with U.S. geography, that's in the southeast U.S.: Georgia is on the coast, then Florida with its panhandle, and to the immediate left, Alabama. We were trying to come up with a data collection system for process data to determine the quality of our finished product. We were on a shoestring budget, making things up as we went, trying to find a good way to report, organize, and present the data. A friend and I saw this new SQL database thing. We had been building in dBase. This looked cool and different. We built our system around it, and it's grown from there. One job led to another, and another.
Greg Low: Outstanding. Amongst the SQL MVP community, when people talk clustering, Geoff is who they ask. That's why we had to have him on the show today. Talk with me about SQL clustering. Where do you see the technology fitting, and can you give us a good description of it? There are a lot of misconceptions.
Geoff Hiten: Exactly. I'd like to start with what clustering isn't. Clustering, especially in the database world, is an overused term. If you look at different vendors, it means something different to each of them. What clustering is not is a scale-out technology. It doesn't make your system bigger or faster, or let it handle bigger loads. It's a failover technology: a way to cover that magic four hours, the four hours from when the system crashes to when you get a person in front of it who can do something about it. Most support contracts specify a four-hour response; that's where I get that number, that and practical experience. If you're a one-man shop and it crashes at the worst moment, it might be four hours before you find out about it, wake up, get dressed, go in, and take care of it. Clustering is designed to cover that magic four hours for you. That's its number one benefit.
Greg Low: It's good you mention that it's not scale-out. I can't remember the number of discussions I've had where people were thinking of implementing a multi-node cluster to scale out for performance. I always say, oh no.
Geoff Hiten: A lot of that comes from an unfortunate choice of terminology from when clustering began. It first came into the Microsoft realm around the time of SQL Server 7, with the Wolfpack clustering add-on for NT 4.0. SQL 7 came out about the same time. Richard Waymire told me there was pressure from management at Microsoft to take advantage of Wolfpack in SQL Server 7, but it was a bit late in the development cycle to bolt it all on. Clustering involves really low-level APIs, and you have to test the product rigorously to ensure it works. So they bolted on a solution where you had a primary server, the active server, which you could cluster with a passive server, the failover. That's different from what we have now. You'd end up with an active path, and people would want to use the other node, the passive node, too. So they came up with the idea of making an active instance on one node and a passive one on the other, and in reverse, and calling it active/active. That was great for SQL Server 7. In 2000 it changed: SQL Server 2000 supported multiple instances, on a standalone box and on a cluster as well. You could stack as many instances as you wanted on a cluster. People would think active/active cluster: two nodes, two instances. Then it got crazy with four nodes, six nodes. Windows 2003 expanded it further. It got to four nodes with SQL Server 2000, and now with 2005 you're only limited by how many nodes the OS supports in a cluster. When you start looking at four nodes and three instances, is that active/active/active/passive, or is it active/passive/passive/active? The terminology breaks down and doesn't mean what it says. People still hear active/active and think two active nodes talking to the same database through the same interface. No! It doesn't work that way, and it's amazing how many times I have to explain to clients that that isn't how it works.
Greg Low: It clearly is a common misconception. I get that question regularly when people are trying to improve performance. They think you can put a cluster in and scale out, and that isn't the case.
Geoff Hiten: Yeah, it's not going to get you what you think it is. What's also unfortunate is that some people who do get what a cluster is think it's a high availability solution, when really it's a means of implementing a high availability solution. They don't understand that it's not magic high availability all by itself; a lot of other steps have to go into a truly highly available solution. Things like solid network infrastructure, good power, and an application that can handle the peculiarities of a cluster failover. All of that is necessary to take advantage of what clustering can provide. In some cases, where you don't have the quality of infrastructure, applications, or whatever else you need, clustering can actually lower your availability. Especially if you try to do it with uncertified or leftover hardware, you can make the situation worse, not better.
Greg Low: One of the challenges often described with clustering is the hardware compatibility list and the fact that it takes ages to get onto that list. It's tricky for people buying hardware off the shelf to do clustering effectively. In general, most of the successful implementations tend to be ones where the vendor provides a whole solution.
Geoff Hiten: Very much so, although you don't have to have a complete end-to-end vendor solution. I've found that in a lot of cases the real key is the storage vendor. That's where most of the magic happens in clustering. The host computers, as long as they're basic Windows-compatible hardware and don't have any particular oddities, are usually OK. The one mistake you can make is to buy one now and the next one in six to eight months and add it to the cluster, because you're going to get hardware revisions, BIOS revisions, and other differences between the computers, and clustering wants every node as identical as possible. So if you're going to cluster, buy the machines all at once.
Greg Low: Do you tend to buy extra?
Geoff Hiten: Generally no, because you're going to put these machines on some type of service contract, and the vendors are normally obligated to provide support. If you modify them heavily, you need to make sure everything stays within cluster specifications. They don't normally like it, but they can't argue with you as long as you're staying within the cluster specifications. The storage vendor is really the key to this. They're the ones that have to make sure the shared-access storage behaves. I don't like to call it shared because it implies the whole active/active thing; it's a shared connection, but only one node has true access at any moment. To be a multi-connected device, it has to be consistent when going from node to node to node. If the storage vendor has it right, everything else generally falls into place; if they don't, nothing else is going to make a difference.
Greg Low: So, as a general summary of the overall hardware, what we're talking about with clustering is multiple physical servers, but basically shared…
Geoff Hiten: They're called nodes in a cluster.
Greg Low: Sorry, not familiar with the terminology. Then we have a shared disc subsystem, a single disc system that may have redundancy in it, and hopefully it does; so what we are clustering is the pair of servers, not the disc subsystem.
Geoff Hiten: Correct. In a good, well-designed subsystem, everything but the sheet metal is redundant. It could be anything from a mid-range or enterprise-grade SAN down to a basic SCSI controller with two connectors on it, one for each host node. Obviously the more you put into it, the better off you'll be. Some people who start off with the basic, dumb SCSI boxes are going to have performance issues, because you have to turn off your controller cache; the cache can't fail over, so you have to do without it. A lot of people try clustering on a very inexpensive budget, performance goes down, and they wonder what happened. Well, you're running a controller without any kind of cache, or a cache without any kind of battery backup that has to be turned off, so every IO goes flat to the disc, and yeah, that's going to hurt your performance.
Greg Low: Yeah.
Geoff Hiten: Move up a little bit and you can get to where the cache and controller are embedded in the array, and all you have is a basic dumb connector-type controller in the host computer. That means you get your performance and cache back. A lot of times you get redundant controllers in there, so if a controller fails you keep going. If you move up to a mid-range SAN, you usually want to dedicate it to SQL, especially when you're talking about quad-proc boxes or greater. Then you have your full enterprise-grade SANs, which are a whole different beast; there you usually have experts on staff from the vendor who will be happy to help, given the number of zeros in the contract value.
Greg Low: One of the questions that comes up endlessly when talking to SAN vendors is the configuration of physical discs versus logical discs, and often many of the SAN vendors start the discussion with "all you need to worry about is the logical; the physical doesn't matter to you."
Geoff Hiten: Yeah, the magic SAN dust that takes X physical discs and magically makes them deliver more IOs; some vendors do claim this. Most of the time it's because they're optimizing for the wrong parameter. Most SAN vendors are selling storage, that is, megabytes, gigabytes, or whatever. They show their customers that the return on investment for buying a SAN comes from saving disc management costs and disc costs, and the way they do that is by showing the most effective space utilization of the storage. That means RAID-5 and very large RAID stripes. What a SQL Server needs in most cases is not all that much space. I can go down to my local Fry's and buy a half-terabyte drive, one disc. Of course, if I try to do a lot of my database work on it, it's going to take days or weeks instead of hours or minutes to run. The reason is IO cycles. A typical disc will generally give you about 200 reads and about 50 writes per second. That number hasn't grown dramatically over the past few years, not like storage and processing capacity. The actual physical number of reads a disc can do hasn't changed much. So when you add all the overhead of a RAID-5 stripe, where you have to read all the blocks to calculate the parity block, you can all of a sudden saturate your IO with a few writes, and there's no room left for reads.
Greg Low: Let me interrupt and let’s quickly explain the whole concept. When you’re writing to RAID-5, it isn’t just a simple process.
Geoff Hiten: There's a simple formula based on the number of discs. Let's do a quick RAID-5 overview. RAID-5 is where you take multiple discs and stripe them together, except that one block out of each stripe, one disc's worth, becomes parity. That parity block is designed so that if you drop a disc out of the set, it can calculate what is missing. The parity is rotated around, so on one stripe it's on disk one, on the next it's on disk two, and on the next disk three, which balances the load. It was originally a redundant array of inexpensive discs; now it's a redundant array of independent discs. The idea is that you can take simple bricks of storage and make much larger storage units out of them. It works when what you are measuring is space. If that's what you want from RAID-5, it's great, but remember what SQL wants is IO: you're pushing things out to the disc and pulling things back. Most SANs I find are actually mistuned for SQL. They run RAID-5 with a large cache, but they are designed to pull in small files. With SQL we have very large files, but we want specific blocks out of them, so the tuning is all wrong. With RAID-5, reads are pretty straightforward: you're just pulling data off the disc. But when you write, RAID-5 performance is actually inversely proportional to the number of discs in the stripe. The more discs you have in the stripe, the worse your write performance. Say you write one block, which is usually how SQL does it: it wants to write one page of data here, one over there, and one somewhere else. To write one block in a stripe, it has to read all of the other discs, except the parity block, because that one is being calculated, and of course the block being written, which it doesn't have to read. Say I have eight discs in my stripe: I read six blocks and write two blocks. If it was a single disc, I would be writing one block. In this case I am doing six reads and two writes.
And double the writes. I am taking the IO capacity of the discs and trading it for storage. For most applications that's not a bad thing. For SQL it's the worst thing you can do. I had a client this week that I had to explain that to. They're still drinking the Kool-Aid and believing the magic SAN fixes everything. We have demonstrated that we have flattened the cache with SQL. This is a big box running SQL 2005, it's blasting right through their two-million-dollar toy, and they just don't believe it.
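The arithmetic Geoff walks through, six reads and two writes on an eight-disc stripe for a single logical block write, can be sketched as a little model. This is a rough illustration of the "reconstruct write" path he describes only; the function names are invented for the example:

```python
# Back-of-envelope model of the RAID-5 write penalty described above.
# "Reconstruct write": to write one block in an n-disc stripe, read the
# other data blocks, then write the new data block plus the freshly
# calculated parity block.

def raid5_write_cost(discs_in_stripe: int) -> tuple[int, int]:
    """Physical (reads, writes) on the array for one logical block write."""
    if discs_in_stripe < 3:
        raise ValueError("RAID-5 needs at least 3 discs")
    reads = discs_in_stripe - 2   # everything except the written block and parity
    writes = 2                    # new data block + new parity block
    return reads, writes

if __name__ == "__main__":
    # The example from the discussion: 8 discs -> 6 reads and 2 writes,
    # versus a single write on a plain disc.
    print(raid5_write_cost(8))    # (6, 2)
    # The wider the stripe, the more physical IO each logical write costs
    for n in (4, 8, 16):
        r, w = raid5_write_cost(n)
        print(n, "discs:", r + w, "physical IOs per block written")
```

The model makes the trade-off visible: the stripe spends almost all of its spindle IO servicing a single logical write, which is exactly the "trading IO capacity for storage" point above.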
Greg Low: It's good you mentioned the cache. Another thing that comes out of their mouths, when you say yes, it does still matter that you get the IOs right, is "but the cache is so large, I hope it won't be an issue."
Geoff Hiten: Well, take this case: they've got 8 gig of cache per controller. That's not bad, but I've got 32 gig of RAM in the SQL Server, most of which is going to be data cache, and I have a 150 gig data file. I've got news for you: I am going to flatten your cache. Even in the best case, once I have saturated the cache, it can actually hurt. It acts like a big bucket holding up my IO from getting to the disc.
Greg Low: Yeah.
Geoff Hiten: Now, when I am writing, that is actually not so bad, because as soon as I put something into a hardened SAN cache it says, yes, I have completed the write operation, and it turns it loose. But if I need to read something from the SAN into SQL, I issue a read and I have to wait for it to go get the block off the disc and give it back to me. The more cache there is buffering those read requests, the higher my latency is going to be. In the case of this particular client, we were seeing latencies so high that it was affecting the cluster's stability: the IsAlive check against the discs was failing. Actually, this kind of brings us back around to what a cluster is and how it works.
Greg Low: Before we leave SANs, there are a couple of other things. One is that in hosted environments I often come across shared SANs, and that one scares me, because I regularly end up in scenarios where there are periodic, inexplicable performance problems.
Geoff Hiten: Again, that's what I was brought in for with this client. One of their systems was mysteriously crashing in the middle of the night. It turned out that a different system was saturating the RAID stripe, and this system was unable to get anything to or from the disc.
Greg Low: Yeah.
Geoff Hiten: Because the first system was not clustered, it was not quite as sensitive to it; it simply complained about IO latency. But the cluster had a check that said, wait a minute, that disc is dead from this machine, I need to move to another one. So it was actually forcing a separate cluster to fail over. They keep telling me that they carve these little stripes off these very large RAID sets and that's going to fix all my problems, and I am looking at it going, no, that is causing the problem. Then they offer to sell me a whole SAN, at internal dollars of course, for my SQL application, and I am like, ah, nothing personal, but if I am buying one for SQL, it is not going to be that one.
Greg Low: Yeah, now listen, do you have a…
Geoff Hiten: (Laughs) We are not going to get into brands.
Greg Low: The other question I've still got in there, that comes up a lot, is block sizes and alignment.
Geoff Hiten: Okay, the first one is really simple. It actually took me a while to chase down the folks at Microsoft to get a recommendation on block size. It's pretty simple when you finally hear their answer and go, well yeah, duh: go with what the manufacturer recommends as the default, because that is what they tune their box to run best at. The best experiments they've been able to do in the Microsoft labs, from what I gather, found just a percentage point or two of difference from playing around with stripe size. It is really not worth the effort. Alignment, however, is a huge issue. Not having the partition properly aligned can take away up to 40 percent of your IO capacity. For SQL Server that can be critical. That's the difference between making your overnight run window and not making it.
Greg Low: And so actually maybe if you explain alignment?
Geoff Hiten: Yes. When Windows formats a disc using the NTFS file system, it creates what is called a Master Boot Record at the beginning. Even if it is not a bootable disc per se, it is going to create this MBR. This is where it places a signature and some other information about the disc. The interesting thing is that Microsoft chose a very odd size for that MBR: it shows as 31.5 K for Windows 2003. Do not ask me where that comes from; they took whatever the old one was, padded it up, and that was enough. They didn't think about alignment. Now, when you are mapping straight to blocks on physical discs, it doesn't matter. But underneath everything, under the SAN magic, which most of the time works pretty well for files, the SAN uses very large disc stripes, so that it does pretty good-sized read and write blocks into and out of the disc, saving head movement and things like that. It does these large reads and writes against those stripes and abstracts them back into the logical blocks that a regular physical disc would present. So it takes a big block and slices it up a little bit. What we would like to do is map our allocation blocks, our read and write blocks, onto those same underlying large stripes. Most of the time those are 64 or 128 K in size, so you want your partition offset to be an integer multiple of that. SQL writes in 64 K extents, so that's a nice round number to align everything to. The problem is you can't do that simply by right-clicking on the blank drive in disc manager and partitioning it. You actually have to use a command line tool; it's included in a service pack for Server 2003, but you have to load it separately from the Windows 2000 Resource Kit if you want to do it on any previous version. The tricky part is that if you don't do this, when you issue an IO that crosses that boundary, that single IO at the OS level breaks into two IOs at the SAN level, so it has to read two 64 or 128 K stripes.
Again, depending on your IO pattern, it can cost you 40 percent of the throughput.
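The alignment effect described above can be sketched numerically. This is an illustrative model only: it assumes a 64 K backend stripe and the classic 63-sector partition offset, and the function name is invented:

```python
# Sketch of why a misaligned partition splits IOs, per the discussion above.
# Assumes a 64 KB SAN stripe and a 63-sector (31.5 KB) partition offset.

SECTOR = 512
STRIPE = 64 * 1024          # assumed backend stripe size
MBR_OFFSET = 63 * SECTOR    # 31.5 KB: where the first partition used to start

def backend_ios(file_offset: int, io_size: int, partition_offset: int) -> int:
    """Number of stripe units one host IO touches on the array."""
    start = partition_offset + file_offset
    end = start + io_size - 1
    return (end // STRIPE) - (start // STRIPE) + 1

if __name__ == "__main__":
    # Misaligned: every 64 KB extent read straddles two stripes -> 2 backend IOs
    print(backend_ios(0, STRIPE, MBR_OFFSET))   # 2
    # Aligned (offset forced to a 64 KB multiple via the command line tool): 1
    print(backend_ios(0, STRIPE, STRIPE))       # 1
```

With the misaligned offset, every stripe-sized host IO becomes two backend IOs, which is where a large chunk of the throughput goes.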
Greg Low: Actually, another one while we are on disc technology: alternative technologies like solid state drives.
Geoff Hiten: Yes. Everything maps back onto a physical disc model, with blocks and disc sectors, because all of our disc management assumes that. It assumes there is the same logical layout on the disc, that this is how it is broken up, even when that is not actually true of the memory underneath; it is just mapping it. If you look at a small SD card or a USB stick, you will see that it is formatted as NTFS; it looks like a block device. Everything is going to look like that. The big advantage you get is the lack of physical head movement. You don't have to settle the head, you don't have to move the head, so IO rates go much, much higher. This is what Microsoft did when they built SuperFetch and ReadyBoost into the new Vista products, using slower solid state devices. If you have ever tried to copy a large file onto a USB key, you know that it is actually faster to copy it to a disc.
But if you try to copy little bitty files, it does fairly well, and that is what they are using the technology for there: to move all those lookups for little bitty DLLs or reference files onto the solid state and save all those head movements and all the rattling of the disc, and on net, it speeds things up. The idea is that with solid state devices you can get the same type of effect. I see why people try to use those for log devices, because they think that is the most critical IO. In reality, with a decent amount of cache and RAID-1 or RAID-10 layouts, you are not going to be log-bound on SQL Server very much. Putting those in as data, or even more importantly as key indexes, is better; indexes tend to get fragmented, especially non-clustered indexes. So having anything that can take that fragmented IO, that random IO, and take away the fragmentation effect means you never even have to bother with defragmenting the disc.
Greg Low: I think that is a key thing. They are really good for random IO because there is no seek time. In fact, one of my friends in New Zealand was telling me that they had a process with a huge amount of random IO, and they changed to using one of these new RAM SAN drives, and literally the process went from ten hours to fourteen minutes, something like that. The difference was completely ridiculous.
Geoff Hiten: One of the first applications of those that I saw was actually not database. It was a large-scale email center where they were generating large amounts of email, both inbound and outbound, that needed processing. They dropped a RAM disc behind everything, and the difference was incredible.
Greg Low: Have you seen that implemented much on SQL Server boxes?
Geoff Hiten: Not a lot. I have seen some people talk about it, until you start talking about how much money it costs to put one together in the 100 gig plus range; then they decide to try regular discs first to see if they really need it. I have not seen anybody yet who was willing to trade the amount of money it takes for the benefit. The closest thing I see is people with small databases, under 50 gig or so, cramming 32 gig of RAM into a box and basically being able to hold most, if not all, of their database in RAM.
Greg Low: That is the absolute sweet spot. I have seen some that have very high throughput but not huge databases, and for that sort of thing the difference is just breathtaking. No, that's great. So anyway, that kind of takes us beyond SANs; we have dismissed some of the SAN magic dust, and getting the disc subsystem right is a key thing. What are the other hardware requirements for clusters?
Geoff Hiten: You need at least two network connections. The cluster needs one way to talk out to the public world so that clients can connect to it, and it also needs a private communication path. The reason is that for any kind of state change, it sometimes has to take a NIC offline while it does certain things during the failover, so it always wants to have a communication path clear for itself. That's the second communication path, and a lot of people want to put in these big gigabit NICs for it.
Greg Low: I should mention NIC is Network Interface Card.
Geoff Hiten: Yes. Anyway, the heartbeat network, which is what this is called, has very low traffic. I wish I could get the old, old coaxial-based Ethernet, not the twisted pair we use today, because one cable routed between the nodes with little drops on it would be perfect for a multi-node cluster. Two megabits per second is more than adequate. You just need some way of saying: are you there, who are you, I am in charge, you do this. Real simple, low-impact stuff. It does need to be low latency, though. That's one of the reasons there are challenges in what they are calling geo-clusters, a cluster that is geographically separated from one location to another so that you have physical failover. You have one in New York and you want another one in Boston, or pick two cities you are familiar with. That is very challenging, because the communication has to happen rather quickly; it's just not a very high volume of communication. So once you've got the network, and of course the two networks have to be on different subnets so they don't even see each other; in fact the heartbeat network should be unrouted, completely isolated. It never gets outside, and the computers never use it to try to talk to anything in the outside world. Once you've got that, the only other thing is the clustering software and the clustered virtual computer, as it were. Not to be confused with virtualization, running one OS inside another; this is the virtual instance of SQL, the virtual instance of anything. It really only has three major components: a disc or LUN to anchor it, an IP address, and a network name. That constitutes a unique system within a cluster. The cluster itself has one of those, and every SQL instance has one of those. Of course the SQL instances also carry the SQL application: the SQL Server service, SQL Agent, and possibly SQL full-text.
What we are doing is creating an isolated virtual system that can float between physical hosts. That is how we make a cluster.
Greg Low: And I suppose another concept in there is the basic disc layout that occurs with a cluster? I suppose I am leading to the quorum drive.
Geoff Hiten: Exactly. The first thing you have to do is build the cluster itself, before you ever install SQL. The cluster consists of the quorum drive, the cluster name, and the cluster IP address. These names and addresses are in addition to the basic IP addresses and computer names that the host nodes have. I like to use host names when I am building test clusters, like east and west; that tells you I have two different machines. Then I can use virtual names like north or south. For the cluster itself, I like to use ocean names in my test clusters: I will name a test cluster Atlantic, or Pacific, or Indian, or something like that. It is just a naming convention I have gotten used to over the years. So as I am building the Pacific cluster, it will live on one of these nodes at a time; the machines the cluster consists of are east and west, and the cluster, as I said, has the name, the IP address, and the quorum drive. The quorum drive has a unique role in the cluster: it's the tie-breaker. One of the problems a cluster can have is called split-brain. If I cut off all communication between the nodes of a cluster, how does each node know which one should be running? The answer is that whoever owns the quorum drive owns the cluster. For classic literature fans, it's the conch from Lord of the Flies. The quorum prevents split-brain; whoever owns it arbitrates all decisions and breaks all ties.
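The quorum-as-conch idea can be shown with a toy arbitration sketch. Node and cluster names follow the east/west/Pacific example above; this illustrates the rule, not actual cluster service code:

```python
# Toy illustration of quorum-disk arbitration during split-brain: when the
# heartbeat network fails and the nodes can no longer see each other, only
# the side holding the quorum drive keeps the cluster running.

def can_continue(partition: set[str], quorum_owner: str) -> bool:
    """A partition of nodes may run the cluster only if it holds the quorum."""
    return quorum_owner in partition

if __name__ == "__main__":
    nodes = {"east", "west"}      # the two host nodes of cluster "Pacific"
    quorum_owner = "east"         # east currently owns the quorum drive
    # Heartbeat lost: each node is now alone in its own partition
    for node in sorted(nodes):
        verdict = "keeps running" if can_continue({node}, quorum_owner) else "stands down"
        print(node, verdict)
```

The point of the sketch is that ownership of a single shared resource is what breaks the tie, so the two isolated halves can never both decide they are in charge.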
Greg Low: I'm sure that will stick with many of the listeners. That was good. Another one while we're still at the disc-ish end of things: your thoughts on SAN replication technologies?
Geoff Hiten: Well, those are actually pretty powerful, especially now that Microsoft has put some pieces into the SQL product to help with that. They leverage the Volume Shadow Copy Service to allow you to do things like SAN snapshots. SAN replication is used mostly to enable geographic clusters. Remember when we were talking earlier about the difficulty of having these clusters talking across those distances? One of the keys is that you need a storage system that looks local to every node. The only way you do that is to have some type of SAN replication, and it needs to be real time. So you end up with some fairly severe geographic limitations, something on the order of less than 100 kilometers, and not beyond that by a whole lot. That's about the upper limit of how far apart you can have a couple of machines in real-time synchronous mode. If you try to go much further than that, you have to do some type of asynchronous replication, and that means there is going to be some downtime or potential data loss if there is a catastrophic failure of the primary system.
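A rough way to see where a distance limit like that comes from is the propagation delay a synchronous write must pay. This sketch assumes a common rule-of-thumb speed for light in fibre, roughly two-thirds of c; it ignores switch and array latency, so real limits are tighter than this:

```python
# Rough arithmetic behind a synchronous-replication distance limit.
# Every synchronous write must wait for the remote acknowledgement,
# so it pays at least one round trip of propagation delay.

LIGHT_IN_FIBRE_KM_PER_MS = 200.0   # ~2/3 the speed of light; rule of thumb

def replication_rtt_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay added to every synchronous write."""
    return 2 * distance_km / LIGHT_IN_FIBRE_KM_PER_MS

if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km} km -> at least {replication_rtt_ms(km):.1f} ms extra per write")
```

At 100 km the floor is about a millisecond per write, which a busy log can tolerate; at 1000 km it is ten milliseconds before any real-world overhead, which is why longer distances push you to asynchronous replication.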
Greg Low: And how much is SQL Server aware of this, or does it need to be?
Geoff Hiten: Not really. I mean, there are some hooks in there from the manufacturers; the biggest thing they need to be aware of is keeping multiple Logical Unit Numbers (LUNs) in sync. To go back and explain one of those: when you build up a SAN, you take a lot of discs and build a RAID set, or whatever the manufacturer calls it. Then you take slices of those arrays, and they are called LUNs. The LUNs are then zoned, or presented, to various hosts, usually connected through a fibre channel fabric switch network. The LUNs look like physical disc drives to the host computers. It is very simple. The key is keeping those LUNs in sync, because in SQL we like to keep physically separate discs for our data versus our log versus the temporary database. If I am going to fail that over from one storage system to another, I need to make sure I keep them all in sync. One of the things SQL assumes is that when I write to a disc and it says it is done, it is going to stay that way. When I write something into LUN L, which is going to be my log, and LUN N, which is going to be my data, I can't afford for them to get out of sync with each other on the other box. That is the one key piece you have to have. If you can keep them in sync with each other, then the box supports everything SQL needs. That is the only thing it really has to be aware of.
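The cross-LUN ordering requirement can be sketched as a toy check. This is purely illustrative: it assumes write-ahead logging (the log record must land before the matching data write), and the LUN labels L and N follow the example above:

```python
# Toy model of the write-ordering guarantee discussed above: SQL assumes
# the log-LUN write is durable before the matching data-LUN write. If a
# replica applies the two LUNs' writes independently, that ordering can
# be lost and the replica is no longer a recoverable image.

def replica_is_consistent(applied: list[tuple[str, int]]) -> bool:
    """applied = sequence of (lun, txn_id) writes as seen at the replica.

    Consistent only if every data write on LUN "N" is preceded by the
    log write for the same transaction on LUN "L".
    """
    logged = set()
    for lun, txn in applied:
        if lun == "L":                           # log LUN write arrives
            logged.add(txn)
        elif lun == "N" and txn not in logged:   # data landed before its log record
            return False
    return True

if __name__ == "__main__":
    # Replication that preserves cross-LUN ordering: recoverable
    print(replica_is_consistent([("L", 1), ("N", 1), ("L", 2), ("N", 2)]))  # True
    # LUNs replicated out of sync: a data page lands without its log record
    print(replica_is_consistent([("N", 1), ("L", 1)]))                      # False
```

This is why the replication has to treat the log and data LUNs as one consistency group rather than mirroring each LUN on its own schedule.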
Greg Low: Yeah, that’s great. Well listen that’s probably a good point, we’ll take a break and we will continue on after the break.
As I do with everyone, I will also ask you: is there a life outside of SQL Server?
Geoff Hiten: Sometimes I wonder. The answer is yes, there is. I've got a family; I have been married for about 15½ years. A large number of my friends have lost money on that; they were betting on two or three months before she killed me. I have three daughters, and even the dog, the cat, and the guinea pig are female, so it's a very interesting house to live in.
Greg Low: I have three daughters as well.
Geoff Hiten: One of my hobbies is tropical fish. It is a wonderful hobby for teaching patience and careful management, a great hobby for a database analyst. You need to remember one thing: no matter what you do, your primary goal is to keep the fish alive. The more you change things, the more likely you are to kill them. You have to think properly and plan before you do anything drastic.
Greg Low: On a serious note, they often say that fish and things are good for kids because it actually teaches them that things do die.
Geoff Hiten: Well, to bring that full circle, yes, we have had that happen. My wife and I got a dog about four weeks after we were married, and we had to put the dog to sleep about a month ago. That was a rough time for all of us. It was an important life lesson for the family.
Greg Low: Yes it is.
Geoff Hiten: The dog was a family member for longer than any of the kids of course, but anyway, that’s enough of that.
Greg Low: Follow any of the local sports? Baseball, football, any of those?
Geoff Hiten: Well, I am a huge fan of University of Alabama football. That's where I went to school, and it has a really great history. We have had rough times lately, but there is always next year. We had one of the greatest coaches ever many years ago, a guy named Paul Bear Bryant. There is a joke going around: how many Alabama fans does it take to change a light bulb? The answer is three, one to change it and two to talk about how bright that old bulb used to burn. I am also a big fan of NASCAR, that's American stock car racing. That is always fun to watch: entertaining racing and entertaining personalities too. I believe we got one guy from Australia recently to try his hand at it. He's working his way up the ranks and seems to be doing pretty well.
Greg Low: Yeah, I did notice one or two who were into the NASCAR sort of thing. There are a number of people locally here who follow it, I suppose since the days of cable TV. You can watch almost anything from anywhere now.
Geoff Hiten: You got a point.
Greg Low: In fact it was funny when you were mentioning at the beginning where your state was and you were trying to describe where that fitted in. It reminded me that I have done quite a bit of training on some U.S. military bases, many times in Japan. It was interesting: on the military base they get the same sort of cable TV with movies and things that you normally get at home, but when they would normally have commercial breaks, the arrangement is that they are not allowed to show the advertisements. They try to fill it in with other things. One of the things they have is a little show called What State is That. It describes a little bit about the state and you have to guess its name, and then it shows you where it is. I cannot remember the number of times I have sat there watching What State is That. I am starting to get a pretty good idea where most of the states are; I just endlessly watched this show. Another little fill-in, in the Navy, was What Section 53 Means to You.
Geoff Hiten: Sounds like truly compelling programming, yes.
Greg Low: It’s extraordinary.
Geoff Hiten: Let me guess, there wasn’t a remote or a way to change the channel? You were pretty much stuck?
Greg Low: It was dreadful. I was left under no illusion, for example, about what would happen if I was caught with inappropriate substances while serving on a navy vessel. I knew exactly what would happen to me. No question about that. Anyway, back onto SANs, I suppose; sorry, onto clusters. Maybe we should start with any tips and tricks related to installation.
Geoff Hiten: The first one is something I learned from my grandfather, really. He was a master craftsman cabinet maker, and one of the reasons my online handle is SQLCraftsman is as a way to honor that. The first rule of woodworking is measure twice, cut once. The first rule of clustering is plan it, plan it, and plan it again. Make sure you've got your entire network layout chosen. Make sure your disks are laid out carefully. Pick your network names. You want a complete checklist to go through when you are building this thing, because you don't want to have to undo or back anything out; a lot of times the only way to do that is to flatten the box and start over. Plan it carefully.
Greg Low: Yes that is an important message. From what I have seen most of the time, changing things later is hard.
Geoff Hiten: It can be very difficult and very challenging. Clustering does have a couple of tricks in it that make changing certain things later fairly easy. For example, say you had filled up your hundred gig LUN and wanted to swap it for a two hundred gig LUN. If you can take the outage time, you can swap the disk: take the disk resource offline, copy the exact contents to the new disk, change the drive letter, fix the dependency in the cluster, bring it back up, and it will work perfectly. The only thing SQL sees is the drive letter. Being able to do things like this in a swap-out mode is something that is pretty unique to SQL and SQL clustering.
Greg Low: So planning is the important first step.
Geoff Hiten: Absolutely. Planning, and making sure you've got all your hardware in there. Don't try to build your first system as a production system. When I first started learning, we didn't have good virtualization technology. As a matter of fact, I went to a class in 2001. It was Kimberly and Don's first big high availability class, taught in Redmond. Product Support Services actually built twenty-five physical clusters for us to test with and teach on. We learned using those, and we ended up rebuilding them while we were at lunch so that we could try different things with them. That's how hard it was to do and learn clustering when it first became popular, beyond some of the weird gyrations of SQL 7.0. Then we had SQL 2000 and were able to do clustering in somewhat the same manner. Now we have virtual systems, and we can build clusters in a virtual system. There is actually a TechNet how-to webcast on exactly how to build a SQL 2005 cluster using Microsoft Virtual Server. I advise anybody who wants to play with clustering to do that. It certainly beats trying to scrounge hardware and build a little junky external SCSI controller and enclosure and connect everything up, what I used to call a cheap cluster. You couldn't run anything on it, but it was a great training tool. You can put that same training tool in a virtual system and carry it with you on a laptop.
Greg Low: Actually there’s also a TechNet webcast from Brad that walks through building a SQL Server 2005 cluster on a virtual server.
Geoff Hiten: Yes. I have sat in some of Brad’s presentations and have seen his material. He is very good and a very careful and thorough instructor on how to do clustering. He is an extremely good resource to look at.
Greg Low: Yeah, and Brad is actually now doing work for RedGate directly. A quick shout out to Brad.
Geoff Hiten: Yeah, he has kind of taken over, especially RedGate's representation to the user and MVP communities. He has really jumped into that role and has done a lot of good stuff. I have seen a lot more responsiveness out of RedGate, in terms of listening to what the community wants and trying to meet those needs, than before he was involved.
Greg Low: And a quick shout out to Rachel too, their community person. There have been really excellent code camps and things around the world as well. It is certainly worth looking at the webcast Brad did, or similar; he walks through at least one of them screen by screen.
Geoff Hiten: Now once you build one, you need to learn how to do certain pieces with it. You need to learn how to break it. The documentation describes two types of failures. Books Online describes a type one failure, where you completely lose the node and have to rebuild it, and a type two failure, where you can recover the node, but only partially; you may have to do some things with it. Walk yourself through those scenarios, because they are not as straightforward as you think, especially after you have had a cluster in place for a little while and have applied service packs or hotfixes to it. That's where it gets entertaining.
Greg Low: You have wandered into my next question, which of course is related to applying service packs.
Geoff Hiten: It's fun. In SQL 2000 it's simple: go to the node that is the current host for the instance and run the service pack. When you're done, you're done; you're off-line for about 15 minutes or so and it's all happy. Same thing with a hotfix. Beautiful, easy. But of course they couldn't let it go; they had to make it a little more complicated. The problem comes in SQL 2005: the installer is not really cluster-aware for all components. Only the SQL Server engine, SQL Agent, SQL full-text, and Analysis Services are natively clustered. Everything else is node specific. One of the common questions I get on a two-node cluster is: I failed over to the second node and I can't find Management Studio, what happened? Well, the answer is you didn't install it. What I actually like to do is go in and install all of the non-clustered components on all of the nodes first. I've got Management Studio, I've got the configuration tools, all that stuff sitting there waiting for me when I finally bring in the clustered engine pieces.
Greg Low: Yeah, now that's important. In fact, one of the other things that surprises people is Integration Services not even being instance-aware, so there's a complete difference among all these components as to what roles they can fill.
Geoff Hiten: Right. Now it's possible to cluster Integration Services; Microsoft suggests you don't do it. I've actually come up with a slightly more complex rule that I try to follow. If you have multiple instances, it's probably a good idea, so that you can control what's going on. If you have items residing on clustered resources that you need to get to, it's probably a good idea too. A lot of people keep their SSIS packages on the file system; if you are going to do that in a cluster, you are going to have to cluster SSIS. There's a KB article on exactly how to do it. You read past the first half that tells you don't, then it tells you exactly what to do, and it works. I am actually supporting a couple of places that have clustered SSIS, and I had a real interesting time recently when we had to replace a failed node in an SSIS cluster. That reminds you to always go through the exercise of testing it on a virtual system first. One of the things I discovered, because this was a post-SP2 hotfix system, in other words we had SP2 on it and later applied a hotfix build, 3161 or something like that: one of the easy things to do when you are trying to put one of these back together is to miss a reboot, especially on the so-called passive node, the node whose console you are not on. There is another little gotcha I discovered on building clusters. On SQL 2005 you can really only be logged into the node you are loading SQL from. If you are also logged into one of the passive nodes, more than half the time the install will fail, and it will be really vague about how that happened. The installer doesn't pop up a lot of error messages; you have to go find and analyze the log files to determine why the thing failed. Trying to repair that during a maintenance window is not always the easiest thing in the world.
You need to practice and test it on the virtual system, and get your checklist to where you are simply executing it during your maintenance window.
Greg Low: Actually another question. What’s your feeling on upgrades from 2000 to 2005?
Geoff Hiten: As far as cluster upgrades go, I simply do not trust a full in-place upgrade.
Greg Low: Great to hear someone else say that.
Geoff Hiten: I mean, the guys at Microsoft are great. I actually sat down with the program manager in charge of the installation and clustering group at the 2007 Summit. He was at the Ask the Experts booth for a while and had mock-ups of all the cool new stuff; we will get to that later. I simply don't trust the installer to do everything right. Clustering is high availability, and if you run the installer and it doesn't work, your only real fix is to flatten the box and start over. Personally, I like to do a side-by-side migration if I have the room: create another clustered system and move everything over. If I can't, I at least try to do a data-in-place migration. That is where I detach my databases, attach them to the new clustered instance, and set my dependencies. This is going to break off to one little side here, because it is something I get a lot as well. People put a new disk on the cluster and wonder why SQL can't write to it. Very simply, the clustering has to have a series of dependencies: certain things have to come online before others. I have to have the disk there before I can bring SQL up, because SQL needs to see it. The clustering handles that with what it calls dependencies; SQL is dependent on the disk. SQL is smart enough in a cluster to know that if it is not dependent on a disk, it cannot write data files to it. If you go in and set the dependencies using the advanced properties in Cluster Administrator, you can write right to it. If not, you will go kind of buggy trying to figure it out. That question pops up a huge amount.
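As an illustration of the data-in-place migration Geoff describes, the detach and attach steps might look roughly like this in T-SQL. This is a sketch only; the database name and file paths are invented for the example, and the dependency step itself is done in Cluster Administrator, not in T-SQL.

```sql
-- On the old SQL Server 2000 instance: detach the database.
EXEC sp_detach_db @dbname = N'Sales';

-- Next, present the clustered disks to the new instance and make the
-- SQL Server resource depend on them in Cluster Administrator;
-- without the dependency, the clustered instance cannot use the files.

-- On the new SQL Server 2005 clustered instance: attach the files.
CREATE DATABASE Sales
ON (FILENAME = N'S:\Data\Sales.mdf'),
   (FILENAME = N'L:\Log\Sales_log.ldf')
FOR ATTACH;
```

Note that, as discussed below, attaching to 2005 is a one-way trip, so a verified backup (or SAN snapshot) should exist before the detach.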
Greg Low: I suppose the other thing is that before attaching databases that used to be on 2000 to 2005, I would be mighty sure I had a copy of those databases, because there is also no going back once you have attached them.
Geoff Hiten: Absolutely. One of the high-end SANs, or even a mid-range SAN with the right software, can help there. About seven months ago I did an upgrade of a very large database, on the order of 600 gigs plus, an OLTP system. Massive, massive system, on a thirty-two-bit box. It was going to a 16-processor dual-core 64-bit box that, by the time its busy season rolled around, had another eight processors put onto it. Twenty-four processors, forty-eight cores, big honking system. What we did was use the SAN's snapshot technology. We took a snapshot, much like a database snapshot that uses a differencing file, and presented those LUNs over to the new system. If we had failed, we would have simply reverted the snapshot, which would have taken a relatively short amount of time. That was our rollback procedure, and we didn't have to have an extra 600 gigs of high-IO-capacity SAN storage. So definitely have a way to go back. Leverage your other technology sometimes: instead of having to make a backup, if you've got that SAN snapshot technology, use it!
Greg Low: Yeah, that is good. There seems to be a common misconception about taking a 2000 database and attaching it to 2005. I often have people say they will just leave it at database compatibility level eight.
Geoff Hiten: The compatibility level is simply syntax resolution. If there is any ambiguity in the syntax between SQL 8.0 and SQL 9.0, or 2000 and 2005, it resolves it in favor of whichever your setting is. That's the only thing the compatibility level really does. Now, there are a few features you have to have full 9.0 compatibility to enable, but for the most part that is all it is: simply syntax resolution.
Greg Low: I find that it doesn't worry me much. The only time I tend to keep things at 8.0 is where there is maybe some old syntax, and even then I try to fix it straight away; it's more likely object names that have become reserved words or something like that. That is about the only scenario. I see people who routinely leave systems at the 8.0 compatibility level. I don't think they realize it really doesn't bother anything much anymore.
Geoff Hiten: I think part of that is benign neglect. They don't realize that if you attach a SQL 2000 database to 2005, by default it stays at 8.0. You have to manually change it to 9.0 compatibility, and a lot of people just don't go through that step. The problems I have always found with compatibility come down to either collation or ORDER BY operations. Some of the syntax rules for ORDER BY were among the biggest changes we saw in 2005. Those were the two big breaking changes I have seen in the field.
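As a concrete illustration of that default, checking and raising the compatibility level on 2005 might look like this, using the period-appropriate `sp_dbcmptlevel` procedure ('Sales' is an invented database name):

```sql
-- Report the current compatibility level
-- (80 = SQL Server 2000, 90 = SQL Server 2005).
EXEC sp_dbcmptlevel N'Sales';

-- An attached SQL 2000 database stays at 80 by default;
-- raise it to 90 explicitly once the syntax has been checked.
EXEC sp_dbcmptlevel N'Sales', 90;
```

In later versions this became `ALTER DATABASE ... SET COMPATIBILITY_LEVEL`, but at the time of this conversation `sp_dbcmptlevel` was the supported mechanism.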
Greg Low: Yeah, the other one I have seen is people using the old join syntax. Hopefully there is not much of that, and it needs to be fixed.
Geoff Hiten: Yeah, if you are using that, you deserve to have your system break. That may be a little harsh, but you get the sentiment. It is not something you can just build and run forever; you have to maintain it once in a while.
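For reference, the old-style outer join syntax being discussed, and its ANSI replacement, look like this (the table and column names are invented for illustration):

```sql
-- Old SQL Server outer join syntax (*=), which breaks
-- at the 90 compatibility level:
SELECT c.Name, o.OrderDate
FROM Customers c, Orders o
WHERE c.CustomerID *= o.CustomerID;

-- The ANSI equivalent, valid at every compatibility level:
SELECT c.Name, o.OrderDate
FROM Customers c
LEFT OUTER JOIN Orders o
    ON c.CustomerID = o.CustomerID;
```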
Greg Low: No that is great. So any other hints with maintenance with service packs?
Geoff Hiten: Again, because there are so many varieties, especially when you start playing with multi-node systems, three, four, five nodes at a time, simply build yourself a test platform and practice. Measure what happens; check the versions of each of the nodes and each of the components while you are doing this. And going back to the beginning of this: clustering isn't a magic bullet. It is part of an overall high availability plan, and that plan has to account for people, process, and technology. Clustering will help the technology end of it, but you have to have good management practices. If you drop the database, clustering won't help you. If somebody comes in the middle of the night and mops around your machine, clustering is not going to help you all that much. You have to have the right clustering in there to buy you that magic four hours we talked about at the very beginning, that time when you don't have a person in front of the console. You get another computer that sits there and watches the first one all the time, going: are you alive? Are you alive? Are you alive? It takes over the minute it isn't.
Greg Low: One argument I hear now is that server hardware has reached some amazing reliability level, and it's more the disks that tend to fall over. So the discussion I get endlessly now is: is there still much point to clustering, the basis being that it really only protects against server failures, not drive failures, and server failures are now fairly rare. What's your feeling there?
Geoff Hiten: I think they are about half right. About half the things that go wrong are disks; they are the ones with the big moving components in them. But then again, power supplies tend to have the highest-wear components. They tend to go out; they generate a lot of heat and they're under quite a bit of stress. One trend I have seen in computer hardware in general, and disks specifically, is that because people are building redundant systems, manufacturers can build cheaper hardware, since there is going to be a failover system. So I have kind of seen the other side of that: we know a certain number of disks are going to fail, and as long as they don't go out within a certain amount of time of each other, we are good to go. I have seen boot disks take systems out before; clustering will help you quite a bit on that one. I have seen memory go bad, of course. I have even seen a processor burn up once or twice. Power supplies go. Also, by putting in different hosts you can place them in slightly different physical locations. You can connect them to different power feeds. You can have them on different cooling platforms. There are ways of spreading out your environmental risks.
Greg Low: When the air conditioner dies.
Geoff Hiten: I had that happen once in a Class One data center. The A/C went out, and I lost the whole tape array because of it. It can happen. I thought it was cooled and it wasn't.
Greg Low: Yeah, you get some weird environmental things. I remember one of the funniest things when I was working at HP: we used to have the old CO2 fire-suppression systems, and it was interesting seeing rooms where they were accidentally set off. Going in to look at the systems, chipping away at the ice in front of the screens, and seeing that the systems were still working underneath.
Geoff Hiten: Yes, one of the things that working for seven years in that steel mill taught me is that Murphy is the patron saint of computer science. If anything can go wrong on a computer, it will for sure.
Greg Low: The last main topic then, is what changes are coming in 2008?
Geoff Hiten: One thing we'll see with 2008 is a big difference in the way they build clusters. Anyone who has ever built one will notice the difference between how you build a Windows cluster and how you do it in SQL. When you create a cluster in Windows, you start off with one node and bring the other nodes into it; the binaries are already in place as part of the base OS. They are changing the SQL installer and the clustering components to do the same thing. They are looking at creating a single-node cluster where another node that can see the same disks is joined into the cluster, adding nodes in gradually, just like we do in Windows 2003. That makes a lot more sense, because it supports the failure-and-recovery scenarios and the growth scenarios so much better. Ultimately, what we would like to see is clustering take on the definitions some of the other vendors have. The ultimate would be having eight computers and no big storage, just simple dual-processor bricks from your favorite big vendor, and when I need more database power I plug another one in. I haven't seen anything from Microsoft on this, but they've got very bright people over there and everybody would love to get to this; I would be surprised if they don't have something in the labs at least working toward that type of goal. I think we are going to see databases, at least for most commodity types of applications, work similar to a web farm or application farm. I think it is several years before we see that, but we will see an expanded definition of clustering: not only failover, but eventually embracing scale-out in that arena.
Greg Low: Actually the last topic on this is just your thoughts on clustering versus mirroring?
Geoff Hiten: Mirroring is a great complement to clustering. Mirroring is good because you can use asymmetrical hardware, and you can be more geographically dispersed, especially if you are willing to go asynchronous. One of the problems with mirroring is that if you go above a certain stress level on your computer, or run more than a certain number of databases on your system, there are some pretty severe limits. You can only mirror maybe ten databases, tops. Clustering doesn't care how many databases are in the system. There are some different limitations there.
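For context, a mirroring partnership is configured one database at a time, which is part of why per-database overhead adds up as the count grows. A rough sketch of the setup follows; the server names and port numbers are invented, and the mirror database must first be restored WITH NORECOVERY:

```sql
-- On the mirror server: point the database at its principal.
ALTER DATABASE Sales
SET PARTNER = 'TCP://principal.example.com:5022';

-- Then on the principal server: point it at the mirror.
ALTER DATABASE Sales
SET PARTNER = 'TCP://mirror.example.com:5022';

-- Optionally, on the principal, add a witness so that
-- synchronous mode can fail over automatically:
ALTER DATABASE Sales
SET WITNESS = 'TCP://witness.example.com:5023';
```

Each mirrored database maintains its own session and worker threads, in contrast to a clustered instance, which fails over as a single unit regardless of database count.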
Greg Low: I normally tell people myself that eight to ten databases is about the right number. I had a feeling the reasoning behind that number might have been spelled out somewhere. I have had people ask why that number; it just seems to be about right.
Geoff Hiten: The number I got was from a program manager at Microsoft at the 2006 PASS Summit. It was 2006, shortly after we got 2005 into the marketplace but before SP1: the first Summit after we got mirroring, but before they turned it on for everybody, because remember it wasn't supported before the Service Pack. That was one of the reasons certain scalability limits weren't fully explored. These were the general limits we were seeing, mainly due to internal thread allocations; mirroring has certain hard threads that it binds to inside SQL. Unless you're running a high number of cores, you're going to have a problem with thread starvation. If you start spinning these up, because each mirrored database requires a fairly good number of threads, I want to say about eight to ten, eventually you end up without any more threads. That's the problem with scalability and mirroring.
Greg Low: The other thing I found strange, management-wise, is that with a cluster at least the whole thing fails over or the whole thing fails back. With mirroring I might find six databases on the primary and two running on what was the secondary.
Geoff Hiten: Mirroring is done at the database level and clustering is done at the instance level. If I remove one disk from a cluster, or one network connection, or any of the resources necessary to operate that clustered instance, everything goes over to the next node in line.
Greg Low: The other thing I've found quite strange, and I don't know why it happens, is that when I go to mirroring sites where there are eight to ten databases, I often don't find them all on the one node. It's quite common to find some databases active on one and some on the other. Kind of strange.
Geoff Hiten: I'm not knocking mirroring. It has its place. It's a very useful piece, a great complement to the whole high availability toolkit. I'm glad Microsoft put it out there, but it's not an all-situation replacement for clustering, and neither is clustering a one-for-one, all-situation replacement for mirroring.
Greg Low: I think it's a great cost-based solution. And then you're also talking about it being effectively shared-nothing, I think that's how they normally say it, and again about not having to have the systems identical and everything. It definitely has an upside in some of those areas.
Geoff Hiten: Microsoft actually called clustering a shared-nothing technology because no instance is dependent on any other instance in a cluster. They are totally independent of each other; each is locked onto resources that are only available on one node at a time. That definition is where they started with shared nothing: the instances don't need each other for anything. That is where it originated.
Greg Low: Is there anything 2008 wise that would affect clustering?
Geoff Hiten: That's about all I've had a chance to look over and work with. Looking further out, there are bigger scalability limits coming in clustering: no limit to the number of nodes, just keep adding nodes and instances. I've had issues with the management of four-node, three-instance clusters; it changes the way you think about them. I hesitate to look at 50-node clusters. I look at them and wonder how in the world you would deal with that.
Greg Low: That pretty much brings us up to time Geoff so thank you so much.