James Luetkehoelter
SQL Down Under Show 23 - Guest: James Luetkehoelter - Published: 24 Jul 2007
In this show, SQL Server author James Luetkehoelter discusses disaster recovery planning and technology for SQL Server: clustering, log shipping, mirroring, and snapshots.
Details About Our Guest
James Luetkehoelter has been in the database arena for well over a decade, and his bio mentions that any further clues would reveal his true age. While anything database-related interests him, his passion lies with SQL Server, which he has worked with consistently since version 4.2. James owns a small, specialized data-centric consulting company in the U.S., and in his free time he catalogues the various phonetic pronunciations of “Luetkehoelter”; the tab is currently up to 2,082 distinct variations.
Show Notes And Links
Show Transcript
Greg Low: Introducing Show 23 with guest James Luetkehoelter.
Our guest today is James Luetkehoelter, who has been in the database arena for well over a decade; his bio mentions that any further clues would reveal his true age. While anything database-related interests him, his passion lies with SQL Server, which he has worked with consistently since version 4.2. James owns a small, specialized data-centric consulting company in the U.S., and in his free time he catalogues the various phonetic pronunciations of “Luetkehoelter”; the tab is currently up to 2,082 distinct variations. Welcome, James.
James Luetkehoelter: Thanks for having me, Greg.
Greg Low: Tell us the story about the surname and how you come to hear so many variations.
James Luetkehoelter: It is a unique name. There are actually only four of us in the United States: myself, my wife, my mother, and my brother. It is spelled L-U-E-T-K-E-H-O-E-L-T-E-R, and people tend to look at it and panic. I have relatives in Canada and Germany, but if someone has that last name, they're related to me. We don't even have consistency among ourselves: I say “Let-Kee-Helter” because it's nice and rolls off the tongue, my brother says “Lick-Kee-Halter,” and my mother uses the more German-sounding pronunciation. I have a notebook where I write them all down. I was in Atlanta and someone came up with a new variation, so it's going up to 2,083.
Greg Low: Where did I see you last, was it TechEd? TechEd in Orlando. Been in the warm weather for a while?
James Luetkehoelter: Yeah, everything is warm right now. Kind of all over the place. I had been in Denmark for a conference there, and that was quite chilly in March or so. Lots of rain. I was the first presenter of the first day and I fell down a flight of stairs. I spent the rest of the conference lying on a couch trying not to move. Luckily conferences like PASS don’t have those problems. You have elevators at least.
Greg Low: We’re looking forward to that coming up in September; you’ll be there as well?
James Luetkehoelter: Yes. I'm actually giving a non-SQL session, talking about everything non-SQL that a DBA needs to know something about: a little application development, a little hardware. I find too many people out there are too focused, and when someone else starts talking it's the deer-in-the-headlights look; they don't want to know.
Greg Low: I really liked Peter Ward’s session called “Engineering 101 for the DBA” as well.
James Luetkehoelter: Yes, that’s a good one. This is something similar but I’m going to try to cover as much as possible. Like most of my presentations it will either be very popular or I’ll have two people there. People either like me or absolutely hate me. One of the two.
Greg Low: Like I do with most guests, tell us how you came to get involved with SQL Server?
James Luetkehoelter: I started in school with a mathematics background, then philosophy, so I have the database categorization type of mind. The concept of normalization seems obvious to me; pointing out the normalization rules, it's like, “yeah, duh!” Started off that way. Got a job just doing help desk support, just getting started. Then I got on a team that supported everything and we had to specialize, and when it came to databases the entire team pointed at me; I didn't even get a choice. I started dealing with any type of database out there, dBase, etc. I had to learn lots of platforms very quickly and I loved it. It's the way I think and see the world. Unfortunately I see a lot of things done poorly with it, so I'm off to change the world. Unrealistic, but you've got to have a goal.
Greg Low: Great. James, the session we’re talking about today is about disaster recovery and I gather you’ve just written a book about that?
James Luetkehoelter: Yes. It should have been out already but due to my running my own business and trying to write a book…it’s challenging. Microsoft officially announced SQL Server 2008 so now I have to cover what I can of that before I get the book produced. So, it’s coming.
Greg Low: With the SQL CLR book I just finished, we've done the same thing. In fact, I've looked at what's different with 2008 and I'm putting in some minor changes for that.
James Luetkehoelter: Nice curve ball we got from Microsoft: by the way, here's a new version.
Greg Low: It's topical that we're talking about disaster recovery; I've been doing disaster recovery planning for organizations in Canberra. We should start with a definition of what you think disaster recovery is.
James Luetkehoelter: I'm a bit of an oddball; I've always got a slightly different take on everything. The way I see it, disaster recovery is partially backup and recovery, which is what people normally think of, perhaps with a backup data center somewhere. But I include high availability techniques in it, and process techniques, and all sorts of things. Hopefully there is no disaster recovery; in the best scenario it never happens, but it is a distinct plan. There's a difference between the response to a disaster and what you do to mitigate that disaster and prevent it from happening. The latter is usually separated off as high availability, usually discussed in terms of performance and 100 percent uptime, but to me it's more of a disaster recovery mechanism, or at least it should and can be.
Greg Low: So you’re arguing that one reason to look for high availability is to mitigate the chance of having a disaster.
James Luetkehoelter: Absolutely.
Greg Low: And then the response to the disaster is a separate part of the plan.
James Luetkehoelter: Yeah, they're distinct things to think about, and definitely you're including both. You have to have a response plan; a response plan is required, while mitigation is a good idea if you can afford it or find a decent way to do it. And not just having a backup and recovery plan written down: having it practiced, checking backups to make sure they're good. I don't know how many times I've asked for the last full backup and they give me a tape and it's corrupted. Working as a consultant, that's a very uncomfortable situation to be in.
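As a sketch of the kind of routine check being described, SQL Server 2005 can at least confirm that a backup file is complete and readable without actually restoring it. The path and database name here are illustrative, and the only certain test remains a periodic trial restore:

-- Confirm the backup set is complete and the file is readable.
-- (CHECKSUM only validates page checksums if the backup was taken WITH CHECKSUM.)
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\Sales_Full.bak'
WITH CHECKSUM;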
Greg Low: Had a colleague who loved three-letter acronyms, and he called those CLMs: Career Limiting Moves. It does intrigue me that people have an amazing degree of confidence in hardware, but after the years I've spent in hardware support, I think it's misplaced, even with very, very large tape drives. One of the scariest things I've seen was a drive with the erase head permanently stuck on. As it wrote the tape it would erase the old data, write the new data, then do the read-after-write to make sure it was written properly. It would get to the end, and as it rewound it would erase the entire tape.
James Luetkehoelter: Without the human process of actually validating it at any point.
Greg Low: Yes. Things like that do happen and they’re very scary.
James Luetkehoelter: Great point. Hardware is seen as how we deal with disaster, or mitigate it; we have a very high reliance on it, yet one of the disaster scenarios you're going to deal with is hardware failure, and there are very few options for protecting against it. Is anybody checking the hardware? Checking the tapes? Did the tapes make it? Or it's a tape library with an auto-feeder and it's striping across tapes. Well, that scares me. What if I lose one?
Greg Low: The other common thing I find is come the disaster the only tape drive on the planet that would have read that tape is the one that just died.
James Luetkehoelter: Yes. If the one that you needed were available it would be utopia; it just never happens. It's always the worst situation you can be in. A mantra I preach to clients is to plan for the worst. I have them do silly things, but you plan for extremes. If you're designing something starting at ten gigs, imagine it going to ten terabytes within three years. Same thing with disaster recovery: assume the worst. Don't assume you're living in a place free of environmental disasters, that you're never going to face a Katrina incident with the town flooded. It will happen. The more confident and arrogant you are about it not happening, the more it seems that, cosmically, it will.
Greg Low: I was sadly reading about a company based in the Twin Towers that had its DR site in the other tower. Just unbelievable. Cost justification is the next main issue. What’s your thinking in the areas of cost and how do we discuss this with management?
James Luetkehoelter: This is the biggest problem, because disaster recovery is something that cannot be cost-justified: in the best case scenario you never have to implement the plan. So how do you argue to management that we need to spend money to protect against these types of problems? You can posit scenarios where you lose certain amounts of money, but then you become an actuary. Like when car manufacturers have a safety defect and someone says, well, we can afford this amount in lawsuits if this is the percentage of cars that fail, rather than doing the recall. And that's a scary way of thinking.
Greg Low: It is reality, and it's an interesting point you make. I see so many situations where people don't realize that all these things are trade-offs. I think of aircraft: I fly so much, and everybody thinks that everything is working perfectly every time they get on a plane, but the reality is that no plane has everything working. They're endlessly making the decision: here's what works and what doesn't; at what point do we say, no, it doesn't go? There is no concept of it all just working.
James Luetkehoelter: Correct. The same sort of thing falls into disaster recovery planning, so instead of “disaster recovery planning” there's a negotiation of “how much can we afford to lose?” Well, that's part of it, but you don't discuss that with the people who spend the money. In fact, DR, HA, and BC are acronyms I really like to avoid; they're buzzwords that executives jump on. If you say we need a DR plan, the first thing the CFO will think is: oh no, it will cost us millions and millions of dollars. But it doesn't need to. We as DBAs need to impress upon people that it doesn't have to be expensive; there are simple things that can be done. Your example of the tape: is someone checking the tapes, even just once a week? It's an ongoing process, a job responsibility, but it's not seen that way. It's “we've got a DR plan.” Well, good for you, but who's doing it on a daily basis? It has to be daily, or you'll run into the piece of the airplane that causes it to fail, or the piece of the car that causes it to crash, or the tape that is corrupt. It's going to happen.
Greg Low: That leads us into the different scenarios of disasters. What categories do you put things into in terms of disasters?
James Luetkehoelter: This is another area where I'm odd, and deliberately odd: I go against basically anything you see in Microsoft literature, and lots of other literature, and I do so for a reason. It turns out the response and mitigation approaches are very similar within each category. The first one is environment. This can be as simple as loss of power or the data center overheating. I remember a data center that was basically a closet, and it got so hot they opened a window; even in winter in Wisconsin they would still open the window, because if they didn't, those servers would continually reboot. It could be that simple, or it could go as far as a Katrina or Twin Towers type of event. It can be anything where the environment changes so much that the database fails, or is perceived to fail. That's another nuance worth mentioning: it doesn't matter to the users, managers, or CEO whether the system is really up and running if they can't get to it. They only care whether they can use it. That's a big part of it as well.
Greg Low: Good point. I often see high availability discussed but no definition of what “available” means. You’re quite right, unless the users are able to use it as they normally would use it, then it’s not really available.
James Luetkehoelter: No. You shouldn't be able to walk through an IT department and see a chart on some manager's door showing the uptime for a given system. Well, that's your part of the system, but what about the network, the application, everything else? It's the users who determine that, and it's the same with disaster recovery: the users will determine the plan, or they should, not the DBA. What are the requirements to meet? Unfortunately the difference between uptime and the perception of uptime is kind of lost in our segment of the industry, in IT, and I want to see a move back to it. And I will, because I'll keep complaining about it until people tell me to stop talking.
Greg Low: That’s environment and we’ve mentioned hardware. There are enormous numbers of possible hardware related failures. I suppose then, the media itself?
James Luetkehoelter: Yeah, hardware is a category. Media would be just as you described with the tape, some sort of damage. Media is disk drives, tape drives, the magnetic media. They're quite fragile; it's not solid-state storage, and until we get to the point where that's an affordable option, it takes almost nothing to damage them. One of my employers, a manufacturing company, had a good server room with a hallway: the left side was PC-based servers, the right was AS400s, and there was a cabinet with all the tapes. Well, somebody came in with an electromagnet they'd gotten to clean metal shavings off the manufacturing floor. He walked in, turned the thing on, threw nails in front of him, then lifted it up and picked up all the nails from way across the room, but he'd pointed it directly at the tape cabinet. Luckily none of the disk drives were destroyed, but every tape was damaged. That company sweated every day until they got back to their monthly backup.
Greg Low: That's the media coming to grief. What intrigues me is that even low-cost CD-R media may not have the life we expected. And that's chilling for the things we've put off to permanent or archival storage: are we periodically checking that the archival media is even still readable?
James Luetkehoelter: Absolutely. DVDs are a great example. Many I’ve loaned out come back with a huge physical scratch. It will play, sort of, but it will jump and skip. The idea that there is media that is protected or that you don’t have to worry about it is a very big mistake I see people make.
Greg Low: They were saying that some of the early CD-R media, particularly the low-cost ones, may have as short a life as eight to ten years of actually being readable. We tend to check them early on, but then a lot of people store them away and presume they're readable long-term.
James Luetkehoelter: Yes. And eventually they'll erode away, headers and all.
Greg Low: The next area you categorize is something failing in the process. I find the more complicated the backup process is, and the more pieces of media involved, the bigger the chance the process will fall down. We used to check media from clients, and people would have three or four pieces of media, but they'd send two from Tuesday and one from Wednesday and think it was a media set.
James Luetkehoelter: Sure, that's definitely an issue. Simplicity is the key, and it doesn't have to be complicated. The stripe-set thing is definitely a problem; SQL Server almost pushes you into doing it behind your back, accidentally.
Greg Low: I have a personal preference, whenever possible, to have everything fit on a single piece of media, because invariably as soon as multiple pieces of media are involved, people get things wrong. I don't like to assume that the person doing it will get anything more right than what day it is, and even that they may not get right, let alone something more complicated.
James Luetkehoelter: That's a great point. I have a similar rule. Whether it's backup or the initial design, I never size any file larger than my ability to transport it individually, so it's got to fit on one tape. If we've got to do file-by-file backup, OK. But none of this 32-terabyte file, which I think is the biggest SQL Server 2005 supports; that seems insane because I can no longer move it. The same goes for backups: they have to be portable. If they're not portable and they stay on one site, well, environment is a problem, media is a problem. There are all sorts of things that keep me up at night when I think about that kind of thing.
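For reference, the file-by-file backup mentioned here is the FILE / FILEGROUP form of BACKUP DATABASE; a minimal sketch, with illustrative logical file, filegroup, and path names:

-- Back up a single data file so each piece stays small enough to move.
BACKUP DATABASE Sales
   FILE = N'Sales_Data2'
TO DISK = N'D:\Backups\Sales_Data2.bak';

-- Or back up a whole filegroup at a time.
BACKUP DATABASE Sales
   FILEGROUP = N'Archive2006'
TO DISK = N'D:\Backups\Sales_Archive2006.bak';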
Greg Low: I also have a major preference for write-once media over rewritable media at the lower end. For example, I see people using rewritable DVDs where write-once DVD-R and +R discs would do. They seem proud that they get to reuse the same piece of media again and again, but when I have to recover something, having a stream of write-once media going back one per day dramatically increases my chance of having something usable, compared to endlessly reusing the same piece of media and only having a handful of pieces to choose from.
James Luetkehoelter: Exactly. That segues into another of my mantras: “The longer any single process takes, the more likely it is to fail.” Look at big projects, the data warehousing projects pre-2000. Most of them failed because people wanted to data-warehouse their entire company rather than take it in bits and pieces, so they were huge projects and the majority failed. Same with simple backups: the longer that backup takes to run, the more likely there is a problem. Going directly to tape, or rewriting over the same media, even that increases the time a little, and that increases the chance of failure; plus that media will wear out eventually.
Greg Low: You treat the user as the last category in the scenarios. Are you thinking maliciously? Or a user running something when they shouldn’t? Or both?
James Luetkehoelter: Process and user are similar. For process error, we've definitely got the things we've talked about: the disaster recovery process itself is too complex, or the environment in which you have to implement it is too complex. But I also include a planned or automated process that fails. The classic example of a process error is “delete from table” where you forget the “where” clause and delete everything. There are new tools in 2005 that make recovering from that a lot easier than running to find the server and unplugging it. These days it could be me as a consultant, thousands of miles away, VPN'd in, accidentally doing that, and I can't call somebody to run and fix it. Same thing with an automated process: there's an automated reindex, and something happens during the reindex that damages the disk. There are new tools we can use there too. The user errors are the unplanned ones. Yes, they could be malicious, but the scenario I have in mind is more the order-entry person going “oops, I just deleted our biggest customer,” or an important order. You ask them what the order was, when they did it: “Well, I don't remember.” That's a common thing I run into, where somebody makes a mistake and doesn't fess up to it, or doesn't really realize what they did. Previously we would restore the entire database or, if they were allowed to lose data, restore to the point of the failure. We've got new tools to deal with that as well.
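For the forgotten-WHERE-clause scenario, the classic remedy is a point-in-time restore, stopping the log just before the mistake. A minimal sketch, restoring to a separate copy so the live database stays untouched; the database names, logical file names, paths, and timestamp are all illustrative:

-- Restore the full backup to a copy, leaving it ready for log restores.
RESTORE DATABASE Sales_Copy
FROM DISK = N'D:\Backups\Sales_Full.bak'
WITH MOVE N'Sales_Data' TO N'D:\Data\Sales_Copy.mdf',
     MOVE N'Sales_Log'  TO N'D:\Data\Sales_Copy.ldf',
     NORECOVERY;

-- Apply the log, stopping just before the accidental DELETE was run.
RESTORE LOG Sales_Copy
FROM DISK = N'D:\Backups\Sales_Log1.trn'
WITH STOPAT = '2007-07-24 10:29:00',
     RECOVERY;

-- The lost rows can then be copied back into the live database.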
Greg Low: Interesting point. In many recovery situations, unlike real-world emergencies where the people around you are trying to help you, one of the problems you run into is that the people responsible aren't prepared to tell you what they actually did, and quite often will tell you something very different from what they did.
James Luetkehoelter: Absolutely. And there's the invariable pressure of the IT manager or the CEO standing behind me as I'm working. A habit of mine that infuriates them is to sit down calmly, pull out pad and pen, and write it down: first I'm going to do this, then that… Just to be very clear, rather than taking a shotgun approach. Because it's taking time, they freak out. But I've done the shotgun approach, and I've made mistakes that cost more time. I want to make sure that I don't lose any more data; there are situations where what you do causes you to lose data.
Greg Low: Yes, that’s so critical. I always tell the guys I work with, the number one rule is you do no further harm. You must not do something that makes the situation worse. In many situations, that first action you take might remove possibilities down the track to recover something because you didn’t realize you could.
James Luetkehoelter: From personal experience, I've done it accidentally and lost half an hour of data. This idea of 100 percent uptime is great, but a lot of companies have to choose between being down for 20 minutes because of a problem, or losing 20 minutes of data but staying up. That could be critical for an online retailer, a big one that sells books and lots of other things. They're going to say they don't want to lose data, because they're not only losing orders, they're losing customers for good. So the loss of data is often downplayed. IT managers are driven to think uptime, uptime, uptime, but data loss is often the more important thing to protect against.
Greg Low: I suppose it depends on the industry, but it's a very critical thing. One rule I have whenever I come into a really bad situation is that any copy of the data we've got right now is gold, no matter how bad it is, and you never harm one of the copies you've already got. One action I've seen is someone trying to restore over the only working copy of the database from some unknown piece of media, which turns out not to have a valid backup on it. They may not have been happy with the database they had, but at least they had one. Things can get a lot worse from there in a big hurry.
James Luetkehoelter: Which makes the idea of sitting down with pad and paper and writing it out a little more attractive. My mind jumps all over the place, thinking about different scenarios, going ahead of myself, not thinking step two but I’m thinking step five or six and if I let myself do that shotgun approach I will invariably make mistakes. I know that from experience. That kind of thing is very important and underrated. Even if you have a plan in front of you, when was the plan last updated? Is it accurate? Evaluate the plan. You ask when it was updated and they say last week, but if their boss is standing next to them they don’t want to say they’ve never updated it. You’ve got to assume the worst. Assume you’re alone on an island and nobody is going to help you and you’ve got to figure it out.
Greg Low: Starting with the basic backup and recovery process, what goes wrong most in relation with SQL Server? What are the most common sort of gotcha’s?
James Luetkehoelter: One thing that goes wrong is just getting set up. In SQL 2000 I was never a fan of the Maintenance Plan Wizard, but it got the job done, and for a lot of smaller shops it was easier to work with. But people think “what's my backup plan” and then “what's my recovery plan,” when you have to think of it as one process. Often you need to start with: what's my recovery process? How much data can I lose? How much downtime can I have? How much am I allowed to spend? With backup and recovery the spending isn't usually an issue, but you've got to think about those things because they play into how you set up your backup plan and how you use it to recover. Do you have to recover quickly? Do you have to make sure all of the data is there? From a technology standpoint, whether you're using the GUI or Transact-SQL, some people aren't aware that SQL Server has defaults all over the place, and one of the biggest mistakes I see is “we've got the full backup restored, can you apply the transaction logs later on?” Well, I log in to find they've run “restore database” from whatever file and nothing else. Because “with recovery” is the default, it brings the database online, and I've got to start over and re-restore the whole backup. Even in the GUI that's tucked away on an options screen, and the default is to recover the database. If you're a neophyte it's easy to fall into that default. That's the biggest one: having the database recover itself.
Greg Low: At least it’s not tragic, it just delays. But it’s very frustrating. I think the biggest problem I see is when the DBAs abdicate their backup responsibility to the IT folk and they don’t understand what is required. The number of times people just have an MDF file copied off a live system and go, “Yes, I’ve got the database” is just breathtaking.
James Luetkehoelter: Yes, it is amazing. That particular default, and there are hundreds of defaults that make me mad, but that particular one… I train myself, my staff, and every client that when you're writing that statement, before you even put in the file, put in the clause “with norecovery.” Even when I'm done, I still don't recover the database. I stop and look at it; what a lot of people don't know is that the database will just say “loading,” and you can then say “restore database with recovery.”
Greg Low: Yeah, database name with recovery.
James Luetkehoelter: Yeah, and it just recovers the database. Something as simple as getting into that habit makes a huge difference in how successful you are.
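A minimal sketch of the habit being described, with illustrative database and file names:

-- Restore the full backup but leave the database in the restoring ("loading") state.
RESTORE DATABASE Sales
FROM DISK = N'D:\Backups\Sales_Full.bak'
WITH NORECOVERY;

-- Apply each transaction log backup, still without recovering.
RESTORE LOG Sales
FROM DISK = N'D:\Backups\Sales_Log1.trn'
WITH NORECOVERY;

-- Only when everything has been applied, bring the database online.
RESTORE DATABASE Sales
WITH RECOVERY;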
Greg Low: Is there any GUI option for doing that? I was writing instructions and the only thing I couldn’t work out how to do in the GUI was to just do recovery on a database without actually restoring a transaction log.
James Luetkehoelter: Without restoring a transaction log…you can only do that in Transact SQL because you have to not specify a file…
Greg Low: ...and they won’t let you go to the next screen without a file.
James Luetkehoelter: Yes, which is another default behavior I hate, and it still exists in 2005. All of these still exist: the GUI quietly recovering the database, “with recovery” being the default. Another one I hate is the screen where you put in the files to restore from. I don't know how many times I've seen that screen full of files, someone selects them to restore from, and I have to point out that it's striping across all of those files.
Greg Low: Yes, that’s classic. The number of times people go in and there’s already an entry there and they go to add one and they end up striping the backup across the files. That’s really common.
James Luetkehoelter: Yes, and there's no visual cue that it's happening; that's something you actually have to know. Unfortunately it's a very common problem. And because they think they're separate backups, they delete the old ones.
Greg Low: That’s one of the things to add to the Connect site. There will be a few of those. Before it starts a striped backup in the GUI, it should warn you.
James Luetkehoelter: Or at least name things, or put something visually on the screen: “more than one file,” or “you're adding another file; this will be a striped backup, is that what you want to do?”
Greg Low: Particularly if the files weren’t in the same folder. That would almost never be correct.
James Luetkehoelter: Yes. If someone's fallen into that trap, you're almost guaranteed that not all of those files will be there. I would forgo the striping thing, and if I had to do a backup and restore I'd maybe move to a file-level backup for portability rather than stripe it. It's going to take longer, there are more variables, and something could go wrong. Probably paranoid on my part, but that's kind of who I am.
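For clarity, this is the trap being described: naming more than one device in a single BACKUP statement produces one backup striped across all of them, and every file is then needed to restore. Paths and names are illustrative:

-- ONE backup striped across two files, not two separate backups.
BACKUP DATABASE Sales
TO DISK = N'D:\Backups\Sales_Stripe1.bak',
   DISK = N'E:\Backups\Sales_Stripe2.bak';

-- Restoring requires every member of the stripe set to be present.
RESTORE DATABASE Sales
FROM DISK = N'D:\Backups\Sales_Stripe1.bak',
     DISK = N'E:\Backups\Sales_Stripe2.bak'
WITH RECOVERY;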
Greg Low: Another cause of a lot of problems is backup jobs that don't run, where people aren't aware they aren't running, and the IT folk are endlessly backing up some old backup file from months ago every single time they do a backup.
James Luetkehoelter: Yes. With a SQL maintenance plan there are times when the Agent thinks the job ran successfully but the backup didn't actually run. Or the notification never arrives: three or four years ago there was a vulnerability with the Messenger service. You would go to a website and a popup would say “your system is at risk, did your CD drive open?” and then your drive would open, and that was done via the Messenger service. In every organization I go into now, that service is disabled, so you can't send those alert messages. And with 2000, if you relied on SQL Mail, it has its challenges and may not work, so you may not be notified. Or you set it up and tell someone else to monitor it, but that person may not really be monitoring. Especially when there are lots of databases, it's easy to miss the one that happens to be the key database. That happens all too often.
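One cheap daily check against silently missing backups is to query the backup history in msdb rather than trusting job status or notifications; a minimal sketch for SQL Server 2005 (the tempdb exclusion is just illustrative housekeeping):

-- Latest full backup per database; NULL means no full backup recorded at all.
SELECT d.name AS database_name,
       MAX(b.backup_finish_date) AS last_full_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'D'          -- 'D' = full database backup
WHERE d.name <> N'tempdb'
GROUP BY d.name
ORDER BY last_full_backup;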
Greg Low: James, tell us about where you live, sports? Hobbies?
James Luetkehoelter: With work demands it's hard to have hobbies. I live in Madison, Wisconsin, and our problem is the arctic air blowing down from Canada in the winter. It gets very cold. I can say that since I'm a first-generation Canadian American; my father immigrated from Canada. Otherwise it would be a little rude. If there were a wall or mesh fence to slow it down… but no, it whips across those plains, so by the time it hits us it's well below 0 Celsius, and with the wind-chill factor it becomes dangerous to walk outside for more than five minutes. Interesting place to live. The only natural disaster besides possibly freezing to death getting the mail is tornadoes. It's a nice community, cosmopolitan yet small. I really enjoy where I live.
Greg Low: When you’re not working what’s your passion outside work?
James Luetkehoelter: I’m ashamed to admit it but I am a PC gamer, although I had to cancel my online game accounts because they were consuming so much of my time that my wife wasn’t happy with that. I have to admit that’s one of my guilty pleasures.
Greg Low: I tend to avoid them because I know I could get immersed.
James Luetkehoelter: Don't get started! Once you get started it's a dangerous slippery slope that will bring you nothing but trouble. Another thing I love to do is just go to coffee shops. We might not be number one, but we're in the top three: Madison has one of the highest ratios of restaurants to people of any city in the US, because on any given workday there can be half a million to three-quarters of a million people in the city.
Greg Low: Do you think that’s a function of the cold? That you tend to have so many indoor things and coffee shops and restaurants are just an extension of that?
James Luetkehoelter: No. I think there just end up being so many people during the week, to the point where a restaurant can get by with just five tables. I love to go to coffee shops. I think I mentioned I have a philosophy background; I was a PhD candidate at the University of Wisconsin and I love to argue. Real argument: let's build an argument as to why this is, then the other person attacks yours and you attack theirs… I wish I could do more of it.
Greg Low: Like the Monty Python sketch about arguing?
James Luetkehoelter: Oh yes. That brings to mind the argument sketch: “This isn’t an argument.” “Yes it is.” “No it isn’t.” “Yes it is.” But then there’s also one with a university in Australia and the philosophy department is Bruce, Bruce, and Bruce.
Greg Low: Yes, the Bruce sketch; it's the University of Woolamaloo.
James Luetkehoelter: Yes, Bruce here is in charge of Rasputin philosophy, I’m in charge of existential philosophy and Bruce here is in charge of the sheep dip.
Greg Low: There was one guy who was existential philosophy and the sheep dip.
James Luetkehoelter: Yeah, all they would do is drink beer and toast Australia.
Greg Low: They had the famous line: “This is the Waddle, the emblem of our land; you can put it in a bottle or hold it in your hand. Amen.” And then they’d drink. Having your systems situated in an area where you don’t have many natural disasters is a good start but in terms of other mitigation ideas, of things we have built into SQL Server that help us, I suppose the most widely implemented is clustering.
James Luetkehoelter: Yes, definitely the most widely implemented and I think the least effective. Once servers were unpredictable. Usually with any piece of major hardware if there’s a defect you’ll know within 30 days so you do a 30 day burn-in and we’d be fine. A lot of times you get them now after a burn-in. Today the servers, even commodity servers, are solid and the support and response time is very quick. So what clustering will mitigate is hardware problems with the server. If there’s a problem with the shared storage you’re out of luck. It’s very limited in what it mitigates. But it’s still ubiquitous, and definitely if you can afford it, why not? But it isn’t nearly as needed as it once was. Five years ago my answer would be totally different, but today not so much. I’m sure every hardware clustering vendor just loves to hear me say that. I’m already seeing emails coming in with death threats written out in newspaper letters.
Greg Low: Clustering only really helps when we have failure of one of the server’s bits of hardware and detectable failure. And if we have a shared disk subsystem which we have to have, if there’s a failure amongst that it’s got to be redundant enough by itself to be resilient enough to avoid the problems. The next step is to get the data off-site, and again log shipping was the most common scenario there. But in SQL Server Enterprise Edition 2000, one of the key reasons people bought it was to do clustering. It’s worth noting that Standard Edition 2005 supports two node clustering. That has released a lot of reasons why people were buying Enterprise Edition; however, there are now even better reasons to buy Enterprise Edition. In an upcoming podcast Lara Rebelky is going to talk about the differences in the different editions and specifically why Enterprise matters. The other part of Enterprise was log shipping and it’s something you could have done yourself with a little effort but the log shipping infrastructure helped. It was effectively backing up transactional logs, copying them, and restoring them on another system.
James Luetkehoelter: Yes, definitely the most widely implemented, and I think the least effective. Servers were once unpredictable; with any piece of major hardware, if there's a defect you'll usually know within 30 days, so you'd do a 30-day burn-in and be fine. A lot of the time now they arrive already burned in. Today's servers, even commodity servers, are solid, and support response times are very quick. So what clustering mitigates is hardware problems with the server; if there's a problem with the shared storage, you're out of luck. It's very limited in what it mitigates. But it's still ubiquitous, and definitely, if you can afford it, why not? It just isn't nearly as needed as it once was. Five years ago my answer would have been totally different, but today not so much. I'm sure every hardware clustering vendor just loves to hear me say that; I can already see the emails coming in with death threats written out in newspaper letters.
Greg Low: Clustering only really helps when we have a failure of one of the server's bits of hardware, and a detectable failure at that. And the shared disk subsystem, which we have to have, has to be redundant enough by itself to be resilient against problems. The next step is to get the data off-site, and log shipping was the most common scenario there. In SQL Server 2000, one of the key reasons people bought Enterprise Edition was to do clustering; it's worth noting that Standard Edition in 2005 supports two-node clustering. That has removed one of the big reasons people were buying Enterprise Edition; however, there are now even better reasons to buy it. In an upcoming podcast, Lara Rubbelke is going to talk about the differences between the editions and specifically why Enterprise matters. The other Enterprise-only feature was log shipping, and it's something you could have done yourself with a little effort, but the log shipping infrastructure helped. It was effectively backing up transaction logs, copying them, and restoring them on another system.
James Luetkehoelter: Yeah, with 2000 that's the Enterprise Edition you're referring to, correct?
Greg Low: Yeah.
James Luetkehoelter: And it was carried out by my old friend, the SQL maintenance plan.
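Greg Low: That is worth noting. In 2000 you set that up via the Database Maintenance Plan Wizard; fortunately, in 2005 they've moved it into the database properties window.

James Luetkehoelter: In 2005 it's supported further down, even to Workgroup Edition, and after you've configured it the actual work is done by stored procedures rather than a black-box exe, which is what scared me before. One thing you're right about with previous versions is that it's something you can set up yourself. I remember back on Oracle 7.3.4; it really is a manual process that you're automating: back up a log, have another server that is in no-recovery mode, copy the log over there, and restore each one with no recovery. Another thing I like about it over using replication to get a stand-by server is that I know exactly the state of that server, how much data is there, how much would be lost. If I'm using merge replication there's no way I can know that; there's no transactional guarantee. With transactional replication you could figure it out, but it would take digging into the distribution database and harder work. I like the simplicity of it. And in 2005, in the GUI itself, in the database properties window, you can specify a number of locations, so you can have multiple stand-bys anywhere.

The manual process being automated boils down to this; a minimal sketch with illustrative names (the file copy between servers happens outside T-SQL, by whatever scheduled mechanism you trust):

-- On the primary: back up the transaction log.
BACKUP LOG Sales
TO DISK = N'D:\LogShip\Sales_20070724_1015.trn';

-- (Copy the file to the secondary server.)

-- On the secondary: apply it, leaving the database able to accept further logs.
RESTORE LOG Sales
FROM DISK = N'E:\LogShip\Sales_20070724_1015.trn'
WITH NORECOVERY;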
Greg Low: Yeah, it lets you set up multiple groups.
James Luetkehoelter: I can ship one to your site and another to your house.
Greg Low: Another thing I like with the configuration of the secondary too in the GUI is it does give you the option of putting a delay in restoring the logs.
James Luetkehoelter: That's one of the things I really like about the GUI, because the manual process scares people a little: first you have to do this, then this, then this… and for those who aren't familiar with it, there's figuring out how to copy the files over; do I have to FTP them because it's on a different network? The GUI makes it much more approachable. And from a mitigation standpoint, if you're not doing it, why aren't you? Unless it's a huge database, in which case you probably have options more robust than log shipping, it eliminates any kind of environment error. In my case, if I have a server in Wisconsin that's hit by a tornado and embedded in a tree, and I'm shipping logs off to a different location, say Australia, I'm completely protected against that. Or the server room overheats, or that specific hardware dies, or the media dies. Even a process or user error: if you can catch it quickly, within whatever your interval is, say fifteen minutes, you can stop it and not restore that particular problem.
Greg Low: That's one of the reasons I mentioned that delay. You can now configure it so you stay, say, two hours behind. That gives you additional options should somebody do something tragic: rather than going back to square one and starting restores, you have another copy of the database that's fairly recent but not completely up to date. That is one of the few places the WITH STANDBY option is useful compared to WITH NORECOVERY. WITH STANDBY gives you a read-only copy of the database that's still in restoring mode, and that's mainly useful when I've got a delayed copy and want data out of it from before the problem occurred.
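A minimal sketch of the WITH STANDBY form being discussed, with illustrative paths; the undo file is what lets the database stay readable between restores:

-- Apply a log backup but leave the database in read-only standby mode.
-- Uncommitted transactions are rolled back into the undo file so the
-- next log restore can roll them forward again.
RESTORE LOG Sales
FROM DISK = N'E:\LogShip\Sales_20070724_1015.trn'
WITH STANDBY = N'E:\LogShip\Sales_undo.dat';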
James Luetkehoelter: One thing I've found is that you can make it do double duty if you've got a central office and satellite offices, it's a small database, and everybody wants to report off it but only the central office writes to it. You've got stand-bys all over the place and they're readable. There's more hardware required at the satellite offices than is strictly necessary, but it is an option you have. With some other SQL Server technologies you don't have the option of reading the copy directly.
Greg Low: One of the problems I’ve had trying to use it as a reporting server, the stand-by one, is usually a clash between running really long reports and restoring the logs.
James Luetkehoelter: Yes. That's because when you restore the logs, you're going to kick everyone else out while doing it.
Greg Low: So if you have the logs restoring every couple of minutes, the number of situations where you can actually use it as a reporting server is hardly worth mentioning.
James Luetkehoelter: Good point. I have seen it used that way, but definitely some overhead that the local offices would have to do. And they would do that. They would pause things while somebody’s running a report.
Greg Low: That leads us into snapshots, which are another option for having something to report against that isn't a moving target like the live database. The other thing we can use them for is recovering data: if we created periodic snapshots, we could get a view of the data from before the problem happened and maybe fix the problem with T-SQL commands.
James Luetkehoelter: True. Having that point-in-time view of the database is one of the crowning jewels of 2005, especially since, the way they're constructed, snapshots take up very little space to start with and only grow as things change. You get that point-in-time view, and in most user-error scenarios you can figure out what might have changed, although there are scenarios where it can be extremely difficult: the one where the person's not sure what they deleted, and having to compare the two databases can be a huge process.
Greg Low: It sure helps with a huge database when somebody says they’ve dropped a table and being able to pull the table back out.
James Luetkehoelter: Yeah, a big improvement on what we used to have to do: restore the whole database from a backup and hopefully find what we need in there. That just wasn't practical. Another thing I love about database snapshots is for the process-error idea, a planned or automated activity that could go wrong. Any time I'm doing an update to database code, tables, or anything like that, I take a snapshot right before. This is a big plus for Enterprise. Having that, from an administrative standpoint, is gold to me.
Greg Low: Because we can do restore database from snapshot.
James Luetkehoelter: Right, and it's extremely quick as well. My maintenance window is very small and my process takes up most of that time, so being able to restore back from the snapshot matters. I refer to it as snapping back.
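A minimal sketch of the snapshot-before-change pattern and the snap back, with illustrative names; NAME must match the source database's logical data file name:

-- Take a point-in-time snapshot before the risky maintenance.
CREATE DATABASE Sales_PreChange
ON ( NAME = Sales_Data,
     FILENAME = N'D:\Snapshots\Sales_PreChange.ss' )
AS SNAPSHOT OF Sales;

-- If the change goes wrong, revert ("snap back") to the snapshot.
-- Reverting requires that this is the only snapshot of the database.
RESTORE DATABASE Sales
FROM DATABASE_SNAPSHOT = 'Sales_PreChange';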
Greg Low: I find it’s incredibly useful for unit testing which is always very difficult. Getting the database into a known state before every test is very hard so being able to create a snapshot, do some work, check the results, and restore from snapshot, it’s just beautiful.
James Luetkehoelter: Definitely. All levels of the environment as well. Maybe it’s a sandbox mode and they’re testing out techniques they might use and we can put it back to where it was.
Greg Low: I've seen people try to do it with nested transactions, but it's not the same; a snapshot is wonderful for that. I also use it extensively for end-of-period reporting. I see people get to the end of a period and then want to do a huge amount of reporting before the next morning, and being able to take a snapshot and run reports against that while people continue using the database is wonderful.
James Luetkehoelter: Another scenario I’ve run into is when somebody’s database should have an audit functionality on certain tables, but doesn’t. So they’ll go from week to week and somebody’s sales might change, historically change, and it’s very difficult to tell what’s going on, and is that accurate, so it’s good to be able to run a snapshot once a night and keep them.
Greg Low: What about mirroring? How do you think that fits?
James Luetkehoelter: Mirroring, again, I think is the crowning jewel if there is one. Both database mirroring and snapshots have their problems, but they're definitely fixable and they will be fixed. This is why I focus on SQL Server: with Microsoft products, if it's not exactly right the first time, they'll get it right. Mirroring I love, but people often over-estimate its capabilities. I've seen people try to run a synchronous mirroring session between Wisconsin and China through a VPN tunnel, and they wonder why it's so slow to enter anything locally. Well, it's synchronous; it's a two-phase commit. You get automated failover and a separate copy of the data, so it's like log shipping on steroids. And if you're clever as a developer and you're using the SQL Native Client, you can capture the error that the principal wasn't available and seamlessly repoint the client to the mirror, so the perception of availability is there even if you've lost one server. It may be a little slower given the distance… but things like that really make it almost a must for certain organizations. If you don't have the funds to do geo-located, auto-synchronized SANs and throw hardware at the problem, mirroring is the one, especially from a user-experience standpoint.
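The client-side redirection described here is driven by the failover-partner keyword in the connection string; a sketch with illustrative server and database names. The exact keyword varies by provider ("Failover Partner" in ADO.NET SqlClient, "FailoverPartner" in the SQL Native Client OLE DB provider, "Failover_Partner" in its ODBC driver):

Data Source=PRIMARYSRV;Failover Partner=MIRRORSRV;Initial Catalog=Sales;Integrated Security=True;

If PRIMARYSRV is unreachable, the provider retries the connection against MIRRORSRV, which is what lets a well-written client ride through a failover.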
Greg Low: As a final thing, what’s your feeling on selling disaster recovery planning?
James Luetkehoelter: I mentioned this at the beginning: you can't cost-justify it to the financial person. Then you're the actuary and they want your probabilities. In New York, who could have predicted that both World Trade Center towers would be hit? Or in New Orleans, that the levee would actually break? New Orleans is like being in a fishbowl: you look up and you see the sea. But how do you predict that? What I stress is that you have to approach it as an everyday job, part of the DBA's daily responsibility: checking the tapes daily and reviewing what you're doing every so often. It's also the DBA's responsibility to make sure the business unit understands the implications of what you're doing. That's often lost.
Greg Low: It just needs to be standard planning, that’s right. Nothing special.
James Luetkehoelter: Yeah. You could say, “if we don't do this, something horrible might happen.” I saw colleagues use that approach to sell products after Katrina, after 9/11. That will fail; the scare approach, the doomsayer thing, doesn't help. It's not going to get them where they need to be. Another way to sell it is to start very slow. In most cases you don't need to request anything new; you're going to discover little things you could already be doing. You're not storing tapes off site? Rather than paying somebody to do it, if you carry one out the door in your hand, that's something. Take things in small steps and demonstrate them. Then when you need to move to the next level, you can say: this is what we've been doing to protect against these things, and there are certain other scenarios we would also like to protect against. It's usually more effective from the bottom up rather than from the top down. The top says, we're putting in a DR plan; any time I hear “DR” I'm worried, because that means big money. That kind of thing.
Greg Low: That brings us to about time. Where will we see you, what’s coming up in your world?
James Luetkehoelter: Definitely speaking at PASS in September in Denver. If you want to argue with anybody that’s where to find me. I’ll probably be speaking at DevTeach/SQLTeach in Vancouver in November. And possibly a couple of others but those are the two I know about for sure at the moment. Otherwise you’ll see me at an airport running by saying “I don’t have time to talk to you, bye!” That’s been my life lately, unfortunately. I would like to change it a little bit but this is what I’m doing and I like what I’m doing.
Greg Low: Thank you so much for your time today James, and I will see you in September.
James Luetkehoelter: Yes. I look forward to it.