Greg Low: Introducing Show #4 with guest Bob Beauchemin.
Greg Low: Our guest this evening is Bob Beauchemin, who is the new director of developer skills at SQLskills, the company that Kimberly Tripp has been running for the last few years. Bob has over 25 years' experience as an architect, programmer, and administrator of datacentric distributed systems. He is the co-author of the book “A First Look at SQL Server 2005 for Developers,” the author of “Essential ADO.NET,” and has written articles on data access and databases for major technical publications and online portals. Welcome, Bob.
Bob Beauchemin: Welcome to you, too.
Greg Low: The first time I came across Bob was on the SQL Server 2005 Beta newsgroups particularly in the CLR areas, finally met him when he came down to present the Ascend training materials in Australia. First up, how did you get involved with SQL Server in the first place?
Bob Beauchemin: Well I was involved with SQL Server and databases way back when SQL Server and Sybase were together. Became involved with SQL Server 2005 more than other SQL Server releases but I’ve been dealing with it ever since the beginning. SQL Server 2005 I’ve been involved with since 2002 when I was asked to write a book on it with Niels Berglund and Dan Sullivan, so I’ve been more involved with that release than any other.
Greg Low: That book, “First Look at SQL Server 2005 for Developers,” I must admit is very thick and a very detailed discussion of the material. Personally, I found it really entertaining to read and I thought it gave really detailed coverage of the topics. How did you break up the writing of the book between the three of you?
Bob Beauchemin: We picked the topics that we liked and knew best.
Greg Low: One of the problems with writing books at the early stages is things that end up changing. Particularly thinking of things like object spaces.
Bob Beauchemin: The object spaces thing was funny. Object spaces was postponed the day after we shipped the final copy of the book so it was just one of those coincidences. It’s difficult to write a book on a moving target like that. I was thinking of that as I was revising the book. It’s easier this time because things aren’t changing daily or weekly. It was a lot of work. We rewrote that book three or four times total and some parts of it were rewritten quite frequently. It was hard but it was fun to do.
Greg Low: I was quite impressed with the amount of detail of how and why things worked. You must have had fairly close cooperation with the Microsoft teams?
Bob Beauchemin: That was one of the nice things. Everybody was really good about it too. They said we would have access to the team members and I live in Portland, Oregon, pretty close to Redmond, and every time I came up and visited folks I would say how sorry I was to break in on their day. They always felt it was an important part of the product to get information out about it so everybody made me feel welcome and was happy to see me.
Greg Low: In general I’ve found Microsoft very open with this release in terms of providing details. They certainly seem very prepared to provide details to anybody who is really interested.
Bob Beauchemin: If you’re going to start that early they sort of have to do that. And that’s why I was happy to do it.
Greg Low: In terms of CLR integration and managed code, which is the main topic for today, could you describe why you think it’s an important aspect for the database in the first place, given that a number of DBAs are concerned about the whole concept? I suppose the important thing is: why do we want to do it in the first place?
Bob Beauchemin: Well, the biggest reason is that Transact-SQL, which is the procedural language around the SQL language itself, is an interpreted language, so it lends itself really well to datacentric things but doesn’t lend itself well to writing long and involved mathematical computations. People wonder why you would want to do that in a database, but you have to remember, too, that if you fetch a lot of data out to a middle tier, things are going to slow down quite a bit. So one of the major reasons to have the CLR in the database is for math-centric programming.
Greg Low: One thing raised is if SQL Server really should be an application server given the fact that also we can now expose stored procedures as web services and so on. It really is raising the whole discussion of how much of the business logic really has a place in the database at all.
Bob Beauchemin: That’s true. I like to think of it as giving people more options. You can do these things in the database or you can do them outside of the database. Based upon what I call locality of reference, it gives you the option to do things where it makes the most sense. It doesn’t make sense for every case or problem, but it’s good to have options.
Greg Low: What about the idea that at least then you can use the same programming language from the client right through the middle tier down to the database. Any significance there?
Bob Beauchemin: I’ve heard that too, and there are a couple of different ways to approach that concept. One is to not think that being able to do the procedural part of your programming in .NET managed code means that the programmers are absolved from knowing the SQL language. The real language of a relational database is SQL, and Transact-SQL adds the parts that go beyond SQL, like flow of control and things like that. There’s a difference between being able to use managed code and being absolved from using SQL. So that’s the first thing. The second thing is it is possible now, and even more so with the merged providers, to use not only the same language but even the same kind of code. Originally the in-server provider was specific to SQL Server and was highly optimized for it, and it was called the SQL Server provider.
Greg Low: Yeah, this was System.Data.SqlServer as opposed to System.Data.SqlClient.
Bob Beauchemin: Yes, exactly. And because it was optimized it had so different of a programming model that it was difficult to see what this would buy you except for the fact that you could use the same language. With the merged provider, this SQL client that can be used in the server, it does have some optimizations for the server in it, but the programming model was so similar that even data access code could conceivably be similar in the client and server. With mathematical computations the code can be almost identical.
Greg Low: I suppose one of the possibilities is it does give you a chance to share some assemblies between other parts of your code where there may be function libraries, etc?
Bob Beauchemin: I talked to members of the team about that and they said not to tell people you can do that, because people will think there’s a switch in Visual Studio to run the code on the client or on the server. I had to promise to tell everybody that there is no such switch, but you could certainly do it. There’s a Microsoft example that shows a Wittener-Wagoner algorithm, and it could be implemented on the server, on the middle tier, or even on a really powerful client. Most people would do it with middle-tier servers, but the problem is there is so much data passed around the network to do these computations that sometimes it’s better to do it on the server itself. That code runs 40 to 50 times faster written in managed code than written in Transact-SQL. That’s an example of how you could do things in different places depending on where it makes the most sense.
Greg Low: Indeed. I suppose the other thing it has introduced are a couple of new objects we didn’t have at all in T-SQL.
Bob Beauchemin: You mean like SqlContext?
Greg Low: I think also at the level of things like the new aggregates, for example.
Bob Beauchemin: Oh, yes, the user defined types and user defined aggregates. Especially the user defined types are really misunderstood. When I first saw user defined types back in the SQL:99 standard, I thought they were meant to make SQL Server an object oriented database. I was told in no uncertain terms it was not meant or optimized for that; it was meant to add new scalar types to the system, things like a time duration data type that SQL Server doesn’t have natively. The user defined aggregates are really interesting because the engine is so aware of those things that it can even do the aggregation on multiple threads. There’s a method on the aggregate called Merge where, if the aggregation is being done on multiple threads, the partial results are merged back together; that’s how aware the engine is of the user defined aggregates. Those are things that are brand new and haven’t been in SQL Server before.
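[As an illustrative sketch, not code from the show: a minimal user defined aggregate with the Merge method Bob describes. The geometric-mean logic and all names here are invented for the example.]

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

// A minimal user defined aggregate (geometric mean, as an illustration).
// Format.Native lets SQL Server serialize the state fields itself.
[Serializable]
[SqlUserDefinedAggregate(Format.Native)]
public struct GeometricMean
{
    private double logSum;
    private long count;

    public void Init() { logSum = 0; count = 0; }

    public void Accumulate(SqlDouble value)
    {
        if (value.IsNull || value.Value <= 0) return;
        logSum += Math.Log(value.Value);
        count++;
    }

    // Called when the engine has split the aggregation across threads,
    // as described above: partial states are folded back together.
    public void Merge(GeometricMean other)
    {
        logSum += other.logSum;
        count += other.count;
    }

    public SqlDouble Terminate()
    {
        if (count == 0) return SqlDouble.Null;
        return new SqlDouble(Math.Exp(logSum / count));
    }
}
```

Once catalogued, such an aggregate is usable like any built-in one, e.g. `SELECT dbo.GeometricMean(Price) FROM dbo.Products`.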
Greg Low: Maybe we can start with what you can build using managed code?
Bob Beauchemin: Okay. There are three things you can build in either Transact-SQL or the CLR, and those are stored procedures, user defined functions, and triggers. There are two things you can only build with the CLR, which are user defined types and user defined aggregates. You build those things as methods and classes in assemblies, then you put your assemblies inside the database and run DDL. Part of the DDL has a special clause called EXTERNAL NAME, and that’s a three part name that, in user defined functions for example, refers to the assembly, the class, and the method within the class you want it to correspond to. The SQL Server engine executes that just like it was a normal user defined function.
Greg Low: I’ve found with people coming from the VB side of things, the concept that that class is the middle part of that name actually needs to include the name space as well if there are any name spaces involved.
Bob Beauchemin: Yes. The second part of the name has to be the class name, and the class name can be a compound name that includes namespaces, with dots in it. The separator for the three parts of the three part name is also the dot, so if you include a namespace you have to surround the second part of the name with square brackets just so you can distinguish what class it is. It has to be the entire class name, and the entire class name does include the namespace. Visual Studio gets around this by not putting your class inside a namespace at all, so you don’t have to worry about it. There is also the interesting thing about case sensitivity. Everything CLR related is case sensitive, and this might throw some VB programmers at first, but that’s just the way they decided things, because I think they didn’t want to have to worry about whether the database itself was case sensitive and so on. Same thing with Service Broker. Service Broker objects are case sensitive, and that’s because Service Broker talks between databases, or can talk between instances, and they didn’t want to have to worry about whether the one they were talking to was case sensitive, so they just decided to have all Service Broker objects be case sensitive. The same way, all external names are case sensitive, and not just external names but anything to do with the CLR.
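[A sketch of what this looks like in practice; the assembly, namespace, and function names are invented for illustration, not taken from the show.]

```csharp
using System;
using Microsoft.SqlServer.Server;

namespace MyCompany.Utils
{
    public class MathFunctions
    {
        // Exposed to SQL Server as a scalar user defined function.
        [SqlFunction]
        public static double Hypotenuse(double a, double b)
        {
            return Math.Sqrt(a * a + b * b);
        }
    }
}

// After CREATE ASSEMBLY MathLib FROM '...path...', the binding DDL quotes
// the namespace-qualified class in square brackets, as discussed above,
// because the dot is also the separator of the three part name:
//
//   CREATE FUNCTION dbo.Hypotenuse(@a float, @b float) RETURNS float
//   AS EXTERNAL NAME MathLib.[MyCompany.Utils.MathFunctions].Hypotenuse;
```

Note that `MathFunctions` alone would not work as the second part of the name; it must be the full `MyCompany.Utils.MathFunctions`, bracketed, exactly as Bob describes, and with the CLR's case sensitivity preserved.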
Greg Low: Maybe if we talk about the different types of objects. Starting out with functions, what can we do with functions?
Bob Beauchemin: User defined functions are usually used to have some kind of complex mathematics or business rule because the SQL specs state you really can’t do anything to, or change the state of, the database inside of a user defined function. You can’t update things, for example. The user defined functions were introduced in SQL Server 2000 and the folks that I’ve spoken to think that the user defined functions will be one of the best fits for CLR. In SQL Server 2005 the user defined functions got faster in T-SQL but they’re faster still in CLR. That’s probably the best use for it, in user defined functions.
Greg Low: In talks they’ve emphasized the fast performance of managed user defined functions.
Bob Beauchemin: Yes, even with respect to the fact that T-SQL user defined functions have become faster in 2005, CLR is still faster.
Greg Low: We can also do table valued functions?
Bob Beauchemin: Oh, yes. Table valued functions changed recently. You used to have to implement this fairly long, complex interface called ISqlReader, and in the last couple of beta releases they changed it so that you implement two methods. It’s much better for writing code and much clearer what it is you’re writing. We had an example in our “First Look” book that Dan Sullivan wrote, and that example took 420 lines of code to write a user defined function that produced Fibonacci sequences. I just rewrote that in the latest release and I think we’re down to less than 40 lines of code, so it’s a big code saver, the way they chose to expose them.
Greg Low: That was because you had to implement the whole of ISqlReader and there were just so many methods, whether you were using them or not.
Bob Beauchemin: If you look at Dan’s example you can see half of them say things like “here because it has to be,” or just “return null,” something like that. This is a lot easier. If you only expose integer columns in a user defined function, what would you do with a method called GetString, or a method called GetDouble?
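[A sketch of the newer two-method pattern being discussed, using the Fibonacci example mentioned above; the exact class and column names are invented for illustration.]

```csharp
using System;
using System.Collections;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class Sequences
{
    // The two-method pattern: one method returns an IEnumerable of rows,
    // and a separate FillRow method cracks each item into output columns.
    [SqlFunction(FillRowMethodName = "FillRow",
                 TableDefinition = "n int, value bigint")]
    public static IEnumerable Fibonacci(int count)
    {
        long a = 0, b = 1;
        for (int i = 0; i < count; i++)
        {
            yield return new long[] { i, a };
            long next = a + b;
            a = b;
            b = next;
        }
    }

    public static void FillRow(object row, out SqlInt32 n, out SqlInt64 value)
    {
        long[] pair = (long[])row;
        n = (int)pair[0];
        value = pair[1];
    }
}
```

Compared with implementing every method of a reader interface, only the columns the function actually produces appear here, which is why the line count drops so dramatically.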
Greg Low: Indeed. What about stored procedures?
Bob Beauchemin: Stored procedures are the most interesting, because stored procedures usually wrap data access, and T-SQL can still beat the CLR for data access. So if all a stored procedure does is wrap one access to the database, it’s probably best written in T-SQL. If, however, a stored procedure not only calls to the database to get data but also does computations or permutations on that data, it might be a candidate for the CLR. I tell people not to blindly rewrite everything in the CLR, but you do want to take your most complex stored procedures, the ones with a lot of procedural code in them, write one or two of those in the CLR, and see if you get any performance benefit at all. I remember at the beginning of the Ascend program there was a guy who said, at the end of the CLR part, that he was really glad to be done with the CLR stuff since he didn’t see any use for it in his application. A couple of months later I got mail from him saying that on a whim they had rewritten a stored procedure that used data and computation in the CLR and it was 4 to 5 times faster, so thanks for suggesting it. I don’t remember suggesting any specific feature, just telling people how things worked, but it was nice to hear. That’s the strategy I think people should use for stored procedures.
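[An illustrative sketch of the “fetch, then compute” shape of procedure being described, the kind worth trying in the CLR. The table and column names are invented; the connection string and attribute are the standard SQLCLR ones.]

```csharp
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public class Procs
{
    // The context connection reads rows in-process, the involved
    // computation happens in managed code, and results go back to the
    // caller through SqlContext.Pipe.
    [SqlProcedure]
    public static void SendAdjustedTotals()
    {
        using (SqlConnection conn =
               new SqlConnection("context connection=true"))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "SELECT CustomerId, Total FROM dbo.Orders", conn);
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // ...some involved per-row computation here, the part
                    // that is slow in interpreted T-SQL...
                }
            }
            SqlContext.Pipe.Send("done");
        }
    }
}
```

If the body were only the SELECT with no computation, the advice above applies: leave it in T-SQL.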
Greg Low: Outstanding. The other similar one we can do today is triggers.
Bob Beauchemin: Triggers are interesting because in most triggers you don’t want to return a result set or do anything specific with data inside of them. You just want to see if certain rules are being followed, so the trigger doesn’t really have that compelling a reason to be written in the CLR. There have been triggers for writing to a message queue, for example, and if you want to access external things inside of a trigger, things external to the SQL Server instance, you might want to, but the trigger is probably the most difficult one to motivate among all of the CLR functionality.
Greg Low: I think that DML triggers like insert, update, delete triggers tend to be datacentric and will still be best done in T-SQL even if that involves calling a user defined function to do something more complicated. But I suspect DDL triggers will be more text processing and may lend themselves much more to managed code. Do you think that’s reasonable?
Bob Beauchemin: That’s definitely possible. With the DDL trigger you get that event data that’s a chunk of XML and you might want to use the XML processing either in T-SQL or in the CLR with maybe System XML to process that. That’s probably the best case for them. I think you’re right about DML triggers, though.
Greg Low: I was talking to the product group about how I would really like to be able to modify the DDL itself in the DDL trigger but today you only get a read-only copy via the event data. In standard DML triggers we can physically modify the data while we’re in the trigger but in a DDL trigger we don’t have the ability to do that. I got the impression it would be very messy to give you the ability to do that directly but they were discussing maybe an option to provide an “instead of” trigger instead of doing that. I think it would be nice to be able to do things like, if I just loosely type into a query tool “Create Proc, …” I would love it to be able to nicely format it before it puts it into the catalogue. It strikes me that all that could be done via a DDL trigger. Or you could enforce naming standards, which you can do without modifying it since you could just roll back. I think there’s a whole lot of text processing, almost like code macros, which you could do in the client tool but it would be interesting to be able to do it in a trigger.
Bob Beauchemin: I remember the DDL trigger you wrote in the class, when you got so tired of watching me put things in the master database by mistake, to keep me from putting anything inside master. Class is probably the only place you would do anything like that.
Greg Low: We’ll take a break now and after we’ll talk about new types of object and database security and stability.
Greg Low: We should talk about the new objects you can build. First up we should tackle aggregates since they are brand new. What are your thoughts on user defined aggregates and why they are important?
Bob Beauchemin: So an aggregate itself is just a function that takes a set and returns a single value. There are a bunch of aggregates inside SQL Server as it stands now. I counted about 13, and there are five aggregates required by the SQL spec itself, but a lot of people like to do things related to statistical processing inside the database, and a lot of those require aggregate functions. One of the nice things about user defined aggregates is that if you wanted a special aggregate function inside the database before this, you had two choices. You could write the function in T-SQL, and then the engine would have no special knowledge that it’s an aggregate function, or you could write a message to sqlwish. With user defined aggregates and .NET you don’t have to wish. If you want an aggregate badly enough you can write it yourself, and the engine is aware of it. User defined aggregates in managed code seem to be something only SQL Server supports; it’s not part of any other database.
Greg Low: Interesting. I’ve seen examples where the ability to write aggregates has allowed you to take cursor based code and collapse it down to a set based operation where you have control over the aggregate yourself. Also interesting is if you’re not happy with some min-max, etc. just the fact that you can build your own if you need to. For example, if you want “average” and you want a different version, whether or not it deals with null values, you can then have control over that sort of thing.
Bob Beauchemin: I’m not sure about that one. T-SQL has ways to solve that specific problem. There are aggregates, especially in respect to user defined types that you may not have directly supported inside SQL Server and it might be a good place to do that. When we were looking for aggregates to write for our book, I noticed that since the Oracle database doesn’t have a specific analysis services server, they do a lot of OLAP processing inside the database so they have many more aggregates exposed inside the database in their language than SQL Server does. If you wanted some of those aggregates or were converting an Oracle application into a SQL Server application this would be a really good use for them.
Greg Low: That’s really good. So if you’re doing Oracle migration this is another area where this might help you do that if you’re dependent upon aggregates that are part of that product but not part of SQL Server.
Bob Beauchemin: It makes operations with databases that do things differently a little easier.
Greg Low: I should say where we now have user defined data types you can now define aggregates that apply to those data types that are domain specific.
Bob Beauchemin: My simple “Hello World” example is the Point type, and then you can use the aggregate to add points together. There is a limitation to the user defined aggregates and user defined types in that they can only save state up to 8,000 bytes in this release. That makes it trickier to write some of the aggregates you might write by saving every value, for example.
Greg Low: That leads us into user defined data types. As you said, this is not intended to be like the complex types in the ANSI spec; it’s really intended to extend the scalar type system. It’s interesting that all the best examples really are extending the scalar type system. Nearly every example I’ve seen from Microsoft has things like point and triangle and so on.
Bob Beauchemin: Well, multi-value is different than non-scalar too. They can be multi-value, but, for example, if a point represents exactly one point on the earth or one point on a two dimensional plane, that really is something that represents one fact. I always start out with Dan’s example of the “date” data type. The date actually has three different values. When people think of dates they think of month, day, and year. You wouldn’t likely store month, day, and year in three separate integer columns inside a database, and in fact a date is usually represented by a number of seconds since some magic date, so although a date has multiple values, date really is a scalar data type. Nobody ever thinks of storing a date as three integers because the SQL people gave us a standard date data type. They didn’t give us, for example, a standard data type for a point on the globe or a position on the earth. So if you want to use those domain specific scalars in your application, you can do that with a user defined type. What you shouldn’t think of doing with a user defined type is taking a person table and making it into a person object. It just wouldn’t scale or be what you want at all.
Greg Low: The other reason for that is also the lack of some of the other object oriented aspects. A lot of people don’t realize, for example, about the lack of inheritance. If I build a stored procedure that takes animal, I can’t pass dog to it.
Bob Beauchemin: Right, although you can conceivably get around that just by implementing every method and delegating to the base class. But again, that’s not what people think of when they think of objects and object oriented programming.
Greg Low: The other one was overloaded methods.
Bob Beauchemin: Overloaded methods aren’t supported at all. You can’t define any of them.
Greg Low: I think those two do tend to also help preclude use as a general purpose object-oriented database. But who knows, it may change in the future.
Bob Beauchemin: There’s nothing that prevents you from using inheritance or overloads in your own internal implementation as long as you don’t try and define those to T-SQL. It’s just that T-SQL has nothing in its catalogue to say “this class derives from this class,” or “this is really an overload of that method.” It doesn’t do the resolution at run time of an object oriented system.
Greg Low: This allows us to build variables, columns, and tables using these user defined types. The next thing you can normally do with a type are comparisons: less than, greater than, equality, things like that. What happens there?
Bob Beauchemin: That’s a really interesting story. Normally, the way to implement that kind of comparison on a .NET object is to implement IComparable. In the early days they would let you implement IComparable and they would use your implementation, but they were concerned about poor implementations. Imagine an “order by” in which, halfway through, one value compared as greater than another, and then further on it compared as less than it. That could have all kinds of consequences for the engine. So they decided not to go by your implementation of IComparable inside the engine. If your type is to be comparable as far as the engine is concerned, it has to be binary comparable. You can use tricks to make it binary comparable, like putting a key at the beginning. Or you can use a special serialization called native serialization, and .NET will take care of it for you.
Greg Low: Where it does a byte by byte comparison.
Bob Beauchemin: Yes. Even though the server doesn’t use IComparable, you always want to implement it if you want to use that type on a client, because people might take bunches of these types out to the client or the middle tier and do some computations. If you do that, IComparable is your friend; it’s just that inside the server it isn’t used at all. There’s also been a long discussion about nullability and how IsNull works and is a little different from the SQL null.
Greg Low: We should mention that your types are required to implement INullable. They have to be null aware.
Bob Beauchemin: Yes, and they have to implement Parse. Other databases allow arbitrary classes as user defined types and don’t have those requirements, so what you see is people dumping big parts of the base class libraries into the database as user defined types. But in SQL Server they have to implement nullability and the ability to parse a user defined type from a string. This is because SQL Server did not implement a new operator. The only ways you can get a new instance of a user defined type are to have a static method that makes one, a creator method maybe, or to parse it from a string.
Greg Low: So you have to have a default public constructor that doesn’t take any parameters, and effectively you’ve also got to implement INullable. One discussion is the concept of the IsNull method. The question is whether SQL Server would ever call it anyway, since it knows when the object is null. So it wouldn’t physically grab an object and then call IsNull.
Bob Beauchemin: That’s a big topic of discussion right now, because the IsNull method, I always thought, was meant to indicate whether or not an instance of a data type was null. The big implementers of it have been the System.Data.SqlTypes, where they are all single valued structures. There is also a special property called Value, and IsNull indicates that that value is null. There’s a one-to-one correspondence, so it also indicates whether or not the instance is database null. But because of some of the optimizations inside the server, you can actually get the value null back if you try to invoke IsNull on the server. This is very strange, because IsNull returns a Boolean, which has two values: true or false. It’s not a SQL Boolean, which has three values: true, false, or null. But you can get that null on the server. It is strange, and there’s a discussion going on about it right now.
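[Pulling the requirements from the last few exchanges together in a sketch: INullable, a static Parse, a parameterless constructor, and native serialization for binary comparability. Everything here, including the “x:y” string format, is an invented illustration.]

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

// A minimal Point UDT showing the contract discussed above. Format.Native
// with IsByteOrdered makes the type binary comparable, so the engine can
// order and index it without calling IComparable.
[Serializable]
[SqlUserDefinedType(Format.Native, IsByteOrdered = true)]
public struct Point : INullable
{
    private bool isNull;
    private double x;
    private double y;

    // Required: the engine asks the instance whether it is null.
    public bool IsNull { get { return isNull; } }

    public static Point Null
    {
        get { Point p = new Point(); p.isNull = true; return p; }
    }

    // Required: there is no "new" operator in T-SQL, so instances are
    // created by parsing a string (the "x:y" form is arbitrary here).
    public static Point Parse(SqlString s)
    {
        if (s.IsNull) return Null;
        string[] parts = s.Value.Split(':');
        Point p = new Point();
        p.x = double.Parse(parts[0]);
        p.y = double.Parse(parts[1]);
        return p;
    }

    public override string ToString()
    {
        return isNull ? "NULL" : x + ":" + y;
    }
}
```

A struct gets the required parameterless constructor implicitly, which is one reason the Microsoft samples tend to use structs for UDTs.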
Greg Low: Another issue is the SQL types versus the native types, implicit conversions, and if you should be writing your code using SQL types and controlling your own type conversions and casting rather than letting SQL Server do that implicitly?
Bob Beauchemin: SqlTypes are meant to get around the fact that the other way you determine whether things are null on the client is that null values from the server come back as a special class called DBNull, and you test against DBNull.Value. The strange part about DBNull, the class, is that it can’t be cast to any of the other classes. You can’t make it a null in the client; you have to replace the entire object with a different class. SqlTypes were meant originally to get around that issue, because that turns out to be a lot of different handling in the code. With SqlTypes you just have to look at the IsNull property and see if it’s true or not, and then you know if it’s database null or not.
Greg Low: The SqlTypes look to me like a structure that has an IsNull flag and then the integer value.
Bob Beauchemin: Yeah. Now in .NET 2.0 there’s the generic Nullable type, and unfortunately SQL Server has no concept of using it. It’s like inheritance: it can be used in the implementation, but T-SQL and the engine aren’t aware of that type. You can’t, for example, have a generic aggregate.
Greg Low: Would you be trying to write most of your managed code using SQL types or only at the boundaries?
Bob Beauchemin: It depends if you’re writing a stored procedure or a function. If you’re writing a function, it can have a special property, similar to defining this in T-SQL, that says on null call, so it won’t be called if the value is database null. If you write a function with that set, you have no need to use SqlTypes inside that function or to have the parameters to that function be SqlTypes. However, stored procedures don’t have any on null call option. If you have input parameters, they have to be SqlTypes, because somebody might pass a null in, and if they do you get an exception; it doesn’t even hit the procedure. So with functions there’s a way around it, and that’s usually the semantic you want in a function: if somebody passes in a null value, you want a null value back. Most functions work that way. Stored procedures don’t have any “null on null input,” therefore you have to use SqlTypes, at least for your parameters, to get the right semantic.
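[A sketch of the stored-procedure side of that advice: the parameter is a SqlType precisely so a database NULL arrives as a testable value instead of an exception. The procedure name and messages are invented.]

```csharp
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class NullHandling
{
    // A procedure parameter is declared as SqlInt32, not int, so a caller
    // can pass NULL; the body then checks IsNull explicitly, since
    // procedures have no "returns null on null input" shortcut the way
    // functions do.
    [SqlProcedure]
    public static void Describe(SqlInt32 quantity)
    {
        if (quantity.IsNull)
        {
            SqlContext.Pipe.Send("quantity was NULL");
            return;
        }
        SqlContext.Pipe.Send("quantity = " + quantity.Value);
    }
}
```

With a plain `int` parameter instead, `EXEC dbo.Describe NULL` would fail before the managed body ever ran, which is the behavior Bob warns about.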
Greg Low: Another area is stability. A key aim in implementing managed code inside the database was the stability of the server itself. What’s been done there?
Bob Beauchemin: Another interesting story. In the early alphas of SQL Server 2005 it was even suggested, by people who didn’t know better, that SQL Server could recycle itself like the ASP.NET worker process. I don’t think so. What they did instead was amazing things with the .NET hosting APIs. Basically these hosting APIs were drastically enhanced to allow SQL Server to control everything the .NET runtime does. SQL Server is almost the operating system as far as .NET is concerned. It can control the memory, the I/O completion ports, the escalation policy for an unhandled exception; anything you would expect from an operating system is done by SQL Server. A really good book on this is by Steven Pratschner, and he goes into it in intricate detail, down to the last interface pointer and method on the interface. They also put reliability classes into .NET 2.0, classes like SafeHandle. If you wrap your unmanaged resource inside of a SafeHandle, it is guaranteed to get released even if there is an unload. That is always the way of last resort: if SQL Server has nothing else it can do to save the integrity of the process, it is allowed to do an appdomain unload.
Greg Low: I heard it described that if it has to choose between your code and itself, it wins.
Bob Beauchemin: Yes. That’s because it controls everything: the horizontal, the vertical, and everything else. It’s very important that DBAs, like those that didn’t want extended stored procedures inside the database, read about these hosting APIs so they feel better about what’s happening and how much control they have over everything. It’s quite amazing the amount of work they put into this. Also, there’s the approved list of .NET assemblies. Every .NET Framework assembly had to go through an approval process before it was allowed to load in SQL Server as a safe assembly. So there are only about 15 or 20, maybe?
Greg Low: Last time it was 15 or so…
Bob Beauchemin: …that can load inside SQL Server. It's not that the other assemblies are inherently unsafe, but they didn't want any assembly that hadn't gone through the evaluation process to be able to load in there. You can still get those in by cataloging them as your own, and then they're stored inside the database just like user assemblies.
Greg Low: We should mention that once you issue a CREATE ASSEMBLY statement, the assembly is placed in the database, and we should probably emphasize the word "database" rather than "server"; these are database-specific things.
Bob Beauchemin: Right. Assemblies are scoped to the database because app domains are scoped to the database, or vice versa. The fact that assemblies live inside the database makes SQL Server more like the databases that would put jar files inside the database if they were running a Java runtime, as opposed to databases that leave the code outside the database, like DB2's implementation of .NET, which loads the code off the file system every time. Extended stored procedures and COM objects work like this too, loading the code off the file system rather than keeping it inside the database. I think this is a good thing for assemblies: they should be part of the database, backed up with it, and subject to all the same constraints as any other database object.
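[Editor's note: a minimal T-SQL sketch of cataloging an assembly into a database, as discussed above. The assembly name and file path are hypothetical, purely for illustration.]

```sql
-- Catalog an assembly into the current database (name and path are made up).
CREATE ASSEMBLY StringUtilities
FROM 'C:\builds\StringUtilities.dll'
WITH PERMISSION_SET = SAFE;
GO
-- The bits now live in the database itself, so they are backed up
-- and restored along with it, like any other database object.
SELECT name FROM sys.assemblies;
```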
Greg Low: What does this mean for extended stored procedures?
Bob Beauchemin: It probably means the eventual death of the extended stored procedure, really. You can write your managed code in C++ if you want to, so there really isn't a speed issue there. And extended stored procedures were fairly dangerous: even when they didn't branch to location zero on the main thread and cause an exception, they could write over SQL Server's memory buffers. Even an unsafe assembly is safer than the safest extended stored procedure.
Greg Low: Do you think Microsoft will rewrite a lot of their own extended stored procedures as managed code?
Bob Beauchemin: That's debatable. I remember doing a talk on extended stored procedures once and being amazed how many things you would think are part of the server are actually written as extended stored procedures rather than integrated into sqlservr.exe. Those I don't think they'll rewrite. They've already taken responsibility for those; they're part of the server as far as everybody is concerned. That's like asking if they would rewrite the internals of the server in the CLR, and it's doubtful they would do that.
Greg Low: The naming conventions are a giveaway. Originally the extended stored procedures were all xp_ and the standard system stored procedures were all sp_, but now if you look in the Management Studio folder for extended stored procs, there are quite a lot named sp_something that are actually extended stored procs. There must have been quite a migration of code into extended stored procs.
Bob Beauchemin: Yeah. It seems like a lot of the new features were implemented as extended stored procs rather than being put right into the engine. Also interesting: if you go to Books Online, to the programmer's reference section and then the extended procs section, the first thing it says is that extended stored procs will be deprecated in future versions and you should use the CLR instead. That's a pretty straightforward statement of intent.
Greg Low: Another aspect is when the assemblies get loaded up and placed in the database it can also place additional files in there as well as the assembly?
Bob Beauchemin: Yeah. It can put in any file you want. You can associate a text file called "Notes to self" with an assembly if you want. The usual reason it's done, in development rather than production, is so you can debug a CLR stored procedure. One of the things you must have to debug a stored procedure is a .pdb, or symbols, file, so this is mostly used to get the symbol file into the database. Earlier betas would go out to the file system and try to find the symbol file out there; this version does not, and the RTM version won't either, so you do have to put the .pdb file in the database. Visual Studio goes one step further and will put the source code and even the project file inside the database. You had an idea you were working on, to have source code compiled inside the database. How did that ever go?
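[Editor's note: a hedged T-SQL sketch of attaching extra files to a cataloged assembly, as Bob describes. The assembly name, path, and file contents are hypothetical.]

```sql
-- Attach the symbol file to an assembly already cataloged in the database,
-- so CLR stored procedures in it can be debugged.
ALTER ASSEMBLY StringUtilities
ADD FILE FROM 'C:\builds\StringUtilities.pdb';

-- Any file at all can be attached the same way, either from a path
-- or directly as a hex literal of the file's bytes:
ALTER ASSEMBLY StringUtilities
ADD FILE FROM 0x4E6F74657320746F2073656C66 AS N'NotesToSelf.txt';
```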
Greg Low: I was keen to build a user-defined function that took a string saying which language you wanted to build the code in and another string containing the code, and returned the compiled assembly as hexadecimal in ASCII, so that it could be used in a CREATE ASSEMBLY statement to generate the code directly. I was thinking the command-line compilers are already on the machine anyway. One concern I have a little further down the track is how we're going to go with versioning of assemblies.
Bob Beauchemin: Well, that would be a way to work around it, although the DBA could just have the compiler on her desk too, and maybe do that at the time you come around with the code. Sounds like an interesting idea.
Greg Low: I'm guessing it may have to be an unsafe assembly if I'm going to dive out and do command-line compiling, and/or use things like Reflection.Emit or any of those in-memory techniques. Anyway, it's a topic for another day. Another DBA concern would be security in the database and how managed code affects that. Thoughts on security?
Bob Beauchemin: It does seem like the hosting API folks have outdone themselves in giving SQL Server every kind of entry point it could have, and ways to monitor what's going on in the process. They also added the host protection attributes, which give SQL Server the ability to look at what any call is doing; really it's the author of the library who declares what the call is doing, and SQL Server can decide which of those three buckets the call should be in.
Greg Low: These are the permission sets: unsafe, safe, and external access.
Bob Beauchemin: Right. That's what they use to say what should be in which permission set, because there are assemblies like mscorlib with a lot of functionality you would want to use; you couldn't really function without mscorlib. But it also has some fairly unsafe stuff like System.Threading, so they had to have a way to distinguish at a more granular level than the assembly itself, and that's what the host protection attributes are about. After what I've been reading lately, I would tell DBAs with questions about how this works that the best thing they can do is read about the hosting APIs themselves, so they're convinced there's no way people can get around the security that's built in. In fact, they used to dare people to do that at the beginning. A lot of people tried, but very few did anything that could even be considered hurtful.
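[Editor's note: a minimal sketch of the three permission sets Greg and Bob are discussing, expressed as CREATE ASSEMBLY statements. All assembly names and paths here are hypothetical.]

```sql
-- SAFE: computation only; no access to resources outside the database.
CREATE ASSEMBLY MathHelpers
FROM 'C:\builds\MathHelpers.dll'
WITH PERMISSION_SET = SAFE;

-- EXTERNAL_ACCESS: may touch files, the network, and other external resources.
CREATE ASSEMBLY FileImporter
FROM 'C:\builds\FileImporter.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;

-- UNSAFE: unverifiable code, P/Invoke, threading; requires the most trust.
CREATE ASSEMBLY LegacyInterop
FROM 'C:\builds\LegacyInterop.dll'
WITH PERMISSION_SET = UNSAFE;
```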
Greg Low: That’s great and runs us close to time. I’ll certainly get you back another day when we’ll talk about assembly versioning and alter assembly and those issues. Last thing is what you’ve got coming up yourself in terms of books or presentations?
Bob Beauchemin: We're doing another revision of the SQL Server 2005 book. It's going to be done after RTM this time, so hopefully things will have quieted down a bit. I've got a lot of stuff going on with the new company, SQLskills, gigs going on with them. I'll be doing a couple of talks at SQL PASS, a two-day preconference at SQL PASS, and a talk after that on XQuery, an XQuery tutorial just on the language itself.
Greg Low: Great. I’m presenting a session myself at SQL PASS and the MVP Summit is there at the same time and it should be a really interesting group of people.
Bob Beauchemin: Great, I’ll see you there. And if you’re in Hong Kong the week after that I’ll be doing TechEd in Hong Kong.
Greg Low: I won’t be there but I’m sure some of the people listening could possibly be.
Bob Beauchemin: Oh, great.
Greg Low: Thanks very much, Bob, and hopefully I’ll catch up with you again soon.
Bob Beauchemin: Appreciate it.