Pablo Castro is a technical lead in the SQL Server team at Microsoft. He has contributed to several areas of SQL Server and the .NET Framework, ranging from SQL CLR integration to the TDS client-server protocol and the ADO.NET API. Pablo is currently involved with the development of the ADO.NET Entity Framework and also leads the Astoria project, looking at how to bring data and web technologies together. Before joining Microsoft, Pablo worked in various companies on a broad set of topics ranging from distributed inference systems for credit scoring and risk analysis to collaboration and group work applications.
Greg Low: Introducing Show 30 with guest Pablo Castro.
Our guest today is Pablo Castro. Pablo is a technical lead in the SQL Server team at Microsoft. He has contributed to several areas of SQL Server and the .NET Framework, ranging from SQL CLR integration to the TDS client-server protocol and the ADO.NET API. Pablo is currently involved with the development of the ADO.NET Entity Framework and also leads the Astoria project, looking at how to bring data and web technologies together. Before joining Microsoft, Pablo worked in various companies on a broad set of topics ranging from distributed inference systems for credit scoring and risk analysis to collaboration and group work applications. Welcome, Pablo.
Pablo Castro: Hi, how are you doing?
Greg Low: Good. At this stage you are fairly busy in your team.
Pablo Castro: It is quite a crazy time. We are in the final stages of the Entity Framework and Astoria. We don’t want to miss anything. We want to make sure everything is just right.
Greg Low: Describe how you came to be where you are today working in this job.
Pablo Castro: I started working at Microsoft five and a half years ago or so. More maybe. Before that I used to run a small startup, and I always loved databases as a topic that I am really interested in. I have a few friends at Microsoft and one day they said, why don’t you come to the SQL team? I couldn’t reject the offer. I came over and started out in low-level programming interfaces like managing native drivers for SQL Server. I spent time on the protocol. Then I also spent time in SQL CLR, the integration between SQL Server and .NET, and I got involved with .NET. I was sitting somewhere between the two worlds, the data world and the .NET world. When the time came, it was about, how do we move data to the next level? How do we go from being a low-level thing you use to exchange SQL statements with the database, to seeing how we can do better for our developers and for the evolving needs out there? Out of the thinking of a lot of people in the SQL and .NET buildings we started to come up with LINQ and the Entity Framework, so I ended up being involved with those things. I was sitting there along with other people, the Entity Framework team, and LINQ was all over the place. The Entity Framework became a big thing inside and outside the company, becoming increasingly known. The Entity data became important, a central piece of our strategy for data. I was interested in how these tie up to the Web. How do we enable data sharing between systems and people, and system-to-system sharing as well, on the Web? There was an opportunity for data services. I got involved in the Astoria project, the data services project. That’s where I’m spending most of my time right now.
Greg Low: The first time I came across you was probably in ADO.NET. I do remember the software design review back in 2003 where we looked at what ended up becoming LINQ. I remember it being a lot of interesting expressions around the room as people were looking at it. A lot of the developers have been across some of this, but maybe the DBA folk not quite so much. They will see a lot of this coming out at the SQL launch event. Perhaps we can start with LINQ and work our way there. Definitions. How you see it fitting in.
Pablo Castro: Let’s first look at LINQ by itself, in isolation. The language teams at Microsoft came up with a set of language extensions so you can write queries. This is not to create a gateway. It is to build query concepts and expressions into the language itself. Whenever you are dealing with data, in all cases you use the same style and constructs to formulate queries and manipulate data. It goes well beyond databases. It is about how you interact with data, and how you can take set-based constructs and bring them into more general programming languages, C# and Visual Basic. That said, it is bounded to query formulation, plus a nicely refined framework to work across many data sources, like the ones we ship in the box, and a way for third parties to bring their data into the picture.
In practice, very rarely do you have LINQ in isolation. You use some framework to tie up your environment with the data. If you are using a database with LINQ, you use LINQ to SQL or the Entity Framework, one of these frameworks. They bring more than query formulation. They bring change management: you get objects, then you can manipulate them and push changes back to the data. They automatically track changes and generate SQL statements. Overall, when you talk about using LINQ, it means more than LINQ. It means that I am using some LINQ-enabled framework to interact with my data source.
You can see a world with a subset of developers who use LINQ to Objects and have nothing to do with databases at all. It is an in-memory technology. When it comes to databases, LINQ becomes a way of formulating queries that in the end become database queries. They are translated and they become dynamic SQL statements. The objects are regular .NET objects. You make changes to them and we track the changes, so when you say, I am done making changes, make them persistent, we can then use SQL to push the changes back. With this picture in mind, let me come back to LINQ and the DBA take on it.
First, LINQ is a query formulation technology or mechanism. You will be generating SQL statements on the other end. In some environments, generating dynamic SQL is the right thing. But there are a number of environments where, because of policy or security considerations, they have a fixed API to the database. The question there is, what is there in LINQ for me? I want to keep my DBA, keep my regulations and my style of maintaining my database. Can I enable my developers to still use some of these new technologies to write their apps without disturbing the database operation? LINQ itself is a query formulation mechanism, but the frameworks do support this. Both LINQ to SQL and the Entity Framework can bind to stored procedures. You just write a function call and we will give you a .NET wrapper around it so you don’t have to build a command object manually. First, you get full .NET objects back that are easy to build with. We can still do all of the change tracking and automatic update management for you. That is quite a bit of substance, and you can still use LINQ to SQL and the Entity Framework in your environment. The same thing applies to the update path, where we would otherwise generate dynamic SQL statements. It is a good balance. If the DBA needs very precise control of how you get into the database and what you do with it, you still get that, and developers still get higher-level abstractions for dealing with data.
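The change tracking and generated SQL described above can be sketched in a few lines. This is only an illustration of the general idea in Python, not how LINQ to SQL or the Entity Framework is implemented; SQLite stands in for SQL Server, and the `TrackedCustomer` class, the `save_changes` function, and the `customers` table are hypothetical names invented for the example.

```python
import sqlite3


class TrackedCustomer:
    """Hypothetical entity that remembers which fields were modified."""

    _fields = ("name", "city")

    def __init__(self, id, name, city):
        # Bypass tracking while the object is first populated.
        object.__setattr__(self, "_dirty", set())
        object.__setattr__(self, "id", id)
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "city", city)

    def __setattr__(self, key, value):
        # Record a change only when a tracked field gets a new value.
        if key in self._fields and getattr(self, key) != value:
            self._dirty.add(key)
        object.__setattr__(self, key, value)


def save_changes(conn, customer):
    """Generate and run one parameterized UPDATE for the changed fields only."""
    if not customer._dirty:
        return None
    cols = sorted(customer._dirty)
    assignments = ", ".join(f"{c} = ?" for c in cols)
    sql = f"UPDATE customers SET {assignments} WHERE id = ?"
    params = [getattr(customer, c) for c in cols] + [customer.id]
    conn.execute(sql, params)
    customer._dirty.clear()
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ana', 'Lima')")

row = conn.execute("SELECT id, name, city FROM customers WHERE id = ?", (1,)).fetchone()
customer = TrackedCustomer(*row)
customer.city = "Seattle"  # an ordinary assignment; the change is tracked
generated_sql = save_changes(conn, customer)
new_city = conn.execute("SELECT city FROM customers WHERE id = 1").fetchone()[0]
print(generated_sql)
print(new_city)
```

The application code only assigns to a property; the statement that reaches the database is produced by the framework layer, which is the point a DBA reviewing generated SQL would want to inspect.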
Greg Low: The question that tends to come up as soon as you have DBAs involved, which tends to mean you have larger databases, is the performance one. If we are using LINQ to SQL, where does that take us performance wise? When I looked at Scott Guthrie’s video, what he said was, if you are concerned about the quality of the code generated, you can put a break point here, open up the window and look at the code. He said, look. I thought it was horrible. The funny thing is that most developers I have talked to watched the same video and said, I am glad that I didn’t have to write that SQL. I said nobody should write that SQL. The main thing was that it had things like, instead of selecting from the tables, it did sub-selects. When it followed a referential integrity path or a foreign key, it tended to do left outer joins instead of inner joins. What is your feeling about what this means performance wise?
Pablo Castro: First, the philosophical tack. In principle, there is a tradeoff when you go for abstractions. The reason for the frameworks is that, if you look at applications, they all follow a similar form of data access layer, business objects and presentation infrastructure. With the frameworks we try to help people so they don’t have to develop the data access layer and the bottom layer of the business objects. We are trying to help people abstract things out. It will cost performance, it will. These things will rarely go faster than the previous thing. What is the tradeoff? It is how much productivity and how many enhancements you will get out of it. Assuming performance is within a reasonable difference, the tradeoffs start to work and it makes sense to invest a little bit of performance to get maintainability enhancements.
More practically, how is the system layered in practice? We have the .NET 2.0 version of the framework: SqlCommand, SqlConnection. Some implementations are on top of native APIs. These go straight to the server. Now we build frameworks on top of these layers. The two main efforts are, first, around minimizing overhead on top, and second, on leveraging opportunities that developers wouldn’t typically use at the lower layers. We look at the SQL; we generate a lot. These are things that may help the entry-level developer adopt these best practices.
One is the kind of SQL we generate. Machines tend to be dumb for many of these things. If you look at the stacks, take the Entity Framework. We do heavy query post-processing. We have a whole transformation phase. Nested join and nested query elimination is an issue. First, nested joins or nested queries are an artifact of how the implementation works. We eliminate a lot of them. I sit two floors down from the query optimizer guys. We exchange ideas: we are generating this class of queries, how will this go? We have a good flow between teams. That helps a lot. Some queries are ugly but they don’t generate bad things. The other aspect is about .NET overhead. In the end we get the query, formulated either using LINQ or using Entity SQL, a textual query language for the Entity Data Model. We have to go from there. We have heavy mapping layers to transform the model. There are type system differences, and we translate one to the other. These take time. The transformation process is very CPU bound.
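To make the "ugly but equivalent" point concrete, here is a small Python sketch that runs a flat, hand-written style query and a nested, generated-style query against the same data and compares the results. The two SQL strings are written for illustration and are not actual LINQ to SQL or Entity Framework output; SQLite stands in for SQL Server, and the `customers` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Hand-written style: a flat query with an inner join.
flat_sql = """
    SELECT o.id, c.name, o.total
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    WHERE o.total > 20
    ORDER BY o.id
"""

# Generated style (illustrative only): a sub-select plus a left outer join.
nested_sql = """
    SELECT t.id, c.name, t.total
    FROM (SELECT o.id, o.customer_id, o.total FROM orders AS o) AS t
    LEFT OUTER JOIN customers AS c ON c.id = t.customer_id
    WHERE t.total > 20
    ORDER BY t.id
"""

flat_rows = conn.execute(flat_sql).fetchall()
nested_rows = conn.execute(nested_sql).fetchall()
print(flat_rows == nested_rows)
```

Because every order here has a customer, the outer join cannot introduce extra NULL rows, so both forms return the same result. Whether the two forms also get the same execution plan depends on the database engine and is not something this sketch measures.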
Greg Low: In the latest version of SQL Server with 2008 coming there is a further move to even up the type systems.
Pablo Castro: The closer we can get on type systems, the better: how .NET does types and how SQL does types. We don’t introduce types that aren’t present across the board. Before, these were smaller codes that grew up along historical paths, and they had to live with their own compatibility challenges. We are trying to make them aligned.
Greg Low: With the query engine, are they making changes to better accommodate the types of queries being generated by LINQ?
Pablo Castro: The query optimizer team and many SQL teams have playback. They record workloads from real-world applications to tune the optimizer of the database. We have a set of patterns they can feed into their equation for consideration. Today we have looked at these. We look at how we can tune the database so it is a better server for these applications. We are doing some of that. We will do much more as these frameworks become more popular.
Greg Low: In summary, LINQ is a language enhancement. It is an important message. Some people see it as just LINQ to SQL. It is not. It has a wider variety of things. I find LINQ to XML quite a useful one.
Pablo Castro: LINQ to XML changed my life. LINQ is so nice. People don’t look at it hard enough.
Greg Low: It has an elegance to it. With LINQ it is important to realize that people will tend to write their own providers. People will find that lots of different LINQ providers will come out over time. They may or may not be related to a database. One question: I often see people describing that part of the reason they wanted to do the LINQ thing was to come up with constructions in the language. I didn’t want to learn T-SQL. Why did they pick keywords that were SQL-like, rather than more generic object-related terms?
Pablo Castro: Tradeoff. In the early days of LINQ, it wasn’t called LINQ. We used to have sequence operators, which are still there. There was no nice comprehension syntax. We got a lot of pushback. The message we got was from more of a C# SDR. They really liked the idea of integrating two things, the data and programming worlds, but there was quite a bit of resistance. You do SDRs to get an outside perspective, because you are too stuck in your own ideas. We do SDRs so we aren’t stuck. We got a clear message that folks expected something more like a syntax for query formulation: a comprehension syntax along with features that enable this. The initial BASIC folks looking into this space. The projection piece first.
Greg Low: I do recall that. I remember the guys saying, how would you feel if the FROM was first? One of the driving things has been that over 90 percent of applications are data related. Yet data is an add-on library to the application. It seemed separate. It struck me that I never got the same productivity as I got with a language that had data embedded in the language in the first place: Progress and older 4GLs, where the entire language was a construction that understood data. There was always a database context in the language, and I could write a statement using FOR EACH or whatever and the language understood what I meant.
Pablo Castro: One of the challenges with those, a broad range, embedded SQL all the way to languages that have always been like that, FoxPro. One of the problems it is hard to modernize them if you have requirements that push you in that direction. A two-tier application can do its job just fine, but if you are pushed into a three-tier system. It is almost unviable. Even in FoxPro where we added technology. In the end, the value of the language is compromised. Once you go into multi-tier systems or your servers have the whole Web in-between they don’t work very well. On the other hand, the LINQ approach to things, the language guys said this is how you write queries, regardless of your language. Then you plug in a target. Target could be a database. Or it could be your middle tier if you are talking to your server.
Greg Low: That’s probably a good point to take a break. When we come back we will talk about the Entity Framework and data services. Have you been living in and around the Seattle area for a while?
Pablo Castro: I have been here for five years or so. Before that I lived in South America. It has been quite a change.
Greg Low: Weather?
Pablo Castro: There are a lot of crazy stories. It rains a little. It’s not that bad. I come from Buenos Aires. It’s really hot. There is a good mix. Sometimes we go with the whole Astoria team and spend a day on the slopes. Or you go skiing at night. It works out pretty well.
Greg Low: Do you have apart from this passion, any non-computing passions?
Pablo Castro: Very few. I haven’t had a lot of time lately. I like computers a lot. I would do it even if I wasn’t getting paid. Since I moved here I got into snowboarding. I have a kid who I spend a lot of time with. I have gotten into biology.
Greg Low: One of the things I have started doing is I have started learning Mandarin. The thing that impressed me is that I did a summer immersion course. I saw Ang Lee’s new movie and I was surprised how many things I understood. Maybe it is possible. There is a podcast, Chinese Pod. Every day I get Mandarin for 10 minutes. I am enjoying that. The other things we need to look at, so Entity Framework is the next layer in this whole thing. The quick view, what is it?
Pablo Castro: The Entity Framework looks like an object-relational mapper. Why it matters is more important. The Entity Framework brings an entity-relationship model into the picture, into mainstream development in the .NET environment. So the idea of the Entity Framework is this: imagine you can paint a picture of your data using a conceptual model. You simply say what the data you want to work with should look like. Think of this as a highly expressive model. If the model is very well described, then you can layer a lot of services on top of it: reporting, data services and all of those things. Keep in mind the semantics of what you want to expose: inheritance, how things are associated with each other. If you have something like this, building a report will be easier. You work with your business objects and how they relate to each other instead of how rows and tables relate.
Greg Low: This is the sleeping giant. The Entity Framework is the one people will want to program against. I see simple examples of LINQ to SQL, but this gives you a one-to-one mapping to the tables. The example I did in the launch events was many-to-many relationships. With passengers and flights in an airline, which would typically use a joining table, the Entity Framework allows us to program as if the passengers have a flights collection and the flights have a passengers collection. These are different concepts from the underlying structure.
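The passengers and flights example can be sketched as follows. This is a hand-rolled Python illustration of the idea, not the Entity Framework's mapping engine: the joining table exists only in the relational schema, while the code that consumes the data sees two collections. SQLite stands in for the database, and all table and variable names are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE passengers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE flights (id INTEGER PRIMARY KEY, code TEXT);
    -- The joining table that resolves the many-to-many relationship.
    CREATE TABLE passenger_flights (
        passenger_id INTEGER REFERENCES passengers(id),
        flight_id INTEGER REFERENCES flights(id),
        PRIMARY KEY (passenger_id, flight_id)
    );
    INSERT INTO passengers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO flights VALUES (100, 'QF1'), (200, 'QF2');
    INSERT INTO passenger_flights VALUES (1, 100), (1, 200), (2, 100);
""")

rows = conn.execute("""
    SELECT p.name, f.code
    FROM passenger_flights AS pf
    JOIN passengers AS p ON p.id = pf.passenger_id
    JOIN flights AS f ON f.id = pf.flight_id
    ORDER BY p.name, f.code
""").fetchall()

# The "conceptual" view: each side holds a collection of the other side,
# and the joining table no longer appears.
flights_by_passenger = defaultdict(list)
passengers_by_flight = defaultdict(list)
for name, code in rows:
    flights_by_passenger[name].append(code)
    passengers_by_flight[code].append(name)

print(dict(flights_by_passenger))
print(dict(passengers_by_flight))
```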
Pablo Castro: The Entity Framework pushes the abstraction up by supporting the EDM as a first-class construct. The EDM becomes an executable model. Make the EDM what the rest of the application uses to get to your data. Since the databases today are mostly relational, we include a powerful mapping engine, expose relational models using EDM terms. This is a first step. We will have a way of bringing data into EDM terms like the Entity Framework and then we can create services cross cutting all of the schemas that use the EDM. Synchronizing whole customers. That is a key aspect. There are technological advancements. Broad data platform being one of the initial building blocks for it.
Greg Low: Entity SQL often comes up. Where does that fit in?
Pablo Castro: The EDM, or Entity Data Model, has its own constructs. You need a query language that goes with it. SQL is a well-known thing that we didn’t want to deviate from. We took traditional SQL and added just enough to support the constructs used by the EDM. What if you have an inheritance hierarchy? Say you have a hierarchy with contacts and subtypes of contact, say employee and customer, and they have some things in common, a name and phone number. But the customer has other information that differs from the employee’s. Sometimes you want to treat them uniformly: a list of all of the people you know about. Sometimes you want to say, from this container of people, give me only the customers. How do you ask those questions in SQL? There is no column to identify who they are. We use a few constructs in Entity SQL to express those things. The same thing applies to navigation. In the EDM, relationships are modeled as associations. They are more explicit ways of tying things together. There are Entity SQL constructs that allow you to say, give me all of the orders for these customers. In SQL you write a join that finds customers by some criteria. In Entity SQL you say, given these customers, give me their orders. Under the covers we will do a join, but the important thing is that from the query formulation perspective it looks like a navigation or an association traversal.
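The contact hierarchy can be sketched with ordinary Python classes. This is only an analogy for the two questions described above (treat everyone uniformly, or ask for one subtype); it is not Entity SQL, and the `of_type` helper and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    # What every kind of contact has in common.
    name: str
    phone: str


@dataclass
class Employee(Contact):
    department: str


@dataclass
class Customer(Contact):
    credit_limit: float


def of_type(items, entity_type):
    """Keep only the items that are instances of the requested type."""
    return [item for item in items if isinstance(item, entity_type)]


contacts = [
    Employee("Ana", "555-0100", "Sales"),
    Customer("Ben", "555-0101", 5000.0),
    Customer("Cara", "555-0102", 1200.0),
]

# Uniform treatment: everyone is a Contact with a name.
all_names = [c.name for c in contacts]

# Type-based filtering: from this container of people, only the customers.
customer_names = [c.name for c in of_type(contacts, Customer)]

print(all_names)
print(customer_names)
```

In a relational schema the same distinction has to be encoded somehow, for example with separate tables or a discriminator column, which is the mapping work the conceptual model takes over.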
Greg Low: One of the points to make is that the mapping lives at the application level. One of the immediate reactions, couldn’t I do most of that by building different views in the database?
Pablo Castro: There are two aspects to this. To some extent, yes. First, you can only build so many views and still make it practical to manage them. If you have a lot of applications, those databases tend to have more applications growing around them. If you have to create a custom set of views for every one of these applications, it gets tricky. You have this reshaping, which makes it hard to manage, or you have a ton of views on the database. Second, views are within the closure of the database, so they can’t expose more than the underlying queries. You can drop columns or join tables together. You can’t use views to model containment, inheritance or constructs outside of the data model of the database. That is one of the things we do in the Entity Framework. We give you more constructs. These combined make a strong case. In the end every application will need a different perspective on the database, and we are making it easier and more convenient, from the DBA’s and the developers’ perspective, to create these different perspectives without causing a bunch of noise in the database or creating implicit artifacts that are hard to manage.
Greg Low: One of the other points is that the database itself in most large organizations tends to outlive the applications. One of the things I like about the mapping layer is that it spells out the relationships. Otherwise people formulating queries have to remember to put in WHERE clauses, and sometimes they forget. The one concern I have is that if we have all of the mapping files outside the database, doesn’t that make it harder to refactor the database?
Pablo Castro: To reflect the changes they need?
Greg Low: Yes. When it was contained within the database, I knew the surface area of what I needed to touch to refactor. If mapping layers live outside that, doesn’t that run a risk similar to what you see in organizations today, with lots of Access applications hitting the database?
Pablo Castro: The database has a public interface; sometimes it is the tables themselves, or views layered on top of them. Whatever entity model or mapping you have will layer on top of that API. You have the same bar from the compatibility perspective at the database level. If you have an app that doesn’t use this conceptual modeling technology, how do you avoid breaking the application? The one other benefit is that if you have proper model management, where the artifacts are now loose files, you can imagine a world where there is a change I can’t make in the database without breaking the contract. What I could do is break the contract at the database level but then compensate in my mapping files. Today that takes a redeployment of the mapping files, which is typically cheaper than changing the code.
Greg Low: Should we have the mapping files themselves living in the databases?
Pablo Castro: We have been debating this a lot: what is the right way of managing metadata? We are exploring this space. We don’t have anything right now. We don’t have a management thing. There are a number of groups in Microsoft exploring building repositories for metadata. We are going in that direction. I am not sure quite what that means yet. There is pressure from bigger customers in that space.
Greg Low: We have talked about LINQ as a language construct, LINQ to SQL as a one-to-one mapping, and the Entity Framework as a way of mapping business objects to the end-of-the-line database. The other one is ADO.NET Data Services, formerly Astoria.
Pablo Castro: Astoria, the motivation. If you look at the data technologies we have today, they assume line-of-sight between client and server. There is a whole space we are not tapping. What if client and server are across the world from each other? That is more common today. Think of a rich Internet application talking to a server: Silverlight on one end, the server on the other. A full-on application that wants to talk to a server across the world. Many applications are becoming hybrids, applications and platforms, from Facebook to Flickr to Twitter. They are applications and sub-platforms for developers to work against. Exchanging data across the Web as a first-class construct, without anything around it, is becoming important. Astoria wants to produce technology that lets developers build services that expose data to the Web as an API for others to consume, and consume data from the Web in a way that is natural for the Microsoft environment in general.
Greg Low: If people had asked for Microsoft guidance a few years ago, the answer would have been Web services. What is the move from Web services, and why is this interface a better one?
Pablo Castro: I see it more as a broadening thing. We have a solid story. It’s been maturing over time. If you look at the WCF stack, it is in good shape. We have a class of services that are operation oriented. There is a trend toward another class of services centered on the entities, the resources they expose, and not so much on the operations. They tend to have a uniform interface. Entities have semantics but have uniform actions you can take on any of them. RESTful interfaces follow this pattern heavily. They leverage HTTP as a protocol.
Greg Low: Let’s define or mention REST. Many of the DBAs might not have come across that.
Pablo Castro: REST is an architectural style for building applications that are stateless and resource oriented, with an addressing scheme that lets you point to resources using identifiers such as URLs. You have a server serving resources, and you can access those resources with HTTP, a stateless protocol. You obtain the representation of a resource, and you modify or delete it using well-known operations. Layering is possible. Layered security is possible. This is becoming interesting because it is a very simple mechanism. There is little need for background know-how or extensive tooling. It has been shown to scale very naturally to the high-end needs of Web-wide applications. Astoria says that if you want to expose your data in terms of an HTTP endpoint, you have to explain what the data looks like. Model it using the EDM, and we will give every entity a URL and turn associations into links. The result is an endpoint that has a graph of all of your data, with the links as a result of the associations. It means you have a flexible interface to your data. You don’t have to predict all of the usage patterns. You can secure it in a policy-style way. Once you choose what to surface, you don’t need to specify how to access the data. Different client applications can choose the right way of accessing the data.
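The idea of giving every entity a URL and turning associations into links can be sketched like this. It is a minimal Python illustration under stated assumptions: the service root, the `MODEL` dictionary, the helper names, and the exact URL shape are all hypothetical and are not taken from the Astoria specification.

```python
from urllib.parse import quote

BASE_URL = "http://example.com/data.svc"  # hypothetical service root

# Hypothetical model: entity set name -> names of its associations.
MODEL = {
    "Customers": ["Orders"],
    "Orders": ["Customer"],
}

# The uniform interface: the same HTTP methods apply to every resource.
OPERATIONS = {"GET": "retrieve", "PUT": "modify", "DELETE": "delete"}


def entity_url(entity_set, key):
    """Build an address for one entity (the URL shape is an assumption)."""
    return f"{BASE_URL}/{entity_set}({quote(str(key))})"


def association_links(entity_set, key):
    """Turn each association of an entity into a followable link."""
    base = entity_url(entity_set, key)
    return {name: f"{base}/{name}" for name in MODEL[entity_set]}


customer_url = entity_url("Customers", 42)
customer_links = association_links("Customers", 42)
print(customer_url)
print(customer_links)
```

A client that understands only the uniform operations and how to follow links can walk the whole data graph without the service author predicting each access pattern in advance.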
Greg Low: This fits well with the Silverlight and Ajax applications?
Pablo Castro: Yes. If you see what happens with those applications data access typically happens in the server web applications and you render some HTML out of that and send that to the client. With Ajax and Silverlight, that’s not the case anymore. The Web page with the UI has been served and now you are sitting at the client and want to show the user data. With Astoria, you can go to your server and interact in terms of data and show the data in the UI. You can also do it with Web services. Do you want a fixed entry point in the database? Or do you want a more flexible data model but it also has the challenge that it is more open ended, a broader class of queries coming into your system?
Greg Low: The other terms we need to come to grips with: JSON versus XML?
Pablo Castro: In Astoria we tried to build a flexible mechanism to expose data to the Web, and not to dictate too much about how you consume it. We have a flexible mechanism for representing entities. We support a few formats and will introduce more as we get customer needs. Right now we support the Atom Publishing Protocol (APP), an XML-based encoding for data. We also support JSON, and it is a smart trick. The people who created JSON took a subset of JavaScript and used it to represent data. If you are running in a JavaScript environment, it is very straightforward. Each one of them has its advantages and issues. They are targeted to different scenarios.
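As a rough illustration of the two kinds of representation, the following Python sketch serializes the same hypothetical entity as JSON and as a simple XML element. The XML here is deliberately simplified and is not the actual Atom format used by the service; the `customer` entity and the `entry` element name are assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical entity, as a service might hold it in memory.
customer = {"id": 42, "name": "Ana", "city": "Lima"}

# JSON: maps directly onto objects in a JavaScript client.
json_text = json.dumps(customer, sort_keys=True)

# XML: one child element per property (simplified, not real Atom).
entry = ET.Element("entry")
for field, value in sorted(customer.items()):
    ET.SubElement(entry, field).text = str(value)
xml_text = ET.tostring(entry, encoding="unicode")

# Reading the JSON back yields the original structure, types included.
round_trip = json.loads(json_text)

print(json_text)
print(xml_text)
```

Note that the JSON round trip preserves the number as a number, while the simplified XML carries every value as text, which is one reason each format suits different clients.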
Greg Low: That’s a great summary of each of the three technologies. The final question is where are we at release wise with these things?
Pablo Castro: We shipped around December last time. Will do another iteration sometime this year. Looking at shipping mid this year.
Greg Low: Everything but LINQ which is already shipped?
Pablo Castro: LINQ has shipped. LINQ to SQL has shipped as well. LINQ to Entities will ship mid this year. We are on track. We are talking about this a lot to show the latest advancements.
Greg Low: What do you suggest people do to get involved?
Pablo Castro: Go to the forums. Download the bits and try them. Ask questions, send us comments. The forums are active. There is the ADO.NET team blog at blogs.msdn.com/adonet and the Astoria team blog at blogs.msdn.com/astoriateam. We try to share. We are happy to take questions. The next conference is MIX 2008, a Web-oriented conference. I will be there.
Greg Low: Later in the year, where will we see you?
Pablo Castro: TechEd mid-year. PDC at some point.
Greg Low: The PASS conference is in April in Germany and November for Seattle. LINQ and Entity content in that as well.
Pablo Castro: You can find us in all of these places.
Greg Low: Thank you for your time, Pablo. Hope to see you again sometime soon.