Karen López
SQL Down Under Show 57 - Guest: Karen López - Published: 12 Mar 2013
SDU Show 57 features SQL Server MVP Karen López discussing data modelling, database refactoring, space data, and open data.
Details About Our Guest
Karen López is a Senior Project Manager and Architect at InfoAdvisors. She has 20+ years of experience in project and data management on large, multi-project programs. Karen specializes in the practical application of data management principles. She is a Microsoft SQL Server MVP, blogger, and speaker and she wants you to love your data.
Show Notes And Links
Karen López is a Sr. Project Manager and Architect at InfoAdvisors. She has 20+ years of experience in project and data management
on large, multi-project programs. Karen specializes in the practical application of data management principles. She is a Microsoft SQL Server MVP, blogger, and speaker. She wants you to love your data.
Karen blogs (http://blog.infoadvisors.com/index.php/about/karen-lopez/)
Karen tweets (https://twitter.com/datachick)
Open Data site (http://data.gov) (includes links to open data sites for other countries)
Show Transcript
Greg Low: Introducing Show 57 with guest Karen López.
Welcome, our guest today is Karen López. Karen López is a Senior Project Manager and Architect at InfoAdvisors. She has 20+ years of experience in project and data management on large, multi-project programs. Karen specializes in the practical application of data management principles. She is a Microsoft SQL Server MVP, blogger, and speaker and she wants you to love your data. So welcome Karen!
Karen López: Thank you Greg!
Greg Low: Is so as I do with everyone, I will get you to tell us how did you ever come to be involved with SQL Server?
Karen López: Gosh I have to think back on that one.
Greg Low: Aha.
Karen López: Definitely my background primarily as a data architect meant that I ended up a lot of times practically being an accidental DBA. At least I am one of those data architects that are not so common that can take a data model, logical model, run it through physical design and I always insist on generating first cut DDL to make sure my models can actually be used on real-life projects. So that definitely ties me closer to databases than most data architects I would say. Then you start learning a little bit more about operations of DBMs and I also do some volunteer work for not for profits to help them manage their servers and their databases and that has also got me more involved in SQL server. That seems to be a very common choice for them.
Greg Low: So you have been working with it a long time now?
Karen López: Yes.
Greg Low: With SQL server specifically?
Karen López: Yes I would say so. That is why with my profile I say just 20+ years because I decided that was the time to stop counting.
Greg Low: Yes I can relate directly to that.
Karen López: That’s great exclamation
Greg Low: So listen one of the first topics that I really wanted to start discussing with you is around data modelling. One of the complaints I hear all the time in the SQL environment is that is probably one of the biggest holes in the platform is the lack of tooling to help with modelling and so I am interested in do you use any specific tools?
Karen López: Well primarily I use what my clients use because I am in a service industry. That means over the years I have used tools Sierra data modeller, Embarcadero ER Studio, a little bit of Sybase which is now SAPs Power Designer, they would be the top three anyway.
Greg Low: What is your feeling on each of those?
Karen López: Well I enjoy all of them, the thing I tell my clients and a lot of people asked me what data modelling tool? Because I believe strongly that data modelling should never just be just an analytical exercise, a theoretical exercise that some modellers petition it as even that is incredibly important and that is actually where my passion lies is that I still need to have models be usable and we need to remove obstacles from them. I am not a fan of data modellers they do a pretty data model and then send it around, embedded diagrams in a PDF and tell the DBAs and developers do your best to make your database match this.
Greg Low: Yes no.
Karen López: Because of that I want to do round trip data modelling which is you have an inspiration or requirement you model it and conceptualize the logical, design it into a physical model, build something from it or parts of it even then put that into an implementation. It might not even be a relational database and then as requirements change all better understood then I want to go back to them model and make the changes there and run it through the process again. Because of that I want traceability, what I call what model driven development, then I think the best modelling tool for someone to use is the one that best fits your physical environment while also porting your needs for analysis and requirements and to support the modelling process. That’s why I tend to use enterprise class that data modelling tools because a lot of either free or lower end tools are really just diagramming tools.
Greg Low: Yes.
Karen López: They don’t really support the round model driven development.
Greg Low: What is your feeling on the quality of the DDL that comes out of any of the higher end tools?
Karen López: Well it sure has changed over the decade. I remember will one of the first modelling tools I used generated DDL and it didn’t support specific data types of the target DBMS and it generated tables and columns only in alphabetical order.
Greg Low: Lovely.
Karen López: That meant the DDL that was generated was almost 100% written rewritten by hand and it was very painful. That was quite a while ago. Right now my goal on all my projects is to have the DDL that I can generate from these tools support all or substantially all the features that I need in the target DDL. That is limited to the types of things that come out of the models, for instance in my modelling to the enterprise class tools support not just tables and columns and data types but file stores which is a generic term, indexes, partitions, all the major components of an object. Data modelling tool vendors have done over the last few years is give modellers the chance to extend their DDL generation, either by actually customizing via template. How the generation happens or trips to be am to do those things. It is very rare now that there is something I need to do in the DDL that I can’t do automatically in the modelling tools.
First there is fear and a lot of urban legend going around that the DDL that comes out is just not usable when a lot of it is one of the things that it is hard to explain to people. People who pick up a data modelling tool often is overwhelmed by the complexity of at all because it does support 50 different ways, I am exaggerating here. But several different ways on how you would like your DDL to be generated. It is not enough to hit next, next, next in the wizard and generate that DDL and say it is not doing what I wanted to do.
Greg Low: Yes no indeed, I suppose the main thing is so many of them has so many configurable options that you can have a lot of control of how that comes out anyway.
Karen López: Exactly in the tools I used often. Often thousands can be set and you say those are the setting so for instance I will have different profiles that I’m generating from the same data model. DDL for the development environment versus QA and production just because there are different things that you might want to do based on how those environments differ.
Greg Low: I suppose at a bigger picture level, one of the question I see a lot of arguments in sites about whether people should be designing a data model or whether they should be designing an object model Which end of that spectrum do you tend to lie?
Karen López: Well so as a long-term data modeler, definitely in the data model side but that doesn’t mean object models shouldn’t be done. Whereas I see by object model, you mean things like entity framework, code first and.
Greg Low: Well if I take while may be a specific example, if I have something like an object I want where I have flight and I have passengers, maybe I have a flight object and I have a passenger object but then I want to model the many to many relationships. I have got which passengers I have on which flights now presumably the object layer I want to have a flight object with a passengers collection as an attribute of it and I want to have a passenger collection with flights collection as well but at the database level I would probably want to have a flights table and the passengers table and maybe some sort of flight manifest that says who is on what flight. I suppose the question is, is there sort of an in that you prefer to start at? Some people see it as the database as just a place they put their objects, the concern I have with that is that it does tend to lead to a whole lot of little information is silos or alternatively you start with the data modelling and but that is usually not whether developers want to start.
Karen López: And you are right on both sides, so my work is on designing models and databases for the persistence of that data. So one of my taglines is love your data, is about how data lasts much longer than code. So the object model typically is specific to a process or to an application or function. Whereas the persistence layer, I am thinking about that data as we store it and that often has much more complex requirements. You know you talk about and I have done very large data models and designs for just example you talked about the travel industries and that simple example I really simple which means that they are terribly wrong but I happen to know that just that one little bit you talked about with several, several tables that and someone programming a specific function such as keeping track of did that passenger actually fly on that flight for the award of frequent-flyer miles in an object model would look completely different than in my persistent model because I am trying to keep track of all the data. Not just that they reserve that flight level but also did they actually fly? Was there another flight substituted? Was it an airline that could no longer earn the frequent flyer miles on?
It gets really complex, one of the reasons why developers lean towards this object model, because they want to focus on the one problem they are trying to solve. Where is data modellers architects are trying to combine all of those requirements in a way that will meet all those needs. That is where this natural sort of conflicting point of view comes from.
Greg Low: Yes I like what you are doing.
Karen López: There is no one wrong answer, right? There is no one right answer and there is no wrong answer.
Greg Low: Indeed, I like what you were saying about the life of the data because one of the discussions I often have is that generally in most organizations I think the data is the most valuable thing the organization owns. I think it usually out lives generations of applications. Most large organization’s I see the data just gets morphed from one shape to another but it often lives on and on and on.
Karen López: That’s right. I work with data at utilities that are 100 years old infrastructure that was put into place hundreds of years ago. Addresses that were established long before then, buildings those were older than that. Right? Every time someone even before their computing era people need decisions on how to store that data and I see a lot of times on my projects. A game that natural differing of view of we compensate developers for getting stuff done. Even agile manifestos, just do enough, just enough and do it faster because we are taking too long to do all this design and stuff. I fully agree with that sentiment but the data, there are destructive decisions that you can make about the data that can never be undone. The perfect example was if you have an address data that right now persisted in all of its tiny components: street number, street number suffix, street name, street suffix, prefixes and you decide for your application that you are now going to just combine those into Address 1, Address 2 etc. You can never automatically under do that back to the finer level of granularity.
There are ways of trying and the example I use is with people’s names. My quote middle name which is a series of specialize names because of the culture I come from. You combine that altogether as one middle name, there are about six words in it and if you combine that altogether. You could never accurately pass that back out again.
Greg Low: Yes I know certainly with most of my Spanish friends, it is a good example of where I just look at that and go when I see their full name and how it is really done there is just no way you could just past that or try and work out how to try and put that back together.
Karen López: Yet people will tell me they absolutely could. They could run it through a passing algorithm and offer a list of valid names and somehow figure out that, you know through the prepositions in there and spaces and hyphens and everything that they could get it back.
Greg Low: You are just guaranteed to get it wrong.
Karen López: Right, exactly!
Greg Low: That would be or you would be doing.
Karen López: Yes.
Greg Low: Yes I think that is a great example of something that is just something that is a very, very hard problem. It is the same as when I have components of names, I would never presume that I would know how to put a name together. I would like to ask the customer, whoever, like what is their actual full name as well not just components of names. You can just not assume how that would be assembled from that.
Karen López: That’s right and it is contextually sensitive, so someone from an Asian culture thinks of what their full name is varies based on whether they are doing that locally. The order that names appear is different in Asia and parts of Asia than it is in North America or Europe. Same name, it is just how it is presented and you get to things like Dutch names where the van, you are not supposed to alphabetize on that because that is the prepositions.
Greg Low: Yes.
Karen López: The phone book would be fall of Vs.
Greg Low: Yes, exactly.
Karen López: And yet when we take their names in other parts of the world and we do that, they have to know how to change how they tell people what their last names is so they can look at up.
Greg Low: Yes indeed, the Asian one is a good example obviously where family name it is considered to be the most important thing and then it sort of works down from there. In fact it is interesting when you mentioned addresses because they do exactly the same thing with addresses. They are actually; they start with what country, what province or region, what street, what building, then eventually that the person. We actually do it the other way around by default but again it is an interesting one where if you have not stored the components you could not reassemble it in a different order if that was required for a different destination.
Karen López: That’s right. This whole concept of what we consider really basic, easy information. It is actually that some of the most difficult things to get right because of regional differences, cultural differences, transliteration differences, all of these things. Having these discussions with people who have been told that they have two days to code it all and yet I have all these questions. They say just give me the first name, middle name, last name.
Greg Low: Yes.
Karen López: That’s fine, except that our first appointment will be in Singapore. So do we want to get it right anyway? And then the other thing.
Greg Low: Even in a country like Australia, again I will get people with first name, given name, last of, first name, middle name, last name but then things like which of those are required fields. Now, there is actually a standard here in Australia that is evolving but it basically says you should be using given name and family name but the given name is the only required field because we actually have a whole bunch of people in the community that only have a single name.
Karen López: That’s right, there are so many cultures.
Greg Low: Yes.
Karen López: Then I’m trying to, you know I’m trying to write rules, and people are trying to write data quality rules. People really do want the data be to be right and then I will get the requirements along family name should be at least three characters long but then there are entire cultures where names are one or two characters long.
Greg Low: Yes that’s right; I mean Ng is a very common Asian name.
Karen López: A lot of times the data model might produce that is trying to anticipate international rules seems so overly complex to my developers and my DBAs. I have to tell them another one of my taglines is “if you want to make the data model simple, go out make the world simple and then come back and we will make the data model simple.”
Greg Low: Yes in fact one of the ones I loved when people were discussing spatial data they were talking about all the different modelling schemes that they had inside SQL Server and that is all good. There are areas in the US where for example they simply legislated that something was a different shape to what it is. Then you go, um. I can’t deal with that.
Karen López: Exactly!
Greg Low: The real world intervenes, sometimes yes and it gets quite bizarre. What are the most common mistakes you find with data modelling?
Karen López: Well they run such a gamut, I said in my bio I like the practical application management so a lot of times I am sitting on a project between traditional data modellers who want to stop everything and build the perfect data model. Maybe go solve their companies master data management problem, set up taxonomy, enterprise glossary. Meanwhile I am sitting with the developers and DBAs in the middle of the spread and we don’t have time to wait for that.
Greg Low: Yes that’s right the first guys other ones that have projects that are never complete. Right?
Karen López: That’s right, so trying to find that balance of doing the right thing for data as well as I was again from the other side we need to massively did normalize this? Or would you need to have a name field, we would just solve this first name, middle name, last name thing we would just have a name field and I have to fight against that. Trying to find that balance, the real biggest mistakes is trying to find the balance between preserving and protecting the data, loving the data, and getting the project wrong. Everything I find almost a mistake is related to that.
Plus the general unfamiliarity. It used to be that I supported projects, where people were just building something that was just worked locally for them. I am finding that those days are pretty much gone. Even if this would be very common for my US clients, they would say that we have no foreign customers, we are not going to support any foreign addresses, or foreign telephone numbers, we don’t do business outside the US and yet customers have this way of moving away when they are you money and you still want to contact them. All visiting the United States and they still want to buy something for you and I think the days of being able to say that we don’t need to support any international things are pretty much gone. You throw in the fact that, commerce, Internet doesn’t really have borders, even though we have to deal these sort of things.
Greg Low: Yes.
Karen López: I guess that, the biggest mistakes are just modelling requirements that you know and not to the ones that you are not going to have to support. One of my favorite...
Greg Low: What role do you see standards having around these sorts of things? So I am it just interested in your thoughts on just like coding standards or any of those sorts of things? Again I see people coming up would with their own variants of things that are actually already standards for.
Karen López: Yes so fear is kind of a strong word I like the word patterns. I use a lot of patterns, there are several data modelling patterns, and there are books for those, there are also what I call industry standards for data models which are usually factor specifics. So there are specifically ones for retail, utilities, insurance, health care those have been coming along just like gangbusters in just last three years because organizations got together and even competitors and said it is really tough to share information with our supply chain because we don’t have a standardized way of looking at data. So from my point of view those standards which really are more patterns because there is not much to enforce them, they have changed the way modelling and design works. Now you are doing more tailoring of a giant body of knowledge standards about data to tailoring them and making them work. I call that more forensic data modelling that you are going to apply a model. It’s just like working with a package, you are going to a data model all package and trying to figure out how that works and how can we make that work for us while still preserving this benefit while being able to share data in a very standardized way.
Greg Low: What about richness of data types do you tend to use? I suppose one of the questions I have got is that I see a lot of people doing modelling and they use for example SQL Server 2012 basically like it is a new version of 2000. I am just interested to what degree do you let yourself get into new data types and things?
Karen López: So there is common myth that if you just use the most common denominator features of something at any point in time, you could take your whole application, change the databases which really differences between SQL Server 2012. There are a lot of substantial differences in data types, functionality and all of those things that you could just rip out your DBMS and replace it with just another one because you use these really common denominators features. I have yet in my experience see that actually be painless. It is kind of the same thing about the reasons software vendor’s say that they don’t use primary keys or foreign keys in their design, it so that they can support many DBMSs.
I don’t really believe in that, that much it’s because it’s pretty much the foreign key index or constraints and taxes that are very, very similar across many target DBMSs, even essentially different ones. So I do make use of the data types in the features where I can see a clear cost benefit, and risk reasons. For example in SQL Server 2012 is the advent of sequences.
Greg Low: Yes.
Karen López: It these didn’t happen in SQL server until 2012 and yet it is very common for me to use both in DB2 and in other DBMSs. I am happy to see than SQL Server because there are cases where one would want to make use of those. Where we needed those in prior versions of SQL Server, we kind of had to roll our own solution which didn’t work out very well.
Greg Low: Yes oh you could have emulated to a degree.
Karen López: That’s what I meant by roll your own, right?
Greg Low: Yes indeed, yes.
Karen López: Yes, so now I am happy to have that because I am familiar with the use of sequences and for most people who have only worked with SQL Server don’t see them as adding that much value because they never had them before and they weren’t working with other DBMSs. They don’t understand. Why would I use a sequence instead of, the identity property, the identify column and you know I talk to them about the fact that well. With the identify properties you can only have one per table, whereas with sequences because it is a completely separate object from the table. SQL Server doesn’t even know how you are going to use sequences; it is not tied to a table, so you could now have several of them. If you didn’t need to have a bunch of document numbers or something in a table or status codes or something status numbers in there. It gives you some flexibility, in the fact that it works well with doing a more complex transaction because you could go in get all the sequences you needed before you recorded that data. There is all this great new use cases that you could do, I would make use of those in SQL Server even though they are brand new.
Greg Low: Indeed that is really good. I suppose one of the questions that come up, is I think re-factoring databases is still really tough and one of the things I was interested in is I remember a couple of years back I think was Scott Ambler was his name. Produced a database re-factoring book and I thought this is great I love the fact that someone was actually writing on that topic however as soon as I got into the content of this book unfortunately you immediately see this lower common denominator approach. So for example things that could be calculated columns he would end up generating triggers, for example and so and so on. I did love the book at lease from the point of view, it raises the things that you need to think about all gives names to the typical sort of changes that you might need to make in databases and think about the processes on how you might do that. Did you happen to see that one?
Karen López: Yes, I am familiar with most of his works. He is not a fan of data management or data modelling and he has written books on this topics. He comes across the issue of data with a very code centric view so for instance in one of his data modelling books, I think agile data modelling. He talks about how is security number is a perfect primary key for a table and I have full presentations on why that would not even work.
Greg Low: Why that is not a good idea.
Karen López: Not just not a good idea, it just won’t even work. But his re-factoring books, the theory of re-factoring in development regardless of whether it is databases or code is making changes to something in a way that never breaks backward compatibility. That is kind of the gist of it.
Greg Low: I suppose we should maybe sort of just define re-factoring for those aren’t used to that terminology from development.
Karen López: Right, people use that word to just mean changes. It is not about changes, re-factoring code is something that a developer would do while they are in making a better but it is not applying changes that are functional. The theory is that it doesn’t break backwards compatibility as well that is one of the features of it.
Greg Low: Yes, so you are really trying to make improve the quality of the code without actually changing how it works. It is an important thing to be able to do that on an ongoing basis.
Karen López: Right, it’s another one of those things. Really great, that to me is a great goal, so re-factoring databases is about applying changes to a database in a way that doesn’t break anything else. Thomas LaRock and I actually have a debate presentation that we did at that PASS summit we did last year where we actually debate this re-factoring thing. We each take one side on it, so I took the pro side as a project manager. I kind of like the idea about being able to make a change to a database that doesn’t require me to go and find all the applications, all the reports, all the queries, all the spreadsheets, all the charts that are dependent upon it. Get them fixed up before I can make this change. The example I use, let’s say we keep track one email address per customer.
Greg Low: Yes.
Karen López: Marketing comes along and says this doesn’t really match the real world any longer and we want to keep track of multiple email addresses per customer. Maybe by their preference or by order, so in a traditional data modelling and database design that is one to many relationship. So traditionally the way we would solve that is that we would now going create a new table that we hung off customer, called customer email address or customer contact method and you would take email address out of the customer table, put it into this new table, along with the Customer ID and either some sort of number or date or reason or something. Right? That would make a unique instance of Karen’s work email address; Karen’s home email address, something like that.
Greg Low: Hmm.
Karen López: That is the right answer that is the right normalize answer. In order to make that change, we would be changing the customer table and introducing a table where we would actually moving where email addresses is actually stored. That would break anything that use that table, because email address column would no longer be there and anything that would ever need that email address would have to go to this new table. In an enterprise where you are using a lot of systems are accessing customer data, that is probably going to take a long time before we can make changes to although systems. Tap into a lot of companies, we wouldn’t even know all the systems that use email addresses.
Greg Low: It is really interesting you mention that because that is one of my pet reasons why I like to see one layer of abstraction inside the database engine. It is because I go into so many organizations where no one feels like they can ever change anything because they have zero visibility in who is going to scream at them when they make that change.
Karen López: Exactly, so one of the ways we might address this in re-factoring is we might leave email address in the customer table. We would leave that one column in there. We would go ahead and create this new table and we would make sure that that email address in there and we would continue to add new things.
So for a new system that was built, say for the marketing team, we kept track of all those email addresses, we could make use of the new structure and everything existing we could still make use of the old one. We would be violating some data management principles, we have intentionally but controlled duplicate data and then over time in a highly disciplined, these are all the qualifiers. A highly disciplined, well-managed data management development environment, we would then slowly migrate all those other systems and everything to start using the new table and eventually someone would come in and clean up and then make the final change so that email address is now only one place.
Greg Low: My take on it, is that would never happen.
Karen López: Right, that is Tom’s point of view that is why that is highly, highly qualified here. In a highly disciplined, well managed, data management approach that could happen. As a project manager, I actually would want to see this work because I don’t want to be the one that goes to marketing and say yes we would probably never be able to support multiple email addresses all we would have to do something like create a dummy customer for each new email address that points back to the original customer. All we are going to have to do something like make each email address column, a comma delimited and now we still have broken all the old systems or you are just going to have to wait six months to 3 years to where it rewrite the customer system. None of those answers makes me happy either.
Greg Low: The thing I love though, is that had the external code hit a proc, or had the external code here to view instead you could have come up with a decent data model and not have changed the external code.
Karen López: So the reason why I use this particular example is I am not sure of views. So when you go from one piece of information about a customer to now having multiples of those I’m not sure aliases or views are really going to solve that particular type of problem.
Greg Low: Even so, you could of relaid the tables and the view that original thing was sitting you could have still produce that same view and have that still working.
Karen López: Right.
Greg Low: The important thing to me, that you could now still at least have a decent data model that represents reality as changes occur, without having to have the stagnation of not being able to change things. Particularly where you have no visibility in what hits it, that’s one that worries me even more.
Karen López: right that’s why data architects like me are big fans you want to have a real data management, data governance process where we do know all the things, we have standards, we have data repositories, where we do know what sort of systems are using those email address. We don’t have to go through this big re-factoring thing because there is an abstraction layer that allows us to make this change.
The other thing is, about the view thing that has always been our way of dealing with this. Again life gets complicated in that one original email address. It has changed, we now have got for more, we have got one original email address and the customer service that originally email address you had for me no longer works all it let’s just say it is bouncing and I wanted to delete it. How do we know which new email address to replace it with?
Once you go from one too many, it makes it more difficult. The other re-factoring thing though, that is harder for me to get behind is even if you did something like change the name of a column. By the way as a data modeler I never think I am allowed to do that just because the cost to do that. Let’s say you want to change the name of that column, what the traditional with the re-factoring people would do is that they would not change the name. They would create a new column, with the new name.
Greg Low: Yes.
Karen López: In data probably have a trigger to keep track of that, and then they wouldn’t get rid of the original column until someone had verified all the other code had been changed.
Greg Low: Yes.
Karen López: That sounds nice, you know I mean, it sounds really nice that if you multiply that times the hundreds or thousands of columns, data and names and structured changes that happen in an enterprise. You are going to end up with thousands and thousands of these extra duplicate columns.
Greg Low: Yes it is an incredible mess.
Karen López: Yes.
Greg Low: An incredible mess.
Karen López: So we have experience in other ways basically we can use aliases or something like that but you can see how this would be very difficult to manage so I see a lot of the database re-factoring that is happening makes sense and maybe during a development project, where you don’t want to stop people from having to go through their backlog and everything. But putting this in a very, very large environment. One of his points is you know storage is free, it doesn’t cost a lot to do this but if you have a table that has 1 trillion rows and you are adding another 255 characters even though it is a variable field it starts add up.
Greg Low: Yes it does. Another interesting approach I am seeing used it a couple all places now, is people are making use of schemas for this. So what they are doing is that they are defining a schema like it is a particular version of a contract and the views and procs that are in that schema are what an application connects to as it’s schema and then basically all they have to track later on is anybody still using that particular all the things in that particular schema.
Karen López: Yes.
Greg Low: So each new application is designed to use a later so it is actually like a versioned schema and again it is something we don’t have like automated support built into the database for sort of versioning of these sorts of things but again I thought was an interesting approach to try and you schemas is a form of contract.
Karen López: Yes I have seen that both with versions, like application versions as well is development environments right? It just adds a layer of complexity, so everything is a trade-off not just in performance and design but how we do our work.
Greg Low: Listen one thing, in terms now you are based in Toronto and but you have a fascination I note with space.
Karen López: I do!
Greg Low: If anybody follows you on Facebook or Twitter they could not help but know this.
Karen López: That is correct.
Greg Low: So where does that stem from?
Karen López: I have this picture where I tweet once in a while of me when I was about 9 or 10 in an astronaut costume. I definitely grow up in a time, because I was so experienced when space and space exploration just was so exciting and lots of great things was happening. I will my brother had and astronaut lunchbox and just all kinds of astronaut stuff. It was a childhood thing and one of my first jobs out of school was I worked on strategic defense initiatives which involved a lot of space stuff. I worked space division in Los Angeles which was an Air Force Base and that was a very exciting time during the shuttle missions. Every time there was a launch for landing we would go down to the officers club and watch that. That was all really exciting! Then I kind of got out of that for while then a few years ago now, I started engaging people in social media to bring them in and sort of get a behind-the-scenes look of what was going on. I was able to attend the shuttle launch, the Endeavour, which was the second to last shuttle launch and then that just made me think again about what was going on. Since then I have attended a lot of these social media things even for European Space Agency. I have been to visit the Japanese space agency. I played closely with the Canadian Space Agency, especially since Chris Hatfield astronaut on the international space station right now and have been doing wonderful outreach and social media.
Greg Low: Ahh we watched it ago across last night.
Karen López: Oh excellent, yeah people, yes that’s one of my little plugs for the space station. It orbits the earth 16 times a day, so add various points during the year there are several nights in a row where you can go out. You don’t need any equipment, you can just go out and watch this space station and give it a wave.
Greg Low: And you can even signup for an email too.
Karen López: Yes that’s right.
Greg Low: The thing that I loved, they just simply send you an email when it is going to be in your area and where it is visible from where you are. So you just pick a city and they just send you an email. That’s great!
Karen López: They can send you an email; they can send you a tweet so there is lots of
Greg Low: And there is a chart that shows you the days. Where it will appear, where it would disappear and how long it will be there for, it’s great!
Karen López: that’s right, so it only takes depending on what its orbit is, what its path is, you know it can be a couple minutes. It can be six or seven minutes across the sky. I just find that really amazing that you can just go out and see it go across the sky. Give it a wave. There is a hash tag, ISSwave and then some website tracked that. You can go and see a data visualization that everyone who went outside and gave it a wave.
Greg Low: Outstanding! Yes look I am really jealous with you getting to see a launch. I never have managed; I won’t ever now manage to get this see a shuttle launch. I did get to see a landing actually, the second one that came in and that was really interesting. I was on a course in Cupertino area at the time in California and we had driven down to LA for the weekend. It was supposed to land at Edward Air Force Base there; because it was raining they moved it out into the New Mexico desert. I remember originally, NASA said the public can’t come and we thought oh, we were so disappointed because we were in the area and we had driven probably halfway back to San Francisco. When the government stepped in and said oh know the public will be allowed to come. We went wohoo, so we turned around and drove right out into the middle of the desert. It was just absolutely wonderful! Really interesting to thing to see. The thing that really impressed me, I never really picked up when watching it on TV, it sort of looks almost graceful on TV. I have to say it came down like a brick. You know it was not so fast.
Karen López: It’s a glider right?
Greg Low: Yes it is a glider but the thing you will see those TVs shots and they are shown from a jet flying beside it. You know what I mean; it is really coming down mighty fast. So yes it was really quite something to see. It was interesting to see the length of the runway that it was landing on too. It was extraordinary, and lots of things at the end that we were making sure it was not going to go past it. No matter what. It was fascinating to watch and look I’m really sad to see the program disappear from that. I think it was a completely breathtaking and amazing program.
Karen López: It is, the launch is a great and now I have seen several other types of launches as well. Just really exciting stuff, so even though the shuttle program has gone we are still sending astronauts up. As well is now starting up commercial flights, so the shuttle program has just been replaced by other types of vehicles.
Greg Low: Yes I think it is going to be just a constant industry and yes I think it is completely fascinating watching where this will all go. I am looking forward to seeing it all as time goes on.
Karen López: Yes the other thing though that I wanted to say there is a big chunk of the SQL family also into the space thing. So it is not just me, there is probably more than a dozen that have participated in either of these NASA events all gone to watch launches or landings or any of those things. I find that a nice overlap of interests.
Greg Low: Yes in fact, the one I am absolutely kicking myself on was I was in Orlando when one of the recent launches occurred a couple of years back. Tech Ed was there, it was finishing Friday afternoon and then the launch was Friday afternoon and I did not know that and of course I was at the airport on my way out and I couldn’t change my flight at that point. I thought oh I can’t believe it’s on you know not that far away and here I am at the airport on my way back to Australia. I was so sad, so I didn’t actually manage to get to see one of those.
I had a major respect for that stuff; I have spent a lot of time going through the Air and Museum, visiting Houston and just places and stuff. I just love the stuff! The thing that intrigues me is that people thought it was almost safe you know in some way and still I look at that stuff and think these guys are brave. There is just nothing that could ever be safe about that at the moment.
Karen López: Exactly.
Greg Low: It still is a long way off from being safe.
Karen López: It’s been a good learning lesson, just from project management skills. Reading about the things that they deal with, on a massive scale compared to, we think our projects are complex. Reading lessons learned because failures and reading how they deal with readiness reviews and those kinds of things. Some of those can adopt some of those processes, not quite the same as what NASA all those other agencies do. It really does talk about the culture about making decisions and everything I have learned quite a bit from that to.
Greg Low: Actually speaking of government data becoming available and so on, also I noticed you have an interest in the open data movement.
Karen López: I do, mostly because I feel again I am data chick on Twitter, so all things data I find interesting. So open data is a push for publicly generated data, the data we pay to collect should not be made just available back to the public, back to researchers and to organizations but also in standardized formats to enable them to be consumed on a regular basis. So even though it’s been a long-term thing, Freedom of Information Acts and other kinds of things to provide information. The way used to work, is you had to call up some organization and say I am interested in finding out about all the parking tickets that were generated over the last two years. They will come back to you and so that is going to cost us $25,000 to generate a report for you, how would you like to pay? You would usually get paper reports.
Now with open data and with the work that is being done there, not only do you not have to make a request. I mean those vehicles do exist, but governments and agencies are providing this data automatically and what they are finding out is that it is cheaper and less expensive for them to do it that way than to respond to all the requests. Because a request requires a privacy review, a legal review, a security review and now they can cure rate this data and provide it in a way in fairly open formats. The big struggle that organizations are going with now is which format and how to make them the most consumable.
Greg Low: Yes I think that is one thing up I was going to put a link in the show notes to the open data website. The open government ones, I find a lot of folk when working in SQL are just not even aware that these sites exist. Certainly data.gov is sort of the grandfather of all of those ones. Last time I looked it was nearly 400,000 datasets on the US so you really it has almost everything you can possibly imagine wanting to know about the country. It is also worth noting I think that is now pushed out into quite a lot of countries. I know we have data.gov.au and certainly when I look around New Zealand, Singapore and so on and various Europeans ones. Canada is also another one; in fact there are a lot of these open government sites now some of the may in their infancy. I look at the Australian one I think it is only in 1000 or two data sets but I think there are thing that people don’t get is governments collect and amazing amount of what we would almost consider trivia. They deal with it and process it and I think it is really desirable to get them to put all the information in a single spot.
Karen López: Yes it is, it is governments, it is now other organizations are putting non-governmental organizations putting data up there that can be shared. You just don’t understand that hundreds and thousands of datasets. Even people come to me and say I need some data for a demo and you are going to point them to data.gov.
Greg Low: And say just go for it.
Karen López: Go for it, do some searching then drawing back to my passion. You can go back to NASA.gov and get some really exciting data that would make a great where you mashing up space data, against White House data, against weather data, versus crime statistics and all of these things. Right now there should be no reason that anyone should be missing out on some sort of demonstration data or even open data that they can use on their applications.
One that I make use of a lot is where the data that can be used in a lot of analytics. There are a lot of business functions and I have a background in retail. Weather is sometimes a bigger indicator of how well I promotions are going to do than how great the promotion was then actually going into a bricks and mortar store.
Greg Low: Yes that is a wonderful point. Give I look at the Australian data at the moment. It is still kind of limited. They have things like some of the State crime statistics; they have all sorts of taxation related data and so on. One I love is that they have the national public toilet map.
Karen López: Excellent.
Greg Low: I thought was excellent, another is that have the location public barbecues in the ACT one of our territories. I think people underestimate the level of data that governments collect is it really is something. And having a spot for that, yes we all paid for it in that first place to be collected and processed. I suppose the key thing there is when you get to those sites, the format of the data I mean it is one thing to have one spot where it all leaves but there is a big variety of you know he use a spreadsheet, here is a zip file, and so on and I think this is the next real challenge.
Karen López: There is that, there are formats. We all know that even if we declare or data shall be provided in XML format that doesn’t necessarily make it any more usable to us. One of the things like the classic data sharing, data integration issue is that there is format and what is the meaning of the data. Again we come back to what is the customer? What is a citizen? What is a resident? In one data say a resident might be someone who just they said is their residence, their address but from citizenship and immigration purposes a visitor is not a resident so they might not be listed or they might be listed differently. Open data stuff that I am interested in is in solving that next problem. First we have to share the data and we have to put them in consumable formats that are easier to share, like XML, comma delimited, like flat files. Embedded into spreadsheets or PDFs or Word documents and trying to come up with standards for sharing.
Now even within the US there are some bodies that just focus on helping organizations share data not from the technical point of view but coming to agreements on what they mean and how they should be used. One of the organizations with a focus that came out of events from 9 11 in the US is that jurisdictions had a terrible time trying to share data, technical exchange issues, actually conductivity issues and then they started to try and change sharing information and they realized that just emailing spreadsheets around was just wasn’t good enough. So they came up with an organization called NOEM the National Organization for Exchange Model for sharing data among municipals, states and federal agencies and they are expanding beyond borders so that they can help organizations share. I think that is going to be the toughest thing is first we solve the problem of making it accessible, then in a proper format, then having to understand the meaning. The last part is going to be the hardest part.
Greg Low: Yes now lovely and listen so you are also actively involved in social networking, I will put a link in the show notes to your Twitter account. Other places that people will find you?
Karen López: Oh my gosh everywhere so if they look in LinkedIn I can share those with you if you don't have them. You know I am kind of old school too I still hang out in some forums and mailing lists and those sorts of things but primarily on Twitter I am over 100,000 tweets, In my lifetime and that's after Twitter lost about 20,000 a couple of years ago. Let’s just say I am active in it.
Greg Low: Oh that will raise a question on data governance we will leave that for another day. Indeed is there anywhere you mentioned you were at the last PASS summit where would people see you in the upcoming times?
Karen López: Oh my goodness well I do a lot of SQL Saturdays. I just love going to meet people in their local environments and I am speaking at enterprise data world coming up in a few weeks in San Diego. Now that’s from the data management point of view, it’s a conference that focuses on data architecture, business intelligence, data modelling and some no sequel top stuff and emerging trends. And that conference has been going on I’ve still been going to it for about 16 years in a row. So that definitely is in my calendar and that is about it SQL Saturdays.
Greg Low: Now that’s great. Well listen thank you so very much for your time today Karen.
Karen López: Thank you, I had a great questions and it was nice chatting with you.
Greg Low: Indeed lovely, thanks a lot!
Phone: 1300 SQL SQL (1300 775 775) l International +61 1300 775 775 l Fax: +61 3 8676-4913
Copyright 2017 by SQL Down Under | Terms Of Use | Privacy Statement