Welcome to Foxite.COM Community Weblog Sign in | Join | Help

Modeling Lookup Tables

One topic that caught my eye on the Foxite forum recently was a discussion of how to implement Lookup Tables. Like many such questions it provoked strong opinions and definitive statements of the "Thou shalt ...." type (i.e. the type of "dogmatic logic" based on opinion, instead of reasoned argument). The question centers on whether it is better to use a single table for all lookup data (the so-called "One True Lookup Table [OTLT] approach) or to have separate tables for each different set of lookup data that is required. As with many such data The usually quoted conventional wisdom is that:

Developers like the idea of a single lookup table, but Database Administrators do not

and the reason why developers like them is that they simplify the creation of maintenance screens and allow generic code to be written to handle the population of controls in the user interface. Conversely the reasons why DBAs do not like them are that (according to an entry in "Systems Engineering and RDBMS" at http://decipherinfosys.wordpress.com/2007/02/01/otlt-one-true-lookup-table/ )

This is an OO design pushed into the database tier and poses many problems including :

  • Not being able to force foreign keys (so your data integrity is compromised right there),
  • Not being able to use the right data-types since everything has to be a string in order to accomodate all data-type values (this itself can give rise to bad data issues since a numeric(4,2) constraint cannot be enforced –> thus domain integrity is out of the door,
  • Because of (b), the length for the string column will be huge,
  • One has to then write complex SQL queries in order to retrieve the data and also take into account the type conversions that occur when you join this table with the other tables…implicit conversion is not allowed in all RDBMS and is not a good thing anyways.

However, as all too often with such didactic statements, this is based on a couple of assumptions; first, that all lookup data must be contained within a single table and, second, that all "lookup data" is the same and must all be handled the same way. Now, if those assumptions are really correct, then there is no question that using a single table for all lookup data is a bad idea (for all the reasons given). However, are they really valid?

The key to understanding this issue lies in one of the most basic principles of good database design - which is that "a database must model the business process that it supports". When viewed from this perspective it turns out that the assumptions on which the argument is based are incorrect. What we are taking about here is what I defined in my article "Designing A Database – Part 2" (see November 2008 on this web log) as Support Data. In that article I said the following:

Although support data cannot be derived from other data and therefore requires permanent storage, it differs from core data in that it is does not form part of the body of data upon which business applications operate directly. This includes all data that is used define how the system behaves (i.e. metadata and preference data), or how the data actually stored in the system is interpreted (i.e. look-up and reference data). The support data is a key element in the database in that it is primarily concerned with the translation of data into information – and this applies equally to metadata as it does to look-up or reference data.

Notice the highlighted section. The rules of data normalization require that we use Foreign Keys to reference values that would otherwise be repeated in a table and such values are usually referred to as "lookups" leading to the idea that they are all equivalent in some fashion. In reality that is not the case and there, as indicated, TWO types of data that have to be dealt with - "lookup" and "reference" data. In the previous article I did not expand on the distinction because it was not really relevant in that context. Let me do so now, because this distinction is, in my opinion, the key to the whole OTLT issue.

Lookup Data should, as the name implies, consist only of things that can be "looked up" or, to put it another way, consist of only one piece of significant information. So for a lookup, the foreign key directly substitutes for a denormalized value. Conversely, "Reference Data" may comprise many pieces of related information and a the foreign key is used to retrieve the set of information that is applicable. You may be thinking, about now, that I am making an artificial distinction, but believe me, I am not. Table 1 shows a few examples of "lookup keys" and how they map to the different categories.

Notice that for all keys that fall into the lookup category there is always just one piece of information – a description, and nothing else. While in the reference data category there is always more than one piece of information associated with the key. I can't speak for anyone else but myself, but to me it is obvious that there is no way that storing "Customer Credit Terms" in the same data structure as "Units of Measure" would make any sense. 

So the assumption that all lookup data is somehow the same is false to start with. For anything that falls into the Reference Data category, an appropriately designed table should always be used and I absolutely concur with the premise that doing anything else is poor design.

But what about the Lookup Data category? The fact is that all of this data is actually 'enumeration' data and that is probably a better term to use than "lookup" for this type of data. Reduced to basics it comes down to a Key/Value pair. Even so, if we were to use a single table for all such data we do run into the issues raised by opponents of the OTLT approach. Specifically, keys become difficult to define and either we have to enforce uniqueness on the "lookup key" or we have to introduce a 'Type' column to create compound key (i.e. Type + Value). Neither option is attractive and it would indeed complicate the SQL required if we had to adopt such a solution.

But hang on! We are talking about relational databases here (we wouldn't have foreign keys if we weren't)! So why are we constrained to use a SINGLE table? The short answer is that we are not and, for enumerations (i.e. that consisting only of code and description) we can use a PAIR of tables. The structure that I like to use is shown in Table 2.

Note that all of the issues disappear if we adopt this approach. Each table has its own (surrogate) primary key, and the relation between Type (header) and Enumeration (detail) is maintained by the foreign key in the detail table. There is no need to artificially restrict the key values in the detail table and a simple SQL inner join returns the appropriate set of enumerations. If there are specific sets of data that are used often by applications, we can define views for them and create either stored procedures, or functions, to handle specific requirements. The added columns on the detail table are for the convenience of the UI developer. The "Active" flag is used to indicate whether the value should be made available in the UI, and the "Default" flag indicates which value (if any) in the set should be used as the default in the User Interface. They do not have any significance, or impact, at the database level.

So my position is that reference data requires its own tables, but enumerations can be handled using a header-detail pair of tables. This does, however, raise another question. Should enumerations even be stored in the database? After all this data can be handled directly in code if all it is doing is providing expansion of key values to their associated description. The short answer is that they most definitely should be, and there are several reasons.

  • If enumerations are defined only in application code, then adding a new item to the selection always means recompiling, and re-distributing, the application. Admittedly if the new item requires special handling in some way, then its addition would still require code changes, but the addition of, say, a new contact type would probably not require any code change at all and can be handled by simply adding a row to a table and letting the code that accesses the enumeration tables return the new type.
  • Unless the enumeration is stored in the database, there is no way to decode the enumeration key values for reports or when exporting data to another application (e.g. an Excel spreadsheet for reporting or analysis) or to a data warehouse.
  • When multiple applications access the same database, there is not only the issue of ensuring that enumerations are available to each but also that of ensuring that they are synchronized and consistent. The scenario where one application interprets an enumeration key of 1234 as "Home Phone Number" while another defines it as  "President" does not bear thinking about!

So whether we are talking about an On Line Transaction Processing, a Management Information System, or a Data Warehouse, we always have to cater for two distinct types of "lookup data". The enumerations that are used solely to normalize values that would otherwise have to be repeated and the reference data that is used to manage various aspects of the application. They are totally different entities and should be handled in different ways.

So is there a case for the "One True Lookup Table"? Not in the way that it is usually defined. The trick is not to confuse reference data with enumerations and not try and shoe-horn reference data that has different attributes into a single common structure. However, a pair of tables can certainly be a much more efficient way to handle enumerations than having lots of separate, identical tables, each of which contains only a few static rows.

For example a CRM application I worked on recently has 40 rows in its lookup header table  (i.e. we would have required 40 tables if each type were handled separately) and a total of 303 entries in the detail table (there are 50 "States" and 42 "Providers", so the average number of 'rows' per type is only about 7). Can forty, identical, seven row tables really be more efficient than the header-detail pair? Hardly!

Published Sunday, May 10, 2009 12:59 PM by andykr
Filed Under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# re: Modeling Lookup Tables

Monday, May 11, 2009 4:16 PM by James Richard Wade

Another great topic and exposition, Andy!  I have both your books and look forward to you blogs so much that I really miss them when you have a period of speaking engagements that limit your time.  You're a dedicated teacher.  Thanks for sharing your knowledge and exposition talents.

Thank you for the kind comment. Since Marcia and I have given up speaking, and we have also given up the KitBox column, I do have a little more time fore the blog these days. Don't know if that's necessarily good, or bad, but that's how it is Smile <img src=" /> -- Andy

# re: Modeling Lookup Tables

Monday, May 11, 2009 10:12 PM by Steven Black

Another really great article, Andy.

I'm wondering, do you have insight into practical or advisable general-limits of this?

For example, if our database tracks documents, document statuses could be a lookup/enumeration.  That may be perfectly fine in many implementation cases.

However, if document statuses are at-the-core of our application, the subject of things like policy, workflow, and decision heuristics, then lumping document statuses with other lookups can get ugly because the common shared-structure precludes easy extensibility.

Therefore I wonder if we could wisely, at the outset, state that reference and lookup data should always be limited to adjunct data, and should specifically exclude enterprise- or core-data.  

In other words, disregard the apparent easy-fit; if the entity is a core element-of-the-enterprise, stick it in its own dedicated structure.

I've seen this many times before: a generic data structure is introduced and, within a short time, it gets abused.  As in, "all-round-pegs now go into this-round-hole", sometimes without proper regard for the true-nature of particular roundish-pegs.

It always seems difficult to pin serious underlying problems on evidently elegant generic structures.  When limitations appear, even smart people tend to code-around the problem rather than pay the immediate refactoring-cost of extracting the entity from the otherwise "elegant" generic data structure.

At the outset I think it's wise to say, if the entity is "core", or if the entity will be  built-upon or extended, it's not a lookup even though its current structure fits into the lookup.

Not sure about "insight" but I see what you mean, and agree totally. However, as soon as you introduce the possibility that some data entity might consist of more than an enumeration then, by definition, you have turned it into a reference entity and refactoring is the only sensible decison.

Of course it is not always possible to tell, in advance, which enumerations may need to alter but then that is what Views, Cursors and Stored Procedures are for (and why I especially recommend their use in these scenarios).  Each allows you to change the database implemention without changing the interface that the database exposes and cumulatively they provide a methodology for handling changing requirements that might invalidate earlier decisions made on the basis of the information available at the time.

Having said that your other point about 'roundish' pegs is very true. It is often hard to convince people that doing it right first time is usually the best solution in the long run. As we both know, refactoring never gets easier with time! TH elonger you leave it the harder it will be.  -- Andy

# re: Modeling Lookup Tables

Tuesday, May 12, 2009 8:01 PM by Mike Yearwood

If the entity is core or not, you will choose a different tactic for dealing with them?

Absolutely! There is no one-size fits all to data modeling and only a fool believes that there is only one design. There are many design patterns, for databases, not just one and it is your job as the database designer to understand them, and to choose the most appropriate design for each element of the database.

Why not just apply the same data modelling tactic in all cases and never have to worry about refactoring one of the lookup types out of the lookup table?

Refactoring is not the issue here. What matters is correctly classifying your data and that depends on a quality of your initial analysis. The overwhelming  case (better than 98% I would say) for enumerations is that they never need any additional data or refactoring and their structures remain unchanging throughout the life of the database.

Rarely some overlooked aspect surfaces, or things change dramatically, and refactoring becomes a necessity. However, you shouldn't design for exceptions, or for the 2% use case. You choose the best design possible given the information at hand at the time. 40 identical tables with less than a dozen rows each is NOT a good design, no matter how you slice it. -- Andy

# re: Modeling Lookup Tables

Tuesday, May 12, 2009 11:48 PM by Mike Yearwood

Joe Celko is an acknolwedged guru of SQL and he thinks it IS a good design to have 40 identical tables.

http://www.dbazine.com/ofinterest/oi-articles/celko22

I am sorry, but you post this as if it somehow refutes what I am saying. Nowhere does he say, or even suggest, that having 40 identical tables is a good idea. What he says is that using a single table for all lookups is a bad idea - and that is exactly what I say too. Did you actually READ anything I said?

Joe and I are in total agreement on this issue - at least as far as he took it. I just took the analysis a little further. So I am at a loss as to why you are bothering to post such ridiculous assertions.  -- Andy

# re: Modeling Lookup Tables

Wednesday, May 13, 2009 2:28 AM by Charter COmmunications
Big Fan Andy!!  read your books and follow your blog.. many thanks for all that you do.

# re: Modeling Lookup Tables

Monday, May 25, 2009 7:15 PM by Alex Kato Ba

Thanks! that was an excellent piece of article

Thank you. Glad you found it useful -- Andy

# re: Modeling Lookup Tables

Wednesday, July 15, 2009 12:08 PM by chris

"Note that all of the issues disappear if we adopt this approach. "

I can't see, that the datatype issue disappears

What "datatype issue"? Since I am specifically talking about Enumerations that consist only of a Key and a Description there can be no datatype issues....? -- Andy

What do you think?

(required) 
required 
(required)