what is a "vertical table"?
the short answer is that a vertical table is one which does not store data elements in separate columns. instead data is stored in rows, with each row containing a description of what is being stored (the “attribute”) and the actual value. in a normal table, the requirement to capture an additional attribute means adding an additional column to the table. the result is that the table gets wider (i.e. grows horizontally) as new attributes are added.

in the case of a vertical table, each row contains both the description of the attribute and its value. the result is that as additional attributes are added to the table it acquires more records (i.e. grows vertically) but never gets any wider. at its simplest, a vertical table consists of only two columns, the attribute descriptor and its value, hence the alternate name for this type of table; an ‘attribute/value pair’ table.

whilst vertical tables can be used in many scenarios, there are two key criteria that, when satisfied, indicate that a vertical table is actually required. the first is that the data being modeled is infinitely extensible. the second is that the data related to its owning entity is always accessed as a complete set. these criteria are directly related to the advantages (and limitations!) of vertical tables and so merit detailed attention.

the term "infinitely extensible data" is simply a fancy way of saying that there are no specific rules about the data in question. one of the most commonly encountered examples of infinitely extensible data is postal addresses. this may sound odd, but if you think about it, there are no hard-and-fast rules for how postal addresses are defined. for example the us postal address for an apartment might look something like this:

123 crossover street, apt 12a
akorn, oh 44123

while an equivalent type of address in the uk might be:

12a cowsfoot house
234 pastures walk
london
sx1 3gg

analysis shows that each actually contain the same basic set of attributes (apartment number, street address, city and postal code) but the way in which they are defined is totally different. thus the uk postal code includes the county (more accurately it defines a unique "postal district" which means that the county is implicit) while the us zip code requires the state to be defined explicitly. so what is usually done is to create a table with columns for each attribute and screens that allow data for each column to be entered. the result is usually something like this:

which, of course, is perfectly ok - as long as the table was not defined with purely us coding in mind (i.e. 2-letter state codes and 5/9 digit zip code) but there is obviously no standard for the way address details are handled and, in practice, very few addresses really require all of these columns anyway. now consider the same addresses as they might be stored in a vertical table:

the first thing to notice, which is a benefit of this approach, is that we can now name our "columns" much more appropriately for each address. second, we have no wasted space in the table. each address consists of as many rows as it actually needs. however, there are some issues that do not arise in a more conventional table.

first, even though we know which rows belong to which owner, we have no way of knowing in which order they should be presented because we do not have standard column names that can be mapped to a ui or report. so we need an additional column for each set of attributes to define how the attributes should be interpreted.

second, our ui (or report) needs to be able to dynamically assign the appropriate number of rows and their descriptions because, while the address for owner 123 has 5 attributes, owner 125 has only 4 while owner 124 has 6 attributes. more importantly the attributes that were used differ between addresses.

so, you may be wondering, why don't we all use vertical tables like this for handling addresses? the answer is twofold. first, that while addresses considered globally are infinitely extensible in any one context they are not. after all, all us addresses are handled the same way, as are all uk addresses. so in an application context there is no real need for this kind of extensibility, and even when there is, there are other ways to handle it.

the second reason is related to a major limitation of vertical tables, that is, in turn, based in the second criterion for requiring one; i.e. that the data is always accessed as a complete set by owner. what this means, in practice, is that vertical tables are really useful when you need to save and restore sets of data, but are less useful when you need to anyalze that data.

for example, with a vertical table it is easy enough to get the address for owner 124:

select attribute, value
  from address
 where owner_fk = 124
order by sequence

but it is difficult to construct parameterized queries on such tables because there are no defined columns by which to filter results. the only option is to search for specific “attribute = value” combinations. 

thus to find how many addresses we have in new york state the query would have to be written as:

select count( distinct owner_fk )
  from address
 where attribute = 'state'
   and value = 'ny'

and this, of course, assumes that all new york addresses were actually entered using the "state" attribute (and not a "county" or "province" attribute). what is worse, grouped queries that are easily handled by conventional tables are even impossible (like totaling sales figures by state for example).

so, if not for addresses, then what should we use vertical tables for?

there are a myriad of possible uses but, in general, i have found that you cannot simply implement a vertical table in isolation. the schema required, if only to implement the ui, is generally much more complex.

here is an example for handling educational and professional qualifications. the problem with qualifications is that they truly are infinitely extensible, even within a single context. there are absolutely no "rules" and no standards that are applicable.

for example the "standard" abbreviation in the us for a nurse is "rn" (registered nurse). however, in the uk "rn" indicates "royal navy" and the standard nursing qualification is "srn" (state registered nurse). the problem is not just international either, what qualification indicates a medical "doctor"? in the us around 30 are recognized (including phd - doctor of philosophy) and to make matters even worse, even within this set there are conflicts – do could be either "doctor of osteopathy" or "doctor of optometry" while dpm is either a "doctor of pediatric medicine", or a "doctor of podiatric medicine" (better be sure and get the right one to treat your problem)!

the situation is even more complicated when you take into account the fact that there are at least three different, and unrelated, types of qualification that people hold:

  • educational: these document your academic achievements and include things like your college degree and area of specialization. these are static because once you have gained an educational qualification, you have it for life
  • professional: these document your competencies and include things like medical qualification boards and memberships of professional associations. these tend to be static, but can change, especially if there are various grades of attainment or membership
  • certification: these document things that you are allowed to do and include things like practice licenses or specialist skills. what differentiates them from professional qualifications is that they expire and have to be renewed periodically

thus we could have three completely different sets of information which all relate to a individual's qualifications and they require completely different sets of data. educational qualifications typically include an awarding body, subject, date and grade. however, a time-based certification will also need an expiry date and a professional qualification may well need a licensing body and number.

obviously this data meets the "infinitely extensible" criterion. however, unlike addresses it also meets the second criteria because we would always want to retrieve the data in sets. for example, license numbers, in isolation, are meaningless because we would always need to know the context for a license (what it is for, who granted it, when it was awarded, when it expires and so on). similarly the fact that someone has taken a c# programming course is not much use unless we also know the level for the course (beginner? advanced?), when and where it was taken and whether they passed or failed.

in order to model this we will need to use a vertical table to hold the actual data that relates to a qualification to but we will need additional tables to describe how that data is to be interpreted.

in order to support this, we also need to define attributes, and this too is a little more complex, requiring three tables as described below:

the full model for handling qualifications is shown at figure 1 below:

notice, however,  that the attribute definition portion of the model is entirely generic and could be used defining any type of attributes. consequently, for any data entity that needs attributes all you need is to define the attributes in this model and create the necessary "detail" table to link them to the owning entity. in fact, i usually define views to handle retrieving the attribute data so that the complexity of the model is hidden, thus to retrieve the generic set of attributes for an "address" i can query the view by name:
select * from dbo.v_trans_attrib where attgrp_nme = 'address'
returns:

or i can use the attribute code directly and get the set of attributes for a certification: 
select * from dbo.v_trans_attrib where attgrp_cde = 'cert'
returns:

by defining these views so that they include all relevant keys and codes it is a simple matter to have the ui generate the necessary data to retrieve the attribute sets, their associated lookup values and data. since every table has a surrogate primary key, the update code is also very simple (and generic). 

ok, so vertical tables can work, but why bother? there are really two reasons. first, as i have tried to show, vertical tables can provide a flexible, and more importantly a generic, mechanism for handling totally disparate data that would otherwise require multiple tables with different columns. second, and perhaps most important, vertical tables can handle changing data requirements without requiring changes to table structure. adding a new attribute is merely a question of adding a new row.

they are not always applicable, but when you are dealing with data that is both extensible, and heavily set oriented, vertical tables offer a powerful and flexible way of handling the data. the additional planning and work involved in setting up a vertical table based model and creating the generic supporting mechanisms is, in my opinion anyway, amply repaid by the benefits achieved in terms of reduced maintenance and flexibility that they permit.

3 Responses to Working with Vertical Tables

  • Doug Hennig says:

    Great post (as usual), Andy. Very nice explanation of this design. Unfortunately, I’ve run into applications that use vertical tables because the developers were too lazy (or didn’t  know how) to add columns to an existing table as the requirements changed over time.

    Ah, well you can always abuse a good design and the incompetence of others is no reason for you or I not to use the design appropriately — Andy

  • boudewijn lutgerink says:

    Did you have, by any chance, a look at Magic, the development environment? The whole system depends heavily on vertical tables. I had a demo on Magic today and saw immediately how it was organized. Good description Andy. If you don’t mind, I keep this in my treasure case for future use.

    Assuming this is the same thing, I haven’t seen Magic since the late 1980′s. I confess that I didn’t like it much then, but it must have eveolved somewhat… — Andy

  • Bob Brook says:

    Just a small point on UK postcodes.
    You cannot infer a county from a postcode.
    Although 98% of postcode areas do cover only one county. Some postcodes (or areas covered) do cross over into an adjourning county.

    Primarily – counties are not important for the Post Office when delivering letters.

    Agreed, but then I never said that they were important, or even that you could infer the county!

    (Although many people seem to be unaware of it, there is a significant difference in meaning between ‘infer’ and ‘imply’. I used ‘imply’ deliberately, which the OED defines as:  “Strongly suggest the truth or existence of a thing not expressly asserted“).

    And what I actually said was:
    Thus the UK postal code includes the County (more accurately it defines a unique “postal district” which means that the county is implicit)
    If you insist on pedantry, read it literally as: 
    … that the county is strongly suggested but not explicitly expressed)
    In any case the point was to contrast it  with the US Zip code which takes no account of either City or State and requires that they both be specified explicitly.  — Andy

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>