Welcome to Foxite.COM Community Weblog Sign in | Join | Help

Designing A Database -Part 3

----------------------------------------------------------------------------------

12/03: Sorry for any inconvenience - somehow I forgot to 'publish' the pictures for this article and since I only reviewed it while logged in to the site as the owner, I didn't realize until I got a message from a reader that they were not visible to others.

Please accept my apologies for this oversight I will remember in future.

----------------------------------------------------------------------------------

In the first two articles I concentrated on the infrastructure that is required in order to design a database – the logical model and how it relates to the business and the types of tables that we can use. However, the essence of a relational database is the fact that relationships between tables are established. In this specific context we are use the word "relation" in a very specific sense. What we mean is a persistent relationship that is defined in, and usually enforced by, the database. Any type of table can be involved in transient relationships (either through the use of SQL or, as in Visual FoxPro, through a programmatic definition) but that has nothing to do with these patterns.

Single table

The simplest of all relational patterns is the single table which is not surprising since, by definition, the single table has no persistent relation to any other table. Such tables are usually used to store definitions or preferences and are most often used for handling metadata or standard look-up definitions. Another common use for the single table is as the target for de-normalized data used for reporting or export although, in such cases, the data is typically transient (i.e. it is cleared and replaced by each operation).

Extension tables

It is often stated, in the context of normalization, that tables that are related on a one-to-one basis should be combined into a single table. However, that is only true when the relationship is always, and exactly, one-to-one. As soon as the possibility of a one-to-zero is introduced the existence of two tables that have a “one-to-zero or one” relationship is valid. Such tables are used to avoid adding columns to a table that will not be required by all records and are usually referred to as Extension Datasets.

 

Figure 1: One-To-Zero-Or-One

Interestingly such extension datasets are often implemented as vertical tables. For example when faced with the task of recording people’s certifications and qualifications, a conventional table is of little use. After all, not everyone has certifications or qualifications to begin with (the one-to-zero condition), and since there are an almost infinite variety of qualifications that a person may hold a plain text (free-format) field is often the only viable approach.

One-to-Many tables

This is often referred to as a “Parent-Child” relationship, although there is absolutely no reason why such relationships should be restricted to two tables. In fact relational chains are a common feature in more complex data models. The relationship is characterized by a record in one table being used to store data that refers to more than one record in another. This is often used when it is necessary to model entities that are made up of repeating sets of data that are grouped.

 

Figure 2: One-To-Many

The key concept that underpins a one-to-many relationship is that of "ownership". In other words the data at the one end of the relationship has direct ownership of the data at the many end. For example, an invoice typically consists of ‘header’ information (customer name, order number, date and so on) and one or more ‘detail’ lines (the actual items being invoiced). Clearly the invoice line items belong to, and are owned by, the invoice header – or, to put it another way, they only have meaning in the context of their owning invoice.

This can best be modeled in two tables, one storing only the header information and one storing only the detail lines the - hence another name for this type of relationship – "header/detail".

The relationship is implemented by storing the Primary Key of the parent table in a column in the child table. This is a Foreign Key and it defines the many end of the relationship, while the primary key in the parent table defines the one end. Of course, such relationships depend upon the existence of suitable keys in the database.

Note: A Primary Key is defined as a column, or combination of columns, whose value uniquely identifies a record in a table. Many database designers (including myself) strongly advocate the use of ‘surrogate’ (i.e. a system generated value of some sort that is unique within the table and that has no purpose other than to be the primary key) primary keys to simplify the definition of relational keys and to separate data that has business significance from task of managing the referential integrity. Most databases have built in mechanisms for generating unique IDs within tables (‘auto-incrementing’ data types in VFP, ‘Identity columns’ in SQL Server and  ‘Sequences’ in Oracle).

Tables which implement a one-to-many relationship have that relationship defined within the database as “persistent”. The referential integrity management structure is made up of these persistent relations and the rules are implemented by using them

Many-to-One tables

At first sight you may be tempted to think that a Many-To-One relationship is simply a One-To-Many relationship viewed from the other end, and if taken literally, you would be correct. However, in terms of database design a many-to-one relationship implies a look-up, or decode, relationship rather than one that is explicitly enforced by keys. In other words the difference lies in the ‘intent’ and, if you are familiar with software design patterns you will already know that intent is one of the four key characteristics of any pattern (the others being name, an abstraction of the problem and the allocation of responsibilities).

The distinction lies in the lack of ownership. In a one-to-many relationship the data in the many table is owned by the data at the one end of the relationship. In the case of a many-to-one there is no such ownership. For example, we may define various types of address (e.g. Home, Office, Mailing, Billing, Shipping) in our lookup table, but when we reference one of these types from the Address Data table it does not mean that the address is owned by the Type. In other words, if we were to delete the "Shipping" type we would not automatically want to delete all addresses that referenced it (though we would need to re-assign them).

 

Figure 3: Many-To-One

The pattern is implemented by storing a reference to the primary key of the look-up table in the data table rather than storing the actual data represented by that key. Not only is often simpler to store a single key value than the entire set of data represented by that key but it allows us to avoid hard-coding definitions into data. This is, of course, a mixed blessing since there are times when we definitely do not want changes in definitions to be applied retroactively.

Tables that implement a many-to-one relationship often employ ‘constraints’ (or other database rules) to ensure that only valid data is entered into the Foreign Key columns, but they are not usually involved in persistent relationships and so are not normally involved in the referential integrity management structure.

Many-to-Many Relationships

While this is definitely a pattern, it is one that cannot actually be implemented directly in a database. It defines the situation in which a record in one table is related to one or more records in another, while each record in that table is  related to one, or more, records in the first. The result is that there is no unambiguous way of identifying which set of records in one table belong to a single record in the other. This is what we mean when we say that it cannot be implemented directly.

 

Figure 4: Many-To-Many

The only solution is to break the many-to-many relationship into a pair of one-to-many relationships. This is achieved by introducing a third table to the pattern. These tables have various names and are most commonly referred to as either "Allocation", "Link" or "Cross-Reference" tables.

Often an allocation table contains only sets of foreign keys, although there is no reason why it cannot also contain data in its own right. For example, the relationship between a list of products and orders is many-to-many (one product can appear on many orders, and one order can include many products). The solution to the problem is to create an allocation table by defining line items. Now we can specify that whilst one order may have many line items, each line item can refer to one, and only one product.

Figure 5: Adding an Allocation Table

Thus the relationship between Order, Line Items and Products is one-to-many-to-one. Similarly the relationship between a product and a line item is that a product may appear on many line items, but each line item refers to one, and only one, order. The relationship between Products, Line Items and Orders is now also one-to-many-to-one. With the allocation table in place we have an unambiguous path in each direction; so that we could get a list of all products on an order, or a list of all orders involving a single product.

Incidentally the line items table is an example of an allocation table that contains data in addition to the keys for the tables whose many-to-many relationship it breaks.

Note that when diagrammed the allocation table may look as if it is implementing two separate relationships, a one-to-many in one direction and a many-to-one in the other, but that is merely a limitation of the diagram. The relationships are, as described above one-to-many-to-one in both directions.

Self Referential Relationships

All of the relational patterns discussed so far concern the relationships between records in different tables, i.e. between disparate data. However, it is perfectly possible to have relationships that exist between records in a single table i.e. between identical data. The actual relationships can be either one-to-one, or one-to-many all that is needed is an additional column in the table that contains the primary key of a record in the same table. Records that have no related records can be identified by setting their foreign keys to either their own primary key, zero or null. There is no ‘right’ solution to this, although my personal preference is to use 0 in these situations.

For example, the data relating to people filling positions within an organization is essentially the same. This could be modeled by defining a relational chain that contains a table for each level in the organization (e.g. “executive”, “manager” and “staff”) and establishing a set of one-to-many relationships between them. However, this is very rigid (how do we cope with ‘staff’ who report directly to an ‘executive’?) and quickly becomes unwieldy (when someone gets promoted we have to remove them from one table and re-create them in another).

The solution is to recognize that what we are dealing with is actually self-referential (sometimes called "reflexive") and to implement it in a single table. This is done by including in each record a column which is actually a foreign key pointing to an individual’s immediate superior. Notice that as soon as we do this, it ceases to matter whether that person is a ‘manager’ or an ‘executive’, they are simply the ‘immediate parent’. Also, if someone gets promoted, all that we need to do is to update their foreign key to point to their new boss (any staff who now report to the promoted person are simply assigned by updating their foreign keys).

 

Figure 6: Self-Referential

Self referential tables work well when there are only two levels of relationship (i.e. each record can have only one ‘parent’) and so they are often used when modeling trees. They are less useful when the relationships get more complex, as in Bills of Material, though they can still be used in some specialized situations.

The final part of this little series will take a look at rules.

Published Sunday, November 23, 2008 1:01 PM by andykr
Filed Under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# re: Designing A Database -Part 3

Monday, November 24, 2008 7:21 AM by Rajesh Solanki

Greate! Very usefull articles!!

Thank you for the kind words

# re: Designing A Database -Part 3

Friday, February 13, 2009 9:34 PM by Jorge Vejerano
This articles are very useful for me, thanks and God Bless You!

What do you think?

(required) 
required 
(required)