Modeling Hierarchies (Part 2)
In the last article I gave the theoretical background to using mapping to model complex hierarchies. In this article I will work through an example that is intended to illustrate how the use of the mapping approach can solve some of the key issues associated with managing hierarchies in an application environment.
For the purpose of this example consider the position of a sales representative, named Amanda Barry, who works out of the Massillon plant for a Company which has plants in various parts of the country. She is actually a member of three separate hierarchies simultaneously. These are:
-
Sales Team
-
Plant Organization
-
Customer
We’ll examine each in turn and model them accordingly
The Sales Team hierarchy
The company Sales Force is organized into teams which are based in geographic territories. The hierarchy describing this structure is shown at Figure 1, Note that these nodes represent not only logical groupings, but in some case are also actual positions within the company and have people associated with them. For example, Amanda is a member of the “Great Lakes” team, which has a Manager (Charlie Davis). But the Great Lakes team is considered a part of the “Northern Sub-Division” which is merely a convenient way of sub-dividing the four teams which comprise the Eastern Division . There is actually no real person associated with the “Northern Sub-Division” Node.
To model this structure will require a minimum of two tables, as described above, one for the Node Definitions, and one for the Mapping data (the rip table). Since we already know that we will want to define multiple hierarchies, we will add a third table to the structure, to allow us to categorize nodes. The structure we will use is at Figure 2:
Figure 2: Basic table structure for mapping hierarchies
To map the structure we require entries in these tables as shown at Table 2

Now we can map the structure to the structure table by tracing the paths. Once all of the paths have been defined the data can be added to the hierarchy (the Node PKs are shown below the descriptions in Figure 1). The actual data is at Table 2.
NOTE: We use the “*” character not only to separate, but also to terminate, the key references in the rip field. The reason is so that we can query this field later and avoid issues associated with partial matches on key references when querying the table using the LIKE operator on the rip field.
Consider the following pair of rip fields: “10*20*30” and “10*20*300”.
Clearly they are not referring to the same node but if we were to query where the value is LIKE “10*20*30%” we would get both. By using a termination character we avoid this problem.

The final task is to link our people to the appropriate node in the hierarchy. For the purpose of this illustration we are not concerned with how the Employee data is stored, but simply assume that somewhere there are tables that include a key value for each employee and the usual associated information (Name, Title, Date Joined etc). This data is represented by the “Person” table in Figure 3.
We use an allocation table to allow us to link a single employee to different nodes (we already know that Amanda is a member of three hierarchies, so she will have to be associated with at least three nodes, one in each hierarchy).

The crucial point to notice is that this allocation table links people to the structure table, not to the node!
The reason for this is simply that the node table is only a look-up which allows us to give a meaningful name to a node in the hierarchy. It has no significance whatsoever and indeed, the same name may be used in many places in different hierarchies (how many “Admin” departments are there in the world?). If we were to attach people directly to the node record we would require a separate node record for each individual occurrence of a node in a hierarchy, thereby making the relationship between a node and its structure one-to-one, forcing us back into a very rigid structure.
Now we can add the necessary records to the allocation table for the people that we know about so far, plus the Vice President in Charge of Sales Nationally (Jane Kingsland) and the head of the Eastern Division (Martin Niles) and the manager and a sales representative for the Mid-West team :
What does this give us?
It now allows us to create static views that query the hierarchy and return matching members. Here are the necessary definitions for two views. The first returns the rip field associated with a given node description:
IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id('v_GetRip') and sysstat & 0xf = 2)
DROP VIEW v_GetRip
GO
CREATE VIEW v_GetRip AS
SELECT NN.node_pk, NN.node_dsc, RTRIM( RF.node_path) + '%' AS rip_field
FROM node_stru RF
INNER JOIN node_def NN ON RF.node_fk = NN.node_pk
A simple query is now all that is needed to return the appropriate rip field mask that defines the starting point for data selection. The important thing to recognize is that this ‘mask’ allows to extract information from any level of the hierarchy without the need to know anything about the level or the relationships between the nodes. For example:
SELECT rip_field FROM v_getrip WHERE node_dsc = ‘Great Lakes’
RETURNS “1*2*4*9*%”
while
SELECT rip_field FROM v_getrip WHERE node_dsc LIKE ‘Mid%’
RETURNS “1*2*4*8*%”
This view can then be combined with another static view to return a list of people who are members of whatever node we wish to query by:
IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id('v_GetMembers') and sysstat & 0xf = 2)
DROP VIEW v_GetMembers
GO
CREATE VIEW v_GetMembers AS
SELECT DISTINCT VR.node_dsc keyval, ND.node_pk, ND.node_dsc,PN.title_dsc, PN.fname_nme, PN.lname_nme
FROM v_GetRip VR
INNER JOIN ( node_stru NS INNER JOIN node_def ND ON NS.node_fk = ND.node_pk )
ON NS.node_path LIKE VR.rip_field
LEFT OUTER JOIN node_psn_alloc AL ON AL.nodestru_fk = NS.nodestru_pk
LEFT OUTER JOIN person PN ON PN.psn_pk = AL.psn_fk
Simple queries on this view return the list of people who qualify:
SELECT * FROM v_getmembers WHERE node_dsc = 'Great Lakes'
RETURNS
1 Great Lakes Sales Rep Amanda Barry
2 Great Lakes Sales Manager Charlie Davis
while
SELECT * FROM v_getmembers WHERE keyval = 'Mid-West'
RETURNS
Mid-West 8 Mid-West Sales Manager Ellen Franks
Mid-West 8 Mid-West Sales Rep George Harrison
and
SELECT * FROM v_getmembers WHERE node_dsc = 'Northern Sub-Division'
RETURNS
Northern Sub-Division 4 Northern Sub-Division NULL NULL NULL
Northern Sub-Division 8 Mid-West Sales Manager Ellen Franks
Northern Sub-Division 8 Mid-West Sales Rep George Harrison
Northern Sub-Division 9 Great Lakes Sales Manager Charlie Davis
Northern Sub-Division 9 Great Lakes Sales Rep Amanda Barry
Any query relating to People is now a simple matter. Assuming that we have sales information related to the ID of the person who made the sale, we can get the total sales for any node in the hierarchy by simply summing the sales associated with the list of people who are related to that node:
SELECT SUM( sales) FROM sales_data WHERE sales_person_id IN
(SELECT psn_pk FROM v_getmembers WHERE node_desc = ‘Great Lakes’ )
The same method can be used to extract the members of a Sales Team
SELECT name FROM emp_data WHERE person_pk IN
(SELECT psn_pk FROM v_getmembers WHERE node_desc = ‘Great Lakes’ )
The Plant Hierarchy
The second hierarchy of which our notional person is a member is the “plant hierarchy”, part of which is illustrated in Figure 4. 
We can simply add the data for this hierarchy to our existing tables (Table 4) in exactly the same way as for the Sales Organization detailed above. In this case the Node Category will be "2" and the various plants are added as new nodes – their structure being added to the Node_Stru table as before:
Now we need to associate the Sales Representatives with the plants which they represent (Table 5). In this case only Amanda Barry (Massillon) and George Harrison (Columbus) are directly associated with plants, so only two new entries are required in the allocation table (obviously there would be other people associated with the plants hierarchy but the objective here is to focus only on the Sales side of the business):
Our existing views, of course, will work just as well with this hierarchy as with the Sales hierarchy. Thus:
SELECT * FROM v_getmembers WHERE keyval LIKE 'North Ohio%'
RETURNS
North Ohio 15 North Ohio NULL NULL NULL NULL
North Ohio 17 Cleveland NULL NULL NULL NULL
North Ohio 18 Massillon Sales Rep Amanda Barry 1
Notice that we can now sum our sales by any node in either hierarchy using the same basic query. To get all the sales for all Ohio Plants the query is simply:
SELECT SUM( sales ) FROM sales_data WHERE sales_person_id IN
(SELECT psn_pk FROM v_getmembers WHERE keyval = ‘Ohio Plants’ )
and to get a list of representatives who are working out of any given plant, we use exactly the same query as we used to get a sales team:
SELECT SUM( sales ) FROM sales_data WHERE sales_person_id IN
(SELECT psn_pk FROM v_getmembers WHERE keyval = ‘Columbus’ )
The Customer Hierarchy
Adding a customer hierarchy is as simple as adding the plant hierarchy. Figure 5 shows a partial hierarchy for a large customer group and shows the entries for the Node and Node_Stru tables directly on the diagram:

One more thing that we need to do to accommodate this new hierarchy is to link our sales people to the appropriate node in the hierarchy. For the sake of illustration, we are mapping our Sales Reps to the lowest level of this hierarchy (Amanda Barry to 'NE Ohio [Node 24]' and George Harrison to SE Ohio[Node 25]), the Division Head to the Region level (Martin Niles to NE US [Node 22]) and our VP to National Level 
The net result is that the VP (Jane Kingsland) is now associated with two structures; first, she is attached to the organization at the National Sales node, and second, to the customer organization at the "Wendy's USA" node. Similarly other members participate in multiple structures, so that our friend Amanda is actually in three. She is associated with the sales tree at the 'Great Lakes' level, with plants at 'Massillon' and customers at 'SE Ohio'. However, no matter how many structures an entity is associated with, we still only ever have one record in the person table and one record in the node definition table!
Maintaining the structure
As can be seen, using the approach detailed here it is very easy to map, and to retrieve the data associated with, any number of hierarchies. By defining simple views to hide the complexity of the actual structure it is possible to support any desired user interface and even to dynamically add and delete entire hierarchies. The structures illustrated here show the bare minimum of information that is necessary to manage the data but do show the principles that underlie the approach. An obvious refinement to the Node Structure table is to add a “level” key, to permit data retrieval across all nodes that lie at a specific level of a hierarchy (for example, this would make implementing a TreeView to display the contents of a hierarchy very easy).
By modifying the node structure table so that it includes an additional foreign key for the parent node, and the level, the structure can be maintained using triggers that, when inserting a new record, generate the rip field and even level automatically based upon the rip field and level of the node to which the new record is being related (Figure 6).
The resulting table, for all hierarchies is shown at Table 7:

The T-SQL code for the insert trigger is very simple:
IF OBJECT_ID('node_stru_ins') IS NOT NULL
DROP TRIGGER node_stru_ins
GO
/*********************************************************************
TRIGGER.: node_stru_ins
Function: Insert Trigger for Structure table that automatically creates
........: the Rip Field when adding a new item of structural information
Called..: INSERT INTO node_stru ( node_fk, parent_node_fk )
VALUES ( <emporg_pk>, <parent_pk> )
**********************************************************************/
CREATE TRIGGER node_stru_ins ON node_stru FOR INSERT
AS
BEGIN
UPDATE TSU
SET node_lvl = CASE WHEN TSU.parent_fk IS NULL THEN 1 ELSE PSU.node_lvl + 1 END,
node_path = CASE WHEN TSU.parent_fk IS NULL
THEN '*' ELSE RTRIM(PSU.node_path) END
+ CAST( TSU.node_fk AS VARCHAR( 10 ) ) + '*'
FROM node_stru TSU
INNER JOIN inserted INS ON INS.nodestru_pk = TSU.nodestru_pk
LEFT OUTER JOIN node_stru PSU ON TSU.parent_fk = PSU.node_fk
END
GO
To return the data in a form suitable for display in a treeview we need a little more code, but not very much. Here is a stored procedure that accepts either the PK of a node, or the actual Node Description and uses the rip field to get the nodes, and sequence them correctly, so that they can be displayed.
IF OBJECT_ID('GetTree') IS NOT NULL
DROP PROCEDURE GetTree
GO
/********************************************************************************
SP Name.: GetTree
Function: Retrieve a Tree from the specified root node
Author..: Andy Kramek
Syntax..: EXEC GetTree @vRoot
Params..: @vRoot Either Title/Location Name or PK for the node_def Table
********************************************************************************/
CREATE PROCEDURE GetTree @vRoot SQL_VARIANT AS
SET NOCOUNT ON
/* Figure out what we got given here */
DECLARE @node_pk INT, @node_dsc VARCHAR(50)
SET @node_pk = 0
IF @vRoot > 0
/* Primary Key */
SET @node_pk = CAST( @vRoot AS INTEGER )
ELSE
BEGIN
/* Node Name */
SET @node_dsc = LTRIM( RTRIM( CAST( @vRoot AS VARCHAR(50) )))
/* Get the pk for the specified node */
SELECT @node_pk = node_pk FROM node_def WHERE node_dsc = @node_dsc
END
/* We now have a PK so get the rip field as a mask for the specified node */
DECLARE @Rip VARCHAR(1000), @Max INT
SELECT @Rip = RTRIM(node_path) + '%' FROM node_stru WHERE node_fk = @node_pk
/* Get all matching data, INCLUDING the root node, into a temporary table */
SELECT EO.node_dsc, ES.node_lvl, CAST( 0 AS bit ) haschild_flg,
CAST( 0 AS bit ) isleaf_flg, ES.node_path, EO.node_pk, ES.parent_fk, ES.nodestru_pk
INTO #tempstru
FROM node_def EO, node_stru ES
WHERE EO.node_pk = ES.node_fk
AND ES.node_path LIKE @Rip
ORDER BY ES.node_lvl, ES.parent_fk, EO.node_dsc
/* Now we need to process the data to set the Child and Leaf flags */
IF @@ROWCOUNT > 0
BEGIN
/* Declare the variables we'll need for the cursor operation */
DECLARE @urrip VARCHAR(1000), @nLevel INT, @HasCh BIT, @sLeaf BIT, @nodestru_pk INT
/* And the ones for manipulation */
DECLARE @LastKey INT, @lnCnt INT, @LeafKey INT
/* Create a cursor for the specified fields */
DECLARE curelems CURSOR FOR
SELECT node_path, node_lvl, haschild_flg, isleaf_flg, nodestru_pk
FROM #tempstru
/* Open the cursor, and get first 'record' */
OPEN curelems
FETCH NEXT FROM curelems INTO @urrip, @nLevel, @HasCh, @sLeaf, @nodestru_pk
/* Set the counter */
SET @lnCnt = @@FETCH_STATUS
/* Process all records */
WHILE @lnCnt = 0
BEGIN
SET @LastKey = @nodestru_pk
/* Does it have any children */
IF EXISTS( SELECT node_fk FROM node_stru
WHERE (node_path LIKE @urrip + '%'
AND nodestru_pk <> @LastKey) )
BEGIN
/* It does have children! */
UPDATE #tempstru SET haschild_flg = 1 WHERE nodestru_pk = @LastKey
/* Now figure out which of the children is the last at that level*/
SET @LeafKey = (SELECT TOP 1 nodestru_pk FROM #tempstru
WHERE (node_path LIKE @urrip + '%'
AND nodestru_pk <> @LastKey )
AND node_lvl = @nLevel + 1
ORDER BY node_dsc DESC )
IF @leafkey > 0
UPDATE #tempstru SET isleaf_flg = 1 WHERE nodestru_pk = @Leafkey
END
/* Get the next 'record' */
FETCH NEXT FROM curelems INTO @urrip, @nLevel, @HasCh, @sLeaf, @nodestru_pk
SET @lnCnt = @@FETCH_STATUS
END
/* Lose the cursor */
CLOSE curelems
DEALLOCATE curelems
END
/* Get the results and lose the temporary table */
SELECT * FROM #tempstru ORDER BY node_lvl, parent_fk, node_dsc
DROP TABLE #tempstru
RETURN
GO
Calling this procedure for "Ohio Plants" returns the data shown in Table 8. Note that the "Has Child" and "Is Leaf" flags are added dynamically. The first is used to indicate that a node has children, the second to indicate that the node is the last node at the current level. These are useful when generating a tree by parsing the data into an array or other structure used by the UI development tool. The PKs are included so that any modifications made can be sent back to the database by referencing the relevant primary key.

Unlike the Left-Right numbering approach, changes to hierarchies modeled using a rip field affect only those records that are directly involved. There is no need to reconstruct the entire hierarchy whenever a node is added, deleted or moved. All that is necessary is to change any records that include the affected node, replacing the relevant portion of the rip field with the amended version.
For example, if we were to disband the Southern Sub Division (see Figure 1) and move its “Central” sales team to the Northern Sub Division and the “South West” team to the Western Division, a total of three records in the Node Structure table are affected (Table 9). The process is simple:
-
Retrieve the ripfield mask for the node to be removed (OldStru)
-
Retrieve the mask for the node to which all child nodes are to be re-assigned (NewStru)
-
Update the structure table replacing all occurrences of "OldStru" with "NewStru"
-
Delete the one record where the rip field is exactly equal to "OldStru"
Obviously, any tables which included references to the deleted record would also have to be updated, but the key point here is that nothing else has to change. All data which previously aggregated to the “Southern Sub” group is now going to be aggregated into either the “Northern” or “Western” divisions.
Conclusion
The mapping approach described here provides the most flexible and easily maintained method of modeling hierarchical data of any type. By separating structural information from definition the retrieval of hierarchical information is simplified to the point where simple SQL can be used directly instead of requiring table queries, recursive “Tree-Walking” operations or the complex and inflexible left-right numbering which are the only viable alternatives.
NOTE: The sample SQL2005 database used in this article is attached as a zip file. Simply restore the database to your local SQL 2005 instance to see how all this works in practice.