Welcome to Foxite.COM Community Weblog Sign in | Join | Help

Modeling Hierarchies (Part 2)

In the last article I gave the theoretical background to using mapping to model complex hierarchies. In this article I will work through an example that is intended to illustrate how the use of the mapping approach can solve some of the key issues associated with managing hierarchies in an application environment.

For the purpose of this example consider the position of a sales representative, named Amanda Barry, who works out of the Massillon plant for a Company which has plants in various parts of the country. She is actually a member of  three separate hierarchies simultaneously. These are:

  • Sales Team
  • Plant Organization
  • Customer

We’ll examine each in turn and model them accordingly

The Sales Team hierarchy

The company Sales Force is organized into teams which are based in geographic territories. The hierarchy describing this structure is shown at Figure 1, Note that these nodes represent not only logical groupings, but in some case are also actual positions within the company and have people associated with them. For example, Amanda is a member of the “Great Lakes” team, which has a Manager (Charlie Davis). But the Great Lakes team is considered a part of the “Northern Sub-Division” which is merely a convenient way of sub-dividing the four teams which comprise the Eastern Division . There is actually no real person associated with the “Northern Sub-Division” Node.

To model this structure will require a minimum of two tables, as described above, one for the Node Definitions, and one for the Mapping data (the rip table). Since we already know that we will want to define multiple hierarchies, we will add a third table to the structure, to allow us to categorize nodes. The structure we will use is at Figure 2:

Figure 2: Basic table structure for mapping hierarchies

To map the structure we require entries in these tables as shown at Table 2

Now we can map the structure to the structure table by tracing the paths. Once all of the paths have been defined the data can be added to the hierarchy (the Node PKs are shown below the descriptions in Figure 1). The actual data is at Table 2.

NOTE: We use the “*” character not only to separate, but also to terminate, the key references in the rip field. The reason is so that we can query this field later and avoid issues associated with partial matches on key references when querying the table using the LIKE operator on the rip field.
Consider the following pair of rip fields: “10*20*30” and “10*20*300”.
Clearly they are not referring to the same node but if we were to query where the value is LIKE “10*20*30%” we would get both. By using a termination character we avoid this problem.

The final task is to link our people to the appropriate node in the hierarchy. For the purpose of this illustration we are not concerned with how the Employee data is stored, but simply assume that somewhere there are tables that include a key value for each employee and the usual associated information (Name, Title, Date Joined etc). This data is represented by the “Person” table in Figure 3.

We use an allocation table to allow us to link a single employee to different nodes (we already know that Amanda is a member of three hierarchies, so she will have to be associated with at least three nodes, one in each hierarchy).

The crucial point to notice is that this allocation table links people to the structure table, not to the node!

The reason for this is simply that the node table is only a look-up which allows us to give a meaningful name to a node in the hierarchy. It has no significance whatsoever and indeed, the same name may be used in many places in different hierarchies (how many “Admin” departments are there in the world?). If we were to attach people directly to the node record we would require a separate node record for each individual occurrence of a node in a hierarchy, thereby making the relationship between a node and its structure one-to-one, forcing us back into a very rigid structure.

Now we can add the necessary records to the allocation table for the people that we know about so far, plus the Vice President in Charge of Sales Nationally (Jane Kingsland) and the head of the Eastern Division (Martin Niles) and the manager and a sales representative for the Mid-West team :

What does this give us?

It now allows us to create static views that query the hierarchy and return matching members. Here are the necessary definitions for two views. The first returns the rip field associated with a given node description:

IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id('v_GetRip') and sysstat & 0xf = 2)
  DROP VIEW v_GetRip
GO

CREATE VIEW v_GetRip AS
 SELECT NN.node_pk, NN.node_dsc, RTRIM( RF.node_path) + '%' AS rip_field
   FROM node_stru RF 
        INNER JOIN node_def NN ON RF.node_fk = NN.node_pk

A simple query is now all that is needed to return the appropriate rip field mask that defines the starting point for data selection. The important thing to recognize is that this ‘mask’ allows to extract information from any level of the hierarchy without the need to know anything about the level or the relationships between the nodes. For example:

SELECT rip_field FROM v_getrip WHERE node_dsc = ‘Great Lakes’
RETURNS “1*2*4*9*%”

while

SELECT rip_field FROM v_getrip WHERE node_dsc LIKE ‘Mid%’
RETURNS “1*2*4*8*
%

This view can then be combined with another static view to return a list of people who are members of whatever node we wish to query by:

IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id('v_GetMembers') and sysstat & 0xf = 2)
  DROP VIEW v_GetMembers
GO

CREATE VIEW v_GetMembers AS
SELECT DISTINCT VR.node_dsc keyval, ND.node_pk, ND.node_dsc,PN.title_dsc, PN.fname_nme, PN.lname_nme
  FROM v_GetRip VR
       INNER JOIN ( node_stru NS INNER JOIN node_def ND ON NS.node_fk = ND.node_pk )
            ON NS.node_path LIKE VR.rip_field
       LEFT OUTER JOIN node_psn_alloc AL ON AL.nodestru_fk = NS.nodestru_pk
       LEFT OUTER JOIN person PN ON PN.psn_pk = AL.psn_fk

Simple queries on this view return the list of people who qualify:

SELECT * FROM v_getmembers WHERE node_dsc = 'Great Lakes'
RETURNS
1 Great Lakes Sales Rep     Amanda Barry
2 Great Lakes Sales Manager Charlie       Davis

while

SELECT * FROM v_getmembers WHERE keyval = 'Mid-West'
RETURNS
Mid-West  8  Mid-West      Sales Manager Ellen  Franks
Mid-West  8  Mid-West      Sales Rep     George Harrison

and

SELECT * FROM v_getmembers WHERE node_dsc = 'Northern Sub-Division'
RETURNS
Northern Sub-Division      4      Northern Sub-Division      NULL   NULL   NULL
Northern Sub-Division      8      Mid-West      Sales Manager Ellen  Franks
Northern Sub-Division      8      Mid-West      Sales Rep     George Harrison
Northern Sub-Division      9      Great Lakes   Sales Manager Charlie       Davis
Northern Sub-Division      9      Great Lakes   Sales Rep     Amanda Barry

Any query relating to People is now a simple matter. Assuming that we have sales information related to the ID of the person who made the sale, we can get the total sales for any node in the hierarchy by simply summing the sales associated with the list of people who are related to that node:

SELECT SUM( sales)  FROM sales_data WHERE sales_person_id IN 
   (SELECT psn_pk FROM v_getmembers WHERE node_desc = ‘Great Lakes’ )

The same method can be used to extract the members of a Sales Team

SELECT name FROM emp_data WHERE person_pk IN

   (SELECT psn_pk FROM v_getmembers WHERE node_desc = ‘Great Lakes’ )

The Plant Hierarchy

The second hierarchy of which our notional person is a member is the “plant hierarchy”, part of which is  illustrated in Figure 4.  

We can simply add the data for this hierarchy to our existing tables (Table 4) in exactly the same way as for the Sales Organization detailed above. In this case the Node Category will be "2" and the various plants are added as new nodes – their structure being added to the Node_Stru table as before: 

Now we need to associate the Sales Representatives with the plants which they represent (Table 5). In this case only Amanda Barry (Massillon) and George Harrison (Columbus) are directly associated with plants, so only two new entries are required in the allocation table (obviously there would be other people associated with the plants hierarchy but the objective here is to focus only on the Sales side of the business): 

Our existing views, of course, will work just as well with this hierarchy as with the Sales hierarchy. Thus:

SELECT * FROM v_getmembers WHERE keyval LIKE 'North Ohio%'
RETURNS
North Ohio 15 North Ohio    NULL   NULL   NULL   NULL
North Ohio 17 Cleveland     NULL   NULL   NULL   NULL
North Ohio 18 Massillon     Sales Rep     Amanda Barry  1

 

Notice that we can now sum our sales by any node in either hierarchy using the same basic query. To get all the sales for all Ohio Plants the query is simply:

SELECT SUM( sales )  FROM sales_data WHERE sales_person_id IN 
     (SELECT psn_pk FROM v_getmembers WHERE keyval = ‘Ohio Plants’ )

and to get a list of representatives who are working out of any given plant, we use exactly the same query as we used to get a sales team:

SELECT SUM( sales )  FROM sales_data WHERE sales_person_id IN 
     (SELECT psn_pk FROM v_getmembers WHERE keyval = ‘Columbus’ )

 

The Customer Hierarchy

Adding a customer hierarchy is as simple as adding the plant hierarchy. Figure 5 shows a partial hierarchy for a large customer group and shows the entries for the Node and Node_Stru tables directly on the diagram:

 

One more thing that we need to do to accommodate this new hierarchy is to link our sales people to the appropriate node in the hierarchy. For the sake of illustration, we are mapping our Sales Reps to the lowest level of this hierarchy (Amanda Barry to 'NE Ohio [Node 24]' and George Harrison to SE Ohio[Node 25]), the Division Head to the Region level (Martin Niles to NE US [Node 22]) and our VP to National Level  

The net result is that the VP (Jane Kingsland) is now associated with two structures; first, she is attached to the organization at the National Sales node, and second, to the customer organization at the "Wendy's USA" node. Similarly other members participate in multiple structures, so that our friend Amanda is actually in three. She is associated with the sales tree at the 'Great Lakes' level, with plants at 'Massillon' and customers at 'SE Ohio'. However, no matter how many structures an entity is associated with, we still only ever have one record in the person table and one record in the node definition table!

Maintaining the structure

As can be seen, using the approach detailed here it is very easy to map, and to retrieve the data associated with, any number of hierarchies. By defining simple views to hide the complexity of the actual structure it is possible to support any desired user interface and even to dynamically add and delete entire hierarchies. The structures illustrated here show the bare minimum of information that is necessary to manage the data but do show the principles that underlie the approach. An obvious refinement to the Node Structure table is to add a “level” key, to permit data retrieval across all nodes that lie at a specific level of a hierarchy (for example, this would make implementing a TreeView to display the contents of a hierarchy very easy).

By modifying the node structure table so that it includes an additional foreign key for the parent node, and the level, the structure can be maintained using triggers that, when inserting a new record, generate the rip field and even level automatically based upon the rip field and level of the node to which the new record is being related (Figure 6).  

The resulting table, for all hierarchies is shown at Table 7:

 

 

The T-SQL code for the insert trigger is very simple:

IF OBJECT_ID('node_stru_ins') IS NOT NULL
  DROP TRIGGER node_stru_ins
GO
/*********************************************************************
TRIGGER.: node_stru_ins
Function: Insert Trigger for Structure table that automatically creates
........: the Rip Field when adding a new item of structural information
Called..: INSERT INTO node_stru ( node_fk, parent_node_fk )
               VALUES ( <emporg_pk>, <parent_pk> )
**********************************************************************/
CREATE TRIGGER node_stru_ins ON node_stru FOR INSERT
AS
BEGIN

  UPDATE TSU
    SET node_lvl = CASE WHEN TSU.parent_fk IS NULL THEN 1 ELSE PSU.node_lvl + 1 END,
        node_path = CASE WHEN TSU.parent_fk IS NULL
                         THEN '*' ELSE RTRIM(PSU.node_path) END
                          + CAST( TSU.node_fk AS VARCHAR( 10 ) ) + '*'
     FROM node_stru TSU
          INNER JOIN inserted INS ON INS.nodestru_pk = TSU.nodestru_pk
          LEFT OUTER JOIN node_stru PSU ON TSU.parent_fk = PSU.node_fk
END
GO

To return the data in a form suitable for display in a treeview we need a little more code, but not very much. Here is a stored procedure that accepts either the PK of a node, or the actual Node Description and uses the rip field to get the nodes, and sequence them correctly, so that they can be displayed.

IF OBJECT_ID('GetTree') IS NOT NULL
  DROP PROCEDURE GetTree
GO
/********************************************************************************
SP Name.: GetTree
Function: Retrieve a Tree from the specified root node
Author..: Andy Kramek
Syntax..: EXEC GetTree @vRoot
Params..: @vRoot Either Title/Location Name or PK for the node_def Table
********************************************************************************/
CREATE PROCEDURE GetTree @vRoot SQL_VARIANT  AS
SET NOCOUNT ON

/* Figure out what we got given here */
DECLARE @node_pk INT, @node_dsc VARCHAR(50)
SET  @node_pk = 0
IF @vRoot > 0
  /* Primary Key */
  SET  @node_pk = CAST( @vRoot AS INTEGER )
ELSE
  BEGIN
    /* Node Name */
    SET @node_dsc = LTRIM( RTRIM( CAST( @vRoot AS VARCHAR(50) )))
    /* Get the pk for the specified node */
    SELECT @node_pk = node_pk FROM node_def WHERE node_dsc = @node_dsc
  END

/* We now have a PK so get the rip field as a mask for the specified node */
DECLARE @Rip VARCHAR(1000), @Max INT
SELECT @Rip = RTRIM(node_path) + '%' FROM node_stru WHERE node_fk = @node_pk

/* Get all matching data, INCLUDING the root node, into a temporary table */
SELECT EO.node_dsc, ES.node_lvl, CAST( 0 AS bit ) haschild_flg,
       CAST( 0 AS bit ) isleaf_flg,  ES.node_path, EO.node_pk, ES.parent_fk, ES.nodestru_pk
  INTO #tempstru
  FROM node_def EO, node_stru ES
 WHERE EO.node_pk = ES.node_fk
   AND ES.node_path LIKE @Rip
   ORDER BY ES.node_lvl, ES.parent_fk, EO.node_dsc

/* Now we need to process the data to set the Child and Leaf flags */
IF @@ROWCOUNT > 0
  BEGIN
    /* Declare the variables we'll need for the cursor operation */
    DECLARE @urrip VARCHAR(1000), @nLevel INT, @HasCh BIT, @sLeaf BIT, @nodestru_pk INT
    /* And the ones for manipulation */
    DECLARE @LastKey INT, @lnCnt INT, @LeafKey INT

    /* Create a cursor for the specified fields */
    DECLARE curelems CURSOR FOR
      SELECT node_path, node_lvl, haschild_flg, isleaf_flg, nodestru_pk
        FROM #tempstru    

    /* Open the cursor, and get first 'record' */
    OPEN curelems
    FETCH NEXT FROM curelems INTO @urrip, @nLevel, @HasCh, @sLeaf, @nodestru_pk   

    /* Set the counter */
    SET @lnCnt = @@FETCH_STATUS
    /* Process all records */
    WHILE @lnCnt = 0
    BEGIN
      SET @LastKey = @nodestru_pk
      /* Does it have any children */
      IF EXISTS( SELECT node_fk FROM node_stru
                 WHERE (node_path LIKE @urrip + '%'
                   AND nodestru_pk <> @LastKey) )
        BEGIN
          /* It does have children! */
          UPDATE #tempstru SET haschild_flg = 1 WHERE nodestru_pk = @LastKey
          /* Now figure out which of the children is the last at that level*/
          SET @LeafKey = (SELECT TOP 1 nodestru_pk FROM #tempstru
                               WHERE (node_path LIKE @urrip + '%'
                                 AND nodestru_pk <> @LastKey )
                                 AND node_lvl = @nLevel + 1 
                               ORDER BY node_dsc DESC )
          IF @leafkey > 0
             UPDATE #tempstru SET isleaf_flg = 1 WHERE nodestru_pk = @Leafkey
        END
      /* Get the next 'record' */
      FETCH NEXT FROM curelems INTO @urrip, @nLevel, @HasCh, @sLeaf, @nodestru_pk
      SET @lnCnt = @@FETCH_STATUS
    END  

    /* Lose the cursor */
    CLOSE curelems
    DEALLOCATE curelems
  END
  /* Get the results and lose the temporary table */
  SELECT * FROM #tempstru ORDER BY node_lvl, parent_fk, node_dsc
  DROP TABLE #tempstru
RETURN
GO

Calling this procedure for "Ohio Plants" returns the data shown in Table 8. Note that the "Has Child" and "Is Leaf" flags are added dynamically. The first is used to indicate that a node has children, the second to indicate that the node is the last node at the current level. These are useful when generating a tree by parsing the data into an array or other structure used by the UI development tool. The PKs are included so that any modifications made can be sent back to the database by referencing the relevant primary key. 

Unlike the Left-Right numbering approach, changes to hierarchies modeled using a rip field affect only those records that are directly involved. There is no need to reconstruct the entire hierarchy whenever a node is added, deleted or moved. All that is necessary is to change any records that include the affected node, replacing the relevant portion of the rip field with the amended version. 

For example, if we were to disband the Southern Sub Division (see Figure 1) and move its “Central” sales team to the Northern Sub Division and the “South West” team to the Western Division, a total of three records in the Node Structure table are affected (Table 9). The process is simple:

  • Retrieve the ripfield mask for the node to be removed (OldStru)
  • Retrieve the mask for the node to which all child nodes are to be re-assigned (NewStru)
  • Update the structure table replacing all occurrences of "OldStru" with "NewStru"
  • Delete the one record where the rip field is exactly equal to "OldStru"

 

Obviously, any tables which included references to the deleted record would also have to be updated, but the key point here is that nothing else has to change. All data which previously aggregated to the “Southern Sub” group is now going to be aggregated into either the “Northern” or “Western” divisions.

Conclusion

The mapping approach described here provides the most flexible and easily maintained method of modeling hierarchical data of any type. By separating structural information from definition the retrieval of hierarchical information is simplified to the point where simple SQL can be used directly instead of requiring table queries, recursive “Tree-Walking” operations or the complex and inflexible left-right numbering which are the only viable alternatives.

NOTE: The sample SQL2005 database used in this article is attached as a zip file. Simply restore the database to your local SQL 2005 instance to see how all this works in practice.

Published Saturday, April 25, 2009 2:50 PM by andykr
Filed Under:
Attachment(s): ripdemo.zip

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# re: Modeling Hierarchies (Part 2)

Sunday, April 26, 2009 5:08 AM by Steven Black

Excellent, excellent article, Andy.  Very nicely presented.

Thank you Steven, from you that is high praise indeed. -- Andy

# re: Modeling Hierarchies (Part 2)

Monday, April 27, 2009 7:17 AM by Vinay Pagaria

Very interesting article.

I have been using the LEFT - RIGHT approach since the last five years without ever knowing formally that such a concept exists.

The idea of breaking out the main table and the structure table appears to be great. The only question that is disturbing me is why should i not break the rip field "1*2*4*" apart into three different rows in the table..... Would that not normalize the table instead of me having to use the like clause...

How do you know which parent node a given node belongs to in context? Look at the examples in the article. The same node can appear in multiple 'pathways' and in combination with different nodes in different paths. So if you normalize this out, you would have to include the parent - i.e. a recursive structure with all the limitations that implies!

Using the left-right approach and a combination of certain SQL Queries what i also achieve is SUBTOTALS at every level of the heirarchy. WOuld that be possible with the rip-field...

I even included examples of aggregating to nodes in the article, so I don't understand why you are even asking this. Obviously you can do that.

Am i missing something.......  

Apparently. The whole point of the mapping table is that it describes the entire path to a node in a single column. Every unique combination of patheways is therefore mapped as a separate record and can be combined in any way that you want. -- Andy

# re: Modeling Hierarchies (Part 2)

Monday, May 18, 2009 6:34 PM by Dan

Andy,

Great article.

I am completely sold but came across this article...

http://forums.mysql.com/read.php?125,101885,101954#msg-101954

Is Bill suggesting a solution to what Vinay (poster above) mentioned about normalizing the RIP column?  If so, your reply to him still holds?

To be honest I am not sure what Bill is suggesting. I did not understand it and could not navigate to any other link to find more detail. Sorry. 

Im fairly new to all this, I've been using the Adjacent List recursive model for over 6 years and this solution seems to be a massive improvement.  

I have used all the models I discussed at one time or another. As noted in the article, the limitations on the Recursive and Nested Set approaches were, for me anyway, sufficiently serious that I would not consider using them except in very specialized circumstances.  

Just a bit worried about the performance of the LIKE parameter!

I have used the Rip field, as described, for many years and in dozens of different scenarios. I have never seen any particular issue with the performance of LIKE and in my experience  this model out-performs any other approach by a significant margin (10x or more) with very large data sets. Also the ability to handle multiple hierarchies within a single structure is invaluable. 

# re: Modeling Hierarchies (Part 2)

Wednesday, May 20, 2009 4:35 AM by andykr
Bill Karwin Emailed me this note in response to Dan's Comment (and my note that I didn't understand it). Here it is in full:

Dan emailed me to clarify the tree-storage design I mentioned in that old MySQL forums post.

I call the design "Closure Table."  It's not widely documented.  

Vadim Tropashko calls it "Transitive Closure Relation" in a blog post here:
http://vadimtropashko.wordpress.com/nested-sets-and-relational-division/  

Tropashko also talks about it in his book "SQL Design Patterns."

The idea is that you create a second table, to store every path from an ancestor to each of its descendants, including the path of length zero, from the ancestor to itself.

CREATE TABLE ClosureTable (
 ancestor INT NOT NULL,
 descendant INT NOT NULL,
 PRIMARY KEY (ancestor, descendant),
 FOREIGN KEY (ancestor) REFERENCES Node_def(node_pk),
 FOREIGN KEY (descendant) REFERENCES Node_def(node_pk) );

The data for the Sales Region hierarchy in your article would look like the following (using multi-row INSERT shorthand, although I realize Microsoft SQL Server does not support this syntax):

INSERT INTO ClosureTable (ancestor, descendant) VALUES (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),
(1,7),(1,8),(1,9),(1,10),(1,11),
(2,2),(2,4),(2,5),(2,8),(2,9),(2,10),(2,11),
(3,3)(3,6),(3,7),
(4,4),(4,8),(4,9),
(5,5),(5,10),(5,11),
(6,6),(7,7),(8,8),(9,9),(10,10),(11,11);

Now you have the hierarchy represented in a way that can be queried and manipulated.  

Querying descendants:

SELECT descendant FROM ClosureTable WHERE ancestor = 2;
-- Returns 2,4,5,8,9,10,11.

Querying direct ancestors:

SELECT ancestor from ClosureTable WHERE descendant = 8;
-- Returns 1,2,4,8.

Insert a node (e.g. #12) to the hierarchy as child of #8 by copying all the paths terminating in #8, plus one more zero-length path from #12 to #12:

INSERT INTO ClosureTable (ancestor, descendant)  SELECT ancestor, 12 FROM ClosureTable WHERE descendant = 8  UNION ALL SELECT 12, 12;
-- inserts (1,12),(2,12),(4,12),(8,12),(12,12)

Delete a subtree of node #5 and its descendants:

DELETE FROM ClosureTable
WHERE descendant IN
(SELECT descendant FROM ClosureTable WHERE ancestor = 5);
-- deletes (1,5),(1,10),(1,11),(5,5),(5,10)(5,11),(10,10),(11,11)

One important advantage of this design is that it supports referential integrity -- unlike the RIP field you show in your article, and also unlike the Nested Sets design that is popular in SQL articles and books.



# re: Modeling Hierarchies (Part 2)

Sunday, June 14, 2009 1:13 PM by John
Absolutely fabulous.

What do you think?

(required) 
required 
(required)