Part of the EllisLab Network
   
4 of 6
4
Hierarchical database trees using nested sets
Posted: 27 July 2007 10:16 AM   [ Ignore ]   [ # 46 ]  
Grad Student
Rank
Total Posts:  48
Joined  06-25-2007

I have been working on a similar class (not specifically for CI) to deal with mptt. The one function I am stuck on atm is moving nodes from one branch to an entirely different branch. The class isn’t as ‘functional’ per se as yours, I do not provide all the helper functions.

Right now it provides the following basic functionality:
- inserts nodes
- deletes a node and all of its sub nodes
- deletes a node and moves all of its sub nodes up

The last real thing I need to accomplish is moving a node from one part of a tree to another.

Go check it out: http://ioreader.com/code/mptt.phps

PS:
This is a standalone class that can be configured for any database table. The requirement for that table is that it have a primary key, a column for a left it, and a column for a right id. The column names are irrelevant.

At the moment, it only uses mysql; however, I have isolated those functions enough that making it use the CI db would be a cinch. Also, the configure function currently truncates the table for debugging purposes.

 Signature 

I/O Reader

Profile
 
 
Posted: 28 July 2007 07:06 PM   [ Ignore ]   [ # 47 ]  
Summer Student
Avatar
Total Posts:  6
Joined  07-24-2007

Sorry to be an ass but nested sets are dangerous and unreliable.  They violate the relational model completely by de-normalising data.  I know old Joe Celko pushes them left right and center but I’ve used them for a couple of years and there are far more elegant solutions which don’t rely on risking integrity in such an insane manor.  Anything which relies on the user to enforce the integrity is scary.  The database is the wrong place to do the tree for a start - it’s jus the storage engine at the end of the day.

The solution I developed keeps the relational model - I’m working on a CI port of it now from ASP.Net/C#.  This is PHP5/6 only as it deals with references.

1. Traditional parent-child table:  cols: id, parent_id, text
2. Database reads entire heirarchy into an assoc array.
3. Value objects are created for each row in the assoc array.
4. Each VO contains a reference to the parent node and an array of children and functions for determining the sibling, reordering and paths.
5. When the list is created, it is then processed and all parent and child relationships are made (by scanning the list).
6. The root VO is serialised into string as processed and cached.
7. When required, the tree is de-serialised from cache and manipulated directly.
8. If the database is changed, the cached tree is either regenerated or reloaded at a set interval.  This is done either by trigger-based cache dependencies or interval polling (cron).

Note: the cache backend is either text files with serialised php objects in or memcached for distributed systems.

I’ll post the code when I’ve finished it but it’s safer than the Celko rubbish, doesn’t kill the DB and is read and write optimised.  Sorry if this offends anyone.

Profile
 
 
Posted: 29 July 2007 04:54 AM   [ Ignore ]   [ # 48 ]  
Grad Student
Avatar
Rank
Total Posts:  55
Joined  08-23-2006
Chris The Techie - 28 July 2007 07:06 PM

Sorry to be an ass but nested sets are dangerous and unreliable. 

Not quite true. I started this thread by suggesting the use cases for where nested sets were most appropriate, ie for large and mainly static trees that do not need to be changed much after their initial setup. The nested sets approach is an order of magnitude more efficient than the adjacency list method in these circumstances. As in all cases, when you approach a new project, you select the best tools for the job - you can tighten a screw with a knife but a screwdriver is a better bet!


I’d be very interested in seeing your contribution when you’ve got it ready. There’s one immediate gotcha that springs to mind though. If all I want is the breadcrumb trail from where I currently am back up to the root node and a list of child categories, it’s naturally going to be less efficient to instantiate the entire tree just to get these elements. Not really a problem for relatively small trees, but if we’re dealing with a large tree (ebay’s category tree is 11Mb), that’s a lot of overhead - unnecessary in comparison to two small and simple select statements.

Again, it all comes down to the anticipated size and use of the tree. Large trees with frequent reads and infrequent writes are always going to be much more efficient with nested sets. If you need to do frequent writes, nested sets are not appropriate and this is especially true with large trees as a single change can require the entire tree to be updated. In these circumstances, adjacency list models such as yours are much more appropriate.

Anyhow, how long before your port is ready?

 Signature 

Senior Developer, MyBuilder.com

Hierarchical data trees using nested sets

Profile
 
 
Posted: 29 July 2007 08:04 PM   [ Ignore ]   [ # 49 ]  
Research Assistant
Avatar
RankRankRank
Total Posts:  797
Joined  08-06-2006
Chris The Techie - 28 July 2007 07:06 PM

They violate the relational model completely by de-normalising data.

Yeah, that’s what I thought when I first started on this thread. But, I’ve come to realize that there are certain types of data that you want to store that are not “normal” and never will be. I posted my thoughts on it a few pages back - user-generated categories is the prime example.

How exactly would you normalize user generated categories?

Interesting criticism BTW. But, it does seem like you solve the problem with a slightly different database: a simplified database paired with a serialized/cached associative array.

Also, how is database integrity affected by the ‘user’? All the database manipulations are controlled by the class interface…

Profile
 
 
Posted: 29 July 2007 09:59 PM   [ Ignore ]   [ # 50 ]  
Grad Student
Rank
Total Posts:  48
Joined  06-25-2007

Assuming nested sets (mptt), if as long as you’ve covered all the test cases to make sure your algorithm is reliable (I know my insert and delete functions work perfectly), then why can’t you depend on database transactions to make sure that the tree is not compromised? Obviously, not all rdbms support this but most do.

I generally like to use MPTT where the order of the data essentially follows the order of the primary keys (and thus the order that things are added in). This doesn’t necessarily need to be true for a single branch (excluding children), but it makes things easier. Eg: a threaded comment system, the order of the comments depends on the order that they’re added in. Threading using MPTT is wildly more efficient then using adjacency list method.

However, if order does matter, in the case of a product/category list where you want to order the list alphabetically, for example, mptt becomes a pita.

Therefore its a slight tradeoff, where on the one hand you have efficiency when viewing the tree but manipulating the tree is expensive, whereas on the other viewing more than one node set on the tree requires recursive functions that call the database to move lower into the tree (expensive) but it’s easy to add to the tree.

Adjacency list has its place and so does mptt. With the proper handling of the tree, I don’t think tree integrity is an issue.

 Signature 

I/O Reader

Profile
 
 
Posted: 30 July 2007 10:20 AM   [ Ignore ]   [ # 51 ]  
Research Assistant
Avatar
RankRankRank
Total Posts:  797
Joined  08-06-2006

In case you were wondering… as I was, since the acronym was introduced in the thread without an expansion:

MPTT = Modified Preorder Tree Traversal

Profile
 
 
Posted: 30 July 2007 11:30 AM   [ Ignore ]   [ # 52 ]  
Research Assistant
RankRankRank
Total Posts:  915
Joined  07-10-2006
sophistry - 30 July 2007 10:20 AM

In case you were wondering… as I was, since the acronym was introduced in the thread without an expansion:

MPTT = Modified Preorder Tree Traversal

As well as:

MIPTT = Modified Inverse Preorder Tree Traversal

Profile
 
 
Posted: 30 July 2007 11:37 AM   [ Ignore ]   [ # 53 ]  
Grad Student
Avatar
Rank
Total Posts:  55
Joined  08-23-2006
esra - 30 July 2007 11:30 AM
sophistry - 30 July 2007 10:20 AM

In case you were wondering… as I was, since the acronym was introduced in the thread without an expansion:

MPTT = Modified Preorder Tree Traversal

As well as:

MIPTT = Modified Inverse Preorder Tree Traversal


But nested sets is easier to say when drunk! wink

 Signature 

Senior Developer, MyBuilder.com

Hierarchical data trees using nested sets

Profile
 
 
Posted: 30 July 2007 11:50 AM   [ Ignore ]   [ # 54 ]  
Research Assistant
RankRankRank
Total Posts:  915
Joined  07-10-2006

One could always pronounce the abbreviations, but it would probably sound like spitting.

Profile
 
 
Posted: 05 August 2007 02:20 AM   [ Ignore ]   [ # 55 ]  
Grad Student
Avatar
Rank
Total Posts:  67
Joined  08-03-2007

Just to toss in my 2 cents, before discovering CI, I was already working on a table structure for managing site URLs using Nested Sets. The Nested Set model allows me to easily retrieve a complete url structure, or just match a single url (for verification).

Here’s a sample table structure, and 2 sample queries:

--
--
Table structure for table `seo`
--

CREATE TABLE `seo` (
  `
id` int(10) unsigned NOT NULL auto_increment,
  `
parent_id` int(10) unsigned NOT NULL,
  `
lft` int(10) unsigned NOT NULL,
  `
rght` int(10) unsigned NOT NULL,
  `
orig_url` varchar(500) collate latin1_general_ci NOT NULL,
  `
seo_url` varchar(255) collate latin1_general_ci NOT NULL,
  
PRIMARY KEY  (`id`)
)
ENGINE=MyISAM  DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci AUTO_INCREMENT=5 ;

--
--
Dumping data for table `seo`
--

INSERT INTO `seo` VALUES (1, 0, 1, 8, 'home.php?cat=1', 'jewelry');
INSERT INTO `seo` VALUES (2, 1, 2, 5, 'home.php?cat=3', 'necklaces');
INSERT INTO `seo` VALUES (3, 1, 6, 7, 'home.php?cat=9', 'earrings');
INSERT INTO `seo` VALUES (4, 2, 3, 4, 'product.php?productid=30&cat=3&page=1', 'blue-heart-necklace.html');


Pull back all valid url paths, regardless of depth

SELECT
        node
.orig_url,
        
GROUP_CONCAT(parent.seo_url SEPARATOR "/") AS path
    FROM
        seo
AS node,
        
seo AS parent
    WHERE
        node
.lft BETWEEN parent.lft AND parent.rght
    GROUP BY node
.parent_id, node.id
    ORDER BY parent
.lft

Match single url path

SELECT
        orig_url
    FROM
        
(
        
SELECT
            node
.orig_url,
            
GROUP_CONCAT(parent.seo_url SEPARATOR "/") AS path
        FROM
            seo
AS node,
            
seo AS parent
        WHERE
            node
.lft BETWEEN parent.lft AND parent.rght
        GROUP BY node
.parent_id, node.id
        ORDER BY parent
.lft
        
) AS paths
    WHERE
        path LIKE
"jewelry/necklaces"

 Signature 

PHP Site Solutions | Nested Set library for CI

Profile
 
 
Posted: 06 August 2007 02:53 PM   [ Ignore ]   [ # 56 ]  
Grad Student
Avatar
Rank
Total Posts:  67
Joined  08-03-2007

Hey Thunder, I see only one real flaw in your class (aside from PHP4 support grin ). It appears you don’t allow for the possibility of a sibling “root” category. Currently your code seems focused on there being 1 root category, and then everything belongs to that root. In actuality, you should consider the table itself as the “root” category, as it can have multiple children (recursively, of course)

From what I can tell so far, your code doesn’t seem well suited to the following dataset:

left   right

1     2
3     4
5     8
6     7


Please let me know if I’m missing something here grin  Thanks for the time/effort you’ve put into this, it’s a good start.

Just a note, CakePHP 1.2 has support for Nested Sets, in a file called tree.php, that might serve as a good code example if you feel the need to change your nested set theory.

 Signature 

PHP Site Solutions | Nested Set library for CI

Profile
 
 
Posted: 06 August 2007 03:24 PM   [ Ignore ]   [ # 57 ]  
Grad Student
Rank
Total Posts:  48
Joined  06-25-2007

The class I posted in this thread, http://ioreader.com/code/mptt.phps allows for unlimited roots.

 Signature 

I/O Reader

Profile
 
 
Posted: 06 August 2007 03:50 PM   [ Ignore ]   [ # 58 ]  
Research Assistant
RankRankRank
Total Posts:  915
Joined  07-10-2006

I believe that Thunder’s current model approach was based on a PHP5 library rather than a model. The PHP5 library was not documented on the wiki.

If you had two tables—for example, trees and tree_nodes—the id column for the trees table could be a foreign key in the tree_nodes table, allowing multiple roots. Modifying the existing model class approach to handle the existence of the new column should not take much effort.

Profile
 
 
Posted: 07 August 2007 07:00 AM   [ Ignore ]   [ # 59 ]  
Grad Student
Avatar
Rank
Total Posts:  67
Joined  08-03-2007

Thanks Peter, I’ll check that out

@esra, I’m already working on modifying his class to consider multiple roots. Looks like maybe 3 functions or so that will need modification

another issue I found, is that if there is a table prefix, the code doesn’t account for that. Easy fix though (but still not happy with my fix, I’m looking for a better change to the code)

cheers

 Signature 

PHP Site Solutions | Nested Set library for CI

Profile
 
 
Posted: 09 August 2007 02:16 PM   [ Ignore ]   [ # 60 ]  
Grad Student
Avatar
Rank
Total Posts:  67
Joined  08-03-2007

okay, i’ve made a number of modifications to Thunder’s class.

Main reason for changes:
- adding parent_id support
- adding support for table prefixes
- moving to using built-in CI db query generation when possible
- support for unlimited root nodes

The code is not backwards compatible, and is not 100% finished (I have some more tweaking/cleanup to do), but it works. When I get a chance, I’ll do a compare of the two versions, to see if I’ve saved the need for extra queries or not.

I can’t post the code in this message, as it hit the max allowed characters for the post, but if you’d like to check out the code, contact me.

When I get a chance, I’ll put together a package for download.

 Signature 

PHP Site Solutions | Nested Set library for CI

Profile
 
 
   
4 of 6
4
 
Post Marker Legend
New Topic New posts Hot Topic Hot Topic with new posts New Poll New Poll Moved Topic Moved Topic Sticky Topic Sticky topic
Old Topic No new posts Hot Old Topic Hot Topic with no new posts Old Poll Old Poll Closed Topic Closed Topic Announcement Announcements
Theme
Change Theme
Visitor Statistics
The most visitors ever was 719, on June 06, 2008 10:16 AM
Total Registered Members: 66430 Total Logged-in Users: 35
Total Topics: 84795 Total Anonymous Users: 7
Total Replies: 455064 Total Guests: 244
Total Posts: 539859    
Members ( View Memberlist )
Newest Members:  Dylan1978X_franbaguasllogocsaturkeyPeter BryanttherendStudioGeorgiaJZeerfedeghe