Build a Tree from Database Records

Introduction

When it comes to storing (and then building) a tree, there are different approaches. One of them is storing each node as a record of the same table and specifying a parent id for everything but the root node. Transforming this then into a structure of nested objects in PHP can be quite adventurous. Let's start with the database table and some content:

CREATE TABLE `tree` (
	`id`     SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
	`parent` SMALLINT(5) UNSIGNED DEFAULT NULL,
	`label`  VARCHAR(63) NOT NULL,
	PRIMARY KEY (`id`),
	INDEX `parent` (`parent`)
);

+----+--------+----------------+
| id | parent | label          |
+----+--------+----------------+
|  1 |   NULL | root           |
|  2 |      1 | first level a  |
|  3 |      1 | first level b  |
|  4 |      2 | second level A |
|  5 |      2 | second level B |
|  6 |      3 | second level C |
+----+--------+----------------+

The tree we'll use in our example, looks like this:

1: root
+-- 2: first level a
    +-- 4: second level A
    +-- 5: second level B
+-- 3: first level b
    +-- 6: second level C

The solution I encountered was in a Zend Framework 1.x project using Zend_Db as the ORM so we had three classes: Table_Tree, Rowset_Tree and Row_Tree. They were pretty straight forward according to Zend Framework 1 specifications. Apart from getters and setters for the (magic)members of the row class, it had a method for retrieving the children of a row.

class Row_Tree extends Zend_Db_Table_Row
{
	// [...]

	/**
	 * @return Rowset_Tree
	 */
	public function getChildren()
	{
		$table = new Table_Tree();

		$query = $table->select();
		$query->where('parent = ?', $this->id);

		return $table->fetchAll($query);
	}
}

In addition to these ORM files, there was a class - let's call it "Node" - which would represent the nodes and leafs of the tree. Omitting the attributes which are not helping the illustration, it would look like this:

class Node
{
	/** @var Node[] */
	private $children = [];

	/** @var Node */
	private $parent = null;

	/**
	 * @param  Node[] $children
	 */
	public function __construct($children = [])
	{
		$this->children = $children;
		foreach ($children as $child) {
			$child->setParent($this);
		}
	}

	/**
	 * @param Node $parent
	 */
	public function setParent(Node $parent)
	{
		$this->parent = $parent;
	}

	/** @return Node|null */
	public function getParent();

	/** @return Node[] */
	public function getChildren();
}

The code for building the tree was inside a controller and used recursion to build this tree.

class TreebuildingController
{
	/**
	 * @param  integer $id
	 * @return Node
	 */
	private function _getTree($id)
	{
		$table = new Table_Tree();
		$parent = $table->find($id);

		$nodes = [];
		foreach ($parent->getChildren() as $row) {
			$nodes[] = $this->_getTree($row->getId());
		}

		return new Node($nodes);
	}
}

Analysis

When I got the job to optimize performance in this project, the tracer showed me about 12 seconds of walltime for this function. That was partially because of the database server being accessed over a remote connection but also because of many iterations and queries. The tree in the original project had about 6000 nodes and leafs, just for comparison. You can roughly multiply our example here by 1000 to get the real numbers.
What happens here is that for every node and leaf there will be two database queries sent. One that fetches the node and a second one which gets all the child elements, further nodes or leafs. Of course if there is a leaf at hand, the query will return zero elements and the recursion can return.
For our small tree above, this results in a total of 12 database queries, so for a tree with 6,000 nodes, it would be 12,000 queries. If the database (including network) can answer each of them in 1ms, you have already at least 12 seconds of walltime passed for just building the tree.

Optimisation

Remove one query per iteration

You might have noticed one small detail here: Every call to this function uses the ID of the parent node as a parameter. The function then fetches the entity just for calling "getChildren" on it. If we look at our example, just changing that will reduce the total amout of queries sent by 5 - cutting the number almost half. The code of our controller would look like this:

class TreebuildingController
{
	/**
	 * @param  Row_Tree|integer $id
	 * @return Node
	 */
	private function _getTree($id)
	{
		if ($id instanceof Row_Tree) {
			$parent = $id;
		} else {
			$table = new Table_Tree();
			$parent = $table->find($id);
		}

		$nodes = [];
		foreach ($parent->getChildren() as $row) {
			$nodes[] = $this->_getTree($row);
		}

		return new Node($nodes);
	}
}

Fetch in bulk

If you choose a different ORM, e.g. propel (at least starting with 2.0), you can take advantage of one particular feature I became quite fond of: you can populate all relation to child entities of a set of parent entities, which will just fire one query in the background. For our tree, that would mean one query per level of depth. In our example that would be a total of 4 queries - the 4th level does not exist, but our code isn't aware of that (we assume that we have a tree which can be dynamically changed, otherwise it would not be necessary to store it in a RDBMS in the first place). The following code is just a pseudo-code example, the _getTree function will basically stay the same - I just removed the database operations to the outside - since we just prefetch the entities in bulk internally.

class TreebuildingController
{
	public function treeAction()
	{
		$treeQuery = TreeQuery::create();
		$root = $treeQuery->findById(1);

		$rowset = $root->populate('Tree');
		while ([] !== $rowset) {
			$rowset->populate('Tree');
		}

		$tree = $this->_getTree($root);
	}

	/**
	 * @param  Tree $parent
	 * @return Node
	 */
	private function _getTree(Tree $parent)
	{
		$nodes = [];
		foreach ($parent->getChildren() as $row) {
			$nodes[] = $this->_getTree($row);
		}

		return new Node($nodes);
	}
}

One Query to rule them all

If you have just one tree in that table - or if you make the nodes of multiple trees retrievable with one simple query that e.g. includes a "tree-id" which is the same for each node of a tree, then you can just fetch all the rows in one query and assemble the tree just by looking at IDs. Since database servers are usually the bottleneck of a webapplication and can't be easily scaled, it's okay to move a bit of the load to PHP and scale by adding webservers.
I'll optimise this in three steps - so first we just change it to one query and put there some excessive looping instead.

1. One query - but excessive looping
class TreebuildingController
{
	/** @var Row_Tree[] */
	private $rows = [];

	public function treeAction()
	{
		$table = new Table_Tree();
		$rowsetTree = $table->fetchAll(1);

		foreach ($rowsetTree as $row) {
			$this->rows[] = $row;
		}

		$tree = $this->_getTree($this->rows[1]);  // magic number -- ID of root element -- simplified to keep the code example small.
	}

	/**
	 * @param  Row_Tree $parent
	 * @return Node
	 */
	private function _getTree($parent)
	{
		$nodes = [];
		foreach ($this->rows as $index => $row) {
			if ($row->getParent() == $parent->getId()) {
				$nodes[] = $this->_getTree($row);
				unset ($this->rows[$index]);   // this makes future loop iterations smaller the more nodes are already handled
			}
		}

		return new Node($nodes);
	}
}
2. One query - less looping

Looping over all rows each time to just find the few nodes for one parent is a bit tedious. So we can optimize further by grouping the elements by parent id.

class TreebuildingController
{
	/** @var Row_Tree[][] */
	private $rows = [];

	public function treeAction()
	{
		$table = new Table_Tree();
		$rowsetTree = $table->fetchAll(1);

		foreach ($rowsetTree as $row) {
			if (null === $row->getParent()) {
				$root = $row;
				continue;
			}

			if (!isset($this->rows[$row->getParent()])) {
				$this->rows[$row->getParent()] = [];
			}
			$this->rows[$row->getParent()][] = $row;
		}

		$tree = $this->_getTree($root);
	}

	/**
	 * @param  Row_Tree $parent
	 * @return Node
	 */
	private function _getTree($parent)
	{
		$nodes = [];
		foreach ($this->rows[$parent->getId()] as $row) {
			$nodes[] = $this->_getTree($row);
		}

		return new Node($nodes);
	}
}
3. One query - and no recursion

...and as the last step we're now kicking recursion out of the door since we actually do not need it to build a tree. We can use the power of object references to help us solve this problem with just one loop over the database result.
First of all, class Node receives an additional function addChild:

class Node
{
	// [...]

	/**
	 * @param Node $child
	 */
	public function addChild(Node $child)
	{
		$this->children[] = $child;
		$child->setParent($this);
	}
}

And then we just loop over it once and create the Nodes in an array, where we can find them later on as parents to add to. Just make sure that, when retrieving thelist of rows from the database, you sort them by parent ascending and do not re-arrange the tree in a way where an older row is the child of a newer one. If that's the case, you need a bit more code to make it work, but that's beyond this article. Please drop a comment if you need help with that.

class TreebuildingController
{
	/** @var Node[] */
	private $nodes = [];

	public function treeAction()
	{
		$table = new Table_Tree();
		$rowsetTree = $table->fetchAll(1);

		foreach ($rowsetTree as $row) {
			$newNode = new Node();
			$this->nodes[$row->getId()] = $newNode;

			if (null === $row->getParent()) {
				$tree = new $newNode;
			} else {
				$this->nodes[$row->getParent()]->addChild($newNode);
			}
		}

		// do something with $tree
	}
}

Wasn't that hard, was it? This magically reduced the walltime of that piece of code to way less than a second.

Holger Segnitz, 2018-06-10