Pages

Thursday, December 15, 2011

The Origin of Red-Black Trees

In Introduction to Algorithms, a red-black tree is defined as a binary tree that satisfies the following red-black properties:

    1. Every node is either red or black.
    2. The root is black.
    3. Every leaf (NIL) is black.
    4. If a node is red, then both its children are black.
    5. For each node, all simple paths from the node to descendant leaves contain the same number of black node.
Starting from this definition, we can prove some very nice properties of red-black trees. The most important property is that a red-black tree with $n$ internal nodes has height at most 2log( $n+1$). Given this, we say that red-black trees are approximately balanced.

Another kind of balanced binary tree is the AVL tree. For an AVL tree with $n$ nodes, the height is strictly less than 1.44log( $n+2$ )-1. Though it seems better than the red-black trees, the delete operation for AVL tree may require $O(\log n)$ rotations, which is inconvenient for many applications, while the red-black trees require at most 3. So in practice, people prefer red-black trees more often.

The idea for AVL trees is straightforward. We first try to insert elements and check whether the resulting tree is height-balanced. If it's not, we re-adjust the structure of the tree to make it balanced. The technique we use is referred to as rotations. In the case of red-black trees, however, things do not seem so obvious: it is quite strange that people ever came up with the idea to use colors to control the height of a tree.

To gain a better understanding of red-black trees, especially how the idea of colors came about, we may want to have a look at the origin of red-black trees, which are essentially B-trees of order 4, or 2-3-4 tree, a 4-ary search tree where each node can have a degree of 2, 3, or 4. Given this definition, the red-black tree is in fact a binary tree representation of the B-tree of order 4: every B-tree node can be transformed, or expanded, to a set of black and red nodes as shown below:

Figure 1. Transforming a degree 4 node into red-black nodes

Figure 2. Transforming a degree 3 node into red-black nodes (two possible transformations)

Figure 3. Transforming a degree 2 node into red-black nodes

Note that in all these transformations, there should be no two successive red nodes because each node in the original B-tree is black in the corresponding red-black tree, and produces at most one level of red node(s). Moreover, since the height of a B-tree of $n$ nodes is $O(\log n)$, the resulting red-black tree has height $O(2\log n)=O(\log n)$.

Thus, we can use a binary search tree to represent a B-tree of order 4, which is actually how red-black trees are invented. Knowing this, we should no longer feel the definition of red-black trees so obscure.

No comments:

Post a Comment