A what?
A red-black tree is a binary search tree that intelligently inserts and removes nodes to keep the tree roughly balanced. You’ll sometimes hear this called a “self balancing” tree, but I will refrain from labeling it as such since I’ll be taking the credit for writing this particular implementation. As the name implies, color has something to do with the structure. We will discuss why later, but suffice it to say that in a red-black tree, nodes are either red or black.
A red-black tree is a complicated data structure, but its logarithmic time complexity for operations on the tree make it very efficient and desirable. When implemented properly, a red-black tree can search, insert, and delete in O(log n) time. This efficiency does come at the cost of simplicity. But don’t let the complications worry you, they’re a blast to understand and overcome.
What makes a red-black tree interesting to me is the fact that it is an abstraction of a 3-4 tree. This is very important to understand and yet I find this information quite often missing from other explanations. Without this information, it can be very difficult to justify the properties that we will cover later. If you’re not familiar with higher order trees, take a few minutes to read about them since it will help make red-black trees easier to understand.
Definitions
I’m going to define some terms up front for us to use while discussing red-black trees. These terms will appear often throughout the series.
- Root node
The topmost node in a tree or sub-tree. - Near-leaf node
A node with at most 1 non-null child. - Internal node
A node with two non-null children. - Leaf node
A node without children. - Sibling
Given a node X and its parent P, the sibling of X is the child of P opposite that of X. - Black height
The number of black nodes following a simple path from a root to a leaf, not including the root itself. - Black violation
Happens when the black height differs among simple path’s from any given root. - Red violation
Two consecutive red nodes. - In-order successor
Given a node X, its in-order successor is the smallest valued node in its right sub-tree. - In-order predecessor
Given a node X, its in-order predecessor is the largest valued node in its left sub-tree. - Order
The maximum number of children a node can have. Some people define the order of a tree as the minimum number of values a node can hold. I do not like that definition since it is ambiguous. A node in a red-black tree has at most two non-null children, and is therefore a tree of order 2.
Properties
In a red-black tree, each node has a color; red or black. The color of a node along with rules concerning how colors interact keep the tree balanced.
- A node is red or black.
- The root of the tree is black.
- A red node cannot have red children.
- Every simple path from a root to any of its leaves must contain the same number of black nodes.
A red violation occurs when property #3 is violated and a black violation occurs when property #4 is violated. As previously mentioned, a red-black tree is an abstraction of a 3-4 tree. In order to properly maintain this abstraction, it is very important that red and black violations are prevented or fixed. We will discuss this abstraction in the next section.
I’ve said “roughly balanced” a few times now and have not defined what I mean. When a tree is roughly balanced, the longest simple path from a root node is no more than twice as long as the shortest simple path from the same root node. This observation yields an upper bound on the height of tree, which allows us to determine the time complexities noted above.
It is not obvious how the properties above force the tree to be roughly balanced, but it is easy to determine. Since the tree cannot have consecutive red nodes, the short path possible would have to contain all black nodes. Similarly, the longest path possible would have to alternate between red and black nodes. Since both paths must contain the same number of black nodes, we see that the longest path possible is twice as long as the shortest path possible. We can generalize this observation as “no simple path is more than twice as long as any other simple path”. This keeps the tree roughly balanced.
Why abstract?
If a red-black tree is an abstraction of a 3-4 tree, why not just use a 3-4 tree? Because I’ve developed higher order trees and they’re no picnic. A node with a single value and two children is much easier to reason about than a node with up to n values and n+1 child pointers. There are other complications that arise with higher order trees, such as splitting and joining nodes when balancing, inserting, and deleting. We avoid these complications by abstracting the concept of a 3-4 tree into a tree of order 2 and adopting special properties that keep the abstraction in check. We discussed these properties above, now let’s see how they fit into this abstraction. Here is a sample red-black tree:
We will worry about how this structure was built next time when we discuss insertion. For now we’re concerned with how this is an abstraction of a 3-4 tree and how the properties above help force the abstraction. A black node with a red child is conceptually the same node in a 3-4 tree. Since red nodes can only have black children, this relationship indicates a different level on tree (as does a black node with a black child). So to convert a red-black tree to a 3-4 tree, we simply “move” red nodes up to their black parent to form a 4-node:
Simple enough. Now let’s validate our properties.
- A node is red or black.
Black nodes represent the middle value in a 4-node. The outer elements in a 4-node are red. A black node with two red children represents a full 4-node, as shown above. It may help to envision lateral pointers from the black values to the outer red values, and the outer red values pointing down the tree. Or more simply, black nodes point horizontally and red nodes point vertically. - The root of the tree is black.
In all honesty, this is not necessary. We could maintain a valid red-black tree with a red root node. But that would force us to have to deal with special cases where a red root could have red children. We can avoid making that exception by always forcing the root to be black. Doing so also fits with our abstraction of a 3-4 tree. - A red node cannot have red children.
If a red node had a red child, that child would be in the same conceptual 3-4 tree node. If that happened, we would have a tree of order 5, and our abstraction would fail. - Every simple path from a root to any of its leaves must contain the same number of black nodes.
This is because all leaf nodes in a 3-4 tree must be on the same level. Remember that in a red-black tree a black node indicates a deeper level in the tree. So if one path has a different number of black nodes than another path our leaf nodes would not be on the same level and our abstraction would fail.
At first glance, the properties governing the structure of a red-black tree seem odd and can be difficult to justify. To truly understand these properties and their necessity, we must understand the relationship between a red-black tree and a 3-4 tree. Understanding this abstraction is the key to understanding red-black trees.
Goals
I like goals. Whenever I start work of any amount I first establish my goals. The goals I establish are not limited to the form of the end product as the word might imply. Rather, the goals I make include tools, processes, restrictions, and really anything that helps guide the completion of a task. Here are the goals I established when working on my red-black tree implementation:
- Easy to read and understand.
I want this to be a learning resource. The code written should be easily understandable and relatable to the properties and definitions of a red-black tree. As always, the semantics should be obvious from the syntax. - Concise.
I’ll write just enough code to maintain the integrity of the tree. The manner in which the code is written will be deliberate and respectful of this goal. - Limit side-effects.
Since we’re dealing with a balanced tree structure, we will need to insert nodes, delete nodes, and there will be times when we need to adjust the tree to fix any violations that occur. I want all modifications to the tree to be explicitly made via direct assignment. Functions that operate on the tree will do so in a predictable and isolated fashion. - No parent pointers.
Parent pointers can be a pain in the ass to maintain. Also, they are rarely a necessity and more often a convenience. My tree will point down and our algorithms for inspecting the tree will do so as well.
How I’ll do it
I have a four step process I use when I design and develop algorithms. It works quite well for me, and you’re welcome to use it if you find it useful. You’ll find me applying this process throughout the series.
- Observe
In this step I simply examine the task and problems presented by it and try to identify scenarios that occur in the problem domain. - Justify
Here I determine the validity of my observations and scenarios, and if necessary prove them correct. - Generalize
This process often involves several iterations, and this is the step in which we recurse. If my observations can be generalized, I will do so here and then proceed to step 1 with my new observations. If I feel my observations are general enough and I haven’t identified any overlapping between them, I proceed to step 4. - Develop
I take my observations and generalizations and develop the algorithm. If I’ve reasoned well about the probable scenarios and my observations are accurate, then this step is a translation of the first 3 steps into code. I may have made that sound more easy than it is; writing code can be just as difficult, if not more, than any of the previous steps.
The setup
The next part of this series will focus on inserting values into a red-black tree. To facilitate that, I’ll need to get some things out of the way. Let’s define a class to represent a node in our tree. Recall that a node in a red-black tree has a value, can be red or black in color, and can have two children. To keep things simple, we’ll assume an integer value for each node. Here is how we will define a node:
private class Node
{
public enum Color
{
Red,
Black
}
public enum Direction
{
Left = 0,
Right = 1
}
public int Value { get; set; }
public Color NodeColor { get; set; }
private Node[] _children;
public Node this[Direction Child]
{
get
{
return _children[(int)Child];
}
set
{
_children[(int)Child] = value;
}
}
public Node(int Value)
{
this._children = new Node[2];
this.Value = Value;
this.NodeColor = Color.Red;
}
public static Direction Opposite(Direction Path)
{
if (Path == Direction.Left)
{
return Direction.Right;
}
else
{
return Direction.Left;
}
}
}
For a brief explanation of why I structured the child references as I did, please refer to this older post of mine regarding tree rotations. While I enjoy the benefit of being able to identify children by a value in an enumeration, I understand that values in an enumeration can be falsified. By simply casting any integer, say 13, into a value of the Direction enumeration we lose our pseudo-type-safety and can provide an out of range index to our children array. Oh well, I’ll be sure not to screw myself.
The Node class is contained within a public container class called RBTRee, which is just too brief to show here (although we’ll beef it up later when we learn how to insert and remove nodes). It is sufficient for now to know that this container class contains some helper functions that we’ll define later and a private member variable called “_root”.
Bathroom break
I’ve included links below to the other parts in this series. Next time we’ll tackle inserting values in a red-black tree. I’m not sure when next time will be, but the heading of this section is meant to imply it will be quick in comparison to the dreadful pace I’m known for.
Parts
- Introduction
- Explanation (you are here)
- Insertion
- Deletion
- Validation and testing
Comments