What goes up…
Do you know what’s even better than building a red-black tree? Destroying one. But not in that awesome wrecking ball kind of way. We will be a tad more graceful and careful.
In researching this topic, I found many comments regarding the high level of difficulty involved with deleting nodes from a red-black tree. Further, I found dozens of red-black tree examples that eschewed deletion altogether! Is this fear justified?
Deleting nodes from a red-black tree certainly has its complications, but from these complications we will observe, prove, and generalize our way through the barriers of fear1.
On the surface, it may seem that breaking down a red-black tree should be easier than building it up. Is the same not true of other man-made structures such as bridges and buildings?
Not quite. Rather than tearing down a skyscraper with a wrecking ball, we will attempt to remove all load bearing walls from the 7th floor while maintaining structural integrity of the building.
That was a much better analogy.
Source of complication
What about deletion makes it more complicated than insertion? This question is based on the fact that deletion is more complicated than insertion. Looks like we have our first observation to make.
Observe what happens to a red-black tree when a node is inserted into it:
- Insertion happens at the leaf level.
- New nodes are colored red to prevent a black violation (we previously reasoned that red violations were easier to fix than black violations).
- New nodes are colored red since there is then a chance a violation may not occur at all.
- If a red violation occurs, we remedy it by re-coloring and/or rotating nodes in the tree.
Now we can compare these observations with what can be expected of a red-black tree when a node is removed from it.
- Deletion could happen anywhere in the tree. We may have to delete the root node, an interior node, or a leaf node.
- The node to delete could be red or black in color. We may need to know how to fix both a red and a black violation.
When inserting nodes, we are in control of where insertion happens and the circumstances under which it happens. The opposite is the case when we delete a node.
If deleting a node could be made similar to inserting a node, we may be able to leverage at least some of our existing knowledge. The two aspects of insertion that make it easier to deal with are color and locality.
Choosing the color of a node allows us to pick which violation we want to fix. We choose to insert red nodes instead of a black ones to avoid having to rebalance the black-height of the tree. We will not have this luxury when deleting a node since each node’s color is pre-determined and required to maintain the integrity of the tree.
There is a way to stack the deck in our favor by forcing a red node down the path of the deletion, but I will not be showing how to do such.
Locality may not seem to offer much benefit, but it provides us with a consistent base case for dealing with deletion. If we could always delete a leaf or near-leaf node (a node with at most one non-null child) then each deletion will work under similar conditions. This will allow us to generalize our deletion algorithm by focusing on the small picture and building on it.
Heir to the throne
If you are familiar with deleting nodes from a regular binary search tree, then you probably know what comes next.
When deleting an internal node from a binary search tree, the trick is to locate that node’s in-order predecessor (IOP) or in-order successor (IOS) and use its value as a replacement for the node to delete. Then simply remove the IOP or IOS from the tree.
We maintain the binary search tree characteristics of the tree by using the IOP as a replacement as it ensures that the left sub-tree has smaller values as the IOP is the largest valued node in the left sub-tree. The same is true of the IOS, but flip-flopped.
- In-order predecessor (IOP)
The IOP of a node X is the largest valued non-null node Y in the left sub-tree of X.
- In-order successor (IOS)
The IOS of a node X is the smallest valued non-null node Y in the right sub-tree of X.
Many tree deletion algorithms, even ones I’ve written, alternate between the IOP and IOS when deleting nodes. The idea here is to help keep the tree balanced by not always deleting nodes on one particular side. We are not going to be doing this. Rather, we will focus on the IOS to keep the algorithm simpler. Remember that I want this to be a learning resource and not necessarily the most efficient implementation. Writing code to alternate between the IOP and IOS is quite trivial though if you want to do that on your own.
The code for locating the IOS of a node is very straightforward given the definition:
Given a node, the above function will find its IOS by examining its right sub-tree and then recursing left as far as possible.
An interesting property of the IOS is that if it has a child, it will only have a right child. Perhaps this is not interesting and rather obvious. But it is useful information to leverage when we work on fixing the loss of a black node.
In order for us to take advantage of those observations, let’s prove them first.
- The IOS of any internal or near-leaf node is a leaf node or a near-leaf node with a null left child.
- A near-leaf node can have an IOS or IOP, but not both.
- This requires two parts. First we’ll prove that an internal or near-leaf node can have an IOS and then we’ll prove that an IOS has a null left child.
A. An internal or near-leaf node can have an IOS.
If X is an internal node then by definition it must have two non-null children. If X is a near-leaf node then by definition it must have at most one non-null child. Recall that the IOS of X is the smallest valued non-null node Y in the right sub-tree of X. Therefore X can have an IOS if it has a non-null right child.
B. An IOS has a null left child.
By definition, the IOS of a node X has the property that it is the smallest valued non-null node in X’s right sub-tree. Recall that every node in a binary search tree has the property that the value of its left child (if not null) is smaller than its own value. This means that the smallest valued non-null node in any sub-tree is the left most node from the root of the sub-tree. Therefore the IOS of a node X is the left most non-null node in the right sub-tree of X.
- By definition, a near-leaf node has at most one non-null child. For a node to have both an IOS and an IOP, it must have two non-null children. It then follows that a near-leaf node cannot have both an IOS and an IOP.
A few simple proofs can certainly go a long way. My deletion algorithm is based on these proofs holding, so its important to make sure our assumptions and observations can be proven.
Thoughts on deletion
What can we expect of the tree when a node is deleted from it? What types of violations might arise and when? How does the state of the tree give rise to how the tree might be fixed? Why is the sky blue? These are just a few questions that we will need answered before we can determine how to remedy any violations introduced by deleting a node.
We know that deleting a node really starts after we’ve located a leaf or near-leaf node, which may be the IOS or the original node to delete.
Our algorithm for deleting a node will be recursive and we will fix any violations that arise as we unwind our call stack. This should sound familiar as it is the approach we took with inserting nodes. This worked out well for insertion since we were able to look down the tree for red violations as we popped back up it. We could have returned some type of indication that a red violation existed instead of looking for it each time. Identifying a red violation however was trivial and this overhead was not worrisome.
Things are going to have to be a bit different with deletion however since we now have to worry about black height violations when we remove a black node.
There are a few options on the table to address this problem.
- After a recursive call, check the black height of the left and right sub-trees of the current root and see if they are different.
- Prior to a recursive call, check the black height of the sub-tree in the direction of the deletion. After the recursive call, check the black height of this sub-tree again and see if it is different.
- Modify the return type of the recursive function to indicate a black node loss.
Numbers one and two add a certain amount of complexity I feel is unnecessary. This is merely opinion though and you might feel differently. I chose number three for a couple reasons. First, it allows us to directly communicate a black node loss. Second, it does not clutter up the deletion algorithm with code which detracts from learning the mechanics of the algorithm.
So at this time I will introduce the Nonad:
A Nonad is a class composed of a Node and a Boolean value. The Boolean is used to indicate the loss of a black node in the sub-tree with the corresponding node as the root.
The name Nonad is most certainly a joke and is a combination of the words Node and Monad. The idea here borrows from the use of monads in software by supplementing (amplifying) the Node type with information regarding the loss of a black node in its sub-tree.
The ground work
I promise we’re close to the fun stuff. Next we lay the ground work for the Remove function by accounting for the recursion and the identification of the IOS.
Remember that when we find the node to delete it could be an internal node. When this happens, we need take the value of its IOS and then continue deleting the IOS. The code for identifying an internal node is also straightforward:
Regular readers of this blog will probably laugh at the use of the null anti-pattern here. In the context of this program however, this is not an issue (although there remains situational ambiguity).
Notice that once we find an internal node and get its IOS we do not return from the function, but rather proceed with recursing towards the IOS which is the node we need to delete.
The other case we need to account for is when we find the node to delete and it is not internal (a leaf or near-leaf). This node could be the IOS or the original node to delete. The code is the same in either case.
What can we expect of this leaf or near-leaf node? In the case of a red leaf node we can simply remove it without affecting the black height of the tree. It probably goes without saying that removing a red node does not cause a red violation.
If the node is a black leaf node, then we don’t have much a choice here. We have to remove the node and indicate the black node loss in our return type.
What if this node is near-leaf node? In this case we know that it has a single non-null child. If this is the IOS of the original node to delete, then it will have a non-null right child. However, the original node to delete may be a near-leaf and therefore could have a child on either side (but not both). This means we cannot assume the direction of the non-null child.
Could a near-leaf node be red? Assume it could. If this node has a red child, we would have a red violation. If this node has a black child we would have a black violation since the child in the opposite direction is null. Therefore a near-leaf node cannot be red.
Could a near-leaf node be black? Again, assume it could. This node could have a red child but not a black child. Just as above, a black child would indicate a black violation since the child in the opposite direction is null. Therefore a near-leaf node must be black and have a single non-null red child.
Deleting a black near-leaf node with a red child is very easy. We simply replace the black near-leaf X with the red leaf Y and then paint Y black:
To reiterate, we’re OK if we delete a red leaf node or a black near-leaf node with a red child. We get ourselves into a bit of trouble when deleting a black leaf node.
Next, we switch gears and focus on what to do after a recursive call and notice (via the amazingly resourceful Nonad) the loss of a black node.
Fix it already
Are you ready for a brave, brazen, and brash comment? Okay, well here it is:
Deleting a node from a red-black tree is not very difficult.
At least not as difficult as many others make it sound and look.
Believe it or not, we did a lot of the heavy lifting last time when we observed, justified, and generalized our insertion scenarios. Our previous learning's will help us tremendously in this section.
To supplement our existing knowledge, we need to figure out how to balance the black height of a sub-tree after one of its branches loses a black node. Two ideas come to mind. First, we could decrease the black height of the other branch. Although this would balance the black height of the sub-tree, we would have to propagate the black height loss further up the tree. This might be our only recourse in certain circumstances. Second, we could rotate the tree in such a way as to put a black node in the branch that lost a black node.
In the spirit of the last post, we will refer to the root node of a sub-tree as P, the child in the path of the deletion X, and X’s sibling S. When you see a gray node, know that I got lazy and did not want to create additional figures for each possible combination of node colors. A gray node will be examined at a later point and could be red or black and in some cases null. In addition, I will also use two parallel lines to indicate the direction of a black node loss (which is also the path of the deletion).
I will refer to directions without explicitly mentioning left or right. Rather, I will talk about the direction of the deletion and the direction of the sibling as sufficient substitutes.
Unlike last time, I did not present any scenarios that are invalid or impossible for us to encounter. At this point we know enough about red-black trees to not have to invalidate bunk scenarios. Also keep in mind that I drew only the cases where the deletion happened down the right sub-tree of the root node P.
- Figure 1.1
The sibling S is red, which means the root of the tree P must be black. Assuming P’s branches had an equal black height prior to the deletion, we can make a few very important observations. The first observation is that S must have two black children. Second, if X is not null, the black height of S must be greater than one. This is indicated by the C1 and C2 nodes. I colored these node gray since they could be red with black children or black themselves. Notice that I did not draw the children of B1. I did this to emphasize the children of B2, as they might cause us some problems in the future.
- Figure 1.2
The sibling S is black. If X is null, then S cannot have any children. In this case the gray nodes C1 and C2 will be null. If X is not null, then S must have a black height greater than one. This means the nodes C1 and C2 could be black or red with black children.
- Figure 1.3
Same observations as above, but the root of the tree P is red.
- Figure 1.4
This is a special case of figures 1.2 and 1.3. In this case, S has a red child in the direction of the deletion and a black child in the sibling direction. This provides sufficient information to know that X is not null.
- Figure 1.5
This is the same case as figure 1.4, except that S has a red child in the sibling direction.
Currently, we know deleting a black leaf node causes us to propagate the black height loss up the tree for fixing. And since this is the only black height loss scenario we are aware of, we use this case to determine how to address the above figures.
In figure 1.1 the sibling S is red. S must have two black children, and since X is null those children must not have any black children of their own. Observe that the black heights of B1 and B2 must be the same as the black height of X. Since X is null, the black heights of these three nodes must be 0. Further, observe that the black height of P prior to the deletion was 1. To preserve this fact and fix the black node loss, we can rotate S in the direction of the deletion, making sure to color S black to preserve the color of the root:
Remember that prior to the deletion, the black of height of the sub-tree rooted at P was 1. We’ve now moved our red node into the root of this sub-tree and colored it black. At first glance it looks like we’re done. However, if X is null we have a black height violation rooted at P because B2 is not null. This is simple enough to deal with though by coloring B2 red:
X being null, we’ve re-balanced the black height of the S’s tree!
Remember my earlier comment regarding locality? This is where we benefit from dealing with violations at the leaf level. In the case where X is null, this black violation was very easy to remedy. Further, it has given us the perfect opportunity to build upon the leaf case and make it work for trees of arbitrary sizes. Which is what we will now do considering the cases where X is not null.
Since we colored B2 red, we may have introduced a red violation since its children C1 and/or C2 could be red. We will casually call this node the problem child. Interestingly, note that the problem child is in the direction of the deletion (which is why we showed this nodes children in figure 1.1).
Thankfully we already know how to fix a red violation! I’ll briefly cover what to do, but refer to my post on insertion for more details.
If C1 is red, then we do not care about the color of C2. We simply color C1 black and rotate the parent of the problem child in the direction of the deletion:
Because we colored B2 red and picked up the black node P in its path, we maintain that B1 and B2 have equal black heights.
Further, we’ve made up for the original black node loss introducing the black node P in the path of the deletion. This means the red sibling case affords us the opportunity to fix the black node loss without propagating it up the tree.
Now assume C1 is black and C2 is red (a jagged red violation). We know how to fix a straight red violation, so to make the jagged violation straight we rotate the problem child in the direction opposite that of the deletion. From there, we proceed with a straight red violation as depicted in figure 3.
We now know how to remedy the loss of a black node in the red sibling case. The black sibling cases are not terribly difficult either.
Remember that we’re looking for an opportunity to rotate a black node into X’s branch while still maintaining the black height of S’s branch. Let’s examine the cases where the black sibling S does not have any immediate red children, as shown by figures 1.2 and 1.3.
In these figures, assume the nodes C1 and C2 are either null or black. If these nodes are null, then so is X. Observe that since S is black, P can be black or red. In the case where P is black, we do not have a red node at our disposal and must propagate the black node loss up the tree. Before we do though, we can balance P’s black height by coloring S red. Doing so balances the black height at P since each of its branches lost a black node.
In the event that P is red, we’re in luck. After coloring S red to balance each of P’s branches, we color P black to make up for the black height loss.
In figures 1.4 and 1.5 you’ll see S has a red child. We will refer to this node as the niece of X. Notice that the color of P is gray. In these figures the color of P is irrelevant since we have enough information to fix the tree without needing to leverage P’s color. If we happen to rotate a node into P’s position, we will be careful to maintain the color of the root.
Figure 1.4 shows us that the red node R1 is in the sibling direction. We will call this a straight red niece (straight meaning R1 is in the same position to S as S is to its parent P). Observe that the loss of a black node in X’s branch means that the sub-trees rooted at R1 and B1 now have black heights equal to X. This observation is the key to knowing what to do next.
If we could rotate P in the direction of the deletion and color it black, that would make up for the black node loss. This rotation would force S into the root of the tree, leaving the sibling branch short a black node. Simply coloring R1 black would make up for this black node loss, balancing the tree.
Keeping the color of the root intact, we rotated a black node into the direction of the deletion and balanced out the loss of S in the sibling direction by coloring R1 black. If X is null, that means C1, C2 and B1 must also be null, meaning our tree is in great shape. If X is not null, how can we be sure that we haven’t thrown the black height of our tree off?
Recall that the black height of X (after it lost a black node) is equal to the black height of B1. Further, since we rotated the black node P into the direction of the deletion we have made up for the initial black node loss. On the other side of the tree, coloring R1 black makes up for rotating S out of that branch and also balances out the addition of the black node P in the other branch. How great is that!
Our final case is figure 1.5, which shows us S has a red child R1 in the direction of the deletion. We will call this a jagged red niece (jagged meaning R1 is in the opposite position to S as S is to its parent P).
If we could rotate this figure in such a way as to create the straight red niece scenario, we would be finished. By now I’m sure there’s no surprise that we can do just that. Simply rotate S in the sibling direction and color S red and R1 black.
As you can see, prior to the rotation R1 had no impact on the black height of S’s sub-tree. This means the nodes B1, C2 and C2 must have equal black heights. We use these observations to conclude that we maintained the black height and the balance of the sibling branch.
Then we proceed with fixing the straight red niece scenario as shown in figure 6.
And now you know how to delete nodes from a red-black tree.
Since I have code scattered all throughout this post, here is a consolidated form of the code necessary to delete a node from a red-black tree. You will notice some overlap between what you see here what I posted last time.
I would like to wrap things up next time by showing how to validate a red-black tree automagically. Also, I’ll show what I did to randomly shuffle fixed sized datasets used to build up a tree and tear it right back down. It would also make sense to provide all the code shown throughout this series in one place, so I’ll take care of that as well.
1 – Lame, I know.