The good stuff
Finally that boring stuff is out of our way. Kidding of course; what may have been boring at times was absolutely necessary for us to get where we are right now. In fact, you really should read those post before continuing.
So where are we exactly? Well, we’re perfectly poised to build a red-black tree. The first two parts in this series and the supplemental post on tree rotations provided us with enough insight and code to tackle insertion.
Last time I shared with you how I plan on defining a node and I left out a thorough description and justification for the way it was defined. As uncharacteristic as that may be of me, it was on purpose. My Node class evolved as my understanding of red-black trees did. Because of this, I felt it irresponsible and even awkward to justify my definition of a node without showing how insertion and deletion benefits from it.
So as we discuss insertion I will be mindful to justify my decisions. After all, I want this to be a learning resource above all else. If by chance I fail to properly explain anything along the way, please comment and ask questions. I will be more than glad to clarify anything in this series.
Thoughts on insertion
A red-black tree is a binary search tree, so insertion should be easy. Fixing any violations that occur, however, is the tricky part. The same logic equally applies for deletion. So then let’s save the tricky part (fixing the tree) for later and just focus on inserting into a binary search tree.
I’m not one to make assumptions, but for the sake of brevity I have to assume that you are familiar with binary search trees. If you’re not familiar with them, take a look here before moving on.
Inserting nodes into a binary search tree is relatively easy and beautifully expressed as a recursive function:
There are two important items to address in the above code; the color of new nodes and how to maintain a black root node. To address those items we have to remind ourselves of the red-black tree properties to uphold and then identify how insertion impacts those properties. To refresh your memory:
- A node is red or black.
- The root of the tree is black.
- A red node cannot have red children.
- Every simple path from a root to any of its leaves must contain the same number of black nodes.
It is obvious that we impact the above properties at the time we create a new node. So the question is how can we create a new node with the least amount of impact? In order to answer that question we’ll need to put our properties up against some scenarios:
- A node is red or black.
The first decision we need to make is how to color a new node. Should all new nodes be red, black, or does it depend on the context? Not sure yet, let’s continue.
- The root of the tree is black.
It would appear that this property affects the way we address property #1. Perhaps the choice of color for new nodes does depend on the context in which the node is created. On the other hand, perhaps it would be easier to make a special case for the root node. Making special cases in code usually makes me second guess my approach as it often implies a lack of general understanding of the problem. I’ll opt to treat the tree with enough generality as to have no indication of the root node during insertion. Once we are done inserting we can force the root to be black.
- A red node cannot have red children.
This gives rise to a red violation. Are those easy to fix? We don’t know yet. Well what do we know then? A black node can have red children, a black node can have black children, and a red node can have black children. Do any of those observations help? Not sure yet, let’s move on.
- Every simple path from a root to any of its leaves must contain the same number of black nodes.
When violated, this gives rise to a black violation. Are those easy to fix? Again, we’re not sure. But this property yields an important insight regarding property #3 and property #1. Let’s assume that we default all new nodes to be black in color. What are the repercussions of that decision? Recall from last time a red-black tree is an abstraction of a 3-4 tree and that black nodes indicate a deeper level on the tree. Since all leaf nodes in a 3-4 tree must be on the same level and a new black node violates that property, we can conclude that we will always cause a black violation if new nodes are black in color. Is that a bad thing? We can’t answer that until we address the alternatives.
Let’s assume that we default all new nodes to be red in color. What are the repercussions of that decision? When a red node is a child of a black node, those two nodes are in the same conceptual 4-node in a 3-4 tree. In this case, the tree is a valid red-black tree and no violations were introduced. When a red node is a child of a red node, we have a red violation. We have two possible outcomes when inserting a red node, either we cause a violation or we don’t. As opposed to inserting black nodes that always result in a black violation.
The third option on the table is to make the choice of color dependent on the context. We only have two choices of color, red or black. If the parent of the new node is red, should I insert a black node to avoid a red violation? No! We already discussed that adding a black node to the tree will always cause a black violation. But then adding a red node in this case will cause a red violation. Hmmm, we’re damned if we do and damned if we don’t. However, we know that inserting a red node does not always cause a violation since it could have a black parent.
Based on the above observations, I made the decision to color new nodes red (as seen in the Node constructor above). This could be a bad decision if red violations are more difficult to fix than black violations. However, I have a suspicion that red violations are easier to fix than black violations.
A black violation means that our isomorphic 3-4 tree has leaf nodes on different levels and that fact can only based on a global observation of the tree. Whereas a red violation means our isomorphic 3-4 tree has a single bloated node and that fact is based on a local observation of the conceptual 4-node we inserted into. It appears that addressing red violations requires less information due to its localized nature. I was very careful here to not prove anything but rather rely on observation and intuition. With that said, let’s insert red nodes.
Let’s not forget about property #2. If we insert a red node into an empty tree, we end up with a red root. Uh oh. Let’s not overthink this one and simply force the root node to be black. You’ll see this in the public Insert function.
Now that we know inserting a red node could lead to a red violation, we need to know how to identify a red violation. In part #1 of this series I made it clear that I would like to avoid parent pointers. This means that we have to identify red violations by looking down the tree as opposed to up the tree. Our recursive insertion algorithm provides us with the perfect opportunity to do so; when the recursion is unwinding.
I suppose identifying red violations starts with identifying a red node:
Try to not get too excited, I know that’s a lot to take in.
Since we’re not using parent pointers, we have to look down the tree to identify a red violation. Identification is the easy part. To identify a red violation, we need to examine at most two levels of the tree because a red violation by definition is two consecutive red nodes. The tricky part is when to identify a red violation.
Let’s assume we have a red violation rooted at a node that we will call X. It has a red child Y, the direction of which is not important. The parent of X is P, the color of which must be black (keep that in the back of your mind). As we will soon learn, we do not have enough information about how to fix the tree until we examine the sibling of X, the child of P opposite that of X. This means that in order to fix a red violation, we must be able to look down upon the red violation from the parent of the root of the red violation. We need access to three levels of the tree.
It seems we are working through this problem backwards. If you boil down the above paragraphs, I essentially said that we have to know how to fix a red violation in order to know how to identify one. Seems we’re in a pickle. However, algorithms are not written linearly. Rather, algorithms change in response to new observations. It can be quite a vicious cycle, but ultimately rewarding.
Looking down the tree, as we unwind our recursive call stack, we can identify red violations as such:
Since a red violation can only occur in the direction of the insertion, our first test is to see if the child of our current node in the direction of the insertion is red. If it is, we then examine that node’s children to see if any one of them is red. If either of them are, we have a red violation.
An interesting property of this function is that it identifies red violations from the perspective of the parent of the root of the red violation. This property helps us avoid parent pointers as it provides us with enough information about the tree to fix a red violation.
That property is perhaps my favorite part about my implementation. Parent pointers are a pain to maintain and can be completely eliminated with a different perspective on the tree. There’s a lot of potential in that concept. Looking at a problem from different perspectives can often yield surprising results.
Fix it already
Since we know how and when to identify a red violation, I suppose it’s time we fix one. Let’s come up with some scenarios, justify or invalidate them, and then determine what to do about them.
In the figures above X is used as the root of the red violation. It’s red child Y causes the red violation. The parent of X is P, and X’s sibling is S.
- Figure 1.1
A red violation is rooted at X. Its red child Y is in the same direction as X is in relation to its parent P. We will call this a straight red violation. Further, X’s sibling S is black.
There appears to be a black violation since the black heights differ between subtrees of P. However, there is no implication being made that X or Y are leaf or near-leaf nodes; they could have additional children. Its important to make the assumption that there are no black violations. This assumption is safe since we avoid black violations by inserting red nodes. We have to make sure to protect this assumption while fixing red violations.
Could X have another red child? No, since if it did it would indicate our red-black tree was invalid prior to this red violation occurring.
We know we have to modify the tree to eliminate the red violation, all the while being careful not to introduce a black violation. Since X’s sibling S is black, we have to make the assumption that there is a black node in the right subtree of P. Does it matter where this black node is? No. Just know that it exists and we must preserve the black height of the tree.
Is the color of P important? Does it affect our decision about how to fix the red violation? The hasty answer is ‘Yes’, but it is incorrect. We can always make the assumption that P is black. If it were red, we would have had an invalid red-black tree prior to the red violation rooted at X.
- Figure 1.2
A red violation is rooted at X. Its red child Y is in the opposite direction as X is in relation to its parent P. We will call this a jagged red violation. The rest of the observations from Figure 1.1 remain.
- Figure 1.3
Here we have a straight red violation, where the sibling S of X is red.
Could there be black nodes further down the tree we do not see? Of course. Does it matter? No. We have to assume the black height of our tree is in check. We learned above that P must be black, so what we know for sure is that the black height of the subtree containing P is at least 1. That is all we have to preserve.
- Figure 1.4
Here we have a jagged red violation, where the sibling S of X is red. The rest of the observations from Figure 1.3 remain.
- Figure 1.5
Here we have 3 red violations! This exercise is based on the assumption that we have created a red violation rooted at X. With this assumption in mind, we had two red violations rooted at P prior to the one we’re attempting to fix. Since we fix red violations as they occur, this could only happen if we’re terrible at fixing red violations. We will never encounter this scenario.
- Figure 1.6
Refer to the observations from Figure 1.5.
- Figure 1.7
Similar to the observations from Figure 1.5, but this time we only have 2 red violations. Funny how I made that sound like a good thing. We know this is not possible.
- Figure 1.8
Refer to the observations from Figure 1.7.
As you can probably tell, I really enjoy observing scenarios and then justifying or invalidating them. The reason is it really helps me generalize my observations prior to writing any code. Think first, code later.
Now we need to generalize our scenarios. Reading the above observations a few times, here is what we can expect of the tree when we encounter a red violation:
- The parent P of the root of the red violation X is always black.
- The sibling S of the root of the red violation X could be red.
- The sibling S of the root of the red violation X could be black.
While we attempt to fix red violations using these generalizations, we need to keep the black height of the subtree rooted at P unchanged. We want to stay focused on fixing red violations and not inadvertently introduce any black violations. Any black violations created will only be temporary, as you will soon see.
At this time it might be obvious that an easy way to fix a red violation would be to color either X or its red child Y black. However, coloring Y black could introduce a black violation rooted at X since it may not have a black node in the subtree opposite that of Y.
If we color X black, we might introduce a black violation rooted at P. But what if the sibling S of X is red? We could balance out the black height by coloring S black too. Doing that however will increase the black height of the subtree containing P. Well, we already know that P must be black, so coloring it red will keep the black height in check. Of course coloring P red could introduce a red violation higher up the tree. That’s fine though since we can fix that later, as the recursion unwinds.
To summarize, red violations can be fixed by a simple recoloring when the sibling S of the root of the red violation X is red. This fixes the red violations shown in figures 1.3 and 1.4:
I simply combined figures 1.3 and 1.4 by showing Y in either place it could occur. The point is that in this case the position of Y is irrelevant.
So then what do we do when S is black? Again, it appears we can fix the tree through recoloring. If we eliminate the red violation by recoloring X black, we create a black violation. Well, what if we color P red? Fair enough, what does that look like?
Looks OK right? Not so fast; we’ve decreased the black height of the subtree containing P. If we assume that prior to the recoloring we had a valid red black tree, then that means the subtree rooted at X contains black nodes. Hmm, that would mean we have another black height violation rooted at X! If only we could force X to be the root of this subtree instead of P. If we could pull that one off, then we would fix the black height violations. Also, since X would be the root of the subtree and it is black, we wouldn’t introduce a red violation higher up in the tree.
Were you wondering why I’ve been linking to my post regarding tree rotations? Well now you know. The rest of this discussion will assume you’ve read that post and understand tree rotations.
Since we’re going to start rotating the tree, let’s work on figures 1.1 and 1.2 independently, addressing figure 1.1 first. In figure 1.1 we have a straight red violation, meaning the direction of Y in relation to its parent X is the same as the direction of X in relation to its parent P.
With P acting as the root of the rotation and X as the pivot, let’s go ahead and rotate left:
Comparing the right-hand side of figure 4 with our original tree (the left-hand side of figure 3), we see the root of our subtree is still black and we have eliminated the red violation. But have we introduced a black violation? At first glance, it might appear to be so.
Refer back to our original tree (the left-hand side of figure 3) and imagine that Y is the right child of X, meaning we have a straight red violation. What can we assume about the left child of X? It cannot be red since that would mean we failed to fix a red violation at some earlier time. Could it be null? No, if it were, we would have a black violation rooted at P. Alright then, its black. The assumption we then have to make is that Y has a black height of 1.
So then, have we introduced a black violation? No, because we are champions of observation and reason (my apologies for how lame that sounded). Figure 4 shows us the left child of X (the “?” node) was moved into P’s subtree and we know this node must be black. Further, we know Y’s subtree has a black height of 1. This entire subtree then has a black height of 1, just as it did prior to any recoloring or rotating.
We just fixed another red violation. Nailed it.
What happens when we have a jagged red violation? Refer back to our original tree (the left-hand side of figure 3) and imagine that Y is the left child of X. If we rotate P left, X ends up as the root of this subtree and we end up with this:
Red violation, shit.
We know how to fix a straight red violation, so could we turn a jagged red violation into a straight one? If we could, then we could just apply our fix for a straight red violation and call it a day. Let’s try that out.
We cannot just swap the children of X to force the tree into a straight red violation since that would invalidate our binary search tree. Rather, let’s rotate X right to force Y into X’s place.. Starting over from figure 1.3, here is what this rotation would yield:
That’s great! We’ve straightened out the jagged red violation. Please note that the “?” node is not necessarily the same node between transformations. It simply acts a placeholder for a node we’re not concerned with. Now we simply follow our steps for fixing a straight red violation and get this:
And not surprisingly, we just fixed another straight red violation.
All my examples above involved a red violation down the right subtree of the root node P. What about the left subtree? I suppose the same logic applies but with our rules regarding directions flipped. There appears to be additional observations and generalizations to be made regarding directions.
In order to avoid an if-else-mess of tests regarding which direction which node is in, we will observe how the direction of of a red violation affects the way we look at the rest of the tree.
- Assume the root of the red violation, X, is the right child of P. Also assume we’re dealing with a straight red violation. In this case, we rotate P left, making X the new root of the subtree.
- Assume the root of the red violation, X, is the right child of P. This time, assume we’re dealing with a jagged red violation. Here, we know X’s red child Y is X’s left child. Rotate X right, making it the right child of Y. This yields a straight red violation, and we proceed with step #1.
- Flipping things around a bit, assume the root of the red violation, X, is the left child of P. Also assume we’re dealing with a straight red violation. In this case, we rotate P right, making X the new root of the subtree.
- Finally, assume the root of the red violation, X, is again the left child of P. But this time we’re dealing with a jagged red violation. Here, we know X’s red child Y is X’s right child. Rotate X left, making it the left child of Y. This yields a straight red violation, and we proceed with step #3.
As usual, generalization follows observation:
- When rotating to fix a straight red violation, we rotate the root of the tree, P, in the direction opposite that of the root of the red violation, X.
- When rotating to fix a jagged red violation, we rotate the root of the red violation, X, in the same direction that X is in relation to its parent P.
And finally, code. The meat and potatoes is the private Insert function. Everything else is old news.
Go ahead and throw that in a C# console application and test it out. Go through the scenarios we did here and examine the tree along the way.
Obligatory sign off
As usual, the links to the other parts in this series are included below. Believe it or not, this post actually wound up shorter than I expected. If you thought it was too long, just wait until next time.
I mentioned in the introductory post that I reserve the right to call an audible regarding the parts of this series. I may have to include an additional part where we validate the trees our code produces.