An ordered tree (binary search tree) is used when we wish to store objects with (numerical) keys in a binary tree so that lookups can be done in order log_{2}N time, where N is the number of objects in the tree. But an ordered tree that is seriously ``unbalanced,'' that is, where paths from the root to the leaves have dramatically different lengths, will ruin the desired lookup behavior.
The worst-case example of an unbalanced ordered tree is the tree built by inserting a sorted sequence of objects (we show the numerical keys only; the objects attached to the keys are unimportant):

    1 2 3 4

The tree looks like this:

      1
     / \
    .   2
       / \
      .   3
         / \
        .   4
           / \
          .   .

Obviously, a lookup in this tree is just a linear search, which is slower than log-time.
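The degenerate shape can be reproduced with a few lines of Python. This is only a sketch: the names `Node`, `insert`, `height`, and `build` are illustrative and not part of the lecture, `None` stands for the `.` leaf, and the height of a tree is counted in nodes, as in the diagrams.

```python
class Node:
    """An ordered-tree node; None plays the role of the '.' leaf."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain ordered-tree insertion, with no rebalancing at all."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def height(node):
    """Number of nodes on the longest path from node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def build(keys):
    """Insert the keys one at a time, in the order given."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


sorted_tree = build([1, 2, 3, 4])
mixed_tree = build([2, 1, 3, 4])
print(height(sorted_tree))  # 4: every node hangs to the right of its parent
print(height(mixed_tree))   # 3
```

The same four keys give a shorter tree when they arrive in a different order, which is the motivation for rebalancing during insertion.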
How can we maintain an ordered tree so that, regardless of the order of insertions, the tree remains balanced? There are several sophisticated techniques for doing so; here we consider one of the most elegant, AVL trees.
A Node is balanced if the heights of its left and right subtrees differ by at most one.
A binary tree has the height-balanced property if all of its Nodes are balanced.
Here is an example of an AVL-tree, whose root holds 44:

                  44
             /          \
        17                  78
      /    \             /      \
    .       32        50          88
           /  \      /   \       /  \
          .    .   48     62    .    .
                  /  \   /  \
                 .    . .    .

For clarity, we redraw the tree and write the heights next to each Node, in parentheses:

                 44(4)
             /          \
       17(2)               78(3)
      /    \             /      \
    .      32(1)     50(2)       88(1)
           /  \      /   \       /  \
          .    . 48(1)   62(1)  .    .
                  /  \   /  \
                 .    . .    .

Notice that, for every node in the tree, the heights of its subtrees are the same, plus-or-minus one.
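The two definitions can be checked mechanically. The following Python sketch builds the example tree by hand and tests it; the function names `height`, `is_balanced`, and `is_avl` are assumptions made for illustration, and `None` stands for the `.` leaf.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height counted in nodes, so a leaf '.' (None) has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """True if the heights of this node's two subtrees differ by at most one."""
    return abs(height(node.left) - height(node.right)) <= 1


def is_avl(node):
    """True if every node of the tree is balanced (height-balanced property)."""
    if node is None:
        return True
    return is_balanced(node) and is_avl(node.left) and is_avl(node.right)


# The example tree whose root holds 44.
example = Node(44,
               Node(17, None, Node(32)),
               Node(78, Node(50, Node(48), Node(62)), Node(88)))

print(height(example))        # 4
print(height(example.right))  # 3
print(is_avl(example))        # True
```

Recomputing heights on every call is wasteful; a practical implementation would usually store the height in each node, but the direct version keeps the definitions visible.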
(By the way, the initials, ``AVL,'' denote the two Russian researchers, G.M. Adel'son-Vel'skii and Y.M. Landis, who developed the key definitions and algorithms for the tree format.)
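The example shows that an AVL-tree is not ``exactly'' balanced, or complete (like a heap), but it is ``balanced well enough''. Here is the reason why it is ``well enough'': the height of an AVL-tree that holds N objects is still on the order of log_{2}N, at most about twice the height of a complete tree that holds the same number of objects.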
(For example, if the tree held 2048 objects, then lookup in a complete tree takes 12 comparisons, worst case, whereas AVL lookup takes 24 comparisons, worst case. Linear search would take 1024 comparisons, on average, and 2048 comparisons, in worst case.)
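The figures in the example can be explored with a short calculation. The sketch below rests on one observation: the sparsest AVL tree of a given height consists of a root plus the sparsest AVL trees of the two next-smaller heights. The function names are illustrative only.

```python
def min_nodes(h):
    """Fewest nodes that an AVL tree of height h can contain."""
    if h == 0:
        return 0
    prev, cur = 0, 1              # min_nodes(0), min_nodes(1)
    for _ in range(h - 1):
        # Root, plus sparsest subtrees of the two preceding heights.
        prev, cur = cur, 1 + cur + prev
    return cur


def max_avl_height(n):
    """Greatest height that an AVL tree holding n objects can have."""
    h = 0
    while min_nodes(h + 1) <= n:
        h += 1
    return h


def complete_height(n):
    """Height of a complete binary tree holding n objects."""
    return n.bit_length()


print(complete_height(2048))  # 12
print(max_avl_height(2048))   # 15
```

For 2048 objects the exact worst-case AVL height comes out as 15, comfortably inside the figure of 24 comparisons quoted above, which is a generous upper estimate.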
Let's reconsider the above tree and insert 54 into it. We know how to perform insertion into ordered trees, and this gives us:

                 44(5)!
             /          \
       17(2)               78(4)!
      /    \             /      \
    .      32(1)     50(3)       88(1)
           /  \      /   \       /  \
          .    . 48(1)   62(2)  .    .
                  /  \   /   \
                 .    . 54(1) .
                        /  \
                       .    .

Again, the heights are listed next to the nodes. The insertion of 54 has unbalanced the two nodes marked by !, so the tree no longer has the height-balanced property. In general, many nodes, all located along the path from the root to the insertion position, might have their heights changed and become unbalanced. We must rebuild the tree so that the unbalanced nodes become balanced again.
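The effect of the insertion can be reproduced in Python. In this sketch the helper `unbalanced_on_path` is a hypothetical name; it walks from the root to the inserted key and reports, from the bottom up, the nodes whose subtree heights now differ by more than one.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def insert(root, key):
    """Ordinary ordered-tree insertion, without any repair."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def unbalanced_on_path(root, key):
    """Keys of the unbalanced nodes between the root and key, bottom-up."""
    path = []
    node = root
    while node is not None:
        path.append(node)
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return [n.key for n in reversed(path)
            if abs(height(n.left) - height(n.right)) > 1]


tree = Node(44,
            Node(17, None, Node(32)),
            Node(78, Node(50, Node(48), Node(62)), Node(88)))
print(unbalanced_on_path(tree, 62))  # []
tree = insert(tree, 54)
print(height(tree))                  # 5
print(unbalanced_on_path(tree, 54))  # [78, 44]
```

The first key in the result, 78, is the lowest unbalanced node on the path, which is where the repair is applied.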
A general diagram of the problematic situation looks like this: Say that an AVL tree has a subtree, Z, whose two subtrees have different heights, and say that the larger-height subtree, Y, is ``full'':

            root
            /  \
          . . . .
              Z(n+2)
             /     \
          *(n)      Y(n+1)
          /  \      /    \
                 *(n)    *(n)
                  /\      /\

(As usual, we write the heights in parentheses next to the nodes. Subtree Y is ``full'' in the sense that both of its subtrees have the same height, and adding one more value to a subtree will cause the subtree's height to increase.) Say that we must insert a new number, k, and the insertion places k inside subtree Y, and this generates an imbalance:

            root
            /  \
          . . . .
              Z(n+3)!
             /     \
          *(n)      Y(n+2)
          /  \      /    \
                 *(n)    *(n+1)
                  /\      /\
                           k

Subtree Y is still balanced, but since its height increased by one, this ruins Z's balance.
It is remarkable that we can repair the situation by moving (rotating) not k but the nodes above it. Indeed, we will rotate just three nodes.
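Here is the clever strategy: starting at the newly inserted node, follow the path upward toward the root until the first unbalanced node is found, and call it Node Z. Call Z's child on that path Node Y, and call Y's child on that path Node X. In the example, Z is 78, Y is 50, and X is 62: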

                 44(5)!
             /          \
       17(2)              Z78(4)!
      /    \             /      \
    .      32(1)    Y50(3)       88(1)
           /  \      /   \       /  \
          .    . 48(1)  X62(2)  .    .
                  /  \   /   \
                 .    . 54(1) .
                        /  \
                       .    .

Here is a more general diagram of the situation, with node heights indicated in parentheses:

        Z(n+3)!
       /     \
    *(n)      Y(n+2)
    /  \      /    \
           *(n)    X(n+1)
            /\      /\
                     k

That is, the insertion of the new object has caused Node X's height to increase by one, and this makes Node Y's height increase by one. (Note that Node X might itself be the new object, and note that Node Y is still balanced. Hence, the heights must be related as shown.) The result is that Node Z's height has increased from n+2 to n+3 and is unbalanced.
Rather than try to move Node k, our objective is to rearrange Nodes Z, Y, and X, so that the subtree becomes balanced again and recovers the height of n+2, which it had prior to the insertion. All three nodes might be moved, but their subtrees, and the rest of the tree, will be unaltered.
Now remember that each of Nodes Z, Y, and X holds a numerical value, and remember that the three nodes are arranged in a path, a linear sequence, in the tree, like this:

    Node Z
         \
         Node Y
         /
    Node X

To reduce the height of this structure, it makes sense to rearrange the three nodes into this pattern:

        Node b
        /    \
    Node a   Node c

This would reduce the height by one! But which of Z, Y, X, should be Node b? Node a? Node c?
The answer is simple: We compare the values held in Nodes Z, Y, and X. The node that holds the smallest number will be ``Node a''; the node that holds the middle number will be ``Node b''; and the node that holds the largest number will be ``Node c''.
Another way of identifying Nodes a, b, and c is to visually ``read'' the tree from left to right; the leftmost node is Node a; the middle is Node b; and the rightmost is Node c. (This left-to-right ``reading'' can be coded as an in-order tree traversal.)
The above picture is incomplete, because Nodes Z, Y, and X have their own subtrees. We must label the four subtrees attached to Nodes a, b, and c, as T0, T1, T2, and T3. Again, we can do this by ``reading'' the tree from left to right.
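But here is a more precise statement of the algorithm: Given Nodes Z, Y, and X, label them a, b, and c in increasing order of the numbers they hold; then label as T0, T1, T2, and T3, in left-to-right order, the four subtrees of Z, Y, and X whose roots are not themselves Z, Y, or X.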
Here is the labelling for the example:

                Z78c
               /    \
           Y50a      88T3
           /  \      /  \
       48T0   X62b  .    .
       /  \   /  \
      .    . 54T1 .T2
             /  \
            .    .

Now, given Nodes a, b, c, and given subtrees T0, T1, T2, T3, we rotate the nodes and reconnect them to their subtrees like this:

           b
         /   \
        a     c
       / \   / \
     T0  T1 T2  T3

For the example, the result of the rotation looks like this:

               62b
             /     \
         50a         78c
        /   \       /   \
     48T0   54T1  .T2   88T3
     /  \   /  \        /  \
    .    . .    .      .    .

If you carefully consider the algorithm we used to attach the labels, a, b, and c, and T0, T1, T2, and T3, you can verify that the rotated tree is still an ordered tree. Further, you can visually verify that the height of the tree has been reduced by one!
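The labelling and reconnection can be sketched in Python as follows. The function name `restructure` and its inner helper `read` are assumptions made for this illustration; `read` performs the left-to-right ``reading'' of the three nodes, collecting a, b, c in one list and T0, T1, T2, T3 in another.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def restructure(z, y, x):
    """Rearrange Nodes z, y, x into b with children a and c; return b."""
    nodes, subtrees = [], []

    def read(node):
        # In-order reading restricted to z, y, x; anything else that is
        # reached (possibly an empty subtree, None) is one of T0..T3.
        if node is z or node is y or node is x:
            read(node.left)
            nodes.append(node)
            read(node.right)
        else:
            subtrees.append(node)

    read(z)
    a, b, c = nodes
    t0, t1, t2, t3 = subtrees
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    return b


# The example after 54 has been inserted: Z is 78, Y is 50, X is 62.
x = Node(62, Node(54), None)
y = Node(50, Node(48), x)
z = Node(78, y, Node(88))
root = Node(44, Node(17, None, Node(32)), z)

root.right = restructure(z, y, x)   # reattach the rotated subtree
b = root.right
print(b.key, b.left.key, b.right.key)  # 62 50 78
print(height(b), height(root))         # 3 4
```

Only the links of the three nodes change; the four subtrees are moved as whole units, so the cost of one rotation does not depend on the size of the tree.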
To validate the reduction in height, we can draw diagrams of all the possible combinations of Nodes a, b, and c that can arise, and we see that each rotation reduces the height of the root node by one. There are four cases:
CASE 1:

    Node Z a                               Y
     /   \                               /   \
   T0   Node Y b           ==>          Z     X
         /   \                         / \   / \
       T1   Node X c                 T0  T1 T2  T3
             /   \
           T2     T3

CASE 2:

            Node Z c                       Y
             /   \                       /   \
        Node Y b  T3       ==>          X     Z
         /   \                         / \   / \
    Node X a  T2                     T0  T1 T2  T3
     /   \
   T0     T1

CASE 3:

    Node Z a                               X
     /   \                               /   \
   T0     Node Y c         ==>          Z     Y
           /   \                       / \   / \
      Node X b  T3                   T0  T1 T2  T3
       /   \
     T1     T2

CASE 4:

         Node Z c                          X
          /   \                          /   \
    Node Y a   T3          ==>          Y     Z
     /   \                             / \   / \
   T0   Node X b                     T0  T1 T2  T3
         /   \
       T1     T2
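Putting the pieces together, the four cases can be written out directly and used inside insertion. The following is a sketch under stated assumptions, not the lecture's own code: each node caches its height in a `height` field, the names `h`, `update`, `restructure`, `insert`, `inorder`, and `true_height` are illustrative, and Y and X are found as the taller child of Z and of Y, which after an insertion are the children on the insertion path.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def h(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def restructure(z):
    """Apply whichever of the four cases matches z, y, x; return Node b."""
    y = z.left if h(z.left) > h(z.right) else z.right
    x = y.left if h(y.left) > h(y.right) else y.right
    if y is z.right and x is y.right:          # CASE 1
        a, b, c = z, y, x
        t0, t1, t2, t3 = z.left, y.left, x.left, x.right
    elif y is z.left and x is y.left:          # CASE 2
        a, b, c = x, y, z
        t0, t1, t2, t3 = x.left, x.right, y.right, z.right
    elif y is z.right:                         # CASE 3: x is y's left child
        a, b, c = z, x, y
        t0, t1, t2, t3 = z.left, x.left, x.right, y.right
    else:                                      # CASE 4: x is y's right child
        a, b, c = y, x, z
        t0, t1, t2, t3 = y.left, x.left, x.right, z.right
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    update(a); update(c); update(b)            # children before parent
    return b


def insert(node, key):
    """Ordered-tree insertion, repairing balance on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    if abs(h(node.left) - h(node.right)) > 1:
        node = restructure(node)
    return node


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def true_height(node):
    """Height recomputed from scratch, or -1 if some node is unbalanced."""
    if node is None:
        return 0
    l, r = true_height(node.left), true_height(node.right)
    if l < 0 or r < 0 or abs(l - r) > 1:
        return -1
    return 1 + max(l, r)


root = None
for key in range(1, 8):          # the sorted sequence 1 2 3 4 5 6 7
    root = insert(root, key)
print(root.key, root.height)     # 4 3
print(inorder(root))             # [1, 2, 3, 4, 5, 6, 7]
```

The sorted sequence that produced a linear chain without rebalancing now yields a tree of height 3 with 4 at the root.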
The basic idea: as before, consider a node deletion as the deletion of the root (of a subtree). Replace the deleted root by an ``innermost'' node, as described in the lecture on ordered trees.
The promotion of the innermost node to the root might make one half of the rebuilt tree too shallow and unbalance the tree, and (repeated) rotations might be needed to repair the problem. The rotations would proceed along the path from the location of the former innermost node to the root of the overall tree.
So, starting from the leaf that replaced the promoted innermost node, search along the path from the leaf to the overall root, examining each node to see if it is unbalanced. If an unbalanced node is located, call it Node Z.
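We must revise the definition of the Y-X-nodes, because there is no newly inserted node to mark the path: Node Y is the child of Z with the greater height, and Node X is the child of Y with the greater height (if Y's two children have the same height, take X on the same side of Y as Y is of Z). Nodes Z, Y, and X are then rotated exactly as for insertion.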
Unfortunately, additional rotations might be necessary, so the technique is repeated for the parent node of the newly installed rotated root, and its parent, etc., until the root of the entire tree is balanced. Although this is more work, there are at most on the order of log_{2}N rotations, so the deletion operation is relatively efficient.
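Deletion with repeated repair can be sketched along the same lines. Several choices here are assumptions rather than the lecture's method: the ``innermost'' node is taken to be the leftmost node of the right subtree, heights are cached in each node, and all function names are illustrative. The recursion revisits every node on the path back to the root, so any number of rotations can occur.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1


def h(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def taller_child(node, prefer_left):
    """The child of greater height; prefer_left settles a tie."""
    if h(node.left) != h(node.right):
        return node.left if h(node.left) > h(node.right) else node.right
    return node.left if prefer_left else node.right


def restructure(z):
    """Rotate z, its taller child y, and y's taller child x; return b."""
    y = taller_child(z, True)           # no tie is possible at an unbalanced z
    x = taller_child(y, y is z.left)    # on a tie, keep x on y's side
    nodes, subtrees = [], []

    def read(node):                     # left-to-right reading
        if node is z or node is y or node is x:
            read(node.left)
            nodes.append(node)
            read(node.right)
        else:
            subtrees.append(node)

    read(z)
    (a, b, c), (t0, t1, t2, t3) = nodes, subtrees
    a.left, a.right, c.left, c.right = t0, t1, t2, t3
    b.left, b.right = a, c
    update(a); update(c); update(b)
    return b


def rebalance(node):
    update(node)
    if abs(h(node.left) - h(node.right)) > 1:
        node = restructure(node)
    return node


def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return rebalance(node)


def delete(node, key):
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None:
        return node.right
    elif node.right is None:
        return node.left
    else:
        inner = node.right               # assumed choice of innermost node
        while inner.left is not None:
            inner = inner.left
        node.key = inner.key             # promote it, then remove its old copy
        node.right = delete(node.right, inner.key)
    return rebalance(node)               # runs at every node on the way up


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def true_height(node):
    """Height recomputed from scratch, or -1 if some node is unbalanced."""
    if node is None:
        return 0
    l, r = true_height(node.left), true_height(node.right)
    if l < 0 or r < 0 or abs(l - r) > 1:
        return -1
    return 1 + max(l, r)


root = None
for key in [44, 17, 78, 32, 50, 88, 48, 62]:
    root = insert(root, key)
root = delete(root, 32)        # 17's subtree becomes too shallow for 44
print(root.key)                # 50
print(inorder(root))           # [17, 44, 48, 50, 62, 78, 88]
```

In this run, deleting 32 leaves the root 44 unbalanced; Z is 44, Y is 78, and X is 50, which is CASE 3, so 50 becomes the new root.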