If a ``word'' is a sequence of characters, or in general, a sequence (or list) whose elements can be ordered, then there is a clever implementation of a dictionary as a so-called spelling tree or trie (pronounced ``try''). For example, say that we have some objects, o1, o2, o3, o4, and o5, and say that the respective keys of these objects are the words be, bed, bee, been, and it. We can organize the objects so that each key defines a path from the root of the tree to the node that holds the object:

               null
             b/    \i
          null      null
           e|        |t
            o1       o5
          d/  \e
         o2    o3
                |n
                o4

For example, object o1, whose key is be, is found by traversing the path labelled b followed by e. Notice that object o2, whose key is bed, is located by following the path labelled b, then e, then d.
For completeness' sake, note that some paths lead to nodes where there are no values; such nodes hold ``null''. For example, the key, i, leads to a node that holds null. Notice that the ``leaves'' in the drawing are nodes that do not hold (links to) more nodes.
The labels on the branches replace the usual labels (fields) named ``left'' and ``right,'' and a Node may have any number of subtrees, zero or more. For this reason, a spelling tree is an example of an ``n-ary'' tree, where the value of n is a nonnegative integer. (Of course, a binary tree is a 2-ary tree.)
Note that each node of a spelling tree can have as many subtrees as there are symbols in the alphabet; call this number M. This implies that, for a reasonably full and balanced tree of N nodes, insertions and lookups in a spelling tree take time on the order of log_M(N), which is roughly equivalent to the time taken to do insertions and lookups in a binary tree, where M = 2. (For some intuition, calculate the values of log_M(N) for these ranges of M and N: M = 2 or 10; N = 32 or 100 or 1000 or 10000.)
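The calculation suggested in the exercise can be carried out with a few lines of Java. This is only a sketch; the class name LogTable and the method name logBase are made up for this illustration, and the computation relies on the change-of-base identity log_M(N) = ln(N) / ln(M).

```java
// Sketch: tabulate log_M(N) for the values of M and N named in the exercise.
// LogTable and logBase are hypothetical names chosen for this example.
class LogTable {
    // Logarithm of n in base m, using log_M(N) = ln(N) / ln(M).
    static double logBase(double m, double n) {
        return Math.log(n) / Math.log(m);
    }

    public static void main(String[] args) {
        int[] bases = {2, 10};
        int[] sizes = {32, 100, 1000, 10000};
        for (int m : bases) {
            for (int n : sizes) {
                System.out.printf("M = %2d, N = %5d: log_M(N) is about %.2f%n",
                                  m, n, logBase(m, n));
            }
        }
    }
}
```

For N = 10000, the value is a little over 13 when M = 2 but only 4 when M = 10, which shows how a larger branching factor flattens the tree.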
In practice, spelling trees are often preferred over binary trees to store keyed objects, because it is easy to work directly with the symbols within the keys. But we will see that the implementation becomes slightly more complicated.
The above inductive definition is not the only way to define the data type of spelling trees.
There are many ways to model sets, but since the set of subtrees held by a node is indexed by letters of the alphabet, an array implementation works well. For the example tree above, the root Node object might look like this in heap storage:

    a1 : Node
    -------------------------------------------------
    | value == | null |
    |          --------
    |           'a'  'b'  'c'   ...  'i'   ...  'z'
    |          --------------------------------------
    | subtree: |null| a2 |null| ... | a3 | ... |null|
    |          --------------------------------------
    |

The field subtree is an array whose indexes are the letters of the alphabet used to form keys. (In Java, a letter used as an array index is treated as its numeric character code rather than as its position in the alphabet, so you must do a conversion when you code the array.) Since the root has one subtree indexed by 'b' and one subtree indexed by 'i', there are nonnull addresses to the Node objects for these two subtrees.
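One way to do the letter-to-index conversion is sketched here. It assumes that keys are built only from the 26 lowercase letters 'a' to 'z'; the names LetterIndex, indexOf, and letterAt are hypothetical, chosen for this example.

```java
// Sketch: convert between letters and array positions.
// Assumption: the alphabet is exactly 'a'..'z'.
class LetterIndex {
    static final int ALPHABET_SIZE = 26;

    // Maps 'a' to 0, 'b' to 1, ..., 'z' to 25; rejects any other character.
    static int indexOf(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("not a lowercase letter: " + c);
        }
        return c - 'a';
    }

    // Inverse conversion: maps 0 to 'a', 1 to 'b', ..., 25 to 'z'.
    static char letterAt(int index) {
        if (index < 0 || index >= ALPHABET_SIZE) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        return (char) ('a' + index);
    }
}
```

With this conversion, the root's subtree for 'b' sits at position 1 of the array and its subtree for 'i' sits at position 8.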
Next, the node indexed by 'b' looks like this:

    a2 : Node
    ------------------------------------------------
    | value == | null |
    |          --------
    |           'a'  'b'  'c'  'd'  'e'   ...  'z'
    |          -------------------------------------
    | subtree: |null|null|null|null| a4 | ... |null|
    |          -------------------------------------
    |

and its subtree indexed by 'e' looks like this:

    a4 : Node
    ------------------------------------------------
    | value == |  o1  |
    |          --------
    |           'a'  'b'  'c'  'd'  'e'   ...  'z'
    |          -------------------------------------
    | subtree: |null|null|null| a5 | a6 | ... |null|
    |          -------------------------------------
    |

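The Node pictures above can be turned into a small class. What follows is a minimal sketch rather than the only possible coding: the names SpellingTree, insert, lookup, and indexOf are assumptions made for this example, keys are assumed to use only the letters 'a' to 'z', and the strings "o1" through "o5" stand in for the stored objects.

```java
// Sketch of an array-based spelling tree (trie).
// Assumption: every key consists only of the lowercase letters 'a'..'z'.
class SpellingTree {
    private static final int ALPHABET_SIZE = 26;

    private static class Node {
        Object value = null;                       // null means no object is stored here
        Node[] subtree = new Node[ALPHABET_SIZE];  // subtree[c - 'a'] is the branch labelled c
    }

    private final Node root = new Node();

    // Converts a letter to its array position, 'a' -> 0 ... 'z' -> 25.
    private static int indexOf(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("not a lowercase letter: " + c);
        }
        return c - 'a';
    }

    // Follows the path spelled by key, building missing nodes, and stores value at its end.
    void insert(String key, Object value) {
        Node current = root;
        for (int i = 0; i < key.length(); i++) {
            int index = indexOf(key.charAt(i));
            if (current.subtree[index] == null) {
                current.subtree[index] = new Node();
            }
            current = current.subtree[index];
        }
        current.value = value;
    }

    // Follows the path spelled by key; returns null if the path is missing
    // or if the node at its end holds no object.
    Object lookup(String key) {
        Node current = root;
        for (int i = 0; i < key.length(); i++) {
            current = current.subtree[indexOf(key.charAt(i))];
            if (current == null) {
                return null;
            }
        }
        return current.value;
    }

    public static void main(String[] args) {
        SpellingTree tree = new SpellingTree();
        tree.insert("be", "o1");
        tree.insert("bed", "o2");
        tree.insert("bee", "o3");
        tree.insert("been", "o4");
        tree.insert("it", "o5");
        System.out.println(tree.lookup("bed"));  // prints o2
        System.out.println(tree.lookup("i"));    // prints null: the node exists but holds no object
    }
}
```

Both insert and lookup do one array access per symbol of the key, which is where the speed of the array representation comes from.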
The main advantage of using an array to label the subtrees is that key processing is fast (because array lookup is fast). The main disadvantage is that a huge alphabet requires huge arrays within each node---this is a major loss if the array holds mostly null addresses.
When the alphabet is huge or not fixed, a linked list can be used to save the subtrees for a given Node (but this can make key lookup much slower).
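As a rough illustration of the linked-list alternative, each Node below keeps a chain of (label, subtree) cells in place of an array. This is a sketch under assumptions: the names ListSpellingTree, Branch, find, and findOrAdd are hypothetical, and any Java char is accepted as a branch label.

```java
// Sketch of a spelling tree whose nodes store their subtrees in a linked list.
// Any char may label a branch, so the alphabet need not be fixed in advance.
class ListSpellingTree {
    // One cell of a node's list: a branch label, the subtree it leads to, and the next cell.
    private static class Branch {
        final char label;
        final Node child;
        final Branch next;

        Branch(char label, Node child, Branch next) {
            this.label = label;
            this.child = child;
            this.next = next;
        }
    }

    private static class Node {
        Object value = null;     // null means no object is stored here
        Branch branches = null;  // head of the list of labelled subtrees

        // Linear search of the list: this is the slower step compared with an array access.
        Node find(char label) {
            for (Branch b = branches; b != null; b = b.next) {
                if (b.label == label) {
                    return b.child;
                }
            }
            return null;
        }

        // Returns the subtree for label, adding a new cell at the front of the list if needed.
        Node findOrAdd(char label) {
            Node child = find(label);
            if (child == null) {
                child = new Node();
                branches = new Branch(label, child, branches);
            }
            return child;
        }
    }

    private final Node root = new Node();

    void insert(String key, Object value) {
        Node current = root;
        for (int i = 0; i < key.length(); i++) {
            current = current.findOrAdd(key.charAt(i));
        }
        current.value = value;
    }

    Object lookup(String key) {
        Node current = root;
        for (int i = 0; i < key.length() && current != null; i++) {
            current = current.find(key.charAt(i));
        }
        return current == null ? null : current.value;
    }
}
```

A node here uses storage only for the branches it actually has, but each step along a key may scan the whole list of branches at that node.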