Spelling Trees

Some people call a data structure a dictionary if it is a collection of ``words,'' and it has methods for inserting and finding specific words in the collection.

If a ``word'' is a sequence of characters, or in general, a sequence (or list) whose elements can be ordered, then there is a clever implementation of a dictionary as a so-called spelling tree or trie (pronounced ``try''). For example, say that we have some objects, o1, o2, o3, o4, and o5, and say that the respective keys of these objects are the words be, bed, bee, been, and it. We can organize the objects so that each key defines a path from the root of the tree to the node that holds the object:

               null
           b/       \i
          null      null
          e|         |t
          o1        o5
        d/ e\  
        o2  o3 
             |n
            o4
For example, object o1, whose key is be, is found by traversing the path labelled b followed by e. Notice that object o2, whose key is bed, is located by following the path labelled b then e, then d.

For completeness' sake, some paths lead to nodes where there are no values; such nodes hold ``null''. For example, the key i leads to a node that holds null, because no object has i as its key. Notice that the ``leaves'' in the drawing are nodes that do not hold (links to) more nodes.

The labels on the branches replace the usual labels (fields) named ``left'' and ``right,'' and a Node may have any number of subtrees, including none. For this reason, a spelling tree is an example of an ``n-ary'' tree, where the value of n is a nonnegative integer. (Of course, a binary tree is a 2-ary tree.)

In practice, spelling trees are often preferred over binary trees to store keyed objects, because it is easy to work directly with the symbols within the keys. But we will see that the implementation becomes slightly more complicated.

Designing Spelling Trees

To make the previous drawing of a spelling tree come to life, we use a fixed alphabet for the keys (e.g., the characters 'a' through 'z'). Then, the inductive data type definition for a spelling tree might be written like this:
A SpellingTree object is
  1. A Node object, which contains
    • a Value
    • a set of SpellingTree objects (which might be empty), where each spelling-tree object is labelled by a symbol of the alphabet.

A Value is either:
  1. an object, called the ``value'', or
  2. empty (also known as ``null'')
That is, a spelling tree is a node that holds/links to other spelling trees.
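
The definition can be transliterated almost directly into Java. The following is only a minimal sketch: the class name SpellingTree, the type parameter V, the method isLeaf, and the use of java.util.TreeMap to model the labelled set of subtrees are all assumptions made for illustration, and other codings are possible.

```java
import java.util.Map;
import java.util.TreeMap;

/** One node of a spelling tree; V is the (assumed) type of the stored objects. */
class SpellingTree<V> {
    /** The Value part of the definition: an object, or null for ``empty''. */
    V value;

    /** The set of subtrees, each labelled by a symbol; the set may be empty. */
    final Map<Character, SpellingTree<V>> subtrees = new TreeMap<>();

    SpellingTree(V value) {
        this.value = value;
    }

    /** A leaf is a node that holds no (links to) more nodes. */
    boolean isLeaf() {
        return subtrees.isEmpty();
    }
}
```

In this sketch the map leaves open how the labelled set is stored; a concrete implementation still has to choose a representation for it.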

The above inductive definition is not the only way to define the data type of spelling trees.

Implementing Spelling Trees

The inductive definition gives us a strong hint about how to build a spelling tree: class Node would hold (the addresses of) a set of spelling trees, plus a ``value'' (which is an object or null).

There are many ways to model sets, but since the set of spelling trees is indexed by letters of the alphabet, an array implementation works well. For the example tree above, the root Node object might look like this in heap storage:

   a1 : Node
   ---------------------------------
   |  value ==| null |
   |           ------
   |             'a'  'b'  'c'  ...   'i'  ...   'z' 
   |            ------------------------------------
   |  subtree: |null| a2 |null| ... | a3 | ... |null|
   |            ------------------------------------
   |
subtree is an array whose indexes are the letters of the alphabet used to form keys. (Java array indexes are integers that start at 0, so you must convert each letter to an index when you code the array; for the alphabet 'a' through 'z', subtracting 'a' from the letter does this.) Since the root has one subtree indexed by 'b' and one subtree indexed by 'i', there are nonnull addresses to the Node objects for these two subtrees.
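
The letter-to-index conversion can be sketched as follows. The class name Alphabet and the methods indexOf and letterAt are hypothetical helpers, and the code assumes the fixed alphabet 'a' through 'z'.

```java
/** Hypothetical helper that maps the letters 'a'..'z' to the array indexes 0..25. */
final class Alphabet {
    static final int SIZE = 26;

    /** Returns the array index for a lowercase letter. */
    static int indexOf(char letter) {
        if (letter < 'a' || letter > 'z') {
            throw new IllegalArgumentException("not in the alphabet: " + letter);
        }
        // A Java char is a numeric type, so subtraction gives the offset from 'a'.
        return letter - 'a';
    }

    /** Inverse conversion: returns the letter that labels the given index. */
    static char letterAt(int index) {
        if (index < 0 || index >= SIZE) {
            throw new IllegalArgumentException("no letter for index " + index);
        }
        return (char) ('a' + index);
    }
}
```

With this helper, the root's two nonnull subtrees sit at Alphabet.indexOf('b'), which is 1, and Alphabet.indexOf('i'), which is 8.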

Next, the node indexed by 'b' looks like this:

   a2 : Node
   ---------------------------------
   |  value ==| null |
   |           ------
   |             'a'  'b'  'c'  'd'   'e'  ...   'z' 
   |            ------------------------------------
   |  subtree: |null|null|null| null | a4 | ... |null|
   |            ------------------------------------
   |
and its subtree indexed by 'e' looks like this:
   a4 : Node
   ---------------------------------
   |  value ==| o1 |
   |           ------
   |             'a'  'b'  'c'  'd'  'e'  ...   'z' 
   |            ------------------------------------
   |  subtree: |null|null|null| a5 | a6 | ... |null|
   |            ------------------------------------
   |

The main advantage of using an array to label the subtrees is that key processing is fast (because array lookup is fast). The main disadvantage is that a huge alphabet requires huge arrays within each node; this wastes a great deal of storage if the arrays hold mostly null addresses.
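
To illustrate why key processing is fast, here is a minimal sketch of an array-based Node with insertion and lookup. The fields value and subtree follow the heap drawings above; the method names insert and find, the use of Object for stored values, and the restriction of keys to 'a' through 'z' are assumptions made for this example.

```java
/** A sketch of an array-based spelling tree over the alphabet 'a'..'z'. */
class Node {
    Object value;                          // the stored object, or null if there is none
    final Node[] subtree = new Node[26];   // subtree[c - 'a'] is the subtree labelled c

    /** Stores obj under key, creating any missing nodes along the key's path. */
    void insert(String key, Object obj) {
        Node current = this;
        for (int i = 0; i < key.length(); i++) {
            // Assumes every character of key lies in 'a'..'z'; any other
            // character would cause an ArrayIndexOutOfBoundsException.
            int index = key.charAt(i) - 'a';
            if (current.subtree[index] == null) {
                current.subtree[index] = new Node();
            }
            current = current.subtree[index];
        }
        current.value = obj;
    }

    /** Returns the object stored under key, or null if there is none. */
    Object find(String key) {
        Node current = this;
        for (int i = 0; i < key.length() && current != null; i++) {
            current = current.subtree[key.charAt(i) - 'a'];   // one array lookup per letter
        }
        return (current == null) ? null : current.value;
    }
}
```

If the five example keys be, bed, bee, been, and it are inserted into an empty root, then find("bed") follows b, e, d and returns the object stored for bed, while find("i") reaches a node that exists but holds null, and find("bet") runs off the tree and also returns null. Each step costs one array access, so the work is proportional to the length of the key.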

When the alphabet is huge or not fixed, a linked list can be used to hold the subtrees for a given Node (but this can make key lookup much slower, because the list must be searched for each symbol of the key).
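
The linked-list alternative might be sketched like this. The names ListNode, Link, child, and childOrCreate are hypothetical, and the list is left unsorted for simplicity; this is one possible coding, not a definitive one.

```java
/** A sketch of a node whose labelled subtrees are kept in a singly linked list. */
class ListNode {
    /** One cell of the list: a label, the subtree it leads to, and the next cell. */
    private static final class Link {
        final char label;
        final ListNode subtree;
        final Link next;

        Link(char label, ListNode subtree, Link next) {
            this.label = label;
            this.subtree = subtree;
            this.next = next;
        }
    }

    Object value;         // the stored object, or null if there is none
    private Link links;   // first cell of the list; null means no subtrees

    /** Returns the subtree with the given label, or null; scans the list cell by cell. */
    ListNode child(char label) {
        for (Link cell = links; cell != null; cell = cell.next) {
            if (cell.label == label) {
                return cell.subtree;
            }
        }
        return null;
    }

    /** Like child, but adds an empty subtree at the front of the list if none exists. */
    ListNode childOrCreate(char label) {
        ListNode found = child(label);
        if (found == null) {
            found = new ListNode();
            links = new Link(label, found, links);
        }
        return found;
    }

    /** Stores obj under key; any char may appear in key. */
    void insert(String key, Object obj) {
        ListNode current = this;
        for (int i = 0; i < key.length(); i++) {
            current = current.childOrCreate(key.charAt(i));
        }
        current.value = obj;
    }

    /** Returns the object stored under key, or null if there is none. */
    Object find(String key) {
        ListNode current = this;
        for (int i = 0; i < key.length() && current != null; i++) {
            current = current.child(key.charAt(i));
        }
        return (current == null) ? null : current.value;
    }
}
```

Each node now stores only the subtrees it actually has, so no space is spent on null entries, but every step of a lookup may scan the whole list of that node's subtrees instead of making a single array access.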