Inductively defined data structures

Roughly stated, there are three species of data structures in the computing world:

tabular data structures, like arrays and tables and grids. The objects held within a tabular data structure are equally visible and accessible, e.g.,
```
Array-------------------+
| "a" | "b" | "c" | "d" |
+-----------------------+
```
all four objects live at the same ``level.'' A loop is used to systematically process the contents of the structure, examining Cell 0, Cell 1, Cell 2, etc.
linked data structures, like singly and doubly linked lists, graphs, and networks. Individual objects are held in small ``containers'' or ``cells'' that are linked together. Extra variables are used to remember entry points into the linked structure, e.g.,
```
front    Cell--+---+    Cell--+---+    Cell--+------+   rear
| o-|--> | "a" | o-|--> | "b" | o-|--> | "c" | null |<--|-o |
+---+    +-----+---+    +-----+---+    +-----+------+   +---+
```
A loop is used to traverse the cells of the list, starting from the entry point and following the links embedded in the cells of the list.
layered (or recursive) data structures, like lists and trees and file (folder) systems. The objects held within a layered data structure are organized into ``levels'' or ``layers'' such that access to an object must be done by entering the structure's levels one at a time until the level is reached where the desired object resides:
```
Cell-----------------------+
|     "a"                  |
|     Cell--------------+  |
|     |    "b"          |  |
|     |    Cell-------+ |  |
|     |    |    "c"   | |  |
|     |    |    null  | |  |
|     |    +----------+ |  |
|     +-----------------+  |
+--------------------------+
```
A recursively defined method is used to systematically process the contents of the structure.
When a recursive data structure is designed so that each of its levels contains levels of only simpler complexity, then we say that the structure is an inductive data structure. (An example of a layered data structure that is not inductive is a list of infinite length.) In this course, we work only with inductive data structures.

The above three classifications are conceptual; how we code these ideas into Java or Prolog or C is our decision. (For example, we might implement a layered data structure using links!)

Although it may seem initially awkward to organize objects into levels, the technique is valuable in practice, because it readily supports structures that can organically ``grow'' while a program executes.

How to define an inductive data structure

We used class Cell to implement a data type of lists. It is time to define precisely what a list is. The classic name is a cons list (or conslist), and we describe it precisely by means of a set of definitional clauses:

An object is a Conslist-object if

it is a structure, Nil; or
it is a structure, Cons(h, t), where h is an Object and t is another Conslist-object.

(The terms, Cons and Nil --- rather than Cell and null --- are the traditional ones.) This form of definition is called an inductive definition, because we can use it to describe and generate finite-level Conslist objects.

If you wish, you can ``draw'' the inductive definition

An object is a Conslist-object if

it is a structure,
```
Nil
```

or, it is a Cons-structure,

Cons-----------------+
|  h: Object         |
|  t: +-Conslist-+   |
|     |  ...     |   |
|     +----------+   |
+--------------------+

For example,

Cons-----------------------+
|  h: "a"                  |
|  t: Cons--------------+  |
|     | h: "b"          |  |
|     | t: Cons-------+ |  |
|     |    | h: "c"   | |  |
|     |    | t: Nil   | |  |
|     |    +----------+ |  |
|     +-----------------+  |
+--------------------------+

is a picture of a Conslist that holds three string objects, "a", "b", and "c", within three levels of nested structure.

Since the pictures quickly get huge, we will often use the linear forms, e.g.,

Cons("a", Cons("b", Cons("c", Nil) ) )

to indicate the list's nested structure.

The correctness of the structure's construction is formally justified as follows:

By Clause 1, Nil is a Conslist object;
Since Nil is a Conslist object and "c" is a (string) object, then by Clause 2, Cons("c", Nil) is a Conslist object;
Since Cons("c", Nil) is a Conslist object and "b" is an object, then by Clause 2, Cons("b", Cons("c", Nil)) is a Conslist object;
By similar justification, Cons("a", Cons("b", Cons("c", Nil))) is a Conslist object.

Now, data-structure building is a kind of physical ``game'', where we start with some basic piece, e.g., Nil, and we place it along with an object, like "c", in a box --- a Cons box. We then place the Cons box plus another object, say "b", into another Cons box. And so on. This is how we build Conslists.

Later, when we want to retrieve the string objects we ``packed'' into the Conslist, we will have to ``open'' the Cons boxes, one level at a time.

How we have implemented Conslists in Java

Given the inductive definition of conslist, we must use Java programming phrases to mimic the definition. Here is the way we were doing it, without even knowing that we were doing it:

the Nil object is coded as null
the Cons(h, t) object is coded as new Cell(h, t), using class Cell.

Hence, the conslist drawn above is written this tersely in Java:

new Cell("a", new Cell("b", new Cell("c", null)))

Of course, we can assemble this 3-level structure in increments, if we desire:

Cell x = new Cell("c", null);
Cell y = new Cell("b", x);
Cell z = new Cell("a", y);
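
Class Cell comes from the earlier lectures and is not repeated in these notes. As a reminder, here is a minimal sketch of what such a class might look like. The constructor and the accessors getVal and getNext are taken from the way Cell is used in this section; the field names, the choice of final fields, and the demo class CellDemo are assumptions made for illustration, so the class from the earlier lecture may differ in its details.

```java
/** A minimal sketch of class Cell; field names and final fields are assumed. */
class Cell {
    private final Object val;   // the object held at this level (the h of Cons(h, t))
    private final Cell next;    // the rest of the list (the t); null plays the role of Nil

    Cell(Object val, Cell next) {
        this.val = val;
        this.next = next;
    }

    Object getVal() { return val; }

    Cell getNext() { return next; }
}

/** Hypothetical demo class: builds Cons("a", Cons("b", Cons("c", Nil))) in increments. */
class CellDemo {
    static Cell buildAbc() {
        Cell x = new Cell("c", null);
        Cell y = new Cell("b", x);
        Cell z = new Cell("a", y);
        return z;
    }

    public static void main(String[] args) {
        Cell z = buildAbc();
        // Open the boxes one level at a time to reach the innermost string.
        System.out.println(z.getNext().getNext().getVal());   // prints c
    }
}
```

Reaching "c" takes two calls to getNext, one per level of nesting that must be opened.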

How to program operations on conslists

When a data type is defined with an inductive definition, computation on elements of the data type is also defined inductively (recursively), and this is implemented by means of a recursively defined method.

The idea goes as follows: Since there are two forms of Conslists, then we should have two recipes for processing a Conslist---one for the Nil-structure and one for the Cons-structure. We might write each ``recipe'' as an equation, from algebra, like this:

process( Nil ) =  return ...some simple answer...

process( Cons(h, t) ) =  ... use recursion to compute  t_answer = process(t);
                         return ...an answer built from  h  and  t_answer...

Here is the lengthOf example specified in equational style:

/** length of  a ConsList:  */
lengthOf( Nil ) = 0
lengthOf( Cons(h, t) ) =  1 + lengthOf(t)

This set of equations, one per clause in the inductive definition, describes the computational steps needed to descend into the levels of a conslist object so that we can compute its length (or, if you will, its depth).

The above schema is mechanically reformatted into a Java method when we use null and class Cell to implement a conslist:

public int lengthOf(Cell l)
{ int length;
  if ( l == null )
       { length = 0; }
  else { length = 1 + lengthOf(l.getNext()); }
  return length;
}

Indeed, for inductively defined data structures, the equational-schema format gives us a fool-proof algorithm for processing the data structures!

How the recursions execute on the Conslists

Let's do some algebra with the equational formulation of lengthOf to count the length (depth) of the conslist, Cons("a", Cons("b", Cons("c", Nil))). Here is the definition, again:

lengthOf(Nil) = 0
lengthOf( Cons(h, t) ) =  1 + lengthOf(t)

and here is the calculation, like one would do in algebra class:

lengthOf( Cons("a", Cons("b", Cons("c", Nil))) )

       because the argument has form,  Cons(...,...),
         use the second equation:

= 1 + lengthOf( Cons("b", Cons("c", Nil)) )

       again, use the second equation:

= 1 + 1 + lengthOf( Cons("c", Nil) )

= 1 + 1 + 1 + lengthOf( Nil )

       the first equation applies here:

= 1 + 1 + 1 + 0

= 3

Here is a two-dimensional drawing of the above calculation; the drawing shows how the recursive style of data-structure processing descends into the Conslist structure while it calculates its answer:

lengthOf(
Cons-----------------------+
|     "a"                  |
|     Cons--------------+  |
|     |    "b"          |  |
|     |    Cons-------+ |  |
|     |    |    "c"   | |  |
|     |    |    Nil   | |  |
|     |    +----------+ |  |
|     +-----------------+  |
+--------------------------+  )

=  1 +
      lengthOf(
      Cons--------------+   
      |    "b"          |   
      |    Cons-------+ |   
      |    |    "c"   | |   
      |    |    Nil   | |   
      |    +----------+ |   
      +-----------------+   )

= 1 +
    1 +
       lengthOf(
           Cons-------+     
           |    "c"   |     
           |    Nil   |     
           +----------+ )

= 1 +
    1 +
      1 +
        lengthOf( Nil )

= 1 + 1 + 1 + 0  =  3

How does the above reasoning translate into Java programming? Once again, here is the coding of lengthOf in Java:

public int lengthOf(Cell l)
{ int length;
  if ( l == null )
       { length = 0; }
  else { length = 1 + lengthOf(l.getNext()); }
  return length;
}

We might well ask: Does the execution of the Java coding construct the graphical structures shown in the above drawings? Well, not exactly---recall that computer heap storage is ``flat'' and nonnested. Remember that we have been using Cells to represent such conslists. Hence, a three-level nested structure like Cons("a", Cons("b", Cons("c", Nil))) is in fact mimicked by three separate cells (and null) that are linked together with storage addresses:

a4 : Cell----+       a3: Cell----+       a2: Cell---+  
|  "a"       |       |   "b"     |       |  "c"     | 
|  a3        |       |   a2      |       |  null    |
+------------+       +-----------+       +----------+

Remember also that a series of recursive-method invocations is modelled with the activation-record stack in the Java Virtual Machine. Thus, an execution configuration like this one:

1 + 1 + lengthOf( Cons("c", Nil) )

or drawn graphically,

  1 +
    1 +
       lengthOf(
           Cons-------+
           |    "c"   |
           |    Nil   |
           +----------+ )

shows the situation where a list is partially counted after three invocations of lengthOf (the initial call plus two recursive calls). Recall from the previous lecture that, within the Java Virtual Machine, the activation-record stack looks like this: (Note: the stack is tipped on its side so that it is growing from left to right.)

                                                  top
                                                   |
+--------------------------------------------------V-----
| +---------------+ +---------------+ +-----------+
| | l == a4       | | l == a3       | | l == a2   |
| | length = 1 + ?| | length = 1 + ?| |           |
| |  ...          | |  ...          | |   ...     |
| +---------------+ +---------------+ +-----------+
+--------------------------------------------------------

The activation-record stack shows that lengthOf has started three times, and the most recent activation is trying to count the length of the list at address a2. Once the a2-list length is counted, then the answer, an integer, will be returned to the caller, which adds one to it, giving the length of the a3 list. That answer is returned, to its caller, which adds one, giving the length of the a4-list.
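
To watch the activations pile up and then return their answers, one can add print statements to lengthOf. The sketch below is an instrumented variant, not the method from these notes: the extra parameter depth, the class LengthTrace, and the minimal Cell class are assumptions added for the demonstration.

```java
/** Minimal stand-in for class Cell (assumed shape). */
class Cell {
    private final Object val;
    private final Cell next;

    Cell(Object val, Cell next) { this.val = val; this.next = next; }

    Object getVal() { return val; }

    Cell getNext() { return next; }
}

/** Hypothetical tracing variant of lengthOf. */
class LengthTrace {
    /** Same logic as lengthOf; depth counts how many activations are below this one. */
    static int lengthOf(Cell l, int depth) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) { indent.append("  "); }

        int length;
        if (l == null) {
            System.out.println(indent + "activation " + depth + ": l == null, returns 0");
            length = 0;
        } else {
            // At this moment the activation is suspended with length = 1 + ?
            System.out.println(indent + "activation " + depth + ": holds "
                               + l.getVal() + ", length = 1 + ?");
            length = 1 + lengthOf(l.getNext(), depth + 1);
            System.out.println(indent + "activation " + depth + ": returns " + length);
        }
        return length;
    }

    public static void main(String[] args) {
        Cell list = new Cell("a", new Cell("b", new Cell("c", null)));
        System.out.println("answer = " + lengthOf(list, 0));
    }
}
```

The indentation of the printed lines grows while the method descends into the list and shrinks again as each activation returns its integer to its caller.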

Again, please review the previous lecture to see how the Java Virtual Machine uses an activation-record stack to compute recursive method calls.

In summary, the equational calculations and two-dimensional drawings give us powerful design and reasoning tools that are more elegant than but nonetheless consistent with the actual computer implementation. When trying to solve a complex data-structure problem, it is often helpful to visualize the solution as a graphical computation on the nested, recursive data structure.

More examples of Conslist calculations

If you are interested, here is another standard example written in the recursive style:

/** toString assembles a string representation of a Conslist:  */
toString(Nil) = ""
toString( Cons(h, t) ) =  h.toString() + " " + toString(t)

It is easy to reformat this example into a recursive Java method:

public String toString(Cell l)
{ String answer;
  if ( l == null )
       { answer = ""; }
  else { answer = l.getVal().toString() + " " + toString(l.getNext()); 
       }
  return answer;
}

Both lengthOf and toString are simple examples of recursive processing that traverses all elements of a conslist. Of course, we know that we can traverse a list with a mere while-loop. Are there patterns of list processing that are not merely mimicking loops? Yes---here are two:

/** append accepts two conslists as arguments and 
  *  builds a new list that has the contents 
  *  of the two, appended together */
append(Nil, ys) = ys
append(Cons(h, t),  ys) =  Cons(h, append(t, ys))

The above pattern is worth pondering---append(list1, list2) says we should build the appended list by descending into the innermost structure of list1, level by level, attaching list1's elements onto the front of list2 as we go. It is enlightening to see this in a calculational trace:

append( Cons("a", Cons("b", Nil)),  Cons("z", Nil) )
= Cons("a",  append( Cons("b", Nil),  Cons("z", Nil) ))
= Cons("a",  Cons("b",  append( Nil, Cons("z", Nil) )))
= Cons("a",  Cons("b",  Cons("z", Nil)))

Notice how the Cons structures are enclosing the recursive invocations.

It is fun to redraw the calculation graphically:

append(
Cons--------------+      Cons-------+
|    "a"          |      |    "z"   |
|    Cons-------+ |      |    Nil   |
|    |    "b"   | |      +----------+  )
|    |    Nil   | |   
|    +----------+ |   
+-----------------+  ,

=

Cons--------------------------------------------+
|    "a"                                        |
|    append( Cons-------+     Cons-------+      |
|            |    "b"   |     |    "z"   |      |
|            |    Nil   |     |    Nil   |      |
|            +----------+ ,   +----------+  )   |
+-----------------------------------------------+

=

Cons--------------------------------------------+
|    "a"                                        |
|    Cons---------------------------------+     |
|    |    "b"                             |     |
|    |    append( Nil,  Cons-------+      |     |
|    |                  |    "z"   |      |     |
|    |                  |    Nil   |      |     |
|    |                  +----------+  )   |     |
|    +------------------------------------+     |
+-----------------------------------------------+

=

Cons---------------------------+
|    "a"                       |
|    Cons----------------+     |
|    |    "b"            |     |
|    |    Cons-------+   |     |
|    |    |    "z"   |   |     |
|    |    |    Nil   |   |     |
|    |    +----------+   |     |
|    +-------------------+     |
+------------------------------+

It is a bit amazing that we can replicate this systematic recursive descent in Java, but we can:

public Cell append(Cell list1, Cell list2)
{ Cell answer;
  if ( list1 == null )
       { answer = list2; }
  else { answer =
            new Cell(list1.getVal(),
                     append(list1.getNext(), list2));
       }
  return answer;
}

The power comes from nesting the recursive invocation, append(list1.getNext(), list2), inside the use of new Cell( ... )!

Here is a question for you to consider: For this example,

Cell alist = new Cell("a", new Cell("b", null));
Cell blist = new Cell("z", null);
Cell clist = append(alist, blist);

are either of alist or blist altered due to the use of append(alist, blist) to construct clist? The answer is no---indeed, clist and blist share the same objects, but the lists are not altered! (If you are uncertain of this, draw a picture of computer heap storage and work the example by hand.)
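
If you would rather check the claim by experiment than by drawing, the following sketch tests object identity with ==. The append method is the one shown above; the minimal Cell class and the demo class AppendSharing are assumptions added so that the example runs by itself.

```java
/** Minimal stand-in for class Cell (assumed shape). */
class Cell {
    private final Object val;
    private final Cell next;

    Cell(Object val, Cell next) { this.val = val; this.next = next; }

    Object getVal() { return val; }

    Cell getNext() { return next; }
}

/** Hypothetical demo class for the sharing question. */
class AppendSharing {
    static Cell append(Cell list1, Cell list2) {
        Cell answer;
        if (list1 == null) {
            answer = list2;
        } else {
            answer = new Cell(list1.getVal(),
                              append(list1.getNext(), list2));
        }
        return answer;
    }

    public static void main(String[] args) {
        Cell alist = new Cell("a", new Cell("b", null));
        Cell blist = new Cell("z", null);
        Cell clist = append(alist, blist);

        // The third cell of clist is the very same object as blist.
        System.out.println(clist.getNext().getNext() == blist);   // prints true
        // The first cell of clist is a fresh copy, not alist itself.
        System.out.println(clist == alist);                        // prints false
        // alist still ends after "b"; it was not altered.
        System.out.println(alist.getNext().getNext() == null);     // prints true
    }
}
```

The cells holding "a" and "b" are copied, while the cells of the second argument are reused as they are.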

Here is a second example that employs a similar cleverness:

/** reverse builds a list that is the reversed version
  *  of its argument */
reverse(Nil) = Nil
reverse(Cons(h, t)) = append( reverse(t), Cons(h, Nil) )

This one is a fun exercise for you to work for yourself:

public Cell reverse(Cell list)
{ Cell answer;
  if ( list == null )
       { answer = null; }
  else { answer = append(reverse(list.getNext()),
                         new Cell(list.getVal(), null));
       }
  return answer;
}

A question: why must we use new Cell(list.getVal(), null) and not just list.getVal()?
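
To experiment with reverse, here is a self-contained sketch that puts it together with append and toString from earlier in this section. The minimal Cell class and the demo class ReverseDemo are assumptions added for the example. When thinking about the question above, it may help to look at the types of append's two parameters.

```java
/** Minimal stand-in for class Cell (assumed shape). */
class Cell {
    private final Object val;
    private final Cell next;

    Cell(Object val, Cell next) { this.val = val; this.next = next; }

    Object getVal() { return val; }

    Cell getNext() { return next; }
}

/** Hypothetical demo class collecting the list operations of this section. */
class ReverseDemo {
    static Cell append(Cell list1, Cell list2) {
        if (list1 == null) { return list2; }
        return new Cell(list1.getVal(), append(list1.getNext(), list2));
    }

    /** reverse(Nil) = Nil;  reverse(Cons(h, t)) = append(reverse(t), Cons(h, Nil)) */
    static Cell reverse(Cell list) {
        if (list == null) { return null; }
        return append(reverse(list.getNext()),
                      new Cell(list.getVal(), null));
    }

    /** toString(Nil) = "";  toString(Cons(h, t)) = h.toString() + " " + toString(t) */
    static String toString(Cell l) {
        if (l == null) { return ""; }
        return l.getVal().toString() + " " + toString(l.getNext());
    }

    public static void main(String[] args) {
        Cell abc = new Cell("a", new Cell("b", new Cell("c", null)));
        System.out.println(toString(reverse(abc)));   // prints c b a (with a trailing space)
        System.out.println(toString(abc));            // prints a b c (the original is unchanged)
    }
}
```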

Other inductively defined data structures

Other data structures can be defined by means of inductive definition. Here are some classic examples:

File Systems

The standard disk file system is perhaps the best known inductively defined data structure:

An object is a FileSystem-object if

it is a structure, Textfile; or
it is a structure, Folder(t1, ..., tm, f1, ..., fn), where m >= 0 and n >= 0, all ti are Textfiles, and all fj are FileSystem-objects.

The second clause of the definition states that folders can hold zero or more textfiles and zero or more (sub)folders.

A huge advantage of this form of file system is that it can grow as deeply as needed when the user adds more and more textfiles and folders.

Say that we want to write a program that counts all the textfiles held in a file system. How do we start? The equational specifications show us the way:

/** countFiles  counts the number of textfiles in a file system */

countFiles( Textfile ) =  return 1;

countFiles( Folder(t1, ..., tm, f1, ..., fn) )
   =  subcounts = 0;
      for ( j in 1 to n )  subcounts = subcounts + countFiles(fj);
      return  m + subcounts;

The recursions into the subfolders neatly total the counts of textfiles in the subfolders, which we sum into the total count.
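
The equations translate into Java once we choose a representation for the two forms of FileSystem-object. The sketch below is one possible coding, not one prescribed by these notes: the interface FileSystem, the classes Textfile and Folder with their two array fields, and the helper sample() are all names assumed for the illustration.

```java
/** Marker interface: a FileSystem-object is either a Textfile or a Folder. */
interface FileSystem { }

final class Textfile implements FileSystem { }

final class Folder implements FileSystem {
    final Textfile[] textfiles;      // t1, ..., tm
    final FileSystem[] subsystems;   // f1, ..., fn

    Folder(Textfile[] textfiles, FileSystem[] subsystems) {
        this.textfiles = textfiles;
        this.subsystems = subsystems;
    }
}

/** Hypothetical class holding the counting operation. */
class CountFiles {
    /** countFiles counts the number of textfiles in a file system. */
    static int countFiles(FileSystem fs) {
        if (fs instanceof Textfile) {
            return 1;
        }
        Folder folder = (Folder) fs;
        int subcounts = 0;
        for (FileSystem f : folder.subsystems) {
            subcounts = subcounts + countFiles(f);   // recursion into each fj
        }
        return folder.textfiles.length + subcounts;  // m + subcounts
    }

    /** A small example: one textfile, a subfolder with two textfiles, and an empty subfolder. */
    static FileSystem sample() {
        FileSystem empty = new Folder(new Textfile[0], new FileSystem[0]);
        FileSystem inner = new Folder(new Textfile[] { new Textfile(), new Textfile() },
                                      new FileSystem[0]);
        return new Folder(new Textfile[] { new Textfile() },
                          new FileSystem[] { inner, empty });
    }

    public static void main(String[] args) {
        System.out.println(countFiles(sample()));   // prints 3
    }
}
```

As with conslists, there is one case in the method for each clause of the inductive definition.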

Natural numbers

The natural numbers are the nonnegative integers---0, 1, 2, etc. Surprisingly, these numbers can be viewed as structures in an inductively defined data type:

An object is a Nat-object if

it is the object, Zero; or
it is a structure, Succ( N ), where N is an existing Nat object.

With this formulation, 0 is represented as Zero, 1 is represented as Succ(Zero), 2 is represented as Succ(Succ(Zero)), and so on. (Think of Nat objects as the numbers in ``base one'' arithmetic.)

Admittedly, representing numbers as nested structures is a game, but the ``game'' motivates modern-day set theory and even the construction of computer circuits.

Here are some examples of processing natural numbers:

Checking if a number is even:

isEven(Zero) = true
isEven(Succ(N)) = !isEven(N)

An example: is 3 even?

isEven(Succ(Succ(Succ(Zero))))
= ! isEven(Succ(Succ(Zero)))
= ! ! isEven(Succ(Zero))
= ! ! ! isEven(Zero)
= ! ! ! true
= ! ! false
= ! true
= false

Doubling a number:

double( Zero ) = Zero
double( Succ(N) ) = Succ(Succ( double(N) ))

Try it---say we compute double( Succ(Succ(Zero)) ):

double( Succ(Succ(Zero)) )
= Succ(Succ( double( Succ(Zero) ) ))
= Succ(Succ( Succ(Succ( double(Zero) )) ))
= Succ(Succ( Succ(Succ( Zero )) ))

Addition:

add( Zero,  N ) = N,   where  N  is any Nat-object whatsoever
add( Succ(M), N ) =  Succ( add(M,N) )

An example of 3 + 2:

add( Succ(Succ(Succ(Zero))),  Succ(Succ(Zero)) )
= Succ( add( Succ(Succ(Zero)), Succ(Succ(Zero)) ) )
= Succ( Succ( add( Succ(Zero), Succ(Succ(Zero)) ) ) )
= Succ( Succ( Succ( add( Zero, Succ(Succ(Zero)) ) ) ) )
= Succ( Succ( Succ( Succ(Succ(Zero)) ) ) )

Multiplication:

mult( Zero,  N ) = Zero
mult( Succ(M), N) = add(N, mult(M, N))

This definition exploits the arithmetical fact that multiplication is repeated addition. Here is 3 * 2:

mult( Succ(Succ(Succ(Zero))),  Succ(Succ(Zero)) )
= add( Succ(Succ(Zero)),  mult( Succ(Succ(Zero)), Succ(Succ(Zero)) ) )
= add( Succ(Succ(Zero)),  add( Succ(Succ(Zero)),  mult( Succ(Zero), Succ(Succ(Zero)) ) ) )
= add( Succ(Succ(Zero)),  add( Succ(Succ(Zero)),
        add( Succ(Succ(Zero)), mult( Zero, Succ(Succ(Zero)) ) ) ) )
= add( Succ(Succ(Zero)),  add( Succ(Succ(Zero)),
        add( Succ(Succ(Zero)), Zero ) ) )

At this point, we can apply the definition of add to compute the final answer, Succ(Succ(Succ(Succ(Succ(Succ(Zero)))))).

In a similar way, all of arithmetic can be defined as recursively defined operations on Nat objects, and indeed, all mechanical computation can be formalized solely in terms of recursive programming patterns on Nat objects.
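
The Nat equations can be coded in Java in the same style as the conslist operations. The following is a sketch under assumptions: the interface Nat, the classes Zero and Succ, the field name pred, and the conversion helpers fromInt and toInt are invented for the illustration, and the doubling operation is named twice because double is a reserved word in Java.

```java
/** A Nat-object is either Zero or Succ(N). */
interface Nat { }

final class Zero implements Nat { }

final class Succ implements Nat {
    final Nat pred;   // the N inside Succ(N)

    Succ(Nat pred) { this.pred = pred; }
}

/** Hypothetical class holding the operations, one case per clause. */
class NatOps {
    static boolean isEven(Nat n) {
        if (n instanceof Zero) { return true; }
        return !isEven(((Succ) n).pred);
    }

    /** The "double" operation of the notes, renamed to avoid the Java keyword. */
    static Nat twice(Nat n) {
        if (n instanceof Zero) { return n; }
        return new Succ(new Succ(twice(((Succ) n).pred)));
    }

    static Nat add(Nat m, Nat n) {
        if (m instanceof Zero) { return n; }
        return new Succ(add(((Succ) m).pred, n));
    }

    static Nat mult(Nat m, Nat n) {
        if (m instanceof Zero) { return m; }
        return add(n, mult(((Succ) m).pred, n));
    }

    /** Demo helper: builds the Nat for k, assuming k >= 0. */
    static Nat fromInt(int k) {
        Nat n = new Zero();
        for (int i = 0; i < k; i++) { n = new Succ(n); }
        return n;
    }

    /** Demo helper: counts the Succ layers. */
    static int toInt(Nat n) {
        int k = 0;
        while (n instanceof Succ) {
            n = ((Succ) n).pred;
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        Nat three = fromInt(3);
        Nat two = fromInt(2);
        System.out.println(isEven(three));            // prints false
        System.out.println(toInt(twice(two)));        // prints 4
        System.out.println(toInt(add(three, two)));   // prints 5
        System.out.println(toInt(mult(three, two)));  // prints 6
    }
}
```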

Binary numerals

Electronic computers perform arithmetic on binary numbers, which can be exposed as an inductively defined data type:

An object is a BinaryNumeral-object if

it is the object, 0; or
it is the object, 1; or
it is a structure, [N]0, where N is an existing BinaryNumeral object (usually, we write this as just N0); or
it is a structure, [N]1, where N is an existing BinaryNumeral object (usually, we write this as just N1).

For example, the number 13, normally written in binary as 1101, has this internal structure: [[[1]1]0]1. The internal structure is important, because it is exploited in the wiring of arithmetic operations into a computer chip.

An example:

/** computing the decimal value of a BinaryNumeral:  */
valueOf(0) = 0
valueOf(1) = 1
valueOf([N]0) = 2 * valueOf(N)
valueOf([N]1) = (2 * valueOf(N)) + 1

As an exercise, you might write the equations for adding, multiplying, etc., binary numbers. The equations you write turn out to be one form of the wiring diagrams taught in circuit theory.
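
Here is one way the valueOf equations might be coded in Java. It is a sketch under assumptions: the names BinaryNumeral, Digit, Extend, and BinaryOps are invented, and the four clauses are folded into two classes that each carry a digit d (0 or 1), so that the four equations collapse into two cases: valueOf(d) = d and valueOf([N]d) = 2 * valueOf(N) + d.

```java
/** A BinaryNumeral-object is a single digit or a numeral extended by one digit. */
interface BinaryNumeral { }

/** The objects 0 and 1 (the first two clauses). */
final class Digit implements BinaryNumeral {
    final int d;

    Digit(int d) {
        if (d != 0 && d != 1) { throw new IllegalArgumentException("digit must be 0 or 1"); }
        this.d = d;
    }
}

/** The structures [N]0 and [N]1 (the last two clauses). */
final class Extend implements BinaryNumeral {
    final BinaryNumeral n;
    final int d;

    Extend(BinaryNumeral n, int d) {
        if (d != 0 && d != 1) { throw new IllegalArgumentException("digit must be 0 or 1"); }
        this.n = n;
        this.d = d;
    }
}

/** Hypothetical class holding the valueOf operation. */
class BinaryOps {
    static int valueOf(BinaryNumeral b) {
        if (b instanceof Digit) {
            return ((Digit) b).d;
        }
        Extend e = (Extend) b;
        return 2 * valueOf(e.n) + e.d;
    }

    /** Builds [[[1]1]0]1, the numeral written 1101. */
    static BinaryNumeral thirteen() {
        return new Extend(new Extend(new Extend(new Digit(1), 1), 0), 1);
    }

    public static void main(String[] args) {
        System.out.println(valueOf(thirteen()));   // prints 13
    }
}
```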

Booleans

An inductive data type need not have recursion in its definition:

An object is a boolean-object if

it is the object, false; or
it is the object, true

Although there is no recursion in the inductive definition, the same principles apply when designing operations on elements of the type:

negation(false) = true
negation(true) = false

and(false, B) = false   (for any boolean, B)
and(true, B) = B        (for any boolean, B)

or(false, B) = B        (for any boolean, B)
or(true, B) = true      (for any boolean, B)

Binary Trees

Soon, we will study this data type in great detail, as it is perhaps the most important one in data structures:

An object is a BinaryTree-object if

it is the object, Leaf; or
it is a structure, Node(v, l, r), where v is an Object, and l and r are already existing BinaryTree-objects.
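
As a preview, the same recipe of one case per clause applies here as well. The sketch below uses an operation that does not appear in these notes, countNodes, chosen only because it mirrors lengthOf: countNodes(Leaf) = 0 and countNodes(Node(v, l, r)) = 1 + countNodes(l) + countNodes(r). The interface and class names, the field names, and the helper sample() are assumptions made for the illustration.

```java
/** A BinaryTree-object is either a Leaf or a Node(v, l, r). */
interface BinaryTree { }

final class Leaf implements BinaryTree { }

final class Node implements BinaryTree {
    final Object val;         // v
    final BinaryTree left;    // l
    final BinaryTree right;   // r

    Node(Object val, BinaryTree left, BinaryTree right) {
        this.val = val;
        this.left = left;
        this.right = right;
    }
}

/** Hypothetical class holding an example operation on trees. */
class TreeOps {
    /** Counts the Node structures in a tree; a Leaf contributes nothing. */
    static int countNodes(BinaryTree t) {
        if (t instanceof Leaf) {
            return 0;
        }
        Node n = (Node) t;
        return 1 + countNodes(n.left) + countNodes(n.right);   // two recursions, one per subtree
    }

    /** Builds Node("b", Node("a", Leaf, Leaf), Node("c", Leaf, Leaf)). */
    static BinaryTree sample() {
        BinaryTree a = new Node("a", new Leaf(), new Leaf());
        BinaryTree c = new Node("c", new Leaf(), new Leaf());
        return new Node("b", a, c);
    }

    public static void main(String[] args) {
        System.out.println(countNodes(sample()));   // prints 3
    }
}
```

Unlike a conslist, a Node contains two smaller structures of the same type, so the method makes two recursive calls rather than one.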