The `Java Collections` components

Many of the data structures we studied are already coded for you in the Java libraries, in the package java.util. Here is a brief summary.

Stacks and Vectors

Two classes belong to the earliest version of java.util:

class Stack: This is an array-based implementation of a stack, with the standard push, pop, and top (called peek) operations.
class Vector: This is an array that can grow as needed. You can insert an object into a Vector by adding it to the end of the array: addElement(Object e) (and the array grows by one cell). Or, you can use an integer index like with the standard array: add(int index, Object e) (This operation will shift the elements of higher index to the right by one cell, and the array grows by one.)
Lookups can be done with an index: get(int index); or, you can look at the element at either end of the array without giving an index number: firstElement() and lastElement(). Because the insertion operation does not overwrite a cell in the Vector, there is an explicit remove operation for deleting a value from a Vector.
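
As a small illustration of the operations just described, here is a sketch that exercises a Stack and a Vector. The class name StackVectorDemo and the sample strings are made up for the example, and the code uses generic types (Stack<String>), which arrived in later Java versions than the listings in these notes assume.

```java
import java.util.Stack;
import java.util.Vector;

class StackVectorDemo {
    // Push three strings, then pop them all; returns the order they came off.
    static String popOrder() {
        Stack<String> stack = new Stack<String>();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        StringBuilder popped = new StringBuilder();
        while (!stack.isEmpty()) {
            popped.append(stack.pop());   // last pushed is first popped
        }
        return popped.toString();
    }

    // Build a Vector by appending at the end and inserting in the middle.
    static Vector<String> buildVector() {
        Vector<String> v = new Vector<String>();
        v.addElement("x");      // Vector is now [x]
        v.addElement("z");      // [x, z]
        v.add(1, "y");          // "z" shifts right: [x, y, z]
        return v;
    }

    public static void main(String[] args) {
        System.out.println(popOrder());          // prints cba
        Vector<String> v = buildVector();
        System.out.println(v);                   // prints [x, y, z]
        System.out.println(v.firstElement());    // prints x
        System.out.println(v.lastElement());     // prints z
        v.remove(0);                             // explicit removal by index
        System.out.println(v.get(0));            // prints y
    }
}
```

Note that inserting at index 1 did not overwrite anything, which is why the explicit remove is needed to shrink the Vector again.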

The ``Collections Framework''

Within java.util is a family of data structures that share standard operations and properties; they are called the ``collections framework.'' The framework is ``defined'' by several Java interfaces that state some standard operations that classes in the framework must implement.

Here are the two most important interfaces:

interface Collection: The interface states the standard operations that one would expect of a data structure (a ``collection''): insertion, lookup, deletion. Here are the Java names:

public boolean add(Object o)
public boolean contains(Object o) (This is a kind of lookup.)
public boolean remove(Object o)
public boolean isEmpty()
public Iterator iterator() (This will be explained later.)
interface Map : The operations use keys to do insertions, etc:

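public Object put(Object key, Object o) (This returns the object previously stored under key, or null if there was none.)
public Object get(Object key)
public Object remove(Object key) (This returns the object that was removed, or null if there was none.)
public boolean isEmpty()
public Set keySet() and public Collection values() (A Map has no iterator() method of its own; you ask for its keys or its values and iterate over those.)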
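
Here is a short sketch of the two interfaces in use. The class InterfaceDemo and its method names are invented for the illustration; ArrayList and HashMap stand in as convenient implementing classes, and generic types are used although the listings here are written with plain Object. The Map half shows that put and remove hand back the object that was previously stored under the key.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class InterfaceDemo {
    // Uses only operations declared in interface Collection.
    static boolean collectionOps() {
        Collection<String> c = new ArrayList<String>();
        c.add("red");
        c.add("green");
        boolean found = c.contains("red");     // true
        boolean removed = c.remove("red");     // true: "red" was present
        return found && removed && !c.contains("red") && !c.isEmpty();
    }

    // Uses only operations declared in interface Map.
    static String mapOps() {
        Map<String, String> m = new HashMap<String, String>();
        String before = m.put("k1", "first");   // null: nothing stored under k1 yet
        String old = m.put("k1", "second");     // "first": the object that was replaced
        String current = m.get("k1");           // "second"
        String gone = m.remove("k1");           // "second": the object that was removed
        return before + "," + old + "," + current + "," + gone + "," + m.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(collectionOps());   // prints true
        System.out.println(mapOps());          // prints null,first,second,second,true
    }
}
```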

Here are two interfaces that add more operations to Collection:

interface List extends Collection, which gives you operations to add and look up objects using array indexes. (This means the data structure is a kind of array or numbered sequence --- it's too bad that they call it a ``list''!)
interface Set extends Collection. This interface requires that there are no duplicate objects in a collection --- like a set. It includes set-like operations such as union (addAll), intersection (retainAll), and set subtraction (removeAll).

Classes that implement `interface List`

There are two important classes that implement interface List, that is, are numbered sequences:

class ArrayList: This is really just a Java Vector, that is, an array that grows as needed, recoded to fit into the Collections package.
class LinkedList: This is a doubly-linked list, extended with operations that let you find Cell number k in the list and return the object held in it. (Also, you can insert a new object into the middle of the linked list by adding it at position k.)

You should use these two classes to build other data structures that must be ``smart'' arrays or ``smart'' linked lists. For example, you might build a class Queue like this:

import java.util.*;
public class Queue
{ private LinkedList my_queue;

  public Queue()
  { my_queue = new LinkedList(); }

  public void enqueue(Object ob)
  { my_queue.add(ob); }  // adds ob to the end of the linked list

  public Object dequeue()
  { Object answer = null;
    if ( !my_queue.isEmpty() )
       { answer = my_queue.remove(0); } // remove the front object in the list
    return answer;
  }
    ...
}
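
To see the wrapper idea in action, here is a self-contained variant with a small test run. It is only a sketch: the name SimpleQueue and the isEmpty method are additions for the example, it uses a generic type parameter instead of Object, and it calls LinkedList's addLast and removeFirst, which do the same work as add(ob) and remove(0).

```java
import java.util.LinkedList;

class SimpleQueue<T> {
    private final LinkedList<T> items = new LinkedList<T>();

    // Adds ob at the end of the linked list.
    public void enqueue(T ob) {
        items.addLast(ob);
    }

    // Removes and returns the front object, or null if the queue is empty.
    public T dequeue() {
        return items.isEmpty() ? null : items.removeFirst();
    }

    public boolean isEmpty() {
        return items.isEmpty();
    }

    public static void main(String[] args) {
        SimpleQueue<String> q = new SimpleQueue<String>();
        q.enqueue("first");
        q.enqueue("second");
        System.out.println(q.dequeue());   // prints first
        System.out.println(q.dequeue());   // prints second
        System.out.println(q.dequeue());   // prints null (the queue is empty)
    }
}
```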

Classes that implement `interface Map`

These classes store ``Records'' --- objects paired with their keys:

class TreeMap implements Map: This is an ordered binary tree (a binary search tree) that uses the keys to store the objects. The tree is balanced using a ``red node-black node'' balancing strategy, which is a bit more complicated than the AVL-balancing strategy but uses similar ideas.
class HashMap implements Map: This is a classical hash table. You may state the initial size of the hash table when you construct a HashMap object, but you need not: there is a default size, and the table is enlarged automatically as it fills. When a key,object pair is inserted, the key's own hashCode() method supplies the hash code (for String keys, this is computed using polynomial coding with base 31). Collisions are resolved using linked-list chains (``buckets'') within the array elements.
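
The following sketch contrasts the two classes. The class MapClassesDemo, its method names, and the fruit data are all invented for the illustration. It shows three things: a TreeMap hands back its keys in sorted order, a String's hashCode() agrees with base-31 polynomial coding written out by hand, and a HashMap built with a small initial size still accepts many more entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapClassesDemo {
    // Inserts three key,object pairs in no particular order.
    static Map<String, Integer> fill(Map<String, Integer> m) {
        m.put("cherry", 3);
        m.put("apple", 1);
        m.put("banana", 2);
        return m;
    }

    // A TreeMap keeps its keys in sorted order, whatever the insertion order.
    static String sortedKeys() {
        return fill(new TreeMap<String, Integer>()).keySet().toString();
    }

    // Polynomial coding with base 31, written out by hand for comparison.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Start with a deliberately small table and insert far more entries.
    static int growingTable() {
        Map<Integer, String> m = new HashMap<Integer, String>(4);
        for (int i = 0; i < 100; i++) {
            m.put(i, "value" + i);
        }
        return m.size();
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys());       // prints [apple, banana, cherry]
        System.out.println(fill(new HashMap<String, Integer>()).get("banana"));  // prints 2
        System.out.println(polynomialHash("ab") == "ab".hashCode());  // prints true
        System.out.println(growingTable());     // prints 100
    }
}
```

A HashMap makes no promise about the order of its keys, so choose TreeMap when you need the keys in order and HashMap when you only need fast lookup.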

Classes that implement `interface Set`

Interface Set is supposed to describe data structures that implement sets, having operations for set membership, union, and intersection. The Java language does not do well at providing set data structures, so instead we are asked to choose between a tree-simulation of a set and a hash-table simulation of a set. Neither solution is ideal.

class TreeSet implements Set: It's a binary tree that does not use any keys to save its objects. (Instead, the object's value is used as a ``key'' for storing the object in the tree.)
class HashSet implements Set: This is a hash table, where hash codes are manufactured from the object's value.

Indeed, both TreeSets and HashSets are really just TreeMaps and HashMaps that use each inserted object as its own key.
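
Union, intersection, and set subtraction can be had from the bulk operations addAll, retainAll, and removeAll. The sketch below is one way to package them; the class SetOpsDemo and its three helper methods are hypothetical names, and TreeSet is chosen only so that the printed order is predictable. Each helper copies its first argument so that the original sets are left unchanged.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SetOpsDemo {
    // All objects that are in a or in b.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a);
        result.addAll(b);
        return result;
    }

    // Only the objects that are in both a and b.
    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a);
        result.retainAll(b);
        return result;
    }

    // The objects of a that are not in b.
    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<Integer>(Arrays.asList(1, 2, 3));
        Set<Integer> b = new TreeSet<Integer>(Arrays.asList(3, 4));
        System.out.println(union(a, b));          // prints [1, 2, 3, 4]
        System.out.println(intersection(a, b));   // prints [3]
        System.out.println(difference(a, b));     // prints [1, 2]
        System.out.println(a.add(3));             // prints false: 3 is already in a
    }
}
```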

Iterators

One standard sticky problem with data structures is printing the structure's contents in a simple way --- for example, we might copy the objects within a binary tree or a hash table into an array and return the array for printing.

An iterator behaves like an ordered ``array'' of the contents of a data structure, read one object at a time. There is a Java interface, interface Iterator. An iterator has at least these two methods:

public Object next(): shows us one of the objects in the data structure that we have not yet seen
public boolean hasNext(): tells us if there are more objects to look at

To understand these operations, let's compare them to an array.

Say that we copied the objects held in a tree into an array named iter. Then we write this loop to print the contents:

Object[] iter =  ... copy contents of tree into array ...

for ( int i = 0;  i != iter.length;  i = i + 1 )
    { Object next_object = iter[i];
      System.out.println( next_object.toString() );
        // remember that  toString  is a Java method that tries to convert
        // an object into a string for printing.  It often works!
    }

You do the same work with an iterator: Say that you built a data structure, my_data_structure, with one of the Java Collections classes listed above. Next, you added some objects into the structure, and now you want to print the contents:

Iterator iter = my_data_structure.iterator();  // iter acts like an ``array''
                    // of the objects in my data structure (nothing is copied)
while ( iter.hasNext() )  // are there more objects to look at ?
      { Object next_object = iter.next();   // get the next object
        System.out.println( next_object.toString() );
      }

The iterator structure hides the details about whether an array or a linked list or whatever is the best structure for returning the contents of a data structure for printing.
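
The same loop works unchanged for quite different structures. In the sketch below, the class IteratorDemo and its helper join are invented names, and the sample data is made up. One loop is applied to a TreeSet, to the keys of a TreeMap, and to the values of that TreeMap. A Map is reached through keySet() or values(), since those are the collections that supply the iterator.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class IteratorDemo {
    // Walks any iterator and joins the objects it yields, separated by spaces.
    static String join(Iterator<?> iter) {
        StringBuilder out = new StringBuilder();
        while (iter.hasNext()) {                 // are there more objects to look at?
            Object nextObject = iter.next();     // get the next object
            out.append(nextObject.toString());
            if (iter.hasNext()) {
                out.append(" ");
            }
        }
        return out.toString();
    }

    static Set<String> sampleSet() {
        Set<String> s = new TreeSet<String>();
        s.add("pear");
        s.add("fig");
        s.add("kiwi");
        return s;
    }

    static Map<String, Integer> sampleMap() {
        Map<String, Integer> m = new TreeMap<String, Integer>();
        m.put("two", 2);
        m.put("one", 1);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(join(sampleSet().iterator()));            // prints fig kiwi pear
        System.out.println(join(sampleMap().keySet().iterator()));   // prints one two
        System.out.println(join(sampleMap().values().iterator()));   // prints 1 2
    }
}
```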

Summary

Now that you understand and know how to program linked lists, ordered trees, and hash tables, you can intelligently use the classes in the Java Collections package and save time when you are next asked to build a ``smart'' data structure.

You can read the local documentation for java.util at

http://www.cis.ksu.edu/VirtualHelp/Info/JDK1.3.1/docs/api/java/util/package-summary.html

Or, visit Sun's web site, java.sun.com, for the latest writeup.
