Graphs

A road map is an example of a graph. So is a circuit diagram, a hardware diagram, a flowchart, a UML class diagram, a work-flow chart, and a structure diagram of a database system. Graphs are common, and it is important to know how to model them and process them within a computer.

Definition of a graph

Precisely stated, a (directed) graph is

  1. a set, N, of nodes
  2. a set, E, of pairs of elements from N, called edges.
For example, this picture of part of the flight plan of American Airlines,

can be written precisely as the graph,
N = { KC, Chi, Dal, NYC }
E = { (Dal,KC), (KC,Chi), (Chi,KC), (Chi,Dal), (Dal,Chi), (Chi,NYC)}
Each edge has a source and a target (In the example, there is an edge whose source is Chi and whose destination is NYC, but there is no edge whose source is NYC. It is legal to have an edge whose source and destination are the same node.)

A path from a node A to node B is a sequence of zero or more edges that start at A, connect together, and end at B. In the example, there is a path from Dal to NYC but there is no path from NYC to Dal.

Notice that a tree is a special form of graph, where one node is called the root, and there is exactly one path from the root to all the other nodes. Unlike a tree, a graph can have multiple paths between nodes, and a graph can have cycles (loops), where a path can go from a starting node to other nodes and back to the starting node again.

Another form of graph is an undirected graph, where every edge from a node A to node B is read also as an edge from B to A. We draw an undirected graph without arrowheads on the edges, like this:


A graph is a weighted graph if some form of label or value (called a ``weight'') is attached to each of its edges. For example, mileages might be attached to the directed graph seen first,


The formal representation displays each edge as a triple:
N = { KC, Chi, Dal, NYC }
E = { (Dal,600,KC), (KC,500,Chi), (Chi,500,KC), (Chi,800,Dal), (Dal,800,Chi), (Chi,900,NYC)}

Representing graphs in a computer

There are two standard ways of storing graphs in a computer: the adjacency matrix and the adjacency list.

An adjacency matrix is a kind of ``mileage table'':

        Chi  Dal  KC  NYC
       +-----------------
Chi    |  0  800  500 900
       |
Dal    |800    0  600  -1
       |
KC     |500   -1    0  -1
       |
NYC    | -1   -1   -1   0
Here, the weights indicate the existence of an edge, and -1 (and 0!) means there is no edge.

The adjacency list organizes the graph's edges into sets, based on the edges' sources. Here is the adjacency list for the example:


As a rule, use an adjacency matrix to store a graph where there are edges between almost all the nodes; use an adjacency list when the graph has few edges, that is, it is sparse, that is, there are fewer than N * log2 N edges, for a graph with N nodes.

The adjacency matrix and adjacency list are ``raw'' forms of graph and are not oriented towards solving any particular problem. The forms of problems that one must solve are typically:

The best way to solve such problems is to extract information from the adjacency matrix/list and construct an appropriate tree. There are two important trees built from graphs: spanning trees and regular trees.

Spanning trees

A spanning tree is a tree that lists all the nodes in a graph but not necessarily all the edges. Its root is the ``entry node'' into the graph. Here are spanning trees that we can generate from the example directed graph where we use Dal as the entry node. For simplicity, we omit the weights on the edges, since spanning trees are normally used to answer questions about the nodes.


and here is a spanning tree for entry node, NYC:

Spanning trees are good for answering questions/do processing of the nodes in a graph, where the edges are unimportant to the answer. For example, we can use a spanning tree to answer: Is there a path from one node to another? Is one node connected to all others?

Regular trees

A regular tree is a tree that generates all the paths that can be taken starting from an entry node, where along every path, no node repeats. Here is the regular tree that generates all paths from entry node, KC:


The point is, when we reach a node that repeats in a path, it is the same as a backwards arc --- a loop:

Here is the regular tree that generates all paths from entry node, Chi:

Regular trees are built to answer questions/do processing of the paths in a graph, where the edges and weights are important to the answer. For example, we can use a regular tree to answer: list all the paths from one node to another; find the shortest (weighted) path from one node to another.

We now examine two standard and classic graph problems and show how to use the trees to solve them.

Application: modelling sets by spanning trees --- UNION-FIND

For simplicity, we work with undirected graphs in this section.

A collection of graph nodes is connected if for every node in the collection, there is a path to every other node in the collection. For this (undirected) graph:


There are two connected collections: {Tokyo, Osaka, Seoul} and {KC, Dal, Chi, NYC}.

The collections are disjoint ``sets,'' and such sets play a crucial role in answering questions of the form: ``Is there a path from node A to node B?'' or ``Do A and B belong to the same set?'' To answer the question, we check if A and B are in the same connected collection.

To answer such questions, it is useful to maintain spanning trees that include all the nodes of the graph. For the example, two spanning trees are required, one for each ``set'':


Now, to answer the question, ``can we go from A to B?'', we check if nodes A and B have the same root in the spanning tree(s) where they are saved. For example, we can check ``Can we go from Dal to NYC?'' by finding Dal and NYC and confirming that they share the same root, KC.

Another view is that each set is named by the root of its spanning tree --- in the example, we have the Tokyo set and the KC set. Then, the questions just asked are set-membership questions: ``Is Dal in the KC-set''?

The previous checks of ``connectivity'' take time O(log2N), where N is the number of nodes in the graph (assuming that we can locate the source and destination nodes in fixed time, which we can do if we save the addresses of all the spanning-tree nodes in a hash table or spelling tree).

The operation is called a FIND operation. There is also a UNION operation, which corresponds to unioning together two sets:

Say that flights are added between Chi and Osaka, that is, an edge has been added between these two nodes. How can we efficiently combine the spanning trees to reflect the new graph edge?

There is a simple solution to updating the spanning trees: draw a line between Chi's root and Osaka's:


This does a ``union'' of the two ``sets'' and is called a UNION operation.

One drawback of the simple definition of UNION is that it can create unbalanced spanning trees --- the above tree is not height-balanced. One way to improve UNION is to link the spanning tree with fewer nodes to the one with more nodes. For the above example, this would cause Tokyo's tree to link to KC's, which gives a slightly better balance.

But we can do better: the FIND operation is often improved so that it rebalances a spanning tree after it finds a path from a node in the spanning tree to its root --- each time a FIND operation is performed, all the nodes traversed on the path to the root are attached directly to the root. For the above example, a FIND operation on NYC would cause both NYC and Chi to be relinked directly to the root, Tokyo:

         Tokyo
  /     /   |   \    \
Seoul Osaka KC  Chi  NYC
            |
            Dal

Application: finding the shortest path --- Dijkstra's algorithm

Consider again this directed, weighted graph:

A classic question that we might ask is: What is the shortest path (in terms of the weights) from one node to another? For example, what is the shortest path from Dal to NYC?

A naive solution would enumerate all possible paths from Dal to NYC, say by using a stack and depth-first search, starting from ``root node'' Dal. But there is a smarter, faster solution, due to Dijkstra, which constructs a regular tree that enumerates only the ``short'' paths.

The key idea behind Dijkstra's algorithm is: generate a regular tree whose root is the start node; generate all possible paths from the root, but stop a path if either

  1. the next node in the path is one that has already appeared in the path (this signifies the path has entered a loop and is not the shortest solution; it is the usual criterion for stopping a path in a regular tree)
  2. the next node is the destination node (and so we calculate the weight of the path from the root to the destination and see if it is the least we have seen so far)
  3. the next node is a node that we have already reached in some other path in the tree and the weight of the other path is already less-than-or-equal-to the weight of the path we are just now constructing --- this means the path we are just now constructing cannot be the shortest path
To help calculate the weights of the paths, we will annotate each node of the regular tree with the weight of the path from the root to that node. To remember each shortest path so far from the root to any given node, we will keep an array to remember the weights.

Here is an example --- say we want the shortest path from Dal to NYC in the graph just seen. We begin with this this array and initial regular tree:

           Dal  Chi   KC  NYC
          +----+----+----+----+
shortest: |   0|   ?|   ?|   ?|
          +----+----+----+----+

The array notes that the shortest path from Dal to Dal is 0; the other distances are undetermined at this time.

The edges from Dal generate this slightly larger regular tree and update the array to read

           Dal  Chi   KC  NYC
          +----+----+----+----+
shortest: |   0| 800| 600|   ?|
          +----+----+----+----+

Now, we must develop the tree at the nodes KC and Chi. When we try to develop KC, we find an edge that leads to Chi:

But when we calculate the weight of the path from Dal to KC to Chi, we find that it is already larger than the weight already stored in the array as the shortest known weight from Dal to Chi. This means there is no need to develop this path further, and we stop:

When we develop Chi, we find three edges: the first, to KC is to a node that has been seen before. The array tells us that the path found earlier to KC had weight 600, so the current path, which has weight 800+500 = 1300, is not least and can be terminated. The next edge leads to to Dal and a repeat of a node already seen on this path, so there is no way that this path can be shortest; we terminate this path. The third edge goes from Chi to NYC, and we update the array with our discovery:

           Dal  Chi   KC  NYC
          +----+----+----+----+
shortest: |   0| 800| 600|1700|
          +----+----+----+----+

The path from Dal to NYC is complete, and we stop this path.

At this point, all paths in the regular tree are completely developed, and we quit --- the shortest path goes from Dal to Chi to NYC.

Finally, we note that, in the previous example, the regular tree itself need not be explicitly constructed in computer storage. Instead, we keep a second array that remembers the edges that yielded the shortest path. Here is the example repeated, with the two arrays, one that remembers the weights and one that remembers the individual edges in the shortest path that we build:

           Dal  Chi   KC  NYC
          +----+----+----+----+
shortest: |   0|   ?|   ?|   ?|
          +----+----+----+----+

	  Dal  Chi   KC  NYC
	 +----+----+----+----+
previous:|null|   ?|   ?|   ?|
	 +----+----+----+----+
After the moves from Dal:
           Dal  Chi   KC  NYC
	  +----+----+----+----+
shortest: |   0| 800| 600|   ?|
          +----+----+----+----+

	   Dal  Chi   KC  NYC
          +----+----+----+----+
previous: |null| Dal| Dal|   ?|
          +----+----+----+----+
For city C, previous[C] remembers the immediate predecessor node whose edge delivers us to city C on the shortest path. For example, previous[KC] tells us that the shortest known path from Dal to KC ends with an edge from Dal to KC, and shortest[KC] tells us that the weight of the shortest path from Dal to KC is 600.

After the attempted moves from KC, there are no change to the arrays. The move from Chi to NYC gives the answer and the last edge in the shortest path:

           Dal  Chi   KC  NYC
	  +----+----+----+----+
shortest: |   0| 800| 600|1700|
          +----+----+----+----+

	   Dal  Chi   KC  NYC
          +----+----+----+----+
previous: |null| Dal| Dal| Chi|
          +----+----+----+----+
Now, we see that the shortest path from Dal to NYC must end with the edge from previous[NYC] to NYC, that is, from Chi to NYC. And, we see that the shortest path from Dal to Chi must end with the edge, previous[Chi] to Chi.

Indeed, we can assemble the path from the edges, listed in reverse: NYC-Chi-Dal (null).