Copyright © 2010 David Schmidt

Chapter 6:
The imperative paradigm: From an assignment core to Java


6.1 A core imperative language with type templates
    6.1.1 Using type templates for storage allocation and data-typing
6.2 Data-structure extensions
6.3 Control-structure extensions
6.4 Component extensions: Abstracts
6.5 Component extensions: Parameters
6.6 Component extensions: Blocks
6.7 Appending structs: subclasses and virtual methods
6.8 Conclusion


When one encounters a massive language, like C++ or C#, for the first time, one is tempted to ask, ``Who thought up this mess?'' Indeed, to understand a programming language, one must look past piles of syntax and identify the language's structure. The principles we studied in the previous chapters let us do that.

Languages that do computation with storable values are called imperative languages, because commands (like assignment) give orders --- imperatives --- about updating storage. This chapter presents the core of an imperative language and applies the extension techniques to grow the language into a modern, object-oriented language like Java or C#.


6.1 A core imperative language with type templates

Our starting language begins with these characteristic domains:
expressibles: integers, booleans, and handles to int-arrays and structs-of-ints
denotables: integers and handles to int-arrays and structs-of-ints
storables: these are exactly the denotables.
In one sentence, the imperative language computes on ints and stores them in arrays and structs:
===================================================

P : Program
D : Declaration
T : TypeTemplate
C : Command
E : Expression
L : NameExpression
N : Numeral
I : Identifier

P ::=  D ; C

E ::=  N  |  L  |  E1 + E2  |  E1 - E2  |  E1 > E2  |  E1 == E2  |  not E  |  new T

C ::=  L = E  |  print L  |  C1 ; C2  |  if E { C1 } else { C2 }  |  while E { C }

D ::=  var I = E  |  D1 ; D2

T ::=  struct (I : int)+  end  |  array[N] of int

L ::=  I  |  L . I  |  L [ E ]

N ::=  0 | 1 | 2 | ...

I ::=  alphanumeric strings beginning with a letter, not including keywords

===================================================
This is an assignment language, but the most interesting part of modern assignment languages is how variables are declared and how their storage is allocated. We write
var x = 3;
var z = new struct f: int; g: int end;
var r = new array[4] of int
to create three variable names in the program's namespace: x, which holds value 3; z, which holds a handle to a newly created struct object that holds two uninitialized ints named f and g; and r, which holds a handle to a newly created four-celled array object. Here's a picture:

Now, this next point is really important, and it is a key feature of modern imperative (object-oriented) programming: a TypeTemplate phrase is a storage-allocation template. Consider the above example one more time:

  1. new struct f: int; g: intend allocates storage for an object (struct) that holds two ints. This is because struct f: int; g: intend is a storage-allocation template for allocating an object that holds two cells.
  2. new array[4] of int allocates storage for 4 ints. This is because array[4] of int is a storage-allocation template for allocating a vector of 4 cells.
Type-structure phrases define object-allocation layouts. When we write new T, we activate T to construct an object with layout T. When we use var I = new T, we create a name, I, along with a cell that holds the handle to the object allocated from T.


6.1.1 Using type templates for storage allocation and data-typing

Let's constrast the above fragment,

var x = 3;
var z = new struct f: int; g: int end;
var r = new array[4] of int
to that in C and Pascal. To obtain the same allocations in C, one would write
===================================================

int x = 3;
struct MyStruct {int f; int g};   // you first name the allocation template;
struct MyStruct z;                // then you use it to declare a struct
int[4] r     // or,  int r[4]

===================================================

Like our model core language at the beginning of this chapter, C (and Pascal) use type-template phrases to allocate the correct amount of storage for struct z and array r. But C and Pascal also use the type-template phrases as data-type names, that is, ''z has data type MyStruct'' and ``r has data type int[4].''

The data-type names are used in the rest of the program to confirm that the variables are used properly. For example, C's type checker will validate that this assignment is consistent with z's and r's declarations:

z.f = r[2] + 1
but this one is not:
x = z[3] - r    // this is a ``type error''
Since TypeTemplate phrases can be used as data-type names as well as storage-allocation templates, we call them ``type templates.'' But remember: you can switch off C's type checker and then you are using the TypeTemplate phrases for their primary use --- to allocate storage.

Java and C# also use type templates as both data-type names and storage-allocation templates. The earlier example looks like this in Java/C#:

===================================================

int x = 3;
class MyStruct {int f; int g};   // you must declare the struct as a class 

MyStruct z = new MyStruct();     // Why must we say,  MyStruct,  twice ??

int[] r = new int[4]             // Why must we say,  int[..]  twice ??

===================================================
Notice the two questions just above? Java and C# make you use the type template first to attach a data-type name to a variable name and second to allocate the object that the variable names.

In Java, if you write merely,

MyStruct z;
int[] r;
then only the data-type names are attached to z and r and no struct and no array objects are allocated. Instead, z and r each have value nil (``no value'') in the namespace.

One might declare z and r without initial values so that the Java compiler learns their data types for future type checks (e.g., to enforce that only handles to MyStruct objects are assigned to variable z and that only handles to int arrays are assigned to variable r.)

We will not devote a lot of time in this chapter to data-type checking with type templates, but their use does not always go smoothly. Here is a standard problem:

===================================================

class MyStructA {int f; int g}; 
class MyStructB {int g; int f}; 
MyStructA x = new MyStructA();
MyStructB y = x      // Is this allowed or not ?
                     // After all, the struct has exactly the required fields ?!

===================================================
Java/C# consider this example erroneous; Algol68 does not. At the end of the chapter we study some difficult questions that arise when we use type templates as data-type names.


6.2 Data-structure extensions

The language core defined at the beginning of this chapter summarizes the state of imperative languages in the 1960s-1970s. A key characteristic of these languages was the restriction on what values can be saved as elements within a data structure like an array or a struct.

The data-structure extension principle suggests that all storable values --- data structures --- should be storable as elements within all other data structures. This means we should allow arrays of arrays and structs, and structs of arrays and structs.

We apply the data-structure extension principle with this simplication to the syntax of TypeTemplate:

===================================================

T ::=  struct D end  |  array[N] of T  |  int

===================================================
That is, a struct's body can declare any possible data structure, and an array can declare elements of any possible data structure. Finally, we make int the starter type template. It is often the case that, by simplifying a language's syntax, we develop a more general-purpose language as a result.

(Of course, there is a cost to such generalization. For example, the C language is designed so that storage is a sequence of integer cells, and data structures like arrays and structs are an illusionary naming convention for a sequence of contiguous cells. This storage model lets C programs compile into simple, fast target code --- C is meant for writing device drivers, not databases.)

With the new syntax of TypeTemplate, we can write nested data structures. Here is a struct named database that holds an int and an array of 100 structs:

===================================================

var database = new struct var howmany = 0;
                          var table = new array[100] of
                                            struct var idnum = new int;
                                                   var balance = 0
                                            end
                   end;
// insert an entry:
count = database.howmany;
if not(count == 100) {
    database.table[count].idnum = 999999;
    database.table[count].balance = 25;
    database.howmany = count + 1
}

===================================================

In the example, you should read database.table[count].idnum as ((database.table)[count]).idnum.

Say that we use C/Pascal-style semantics for this example: the declaration of database allocates storage for all 100 of the struct objects in table before a single entry has been entered into the database. Here is a picture of what the above code created:


Using the diagram, we can state precisely what database.table[count].idnum denotes:

Of course, you can write the previous code also like this:
count = database.howmany;
if not(count == 100) {
    item = database.table[count];   // item  has value  δ0
    item.idnum = 999999;            // δ0.idnum  is assigned 999999
    item.balance = 25;              // δ0.balance  is assigned 25
    database.howmany = count + 1
}

===================================================

The diagram shows that C-style allocation allocated all the array's structs at once. But this is a waste of storage if the database is mostly empty all the time.

For this reason, Java and C# take the opposite approach --- a nested type template allocates only the topmost level of object. You must use lots of new commands to construct the objects that fill the lower levels. This sounds strange, but programmers quickly learn to do this. Here is the above example written in Java style:

===================================================

// this template can allocate a two-celled struct object:
class ENTRY{ int idnum; int balance = 0 }

// this template can allocate a struct that holds one int cell and 100 cells
//  that can hold handles to other objects:
class DB{ int howmany = 0;  ENTRY[] table = new ENTRY[100] }

DB database = new DB();   // allocates one DB object but _no_ ENTRY objects!!!

// now, insert one ENTRY object into  database:
count = database.howmany;
if not(count == 100) {
    ENTRY item = new ENTRY()  // allocate the object only when needed !
    item.idnum = 999999;
    item.balance = 25;
    database.table[count] = item;  // store the object's handle in the array
    database.howmany = count + 1
}

===================================================
Here is a diagram of what this code constructed:

In summary, be careful if you move from C# to C (or vice versa) --- the two languages allocate objects differently! (If you want to do ``lazy allocation'' in C, you must use pointers and malloc, somewhat like this:

int* r;             // declares  r  as a pointer
int howmany = 100   // number of array elements desired
// save in  r  the address of a 100-celled array:
r =  (int*)(malloc(howmany * sizeof(int)));
// assign  99  to cell 0 of the array:
r*[0] = 99     
)

Languages like Python and Ruby let you "declare" and allocate a nested data structure one layer at a time. We can do this in our core language, by changing the syntax of TypeTemplate like this:

T ::=  struct D end  |  array[N]  |  int
Now, array[N] allocates an N-element array object whose cells all hold nil. Later, we fill in the structures within the array. The database example would get rewritten like this, which is a style halfway between the previous two examples:
===================================================

var database = new struct var howmany = 0;
                          var table = new array[100] 
                   end;
// insert an entry:
if not(database.howmany == 100) {
    database.table[howmany] = new struct var idnum = 999999;
                                         var balance = 25
                                  end;
    database.howmany = database.howmany + 1
}

===================================================
The result would be the same storage layout that we saw in the previous diagram.

This setup will let us insert values of different type templates into the same array, e.g., we can say next

database.table[howmany] = new array[2]
mixing structs with arrays in the same array. Perhaps you like this; if not, you add type checking like Java's.

Review

For the sections that follow, we will build on this syntax for declarations and type templates:
===================================================

E: Expresssion         N: Numeral
D: Declaration         T: TypeTemplate
I: Identifier

E ::=  N  | ... |  new T
D ::=  var I = E  |  D1 ; D2
T ::=  struct D end  |  array[N] of T  |  int

===================================================
To recap:

Exercises:

  1. Add Declaration, TypeTemplate, and the new T construction to the interpreter for the object language in Chapter 2. Build the interpreter to use C-style storage allocation for type templates.
  2. Rebuild the interpreter from Exercise (1) to use Java-style storage allocation for type templates.
  3. For either of the two previous exericses, add type checking to your interpreter, that is, when a variable is declared as var I = new T, template T is saved with I, its cell, and its value in the namespace; all subseqent assignments to I must be values that have type T. (A declaration, var I = E, computes both the type and value of E, and these are saved with I and its cell.)
  4. Now, "split" your interpreter from Exercise (3) into two parts: the first part is a type-checker program: it accepts an operator-tree representation of a program as its input, and it computes only the data-type information of the variables in the program --- it does not compute the values of expressions! If a program can be completely "type-interpreted" with no errors, then the second part, the interpreter from Exercise (1) or (2), is executed on the "type-checked" program. (A compiler-based implementation of a language will do parsing and type checking somewhat like this. The idea is to remove all the types and type checking from the run-time execution.)


6.3 Control-structure extensions

Commands use these control structures.
C ::=  . . .  |  C1 ; C2  |  if E { C1 } else { C2 }  |  while E { C }
The Declaration domain uses the sequencing control structure, ;, which is important because declarations can be initialized like this:
int x = 2;
int y = x * x
The order of the declarations, stated by ;, matters.

Since the language has arrays, it is useful to introduce a control structure that lets us ``iterate'' (process) an array's elements. Perhaps we add a for-loop, which counts upwards from some lower bound to some upper bound by ones:

C ::=   ... | for I = E1 upto E2 : C
Within the for-loop, variable I is used as an index for locating elements in an array, e.g.,
var evens = new array[9];
for index = 0 upto 8:
    evens[index] = index * 2
Does a struct require a control structure for ``iteration''? If we wanted one, it might look like this:
C ::=   ... | for I in fieldsOf L : C
The loop would look up each field in struct L which would be used to index L's fields. Here is an example:
var evens = new struct var a = 0;
                       var b = 2;
                       var c = 4
            end;
var sum = 0;
for k in fieldsOf evens :
    sum = sum + evens.k ;


6.4 Component extensions: Abstracts

The abstraction principle says we may add abstracts for each of the language's domains.

Procedures inside structs

In an earlier chapter, we saw that procedures are just named commands. Here they are, again:
===================================================

D ::=  var I = E  |  D1 ; D2  |  proc I() { C } 

C ::=  L = E  |  C1 ; C2  |  ... |  L()

T ::=  int  |  struct D end  |  array[E] of T

E ::=  N  |  ...  |  new T

L ::=  I  |  L . I  |  L [ E ]

===================================================
The definition construction is proc I(){C} and the calling construction is L().

The syntax for structs, seen above, exposes an important idea --- we can embed procedures within a struct. This insight led to the development of object-oriented programming as first seen in the Simula67 programming language --- objects are structs that contain both procedures and variables. Here is an example:

var clock = new struct var time = 0;
                       proc tick(){ time = time + 1 };
                       proc display(){ print time };
                       proc reset(){ time = 0 }
                end
The procedures inside clock maintain variable, time. We use clock like this:
clock.tick();
clock.tick();
clock.display()
Here is a diagram of storage after clock is declared and the previous calls are finished:

The picture shows that the clock object is implemented as a namespace, where tick, reset, and display are names of closure objects. When one calls clock.tick(), the following happens:
  1. clock is looked up in global namespace α; its value is β. This makes the call into β.tick().
  2. Within namespace β, tick is located; it names the stored procedure at address γ. The computer executes the stored code for tick, as pointed to by closure γ.
  3. A new namespace, ρ, is created for the called procedure, just like we saw in the previous chapter. Note that tick's parentns field (the link to global variables) is set to β. So, tick's code can look up and increment the nonlocal variable, time. Also, tick can call the other procedures that live with it in object β. Here is the storage configuration when the code for tick is executed:
    
    
  4. tick's code executes, and time within namespace β is incremented.
  5. Once tick completes, namespace ρ disappears.
Finally, note that the parentns link within namespace β is set to α; this allows the object to find global variables outside its ownership. (If we don't want an object to see variables outside its own namespace, we set parentns to nil.)

This example shows that

objects are structs; they are implemented as namespaces.

Type-structure abstracts: Classes

The structs we have coded so far are getting hard to read. This is why languages like Pascal, ML, Java, etc., let us name type templates. A type-template abstract is called a class. Here is how we define and call TypeTemplate abstracts:
===================================================

D ::=  var I = E  |  D1 ; D2  |  proc I() { C }  |  class I = T

T ::=  int  |  struct D end  |  array[E] of T  |  L

===================================================
Note that the L in the BNF rule for T is the construction that calls a named class. Here is how we might use a class to declare the database seen earlier:
===================================================

class Entry = struct var idnum = new int;
                     var balance = 0
              end;

class DB = struct var howmany = 0;
                  var table = new array[100] of Entry 
           end;

var database = new DB

===================================================
The classes divide the definition into readable substructures. Java people say that variable database ``denotes an object of class DB'' or ``database has type DB.'' But classes are not, strictly speaking, data types, even though Java lets a programmer think they are.

Please remember what classes (and TypeTemplate phrases) really are: A class is a storage-allocation template. When we define a class, we define an allocation template, and when we use the keyword, new, we activate the template to allocate storage. The Java compiler tries to use the class names as data types, but this can lead to trouble, as we see at the end of this chapter.

This development shows that a core assignment language with structs already has the computational power to do object-oriented programming --- it is merely a matter of introducing a couple of key abstracts (command abstracts and type-template abstracts) for convenience.

Declaration abstracts: Modules

There are other opportunities for applying the abstraction principle. One candidate is the Declaration domain. This gives a form of module --- a named, compound declaration that can be ``imported'' into a program. The syntax looks like this:
===================================================

D ::=  var I = E  |  D1 ; D2  | ...  |  module I = D  |  import L

===================================================
A module differs from a class because a called module activates a set of declarations at the point where it is imported (called); the importation ``links'' the module's declarations to the program. In contrast, a class is called to allocate storage.

Here is an example to make this point:

===================================================

module M = 
    var x = new int;
    class C = struct var a = 0  end;
    var y = new array[10] of C;
    proc initialize(){
        x = 0;
        for i = 0 upto 9 { y[i].a = i }
    }

// The above code often resides in a separate file, named  M.

import M;     // activates and links M's declarations to this program
initialize()  // calls the proc as if it were declared in the program
var z = new C;   // class  C  is used as if it were declared in the program
z.a = x + y[0].a

===================================================
Once M is imported, its declarations are linked to the program as if they had been written there in the first place.

Since module importation is a kind of linking, does it make sense to ``import'' twice? That is, can we do this?

import M;
import M
and is it the same as importing M just once? Most languages ignore repeated imports of the same module.

Here is another question: Should we allow this example?

===================================================

module M = var x = 7;

module N = var x = 99;

import M;
import N;
x = x + 1    // which variable  x  is updated?

===================================================
This is a serious issue when a large program is assembled from many modules that are linked together --- there is always a chance that two distinct modules declare the same name. For this reason, most languages require that a module's names are referenced with dot notation, like this:
===================================================

module M = newstruct var x = 7 end;  // might be in a separate file

module N = newstruct var x = 99 end;  // might be in a separate file

import M;
import N;
N.x = M.x + 1

===================================================
Languages that use modules-as-structs often add an operation that ``opens'' the module so that the declarations are exposed like we saw them in the first place:
module M = newstruct var x = 7 end;

from M import *;  // import *   means that all declarations in M are linked
                  // as if they were declared in the program.
x = x + 1
A variation of this operation opens the module for a limited scope:
module M = newstruct var x = 7 end;

module N = newstruct var x = 99 end;

import M;
import N;
with M do {
   N.x = x + 1   // inside the  with M do,  references to  x  mean  M.x
}


6.5 Component extensions: Parameters

In the previous chapter we learned how to add parameters to abstracts. These same techniques are useful for adding parameters to classes and modules. Here is an example of a parameterized class:

===================================================

class Entry = struct var idnum = new int;
                     var balance = 0
              end;

class DB(size) = struct var howmany = 0;
                        var table = new array[size] of Entry 
                 end;

var database = new DB(100)

===================================================
The parameter, size, lets us adjust the size of a new database.

Many object-oriented languages transmit the argument to a class's constructor method and not to the class itself: Here's how the above example looks in Java:

class Entry { int idnum;  int balance = 0 }

class DB {
    int howmany = 0;
    Entry[] table;

    DB(int size) {
        table = new Entry[size]
    }
}

DB database = new DB(100);
Modules might also be parameterized, perhaps by expressions or even type-templates. Consider this example, which defines a database module parameterized on an int and the type-template format to be stored in the database table:
===================================================

module DataBase(size, recordTemplate) =
    var howmany = 0;
    var table = new array[size] of recordTemplate;

    proc initialize(){
        for i = 0 upto size - 1 {
            table[i].init() }   // oops -- how do we know  recordTemplate  contains proc  init ?
    };

    proc find(index):
        if index >= 0 and index < size {
            answer = table[index].getVal()  // how do we know  recordTemplate  contains function  getVal  ?
        return answer
    }

===================================================
If we had this class,
===================================================

class Entry = struct var idnum = new int;
                     var balance = new int;
                     proc init(){
                         idnum = -1;
                         balance = 0
                     };
                     fun getVal(){ return balance }
              end;


===================================================
we could activate (import) the module like this:
Database(100, Entry)
This allocates storage for howmany and table, where the latter is an array of 100 Entry structs.

The coding of DataBase is suspect --- it assumes that whatever the recordTemplate type-template might be, it includes a procedure named init and a function named getVal. To ensure the security of module Database, we should annotate its parameters with these requirements. The data-type-like annotations are called an interface.

Here is a Java-like coding of the interface that we want:

===================================================

interface RecordInterface = struct init: void -> void;
                                   getVal: void -> int
                            end;

module DataBase(size: int, recordTemplate: RecordInterface) = ... like before ...

===================================================
The interface gives enough information that the programmer or compiler can check that the module is coded sensibly. The annotation of size ensures that an int argument will be bound to it, and the annotation of recordTemplate ensures that a struct object with at least an init procedure and a getVal function will be bound to it.

The type-template argument that is bound to parameter recordTemplate must match the interface; it must ``implement'' it, as one says in Java-speak:

Database(100, Entry)   // Entry  implements (matches) RecordInterface

In a similar way, we can use interfaces for module arguments to modules, to enfore correct linking:

===================================================

interface DataBaseInt = { howmany: int;
                          initialize: void -> void;
                          find: int -> int
                        };

module System(db: DataBaseInt) =   // db binds to a declaration that holds the
                                   // identifiers listed in  DataBaseInt
    var current = new int;
    var value = new int;

    db.initialize();
    current = db.howmany - 1;
    value = db.find(current)
;
 . . .

import System(import Database(100, Entry));   // link  Database  to  System

===================================================
Features like these are found in the module-oriented languages, Modula2 and Ada.


6.6 Component extensions: Blocks

We saw in the previous chapter how procedures can declare and use local variables. This idea is used to good effect with classes and modules, so that they can own private variables that cannot be altered from outside the scope of the abstract.

Here is the syntax of declaration blocks and type-template blocks:

===================================================

D ::=  var I = E  |  D1 ; D2  |  class I = T  |  module I = D  |  import L  |  begin D1 in D2 end

T ::=  int  |  struct D end  |  array[E] of T  |  L  |  begin D in T end

===================================================
Returning to the database example, we can improve the definitions so that the variables owned by the database are made private:
===================================================

class Entry = struct var idnum = -1;
                     var balance = 0
              end;

class DB(size) = struct
                    begin var howmany = 0  // these two declarations are private
                          var table = new array[size] of Entry  // to the struct
                    in 
                          proc find(i){ ...table[i].balance()... }
                          proc update(i,...){ ...table[i] ... howmany ... }
                 end end;

module DataBase(max) = begin var mybase = new DB(max)  // this declaration is private
                       in       
                             proc searchDataBase(...):
                                 ...mybase.find(...)...
                           
                             proc processDataBase(...):
                                 ...mybase.update(...)...

                             // but we cannot reference  mybase.howmany
                             // nor  mybase.table
                       end;

import DataBase(100);
DataBase.searchDataBase(...);
DataBase.processDataBase(...)
// but we cannot reference  Database.mybase

===================================================
The example shows how protection was placed around private variables howmany and table within DB so that once a DB object is allocated, all uses of the two variables must be made via the public procedures, find and update. The same idea is used to protect the struct, mybase, within module DataBase.

Of course, in C# and Java, the keyword, private, is used to label a declaration as local to a block.

The qualification principle makes it possible to encapsulate declarations within components so they are safe from unauthorized use, no matter where the component is inserted into a system. Large systems building is possible only because of the qualification principle.


6.7 Appending structs: subclasses and virtual methods

Two distinctive features of object-oriented languages are subclasses and virtual methods. These concepts are surprisingly complex, yet their correct use is critical to successful object-oriented programming.

Let's review subclasses with a small example from Java. Here is a class named Point:

===================================================

class Point {
    int x;  int y;
  
    Point(int initx, int inity) {   // constructor method
        x = initx;  y = inity
    }
    void paint() {
        System.paintPixel(x,y,255,255,255)   // paints a white pixel at x,y
    }
    boolean equals(Point q) {   // compares this point to point  q
        return  x == q.x  &  y == q.y
    }
}

===================================================
The class, which might reside within a graphics framework, defines two fields and a method that paints the point and a method that compares an object constructed from this class to another that has the structure of a Point, for example:
Point p1 = new Point(0,0);
Point p2 = new Point(1,1);
p1.paint();  p2.paint();
boolean b = p1.equals(p2);   // will compute to False
The graphics framework might also deal with color, so there is the notion of a colored point, which is a graphics point (pixel), colored with RGB (red-green-blue) coding:
===================================================

class ColoredPoint extends Point {
    int[] color;

    ColoredPoint(initx, inity, initr, initg, initb) {
        super(initx, inity);
        color = new int[3];
        color[0] = initr;  color[1] = initg;  color[2] = initb
    }
    void paint() {
        System.paintPixel(super.x, super.y, color[0], color[1], color[2])
    }
    boolean equals(ColoredPoint q) {
        return  super.equals(q)  &&  
                color[0] == q.color[0]  &&
                color[1] == q.color[1]  &&
                color[2] == q.color[2]
    }
}

===================================================
A ColoredPoint builds on the structure of a Point, adding a data structure that remembers RGB-values. For this reason, its paint method is rewritten so that it uses a point's location as well as its color. The recoding of paint is called a method override, because there is already a coding in the superclass but we are overriding it with a new coding. Method override is a key technique used in graphics libraries written in object-oriented style: a graphics widget, say a Button, is coded with simplistic paint and interrupt handling methods, which the programmer later overrides with more detailed ones.

Notice how the coding in ColoredPoint can obtain the values of x and y in Point with the label, super.

In the above example, to compare one colored-point object equal to another, we must also change equals: we must check that the x,y coordinates are identical (this is done with a call to super.equals --- the equals method in the super class, Point) as well as the RGB integers are equal.

Here are examples of Java points and colored points:

===================================================

Point a = new Point(0,0);  // at position 0,0 --- the upper left corner
ColoredPoint b = new Point(0,0, 255, 0, 100);   // violet
ColoredPoint c = new Point(0,0, 255, 0, 0);     // red

a.paint();  // paints a white point at 0,0 -- uses  paint in class Point
b.paint();  // paints a violet point at 0,0 -- uses  paint in class ColoredPoint

a.equals(b);   // calls  equals in  Point;  returns  True
b.equals(c);   // calls  equals in  ColoredPoint;  returns  False
b.equals(a)    // calls  equals in  Point(!);  returns  True

===================================================
The call, a.equals(b), uses the equals method attached to object a, a Point object, to see if object b has the same x,y coordinates; the color information in b does not matter --- from the perspective of a, b is ``equal'' to it.

The call, b.equals(c), uses b's equals method, within ColoredPoint, to compare b's position and color to c's. Finally, the third call, b.equals(a), is surprising, since b's equals method, expects a ColoredPoint argument. Actually, b possesses two equals methods --- one for comparing itself to ColoredPoint objects and one for comparing itself to Point objects. Here, the latter is used to force an equality comparison between a and c. Is this what you expected? Is it what you want? Why did it happen?

The answer is that objects are structs, and a subclass, like ColoredPoint, defines a struct type template that is the Point struct appended to the ColoredPoint struct. It is as if class ColoredPoint defines this type template:

{ int x; int y;
  void paint() { ... }  // this method will never be used (!)
  boolean equals(Point q) { ... }
  int[] color = ...;
  void paint() { ... }  // this newer method is always used
  boolean equals(ColoredPoint q) { ... }
}
and object b contains exactly these fields. The first occurrence of method paint is overridden by the second. The name, equals, is overloaded, because it has two ``bodies'', which are chosen based on the data type (pattern) of argument that is used when equals is called.

Stated more precisely, when a call like b.paint() or b.equals(c) occurs, the fields of object b are searched from newest to oldest (``last to first,'' in the above coding) until there is a match of a method name and its parameter types to the call. This is how b.equals(a) executes the variant, boolean equals(Point q).

If you can understand the above story, congratulations! But the explanation is a complicated mess. Let's start over. Returning to our core imperative language, we get subclasses like this: first, we add an append operation, +, to the syntax of type templates:

===================================================

T ::=  int  |  struct D end  | ... |  T1 + T2

===================================================
This lets us code structs in stages:
var p = new (struct
              var x = new int;
              var y = new int;
             end       
           +
             struct
              color = new array[3] of intecell
             end)

p.x = 0; 
p.color[1] = 255
The above example does not seem to add much, but once we add class names (and procedures/methods), it gets interesting:
===================================================

class Point = struct
         var x = new int;
         var y = new int;
         proc paint() { ...x...y... }
      end;

class Color = struct
        var color = new array[3] of int;
        proc paint() { ... super.x ... super.y ... color ... }
      end;

class ColoredPoint = Point + Color;   // aha!

var p = new ColoredPoint; // allocates a struct with all the fields (``attributes'')
                          // of  Point  and also  Color
p.x = 0;
p.color[1] = 255

===================================================
Some object-oriented languages use classes exactly as shown here. (The class-fragments are called mix-ins.)

The mainstream object-oriented languages (e.g., Java) allow only an ``incremental'' append, where a ``base class'' is ``extended.'' ColoredPoint is a subclass of Point because it builds on Point --- it has all Point's fields and then some. Subclassing is an abbreviation for appending structs.

To understand the complex Java example shown earlier, we should look at a storage-layout picture. Here is a simplified version of the example (just method overrides, no overloads):

class Point = struct
         var x = new int;
         var y = new int
         proc paint() { ...x...y... }
      end;
class Color =  struct
                  var color = new array[3] of int;
                  proc paint() { ... super.x ... super.y ... color ... }
               end;
class ColoredPoint = Point + Color;

var a = new Point;
var b = new ColoredPoint;
Variable a names an object in the heap; so does b, but the latter's object is broken into two namespaces, since it is built in two steps. Here is the picture of the storage layout:

We see that the global names are saved in namespace α. When a was declared, the object at heap address ε is allocated, based on the coding named Point.

There is a difference in the picture from the previous ones --- in object ε, notice that method paint, which names the procedure code stored at location τ, does not save an address of where to find global variables. So, we now have dynamic scoping, as we will see.

When b is declared, an object, ρ, holding the namespace of Color, is allocated. It is linked to the object, η, that holds the namespace of a Point. The link, which we used to call parentns, is now called super. This is why we can write super.x and super.y in the coding of method paint within class Color --- super is the name of the link to the ``super object.'' If we write super.x, we force the use of variable x in the superclass even if there is a variable x that is more local. This is how you see super used most often in Java.

Now, say that we do this method call:

b.paint()
Since the active namespace is α, the name b means ρ --- the call is ρ.paint. A new namespace is allocated:

The parentns link, which leads us to paint's nonlocal variables is set to the address of the object named in the method call, b.paint(). This address is also called self or this in some languages.

Say that the coding of paint mentions self.x and say that the call b.paint() executes. You can see from the diagram that the search for self.x looks in namespace φ to learn that self means ρ. So, the search for ρ.x looks in namespace ρ for x and proceeds to namespace η before x is found in the ``currently called object.'')

Since paint's self link is set only when paint is called, the link can be different for different calls of paint --- it is dynamic scoping. This leads to surprising consequences, as we see in the next section.

Methods and override

The previous example showed that appended structs can have two fields indexed by the same identifier name. This appears to be in clear violation of what a struct is --- a mapping from distinct field names to their values. But this violation, the presence of multiple fields with the same name, is a fundamental feature of object-oriented programming, as shown in the earlier examples, where a different coding of paint is used for a colored point object versus an ordinary point object.

From Smalltalk onwards, object-oriented languages allow objects to have multiple fields named the same. Whenever we reference one of the object's fields, there is a sequential (dynamic) lookup, starting from the ``newest'' (subclass) fields towards the ``oldest'' (superclass) fields.

We saw this with the paint methods in the earlier example. Here is a second example, coded in Java:

===================================================

class Point {
    int x;  int y;
    ...

    string toString()
    { return  "Point: " + x + "," + y }
}

class ColoredPoint extends Point {
    int[] color;
    ...

    string toString() {
      return "ColoredPoint: " + super.x + "," + super.y 
             + "; colors: " + color[0] + "," + color[1] + "," + color[2] }
    }


Point a = new Point(0,0);  // at position 0,0 --- the upper left corner
ColoredPoint b = new Point(0,0, 255, 0, 100);   // violet

System.out.println(a.toString());  // prints "Point: 0,0"
System.out.println(b.toString());  // prints "ColoredPoint: 0,0; colors: 255,0,100"

===================================================
When ColoredPoint object b is constructed, it has two fields named toString, but for the call, b.toString(), dynamic lookup always locates first and executes the newer version of toString within b, namely the one from class ColoredPoint. In this way, the older method is overridden --- cancelled -- by the newer one. In practice, overriding can be very useful, especially when you are linking your own subclasses to prewritten superclasses embedded within a framework. (This is how most GUIs are written, with frameworks.)

So far, so good! But strange things can happen with dynamic lookup. Here is the point-coloredpoint example again, rewritten to remove super.equals and rewritten so that whenever a colored point is compared to an ordinary (noncolored) point for equality, the result is false:

===================================================

class Point {
    int x;  int y;

    Point(int initx, int inity) {   // constructor method
        x = initx;  y = inity
    }

    boolean equals(Point q) {
        return  x == q.x  &&  y == q.y
    }

    boolean hasSameCoordinates(Point q) {
        return equals(q)
    }
}

class ColoredPoint extends Point {
    int[] color;

    ColoredPoint(initx, inity, initr, initg, initb) {
        super(initx, inity);
        color = new int[3];
        color[0] = initr;  color[1] = initg;  color[2] = initb
    }

    boolean equals(Point q) {
        if (q instanceof ColoredPoint)
            { return hasSameCoordinates(q)  &&
                color[0] == q.color[0]  &&
                color[1] == q.color[1]  &&
                color[2] == q.color[2]
            }
        else { return false }   // all Points are nonequal to ColoredPoints
    }
}

===================================================
We coded ColoredPoint so that its equals method overrides the equals method of Point, meaning that the latter is never used within a ColoredPoint object. We eliminated super.equals by calling a stand-in method, hasSameCoordinates, which is meant to check the equality of the x,y coordinates of two points. But our coding will go horribly wrong:
===================================================

Point a = new Point(0,0);
ColoredPoint b = new Point(0,0, 255, 0, 100); 

a.hasSameCoordinates(b);  // calls  hasSameCoordinates  in  Point,  which
                          // calls  equals in  Point,  which returns  true

b.equals(a)    // calls  equals in  ColoredPoint, which returns  false
a.equals(b);   // calls  equals in  Point(!) which returns  true (?!)

b.hasSameCoordinates(a);  // calls  hasSameCoordinates  in  Point,  which
                          // calls  equals  in  ColoredPoint(!),
                          // which returns  false (!!)

b.equals(b);   // calls  equals in  ColoredPoint, which
               // calls  hasSameCoordinates  in Point, which
               // calls  equals in  ColoredPoint(!!), which
               // calls  hasSameCoordinates  in Point, which
               // calls  equals in  ColoredPoint, which ... repeats forever  )-:

===================================================
Almost nothing goes correctly, thanks to dynamic lookup! The problem is that the coding of hasSameCoordinates must call the coding of equals within class Point to work correctly. This is destroyed by dynamic method lookup --- what we see in the coding of class Point has no relationship to what the computer does. How can we write programs in a language that we cannot trust with our own eyes?

In this situation, we must draw the storage layout and trace the use of the parentns (super and self) linkages.


6.8 Conclusion

Imperative languages are ``scratchpad languages,'' which treat computation as a game of writing numbers in the squares of a grid-sheet scratchpad, reading the numbers in the squares, and erasing the numbers in the squares and writing new ones in their place. This is the form of computation we do when we keep running totals, like vote counting at an election or keeping score at a basketball game. It is the form of computation we do when we work with databases, blackboard software architectures, and persistent storage.

Imperative languages are for updating storage in small, baby steps. Perhaps the storage is one big piece of primary memory, with millions of small cells, or perhaps the storage is split into hundreds of objects, each with its own location and each holding just a few cells, or perhaps the storage is the grid of RGB-pixels that lights up your computer's display. In any case, if the computation requires locating a cell in a storage structure, reading it, and changing it, then you will be using an imperative language to do it.

In the chapters that follow, we consider other views of computation.