Copyright © 2010 David Schmidt

Chapter 6:
The imperative paradigm: From an assignment core to Java


6.1 A core imperative language with declarations and type templates
    6.1.1 Using type templates for both data-typing and storage allocation
    6.1.2 Type-templates and nested data structures
6.2 Data and control-structure extensions
6.3 Component extensions: Abstracts
    6.3.1 Procedures inside structs
    6.3.2 Type-template abstracts: Classes
    6.3.3 Declaration abstracts: Modules
6.4 Component extensions: Parameters
6.5 Component extensions: Blocks
6.6 Subclasses and virtual methods
    6.6.1 Field lookup: static or dynamic?
    6.6.2 Appending structs: mixins
    6.6.3 How method override can go wrong
6.7 Conclusion


When one encounters a massive language, like C++ or C#, for the first time, one is tempted to ask, ``Who thought up this mess?'' Indeed, to understand a programming language, one must look past piles of syntax and identify the language's structure. The principles we studied in the previous chapters let us do that.

Languages that do computation with storable values are called imperative languages, because commands (like assignment) give orders --- imperatives --- about updating storage. This chapter presents the core of an imperative language and applies the extension techniques to grow the language into a modern, object-oriented language like Java or C#.


6.1 A core imperative language with declarations and type templates

We begin with a small extension of the object-oriented language from Chapter 1. In one sentence, the imperative language declares variables that hold ints and handles to arrays and structs. These are the characteristic domains:
expressibles: integers, booleans, and handles to arrays and structs
denotables (declarables): integers and handles to arrays and structs
storables (updatables): these are exactly the denotables.
===================================================

P : Program                   E : Expression
D : Declaration               L : NameExpression
T : TypeTemplate              N : Numeral
C : Command                   I : Identifier

P ::=  D ; C

E ::=  N  |  L  |  E1 + E2  |  E1 == E2  |  not E  |  new T  |  nil

C ::=  L = E  |  print L  |  C1 ; C2  |  if E { C1 } else { C2 }  |  while E { C }

D ::=  var I = E  |  D1 ; D2

T ::=  struct D end  |  array[N] of E

L ::=  I  |  L . I  |  L [ E ]

N ::=  0 | 1 | 2 | ...

I ::=  alphanumeric strings beginning with a letter, not including keywords

===================================================
This is an assignment language, and it is also an object language (notice expression new T), but the most interesting part of modern assignment languages is how variables are declared and how their storage is allocated. In our example core language, we write
var x = 3;
var z = new struct var f = 0; var g = 1 end;
var r = new array[4] of 0
to create three variable names in the program's namespace: x, which holds value 3; z, which holds a handle to a newly created struct object that holds ints f = 0 and g = 1; and r, which holds a handle to a newly created four-celled array object initialized to all 0s. Here's a picture:

Now, this next point is really important, and it is a key feature of modern imperative (object-oriented) programming: a TypeTemplate phrase is a storage-allocation template. Consider the above example one more time:

  1. new struct var f = 0; var g = 1 end allocates storage for an object (struct) that holds two ints. It is a storage-allocation template for allocating an object that holds two cells.
  2. new array[4] of 0 allocates storage for 4 ints, initialized to 0. It is a storage-allocation template for allocating a vector of 4 cells.
Type-template phrases define object-allocation layouts. When we write new T, we activate T to construct an object with layout T. When we use var I = new T, we create a name, I, along with a cell that holds the handle to the object allocated from T.
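The idea that new T activates a template to allocate an object, and that var I = new T stores the object's handle in I's cell, can be sketched in Python. This is only an illustration under assumptions: the names heap, allocate, new_struct, and new_array are invented here, and handles are modelled as strings.

```python
# A heap maps handles to objects; a namespace maps variable names to values.
heap = {}          # handle -> object (a dict for a struct, a list for an array)
next_handle = 0    # counter for generating fresh handles (an assumption of this sketch)

def allocate(obj):
    """Store obj in the heap and return its fresh handle."""
    global next_handle
    handle = "h" + str(next_handle)
    next_handle += 1
    heap[handle] = obj
    return handle

def new_struct(fields):
    """Activate a struct template: allocate one object holding a copy of the fields."""
    return allocate(dict(fields))

def new_array(size, initial):
    """Activate an array template: allocate `size` cells, each holding `initial`."""
    return allocate([initial] * size)

namespace = {}
namespace["x"] = 3                               # var x = 3
namespace["z"] = new_struct({"f": 0, "g": 1})    # var z = new struct var f = 0; var g = 1 end
namespace["r"] = new_array(4, 0)                 # var r = new array[4] of 0

print(namespace)
print(heap[namespace["z"]], heap[namespace["r"]])
```

In this sketch the namespace cells for z and r hold only handles; the two-celled struct and the four-celled array live in the heap.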


6.1.1 Using type templates for both data-typing and storage allocation

Let's contrast the above fragment,

var x = 3;
var z = new struct var f = 0; var g = 1 end;
var r = new array[4] of 0
to that in C and Pascal. To obtain the same allocations in C, one would write
===================================================

int x = 3;
struct MyStruct {int f; int g;};   // you first name the allocation template;
struct MyStruct z = {0, 1};        // then you use it to declare (and initialize) a struct
int r[4] = {0};                    // the initializer places 0s in all of  r's  cells

===================================================

Like our model core language at the beginning of this chapter, C (and Pascal and Modula and Ada) use type-template phrases to allocate the correct amount of storage for struct z and array r. But these languages also use the type-template phrases as data-type names, that is, ``z has data type MyStruct'' and ``r has data type int[4].''

The data-type names are used in the rest of the program to confirm that the variables are used properly. For example, C's type checker will validate that this assignment is consistent with z's and r's declarations:

z.f = r[2] + 1
but this one is not:
x = z[3] - r    // this is a ``type error''
Since TypeTemplate phrases can be used as data-type names as well as storage-allocation templates, we call them ``type templates.'' But remember: you can switch off C's type checker and then you are using the TypeTemplate phrases for their primary use --- to allocate storage.
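To make the type checker's job concrete, here is a minimal Python sketch of a checker for the two assignments above. The encoding of types and expressions as tuples, and the names env, type_of, and check_assign, are assumptions of this sketch; a real C compiler works differently in detail.

```python
# Types: "int", ("struct", {field: type}), ("array", size, element_type)
env = {
    "x": "int",
    "z": ("struct", {"f": "int", "g": "int"}),
    "r": ("array", 4, "int"),
}

def type_of(expr):
    """Compute the type of an expression tree, or raise TypeError.

    Expressions: ("num", n), ("var", name), ("dot", e, field),
                 ("index", e1, e2), ("add", e1, e2), ("sub", e1, e2)
    """
    tag = expr[0]
    if tag == "num":
        return "int"
    if tag == "var":
        return env[expr[1]]
    if tag == "dot":
        t = type_of(expr[1])
        if isinstance(t, tuple) and t[0] == "struct" and expr[2] in t[1]:
            return t[1][expr[2]]
        raise TypeError("not a struct with field " + expr[2])
    if tag == "index":
        t = type_of(expr[1])
        if isinstance(t, tuple) and t[0] == "array" and type_of(expr[2]) == "int":
            return t[2]
        raise TypeError("indexing a non-array")
    if tag in ("add", "sub"):
        if type_of(expr[1]) == "int" and type_of(expr[2]) == "int":
            return "int"
        raise TypeError("arithmetic on non-ints")
    raise ValueError("unknown expression " + tag)

def check_assign(target, source):
    """An assignment is well typed when both sides have the same type."""
    return type_of(target) == type_of(source)

# z.f = r[2] + 1
ok = check_assign(("dot", ("var", "z"), "f"),
                  ("add", ("index", ("var", "r"), ("num", 2)), ("num", 1)))

# x = z[3] - r
try:
    check_assign(("var", "x"),
                 ("sub", ("index", ("var", "z"), ("num", 3)), ("var", "r")))
    rejected = False
except TypeError:
    rejected = True

print(ok, rejected)  # True True
```

The checker never computes a value; it consults only the templates recorded for x, z, and r.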

Java and C# also use type templates as both data-type names and storage-allocation templates. Our example looks like this in Java/C#:

===================================================

int x = 3;
class MyStruct {int f = 0; int g = 1;}  // You declare the struct as a class

MyStruct z = new MyStruct();   // Why do we say,  MyStruct,  twice ??

int[] r = new int[4];     // What is the difference between  int[] and int[4] ??
// Java automatically initializes an int-array to all  0s

===================================================
On the last two lines, Java and C# make you use the type template first to attach a data-type name to a variable name and second to allocate the object that the variable names.

In Java, if you write merely,

MyStruct z;
int[] r;
then only the data-type names are attached to z and r and no struct and no array objects are allocated. Instead, z and r each have value null (Java's name for nil, ``no handle'') in the namespace.

One might declare z and r without initial values so that the Java compiler learns their data types for future type checks (e.g., to enforce that only handles to MyStruct objects are assigned to variable z and that only handles to int arrays are assigned to variable r).

We will not devote a lot of time in this chapter to data-type checking with type templates, but their use does not always go smoothly. Here is a standard problem:

===================================================

class MyStructA {int f; int g;}
class MyStructB {int g; int f;}
MyStructA x = new MyStructA();
MyStructB y = x;     // Is this allowed or not ?
                     // After all, the struct has exactly the required fields ?!

===================================================
Java/C# consider this example erroneous; Algol68 does not. At the end of the chapter we study some difficult questions that arise when we use type templates as data-type names.
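The disagreement comes down to which notion of ``same type'' the checker uses. The Python sketch below contrasts two possible rules; the tuple encoding of templates and the function names are assumptions of this sketch, and the structural rule shown (which ignores field order) is only one of several variants a language might adopt.

```python
# Each template is modelled as (name, fields), where fields maps field names to type names.
MyStructA = ("MyStructA", {"f": "int", "g": "int"})
MyStructB = ("MyStructB", {"g": "int", "f": "int"})

def same_by_name(t1, t2):
    """Name equivalence: two templates match only if they carry the same name."""
    return t1[0] == t2[0]

def same_by_structure(t1, t2):
    """Structural equivalence (order-insensitive variant): same fields with the same types."""
    return t1[1] == t2[1]

print(same_by_name(MyStructA, MyStructB))       # False
print(same_by_structure(MyStructA, MyStructB))  # True
```

Under the first rule the assignment of x to y is rejected; under the second it is accepted.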


6.1.2 Type-templates and nested data structures

The syntax of TypeTemplate lets us nest data structures. Here is a struct named database that holds an int and an array of 100 structs:

===================================================

var database = new struct var howmany = 0;
                          var table = new array[100] of
                                          new struct var idnum = 0;
                                                     var balance = 0;
                                              end
                   end;
// insert an entry:
var count = database.howmany;
if not(count == 100) {
    database.table[count].idnum = 9999;
    database.table[count].balance = 25;
    database.howmany = count + 1
}

===================================================

As we have written it, the declaration of database allocates storage for all 100 of the struct objects in table before a single entry has been entered into the database. This is what happens when you code the above declarations in C or Pascal or Ada. Here is a picture of what the above code created:


Using the diagram, we can state precisely what, say, database.table[count].idnum denotes:

The diagram shows that we allocated all the array's structs at once. But this is a waste of storage if the database is mostly empty all the time.

For this reason, Java and C# take the opposite approach --- a nested type template allocates only the topmost level of object. You must use lots of new commands to construct the objects that fill the lower levels. This sounds strange, but programmers quickly learn to do this. Here is the above example written in Java style:

===================================================

// this template can allocate a two-celled struct object:
class ENTRY{ int idnum = 0;
             int balance = 0; }

// this template can allocate a struct that holds one int cell and 100 cells
//  that can hold handles to other objects:
class DB{ int howmany = 0; 
          ENTRY[] table = new ENTRY[100]; }

DB database = new DB();   // allocates one DB object but _no_ ENTRY objects!!!

// now, insert one ENTRY object into  database:
int count = database.howmany;
if (count != 100) {
    ENTRY item = new ENTRY();      // allocate one object only
    item.idnum = 9999;
    item.balance = 25;
    database.table[count] = item;  // store the object's handle in the array
    database.howmany = count + 1;
}

===================================================
Here is a diagram of what this code constructed:

In summary, be careful if you move from Java/C# to C/Pascal/Ada (or vice versa) --- the two families of languages allocate objects differently! (If you want to do ``lazy allocation'' in C, you must use pointers and malloc, somewhat like this:

int* r;              // declares  r  as a pointer
int howmany = 100;   // number of array elements desired
// save in  r  the address of a 100-celled array:
r = (int*)(malloc(howmany * sizeof(int)));
// assign  99  to cell 0 of the array:
r[0] = 99;
)

For now, we will stay with our example core language. In this case, if we wish to allocate the database but not fill it with records, we do this:

===================================================

var database = new struct var howmany = 0;
                          // initialize  table  to 100 cells holding  nil:
                          var table = new array[100] of nil 
                   end;
// insert an entry:
if not(database.howmany == 100) {
    database.table[database.howmany] = new struct var idnum = 999999;
                                                  var balance = 25
                                           end;
    database.howmany = database.howmany + 1
}

===================================================
The result would be the same storage layout that we saw in the previous diagram.
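The difference between the two allocation strategies can be observed by counting objects. The Python sketch below models both; the function names (new_db_eager, new_db_lazy, insert, allocated_entries) are invented for this illustration, and None plays the role of nil.

```python
def new_entry():
    """Allocate one two-celled entry object."""
    return {"idnum": 0, "balance": 0}

def new_db_eager(size):
    """C/Pascal/Ada style: every entry is allocated when the database is."""
    return {"howmany": 0, "table": [new_entry() for _ in range(size)]}

def new_db_lazy(size):
    """Java/C# style: the table starts with `size` cells that hold nil."""
    return {"howmany": 0, "table": [None] * size}

def insert(db, idnum, balance):
    """Fill the next free cell, allocating its entry first if the cell holds nil."""
    count = db["howmany"]
    if count != len(db["table"]):
        if db["table"][count] is None:
            db["table"][count] = new_entry()
        item = db["table"][count]
        item["idnum"] = idnum
        item["balance"] = balance
        db["howmany"] = count + 1

def allocated_entries(db):
    """Count the entry objects that currently exist in the table."""
    return sum(1 for cell in db["table"] if cell is not None)

eager = new_db_eager(100)
lazy = new_db_lazy(100)
insert(eager, 9999, 25)
insert(lazy, 9999, 25)
print(allocated_entries(eager), allocated_entries(lazy))  # 100 1
```

After one insertion, both databases answer the same queries, but the eager one has paid for 100 entry objects and the lazy one for a single entry.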

Review

For the sections that follow, we use this syntax for declarations and type templates:
===================================================

E: Expression          N: Numeral
D: Declaration         T: TypeTemplate
I: Identifier

E ::=  N  | ... |  new T  |  nil
D ::=  var I = E  |  D1 ; D2  |  ...
T ::=  struct D end  |  array[N] of E

===================================================
To recap:

Exercises:

  1. Add Declaration, TypeTemplate, and the new T construction to the interpreter for the object language in Chapter 2.
  2. Here is another, useful syntax for type-templates:
    T ::=  struct D; C end  |  array[N] of E
    
    The C in the struct acts as initialization code. Implement this in your interpreter.
  3. Add type checking to your interpreter, that is, when a variable is declared as var I = new T, then I is saved in the program's namespace with template T and its value. All subsequent assignments to I must be values that have type T. (That is, a declaration, var I = E, computes both the type and value of E, and these are saved in I's cell.)
  4. Now, "split" your interpreter from Exercise (3) into two parts: the first part is a type-checker program: it accepts an operator-tree representation of a program as its input, and it computes only the data-type information of the variables in the program --- it does not compute the values of expressions! If a program can be completely "type-interpreted" with no errors, then the second part, the interpreter from Exercise (1) or (2), is executed on the "type-checked" program. (A compiler-based implementation of a language will do parsing and type checking somewhat like this. The idea is to remove all the types and type checking from the run-time execution.)


6.2 Data and control-structure extensions

Although it is not a topic of emphasis in this chapter, we briefly look at how the core language's data and control structures are extended.

Data-structures

The first language that used structs, COBOL, had a format somewhat like this:
D ::=  var I = E  |  D1 ; D2

T ::=  struct (I : int)+  end  |  array[N] of int
That is, the contents of a struct are limited to just integer variables. Arrays are also integer-only.

The data-structure extension principle suggests that all storable values --- all data structures --- should be storable as elements within all other data structures. This means we should allow arrays of arrays and structs, and structs of arrays and structs. In C and Pascal, the syntax of data structures evolved to this:

===================================================

D ::=  var I : T  |  D1 ; D2

T ::=  struct D end  |  array[N] of T  |  int

===================================================
That is, var I : T, allocates storage of shape T for I, where T can be an arbitrarily nested data structure. int is the ``starter'' type template.

(Of course, there is a cost to such generalization. For example, the C language is designed so that storage is a sequence of integer cells, and data structures like arrays and structs are an illusionary naming convention for a sequence of contiguous cells. This storage model lets C programs compile into simple, fast target code --- C is meant for writing device drivers, not databases.)

In a later section, when we add procedures, modules, etc., we will see how they can embed within data structures, causing a further generalization.

Control structures

Our core language uses these control structures for commands:
C ::=  . . .  |  C1 ; C2  |  if E { C1 } else { C2 }  |  while E { C }
But there can be control structures at other levels of the language. In fact, we already see one at the Declaration level: The Declaration domain uses the sequencing control structure, ;, which is important because declarations can be initialized like this:
int x = 2;
int y = x * x
The order of the declarations, stated by ;, matters. In some languages, it is useful to have other control structures for declarations (e.g., conditionals) to help decide what to declare.

Since the core language has arrays, it might be useful to introduce a control structure that lets us ``iterate'' (process) an array's elements. Perhaps we add a for-loop, which counts upwards from some lower bound to some upper bound by ones:

C ::=   ... | for I = E1 upto E2 : C
Within the for-loop, variable I is used as an index for locating elements in an array, e.g.,
var evens = new array[9] of 0;
for index = 0 upto 8:
    evens[index] = index * 2
Or perhaps we have an iterator that extracts the array elements, one by one:
foreach element in evens:
    print element * 2
Perhaps structs also require a control structure for iteration. It might look like this:
C ::=   ... | foreach I in L : C
The loop would bind I, in turn, to the name of each field in struct L, and the loop body would use I to index L's fields. Here is an example:
var evens = new struct var a = 0;
                       var b = 2;
                       var c = 4;
            end;
var sum = 0;
foreach k in evens :
    sum = sum + evens.k ;
As an exercise, think about the control structures that might be useful at the level of Expression.
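For comparison, here is how the struct iterator above might behave if the struct is modelled as a Python dictionary from field names to values. This is a sketch under that modelling assumption, not a definition of the construct.

```python
# The struct  evens  is modelled as a dict from field names to values.
evens = {"a": 0, "b": 2, "c": 4}

total = 0
for k in evens:                 # k is bound to each field name in turn: "a", "b", "c"
    total = total + evens[k]    # evens[k] plays the role of  evens.k

print(total)  # 6
```

Note that k ranges over field names, not field values, which is why the body must index the struct with it.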


6.3 Component extensions: Abstracts

The abstraction principle says we may add abstracts for each of the language's domains.


6.3.1 Procedures inside structs

In an earlier chapter, we saw that procedures are just named commands. Here they are, again:
===================================================

D ::=  var I = E  |  D1 ; D2  |  proc I() { C } 

C ::=  L = E  |  C1 ; C2  |  ... |  L()

T ::=  struct D end  |  array [ E1 ] of E2

E ::=  N  |  ...  |  new T  |  nil

L ::=  I  |  L . I  |  L [ E ]

===================================================
The definition construction is proc I(){C} and the calling construction is L().

The syntax for structs, seen above, exposes an important idea --- we can embed procedures within a struct. This insight led to the development of object-oriented programming as first seen in the Simula67 programming language --- objects are structs that contain both procedures and variables. Here is an example:

var init = 0;
var clock = new struct var time = init;
                       proc tick(){ time = time + 1 };
                       proc display(){ print time };
                       proc reset(){ time = 0 }
                end
The procedures inside clock maintain variable, time. The struct is allocated as an object, with handle β:

One by one, the declarations in the struct are executed and their bindings are placed into the active namespace, β. (Note that β sits at the top of the namespace stack while the struct is being constructed; once construction finishes, β is popped.)

The clock object is a namespace, and tick, reset, and display are closures. Say we use clock like this:
clock.tick();
clock.tick();
clock.display()
When one calls clock.tick(), the following happens:
  1. clock is looked up in global namespace α; its value is β. This makes the call into β.tick().
  2. Within namespace β, tick is located; it names the stored procedure at address γ. The computer executes the stored code for tick, as pointed to by closure γ.
  3. A new namespace, ρ, is created for the called procedure, just like we saw in the previous chapter. Note that tick's parentns field (the link to global variables) is set to β. So, tick's code can look up and increment the nonlocal variable, time. Also, tick can call the other procedures that live with it in object β. Here is the storage configuration when the code for tick is executed:
    
    
  4. tick's code executes, and time within namespace β is incremented.
  5. Once tick completes, namespace ρ disappears.
Finally, note that the parentns link within namespace β is set to α; this allows the object to find global variables outside its ownership. (If we don't want an object to see variables outside its own namespace, we set parentns to nil.)

This example shows that

objects are structs; they are implemented as namespaces.
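The steps of the call clock.tick() can be traced in a small Python model. Everything here is a sketch under assumptions: the classes Namespace and Closure are invented for illustration, procedure bodies are modelled as Python functions that receive the call's namespace, and Python references stand in for handles such as β and γ.

```python
class Namespace:
    """A namespace: a table of bindings plus a parentns link for nonlocal lookup."""
    def __init__(self, parentns=None):
        self.bindings = {}
        self.parentns = parentns

    def find(self, name):
        """Return the nearest namespace along the parentns chain that binds `name`."""
        ns = self
        while ns is not None:
            if name in ns.bindings:
                return ns
            ns = ns.parentns
        raise NameError(name)

    def lookup(self, name):
        return self.find(name).bindings[name]

    def update(self, name, value):
        self.find(name).bindings[name] = value

class Closure:
    """A stored procedure: its code plus a link to the namespace where it was declared."""
    def __init__(self, code, parentns):
        self.code = code
        self.parentns = parentns

    def call(self):
        rho = Namespace(parentns=self.parentns)   # step 3: fresh namespace for the call
        self.code(rho)                            # step 4: execute the body
        # step 5: rho is no longer referenced once the call returns

alpha = Namespace()                               # the global namespace
alpha.bindings["init"] = 0

beta = Namespace(parentns=alpha)                  # the clock object
beta.bindings["time"] = beta.lookup("init")       # var time = init
beta.bindings["tick"] = Closure(
    lambda ns: ns.update("time", ns.lookup("time") + 1), beta)
beta.bindings["reset"] = Closure(lambda ns: ns.update("time", 0), beta)
alpha.bindings["clock"] = beta

clock = alpha.lookup("clock")                     # step 1: clock's value is beta
clock.bindings["tick"].call()                     # step 2: find tick inside beta and call it
clock.bindings["tick"].call()
print(beta.bindings["time"])  # 2
```

Because each call's namespace has beta as its parentns, the body of tick finds time in the object and not in the global namespace.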


6.3.2 Type-template abstracts: Classes

The structs we have coded so far are getting hard to read. This is why languages like Pascal, ML, Java, etc., let us name type templates. A type-template abstract is called a class. Here is how we define and call TypeTemplate abstracts:
===================================================

D ::=  var I = E  |  D1 ; D2  |  proc I() { C }  |  class I = T

T ::=  struct D end  |  array [ E1 ] of E2  |  L

===================================================
Note that the L in the syntax rule for T is the construction that calls a named class.

The clock example in the previous section can be written like this:

class CLK = struct var time = 0;
                   proc tick(){ time = time + 1 };
                   proc display(){ print time };
                   proc reset(){ time = 0 }
            end;

var clock = new CLK;
clock.tick()
This indeed looks familiar. Here's what happens in the machine.
  1. CLK is declared, which means a closure object is constructed to hold its code and its link to its global variables:
    
    
  2. Next, clock is declared, which calls and executes CLK: A new namespace is allocated, its parentns link is set, and the namespace's handle is pushed onto the activation stack:
    
    
  3. The declarations listed within CLK's body are evaluated using the new namespace:
    
    
  4. At conclusion, the activation stack is popped, and the handle to the initialized object is assigned to clock in the active namespace:
    
    

(The parentns link saved in CLK's closure is critical to proper variable scoping. Here is a tricky example, unrelated to the above, where the proper x must be incremented when procedure p is called:

var x = 0;
class C1 = struct proc p() { x = x + 1 } // wants to increment global  x
           end;
class C2 = struct var x = 5;
                  var c = new C1;
           end;
var m = new C2;
m.c.p()
The parentns link ensures that m.c.p() increments the global variable x and not the x inside the object named by m. )
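The tricky example can be checked with a small Python model in which a class is a closure (its body plus the namespace where it was declared) and new copies the closure's link into the new object's parentns. The dictionary layout and the names make_class, new, and lookup_ns are assumptions of this sketch.

```python
def lookup_ns(ns, name):
    """Follow parentns links until a namespace that binds `name` is found."""
    while ns is not None:
        if name in ns["vars"]:
            return ns
        ns = ns["parentns"]
    raise NameError(name)

def make_class(body, parentns):
    """A class closure: the template's code plus the namespace where it was declared."""
    return {"body": body, "parentns": parentns}

def new(cls):
    """Activate a class: allocate a namespace whose parentns comes from the closure."""
    obj = {"vars": {}, "parentns": cls["parentns"]}
    cls["body"](obj)
    return obj

glob = {"vars": {"x": 0}, "parentns": None}        # var x = 0

def c1_body(obj):
    def p():
        target = lookup_ns(obj, "x")               # the search starts in the C1 object
        target["vars"]["x"] += 1                   # x = x + 1
    obj["vars"]["p"] = p

C1 = make_class(c1_body, glob)

def c2_body(obj):
    obj["vars"]["x"] = 5                           # var x = 5
    obj["vars"]["c"] = new(C1)                     # parentns of the new object is glob, not obj

C2 = make_class(c2_body, glob)

m = new(C2)
m["vars"]["c"]["vars"]["p"]()                      # m.c.p()
print(glob["vars"]["x"], m["vars"]["x"])           # 1 5
```

If new had instead set the C1 object's parentns to the namespace that was active at the point of allocation (the C2 object), the search would have stopped at the x holding 5.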

Classes are well used for descriptive data structuring. Here is how we declare the database seen earlier:

===================================================

class Entry = struct var idnum = 0;
                     var balance = 0
              end;

class DB = struct var howmany = 0;
                  var table = new array[100] of new Entry 
           end;

var database = new DB;

===================================================
The classes divide the definition into readable substructures. Java people say that variable database ``denotes an object of class DB'' or ``database has type DB.'' But classes are not, strictly speaking, data types, even though Java lets a programmer think they are.

Since a class is a declaration and since a struct holds declarations, we might format the above example more elegantly like this:

===================================================

class DB = struct
             class Entry = struct var idnum = 0;
                                  var balance = 0
                           end;
             var howmany = 0;
             var table = new array[100] of new Entry
           end;

var database = new DB;

===================================================
This makes Entry part of DB, which seems sensible if Entry is not needed in other parts of the program. (And if it is, we can allocate from it like this: new database.Entry.)

Please remember what classes (and TypeTemplate phrases) really are: A class is a storage-allocation template. When we define a class, we define an allocation template, and when we use the keyword, new, we activate the template to allocate storage.

This development shows that a core assignment language with structs already has the computational power to do object-oriented programming --- it is merely a matter of introducing a couple of key abstracts (command abstracts and type-template abstracts) for convenience.


6.3.3 Declaration abstracts: Modules

There are other opportunities for applying the abstraction principle. One candidate is the Declaration domain. This gives a form of module --- a named, compound declaration that can be ``imported'' into a program. The syntax looks like this:
===================================================

D ::=  var I = E  |  D1 ; D2  | ...  |  module I = D  |  import L

===================================================
A module differs from a class because a called module activates a set of declarations at the point where it is imported (called); the importation ``links'' the module's declarations to the program. In contrast, a class is called to allocate storage.

Here is an example to make this point:

===================================================

module M = 
    var x = 0;
    class C = struct var a = 0  end;
    var y = new array[10] of new C;
    proc initialize(){
        x = 0;
        for i = 0 upto 9 { y[i].a = i }
    }

// The above code often resides in a separate file, named  M.

import M;     // activates and links M's declarations to this program
initialize();  // calls the proc as if it were declared in the program
var z = new C;   // class  C  is used as if it were declared in the program
z.a = x + y[0].a

===================================================
Once M is imported, its declarations are linked to the program as if they had been written there in the first place.

Since module importation is a kind of linking, does it make sense to ``import'' twice? That is, can we do this?

import M;
import M
and is it the same as importing M just once? Most languages ignore repeated imports of the same module.

Here is another question: Should we allow this example?

===================================================

module M = var x = 7;

module N = var x = 99;

import M;
import N;
x = x + 1    // which variable  x  is updated?

===================================================
This is a serious issue when a large program is assembled from many modules that are linked together --- there is always a chance that two distinct modules declare the same name. For this reason, most languages require that a module's names are referenced with dot notation, like this:
===================================================

module M = newstruct var x = 7 end;  // might be in a separate file

module N = newstruct var x = 99 end;  // might be in a separate file

import M;
import N;
N.x = M.x + 1

===================================================
Languages that use modules-as-structs often add an operation that ``opens'' the module so that the declarations are exposed like we saw them in the first place:
module M = newstruct var x = 7 end;

from M import *;  // import *   means that all declarations in M are linked
                  // as if they were declared in the program.
x = x + 1
A variation of this operation opens the module for a limited scope:
module M = newstruct var x = 7 end;

module N = newstruct var x = 99 end;

import M;
import N;
with M do {
   N.x = x + 1   // inside the  with M do,  references to  x  mean  M.x
}

Once we add dot indexing to modules, they are looking a lot like classes! Indeed, this is why modules are not part of Java. (But Java does have packages, which are module-like.) Modules are most convenient, however, for linking together large program files into one --- classes do not do this very well.
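The two linking styles discussed in this section can be compared in a short Python sketch. The table modules and the functions import_flat and import_qualified are invented for illustration; in particular, raising an error on a name clash is one possible design choice, not something the text prescribes.

```python
# Each module is modelled as a function that builds its declarations.
modules = {
    "M": lambda: {"x": 7},     # module M = var x = 7
    "N": lambda: {"x": 99},    # module N = var x = 99
}

def import_flat(program_ns, imported, name):
    """Link the module's declarations directly into the program's namespace."""
    if name in imported:                       # a repeated import is ignored
        return
    declarations = modules[name]()
    clashes = set(declarations) & set(program_ns)
    if clashes:                                # assumed policy: reject clashing names
        raise NameError("name clash on " + ", ".join(sorted(clashes)))
    program_ns.update(declarations)
    imported.add(name)

def import_qualified(program_ns, name):
    """Bind the module's name to a struct of its declarations (dot notation)."""
    if name not in program_ns:
        program_ns[name] = modules[name]()

flat, seen = {}, set()
import_flat(flat, seen, "M")
import_flat(flat, seen, "M")                   # second import has no effect
try:
    import_flat(flat, seen, "N")               # N also declares x
    clash = False
except NameError:
    clash = True

qualified = {}
import_qualified(qualified, "M")
import_qualified(qualified, "N")
qualified["N"]["x"] = qualified["M"]["x"] + 1  # N.x = M.x + 1

print(flat, clash, qualified)
```

With qualified import the two variables named x live in separate structs, so the clash never arises.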


6.4 Component extensions: Parameters

In the previous chapter we learned how to add parameters to abstracts. These same techniques are useful for adding parameters to classes and modules. Here is an example of a parameterized class:

===================================================

class Entry = struct var idnum = 0;
                     var balance = 0
              end;

class DB(size) = struct var howmany = 0;
                        var table = new array[size] of new Entry 
                 end;

var database = new DB(100);

===================================================
The parameter, size, lets us adjust the size of a new database.

Many object-oriented languages transmit the argument to a class's constructor method and not to the class itself. Here's how the above example looks in Java:

class Entry { int idnum;  int balance = 0; }

class DB {
    int howmany = 0;
    Entry[] table;

    DB(int size) {
        table = new Entry[size];
    }
}

DB database = new DB(100);
Java requires a constructor method to handle the parameter. Of course, constructor methods let you write other initialization code. Please see the Exercise at the end of this section for a further analysis of this difference.
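One way to read a parameterized class such as DB(size) is as a function from an argument to a template. The Python sketch below follows that reading; modelling a template as a zero-argument allocator, and the names new, Entry, and DB as written here, are assumptions of the sketch.

```python
def new(template):
    """Activate a template (a zero-argument allocator) to build one object."""
    return template()

def Entry():
    """class Entry = struct var idnum = 0; var balance = 0 end"""
    return {"idnum": 0, "balance": 0}

def DB(size):
    """A parameterized class: applying it to an argument yields a template."""
    def template():
        return {"howmany": 0, "table": [new(Entry) for _ in range(size)]}
    return template

database = new(DB(100))      # var database = new DB(100)
small = new(DB(3))
print(len(database["table"]), len(small["table"]))  # 100 3
```

In the constructor-method style, by contrast, the template takes no argument and the argument is consumed by initialization code that runs after allocation.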

Modules might also be parameterized, perhaps by expressions or even type-templates. Consider this example, which defines a database module parameterized on an int and the type-template format to be stored in the database table:

===================================================

module DataBase(size, recordTemplate) =
    var howmany = 0;
    var table = new array[size] of new recordTemplate;

    proc initialize(){
        for i = 0 upto size - 1 {
            table[i].init() }   // oops -- how do we know  recordTemplate  contains proc  init ?
    };

    proc find(index) {
        if index >= 0 and index < size {
            answer = table[index].getVal() };  // how do we know  recordTemplate  contains function  getVal  ?
        return answer
    }

===================================================
If we had this class,
===================================================

class Entry = struct var idnum = 0;
                     var balance = 0;
                     proc init(x,y){
                         idnum = x;
                         balance = y
                     };
                     fun getVal(){ return balance }
              end;


===================================================
we could activate (import) the module like this:
import DataBase(100, Entry)
This allocates storage for howmany and table, where the latter is an array of 100 Entry structs.

The coding of DataBase is suspect --- it assumes that whatever the recordTemplate type-template might be, it includes a procedure named init and a function named getVal. To ensure the security of module DataBase, we should annotate its parameters with these requirements. The data-type-like annotations are called an interface.

Here is a Java-like coding of the interface that we want:

===================================================

interface RecordInterface = struct init: (int * int) -> void;
                                   getVal: void -> int
                            end;

module DataBase(size: int, recordTemplate: RecordInterface) = ... like before ...

===================================================
The interface gives enough information that the programmer or compiler can check that the module is coded sensibly. The annotation of size ensures that an int argument will be bound to it, and the annotation of recordTemplate ensures that a struct object with at least an init procedure and a getVal function will be bound to it.

The type-template argument that is bound to parameter recordTemplate must match the interface; it must ``implement'' it, as one says in Java-speak:

import DataBase(100, Entry)   // Entry  implements (matches) RecordInterface
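A rough idea of what ``matching an interface'' involves can be given in Python. In this sketch the interface records only each member's name and number of parameters (an assumed simplification of the type annotations above), the check inspects a freshly allocated object rather than the template's text as a compiler would, and the names implements, import_database, and Plain are invented for the example.

```python
import inspect

# Assumed encoding: member name -> number of parameters.
RecordInterface = {"init": 2, "getVal": 0}

def implements(template, interface):
    """Check that objects built from `template` supply every member the interface lists."""
    obj = template()
    for name, arity in interface.items():
        member = obj.get(name)
        if not callable(member):
            return False
        if len(inspect.signature(member).parameters) != arity:
            return False
    return True

def Entry():
    """A template with variables idnum and balance, proc init, and function getVal."""
    obj = {"idnum": 0, "balance": 0}
    def init(x, y):
        obj["idnum"] = x
        obj["balance"] = y
    def getVal():
        return obj["balance"]
    obj["init"] = init
    obj["getVal"] = getVal
    return obj

def Plain():
    """A template with the same variables but no procedures."""
    return {"idnum": 0, "balance": 0}

def import_database(size, record_template):
    """Activate the module only when the template argument matches the interface."""
    if not implements(record_template, RecordInterface):
        raise TypeError("recordTemplate does not implement RecordInterface")
    return {"howmany": 0, "table": [record_template() for _ in range(size)]}

print(implements(Entry, RecordInterface), implements(Plain, RecordInterface))  # True False
db = import_database(100, Entry)
```

The check is made once, when the module is linked, so the module's own code can call init and getVal without further tests.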

In a similar way, we can use interfaces for module arguments to modules, to enforce correct linking:

===================================================

interface DataBaseInt = { howmany: int;
                          initialize: void -> void;
                          find: int -> int
                        };

module System(db: DataBaseInt) =   // db binds to a declaration that holds the
                                   // identifiers listed in  DataBaseInt
    var current = 0;
    var value = 0;

    db.initialize();
    current = db.howmany - 1;
    value = db.find(current)
;
 . . .

import System(import DataBase(100, Entry));   // link  DataBase  to  System

===================================================
Features like these are found in the module-oriented languages, Modula2 and Ada.

Exercise

Java and C# use constructor methods in their classes, so that arguments can be used to initialize newly allocated objects, e.g.,
class Entry { int idnum;  int balance = 0; }
class DB {
    int howmany = 0;
    Entry[] table;

    DB(int size) {
        if (size > 0) {
            table = new Entry[size]; }
    }
}
DB database = new DB(100);
In our example object language, we can always add an init procedure, like this:
class Entry = struct var idnum = 0;
                     var balance = 0
              end;

class DB = struct var howmany = 0;
                  var table = nil;
                  proc init(size) { if size > 0 {
                                         table = new array[size] of nil } }
           end;
var database = new DB;
database.init(100)
But consider this form of struct:
===================================================

P ::=  D ; C
D ::=  var I = E  |  D1 ; D2  |  class I1 ( I2 ) of T
E ::=  ...  |  new T
T ::=  ...  |  struct P end  |  array [ E1 ] of E2  |  L ( E )

===================================================
Now structs are collections of declarations followed by initialization code. Recode the above example in this new syntax. Implement it.

This example shows that structs (classes) are encapsulated programs that are executed (via new) and are queried for their answers (via L.E indexing). This is the basis of actor theory, where actors/agents are small programs, like ants in an ant colony, that ``execute in themselves'' and ``communicate'' their answers/knowledge.


6.5 Component extensions: Blocks

We saw in the previous chapter how procedures can declare and use local variables. This idea is used to good effect with classes and modules, so that they can own private variables that cannot be altered from outside the scope of the abstract.

Here is the syntax of declaration blocks and type-template blocks:

===================================================

D ::=  var I = E  |  D1 ; D2  |  class I = T  |  module I = D  |  import L  |  begin D1 in D2 end

T ::=  struct D end  |  array [ E1 ] of E2  |  L  |  begin D in T end

===================================================
Returning to the database example, we can improve the definitions so that the variables owned by the database are made private:
===================================================

class Entry = struct var idnum = -1;
                     var balance = 0;
              end;

class DB(size) = struct
                 begin var howmany = 0;  // these two declarations are private
                       var table = new array[size] of new Entry  // to the struct
                 in 
                       proc find(i){ ...table[i].balance... }
                       proc update(i,...){ ...table[i] ... howmany ... }
                 end end;

module DataBase(max) = begin var mybase = new DB(max)  // this declaration is private
                       in       
                             proc searchDataBase(...):
                                 ...mybase.find(...)...
                           
                             proc processDataBase(...):
                                 ...mybase.update(...)...

                             // but we cannot reference  mybase.howmany
                             // nor  mybase.table
                       end;

import DataBase(100);
DataBase.searchDataBase(...);
DataBase.processDataBase(...)
// but we cannot reference  DataBase.mybase

===================================================
The example shows how protection was placed around private variables howmany and table within DB so that once a DB object is allocated, all uses of the two variables must be made via the public procedures, find and update. The same idea is used to protect the struct, mybase, within module DataBase.
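One way to picture the begin...in...end block is with Python closures, assuming a struct object is a dictionary that holds only its public declarations. This is a sketch, not the chapter's implementation: the parameters idnum and balance and the body of update are invented, since the text leaves them elided.

```python
def new_DB(size):
    # begin ... : private declarations, visible only to the procedures below
    howmany = 0
    table = [{"idnum": -1, "balance": 0} for _ in range(size)]

    # in ... end : public declarations
    def find(i):
        return table[i]["balance"]

    def update(i, idnum, balance):
        nonlocal howmany
        if table[i]["idnum"] == -1:      # first use of this entry
            howmany += 1
        table[i]["idnum"] = idnum
        table[i]["balance"] = balance
        return howmany

    # Only the public names are placed in the object's namespace.
    return {"find": find, "update": update}

mybase = new_DB(100)
mybase["update"](3, 42, 500)
print(mybase["find"](3))      # 500
print("table" in mybase)      # False: table is not a field of the object
```

The private variables live in the namespace of the call to new_DB, which the two procedures reach through their saved links; the returned object never names them.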

Of course, in C# and Java, the keyword, private, is used to label a declaration as local to a block.

The qualification principle makes it possible to encapsulate declarations within components so they are safe from unauthorized use, no matter where the component is inserted into a system. Building large systems is possible only because of the qualification principle.


6.6 Subclasses and virtual methods

Two distinctive features of object-oriented languages are subclasses and virtual methods. These concepts are surprisingly complex, yet their correct use is critical to successful object-oriented programming. We start with a standard example.

GUI-building frameworks contain starter classes for windows, frames, buttons, text entries, and so on. Here is a sample:

===================================================

class Button {
   int x; int y;  // coordinates for the button's position

   proc paint() {  // code for painting the button on the display;
     pass          // the default does nothing
   }          

   proc refresh() {  // code for resetting and repainting the button
     ... technical code that talks to the framework and the OS
     ... x ... y   // references  x  and  y
     paint()       // calls  paint
   }
}

===================================================
The class contains just enough code to generate a blank button in a GUI. Now, a user can build on the code to define a customized button:
===================================================

class MyButton extends Button {   
   int x;  int z;   // some local variables

   proc customize() {   // a method that does some customized action
     ... x ... z
     ... super.y        // super.y  is explained below
   }

   proc paint() {
     ... code that draws a colored, labelled button
   }
}

===================================================
MyButton is a subclass of Button; it extends (adds to) Button with extra fields and methods. When the user states
var b = new MyButton;
this constructs an object that contains both the coding in Button and the coding in MyButton. The user can invoke b.customize() and b.paint() (this calls the newer method, paint, in MyButton!), and b.refresh() (calls the method in Button). But notice that refresh calls paint --- which copy of paint should refresh activate? The designers of GUI frameworks want b.refresh() to call the newer version of paint, the one in MyButton.

In this way, each time b is refreshed, a customized button is drawn. This is the situation provided by Smalltalk and Java. How can this happen, based on the use of closures and parentns links seen so far? Some changes are needed.

Here is a picture of how this situation is implemented. Say we have these two declarations,

var a = new Button;
var b = new MyButton;
Here's what's in the heap:

Variable a names an object in the heap; so does b, but the latter's object is broken into two namespaces, linked together, since it is built in two stages. When b is declared, an object, ρ, holding the namespace of MyButton, is allocated. It is linked to the object, ε, that holds the namespace of a Button. The link, which we used to call parentns, is now called super. This is why we can write super.y in the coding of method customize --- super is the name of the link to the ``super object.'' If we wrote super.x inside method customize, we would force the use of variable x in the superclass even though there is a variable x that is more local. This is how you see super used most often in Java.

There is another key difference in the picture from the previous ones --- notice that the closures for the declared methods do not save an address of where to find global variables. So, we have dynamic scoping, as we will see.

Say that we do this method call:

b.refresh()
Since the active namespace is α, the name b means ρ --- the call is ρ.refresh(). A search of ρ's namespace fails to find method refresh, but a search of the super namespace, ε, finds it. The code for refresh is fetched, and a new namespace, φ, is allocated for the call:
Executing code of  refresh  in  Button:


This is the crucial part: φ's parentns link, which leads to the nonlocal variables, is set to the address of the object named in the method call, b.refresh(). The link is called self or this.

This means refresh will look first in ρ (not ε!) for its nonlocal variables.

Now, the code for refresh calls paint. The active namespace, φ, is searched for paint and then the linked space, ρ is searched. In this way, the code for paint in MyButton is located and it is activated:

Executing the code of  paint  in  MyButton:


Because the self-link is determined only when a method is called, the implementation is called dynamic scoping. (Recall, when the link is saved with the closure when a method is declared, it is static scoping.) Another name for it is virtual method override. As noted earlier, virtual method override is the default in Java and Smalltalk, and it can be activated by the keyword, virtual, in C++ and C#.
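The walk-through above can be sketched in Python, assuming namespaces are dictionaries joined by a super entry and methods are plain functions (not closures) that receive the self link as their first argument. The helpers lookup and call, and the strings the methods return, are invented for illustration.

```python
def lookup(ns, name):
    """Search a namespace, then follow its super links."""
    while ns is not None:
        if name in ns:
            return ns[name]
        ns = ns.get("super")
    raise NameError(name)

def call(obj, name, *args):
    """obj.name(args): the self link is set at call time to obj itself."""
    method = lookup(obj, name)
    return method(obj, *args)

def button_paint(self):
    return "blank button"

def button_refresh(self):
    # paint is searched starting from self, not from Button's namespace
    return "refresh: " + call(self, "paint")

def mybutton_paint(self):
    return "colored, labelled button"

def new_Button():
    return {"super": None, "x": 0, "y": 0,
            "paint": button_paint, "refresh": button_refresh}

def new_MyButton():
    # built in two stages: Button's namespace first, then MyButton's on top
    return {"super": new_Button(), "x": 0, "z": 0, "paint": mybutton_paint}

a = new_Button()
b = new_MyButton()
print(call(a, "refresh"))   # refresh: blank button
print(call(b, "refresh"))   # refresh: colored, labelled button
```

Because button_refresh carries no saved link of its own, the same code reaches a different paint depending on which object the call names.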


6.6.1 Field lookup: static or dynamic?

In the previous example, class Button held fields x and y, and class MyButton held fields x and z. Should field lookup execute the same way as method lookup? For example, if we call b.refresh(), should refresh's code in class Button reference the field x in MyButton (dynamic scoping)? Or should refresh use the x in Button (static scoping)?

In Java, all method calls are dynamically scoped and all field lookups are statically scoped. This means refresh always uses the x and y in class Button. But this means we must retain the parentns links as well as add the self and super links. This is an interesting exercise to implement.

(IMPORTANT NOTE: The Java compiler can compute static field lookups before a program is executed, and it will embed byte code that locates a field without the implementation of parentns links. This optimization is best studied in compiling theory and we will not pursue it here.)

Python has classes, too, and a programmer can choose to use either dynamic or static lookup for either methods or fields: Use self.f for dynamic lookup and use f for static lookup --- that's it. This is my favorite solution to this mess. For example, if refresh references all its methods and fields dynamically, it is written like this:

   proc refresh() {  // code for resetting and repainting the button
     ... self.x ...  self.y
     self.paint() 
   }
and if it references all its methods and fields statically, it is written like this:
   proc refresh() {  // code for resetting and repainting the button
     ... x ... y
     paint() 
   }
and finally, if it references methods dynamically and fields statically, it looks like this:
   proc refresh() {  // code for resetting and repainting the button
     ... x ...  y
     self.paint()
   }
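The same three choices can be tried in actual Python, with one adjustment: a bare name inside a Python method body is resolved through the function's lexical scopes and never through the class, so the static form must name the class explicitly (Button.x, Button.paint(self)). The sketch below assumes x is a class-level field; the method names refresh_dynamic, refresh_static, and refresh_mixed and the values are invented.

```python
class Button:
    x = 1                                       # class-level field

    def paint(self):
        return "blank"

    def refresh_dynamic(self):
        return (self.x, self.paint())           # both searched from the object

    def refresh_static(self):
        return (Button.x, Button.paint(self))   # both fixed to class Button

    def refresh_mixed(self):
        return (Button.x, self.paint())         # static field, dynamic method

class MyButton(Button):
    x = 2

    def paint(self):
        return "colored"

b = MyButton()
print(b.refresh_dynamic())   # (2, 'colored')
print(b.refresh_static())    # (1, 'blank')
print(b.refresh_mixed())     # (1, 'colored')
```

The mixed form corresponds to Java's behavior as described above: overridden methods are found, but Button's code keeps using Button's own field.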

Exercise

Say that we have subclasses and say that all field and method lookups are dynamically scoped. Explain how all objects, even the ones constructed by subclasses, can be implemented as single namespaces. (Hint: use a Python dictionary to implement objects.)


6.6.2 Appending structs: mixins

Subclasses do not arise as a language extension principle --- class extension is a brand new language operation, where structs can be appended together. Returning to our core language, we get subclasses like this: we add an append operation, +, to the syntax of type templates:

===================================================

T ::=  ...  |  struct D end  |  T1 + T2

===================================================
This lets us code structs in stages, e.g., a three-field struct can be assembled like this:
var p = new (struct
              var x = 0;
              var y = 0;
             end       
           +
             struct
              var z = new array[3] of 0;
             end);

p.z[1] = p.x;
The above does not seem so useful, but once we add class names and procedures, it gets interesting:
===================================================

class Button = struct
                 var x = ...
                 var y = ...
                 proc paint() { pass }
                 proc refresh() { ... x ... y ... paint() }
               end

class CustomFeatures = struct
                         var x = ...
                         var z = ...
                         proc customize() { ... x ... z ... super.y }
                         proc paint() { ... }
                       end
                          
class MyButton = Button + CustomFeatures  // aha!

===================================================
Some object-oriented languages use classes exactly as shown here. (The class-fragments are called mix-ins.) But mainstream object-oriented languages (e.g., Java) allow only an ``incremental'' append, where a named ``base class'' is extended with new fields, like we saw in the previous section.

In any case, MyButton is a subclass of Button because it builds on Button --- it has all Button's fields and then some. Subclassing is an abbreviation for appending structs.
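The append operation can be sketched in Python, assuming a type template is a function that allocates a fresh namespace (a dictionary) and that T1 + T2 allocates T1's namespace first and then links T2's namespace to it through super. The helper names struct, append, and lookup are invented, and the field values are placeholders, since the text leaves them as "...".

```python
def struct(make_fields):
    """Build a type template from a function that returns fresh fields."""
    def allocate(super_ns=None):
        ns = make_fields()
        ns["super"] = super_ns          # link to the namespace built earlier
        return ns
    return allocate

def append(t1, t2):
    """T1 + T2: allocate T1's namespace first, then T2's on top of it."""
    def allocate(super_ns=None):
        return t2(t1(super_ns))
    return allocate

def lookup(ns, name):
    """Search a namespace, then its super links."""
    while ns is not None:
        if name in ns:
            return ns[name]
        ns = ns["super"]
    raise NameError(name)

# The three-field struct, assembled in two stages.
p = append(struct(lambda: {"x": 0, "y": 0}),
           struct(lambda: {"z": [0, 0, 0]}))()
lookup(p, "z")[1] = lookup(p, "x")      # p.z[1] = p.x

Button = struct(lambda: {"x": 10, "y": 20})
CustomFeatures = struct(lambda: {"x": 1, "z": 2})
MyButton = append(Button, CustomFeatures)

b = MyButton()
print(lookup(b, "x"))            # 1: the x appended last shadows Button's x
print(lookup(b["super"], "x"))   # 10: super.x reaches Button's x
print(lookup(b, "y"))            # 20: found by following the super link
```

Under these assumptions the ``incremental'' append of mainstream languages is the special case where the left operand is always a named class.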


6.6.3 How method override can go wrong

Dynamic scoping (method override) is a central feature of modern object-oriented programming. But even innocent uses of it can lead to huge trouble. We can see this in Java. Here is a starter Java example, where a class Point, representing a pixel on a graphics display, is extended by class ColoredPoint, which is a point plus RGB-color information. Notice the overridden method, toString. This is an entirely standard use of method overriding:

===================================================

class Point {
    int x;  int y;   // the x,y coordinates of a point

    Point(int initx, int inity) {
        x = initx;  y = inity;
    }

    public String toString()
    { return  "Point: " + x + "," + y; }
}

class ColoredPoint extends Point {
    int[] color;    // the RGB values of a colored point

    ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
        super(initx, inity);   // call  Point's constructor
        color = new int[3];
        color[0] = initr;  color[1] = initg;  color[2] = initb;
    }

    public String toString() {
      return "ColoredPoint: " + super.x + "," + super.y 
             + "; colors: " + color[0] + "," + color[1] + "," + color[2];
    }
}

Point a = new Point(0,0);  // at position 0,0 --- the upper left corner
ColoredPoint b = new ColoredPoint(0,0, 255, 0, 100);   // violet

System.out.println(a.toString());  // prints "Point: 0,0"
System.out.println(b.toString());  // prints "ColoredPoint: 0,0; colors: 255,0,100"

===================================================
When ColoredPoint object b is constructed, it has two methods named toString, but b.toString() does a dynamic lookup and executes the newer version of toString within b, namely the one from class ColoredPoint. In this way, the older method is overridden --- cancelled --- by the newer one.

So far, so good! But strange things can happen with dynamic lookup. Here is the example of Point and ColoredPoint with methods for equality comparison. Because a colored point is a point-plus-color, its equals method is redefined so that whenever a colored point is compared to an ordinary (noncolored) point for equality, the result is false:

===================================================

class Point {
    int x;  int y;

    Point(int initx, int inity) {
        x = initx;  y = inity;
    }

    boolean equals(Point q) {
        return  x == q.x  &&  y == q.y;
    }

    boolean hasSameCoordinates(Point q) {
        return equals(q);
    }
}

class ColoredPoint extends Point {
    int[] color;

    ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
        super(initx, inity);
        color = new int[3];
        color[0] = initr;  color[1] = initg;  color[2] = initb;
    }

    boolean equals(Point q) {
        if (q instanceof ColoredPoint) {
            ColoredPoint c = (ColoredPoint) q;   // cast so that  color  is visible
            return hasSameCoordinates(q)  &&
                color[0] == c.color[0]  &&
                color[1] == c.color[1]  &&
                color[2] == c.color[2];
        }
        else { return false; }   // all Points are nonequal to ColoredPoints
    }
}

===================================================
We coded ColoredPoint so that its equals method overrides the equals method of Point, meaning that the latter is never used within a ColoredPoint object. Within equals of ColoredPoint, notice that hasSameCoordinates checks whether two points have the same x,y coordinates. Our innocent-looking coding goes horribly wrong:
===================================================

Point a = new Point(0,0);
ColoredPoint b = new ColoredPoint(0,0, 255, 0, 100); 

a.hasSameCoordinates(b);  // calls  hasSameCoordinates  in  Point,  which
                          // calls  equals in  Point,  which returns  true

b.equals(a);   // calls  equals in  ColoredPoint, which returns  false
a.equals(b);   // calls  equals in  Point(!) which returns  true (?!)

b.hasSameCoordinates(a);  // calls  hasSameCoordinates  in  Point,  which
                          // calls  equals  in  ColoredPoint(!),
                          // which returns  false (!!)

b.equals(b);   // calls  equals in  ColoredPoint, which
               // calls  hasSameCoordinates  in Point, which
               // calls  equals in  ColoredPoint(!!), which
               // calls  hasSameCoordinates  in Point, which
               // calls  equals in  ColoredPoint, which ... repeats forever  )-:

===================================================
Almost nothing goes correctly, thanks to dynamic lookup! The problem is that the coding of hasSameCoordinates must call the coding of equals within class Point to work correctly. This is destroyed by dynamic method lookup --- what we see in the coding of class Point has no relationship to what the computer does. How can we write programs in a language that we cannot trust with our own eyes?

In this situation, we must draw the storage layout and trace the use of the super and self linkages to understand the consequences of dynamic scoping of virtual methods.
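The trace can be reproduced with a Python transliteration of the two classes, which also shows one possible repair (an assumption of this sketch, not something the Java example does): naming the class in the call, Point.equals(self, q), pins the lookup to Point's coding of equals. The method has_same_coordinates_static is invented for the comparison.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def equals(self, q):
        return self.x == q.x and self.y == q.y

    def has_same_coordinates(self, q):
        return self.equals(q)            # dynamic lookup: may reach an override

    def has_same_coordinates_static(self, q):
        return Point.equals(self, q)     # always Point's coding of equals

class ColoredPoint(Point):
    def __init__(self, x, y, r, g, b):
        super().__init__(x, y)
        self.color = [r, g, b]

    def equals(self, q):
        if isinstance(q, ColoredPoint):
            return self.has_same_coordinates(q) and self.color == q.color
        return False                     # all Points are nonequal to ColoredPoints

a = Point(0, 0)
b = ColoredPoint(0, 0, 255, 0, 100)

print(a.equals(b), b.equals(a))          # True False
print(b.has_same_coordinates(a))         # False
print(b.has_same_coordinates_static(a))  # True
try:
    b.equals(b)
except RecursionError:
    print("b.equals(b) never terminates")
```

If ColoredPoint's equals called has_same_coordinates_static instead, the mutual recursion would disappear, because the coordinate check would no longer pass through the overriding equals.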


6.7 Conclusion

Imperative languages are ``scratchpad languages,'' which treat computation as a game of writing numbers in the squares of a grid-sheet scratchpad, reading the numbers in the squares, and erasing the numbers in the squares and writing new ones in their place. This is the form of computation we do when we keep running totals, like vote counting at an election or keeping score at a basketball game. It is the form of computation we do when we work with databases, blackboard software architectures, and persistent storage.

Imperative languages are for updating storage in small, baby steps. Perhaps the storage is one big piece of primary memory, with millions of small cells, or perhaps the storage is split into hundreds of objects, each with its own location and each holding just a few cells, or perhaps the storage is the grid of RGB-pixels that lights up your computer's display. In any case, if the computation requires locating a cell in a storage structure, reading it, and changing it, then you will be using an imperative language to do it.

In the chapters that follow, we consider other views of computation.