Chapter 6:
The imperative paradigm: From an assignment core to Java

6.1 A core imperative language with declarations and type templates
    6.1.1 Using type templates for storage allocation and type checking: Comparison to C/Java/C#
    6.1.2 Nested data structures
    6.1.3 Additional data and control-structure extensions
    6.1.4 Review of key concepts: declarations and type templates
6.2 Component extensions: Abstracts
    6.2.1 Procedures inside structs
    6.2.2 Type-template abstracts: Classes
    6.2.3 Compiling class code
    6.2.4 Declaration abstracts: Modules
6.3 Component extensions: Parameters
6.4 Component extensions: Blocks
6.5 Subclasses, virtual methods, and this
    6.5.1 Compiling subclasses
    6.5.2 Field lookup: static or dynamic?
    6.5.3 Foundations of subclasses
    6.5.4 Appending structs: mixins
    6.5.5 How method override can go wrong
6.6 Conclusion

When one encounters a massive language, like C++ or C#, for the first time, one is tempted to ask, ``Who thought up this mess?'' Indeed, to understand a programming language, one must look past piles of syntax and identify the language's structure. The principles we studied in the previous chapters let us do that.

Languages that do computation with storable values are called imperative languages, because commands (like assignment) give orders --- imperatives --- about updating storage. This chapter presents the core of an imperative language and applies the extension techniques to grow the language into a modern, object-oriented language like Java or C#.

6.1 A core imperative language with declarations and type templates

We begin with the object language from Chapter 2, extended with arrays and var declarations. These are the characteristic semantic domains:

expressibles (that is, values of expressions): integers, booleans, and handles to arrays and structs

denotables (declarables, that is, values of identifiers): integers and handles to arrays and structs

storables (updatables, that is, values that are stored/updated): these are exactly the denotables.

===================================================

P : Program                   E : Expression
D : Declaration               L : NameExpression
T : TypeTemplate              N : Numeral
C : Command                   I : Identifier

P ::=  D ; C

E ::=  N  |  L  |  E1 + E2  |  E1 == E2  |  not E  |  new T  |  nil

C ::=  L = E  |  print L  |  C1 ; C2  |  if E { C1 } else { C2 }  |  while E { C }

D ::=  var I = E  |  D1 ; D2

T ::=  struct D end  |  array[N] of E

L ::=  I  |  L . I  |  L [ E ]

N ::=  0 | 1 | 2 | ...

I ::=  alphanumeric strings beginning with a letter, not including keywords

===================================================

This is an object language (notice expression new T, where T is a TypeTemplate), and it is used with a heap virtual machine. Objects are just C-style "structs", and arrays are included mostly for amusement, since an array is a "struct" whose "fieldnames" are 0, 1, 2, and so on.

(Historical note: The first programming language with structs ("records") was COBOL --- Common Business-Oriented Language --- which was invented so that accountants could use computers to process personnel files. A struct/record was meant to hold someone's name, address, age, payrate, etc. COBOL was not object-oriented. Like C's structs, a COBOL record is allocated as a sequence of linear cells in storage, and records are allocated at the very start of program execution.)

The most interesting part of a modern assignment language is how variables are declared and how their storage is allocated. In our core language, we write

var x = 3;
var z = new struct var f = 0; var g = 1 end;
var r = new array[4] of 0

to create three variable names in the program's namespace: x, which holds value 3; z, which holds a handle to a newly created struct-object that holds ints f = 0 and g = 1; and r, which holds a handle to a newly created four-celled array object initialized to all 0s.

Here's a picture of how this program would be computed on a heap virtual machine:

Notice how the initial value, 0, was specified for the array's cells: new array[4] of 0. The language is defined so that every variable is declared with a completely specified initial value.

This next point is important --- it is a key feature of modern imperative programming: a TypeTemplate phrase is a storage-allocation template. Consider the example one more time:

new struct var f = 0; var g = 1 end allocates storage for an object (struct) that holds two ints. struct...end is a storage-allocation template for allocating an object that holds two cells.
new array[4] of 0 allocates storage for 4 ints, initialized to 0. array[4] of 0 is a storage-allocation template for allocating a vector of 4 cells.

Type-template phrases define object-allocation layouts. When we write new T, we activate T to construct an object with layout T. When we use var I = new T, we create a name, I, along with a cell that holds the handle to the object allocated from T.
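The storage layout described above can be sketched in Python. This is only a rough model, not the chapter's virtual machine: the names heap, allocate, new_struct, and new_array, and the handle names h0, h1, are assumptions made for illustration.

```python
# Hypothetical model of a heap virtual machine: the heap maps handles to
# objects, and the namespace maps variable names to ints or handles.
heap = {}

def allocate(obj):
    """Store obj in the heap and return a fresh handle (handle names are made up)."""
    handle = "h" + str(len(heap))
    heap[handle] = obj
    return handle

def new_struct(fields):
    """Activate a struct template: the object is a namespace of its fields."""
    return allocate(dict(fields))

def new_array(size, initial):
    """Activate an array template: the object holds `size` initialized cells."""
    return allocate([initial] * size)

# var x = 3;
# var z = new struct var f = 0; var g = 1 end;
# var r = new array[4] of 0
namespace = {}
namespace["x"] = 3
namespace["z"] = new_struct({"f": 0, "g": 1})
namespace["r"] = new_array(4, 0)

print(namespace)   # x holds an int; z and r hold handles
print(heap)        # the objects themselves live in the heap
```

In this model, the cell for z holds only a handle; the two ints f and g live in the heap object that the handle names.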

6.1.1 Using type templates for storage allocation and type checking: Comparison to C/Java/C#

Let's contrast the above fragment,

var x = 3;
var z = new struct var f = 0; var g = 1 end;
var r = new array[4] of 0

to the corresponding declarations in C and in Java/C#. To obtain the same allocations in C, one would write

===================================================

int x = 3;
struct MyStruct { int f; int g; };  // first, name the allocation template
                                    // (a C template cannot carry initial values)

struct MyStruct z = {0, 1};  // use it to allocate a struct with the two fields
int r[4];    // r  is allocated as a 4-celled array

// Now, you must write a for-loop that places 0s in  r's  cells

===================================================

We say z ``has type'' MyStruct and r ``has type'' int[4]. The types help us confirm that the variables are used properly, and C's type checker will validate that this assignment is consistent with z's and r's declarations: z.f = r[2] + 1
but this one is not: x = z[3] - r // this is a ``type error''
In C, the TypeTemplate phrases can be used as data-type names as well as storage-allocation templates. But remember: you can switch off C's type checker and then you are using the TypeTemplate phrases for their primary use --- to allocate storage.

Java and C# also use type-templates as both data-type names and storage-allocation templates. Our example looks like this in Java/C#:

===================================================

int x = 3;
class MyStruct { int f = 0; int g = 1; }  // You declare the struct as a class

MyStruct z = new MyStruct();   // Why do we say,  MyStruct,  twice ?

int[] r = new int[4];          // What's the difference between  int[] and int[4] ?

===================================================

On the last two lines, Java/C# make you use the type-template to attach a data type to a variable name and to allocate the object that the variable names. (And, int[] is not the same thing as int[4]!)

Both C# and Scala let you remove the redundancy (shown here in C# syntax):

var z = new MyStruct();
var r = new int[4];

In Java, if you write merely, MyStruct z; int[] r;
then only the data-type names are attached to z and r and no struct and no array objects are allocated. Instead, z and r each have value nil (``no handle'') in the namespace.

We will not devote a lot of time to data-type checking with type templates, but it does not always go smoothly. Here is a standard problem:

===================================================

class MyStructA { int f; int g; }
class MyStructB { int f; int g; }
MyStructA x = new MyStructA();
MyStructB y = x;     // Is this allowed or not ?

===================================================

Java/C# consider this example erroneous; Algol68 does not.
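The two answers correspond to what is usually called name equivalence (Java/C#) and structural equivalence (Algol68) of types. Here is a minimal Python sketch of the two tests; the StructType representation and both function names are assumptions made for this illustration, not part of any of these languages.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StructType:
    """A named struct type: a class name plus (fieldname, typename) pairs."""
    name: str
    fields: tuple

def name_equivalent(t1, t2):
    """Java/C# style: two struct types match only if they have the same name."""
    return t1.name == t2.name

def structurally_equivalent(t1, t2):
    """Algol68 style: two struct types match if their field lists match."""
    return t1.fields == t2.fields

A = StructType("MyStructA", (("f", "int"), ("g", "int")))
B = StructType("MyStructB", (("f", "int"), ("g", "int")))

print(name_equivalent(A, B))          # False: the assignment y = x is rejected
print(structurally_equivalent(A, B))  # True: the assignment y = x is accepted
```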

Type checking in the core language

The core language defined at the beginning of this chapter declares an integer variable like this: var x = 2 and an array like this: var r = new array[4] of nil, allowing assignments like this: r[1] = r; r[2] = x; x = r;
Java/C#'s compilers prohibit such actions --- once x is initialized with an int, it holds only ints thereafter. Say that we add typing to declarations in the core language at the beginning of the chapter. A data type is not the same as a storage-allocation template, so we really should define syntax domains for datatypes, like this:

===================================================

TD : TypedDeclaration
Y : Datatype

Y ::= int  |  structof TD end  | arrayof Y
TD ::=  I : Y  |  TD1 ; TD2

===================================================

Here is the earlier example, restated:

===================================================

var x : int;     // declare x to hold only ints
x = 3;
var z : structof f: int; g: int end   // declare z to hold only
                                      // structs of f and g
z = new struct var f = 0; var g = 1 end;
var r : arrayof int;    // declare r to hold only arrays of int vars
r = new array[4] of 0;

===================================================

Datatype phrases, Y, are ``patterns,'' whereas type-template phrases, T, are storage-allocation templates --- they are different.

6.1.2 Nested data structures

We can nest data structures. Here is a struct named database that holds an int and a table of up to 100 entries:

===================================================

var max = 100;
var database = new struct var howmany = 0;
                          var table = new array[max] of nil 
                   end;
// insert an entry into the database's table:
var count = database.howmany;
if not(count == max) {
    database.table[count] = new struct var idnum = 9999;
                                       var balance = 25
                                  end;
    database.howmany = count + 1
}

===================================================

Because Java and C# possess classes, which are used as data-type names, the above example becomes more readable:

===================================================

class ENTRY { int idnum = 0;
              int balance = 0; }

class DB { int max = 100;
           int howmany = 0;
           ENTRY[] table = new ENTRY[max];

           void insert(int id, int bal) {
             if (howmany != max) {
                ENTRY item = new ENTRY();
                item.idnum = id;   item.balance = bal;
                table[howmany] = item;
                howmany = howmany + 1;
             }
           }
         }

DB database = new DB();  // allocates one DB object but _no_ ENTRY objects
database.insert(9999, 25);  // insert one ENTRY object

===================================================

Here is a diagram of what the Java code constructed:

Perhaps you took this example for granted. But in non-object languages, like Ada, Modula, Pascal, and C, the entire database and all 100 of its entries are allocated all at once, before a single entry has been entered into the database, like this:

===================================================

var database = new struct var howmany = 0;
                          var table = new array[100] of
                                         // allocate all 100 entries at once:
                                         new struct var idnum = nil;
                                                    var balance = 0;
                                         end
                   end;
// insert data for an entry:
var count = database.howmany;
if not(count == 100) {
    database.table[count].idnum = 9999;
    database.table[count].balance = 25;
    database.howmany = count + 1
}

===================================================

Here is a picture of what the above code created:

In summary, be careful if you move from Java/C# to C/Pascal/Ada (or vice versa) --- the two families of languages allocate objects differently! By the way, if you want to do ``lazy allocation'' in C, you must use pointers and malloc, somewhat like this:

===================================================

#include <stdlib.h>

struct Entry { int idnum; int bal; };  // the type-template

int main(void) {
    struct Entry **table;   // declares  table  as a pointer(!) to an array of pointers
    int howmany = 100;      // number of array elements desired in the table

    // allocate a 100-celled array of pointers and save its address in  table:
    table = malloc(howmany * sizeof(struct Entry *));
    if (table == NULL) return 1;

    struct Entry s = {99, 0};   // allocate a new struct
    table[0] = &s;              // assign the struct's address to cell 0 of the array

    free(table);
    return 0;
}

===================================================
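The difference between the two allocation strategies in this section can be sketched in Python. The dictionaries below stand in for structs, None stands in for nil, and the function names are made up for this illustration.

```python
def new_entry(idnum=0, balance=0):
    """Stand-in for activating the entry struct template."""
    return {"idnum": idnum, "balance": balance}

# Java/C# style: the table's cells start as nil (None); entries come later.
lazy_db = {"howmany": 0, "table": [None] * 100}

# C/Pascal/Ada style: every cell gets its own entry struct up front.
eager_db = {"howmany": 0, "table": [new_entry() for _ in range(100)]}

def insert_lazy(db, idnum, balance):
    count = db["howmany"]
    if count != len(db["table"]):
        db["table"][count] = new_entry(idnum, balance)  # allocate the entry now
        db["howmany"] = count + 1

def insert_eager(db, idnum, balance):
    count = db["howmany"]
    if count != len(db["table"]):
        item = db["table"][count]     # the struct already exists,
        item["idnum"] = idnum         # so we only update its fields
        item["balance"] = balance
        db["howmany"] = count + 1

def count_entries(db):
    """Count how many entry objects are allocated in the table."""
    return sum(1 for cell in db["table"] if cell is not None)

insert_lazy(lazy_db, 9999, 25)
insert_eager(eager_db, 9999, 25)
print(count_entries(lazy_db), count_entries(eager_db))   # 1 100
```

After one insertion, both databases report howmany = 1, but the lazy table holds one entry object while the eager table holds all 100.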

6.1.3 Additional data and control-structure extensions

Although it is not a topic of emphasis in this chapter, we briefly look at how the core language's data and control structures are extended.

Data-structures

The first language that used structs, COBOL, had a format somewhat like this:

D ::=  var I = new T  |  D1 ; D2

T ::=  struct (I : int)+ end  |  array[N] of int

where P+ means one-or-more P phrases. That is, the contents of a struct are limited to just integer variables. Arrays are also integer-only.

The data-structure extension principle suggests that all storable values --- all data structures --- should be storable as elements within all other data structures. This means we should allow arrays of arrays and structs, and structs of arrays and structs. In C and Pascal, the syntax of data structures evolved to this:

===================================================

D ::=  var I : T  |  D1 ; D2

T ::=  struct D end  |  array [ N ] of T  |  int

===================================================

That is, var I : T allocates storage of shape T for I, where T can be an arbitrarily nested data structure. int is the ``starter'' type template.

(Of course, there is a cost to such generalization. For example, the C language is designed so that storage is a sequence of integer cells, and data structures like arrays and structs are an illusionary naming convention for a sequence of contiguous cells. This storage model lets C programs compile into simple, fast target code --- C is meant for writing device drivers, not databases.)

In a later section, when we add procedures, modules, etc., we will see how they can embed within data structures, causing a further generalization.

Control structures

Our core language uses these control structures for commands:

C ::=  . . .  |  C1 ; C2  |  if E { C1 } else { C2 }  |  while E { C }

But there can be control structures at other levels of the language. In fact, we already see one at the Declaration level: The Declaration domain uses the sequencing control structure, ;, which is important because declarations can be initialized like this:

var x = 2;
var y = x * x

The order of the declarations, stated by ;, matters. In some languages, it is useful to have other control structures for declarations (e.g., conditionals) to help decide what to declare.

Since the core language has arrays, it might be useful to introduce a control structure that lets us ``iterate'' (process) an array's elements. Perhaps we add a for-loop, which counts upwards from some lower bound to some upper bound by ones:

C ::=   ... | for I = E1 upto E2 : C

Within the for-loop, variable I is used as an index for locating elements in an array, e.g.,

var evens = new array[9] of 0;
for index = 0 upto 8: evens[index] = index * 2

Or perhaps we have an iterator that extracts the array elements, one by one:

foreach element in evens: print element * 2

Perhaps structs also require a control structure for iteration. It might look like this:

C ::=  ... | foreach I in L : C

The loop would bind I to each field name of struct L, in turn, and I would be used to index L's fields. Here is an example:

var evens = new struct var a = 0; var b = 2; var c = 4; end;
var sum = 0;
foreach k in evens : sum = sum + evens.k ;
As an exercise, think about the control structures that might be useful at the level of Expression.

6.1.4 Review of key concepts: declarations and type templates

For the sections that follow, we use this syntax for declarations and type templates:

===================================================

E: Expression          N: Numeral
D: Declaration         T: TypeTemplate
I: Identifier

E ::=  N  | ... |  new T  |  nil
D ::=  var I = E  |  D1 ; D2  |  ...
T ::=  struct D end  |  array [ N ] of E

===================================================

Here is a description of each construction's semantics. (The interpreter that we would write should follow this description closely.)

The meaning of N is the int that N represents.
The meaning of new T is that storage is allocated that matches template T and the handle to that storage is returned as the value.
The meaning of var I = E is that a new variable I and its cell are added to the program's namespace and the value of E is placed in I's cell.
The meaning of D1 ; D2 is that D1's declaration is done first, then D2's.
The meaning of struct D end is a template, that when activated, allocates an object that will be the namespace into which the declarations defined by D are saved.
The meaning of array[N] of E is a template, that when activated, allocates an object that holds N-many cells. For each cell, expression E is computed and its value is saved in the cell.
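The semantics listed above can be sketched as a small Python interpreter over operator trees. This is one possible reading, not the Chapter 2 interpreter: the tuple encoding of trees, the handle names, the helper seq, and the simplification of looking identifiers up through a stack of namespaces are all assumptions made for this sketch.

```python
heap = {}       # handle -> object: a dict for a struct, a list for an array
ns_stack = []   # handles of namespaces; the last one is the active namespace

def allocate(obj):
    """Place obj in the heap and return a fresh handle (naming is made up)."""
    handle = "h" + str(len(heap))
    heap[handle] = obj
    return handle

def lookup(name):
    """Find name in the active namespace, else in the ones beneath it."""
    for handle in reversed(ns_stack):
        if name in heap[handle]:
            return heap[handle][name]
    raise NameError(name)

def eval_expr(e):
    """E ::= N | I | new T | nil  encoded as  int | ("id", I) | ("new", T) | None."""
    if e is None or isinstance(e, int):
        return e
    if e[0] == "id":
        return lookup(e[1])
    if e[0] == "new":
        return activate(e[1])
    raise ValueError("unknown expression: %r" % (e,))

def activate(t):
    """T ::= struct D end | array[N] of E  encoded as ("struct", D) | ("array", N, E)."""
    if t[0] == "struct":
        handle = allocate({})      # the new object is a namespace
        ns_stack.append(handle)    # D's declarations are saved into it
        exec_decl(t[1])
        ns_stack.pop()
        return handle
    if t[0] == "array":
        # E is computed once per cell, so "of new T" yields N distinct objects
        return allocate([eval_expr(t[2]) for _ in range(t[1])])
    raise ValueError("unknown template: %r" % (t,))

def exec_decl(d):
    """D ::= var I = E | D1 ; D2  encoded as ("var", I, E) | ("seq", D1, D2)."""
    if d[0] == "var":
        heap[ns_stack[-1]][d[1]] = eval_expr(d[2])
    elif d[0] == "seq":
        exec_decl(d[1])            # D1 is done first,
        exec_decl(d[2])            # then D2
    else:
        raise ValueError("unknown declaration: %r" % (d,))

def seq(first, *rest):
    """Helper (hypothetical): chain declarations with the ; operator."""
    return first if not rest else ("seq", first, seq(*rest))

program = seq(
    ("var", "x", 3),
    ("var", "z", ("new", ("struct", seq(("var", "f", 0), ("var", "g", 1))))),
    ("var", "r", ("new", ("array", 4, 0))),
    ("var", "t", ("new", ("array", 2, ("new", ("struct", ("var", "a", ("id", "x"))))))),
)

globals_handle = allocate({})
ns_stack.append(globals_handle)
exec_decl(program)
print(heap)
```

The last declaration shows why the array template recomputes E for each cell: the two cells of t hold handles to two separate structs, each with its own field a.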

Exercises:

1. Add Declaration, TypeTemplate, and the new T construction to the interpreter for the object language in Chapter 2.
2. Here is another, useful syntax for type-templates:
   T ::= struct D; C end | array[N] of E
   The C in the struct acts as initialization code. Implement this in your interpreter.
3. Add type checking to your interpreter, that is, when a variable is declared as var I = new T, then I is saved in the program's namespace with template T and its value. All subsequent assignments to I must be values that have type T. (That is, a declaration, var I = E, computes both the type and value of E, and these are saved in I's cell.)
4. Now, "split" your interpreter from Exercise (3) into two parts: the first part is a type-checker program: it accepts an operator-tree representation of a program as its input, and it computes only the data-type information of the variables in the program --- it does not compute the values of expressions! If a program can be completely "type-interpreted" with no errors, then the second part, the interpreter from Exercise (1) or (2), is executed on the "type-checked" program. (A compiler-based implementation of a language will do parsing and type checking somewhat like this. The idea is to remove all the types and type checking from the run-time execution.)

6.2 Component extensions: Abstracts

The abstraction principle says we may add abstracts for each of the language's domains.

6.2.1 Procedures inside structs

In an earlier chapter, we saw that procedures are just named commands. Here they are, again:

===================================================

D ::=  var I = E  |  D1 ; D2  |  proc I() { C } 

C ::=  L = E  |  C1 ; C2  |  ... |  L()

T ::=  struct D end  |  array [ E1 ] of E2

E ::=  N  |  ...  |  new T  |  nil

L ::=  I  |  L . I  |  L [ E ]

===================================================

The definition construction is proc I(){C} and the calling construction is L().

The syntax for structs, seen above, exposes an important idea --- we can embed procedures within a struct. This insight led to the development of object-oriented programming as first seen in the Simula67 programming language --- objects are structs that contain both procedures and variables. Here is an example:

var init = 0;
var clock = new struct var time = init;
                       proc tick(){ time = time + 1 };
                       proc display(){ print time };
                       proc reset(){ time = 0 }
                end

The procedures inside clock maintain the variable time. The struct is allocated as an object, with handle β:
One by one, the declarations in the struct are executed and placed into the active namespace, β. (While it is being constructed and initialized, β lives at the top of the namespace stack. Once the struct is constructed, β is popped.)
The clock object is a namespace, and tick, reset, and display are closures. Say we use clock like this: clock.tick(); clock.tick(); clock.display()
When one calls clock.tick(), the following happens:

clock is looked up in global namespace α; its value is β. This makes the call into β.tick().
Within namespace β, tick is located; it names the stored procedure at address γ. The computer executes the stored code for tick, as pointed to by closure γ.
A new namespace, ρ, is created for the called procedure, just like we saw in the previous chapter. Note that tick's parentns field (the link to global variables) is set to β. So, tick's code can look up and increment the nonlocal variable, time. Also, tick can call the other procedures that live in object β. Here is the storage configuration when the code for tick is executed:
(In Java/C#, the parentns link inside namespace ρ is called ``this,'' since namespace β is ``this object'' that tick updates.)
tick's code executes, and time within namespace β is incremented.
Once tick completes, ρ disappears from the activation stack, ns.

Finally, note that the parentns link within namespace β is set to α; this allows the object to find global variables outside its ownership. (If we don't want an object to see variables outside its own namespace, we set parentns to nil.)

The example shows that

objects are structs; they are implemented as namespaces.
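The call sequence for clock.tick() can be sketched in Python. The classes Namespace and Closure and the function call are hypothetical names for this sketch, and each procedure body is written as a Python function that receives its activation namespace.

```python
class Namespace:
    """A namespace object with a parentns link to its enclosing namespace."""
    def __init__(self, parentns=None):
        self.bindings = {}
        self.parentns = parentns

    def find(self, name):
        """Return the nearest namespace on the parentns chain that binds name."""
        ns = self
        while ns is not None:
            if name in ns.bindings:
                return ns
            ns = ns.parentns
        raise NameError(name)

class Closure:
    """A stored procedure: its code plus the namespace where it was declared."""
    def __init__(self, code, parentns):
        self.code = code
        self.parentns = parentns

def call(obj, procname, output):
    closure = obj.bindings[procname]             # locate the proc within the object
    rho = Namespace(parentns=closure.parentns)   # fresh activation namespace
    closure.code(rho, output)                    # run the body; rho is then dropped

def tick_code(ns, output):       # time = time + 1
    owner = ns.find("time")      # time is nonlocal: found through parentns
    owner.bindings["time"] = owner.bindings["time"] + 1

def display_code(ns, output):    # print time
    output.append(ns.find("time").bindings["time"])

alpha = Namespace()                   # the global namespace
alpha.bindings["init"] = 0            # var init = 0
beta = Namespace(parentns=alpha)      # the struct object being constructed
beta.bindings["time"] = beta.find("init").bindings["init"]   # var time = init
beta.bindings["tick"] = Closure(tick_code, beta)
beta.bindings["display"] = Closure(display_code, beta)
alpha.bindings["clock"] = beta        # var clock = new struct ... end

printed = []
clock = alpha.bindings["clock"]
call(clock, "tick", printed)
call(clock, "tick", printed)
call(clock, "display", printed)
print(printed)   # [2]
```

Because each closure records β as its parentns, the activation namespace ρ created for a call can reach time without the procedure's code naming the object explicitly.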

6.2.2 Type-template abstracts: Classes

We can name type-template phrases; they are classes. Here is how we define and call TypeTemplate abstracts:

===================================================

D ::=  var I = E  |  D1 ; D2  |  proc I() { C }  |  class I = T

T ::=  struct D end  |  array [ E1 ] of E2  |  L

===================================================

where L ::= I | L . I | L [ E ] is the same as before. The L in the syntax rule for T calls a named class. The clock example in the previous section can be written like this:

class CLK = struct var time = 0;
                   proc tick(){ time = time + 1 };
                   proc display(){ print time };
                   proc reset(){ time = 0 }
            end;

var clock = new CLK;
clock.tick()

This looks familiar. Here's what happens in the machine.

1. CLK is declared, which means a closure object is constructed to hold its code and its link to its global variables:
2. Next, clock is declared, which calls and executes CLK: This works like before: once CLK's closure is fetched and its code extracted, a new namespace is allocated, its parentns link is set, and the namespace's handle is pushed onto the activation stack:
   Notice the red annotation I placed on the object's parentns field --- when we later study subclasses, the parentns field will be replaced/renamed by super, which binds to the handle of the object that holds nonlocal variables (of the "super object").
3. The declarations listed within CLK's body are evaluated using the new namespace:
4. At conclusion, the activation stack is popped, and the handle to the initialized object, β, is assigned to clock in the active namespace:
5. When clock.tick() is called, its activation namespace is allocated:
   Notice the red annotation I placed on the activation's parentns field --- when we later study virtual method calls, the parentns field will be replaced/renamed by this, which binds to the handle of the object whose fields are altered by tick.

If you are a Java/C# user, you are looking for the constructor method in the previous example. (The constructor method would be called at Step 4, above.) A constructor method is used to initialize an object's fields. But there is a simple alternative to a constructor method: here is how you do it, C++/Scala style --- allow initialization code in the struct:

T ::= struct D ; C end  |  . . .

The initialization commands, C, execute immediately after the class's fields, D, are declared. Once we allow parameters to classes, we get this correspondence:

class CLK(start) = struct var time = 0;
                          proc tick(){ time = time + 1 };
                          proc display(){ print time };
                          proc reset(){ time = 0 };
                          // initialization code:
                          time = time + start;
                          display();
                   end;

which corresponds to this Java-style code:

class CLK { int time = 0;
            void tick() { time = time + 1; }
            void display() { System.out.println(time); }
            void reset() { time = 0; }
            // constructor method:
            public CLK(int start) { time = time + start;  display(); }
          }

Java uses constructor methods so that the compiler can generate simpler target code.

The above example revealed something even more important. When we revise the core syntax to look like this:

===================================================

P: Program            B: Block (THIS ONE IS NEW)
D: Declaration        T: TypeTemplate
E: Expression         etc.

P ::=  B
B ::=  D ; C

D ::=  var I = E  |  D1 ; D2  |  proc I() { C }  |  class I = T
T ::=  struct B end  |  array[ E1 ] of E2  |  L

E ::=  ...  |  new T

===================================================

making a struct hold a Block, we discover that the body of a struct is a "little program" that is waiting to be executed! (You can also say that a program is a "big struct"!)

This is the intuition behind the original "actor" programming paradigm --- an executing program generates "actor objects" that "live", "talk to each other", and "die" (but leave behind memory-objects that can be read by other actors).

HISTORICAL NOTE: Simula67 was designed for simulations (weather simulations, airplane-flight simulations, traffic simulations). For example, a traffic simulation might be modelled with car objects constructed from class Car, and highway segments and entrance/exit ramps constructed from class Queue. Simula also had "coroutines" to model multiple object activity. Simula led to Smalltalk and C++. The former went further by eliminating primitive values for int and bool and keeping only objects. The latter was a faithful addition of the class construction to C. The actor paradigm was developed at the same time as Simula.

Classes are well used for descriptive data structuring. Here is the earlier database example rewritten, using nested classes and parameters, Scala-style:

===================================================

class DB(maxEntries) = 
  struct
     class Entry(id, bal) = struct var idnum = id;
                                   var balance = bal;
                            end;
     var howmany = 0;
     var table = new array[maxEntries] of nil;  // fill me later with Entry obs

     proc addEntry(id) { if (howmany != maxEntries) :
                            var e = new Entry(id, 0);
                            table[howmany] = e;
                            howmany = howmany + 1;
                            return e
                         end;  }                      
  end;
var database = new DB(1000);
 ...
var myentry = database.addEntry(9999);
 ...
var e = new database.Entry(8888, 30);

===================================================

Remember: A class is a storage-allocation template (perhaps with "initialization code"). When we define a class, we define a template, and when we use the keyword, new, we activate the template to allocate storage (and its initialization code) --- we execute a "little program" that manufactures its own little "memory".

This development shows that a core assignment language with structs already has the computational power to do object-oriented programming --- it is merely a matter of introducing a couple of key abstracts (command abstracts and type-template abstracts) for convenience.
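To make the idea of "a class is a named template that new activates" concrete, here is a rough Python sketch of the parameterized CLK class with its initialization code. The functions make_class and new, the dictionary layout, and the separate env for parameters are assumptions made for this sketch, not the chapter's implementation.

```python
def make_class(params, body, parentns):
    """class I(params) = struct ... end: save the template and its global link."""
    return {"params": params, "body": body, "parentns": parentns}

def new(cls, *args):
    """new I(args): activate the template to allocate and initialize an object."""
    obj = {"parentns": cls["parentns"]}
    env = dict(zip(cls["params"], args))   # bind parameters to arguments
    cls["body"](obj, env)                  # declarations, then initialization code
    return obj

printed = []   # collects what "print" would show

def clk_body(obj, env):
    obj["time"] = 0                                           # var time = 0
    obj["tick"] = lambda: obj.update(time=obj["time"] + 1)    # proc tick()
    obj["display"] = lambda: printed.append(obj["time"])      # proc display()
    obj["reset"] = lambda: obj.update(time=0)                 # proc reset()
    # initialization code:  time = time + start; display()
    obj["time"] = obj["time"] + env["start"]
    obj["display"]()

globals_ns = {}
globals_ns["CLK"] = make_class(["start"], clk_body, globals_ns)

clock1 = new(globals_ns["CLK"], 5)    # runs the "little program" once
clock2 = new(globals_ns["CLK"], 10)   # and again, for a separate object
clock1["tick"]()
clock1["display"]()
clock2["display"]()
print(printed)   # [5, 10, 6, 10]
```

Each use of new runs the template's body afresh, so clock1 and clock2 own separate time cells even though they share one class closure.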

6.2.3 Compiling class code

The previous diagrams show how an interpreter saves class and procedure declarations in closures and executes the closures when the classes and procedures are called.

The situation looks different when a compiler is involved. The Java/C# compiler checks grammar and data-type compatibility, and pre-computes as many variable lookups as it can. The compiler generates an output (bytecode or .exe) program whose heap omits closures and whose namespaces omit variable names for classes and methods (procedures). Where there is a call to a class or a procedure, the compiler inserts a gosub jump in place of a namespace lookup for a closure.

Say that the previous example,

class CLK = struct var time = 0;
                   proc tick(){ time = time + 1 };
                   proc display(){ print time };
                   proc reset(){ time = 0 }
            end;

var clock = new CLK;
clock.tick()

is compiled, Java style. Here is what storage looks like when the compiled program executes the call to clock.tick():
The program's global namespace, α, has no entry for CLK --- for the call, new CLK, the compiler inserted a gosub instruction to the address of CLK's code. There is no closure lookup.

Since Java prevents objects from referencing nonlocal variables, the parentns link is not included in the object constructed by new CLK. Further, when the CLK object is allocated, there are no bindings tick, display, and reset, because the compiler embeds gosub instructions for these methods where they are called in the program.

The call to clock.tick() computes clock to be the name of handle, β, which is stored as the this (parentns) link in the activation record for the call to tick.

Many important semantic concepts are missing from the diagram (because they are embedded in the compiled bytecode), but the amount of run-time storage used is significantly less. One drawback of the compact storage layout is that procedures cannot be used with variables, like this: var p = clock.tick; p().
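The compiled layout can be imitated in Python. In this sketch, which is only an analogy and not real compiler output, plain Python functions play the role of code at fixed addresses, and the function names CLK_new, CLK_tick, and so on are invented for illustration.

```python
# "Compiled" CLK: each procedure is plain code at a fixed address (here, a
# Python function) that receives the target object's handle as `this`.
def CLK_new():
    return {"time": 0}        # only the fields: no parentns, no method bindings

def CLK_tick(this):
    this["time"] = this["time"] + 1

def CLK_display(this, output):
    output.append(this["time"])

def CLK_reset(this):
    this["time"] = 0

# Compiled form of:  var clock = new CLK; clock.tick(); clock.display()
printed = []
alpha = {}                              # global namespace: no entry for CLK
alpha["clock"] = CLK_new()              # direct jump to CLK's allocation code
CLK_tick(alpha["clock"])                # direct jump to tick; this = clock's handle
CLK_display(alpha["clock"], printed)

print(printed)                   # [1]
print(sorted(alpha["clock"]))    # ['time']
```

Since the object holds no binding for tick, there is nothing for an expression like clock.tick to fetch at run time, which is the drawback noted above.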

6.2.4 Declaration abstracts: Modules

There are other opportunities for applying the abstraction principle. One candidate is the Declaration domain. This gives a form of module ("package") --- a named, compound declaration that can be ``imported'' into a program. The syntax looks like this:

===================================================

D ::=  var I = E  |  D1 ; D2  | ...  |  module I = D end  |  import L

===================================================

A module differs from a class because a called module embeds a set of declarations at the point where it is imported (called). In contrast, a class is called to allocate a namespace and fill it.

Here is an example to make this point:

===================================================

module M = 
    var x = 0;
    class C = struct var a = 0  end;
    var y = new array[10] of new C;
    proc initialize(){
        x = 0;
        for i = 0 upto 9 { y[i].a = i }
    }
end;

// The above code often resides in a separate file, named  M.

import M;        // embeds M's declarations in this program
initialize();    // calls the proc as if it were declared in the program
var z = new C;   // class  C  is used as if it were declared in the program
z.a = x + y[0].a

===================================================

Once M is imported, its declarations are linked to the program as if they had been written there in the first place.

Since module importation is a kind of linking, does it make sense to ``import'' twice? That is, can we do this?

import M;
import M

and is it the same as importing M just once? Most languages ignore repeated imports of the same module.

Here is another question: Should we allow this example?

===================================================

module M = var x = 7 end;

module N = var x = 99 end;

import M;
import N;
x = x + 1    // which variable  x  is updated?

===================================================

This is a serious issue when a large program is assembled from many modules that are linked together --- there is always a chance that two distinct modules declare the same name. For this reason, most languages require that a module's names are referenced with dot notation, like this:

===================================================

module M = var x = 7 end;  // might be in a separate file

module N = var x = 99 end;  // might be in a separate file

import M;
import N;
N.x = M.x + 1

===================================================

Languages that use dot notation often add an operation that ``opens'' the module so that the declarations are exposed like we saw them in the first place:

module M = var x = 7 end;

from M import *;   // import * means that all declarations in M are linked
                   // as if they were declared in the program.
x = x + 1

A variation of this operation opens the module for a limited scope:

module M = var x = 7 end;
module N = var x = 99 end;

import M;
import N;
with M do {
    N.x = x + 1    // inside the  with M do,  references to  x  mean  M.x
}

We'll continue the development of modules in the next section, because they really benefit from parameters.

6.3 Component extensions: Parameters

In the previous chapter we learned how to add parameters to command procedures. This same technique is useful for adding parameters to classes and modules. We sneaked parameters into several earlier examples with classes, like this:

===================================================

class DB(size) = struct 
                     class Entry(id, bal) = struct var idnum = id;
                                                   var balance = bal
                                            end;
                     var howmany = 0;
                     var table = new array[size] of nil;
                     
                     proc addEntry(id) { 
                        if (howmany != size) :
                            var e = new Entry(id, 0);
                            table[howmany] = e;
                            howmany = howmany + 1;
                            return e
                        end; }                   
                 end;

var database = new DB(100);

===================================================

You can do this in Scala.

Most object-oriented languages make you transmit arguments to a class's constructor method and not to the class itself. Here is how the above example looks in Java:

class Entry { int idnum;  int balance; 
              Entry(int id, int bal) { idnum = id;  balance = bal; }
            }

class DB { int howmany = 0;  Entry[] table;  
           DB(int size) { table = new Entry[size]; }
           void addEntry(int id) { ... }
         }
         
DB database = new DB(100);

Java requires a constructor method to handle parameters. Please see the Exercise at the end of this section for a further analysis of this difference.
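
For comparison, here is a sketch of the same database in Python, which, like Java, routes the arguments through a constructor method (named __init__). The method name add_entry and the choice to return None when the table is full are assumptions made for this illustration.

```python
class Entry:
    def __init__(self, id, bal):      # the constructor receives the arguments
        self.idnum = id
        self.balance = bal


class DB:
    def __init__(self, size):
        self.howmany = 0
        self.table = [None] * size    # plays the role of  new array[size] of nil

    def add_entry(self, id):
        """Store a new Entry if there is room; return it, or None when full."""
        if self.howmany != len(self.table):
            e = Entry(id, 0)
            self.table[self.howmany] = e
            self.howmany = self.howmany + 1
            return e
        return None


database = DB(100)
first = database.add_entry(9999)
print(first.idnum, first.balance, database.howmany)   # 9999 0 1
```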

Semantics of parameter passing to classes

We know that arguments for a procedure call are saved in a namespace (activation record) constructed specially for the call. We can do the same thing for a "class call" or a "module call" --- we can make a separate namespace and place the parameters-and-arguments there. For example, consider the class call to Entry on the last line of

===================================================

class Entry(id, bal) = struct var idnum = id;
                              var balance = bal;
                              proc reset() { balance = bal; }
                       end;
var e = new Entry(9999, 25)

===================================================

The call would construct a namespace that holds id and bal and also a second namespace that holds idnum, balance, reset, and a parentns link to the namespace that holds id and bal. But Scala uses the trick of placing all of id, bal, idnum, balance, and reset in one and the same namespace.
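
The two-namespace layout can be sketched in Python with dictionaries standing in for namespaces. The helper names lookup and new_entry are hypothetical, introduced only for this sketch; the key "parentns" plays the role of the link described above.

```python
def lookup(ns, name):
    """Find name in namespace ns, following parentns links outward."""
    while ns is not None:
        if name in ns:
            return ns[name]
        ns = ns.get("parentns")
    raise NameError(name)


def new_entry(id, bal):
    # Namespace built for the "class call": it holds the parameters.
    params = {"id": id, "bal": bal, "parentns": None}
    # Namespace for the struct itself, linked to the parameter namespace.
    obj = {"parentns": params}
    obj["idnum"] = lookup(obj, "id")
    obj["balance"] = lookup(obj, "bal")

    def reset():
        # bal is not in obj, so the search continues through parentns.
        obj["balance"] = lookup(obj, "bal")

    obj["reset"] = reset
    return obj


e = new_entry(9999, 25)
e["balance"] = 0
e["reset"]()
print(e["idnum"], e["balance"])   # 9999 25
```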

Interfaces to modules

Modules might also be parameterized, perhaps by expressions or even type-templates. Consider this example, which defines a database module parameterized on an int and a type-template:

===================================================

module DataBase(size, recordTemplate) =
    var howmany = 0;
    var table = new array[size] of new recordTemplate;

    proc initialize(){
        for i = 0 upto (size - 1) {
            table[i].init() }   // oops -- how do we know  recordTemplate  contains proc  init ?
    };

    proc find(index) {
        if index >= 0 and index < size {
            answer = table[index].getVal()  // how do we know  recordTemplate  contains function  getVal  ?
        };
        return answer
    }

===================================================

If we had this class,

===================================================

class Entry = struct var idnum = 0;
                     var balance = 0;
                     proc init(x,y){ idnum = x;  balance = y; };
                     fun getVal(){ return balance }
              end;

===================================================

we could activate (import) the module like this: DataBase(100, Entry)

The coding of DataBase is suspect --- it assumes that whatever the recordTemplate type-template might be, it includes a procedure named init and a function named getVal. To ensure the security of module DataBase, we should annotate its parameters with these requirements. The data-type-like annotations are called an interface.

Here is a Java-like coding of the interface that we want:

===================================================

interface RecordInterface = { void init (int, int);
                              int getVal(); 
                            }

module DataBase(int size, RecordInterface recordTemplate) = ... like before ...

===================================================

The interface gives enough information that the programmer or compiler can check that the module is coded sensibly. The annotation of size ensures that an int argument will be bound to it, and the annotation of recordTemplate ensures that a struct object with at least an init procedure and a getVal function will be bound to it.

The type-template argument that is bound to parameter recordTemplate must match the interface; it must ``implement'' it, as one says in Java-speak:

import DataBase(100, Entry)   // Entry  implements (matches) RecordInterface
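
A rough Python analogue, offered as a sketch rather than a translation: typing.Protocol describes the interface, and the parameterized module is modelled as a class whose constructor checks that the template argument matches. Passing 0, 0 to init and returning None for an out-of-range index are assumptions made here, since the original leaves both open.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RecordInterface(Protocol):
    def init(self, x: int, y: int) -> None: ...
    def getVal(self) -> int: ...


class Entry:
    def __init__(self):
        self.idnum = 0
        self.balance = 0

    def init(self, x, y):
        self.idnum = x
        self.balance = y

    def getVal(self):
        return self.balance


class DataBase:
    def __init__(self, size, record_template):
        # Reject a template that lacks  init  or  getVal.
        if not issubclass(record_template, RecordInterface):
            raise TypeError("record template does not match RecordInterface")
        self.table = [record_template() for _ in range(size)]
        for record in self.table:
            record.init(0, 0)          # assumed initial values

    def find(self, index):
        if 0 <= index < len(self.table):
            return self.table[index].getVal()
        return None                    # assumed result for a bad index


db = DataBase(100, Entry)              # Entry matches the interface
print(db.find(0))                      # 0
```

Note that the runtime check only confirms that the two method names are present; the parameter and result types in the protocol are checked by a static type checker, not when the program runs.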

Exercise

Java and C# use constructor methods in their classes, so that arguments can be used to initialize newly allocated objects, e.g.,

class Entry { int idnum;  int balance = 0; }

class DB { int howmany = 0;  Entry[] table;
           DB(int size) { if (size > 0) { table = new Entry[size]; } }
         }

DB database = new DB(100);

In our example object language, we can always add an init procedure, like this:

===================================================

class Entry = struct var idnum = 0;
                     var balance = 0
              end;

class DB = struct var howmany = 0;
                  var table = nil;
                  proc init(size) {
                      if size > 0 { table = new array[size] of nil }
                  }
           end;

var database = new DB;
database.init(100)

===================================================

But consider this form of struct:

===================================================

P ::=  D ; C
D ::=  var I = E  |  D1 ; D2  |  class I1 ( I2 ) = T
E ::=  ...  |  new T
T ::=  ...  |  struct P end  |  array [ E1 ] of E2  |  L ( E )

===================================================

Now structs are collections of declarations followed by initialization code. Recode the above example in this new syntax. Implement it.

This example shows that structs (classes) are encapsulated programs that are executed (via new) and are queried for their answers (via L.E indexing). This is the basis of actor theory, where actors/agents are small programs, like ants in an ant colony, that ``execute in themselves'' and ``communicate'' their answers/knowledge.

6.4 Component extensions: Blocks

We saw in the previous chapter how procedures can declare and use local variables. This idea is used to good effect with classes and modules, so that they can own private declarations that cannot be altered from outside the scope of the abstract.

Here is the syntax of a declaration block, which lets a declaration own private declarations:

===================================================

D ::=  var I = E  |  D1 ; D2  |  class I1 ( I2 ) = T  | ... |  begin D1 in D2 end

T ::=  struct D end  |  array [ E1 ] of E2  |  L ( E )

===================================================

The qualification principle generates the "private" and "public" fields in classes:

===================================================

class CLK(init) = struct 
                    begin var time = init;  // this is a "private" declaration
                    in  // here are the "public" declarations:
                        proc tick(n){ time = time + n };
                        proc reset(){ time = init }
                    end end;
                    
var clock = new CLK(0);
clock.tick(2);  // we cannot say,  clock.time = clock.time + 2

===================================================

(By the way, other visibility labellings, "protected", "package", etc. come about due to subclasses and modules. We won't study these yet.)

Returning to the database example seen above, we can improve the declaration so that the variables owned by the database are private:

===================================================

class Entry(id, bal) = struct var idnum = id;
                              var balance = bal;
                       end;

class DB(size) = struct
                 begin var howmany = 0;  // these two declarations are private
                       var table = new array[size] of nil;
                 in 
                       proc find(i){ ...table[i].balance... };
                       proc update(i,...){ ...table[i] ... howmany ... }
                 end end;

module DataBase(max) = begin var mybase = new DB(max)  // this declaration is private
                       in       
                             proc searchDataBase(...) {
                                 ...mybase.find(...)... };
                           
                             proc processDataBase(...) {
                                 ...mybase.update(...)... }

                             // here, we cannot reference  mybase.howmany
                             // nor  mybase.table
                       end;

import DataBase(100);
DataBase.searchDataBase(...);
DataBase.processDataBase(...);
// but we cannot reference  DataBase.mybase

===================================================

Protection was placed around private variables howmany and table within DB so that once a DB object is allocated, all uses of the two variables must be made via the public procedures, find and update. The same idea is used to protect the struct, mybase, within module DataBase.

Of course, in C# and Java, the keyword, private, is used to label a declaration as local to a block.

The qualification principle makes it possible to encapsulate declarations within components so they are safe from unauthorized use, no matter where the component is inserted into a system. Building large systems is possible only because of the qualification principle.

Semantics of private declarations

We saw that a procedure call's parameters and local variables are saved in the activation record (namespace) that is allocated when the procedure is called. We can do the same for the private variables of classes and modules. Consider the storage layout for the clock example at the beginning of the section, at the point where tick is called:

===================================================

class CLK(init) = struct 
                    begin var time = init;
                    in  proc tick(n){ time = time + n };
                        ...
                    end end;

var clock = new CLK(0);
clock.tick(2);

===================================================

A namespace holds parameter init and private variable time, and that space is linked to by the namespace for object clock. (We will use this same technique for objects built from subclasses!)

In contrast, compiler-based languages like Java and Scala will embed the parameters and private variables in the same namespace as object clock's, because the compiler can enforce the restriction that private variables are not referenced outside class CLK.
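
The separate-namespace layout can be imitated in Python with a closure: the call to the constructor function owns init and time, and the object handed back holds only the public procedures. This is a sketch under that analogy; the function new_clk and the accessor read are hypothetical names, the latter added only so the private value can be observed.

```python
def new_clk(init):
    time = init                  # private: lives in the namespace of this call

    def tick(n):
        nonlocal time
        time = time + n

    def reset():
        nonlocal time
        time = init

    def read():                  # hypothetical accessor, not in the original CLK
        return time

    # The public namespace of the object: procedures only, no  time.
    return {"tick": tick, "reset": reset, "read": read}


clock = new_clk(0)
clock["tick"](2)
print(clock["read"]())           # 2
print("time" in clock)           # False: time cannot be reached through clock
```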

6.5 Subclasses, virtual methods, and `this`

Object-oriented languages use subclasses, virtual methods, and the pronoun keywords, super and this. These concepts are delicate, yet their correct use is critical to successful object-oriented programming. They require some small but crucial alterations to the heap virtual machine.

Review: classes and instance methods

We learned that a procedure declaration, proc p(x,y,...){C}, constructs a closure object for p. The closure holds the param names, x,y,..., the body, C, and a link to the global variables C can reference.

When p is declared inside a struct, p's closure holds the link to the newly allocated struct. This example,

var init = 0;
class CLK() = struct var time = init;
                       proc tick(){ time = time + 1 };
                       proc display(){ print time };
                       proc reset(){ time = 0 }
              end;
var clock = new CLK();

generates a heap image in which the struct allocated for clock lives at handle β and the closure for tick lives at handle γ. The closure for tick holds a link back to β. When we call clock.tick(), the call's activation record sets its parentns link to β, which is extracted from the closure at γ. We say that tick is an instance method of object clock.

The implementation seen here, with closures, is standard to Ruby, Python, Scala, etc. It lets you use closure handles as storable values ("function pointers"), e.g.,

var p = clock.tick;  // sets  p  to handle γ 
p();  // increments  time  in  clock
myGraphicsWindow.setButtonPressEventHandlerTo(p);  // pass as an arg for later use

which is a useful technique for systems programming and event handling.
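
In Python this looks as follows. The sketch assumes a Python version of CLK; the list handlers is a hypothetical stand-in for an event-handler registry such as the one suggested by myGraphicsWindow above.

```python
class CLK:
    def __init__(self):
        self.time = 0

    def tick(self):
        self.time = self.time + 1


clock = CLK()
p = clock.tick                 # p is a closure-like value: code plus a link to clock
p()                            # increments  time  in  clock
print(clock.time)              # 1
print(p.__self__ is clock)     # True: the saved link back to the object

handlers = [p]                 # save the handle for later use
for handler in handlers:
    handler()
print(clock.time)              # 2
```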

What we consider in the next section is a procedure closure that does not save a link to its global variables. When the procedure is called, the parentns link must be assigned from "how the procedure is called." Such a method is called a virtual method. When used with subclasses, virtual methods operate differently than instance methods.

Subclasses

Here is motivation for subclasses: GUI-building frameworks contain starter classes for windows, frames, buttons, text entries, and so on:

===================================================

class Button {
   int x; int y;   // coordinates for the button's position

   proc paint() {  // code for formatting the button:
                   //   makes settings for position, label, font, color, etc.
     ... x ... y   // do some formatting with x and y
   }          
   proc refresh() {  // code for redrawing the button on the display matrix
     paint();  // calls  paint  to format the button
     ... technical code that talks to the framework and the OS
   }
   proc handleButtonPress() {  // code for reacting to a button press
       pass    // The default does nothing
   }
}

===================================================

The class contains just enough code to generate a blank button in a GUI. A user can build on the code to define a useful, customized button:

===================================================

class MyButton extends Button {  // my own customized button
   int z;  // some extra data --- font, color, label, whatever ...

   proc paint() { // replaces the default coding in Button
     ...
     super.paint();  // call the default code to do some things
     ... z           // and then do some customized formatting, with  z
   }
   proc handleButtonPress() { // replaces the default coding in Button
      ...  // do the computation the button is supposed to trigger
   }
}

===================================================

MyButton is a subclass of Button; it extends (adds to) Button with extra fields and methods. When the user states

var b = new MyButton();

this constructs a two-namespace object holding all the fields and methods from both Button and MyButton.

What should these calls do?

b.handleButtonPress() --- calls the coding of handleButtonPress in MyButton.
b.paint() --- calls the coding of paint in MyButton, which itself calls paint in Button (``super.paint()'') --- Button is the superclass and MyButton is the subclass of variable b. (In C#, one says, base.paint().)
b.refresh() --- calls the coding in Button, which itself calls paint() --- which copy of paint should execute?
The designers of GUI frameworks want b.refresh() to call the newer version of paint, in MyButton, so that all b's fields are properly formatted. This is the setup provided by Smalltalk/Java. (And you make this happen in C++/C# when you insert the extra keywords, virtual and override, on the header lines of the two versions of paint.)

How does the virtual machine execute these actions? Some small changes are needed. Say we have these two declarations,

var a = new Button;
var b = new MyButton;

Here's what's in the heap:
Variable a names an object in the heap; so does b, but its object is two namespaces, linked together. The link between the two is called super. This is why we write super.paint() within method paint --- super is the name of the link to the ``super object.'' (In C#, say base.paint().) The objects at ε and η have no super-objects.

There is another key difference --- a closure no longer saves a handle to its global variables --- a closure is now merely some code or a pointer to some code.

Virtual-method call: `b.paint()`

Say we do this method call: b.paint(). Since the active namespace is α, the name b means ρ --- the call is ρ.paint(). We find the entry for paint in namespace ρ, which leads us to the code at σ. We construct the activation record, μ, which must link to paint's global variables. What should the link be? (It's not saved in the closure anymore!) Use ρ, since the call was to ρ.paint()!

Executing b.paint():

The link in activation record μ is named this. (Note: in some object languages, it is named self.)

Because the link to the global variables is determined only when a method is called, the implementation is called dynamic scoping. The method itself is called a virtual method, because it is not completely defined until its this-link is determined, which only happens when the method is called.

Virtual methods are the default in Java and Smalltalk, and they can be coded with the keyword, virtual, in C++ and C#.

Virtual-method call: `b.refresh()`

Next, we do this method call:

b.refresh()

Since the active namespace is α, the name b means ρ --- the call is ρ.refresh(). A search of ρ's namespace fails to find method refresh, but a search of the super namespace, η, finds it. The code for refresh is fetched, and a new namespace, φ, is allocated for the call:

Executing b.refresh():

φ holds a this link to refresh's nonlocal variables: it is again ρ, the handle of b, because the call was b.refresh().

Now, the code for refresh calls paint(). The active namespace, φ, is searched for paint and then the linked space, ρ is searched --- the code for paint in MyButton is activated:

Executing   this.paint():

IMPORTANT: the call, paint(), is "the same as" this.paint() --- the value of this in namespace κ is copied from the value of this in namespace φ.

To summarize:

a virtual method, m, obtains its variable-lookup link, named this, only when it is called, ob.m(...) --- this is set to ob's handle

The above description is the simplest I can give to explain subclasses and virtual methods --- it uses Java's virtual machine. A more modern virtual machine, used by Python and Scala, keeps both a this link and a parentns link in each activation record.
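
The lookup rules just described can be sketched as a toy model in Python. Objects are dictionaries with a "fields" table and a "super" link, method code is a plain function with no saved link, and call supplies this at the moment of the call. All of the names here (lookup, call, trace, and the constructor functions) are hypothetical, and the model is a simplification, not the actual Java virtual machine.

```python
def lookup(obj, name):
    """Search obj's namespace first, then follow its super links."""
    ns = obj
    while ns is not None:
        if name in ns["fields"]:
            return ns["fields"][name]
        ns = ns["super"]
    raise AttributeError(name)


def call(obj, name, *args):
    """Virtual call ob.m(...): 'this' is set to ob's handle when called."""
    code = lookup(obj, name)
    return code(obj, *args)


trace = []   # records which method bodies ran, in order


def button_paint(this):
    trace.append("Button.paint")


def button_refresh(this):
    call(this, "paint")              # paint() means this.paint()
    trace.append("Button.refresh")


def mybutton_paint(this):
    # super.paint(): start the search one namespace up, but keep 'this'.
    # Reading this["super"] is a shortcut that suits only a
    # two-namespace object such as b.
    lookup(this["super"], "paint")(this)
    trace.append("MyButton.paint")


def new_button():
    fields = {"x": 0, "y": 0, "paint": button_paint, "refresh": button_refresh}
    return {"fields": fields, "super": None}


def new_mybutton():
    fields = {"z": 0, "paint": mybutton_paint}
    return {"fields": fields, "super": new_button()}


b = new_mybutton()
call(b, "refresh")
print(trace)   # ['Button.paint', 'MyButton.paint', 'Button.refresh']
```

The trace shows that refresh, although coded in Button, reached the paint of MyButton, because this was the handle of b throughout.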

6.5.1 Compiling subclasses

As usual, the storage layout generated from a compiled Java or C# program is simpler than what we see in the above picture: closures are not saved in the heap. And when an object is constructed from a subclass, only one namespace is allocated, not two. Here is a picture of the storage layout that C# generates for the previous example:
All the information about how virtual method lookups are computed is missing --- the compiler has hidden it in the target code that it generated!

6.5.2 Field lookup: static or dynamic?

In the previous example, class Button held fields x and y, and class MyButton held field z. Should field lookup execute the same way as method lookup? For example, if we call b.refresh(), can refresh's code in class Button reference the field z in MyButton (dynamic scoping)?

What if class MyButton declared its own field, x, and we call b.refresh()? Should refresh's code in class Button reference the x in MyButton (dynamic scoping) or the x in Button (static scoping)?

In the compiler-based languages, Java and C# and Scala, all method calls are dynamically scoped and all field lookups are statically scoped. This means refresh always uses fields in class Button and never in class MyButton. But this causes another odd behavior; consider this example:

===================================================

class C1 {
    int x = 1;
    public int f1() { return this.x; }
}
class C2 extends C1 {
    int x = 2;
    public int f2() { return this.x; }
}
class C3 extends C2 {
    int x = 3;
    public int f3() { return this.x; }
}

C3 c = new C3();
System.out.println(c.f3());   //  prints  3
System.out.println(c.f2());   //  prints  2
System.out.println(c.f1());   //  prints  1

===================================================

Now, each of the calls computes the same handle for c, and each call computes this.x, yet one time the result is 3, one time it's 2, and one time it's 1! The reason is that the compiler generates target code that indexes differently into c's object for different instances of x (and indeed, there are three of them).
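
For contrast, here is a sketch of the same three classes in Python, where a reference written self.x is resolved by searching the object at the time of the call. The fields are written as class attributes to keep the sketch short; that is a choice made for this example.

```python
class C1:
    x = 1

    def f1(self):
        return self.x


class C2(C1):
    x = 2

    def f2(self):
        return self.x


class C3(C2):
    x = 3

    def f3(self):
        return self.x


c = C3()
print(c.f3(), c.f2(), c.f1())   # 3 3 3: every self.x finds the newest x
print(C1.x)                     # 1: naming the class pins down one particular x
```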

This is not a good use of this! As a rule, in Java/C#/Scala, never use this with field names, only with method names.

Python has classes, too, and a programmer can choose to use either dynamic or static scoping for either methods or fields: Use self.f for dynamic lookup and use f for static lookup --- that's it. This is my favorite solution to this mess. For example, if refresh references all its methods and fields dynamically, it is written like this:

   proc refresh() {  // code for resetting and repainting the button
     ... self.x ...  self.y
     self.paint() 
   }

and if it references all its methods and fields statically, it is written like this:

   proc refresh() {  // code for resetting and repainting the button
     ... x ...  y
     paint() 
   }

and finally, if it references methods dynamically and fields statically, it looks like this:

   proc refresh() {  // code for resetting and repainting the button
     ... x ...  y
     self.paint() 
   }

The Python implementation uses namespaces that contain all of parentns, super, and this.

Exercise

Say that we have subclasses and say that all field and method lookups are dynamically scoped. Explain how all objects, even the ones constructed by subclasses, can be implemented as single namespaces. (Hint: use a Python dictionary to implement objects.)

6.5.3 Foundations of subclasses

Classes are structs, and subclasses are structs that can be appended to structs. We develop this idea in careful detail, starting from a small example from Java. It leads to some surprising complications.

First, here is class Point:

===================================================

class Point {
    int x;  int y;  // the x,y coordinates
  
    Point(int initx, int inity) {   // constructor method
        x = initx;  y = inity;
    }

    void paint() {
        System.paintPixel(x, y, 255, 255, 255);   // paints a white pixel at x,y
    }
    boolean equals(Point q) {   // compares this point's location to point  q
        return  x == q.x  &&  y == q.y;
    }
}

===================================================

The class, which might reside within a graphics framework, defines two fields, a method that paints the point, and a method that compares an object constructed from this class to another with the same structure, for example:

Point p1 = new Point(0,0);
Point p2 = new Point(1,1);
p1.paint();
p2.paint();
boolean b = p1.equals(p2);   // will compute to false

The graphics framework might also support color, so there is the notion of a colored point, which is a graphics point (pixel), colored with RGB (red-green-blue) coding:

===================================================

class ColoredPoint extends Point {
    int[] color = new int[3];

    ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
        super(initx, inity);
        color[0] = initr;  color[1] = initg;  color[2] = initb;
    }
    void paint() {
        System.paintPixel(super.x, super.y, color[0], color[1], color[2]);
    }
    boolean equals(ColoredPoint q) {
        return  super.equals(q)  &&  
                color[0] == q.color[0]  &&
                color[1] == q.color[1]  &&
                color[2] == q.color[2];
    }
}

===================================================

A ColoredPoint builds on the structure of a Point, adding a data structure that remembers RGB-values. (Notice how the coding in ColoredPoint can obtain the values of x and y in Point with the label, super.)

Also, ColoredPoint has its own paint method that uses a point's location and its color. The recoding of paint is called a method override, because there is already a coding in the superclass but we are overriding it with a new coding. Method override is a key technique used in graphics libraries written in object-oriented style: a graphics widget, say a Button, is coded with simplistic paint and interrupt-handling methods, which the programmer later overrides with more detailed ones.

In the above example, to compare one colored-point object to another for equality, we must also change equals: we must check that the x,y coordinates are identical (this is done with a call to super.equals --- the equals method in the superclass, Point) and that the RGB integers are equal.

Here are examples of Java points and colored points:

===================================================

Point a = new Point(0,0);  // at position 0,0 --- the upper left corner
ColoredPoint b = new ColoredPoint(0,0, 255, 0, 100);   // violet
ColoredPoint c = new ColoredPoint(0,0, 255, 0, 0);     // red

a.paint();  // paints a white point at 0,0 -- uses  paint in class Point
b.paint();  // paints a violet point at 0,0 -- uses  paint in class ColoredPoint

a.equals(b);   // calls  equals in  Point;  returns  true
b.equals(c);   // calls  equals in  ColoredPoint;  returns  false
b.equals(a);   // calls  equals in  Point(!);  returns  true

===================================================

The call, a.equals(b), uses the equals method attached to object a, a Point object, to see if object b has the same x,y coordinates; the color information in b does not matter --- from the perspective of a, b is ``equal'' to it.

The call, b.equals(c), uses b's equals method, within ColoredPoint, to compare b's position and color to c's. Finally, the third call, b.equals(a), is surprising, since b's equals method expects a ColoredPoint argument. Actually, b possesses two equals methods --- one for comparing itself to ColoredPoint objects and one for comparing itself to Point objects. This is called method overloading. Here, the equals method that accepts a Point argument is used to make the equality comparison between b and a. Is this what you expected? Is it what you want? Why did it happen?

Objects are structs, and a subclass, like ColoredPoint, defines a struct type template that is the Point struct with the ColoredPoint struct appended to it. It is as if class ColoredPoint defines this type template:

{ int x; int y;
  void paint() { ... }  // this method will never be used (!)
  boolean equals(Point q) { ... }
  int[] color = ...;
  void paint() { ... }  // this newer method is always used
  boolean equals(ColoredPoint q) { ... }
}

and object b contains exactly these fields. The first occurrence of method paint is overridden by the second. The name, equals, is overloaded, because it has two ``bodies'', which are chosen based on the data type (pattern) of argument that is used when equals is called.

Stated more precisely, when a call like b.paint() or b.equals(c) occurs, the fields of object b are searched from newest to oldest (``last to first,'' in the above coding) until there is a match of a method name and its parameter types to the call. This is how b.equals(a) executes the variant, boolean equals(Point q).
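
The search just described can be sketched in Python. The table B_FIELDS lists the fields of object b from oldest to newest, with each method body reduced to a label; the table, the function find_method, and the two empty classes are all hypothetical, introduced only to make the search order visible.

```python
class Point:
    pass


class ColoredPoint(Point):
    pass


# The method fields of object b, oldest first: (name, parameter types, body label).
B_FIELDS = [
    ("paint",  (),              "Point.paint"),
    ("equals", (Point,),        "Point.equals"),
    ("paint",  (),              "ColoredPoint.paint"),
    ("equals", (ColoredPoint,), "ColoredPoint.equals"),
]


def find_method(fields, name, args):
    """Search newest to oldest for a matching name and parameter types."""
    for entry_name, param_types, body in reversed(fields):
        if (entry_name == name
                and len(param_types) == len(args)
                and all(isinstance(a, t) for a, t in zip(args, param_types))):
            return body
    raise LookupError(name)


a = Point()
c = ColoredPoint()
print(find_method(B_FIELDS, "paint", []))     # ColoredPoint.paint
print(find_method(B_FIELDS, "equals", [c]))   # ColoredPoint.equals
print(find_method(B_FIELDS, "equals", [a]))   # Point.equals
```

One caution about the analogy: the sketch inspects the arguments when the call runs, whereas a Java compiler selects among overloaded methods from the declared types of the arguments, before the program runs.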

6.5.4 Appending structs: mixins

If you can understand the above story, congratulations! But the explanation is a complicated mess. Let's start over. Returning to our core imperative language, we get subclasses like this: first, we add an append operation, +, to the syntax of type templates:

===================================================

T ::=  int  |  struct D end  | ... |  T1 + T2

===================================================

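This lets us code structs in stages:

===================================================

var p = new (struct var x = new int;
                    var y = new int;
             end
             +
             struct var color = new array[3] of int;
             end);
p.x = 0;
p.color[1] = 255

===================================================
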
The above example does not seem so useful, but once we add class names (and procedures/methods), it gets interesting:

===================================================

class Point = struct
         var x = new int;
         var y = new int;
         proc paint() { ...x...y... }
      end;

class Color = struct
        var color = new array[3] of int;
        proc paint() { ... super.x ... super.y ... color ... }
      end;

class ColoredPoint = Point + Color;   // aha!

var p = new ColoredPoint; // allocates a struct with all the fields (``attributes'')
                          // of  Point  and also  Color
p.x = 0;
p.color[1] = 255

===================================================

Some object-oriented languages use classes exactly as shown here. (The class-fragments are called mix-ins. In Scala, they are called traits, and C# has a fakey version called a partial class.)
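
Python's multiple inheritance can be used in the same way, as the following sketch suggests. Listing Color before Point makes Color the "newer" fragment, so its paint is found first and its super() reaches Point. The tuple returned by paint is an assumption of this example, standing in for real drawing code.

```python
class Point:
    def __init__(self):
        self.x = 0
        self.y = 0

    def paint(self):
        return ("pixel", self.x, self.y)


class Color:
    # A mix-in: it is only useful once appended to something with x, y, paint.
    def __init__(self):
        super().__init__()             # initialize the fragment it is appended to
        self.color = [0, 0, 0]

    def paint(self):
        return super().paint() + tuple(self.color)


class ColoredPoint(Color, Point):      # plays the role of  Point + Color
    pass


p = ColoredPoint()
p.x = 0
p.color[1] = 255
print(p.paint())                       # ('pixel', 0, 0, 0, 255, 0)
```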

But mainstream object-oriented languages (e.g., Java) allow only an ``incremental'' append, where a named ``base class'' is extended with new fields, like this:

class ColoredPoint = Point extended with
                        var color = new array[3] of int;
                        proc paint() { ... super.x ... super.y ... color ... }
                     end;

In any case, ColoredPoint is a subclass of Point because it builds on Point --- it has all Point's fields and then some. Subclassing is an abbreviation for appending structs.

We now review the complex Java example shown earlier. Here is a simplified version of the example (just method overrides, no overloads):

class Point = struct
         var x = new int;
         var y = new int;
         proc paint() { ...x...y... }
      end;
class Color = struct
                 var color = new array[3] of int;
                 proc paint() { ... super.x ... super.y ... color ... }
              end;
class ColoredPoint = Point + Color;

var a = new Point;
var b = new ColoredPoint;

Variable a names an object in the heap; so does b, but the latter's object is broken into two namespaces, linked together, since it is built in two steps. Here is the picture of the storage layout:
The global names are saved in namespace α.

When b is declared, an object, ρ, holding the namespace of Color, is allocated. It is linked to the object, η, that holds the namespace of a Point. The link, which we used to call parentns, is now called super. This is why we can write super.x and super.y in the coding of method paint within class Color --- super is the name of the link to the ``super object.'' If we write super.x, we force the use of variable x in the superclass even if there is a variable x that is more local. This is how you see super used most often in Java.

There is another key difference in the picture from the previous ones --- notice that the closures for method paint do not save an address of where to find global variables. So, we now have dynamic scoping, as we will see.

Now, say that we do this method call:

b.paint()

Since the active namespace is α, the name b means ρ --- the call is ρ.paint. A new namespace, φ, is allocated:
φ's parentns link, which leads to the nonlocal variables, is set to the address of the object named in the method call, b.paint(). This address is called self or this in some languages.

Say that the coding of paint mentions self.x and say that the call b.paint() executes. You can see from the diagram that the search for self.x looks in namespace φ to learn that self means ρ. So, the search for ρ.x looks in namespace ρ for x and proceeds to namespace η before x is found in the ``currently called object.'' (Note: the semantics of self varies from language to language --- Java's treatment of self works slightly differently from the simple explanation just given --- so be careful!)

Since Point's paint's self link is set only when paint is called, the link can be different for different calls to paint --- it is dynamic scoping. This leads to surprising consequences, as we see in the next section.

6.5.5 How method override can go wrong

The examples show that appended structs (objects constructed from subclasses) can have two fields with the same name. This appears in clear violation of what a struct is --- a mapping from distinct field names to their values. But this violation, the presence of multiple fields with the same name, is a fundamental feature of object-oriented programming, as seen in the above example where a different coding of paint is used for a colored-point object versus an ordinary point object.

Here is a second example, coded in Java, with an overridden field, toString:

===================================================

class Point {
    int x;  int y;
    ...

    public String toString()
    { return  "Point: " + x + "," + y; }
}

class ColoredPoint extends Point {
    int[] color;
    ...

    public String toString() {
      return "ColoredPoint: " + super.x + "," + super.y
             + "; colors: " + color[0] + "," + color[1] + "," + color[2];
    }
}


Point a = new Point(0,0);  // at position 0,0 --- the upper left corner
ColoredPoint b = new ColoredPoint(0,0, 255, 0, 100);   // violet

System.out.println(a.toString());  // prints "Point: 0,0"
System.out.println(b.toString());  // prints "ColoredPoint: 0,0; colors: 255,0,100"

===================================================

When ColoredPoint object b is constructed, it has two fields named toString, but b.toString() does a dynamic lookup and executes the newer version of toString within b, namely the one from class ColoredPoint. In this way, the older method is overridden (cancelled) by the newer one. In practice, overriding can be very useful, especially when you are linking your own subclasses to prewritten superclasses embedded within a framework. (Most GUIs are written this way, with frameworks.)
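The dynamic lookup can be checked with a self-contained version of the example. This is a sketch under assumptions: the constructors fill in the parts elided above and follow the constructor coding used later in this section, and the class OverrideDemo is hypothetical. The variable p is declared with type Point on purpose, to show that the lookup depends on the object and not on the declared type.

```java
class Point {
    int x;  int y;

    Point(int initx, int inity) { x = initx;  y = inity; }

    @Override
    public String toString() { return "Point: " + x + "," + y; }
}

class ColoredPoint extends Point {
    int[] color;

    ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
        super(initx, inity);
        color = new int[] { initr, initg, initb };
    }

    @Override
    public String toString() {
        return "ColoredPoint: " + super.x + "," + super.y
               + "; colors: " + color[0] + "," + color[1] + "," + color[2];
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        Point p = new ColoredPoint(0, 0, 255, 0, 100);  // declared type is Point
        System.out.println(p.toString());
        // prints "ColoredPoint: 0,0; colors: 255,0,100"
    }
}
```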

So far, so good! But strange things can happen with dynamic lookup. Here is the point-coloredpoint example again, rewritten to remove super.equals and rewritten so that whenever a colored point is compared to an ordinary (noncolored) point for equality, the result is false:

===================================================

class Point {
    int x;  int y;

    Point(int initx, int inity) {   // constructor method
        x = initx;  y = inity;
    }

    boolean equals(Point q) {
        return  x == q.x  &&  y == q.y;
    }

    boolean hasSameCoordinates(Point q) {
        return equals(q);
    }
}

class ColoredPoint extends Point {
    int[] color;

    ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
        super(initx, inity);
        color = new int[3];
        color[0] = initr;  color[1] = initg;  color[2] = initb;
    }

    boolean equals(Point q) {
        if (q instanceof ColoredPoint)
            { int[] qcolor = ((ColoredPoint) q).color;   // cast needed: Point has no color field
              return hasSameCoordinates(q)  &&
                color[0] == qcolor[0]  &&
                color[1] == qcolor[1]  &&
                color[2] == qcolor[2];
            }
        else { return false; }   // all Points are nonequal to ColoredPoints
    }
}

===================================================

We coded ColoredPoint so that its equals method overrides the equals method of Point, meaning that the latter is never used within a ColoredPoint object. We replaced super.equals by a useful auxiliary method, hasSameCoordinates, which checks when two points have the same x,y coordinates. But our coding goes horribly wrong:

===================================================

Point a = new Point(0,0);
ColoredPoint b = new ColoredPoint(0,0, 255, 0, 100);

a.hasSameCoordinates(b);  // calls  hasSameCoordinates  in  Point,  which
                          // calls  equals in  Point,  which returns  true

b.equals(a);   // calls  equals in  ColoredPoint, which returns  false
a.equals(b);   // calls  equals in  Point(!) which returns  true (?!)

b.hasSameCoordinates(a);  // calls  hasSameCoordinates  in  Point,  which
                          // calls  equals  in  ColoredPoint(!),
                          // which returns  false (!!)

b.equals(b);   // calls  equals in  ColoredPoint, which
               // calls  hasSameCoordinates  in Point, which
               // calls  equals in  ColoredPoint(!!), which
               // calls  hasSameCoordinates  in Point, which
               // calls  equals in  ColoredPoint, which ... repeats until the call stack overflows  )-:

===================================================
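The trace can be reproduced with the following self-contained sketch. It restates the two classes compactly, with the cast that Java requires before reading q's color field. The class TraceDemo and its helper selfCompare are hypothetical additions; the helper catches Java's StackOverflowError so that the endless chain of calls can be observed without crashing the program.

```java
class Point {
    int x;  int y;
    Point(int initx, int inity) { x = initx;  y = inity; }
    boolean equals(Point q) { return x == q.x && y == q.y; }
    boolean hasSameCoordinates(Point q) { return equals(q); }  // means this.equals(q)
}

class ColoredPoint extends Point {
    int[] color;
    ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
        super(initx, inity);
        color = new int[] { initr, initg, initb };
    }
    @Override
    boolean equals(Point q) {
        if (q instanceof ColoredPoint) {
            int[] qcolor = ((ColoredPoint) q).color;
            return hasSameCoordinates(q)
                && color[0] == qcolor[0]
                && color[1] == qcolor[1]
                && color[2] == qcolor[2];
        } else { return false; }
    }
}

class TraceDemo {
    // Hypothetical helper: reports the result of p.equals(p), or the overflow.
    static String selfCompare(ColoredPoint p) {
        try { return String.valueOf(p.equals(p)); }
        catch (StackOverflowError e) { return "stack overflow"; }
    }

    public static void main(String[] args) {
        Point a = new Point(0, 0);
        ColoredPoint b = new ColoredPoint(0, 0, 255, 0, 100);
        System.out.println(a.hasSameCoordinates(b));  // prints "true"
        System.out.println(b.equals(a));              // prints "false"
        System.out.println(a.equals(b));              // prints "true"
        System.out.println(b.hasSameCoordinates(a));  // prints "false"
        System.out.println(selfCompare(b));           // prints "stack overflow"
    }
}
```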

Almost nothing goes correctly, thanks to dynamic lookup! The problem is that the coding of hasSameCoordinates must call the coding of equals within class Point to work correctly. This is destroyed by dynamic method lookup --- what we see in the coding of class Point has no relationship to what the computer does. How can we write programs in a language that we cannot trust with our own eyes?

In this situation, we must draw the storage layout and trace the use of the super and self linkages to understand the consequences of dynamic scoping of virtual methods.
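One possible repair, offered as a sketch and not as the only remedy, is to take the coordinate test out of the reach of dynamic lookup. In Java, a private method is never overridden, so a call to it always runs the coding we see. The helper sameXY and the class RepairDemo are hypothetical names introduced for this illustration.

```java
class Point {
    int x;  int y;
    Point(int initx, int inity) { x = initx;  y = inity; }

    // Private methods are not virtual: a call to sameXY always runs this coding.
    private boolean sameXY(Point q) { return x == q.x && y == q.y; }

    boolean equals(Point q) { return sameXY(q); }

    // final forbids subclasses from overriding hasSameCoordinates itself.
    final boolean hasSameCoordinates(Point q) { return sameXY(q); }
}

class ColoredPoint extends Point {
    int[] color;
    ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
        super(initx, inity);
        color = new int[] { initr, initg, initb };
    }
    @Override
    boolean equals(Point q) {
        if (q instanceof ColoredPoint) {
            int[] qcolor = ((ColoredPoint) q).color;
            return hasSameCoordinates(q)
                && color[0] == qcolor[0]
                && color[1] == qcolor[1]
                && color[2] == qcolor[2];
        } else { return false; }
    }
}

class RepairDemo {
    public static void main(String[] args) {
        Point a = new Point(0, 0);
        ColoredPoint b = new ColoredPoint(0, 0, 255, 0, 100);
        System.out.println(b.equals(b));              // prints "true"
        System.out.println(b.hasSameCoordinates(a));  // prints "true"
        System.out.println(b.equals(a));              // prints "false"
        System.out.println(a.equals(b));              // prints "true"
    }
}
```

The endless chain of calls is gone, and hasSameCoordinates now answers the same way for both kinds of object. The disagreement between a.equals(b) and b.equals(a) remains, because it comes from the two codings of equals and not from the auxiliary method.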

6.6 Conclusion

Imperative languages are ``scratchpad languages,'' which treat computation as a game of writing numbers in the squares of a grid-sheet scratchpad, reading the numbers in the squares, and erasing the numbers in the squares and writing new ones in their place. This is the form of computation we do when we keep running totals, like vote counting at an election or keeping score at a basketball game. It is the form of computation we do when we work with databases, blackboard software architectures, and persistent storage.

Imperative languages are for updating storage in small, baby steps. Perhaps the storage is one big piece of primary memory, with millions of small cells, or perhaps the storage is split into hundreds of objects, each with its own location and each holding just a few cells, or perhaps the storage is the grid of RGB-pixels that lights up your computer's display. In any case, if the computation requires locating a cell in a storage structure, reading it, and changing it, then you will be using an imperative language to do it.

In the chapters that follow, we consider other views of computation.
