When one encounters a massive language, like C++ or C#, for the first time, one is tempted to ask, ``Who thought up this mess?'' Indeed, to understand a programming language, one must look past piles of syntax and identify the language's structure. The principles we studied in the previous chapters let us do that.
Languages that do computation with storable values are called imperative languages, because commands (like assignment) give orders --- imperatives --- about updating storage. This chapter presents the core of an imperative language and applies the extension techniques to grow the language into a modern, object-oriented language like Java or C#.
===================================================
P : Program              E : Expression
D : Declaration          L : NameExpression
T : TypeTemplate         N : Numeral
C : Command              I : Identifier

P ::= D ; C
E ::= N | L | E1 + E2 | E1 == E2 | not E | new T | nil
C ::= L = E | print L | C1 ; C2 | if E { C1 } else { C2 } | while E { C }
D ::= var I = E | D1 ; D2
T ::= struct D end | array[N] of E
L ::= I | L . I | L [ E ]
N ::= 0 | 1 | 2 | ...
I ::= alphanumeric strings beginning with a letter, not including keywords
===================================================
This is an assignment language, and it is also an object language (notice expression new T), but the most interesting part of modern assignment languages is how variables are declared and how their storage is allocated. In our example core language, we write
var x = 3;
var z = new struct var f = 0; var g = 1 end;
var r = new array[4] of 0
to create three variable names in the program's namespace: x, which holds value 3; z, which holds a handle to a newly created struct object that holds ints f = 0 and g = 1; and r, which holds a handle to a newly created four-celled array object initialized to all 0s. Here's a picture:
Now, this next point is really important, and it is a key feature of modern imperative (object-oriented) programming: a TypeTemplate phrase is a storage-allocation template. Consider the above example one more time:
Let's contrast the above fragment,
var x = 3;
var z = new struct var f = 0; var g = 1 end;
var r = new array[4] of 0
to that in C and Pascal.
To obtain the same allocations
in C, one would write
===================================================
int x = 3;
struct MyStruct {int f; int g;};   // you first name the allocation template;
struct MyStruct z = {0, 1};        // then you use it to declare (and initialize) a struct
int r[4];                          // a four-celled int array
// now, you write a for-loop that places 0s in r's cells
===================================================
Like our model core language at the beginning of this chapter, C (and Pascal and Modula and Ada) use type-template phrases to allocate the correct amount of storage for struct z and array r. But these languages also use the type-template phrases as data-type names, that is, ``z has data type MyStruct'' and ``r has data type int[4].''
The data-type names
are used in the rest of the program to confirm that the variables
are used properly.
For example,
C's type checker will validate that this assignment is consistent
with z's and r's declarations:
z.f = r[2] + 1
but this one is not:
x = z[3] - r // this is a ``type error''
Since TypeTemplate phrases can be used as data-type names as well
as storage-allocation templates, we call them ``type templates.''
But remember: even if C's type checking were ignored entirely,
the TypeTemplate phrases would still serve their primary use --- to
allocate storage.
Java and C# also use type templates as both data-type names and
storage-allocation templates.
Our example looks like this in Java/C#:
===================================================
int x = 3;
class MyStruct {int f = 0; int g = 1;} // You declare the struct as a class
MyStruct z = new MyStruct(); // Why do we say, MyStruct, twice ??
int[] r = new int[4]; // What is the difference between int[] and int[4] ??
// Java automatically initializes an int-array to all 0s
===================================================
On the last two lines,
Java and C# make you use
the type template first to attach a data-type name to a variable name
and second to allocate the object that the variable names.
In Java, if you write merely,
MyStruct z;
int[] r;
then only the data-type names are attached to z and r and
no struct and no array objects are allocated.
Instead, z and r each have value null (our nil, ``no handle'') in the namespace.
One might declare z and r without initial values so that the Java compiler learns their data types for future type checks (e.g., to enforce that only handles to MyStruct objects are assigned to variable z and that only handles to int arrays are assigned to variable r).
We will not devote a lot of time in this chapter to
data-type checking with type templates, but their use
does not always go smoothly. Here is a standard problem:
===================================================
class MyStructA {int f; int g;}
class MyStructB {int g; int f;}
MyStructA x = new MyStructA();
MyStructB y = x;   // Is this allowed or not ?
// After all, the struct has exactly the required fields ?!
===================================================
Java/C# consider this example erroneous; Algol68 does not. At the end of the
chapter we study some difficult questions that arise when we use
type templates as data-type names.
The syntax of TypeTemplate lets us nest data structures.
Here is a struct named database that holds an int
and an array of 100 structs:
===================================================
var database = new struct var howmany = 0;
                          var table = new array[100] of
                                          new struct var idnum = 0;
                                                     var balance = 0
                                          end
               end;
// insert an entry:
var count = database.howmany;
if not(count == 100) {
    database.table[count].idnum = 9999;
    database.table[count].balance = 25;
    database.howmany = count + 1
}
===================================================
As we have written it,
the declaration of database allocates storage
for all 100 of the struct objects in table before a single
entry has been entered into the database.
This is what happens when you code the above declarations
in C or Pascal or Ada.
Here is a picture of what the above code created:
Using the diagram, we can state precisely what, say, database.table[count].idnum denotes:
The diagram shows that we allocated all the array's structs at once. But this is a waste of storage if the database is mostly empty all the time.
For this reason,
Java and C# take the opposite approach --- a nested type template
allocates only the topmost level of object.
You must use lots of new
commands to construct the objects that fill the lower levels.
This sounds strange, but programmers quickly learn to do this.
Here is the above example written in Java style:
===================================================
// this template can allocate a two-celled struct object:
class ENTRY{ int idnum = 0;
int balance = 0; }
// this template can allocate a struct that holds one int cell and 100 cells
// that can hold handles to other objects:
class DB{ int howmany = 0;
ENTRY[] table = new ENTRY[100]; }
DB database = new DB(); // allocates one DB object but _no_ ENTRY objects!!!
// now, insert one ENTRY object into database:
int count = database.howmany;
if (count != 100) {
    ENTRY item = new ENTRY();       // allocate one object only
    item.idnum = 9999;
    item.balance = 25;
    database.table[count] = item;   // store the object's handle in the array
    database.howmany = count + 1;
}
===================================================
Here is a diagram of what this code constructed:
In summary, be careful if you move from Java/C# to C/Pascal/Ada (or vice versa) ---
the two families of languages allocate objects differently!
(If you want to do ``lazy allocation'' in C, you must use pointers
and malloc, somewhat like this:
int* r;              // declares r as a pointer
int howmany = 100;   // number of array elements desired
// save in r the address of a 100-celled array:
r = (int*)(malloc(howmany * sizeof(int)));
// assign 99 to cell 0 of the array:
r[0] = 99;
)
For now, we will stay with our example core language.
In this case,
if we wish to allocate the database but not fill it with records,
we do this:
===================================================
var database = new struct var howmany = 0;
                          // initialize table to 100 cells holding nil:
                          var table = new array[100] of nil
               end;
// insert an entry:
if not(database.howmany == 100) {
    database.table[database.howmany] = new struct var idnum = 999999;
                                                  var balance = 25
                                       end;
    database.howmany = database.howmany + 1
}
===================================================
The result would be the same storage layout that we saw in the
previous diagram.
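The difference between the two allocation strategies can be counted directly. The following is a minimal Python sketch, not the interpreter used in these notes: the heap is modeled as a dict from integer handles to objects, and the helper names (allocate, new_struct, new_array, insert) are hypothetical. It assumes that array[N] of E evaluates E once per cell.

```python
# A toy heap: handles are integers, objects are dicts (structs) or lists (arrays).
heap = {}

def allocate(obj):
    """Store obj in the heap and return its handle (a hypothetical helper)."""
    handle = len(heap)
    heap[handle] = obj
    return handle

def new_struct(**fields):
    return allocate(dict(fields))

def new_array(size, make_element):
    # make_element is called once per cell, mirroring `array[N] of E`,
    # under the assumption that E is evaluated for every cell.
    return allocate([make_element() for _ in range(size)])

# Eager: `new array[100] of new struct ... end` allocates 100 structs at once.
heap.clear()
eager_db = new_struct(
    howmany=0,
    table=new_array(100, lambda: new_struct(idnum=0, balance=0)))
eager_count = len(heap)

# Lazy: `new array[100] of nil` allocates only the array and the outer struct.
heap.clear()
lazy_db = new_struct(howmany=0, table=new_array(100, lambda: None))
lazy_count = len(heap)

def insert(db_handle, idnum, balance):
    # Allocate one entry struct only when it is actually needed.
    db = heap[db_handle]
    if db["howmany"] != 100:
        heap[db["table"]][db["howmany"]] = new_struct(idnum=idnum, balance=balance)
        db["howmany"] += 1

insert(lazy_db, 999999, 25)
print(eager_count, lazy_count, len(heap))   # 102 2 3
```

The eager declaration creates 102 heap objects before any entry exists; the lazy one creates 2, and each insertion adds one more.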
===================================================
E: Expression          N: Numeral
D: Declaration         T: TypeTemplate
I: Identifier

E ::= N | ... | new T | nil
D ::= var I = E | D1 ; D2 | ...
T ::= struct D end | array[N] of E
===================================================
To recap:
Exercises:
T ::= struct D; C end | array[N] of E
The C in the struct acts as initialization code. Implement this in your interpreter.
D ::= var I = E | D1 ; D2
T ::= struct (I : int)+ end | array[N] of int
That is, the contents of a struct are limited to integer variables. Arrays are also integer-only.
The data-structure extension principle suggests that all storable
values --- all data structures --- should be storable as elements within
all other data structures. This means we should allow arrays of
arrays and structs, and structs of arrays and structs. In C and Pascal,
the syntax of data structures evolved to this:
===================================================
D ::= var I : T | D1 ; D2
T ::= struct D end | array[N] of T | int
===================================================
That is, var I : T, allocates storage of shape T for I, where
T can be an arbitrarily nested data structure.
int is the ``starter'' type template.
(Of course, there is a cost to such generalization. For example, the C language is designed so that storage is a sequence of integer cells, and data structures like arrays and structs are an illusory naming convention for a sequence of contiguous cells. This storage model lets C programs compile into simple, fast target code --- C is meant for writing device drivers, not databases.)
In a later section, when we add procedures, modules, etc., we will see how they can embed within data structures, causing a further generalization.
C ::= . . . | C1 ; C2 | if E { C1 } else { C2 } | while E { C }
But there can be control structures at other levels of the language. In fact, we already see one at the Declaration level: The Declaration domain uses the sequencing control structure, ;, which is important because declarations can be initialized like this:
int x = 2; int y = x * x
The order of the declarations, stated by ;, matters. In some languages, it is useful to have other control structures for declarations (e.g., conditionals) to help decide what to declare.
Since the core language has arrays, it might be
useful to introduce a control structure that lets us
``iterate'' (process) an array's elements.
Perhaps we add a for-loop, which counts upwards
from some lower bound to some upper bound by ones:
C ::= ... | for I = E1 upto E2 : C
Within the for-loop, variable I is used as an index for
locating elements in an array, e.g.,
var evens = new array[9] of 0;
for index = 0 upto 8:
    evens[index] = index * 2
Or perhaps we have an iterator that extracts the array elements,
one by one:
foreach element in evens:
print element * 2
Perhaps structs also
require a control structure for iteration.
It might look like this:
C ::= ... | foreach I in L : C
The loop would bind I, in turn, to each field name of struct L,
and I would then be used to index L's fields. Here is an example:
var evens = new struct var a = 0;
var b = 2;
var c = 4;
end;
var sum = 0;
foreach k in evens :
sum = sum + evens.k ;
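For comparison, here is how the same iteration might look in Python, under the assumption that the struct is modeled as a dict from field names to values. This is only a sketch of the idea, not part of the core language.

```python
# Model the struct `evens` as a dict from field names to values.
evens = {"a": 0, "b": 2, "c": 4}

total = 0
for k in evens:                 # k is bound to each field name in turn
    total = total + evens[k]    # evens.k in the core language; evens[k] here

print(total)   # 6
```

Note that k holds a field name, not a field value, which is why it must be used as an index.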
As an exercise, think about the control structures that might
be useful at the level of Expression.
===================================================
D ::= var I = E | D1 ; D2 | proc I() { C }
C ::= L = E | C1 ; C2 | ... | L()
T ::= struct D end | array [ E1 ] of E2
E ::= N | ... | new T | nil
L ::= I | L . I | L [ E ]
===================================================
The definition construction is proc I(){C} and the calling construction is L().
The syntax for structs, seen above, exposes an important idea --- we can
embed procedures within a struct.
This insight led to the development of object-oriented
programming as first seen in the Simula67 programming
language --- objects are structs that contain both procedures
and variables. Here is an example:
var init = 0;
var clock = new struct var time = init;
proc tick(){ time = time + 1 };
proc display(){ print time };
proc reset(){ time = 0 }
end
The procedures inside clock maintain variable, time.
The struct is allocated as an object, with handle β:
One by one, the declarations in the struct are executed and placed into
the active namespace, β. (Note that β lives at the top
of the namespace stack; when the struct is constructed, β is popped.)
The clock object is a namespace, and
tick, reset, and display are closures.
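The idea that an object is a namespace holding closures can be sketched in Python. This is an illustration only: the namespace is modeled as a dict, and new_clock is a hypothetical stand-in for executing the struct's declarations.

```python
def new_clock(init):
    # The struct's namespace, modeled as a dict.
    ns = {"time": init}

    # Each procedure is a closure: its code plus a link to the namespace
    # in which it was declared.
    def tick():
        ns["time"] = ns["time"] + 1

    def display():
        print(ns["time"])

    def reset():
        ns["time"] = 0

    ns["tick"] = tick
    ns["display"] = display
    ns["reset"] = reset
    return ns

clock = new_clock(0)
clock["tick"]()
clock["tick"]()
clock["display"]()   # prints 2
```

Every call to new_clock builds a fresh namespace, so two clocks never share their time variable.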
Say we use clock like this:
clock.tick();
clock.tick();
clock.display()
When one calls clock.tick(), the following happens:
This example shows that
===================================================
D ::= var I = E | D1 ; D2 | proc I() { C } | class I = T
T ::= struct D end | array [ E1 ] of E2 | L
===================================================
Note that the L in the syntax rule for T is the construction that calls a named class.
The clock example in the previous section can be written like this:
class CLK = struct var time = 0;
proc tick(){ time = time + 1 };
proc display(){ print time };
proc reset(){ time = 0 }
end;
var clock = new CLK;
clock.tick()
This indeed looks familiar. Here's what happens in the machine.
(The parentns link saved in CLK's closure is critical to proper variable
scoping. Here is a tricky example, unrelated to the above,
where the proper x must be incremented
when procedure p is called:
var x = 0;
class C1 = struct proc p() { x = x + 1 } // wants to increment global x
end;
class C2 = struct var x = 5;
var c = new C1;
end;
var m = new C2;
m.c.p()
The parentns link ensures that m.c.p() increments the global variable x
and not the x inside the object named by m.
)
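The role of the parentns link in the tricky example can be sketched in Python. The Namespace class and make_class helper below are hypothetical, and the sketch simplifies matters by letting procedure p search from its own object rather than from a fresh namespace allocated for the call.

```python
class Namespace:
    """A namespace with a link to its parent namespace (parentns)."""
    def __init__(self, parentns=None):
        self.vars = {}
        self.parentns = parentns

    def find(self, name):
        # Walk the parentns chain until a namespace declaring `name` is found.
        ns = self
        while ns is not None:
            if name in ns.vars:
                return ns
            ns = ns.parentns
        raise NameError(name)

    def get(self, name):
        return self.find(name).vars[name]

    def set(self, name, value):
        self.find(name).vars[name] = value


def make_class(declare, defining_ns):
    # A class closure: the struct's declarations plus the namespace in which
    # the class itself was declared (the saved parentns link).
    def new():
        obj = Namespace(parentns=defining_ns)
        declare(obj)
        return obj
    return new


globalns = Namespace()
globalns.vars["x"] = 0

def declare_C1(obj):
    # proc p() { x = x + 1 }
    obj.vars["p"] = lambda: obj.set("x", obj.get("x") + 1)

C1 = make_class(declare_C1, globalns)

def declare_C2(obj):
    obj.vars["x"] = 5
    obj.vars["c"] = C1()   # C1's objects link to globalns, not to this object

C2 = make_class(declare_C2, globalns)

m = C2()
m.get("c").get("p")()                   # m.c.p()
print(globalns.get("x"), m.get("x"))    # 1 5
```

Because the C1 object's parentns is the namespace where C1 was declared, the search for x skips the C2 object entirely.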
Classes are well suited to descriptive data structuring.
Here is how we declare the database seen earlier:
===================================================
class Entry = struct var idnum = 0;
var balance = 0
end;
class DB = struct var howmany = 0;
var table = new array[100] of new Entry
end;
var database = new DB;
===================================================
The classes divide the definition into
readable substructures.
Java people say that
variable database ``denotes an object of class DB'' or
``database has type DB.''
But classes are not, strictly speaking, data types, even though
Java lets a programmer think they are.
Since a class is a declaration and since a struct holds declarations,
we might format the above example more elegantly like this:
===================================================
class DB = struct
class Entry = struct var idnum = 0;
var balance = 0
end;
var howmany = 0;
var table = new array[100] of new Entry
end;
var database = new DB;
===================================================
This makes Entry part of DB, which seems sensible if Entry is not
needed in other parts of the program. (And if it is, we can allocate
from it like this: new database.Entry.)
Please remember what classes (and TypeTemplate phrases) really are: A class is a storage-allocation template. When we define a class, we define an allocation template, and when we use the keyword, new, we activate the template to allocate storage.
This development shows that a core assignment language with structs already has the computational power to do object-oriented programming --- it is merely a matter of introducing a couple of key abstracts (command abstracts and type-template abstracts) for convenience.
===================================================
D ::= var I = E | D1 ; D2 | ... | module I = D | import L
===================================================
A module differs from a class because a called module activates a set of declarations at the point where it is imported (called); the importation ``links'' the module's declarations to the program. In contrast, a class is called to allocate storage.
Here is an example to make this point:
===================================================
module M =
var x = 0;
class C = struct var a = 0 end;
var y = new array[10] of new C;
proc initialize(){
x = 0;
for i = 0 upto 9 { y[i].a = i }
}
// The above code often resides in a separate file, named M.
import M; // activates and links M's declarations to this program
initialize() // calls the proc as if it were declared in the program
var z = new C; // class C is used as if it were declared in the program
z.a = x + y[0].a
===================================================
Once M is imported, its declarations are
linked to the program as if they had been written there in the first place.
Since module importation is a kind of linking, does it make sense
to ``import'' twice? That is, can we do this?
import M;
import M
and is it the same as importing M just once?
Most languages ignore repeated imports of the same module.
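Python is one such language. The sketch below uses the standard-library module json purely as a convenient example; any importable module would do.

```python
import importlib
import sys

first = importlib.import_module("json")
second = importlib.import_module("json")

# Both imports yield the same module object. The second import is answered
# from the cache in sys.modules, so the module's code is not run again.
print(first is second)                 # True
print(sys.modules["json"] is first)    # True
```

Caching imported modules in this way makes a repeated import harmless: it links to the declarations that are already active.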
Here is another question: Should we allow this example?
===================================================
module M = var x = 7;
module N = var x = 99;
import M;
import N;
x = x + 1 // which variable x is updated?
===================================================
This is a serious issue when a large program is assembled from
many modules that are linked together --- there is always a chance that
two distinct modules declare the same name. For this reason,
most languages require
that a module's names are referenced
with dot notation, like this:
===================================================
module M = new struct var x = 7 end; // might be in a separate file
module N = new struct var x = 99 end; // might be in a separate file
import M;
import N;
N.x = M.x + 1
===================================================
Languages that use modules-as-structs often add an operation
that ``opens'' the module so that the declarations are exposed
like we saw them in the first place:
module M = new struct var x = 7 end;
from M import *; // import * means that all declarations in M are linked
// as if they were declared in the program.
x = x + 1
A variation of this operation opens the module for a limited scope:
module M = new struct var x = 7 end;
module N = new struct var x = 99 end;
import M;
import N;
with M do {
N.x = x + 1 // inside the with M do, references to x mean M.x
}
Once we add dot indexing to modules, they are looking a lot like classes! Indeed, this is why modules were not part of Java's original design. (But Java has always had packages, which are module-like.) Modules are most convenient, however, for linking together large program files into one --- classes do not do this very well.
In the previous chapter we learned how to add
parameters to abstracts. These same techniques are useful
for adding parameters to classes and modules.
Here is an example of a parameterized class:
===================================================
class Entry = struct var idnum = 0;
var balance = 0
end;
class DB(size) = struct var howmany = 0;
var table = new array[size] of new Entry
end;
var database = new DB(100);
===================================================
The parameter, size, lets us adjust the size of a new database.
Many object-oriented languages transmit the argument to a class's constructor
method and not to the class itself.
Here's how the above example looks in Java:
class Entry { int idnum; int balance = 0; }
class DB {
    int howmany = 0;
    Entry[] table;
    DB(int size) {
        table = new Entry[size];
    }
}
DB database = new DB(100);
Java requires a constructor method to handle the parameter.
Of course, constructor methods let you write other initialization code.
Please see the Exercise at the end of this section for a further
analysis of this difference.
Modules might also be parameterized, perhaps by expressions
or even type-templates.
Consider this example, which defines
a database module parameterized on an int and the type-template format to be
stored in the database table:
===================================================
module DataBase(size, recordTemplate) =
    var howmany = 0;
    var table = new array[size] of new recordTemplate;
    proc initialize(){
        for i = 0 upto size - 1 {
            table[i].init() }   // oops -- how do we know recordTemplate contains proc init ?
    };
    fun find(index){
        if index >= 0 and index < size {
            answer = table[index].getVal();   // how do we know recordTemplate contains function getVal ?
            return answer
        }
    }
===================================================
If we had this class,
===================================================
class Entry = struct var idnum = 0;
var balance = 0;
proc init(x,y){
idnum = x;
balance = y
};
fun getVal(){ return balance }
end;
===================================================
we could activate (import) the module like this:
import DataBase(100, Entry)
This allocates storage for howmany and table, where
the latter is an array of 100 Entry structs.
The coding of DataBase is suspect --- it assumes that whatever the recordTemplate type-template might be, it includes a procedure named init and a function named getVal. To ensure the security of module DataBase, we should annotate its parameters with these requirements. The data-type-like annotations are called an interface.
Here is a Java-like coding of the interface that we want:
===================================================
interface RecordInterface = struct init: (int * int) -> void;
getVal: void -> int
end;
module DataBase(size: int, recordTemplate: RecordInterface) = ... like before ...
===================================================
The interface gives enough information that the programmer
or compiler can check that the module is coded sensibly.
The annotation of size ensures that an int argument will be bound to it,
and the annotation of recordTemplate ensures that a struct object
with at least an init procedure and a getVal function will be bound to it.
The type-template argument that is bound to
parameter recordTemplate must match the interface; it must
``implement'' it, as one says in Java-speak:
import DataBase(100, Entry) // Entry implements (matches) RecordInterface
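A check of this kind can be sketched in Python with typing.Protocol. This is an approximation under stated assumptions: make_database and the class Broken are hypothetical names, and a runtime-checkable protocol verifies only that the named members exist, not that their argument and result types match the interface.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class RecordInterface(Protocol):
    def init(self, x: int, y: int) -> None: ...
    def getVal(self) -> int: ...

class Entry:
    def __init__(self):
        self.idnum = 0
        self.balance = 0

    def init(self, x, y):
        self.idnum = x
        self.balance = y

    def getVal(self):
        return self.balance

class Broken:
    """Has neither init nor getVal, so it does not match the interface."""

def make_database(size, record_template):
    # Check one sample object against the interface before building the table.
    if not isinstance(record_template(), RecordInterface):
        raise TypeError("recordTemplate does not implement RecordInterface")
    return {"howmany": 0, "table": [record_template() for _ in range(size)]}

db = make_database(100, Entry)
print(len(db["table"]))   # 100

try:
    make_database(100, Broken)
except TypeError as err:
    print("rejected:", err)
```

A static type checker could enforce the full signatures before the program runs, which is closer to what the interface annotation in the text is meant to achieve.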
In a similar way, we can use interfaces for module arguments to modules,
to enforce correct linking:
===================================================
interface DataBaseInt = { howmany: int;
initialize: void -> void;
find: int -> int
};
module System(db: DataBaseInt) = // db binds to a declaration that holds the
// identifiers listed in DataBaseInt
var current = 0;
var value = 0;
db.initialize();
current = db.howmany - 1;
value = db.find(current)
;
. . .
import System(import DataBase(100, Entry)); // link DataBase to System
===================================================
Features like these are found in the module-oriented languages,
Modula-2 and Ada.
class Entry { int idnum; int balance = 0; }
class DB {
    int howmany = 0;
    Entry[] table;
    DB(int size) {
        if (size > 0) { table = new Entry[size]; }
    }
}
DB database = new DB(100);
In our example object language, we can always add an init procedure, like this:
class Entry = struct var idnum = 0; var balance = 0 end;
class DB = struct var howmany = 0;
                  var table = nil;
                  proc init(size) { if size > 0 { table = new array[size] of nil } }
           end;
var database = new DB;
database.init(100)
But consider this form of struct:
===================================================
P ::= D ; C
D ::= var I = E | D1 ; D2 | class I1 ( I2 ) of T
E ::= ... | new T
T ::= ... | struct P end | array [ E1 ] of E2 | L ( E )
===================================================
Now structs are collections of declarations followed by initialization code. Recode the above example in this new syntax. Implement it.
This example shows that structs (classes) are encapsulated programs that are executed (via new) and are queried for their answers (via L.I indexing). This is the basis of actor theory, where actors/agents are small programs, like ants in an ant colony, that ``execute in themselves'' and ``communicate'' their answers/knowledge.
Here is the syntax of declaration blocks and type-template blocks:
===================================================
D ::= var I = E | D1 ; D2 | class I = T | module I = D | import L | begin D1 in D2 end
T ::= struct D end | array [ E1 ] of E2 | L | begin D in T end
===================================================
Returning to the database example, we can improve the definitions
so that the variables owned by the database are made private:
===================================================
class Entry = struct var idnum = -1;
var balance = 0;
end;
class DB(size) = struct
begin var howmany = 0; // these two declarations are private
var table = new array[size] of new Entry // to the struct
in
proc find(i){ ...table[i].balance... };
proc update(i,...){ ...table[i] ... howmany ... }
end end;
module DataBase(max) = begin var mybase = new DB(max) // this declaration is private
in
proc searchDataBase(...):
...mybase.find(...)...
proc processDataBase(...):
...mybase.update(...)...
// but we cannot reference mybase.howmany
// nor mybase.table
end;
import DataBase(100);
DataBase.searchDataBase(...);
DataBase.processDataBase(...)
// but we cannot reference DataBase.mybase
===================================================
The example shows how protection was placed
around private variables howmany and table within DB so that
once a DB object is allocated, all uses of the two variables must
be made via the public procedures, find and update.
The same idea is used to protect the struct, mybase, within
module DataBase.
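The effect of a begin ... in ... end block can be approximated in Python with closures. The sketch below is hypothetical in its details: the source elides the parameters of update, so the ones used here (idnum, balance) are assumptions, and Python does not truly enforce the privacy, since closure cells can still be inspected.

```python
def new_DB(size):
    # Private declarations (the D1 of `begin D1 in D2 end`): visible only
    # to the procedures defined below.
    howmany = 0
    table = [{"idnum": -1, "balance": 0} for _ in range(size)]

    def find(i):
        return table[i]["balance"]

    def update(i, idnum, balance):
        nonlocal howmany
        table[i]["idnum"] = idnum
        table[i]["balance"] = balance
        howmany = max(howmany, i + 1)

    # Only the public declarations (D2) appear in the object's namespace.
    return {"find": find, "update": update}

mybase = new_DB(100)
mybase["update"](0, 9999, 25)
print(mybase["find"](0))      # 25
print("howmany" in mybase)    # False
print("table" in mybase)      # False
```

The object returned to the caller exposes find and update only; howmany and table live on in the closures but have no name in the object's namespace.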
Of course, in C# and Java, the keyword, private, is used to label a declaration as local to a block.
The qualification principle makes it possible to encapsulate declarations within components so they are safe from unauthorized use, no matter where the component is inserted into a system. Large systems building is possible only because of the qualification principle.
Two distinctive features of object-oriented languages are subclasses and virtual methods. These concepts are surprisingly complex, yet their correct use is critical to successful object-oriented programming. We start with a standard example.
GUI-building
frameworks contain starter classes for windows, frames,
buttons, text entries, and so on. Here is a sample:
===================================================
class Button {
int x; int y; // coordinates for the button's position
proc paint() { // code for painting the button on the display;
pass // the default does nothing
}
proc refresh() { // code for resetting and repainting the button
... technical code that talks to the framework and the OS
... x ... y // references x and y
paint() // calls paint
}
}
===================================================
The class contains just enough code to generate a blank button in a GUI.
Now, a user can build on the code to define a customized button:
===================================================
class MyButton extends Button {
int x; int z; // some local variables
proc customize() { // a method that does some customized action
... x ... z
... super.y // super.y is explained below
}
proc paint() {
... code that draws a colored, labelled button
}
}
===================================================
MyButton is a subclass of Button;
it extends (adds to) Button with extra fields and methods.
When the user states
var b = new MyButton;
this constructs an object that contains both the coding in Button and
the coding in MyButton. The user can invoke
b.customize() and b.paint() (this calls the newer method, paint, in MyButton!),
and b.refresh() (calls the method in Button). But notice that
refresh calls paint --- which copy of paint should refresh activate?
The designers of GUI frameworks want b.refresh() to call the newer version
of paint, the one in MyButton.
In this way, each time b is refreshed, a customized button is drawn. This is the situation provided by Smalltalk and Java. How can this happen, based on the use of closures and parentns links seen so far? Some changes are needed.
Here is a picture of how this situation is implemented. Say we have these two declarations,
var a = new Button;
var b = new MyButton;
Here's what's in the heap:
Variable a names an object in the heap; so does b, but the latter's
object is broken into two namespaces, linked together, since it is built in two stages.
When b is declared, an object, ρ, holding the namespace of MyButton,
is allocated. It is linked to the object, ε, that holds the
namespace of a Button. The link, which we used to call parentns,
is now called super. This is why we can write
super.y in the coding of method customize ---
super is the name of the link to the ``super object.''
If we would write super.x inside method customize, we force the use of variable x in the superclass
even if there is a variable x that is more local.
This is how you see super used most often in Java.
There is another key difference in the picture from the previous ones --- notice that the closures for the declared methods do not save an address of where to find global variables. So, we have dynamic scoping, as we will see.
Say that we do this method call:
b.refresh()
Since the active namespace is α, the name b means ρ ---
the call is ρ.refresh(). A search of ρ's namespace fails to
find method refresh, but a search of the super namespace,
ε, finds it. The code for refresh is fetched,
and a new namespace, φ, is allocated for the call:
Executing code of refresh in Button:
This is the crucial part:
φ's parentns link, which leads to the nonlocal variables,
is set to the address of the object named in the method call,
b.refresh(). The link is called self or this.
This means refresh will look first in ρ (not ε!) for its nonlocal variables.
Now, the code for refresh calls paint. The active namespace,
φ, is searched for paint and then the linked space,
ρ is searched.
In this way, the code for paint in MyButton is located and it is
activated:
Executing the code of paint in MyButton:
Because the self-link is determined only when a method is called, the implementation is called dynamic scoping. (Recall, when the link is saved with the closure when a method is declared, it is static scoping.) Another name for it is virtual method override. As noted earlier, virtual method override is the default in Java and Smalltalk, and it can be activated by the keyword, virtual, in C++ and C#.
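The lookup just described can be sketched in Python. This is a simplified model, not the implementation from these notes: namespaces are dicts, the super link is a dict entry, and the helpers lookup and call are hypothetical. The self link is modeled by passing the called object to the method as an argument.

```python
def lookup(obj, name):
    # Search the object's own namespace, then follow super links.
    ns = obj
    while ns is not None:
        if name in ns:
            return ns[name]
        ns = ns.get("super")
    raise AttributeError(name)

def call(obj, name):
    # The method is found by searching from obj, and it receives obj as self.
    # The self link is therefore fixed at call time, not at declaration time.
    method = lookup(obj, name)
    return method(obj)

log = []

def button_paint(self):
    log.append("blank button")

def button_refresh(self):
    call(self, "paint")   # paint is looked up starting from self

def mybutton_paint(self):
    log.append("colored button")

def new_button():
    return {"x": 0, "y": 0, "paint": button_paint,
            "refresh": button_refresh, "super": None}

def new_mybutton():
    # Two linked namespaces: the MyButton part and its super object.
    return {"x": 0, "z": 0, "paint": mybutton_paint, "super": new_button()}

a = new_button()
b = new_mybutton()
call(a, "refresh")
call(b, "refresh")
print(log)   # ['blank button', 'colored button']
```

The same code for refresh paints a blank button for a and a colored one for b, because the search for paint starts from whichever object received the call.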
In Java, all method calls are dynamically scoped and all field lookups are statically scoped. This means refresh always uses the x and y in class Button. But this means we must retain the parentns links as well as add the self and super links. This is an interesting exercise to implement.
(IMPORTANT NOTE: The Java compiler can compute static field lookups before a program is executed, and it will embed byte code that locates a field without the implementation of parentns links. This optimization is best studied in compiling theory and we will not pursue it here.)
Python has classes, too, and a programmer can choose to use either
dynamic
or static lookup for either methods or fields: Use self.f for dynamic
lookup and use f for static lookup --- that's it. This is my favorite
solution to this mess.
For example, if refresh references all its methods and fields
dynamically, it is written like this:
proc refresh() { // code for resetting and repainting the button
... self.x ... self.y
self.paint()
}
and if it references all its methods and fields statically, it is written
like this:
proc refresh() { // code for resetting and repainting the button
... x ... y
paint()
}
and finally, if it references methods dynamically and fields statically,
it looks like this:
proc refresh() { // code for resetting and repainting the button
... x ... y
self.paint()
}
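The difference between the two kinds of method lookup can be tried out in real Python. The following is a minimal sketch, not the implementation described above: the class and method names are borrowed from the running example, refresh_dynamic and refresh_static are hypothetical names, and static lookup is approximated by naming the class explicitly, since a bare paint() inside a Python method is not resolved in the class.

```python
class Button:
    def paint(self):
        return "Button.paint"

    def refresh_dynamic(self):
        # self.paint is looked up starting from the class of the receiving object
        return self.paint()

    def refresh_static(self):
        # naming the class pins the call to Button's own version of paint
        return Button.paint(self)


class MyButton(Button):
    def paint(self):
        return "MyButton.paint"


b = MyButton()
print(b.refresh_dynamic())  # MyButton.paint
print(b.refresh_static())   # Button.paint
```

Both calls run code written in Button, yet only the dynamic one reaches the paint defined in MyButton.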
Subclasses do not arise from a language extension principle. Class extension
is a brand new language operation, in which structs are appended together.
Returning to our core
language, we get subclasses
by adding an append operation, +,
to the syntax of type templates:
===================================================
T ::= ... | struct D end | T1 + T2
===================================================
This lets us code structs in stages, e.g., a three-field struct
can be assembled like this:
var p = new (struct
var x = 0;
var y = 0;
end
+
struct
var z = new array[3] of 0;
end);
p.z[1] = p.x;
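The idea that T1 + T2 is itself a storage-allocation template can be sketched in Python. This is an illustration under stated assumptions, not the core language's definition: struct, append, and new are hypothetical helper names, a namespace is modelled as a dictionary, and on a name clash the later part simply wins.

```python
def struct(**fields):
    """Return a template: a function that allocates a fresh namespace."""
    def allocate():
        # run each initializer so that every object gets its own storage
        return {name: init() for name, init in fields.items()}
    return allocate


def append(t1, t2):
    """Template for T1 + T2: allocate both parts and merge them."""
    def allocate():
        ns = t1()
        ns.update(t2())  # assumption: on a name clash, the later part wins
        return ns
    return allocate


def new(template):
    return template()


xy_part = struct(x=lambda: 0, y=lambda: 0)
z_part = struct(z=lambda: [0] * 3)

p = new(append(xy_part, z_part))
p["x"] = 7
p["z"][1] = p["x"]               # corresponds to  p.z[1] = p.x
q = new(append(xy_part, z_part))

print(p)  # {'x': 7, 'y': 0, 'z': [0, 7, 0]}
print(q)  # {'x': 0, 'y': 0, 'z': [0, 0, 0]}
```

Each use of new runs the template again, so p and q do not share the array held in z.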
The above does not seem so useful,
but once we add class names and procedures, it gets interesting:
===================================================
class Button = struct
var x = ...
var y = ...
proc paint() { pass }
proc refresh() { ... x ... y ... paint() }
end
class CustomFeatures = struct
var x = ...
var z = ...
proc customize() { ... x ... z ... super.y }
proc paint() { ... }
end
class MyButton = Button + CustomFeatures // aha!
===================================================
Some object-oriented languages use classes exactly
as shown here. (The class-fragments are called mix-ins.)
But
mainstream object-oriented languages (e.g., Java)
allow only an ``incremental'' append, where a named ``base class''
is extended with new fields, like we saw in the previous section.
In any case, MyButton is a subclass of Button because it builds on Button: it has all of Button's fields and then some. Subclassing is an abbreviation for appending structs.
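Python's multiple inheritance gives a rough, runnable analogue of Button + CustomFeatures. The sketch below is an approximation only: Python searches the listed classes from left to right, so the fragment that should win is written first, and instance fields live in a single namespace per object rather than in layered ones, so only the layering of methods is modelled.

```python
class Button:
    def __init__(self):
        self.x = 0
        self.y = 0

    def paint(self):
        return "plain"

    def refresh(self):
        return "refresh -> " + self.paint()


class CustomFeatures:
    # a mix-in: it relies on fields (here y) supplied by whatever it is appended to
    def paint(self):
        return "custom"

    def customize(self):
        return "customize sees y = " + str(self.y)


class MyButton(CustomFeatures, Button):  # roughly Button + CustomFeatures
    pass


b = MyButton()
print(b.refresh())    # refresh -> custom
print(b.customize())  # customize sees y = 0
print([c.__name__ for c in MyButton.__mro__])
# ['MyButton', 'CustomFeatures', 'Button', 'object']
```

The last line prints the search order that Python uses for a dynamic lookup on a MyButton object.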
Dynamic scoping (method override) is a central feature of modern object-oriented
programming. But even innocent uses of it can lead to huge trouble.
We can see this in Java. Here is a starter Java example,
where a class Point, representing a pixel on a graphics display,
is extended by class ColoredPoint, which is a point plus RGB-color information. Notice the overridden method, toString. This is an entirely standard
use of method overriding:
===================================================
class Point {
  int x; int y;    // the x,y coordinates of a point
  Point(int initx, int inity) {
    x = initx; y = inity;
  }
  public String toString()    // must be public because it overrides Object's toString
    { return "Point: " + x + "," + y; }
}
class ColoredPoint extends Point {
  int[] color;    // the RGB values of a colored point
  ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
    super(initx, inity);    // call Point's constructor
    color = new int[3];
    color[0] = initr; color[1] = initg; color[2] = initb;
  }
  public String toString() {
    return "ColoredPoint: " + super.x + "," + super.y
           + "; colors: " + color[0] + "," + color[1] + "," + color[2]; }
}
Point a = new Point(0,0); // at position 0,0 --- the upper left corner
ColoredPoint b = new ColoredPoint(0,0, 255, 0, 100); // violet
System.out.println(a.toString()); // prints "Point: 0,0"
System.out.println(b.toString()); // prints "ColoredPoint: 0,0; colors: 255,0,100"
===================================================
When ColoredPoint object b is constructed, it has two methods named
toString, but
b.toString() does a dynamic lookup and
executes
the newer version of toString within b, namely the one from
class ColoredPoint. In this way, the older method
is overridden (cancelled) by the newer one.
So far, so good! But strange things can happen with dynamic lookup.
Here is the example of Point and
ColoredPoint with methods for equality comparison.
Because a colored point is a point-plus-color, its
equals method is redefined so that
whenever a colored point
is compared to an ordinary (noncolored) point for equality,
the result is false:
===================================================
class Point {
  int x; int y;
  Point(int initx, int inity) {
    x = initx; y = inity;
  }
  boolean equals(Point q) {
    return x == q.x && y == q.y;
  }
  boolean hasSameCoordinates(Point q) {
    return equals(q);
  }
}
class ColoredPoint extends Point {
  int[] color;
  ColoredPoint(int initx, int inity, int initr, int initg, int initb) {
    super(initx, inity);
    color = new int[3];
    color[0] = initr; color[1] = initg; color[2] = initb;
  }
  boolean equals(Point q) {
    if (q instanceof ColoredPoint) {
      ColoredPoint c = (ColoredPoint) q;    // cast so that q's color field is visible
      return hasSameCoordinates(q) &&
             color[0] == c.color[0] &&
             color[1] == c.color[1] &&
             color[2] == c.color[2];
    }
    else { return false; }    // all Points are nonequal to ColoredPoints
  }
}
===================================================
We coded ColoredPoint so that its equals method overrides
the equals method of Point, meaning that the latter is never
used within a ColoredPoint object.
Within equals of ColoredPoint, notice that
hasSameCoordinates checks whether two points have the same
x,y coordinates.
Our innocent-looking coding goes horribly wrong:
===================================================
Point a = new Point(0,0);
ColoredPoint b = new ColoredPoint(0,0, 255, 0, 100);
a.hasSameCoordinates(b); // calls hasSameCoordinates in Point, which
// calls equals in Point, which returns true
b.equals(a); // calls equals in ColoredPoint, which returns false
a.equals(b); // calls equals in Point(!) which returns true (?!)
b.hasSameCoordinates(a); // calls hasSameCoordinates in Point, which
// calls equals in ColoredPoint(!),
// which returns false (!!)
b.equals(b); // calls equals in ColoredPoint, which
// calls hasSameCoordinates in Point, which
// calls equals in ColoredPoint(!!), which
// calls hasSameCoordinates in Point, which
// calls equals in ColoredPoint, which ... repeats forever )-:
===================================================
Almost nothing goes correctly, thanks to dynamic lookup!
The problem is that the coding of
hasSameCoordinates must call the coding of equals
within class Point to work correctly.
This is destroyed by dynamic method lookup --- what we see in the
coding of class Point has no relationship to what the computer does.
How can we write programs in a language that we cannot trust
with our own eyes?
In this situation, we must draw the storage layout and trace the use of the super and self linkages to understand the consequences of dynamic scoping of virtual methods.
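The same trap, and one possible way out of it, can be reproduced in a few lines of Python. This is a sketch under assumptions rather than a recommended design: the methods whose names end in _fixed are hypothetical additions that call Point's equals by naming the class, which stands in for a static lookup.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def equals(self, q):
        return self.x == q.x and self.y == q.y

    def has_same_coordinates(self, q):
        return self.equals(q)         # dynamic: may reach a subclass's equals

    def has_same_coordinates_fixed(self, q):
        return Point.equals(self, q)  # static: always Point's equals


class ColoredPoint(Point):
    def __init__(self, x, y, r, g, b):
        super().__init__(x, y)
        self.color = [r, g, b]

    def equals(self, q):
        if isinstance(q, ColoredPoint):
            return self.has_same_coordinates(q) and self.color == q.color
        return False

    def equals_fixed(self, q):
        if isinstance(q, ColoredPoint):
            return self.has_same_coordinates_fixed(q) and self.color == q.color
        return False


a = Point(0, 0)
b = ColoredPoint(0, 0, 255, 0, 100)

print(a.has_same_coordinates(b))        # True
print(b.has_same_coordinates(a))        # False
print(b.has_same_coordinates_fixed(a))  # True

try:
    b.equals(b)
except RecursionError:
    print("b.equals(b) never terminates")

print(b.equals_fixed(b))                # True
```

Pinning the inner call to Point's equals restores the behavior that the text of class Point appears to promise, at the price of giving up override for that one call.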
Imperative languages are for updating storage in small, baby steps. Perhaps the storage is one big piece of primary memory, with millions of small cells, or perhaps the storage is split into hundreds of objects, each with its own location and each holding just a few cells, or perhaps the storage is the grid of RGB-pixels that lights up your computer's display. In any case, if the computation requires locating a cell in a storage structure, reading it, and changing it, then you will be using an imperative language to do it.
In the chapters that follow, we consider other views of computation.