Copyright © 2012 David Schmidt

Namespace-based language modelling


1 Background: Virtual machine, interpreter, compiler
    1.1 C commands a von Neumann machine
2 Namespace-based semantic modelling
    2.1 Bindings and algebra
    2.2 Namespaces and algebra
    2.3 A virtual machine that computes namespaces
    2.4 An interpreter for an object-oriented language
3 Semantics definition == Definitional interpreter
    3.1 Operations upon the heap
    3.2 Semantic domains: L-values
    3.3 Operations for an activation stack
    3.4 Semantics equations
    3.5 A worked example, interpretive style
    3.6 A worked example, translation style
4 Tennent's naming principles
    4.1 Abstraction principle
    4.2 Parameterization principle
    4.3 Qualification principle
5 Compound namespaces and the semantics of this
    5.1 Subclasses
    5.2 Semantics of mix-ins
6 Eliminating the activation stack

How do programming languages appear?

  1. A new language is developed to satisfy a need. For example, Fortran was developed to help physicists program their equations, Simula was developed to do simulations, Pascal was developed for portability, the C-languages were developed for systems programming, and Java was developed for embedded systems. There are many, many more examples, especially domain-specific languages (e.g., Javascript, Matlab, HTML, SQL, Excel, ...). When you work in an application area, it is inevitable that you will want a language customized to that area. You might end up inventing it yourself.

  2. A language is designed by one person or a small team with a common vision. The designer has a deep understanding of both the intended application area and its runtime (hardware) platform.

  3. The language's design begins with the runtime platform (runtime machine, virtual machine) --- the purpose of a programming language is to manipulate a machine.

  4. The language is prototyped (given a quick implementation) in terms of an interpreter coded in another, simpler language or even in terms of a core subset of itself ("bootstrapping"). Only after extensive use are efficient implementations or compiler implementations developed.

Although few of us design general-purpose programming languages, experienced programmers regularly design Domain-Specific Programming Languages (DSLs) for specialized domains like telecommunications, avionics, graphics, text processing, databases, bioinformatics, gaming, and so on.

Language design should follow a methodology, just like software development does. We will learn how to use a meta-language based on namespaces (dictionaries) to design and prototype a programming language.


1 Background: Virtual machine, interpreter, compiler

A programming language is a language for commanding a machine. There are many varieties of machines, hence there should be many varieties of programming languages. It is partly accident and partly economics that the von Neumann architecture is the predominant machine for computing. So, what is the programming language for a von Neumann machine?

The processor of a von Neumann machine understands machine language, which can be typed on a keyboard as assembly language. The C language and its ancestors (CPL, BCPL, and B) are assembly language "dressed up" with assignment equations, subroutines, and array and struct data structures. C is translated to machine language by a compiler. (See the next section.)

But there are other computing machines: there is an "expression machine", called a Lisp machine, that was proposed in the 1960s and built in the 1980s. There is a "stack machine" that was commanded by the language Algol60 and constructed by the Burroughs Company. Other machines were designed, some a century or more ago, but never built.

These days, software and language architects regularly design machines and their languages but emulate the machines in software --- the emulation is called a virtual machine. In its simplest form, a virtual machine is one or more data structures and operations that use the structures. The virtual-machine operations are the machine language.

How do we execute programs for a virtual machine? Do we write programs in machine language? We can, but that is tedious and error prone. Instead we invent a "programmer's language" (actually, "programming language") whose constructions correspond to operations in the machine language. An interpreter is a controller program for the virtual machine: it reads a program written in the programming language and executes the corresponding machine-language operations. A good example is a Javascript interpreter, which reads a Javascript program and executes operations on the Javascript virtual machine (the "engine") embedded within a web browser. Other examples are the implementations of Smalltalk, Scheme, Ruby, and Python.

Many languages, especially those with declarations and static data typing, use a compiler, which reads a program and generates a file of the machine-language operations that the program commands. That is, "An interpreter does the machine operations now; a compiler makes a file of the machine operations for doing later."

Machines, whether virtual or actual, are based on some concept. The von Neumann machine is based on the concept that data is saved in tiny bits, each bit saved in a storage cell, and computation is done by repeatedly examining and altering the data in the cells. The cells are arranged as a long, linear vector. Von Neumann machines are suited for linear algebra, a bit of physics, and some general math, which was fine in the 1940s when they were designed. But computing applies to many more areas these days. That's why the Lisp machine, stack machine, byte-code machine, and many other machines followed. These machines are emulated as virtual machines.

In the 1980s, software architects placed importance on components --- collections of named values. Components communicate with each other by calling each other's names. The component structures look nothing at all like von-Neumann-style storage. What is the virtual machine for an "object-oriented" machine?

In these notes we will learn a meta-language (machine language) that computes on components called namespaces, and we will use the namespace meta-language to write interpreters for object-oriented programming languages like Smalltalk and Java.


1.1 C commands a von Neumann machine

C is a language for doing actions on a von Neumann machine: there is linear storage, where each storage cell is indexed by an integer, its location.

A true "C-virtual-machine" is a von Neumann machine augmented by a table, a symbol table, which holds the variable names in the C program and the storage locations that the names denote.


The interpreter/controller reads a C program and does its actions on the data structures. This trio defines Strachey's characteristic domains for languages.

C's syntax expresses the domains and operations of the machine; here is a minimal, "core" version:

===================================================

P : Program
CL : CommandList                 C : Command
DL : DeclarationList             D : Declaration
E : Expression                   L : LefthandSide
I : Identifier                   N : Numeral

P ::= DL ; CL

DL ::=  D  |  D ; DL 
D ::=  declare I

CL ::=  C  |  C ; CL
C ::=  L = E  |  while ( E ) { CL }  |  if ( E ) { CL1 } else { CL2 }

E ::=  N  |  ( E1 + E2 )  |  L  |  & L

L ::=  I  |  * L

N ::=  string of digits
I ::=  strings of alphanumerics 

===================================================
Assignment, =, manipulates memory, and operators, * and &, compute on locations. Symbol-table use happens within declaration and use of identifiers, I. The syntax defines C's expressible values to be ints and locations.

The program that created the above storage configuration was this one:

declare y; declare z; declare x;
y = 5 ;
z = &y ;      # store in  z's  cell the location named by  y
x = (6 + *z)  # add 6 to the int stored where  z  points; assign the sum to  x
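The storage configuration built by this program can be replayed in a few lines of Python. The sketch below is only an illustration, not C's official semantics: it assumes a tuple encoding of parse trees and hypothetical helper names (declare, lval, expr, assign), models memory as a list indexed by location, and models the symbol table as a dictionary.

```python
# Sketch of the "C virtual machine": linear storage plus a symbol table.
# Parse trees are encoded as tuples, a hypothetical choice for this sketch:
#   L ::= ('var', I) | ('deref', L)
#   E ::= ('num', N) | ('add', E1, E2) | ('addr', L) | L

memory = []   # linear storage; a location is an index into this list
symtab = {}   # symbol table: identifier -> location

def declare(name):
    """declare I: reserve the next free cell and record its location."""
    symtab[name] = len(memory)
    memory.append(0)

def lval(lhs):
    """Compute the location that a left-hand side names."""
    if lhs[0] == 'var':                 # I: the symbol table gives the location
        return symtab[lhs[1]]
    if lhs[0] == 'deref':               # * L: the cell located by L holds a location
        return memory[lval(lhs[1])]
    raise ValueError("not a left-hand side: %r" % (lhs,))

def expr(e):
    """Compute an expressible value: an int or a location."""
    if e[0] == 'num':                   # N
        return e[1]
    if e[0] == 'add':                   # ( E1 + E2 )
        return expr(e[1]) + expr(e[2])
    if e[0] == 'addr':                  # & L: the location itself, not its contents
        return lval(e[1])
    return memory[lval(e)]              # L: fetch the contents of L's cell

def assign(lhs, e):
    """L = E: deposit E's value into the cell located by L."""
    memory[lval(lhs)] = expr(e)

for name in ('y', 'z', 'x'):            # declare y; declare z; declare x;
    declare(name)
assign(('var', 'y'), ('num', 5))                                     # y = 5
assign(('var', 'z'), ('addr', ('var', 'y')))                         # z = &y
assign(('var', 'x'), ('add', ('num', 6), ('deref', ('var', 'z'))))   # x = (6 + *z)

print(symtab)   # {'y': 0, 'z': 1, 'x': 2}
print(memory)   # [5, 0, 11]
```

Note that z's cell holds 0, which is the location of y, so *z fetches 5 and x receives 11.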
  

We might express the symbol table and memory themselves in the syntax. Indeed, C lets you express approximations of both --- a struct is a C-coded "baby symbol table", and an array is a C-coded "baby memory". (We won't develop them here.)

Now, we can enrich the syntax with naming devices --- data types, procedures, functions (or if you are Bjarne Stroustrup, lots more!).

How do we state precisely what the syntax constructions tell the machine to do? There must be some sort of official description! The usual answer is: you write an interpreter, a definitional interpreter, that states the actions taken by each construction. The definitional interpreter itself must be coded in something --- machine language, maybe? Think about how you might write a definitional interpreter for C. How would this differ from a C compiler?


2 Namespace-based semantic modelling

A definitional interpreter can be written in an actual programming language, but better still, we should use the "machine language" of the intended virtual machine to best expose the meanings of the language's constructions. If the virtual machine is simple and cleanly designed, then it will have a simple and clean "machine language" that we can use for writing interpreters. In this case, we call the machine language a meta-language. ("Meta-language" means "a language for talking about another language".)

Programming-language history is full of such meta-languages: VDL (for tree machines), lambda-calculus (for rewriting machines), binary-machine code (for von Neumann/Turing machines) are three famous ones. More recently, there are also the object calculus, CCS, CSP, and various logical frameworks (Isabelle/HOL, Coq, LF).

The meta-language we will use in these notes is "namespace algebra". It is a notation for using namespaces, just like ordinary algebra is a notation for using numbers.


2.1 Bindings and algebra

Algebra defines operations on numbers, how to use the operations to write expressions that mean numbers, and how to name the meanings of expressions. For example, + is an operation, 2 + 3 is an expression, and the expression means 5 because there is a law, 2 + 3 = 5.

A binding is an association between a name and a meaning. The name can be referenced later when the meaning is required. Here is an example, where two bindings help compute a meaning:

let x = 2 + 1
let y = x + x
x * y
We use the laws of algebra to compute the meaning of x * y:
===================================================

let x = 2 + 1 
let y = x + x
x * y         

==>  let x = 3          because 2 + 1 = 3
     let y = x + x
     x * y

==>  let x = 3
     let y = 6          because x + x = 3 + 3 = 6
     x * y

==> 18                  because x * y = 3 * 6 = 18

===================================================
x binds to the meaning, 3, and y binds to the meaning, 6, so that x * y evaluates to the meaning, 18.

Here is a "script" of bindings and expressions to solve:

===================================================

let x = 2 + 1
let y = x + x
x * y
((x + y) * (x * y)) + (2 * (x + y)) + 1
y - 1

===================================================
The script generates two bindings and three integers.
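The same bookkeeping can be mechanized. Here is a small Python sketch that runs the script above and reports its bindings and the meanings of its three expressions; the tuple encoding of expressions and the names evaluate and run are hypothetical, chosen only for this illustration.

```python
import operator

# Expressions are encoded as: an int literal, a variable name (a str),
# or a tuple (op, left, right) with op one of '+', '*', '-'.
OPS = {'+': operator.add, '*': operator.mul, '-': operator.sub}

def evaluate(e, env):
    """Compute the meaning (an int) of expression e under the bindings env."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]                      # use the name's binding
    op, left, right = e
    return OPS[op](evaluate(left, env), evaluate(right, env))

def run(script):
    """Run a script; return (final bindings, meanings of the expressions)."""
    env, results = {}, []
    for step in script:
        if step[0] == 'let':               # ('let', name, expr) makes a binding
            _, name, e = step
            env[name] = evaluate(e, env)
        else:                              # ('expr', expr) produces a meaning
            results.append(evaluate(step[1], env))
    return env, results

script = [
    ('let', 'x', ('+', 2, 1)),
    ('let', 'y', ('+', 'x', 'x')),
    ('expr', ('*', 'x', 'y')),
    ('expr', ('+', ('+', ('*', ('+', 'x', 'y'), ('*', 'x', 'y')),
                         ('*', 2, ('+', 'x', 'y'))), 1)),
    ('expr', ('-', 'y', 1)),
]
print(run(script))   # ({'x': 3, 'y': 6}, [18, 181, 5])
```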

Algebra's syntax and its laws define a meta-language for computing numbers. If we wanted to, we could build a machine that understands algebra and does the computation.


2.2 Namespaces and algebra

Consider an algebra not about numbers but about bindings themselves --- this is exactly the meta-language we need to analyze object languages.

As before, a binding, n=v, associates name n with meaning, v. Names are atoms ("strings"), and meanings are primitives: either integers (0, 1, 2, ...) or labels called handles (in the examples, we use Greek letters, α, β, ...). A namespace is a collection of bindings labelled by a handle, e.g. α:{'x'=3, 'y'=β} is a namespace of bindings for 'x' and 'y', labelled by handle α.

We want an algebra for computing on namespaces. Here are the operations of "namespace algebra":

creation:  alloc{}   creates a new, empty namespace, labelled by a new handle, h. The meaning of the expression is h.

membership: For name n and expression D,   member(n, D)   checks if n is bound in D: Say that D has as its meaning a handle, h. Then the meaning of the expression is True iff there is a binding, n=v, in the namespace labelled h, and is False otherwise.

binding: For name n and expressions E and D,   bind(D,n,E)   makes a binding for name n in namespace D: Say that E has meaning v, and D has handle h as its meaning; then the binding n=v is added to the namespace labelled by h. If the namespace already had a binding for n, then that existing binding is cancelled. h is the meaning of the expression.

lookup:   find(D, n)   finds the value bound to name n in the namespace meant by D: Say that D has as its meaning a handle, h. Then the expression means v when n=v is the binding in the namespace labelled h. If not member(n, D), the operation generates an error and has no meaning.

As in ordinary algebra, we write a sequence of bindings and expressions in namespace algebra. We call the sequence a script. Unlike ordinary algebra, namespace algebra generates both meanings and namespaces. Here is an example script and the computation of its meaning:
    let d = bind(
             bind(alloc{}, 'x', 1),
                            'y', 2)
    let e = alloc{}
    bind(d, 'x', e)
    bind(e, 'z', find(d, 'x'))
                                       (no namespaces yet)

==> let d = α
    let e = alloc{}
    bind(d, 'x', e)
    bind(e, 'z', find(d, 'x'))
                                       and  α:{'x'=1, 'y'=2}
    because  alloc{} ==> α  and  bind(α, 'x', 1) ==> α  and  bind(α, 'y', 2) ==> α

==> let d = α
    let e = β
    bind(d, 'x', e)
    bind(e, 'z', find(d, 'x'))
                                       and  α:{'x'=1, 'y'=2}  β:{}
    because  alloc{} ==> β

==> let d = α
    let e = β
    α
    bind(e, 'z', find(d, 'x'))
                                       and  α:{'x'=β, 'y'=2}  β:{}
    because  bind(d, 'x', e) = bind(α, 'x', β) ==> α

==> let d = α
    let e = β
    α
    β
                                       and  α:{'x'=β, 'y'=2}  β:{'z'=β}
    because  find(d, 'x') = find(α, 'x') ==> β  and  bind(β, 'z', β) ==> β
The example shows how "namespace algebra" constructs namespaces and calculates meanings that are handles and integers. Since the original binding to x in namespace α is updated, the original binding is "cancelled" or "hidden" or "overridden" or "erased". It is OK to draw the revised namespace like this:

Occasionally, we will abbreviate multiple bindings on a new namespace,

bind( ...
   bind( 
     bind(alloc{}, n0, v0),
                   n1, v1), 
                     ...,
                   nm, vm)
by alloc{n0=v0, n1=v1, ..., nm=vm}.

Finally, it will be useful to have a "choice operation", which we write as if B : S1 else S2; that is, expression S1 is evaluated when B is True, and expression S2 is evaluated when B is False.

(Comment: in ordinary algebra on numbers, we let 0 stand for False and 1 stand for True, so if B: S1 else S2 is really just this arithmetic expression: (B * S1) + ((1 - B) * S2). The choice operation is coded with gates in circuit theory the same way, where * is an and-gate, + is an or-gate, and 1 - B is a not-gate.)
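A minimal Python sketch of the arithmetic encoding of choice, assuming B is exactly 0 or 1; the names choice and mux are hypothetical.

```python
def choice(b, s1, s2):
    """if b : s1 else s2, using only *, + and 1 - b (b must be 0 or 1)."""
    assert b in (0, 1), "b must be 0 (False) or 1 (True)"
    return (b * s1) + ((1 - b) * s2)

def mux(b, s1, s2):
    """Gate-level version over single bits: AND for *, OR for +, NOT for 1 - b."""
    return (b & s1) | ((1 - b) & s2)

print(choice(1, 10, 20), choice(0, 10, 20))   # 10 20
```

One caveat: the arithmetic form computes both S1 and S2 before one result is multiplied away, which is harmless for numbers but would matter if the branches updated the heap.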

Namespaces appear in semantic models as:
  symbol tables
  environments
  hash tables
  record structs
  objects
  activation records
They can also model closures, stacks, finite-domain functions, vectors, and more.


2.3 A virtual machine that computes namespaces

We use namespace algebra as the machine language of a virtual machine. The machine has a heap as its storage structure. Namespaces are constructed in the heap, indexed by their handles.

We don't want to build the "heap machine" out of gates and wires; we want to emulate it in software. There are two ways to do this: "top down" and "bottom up". We'll develop these approaches when we study Domain-Specific Programming Languages, and for now, we employ the bottom-up approach, which is particularly easy for the small namespace algebra:

  1. We choose an existing, "host" programming language that can easily model namespaces as dictionaries.
  2. We write functions in the host language to emulate the operations, alloc, member, find, and bind.
  3. We write a namespace-algebra script as a program in the host language that calls the functions coded in Step 2.

For Step 1, we use Python as the host language, because it has dictionaries as a built-in data structure. Python lets us define functions, and a Python program is a script just like a namespace-algebra script.

To get started, we make an important observation: a heap is itself a form of namespace that binds handles to namespaces. Our virtual machine will maintain a variable, heap, whose value is a dictionary:

===================================================

  heap : { Handle : Namespace }
where
  Namespace = { Identifier : Meaning }
  Meaning = Handle | Int

===================================================
Handles will be modelled as Python strings.

For Step 2, we write functions to emulate the namespace-algebra operations. Here are sample Python codings for the heap and the operations alloc and bind:

===================================================

heap = {}      # holds labelled namespaces
heapCount = 0  # how many namespaces are stored in the heap

def alloc(d):
    """allocates a new object in the heap.
       param d - a dictionary holding bindings to be copied
        into the new object
       returns - the handle to the new object
    """
    newhandle = genHandle()  # see just below
    heap[newhandle] = {}
    for name in d :
        heap[newhandle][name] = d[name]
    return newhandle

def genHandle() :
    """returns a new handle, a string, hi, where  i  is an int"""
    global heapCount
    han = 'h' + str(heapCount)
    heapCount = heapCount + 1
    return han


def bind(handle, fieldname, meaning):
    """binds fieldname to meaning in the namespace labelled by handle"""
    if handle not in heap :
        raise Exception("bind error: " + str(handle) + " does not exist in the heap")
    heap[handle][fieldname] = meaning
    return handle


def printHeap() :
    """prints the contents of the heap, in order of handle"""
    print("heap = {")
    for h in sorted(heap) :
        print("  ", h, ":", heap[h])
    print("}")

===================================================
You should write the functions that implement member and find. Function printHeap is called to display the contents of the emulated heap storage.

Say that all these functions, together with your member and find, are saved in a file, NamespaceVM.py. For Step 3, we write a Python script that holds the algebra expression to compute:

===================================================

from NamespaceVM import *  # link to the emulator

# the namespace-algebra script follows:
d = bind(bind(alloc({}), 'x', 1), 'y', 2)
e = alloc({})
bind(d, 'x', e)
bind(e, 'z', find(d, 'x'))

# display the final configuration:
printHeap()

===================================================
This will print:
heap = {
   h0 : {'x': 'h1', 'y': 2}
   h1 : {'z': 'h1'}
}
showing that two namespaces were created in the heap, labelled by handles h0 and h1.
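For comparison with your own solution, here is one possible Python 3 sketch of member and find, bundled with the other operations so that it runs on its own. Raising KeyError on a failed lookup is an assumption of this sketch; the text only says that find "generates an error".

```python
heap = {}        # handle -> namespace (a dict of bindings)
heapCount = 0    # number of namespaces allocated so far

def genHandle():
    """Return a fresh handle 'h0', 'h1', ..."""
    global heapCount
    han = 'h' + str(heapCount)
    heapCount += 1
    return han

def alloc(d):
    """Allocate a namespace holding a copy of d's bindings; return its handle."""
    handle = genHandle()
    heap[handle] = dict(d)
    return handle

def bind(handle, fieldname, meaning):
    """Bind fieldname to meaning in the namespace labelled handle."""
    if handle not in heap:
        raise KeyError("bind error: " + str(handle) + " does not exist in the heap")
    heap[handle][fieldname] = meaning
    return handle

def member(fieldname, handle):
    """True iff fieldname is bound in the namespace labelled handle."""
    return handle in heap and fieldname in heap[handle]

def find(handle, fieldname):
    """Return the meaning bound to fieldname in the namespace labelled handle."""
    if not member(fieldname, handle):
        raise KeyError("find error: " + str(fieldname) + " is not bound in " + str(handle))
    return heap[handle][fieldname]

# the namespace-algebra script from the text
d = bind(bind(alloc({}), 'x', 1), 'y', 2)
e = alloc({})
bind(d, 'x', e)
bind(e, 'z', find(d, 'x'))

print(member('x', d), member('w', d))   # True False
print(heap)   # {'h0': {'x': 'h1', 'y': 2}, 'h1': {'z': 'h1'}}
```

The argument order follows the algebra: member(n, D) but find(D, n).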


2.4 An interpreter for an object-oriented language

The heap machine for namespace algebra is an excellent starting point: we will define a "programmer's language" for the heap machine's "machine language." The former will be what we call an object-oriented language.

The interpreter for the object-oriented language will look like this:


The input to the engine is an object-program converted into parse-tree format. As the engine traverses the tree and reads its contents, it performs namespace algebra, which constructs namespaces in the heap.

The engine has a register that holds a distinguished handle, named ns, which is the handle to the "active namespace" whose bindings are visible to the command currently being executed. In Java/C#, you would say that the "active namespace" is "this object" or just "this".

Here is the object language whose programs are input to the interpreter:

===================================================

C : Command                 L : LefthandSide
E : Expression              I : Variable
F : FieldName               N : Numeral

C ::=  L = E  |  if E : C1 else C2  |  while E : C  |  C1 ; C2

E ::=  N  |  ( E1 + E2 )  |  L  |  new { F }

F ::=  I  |  I , F

L ::=  I  |  L . I  |  this

N ::=  string of digits
I ::=  strings of letters, not including keywords

===================================================
Here is a sample object program that plausibly generates the earlier picture: (We still have to work out the details!)
x = 7;
y = new {f, g};
y.f = x;   
y.g = new {r};
z = y.g;
z.r = y.f;
   
In the next section, we will formalize how each command in the program maps to operations in namespace algebra. For the moment, follow your intuitions and try to explain how the commands in the above program would construct namespaces and set bindings.

Constructor code for new objects

The syntax for new namespaces (new{I1,I2,...}) is inadequate --- if we construct a new object, we want to initialize its fields. Here is a simple but perhaps surprising way to do just that:

E ::=  . . .  |  new { C }
The (compound) command C computes bindings that fill a newly allocated namespace --- C is "constructor code." (In Java/C#, C would be the code inside the "constructor method".)

Here is an example where there is constructor code inside a new object:

x = 7;
y = new { f = x;
          g = new {r = x}; // (*)
        };
z = y.g;                // assigns the handle, y.g, to z
z.r = y.f + y.g.r + x   // assigns 21  to  y.g.r's  cell
The second command assigns to variable y the handle of a new namespace and fills the namespace with bindings for f and g (g is assigned the handle of another new namespace, which holds a binding for r).

How does the machine make these bindings?

When the program starts, the ns register in the interpreter points to the namespace where global variables, x, y, etc., will be saved. But when y = new {...} is encountered, the machine allocates a new namespace and resets the value of ns to the handle for the new namespace. So, variable f will be stored in the new namespace (and not the namespace where global variables live). And when g = new{...} is encountered, the ns register is reset again, to the handle of the new(est) namespace. At that point in the program, there are three namespaces that are active.

To remember the various values of ns, the machine maintains a stack of the namespaces' handles. The handle at the top of the stack refers to the local namespace. The stack is itself implemented with namespaces --- this is how it's done in Smalltalk, Javascript, Ruby, etc.!

Here is a snapshot of the execution at point (*), just before the namespace labelled by β is finished and β is assigned to y:


There is a stack (activation stack) of the active namespaces. In the diagram, the stack is a linked list whose cells possess fields ns and parent. The topmost cell in the activation stack holds the handle of the local namespace that is used for variable lookups. The parent fields chain together the linked list. Register actstack in the engine points to the top of the stack --- to the cell that holds the active value of ns. Once we have multiple active namespaces, we have multiple occurrences of ns!

In addition, parent links are added to all namespaces to indicate where each namespace's nonlocal variables are found. For example, object β's parent is object α (because it is nested within α's namespace). This is how the global x can be found so that the assignments f = x and then r = x can be evaluated.

The semantics of new{C} goes like this:

(i) allocate a new namespace and push its handle on the activation stack;
(ii) evaluate C, using the topmost handle on the activation stack for variable lookups and updates;
(iii) once C concludes, pop the stack;
(iv) return the handle that was popped as the meaning of new{C}.
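Steps (i) through (iv) can be sketched in Python. This is a hypothetical encoding, not the official semantics: constructor code C is modelled as a Python function, None plays the role of an empty stack, and each object namespace records its enclosing namespace under the key 'parent', as described above. Because stack cells and objects share one heap, the handles come out interleaved (h0 is the global namespace, h1 a stack cell, h2 y's object, and so on).

```python
heap = {}
actstack = None                    # handle of the topmost stack cell; None = empty stack

def alloc(d):
    """Allocate a namespace holding a copy of d; handles are h0, h1, ..."""
    h = 'h' + str(len(heap))
    heap[h] = dict(d)
    return h

def push(h):
    global actstack
    actstack = alloc({'ns': h, 'parent': actstack})

def pop():
    global actstack
    actstack = heap[actstack]['parent']

def top():
    return None if actstack is None else heap[actstack]['ns']

def lookup(name):
    """Inside-out search from the active namespace along 'parent' links."""
    h = top()
    while h is not None:
        if name in heap[h]:
            return heap[h][name]
        h = heap[h]['parent']
    raise NameError(name)

def new(constructor):
    h = alloc({'parent': top()})   # (i) allocate, linked to the enclosing namespace,
    push(h)                        #     and push its handle
    constructor()                  # (ii) run C against the topmost handle
    pop()                          # (iii) pop the stack
    return h                       # (iv) the popped handle is the meaning of new{C}

# x = 7;  y = new { f = x;  g = new { r = x } }
push(alloc({'parent': None}))      # the namespace for global variables
heap[top()]['x'] = 7

def inner():                       # constructor code  r = x
    heap[top()]['r'] = lookup('x')

def outer():                       # constructor code  f = x; g = new {...}
    heap[top()]['f'] = lookup('x')
    heap[top()]['g'] = new(inner)

heap[top()]['y'] = new(outer)
```

Afterwards h0 binds x to 7 and y to h2; h2 binds f to 7 and g to h4; and h4 binds r to 7, which was found by following h4's parent link to h2 and then h2's parent link to h0.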

Variable lookup: "inside out" and "outside in"

Reconsider this fragment of the previous example:
x = 7;
y = new { f = x;
          g = new {r = x};
        };
z = y.g;
z.r = y.f + y.g.r + x
There are two forms of variable lookup:
  1. A name, like x, is mentioned by itself, as in r = x. The namespace that holds a binding for x must be located by a search, starting from the local namespace and following the parent links. We call this an "inside out" lookup. The path of parent links is called the static chain.
  2. A name, like y.g.r, states the path to find the binding for r. There is no search: y names a namespace that holds a binding for g that is a handle to a namespace that holds the binding for r. This is an "outside in" lookup.
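Both forms of lookup can be sketched in Python over a hand-built heap. The handles alpha, beta, gamma and the use of a 'parent' key for the static link are assumptions of this sketch.

```python
# Hand-built heap matching  x = 7; y = new { f = x; g = new { r = x } }.
heap = {
    'alpha': {'parent': None,    'x': 7, 'y': 'beta'},
    'beta':  {'parent': 'alpha', 'f': 7, 'g': 'gamma'},
    'gamma': {'parent': 'beta',  'r': 7},
}

def inside_out(local, name):
    """Search the static chain, starting at the local namespace, and
    return the handle of the first namespace that binds name."""
    h = local
    while h is not None:
        if name in heap[h]:
            return h
        h = heap[h]['parent']
    raise NameError(name + " is not bound on the static chain")

def outside_in(start, path):
    """Follow a stated path of names from the namespace labelled start.
    There is no search: each name must be bound exactly where stated."""
    value = start
    for name in path:
        value = heap[value][name]
    return value

print(inside_out('gamma', 'x'))                        # alpha
print(outside_in('alpha', ['y', 'g', 'r']))            # 7
print(outside_in('gamma', ['parent', 'parent', 'x']))  # 7
```

The last call follows two parent links before fetching x, which is one plausible reading of super.super.x in the outside-in-only version of the example.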
All object languages support outside-in lookup, and some --- not all --- support inside-out lookup as well. A language that has only outside-in lookup makes the programmer state exactly where to find a variable. Here is how the previous example is written using only outside-in lookup:
x = 7;
y = new { f = super.x;
          g = new {r = super.super.x};
        };
z = y.g;
z.r = y.f + y.g.r + this.x
super.x references the x bound in the namespace that is one level more global than the local one. this.x references the x bound in the local namespace. You should think about how to implement this and super with the ns and parent links in the heap machine. (Later, this and super take on a slightly different meaning.)

Java, C#, and C use outside-in lookup, and they limit inside-out lookup to just "local and one-level up". Python has both inside-out and outside-in lookup.

The example object language we will develop in these notes contains both inside-out and outside-in lookup.

Variable declarations

Consider this example:

x = 7;
y = new {z = x; x = 0};
x = x + y.z
Is variable z stored in y's namespace or in the global namespace? Does x = 0 reset global x or create a second x in y's namespace?

We must remove this confusion. There are several ways to do so, but the simplest is to add a new keyword, var, which we use to create a new binding in the local namespace. We rewrite the example like this:

var x = 7;
var y = new {var z = x; x = 0};
x = x + y.z
Now it is clear that variables x and y are bound in the "global" namespace, variable z is bound in y's namespace, and "global" x is updated to 0.

var I = E is a declaration of a new binding, and L = E is an assignment to an existing binding.
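The distinction can be sketched in Python as a hand-coded trace of the example, not an interpreter. The handles g0 and o1, the 'parent' key, and the choice to raise NameError when an undeclared name is assigned are all assumptions of this sketch.

```python
heap = {'g0': {'parent': None}}        # the global namespace

def declare(local, name, value):
    """var I = E: make a new binding in the local namespace."""
    heap[local][name] = value

def assign(local, name, value):
    """I = E: update the nearest existing binding on the static chain."""
    h = local
    while h is not None:
        if name in heap[h]:
            heap[h][name] = value
            return
        h = heap[h]['parent']
    raise NameError(name + " is assigned but never declared")

declare('g0', 'x', 7)                  # var x = 7;
heap['o1'] = {'parent': 'g0'}          # var y = new { ...
declare('o1', 'z', heap['g0']['x'])    #   var z = x;   (x read from the global ns)
assign('o1', 'x', 0)                   #   x = 0        (updates the global x)
declare('g0', 'y', 'o1')               # ... };
assign('g0', 'x', heap['g0']['x'] + heap[heap['g0']['y']]['z'])   # x = x + y.z

print(heap)  # {'g0': {'parent': None, 'x': 7, 'y': 'o1'}, 'o1': {'parent': 'g0', 'z': 7}}
```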


3 Semantics definition == Definitional interpreter

The previous examples raised significant questions, so we need a precise semantics definition for the object language. We write the semantics in namespace algebra, formatting the semantics in Strachey-style equations. For this syntax of the object-oriented language:
===================================================

P : Program
C : Command           E : Expression
T : Template          L : Lefthandside

P ::=  T

C ::=  var I = E  |  L = E  |  C1 ; C2  |  while E : C end

E ::=  N  |  ( E1 + E2 )  |  L  |  new T

T ::=  { C }

L ::=  I  |  L . I  |  this  

===================================================
we write a semantics that translates each syntax construction into a script of namespace algebra. The script explains exactly what each construction means.

There are two notions to be worked out before we study the semantics.


3.2 Semantic domains: L-values

In a C-like language, for an assignment, L = E, the meaning of E is computed and deposited into the storage cell whose location number is the meaning of L.

What if L is an array expression? The assignment A[I] = E computes A to the base address of the array, and I computes to an offset or index within the array. The pair, (base, offset), is then computed to a storage cell location where the meaning of E is deposited.

If L is a struct expression, then S.I = E computes S to a base address and I computes to an offset; the pair, (base, offset), is computed to a location where the meaning of E is deposited.

Data-structure computation uses a base+offset pair to state the coordinates where an update should be made. The pair is called an L-value ("left-hand-side value").

Arrays and structs are data structures, and so are namespaces. In the object language, the semantics of assignment, L = E, uses L-values as follows:

  1. L is computed to the L-value, (handle, index).
  2. E is computed to a meaning, m.
  3. The binding, index = m, is made within the namespace labelled by handle.
Consider the L-values that are computed in this example for x and y.f:
var x = 7;
var y = new { var f = x };
x = y.f;
An inside-out lookup for x searches the static chain for the namespace labelled h that holds x's binding. The L-value is (h, 'x'). An outside-in lookup of y.f computes the handle, g, named by y so that the L-value is (g, 'f').

The L-values are used to do find and bind operations.
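The L-value computation can be sketched in Python. The heap is hand-built for the example program above, the handle names h and g are taken from the text, and encoding L as a list of names is an assumption; find and bind take the two parts of an L-value as separate arguments, as in namespace algebra.

```python
# Hand-built heap after  var x = 7;  var y = new { var f = x };
heap = {
    'h': {'parent': None, 'x': 7, 'y': 'g'},   # global namespace
    'g': {'parent': 'h', 'f': 7},              # y's namespace
}

def lvalue(local, path):
    """Compute the L-value (handle, name) of I or L.I, given as a list of names.
    The first name is found inside-out; later names are followed outside-in."""
    handle = local
    while path[0] not in heap[handle]:         # inside-out search along the static chain
        handle = heap[handle]['parent']
        if handle is None:
            raise NameError(path[0] + " is not declared")
    for prev in path[:-1]:                     # outside-in: each earlier name holds a handle
        handle = heap[handle][prev]
    return (handle, path[-1])

def find(handle, name):
    return heap[handle][name]

def bind(handle, name, meaning):
    heap[handle][name] = meaning
    return handle

print(lvalue('h', ['x']), lvalue('h', ['y', 'f']))          # ('h', 'x') ('g', 'f')
bind(*lvalue('h', ['x']), find(*lvalue('h', ['y', 'f'])))   # x = y.f
```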


3.3 Operations for an activation stack

The object language allows nested objects, so its interpreter will require an activation stack to remember the handles of active, unfinished objects.

We model the activation stack by a linked list of namespaces, like we saw in the diagrams. This requires a register (variable) in the interpreter that remembers the top of the stack:

===================================================

  actstack : Handle  # remembers the handle to the currently active namespace 

===================================================
We define the usual operations on a linked-list stack whose cells have the format, {'ns'= Handle, 'parent'= Handle}. We define the functions with namespace-algebra notation.
===================================================

### make the namespace labelled  h  the active namespace:
let push(h) == let newtop = alloc{'ns'= h, 'parent'= actstack}
               let actstack = newtop

### forget the active namespace:
let pop() ==  let actstack = find(actstack, 'parent')

### retrieve the handle of the active namespace:
let top() == if actstack == nil : nil
             else : find(actstack, 'ns')

===================================================

Here is a namespace-algebra script that uses the functions to generate the updates demanded by the program, var x = 7; var y = new {var f = 8}; y.f = 99. Starting with an empty stack, it allocates a namespace to hold global variables, then declares x in it, and constructs a second namespace whose handle is y's value so that y is declared too:

let actstack = nil
push(alloc{})            # namespace for global vars 

bind(top(), 'x', 7)      # declare  var x = 7  in global ns

let newhandle = alloc{}  # allocate namespace for  y
push(newhandle)          # make the new namespace the active one
bind(top(), 'f', 8)      # var f = 8  in new namespace
pop()                    # now that the new object is initialized, it is no longer active
bind(top(), 'y', newhandle)      # var y = ...  in global ns

bind(find(top(), 'y'), 'f', 99)  #  y.f = 99

The script shows a pattern of pushing and popping handles so that declarations use the top handle of the activation stack. The pattern is formalized in the semantics definition of the source programming language.
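The stack operations and the script can be replayed in Python. This sketch assumes None for nil and handles h0, h1, ... generated in allocation order; it mirrors the namespace-algebra definitions rather than adding anything to them.

```python
heap = {}
actstack = None                  # register: handle of the topmost stack cell (None = nil)

def alloc(d):
    """Allocate a namespace holding a copy of d; handles are h0, h1, ..."""
    h = 'h' + str(len(heap))
    heap[h] = dict(d)
    return h

def bind(h, name, meaning):
    heap[h][name] = meaning
    return h

def find(h, name):
    return heap[h][name]

def push(h):
    """Make the namespace labelled h the active namespace."""
    global actstack
    actstack = alloc({'ns': h, 'parent': actstack})

def pop():
    """Forget the active namespace."""
    global actstack
    actstack = find(actstack, 'parent')

def top():
    """Retrieve the handle of the active namespace."""
    return None if actstack is None else find(actstack, 'ns')

push(alloc({}))                  # namespace for global vars
bind(top(), 'x', 7)              # var x = 7
newhandle = alloc({})            # allocate namespace for y
push(newhandle)                  # make the new namespace the active one
bind(top(), 'f', 8)              # var f = 8
pop()                            # the new object is initialized; it is no longer active
bind(top(), 'y', newhandle)      # var y = ...
bind(find(top(), 'y'), 'f', 99)  # y.f = 99

for h, ns in heap.items():
    print(h, ns)
# h0 {'x': 7, 'y': 'h2'}
# h1 {'ns': 'h0', 'parent': None}
# h2 {'f': 99}
# h3 {'ns': 'h2', 'parent': 'h1'}
```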

Exercise

Do the calculation of the previous namespace-algebra script. When you encounter a function call to push or pop or top, expand it in this two-step style:
let f(x) == E

 ... f(E') ...   ==>  ... f(m') ...    where  m' is the meaning of E'


                 ==>  ... [m'/x]E ...  where  [m'/x]E  is  E with            
                                       all occurrences of x
                                       replaced by m'
That is, first compute the meaning of the call's argument and then replace the call by the function's body, where the argument's meaning replaces the parameter variable. Example:
===================================================

let actstack = nil
push(alloc{})       

FIRST, COMPUTE THE ARGUMENT'S MEANING:
==>  let actstack = nil
     push(h0)               and  h0:{}

SECOND, REPLACE THE CALL BY THE FUNCTION'S BODY:
==>  let actstack = nil
     let newtop = alloc{'ns'= h0, 'parent'= actstack}
     let actstack = newtop

NOW, COMPUTE THE MEANING OF THE BODY:
=    let actstack = nil
     let newtop = bind(bind(alloc(),
                       'ns', h0),
                       'parent', actstack)
     let actstack = newtop

=    let actstack = nil
     let newtop = bind(bind(alloc(),
                       'ns', h0),
                       'parent', nil)
     let actstack = newtop

==>  let actstack = nil                        and h0:{}
     let newtop = bind(bind(h1, 'ns', h0),         h1:{}
                                'parent', nil)
     let actstack = newtop

==>  let actstack = nil                     and h0:{}
     let newtop = bind(h1, 'parent', nil)       h1:{'ns'= h0}
     let actstack = newtop

==>  let actstack = nil       and h0:{}
     let newtop =  h1             h1:{'ns'= h0, 'parent'= nil}
     let actstack = newtop

==>  let actstack = nil       and h0:{}     
     let newtop =  h1             h1:{'ns'= h0, 'parent'= nil}
     let actstack = h1

===================================================
At this point, there are two namespaces and actstack means h1.


3.4 Semantics equations

Finally, here are the semantics equations. They build namespaces in the heap and maintain the register, actstack, which points to the top of the linked-list activation stack:

===================================================

Virtual-machine data structures:

   heap : { Handle : Namespace }
            where  Namespace = { Identifier : Denotable }
                   Denotable = Handle | Integer
                   LValue = Handle * Identifier

   actstack : Handle


Program[[.]] updates actstack, heap:

[[ T ]] == let actstack = nil
           Template[[ T ]]


Command[[.]] updates actstack, heap:

[[ var I = E ]] ==  bind(top(), I, Expression[[ E ]])

[[ L = E ]] ==  bind(Lefthandside[[ L ]], Expression[[ E ]])

[[ while E : C end ]] ==  if Expression[[ E ]] != 0 :
                               Command[[ C ]]
                               Command[[ while E : C end ]]
                          else : (skip)

[[ C1 ; C2 ]] == Command[[ C1 ]]
                 Command[[ C2 ]]


Expression[[.]] updates actstack, heap and returns Denotable:

[[ N ]] ==  int(N)

[[ E1 + E2 ]] ==  Expression[[ E1 ]] + Expression[[ E2 ]]

[[ L ]] ==  find Lefthandside[[ L ]]

[[ new T ]] == Template[[ T ]]


Template[[.]] updates actstack, heap and returns Handle:

[[ { C } ]] == let newhandle = alloc{'parent' = top()}
               push(newhandle)
               Command[[ C ]]
               pop()
               newhandle


Lefthandside[[.]] returns LValue:

[[ I ]] ==  searchStatic(top(), I)

where  searchStatic(thishandle, I) ==
          if thishandle != nil :
             if member(I, thishandle) :
               (thishandle, I)
             else : searchStatic(find(thishandle, 'parent'), I)
          else : raise Exception


[[ L . I ]] ==  let  han = find Lefthandside[[ L ]]
                if member(I, han) :
                    (han, I)
                else : raise Exception

[[ this ]] ==  (actstack, 'ns') 

===================================================
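The two Lefthandside equations for identifiers differ in two ways: where lookup starts, and whether it climbs 'parent' links. The Python sketch below contrasts them on a small heap built by hand, with hypothetical handle names. search_static mirrors the inside-out lookup for I. select mirrors the outside-in lookup for L . I. Python's built-in LookupError stands in for raise Exception.

```python
heap = {
    'h0': {'x': 7, 'y': 'h2', 'parent': None},  # global namespace; y holds an object handle
    'h2': {'f': 7, 'parent': 'h0'},             # the object's namespace; its parent is h0
}


def member(name, handle):
    return name in heap[handle]


def find(lvalue):
    handle, name = lvalue
    return heap[handle][name]


def search_static(handle, name):
    """Lefthandside[[ I ]]: inside-out lookup along the chain of 'parent' links."""
    if handle is None:
        raise LookupError(f"{name} is not declared")
    if member(name, handle):
        return (handle, name)
    return search_static(heap[handle]['parent'], name)


def select(lvalue, name):
    """Lefthandside[[ L . I ]]: outside-in lookup in L's namespace only."""
    han = find(lvalue)
    if member(name, han):
        return (han, name)
    raise LookupError(f"{name} is not a field of {han}")


print(search_static('h2', 'x'))               # ('h0', 'x'): found via the parent link
print(select(search_static('h0', 'y'), 'f'))  # ('h2', 'f'): y.f
try:
    select(search_static('h0', 'y'), 'x')     # y.x: outside-in lookup never climbs to h0
except LookupError as err:
    print("y.x fails:", err)
```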

Comments:


3.5 A worked example, interpretive style

The semantics shows the precise actions a program makes on a heap. Here is a completely worked example, where the program, {var x = 7; var y = new {var f = x}}, is expanded into its script of namespace-algebra performed on the heap machine. The expanded program appears on the left and the values of the heap and stack pointer appear on the right:

===================================================

ABBREVIATIONS:
T0 == { var x = 7; var y = new {var f = x} }
C0 == var x = 7
C1 == var y = new T1
T1 == {var f = x}


SCRIPT:                      DATA STRUCTURES:
---------------------        ---------------------

Program[[ T0 ]]
                             heap == {}
== let actstack = nil
   Template[[ T0 ]]
                             heap == {}
                             actstack == nil
== Template[[ T0 ]]

== let newhandle = alloc{'parent'= top()}
   push(newhandle)
   Command[[ C0; C1 ]]
   pop()
   newhandle
                             heap == { h0 = {'parent'=nil} }
                             actstack == nil
== let newhandle = h0
   push(h0)
   Command[[ C0; C1 ]]
   pop()
   h0
                             heap == { h0 = {'parent'=nil}, 
                                       h1 = {'ns'=h0, 'parent'=nil} }
                             actstack == h1
== Command[[ C0; C1 ]]
   pop()
   h0

== Command[[ var x = 7 ]]
   Command[[ C1 ]]
   pop()
   h0

== bind(top(), 'x', Expression[[ 7 ]])
   Command[[ C1 ]]; pop(); h0

== bind(h0, 'x', 7)
   Command[[ C1 ]]; pop(); h0

                            heap == { h0 = {'x'=7, 'parent'=nil},  
                                      h1 = {'ns'=h0, 'parent'=nil} }
                            actstack == h1    

== Command[[ var y = new T1 ]]
   pop(); h0

== bind(top(), 'y', Expression[[ new T1 ]])
   pop(); h0

== bind(h0, 'y', Template[[ {var f = x} ]])
   pop(); h0
   
== bind(h0, 'y', (let newhan = alloc{'parent' = top()}
                  push(newhan)
                  Command[[ var f = x ]]
                  pop()
                  newhan))
   pop(); h0

                            heap == { h0 = {'x'=7, 'parent'=nil},  
                                      h1 = {'ns'=h0, 'parent'=nil},
                                      h2 = {'parent'=h0},
                                      h3 = {'ns'=h2, 'parent'=h1}
                                    }
                            actstack == h3    

== bind(h0, 'y', (Command[[ var f = x ]]; pop(); h2))
   pop(); h0


== bind(h0, 'y', (bind(h2, 'f', Expression[[ x ]]); pop(); h2))
   pop(); h0


== bind(h0, 'y', (bind(h2, 'f', find Lefthandside[[ x ]]);
                  pop(); h2))
   pop(); h0

== bind(h0, 'y', (bind(h2, 'f', find(h0,'x')); pop(); h2))
   pop(); h0

== bind(h0, 'y', (bind(h2, 'f', 7); pop(); h2))
   pop(); h0

                            heap == { h0 = {'x'=7, 'parent'=nil},  
                                      h1 = {'ns'=h0, 'parent'=nil},
                                      h2 = {'f'=7, 'parent'=h0},
                                      h3 = {'ns'=h2, 'parent'=h1}
                                    }
                            actstack == h3    

== bind(h0, 'y', (h2; pop(); h2))
   pop(); h0


                            heap == { h0 = {'x'=7, 'parent'=nil},  
                                      h1 = {'ns'=h0, 'parent'=nil},
                                      h2 = {'f'=7, 'parent'=h0},
                                      h3 = {'ns'=h2, 'parent'=h1}
                                    }
                            actstack == h1

== bind(h0, 'y', h2 )
   pop(); h0  

                            heap == { h0 = {'y'=h2, 'x'=7, 'parent'=nil},  
                                      h1 = {'ns'=h0, 'parent'=nil},
                                      h2 = {'f'=7, 'parent'=h0},
                                      h3 = {'ns'=h2, 'parent'=h1}
                                    }
                            actstack == h1

== pop(); h0
                            heap == { h0 = {'y'=h2, 'x'=7, 'parent'=nil},  
                                      h1 = {'ns'=h0, 'parent'=nil},
                                      h2 = {'f'=7, 'parent'=h0},
                                      h3 = {'ns'=h2, 'parent'=h1}
                                    }
                            actstack == nil
== h0

===================================================
It is easy to reformat the semantics definition into a prototype computer implementation. (This is supposed to be how one designs a language --- write a semantics first and implement it second!)


3.6 A worked example, translation style

The semantics definition is a ``syntax-directed translation scheme,'' in the language of compiling theory. If we treat the namespace-algebra operations as ``machine language,'' then it is easy to ``compile'' (translate) a program into its semantics, which is a script of function calls and machine-language operations. Here is the previous program example, translated in this manner:
===================================================

ABBREVIATIONS:
T0 == { var x = 7; var y = new {var f = x} }
C0 == var x = 7
C1 == var y = new T1
T1 == {var f = x}


TRANSLATIONS OF PHRASES:
Program[[ T0 ]] == let actstack = nil
                   Template[[ T0 ]] 

Template[[ T0 ]] ==  let newhandle = alloc{'parent' = top()}
                     push(newhandle)
                     Command[[ C0 ]]
                     Command[[ C1 ]]
                     pop()
                     newhandle

Command[[ C0 ]] ==  bind(top(), 'x', Expression[[ 7 ]])
                ==  bind(top(), 'x', int(7))

Command[[ C1 ]] ==  bind(top(), 'y', Expression[[ new T1 ]])

Expression[[ new T1 ]] == Template[[ T1 ]]

Template[[ {var f = x} ]] == let newhandle = alloc{'parent' = top()}
                             push(newhandle)
                             Command[[ var f = x ]]
                             pop()
                             newhandle

Command[[ var f = x ]] ==  bind(top(),'f', Expression[[ x ]])

Expression[[ x ]] ==  find Lefthandside[[ x ]]
                  ==  find searchStatic(top(),'x')


ASSEMBLED 'TARGET CODE':
let actstack = nil
let newhandle = alloc{'parent' = top()}
push(newhandle)
bind(top(), 'x', int(7))
bind(top(), 'y', 
     (let newhandle = alloc{'parent' = top()}
     push(newhandle)
     bind(top(),'f', find searchStatic(top(),'x'))
     pop()
     newhandle))
pop()
newhandle

===================================================
If we were serious about implementing a compiler for this particular language, we would devise translations of the operations into machine language, and we would do some internal optimizations of the ``target code.'' (A typical optimization is to precompute the searchStatic operations into fixed L-values; this is what happens in a Java or C# compiler, which precomputes the storage demands of allocated objects.) Finally, we would unwind nested function calls into the style of ``flat code'' that is typical of machine language.
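The translation scheme can itself be sketched as a small Python program. It walks the syntax tree and emits the target-code script as lines of text. This is a minimal sketch with three assumptions of our own:

- Syntax trees are encoded as nested tuples such as ('block', C), ('var', I, E), ('lhs', L), ('id', I).
- select(...) is a hypothetical name for the L . I lookup, which the equations spell out in several steps.
- A nested template is emitted as one parenthesized sequence, as in the assembled listing.

```python
def tr_program(t):
    """Program[[ T ]]: returns the target code as a list of lines."""
    return ["let actstack = nil"] + tr_template(t)


def tr_template(t):
    """Template[[ { C } ]], with t encoded as ('block', C)."""
    return (["let newhandle = alloc{'parent' = top()}", "push(newhandle)"]
            + tr_command(t[1])
            + ["pop()", "newhandle"])


def tr_command(c):
    tag = c[0]
    if tag == 'var':                                   # var I = E
        return [f"bind(top(), '{c[1]}', {tr_expression(c[2])})"]
    if tag == 'assign':                                # L = E
        return [f"bind({tr_lefthandside(c[1])}, {tr_expression(c[2])})"]
    if tag == 'seq':                                   # C1 ; C2
        return tr_command(c[1]) + tr_command(c[2])
    raise ValueError(f"unknown command {c!r}")


def tr_expression(e):
    tag = e[0]
    if tag == 'num':
        return f"int({e[1]})"
    if tag == 'add':
        return f"({tr_expression(e[1])} + {tr_expression(e[2])})"
    if tag == 'lhs':
        return f"find {tr_lefthandside(e[1])}"
    if tag == 'new':
        # the nested template becomes one parenthesized instruction sequence
        return "(" + "; ".join(tr_template(e[1])) + ")"
    raise ValueError(f"unknown expression {e!r}")


def tr_lefthandside(lhs):
    tag = lhs[0]
    if tag == 'id':
        return f"searchStatic(top(), '{lhs[1]}')"
    if tag == 'dot':
        return f"select({tr_lefthandside(lhs[1])}, '{lhs[2]}')"  # hypothetical operation
    if tag == 'this':
        return "(actstack, 'ns')"
    raise ValueError(f"unknown left-hand side {lhs!r}")


T1 = ('block', ('var', 'f', ('lhs', ('id', 'x'))))
T0 = ('block', ('seq', ('var', 'x', ('num', 7)), ('var', 'y', ('new', T1))))
print("\n".join(tr_program(T0)))
```

The printed script is the seven-line "assembled target code" above. The only difference is that the inner template sits on one line, separated by semicolons.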

Exercise

Namespaces can model arrays, because an array is merely a namespace whose field names are 0, 1, 2, and so on. Modify the object language's syntax to this:
T ::=  { C } |  array [ N ] of E

L ::=  I  |  L . I  |  L [ E ]
Here is a sample program,
var x = 3;
var r = new array[4] of new {var f = 0};
r[x].f = x
which constructs an array of four objects.

The semantics of array declaration goes like this:

Template[[ array [ N ] of E ]] ==  allocArray(0, int(N), [[ E ]], alloc{})

allocArray(index, size, initcode, arrayhandle) ==
    if index == size :  
        arrayhandle   # finished
    else :
        allocArray(index + 1, size, initcode,
                      bind(arrayhandle, index, Expression initcode))
Write the semantics of array indexing, Lefthandside[[ L[E] ]]. How does it differ from L.I's semantics?

Calculate the semantics of the above program.
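The sketch below shows why allocArray re-evaluates its initializer at every index. It is written in Python under these assumptions:

- A module-level heap uses string handles.
- The initializer is passed as a zero-argument function (a thunk) that stands for Expression initcode.

Because the thunk runs once per index, the four elements are four distinct namespaces, not four references to one. The sketch covers allocation only, not the indexing the exercise asks for.

```python
from itertools import count

heap = {}              # Handle -> Namespace
_handles = count()     # supplies fresh handle numbers 0, 1, 2, ...


def alloc(bindings=None):
    handle = f"h{next(_handles)}"
    heap[handle] = dict(bindings or {})
    return handle


def bind(handle, name, value):
    heap[handle][name] = value
    return handle


def alloc_array(index, size, initcode, arrayhandle):
    """Binds fields 0 .. size-1 of arrayhandle, evaluating initcode once per field."""
    if index == size:
        return arrayhandle                        # finished
    return alloc_array(index + 1, size, initcode,
                       bind(arrayhandle, index, initcode()))


# new array[4] of new {var f = 0}
r = alloc_array(0, 4, lambda: alloc({'f': 0}), alloc())
print(heap[r])    # fields 0..3 hold four different handles
```

Here the array itself is h0, and its fields 0 through 3 name the freshly allocated objects h1 through h4.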


4 Tennent's naming principles

The little language defined so far is called a core language:
===================================================

P ::=  T
C ::=  var I = E  |  L = E  |  C1 ; C2 
E ::=  N  |  ( E1 + E2 )  |  L  |  new T
T ::=  { C }
L ::=  I  |  L . I  |  this

===================================================
because the core computational abilities are fixed.

Robert Tennent proposed these principles for adding naming constructions to a language core:


4.1 Abstraction principle

Here, we choose to name Command and Template phrases and place their declaration forms in the syntactic category of Declaration:
===================================================

P : Program
C : Command            D : Declaration
T : Template           E : Expression
L : Lefthandside       N : Numeral       I : Variable

P ::=  T

D ::=  proc I : C end  |  class I : T  |  var I = E

C ::=  L = E  |  C1 ; C2  |  D  |  L ( )

E ::=  N  |  ( E1 + E2 )  |  L  |  new T

T ::=  { C }  |  L ( )

L ::=  I  |  L . I  |  this

===================================================

Procedures are named commands

First, how do procedures work? It would seem that the declaration, proc I: C end, would bind I to the command, C, so that when a call, I(), later appears, we would find the binding for I, then extract command, C, then execute it. But this is too simplistic. Here are three examples that show why:
# Example 1:
var x = 7;
proc p: x = x + 1 end;
var y = new {var x = 2; p()}
When p() is called within the new object, p's body should likely update the global x and not the one in the new object. (But an argument might be made that the local x is incremented --- what do you think?)
# Example 2:
var clock = new {var time = 0;
                 proc tick(): time = time + 1 end};
var time = 99;
clock.tick()
The call, clock.tick(), should almost certainly increment the local variable time that resides in the object with procedure tick (and not the global time that is declared separately from tick).
# Example 3:
var clock1 = new {var time = 0;
                  proc tick(): time = time + 1 end};
var clock2 = new {var time = 99;
                  var f = -1};
clock2.f = clock1.tick;
clock2.f()
Field f in object clock2 is assigned (a handle to) clock1's tick procedure. When f is called (clock2.f()), this executes tick in clock1, which should almost certainly increment the time in clock1 (and not time in clock2).

There are three possible semantics for procedure declaration and call:

I. Static scoping via closures

In Example 1, say that the author of procedure p intends that p's body should update the x declared in the line preceding it. That is, p's body uses the namespace it is declared in as its starting point for variable lookup. This is called static scoping.

To implement static scoping, the declaration of p allocates a namespace, called a closure, that holds both p's body and the handle of the namespace that is active when p is declared; p is bound to the closure's handle.

The invocation, p(), causes a lookup of p to find the closure. The handle saved in the closure is pushed onto the activation stack, making that namespace the active one, and p's command is executed. At conclusion, the handle is popped from the activation stack.

Here is a diagram that shows the execution of Example 1, where p has been called, and its body, x = x + 1, is about to execute:


This is how static scoping is implemented in Algol60, Pascal, Ada, Modula, Python, Scheme, and almost all procedural, modular languages. It also nicely handles object structure; here is a diagram of Example 2, where clock.tick() is called and its body, time = time + 1, is about to execute:

Again, since handle δ was saved in tick's closure, it can be pushed onto the activation stack so that clock's time variable is correctly updated.

Finally, since a closure is labelled by a handle, the handle can be assigned to a variable, and Example 3 can operate as expected. Here is a diagram of the machine state at the execution of the code body generated by clock2.f():


Since the closure remembered the handle to the namespace it should use, the call will update the correct time. The delegate object in C# is exactly a closure that operates as shown here.

II. Virtual scoping via L-Values

Starting from Smalltalk, object-oriented languages have provided a second approach to declaring and calling procedures. The result, virtual scoping, is "static scoping most of the time", but it behaves differently when subclasses are added to the language. (We will encounter these soon.)

First, a declaration, proc I: C end, binds I to a closure that holds just C; no handle is saved in the closure. When L() is called, L evaluates to some L-value, a pair of the form (θ, I), where θ is the namespace in which I is bound to its closure. θ itself is pushed onto the activation stack. Then the closure bound to I in namespace θ is fetched, and its command is extracted and executed with θ's namespace as the active one, the source of its variables.

For Example 1, the execution looks almost exactly the same:


but the closure holds no handle. Example 2 also computes the same way as before. But Example 3 behaves differently, because the L-value computed for clock2.f is (φ, 'f'), which forces the code, time = time + 1, to update clock2's time:

Virtual scoping is used in Smalltalk and Java as the default. (Java disallows Example 3, by the way!) C# uses both static and virtual scoping for methods (to obtain the latter, use the keyword, virtual). Python uses static scoping for procedures and virtual scoping for procedures/methods declared within classes.

III. Dynamic scoping via naive execution

The approach mentioned at the beginning of this section --- for proc I: C end, bind I to C; for I(), find command C and execute it on the spot --- is called dynamic scoping.

That is, upon declaration, a closure holding only code is saved, as in virtual scoping. Upon call, the called procedure's body executes with whatever namespace is active where the call appears. (Unlike static and virtual scoping, no handle is pushed and popped.) In Example 1, the call to p updates the x declared in the inner object. In Example 2, the global time increments. In Example 3, the call clock2.f() appears at global level, where no time is declared, so the lookup of time fails.

Dynamic scoping is implemented in the original version of Lisp. (By the way, the first implementations of Smalltalk were Lisp-coded interpreters, which is how virtual scoping was developed.)
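The three disciplines differ only in which handle, if any, the call pushes. The Python sketch below makes that single difference explicit. It runs Example 1 under all three disciplines and Example 3 under static and virtual scoping. Assumptions of the sketch:

- Heaps are built by hand, not by a parser.
- Procedure bodies are Python functions that look up variables inside-out from top().
- A single closure record saves the declaring namespace's handle in 'parent'; virtual and dynamic calls simply ignore it.

```python
class Machine:
    """Heap of namespaces plus a linked-list activation stack (hypothetical sketch)."""

    def __init__(self):
        self.heap, self.next_id, self.actstack = {}, 0, None

    def alloc(self, bindings):
        handle = f"h{self.next_id}"
        self.next_id += 1
        self.heap[handle] = dict(bindings)
        return handle

    def push(self, handle):
        self.actstack = self.alloc({'ns': handle, 'parent': self.actstack})

    def pop(self):
        self.actstack = self.heap[self.actstack]['parent']

    def top(self):
        return self.heap[self.actstack]['ns']

    def search_static(self, handle, name):
        while handle is not None:
            if name in self.heap[handle]:
                return (handle, name)
            handle = self.heap[handle]['parent']
        raise LookupError(f"{name} is not declared")


def increment(var):
    """Code for the body  var = var + 1; lookup starts at the active namespace."""
    def body(vm):
        handle, name = vm.search_static(vm.top(), var)
        vm.heap[handle][name] += 1
    return body


def call(vm, lvalue, scoping):
    """L() under 'static', 'virtual' or 'dynamic' scoping; lvalue = (theta, I)."""
    theta, name = lvalue
    closure = vm.heap[vm.heap[theta][name]]
    if scoping == 'dynamic':
        closure['code'](vm)              # no push: use whatever namespace is active now
        return
    vm.push(closure['parent'] if scoping == 'static' else theta)
    closure['code'](vm)
    vm.pop()


def example1(scoping):
    vm = Machine()
    g = vm.alloc({'x': 7, 'parent': None})                  # var x = 7
    vm.push(g)
    vm.heap[g]['p'] = vm.alloc({'type': 'proc', 'code': increment('x'), 'parent': g})
    obj = vm.alloc({'x': 2, 'parent': g})                   # new {var x = 2; p()}
    vm.push(obj)
    call(vm, vm.search_static(vm.top(), 'p'), scoping)      # p()
    vm.pop()
    return vm.heap[g]['x'], vm.heap[obj]['x']


def example3(scoping):
    vm = Machine()
    g = vm.alloc({'parent': None})
    vm.push(g)
    c1 = vm.alloc({'time': 0, 'parent': g})                 # clock1's object
    vm.heap[c1]['tick'] = vm.alloc({'type': 'proc', 'code': increment('time'), 'parent': c1})
    c2 = vm.alloc({'time': 99, 'f': -1, 'parent': g})       # clock2's object
    vm.heap[c2]['f'] = vm.heap[c1]['tick']                  # clock2.f = clock1.tick
    call(vm, (c2, 'f'), scoping)                            # clock2.f()
    return vm.heap[c1]['time'], vm.heap[c2]['time']


for mode in ('static', 'virtual', 'dynamic'):
    print(mode, 'Example 1 (global x, inner x):', example1(mode))
for mode in ('static', 'virtual'):
    print(mode, 'Example 3 (clock1 time, clock2 time):', example3(mode))
```

Example 1 yields (8, 2) under static and virtual scoping and (7, 3) under dynamic scoping. Example 3 yields (1, 99) under static scoping and (0, 100) under virtual scoping.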

Classes are named Templates

A class is declared like a procedure and it is called in the same fashion:

class clock : { var time = 0;
                proc tick : time = time + 1 end};

var j = new clock();
var k = new clock();
k.tick()
new clock() is a "class call", which means that the binding for clock is found; it is a handle to a closure holding the template that allocates time and tick. The template executes like the body of a procedure executes. As with procedures, a class might be used with static, virtual, or dynamic scoping. (In fact, C#/Java make it difficult, if not impossible, for a class definition to even reference a global variable, so the choice of scoping rarely matters!)

Here is a picture of the runtime configuration of the above example just before the call, k.tick(), returns, assuming static scoping of classes and procedures:


The two calls, clock(), constructed two namespaces, and k.tick() used the one named by k.

IMPORTANT: No new computational machinery is added to the model. This is the beauty in Tennent's approach --- it exploits the computational abilities already in place.

Formal semantics of abstracts

The semantics of both command and template abstracts follow a standard format. Here, we define static-scoping semantics. (You should write the semantics for virtual and dynamic scoping.)

===================================================

Declaration[[ proc I : C ]]  == 
       bind(top(), I,  alloc{'type'= 'proc', 'code'= [[ C ]], 'parent'= top()})

Declaration[[ class I : T ]] ==
       bind(top(), I,  alloc{'type'= 'class', 'code'= [[ T ]], 'parent'= top()})


### findClosure(label, lvalue)  extracts the closure bound at  lvalue,
### checks that its  type == label,
### and returns the closure's code and the handle to its global variables.
findClosure(label, (handle,name)) ==
     let clhandle = find (handle,name)
     if label == find(clhandle, 'type') :
        (find(clhandle, 'code'), find(clhandle, 'parent'))
     else : raise Exception
         
Command[[ L() ]] == 
         let (code, parentlink) = findClosure('proc', Lefthandside[[ L ]])
         push(parentlink)
         Command code 
         pop()

Template[[ L() ]] ==  
         let (code, parentlink) = findClosure('class', Lefthandside[[ L ]])
         push(parentlink)
         let newhandle = Template code 
         pop()
         newhandle

===================================================
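The static-scoping equations can be prototyped directly. This Python sketch transcribes findClosure, Command[[ L() ]] and Template[[ L() ]], then replays the clock class example from above. Assumptions:

- Closures and objects are dicts in a heap keyed by string handles.
- The code stored in a closure is a Python function over the machine.
- The declarations inside the class body are written out by hand, not produced by a parser.

All function names are hypothetical.

```python
class Machine:
    """Heap of namespaces plus a linked-list activation stack (hypothetical sketch)."""

    def __init__(self):
        self.heap, self.next_id, self.actstack = {}, 0, None

    def alloc(self, bindings):
        handle = f"h{self.next_id}"
        self.next_id += 1
        self.heap[handle] = dict(bindings)
        return handle

    def push(self, handle):
        self.actstack = self.alloc({'ns': handle, 'parent': self.actstack})

    def pop(self):
        self.actstack = self.heap[self.actstack]['parent']

    def top(self):
        return self.heap[self.actstack]['ns']

    def search_static(self, handle, name):
        while handle is not None:
            if name in self.heap[handle]:
                return (handle, name)
            handle = self.heap[handle]['parent']
        raise LookupError(f"{name} is not declared")


def declare(vm, kind, name, code):
    """Declaration[[ proc I : C ]] and [[ class I : T ]]: the closure saves top()."""
    vm.heap[vm.top()][name] = vm.alloc({'type': kind, 'code': code, 'parent': vm.top()})


def find_closure(vm, label, lvalue):
    handle, name = lvalue
    closure = vm.heap[vm.heap[handle][name]]
    if closure['type'] != label:
        raise TypeError(f"{name} is not a {label}")
    return closure['code'], closure['parent']


def call_proc(vm, lvalue):                 # Command[[ L() ]]
    code, parentlink = find_closure(vm, 'proc', lvalue)
    vm.push(parentlink)
    code(vm)
    vm.pop()


def call_class(vm, lvalue):                # Template[[ L() ]]
    code, parentlink = find_closure(vm, 'class', lvalue)
    vm.push(parentlink)
    newhandle = code(vm)
    vm.pop()
    return newhandle


def tick_body(vm):                         # time = time + 1
    handle, name = vm.search_static(vm.top(), 'time')
    vm.heap[handle][name] += 1


def clock_body(vm):                        # { var time = 0; proc tick : ... end }
    newhandle = vm.alloc({'parent': vm.top()})
    vm.push(newhandle)
    vm.heap[vm.top()]['time'] = 0
    declare(vm, 'proc', 'tick', tick_body)
    vm.pop()
    return newhandle


vm = Machine()
globals_ns = vm.alloc({'parent': None})
vm.push(globals_ns)
declare(vm, 'class', 'clock', clock_body)  # class clock : { ... }
for var in ('j', 'k'):                     # var j = new clock(); var k = new clock()
    vm.heap[vm.top()][var] = call_class(vm, vm.search_static(vm.top(), 'clock'))
k = vm.heap[globals_ns]['k']
call_proc(vm, (k, 'tick'))                 # k.tick(): the L-value of k . tick
```

After the run, j's object still has time = 0 and k's object has time = 1. Each tick closure saved its own object's handle, which is what static scoping requires.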

Exercises

  1. Work the interpretive semantics of Examples 1-3 above.
  2. Define virtual and dynamic scoping semantics and apply them to Examples 1-3 as well.
  3. Add Expression abstracts to the language. How do these differ from procedures that return answers? (Add the latter, now, too.)
  4. Add Declaration abstracts to the language. How do these differ from classes? How are they similar to Modula modules or Java packages or C# namespaces?
  5. Are there any other forms of useful abstracts one might add to the language?

Declarations within abstracts

Now that the forms of scoping used with abstracts have been developed, we encounter a problem: what happens when a variable is declared within a procedure? Consider this example:
var x = 0;
proc p: var y = x; x = x + y end;
p()
# at this point, is variable  y  bound in the global namespace?
Should procedure p be allowed to add variable y to the global namespace? The semantics above allows this. Almost all modern languages do not --- instead, y is bound in some separate, local namespace that is created for the private use of p's body. In the next section, we study how to implement a local namespace for procedure calls.


4.2 Parameterization principle

The previous section focussed on scope issues. But procedure calls usually come with parameters, and procedure bodies usually declare local variables, so for each procedure call, we must allocate a namespace (called an activation record or frame) to hold parameter-argument bindings and local-variable declarations.

Let's add expression parameters to procedures and classes:

===================================================

D ::=  proc I1 ( I2 ) : C end  |  class I1 ( I2 ) : T
C ::=  . . .  |  L ( E )
T ::=  . . .  |  L ( E )

===================================================
When proc I1(I2): C end is declared, a closure is constructed to hold parameter I2, command C, and the active namespace's handle, which serves as the handle to find I1's nonlocal variables.

When the call, L(E), appears, L is evaluated to the L-value that leads to I1's closure; E is evaluated to its meaning, m; a new namespace (activation record) is allocated to hold the binding, I2 = m, along with the handle extracted from the closure. The handle of the activation record is pushed onto the activation stack, C executes, and the stack is popped at conclusion.

Here is a small example that shows how the activation record is created when a procedure is called:

var time = 0;
proc tock(n): 
   var m = 2;
   time = time + (m + n)   # (*)
end;
tock(3)
At point (*), just before the call to tock finishes, the configuration looks like this:

At this point, the active namespace is δ, tock's activation record. tock's global variables are found from (δ, 'parent'), that is, at the namespace labelled α. When the call finishes, the stack pops, and the active namespace reverts to α.
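The activation record at (*) can be reproduced with a short Python sketch of the parameterized call, following the Command[[ L ( E ) ]] equation given below. Assumptions:

- The heap is built by hand, with string handles.
- The procedure body is written as a Python function.
- A snapshots list exists only so that tock's activation record can be inspected at point (*).

All names are hypothetical.

```python
class Machine:
    """Heap of namespaces plus a linked-list activation stack (hypothetical sketch)."""

    def __init__(self):
        self.heap, self.next_id, self.actstack = {}, 0, None

    def alloc(self, bindings):
        handle = f"h{self.next_id}"
        self.next_id += 1
        self.heap[handle] = dict(bindings)
        return handle

    def push(self, handle):
        self.actstack = self.alloc({'ns': handle, 'parent': self.actstack})

    def pop(self):
        self.actstack = self.heap[self.actstack]['parent']

    def top(self):
        return self.heap[self.actstack]['ns']

    def search_static(self, handle, name):
        while handle is not None:
            if name in self.heap[handle]:
                return (handle, name)
            handle = self.heap[handle]['parent']
        raise LookupError(f"{name} is not declared")

    def value(self, name):                   # Expression[[ I ]]
        handle, _ = self.search_static(self.top(), name)
        return self.heap[handle][name]

    def assign(self, name, value):           # I = E, with E already evaluated
        handle, _ = self.search_static(self.top(), name)
        self.heap[handle][name] = value


def declare_proc(vm, name, param, code):     # proc I1 ( I2 ) : C end
    vm.heap[vm.top()][name] = vm.alloc(
        {'type': 'proc', 'param': param, 'code': code, 'parent': vm.top()})


def call(vm, lvalue, argument):              # L ( E ), static scoping
    handle, name = lvalue
    closure = vm.heap[vm.heap[handle][name]]
    if closure['type'] != 'proc':
        raise TypeError(f"{name} is not a procedure")
    # the activation record binds the parameter and links to the closure's saved handle
    vm.push(vm.alloc({closure['param']: argument, 'parent': closure['parent']}))
    closure['code'](vm)
    vm.pop()


snapshots = []                               # activation records seen at point (*)


def tock_body(vm):
    vm.heap[vm.top()]['m'] = 2                                          # var m = 2
    vm.assign('time', vm.value('time') + (vm.value('m') + vm.value('n')))
    snapshots.append(dict(vm.heap[vm.top()]))                           # (*)


vm = Machine()
alpha = vm.alloc({'parent': None})           # global namespace
vm.push(alpha)
vm.heap[alpha]['time'] = 0                   # var time = 0
declare_proc(vm, 'tock', 'n', tock_body)
call(vm, vm.search_static(vm.top(), 'tock'), 3)   # tock(3)
```

After the call, time is 5 in the global namespace. The record captured at (*) holds n = 3, m = 2 and a 'parent' link to the global namespace. The local m never reaches the global namespace.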

Here is a second example, a parameterized class:

class clock(init, increment): { var time = init;
                                proc tick():
                                    time = time + increment
                                end;
                                proc reset(what):
                                    var w = what + increment;
                                    time = w    #(**)
                                end
                              }
var j = new clock(3, 1);
j.reset(0)
The semantics of class declaration and invocation work the same as a procedure's. Here is a snapshot of this program, at (**), just before j.reset(0) returns:

The static chain for the call j.reset(0) is the sequence of parent links starting from σ. Notice the namespace that retains the values of init and increment, which were bound when the object was constructed and must be saved for the use of tick and reset.

Java/C# do not allow parameters to classes (instead, they use parameterized constructor methods), but it is no problem to implement them. Again, no new semantic machinery is required. Here is the semantics for parameterized procedures.

===================================================

Declaration[[ proc I1 ( I2 ) : C ]]
   ==  bind(top(), I1, alloc{'type' = 'proc', 'param'= I2,
                             'code'= [[ C ]], 'parent'= top()} )

### returns a tuple of the  parameter name, command, and static link:
findClosure(label, (handle,name)) ==
     let clhandle = find (handle,name)
     if label == find(clhandle, 'type') :
        (find(clhandle, 'param'), find(clhandle, 'code'), find(clhandle, 'parent'))
     else : raise Exception

Command[[ L ( E ) ]] ==
         let (paramname, code, parentlink) = findClosure('proc', Lefthandside[[ L ]])
         push(alloc{paramname = Expression[[ E ]], 'parent' = parentlink})
         Command code
         pop()

===================================================
You should write a similar semantics for a parameterized class.

Exercises

  1. Revise the syntax and semantics so that procedures can accept commands as parameters. (Hint: review closures and "Example 3" from the previous section.)

  2. Template parameters work well with classes; they define "generic types", which are crucial to the Haskell and C# language libraries. Revise the syntax and semantics of classes to accept template parameters.


4.3 Qualification principle

The last extension principle allows some bindings to be hidden (``private'') from exterior access. It is inspired by Algol60's command block:
begin D in C end
The semantics of the command block goes like this: a new namespace is allocated, into which D's bindings are placed; C uses the namespace, after which the namespace is deactivated:

For commands, the semantics looks like this:

Command[[ begin D in C end ]] ==  push(alloc{'parent' = top()})
                                  Declaration[[ D ]]
                                  Command[[ C ]]
                                  pop()
The semantics ensures that declarations, D, as well as any declarations constructed within C are deposited in a local namespace. This ensures privacy --- no code outside of the begin ... end can directly reference the bindings made in the block.

The format works especially well for templates, and it is the origin of ``private fields'' in classes:

===================================================

T ::=  { C }  |  private D in T  |  L ( E )

Template[[ private D in T ]] ==  push(alloc{'parent' = top()})
                                 Declaration[[ D ]]
                                 let han = Template[[ T ]]
                                 pop()
                                 han

===================================================
Example:
var clock = new private var time = 0 in
                { proc tick(n): time = time + n end }

clock.tick(1)  #  time  cannot be referenced directly here
No new semantic machinery is required.
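The template-block equation can be checked with a Python sketch of the clock example. Assumptions:

- Heaps use string handles.
- Procedure bodies are Python functions.
- Calls are statically scoped and push an activation record holding the parameter.
- Declarations are hand-coded rather than parsed.

All names are hypothetical. The point to observe is where time lands: in the private namespace. The object's public namespace reaches it only through its 'parent' link, so inside-out lookup from tick succeeds while outside-in lookup clock.time finds nothing.

```python
class Machine:
    """Heap of namespaces plus a linked-list activation stack (hypothetical sketch)."""

    def __init__(self):
        self.heap, self.next_id, self.actstack = {}, 0, None

    def alloc(self, bindings):
        handle = f"h{self.next_id}"
        self.next_id += 1
        self.heap[handle] = dict(bindings)
        return handle

    def push(self, handle):
        self.actstack = self.alloc({'ns': handle, 'parent': self.actstack})

    def pop(self):
        self.actstack = self.heap[self.actstack]['parent']

    def top(self):
        return self.heap[self.actstack]['ns']

    def search_static(self, handle, name):
        while handle is not None:
            if name in self.heap[handle]:
                return (handle, name)
            handle = self.heap[handle]['parent']
        raise LookupError(f"{name} is not declared")


def declare_proc(vm, name, param, code):     # proc I1 ( I2 ) : C end
    vm.heap[vm.top()][name] = vm.alloc(
        {'type': 'proc', 'param': param, 'code': code, 'parent': vm.top()})


def call(vm, lvalue, argument):              # L ( E ), static scoping
    handle, name = lvalue
    closure = vm.heap[vm.heap[handle][name]]
    vm.push(vm.alloc({closure['param']: argument, 'parent': closure['parent']}))
    closure['code'](vm)
    vm.pop()


def private_template(vm, declarations, template):   # Template[[ private D in T ]]
    vm.push(vm.alloc({'parent': vm.top()}))
    declarations(vm)                         # D's bindings land in the new namespace
    han = template(vm)                       # T's namespace gets it as its parent
    vm.pop()
    return han


def tick_body(vm):                           # time = time + n
    time_handle, _ = vm.search_static(vm.top(), 'time')
    n_handle, _ = vm.search_static(vm.top(), 'n')
    vm.heap[time_handle]['time'] += vm.heap[n_handle]['n']


def private_decls(vm):                       # var time = 0
    vm.heap[vm.top()]['time'] = 0


def public_template(vm):                     # { proc tick(n): time = time + n end }
    newhandle = vm.alloc({'parent': vm.top()})
    vm.push(newhandle)
    declare_proc(vm, 'tick', 'n', tick_body)
    vm.pop()
    return newhandle


vm = Machine()
g = vm.alloc({'parent': None})
vm.push(g)
vm.heap[g]['clock'] = private_template(vm, private_decls, public_template)
clock = vm.heap[g]['clock']
call(vm, (clock, 'tick'), 1)                 # clock.tick(1)
print('time' in vm.heap[clock])              # False: no public field named time
```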

Other phrase forms can use blocks, but modern languages often marry blocks to parameterized abstracts, so that the private declarations are treated like extra parameter bindings. This approach works fine for straightforward examples, like this one:

class clock(init, increment)
        private var time = init :
        { proc tick(): time = time + increment end;
          proc reset(what): time = what end
        }
An activation, like var c = new clock(0,2), would construct a two-namespace object, where one namespace holds init = 0, increment = 2, and time = 0, and another holds the handles to the closures named by tick and reset. The example establishes a correspondence to this code fragment, which uses a template-block:
var c = new private var init = 0; var increment = 2; var time = init in
            { proc tick(): time = time + increment end;
              proc reset(what): time = what end
            }
Tennent remarked there should be a ``correspondence principle'' in the semantics between declaration and parameter binding --- the semantical effects should be the same.

It is easy to modify the semantics of parameterized abstracts to store private declarations: For the form, class I1 (I2) private D : T, the semantics of the class declaration is to store within a closure all of I2, D, and T. The semantics of the class's call, L(E), is to locate the closure named by L's L-value, allocate a new namespace, insert into it I2 bound to E's value, push the namespace, evaluate D (its bindings will be deposited into the new namespace), and then evaluate T (which allocates a namespace for the ``public fields'' of the object). At conclusion, the namespaces are popped from the activation stack.

Public and private fields

Most object languages use a simpler layout for private declarations than the compound namespaces we've just seen. Instead, ``private'' bindings are merged together with ``public'' bindings in one and the same namespace. If we do this, we must augment the language's syntax with extra keywords to assert the fields' visibilities:
D ::=  M var I = E  |  M proc I1 ( I2 ) : C end  | ...
M ::=  private  |  public
A private binding can be referenced only from inside its own namespace, that is, by inside-out lookup; outside-in lookup, that is, L.I dot-notation, is not allowed to reference a binding labelled private.

This approach is typically implemented by a compiler, e.g., the Java compiler checks in advance, during its type-checking phase, that no private-labelled field is indexed outside-in.

As an exercise, you should define the semantics of the private-public labels and the Lefthandside semantics of lookup.

The main advantage of combining private and public fields in the same namespace is the increased utility of the this pronoun:


5 Compound namespaces and the semantics of this

In the core language, there was a pronoun, this, and its meaning was the active namespace, which at that time coincided with the notion of ``this object.'' Now that called procedures generate activation records and now that objects with private declarations can be allocated as compound namespaces, the correspondence between "active namespace" and "this object" breaks down. Here is a use of this in an object-oriented language:

class C() : { var f = 0;
              proc p(f) : this.f = f  end };
var ob = new C();
ob.p(3)
Say that new C() generates a namespace whose handle is α and holds the binding, 'f' = 0. Then the call, ob.p(3), generates a namespace whose handle is β that holds the parameter binding, 'f' = 3. The active namespace is β. The command, this.f = f, assigns 3 (the binding found at (β, 'f')) to the f whose L-value is (α, 'f').

Why does the assignment reach α? A typical object-oriented language uses virtual scoping for procedure (method) calls, and the meaning of this is the handle part of the L-value that virtual scoping computes for the called procedure.

The call, ob.p(3) proceeds like this:

  1. Evaluate ob.p to the L-value, (α, 'p').
  2. Create a new activation (namespace) for the call to p, and save in it these bindings: f = 3 and this = α. The name, this, is treated as an extra parameter name.
  3. The assignment, this.f = f, computes as expected: this.f computes the L-value (α, 'f'), and the right-hand f is looked up inside-out, which finds the parameter at L-value (β, 'f'), whose value is 3.
To summarize as simply as possible:
a call, objectName.methodName(args), computes the handle for objectName, which is bound to the name, this, for the execution of the body of methodName.
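The call ob.p(3) can be sketched in Python under these rules. Assumptions:

- The heap is built by hand, with string handles.
- p's body is written as a Python function.
- 'this' is stored as an ordinary binding in the activation record next to the parameter, as in step 2.

All names are illustrative only.

```python
class Machine:
    """Heap of namespaces plus a linked-list activation stack (hypothetical sketch)."""

    def __init__(self):
        self.heap, self.next_id, self.actstack = {}, 0, None

    def alloc(self, bindings):
        handle = f"h{self.next_id}"
        self.next_id += 1
        self.heap[handle] = dict(bindings)
        return handle

    def push(self, handle):
        self.actstack = self.alloc({'ns': handle, 'parent': self.actstack})

    def pop(self):
        self.actstack = self.heap[self.actstack]['parent']

    def top(self):
        return self.heap[self.actstack]['ns']

    def search_static(self, handle, name):
        while handle is not None:
            if name in self.heap[handle]:
                return (handle, name)
            handle = self.heap[handle]['parent']
        raise LookupError(f"{name} is not declared")


def call_method(vm, lvalue, argument):
    """objectName.methodName(arg): the record binds the parameter and 'this'."""
    thishandle, name = lvalue
    closure = vm.heap[vm.heap[thishandle][name]]
    vm.push(vm.alloc({closure['param']: argument,
                      'this': thishandle,
                      'parent': closure['parent']}))
    closure['code'](vm)
    vm.pop()


records = []                                 # activation records, kept for inspection


def p_body(vm):                              # this.f = f
    record = vm.top()
    records.append(record)
    ob_handle = vm.heap[record]['this']      # find Lefthandside[[ this ]]
    if 'f' not in vm.heap[ob_handle]:        # Lefthandside[[ this . f ]]
        raise LookupError('f')
    f_handle, _ = vm.search_static(record, 'f')   # inside-out: the parameter f is found first
    vm.heap[ob_handle]['f'] = vm.heap[f_handle]['f']


vm = Machine()
g = vm.alloc({'parent': None})
vm.push(g)
ob = vm.alloc({'f': 0, 'parent': g})         # new C()   (alpha in the text)
vm.heap[ob]['p'] = vm.alloc({'type': 'proc', 'param': 'f', 'code': p_body, 'parent': ob})
vm.heap[g]['ob'] = ob                        # var ob = new C()
call_method(vm, (ob, 'p'), 3)                # ob.p(3)
```

After the call, the object's field f is 3. The activation record (beta in the text) still holds f = 3 and this = ob's handle, with its 'parent' link pointing at the object.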

We already studied the semantics of virtual scoping. To employ it now, we must enforce one significant restriction on the programming language:

It is illegal for the handle of a closure to be an expressible value (that is, the value of an expression). In particular, we cannot assign a closure handle to a variable:
var c = new { var x = 1;
              proc p(x): this.x = x  end
            };
var d = new { var x = 2;
              var f = c.p  # ILLEGAL ---  c.p  is NOT a legal expression
            };
d.f(99)
That is, we disallow "Example 3" from Section 4.1 on the Abstraction Principle.

The object language we are developing is Java-like. (Recall that Python allows both static scoping and virtual scoping, so the above example is allowed in Python.) As an exercise, consider how to use the two forms of closures, for static and for virtual scoping, in the same language. Write the semantics definition of procedure call to check which form of procedure closure is called. This is how it's done in C# and Python.
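Python's bound methods show the alternative that the restriction rules out. In the sketch below, the class names C and D are hypothetical stand-ins for the anonymous objects c and d. Evaluating c.p produces a closure that carries self = c. Storing that closure in d and calling d.f(99) therefore updates c.x, not d.x.

```python
class C:
    def __init__(self):
        self.x = 1

    def p(self, x):
        self.x = x          # this.x = x

class D:
    def __init__(self, f):
        self.x = 2
        self.f = f          # var f = c.p  (legal in Python)

c = C()
d = D(c.p)                  # c.p is a bound method: it remembers self = c
d.f(99)                     # runs p with self = c, not d
print(c.x, d.x)             # 99 2
```

In this kind of closure, the meaning of this (self) is fixed when c.p is evaluated, not when d.f is called.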

Here is how to define virtual scoping (and this) for procedure calls while still letting inside-out variable lookup use static scoping. This essentially (but not exactly!) matches Java's semantics:

===================================================

### same as before:
Declaration[[ proc I1 ( I2 ) : C ]]
   ==  bind(top(), I1, alloc{'type' = 'proc', 'param'= I2,
                             'code'= [[ C ]], 'parent'= top()} )

### same as before:
findClosure(label, (handle,name)) ==
     let clhandle = find (handle, name)
     if label == find(clhandle, 'type') :
        (find(clhandle, 'param'), find(clhandle, 'code'), find(clhandle, 'parent'))
     else : raise Exception


### modify this equation:
Command[[ L ( E ) ]] ==
         let (thishandle, id) = Lefthandside[[ L ]]
         let (paramname, code, parentlink) = findClosure('proc', (thishandle, id))
         push(alloc{paramname = Expression[[ E ]], 'this' = thishandle,
                    'parent' = parentlink})
         Command code
         pop()


### modify this equation:
Expression[[ L ]] ==  let val = find Lefthandside[[ L ]]
                      if notHandleToClosure(val) :  val
                      else : raise Exception


Lefthandside[[ I ]] == ...as before...

Lefthandside[[ L . I ]] == ...as before...

Lefthandside[[ this ]] == if member('this', top()) :
                              (top(), 'this')
                          else : raise Exception  # OR, (top(), 'ns')


===================================================
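The equations above can be animated with a small Python model. This is a sketch under several assumptions of our own:

- handles are strings and namespaces are dicts;
- procedure bodies are Python functions;
- is_closure_handle plays the role of notHandleToClosure;
- lhs_dot assumes that Lefthandside[[ L . I ]] looks I up directly in the namespace that L names (the "as before" equation is not repeated here).

The sketch replays ob.p(3) from the start of this section. It then checks that using a closure handle as an expression value is rejected.

```python
heap, actstack = {}, []

def alloc(bindings):
    handle = 'h%d' % len(heap)          # string handles never collide with int values
    heap[handle] = dict(bindings)
    return handle

def find(han, name): return heap[han][name]
def bind(han, name, value): heap[han][name] = value
def member(name, han): return name in heap[han]
def push(han): actstack.append(han)
def pop(): actstack.pop()
def top(): return actstack[-1]

def lhs_ident(name):                    # Lefthandside[[ I ]]: inside-out along 'parent'
    han = top()
    while han is not None:
        if member(name, han):
            return (han, name)
        han = find(han, 'parent')
    raise LookupError(name)

def lhs_dot(lvalue, name):              # Lefthandside[[ L . I ]] (assumed form)
    han = find(*lvalue)
    if member(name, han):
        return (han, name)
    raise LookupError(name)

def lhs_this():                         # Lefthandside[[ this ]]
    if member('this', top()):
        return (top(), 'this')
    raise LookupError('this')

def is_closure_handle(v):
    return isinstance(v, str) and v in heap and 'type' in heap[v]

def expression(lvalue):                 # Expression[[ L ]]
    val = find(*lvalue)
    if is_closure_handle(val):
        raise TypeError('closure handles are not expressible values')
    return val

def find_closure(label, lvalue):
    clhandle = find(*lvalue)
    if label == find(clhandle, 'type'):
        return find(clhandle, 'param'), find(clhandle, 'code'), find(clhandle, 'parent')
    raise TypeError('not a ' + label)

def call(lvalue, arg):                  # Command[[ L ( E ) ]], arg = Expression[[ E ]]
    thishandle, ident = lvalue
    paramname, code, parentlink = find_closure('proc', (thishandle, ident))
    push(alloc({paramname: arg, 'this': thishandle, 'parent': parentlink}))
    code()
    pop()

def p_body():                           # this.f = f
    bind(*lhs_dot(lhs_this(), 'f'), expression(lhs_ident('f')))

g = alloc({'parent': None}); push(g)    # global namespace
ob = alloc({'parent': g})               # the object made by new C(): handle alpha
bind(ob, 'f', 0)
bind(ob, 'p', alloc({'type': 'proc', 'param': 'f', 'code': p_body, 'parent': ob}))
bind(g, 'ob', ob)

call(lhs_dot(lhs_ident('ob'), 'p'), 3)  # ob.p(3)
print(find(ob, 'f'))                    # 3
try:
    expression(lhs_dot(lhs_ident('ob'), 'p'))   # the illegal expression ob.p
except TypeError as err:
    print(err)
```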

When subclasses arrive, the semantics shown for L(E) and for L.I change once more.

Exercise

What goes wrong with this example program when its semantics is computed so that private variables are saved in a namespace that is separate from the namespace that holds the public methods?
class Clock() : private var time = 0 in
                { proc tick(): time = time + 1 end;
                  proc reset(time): this.time = time end  # the problem is here
                }
var c = new Clock();
c.tick();
c.reset(99);
Explain why the error disappears if the private variable is saved in the same namespace with the public methods. (Hint: reread the end of the previous section, about public-private labelling.)


5.1 Subclasses

Tennent's extension principles do not yield subclasses. Indeed, subclassing is an operation on templates, independent from abstracts, parameters, and blocks. In its most general form, it is a ``template append'' operation:

T ::=  { C }  |  L ( E )  |  T1 + T2
This form is called a mix-in, and it would be used like this:
class Point(a,b) : { var x = a;  var y = b;
                     proc paint() : ...x...y...a...b... end;
                     proc display() : this.paint() end
                   };
class Color(c) : { var color = c;
                   proc paint(...) : super.paint(); ...color... end
                 };

class ColoredPoint(x,y,c) : Point(x,y) + Color(c);
var RGB = 999999;
var p = new ColoredPoint(0,0,RGB)
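Python's multiple inheritance with super() is one real-world relative of this append operation. It is not the semantics these notes define. In the sketch, Point, Color, and ColoredPoint are ordinary Python classes that mirror the example.

Python searches base classes left to right. So Point(x,y) + Color(c), where Color's paint overrides Point's, is written class ColoredPoint(Color, Point). The call super().paint() inside Color reaches Point.paint even though Color never names Point, and that is what makes Color a mix-in.

```python
log = []

class Point:
    def __init__(self, a, b):
        self.x, self.y = a, b

    def paint(self):
        log.append(('Point.paint', self.x, self.y))

    def display(self):
        self.paint()                  # virtual: resolved on the whole object

class Color:
    def __init__(self, c):
        self.color = c

    def paint(self):
        super().paint()               # next class in the object's MRO, not a fixed parent
        log.append(('Color.paint', self.color))

class ColoredPoint(Color, Point):     # roughly  Point(x,y) + Color(c)
    def __init__(self, x, y, c):
        Point.__init__(self, x, y)
        Color.__init__(self, c)

RGB = 999999
p = ColoredPoint(0, 0, RGB)
p.display()
print(log)   # [('Point.paint', 0, 0), ('Color.paint', 999999)]
```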
We will undertake a more modest version of the append operation, one that looks like Java:
===================================================

D ::=  ...  |  class I1 ( I2 ) : T  |  var I = E

E ::=  ...  |  new T

T ::=  { C }  |  L ( E )  |  extends T with { C } 

L ::=  this  |  super  |  I  |  L . I

===================================================
Here is an example in the more modest syntax:
class Point(a,b) : { var x = a;  var y = b;
                     proc paint() : ...x...y...a...b... end;
                     proc display() : this.paint() end
                   };
class ColoredPoint(m,n,c) : extends Point(m,n) with
                 { var color = c;
                   proc paint(...) : super.paint(); ...color... end
                 };

var p1 = new Point(0,0);
var p2 = new ColoredPoint(9,88,777)
Object p1 will be modelled by two namespaces, one holding bindings for a and b, and one holding x, y, paint, and display. Similarly, p2 will be a four-namespace object. All of p2.paint, p2.display, p2.color, p2.x, and p2.y are well-defined L-values, and we must alter the semantics of L.I to ensure this is so.

And there is the notion of ``superobject'' (``superclass''), as used in super.paint() --- p2's paint method will call the paint method in the ``superobject part'' of p2.

Finally, there is virtual-method override at work: the call, p2.display(), invokes the method display in Point, whose call, this.paint(), activates the paint method in ColoredPoint (whereas, p1.display() activates paint in Point).

All these issues will be handled by a new linkage, implemented by a new field, 'super', which links subclass-namespaces to superclass-namespaces.

Now, there are two steps involved in computing an L-value like p2.display:

  1. Compute the L-value of p2. This will be an ``entry handle'' into a compound namespace.
  2. Search the compound namespace for the sub-namespace that holds the closure named by display.
Step 2 is done with a search along the chain of 'super' links --- the virtual chain.
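The two steps can be traced on a hand-built heap for p2. In this sketch, the handle names and the closure placeholder strings are our own; only the link structure follows the notes. p2's four namespaces are:

- the activation of ColoredPoint(9,88,777);
- the activation of Point(m,n);
- the two object parts.

Step 2 is a loop along the 'super' links from the entry handle. The class parameters a and b live in an activation that lies on a 'parent' chain but not on the 'super' chain, so p2.a is undefined.

```python
# Hypothetical handle names; closures are placeholder strings.
heap = {
    'g':     {'parent': None},                              # global namespace
    'cpAct': {'parent': 'g', 'm': 9, 'n': 88, 'c': 777},    # activation of ColoredPoint(9,88,777)
    'ptAct': {'parent': 'g', 'a': 9, 'b': 88},              # activation of Point(m,n)
    'ptObj': {'parent': 'ptAct', 'super': None, 'x': 9, 'y': 88,
              'paint': '<closure Point.paint>', 'display': '<closure Point.display>'},
    'cpObj': {'parent': 'cpAct', 'super': 'ptObj', 'color': 777,
              'paint': '<closure ColoredPoint.paint>'},
}

def search_chain(linkname, han, name):
    """Follow `linkname` links from han; return the L-value (handle, name) holding name."""
    while han is not None:
        if name in heap[han]:
            return (han, name)
        han = heap[han][linkname]
    raise LookupError(name)

p2 = 'cpObj'                                         # step 1: the entry handle of p2
for field in ['display', 'paint', 'color', 'x']:
    print(field, search_chain('super', p2, field))   # step 2: walk the virtual chain
```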

Here is the relevant fragment of the language:

===================================================

D ::=  proc I1 ( I2 ) : C  |  class I1 ( I2 ) : T  |  var I = E

C ::=  L = E  |  D  |  L ( E ) 

E ::=  L  |  new T 

T ::=  { C }  |  L ( E )  |  private D in T  |  extends T with { C }

L ::=  I  |  L . I  |  this  |  super

===================================================
As always, a program constructs a heap of namespaces and manages a linked-list activation stack that is pointed to by actstack.

An abstract (procedure/method) call is handled differently depending on whether it has the form L.I(E) or I(E), so we define an auxiliary function for handling calls:

===================================================

Program[[.]] updates heap:

[[ T ]] ==  let actstack = nil
            Template[[ T ]]


Declaration[[.]] updates heap:

[[ var I = E ]] ==  bind(top(), I, Expression[[ E ]])

[[ proc I1 ( I2 ) : C ]] ==
           bind(top(), I1, alloc{'type' = 'proc', 'param' = I2,
                                 'code' = [[ C ]], 'parent' = top()})

[[ class I1 ( I2 ) : T ]] ==
           bind(top(), I1, alloc{'type' = 'class', 'param' = I2,
                                 'code' = [[ T ]], 'parent' = top()})


Command[[.]] updates heap:

[[ D ]] ==  Declaration[[ D ]]

[[ L = E ]] ==  bind(Lefthandside[[ L ]], Expression[[ E ]])

[[ L ( E ) ]] ==  let (han, paramname, code) = callClosure('proc', [[ L ]])
                  bind(han, paramname, Expression[[ E ]])
                  push(han)
                  Command code 
                  pop()

### callClosure(label, L) locates the closure named by L,
### constructs an activation record for its call, and returns a tuple holding
### the handle to the activation and the closure's paramname and code 

callClosure(label, [[ I ]]) ==
     let clhandle = find Lefthandside[[ I ]]
     if label == find(clhandle, 'type') :
         (alloc{'parent' = find(clhandle, 'parent')}, 
             find(clhandle, 'param'),
               find(clhandle, 'code'))
     else : raise Exception

callClosure(label, [[ L . I ]])  ==
     let thishandle = find Lefthandside[[ L ]]
     let clhandle = find Lefthandside[[ L . I ]]
     if label == find(clhandle, 'type') :
         (alloc{'parent' = find(clhandle, 'parent'),  'this' = thishandle},
             find(clhandle, 'param'),
               find(clhandle, 'code'))
     else : raise Exception




Expression[[.]] updates heap and returns Denotable:

[[ L ]] ==  let v = find Lefthandside[[ L ]]
            if notHandleToClosure(v) :  v
            else : raise Exception

[[ new T ]] ==  Template[[ T ]]



Template[[.]] updates heap and returns Handle:

[[ { C } ]] == let newhandle = alloc{'parent' = top(), 'super' = nil}
               push(newhandle)
               Command[[ C ]]
               pop()
               newhandle


[[ private D in T ]] == push(alloc{'parent' = top()})
                        Declaration[[ D ]]
                        let han2 = Template[[ T ]]
                        pop()
                        han2

[[ extends T with { C } ]] == let han1 = Template[[ T ]]
                              let han2 = alloc{'parent' = top(), 'super' = han1}
                              push(han2)
                              Command[[ C ]]
                              pop()
                              han2


[[ L ( E ) ]] == let (han, paramname, code) = callClosure('class', [[ L ]])
                 bind(han, paramname, Expression[[ E ]])
                 push(han)
                 let obhan = Template[[ code ]]
                 pop()
                 obhan



Lefthandside[[.]] returns LValue:

[[ I ]] ==  searchChain('parent', top(), I) 

[[ L . I ]] ==  searchChain('super', find Lefthandside[[ L ]], I)

# Starting from namespace  han,  searches for  I  in the namespaces linked
# together by  linkname  and returns the L-value where  I  resides:
searchChain(linkname, han, I) ==
          if han != nil :
             if member(I, han) :
               (han, I)
             else : searchChain(linkname, find(han, linkname), I)
          else : raise Exception


[[ this ]] ==  if member('this', top()):
                  (top(), 'this') 
               else : raise Exception

[[ super ]] ==  searchChain('parent', top(), 'super')

===================================================
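A runnable miniature of these equations shows virtual-method override and super at work. The sketch makes several simplifying assumptions:

- class declarations and their parameters are skipped, and the templates are built directly, as with new { ... } and new extends { ... } with { ... };
- x, y, and color are fixed constants;
- procedure bodies are Python functions;
- a parameterless procedure binds no parameter.

It replays p1.display() and p2.display().

```python
heap, actstack = {}, []

def alloc(bindings):
    handle = 'h%d' % len(heap)
    heap[handle] = dict(bindings)
    return handle

def find(han, name): return heap[han][name]
def bind(han, name, value): heap[han][name] = value
def member(name, han): return name in heap[han]
def push(han): actstack.append(han)
def pop(): actstack.pop()
def top(): return actstack[-1]

def search_chain(linkname, han, name):
    while han is not None:
        if member(name, han):
            return (han, name)
        han = find(han, linkname)
    raise LookupError(name)

def lhs_dot(obhandle, name): return search_chain('super', obhandle, name)   # [[ L . I ]]
def lhs_super(): return search_chain('parent', top(), 'super')              # [[ super ]]
def lhs_this():                                                              # [[ this ]]
    if member('this', top()):
        return (top(), 'this')
    raise LookupError('this')

def declare_proc(name, code, param=None):        # Declaration[[ proc I1 ( I2 ) : C ]]
    bind(top(), name, alloc({'type': 'proc', 'param': param, 'code': code, 'parent': top()}))

def call_dot(thishandle, name, arg=None):        # Command[[ L . I ( E ) ]] via callClosure
    clhandle = find(*lhs_dot(thishandle, name))
    if find(clhandle, 'type') != 'proc':
        raise TypeError(name)
    han = alloc({'parent': find(clhandle, 'parent'), 'this': thishandle})
    if find(clhandle, 'param') is not None:      # parameterless procs bind nothing (assumption)
        bind(han, find(clhandle, 'param'), arg)
    push(han)
    find(clhandle, 'code')()
    pop()

def template_block(body, superlink=None):        # Template[[ { C } ]]
    newhandle = alloc({'parent': top(), 'super': superlink})
    push(newhandle)
    body()
    pop()
    return newhandle

def template_extends(template_t, body):          # Template[[ extends T with { C } ]]
    return template_block(body, superlink=template_t())

log = []

def point_body():                                # Point's object body
    bind(top(), 'x', 0)
    bind(top(), 'y', 0)
    declare_proc('paint', lambda: log.append('Point.paint'))
    declare_proc('display', lambda: call_dot(find(*lhs_this()), 'paint'))   # this.paint()

def colored_paint():                             # super.paint(); ...color...
    call_dot(find(*lhs_super()), 'paint')
    log.append('ColoredPoint.paint')

def colored_body():                              # ColoredPoint's added body
    bind(top(), 'color', 777)
    declare_proc('paint', colored_paint)

push(alloc({'parent': None}))                    # global namespace
p1 = template_block(point_body)
p2 = template_extends(lambda: template_block(point_body), colored_body)

call_dot(p1, 'display')                          # p1.display()
call_dot(p2, 'display')                          # p2.display()
print(log)   # ['Point.paint', 'Point.paint', 'ColoredPoint.paint']
```

Note that, following the equations literally, the call super.paint() binds this to the superobject part of p2 while Point's paint runs.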
Important points:

Every object language has its own unique semantics for method definition, scoping, subclasses, and this. Be careful!

  • The semantics of super listed above matches the one used in C# (there, it is named base) --- the meaning is determined by static scoping. (Java also permits super.m(...) in any instance method; only the constructor call super(...) is restricted, to the first statement of a constructor.)


5.2 Semantics of mix-ins

We can simplify the semantics of templates and classes if we add a second parameter, the super-link, to the semantic function for templates. The more general syntax of templates is
    ===================================================
    
    T ::=  { C }  |  T1 extendedby T2  |  private D in T  |  L ( E )
    
    ===================================================
and the semantics goes like this:
    ===================================================
    
    Template[[.]] updates heap and returns Handle:
    
        Template[[ T ]] == evalTemplate(T, nil)
    
    
    ### evalTemplate(t, s) allocates template  t  using handle  s  as its super-link
    
    evalTemplate( [[ { C } ]], superlink)  ==
                   let newhandle = alloc{'parent' = top(), 'super' = superlink}
                   push(newhandle)
                   Command[[ C ]]
                   pop()
                   newhandle
    
    evalTemplate([[ T1 extendedby T2 ]], superlink) ==
                                evalTemplate(T2, evalTemplate(T1, superlink))
    
    evalTemplate([[ private D in T ]] , superlink) ==
                   push(alloc{'parent' = top()})
                   Declaration[[ D ]]
                   let han2 = evalTemplate(T, superlink)
                   pop()
                   han2
    
    evalTemplate([[ L ( E ) ]], superlink) ==
                     let (han, paramname, code) = callClosure('class', [[ L ]])
                     bind(han, paramname, Expression[[ E ]])
                     push(han)
                     let obhan = Template[[ code ]]  # ignore superlink!
                     pop()
                     obhan
    
    
    
    ===================================================
evalTemplate(t, s) links the allocated object for template t to the object's super-object, s. This makes the semantics of mix-ins, T1 extendedby T2, truly simple --- it is two objects linked together. The usual form of inheritance, I(...) extendedby {C}, is a simple instance of a mix-in.

The multiple inheritance of procedure p that appears to arise in x's object in

    class C(): {proc p(...):...};
    class D():  C() extendedby {proc q(...):... p(...)...};
    var x = new {proc p(...): ...} extendedby D()
    
is resolved by the above semantics in copy-rule style: x.q() calls procedure p in class C. This matches the expansion of x's body into
    var x = new  {proc p(...): ...} 
                 extendedby
                 ({proc p(...):...} extendedby {proc q(...):... p(...)...})
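The mix-in semantics is small enough to run. This sketch rests on assumptions of our own:

- templates are encoded as tuples, and procedure closures are string stand-ins;
- 'parent' links and the activation stack are omitted, because only the super chain matters here.

The sketch implements evalTemplate for { C } and extendedby and builds x's object from the copy-rule expansion above. It then reports which p is found first along the super chain from x's entry handle.

```python
heap = {}

def alloc(bindings):
    handle = 'h%d' % len(heap)
    heap[handle] = dict(bindings)
    return handle

def eval_template(t, superlink):
    """Allocate template t with `superlink` as its super-link; return the entry handle."""
    if t[0] == 'block':                          # { C }, with C reduced to its proc bindings
        return alloc({'super': superlink, **t[1]})
    if t[0] == 'extendedby':                     # T1 extendedby T2
        _, t1, t2 = t
        return eval_template(t2, eval_template(t1, superlink))
    raise ValueError(t[0])

def super_chain(han):
    """List the handles reachable from han along 'super' links, entry handle first."""
    chain = []
    while han is not None:
        chain.append(han)
        han = heap[han]['super']
    return chain

def lookup(han, name):
    """Return the first binding of name along the super chain."""
    for h in super_chain(han):
        if name in heap[h]:
            return heap[h][name]
    raise LookupError(name)

# The copy-rule expansion of  new {proc p...} extendedby D():
x_part = ('block', {'p': 'p declared in x'})
c_part = ('block', {'p': 'p declared in C'})
d_part = ('block', {'q': 'q declared in D'})
x = eval_template(('extendedby', x_part, ('extendedby', c_part, d_part)), None)

print(super_chain(x))          # ['h2', 'h1', 'h0']
print(lookup(x, 'q'))          # q declared in D
print(lookup(x, 'p'))          # p declared in C
```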
    


6 Eliminating the activation stack

The semantics we have developed is in syntax-directed translation format, so named because each syntax phrase is translated into its meaning, which is a script that manipulates the heap and the activation-stack pointer. Compilers are structured as syntax-directed translations that translate parse trees into object code (scripts).

The stack-heap machine model we have used is typical for object languages. If we are less dogmatic, we can dispense with the activation stack by adding to each semantic function a parameter that holds the handle to the active namespace. The parameter has the same value that the top of the activation stack had.

Here is the core language from the beginning of these Notes, redefined with the active-namespace parameter. The semantics gets simpler because there are no push, pop, or top actions:

    ===================================================
    
    P ::=  T
    C ::=  var I = E  |  L = E  |  C1 ; C2  |  while E : C end
    E ::=  N  |  ( E1 + E2 )  |  L  |  new T
    T ::=  { C }
    L ::=  I  |  L . I 
    
    
    Program[[.]] updates heap:
    
    [[ T ]] ==  Template[[ T ]](nil)
    
    
    Command[[.]](handleToActiveNamespace)  updates heap:
    
    [[ var I = E ]](a) ==  bind(a, I, Expression[[ E ]](a))
    
    [[ L = E ]](a) ==   bind(Lefthandside[[ L ]](a), Expression[[ E ]](a))
    
    [[ while E : C end ]](a) ==  if Expression[[ E ]](a) != 0 :
                                     Command[[ C ]](a)
                                     Command[[ while E : C end ]](a)
                                 else : (skip)
    
    [[ C1 ; C2 ]](a) == Command[[ C1 ]](a)
                        Command[[ C2 ]](a)
    
    
    Expression[[.]](handleToActiveNamespace)  updates heap and returns Denotable:
    
    [[ N ]](a) ==  int(N)
    
    [[ E1 + E2 ]](a) ==  Expression[[ E1 ]](a) + Expression[[ E2 ]](a)
    
    [[ L ]](a) ==  find (Lefthandside[[ L ]](a))
    
    [[ new T ]](a)  ==  Template[[ T ]](a)
    
    
    Template[[.]](handleToActiveNamespace)  updates heap and returns Handle:
    
    [[ { C } ]](a)  == let newhandle = alloc{'parent' = a}
                       Command[[ C ]](newhandle)   # use new namespace for C 
                       newhandle
    
    
    Lefthandside[[.]](handleToActiveNamespace)  returns LValue:
    
    [[ I ]](a) ==  searchStatic(a, I)
    
    where  searchStatic(thishandle, I) ==
              if thishandle != nil :
                 if member(I, thishandle) :
                   (thishandle, I)
                 else : searchStatic(find(thishandle, 'parent'), I)
              else : raise Exception
    
    [[ L . I ]](a) ==  let  han = find (Lefthandside[[ L ]](a))
                       if member(I, han) :
                           (han, I)
                       else : raise Exception
    
    
    ===================================================
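The equations translate almost line for line into Python. This sketch makes a few choices of its own:

- phrases are encoded as tuples;
- while is written as a Python loop rather than the recursive equation;
- a negative numeral stands in for subtraction, which the core language lacks.

Every semantic function takes the active-namespace handle a as an extra argument, and no stack appears anywhere.

```python
heap = {}

def alloc(bindings):
    handle = 'h%d' % len(heap)
    heap[handle] = dict(bindings)
    return handle

def find(lvalue):
    han, name = lvalue
    return heap[han][name]

def bind(lvalue, value):
    han, name = lvalue
    heap[han][name] = value

def program(t):                                   # Program[[ T ]] == Template[[ T ]](nil)
    return template(t, None)

def command(c, a):                                # Command[[ C ]](a)
    tag = c[0]
    if tag == 'var':                              # var I = E
        bind((a, c[1]), expression(c[2], a))
    elif tag == 'assign':                         # L = E
        bind(lefthandside(c[1], a), expression(c[2], a))
    elif tag == 'seq':                            # C1 ; C2
        command(c[1], a)
        command(c[2], a)
    elif tag == 'while':                          # while E : C end, as a loop
        while expression(c[1], a) != 0:
            command(c[2], a)
    else:
        raise ValueError(tag)

def expression(e, a):                             # Expression[[ E ]](a)
    tag = e[0]
    if tag == 'num':
        return int(e[1])
    if tag == 'plus':
        return expression(e[1], a) + expression(e[2], a)
    if tag == 'lval':
        return find(lefthandside(e[1], a))
    if tag == 'new':
        return template(e[1], a)
    raise ValueError(tag)

def template(t, a):                               # Template[[ { C } ]](a)
    newhandle = alloc({'parent': a})
    command(t[1], newhandle)                      # C runs with the new namespace active
    return newhandle

def search_static(han, name):
    while han is not None:
        if name in heap[han]:
            return (han, name)
        han = heap[han]['parent']
    raise LookupError(name)

def lefthandside(l, a):                           # Lefthandside[[ L ]](a)
    if l[0] == 'id':
        return search_static(a, l[1])
    if l[0] == 'dot':                             # L . I
        han = find(lefthandside(l[1], a))
        if l[2] in heap[han]:
            return (han, l[2])
        raise LookupError(l[2])
    raise ValueError(l[0])

def seq(*cs):                                     # build C1 ; (C2 ; ...)
    return cs[0] if len(cs) == 1 else ('seq', cs[0], seq(*cs[1:]))

x, total, ob_y = ('id', 'x'), ('id', 'total'), ('dot', ('id', 'ob'), 'y')
prog = ('block', seq(
    ('var', 'x', ('num', '3')),
    ('var', 'ob', ('new', ('block', ('var', 'y', ('lval', x))))),   # y = x, found via 'parent'
    ('assign', ob_y, ('plus', ('lval', ob_y), ('num', '1'))),        # ob.y = (ob.y + 1)
    ('var', 'total', ('num', '0')),
    ('while', ('lval', x), seq(
        ('assign', total, ('plus', ('lval', total), ('lval', x))),
        ('assign', x, ('plus', ('lval', x), ('num', '-1')))))))

g = program(prog)
print(heap[g]['total'], heap[g]['x'], heap[heap[g]['ob']]['y'])   # 6 0 4
```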
The semantics of procedures gets simpler, too. Here is the static-scoping semantics:
    ===================================================
    
    Declaration[[ proc I1 ( ) : C ]](a) == 
            bind(a, I1, alloc{'type'= 'proc', 'code'= [[ C ]], 'parent'= a})
    
    Command[[ L () ]](a) == 
             let (code, parentlink) = findClosure('proc', Lefthandside[[ L ]](a))
             Command code (alloc{'parent' = parentlink})   # run the body in a fresh activation
    
    ===================================================
as does the semantics of blocks and private definitions.
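Here is a minimal sketch of the procedure equations, with the active namespace passed explicitly. Its assumptions:

- the names inc, n, and the inner block are hypothetical;
- procedure bodies are Python functions of their activation handle;
- the body runs in a freshly allocated activation whose 'parent' link is the closure's saved parent, mirroring the push(alloc{...}) of the stack-based equations.

The example shows static scoping at work. inc is called from a block that declares its own n, yet it still increments the global n.

```python
heap = {}

def alloc(bindings):
    handle = 'h%d' % len(heap)
    heap[handle] = dict(bindings)
    return handle

def search_static(han, name):                      # Lefthandside[[ I ]](a)
    while han is not None:
        if name in heap[han]:
            return (han, name)
        han = heap[han]['parent']
    raise LookupError(name)

def declare_proc(name, code, a):                   # Declaration[[ proc I1 () : C ]](a)
    heap[a][name] = alloc({'type': 'proc', 'code': code, 'parent': a})

def find_closure(label, lvalue):
    han, name = lvalue
    clhandle = heap[han][name]
    if heap[clhandle]['type'] != label:
        raise TypeError(name + ' is not a ' + label)
    return heap[clhandle]['code'], heap[clhandle]['parent']

def call(name, a):                                 # Command[[ I () ]](a)
    code, parentlink = find_closure('proc', search_static(a, name))
    code(alloc({'parent': parentlink}))            # body gets a fresh activation namespace

def inc_body(act):                                 # n = n + 1
    han, name = search_static(act, 'n')
    heap[han][name] += 1

g = alloc({'parent': None, 'n': 0})                # global namespace: var n = 0
declare_proc('inc', inc_body, g)                   # proc inc() : n = n + 1
b = alloc({'parent': g, 'n': 100})                 # an inner block that declares its own n
call('inc', b)                                     # inc() called from inside the block
print(heap[g]['n'], heap[b]['n'])                  # 1 100
```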

What we have developed is a Scott-Strachey denotational-style semantics based on namespace algebra.

Exercise: Revise the semantics definitions for the templates of the language with classes and mix-ins, so that Template[[ T ]](a)(superlink) allocates the object(s) defined by T and returns the corresponding entry handle. This would make the semantics go as follows:

    Program[[ T ]] == Template[[ T ]](nil)(nil)
    
and
    Expression[[ new T ]](a)  == Template[[ T ]](a)(nil)
    
and
    Template[[ { C } ]](a)(superlink) ==
                     let newhandle = alloc{'parent' = a, 'super' = superlink}
                     Command[[ C ]](newhandle)
                     newhandle
    
and so on.