Copyright © 2010 David Schmidt

Chapter 2:
Two architectures for imperative languages


2.1 An interpreter for a C-like language
    2.1.1 Data-type declarations
2.2 An interpreter for an object-oriented language


The previous chapter showed how to implement a language's syntax with a parser and the semantics with an interpreter. The former converts a sequence of textual words into an operator tree, which we represent as a nested list. The latter traverses the operator tree and updates internal data structures (e.g., the namespace).

The intepreter in the previous chapter is overly naive. In this chapter we study two more realistic interpreter architectures: one for C-like languages and one for object-oriented languages. Here are the key features of each:

An interpreter architecture is also called a virtual machine, because it is a software-emulated version of a hardware machine that we might construct to execute programs in the language.


2.1 An interpreter for a C-like language

Here is the command language from the previous chapter extended by two new constructions, &L (``the location of variable L'') and *L (''the value at the location that was extracted at the location named by L''). A location is called a pointer.
===================================================

P : Program
CL : CommandList
C : Command
E : Expression
D : Declaration
L : LefthandSide
I : Variable
N : Numeral

P ::=  CL

CL ::= C  |  C ; CL

C ::=  L = E  |  while E : C end  |  print L  | 

E ::=  N  |  ( E1 + E2 )  |  L  |  & L

L ::=  I  |  * L

N ::=  string of digits

I ::=  strings of letters, not including the keywords,  while,  print

===================================================
There is a new syntax domain, LefthandSide, which names the phrases that can appear on the left-hand side of an assignment.

To understand this language, we look at the architecture of its interpreter (virtual machine). There are three components: the controller ("processor unit") which are the interpreter's functions, an environment ("symbol table") data structure, and a memory vector. Data is stored in the memory's cells, and the location numbers of the cells are saved in the environment. Here is a sample configuration:


The layout shows that variable x names location 2 in the memory; y names location 0, and z names location 1. In the memory itself, location 0 holds 5, location 1 holds 0, etc. Perhaps the configuration was constructed by these commands:
y = 5;
z = 0;
x = (6 + y)
Since there are no declarations in our little language, the environment is updated and memory cells are allocated when new variables are mentioned. For example, the first command, y = 5, stores 5 in newly allocated memory cell 0 and y is bound to location 0 in the environment. Similarly, z names newly allocated location 1, and location 1 holds 0. Finally, 6 is added to the int that is found at location 0 and stored in newly allocated locaton 2.

But the same environment-memory layout is constructed by the following commands, which manipulate location numbers as well as ordinary ints:

y = 5;
z = &y;      # store in  z's  cell the location named by  y
x = (6 + *z)  # add 6 to the int stored at the location stored in  z's  location
              #  and store the sum in the location named by  x  (whew!)
The first command works like before, but the second stores y's location (0) into z's location --- read &y as ``y's location.'' Some people say that ``z points to y.'' The third command adds 6 to the integer found at location 0:
  1. z's value is 0
  2. *z's value is 5, which is the int stored at location 0.
The sum, 11, is stored in location 2. You can read *z as ``the value in the location pointed to by z.'' The * is called a dereferencing operator.

Pointers are powerful, because they let you use locations the same way you use ints. (You can even do arithmetic with them.) They are used a lot in systems programming, and it is critical that you understand them.

Let's study the interpreter that implements pointers. Here are the operator trees the interpreter uses; they are the same as in the last chapter, plus the two new forms:

===================================================

PTREE ::=  [ CTREE+ ]
           where  CTREE+  means  one or more CTREEs

CTREE ::=  ["=", LTREE, ETREE]  |  ["while", ETREE, CLIST]  |  ["print", VAR]

ETREE ::=  NUM  |  ["+", ETREE, ETREE]  |  ["&", LTREE]  |  LTREE

LTREE ::=  VAR  |  ["*", LTREE]

NUM ::=  string of digits

VAR ::=  string of letters but not  "while" or "print" or "end"

===================================================
For example, the PTREE for the earlier example, y = 5; z = &y; x = (6 + *z), looks like this:
[["=", "y", "5"],
 ["=", "z", ["&", "y"]],
 ["=", "x", ["+", "6", ["*", "z"]]]
}

Here is the coded interpreter --- the entire virtual machine, controller plus data structures. Read the comments at the top and then read the coding of interpretLTREE, which calculates the meaning of an assignment's left-hand side, which computes to a location number.

Next, study interpretETREE to see how ["&", LTREE] computes to a meaning different from just LTREE. This is crucial, because the former computes to a location number and the latter computes to the number stored at the location number. It may look strange, but that's how it's done in C!

===================================================

"""Interpreter for a C-like mini-language with pointers.
   There are two crucial data structures:
"""
memory = []  # a global variable that models primary storage.
             # It is a list (array), where the indexes to the array are
             # meant to model addresses.  For example, if
             #   memory = [5,0,11],  then location 2 of  memory  holds  11.

env = {}     # a global variable that holds the program's variable names
             # and the locations that each denotes. 
             # It is a Python hash table (dictionary).  For example, if
             #   env = {'x'=2, 'y'=0, 'z'=1}, then variable x names location 2
             # in  memory,  and  memory[2] == 11.   This is how computer
             # storage is used in an implementation of C.   

def interpretCLIST(p) :
    """pre: p  is a program represented as a  CLIST ::=  [ CTREE+ ]
                  where  CTREE+  means  one or more CTREEs
       post:  memory  holds all the updates commanded by program  p
    """
    for command in p :
        interpretCTREE(command)


def interpretLTREE(x) :
    """pre: x  is an L-value represented as an LTREE:
         LTREE ::= VAR | ["*", LTREE]
       post: ans  holds the location named by  x
       returns:  ans
    """
    if isinstance(x, str) :   # a VAR ?
        if x in env :
            ans = env[x]  # look up its location
        else :  # it's a brand new var, so allocate a memory cell for it:
            memory.append("err")  # add a cell at the end of  memory
            env[x] = len(memory) - 1  # remember the location
            ans = env[x]
    else :  # a pointer dereference,  ["*", LTREEy]
        y = interpretLTREE(x[1])   # get value of  LTREEy
        ans = memory[y]    # dereference it and return the location therein
    return ans


def interpretCTREE(c) :
    """pre: c  is a command represented as a CTREE:
         CTREE ::= ["=", LTREE, ETREE] | ["print", LTREE] | ["while", ETREE, CLIST]
       post:  memory  holds all the updates commanded by  c
    """
    operator = c[0]
    if operator == "=" :   # assignment command, ["=", LTREE, ETREE]
        lval = interpretLTREE(c[1])
        exprval = interpretETREE(c[2])  # evaluate the right-hand side
        memory[lval] = exprval  # do the assignment
    elif operator == "print" :   # print command,  ["print", VAR]
        num = interpretETREE(c[1])
        print num
    elif operator == "while" :   # while command
        expr = c[1]
        body = c[2]
        while (interpretETREE(expr) > 0) :
            interpretCLIST(body) 
    else :   # error
        crash("invalid command")


def interpretETREE(e) :
    """pre: e  is an expression represented as an ETREE:
          ETREE ::=  NUMERAL  |  LTREE  |  ["&", LTREE]  |  [OP, ETREE1, ETREE2]
                     where OP is either "+" or "-"
      post:  ans  holds the value of  e
    """
    if isinstance(e, str) and  e.isdigit() :   # a NUMERAL
        ans = int(e)
    elif isinstance(e, list) and (e[0] == "+" or e[0] == "-"):  # [OP, E1, E2]
        op = e[0]
        ans1 = interpretETREE(e[1])
        ans2 = interpretETREE(e[2])
        if op == "+" : 
            ans = ans1 + ans2
        elif op == "-" : 
            ans = ans1 - ans2
        else :
            crash("illegal arithmetic operator")
    elif isinstance(e, list) and e[0] == "&" :  # ["&", LTREE] 
        ans = interpretLTREE(e[1])   # return the location named by the LTREE
    else:  # is an LTREE that must be dereferenced:
        x = interpretLTREE(e)
        ans = memory[x]
    return ans


def crash(message) :
    """pre: message is a string
       post: message is printed and interpreter stopped
    """
    print message + "! crash! core dump: ", memory
    raise Exception   # stops the interpreter


def main(program) :
    global memory, env  # these variables are global to main
    memory = []         # reset them both
    env = {}
    try :
      interpretCLIST(program)
    except Exception :
        print "Due to error, interpreter must quit prematurely.  Sorry."
    print "final env =", env
    print "final memory =", memory

===================================================

Exercise: Test the interpreter with these programs:

x = 3

main([["=", "x", "3"]])


x = 2;  p = &x;  y = *p

main([["=", "x", "2"],
      ["=", "p", ["&", "x"]],
      ["=", "y", ["*", "p"]]
     ])


x = 1;  p = &x;  *p = 2

main([["=", "x", "1"],
      ["=", "p", ["&", "x"]],
      ["=", ["*", "p"], "2"]
     ])


x = 0;  y = (6 + &x);  *x = 999;  print y

main([["=", "x", "0"],
      ["=", "y", ["+", "6", ["&", "x"]]],
      ["=", ["*", "x"], "999"], 
      ["print", "y"]
     ])

Exercise: In C, arrays are laid out in memory as a sequence of cells. For example, these commands,

x = 4 ;
r = [0,0,0,0] ;
y = 99
would generate this environment and memory:
env = {"x": 0,  "r": 1,  y: 5}

memory = [4, 0, 0, 0, 0, 99]
This makes an array lookup, like r[E], work like this:
  1. look up r's location in the environment
  2. add to it the int value of E
  3. extract the int stored at the location computed in Step 2.
Implement arrays in the interpreter. Then, modify the interpreter so that it can detect an array-indexing error, where E is too large or too small to index correctly into the sequence of cells allocated for r.


2.1.1 Data-type declarations

The ``bare'' architecture for C lets you mix ints and locations, which almost always causes trouble. For this reason, C lets you declare variables with a type tag that indicates how the variable will be used. Here is the program seen earlier with declarations embedded into it:
declare y : int ;
declare z : * int ;
y = 5 ;
z = &y ;
declare x : int ;
x = (6 + *z)
(In C, declarations are written more tersely, e.g., int y, but we use the declare keyword to get your attention.) The declarations indicate that only z can store locations in its memory cell. The program as written will operate correctly, but in contrast, this one will fail:
declare x : int ;
x = 0 ;
declare y : int ;
y = (6 + &x)
because an int is added to a location. This program also fails:
declare x : int ;
x = 0 ;
*x = 999
because x's location never holds a location.

Here is the language with declarations:

===================================================

P ::=  CL
CL ::= C  |  C ; CL
C ::=  L = E  |  while E : CL end  |  print L  |  declare I : T
T ::=  int  |  * T
E ::=  N  |  ( E1 + E2 )  |  L  |  & L
L ::=  I  |  * L
N ::=  string of digits
I ::=  strings of letters, not including the keywords,  while,  print


PTREE ::=  [ CTREE+ ]
           where  CTREE+  means  one or more CTREEs

CTREE ::=  ["=", LTREE, ETREE]  |  ["while", ETREE, CLIST]  |  ["print", VAR]
           | ["dec", VAR, TYPE]

TYPE ::= [ "ptr"* , "int" ]
         where  "ptr"*  means zero or more occurrences of  "ptr"
         (Note: it is bad luck for us that C uses  *  as well to mean  "ptr"!)

ETREE ::=  NUM  |  ["+", ETREE, ETREE]  |  ["&", LTREE]  |  LTREE

LTREE ::=  VAR  |  ["*", LTREE]

NUM ::=  string of digits

VAR ::=  string of letters but not  "while" or "print" or "end" or "declare" or "int"

===================================================

The interpreter remembers the type tags, in the environment. The tags are checked every time a variable is used. Here is a picture of the configuration generated by the first example program in this section:


Since z is a pointer variable, its type tag, *int, is saved internally as ["ptr", "int"]. Now, read the interpreter's code to see how the declarations create type tags and how interpretCTREE uses them to check the correctness of assignments and how interpretETREE uses them to check correctness of additions:
===================================================

"""Interpreter for a C-like mini-language with variables, pointers,
   declarations and loops.    There are two crucial data structures:
"""
memory = []  # a global variable that models primary storage.
             # It is a list (array), where the indexes to the array are
             # meant to model addresses.  For example, if
             #  memory = [5,0],  then location 0 of  memory  holds  5.

env = {}     # a global variable that holds the program's variable names
             # and the  type,location  that each denotes. 
             # It is a Python hash table (dictionary).  For example, if
             #   env = {'x'= (["int"],0),  'p' = (["ptr", "int"], 1}, 
             # then variable  x  names location 2, which holds an int,
             # and  p  names location 1, which holds a pointer to an int;
             # that is, location 1 holds a location that holds an int.

def interpretCLIST(p) :
    """pre: p  is a program represented as a  CLIST ::=  [ CTREE+ ]
                  where  CTREE+  means  one or more CTREEs
       post:  memory  holds all the updates commanded by program  p
    """
    for command in p :
        interpretCTREE(command)


def interpretLTREE(x) :
    """pre: x  is an L-value represented as an LTREE:
         LTREE ::= VAR | ["*", LTREE]
       returns:  datatype,location  pair named by  x
    """
    if isinstance(x, str) :   # a VAR ?
        if x in env :
            ans = env[x]  # look up its location
        else :  # undeclared
            crash("variable " + x + " undeclared")
    else :  # a pointer dereference,  ["*", LTREE]
        datatype, loc = interpretLTREE(x[1])  
        if datatype[0] == "ptr" :
            ans = (datatype[1:], memory[loc])    # dereference it
        else :
            crash("variable not a pointer")
    return ans
    


def interpretCTREE(c) :
    """pre: c  is a command represented as a CTREE:
         CTREE ::= ["=", LTREE, ETREE] | ["print", LTREE]
                   |  ["while", ETREE, CLIST]  |  ["dec", VAR, TYPE]
         where  TYPE ::= [ "ptr"*, "int" ]
           and  "ptr"*  means  zero or more occurrences of  "ptr"
                separated by commas

       post:  memory  holds all the updates commanded by  c
    """
    operator = c[0]
    if operator == "=" :   # assignment command, ["=", LTREE, ETREE]
        type1,lval = interpretLTREE(c[1])
        type2,exprval = interpretETREE(c[2])
        if type1 == type2 :
            memory[lval] = exprval  # do the assignment
        else :
            crash("incompatible types for assignment")
    elif operator == "print" :   # print command,  ["print", VAR]
        type,num = interpretETREE(c[1])
        print num
    elif operator == "while" :   # while command
        expr = c[1]
        body = c[2]
        type,val = interpretETREE(expr)
        while (val > 0) :
            interpretCLIST(body) 
    elif operator == "dec" :  # declaration
        x = c[1]
        if x in env :
            crash("variable " + x + " redeclared")
        else :
            memory.append("err")  # add a cell at the end of  memory
            env[x] = (c[2], len(memory) - 1)  # save type and location for x
    else :   # error
        crash("invalid command")


def interpretETREE(e) :
    """pre: e  is an expression represented as an ETREE:
          ETREE ::=  NUMERAL | [OP, ETREE1, ETREE2] | ["&", LTREE] | LTREE
                     where OP is either "+" or "-"
      post:  ans  holds the datatype and value of  e
      returns:  ans,  a  datatype,value  pair
    """
    if isinstance(e, str) and  e.isdigit() :   # a NUMERAL
        ans = (["int"], int(e))
    elif isinstance(e, list) and (e[0] == "+" or e[0] == "-"):  # [OP, E1, E2]
        op = e[0]
        type1, ans1  = interpretETREE(e[1])
        type2, ans2 = interpretETREE(e[2])
        if type1 == ["int"]  and  type2 == ["int"] :
            if op == "+" : 
                ans = (["int"], ans1 + ans2)
            elif op == "-" : 
                ans = (["int"], ans1 - ans2)
            else :
                crash("illegal arithmetic operator")
        else :
            crash("cannot do arithmetic on non-ints")
    elif isinstance(e, list) and e[0] == "&" :  # ["&", LTREE] 
        type0,val0 = interpretLTREE(e[1]) 
        ans = (["ptr"] + type0,  val0)
    else:  # is an LTREE that must be dereferenced:
        type0,loc0 = interpretLTREE(e)
        ans = (type0, memory[loc0])
    return ans


def crash(message) :
    """pre: message is a string
       post: message is printed and interpreter stopped
    """
    print message + "! crash! core dump: ", memory
    raise Exception   # stops the interpreter


def main(program) :
    global memory, env  # these variables are global to main
    memory = []         # reset them both
    env = {}
    interpretCLIST(program)
    print "final env =", env
    print "final memory =", memory

===================================================
When you read interpretETREE, you see that the meaning of an expression must be a pair: a data type and a value. For example, 3 computes to (["int"], 3). The type tag ensures that the 3 is assigned only to a variable whose type is ["int"]. A pointer to an int will have type ["ptr", "int"]. For example, if we have declare x : int, which allocates location 0 for x, then ["&", "x"] evaluates to (["ptr", "int"], 0).)

Exercises:

  1. Install the interpreter and run these programs:
    declare x: int;  x = 3;  declare y: int;  y = (x + 1)
    
    main([["dec", "x", ["int"]],
          ["=", "x", "3"],
          ["dec", "y", ["int"]],
          ["=", "y", ["+", "x", "1"]]
        ])
    
    
    declare x: int;  declare p: *int;  x = 2;  p = &x;  declare y: int;  y = *p
    
    main([["dec", "x", ["int"]],
          ["dec", "p", ["ptr", "int"]],
          ["=", "x", "2"],
          ["=", "p", ["&", "x"]],
          ["dec", "y", ["int"]],
          ["=", "y", ["*", "p"]]
         ])
    
    declare x: int;  declare p: *int;  x = 1;  p = &x;  x = *x
    
    main([["dec", "x", ["int"]],
          ["dec", "p", ["ptr", "int"]],
          ["=", "x", "1"],
          ["=", "p", ["&", "x"]],
          ["=", "x", ["*", "x"]]
         ])
    
    

  2. Write the parser for the syntax and combine it with the interpreter.

  3. Try every combination of pointers and ints you can think of to try to break the interpreter.

  4. Try this example:
    declare x : int ;
    declare y : * int ;
    declare z : * * int ;
    z = &y ;
    y = &x ;
    **z = 99
    
    What happens?

  5. Try this example:
    declare x: int ;
    x = 3 ;
    while x:  declare y: int; y = 1; x = (x - y); print x  end ;
    print x
    
    What happens? Is the declaration of y in the loop body legal in C, even when the loop repeats ? (Yes.) Repair the interpreter so that it is legal here, too.

  6. Modify the syntax of declarations so that you can declare and initialize a variable together:
    C ::= . . .  |  declare I : T = E
    
    Revise both the parser and interpreter.

  7. Add procedures to the language:
    C ::=  . . .  |  proc I : C
    
    Note: if you worked the previous exercise, you have basically worked this one, too, in this format:
    C ::=  . . .  |  declare I : proc = C
    
    Add procedure types to the type tags:
    T ::=  int  |  proc  |  * T
    
    Revise the interpreter so that a variable can point to procedure code:
    declare x : int ;
    proc p :  x = (x + 1) ;
    declare y : * proc ;
    y = &p ;
    call *y
    
    C lets you do this. (Have you heard of a ``function pointer''?)


2.2 An interpreter for an object-oriented language

The second important architecture for imperative languages is the object-oriented one. This architecture differs from the one for C because it replaces the vector memory by a heap, which is a collection of namespaces. Recall that a namespace (also called a dictionary) is a dynamic record that pairs fieldnames to values. We used a namespace in the C-architecture to pair together a program's variables with location numbers. Now, we use namespaces to model objects.

Here is an example object-oriented program that creates two objects:

x = 7;
y = new {f, g};
y.f = x;
y.g = new {r};
y.g.r = y.f;
The two global variables are x and y, the former an int and the latter an object with two fields, f and g. A second object is created and assigned to y's g field. Although this program lacks classes and methods, it is easy to guess what should be allocated in the heap. The virtual machine consists of a controller (the interpreter's functions), the heap, and a cell that remembers the address ("handle") of the namespace that holds the interpreted program's global variables:

The program's namespace (global variables) lives in the heap at location α, and the two objects are saved at β and γ. We use the term, ``handle'', to talk about the location of an object, e.g., y's handle is β.

Here is the syntax:

===================================================

P : Program
CL : CommandList
C : Command
E : Expression
F : FieldNames
L : LefthandSide
I : Variable
N : Numeral

P ::=  CL

CL ::= C  |  C ; CL

C ::=  L = E  |  if E : CL1 else CL2  end  |  print L  |

E ::=  N  |  ( E1 + E2 )  |  L  |  new { F }

F ::=  I  |  I , F

L ::=  I  |  L . I

N ::=  string of digits

I ::=  strings of letters, not including keywords

===================================================
The new constructions are new (allocates a new object) and field indexing, represented by .. In a later chapter, we add methods, classes, etc.

The operator trees are

===================================================

PTREE ::=  CLIST

CLIST ::=  [ CTREE+ ]
           where  CTREE+  means  one or more CTREEs

CTREE ::=  ["=", LTREE, ETREE]  |  ["if", ETREE, CLIST, CLIST]
        |  ["print", LTREE]

ETREE ::=  NUM  |  ["+", ETREE, ETREE] |  ["deref", LTREE]  |  ["new", OB ]

OB ::=  [ ID+ ]
            where ID+  means one or more IDs

LTREE ::=  [ ID+ ]

NUM   ::=  a nonempty string of digits

ID    ::=  a nonempty string of letters

===================================================
Study the forms of ETREE; they are important. Also, note that an LTREE is a sequence of variable/field names that must be traversed to find a location.

For example, the CTREEs for a = new {f,g}; a.f = 3; a.g = (1 + a.f) look like this:

[ ["=", ["a"], ["new", ["f","g"]]],
  ["=", ["a", "f"], "3"],
  ["=", ["a", "g"], ["+", "1", ["deref", ["a", "f"]]]]  ]

Here is the interpreter; it uses a heap and a var, ns, that remembers the handle to the namespace with the global variables: picture.

===================================================

"""Interpreter for object-oriented language.
"""
### THE HEAP:

heap = {}
heap_count = 0  # how many objects stored in the heap


"""The program's heap is a dictionary that maps handles to namespaces.
   An object is a namespace.

      heap : { (HANDLE : NAMESPACE)+ }
             where  HANDLE = a string of digits
                    NAMESPACE = a dictionary that maps var names to values
   Example:
     heap = { "0": {"x":7, "y":"1", "z":"2"},
              "1": {"f":"nil", "g":5}, 
              "2": {"r":12}  }
     heap_count = 3
        is an example heap, where handle "0" names a namespace
        whose  x  field holds int 7, "y" field holds handle "1"
        (a handle to the object at heaploc "1"), "z" holds handle "2".
       Heaploc "1" holds a two-field namespace;
       Heaploc "2" holds a namespace with a single field, "r"

   The above example heap was generated from this sample program:
        x = 7;
        y = new {f, g};
        y.g = 5;
        z = new {r};
        z.r = (y.g + x);
"""

### NAMESPACE (environment):  is the handle to the namespace in the heap
#                             that holds the program's global variables

ns = ""   # This will be initialized in the main function, interpretPTREE, as
          #    ns = allocateNS()       See below.


### MAINTENANCE FUNCTIONS FOR THE  heap:

def clearTheHeap():
    """resets the heap to be empty"""
    global heap_count, heap
    heap_count = 0
    heap = {}


def allocateNS() :
    """allocates a new, empty namespace in the heap and returns its handle"""
    global heap_count
    newloc = str(heap_count)
    heap[newloc] = {}
    heap_count = heap_count + 1
    return newloc


def dereference(lval) :
    """looks up the value of  lval  in the heap
       param: lval is a pair,  (handle,fieldname),

       returns: Say that  lval = (h,f).  The function extracts the
               object at  heap[h], indexes it with f,  and returns
               the value found there:   return  heap[h][f]
               For example, for the heap at the top of this program,
                 dereference(("0","x")) returns 7
                 dereference(("1","f")) returns "nil"
    """
    ans = "nil"
    loc,f = lval
    if isinstance(loc, str) and loc.isdigit() :  # a legal heaploc
        ans = heap[loc][f]
    else :
        crash("dereference error; lval =" + str(lval))
    return ans


def store(lval, rval) :
    """stores the  rval  into the heap at  lval
       params:  lval -- a pair,  (handle, field), as described above
                rval -- an int or a handle

       Say that  lval = (h,f).  The function finds the object at  h
         and saves  rval  at index  f  within that object:  heap[h][f] = rval.
       For example, for the heap at the top of this program,
                 store(("1","f"), 98)  would replace "nil"  by 98.
    """
    loc,qual = lval
    if isinstance(loc, str) and loc.isdigit() :  # a legal heaploc
        heap[loc][qual] = rval
    else : crash("store error; lval =" + str(lval))


############ Interpreter functions:

# See the end of program for the driver function,  interpretPTREE

def interpretCLIST(clist) :
    """pre: clist  is a program represented as a   CLIST ::=  [ CTREE+ ]
                  where  CTREE+  means  one or more CTREEs
       post:  memory  holds all the updates commanded by program  p
    """
    for command in clist :
        interpretCTREE(command)


def interpretCTREE(c) :
    """pre: c  is a command represented as a CTREE:
       CTREE ::=  ["=", LTREE, ETREE]  |  ["if", ETREE, CLIST, CLIST2] 
        |  ["print", LTREE] 
       post:  heap  holds the updates commanded by  c
    """
    operator = c[0]
    if operator == "=" :   # , ["=", LTREE, ETREE]
        lval = interpretLTREE(c[1])
        rval = interpretETREE(c[2])
        store(lval, rval)
    elif operator == "print" :   # ["print", LTREE]
        v = dereference(interpretLTREE(c[1]))
        print  v 
    elif operator == "if" :   # ["if", ETREE, CLIST1, CLIST2]
        test = interpretETREE(c[1])
        if test != 0 :
            interpretCLIST(c[2])
        else :
            interpretCLIST(c[3])
    else :   # error
        crash("invalid command")


def interpretETREE(etree) :
    """interpretETREE computes the meaning of an expression operator tree.
         ETREE ::=  NUM  |  ["+", ETREE, ETREE] |  ["deref", LTREE] 
                 |  ["new", OB ]
         OB = [ ID* ]
        post: updates the heap as needed and returns the  etree's value,
              either an int, a handle, or "nil"
    """
    ans = "nil"
    if isinstance(etree, str) and etree.isdigit() :  # NUM
      ans = int(etree) 
    elif  etree[0] == "+" :      # ["+", ETREE, ETREE]
        ans1 = interpretETREE(etree[1])
        ans2 = interpretETREE(etree[2])
        if isinstance(ans1,int) and isinstance(ans2, int) :
            ans = ans1 + ans2
        else : crash("addition error --- nonint value used")
    elif  etree[0] == "deref" :
        lval = interpretLTREE(etree[1])
        ans = dereference(lval)
    elif  etree[0] == "new" :
        handle = allocateNS()
        fields = etree[1]
        for f in fields :
            store((handle,f), "nil")
        ans = handle  
    else :  crash("invalid expression form")
    return ans


def interpretLTREE(ltree) :
    """interpretLTREE computes the meaning of a lefthandside operator tree.
          LTREE ::=  [ ID+ ]     where  +  means "one or more"
       Here are some example  ltree values:
         ["x"]                #  x
         ["x", "g"]           #  x.g
         ["x", "g", "h"]      #  x.g.h

       Returns a pair, (handle, field), where  handle is the location
         in the heap where an object/namespace is stored,
         and  field is an ID that indexes into the object.

       Look again at the example heap at the very top of this program. 
       For that heap, here is what the function returns:
          ["x"]        # returns  ("0","x")
          ["x", "f" ]  #  returns ("1", "f")
    """
    id = ltree[0]
    lval = (ns, id) # we start from the main ns
    indexlist = ltree[1:]
    while indexlist != [] :
        fieldname = indexlist[0]
        indexlist = indexlist[1:]
        handle = dereference(lval)
        lval = (handle, fieldname)
    return lval


def crash(message) :
    """pre: message is a string
       post: message is printed and interpreter stopped
    """
    print message + "! crash! ns=", ns, "heap=", heap
    raise Exception   # stops the interpreter


# MAIN FUNCTION ####

def interpretPTREE(tree) :
    """interprets a complete program tree
       pre: tree is a  PTREE ::= [ CLIST ]
       post: final values are deposited in memory and env
    """
    global ns
    # initialize heap and ns:
    clearTheHeap()
    ns = allocateNS()

    interpretCLIST(tree)
    print "final namespace =", ns
    print "final heap =", heap

    raw_input("Press Enter key to terminate")

===================================================
Exercise: Here are three sample programs. Use the interpreter on each and study the final configurations:
  1. x = 2;
    y = new {f, g};
    y.f = x;
    y.g = new {h};
    print y.f
    
    interpretPTREE( [ ["=", ["x"], "2"],
            ["=", ["y"], ["new", ["f","g"]]],
            ["=", ["y","f"], ["deref",["x"]]],
            ["=", ["y","g"], ["new", ["h"]]],
            ["print", ["y", "f"]]
          ]
        )
    
  2. x = 7;
    y = new {f,g};
    y.f = x;
    y.g = r;
    y.g.r = y.f
    
    interpretPTREE( [ ["=", ["x"], "7"],
            ["=", ["y"], ["new", ["f","g"]]],
            ["=", ["y","f"], ["deref",["x"]]],
            ["=", ["y","g"], ["new", ["r"]]],
            ["=", ["y","g","r"], ["deref",["y", "f"]]],
          ])
    
  3. y = new {f,g};
    x = y;
    y.h = 5;   # the object grows as needed!
    print y
    
    interpretPTREE( [ ["=", ["y"], ["new", ["f","g"]]],
            ["=", ["x"], ["deref", ["y"]]],
            ["=", ["y", "h"], "5"],
            ["print", ["y"]]
          ])
    
    

Exercise: Notice there is no trouble in reusing a global variable name as a field name:

x = 2;
y = new {x, y};
y.x = x;  y.y = y;  x = y.x
The indexing makes clear which variable should be consulted.
  1. Now, let's make the language look closer to an object language. Say that the fields in a new object can be initialized, like this: For example,
    x = 2; 
    y = new {a = x;  b = a}
    
    This says that y's field a is set to x's value, and field b is set to a's value.

    We will use this syntax for initialization commands:

    E ::=  . . .  |  new { F* }
           where  F*  means zero of more occurrences of  F,  separated by  ;
    F ::=  I = E
    
    Revise the parser and interpreter for these new constructions. How does your semantics operate on the example? (That is, does the semantics locate the value of x when it is initializing a? Does it locate the value of a when it is initializing b? Make certain that it does. Hint: in the interpreter, replace variable ns by a stack of handles to namespaces. Use the stack when computing the semantics of a LefthandSide.

  2. In most object-oriented languages, there is are pronouns, this and super, which can be used inside an object to refer to a local field and a nonlocal field. For example,
    x = 2;  y = new {x = super.x;  y = this.x}
    
    states that y's field x is initialized with the value of global variable x and field y is initialized with the value of field x in this object being constructed.

    Revise the syntax of Lefthandside to read:

    L ::=  I  |  L . I  |  this  |  super
    
    and modify parser and interpreter to implement the new constructions.

  3. If you have solved the previous two problems, you are ready for this one: Modify the syntax of new{F} to be new{CL}, that is:
    E ::=  N  |  ( E1 + E2 )  |  L  |  new { CL }
    
    That is, when a new object is allocated, the commands, CL, execute. Here is an example:
    x = 7;
    y = { if x :
            a = x
          else
            b = 0 end;
          c = x;
          print c
        }
    
    This code sets x to 7 and sets y to {a:7, c:7}. You see that we are not far from modern objects, which contain a mix of declarations and executable constructor code.