The previous chapter showed how to implement a language's syntax with a parser and the semantics with an interpreter. The former converts a sequence of textual words into an operator tree, which we represent as a nested list. The latter traverses the operator tree and updates internal data structures (e.g., the namespace).
The interpreter in the previous chapter is overly naive. In this chapter we study two more realistic virtual-machine architectures, both for assignment-based ("imperative") languages. One is for C and one is for Smalltalk. Here are the key features of each:
To understand the previous paragraph, we look at the
virtual machine for C.
The controller ("processor unit") is implemented by the interpreter functions, and there are two data structures: an environment ("symbol table") and linear memory.
Data is stored in the memory's cells, and the location numbers of the cells
are saved in the environment. Here is a sample configuration:
The symbol table maps each variable name to a location number in memory.
The memory holds ints in its cells.
The picture shows that variable x names location 0 in the memory;
y names location 1, and z names location 4.
In the memory itself, location 0 holds 8 and locations 3 and 4 hold 9.
Perhaps the configuration was constructed by these commands:
int x;
int[3] y;
int z;
x = 8; z = x + 1; y[2] = z;
The first three lines are declarations, which in C are storage allocation commands. (''Data types'' in C tell the C compiler how many storage cells to reserve in memory.) The locations of the
cells are saved in the environment. So, x names 0, and y names 1, which is the starting, or "base" address of a three-celled array.
z names 4.
A C-programmer must be conscious of a variable's meaning (its location number) and the dereferencing of the meaning (the number stored in the cell indexed by the location).
For example, x = 8 says ``store int 8 in the location that x means'', that is, store 8 in location 0.
But the meaning of x in the expression, x + 1, on the right-hand side of the assignment, z = x + 1, is not 0, but 8. When a variable appears in an expression, its meaning is its location dereferenced to extract the int stored in the corresponding memory cell. So, the meaning of z = x + 1 is ``store int 8+1 = 9 in location 4.''
The meaning of the left-hand side of an assignment in C is always a location. The meaning of y[2] in the last assignment, y[2] = z, is location 1 + 2 = 3 --- y's base address plus offset 2. This means location 3 receives int, 9, which was dereferenced from z's location, 4.
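The configuration described above can be replayed in a few lines of Python 3. This is only a sketch: the dictionary `env` and the list `memory` are stand-ins chosen for illustration, with the locations (x at 0, y's base at 1, z at 4) taken from the text.

```python
# Hypothetical model of the configuration: names and layout are assumptions
# that mirror the text (x -> 0, y -> base address 1, z -> 4).
env = {"x": 0, "y": 1, "z": 4}   # variable name -> location number
memory = [0] * 5                 # cells for x, y[0], y[1], y[2], z

# x = 8 : the left-hand x means its location, 0
memory[env["x"]] = 8
# z = x + 1 : the right-hand x is dereferenced to the int in cell 0
memory[env["z"]] = memory[env["x"]] + 1
# y[2] = z : the left-hand side means base address 1 plus offset 2
memory[env["y"] + 2] = memory[env["z"]]

print(memory)   # [8, 0, 0, 9, 9]
```

Location 0 ends up holding 8 and locations 3 and 4 hold 9, which agrees with the sample configuration.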
C allows a programmer to write code about locations and dereferencing via the operators & and *: & means "the location of" and * means "do a dereference of." These two operators are used with "pointer variables". Say that we extend the above example with these lines of C-code:
int* p = &x;    // pointer p's meaning is location 5,
                //   whose cell receives 0
y[0] = p;       // location 1's cell receives 0,
                //   the value from dereferencing location 5
y[1] = *p;      // location 2's cell receives 8, the value found by dereferencing
                //   location 5 and then dereferencing the result, location 0
y[2] = &p + 1;  // location 3 holds 6, which is p's location plus one (!)
C lets you use locations the same way you use ints. You can even do arithmetic with them. They are used a lot in systems programming, and it is critical that you understand them.
===================================================
P : Program
CL : CommandList        L : LefthandSide
C : Command             I : Variable
E : Expression          N : Numeral

P ::= CL
CL ::= C | C ; CL
C ::= L = E | while E : CL end | print L
E ::= N | ( E1 + E2 ) | L | & L
L ::= I | * L
N ::= string of digits
I ::= strings of letters, not including the keywords, while, print, end
===================================================
There is a new syntax domain, LefthandSide, which names the phrases that can appear on the left-hand side of an assignment. The semantics of a lefthand-side phrase, L, is a location number. The semantics of an expression, E, is an int (or a location number).
Here is a sample configuration:
Perhaps the configuration was constructed by these commands:
y = 5; z = 0; x = (6 + y);
Since there are no declarations in our little language, the environment is
updated and memory cells
are allocated when new variables are first mentioned. (See the Exercise at
the end of this section
for declarations.)
This is the crucial feature of our C-like language:
Each variable has a location number, and each location number labels a memory cell that holds an int. For this assignment,
x = x + 1;
The meaning of x on the left-hand side of the assignment is location 2 ---
call this x's L-value. The meaning of x on the right-hand side of
the assignment is 11 (the int stored in location 2). Call this
x's R-value. Thus, location 2's cell will be updated to 12.
Now,
return
to the above picture. That
same environment-memory layout is
constructed by these commands,
which manipulate location numbers as well as ints:
===================================================
y = 4;
z = &y; # store in z's cell the location named by y
x = (7 + *z);  # add 7 to the int stored at the location stored in z's location
               # and store the sum in the location named by x (whew!)
*z = (y + 1)   # add 1 to the int in y's cell and store the sum at the location saved in
               # z's location
===================================================
The first command works as usual, but the second stores
y's location (0) into z's location;
read &y as ``y's location.'' Some people say that
``z points to y.''
The third command adds 7 to the integer found at location 0 (that is, 4) and stores the sum, 11, at x's location, 2.
The final command computes y + 1, that is, 5, and stores it at location 0, because z names location 1 and location 1 (that is, *z) holds 0.
The syntax of a C-assignment is expressed like this:
L = E
where both L and E can be compound phrases.
The meaning of the Lefthandside L-phrase is called an L-value;
the meaning of the Expression E-phrase is called an R-value.
In our version of C, an L-value is always a location, and an R-value
is always an int (locations are ints in C!).
For example, for
z = &y
The L-value is location 1 and the R-value is (int/location) 0.
For
*z = (y + 1)
the L-value is location 0 and the R-value is int 5.
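The L-value and R-value of each of the four commands can be computed by hand. The following Python 3 sketch does exactly that; `env`, `memory`, and the helper `rvalue` are illustrative names, and the layout (y at 0, z at 1, x at 2) is assumed from the picture discussed above.

```python
env = {"y": 0, "z": 1, "x": 2}   # variable name -> location (its L-value)
memory = [0, 0, 0]

def rvalue(location):
    """Dereference a location: fetch the int stored in its cell."""
    return memory[location]

# y = 4          L-value: location 0    R-value: 4
memory[env["y"]] = 4
# z = &y         L-value: location 1    R-value: 0 (y's location, NOT dereferenced)
memory[env["z"]] = env["y"]
# x = (7 + *z)   L-value: location 2    R-value: 7 + 4 = 11
memory[env["x"]] = 7 + rvalue(rvalue(env["z"]))
# *z = (y + 1)   L-value: location 0    R-value: 4 + 1 = 5
memory[rvalue(env["z"])] = rvalue(env["y"]) + 1

print(memory)   # [5, 0, 11]
```

Note that `&y` uses `env["y"]` directly, while a plain `y` on the right-hand side needs one call to `rvalue` and `*z` needs two.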
We study the interpreter to understand & and *.
Here are the operator trees the interpreter uses;
they are the same as in
the last chapter, plus the two new forms:
===================================================
PTREE ::= CLIST
CLIST ::= [ CTREE+ ]
where CTREE+ means one or more CTREEs
CTREE ::= ["=", LTREE, ETREE] | ["while", ETREE, CLIST] | ["print", LTREE]
ETREE ::= NUM | ["+", ETREE, ETREE] | ["&", LTREE] | LTREE
LTREE ::= VAR | ["*", LTREE]
NUM ::= string of digits
VAR ::= string of letters but not "while" or "print" or "end"
===================================================
For example, the PTREE for the earlier example,
y = 4; z = &y; x = (7 + *z); *z = (y + 1), looks like this:
[["=", "y", "4"],
 ["=", "z", ["&", "y"]],
 ["=", "x", ["+", "7", ["*", "z"]]],
 ["=", ["*", "z"], ["+", "y", "1"]]
]
Here is the coded interpreter: the entire virtual machine, controller plus data structures. Read the comments at the top and then read the coding of interpretLTREE, which computes the meaning of an assignment's left-hand side, namely a location number.
Next, study interpretETREE to see how ["&", LTREE]
computes to a meaning different from just LTREE. This is
crucial, because the former computes to a location number and the
latter computes to the number stored at the location number.
It may look strange, but that's how it's done in C!
===================================================
"""Interpreter for a C-like mini-language with pointers.
There are two crucial data structures:
"""
memory = [] # a global variable that models primary storage.
# It is a list (array), where the indexes to the array are
# meant to model addresses. For example, if
# memory = [5,0,11], then location 2 of memory holds 11.
env = {} # a global variable that holds the program's variable names
# and the locations that each denotes.
# It is a Python hash table (dictionary). For example, if
# env = {'x':2, 'y':0, 'z':1}, then variable x names location 2
# in memory, and memory[2] == 11. This is how computer
# storage is used in an implementation of C.
def interpretPTREE(program) :
    """pre:  program is a program represented as a CLIST
       post: memory holds all the updates commanded by program
    """
    global memory, env       # both are module-level globals
    memory = [];  env = {}   # reset them both
    try :
        interpretCLIST(program)
    except Exception :
        print("Due to error, interpreter must quit prematurely. Sorry.")
    print("final env =", env)
    print("final memory =", memory)

def interpretCLIST(p) :
    """pre:  p is a program represented as a CLIST ::= [ CTREE+ ]
             where CTREE+ means one or more CTREEs
       post: memory holds all the updates commanded by program p
    """
    for command in p :
        interpretCTREE(command)

def interpretLTREE(x) :
    """pre:  x is an L-value represented as an LTREE:
               LTREE ::= VAR | ["*", LTREE]
       post: ans holds the location named by x
       returns: ans
    """
    if isinstance(x, str) :   # a VAR ?
        if x in env :
            ans = env[x]      # look up its location
        else :   # it's a brand new var, so allocate a memory cell for it:
            memory.append(0)           # add a cell at the end of memory
            env[x] = len(memory) - 1   # remember the location
            ans = env[x]
    else :   # a pointer dereference, ["*", LTREE]
        y = interpretLTREE(x[1])   # get value of LTREE, a location number
        ans = memory[y]            # dereference it and return the location therein
    return ans

def interpretCTREE(c) :
    """pre:  c is a command represented as a CTREE:
               CTREE ::= ["=", LTREE, ETREE] | ["print", LTREE] | ["while", ETREE, CLIST]
       post: memory holds all the updates commanded by c
    """
    operator = c[0]
    if operator == "=" :   # assignment command, ["=", LTREE, ETREE]
        lval = interpretLTREE(c[1])
        exprval = interpretETREE(c[2])   # evaluate the right-hand side
        memory[lval] = exprval           # do the assignment
    elif operator == "print" :   # print command, ["print", LTREE]
        num = interpretETREE(c[1])
        print(num)
    elif operator == "while" :   # while command
        expr = c[1]
        body = c[2]
        while (interpretETREE(expr) != 0) :
            interpretCLIST(body)
    else :   # error
        crash("invalid command")

def interpretETREE(e) :
    """pre:  e is an expression represented as an ETREE:
               ETREE ::= NUMERAL | LTREE | ["&", LTREE] | [OP, ETREE1, ETREE2]
               where OP is either "+" or "-"
       post: ans holds the value of e
       returns: ans
    """
    if isinstance(e, str) and e.isdigit() :   # a NUMERAL
        ans = int(e)
    elif isinstance(e, list) and (e[0] == "+" or e[0] == "-") :   # [OP, E1, E2]
        op = e[0]
        ans1 = interpretETREE(e[1])
        ans2 = interpretETREE(e[2])
        if op == "+" :
            ans = ans1 + ans2
        elif op == "-" :
            ans = ans1 - ans2
        else :
            crash("illegal arithmetic operator")
    elif isinstance(e, list) and e[0] == "&" :   # ["&", LTREE]
        ans = interpretLTREE(e[1])   # return the location named by the LTREE
                                     # DO NOT DEREFERENCE !
    else :   # is an LTREE that MUST BE DEREFERENCED:
        x = interpretLTREE(e)
        ans = memory[x]
    return ans

def crash(message) :
    """pre:  message is a string
       post: message is printed and interpreter stopped
    """
    print(message + "! crash! core dump: ", memory)
    raise Exception   # stops the interpreter
===================================================
Exercise:
Test the interpreter with these programs:
x = 3
interpretPTREE([["=", "x", "3"]])
x = 2; p = &x; y = *p
interpretPTREE([["=", "x", "2"],
["=", "p", ["&", "x"]],
["=", "y", ["*", "p"]]
])
x = 1; p = &x; *p = 2
interpretPTREE([["=", "x", "1"],
["=", "p", ["&", "x"]],
["=", ["*", "p"], "2"]
])
x = 0; y = (6 + &x); *x = 999; print y
interpretPTREE([["=", "x", "0"],
["=", "y", ["+", "6", ["&", "x"]]],
["=", ["*", "x"], "999"],
["print", "y"]
])
Exercise:
In C, the primary purpose of a declaration is to allocate memory.
In particular, arrays are laid out in memory
as a sequence of cells.
Say we add int and array declarations, e.g.,
int x;
int[4] r;
x = 2;
r[x] = x + 1;
int y;
would allocate one memory cell to hold an int, four memory cells to hold
a linear array, and one cell for another int.
The resulting environment and memory are
env = {"x": 0, "r": (1,4), "y": 5}
memory = [2, 0, 0, 3, 0, 0]
Note that array r has a location and a length.
(Actually, x has a location and a length, too: (0,1).
So does y: (5,1).)
Implement the syntax and semantics of simple int and array declarations:
D : Declaration
C : Command
D ::= int I | int [ N ] I
C ::= D | L = E | ...
L ::= I | I [ E ] | * L
E ::= & L | N | ...
Should your interpreter check to see if the index, E, in
I[E] is "in bounds"? (You can switch on or off this feature in a
C compiler.)
int y; int* z; y = 5; z = &y; int x = 6 + *z;
The declaration for z indicates that z should be used only as a pointer, that is, its R-value is a location number. The following program will generate a warning/error from the C compiler, which announces that y is misused in the assignment:
int x = 0 ; int y; y = &x; // should not assign a location number to an int var
Although it is not so common, it is legal to declare a variable to be a pointer to a pointer to ...:
int* y; int** z = &y;
The levels of pointers will make the interpreter work hard to check consistency of expressions and assignments. Here is the language with declarations:
===================================================
P ::= CL
CL ::= C | C ; CL
C ::= L = E | while E : CL end | print L | T I
T ::= int | T *
E ::= N | ( E1 + E2 ) | L | & L
L ::= I | * L
N ::= string of digits
I ::= strings of letters, not including the keywords, while, print, end

PTREE ::= [ CTREE+ ]
where CTREE+ means one or more CTREEs
CTREE ::= ["=", LTREE, ETREE] | ["while", ETREE, CLIST] | ["print", VAR] | ["declare", VAR, TYPE]
TYPE ::= a list holding a sequence of "ptr" and "int" strings,
         e.g., ["int"], ["ptr", "int"], ["ptr", "ptr", "int"], etc.
ETREE ::= NUM | ["+", ETREE, ETREE] | ["&", LTREE] | LTREE
LTREE ::= VAR | ["*", LTREE]
NUM ::= string of digits
VAR ::= string of letters but not "while" or "print" or "end" or "declare" or "int"
===================================================
The interpreter remembers the
type tags in
the environment. The tags are checked every time a variable is used.
Here is a picture of the configuration generated by the first
example program in this section:
Since z is a pointer variable, its type tag, int*,
is saved internally as ["ptr", "int"].
Now, read the interpreter's code to see how the declarations
create type tags and how interpretCTREE uses them to check the correctness
of assignments and how interpretETREE uses them to check correctness
of additions:
===================================================
"""Interpreter for a C-like mini-language with variables, pointers,
declarations and loops. There are two crucial data structures:
"""
memory = [] # a global variable that models primary storage.
# It is a list (array), where the indexes to the array are
# meant to model addresses. For example, if
# memory = [5,0], then location 0 of memory holds 5.
env = {} # a global variable that holds the program's variable names
# and the type,location that each denotes.
# It is a Python hash table (dictionary). For example, if
# env = {'x': (["int"], 0), 'p': (["ptr", "int"], 1)},
# then variable x names location 0, which holds an int,
# and p names location 1, which holds a pointer to an int;
# that is, location 1 holds a location that holds an int.
def interpretPTREE(p) :
    """pre:  p is a program represented as a CLIST ::= [ CTREE+ ]
             where CTREE+ means one or more CTREEs
       post: memory holds all the updates commanded by program p
    """
    global memory, env   # both are module-level globals
    memory = []          # reset them both
    env = {}
    interpretCLIST(p)
    print("final env =", env)
    print("final memory =", memory)

def interpretCLIST(p) :
    """pre:  p is a program represented as a CLIST ::= [ CTREE+ ]
             where CTREE+ means one or more CTREEs
       post: memory holds all the updates commanded by program p
    """
    for command in p :
        interpretCTREE(command)

def interpretLTREE(x) :
    """pre:  x is an L-value represented as an LTREE:
               LTREE ::= VAR | ["*", LTREE]
       returns: datatype,location pair named by x
    """
    if isinstance(x, str) :   # a VAR ?
        if x in env :
            ans = env[x]      # look up its type and location
        else :   # undeclared
            crash("variable " + x + " undeclared")
    else :   # a pointer dereference, ["*", LTREE]
        datatype, loc = interpretLTREE(x[1])
        if datatype[0] == "ptr" :
            ans = (datatype[1:], memory[loc])   # dereference it
        else :
            crash("variable not a pointer")
    return ans

def interpretCTREE(c) :
    """pre:  c is a command represented as a CTREE:
               CTREE ::= ["=", LTREE, ETREE] | ["print", LTREE]
                      | ["while", ETREE, CLIST] | ["declare", VAR, TYPE]
               where TYPE ::= [ "ptr"*, "int" ]
               and "ptr"* means zero or more occurrences of "ptr"
               separated by commas
       post: memory holds all the updates commanded by c
    """
    operator = c[0]
    if operator == "=" :   # assignment command, ["=", LTREE, ETREE]
        type1, lval = interpretLTREE(c[1])
        type2, exprval = interpretETREE(c[2])
        if type1 == type2 :
            memory[lval] = exprval   # do the assignment
        else :
            crash("incompatible types for assignment")
    elif operator == "print" :   # print command, ["print", LTREE]
        datatype, num = interpretETREE(c[1])
        print(num)
    elif operator == "while" :   # while command
        expr = c[1]
        body = c[2]
        datatype, val = interpretETREE(expr)
        while (val > 0) :
            interpretCLIST(body)
            datatype, val = interpretETREE(expr)   # re-evaluate the test after each pass
    elif operator == "declare" :   # declaration
        x = c[1]
        if x in env :
            crash("variable " + x + " redeclared")
        else :
            memory.append("err")               # add a cell at the end of memory
            env[x] = (c[2], len(memory) - 1)   # save type and location for x
    else :   # error
        crash("invalid command")

def interpretETREE(e) :
    """pre:  e is an expression represented as an ETREE:
               ETREE ::= NUMERAL | [OP, ETREE1, ETREE2] | ["&", LTREE] | LTREE
               where OP is either "+" or "-"
       post: ans holds the datatype and value of e
       returns: ans, a datatype,value pair
    """
    if isinstance(e, str) and e.isdigit() :   # a NUMERAL
        ans = (["int"], int(e))
    elif isinstance(e, list) and (e[0] == "+" or e[0] == "-") :   # [OP, E1, E2]
        op = e[0]
        type1, ans1 = interpretETREE(e[1])
        type2, ans2 = interpretETREE(e[2])
        if type1 == ["int"] and type2 == ["int"] :
            if op == "+" :
                ans = (["int"], ans1 + ans2)
            elif op == "-" :
                ans = (["int"], ans1 - ans2)
            else :
                crash("illegal arithmetic operator")
        else :
            crash("cannot do arithmetic on non-ints")
    elif isinstance(e, list) and e[0] == "&" :   # ["&", LTREE]
        type0, val0 = interpretLTREE(e[1])
        ans = (["ptr"] + type0, val0)
    else :   # is an LTREE that must be dereferenced:
        type0, loc0 = interpretLTREE(e)
        ans = (type0, memory[loc0])
    return ans

def crash(message) :
    """pre:  message is a string
       post: message is printed and interpreter stopped
    """
    print(message + "! crash! core dump: ", memory)
    raise Exception   # stops the interpreter
===================================================
When you read interpretETREE, you see that the meaning of
an expression must be a pair: a data type and a value.
For example, 3 computes to (["int"], 3).
The type tag ensures that the 3 is assigned only to a variable
whose type is ["int"].
A pointer to an int will have type ["ptr", "int"]. For example,
if the declaration int x
allocates location 0 for x, then
["&", "x"] evaluates to (["ptr", "int"], 0).
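The bookkeeping on type tags can be isolated from the rest of the interpreter. Here is a small Python 3 sketch; the function names `address_of`, `dereference`, and `assignable` are hypothetical helpers introduced only for this illustration, not part of the interpreter above.

```python
def address_of(tag):
    """Type tag of &L, given the type tag of L: one more level of pointer."""
    return ["ptr"] + tag

def dereference(tag):
    """Type tag of *L, given the type tag of L: one level of pointer removed."""
    if tag[0] != "ptr":
        raise TypeError("cannot dereference a non-pointer: " + str(tag))
    return tag[1:]

def assignable(lhs_tag, rhs_tag):
    """An assignment L = E is accepted only when the two tags are identical."""
    return lhs_tag == rhs_tag

int_tag = ["int"]
print(address_of(int_tag))                        # ['ptr', 'int']
print(dereference(["ptr", "ptr", "int"]))         # ['ptr', 'int']
print(assignable(["int"], address_of(int_tag)))   # False: y = &x with int y is rejected
```

The last line corresponds to the earlier faulty program that assigns &x to an int variable: the tags ["int"] and ["ptr", "int"] differ, so the assignment is refused.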
Exercises:
int x; x = 3; int y; y = (x + 1)
interpretPTREE([["declare", "x", ["int"]],
                ["=", "x", "3"],
                ["declare", "y", ["int"]],
                ["=", "y", ["+", "x", "1"]]
               ])
int x; int* p; x = 2; p = &x; int y; y = *p
interpretPTREE([["declare", "x", ["int"]],
                ["declare", "p", ["ptr", "int"]],
                ["=", "x", "2"],
                ["=", "p", ["&", "x"]],
                ["declare", "y", ["int"]],
                ["=", "y", ["*", "p"]]
               ])
int x; int* p; x = 1; p = &x; x = *x
interpretPTREE([["declare", "x", ["int"]],
                ["declare", "p", ["ptr", "int"]],
                ["=", "x", "1"],
                ["=", "p", ["&", "x"]],
                ["=", "x", ["*", "x"]]
               ])
C ::= ... | declare I : T
T ::= int | * T
(This is easier to parse.)
int x ; int* y ; int** z ; z = &y ; y = &x ; **z = 99
What happens?
int x; x = 3 ; while x: int y; y = 1; x = (x - y); print x end ; print x
What happens? Is the declaration of y in the loop body legal in C, even when the loop repeats? (Yes.) Repair the interpreter so that it is legal here, too.
C ::= . . . | declare I : T = E
Revise both the parser and interpreter.
C ::= . . . | proc I : C
Note: if you worked the previous exercise, you have basically worked this one, too, in this format:
C ::= . . . | declare I : proc = C
Add procedure types to the type tags:
T ::= int | proc | * T
Revise the interpreter so that a variable can point to procedure code:
declare x : int ; proc p : x = (x + 1) ; declare y : * proc ; y = &p ; call *y
C lets you do this. (Have you heard of a ``function pointer''?)
The translate functions traverse the operator tree and emit target-code instructions that state the actions meant by the tree. The translate functions construct the symbol table (environment) so that the data type and memory location of each program variable is known while the target code is generated. This means the translate functions can check all data-type constraints and can replace all variable names by location numbers. As a result, the executable target code contains no types, no names --- only loads and stores with memory locations.
For example, the C program,
int y;
int* z;
int x;
y = 5;
z = &y;
x = (6 + *z)
generates the symbol table shown above and translates into this target code:
malloc 3 # allocate cells for y,z,x
loadaddr 0 # addr of y
loadconst 5
store # y = 5
loadaddr 1 # addr of z
loadaddr 0 # addr of y
store # z = &y
loadaddr 2 # addr of x
loadconst 6
loadaddr 1 # addr of z
load # z
load # *z
add
store # x = (6 + *z)
When the CPU reads and performs the instructions, it updates memory as
shown in the diagram above.
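To check the target code by hand, it helps to simulate the CPU. The Python 3 sketch below is a toy stack machine, and its instruction semantics are assumptions inferred from the listing: loadaddr and loadconst push a number, load pops a location and pushes the contents of that cell, store pops a value and then a location, and add pops two ints and pushes their sum. The names `run` and `LISTING` are illustrative only.

```python
LISTING = """
malloc 3      # allocate cells for y,z,x
loadaddr 0    # addr of y
loadconst 5
store         # y = 5
loadaddr 1    # addr of z
loadaddr 0    # addr of y
store         # z = &y
loadaddr 2    # addr of x
loadconst 6
loadaddr 1    # addr of z
load          # z
load          # *z
add
store         # x = (6 + *z)
"""

def run(target_code):
    """Executes the instructions one by one and returns the final memory."""
    memory, stack = [], []
    for line in target_code.splitlines():
        instr = line.split("#")[0].split()   # drop the comment, split into words
        if not instr:
            continue                         # blank line
        op = instr[0]
        if op == "malloc":
            memory.extend([0] * int(instr[1]))
        elif op in ("loadaddr", "loadconst"):
            stack.append(int(instr[1]))
        elif op == "load":
            stack.append(memory[stack.pop()])
        elif op == "store":
            value = stack.pop()
            location = stack.pop()
            memory[location] = value
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError("unknown instruction: " + op)
    return memory

print(run(LISTING))   # [5, 0, 11]
```

The final memory holds 5 for y, location 0 for z, and 11 for x, with no variable names or type tags left at run time.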
Earlier in the chapter, we saw how a compiler is made from an interpreter
by replacing the interpreter's computation actions on memory by code-generating
instructions. Since the C-interpreter already has the operator-tree
traversal strategy and already has the declaration processing and type-checking
actions, we can convert it into a compiler by replacing all the interpreter's
actions on memory by target-code generations:
===================================================
def translateETREE(e) :
    """pre:  e is an expression represented as an ETREE:
               ETREE ::= NUMERAL | [OP, ETREE1, ETREE2] | ["&", LTREE] | LTREE
               where OP is either "+" or "-"
       returns: ans, a datatype,code pair
       post: code is a string, the target code for computing e's meaning
    """
    if isinstance(e, str) and e.isdigit() :   # a NUMERAL
        ans = (["int"], "loadconst " + e + "\n")   # replaces int(e)
    elif isinstance(e, list) and (e[0] == "+" or e[0] == "-") :   # [OP, E1, E2]
        op = e[0]
        type1, code1 = translateETREE(e[1])
        type2, code2 = translateETREE(e[2])
        if type1 == ["int"] and type2 == ["int"] :
            if op == "+" :
                ans = (["int"], code1 + code2 + "add\n")   # replaces ans1 + ans2
            elif op == "-" :
                ans = (["int"], code1 + code2 + "sub\n")   # replaces ans1 - ans2
            else :
                crash("illegal arithmetic operator")
        else :
            crash("cannot do arithmetic on non-ints")
    elif isinstance(e, list) and e[0] == "&" :   # ["&", LTREE]
        type0, code0 = translateLTREE(e[1])
        ans = (["ptr"] + type0, code0)
    else :   # is an LTREE that must be dereferenced:
        type0, code0 = translateLTREE(e)
        ans = (type0, code0 + "load\n")   # replaces memory[loc0]
    return ans

def translateCTREE(c) :
    """pre:  c is a command represented as a CTREE:
               CTREE ::= ["=", LTREE, ETREE] | ["print", LTREE]
                      | ["while", ETREE, CLIST] | ["declare", VAR, TYPE]
               where TYPE ::= [ "ptr"*, "int" ]
               and "ptr"* means zero or more occurrences of "ptr"
               separated by commas
       post: returns code, a string that holds the target code that does
             the actions commanded by c
    """
    operator = c[0]
    if operator == "=" :   # assignment command, ["=", LTREE, ETREE]
        type1, lcode = translateLTREE(c[1])
        type2, ecode = translateETREE(c[2])
        if type1 == type2 :
            code = lcode + ecode + "store\n"   # replaces memory[lval] = exprval
        else :
            crash("incompatible types for assignment")
    elif operator == . . .
        . . .
    return code

. . .
===================================================
This example shows that even when a language is meant to have a compiler as its standard implementation, an interpreter should be built first, because the interpreter
Here is a baby object-oriented program that creates two objects:
x = 7;
y = new {f, g}; // constructs a new object with two fields
y.f = x;
y.g = new {x};
y.g.x = 1 + y.f;
The two global variables are x and y, the former an int and the
latter an object with two fields, f and g. A second object,
holding variable x,
is created and assigned to y's g field.
Although this program
lacks classes and methods, it is easy to guess what should be allocated in the
heap. The virtual machine consists of a controller (the interpreter's functions), the heap,
and a cell that remembers the address ("handle") of the namespace that holds the program's global variables:
The program has its own namespace/object for global variables,
at handle α,
and the two objects are saved at handles
β and γ. We use the term, ``handle'', to talk about the
location of an object, e.g., y's handle is β. (Also, you cannot
do arithmetic with handles, unlike in C!)
Assignment in an object language works with handles and variables, like this:
For
x = 7;
the meaning of the left-hand side, here x, is the base-offset pair, (α, 'x'). That is, the ``coordinates'' for
storing 7 in the heap lead to object α, index x.
For
y.f = x;
the meaning of the left-hand side is calculated in stages:
In an object language, the "dot" dereferences a base-offset pair into a handle and uses the handle to make a base-offset pair. There is no other "arithmetic" on handles!
We must understand how the base-offset pairs are constructed and used.
Here is the syntax of the baby object language:
===================================================
P : Program
CL : CommandList L : LefthandSide
C : Command I : Variable
E : Expression N : Numeral
P ::= CL
CL ::= C | C ; CL
C ::= L = E | if E : CL1 else CL2 end | print L
E ::= N | ( E1 + E2 ) | L | new { I,* }
where I,* means zero or more I-phrases, separated by commas
L ::= I | L . I
N ::= string of digits
I ::= strings of letters, not including keywords
===================================================
The new constructions are new (allocates a new object)
and field indexing, represented by a dot (.).
In a later chapter, we add methods, classes, etc.
Once again, an assignment has the form, L = E. But the L-values and R-values used in an object language are markedly different from those used in C:
For y.f = x, the L-value is (β,f) and the R-value is 7 (because the value saved at position x in object α is 7). This means 7 will be stored in the field f of the object labelled by β.
For y.g.x = y, the L-value is (γ, x) (because y.g has R-value γ!), and the assignment's R-value is β (because β is stored at position (α, y)). This means β will be stored in the field x of the object labelled by γ. (It also means that y and y.g.x are aliases to one and the same object.)
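The way base-offset pairs are built can be tried out on the baby program from the start of this section (x = 7; y = new {f, g}; y.f = x; y.g = new {x}; y.g.x = 1 + y.f). The Python 3 sketch below is a hand-rolled model, not the interpreter itself: the handles "h0", "h1", "h2" stand for α, β, γ, and the helpers `lvalue`, `rvalue`, and `assign` are names assumed for illustration.

```python
heap = {"h0": {}}   # "h0" plays the role of alpha, the global namespace
ns = "h0"           # handle of the namespace holding the global variables

def lvalue(path):
    """Turns a dotted name, given as a list such as ["y", "g", "x"],
       into a base-offset pair (handle, field)."""
    handle = ns
    for name in path[:-1]:
        handle = heap[handle][name]   # the "dot": dereference, then use as the new base
    return (handle, path[-1])

def rvalue(path):
    handle, field = lvalue(path)
    return heap[handle][field]

def assign(path, value):
    handle, field = lvalue(path)
    heap[handle][field] = value

assign(["x"], 7)                              # x = 7
heap["h1"] = {"f": "nil", "g": "nil"}         # new {f, g}
assign(["y"], "h1")                           # y = (that new object)
assign(["y", "f"], rvalue(["x"]))             # y.f = x
heap["h2"] = {"x": "nil"}                     # new {x}
assign(["y", "g"], "h2")                      # y.g = (that new object)
assign(["y", "g", "x"], 1 + rvalue(["y", "f"]))   # y.g.x = 1 + y.f

print(lvalue(["y", "f"]))        # ('h1', 'f')
print(lvalue(["y", "g", "x"]))   # ('h2', 'x')
print(heap["h2"])                # {'x': 8}
```

Every step of `lvalue` is a lookup in the heap followed by the construction of a new pair; at no point is a handle added to or compared with a number.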
The operator trees for the language are
===================================================
PTREE ::= CLIST
CLIST ::= [ CTREE+ ]
where CTREE+ means one or more CTREEs
CTREE ::= ["=", LTREE, ETREE] | ["if", ETREE, CLIST, CLIST]
| ["print", LTREE]
ETREE ::= NUM | ["+", ETREE, ETREE] | ["deref", LTREE] | ["new", OB ]
OB ::= [ ID* ]
where ID* means zero or more IDs
LTREE ::= ID | ["dot", LTREE, ID]
NUM ::= a nonempty string of digits
ID ::= a nonempty string of letters
===================================================
Study the forms of ETREE; they are important.
Also, note that an LTREE
is a sequence of variable/field names that must be traversed
to find a location.
For example, the CTREEs for
a = new {f,g}; a.f = 3; a.g = (1 + a.f) look like this:
[ ["=", "a", ["new", ["f","g"]]],
["=", ["dot", "a", "f"], "3"],
["=", ["dot", "a", "g"], ["+", "1", ["deref", ["dot", "a", "f"]]]] ]
Here is the interpreter; it uses a heap and a var, ns, that remembers
the handle to the namespace with the global variables.
The heap is grouped with a collection of maintenance functions, and
if the code were any larger, the heap and its functions should be moved into
a module by themselves.
===================================================
"""Interpreter for object-oriented language:
PTREE ::= CLIST
CLIST ::= [ CTREE+ ]
where CTREE+ means one or more CTREEs
CTREE ::= ["=", LTREE, ETREE] | ["if", ETREE, CLIST, CLIST]
| ["print", LTREE]
ETREE ::= NUM | ["+", ETREE, ETREE] | ["deref", LTREE] | ["new", OB ]
OB ::= [ ID* ]
where ID* means zero or more IDs
LTREE ::= ID | ["dot", LTREE, ID]
NUM ::= a nonempty string of digits
ID ::= a nonempty string of letters
"""
### THE HEAP:
heap = {}
heap_count = 0   # how many objects stored in the heap
"""The program's heap is a dictionary that maps handles to namespaces.
   An object is a namespace.

      heap : { (HANDLE : NAMESPACE)+ }
        where HANDLE = a string of form, "hn", where n is a numeral
              NAMESPACE = a dictionary that maps var names to values

   Example:
      heap = { "h0": {"x":7, "y":"h1", "z":"h2"},
               "h1": {"f":"nil", "g":5, "h":"h2"},
               "h2": {"x":12} }
      heap_count = 3
   is an example heap, where handle "h0" names a namespace
   whose "x" field holds int 7, "y" field holds handle "h1"
   (a handle to the object at heap location "h1"), and "z" field holds handle "h2".
   "h1" labels a three-field namespace, and
   "h2" labels a namespace with a single field, "x".

   The above example heap was generated from this sample program:
      x = 7;
      y = new {f, g, h};
      y.g = 5;
      z = new {x};
      z.x = (y.g + x);
      y.h = z
"""
# Maintenance functions for the heap:
def clearTheHeap() :
    """resets the heap to be empty"""
    global heap_count, heap
    heap_count = 0
    heap = {}

def allocateNS() :
    """allocates a new, empty namespace in the heap and returns its handle"""
    global heap_count
    newhandle = "h" + str(heap_count)
    heap[newhandle] = {}
    heap_count = heap_count + 1
    return newhandle

def lookup(lval) :
    """looks up the value of lval in the heap
       param: lval is a pair, (handle,fieldname),
       returns: Say that lval = (h,f). The function extracts the
                object at heap[h], indexes it with f, and returns
                the value found there: return heap[h][f]
       For example, for the heap at the top of this program,
          lookup(("h0","x")) returns 7
          lookup(("h1","f")) returns "nil"
    """
    ans = "nil"
    handle, fieldname = lval
    if handle in heap and fieldname in heap[handle] :   # a legal heaploc
        ans = heap[handle][fieldname]
    else :
        crash("dereference error; lval =" + str(lval))
    return ans

def store(lval, rval) :
    """stores the rval into the heap at lval
       params: lval -- a pair, (handle, fieldname), as described above
               rval -- an int or a handle
       Say that lval = (h,f). The function finds the object at h
       and saves rval at index f within that object: heap[h][f] = rval.
       For example, for the heap at the top of this program,
          store(("h1","f"), 98) would replace "nil" by 98.
    """
    handle, fieldname = lval
    if handle in heap :   # a legal heaploc
        heap[handle][fieldname] = rval
    else :
        crash("store error; lval =" + str(lval))
### NAMESPACE (environment): is the handle to the namespace in the heap
# that holds the program's global variables
ns = "nil" # This will be initialized in the main function, interpretPTREE, as
# ns = allocateNS() See below.
### INTERPRETER FUNCTIONS:
def interpretPTREE(tree) :
"""interprets a complete program tree
pre: tree is a PTREE ::= [ CLIST ]
post: final values are deposited in memory and env
"""
# initialize heap and ns:
global ns # asking for permission to assign to global var, ns
clearTheHeap()
ns = allocateNS()
interpretCLIST(tree)
print "final namespace =", ns
print "final heap =", heap
raw_input("Press Enter key to terminate")
def interpretCLIST(clist) :
"""pre: clist is a program represented as a CLIST ::= [ CTREE+ ]
where CTREE+ means one or more CTREEs
post: memory holds all the updates commanded by program p
"""
for command in clist :
interpretCTREE(command)
def interpretCTREE(c) :
    """pre: c is a command represented as a CTREE:
         CTREE ::= ["=", LTREE, ETREE] | ["if", ETREE, CLIST1, CLIST2]
                 | ["print", LTREE]
       post: heap holds the updates commanded by c
    """
    operator = c[0]
    if operator == "=" : # ["=", LTREE, ETREE]
        lval = interpretLTREE(c[1])
        rval = interpretETREE(c[2])
        store(lval, rval)
    elif operator == "print" : # ["print", LTREE]
        v = lookup(interpretLTREE(c[1]))
        print v
    elif operator == "if" : # ["if", ETREE, CLIST1, CLIST2]
        test = interpretETREE(c[1])
        if test != 0 :
            interpretCLIST(c[2])
        else :
            interpretCLIST(c[3])
    else : # error
        crash("invalid command")
def interpretETREE(etree) :
    """interpretETREE computes the meaning of an expression operator tree.
         ETREE ::= NUM | ["+", ETREE, ETREE] | ["deref", LTREE]
                 | ["new", OB]
         OB ::= [ ID* ]
       post: updates the heap as needed and returns the etree's value,
             either an int, a handle, or "nil"
    """
    ans = "nil"
    if isinstance(etree, str) and etree.isdigit() : # NUM
        ans = int(etree)
    elif etree[0] == "+" : # ["+", ETREE, ETREE]
        ans1 = interpretETREE(etree[1])
        ans2 = interpretETREE(etree[2])
        if isinstance(ans1,int) and isinstance(ans2, int) :
            ans = ans1 + ans2
        else :
            crash("addition error --- nonint value used")
    elif etree[0] == "deref" : # ["deref", LTREE]
        lval = interpretLTREE(etree[1])
        ans = lookup(lval)
    elif etree[0] == "new" : # ["new", OB]
        handle = allocateNS()
        fields = etree[1]
        for f in fields :
            store((handle,f), "nil")
        ans = handle
    else :
        crash("invalid expression form")
    return ans
def interpretLTREE(ltree) :
    """interpretLTREE computes the meaning of a lefthandside operator tree.
         LTREE ::= ID | ["dot", LTREE, ID]
       Here are some example ltree values:
         "y"                                # y
         ["dot", "y", "h"]                  # y.h
         ["dot", ["dot", "y", "h"], "r"]    # y.h.r
       Returns a pair, (handle, field), where handle is the location
       in the heap where an object/namespace is stored,
       and field is an ID that indexes into the object.
       Look again at the example heap at the very top of this program.
       For that heap, here is what the function returns:
         "y"                                # returns ("h0","y")
         ["dot", "y", "f"]                  # returns ("h1", "f")
         ["dot", ["dot", "y", "h"], "r"]    # returns ("h2", "r")
    """
    if isinstance(ltree, str) : # just a single ID ?
        ans = (ns, ltree) # use ns as handle to field ltree
    else : # ltree has form, ["dot", LTREE1, ID]
        lval = interpretLTREE(ltree[1])
        handle = lookup(lval)
        ans = (handle, ltree[2])
    return ans
def crash(message) :
    """pre: message is a string
       post: message is printed and interpreter stopped
    """
    print message + "! crash! ns=", ns, "heap=", heap
    raise Exception # stops the interpreter
===================================================
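The way a LefthandSide resolves to a (handle, field) pair can be tried in isolation. The following is a minimal sketch, not the interpreter itself: the heap contents are an assumed reconstruction that merely agrees with the docstring examples above, the function name resolve is hypothetical (it mirrors interpretLTREE), and a Python KeyError stands in for crash. It is written so that it should run under Python 3 as well as the Python 2 used in the listing.

```python
# Assumed example heap: a dictionary of namespaces, indexed by handles.
heap = {
    "h0": {"x": 7, "y": "h1"},       # the global namespace
    "h1": {"f": "nil", "h": "h2"},   # the object that y names
    "h2": {"r": "nil"},              # the object that y.h names
}
ns = "h0"   # handle to the global namespace

def lookup(lval):
    """Return the value stored at lval = (handle, field)."""
    handle, field = lval
    return heap[handle][field]       # KeyError plays the role of crash()

def resolve(ltree):
    """Hypothetical stand-in for interpretLTREE: returns (handle, field)."""
    if isinstance(ltree, str):       # a single ID lives in the global namespace
        return (ns, ltree)
    owner = lookup(resolve(ltree[1]))   # handle of the object left of the dot
    return (owner, ltree[2])

print(resolve("y"))
print(resolve(["dot", "y", "f"]))
print(resolve(["dot", ["dot", "y", "h"], "r"]))
```

Each dot in the LefthandSide costs one lookup in the heap: the pair for y.h.r is found by first dereferencing y, then dereferencing h inside the object that y names.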
Exercise:
Here are three sample programs. Use the interpreter on each and
study the final configurations:
x = 2; y = new {f, g}; y.f = x; y.g = new {h}; print y.f
interpretPTREE( [ ["=", "x", "2"], ["=", "y", ["new", ["f","g"]]], ["=", ["dot", "y","f"], ["deref", "x"]], ["=", ["dot", "y", "g"], ["new", ["h"]]], ["print", ["dot", "y", "f"]] ] )

x = 7; y = new {f,x}; y.f = x; y.x = new {r}; y.x.r = y.f
interpretPTREE( [ ["=", "x", "7"], ["=", "y", ["new", ["f","x"]]], ["=", ["dot", "y", "f"], ["deref", "x"]], ["=", ["dot", "y", "x"], ["new", ["r"]]], ["=", ["dot", ["dot", "y", "x"], "r"], ["deref", ["dot", "y", "f"]]], ])

y = new {f,g}; x = y; y.h = 5; # the object grows as needed!
print y
interpretPTREE( [ ["=", "y", ["new", ["f","g"]]], ["=", "x", ["deref", "y"]], ["=", ["dot", "y", "h"], "5"], ["print", "y"] ])
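The third program is worth tracing by hand, because x = y copies a handle, not an object. Here is a small sketch of that trace using plain dictionaries. The helper allocate_object and its handle-numbering scheme are assumptions made for this illustration; the interpreter's own allocateNS may name its handles differently.

```python
heap = {}

def allocate_object(fields=()):
    """Hypothetical allocator: makes a new object and returns its handle."""
    handle = "h" + str(len(heap))
    heap[handle] = dict((f, "nil") for f in fields)
    return handle

ns = allocate_object()                        # the global namespace

heap[ns]["y"] = allocate_object(["f", "g"])   # y = new {f,g}
heap[ns]["x"] = heap[ns]["y"]                 # x = y    (the handle is copied)
heap[heap[ns]["y"]]["h"] = 5                  # y.h = 5  (a field is added)

print(heap[ns]["x"] == heap[ns]["y"])         # both names hold the same handle
print(sorted(heap[heap[ns]["x"]].items()))    # so x sees the new field h as well
```

Because store only checks that the handle is legal, assigning to a field that does not yet exist simply adds it, and every variable that holds the same handle observes the change.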
Exercise:
Notice there is no trouble in reusing a global variable name as a field name:
x = 2;
y = new {x, y};
y.x = x; y.y = y; x = y.x
The indexing makes clear which variable should be consulted.
x = 2; y = new {a = x; b = a}
This says that y's field a is set to x's value, and field b is set to a's value.
We will use this syntax for initialization commands:
E ::= . . . | new { F* }
where F* means zero or more occurrences of F, separated by ;
F ::= I = E
Revise the parser and interpreter
for these new constructions. How does your semantics
operate on the example? (That is, does the semantics locate
the value of x when it is initializing a?
Does it locate the value of a when it is initializing b?)
Make certain that it does. Hint: in the interpreter, replace
variable ns by a stack of handles to namespaces.
Use the stack when computing the semantics of a LefthandSide.
x = 2; y = new {x = super.x; y = this.x}
This states that y's field x is initialized with the value of global variable x, and field y is initialized with the value of field x in this object being constructed.
Revise the syntax of LefthandSide to read:
L ::= I | L . I | this | super
and modify the parser and interpreter to implement the new constructions.
E ::= N | ( E1 + E2 ) | L | new { CL }
That is, when a new object is allocated, the commands, CL, execute. Here is an example:
x = 7; y = new { if x : a = x else b = 0 end; c = x; print c }
This code sets x to 7 and sets y to {a:7, c:7}. You see that we are not far from modern objects, which contain a mix of declarations and executable constructor code.
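To see how the stack-of-handles hint might apply to this last example, here is a hand translation of the program into dictionary operations. It is only one possible reading, under two assumptions that the exercises leave open: a bare identifier is read by searching the stack from the top down, and it is written into the namespace on top of the stack. The names ns_stack, read, and write are hypothetical.

```python
heap = {"h0": {}}     # h0 is the global namespace
ns_stack = ["h0"]     # stack of handles; the top is the innermost namespace

def read(name):
    """Search the namespaces from innermost to outermost."""
    for handle in reversed(ns_stack):
        if name in heap[handle]:
            return heap[handle][name]
    raise KeyError(name)          # plays the role of crash()

def write(name, value):
    """Assumption: a bare identifier is stored in the innermost namespace."""
    heap[ns_stack[-1]][name] = value

write("x", 7)                     # x = 7

heap["h1"] = {}                   # y = new { ... } : allocate, then run CL
ns_stack.append("h1")
if read("x") != 0:                # if x : a = x else b = 0 end
    write("a", read("x"))
else:
    write("b", 0)
write("c", read("x"))             # c = x
printed = read("c")               # print c
print(printed)
ns_stack.pop()
write("y", "h1")                  # finally, y receives the new object's handle

print(sorted(heap["h1"].items()))
```

Under these assumptions the new object ends up as {a:7, c:7}, as the text states, and global x is found while the object is under construction because the global namespace is still on the stack beneath it.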