We will use ML
to write an interpreter for a mini, untyped C with pointers and recursive procedures.
We model the C-machine's symbol table and its linear memory as lists that
"grow" and "shrink" when a procedure is called and is completed.
For example, for
===================================================
var x = 2;
proc p(a,b):
x = (a + b); //(2)
end;
var y = (x + 1);
//(1)
p(y, 4);
//(3)
===================================================
the symbol table (environment), env, and memory, mem, appear to have these values at program point //(1):
env is a list that holds the declared names and meanings, and mem holds the ints
saved in storage locations. The env-list is drawn in cons-layout, so
you can see how part of the list is shared (the α-part).
The mem-list is also a list, but there is no sharing,
so I drew it as a
linear list.
Notice that procedure p's closure holds (a link to) the subenvironment (sublist) α.
When p(y, 4) is called, the closure is extracted, and the arguments are
evaluated and stored in linear memory, like in real-life C.
Subenvironment α is extended with bindings for
parameters a and b and also with a copied binding for procedure p,
in case p calls itself.
At point //(2), this is how the env and mem appear to the body of
procedure p:
mem has expanded into [7,3,3,4].
When p's call ends, the env reverts
to what it was before the call,
and the length of mem
reverts ("pops") to its former length but the updates remain.
At point //(3), the env-mem configuration looks like this:
Here is the execution of the sample program with the system I built:
===================================================
Parse tree:
(VarDecl("x", Num(2))
:: ProcDecl("p", "a"::"b"::nil, Assign(Var("x"), Add(Deref(Var("a")), Deref(Var("b")))) :: nil)
:: VarDecl("y", Add(Deref(Var("x")), Num(1)))
:: nil,
Call("p", Deref(Var("y"))::Num(4)::nil)
:: nil)
Execution starts:
val it = () : unit
Successful termination!
Environment =
(y, Loc(1))
:: (p, Closure(params:[ a, b,], body:[ Assign(Var(x), Add(Deref(Var(a)), Deref(Var(b)))),], linkto:(x, Loc(0)) :: nil))
:: (x, Loc(0))
:: nil
Store =
[ 7, 3, ]
===================================================
=================================================== PROGRAM ::= DECLIST COMMLIST DECLIST ::= DECLARATION ; DECLIST | (nothing) DECLARATION ::= var ID = EXPRESSION | proc ID ( IDLIST ) : COMMANDLIST end COMMAND ::= LEFTHANDSIDE = EXPRESSSION | print EXPRESSION | if EXPRESSION : COMMANDLIST else COMMANDLIST end | ID ( EXPLIST ) EXPRESSION ::= NUMERAL | ( EXPRESSION OPERATOR EXPRESSION ) | & LEFTHANDSIDE | LEFTHANDSIDE LEFTHANDSIDE ::= ID | * LEFTHANDSIDE COMMANDLIST is zero or more COMMANDs, separated by semicolons EXPLIST is zero or more EXPRESSIONs, separated by commas IDLIST is zero or more IDs, separated by commas OPERATOR ::= + | - NUMERAL is a sequence of digits from the set, {0,1,2,...,9} ID is a string of letters, not a keyword ===================================================This little program shows all the syntax forms:
var x = 0; proc sum(y, count) : if count : *y = (*y + count); sum(y, (count - 1)); else print *y; end end; sum(&x, 4)When executed, it prints 10.
I have implemented most of the language for you. You must finish some operations on the linear memory, declaration initialization, pointer referencing and dereferencing, and procedure call.
The input to the SML-coded interpreter are operator (parse) trees defined by
these ML datatypes:
===================================================
type Id = string
datatype LVal = Var of Id | Star of LVal
datatype Exp = Num of int | Add of Exp * Exp | Sub of Exp * Exp
| Ampersand of LVal | Deref of LVal
datatype Comm = Assign of LVal * Exp | Print of Exp
| If of Exp * (Comm list) * (Comm list)
| Call of Id * (Exp list)
datatype Decl = VarDecl of Id * Exp
| ProcDecl of Id * (Id list) * (Comm list)
type Prog = (Decl list) * (Comm list)
===================================================
Let's look at interpretComm:
===================================================
(* interpretComm : Comm -> Env -> Store -> Store *)
fun interpretComm (Assign(lv, e)) env mem =
let val lhs = interpretLVal lv env mem
val rhs = interpretExp e env mem
in update(lhs, rhs, mem)
end
| interpretComm (Print(e)) env mem =
...
| interpretComm (If(e, clist1, clist2)) env mem =
...
| interpretComm (Call(id, elist)) env mem =
...
===================================================
There is one equation for each form of Comm value.
interpretComm uses three input arguments ---
a Comm-value, an env, and a mem --- to build its answer, which is a new
mem. The arguments are not placed in a tuple --- ML
lets you supply the arguments in a sequence, without commas and parens.
(This produces a nested-arrow typing for interpretComm. You don't need
to worry about the typing.)
interpretComm "passes around" env and mem as arguments to other functions, e.g., interpretExp e env mem. In the end, interpretComm assembles an updated mem that is returned as its answer. (Within the ML implementation, the handles to the envs and mems are passed and returned; the actual structures are constructed in ML's storage heap.)
This form of "parameter-passing-of-data-structures" is central to modern, distributed-systems programming --- processes/agents in a system pass parameters (messages) to each other that are (handles to) complex, shared data structures, and they return answers that are (handles to) complex, shared data structures.
Here are the primary functions in the interpeter:
===================================================
(* interpretLVal : LValue -> Env -> Store -> int (a location number)
accepts an LValue phrase, an env, and the mem to calculate and
return a location number that can be used to index into the mem.
*)
(* interpretExp : Exp -> Env -> Store -> int
accepts an Exp phrase, an env, and the mem to calculate and
return the integer meaing of the Exp phrase
*)
(* interpretComm : Comm -> Env -> Store -> Store *)
accepts a Comm phrase, an env, and the mem to calculate and
return an updated value for the mem
*)
(* interpretDecl : Decl -> Env -> Store -> (Env * Store)
accepts a Decl phrase, an env, and the mem to calculate and
return a pair: (updated env, updated mem)
*)
===================================================
Python is a great "glue language" for connecting together programs written in different languages. You use Python's os module to talk to the operating system and tell it how to connect the programs. You can see this in run.py and you can read about it in the Python Standard Library book.
The starter system is contained in the folder, Ex5, that is found on the web page with this assignment sheet. Start the entire system by starting the Python driver program, run.py.
My implementation can be used as it is, but it doesn't do a lot --- important pieces must be implemented. This is what you will do to finish this assignment:
In interpret.sml:
I have provided test cases that you are required to try; see testcases.txt for these. Place the test cases and their outputs in a file named tests.txt
Important: Your work will go fastest if you write and test one at a time each function you write. Work in small steps! The ML type checker/theorem prover does an analysis that goes beyond any other type checker you have ever used --- it is doing a kind of "structural testing" of the code by means of predicate-logic theorem proving. As a result, it might take some effort to remove all errors from a function. This can be frustrating, but the payoff is that you will spend very little time on testing!
For Step 1, above, test store.sml separately, like this: sml store.sml. See the testcases.txt sheet for details.
For each of Steps 2-4, edit a function, then first try to execute with an empty program, to see if the SML checker locates any compile/type errors
in your new code:
===================================================
START python run.py
YOU WILL SEE IN A COMMAND WINDOW:
Type program; OK to do it on multiple lines; terminate with !
as the first symbol on a line by itself:
!
Parse tree:
(nil,
nil)
SML interpreter now starts. When finished, press Ctrl z
...(lots of ML val definitions display here)...
IF THERE ARE ANY ERRORS, THEY WILL LIST HERE!
OTHERWISE, YOU WILL SEE THE FOLLOWING:
[opening tree.txt]
val tree = ([],[]) : Prog
val it = () : unit
Execution starts:
val it = () : unit
Successful termination!
Environment =
nil
Store =
[ ]
===================================================
Once you get "successful termination" of an empty program, then you are
ready to test you new code with a real program.
Remember to document any new functions you write and modify the documentation of the other functions to match your implemented extensions.
The teaching assistants will study your work and your tests and apply some additional tests before scoring the submission.
If you are working with a partner, both you and your partner upload the same zipped folder.