We will use ML to write an interpreter for a mini, untyped C with pointers and recursive procedures. We model the C-machine's symbol table and its linear memory as lists that "grow" and "shrink" when a procedure is called and is completed. For example, for
var x = 2; proc p(a,b): x = (a + b); //(2) end; var y = (x + 1); //(1) p(y, 4); //(3)the symbol table (environment), $env$, and memory, $mem$, appear to have these values at program point $//(1)$:
Notice that procedure $p$'s closure holds (a link to) the subenvironment (sublist) $α$.
When $p(y, 4)$ is called, the closure is extracted, and the arguments are
evaluated and stored in linear memory, like in real-life C.
Subenvironment $α$ is extended with bindings for
parameters $a$ and $b$ and also with a copied binding for procedure $p$,
in case $p$ calls itself.
At point $//(2)$, this is how the $env$ and $mem$ appear to the body of
procedure $p$:
When $p$'s call ends, the $env$ reverts
to what it was before the call,
and the length of $mem$
reverts ("pops") to its former length but the updates remain.
At point $//(3)$, the $env$-$mem$ configuration looks like this:
Here is the execution of the sample program with the system I built:
Parse tree: (VarDecl("x", Num(2)) :: ProcDecl("p", "a"::"b"::nil, Assign(Var("x"), Add(Deref(Var("a")), Deref(Var("b")))) :: nil) :: VarDecl("y", Add(Deref(Var("x")), Num(1))) :: nil, Call("p", Deref(Var("y"))::Num(4)::nil) :: nil) Execution starts: val it = () : unit Successful termination! Environment = (y, Loc(1)) :: (p, Closure(params:[ a, b,], body:[ Assign(Var(x), Add(Deref(Var(a)), Deref(Var(b)))),], linkto:(x, Loc(0)) :: nil)) :: (x, Loc(0)) :: nil Store = [ 7, 3, ]
PROGRAM ::= DECLIST COMMLIST DECLIST ::= DECLARATION ; DECLIST | (nothing) DECLARATION ::= var ID = EXPRESSION | proc ID ( IDLIST ) : COMMANDLIST end COMMAND ::= LEFTHANDSIDE = EXPRESSSION | print EXPRESSION | if EXPRESSION : COMMANDLIST else COMMANDLIST end | ID ( EXPLIST ) EXPRESSION ::= NUMERAL | ( EXPRESSION OPERATOR EXPRESSION ) | & LEFTHANDSIDE | LEFTHANDSIDE LEFTHANDSIDE ::= ID | * LEFTHANDSIDE COMMANDLIST is zero or more COMMANDs, separated by semicolons EXPLIST is zero or more EXPRESSIONs, separated by commas IDLIST is zero or more IDs, separated by commas OPERATOR ::= + | - NUMERAL is a sequence of digits from the set, {0,1,2,...,9} ID is a string of letters, not a keywordThis little program shows all the syntax forms:
I have implemented most of the language for you. You must finish some operations on the linear memory, declaration initialization, pointer referencing and dereferencing, and procedure call.
The input to the SML-coded interpreter are operator (parse) trees defined by these ML datatypes:
type Id = string datatype LVal = Var of Id | Star of LVal datatype Exp = Num of int | Add of Exp * Exp | Sub of Exp * Exp | Ampersand of LVal | Deref of LVal datatype Comm = Assign of LVal * Exp | Print of Exp | If of Exp * (Comm list) * (Comm list) | Call of Id * (Exp list) datatype Decl = VarDecl of Id * Exp | ProcDecl of Id * (Id list) * (Comm list) type Prog = (Decl list) * (Comm list)
Let's look at $interpretComm$:
(* interpretComm : Comm -> Env -> Store -> Store *) fun interpretComm (Assign(lv, e)) env mem = let val lhs = interpretLVal lv env mem val rhs = interpretExp e env mem in update(lhs, rhs, mem) end | interpretComm (Print(e)) env mem = ... | interpretComm (If(e, clist1, clist2)) env mem = ... | interpretComm (Call(id, elist)) env mem = ...There is one equation for each form of $Comm$ value. $interpretComm$ uses three input arguments --- a $Comm$-value, an $env$, and a $mem$ --- to build its answer, which is a new $mem$. The arguments are not placed in a tuple --- ML lets you supply the arguments in a sequence, without commas and parens. (This produces a nested-arrow typing for $interpretComm$. You don't need to worry about the typing.)
$interpretComm$ "passes around" $env$ and $mem$ as arguments to other functions, e.g., $interpretExp e env mem$. In the end, $interpretComm$ assembles an updated $mem$ that is returned as its answer. (Within the ML implementation, the handles to the $env$s and $mem$s are passed and returned; the actual structures are constructed in ML's storage heap.)
This form of "parameter-passing-of-data-structures" is central to modern, distributed-systems programming --- processes/agents in a system pass parameters (messages) to each other that are (handles to) complex, shared data structures, and they return answers that are (handles to) complex, shared data structures.
Here are the primary functions in the interpeter:
(* interpretLVal : LValue -> Env -> Store -> int (a location number) accepts an LValue phrase, an env, and the mem to calculate and return a location number that can be used to index into the mem. *) (* interpretExp : Exp -> Env -> Store -> int accepts an Exp phrase, an env, and the mem to calculate and return the integer meaing of the Exp phrase *) (* interpretComm : Comm -> Env -> Store -> Store *) accepts a Comm phrase, an env, and the mem to calculate and return an updated value for the mem *) (* interpretDecl : Decl -> Env -> Store -> (Env * Store) accepts a Decl phrase, an env, and the mem to calculate and return a pair: (updated env, updated mem) *)
Python is a great "glue language" for connecting together programs written in different languages. You use Python's $os$ module to talk to the operating system and tell it how to connect the programs. You can see this in $run.py$ and you can read about it in the Python Standard Library book.
The starter system is contained in the folder, $Ex5$, that is found on the web page with this assignment sheet. Start the entire system by starting the Python driver program, $run.py$.
My implementation can be used as it is, but it doesn't do a lot --- important pieces must be implemented. This is what you will do to finish this assignment:
In $interpret.sml$:
I have provided test cases that you are required to try; see $testcases.txt$ for these. Place the test cases and their outputs in a file named $tests.txt$
Important: Your work will go fastest if you write and test one at a time each function you write. Work in small steps! The ML type checker/theorem prover does an analysis that goes beyond any other type checker you have ever used --- it is doing a kind of "structural testing" of the code by means of predicate-logic theorem proving. As a result, it might take some effort to remove all errors from a function. This can be frustrating, but the payoff is that you will spend very little time on testing!
For Step 1, above, test $store.sml$ separately, like this: $sml store.sml$. See the $testcases.txt$ sheet for details.
For each of Steps 2-4, edit a function, then first try to execute with an empty program, to see if the SML checker locates any compile/type errors in your new code:
START python run.py YOU WILL SEE IN A COMMAND WINDOW: Type program; OK to do it on multiple lines; terminate with ! as the first symbol on a line by itself: ! Parse tree: (nil, nil) SML interpreter now starts. When finished, press Ctrl z ...(lots of ML val definitions display here)... IF THERE ARE ANY ERRORS, THEY WILL LIST HERE! OTHERWISE, YOU WILL SEE THE FOLLOWING: [opening tree.txt] val tree = ([],[]) : Prog val it = () : unit Execution starts: val it = () : unit Successful termination! Environment = nil Store = [ ]Once you get "successful termination" of an empty program, then you are ready to test you new code with a real program.
Remember to document any new functions you write and modify the documentation of the other functions to match your implemented extensions.
The teaching assistants will study your work and your tests and apply some additional tests before scoring the submission.
If you are working with a partner, both you and your partner upload the same zipped folder.