CIS505 Assignment 5

10 points; due Thursday, October 30.

We will use ML to write an interpreter for a mini, untyped C with pointers and recursive procedures. We model the C-machine's symbol table and its linear memory as lists that "grow" and "shrink" when a procedure is called and is completed. For example, for

===================================================

var x = 2; 
proc p(a,b): 
   x = (a + b);  //(2)
end;
var y = (x + 1);
//(1)
p(y, 4); 
//(3)

===================================================

the symbol table (environment), env, and memory, mem, appear to have these values at program point //(1):
env is a list that holds the declared names and meanings, and mem holds the ints saved in storage locations. The env-list is drawn in cons-layout, so you can see how part of the list is shared (the α-part). The mem-list is also a list, but there is no sharing, so I drew it as a linear list.

Notice that procedure p's closure holds (a link to) the subenvironment (sublist) α.

When p(y, 4) is called, the closure is extracted, and the arguments are evaluated and stored in linear memory, like in real-life C. Subenvironment α is extended with bindings for parameters a and b and also with a copied binding for procedure p, in case p calls itself. At point //(2), this is how the env and mem appear to the body of procedure p:

mem has expanded into [7,3,3,4].

When p's call ends, the env reverts to what it was before the call, and the length of mem reverts ("pops") to its former length but the updates remain. At point //(3), the env-mem configuration looks like this:

Here is the execution of the sample program with the system I built:

===================================================

Parse tree:
(VarDecl("x", Num(2))
 :: ProcDecl("p", "a"::"b"::nil, Assign(Var("x"), Add(Deref(Var("a")), Deref(Var("b")))) :: nil)
 :: VarDecl("y", Add(Deref(Var("x")), Num(1)))
 :: nil,
Call("p", Deref(Var("y"))::Num(4)::nil)
 :: nil)

Execution starts:
val it = () : unit
Successful termination!
Environment =
(y, Loc(1))
:: (p, Closure(params:[ a, b,], body:[ Assign(Var(x), Add(Deref(Var(a)), Deref(Var(b)))),], linkto:(x, Loc(0)) :: nil))
:: (x, Loc(0))
:: nil
Store =
[ 7, 3, ]

===================================================

Mini-C syntax

The language has vars, procs, and pointers:

===================================================

PROGRAM ::=  DECLIST COMMLIST 

DECLIST ::=  DECLARATION ; DECLIST  |  (nothing)
DECLARATION ::=  var ID = EXPRESSION  |  proc ID ( IDLIST ) : COMMANDLIST end

COMMAND ::=  LEFTHANDSIDE = EXPRESSSION  |  print EXPRESSION
          |  if EXPRESSION : COMMANDLIST else COMMANDLIST end
          |  ID ( EXPLIST ) 

EXPRESSION ::=  NUMERAL  |  ( EXPRESSION OPERATOR EXPRESSION )
             |  & LEFTHANDSIDE  |  LEFTHANDSIDE

LEFTHANDSIDE ::=  ID  |  * LEFTHANDSIDE

COMMANDLIST is zero or more COMMANDs, separated by semicolons
EXPLIST  is zero or more  EXPRESSIONs,  separated by commas
IDLIST  is zero or more  IDs, separated by commas

OPERATOR ::=  +  |  -
NUMERAL  is a sequence of digits from the set, {0,1,2,...,9}
ID  is a string of letters, not a keyword

===================================================

This little program shows all the syntax forms: var x = 0; proc sum(y, count) : if count : *y = (*y + count); sum(y, (count - 1)); else print *y; end end; sum(&x, 4)
When executed, it prints 10.

I have implemented most of the language for you. You must finish some operations on the linear memory, declaration initialization, pointer referencing and dereferencing, and procedure call.

Interpreter input format

The input to the SML-coded interpreter are operator (parse) trees defined by these ML datatypes:

===================================================

type Id = string

datatype LVal = Var of Id | Star of LVal

datatype Exp = Num of int | Add of Exp * Exp | Sub of Exp * Exp
             | Ampersand of LVal | Deref of LVal

datatype Comm = Assign of LVal * Exp | Print of Exp
              | If of Exp * (Comm list) * (Comm list)
              | Call of Id * (Exp list)

datatype Decl = VarDecl of Id * Exp
              | ProcDecl of Id * (Id list) * (Comm list)

type Prog = (Decl list) * (Comm list)

===================================================

Interpreter structure

It really helps if you understand how the interpreter is structured. The interpreter has a function for each form of operator tree, e.g., interpretComm executes values from datatype Comm.

Let's look at interpretComm:

===================================================

(* interpretComm : Comm -> Env -> Store -> Store *)
fun interpretComm (Assign(lv, e)) env mem = 
        let val lhs = interpretLVal lv env mem
            val rhs = interpretExp e env mem
        in  update(lhs, rhs, mem)
        end

|   interpretComm (Print(e)) env mem = 
        ...
|   interpretComm (If(e, clist1, clist2)) env mem =
        ...
|   interpretComm (Call(id, elist)) env mem = 
        ...

===================================================

There is one equation for each form of Comm value. interpretComm uses three input arguments --- a Comm-value, an env, and a mem --- to build its answer, which is a new mem. The arguments are not placed in a tuple --- ML lets you supply the arguments in a sequence, without commas and parens. (This produces a nested-arrow typing for interpretComm. You don't need to worry about the typing.)

interpretComm "passes around" env and mem as arguments to other functions, e.g., interpretExp e env mem. In the end, interpretComm assembles an updated mem that is returned as its answer. (Within the ML implementation, the handles to the envs and mems are passed and returned; the actual structures are constructed in ML's storage heap.)

This form of "parameter-passing-of-data-structures" is central to modern, distributed-systems programming --- processes/agents in a system pass parameters (messages) to each other that are (handles to) complex, shared data structures, and they return answers that are (handles to) complex, shared data structures.

Here are the primary functions in the interpeter:

===================================================

(* interpretLVal : LValue -> Env -> Store -> int (a location number)
      accepts an  LValue phrase, an env, and the mem to calculate and
      return a location number that can be used to index into the mem.
*)
(* interpretExp : Exp -> Env -> Store -> int 
     accepts an  Exp phrase, an env, and the mem to calculate and
     return the integer meaing of the Exp phrase
*)
(* interpretComm : Comm -> Env -> Store -> Store *)
      accepts a Comm phrase, an env, and the mem to calculate and
      return an updated value for the mem
*)
(* interpretDecl : Decl -> Env -> Store -> (Env * Store)  
      accepts a Decl phrase, an env, and the mem to calculate and
      return a pair: (updated env, updated mem)
*)

===================================================

Interpreter implementation

The implementation is a hybrid, part Python and part SML. I didn't like SML's built-in parser generator, so I reused the PLY parser generator, which runs in Python. So, I wrote a Python driver program, run.py, that reads the input program, calls PLY to build the operator tree, then calls the OS to start sml interpret.sml. The SML program reads the operator tree from a file and interprets (executes) it.

Python is a great "glue language" for connecting together programs written in different languages. You use Python's os module to talk to the operating system and tell it how to connect the programs. You can see this in run.py and you can read about it in the Python Standard Library book.

The starter system is contained in the folder, Ex5, that is found on the web page with this assignment sheet. Start the entire system by starting the Python driver program, run.py.

My implementation can be used as it is, but it doesn't do a lot --- important pieces must be implemented. This is what you will do to finish this assignment:

What to implement

You will finish coding the SML "modules", store.sml and interpret.sml:

In store.sml, write the code that is missing for the operations, update (which "updates" a cell in the "memory") and revert (which "pops" the memory when a procedure call finishes). These functions build new list answers from their list arguments, like the sorting function you coded for Assignment 4.
In interpret.sml:
finish the coding of interpretDecl so that a variable declaration, var I = E, correctly initializes the declared variable, I, with the value of E.
finish the coding of interpretExp so that pointer reference, &L, works correctly. Finish the coding of interpretLVAl so that *L works correctly. Reread Chapter 2 if you forgot how these should operate.
In interpretComm, write the semantics of procedure call, I(E1, E2,...,En):
- find I's closure;
- remember the current length of the memory vector;
- provided that the number of arguments, n, equals the number of parameter names in I's closure, then
  (i) extend the environment extracted from the closure with a (re)binding of I to the closure;
  (ii) evaluate arguments E1, E2,...,En; allocate cells for their values in the memory; and bind the location numbers of the allocated cells to the parameter names in the extended environment. (See the pictures at the top of this page.)
  (iii) execute the procedure's body extracted from the closure with the extended environment and memory;
  (iv) upon termination of the procedure's body, "pop" (revert) the memory to its former size and return the updated memory as the answer.

I have provided test cases that you are required to try; see testcases.txt for these. Place the test cases and their outputs in a file named tests.txt

Important: Your work will go fastest if you write and test one at a time each function you write. Work in small steps! The ML type checker/theorem prover does an analysis that goes beyond any other type checker you have ever used --- it is doing a kind of "structural testing" of the code by means of predicate-logic theorem proving. As a result, it might take some effort to remove all errors from a function. This can be frustrating, but the payoff is that you will spend very little time on testing!

For Step 1, above, test store.sml separately, like this: sml store.sml. See the testcases.txt sheet for details.

For each of Steps 2-4, edit a function, then first try to execute with an empty program, to see if the SML checker locates any compile/type errors in your new code:

===================================================

START  python run.py  
YOU WILL SEE IN A COMMAND WINDOW:

Type program; OK to do it on multiple lines; terminate with  !
  as the first symbol on a line by itself:

!
Parse tree:
(nil,
nil)
SML interpreter now starts.  When finished, press  Ctrl z
...(lots of ML val definitions display here)...

IF THERE ARE ANY ERRORS, THEY WILL LIST HERE!
OTHERWISE, YOU WILL SEE THE FOLLOWING:
[opening tree.txt]
val tree = ([],[]) : Prog
val it = () : unit
Execution starts:
val it = () : unit
Successful termination!
Environment =
nil
Store =
[ ]

===================================================

Once you get "successful termination" of an empty program, then you are ready to test you new code with a real program.

Remember to document any new functions you write and modify the documentation of the other functions to match your implemented extensions.

Teamwork

You may do this exercise with a partner. What you did for Exercise 3 does not matter --- if you wish to work with a partner, new or old, you must notify me by Wednesday, 5pm, October 22.

Submission and grading

Make a folder, YourFirstNameYourLastName5, that contains store.sml, interpret.sml, and tests.txt. (No other files in the system are changed!) Zip the folder into a .zip file and submit the .zip file to the CIS505 site on K-State Online.

The teaching assistants will study your work and your tests and apply some additional tests before scoring the submission.

If you are working with a partner, both you and your partner upload the same zipped folder.