CIS505 Assignment 2

10 points; due September 20

We will develop an object-oriented language in stages. We start with the syntax of the assignment language from Exercise 1,

P  ::=  CL
CL ::=  C;*   (zero or more  C  separated by ; )
C  ::=  L = E  |  if E : CL else CL end  |  print E
E  ::=  N  |  E1 OP E2  | L
OP ::=  +  |  -
L  ::=  I
N  ::=  a nonempty string of digits
I  ::=  a nonempty string of letters, not a keyword

but this time implemented by a heap-storage virtual machine, like that in Chapter 2, Section 2.2.

To start, you get for free the complete system for the above language. Its parser is generated from a tool, PLY, which built the parser from the grammar rules. You can look inside a23pars.py to see what I mean. There is documentation at http://www.dabeaz.com/ply/.)

When you download and run the interpreter on a program, you will see it behaves like this:

===================================================

Type program; OK to do it on multiple lines; terminate with  !
  as the first symbol on a line by itself:
x = 2;
print x;
y = (x + 1);
x = (y + y)
!
Parse tree:
[[], [['=', 'x', '2'], ['print', ['deref', 'x']], ['=', 'y', ['+', ['deref', 'x'], '1']], ['=', 'x', ['+', ['deref', 'y'], ['deref', 'y']]]]]
Execution:
2
namespace = h0
heap = {
  h0 : {'x': 2}
}
Successful termination.
namespace = h0
heap = {
  h0 : {'y': 3, 'x': 6}
}
Press Enter key to terminate

===================================================

Notice that (i) The parse tree has an empty list of declarations (since the core language doesn't have declarations). (ii) The global variables are stored in a namespace in the heap, and the handle to that namespace is remembered by the interpreter. (The handles are not Greek letters but h0, h1, etc.) (iii) The print E command prints E's value along with a "dump" of the current storage. This will help you debug your work.

What you will do

You will extend the language with declared variables and parameterized procedures. Procedures can own private declarations. (Objects and classes are added in Assignment 3.) First complete Part A of this exercise. Once you do so, work Part B. (If you don't finish Part B, it is not a disaster, but be certain to do Part A!)

Part A: declarations and procedure calls

First, we add integer declarations and procedures with parameters. A sample program in the extended language looks like this:

===================================================

int x = 2;
proc p(y, z):
   print y;  x = (y - z);  q(z);  z = 0
end;
proc q(y): x = (x + (y - 1)); print y;  end;
print x;
p(9, (x+1));

===================================================

Here is the execution of the above program. It uses an activation stack, which you must add to the interpreter. Declared procedures are saved as closures.

===================================================

Type program; OK to do it on multiple lines; terminate with  !
  as the first symbol on a line by itself:

int x = 2;
proc p(y, z):
   print y;  x = (y - z);  q(z);  z = 0
end;
proc q(y): x = (x + (y - 1)); print y;  end;
print x;
p(9, (x+1));
!
Parse tree:
[[['int', 'x', '2'], ['proc', 'p', ['y', 'z'], [], [['print', ['deref', 'y']], ['=', 'x', ['-', ['deref', 'y'], ['deref', 'z']]], ['call', 'q', [['deref', 'z']]], ['=', 'z', '0']]], ['proc', 'q', ['y'], [], [['=', 'x', ['+', ['deref', 'x'], ['-', ['deref', 'y'], '1']]], ['print', ['deref', 'y']]]]], [['print', ['deref', 'x']], ['call', 'p', ['9', ['+', ['deref', 'x'], '1']]]]]
Execution:
2
activation stack = ['h0']
heap = {
  h0 : {'q': 'h2', 'parentns': 'nil', 'p': 'h1', 'x': 2}
  h1 : {'body': [['print', ['deref', 'y']], ['=', 'x', ['-', ['deref', 'y'], ['deref', 'z']]], ['call', 'q', [['deref', 'z']]], ['=', 'z', '0']],
        'params': ['y', 'z'], 
        'type': 'proc', 
        'link': 'h0'}
  h2 : {'body': [['=', 'x', ['+', ['deref', 'x'], ['-', ['deref', 'y'], '1']]],['print', ['deref', 'y']]], 
        'params': ['y'], 
        'type': 'proc', 
        'link': 'h0'}
}
9
activation stack = ['h0', 'h3']
heap = {
  h0 : {'q': 'h2', 'parentns': 'nil', 'p': 'h1', 'x': 2}
  h1 : {'body': [['print', ['deref', 'y']], ['=', 'x', ['-', ['deref', 'y'],['deref', 'z']]], ['call', 'q', [['deref', 'z']]], ['=', 'z', '0']], 
        'params': ['y', 'z'], 
        'type': 'proc', 
        'link': 'h0'}
  h2 : {'body': [['=', 'x', ['+', ['deref', 'x'], ['-', ['deref', 'y'], '1']]],['print', ['deref', 'y']]], 
        'params': ['y'], 
        'type': 'proc', 
        'link': 'h0'}
  h3 : {'y': 9, 'parentns': 'h0', 'z': 3}
}
3
activation stack = ['h0', 'h3', 'h4']
heap = {
  h0 : {'q': 'h2', 'parentns': 'nil', 'p': 'h1', 'x': 8}
  h1 : {'body': [['print', ['deref', 'y']], ['=', 'x', ['-', ['deref', 'y'],['deref', 'z']]], ['call', 'q', [['deref', 'z']]], ['=', 'z', '0']], 
        'params': ['y', 'z'], 
        'type': 'proc', 
        'link': 'h0'}
  h2 : {'body': [['=', 'x', ['+', ['deref', 'x'], ['-', ['deref', 'y'], '1']]],['print', ['deref', 'y']]], 
        'params': ['y'], 
        'type': 'proc', 
        'link': 'h0'}
  h3 : {'y': 9, 'parentns': 'h0', 'z': 3}
  h4 : {'y': 3, 'parentns': 'h0'}
}

Successful termination.
activation stack = ['h0']
heap = {
  h0 : {'q': 'h2', 'parentns': 'nil', 'p': 'h1', 'x': 8}
  h1 : {'body': [['print', ['deref', 'y']], ['=', 'x', ['-', ['deref', 'y'],['deref', 'z']]], ['call', 'q', [['deref', 'z']]], ['=', 'z', '0']], 
        'params': ['y', 'z'], 
        'type': 'proc', 
        'link': 'h0'}
  h2 : {'body': [['=', 'x', ['+', ['deref', 'x'], ['-', ['deref', 'y'], '1']]],['print', ['deref', 'y']]], 
        'params': ['y'], 
        'type': 'proc', 
        'link': 'h0'}
  h3 : {'y': 9, 'parentns': 'h0', 'z': 0}
  h4 : {'y': 3, 'parentns': 'h0'}
}
Press Enter key to terminate

===================================================

The implementation uses an activation stack (implemented as a Python list) of handles to namespaces (Python dictionaries). A namespace holds declared variables and procedures. Procedures are saved as closures, modelled here by dictionaries. When a procedure is called, a new namespace is allocated, the parameter-argument bindings are stored in the new namespace, the parentns link is set, and the namespace's handle is pushed onto the activation stack. When a procedure completes, its handle is popped from the activation stack. See Section 5.2.1 of the lecture notes for how this works.

IMPORTANT: the execution did no garbage collection of unneeded namespaces. I won't stop you if you want to erase unneeded namespaces, but it isn't required.

Here's the extended language's grammar:

===================================================

P : Program                      E : Expression
CL : CommandList                 L : LefthandSide
C : Command                      I : Variable
DL : DeclarationList             N : Numeral
D : Declaration

P  ::=  DL  CL

DL ::=  D;*
   that is, a sequence of zero or more Ds, separated by  ;
   ( DL ::=  D  |  D ; DL  |  empty  )

D  ::=  int I = E  |  proc I ( IL ) : CL end 

CL ::=  C;*
C  ::=  L = E  |  if E : CL1 else CL2 end  |  print E  |  L ( EL )

EL ::=  E,*
E  ::=  N  |  ( E1 OP E2 )  |  L
OP ::=  +  |  -

L ::= I
IL ::=  I,*
I ::=  strings of letters, not including keywords

N ::=  string of digits

===================================================

Interpreter input format

The input to the interpreter is the list-based operator tree constructed by the parser. The syntax goes like this:

===================================================

PTREE ::=  [DLIST, CLIST]

DLIST ::=  [ DTREE* ]
           where  DTREE*  means zero or more DTREEs
DTREE ::=  ["int", ID, ETREE]  |  ["proc", ID, ILIST, [], CLIST]
           (note: the [] in the "proc" operator tree will be used in Part B)

CLIST ::=  [ CTREE* ]
CTREE ::=  ["=", LTREE, ETREE]  |  ["if", ETREE, CLIST, CLIST]
        |  ["print", ETREE]  |  ["call", LTREE, ELIST]

ELIST ::=   [ ETREE* ]
ETREE ::=  NUM  |  [OP, ETREE, ETREE] |  ["deref", LTREE]  
      where  OP ::=  "+"  | "-"

LTREE ::=  ID

ILIST ::= [ ID* ]
ID    ::=  a nonempty string of letters

NUM   ::=  a nonempty string of digits

===================================================

I have all this already built for you, just like I did for Assignment 1.

Do Part A in these steps:

Here are the steps you take to do this assignment:

Go to the CIS505 assignment page and download the folder, Ex2. It contains the scanner (a23lex), parser (a23parse), parser-generator library (ply), heap (heapmodule), starting interpreter (interpret), and driver (run). Double-click the run icon or type python run.py to run the prototype system and test the example at the top of this sheet.
Within heapmodule.py, replace the active-namespace variable, ns, by an activation stack, and write functions that push a handle onto the activation stack, pop the stack, check if the stack is empty, and return the top handle. Modify activeNS, initializeHeap, and printHeap so that they use the stack (and not ns). Test that the system still executes the starter test case ok. (Hint: read the notes at http://people.cis.ksu.edu/~schmidt/505f14/pythonstrucs.html to see how to use a Python list like a stack.)
Next, in interpret.py, add integer declarations to interpretDTREE. For int I = E, do this: compute the value of E in the active namespace (the namespace whose handle lies at the top of the activation stack). Make certain that variable I is not already declared in the active namespace. (If it is already declared, then it's an error that prints a message and stops execution.) Finally, bind I to E's value in the active namespace. It will be simplest if you write a new function, declare, add it to heapmodule.py, and call it to declare a new variable.
IMPORTANT: modify the interpreter to enforce that the variable that appears in position L of L = E is already declared. (Otherwise, it's an error that prints a message and stops execution.) It will be simplest if you modify the update function in heapmodule.py to do the check.
Next, implement procedure declarations in interpretDTREE: For proc I(IL): CL end, do this: bind I to the handle of a newly allocated closure (the closure is implemented as a dictionary) that holds IL, CL, and the handle to the active namespace. (See the CIS505 notes, Section 5.2.1, for details. Study carefully the example at the top of this assignment sheet.) Make certain that variable I is not already declared in the active namespace (Otherwise, it's an error that prints a message and stops execution).
Now, implement procedure call in interpretCTREE: For L(EL), do these steps, as described in the CIS505 notes: (i) Compute the meaning of L, verify that the meaning is the handle to a procedure closure, and extract from that closure these parts: IL, CL, and parentns link. (If L is not bound to a handle of a proc closure, it's an error that stops execution.) (ii) evaluate EL to a list of values (iii) Allocate a new namespace. (iv) Within the new namespace, bind parentns to the handle extracted from the closure; bind the values from EL to the corresponding names in IL. (Make certain that the number of arguments in EL equals the number of parameters in IL. Otherwise, it's an error that prints a message and stops execution). (v) Push the new namespace's handle onto the activation stack, execute CL, and upon completion pop the activation stack.
Since the intepreter uses multiple namespaces, you must implement a smarter lookup algorithm: within interpretLTREE, when searching for a variable, V, look in the active namespace first; if V is not there, then look for V in the parentns namespace. (See the CIS505 lecture notes, Section 5.2.1.)

Remember to document appropriately your modified interpreter.

Part B: multiple levels of declarations

Don't start this unless Part A is completely finished correctly.

C, Java, and C# artificially restrict programs to have just two levels of declaration nesting: local and global. This is done so the compiler can work with a simple storage layout. Other object languages have no such restriction. In Part B, you extend the language so there are multiple levels of declarations.

Here is an example: procedure p owns local declarations, z and q, and it can reference the global variable x (and itself!). Within q, there will be three levels of declarations, which you see when q is called:

int x = 2;
proc p(y):
   int z = 3;
   proc q(m) : print m;  x = ((m + y) + z) end;
   q((x + z));
end;
p(9);

Here is the execution: Parse tree: [[['int', 'x', '2'], ['proc', 'p', ['y'], [['int', 'z', '3'], ['proc', 'q', ['m'], [], [['print', ['deref', 'm']], ['=', 'x', ['+', ['+', ['deref', 'm'], ['deref', 'y']], ['deref', 'z']]]]]], [['call', 'q', [['+', ['deref', 'x'], ['deref','z']]]]]]], [['call', 'p', ['9']]]] Execution: 5 activation stack = ['h0', 'h2', 'h4'] heap = { h0 : {'parentns': 'nil', 'p': 'h1', 'x': 2} h1 : {'body': [['call', 'q', [['+', ['deref', 'x'], ['deref', 'z']]]]], 'params': ['y'], 'type': 'proc', 'link': 'h0', 'locals': [['int', 'z', '3'], ['proc','q', ['m'], [], [['print', ['deref', 'm']], ['=', 'x', ['+', ['+', ['deref', 'm'], ['deref', 'y']], ['deref', 'z']]]]]]} h2 : {'y': 9, 'parentns': 'h0', 'z': 3, 'q': 'h3'} h3 : {'body': [['print', ['deref', 'm']], ['=', 'x', ['+', ['+', ['deref', 'm'], ['deref', 'y']], ['deref', 'z']]]], 'params': ['m'], 'type': 'proc', 'link':'h2', 'locals': []} h4 : {'parentns': 'h2', 'm': 5} } Successful termination. activation stack = ['h0'] heap = { h0 : {'parentns': 'nil', 'p': 'h1', 'x': 17} h1 : {'body': [['call', 'q', [['+', ['deref', 'x'], ['deref', 'z']]]]], 'params': ['y'], 'type': 'proc', 'link': 'h0', 'locals': [['int', 'z', '3'], ['proc','q', ['m'], [], [['print', ['deref', 'm']], ['=', 'x', ['+', ['+', ['deref', 'm'], ['deref', 'y']], ['deref', 'z']]]]]]} h2 : {'y': 9, 'parentns': 'h0', 'z': 3, 'q': 'h3'} h3 : {'body': [['print', ['deref', 'm']], ['=', 'x', ['+', ['+', ['deref', 'm'], ['deref', 'y']], ['deref', 'z']]]], 'params': ['m'], 'type': 'proc', 'link':'h2', 'locals': []} h4 : {'parentns': 'h2', 'm': 5} } Press Enter key to terminate
When q is called, its namespace, h4, holds its parameter, m, and a link to h2, which holds y, z, and q, and a link to h0, which holds x and p. All these variables are visible to q's code.

The language's syntax for declarations now looks like this:

===================================================

D ::=  int I = E  |  proc I ( IL ) : DL CL end

===================================================

The local declarations DL are evaluated when the procedure is called, just before the procedure's CL is computed. The operator trees for declarations look like this:

===================================================

DTREE ::=  ["int", ID, ETREE]  
        |  ["proc", ID, ILIST, DLIST, CLIST]  
           # Note:  DLIST holds the local decs; it might be empty, too.

===================================================

You revise the code in interpretDTREE so that it builds a closure that saves the local declarations. When the procedure is called in interpretCTREE, you evaluate the local declarations just before you execute the procedure's commands. These steps are easy.

The interesting part is improving variable lookup in interpretLTREE so that it uses a loop to follow the parentns links to locate nonlocal variables. Be careful.

Testing

The Ex2 folder contains a file of test cases that you must use. Use at least these tests to check your implementation. You might devise additional tests to see if the interpreter detects program errors and prints appropriate messages.

IMPORTANT: For each successful test case, you must copy the test case and all its output from the command window and paste it into a file named tests.txt. You will not receive credit for the tests unless you submit them in the tests.txt file.

If you do not know how to paste into a command window or how to copy and paste from a command window, please ask me! It is not hard to learn.

Submission and grading

Make a folder, YourFirstName.YourLastName2, and place in it your modified versions of heapmodule.py and interpret.py and also tests.txt. (See the note just above about the tests!) Zip the folder into a .zip file and submit the .zip file to the CIS505 site on K-State Online. The teaching assistants will study your work and your tests and apply some additional tests before scoring the submission.