Copyright © 2010 David Schmidt

Chapter 7:
The functional paradigm: From an arithmetic core to ML


7.1 Arithmetic and algebra
7.2 A core language for list structures
    7.2.1 Control structures
7.3 Expression abstracts
7.4 Interpreter architecture for functional languages
7.5 Expression parameters
    7.5.1 Lambda abstractions
    7.5.2 Adding closures to the interpreter
    7.5.3 Recursively defined functions
7.6 Types and inductively defined data structures
    7.6.1 Parameter patterns
7.7 Conclusion


There is a paradigm of programming that dispenses with assignments and does computation with only expressible and denotable values --- answers are computed and named but never updated. This means loops and assignments are useless. Computation in this paradigm, the functional paradigm, proceeds like algebra, where one does equals-for-equals substitution to compute the answer that is bound (once and forever) to a variable name.

Examples of functional programming languages are Lisp, Scheme, ML, Ocaml, and Haskell. We study their principles in this chapter.


7.1 Arithmetic and algebra

The first programming language you ever learned, arithmetic, is a functional language --- its expressibles are the nonnegative ints, it has no denotables, and it has no storables. Here's its syntax:
===================================================

E: Expression
N: Numeral

E ::=  N  |  E1 + E2  |  E1 - E2  |  E1 * E2  |  E1 / E2
N ::=  sequences of digits

===================================================
Programs in this language are expressions, and computation does equals-for-equals substitution of answers for subexpressions, like this:
(2 * (4 - 3)) + 5  ==  (2 * 1) + 5  ==  2 + 5  == 7
You used equational laws, like 4 - 3 == 1 and 2 * 1 == 2, on expressions of numerals to compute the answer. This program's output (answer) is 7, because there are no subexpressions left to compute.

When you learned algebra, you learned that expressions can be named. The language of algebra is arithmetic extended with expression abstracts: The expressibles are ints, the denotables are the expressibles, and there are no storables:

===================================================

P: AlgebraProgram
D: Definition
E: Expression

P ::=  given D solve E
D ::=  I = E  |  D1 and D2
E ::=  N  |  E1 + E2 | ... | I

===================================================
Here is an example program in our ``algebra language'':
given
   y = 2 * x
   and
   z = y - 1
solve 
   y * z
The computation proceeds by calculation and substitution of answers for subexpressions until no more substitutions can be done:
y * z == y * (y - 1)                    # substitute for z
      == (y * y) - (1 * y)              # apply distributive law for *
      == (2 * x * 2 * x) - (1 * 2 * x)  # substitute for y
      == (2 * 2 * x * x) - (1 * 2 * x)  # apply commutative law for * 
      == (2 * 2 * x * x) - (2 * x)      # 1 * 2 == 2
      == (4 * x * x) - (2 * x)          # 2 * 2 == 4
In algebra, we apply calculation laws that apply to expressions (e.g. E1 * E2 == E2 * E1), and we calculate on unknowns (e.g., x). We also replace a name, like y, with the expression it names --- the equation, y = 2 * x, defines a new law we can use.

More generally in algebra, a Definition is any equation that contains one or more variables, e.g., 3 + y = (2 * x) + (2 + 1) is a definition. The calculation laws of algebra let us compute that y = 2 * x and also x = y / 2, as needed.

Can we make algebra ``grow up'' into a full-fledged programming language that has the same flavor?


7.2 A core language for list structures

Algebra is for playing ``number games.'' Many computing problems are ``data-structure games,'' where we assemble and disassemble data structures like stacks, queues, trees, and tables. Rather than using storage locations to hold the elements that we extract and insert, we can calculate directly on the entire data structure. This is the motivation for the first functional programming language, Lisp, a language for building lists and trees of words.

Here is a core language that resembles core Lisp; the expressibles are atoms and lists. For the moment, there are no denotables and no storables:

===================================================

E: Expression
A: Atom (words)

E ::=  A  |  nil  | cons E1 E2  |  head E  |  tail E |  ifnil E1 then E2 else E3
A ::= strings of letters

===================================================
An atom is a string, like "abc". nil is a list with nothing in it. In Python and ML, this is []. We will occasionally use [] for nil in our calculations. (We will follow a classic Lisp convention and treat nil and "nil" as the same value(!) If this sounds wierd to you, don't worry about it....)

cons places the value denoted by E1 on the front of the list denoted by E2, e.g., cons "b" (cons "a" nil) evaluates to a two-element list, ["b","a"]. (You can think of cons as ``stack push.'') We will use the ["b","a"] notation in our calculations.

In ML, cons is written ::, for example, "b" :: ( "a" :: [] ) assembles ["b","a"].

head returns the front element of a list, e.g., head (cons "b" (cons "a" nil)) computes to "b". (It is like ``stack top.'') In ML, head is spelled hd.

tail returns a new list missing the former front element, e.g., tail (cons "b" (cons "a" nil)) is cons "a" nil, which is the one-element list, ["a"]. (It is like ``stack pop.'') In ML, tail is spelled tl.

The ifnil-expression is a control structure that chooses to compute E2 or E3 depending on whether or not E1 computes to an empty list. (Note: when E1 computes to a non-nil atom, this is not an empty list, so the else-arm is selected.)

This language does ``list arithmetic,'' e.g.,

ifnil (cons "a" nil)
  then  nil
  else  head (tail (cons "b" (cons "c" nil)))

== head (tail (cons "b" (cons "c" nil)))

== head (cons "c" nil)  ==  "c"
The previous example used these equational laws:
===================================================

ifnil nil then E1 else E2  ==  E1

ifnil (cons E1 E2) then E3 else E4  ==  E4

head (cons E1 E2)  ==  E1

tail (cons E1 E2)  ==  E2

===================================================
So we have an ``arithmetic'' for lists, based on these equations.

If we use ML-style [...] notation for list values, we can rewrite the above expression as a computer would do it:

ifnil (cons "a" nil)
  then nil
  else head (tail (cons "b" (cons "c" nil)))

== ifnil ["a"] then nil else head (tail (cons "b" (cons "c" nil)))

== head (tail (cons "b" (cons "c" nil)))   # chose the else-arm

== head (tail (cons "b" ["c"]))  # evaluate arguments first

== head (tail ["b","c"])

== head ["c"] == "c"
This next example shows we can embed a conditional expression inside an expression:
cons "a" (ifnil nil then nil else (cons "b" nil))

== cons "a" nil  ==  ["a"]
The language lets us mix atoms and lists within the same list, e.g., this list
cons (cons "b" (cons "c" nil)) (cons "a" (cons nil nil))
computes to the data structure, [["b","c"], "a", []]. In contrast, the ML language has a data-typing system that makes lists behave like Java's arrays --- all the elements must have the same data type --- we cannot mix atoms and lists in ML.

Finally, Lisp allows cons to glue together a list to an atom or an atom to an atom, like cons "a" "b" and cons nil "a". These data structures are called dotted pairs, and you can think of them like Python tuples.

The above examples display the underlying principle of the functional paradigm: computation is expression simplification, like in algebra, with equals-for-equals substitution of subexpressions within a program.


7.2.1 Control structures

When we compute upon arithmetic expressions, we don't think of control structures. This is because control structures are primarily used to sequence the updates to a file or storage. But the previous examples showed how an if-structure lets us choose which list expression to compute, and it will be useful to control which expressions are evaluated when we process input data or input parameters.

The interactive version of ML uses a weak sequential control: After an expression is computed to its answer, the answer is saved in a temporary variable, named it. The expression that evaluates next can reference it, and once that expression finishes, its answer becomes the new value of it. Here is an example of a sequence of three expressions to simpify, separated by semicolons:

cons "a" nil;  head it;  cons it (cons it nil)
This compound expression computes to ["a","a"], because the first expression, cons "a" nil, computes to ["a"], so the next expression, head it, computes to "a" (because it names ["a"], and the last expression computes to ["a","a"] (because it names "a").


7.3 Expression abstracts

Let's give names to expressions. (These are expression abstracts.)
===================================================

D: Definition
E: Expression
A: Atom
I: Identifier

E ::=  A  |  nil  |  cons E1 E2  |  head E  |  tail E 
    |  ifnil E1 then E2 else E3  |  let D in E end  |  I

D ::=  I = E 

===================================================
I = E binds identifier I to expression E. Now, whenever we mention I, it means E and can be substituted by E, equals for equals. I is not a location in storage, it is a constant value, set just once, just like ``final variables'' in Java, just like variables in algebra.

Here is an example:

let x = nil  in
let y = cons "a" x  in
cons x (tail y) 
This looks like the algebra example we saw at the start of the chapter, and it computes as expected:
let x = nil  in
let y = cons "a" x  in
cons x (tail y)  

== let x = nil  in
   cons x (tail (cons "a" x))    # substitute for  y 

== cons nil (tail (cons "a" nil))  # substitute for x

==  ... == cons nil nil   #  ==  [[]]
The order that we substitute for occurrences of x and y does not matter, just like in algebra. The equational law for the let-construction is written by logicians like this:
===================================================

let I = E1 in E2  ==  [E1/I]E2

===================================================
where [E1/I]E2 represents expression E2 changed so that all its occurrences of I are changed to E1. In the above example, we did this:
let x = nil  in
let y = cons "a" x  in
cons x (tail y)

== let x = nil in [(cons "a" x)/y](cons x (tail y)

== let x = nil in cons x (tail (cons "a" x))

== [nil/x](cons x (tail (cons "a" x)))  ==  cons nil (tail (cons"a" nil))

== cons nil nil
but we could have done the substitutions in the other order, too:
let x = nil  in
let y = cons "a" x  in
cons x (tail y)

== [x/nil](let y = cons "a" x  in  cons x (tail y))

== let y = cons"a" nil in cons nil (tail y)

== [cons "a" nil/y](cons nil (tail y)) 

== cons nil (tail (cons "a" nil))  ==  cons nil nil

Because let definitions can be embedded anywhere, we can write complex expressions like this:

tail (let x = nil in cons x (let y = "a" in cons y x))
It is difficult to see how an interpreter for the language will maintain a namespace or a stack of namespaces to remember which names are visible to which subphrases of the expression. We will study this soon. For now, we apply the algebra laws stated above and simplify the expression to its answer:
===================================================

tail (let x = nil in cons x (let y = "a" in cons y x))

== tail (let x = nil in cons x (cons "a" x))

== tail (cons nil (cons "a" nil))

== cons "a" nil   # == ["a"]

===================================================

Here is a more delicate example, where we redefine x in nested blocks. (Perhaps we shouldn't allow this, but it is similar to writing a procedure that contains a local variable of the same name as a global variable.) When we try substitution, we get into trouble:

===================================================

let x = "a"  in
    let y = cons x nil  in
        let x = nil  in
           cons y x

===================================================
If we substitute for y first, we appear to get
let x = "a"  in
   let x = nil  in
      cons (cons x nil) x
This is incorrect --- it confuses the two definitions of x and there is no way we will obtain the correct answer, cons (cons "a" nil) nil.

The example displays a famous problem that dogged 19th-century logicians. The solution is, when we substitute for an identifier, we never allow a clash of two definitions --- we must rename the inner definition, if necessary. In the earlier example, if we substitute for y first, we must eliminate the clash between the two defined xs by renaming the inner one:

===================================================

let x = "a"  in
    let y = cons x nil  in
        let x' = nil  in
           cons y x'

===================================================
Now the subsitution can proceed with no problem. How can a computer implementation do this without rebuilding the program while it computes on it? We consider this question next.


7.4 Interpreter architecture for functional languages

We can give an interpreter semantics to the functional language as it stands. We start from the interpreter for object languages, since lists are objects. We also build namespaces and will create a new namespace for each occurrence of a let construction.

Here is the syntax of operator trees we will use; it closely matches the source syntax:

===================================================

ETREE ::=  ATOM  |  ["nil"]  |  ["cons", ETREE, ETREE] |  ["head", ETREE] 
        |  ["tail", ETREE]  |  ["ifnil", ETREE1, ETREE2, ETREE3]
        |  ["let", DTREE, ETREE]  |  ["ref", ID]
DTREE ::=  [ID, ETREE]
ATOM  ::=  a string of letters
ID    ::=  a string of letters, not including keywords

===================================================
For example, the expression,
let x = (cons "abc" nil)  in
let y = ifnil x  nil (head x) in
(cons y x)
has this operator tree:
["let", ["x", ["cons", "abc", ["nil"]],
        ["let", ["y", ["ifnil", ["ref", "x"],
                                ["nil"],
                                ["head", ["ref", "x"]]]],
                ["cons", ["ref", "y"], ["ref", "x"]]]]]
The interpreter will process the expression's operator tree in the standard way, opening its subparts and computing their meanings. In the process, each occurrence of a "cons" operator will create a new object in the interpreter's heap --- a cons cell (``cons-pair''). After all, a list is built using nil and multiple cons operations, and this is how the list is assembled in the heap: from "nil" and cons cells. When a let definition is processing, the binding is saved in a newly created namespace, which is linked to the currently active namespace, using the "parentns" link shown in the previous chapter for modelling procedure calls and superclasses in object languages.

Here is a drawing of the heap storage for the above example when the interpreter has finished computing the final answer, cons "abc" (cons "abc" nil): The final answer is actually the handle, δ, but this is ``pretty printed'' as the list structure, as cons "abc" (cons "abc" nil). (Can you see why?)


The namespace used to evaluate the phrase, cons y x is γ, which holds the binding for y and a link to the namespace that holds the binding for x. Notice that the final answer shares the cons cell at α that is used by x. This sharing of substructure is safe because there is no assignment to alter objects already stored in the heap. This is an advantage of using functional languages over imperative languages for object management --- there is less copying of data.

Here is the code for the interpreter:

===================================================

"""Interpreter for functional language with  cons  and  simple let.

Here is the syntax of operator trees interpreted:

ETREE ::=  ATOM  |  ["nil"]  |  ["cons", ETREE, ETREE] |  ["head", ETREE] 
        |  ["tail", ETREE]  |  ["ifnil", ETREE1, ETREE2, ETREE3]
        |  ["let", DTREE, ETREE]  |  ["ref", ID]

DTREE ::=  [ID, ETREE]

ATOM  ::=  a string of letters
ID    ::=  a string of letters, not including keywords

"""

### HEAP:

heap = {}
heap_count = 0  # how many objects stored in the heap

"""The program's heap --- a dictionary that maps handles
                          to namespace or cons-pair objects.

      heap : { (HANDLE : NAMESPACE or CONSPAIR)+ }
        where 
         HANDLE = a string of digits
         NAMESPACE = {IDENTIFIER : DENOTABLE} + {"parentns" : HANDLE or "nil"} 
            that is, each namespace must have a "parentns" field
         CONSPAIR = (DENOTABLE, DENOTABLE)
         DENOTABLE = HANDLE or ATOM or "nil"
         ATOM = string of letters

   Example:
     heap = { "0": {"w": "nil", "parentns": "nil"},
              "1": {"x":"ab", "parentns": "0"},
              "2": {"z":"3", "parentns":"1"},
              "3": ("ab","nil"), 
              "4": {"y": "5",  "parentns":"2"},
              "5": ("3","3")
            }
     heap_count = 6
        is an example heap, where handles "0","1","2","4"  name  namespaces
        which hold definitions for  w,x,z,y,  due to  let  expressions.
        Handles "3" and "5" name cons-cells that are constructed due to
        a use of  cons.
        This example heap might have been constructed by this expression:
         let w = nil  in
         let x = "ab" in
         let z = cons x w in
         let y = cons z z in
          ...

       The values stored are  w = [],  x = "ab",  z = ["ab"], y = [["ab"], "ab"]
"""

### ASSOCIATED MAINTENANCE FUNCTIONS FOR THE  heap:

def isHandle(v) :
    """checks if  v  is a legal handle into the heap"""
    return isinstance(v, str) and v.isdigit() and  int(v) < heap_count


def allocate(value) :
    """places  value  into the heap with a new handle
       param:  value - a namespace or a pair
       returns the handle of the newly saved value
    """
    global heap_count
    newloc = str(heap_count)
    heap[newloc] =  value
    heap_count = heap_count + 1
    return newloc


def deref(handle):
    """ looks up a value stored in the heap: returns  heap[handle]"""
    return heap[handle]


### MAINTENANCE FUNCTIONS FOR NAMESPACES:


def lookupNS(handle, name) :
    """looks up the value of  name  in the namespace named by  handle
       If  name  isn't found, looks in the  parentns and keeps looking....
       params: a handle and an identifier 
       returns: the first value labelled by  name  in the chain of namespaces 
    """
    if isHandle(handle):
        ns = deref(handle)
        if not isinstance(ns, dict):
            crash("handle does not name a namespace")
        else :
            if name in ns :
                ans = ns[name]
            else :   # name not in the most local ns, so look in parent:
                ans = lookupNS(ns["parentns"], name)   
    else :
        crash("invalid handle: " + name + " not found")
    return ans


def storeNS(handle, name, value) :
    """stores  name:value  into the namespace saved at  heap[handle]
    """
    if isHandle(handle):
        ns = deref(handle)
        if not isinstance(ns, dict):
            crash("handle does not name a namespace")
        else :
            if name in ns :
                crash("cannot redefine a bound name in the current scope")
            else : 
                ns[name] = value


#########################################################################

# See the end of program for the driver function,  interpretPTREE


def evalETREE(etree, ns) :
    """evalETREE computes the meaning of an expression operator tree.

      ETREE ::=  ATOM  |  ["nil"]  |  ["cons", ETREE, ETREE] |  ["head", ETREE] 
        |  ["tail", ETREE]  |  ["ifnil", ETREE1, ETREE2, ETREE3]
        |  ["let", DTREE, ETREE]  |  ["ref", ID]

       post: updates the heap as needed and returns the  etree's value,
    """

    def getConspair(h):
        """dereferences handle  h  and returns the conspair object stored
           in the heap at  h
        """
        if isHandle(h):
            ob = deref(h)
            if isinstance(ob, tuple):  # a pair object ?
                ans = ob
            else :
                crash("value is not a cons pair")
        else :
            crash("value is not a handle")
        return ans


    ans = "error"
    if isinstance(etree, str) :  #  ATOM
        ans = etree 
    else :
        op = etree[0]
        if op == "nil" :
            ans = op

        elif op == "cons" :
            arg1 = evalETREE(etree[1], ns)
            arg2 = evalETREE(etree[2], ns)
            ans = allocate((arg1,arg2))   # store new conspair in heap

        elif op == "head" :
            arg = evalETREE(etree[1], ns)
            ans, tail = getConspair(arg)
 
        elif op == "tail" :
            arg = evalETREE(etree[1], ns)
            head, ans = getConspair(arg)

        elif op == "ifnil" :
            arg1 = evalETREE(etree[1], ns)
            if arg1 == "nil" :
                ans = evalETREE(etree[2], ns)
            else :
                ans = evalETREE(etree[3], ns)

        elif op == "let" :
            newns = evalDTREE(etree[1], ns)  # make new ns of new definitions
            ans = evalETREE(etree[2], newns)  # use new ns to eval ETREE
            # at this point,  newns  isn't used any more, so forget it!

        elif op == "ref" :
            ans = lookupNS(ns, etree[1])

        else :  crash("invalid expression form")
    return ans


def evalDTREE(dtree, ns) :
    """computes the meaning of a new definition and stores
       the  ID, meaning  binding in a new namespace, chained to  ns
           DTREE ::=  [ID, ETREE]
       returns a handle to the new namespace
    """
    name = dtree[0]
    expr = dtree[1]
    value = evalETREE(expr, ns)
    newns = allocate({name:value, "parentns": ns})
    return newns


###########################

def crash(message) :
    """pre: message is a string
       post: message is printed and interpreter stopped
    """
    print message + "! crash! ns=", ns, "heap=", heap
    raise Exception   # stops the interpreter


def prettyprint(value):
    if isHandle(value):
        v = deref(value)
        if isinstance(v, tuple):
           ans = "(cons " + prettyprint(v[0]) + " " + prettyprint(v[1]) +")"
        else :
           ans = "HANDLE TO DICTIONARY AT " + v
    else :
        ans = value
    return ans
        

# MAIN FUNCTION ###########################################################

def evalPGM(tree) :
    """interprets a complete program 
       pre: tree is an ETREE
       post: final values are deposited in heap
    """
    global heap, heap_count
    # initialize heap and ns:
    heap = {}
    heap_count = 0

    ans = evalETREE(tree, "nil")
    print "final answer =", ans
    print "pretty printed answer =", prettyprint(ans)
    print "final heap ="
    print  heap

    raw_input("Press Enter key to terminate")

===================================================
Install this code and run it with Python. Try at least these test cases:
evalPGM( ["cons", "abc", ["nil"]] )

evalPGM( ["head", ["cons", "abc", ["nil"]]] )

evalPGM( ["ifnil", ["nil"], ["head", ["cons", "abc", ["nil"]]], "def"] )

evalPGM( ["let", ["x", "abc"], ["cons", ["ref", "x"], ["nil"]]] )

evalPGM( ["let", ["x", ["cons", "abc", ["nil"]]],
         ["cons", ["ref", "x"], ["ref", "x"]]])

evalPGM( ["let", ["x", ["cons", "abc", ["nil"]]],
         ["let", ["y", "nil"], ["cons", ["ref", "x"], ["ref", "y"]]] ])
Study the heap layouts as well as the answers computed for each case.

Interpreter with ML-lists and multiple-let definitions

We can make the functional language look more like ML if we allow ML-style lists, that is, expressions of form, [v0, v1, ..., vm], and also let-constructions that define multiple bindings at once. Here is an example:
let val x = ["abc", "de"]
    val y = cons "f" x
in
    head (let val z = [x, y] in tail z)
This computes x == ["abc", "de"], y == ["f", "abc", "de"], and z == [["abc", "de"], ["f", "abc", "de"]] to obtain the final answer, "f".

Here is the storage layout that our revised interpreter will compute for this example:


Now, ML-lists are saved as heap objects, as well as namespaces and cons cells. (Cons cells are retained because they provide a quick, simple way of attaching a new element to the front of a list object already stored in the heap --- see handle γ in the picture, which represents the list, ["f", "abc", "de"].

The language's syntax is extended like this:

===================================================

E ::=  A  |  nil  |  [ E* ]
                  where  E*  means zero or more  E  phrases, separated by commas
    |  cons E1 E2  |  head E  |  tail E 
    |  ifnil E1 then E2 else E3  |  let D in E end  |  I

D ::=  val I = E  |  D1 D2

===================================================
Now we can write lists like [v0, ..., vm] and blocks like let val I1 = E1 ... val Im = En in E.

Here is the modified interpreter. Its heap stores namespaces, cons cells, and lists. Its code is almost identical to the one in the previous subsection but is reproduced in its entirely so that you can read the revised comments at the top and you do not have to cut-and-paste code fragments to assemble it. (Look for the changes to evalETREE, evalDTREE, and prettyprint.)

===================================================

"""Interpreter for functional language with cons, lists, and
     multi-definition let.
   Here is the syntax of operator trees interpreted:

ETREE ::=  ATOM  |  ["nil"]  |  ["cons", ETREE, ETREE] |  ["head", ETREE] 
        |  ["tail", ETREE]   |  ["list",  ETREE*]
        |  ["ifnil", ETREE1, ETREE2, ETREE3]
        |  ["let", DTREE, ETREE]  |  ["ref", ID]

DTREE ::=  [ [ID, ETREE]+ }

ATOM  ::=  a string of letters
ID    ::=  a string of letters, not including keywords

"""

### HEAP:

heap = {}
heap_count = 0  # how many objects stored in the heap

"""The program's heap --- a dictionary that maps handles
                          to namespaces or cons-cells or multicell lists.

      heap : { (HANDLE : NAMESPACE or CONSPAIR or LIST)+ }
        where 
         HANDLE = a string of digits
         NAMESPACE = {IDENTIFIER : DENOTABLE} + {"parentns" : HANDLE or "nil"} 
            that is, each namespace must have a "parentns" field
         CONSPAIR = (DENOTABLE, DENOTABLE)
         LIST = [ DENOTABLE* ]
         DENOTABLE = HANDLE or ATOM or "nil"
         ATOM = string of letters

   Example:
     heap = { "0": {"x":"ab", "y":"2", "z":"1", "parentns": "nil"},
              "1": ["ab", "ab"],
              "2": ("cd", "1"), 
              "3": {"w": "4", "parentns": "0"},
              "4": ["ab","2","1"]
            }
     heap_count = 5

     This example heap might have been constructed by this expression:
         let x = "ab"
             z = [x, x]
             y = cons "cd" z
         in ...
            let w = [x,y,z]  in ...
             
"""

### ASSOCIATED MAINTENANCE FUNCTIONS FOR THE  heap:

def isHandle(v) :
    """checks if  v  is a legal handle into the heap"""
    return isinstance(v, str) and v.isdigit() and  int(v) < heap_count


def allocate(value) :
    """places  value  into the heap with a new handle
       param:  value - a namespace or a pair
       returns the handle of the newly saved value
    """
    global heap_count
    newloc = str(heap_count)
    heap[newloc] =  value
    heap_count = heap_count + 1
    return newloc


def deref(handle):
    """returns  heap[handle]"""
    return heap[handle]


### MAINTENANCE FUNCTIONS FOR NAMESPACES:


def lookupNS(handle, name) :
    """looks up the value of  name  in the namespace named by  handle
       If  name  isn't found, looks in the  parentns
       params: a handle and an identifier 
       returns: the first value labelled by  name  in the chain of namespaces 
    """
    if isHandle(handle):
        ns = deref(handle)
        if not isinstance(ns, dict):
            crash("handle does not name a namespace")
        else :
            if name in ns :
                ans = ns[name]
            else :   # name not in the most local ns, so look in parent:
                ans = lookupNS(ns["parentns"], name)   
    else :
        crash("invalid handle: " + name + " not found")
    return ans


def storeNS(handle, name, value) :
    """stores  name:value  into the namespace at  heap[handle]
    """
    if isHandle(handle):
        ns = deref(handle)
        if not isinstance(ns, dict):
            crash("handle does not name a namespace")
        else :
            if name in ns :
                crash("cannot redefine a bound name in the current scope")
            else : 
                ns[name] = value


##########################################################################
### Here is the syntax of operator trees interpreted by this interpreter:

"""

ETREE ::=  ATOM  |  ["nil"]  |  ["cons", ETREE, ETREE] |  ["head", ETREE] 
        |  ["tail", ETREE]   |  ["list",  ETREE*]
        |  ["ifnil", ETREE1, ETREE2, ETREE3]
        |  ["let", DTREE, ETREE]  |  ["ref", ID]

DTREE ::=  [ [ID, ETREE]+ }

ATOM  ::=  a string of letters
ID    ::=  a string of letters, not including keywords
"""

#########################################################################

# See the end of program for the driver function,  interpretPTREE


def evalETREE(etree, ns) :
    """evalETREE computes the meaning of an expression operator tree.

      ETREE ::=  ATOM  |  ["nil"]  |  ["cons", ETREE, ETREE] |  ["head", ETREE] 
        |  ["tail", ETREE]  |  ["list" ETREE*]
        |  ["ifnil", ETREE1, ETREE2, ETREE3]
        |  ["let", DTREE, ETREE]  |  ["ref", ID]

       post: updates the heap as needed and returns the  etree's value,
    """

    def getCons(whichpart, h):
        """dereferences handle  h  and returns the conspair component 
           in the heap at  h
           parms: whichpart - is "head" or "tail"
                  h - handle to the conspair/list to be opened
           returns:  the value of  whichpart  of the conspair/list at  h
        """
        if isHandle(h):
            ob = deref(h)
            if isinstance(ob, tuple):  # a pair object ?
                if whichpart == "head" :
                    ans = ob[0]
                elif whichpart == "tail" :
                    ans = ob[1]
            elif isinstance(ob,list) and  ob != [] : # a nonempty list object ?
                # break the list into its head,tail pair:
                if whichpart == "head" :
                    ans = ob[0]
                elif whichpart == "tail" :
                    ans = allocate(ob[1:])  # must allocate new ob for tail )-:
            else :
                crash("value is not a cons pair")
        else :
            crash("value is not a handle")
        return ans
    # end function getCons

    ans = "error"
    if isinstance(etree, str) :  #  ATOM
        ans = etree 
    else :
        op = etree[0]
        if op == "nil" :
            ans = op

        elif op == "cons" :
            arg1 = evalETREE(etree[1], ns)
            arg2 = evalETREE(etree[2], ns)
            ans = allocate((arg1,arg2))   # store new conspair in heap

        elif op == "head" :
            arg = evalETREE(etree[1], ns)
            ans = getCons("head", arg)
 
        elif op == "tail" :
            arg = evalETREE(etree[1], ns)
            ans  = getCons("tail", arg)

        elif op == "list" :   # ["list", ETREE* ]
            args = etree[1:]
            newlist = []
            for e in args:   # eval each of the list elements and collect them
                newlist.append(evalETREE(e, ns))
            ans = allocate(newlist)   # store the new list into the heap

        elif op == "ifnil" :
            arg1 = evalETREE(etree[1], ns)
            if arg1 == "nil" or (isHandle(arg1) and deref(arg1) == []) :
                ans = evalETREE(etree[2], ns)
            else :
                ans = evalETREE(etree[3], ns)

        elif op == "let" :
            newns = evalDTREE(etree[1], ns)  # make new ns of new definitions
            ans = evalETREE(etree[2], newns)  # use new ns to eval ETREE
            # at this point,  newns  isn't used any more, so forget it!

        elif op == "ref" :
            ans = lookupNS(ns, etree[1])

        else :  crash("invalid expression form")
    return ans



def evalDTREE(dtree, ns) :
    """computes the meaning of a sequence of new definitions and stores
       the  ID, meaning  bindings in a new namespace
           DTREE ::=  [ [ID, ETREE]+ ]
       returns a handle to the new namespace
    """
    newns = allocate({"parentns": ns})  # create the new ns in the heap
    for bindingpair in dtree :  # add all the new bindings to the new ns
        name = bindingpair[0]
        expr = bindingpair[1]
        value = evalETREE(expr, newns)
        storeNS(newns, name, value)
    return newns

###########################

def crash(message) :
    """pre: message is a string
       post: message is printed and interpreter stopped
    """
    print message + "! crash! ns=", ns, "heap=", heap
    raise Exception   # stops the interpreter


def prettyprint(value):
    if isHandle(value):
        v = deref(value)
        if isinstance(v, tuple):
           subans1 = prettyprint(v[0])
           subans2 = prettyprint(v[1])
           if subans2[0] == "[" :
               if subans2 == "[]" :
                   ans = "[" + subans1 + "]"
               else :
                   ans = "[" + subans1 + ", " + subans2[1:]
           else :
               ans = "(cons " + subans1 + " " + subans2 + ")"
        elif isinstance(v, list):
           ans = "["
           for arg in v :
               ans = ans + prettyprint(arg) + ", "
           if ans[len(ans)-1] == " " :
               ans = ans[:len(ans)-2]
           ans = ans + "]"
        else :
           ans = "HANDLE TO DICTIONARY AT " + v
    elif value == "nil":
        ans = "[]"
    else : ans = value
        
    return ans

# MAIN FUNCTION ###########################################################

def evalPGM(tree) :
    """interprets a complete program 
       pre: tree is an ETREE
       post: final values are deposited in heap
    """
    global heap, heap_count
    # initialize heap and ns:
    heap = {}
    heap_count = 0

    ans = evalETREE(tree, "nil")
    print "final answer =", ans
    print "pretty printed answer =", prettyprint(ans)
    print "final heap =", heap

    raw_input("Press Enter key to terminate")

===================================================
Install the interpreter and test it on these examples:
evalPGM( ["list", "abc", "de"])

evalPGM( ["cons", "abc", ["list", "abc", "de"] ] )

evalPGM( ["tail", ["tail", ["list", "abc", "de"]] ])

evalPGM( ["let", [["x", ["list", "abc", "de"]]],
                 ["head", ["cons", ["ref", "x"], ["ref", "x"]]]] )

evalPGM( ["let", [["x", ["cons", "abc", ["list"]]],
                  ["y", "nil"]],
                 ["cons", ["ref", "x"], ["ref", "y"]]] )

evalPGM( ["let", [["x", ["list", "abc", ["nil"]]]],
                  ["y", ["ref", "x"]]],
                 ["cons", ["ref", "x"], ["ref", "y"]]] )


7.5 Expression parameters

Definitions can be parameterized; the result are functions:

===================================================

E ::=  ... |  [ E* ]  |  let D in E  |  I  |  I(E*)
D ::=  val I = E  |  fun I1(I2*) = E  |  D1 D2

===================================================
where E* means zero or more expressions, separated by commas and I* means zero or more identifiers, separated by commas.

When a function is defined, its code (expression) is saved. When the function is called, its arguments bind to its parameters, and the function's code evaluates. Here is a small example:

===================================================

let fun second(list) = let val rest = tail list
                       in  head rest
in
second(["a", "b", "c"])

===================================================

The argument binds to the parameter name when the function is called. The rewriting might go like this: (We have a better way to do it in a moment!)

===================================================

second(["a","b","c"])

== let val list = ["a","b","c"] in    # bind the argument to the parameter
   let val rest = tail list
   in  head rest

== let val rest = tail ["a","b","c"]  in   # subsititue for list
   head rest

== let val rest = ["b","c"]
   in  head rest

== head ["b","c"]  == "b"

===================================================
We will have much more to say about functions in a moment.


7.5.1 Lambda abstractions

The previous example calculation did not follow precisely the style we used so far. In particular, we should substitute the code for second into the position where second is referenced. Let's see where this leads us.

The definition,

fun second(list) = let val rest = tail list
                   in  head rest
can be written as an ordinary val-definition like this, by moving the parameter name to the right of the equals sign:
val second = (list) let val rest = tail list  
                    in  head rest
It is a tradition to place the word, lambda, in front of the parameter name, (list), so that the reader can identify it clearly:
===================================================

val second = lambda list : let val rest = tail list
                           in  head rest

===================================================
second is the same function, just formatted a little differently. Now it is clear that second is the name of the function code, lambda list : let val rest = tail list in head rest. Both the parameter name and the body are the function code.

Let's rework the algebraic calculation of the example in this format:

===================================================

let val second = lambda list: let val rest = tail list in head rest
in  second(["a","b","c"])

===================================================
We begin by substituting for second:
===================================================

(lambda list: let val rest = tail list in head rest)(["a","b","c"])

== let val rest = tail ["a","b","c"] in head rest   # substitute ["a","b","c"] for list

== let val rest = ["b","c"] in head rest

== head ["b","c"]  ==  "b"

===================================================
The expression, lambda list: ..., is the function code, divorced from its name. We copied the function code into the position where the function is referenced --- called. This is how substitution is supposed to work: you replace an identifier by the value it names. The binding of argument to parameter is defined by this equational rule:
===================================================

(lambda PARAM: BODY)(ARG)  ==  [ARG/PARAM]BODY

===================================================
That is, (lambda PARAM: BODY)(ARG) == let PARAM = ARG in BODY. Now, everything works smoothly.

The syntax we use for functions and function call can look like this:

===================================================

E ::=  ...  |  let D in E  |  I  |  E1(E2*)  |  lambda I*: E
D ::=  val I = E  |  D1 D2

===================================================
A function body, lambda I*: E, is an expression, just like nil or cons(E1,E2), because it can appear embedded within an expression:
cons "a" (lambda x: [x,x])(nil) 

== cons "a" [nil, nil]  == ["a", nil, nil]
The nameless function is is called the lambda abstraction --- it is a kind of abstract, a naming device, for the parameter. The lambda abstraction has a long, rich history, extending to the debates in 19th-century philosophy that let to the development of modern set theory and predicate logic. They also happen to be quite useful for computation!

Now the language's characteristic domains go as follows: the expressible values are atoms, lists, and functions on expressibles. The denotable values are exactly the expressibles. There are still no storable (updateable) values.


7.5.2 Adding closures to the interpreter

We can easily add lambda abstractions to the interpreters in the previous section --- we use closure objects, just like we did when we implemented procedures in an object language.

As before a closure object is a pair, consisting of function-code-plus-parent-namespace-pointer. A couple of pictures will make this clear. For this sample program:

===================================================

let val second = lambda list: head (tail list)  #(i)
in  let val x = ["a","b","c"]
    in  cons  second(x)  x   # (ii)

===================================================
The heap layout at point (i), where second is called and computes its answer, "b", looks like this:

second's value, saved in namespace α, is the closure object at handle β. The closure remembers the code for the function along with a link to the namespace of global names that can be seen by the function. When second was called, namespace ε was created to hold its parameter, list, and ε becomes the current namespace. Note that x, saved in namespace γ, is not visible to second's code.

Once second returns its answer, "b" (the head of the sublist at handle φ), we reach point (ii), and the current namespace reverts to γ, which lets us compute the final answer, the list named by handle &eta (that is, ["b","a","b","c"]):


Notice that the objects at ε and φ are no longer accessible to the main program, and a garbage-collector program can safely erase the ``garbage objects.''


7.5.3 Recursively defined functions

A function can restart itself by looking up its own definition; this is called recursion. (You can see this in the previous picture: the code for second can refer to itself if it wishes.) A recursive call executes exactly the same way as any other function call; no new machinery is needed.

By using recursion, we can write repetitive computations in the functional language. Here is an example that uses an input() command to read a sequence of values from the user and assemble them into a list, in reverse order:

===================================================

let fun makeList(ans) =        #  ans  binds to a list argument
        let val v = input()    #  input()  asks the user to type a value
        in  ifnil v then ans
            else makeList(cons v ans)   # cons the input value to  ans
                                        # and repeat the function

in  makeList(nil)  # start the recursions with an empty list

===================================================
An equational calculation of the example proceeds like this. (Say that the user will enter input values "a", "b", and nil to the program. This will construct the list, ["b","a"].)
===================================================

makeList(nil)

== let val ans = []  in   # bind argument to parameter name
   let val v = input() 
   in  ifnil v then ans
       else makeList(cons v ans)

== let val v = input()  in
   ifnil v then []
           else  makeList(cons v [])

# the user types input "a" :

== let v = "a"  in
   ifnil v then []
           else  makeList(cons v [])

== ifnil "a" then []
             else makeList(cons "a" [])

== makeList(cons "a" [])


== makeList(["a"])  ==  let val ans = ["a"] in    
                        let val v = input()
                        in  ifnil v then ans
                            else makeList(cons v ans)

                    ==  let val v = input() 
                        in  ifnil v then ["a"]
                            else makeList(cons v ["a"])

                    # the user types input "b":

                    ==  let val v = "b"
                        in  ifnil v then ["a"]
                            else makeList(cons v ["a"])

                    == makeList(cons "b" ["a"])

                    # restart, again:

                    == makeList(["b","a"]) == let val ans = ["b","a"] in
                                              let val v = input() 
                                              in  ifnil v then ans
                                                  else makeList(cons v ans)
                                               
                                           == let val v = input() 
                                              in  ifnil v then ["b","a"]
                                                  else makeList(cons v ["b","a"])
                                             # user types nil:

                                             == let val v == nil 
                                                in  ifnil v then ["b","a"]
                                                    else ...

                                             == ["b","a"]
                   == ["b","a"]
== ["b","a"]

===================================================
Each time makeList appears in the calculation, we substitute its code (actually, its lambda abstraction --- see the previous section!) for it.

Here is another example, a function that checks if value v is found within a list, list. If yes, then v is returned as the answer, else nil is returned:

===================================================

fun member(v, list) = ifnil list then nil
                      else if v == head(list) 
                           then  v
                           else  member(v, tail list)

===================================================
The function restarts to repeatly search inside all the tails of the list until it finds v (or fails when there are no more tails left to search).

Here is a function that glues (appends) two lists together:

===================================================

fun append(list1, list2) = ifnil list1 
                           then list2
                           else let val rest = append(tail list1, list2)
                                in  cons (head list1) rest

===================================================
append uses the trick of extracting all the head elements of all the tails within list1 to build the answer. This is a crucial technique to understand, so here is an example:
===================================================

let val x = ["a","b"]
in  append(x, ["c"])

== append(["a","b"], ["c"])

== let val list1 = ["a","b"]
       val list2 = ["c"] in
   ifnil list1 then ... else ...

== let val rest = append(["b"],["c"]) == let val list1 = ["b"] 
   in  cons("a", rest)                       val list2 = ["c"] in
                                         ifnil list1 then ... else ...

                                      == let val rest = append([],["c"]) == let val list1 = []
                                         in  cons "b" rest
                                                                                val list2 = ["c"] in
                                                                            ifnil list1 then ...
 
                                                                         == ["c"]
                                      == let val rest = ["c"]
                                         in  cons "b" rest

                                      == cons "b" ["c"]

                                      == ["b", "c"]
== let val rest = ["b","c"]
   in  cons "a" rest

== cons "a" ["b","c"]

== ["a", "b", "c"]

===================================================
This example starts function append three times to move one by one the elements in the first argument list into the second argument list.

Here is a function that uses append to build a new list that reverses the order of elements in an argument list:

===================================================

fun reverse(list) = ifnil list
                    then nil
                    else let val rest = reverse(tail list)
                         in  append(rest, [head list])

===================================================
A simpler coding of reverse uses an extra parameter (an ``accumulator parameter'') to collect the eventual answer:
===================================================

let fun reverse2(x, ans) = ifnil x
                           then ans
                           else reverse2(tail x,  cons (head x) ans)
in 
fun reverse(list) = reverse2(list, nil)

===================================================

You may find these little list games artificial and even a bit frustrating, but all of mathematics is based upon them! The reason is that a number, like 4, is coded as a list, ["a", "a", "a", "a"]. Function append defines addition on base-one numbers, and you should code functions that do multiplication and subtraction. If you have more enthusiasm, use lists to encode base-two numbers, where 4 is represented by the list, ["one", "zero", "zero"]. You should code the standard arithmetic operations for base-two numbers and design a CPU and list-base memory for an emulated computer.


7.6 Types and inductively defined data structures

The language we have developed has two types of data: atoms and lists. (If we allow lambda abstractions, there is a third type --- functions.) Some functional languages (e.g., Lisp and Scheme) allow arbitrary combinations of values, e.g., lists that mix atoms and lists, and allow operations on all possible values (e.g., equality comparison of an atom versus a list). Other languages, like ML, Ocaml, and Haskell, require a Pascal- or Java-like type system, where each value has a specific type, each list's elements have one and the same type, and only values of the same type can be compared for equality.

In this section, we add data typing to the language's syntax and see how it encourages us to define new types of data structure also.

Here is the syntax of types of the core language:

===================================================

T: Type

T ::=  Atom  |  T List

===================================================
We will require that every expressible value must conform to a specific type. For example,
"a"  has type  Atom

cons("a", nil),  which evaluates to  ["a"],   has type  Atom List

cons(nil, cons("a", nil), cons("a", cons("b", nil))),
  which evaluates to  [[], ["a"], ["a","b"]]   has type (Atom List)List
These requirements can be formalized by logic-rule typing laws, defined on the operators of the language:
===================================================


A : Atom               E1 : T    E2 : T List
                     ------------------------
                         cons(E1, E2) : T List

E : T List             E : T List
-----------          ---------------
head(E) : T          tail(T) : T List


E1 : T List   E2 : T'   E3 : T'
---------------------------------
 ifnil E1 then E2 else E3 : T'

===================================================
A question remains: What is the type of nil? The answer is, T' List, for any type T' whatsoever. So, nil has ``many types,'' depending on where it is inserted in an expression. (See the earlier examples.) For this reason, its ``typing rule'' goes:
===================================================

nil : T List

===================================================
where you can fill in T as you wish.

If we have types, we can can have ``type abstracts,'' where we invent names for types we define. This idea was used brilliantly by Rod Burstall in the language, Hope, and adapted by Luca Cardelli into ML. Here is an example of a type abstract that defines a data type of binary trees that hold atoms at their nodes and leaves:

===================================================

type Tree = leaf of Atom  |  node of Atom * Tree * Tree

===================================================
The identifiers, leaf and node, are constructors, for constructing trees, like cons is a constructor for constructing lists.

Here are some expressions that have data type Tree:

===================================================

leaf("a"),  which represents a leaf tree,  "a"

node("b", leaf("a"), leaf("c")), which represents the node tree,  "b"
                                                                 /   \
                                                                "a"  "c"
let val tree1 = leaf("a") 
    tree2 = node("b", tree1, tree1) 
in  node("c", tree1, tree2)      which represents     "c"
                                                     /   \
                                                   "a"    "b"
                                                         /  \
                                                       "a"  "a"

   or, for that matter, represents    "c"
                                     /   \ 
                                    +   "b"
                                    |   /  \
                                    |   |  |
                                   "a"<--<-+    where leaf("a") is shared


===================================================
Because the type is defined in terms of itself (that is, trees can hold other, smaller trees), it is called inductively defined. This means we can assemble trees of arbitrary depth, just like we can construct lists of arbitrary length. Indeed, the lists already built into the core language are inductively defined, somewhat like this:
type List = nil |  cons of Expressible * List

where  Expressible = Atom + List

User-defined types define schemas for data-structure building, much like classes do for object-oriented programming. Here are some types for a library:

type Item =  book of Atom * Atom    # book(title,author)
          |  dvd of Atom            # dvd(title)

type Holding = key of Atom * Item   # key(idnumber, item)

type Database = Holding List
A major strength of functional languages is the ability to construct complex inductively defined data structures and process them. If you review the chapter on grammars, interpreters, and parsers, you see inductively-defined data types everywhere.


7.6.1 Parameter patterns

Every data structure should have control structures that are oriented towards processing the data structure. For the functional language, the control structure is embedded as argument patterns in the functions we write. For example, for
===================================================

type Tree = leaf of Atom  |  node of Atom * Tree * Tree

===================================================
We might write a function that collects into a list the atoms in a Tree:
===================================================

fun collect(tree) =
     if isinstance(tree, leaf(x))
        then [x]      # x binds to the atom held within  leaf(x)
     else if isinstance(tree, node(x, y, z))
        then let val left = collect(y)      # x, y, z  bind to the components of
                 val right = collect(z)     #  the node
             in  cons x append(left, right)

===================================================
The isinstance operator tries to match its first argument to the pattern in its second argument, and if successful, binds the names in the pattern to the components within the first argument. But there is a prettier way to write this, by merging isinstance with the parameter name, tree, like this:
===================================================

fun collect(leaf(x)) = [x]
  | collect(node(x,y,z)) = cons  x  append(collect(y), collect(z))

===================================================
There is a tremendous amount of computation embedded within these two lines of coding! Here is an example:
===================================================

collect(node("a", leaf("b"), leaf("c")))

== let val x = "a" 
       val y = leaf("b") 
       val z = leaf("c") in    # these three bindings are made by the pattern match
   cons x append(collect(y), collect(z))

== cons "a", 
     append(
        collect(leaf("b")), == let val x = "b" in [x]  == ["b"]
        collect(leaf("c"))  == let x = "c" in [x]  == ["c"]
     )

== cons("a",
     append(   
        ["b"]
        ["c"]
     )        == ... == ["b","c"]
   )

== cons("a",
        ["b","c"])

== ["a", "b", "c"]

===================================================

In the Chapter on BNF, we saw a grammar for arithmetic operator trees and we saw a translator that mapped operator trees into postfix representation. Here is what this all looks like when written with a inductively defined type:

===================================================

type Optree = numeral of Atom   # e.g.,  numeral("3")
           |  expr of Atom * Optree * Optree  # e.g., expr("+", tree1, tree2)

fun postfix(numeral(n)) = [n]
  | postfix(expr(operator, arg1, arg2) =
       let val ans1 = postfix(arg1)
           val ans2 = postfix(arg2) in
       append(ans1, append(ans2, [operator]))

===================================================
For example, the operator tree for 2 + (3 * 4) would be tree0 = expr("+", numeral("2"), expr("*", numeral("3"), numeral("4"))), so postfix(tree0) computes to the postfix translation, ["2", "3", "4", "*", "+"]. Our use of Python lists as operator trees is merely an approximation of an inductively defined type, Optree!


7.7 Conclusion

A functional language lets you compute by substituting equals-for-equals, like you learned when you learned arithmetic and algebra. This works because every identifier denotes a constant value, bound once and forever. It also means that the heap-base implementation can do massive sharing of substructures within a program.

The computation-as-algebra concept fails with imperative languages, because identifiers denote variables whose stored values change from line to line. For example, if we have an imperative program, like this one,

int x;
int y;
x = 0;
if ... {
    x = x + 1 }
else {
    x = 2
}
y = x;
it makes absolutely no sense to use x = 0 to ``substitute'' 0 for ``all occurrences'' of x in the rest of the program. Instead, we must execute the program with an instruction counter and primary storage --- there is no way to understand the program without the storage it manipulates, and there is no way to ``calculate'' the program's meaning by substitution.

In summary, imperative languages work with storage structures that are incrementally updated, for example, a data base that is shared by many programs over a period of time. Or, a graphics system that maintains the storage structure that represents the pixels on a display.

In contrast, functional languages solve self-contained problems that assemble inputs into a data structure and then process the data structure into an answer. An example is a compiler, which translates an input program into a tree data structure and then processes the tree into output code. Or, a batch payroll system that converts a file of payroll information into a file of paycheks. Or, a library of numerical functions that compute answers to questions in physics or biology.

When you face a problem that can be solved by equational-style calculation, solve it with a functional language.