There is a paradigm of programming that dispenses with traditional assignment. As a result, computation in this paradigm, the functional paradigm, looks somewhat like algebra, where one does equals-for-equals substitution to compute the answer that is bound (once and forever) to a name.
One reason to take this approach seriously is that many errors arise from updating a cell in the wrong order, from updating a shared cell, or from a race condition. The problem is acute in multi-core processors, where synchronizing the processors' caches with primary storage is a major difficulty.
Examples of functional programming languages are Lisp, Scheme, ML, OCaml, and Haskell. We study their principles in this chapter.
+-----+
| CPU |  (controller)
+-----+
   |
   V
+---------------+-+-+- ... +-+-+
| program codes | | |      | | |   (data table saved in cells)
+---------------+-+-+- ... +-+-+

The program updates the table's cells over and over until the instructions are finished. The von Neumann machine is based on a theoretical model known as the Universal Turing machine, which looks much like the picture above.
But not all computations are table driven. Think about arithmetic!
Here is a "program" --- (3 * (4 + 5)) --- and its computation:
(3 * (4 + 5)) = (3 * 9) = 27
Here,
computation rewrites the program until it can be rewritten no more.
There are no ``cells'' to update.
The rewriting rules for arithmetic can be stated in algebra-equation style,
using a formalism called a Post system. If we represent
(nonnegative)
numbers in unary (base-1) format (e.g., 0 is 0, 1 is s0, 2 is ss0, 3 is sss0, etc.), then only four equations
are needed to define addition and multiplication:
(0 + N) = N
(sM + N) = (M + sN)
(0 * N) = 0
(sM * N) = (N + (M * N))
M and N are algebra-style variables, e.g.,
the second rule lets us compute (s0 + ss0) = (0 + sss0) (where M matches
0 and N matches ss0), then the first rule lets us compute
(0 + sss0) = sss0. That is, we computed (1 + 2) equals 3.
As an exercise, compute (2 * 3), that is (ss0 * sss0):
(ss0 * sss0) = (sss0 + (s0 * sss0))
= (sss0 + (sss0 + (0 * sss0)))
= (sss0 + (sss0 + 0))
= (sss0 + (ss0 + s0))
= (sss0 + (s0 + ss0)) = (sss0 + (0 + sss0)) = (sss0 + sss0)
= . . . = ssssss0
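These rules can be run directly. Here is a minimal Python sketch (my own transcription, not part of the text) that treats a unary numeral as a string of s's ending in 0 and applies the four equations left to right:

```python
# Unary numerals are strings like "0", "s0", "ss0".
# The four Post-system equations, read as rewrite rules:

def add(m, n):
    # (0 + N) = N   and   (sM + N) = (M + sN)
    if m == "0":
        return n
    return add(m[1:], "s" + n)

def mul(m, n):
    # (0 * N) = 0   and   (sM * N) = (N + (M * N))
    if m == "0":
        return "0"
    return add(n, mul(m[1:], n))
```

For instance, add("s0", "ss0") yields "sss0", matching the worked example of (1 + 2), and mul("ss0", "sss0") yields "ssssss0", matching the exercise.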
A computer based on this approach might look like this:
+----------------------------------+
| hard-wired equations for * and + | (in real life, an ALU holds this wiring)
+----------------------------------+
|
V
+------------------------+
| arithmetic expression |
+------------------------+
The machine repeatedly scans the arithmetic expression, searching for a
phrase that can be rewritten by one of the equations for * and +.
There is no instruction counter, nor data cells.
The arithmetic-expression part is best represented as an operator tree,
which is easy to traverse, match, and rewrite:
At the end, unneeded subtrees remain as garbage, but these can be
erased by a garbage collector.
This layout will lead to a beautiful solution to storage sharing.
Based on the arithmetic example,
you can readily imagine a ``universal rewriting machine,''
which operates with user-defined equations and an expression:
+-------------------------+
| equation-rewrite engine |
+-------------------------+
| |
V V
+------------+-----------------+
| equations | expression tree |
+------------+-----------------+
The machine repeatedly matches equations in the ``equations'' part
to subtrees in the ``expression'' part and does rewriting.
There is no sequential code, no instruction counter, no data cells ---
only equations and a tree that is constantly reconfigured.
This is a different paradigm from the Turing/von Neumann machine, but it
is equally powerful computationally.
There is an even more exotic version of rewriting, called the
lambda calculus, where there is only one operator,
λ, and programs and data are coded just with it and algebra
variables. Here are some codings:
0: (λs(λz z))
1: (λs(λz (s z)))
2: (λs(λz (s (s z))))
and so on
M + N : ((M plusOne) N)
where plusOne : (λn(λs(λz (s ((n s) z)))))
M * N : ((M (N plusOne)) 0)
There is just one rewrite equation for λ:
((λx M) N) = [N/x]M
where [N/x]M is the replacement of all (free) occurrences of x by N in M
The lambda calculus is as computationally powerful as Post systems and
Turing/von Neumann machines.
We will examine it later in the chapter.
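Before moving on, here is a hedged Python sketch (my own transcription, not part of the text) of the numeral codings above, written with Python lambdas so they can actually be run; the helpers church and to_int are conveniences I introduce for testing, not part of the calculus:

```python
zero    = lambda s: lambda z: z                      # 0: (λs(λz z))
plusOne = lambda n: lambda s: lambda z: s(n(s)(z))   # the plusOne coding
add     = lambda m: lambda n: m(plusOne)(n)          # M + N : ((M plusOne) N)
mul     = lambda m: lambda n: m(n(plusOne))(zero)    # apply plus-N, M times, to 0

def church(k):
    """the coding of nonnegative int k, built by repeated plusOne"""
    return zero if k == 0 else plusOne(church(k - 1))

def to_int(n):
    """decode a coding by applying Python's successor function to 0"""
    return n(lambda x: x + 1)(0)
```

For example, to_int(add(church(2))(church(3))) evaluates to 5, and to_int(mul(church(2))(church(3))) to 6.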
Now we are ready to learn the functional programming paradigm, which does computing as program rewriting. As shown above, the rewriting can be implemented as a machine that traverses and constructs operator trees.
===================================================
E: Expression
N: Numeral
E ::= N | E1 + E2 | E1 - E2 | E1 * E2
N ::= sequences of digits
===================================================
Programs in this language are expressions, and computation does equals-for-equals substitution of answers for subexpressions. For the program, (2 * (4 - 3)) + 5, here is its computation:
(2 * (4 - 3)) + 5 = (2 * 1) + 5 = 2 + 5 = 7

We used equational laws, like 4 - 3 = 1 and 2 * 1 = 2, on the operators and numerals to compute the answer. This program's output (answer) is 7, because there are no subexpressions left to compute.
When you learned algebra, you learned that expressions can be
named. The language of algebra is arithmetic extended with
expression abstracts:
===================================================
P: AlgebraProgram
D: Definition
E: Expression
P ::= given D solve E
D ::= I = E | D1 and D2
E ::= N | E1 + E2 | E1 * E2 | I
===================================================
The computation laws for algebra are interesting and important.
Here are the ones you learn in abstract algebra:
((A + B) + C) = (A + (B + C))         (i)
(A + B) = (B + A)                     (ii)
(A + 0) = A                           (iii)
((A * B) * C) = (A * (B * C))         (iv)
(A * B) = (B * A)                     (v)
(A * 1) = A                           (vi)
(A * (B + C)) = ((A * B) + (A * C))   (vii)
These laws can be applied in any order to a program.
Here is an example program:
given
y = 2 * x
and
z = y + 1
solve
2 * z
We apply the equations to 2 * z to compute a phrase that has
the same meaning as the starting program but has a more direct representation:
2 * z = 2 * (y + 1)         # substitute for z
      = (2 * y) + (2 * 1)   # (vii)
      = (2 * y) + 2         # 2 * 1 = 2
      = (2 * (2 * x)) + 2   # substitute for y
      = ((2 * 2) * x) + 2   # (iv)
      = (4 * x) + 2         # 2 * 2 = 4
Since we do not know the value of x, we stop here.
Here is a core language that resembles core Lisp or core ML;
the expressibles are atoms and lists.
For the moment, there are no denotables and no storables:
===================================================
E: Expression
A: Atom (words)
E ::= A | nil | E1 :: E2 | hd E | tl E | ifnil E1 then E2 else E3
A ::= strings of letters
===================================================
(In ML, you would say, if E1 = nil then E2 else E3, and in general, if B then E2 else E3, where B is a boolean-typed expression.)
===================================================
hd (E1 :: E2) = E1                        (i)
tl (E1 :: E2) = E2                        (ii)
ifnil nil then E1 else E2 = E1            (iii)
ifnil (E1 :: E2) then E3 else E4 = E4     (iv)
===================================================
(Note: some people add the equation, ifnil A then E1 else E2 = E2, for
an atom A, so that the conditional can test on atoms as well as lists.)
We have an ``arithmetic'' for lists, based on these equations.
Here's one small example, a list that mixes
atoms and lists:
"a" :: (tl (hd (("b" :: nil) :: ("c" :: nil))))
= "a" :: (tl ("b" :: nil)) # rule (i)
= "a" :: nil # rule (ii)
In the above,
the start list is ("b" :: nil) :: ("c" :: nil), which we would write as
[["b"], "c"] in ML. We extract its head, ["b"], and then that head's
tail, [], to which we cons "a", giving the answer, ["a"].
Here's another example:
ifnil ("a" :: nil)
then nil
else hd (tl ("b" :: ("c" :: nil)))
= hd (tl ("b" :: ("c" :: nil))) # rule (iv)
= hd ("c" :: nil) = "c" # rules (ii) and (i)
The order in which we use the equations does not matter
since there is no assignment command or sequencing control structure.
The previous example could be worked like this:
ifnil ("a" :: nil)
then nil
else hd (tl ("b" :: ("c" :: nil)))
= ifnil ("a" :: nil)
then nil
else hd ("c" :: nil)
= ifnil ("a" :: nil)
then nil
else "c"
= "c"
We
get the same answer with our equations, regardless of the order we apply them.
(This is called the confluence, or Church-Rosser, property of the
rewriting system.)
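The equations can even be transcribed into running code. Here is a hedged Python sketch (my own encoding, not the text's), modeling a cons cell as a Python pair and nil as the string "nil". (Python evaluates both arms of the conditional eagerly, unlike the rewriting rules, but that is harmless here since both arms are pure.)

```python
def cons(h, t): return (h, t)     # E1 :: E2
def hd(e): return e[0]            # hd (E1 :: E2) = E1      (i)
def tl(e): return e[1]            # tl (E1 :: E2) = E2      (ii)
def ifnil(e, then_e, else_e):     # equations (iii) and (iv)
    return then_e if e == "nil" else else_e

# the second worked example above:
ans = ifnil(cons("a", "nil"),
            "nil",
            hd(tl(cons("b", cons("c", "nil")))))   # yields "c"
```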
So far, our language doesn't look like much, but since we can build lists that mix atoms and other lists, we can easily model trees, tables, dictionaries, and indeed all the important data structures of computer science. And with one key extension (parameterized expression abstracts that can call themselves), we will achieve a language that has the same computing power as all known programming languages.
This next example shows we can embed a conditional expression inside
an expression:
"a" :: (ifnil nil then nil else ("b" :: nil)) = "a" :: nil
Finally, we can use :: to glue a list to an atom or an atom to an atom, e.g., "a" :: "b" and nil :: "a". These data structures are called dotted pairs, and they behave like Python's two-element tuples.
The interactive version of ML uses a weak sequential control: After an
expression is computed to its answer, the answer is saved
in a temporary variable, named it. The expression
that evaluates next can reference it, and once that expression
finishes, its answer becomes the new value of
it. Here is an example of a sequence of three expressions
to simplify, separated by semicolons:
"a" :: nil; hd it; it :: (it :: nil)
In ML,
this compound expression computes to ["a","a"]: the
first expression, "a" :: nil, computes to ["a"];
the next expression, hd it, computes to "a" (because
it names ["a"]); and the last expression computes to
["a","a"] (because it names "a").
===================================================
D: Definition
E: Expression
A: Atom
I: Identifier
E ::= A | nil | E1 :: E2 | hd E | tl E | ifnil E1 then E2 else E3
      | let D in E end | I
D ::= val I = E
===================================================
val I = E binds identifier I to expression E. Now, whenever we mention I, it means E and can be substituted by E, equals for equals. I is not a location in storage; it is a constant value, set just once. (Java lets you declare a ``final variable'' that is initialized and fixed forever, e.g., final double pi = 3.14159.)
We add one new rewriting equation to our semantics:
===================================================
let val I = E1 in E2 end = [ E1 / I ] E2
===================================================
where [ E1 / I ] E2 means that we substitute the phrase E1 for all
(free) occurrences of name I within phrase E2.
Here is an example:
let val x = "a" :: nil in
let val y = "b" :: x in
x :: (tl y) end
end
This can compute as follows:
===================================================
let val x = "a" :: nil in
let val y = "b" :: x in
x :: (tl y) end end
= let val y = "b" :: ("a" :: nil) in
("a" :: nil) :: (tl y) end
= ("a" :: nil) :: (tl ("b" :: ("a" :: nil)))
= ("a" :: nil) :: ("a" :: nil)
===================================================
The answer displays as the list, [["a"], "a"], in ML.
As usual, the example can be worked in another order:
===================================================
let val x = "a" :: nil in
let val y = "b" :: x in
x :: (tl y) end end
= let val x = "a" :: nil in
x :: (tl ("b" :: x)) end
= let val x = "a" :: nil in
x :: x end
= ("a" :: nil) :: ("a" :: nil)
===================================================
Sequencing does not matter when there are no assignments.
let definitions can be embedded,
like this:
tl (let val x = nil in x :: (let val y = "a" in y :: x end) end)
which computes to
===================================================
= tl (let x = nil in x :: ("a" :: x) end)
= tl (nil :: ("a" :: nil))
= "a" :: nil
===================================================
Here is a more delicate example, where we redefine x in nested
blocks. (This is similar
to writing a procedure that contains a local variable of the same
name as a global variable.)
===================================================
let val x = "a" in
let val y = x :: nil in
let val x = nil in
y :: x
end
end
end
===================================================
If we substitute carelessly, we get into trouble!
Say we substitute for y first and appear
to get
let val x = "a" in
let val x = nil in
(x :: nil) :: x
end
end
This is incorrect --- it confuses the two definitions of x and
there is no way we will obtain the correct answer,
("a" :: nil) :: nil.
The example displays a famous problem that dogged
19th-century logicians. The solution is, when we substitute
for an identifier, we never allow a clash of two definitions ---
we must rename the inner definition, if necessary.
In the earlier example, if we substitute for y first, then
we must eliminate the clash between the two defined xs by renaming the
inner x:
===================================================
let val x = "a" in
let val y = x :: nil in
let val x' = nil in
y :: x'
end
end
end
===================================================
Now the substitution proceeds with no problem.
We can define substitution and renaming precisely with equations.
The equations cover all possible cases of substitution and only the last
3 equations are interesting:
===================================================
let val I = E1 in E2 end = [ E1 / I ] E2
where
[ E0 / I ] A = A                  # atoms are left alone
[ E0 / I ] nil = nil              # so is nil
[ E0 / I ] E1 :: E2 = [ E0 / I ]E1 :: [ E0 / I ]E2   # substitute into both parts
[ E0 / I ] hd E = hd [ E0 / I ]E  # substitute into the subexpression...
[ E0 / I ] tl E = tl [ E0 / I ]E
[ E0 / I ] ifnil E1 then E2 else E3 =
    ifnil [ E0 / I ]E1 then [ E0 / I ]E2 else [ E0 / I ]E3
[ E0 / I ] I = E0                 # replace I
[ E0 / I ] I' = I'  if I' not= I  # don't alter a var different than I
[ E0 / I ] let val I' = E1 in E2 end = let val I' = [ E0 / I ]E1 in [ E0 / I ]E2 end
    if I not= I' and I' does not appear in E0
    # that is, if there is no name clash
[ E0 / I ] let val I = E1 in E2 end = let val I = [ E0 / I ]E1 in E2 end
    # leave the body alone, because I is redefined for it
[ E0 / I ] let val I' = E1 in E2 end = let val I'' = [ E0 / I ]E1 in [ E0 / I ][ I'' / I' ]E2 end
    if I not= I' and I' appears in E0
    and I'' is a new name that appears in neither E0 nor E2
    # if there is a name clash, rename I' to some new I''
The last equation makes clear that a name clash is repaired by inserting
an extra substitution to replace the name that clashes.
A machine based on equation rewriting can apply these substitution laws without problem; indeed, the equations define an expression-tree traversal algorithm that is easy to implement.
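As a concrete illustration, here is a hedged Python sketch of that traversal (my own encoding, not the book's operator trees, and simplified to single-definition lets). Expressions are tagged tuples: ("atom", A), ("nil",), ("cons", E1, E2), ("hd", E), ("tl", E), ("ifnil", E1, E2, E3), ("ref", I), and ("let", I, E1, E2).

```python
import itertools
fresh = ("v%d" % k for k in itertools.count())   # supply of new names

def names(e):
    """all identifiers appearing in expression e"""
    if e[0] == "ref":
        return {e[1]}
    out = {e[1]} if e[0] == "let" else set()
    for sub in e[1:]:
        if isinstance(sub, tuple):
            out |= names(sub)
    return out

def subst(e0, i, e):
    """[ e0 / i ] e, renaming bound names to avoid clashes"""
    tag = e[0]
    if tag in ("atom", "nil"):
        return e                           # atoms and nil are left alone
    if tag == "ref":
        return e0 if e[1] == i else e      # replace i; leave other vars
    if tag in ("cons", "hd", "tl", "ifnil"):
        return (tag,) + tuple(subst(e0, i, sub) for sub in e[1:])
    if tag == "let":                       # ("let", i', e1, e2)
        _, j, e1, e2 = e
        e1 = subst(e0, i, e1)
        if j == i:                         # i is redefined: leave the body alone
            return ("let", j, e1, e2)
        if j in names(e0):                 # name clash: rename j to a fresh name
            k = next(fresh)
            e2 = subst(("ref", k), j, e2)
            j = k
        return ("let", j, e1, subst(e0, i, e2))
```

For example, substituting x for y in let val x = nil in y :: x end renames the inner x, exactly as the delicate example above requires.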
A tree-rewriting machine repeatedly applies the equations

===================================================
ifnil nil then E1 else E2 = E1
ifnil (E1 :: E2) then E3 else E4 = E4
hd (E1 :: E2) = E1
tl (E1 :: E2) = E2
let val I = E1 in E2 end = [ E1 / I ] E2
===================================================

to an operator tree in the functional language until an answer appears. You have built the tree-rewriting machine shown in the Introduction section! Such a machine represents the program as an operator tree and repeatedly searches the tree, top down, for a subtree that matches an equation. For example, the program
let val x = "a" :: nil in ifnil x then nil else (hd x) :: x end

would compute like this:
Since this is a data-structures language, computation amounts to moving links between substructures. This makes substitution (e.g., let val x = ... in ...) a matter of moving links, not copying code. Here, the answer tree is "a" :: ("a" :: nil). This graph-rewriting approach is the one used to implement the Haskell language.
There is another approach --- Recall from Chapter 1
that function interpret traversed an expression
tree and computed its meaning by computing and combining the meanings of its
subtrees. In picture form, interpret computed like this:
This
traversal technique was first
invented for computing Lisp programs, and we now use it here.
Our interpreter reads the program as an operator tree and uses heap storage. When the interpreter computes the meaning of a subtree, it constructs the meaning in the heap from two-celled objects, called cons cells. Each let subtree constructs a namespace. Once an object is constructed in heap storage, it is never altered. This allows massive, natural sharing of data structures and eliminates sequencing, sharing, and race errors.
Here is how the earlier program computes:
Now, interpret "a" :: nil creates a cons cell in the heap:
The cell's handle binds to x in a new namespace for the let, and
the namespace is used to interpret the meaning of ifnil:
Select the appropriate arm:
and compute the meaning of (hd x) :: x, which looks up x twice,
does a head operation, and builds a cons cell to represent the final
answer:
The answer, handle γ, is printed as "a" :: ("a" :: nil).
Throughout the computation, α is used in multiple places
to represent the structure, "a" :: nil. The sharing is safe because
α's cons cell is never updated after it is first constructed.
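The safety of this sharing can be mimicked in ordinary Python (a hedged sketch, with a cons cell modeled as a tuple): two references to the same immutable cell are indistinguishable from two copies.

```python
alpha = ("a", "nil")         # the heap object for "a" :: nil, bound to x
gamma = (alpha[0], alpha)    # (hd x) :: x --- the head is read out, and
                             # the whole cell alpha is shared, not copied

assert gamma == ("a", ("a", "nil"))   # prints as "a" :: ("a" :: nil)
assert gamma[1] is alpha              # the very same object, used twice
```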
Here is the syntax of operator trees; it closely matches
the source syntax:
===================================================
ETREE ::= ATOM | ["nil"] | ["cons", ETREE, ETREE] | ["head", ETREE]
| ["tail", ETREE] | ["ifnil", ETREE1, ETREE2, ETREE3]
| ["let", DLIST, ETREE] | ["ref", ID]
DLIST ::= [ [ID, ETREE]+ ]
that is, a list of one or more [ID, ETREE] pairs
ATOM ::= a string of letters
ID ::= a string of letters, not including keywords
===================================================
For example, the expression,
let val x = "abc" :: nil
val y = ifnil x then nil else hd x
in
y :: x
end
has this operator tree:
["let", [["x", ["cons", "abc", ["nil"]]],
["y", ["ifnil", ["ref", "x"],
["nil"],
["head", ["ref", "x"]] ]]],
["cons", ["ref", "x"], ["ref", "y"]]]
Here is the code for the interpreter:
===================================================
"""Interpreter for functional language with cons and simple let.
Here is the syntax of operator trees interpreted:
ETREE ::= ATOM | ["nil"] | ["cons", ETREE, ETREE] | ["head", ETREE]
| ["tail", ETREE] | ["ifnil", ETREE1, ETREE2, ETREE3]
| ["let", DLIST, ETREE] | ["ref", ID]
DLIST ::= [ [ID, ETREE]+ ]
that is, a list of one or more [ID, ETREE] pairs
ATOM ::= a string of letters
ID ::= a string of letters, not including keywords
"""
### HEAP:
heap = {}
heap_count = 0 # how many objects stored in the heap
"""The program's heap --- a dictionary that maps handles
to namespace or cons-pair objects.
heap : { (HANDLE : NAMESPACE or CONSPAIR)+ }
where
HANDLE = a string of digits
NAMESPACE = {IDENTIFIER : DENOTABLE} + {"parentns" : HANDLE or "nil"}
that is, each namespace must have a "parentns" field
CONSPAIR = (DENOTABLE, DENOTABLE)
DENOTABLE = HANDLE or ATOM or "nil"
ATOM = string of letters
Example:
heap = { "0": {"w": "nil", "x": "ab", "parentns": "nil"},
"1": {"z":"2", "parentns":"0"},
"2": ("ab","nil"),
"3": {"y": "4", "parentns":"1"},
"4": ("2","2")
}
heap_count = 5
is an example heap, where handles "0","1","3" name namespaces
which hold definitions for w,x,z,y, due to let expressions.
Handles "2" and "4" name cons-cells that are constructed due to
a use of cons.
This example heap might have been constructed by this expression:
let val w = nil
val x = "ab" in
let val z = x :: w in
let val y = z :: z in
...
The values computed are w = [], x = "ab", z = ["ab"], y = [["ab"], "ab"]
"""
### ASSOCIATED MAINTENANCE FUNCTIONS FOR THE heap:
def isHandle(v) :
"""checks if v is a legal handle into the heap"""
return isinstance(v, str) and v.isdigit() and int(v) < heap_count
def allocate(value) :
"""places value into the heap with a new handle
param: value - a namespace or a pair
returns the handle of the newly saved value
"""
global heap_count
newloc = str(heap_count)
heap[newloc] = value
heap_count = heap_count + 1
return newloc
def deref(handle):
""" looks up a value stored in the heap: returns heap[handle]"""
return heap[handle]
### MAINTENANCE FUNCTIONS FOR NAMESPACES:
def lookupNS(handle, name) :
"""looks up the value of name in the namespace named by handle
If name isn't found, looks in the parentns and keeps looking....
params: a handle and an identifier
returns: the first value labelled by name in the chain of namespaces
"""
if isHandle(handle):
ns = deref(handle)
if not isinstance(ns, dict):
crash("handle does not name a namespace")
else :
if name in ns :
ans = ns[name]
else : # name not in the most local ns, so look in parent:
ans = lookupNS(ns["parentns"], name)
else :
crash("invalid handle: " + name + " not found")
return ans
def storeNS(handle, name, value) :
"""stores name:value into the namespace saved at heap[handle]
"""
if isHandle(handle):
ns = deref(handle)
if not isinstance(ns, dict):
crash("handle does not name a namespace")
else :
if name in ns :
crash("cannot redefine a bound name in the current scope")
else :
ns[name] = value
#########################################################################
# See the end of the program for the driver function, evalPGM
def evalETREE(etree, ns) :
"""evalETREE computes the meaning of an expression operator tree.
ETREE ::= ATOM | ["nil"] | ["cons", ETREE, ETREE] | ["head", ETREE]
| ["tail", ETREE] | ["ifnil", ETREE1, ETREE2, ETREE3]
| ["let", DLIST, ETREE] | ["ref", ID]
post: updates the heap as needed and returns the etree's value,
"""
def getConspair(h):
"""dereferences handle h and returns the conspair object stored
in the heap at h
"""
if isHandle(h):
ob = deref(h)
if isinstance(ob, tuple): # a pair object ?
ans = ob
else :
crash("value is not a cons pair")
else :
crash("value is not a handle")
return ans
ans = "error"
if isinstance(etree, str) : # ATOM
ans = etree
else :
op = etree[0]
if op == "nil" :
ans = op
elif op == "cons" :
arg1 = evalETREE(etree[1], ns)
arg2 = evalETREE(etree[2], ns)
ans = allocate((arg1,arg2)) # store new conspair in heap
elif op == "head" :
arg = evalETREE(etree[1], ns)
ans, tail = getConspair(arg)
elif op == "tail" :
arg = evalETREE(etree[1], ns)
head, ans = getConspair(arg)
elif op == "ifnil" :
arg1 = evalETREE(etree[1], ns)
if arg1 == "nil" :
ans = evalETREE(etree[2], ns)
else :
ans = evalETREE(etree[3], ns)
elif op == "let" :
newns = evalDLIST(etree[1], ns) # make new ns of new definitions
ans = evalETREE(etree[2], newns) # use new ns to eval ETREE
# at this point, newns isn't used any more, so forget it!
elif op == "ref" :
ans = lookupNS(ns, etree[1])
else : crash("invalid expression form")
return ans
def evalDLIST(dtree, ns) :
"""computes the meaning of a sequence of new definitions and stores
the ID, meaning bindings in a new namespace
DLIST ::= [ [ID, ETREE]+ ]
returns a handle to the new namespace
"""
newns = allocate({"parentns": ns}) # create the new ns in the heap
for bindingpair in dtree : # add all the new bindings to the new ns
name = bindingpair[0]
expr = bindingpair[1]
value = evalETREE(expr, newns)
storeNS(newns, name, value)
return newns
###########################
def crash(message) :
"""pre: message is a string
post: message is printed and interpreter stopped
"""
    print(message + "! crash!  heap =", heap)
    raise Exception(message)   # stops the interpreter
def prettyprint(value):
if isHandle(value):
v = deref(value)
if isinstance(v, tuple):
ans = "(cons " + prettyprint(v[0]) + " " + prettyprint(v[1]) +")"
else :
            ans = "HANDLE TO NAMESPACE AT " + value
else :
ans = value
return ans
# MAIN FUNCTION ###########################################################
def evalPGM(tree) :
"""interprets a complete program
pre: tree is an ETREE
post: final values are deposited in heap
"""
global heap, heap_count
# initialize heap and ns:
heap = {}
heap_count = 0
ans = evalETREE(tree, "nil")
    print("final answer =", ans)
    print("pretty printed answer =", prettyprint(ans))
    print("final heap =")
    print(heap)
    input("Press Enter key to terminate")
===================================================
Install the above code and run it with Python.
Try at least these test cases:
evalPGM( ["cons", "abc", ["nil"]] )
evalPGM( ["head", ["cons", "abc", ["nil"]]] )
evalPGM( ["ifnil", ["nil"], ["head", ["cons", "abc", ["nil"]]], "def"] )
evalPGM( ["let", [["x", "abc"]], ["cons", ["ref", "x"], ["nil"]]] )
evalPGM( ["let", [["x", ["cons", "abc", ["nil"]]]],
["cons", ["ref", "x"], ["ref", "x"]]])
evalPGM( ["let", [["x", ["cons", "abc", ["nil"]]], ["y", "nil"]],
["cons", ["ref", "x"], ["ref", "y"]]] )
Study the heap layouts as well as the answers computed for each case.
One important point: there is no activation stack in the interpreter. Instead, each eval function is parameterized on the handle of the namespace it should use for variable lookups. This technique accomplishes the same work as an activation stack without the coding overhead --- the stack is ``threaded'' through the sequence of function calls.
Definitions can be parameterized;
the results are functions:
===================================================
E ::= ... | [ E* ] | let D in E end | I | I(E*)
D ::= val I = E | fun I1(I2*) = E | D1 D2
===================================================
where E* means zero or more expressions, separated by commas
and I* means zero or more identifiers, separated by commas.
When a function is defined, its code (expression)
is saved. When the function is called, its arguments
bind to its parameters, and the function's code evaluates.
Here is a small example:
===================================================
let fun second(list) = let val rest = tl list
                       in hd rest end
in
  second(tl("a" :: ("b" :: ( "c" :: nil))))
end
===================================================
The argument binds
to the parameter name when the function is called.
The rewriting might go like this:
(We have a better way to do it in a moment!)
===================================================
second(tl("a" :: ("b" :: ( "c" :: nil))))
= let val list = tl("a" :: ("b" :: ( "c" :: nil))) in   # bind the arg to the param
    let val rest = tl list in
      hd rest end end
= let val rest = tl( tl("a" :: ("b" :: ( "c" :: nil)))) in
    hd rest end
= hd ( tl( tl("a" :: ("b" :: ( "c" :: nil)))))
= ... = "c"
===================================================
Since there are no assignments, it is unimportant when we compute
the value named by the parameter.
The previous example did not implement the function call in precisely the style we have used so far. In particular, we should substitute the code for second into the position where second is referenced. Let's see where this leads us.
The definition,
fun second(list) = let val rest = tl list
in hd rest end
can be written as an ordinary val-definition like this, by moving
the parameter name to the right of the equals sign:
val second = (list) let val rest = tl list
in hd rest end
It is a tradition to place the word, lambda, in front of the
parameter name, (list), so that the reader can identify it clearly:
===================================================
val second = lambda list : let val rest = tl list
in hd rest end
===================================================
second is the same function, just formatted a little differently.
Now it is clear that second is the name
of the function code,
lambda list : let val rest = tl list in hd rest end.
This construction is called a lambda abstraction or
an anonymous function.
(Important: in ML, the lambda abstraction is coded, fn list => let val rest = tl list in hd rest end.)
The lambda expression comes with this semantic equation, which is a variant
of the one we use for let:
===================================================
(lambda I : E1) E2 = [ E2 / I ] E1
===================================================
Let's rework the previous example using the lambda abstraction:
let val second = lambda list: let val rest = tl list in hd rest end
in second(tl("a" :: ("b" :: ( "c" :: nil))))
end
We substitute for second and then bind the argument to the parameter:
= (lambda list: let val rest = tl list in hd rest end)(tl("a" :: ("b" :: ( "c" :: nil))))
= let val rest = tl (tl("a" :: ("b" :: ( "c" :: nil)))) in hd rest end
= hd( tl (tl("a" :: ("b" :: ( "c" :: nil)))) )
= ... = "c"
The expression, lambda list: ..., is the function code, divorced
from its name. We copied the function code into the position where
the function is called. This is how substitution is
supposed to work: you replace an identifier by the value it names.
Using the new equation for lambda, we bind the argument to its
parameter name, and everything works smoothly.
Here is the ``minimal form'' of our functional language:
===================================================
E ::= ... | let D in E end | I | E1(E2*) | lambda I*: E
D ::= val I = E | D1 D2
===================================================
A function body, lambda I*: E, is an expression,
just like nil or E1 :: E2, because it can appear as part of
val I = E or even anonymously within
an expression, e.g.,
"a" :: (lambda x: (x :: x))(nil)
= "a" :: (nil :: nil)
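Python has the same construct: an anonymous function applied in place. A hedged sketch, modeling :: as Python pairing:

```python
cons = lambda h, t: (h, t)    # model E1 :: E2 as a pair

# corresponds to  "a" :: (lambda x: (x :: x))(nil)
result = cons("a", (lambda x: cons(x, x))("nil"))
# result is ("a", ("nil", "nil")), i.e., "a" :: (nil :: nil)
```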
The nameless function is
called ``lambda abstraction'' because it is a kind of
abstract, a naming device, for the parameter.
The lambda abstraction has a long, rich history, extending to the
debates in 19th-century philosophy that led to the development
of modern set theory and predicate logic. It also happens to be
quite useful for computation!
Now the language's characteristic domains go as follows: the expressible values are atoms, lists, and functions on expressibles. The denotable values are exactly the expressibles. There are still no storable (updateable) values.
As before, a closure object is a pair consisting of the function's code plus a pointer to its parent namespace. A couple of pictures will make this clear.
For this sample program:
===================================================
. . .
let val second = lambda list: hd (tl list)
in let val x = "a" :: ("b" :: nil)
in second(x) :: x
end end
===================================================
The heap layout at the point where second is called
looks like this:
second's value, saved in namespace β, is the closure object
at handle γ. The closure remembers the code for the function
along with a link to its global names.
When second is called, namespace τ is created to hold its
parameter, list:
Once second returns its answer, "b" (because the tail of cons cell
κ is ε, and the head of cell ε is "b"),
the current namespace reverts to δ, which lets us compute the
answer, the handle to a cons cell that holds "b" and κ:
The answer is called it in ML. Here, it is the handle to a list.
It is easy to recode the interpreter in the earlier section to handle
this form of call. Note again that we don't require an activation
stack --- we parameterize the eval functions on the handle
of the current namespace used for lookups. That's it.
A function can restart itself by looking up its own definition; this is called recursion. (You can see this in the previous picture: the code for second can refer to itself if it wishes.) A recursive call executes exactly the same way as any other function call; no new machinery is needed.
Recursion with parameters can substitute for assignment and iteration.
This example,
x = 0
while x < 100 :
    print(x)
    x = x + 1
depends on destructive assignment in the loop body to work.
But it is not critical to have destructive assignment once we have
recursively defined functions. In ML, we can write
let fun printloop(x) = if x < 100
then (print(Int.toString(x)); print("\n");
printloop(x+1))
else nil (* done -- do nothing *)
in printloop(0)
end
Parameter passing replaces destructive assignment.
(ML's print expression prints a string and returns unit, (),
as its answer. There is also a ; operator that sequences one expression
followed by another.)
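The same pattern can be written in Python (a sketch of the idea, not from the text): the loop variable becomes a parameter, each iteration becomes a recursive call, and no variable is ever updated. Here the recursion builds the list of counter values rather than printing them:

```python
def countup(x, limit):
    """returns the list [x, x+1, ..., limit-1], assignment-free"""
    if x < limit:
        return [x] + countup(x + 1, limit)   # "restart" with x + 1
    return []                                # done --- do nothing
```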
Here is a useful ML function that reads a sequence
of text lines from the keyboard and collects the lines into a list.
The function reads one line and restarts itself to
read more lines. It quits when it sees a "!":
===================================================
fun collectText() =
let val txt = TextIO.inputLine TextIO.stdIn
in if hd (explode txt) = #"!" (* if head is "!" *)
then [] (* then quit *)
else txt :: collectText() (* else save and RESTART *)
end
===================================================
Note: some implementations require this wordier version:
===================================================
(* collectText reads a sequence of text lines from the input terminal
and assembles them into a list of strings. It quits when it sees
a "!" as the first character of a textline.
It returns the list of strings as its answer.
*)
fun collectText() =
let val t = TextIO.inputLine TextIO.stdIn (* read the line *)
in if isSome(t) (* if line is nonempty *)
then let val txt = valOf(t) (* then pull out its text *)
in if hd (explode txt) = #"!" (* if head char is "!" *)
then [] (* then quit *)
else txt :: collectText() (* else save txt and RESTART *)
end
else []
end;
===================================================
Try this function in ML, and you will see that it reads a sequence like
hello there.
123
!
and returns the list, ["hello there.\n", "123\n"].
The function restarts itself each time to look for another line of user
input.
The ML instructions for textual input are ugly, so here is the recursion
pattern again, this time to build a list of
the first k+1 powers of 2.
For example, powers(3) computes and returns the list, [8,4,2,1]:
The example shows how recursive calls assemble a data structure
in stages, from ``back to front'':
===================================================
(* powers builds a list of powers of two from 2**k down to 2**0
param: k the upper bound, a nonnegative int
returns: a list, [2**k, 2**(k-1) ..downto.. 2**0]
*)
fun powers(k) =
if k = 0
then [1] (* because 2**0 = 1 *)
else let val answers = powers(k - 1) in
(* assert: answers holds [2**(k-1) ..downto.. 2**0] *)
(2 * (hd answers)) :: answers (* cons 2**k to answers *)
end
;
===================================================
Here is a sample calculation:
===================================================
powers(3)
= let val answers = powers(2)
in (2 * (hd answers)) :: answers
end
= let val answers = (let val answers = powers(1)
in (2 * (hd answers)) :: answers
end)
in (2 * (hd answers)) :: answers
end
= let val answers = (let val answers = (let val answers = powers(0)
in (2 * (hd answers)) :: answers
end)
in (2 * (hd answers)) :: answers
end)
in (2 * (hd answers)) :: answers
end
===================================================
The recursive calls generate fresh copies of the function (in the implementation,
new activation namespaces are constructed) and build on the answer
from powers(0) to build the answer from powers(1), etc.
To finish:
===================================================
= let val answers = (let val answers = (let val answers = [1]
in (2 * (hd answers)) :: answers
end)
in (2 * (hd answers)) :: answers
end)
in (2 * (hd answers)) :: answers
end
= let val answers = (let val answers = (2 * (hd [1])) :: [1]
in (2 * (hd answers)) :: answers
end)
in (2 * (hd answers)) :: answers
end
= let val answers = (let val answers = [2,1]
in (2 * (hd answers)) :: answers
end)
in (2 * (hd answers)) :: answers
end
= ...
= let val answers = [4,2,1]
in (2 * (hd answers)) :: answers
end
= [8,4,2,1]
===================================================
Perhaps we want the numbers in ascending order. We reverse a list in ML
like this, again using recursion to build the answer in
stages:
===================================================
(* reverse reverses the elements in a list.
param: ns - a list, e.g., ["c","b","a"]
returns: a list that holds the items of ns in reverse order, e.g., ["a","b","c"]
*)
fun reverse(ns) =
if ns = []
then []
else reverse(tl ns) @ [hd ns] (* in ML, @ means list append *)
===================================================
So, reverse(powers(3)) computes to [1,2,4,8]. Calculate this
with equations and run it on the computer.
The recursion was done with the tail of the argument list, that is, with a list that is one smaller than the original argument. In this way, we ``count down'' (disassemble) the list down to an empty one, which stops the recursions.
In ML,
we can define functions on lists in an equational
style, with parameter patterns, like this:
===================================================
(* reverse reverses the elements in a list.
param: ns - a list, e.g., ["c","b","a"]
returns: a list that holds the items of ns in reverse order, e.g., ["a","b","c"]
*)
fun reverse(nil) = []
| reverse(n :: ns) = reverse(ns) @ [n]
===================================================
The if, hd, and tl are automatically computed by matching
the structure of the argument to the two possible equations for
computing the argument's answer.
Finally, people who like loops often write this variant of list
reverse. The second parameter is called an accumulator, because
it accumulates the answer in stages:
===================================================
(* reverseloop(ns, ans) reverses the items in list ns and appends them
to the end of ans.
params: both ns and ans are lists.
returns: a list that holds the elements of ans followed by
the elements of ns in reverse order.
To use the function to reverse a list, x, do this: reverseloop(x, []).
*)
fun reverseloop(nil, ans) = ans
| reverseloop(n::ns, ans) = reverseloop(ns, n :: ans)
===================================================
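A short calculation shows how the accumulator assembles the answer front to back:

```sml
(* reverseloop(["c","b","a"], [])
   = reverseloop(["b","a"], ["c"])
   = reverseloop(["a"], ["b","c"])
   = reverseloop([], ["a","b","c"])
   = ["a","b","c"]                   *)
```

Each step does just one cons, so this version runs in time linear in the list's length, whereas the append-based reverse re-copies the partial answer at every step.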
Using parameter patterns, we can easily write a function that searches
a list for a value:
===================================================
(* member(v, xs) searches list xs to see if v is a member in it.
params: v - a value; xs - a list of values
returns: true exactly when v is found in xs
*)
fun member(v, nil) = false
| member(v, (w::rest)) = if v = w then true
else member(v, rest)
===================================================
The function searches the list from front to back.
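The same front-to-back pattern writes other list utilities. As a sketch, here is a hypothetical companion function, removeAll (a name invented here), that deletes every occurrence of a value:

```sml
(* removeAll(v, xs) deletes every occurrence of v from list xs *)
fun removeAll(v, nil) = nil
  | removeAll(v, w::rest) = if v = w
                            then removeAll(v, rest)       (* drop w *)
                            else w :: removeAll(v, rest); (* keep w *)
(* removeAll("b", ["a","b","c","b"]) computes to ["a","c"] *)
```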
Here is a small example, where a user can issue update, lookup, and undo
commands to a database that holds key,value pairs. Notice that the database
is actually a handle to a list, assembled from cons cells, in the heap.
An update adds a new cons cell to the
database, and an undo resets the database's handle:
===================================================
(* A database of (key,value) pairs, modelled as a list of form,
(k1,v1) :: ((k2,v2) :: ... :: nil)
*)
(* Auxiliary function update adds a new key,value pair to the database.
The new pair cancels any existing pair with the same key.
params: key, value
returns: the (handle to) the updated database
*)
fun update(key, value, database) =
let val newdatabase = (key, value) :: database
in newdatabase
end
(* Auxiliary function lookup finds the value corresponding to a key in the database.
params: key, database
returns: the value such that (key,value) lives in the database
*)
fun lookup(k, database) =
if database = []
then raise Empty (* error --- empty database *)
else let val (key,value) = hd database
in if key = k (* is desired value the most recent update? *)
then value
else lookup(k, tl database) (* if not, look deeper... *)
end
(* Main function processTransaction is a "loop" that reads user requests and
processes the database accordingly. The requests are either:
-- update key value
-- lookup key
-- undo most previous update
Params: database: (the handle to) the current database
history: a list of handles to previous databases
*)
fun processTransaction(database, history) =
(* 3 lines of ugly ML code to read one textline: *)
let val text = TextIO.inputLine TextIO.stdIn
in if isSome(text)
then let val request = valOf(text) in
(* now, we decode and process the request: *)
...extract command, key, value, etc. from request...
if command = "update"
then let val newDatabase = update(key, value, database)
in (print("update transaction\n");
processTransaction(newDatabase, database :: history)
)
end
else if command = "lookup"
then (print("lookup transaction\n");
print (lookup(key, database));
processTransaction(database, history)
)
else if command = "undo"
then (print("undo transaction\n");
processTransaction(hd history, tl history)
)
else raise Empty (* bad command *)
end
else print "End of Session";
... code here to archive the database on disk ...
end
===================================================
You can see that processTransaction loops, always remembering
the handle to the current value of the database plus keeping a list
of handles to previous versions of the database in case a rollback is
necessary. The rollback step is beautifully simple: reset the handle
to the current database back to the handle of the previous version of the
database. This technique works because there is no assignment that might
alter a value in the heap once the value is stored there!
Of course, Amazon or Google do not use simple list implementations of their databases. Instead, spelling trees ("tries") or hash tables are extended with cons-cell-style update. We look at trees and other structures in the next section.
Other languages, like ML and Haskell, use a Pascal/Java-like type system, where each value has a specific type; all the elements of a list must have the same type; and only values of the same type can be compared for equality.
Let's look at ML, which uses a type checker.
Here is the syntax of types in ``core ML'':
===================================================
T: Type
T ::= string | T list
===================================================
For example,
"a" has type string
"a" :: nil, which evaluates to ["a"], has type string list
nil :: (("a" :: nil) :: (("a" :: ("b" :: nil)) :: nil)),
which evaluates to [[], ["a"], ["a","b"]], has type (string list) list
These requirements are formalized by logic-rule typing laws, defined
on the syntax of the language:
===================================================
A : string   (for each string literal A)

E1 : T     E2 : T list
----------------------
  E1 :: E2 : T list

E : T list          E : T list
----------          --------------
 hd E : T           tl E : T list

E1 : T list    E2 : T'    E3 : T'
---------------------------------
 if E1 = nil then E2 else E3 : T'
===================================================
These laws are coded into the ML type checker/interpreter.
When the ML interpreter
examines an expression, it uses the laws to calculate the data type
of the expression. It then computes the meaning of the expression.
So, if you start the ML interpreter and type,
- nil :: (("a" :: nil) :: (("a" :: ("b" :: nil)) :: nil));
the response is
val it = [[], ["a"], ["a","b"]] : (string list) list
One question remains: What is the type of nil?
The answer is, T' list, for any type T' whatsoever.
So, nil has ``many types,'' depending on where it is inserted
in an expression. (See the earlier examples.)
For this reason, its ``typing rule'' goes:
===================================================
nil : T list
===================================================
where you can fill in T as you wish.
(ML would print val it = [] : 'a list, using 'a as the dummy type name.)
When you define an ML function, the type checker checks the
function's code and calculates the type. The interpreter constructs
the function's closure and shows the type, e.g.,
- fun double(n) = n * 2;
val double = fn : int -> int
The function's type is int -> int, that is,
an int argument is required to produce an int answer.
Sometimes a function can be used with arguments of different types,
e.g.,
- fun second(xs) = hd(tl xs);
val second = fn : 'a list -> 'a
The function can be used as second [1,2,3] or as second["a","b","c"]
or as
second( nil :: (("a" :: nil) :: (("a" :: ("b" :: nil)) :: nil)) ).
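Polymorphic types arise whenever a function never inspects its argument's contents. For instance, this sketch of a pair-swapping function (swap is a name invented here) works at any pair of types:

```sml
(* swap exchanges the components of a pair *)
fun swap(x, y) = (y, x);
(* val swap = fn : 'a * 'b -> 'b * 'a
   swap(1, "one") computes to ("one", 1) *)
```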
If we have types, we can have ``type abstracts,'' where we
give names to types.
This idea was used brilliantly by Rod Burstall in
the language, Hope and adapted by Luca Cardelli into the modern version
of ML, now called SML ("Standard ML").
Here is a type abstract that defines
a data type of binary trees that hold ints at their nodes:
===================================================
datatype IntTree = Leaf | Node of int * IntTree * IntTree
===================================================
The names Leaf and Node are constructors,
for constructing trees,
just like nil and :: are constructors for lists.
Here are some expressions that have data type IntTree:
===================================================
Leaf which represents a leaf tree, *
Node(2, Node(1, Leaf, Leaf), Node(5, Leaf, Leaf)) which represents 2
/ \
1 5
* * * *
let val t = Node(1, Leaf, Leaf) in
let val u = Node(5, t, t) in
Node(3, t, u)
end end which represents 3
/ \
1 5
* * / \
1 1
* * * *
or, for that matter, represents 3
/ \
+--> 1 5
| * * / \
|________|__|
because the implementation shares substructure.
===================================================
Because the type is defined in terms of itself (that is, trees can hold
other, smaller trees), it is called inductively defined.
This means we can assemble trees of arbitrary depth, just like
we can construct lists of arbitrary length.
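Functions on IntTrees follow the datatype's shape, one clause per constructor. As an example, here is a sketch of a function (depth is a name chosen here) that measures how tall a tree is:

```sml
(* depth(t) returns the length of the longest root-to-leaf path in IntTree t *)
fun depth(Leaf) = 0
  | depth(Node(_, left, right)) = 1 + Int.max(depth(left), depth(right));
(* depth(Node(2, Node(1, Leaf, Leaf), Node(5, Leaf, Leaf))) computes to 2 *)
```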
We can use parameter patterns defined by a datatype. Say that we
use IntTree to build ordered binary trees. Here is a tree-search algorithm
expressed with patterns:
===================================================
(* member(i, t) searches ordered IntTree t for int i *)
fun member(i, Leaf) = false
| member(i, Node(j, left, right)) = if i = j then true
else if i < j then member(i, left)
else member(i, right)
;
val member = fn : int * IntTree -> bool
===================================================
and here is a function that collects the ints embedded in a tree:
===================================================
(* collect(t) returns a list holding all the ints in IntTree t *)
fun collect(Leaf) = []
| collect(Node(i, left, right)) = collect(left) @ [i] @ collect(right)
;
val collect = fn : IntTree -> int list
===================================================
Perhaps most important, here is the function that inserts an int into an ordered
tree:
===================================================
(* insert(i, t) inserts i into ordered tree t
pre: i is an int; t is an IntTree whose nodes are ordered
post: returns an ordered IntTree containing t's values and i
*)
fun insert(i, Leaf) = Node(i, Leaf, Leaf)
| insert(i, Node(j, left, right)) =
if i < j then Node(j, insert(i, left), right)
else Node(j, left, insert(i, right))
;
val insert = fn : int * IntTree -> IntTree
===================================================
Notice that an insertion constructs a tree with a new root node and a new "spine"
along the path of insertion
and with shared structure of all the parts unaffected by the insertion.
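Together, insert and collect give a functional tree sort. This sketch (treesort is a name invented here; it assumes the insert and collect definitions above are loaded) inserts each list element into an initially empty tree and then collects the nodes in order, using the Basis Library's foldl:

```sml
(* treesort(ns) sorts an int list by building an ordered tree, then flattening it *)
fun treesort(ns) = collect(foldl insert Leaf ns);
(* treesort [3, 1, 4, 1, 5] computes to [1, 1, 3, 4, 5] *)
```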
User-defined datatypes define schemas for data-structure building,
much like classes do for object-oriented programming. Here are some
types for a library's database, modelled as a list of entries of books
and DVDs:
datatype Item = Book of string * string (* Book(title,author) *)
| Dvd of string (* Dvd(title) *)
datatype DBEntry = Entry of int * Item (* Entry(idnumber, item) *)
type Database = DBEntry list
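To see these types in use, here is a hedged sketch that builds a tiny Database value and searches it by id number (findItem and the sample titles are invented for illustration):

```sml
(* a two-entry library database *)
val db : Database = [ Entry(1, Book("Paradigms", "Smith")),
                      Entry(2, Dvd("Concert Film")) ];

(* findItem(id, db) returns the Item filed under key id,
   raising Empty if the id is absent *)
fun findItem(id, nil) = raise Empty
  | findItem(id, Entry(k, item) :: rest) =
      if id = k then item else findItem(id, rest);
(* findItem(2, db) computes to Dvd("Concert Film") *)
```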
Datatypes work great for defining tree-like structures that mix
strings, ints, subtrees, lists, etc.
If you review Chapter 1 on grammars, interpreters, and parsers,
you see inductively-defined data types everywhere.
For example, for this syntax of expressions:
EXPR ::= DIGIT | - EXPR | ( EXPR + EXPR )
A parser might generate operator trees of this datatype:
===================================================
datatype ETree = Digit of char | Negation of ETree | Addition of ETree * ETree
===================================================
The function that interprets ETrees is short and sweet:
===================================================
(* interpretETree computes the int meaning of its ETree argument *)
fun interpretETree(Digit c) = (ord(c) - ord(#"0"))
| interpretETree(Negation(t)) = ~(interpretETree(t))
| interpretETree(Addition(t1,t2)) = interpretETree(t1) + interpretETree(t2)
;
val interpretETree = fn : ETree -> int
===================================================
Using ML-style datatypes, we can write a language's interpreter in about
as many lines as the number of the language's operator-tree constructions.
Compare this to the bulky interpreters we must write in Java or C or even
in Python or Perl.
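For example, the operator tree a parser might build for the program ( 3 + - 1 ) is interpreted like this (assuming the ETree datatype and interpretETree above are loaded):

```sml
val answer = interpretETree( Addition(Digit #"3", Negation(Digit #"1")) );
(* computes (ord(#"3") - ord(#"0")) + ~(ord(#"1") - ord(#"0")) = 3 + ~1 = 2 *)
```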
It is easy to use parameter patterns to write exactly these structures
for an inductively defined data type. Here are three sample instances
of these control structures
for binary trees of values:
===================================================
datatype 'a Tree = Leaf | Node of 'a * 'a Tree * 'a Tree
fun map(f, Leaf) = Leaf
| map(f, Node(a, t1, t2)) = Node(f(a), map(f, t1), map(f, t2))
fun filter(b, Leaf) = []
| filter(b, Node(a, t1, t2)) = (if b(a) then [a] else [])
@ filter(b, t1) @ filter(b, t2)
fun reduce(r, startvalue, Leaf) = startvalue
| reduce(r, startvalue, Node(a, t1, t2)) =
let val v1 = reduce(r, startvalue, t1) in
let val v2 = reduce(r, v1, t2) in
r(a, v2)
end end
===================================================
Notice that f, b, and r are functions that are arguments
to the control structures.
ML has built-in versions of map and reduce (two variants: foldl and foldr) for linear lists.
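As a sample use (a sketch, assuming the 'a Tree datatype and reduce definition above are loaded), reduce with integer addition totals the ints held in a tree:

```sml
val t = Node(2, Node(1, Leaf, Leaf), Node(5, Leaf, Leaf));
val total = reduce(fn (a, v) => a + v, 0, t);
(* reduces the left subtree, threads its answer into the right subtree,
   then combines with the root: computes to 8 *)
```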
The computation-as-algebra concept fails with imperative languages,
because identifiers denote variables whose
stored values change from
line to line. For example, if we have an imperative
program, like this one,
int x;
int y;
x = 0;
if ... {
x = x + 1 }
else {
x = 2
}
y = x;
it makes absolutely no sense to use x = 0 to
``substitute'' 0 for ``all occurrences''
of x in the rest of the program.
Instead, we must execute the program with an instruction
counter and primary storage --- there is no way to understand
the program without the storage it manipulates, and
there is no way to ``calculate''
the program's meaning by substitution.
In summary, imperative languages work with storage structures that are incrementally updated, for example, a voting table or a graphics system that maintains the pixels on a display.
In contrast, functional languages solve self-contained problems that assemble inputs into a data structure and then process the data structure into an answer. An example is a compiler, which translates an input program into a tree data structure and then processes the tree into output code. Or, a batch payroll system that converts a file of payroll information into a file of paychecks. Or, a library of numerical functions that compute answers to questions in physics or biology.
When you face a problem that can be solved by equational-style calculation, solve it with a functional language.