Take-Home Final Exam

Due by 2 p.m., Tuesday, December 13. Late exams will not be accepted. You may use the text and your notes, but you must not discuss these problems with other students. If you have any questions, please see me or send me e-mail (bhoward@cis). Note that this policy differs from that used on the homeworks; all work on this exam must be your own.

Consider the following regular expression for valid format strings for the printf family of functions (adapted from the ANSI C standard):
([^%]° % [-+ 0#]° ([1-9] [0-9]° | * | e) (. [0-9]° | . * | e) ([hlL] | e) [cdefginopsuxEGX%])° [^%]°
Note the difference between the typewriter-style *, which is the asterisk character, and the superscript ° (the closest character I could find in HTML), which is the Kleene star of regular expressions. Also observe that we are using Unix-style character sets, such as [^%] and [1-9] (which stand for all the characters except % and the digits from 1 to 9, respectively); you may treat these as single ``characters'' in what follows.
1. Use Thompson's Construction and the Subset Construction (if necessary) to convert this into a DFA.
2. Present this DFA as a (regular) grammar by assigning a nonterminal to each node and including the production X --> aY for each edge from X to Y labelled with a; the starting node will be the starting nonterminal, and for every accepting node X you should also add the production X --> e.
3. Is the resulting grammar in LL(1) form? Justify your answer; for extra credit, formulate a general theorem about the form of grammars converted from DFAs.
4. Rearrange the grammar into LL(1) form (if necessary) and give a pseudocode description of a recursive descent parser for the language of format strings.
5. Why might you not want to construct a production-quality lexer by following the above procedure?
Augment the following modified subset of the PSub grammar with semantic actions to implement an interpreter. You may assume that the only identifiers will be the single-character variables a through z; your actions should operate on a store of 26 locations also labelled a through z (which you should initialize to zero at the beginning). We will follow the FORTRAN convention of taking the variables i through n to be integers, and all the rest reals. Part of your job is also to design an appropriate set of auxiliary functions and provide documentation as to what they do. You do not need to convert the result to LL(1) form.
```
S --> S id := expression
    | epsilon
expression
  --> expression add_operator term
    | add_operator term
    | term
add_operator
  --> + | -
term
  --> term mul_operator factor
    | factor
mul_operator
  --> * | / | div | mod
factor
  --> id
    | num
    | ( expression )
```