Take-Home Final Exam
Due by 2 p.m., Tuesday, December 13. Late exams will not be
accepted. You may use the text and your notes, but you must not
discuss these problems with other students. If you have any questions,
please see me or send me e-mail (bhoward@cis). Note that this
policy differs from that used on the homeworks; all work on this exam
must be your own.
- Consider the following regular expression for valid format
strings for the printf family of functions (adapted from the
ANSI C standard):
([^%]° % [-+ 0#]° ([1-9] [0-9]° | * | e)
(. [0-9]° | . * | e) ([hlL] | e)
[cdefginopsuxEGX%])° [^%]°
Note the difference between the typewriter-style *, which is
the asterisk character, and the superscript ° (the closest
character I could find in HTML), which is the Kleene star of regular
expressions. Also observe that we are using Unix-style character sets,
such as [^%] and [1-9] (which stand for all the
characters except % and the digits from 1 to
9, respectively); you may treat these as single ``characters''
in what follows.
- Use Thompson's Construction and the Subset Construction (if
necessary) to convert this into a DFA.
- Present this DFA as a (regular) grammar by assigning a nonterminal
to each node and including the production X --> aY for each
edge from X to Y labelled with a; the
starting node will be the starting nonterminal, and for every accepting
node X you should also add the production
X --> e.
- Is the resulting grammar in LL(1) form? Justify your answer;
for extra credit, formulate a general theorem about the form of grammars
converted from DFAs.
- Rearrange the grammar into LL(1) form (if necessary) and give a
pseudocode description of a recursive descent parser for the language of
format strings.
- Why might you not want to construct a production-quality lexer by
following the above procedure?
- Augment the following modified subset of the PSub grammar with
semantic actions to implement an interpreter. You may assume that the
only identifiers will be the single-character variables a
through z; your actions should operate on a store of 26
locations also labelled a through z (which you should
initialize to zero at the beginning). We will follow the FORTRAN
convention of taking the variables i through n to be
integers, and all the rest reals. Part of your job is also to design
an appropriate set of auxiliary functions and provide documentation as
to what they do. You do not need to convert the result to LL(1) form.
S --> S id := expression
| epsilon
expression
--> expression add_operator term
| add_operator term
| term
add_operator
--> + | -
term
--> term mul_operator factor
| factor
mul_operator
--> * | / | div | mod
factor
--> id
| num
| ( expression )