Programming Project 1

The first programming assignment is to write a lexical analyzer for PSub, a subset of Pascal described in part below (based on a language described in the Appendix of Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986. More of the language will be described in the next assignment). You should structure your lexer as a function getlex which takes no arguments and returns the token code of the next lexeme found on the standard input stream; if there are no more tokens, then it should return a code of zero. The function should also put the lexeme corresponding to the token (that is, the actual string of characters from the input) in the global variable lexval; if you prefer, this string may be returned through a variable parameter of getlex or through some form of multiple-value return if your implementation language permits. The main program that you submit for this project should repeatedly call getlex until it returns a zero, printing out one line consisting of the token code followed by the lexeme for each token found in the input. For example, if the input consists of the following PSub program

program test;
var e: real;
begin
    e:=3.1416-355/113
end.

then the output should be (assuming I did this right by hand)

3       program
1       test
36      ;
15      var
1       e
35      :
19      real
36      ;
6       begin
1       e
26      :=
2       3.1416
29      -
2       355
30      /
2       113
7       end
38      .

If an error is encountered in the input, the program should print out a brief error message plus the line containing the error, and then quit. Note that for PSub, the only errors that can be detected by lexical analysis are illegal characters (such as !@#$%...), incomplete numbers (``2.'', for example, or `` 5.8E+''), and unclosed comments at the end of the input.

You should submit a printout containing the source code for your lexer function and the main program which uses it, plus the result of running the analyzer on the example program given below (I have placed this program in the file /pub/CIS606/pp1input on the cis machines).

This assignment will be due at the start of class on Friday, September 16. To be fair to the other students, I will not grant extensions; if your program is not working by the deadline, submit what you have and I will try to assign partial credit. You may work in pairs (but not in larger groups); in fact, I would encourage this method if you are comfortable working with someone else, as it means less for me to grade!

As with all programming assignments, it will not be sufficient to simply produce code that works---it must also be adequately structured and documented. You may use any programming language you like on any machine you prefer, as long as there is a reasonable chance that I will be able to read your code (if you want to use something exotic, you should probably ask me first---for example, we would likely both regret it if you decided to use APL). The program should be written ``from scratch''; that is, I do not want you to use compiler-building tools such as the Unix program lex (at least, not yet).

The following is a list of all the tokens of PSub, together with the corresponding codes you should use:

identifier  1 | while    11 | not    21 | (  31
number      2 | do       12 | or     22 | )  32
program     3 | repeat   13 | and    23 | [  33
procedure   4 | until    14 | div    24 | ]  34
function    5 | var      15 | mod    25 | :  35
begin       6 | array    16 | :=     26 | ;  36
end         7 | of       17 | ..     27 | ,  37
if          8 | integer  18 | relop  28 | .  38
then        9 | real     19 | addop  29
else       10 | boolean  20 | mulop  30

Here are the lexical conventions of PSub:

Submit the output of your lexer on this PSub program:

{Example program from page 746 of the Dragon Book}
program example;
var x, y: integer;
function gcd(a, b: integer): integer;
begin
    if b = 0 then gcd := a
    else gcd := gcd(b, a mod b)
end;

begin
    read(x);
    read(y);
    write(gcd(x, y))
end.