CIS 606 Translator Design I Summer 1996

CIS 606: Programming Project 2

Brian Howard

--- ---

Table of Contents

Introduction

The second programming assignment is to construct a predictive parser for PSub, a subset of Pascal based on the language described in the Appendix of the Dragon book. It builds on the lexical analyzer and symbol table handler constructed in Project 1. This ``literate program'' was prepared using noweb.

Main Program

The main program for this phase calls the function ParseProgram() to recognize an entire PSub program from the input (stdin by default, or another file may be specified as a command line argument). At the end, it checks that there is no remaining unparsed input, then exits.
<pp2.c>=
#include <stdio.h>
#include <stdlib.h>
#include "global.h"
#include "symtab.h"
#include "token.h"
#include "lexer.h"
#include "parser.h"

char *progname, /* name under which parser is being run */
     *filename; /* name of PSub input file */

Token currTok; /* current token returned from lexer */

main(int argc, char *argv[]) {
    FILE *fileptr;         /* input file for lexing/parsing */
    int lexonly = FALSE;   /* TRUE to just run lexer */
    int debugflag = FALSE; /* TRUE for tracing output from parser */
    int c;                 /* option flag from getopt() */

    <process arguments>

    if (lexonly) {
        <drive the lexer>
    }
    else {
        ParseProgram();
        if (getTokenKind(currTok) != EofTOK) {
            fprintf(stderr, "Error: input remaining after end of program"
                " in %s, line %d\n", filename, lineno);
        }
    }
    exit(EXIT_SUCCESS);
}
Defines argc, argv, c, currTok, debugflag, filename, fileptr, lexonly, progname (links are to index).

Process command line arguments as described above: -l sets lexonly to run just the lexer, and -p sets debugflag to instruct the parser to produce debugging output (as required for Project 2). If a filename is present, then that is taken as the name of the PSub program to be parsed, otherwise the standard input is used. We employ the C library function getopt(). From the man page:

#include <stdlib.h>

int getopt(int argc, char * const *argv, const char *optstring);

extern char *optarg;
extern int optind, opterr, optopt;

getopt() returns the next option letter in argv that matches a letter in optstring. It supports all the rules of the command syntax standard....

optstring must contain the option letters the command using getopt() will recognize; if a letter is followed by a colon, the option is expected to have an argument, or group of arguments, which may be separated from it by white space. optarg is set to point to the start of the option argument on return from getopt().

getopt() places in optind the argv index of the next argument to be processed. optind is external and is initialized to 1 before the first call to getopt(). When all options have been processed (that is, up to the first non-option argument), getopt() returns EOF....

getopt() prints an error message on the standard error and returns a ``?'' (question mark) when it encounters an option letter not included in optstring or no argument after an option that expects one. This error message may be disabled by setting opterr to 0. The value of the character that caused the error is in optopt.

<process arguments>= (<-U)
progname = argv[0];
while ((c = getopt(argc, argv, "lp")) != EOF) {
    switch (c) {
        case 'l': lexonly = TRUE;
                  break;
        case 'p': debugflag = TRUE;
                  break;
        case '?': fprintf(stderr, "usage: %s [-l|-p] [infile]\n", progname);
                  exit(EXIT_FAILURE);
    }
}

if (optind < argc) {
    filename = argv[optind];
    if (!(fileptr = fopen(filename, "r"))) {
        perror(progname);
        exit(EXIT_FAILURE);
    }
} else {
    filename = "standard input";
    fileptr = stdin;
}

if (lexonly) {
    initLexer(fileptr);
} else {
    initParser(fileptr, debugflag);
}

For compatibility with Project 1, if the command line option -l is given then the input is simply lexed by repeated calls to getNextToken(), with each token being printed as it is found; at the end, the symbol table is printed.

<drive the lexer>= (<-U)
do {
    currTok = getNextToken();
    printf("%-20s %-20s %-20s\n",
           showTokenKind(getTokenKind(currTok)),
           getTokenLexeme(currTok),
           showTokenValue(getTokenValue(currTok)));
} while (getTokenKind(currTok) != EofTOK);
printf("\n");
printSymbolTable();

Global Variables

We maintain the program name, input file name, and current line number for reporting errors. The usual constants FALSE and TRUE are also defined here.
<global.h>=
#ifndef GLOBAL_H
#define GLOBAL_H

#define FALSE 0
#define TRUE 1

extern char *progname;
extern char *filename;
extern int lineno;

#endif GLOBAL_H
Defines FALSE, TRUE (links are to index).

Symbol Table

Interface

The interface to the symbol table consists of a definition for the abstract type SymTabEntry and a function lookupSymbol() which takes a string argument and returns the corresponding table entry for that symbol, creating a new entry if necessary. There is also a function printSymbolTable() which prints the contents of the table for debugging purposes.
<symtab.h>=
#ifndef SYMTAB_H
#define SYMTAB_H

<SymTabEntry ADT>

SymTabEntry lookupSymbol(const char *symbol);

void printSymbolTable(void);

#endif SYMTAB_H

For this project, each symbol table entry only needs to contain the name of the identifier; future projects will require additional fields such as type and scope information. Identifier names are assumed to be less than MAXSYMLEN characters, although attempts are made to check that this is not exceeded.

<SymTabEntry ADT>= (<-U)
#define MAXSYMLEN 100
typedef struct {
    char name[MAXSYMLEN];
} *SymTabEntry;

/* Constructor */
SymTabEntry createEntry(const char *name);

/* Destructor */
void destroyEntry(SymTabEntry entry);

/* Selector */
char *getSymbolName(const SymTabEntry entry);
Defines MAXSYMLEN, SymTabEntry (links are to index).

Implementation

The symbol table implementation uses the C library function tsearch(). From the man page:
#include <search.h>

void *tsearch(const void *key, void **rootp,
int (*compar)(const void *, const void *));

tsearch() is a routine for manipulating binary search trees. It is generalized from Knuth (6.2.2) Algorithm T. All comparisons are done with a user-supplied routine. This routine is called with two arguments, the pointers to the elements being compared. It returns an integer less than, equal to, or greater than 0, according to whether the first argument is to be considered less than, equal to, or greater than the second argument. The comparison function need not compare every byte, so arbitrary data may be contained in the elements in addition to the values being compared.

tsearch() is used to build and access the tree. key is a pointer to a datum to be accessed or stored. If there is a datum in the tree equal to *key (the value pointed to by key), a pointer to this found datum is returned. Otherwise, *key is inserted, and a pointer to it returned. Only pointers are copied, so the calling routine must store the data. rootp points to a variable that points to the root of a tree. A NULL value for the variable pointed to by rootp denotes an empty tree; in this case, the variable will be set to point to the datum which will be at the root of the new tree.

The comparison function we provide uses strncasecmp() to perform a case-insensitive test, since case is not significant in PSub.

<symtab.c>= [D->]
#include <search.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "global.h"
#include "symtab.h"

<SymTabEntry ADT implementation>

static void *root = NULL; /* the root of the symbol table */

static int compareEntries(const void *entry1, const void *entry2) {
    return strncasecmp(((const SymTabEntry)entry1)->name,
                       ((const SymTabEntry)entry2)->name, MAXSYMLEN);
}

SymTabEntry lookupSymbol(const char *symbol) {
    SymTabEntry newEntry, *foundEntryPtr;

    newEntry = createEntry(symbol);
    foundEntryPtr = (SymTabEntry *)tsearch((void *)newEntry, &root,
                                           compareEntries);
    if (foundEntryPtr == NULL) {
        perror(progname);
    }
    if (*foundEntryPtr != newEntry) {
        destroyEntry(newEntry);
    }
    return *foundEntryPtr;
}
Defines compareEntries, lookupSymbol, root (links are to index).

To print the symbol table, we use the C library function twalk(). From the man page:

#include <search.h>

void twalk(void *root, void (*action)(void *, VISIT, int));

twalk() traverses a binary search tree. root is the root of the tree to be traversed. (Any node in a tree may be used as the root for a walk below that node.) action is the name of a routine to be invoked at each node. This routine is, in turn, called with three arguments. The first argument is the address of the node being visited. The second argument is a value from an enumeration data type typedef enum { preorder, postorder, endorder, leaf } VISIT; (defined in the <search.h> header), depending on whether this is the first, second or third time that the node has been visited (during a depth-first, left-to-right traversal of the tree), or whether the node is a leaf. The third argument is the level of the node in the tree, with the root being level zero.

Our auxiliary function printEntry() prints the symbol table entry when the VISIT parameter is either postorder or leaf, which means that each interior node is printed between its left child and its right child; this gives what is commonly called an inorder traversal (despite the misleading use of postorder). The result is that the entries are printed in sorted order.
<symtab.c>+= [<-D]
static void printEntry(const void *entry, VISIT order, int level) {
    if (order == postorder || order == leaf) {
        printf("%s\n", getSymbolName(*(SymTabEntry *)entry));
    }
}

void printSymbolTable(void) {
    printf("Symbol Table\n");
    printf("------------\n");
    twalk(root, printEntry);
    printf("------------\n");
}
Defines printEntry, printSymbolTable (links are to index).

<SymTabEntry ADT implementation>= (<-U)
SymTabEntry createEntry(const char *name) {
    SymTabEntry entry;

    if (entry = (SymTabEntry)malloc(sizeof entry)) {
        entry->name[0] = 0; /* set to null string */
        strncat(entry->name, name, MAXSYMLEN);
        return entry;
    } else {
        perror(progname);
    }
}

void destroyEntry(SymTabEntry entry) {
    free((void *)entry);
}

char *getSymbolName(const SymTabEntry entry) {
    return entry->name;
}
Defines createEntry, destroyEntry, getSymbolName (links are to index).

Lexical Analysis

Lexer Interface

The interface to the lexer consists of two functions: getNextToken(), which returns the next token found on the input, and initLexer(), which takes as an argument the file pointer which the lexer should use for input. It depends on the abstract type Token, defined in token.h.
<lexer.h>=
#ifndef LEXER_H
#define LEXER_H

#include "token.h"

Token getNextToken(void);

void initLexer(FILE *fileptr);

#endif LEXER_H

Lexer Implementation

Here is the structure of the lex specification:
<lexer.l>=
%{
#include <stdlib.h>
#include "global.h"
#include "symtab.h"
#include "token.h"
#include "lexer.h"

<lexical C declarations>
%}
<lexical definitions>
%%
<lexical rules>
%%
<lexical C functions>

The actions of the lexer include keeping track of the current line number, lineno, for reporting errors:

<lexical C declarations>= (<-U) [D->]
int lineno = 1;
Defines lineno (links are to index).

flex uses the macro YY_DECL to declare the lexing function; we want it to be called getNextToken() and have return type Token:

<lexical C declarations>+= (<-U) [<-D->]
#define YY_DECL Token getNextToken(void)
Defines getNextToken (links are to index).

The scanned token will be collected in currToken. For convenience, we define the macro setCT() to initialize the appropriate fields using setToken():

<lexical C declarations>+= (<-U) [<-D]
Token currToken;

#define setCT(kind,val) setToken(currToken, kind, yytext, val)
Defines currToken, setCT (links are to index).

The definitions start with a case-insensitive version of the alphabet:

<lexical definitions>= (<-U) [D->]
a   [Aa]
b   [Bb]
c   [Cc]
d   [Dd]
e   [Ee]
f   [Ff]
g   [Gg]
h   [Hh]
i   [Ii]
j   [Jj]
k   [Kk]
l   [Ll]
m   [Mm]
n   [Nn]
o   [Oo]
p   [Pp]
q   [Qq]
r   [Rr]
s   [Ss]
t   [Tt]
u   [Uu]
v   [Vv]
w   [Ww]
x   [Xx]
y   [Yy]
z   [Zz]

An integer is one or more digits, and a real number is an integer followed by either a fractional part or an exponent, or both:

<lexical definitions>+= (<-U) [<-D->]
integer [0-9]+
frac    "."{integer}
expt    {e}[-+]?{integer}
real    {integer}({frac}{expt}?|{expt})
Defines expt, frac, integer, real (links are to index).

An identifier is a letter followed by one or more alphanumerics; the case is preserved but is not significant in searching the symbol table.

<lexical definitions>+= (<-U) [<-D->]
ident   [A-Za-z][A-Za-z0-9]*
Defines ident (links are to index).

Whitespace is spaces, tabs, carriage returns, and newlines. Newlines are handled separately to maintain the line number count

<lexical definitions>+= (<-U) [<-D->]
white   [ \t\r]+
newline \n
Defines newline, white (links are to index).

The exclusive states COMMENT1 and COMMENT2 handle the body of comments, respectively surrounded by { } and (* *), so that we can continue to process newlines inside them.

<lexical definitions>+= (<-U) [<-D]
%x COMMENT1 COMMENT2
Defines COMMENT1, COMMENT2 (links are to index).

Here are the rules for dealing with whitespace and comments:

<lexical rules>= (<-U) [D->]
{white} /* eat up whitespace */

{newline} lineno++;

"{" BEGIN(COMMENT1);
"(*" BEGIN(COMMENT2);

<COMMENT1,COMMENT2>{newline} lineno++;

<COMMENT1>[^}\n]+ /* ignore the comment body except for newlines */
<COMMENT2>[^*\n]+ /* ignore the comment body except for newlines */
<COMMENT2>"*" /* ignore lone stars in the comment body */

<COMMENT1>"}" BEGIN(INITIAL);
<COMMENT2>"*)" BEGIN(INITIAL);

<COMMENT1,COMMENT2><<EOF>> {
    fprintf(stderr, "Error: unclosed comment at EOF in %s, line %d\n",
        filename, lineno);
    return setCT(EofTOK, setNoValue());
    }

<<EOF>> return setCT(EofTOK, setNoValue());

Here are the rules for dealing with keywords:

<lexical rules>+= (<-U) [<-D->]
{p}{r}{o}{g}{r}{a}{m}       return setCT(ProgramTOK, setNoValue());
{p}{r}{o}{c}{e}{d}{u}{r}{e} return setCT(ProcedureTOK, setNoValue());
{f}{u}{n}{c}{t}{i}{o}{n}    return setCT(FunctionTOK, setNoValue());
{b}{e}{g}{i}{n}             return setCT(BeginTOK, setNoValue());
{e}{n}{d}                   return setCT(EndTOK, setNoValue());
{i}{f}                      return setCT(IfTOK, setNoValue());
{t}{h}{e}{n}                return setCT(ThenTOK, setNoValue());
{e}{l}{s}{e}                return setCT(ElseTOK, setNoValue());
{w}{h}{i}{l}{e}             return setCT(WhileTOK, setNoValue());
{d}{o}                      return setCT(DoTOK, setNoValue());
{r}{e}{p}{e}{a}{t}          return setCT(RepeatTOK, setNoValue());
{u}{n}{t}{i}{l}             return setCT(UntilTOK, setNoValue());
{v}{a}{r}                   return setCT(VarTOK, setNoValue());
{i}{n}{t}{e}{g}{e}{r}       return setCT(IntegerTOK, setNoValue());
{r}{e}{a}{l}                return setCT(RealTOK, setNoValue());
{b}{o}{o}{l}{e}{a}{n}       return setCT(BooleanTOK, setNoValue());
{n}{o}{t}                   return setCT(NotTOK, setNoValue());

Here are the rules for dealing with operators:

<lexical rules>+= (<-U) [<-D->]
":="    return setCT(AssignTOK, setNoValue());

"="         return setCT(RelOpTOK, setOpValue(EqOP));
"<>"        return setCT(RelOpTOK, setOpValue(NotEqOP));
"<"         return setCT(RelOpTOK, setOpValue(LessOP));
"<="        return setCT(RelOpTOK, setOpValue(LessEqOP));
">"         return setCT(RelOpTOK, setOpValue(GrtrOP));
">="        return setCT(RelOpTOK, setOpValue(GrtrEqOP));

{o}{r}      return setCT(AddOpTOK, setOpValue(OrOP));
"+"         return setCT(AddOpTOK, setOpValue(PlusOP));
"-"         return setCT(AddOpTOK, setOpValue(MinusOP));

{a}{n}{d}   return setCT(MulOpTOK, setOpValue(AndOP));
{m}{o}{d}   return setCT(MulOpTOK, setOpValue(ModOP));
{d}{i}{v}   return setCT(MulOpTOK, setOpValue(DivOP));
"*"         return setCT(MulOpTOK, setOpValue(TimesOP));
"/"         return setCT(MulOpTOK, setOpValue(DivideOP));

Here are the rules for dealing with identifiers and numbers (note that the identifier rule has to come after all of the keyword tokens):

<lexical rules>+= (<-U) [<-D->]
{ident} {
    return setCT(IdentifierTOK, setSymbolValue(lookupSymbol(yytext)));
    }

{integer} return setCT(NumberTOK, setIntegerValue(atoi(yytext)));
{real}    return setCT(NumberTOK, setRealValue(atof(yytext)));

And here are the rules for dealing with punctuation and everything else:

<lexical rules>+= (<-U) [<-D]
"(" return setCT(LParenTOK, setNoValue());
")" return setCT(RParenTOK, setNoValue());
":" return setCT(ColonTOK, setNoValue());
";" return setCT(SemicolonTOK, setNoValue());
"," return setCT(CommaTOK, setNoValue());
"." return setCT(PeriodTOK, setNoValue());
.   {
    fprintf(stderr, "Error: illegal character %s in %s, line %d\n",
        yytext, filename, lineno);
    }

All we need to do to initialize the lexer is set yyin to the desired input file pointer.

<lexical C functions>= (<-U)
void initLexer(FILE *fileptr) {
    yyin = fileptr;
}
Defines initLexer (links are to index).

Token Interface

A Token contains the kind of the token, the actual lexeme found, and the corresponding interpretation of that lexeme as a value (in the case of a token kind where the lexeme provides additional information). For example, the token 3.14 has kind NumberTOK, lexeme "3.14", and value 3.14 (as a double). Since the value of an identifier will be the corresponding symbol table entry, this depends on the abstract type SymTabEntry, defined in symtab.h.
<token.h>=
#ifndef TOKEN_H
#define TOKEN_H

#include "symtab.h"

<TokenKind ADT>
<TokenValue ADT>

typedef struct {
    TokenKind kind;
    char *lexeme;
    TokenValue val;
} Token;

/* Initializer */
Token setToken(Token token, const TokenKind kind, char *lexeme,
           const TokenValue val);

/* Selectors */
TokenKind getTokenKind(const Token token);
char *getTokenLexeme(const Token token);
TokenValue getTokenValue(const Token token);

#endif TOKEN_H
Defines Token (links are to index).

Here are all of the token kinds of PSub, including the special kind EofTOK which is only returned on end-of-file.

<TokenKind ADT>= (<-U)
typedef enum {
    IdentifierTOK, NumberTOK,
    ProgramTOK, ProcedureTOK, FunctionTOK,
    BeginTOK, EndTOK, IfTOK, ThenTOK, ElseTOK,
    WhileTOK, DoTOK, RepeatTOK, UntilTOK,
    VarTOK, IntegerTOK, RealTOK, BooleanTOK,
    NotTOK, AssignTOK, RelOpTOK, AddOpTOK, MulOpTOK,
    LParenTOK, RParenTOK, ColonTOK, SemicolonTOK,
    CommaTOK, PeriodTOK, EofTOK
} TokenKind;

/* Converter */
char *showTokenKind(const TokenKind kind);
Defines TokenKind (links are to index).

The value component of a Token may be one of several types: int, double, SymTabEntry, or OpKind. The type field identifies what sort of datum is in the value field; if it is NoVAL, then there is no associated value for this token (for example, it might just be a keyword or punctuation).

<TokenValue ADT>= (<-U)
typedef enum {
    IntegerVAL, RealVAL, SymbolVAL, OpVAL, NoVAL
} TokenValueType;

<OpKind ADT>

typedef struct {
    TokenValueType  type;
    union {
        int         intval;
        double      realval;
        SymTabEntry symbolval;
        OpKind      opval;
    } value;
} TokenValue;

/* Initializers */
TokenValue setIntegerValue(const int intval);
TokenValue setRealValue(const double realval);
TokenValue setSymbolValue(const SymTabEntry symbolval);
TokenValue setOpValue(const int opval);
TokenValue setNoValue(void);

/* Converter */
char *showTokenValue(const TokenValue value);

/* Selectors will be needed in future projects, but not yet */
Defines TokenValue, TokenValueType (links are to index).

The type OpKind distinguishes which particular operator was seen, since the token kind only discriminates between relational, additive, and multiplicative operators.

<OpKind ADT>= (<-U)
typedef enum {
    EqOP, NotEqOP, LessOP, LessEqOP, GrtrOP, GrtrEqOP, /* RelOps */
    OrOP, PlusOP, MinusOP,                             /* AddOps */
    AndOP, DivOP, ModOP, TimesOP, DivideOP             /* MulOps */
} OpKind;

/* Converter */
char *showOpKind(const OpKind kind);
Defines OpKind (links are to index).

Token Implementation

<token.c>=
#include "global.h"
#include "symtab.h"
#include "token.h"

<TokenKind ADT implementation>
<TokenValue ADT implementation>

Token setToken(Token token, const TokenKind kind, char *lexeme,
               const TokenValue val) {
    token.kind = kind;
    token.lexeme = lexeme;
    token.val = val;
    return token;
}

TokenKind getTokenKind(const Token token) {
    return token.kind;
}

char *getTokenLexeme(const Token token) {
    return token.lexeme;
}

TokenValue getTokenValue(const Token token) {
    return token.val;
}
Defines getTokenKind, getTokenLexeme, getTokenValue, setToken (links are to index).

<TokenKind ADT implementation>= (<-U)
char *showTokenKind(const TokenKind kind) {
    switch (kind) {
        case IdentifierTOK: return "IdentifierTOK";
        case NumberTOK:     return "NumberTOK";
        case ProgramTOK:    return "ProgramTOK";
        case ProcedureTOK:  return "ProcedureTOK";
        case FunctionTOK:   return "FunctionTOK";
        case BeginTOK:      return "BeginTOK";
        case EndTOK:        return "EndTOK";
        case IfTOK:         return "IfTOK";
        case ThenTOK:       return "ThenTOK";
        case ElseTOK:       return "ElseTOK";
        case WhileTOK:      return "WhileTOK";
        case DoTOK:         return "DoTOK";
        case RepeatTOK:     return "RepeatTOK";
        case UntilTOK:      return "UntilTOK";
        case VarTOK:        return "VarTOK";
        case IntegerTOK:    return "IntegerTOK";
        case RealTOK:       return "RealTOK";
        case BooleanTOK:    return "BooleanTOK";
        case NotTOK:        return "NotTOK";
        case AssignTOK:     return "AssignTOK";
        case RelOpTOK:      return "RelOpTOK";
        case AddOpTOK:      return "AddOpTOK";
        case MulOpTOK:      return "MulOpTOK";
        case LParenTOK:     return "LParenTOK";
        case RParenTOK:     return "RParenTOK";
        case ColonTOK:      return "ColonTOK";
        case SemicolonTOK:  return "SemicolonTOK";
        case CommaTOK:      return "CommaTOK";
        case PeriodTOK:     return "PeriodTOK";
        case EofTOK:        return "EofTOK";
    }
}
Defines showTokenKind (links are to index).

<TokenValue ADT implementation>= (<-U)
<OpKind ADT implementation>

TokenValue currTokenValue;

TokenValue setIntegerValue(const int intval) {
    currTokenValue.type = IntegerVAL;
    currTokenValue.value.intval = intval;
    return currTokenValue;
}

TokenValue setRealValue(const double realval) {
    currTokenValue.type = RealVAL;
    currTokenValue.value.realval = realval;
    return currTokenValue;
}

TokenValue setSymbolValue(const SymTabEntry symbolval) {
    currTokenValue.type = SymbolVAL;
    currTokenValue.value.symbolval = symbolval;
    return currTokenValue;
}

TokenValue setOpValue(const int opval) {
    currTokenValue.type = OpVAL;
    currTokenValue.value.opval = opval;
    return currTokenValue;
}

TokenValue setNoValue(void) {
    currTokenValue.type = NoVAL;
    return currTokenValue;
}

/* a buffer in which to print out numbers */
static char valueString[100];

char *showTokenValue(const TokenValue val) {
    switch (val.type) {
        case IntegerVAL:
            sprintf(valueString, "%d", val.value.intval);
            return valueString;
        case RealVAL:
            sprintf(valueString, "%G", val.value.realval);
            return valueString;
        case SymbolVAL:
            return getSymbolName(val.value.symbolval);
        case OpVAL:
            return showOpKind(val.value.opval);
        case NoVAL:
            return "";
    }
}
Defines currTokenValue, setIntegerValue, setNoValue, setOpValue, setRealValue, setSymbolValue, showTokenValue, valueString (links are to index).

<OpKind ADT implementation>= (<-U)
char *showOpKind(const OpKind kind) {
    switch (kind) {
        case EqOP:      return "EqOP";
        case NotEqOP:   return "NotEqOP";
        case LessOP:    return "LessOP";
        case LessEqOP:  return "LessEqOP";
        case GrtrOP:    return "GrtrOP";
        case GrtrEqOP:  return "GrtrEqOP";
        case OrOP:      return "OrOP";
        case PlusOP:    return "PlusOP";
        case MinusOP:   return "MinusOP";
        case AndOP:     return "AndOP";
        case DivOP:     return "DivOP";
        case ModOP:     return "ModOP";
        case TimesOP:   return "TimesOP";
        case DivideOP:  return "DivideOP";
    }
}
Defines showOpKind (links are to index).

Syntactic Analysis

Parser Interface

The interface to the parser consists of two functions: ParseProgram(), which attempts to recognize an entire PSub program in the stream of tokens from the lexer, and initParser(), which takes as arguments the file pointer which the lexer should use for input and a debugging flag to control whether a trace of the parse is printed. The program calling the parser must provide an externally visible definition for the variable currTok, which contains the most recently lexed token.
<parser.h>=
#ifndef PARSER_H
#define PARSER_H

void ParseProgram(void);

void initParser(FILE *fileptr, const int debugflag);

extern Token currTok;

#endif PARSER_H

Parser Implementation

The parser is structured as a recursive descent, predictive parser, based on an LL(1) grammar (described in the following sections).

The function initParser() initializes the lexer with the appropriate file pointer, gets the first token into currTok, and sets the flag telling whether the parser should print a debugging trace. If the flag debugparse is true, then the parsing routines print the following information:

<parser.c>=
#include <stdio.h>
#include <stdlib.h>
#include "global.h"
#include "token.h"
#include "lexer.h"
#include "parser.h"

static int debugparse = FALSE; /* print out trace of parse if TRUE */

<auxiliary parsing routines>
<recursive descent parsing functions>

void initParser(FILE *fileptr, const int debugflag) {
    initLexer(fileptr);
    currTok = getNextToken();
    debugparse = debugflag;
}
Defines debugparse, initParser (links are to index).

<recursive descent parsing functions>= (<-U)
<mutually recursive function declarations>
<parse a program>
<parse variable delcarations>
<parse subprogram declarations>
<parse statements>
<parse expressions>

Since the parsing functions are all mutually recursive, we must declare them all before giving any of their definitions.

<mutually recursive function declarations>= (<-U) [D->]
void ParseProgram(void);
static void ParseVarDecls(void);
static void ParseDeclList(void);
static void ParseDeclListRest(void);
static void ParseDeclaration(void);
static void ParseIdentifierList(void);
static void ParseIdentifierListRest(void);
static void ParseType(void);
<mutually recursive function declarations>+= (<-U) [<-D->]
static void ParseSubprogramDecls(void);
static void ParseSubprogramDeclaration(void);
static void ParseSubprogramHead(void);
static void ParseArguments(void);
static void ParseParameterList(void);
static void ParseParameterListRest(void);
static void ParseParameter(void);
<mutually recursive function declarations>+= (<-U) [<-D->]
static void ParseCompoundStatement(void);
static void ParseStatementList(void);
static void ParseStatementListRest(void);
static void ParseStatement(void);
static void ParseStatementRest(void);
<mutually recursive function declarations>+= (<-U) [<-D]
static void ParseExpressionList(void);
static void ParseExpressionListRest(void);
static void ParseExpression(void);
static void ParseExpressionRest(void);
static void ParseSimpleExpression(void);
static void ParseSimpleExpressionRest(void);
static void ParseTerm(void);
static void ParseTermRest(void);
static void ParseFactor(void);
static void ParseFactorRest(void);

LL(1) Grammar Rules

program --> program id ; var_decls subprogram_decls compound_statement .
<parse a program>= (<-U)
void ParseProgram(void) {
    nonTerm("Program");
    match(ProgramTOK);
    match(IdentifierTOK);
    match(SemicolonTOK);
    ParseVarDecls();
    ParseSubprogramDecls();
    ParseCompoundStatement();
    match(PeriodTOK);
}
Defines ParseProgram (links are to index).

var_decls --> var decl_list
| epsilon

<parse variable delcarations>= (<-U) [D->]
static void ParseVarDecls(void) {
    nonTerm("Var_Decls");
    if (check(VarTOK)) {
        match(VarTOK);
        ParseDeclList();
    } else {
        matchEps();
    }
}
Defines ParseVarDecls (links are to index).

decl_list --> declaration decl_list_rest

<parse variable delcarations>+= (<-U) [<-D->]
static void ParseDeclList(void) {
    nonTerm("Decl_List");
    ParseDeclaration();
    ParseDeclListRest();
}
Defines ParseDeclList (links are to index).

decl_list_rest --> declaration decl_list_rest
| epsilon

<parse variable delcarations>+= (<-U) [<-D->]
static void ParseDeclListRest(void) {
    nonTerm("Decl_List_Rest");
    if (check(IdentifierTOK)) {
        ParseDeclaration();
        ParseDeclListRest();
    } else {
        matchEps();
    }
}
Defines ParseDeclListRest (links are to index).

declaration --> identifier_list : type ;

<parse variable delcarations>+= (<-U) [<-D->]
static void ParseDeclaration(void) {
    nonTerm("Declaration");
    ParseIdentifierList();
    match(ColonTOK);
    ParseType();
    match(SemicolonTOK);
}
Defines ParseDeclaration (links are to index).

identifier_list --> id identifier_list_rest

<parse variable delcarations>+= (<-U) [<-D->]
static void ParseIdentifierList(void) {
    nonTerm("Identifier_List");
    match(IdentifierTOK);
    ParseIdentifierListRest();
}
Defines ParseIdentifierList (links are to index).

identifier_list_rest --> , id identifier_list_rest
| epsilon

<parse variable delcarations>+= (<-U) [<-D->]
static void ParseIdentifierListRest(void) {
    nonTerm("Identifier_List_Rest");
    if (check(CommaTOK)) {
        match(CommaTOK);
        match(IdentifierTOK);
        ParseIdentifierListRest();
    } else {
        matchEps();
    }
}
Defines ParseIdentifierListRest (links are to index).

type --> integer
| real
| boolean

<parse variable delcarations>+= (<-U) [<-D]
static void ParseType(void) {
    nonTerm("Type");
    if (check(IntegerTOK)) {
        match(IntegerTOK);
    } else if (check(RealTOK)) {
        match(RealTOK);
    } else {
        match(BooleanTOK);
    }
}
Defines ParseType (links are to index).

subprogram_decls --> subprogram_declaration subprogram_decls
| epsilon

<parse subprogram declarations>= (<-U) [D->]
static void ParseSubprogramDecls(void) {
    nonTerm("Subprogram_Decls");
    if (check(FunctionTOK) || check(ProcedureTOK)) {
        ParseSubprogramDeclaration();
        ParseSubprogramDecls();
    } else {
        matchEps();
    }
}
Defines ParseSubprogramDecls (links are to index).

subprogram_declaration --> subprogram_head var_decls compound_statement ;

<parse subprogram declarations>+= (<-U) [<-D->]
static void ParseSubprogramDeclaration(void) {
    nonTerm("Subprogram_Declaration");
    ParseSubprogramHead();
    ParseVarDecls();
    ParseCompoundStatement();
    match(SemicolonTOK);
}
Defines ParseSubprogramDeclaration (links are to index).

subprogram_head --> function id arguments : type ;
| procedure id arguments ;

<parse subprogram declarations>+= (<-U) [<-D->]
static void ParseSubprogramHead(void) {
    nonTerm("Subprogram_Head");
    if (check(FunctionTOK)) {
        match(FunctionTOK);
        match(IdentifierTOK);
        ParseArguments();
        match(ColonTOK);
        ParseType();
        match(SemicolonTOK);
    } else {
        match(ProcedureTOK);
        match(IdentifierTOK);
        ParseArguments();
        match(SemicolonTOK);
    }
}
Defines ParseSubprogramHead (links are to index).

arguments --> ( parameter_list )
| epsilon

<parse subprogram declarations>+= (<-U) [<-D->]
static void ParseArguments(void) {
    nonTerm("Arguments");
    if (check(LParenTOK)) {
        match(LParenTOK);
        ParseParameterList();
        match(RParenTOK);
    } else {
        matchEps();
    }
}
Defines ParseArguments (links are to index).

parameter_list --> parameter parameter_list_rest

<parse subprogram declarations>+= (<-U) [<-D->]
static void ParseParameterList(void) {
    nonTerm("Parameter_List");
    ParseParameter();
    ParseParameterListRest();
}
Defines ParseParameterList (links are to index).

parameter_list_rest --> ; parameter parameter_list_rest
| epsilon

<parse subprogram declarations>+= (<-U) [<-D->]
static void ParseParameterListRest(void) {
    nonTerm("Parameter_List_Rest");
    if (check(SemicolonTOK)) {
        match(SemicolonTOK);
        ParseParameter();
        ParseParameterListRest();
    } else {
        matchEps();
    }
}
Defines ParseParameterListRest (links are to index).

parameter --> identifier_list : type

<parse subprogram declarations>+= (<-U) [<-D]
static void ParseParameter(void) {
    nonTerm("Parameter");
    ParseIdentifierList();
    match(ColonTOK);
    ParseType();
}
Defines ParseParameter (links are to index).

compound_statement --> begin statement_list end

<parse statements>= (<-U) [D->]
static void ParseCompoundStatement(void) {
    nonTerm("Compound_Statement");
    match(BeginTOK);
    ParseStatementList();
    match(EndTOK);
}
Defines ParseCompoundStatement (links are to index).

statement_list --> statement statement_list_rest

<parse statements>+= (<-U) [<-D->]
static void ParseStatementList(void) {
    nonTerm("Statement_List");
    ParseStatement();
    ParseStatementListRest();
}
Defines ParseStatementList (links are to index).

statement_list_rest --> ; statement statement_list_rest
| epsilon

<parse statements>+= (<-U) [<-D->]
static void ParseStatementListRest(void) {
    nonTerm("Statement_List_Rest");
    if (check(SemicolonTOK)) {
        match(SemicolonTOK);
        ParseStatement();
        ParseStatementListRest();
    } else {
        matchEps();
    }
}
Defines ParseStatementListRest (links are to index).

statement --> id statement_rest
| compound_statement
| if expression then statement else statement
| while expression do statement
| repeat statement_list until expression
| epsilon

<parse statements>+= (<-U) [<-D->]
static void ParseStatement(void) {
    nonTerm("Statement");
    if (check(IdentifierTOK)) {
        match(IdentifierTOK);
        ParseStatementRest();
    } else if (check(BeginTOK)) {
        ParseCompoundStatement();
    } else if (check(IfTOK)) {
        match(IfTOK);
        ParseExpression();
        match(ThenTOK);
        ParseStatement();
        match(ElseTOK);
        ParseStatement();
    } else if (check(WhileTOK)) {
        match(WhileTOK);
        ParseExpression();
        match(DoTOK);
        ParseStatement();
    } else if (check(RepeatTOK)) {
        match(RepeatTOK);
        ParseStatementList();
        match(UntilTOK);
        ParseExpression();
    } else {
        matchEps();
    }
}
Defines ParseStatement (links are to index).

statement_rest --> := expression
| ( expression_list )
| epsilon

<parse statements>+= (<-U) [<-D]
static void ParseStatementRest(void) {
    nonTerm("Statement_Rest");
    if (check(AssignTOK)) {
        match(AssignTOK);
        ParseExpression();
    } else if (check(LParenTOK)) {
        match(LParenTOK);
        ParseExpressionList();
        match(RParenTOK);
    } else {
        matchEps();
    }
}
Defines ParseStatementRest (links are to index).

expression_list --> expression expression_list_rest

<parse expressions>= (<-U) [D->]
static void ParseExpressionList(void) {
    nonTerm("Expression_List");
    ParseExpression();
    ParseExpressionListRest();
}
Defines ParseExpressionList (links are to index).

expression_list_rest --> , expression expression_list_rest
| epsilon

<parse expressions>+= (<-U) [<-D->]
static void ParseExpressionListRest(void) {
    nonTerm("Expression_List_Rest");
    if (check(CommaTOK)) {
        match(CommaTOK);
        ParseExpression();
        ParseExpressionListRest();
    } else {
        matchEps();
    }
}
Defines ParseExpressionListRest (links are to index).

expression --> simple_expression expression_rest

<parse expressions>+= (<-U) [<-D->]
static void ParseExpression(void) {
    nonTerm("Expression");
    ParseSimpleExpression();
    ParseExpressionRest();
}
Defines ParseExpression (links are to index).

expression_rest --> relop simple_expression
| epsilon

<parse expressions>+= (<-U) [<-D->]
static void ParseExpressionRest(void) {
    nonTerm("Expression_Rest");
    if (check(RelOpTOK)) {
        match(RelOpTOK);
        ParseSimpleExpression();
    } else {
        matchEps();
    }
}
Defines ParseExpressionRest (links are to index).

simple_expression --> addop term simple_expression_rest
| term simple_expression_rest

<parse expressions>+= (<-U) [<-D->]
static void ParseSimpleExpression(void) {
    nonTerm("Simple_Expression");
    if (check(AddOpTOK)) {
        match(AddOpTOK);
        ParseTerm();
        ParseSimpleExpressionRest();
    } else {
        ParseTerm();
        ParseSimpleExpressionRest();
    }
}
Defines ParseSimpleExpression (links are to index).

simple_expression_rest --> addop term simple_expression_rest
| epsilon

<parse expressions>+= (<-U) [<-D->]
static void ParseSimpleExpressionRest(void) {
    nonTerm("Simple_Expression_Rest");
    if (check(AddOpTOK)) {
        match(AddOpTOK);
        ParseTerm();
        ParseSimpleExpressionRest();
    } else {
        matchEps();
    }
}
Defines ParseSimpleExpressionRest (links are to index).

term --> factor term_rest

<parse expressions>+= (<-U) [<-D->]
static void ParseTerm(void) {
    nonTerm("Term");
    ParseFactor();
    ParseTermRest();
}
Defines ParseTerm (links are to index).

term_rest --> mulop factor term_rest
| epsilon

<parse expressions>+= (<-U) [<-D->]
static void ParseTermRest(void) {
    nonTerm("Term_Rest");
    if (check(MulOpTOK)) {
        match(MulOpTOK);
        ParseFactor();
        ParseTermRest();
    } else {
        matchEps();
    }
}
Defines ParseTermRest (links are to index).

factor --> id factor_rest
| num
| ( expression )
| not factor

<parse expressions>+= (<-U) [<-D->]
static void ParseFactor(void) {
    nonTerm("Factor");
    if (check(IdentifierTOK)) {
        match(IdentifierTOK);
        ParseFactorRest();
    } else if (check(NumberTOK)) {
        match(NumberTOK);
    } else if (check(LParenTOK)) {
        match(LParenTOK);
        ParseExpression();
        match(RParenTOK);
    } else {
        match(NotTOK);
        ParseFactor();
    }
}
Defines ParseFactor (links are to index).

factor_rest --> ( expression_list )
| epsilon

<parse expressions>+= (<-U) [<-D]
static void ParseFactorRest(void) {
    nonTerm("Factor_Rest");
    if (check(LParenTOK)) {
        match(LParenTOK);
        ParseExpressionList();
        match(RParenTOK);
    } else {
        matchEps();
    }
}
Defines ParseFactorRest (links are to index).

The auxiliary routine check() is a macro which returns a truth value telling whether the current token has the given token kind. The function match() gets the next token if the current token has the given kind, otherwise it prints an error message and quits. If the tracing flag debugparse is set, then it also prints out the lexeme matched. The function matchEps() is called when an epsilon-production is used; it just prints a newline if the tracing flag is set.

<auxiliary parsing routines>= (<-U) [D->]
#define check(kind) (kind == getTokenKind(currTok))

static void match(const TokenKind kind) {
    if (check(kind)) {
        if (debugparse) {
            printf("%s\n", getTokenLexeme(currTok));
        }
        currTok = getNextToken();
    } else {
        fprintf(stderr, "Error: looking for a %s, found %s in %s, line %d\n",
            showTokenKind(kind), getTokenLexeme(currTok), filename, lineno);
        exit(EXIT_FAILURE);
    }
}

static void matchEps() {
    if (debugparse) {
        printf("\n");
    }
}
Defines check, match, matchEps (links are to index).

Every parsing routine starts with a call to nonTerm(), which will print out the name of the non-terminal if debugparse is true.

<auxiliary parsing routines>+= (<-U) [<-D]
static void nonTerm(const char *name) {
    if (debugparse) {
        printf("%s ", name);
    }
}
Defines nonTerm (links are to index).

Indices

Code Chunks

Identifiers