--- ---
ParseProgram()
to recognize an entire PSub program from the input (stdin
by
default, or another file may be specified as a command line argument).
At the end, it checks that there is no remaining unparsed input, then
exits.
<pp2.c>= #include <stdio.h> #include <stdlib.h> #include "global.h" #include "symtab.h" #include "token.h" #include "lexer.h" #include "parser.h" char *progname, /* name under which parser is being run */ *filename; /* name of PSub input file */ Token currTok; /* current token returned from lexer */ main(int argc, char *argv[]) { FILE *fileptr; /* input file for lexing/parsing */ int lexonly = FALSE; /* TRUE to just run lexer */ int debugflag = FALSE; /* TRUE for tracing output from parser */ int c; /* option flag from getopt() */ <process arguments> if (lexonly) { <drive the lexer> } else { ParseProgram(); if (getTokenKind(currTok) != EofTOK) { fprintf(stderr, "Error: input remaining after end of program" " in %s, line %d\n", filename, lineno); } } exit(EXIT_SUCCESS); }
Definesargc
,argv
,c
,currTok
,debugflag
,filename
,fileptr
,lexonly
,progname
(links are to index).
Process command line arguments as described above: -l sets
lexonly
to run just the lexer, and -p sets debugflag
to instruct the parser to produce debugging output (as required for
Project 2). If a filename is present, then that is taken as the name
of the PSub program to be parsed, otherwise the standard input is
used. We employ the C library function getopt()
. From the man
page:
#include <stdlib.h>
int getopt(int argc, char * const *argv, const char *optstring);
extern char *optarg;
extern int optind, opterr, optopt;
getopt()
returns the next option letter inargv
that matches a letter inoptstring
. It supports all the rules of the command syntax standard....
optstring
must contain the option letters the command usinggetopt()
will recognize; if a letter is followed by a colon, the option is expected to have an argument, or group of arguments, which may be separated from it by white space.optarg
is set to point to the start of the option argument on return fromgetopt()
.
getopt()
places inoptind
theargv
index of the next argument to be processed.optind
is external and is initialized to 1 before the first call togetopt()
. When all options have been processed (that is, up to the first non-option argument),getopt()
returnsEOF
....
getopt()
prints an error message on the standard error and returns a ``?'' (question mark) when it encounters an option letter not included inoptstring
or no argument after an option that expects one. This error message may be disabled by settingopterr
to 0. The value of the character that caused the error is inoptopt
.
<process arguments>= (<-U) progname = argv[0]; while ((c = getopt(argc, argv, "lp")) != EOF) { switch (c) { case 'l': lexonly = TRUE; break; case 'p': debugflag = TRUE; break; case '?': fprintf(stderr, "usage: %s [-l|-p] [infile]\n", progname); exit(EXIT_FAILURE); } } if (optind < argc) { filename = argv[optind]; if (!(fileptr = fopen(filename, "r"))) { perror(progname); exit(EXIT_FAILURE); } } else { filename = "standard input"; fileptr = stdin; } if (lexonly) { initLexer(fileptr); } else { initParser(fileptr, debugflag); }
For compatibility with Project 1, if the command line option -l
is given then the input is simply lexed by repeated calls to
getNextToken()
, with each token being printed as it is found; at the
end, the symbol table is printed.
<drive the lexer>= (<-U) do { currTok = getNextToken(); printf("%-20s %-20s %-20s\n", showTokenKind(getTokenKind(currTok)), getTokenLexeme(currTok), showTokenValue(getTokenValue(currTok))); } while (getTokenKind(currTok) != EofTOK); printf("\n"); printSymbolTable();
FALSE
and TRUE
are
also defined here.
<global.h>= #ifndef GLOBAL_H #define GLOBAL_H #define FALSE 0 #define TRUE 1 extern char *progname; extern char *filename; extern int lineno; #endif GLOBAL_H
DefinesFALSE
,TRUE
(links are to index).
SymTabEntry
and a function lookupSymbol()
which
takes a string argument and returns the corresponding table entry for
that symbol, creating a new entry if necessary. There is also a
function printSymbolTable()
which prints the contents of the table
for debugging purposes.
<symtab.h>= #ifndef SYMTAB_H #define SYMTAB_H <SymTabEntry ADT> SymTabEntry lookupSymbol(const char *symbol); void printSymbolTable(void); #endif SYMTAB_H
For this project, each symbol table entry only needs to contain the name
of the identifier; future projects will require additional fields such
as type and scope information. Identifier names are assumed to be
less than MAXSYMLEN
characters, although attempts are made to check
that this is not exceeded.
<SymTabEntry ADT>= (<-U) #define MAXSYMLEN 100 typedef struct { char name[MAXSYMLEN]; } *SymTabEntry; /* Constructor */ SymTabEntry createEntry(const char *name); /* Destructor */ void destroyEntry(SymTabEntry entry); /* Selector */ char *getSymbolName(const SymTabEntry entry);
DefinesMAXSYMLEN
,SymTabEntry
(links are to index).
tsearch()
. From the man page:
The comparison function we provide uses#include <search.h>
void *tsearch(const void *key, void **rootp,
int (*compar)(const void *, const void *));
tsearch()
is a routine for manipulating binary search trees. It is generalized from Knuth (6.2.2) Algorithm T. All comparisons are done with a user-supplied routine. This routine is called with two arguments, the pointers to the elements being compared. It returns an integer less than, equal to, or greater than 0, according to whether the first argument is to be considered less than, equal to, or greater than the second argument. The comparison function need not compare every byte, so arbitrary data may be contained in the elements in addition to the values being compared.
tsearch()
is used to build and access the tree.key
is a pointer to a datum to be accessed or stored. If there is a datum in the tree equal to*key
(the value pointed to bykey
), a pointer to this found datum is returned. Otherwise,*key
is inserted, and a pointer to it returned. Only pointers are copied, so the calling routine must store the data.rootp
points to a variable that points to the root of a tree. ANULL
value for the variable pointed to byrootp
denotes an empty tree; in this case, the variable will be set to point to the datum which will be at the root of the new tree.
strncasecmp()
to perform a
case-insensitive test, since case is not significant in PSub.
<symtab.c>= [D->] #include <search.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include "global.h" #include "symtab.h" <SymTabEntry ADT implementation> static void *root = NULL; /* the root of the symbol table */ static int compareEntries(const void *entry1, const void *entry2) { return strncasecmp(((const SymTabEntry)entry1)->name, ((const SymTabEntry)entry2)->name, MAXSYMLEN); } SymTabEntry lookupSymbol(const char *symbol) { SymTabEntry newEntry, *foundEntryPtr; newEntry = createEntry(symbol); foundEntryPtr = (SymTabEntry *)tsearch((void *)newEntry, &root, compareEntries); if (foundEntryPtr == NULL) { perror(progname); } if (*foundEntryPtr != newEntry) { destroyEntry(newEntry); } return *foundEntryPtr; }
DefinescompareEntries
,lookupSymbol
,root
(links are to index).
To print the symbol table, we use the C library function twalk()
.
From the man page:
Our auxiliary function#include <search.h>
void twalk(void *root, void (*action)(void *, VISIT, int));
twalk()
traverses a binary search tree.root
is the root of the tree to be traversed. (Any node in a tree may be used as the root for a walk below that node.)action
is the name of a routine to be invoked at each node. This routine is, in turn, called with three arguments. The first argument is the address of the node being visited. The second argument is a value from an enumeration data typetypedef enum { preorder, postorder, endorder, leaf } VISIT;
(defined in the<search.h>
header), depending on whether this is the first, second or third time that the node has been visited (during a depth-first, left-to-right traversal of the tree), or whether the node is a leaf. The third argument is the level of the node in the tree, with the root being level zero.
printEntry()
prints the symbol table entry
when the VISIT
parameter is either postorder
or leaf
, which
means that each interior node is printed between its left child and its
right child; this gives what is commonly called an inorder
traversal (despite the misleading use of postorder
). The result is
that the entries are printed in sorted order.
<symtab.c>+= [<-D] static void printEntry(const void *entry, VISIT order, int level) { if (order == postorder || order == leaf) { printf("%s\n", getSymbolName(*(SymTabEntry *)entry)); } } void printSymbolTable(void) { printf("Symbol Table\n"); printf("------------\n"); twalk(root, printEntry); printf("------------\n"); }
DefinesprintEntry
,printSymbolTable
(links are to index).
<SymTabEntry ADT implementation>= (<-U) SymTabEntry createEntry(const char *name) { SymTabEntry entry; if (entry = (SymTabEntry)malloc(sizeof entry)) { entry->name[0] = 0; /* set to null string */ strncat(entry->name, name, MAXSYMLEN); return entry; } else { perror(progname); } } void destroyEntry(SymTabEntry entry) { free((void *)entry); } char *getSymbolName(const SymTabEntry entry) { return entry->name; }
DefinescreateEntry
,destroyEntry
,getSymbolName
(links are to index).
getNextToken()
, which returns the next token found on the input,
and initLexer()
, which takes as an argument the file pointer which
the lexer should use for input. It depends on the abstract type
Token
, defined in token.h.
<lexer.h>= #ifndef LEXER_H #define LEXER_H #include "token.h" Token getNextToken(void); void initLexer(FILE *fileptr); #endif LEXER_H
<lexer.l>= %{ #include <stdlib.h> #include "global.h" #include "symtab.h" #include "token.h" #include "lexer.h" <lexical C declarations> %} <lexical definitions> %% <lexical rules> %% <lexical C functions>
The actions of the lexer include keeping track of the current line
number, lineno
, for reporting errors:
<lexical C declarations>= (<-U) [D->] int lineno = 1;
Defineslineno
(links are to index).
flex uses the macro YY_DECL
to declare the lexing function;
we want it to be called getNextToken()
and have return type Token
:
<lexical C declarations>+= (<-U) [<-D->] #define YY_DECL Token getNextToken(void)
DefinesgetNextToken
(links are to index).
The scanned token will be collected in currToken
. For convenience,
we define the macro setCT()
to initialize the appropriate fields
using setToken()
:
<lexical C declarations>+= (<-U) [<-D] Token currToken; #define setCT(kind,val) setToken(currToken, kind, yytext, val)
DefinescurrToken
,setCT
(links are to index).
The definitions start with a case-insensitive version of the alphabet:
<lexical definitions>= (<-U) [D->] a [Aa] b [Bb] c [Cc] d [Dd] e [Ee] f [Ff] g [Gg] h [Hh] i [Ii] j [Jj] k [Kk] l [Ll] m [Mm] n [Nn] o [Oo] p [Pp] q [Qq] r [Rr] s [Ss] t [Tt] u [Uu] v [Vv] w [Ww] x [Xx] y [Yy] z [Zz]
An integer is one or more digits, and a real number is an integer followed by either a fractional part or an exponent, or both:
<lexical definitions>+= (<-U) [<-D->] integer [0-9]+ frac "."{integer} expt {e}[-+]?{integer} real {integer}({frac}{expt}?|{expt})
Definesexpt
,frac
,integer
,real
(links are to index).
An identifier is a letter followed by one or more alphanumerics; the case is preserved but is not significant in searching the symbol table.
<lexical definitions>+= (<-U) [<-D->] ident [A-Za-z][A-Za-z0-9]*
Definesident
(links are to index).
Whitespace is spaces, tabs, carriage returns, and newlines. Newlines are handled separately to maintain the line number count
<lexical definitions>+= (<-U) [<-D->] white [ \t\r]+ newline \n
Definesnewline
,white
(links are to index).
The exclusive states COMMENT1 and COMMENT2 handle the body of comments, respectively surrounded by { } and (* *), so that we can continue to process newlines inside them.
<lexical definitions>+= (<-U) [<-D] %x COMMENT1 COMMENT2
DefinesCOMMENT1
,COMMENT2
(links are to index).
Here are the rules for dealing with whitespace and comments:
<lexical rules>= (<-U) [D->] {white} /* eat up whitespace */ {newline} lineno++; "{" BEGIN(COMMENT1); "(*" BEGIN(COMMENT2); <COMMENT1,COMMENT2>{newline} lineno++; <COMMENT1>[^}\n]+ /* ignore the comment body except for newlines */ <COMMENT2>[^*\n]+ /* ignore the comment body except for newlines */ <COMMENT2>"*" /* ignore lone stars in the comment body */ <COMMENT1>"}" BEGIN(INITIAL); <COMMENT2>"*)" BEGIN(INITIAL); <COMMENT1,COMMENT2><<EOF>> { fprintf(stderr, "Error: unclosed comment at EOF in %s, line %d\n", filename, lineno); return setCT(EofTOK, setNoValue()); } <<EOF>> return setCT(EofTOK, setNoValue());
Here are the rules for dealing with keywords:
<lexical rules>+= (<-U) [<-D->] {p}{r}{o}{g}{r}{a}{m} return setCT(ProgramTOK, setNoValue()); {p}{r}{o}{c}{e}{d}{u}{r}{e} return setCT(ProcedureTOK, setNoValue()); {f}{u}{n}{c}{t}{i}{o}{n} return setCT(FunctionTOK, setNoValue()); {b}{e}{g}{i}{n} return setCT(BeginTOK, setNoValue()); {e}{n}{d} return setCT(EndTOK, setNoValue()); {i}{f} return setCT(IfTOK, setNoValue()); {t}{h}{e}{n} return setCT(ThenTOK, setNoValue()); {e}{l}{s}{e} return setCT(ElseTOK, setNoValue()); {w}{h}{i}{l}{e} return setCT(WhileTOK, setNoValue()); {d}{o} return setCT(DoTOK, setNoValue()); {r}{e}{p}{e}{a}{t} return setCT(RepeatTOK, setNoValue()); {u}{n}{t}{i}{l} return setCT(UntilTOK, setNoValue()); {v}{a}{r} return setCT(VarTOK, setNoValue()); {i}{n}{t}{e}{g}{e}{r} return setCT(IntegerTOK, setNoValue()); {r}{e}{a}{l} return setCT(RealTOK, setNoValue()); {b}{o}{o}{l}{e}{a}{n} return setCT(BooleanTOK, setNoValue()); {n}{o}{t} return setCT(NotTOK, setNoValue());
Here are the rules for dealing with operators:
<lexical rules>+= (<-U) [<-D->] ":=" return setCT(AssignTOK, setNoValue()); "=" return setCT(RelOpTOK, setOpValue(EqOP)); "<>" return setCT(RelOpTOK, setOpValue(NotEqOP)); "<" return setCT(RelOpTOK, setOpValue(LessOP)); "<=" return setCT(RelOpTOK, setOpValue(LessEqOP)); ">" return setCT(RelOpTOK, setOpValue(GrtrOP)); ">=" return setCT(RelOpTOK, setOpValue(GrtrEqOP)); {o}{r} return setCT(AddOpTOK, setOpValue(OrOP)); "+" return setCT(AddOpTOK, setOpValue(PlusOP)); "-" return setCT(AddOpTOK, setOpValue(MinusOP)); {a}{n}{d} return setCT(MulOpTOK, setOpValue(AndOP)); {m}{o}{d} return setCT(MulOpTOK, setOpValue(ModOP)); {d}{i}{v} return setCT(MulOpTOK, setOpValue(DivOP)); "*" return setCT(MulOpTOK, setOpValue(TimesOP)); "/" return setCT(MulOpTOK, setOpValue(DivideOP));
Here are the rules for dealing with identifiers and numbers (note that the identifier rule has to come after all of the keyword tokens):
<lexical rules>+= (<-U) [<-D->] {ident} { return setCT(IdentifierTOK, setSymbolValue(lookupSymbol(yytext))); } {integer} return setCT(NumberTOK, setIntegerValue(atoi(yytext))); {real} return setCT(NumberTOK, setRealValue(atof(yytext)));
And here are the rules for dealing with punctuation and everything else:
<lexical rules>+= (<-U) [<-D] "(" return setCT(LParenTOK, setNoValue()); ")" return setCT(RParenTOK, setNoValue()); ":" return setCT(ColonTOK, setNoValue()); ";" return setCT(SemicolonTOK, setNoValue()); "," return setCT(CommaTOK, setNoValue()); "." return setCT(PeriodTOK, setNoValue()); . { fprintf(stderr, "Error: illegal character %s in %s, line %d\n", yytext, filename, lineno); }
All we need to do to initialize the lexer is set yyin
to the desired
input file pointer.
<lexical C functions>= (<-U) void initLexer(FILE *fileptr) { yyin = fileptr; }
DefinesinitLexer
(links are to index).
Token
contains the kind of the token, the actual lexeme found, and
the corresponding interpretation of that lexeme as a value (in the case
of a token kind where the lexeme provides additional information). For
example, the token 3.14
has kind NumberTOK
, lexeme "3.14"
,
and value 3.14
(as a double
). Since the value of an identifier
will be the corresponding symbol table entry, this depends on the
abstract type SymTabEntry
, defined in symtab.h.
<token.h>= #ifndef TOKEN_H #define TOKEN_H #include "symtab.h" <TokenKind ADT> <TokenValue ADT> typedef struct { TokenKind kind; char *lexeme; TokenValue val; } Token; /* Initializer */ Token setToken(Token token, const TokenKind kind, char *lexeme, const TokenValue val); /* Selectors */ TokenKind getTokenKind(const Token token); char *getTokenLexeme(const Token token); TokenValue getTokenValue(const Token token); #endif TOKEN_H
DefinesToken
(links are to index).
Here are all of the token kinds of PSub, including the special kind
EofTOK
which is only returned on end-of-file.
<TokenKind ADT>= (<-U) typedef enum { IdentifierTOK, NumberTOK, ProgramTOK, ProcedureTOK, FunctionTOK, BeginTOK, EndTOK, IfTOK, ThenTOK, ElseTOK, WhileTOK, DoTOK, RepeatTOK, UntilTOK, VarTOK, IntegerTOK, RealTOK, BooleanTOK, NotTOK, AssignTOK, RelOpTOK, AddOpTOK, MulOpTOK, LParenTOK, RParenTOK, ColonTOK, SemicolonTOK, CommaTOK, PeriodTOK, EofTOK } TokenKind; /* Converter */ char *showTokenKind(const TokenKind kind);
DefinesTokenKind
(links are to index).
The value component of a Token
may be one of several types:
int
, double
, SymTabEntry
, or OpKind
. The type
field
identifies what sort of datum is in the value
field; if it is
NoVAL
, then there is no associated value for this token (for
example, it might just be a keyword or punctuation).
<TokenValue ADT>= (<-U) typedef enum { IntegerVAL, RealVAL, SymbolVAL, OpVAL, NoVAL } TokenValueType; <OpKind ADT> typedef struct { TokenValueType type; union { int intval; double realval; SymTabEntry symbolval; OpKind opval; } value; } TokenValue; /* Initializers */ TokenValue setIntegerValue(const int intval); TokenValue setRealValue(const double realval); TokenValue setSymbolValue(const SymTabEntry symbolval); TokenValue setOpValue(const int opval); TokenValue setNoValue(void); /* Converter */ char *showTokenValue(const TokenValue value); /* Selectors will be needed in future projects, but not yet */
DefinesTokenValue
,TokenValueType
(links are to index).
The type OpKind
distinguishes which particular operator was seen,
since the token kind only discriminates between relational, additive,
and multiplicative operators.
<OpKind ADT>= (<-U) typedef enum { EqOP, NotEqOP, LessOP, LessEqOP, GrtrOP, GrtrEqOP, /* RelOps */ OrOP, PlusOP, MinusOP, /* AddOps */ AndOP, DivOP, ModOP, TimesOP, DivideOP /* MulOps */ } OpKind; /* Converter */ char *showOpKind(const OpKind kind);
DefinesOpKind
(links are to index).
<token.c>= #include "global.h" #include "symtab.h" #include "token.h" <TokenKind ADT implementation> <TokenValue ADT implementation> Token setToken(Token token, const TokenKind kind, char *lexeme, const TokenValue val) { token.kind = kind; token.lexeme = lexeme; token.val = val; return token; } TokenKind getTokenKind(const Token token) { return token.kind; } char *getTokenLexeme(const Token token) { return token.lexeme; } TokenValue getTokenValue(const Token token) { return token.val; }
DefinesgetTokenKind
,getTokenLexeme
,getTokenValue
,setToken
(links are to index).
<TokenKind ADT implementation>= (<-U) char *showTokenKind(const TokenKind kind) { switch (kind) { case IdentifierTOK: return "IdentifierTOK"; case NumberTOK: return "NumberTOK"; case ProgramTOK: return "ProgramTOK"; case ProcedureTOK: return "ProcedureTOK"; case FunctionTOK: return "FunctionTOK"; case BeginTOK: return "BeginTOK"; case EndTOK: return "EndTOK"; case IfTOK: return "IfTOK"; case ThenTOK: return "ThenTOK"; case ElseTOK: return "ElseTOK"; case WhileTOK: return "WhileTOK"; case DoTOK: return "DoTOK"; case RepeatTOK: return "RepeatTOK"; case UntilTOK: return "UntilTOK"; case VarTOK: return "VarTOK"; case IntegerTOK: return "IntegerTOK"; case RealTOK: return "RealTOK"; case BooleanTOK: return "BooleanTOK"; case NotTOK: return "NotTOK"; case AssignTOK: return "AssignTOK"; case RelOpTOK: return "RelOpTOK"; case AddOpTOK: return "AddOpTOK"; case MulOpTOK: return "MulOpTOK"; case LParenTOK: return "LParenTOK"; case RParenTOK: return "RParenTOK"; case ColonTOK: return "ColonTOK"; case SemicolonTOK: return "SemicolonTOK"; case CommaTOK: return "CommaTOK"; case PeriodTOK: return "PeriodTOK"; case EofTOK: return "EofTOK"; } }
DefinesshowTokenKind
(links are to index).
<TokenValue ADT implementation>= (<-U) <OpKind ADT implementation> TokenValue currTokenValue; TokenValue setIntegerValue(const int intval) { currTokenValue.type = IntegerVAL; currTokenValue.value.intval = intval; return currTokenValue; } TokenValue setRealValue(const double realval) { currTokenValue.type = RealVAL; currTokenValue.value.realval = realval; return currTokenValue; } TokenValue setSymbolValue(const SymTabEntry symbolval) { currTokenValue.type = SymbolVAL; currTokenValue.value.symbolval = symbolval; return currTokenValue; } TokenValue setOpValue(const int opval) { currTokenValue.type = OpVAL; currTokenValue.value.opval = opval; return currTokenValue; } TokenValue setNoValue(void) { currTokenValue.type = NoVAL; return currTokenValue; } /* a buffer in which to print out numbers */ static char valueString[100]; char *showTokenValue(const TokenValue val) { switch (val.type) { case IntegerVAL: sprintf(valueString, "%d", val.value.intval); return valueString; case RealVAL: sprintf(valueString, "%G", val.value.realval); return valueString; case SymbolVAL: return getSymbolName(val.value.symbolval); case OpVAL: return showOpKind(val.value.opval); case NoVAL: return ""; } }
DefinescurrTokenValue
,setIntegerValue
,setNoValue
,setOpValue
,setRealValue
,setSymbolValue
,showTokenValue
,valueString
(links are to index).
<OpKind ADT implementation>= (<-U) char *showOpKind(const OpKind kind) { switch (kind) { case EqOP: return "EqOP"; case NotEqOP: return "NotEqOP"; case LessOP: return "LessOP"; case LessEqOP: return "LessEqOP"; case GrtrOP: return "GrtrOP"; case GrtrEqOP: return "GrtrEqOP"; case OrOP: return "OrOP"; case PlusOP: return "PlusOP"; case MinusOP: return "MinusOP"; case AndOP: return "AndOP"; case DivOP: return "DivOP"; case ModOP: return "ModOP"; case TimesOP: return "TimesOP"; case DivideOP: return "DivideOP"; } }
DefinesshowOpKind
(links are to index).
ParseProgram()
, which attempts to recognize an entire PSub program
in the stream of tokens from the lexer,
and initParser()
, which takes as arguments the file pointer which
the lexer should use for input and a debugging flag to control whether a
trace of the parse is printed. The program calling the parser must
provide an externally visible definition for the variable currTok
,
which contains the most recently lexed token.
<parser.h>= #ifndef PARSER_H #define PARSER_H void ParseProgram(void); void initParser(FILE *fileptr, const int debugflag); extern Token currTok; #endif PARSER_H
The function initParser()
initializes the lexer with the
appropriate file pointer, gets the first token into currTok
, and
sets the flag telling whether the parser should print a debugging
trace. If the flag debugparse
is true, then the parsing routines
print the following information:
<parser.c>= #include <stdio.h> #include <stdlib.h> #include "global.h" #include "token.h" #include "lexer.h" #include "parser.h" static int debugparse = FALSE; /* print out trace of parse if TRUE */ <auxiliary parsing routines> <recursive descent parsing functions> void initParser(FILE *fileptr, const int debugflag) { initLexer(fileptr); currTok = getNextToken(); debugparse = debugflag; }
Definesdebugparse
,initParser
(links are to index).
<recursive descent parsing functions>= (<-U) <mutually recursive function declarations> <parse a program> <parse variable delcarations> <parse subprogram declarations> <parse statements> <parse expressions>
Since the parsing functions are all mutually recursive, we must declare them all before giving any of their definitions.
<mutually recursive function declarations>= (<-U) [D->] void ParseProgram(void); static void ParseVarDecls(void); static void ParseDeclList(void); static void ParseDeclListRest(void); static void ParseDeclaration(void); static void ParseIdentifierList(void); static void ParseIdentifierListRest(void); static void ParseType(void);
<mutually recursive function declarations>+= (<-U) [<-D->] static void ParseSubprogramDecls(void); static void ParseSubprogramDeclaration(void); static void ParseSubprogramHead(void); static void ParseArguments(void); static void ParseParameterList(void); static void ParseParameterListRest(void); static void ParseParameter(void);
<mutually recursive function declarations>+= (<-U) [<-D->] static void ParseCompoundStatement(void); static void ParseStatementList(void); static void ParseStatementListRest(void); static void ParseStatement(void); static void ParseStatementRest(void);
<mutually recursive function declarations>+= (<-U) [<-D] static void ParseExpressionList(void); static void ParseExpressionListRest(void); static void ParseExpression(void); static void ParseExpressionRest(void); static void ParseSimpleExpression(void); static void ParseSimpleExpressionRest(void); static void ParseTerm(void); static void ParseTermRest(void); static void ParseFactor(void); static void ParseFactorRest(void);
<parse a program>= (<-U) void ParseProgram(void) { nonTerm("Program"); match(ProgramTOK); match(IdentifierTOK); match(SemicolonTOK); ParseVarDecls(); ParseSubprogramDecls(); ParseCompoundStatement(); match(PeriodTOK); }
DefinesParseProgram
(links are to index).
var_decls --> var decl_list
| epsilon
<parse variable delcarations>= (<-U) [D->] static void ParseVarDecls(void) { nonTerm("Var_Decls"); if (check(VarTOK)) { match(VarTOK); ParseDeclList(); } else { matchEps(); } }
DefinesParseVarDecls
(links are to index).
decl_list --> declaration decl_list_rest
<parse variable delcarations>+= (<-U) [<-D->] static void ParseDeclList(void) { nonTerm("Decl_List"); ParseDeclaration(); ParseDeclListRest(); }
DefinesParseDeclList
(links are to index).
decl_list_rest --> declaration decl_list_rest
| epsilon
<parse variable delcarations>+= (<-U) [<-D->] static void ParseDeclListRest(void) { nonTerm("Decl_List_Rest"); if (check(IdentifierTOK)) { ParseDeclaration(); ParseDeclListRest(); } else { matchEps(); } }
DefinesParseDeclListRest
(links are to index).
declaration --> identifier_list : type ;
<parse variable delcarations>+= (<-U) [<-D->] static void ParseDeclaration(void) { nonTerm("Declaration"); ParseIdentifierList(); match(ColonTOK); ParseType(); match(SemicolonTOK); }
DefinesParseDeclaration
(links are to index).
identifier_list --> id identifier_list_rest
<parse variable delcarations>+= (<-U) [<-D->] static void ParseIdentifierList(void) { nonTerm("Identifier_List"); match(IdentifierTOK); ParseIdentifierListRest(); }
DefinesParseIdentifierList
(links are to index).
identifier_list_rest --> , id identifier_list_rest
| epsilon
<parse variable delcarations>+= (<-U) [<-D->] static void ParseIdentifierListRest(void) { nonTerm("Identifier_List_Rest"); if (check(CommaTOK)) { match(CommaTOK); match(IdentifierTOK); ParseIdentifierListRest(); } else { matchEps(); } }
DefinesParseIdentifierListRest
(links are to index).
type --> integer
| real
| boolean
<parse variable delcarations>+= (<-U) [<-D] static void ParseType(void) { nonTerm("Type"); if (check(IntegerTOK)) { match(IntegerTOK); } else if (check(RealTOK)) { match(RealTOK); } else { match(BooleanTOK); } }
DefinesParseType
(links are to index).
subprogram_decls --> subprogram_declaration subprogram_decls
| epsilon
<parse subprogram declarations>= (<-U) [D->] static void ParseSubprogramDecls(void) { nonTerm("Subprogram_Decls"); if (check(FunctionTOK) || check(ProcedureTOK)) { ParseSubprogramDeclaration(); ParseSubprogramDecls(); } else { matchEps(); } }
DefinesParseSubprogramDecls
(links are to index).
subprogram_declaration --> subprogram_head var_decls compound_statement ;
<parse subprogram declarations>+= (<-U) [<-D->] static void ParseSubprogramDeclaration(void) { nonTerm("Subprogram_Declaration"); ParseSubprogramHead(); ParseVarDecls(); ParseCompoundStatement(); match(SemicolonTOK); }
DefinesParseSubprogramDeclaration
(links are to index).
subprogram_head --> function id arguments : type ;
| procedure id arguments ;
<parse subprogram declarations>+= (<-U) [<-D->] static void ParseSubprogramHead(void) { nonTerm("Subprogram_Head"); if (check(FunctionTOK)) { match(FunctionTOK); match(IdentifierTOK); ParseArguments(); match(ColonTOK); ParseType(); match(SemicolonTOK); } else { match(ProcedureTOK); match(IdentifierTOK); ParseArguments(); match(SemicolonTOK); } }
DefinesParseSubprogramHead
(links are to index).
arguments --> ( parameter_list )
| epsilon
<parse subprogram declarations>+= (<-U) [<-D->] static void ParseArguments(void) { nonTerm("Arguments"); if (check(LParenTOK)) { match(LParenTOK); ParseParameterList(); match(RParenTOK); } else { matchEps(); } }
DefinesParseArguments
(links are to index).
parameter_list --> parameter parameter_list_rest
<parse subprogram declarations>+= (<-U) [<-D->] static void ParseParameterList(void) { nonTerm("Parameter_List"); ParseParameter(); ParseParameterListRest(); }
DefinesParseParameterList
(links are to index).
parameter_list_rest --> ; parameter parameter_list_rest
| epsilon
<parse subprogram declarations>+= (<-U) [<-D->] static void ParseParameterListRest(void) { nonTerm("Parameter_List_Rest"); if (check(SemicolonTOK)) { match(SemicolonTOK); ParseParameter(); ParseParameterListRest(); } else { matchEps(); } }
DefinesParseParameterListRest
(links are to index).
parameter --> identifier_list : type
<parse subprogram declarations>+= (<-U) [<-D] static void ParseParameter(void) { nonTerm("Parameter"); ParseIdentifierList(); match(ColonTOK); ParseType(); }
DefinesParseParameter
(links are to index).
compound_statement --> begin statement_list end
<parse statements>= (<-U) [D->] static void ParseCompoundStatement(void) { nonTerm("Compound_Statement"); match(BeginTOK); ParseStatementList(); match(EndTOK); }
DefinesParseCompoundStatement
(links are to index).
statement_list --> statement statement_list_rest
<parse statements>+= (<-U) [<-D->] static void ParseStatementList(void) { nonTerm("Statement_List"); ParseStatement(); ParseStatementListRest(); }
DefinesParseStatementList
(links are to index).
statement_list_rest --> ; statement statement_list_rest
| epsilon
<parse statements>+= (<-U) [<-D->] static void ParseStatementListRest(void) { nonTerm("Statement_List_Rest"); if (check(SemicolonTOK)) { match(SemicolonTOK); ParseStatement(); ParseStatementListRest(); } else { matchEps(); } }
DefinesParseStatementListRest
(links are to index).
statement --> id statement_rest
| compound_statement
| if expression then statement else statement
| while expression do statement
| repeat statement_list until expression
| epsilon
<parse statements>+= (<-U) [<-D->] static void ParseStatement(void) { nonTerm("Statement"); if (check(IdentifierTOK)) { match(IdentifierTOK); ParseStatementRest(); } else if (check(BeginTOK)) { ParseCompoundStatement(); } else if (check(IfTOK)) { match(IfTOK); ParseExpression(); match(ThenTOK); ParseStatement(); match(ElseTOK); ParseStatement(); } else if (check(WhileTOK)) { match(WhileTOK); ParseExpression(); match(DoTOK); ParseStatement(); } else if (check(RepeatTOK)) { match(RepeatTOK); ParseStatementList(); match(UntilTOK); ParseExpression(); } else { matchEps(); } }
DefinesParseStatement
(links are to index).
statement_rest --> := expression
| ( expression_list )
| epsilon
<parse statements>+= (<-U) [<-D] static void ParseStatementRest(void) { nonTerm("Statement_Rest"); if (check(AssignTOK)) { match(AssignTOK); ParseExpression(); } else if (check(LParenTOK)) { match(LParenTOK); ParseExpressionList(); match(RParenTOK); } else { matchEps(); } }
DefinesParseStatementRest
(links are to index).
expression_list --> expression expression_list_rest
<parse expressions>= (<-U) [D->] static void ParseExpressionList(void) { nonTerm("Expression_List"); ParseExpression(); ParseExpressionListRest(); }
DefinesParseExpressionList
(links are to index).
expression_list_rest --> , expression expression_list_rest
| epsilon
<parse expressions>+= (<-U) [<-D->] static void ParseExpressionListRest(void) { nonTerm("Expression_List_Rest"); if (check(CommaTOK)) { match(CommaTOK); ParseExpression(); ParseExpressionListRest(); } else { matchEps(); } }
DefinesParseExpressionListRest
(links are to index).
expression --> simple_expression expression_rest
<parse expressions>+= (<-U) [<-D->] static void ParseExpression(void) { nonTerm("Expression"); ParseSimpleExpression(); ParseExpressionRest(); }
DefinesParseExpression
(links are to index).
expression_rest --> relop simple_expression
| epsilon
<parse expressions>+= (<-U) [<-D->] static void ParseExpressionRest(void) { nonTerm("Expression_Rest"); if (check(RelOpTOK)) { match(RelOpTOK); ParseSimpleExpression(); } else { matchEps(); } }
DefinesParseExpressionRest
(links are to index).
simple_expression --> addop term simple_expression_rest
| term simple_expression_rest
<parse expressions>+= (<-U) [<-D->] static void ParseSimpleExpression(void) { nonTerm("Simple_Expression"); if (check(AddOpTOK)) { match(AddOpTOK); ParseTerm(); ParseSimpleExpressionRest(); } else { ParseTerm(); ParseSimpleExpressionRest(); } }
DefinesParseSimpleExpression
(links are to index).
simple_expression_rest --> addop term simple_expression_rest
| epsilon
<parse expressions>+= (<-U) [<-D->] static void ParseSimpleExpressionRest(void) { nonTerm("Simple_Expression_Rest"); if (check(AddOpTOK)) { match(AddOpTOK); ParseTerm(); ParseSimpleExpressionRest(); } else { matchEps(); } }
DefinesParseSimpleExpressionRest
(links are to index).
<parse expressions>+= (<-U) [<-D->] static void ParseTerm(void) { nonTerm("Term"); ParseFactor(); ParseTermRest(); }
DefinesParseTerm
(links are to index).
term_rest --> mulop factor term_rest
| epsilon
<parse expressions>+= (<-U) [<-D->] static void ParseTermRest(void) { nonTerm("Term_Rest"); if (check(MulOpTOK)) { match(MulOpTOK); ParseFactor(); ParseTermRest(); } else { matchEps(); } }
DefinesParseTermRest
(links are to index).
factor --> id factor_rest
| num
| ( expression )
| not factor
<parse expressions>+= (<-U) [<-D->] static void ParseFactor(void) { nonTerm("Factor"); if (check(IdentifierTOK)) { match(IdentifierTOK); ParseFactorRest(); } else if (check(NumberTOK)) { match(NumberTOK); } else if (check(LParenTOK)) { match(LParenTOK); ParseExpression(); match(RParenTOK); } else { match(NotTOK); ParseFactor(); } }
DefinesParseFactor
(links are to index).
factor_rest --> ( expression_list )
| epsilon
<parse expressions>+= (<-U) [<-D] static void ParseFactorRest(void) { nonTerm("Factor_Rest"); if (check(LParenTOK)) { match(LParenTOK); ParseExpressionList(); match(RParenTOK); } else { matchEps(); } }
DefinesParseFactorRest
(links are to index).
The auxiliary routine check()
is a macro which returns a truth
value telling whether the current token has the given token kind. The
function match()
gets the next token if the current token has the
given kind, otherwise it prints an error message and quits. If the
tracing flag debugparse
is set, then it also prints out the lexeme
matched. The function matchEps()
is called when an
epsilon-production is used; it just prints a newline if the tracing
flag is set.
<auxiliary parsing routines>= (<-U) [D->] #define check(kind) (kind == getTokenKind(currTok)) static void match(const TokenKind kind) { if (check(kind)) { if (debugparse) { printf("%s\n", getTokenLexeme(currTok)); } currTok = getNextToken(); } else { fprintf(stderr, "Error: looking for a %s, found %s in %s, line %d\n", showTokenKind(kind), getTokenLexeme(currTok), filename, lineno); exit(EXIT_FAILURE); } } static void matchEps() { if (debugparse) { printf("\n"); } }
Definescheck
,match
,matchEps
(links are to index).
Every parsing routine starts with a call to nonTerm()
, which will
print out the name of the non-terminal if debugparse
is true.
<auxiliary parsing routines>+= (<-U) [<-D] static void nonTerm(const char *name) { if (debugparse) { printf("%s ", name); } }
DefinesnonTerm
(links are to index).