Educational Objectives Summary: After completing this assignment, the student should be able to do the following:
- Draw parse trees for legal expressions
- Given an LL(1) grammar, implement a recursive descent parser as a Java class:
- Define functions (methods) for each non-terminal in the grammar
- Use sequencing and recursion as defined in the productions of the grammar
- Explain how legal expressions are parsed by the code
- Explain why non-legal expressions are not fully parsed and where the parser detects the non-legality
- Illustrate decorated parse trees for legal expressions in an augmented LL(1) grammar
- Read and understand implementations of augmented LL(1) grammars
- Expand augmented LL(1) grammars to include new language features
- Modify and expand existing implementations of augmented LL(1) grammars to include additions to the grammar
- Implement a compiler front end for a small LL(1) grammar that outputs an abstract syntax tree (AST)
- Implement a compiler for a calculator language that produces an AST in which the known arithmetic semantics have been applied
- Traverse an AST to produce a Lisp-like representation of a legal input expression
Deliverables summary: Parser2.java, Calc2.java, AST2.java, Calc2Lisp.java, log.txt
Educational Objectives: After completing this assignment, the student should be able to do the following:
Operational Objectives: Implement a recursive descent parser in Java for a calculator language based on a BNF grammar.
Deliverables: One file Parser2.java
Arithmetic operators in a programming language are typically left associative with the notable exception of exponentiation (^) which is right associative. (However, this rule of thumb is not universal.)
Associativity can be captured in a grammar. For a left associative binary operator lop we can have a production of the form:
<expr> -> <term> | <expr> lop <term>
For example, a+b+c is evaluated from the left to the right by summing a and b first. Assuming that <term> represents identifiers, the parse tree of a+b+c with the grammar above is:
              <expr>
             /  |   \
        <expr>  +  <term>
       /  |   \       |
  <expr>  +  <term>   c
     |          |
  <term>        b
     |
     a
As you can see, the left subtree represents a+b, which is a subexpression of a+b+c, because a+b+c is parsed as (a+b)+c.
Note that the production for a left associative operator is left recursive. To eliminate left recursion, we can rewrite the grammar into:
<expr>      -> <term> <term_tail>
<term_tail> -> lop <term> <term_tail>
             | empty
This (part of the) grammar is LL(1) and therefore suitable for recursive descent parsing. However, the parse tree structure does not capture the left-associativity of the lop operator.
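To see why the rewritten grammar suits recursive descent, here is a minimal recognizer sketch. It assumes single-letter identifiers for <term> and '+' as the lop operator. The class name LeftFactoredRecognizer and its methods are hypothetical and are not part of the project files. Each nonterminal becomes one method, and one character of lookahead is enough to choose between the two <term_tail> productions. A method written directly from the left recursive production would call itself before consuming any input and never terminate.

```java
// Hypothetical demo class; not part of the project files.
class LeftFactoredRecognizer {
    private final String input;
    private int pos = 0;

    LeftFactoredRecognizer(String input) { this.input = input; }

    // Returns the current character, or '\0' at end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // <expr> -> <term> <term_tail>
    private boolean expr() { return term() && termTail(); }

    // <term> -> identifier (assumed to be a single letter in this sketch)
    private boolean term() {
        if (Character.isLetter(peek())) { pos++; return true; }
        return false;
    }

    // <term_tail> -> '+' <term> <term_tail> | empty
    private boolean termTail() {
        if (peek() == '+') {   // one symbol of lookahead selects the production
            pos++;
            return term() && termTail();
        }
        return true;           // empty production: consume nothing
    }

    // Accepts s only if <expr> matches and all input is consumed.
    static boolean accepts(String s) {
        LeftFactoredRecognizer r = new LeftFactoredRecognizer(s);
        return r.expr() && r.pos == s.length();
    }

    public static void main(String[] args) {
        System.out.println(accepts("a+b+c")); // true
        System.out.println(accepts("a+"));    // false
    }
}
```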
Draw the parse tree of a+b+c using the LL(1) grammar shown above. You may assume that <term> represents identifiers. Hint: draw the tree from the top down by simulating a top-down predictive parser.
For a right associative operator rop we can create a grammar production of the form:
<expr> -> <term> | <term> rop <expr>
An example right associative operator is exponentiation ^, so a^b^c is evaluated from the right to the left such that b^c is evaluated first.
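The effect of right recursion on evaluation order can be sketched with a tiny evaluator. It assumes that <term> is a single decimal digit and that the two alternatives are left-factored into "<term> followed by an optional ^ <expr>". The class RightAssocDemo is hypothetical and only illustrates the idea.

```java
// Hypothetical demo class; not part of the project files.
class RightAssocDemo {
    private final String input;
    private int pos = 0;

    RightAssocDemo(String input) { this.input = input; }

    // <expr> -> <term> | <term> '^' <expr>
    private long expr() {
        long base = term();
        if (pos < input.length() && input.charAt(pos) == '^') {
            pos++;
            long exponent = expr();   // the right operand is evaluated first
            return power(base, exponent);
        }
        return base;
    }

    // <term> -> a single decimal digit (an assumption of this sketch)
    private long term() { return input.charAt(pos++) - '0'; }

    // Integer power by repeated multiplication (non-negative exponents only).
    private static long power(long base, long exponent) {
        long result = 1;
        for (long i = 0; i < exponent; i++) result *= base;
        return result;
    }

    static long eval(String s) { return new RightAssocDemo(s).expr(); }

    public static void main(String[] args) {
        System.out.println(eval("2^3^2")); // 512, that is 2^(3^2), not (2^3)^2 = 64
    }
}
```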
Draw the parse tree of a^b^c. You may assume that <term> represents identifiers.
The precedence of an operator indicates the priority of applying the operator relative to other operators. For example, multiplication has a higher precedence than addition, so a+b*c is evaluated by multiplying b and c first. In other words, multiplication groups more tightly compared to addition. The rules of operator precedence vary from one programming language to another.
The relative precedences between operators can be captured in a grammar as follows. A nonterminal is introduced for every group of operators with identical precedence. The nonterminal of the group of operators with lowest precedence is the nonterminal for the expression as a whole. Productions for (left associative) binary operators with lowest to highest precedences are written of the form suitable for recursive descent parsing. Here is an outline:
<expr>    -> <e1> <e1_tail>
<e1>      -> <e2> <e2_tail>
<e1_tail> -> <lowest_op> <e1> <e1_tail>
           | empty
<e2>      -> <e3> <e3_tail>
<e2_tail> -> <second_lowest_op> <e2> <e2_tail>
           | empty
...
<eN>      -> '(' <expr> ')'
           | '-' <eN>
           | identifier
           | number
<eN_tail> -> <highest_op> <eN> <eN_tail>
           | empty
where <lowest_op> is a nonterminal denoting all operators with the same lowest precedence, etc.
The following Java program uses these concepts to implement a recursive descent parser for a calculator language:
/* Parser.java
   Implements a parser for a calculator language
   Uses java.io.StreamTokenizer and recursive descent parsing

   Compile:
   javac Parser.java
*/

import java.io.*;

/* Calculator language grammar:

   <expr>        -> <term> <term_tail>
   <term>        -> <factor> <factor_tail>
   <term_tail>   -> <add_op> <term> <term_tail>
                  | empty
   <factor>      -> '(' <expr> ')'
                  | '-' <factor>
                  | identifier
                  | number
   <factor_tail> -> <mult_op> <factor> <factor_tail>
                  | empty
   <add_op>      -> '+' | '-'
   <mult_op>     -> '*' | '/'
*/

public class Parser
{
  private static StreamTokenizer tokens;
  private static int token;

  public static void main(String argv[]) throws IOException
  {
    InputStreamReader reader;
    if (argv.length > 0)
      reader = new InputStreamReader(new FileInputStream(argv[0]));
    else
      reader = new InputStreamReader(System.in);

    // create the tokenizer:
    tokens = new StreamTokenizer(reader);
    tokens.ordinaryChar('.');
    tokens.ordinaryChar('-');
    tokens.ordinaryChar('/');

    // advance to the first token on the input:
    getToken();

    // check if expression:
    expr();

    // check if expression ends with ';'
    if (token == (int)';')
      System.out.println("Syntax ok");
    else
      System.out.println("Syntax error");
  }

  // getToken - advance to the next token on the input
  private static void getToken() throws IOException
  {
    token = tokens.nextToken();
  }

  // expr - parse <expr> -> <term> <term_tail>
  private static void expr() throws IOException
  {
    term();
    term_tail();
  }

  // term - parse <term> -> <factor> <factor_tail>
  private static void term() throws IOException
  {
    factor();
    factor_tail();
  }

  // term_tail - parse <term_tail> -> <add_op> <term> <term_tail> | empty
  private static void term_tail() throws IOException
  {
    if (token == (int)'+' || token == (int)'-')
    {
      add_op();
      term();
      term_tail();
    }
  }

  // factor - parse <factor> -> '(' <expr> ')' | '-' <factor> | identifier | number
  private static void factor() throws IOException
  {
    if (token == (int)'(')
    {
      getToken();
      expr();
      if (token == (int)')')
        getToken();
      else
        System.out.println("closing ')' expected");
    }
    else if (token == (int)'-')
    {
      getToken();
      factor();
    }
    else if (token == tokens.TT_WORD)
      getToken();
    else if (token == tokens.TT_NUMBER)
      getToken();
    else
      System.out.println("factor expected");
  }

  // factor_tail - parse <factor_tail> -> <mult_op> <factor> <factor_tail> | empty
  private static void factor_tail() throws IOException
  {
    if (token == (int)'*' || token == (int)'/')
    {
      mult_op();
      factor();
      factor_tail();
    }
  }

  // add_op - parse <add_op> -> '+' | '-'
  private static void add_op() throws IOException
  {
    if (token == (int)'+' || token == (int)'-')
      getToken();
  }

  // mult_op - parse <mult_op> -> '*' | '/'
  private static void mult_op() throws IOException
  {
    if (token == (int)'*' || token == (int)'/')
      getToken();
  }
}
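When tracing the parser by hand, it helps to know exactly which tokens java.io.StreamTokenizer delivers. The sketch below lists the tokens for an input string, using the same three ordinaryChar settings as the parser above. The class TokenDump and its tokenize method are hypothetical helpers, not part of the project files. Reading from a StringReader instead of System.in is an assumption made so the sketch is easy to run.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the project files.
class TokenDump {
    // Returns a readable list of the tokens that StreamTokenizer produces
    // for s, with '.', '-' and '/' made ordinary as in the parser above.
    static List<String> tokenize(String s) {
        StreamTokenizer tokens = new StreamTokenizer(new StringReader(s));
        tokens.ordinaryChar('.');
        tokens.ordinaryChar('-');
        tokens.ordinaryChar('/');
        List<String> out = new ArrayList<>();
        try {
            int token;
            while ((token = tokens.nextToken()) != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_NUMBER)
                    out.add("NUMBER(" + tokens.nval + ")");   // numeric value is a double
                else if (token == StreamTokenizer.TT_WORD)
                    out.add("WORD(" + tokens.sval + ")");     // identifier text
                else
                    out.add("'" + (char) token + "'");        // single ordinary character
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("2*(1+3)/x;"));
    }
}
```

For the input 2*(1+3)/x; this prints [NUMBER(2.0), '*', '(', NUMBER(1.0), '+', NUMBER(3.0), ')', '/', WORD(x), ';']. Note that the value of token is an int: either one of the TT_ constants or the character code of an ordinary character, which is why the parser compares it against (int)'+' and similar.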
Copy (and download if needed) this example parser program from:
~cop4020p/[semester]/proj2/
Compile and execute:
javac Parser.java
java Parser
Give the output of the program when you type 2*(1+3)/x; and explain why this expression is accepted by the parser by drawing the parse tree. Give the output of the program when you type 2x+1; and explain why it is not accepted. At what point in the program does the parser fail?
Extend the parser program to include syntax checking of function calls with one argument, given by the new production for <factor>:
<factor> -> '(' <expr> ')' | '-' <factor> | identifier '(' <expr> ')' | identifier | number
Test your implementation with 2*f(1+a);. Also draw the parse tree of 2*f(1+a);.
Extend the parser to include syntax checking of the exponentiation operator ^, so that expressions like -a^2 and -(a^b)^(c*d)^(e+f) can be parsed. Note that exponentiation is right associative and has the highest precedence, even higher than unary minus, so -a^2 is evaluated by evaluating a^2 first. To implement this, you must add a <power> nonterminal and also change the production of <factor> so that the parse tree of -a^2 is:
   <factor>
    /    \
   -    <power>
        /  |  \
       a   ^  <power>
                 |
                 2
Keep this step of the project as a working Java program named "Parser2.java". This will be collected using the standard submit script configured with the project 2 deliverables.sh. Your answers for the non-programming questions should be inserted as "Appendix 1" at the end of your log.txt. Use plain text to draw the trees as required.
Hint: The compiled "Parser2v.class" is distributed in area51. You can use this to see how an expression is processed through a correctly functioning parser. (This is a "verbose" version. Your Parser2.java should not be verbose.)
Educational Objectives: After completing this assignment, the student should be able to do the following:
Operational Objectives: Implement a calculator with assignment in Java using an L-attributed grammar
Deliverables: One file Calc2.java
Consider the following augmented LL(1) grammar for an expression language:
<expr> -> <term> <term_tail>
      term_tail.subtotal := term.value;
      expr.value := term_tail.value

<term> -> <factor> <factor_tail>
      factor_tail.subtotal := factor.value;
      term.value := factor_tail.value

<term_tail1> -> '+' <term> <term_tail2>
      term_tail2.subtotal := term_tail1.subtotal+term.value;
      term_tail1.value := term_tail2.value
  | '-' <term> <term_tail2>
      term_tail2.subtotal := term_tail1.subtotal-term.value;
      term_tail1.value := term_tail2.value
  | empty
      term_tail1.value := term_tail1.subtotal

<factor1> -> '(' <expr> ')'
      factor1.value := expr.value
  | '-' <factor2>
      factor1.value := -factor2.value
  | number
      factor1.value := number

<factor_tail1> -> '*' <factor> <factor_tail2>
      factor_tail2.subtotal := factor_tail1.subtotal*factor.value;
      factor_tail1.value := factor_tail2.value
  | '/' <factor> <factor_tail2>
      factor_tail2.subtotal := factor_tail1.subtotal/factor.value;
      factor_tail1.value := factor_tail2.value
  | empty
      factor_tail1.value := factor_tail1.subtotal
Note: the indexing (1 and 2) used with nonterminals, such as <factor1> and <factor2>, is only relevant to the semantic rules to identify the specific occurrences of the nonterminals in a production. (See text.)
Draw the decorated parse tree for -2*3+1 that shows the attributes and their values.
The following calculator Java program implements the attribute grammar shown above to calculate the value of an expression. To this end, the synthesized value attributes are returned as integer values from the methods that correspond to nonterminals. Inherited subtotal attributes are passed to the methods as arguments:
/* Calc.java
   Implements a parser and calculator for simple expressions
   Uses java.io.StreamTokenizer and recursive descent parsing

   Compile:
   javac Calc.java

   Execute:
   java Calc
   or:
   java Calc <filename>
*/

import java.io.*;

public class Calc
{
  private static StreamTokenizer tokens;
  private static int token;

  public static void main(String argv[]) throws IOException
  {
    InputStreamReader reader;
    if (argv.length > 0)
      reader = new InputStreamReader(new FileInputStream(argv[0]));
    else
      reader = new InputStreamReader(System.in);

    // create the tokenizer:
    tokens = new StreamTokenizer(reader);
    tokens.ordinaryChar('.');
    tokens.ordinaryChar('-');
    tokens.ordinaryChar('/');

    // advance to the first token on the input:
    getToken();

    // parse expression and get calculated value:
    int value = expr();

    // check if expression ends with ';' and print value
    if (token == (int)';')
      System.out.println("Value = " + value);
    else
      System.out.println("Syntax error");
  }

  // getToken - advance to the next token on the input
  private static void getToken() throws IOException
  {
    token = tokens.nextToken();
  }

  // expr - parse <expr> -> <term> <term_tail>
  private static int expr() throws IOException
  {
    int subtotal = term();
    return term_tail(subtotal);
  }

  // term - parse <term> -> <factor> <factor_tail>
  private static int term() throws IOException
  {
    int subtotal = factor();
    return factor_tail(subtotal);
  }

  // term_tail - parse <term_tail> -> <add_op> <term> <term_tail> | empty
  private static int term_tail(int subtotal) throws IOException
  {
    if (token == (int)'+')
    {
      getToken();
      int termvalue = term();
      return term_tail(subtotal + termvalue);
    }
    else if (token == (int)'-')
    {
      getToken();
      int termvalue = term();
      return term_tail(subtotal - termvalue);
    }
    else
      return subtotal;
  }

  // factor - parse <factor> -> '(' <expr> ')' | '-' <factor> | identifier | number
  private static int factor() throws IOException
  {
    if (token == (int)'(')
    {
      getToken();
      int value = expr();
      if (token == (int)')')
        getToken();
      else
        System.out.println("closing ')' expected");
      return value;
    }
    else if (token == (int)'-')
    {
      getToken();
      return -factor();
    }
    else if (token == tokens.TT_WORD)
    {
      getToken();
      // ignore variable names
      return 0;
    }
    else if (token == tokens.TT_NUMBER)
    {
      getToken();
      return (int)tokens.nval;
    }
    else
    {
      System.out.println("factor expected");
      return 0;
    }
  }

  // factor_tail - parse <factor_tail> -> <mult_op> <factor> <factor_tail> | empty
  private static int factor_tail(int subtotal) throws IOException
  {
    if (token == (int)'*')
    {
      getToken();
      int factorvalue = factor();
      return factor_tail(subtotal * factorvalue);
    }
    else if (token == (int)'/')
    {
      getToken();
      int factorvalue = factor();
      return factor_tail(subtotal / factorvalue);
    }
    else
      return subtotal;
  }
}
Copy this example Calc.java program from [LIB]/proj2/, and compile and run it:
javac Calc.java
java Calc
Explain why the input 1/2; to this program produces the value 0. What are the relevant parts of the program involved in computing this result?
Extend the attribute grammar with two new productions and two new attributes, in and out, for all nonterminals. The in and out attributes carry the symbol table into and out of each nonterminal.
The two new productions with corresponding semantic rules are as follows:
<expr1> -> 'let' identifier '=' <expr2>
      expr2.in := expr1.in;
      expr1.value := expr2.value;
      expr1.out := expr2.out.put(identifier=expr2.value)
  | <term> <term_tail>
      term.in := expr1.in;
      term_tail.in := term.out;
      term_tail.subtotal := term.value;
      expr1.value := term_tail.value;
      expr1.out := term_tail.out

<factor1> -> '(' <expr> ')'
      expr.in := factor1.in;
      factor1.value := expr.value;
      factor1.out := expr.out
  | '-' <factor2>
      factor2.in := factor1.in;
      factor1.value := -factor2.value;
      factor1.out := factor2.out
  | identifier
      factor1.value := factor1.in.get(identifier);
      factor1.out := factor1.in
  | number
      factor1.value := number;
      factor1.out := factor1.in
The first production introduces an assignment construct as an expression, similar to the C/C++ assignment which can also be used within an expression, as in this example:
(let x = 3) + x;
Value = 6
The semantic rule expr2.in := expr1.in copies the symbol table of the context in which expr1 is evaluated to the context of expr2. The evaluation of expr2 may change the symbol table and the table is copied to expr1 with the semantic rule expr1.out := expr2.out. For this part of the assignment, you have to change the semantic rules of all other productions in the grammar to include assignments for the in and out attributes to pass the symbol table. Write down the grammar with these new semantic rules.
Implement the two new productions and semantic rules in an updated Calc2.java program.
To implement a symbol table with identifier-value bindings, you can use the Java java.util.Hashtable class as follows:
import java.util.*;
...
public class Calc
{
  ...
  public static void main(String argv[]) throws IOException
  {
    ...
    Hashtable<String,Integer> exprin = new Hashtable<String,Integer>();
    Hashtable<String,Integer> exprout = exprin; // Java requires locals to be initialized before use
    ...
    int value = expr(exprin, exprout);
    ...

  private static int expr
    (Hashtable<String,Integer> exprin, Hashtable<String,Integer> exprout)
    throws IOException
  {
    if (token == tokens.TT_WORD && tokens.sval.equals("let"))
    {
      getToken(); // advance to identifier
      String id = tokens.sval;
      getToken(); // advance to '='
      getToken(); // advance to <expr>
      int value = expr(exprin, exprout);
      exprout.put(id, new Integer(value));
      ... // return statement here
    }
    else
    {
      Hashtable<String,Integer> x = exprin; // Java likes references to be initialized
      int subtotal = term(exprin, x);
      return term_tail(subtotal, x, exprout);
    }
  }

  private static int factor
    (Hashtable<String,Integer> factorin, Hashtable<String,Integer> factorout)
    throws IOException
  {
    ...
    else if (token == tokens.TT_WORD)
    {
      String id = tokens.sval;
      getToken();
      factorout = factorin;
      return ((Integer)factorin.get(id)).intValue();
    }
    ...
The put method puts a key and value in the hashtable, where the value must be a class instance so an Integer instance is created. The get method returns the value of a key. The intValue method of Integer class returns an int. Test your new Calc2.java application. For example:
let x = 1;
Value = 1

(let x = 1) + x;
Value = 2

(let a = 2) + 3 * a;
Value = 8

1 + (let a = (let b = 1) + b) + a;
Value = 5
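The put, get, and intValue calls described above can be tried in isolation with the following sketch. The class SymbolTableDemo and its lookup helper are hypothetical names used only for illustration. One detail worth noticing is that get returns null for an identifier that was never bound, so calling intValue on that result would throw a NullPointerException.

```java
import java.util.Hashtable;

// Hypothetical demo class; not part of the project files.
class SymbolTableDemo {
    // Returns the int value bound to id; assumes id is present in the table.
    static int lookup(Hashtable<String, Integer> table, String id) {
        return table.get(id).intValue();
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<String, Integer>();
        table.put("x", Integer.valueOf(3));   // bind x to 3
        table.put("x", Integer.valueOf(4));   // a second put replaces the binding
        System.out.println(lookup(table, "x"));   // 4
        System.out.println(table.get("y"));       // null: y was never bound
    }
}
```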
Save this assignment as a working Java program named "Calc2.java". It will be collected using the standard submit script configured with the project 2 deliverables.sh. Your answers for the non-programming questions should be in "Appendix 2" at the end of your log.txt. Use plain text to draw trees and write grammars as required.
Suggestions on drawing trees. There are (at least) two basic ways to illustrate trees using ascii text. The first is "pyramidal":
                      <expr>(-3)
           --------------------------------
          /                                \
     <term>(-5)                   <term_tail1>[-5](-3)
      --------                       -------------
     /        \                     /      |      \
.................................................................

Note: [] represents inherited attribute values
      () represents synthesized attribute values
A second way to represent the same tree is a squared off version:
                          <expr>(-3)
      ----------------------------------------------------
      |                                                  |
  <term>(-5)                                  <term_tail1>[-5](-3)
---------------                           ----------------------------
|             |                           |             |            |
.....................................................................

Note: [] represents inherited attribute values
      () represents synthesized attribute values
The latter may be easier to use, especially for decorated parse trees where a lot of information is displayed for each node. Note that in either case we are using [] to enclose inherited attribute values and () to enclose synthesized attribute values.
Hint: The compiled "Calc2v.class" is distributed in area51. You can use this to see how an expression is processed through a correctly functioning calculator. (This is a "verbose" version. Your Calc2.java should not be verbose.)
Educational Objectives: After completing this assignment, the student should be able to do the following:
Operational Objectives: Implement a calculator-to-lisp application in Java using the abstract syntax tree class defined in AST.java
Deliverables: Two files AST2.java and Calc2Lisp.java
Copy (and download if needed) the CalcAST.java and AST.java source files from
[LIB]/proj2/
The CalcAST program constructs an abstract syntax tree (AST) representation of arithmetic expressions. For example, when the expression that you input is 1+2; the program constructs the following AST:
  +
 / \
1   2
This tree structure is constructed with the AST class, which has a tree node structure that contains an optional operator (e.g. +), an optional value (e.g. 1), and optional left and right subnodes for the operands to unary and binary operators. The AST class has a toLisp method. When invoked it will output the expression in Lisp form, such as (+ 1 2) for example.
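The node structure and the Lisp output described above can be pictured with the following minimal sketch. MiniAst is a hypothetical stand-in written for illustration. The actual AST.java distributed with the project may use different fields, constructors, and method signatures. For example, this sketch returns the Lisp text as a String rather than printing it.

```java
// Hypothetical stand-in for illustration; the course's AST.java may differ.
class MiniAst {
    final String op;       // operator, or null for a leaf
    final Object val;      // leaf value (a number or an identifier), else null
    final MiniAst left;    // first operand, or null
    final MiniAst right;   // second operand, or null for leaves and unary operators

    // Leaf node holding a value.
    MiniAst(Object val) { this(null, val, null, null); }

    // Operator node; pass null as right for a unary operator.
    MiniAst(String op, MiniAst left, MiniAst right) { this(op, null, left, right); }

    private MiniAst(String op, Object val, MiniAst left, MiniAst right) {
        this.op = op;
        this.val = val;
        this.left = left;
        this.right = right;
    }

    // Preorder traversal: operator first, then the operands.
    String toLisp() {
        if (op == null) return String.valueOf(val);
        if (right == null) return "(" + op + " " + left.toLisp() + ")";
        return "(" + op + " " + left.toLisp() + " " + right.toLisp() + ")";
    }

    public static void main(String[] args) {
        MiniAst tree = new MiniAst("+", new MiniAst(1), new MiniAst(2));
        System.out.println(tree.toLisp()); // (+ 1 2)
    }
}
```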
Compile the sources on linprog with:
javac CalcAST.java
And run the resulting program:
java CalcAST
The program will wait for input from the command line, so type 1+2;<enter> for example. The program output will be the Lisp equivalent of this expression (+ 1 2). (Note that the toLisp method does a preorder traversal of the AST, implemented recursively. See COP 4530 Lecture Notes.)
Modify the CalcAST.java program to pre-evaluate parts of expressions when possible. That is, all arithmetic operations are performed when the operands are numeric. When one of the operands is non-numeric (symbolic), an AST node is created. The output of the program will be partially evaluated expressions translated into Lisp.
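The idea of evaluating only when both operands are numeric can be sketched as follows. To stay short, the sketch makes a simplifying assumption: an operand is either an Integer or a String holding the Lisp text of a symbolic subexpression. The project requires real AST nodes instead of strings, and the class FoldDemo and its add method are hypothetical.

```java
// Hypothetical sketch; the project itself builds AST nodes, not strings.
class FoldDemo {
    // An operand is either an Integer (already evaluated) or a String
    // holding the Lisp text of a symbolic subexpression.
    static Object add(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            int sum = ((Integer) left).intValue() + ((Integer) right).intValue();
            return Integer.valueOf(sum);             // both numeric: evaluate now
        }
        return "(+ " + left + " " + right + ")";     // otherwise keep it symbolic
    }

    public static void main(String[] args) {
        // 1+2+x+3, grouped from the left as ((1+2)+x)+3
        Object result = add(add(add(1, 2), "x"), 3);
        System.out.println(result); // (+ (+ 3 x) 3)
    }
}
```

Only the leading 1+2 is folded, because every later addition has a symbolic left operand. This sketch does not attempt simplifications such as x+0.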
In addition, add productions and code to implement the power operator ^ (see Part 1 above). For the implementation, you need to use the static Math.pow method of class Math to compute powers. This operator must be evaluated when possible, along with the other arithmetic operators.
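Math.pow takes and returns double values, so an integer calculator needs a conversion in each direction. A minimal sketch, assuming int operands as in Calc.java (the class PowDemo and the helper intPow are hypothetical names):

```java
// Hypothetical demo class; not part of the project files.
class PowDemo {
    // Raises base to exponent using Math.pow; the int arguments are widened
    // to double automatically and the double result is cast back to int.
    static int intPow(int base, int exponent) {
        return (int) Math.pow(base, exponent);
    }

    public static void main(String[] args) {
        System.out.println(intPow(2, 3)); // 8
        System.out.println(intPow(4, 5)); // 1024
    }
}
```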
You may find it convenient to strengthen the AST class. Whether you do or not, copy the file AST.java to AST2.java and rename the class to AST2. Your Calc2Lisp should be a client of AST2. Both files should be turned in (using the submit script as usual).
Examples:
java Calc2Lisp
2*(1+3)-2^3+xyz;
xyz

java Calc2Lisp
2*(1+3)-2^3+x*y*z;
(* (* x y) z)
The outputs are simplified Lisp expressions - xyz is an identifier while (* (* x y) z) is the product of x, y, z.
Note that the AST node structure includes a val member that can be used to store a node's value and to pass values as part of the AST instances that are returned from methods (as synthesized attribute values) and passed to methods (as inherited attribute values). The type of val is Object, so to create an AST node with an integer value, say 7, you need: new AST(new Integer(7)).
Here are some sample calculations you can use to test your calculator:
1+2+3;          6
1*2*3;          6
1*-2*(3-6);     6
1+2+x+3;        (+ (+ 3 x) 3)
x+1+2;          (+ (+ x 1) 2)
x+0;            x
1*x;            x
x^1;            x
--2;            2
--x;            (-(- x))
2+3+x+4+5;      (+ (+ (+ 5 x) 4) 5)
2*3*x*4*5;      (* (* (* 6 x) 4) 5)
2^3^x^4^5;      (^ 2 (^ 3 (^ x 1024)))
Note that the semantic rules of the grammar enforce associativity, so 1+2+x+3 is evaluated from the left. The evaluation process does not consider commutativity, so the expression does not simplify to x+6.
The files Parser2.java, Calc2.java, AST2.java, Calc2Lisp.java, and log.txt will be collected by the submit script configured with deliverables.sh.