Homework 4

Out: 10/24, due by 11:59pm 10/30

Lex and Yacc

The goal of this exercise is to acquaint yourself with two very powerful tools for the generation of compilers: lex and yacc. When a program is presented to a compiler, the compiler has to divide the program into meaningful units, called tokens, and then discover the relationship among these units. The algorithm that divides the program into units is called a lexical analyzer, scanner, or lexer. Once the tokens are identified, the compiler needs to find the expressions, statements, declarations, blocks, and procedures in the program. This task is the function of a parser. The parser requires a list of rules that define the relationships among the tokens. This list of rules is called a grammar.

Lex (or the gnu version which is called flex) is a program that takes a set of descriptions of possible tokens and produces a C routine that implements a scanner. Yacc takes a concise description of a grammar and produces a C routine that implements a parser for the grammar.

Lex and yacc are tools which can be used to solve a variety of interesting problems, even outside the realm of compilers. Once you understand how they work, you will find that the use of lex and yacc will save you many programming hours. Consult the Lex&Yacc Page for lots of information on these useful Unix tools. We also have some comments on running lex and yacc at UMBC.

Calculator

Start by creating a directory called "calc" and copying the files from this directory into it. The files are also accessible for direct copying on the GL system from the directory /afs/umbc.edu/users/p/a/park/pub/cmsc331/fall17/hw4/calc. You can copy the files with the command:

    cp -r /afs/umbc.edu/users/p/a/park/pub/cmsc331/fall17/hw4/calc .

(Note: there is a ' .' (space, then period) at the end of that command--that is very important!)

The key files are Makefile, calc.h, calc.l, and calc.y . We've also included input as a sample input for calc and output is the result of executing calc <input >output .

Note: we have also provided Makefile.bison for those who want to use Bison instead of yacc. Download it, then rename it to "Makefile" (with the Unix command mv) to replace the original.

Modifying the grammar file

For this exercise you will need to modify the files calc.l and calc.y, which are the lex file that tokenizes the input, and the yacc file that describes the rules of the grammar that implements the calculator. First you should experiment playing with the calculator. Make sure to run "make clean" to remove older versions of the files and then run "make":

 make clean
 make

You now have an infix calculator. If you type ``3+2'' the calculator answers with ``5''; if you type ``x=4'' followed by ``3*x'' the calculator responds with ``12''. You invoke the calculator by running the program calc. See the sample session for an example. Now you should play with the calculator and check what works and what does not. You should be able to do addition, subtraction, and multiplication, division, equality and assignment as well as using variable names. We suggest that you look at the files calc.l and calc.y as you play. This way you will understand how the calculator was specified. Your task is to modify the calculator to convert the grammar to use Polish prefix notation and also add some additional operators.

In prefix notation, operators come before their their operands. An advantage, shared with Polish postfix notion, is that the notation does not require the concept of operator precedence to make expressions unambiguous as long as all of the operators take a fixed number of arguments. Postfix notation was used in some calculators (e.g., the HP-10C) and is supported in the Haskell programming language. As we will see, the LISP family of languages can be viewed as using a prefix notation with explicit parenthesis -- a notation that some still refer to as "Cambridge Prefix".

Modify the calc grammar to support prefix notation. To solve the requirement that all operators have fixed arity (i.e., number of arguments), use the tilda symbol (~) for unary minus instead of overloading the - operator.

Then add the following additional operators:

New binary comparison operators: < and >
Logical binary operators and and or and the logical unary operator not
A conditional operator if that takes three arguments: a test, value to return if the test is true and value to return if the test is false

These changes will be relatively simple, and will require modifying both the lex file calc.l, and the yacc file calc.y. See test_session, test_input and test_output for examples that your prefix calculator should handle. Here are some comments to help you get started.

If we want to use prefix notation with just one token look ahead, an operator must take a fixed number of arguments. So we can not use the same operator (-) for both binary and unary minus. Instead, we will use the ~ character to represent unary minus. This is one simple change you will have to make.
We will use words for the logical operators: and, or and not and the conditional if. The rules, as they are, will treat these tokens as NAMEs; we want to modify the scanner to treat them as distinct token types. So you will have to modify the file calc.l, to add four new rules, one for each of these strings that will return the appropriate token for which you might use the symbol AND, OR, NOT and IF. Here is an example lex rule you will have to add.
```
and     { return AND; }
```
But be careful where you place the rule. The lex rule that recognizes a NAME can also match the sequence 'and'. Since both rules match the same number of characters, the one that comes first will be the one used. We always want to match a keyword like 'and', 'or' and 'not' as a token, so these rules must come before the one that matches a NAME. We do not have the same problem with a token like '<' because that will not be matched by any other rule.
Since we have added new tokens, you have to add a %token declaration for each in the first section of the calc.y file. These four new tokens will not have a type associated with them, so add a declaration like "%token AND". Because we are using a Polish prefix grammar, we do not have to worry about precedence or associativity. However, you can just leave these declarations in the first section of the calc.y file -- they will not cause a problem. You can also delete them.
I recommend doing the modifications in two steps. First try converting the initial calculator from infix to prefix without adding the new operators. Once that is working, save a copy and then work on adding the new operators.

Don't worry if yacc complains about shift/reduce conflicts so long as the parser works properly. Recall that yacc applies a deterministic resolution to these conflicts (preferring shifts).

What to submit

Submit three files to the hw4 project via the submit system on gl:

your modified calc.y
your modified calc.l
a typescript file showing your program being compiled (don't forget to do a make clean first so that everything recompiles), and then run by hand with some simple input.

CMSC 331 — Fall 2017

Homework 4

Lex and Yacc

Calculator

Modifying the grammar file

What to submit