Homework 4

Out: 10/24, due by 11:59pm 10/30

Lex and Yacc

The goal of this exercise is to acquaint yourself with two very powerful tools for the generation of compilers: lex and yacc. When a program is presented to a compiler, the compiler has to divide the program into meaningful units, called tokens, and then discover the relationship among these units. The algorithm that divides the program into units is called a lexical analyzer, scanner, or lexer.  Once the tokens are identified, the compiler needs to find the expressions, statements, declarations, blocks, and procedures in the program. This task is the function of a parser. The parser requires a list of rules that define the relationships among the tokens. This list of rules is called a grammar

Lex (or the gnu version which is called flex) is a program that takes a set of descriptions of possible tokens and produces a C routine that implements a scanner. Yacc takes a concise description of a grammar and produces a C routine that implements a parser for the grammar. 

Lex and yacc are tools which can be used to solve a variety of interesting problems, even outside the realm of compilers. Once you understand how they work, you will find that the use of lex and yacc will save you many programming hours. Consult the Lex&Yacc Page for lots of information on these useful Unix tools. We also have some comments on running lex and yacc at UMBC.

Calculator

Start by creating a directory called "calc" and copying the files from this directory into it. The files are also accessible for direct copying on the GL system from the directory /afs/umbc.edu/users/p/a/park/pub/cmsc331/fall17/hw4/calc. You can copy the files with the command:

    cp -r /afs/umbc.edu/users/p/a/park/pub/cmsc331/fall17/hw4/calc .
    
(Note: there is a ' .' (space, then period) at the end of that command--that is very important!)

The key files are Makefile, calc.h, calc.l, and calc.y . We've also included input as a sample input for calc and output is the result of executing calc <input >output .

Note: we have also provided Makefile.bison for those who want to use Bison instead of yacc. Download it, then rename it to "Makefile" (with the Unix command mv) to replace the original.

Modifying the grammar file

For this exercise you will need to modify the files calc.l and calc.y, which are the lex file that tokenizes the input, and the yacc file that describes the rules of the grammar that implements the calculator. First you should experiment playing with the calculator. Make sure to run "make clean" to remove older versions of the files and then run "make":

 make clean
 make
You now have an infix calculator. If you type ``3+2'' the calculator answers with ``5''; if you type ``x=4'' followed by ``3*x'' the calculator responds with ``12''. You invoke the calculator by running the program calc. See the sample session for an example. Now you should play with the calculator and check what works and what does not. You should be able to do addition, subtraction, and multiplication, division, equality and assignment as well as using variable names. We suggest that you look at the files calc.l and calc.y as you play. This way you will understand how the calculator was specified.  Your task is to modify the calculator to convert the grammar to use Polish prefix notation and also add some additional operators.

In prefix notation, operators come before their their operands. An advantage, shared with Polish postfix notion, is that the notation does not require the concept of operator precedence to make expressions unambiguous as long as all of the operators take a fixed number of arguments. Postfix notation was used in some calculators (e.g., the HP-10C) and is supported in the Haskell programming language. As we will see, the LISP family of languages can be viewed as using a prefix notation with explicit parenthesis -- a notation that some still refer to as "Cambridge Prefix".

Modify the calc grammar to support prefix notation. To solve the requirement that all operators have fixed arity (i.e., number of arguments), use the tilda symbol (~) for unary minus instead of overloading the - operator.

Then add the following additional operators:

These changes will be relatively simple, and will require modifying both the lex file calc.l, and the yacc file calc.y. See test_session, test_input and test_output for examples that your prefix calculator should handle. Here are some comments to help you get started.

Don't worry if yacc complains about shift/reduce conflicts so long as the parser works properly. Recall that yacc applies a deterministic resolution to these conflicts (preferring shifts).

What to submit

Submit three files to the hw4 project via the submit system on gl: