Homework 3

Out: 8/28, due by 11:59pm 10/4 (Wed.)

Semantics, regular expressions, lex

We will use the lex command and Makefile we provide, and check your program with test inputs that include the example given as well as other cases.

(1) Axiomatic semantics I (15 points)

Compute the weakest precondition for the following sequence of statements, given the stated post condition.

{ ?? }
a = 2 * (3 * b - a);
b = 3 * a - 1;
{b > 17}

Hint: Start by deciding what variable or variables should be mentioned in the precondition. Recall the rule for sequences: the precondition for a statement in a sequence is equal to the postcondition for the statement that preceeds it. Note: Your answer will involve a relationship between a and b.

(2) Axiomatic semantics II (15 points)

Compute the weakest precondition for the following statements and the given post condition. .

{ ?? }
if (x > y) then y = 2 * x - 1 else y = 3 * x + 1;
{ y > 7 }

Hint: Analyze the two cases that the if-then-else creates: one where x is greater than y and one where it is not. Figure out what are the weakest preconditions for each. Find the strongest single precondition that is logically implied by both.

On Regular expressions

The lexical level of a programming language is usually specified with regular expressions or finite state automata. Consider the definition of strings for simple integers. We specify that an unsigned integer must be either a single zero, or be a string that begins with a non-zero digit and continues with any number of additional digits (e.g., '1', '451', and '0' are all ok, but '001', 'foo' and '3.14' are not ok). This can be described by the deterministic finite automaton (DFA) to the right. Note that we identify the start state, and mark each accepting state with an asterisk. The transitions (arrows) are labeled with the input that will cause the transition and we use the common notation [c1-c2] to mean any single character in the ascii range between c1 and c2. We can also write this language specification as a regular expression:

0 | [1-9] [0-9]*

(3) Anaconda numbers in lex (70 points)

You are developing a new programming language Anaconda and need to decide how integers should be represented. The following is a specification for an Anaconda integer that one of your teammates comes up with.

An Anaconda integer can be unambiguously specified in decimal, binary, octal or hexadecimal form. All of these forms can begin with an optional sign, which can be a "+" or a "-" with no separating characters between it and the rest of the integer.

A decimal integer can be a single zero or a sequence of one or more decimal digits that starts with a non-zero decimal digit. The decimal digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.

An octal number starts with a zero and is followed by a single zero or a non-empty sequence of octal digits that starts with a non-zero octal digit. An octal digit is one of 0, 1, 2, 3, 4, 5, 6 or 7.

A hexadecimal number starts with a zero and is followed by an upper or lower case "x", and either a single zero or a non-zero hexadecimal digit and a sequence of zero or more hexadecimal digits. Hexadecimal digits include the decimal digits and the letters a, b, c, d, e and f and their uppercase versions.

Notice how long and confusing this description has to be to ensure that it is unambiguous. You can see that we do not allow "leading zeros" in any of the representations, e.g., 0x007 is a bad hex number, and 007 is both a bad octal number and a bad decimal number.

(3a) Lexical Scanner in Java/Python/C (20 points)

Write a program in Java, Python, or C/C++ that reads in a single character at a time, and when it reaches end-of-file, reports whether it has read in a decimal, octal, or hexadecimal number, and what that number is (as a string), or "NONSENSE" if it isn't a well-formed number. Name your program file lexical.{py|c|java} (suffix obviously depending on what language you choose).

(3b) Deterministic Finite Automata (DFA) (15 points)

Draw deterministic finite automata (DFA) that correspond to the three patterns for Anaconda integers. In other words, we want one DFA for each pattern, three DFA in all. You can either do this by hand, scan it in and add it to a pdf file or use a drawing program to produce the diagrams. Google Docs do allow you to create drawings and save them as pdf. (15 points)

(3c) Lexical Scanner in Lex (35 points)

Develop a lexical scanner for Anaconda integers using the Unix utility Lex or Flex. Start by reading the compact guide to lex and yacc. The look at and experiment with the the simplescanner example which uses lex to create a simple lexical scanner for a Pascal-like language. The file scanner.l is the input to lex. Running make with the Makefile will use lex to create scanner.c and compile this to produce an executable named scanner. Invoking this with a simple file input will produce this session.

Now look at the directory anacondaScanner that we have provided with a stub scanner.l and a sample input file and a session that should be produced using that sample input.

For your convenience, both of the directories simplescanner and anacondaScanner are available as the tar files: anacondaScanner.tar and simplescanner.tar.

Extra Credit (10 points)

(A slight variation on what we talked about in class:) Design a DFA that will recognize accept the regular expression:

(a | ε) a a b c

(Note that ε (epsilon) means the empty string.)

What to submit

Submit three files to the hw3 project via the submit system on gl:

The first should be named hw3.pdf, and should contain your answers to problems 1, 2 and 3b (and the extra credit problem).
The second file is lexical.{py|c|java}, which will have the solution to problem 3a.
The third is scanner.l from problem 3c.

CMSC 331 — Fall 2017