UMBC CMSC202, Computer Science II, Spring 1998, Sections 0101, 0102, 0103, 0104 and Honors

Tuesday April 21, 1998

Assigned Reading:

A Book on C:
Programming Abstractions in C: 14.1 - 14.5

Handouts (available on-line): none

Topics Covered:

Another kind of binary tree is an expression tree --- a representation of arithmetic expressions. For example, the expression 2 + 3 can be represented as a binary tree with a root labeled with +. The root would have two children labeled with 2 and 3. For more complicated arithmetic expressions, the root of the tree is the operation that is performed last. The left subtree and right subtree of the root are the left and right operands of that last operation. For example, in the expression (2 + 3) * (4 + 5), the * is performed last, so the left and right subtrees represent (2 + 3) and (4 + 5), respectively.

Note that in an expression tree, we do not need to store parentheses, since the order of evaluation is implicit in the structure of the tree.
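
To see how the tree shape alone carries the order of evaluation, here is a minimal sketch that builds the tree for (2 + 3) * (4 + 5) and prints it in postfix order without any parentheses. The Node struct and the helper names (makeNumber, makeOp, toPostfix, sampleTree, destroy) are assumptions made for illustration; they are not the Tree class used in this course.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type for illustration only.
struct Node {
    char op;      // '+', '-', '*' or '/' for an operator node, 0 for a number
    int value;    // meaningful only when op == 0
    Node* left;   // left operand (null for a number)
    Node* right;  // right operand (null for a number)
};

// Build a leaf that holds a number.
Node* makeNumber(int v) { return new Node{0, v, nullptr, nullptr}; }

// Build an operator node whose children are its two operands.
Node* makeOp(char op, Node* l, Node* r) {
    assert(l != nullptr && r != nullptr);
    return new Node{op, 0, l, r};
}

// Postorder traversal: both operands first, then the operator.
std::string toPostfix(const Node* n) {
    if (n->op == 0) return std::to_string(n->value);
    return toPostfix(n->left) + " " + toPostfix(n->right) + " " +
           std::string(1, n->op);
}

// The tree for (2 + 3) * (4 + 5): the root is the operation performed last.
Node* sampleTree() {
    return makeOp('*',
                  makeOp('+', makeNumber(2), makeNumber(3)),
                  makeOp('+', makeNumber(4), makeNumber(5)));
}

// Free a tree built with the helpers above.
void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Calling toPostfix(sampleTree()) produces the string "2 3 + 4 5 + *". No parentheses were stored anywhere, yet the grouping is fully determined by which nodes are children of which.
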
We can derive an expression tree class, ETree, from the Tree class we used in the previous lecture. (See header file and implementation.) One difference here is that we use the item.h header file to indicate that we want each node of the tree to hold a token. Here a token is simply a structure that holds two fields --- a symbol and a value. The symbol field determines whether a node in the binary tree is labeled with an operation (and which one) or a number. In the case of a number, the value field stores the value of that number. This is the same arrangement we used in the recursive descent parser discussed earlier in the semester. The type token_t is defined in the file token.h. However, the Tree class really wants an Item class. This is declared in the header file tokenitem.h and implemented in tokenitem.C. So far, the arrangement is not much different from what we have already seen with the List class, when we wanted linked lists that hold structs instead of strings.
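
As a rough picture of the token structure described above, the sketch below shows one way a two-field token could look. Only the name token_t and the existence of a symbol field and a value field come from these notes; the enumeration symbol_t, its constants, and the helper functions are assumptions, and the actual token.h may differ.

```cpp
#include <cassert>

// Hypothetical symbol codes; the real token.h may use other names and values.
enum symbol_t { NUMBER, PLUS, MINUS, TIMES, DIVIDE };

// A token has a symbol and a value. The value is used only for numbers.
struct token_t {
    symbol_t symbol;
    int value;
};

// Make a token that labels a node with a number.
token_t makeNumberToken(int v) { return token_t{NUMBER, v}; }

// Make a token that labels a node with an operation; the value is unused.
token_t makeOperatorToken(symbol_t s) {
    assert(s != NUMBER);
    return token_t{s, 0};
}

// True when the token labels a node with an operation rather than a number.
bool isOperator(const token_t& t) { return t.symbol != NUMBER; }
```
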
In the derived class ETree, we have a new member function called Evaluate(), which computes the value of the arithmetic expression stored in the expression tree. The implementation of Evaluate() is a straightforward recursive function, but there is one subtle point. Since the ETree class is derived from Tree, the left and right data members of ETree are inherited from the Tree class. In the declaration of the Tree class, left and right are declared as pointers to Tree. This does not change when left and right are inherited. Thus, the left and right fields of an ETree node are pointers to Tree and not pointers to ETree. As a result, we cannot recursively compute the value of the left subtree of an ETree node by calling Evaluate() directly through left: the compiler will complain that the Tree class does not have a member function called Evaluate(). We can get around this problem by coercing the type of left to a pointer to ETree, storing the result in the variable lchild (which is a pointer to ETree), and calling Evaluate() through lchild. For the right subtree, we combine the cast and the call into a single statement.
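
The cast described above can be sketched as follows. The Tree and ETree classes here are simplified stand-ins written for this example (they store a character and an integer instead of a token item), so their members and constructors are assumptions. The sketch uses static_cast; a C-style cast such as (ETree *) left has the same effect in this situation.

```cpp
#include <cassert>

// Simplified stand-in for the base class (assumed layout, not the course code).
class Tree {
public:
    Tree(char o, int v, Tree* l = nullptr, Tree* r = nullptr)
        : op(o), value(v), left(l), right(r) {}
    virtual ~Tree() { delete left; delete right; }
protected:
    char op;      // 0 marks a number node, otherwise '+', '-', '*' or '/'
    int value;    // used only by number nodes
    Tree* left;   // declared as pointers to Tree, and they stay that way
    Tree* right;  // when a derived class inherits them
};

class ETree : public Tree {
public:
    explicit ETree(int v) : Tree(0, v) {}
    ETree(char o, ETree* l, ETree* r) : Tree(o, 0, l, r) {}
    int Evaluate() const;
};

int ETree::Evaluate() const {
    if (op == 0) return value;
    // left->Evaluate() would not compile here: left has type Tree*,
    // and Tree has no member function called Evaluate().
    const ETree* lchild = static_cast<const ETree*>(left);
    int lval = lchild->Evaluate();
    // For the right subtree, the cast and the call are one statement.
    int rval = static_cast<const ETree*>(right)->Evaluate();
    switch (op) {
        case '+': return lval + rval;
        case '-': return lval - rval;
        case '*': return lval * rval;
        default:
            assert(op == '/' && rval != 0);
            return lval / rval;
    }
}

// Hand-built tree for (2 + 3) * 5; the children are freed by ~Tree().
int evaluateSample() {
    ETree tree('*', new ETree('+', new ETree(2), new ETree(3)), new ETree(5));
    return tree.Evaluate();
}
```

Here evaluateSample() returns 25. The cast is safe only because every node in the tree really is an ETree; nothing in the type system enforces that.
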
Next, we test the ETree class from a main program. In this program, we manually construct the expression tree for the expression (2 + 3) * 5. The sample run shows the result of evaluating the expression tree.
Manually constructing expression trees can get really tedious. Fortunately, we can easily adapt our recursive descent parser to construct an expression tree instead of evaluating the expression. First, we convert the tokenizer from C to C++, so that the tokenizer becomes a class. Once initialized with a string in the alternate constructor, the member functions of the Tokenizer class provide a stream of tokens to the parser. Tokens are logical units of the input string. For example, the expression ( 123 + 456 - 2 ) would be divided into 7 tokens: "(", "123", "+", "456", "-", "2" and ")". Note that the tokenizer does not understand the meaning of "+" or ")"; it deals strictly with characters and strings. Nevertheless, having a tokenizer greatly reduces the job of the parser, since the parser can now deal with logical entities rather than characters and strings. See header file and implementation of the tokenizer.
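
To illustrate what dividing the input into tokens means, here is a small sketch that splits a string into runs of digits and single non-blank characters. The function splitTokens is hypothetical and is not the interface of the course's Tokenizer class, which hands tokens to the parser one at a time through member functions.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns the tokens of the input as strings.
// It only groups characters; it attaches no meaning to "+" or ")".
std::vector<std::string> splitTokens(const std::string& input) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;  // blanks separate tokens but are not tokens themselves
        } else if (std::isdigit(c)) {
            // A run of consecutive digits forms a single number token.
            std::size_t start = i;
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
            tokens.push_back(input.substr(start, i - start));
        } else {
            // Any other character is a one-character token.
            tokens.push_back(std::string(1, input[i]));
            ++i;
        }
    }
    return tokens;
}
```

For the input "( 123 + 456 - 2 )", splitTokens returns the 7 strings listed above.
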
Our parser is also converted from C to C++. Because of the bottom-up nature of this parser, we do not turn the parser functions into member functions of the ETree class. Thus, the new parser is essentially the same as the C version, with some of the loops restructured. See header file and implementation of the parser.
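
The idea of a recursive descent parser that returns a tree instead of a number can be sketched as below. This is not the course's parser: the Node struct, the Parser struct, and the functions parseExpression and evaluate are assumptions made for this example, the sketch reads characters directly instead of using the Tokenizer class, and it signals a syntax error by returning a null pointer.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>

// Hypothetical node type; the course code builds ETree objects instead.
struct Node {
    char op;     // 0 for a number, otherwise '+', '-', '*' or '/'
    int value;   // used only when op == 0
    std::unique_ptr<Node> left, right;
};
using NodePtr = std::unique_ptr<Node>;

struct Parser {
    std::string text;
    std::size_t pos;
    explicit Parser(const std::string& s) : text(s), pos(0) {}

    // Skip blanks and return the next character ('\0' at end of input).
    char peek() {
        while (pos < text.size() &&
               std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
        return pos < text.size() ? text[pos] : '\0';
    }

    // factor := number | '(' expr ')'
    NodePtr factor() {
        char c = peek();
        if (c == '(') {
            ++pos;
            NodePtr inner = expr();
            if (!inner || peek() != ')') return nullptr;  // syntax error
            ++pos;
            return inner;
        }
        if (!std::isdigit(static_cast<unsigned char>(c))) return nullptr;
        int v = 0;
        while (pos < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[pos])))
            v = v * 10 + (text[pos++] - '0');
        return NodePtr(new Node{0, v, nullptr, nullptr});
    }

    // term := factor { ('*' | '/') factor }
    NodePtr term() {
        NodePtr lhs = factor();
        while (lhs && (peek() == '*' || peek() == '/')) {
            char op = text[pos++];
            NodePtr rhs = factor();
            if (!rhs) return nullptr;
            // The subtree built so far becomes the left child of a new root.
            lhs = NodePtr(new Node{op, 0, std::move(lhs), std::move(rhs)});
        }
        return lhs;
    }

    // expr := term { ('+' | '-') term }
    NodePtr expr() {
        NodePtr lhs = term();
        while (lhs && (peek() == '+' || peek() == '-')) {
            char op = text[pos++];
            NodePtr rhs = term();
            if (!rhs) return nullptr;
            lhs = NodePtr(new Node{op, 0, std::move(lhs), std::move(rhs)});
        }
        return lhs;
    }
};

// Parse a whole string; returns null on a syntax error or leftover input.
NodePtr parseExpression(const std::string& s) {
    Parser p(s);
    NodePtr tree = p.expr();
    if (tree && p.peek() != '\0') return nullptr;
    return tree;
}

// Evaluate a tree produced by parseExpression.
int evaluate(const Node& n) {
    if (n.op == 0) return n.value;
    int l = evaluate(*n.left);
    int r = evaluate(*n.right);
    switch (n.op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  assert(r != 0); return l / r;
    }
}
```

With these definitions, parseExpression("(2 + 3) * 5") yields a tree whose root is * and whose value is 25, while malformed input such as "(2 + ) * 5" yields a null pointer. Each parsing function returns the subtree for the part of the input it consumed, and the caller attaches that subtree as a child of a new node, so the tree grows from the leaves toward the root.
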
Finally, we can test our parser using a main program that prompts the user to enter an arithmetic expression. The sample run shows that our parser has retained the ability to detect and report syntax errors.

Last Modified: 28 Apr 1998 09:41:07 EDT by Richard Chang
