UMBC CMSC202, Computer Science II, Fall 1998, Sections 0101, 0102, 0103, 0104

Project 5: Trees and Templates

Due: Friday, December 11, 1998

Objective:

The objectives of this project are

to practice writing templates, and
to practice working with binary search trees.

Background:

It is frequently the case that we want to store auxiliary information in the nodes of a binary search tree to help us manage the binary search tree or to support additional operations on the binary search tree. For this project, we will work with binary search trees where we keep in each node the size of the subtree rooted at that node. Here, the size of a subtree is simply defined as the number of nodes in the subtree (including the root of the subtree). The size information must be maintained after each insertion into and deletion from the binary search tree. Fortunately, this can be done without changing the asymptotic running time of the insert and delete operations. When we insert a new node into the binary search tree, we simply add 1 to the size stored in each ancestor of the node. Similarly, when a node is removed from the tree, the size stored in each of its ancestors is reduced by 1. (Recall that during the delete operation, the node removed from the tree might not originally contain the key that the client requested to be deleted.)
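The insertion rule above can be sketched as follows. This is only an illustration under assumptions, not a required design: the names Node, subtreeSize, insert, checkSizes and destroy are hypothetical, items are compared with operator<, and duplicates are sent to the right subtree.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout; the assignment does not prescribe these names.
template <typename T>
struct Node {
    T item;
    std::size_t size;   // number of nodes in the subtree rooted here
    Node* left;
    Node* right;
};

// Size of a possibly empty subtree.
template <typename T>
std::size_t subtreeSize(const Node<T>* n) { return n ? n->size : 0; }

// Insert x, adding 1 to the size stored in every ancestor of the new node.
// Assumption: duplicates go to the right subtree.
template <typename T>
void insert(Node<T>*& root, const T& x) {
    Node<T>** link = &root;
    while (*link) {
        ++(*link)->size;   // this node becomes an ancestor of the new node
        link = (x < (*link)->item) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node<T>{x, 1, nullptr, nullptr};
}

// Debug helper: recomputes every size and asserts the stored values agree.
template <typename T>
std::size_t checkSizes(const Node<T>* n) {
    if (!n) return 0;
    std::size_t total = 1 + checkSizes(n->left) + checkSizes(n->right);
    assert(n->size == total);
    return total;
}

// Frees every node of the subtree.
template <typename T>
void destroy(Node<T>* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

The loop touches one node per level, so the extra bookkeeping does not change the running time of the insertion.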

Using the size information, we can support additional operations for a binary search tree:

rank(k) = the number of items in the binary search tree with key strictly less than k.
range(k1, k2) = the number of items in the binary search tree with key greater than or equal to k1 and strictly less than k2.

Each operation takes time proportional to the height of the binary search tree.
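One possible way to compute these two quantities from the size fields is sketched below. The names Node, subtreeSize, rankOf and rangeCount are hypothetical, and the sketch assumes only that items can be compared with operator<. Note that range(k1, k2) is simply the difference of two ranks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout (an assumption, not part of the assignment text).
template <typename T>
struct Node {
    T item;
    std::size_t size;   // number of nodes in the subtree rooted here
    Node* left;
    Node* right;
};

template <typename T>
std::size_t subtreeSize(const Node<T>* n) { return n ? n->size : 0; }

// Number of items with key strictly less than k.
template <typename T>
std::size_t rankOf(const Node<T>* root, const T& k) {
    std::size_t count = 0;
    const Node<T>* n = root;
    while (n) {
        if (n->item < k) {
            // n and its whole left subtree are smaller than k.
            count += subtreeSize(n->left) + 1;
            n = n->right;
        } else {
            // n and its whole right subtree are not smaller than k.
            n = n->left;
        }
    }
    return count;
}

// Number of items with k1 <= key < k2.
template <typename T>
std::size_t rangeCount(const Node<T>* root, const T& k1, const T& k2) {
    if (!(k1 < k2)) return 0;               // empty interval
    std::size_t lo = rankOf(root, k1);
    std::size_t hi = rankOf(root, k2);
    assert(hi >= lo);                       // rank never decreases as k grows
    return hi - lo;
}
```

Each call follows a single path from the root, which is where the height bound comes from. Neither function requires k, k1 or k2 to be present in the tree.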

Assignment:

Your assignment is to write the templates for a binary search tree class which maintains the size information as described above. Your binary search tree template class should support the following operations:

insert(x): add item x to the binary search tree.
extract(k): remove and return one item with key k from the binary search tree. If there is more than one item with key k, your operation can remove any one of them.
rank(k): compute the rank of key k in the tree, as defined above.
range(k1, k2): compute the range, as defined above.
find(k): find an item with key k.
size(x): return the size of the subtree rooted at x.
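A class template along the following lines is one way to start. It is a partial sketch under assumptions: the class name SizedBST, the nested Node type and the root() accessor are hypothetical, only insert, find and size are shown, and items are compared with operator<. Here find() returns a node pointer so that its result can be handed directly to size().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical class and member names; only part of the interface is shown.
template <typename T>
class SizedBST {
public:
    struct Node {
        T item;
        std::size_t size;   // number of nodes in the subtree rooted here
        Node* left;
        Node* right;
    };

    SizedBST() : root_(nullptr) {}
    ~SizedBST() { destroy(root_); }
    SizedBST(const SizedBST&) = delete;
    SizedBST& operator=(const SizedBST&) = delete;

    // Adds x, incrementing the size of every ancestor of the new node.
    void insert(const T& x) {
        Node** link = &root_;
        while (*link) {
            ++(*link)->size;
            link = (x < (*link)->item) ? &(*link)->left : &(*link)->right;
        }
        *link = new Node{x, 1, nullptr, nullptr};
    }

    // Returns a node holding key k, or nullptr if there is none.
    const Node* find(const T& k) const {
        const Node* n = root_;
        while (n) {
            if (k < n->item) n = n->left;
            else if (n->item < k) n = n->right;
            else return n;
        }
        return nullptr;
    }

    // Size of the subtree rooted at x; a null pointer is an empty subtree.
    std::size_t size(const Node* x) const { return x ? x->size : 0; }

    const Node* root() const { return root_; }

private:
    static void destroy(Node* n) {
        if (!n) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }

    Node* root_;
};
```

With this shape, a call such as t.size(t.find(k)) works for SizedBST<int> and SizedBST<std::string> alike, and yields 0 when k is absent.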

Implementation Issues:

The worst case running time of each operation must be proportional to the height of the binary search tree.
Be careful with the extract operation. If there are no items in the tree with key k, then none of the size information should change! Also make sure that you are changing the size information at the correct nodes.
The word "delete" is a keyword in C++. That's why we are calling the delete operation extract().
The type of the value returned by find() should be appropriate. For example, we should be able to use this return value in the call to size().
Make sure you understand the definitions of rank and range. The definitions were carefully chosen to make the implementation straightforward. (E.g., do not replace "strictly less than" with "less than or equal to".)
The functions rank(k) and range(k1, k2) make sense even if the tree does not contain any items with key k, k1 or k2.
Test data will be made available in the directory:
These are files of integers. Your main program should insert these integers in a binary search tree. Each file is accompanied by a file which contains a short sequence of operations that you should perform on the binary search tree after inserting the integers.
Your templates should be generic. I.e., it should be possible to instantiate the templates with data types other than int.
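The caution about extract can be illustrated with a two-pass sketch: the first pass only looks for the key, and the second pass changes sizes once we know a node will really be removed. This is one possible approach, not the required one. The names Node, insert and extract, the bool-plus-output-parameter signature, and the choice of replacing a two-child node with its successor are all assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout and helper names (assumptions).
template <typename T>
struct Node {
    T item;
    std::size_t size;
    Node* left;
    Node* right;
};

template <typename T>
void insert(Node<T>*& root, const T& x) {
    Node<T>** link = &root;
    while (*link) {
        ++(*link)->size;
        link = (x < (*link)->item) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node<T>{x, 1, nullptr, nullptr};
}

// Removes one item with key k. Returns false, changing nothing, if k is absent.
template <typename T>
bool extract(Node<T>*& root, const T& k, T& out) {
    // Pass 1: search without modifying any size.
    const Node<T>* probe = root;
    while (probe && (k < probe->item || probe->item < k))
        probe = (k < probe->item) ? probe->left : probe->right;
    if (!probe) return false;

    // Pass 2: same path, subtracting 1 from each proper ancestor.
    Node<T>** link = &root;
    while (k < (*link)->item || (*link)->item < k) {
        --(*link)->size;
        link = (k < (*link)->item) ? &(*link)->left : &(*link)->right;
        assert(*link);   // pass 1 guarantees the key is on this path
    }
    Node<T>* target = *link;
    out = target->item;

    if (target->left && target->right) {
        // The node physically removed is the successor, so target and every
        // node on the way down to the successor each lose one descendant.
        --target->size;
        Node<T>** succLink = &target->right;
        while ((*succLink)->left) {
            --(*succLink)->size;
            succLink = &(*succLink)->left;
        }
        Node<T>* succ = *succLink;
        target->item = succ->item;
        *succLink = succ->right;
        delete succ;
    } else {
        // At most one child: splice the node out directly.
        *link = target->left ? target->left : target->right;
        delete target;
    }
    return true;
}
```

Both passes follow a single root-to-node path, so the operation still takes time proportional to the height of the tree.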

What to turn in:

Turn in all the files that are needed to compile your program. For each test data file, include a main program that builds the binary search tree from the file, performs the requested operations, and prints out the results of the rank and range operations. Include a typescript file with the sample runs of these main programs.

Extra Credit:

For extra credit, implement the following scheme to keep your binary search tree balanced. During an insertion or a deletion, if we find a node where one of its subtrees has a size that is more than twice the size of the other subtree, then we rebalance the subtree rooted at that node. The rebalance procedure requires us to pull apart the subtree and rebuild it in the following way.

First we turn the subtree into a sorted array. This is accomplished by an inorder walk of the subtree and storing the address of each node in the appropriate element of the array. (The array should be an array of pointers.) From this sorted array, we can recursively construct a balanced tree by picking the middle element of the array to be the root of the subtree.
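The rebuild step might look like the sketch below. It is an illustration under assumptions: the names Node, collect, build and rebalance are hypothetical, and a std::vector of node pointers stands in for the array of pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node layout (an assumption).
template <typename T>
struct Node {
    T item;
    std::size_t size;
    Node* left;
    Node* right;
};

// Inorder walk: stores the address of each node in sorted order.
template <typename T>
void collect(Node<T>* n, std::vector<Node<T>*>& out) {
    if (!n) return;
    collect(n->left, out);
    out.push_back(n);
    collect(n->right, out);
}

// Builds a balanced subtree from nodes[lo, hi) and returns its root.
// The middle element becomes the root; sizes are rewritten on the way.
template <typename T>
Node<T>* build(const std::vector<Node<T>*>& nodes,
               std::size_t lo, std::size_t hi) {
    assert(hi <= nodes.size());
    if (lo >= hi) return nullptr;
    std::size_t mid = lo + (hi - lo) / 2;
    Node<T>* root = nodes[mid];
    root->left = build(nodes, lo, mid);
    root->right = build(nodes, mid + 1, hi);
    root->size = hi - lo;   // every node in [lo, hi) is in this subtree
    return root;
}

// Rebuilds the subtree in time proportional to its number of nodes.
template <typename T>
Node<T>* rebalance(Node<T>* subtree) {
    std::vector<Node<T>*> nodes;
    nodes.reserve(subtree ? subtree->size : 0);
    collect(subtree, nodes);
    return build(nodes, 0, nodes.size());
}
```

The caller would store the returned pointer back into whichever parent link (or the tree's root pointer) led to the old subtree. No nodes are allocated or freed, only relinked. If duplicate keys are allowed, picking the middle element can leave an equal key in a left subtree, so code that assumes equal keys always sit to the right would need adjusting.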

During a single insert or delete operation, we will rebalance at most one subtree. If inserting a node in the tree will cause several subtrees to be unbalanced, we will do the rebalancing at the node closest to the root of the tree.

Additional note: We do not want to rebalance subtrees with a small number of nodes. For this project, you should rebalance a subtree only if it has at least 5 nodes.
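The balance test and the "closest to the root" rule could be expressed as in the sketch below, shown for the insertion case. The names Node, subtreeSize, needsRebalance and topmostUnbalanced are hypothetical, and the sketch assumes it runs after the sizes along the path have already been updated and that duplicates were inserted to the right.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout (an assumption).
template <typename T>
struct Node {
    T item;
    std::size_t size;
    Node* left;
    Node* right;
};

template <typename T>
std::size_t subtreeSize(const Node<T>* n) { return n ? n->size : 0; }

// True if the subtree at n has at least 5 nodes and one child subtree
// is more than twice the size of the other.
template <typename T>
bool needsRebalance(const Node<T>* n) {
    if (!n || n->size < 5) return false;
    std::size_t l = subtreeSize(n->left);
    std::size_t r = subtreeSize(n->right);
    return l > 2 * r || r > 2 * l;
}

// Walks the insertion path of k from the root and returns the link to the
// first (closest to the root) unbalanced node, or nullptr if there is none.
template <typename T>
Node<T>** topmostUnbalanced(Node<T>*& root, const T& k) {
    Node<T>** link = &root;
    while (*link) {
        if (needsRebalance(*link)) return link;
        link = (k < (*link)->item) ? &(*link)->left : &(*link)->right;
    }
    return nullptr;
}
```

Returning the link rather than the node lets the caller overwrite it with the root of the rebuilt subtree. Because the walk stops at the first hit, at most one subtree is rebalanced per operation.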

The rebalancing operation should take time proportional to the number of items in the subtree. This is a time-consuming operation, but we won't have to rebalance very often. In fact, it can be shown, using a technique called amortized analysis, that any sequence of m operations takes time at most O(m log n), where n is the maximum number of items in the tree. Thus, on average, each operation takes time O(log n).

For an extra 10%, implement the balanced binary search tree scheme described above. Don't forget to update the size information when you rebalance a subtree. Run your program on the same set of data as the regular project. As usual, extra credit is all or nothing: you get either 10% or 0% extra credit.

Last Modified: 24 Nov 1998 13:25:28 EST by Richard Chang
