- to practice writing templates, and
- to practice working with binary search trees.

It is frequently the case that we want to store auxiliary information in the nodes of a binary search tree to help us manage the binary search tree or to support additional operations on the binary search tree. For this project, we will work with binary search trees where we keep in each node the size of the subtree rooted at that node. Here, the size of a subtree is simply defined as the number of nodes in the subtree (including the root of the subtree). The size information must be maintained after each insertion and deletion from the binary search tree. Fortunately this can be done without changing the asymptotic running time of the insert and delete operations. When we insert a new node into the binary search tree, we simply add 1 to the size data stored in each ancestor of the node. Similarly, when a node is removed from the tree, the size data in its ancestors are reduced by 1. (Recall that during the delete operation, the node removed from the tree might not originally contain the key that the client requested to be deleted.)

Using the size information, we can support additional operations for a binary search tree:

- rank(k) = the number of items in the binary search tree
with key strictly less than k.
- range(k1, k2) = the number of items in the binary search
tree with key greater than or equal to k1 and strictly less than k2.

Your assignment is to write the templates for a binary search tree class which maintains the size information as described above. Your binary search tree template class should support the following operations:

- insert(x): add item x to the binary search tree.
- extract(k): remove and return one item with key k from the binary
search tree. If there is more than one item with key k, your operation can
remove any one of them.
- rank(k): compute the rank of key k in the tree, as defined above.
- range(k1, k2): compute the range, as defined above.
- find(k): find an item with key k.
- size(x): return the size of the subtree rooted at x.

- The worst case running time of each operation must be proportional to
the height of the binary search tree.
- Be careful with the extract operation. If there are no items in the
tree with key k, then none of the size information should change! Also make
sure that you are changing the size information at the correct nodes.
- The word "delete" is a keyword in C++. That's why we are
calling the delete operation extract().
- The type of the value returned by find() should be appropriate. For
example, we should be able to use this return value in the call to size().
- Make sure you understand the definition of rank and range. The
definition was carefully chosen to make the implementation straightforward.
(E.g., do not substitute "strictly less than" with "less
than or equal to".)
- The functions rank(k) and range(k1, k2) make sense even if the tree
does not contain any items with key k, k1 or k2.
- Test data will be made available in the directory:
/afs/umbc.edu/users/c/h/chang/pub/cs202/project5/ These are files of integers. Your main program should insert these integers in a binary search tree. Each file is accompanied by a file which contains a short sequence of operations that you should perform on the binary search tree after inserting the integers.

- Your templates should be generic. I.e., it should be possible to
instantiate the templates with data types other than int.

Turn in all the files that are needed to compile your program. For each test data file, include a main program that builds the binary search tree from the file, performs the requested operations, and prints out the results of the rank and range operations. Include a typescript file with the sample runs of these main programs.

For extra credit, implement the following scheme to keep your binary search tree balanced. During an insertion or a deletion, if we find a node where one of its subtrees has size that is more than twice the size of the other subtree, then we rebalance the subtree rooted at that node. The rebalance procedure requires us pull apart the subtree and rebuild it in the following way.

First we turn the subtree into a sorted array. This is accomplished by an inorder walk of the subtree and storing the address of each node in the appropriate element of the array. (The array should be an array of pointers.) From this sorted array, we can recursively construct a balanced tree by picking the middle element of the array to be the root of the subtree.

During a single insert or delete operation, we will rebalance at most one subtree. If inserting a node in the tree will cause several subtrees to be unbalanced, we will do the rebalancing at the node closest to the root of the tree.

*Additional note:* We do not want to rebalance subtrees with a
small number of nodes. For this project, you should rebalance a subtree
only if it has at least 5 nodes.

The rebalancing operation should take time proportional to the number of
items in the subtree. This is a time consuming operation, but we won't have
to rebalance very often. In fact, it can be shown, using a technique called
*amortized analysis,* that any sequence of m operations takes time at
most O(m log n) where n is the maximum number of items in the tree. Thus,
on average, each operations takes time O(log n).

For an extra 10%, implement the balanced binary search tree scheme described above. Don't forget to update the size information when you rebalance a subtree. Run your program on the same set of data as the regular project. As usual, extra credit is all or nothing --- you either get 10% or 0% extra credit.

Last Modified: 24 Nov 1998 13:25:28 EST by Richard Chang

Back up to Fall 1998 CMSC 202 Section Homepage