CMSC 341 Spring 2005
Project 3

Assigned Wednesday, March 16, 2005
Due 11:59 pm on Tuesday, April 5, 2005
Correction:
March 20
An error on Figure 1 has been corrected:
18 and 28 should be children of 25, not of 33.
Correction:
March 24
An error in the formula for generating random integers have been corrected:
module (%), rather than division (/) should be used.. I.e., r = rand()%(n+1).

Correction  and
Clarification:

March 30
1. An error in sample output is corrected,see Corrected Sample output
2. To make the life easier, in RBST,you can call "find" function to determine if the item to be inserted is already in the tree.
3. Number of comparisons: you should compare the following
    regular BST: number of comparisons done during insert operations;
    RBST:  number of comparisons done during insert operations, including those done in split operations.
    in both, determining if a (sub)tree is empty, i.e., comparing with NULL, should be counted as one comparison.
Correction  and
Clarification:

March 31
One student pointed out that the comparison given above is unfair to BST because RBST does insertion after find, so it
only counts comparisons for those which are not yet in the tree but regular BST counts all comparisons regardless if an item
is already in. For a fair comparison, YOU SHOULD EXCLUDE comparisons for duplicate items for BST. One way to do
this is to do a find first and only do insert if find fails (the same way as for RBST insertion). The Sample output is modified accordingly

Objectives

The purpose of this project is to implement and experiment a data structure known as Randomized Binary Search Tree (RBST). RBST is a BST that guarantees  to have expected height of O(log n) without performing any tree balance operations. 

Randomized Binary Search Tree (RBST)

The ordinary binary search tree (BST) as defined in the class works well for random input. The trees of n keys will  have a ``large'' probability of being balanced  and thus of height of O(log n). As a result it is well known that the search, insert or delete operations have expected time of O(log n). But the assumption that the entries are inserted in random order does not always hold; moreover it is known that, under some specific sequence of insertions and deletions, the resulting tree can be degenerated to have height of O(n). Various techniques have been developed to address the issue of degeneracy of a BST, including AVL trees, Splaying trees, Red-Black trees, to mention just a few. Randomized binary Search Tree (RBST), proposed by Martínez and Roura [2], is another such technique.

RBST is a special BST which  uses randomized version of the insertion and deletion operations. These operations ensure that each of the n  keys has an equal probability (1/n) to be the root of the tree. Consequently, regardless the sequence of operations and the order of insertions and deletions of keys, the resulting binary search tree has the same probability as if it had been built on the random permutation of the n keys, and thus the expected cost of operations on a RBST is guaranteed to be of order O(log n).

In this project, you are asked to only implement the randomized insertion operation, which is described next, followed by an illustrative example and the algorithms.

Randomized Insertion.
The randomized insertion works as follows (assuming all keys are nonnegative integers and that the key to be inserted is not already in the tree)
To insert a key X into a BST T with n keys, we use the insertion at root algorithm (to be given below) with probability 1/(n+1), and with probability 1-1/(n+1) we insert the key recursively into the right or left subtree, according to the respective values of the key and the root.
The main idea behind insertion at root is to use the split operation: This operation creates two new BST's L and R from a key x and a given BST  T, which is destroyed by the operation: The first BST L contains the keys of T that are smaller than X, and the second BST R contains the keys of T that are larger than X. Then we insert x into a new node, whose left and right children are L and R, respectively. An example is given in Figure 1 below.

                                       Split by 26
                15
               /  \
              /    \
             /      \                              15
            10      30                            /  \
           /  \     /  \                         /    \
          /    \   /    \                       /      \
         5     14 25    33                     10      25                  30
                 /  \                         /  \     /                  /  \
                /    \                       /    \   /                  /    \
               18    28                     5     14 18                 28    33

             T                        L                     R

                            Resulting tree after inserting 26 at the root


                                            26
                                           /  \
                                          /    \
                                         15     \
                                        /  \     \
                                       /    \     \
                                      /      \     \
                                     10      25    30
                                    /  \     /    /  \
                                   /    \   /    /    \
                                  5     14 18   28    33
 
Algorithms
Outlines of algorithms for "randomized insertion", "insertion at root", and "split a BST"  are given below
 
algorithm for randomized insertion
bst randomizedInsert (int x, bst T)
{
    pick a random number, r, between 0 and the size of T, inclusive
    if r == the size of the tree
          insert x at the root
    otherwise
          randomly insert x in the proper subtree of T
}

algorithm for insertion at root
bst insertAtRoot( int x, bst T)
{
   split T into two subtrees, called L and R
   make a new root node containing x
   reattach the L and R to the new root as its left and right subtrees
}
algorithm to split a tree
void split( int x, bst T, bst S, bst G) /* assuming x is not in T */
{
 if T is null, make S and G null
else if x is less than T->key
set G = T /* all keys at the root and in right subtree are greater than x */
split T's left subtree into S and G's left subtree
/* some keys in T's left subtree are less than x, other may be greater than x */
else /* x is greater than T->key */
set S = T
split T's right subtree into S's right subtree and G
}
Explanation of algorithm insert_at_root.
T is to be split into two BST S and G where S contain all key in T that are less than x, and G contains those greater than x.  If T is empty, nothing must be done and both S and G are also empty. Otherwise, if x < T->key, then the right subtree of T and the root of T belong to G. To compute S, and the remaining part of G, that is, the subtree that contains the keys in T->left which are greater than x, we make a recursive call to split( x, T->left). If x > T->key, we proceed in a similar manner.

Consider the example in Figure 1. Since 26>T->key, we set S pointing to T, then proceed to split T->right on 26. the two subtrees coming out of this split will become the right subtree of S (containing those keys > x) and become G, respectively.

         split(26, T)            split(26, T->right)      split(26, T->right->left)  split(26, T->right->left->right)
            26 > 15                   26 < 30                    26> 25                   26 < 28

         S-->  15                  S-->  15                  S-->  15                  S-->  15
             /    \                    /                         /                         /
                \                                                                          
           /        \                                                                           
          10        30              10          30 <-G        10          30 <-G       10          30   <-G        
         /  \      /  \            /  \        /  \          /  \           \          /  \          \
        /    \    /    \          /    \      /    \        /    \           \        /    \          \
       5     14  25    33        5     14    25    33      5     14    25    33      5     14    25   33
                /  \                        /  \                      /  \                      /          
               /    \                      /    \                    /    \                    /
              18    28                    18    28                  18    28                  18        28    

            split left subtree of 28  does nothing because it is empty. S and G are formed with the returns of all recursive calls. Then 26 becomes the key of the root of the new BST with S and G as its left and right subtrees. Figures below show the process of rewinding of recursive calls, reading from right to left.

                   26
                 /    |
               
/     |
         S-->  15     |             S-->  15                   S-->  15                  S-->  15
              /  |    |                  /  |                      /                         /    
               |    |                 /   |                     /                                
               |    |                /    |                                                       
           10    |   30 <-G          10     |    30 <-G         10          30 <-G        10           30 <-G        
          /  \    \   | \            /  \    \    | \          /  \          | \         /  \           \
         /    \    \  |           /    \      |  \        /    \         |  \       /    \           \
        5     14   2533        5     14   25  |    33    5     14   25   |   33    5     14    25    33
                   /  |                       /   |                     /    |                     /              
                  /   |                      /    |                    /     |                    /           
                 18   28                    18    28                  18     28                  18    28      


Your Tasks

There are two main tasks for this project:
  1. Define and  implement  randomized insertion and related operations  following the algorithms described above. This should be done by modifying the author's BST code.
  2. Conduct experiments comparing the performance of ordinary BST and RBST (only concerning with randomized insertion, not randomized deletion). Specifically, for each experiment
The following are the specific tasks

The command line

Project 3 will be invoked with a command line that consists of two arguments. The first argument will be an positive integer (L), telling the program the depth the level-order print shall print. The second argument is the  the name of a file that contains a sequence (may be empty) of non-negative integers the program shall use to build the two trees by insertions. The file could be empty. If it is not empty, it contains only nonnegative integers. The integers in the file may be in a single line or multiple lines.

Note that you must check command line arguments to ensure that they are valid, e.g. that the command file can be opened, and print an appropriate message if an error is found.

An example command line looks like
     Proj3 5 /afs/umbc.edu/users/y/p/ypeng/pub/cs341s05/test/Proj3/test


Sample Output

Corrected Sample output for a small test input is now available.

Files To Be Submitted

Submit all files required to build an executable named Proj1 by running make.
Also submit 341-Spring04-proj1_questions.txt - with your answers to the questions.

Submit the files using the procedure given to you in the class.
If your makefile is set up correctly, you should be able to execute the command make submit.

Submit Tools

There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check . Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One of the tools provided by the submit program is
submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj3

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o and ii_files), but leaves the
executable in place.

submitrun <class> <project> [command-line args]

Example:  submitrun cs341 Proj3 5 test

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.