CMSC 341 Spring 2011
Project 2

Assigned

Wednesday, Feb 23, 2011

Due

11:59 pm on Wednesday, March 30, 2011

Revised Due Time: 11:59 pm on Sunday, April 3, 2011

Update

Thursday, March 10

1. Due time has been revised to allow you a few more days to complete the project.

2. Sample data set and output is added.

3. A note on how to use nextInt(k) to generate random numbers is added

(in Implementation Notes section).

4. Pseudo code was revised slightly to make it more readable. A note is added to

explain parameter passing in recursive calls in split function

Objectives

The purpose of this project is to implement and experiment a data structure known as Randomized Binary Search Tree (RBST). RBST is a BST that guarantees to have expected height of O(log n) without performing any tree balance operations.

Randomized Binary Search Tree (RBST)

The ordinary binary search tree (BST) as defined in the class works well for random input. The trees of n keys will have a ``high'' probability of being balanced and thus having height of O(log n). But the assumption that the keys are inserted in random order does not always hold; it is known that with some specific sequence of insertions and deletions the resulting tree can be degenerated to have height of O(n), making the BST performing as poor as linked lists. Various techniques have been developed to address the issue of degeneracy of a BST, including AVL trees, Splaying trees, Red-Black trees, to mention just a few. Randomized binary Search Tree (RBST), proposed by Martínez and Roura [1], is another such technique.

RBST is a special BST which uses randomized version of the insertion and deletion operations. These operations ensure that each of the n keys has an equal probability (1/n) to be the root of the tree. Consequently, regardless the sequence of operations and the order of their insertions and deletions, the resulting binary search tree has the same probability as if it had been built on the random permutations of the n keys, and thus the expected height a RBST is guaranteed to be of order O(log n). Of course in worst case any particular RBST may still have its height in O(n).

In this project, you are asked to only implement the randomized insertion operation, which is described next, followed by an illustrative example and the algorithms.

Randomized Insertion.
The randomized insertion works as follows (assuming all keys are integers): To insert a key X into a BST T with n keys, we use the insertion at root algorithm (to be given below) with probability 1/(n+1), and with probability 1 - 1/(n+1) we insert the key recursively into the right or left subtree, according to the respective values of the key and the root.

The main idea behind insertion at root is to use the split operation: This operation creates two new BST's L and R from a key X and a given BST T, which is modified by the operation. The BST L contains the keys of T that are smaller than X, and the BST R contains the keys of T that are larger than X. Then we insert X into a new node, whose left and right children are L and R, respectively. An example is given in Figure 1 below.

Algorithms
Outlines of algorithms for "randomized insertion", "insertion at root", and "split a BST" are given below

algorithm for randomized insertion

bst randomizedInsert (int x, bst T)
{
pick a random number, r, between 0 and the size of T, inclusive
if r == the size of the tree
insert x at the root

/* x has probability of 1/(T+1) inserting at the root */
else

if x < T.key then
T.left = randomizedInsert(x, T.left);

else

T.right = randomizedInsert(x, T.right)
}

algorithm for insertion at root

bst insertAtRoot( int x, bst T)
{
bst L, bst R;

   split T into two subtrees, called L and R
   make a new root node containing x
   reattach L and R to the new root as its left and right subtrees
}

algorithm to split a tree
void split( int x, bst T, bst L, bst R) /* assuming x is not in T */

         if T is null, make L and R null

        else if x < T.key

          set R = T /* all keys at the root and in right subtree are greater than x */

          recursively split T's left subtree into L and R’s left subtree

             /* some keys in T's left subtree are greater than x, other may be less than x */

        else /* x is greater than T.key */

          set L = T

          recursively split T's right subtree into R and L's right subtree

Note: Since the parameter passing in Java is only by value, directly using L, R, or L.left, R.right in recursive calls inside split function will make L and R null, you need to find a way to work around it. One suggestion is to use local variables, say lNode and rNode, in the recursive calls and link them to L and R after these calls.

So your recursive calls will look like

Split(x, T.left, L, rNode) if x < T.key; and Split(x, T.right, lNode, R) if x > T.key

Then after the recursive calls, link R.right and L.left to rNode and lNode, respectively.

You need to work out the details to make your code run correctly.

Explanation of algorithm insert_at_root.
T is to be split into two BST L and R where L contain all key in T that are smaller than x, and R contains those greater than x. If T is empty, nothing shall be done and both L and R are also empty. Otherwise, if x < T.key, then the right subtree of T and the root of T belong to R. To compute L, and the remaining part of R, that is, the subtree that contains the keys in T.left which are greater than x, we make a recursive call to split T.lef. If x > T.key, we proceed in a similar manner.

Consider the example in Figure 1. Since 26 > T.key, we set L pointing to T, then proceed to split T.right on 26. the two subtrees coming out of this split will become the right subtree of L (containing those keys > x) and become R, respectively. In the diagraphs below, the part of the original tree not split is colored black, the Ls from slit operations colored red, R’s blue. Also note that for illustrative purpose only two of the four parameters for split function are shown in the diagram.

Split left subtree of 28 does nothing because it is empty. L and R are formed with the returns of all recursive. Then 26 becomes the key of the root of the new BST with L and R as its left and right subtrees. Figures below, reading from right to left, show the process of unwinding of recursive calls where R’ is attached to the left of R and L’s to the right of L.

Your Tasks

In this project you are asked to compare the performance of ordinary BST and RBST. You will build a BST using the standard BST insert operation and a RBST using the randomized insert operation from a sequence of non-negative integers (no duplicate) read in from a data file. Then you will

Output the height of each of the two trees.
Print both trees in level-order from the root to level L, which is given in the command line.

Implementation notes

For uniformity, you must use parameterized java.util.Random to create your generator, using 123456789 as the seed. Each time when randomized insertion is called, the random number generator should produce an integer r between 0 to n, where n is the size of the current subtree.

Note nextInt(k) in Random class pseudo-randomly produces integers in [0, k), exclusive of k. So to produce random numbers between 0 to |T|, inclusive, you should use nextInt(|T|+1).

We can use the authors’ code for BST, which was presented in the class and can be found at http://users.cis.fiu.edu/~weiss/dsaajava2/code/. You need to write your own functions needed for this project that are not found in the authors’ code. You can modify this code for your RBST implementation.

The command line

Project 2 will be invoked with a command line that consists of two arguments. The first argument will be a positive integer (L), telling the program the depth of the level-order print shall print. The second argument is the full path name of a file that contains a sequence (may be empty) of non-negative integers the program shall use to build the two trees by insertions. The integers in the file may be in a single line or multiple lines, they are separated from each other by one or more space or by line break.

Note that you must check command line arguments to ensure that they are valid, e.g. that the command file can be opened, and print an appropriate message if an error is found.

An example command line looks like

  ant -Dargs="6 /afs/umbc.edu/users/y/p/ypeng/pub/cs341s11/Proj2/testData1.txt" run

Sample Output

The test data set can be found in the location given in the example command line above. It contains the following non-negative integers:

1 3 5 4 6 9 8 7

The output from executing the above command line shall look like the following:

Height of the normal BST: 6

Height of the randomized BST: 4

Level order print of the normal BST

Level 0: 1

Level 1: 3

Level 2: 5

Level 3: 4 6

Level 4: 9

Level 5: 8

Level 6: 7

Level order print of the randomized BST

Level 0: 3

Level 1: 1 5

Level 2: 4 8

Level 3: 6 9

Level 4: 7

Submission

The repository location for this project is:

      /afs/umbc.edu/users/y/p/ypeng/pub/cs341s11/Proj2

References

[1] Martínez, Conrado; Roura, Salvador (1998), "Randomized binary search trees", Journal of the ACM 45 (2): 288–323, doi:10.1145/274787.274812