CMSC 341 Project 3

Assigned

Monday, Oct 29, 2012

Due

11:59 pm on Sunday, Nov 18, 2012

Update, Nov 08

A sample output is added at the end of the description

Clarification, Nov. 12

To ensure consistency in random number generation: you should

1.      Construct random number generator when the randomized BST is constructed, NOT when randomizedInsert is called;

2.      Do nextInt whenever randomizedInsert is called, even if the current tree is empty, in which case you should have nextInt(0+1).


Objectives

The purpose of this project is to implement and experiment a data structure known as Randomized Binary Search Tree (RBST). RBST is a BST that guarantees to have expected height of O(log n) without performing any tree balance operations. 


Randomized Binary Search Tree (RBST)

The ordinary binary search tree (BST) as defined in the class works well for random input. The trees of n keys will have a “high” probability of being balanced and thus having height of O(log n). But the assumption that the keys are inserted in random order does not always hold. It is known that with some specific sequence of insertions and deletions the resulting tree can be degenerated to have height of O(n), making the BST performing as poorly as linked lists. Various techniques have been developed to address the issue of degeneracy of a BST, including AVL trees, Splaying trees, Red-Black trees, to mention just a few. Randomized binary Search Tree (RBST), proposed by Martmnez and Roura [1], is another such technique. Additional references of RBST and another closely related data structure treap can be found at Wikipedia [2].

RBST is a special BST which uses randomized version of the insertion and deletion operations. These operations ensure that each of the n keys has an equal probability (1/n) to be the root of the tree. Consequently, regardless the sequence and order of insertion and deletion operations, the resulting binary search tree would be the same as if it had been built on the random permutations of the n keys, and thus the expected height a RBST is guaranteed to be of O(log n). Of course in worst case any particular RBST may still have its height in O(n), albeit with a very low probability when n is large.

In this project, you are asked to only implement the randomized insertion operation.


Randomized Insertion
This operation requires that each node stores the size of the subtree rooted at that node (i.e., the number of nodes contained in that subtree, including the root). The randomized insertion works as follows (assuming all keys are positive integers): To insert a key x into a BST with n keys, we insert x as the root with a probability of 1/(n+1), and insert it recursively into one of the two subtrees with a probability 1 - 1/(n+1). The probabilistic decision will be done with the help of Java random number generator (see later in Implementation notes section).

 

Several techniques have been developed for inserting a new node as the root. In this project, we will first insert the new node as a leaf as is done in a ordinary BST and then rotate it about its parent all the way to the root. During the rotation, you need to modify the size of subtrees involved.

 

Algorithms

Outlines of algorithms for "randomized insertion" and "insertion at root" are given below
 
algorithm for randomized insertion

randomizedInsert (int x, bst T)
{
    pick a random number, r, between 0 and the size of T, inclusive;
    if r == the size of the tree then
          insertAtRoot(x, T);     
       else

          if x < T.key then
             T.left = randomizedInsert(x, T.left);

             else

                T.right = randomizedInsert(x, T.right);
}

 

algorithm for insertion at root

insertAtRoot(int x, bst T)
{
  insert x as a leaf in T;

  rotate the node containing x about its parent all the way to the root of T;
}

 
A simple example

Figure below shows a simple example of randomized insertion where key 16 is inserted into the BST on the left. Suppose the first random number generated from the random number generator is 3, we proceed to insert 16 into the right subtree. Suppose the second random number is 2, which equals the size of that subtree, so 16 should be inserted as its root. The tree in the middle of the figure is after inserting 16 as a leaf; the tree at the right is after the rotations. Note the change of the size of each node in these two trees.

            


Your Tasks

In this project you are asked to compare the performance of ordinary BST and RBST. You will build a BST using the standard BST insert operation and a RBST using the randomized insert operation from a sequence of positive integers (no duplicate) read in from a data file. Then you will

Implementation notes

Note nextInt(k) in Random class produces integers in [0, k), exclusive of k. So to produce random numbers between 0 to n, inclusive, you should use nextInt(n+1).

      /afs/umbc.edu/users/y/p/ypeng/pub/cs341f12/Proj3

The command line

Project 3 will be invoked with a command line that consists of two arguments. The first argument will be a positive integer (L), telling the program the depth of the level-order print shall print. The second argument is the full path of a file that contains a sequence (may be empty) of positive integers the program shall use to build the two trees by insertions. The integers in the file may be in a single line or multiple lines; they are separated from each other by one or more space or by line break.

 

Note that you must check command line arguments to ensure that they are valid, e.g. that the command file can be opened, and print an appropriate message if an error is found.

 

An example command line looks like

  ant -Dargs="6 /afs/umbc.edu/users/y/p/ypeng/pub/cs341f12/test-cases/Proj3/testData1.txt" run


Sample Output

For L = 6 and the dataset containing the following integers

 

1 3 5 4 6 9 8 7

 

The output:

 

Height of the normal BST: 6

Height of the randomized BST: 3

 

Level order print of the normal BST

Level 0: 1

Level 1: 3

Level 2: 5

Level 3: 4 6

Level 4: 9

Level 5: 8

Level 6: 7

 

Level order print of the randomized BST

Level 0: 8

Level 1: 5 9

Level 2: 3 6

Level 3: 1 4 7


References

[1] Martmnez, Conrado; Roura, Salvador (1998), "Randomized binary search trees", Journal of the ACM 45 (2): 288323, doi:10.1145/274787.274812

[2] http://en.wikipedia.org/wiki/Treap#Randomized_binary_search_tree