Monday, Oct 29, 2012
11:59 pm on Sunday, Nov 18, 2012
Update, Nov 08
A sample output is added at the end of the description
Clarification, Nov. 12
To ensure consistency in random number generation: you should
1. Construct random number generator when the randomized BST is constructed, NOT when randomizedInsert is called;
2. Do nextInt whenever randomizedInsert is called, even if the current tree is empty, in which case you should have nextInt(0+1).
The purpose of this project is to implement and experiment a data structure known as Randomized Binary Search Tree (RBST). RBST is a BST that guarantees to have expected height of O(log n) without performing any tree balance operations.
The ordinary binary search tree (BST) as defined in the
class works well for random input. The trees of n keys will have a
“high” probability of being balanced and thus having height of
O(log n). But the assumption that the keys are
inserted in random order does not always hold. It is known that with some
specific sequence of insertions and deletions the resulting tree can be
degenerated to have height of O(n), making the
BST performing as poorly as linked lists. Various techniques have been
developed to address the issue of degeneracy of a BST, including AVL trees,
Splaying trees, Red-Black trees, to mention just a few. Randomized binary
Search Tree (RBST), proposed by Martmnez and Roura , is another such technique. Additional references
of RBST and another closely related data structure treap
can be found at Wikipedia .
RBST is a special BST which uses randomized version of the insertion and deletion operations. These operations ensure that each of the n keys has an equal probability (1/n) to be the root of the tree. Consequently, regardless the sequence and order of insertion and deletion operations, the resulting binary search tree would be the same as if it had been built on the random permutations of the n keys, and thus the expected height a RBST is guaranteed to be of O(log n). Of course in worst case any particular RBST may still have its height in O(n), albeit with a very low probability when n is large.
In this project, you are asked to only implement the randomized insertion operation.
This operation requires that each node stores the size of the subtree rooted at that node (i.e., the number of nodes contained in that subtree, including the root). The randomized insertion works as follows (assuming all keys are positive integers): To insert a key x into a BST with n keys, we insert x as the root with a probability of 1/(n+1), and insert it recursively into one of the two subtrees with a probability 1 - 1/(n+1). The probabilistic decision will be done with the help of Java random number generator (see later in Implementation notes section).
Several techniques have been developed for inserting a new node as the root. In this project, we will first insert the new node as a leaf as is done in a ordinary BST and then rotate it about its parent all the way to the root. During the rotation, you need to modify the size of subtrees involved.
Outlines of algorithms for "randomized insertion"
and "insertion at root" are given below
algorithm for randomized insertion
x, bst T)
pick a random number, r, between 0 and the size of T, inclusive;
if r == the size of the tree then
if x < T.key then
T.left = randomizedInsert(x, T.left);
T.right = randomizedInsert(x,
algorithm for insertion at root
insertAtRoot(int x, bst
insert x as a leaf in T;
rotate the node
containing x about its parent all the way to the root of T;
A simple example
Figure below shows a simple example of randomized insertion where key 16 is inserted into the BST on the left. Suppose the first random number generated from the random number generator is 3, we proceed to insert 16 into the right subtree. Suppose the second random number is 2, which equals the size of that subtree, so 16 should be inserted as its root. The tree in the middle of the figure is after inserting 16 as a leaf; the tree at the right is after the rotations. Note the change of the size of each node in these two trees.
In this project you are asked to compare the performance of ordinary BST and RBST. You will build a BST using the standard BST insert operation and a RBST using the randomized insert operation from a sequence of positive integers (no duplicate) read in from a data file. Then you will
Note nextInt(k) in Random class produces integers in [0, k), exclusive of k. So to produce random numbers between 0 to n, inclusive, you should use nextInt(n+1).
Project 3 will be invoked with a command line that consists of two arguments. The first argument will be a positive integer (L), telling the program the depth of the level-order print shall print. The second argument is the full path of a file that contains a sequence (may be empty) of positive integers the program shall use to build the two trees by insertions. The integers in the file may be in a single line or multiple lines; they are separated from each other by one or more space or by line break.
Note that you must check command line arguments to ensure that they are valid, e.g. that the command file can be opened, and print an appropriate message if an error is found.
An example command line looks like
ant -Dargs="6 /afs/umbc.edu/users/y/p/ypeng/pub/cs341f12/test-cases/Proj3/testData1.txt" run
For L = 6 and the dataset containing the following integers
Height of the normal BST: 6
Height of the randomized BST: 3
Level order print of the normal BST
Level 0: 1
Level 1: 3
Level 2: 5
Level 3: 4 6
Level 4: 9
Level 5: 8
Level 6: 7
Level order print of the randomized BST
Level 0: 8
Level 1: 5 9
Level 2: 3 6
Level 3: 1 4 7