Assigned |
Monday, Oct 29, 2012 |
Due |
11:59 pm on Sunday, Nov 18, 2012 |
Update, Nov 08 |
A sample output is added at the end of the description |
Clarification, Nov. 12 |
To ensure consistency in random number generation: you should 1. Construct random number generator when the randomized BST is constructed, NOT when randomizedInsert is called; 2. Do nextInt whenever randomizedInsert is called, even if the current tree is empty, in which case you should have nextInt(0+1). |
The purpose of this project is to implement and experiment a data structure known as Randomized Binary Search Tree (RBST). RBST is a BST that guarantees to have expected height of O(log n) without performing any tree balance operations.
The ordinary binary search tree (BST) as defined in the
class works well for random input. The trees of n keys will have a
“high” probability of being balanced and thus having height of
O(log n). But the assumption that the keys are
inserted in random order does not always hold. It is known that with some
specific sequence of insertions and deletions the resulting tree can be
degenerated to have height of O(n), making the
BST performing as poorly as linked lists. Various techniques have been
developed to address the issue of degeneracy of a BST, including AVL trees,
Splaying trees, Red-Black trees, to mention just a few. Randomized binary
Search Tree (RBST), proposed by Martmnez and Roura [1], is another such technique. Additional references
of RBST and another closely related data structure treap
can be found at Wikipedia [2].
RBST is a special BST which uses randomized version of the insertion
and deletion operations. These operations ensure that each of the
n keys has an equal probability (1/n) to be the root of the
tree. Consequently, regardless the sequence and order of insertion and deletion
operations, the resulting binary search tree would be the same as if it had
been built on the random permutations of the n keys, and thus the expected
height a RBST is guaranteed to be of O(log n).
Of course in worst case any particular RBST may still have its height in O(n), albeit with
a very low probability when n is
large.
In this project, you are asked to only implement the randomized insertion
operation.
Randomized Insertion
This operation requires that each node stores the size
of the subtree rooted at that node (i.e., the number
of nodes contained in that subtree, including the
root). The randomized insertion works as follows (assuming all keys are
positive integers): To insert a key x into a BST with n keys, we
insert x as the root with a
probability of 1/(n+1), and insert it
recursively into one of the two subtrees with a
probability 1 - 1/(n+1).
The probabilistic decision will be done with the help of Java random number
generator (see later in Implementation notes section).
Several techniques have been developed for inserting a new node as the root. In this project, we will first insert the new node as a leaf as is done in a ordinary BST and then rotate it about its parent all the way to the root. During the rotation, you need to modify the size of subtrees involved.
Algorithms
Outlines of algorithms for "randomized insertion"
and "insertion at root" are given below
algorithm for randomized insertion
randomizedInsert (int
x, bst T)
{
pick a random number, r, between 0 and the size of T, inclusive;
if r == the size of the tree then
insertAtRoot(x,
T);
else
if x < T.key then
T.left
= randomizedInsert(x, T.left);
else
T.right = randomizedInsert(x,
T.right);
}
algorithm for insertion
at root
insertAtRoot(int x, bst
T)
{
insert x as a leaf in T;
rotate the node
containing x about its parent all the way to the root of T;
}
A simple example
Figure below shows a simple example of
randomized insertion where key 16 is inserted into the BST on the left. Suppose
the first random number generated from the random number generator is 3, we
proceed to insert 16 into the right subtree. Suppose
the second random number is 2, which equals the size of that subtree, so 16 should be inserted as its root. The tree in
the middle of the figure is after inserting 16 as a leaf; the tree at the right
is after the rotations. Note the change of the size of each node in these two
trees.
In this project you are asked to compare the performance of ordinary BST and RBST. You will build a BST using the standard BST insert operation and a RBST using the randomized insert operation from a sequence of positive integers (no duplicate) read in from a data file. Then you will
Implementation notes
Note nextInt(k) in Random class produces integers in [0, k), exclusive of k. So to produce random numbers between 0 to n, inclusive, you should use nextInt(n+1).
/afs/umbc.edu/users/y/p/ypeng/pub/cs341f12/Proj3
Project 3 will be invoked with a command line that consists of two arguments. The first argument will be a positive integer (L), telling the program the depth of the level-order print shall print. The second argument is the full path of a file that contains a sequence (may be empty) of positive integers the program shall use to build the two trees by insertions. The integers in the file may be in a single line or multiple lines; they are separated from each other by one or more space or by line break.
Note that you must check command line arguments to ensure that they are valid, e.g. that the command file can be opened, and print an appropriate message if an error is found.
An example command line looks like
ant -Dargs="6
/afs/umbc.edu/users/y/p/ypeng/pub/cs341f12/test-cases/Proj3/testData1.txt"
run
For L = 6 and the dataset containing the
following integers
The
output:
Height of the normal BST: 6
Height of the randomized
BST: 3
Level order print of the normal
BST
Level 0: 1
Level 1: 3
Level 2: 5
Level 3: 4 6
Level 4: 9
Level 5: 8
Level 6: 7
Level order print of the
randomized BST
Level 0: 8
Level 1: 5 9
Level 2: 3 6
Level 3: 1 4 7
[1] Martmnez, Conrado; Roura, Salvador (1998), "Randomized binary search trees", Journal of the ACM 45 (2): 288323, doi:10.1145/274787.274812
[2] http://en.wikipedia.org/wiki/Treap#Randomized_binary_search_tree