CMSC 341 Spring 2004
Project 4

Assigned	Wednesday, April 7, 2004
Due	11:59 pm on Tuesday April 20, 2004
Note	Questions and a sample output will be posted shortly
Updates	*09 April* QuadProbing.C included both template and non-template code. This problem could cause the linker to multiply include the non-template code resulting in linker errors. The non-templated code has been moved into two new files in Mr. Frey's public directory -- Prime.C which contains isPrime() and nextPrime() and HashFuncs.C which contains the two hash functions. Prime.H and HashFuncs.H have also been created. Be sure to compile and link these new .C files and include them in your makefile.
Updates	*12 April* Questions and sample output can be found here. Depending on the specifics of your implementation, the numbers may be slightly different from yours. There is an inconsistencycy in the text on how to count the number of probes in an operation. In this project, tobe consistent with the number of comparisons in BST, we consider it is the number of cells accessed in an operation. Therefore, the minimum would be 1, not 0.
Correction	*14 April* There is an error on Experiment 5.1. It should be "the smallest prime" not the smallest integer".
Updates	*14 April* To ensure uniform output, when generate random integers, you should first generate the integer, then the probbility for which vector it goes. That is, first int num = rand(); then double r = rand()/RAND_MAX.

Objectives

The purpose of this project is to give you some exposure to Hash Tables (HT) and to experimentally study the time performance of hash tables, in comparison with that of binary search trees (BST). As discussed in the class, a randomly generated BST of n nodes takes on average of O(n log n) time to build; the height of such a BST is O(log n), which leads to a O(log n) average time performance for find and insert operations. On the other hand, find and insert operations for HT have O(1) time performance, provided the table is sufficiently large and a proper hashing function is used. In this project, you will verify these theoretical results through a sequence of experiments with BST and HTs using closed hashing and quadratic probing.

Description

In this project, you will first construct a BST and several HT's of different size by inserting the same sequence of randomly generated integers, and then conduct both successful and failed find operations over these data structures. You will collect statistics related to the performance of both BST and HT during these operations and report your findings. Below are the specifics.

Data Preparation
For a fair comparison, the same set of integers will be used to construct the BST and HT's, and the same sequence of keys will be used for the find operations in both BST and HTs. For this purpose, you are asked to build two vectors of randomly generated integers v1 and v2. Both vectors are of size N (the value of N will be provided as the second of the two command line parameters. Integers in v1 will be used for constructing BST and HT's; those in v2 are used for find operations. These two vectors are constructed as follows:

Continuously generate random integers using the system's random number generating function rand, the seed used in srand will be provided as the first of the two command line parameters.
For each integer generated, put it into the two vectors according to the following distribution: 0.33 in v1 only, 0.33 in v2 only, and 0.34 in both v1 and v2. (As did in Project 2, you should generate another random number r between 0 and 1, if r <= 0.33, then put the integer into v1, if 0.33 < r <= 0.66, put it into v2, other wise, put it into both v1 and v2.)
Since these two vectors are filled randomly, one, say v1, may be filled to size N sooner than the other. In this case, continue the process in 2 until v2 is also filled to size N, and discard any integers that would have been distributed to v1 since is it already full.
You shall NOT check and remove duplicates when constructing the two vectors. The actual number of integers inserted into BST and HT's may be slightly less than N since attempts of inserting duplicates will be ignored by both BST and HT code.

Approximately half of integers in v2 are also in v1, and another half are not, and consequent approximately half of your find operations will succeed, and another half will fail.

BST
When doing the tree construction and find(x) operations, you need to collect the number of comparisons for each operation. To be comparable to probings in HT, the comparison here means how many objects are compared with x in insert and find, NOT the number of actual comparison operators (=, <, >) executed. For example, inserting an integer into an empty tree has 1 comparison (comparing with null), a successful find with x = 4 on a path 1 -> 2 -> 3 ->4 -> 5 has 4 comparisons, and a failed find with x = 6 along the same path would have 6 comparisons (the 5 nodes plus the null).

Hash Tables
You will be experimenting with closed hashing, using the division method for hashing and quadratic probing for resolving collisions as specified below:

               h'(K) = K mod M

               h(K, I) = (h'(K) + I²) mod M

HT's of different sizes, reflecting different load factors "lambda", will be experimented. For a given N and lambda, the HT size will be the smallest prime number greater than N/lambda. To observe the influence of lambda on the HT performance, you are asked to run your program with lambda = 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 for a given N.

Note: there is a small probability that, when quadratic probing is used, you may not be able to find a empty cell for insertion even though the table is not full. If this happens, record it and then proceed to insert the next integer from v2.

Experiments
The procedure for the experiments is outlined below.

Read in S (the seed for srand()) and N from the command line;
Construct integer vectors v1 and v2 as described earlier;
Construct a BST by successive insertions of integers from v1;
For each integer x in v2, do find(x) with the BST constructed in step 3;
for lambda = 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8 do
5.1. construct a HT with size m, the smallest prime no less than N/lambda, by successive insertions of integers from v1;
5.2. for each integer x in v2, do find(x) with the HT constructed in step 5.1.

When constructing BST and HTs, the order of integers from v1 inserted shall be
v1[0], v1[1],..., v1[N-1].

Statistics needed for the final report should be collected throughout the experiments. What data should be included in the final report are given next.

Reporting
At the end of the experiment, report the following in a well formatted output.

S and N read from the command line.
For BST

The total, average, and maximum number of comparisons performed over all insert operations in constructing the tree;
The number of successful find operations, and the minimum, average, and maximum number of comparisons among all successful find operations;
The number of failed find operations, and the minimum, average, and maximum number of comparisons among all failed find operations;

For HT's: for each lambda value, report

The value of lambda;
The table size;
The total, average, and maximum number of probings performed over all insert operations in constructing the HT;
Also report if and how many insertions failed;
The number of successful find operations, and the minimum, average, and maximum number of probings among all successful find operation;
The number of failed find operations, and the minimum, average, and maximum number of probings among all failed find operation;

Your Tasks

Implement a BST class and a Hash Table class by modifying the author's code.

Author's code (BinarySearchTree.H, BinarySearchTree.C, QuadProbing.H, QuadProbing.C) can be found at Mr. Frey's directory
afs/umbc.edu/users/d/e/dennis/pub/CMSC341.
In particular, you need to change the author's insert and find methods in both classes to collect statistics, and allow the main to access these statistics.
DO NOT double the hash table size and rehash when the load factor exceeds 0.5. Modify the author's code accordingly.

Write your main( ) that drives and controls the experiment.
Write any other classes or auxiliary code you feel are necessary for this project
Write a makefile for this project.
Answer the questions found in the file

/afs/umbc.edu/users/y/p/ypeng/pub/CMSC341/Proj4/341-Spring04-p4_questions.txt

These questions are worth 10% of the project grade.

Program Requirements

Your code must follow all project guidelines established for this course. See the project organization and project policies web pages. You are not required to modify the author's code to follow these guidelines.
You are free to name your main program file, but your executable must be named Proj4.
Your code must handle (try/throw/catch) any exceptions thrown by the author's code or by your own classes.
All "average" values should be printed with 4 decimal places of precision.

Files To Be Submitted

Submit any files which you have written and modified:

file that contains your main( );
files for your modified BST class;
files for your modified Hash Table class;
any other files you write for this project;
your makefile -- which must be named makefile or Makefile;
the question file with your answers inserted.

Submit the files using the procedure given to you for your section of the course.
If your makefile is set up correctly, you should be able to execute the command make submit.

Submit Tools

There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check . Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One of the tools provided by the submit program is
submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example: submitmake cs341 Proj4

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o and ii_files), but leaves the
executable in place.

submitrun <class> <project> [command-line args]

Example: submitrun cs341 Proj4 1234 1000 100

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).

Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.

CMSC 341 Spring 2004 Project 4