CMSC 341Spring 2013
Project 4

Assigned	Wednesday, May 1, 2013
Due	11:59 pm on Tuesday May 14

Objectives

The purpose of this project is to give you significant exposure to Hash Tables. You will be conducting experiments with probing hash tables using linear, quadratic and double hashing. You will be asked to analyze the data you collect and compare the data with the expected theoretical results.

Description

In this project, you will be inserting randomly generated integers into three hash tables. For one hash table, you will use linear probing for collision resolution. For another you will use quadratic probing. For the third hash table, you will use double hashing. You will be collecting data about clustering and probing in hash tables.

Your program will attempt to insert N unique random integers into the hash tables, reporting the collected data on a specified interval (after every INTERVAL insertion attempts). For example, with N = 1000 and INTERVAL = 200, you will attempt to insert 1000 integers, and report data after attempting to insert 200, 400, 600, 800 and 1000 integers. See the sample output for recommended organization of the data. You may assume that N is a multiple of INTERVAL.

Throughout this project description, "N" will refer to the number of attempted insertions into the hash table; "K" will refer to the key being inserted; "I" will refer to the probe number; and "M" will refer to the hash table size. These definitions are the same as those used in class.

Your program will accept the following command line arguments (in this order)

the name of a file which contains unique random integers separated by whitespace.
N-- the total number of random integers to attempt to insert (e.g. 1000)
INTERVAL -- the interval (number of attempted insertions) for reporting data (e.g. 200)
M - the size of the hash table
R - the largest prime less than M

After attempting to insert INTERVAL integers, your program will report the following data:

the total number of attempted insertions (successful and unsuccessful) so far
the value of lambda (successful insertions / table size)
the total number of probes so far (for both successful and unsuccessful insertions)
the average number of probes needed to insert (successfully or unsuccessfully) an integer into each hash table
the maximum number of probes needed to insert (successfully or unsuccessfully) an integer into each hash table
the number of successful insertions
the number of unsuccessful insertion attempts
the number of clusters in the hash table
A cluster is defined to be 1 or more consecutive occupied slots in the hash table. When counting clusters, do not "wrap around".
the maximum cluster size in each hash table
the average cluster size in each hash table

Requirements, Notes, and Hints

Code for the author's hash table class can be checked out using CVS. Feel free to modify the file QuadraticProbingHashTable.java as needed.

Note that the author's code uses quadratic probing for collision resolution. This project requires three different collision resolution methods, so you'll need to add the other two collision resolution methods.

The author's code automatically rehashes the hash table when the table is 50% full. This code must be removed so that we can study the effects that filling the hash table.

The author's code also initializes the hash table to a size equal to the next prime after the integer passed to the constructor as M. You'll need to disable this code as well, so that it makes a table of size M, where M is passed as a parameter. FYI, the value of M we use will be a prime.

An insertion is "unsuccessful" if h(k, i) == h(k, j) for some j not equal to i. I.e. you ever attempt to store the key into the same slot twice. You should check this condition after each probe so that we all count unsuccessful insertions in the same way.

Your code must handle (try/throw/catch) any exceptions thrown by the author's code or by your own classes.

All "average" values should be printed with 2 decimal places of precision.

Probing Functions

For all probing, use the function
h( K ) = K mod M
For linear probing, use the function
h( K, I ) = ( h( K ) + I ) mod M
For quadratic probing, use the function
h( K, I ) = ( h( K ) + I²) mod M
For double hashing, use the function
h ( K, I ) = ( h( K ) + I * h₂( K ) ) mod M
where h₂ ( K ) = R - ( K mod R )
and where R is the largest prime that is smaller than the table size.

Sample Output

Mr. Winner has prepared some sample output for this project. The relevant files include:

proj4sampleinput.txt contains the data
proj4sampleoutput.txt is what the output from your program should look like
proj4sampletables.txt is a print of the actual hash tables for your debugging purposes
proj4samplecommandline.txt is one example of what my command line looked like

Kevin says that he wrote (and debugged) this program quickly, so if your output differs significantly from his and you are certain that you are right, please see him in order to discuss any discrepancies.

The following sample output is for formatting and organizational purposes only. No real data is provided.
In this example, N = 1000 (the total number of insertions to attempt) and INTERVAL = 100 (the reporting frequency). Your output will consist of three tables like the one shown below
.There will be separate tables for linear probing, quadratic probing and double hashing.

In the tables, the columns are defined as follows

General:

"N" -- the total number of attempted insertions.
"lambda" -- the load factor -- the number of successful insertions / table size

For the "Inserts" columns:

"success" -- the running total of the number of successful insertions
"failed" -- the running total of the number of insertions that failed because no slot could be found.
Note that success + failed = N

For the "Probes" columns:

"total" -- the running total number of probes for all insertions (successful and unsuccessful)
"avg" -- the average number of probes for each insertion (successful and unsuccessful)
"max" -- the maximum number of probes necessary for a single insertion (successful or unsuccessful)

For the "Cluster" columns:

"number" -- the number of clusters in the table
"avg" -- the average size of each cluster
"max" -- the maximum cluster size in the table

			Linear Probing Analysis (Table size = xxxxxx)                 
                  --- Inserts ---   ------- Probes -------    ----- Clusters ----- 
     N   lambda  success  failed     total    avg    max     number   avg     max  
   200     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx
   400     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx 
   600     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx   
   800     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx 
  1000     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx

Submit Tools

There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check. Make sure they work. Do this before the due date.

Grading and Academic Integrity

Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.