CMSC 341 Fall 2005
Project 5
Disjoint Set Experimentation

Assigned Monday, Nov 28th
Due 11:59pm, Sunday Dec 11th
Updates

Objectives

Background

In class, the theoretical performance of certain operations on a data structure are presented and sometimes proven; sometimes not. In the case of the disjoint set data structure, we showed that certain heuristics affect the performance of the union and find operations. In particular we showed that using Union-by-Weight results in worst case O(lg n) performance for find (where n is the size of the universe of elements). We also showed that using Path-Compression resulted in amortized O(lg n) performance for find. And finally we showed that using both Union-by-Weight and Path-Compression resulted in amortized O(1) performance for find .

In this project, we will find out if the theory is correct.


Description

In this project you will implement a disjoint set (DJS) data structure which can be configured to use Union-by-Weight and/or Path-Compression or neither of these. After implementing this data structure, you will perform a sequence of union and find operations, periodically displaying various data and calculations. The methodology for performing the experiment is described below. You are at liberty to run the experiment as often as you like, but a minimal set of data must be submitted with your project.

Once you have completed the experiment and analyzed the data, you will write a report that describes the results of your experiment and if/how those results conform to the theoretically expected results.

The performance of a find operation is measured by the length of the path from the element being found to its representative. We will measure this length by counting the number of times find is called recursively. Your DJS data structure must be able to count these calls and provide an accessor for this value.

You may use the author's DJS code (found in Mr. Frey's public directory) or implement your own DJS class.

The Command Line

The command line for project 5 consists of the following parameters, in this order
  1. The number elements in the universe, N, numbered 0 to N - 1
  2. The number of operations to perform, M.
  3. The random number generator seed, S
  4. The data reporting interval, I. Print the required data after every I operations. Assume that I is a factor of M.
  5. One of the strings "compress" or "NOcompress" (without the quotes), indicating whether or not path compression should be used.
  6. One of the strings "byWeight" or "NOTbyWeight" (without the quotes), indicating whether or not union-by-weight should be used.

The Experiment

In this experiment we will use a random number generator to determine which operation to perform and the element(s) on which to perform it. So that your results can be duplicated, you will seed your random number generator from a command line parameter. We will be using the functions srandom and random. See the man pages for details on how to use them.

In this experiment, the following values are reported every time I operations have been completed. See the sample output below for an appropriate format, legend and column headers.

  1. The number of operations performed so far
  2. The number of find operations performed so far
  3. The number of union operations performed so far
  4. The total number of recursive calls to find so far
  5. The number of recursive calls to find since the last time reported
  6. The average number of recursive calls to find for each operation
  7. The average number of recursive calls to find since the last time reported.
  8. The average number of recursive calls to find for each operation divided by lg2( N ).

The experiment consists of the following steps

  1. Construct an appropriately configured DJS data structure.
  2. Seed the random number generator
  3. For each of the M operations
    Choose a random number. If the number is odd, select two random numbers from 0 to N-1 and union their sets together. If the number is even, select one random number from 0 to N-1 and perform a find operation on that element.
    Every I operations, report the data described above.

The Report

In lieu of project questions, you will write a report about your experiment and your data analysis. This report will count 25 points of the project grade. Spelling and grammar count. Your report file must be named Proj5Report.txt and must be written in plain text. Be sure to include your name, section, UMBC username, etc at the beginning of your report.

Your report should begin with a brief description of your DJS class. Describe how you designed your class to be configurable with respect to using path-compression and/or union-by-weight.

The bulk of your report should provide an analysis of the data for each of the four DJS configurations (with/without path-compression and union-by-weight). For each configuration, describe how the reported data (total recursive calls, average recursive calls, average recursive calls divided by lg2( N ), etc.) behave (i.e. become constant, get increasingly larger, get decreasingly smaller, converges to a value, etc.) as the number of operations increases. Describe the expected behavior according to the theory discussed in class and how/if/why your data supports or conflicts with that theory.

Project Notes, Hints, and Requirements

  1. The C/C++ math library does not provide a function for calculating lg2( N ). It does provide the function log10( N ) which returns the log of N in base 10. Use the math identity lg2( N ) = log10( N ) / log10( 2 ) to calculate the log of N in base 2. You must #include <cmath> to use log10.
  2. Note that union is a keyword in C/C++ and so cannot be used as the name of a function, variable, class, etc.
  3. When performing a union operation, be sure to first find the repesentatives of the sets being unioned.
  4. You must #include <cstdlib> to use random( ) and srandom( )
  5. Mr. Frey's public directory for this project is
    /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj5
  6. Run your experiment with various universe sizes, number of operations, with/without path compression and union-by-weight. Analyze this data to write your report.
  7. You must submit four data files from your experimentation. In each case, use N = 1000, M = 10000, I = 100 and S = 1234
    1. Submit the file exp1.dat which results from running your project without path compression and without union-by-weight
    2. Submit the file exp2.dat which results from running your project with path compression and without union-by-weight
    3. Submit the file exp3.dat which results from running your project without path compression but with union-by-weight
    4. Submit the file exp4.dat which results from running your project with both path compression and union-by-weight

Sample Output

This sample output is provided only as a format example. Your output must be presented in a readable table format. Experiment Parameters --------------------- Number of Elements = 500 lg( Nr Elements ) = 8.96578 Number of Operations = 1000 Union-by-Weight = NO Path-Compression = NO Random Seed = 1234 Table Legend ------------ Nr Ops = Cumulative Number of Union/Find Operations TC = Total Recursive Calls Delta TC = Change in TC from previous line AC = Average Recursive Calls per Operation Delta AC = Change in AC from previous line K1 = Average Calls / lg( Nr Elements ) Total Ops Unions Finds TC Delta TC AC Delta AC K1 100 42 58 6 - 0.0600 - 0.0067 200 82 118 31 25 0.1550 0.0950 0.0173 300 139 161 65 34 0.2167 0.0617 0.0242 400 190 210 125 60 0.3125 0.0958 0.0349 500 238 262 209 84 0.4180 0.1055 0.0466 600 282 318 311 102 0.5183 0.1003 0.0578 700 332 368 455 144 0.6500 0.1317 0.0725 800 382 418 593 138 0.7412 0.0912 0.0827 900 432 468 745 152 0.8278 0.0865 0.0923 1000 482 518 904 159 0.9040 0.0762 0.1008

Files To Be Submitted

Submit any files which you have written or modified -- source code and makefile.

Submit your project report. This file must be in plain text and be named Proj5Report.txt.

Submit the required experiment data files

If your makefile is set up correctly, you should be able to execute the command make submit.

Submission Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One of the tools provided by the submit program is
submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj5 <parameter list>

Submission Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One of the tools provided by the submit program is
submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj5 <parameter list>

Submission Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One of the tools provided by the submit program is
submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj5

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o and ii_files), but leaves the
executable in place.

submitrun <class> <project>

Example:  submitrun cs341 Proj5 <parameter list>

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).


Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.