CMSC 341 - Fall 2001

Project 4

Assigned Wednesday, 07 November 2001
Due Monday, 26 November 2001
Modifications By popular demand, the due date for this project has been moved to midnight Monday 11/2/6/01
Have a nice Thanksgiving


Background

We have been studying different variations of binary search tress and investigating their theoretical performance using asymptotic analysis.  This project will give you an opportunity to gather and analyze empirical data about these trees and determine for yourself if the performance is really what the theory says it should be.  You will be modifying the author's binary search tree code and his top-down splay tree code to add new functionality.  You will then exercise these trees and gather data about their performance and shape.  Finally, you will be asked to analyze your data.  You'll also get some practice with C++ file I/O and formatted output.

Description

In this project you will be reading random integers from a file, building a BST and SPLAY tree with those values, then performing find( ) operations for both trees.  After all values from the file have been inserted, your program will print the first few elements from the tree and report the number of nodes inserted (N) and the value of lg(N).  Your program will then perform the following operations
  1. find( ) for every 100th value in the file
  2. find( ) for every 50th value in the file
  3. find( ) for every  25th value in the file
After all insertions have been completed and after every 100 operations thereafter,  your program will report the following information in a tabular format (see the sample output below).  DO NOT assume that the number of values in the file is a multiple of 100. When all operations have been completed and reported print your tree again.

The Command Line

Your program will accept two command line arguments -- the name of the data file and the number of nodes from the tree to print.  If the name of the data file is provided, you may assume the file is in the specified data file format.


 

Classes

Modify the author's BST class and top-down SplayTree class to provide the following additional public methods
  1. height( ) - returns the height of the tree (NOTE - because the height of a tree with a single node is 0, the height of an empty tree is defined as -1)
  2. size ( ) -- returns the number of nodes in the tree
  3. IPL ( ) - returns the IPL (internal path length) of the tree - the sum of the depths of all the nodes.  The IPL of an empty tree is 0.
  4. compares ( ) - returns the total number of data comparisons performed since tree construction
  5. printTree (int n) -- prints the first n nodes of the tree in level-order.  Each node is printed with it's data value and level.  The root is level 0.  If n is bigger than the number of nodes in the tree (unlikely except for testing), print the entire tree.

Tasks

Here are your tasks:
  1. Modify the author's BST code to support the new public methods described above.  Write whatever test programs you think are necessary to test your BST.
  2. Modify the author's Splay tree code to support the new public methods described above.  Write whatever test programs you think are necessary to test your Splay tree.
  3. Write main( ) and required auxiliary functions to read data from the file, perform the required tree operations and accumulate and report the required data.
  4. Answer the project questions by copying the file /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj4/341-Fall01-p4_questions.txt, editing it to include your answers and submitting the edited file.

Task Notes

Task Hints

    This project will take a long time to write and test.  Here are some hints that should help you organize and understand your tasks a bit better.

Data Files

Four large data files are available for testing - Proj4.8K.dat, Proj4.16K.dat,  Proj4.32K.dat and Proj4.64K.dat  Not surprisingly, these files contain 8K, 16K, 32K and 64K values respectively.  The data files may be assumed to contain whitespace separated values of type int. These values are generated with rand( ) and lie between 0 and MAX_INT. There are 10 values per line. You will want to create your own smaller data files for testing, but larger data files will be necessary to correctly analyze the data.  DO NOT assume that the number of values in the file is a multiple of 100 or 1000.  Your program should run successfully regardless of the number of values in the file.

A program that generates random integers and writes them to a file is available for your use.  The program is GenRandInts and is located in Mr. Frey's public directory /afs/umbc.edu/users/d/e/dennis/pub/CMSC341.


Sample Output

Sample output is located in Mr. Frey's public directory for this project
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj4 in the file p4output.  It was generated from the test file Proj4.64K.dat also located in this project's public directory.  Twenty nodes were printed for each tree.

Your output must match the sample output in the following respects

  1. Decimal values must have 4 decimal places
  2. All values are right justified in their fields
  3. Tree nodes must be printed 4 per line and in the specified format -- [ element, depth ]
  4. Columns must have headings

Project Submission

DO submit these files

Since you are submitting your own makefile, you are free to name your files whatever you like, with the following restrictions:

DO NOT submit these files

Do not submit any files from Mr. Frey's public directory that you use without modification.. These files should be included in your project through your makefile.  If found in your submittal directory, these files will be deleted (for example, Queue.H and Queue.C)

Submit Tools

There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/. One of the tools provided by the submit program is submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj4

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o files), but leaves the executable in place.

submitrun <class> <project> [command-line-args]

Example:  submitrun cs341 Proj4 test.dat 20

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.

Your answers to 341-Fall01-proj4_questions.txt are worth 20% of your project grade.

Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.


Level-Order Tree Traversal

A level-order traversal is one that visits the root first, then the root's children (left to right), then the root's grandchildren (left to right) and so on down the tree.  This traversal is generally implemented with a queue.  You have to decide what to put on the queue that represents the nodes.  Here's the pseudo code

        declare a queue to hold the nodes
        enqueue the root
        while the queue is not empty
            dequeue a node
            do something with the node (i.e."visit" the node)
            enqueue the node's left child (if there is one)
            enqueue the node's right child (if there is one)


Last modified on Tuesday, November 6, 2001 by Dennis Frey