CMSC-341 Spring 2003

Project 3

Assigned Mar 9th, 2003
Due Sunday Mar 23, 2003 by 11:59PM
Updates
March 11, 2003, 12:45pm
You may assume the data files contain non-negative integers
March 10, 2003, 10:20am
The initial posting on Saturday had the incorrect prototype for the K-ary tree node constructor. The correct prototype as specified in this document is
KaryTreeNode (const Object& data, int K);
where K specifies the number of children (K) in each node.


Background

In this course, we discuss the asymptotic performance of many data structures and often give proofs or convincing arguments that validate the claims about performance. In this project, you will conduct an experiment to gather empirical evidence that the actual performance of Binary Search Trees (BSTs) and K-ary Trees (KTs) matches the theoretical performance.

Description

In this project, you will read random integers from a file and insert them into a modfied version of the author's BST. After every M insertions, you will display information about the tree, as described below and seen in the sample output.

You will then insert those same values into a K-ary tree with K children per node and print the same information about the tree.

The project questions will ask you to analyze the data.


A summary of things to do

  1. Copy the author's BST code (BinarySearchTree.H/C) from Mr. Frey's public directory to your directory.
  2. Modify the author's BST code in the following ways
    1. Modify insert() to allow duplicates. Insert duplicates on the left.
    2. Add a new public method: int CountLeaves ( void ) const; that returns the number of leaves in the tree.
    3. Add a new public method: int CountNodes ( void ) const; that returns the number of nodes in the tree
    4. Add a new public method: int Height ( void ) const; that returns the height of the tree.
  3. Implement a K-ary tree node class template.
    1. See class lecture notes for ideas on this class.
    2. The only method is the private constructor KaryTreeNode (const Object& data, int K); where K specifies the number of children (K) in each node.
    3. The K-ary tree class is a friend of the K-ary tree node class.
  4. Implement a minimal K-ary tree class template. (see class lecture notes for the algorithm for find( ) which will give you an idea about how to implement the methods below.

    You must implement the following public methods for the K-ary tree

    1. KaryTree (int K); -- a constructor that specifies the number of children (K) in each node of the K-ary tree.
    2. ~KaryTree( void ) -- the destructor; do this last
    3. void Insert (const Object& data) -- that inserts the data into the tree. Attach the new node to a randomly chosen child. You may assume that insertion never fails.
    4. int CountLeaves ( void ) const; that returns the number of leaves in the tree.
    5. int CountNodes ( void ) const; that returns the number of nodes in the tree
    6. int Height ( void ) const; that returns the height of the tree.
  5. Implement the application in Proj3.C
  6. Create a makefile for the project
  7. Copy the question file from Mr. Frey's public directory and edit it with your answers.
    The questions for this project are worth 30% of you grade.
  8. Submit your project
  9. Use submitmake and submitrun to verify your submittal

The Command Line

Project 3 will be invoked with three (3) command line arguments in this order
  1. The file of integers to read and insert
  2. K - the number of children in each K-ary tree node
  3. M - the reporting interval; display one line of data after every M insertions


The Data File

The data file will contain random non-negative integers separated by whitespace. Data files are available from Mr. Frey's public directory.
File names are of the form N.dat, where N is an integer indicating the number of values in the file. E.g 2000.dat contains 2000 integers.

Sample Output

After every M insertions (see
The Command Line above), your program will display one line of data. Each line of data will consist of the following information, in the order specified below. All data must be neatly formatted in columns as shown below. Decimal output (the last 4 columns) must be printed with 4 decimal places.
  1. The number of nodes (N) in the tree
  2. The number of leaves (L) in the tree
  3. The height (H) of the tree
  4. The number of leaves / log (number of nodes) -- L / lgN
    For BSTs, use log base 2; for K-ary trees, use log base K
  5. The number of leaves / number of nodes -- L / N
  6. The height / log (number of node) -- H / lgN
    For BSTs, use log base 2; for K-ary trees, use log base K
  7. The height / number of nodes -- H / N
The following output was produced by inserting the file 100.dat into a BST and 3-ary tree with reporting after every 10 insertions. Use it as an example of formatting. Many more values are required before patterns can bee seen in the data.

linux3[82]% Proj3 100.dat 3 10 BST Data from file 100.dat Nodes Leaves Height L / lgN L / N H / lgN H / N ----- ------ ------ ------- ------- ------- ------ 10 4 4 1.2041 0.4000 1.2041 0.4000 20 6 5 1.3883 0.3000 1.1569 0.2500 30 11 7 2.2417 0.3667 1.4266 0.2333 40 16 8 3.0064 0.4000 1.5032 0.2000 50 16 9 2.8349 0.3200 1.5947 0.1800 60 21 9 3.5552 0.3500 1.5236 0.1500 70 25 9 4.0788 0.3571 1.4684 0.1286 80 28 9 4.4290 0.3500 1.4236 0.1125 90 33 10 5.0833 0.3667 1.5404 0.1111 100 34 11 5.1175 0.3400 1.6557 0.1100 K-ary Data from file 100.dat with K = 3 Nodes Leaves Height L / lgN L / N H / lgN H / N ----- ------ ------ ------- ------- ------- ------ 10 5 3 2.3856 0.5000 1.4314 0.3000 20 10 4 3.6673 0.5000 1.4669 0.2000 30 16 4 5.1681 0.5333 1.2920 0.1333 40 21 5 6.2542 0.5250 1.4891 0.1250 50 26 5 7.3016 0.5200 1.4041 0.1000 60 29 5 7.7814 0.4833 1.3416 0.0833 70 34 5 8.7920 0.4857 1.2929 0.0714 80 38 5 9.5269 0.4750 1.2535 0.0625 90 43 5 10.4983 0.4778 1.2207 0.0556 100 48 6 11.4509 0.4800 1.4314 0.0600

Hints, Notes and Suggestions

These hints, notes and suggestions are simply things we thought of as we designed and implemented the project. Depending on your approach to the project, not all of these may apply to you.
  1. Mr. Frey's public directory for this project is /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj3
  2. Input files and output files are located in Mr. Frey's public directory.
    Input files are named N.dat where N is the number of integers in the file. E.g. 1000.dat
    Output files are named N.K.M.dat where N is the number of integers inserted, K is number of children in each K-ary node and M is the reporting interval. E.g. 1000.5.50.dat
  3. In some cases, smaller reporting intervals (which means more data) will help you see the pattern in the K-ary tree data better.
  4. In some cases, running your program on the same input file with different values of K will help you see the pattern in the K-ary tree data better.
  5. Use the random number generator (RNG) random( ) when randomly choosing one of the K children for K-ary tree insertion. DO NOT seed the RNG using srandom().
    NOTE:If you are not developing on linux.gl.umbc.edu, your K-ary data will almost certainly NOT match the data provided since you will undoubtedly be using a different RNG. The patterns in the data should be the same nonetheless.
  6. For initial testing, consider writing a Print() method for the trees. On each line of output, display a node's value and the values stored in each of its children. If you insert known values (rather than random ones), you can verify that CountLeaves(), CountNodes() and Height() are working properly. The Print() method performs a level-order traversal which requires using a Queue to hold tree node pointers. Here's a print from a BST 50: 40, 60 40: null, null, 60: null, 70 70: null, 80 80: null, null, and one from a K-ary tree with K=3 50: null, 40, null, 40: 60, 80, 70, 60: null, null, null, 80: null, null, null, 70: null, null, null,
  7. One way to read a file from beginning to end twice is to close() the file and then open() it again. It is necessary to clear() the file between the close() and the open().
  8. The C++ math library (#include <cmath>) provides the function log10(double x) that returns the log of X, base 10. To compute the log of X, base K, note that log X base K = log X base 10 / log K base 10.
  9. You can redirect your program's output to a file using Unix file redirection,
    E.g Proj3 100.dat 5 10 > 100.5.10.out redirects Proj3's output to the file named 100.5.10.out
  10. Think recursively.
  11. Your program should be able to handle files of 200000 integers. The processing takes a couple of minutes.

Project Submission

Because you are submitting your own makefile, you are free to name your files whatever you like, with the following restrictions:

Submit Tools

There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/. One of the tools provided by the submit program is submitls. It lists the names of the files you have submitted.

Additionally, there are two programs which are not part of the UCS submit program. They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

To access these programs, do one of the following

  1. Create an alias in your .cshrc file (the preferred method)
    alias submitmake /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/submitmake
    alias submitrun /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/submitrun
  2. Type the full path name everytime you run them
    /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/submitmake cs341 Proj3
    /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/submitrun cs341 Proj3 txfile
  3. Copy them to your directory
    You will have to give yourself permission to execute these programs after you have copied them. This is accomplished using the chmod command

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj3

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o files), but leaves the executable in place.

submitrun <class> <project> [command-line args]

Example:  submitrun cs341 Proj3 test.dat 10 2

This runs the project, assuming there is an executable (i.e. submitmake was run successfully) and that test.dat is in your local (not submittal) directory.


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.

Your answers to 341-Spring03-proj3_questions.txt are worth 30% of your project grade.

Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after 11:59pm of the due date will not be accepted. Do not submit any files after that time.
 


Last modified on Saturday March 8, 2003 by Dennis Frey
email: frey@cs.umbc.edu