CMSC 341 Spring 2007

Project 4

Assigned Wed 18 April 2007
Due Tues 01 May 2007 by 11:59PM
Update 27 Apr The original prototype for the Print( ) method was incorrect. Since the ostream parameter had a default value (cout), it must be the 2nd parameter. The prototype has been changed.

21 Apr
The CPUTimer class has been modified to report only USER cpu time and not both USER and SYSTEM cpu time. This seems to give more consistent results.
Bigger files are now available for testing.... p4.1000000.cmds and p4.500000.cmds contain 1,000,000 and 500,000 insertions and a single PT command respectively.
Since the output of the CPU timer varies, we suggest running your program "many" (e.g. ~20) times and using the average CPU time when answering the project questions.


Background

Priority queues are an important data structure, with applications in areas such as job scheduling for printers and CPUs. Array-based binary heaps are one very efficient implementation of priority queues, with O(1) time for findMin( ) and O(lg n) time insert( ) and deleteMin( ). Because the branching factor (the number of children a node can have) in a binary heap is 2, the base of the logarithm in O(lg n) is 2 as well. In this project you will modify the author's binary heap code to implement a k-ary heap, where k is the branching factor of the heap.

Why might it be a good idea to use k-ary heaps rather than binary heaps? The height of a complete binary tree containing 10,000 items is 13, meaning that if the items are stored in a binary heap, insertions and deletions might take up to 13 swaps. If the same items are stored in a heap with branching factor 16, no more than 3 swaps will be required. The disadvantage of having 16 children for one node is that, for example, when percolating down you have to find the smallest of the children, and the cost of this operation grows linearly with the number of children.

In addition to implementing k-ary heaps, you will measure the CPU time required to perform sequences of various operations in an attempt to get an empirical view into how the cost of these operations is affected by the branching factor of the heap.

NOTE: The questions for this project require you to run your program on a particular data file using heaps with different branching factors. Please take this into account and allocate enough time to generate and analyze the results.

Mr. Frey's public directory for this project is /afs/umbc.edu/users/f/r/frey/pub/CMSC341/Proj4


Description

Here are your tasks:

  1. Read and understand the description of the BinaryHeap ADT in the text.
  2. Copy the binary heap file (BinaryHeap.h) from Mr. Frey's public directory to your directory.
  3. Read through CPUTimer.h and CPUTimer.cpp in Mr. Frey's public directory. Your makefile should reference these files. DO NOT submit these files. The grading scripts will remove these files if present.
  4. Using the BinaryHeap code as a starting point, implement a KaryHeap class template.
  5. Design and implement Proj4.cpp which contains main( ) and is the driver program for this project.
  6. Implement a makefile for this project.
  7. Answer the questions posed in 341-spring07-proj4_questions.txt. Copy the file 341-spring07-p4_questions.txt from Mr. Frey's public directory to your own directory and edit it to provide your answers to the questions. Don't forget to submit the edited file; it is 20% of this project's grade.

    Run your project using the command file named p4.commands.huge in Mr. Frey's public directory to gather the data necessary to answer these questions.

    Submit your output from using p4.commands.huge per the directions in the questions file.


Definition of the KaryHeap ADT

Rather than having at most two children, nodes in a k-ary heap can have at most k children, where k is a parameter to the constructor.

Clearly, each heap will need a private data member to record the value of k passed into the constructor. You will need to modify the author's binary heap code to allow nodes to have more than 2 children. Also, code which calculates and uses the index of a node's parent and children will require modification.

For a given "arity" of k, the indices of the children of the node at index I (when the root is at index 1) are

k(I - 1) + 2, k(I - 1) + 3, ...., k(I - 1) + k + 1 The index of the parent of the node at array index I (with I > 1) is floor( (I - 2) / k ) + 1

You will also need to implement the following methods:

As always, you are free to add whatever private data and methods you desire, and you must provide the public methods listed in this project description. If you need to add any additional public methods, think very carefully about it and provide ample justification for why the method must be public in a README file that must be submitted. Points will be deducted for public methods that are not well justified.

The CPUTimer

To measure the impact of k on the performance of k-ary heaps, you will use the CPUTimer class. This class is provided for you and must be used without modification. Objects of type CPUTimer can be used to record the amount of CPU time used by your program. The public interface is defined below and in CPUTimer.h You must use the timer to accumlate the total CPU time used for all KaryHeap operation. That is, Start() the timer before each KaryHeap operation and Stop( ) the timer immediately after the operation completes. For example,
int main(int argc, char **argv) 
{
  KaryHeap<int> kheap( 6, 20 );
  CPUTimer timer;

  timer.Start();
  kheap.insert( 42);
  timer.Stop();

  // .....

  timer.Start( );
  kheap.IncreaseKey( 3, 20 );
  timer.Stop( );

  // .....
  cout << "Total CPU Time = " << CPUTimer.Total( ) << endl;
  exit( 0 );
}

The Command Line

Project 4 will be invoked with a command line that consists of two arguments - k, the "arity" of the heap (which must be an integer greater than or equal to 2) and the name of the data file to read. As usual, you should check the validity of all command line arguments.

After validating the command line, you should create a single KaryHeap with the specified arity, and then perform all of the operations given in the data file on that heap.


The Data File

IMPORTANT: DO NOT ECHO COMMANDS READ FROM THE DATA FILE.

"Small" test files will be used to test the functionality of your KaryHeap. However, you will use a test file with very large numbers of commands (i.e. tens of thousands) to gather data for answering the project questions and if you echo these commands the output of the grading scripts will be much too large. That's also the reason that command names (see below) are 1 or 2 characters long. We don't want to waste space in these files with long command names.

Commands in the file specify operations to be performed on the k-ary heap. Each line in the file represents one command. Blank lines may appear anywhere in the file and should be ignored. Lines which begin with '#' are comments and should also be ignored. Otherwise, you can assume that any line containing a command is syntactically well-formed. We make this assumption so you don't have to spend lots of time making the command file parser bullet proof.

The command file format follows:


Sample Output

Sample output is available for your study in 341-spring07-p4-sample_output.txt in Mr. Frey's public directory.

Project Submission

Submit all files required to build your project except for CPUTimer.h and CPUTimer.cpp which must be referenced by your makefile. These files will be deleted from our submittal directory.
Submit Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/. One of the tools provided by the submit program is submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/f/r/frey/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj4

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o files), but leaves the executable in place.

submitrun <class> <project> [command-line args]

Example:  submitrun cs341 Proj4 test.dat

This runs the project, assuming there is an executable (i.e. submitmake was run successfully) and that test.dat is in your local directory.


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.

Your answers to 341-Spring07-proj4_questions.txt are worth 10% of your project grade.

Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.