CMSC-341 Fall 2002

Project 4

Assigned 06 Nov 2002
Due 20 Nov 2002 by 11:59PM
Updates 11 Nov 2002 - The function prototypes for find(), increaseKey(), and decreaseKey() had the '&' in the wrong place. They have been fixed.
09 Nov 2002 - The broken questions and sample output links now work.
07 Nov 2002 - The description says there are CPUTimer methods named last() and total(). They are actually named getLast() and getTotal().
07 Nov 2002 - Some students are getting errors about arguments being re-ordered in the BinaryHeap constructor. Take a new copy of BinaryHeap.C and the error will go away.


Background

Priority queues are an important data structure, with applications in areas such as job scheduling for printers and CPUs. Array-based binary heaps are one very efficient implementation of priority queues, with O(1) time findMin() and O(lg n) time insert() and deleteMin(). Because the branching factor (the number of children a node can have) in a binary heap is 2, the base of the logarithm in O(lg n) is 2 as well. In this project you will modify the author's binary heap code to implement a d-ary heap, where d is the branching factor of the heap.

Why might it be a good idea to use d-ary heaps rather than binary heaps? The height of a complete binary tree containing 10,000 items is 13, meaning that if the items are stored in a binary heap, insertions and deletions might take up to 13 swaps. If the same items are stored in a heap with branching factor 16, no more than 3 swaps will be required. The disadvantage of having 16 children for one node is that, for example, when percolating down you have to find the smallest of the children, and the cost of this operation grows linearly with the number of children.

In addition to implementing d-ary heaps, you will measure the CPU time required to perform sequences of various operations in an attempt to get an empirical view into how the cost of these operations is affected by the branching factor of the heap.

NOTE: The questions for this project require you to run your program on a particular data file using heaps with different branching factors. Please take this into account and allocate enough time to generate and analyze the results.


Description

Here are your tasks:

  1. Read and understand the description of the BinaryHeap ADT in the text.
  2. Copy the binary heap files (BinaryHeap.C, BinaryHeap.H, and dsexceptions.H) from Dr. Oates' public directory
    /afs/umbc.edu/users/o/a/oates/pub/CMSC341
    to your directory.
  3. Read through CPUTimer.C and CPUTimer.H Dr. Oates' public directory
    /afs/umbc.edu/users/o/a/oates/pub/CMSC341
    . Your makefile should reference these files rather than a local copy in your directory.
  4. Using the BinaryHeap code as a starting point, implement a d-aryHeap class.
  5. Read and understand the description of the STL version of the vector ADT at the following URL:

    http://www.msoe.edu/eecs/cese/resources/stl/vector.htm

  6. Modify your d-ary heap code so that it dynamically adjusts its size as new items are added and never throws an Overflow exception.
  7. Design and implement Proj4.C which contains main( ) and is the driver program for this project.
  8. Implement a makefile for this project.
  9. Answer the questions posed in 341-Fall02-proj4_questions.txt. Copy the file
    /afs/umbc.edu/users/o/a/oates/pub/CMSC341/Proj4/341-Fall02-p4_questions.txt
    to your own directory and edit it to provide your answers to the questions. Don't forget to submit the edited file; it is 10% of this project's grade.

Definition of the ADT

All classes that you design and implement must support the "Big-4", even if the default behavior provided by the compiler would be acceptable.

As always, you are free to add whatever private data and methods you desire, and you must provide the public methods listed in this project description. If you need to add any additional public methods, think very carefully about it and provide ample justification for why the method must be public in comments in your code. Points will be deducted for public methods that are not well justified.

The d-aryHeap

Rather than having at most two children, nodes in a d-ary heap can have at most d children, where d is a parameter to the constructor. Therefore, you will need to implement a d-ary heap constructor whose prototype is given below: Clearly, each heap will need a private data member to record the value of d passed into the constructor. You will need to modify the author's binary heap code to allow nodes to have more than 2 children. For a d-ary heap, the children of the node at array index I (when the root is at index 1) are at indices:
d(I - 1) + 2
d(I - 1) + 3
...
d(I - 1) + d
d(I - 1) + d + 1
Also, the parent of the node at array index I is at:
floor((I - 2) / d) + 1

You will also need to implement the following methods:

Because we want to explore the asyptotic performance of d-ary heap operations, we will need to build heaps with large numbers of elements. The author's code requires knowledge of the maximum size of the heap when it is created. You must alter the code so that the vector is dynamically enlarged as needed when items are created. The key to this will be to use the push_back() method to add items to the vector when it is full. Be sure to use operator[] to add items when the vector is not full.

The CPUTimer

To measure the impact of d on the performance of d-ary heaps, you will use the CPUTimer class. This class is provided for you and must be used without modification. Objects of type CPUTimer can be used to record the amount of CPU time used by your program. The public interface is as follows. You must declare and start a CPUTimer before performing any d-aryHeap operations. For example:
void main(int argc, char **argv) {
  CPUTimer ct;
  ct.start();
  
  ...

  ct.stop();
  exit(0);
}

The Command Line

Project 4 will be invoked with a command line that consists of two arguments - the arity of the heap (which must be an integer greater than or equal to 2) and the name of the data file to read. As usual, you should check the validity of all command line arguments.

After validating the command line, you should create a single d-aryHeap with the specified arity, and then perform all of the operations given in the data file on that heap.


The Data File

IMPORTANT: DO NOT ECHO COMMANDS READ FROM THE DATA FILE. We will create test files with large numbers of commands (i.e. tens of thousands), and if you echo these commands the output of the grading scripts will be much too large. That's also the reason that command names (see below) are 1 or 2 characters long. We don't want to waste space in these files with long command names. Commands in the file specify operations to be performed on the d-ary heap. Each line in the file represents one command. Blank lines may appear anywhere in the file and should be ignored. Otherwise, you can assume that any line containing a command is syntactically well-formed. We make this assumption so you don't have to spend lots of time making the command file parser bullet proof.

The command file format follows:


Sample Output

Sample output is available for your study at

/afs/umbc.edu/users/o/a/oates/pub/CMSC341/Proj4/341-Fall02-p4-sample_output.txt

Project Submission

Submit all files required to build your project.
Submit Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/. One of the tools provided by the submit program is submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj4

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o files), but leaves the executable in place.

submitrun <class> <project> [command-line args]

Example:  submitrun cs341 Proj4 test.dat

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.

Your answers to 341-Fall02-proj4_questions.txt are worth 10% of your project grade.

Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.