Project 3, Min-Max &amp; Quartile Heaps

Objectives

The objective of this programming assignment is to have you work with heaps and to practice working with generic classes in Java.

Introduction

In this project, you will implement the min-max heap data structure described in Homework 4, Question #3. Four of these min-max heaps can be used to create a quartile heap — a data structure that can be used to collect percentile statistics.

A quartile heap supports insertion in O( log n ) time and can report the minimum, the 25th percentile, the median (or equivalently, the 50th percentile), the 75th percentile and the maximum value in the heap in constant time. Quartile heaps do not support deletion.

You can construct a quartile heap using 4 min-max heaps. The first min-max heap stores all the values in the lowest 25th percentile. The second min-max heap stores the values in the 25th to 50th percentile. The third min-max heap has the 50th to 75th percentile and the fourth min-max heap has the values above the 75th percentile. The intention is for these four heaps to hold the same number of items (off by one, at most). Furthermore, the values in the first heap should be smaller than or equal to the values in the second heap, which are in turn smaller than or equal to the values in the third heap, which are smaller than or equal to the values in the fourth heap.

In this arrangement, the minimum value in the quartile heap is simply the minimum value in the first heap. The 25th percentile value is the maximum value in the first heap. The median is the largest value in the second heap, and so on. Since we are using min-max heaps, these minimum and maximum values can be retrieved from the appropriate min-max heap in constant time.

Assignment

Note: Running time is one of the most important considerations in the implementation of a data structure. Programs that produce the desired output but exceed the required running times are considered wrong implementations and will receive substantial deductions during grading.

Your assignment is to implement a min-max heap data structure and to use these in turn to construct a quartile heap. Your implementation must be generic. Recall, that Java does not like to create arrays of generic type, but heaps are usually implemented using arrays. In Project 2 we circumvented the problem using ArrayList from the Java Collections API. For this project, we will take a different approach.

We will use an interface to specify the types that can be used with the generic min-max heaps.

public interface WorksWithMMHeap<T> { public T newObject() ; public T [] newArray(int n) ; public int compare(T n1, T n2) ; }

Two of the methods, newObject() and newArray(), create objects and arrays of type T. So, we just need to supply the min-max heap methods with an example of an object of type T. Then, the newObject() and newArray() methods can be used to create objects and arrays of type T. To make sure that we always have an object of type T handy, we can require that such an object be given to the constructor.

For example, your min-max heap class should begin with the declaration:

public class MMHeap< T extends WorksWithMMHeap<T> > { private T aT ; ...

We can use the aT member to reference an object of type T. The constructor can start with:

public MMHeap(T exemplar, int initialSize) { aT = exemplar ; ...

After the call to this constructor, which immediately stores a reference to exemplar in aT, the other methods in the MMHeap class can access the methods listed in the WorksWithMMHeap interface by invoking:

aT.newArray(100) ; aT.newObject() ; aT.compare(x,y) ;

Here's an example of a class Int4MMH which implements the WorksWithMMHeap interface: Int4MMH.java.

MMHeap Requirements

Here are the requirements for your implementation of the MMHeap class.

1. It must be in a package called mmheap.

2. You must have a constructor with the following signature:

public MMHeap(T exemplar, int initialSize) ;

This should be the only constructor for the class. The constructor should set aside an array big enough for initialSize items.

3. You must have a size() method that reports the number of items in the min-max heap:

public int size() ;

The size() method should run in constant time.

4. You must have an insert() method:

public void insert(T x) ;

The insert() method should run in O( log n ) time. Insertion of duplicate values is allowed. Duplicate entries should appear multiple times in the MMHeap.

5. You must have a dump() method that prints out the contents of the min-max heap (along with other debugging information):

public void dump() ;

6. You must have two methods, getMin() and getMax(), that return the smallest and the largest values currently in the min-max heap.

public T getMin() ; public T getMax() ;

These two methods should run in constant time. If the heap is empty, these methods should throw a NoSuchElementException exception.

7. You must have two methods, deleteMin() and deleteMax(), that remove and return the smallest and largest items, respectively:

public T deleteMin() ; public T deleteMax() ;

These two methods should run in O( log n ) time.

QHeap Requirements

Here are the requirements for your quartile heap class.

1. It must be in a package called mmheap.

2. The class must be declared as:

public class QHeap <T extends WorksWithMMHeap<T>>

3. You must implement a constructor with the signature:

public QHeap(T exemplar, int n) ;

4. You must implement a method insert() that adds an item to the quartile heap:

public void insert(T x) ;

The insert() method should run in O( log n ) time. Insertion of duplicate values is allowed. Duplicate entries should appear multiple times in the QHeap.

5. You must implement the statistics reporting methods:

public T getMin() ; public T getMax() ; public T get25() ; public T get50() ; public T get75() ;

All of these methods must run in constant time. If there are fewer than 4 items in the quartile heap, then calls to these methods may result in a NoSuchElementException exception.

6. The sizes of the 4 min-max heaps must be within 1 of each other. Furthermore, if there is a size difference, the bigger heaps must be the lower percentile heaps. So, this is OK:

Heap 1 ( 0% - 25%) 15 items
Heap 2 (25% - 50%) 15 items
Heap 3 (50% - 75%) 14 items
Heap 4 (75% - 100%) 14 items

But, this is not:

Heap 1 ( 0% - 25%) 15 items
Heap 2 (25% - 50%) 14 items
Heap 3 (50% - 75%) 15 items
Heap 4 (75% - 100%) 14 items

Note: this does not mean you insert a new item in a lower percentile heap first, since the placement of an item in a heap depends on its value. For example, if you have to insert a value that is the largest value you have seen so far, then that value must go in the fourth heap. Now your fourth heap might be bigger than the third heap. You have to remove the smallest item in the fourth heap and insert that into the third heap. This might in turn make the third heap too big ...

7. You must implement a dump() method that prints out the statistics of the quartile heap and also prints out the contents of the 4 min-max heaps.

Here are two small sample programs that should compile with your program on GL. You will need the interface defined in WorksWithMMHeap.java and the implementation of that interface in Int4MMH.java in order to run these programs.

Note: These are simple tests and do not check all the conditions necessary for deleteMin() and deleteMax(). You must at least also check all the cases from Homework 4.

Here's another test program for you to try out: Proj3CensusTest.java. You will need the class CensusCity.java, too. This test case uses census data from the United States Census Bureau. It also demonstrates that you can have objects of the same class compared in different ways. In this case the boolean data member use2011 determines whether the compare() method will use the 2010 census data or the 2011 population estimates.

Implementation Notes

Here we list some recommendations and point out some traps and pitfalls.

• You need some way to keep track of whether the last level of your min-max heap is an odd level or an even level. Think through this. If you mess this part up, then your min-max heap will be messed up.

• In deleteMin(), when the new item trickles down and hits bottom, there are two reasons that may force it to percolate back up the max heap levels. Make sure that you understand these two conditions. The situation is analogous for deleteMax().

• During a deleteMin(), there are 4 cases where you should stop the trickling down process:

1. You should stop when the key is actually in the right place. (It is smaller than the min items in the left and right subtrees.)
2. You should stop when the node has no children. (You actually hit bottom.)
3. You should stop when the node has 1 or 2 children, but no grandchildren.
4. You should stop when the node has 2 children, but only 1 or 2 grandchildren.

Note that it is impossible for a heap to have no right child, but the left child has children. Also, if you have 3 or 4 grandchildren, the only reason you would have stopped trickling down is that the key is already in the right place (Case 1).

You need to handle the four cases separately, because checking if you need to bounce up the maxheap levels has to be handled differently in each case. Also, Case 4 has subcases depends on whether the smallest item is the right child or one of the grandchildren. (Here's an illustration of Case 4 on a separate page.)

Of course, the situation is analogous for deleteMax().

• There are special cases for deleteMax() when the min-max heap has only one or two items.

• You are probably better off making 4 separate methods that trickle down the min heap levels, trickle down the max heap levels, percolate up the min heap levels and percolate up the max heap levels.

• In QHeap, Java is perfectly happy to make a generic MMHeap object:

MMHeap<T> mmh = new MMHeap<T>(aT, n) ;

but will throw a tantrum if you try to create an array of MMHeap:

MMHeap<T> [] mmhArray = new MMHeap<T>[4] ; // doesn't compile

• Let's just define the 25th percentile to be the maximum item in the first min-max heap, the 50th percentile to be the maximum item in the second min-max heap and the 75th percentile to be the maximum of the third min-max heap. Don't take the average of the maximum from one heap and the minimum from the next heap.

What to Submit

Read the course project submission procedures. Submission closes by script immediately after midnight. Submit well before the 11:59pm deadline, because 12:00:01 might already be late (the script takes only a few seconds to run).

You should copy over all of your Java source code and have your .java files in their own directories which are in turn under the src directory. You must also supply an ANT build file.

Make sure that your code is in the ~/cs341proj/proj3/ directory and not in a subdirectory of ~/cs341proj/proj3/. In particular, the following Unix commands should work. (Check this.)

cd ~/cs341proj/proj3 ant compile ant run ant clean