Project 1, Square Lists

Due: Tuesday, October 1, 11:59pm

Objectives

The objective of this programming assignment is to have you practice implementing a new data structure and also to gain some experience using the a data structure from the Java Collections API.

Introduction

For this project, you will implement a square list data structure. A square list is a linked list of linked lists where each linked list is guaranteed to have O(√n) items:

Fig. 1: A typical square list.

Figure 1 shows a typical square list of 25 integer values. The positions of the items are important. The ordering of the items in this example is: 3, 2, 5, 10, 21, 9, 32, 14, 41, 20, 23, ..., 15, 1, 11.

The idea of a square list is that we want to keep the length of the top-level and of all the inner lists bounded by O(√n). That way we can find the i^th item of the list in O(√n) time. For example, if we want to find the 9th item in the list, we can progress down the top-level list and check the length of the inner lists. We know the 9th item cannot be in the first inner list, since it only has 3 items. It also cannot be in the second inner list, since the first two lists combined only has 6 items. Instead, we can find the 9th item of the square list by looking for the 3rd item of the third inner list, which turns out to be 41.

To accomplish this search in O(√n) time, we must be able to determine the length of each inner list in O(1) time. In the worst case, we have to search through O(√n) items of the top-level list before we find the inner list that holds the i^th item. An additional O(√n) steps will allow us to find the desired item in that inner list, since the length of each inner list is also O(√n).

The main difficulty in maintaining a square list is that as we add items to and remove items from a square list, the length of the inner lists can change. This can happen in two ways. First, obviously, when we add items to or remove items from an inner list, the length of that inner list changes. Secondly, the length of an inner list relative to O(√n) can also change when we add or remove items elsewhere in the square list because doing so changes the value of n. For example, the 5th inner list in Figure 1 has 5 items. This happens to be exactly √25. If we removed all 10 items from the first 3 inner lists, that would leave us with only 15 items in the entire square list. Now the length 5th inner list is bigger than √15 even though the length of the 5th inner list remained at 5.

Square List Maintenance

Our goal is to make sure that the top-level list and all the inner lists have lengths bounded by O(√n). It is too expensive to require that our square list always has √n inner lists, each with √n items. Instead, we maintain the following two conditions:

Condition 1:: Every inner list has ≤ 2 √n items.
Condition 2:: There are no adjacent short inner lists, where short is defined as having ≤ √n/2 items.

Notice that neither condition says anything about the length of the top-level list. Instead, we claim that if Condition 2 holds, then the top-level list cannot have more than 4 √n items. Too see this, suppose the contrary. That is, suppose that the top-level list has more than 4 √n items. (Yes, this is the beginning of a proof by contradiction.) Then, there must be more than 2 √n inner lists that are not short (otherwise, two of the short inner lists would be adjacent). Thus, the total number of items in these non-short lists must exceed 2 √n × √n/2 = n. This is a contradiction because n is the number of items (by definition) and cannot exceed itself. Therefore, the number of inner lists (and thus the length of the top-level list) must be bounded by 4 √n.

These observations allow us to maintain the O(√n) bounds on the lengths of the top-level list and the inner lists using the following procedure:

Consolidate:

Traverse the top-level list.
Whenever an empty inner list is encountered, remove that inner list.
Whenever two adjacent short inner lists are encountered, merge them into a single inner list. (See Figures 2 and 3.)
Whenever an inner list is found to have more than 2 √n items, split them into two lists of equal length. (See Figure 4.)

Some notes on the Consolidate procedure:

Our strategy for this project is to run Consolidate after every operation that adds an item to or removes an item from the square list. (This is a simplification. See Addendum for a longer explanation.)
When two short lists are merged into one, the order of the items in the square list must not change. In Figure 2, the original order of the items in the short lists where 10, 21, 32, 14. In Figure 3, the order of the items in the merged list is the same.
We need the data structure for the inner list to support a merge operation in constant time. A singly linked list with a tail pointer would work.
In Figure 4, the inner list that is too long has an even number of items. If a long list has an odd number of items, then after the split, one list will have one more item than the other. This does not affect the asymptotic running time.
When a long list is split, the order of the items must be preserved. (See Figure 4.)
Without any splits, the total running time for the Consolidate procedure is O(√n), because we can merge short lists in constant time.
The split step can be costly because it takes O(t) time to split an inner list in half, where t is the length of the inner list. We can show using amortized analysis that splits do not happen very often. The proof is not hard but is beyond the scope of this course. The amortized analysis gives an amortized running time of O(√n) for most of the list operations (except indexOf). The amortized analysis shows that any mix of m list operations (not including indexOf) will take a total running time of O(m√n). Thus, the amortized time for each of the m square list operations is O(√n). Although, it is tempting to think of the amortized running time as an "average" running time, this is not accurate because the amortized analysis does not depend on the sequence of operations being "nice" or "average" in any way. Even when an adversary chooses the nastiest sequence of operations which results in the maximum number of splits, the total running time for that sequence of m operations will still be bounded by O(m√n).

Fig. 2: A square list with adjacent short inner lists. Note that 2 < √22/2 ≈ 2.345.

Fig. 3: Adjacent short lists merged.

Fig. 4: A long inner list split into two lists. Note that 6 > 2 √8 ≈ 5.657.

Assignment

Note: Running time is one of the most important considerations in the implementation of a data structure. Programs that produce the desired output but exceed the required running times are considered wrong implementations and will receive substantial deductions during grading.

Your assignment is to implement a square list data structure that hold Java Integer values. The inner lists must be implemented as a singly-linked lists with dummy headers and tail pointers. This implementation should be your own work. In particular, you must not use any data structures from Java Collections API for the inner lists. For the top-level list, you must use the generic LinkedList class from the Java Collections API.

This assignment specifies the interface between the main program and your square list implementation, but you are free to design the data structure for the inner list as you please (subject to performance considerations listed below). Note that design is part of the grading criteria, so you do need to apply good design principles to your inner list data structure.

Your inner list class must have the following methods with the specified running times.

A method called merge() that merges two inner lists in constant time.
A method called size() that reports the length of an inner list in constant time.
A method called split() that divides an inner list into two lists of approximately equal size. (The lengths of the two lists should differ by at most 1.) This method should take O(t) time, where t is the length of the original inner list.

Your SquareList class must have the following methods with the specified signatures and running times.

A default constructor that initializes a SquareList properly. It should run in O(1) time.
A consolidate() method implemented as described above. This method should run in O(√n) time not counting splits. The splits should take time proportional to the length of the inner list that is split.
Two methods to insert at the beginning and at the end of a SquareList with the following signatures:
```
public void addFirst (Integer data) 
public void addLast (Integer data) 
```
These methods should call consolidate() after insertion. They must run in constant time, not counting the time for consolidate().
A method to remove the item at the beginning of a SquareList with the following signature:
```
public Integer removeFirst () 
```
This method must return the Integer value that was at the beginning of the list or null if the list was empty. This method should call consolidate() after removal. The removeFirst() method should run in constant time, not counting the time for consolidate().
A method to add an item at a given position of a SquareList.
```
public void add(int pos, Integer data) 
```
Positions start at 0. Thus, add(0,5) should insert 5 at the beginning of the list. Also, if a square list originally held 1, 2, 3, 4, 5, then after add(2,99) the list should hold 1, 2, 99, 3, 4, 5. If pos is not valid, this method should do nothing. This method should call consolidate() after insertion. The add() method should take time O(√n) not counting the time for consolidate().
A method to remove an item from a given position of a SquareList and return its value.
```
public Integer remove(int pos) 
```
As with add(), positions start at 0. So, if a square list originally held 1, 2, 3, 4, 5, then after remove(3) the list should hold 1, 2, 3, 5. If pos is not valid, this method should return null. This method should call consolidate() after removal. The remove() method should take time O(√n) not counting the time for consolidate().
A method to return the value of an item at a given position of a SquareList.
```
public Integer get(int pos) 
```
As with add(), positions start at 0. So, if a square list originally held 1, 2, 3, 4, 5, then get(2) should return 3. If pos is not valid, this method should return null. The get() method should take time O(√n).
A method to change the value of an item at a given position of a SquareList.
```
public void set(int pos, Integer data) 
```
As with add(), positions start at 0. So, if a square list originally held 1, 2, 3, 4, 5, then after set(2,99) the list should hold 1, 2, 99, 4, 5. The set() method should take time O(√n).
A method that returns the number of items in a SquareList.
```
public int size() 
```
The size() method should run in constant time.
A method that returns the position of the first occurrence of a value in a SquareList.
```
public int indexOf(Integer data) 
```
If data does not appear in the list, then this method should return -1. As with add(), positions start at 0. So, if a square list originally held 1, 2, 3, 4, 5, then indexOf(5) should return 4. The indexOf() method should run in O(n) time.
A debugging method:
```
public void dump() 
```
The dump() method should print out the size of the SquareList and for each inner list, the size of the inner list, each item in the inner list and the item that the tail pointer references. The output should clearly indicate where an inner list begins and ends. The running time of dump() does not matter since it is used for debugging purposes only. However, you should implement dump() in the most reliable manner possible (i.e., avoid calls to methods which might themselves be buggy).

Finally, your square list class must be called SquareList and must be accessible from a main program in a different package. You can check that your code compiles correctly with this sample main program: Proj1Test.java. This test program must be placed in a separate directory named driver (since it belongs to the driver package). The output of Proj1Test.java should look something like this: Proj1Test.txt. The output may differ somewhat depending, for example, on how you split long lists with odd length. Your code must compile with Proj1Test.java without alteration.

Implementation Notes

Here we list some recommendations and point out some traps and pitfalls.

Start this project early. This is a two-week project, not a one-week project plus a one-week delay. You will most likely need all two weeks to finish.
Apply the incremental programming methodology. Implement one or two methods and fully debug these methods before writing more code. Implement and fully debug your inner list data structure before you do anything for the SquareList class.
Your data structure for the inner lists cannot be a doubly linked list.
In a singly-linked list, a dummy header node is an extra node in the linked list that comes before the first node that contains data. When the list is empty, the list has just the header node and no other nodes. Having a header node simplifies coding for singly linked lists. For example, insertion at the beginning of the list does not require a special case.
A tail pointer is not a tail node (which would not be very useful in a singly-linked list). A tail pointer is simply a reference to the last node in the linked list. When the list is empty, the tail pointer should reference the dummy header. Tail pointers make the merge operation fast, since we don't have to search the linked list to find the last node.
Common errors while implementing singly linked lists with dummy headers and tail pointers:
- Off-by-one errors: inserting or removing an item in the position one spot before or after
- Forgetting to update the size of the list
- Forgetting to update the tail pointer when the last item is inserted or removed
You do not have to implement the full suite of list operations for the inner list data structure — just whatever is needed to support the SquareList operations.
Be paranoid! Check that you are not merging a list with itself.
Section 3.3 of the textbook has a good description of the generic LinkedList class and iterators.
In the LinkedList class, the methods getFirst() and getLast() throw NoSuchElementException exceptions if the list is empty. Your code must catch these exceptions. (Otherwise it won't compile with Proj1Test.java.)
The logic for the consolidate() method can be tricky. Think through this before coding.
Do not use two iterators simultaneously on the LinkedList for the top-level list. You will need to use the remove() method when you delete empty inner lists and when you merge short lists. When you use one iterator to invoke the remove() method, it will invalidate the other iterator. When you use the next() method again, it will throw the exception: ConcurrentModificationException.
Do write your own main program for testing. Proj1Test.java is a basic test that checks a few features and that your program will compile correctly. It is not a comprehensive test. Just because your program runs correctly with Proj1Test.java, it does not mean you will receive 100. It does not mean your program does not have bugs. Your program will be tested and graded against other main programs with bigger, meaner and nastier test cases. Just as an example, Proj1Test.java never checks if removeFirst() works correctly when the square list is empty, but you should.
Do document your code.

What to Submit

Follow the course project submission procedures. You should copy over all of your Java source code with .java files in their own directories which are in turn under the src directory. You must also supply an Ant build file.

Make sure that your code is in the ~/cs341proj/proj1/ directory and not in a subdirectory of ~/cs341proj/proj1/. In particular, the following Unix commands should work. (Check this.)

cd ~/cs341proj/proj1
ant compile
ant run
ant clean

Discussion Topics

Here are some topics to think about to help you understand square lists. You can discuss these topics with other students without contradicting the course Academic Conduct Policy.

Suppose you start with an empty square list and keep inserting items in the front of the list. When does the first merge occur?
What is the smallest number of items you can have in a square list that has 11 inner lists?
Do we ever encounter long inner lists that have to be split (other than the first inner list) if we only allowed insertion and removal at the beginning of the list?
After you split an inner list, is it possible that the same inner list has to be split again after the very next square list operation? after two operations? when could the next split occur?
Can you ever encounter 3 short lists in a row during the Consolidate procedure? Does it matter? and should you write code whose correctness depends on the answer to these questions?

Addendum

Here are some ways we could improve the square list data structure. In particular, we can make addFirst(), removeFirst() and addLast() run in O(1) amortized time instead of O(√n) amortized time. Implementing these features would make the running time of set() and get() O(√n) amortized time instead of O(√n) worst case time.

However, do not submit code with these features as they will totally mess up the grading.

Early empty list deletion:: As soon as you notice that an inner list is empty, delete it. Don't wait for consolidate().
Don't make long lists:: In addFirst() and addLast(), start a new inner list if the length of the first or last inner list will exceed √n after insertion.
Early splits:: Whenever we perform an operation on an inner list that involves looking through the list (for example, a search, but not size()), split the list if the length is already greater than 1.5 √n. Since we are looking through the list already, we can do the split now instead of later.
Early merge:: Whenever we remove an item from an inner list, check to see if the inner list is now a short list. If the list is now short and an adjacent inner list is also short, we can merge those two now instead of waiting for consolidate(). This will only take an additional O(1) time.
Delayed Consolidation:: Don't consolidate after every operation that changes the length of the list. Wait for the number of inner lists to exceed 5 √n. It costs us O( √n) time just to look through the top-level list. We shouldn't do this for every addFirst() or addLast(), especially if we don't make long lists. Wait until enough short lists pile up and then merge them in one sweep.
Delayed Splits:: Long lists don't bother us unless we have to search through them. So, during consolidate, just ignore the long lists. The long lists will be split when we work with them if we implement early splits. This actually turns early splits into delayed splits, so we should adjust the threshold for early/delayed splits back to 2 √n.

If we implemented all of these strategies, an amortized analysis will show that addFirst(), removeFirst() and addLast() run in O(1) amortized time.

CMSC 341 — Data Structures — Fall 2013