CMSC 341 Spring 2005
Project 2

Assigned

Wednesday, March 2, 2005

Due

11:59 pm on Friday March 18, 2005
NOTE DATE CHANGE!

Update: March 7th

Addenda and Clarifications for Project 2 have been posted.

Update: March 10th

A new version of P2DataGen.o is available. It will exhibit a higher degree of recency than the original. The original is still available, for those wishing to experiment with it. Its name is OldP2DataGen.o

Update: March 14th

Sample Output for Project 2 is available.

 


Objectives

The List ADT is a fundamental building block for other ADTs such as the stack and queue.  This project will give you experience working with lists in a computer experiment. In this experiment, you are asked to compare the time performance of the standard list (as defined in the class) and lists known as most-recently-accessed-first (MRAF) with different data sets of varying size that are randomly generated to simulate different application scenarios.


Computer experiments like this (i.e., simulating problems using randomly generated instances to minimize potential bias) are commonly used in computer science and related fields to validate theoretical results, test hypotheses, or observe trends and properties. As a simple example, this project will give you the basic ideas of such computer experiments.


Description

The list ADT is a container with the property that data may be inserted at any position. This flexibility leaves the designer of an application free to insert objects anywhere, depending on the application's needs. Consider an application that constructs a list from an input stream with duplicate items. The following algorithm can be used to hold input items:

for each input element x, if x is in the list, do nothing,

otherwise, insert x at the end of the list.

 

This strategy is called IAE, "insert at end" of the list. We think this strategy may be inefficient when input data exhibits recency. Here recency refers to a phenomenon where items accessed recently are more likely to be accessed soon;  the probability of accessing the same element decreases, the longer it has been since the last occurrence of that element in the input stream. Some phenomena that display this property are the URL's visited in a web session and captured in a browser's cache, "send-to" email addresses, and conversation topics in threaded discussions (as found in Net News, for instance). Two well-known mechanisms to support efficient access of data exhibiting this property are caches in computers and human short-term memory. In this project, we are going to invent a new one, the Most-Recent-Aceesed-First (MRAF). The placement strategy for the MRAF list is:

 

for each input element x, if x is in the list, move it to the beginning of the list, otherwise, insert x at the beginning of the list.

 

In this project, you will extend and modify the text's template list class to support these two strategies, IAE and MRAF, for list construction, and compare their time performance experimentally with several data sets with different degrees of  recency.


Input Data

Most of the time, we expect to obtain input from an input stream. For this exercise, we would like to reuse the same input data for each of our experiments. We will accomplish this by using a library of functions designed for this experiment. Instead of declaring an input stream variable, you will need to use the following functions to obtain test data.

Experiment data are provided by two functions, whose prototypes are:

            int getRecencyDatum(); and

            int getRandomDatum();

 

            getRecencyDatum returns a number from data that exhibits recency.

            getRandomDatm returns a number where recency is not present.

 

The interfaces for these two functions (and a number of other functions) is in the header file P2DataGen.h, located in /afs/umbc.edu/users/e/d/edelman/pub/CMSC341/Project2. There is also a file named P2DataGen.o, which you will also need to include.

The header file contains detailed instructions for initializing the functions you need.


The Experiment:

Define an access to be the set of operations performed to process one input element x. The cost of an access is the number of list comparisons required to locate the item in the list.  We expect there to be a performance difference between MRAF and IAE, as measured by the cost of accesses. To examine the cost of access as a function of the recency of the data, you will collect and display experimental data for a series of test runs.

 

 

The main section:

1)      read a predetermined number of items (the number of items is determined from the command line parameter N, the number of  input items to be processed) and insert each one into both an MRAF List and also an IAE List.

2)      after  each K accesses (K is also a command line parameter), output the experimental statistics collected so far in the experiment. For example, if N were 500 and K were 50, your code should show intermediate statistics 10 times.

3)      count the following: the number of  items obtained from the RecencyDatum source, the number of items obtained from the RandomDatum source, the number of unique items (this is an indirect measure of the degree of recency), the total number of compares for finds.

 

You will run the main experiment 5 times.

            The first run will have NO recency. all data will be drawn from the random data source (using function GetRandomDatum() to obtain all data for the experiment).

The second run will have LOW recency. This means that 25% of the input data will be drawn at random from the high recency data (using function GetRecencyDatum() for data exhibiting recency, and GetRandomData() for the rest).

            The third run will have MEDIUM recency. 50% of the input will be drawn at random from the high recency data

            The fourth  run will have HIGH recency. 75% of the input will be drawn at random from the high recency data

            the fifth run will have VERY HIGH recency – all of the data will be drawn from the high recency data.

Between each experiment, you will need to issue a call to the function ResetData(), which (essentially) "rewinds" the input"streams". This gives us the same data for each of our experiments. More information on ResetData is included in the header file P2DataGen.h

Note: "drawing at random" means that you will need to generate a random number drawn from the half-open interval [0, 1).

 


The Command Line

Project 2 will be invoked with a command line that contains these parameters

N – the total number of items to be input (an integer)

K – the reporting count (an integer)


Sample Output (not from a run)

Your program output for each should look something like this:

 

After 500 input items: LOW recency:

            number of items drawn from recency  125

            number of items drawn from random:  375

            number of unique items drawn from recency:  83

            munber of unique items drawn from random:  306

                                                MRAF                         IAE

            Total Accesses             401                              1944

            Average Number          8.325                          12.968

                of Accesses

 

Your code should produce these blocks after every K input items and also after processing all N inputs.


Your Tasks

1)      Study the behavior of the C++ random number generating functions. You may do this by examining the man pages for the functions srand(), rand(), and drand48().

2)      Obtain the header (.h) and implementation (.cpp) files for List, ListNode, ListItr,  and dsexceptions.h from /afs/umbc.edu/users/e/d/edelman/pub/CMSC341/Project2

3)      Modify the List class to produce a templatized MRAF List class

4)      Modify the List class to produce a templatized  IAE List class

5)      Write and test a main function to conduct the experiment and report its results.

6)      Write or modify the makefile you used in Project 1 to build project 2. Note: your makefile must reference the files P2DataGen.h and P2DataGen.o. Do not submit these files and do not copy them to your personal directory

7)      You may use the author's exception class or write your own.

8)      Copy the Project 2 questions to your directory and answer them. Make sure to submit them with your project

 

 


Program Requirements

  1. Your code must follow all project guidelines established for this course. See the project organization and project policies web pages.
  2. You must follow a consistent coding style, as recommended by your professor.
    One example is the CS 202 Coding Standards web page.
  3. Your main program file must be named Proj2.cpp
  4. Your executable must be named Proj2
  5. You may use any of the author's code (LinkedList.h, LinkedList.cpp, iterator, node classes, etc). 
  6. Any C++ class you implement that has a public interface must include the "Big-4".
  7. Your code must handle (try/throw/catch) any exceptions thrown by the author's code or by your own classes.

Files To Be Submitted

Submit any files which you have written or modified -- source code and makefile. Your files MUST be named as listed below.

Submit the files using the procedure given to you for your section of the course.
If your makefile is set up correctly, you should be able to execute the command make submit.

DO NOT submit the author's unmodified LinkedList, ListNode, ListItr, or any files that you didn't write. These files should be referenced by your makefile.



Submission Tools

There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One of the tools provided by the submit program is
submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj2 <parameter list>

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o and ii_files), but leaves the
executable in place.

submitrun <class> <project>

Example:  submitrun cs341 Proj2

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.