CMSC 341 Spring 2008
Project 3

Assigned

Monday, March 24, 2008

Due

11:59 pm on Friday, April 11, 2008

Note:

April 2, 2008

 

 

 

The full path to a file “words.txt” found in the section of command line and input file is

meant to illustrate how the command line looks like. This file does NOT exist.

 

You can find a small sample input file here. You can assume the actual test files will follow

the same format (Note: a word is separated from its occurrence count by one or more spaces

and/or tabs).

Objectives

In this project you will

  1. Get some experience working with binary search trees;
  2. Empirically observe the effectiveness of splay tree in maintaining good amortized time performance;
  3. Exercise in random sampling, a technique widely used in computer simulations in various applications.
  4. Have a taste of building GUI using java. This will earn you extra credits!

Description

You are given a set S of English words and the number of their occurrences found in a text corpus. You are asked to build a binary search tree (BST) and a splay tree by inserting words randomly selected from S according to their frequencies (e.g., more frequently occurring words shall have more chances to be selected). How exactly the words are selected is described in the section Random sampling below. The cycle of “sampling and inserting?continues for N times where N is a pre-specified number. In the process, you shall collect statistics and periodically output the performance measures.

 

Tree construction: use the insert function in the author’s code for BST and Splay tree, respectively. The author’s code can be found at afs/umbc.edu/users/y/p/ypeng/pub/CMSC341/Proj3.

 

Output:

number of total insertions and number of failed insertions;

number of comparisons per attempted insertion for these 2000 insertions

number of comparisons per attempted insertion for all insertions

height of the tree

for splay trees,  the total number of rotations of any kind

 
           Insertions   Failed       Comparisons     Comparisons     Height   Rotations
                        Insertions   Per Insertion   Per Insertion
                                     (last 2000)     (cumulative)
           ===============================================================================
     BST     2000       1791         7.690500        7.690500        17         NA
     Splay   2000       1791         5.916000        5.916000        14       4033
           ===============================================================================
     BST     4000       3777         8.095000        7.892750        17         NA
     Splay   4000       3777         6.100000        6.008000        18       8085
           ===============================================================================
     ...
 
     BST in level order:
        Level 0: year (958)
        Level 1: urgent (21)
        Level 2: said (1961)   wall (160)
        Level 3: friend (133)  several (377)   visitors (36)   wearing (47)
        Level 4: asked (398)   one (3297)   see (772)   time (1601)   vivid (25)
     Splay tree in level order
        ...

 

Random sampling: You will read the set of words (wi) and their occurrences (oi) from an input text file, for example:

 
     ability  74
     able     216
     about    1815
     above    296
     academic 56
     ...

 

The frequencies of these words are then calculated by

           fi = oi /sumk (ok).

The set { fi} (see the third row in Table 1 below) defines the probability distribution of all words in S = {wi}. Each time we want a word to insert, a particular word wi should have the chance of fi be selected. This process, called random sampling, can be realized in a deterministic procedure with the help of a random number generator and a quantity Fi defined as

          F0 = 0;   F1 = f1;   Fi = sumk<i (fk) for i > 1.

Fi can be viewed as the cumulative probabilities of all words with index less than i (see the forth row in Table 1 below). Note that 1) Fi is monotonically increasing and 2) fi = Fi ?Fi-1.

            Table 1: words and their frequencies

i

1

2

3

4

5

wi

ability

able

about

above

academic

oi

74

216

1815

296

56

fi

0.030118

0.087912

0.738706

0.120472

0.022792

Fi

0.030118

0.11803

0.856736

0.977208

1.0

 

Finally, the word selection/sampling procedure goes as follows:

  1. generate a random number r (r is a real number between 0 and 1 and should be generated from a uniform distribution)
  2. return wi with Fi-1 < r <= Fi.

 

The following is a sequence of random numbers and the corresponding words selected from Table 1, together with the BST resulted from inserting these words.    

 

Command line and input file:

Project 3 will be invoked with a command line that consists of 3 arguments: 1) integer N, the total number of insertions to be executed; 2) integer Seed, to be used as the seed for random number generation; and 3) the name of an input file of (wi, oi) pairs. For example

        proj3/Proj3 20000 153976 /afs/umbc.edu/users/y/p/ypeng/pub/cs341s08/test/Proj3/words.txt


Project Requirements and Notes

  1. Download the author’s code for BST

BinarySearchTree.java

SplayTree.java

UnderflowException.java

from afs/umbc.edu/users/y/p/ypeng/pub/CMSC341/Proj3.

  1. Modify the author’s code as necessary to carry out the tasks described in the last section.
  2. You are free to use any data structure to store the words and their frequency information. You need to write your own code for random sampling procedure for selecting words for insertion.
  3. Use java.util.random to create the random number generator, which will take the second argument from the command line as the seed. Generate successive random numbers (real numbers between 0 and 1) using the generator’s nextDouble()method.
  4. Put all your classes into a package proj3. Your executable should EXACTLY be proj3/Proj3.
  5. Answer the questions posed in 341-Spring08-proj3_questions.txt. Copy the file
    /afs/umbc.edu/users/y/p/ypeng/pub/CMSC341/Proj3/341-Spring08-proj3_questions.txt to your own directory and edit it to provide your answers to the questions. Don't forget to submit the edited file; it's 10% of this project's grade.
  6. Again you can assume the input file is well formed and that all words are in lower case. You can also assume that N, the first command line argument, is a multiple of 2000.

GUI (20 points of extract credit)

Another way to observe the behavior of different types of trees is to let a user manually build and manipulate the trees. This can be facilitated by a graphic user interface (gui) which accepts the instructions from the user, carries out the instructions, and outputs, in the human understandable form, the results. You can find useful Java tools for building GUI in its AWT and SWING classes. 

 

In this project, you are asked to implement a GUI to interact with BOTH standard BST and splay tree as implemented by the author’s code.  In other words, you do not need to worry about things like words and their frequencies. The following are the specific requirements. For full extra credit, you need to implement all these components and functionality.

 

Layout:

Components:

 

  1. A text field to allow the user to enter elements.
  2. A text display area for printing output.
  3. Four function buttons for tree operations:

 

At the click of a button a user will need to be able to:

 

  1. Add an element.
  2. Remove an element.
  3. Find an element.
  4. Empty the trees.

 

The Display Area:

 

The display area is to provide two services to the user.  The first is to allow the user to see the result of their interactions with the trees.  The second is to function as a history that can be scrolled through to see the changes of the tree structures.  This gives rise to the following functionality.

 

Requirements:

 

Output requirements:

 

The User Input Area:

 

 

Hints:

 

  1. Tutorials for java GUI are at http://java.sun.com/docs/books/tutorial/uiswing/
  2. All java components can be found at: http://java.sun.com/docs/books/tutorial/uiswing/components/index.html
  3. If you are using ssh you will need a ip forwarding ssh client such as xwin32 or cygwin.

 


Files To Be Submitted

Submit the following files:

  1. *.java (including Proj3.java),
  2. 341-Spring08-proj3_questions.txt
  3. README

Submit the files using the procedure below.

Submission Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/. The class name is cs341 and the project name is Proj3. One of the tools provided by the submit program is submitls. It lists the names of the files you have submitted.


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.