CMSC-341 Fall 2002

Project 3

Assigned 09 Oct 2002
Due 23 Oct 2002 by 11:59PM
Updates
14 Oct 2002
The constructors in the author's BST code have been modified. The order of the items in the initialization list has been reversed. The original order was causing warnings on some version of the compiler.
09 Oct 2002
The project description says that all ConcordanceEntry methods must be private. That is not correct. You need to be able to create a ConcordanceEntry outside of Concordance code so that you can pass it in to the Concordance constructor. Make the methods described in this document for ConcordanceEntrys public.
09 Oct 2002
You must count blank lines when computing line numbers. For example, if a text file starts with two blank lines and then has the text "Four score and seven years ago", the word "four" occurs on line 3. The text on which the sample output is based had some blank lines that have been removed so that the sample output conforms to this clarification.


Background

This project will give you experience working with Binary Search Trees and tree iterators in the context of implementing a Concordance ADT. The Merriam-Webster online dictionary says that a concordance is "an alphabetical index of the principal words in a book or the works of an author with their immediate contexts". To build a concordance, your program will read the words in data file, store them in a BST, and record for each word the line number(s) on which it occurs. Iterators will be used to walk through and print out ranges of words in the concordance.

Note: A word is defined to be any string of characters delimited by white space or punctuation. For this project we will consider only the following to be punctuation - period, question mark, exclamation point, comma, semi-colon. For example, if the data file contains "dog.cat", the concordance should contain "dog" and "cat", but not "dog.cat". Also, convert all words to either upper or lower case before adding them to the concordance.

Because C++ allows us to define ADTs, via templates, that can hold any type of object, we can implement a more general kind of concordance that is an ordered index of objects along with information about where they occur in a dataset. It is this more general definition of a concordance that you will implement, though you will instantiate it to build a concordance of the kind described in the dictionary entry above.


Description

Your concordance implementation will require you to write four new classes - ConcordanceEntry, Concordance, BinarySearchTreeItr, and ConcordanceItr. The Concordance will store ConcordanceEntrys in a BST so that they can be accessed quickly in sorted order, and each ConcordanceEntry will contain a linked list of the locations (lines numbers) of the corresponding object (word) in the dataset (text file). Concordance iterators will be used to walk through concordance entries, and they will make use of BST iterators.

In this project, as with the previous projects, you will read commands from a file. There will be commands for loading a text file and building a concordance and for printing the contents of the concordance in various ways.

Here are your tasks:

  1. Copy the linked list files (LinkedList.C, LinkedList.H, and dsexceptions.H) from Dr. Oates' public directory
    /afs/umbc.edu/users/o/a/oates/pub/CMSC341
    to your directory.

  2. Copy the BST files (BinarySearchTree.C and BinarySearchTree.H) from Dr. Oates' public directory
    /afs/umbc.edu/users/o/a/oates/pub/CMSC341
    to your directory. The BST code uses dsexceptions.H, but you will have already copied it with the linked list code.

  3. Implement the ConcordanceEntry class.

  4. Implement the Concordance class.

  5. Implement the BinarySearchTreeItr class using the example implementations of tree iterators given in the lecture notes as a starting point (though it's OK to come up with a completely new implementation if you want to).

  6. Implement the ConcordanceItr class. It will create and manipulate BinarySearchTreeItrs.

  7. Design and implement Proj3.C which contains main( ) and is the driver program for this project.

  8. Implement a makefile for this project.

  9. Answer the questions posed in 341-Fall02-proj3_questions.txt. Copy the file
    /afs/umbc.edu/users/o/a/oates/pub/CMSC341/Proj3/341-Fall02-p3_questions.txt
    to your own directory and edit it to provide your answers to the questions. Don't forget to submit the edited file; it is 10% of this project's grade.

Definition of the ADT

All classes that you design and implement must support the "Big-4", even if the default behavior provided by the compiler would be acceptable.

As always, you are free to add whatever private data and methods you desire, and you must provide the methods listed in this project description. If you need to add any additional public methods, think very carefully about it and provide ample justification for why the method must be public in comments in your code. Points will be deducted for public methods that are not well justified.

The ConcordanceEntry

Each concordance entry will contain an object and a list of the places that it occurs. Because this project will require you to build concordances for text documents in which you maintain a list of the lines on which each word occurs, your code will declare concordance entries as follows: Note that this implies you must define the Concordance and ConcordanceEntry classes to accept two distinct template parameters - the type of the object and the type of the information stored about occurrences of the object.

You must store occurrence information in a singly linked list which is a private data member in the ConcordanceEntry class. For this project, the list will contain the line numbers on which the word occurs in the data file.

You will need to implement at least the following methods:

The Concordance

The Concordance class stores ConcordanceEntrys in a BST so that they can be accessed quickly in sorted order. You must implement the following methods: Your concordance must be able to supply three different types of iterators. They are as follows:

The BinarySearchTreeItr

You must implement a binary search tree iterator that, when advanced, performs an inorder traversal of the tree. Example implementations are in the lecture notes, and you are free to use the code therein. You may also write an implementation from scratch if you want to. Be sure to put the iterator interface and implementation in their own .C and .H files, respectively. That is, do not put them in BinarySearchTree.C and BinarySearchTree.H. You may, however, modify these files as needed to accommodate the iterator, e.g. by adding friend declarations or methods for getting iterators from a BST.

The BST iterator must support at least the following methods: isPastEnd, advance, and retrieve. Their behavior should be analogous to the behavior of the corresponding ListItr methods.

You must add a first method to the BST class that returns an iterator pointing to the first element in the BST in an inorder traversal.

The ConcordanceItr

The ConcordanceItr class will create and manipulate BST iterators to walk through the elements of a concordance. You must implement at least the following methods: isPastEnd, advance, and retrieve. Their behavior should be analogous to the behavior of the corresponding BinarySearchTreeItr methods.

Note that you will need additional private data members to support iteration over a range of values (as described above for the Concordance class). All iterator functionality should be implemented within this one class. That is, there should not, for example, be distinct classes or advance methods to handle range iterators.


The Command Line

Project 3 will be invoked with a command line that consists of a single argument -- the name of the command file to read. As usual, you should check the validity of all command line arguments.

The Command File

Commands in the file specify operations to be performed with concordances. Each line in the file represents one command. Blank lines may appear anywhere in the file and should be ignored. Otherwise, you can assume that any line containing a command is syntactically well-formed. We make this assumption so you don't have to spend lots of time making the command file parser bullet proof.

The command file format follows:


Sample Output

Sample output is available for your study at

/afs/umbc.edu/users/o/a/oates/pub/CMSC341/Proj1/341-Fall02-p3-sample_output.txt
The datafile loaded in the sample output contains the following text:

Tomorrow, and tomorrow, and tomorrow
Creeps in this petty pace from day to day
To the last syllable of recorded time


Project Submission

DO submit these files
Because you are submitting your own makefile, you are free to name your files whatever you like, with the following restrictions:
DO NOT submit these files
Do not submit any of the linked list or BST code UNLESS YOU MAKE MODIFICATIONS to it. If you use the code in its original form, include it from Dr. Oates' directory via your makefile. If you make changes to the linked list or BST code, submit your changes and have your makefile compile with your versions.
Submit Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/. One of the tools provided by the submit program is submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj2

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o files), but leaves the executable in place.

submitrun <class> <project> [command-line args]

Example:  submitrun cs341 Proj2 test.dat

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.

Your answers to 341-Fall02-proj2_questions.txt are worth 10% of your project grade.

Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.