- CMSC 341: Project 2

Project 2

Assigned Sept. 25, 2000
Due Oct. 9, 2000


Background

The list is probably the most frequently-used data structure. Understanding the list ADT is essential to almost everything that follows in this course. You will have the opportunity to implement a list in Project 2, and to become familiar with iterators. The application is to develop a concordance.

A concordance is an sorted list of symbols (in our instance, words) and the locations (for us, lines) where they appear in a document (for us, a body of text). Generally, concordances are generated for such things as extensively studied works of literature and religion. For instance, you might use a concordance to find all the places that Shakespeare uses the word 'arrow' or that 'fish' are mentioned in the Bible. The user interface for your digital concordance will include functions such as:


Objectives

This project will deepen your skills in coding with templates. You will
  1. work with the listADT and extend it by creating a data type, the sorted list
  2. work with iterators
  3. practice designing robust classes
In addition, you will continue to practice using the string data class.

What to do

You will implement three classes: the first class is a ConcordanceEntry class, which defines one type of entry in the concordance (you could theoretically make a concordance to index things other than words, but we won't do that now). A ConcordanceEntry consists of a word and a sorted list of line numbers on which the word appears in a body of text. The class ConcordanceEntry needs to be Comparable. The ordering relation should be on data member that holds the word.

The second class is the SortedList. The third class is the Concordance, which is a type of SortedList of entries. For this assignment, the concordance entries will be ConcordanceEntry. You'll start with the list ADT as described in Chapter 3 to implement these classes, and the data structure you'll use is the singly-linked implementation of lists (not an array or vector implementation).

The main function
Your main function should do these tasks:
  1. read the command line to obtain the name of the text file for which it will build the concordance
  2. build the concordance
  3. display the concordance on cout. The display should be in word order.
  4. for each word, on how many lines does it appear?
  5. for the most frequently appearing word, print all lines on which it appears. This is called "word in context." Exclude the following overly common words: the, a, an, and, or.
The SortedList class
The sorted list is first of all, a list. It has the added property that each element in the list compares < its successor. The class interface is identical to the one presented in the book, except for "insert" which should first find the appropriate sorted-order place for the item and then place it there. The header file for this class may be found in the course directory for project 2 (/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2).
The Concordance class
The concordance is a sorted list of entries. Like other sorted lists, it has the property that each element in the list compares < its successor. The header file for this class may be found in the course directory for project 2 (/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2/Concordance.H). You may add methods and data to this class, but may not change or remove what is there . Most methods of the Concordance class are standard to the lists we've seen.
The ConcordanceEntry class
For this assignment, each concordance entry is a ConcordanceEntry and has two data members: a string, which is the word being stored in the concordance, and a list of line numbers. A "word" is a bunch of characters. Words may contain letters, numbers, and the symbols "-" (hyphen) and "'" (apostrophe). All other characters should be considered to be delimiters. Lines containing only delimiters (and no valid characters) should be considered to be empty. Empty lines are counted, but do not add entries to the concordance. You need to decide on some standard way of dealing with capitalization: the word "tomorrow", in the following quote
    Tomorrow, and tomorrow, and tomorrow
        Creeps in this petty pace from day to day
	To the last syllable of recorded time;
     
needs to be included in the concordance only once, even though on occurrence is capitalized and others are not.
Also, the concordance should have only one occurrence of a line number for a word, even if that word appears more than once in a line, as do "tomorrow" and "and".

Design and implement your own ConcordanceEntry class.

Grading and Academic Integrity

Project grading is described in the Project Policy handout. Please remember that you must provide good documentation as described in the handout.

Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.


Files To Be Submitted

You should submit only the files you have written, and a makefile. The files to be submitted are: Please do not submit any of the files provided to you such as LinkedList.h, etc.

Submit the files using the procedure given to you for your section of the course.


Sample Output

Sample output is available for your study at
/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2/sample_output.txt
It was obtained by executing Proj2 on umbc9 with a command-line argument of fox_frag.txt.

A copy of the executable is available for you to try out. It is

/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2/Proj2
Other sample input files can be found in
/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2/*.txt