CMSC 341 Fall 2005
Project 2

Assigned Wednesday, October 5, 2005
Due 11:59pm Wednesday October 19, 2005
Updates Oct 9
Since an MRU list is inherently NOT sorted, new elements can be inserted anywhere in the MRU list. The statement " New words are inserted at the front of the MRU List." (in the Description section) is part of the project requirements so that everyone gets exactly the same output. This may be accomplished by having the user of the MRU list always insert at the front of the list, or by overriding the LinkedList's insert( ) function.

Objectives

  1. To gain experience reading and using code written by another programmer
  2. To gain experience using inheritance with class templates
  3. To gain experience using Linked Lists in a real application
  4. To gain experience implementing the Adapter Design Pattern

Description

A spell checker is a common application used in conjunction with document preparation. Since many documents focus on one subject, many words in a document are often repeated. Hence, it makes sense that checking the spelling of repeated words should be made as fast as possible. In this project you will implement a rudimentary spell checker that provides for faster access to frequently used words. The spell checker will be implemented as a List of Lists.

We will implement two variations of the author's linked list implementation by deriving two new kinds of linked list from the author's list using C++ inheritance. One new list is called an "MRU List" (MRULL). MRU stands for "Most Recently Used". Every time an element of an MRU List is "found" it is moved to the front of the list. In this way, frequently accessed data is always close to the front of the list, so subsequent "finds" for that data are quicker. New words are inserted at the front of the MRU List. For our spell-checker this means that repeated words in a document are checked more quickly.

The other new Linked List is the "Sorted Linked List" (SLL). A SLL keeps its data sorted at all times by inserting new elements in the appropriate place in the list.

Project 2 will be invoked with multiple command line arguments. The first argument is the name of the text file which will be checked for spelling errors. The second argument is the name of the dictionary file. The third command line argument is the number of words to print for each letter on the command line. The remaining argument(s) are single lower-case characters [a-z] for which words and recently used words from the dictionary are printed.

Spelling Rules

  1. A "word" in the text file is defined to be a sequence of characters surrounded by whitespace. Any sequence of characters containing anything other than the letters A-Z and a-z should be considered misspelled, but see below concerning punctuation characters.
  2. The leading character (and only the leading character) of any word should be considered as case-insensitive. E.g. if you find the word Polish in the text file and that word is not found in the dictionary, you should then try to find polish.
  3. All punctuation characters found in the text file should be removed. Punctuation characters are defined as periods, exclamation marks, and question marks found at the end of a sentence as well as quotations, parenthesis, dashes (hyphens), semi-colons, and commas found anywhere in the text.
    For example, in the sentence "Hi, how are you today?"., you would remove the quotes, comma, and the question mark and search the dictionary for the words Hi, how, are, you and today.

The ADTs

The MRU List ADT

The Most Recently Used (MRU) List is derived from LinkedList. It provides no other functionality. It differs from a LinkedList in just one way -- Elements which are successfully found are moved to the front of the list. The list is unchanged in the event of an unsuccessful find operation.

The Sorted List

The Sorted List is derived from LinkedList. It provides no other functionality. It differs from a LinkedList in just one way -- the elements are always in sorted order according to operator<

The Dictionary ADT

The dictionary contains C++ strings which represent the complete set of correctly spelled words. To facilitate fast look-up, these words are stored in an MRU list for each letter of the alphabet. The Dictionary is essentially a SortedList of MRU word lists.

The Dictionary ADT defines the following operations

  1. Insert( word ) -- adds the specified word to the dictionary.
  2. Find( word ) -- searches the dictionary for the specified word.
  3. PrintMRU( stream, letter, N ) -- prints the N most recently used words in the dictionary that begin with the specified letter to the specified stream. If less than N words start with the specified letter, then all words beginning with that letter are printed.
    The words must be printed in a neat columnar format, 5 words per row. The width of the columns will depend on the length of the longest word to be printed. Note that this operation is case-INsensitive. The parameter passed to the function may be upper- or lower-case. In either event, words that begin with the same upper- or lower-case letter must be printed.

    For example, PrintMRU( cout, 'a', 10) and PrintMRU( cout, 'A', 10) BOTH print words which begin with 'a' OR 'A'.

  4. A default constructor which creates an empty dictionary.
  5. A destructor.

Word List ADT

You will find it necessary (or at least easier) to associate some information information with the MRU list for each letter of the alphabet. The WordList ADT encapsulates this information and becomes an adapter for the MRU list. The WordList ADT defines the following operations
  1. appropriate constructor(s)
  2. appropriate comparison operators
  3. an appropriate Print function
  4. Insert( word ) -- inserts the specified word into the WordList
  5. Find( word ) -- finds the specified word in the WordList
  6. a destructor

Misspelled Word List ADT

Your program must maintain a list of misspelled words along with the number of times each misspelled word appears. To do so, you must implement the MisspelledList ADT which defines the following operations
  1. appropriate constructor(s)
  2. Insert( word ) -- inserts the misspelled word into the MisspelledList. If the word is already in the list, its count is incremented.
  3. Print( ostream ) -- prints the Misspelled list in alphabetical order.
  4. a destructor

CountedWord ADT

Since the printed list of misspelled words also includes the number of times each misspelled word appears, good OO design dictates that these data items be stored together in a simple object which we are calling a CountedWord. (You're free to choose a better name if you can think of one).

The CountedWord ADT defines the following operations

  1. appropriates constructor(s)
  2. Increment( ) -- increments the word's counter ( ++ operator is allowed instead )
  3. appropriate comparison operators
  4. Print( ostream ) that outputs the word and and count in the format "<word> : <count>" (without the quotes).
  5. a destructor

Basic Project Requirements

  1. ALL LISTS used in this project must be linked lists. NO VECTORS are permitted.
  2. The MRU list and Sorted Linked List (SLL) must be implemented using inheritance. They are both derived from the author's linked list.
  3. You must use the author's code (linked list, list node, and list iterator) as is -- no changes. This code is found in Mr. Frey's directory. DO NOT submit any of the author's code. Your makefile should reference this code as in project 1.
  4. All exceptions thrown by the author's code or any code that you write must be caught and dealt with appropriately. Your program should continue processing after an exception if at all possible.
  5. No new friends in any class.
  6. Only the Dictionary and MisspelledList classes are instantiated in main( ) or in functions called from main( ).
  7. You may overload the output operator (operator<<) for any class.
  8. You may include any private data and private functions to any class.
  9. NO NEW PUBLIC FUNCTIONS ARE PERMITTED.
  10. Your program performs the following tasks
    1. Reads the file of words and puts them into the dictionary
      The words "a", "an" and "the" (known as "articles") should be added to the dictionary. They may or may not be present in the dictionary file read by your program.
    2. Reads the text file and checks each word for proper spelling according to the spelling rules above.
    3. Prints the list of misspelled words and the number of times each misspelled word appeared. This list is printed in alphabetical order.
      The words must be printed in a neat columnar format, 5 words per row. The format is shown in the sample output. For formatting purposes only, you may assume that no misspelled word will appear more than 99 times.
    4. For each letter found on the command line, prints the first N (taken from the command line) from that letter's MRU list in the dictionary.

Project Notes and Hints

  1. Project related files can be found in Mr. Frey's public directory for this project /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj2
  2. In the member functions of the SortedLinkedList and the MRUList, if you wish to call the List functions first( ) or zeroth( ), you must do so as List<T>::first( ) or List<T>::zeroth( ). Ordinarily this is not necessary in a derived class (because we're calling a base class method), but the compiler gives an error. Must be something unique to inheritance with templates.
  3. This is not an easy project. Start early

Sample Output

The output below is provided for formatting purposes only. It is not the result of executing any version of project 2. Your output may vary slightly, but must conform to the output format found elsewhere in this description. Your output must be column aligned for each letter. Use the length of the longest word starting with each letter to determine your column width. linuxserver2[105] Proj2 textfile.txt dictionary.dat 15 f k a Spell checking textfile.text using dictionary.dat There were 10 misspelled words ------------------------------ aren't : 1 brokin :10 brothar : 1 cuddel : 3 didn't : 3 freind : 2 mooth : 4 times2 : 1 verry : 1 you're : 3 The first 15 words beginning with 'f' ------------------------------------- fireman first front flight Fred fudge fun famous finance floods frankly future fast fickle froth The first 15 words that begin with 'k' -------------------------------------- kettle kitten ketchup Karl kitchen Kringle kludge kangaroo karate keyhole kidnapper kidneys knight knocker knots The first 15 words that begin with 'a' -------------------------------------- aspen asteroid asthma ankle athlete atlas attack atom attic August Australia author autograph automobile avoid

Project Questions

Copy the file 341-Fall05-proj2_questions.txt found in Mr. Frey's public directory for this project to your directory. Edit it to answer the questions. Be sure to submit this file.

Files To Be Submitted

Submit any files which you have written or modified -- source code and makefile. DO NOT submit any unmodified code provided by the author or your instructors. In particular, this means LinkedList.H and LinkedList.C found in Mr. Frey's public directory. As in project 1, your makefile should reference these files.

Be sure to submit the answers to the project questions in plain text format.

Submit the files using the procedure given to you for your section of the course.
If your makefile is set up correctly, you should be able to execute the command make submit.

Submission Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One of the tools provided by the submit program is
submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj2 <parameter list>

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o and ii_files), but leaves the
executable in place.

submitrun <class> <project>

Example:  submitrun cs341 Proj2

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).


Grading and Academic Integrity

Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.