CMSC 341 Spring 2009
Project 2

Assigned Friday, February 20, 2009
Due 11:59pm Friday March 6, 2009
Updates Please update your LinkedList.java with this one. Changes to some descriptions of functions.

Objectives

  1. To gain experience implementing the Linked List ADT from scratch
  2. To gain experience using inheritance with generic classes
  3. To gain experience using Linked Lists in a real application
  4. To gain experience implementing the Adapter Design Pattern

Description

A spell checker is a common application used in conjunction with document preparation. Since many documents focus on one subject, many words in a document are often repeated. Hence, it makes sense that checking the spelling of repeated words should be made as fast as possible. In this project you will implement a rudimentary spell checker that provides for faster access to frequently used words. The spell checker will be implemented as a List of Lists.

We will implement two variations of the a linked list implementation by deriving two new kinds of linked list using inheritance. This will require you to implement your own linked list ADT and create the derived list implementations. One new list is called an "MRU List" (MRULL). MRU stands for "Most Recently Used". Every time an element of an MRU List is "found" it is moved to the front of the list. In this way, frequently accessed data is always close to the front of the list, so subsequent "finds" for that data are quicker. New words are inserted at the front of the MRU List. For our spell-checker this means that repeated words in a document are checked more quickly.

The other new Linked List is the "Sorted Linked List" (SLL). A SLL keeps its data sorted at all times by inserting new elements in the appropriate place in the list.

Project 2 will be invoked with multiple command line arguments. The first argument is the name of the text file which will be checked for spelling errors. The second argument is the name of the dictionary file. The third command line argument is the number of words to print for each letter on the command line. The remaining argument(s) are single lower-case characters [a-z] for which words and recently used words from the dictionary are printed.

Spelling Rules

  1. A "word" in the text file is defined to be a sequence of characters surrounded by whitespace. Any sequence of characters containing anything other than the letters A-Z and a-z should be considered misspelled, but see below concerning punctuation characters. For example, "This.example shows many_things." there are 3 words: "This.example", "shows", and "many_things.".
  2. All after the leading character of any word should be considered as case-insensitive. E.g. if you find the word monday in the text file and that word is not found in the dictionary, then you have a misspelled word since Monday would be in the dictionary.
  3. All punctuation characters found tailing a word should be removed. Punctuation characters are defined as periods, exclamation marks, question marks, semi-colons, and commas found tailing words. Hyphens are the only punctuation that can be removed from the middle of a word (i.e. word-word). Quotation marks and parenthesis must be removed from the beginning and end of words.
    For example, in the sentence "Hi, you're looking nice today?"., you would remove the quotes, comma, and the question mark and search the dictionary for the words Hi, you're, looking, nice and today.

The ADTs

The MRU List ADT

The Most Recently Used (MRU) List is derived from LinkedList. It provides no other functionality. It differs from a LinkedList in just one way -- Elements which are successfully found are moved to the front of the list. The list is unchanged in the event of an unsuccessful find operation. Since an MRU list is inherently NOT sorted, new elements can be inserted anywhere in the MRU list.

The Sorted List

The Sorted List is derived from LinkedList. It provides no other functionality. It differs from a LinkedList in just one way -- the elements are always in ascending sorted order.

The Dictionary ADT

The dictionary contains ascii strings which represent the complete set of correctly spelled words. To facilitate fast look-up, these words are stored in an MRU list for each letter of the alphabet. The Dictionary is essentially a SortedList of MRU word lists.

The Dictionary ADT defines the following operations

  1. Insert( word ) -- adds the specified word to the dictionary.
  2. Find( word ) -- searches the dictionary for the specified word.
  3. PrintMRU( letter, N ) -- prints the N most recently used words in the dictionary that begin with the specified letter. If less than N words start with the specified letter, then all words beginning with that letter are printed.
    The words must be printed in a neat columnar format, 4 words per row. Each column must be separated by a minimum of 3 whitespaces. The width of the columns will depend on the length of the longest word to be printed. Note that this operation is case-INsensitive. The parameter passed to the function may be upper- or lower-case. In either event, words that begin with the same upper- or lower-case letter must be printed.

    For example, PrintMRU('a', 10) and PrintMRU('A', 10) BOTH print words which begin with 'a' OR 'A'.

  4. A default constructor which creates an empty dictionary.

Word List ADT

You will find it necessary (or at least easier) to associate some information with the MRU list for each letter of the alphabet. The WordList ADT encapsulates this information and becomes an adapter for the MRU list. The WordList ADT defines the following operations
  1. appropriate constructor(s)
  2. appropriate comparison methods
  3. an appropriate Print function
  4. Insert( word ) -- inserts the specified word into the WordList
  5. Find( word ) -- finds the specified word in the WordList

Misspelled Word List ADT

Your program must maintain a list of misspelled words along with the number of times each misspelled word appears. To do so, you must implement the MisspelledList ADT which defines the following operations
  1. appropriate constructor(s)
  2. Insert( word ) -- inserts the misspelled word into the MisspelledList. If the word is already in the list, its count is incremented.
  3. Print( ) -- prints the Misspelled list in alphabetical order.

CountedWord ADT

Since the printed list of misspelled words also includes the number of times each misspelled word appears, good OO design dictates that these data items be stored together in a simple object which we are calling a CountedWord. (You're free to choose a better name if you can think of one).

The CountedWord ADT defines the following operations

  1. appropriates constructor(s)
  2. Increment( ) -- increments the word's counter
  3. appropriate comparison operators
  4. Print( ) that outputs the word and and count in the format "<word> : <count>" (without the quotes).

Basic Project Requirements

  1. ALL LISTS used in this project must be linked lists. This means NO JAVA ARRAYLISTS, LINKED LIST, ARRAYS are permitted.
  2. The MRU list and Sorted Linked List (SLL) must be implemented using inheritance. They are both derived from the linked list we have provided you in your repository. Please note, this class is not complete and needs to be modified.
  3. You create/modify your own linked list, list node, and list iterator. This may be a single or doubly linked list.
  4. All possible exceptions discussed pertaining to a linked list must be caught and dealt with appropriately. Your program should continue processing after an exception if at all possible.
  5. Only the Dictionary and MisspelledList classes are instantiated in main( ) or in functions called from main( ).
  6. You may include any private data and private functions to any class.
  7. Be prepared for your code to work on large data sets.
  8. You must bulletproof your code as much as possible, e.g. making sure that the command line parameters are all valid, that the files named in it all exist, and that there is at least one report command. In case of invalid report parameters, your program should print them and a message indicating that the command is in error.
  9. Your program performs the following tasks
    1. Reads the file of words, check each word in the dictionary to make sure that it contains only alphabetic characters, and puts them into the dictionary
      The words "a", "an" and "the" (known as "articles") should be added to the dictionary. They may or may not be present in the dictionary file read by your program.
    2. Reads the text file and checks each word for proper spelling according to the spelling rules above. Misspelled words will then be inserted into the misspelled word list.
    3. Prints the list of misspelled words and the number of times each misspelled word appeared. This list is printed in alphabetical order.
      The words must be printed in a neat columnar format, 4 words per row. The format is shown in the sample output. For formatting purposes only, you may assume that no misspelled word will appear more than 99 times.
    4. For each letter found on the command line, prints the first N (taken from the command line) from that letter's MRU list in the dictionary.

Project Notes and Hints

  1. You may not use the java linked list class and will recieve 0 points in this event. However, you may use the material presented in class as a guide. This does not mean you can copy any code from the slides.
  2. Lists are generic in type. You should write adapters to allow them to process strings for this project.
  3. To answer the questions with this project, you may need to implement code that is not required by the output of the project.
  4. You may look at printf method for printing adapting your output to the required output format.
  5. This is not an easy project. Start early

Sample Output

The output below is provided for formatting purposes only. It is not the result of executing any version of project 2. Your output may vary slightly, but must conform to the output format found elsewhere in this description. Your output must be column aligned for each letter. Use the length of the longest word starting with each letter to determine your column width. linux2[105] Proj2 textfile.txt dictionary.dat 15 f k a Spell checking textfile.text using dictionary.dat There were 10 misspelled words ------------------------------ aren't : 1 brokin :10 brothar : 1 cuddel : 3 freind : 2 mooth : 4 times2 : 1 verry : 1 The first 15 words beginning with 'f' ------------------------------------- fireman first front flight fudge fun famous finance frankly future fast fickle The first 15 words that begin with 'k' -------------------------------------- kettle kitten ketchup Karl Kringle kludge kangaroo karate kidnapper kidneys knight knocker The first 15 words that begin with 'a' -------------------------------------- aspen asteroid asthma ankle athlete atlas attack atom attic August Australia author autograph automobile avoid

Project Questions (10 points)

Edit the file 341-spring09-proj2_questions.txt found in your repository for this project. Edit it to answer the questions. Be sure to submit this file.

Files To Be Submitted

You must use CVS to check out your module in the Proj2 repository and to check in your code. That must include an ant build.xml file with targets for creating javadoc (named "doc"), for running your program (named "run" and that allows the command line arguments to be specified with -Dargs, like this ant -Dargs="arg1 arg2 ..." run), and for compiling your code (this must be the default target). The build.xml file that comes with your module in the Proj2 repository has all of this. Do not submit or put under CVS control your javadoc output. We will build it with the "ant doc" command. See the projects page for more information on all of these topics. If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Grading and Academic Integrity

Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.