CMSC-341 Spring 2003

Project 5

Assigned Wednesday 4/23/03
Due Tuesday 5/6/03 11:59pm
Updates 26 April 2003 6.450pm
As noted in a thread on the discussion board, the initial versions of QuadProbing.C/H in Mr. Frey's public directory generated compiler errors with the new compiler. These errors were correct on Thursday 4/24/03 around 9:00am.
If you copied these files before then, you should recopy them

23 April 2003 2:30pm
A couple of the alternative spellings listed in the initial sample output did not follow the rules for generating alternatives. The sample output has been changed.


Background

Hash tables are best used when insertion, searching and removal must run as quickly as possible. Dictionaries and phone books are just two applications that have very fast accesss requirements.

Description

In this project, you will implement a rudimentary spell-checker using a hash table and use it to spell-check a text file. You will be provided a file with an alphabetized list of correctly spelled words (the "dictionary"), the number of words in the dictionary and a text file to process. Any word in the text file that does not appear in the dictionary is presumed to be misspelled. For grading purposes we will use the file /usr/share/dict/words which contains 45427 words as the dictionary.

The output from your program will be an alphabetized list of misspelled words, and the line number(s) on which they occurred. If a misspelled word appears on the same line twice, list the line number twice. Also for each misspelled word provide a list of potential correct spellings from the dictionary that are obtainable by applying each of the following rules

  1. Adding one character to the misspelled word in the beginning, at the end, or in between any two charcters.
  2. Removing a character
  3. Exchanging two adjacent characters

Classes

To implement this project you will need the following classes

A hashtable class

The author has provided a hash table class which uses quadratic probing. The files for this hash table (QuadProbing.H and QuadProbing.C) are available in Mr. Frey's public directory (/afs/umbc.edu/users/d/e/dennis/pub/CMSC341). Copy these files to your own directory and modify them as you see fit.

A class to encapsulate information about a misspelled word

The design and implemenation of this class is up to you, but good OOD/OOP princples must be followed. Provide necessary constructor(s), a destructor, accessor(s) and mutator(s). Avoid the unnecessary use for friends.

Note: It's not necessary to store all the information about each misspelled word. Some information may be created as it's being reported.

A data structure of your choosing to contain all the misspelled words

The hard part of this project is producing the output because
  1. the misspelled words must be output alphabetically
  2. the line numbers on which a misspelled word occurs must be in nondcreasing order
  3. the same misspelled word may appear on more than one line, but is reported only once
You are free to choose whatever data structure you feel is most appropriate to accomplish this output requirement. You may use any of the author's code, modifying it if necessary.

Since the hash table provides O(1) performance for insertion and searching, the overall performance of your project will be determined by the data structure you choose. As usual, the efficiency of your data structure will be a factor in your project grade.

A summary of things to do

  1. Copy QuadProbing.C and QuadProbing.H from Mr. Frey's public directory
  2. Design and implement a class to encapsulate information about a misspelled word
  3. Choose and implement a data structure to maintain information about all misspelled words.
  4. Write main() in Proj5.C to read the dictionary and the text file, process the text file and report the required information about each misspelled word.
  5. Copy and answer the questions (341-Spring03-p5_questions.txt) from Mr. Frey's public directory
    /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj5
  6. Submit your project
  7. Use submitmake and submitrun to verify your submittal

The Command Line

Project 5 will be invoked with three (3) command line arguments in this order
  1. The name of the dictionary file
  2. The number of words in the dictionary file
  3. The name of the text file to process


The Text File

To make your life a bit easier, the text file will contain only words with lowercase characters and no punctuation. There is no limit on the number of words or number of lines in the file. Some lines may be blank, but should be counted when assigning line numbers.

Sample Output

The misspelled words must be listed in alphabetical order. Line numbers must be in nondecreasing order. If the same word is misspelled twice on the same line, list the line number twice. Alternatives may be listed in any order. This sample output is made up and may well be missing alternatives. It is not the result of processing any file. It is provided to show the required information and an acceptable format. linux3[43]% Proj5 /usr/share/dict/words 45427 bob.txt Text File Name : bob.txt Misspelled Words: 2 Hash Table Size : 12345678 Misspelled Word: aex Line Number(s) : 1 Alternative(s) : axe Misspelled Word: bey Line Number(s) : 1, 10 Alternative(s) : by, obey, bye linux3[44]%

Project Submission

Because you are submitting your own makefile, you are free to name your files whatever you like, with the following restrictions:

What to submit

  1. The author's QuadProbing.H and .C files even if not modified.
  2. All .H and .C files you modified or created
  3. Your makefile
  4. Your copy of the questions with your answers

Submit Tools

There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/. One of the tools provided by the submit program is submitls. It lists the names of the files you have submitted.

Additionally, there are two programs which are not part of the UCS submit program. They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

To access these programs, do one of the following

  1. Create an alias in your .cshrc file (the preferred method)
    alias submitmake /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/submitmake
    alias submitrun /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/submitrun
  2. Type the full path name everytime you run them
    /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/submitmake cs341 Proj5
    /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/submitrun cs341 Proj5 dictionary numWords textfile
  3. Copy them to your directory
    You will have to give yourself permission to execute these programs after you have copied them. This is accomplished using the chmod command

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj5

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o files), but leaves the executable in place.

submitrun <class> <project> [command-line args]

Example:  submitrun cs341 Proj5 dictionary1 10 text1

This runs the project, assuming there is an executable (i.e. submitmake was run successfully) and that test.dat is in your local (not submittal) directory.


Grading and Academic Integrity

Your project will be tested using a variety of command lines, some of which will be invalid.
Your project will also be tested using a variety of command files which will test various conditions which your code should handle.

Project grading is described in the Project Policy handout.

Your answers to 341-Spring03-p5_questions.txt are worth 10% of your project grade.

Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after 11:59pm of the due date will not be accepted. Do not submit any files after that time.
 


Last modified on Monday March 31, 2003 by Dennis Frey
email: frey@cs.umbc.edu