CMSC 341 Spring 2006
Project 3 - The Book Index

Assigned Monday, March 27, 2006
Due 11:59pm Sunday, April 9, 2006
Updates

Objective

Description

One important part of any book is the book's index. The index is an alphabetical list of important words found in the book along with the page number(s) on which each word is found. In this project you will implement the index for a book that allows the user to query the index. The project requirements will help you decide what classes and data structures are best suited to complete the book index implementation.

The command line for this project has two command line arguments. The first argument is the name of an index file which your program will read to create the book's index. The second command line argument is the name of a command file which contains user queries.

Grading Notes

The grading for this project will be weighted more toward efficiency and analysis (the questions) than other projects. For this project, the questions will count 20% rather than the usual 10% and implementation (which includes efficiency) will count 30% rather than the usual %15.

The Index File

The index file contains words or short phrases and the page number(s) on which they are found. Each line in the index file pertains to a single word or phrase. Each line begins with a word or phrase terminated by a semi-colon ( ';' ), followed by a page number or range of page numbers.
  1. You may assume that page ranges in the index file are valid (i.e. the first page number in the range is smaller than the second page number in the range).
  2. Any line in the file that begins with the '#' character is a comment and should be ignored.
  3. Blank lines may be found anywhere in the file and should also be ignored.
  4. The following examples show the format of the index file.
    1. sorting; 7
      means the word "sorting" appears on page 7
    2. hello world; 8-10
      means the phrase "hello world" appears on pages 8, 9 and 10. In the case of a page range as shown here, you may assume there are no spaces surrounding the "-" in the range of page numbers.
  5. All words in the index file begin with a letter of the alphabet ('a' - 'z' and 'A' - 'Z'). Words do not begin with any other characters, but any character (except ';') may appear in a word.
  6. The same word or phrase may appear multiple times in the file. If a word or phrase appears multiple times in the file, page numbers will NOT be duplicated and page numbers/ranges will NOT overlap.
  7. You may assume that all words and phrases in index file are spelled differently. I.e., no words or phrases will be different only by case. For example, if "abc" appears in the index file, you will not see any of "ABC", "ABc", "AbC", "Abc", "aBC", "aBc", or abC".

The Command File

The commands in the command file are used to query the book index and produce appropriate output. Each line of the command file represents a single command.
  1. You may assume that valid commands are well-formed, but not all commands are valid. If you encounter an invalid command you should ignore the rest of that line in the file, print an appropriate error and continue to the next command.
  2. Any line in the file that begins with the '#' character is a comment and should be ignored.
  3. Blank lines may be found anywhere in the file and should also be ignored.
  4. The following commands will be found in the command file. The required output data for each command is listed with the command description. The output for each command must include an appropriate label. See the sample output for acceptable labels.
    1. LETTER <letter>
      Prints all book index entries that start with specified letter (upper- or lower-case) in case-insensitive alphabetical order. Thus the commands LETTER A and LETTER a produce identical output, namely all words and phrases that begin with either 'A' or 'a'.
    2. WORD <word or phrase>
      Prints the specified word or phrase and the list of page numbers on which the word or phrase was found. Page numbers/ranges must be printed in increasing page number order.
    3. PRINT
      Prints the entire book index in case-insensitive alphabetical order. For readability, (a) separate the entries for each letter of the alphabet with blank lines and (b) print the letter of the alphabet above its entries. If no entries exist for a letter of the alphabet, print nothing.
    4. PAGE <page number>
      Prints a case-insensitive alphabetical list of all words which are found on the specified page.

Your Tasks

  1. You must implement a C++ class to model the book index.
    1. Although your book index may be implemented in terms of page numbers/ranges and words/phrases, the underlying data structure(s) used to implement the book index must be generic template containers.
    2. Efficiency counts! You must implement the book index class to provide the best possible performance for all required book index operations.

  2. Implement any other C++ classes necessary for the project as dictated by fundamental OO principles.

  3. Write main( ) and supporting functions in Proj3.cpp and Proj3Aux.cpp. These functions read the index file to populate the book index, then read and process the command file to produce the required output.

  4. Copy the file 341-Spring06-proj3_questions.txt from Mr. Frey's public directory and insert your answers to the questions.

  5. Create a makefile for your project.

Miscellaneous Project Requirements and Assumptions

  1. Mr. Frey's public directory for this project is
    /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj3
  2. Since the author does not supply code for vector, deque, stack, or queue, these containers may be used from the STL in this project. No other STL containers are allowed. The author's binary search tree code is available for your use in Mr. Frey's directory. You may also use the author's List code from project 2.
  3. Each command read from the file must be echoed as part of the output.
  4. For output formatting purposes only, you may assume that no word or phrase is more than 20 characters long.
  5. For output formatting purposes only, you may assume that all books have less than 10000 pages.

Sample Output

This output is provided to help clarify the command output requirements This output does not come from any input file and the outputs for each command may not be consistent. It is only an example of one acceptable format. Any output which contains the same information in a neatly arranged and readable format is also allowed.

DO NOT try to reproduce this output exactly word-for-word, line-for-line, character-for-character. This output is just an example. It may not even be possible to duplicate this output exactly without very convoluted C++ code, if it's possible at all.

Cmd: LETTER D 'D' Words --------- deny 4, 7-9, 12 Dorothy 33 dwarf 3-5, 11-14 Cmd: WORD "bob" Pages for "bob" --------------- "bob" was found on page(s) 4-5, 7, 10-12 Cmd: PAGE 44 Page 44 ------- a allow ammunition Andy apple ask Cmd: PRINT The entire contents of the book index 'A' Words --------- a 4, 6, 10-15 apple 3 ARNOLD 2, 5-7 'C' Words --------- Cathy 1, 5, 8-12 color 5, 7, 9-12 cushion 6 .... etc...

Files To Be Submitted

Submit any files which you have written or modified -- source code and makefile.

Be sure to submit the answers to the project questions in plain text format.

Submit the files using the procedure given to you for your section of the course.
If your makefile is set up correctly, you should be able to execute the command make submit.

Submission Tools
There are a number of tools available to you to check on your submittal. It is your responsibility to ensure that the submittal is correct and will result in a successful compilation of your project. Do not wait till the last minute to submit your files. Give yourself enough time to check that your submittal is correct.

If you don't submit a project correctly, you will not get credit for it. Why throw away all that hard work you did to write the project? Check your submittals. Make sure they work. Do this before the due date.

Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One of the tools provided by the submit program is
submitls. It lists the names of the files you have submitted.

Additionally, there are two programs for use only by CMSC-341 students (not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named submitmake and submitrun. You can use these programs to make or run your submitted projects.

The syntax is similar to that for submit:

submitmake <class> <project>

Example:  submitmake cs341 Proj3 <parameter list>

This makes the project, and shows you the report from the make utility. It cleans up the directory after making the project (removes .o and ii_files), but leaves the
executable in place.

submitrun <class> <project>

Example:  submitrun cs341 Proj3

This runs the project, assuming there is an executable (i.e. submitmake was run successfully).


Grading and Academic Integrity

Project grading is described in the Project Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project Policy handout for further details on honesty in doing projects for this course.

Remember, the due date is firm. Submittals made after midnight of the due date will not be accepted. Do not submit any files after that time.