CMSC 313 Project 3

Vowel Statistics

Assigned Monday Oct 3, 2011
Program Due 11:59 pm Tuesday Oct 18, 2011
Points 80

The Objectives

The objective of this assignment is to become familiar with using pointers, structs, strings and dynamic memory in a C application. Students will also gain experience with command line arguments.

The Task

The English alphabet is broken into two kinds of letters -- vowels ('a', 'e', 'i', 'o', and 'u') and consonants (all other letters). The letter 'y' is sometimes considered a vowel (consider the word "myth"), but not for our project.
One interesting way to classify a word is as a "panvowel" which is a word that contains all the vowels EXACTLY ONCE (e.g. "education" is a panvowel, but "automobile" is not). Some words contain all five vowels exactly once and in alphabetic order. A word like this is called a "regular panvowel" (e.g. "facetious").
Your task is to write a program that reads a file of words and generates a set of statistics related to the vowels contained in those words, including the identification of panvowels.
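The two definitions above can be illustrated with a minimal sketch. The function names (vowel_index, is_panvowel, is_regular_panvowel) are hypothetical and are not part of the assignment; this is one possible way to express the checks, not a required design.

```c
#include <ctype.h>

/* Hypothetical helper: position of c within "aeiou", or -1 if c is not
 * one of the five vowels. Case is ignored; 'y' is not a vowel here. */
static int vowel_index(char c)
{
    switch (tolower((unsigned char)c)) {
    case 'a': return 0;
    case 'e': return 1;
    case 'i': return 2;
    case 'o': return 3;
    case 'u': return 4;
    default:  return -1;
    }
}

/* Returns 1 if word contains each of the five vowels exactly once. */
int is_panvowel(const char *word)
{
    int counts[5] = { 0, 0, 0, 0, 0 };
    int i;

    for (; *word != '\0'; word++) {
        int v = vowel_index(*word);
        if (v >= 0)
            counts[v]++;
    }
    for (i = 0; i < 5; i++)
        if (counts[i] != 1)
            return 0;
    return 1;
}

/* Returns 1 if word is a panvowel whose vowels appear in the order
 * a, e, i, o, u. A repeated or out-of-order vowel fails the check. */
int is_regular_panvowel(const char *word)
{
    int expected = 0;   /* index of the next vowel we must see */

    for (; *word != '\0'; word++) {
        int v = vowel_index(*word);
        if (v < 0)
            continue;           /* consonant: nothing to check */
        if (v != expected)
            return 0;
        expected++;
    }
    return expected == 5;
}
```

With these definitions, "education" and "facetious" are both panvowels, only "facetious" is a regular panvowel, and "automobile" is neither because it contains 'o' twice.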

How does your program work?

  1. Your program is executed with a single command line argument which is the name of the textfile to process.
  2. Your program reads the text file one "word" at a time, ignoring words that are fewer than 5 characters long and ignoring words that contain any non-alphabetic characters. A "word" is any sequence of non-whitespace characters, e.g. "Bob", "WAIT!!", "!??!", "cmcs313". The more technical term for a sequence of non-whitespace characters is "token".
  3. After processing the entire file, your program outputs the following statistics in some reasonable, readable format:
    1. The number of words read from the file.
    2. The number of words read, but ignored.
    3. The number of panvowels found.
    4. A list of the first 5 panvowels read.
    5. If the file contains fewer than 5 panvowels, print all that are found. If the file contains no panvowels, output an appropriate message.
    6. The longest panvowel found (even if not one of the first five).
    7. The shortest panvowel found (even if not one of the first five).
    8. The number of words that contained at least one 'a'.
    9. The shortest word that contained at least one 'a' (if any).
    10. The longest word that contained at least one 'a' (if any).
    11. Repeat the previous three statistics for each of the other vowels.
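The reading and filtering described in step 2 might be sketched as follows. The names is_countable_word and tally_tokens are hypothetical, and the sketch assumes that "the number of words read" counts every token, including the ignored ones; the handout does not say so explicitly. It relies on the 30-character limit promised in the notes below.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 30   /* longest word promised by the notes */
#define MIN_WORD_LEN 5    /* shorter words are ignored */

/* Returns 1 if token is at least MIN_WORD_LEN characters long and
 * consists only of alphabetic characters, 0 otherwise. */
int is_countable_word(const char *token)
{
    size_t i;
    size_t len = strlen(token);

    if (len < MIN_WORD_LEN)
        return 0;
    for (i = 0; i < len; i++)
        if (!isalpha((unsigned char)token[i]))
            return 0;
    return 1;
}

/* Reads whitespace-separated tokens from in until end of file.
 * *total receives the number of tokens read (assumption: ignored
 * tokens are included), *ignored the number that were skipped.
 * Returns 0 on success, -1 if the token buffer cannot be allocated. */
int tally_tokens(FILE *in, long *total, long *ignored)
{
    char *token = malloc(MAX_WORD_LEN + 1);   /* heap buffer, not an array */

    if (token == NULL)
        return -1;
    *total = 0;
    *ignored = 0;

    /* The width 30 in the format must match MAX_WORD_LEN. */
    while (fscanf(in, "%30s", token) == 1) {
        (*total)++;
        if (!is_countable_word(token))
            (*ignored)++;
        /* a full program would update its vowel statistics here */
    }

    free(token);
    return 0;
}
```

For the tokens "Bob", "WAIT!!", "education", "cmcs313" and "facetious", this sketch would report 5 words read and 3 ignored.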

Hints, Notes and Requirements

  1. (N) You may assume that no word is more than 30 characters long.
  2. (N) You may assume that the name of the input text file is no more than 100 characters.
  3. (N) You can read one word at a time using %s with fscanf. This technique will skip whitespace, but will not separate punctuation from adjoining letters, therefore "Bobby!" will be read as a word, but will be ignored because of the '!' character. There are obviously better ways to deal with punctuation, but we're trying to keep this relatively easy.
  4. (N) If there are multiple words which are the "shortest" or "longest" report the first of these found in the file.
  5. (N) If the file contains duplicate words, process each of the duplicates.

  6. (R) If the text file specified by the user cannot be opened for reading, an appropriate error message should be printed and your program should terminate.
  7. (R) Distinctions in case should be ignored, e.g. "Johnny", "johnny", and "JOHNNY" are all equivalent and should be treated the same way within your program. The letters 'a' and 'A' are considered the same vowel.
  8. (R) Words should be output in lower-case.
  9. (R) Your code must make appropriate use of functions -- C library functions and functions which you write.
  10. (R) The use of struct is required to organize vowel-related data.
  11. (R) Dynamic memory allocation using malloc( ) and/or calloc( ) (and hence the use of pointers) is required for all structs and all strings.
    1. All strings (char arrays) must be dynamically allocated.
      This is ok: char *myName = (char *)malloc( NAMESIZE );
      This is not ok: char myName[ NAMESIZE ];
    2. All structs must be dynamically allocated.
      This is ok: struct bob *pBob = (struct bob *)malloc( sizeof(struct bob) );
      This is not ok: struct bob aBob;
  12. (R) All dynamically allocated memory must be free'd so that no memory leaks are created. Use valgrind (see next item) to confirm that your code is correct.
  13. (R) Your program must run cleanly under the Unix valgrind utility. Execute your program under control of valgrind with the command
     linux2[2]% valgrind --leak-check=full Project3 textfile
    The result should look something like this
    ==3290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 15 from 1)
    ==3290== malloc/free: in use at exit: 0 bytes in 0 blocks.
    ==3290== malloc/free: 355 allocs, 355 frees, 5,329 bytes allocated.
    ==3290== For counts of detected errors, rerun with: -v
    ==3290== All heap blocks were freed -- no leaks are possible.
    See the course resources page for a link to a valgrind tutorial and / or see the on-line Unix manual for help with valgrind.
  14. (R) Your code must be separated into multiple .c files to form a library of reusable functions as in project 2. Appropriate .h files are also required. You are free to choose any names for your files.

  15. (H) C does NOT have a library function to convert a word to lower-case, but it does have a library function to convert a character to lower case.
  16. (H) Appropriate use of typedef can make your code easier.
  17. (H) It is not necessary (and is highly inefficient) to read the entire text file in order to store all the words from the file into an array of strings. Process each word one at a time.
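Several of the items above (lower-casing a word one character at a time, typedef, heap-allocated structs and strings, first-found tie breaking, and freeing everything) can be seen together in a small sketch. The VowelStats type, its fields and all function names are hypothetical and chosen only for illustration; your own design may differ.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-vowel record. The strings are heap copies. */
typedef struct vowel_stats {
    char  vowel;      /* 'a', 'e', 'i', 'o' or 'u' */
    long  count;      /* words containing the vowel at least once */
    char *shortest;   /* NULL until a word has been recorded */
    char *longest;    /* NULL until a word has been recorded */
} VowelStats;

/* Lower-cases word in place using the per-character tolower(). */
void to_lower_in_place(char *word)
{
    for (; *word != '\0'; word++)
        *word = (char)tolower((unsigned char)*word);
}

/* Returns a heap-allocated copy of word, or NULL on failure. */
static char *copy_word(const char *word)
{
    char *copy = malloc(strlen(word) + 1);
    if (copy != NULL)
        strcpy(copy, word);
    return copy;
}

/* Allocates an empty record for one vowel, or returns NULL. */
VowelStats *vowel_stats_create(char vowel)
{
    VowelStats *vs = malloc(sizeof *vs);
    if (vs != NULL) {
        vs->vowel = vowel;
        vs->count = 0;
        vs->shortest = NULL;
        vs->longest = NULL;
    }
    return vs;
}

/* Replaces *slot with a copy of word. Returns 0, or -1 on failure. */
static int replace_word(char **slot, const char *word)
{
    char *copy = copy_word(word);
    if (copy == NULL)
        return -1;
    free(*slot);
    *slot = copy;
    return 0;
}

/* Records a word the caller already knows contains vs->vowel.
 * Strict comparisons keep the first word found when lengths tie. */
int vowel_stats_record(VowelStats *vs, const char *word)
{
    size_t len = strlen(word);

    vs->count++;
    if (vs->shortest == NULL || len < strlen(vs->shortest))
        if (replace_word(&vs->shortest, word) != 0)
            return -1;
    if (vs->longest == NULL || len > strlen(vs->longest))
        if (replace_word(&vs->longest, word) != 0)
            return -1;
    return 0;
}

/* Frees the record and both strings it owns. */
void vowel_stats_destroy(VowelStats *vs)
{
    if (vs == NULL)
        return;
    free(vs->shortest);
    free(vs->longest);
    free(vs);
}
```

Because each record owns its copies of the shortest and longest words, the token buffer can be reused for every word read, and a single call to vowel_stats_destroy per record releases everything that valgrind would otherwise report.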

Project Grading

The expected point breakdown for this project will be approximately as shown below. In particular, proper use of pointers, structs, malloc, calloc, and free will be stressed.
  1. Functionality (45 points)
    Note that your functionality score will be zero if your code does not compile or does not create an executable.
    1. Basic cases - This might be a text file with a few words covering all required statistics with no words excluded.
    2. More complex cases - This would test word discarding rules using more complex and longer text files, words with mixed case, omitting words for some statistics.
    3. Atypical cases - This might be an empty file, a file that cannot be opened, a file containing words which are all discarded. We will not violate our promises regarding word length and number of words.
    4. Stress Cases - Large files that include all situations such as words of max size and the max number of panvowels, etc.
  2. Code (35 points)
    1. Coding Requirements - we expect that you adhere to the project requirements listed above with regard to the use of pointers and the dynamic allocation of structs and strings. We expect that your code will be broken into multiple .c files.
    2. Design - we expect that your code shows sufficient decomposition into appropriate library functions and uses appropriate C library functions.
    3. Style - we expect that your code adheres to the course coding standards, particularly with respect to function and file comments and to naming conventions.
    4. Your code must run cleanly under valgrind as described above.
    5. You must submit a makefile that creates an executable named Project3 just by typing make at the Unix prompt.

Submitting the Program

You can submit your project using the submit command.

submit cs313 Proj3 <list of .c and .h files> makefile

See this page for a description of other project submission related commands. To verify that your project was submitted, you can execute the following command at the Unix prompt. It will show all files that you submitted in a format similar to the Unix 'ls' command.

submitls cs313 Proj3