CMSC 313 Project 2

Vowel Statistics
Version 2

Assigned Wednesday, Sept 29, 2010
Program Due 11:59 pm Sunday Oct 17, 2010
Points 65
Updates  

The Objective

The objective of this assignment is to become familiar with using pointers and dynamic memory in a C application. By comparing their solution to this project with their solution to project 1, students will see and experience the close relationship between arrays and pointers in C. Students will also gain experience with command line arguments.

The Task (Same as Project 1)

The English alphabet is broken into two kinds of letters -- vowels ('a', 'e', 'i', 'o', and 'u') and consonants (all other letters). The letter 'y' is sometimes considered a vowel (consider the word "myth"), but not for our project.
One interesting way to classify a word is "panvowel" which is a word that contains all the vowels EXACTLY ONCE (e.g. "education", but not "automobile"). Some words contain all five vowels exactly once and in alphabetic order. A word like this is called a "regular panvowel" (e.g. "facetious").
Your task is to write a program that reads a file of words and generates a set of statistics related to the vowels contained in those words, including the identification of panvowels.
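
The panvowel definitions above can be sketched in C as follows. This is only an illustration under stated assumptions, not the required design: the function names count_vowels, is_panvowel, and is_regular_panvowel are hypothetical, and the sketch assumes the word has already been checked to contain only letters.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: tallies each vowel in word, ignoring case.
 * counts[0..4] correspond to 'a', 'e', 'i', 'o', 'u'. */
static void count_vowels(const char *word, int counts[5])
{
    static const char vowels[] = "aeiou";

    memset(counts, 0, 5 * sizeof counts[0]);
    for (const char *p = word; *p != '\0'; p++) {
        const char *hit = strchr(vowels, tolower((unsigned char)*p));
        if (hit != NULL) {
            counts[hit - vowels]++;
        }
    }
}

/* A panvowel contains every vowel exactly once. */
bool is_panvowel(const char *word)
{
    int counts[5];

    count_vowels(word, counts);
    for (int i = 0; i < 5; i++) {
        if (counts[i] != 1) {
            return false;
        }
    }
    return true;
}

/* A regular panvowel is a panvowel whose vowels appear in the
 * order a, e, i, o, u. */
bool is_regular_panvowel(const char *word)
{
    const char *expected = "aeiou";

    if (!is_panvowel(word)) {
        return false;
    }
    for (const char *p = word; *p != '\0'; p++) {
        char c = (char)tolower((unsigned char)*p);
        if (strchr("aeiou", c) != NULL) {
            if (c != *expected) {
                return false;   /* vowel out of alphabetic order */
            }
            expected++;
        }
    }
    return true;
}
```

With these definitions, "education" is a panvowel but not a regular one, "automobile" is not a panvowel because 'o' occurs twice, and "facetious" is a regular panvowel.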

How does your program work?

  1. Your program is executed with a single command line argument, which is the name of the text file to process.
  2. Your program reads the text file one "word" at a time, ignoring words that are fewer than 5 characters long and words that contain non-alphabetic characters. A "word" is any sequence of non-whitespace characters, e.g. "Bob", "WAIT!!", "!??!", "cmcs313". The more technical term for a sequence of non-whitespace characters is "token".
  3. After processing the entire file, your program outputs the following statistics in some reasonable, readable format:
    1. The number of words read
    2. The number of words ignored
    3. The number of panvowels found
    4. A list of the first 5 panvowels read
    5. If the file contains fewer than 5 panvowels, print all that are found. If the file contains no panvowels, output an appropriate message.
    6. The longest panvowel found (even if not one of the first five)
    7. The shortest panvowel found (even if not one of the first five)
    8. The number of words that contained at least one 'a'
    9. The shortest word that contained at least one 'a' (if any)
    10. The longest word that contained at least one 'a' (if any)
    11. Repeat the previous three statistics for each of the other vowels
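
The read-and-filter loop described in steps 2 and 3 might look roughly like the sketch below. It is a minimal illustration, not a reference solution. The names is_valid_word and scan_words are hypothetical, and the sketch assumes that "words read" counts every token, including ignored ones, which the assignment does not spell out. The 30-character limit comes from the assumptions listed in the notes for this project.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 30   /* longest word the assignment promises */
#define MIN_WORD_LEN 5    /* shorter tokens are ignored */

/* Hypothetical helper: true if token is at least MIN_WORD_LEN
 * characters long and contains only letters. */
bool is_valid_word(const char *token)
{
    if (strlen(token) < MIN_WORD_LEN) {
        return false;
    }
    for (const char *p = token; *p != '\0'; p++) {
        if (!isalpha((unsigned char)*p)) {
            return false;
        }
    }
    return true;
}

/* Hypothetical driver: reads whitespace-separated tokens from fp and
 * counts how many were read and how many were ignored.
 * Returns 0 on success, -1 if the token buffer could not be allocated. */
int scan_words(FILE *fp, long *total, long *ignored)
{
    /* The buffer is heap-allocated, as the project requires. */
    char *token = malloc(MAX_WORD_LEN + 1);

    if (token == NULL) {
        return -1;
    }
    *total = 0;
    *ignored = 0;
    /* The width 30 in the format matches MAX_WORD_LEN. */
    while (fscanf(fp, "%30s", token) == 1) {
        (*total)++;
        if (!is_valid_word(token)) {
            (*ignored)++;
        }
        /* A full program would update its vowel statistics here. */
    }
    free(token);
    return 0;
}
```

A main( ) function would open the file named by argv[1] with fopen, report an error and exit if fopen returns NULL, and otherwise pass the FILE pointer to a routine like scan_words.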

Hints, Notes and Requirements

  1. (N) You may assume that no word is more than 30 characters long.
  2. (N) You may assume that the name of the input text file is no more than 100 characters.
  3. (N) You can read one word at a time using %s with fscanf. This technique will skip whitespace, but will not separate punctuation from adjoining letters; therefore "Bobby!" will be read as a word, but will not be processed because of the '!' character. There are obviously better ways to deal with punctuation, but we're trying to keep this relatively easy.
  4. (N) If there are multiple words which are the "shortest" or "longest" report the first of these found in the file. This is the same as project 1, but added here for clarification.
  5. (N) If the file contains duplicate words, process each of the duplicates separately. This is the same as project 1, but added here for clarification.

  6. (R) If the text file specified by the user cannot be opened for reading, an appropriate error message should be printed and your program should terminate.
  7. (R) Distinctions in case should be ignored, e.g. "Johnny", "johnny", and "JOHNNY" are all equivalent and should be treated the same way within your program. The letters 'a' and 'A' are considered the same vowel.
  8. (R) Words should be output in lower-case.
  9. (R) Your code must make appropriate use of functions -- C library functions and functions that you write.
  10. (R) This is an individual project. Do your own work.
  11. (R) The use of struct is required to organize related data.
  12. (R) Dynamic memory allocation using malloc( ) and/or calloc( ) (and hence the use of pointers) is required for all structs and strings
    1. All strings (char arrays) must be dynamically allocated.
      This is ok: char *myName = (char *)malloc( NAMESIZE );
      This is not ok: char myName[ NAMESIZE ];
    2. All structs must be dynamically allocated.
      This is ok: struct bob *pBob = (struct bob *)malloc( sizeof(struct bob) );
      This is not ok: struct bob aBob;
    3. Arrays of "pointer to structs" and arrays of "pointers to strings" are permitted
  13. (R) All dynamically allocated memory must be free'd so that no memory leaks are created. Use valgrind (see next item) to confirm that your code is correct.
  14. (R) Your program must run cleanly under the Unix valgrind utility. Execute your program under control of valgrind with the command
     linux2[2]% valgrind --leak-check=full Project2 textfile
    The result should look something like this
    ==3290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 15 from 1)
    ==3290== malloc/free: in use at exit: 0 bytes in 0 blocks.
    ==3290== malloc/free: 355 allocs, 355 frees, 5,329 bytes allocated.
    ==3290== For counts of detected errors, rerun with: -v
    ==3290== All heap blocks were freed -- no leaks are possible.
    
    See the course resources page for a link to a valgrind tutorial and / or see the on-line Unix manual for help with valgrind.
  15. (R) Your code must be separated into multiple .c files as follows. Appropriate .h files are also required. You are free to choose any names for your files.
    1. main( ) and helper functions specific to this project may be in the same .c file
    2. Helper functions that are generic enough to be used in a different project (and there should be some such functions) should be in yet another .c file and their prototypes placed in an appropriately named .h file
    3. Use of the keyword static is required for function definitions as appropriate.
  16. (R) You must submit a makefile that creates an executable named Project2 just by typing make at the Unix prompt. Use the makefile provided with project 1 as a starting point.

  17. (H) C does NOT have a library function to convert a word to lower-case, but it does have a library function to convert a character to lower case.
  18. (H) Appropriate use of typedef can make your code easier to read.
  19. (H) It is not necessary (and is highly inefficient) to read the entire text file in order to store all the words from the file into an array of strings. Process each word one at a time.
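
The dynamic-allocation requirements and hints above can be illustrated with the sketch below. Everything in it is an assumption made for illustration: the VowelStats type and the functions copy_lower, stats_create, stats_record, and stats_destroy are hypothetical names, not a prescribed design. It shows a heap-allocated struct that owns heap-allocated strings, a lower-case copy built one character at a time with tolower, and a cleanup routine that frees everything so valgrind reports no leaks.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-vowel statistics; typedef shortens declarations. */
typedef struct vowel_stats {
    int   count;     /* words containing this vowel */
    char *shortest;  /* heap-allocated, or NULL if none yet */
    char *longest;   /* heap-allocated, or NULL if none yet */
} VowelStats;

/* Returns a malloc'd lower-case copy of word, or NULL on failure. */
char *copy_lower(const char *word)
{
    size_t len = strlen(word);
    char *copy = malloc(len + 1);

    if (copy == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        copy[i] = (char)tolower((unsigned char)word[i]);
    }
    copy[len] = '\0';
    return copy;
}

/* Allocates and initializes an empty statistics record. */
VowelStats *stats_create(void)
{
    VowelStats *s = malloc(sizeof *s);

    if (s != NULL) {
        s->count = 0;
        s->shortest = NULL;
        s->longest = NULL;
    }
    return s;
}

/* Records one word. Strict comparisons keep the first word found
 * when several words tie for shortest or longest. */
void stats_record(VowelStats *s, const char *word)
{
    s->count++;
    if (s->shortest == NULL || strlen(word) < strlen(s->shortest)) {
        char *copy = copy_lower(word);
        if (copy != NULL) {
            free(s->shortest);
            s->shortest = copy;
        }
    }
    if (s->longest == NULL || strlen(word) > strlen(s->longest)) {
        char *copy = copy_lower(word);
        if (copy != NULL) {
            free(s->longest);
            s->longest = copy;
        }
    }
}

/* Frees the strings first, then the struct itself. */
void stats_destroy(VowelStats *s)
{
    if (s == NULL) {
        return;
    }
    free(s->shortest);
    free(s->longest);
    free(s);
}
```

Pairing each create function with a matching destroy function is one common way to keep the number of allocs and frees equal, which is what the valgrind summary checks.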

Project Grading

The expected point breakdown for this project will be something like this. Since the functionality of the project is the same as project 1, more focus is given towards your code. In particular proper use of pointers, structs, malloc, calloc, and free will be stressed.
  1. Functionality
    Note that your functionality score will be zero if your code does not compile or create an executable.
    1. Basic cases - This might be a text file with a few words covering all required statistics with no words excluded.
    2. More complex cases - This would test word discarding rules using more complex and longer text files, words with mixed case, omitting words for some statistics.
    3. Atypical cases - This might be an empty file, a file that cannot be opened, a file containing words which are all discarded. We will not violate our promises regarding word length and number of words.
    4. Stress Cases - Large files that include all situations such as words of max size and the max number of panvowels, etc.
  2. Code
    1. Coding Requirements - we expect that you adhere to the project requirements listed above with regard to the use of pointers and the dynamic allocation of struct and strings
    2. Design - we expect that your code shows sufficient decomposition into appropriate functions and uses appropriate C library functions.
    3. Style - we expect that your code adheres to the course coding standards, particularly with respect to function and file comments and to naming conventions.

Submitting the Program

You can submit your project using the submit command.

submit cs313 Proj2 <list of .c and .h files> makefile

See this page for a description of other project submission related commands. To verify that your project was submitted, you can execute the following command at the Unix prompt. It will show all files that you submitted in a format similar to the Unix 'ls' command.

submitls cs313 Proj2