CMSC 313 Project 1

Vowel Statistics

Assigned Wednesday, Sept 15, 2010
Program Due 11:59pm Tuesday, Sept 28, 2010
Points 65
Updates  

The Objective

The objective of this assignment is to become familiar with good design practices, writing in C in a Unix environment, and using arrays, functions, chars and strings.

The Task

The English alphabet is broken into two kinds of letters -- vowels ('a', 'e', 'i', 'o', and 'u') and consonants (all other letters). The letter 'y' is sometimes considered a vowel (consider the word "myth"), but not for our project.
One interesting classification is the "panvowel": a word that contains each of the five vowels EXACTLY ONCE (e.g. "education", but not "automobile", which contains two 'o's). Some words contain all five vowels exactly once and in alphabetical order. A word like this is called a "regular panvowel" (e.g. "facetious").
Your task is to write a program that reads a file of words and generates a set of statistics related to the vowels contained in those words, including the identification of panvowels.
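
The two definitions above can be sketched in C as follows. This is only an illustration of the definitions, not a required design; the function names count_vowels, is_panvowel, and is_regular_panvowel are hypothetical and do not come from the assignment.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: counts each vowel in word, ignoring case.
   counts[] is indexed in the order a, e, i, o, u. */
static void count_vowels(const char *word, int counts[5])
{
    static const char vowels[] = "aeiou";
    int i;

    for (i = 0; i < 5; i++)
        counts[i] = 0;

    for (; *word != '\0'; word++) {
        const char *hit = strchr(vowels, tolower((unsigned char)*word));
        if (hit != NULL)
            counts[hit - vowels]++;
    }
}

/* Returns 1 if word contains each of the five vowels exactly once. */
int is_panvowel(const char *word)
{
    int counts[5];
    int i;

    count_vowels(word, counts);
    for (i = 0; i < 5; i++)
        if (counts[i] != 1)
            return 0;
    return 1;
}

/* Returns 1 if word is a panvowel whose vowels appear in the
   order a, e, i, o, u. */
int is_regular_panvowel(const char *word)
{
    static const char vowels[] = "aeiou";
    int next = 0;   /* index of the vowel we expect to see next */

    if (!is_panvowel(word))
        return 0;

    for (; *word != '\0'; word++) {
        int c = tolower((unsigned char)*word);
        if (strchr(vowels, c) != NULL) {
            if (c != vowels[next])
                return 0;
            next++;
        }
    }
    return 1;
}
```

With these definitions, "education" is a panvowel but not a regular one, "automobile" is neither, and "facetious" is both.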

How does your program work?

  1. Your program prompts the user for the name of the text file to be processed.
  2. Your program reads the text file one "word" at a time, ignoring words that are fewer than 5 characters long and words that contain non-alphabetic characters. A "word" is any sequence of non-whitespace characters, e.g. "Bob", "WAIT!!", "!??!". The more technical term for a sequence of non-whitespace characters is "token".
  3. After processing the entire file, your program outputs the following statistics in some reasonable, readable format:
    1. The number of words read
    2. The number of words ignored
    3. The number of panvowels found
    4. A list of the first 5 panvowels read
    5. If the file contains fewer than 5 panvowels, print all that are found. If the file contains no panvowels, output an appropriate message.
    6. The longest panvowel found (even if not one of the first five)
    7. The shortest panvowel found (even if not one of the first five)
    8. The number of words that contained at least one 'a'
    9. The shortest word that contained at least one 'a' (if any)
    10. The longest word that contained at least one 'a' (if any)
    11. Repeat the previous three statistics for each of the other vowels
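
The word-filtering rule in step 2 can be sketched as a single predicate. This is a minimal illustration under the rules stated above; the names MIN_WORD_LENGTH and is_countable_word are hypothetical, and your own design may divide the work differently.

```c
#include <ctype.h>
#include <string.h>

#define MIN_WORD_LENGTH 5   /* step 2: shorter words are ignored */

/* Returns 1 if token should be processed, 0 if it should be ignored.
   A token is ignored when it has fewer than MIN_WORD_LENGTH characters
   or contains any non-alphabetic character. */
int is_countable_word(const char *token)
{
    size_t len = strlen(token);
    size_t i;

    if (len < MIN_WORD_LENGTH)
        return 0;

    for (i = 0; i < len; i++)
        if (!isalpha((unsigned char)token[i]))
            return 0;

    return 1;
}
```

Under this sketch "Bob" is ignored for its length, "WAIT!!" and "!??!" are ignored for their punctuation, and "Bobby" is processed.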

Hints, Notes and Requirements

  1. (N) You may assume that no word is more than 30 characters long.
  2. (N) You may assume that the name of the input text file is no more than 100 characters.
  3. (N) You can read one word at a time using %s with fscanf. This technique will skip whitespace, but will not separate punctuation from adjoining letters; therefore "Bobby!" will be read as a word, but will not be processed because of the '!' character. There are obviously better ways to deal with punctuation, but we're trying to keep this relatively easy.

  4. (R) If the text file specified by the user cannot be opened for reading, an appropriate error message should be printed and your program should terminate.
  5. (R) Distinctions in case should be ignored, e.g. "Hello", "hello", and "HELLO" are all the same word. The letters 'a' and 'A' are considered the same vowel.
  6. (R) Words should be output in lower-case.
  7. (R) Your code must make appropriate use of functions -- C library functions and functions which you write.
  8. (R) This is an individual project. Do your own work.
  9. (R) Because this is a relatively small project (and your first C project) all your code should be placed into one .c file named project1.c.
  10. (R) A makefile has been provided in Mr. Frey's public directory (/afs/umbc.edu/users/f/r/frey/pub/313/proj1). This makefile may be used "as is" as long as your source file is named project1.c as directed. Read and understand the contents of the makefile. Your makefile should be submitted along with project1.c. Whether you change the makefile or not, executing the command make must create the executable named Project1.

  11. (H) C does NOT have a library function to convert a word to lower-case, but it does have a library function to convert a character to lower case.
  12. (H) Appropriate use of typedef can make your code easier to read and write.
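
Several of the notes and hints above (the 30-character limit, reading with %s, converting case one character at a time, and typedef) can be combined in a short sketch. The names MAX_WORD_LENGTH, Word, lower_word, and read_word are hypothetical choices for illustration, not part of the assignment; the field width in "%30s" is an added safeguard that keeps fscanf from writing past the end of the buffer.

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_WORD_LENGTH 30   /* note 1: no word is longer than this */

/* Hypothetical typedef: a buffer large enough for one word plus '\0'. */
typedef char Word[MAX_WORD_LENGTH + 1];

/* Converts word to lower case in place, one character at a time,
   using the library function tolower. */
void lower_word(char *word)
{
    for (; *word != '\0'; word++)
        *word = (char)tolower((unsigned char)*word);
}

/* Reads the next whitespace-delimited token from fp into word.
   Returns 1 if a token was read, 0 otherwise (e.g. at end of file).
   The field width 30 must match MAX_WORD_LENGTH. */
int read_word(FILE *fp, Word word)
{
    return fscanf(fp, "%30s", word) == 1;
}
```

A main loop could then call read_word repeatedly until it returns 0, calling lower_word on each token before gathering statistics.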

Project Grading

The expected point breakdown for this project will be something like this.
  1. Functionality
    Note that your functionality score will be zero if your code does not compile to create an executable.
    1. Basic cases (5 points) - This might be a text file with a few words covering all required statistics with no words excluded.
    2. More complex cases (25 points) - This would test word discarding rules using more complex and longer text files, words with mixed case, omitting words for some statistics.
    3. Atypical cases (5 points) - This might be an empty file, a file that cannot be opened, a file containing words which are all discarded. We will not violate our promises regarding word length and number of words.
    4. Stress Cases (15 points) - Large files that include all situations such as words of max size and the max number of panvowels, etc.
  2. Code
    1. Design (5 points) - we expect your code to show sufficient decomposition into appropriate functions and to use appropriate C library functions.
    2. Style (10 points) - we expect that your code adheres to the course coding standards, particularly with respect to function and file comments and to naming conventions.

Submitting the Program

You can submit your project using the submit command.

submit cs313 Proj1 project1.c makefile

See this page for a description of other project submission related commands. To verify that your project was submitted, you can execute the following command at the Unix prompt. It will show all files that you submitted in a format similar to the Unix 'ls' command.

submitls cs313 Proj1