UMBC CMSC 202
UMBC CMSC 202 CSEE | 202 | current 202

CMSC 202 Fall 2003
Project 2

Coin Toss

Assigned Thurs Sept. 24, 2003
Program Due Thurs Oct 9th, 2003 at 11:59pm
Students are encouraged to complete this assignment by the original due date of Tues Oct 7th to allow sufficient time for exam preparation
Design Due Thurs Oct 2nd, 2003 at 11:59pm
Updates Fri, Sept 26th
There was a discrepancy between the data file description and the example data file which has been corrected. The data file will contain the words "Head" or "Tail".

Objectives


Project Description

When a fair coin is tossed, the only possible outcomes are Heads (H) and Tails (T). When a coin is tossed many times, the outcomes can be represented as a sequence of Hs and Ts. In this project, we will analyze the sequence of outcomes from tossing a fair coin. You should note that this analysis applies to computer data as well since we could denote Heads using 1 and Tails using 0 rather than H and T. Such analysis can be applied to message encoding and compression.

One important aspect of analyzing such a sequence is the number and lengths of the embedded "runs". A "run" is a sequence of consecutive Heads or Tails. For example, in the sequence HHHHTTHHTTTHT, there is a run of 4 Hs, a run of 2 Ts, a run of 2 Hs, a run of 3 Ts, a run of 1 H and finally a run of 1 T.

We will count the number of runs of each length for both Heads and Tails. In theory, if we toss a fair coin N times, there is a (very small, teeny, tiny) non-zero probability that all N tosses will be Heads or all N tosses will be Tails. Therefore, in theory the longest possible run for N tosses is N. In an award-winning paper (College Math Journal, Vol 21, 1990), Mark Schilling calculates that the expected longest run for N tosses is the integer closest to log2(N / 2) plus or minus 3, which is much less than N. Our program will help validate Mark's theory.

In addition to the frequency and length of the runs, your program will analyze the sequence of outcome to count or calculate the following

  1. The number of Heads and Tails
  2. The number of runs of Heads and Tails
  3. The longest run of Heads and Tails
  4. The average run length of Heads and Tails
  5. The expected longest run. Note that Mark Schilling's formula uses log2, but the C++ math function log10(double) returns the log10. We can convert from log10 to log2 using the formula log2( X ) = log10( X ) / log10( 2 )

Your program will accept a single command line argument which is the name of the file that contains the outcomes of an unknown number of tosses of a fair coin. The format of the data file is given below. Your program must use vectors and may not use arrays.

Your program must contain the following functions. The code for these functions must be found in Coins.cpp. Their prototypes and function header comments must be found in Coins.h which should be included in Proj2.cpp. The names of the functions are left to you, but must follow our coding standards. The parameters and return type for each function are also left to you, but you must carefully consider how each parameter should be passed. Other functions are permitted as you see fit to achieve a good top-down design.

  1. A function to read the data file and create a vector that represents the sequence of Hs and Ts (let's call it the "outcomes" vector)
  2. A function (or functions) that analyzes the "outcomes" vector and counts the number of runs for each possible run length for Heads and Tails. The number of runs of Tails is stored in one vector, the number of runs for Heads in a separate vector. Let's call these the "runs" vectors. The K-th element of the runs vectors is the number of runs of length K. You may write one function that counts the runs for Heads and Tails simultaneously, or separate functions for Heads and Tails.
  3. A function that scans a "runs" vector and calculates the longest run.
  4. A function that calculates the average run length for Heads (or Tails).
    To calculate this average run length, divide the total number of Heads (or Tails) by the number of runs of Heads (or Tails)
  5. A function to calculate the expected length of the longest run. To make this calculation we will use a modified version of the given formula. Instead of calculating "the nearest integer" to log2( N / 2 ), calculate log2( N / 2 ) and round down.
  6. Functions to print the following data with appropriate labels. See the sample output below for an acceptable output format.
  7. A function that prints the table of the number of runs for each possible run length (the contents of the "runs" vectors). This table contains a row for each run length from 1 to max( max Tail run length, max Head run length ). The last row's run length is max of longest Head/Tail runs. No rows skipped even if the Tail and Head column data are both zero. A centered title for the table is required. Column headers are required. Data in the columns must be right-justified. The sample output shows an acceptable format for the table.

Sample Output

This sample outputs was created from the data files p2.200.dat and p2.10000.dat respectively which can be copied from Mr. Frey's public directory /afs/umbc.edu/users/d/e/dennis/pub/CMSC202/p2
Note that the table contains a row for each run length from 1 to the maximum run length of either Heads or Tails in this small example. No rows are skipped. linux3[100]% Proj2 p2.200.dat Welcome to the coin toss analyzer program There were 200 tosses The expected length of the longest run (+/- 3) is: 6 There were 106 Heads There were 54 runs of Heads The longest run of Heads was 6 The average run of Heads was 1.96 There were 94 Tails There were 54 runs of Tails The longest run of Tails was 4 The average run of Tails was 1.74 Run Length Table Run Length Tail Runs Head Runs 1 30 30 2 13 10 3 6 5 4 5 5 5 0 3 6 0 1 linux3[101]% Proj2 p2.10000.dat Welcome to the coin toss analyzer program There were 10000 tosses The expected length of the longest run (+/- 3) is: 12 There were 4881 Heads There were 2541 runs of Heads The longest run of Heads was 13 The average run of Heads was 1.92 There were 5119 Tails There were 2541 runs of Tails The longest run of Tails was 13 The average run of Tails was 2.01 Run Length Table Run Length Tail Runs Head Runs 1 1234 1329 2 651 642 3 341 278 4 155 154 5 79 72 6 49 31 7 17 21 8 10 8 9 2 2 10 1 3 11 1 0 12 0 0 13 1 1 linux3[102]%

The Data File

The data file represents the sequence of outcomes from tossing a fair coin some unkown number of times. Each line in the file will represent one outcome by containing the word "Head" or the word "Tail" (without the quotes). There will be no blank lines in the middle of the file, but inadvertent blank lines at the end of the file should be ignored. You may assume the file has at least one outcome.
For example, a file that represents a sequence of 5 tosses may look like this Head Head Tail Head Tail Be sure to correctly check for EOF when reading the data file so that blank lines at the end do not cause you to reprocess the last real data line.

Free Advice and Information

  1. Use incremental development.
  2. Spend some time thinking about the algorithm for the function that counts the runs before starting to code.
  3. Test your program with small data files that you create so that you can manually verify the results. Trade them with your friends.
  4. A program named GenHT is available in Mr. Frey's public directory /afs/umbc.edu/users/d/e/dennis/pub/CMSC202/p2 and can be used to generate data files with random outcomes. If you copy this file to your directory, you may have to give yourself execute permission using the Unix chmod command.
  5. Check out the math functions floor( ) and log( ) for help with calculating the expected length of the longest run.
  6. Use I/O manipulators (setw, setprecision, etc) to help format your output. See the class notes on streams.
  7. Be sure to check that the data file opened properly.
  8. Be sure to correctly check for EOF when reading the data file so that blank lines at the end do not cause you to reprocess the last real data line.
  9. If you pass a stream to a function, it must be passed by reference.
  10. Use #include <cmath> for the C++ math functions
  11. Copy the makefile from project 1 and modify it for project 2.

Project Design Assignment

Your project design document for project 2 must be named p2design.txt. Be sure to read the
design specification carefully. Submit your design in the usual way: submit cs202 Proj2 p2design.txt

Project Makefile

For this project, you will be responsible for providing your own makefile. Typing "make" should compile and link all files for your project. Your makefile should also support the commands "make clean" and "make cleanest". If you start with the makefile for project 1, the changes for project 2 are straightforward.


Grading

The grade for this project will be broken down as follows. A more detailed breakdown will be provided in the grade form you recieve with your project grade.

85% - Correctness

This list may not be comprehensive, but everything on this list will be verified by the graders.

15% - Coding Standards

Your code adheres to the
CMSC 202 coding standards as discussed and reviewed in class.

Project Submission

Assuming you've used the recommended file names, then to submit your project, type the command submit cs202 Proj2 Proj2.cpp Coins.cpp Coins.h makefile The order in which the files are listed doesn't matter. However, you must make sure that all files necessary to compile your project (using the makefile) are listed.

You can check to see what files you have submitted by typing

submitls cs202 Proj2

More complete documentation for submit and related commands can be found here.

Remember -- if you make any change to your program, no matter how insignificant it may seem, you should recompile and retest your program before submitting it. Even the smallest typo can cause compiler errors and a reduction in your grade.

Avoid unpleasant surprises!
Be sure to use the submitmake and submitrun utilities provided for you to compile, link and run your program after you've submitted it.


Last Modified: Friday, 10-Oct-2003 12:23:44 EDT