CMSC 201

Lab 12: Dictionaries

Remember, before running each program, type:

scl enable python33 bash

Program

Letter Frequencies

Letter frequencies can be useful for cracking codes in cryptanalysis or for designing efficient compression algorithms. In today's lab, you are going to compare letter frequencies for the Wikipedia page on Pablo Picasso, written in different languages. We have provided files with the text from each page in French, Spanish, Portuguese, Italian, German, and English.

Steps:

Download the text files below. Put them in the same directory that you will be writing today's program in.

  1. english_picasso.txt
  2. french_picasso.txt
  3. german_picasso.txt
  4. portuguese_picasso.txt
  5. italian_picasso.txt
  6. spanish_picasso.txt
  7. fileslist.txt

The last file, fileslist.txt, lists the names of the language files, one per line. You are going to use fileslist.txt to open each of the language files for analysis, one at a time.

  1. Create the program file for this lab.
  2. In the file, read each filename from fileslist.txt into a list of files that you can open later. (Remember to use .strip() to get rid of whitespace and newlines)
  3. Close fileslist.txt.
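
The three steps above can be sketched roughly as follows. This is only one possible approach, not the required solution; the function name read_filenames is made up for illustration, and skipping blank lines is an assumption about how fileslist.txt is laid out.

```python
def read_filenames(list_path):
    """Return the stripped, non-empty lines of the file at list_path."""
    filenames = []
    list_file = open(list_path, "r")
    for line in list_file:
        name = line.strip()        # drop the trailing newline and stray spaces
        if name != "":             # assumption: ignore any blank lines
            filenames.append(name)
    list_file.close()              # close the list file once the names are read
    return filenames
```

Calling read_filenames("fileslist.txt") from the directory that holds the downloaded files should give back a list of the language filenames, ready to be opened one at a time.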

  1. For each filename in your list of filenames:
    1. Create an empty dictionary (to hold the characters and their frequencies)
    2. Open the file for reading
    3. Use readlines() to read all the lines from the file into a list
    4. Go through each line in the list:
      1. Remove whitespace/newlines from the beginning and end of the line
      2. Then, for each character in the line, use isalpha() to check whether it is alphabetic
        1. We only want the frequencies of the alphabetic characters.
      3. For each alphabetic character:
        1. If it is already in the dictionary, increment that character's value
        2. Otherwise, add it to the dictionary with a value of 1
    5. Print a heading indicating which file this is
    6. Print the number of a's, e's, i's, o's, and u's in the dictionary of characters
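
A minimal sketch of the counting and printing steps is shown below, run on two made-up lines of text instead of the real files. The names count_letters, print_vowels, and report_file are hypothetical helpers, not part of the lab. The sketch counts characters exactly as they appear, so 'P' and 'p' are separate keys; whether to convert to lowercase first is not stated in the steps above, so treat that as an open choice.

```python
def count_letters(lines):
    """Build a dictionary mapping each alphabetic character to its count."""
    counts = {}
    for line in lines:
        line = line.strip()              # remove leading/trailing whitespace
        for char in line:
            if char.isalpha():           # skip spaces, digits, punctuation
                if char in counts:
                    counts[char] += 1    # seen before: increment
                else:
                    counts[char] = 1     # first time: start at 1
    return counts


def print_vowels(heading, counts):
    """Print a heading, then the counts for a, e, i, o, and u."""
    print(heading)
    for vowel in "aeiou":
        # .get() returns 0 when the vowel never appeared in the text
        print(" " + vowel + ":", counts.get(vowel, 0))


def report_file(filename):
    """Open one file, read its lines, and print its vowel counts."""
    in_file = open(filename, "r")
    lines = in_file.readlines()
    in_file.close()
    print_vowels(filename, count_letters(lines))


# Small in-memory example, so the sketch runs without the lab files.
sample_lines = ["Pablo Picasso\n", "  was a painter. \n"]
sample_counts = count_letters(sample_lines)
print_vowels("sample text", sample_counts)
```

For the sample lines, this prints the heading followed by a: 5, e: 1, i: 2, o: 2, and u: 0. In the full program, report_file would be called once for each name in the list of filenames.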

Sample Output

english_picasso.txt

 a: 2741
 e: 3327
 i: 2637
 o: 2311
 u: 822



french_picasso.txt

 a: 2676
 e: 4494
 i: 2262
 o: 1626
 u: 1653



italian_picasso.txt

 a: 2799
 e: 2639
 i: 2979
 o: 2305
 u: 837



portuguese_picasso.txt

 a: 3083
 e: 2704
 i: 1573
 o: 2358
 u: 986



spanish_picasso.txt

 a: 10673
 e: 11255
 i: 6034
 o: 7239
 u: 3788



german_picasso.txt

 a: 5177
 e: 11880
 i: 6482
 o: 2856
 u: 3235