Cover page images (keyboard)

Python's Data Types

Sue Evans & David Walser



Press space bar for next page

Learning Outcomes

Data types

Whenever we describe a data type, we not only need to give a definition of what that type is but also the operations that can be done on that type. The definition may include the range of possible values that type can hold.

Example: integer

Python's data types

Strings

Data Structures

Why is this important?

Choosing the right data structure to solve a problem affects how easy the problem is to solve, how readable the code will be, and how efficient the resulting program will be.

Lists

A Review of Lists

There are several ways to create new lists:

>>> # assign the literal empty list
>>> empty = []

>>> # use the list constructor
>>> empty = list()    

>>> # give a literal assignment of items
>>> items = ["bread", "milk", "cheese", "cider"]
>>> items
['bread', 'milk', 'cheese', 'cider']

>>> # use the list constructor with a string
>>> letters = list("hello")
>>> letters
['h', 'e', 'l', 'l', 'o']

>>> # use string.split which returns a list of strings
>>> words = "a bunch of words".split()
>>> words
['a', 'bunch', 'of', 'words']

Operators:

Methods:

Built-in functions that operate on lists:

Examples

Examples Using List Methods

append(item) - add an item to the end of the list

>>> items
['bread', 'milk', 'cheese', 'lemonade']
>>> items.append('ham')
>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham']
>>> items.append('ham')
>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham']
>>>

count(item) - count occurences of an item in the list

>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham']
>>> items.count('ham')
2
>>> items.count('cheese')
1
>>>

remove(item) - remove the first occurence of an item in the list

>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham']
>>> items.remove('ham')
>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham']
>>>

extend(list) - append multiple items to end of the list

>>> items.extend(['lettuce', 'tomatoes'])
>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>>

index(item) - locate the index of an item in the list

>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>> items.index('milk')
1
>>>

insert(index, item) - insert an item before the one at the index

>>> items
['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>> items.insert(2, 'eggs')
>>> items
['bread', 'milk', 'eggs', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>>

reverse() - reverse the order of the items in the list

>>> items
['bread', 'milk', 'eggs', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes']
>>> items.reverse()
>>> items
['tomatoes', 'lettuce', 'ham', 'lemonade', 'cheese', 'eggs', 'milk', 'bread']
>>>

sort() - sort the list

>>> items
['tomatoes', 'lettuce', 'ham', 'lemonade', 'cheese', 'eggs', 'milk', 'bread']
>>> items.sort()
>>> items
['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes']
>>>

Built-In Functions

len(list) - count number of items in the list

>>> items
['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes']
>>> len(items)
8
>>>

del(list[index]) - remove the item at the index from the list

>>> items
['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes']
>>> del(items[4])
>>> items
['bread', 'cheese', 'eggs', 'ham', 'lettuce', 'milk', 'tomatoes']
>>>

Using lists

Use a list when order matters. 

Mutability

An object that is mutable can be changed in-place.  Lists are mutable.
Example:  list[2] = "foo"

>>> list = ['cat', 'dog', 'bird', 'hamster']
>>> list
['cat', 'dog', 'bird', 'hamster']
>>> list[2] = "foo"
>>> list
['cat', 'dog', 'foo', 'hamster']
>>>

Strings are immutable (aka, not mutable).
Example:  str[3] = 'f' is not supported

>>> str = 'fooooo'
>>> str[3] = 'f'
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: object does not support item assignment
>>>

Lists, and other data structures can be used as return values from functions.  They can also be used as arguments.  Mutable data structures can be changed by the function without returning them.


# fillList() takes an empty list and fills it with the integers 0 - 9
# Input: aList, an empty list
# Output: None, but the list has been modified

def fillList(aList):

    for i in range(10):
        aList.append(i)


def main():

    print
    print "This program uses a function to fill a list with values"
    print "Notice that since the list is mutable, we need only pass"
    print "it to the function and let the function modify it.  The list"
    print "doesn't need to be returned."
    print

    myList = []
    print "Before calling fillList, myList = ", myList
    fillList(myList)
    print "After calling fillList, myList =", myList


main()

Let's see it work!

This program uses a function to fill a list with values
Notice that since the list is mutable, we need only pass
it to the function and let the function modify it.  The list
doesn't need to be returned.

Before calling fillList, myList =  []
After calling fillList, myList = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

So lists are definitely mutable!

Strings are immutable so let's see the difference:


# fillString() takes an empty string and concatenates letters to the string
# Input: aString, an empty string
# Output: None, and the original string has NOT been modified
def fillString(aString):

    for ch in "letters":
        aString += ch


def main():

    print
    print "This program uses a function to fill a string with characters"
    print "Notice that since the string isn't mutable, the function won't"
    print "be able to modify the original string.  The string would have to"
    print "be returned.  In fact, since strings are immutable, for each"
    print "character that is concatenated onto the string, a new string has"
    print "to be made."
    print

    myString = ""
    print "Before calling fillString, myString = ", myString
    fillString(myString)
    print "After calling fillString, myString =", myString


main()

What happens when we run this:

This program uses a function to fill a string with characters
Notice that since the string isn't mutable, the function won't
be able to modify the original string.  The string would have to
be returned.  In fact, since strings are immutable, for each
character that is concatenated onto the string, a new string has
to be made.

Before calling fillString, myString =  
After calling fillString, myString = 

names.dat

Nicholas, Charles
Finin, Tim
desJardins, Marie
Evans, Sue

Example output:

1    DESJARDINS, MARIE
2    EVANS, SUE
3    FININ, TIM
4    NICHOLAS, CHARLES

How about a top-down design.


import string

# printGreeting() explains the program to the user
# Inputs: None
# Output: None
def printGreeting():
    
    print "This program reads in a list of names from a file,"
    print "capitalizes them, sorts them alphabetically, and prints"
    print "them out in a numbered table."


# readNames() reads in the names from a file
# and puts them into a list called names
# Input: filename, the name of the file to open
# Output: names, the list of names 
def readNames(filename):

    names = []

    file = open(filename, 'r')

    for line in file:
        name = line.strip()
        names.append(name)

    file.close()

    return names


# capitalizeNames() capitalizes names (changes the list in-place)
# Inputs: the list of names
# Outputs: None, but the list was modified by this function
def capitalizeNames(names):

    for i in range(len(names)):
        names[i] = names[i].upper()


# printTable() prints the table of numbered names
# one per line
def printTable(names):

    length = len(names)

    for i in range(length):
        print (i + 1), "\t", names[i]


def main():

    printGreeting()

    # get filename from user    
    filename = raw_input("What's the name of the file ? ")

    # process the names
    names = readNames(filename)
    capitalizeNames(names)
    names.sort()

    # print the table
    printTable(names)

main()

Sets

Definition:

Sets are an unordered collection of items, where duplicates are not allowed.
Items must be immutable (such as numbers or strings).

Operations:

Sets in Python

Operators

Methods

Built-in functions that operate on sets

Using sets

When would you use a set?

When order doesn't matter and you don't want to allow duplicates.

linuxserver1.cs.umbc.edu[128] python nameset.py
Enter a name (or . to quit): sue
Enter a name (or . to quit): ben
Enter a name (or . to quit): dan
Enter a name (or . to quit): dawn
Enter a name (or . to quit): tess
Enter a name (or . to quit): dan
Enter a name (or . to quit): dawn
Enter a name (or . to quit): .
sue
ben
dan
dawn
tess
linuxserver1.cs.umbc.edu[129]

Using Sets for HW4

You should have a file that contains the names of the courses
you've taken toward the CS major.  There should be one course
per line.

Please enter the name of this file : courses.txt

Part A Requirements:
You need to take CMSC 411
You need to take CMSC 313
You need to take CMSC 421
You need to take CMSC 341
You need to take CMSC 345
You need to take CMSC 331
You need to take CMSC 441
You need to take CMSC 304

Part B Requirements:
You need to take MATH 221

Oh, no!
We have the right answer, but our list of courses is out of order!

So after trying the 2 different data structures, lists and sets for HW4, what data structure do you think would be the best for this program ?

Associative Arrays

Definition:

Associative arrays are tuples of key-value pairs.  The keys must be immutable (such as numbers or strings).  Values can be any type.  They associate one value with each key.  Keys aren't in any particular order.  Associative arrays are implemented with the Python dictionary type.

Operations:

Associative arrays are built into many languages, but they are called many different things:

Associative Arrays in Python

Creating new dictionaries:

# use the constructor to make an empty dictionary
empty = dict()

# use the empty dictionary literal
empty = {}        

# use the constructor with some initial pairings
# in this case the keys are strings
numbers = dict(one=1, two=2)        

# use the dictionary literal with some initial pairings
capitals = {'MD':'Annapolis', 'ME':'Augusta'}

Operators:

Methods:

Built-in functions that operate on dictionaries:

Dictionary Examples

Operators:

Methods:

copy() - create a copy of the dictionary

>>> cap2 = capitals.copy()
>>> cap2
{'ME': 'Augusta', 'MD': 'Annapolis', 'MI': 'Lansing'}

clear() - remove all entries from the dictionary

>>> cap2.clear()
>>> cap2
{}

get(key) - find the value associated with a key

>>> cap = capitals.get('MD')
>>> cap
'Annapolis'
>>> cap = capitals.get('VA')
>>> cap
>>>

get(key, default) - use a default value when a key isn't found

>>> cap = capitals.get('MD', 'unknown')
>>> cap
'Annapolis'
>>> cap = capitals.get('VA', 'unknown')
>>> cap
'unknown'

has_key(key) - another way to test for key containment

>>> capitals.has_key('ME')
True
>>> capitals.has_key('ND')
False

keys() - create a list of the keys in the dictionary

>>> capitals.keys()
['ME', 'MD', 'MI']

values() - create a list of the values in the dictionary

>>> capitals.values()
['Augusta', 'Annapolis', 'Lansing']

Built-in functions that operate on dictionaries:

len(dict) - count # of keys in the dictionary

>>> len(capitals)
3

del(dict[key]) - remove a key (and its value) from the dictionary

>>> del(capitals['ME'])
>>> capitals
{'MD': 'Annapolis', 'MI': 'Lansing'}
>>> 

Using associative arrays

Why use an associative array?

When you want to associate keys with values.  Keys don't have to be numbers, or if they are, they don't have to be contiguous and starting from zero (like in a list).

ite207-pc-01.cs.umbc.edu[126] python dice.py
Sum     Occurences
------------------
2       38
3       60
4       92
5       104
6       141
7       169
8       132
9       120
10      68
11      43
12      33
ite207-pc-01.cs.umbc.edu[127] 

Dictionary Exercise

Here's the data file

Plymouth 02360
Bar_Harbor 04609
Mystic 06355
Cape_May 08204
Hyde_Park 12538
Niagara_Falls 14301
Paradise 17562
Rehoboth_Beach 19971
D.C. 20001
UMBC 21250
Key_West 33040
Nashville 37201
Custer 57730
West_Yellowstone 59758
Whitefish 59901
Chicago 60610
St._Louis 63121
Kansas_City 66101
Baton_Rouge 70801
Austin 78701
Boulder 80301
Cody 82414
Ketchum 83340
Moab 84532
Tempe 85281
Taos 87571
Las_Vegas 89110
Big_Sur 93920
Carmel 93923
Monterey 93940
San_Francisco 94110

Here's what the output should look like:

linuxserver1.cs.umbc.edu[149] python zipcodes.py
Enter the filename of the zipcode file : zipcodes.txt
Which location (Q to quit) ? Plymouth
02360
Which location (Q to quit) ? Carmel
93923
Which location (Q to quit) ? UMBC
21250
Which location (Q to quit) ? Oshkosh
unknown
Which location (Q to quit) ? Niagara_Falls
14301
Which location (Q to quit) ? Q
linuxserver1.cs.umbc.edu[150]

Which data structure ?

What data structures would you use for these programs?