Cover page images (keyboard)

Computing with Strings

Sue Evans & Travis Mayberry

Adapted from the CS1 Course at Swarthmore College by Lisa Meeden


Hit the space bar for next slide

Learning Outcomes

String Type

Just like range(x) returns a sequence of numbers, strings are sequences of characters

>>> for ch in 'hello':
...     print ch
...
h
e
l
l
o
>>>

Unlike other languages, strings in Python are denoted using either single or double quotes

>>> myString = "Goodbye"
>>> myString
'Goodbye'

This allows us to easily have quoted text within a string.

>>> fact = '"The Cat In the Hat" was written by Dr. Suess.'
>>> print fact
"The Cat In the Hat" was written by Dr. Suess.
>>>

String Representation

String Representation

Slicing

Slicing

String Operators

Operation Python Operator
Concatenation
+
Repetition
*
Indexing
[ ]
Slicing
[ : ]

Examples using these operators:

>>> 'snow' + 'ball'
'snowball'  
>>> 'hello' * 3
'hellohellohello'

Oops. Let's see if we can do better ...

>>> 'hello ' * 3 + '!'
'hello hello hello !'

Notice in the next example that indexing begins with 0

>>> 'hello'[4]
'o'

Also notice, when slicing, that the character at the first index is included in the slice and all characters from that position up to one less than the second index shown.

>>> 'hello'[1:4]
'ell'

Python also allows use of negative indexes.
An index of -1 is equivalent to the largest index of the string.

>>> 'sample'[-1]
'e'
>>> len('sample')
6
>>> 'sample'[6]
Traceback (most recent call last):
  File "", line 1, in ?
IndexError: string index out of range
>>> 'sample'[5]
'e'
>>> 'sample'[-3]
'p'

String Library

In order to use the string library, you have to import it :
import string

Function Purpose
string.capitalize(s) Returns s with only the first letter capitalized
string.capwords(s) Returns s with the first letter of each word capitalized
string.count(s, sub) Returns the number of times sub occurs in s
string.find(s, sub) Finds the first occurrence of sub in s and returns
its position or -1 if it is not found
string.join(list) Concatenates a list of strings together to make one string
string.upper(s) Returns a copy of s in all capital letters
string.lower(s) Returns a copy of s in all lower case letters
string.split(s, c) Returns a list of strings made by splitting s on each occurence of c
string.strip(s) Removes all whitespace from the beginning and end of s

String Library Examples

How are characters stored?

As binary, of course. Everything is stored in binary!

The technique used for characters is that each character is assigned a number (an integer), and that number is stored in binary.

Recall that an integer is 4 bytes big, with 8 bits/byte, that means an integer needs 32 bits of memory. That allows us to count up to 2147483647. But do we really need all that space to represent a character ?

If you think about the English language and everything that's necessary to communicate, there are very few individual characters :

Type of character Number
Upper-case characters
26
Lower-case characters
26
Digits
10
Arithmetic Operators
9
Punctuation Marks
24
Total
95

This means that in order to assign a number to each character, we only need to count to 95.
How many bits does that take ?

value bits
1
1
2
2
4
3
8
4
16
5
32
6
64
7
128
8

So by storing characters in just one byte (8 bits), the amount of memory needed for each character is only 1/4 of the amount of space needed by an int. This is a tremendous amount of saved space for storing text.
So all characters are only one byte big.

Trivia Question: How big is a nibble ?

ASCII

American Standard Code for Information Interchange or ASCII was derived from telegraphic codes. Work began on ASCII in 1960 with the first version published in 1963. There was a major revision in 1967. The current version became available in 1986.

There are 128 characters defined in ASCII. 33 are nonprinting control characters that control communication devices and printers, like line feed or form feed. 32 (0 - 31) control characters are at the beginning of the chart and one at the end, delete (127). Many of the control characters are obsolete. The values 32 - 126 are the definitions of the printable characters made up of numbers, letters, punctuation, whitespace and symbols.

Just as integers can be cast into floats, characters can be cast into their ASCII values, and from ASCII values back into their characters. Numbers have ASCII values from 48-57, while uppercase letters are 65-90 and lowercase letters are 97-122. Here is a full ASCII table.

>>> chr(104)
'h'
>>> ord('h')
104

Let's print out an ASCII chart for just the printable characters.
How would we do that ?

Console Input

What would you expect the following to do?

x = input() 
#User types: hello 
print x

Output:

Traceback (most recent call last):
  File "<stdin>';line 1, in ?
  File "<string>", line 0, in ?
NameError: name 'hello' is not defined

This is because the string is placed directly where input() was in the code. The user could surround his string with quotes in order to make this work, but you should never leave technical issues for the user to handle. There is a better way to write this code.

Raw Input

ASCII Exercise

Write a program that prints the ASCII value of each letter
in a string all on the same line ?