Cover page images (keyboard)

Searching and Algorithm Analysis

Sue Evans & Travis Mayberry



Searching

Linear Search

# linearSearch() searches for the item in myList and returns
# the index of item in the list myList, or -1 if not found.
# Inputs: myList, the list to search for item
#         item, the item to search for
# Output: the index where item was found or -1 if index was 
#         not in the list  
def linearSearch(myList, item):

    for index in range(len(myList)):
        if myList[index] == item:
            return index

    return -1

Analysis

Linear Search Analysis

Analysis Metric

Improving Search

Binary Search

For example, suppose we were trying to find the word "strawberry".

Another example, finding "banana".

Binary Search Code

# binarySearch() performs a binary search for an item in a list
# Inputs: myList, the list to search
#         item, the item to search for
# Output: the index of item in the list, or -1 if not found
def binarySearch(myList, item):

    low = 0
    high = len(myList) - 1

    while low <= high:

        mid = (low + high) / 2


        # if found return the index
        if item == myList[mid]:
            return mid

        # if item is in the 2nd half of the list
        elif item > myList[mid]:
            low = mid + 1

        # if item is in the 1st half of the list
        else:
            high = mid - 1

    # item was not in list
    return -1

Analysis of Binary Search

How fast is log2(n) ?

How many accesses will it take to find X as we increase N ?

N
log2(N)
1
1
10
3
100
7
1,000
10
1,000,000
20

Keep in mind that a linear search of a list containing 1,000,000 items would require 1,000,000 accesses in the worst case.

So binary search which runs in log2(n) is amazingly fast!

Command-line arguments

There are times when it would be convenient for your program to be able to get information from the operating system's command line. This allows programs to be run in batch mode where the output from one program can be the input for another, etc.

Any number of arguments can be passed from the command line into your program.

Here's an example of code that uses command-line arguments:

import sys

argc = len(sys.argv)

print "Here are the command line arguments: "

for i in range(argc):

    print "sys.argv[%d] = %s" % (i, sys.argv[i])

Here's the output:

linuxserver1.cs.umbc.edu[160] python commandLine.py 2 foo 7.5 bar snoopy jazz
Here are the command line arguments:
sys.argv[0] = commandLine.py
sys.argv[1] = 2
sys.argv[2] = foo
sys.argv[3] = 7.5
sys.argv[4] = bar
sys.argv[5] = snoopy
sys.argv[6] = jazz
linuxserver1.cs.umbc.edu[161]

Using command-line arguments

This example uses command line arguments to give the program the name of the input file to use and the name of the output file to write during processing. This is a very common use of command line arguments.

If your program needs command line arguments in order to run, then you should have a clearly marked usage instructions in your file header comment to explain how to run the program.

# commandLine.py
# Sue Evans
# 11/17/09
# All sections
# bogar@cs.umbc.edu
#
# This is a quiz grading program that illustrates using command-line 
# arguments. It also uses file-handling, strings, lists & dictionaries
#
# This program requires command line arguments which are the
# filename of the input file and the filename of the output
# file, in that order.
#
# Usage: python commandLine.py <input file> <output file>
#

import sys
import string

def main():

    NUM_ARGS = 3
    ANSWER_KEY = ['T','T','F','a','b','c']

    # The student's answer data will be a string in the 
    # form ['T','T','F','a','b','c'] so we need the 
    # following constants to extract the actual answers  
    # from the string. e.g. The first answer, T, is at 
    # index 2.  Each subsequent answer is offset by 4.
    ANSWER1_POS = 2
    OFFSET = 4

    # make sure there are NUM_ARGS arguments on the command line
    # exit if not
    argc = len(sys.argv)
    if argc != NUM_ARGS:
        print "This program requires command line arguments."
        print "The first argument is the filename of the input file."
        print "The second argument is the filename of the output file."    
        print "Usage: python commandLine.py <input file> <output file>"
        sys.exit()

    # create an empty dictionary to hold the students' grades
    grades = {}

    # open file for input
    infile = open(sys.argv[1], "r")

    # for each student read in a line, process it 
    # and calculate the student's grade 
    for line in infile:
        student = string.strip(line)

        # separate the line into a list of two strings 
        # made up of the student's name and a string of her 
        # answers 
        # e.g. ["Barnes,Beth", "['T','T','F','a','b','c']"] 
        student = string.split(student)

        # create an empty list to hold the student's answers 
        answers = [] 
 
        # get the size of the answer string 
        size = len(student[1]) 

        # convert the string of answers into a list of answers 
        # starting at the index of the first answer 
        for i in range(ANSWER1_POS, size, OFFSET):
            answers.append(student[1][i])

        # make the student's name the key and her list of answers 
        # the value
        grades[student[0]] = answers

        # calculate the student's score
        score = 0
        size = len(answers)
        for question in range(size):
            if answers[question] == ANSWER_KEY[question]:
                score += 1

        # change the value to be the score instead of a
        # list of answers
        grades[student[0]] = score

    # close the infile and open the outfile
    infile.close()
    outfile = open(sys.argv[2], "w")

    # make a list of the keys and sort them
    names = grades.keys()
    names.sort()

    # write the sorted students' names and their scores to the outfile
    for name in names:
        outfile.write(name + "\t" + str(grades[name]) + "\n")

    # close the output file
    outfile.close()


main()

Here's the input file, answers.txt :

Barnes,Beth ['T','T','F','a','b','c']
Carson,Ed ['T','F','T','a','b','b']

Let's run it!

linuxserver1.cs.umbc.edu[205] python commandLine.py answers.txt grades.out
linuxserver1.cs.umbc.edu[206]

Here's grades.out :

Barnes,Beth     6
Carson,Ed       3

Since I chose to use a dictionary, things became out of order immediately. Therefore, getting a list of keys and sorting them was necessary to get the roster back into sorted order by the students' last names.

Here is an example with the incorrect number of command line arguments:

linuxserver1.cs.umbc.edu[207] python commandLine.py answers.txt 
This program requires command line arguments.
The first argument is the filename of the input file.
The second argument is the filename of the output file.
Usage: python commandLine.py <input file> <output file>
linuxserver1.cs.umbc.edu[208]

Command-line argument Exercise

Write a program that will add values passed in as command-line arguments and will print their sum. The user may enter as many values as they choose on the command line.