
Searching and Algorithm Analysis

Sue Evans & Travis Mayberry



Searching

Linear Search

# linearSearch() searches for the item in myList and returns
# the index of item in the list myList, or -1 if not found.
# Inputs: item, the item to search for
#         myList, the list to search for item
# Output: the index where item was found or -1 if item was
#         not in the list
def linearSearch(item, myList):

    for index in range(len(myList)):
        if myList[index] == item:
            return index

    return -1
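
A short usage sketch, assuming linearSearch() as defined above (repeated here so the example runs on its own). The list fruits is made up for illustration; note that linear search does not need the list to be sorted.

```python
def linearSearch(item, myList):
    # check each position in turn until the item is found
    for index in range(len(myList)):
        if myList[index] == item:
            return index
    return -1

# hypothetical sample data, deliberately unsorted
fruits = ["kiwi", "apple", "strawberry", "banana"]

found = linearSearch("strawberry", fruits)    # item is in the list
missing = linearSearch("mango", fruits)       # item is not in the list

print(found)
print(missing)
```

A result of -1 tells the caller the item was not found, so the return value should be checked before it is used as an index.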

Analysis

Linear Search Analysis

Analysis Metric

Improving Search

Binary Search

For example, suppose we were trying to find the word "strawberry".

Another example, finding "banana".
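
The two searches above can be traced in code. This is only a sketch: traceBinarySearch() and the word list are hypothetical, chosen for illustration, and the list must already be in sorted order. The function records each word it examines so the halving is visible.

```python
def traceBinarySearch(words, item):
    # returns the list of words examined, in order, while searching
    low = 0
    high = len(words) - 1
    examined = []

    while low <= high:
        mid = (low + high) // 2
        examined.append(words[mid])

        if item == words[mid]:
            break
        elif item > words[mid]:
            low = mid + 1       # keep only the 2nd half
        else:
            high = mid - 1      # keep only the 1st half

    return examined

# hypothetical sorted word list
words = ["apple", "banana", "cherry", "grape", "kiwi",
         "mango", "orange", "peach", "strawberry"]

strawberrySteps = traceBinarySearch(words, "strawberry")
bananaSteps = traceBinarySearch(words, "banana")

print(strawberrySteps)
print(bananaSteps)
```

With this particular list, "strawberry" is reached after examining four words and "banana" after two, even though the list holds nine.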

Binary Search Code

# binarySearch() performs a binary search for an item in a sorted list
# Inputs: myList, the sorted list to search
#         item, the item to search for
# Output: the index of item in the list, or -1 if not found
def binarySearch(myList, item):

    low = 0
    high = len(myList) - 1

    while low <= high:

        # integer division keeps mid a valid list index
        mid = (low + high) // 2

        # if found return the index
        if item == myList[mid]:
            return mid

        # if item is in the 2nd half of the list
        elif item > myList[mid]:
            low = mid + 1

        # if item is in the 1st half of the list
        else:
            high = mid - 1

    # item was not in list
    return -1

Analysis of Binary Search

How fast is log2(n)?

How many accesses will it take to find X as we increase N?

N            Accesses (about log2(N))
1            1
10           3
100          7
1,000        10
1,000,000    20

Keep in mind that a linear search of a list containing 1,000,000 items would require 1,000,000 accesses in the worst case.

So binary search, which needs only about log2(n) accesses on a sorted list, is amazingly fast!
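
The comparison can be checked with a small sketch. The function name worstCaseBinaryAccesses() is made up for illustration; it counts how many times a list of n items can be halved before nothing is left to check. Because it counts whole accesses, its result can differ by one from a rounded log2(N) value.

```python
def worstCaseBinaryAccesses(n):
    # each access rules out the middle item and one half of the list,
    # leaving at most n // 2 items still to check
    accesses = 0
    while n > 0:
        accesses += 1
        n = n // 2
    return accesses

def worstCaseLinearAccesses(n):
    # linear search may have to look at every item
    return n

for size in [1, 100, 1000, 1000000]:
    print("%d items: linear %d, binary %d" %
          (size, worstCaseLinearAccesses(size), worstCaseBinaryAccesses(size)))
```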

Command-line arguments

There are times when it would be convenient for your program to be able to get information from the operating system's command line. This allows programs to be run in batch mode where the output from one program can be the input for another, etc.

Any number of arguments can be passed from the command line into your program.

Here's an example of code that uses command-line arguments:

import sys

num = len(sys.argv)

for i in range(num):
    print("sys.argv[%d] = %s" % (i, sys.argv[i]))

Here's the output:

linuxserver1.cs.umbc.edu[160] python commandLine.py 2 foo 7.5 bar snoopy jazz
sys.argv[0] = commandLine.py
sys.argv[1] = 2
sys.argv[2] = foo
sys.argv[3] = 7.5
sys.argv[4] = bar
sys.argv[5] = snoopy
sys.argv[6] = jazz
linuxserver1.cs.umbc.edu[161]
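
Two details of that output are worth a sketch: sys.argv[0] is the program's own name, and every argument arrives as a string, even ones that look like numbers. The list argv below is a hypothetical stand-in for sys.argv, matching the run shown above, so the example can be run without typing a command line.

```python
# hypothetical stand-in for sys.argv from the run shown above
argv = ["commandLine.py", "2", "foo", "7.5", "bar", "snoopy", "jazz"]

programName = argv[0]     # the script's name comes first
arguments = argv[1:]      # a slice holds just the user's arguments

# the values are strings, so numbers must be converted before use
count = int(arguments[0])
ratio = float(arguments[2])

print(programName)
print(len(arguments))
print(count + 1)
print(ratio * 2)
```

Converting a non-numeric argument such as "foo" with int() or float() raises a ValueError, so a program should convert only the arguments it expects to be numbers.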

Using command-line arguments

This example uses command line arguments to give the program the name of the input file to use and the name of the output file to write during processing. This is a very common use of command line arguments.

If your program needs command line arguments in order to run, then you should include clearly marked usage instructions in your file header comment to explain how to run the program.

# commandLine.py
# Sue Evans
# 11/17/09
# All sections
# bogar@cs.umbc.edu
#
# This program illustrates using command-line arguments
# It also uses file-handling, strings, lists & dictionaries
#
# Usage: python commandLine.py <input file> <output file>
#
# This program requires two command line arguments: the
# filename of the input file and the filename of the output
# file, in that order.

import sys
import string

NUM_ARGS = 3
ANSWER_KEY = ['T','T','F','a','b','c']
ANSWER1_POS = 2
OFFSET = 4

def main():

    # make sure there are NUM_ARGS arguments on the command line
    # exit if not
    if len(sys.argv) != NUM_ARGS:
        print("This program requires command line arguments.")
        print("The first argument is the filename of the input file.")
        print("The second argument is the filename of the output file.")
        sys.exit()

    grades = {}

    # open file for input
    infile = open(sys.argv[1], "r")

    # for each student read in a line, strip it and
    # split it into the student's name and a string of answers
    for line in infile:
        student = line.strip()
        student = student.split()
        size = len(student[1])

        # change the string of answers to a list of answers
        answers = []
        for i in range(ANSWER1_POS, size, OFFSET):
            answers.append(student[1][i])

        # make the student's name the key and his list of answers the value
        grades[student[0]] = answers

        # calculate the student's score
        score = 0
        size = len(answers)
        for question in range(size):
            if answers[question] == ANSWER_KEY[question]:
                score += 1

        # change the value to be just the score instead of a
        # list of answers
        grades[student[0]] = score

    # close the infile and open the outfile
    infile.close()
    outfile = open(sys.argv[2], "w")

    # make a sorted list of the keys
    names = sorted(grades.keys())

    # write the sorted students' names and their scores to the outfile
    for name in names:
        outfile.write(name + "\t" + str(grades[name]) + "\n")

    # close the output file
    outfile.close()


main()

Here's the input file, answers.txt:

Barnes,Beth ['T','T','F','a','b','c']
Carson,Ed ['T','F','T','a','b','b']
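
The constants ANSWER1_POS and OFFSET depend on this exact layout: the first answer letter sits at position 2 of the bracketed string, and each later one is 4 characters further along. A minimal sketch of that step on the second line of the file, using the same constants as the program; the variable answerString is introduced here for illustration.

```python
ANSWER_KEY = ['T','T','F','a','b','c']
ANSWER1_POS = 2
OFFSET = 4

# the second field of Carson,Ed's line, still just a string
answerString = "['T','F','T','a','b','b']"

# step through the string, picking out only the answer letters
answers = []
for i in range(ANSWER1_POS, len(answerString), OFFSET):
    answers.append(answerString[i])

# compare each answer with the key
score = 0
for question in range(len(answers)):
    if answers[question] == ANSWER_KEY[question]:
        score += 1

print(answers)
print(score)
```

This only works while the file keeps single-character answers with no extra spaces; a different layout would need different constants or a different parsing approach.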

Let's run it!

linuxserver1.cs.umbc.edu[205] python commandLine.py answers.txt grades.out
linuxserver1.cs.umbc.edu[206]

Here's grades.out:

Barnes,Beth     6
Carson,Ed       3

Since I chose to use a dictionary, which does not keep its keys in sorted order, the roster could no longer be relied on to come back in order. Therefore, getting a list of the keys and sorting it was necessary to put the roster back into sorted order by the students' last names.
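
The sorting step can be seen in isolation with a small sketch. The dictionary below is built by hand, with the names deliberately entered out of order, and uses the scores from grades.out.

```python
# scores from grades.out, entered out of alphabetical order
grades = {"Carson,Ed": 3, "Barnes,Beth": 6}

# sorted() returns a new list of the keys in alphabetical order
names = sorted(grades.keys())

# build the same lines the program writes to the output file
lines = []
for name in names:
    lines.append(name + "\t" + str(grades[name]))

print(names)
for line in lines:
    print(line)
```

Because the keys are "Last,First" strings, an alphabetical sort of the keys is also a sort by last name.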

Command-line argument Exercise

Write a program that will add values passed in as command-line arguments and will print their sum. The user may enter as many values as they choose on the command line.