# Searching and Algorithm Analysis

Sue Evans & Travis Mayberry

# Searching

• We have seen that membership in a list can be checked with the `in` operator:

```python
if x in myList:
    print("found it")
```
• How does Python do this?
• Given a list of integers, how can you search through the values to find one you are looking for?

# Linear Search

• Going through a list item by item to find the value you are looking for is called linear search.
• How would you program this in Python?
```python
# linearSearch() searches for item in myList and returns
# the index of item in myList, or -1 if it is not found.
# Inputs: myList, the list to search for item
#         item, the item to search for
# Output: the index where item was found, or -1 if item was
#         not in the list
def linearSearch(myList, item):

    # check each position in turn, starting at the front
    for index in range(len(myList)):
        if myList[index] == item:
            return index

    # every item was checked without finding a match
    return -1
```

# Analysis

• How can we analyze the efficiency of linear search?
• The easiest metric to use is the number of operations required to find the item you are looking for.
• An operation, for our purposes, is any arithmetic or boolean operation (e.g., checking for equality).
• To make things a little easier, we will only consider the number of operations required in the worst case.

# Linear Search Analysis

• What is the worst case for linear search?
• When the item you are searching for is the last item in the list (or is not in the list at all), every other item has to be checked first. It is literally "the last place you looked."
• For example, in a list of six items the worst case takes six operations, one equality check per item.

# Analysis Metric

• Since every item in the list must be checked in the worst case, the number of operations required is equal to the length of the list.
• More generally: given a list of length n, linear search requires n operations in the worst case.
• As n grows, the amount of work required grows linearly, so this is known as a linear time algorithm.
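To make the operation counting concrete, here is a minimal sketch that instruments linear search so it also reports how many equality checks it performed. The name `linearSearchCount` and its `(index, comparisons)` return value are illustrative assumptions and not part of the original code. Only the `==` checks are counted, because those are the operations this analysis tracks.

```python
def linearSearchCount(myList, item):
    """Linear search that also counts equality comparisons.

    Returns a tuple (index, comparisons); index is -1 if item is missing.
    """
    comparisons = 0
    for index in range(len(myList)):
        comparisons += 1              # one equality check per item visited
        if myList[index] == item:
            return index, comparisons
    return -1, comparisons


# Worst case: -1 is never in range(n), so every item gets checked.
for n in [6, 10, 100, 1000]:
    index, count = linearSearchCount(list(range(n)), -1)
    print("%d items: %d comparisons" % (n, count))
```

Each count equals n, so doubling the list doubles the work, which is what "linear time" means.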

# Improving Search

• If the list is sorted, there is a faster method to search for a specific value, called binary search.
• Think of how you would play the following game:
• One player thinks of a number between 1 and 100.
• The second player repeatedly guesses a number and the first player tells him/her whether the guess is too high, too low or correct.
• The best strategy is to always guess halfway between the boundaries you know the answer lies within. For example, if the secret number is 22:
• 50? Lower
• 25? Lower
• 12? Higher
• 18? Higher
• 21? Higher
• 23? Lower
• 22? Yes
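The halfway strategy fits in a few lines of Python. This is an illustrative sketch: the function name `guessingGame` and its default range of 1 to 100 are assumptions chosen to match the game described above. Integer division (`//`) rounds each halfway point down, and this reproduces the guesses listed above for the secret number 22.

```python
def guessingGame(secret, low=1, high=100):
    """Play the number-guessing game, always guessing halfway.

    Returns the list of guesses made until the secret is found.
    """
    guesses = []
    while low <= high:
        guess = (low + high) // 2       # halfway between the boundaries
        guesses.append(guess)
        if guess == secret:
            return guesses              # "Yes"
        elif guess > secret:
            high = guess - 1            # "Lower": rule out guess and above
        else:
            low = guess + 1             # "Higher": rule out guess and below
    return guesses


print(guessingGame(22))   # [50, 25, 12, 18, 21, 23, 22]
```

With this strategy, no secret number between 1 and 100 takes more than 7 guesses.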

# Binary Search

• Now we can take this idea and translate it into a list searching problem.
• The key idea is to use two variables, called low and high, to keep track of the range of positions in the list where the sought-after item could still be.

### For example, suppose we were trying to find the word "strawberry" in a sorted list of eight fruit names (positions 0 through 7).

• Let's go to the word that's right in the middle of the sorted list.
`low = 0, high = 7, mid = (0 + 7) // 2 => 3`
• "strawberry" comes after "cherry" in the dictionary, so we don't need to consider "cherry" or any words before it.
`low = mid + 1 => 4, high = 7, mid = (4 + 7) // 2 => 5`
• "strawberry" comes after "mango", so we don't need to consider "mango" or any words before it.
`low = mid + 1 => 6, high = 7, mid = (6 + 7) // 2 => 6`
• "strawberry" comes after "orange".
`low = mid + 1 => 7, high = 7, mid = (7 + 7) // 2 => 7`
• "strawberry" comes after "pineapple".
`low = mid + 1 => 8, high = 7`
• Now low is greater than high, so no positions are left to check: "strawberry" is not in the list.

### Another example, finding "banana".

`low = 0, high = 7, mid = (0 + 7) // 2 => 3`
• "banana" comes before "cherry", so we don't need to consider "cherry" or any words after it.
`low = 0, high = mid - 1 => 2, mid = (0 + 2) // 2 => 1`
• "banana" comes after the word at position 1.
`low = mid + 1 => 2, high = 2, mid = (2 + 2) // 2 => 2`
• "banana" is in the position we are looking at, so we are done.

# Binary Search Code

• Seeing how binary search is done, how would you code it in Python?
```python
# binarySearch() performs a binary search for an item in a sorted list
# Inputs: myList, the sorted list to search
#         item, the item to search for
# Output: the index of item in the list, or -1 if not found
def binarySearch(myList, item):

    low = 0
    high = len(myList) - 1

    # keep going while there are still positions left to check
    while low <= high:

        # integer division (//) keeps mid a valid list index
        mid = (low + high) // 2

        # if found return the index
        if item == myList[mid]:
            return mid

        # if item is in the 2nd half of the list
        elif item > myList[mid]:
            low = mid + 1

        # if item is in the 1st half of the list
        else:
            high = mid - 1

    # item was not in list
    return -1
```
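To check the hand traces above against the code, the sketch below adds one line to binary search that records each `mid` it examines. The name `binarySearchTrace` is hypothetical. The traces do not name every word in the list, so `fruits` is an assumption: only banana, cherry, mango, orange and pineapple (positions 2, 3, 5, 6 and 7) come from the traces, and apple, apricot and grape are made up to fill the remaining positions.

```python
def binarySearchTrace(myList, item):
    """Binary search that records each mid position it examines.

    Returns a tuple (index, mids); index is -1 if item is not found.
    """
    low = 0
    high = len(myList) - 1
    mids = []
    while low <= high:
        mid = (low + high) // 2
        mids.append(mid)              # remember where we looked
        if item == myList[mid]:
            return mid, mids
        elif item > myList[mid]:
            low = mid + 1
        else:
            high = mid - 1
    return -1, mids


# Hypothetical sorted list consistent with the traces above
fruits = ["apple", "apricot", "banana", "cherry",
          "grape", "mango", "orange", "pineapple"]

print(binarySearchTrace(fruits, "strawberry"))   # (-1, [3, 5, 6, 7])
print(binarySearchTrace(fruits, "banana"))       # (2, [3, 1, 2])
```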

# Analysis of Binary Search

• How much work does binarySearch do?
• Each time through the loop we halve the number of values we still need to search.
• n, n/2, n/4, n/8, ..., 1
• How many times can we divide n by 2 until we get to 1?
• Let's consider an n that is a power of 2: 32 = 2⁵
• 32/2 = 16
• 16/2 = 8
• 8/2 = 4
• 4/2 = 2
• 2/2 = 1
• Notice that it took 5 steps to reduce the size of the problem to 1.
• In general, it takes log2(n) halvings to reduce n to 1, so binary search goes through its loop about log2(n) times in the worst case.
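The repeated halving can be counted directly. This minimal sketch (the helper name `halvings` is hypothetical) divides n by 2 with integer division until it reaches 1, then compares the count with `math.log2`, which is available in Python 3.3 and later.

```python
import math

def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    steps = 0
    while n > 1:
        n = n // 2
        steps += 1
    return steps


for n in [32, 1000, 1000000]:
    print("%d: %d halvings, log2 = %.2f" % (n, halvings(n), math.log2(n)))
```

For powers of 2 the two agree exactly: 32 gives 5. For other values of n, the count is log2(n) rounded down, so 1,000 gives 9.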

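# How fast is log2(n)?

#### How many accesses will it take to find X as we increase N?

| N | log2(N), rounded | Worst-case accesses by binarySearch |
|---|---|---|
| 1 | 0 | 1 |
| 10 | 3 | 4 |
| 100 | 7 | 7 |
| 1,000 | 10 | 10 |
| 1,000,000 | 20 | 20 |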

Keep in mind that a linear search of a list containing 1,000,000 items would require 1,000,000 accesses in the worst case.

So binary search, which runs in log2(n) time, is amazingly fast!
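The worst-case column of the table can be checked by counting how many list items binary search examines. In this sketch, `binarySearchProbes` is a hypothetical variant of binary search that returns the number of items it looked at. Searching for a value larger than everything in the list sends it into the larger (or equal) half at every step, which is the worst case. A Python 3 `range` object stands in for the list because it supports `len()` and indexing without storing a million integers.

```python
def binarySearchProbes(myList, item):
    """Binary search that returns how many list items it examined."""
    low = 0
    high = len(myList) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1                  # one access to myList[mid]
        if item == myList[mid]:
            return probes
        elif item > myList[mid]:
            low = mid + 1
        else:
            high = mid - 1
    return probes


for n in [1, 10, 100, 1000, 1000000]:
    # n is larger than every item in range(n): the worst case
    print("%d items: %d probes" % (n, binarySearchProbes(range(n), n)))
```

The counts printed are 1, 4, 7, 10 and 20. A linear search of the same 1,000,000 items would examine all 1,000,000 of them.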

# Command-line arguments

There are times when it would be convenient for your program to be able to get information from the operating system's command line. This allows programs to be run in batch mode where the output from one program can be the input for another, etc.

Any number of arguments can be passed from the command line into your program.

Here's an example of code that uses command-line arguments:

```python
import sys

# sys.argv is a list of strings: the script name followed by the arguments
argc = len(sys.argv)

print("Here are the command line arguments: ")

for i in range(argc):
    print("sys.argv[%d] = %s" % (i, sys.argv[i]))
```

Here's the output:

```
linuxserver1.cs.umbc.edu python commandLine.py 2 foo 7.5 bar snoopy jazz
Here are the command line arguments:
sys.argv[0] = commandLine.py
sys.argv[1] = 2
sys.argv[2] = foo
sys.argv[3] = 7.5
sys.argv[4] = bar
sys.argv[5] = snoopy
sys.argv[6] = jazz
linuxserver1.cs.umbc.edu
```
• In the sys module there is a list of strings known as argv. This list will consist of one or more strings.
• sys.argv[0] always contains the name of the script being run (here, commandLine.py).
• As you can see from the example above, each item entered on the command line after that becomes another item in the list of strings. These are known as command-line arguments.
• Everything entered on the command line is a string, so you'll need to convert items that are to be used as numbers, e.g. with int() or float().
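Because every argument arrives as a string, a program usually converts the ones it needs as numbers. The sketch below shows one way to do that, assuming you want to keep non-numeric arguments as strings. The function name `convertArgs` is hypothetical. It tries `int()` first, then `float()`, and catches the `ValueError` each one raises for text it cannot parse.

```python
def convertArgs(args):
    """Convert command-line argument strings to numbers where possible.

    Numeric strings become int or float; all other strings are kept as-is.
    """
    converted = []
    for arg in args:
        try:
            converted.append(int(arg))          # "2" -> 2
        except ValueError:
            try:
                converted.append(float(arg))    # "7.5" -> 7.5
            except ValueError:
                converted.append(arg)           # "foo" stays a string
    return converted


# The arguments from the run above (everything after the script name)
print(convertArgs(["2", "foo", "7.5", "bar", "snoopy", "jazz"]))
# [2, 'foo', 7.5, 'bar', 'snoopy', 'jazz']
```

In a real program you would pass it `sys.argv[1:]`, which skips the script name.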

# Using command-line arguments

This example uses command line arguments to give the program the name of the input file to use and the name of the output file to write during processing. This is a very common use of command line arguments.

If your program needs command line arguments in order to run, then you should include clearly marked usage instructions in your file header comment to explain how to run the program.

```python
# commandLine.py
# Sue Evans
# 11/17/09
# All sections
# bogar@cs.umbc.edu
#
# This is a quiz grading program that illustrates using command-line
# arguments. It also uses file-handling, strings, lists & dictionaries
#
# This program requires command line arguments which are the
# filename of the input file and the filename of the output
# file, in that order.
#
# Usage: python commandLine.py <input file> <output file>
#

import sys

def main():

    NUM_ARGS = 3

    # the answer key for the quiz
    KEY = ['T', 'T', 'F', 'a', 'b', 'c']

    # The student's answer data will be a string in the
    # form ['T','T','F','a','b','c'] so we need the
    # following constants to extract the actual answers
    # from the string. e.g. The first answer, T, is at
    # index 2.  Each subsequent answer is offset by 4.
    ANSWER1_POS = 2
    OFFSET = 4

    # make sure there are NUM_ARGS arguments on the command line
    # (the script name plus two filenames); exit if not
    argc = len(sys.argv)
    if argc != NUM_ARGS:
        print("This program requires command line arguments.")
        print("The first argument is the filename of the input file.")
        print("The second argument is the filename of the output file.")
        print("Usage: python commandLine.py <input file> <output file>")
        sys.exit(1)

    # create an empty dictionary to hold the students' grades
    grades = {}

    # open file for input
    infile = open(sys.argv[1], "r")

    # for each student read in a line, process it
    # and calculate the student's grade
    for line in infile:
        student = line.strip()

        # separate the line into a list of two strings
        # made up of the student's name and a string of her answers
        # e.g. ["Barnes,Beth", "['T','T','F','a','b','c']"]
        student = student.split()
        name = student[0]
        answerString = student[1]

        # create an empty list to hold the student's answers
        answers = []

        # get the size of the answer string
        size = len(answerString)

        # starting at the index of the first answer,
        # pick out every OFFSET-th character
        for i in range(ANSWER1_POS, size, OFFSET):
            answers.append(answerString[i])

        # make the student's name the key and her list of answers
        # the value
        grades[name] = answers

        # calculate the student's score
        score = 0
        for question in range(len(KEY)):
            if grades[name][question] == KEY[question]:
                score += 1

        # change the value to be the score instead of a
        # list of answers
        grades[name] = score

    # close the infile and open the outfile
    infile.close()
    outfile = open(sys.argv[2], "w")

    # make a list of the keys and sort them
    names = list(grades.keys())
    names.sort()

    # write the sorted students' names and their scores to the outfile
    for name in names:
        outfile.write(name + "\t" + str(grades[name]) + "\n")

    # close the output file
    outfile.close()

main()
```

Here's the input file, answers.txt:

```
Barnes,Beth ['T','T','F','a','b','c']
Carson,Ed ['T','F','T','a','b','b']
```

Let's run it!

```
linuxserver1.cs.umbc.edu python commandLine.py answers.txt grades.out
linuxserver1.cs.umbc.edu
```

Here's the resulting output file, grades.out:

```
Barnes,Beth     6
Carson,Ed       3
```

Since I chose to use a dictionary, which does not keep its keys in sorted order, getting a list of the keys and sorting them was necessary to put the roster in order by the students' last names.
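Here is the extraction and scoring step as a standalone function, to show how ANSWER1_POS and OFFSET pick the answers out of each line. This is a sketch: the name `gradeLine` is hypothetical, and `KEY` is the answer key implied by the sample run, since Barnes,Beth's answers score 6 out of 6. The slice `answerString[ANSWER1_POS::OFFSET]` selects the same characters as a `range(ANSWER1_POS, size, OFFSET)` loop.

```python
# Answer key implied by the sample run (Barnes,Beth scores 6 of 6)
KEY = ['T', 'T', 'F', 'a', 'b', 'c']
ANSWER1_POS = 2   # index of the first answer in "['T','T',...]"
OFFSET = 4        # distance between consecutive answers

def gradeLine(line, key=KEY):
    """Return (name, score) for one line of the input file."""
    name, answerString = line.split()
    # every OFFSET-th character, starting at ANSWER1_POS
    answers = list(answerString[ANSWER1_POS::OFFSET])
    score = 0
    for question in range(len(key)):
        if answers[question] == key[question]:
            score += 1
    return name, score


print(gradeLine("Barnes,Beth ['T','T','F','a','b','c']"))  # ('Barnes,Beth', 6)
print(gradeLine("Carson,Ed ['T','F','T','a','b','b']"))    # ('Carson,Ed', 3)
```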

Here is an example with the incorrect number of command line arguments:

```
linuxserver1.cs.umbc.edu python commandLine.py answers.txt
This program requires command line arguments.
The first argument is the filename of the input file.
The second argument is the filename of the output file.
Usage: python commandLine.py <input file> <output file>
linuxserver1.cs.umbc.edu
```
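Instead of checking `len(sys.argv)` by hand, you can use Python's standard `argparse` module (Python 2.7 and 3.2+), which validates the arguments and builds the usage message for you. This sketch is an alternative, not how commandLine.py is written. The function name `parseArgs` and the argument names `infile` and `outfile` are illustrative choices. Passing a list to `parse_args()` makes it easy to try out; when the list is omitted, argparse reads `sys.argv[1:]`.

```python
import argparse

def parseArgs(argv=None):
    """Parse <input file> <output file> from the command line.

    If argv is None, argparse reads sys.argv[1:] itself.
    """
    parser = argparse.ArgumentParser(
        prog="commandLine.py",
        description="Grade quiz answers read from an input file.")
    parser.add_argument("infile", help="filename of the input file")
    parser.add_argument("outfile", help="filename of the output file")
    return parser.parse_args(argv)


args = parseArgs(["answers.txt", "grades.out"])
print(args.infile, args.outfile)
```

If the wrong number of arguments is given, argparse prints a usage line and an error message to standard error, then exits with status 2.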

# Command-line argument Exercise

Write a program that will add values passed in as command-line arguments and will print their sum. The user may enter as many values as they choose on the command line.