# 15. Binary Search & Big-Oh Notation

#### Thursday October 22, 1998

[Previous Lecture] [Next Lecture]


Topics Covered:

• Finish up some topics on sorting.

• One disadvantage of Insertion Sort is that its running time is highly variable. We look at a test program that calls the Insertion Sort routine with an array that is already sorted, and another test program that uses Insertion Sort on an array sorted in the opposite (descending) order. The running times from the sample runs show that Insertion Sort is very fast when the array is already sorted and very slow when the array is sorted in reverse order.

• As with Insertion Sort, the running time of Quicksort is quite variable. If the array is already sorted, Quicksort is in fact very slow. A test program for Quicksort shows that in this case, the running time of Quicksort is proportional to n². (See sample runs.)

• These variations in running time make us question our methods for comparing the "speed" of these sorting algorithms. Is our conclusion that Quicksort is faster than Insertion Sort valid in a general sense, or only valid for the data we used for testing? Should we consider best case, worst case, or average case running time?

• Typically an algorithm is evaluated based upon its worst case running time. This is because most algorithms will run quickly on some input, so best case running times tend to be meaningless. Average case running times are hard to determine, both empirically and theoretically. Also, an algorithm is typically evaluated on its asymptotic behavior --- i.e., how it performs for large inputs. Thus, we say that an algorithm has worst case running time that is O(n²) if for some constant c > 0 and some constant n₀ > 0, the running time of some implementation of the algorithm is less than cn² for all inputs with n items, where n > n₀. (You should read the online notes on asymptotic analysis.)

• It turns out that the worst case running times of Selection Sort and Insertion Sort are O(n²), the worst case running time of Merge Sort is O(n log n), and the average case running time of Quicksort is O(n log n). The intended meaning of these statements is that since n log n is much smaller than n², the running times of Merge Sort and Quicksort are faster than those of Selection Sort and Insertion Sort. However, asymptotic analysis leaves open the possibility that Insertion Sort, for example, might be faster than Quicksort or Merge Sort for sorting a small number of items. In fact, we can exploit this to make our Quicksort and Merge Sort algorithms slightly faster.

• We can tweak the Quicksort and Merge Sort programs for somewhat better performance. We tweaked Merge Sort by calling Insertion Sort for arrays with fewer than 20 numbers instead of recursively calling Merge Sort; for such small arrays, Insertion Sort can be faster than Merge Sort. The sample runs show approximately a 7.5% improvement over the previous running times of Merge Sort.

• Similarly, we can tweak Quicksort to achieve approximately a 9.4% improvement over its previous running times.

• What is the point of sorting? To make searching faster. To find whether an item appears in an unsorted array, we may have to examine every element of the array (e.g., when the item does not appear in the array). In an array that is sorted (say, from smallest to largest), a linear search algorithm would scan the array from the small end to the large end, looking for the given item. As soon as the algorithm finds an item in the array that is larger than the given item, it knows that the given item is not in the array. Thus, on average we would only have to look at n/2 items using the linear search algorithm. This is still very slow. (See program and sample run.)

• Binary search is a recursive divide-and-conquer algorithm. To find an item in a sorted array, we first look at the middle element of the array. If that element is the one we are looking for, then we are done. Otherwise, we either recursively search the first half of the array or the second half of the array, depending on whether the middle element is bigger than or smaller than the given item. Binary search takes O(log n) time in the worst case to find an item in a sorted array with n elements. This is much faster than linear search. (See program and sample run.)
