UMBC CMSC202, Computer Science II, Fall 1998,
Sections 0101, 0102, 0103, 0104
15. Binary Search & Big-Oh Notation
Thursday October 22, 1998
Assigned Reading: 6.1-6.5
Handouts (available on-line):
Programs from this lecture:
- Finish up some topics on sorting.
- One disadvantage of Insertion Sort is that its running time varies
greatly with the input. We look at a test program that calls the Insertion
Sort routine with an array that is already sorted. Another test program
uses Insertion Sort to sort an array that is sorted in the opposite order.
Running times from the sample run show that Insertion Sort is very fast
when the array is already sorted and very slow when the array is sorted in
the opposite order.
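The gap between the two cases can be seen by counting element shifts instead of measuring time. The following is a minimal sketch in Python, not the test program from the lecture; the function name and the choice to count shifts are assumptions made for illustration.

```python
def insertion_sort(a):
    """Sort list a in place; return the number of element shifts performed."""
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot to the right to open a gap for key.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts


n = 1000
sorted_shifts = insertion_sort(list(range(n)))            # already sorted
reversed_shifts = insertion_sort(list(range(n, 0, -1)))   # opposite order
print(sorted_shifts, reversed_shifts)
```

On sorted input the inner loop never runs, so no shifts occur. On reversed input every element is shifted past all elements before it, giving n(n-1)/2 shifts.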
- As with Insertion Sort, the running times of Quicksort are quite
variable. If the array is already sorted, Quicksort is in fact very slow.
A test program for Quicksort shows that in these cases, the running time of
Quicksort is proportional to n^2. (See sample runs.)
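One way to see the quadratic behavior is to count comparisons. The sketch below assumes a Quicksort that always picks the first element of the range as the pivot; the notes do not say which pivot rule the lecture program uses, so treat that choice and all names here as hypothetical.

```python
def quicksort(a, lo, hi, count):
    """Sort a[lo..hi] in place using a[lo] as the pivot (an assumed choice).

    count is a one-element list; count[0] accumulates element comparisons.
    """
    if lo >= hi:
        return
    pivot = a[lo]
    last_small = lo  # index of the last element known to be < pivot
    for i in range(lo + 1, hi + 1):
        count[0] += 1
        if a[i] < pivot:
            last_small += 1
            a[last_small], a[i] = a[i], a[last_small]
    # Put the pivot between the smaller and the larger elements.
    a[lo], a[last_small] = a[last_small], a[lo]
    quicksort(a, lo, last_small - 1, count)
    quicksort(a, last_small + 1, hi, count)


n = 200  # kept small: the recursion is n levels deep on sorted input
count = [0]
already_sorted = list(range(n))
quicksort(already_sorted, 0, n - 1, count)
print(count[0], n * (n - 1) // 2)
```

With this pivot rule, a sorted array splits into an empty part and a part with one fewer element at every level, so the comparison count is (n-1) + (n-2) + ... + 1 = n(n-1)/2.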
- These variations in running time make us question our methods for
comparing the "speed" of these sorting algorithms. Is our conclusion that
Quicksort is faster than Insertion Sort valid in a general sense or only
valid for the data we used for testing? Should we consider best case,
worst case or average case running time?
- Typically an algorithm is evaluated based upon its worst case running
time. This is because most algorithms will run quickly on some input, so
best case running times tend to be meaningless. Average case running times
are hard to determine, both empirically and theoretically. Also, an
algorithm is typically evaluated on its asymptotic behavior, i.e.,
how it performs for large inputs. Thus, we say that an algorithm has worst
case running time that is O(n^2) if for some constant c > 0
and some constant n_0 > 0, the running time of some
implementation of the algorithm is less than c n^2 for all inputs with
n items, where n > n_0. (You should read the
online notes on asymptotic analysis.)
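As a concrete instance of the definition, suppose an implementation takes 3n^2 + 10n + 5 steps on inputs with n items. This step count is invented purely for illustration. The sketch below checks numerically that c = 4 and n_0 = 10 satisfy the definition over a finite range of n.

```python
def running_time(n):
    """Hypothetical step count, chosen only to illustrate the definition."""
    return 3 * n * n + 10 * n + 5


c, n0 = 4, 10

# The bound running_time(n) < c * n^2 should hold for every n > n0.
holds = all(running_time(n) < c * n * n for n in range(n0 + 1, 10_000))

# At n = n0 itself the bound is not yet satisfied.
fails_at_n0 = running_time(n0) >= c * n0 * n0

print(holds, fails_at_n0)
```

The loop only tests finitely many values. The bound holds for all n > 10 because 3n^2 + 10n + 5 < 4n^2 is equivalent to n^2 - 10n - 5 > 0, which is true once n is at least 11.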
- It turns out that the worst case running times of Selection Sort and
Insertion Sort are O(n^2), the worst case running time of Merge
Sort is O(n log n) and the average case running time of Quicksort is
O(n log n). The intended meaning of these statements is that since
n log n is much smaller than n^2 for large n, Merge Sort and Quicksort
are faster than Selection Sort and Insertion Sort on large inputs.
However, asymptotic analysis leaves open the possibility that
Insertion Sort, for example, might be faster than Quicksort or Merge Sort
for sorting a small number of items. In fact, we can exploit this
to make our Quicksort and Merge Sort algorithms slightly faster.
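To get a feel for how far apart n log n and n^2 are, the short sketch below tabulates both for a few input sizes. The sizes are arbitrary powers of two, and the logarithm is taken base 2 as an assumption; a different base changes the values only by a constant factor.

```python
import math


def growth_table(sizes):
    """Return (n, n * log2(n), n * n) rows for the given input sizes."""
    return [(n, n * math.log2(n), n * n) for n in sizes]


rows = growth_table([16, 1024, 1048576])
for n, nlogn, nsq in rows:
    print(f"n={n:>8}  n log n={nlogn:>12.0f}  n^2={nsq:>14}  ratio={nsq / nlogn:.0f}")
```

The ratio n^2 / (n log n) = n / log n grows with n, which is why the gap matters for large inputs but says little about very small ones.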
- We can tweak the Quicksort and Merge Sort programs
for somewhat better performance. We tweaked Merge Sort by calling
Insertion Sort for arrays with fewer than 20 numbers instead of recursively
calling Merge Sort. For small arrays, Insertion Sort can be faster than
Merge Sort. The sample runs show approximately 7.5% improvement in the
running times compared to the previous running times of Merge Sort.
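The Merge Sort tweak can be sketched as follows. This is a Python illustration under stated assumptions, not the program from the lecture: the function names are hypothetical, and only the cutoff of 20 comes from the notes.

```python
CUTOFF = 20  # from the notes: ranges with fewer than 20 items use Insertion Sort


def insertion_sort_range(a, lo, hi):
    """Sort a[lo:hi] in place with Insertion Sort."""
    for i in range(lo + 1, hi):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def hybrid_merge_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place; small ranges are handed to Insertion Sort."""
    if hi is None:
        hi = len(a)
    if hi - lo < CUTOFF:
        insertion_sort_range(a, lo, hi)
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(a, lo, mid)
    hybrid_merge_sort(a, mid, hi)
    # Merge the two sorted halves through a temporary list.
    merged = []
    i, j = lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(a[j])
            j += 1
    merged.extend(a[i:mid])
    merged.extend(a[j:hi])
    a[lo:hi] = merged


data = [(i * 7919) % 1009 for i in range(500)]
hybrid_merge_sort(data)
print(data[:5])
```

The best cutoff depends on the machine and the implementation, so the value 20 should be treated as a starting point to measure against, not a universal constant.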
- Similarly, we can tweak Quicksort to achieve a 9.4% improvement in the
running times compared to the previous running times.
- What is the point of sorting? To make searching faster. To find
whether an item appears in an unsorted array, we may have to examine every
element of the array (e.g., when the item does not appear in the array).
In an array that is sorted (say, from smallest to largest), a linear search
algorithm would scan the array from the small end to the large end, looking
for the given item. As soon as the algorithm finds an item in the array
that is larger than the given item, it knows that the item is not in the
array. Thus, on average we would only have to look at about n/2 items using
the linear search algorithm. This is still very slow: the running time is
still proportional to n. (See program and sample run.)
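The early exit in a sorted array can be sketched like this. It is a minimal Python illustration with a hypothetical function name, not the lecture's program.

```python
def sorted_linear_search(a, target):
    """Return the index of target in the sorted list a, or -1 if absent."""
    for i, value in enumerate(a):
        if value == target:
            return i
        if value > target:
            # Every later element is larger still, so target cannot appear.
            return -1
    return -1


numbers = [2, 4, 6, 8, 10]
print(sorted_linear_search(numbers, 6), sorted_linear_search(numbers, 5))
```

The early exit saves work only when the item is missing; a search for an item larger than everything in the array still examines all n elements.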
- Binary search is a recursive divide-and-conquer algorithm. To find an
item in a sorted array, we first look at the middle element of the array.
If that element is the one we are looking for, then we are done.
Otherwise, we recursively search the first half of the array if the middle
element is bigger than the given item, or the second half if it is
smaller. Binary search takes O(log n) time in
the worst case to find an item in a sorted array with n elements. This
is much faster than linear search. (See program and sample run.)
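The recursive procedure described above can be sketched as follows. This is an illustrative Python version with hypothetical names and parameters, assuming the array is sorted from smallest to largest; it is not the program from the lecture.

```python
def binary_search(a, target, lo=0, hi=None):
    """Return an index of target in the sorted list a, or -1 if absent.

    Only the range a[lo..hi] (inclusive) is searched.
    """
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        return -1  # empty range: the item is not present
    mid = (lo + hi) // 2
    if a[mid] == target:
        return mid
    if a[mid] > target:
        # Middle element is bigger than the item: search the first half.
        return binary_search(a, target, lo, mid - 1)
    # Middle element is smaller than the item: search the second half.
    return binary_search(a, target, mid + 1, hi)


values = [1, 3, 5, 7, 9, 11, 13]
print(binary_search(values, 9), binary_search(values, 4))
```

Each call discards about half of the remaining range, so the number of calls is roughly log2(n) in the worst case, which matches the O(log n) bound stated above.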
29 Oct 1998 17:14:03 EST