CMSC 202 Lecture Notes: Introduction to Sorting

This handout documents some of the material covered in the sorting lectures.

Introductory Ideas

There are two basic types of sorting algorithms:
1. comparison-based sorting, and
2. address-calculation-based sorting.
Radix sort is an example of an address-calculation-based method. These methods are not covered here.
Examples of comparison-based algorithms are:
1. O(n²) algorithms
  - bubblesort
  - insertion sort
  - selection sort
2. O(n lg(n)) algorithms
  - merge sort
  - quick sort (average behavior)
  - heap sort
In the worst case, Quicksort is an O(n²) algorithm.
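To make the two growth rates concrete, here is a minimal sketch that counts the key comparisons made by selection sort (an O(n²) method) and by a top-down merge sort (an O(n lg(n)) method). The notes contain no code, so Python, the function names, and the one-element `counter` list used to accumulate counts are all our own illustrative choices, not part of the original material.

```python
import math
import random

def selection_sort(keys, counter):
    """Return a sorted copy of keys; counter[0] accumulates key comparisons."""
    a = list(keys)
    for i in range(len(a) - 1):
        smallest = i
        for j in range(i + 1, len(a)):
            counter[0] += 1               # one comparison per inner-loop step
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a

def merge_sort(keys, counter):
    """Return a sorted copy of keys (top-down merge sort), counting comparisons."""
    if len(keys) <= 1:
        return list(keys)
    mid = len(keys) // 2
    left = merge_sort(keys[:mid], counter)
    right = merge_sort(keys[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1                   # compare the two front keys
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:                             # ties take the left key (stable)
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])               # at most one of these is non-empty
    merged.extend(right[j:])
    return merged

for n in (16, 256, 1024):
    data = list(range(n))
    random.Random(n).shuffle(data)
    sel, mrg = [0], [0]
    assert selection_sort(data, sel) == merge_sort(data, mrg) == sorted(data)
    # n, selection-sort comparisons, merge-sort comparisons, n lg(n) for reference
    print(n, sel[0], mrg[0], round(n * math.log2(n)))
```

Selection sort always makes exactly n(n-1)/2 comparisons, whatever the input order, while merge sort stays within roughly n lg(n).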
Any comparison-based sorting method needs, on average, at least lg(n!) comparisons to sort n items, and lg(n!) grows proportionately to n lg(n).

Definition: A comparison tree (sometimes called a decision tree) is a binary tree in which, at each internal node, a comparison is made between two keys and in which each leaf represents a sorted arrangement of keys. Every one of the n! permutations of the n items to be sorted must be represented in the comparison tree, so the tree must have at least n! leaves. In a tree that makes no redundant comparisons, every leaf represents exactly one of the permutations, and the number of leaves is exactly n!.

The following figure is a comparison tree that sorts 3 items. Each node in the tree asks one question about the relative order of a, b and c. The answer to the question determines which branch below the node is taken. Each node is also labeled with the set of possible permutations of a, b and c that is consistent with the questions that have been answered so far.

Fig 1: Comparison Tree for 3 Items

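The "worst-case" number of comparisons in the tree is the length of the longest root-to-leaf path. For the three-item tree above, the longest path has length 3. This is the ceiling of lg(n!): lg(3!) = lg(6) ≈ 2.58, which rounds up to 3. Because a binary tree of height h has at most 2^h leaves, no comparison tree for 3 items can have a shorter longest path. The "average" number of comparisons is the sum of the path lengths divided by the number of leaves, that is,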

    (2 + 3 + 3 + 3 + 3 + 2) / 6 = 2.67
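The arithmetic above can be checked by writing a comparison tree for 3 items directly as nested `if` statements and running it on all 3! = 6 input orders. This is a sketch under stated assumptions: the function `sort3` is hypothetical, keys are assumed distinct, and the tree shape is one tree whose leaf depths match the path lengths 2, 3, 3, 3, 3, 2 used above; it is not claimed to be the exact tree drawn in Fig 1.

```python
import math
from itertools import permutations

def sort3(a, b, c):
    """Sort three distinct keys by walking a fixed comparison tree.

    Returns (sorted keys, number of comparisons on the path taken)."""
    if a < b:
        if b < c:
            return (a, b, c), 2          # leaf a<b<c
        if a < c:
            return (a, c, b), 3          # leaf a<c<b
        return (c, a, b), 3              # leaf c<a<b
    if a < c:
        return (b, a, c), 2              # leaf b<a<c
    if b < c:
        return (b, c, a), 3              # leaf b<c<a
    return (c, b, a), 3                  # leaf c<b<a

path_lengths = []
for keys in permutations((1, 2, 3)):     # all 3! = 6 input orders, one per leaf
    result, comparisons = sort3(*keys)
    assert result == (1, 2, 3)
    path_lengths.append(comparisons)

worst = max(path_lengths)
average = sum(path_lengths) / len(path_lengths)
print(worst, round(average, 2))                   # 3 2.67
print(math.ceil(math.log2(math.factorial(3))))    # 3
```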

It can be shown that, for any comparison tree that sorts n items, the average path length is at least lg(n!), and lg(n!) grows proportionately to n lg(n) as n increases. Thus, no sorting algorithm based on comparisons can achieve better average performance than n lg(n), to within a constant factor.
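A short numerical sketch of the bound: it computes lg(n!) as a sum of logarithms (to avoid forming the huge integer n!), the resulting minimum worst case ceil(lg(n!)), and the ratio lg(n!) / (n lg(n)), which rises slowly toward 1 as n grows. It relies on the standard fact that a binary tree with N leaves has height, and average leaf depth, of at least lg(N); the function names are our own and purely illustrative.

```python
import math

def lg_factorial(n):
    """lg(n!) computed as a sum of base-2 logs, avoiding the huge integer n!."""
    return sum(math.log2(k) for k in range(2, n + 1))

def min_worst_case_comparisons(n):
    """Smallest possible height of a comparison tree with n! leaves: ceil(lg(n!))."""
    return math.ceil(lg_factorial(n))

for n in (3, 4, 5, 10, 100, 1000, 10000):
    ratio = lg_factorial(n) / (n * math.log2(n))
    print(f"n={n:6d}  ceil(lg n!)={min_worst_case_comparisons(n):7d}  "
          f"lg(n!)/(n lg n)={ratio:.3f}")
```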

Question: Since merge sort is an O(n lg(n)) algorithm and selection sort is an O(n²) algorithm, why would one ever choose the "slower" selection sort over the "faster" merge sort?
Answer: Selection sort can be faster than merge sort when n is small. Because it is a simpler algorithm, it is likely to have a smaller constant of proportionality than merge sort, and for small n the constant can matter more than the growth rate. The following figure shows an example.

Fig 2: Comparing the functions 10n² and 30n lg(n) for small values of n.
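The crossover in Fig 2 can be located with a few lines of arithmetic. The constants 10 and 30 are the illustrative ones from the figure caption, not measured costs of any real implementation, and the function names are hypothetical; with different constants the crossover point moves.

```python
import math

def quadratic_cost(n):
    return 10 * n * n                 # 10n², as in the Fig 2 caption

def n_lg_n_cost(n):
    return 30 * n * math.log2(n)      # 30n lg(n), as in the Fig 2 caption

# Values of n for which the "slower" quadratic function is actually smaller.
quadratic_smaller = [n for n in range(1, 101) if quadratic_cost(n) < n_lg_n_cost(n)]
print(quadratic_smaller)   # [2, 3, 4, 5, 6, 7, 8, 9]
```

With these constants, 10n² < 30n lg(n) exactly when n < 3 lg(n), which holds only for 2 ≤ n ≤ 9.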

Question: Since merge sort and Quicksort are both O(n lg(n)) algorithms, why choose one over the other?
Answer: Quicksort usually runs faster on average. Both have the same growth behavior with increasing n, but Quicksort typically has the smaller constant of proportionality.

Question: Well, then, why ever use merge sort?
Answer: The average performance of Quicksort is O(n lg(n)), but some inputs produce O(n²) performance (for example, input that is already sorted, when the first key is always chosen as the pivot). Merge sort's performance is O(n lg(n)) in both the average and the worst case. If you don't want to risk your data producing Quicksort's worst case, you might choose merge sort (or another algorithm that is O(n lg(n)) in the worst case, such as heap sort).
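A minimal sketch of Quicksort's worst case, assuming the simplest pivot rule (always the first key). That rule is our assumption for illustration; practical implementations usually pick pivots more carefully (for example, randomly or by median-of-three), which makes this particular bad input harmless. On already-sorted input every partition is maximally lopsided, so the comparison count is exactly n(n-1)/2, while a shuffled copy of the same keys needs far fewer.

```python
import random

def quicksort(keys, counter):
    """Quicksort using the FIRST key as pivot; counter[0] accumulates comparisons."""
    if len(keys) <= 1:
        return list(keys)
    pivot, smaller, larger = keys[0], [], []
    for k in keys[1:]:
        counter[0] += 1                      # each key is compared with the pivot once
        (smaller if k < pivot else larger).append(k)
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)

n = 200
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(202).shuffle(shuffled)

sorted_count, shuffled_count = [0], [0]
assert quicksort(already_sorted, sorted_count) == already_sorted
assert quicksort(shuffled, shuffled_count) == already_sorted
print(sorted_count[0], n * (n - 1) // 2)   # 19900 19900
print(shuffled_count[0])                   # typically a few thousand
```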

Thomas A. Anastasio, Thu Nov 13 16:28:15 EST 1997

Modified by Richard Chang Thu Jan 22 2:56:48 EST 1998.
