Sue Evans & Will Murnane

- Understand that sorting techniques apply for many simple scenarios.
- Ability to describe how a few well-known sorting methods work.
- Understand that some algorithms run faster than others.
- Realize the ordering of run-times (fastest to slowest):
  - log_{2} n
  - n, a.k.a. linear
  - n * log_{2} n
  - n^{2}
- Ability to write code to implement a simple sort

Suppose we have a list of numbers and we want to put them in increasing order. How could we go about this?

Well, we could start at the beginning of the list, then when we see two things that are out of order, swap them so they're in order. Then we go back to the beginning of the list. When we get all the way to the end of the list without doing any swaps we're done. Here it is in code:

    def swap(aList, i, j):
        temp = aList[i]
        aList[i] = aList[j]
        aList[j] = temp

    def mySort(aList):
        index = 0
        size = len(aList)
        while index < size - 1:
            if aList[index] > aList[index + 1]:
                # print the list to see how it changes
                print aList
                swap(aList, index, index + 1)
                index = 0
            else:
                index = index + 1

and here's an example execution:

    >>> a=[3,45,2,1,6,7]; mySort(a); print a
    [3, 45, 2, 1, 6, 7]
    [3, 2, 45, 1, 6, 7]
    [2, 3, 45, 1, 6, 7]
    [2, 3, 1, 45, 6, 7]
    [2, 1, 3, 45, 6, 7]
    [1, 2, 3, 45, 6, 7]
    [1, 2, 3, 6, 45, 7]
    [1, 2, 3, 6, 7, 45]

We can see how the numbers sort of "bubble" from right to left. This method is called "bubble sort".
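The `mySort` function above goes back to the beginning of the list after every swap. The version of bubble sort found in most textbooks instead makes repeated full passes over the list and stops when a pass makes no swaps. Here is a minimal sketch of that pass-based form; the name `bubble_sort` is ours, and the code avoids `print` so that it runs unchanged under Python 2 or Python 3.

```python
def bubble_sort(items):
    """Sort items in place using pass-based bubble sort."""
    end = len(items) - 1          # last index that still needs checking
    swapped = True
    while swapped and end > 0:
        swapped = False
        # One pass: compare each neighboring pair and swap if out of order.
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        # The largest value in the scanned range has now settled at
        # position 'end', so the next pass can stop one item earlier.
        end -= 1


a = [3, 45, 2, 1, 6, 7]
bubble_sort(a)
```

After each pass the largest remaining value ends up in its final position, which is why the scanned range can shrink by one each time.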

We start out with an unsorted list. The idea is to select the item that should be next in the sorted portion of the list from the items in the unsorted part of the list. If we want our sorted list to be in ascending order, then the next thing in the sorted part of the list will be the minimum value in the unsorted part of the list.

    # File:         selection.py
    # Author:       Sue Evans
    # Date Written: 11/12/09
    # Section:      All
    # EMail:        bogar@cs.umbc.edu
    #
    # This program sorts a list of numbers using
    # selection sort.


    # selectionSort() selects the smallest value
    # in the unsorted portion of the list and
    # moves it into the current position.  The
    # values "swap" positions.
    # Input:  a list to be sorted
    # Output: None, but the list is sorted "in place"
    def selectionSort(aList):

        size = len(aList)

        for unsorted in range(size):

            # get the index of the smallest value
            # in the unsorted part of the array
            smallestIndex = findSmallest(aList, unsorted, size)

            # swap values
            temp = aList[smallestIndex]
            aList[smallestIndex] = aList[unsorted]
            aList[unsorted] = temp

            print aList


    # findSmallest() finds the smallest value in the
    # list from index start up to, but not including,
    # index stop, and returns the smallest value's index.
    #
    # Input:  a list to search
    #         the 'start'ing and 'stop'ping indices in
    #         the list between which to search for the smallest
    # Output: returns the index corresponding to the smallest value
    def findSmallest(aList, start, stop):

        smallestIndex = start
        smallestValue = aList[start]

        # look for the smallest value in the
        # unsorted part of the array
        for i in range(start + 1, stop):
            if (aList[i] < smallestValue):
                smallestIndex = i
                smallestValue = aList[i]

        return (smallestIndex)


    def main():

        aList = [3, 45, 2, 1, 6, 7]
        size = len(aList)

        selectionSort(aList)

        print "\nIn main():"
        for i in range(size):
            print "aList[%d] = %d" % (i, aList[i])

    main()

and a sample execution:

    ite207-pc-01.cs.umbc.edu[134] python selection.py
    [1, 45, 2, 3, 6, 7]
    [1, 2, 45, 3, 6, 7]
    [1, 2, 3, 45, 6, 7]
    [1, 2, 3, 6, 45, 7]
    [1, 2, 3, 6, 7, 45]
    [1, 2, 3, 6, 7, 45]

    In main():
    aList[0] = 1
    aList[1] = 2
    aList[2] = 3
    aList[3] = 6
    aList[4] = 7
    aList[5] = 45
    ite207-pc-01.cs.umbc.edu[135]

Now that we know how to sort items in ascending order using Selection Sort, how would you change the code so that it sorts in descending order instead?

    # findLargest() finds the largest value in the
    # list from index start up to, but not including,
    # index stop, and returns the largest value's index.
    #
    # Input:  a list to search
    #         the 'start'ing and 'stop'ping indices in
    #         the list between which to search for the largest
    # Output: returns the index corresponding to the largest value
    def findLargest(aList, start, stop):

        largestIndex = start
        largestValue = aList[start]

        # look for the largest value in the
        # unsorted part of the array
        for i in range(start + 1, stop):
            if (aList[i] > largestValue):
                largestIndex = i
                largestValue = aList[i]

        return (largestIndex)


    # selectionSort() selects the largest value
    # in the unsorted portion of the list and
    # moves it into the current position.  The
    # values "swap" positions.
    # Input:  a list to be sorted
    # Output: None, but the list is sorted "in place"
    def selectionSort(aList):

        size = len(aList)

        for unsorted in range(size):

            # get the index of the largest value
            # in the unsorted part of the array
            largestIndex = findLargest(aList, unsorted, size)

            # swap values
            temp = aList[largestIndex]
            aList[largestIndex] = aList[unsorted]
            aList[unsorted] = temp

            print aList


    def main():

        aList = [3, 45, 2, 1, 6, 7]
        size = len(aList)

        selectionSort(aList)

        print "\nIn main():"
        for i in range(size):
            print "aList[%d] = %d" % (i, aList[i])

    main()

Let's run it!

    linuxserver1.cs.umbc.edu[109] python descending.py
    [45, 3, 2, 1, 6, 7]
    [45, 7, 2, 1, 6, 3]
    [45, 7, 6, 1, 2, 3]
    [45, 7, 6, 3, 2, 1]
    [45, 7, 6, 3, 2, 1]
    [45, 7, 6, 3, 2, 1]

    In main():
    aList[0] = 45
    aList[1] = 7
    aList[2] = 6
    aList[3] = 3
    aList[4] = 2
    aList[5] = 1
    linuxserver1.cs.umbc.edu[110]

Another method of sorting, called Insertion Sort, takes whatever item it finds as the first item in the unsorted part of the list, and puts it in its proper place in the sorted part of the list. Here's the code:

    # File:         insertion.py
    # Author:       Sue Evans
    # Date Written: 11/12/09
    # Section:      All
    # EMail:        bogar@cs.umbc.edu
    #
    # This program sorts a list of numbers using
    # insertion sort.


    # insertionSort() gets the next value in the
    # unsorted portion of the list and moves it
    # into its correct sorted position.
    # Input:  a list to be sorted
    # Output: None, but the list is sorted
    def insertionSort(aList):

        size = len(aList)

        for i in range(1, size):
            j = i
            temp = aList[j]

            while j > 0 and aList[j - 1] > temp:
                aList[j] = aList[j - 1]
                j -= 1

            aList[j] = temp
            print aList


    def main():

        aList = [3, 45, 2, 1, 6, 7]
        size = len(aList)

        insertionSort(aList)

        print "\nIn main():"
        for i in range(size):
            print "aList[%d] = %d" % (i, aList[i])

    main()

and an example execution:

    ite207-pc-01.cs.umbc.edu[149] python insertion.py
    [3, 45, 2, 1, 6, 7]
    [2, 3, 45, 1, 6, 7]
    [1, 2, 3, 45, 6, 7]
    [1, 2, 3, 6, 45, 7]
    [1, 2, 3, 6, 7, 45]

    In main():
    aList[0] = 1
    aList[1] = 2
    aList[2] = 3
    aList[3] = 6
    aList[4] = 7
    aList[5] = 45
    ite207-pc-01.cs.umbc.edu[150]

So far, all the sorts we've seen have one thing in common:

given a list of n items, they do something like n^{2} operations.

To see this, consider an execution of Insertion Sort with a list of n items.
The outer loop runs n-1 times (i goes from 1 to n-1), and each time that loop
is executed, the inner loop may shift as many as i items. In the worst case
that adds up to 1 + 2 + ... + (n-1) = n(n-1)/2 shifts, so this algorithm
executes in about n^{2} steps. Can we do better than this?
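To check the n^{2} estimate by experiment, we can count how many times the inner loop of insertion sort moves an item. The sketch below is our own instrumented copy of the algorithm (the name `count_insertion_steps` is made up for this example), run on a list in reverse order, which is the worst case.

```python
def count_insertion_steps(items):
    """Run insertion sort on a copy of items and return the number
    of inner-loop shifts that were performed."""
    work = list(items)            # leave the caller's list untouched
    steps = 0
    for i in range(1, len(work)):
        temp = work[i]
        j = i
        while j > 0 and work[j - 1] > temp:
            work[j] = work[j - 1]
            j -= 1
            steps += 1            # one item moved one place to the right
        work[j] = temp
    return steps


n = 10
worst_case = list(range(n, 0, -1))        # [10, 9, ..., 1]
steps = count_insertion_steps(worst_case)
# For a reversed list every item shifts all the way to the front,
# so steps equals n * (n - 1) // 2, which is 45 when n is 10.
```

Doubling n roughly quadruples the count, which is the signature of an n^{2} algorithm. A list that is already sorted needs no shifts at all.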

Merge sort works by treating each element of the list as its own sorted list (since there's only one element, it must be sorted), then recursively merges pairs of sorted lists together until the whole list is sorted. We won't look at code for this.

This sort runs in time n * log_{2} n. To see this, consider how
many merges we need to run before the list is fully sorted.

When we start, we have n lists of size 1, then we merge these into n/2
lists of size 2. This process takes n steps to merge the lists. Then we
merge these n/2 lists of size 2 into n/4 lists of size 4, again taking n
steps. We continue this process until we have just 1 list of size n.
Since the number of items in each list doubles each time we merge, after k
steps there will be n / 2^{k} lists left, each with 2^{k}
items in it.

With some algebra we find that when k = log_{2} n, there is only
one list left. Then, since we do log_{2} n merges, each of which
takes n steps, the total time is n * log_{2} n.
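The slides skip the code for merge sort, but the counting argument above is easy to try out. The following is only an illustrative sketch, not a tuned implementation: it starts with n one-item lists, merges neighboring pairs round by round, and reports how many rounds were needed. The names `merge` and `merge_sort_rounds` are ours.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One of the lists is used up; copy whatever is left of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort_rounds(items):
    """Return (sorted copy of items, number of merge rounds used)."""
    runs = [[x] for x in items]   # n sorted lists of size 1
    rounds = 0
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])    # odd list out waits for the next round
        runs = merged
        rounds += 1
    return (runs[0] if runs else []), rounds


result, rounds = merge_sort_rounds([5, 2, 8, 1, 9, 3, 7, 4])
# With n = 8 items, rounds is 3, which is log_{2} 8.
```

Each round touches every item once while merging, so the total work is about n steps per round times log_{2} n rounds.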

Heap sort works by "partially sorting" the list, known as heapifying it.

This process is a bit complicated, but in essence it puts the biggest element at the front of the list and makes it easy to get the next-biggest element. Then the full sort consists of heapifying the list, then repeatedly removing the largest remaining element and putting it at the end of the list. We won't look at code for this either.

Heap sort also runs in n * log_{2} n time.
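Writing the heapify step by hand is beyond these slides, but Python's standard `heapq` module can stand in for it if you want to experiment. One caveat: `heapq` keeps the smallest item at the front, not the biggest, so this sketch is the mirror image of the description above. It repeatedly removes the smallest item and builds the result from front to back. The function name `heap_sort` is ours.

```python
import heapq


def heap_sort(items):
    """Return a new sorted list built by repeatedly popping a heap."""
    heap = list(items)            # copy, so the caller's list is untouched
    heapq.heapify(heap)           # "partially sort": smallest item at index 0
    # Each pop removes the smallest remaining item and repairs the heap.
    return [heapq.heappop(heap) for _ in range(len(heap))]


result = heap_sort([3, 45, 2, 1, 6, 7])
```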

Quick sort works by picking a "pivot", an element of the list, which the list is partitioned around.

This process moves all the elements that are smaller than the pivot element to its left, and all the elements that are greater than the pivot to its right. Then the left and right sub-lists are partitioned, and so forth until each sublist has size 1.

Quick sort also runs in n * log_{2} n time on average, although a consistently bad choice of pivot can slow it down to about n^{2}.
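To make the partitioning idea concrete, here is a simplified sketch. It is not the in-place version described above: for readability it builds new lists for the smaller and larger elements. Picking the middle element as the pivot is an arbitrary choice on our part, and the name `quick_sort` is ours.

```python
def quick_sort(items):
    """Return a new sorted list using a simple (not in-place) quick sort."""
    if len(items) <= 1:
        return list(items)        # nothing left to partition
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Everything in 'smaller' belongs to the left of the pivot and
    # everything in 'larger' to its right, so sort each side separately.
    return quick_sort(smaller) + equal + quick_sort(larger)


result = quick_sort([3, 45, 2, 1, 6, 7])
```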

So far we've discussed six methods of sorting. Which one should you use? There are a couple of ways of looking at this.

First, unless you're working on a very specific application, you're probably better off using the sort function that's built into whatever language you're using. This sort function is usually coded in a way that makes it fast and efficient. And it's been debugged, so you don't have to go through the hassle of writing your own sort.

However, there are some cases where you have knowledge about the list you're
dealing with that makes writing your own sort function worthwhile. For
example, suppose your list is already mostly sorted, but you have a few
items you want to add to it in a way that you end up with a sorted list.
You could just tack the new items on and use Heap sort to re-sort the list,
but this would take n * log_{2} n steps.

What if, instead, you take the new items and use insertion sort to
put them in their place? This takes up to n steps per item, but if there
are fewer than log_{2} n new items to put into the list then
this is faster than a full heap sort.
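In Python, one way to try this is the standard library's `bisect.insort`, which puts a single item into an already-sorted list at its proper place. This is a sketch of the idea, not the hand-written insertion sort from earlier. `insort` finds the position with a binary search, but making room for the new item can still move up to n items. The helper name `add_items` is ours.

```python
import bisect


def add_items(sorted_list, new_items):
    """Insert each new item into sorted_list (in place), keeping it sorted."""
    for item in new_items:
        bisect.insort(sorted_list, item)


numbers = [1, 2, 3, 6, 7, 45]     # already sorted
add_items(numbers, [5, 0])
```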

Let's look at some sort animations to compare these well-known sorting algorithms.

[Animations: Bubble, Selection, Insertion, Merge, Heap, Quick]

Python uses a sort known as Timsort, written by Tim Peters in 2002.

It's described by Wikipedia as:

"a hybrid sorting algorithm derived from merge sort and insertion sort,
designed to perform well on many kinds of real-world data."

a problem-solving tool

Problems:

- Find the minimum & maximum of a list of values
- Recall from Lab 10 that an easy way to find the minimum and maximum is to sort the list of values.

    >>> values = [7, 6, 9, 3, 4, 1]
    >>> values.sort()
    >>> minimum = values[0]
    >>> maximum = values[-1]
    >>> minimum
    1
    >>> maximum
    9
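Sorting is not the only route to this particular answer. If all we need is the smallest and largest value, Python's built-in `min` and `max` functions each scan the list once and leave its order unchanged. A short sketch:

```python
values = [7, 6, 9, 3, 4, 1]

minimum = min(values)   # smallest value, found in one pass
maximum = max(values)   # largest value, found in one pass
# values is still in its original order here
```

Sorting becomes worth it when we need more than the two extremes, as in the next problems.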

- Sort the scores.
- lab score = sum of the scores at index 3 - 12

    >>> scores = [3, 3, 3, 3.5, 3.5, 0, 3, 2, 3, 3.5, 3.5, 2, 0]
    >>> scores.sort()
    >>> labScore = 0
    >>> size = len(scores)
    >>> for i in range(3, size):
    ...     labScore += scores[i]
    ...
    >>> labScore
    31.0

- Sort the scores.
- Find the average of scores at index 1 to len(list) - 2

    >>> gymScores = [9, 8, 9, 9, 8, 10, 10, 9, 9, 8]
    >>> gymScores.sort()
    >>> total = 0
    >>> size = len(gymScores)
    >>> for i in range(1, size - 1):
    ...     total += gymScores[i]
    ...
    >>> average = float(total) / (size - 2)
    >>> average
    8.875

- Sort the scores.
- Add items to the result list only when the item changes.

    def uniq(aList):
        aList.sort()
        result = []
        lastItem = None
        for item in aList:
            if item != lastItem:
                result.append(item)
                lastItem = item
        return result

    print uniq([1,2,4,2,3,5,2,1,3])

    [1, 2, 3, 4, 5]

Suppose we have a list of things that we want to sort by something other than just the natural ordering—for example, if we had a list of strings that we want to sort by length instead of alphabetically. We could take this list and turn it into a list of things whose natural sort order puts the items of the original list into the order we want them in. Here's what it looks like in code:

    >>> stringList = ["a", "apple", "hello", "trucks", "junk"]
    >>> decorated = []
    >>> for item in stringList:
    ...     decorated.append((len(item), item))
    ...
    >>> print decorated
    [(1, 'a'), (5, 'apple'), (5, 'hello'), (6, 'trucks'), (4, 'junk')]

Now if we sort this list, the strings we started with are in order by length.

    >>> decorated.sort()
    >>> print decorated
    [(1, 'a'), (4, 'junk'), (5, 'apple'), (5, 'hello'), (6, 'trucks')]

Then we can take this list and undecorate it (remove the counts and leave only the original strings), which will still be in sorted order by their length.

    >>> undecorated = []
    >>> for item in decorated:
    ...     undecorated.append(item[1])
    ...
    >>> undecorated
    ['a', 'junk', 'apple', 'hello', 'trucks']

This method, as suggested by the variable names we've used, is called

"decorate-sort-undecorate" or "the Schwartzian transform".

It's a common method of sorting by orders other than the default one.
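One detail worth noticing: when two strings have the same length, the decorated tuples are compared by their second part, so ties are broken alphabetically. If we would rather keep tied items in their original order, a common refinement is to decorate with the item's position as well. The sketch below is one way to write that; the function name `sort_by_length` and the sample words are ours.

```python
def sort_by_length(strings):
    """Decorate-sort-undecorate, keeping equal-length strings in input order."""
    # Decorate: (length, original position, string)
    decorated = [(len(s), i, s) for i, s in enumerate(strings)]
    # Sort: tuples compare by length first, then by original position.
    decorated.sort()
    # Undecorate: keep only the strings.
    return [s for _, _, s in decorated]


result = sort_by_length(["pear", "a", "fig", "kiwi"])
# "pear" stays ahead of "kiwi" because it came first in the input,
# even though "kiwi" comes first alphabetically.
```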

Sort by non-default fields

The decorate-sort-undecorate method shown in the last slide works, but it's a little messy. As it does its work, it takes up temporary space that is bigger than the original list of items. This can be a concern if a lot of items are present in the list. To alleviate this problem, Python (like many other languages) has built-in functionality that lets us do this non-default sort without the extra steps.

Suppose we are tracking how many students get a particular grade, and we want to sort the results in various ways. We've got a dictionary built like this:

grades = {'A': 13, 'B': 27, 'C': 11, 'D': 4, 'F': 0}

Now we want to display this data in two ways: first, sorted by grade. We
can extract the elements of the dictionary to a list by using the
`items` method, and then sort it:

    >>> grades.items()
    [('A', 13), ('C', 11), ('B', 27), ('D', 4), ('F', 0)]
    >>> sorted(grades.items())
    [('A', 13), ('B', 27), ('C', 11), ('D', 4), ('F', 0)]

So, we can easily sort by letter grade. But, suppose we wanted to see
which grades were most common?

In other words, we want to sort by the second part of each item. To do
this, we could write a function that gets the second part of an item, and
pass it to the sort function:

    >>> def getSortKey(item):
    ...     return item[1]
    ...
    >>> sorted(grades.items(), key = getSortKey)
    [('F', 0), ('D', 4), ('C', 11), ('A', 13), ('B', 27)]

Now the list is sorted by the second attribute of the items. This is
a common thing to do, so Python has a function that does the same thing
as `getSortKey`. We can load and use it like this:

    >>> from operator import itemgetter
    >>> sorted(grades.items(), key = itemgetter(1))
    [('F', 0), ('D', 4), ('C', 11), ('A', 13), ('B', 27)]

As you can see, the `itemgetter` function takes an argument that
specifies which attribute of the items it should return for comparison.

What would the earlier "sort strings by length" example look like with a
`key` function?

    >>> stringList = ["apple", "a", "hello", "trucks", "junk"]
    >>> sorted(stringList, key = len)
    ['a', 'junk', 'apple', 'hello', 'trucks']

Sometimes it's not easy to come up with a function that generates a sort key
for a particular sorted order that we want. We can also use a `cmp`
function that will be given two items to compare. As an example, here's
what the default `cmp` function does with integers:

    >>> cmp(5, 3)
    1
    >>> cmp(5, 5)
    0
    >>> cmp(5, 7)
    -1

So the default cmp function returns a positive number if the first item is
larger, zero if they're equal, or a negative number if the first item is smaller.

Suppose we wanted to sort the `grades` dictionary as before, but
this time using a comparison function:

    >>> def myCmp(a, b):
    ...     if a[1] > b[1]:
    ...         return 1
    ...     if a[1] < b[1]:
    ...         return -1
    ...     return 0
    ...
    >>> sorted(grades.items(), cmp = myCmp)
    [('F', 0), ('D', 4), ('C', 11), ('A', 13), ('B', 27)]

This may seem like a trivial example, since we could use a `key`
function. But some languages don't have an equivalent to the `key`
function, so we want to introduce many ways of accomplishing this task.

Let's try a more complicated example. Suppose we have a list of lists: each item in the list is itself a list of the answers a student gave on the midterm exam. We have a list of the correct answers, and we want to sort the students by the grade they got.

    ANSWER_KEY = [ ... ]

    def byGrade(student1, student2):
        score1 = score2 = 0
        for question in range(len(student1)):
            if student1[question] == ANSWER_KEY[question]:
                score1 += 1
            if student2[question] == ANSWER_KEY[question]:
                score2 += 1
        return cmp(score1, score2)
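A note for readers on Python 3: the built-in `cmp` function and the `cmp` argument to `sort` and `sorted` were removed in that version. The sketch below shows two ways the same ordering might be obtained there: a `key` function that computes each student's score, and `functools.cmp_to_key`, which wraps a comparison function so it can be passed as a `key`. The answer key and student answers are invented purely for illustration, and the names `score` and `by_grade` are ours.

```python
from functools import cmp_to_key

# Hypothetical data, made up for this example only.
ANSWER_KEY = ['a', 'c', 'b', 'd']
students = [
    ['a', 'c', 'b', 'd'],   # 4 correct
    ['b', 'c', 'a', 'd'],   # 2 correct
    ['a', 'c', 'a', 'd'],   # 3 correct
]


def score(student):
    """Count how many of a student's answers match the answer key."""
    return sum(1 for given, correct in zip(student, ANSWER_KEY)
               if given == correct)


def by_grade(student1, student2):
    """Comparison function: negative, zero or positive, like cmp()."""
    s1, s2 = score(student1), score(student2)
    return (s1 > s2) - (s1 < s2)


by_key = sorted(students, key=score)                  # key-function version
by_cmp = sorted(students, key=cmp_to_key(by_grade))   # comparison version
```

Both calls give the same order, lowest score first. The `key` version also computes each student's score only once, whereas a comparison function recomputes scores every time two students are compared.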

Three ways to get the list sorted in the *opposite* order:

Suppose we want to sort a list in descending order: the big items come
first, then the smaller ones. The default sort in Python (and many other
languages) is an ascending-order sort. How could we go about getting the
items sorted in the opposite order?

- We could sort the list and then use the `reverse` method:

      >>> aList = [4,1,6,3,8,2,5]
      >>> aList.sort()
      >>> aList
      [1, 2, 3, 4, 5, 6, 8]
      >>> aList.reverse()
      >>> aList
      [8, 6, 5, 4, 3, 2, 1]

  This works, but it doesn't feel quite right; we just spent a lot of time putting it in one order, then we go back and swap everything around.

- Instead, we can define a comparator function that puts things in backwards order:

      >>> def reverseCmp(a, b):
      ...     if a > b:
      ...         return -1
      ...     if a < b:
      ...         return 1
      ...     return 0
      ...
      >>> aList.sort(cmp = reverseCmp)
      >>> aList
      [8, 6, 5, 4, 3, 2, 1]

  This, too, works, but there's one more thing we could do to optimize it.

- Python's sort function can take another argument, `reverse`. If we give `True` as its value, then the list is sorted in descending order.

      >>> aList.sort(reverse = True)
      >>> aList
      [8, 6, 5, 4, 3, 2, 1]
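The `reverse` argument also combines with a `key` function. As a small sketch, here is the earlier `grades` dictionary listed with the most common grade first; the variable name `most_common_first` is ours.

```python
from operator import itemgetter

grades = {'A': 13, 'B': 27, 'C': 11, 'D': 4, 'F': 0}

# Sort the (grade, count) pairs by count, largest count first.
most_common_first = sorted(grades.items(), key=itemgetter(1), reverse=True)
# most_common_first is
# [('B', 27), ('A', 13), ('C', 11), ('D', 4), ('F', 0)]
```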