Introduction to Sorting

Sue Evans & Will Murnane



Learning Outcomes

Sorting

Suppose we have a list of numbers and we want to put them in increasing order. How could we go about this?

Well, we could start at the beginning of the list, then when we see two things that are out of order, swap them so they're in order. Then we go back to the beginning of the list. When we get all the way to the end of the list without doing any swaps we're done. Here it is in code:

def swap(aList, i, j):

  temp = aList[i]
  aList[i] = aList[j]
  aList[j] = temp

def mySort(aList):

  index = 0
  size = len(aList)

  while index < size - 1:

    if aList[index] > aList[index + 1]:

      # print the list to see how it changes
      print aList 

      swap(aList, index, index + 1)
      index = 0

    else:
      index = index + 1

and here's an example execution:

>>> a=[3,45,2,1,6,7]; mySort(a); print a
[3, 45, 2, 1, 6, 7]
[3, 2, 45, 1, 6, 7]
[2, 3, 45, 1, 6, 7]
[2, 3, 1, 45, 6, 7]
[2, 1, 3, 45, 6, 7]
[1, 2, 3, 45, 6, 7]
[1, 2, 3, 6, 45, 7]
[1, 2, 3, 6, 7, 45]

We can see how the smaller numbers "bubble" from right to left. This method is a simple variant of "bubble sort".
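
The version above jumps back to the start of the list after every swap. A more conventional way to write bubble sort, sketched here as an illustration rather than as the code these notes use, makes repeated left-to-right passes and stops after a pass with no swaps. The name bubbleSort() is hypothetical.

```python
# A minimal sketch of a conventional bubble sort (assumed variant,
# not the mySort() function from these notes).
def bubbleSort(aList):
    size = len(aList)
    swapped = True

    # keep making passes until a whole pass needs no swaps
    while swapped:
        swapped = False
        for index in range(size - 1):
            if aList[index] > aList[index + 1]:
                # swap the two neighbors that are out of order
                aList[index], aList[index + 1] = aList[index + 1], aList[index]
                swapped = True

        # the largest remaining value is now in its final place,
        # so the next pass can stop one position earlier
        size -= 1

a = [3, 45, 2, 1, 6, 7]
bubbleSort(a)
print(a)
```

Because each pass pushes the largest remaining value to the right end, this version never has to rescan the part of the list that is already in place.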

Selection Sort

We start out with an unsorted list. The idea is to select the item that should be next in the sorted portion of the list from the items in the unsorted part of the list. If we want our sorted list to be in ascending order, then the next thing in the sorted part of the list will be the minimum value in the unsorted part of the list.

# File:         selection.py
# Author:       Sue Evans
# Date Written: 11/12/09
# Section:      All
# EMail:        bogar@cs.umbc.edu
#
#   This program sorts a list of numbers using 
#   selection sort.


# selectionSort() selects the smallest value
# in the unsorted portion of the list and
# moves it into the current position.  The
# values "swap" positions. 
# Input:    a list to be sorted
# Output:   None, but the list is sorted "in place"
def selectionSort(aList):

   size = len(aList)

   for unsorted in range(size):

      # get the index of the smallest value 
      # in the unsorted part of the array 
      smallestIndex = findSmallest(aList, unsorted, size)

      # swap values
      temp = aList[smallestIndex]
      aList[smallestIndex] = aList[unsorted]
      aList[unsorted] = temp
      print aList


# findSmallest() finds the smallest value in the
# list from index start up to, but not including,
# index stop, and returns the smallest value's index.
#
# Input:  a list to search
#         the 'start'ing and 'stop'ping indices in
#         the list between which to search for the smallest
# Output: returns the index corresponding to the smallest value
def findSmallest(aList, start, stop):

   smallestIndex = start
   smallestValue = aList[start]

   # look for the smallest value in the 
   # unsorted part of the array 
   for i in range(start + 1, stop):
      if (aList[i] < smallestValue):
         smallestIndex = i
         smallestValue = aList[i]

   return (smallestIndex)


def main(): 

   aList = [3, 45, 2, 1, 6, 7]

   size = len(aList)

   selectionSort(aList)
      
   print "\nIn main():"
   for i in range(size):
      print "aList[%d] = %d" % (i, aList[i])

main()

and a sample execution:

ite207-pc-01.cs.umbc.edu[134] python selection.py
[1, 45, 2, 3, 6, 7]
[1, 2, 45, 3, 6, 7]
[1, 2, 3, 45, 6, 7]
[1, 2, 3, 6, 45, 7]
[1, 2, 3, 6, 7, 45]
[1, 2, 3, 6, 7, 45]

In main():
aList[0] = 1
aList[1] = 2
aList[2] = 3
aList[3] = 6
aList[4] = 7
aList[5] = 45
ite207-pc-01.cs.umbc.edu[135] 

Sort Exercise

Now that we know how to sort items in ascending order using Selection Sort, how would you change the code so that it sorts in descending order instead?


# findLargest() finds the largest value in the
# list from index start up to, but not including,
# index stop, and returns the largest value's index.
#
# Input:  a list to search
#         the 'start'ing and 'stop'ping indices in
#         the list between which to search for the largest
# Output: returns the index corresponding to the largest value
def findLargest(aList, start, stop):

   largestIndex = start
   largestValue = aList[start]

   # look for the largest value in the 
   # unsorted part of the array 
   for i in range(start + 1, stop):
      if (aList[i] > largestValue):
         largestIndex = i
         largestValue = aList[i]

   return (largestIndex)


# selectionSort() selects the largest value
# in the unsorted portion of the list and
# moves it into the current position.  The
# values "swap" positions. 
# Input:    a list to be sorted
# Output:   None, but the list is sorted "in place"
def selectionSort(aList):

   size = len(aList)

   for unsorted in range(size):

      # get the index of the largest value 
      # in the unsorted part of the array 
      largestIndex = findLargest(aList, unsorted, size)

      # swap values
      temp = aList[largestIndex]
      aList[largestIndex] = aList[unsorted]
      aList[unsorted] = temp
      print aList


def main(): 

   aList = [3, 45, 2, 1, 6, 7]

   size = len(aList)

   selectionSort(aList)
      
   print "\nIn main():"
   for i in range(size):
      print "aList[%d] = %d" % (i, aList[i])

main()

Let's run it!

linuxserver1.cs.umbc.edu[109] python descending.py
[45, 3, 2, 1, 6, 7]
[45, 7, 2, 1, 6, 3]
[45, 7, 6, 1, 2, 3]
[45, 7, 6, 3, 2, 1]
[45, 7, 6, 3, 2, 1]
[45, 7, 6, 3, 2, 1]

In main():
aList[0] = 45
aList[1] = 7
aList[2] = 6
aList[3] = 3
aList[4] = 2
aList[5] = 1
linuxserver1.cs.umbc.edu[110]
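
The ascending and descending versions differ only in the comparison, so one way to avoid keeping two copies of the code is to pass the direction in as a parameter. This is only a sketch of that idea. The function name selectionSortOrdered() and the descending parameter are hypothetical and are not part of selection.py or descending.py.

```python
# A sketch of one selection sort that handles both directions
# (hypothetical helper, not the code from these notes).
def selectionSortOrdered(aList, descending=False):
    size = len(aList)

    for unsorted in range(size):
        # index of the value that belongs at position 'unsorted'
        best = unsorted

        for i in range(unsorted + 1, size):
            if descending:
                better = aList[i] > aList[best]
            else:
                better = aList[i] < aList[best]
            if better:
                best = i

        # swap the chosen value into the current position
        aList[unsorted], aList[best] = aList[best], aList[unsorted]

up = [3, 45, 2, 1, 6, 7]
selectionSortOrdered(up)
print(up)

down = [3, 45, 2, 1, 6, 7]
selectionSortOrdered(down, descending=True)
print(down)
```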

Insertion Sort

Another method of sorting, called Insertion Sort, takes whatever item it finds as the first item in the unsorted part of the list, and puts it in its proper place in the sorted part of the list. Here's the code:

# File:         insertion.py
# Author:       Sue Evans
# Date Written: 11/12/09
# Section:      All
# EMail:        bogar@cs.umbc.edu
#
# This program sorts a list of numbers using 
# insertion sort.


# insertionSort() gets the next value in the
# unsorted portion of the list and moves it
# into its correct sorted position.
# Input:    a list to be sorted
# Output:   None, but the list is sorted
def insertionSort(aList):

   size = len(aList)

   for i in range(1, size):
 
      j = i
      temp = aList[j]

      while j > 0 and aList[j - 1] > temp:

         aList[j] = aList[j - 1]
         j -= 1
         
      aList[j] = temp
      print aList


def main(): 

   aList = [3, 45, 2, 1, 6, 7]

   size = len(aList)

   insertionSort(aList)
      
   print "\nIn main():"
   for i in range(size):
      print "aList[%d] = %d" % (i, aList[i])

main()

and an example execution:

ite207-pc-01.cs.umbc.edu[149] python insertion.py
[3, 45, 2, 1, 6, 7]
[2, 3, 45, 1, 6, 7]
[1, 2, 3, 45, 6, 7]
[1, 2, 3, 6, 45, 7]
[1, 2, 3, 6, 7, 45]

In main():
aList[0] = 1
aList[1] = 2
aList[2] = 3
aList[3] = 6
aList[4] = 7
aList[5] = 45
ite207-pc-01.cs.umbc.edu[150] 

Runtime of these sorts

All of the sorts we've seen so far are slow on large lists.

They have one thing in common:
given a list of n items, they do something like n^2 operations in the worst case. (The mySort() version above, which jumps back to the start of the list after every swap, can do even more.)

To see this, consider an execution of Insertion Sort with a list of n items. The outer loop runs i from 1 to n-1, and each time that loop is executed, the inner loop may walk back from position i all the way to position 0. In the worst case (a list in reverse order) that is 1 + 2 + ... + (n-1) = n(n-1)/2 steps, which grows like n^2. Can we do better than this?
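
To make that growth concrete, here is a small sketch that runs the same insertion sort loop on a copy of a list and counts how many times the inner loop body executes. The helper name countInsertionSteps() is hypothetical. On a reverse-sorted list of n items the count should come out to n(n-1)/2.

```python
# Count inner-loop steps of insertion sort (illustrative helper,
# not part of insertion.py).
def countInsertionSteps(aList):
    work = list(aList)      # sort a copy so the caller's list is untouched
    steps = 0

    for i in range(1, len(work)):
        j = i
        temp = work[j]

        while j > 0 and work[j - 1] > temp:
            work[j] = work[j - 1]
            j -= 1
            steps += 1      # one shift of one item

        work[j] = temp

    return steps

for n in (10, 100, 1000):
    worst = list(range(n, 0, -1))    # reverse order is the worst case
    print(countInsertionSteps(worst) == n * (n - 1) // 2)
```

Making the list ten times longer makes the step count roughly a hundred times larger, which is the signature of n^2 growth.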

Merge sort

Merge sort works by treating each element of the list as its own sorted list (since there's only one element, it must be sorted), and then repeatedly merging pairs of sorted lists together until the whole list is sorted. We won't look at code for this.

This sort runs in time n * log2(n). To see this, consider how many merges we need to run before the list is fully sorted.

When we start, we have n lists of size 1, then we merge these into n/2 lists of size 2. This process takes n steps to merge the lists. Then we merge these n/2 lists of size 2 into n/4 lists of size 4, again taking n steps. We continue this process until we have just 1 list of size n. Since the number of items in each list doubles each time we merge, after k steps there will be n / 2^k lists left, each with 2^k items in it.

With some algebra we find that when k = log2(n), there is only one list left. Then, since we do log2(n) merges, each of which takes n steps, the total time is n * log2(n).
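
These notes skip the code for merge sort, but for readers who want to check the counting argument, here is a minimal sketch of the bottom-up process described above. The function names merge() and mergeSort() are assumptions made for this illustration. This version builds new lists rather than sorting in place, and it also returns the number of merging rounds so it can be compared with log2(n).

```python
# A sketch of bottom-up merge sort (illustrative, not from these notes).
def merge(left, right):
    # combine two already-sorted lists into one sorted list
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # one of the two lists is used up; copy over whatever remains
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def mergeSort(aList):
    # start with n sorted lists of size 1
    runs = [[x] for x in aList]
    rounds = 0

    while len(runs) > 1:
        merged = []
        # merge neighboring pairs of lists
        for k in range(0, len(runs) - 1, 2):
            merged.append(merge(runs[k], runs[k + 1]))
        # an odd list out is carried forward unchanged
        if len(runs) % 2 == 1:
            merged.append(runs[-1])
        runs = merged
        rounds += 1

    if runs:
        return runs[0], rounds
    return [], rounds

result, rounds = mergeSort([3, 45, 2, 1, 6, 7, 8, 0])
print(result)
print(rounds)
```

With 8 items the lists go from 8 to 4 to 2 to 1, so the sketch reports 3 rounds, which matches log2(8) = 3.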

Heap sort

Heap sort works by "partially sorting" the list, known as heapifying it.

This process is a bit complicated, but in essence it puts the biggest element at the front of the list and makes it easy to get the next-biggest element. Then the full sort consists of heapifying the list, and then repeatedly removing the largest element, putting it at the end of the list, and restoring the heap among the items that remain. We won't look at code for this either.

Heap sort also runs in n * log2(n) time.
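
Although these notes don't cover heap sort code, a short sketch may help show what "heapifying" means in practice. It assumes the usual array layout for a heap, where the children of the item at index i live at indices 2*i + 1 and 2*i + 2. The names siftDown() and heapSort() are hypothetical.

```python
# A sketch of in-place heap sort using a max-heap
# (illustrative, not from these notes).
def siftDown(aList, root, end):
    # push the value at 'root' down until both of its children
    # (within aList[0:end]) are no bigger than it
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        # pick the larger of the two children
        if child + 1 < end and aList[child + 1] > aList[child]:
            child += 1
        if aList[root] >= aList[child]:
            return
        aList[root], aList[child] = aList[child], aList[root]
        root = child

def heapSort(aList):
    size = len(aList)

    # heapify: afterwards the biggest value is at index 0
    for root in range(size // 2 - 1, -1, -1):
        siftDown(aList, root, size)

    # repeatedly move the biggest value to the end of the
    # unsorted part, then repair the heap that remains
    for end in range(size - 1, 0, -1):
        aList[0], aList[end] = aList[end], aList[0]
        siftDown(aList, 0, end)

a = [3, 45, 2, 1, 6, 7]
heapSort(a)
print(a)
```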

Quick sort

Quick sort works by picking a "pivot" (an element of the list) around which the list is partitioned.

This process moves all the elements that are smaller than the pivot element to its left, and all the elements that are greater than the pivot to its right. Then the left and right sub-lists are partitioned in the same way, and so forth until each sublist has at most one element.

Quick sort also runs in n * log2(n) time on average, although a run of unlucky pivot choices can slow it down to about n^2.
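
The partitioning idea can be sketched in a few lines. This version is a simplification chosen for readability: it builds new lists instead of rearranging the original in place, and it picks the middle element as the pivot, which is just one of several possible choices. The name quickSort() is hypothetical.

```python
# A sketch of quick sort that returns a new sorted list
# (simplified, not in-place; illustrative only).
def quickSort(aList):
    # a list of 0 or 1 items is already sorted
    if len(aList) <= 1:
        return list(aList)

    pivot = aList[len(aList) // 2]   # assumed pivot choice: the middle item

    # partition around the pivot
    smaller = [x for x in aList if x < pivot]
    equal = [x for x in aList if x == pivot]
    larger = [x for x in aList if x > pivot]

    # sort each side the same way and glue the pieces together
    return quickSort(smaller) + equal + quickSort(larger)

print(quickSort([3, 45, 2, 1, 6, 7]))
```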

What's the best sort?

So far we've discussed six methods of sorting. Which one should you use? There are a couple of ways of looking at this.

First, unless you're working on a very specific application, you're probably better off using the sort function that's built into whatever language you're using. This sort function is usually coded in a way that makes it fast and efficient. And it's been debugged, so you don't have to go through the hassle of writing your own sort.
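
In Python, the built-in options are the sorted() function, which returns a new list, and the list method sort(), which sorts in place. A brief sketch of both, using the same example list as the rest of these notes:

```python
a = [3, 45, 2, 1, 6, 7]

# sorted() returns a new list and leaves the original alone
ascending = sorted(a)
descending = sorted(a, reverse=True)
print(ascending)
print(descending)
print(a)

# sort() rearranges the list itself and returns None
a.sort()
print(a)

# the key argument sorts by a computed value, here the word length
words = ["pear", "fig", "banana"]
byLength = sorted(words, key=len)
print(byLength)
```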

However, there are some cases where you have knowledge about the list you're dealing with that makes writing your own sort function worthwhile. For example, suppose your list is already sorted, but you have a few items you want to add to it in a way that you end up with a sorted list. You could just tack the new items on and use Heap sort to re-sort the list, but this would take about n * log2(n) steps.

What if, instead, you take the new items and use insertion sort to put each one in its place? This takes up to n steps per item, but if there are fewer than log2(n) new items to put into the list, then this is faster than a full heap sort.
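
Python's standard library has a helper for exactly this situation: bisect.insort() inserts one item into a list that is already sorted and keeps it sorted. It finds the position with a binary search, but it still has to shift the items after that position, so the cost per item is still up to about n steps, as described above. The wrapper name addToSorted() is hypothetical.

```python
import bisect

# Add a few new items to an already-sorted list, keeping it sorted
# (illustrative wrapper around the standard bisect.insort()).
def addToSorted(sortedList, newItems):
    for item in newItems:
        bisect.insort(sortedList, item)

values = [1, 2, 3, 6, 7, 45]    # already sorted
addToSorted(values, [5, 0, 50])
print(values)
```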

What's the fastest sort?

Let's use a sort animation posted by Professor Dr. Robert Stärk to compare these well-known sorting algorithms.

What does Python use?

Python uses a sort known as Timsort, written by Tim Peters in 2002.

It's described by Wikipedia as:

"a hybrid sorting algorithm derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data."