Norvig's Spelling Corrector

Peter Norvig has a great example of a simple Python program that solves a common problem -- how to suggest possible corrections for a word that appears to be misspelled. We're all familiar with how useful this is when we use a Web search service like Google. Norvig's sample program does a pretty good job in just 21 lines of Python.

Look at his page, "How to Write a Spelling Corrector," then download the sample code and data, uncompress the file big.txt.Z, and experiment with it.

There are several things to note:

  • He makes extensive use of the list comprehension notation.
  • He follows the convention of including code to run standard tests when the file is called as a script, although he does not use the unittest module.
  • He imports the collections module, which provides specialized container types that supplement the built-in ones (lists, tuples, dicts) with additional features. He uses its defaultdict, a dictionary that calls a factory function (here lambda: 1) to supply a default value for a key that is not in the dictionary.
    >>> import collections
    >>> model = collections.defaultdict(lambda: 1)
    >>> model['foo']
    1
    Note that lambda: 1 is just a function of no arguments that always returns 1.
  • He uses the set datatype as an easy way to remove duplicates from a list. He then relies on two facts: the empty set is considered False, and the or operator returns the value of its leftmost argument that is not false.
    >>> l1 = [3,1,2,1,3,5,4,4,1]
    >>> s1 = set(l1)
    >>> s1
    set([1, 2, 3, 4, 5])
    >>> set([]) or s1
    set([1, 2, 3, 4, 5])
    
  • Once he has a set of candidate corrections, he selects the one that occurred most frequently in his training text by passing an optional key argument to the max function.
    return max(candidates, key=lambda w: NWORDS[w])
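
The last three idioms can be seen working together in a small sketch. This is not Norvig's program: the training sentence is made up, and the helpers train, known, and pick are illustrative names chosen here, with the list of guesses passed in by hand instead of being generated from edits to the word.

```python
import collections

def train(words):
    # Every word starts at the default count of 1, supplied by lambda: 1,
    # so each occurrence seen in the text pushes the count above 1.
    model = collections.defaultdict(lambda: 1)
    for w in words:
        model[w] += 1
    return model

# Made-up training text, standing in for big.txt.
NWORDS = train("the cat sat on the mat the end".split())

def known(words):
    # set() drops duplicates; the result is empty (false) if nothing matches.
    # The "in" test does not add missing keys to the defaultdict.
    return set(w for w in words if w in NWORDS)

def pick(word, guesses):
    # "or" returns the first operand that is not false, so the word itself
    # wins if it is known, then the known guesses, then the word unchanged.
    candidates = known([word]) or known(guesses) or [word]
    # key= tells max to compare candidates by their training counts.
    return max(candidates, key=lambda w: NWORDS[w])
```

With this toy model, pick("teh", ["the", "ten"]) returns "the", because "teh" is unknown and "the" is the only known guess, while pick("xyz", ["qqq"]) falls through to the final operand and returns "xyz" unchanged.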