image of bot

UMBC CMSC 471 02 Spring 2021
Introduction to Artificial Intelligence

HW2: Cats and Dogs

out 2/4, due 2/18

click to accept your repository

This short assignment will give you a bit of experience in developing heuristics for Algorithm A*.

1 The DOGCAT Game

DOGCAT is a simple word game that I learned from Jeff Shrager, who was the TA when I first taught an AI course in 1980. We used it for a Lisp programming assignment.

You are given two words with the same number of letters (we will be using three and four letter words) in each, e.g. DOG and CAT. Your goal is to transform the first word into the second by replacing one letter at a time with any other letter as long as the result is a proper English word. For example, we could change DOG to FOG but not to GOG (not a proper English word) or to GOD (two letters were changed). Thus, one way of getting from DOG to CAT might be:

DOG => COG => COT => CAT

There are many ways to make most translations. Here is another way:

DOG => HOG => HOT => HAT => CAT

Some examples are very hard to do. Try changing WHY to ASK.

The game can be generalized to words of other lengths and we can also make the problem more interesting by defining other properties of the transformation we want to optimize. For this assignment, we'll work with words of three or four characters. We will define three cost measures:

  1. The simplest cost measure we will call steps. It is just the number of steps in the sequence from the initial word to the goal word. The cost of DOG => COG => COT => CAT is just three.
  2. The second one, scrabble, associates a cost with using a new letter as a replacement based on the scrabble game.
    • The cost of replacing any letter of a word by a A, E, I, O, U, L, N, S, T or R is one.
    • The cost of using a D or G is two.
    • The cost of replacing a letter with a B, C, M or P is three.
    • The cost of using a F, H, V, W or Y as a replacement is four.
    • Using a K costs five.
    • Using a J or a X costs six.
    • The cost of replacing a letter with a Z or Q is ten.
    Thus the transformation DOG => COG => COT => CAT costs 3 + 1 + 1 or five.
  3. The third cost measure is based on how common the words after the initial one is. The idea is that going from CAM to a very rare word (e.g., CWM) should cost more than going to a common word like CAT. We've provided a list of all of the "legal" English three and four letter words along with a measure of the relative frequency of each. The cost of the sequence DOG => JOG => JOT => COT => CAT is computed as 1+R for each step where R is the is the measure of how rare the word introduced by the step is. Looking at the file in your repository words34.txt.gz with the Unix zmore command we find entires including the following:
    jog  10.052741
    jot  11.177437
    cot  10.071238
    cat   6.886906
    so the cost is 4 + the sum of the scores, or 42.188322.

2. What to do

Start  by accepting your GitHub Classroom repository.  This will contain the starter file dc.py that you will need to complete as well as the AIMA Python required files search.py and utils.py files.

Then study the aima search.py file. You should look at the Problem class and its methods, in particular, and the search algorithms.

Write a generic version of a DC class that solves instances of the dog_cat problem given an initial and final word using the aima code for astar_search. We've created a stub file for dc.py, which is the only file you need to complete. To complete this stub, you must finish the DC class by completing the init, actions, result, goal_test, h and repr methods. You will probably need to write some auxiliary functions as well.

Your heuristic h function must be admissible, that is, always returning an estimate that is less than or equal to the true value. It should not be the trivial estimate of 0, of course. Note that both your path_cost and h methods will have to be sensitive to the problem's cost parameter.

We've provided you with dictionary of all of the legal English three and four letter words along with their rarity measures. See this datafile words34.txt.gz and the code in the dc.py stub that loads it:

import gzip
...
dict_file = "words34.txt.gz"
dictionary = {}
for line in gzip.open(dict_file): word, n = line.strip().split('\t') dictionary[word] = float(n)

3. Testing your code

Once you've written your dc.py file you can test it with dcsolve.py and dctest.py. The dcsolve.py script takes an initial and goal word and an optional cost measure and shows the solution found by our dc.py problem along with the cost and time taken.

HW2> python dcsolve.py dog cat
steps 3:   dog cog cot cat (0.0020)

HW2> python dcsolve.py bear duck steps
steps 4:   bear beak beck deck duck (0.0019)

HW2> python dcsolve.py bear duck scrabble
scrabble 11: bear beak beck deck duck (0.0034)

HW2> python dcsolve.py bear duck frequency
frequency 38: bear beak beck buck duck (1.4166)
The dctest.py script takes no arguments and runs a number of examples using each of the three cost measure. See this example of its use. Note that your program may not produce exactly he same answers, since the A* algorithm with an admissible heuristic finds one of shortest solutions to a problem -- there may be several that are equally short.

What to hand in

The assignment is due by midnight on Thursday, February 18.  After you have written, debugged and tested your dc.py file, you should do the following.

  • Answer the questions in the file qestions.md in your repository 
  • Commit and push the files questions.md and dc.py  

Background reading