CMSC 471

Artificial Intelligence -- Fall 2009

HOMEWORK FOUR
out 10/15/09; due 10/29/09

Note: I will be generous about granting extensions on the Lisp part of this homework if you ask in advance, since I know you are likely to be studying for the midterm. To make the grading easier on Denise and me, please submit the assignment as follows:
  1. Submit Parts I and II (the written assignments) as hardcopy in class. No "generous extensions" will be granted for these sections; the usual rules apply.
  2. Submit the Lisp code for Part III as "hw4" on the gl machines.
  3. Submit the commentary for Part III as hardcopy. You do NOT need to print and submit your Lisp code.

I. Logic (20 points)

(a) Russell & Norvig Exercise 7.11 (a). (5 pts)

(b) Russell & Norvig Exercise 8.6 (a,b,c,i,j). (15 pts)

II. Resolution Theorem Proving (30 points)


(a) (8 points) Represent the following knowledge base in first-order logic, using predicates whose arguments x range over the domain of all people and whose arguments t range over the domain of all tests. (Your predicates should include study(x) and pass(x, t), which appear in the goal of part (c).)
  1. Everyone who is smart, studies, and attends class will be prepared.
  2. Everyone who is prepared will pass any test that is fair.
  3. A student passes a test if and only if they don't fail it.
  4. Every UMBC student is smart.
  5. If a test isn't fair, everyone will fail the test.
  6. Aidan is a UMBC student.
  7. Sandy passed the 471 exam.
  8. Aidan attends class.

(b) (8 points) Convert the KB to conjunctive normal form.

(c) (2 points) We wish to prove that

study(Aidan) -> pass(Aidan, 471-exam)

Express the negation of this goal in conjunctive normal form.

(d) (12 points) Add the negated goal to the KB, and use resolution refutation to derive a contradiction, thereby proving the goal. You may show your proof as a series of sentences to be added to the KB or as a proof tree. In either case, you must clearly show which sentences are resolved to produce each new sentence, and what the unifier is for each resolution step.

III. WalkSAT (50 points)

(a) Lisp Implementation (25 points). Implement the WalkSAT algorithm (shown as pseudocode in Figure 7.17 in Russell and Norvig) in Lisp. You may use the partial implementation that I have provided (see walksat-stubs.lisp and walksat-utils.lisp); or you may write your own implementation from scratch; or you may use part of my code but not all of it. As explained above, the Lisp code does not need to be printed out; you just need to submit it on gl. Also, for this assignment, you are hereby let off the hook for documentation and good design: I will give full credit for a working implementation. (And I expect that most of you will have a working implementation, since otherwise it will be basically impossible to do the second part. I implemented everything -- including the utility functions I'm giving you -- from scratch in less than an hour.)
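
For concreteness, here is one way the core loop might look in Lisp, written independently of the stub files. The representation it assumes (a clause is a list of literals, a literal is either a symbol P or (not P), and a model is an alist mapping symbols to t/nil) and the helper names (symbols-in, clause-satisfied-p, count-satisfied, flip) are just for illustration and may differ from what is in walksat-stubs.lisp and walksat-utils.lisp; if you use the stub files, follow their conventions instead.

  ;; A minimal WalkSAT sketch.  Representation assumptions (not necessarily
  ;; those of the stub files): a clause is a list of literals, a literal is
  ;; either a symbol P or (not P), and a model is an alist ((P . t) (Q . nil) ...).

  (defun symbols-in (clauses)
    "Return the distinct proposition symbols appearing in CLAUSES."
    (remove-duplicates
     (mapcar (lambda (lit) (if (consp lit) (second lit) lit))
             (reduce #'append clauses))))

  (defun literal-true-p (lit model)
    (if (consp lit)
        (not (cdr (assoc (second lit) model)))
        (cdr (assoc lit model))))

  (defun clause-satisfied-p (clause model)
    (some (lambda (lit) (literal-true-p lit model)) clause))

  (defun count-satisfied (clauses model)
    (count-if (lambda (c) (clause-satisfied-p c model)) clauses))

  (defun flip (symbol model)
    "Return a copy of MODEL with SYMBOL's truth value flipped."
    (mapcar (lambda (pair)
              (if (eq (car pair) symbol)
                  (cons symbol (not (cdr pair)))
                  pair))
            model))

  (defun walksat (clauses &key (p 0.5) (max-flips 10000))
    "Return (VALUES model flips) if a satisfying model is found, else (VALUES NIL MAX-FLIPS)."
    (let ((model (mapcar (lambda (s) (cons s (zerop (random 2))))
                         (symbols-in clauses))))
      (dotimes (i max-flips (values nil max-flips))
        (when (= (count-satisfied clauses model) (length clauses))
          (return (values model i)))
        (let* ((unsat (remove-if (lambda (c) (clause-satisfied-p c model)) clauses))
               (clause (nth (random (length unsat)) unsat))
               (syms (mapcar (lambda (lit) (if (consp lit) (second lit) lit)) clause)))
          (setf model
                (if (< (random 1.0) p)
                    ;; with probability p, flip a randomly chosen symbol from the clause
                    (flip (nth (random (length syms)) syms) model)
                    ;; otherwise, flip the symbol that maximizes the number of satisfied clauses
                    (flip (first (sort (copy-list syms) #'>
                                       :key (lambda (s)
                                              (count-satisfied clauses (flip s model)))))
                          model)))))))

With this representation, a call like (walksat *clauses0* :p 0.5 :max-flips 1000) would return a satisfying model and the number of flips used, or NIL; note that this only works if your representation matches the one used in benchmarks.lisp.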

NOTE: You may, if you wish, work in pairs on the Lisp implementation. However, the commentary should be done independently. If you work in pairs, only one of you needs to submit the code on gl, but you must clearly indicate on your commentary who you worked with, and which account the code is submitted under.

(b) Commentary (25 points). This report should be printed out as hardcopy and submitted in class. It should include two parts:

  1. Log file showing your code running on the test cases. The file benchmarks.lisp contains three simple test cases (*clauses0*, *clauses1*, and *clauses2*), along with two benchmark problems from the SATLIB repository. You should use the "script" command to generate a log file on the gl machines. Once you've started the script, you should do the following things:
    1. Start clisp.
    2. Load your code (along with the benchmark file and any other files you need for your code to work).
    3. Run your walksat algorithm once (or more if needed) on *clauses0*, *clauses1*, and *clauses2* to show that the code works. Set the parameters however you wish so that the algorithm will converge on a solution; if it does not converge the first time, it's fine to run it more than once.
    4. Run your walksat algorithm at least three times on each of the two benchmark problems (*b20* and *b50*) to show that your implementation works, and to demonstrate the stochastic nature of the algorithm (i.e., that it generates different results when rerun).
    5. Exit clisp.
    6. Type ^D or "exit" to stop the script.
    7. Print and turn in the log file (default name: "typescript") as the first part of your commentary. No remarks are needed unless you want to clarify anything.
  2. Analysis of parameters. Pick one or both of the benchmark problems, or download one or more additional benchmarks from the SATLIB repository. Generate data that will let you answer experimentally at least one of the following questions (a sketch of a simple experiment driver appears at the end of this section):
    1. How does the value of the probability p affect the performance (number of steps taken and/or CPU time) of your implementation, for a fixed-size problem? (For example, you might set p to 0.0, 0.1, 0.2, ..., 0.9, 1.0; set max-flips to a large value to ensure convergence, and collect data on how many steps are needed to solve *b20* at each probability value.)
    2. What is an appropriate value of the max-flips parameter for a particular problem (or problems) and a particular value of p? (For example, you might try different max-flips values (e.g., 50, 100, 150, ...; or a log scale: 16, 32, 64, ...) with *b50* to determine the minimum value that gives a 90% probability of finding a solution. For this question, you would probably want to do more than 5 runs per setting so that you get a reasonably fine-grained estimate of the probability of finding a solution; with only 5 runs, your estimate can only come in increments of 20%.)
    3. How does the number of steps vary as the problem size increases? (For this question, you may want to look through the benchmarks to get a wider range of problem sizes than I've included. Another possibility is to take *b50* and "subset" the clauses to generate a series of successively smaller problems, with the number of clauses = 50, 100, 150, ....)
    Each data point should be the average of at least five runs of your algorithm. (For example, if you're going to test your algorithm on *b20* with the parameters p=0.5 and max-flips=100, you should run this five times and then compute the average number of steps across the five runs.)

    You should generate at least one graph to show your results. For example, for question 1, you might have a scatterplot with p values along the x axis and number of steps along the y axis. For question 2, you might have a bar plot with max-flips values on the x axis and probability of solution on the y axis.

    You should have at least one paragraph describing the results and what they tell you about the algorithm and its parameters.

    Please note that this analysis is really just meant to be a simple, informal study of how the program works. It's not meant to be a systematic or time-consuming investigation. Of course, if you get interested in these questions and want to explore more benchmarks, or multiple questions, I'm delighted to see that -- but you can get full credit if you just put together a basic set of results that meets the guidelines.
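
    To make the data collection concrete, here is a sketch of a small experiment driver for question 1. It assumes a walksat function that returns the number of flips as a second value, as in the sketch in part (a); the names average-flips and sweep-p are just for illustration and are not part of the provided code.

      ;; Sketch of a driver for question 1: average number of flips vs. p.
      ;; Assumes (walksat clauses :p p :max-flips n) returns (values model flips).

      (defun average-flips (clauses p &key (runs 5) (max-flips 100000))
        "Average number of flips over RUNS calls of WALKSAT on CLAUSES with noise probability P."
        (/ (loop repeat runs
                 sum (nth-value 1 (walksat clauses :p p :max-flips max-flips)))
           runs))

      (defun sweep-p (clauses &key (runs 5))
        "Print the average number of flips for p = 0.0, 0.1, ..., 1.0."
        (loop for i from 0 to 10
              for p = (/ i 10.0)
              do (format t "p = ~,1F  average flips = ~,1F~%"
                         p (float (average-flips clauses p :runs runs)))))

      ;; Example (after loading benchmarks.lisp):
      ;;   (sweep-p *b20* :runs 5)

    The table of numbers that a run like this prints is exactly the kind of data you can turn into the graph described above.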