## Eliminating Duplicate Answers in Prolog## The QuestionI was recently asked
## The ProblemThis is a classic problem that's also common in a databases query context (SQL is also very declarative). The problem gets right to the issue of whether we want to program declaratively or procedurally. Additional complications are introduced by the desire to write Prolog predicates that can serve either to prove that a relation holds between values or to generate values for which a given relation holds. I tried to fire off a quick and hopefully helpful response to the question, but it spun out of control. To quote Piet Hein, "Problems worthy. of attack. prove their worth. by hitting back." Let's assume a database about the Simpsons: parent(homer,bart). parent(marge,bart). male(bart). parent(homer,lisa). parent(marge,lisa). female(lisa). parent(homer,maggie). parent(marge,maggie). female(maggie). There are two issues involved. One is that the sister/2 predicate can prove every ordered pair twice, since two people are sisters if the first is female and they share a common parent and a pair might share two common (biological) parents. The second is that you only want to see a given pair once in the output. Sister may be a poor example of this problem since sister(X,Y)<=/=>sister(Y,X), so knowing that Lisa is the sister of Bart tells us nothing about whether or not Bart is also the sister of Lisa. But both of these problems are in full force if we consider the related predicate sibling/2, which is, of course, symmetric. ## The road to a solutionThe simple definition of sibling is: sib1(X,Y) :- parent(P,X), parent(P,Y), \+X=Y. For the Simpsons database, solutions of sib1(X,Y) include sib1(bart,lisa) twice as well as sib1(lisa,bart) twice. What follows is an evolutionary path getting to a predicate that (i) generates each answer, but only once and (ii) also only includes a pair in only one order. I put the code and some test data in online and verified that it works in Sicstus Prolog.
sib2(X,Y) :- parent(P,X), parent(P,Y), (\+X=Y), !.but this prevents us from using it to generate pairs of siblings by calling it with two uninstantiated variables, as can be seen from this session. Note:
sib3(X,Y) :- setof((X,Y), P^(parent(P,X),parent(P,Y), \+X=Y), Sibs), member((X,Y), Sibs).sib3 will generate all ordered pairs for which a sibling relation holds, but it still generates a pair twice, once in each order.
sib4(X,Y) :- setof((X,Y), P^(parent(P,X),parent(P,Y), X@<Y), Sibs), member((X,Y), Sibs).Which works if sib4 is called with two variables, but fails when it shouldn't if some variables are instantiated. sib4(lisa,bart) fails as does sib4(maggie,X). writing a Prolog predicate that works efficiently for all combinations of instantiated and uninstantiated input arguments is tricky.
sib5(X,Y) :- setof((X,Y), P^(parent(P,X),parent(P,Y), \+X=Y), Sibs), member((X,Y), Sibs), \+ (Y@<X, member((Y,X), Sibs)).sib5/2 works with any combination of instantiate/uninstantiated arguments. ## Parting thoughts- sib5 is not as efficient as it could be. But making it more so would further obscure it. This is one unseemly aspect of logic programming. But maybe we're being unfair since it's generally true in most programming languages.
- sib5 is fundamentally inefficient in that it needs to find all possible answers before giving you even the first. It would be very bad if, for example, you wanted to generate 10 pairs of siblings from the IRS database of 200M people. There are probably LP languages that have addressed this general problem, btw, using some notion of streams.
- sib1 is elegant and sib5 is not. This is typical: simple Prolog programs can be beautiful, but it's hard to hold onto the beauty as we ask more of the programs. I think this is the world's fault, not ours. It was that way when I got here, in any case.
- This is a good example, even for an overview of Prolog, that points out some of complexities that arise in Prolog and most LP languages.
12-Mar-2006 2:27 PM |