Eliminating Duplicate Answers in Prolog
I was recently asked
Say I have the sister relation defined like this:sister(X,Y) :- female(X),parent(P,X),parent(P,Y),\+ X == Y.And say I have a DB full pairs that Prolog can figure out are sisters. If I try to find all the sisters using a query like this:?- sister(X,Y).I get each pair in each order. Is there any way to make the query such that all of the pairs of sisters are listed only once, in whichever order Prolog finds first?
This is a classic problem that's also common in a databases query context (SQL is also very declarative). The problem gets right to the issue of whether we want to program declaratively or procedurally. Additional complications are introduced by the desire to write Prolog predicates that can serve either to prove that a relation holds between values or to generate values for which a given relation holds. I tried to fire off a quick and hopefully helpful response to the question, but it spun out of control. To quote Piet Hein, "Problems worthy. of attack. prove their worth. by hitting back."
Let's assume a database about the Simpsons:
parent(homer,bart). parent(marge,bart). male(bart). parent(homer,lisa). parent(marge,lisa). female(lisa). parent(homer,maggie). parent(marge,maggie). female(maggie).
There are two issues involved. One is that the sister/2 predicate can prove every ordered pair twice, since two people are sisters if the first is female and they share a common parent and a pair might share two common (biological) parents. The second is that you only want to see a given pair once in the output. Sister may be a poor example of this problem since sister(X,Y)<=/=>sister(Y,X), so knowing that Lisa is the sister of Bart tells us nothing about whether or not Bart is also the sister of Lisa.
But both of these problems are in full force if we consider the related predicate sibling/2, which is, of course, symmetric.
The road to a solution
The simple definition of sibling is:
sib1(X,Y) :- parent(P,X), parent(P,Y), \+X=Y.
For the Simpsons database, solutions of sib1(X,Y) include sib1(bart,lisa) twice as well as sib1(lisa,bart) twice. What follows is an evolutionary path getting to a predicate that (i) generates each answer, but only once and (ii) also only includes a pair in only one order. I put the code and some test data in online and verified that it works in Sicstus Prolog.
Fix one. We first try to fix this problem with a cut to prevent generating alternate proofs:
sib2(X,Y) :- parent(P,X), parent(P,Y), (\+X=Y), !.but this prevents us from using it to generate pairs of siblings by calling it with two uninstantiated variables, as can be seen from this session. Note:
Fix two. The "right" way to solve this problem is to use Prolog's setof/3 predicate to first generate a set of all of the provable results, and then offer up the results, one at a time, using member/2.
sib3(X,Y) :- setof((X,Y), P^(parent(P,X),parent(P,Y), \+X=Y), Sibs), member((X,Y), Sibs).sib3 will generate all ordered pairs for which a sibling relation holds, but it still generates a pair twice, once in each order.
Fix three. A standard trick (used in databases, too) is to add a constraint that will only generate one of the pairs at the expense of the other. If the values were strings, we might only select the pair where the names were in alphabetical order. We can try this in Prolog using the term comparison operator @< which is true when one term comes before the other in 'standard order'.
sib4(X,Y) :- setof((X,Y), P^(parent(P,X),parent(P,Y), X@<Y), Sibs), member((X,Y), Sibs).Which works if sib4 is called with two variables, but fails when it shouldn't if some variables are instantiated. sib4(lisa,bart) fails as does sib4(maggie,X). writing a Prolog predicate that works efficiently for all combinations of instantiated and uninstantiated input arguments is tricky.
Final fix. So the final attempt at a solution needs to go something like this. (i) generate all results, with duplicates; (ii) remove duplicates; (iii) generate as answers those remaining pairs for which there is not also a similar pair whose args aren't in standard order.
sib5(X,Y) :- setof((X,Y), P^(parent(P,X),parent(P,Y), \+X=Y), Sibs), member((X,Y), Sibs), \+ (Y@<X, member((Y,X), Sibs)).sib5/2 works with any combination of instantiate/uninstantiated arguments.
12-Mar-2006 2:27 PM