Pavlov and the Prisoner's Dilemma


@Article{kraines-1989a,
  author         = {David Kraines and Vivian Kraines},
  title          = {Pavlov and the Prisoner's Dilemma},
  year           = {1989},
  volume         = {26},
  review-dates   = {2005-02-26},
  pages          = {47--79},
  read-status    = {reviewed},
  journal        = {Theory and Decision},
  hardcopy       = {no},
  key            = {kraines-1989a}
}

Summary

"Our Pavlov is learning by conditioned response, through rewards and punishment, to cooperate or defect."

"Keywords: Game theory, prisoner's dilemma, Markov chain, evolution of cooperation."

Bottom line:

Payoffs are altered but similar to the standard payoffs. The key distinction is that the CD and DC payoffs do not match the standard values.

This paper's payoffs: CC= 1, 1; CD=-2, 2; DC= 2,-2; DD=-1,-1
Adjusted (paper + 2): CC= 3, 3; CD= 0, 4; DC= 4, 0; DD= 1, 1
Standard payoffs....: CC= 3, 3; CD= 0, 5; DC= 5, 0; DD= 1, 1

So R+P = T in this game, whereas normally R+P < T. I don't recall any key properties of this particular payoff relationship being mentioned elsewhere (unlike, say, the standard IPD constraint that T+S < 2R).
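
A quick check of the arithmetic (the adjusted row is just the paper's payoffs shifted by +2), as a throwaway Python snippet:

# Quick check: the paper's payoffs shifted by +2 give the adjusted row,
# and under those values R + P = T, while the standard payoffs give R + P < T.
paper    = {"R": 1, "S": -2, "T": 2, "P": -1}    # CC, CD (sucker), DC (temptation), DD
adjusted = {k: v + 2 for k, v in paper.items()}  # -> R=3, S=0, T=4, P=1
standard = {"R": 3, "S": 0, "T": 5, "P": 1}

print(adjusted["R"] + adjusted["P"], adjusted["T"])   # 4 4  -> R + P = T
print(standard["R"] + standard["P"], standard["T"])   # 4 5  -> R + P < T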

[Note: But later it looks like the payoffs have been renormalized so that CC = 0,0. See especially page 63, where the C 1.0 state leads to a tie 0,0 payoff, which would seem to be ideal mutual cooperation, yet that would map to 3,3 in the standard payoffs, i.e. (0,0) -> (3,3).]

Names of strategies in this paper.

"*'d strategies can train Pavlov to cooperate with certainty."

PT knows exactly the probability that Pavlov will cooperate or defect. The analysis shows that it is thus always in PT's best interest to do the same as Pavlov, unless Pavlov's learning rate is so slow that n > ~30 (learning rate < ~1/30).

NT guesses whether p >= 1/2 by looking at Pavlov's recent plays. f(p) is the probability that NT guesses p >= 1/2. The claim is that f(0) = 0, f(1/2) = 1/2, and f(1) = 1. [I don't think f(1/2) is necessarily exactly 1/2; the distribution is centered there, and if the information is only a short streak, then errors like f(0.5) = 1 can occur with finite probability.] The assumption goes further, to f(p) = p, for simplicity. [Also not a good assumption for short runs, but I get the idea.] NT's best strategy is to approximate PT's strategy with the available information. [The problem is that fluctuations due to NT's limited sampling will make the assumption f(p) = p wrong in a significant proportion of cases and lead to longer training... or so I would assert.]

In any event, the tables indicate that NT takes much longer to train Pavlov than PT.
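
A minimal sketch of the two trainers' decision rules as I read them (the 1/2 threshold and the short observation window are my assumptions; the point is only that PT acts on the true p while NT acts on a noisy estimate of it):

import random

def pt_move(p):
    # PT: knows Pavlov's true cooperation probability p and, on my reading of
    # "do the same as Pavlov", mirrors Pavlov's likelier move.
    return "C" if p >= 0.5 else "D"

def nt_move(recent):
    # NT: only sees Pavlov's recent plays, so it thresholds an estimate of p.
    # With a short window the estimate fluctuates, which is why f(p) = p is
    # only an approximation and why training takes longer.
    est = recent.count("C") / len(recent) if recent else 0.5
    return "C" if est >= 0.5 else "D"

# Example: with p = 0.5, a short streak of defections is enough to flip NT to "D".
p = 0.5
window = ["C" if random.random() < p else "D" for _ in range(3)]
print(pt_move(p), nt_move(window))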

Kraines^2 model the IPD as a finite Markov chain in which the states represent the probability that Pavlov will cooperate. They use matrix algebra to derive the training time and cost against the different opponents.
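
A sketch of that kind of computation under toy numbers (the states and transition probabilities below are placeholders, not the paper's matrices): treat full cooperation as an absorbing state and use the standard fundamental matrix N = (I - Q)^(-1), whose row sums give the expected number of rounds to absorption, i.e. the training time.

import numpy as np

# Toy absorbing Markov chain.  Transient states are Pavlov's cooperation-
# probability levels p = 0, 1/3, 2/3; p = 1 ("trained to cooperate with
# certainty") is the absorbing state.  These transition probabilities are
# placeholders, not taken from the paper.
Q = np.array([              # transitions among the transient states
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.2, 0.3],        # remaining 0.5 mass goes to the absorbing state
])
N = np.linalg.inv(np.eye(3) - Q)   # fundamental matrix
expected_rounds = N.sum(axis=1)    # expected rounds to full cooperation, per start state
print(expected_rounds)

Training cost works the same way: weight each transient state's expected visit count (an entry of N) by the expected per-round payoff in that state and sum.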

Did Kraines^2 ever describe why the payoff is weighted at only 1 for CC and DD, but at 2 for CD and DC? Are there other, more optimal values? There is no explicit discussion, but given the rest of the paper, it makes the analysis easier, especially in the interaction with PT and the training progression w.r.t. the expected reinforcement ER around p = 1/3 and p = 2/3. The relative strength of the reinforcements also bulwarks against any thought of PT swindling Pavlov and then retraining, unless Pavlov is a *very* slow learner (the phase shift where n > ~30).

This Pavlov-n scheme is more than memory-one, but its updates use only the most recent move, and the state is minimal (a single value).
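
A minimal sketch of the conditioned-response update as I read it (the direction conventions, and the use of the weights 1 for CC/DD and 2 for CD/DC as step sizes of 1/n and 2/n, are my reading, not quoted from the paper):

def pavlov_update(p, my_move, opp_move, n):
    # Pavlov-n's state is the single value p, its probability of cooperating,
    # quantized in steps of 1/n.  A rewarding outcome reinforces the move just
    # played; a punishing outcome pushes toward the opposite move.  Step sizes
    # mirror the payoff weights: 1 for CC/DD, 2 for CD/DC.
    if my_move == "C" and opp_move == "C":     # reward +1: reinforce cooperating
        p += 1.0 / n
    elif my_move == "C" and opp_move == "D":   # sucker -2: push hard toward defecting
        p -= 2.0 / n
    elif my_move == "D" and opp_move == "C":   # temptation +2: reinforce defecting
        p -= 2.0 / n
    else:                                      # punishment -1: push toward cooperating
        p += 1.0 / n
    return min(1.0, max(0.0, p))

On this reading, one defection against a cooperating Pavlov costs 2/n of cooperation probability and takes two CC rounds to repair, which is the sense in which the stronger off-diagonal reinforcements make swindle-then-retrain unattractive for PT unless n is large.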

TFT is successful against rational opponents, but inefficient against Pavlov. The authors hint here at the value of forgiving an occasional slip: "One should be more forgiving of the occasional slip by the basically cooperative opponent and more strict toward the habitual defector. An understanding of the degree of rationality as opposed to conditional response of the opponent is essential in designing an effective strategy."


Key Factors

Relations to Other Work:

Problem Addressed: Explaining how a very simple mechanism can beat other strategies, and indicating how readily it can be trained toward cooperation by other common strategies. The aim is to provide a plausible mechanism for the evolution of cooperation.

Main Claim and Evidence: The authors establish that Pavlov is better than TFT when the two are matched, does well with itself, and does well with trainers that understand it. The evidence comes from finite Markov chain analysis and other analysis. Some simplifying assumptions may weaken the claim w.r.t. the Normal Trainer, but Pavlov-Pavlov head-to-head match-ups indicate the strength of the approach. [Later papers prove Pavlov-n is effective in noisy environments.]

Assumptions:

Next Steps: None explicitly noted by the authors.

Remaining Open Questions:


Quality

Originality is outstanding.
Contribution/Significance is outstanding.
Quality of organization is excellent.
Quality of writing is excellent.