Learning to Cooperate with Pavlov: An adaptive strategy for the Iterated Prisoner's Dilemma with Noise

@Article{kraines-1993a,
  author         = {David Kraines and Vivian Kraines},
  title          = {Learning to Cooperate with Pavlov: An adaptive strategy for the Iterated Prisoner's Dilemma with Noise},
  year           = {1993},
  volume         = {35},
  value          = {aa},
  review-dates   = {2005-02-26},
  pages          = {107--150},
  read-status    = {reviewed},
  journal        = {Theory and Decision},
  hardcopy       = {no},
  key            = {kraines-1993a}
}

Summary

Pavlov does well against other strategies, even those that are not rational or not cooperative. It succeeds wherever TFT does, and in more cases. However, it is not ideal by the Kraines' criteria for an ideal strategy, since it a) can be exploited, especially early in a game of limited length, and b) may exploit others.

Denotes Pavlov variants as Pn; P3 and P4 are the most generally recommended. P3 and P4 learn quickly, but are not hopelessly prone to reactive defections, as TFT and the lower-order Pavlovs (P1 and P2) are. The article then refines this notation, using P(k,n) to denote a fully specified strategy, where k is the initial state and sets the initial cooperation probability k/n.
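A minimal sketch of a P(k,n) player, to make the notation concrete. The class name `Pavlov` and its methods are hypothetical, and so are the update steps: this review does not spell them out, so the sketch assumes k rises by 1 after R, falls by 1 after T, rises by 2 after P and falls by 2 after S, clamped to 0..n.

```python
import random


class Pavlov:
    """Hypothetical sketch of a P(k, n) learner.

    State k lies in [0, n]; the player cooperates with probability k / n.
    ASSUMED update steps (not stated in this review): R +1, T -1, P +2, S -2.
    """

    # ASSUMED step sizes, keyed by (my_move, their_move) with moves "C"/"D".
    STEPS = {("C", "C"): +1,   # R: cooperation rewarded, cooperate more
             ("D", "C"): -1,   # T: defection rewarded, defect more
             ("D", "D"): +2,   # P: defection punished, cooperate more
             ("C", "D"): -2}   # S: cooperation punished, cooperate less

    def __init__(self, k, n, rng=None):
        if n < 1 or not 0 <= k <= n:
            raise ValueError("need n >= 1 and 0 <= k <= n")
        self.k, self.n = k, n
        self.rng = rng or random.Random()

    def coop_probability(self):
        return self.k / self.n

    def move(self):
        # random() is in [0, 1), so k = n always cooperates and k = 0 never does.
        return "C" if self.rng.random() < self.coop_probability() else "D"

    def update(self, my_move, their_move):
        # Shift k by the outcome's step, clamped to [0, n].
        step = self.STEPS[(my_move, their_move)]
        self.k = min(self.n, max(0, self.k + step))


p = Pavlov(k=1, n=1)
p.update("C", "D")             # sucker's payoff: k drops to 0
print(p.coop_probability())    # 0.0
p.update("D", "D")             # mutual defection: k climbs back to 1
print(p.coop_probability())    # 1.0
```

With n = 1 the clamp hides the step sizes, so P1 under these assumptions is deterministic win-stay, lose-shift. A larger n makes each outcome move the cooperation probability less, so the player reacts less sharply to a single defection.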

The chance of meeting again, or discount rate, is again specified as w.
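Assuming w is the probability that the current round is followed by another (the usual reading in Axelrod's model), the expected game length and the value of an unbroken run of mutual cooperation follow from the geometric series. This is a standard derivation, not one quoted from the article:

```latex
\Pr(\text{game lasts exactly } t \text{ rounds}) = (1-w)\,w^{t-1}, \qquad 0 \le w < 1

\mathbb{E}[\text{rounds}] = \sum_{t \ge 1} t\,(1-w)\,w^{t-1} = \frac{1}{1-w}

V_{CC} = \sum_{t \ge 1} w^{t-1} R = \frac{R}{1-w}
```

With w = 0.9, for example, a pair expects 10 rounds, and mutual cooperation at R = +1 is worth 10.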

The Kraines' definition of an ideal strategy (p 114). Such a strategy will:

  1. Approach the average score of mutual cooperation (+1; the Kraines use different payoffs, see kraines-1989a review)
  2. In the long run, keep the average difference in payoffs at 0 or in the player's favor
  3. Recover quickly from noise (perceived defections), i.e., re-learn cooperation
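The three criteria can be turned into simple measurements on a history of outcomes. This is a rough sketch, not the article's method: only R = +1 comes from the note, the other payoffs are hypothetical placeholders, and "rounds until the next (C, C)" is an assumed proxy for criterion 3.

```python
# HYPOTHETICAL payoff table: only R = +1 comes from the note above;
# T = 2, S = -2, P = -1 are placeholders chosen so that T > R > P > S.
PAYOFF = {("C", "C"): (1, 1),
          ("C", "D"): (-2, 2),
          ("D", "C"): (2, -2),
          ("D", "D"): (-1, -1)}


def average_score(history):
    """Criterion 1: the player's mean payoff; ideal play approaches R = +1."""
    return sum(PAYOFF[o][0] for o in history) / len(history)


def average_difference(history):
    """Criterion 2: mean of (own - opponent) payoffs; should be >= 0 long-run."""
    return sum(PAYOFF[o][0] - PAYOFF[o][1] for o in history) / len(history)


def rounds_to_recover(history, error_round):
    """Criterion 3 (rough proxy): rounds after an error until the next (C, C)."""
    for t in range(error_round + 1, len(history)):
        if history[t] == ("C", "C"):
            return t - error_round
    return None  # cooperation was not re-learned within this history


# Two win-stay, lose-shift players; noise flips the opponent's move at round 2.
h = [("C", "C"), ("C", "C"), ("C", "D"), ("D", "D"), ("C", "C"), ("C", "C")]
print(rounds_to_recover(h, 2))  # 2
```

Each outcome is written as (player's move, opponent's move), so criterion 2 is measured from the first player's side.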

The Kraines' definition of a fraternal strategy (p 115): given a sequence of outcomes, the players reach mutual cooperation after some finite period. [Also defines clan.] All-C and TFT are fraternal.
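A small check of the fraternal property for deterministic memory-one strategies. Reading "given a sequence of outcomes" as "starting from any last outcome" is an assumption, as is treating P1 as win-stay, lose-shift (WSLS); all function names are hypothetical.

```python
from itertools import product


# Deterministic memory-one strategies: (my_last, their_last) -> next move.
def all_c(mine, theirs):
    return "C"


def tft(mine, theirs):
    return theirs  # copy the opponent's last move


def wsls(mine, theirs):
    # Win-stay, lose-shift: repeat after R or T, switch after P or S.
    if theirs == "C":
        return mine
    return "D" if mine == "C" else "C"


def reaches_mutual_cooperation(a, b, start):
    """True if play from outcome `start` = (a_move, b_move) settles at (C, C)."""
    outcome, seen = start, set()
    while outcome not in seen:          # at most 4 states, so this terminates
        seen.add(outcome)
        outcome = (a(outcome[0], outcome[1]), b(outcome[1], outcome[0]))
    # `outcome` is the first repeated state, so it lies on the final cycle;
    # the pair is settled only if that cycle is the fixed point (C, C).
    return outcome == ("C", "C") and (a("C", "C"), b("C", "C")) == ("C", "C")


def fraternal(a, b):
    """Assumed reading: from every possible last outcome, play reaches (C, C)."""
    return all(reaches_mutual_cooperation(a, b, s)
               for s in product("CD", repeat=2))


print(fraternal(all_c, tft))   # True
print(fraternal(tft, tft))     # False: (C, D) and (D, C) alternate forever
print(fraternal(wsls, wsls))   # True: (C, D) -> (D, D) -> (C, C)
print(fraternal(wsls, all_c))  # False: after (D, C), WSLS keeps defecting
```

The last case is the exploitation of altruistic strategies noted in the abstract: once WSLS has defected against All-C without being punished, it keeps defecting.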

All-D outscores PT in the short run (pp 135-136).

Notes prior applications / simulations:

According to patchen-1987a, [exploitation usually occurs in the absence of retaliation.] (quote or paraphrase?)

Notes that TFT can never outscore any opponent and does poorly against its clone in a noisy environment. TFT is not fraternal with itself (presumably under noise, or for a variant that defects initially).
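A quick randomized check of the first claim, under hypothetical payoffs (only R = +1 comes from the note). The reason it holds: TFT gains on its opponent only in a (D, C) round, which can occur only right after the opponent ends a run of defections, and every such run opened with a (C, D) round in which TFT lost the same amount, T - S. So TFT finishes level or exactly T - S behind, whatever the opponent does. The sampling below illustrates this and is not a proof; `play_tft_against` is a hypothetical helper.

```python
import random

# HYPOTHETICAL payoffs with T > R > P > S; only R = +1 comes from the note.
PAYOFF = {("C", "C"): (1, 1), ("C", "D"): (-2, 2),
          ("D", "C"): (2, -2), ("D", "D"): (-1, -1)}


def play_tft_against(opponent_moves):
    """Return (tft_total, opponent_total) for a given sequence of opponent moves."""
    tft_total = opp_total = 0
    tft_move = "C"                      # TFT opens with cooperation
    for opp_move in opponent_moves:
        mine, theirs = PAYOFF[(tft_move, opp_move)]
        tft_total += mine
        opp_total += theirs
        tft_move = opp_move             # next round: copy the opponent
    return tft_total, opp_total


rng = random.Random(1)
for _ in range(10_000):
    moves = [rng.choice("CD") for _ in range(rng.randint(1, 30))]
    tft_total, opp_total = play_tft_against(moves)
    assert opp_total - tft_total in (0, 4)  # level, or behind by T - S = 4
print("TFT never outscored its opponent")
```

Random move sequences suffice here because the bound depends only on the sequence of moves actually played, not on how the opponent chose them.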

The abstract states that Pavlov is not forgiving because it "will exploit altruistic strategies until punished by mutual defection." [But it is forgiving if it doesn't learn too fast and cooperation has already been established.] Can think of:

Pavlov can be better than NT, as NT assumes learning is possible and will fail to exploit All-C or Random [hey, but that wouldn't be "ideal", would it? ;)]

Recap of strategies discussed:

This article has an appendix describing the Markov chain methods it uses.
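A sketch of the kind of Markov chain analysis involved, not the appendix's own formulation: the state is the last pair of moves, each intended move is flipped independently with probability eps (an assumed noise model), and power iteration finds the long-run distribution. Only the deterministic strategies TFT and P1, taken here to be win-stay, lose-shift (WSLS), are modeled; all names are hypothetical.

```python
from itertools import product

STATES = list(product("CD", repeat=2))  # (player 1 move, player 2 move)


def flip(move):
    return "D" if move == "C" else "C"


def tft(mine, theirs):
    return theirs


def wsls(mine, theirs):
    # Assumed form of P1: win-stay, lose-shift.
    return mine if theirs == "C" else flip(mine)


def transition_matrix(a, b, eps):
    """Row-stochastic 4x4 matrix; each intended move flips with probability eps."""
    matrix = []
    for x, y in STATES:
        want_a, want_b = a(x, y), b(y, x)
        row = []
        for u, v in STATES:
            pu = 1 - eps if u == want_a else eps
            pv = 1 - eps if v == want_b else eps
            row.append(pu * pv)
        matrix.append(row)
    return matrix


def stationary(matrix, iterations=5000):
    """Power iteration pi <- pi * P, starting from the uniform distribution."""
    n = len(matrix)
    pi = [1 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dict(zip(STATES, pi))


for name, strat in [("TFT", tft), ("WSLS/P1", wsls)]:
    pi = stationary(transition_matrix(strat, strat, eps=0.01))
    print(name, "long-run share of mutual cooperation:", round(pi[("C", "C")], 3))
```

For TFT against itself the matrix is doubly stochastic, so the long-run distribution is uniform and mutual cooperation occurs only a quarter of the time at any noise level. WSLS returns to (C, C) within two rounds of an error, so it stays near mutual cooperation for small eps.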


Key Factors

Relations to Other Work:

Problem Addressed: What is a natural model for real-life conflict-of-interest encounters that also leads to mutual cooperation? How does one tune P(k,n) to make it a nearly ideal cooperator?

Main Claim and Evidence: Pavlov is close to an ideal strategy, and P3 or P4 are the best candidates for the ideal. Generally, a Pavlov should start out fairly cooperative, such as P(n-2,n). Pavlov can be exploited and will exploit, however, but no known simple strategy does better. Possibly, in the face of noise, an NT-like strategy such as Downing's entries in axelrod-1984a or donninger-1986a would come nearest to ideal.

Assumptions:

Next Steps: None noted.

Remaining Open Questions:


Quality

Originality is excellent. [kraines-1989a took the edge off its novelty.]
Contribution/Significance is outstanding.
Quality of organization is outstanding.
Quality of writing is outstanding.