A simple learning strategy that realizes robust cooperation better than Pavlov in Iterated Prisoners' Dilemma


@article{wakano-2001a,
  author         = {Joe Yuichiro Wakano and Norio Yamamura},
  title          = {A simple learning strategy that realizes robust cooperation better than Pavlov in Iterated Prisoners' Dilemma},
  year           = {2001},
  volume         = {19},
  value          = {cb},
  review-dates   = {2005-02-23, 2005-02-26},
  pages          = {1--8},
  read-status    = {reviewed},
  journal        = {Journal of Ethology},
  hardcopy       = {yes}
}


Very simple reinforcement learner.

Digital Learner (DL) is very simple, updating its cooperative tendency $h$ by $\delta h = a \cdot \mathrm{sgn}(f-s) \cdot \mathrm{sgn}(h)$, where $f$ is the payoff just received and $s$ the aspiration level. Only which of the ranges defined by T, R, P, and S (thus five partitions) the aspiration falls into matters. DL does not prove to be an ESS.
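The DL rule can be sketched as follows. This is a minimal sketch: the step size a, the aspiration s, and the convention that h > 0 means "cooperate" are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of the Digital Learner update; parameter values are illustrative.
def sgn(x):
    """Sign function: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def dl_update(h, f, s, a=0.1):
    """Digital Learner: fixed-size step a * sgn(f - s) * sgn(h).
    h: tendency to cooperate (assume h > 0 means cooperate),
    f: payoff just received, s: aspiration level."""
    return h + a * sgn(f - s) * sgn(h)

# A satisfying payoff (f > s) reinforces the current tendency;
# a disappointing one (f < s) pushes it toward the opposite action.
```

Note that the step size is the same regardless of how far f falls from s; only the sign of the comparison matters.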

Analog Learner (AL) updates quantitatively, in proportion to the difference between the resulting score and the aspiration: $\delta h = a \cdot (f-s) \cdot \mathrm{sgn}(h)$. AL does prove to be an ESS, or very nearly so, at $s = 2.7$. That AL is not invaded by All-C, All-D, TFT, or Pavlov is surprising. Quantified evaluation gives learners an efficient ability to avoid being exploited by always-defecting strategies.
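The AL rule differs from DL only in that the step scales with f - s. Again a sketch, with the same assumptions (step size a and the h > 0 = cooperate convention are not from the paper's exact setup):

```python
def sgn(x):
    """Sign function: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def al_update(h, f, s, a=0.1):
    """Analog Learner: step proportional to the satisfaction f - s,
    so a large disappointment (e.g. being defected against, f = S)
    produces a correspondingly large correction."""
    return h + a * (f - s) * sgn(h)
```

The proportional step is what gives AL its quick escape from exploitation: a sucker's payoff yields a much bigger swing away from cooperation than a mildly sub-aspiration score.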

Checked AL for invasiveness against TFT and Pavlov. Tests showed that for mid-range s values (roughly 1.7 to 2.6, at both 1% and 10% error rates) AL's scores were higher, so invasion is possible.

Checked AL for stability (resistance to invasion): there is a region where it resists Pavlov, and it resists TFT at 10% error for s = 2.7-2.8. ALs with slightly lower s can invade other ALs. TFT, however, always remains at low population levels in this general region.
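The invasion/stability checks above amount to comparing a mutant's average score against a resident with the residents' score against each other. A toy version of that test, with many assumptions: the AL internals (h in [-1, 1], deterministic action from the sign of h, clamping) and all parameter values are illustrative, not the paper's exact model.

```python
import random

# Standard PD payoffs: T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

class AL:
    """Assumed Analog Learner: cooperate iff tendency h > 0."""
    def __init__(self, s=2.7, a=0.1, h=0.5):
        self.s, self.a, self.h = s, a, h
    def move(self):
        return 'C' if self.h > 0 else 'D'
    def learn(self, f):
        # Per the paper's assumption, learners do not notice their own
        # errors: the update uses the intended tendency, not the action.
        step = self.a * (f - self.s) * (1 if self.h > 0 else -1)
        self.h = max(-1.0, min(1.0, self.h + step))

class Pavlov:
    """Win-stay lose-shift: repeat after T or R, switch after P or S."""
    def __init__(self):
        self.last = 'C'
    def move(self):
        return self.last
    def learn(self, f):
        if f < R:  # "lost" the round: switch intended action
            self.last = 'D' if self.last == 'C' else 'C'

def play(a1, a2, rounds=2000, eps=0.01, seed=0):
    """Average per-round scores of two agents, with action errors."""
    rng = random.Random(seed)
    t1 = t2 = 0.0
    for _ in range(rounds):
        m1, m2 = a1.move(), a2.move()
        if rng.random() < eps:
            m1 = 'D' if m1 == 'C' else 'C'
        if rng.random() < eps:
            m2 = 'D' if m2 == 'C' else 'C'
        f1, f2 = PAYOFF[(m1, m2)]
        a1.learn(f1)
        a2.learn(f2)
        t1 += f1
        t2 += f2
    return t1 / rounds, t2 / rounds

# Rough stability test: a rare Pavlov mutant can invade only if it
# scores more against the AL resident than residents score against
# each other.
resident, _ = play(AL(), AL())
against_mutant, mutant = play(AL(), Pavlov())
```

This is the per-pairing score comparison only; a full ESS check as in the paper also requires the population-dynamics argument over all candidate invaders.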

Key Factors

Relations to Other Work: Heavy comparisons to Nowak and Sigmund and discussion of Axelrod, plus a fair number of references to other learning systems: Stephens and Clements on conditions for cooperation, Posch on evolutionary simulation with RL strategies, and Sandholm and Crites on Q-learning, though the paper notes their lack of explicit ESS study and analysis. In the discussion, the authors note that their AL model is a considerably simpler learner than those of Rubinstein, Billard, or Harrald and Fogel, who use complicated RL, automata, or NN models.

Problem Addressed: Is there a very simple reinforcement learning strategy that is also an ESS?

Main Claim and Evidence: The AL presented in the paper is a simple RL strategy and is an ESS with the right setting of aspiration level, roughly s = 2.6. It resists invasion by All-C, All-D, TFT, and Pavlov at error rates of 1% and 10%. The authors demonstrate with graphs and argumentation (though some of the argumentation is not complete or fully convincing) that AL cannot be invaded and can itself invade, though not necessarily to dominance.


  1. Learners do not notice errors and assume they performed correctly; no adjustments for erroneous actions.
  2. 1% and 10% error rates are the ones worth studying.
  3. Definition of ESS is "if a strategy is stable against invasion of any other strategy, it is called an ESS."
  4. Notes that Nowak and Sigmund showed that scores in their stochastic (p1,p2,p3,p4) model can be calculated analytically for the infinite game.
  5. The average score of a Pavlov population is lower than All-C and difference increases with noise.
  6. Numerical calculations on the classic strategies indicated only GRIM and All-D were ESSs.
  7. Confirmed that a Pavlov-like strategy is invaded by Pavlov, which is invaded by All-D (nowak-1993a did this first).
  8. "Pavlov does not explain actual cooperative behavior for two reasons: it is not an ESS and it does not behave very cooperatively under high error rates."

Next Steps:

Remaining Open Questions:


Originality is good.
Contribution/Significance is good.
Quality of organization is excellent.
Quality of writing is excellent.