A simple learning strategy that realizes robust cooperation better than Pavlov in Iterated Prisoners' Dilemma


@Article{wakano-2001a,
author         = {Joe Yuichiro Wakano and Norio Yamamura},
title          = {A simple learning strategy that realizes robust cooperation better than Pavlov in Iterated Prisoners' Dilemma},
year           = {2001},
volume         = {19},
value          = {cb},
review-dates   = {2005-02-23, 2005-02-26},
pages          = {1--8},
journal        = {Journal of Ethology},
hardcopy       = {yes},
key            = {wakano-2001a}
}


### Summary

Very simple reinforcement learner.

• Threshold level is 0.
• h is the internal state and is adjusted by learning. Player plays C when h >= 0; otherwise D.
• s is a fixed aspiration level of an individual, though it may drift evolutionarily (its dynamics are evolutionary, not within play). If a resulting score is > s, the result is considered good and the player increases the internal state h by some amount.
• f is resulting score from a single play.
• a is the scale (rate) of the update $\delta h$. The initial h value is -a/2.

Digital Learner (DL) is very simple, updating with $\delta h = a \cdot \mathrm{sgn}(f-s) \cdot \mathrm{sgn}(h)$. Only aspirations that fall in the ranges defined by T, R, P, and S (thus five partitions) matter. DL does not prove to be an ESS.

Analog Learner (AL) allows updates to reflect quantitatively the difference between the resulting score and the aspiration. Hence $\delta h = a \cdot (f-s) \cdot \mathrm{sgn}(h)$. AL does prove to be an ESS, or very nearly so, at s = 2.7. That AL is not invaded by All-C, All-D, TFT, or Pavlov is surprising. Quantified evaluation gives learners an efficient ability to avoid exploitation by always-defecting strategies.
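
The two update rules can be sketched as follows. This is my notation, not the paper's code; in particular, I resolve the h = 0 boundary by treating sgn(h) as the sign of the action actually chosen (+1 for C, -1 for D), which is an assumption.

```python
def sgn(x):
    # standard signum: -1, 0, or +1
    return (x > 0) - (x < 0)

def act_sgn(h):
    # sign of the chosen action: +1 for C (h >= 0), -1 for D
    # (my reading of sgn(h) at the h = 0 boundary; an assumption)
    return 1 if h >= 0 else -1

def digital_update(h, f, s, a):
    # DL: only the sign of (f - s) matters
    return h + a * sgn(f - s) * act_sgn(h)

def analog_update(h, f, s, a):
    # AL: step size is proportional to how far payoff f missed aspiration s
    return h + a * (f - s) * act_sgn(h)

def action(h):
    # cooperate when internal state is at or above the threshold (0)
    return 'C' if h >= 0 else 'D'
```

With h = 1.0, f = 3, s = 2.7, a = 0.1, DL steps by the full a while AL steps by only a(f-s) = 0.03, which is exactly the quantitative sensitivity the paper credits for AL's robustness.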

Checked AL for invasiveness against TFT and Pavlov. Tests showed that for mid-range s values (roughly 1.7 to 2.6, at both 1% and 10% error rates) AL's scores were higher, so invasion is possible.

Checked AL for stability (resistance to invasion): it is resistant to Pavlov over a region, and resistant to TFT at a 10% error rate for s = 2.7-2.8. ALs with slightly lower s can invade other ALs. TFT populations will always remain small in this general region, however.
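
A toy version of these pairwise checks: one AL against one Pavlov under action noise. The payoff values T,R,P,S = 5,3,1,0, the Pavlov implementation, and all parameter defaults are my assumptions for illustration, not the paper's exact experimental setup.

```python
import random

# Payoff table: (row player's payoff, column player's payoff)
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def noisy(move, err, rng):
    # with probability err the intended action is flipped
    if rng.random() < err:
        return 'D' if move == 'C' else 'C'
    return move

def pavlov_next(last_move, last_payoff):
    # win-stay (got R or T), lose-shift (got P or S)
    if last_payoff in (3, 5):
        return last_move
    return 'D' if last_move == 'C' else 'C'

def play(rounds=2000, s=2.0, a=0.1, err=0.01, seed=0):
    """Return (AL, Pavlov) average per-round payoffs."""
    rng = random.Random(seed)
    h = -a / 2                        # AL's initial internal state
    pav_intended = 'C'                # Pavlov opens with cooperation
    al_total = pav_total = 0.0
    for _ in range(rounds):
        al_intended = 'C' if h >= 0 else 'D'
        al_move = noisy(al_intended, err, rng)
        pav_move = noisy(pav_intended, err, rng)
        f_al, f_pav = PAYOFF[(al_move, pav_move)]
        # AL does not notice its own errors (the paper's first assumption):
        # it reinforces the action it *intended* to play
        h += a * (f_al - s) * (1 if al_intended == 'C' else -1)
        # Pavlov reacts to its realized move and payoff
        pav_intended = pavlov_next(pav_move, f_pav)
        al_total += f_al
        pav_total += f_pav
    return al_total / rounds, pav_total / rounds
```

Sweeping s over roughly 1.7-2.8 and comparing the two averages mirrors the paper's invasiveness and stability checks in miniature.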

### Key Factors

Relations to Other Work: Heavy comparison to Nowak and Sigmund, discussion of Axelrod, and a fair number of references to other learning systems: Stephens and Clements on conditions for cooperation, Posch on evolutionary simulation with RL strategies, and Sandholm and Crites on Q-learning, though the paper notes their lack of explicit ESS study and analysis. In the discussion, the paper's AL model is noted to be a considerably simpler learner than those of Rubinstein, Billard, or Harrald and Fogel, who use complicated RL, automata, or NN models.

Problem Addressed: Is there a very simple reinforcement learning strategy that is also an ESS?

Main Claim and Evidence: The AL presented in the paper is a simple RL strategy and is an ESS with the right setting of aspiration level, roughly s = 2.6. It can resist invasion by All-C, All-D, TFT, and Pavlov at error rates of 1% and 10%. The authors demonstrate with graphs and argumentation (though some of the argumentation is not complete or fully convincing) that AL cannot be invaded and can itself invade, though not necessarily to dominance.

Assumptions:

1. Learners do not notice errors and assume they performed correctly; no adjustments for erroneous actions.
2. 1% and 10% error rates are the ones worth studying.
3. Definition of ESS is "if a strategy is stable against invasion of any other strategy, it is called an ESS."
4. Notes that Nowak and Sigmund showed that their stochastic (p1,p2,p3,p4) model can be analytically calculated for infinite game.
5. The average score of a Pavlov population is lower than that of an All-C population, and the difference increases with noise.
6. Numerical calculations on the classic strategies indicated only GRIM and All-D were ESSs.
7. Confirmed that a Pavlov-like strategy is invaded by Pavlov, which is invaded by All-D (nowak-1993a did this first).
8. "Pavlov does not explain actual cooperative behavior for two reasons: it is not an ESS and it does not behave very cooperatively under high error rates."

Next Steps:

Remaining Open Questions:

• It's not clear what would happen if there were genetic drift on s.
• How did the real Pavlov strategy, which originated in kraines-1989a and is at worst only modestly more complex, get passed over? [I would say it is of the same complexity.] Win-stay, lose-shift is just an extreme case of Kraines' Pavlov, where n is 1, and the other stochastic strategies noted by Nowak and Sigmund aren't really the same as Kraines' Pavlov. We have allowed a change of meaning in which the most extreme Pavlov imaginable is mapped to the (1,0,0,1) stochastic strategy and then varied in Nowak's work. This has confused followers like Wakano, who didn't even reference kraines-1989a! The problem here is that kraines-1989a presents a similarly simple strategy and isn't even acknowledged. Probably no harm intended, but the confusion over Pavlov has made this a mess. [I recommend using win-stay, lose-shift (WSLS) for Nowak's strategy (possibly the earlier naming) and Pavlov *only* for Kraines' work.]

### Quality

Originality is good.
Contribution/Significance is good.
Quality of organization is excellent.
Quality of writing is excellent.