A simple learning strategy that realizes robust cooperation better than Pavlov in Iterated Prisoners' Dilemma
[Protected Link] A simple learning strategy that realizes robust cooperation better than Pavlov in Iterated Prisoners' Dilemma
<-Back
@Article{wakano-2001a,
author = {Joe Yuichiro Wakano and Norio Yamamura},
title = {A simple learning strategy that realizes robust cooperation better than Pavlov in Iterated Prisoners' Dilemma},
year = {2001},
volume = {19},
value = {cb},
review-dates = {2005-02-23, 2005-02-26},
pages = {1--8},
read-status = {reviewed},
journal = {Journal of Ethology},
hardcopy = {yes},
key = {wakano-2001a}
}
Summary
Very simple reinforcement learner.
- Threshold level is 0.
- h is the internal state and is adjusted by learning. Player plays
C when h >= 0; otherwise D.
- s is a fixed aspiration of an individual, but may drift
evolutionarily (dynamics in evolution, not it play). If a resulting
score is > s, then the result is considered good and the player
increases internal state h by some amount.
- f is resulting score from a single play.
- a is the scale (rate) of updating $\delta h$. Initial h value is -a/2.
Digital Learner (DL) is very simple, updating with $\delta h = a *
sgn(f-s) * sgn(h)$. Only aspirations that fall in ranges defined by
T,R,P,and S (thus five partitions) matter. DL does not prove to be
ESS.
Analog Learner (AL) allows updates to reflect quantitatively a
difference between the resulting score and their aspiration. Hence
$\delta h = a * (f-s) * sgn(h). AL does prove to be ESS, or very
nearly so at s=2.7. AL not being invaded by All-C, All-D, TFT, or
Pavlov is surprising. Quantified evaluation brings learners an
efficient ability to avoid being exploited by always defecting
strategies.
Checked AL for invasiveness against TFT and Pavlov. Tests showed
that for mid-range s values, say 1.7 to 2.6 for both 1% and 10%
error rates, that scores were higher, so invasion possible.
Checked AL for stability (resistance to invasion) and showed a
region it is resistant to Pavlov and for TFT resistant at 10% error
for s=2.7-2.8. Slightly lower s's can invade other ALs. TFTs will
always remain low populations in this general region however.
Key Factors
Relations to Other Work:
Heavy comparisons to Nowak and Sigmund and discussion of Axelrod, but
a fair amount of reference to other learning systems. Comparisons
with Stephens and Clements for conditions of cooperation, Posch for
evolutionary sim. with RL strategies, Sandholm and Crites
Q-Learning.. but notes their lack of explicit ESS study and analysis.
In discussion, note that the paper's AL model is considerably simpler
learner than Rubinstein, Billard, or Harrald and Fogel, who take
complicated RL, automata, or NN models.
Problem Addressed: Is there
a very simple reinforcement learning strategy that is also an ESS?
Main Claim and Evidence:
The AL presented in the paper is a simple RL and is ESS with the right
settings of aspiration level, roughly s=2.6. It can resist invasion
by All-C, All-D, TFT, and Pavlov with error rates 1% and 10%. They
demonstrate with graphs and argumentation (though some of the
argumentation is not complete or fully convincing) that AL cannot be
invaded and can invade, though not necessarily to dominance.
Assumptions:
- Learners do not notice errors and assume they performed
correctly; no adjustments for erroneous actions.
- 1% and 10% errors are those worth studying.
- Definition of ESS is "if a strategy is stable against invasion
of any other strategy, it is called an ESS."
- Notes that Nowak and Sigmund showed that their stochastic
(p1,p2,p3,p4) model can be analytically calculated for infinite
game.
- The average score of a Pavlov population is lower than All-C and
difference increases with noise.
- Numerical calculations on classic indicated only GRIM and All-D
were ESSs.
- Confirmed that Pavlov-like strategy is invaded by Pavlov, which
is invaded by All-D. (nowak-1993a did this first).
- "Pavlov does not explain actual cooperative behavior for two
reasons: it is not an ESS and it does not behave very cooperatively
under high error rates."
Next Steps:
Remaining Open Questions:
- It's not clear what would happen if there was genetic drift on s.
- How did the real Pavlov strategy, which originated in
kraines-1989a, and is at worst only modestly much more complex get
passed over [I would say the same complexity]. Win-stay, lose-shift
strategy is just an extreme case of Pavlov, where n is 1, and the
other stochastic strategies noted by Nowak and Sigmund aren't really
the same as the Kraines' Pavlov. We have allowed the change of
meaning where the most extreme Pavlov imaginable is mapped to
(1,0,0,1) stochastic and then varied in Nowak's work. This has
confused followers like Wakano, who didn't even reference
kraines-1989a! The problem here is that kraines-1989a presents a
similarly simple strategy and isn't even acknowledged here. Oh
well, probably no harm intended, but similar confusion over Pavlov
has made this a mess. [I recommend using win-stay, lose-shift (WSLS)
for Nowak's (possibly earlier work here in naming) and Pavlov *only*
for Kraines work.
Quality
Originality is good.
Contribution/Significance is good.
Quality of organization is excellent.
Quality of writing is excellent.
<-Back