A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game
@Article{nowak-1993a,
author = {Martin Nowak and Karl Sigmund},
title = {A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game},
year = {1993},
volume = {364},
review-dates = {2005-02-22},
value = {ca},
month = {July},
pages = {56--58},
read-status = {reviewed},
journal = {Nature},
url = {http://www.ped.fas.harvard.edu/pdf_files/Nature93.pdf},
hardcopy = {yes},
key = {nowak-1993a}
}
Summary
This is a short work, a letter to Nature, describing how a
win-stay, lose-shift (WSLS)* strategy (aka Pavlov, aka "simpleton")
generally outperforms TFT, including 'generous' TFT (GTFT), which
cooperates after every opponent C move and after some fraction of
opponent D moves.
[*WSLS is my notation, not the paper's]
Problems with TFT:
- TFT can allow unconditional cooperators to flourish and allow
for later invasion by exploiters of those unconditional cooperators.
- Mistakes, noise, or error can elicit retaliation sequences
amongst TFT populations, reducing performance.
Nowak and Sigmund had believed GTFT would win out, but Pavlov did much
better in the long run.
WSLS-Pavlov distinctions:
- - Doesn't do well against All-D: Pavlov switches its move every
round, so it cooperates (and is exploited) every other round. Pavlov
itself can't invade All-D; something like TFT must invade first.
- + Pavlov gracefully recovers from noise or errors resulting in D,
whereas TFT will continue titting and tatting. "... Pavlov is fairly
tolerant, like GTFT, and can correct mistakes."
- + Pavlov can exploit a sucker (an unconditional cooperator): once an
error makes it defect, it discovers that defection pays and keeps
defecting.
- + Has the best advantage among the nicest strategies [it can exploit
suckers and still cooperate with tat-capable cooperators].
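The distinctions above can be checked with a small deterministic
sketch. This is my own illustration, not code from the paper: the
helper names (play, outcome_index), the single forced error, and the
choice of opening moves are all assumptions made for the example.

```python
# Deterministic memory-one strategies written as (after R, after S, after T, after P):
# 1 = cooperate next round, 0 = defect next round.
TFT = (1, 0, 1, 0)
PAVLOV = (1, 0, 0, 1)
ALL_D = (0, 0, 0, 0)
ALL_C = (1, 1, 1, 1)  # the "sucker" (unconditional cooperator)


def outcome_index(mine, theirs):
    """Index of last round's outcome from my view: R=0 (CC), S=1 (CD), T=2 (DC), P=3 (DD)."""
    if mine == "C":
        return 0 if theirs == "C" else 1
    return 2 if theirs == "C" else 3


def play(strat_a, strat_b, rounds, flip_a_at=None, start=("C", "C")):
    """Play two strategies; optionally flip player A's move once (a single error).

    Returns a list of strings such as "DC" (A's move first, then B's).
    The opening moves are given by `start` (an assumption of this sketch).
    """
    a, b = start
    history = []
    for t in range(rounds):
        if t > 0:
            # Both players react to the previous round simultaneously.
            a_next = "C" if strat_a[outcome_index(a, b)] else "D"
            b_next = "C" if strat_b[outcome_index(b, a)] else "D"
            a, b = a_next, b_next
        if t == flip_a_at:
            a = "D" if a == "C" else "C"  # the one-off mistake
        history.append(a + b)
    return history


print(play(TFT, TFT, 6, flip_a_at=2))        # CC CC DC CD DC CD: retaliation never ends
print(play(PAVLOV, PAVLOV, 6, flip_a_at=2))  # CC CC DC DD CC CC: back to cooperation
print(play(PAVLOV, ALL_D, 4, start=("C", "D")))  # CD DD CD DD: suckered every other round
print(play(PAVLOV, ALL_C, 5, flip_a_at=2))   # CC CC DC DC DC: keeps exploiting the sucker
```

One error sends two TFT players into alternating retaliation, while
two Pavlov players pass through one round of mutual defection and then
cooperate again.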
Cooperation was only 27.5% (average payoff > 2.95) after t=10^4
generations, but rose to 90% at t=10^7.
"The success of Pavlov-like behaviour does not seem to be restricted
to strategies which only remember the last move. In other
evolutionary runs, where mutations can extend the memory length,
similar strategies have been found: typically they resume cooperation
after two rounds of mutual defection." [Axelrod, Lindgren]
Key Factors
Relations to Other Work:
- kraines-1989a
- Win-stay, lose-shift is a widespread rule according to M. Domjan and
B. Burkhard in "The Principles of Learning and Behavior" (Brooks/Cole,
Monterey, 1986)
- "simpleton", which is the same as Pavlov, goes back to 1965 and
Rapoport
- Axelrod's GA and simulated annealing
Problem Addressed:
Is WSLS-Pavlov or a TFT variant superior in an evolutionary game
with mutation and long time horizons? How stable are the resulting
populations?
Main Claim and Evidence:
Pavlov is superior to GTFT and TFT in tournaments with noise over
long-term evolutionary simulations. Significant experiments and
sufficient analysis confirm this.
The level of stability appears to increase over longer time horizons,
but the system may fluctuate from high average scores to low at any
time... and the transitions are very quick, only a few generations
amongst hundreds of thousands.
Assumptions:
- Stochastic strategies are considered, defined by probabilities
(p1,p2,p3,p4) of cooperating in the next round after the previous
round's outcome was R, S, T, or P, respectively. TFT=(1,0,1,0) and
WSLS-Pavlov=(1,0,0,1).
- Many generations in the simulations: 10^4 is considered short-term,
10^7 long-term (end of game).
- Mutation
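
The (p1,p2,p3,p4) representation lends itself to a small numerical
sketch. This is my own illustration, not the paper's procedure: the
payoff values R=3, S=0, T=5, P=1 are the usual Axelrod values and are
assumed here, as are the error model (each intended move is flipped
with probability eps) and the function names.

```python
PAYOFF = (3.0, 0.0, 5.0, 1.0)  # assumed values for R, S, T, P

TFT = (1, 0, 1, 0)
PAVLOV = (1, 0, 0, 1)


def with_noise(p, eps):
    """Flip each intended move with probability eps (assumed error model)."""
    return tuple((1 - eps) * x + eps * (1 - x) for x in p)


def transition_matrix(p, q):
    """4x4 Markov chain over outcomes (R, S, T, P), seen from player A.

    p is A's strategy and q is B's. B sees S and T swapped, because
    A's sucker payoff is B's temptation payoff.
    """
    swap = (0, 2, 1, 3)
    rows = []
    for state in range(4):
        pa = p[state]        # probability that A cooperates next
        pb = q[swap[state]]  # probability that B cooperates next
        rows.append([pa * pb, pa * (1 - pb), (1 - pa) * pb, (1 - pa) * (1 - pb)])
    return rows


def long_run_payoff(p, q, eps=0.01, steps=5000):
    """Average payoff to A in the long run, found by iterating the chain."""
    m = transition_matrix(with_noise(p, eps), with_noise(q, eps))
    dist = [1.0, 0.0, 0.0, 0.0]  # start from mutual cooperation
    for _ in range(steps):
        dist = [sum(dist[i] * m[i][j] for i in range(4)) for j in range(4)]
    return sum(d * v for d, v in zip(dist, PAYOFF))


print(long_run_payoff(TFT, TFT))        # about 2.25
print(long_run_payoff(PAVLOV, PAVLOV))  # about 2.95
```

Under these assumptions a pair of noisy TFT players spends equal time
in all four outcomes, while a pair of noisy Pavlov players stays close
to the mutual-cooperation payoff.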
Next Steps:
No future plans of action described.
Remaining Open Questions:
Authors leave some lingering doubt about dominance of win-stay, lose-shift.
Quality
Originality is good.
Contribution/Significance is good.
Quality of organization is excellent.
Quality of writing is outstanding.