The **prisoner's dilemma** is an example of a non-zero-sum game that demonstrates a conflict between rational individual behavior and the benefits of cooperation in certain situations.


The classical Prisoner's Dilemma is as follows:

*Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and, having separated them, visit each of them and offer the same deal: If you confess and your accomplice remains silent, he gets the full 10-year sentence and you go free. If you both stay silent, all we can do is give you both 6 months for a minor charge. If you both confess, you each get 5 years.*

*Each prisoner individually reasons like this: Either my accomplice confessed or he did not. If he confessed and I remain silent, I get 10 years, while if I also confess I get only 5. If he remained silent, then by confessing I go free, while by remaining silent I get 6 months. In either case, it is better for me to confess. Since each of them reasons the same way, both confess and get 5 years. But although each followed a seemingly rational argument to achieve the best result, had they both remained silent they would have served only 6 months.*
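The prisoner's reasoning is a dominance argument, and it can be checked mechanically. A minimal sketch in Python (the sentence values are taken from the story above; the `"silent"`/`"confess"` labels and the dictionary encoding are illustrative choices):

```python
# Sentences in years for each (my move, accomplice's move) pair;
# a shorter sentence is better.
SENTENCE = {
    ("silent", "silent"): 0.5,    # both stay silent: 6 months each
    ("silent", "confess"): 10.0,  # I stay silent, he confesses: I serve 10 years
    ("confess", "silent"): 0.0,   # I confess, he stays silent: I go free
    ("confess", "confess"): 5.0,  # both confess: 5 years each
}

# Whatever the accomplice does, confessing yields the shorter sentence,
# so "confess" strictly dominates "stay silent".
for his_move in ("silent", "confess"):
    best = min(("silent", "confess"), key=lambda mine: SENTENCE[(mine, his_move)])
    print(f"if he plays {his_move!r}, my best reply is {best!r}")
```

Running the loop prints "confess" as the best reply in both cases, which is exactly the dilemma: each prisoner's individually optimal move leads to the jointly worse outcome.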

In political science, the Prisoner's Dilemma is often used to illustrate the problem of two states getting into an arms race. Both will reason that they have two options, either to increase military expenditure or to make an agreement to reduce weapons. Neither state can be certain that the other one will keep to such an agreement; therefore, they both incline towards military expansion. The irony is that both states seem to act rationally, but the result is completely irrational.

In his book *The Evolution of Cooperation* (1984), Robert Axelrod explored an extension to the classical PD scenario, which he called the "iterated prisoner's dilemma." In this, participants choose their strategies again and again, and have memory of their previous encounters. The example he used was two people meeting and exchanging closed bags, with the understanding that one of them contains money, and the other contains an item being bought. Either player can choose to honor the deal by putting into his bag what he agreed, or he can defect by handing over an empty bag. This mirrors the prisoner's dilemma: for each participant individually, it is better to defect, though both would be better off if both cooperated.

Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, "greedy" strategies tended to do very poorly in the long run while more "altruistic" strategies did better, as judged purely by self-interest. He used this to show a possible mechanism to explain what had previously been a difficult hole in Darwinian theory: how can apparently altruistic behavior evolve from the purely selfish mechanisms of natural selection?

The best deterministic strategy was found to be "Tit for Tat", which Anatol Rapoport developed and entered in a contest for the best strategy that could be written as a computer algorithm. It was as simple as any entered, and it won the contest. The strategy is simply to cooperate on the first iteration of the game; after that, do whatever your opponent did on the previous move. A slightly better strategy is "Tit for Tat with forgiveness": when your opponent defects, on the next move you sometimes cooperate anyway, with a small probability (around 1%-5%). This allows occasional recovery from getting trapped in a cycle of mutual defections. The best probability depends on the lineup of opponents. "Tit for Tat with forgiveness" does best when miscommunication is introduced into the game, that is, when your move is sometimes incorrectly reported to your opponent: you cooperate, but your opponent hears that you defected.
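Both strategies are short enough to state as code. A sketch in Python (the function names, the `"C"`/`"D"` move encoding, and the 5% default forgiveness probability are illustrative choices, not from Axelrod's tournament code):

```python
import random

def tit_for_tat(my_history, their_history):
    """Cooperate on the first move; afterwards, copy the opponent's last move."""
    return "C" if not their_history else their_history[-1]

def tit_for_tat_with_forgiveness(my_history, their_history, p_forgive=0.05):
    """Like Tit for Tat, but after an opponent's defection, cooperate anyway
    with a small probability, to break out of cycles of retaliation."""
    if not their_history:
        return "C"
    if their_history[-1] == "D" and random.random() < p_forgive:
        return "C"
    return their_history[-1]
```

Each function takes the two move histories and returns the next move, so the same interface works for any deterministic or randomized strategy in a tournament loop.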

For the iterated Prisoner's Dilemma, it is not always correct to say that any given strategy is the best. For example, consider a population where everyone defects every time, except for a single individual following the Tit-for-Tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being Tit-for-Tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game. Simulations of populations have been done, where individuals with low scores die off, and those with high scores reproduce. The mix of algorithms in the final population generally depends on the mix in the initial population.
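The first-turn disadvantage of Tit for Tat against an all-defect opponent can be seen in a small simulation. A sketch using the conventional tournament payoffs (T=5, R=3, P=1, S=0), which are not the example matrices given elsewhere in this article:

```python
# Conventional payoffs: (player A's score, player B's score) per move pair.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(strategy_a, strategy_b, rounds):
    """Run one iterated match and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

tft = lambda mine, theirs: "C" if not theirs else theirs[-1]
always_defect = lambda mine, theirs: "D"

# Tit for Tat pays the sucker's payoff once, then both sides defect forever:
print(play(tft, always_defect, 10))  # (9, 14)
```

Over 10 rounds the Tit-for-Tat player scores 9 against the defector's 14: exactly the one-round loss described above, after which both collect the mutual-defection payoff every turn.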

If an iterated Prisoner's Dilemma is to be iterated exactly N times, for some known constant N, there is another interesting fact: the Nash equilibrium is to defect every time. That is easily proved by induction. You might as well defect on the last turn, since your opponent will have no chance to punish you; therefore, you will both defect on the last turn. Then you might as well defect on the second-to-last turn, since your opponent will defect on the last turn no matter what you do; and so on. For cooperation to remain appealing, then, the future must be indeterminate for both players. One solution is to make the total number of turns N random.

Another odd case is the "play forever" Prisoner's Dilemma, in which the game is repeated infinitely many times and your score is the average payoff (suitably computed).

The Prisoner's Dilemma game is fundamental to certain theories of human cooperation and trust. On the assumption that transactions between two people requiring trust can be modelled by the Prisoner's Dilemma, cooperative behavior in populations may be modelled by a multi-player, iterated version of the game. It has, consequently, fascinated many scholars over the years. A not entirely up-to-date estimate (Grofman and Pool 1975) puts the count of scholarly articles devoted to it at over 2,000.

A typical payoff matrix would read:

- If both players cooperate, they each get +5.
- If one cooperates and the other defects, the cooperator gets +1 and the defector gets +10.
- If both defect, they each get -20.

A variant payoff matrix reads:

- If both players cooperate, they each get +10.
- If you cooperate and the other player defects, you get +1 and they get +5.
- If both defect, they each get +3.

*Friend or Foe* is a game show currently airing on the Game Show Network. It is an example of the Prisoner's Dilemma tested with real people, albeit in an artificial setting. On the game show, three pairs of people compete. As each pair is eliminated, they play a game of Prisoner's Dilemma to determine how their winnings are split. If both cooperate ("Friend"), they share the winnings 50-50. If one cooperates and the other defects ("Foe"), the defector takes all the winnings and the cooperator gets nothing. If both defect, both leave with nothing. Notice that the payoff matrix differs slightly from the standard one given above: the payouts for the "both defect" and "I cooperate, my opponent defects" cases are identical. This makes "both defect" a neutral equilibrium, as opposed to a stable equilibrium in the standard Prisoner's Dilemma. If you know your opponent is going to vote "Foe", your choice does not affect your winnings. In this sense, "Friend or Foe" lies between "Prisoner's Dilemma" and "Chicken".

The payoff matrix is:

- If both players cooperate, they each get +1.
- If you cooperate and the other person defects, you get 0 and they get +2.
- If both defect, they each get 0.
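The neutrality of "both defect" can be read straight off this matrix: against a defector, your payoff is 0 whether you cooperate or defect. A quick check in Python (the dictionary encoding of the matrix is an illustrative choice):

```python
# Friend or Foe payoffs from the matrix above, as (your payoff, theirs).
FOF = {("C", "C"): (1, 1), ("C", "D"): (0, 2),
       ("D", "C"): (2, 0), ("D", "D"): (0, 0)}

# If the opponent votes "Foe" (D), your choice makes no difference:
my_payoffs_vs_defector = [FOF[(me, "D")][0] for me in ("C", "D")]
print(my_payoffs_vs_defector)  # [0, 0]

# In the standard dilemma, by contrast, defecting against a defector
# is strictly better than cooperating against one.
```

Because neither reply improves on the other against a defector, mutual defection is an equilibrium only in the weak (neutral) sense: no one gains by deviating, but no one loses either.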

- Robert Axelrod, *The Evolution of Cooperation* (1984).
- Grofman and Pool (1975). "Bayesian Models for Iterated Prisoner's Dilemma Games." *General Systems* 20:185-94.
- William Poundstone, *Prisoner's Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb*. Doubleday, 1992. ISBN 0385415672. A wide-ranging popular introduction, as the title indicates.