Prisoner’s Dilemma:
This is one of the most famous problems in Game Theory:
Two prisoners are separated and questioned. There is no real evidence against them, so if neither confesses, they must both be released. This is the situation shown as AA: payoff (0,0). Of course, they are not told this. Instead, they are told that if they both confess, they will both get a light sentence – BB: payoff (-1,-1). If one confesses and the other does not, then the one who confesses will be released and rewarded with cash, while the other will get a stiffer sentence – AB: payoff (-2,1) or BA: payoff (1,-2).
For both prisoners, Strategy B dominates Strategy A: whatever the other prisoner does, confessing (B) yields a better payoff than staying silent (A). Hence, in a rational world, both will choose B. However, that outcome is worse for both than if each had chosen A, which would set them free.
Individuals rationally pursuing their own interests thus end up at an outcome that is bad for both.
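The dominance argument above can be checked mechanically. A minimal sketch using the payoffs from the story (payoff pairs are (row player, column player); A = stay silent, B = confess, following the text):

```python
# Payoff table from the story: payoffs[(row, col)] = (row player's payoff, column player's payoff)
payoffs = {
    ("A", "A"): (0, 0),    # both stay silent: released
    ("B", "B"): (-1, -1),  # both confess: light sentence
    ("A", "B"): (-2, 1),   # row silent, column confesses
    ("B", "A"): (1, -2),   # row confesses, column silent
}

def best_response(opponent_move):
    """Row player's best move against a fixed opponent move."""
    return max(["A", "B"], key=lambda mine: payoffs[(mine, opponent_move)][0])

# B is the best response no matter what the opponent does: B strictly dominates A.
print(best_response("A"))  # -> B
print(best_response("B"))  # -> B
```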
However, as we will see, cooperation can appear if both players play many turns.
General Formulation:
The general payoff matrix below represents a prisoner’s dilemma, where
C = cooperate = do not confess, D = defect = confess, R = reward for cooperation, S = sucker’s payoff, T = temptation payoff, U = un-cooperative payoff. Entries are (row player, column player) payoffs:

            C         D
    C    (R, R)    (S, T)
    D    (T, S)    (U, U)
This game is a Prisoner’s Dilemma when T > R > U > S and R > (S+T)/2 (the second condition ensures that steady mutual cooperation pays more than taking turns exploiting each other).
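Both conditions can be checked against the numbers from the story above, mapping the letters onto the payoffs given there:

```python
# Payoffs taken from the story above, mapped onto the general letters:
T = 1    # temptation: confess while the other stays silent
R = 0    # reward: both stay silent and are released
U = -1   # un-cooperative: both confess, light sentence
S = -2   # sucker: stay silent while the other confesses

assert T > R > U > S   # ordering condition: 1 > 0 > -1 > -2
assert 2 * R > S + T   # mutual cooperation beats alternating exploitation: 0 > -1
print("valid Prisoner's Dilemma")
```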
Iterated Prisoner’s Dilemma:
1. Finite number of games: If the number of games played is finite, say 100, then in the 100th game there is no future, so both will play D. But in the 99th game they will also play D, since the outcome of the final game is already known... and so on. Thus all games will be DD (assuming both players are completely rational and know there will be only a finite number of games).
2. Unknown number of games: Let p = the probability that the next play will happen. Suppose the first player plays C and continues to play C until the second player plays D; after that, both play D (no further cooperation).
If the second player never plays D, the second player’s payoff is:
X = R + Rp + Rp^2 + Rp^3 + ... = R/(1-p)    Eq. 1
If the second player first plays D on the m-th turn (counting the first turn as turn 0), the second player’s payoff is:
Y = R + Rp + Rp^2 + Rp^3 + ... + Rp^(m-1) + Tp^m + Up^(m+1) + Up^(m+2) + ...
  = [R(1 - p^m) + (1-p)p^m T + p^(m+1) U]/(1-p)    Eq. 2
X > Y if
R > R(1 - p^m) + (1-p)p^m T + p^(m+1) U
which can be simplified to
p > (T-R)/(T-U) Eq. 3
In other words, if the probability of continued play exceeds this threshold, Player 2 does best never to play D. Note what happens as T-R approaches zero, so that the reward for mutual cooperation nearly equals the temptation to defect: the threshold falls to p = 0, and cooperation can be sustained for any positive chance of continued play!
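Equations 1–3 can be sanity-checked numerically. A sketch with illustrative values R = 3, T = 5, U = 1 (my own choice, satisfying T > R > U; the threshold of Eq. 3 is then (5-3)/(5-1) = 0.5):

```python
# Illustrative payoffs satisfying T > R > U (an assumption, not from the text):
R, T, U = 3.0, 5.0, 1.0

def X(p):
    # Eq. 1: payoff for cooperating forever
    return R / (1 - p)

def Y(p, m):
    # Eq. 2: payoff when the first defection comes on turn m (turns counted from 0)
    return (R * (1 - p**m) + (1 - p) * p**m * T + p**(m + 1) * U) / (1 - p)

threshold = (T - R) / (T - U)   # Eq. 3: here 0.5

# Above the threshold, endless cooperation beats defecting at any turn m:
assert all(X(0.6) > Y(0.6, m) for m in range(1, 50))
# Below it, defecting pays more:
assert any(Y(0.4, m) > X(0.4) for m in range(1, 50))
```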
Sequential Iterated Prisoner’s Dilemma in Practice (Axelrod, 1980) (from Wikipedia)
In 1980, Robert Axelrod, then a professor of political science at the University of Michigan, held a tournament for the prisoner's dilemma. He invited a number of well-known game theorists to submit strategies to be run by computer. Each strategy specified whether to cooperate or defect based on the previous moves of both itself and its opponent. Fourteen programs were submitted; each played long repeated games against every entrant, including itself, and the average score per game was recorded.
Some of the strategies submitted were:
Always defect: This strategy defects on every turn. It is what the single-game analysis above recommends, and it is the safest strategy, since it cannot be taken advantage of. However, it misses the chance at the larger payoffs that come from cooperating with an opponent who is ready to cooperate.
Always cooperate: This strategy does very well when matched against itself. However, if the opponent chooses to defect, then this strategy will do badly.
Random: This strategy cooperates at random, 50% of the time.
The winning program (submitted by Anatol Rapoport) was Tit-for-Tat with the following simple rule:
Start by choosing C
Thereafter, on your turn do whatever your opponent did on his previous turn.
In other words: Tit-for-Tat or “Do unto others what they do unto you!”
Axelrod published a program that could beat Tit-for-Tat and asked the computer science community to try to find others. This time, sixty-two programs were entered, some of which could indeed beat Tit-for-Tat. But Tit-for-Tat again had the highest average score against the field as a whole, because the programs that beat Tit-for-Tat did very poorly against one another!
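A toy round-robin in the spirit of the tournament can be sketched as follows. This is an illustration, not Axelrod's actual code; the payoffs T = 5, R = 3, U = 1, S = 0 are the values Axelrod used, while the round count and everything else are invented for the sketch:

```python
import random

# Standard tournament payoffs (T=5, R=3, U=1, S=0), as (row, column) pairs:
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Each strategy sees its own history first, then the opponent's.
def always_defect(my_hist, opp_hist): return "D"
def always_cooperate(my_hist, opp_hist): return "C"
def rand50(my_hist, opp_hist): return random.choice("CD")
def tit_for_tat(my_hist, opp_hist): return opp_hist[-1] if opp_hist else "C"

def play(s1, s2, rounds=200):
    """Run one repeated game and return the two total scores."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
        score1 += p1; score2 += p2
    return score1, score2

strategies = {"ALL-D": always_defect, "ALL-C": always_cooperate,
              "RANDOM": rand50, "TIT-FOR-TAT": tit_for_tat}
totals = {name: 0 for name in strategies}
for n1, s1 in strategies.items():
    for n2, s2 in strategies.items():   # round-robin, including self-play
        totals[n1] += play(s1, s2)[0]
for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(name, total)
```

Note that with only these four entrants, Always Defect can actually top this toy league by exploiting Always Cooperate and Random; Tit-for-Tat's advantage emerged in Axelrod's much larger field, where most entrants retaliated against defection.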
Axelrod analyzed the programs that did well (top 5-6 winners). They all had the following properties:
· They were Nice: Started by cooperating and never defected first.
· They were Retaliatory: Punished reliably if the opponent defected.
· They were Forgiving: Willing to cooperate again once the opponent did so.
· They were Clear: Pattern of play was consistent and easy to predict (not random).
This seems like a perfect set of rules for governments, businesses and social interactions.