# Duality between cooperation and defection in the presence of tit-for-tat in replicator dynamics

## Abstract

The prisoner’s dilemma describes a conflict between a pair of players, in which defection is a dominant strategy whereas cooperation is collectively optimal. The iterated version of the dilemma has been extensively studied to understand the emergence of cooperation. In the evolutionary context, the iterated prisoner’s dilemma is often combined with population dynamics, in which a more successful strategy replicates itself with a higher growth rate. Here, we investigate the replicator dynamics of three representative strategies, i.e., unconditional cooperation, unconditional defection, and tit-for-tat, which prescribes reciprocal cooperation by mimicking the opponent’s previous move. Our finding is that the dynamics is self-dual in the sense that it remains invariant when we apply time reversal and exchange the fractions of unconditional cooperators and defectors in the population. The duality implies that the fractions can be equalized by tit-for-tat players, although unconditional cooperation is still dominated by defection. Furthermore, we find that mutation among the strategies breaks the exact duality in such a way that cooperation is more favored than defection, as long as the cost-to-benefit ratio of cooperation is small.

iterated prisoner’s dilemma, evolution of cooperation, mutation
###### pacs:
02.50.Le,87.23.Cc,05.45.-a

## I Introduction

Although a society consists of individuals, the collective interest is not an aggregate of individual ones. The prisoner’s dilemma (PD) game is a toy model to illustrate such a social dilemma. The PD game can be formulated as follows: Suppose that we have two players, say, Alice and Bob. When Alice cooperates, it benefits Bob by a certain amount $b$ at her own cost $c$. If she defects, on the other hand, it does not incur any cost and Bob gains nothing. If $c$ exceeds $b$, defection obviously drives out cooperation, so we restrict ourselves to $b > c > 0$. The cost-to-benefit ratio, $c/b$, is thus limited to an open interval $(0, 1)$. The resulting payoff matrix between cooperation ($C$) and defection ($D$) is expressed as

$$
\begin{array}{c|cc} & C & D \\ \hline C & b-c & -c \\ D & b & 0 \end{array} \tag{1}
$$

from the row-player Alice’s point of view, and the game is symmetric between the two players. The collective interest is maximized when both choose $C$, but $D$ is the rational choice for each individual, hence a dilemma.

By construction of the PD game, unconditional defection (AllD) always constitutes a Nash equilibrium. However, it has been widely known from folk theorems that a cooperative strategy can also be rational if the PD game is repeated indefinitely with high enough probability, because one’s cooperation can be reciprocated by the other’s in the future. This is called direct reciprocity and has been popularized by Axelrod’s tournament of the iterated prisoner’s dilemma (IPD) Axelrod (1984). We assume that the repetition probability approaches one. An archetypal strategy of direct reciprocity is Tit-for-tat (TFT). It begins with $C$ at the first encounter and then replicates the co-player’s last move. Except for the first round, therefore, it cooperates only if the co-player cooperated last time. We may call it a conditional cooperator, as opposed to an unconditional cooperator (AllC). We will explain that the interactions between the aforementioned strategies, i.e., AllD, TFT, and AllC, are rather subtle, indicating the complexity in evolution of cooperation. Earlier studies have already focused on the dynamics of these three representative strategies Imhof et al. (2005); Brandt and Sigmund (2006); Toupo et al. (2014).

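All these fall into a class of reactive strategies Baek et al. (2016) represented by a two-component array $(p_C, p_D)$, where $p_C$ ($p_D$) means the probability to cooperate when the co-player cooperated (defected) last time. In this notation, we have AllC = $(1, 1)$, AllD = $(0, 0)$, and TFT = $(1, 0)$. If error occurs with probability $e$ at each time step, the effective behavior is described as $(p_C', p_D') = \big((1-e)p_C + e(1-p_C),\ (1-e)p_D + e(1-p_D)\big)$. The error rate $e$ is assumed to be small, and this statement will be made quantitative later. Suppose that two strategies $\mathbf{p} = (p_C, p_D)$ and $\mathbf{q} = (q_C, q_D)$ meet in the IPD. They effectively behave as $(p_C', p_D')$ and $(q_C', q_D')$, respectively, and stochastically visit four states, $(C,C)$, $(C,D)$, $(D,C)$, and $(D,D)$, where the former (latter) symbol means the move of the player adopting $\mathbf{p}$ ($\mathbf{q}$). The transition probabilities between the states can be arranged in the following matrix Nowak and Sigmund (1989, 1990), whose columns are indexed by the current state and whose rows are indexed by the next state:

$$
M = \begin{pmatrix}
p_C' q_C' & p_D' q_C' & p_C' q_D' & p_D' q_D' \\
p_C' (1-q_C') & p_D' (1-q_C') & p_C' (1-q_D') & p_D' (1-q_D') \\
(1-p_C') q_C' & (1-p_D') q_C' & (1-p_C') q_D' & (1-p_D') q_D' \\
(1-p_C')(1-q_C') & (1-p_D')(1-q_C') & (1-p_C')(1-q_D') & (1-p_D')(1-q_D')
\end{pmatrix} \tag{2}
$$

This stochastic matrix is irreducible and all its entries are positive for $0 < e < 1$, so the Perron-Frobenius theorem guarantees the existence of a unique right eigenvector $\vec{v}$ with the largest eigenvalue $1$. If we normalize $\vec{v}$ in such a way that $\sum_i v_i = 1$, it is the stationary probability distribution over the four states when the strategies $\mathbf{p}$ and $\mathbf{q}$ are adopted in the IPD. The long-term payoff of $\mathbf{p}$ against $\mathbf{q}$ per round is obtained by calculating an inner product $\vec{v} \cdot \vec{h}_{\mathbf{p}}$, where $\vec{h}_{\mathbf{p}} = (b-c, -c, b, 0)$. Likewise, we obtain the payoff of $\mathbf{q}$ against $\mathbf{p}$ as $\vec{v} \cdot \vec{h}_{\mathbf{q}}$ with $\vec{h}_{\mathbf{q}} = (b-c, b, -c, 0)$. If we list the three strategies in the order of AllC, AllD, and TFT, the payoff matrix $\tilde{p}$ can be written as follows: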

$$
\tilde{p} = \begin{pmatrix}
(b-c)(1-e) & be - c(1-e) & b(1-2e+2e^2) - c(1-e) \\
b(1-e) - ce & (b-c)e & 2b(1-e)e - ce \\
b(1-e) - c(1-2e+2e^2) & be - 2c(1-e)e & (b-c)/2
\end{pmatrix}. \tag{3}
$$

Note that the limit of $e \to 0$ does not coincide with the case of $e = 0$: If $e$ were strictly zero between two TFT players, each of them would earn $b-c$ at each round. For any $e > 0$, however, the average payoff per round reduces to $(b-c)/2$ as written in Eq. (3). All these results are fully consistent with existing ones such as in Refs. Molander, 1985; Imhof et al., 2007.
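The construction above can be checked numerically. The following is a minimal sketch, not the authors' code: it builds the four-state chain for a pair of reactive strategies, finds the stationary distribution by power iteration (an assumed substitute for an eigenvector solver), and compares the resulting payoffs with the closed form of Eq. (3). The function names and the values $b=1$, $c=0.2$, $e=0.05$ are illustrative assumptions.

```python
def effective(strategy, e):
    """Cooperation probabilities (after C, after D) once moves are flipped with probability e."""
    return tuple((1 - e) * s + e * (1 - s) for s in strategy)


def transition_matrix(p, q, e):
    """Column-stochastic matrix M[new][old] over the states (C,C), (C,D), (D,C), (D,D)."""
    pC, pD = effective(p, e)
    qC, qD = effective(q, e)
    # For each old state: (prob. player 1 cooperates, prob. player 2 cooperates).
    # Each player reacts to the co-player's last move.
    coop = [(pC, qC), (pD, qC), (pC, qD), (pD, qD)]
    cols = [[u * w, u * (1 - w), (1 - u) * w, (1 - u) * (1 - w)] for u, w in coop]
    return [[cols[old][new] for old in range(4)] for new in range(4)]


def stationary(M, n_iter=5000):
    """Power iteration towards the eigenvector of M with eigenvalue 1 (needs e > 0)."""
    v = [0.25] * 4
    for _ in range(n_iter):
        v = [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]
    return v


def long_term_payoff(p, q, b, c, e):
    """Per-round payoff of the player using p against a co-player using q."""
    v = stationary(transition_matrix(p, q, e))
    return sum(vi * hi for vi, hi in zip(v, (b - c, -c, b, 0.0)))


def payoff_matrix_eq3(b, c, e):
    """Closed form of Eq. (3); rows and columns ordered AllC, AllD, TFT."""
    return [
        [(b - c) * (1 - e), b * e - c * (1 - e), b * (1 - 2 * e + 2 * e**2) - c * (1 - e)],
        [b * (1 - e) - c * e, (b - c) * e, 2 * b * (1 - e) * e - c * e],
        [b * (1 - e) - c * (1 - 2 * e + 2 * e**2), b * e - 2 * c * (1 - e) * e, (b - c) / 2],
    ]


ALLC, ALLD, TFT = (1, 1), (0, 0), (1, 0)
b, c, e = 1.0, 0.2, 0.05  # illustrative values only
strategies = [ALLC, ALLD, TFT]
numeric = [[long_term_payoff(p, q, b, c, e) for q in strategies] for p in strategies]
closed_form = payoff_matrix_eq3(b, c, e)
max_diff = max(abs(numeric[i][j] - closed_form[i][j]) for i in range(3) for j in range(3))
print("largest deviation from Eq. (3):", max_diff)
```

Power iteration is used only to keep the sketch dependency-free; any eigenvector routine would serve equally well as long as $e > 0$, so that the chain has a unique stationary state.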

In an evolutionary framework, we consider dynamics of a well-mixed population in which random pairs of individuals play the IPD game. Let us assume that the population is so large that stochastic fluctuations can be ignored. If a certain strategy earns a higher payoff than the population average, we can expect that its fraction will grow at a rate proportional to the payoff difference from the population average. Likewise, a strategy with a lower payoff than the population average will decrease in its fraction. Replicator dynamics (RD) expresses this idea by using a set of deterministic equations for the time evolution of the fractions. Let $N_s$ be the total number of strategies in the population. We have $N_s = 3$ in a set of the three strategies, i.e., {AllC, AllD, TFT}. We are interested in the fraction $x_\alpha$ of strategy $\alpha$, with a normalization condition that $\sum_\alpha x_\alpha = 1$. The long-term payoff of strategy $\alpha$ from the whole population is denoted as

$$ p_\alpha = \sum_\beta p_{\alpha\beta} x_\beta. \tag{4} $$

RD describes the time evolution of $x_\alpha$ as follows:

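$$ \frac{dx_\alpha}{dt} = \sum_\beta q_{\alpha\beta}\, p_\beta x_\beta - \langle p \rangle x_\alpha, \tag{5} $$

where $q_{\alpha\beta}$’s are elements of a transition matrix between strategies. The average payoff of the population is denoted as $\langle p \rangle \equiv \sum_\alpha p_\alpha x_\alpha$. If we choose the transition matrix as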

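$$ q_{\alpha\beta} = \begin{cases} 1-\mu & \text{for } \alpha = \beta \\ \mu/(N_s-1) & \text{for } \alpha \neq \beta, \end{cases} \tag{6} $$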

RD takes the following form:

$$ \frac{dx_\alpha}{dt} = (1-\mu)\, p_\alpha x_\alpha - \langle p \rangle x_\alpha + \frac{\mu}{N_s-1} \sum_{\beta \neq \alpha} p_\beta x_\beta, \tag{7} $$

where $\mu$ is a mutation rate, assumed to satisfy $\mu \ll 1$. The first term on the right-hand side means growth with a rate proportional to the payoff, the second term normalizes the total sum of $x_\alpha$’s, and the last term describes mutation. Note that the fitness of strategy $\alpha$ is identified with its payoff $p_\alpha$, so that it produces offspring in proportion to $p_\alpha x_\alpha$ between time $t$ and $t + dt$. The mutation structure in Eq. (6) means that some of these offspring are randomly picked and change their strategy to one of the others.
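As a concrete reading of Eqs. (4) and (7), the right-hand side can be written in a few lines. This is a sketch under assumed names (`payoff_matrix`, `replicator_rhs`) and illustrative parameter values; it also checks that the three rates sum to zero, which is what keeps $\sum_\alpha x_\alpha = 1$.

```python
def payoff_matrix(b, c, e):
    """Eq. (3); rows and columns are ordered AllC, AllD, TFT."""
    return [
        [(b - c) * (1 - e), b * e - c * (1 - e), b * (1 - 2 * e + 2 * e**2) - c * (1 - e)],
        [b * (1 - e) - c * e, (b - c) * e, 2 * b * (1 - e) * e - c * e],
        [b * (1 - e) - c * (1 - 2 * e + 2 * e**2), b * e - 2 * c * (1 - e) * e, (b - c) / 2],
    ]


def replicator_rhs(x, P, mu):
    """Right-hand side of Eq. (7) for the fractions x = [x_AllC, x_AllD, x_TFT]."""
    n = len(x)
    # Eq. (4): payoff of each strategy against the whole population.
    p = [sum(P[a][k] * x[k] for k in range(n)) for a in range(n)]
    avg = sum(p[a] * x[a] for a in range(n))  # population average <p>
    dxdt = []
    for a in range(n):
        # Offspring of the other strategies that mutate into strategy a.
        inflow = sum(p[k] * x[k] for k in range(n) if k != a)
        dxdt.append((1 - mu) * p[a] * x[a] - avg * x[a] + mu / (n - 1) * inflow)
    return dxdt


P = payoff_matrix(b=1.0, c=0.2, e=0.01)  # illustrative parameter values
dxdt = replicator_rhs([0.5, 0.3, 0.2], P, mu=1e-3)
print(dxdt, sum(dxdt))  # the components sum to zero up to rounding
```

The cancellation follows from $\sum_\alpha \sum_{\beta \neq \alpha} p_\beta x_\beta = (N_s - 1) \langle p \rangle$, so the mutation term only redistributes offspring among strategies.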

In this work, we will show the following: If $\mu$ vanishes, the time evolution of $x_{\text{AllC}}$ in RD is the same as that of $x_{\text{AllD}}$ under time reversal, $t \to -t$, and vice versa. The duality does not exactly hold for $\mu > 0$, and we will discuss its consequences by analyzing the system perturbatively.

## II Fixed-point Structure

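For the sake of notational convenience, we define $x_1 \equiv x_{\text{AllC}}$, $x_2 \equiv x_{\text{AllD}}$, and $x_3 \equiv x_{\text{TFT}}$ henceforth. Due to the normalization condition, we have only two independent variables, which we choose as $x_1$ and $x_2$. Plugging Eq. (4) into Eq. (7), we find a set of equations, which can be formally written as follows:

$$
\begin{aligned}
\frac{dx_1}{dt} &= f_1(x_1, x_2; e, \mu) && (8) \\
\frac{dx_2}{dt} &= f_2(x_1, x_2; e, \mu). && (9)
\end{aligned}
$$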

After a little algebra, one can show that

$$ f_1(x_1, x_2; e, \mu) + f_2(x_2, x_1; e, \mu) = \frac{1}{2} \mu (b-c)(1 - 3x_1), \tag{10} $$

which becomes zero as $\mu$ vanishes. Note that $x_1$ and $x_2$ exchange their positions when they are arguments of $f_2$ in Eq. (10). If we set $\mu = 0$ and define $\tau \equiv -t$, therefore,

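$$
\begin{aligned}
\frac{dx_1}{d\tau} &= -\frac{dx_1}{dt} = -f_1(x_1, x_2; e, 0) = f_2(x_2, x_1; e, 0) && (11) \\
\frac{dx_2}{d\tau} &= -\frac{dx_2}{dt} = -f_2(x_1, x_2; e, 0) = f_1(x_2, x_1; e, 0). && (12)
\end{aligned}
$$

By introducing $X_1 \equiv x_2$ and $X_2 \equiv x_1$, we find that

$$
\begin{aligned}
\frac{dX_1}{d\tau} &= f_1(X_1, X_2; e, 0) && (13) \\
\frac{dX_2}{d\tau} &= f_2(X_1, X_2; e, 0), && (14)
\end{aligned}
$$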

which recovers the original dynamics. In other words, the dynamics is dual under time reversal and exchange of $x_1$ and $x_2$. Suppose that we have observed a trajectory under RD with $\mu = 0$. Even if we exchange the names of AllC and AllD populations and trace the trajectory backward in time, we will obtain a valid trajectory governed by the same RD due to the duality (Fig. 1). As a consequence, for a given fixed point (FP) at $(x_1, x_2) = (u, v)$, there must be a mirror FP at $(x_1, x_2) = (v, u)$. Furthermore, the duality also imposes a constraint on the stability: If one is stable, for example, the other must be unstable. Suppose that RD has a single FP. We then have to conclude that $u = v$ because $(u, v) = (v, u)$. In addition, due to the stability constraint, it must be either a saddle or a neutrally stable point.
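The identity in Eq. (10) can be tested numerically at random points of the simplex. The sketch below assumes the payoff matrix of Eq. (3) and the dynamics of Eq. (7) with $N_s = 3$; the helper names and parameter values are illustrative. The residual should vanish up to rounding both for $\mu = 0$ (exact duality) and for $\mu > 0$.

```python
import random


def payoff_matrix(b, c, e):
    """Eq. (3); rows and columns are ordered AllC, AllD, TFT."""
    return [
        [(b - c) * (1 - e), b * e - c * (1 - e), b * (1 - 2 * e + 2 * e**2) - c * (1 - e)],
        [b * (1 - e) - c * e, (b - c) * e, 2 * b * (1 - e) * e - c * e],
        [b * (1 - e) - c * (1 - 2 * e + 2 * e**2), b * e - 2 * c * (1 - e) * e, (b - c) / 2],
    ]


def f(x1, x2, P, mu):
    """(f1, f2) of Eqs. (8) and (9), with x3 = 1 - x1 - x2 eliminated."""
    x = [x1, x2, 1.0 - x1 - x2]
    p = [sum(P[a][k] * x[k] for k in range(3)) for a in range(3)]
    avg = sum(p[a] * x[a] for a in range(3))
    # The sum over beta != alpha in Eq. (7) equals <p> - p_alpha x_alpha.
    rate = [(1 - mu) * p[a] * x[a] - avg * x[a] + 0.5 * mu * (avg - p[a] * x[a])
            for a in range(3)]
    return rate[0], rate[1]


def duality_residual(x1, x2, b, c, e, mu):
    """Left-hand side minus right-hand side of Eq. (10)."""
    P = payoff_matrix(b, c, e)
    f1 = f(x1, x2, P, mu)[0]
    f2_swapped = f(x2, x1, P, mu)[1]  # arguments exchanged, as in Eq. (10)
    return f1 + f2_swapped - 0.5 * mu * (b - c) * (1 - 3 * x1)


random.seed(0)
worst = 0.0
for _ in range(1000):
    x1 = random.random()
    x2 = random.random() * (1 - x1)  # keeps x1 + x2 <= 1
    for mu in (0.0, 0.01):
        worst = max(worst, abs(duality_residual(x1, x2, 1.0, 0.2, 0.05, mu)))
print("largest residual of Eq. (10):", worst)
```

The identity rests on two properties of Eq. (3): the AllC and AllD rows sum to $b - c$ in every column, and $\langle p \rangle$ evaluated at a state and at its mirror image adds up to $b - c$.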

The question is the number of FP’s in this dynamics. When $\mu = 0$, it is relatively easy to calculate each FP:

$$
(x_1, x_2) = \begin{cases}
(1, 0) \equiv \text{FP}_1 \\
(0, 1) \equiv \text{FP}_2 \\
\left( \dfrac{b(1-2e)-c}{(b-c)(1-2e)},\ 0 \right) \equiv \text{FP}_3 \\
\left( 0,\ \dfrac{b(1-2e)-c}{(b-c)(1-2e)} \right) \equiv \text{FP}_4 \\
(0, 0) \equiv \text{FP}_5 \\
\left( \dfrac{b(1-2e)-c}{2b(1-2e)},\ \dfrac{b(1-2e)-c}{2b(1-2e)} \right) \equiv \text{FP}_6
\end{cases} \tag{15}
$$

If $b(1-2e) > c$, all these FP’s are feasible, that is, all $x_\alpha$’s ($\alpha = 1, 2, 3$) belong to the unit interval $[0, 1]$. Otherwise, only FP$_1$, FP$_2$, and FP$_5$ will remain available. We assume that $e$ is small in the sense that $e < (b-c)/(2b)$ for values of $b$ and $c$ considered in this work. The eigenvalues and eigenvectors of the differential equation of Eqs. (8) and (9) with $\mu = 0$ are given in Table 1.
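A quick numerical check of Eq. (15): the sketch below evaluates the $\mu = 0$ dynamics at each of the six points and tests feasibility. Function names and the parameter values are assumptions made for illustration only.

```python
def payoff_matrix(b, c, e):
    """Eq. (3); rows and columns are ordered AllC, AllD, TFT."""
    return [
        [(b - c) * (1 - e), b * e - c * (1 - e), b * (1 - 2 * e + 2 * e**2) - c * (1 - e)],
        [b * (1 - e) - c * e, (b - c) * e, 2 * b * (1 - e) * e - c * e],
        [b * (1 - e) - c * (1 - 2 * e + 2 * e**2), b * e - 2 * c * (1 - e) * e, (b - c) / 2],
    ]


def f(x1, x2, P, mu):
    """(f1, f2) of Eqs. (8) and (9), with x3 = 1 - x1 - x2 eliminated."""
    x = [x1, x2, 1.0 - x1 - x2]
    p = [sum(P[a][k] * x[k] for k in range(3)) for a in range(3)]
    avg = sum(p[a] * x[a] for a in range(3))
    rate = [(1 - mu) * p[a] * x[a] - avg * x[a] + 0.5 * mu * (avg - p[a] * x[a])
            for a in range(3)]
    return rate[0], rate[1]


def fixed_points(b, c, e):
    """The six fixed points (x1, x2) of Eq. (15), valid for mu = 0."""
    r = (b * (1 - 2 * e) - c) / ((b - c) * (1 - 2 * e))
    s = (b * (1 - 2 * e) - c) / (2 * b * (1 - 2 * e))
    return [(1.0, 0.0), (0.0, 1.0), (r, 0.0), (0.0, r), (0.0, 0.0), (s, s)]


def is_feasible(x1, x2):
    """All three fractions must lie in the unit interval."""
    return 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0 and x1 + x2 <= 1.0


b, c, e = 1.0, 0.2, 0.01  # illustrative values with b(1 - 2e) > c
P = payoff_matrix(b, c, e)
residuals = [max(abs(v) for v in f(x1, x2, P, 0.0)) for x1, x2 in fixed_points(b, c, e)]
feasible = [is_feasible(x1, x2) for x1, x2 in fixed_points(b, c, e)]
print(residuals, feasible)
```

With a large error rate such as $e = 0.45$ for the same $b$ and $c$, the condition $b(1-2e) > c$ fails and `is_feasible` rejects FP$_3$, FP$_4$, and FP$_6$, leaving only the three vertices.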

For $\mu > 0$, we cannot find all FP’s in closed form because they involve a sixth-order polynomial equation. It is more instructive to calculate them in a perturbative way for small $\mu$. We obtain the perturbative solution by using the Newton method, in which the FP’s for $\mu = 0$ serve as trial solutions. Let us denote any of the trial solutions as $(x_1, x_2)$, whereas the corresponding solution for $\mu > 0$ as $(x_1^*, x_2^*)$. From the Taylor expansion around the FP:

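$$
\begin{aligned}
0 &= f_1(x_1^*, x_2^*) = f_1(x_1, x_2) + (x_1^* - x_1) \frac{\partial f_1}{\partial x_1} + (x_2^* - x_2) \frac{\partial f_1}{\partial x_2} + \ldots && (16) \\
0 &= f_2(x_1^*, x_2^*) = f_2(x_1, x_2) + (x_1^* - x_1) \frac{\partial f_2}{\partial x_1} + (x_2^* - x_2) \frac{\partial f_2}{\partial x_2} + \ldots, && (17)
\end{aligned}
$$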

we observe that

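$$
\begin{pmatrix} x_1^* \\ x_2^* \end{pmatrix} \approx \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} \end{pmatrix}^{-1} \begin{pmatrix} f_1(x_1, x_2) \\ f_2(x_1, x_2) \end{pmatrix}. \tag{18}
$$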

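The resulting expressions for $(x_1^*, x_2^*)$ are the following:

$$
(x_1^*, x_2^*) \approx \begin{cases}
(1, 0) + \mu \left( \dfrac{(b-c)(1-e^2)}{2ce(1-2e)},\ -\dfrac{(b-c)(1-e)}{2c(1-2e)} \right) \\[2ex]
(0, 1) + \mu \left( \dfrac{(b-c)e}{2c(1-2e)},\ -\dfrac{(b-c)(1+e)}{2c(1-2e)} \right) \\[2ex]
\left( \dfrac{b(1-2e)-c}{(b-c)(1-2e)},\ 0 \right) + \mu \left( \dfrac{[b-c-3(b+c)e]\,[2bc(1-e+e^2)-(b^2+c^2)(1-e)]}{2c(b-c)e(1-2e)(b-c-2be)},\ -\dfrac{(b-c)^2(1-e)-2bce^2}{2c(1-2e)(b-c-2be)} \right) \\[2ex]
\left( 0,\ \dfrac{b(1-2e)-c}{(b-c)(1-2e)} \right) + \mu \left( \dfrac{e\,[(b-c)^2+2bce]}{2c(1-2e)(b-c-2be)},\ \dfrac{[(b-c)^2+2bce]\,[b-c-3(b+c)e]}{2c(b-c)(1-2e)(b-c-2be)} \right) \\[2ex]
(0, 0) + \mu \left( -\dfrac{b-c}{2(1-2e)(b-c-2be)},\ \dfrac{b-c}{2(1-2e)(b-c-2be)} \right) \\[2ex]
\left( \dfrac{b(1-2e)-c}{2b(1-2e)},\ \dfrac{b(1-2e)-c}{2b(1-2e)} \right) + \mu \left( \dfrac{b(b-c)(b-3c-2be)}{4c^2(1-2e)(b-c-2be)},\ -\dfrac{b(b-c)(b-3c-2be)}{4c^2(1-2e)(b-c-2be)} \right).
\end{cases} \tag{19}
$$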
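The Newton update of Eq. (18) is easy to iterate numerically. The sketch below is an illustration rather than the procedure used in the paper: it starts from FP$_6$ of Eq. (15), uses a central-difference Jacobian (an assumption, chosen to avoid symbolic algebra), and compares the converged shift with the first-order coefficient in the last line of Eq. (19). Names and parameter values are hypothetical.

```python
def payoff_matrix(b, c, e):
    """Eq. (3); rows and columns are ordered AllC, AllD, TFT."""
    return [
        [(b - c) * (1 - e), b * e - c * (1 - e), b * (1 - 2 * e + 2 * e**2) - c * (1 - e)],
        [b * (1 - e) - c * e, (b - c) * e, 2 * b * (1 - e) * e - c * e],
        [b * (1 - e) - c * (1 - 2 * e + 2 * e**2), b * e - 2 * c * (1 - e) * e, (b - c) / 2],
    ]


def f(x1, x2, P, mu):
    """(f1, f2) of Eqs. (8) and (9), with x3 = 1 - x1 - x2 eliminated."""
    x = [x1, x2, 1.0 - x1 - x2]
    p = [sum(P[a][k] * x[k] for k in range(3)) for a in range(3)]
    avg = sum(p[a] * x[a] for a in range(3))
    rate = [(1 - mu) * p[a] * x[a] - avg * x[a] + 0.5 * mu * (avg - p[a] * x[a])
            for a in range(3)]
    return rate[0], rate[1]


def jacobian(x1, x2, P, mu, h=1e-6):
    """Central-difference approximation of the matrix of partial derivatives in Eq. (18)."""
    d1p, d1m = f(x1 + h, x2, P, mu), f(x1 - h, x2, P, mu)
    d2p, d2m = f(x1, x2 + h, P, mu), f(x1, x2 - h, P, mu)
    return [[(d1p[0] - d1m[0]) / (2 * h), (d2p[0] - d2m[0]) / (2 * h)],
            [(d1p[1] - d1m[1]) / (2 * h), (d2p[1] - d2m[1]) / (2 * h)]]


def newton_step(x1, x2, P, mu):
    """One application of Eq. (18): x <- x - J^{-1} f."""
    f1, f2 = f(x1, x2, P, mu)
    (j11, j12), (j21, j22) = jacobian(x1, x2, P, mu)
    det = j11 * j22 - j12 * j21
    return (x1 - (j22 * f1 - j12 * f2) / det,
            x2 - (-j21 * f1 + j11 * f2) / det)


b, c, e, mu = 1.0, 0.2, 0.01, 1e-5  # illustrative values
P = payoff_matrix(b, c, e)
s = (b * (1 - 2 * e) - c) / (2 * b * (1 - 2 * e))  # FP6 at mu = 0
x1, x2 = s, s
for _ in range(20):
    x1, x2 = newton_step(x1, x2, P, mu)

# First-order shift of FP6 according to the last line of Eq. (19).
predicted = mu * b * (b - c) * (b - 3 * c - 2 * b * e) / (4 * c**2 * (1 - 2 * e) * (b - c - 2 * b * e))
print(x1 - s, x2 - s, predicted)
```

For these values the shift of $x_1$ is positive and that of $x_2$ is negative, and both agree with the first-order prediction up to corrections of higher order in $\mu$.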

Recall that we are concerned with a parameter region of $b - c - 2be > 0$. We discard the first, third, and fifth solutions because they admit negative fractions in this region. We will denote the other three as FP$_2'$, FP$_4'$, and FP$_6'$, respectively. Some of them can also be unfeasible, however, because an implicit assumption behind Eq. (19) is that the perturbed solutions still exist in the real domain, which may not always be true. It turns out that FP$_2'$ and FP$_4'$ can be complex unless we restrict the ranges of $e$ and $\mu$. In Appendix, we derive the following set of inequalities to make FP$_2'$ and FP$_4'$ real, provided that :

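$$
\begin{aligned}
e &\lesssim e_{\max} \equiv \frac{-b^2 + 3bc + 4c^2 + \sqrt{9b^4 - 18b^3c + 37b^2c^2 - 44bc^3 + 52c^4}}{2(2b^2 + bc + 9c^2)} && (20) \\
\mu &\lesssim \mu_{\max} \equiv \frac{c^2 e (1-2e)}{(b-c)^2 - e(b^2 - 3bc - 4c^2) - e^2(2b^2 + bc + 9c^2)}. && (21)
\end{aligned}
$$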

We plot the upper bounds $e_{\max}$ and $\mu_{\max}$ in Fig. 2. From Fig. 2(a), we see that the first inequality is always satisfied as long as . For given and , one can solve to estimate the range of that makes FP$_2'$ and FP$_4'$ complex, leaving only FP$_6'$ as a possible outcome. The point is that FP$_6'$, the last one in Eq. (19), is the most robust one which remains feasible over the range of under consideration. To tell if it is actually accessible, we should analyze its stability. Table 1 shows that it is neutrally stable at $\mu = 0$ because its eigenvalues are purely imaginary. Let us denote the eigenvalues as $\lambda_6^{\pm}$, where $\pm$ means the sign in front. If we introduce small yet positive $\mu$, they begin to contain a real part with a magnitude of $O(\mu)$:

$$ \text{Re}(\lambda_6^{\pm}) \approx -\frac{\mu (b-c) \left[ (b-c)^2 + 2c^2 \right]}{4c(b-c-2be)}, \tag{22} $$

which is negative in our parameter region. It means that FP$_6'$ will be stable in the presence of mutation so that nearby trajectories will be attracted to that point. If $b - 3c - 2be > 0$, the correction to FP$_6$ in Eq. (19) is positive for $x_1$ and negative for $x_2$. Mutation breaks the duality between cooperators and defectors, and it does so in a way that favors and stabilizes cooperation.
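The stability claim can be probed numerically by linearizing Eqs. (8) and (9) at the perturbed fixed point. The following sketch locates the fixed point near FP$_6$ by Newton iteration with a finite-difference Jacobian and compares the real part of its eigenvalues with Eq. (22). All names and parameter values are illustrative assumptions, and since Eq. (22) is an approximation, agreement is expected only to leading order in the small parameters.

```python
import cmath


def payoff_matrix(b, c, e):
    """Eq. (3); rows and columns are ordered AllC, AllD, TFT."""
    return [
        [(b - c) * (1 - e), b * e - c * (1 - e), b * (1 - 2 * e + 2 * e**2) - c * (1 - e)],
        [b * (1 - e) - c * e, (b - c) * e, 2 * b * (1 - e) * e - c * e],
        [b * (1 - e) - c * (1 - 2 * e + 2 * e**2), b * e - 2 * c * (1 - e) * e, (b - c) / 2],
    ]


def f(x1, x2, P, mu):
    """(f1, f2) of Eqs. (8) and (9), with x3 = 1 - x1 - x2 eliminated."""
    x = [x1, x2, 1.0 - x1 - x2]
    p = [sum(P[a][k] * x[k] for k in range(3)) for a in range(3)]
    avg = sum(p[a] * x[a] for a in range(3))
    rate = [(1 - mu) * p[a] * x[a] - avg * x[a] + 0.5 * mu * (avg - p[a] * x[a])
            for a in range(3)]
    return rate[0], rate[1]


def jacobian(x1, x2, P, mu, h=1e-6):
    """Central-difference Jacobian of (f1, f2) with respect to (x1, x2)."""
    d1p, d1m = f(x1 + h, x2, P, mu), f(x1 - h, x2, P, mu)
    d2p, d2m = f(x1, x2 + h, P, mu), f(x1, x2 - h, P, mu)
    return [[(d1p[0] - d1m[0]) / (2 * h), (d2p[0] - d2m[0]) / (2 * h)],
            [(d1p[1] - d1m[1]) / (2 * h), (d2p[1] - d2m[1]) / (2 * h)]]


def eigenvalues_2x2(J):
    """Both eigenvalues of a 2x2 matrix from its trace and determinant."""
    half_trace = 0.5 * (J[0][0] + J[1][1])
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    root = cmath.sqrt(half_trace**2 - det)
    return half_trace + root, half_trace - root


b, c, e, mu = 1.0, 0.2, 0.01, 1e-5  # illustrative values
P = payoff_matrix(b, c, e)
x1 = x2 = (b * (1 - 2 * e) - c) / (2 * b * (1 - 2 * e))  # start from FP6
for _ in range(20):  # Newton iteration as in Eq. (18)
    f1, f2 = f(x1, x2, P, mu)
    (j11, j12), (j21, j22) = jacobian(x1, x2, P, mu)
    det = j11 * j22 - j12 * j21
    x1, x2 = x1 - (j22 * f1 - j12 * f2) / det, x2 - (-j21 * f1 + j11 * f2) / det

lam_plus, lam_minus = eigenvalues_2x2(jacobian(x1, x2, P, mu))
predicted = -mu * (b - c) * ((b - c) ** 2 + 2 * c**2) / (4 * c * (b - c - 2 * b * e))
print(lam_plus, lam_minus, predicted)
```

At $\mu = 0$ the same calculation gives a vanishing trace at FP$_6$, consistent with the purely imaginary eigenvalues stated above.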

## III Numerical Results

We have performed numerical calculations to check our analytic calculations in the previous section. We fix as unity without loss of generality. We have chosen , so the inequality is satisfied for . Integrating Eq. (7) from an initial condition, we remove transient behavior and calculate the time averages of defined as follows:

$$ \bar{x}_\alpha = \lim_{T \to \infty} \frac{1}{T - T_0} \int_{T_0}^{T} x_\alpha \, dt, \tag{23} $$

where $T_0$ is the transient time. Note that the dynamics may have multiple attractors: Figures 3(a) and (b) show numerical integration of RD when and , respectively. Sometimes every initial condition leads to the same result on average [Fig. 3(a)]. Then, we can express any of $\bar{x}_\alpha$’s ($\alpha = 1, 2, 3$) as a function of . However, if this is not the case, as illustrated in Fig. 3(b), we have to test many different initial conditions, and the resulting $\bar{x}_\alpha$ will be multi-valued for given . To sample the initial condition, we use an exhaustive search with mesh size . That is, we check initial conditions of (AllC, AllD, TFT) = .
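A finite-time version of Eq. (23) can be sketched as follows. The integrator (classical fourth-order Runge-Kutta), the step size, the transient and total times, the initial condition, and all parameter values are assumptions chosen for illustration; they are not the settings used for the figures.

```python
def payoff_matrix(b, c, e):
    """Eq. (3); rows and columns are ordered AllC, AllD, TFT."""
    return [
        [(b - c) * (1 - e), b * e - c * (1 - e), b * (1 - 2 * e + 2 * e**2) - c * (1 - e)],
        [b * (1 - e) - c * e, (b - c) * e, 2 * b * (1 - e) * e - c * e],
        [b * (1 - e) - c * (1 - 2 * e + 2 * e**2), b * e - 2 * c * (1 - e) * e, (b - c) / 2],
    ]


def replicator_rhs(x, P, mu):
    """Right-hand side of Eq. (7) with N_s = 3."""
    p = [sum(P[a][k] * x[k] for k in range(3)) for a in range(3)]
    avg = sum(p[a] * x[a] for a in range(3))
    return [(1 - mu) * p[a] * x[a] - avg * x[a] + 0.5 * mu * (avg - p[a] * x[a])
            for a in range(3)]


def rk4_step(x, P, mu, dt):
    """One classical Runge-Kutta step of size dt."""
    k1 = replicator_rhs(x, P, mu)
    k2 = replicator_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], P, mu)
    k3 = replicator_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], P, mu)
    k4 = replicator_rhs([xi + dt * ki for xi, ki in zip(x, k3)], P, mu)
    return [xi + dt / 6.0 * (a1 + 2 * a2 + 2 * a3 + a4)
            for xi, a1, a2, a3, a4 in zip(x, k1, k2, k3, k4)]


def time_average(x0, P, mu, dt=0.1, t_transient=500.0, t_total=1000.0):
    """Discrete approximation of Eq. (23) at finite T, discarding the transient."""
    x = list(x0)
    n_skip = int(round(t_transient / dt))
    n_total = int(round(t_total / dt))
    acc = [0.0, 0.0, 0.0]
    for step in range(n_total):
        x = rk4_step(x, P, mu, dt)
        if step >= n_skip:
            acc = [a + xi for a, xi in zip(acc, x)]
    return [a / (n_total - n_skip) for a in acc]


P = payoff_matrix(b=1.0, c=0.2, e=0.01)  # illustrative values
x_bar = time_average([0.3, 0.3, 0.4], P, mu=0.01)
print(x_bar, sum(x_bar))
```

To reproduce a multi-valued result one would repeat `time_average` over a grid of initial conditions on the simplex and collect the distinct outcomes.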

In Fig. 4(a), we have depicted how depends on when . For , every initial condition yields the same result in the long run, which agrees with FP$_6'$ very well. For , the system is bistable and we get two different pairs of . One of them still agrees with FP$_6'$, while the other coincides with FP$_2'$. Figure 4(b) shows the case of , for which the overall behavior is essentially the same as in Fig. 4(a) except at small . This is because is of in FP, as presented in Eq. (19). Hence the correction due to is visible only for . Interestingly, the correction term in FP has a singularity at , whereas the fractions $x_1$ and $x_2$ must be bounded. For this reason, our perturbative analysis obviously breaks down as . Having said that, the agreement in Fig. 4 is truly remarkable. On the other hand, the existence of multiple FP’s is detected only at , although Eq. (21) is satisfied for according to our parameters , and . It suggests that FP$_2'$ has small basins of attraction, compared to our mesh size: The population is mostly occupied by AllD at FP$_2'$, but it cannot be sustained unless the TFT population is very small.

## IV Discussion and Summary

Before concluding this work, let us consider how our observation can be generalized. In fact, the structure of RD seems to be crucial for the existence of such duality: We have also checked the same strategy set with the Moran process for a finite population Baek et al. (2016); Nowak et al. (2004); Taylor et al. (2004); Jeong et al. (2014), but we do not find such a symmetry between AllC and AllD (not shown). In this sense, the duality between AllC and AllD is not universal. Another related question is whether other sets of strategies can also exhibit the same kind of duality, provided that RD governs time evolution. To be more specific, let , , and be three different strategies, i.e., , , and with fractions , , and , respectively. Just as Eqs. (8) to (10), the duality means that when mutation is absent. It turns out that our strategy set is not the only possibility: One particularly interesting case of duality is such that AllC and AllD as before, whereas TFT is replaced by anti-TFT, which is a reactive strategy described as $(0, 1)$. Therefore, the duality alone does not determine which strategy set one should work with. We believe that one should first define a larger set of strategies from a general constraint, such as memory length, and then pick out the most important ones therein a posteriori. Along this line, the choice of AllC, AllD, and TFT becomes most meaningful in an environment with a moderate value of , where TFT occupies a substantial fraction of the population and other surviving strategies can be classified into cooperative and non-cooperative ones.

To summarize, we have investigated IPD of three representative strategies, AllC, AllD, and TFT, by analyzing RD as a dynamical system. We have shown duality between the fractions of cooperators and defectors in the absence of mutation. The effects of small positive $\mu$ have been studied in a perturbative manner: Mutation enhances cooperation if $b - 3c - 2be > 0$ and stabilizes the corresponding fixed point. The enhancement becomes significant especially for . These results have been confirmed by numerical calculations. Our finding implies that evolutionary dynamics may have a variety of emergent symmetries. According to this picture, a defecting population can be viewed as a cooperating population traveling backward in time, and vice versa, in the presence of TFT.

###### Acknowledgements.
S.K.B. was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2017R1A1A1A05001482).

## Derivation of Eq. (21)

As $\mu$ increases from zero, FP$_2'$ and FP$_4'$, the second and fourth solutions in Eq. (19), become complex via a saddle-node bifurcation. When the bifurcation point is approached, the deviation of $x_1$ from zero is entirely due to $\mu$, whereas the deviation of $x_2$ from unity has a contribution from $e$. It is therefore plausible to assume that . We thus expand $f_1$ and $f_2$ in Eqs. (8) and (9) around $(x_1, x_2) = (0, 1)$ to the linear order in and to the second order in .

By solving $f_1 = f_2 = 0$ in this set of reduced equations, we explicitly obtain approximate formulas for the FP’s. They contain a common factor, which we denote as , and this is the only factor that can make the FP’s complex. We simplify by expanding it to the linear order in , and calculate the conditions for it to be non-negative. One of the resulting sets of conditions is written in Eqs. (20) and (21). The other has been discarded because it is valid only for a high error rate.

### References

1. R. Axelrod, The Evolution of Cooperation (Basic Books, New York, 1984).
2. L. A. Imhof, D. Fudenberg, and M. A. Nowak, Proc. Natl. Acad. Sci. USA 102, 10797 (2005).
3. H. Brandt and K. Sigmund, J. Theor. Biol. 239, 183 (2006).
4. D. F. P. Toupo, D. G. Rand, and S. H. Strogatz, Int. J. Bifurcat. Chaos 24, 1430035 (2014).
5. S. K. Baek, H.-C. Jeong, C. Hilbe, and M. A. Nowak, Sci. Rep. 6, 25676 (2016).
6. M. A. Nowak and K. Sigmund, J. Theor. Biol. 137, 21 (1989).
7. M. A. Nowak and K. Sigmund, Acta Appl. Math. 20, 247 (1990).
8. P. Molander, J. Conflict Resolut. 29, 611 (1985).
9. L. A. Imhof, D. Fudenberg, and M. A. Nowak, J. Theor. Biol. 247, 574 (2007).
10. M. A. Nowak, A. Sasaki, C. Taylor, and D. Fudenberg, Nature 428, 646 (2004).
11. C. Taylor, D. Fudenberg, A. Sasaki, and M. A. Nowak, B. Math. Biol. 66, 1621 (2004).
12. H.-C. Jeong, S.-Y. Oh, B. Allen, and M. A. Nowak, J. Theor. Biol. 356, 98 (2014).