# On the coexistence of cooperators, defectors and conditional cooperators in the multiplayer iterated Prisoner's Dilemma

## Abstract

Recent experimental evidence [Grujić et al., PLoS ONE 5, e13749 (2010)] on the spatial Prisoner’s Dilemma suggests that players choosing to cooperate or not on the basis of their previous action and the actions of their neighbors coexist with steady defectors and cooperators. We here study the coexistence of these three strategies in the multiplayer iterated Prisoner’s Dilemma by means of the replicator dynamics. We consider groups with $n = 2$, $3$, $4$ and $5$ players and compute the payoffs to every type of player as the limit of a Markov chain where the transition probabilities between actions are found from the corresponding strategies. We show that for group sizes up to $n = 4$ there exists an interior point in which the three strategies coexist, the corresponding basin of attraction decreasing with increasing number of players, whereas we have not been able to locate such a point for $n = 5$. We analytically show that in the limit $n \to \infty$ no interior points can arise. We conclude by discussing the implications of this theoretical approach on the behavior observed in experiments.

###### keywords:

evolution; prisoner’s dilemma; cooperation; conditional cooperation; game theory; replicator dynamics


## 1 Introduction

In the past few years, different mechanisms have been proposed to explain the origin and stability of cooperation (Nowak, 2006). One of these mechanisms involves assortment of cooperators (Fletcher and Doebeli, 2009), in particular through the existence of a spatial or social structure dictating who interacts with whom (cf. network reciprocity in Nowak (2006)). Cooperators might then interact mainly with each other and keep the benefits of cooperation to the extent that they perform better than defectors or free riders in peripheral positions. This idea stems from the work by Nowak and May (1992b), who carried out a simulation of the iterated Prisoner’s Dilemma (PD) (Rapoport and Guyer, 1966; Axelrod and Hamilton, 1981) on a lattice in which every individual interacted with her eight nearest neighbors. Their finding of sizable proportions of cooperative actions even when the temptation to defect was quite large stimulated a large amount of work on evolutionary game theory on graphs (for reviews see, e.g., Szabó and Fáth (2007) and Roca et al. (2009)). Unfortunately, in spite of the large body of theoretical work devoted to this issue, it has not been possible to reach a general conclusion about how the existence of structure on a population could promote cooperation: indeed, it was shown that the emergence and survival of cooperative behaviors depended so crucially on the details of the models that their applicability to real life situations was dubious, at best.

In view of this situation, in the last few years a number of groups have carried out experiments to probe the relationship between population structure and cooperation with real human subjects (Kirchkamp and Nagel, 2007; Traulsen et al., 2010; Grujić et al., 2010). Arguably, the main conclusion of this research is that lattice-like structures do not seem to promote cooperation, at least not to an extent different from what is found in dyadic or small group experiments (Kagel and Roth, 1995; Camerer, 2003). While the lack of promotion of cooperation is well established, the reasons proposed by the different teams to explain the experimental observations are different, and there is no consensus yet as to how the subjects updated their decisions during the experiment. In particular, Kirchkamp and Nagel (2007) focused on disproving the imitation strategy proposed by Nowak and May (1992b), a conclusion also supported by Grujić et al. (2010). On the other hand, Traulsen et al. (2010) fitted their results to a payoff-dependent imitation behavior, the Fermi rule (Szabó and Töke, 1998), finding that they needed a large amount of random mutation to explain their observations.

In the above context, the analysis carried out by Grujić et al. (2010) brought in an alternative way to understand the experimental observations by building upon the idea of reciprocity (Trivers, 1971), i.e., the fact that individuals behave depending on the actions of their partners in the past. In iterated two-player games, this idea has been studied through the concept of reactive strategies (Nowak and Sigmund, 1989a, b, 1990; Nowak and May, 1992a) (see (Sigmund, 2010) for a comprehensive summary on this matter), the most famous of which is Tit-For-Tat (Axelrod and Hamilton, 1981), given by playing what the opponent played in the previous run. Reactive strategies generalize this idea by considering that players choose their action among the available ones with probabilities that depend on the previous action of the opponent. For the simple case of two strategies (say C and D), players choose C with probability following a C from their partner and with probability after a D from their partner. Subsequently this idea was further developed by considering memory-one reactive strategies (Nowak et al., 1995; Sigmund, 2010), in which the probabilities depend on the previous action of both the focal player and her opponent —i.e., the focal player would choose C with some probability following a (C,C) outcome, and so on.

In iterated multi-player games, such as public goods games or multi-player Prisoner’s Dilemmas (IMPD), reciprocity arises in the form of conditional cooperation (Fischbacher et al., 2001; Gächter, 2007): individuals are willing to contribute more to a public good the more others contribute. Conditional cooperation has been observed a number of times in public goods experiments (Croson, 2006; Fischbacher and Gächter, 2010), often along with a large percentage of free-riders. The experiment by Traulsen et al. (2010) also showed evidence for such a behavior in a spatial setup. Grujić et al. (2010) extended this idea in their analysis to include the dependence on the focal player’s previous action, introducing the so-called moody conditional cooperation (cf. Figure 1). In this strategy, players are more prone to cooperate after having cooperated than after having defected, and in the first case they are more cooperative the more cooperative neighbors they have. This behavior has not been reported before in spatial games and appears to be a natural extension of the reactive strategy idea to multi-player games (among the very many other extensions one can conceive). On the other hand, and from an economic viewpoint, which is an important part of the analysis of human behavior, this type of strategy update scheme responds to the often raised questions on payoff-based rules. In economic interactions it is usually the case that agents perceive the others’ actions but not how much they benefit from them, and therefore the use of action updates depending, e.g., on the payoff differences, may be questionable. This seems to be the case even if this information is explicitly supplied to the players (Grujić et al., 2010).

Interestingly, the conclusion of Grujić et al. (2010) had a new feature as compared to the other two experiments (Kirchkamp and Nagel, 2007; Traulsen et al., 2010), namely the heterogeneity of the population: aside from the already mentioned moody conditional cooperators, there was a large minority of defectors, i.e., players that defected all or almost all the time, and a few cooperators, who cooperated in practically all rounds. This heterogeneity, also found to be very important in public goods experiments (Fischbacher and Gächter, 2010), had also been observed in four-player experiments by Kurzban and Houser (2005), who reported that their subjects could be roughly classified in three main types, namely defectors, cooperators and conditional cooperators (called reciprocators in the original work), although they did not check for dependences on the past actions of the players. Both Kurzban and Houser (2005) and Grujić et al. (2010) checked that the payoffs obtained by every type of player were more or less the same, thus suggesting that the population in the lattice experiment might be at an evolutionary equilibrium.

In this paper we address the question of the existence and stability of such a heterogeneous or mixed equilibrium in the multiplayer iterated Prisoner’s Dilemma. It is important to understand that we are not addressing the issue of the evolutionary explanation of moody conditional cooperation. This is a very interesting but also very difficult task, and in fact we do not even have an intuition as to how one can address this problem in a tractable manner. Our goal is then to understand whether or not the coexistence of moody conditional cooperators, defectors, and a small percentage of cooperators, as observed in the experiment, is theoretically possible. In so doing, we will shed light on experimental and theoretical issues at the same time. On the experimental side, our results show that there is coexistence for groups of 2 or 3 players for parameters reasonably close to those found in the experiment, but not for larger groups. As we will see in the discussion section, this prediction has important consequences related to the adequacy of replicator dynamics to describe the experimental result or to the cognitive capabilities of human subjects in dealing with large groups. We will also discuss there the ways in which our theoretical approach and the experiment may differ, something that can also have implications of its own. On the theoretical side, we present an analysis of a population of players interacting through a multi-player Prisoner’s Dilemma including strategies that generalize the ideas behind reactive strategies, as mentioned above. To our knowledge, this has not been carried out before, at least not to the extent we do it here, where we are able to show how this coexistence depends on the size of the groups considered. We believe that the approach we are presenting may be useful for other researchers working on related problems.

With the above goals in mind, we introduce below a model in which populations consisting of the three types of individuals discussed above, namely cooperators, defectors, and moody conditional cooperators, play a multiplayer iterated Prisoner’s Dilemma with populations evolving according to the replicator dynamics. We have considered different group sizes, from $n = 2$ through $n = 5$ players, a size whose outcome is well described by the limit $n \to \infty$, which we analyze separately. In the following, we report on our findings concerning this system, beginning with a detailed introduction of our model (Sec. 2). The key to our analytical approach is the payoff matrix, whose computation we carry out by means of Markov chain techniques. Section 3 presents the calculation in full detail for $n = 2$ players (i.e., the standard iterated Prisoner’s Dilemma) and, subsequently, proceeds to the replicator dynamics analysis of the so-obtained payoff matrix. We then extend our approach to larger groups ($n > 2$) in Sec. 4. As this becomes more and more cumbersome, in Sec. 5 we address the limit $n \to \infty$ analytically, finding exact results about the possible equilibria of the model. Finally, Sec. 6 concludes by comparing our results to the experiments and discussing the implications of such a comparison in terms of theoretical explanations of the observed behavior and their shortcomings.

## 2 Game, strategies and payoffs

Let us consider a well-mixed population of players who interact via iterated multiplayer prisoner’s dilemmas (IMPDs). In these games, players interact in groups of $n$ players. Every round each player adopts an action, either cooperate (C) or defect (D), and receives a payoff from every other player in the group according to a standard prisoner’s dilemma payoff matrix (a cooperator receives $R$ from another cooperator and $S$ from a defector; a defector receives $T$ from a cooperator and $P$ from another defector; payoffs satisfy $T > R > P > S$). We note that this is a generalized version of a public goods game: In the latter, if there are cooperators, a defector receives whereas a cooperator receives ( in a standard public goods game). In a multiplayer PD, a defector receives whereas a cooperator receives , and hence choosing , and the IMPD becomes a generalized public goods game. Notice an important difference with respect to the standard public goods game: in this generalized version () the difference between the payoff received by a cooperator and a defector depends on the number of cooperators.
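The per-round payoff rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `round_payoff`, the argument `k_coop_others` (the number of cooperators among the other $n-1$ group members), the letters `R`, `S`, `T`, `P` for the four pairwise payoffs, and the numerical values are all choices made for the example, not quantities taken from the text.

```python
def round_payoff(action, k_coop_others, n, R, S, T, P):
    """Payoff for one round of an n-player PD built from pairwise PD payoffs.

    action         -- "C" or "D", the focal player's action
    k_coop_others  -- number of cooperators among the other n - 1 players
    R, S, T, P     -- pairwise payoffs (C vs C, C vs D, D vs C, D vs D)
    """
    if not 0 <= k_coop_others <= n - 1:
        raise ValueError("k_coop_others must lie between 0 and n - 1")
    k_def_others = n - 1 - k_coop_others
    if action == "C":
        return R * k_coop_others + S * k_def_others
    if action == "D":
        return T * k_coop_others + P * k_def_others
    raise ValueError("action must be 'C' or 'D'")


# Illustrative payoffs only (hypothetical values with T > R > P > S).
R, S, T, P = 3.0, 0.0, 5.0, 1.0

# Advantage of defecting over cooperating in a group of 4,
# as a function of the number of cooperating co-players.
gap = [round_payoff("D", k, 4, R, S, T, P) - round_payoff("C", k, 4, R, S, T, P)
       for k in range(4)]
print(gap)
```

With these illustrative values the advantage of defecting changes with the number of cooperating co-players, which is the difference with respect to the standard public goods game noted above.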

Inspired by the experimental results of Grujić et al. (2010) but keeping at the same time as few parameters as possible, we will classify players’ strategies into three stereotypical behaviors: mostly cooperators, who cooperate with probability (assumed relatively close to one) and defect with probability ; mostly defectors, who cooperate with probability and defect with probability (for simplicity we will assume ); and moody conditional cooperators, who play depending on their own and their opponents’ actions in the previous round. Specifically, if they defected in the previous round they will cooperate with probability , whereas if they cooperated in the previous round they will cooperate again with a probability

(1) |

where is the fraction of cooperative actions among the opponents in the previous round, and .

At this point, we would like to mention that our results do not depend qualitatively on the “moodiness assumption”; in fact, we have checked that redoing the calculations we will present below for plain conditional cooperators (as those found by Fischbacher et al. (2001); Gächter (2007)) leads only to quantitative changes in the results. Therefore, we present our discussion in terms of moody conditional cooperators as they are empirically more relevant.

To complete the definition of the model, we need to specify how the populations of the different strategies are going to evolve in time. Players interact infinitely often in an IMPD, so payoffs both increase in time and depend on the whole history of play. It thus makes sense to use the (time) average payoffs to study the evolution of the game in terms of the abundance of the three strategies considered. As these strategies are defined depending on players’ actions in the round immediately before, a multiplayer game with $n$ players and given populations of each type of player can be described as a finite state Markov chain whose states are defined by the actions taken by the players. Of course the chain is different for different compositions of strategies in the group. In any case, given that all outcomes have non-zero probability, the chain is ergodic and therefore there is a well defined steady state (Karlin and Taylor, 1975). Average payoffs are readily obtained once the probability vector in the steady state is known, and subsequent evolution is described through imitation via replicator dynamics (Hofbauer and Sigmund, 1998). In the next section we develop all this formalism in full detail for the two-player case ($n = 2$), i.e., for the usual iterated PD, taking advantage of the fact that in this case the expressions that arise can be written in a compact way. The cases with more than 2 players are dealt with in Sec. 4 in a more sketchy manner.

## 3 Two-person game (iterated PD)

### 3.1 General scheme of the approach

In the case of $n = 2$ players the states of the Markov chain are described as CC, CD, DC, and DD, where the first action is the focal player’s and the second one is the opponent’s. The transition probability matrix will be denoted as

(2) |

where gives the probability that players who played in the previous round play in the present round (CC, CD, DC, DD). The matrix will of course depend on the nature of the two players involved, so there will be nine different matrices, one per ordered pair of player types, arising from six distinct pairings. Denoting ‘mostly cooperators’ by C, ‘mostly defectors’ by D and ‘moody conditional cooperators’ by X, the six combinations are CC, CD, CX, DD, DX, XX. As we stated above, the Markov chains so defined are always ergodic; consequently, the corresponding stationary probability vector, which we will term , is obtained by solving the equation (Karlin and Taylor, 1975). Note that there is such a stationary probability distribution for each of the six combinations of two players, as we will see below. Now, once the probability distribution is known, the payoff matrix , providing the average payoff that a player of type gets when confronted with a player of type (C, D, X) in an IMPD (in this Section, $n = 2$, an iterated PD) can be computed as

(3) |

These payoffs can then be used in the replicator dynamics to finally find the evolution of the three-strategy population.
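The two steps of this scheme (find the stationary vector of an ergodic chain, then average the stage payoffs over it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `stationary_distribution` and `average_payoff`, the row-stochastic convention (stationary row vector `pi` with `pi = pi M`), the use of plain power iteration, and the toy two-state chain are all assumptions made for the example.

```python
def stationary_distribution(M, tol=1e-13, max_iter=100_000):
    """Stationary row vector pi with pi = pi M for a row-stochastic matrix M.

    Plain power iteration; it converges here because every transition
    probability is assumed to be strictly positive (ergodic chain).
    """
    size = len(M)
    pi = [1.0 / size] * size
    for _ in range(max_iter):
        new = [sum(pi[i] * M[i][j] for i in range(size)) for j in range(size)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


def average_payoff(pi, stage_payoffs):
    """Long-run average payoff: stage payoffs weighted by stationary probabilities."""
    return sum(prob * payoff for prob, payoff in zip(pi, stage_payoffs))


# Toy two-state chain (hypothetical numbers, not one of the chains of the text).
M = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(M)
print(pi, average_payoff(pi, [1.0, 0.0]))
```

For the two-player game the same functions would be applied to a 4x4 matrix over the states CC, CD, DC, DD, with the focal player's stage payoffs listed in the same order.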

### 3.2 Payoff computation

Of the six combinations of players, three yield a trivial stationary vector because they do not depend on the previous actions, namely those which do not involve the strategy X. The corresponding payoffs are therefore straightforward to compute, and we have (recall the focal player is denoted by the first subindex):

(4) | ||||

(5) | ||||

(6) | ||||

(7) |
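When neither player conditions on the past, the two actions in any round are independent and the average payoff follows directly from the two cooperation probabilities. The sketch below illustrates this; the function name, the symbol `p` for the cooperation probability of a mostly cooperator, the choice `1 - p` for a mostly defector, and the numerical values are assumptions made for the example.

```python
def unconditional_payoff(a, b, R, S, T, P):
    """Average payoff of a focal player who cooperates with probability a
    against an opponent who cooperates with probability b, both choosing
    independently of the previous round."""
    return (R * a * b              # both cooperate
            + S * a * (1 - b)      # focal cooperates, opponent defects
            + T * (1 - a) * b      # focal defects, opponent cooperates
            + P * (1 - a) * (1 - b))  # both defect


# Illustrative values only (hypothetical, not taken from the text).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
p = 0.9  # assumed cooperation probability of a mostly cooperator

payoffs = {
    ("C", "C"): unconditional_payoff(p, p, R, S, T, P),
    ("C", "D"): unconditional_payoff(p, 1 - p, R, S, T, P),
    ("D", "C"): unconditional_payoff(1 - p, p, R, S, T, P),
    ("D", "D"): unconditional_payoff(1 - p, 1 - p, R, S, T, P),
}
for pair, value in payoffs.items():
    print(pair, round(value, 4))
```

Setting `p = 1` recovers the pure-strategy payoffs, which is a quick sanity check on the expression.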

The payoffs for the cases where the moody conditional cooperators, X, play, require computing the corresponding stationary probability. Let us begin with the Markov matrix (2) for a mostly cooperator (C) and a conditional cooperator (X), given by

(8) |

from which the stationary probability vector is given by

(9) |

where is given by (1) (notice that it represents the average probability for a conditional cooperator to cooperate, given that she cooperated in the previous round, whereas her mostly cooperator opponent cooperates with probability ). Therefore, inserting (9) in (3) and having in mind who the focal player is, we arrive at

(10) | ||||

(11) |

The case for a mostly defector facing a moody conditional cooperator can be obtained immediately by realizing that the defector behaves as a mostly cooperator whose probability of cooperating is instead of , hence we find trivially

(12) | ||||

(13) |

Finally, if two conditional cooperators confront each other, the Markov matrix becomes

(14) |

and has a stationary vector which, up to normalization, is proportional to a vector with components

(15) |

From this result one can compute as in the other eight cases. With the payoffs we have computed, we are now in a position to proceed to the dynamical study.
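The whole two-player computation can be sketched numerically as below. It is a hedged illustration rather than the authors' code: the helper names, the parameter names `q`, `p_lo`, `p_hi` (cooperation probabilities of a moody conditional cooperator after defecting, after cooperating against a defector, and after cooperating against a cooperator), the choice `1 - p` for mostly defectors, and every numerical value are assumptions.

```python
from itertools import product

STATES = list(product("CD", repeat=2))  # (focal, opponent): CC, CD, DC, DD


def unconditional(prob):
    """Player who cooperates with a fixed probability, whatever happened before."""
    return lambda own_prev, other_prev: prob


def moody(q, p_other_defected, p_other_cooperated):
    """Two-player moody conditional cooperator (parameter names are hypothetical)."""
    def rule(own_prev, other_prev):
        if own_prev == "D":
            return q
        return p_other_cooperated if other_prev == "C" else p_other_defected
    return rule


def transition_matrix(focal, opponent):
    """Row-stochastic 4x4 matrix over STATES for one pair of players."""
    rows = []
    for own, other in STATES:
        f = focal(own, other)     # probability that the focal player cooperates next
        g = opponent(other, own)  # same for the opponent, seen from her side
        rows.append([(f if a == "C" else 1 - f) * (g if b == "C" else 1 - g)
                     for a, b in STATES])
    return rows


def stationary(M, tol=1e-13, max_iter=100_000):
    """Power iteration for pi = pi M (all entries of M assumed positive)."""
    pi = [1.0 / len(M)] * len(M)
    for _ in range(max_iter):
        new = [sum(pi[i] * M[i][j] for i in range(len(M))) for j in range(len(M))]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


def payoff_matrix(strategies, R, S, T, P):
    """W[i][j]: long-run average payoff of type i against type j."""
    stage = [R, S, T, P]  # focal payoff in states CC, CD, DC, DD
    return {i: {j: sum(w * u for w, u in
                       zip(stationary(transition_matrix(si, sj)), stage))
                for j, sj in strategies.items()}
            for i, si in strategies.items()}


# Illustrative values only; none of them is taken from the text.
p, q, p_lo, p_hi = 0.9, 0.2, 0.3, 0.8
R, S, T, P = 3.0, 0.0, 5.0, 1.0
strategies = {"C": unconditional(p), "D": unconditional(1 - p),
              "X": moody(q, p_lo, p_hi)}
W = payoff_matrix(strategies, R, S, T, P)
for i in "CDX":
    print(i, [round(W[i][j], 4) for j in "CDX"])
```

Treating every strategy as a function of the previous pair of actions keeps the construction uniform: the unconditional types simply ignore their arguments, so a single routine produces all nine payoffs.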

### 3.3 Replicator dynamics

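Denoting by $x = (x_C, x_D, x_X)$ (with $x_C + x_D + x_X = 1$) the vector with the population fractions of the three types of players, the dynamics of $x$ is described by the replicator equation

$$\dot{x}_i = x_i \left[ (W x)_i - x \cdot W x \right], \qquad i \in \{C, D, X\}, \tag{16}$$

where $W$ is the payoff matrix obtained above.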
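A numerical integration of the replicator equation can be sketched as follows. This is an illustration under assumptions: the standard form of the equation (each fraction grows in proportion to the difference between its payoff and the population average), a fourth-order Runge-Kutta step, and a made-up 3x3 payoff matrix in which the second strategy strictly dominates the others. The matrix is not the payoff matrix of the text.

```python
def replicator_rhs(x, W):
    """Right-hand side x_i * (f_i - mean f) with f = W x."""
    size = len(x)
    fitness = [sum(W[i][j] * x[j] for j in range(size)) for i in range(size)]
    mean = sum(xi * fi for xi, fi in zip(x, fitness))
    return [xi * (fi - mean) for xi, fi in zip(x, fitness)]


def rk4_step(x, W, dt):
    """One Runge-Kutta step, followed by renormalization onto the simplex."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1 = replicator_rhs(x, W)
    k2 = replicator_rhs(shifted(x, k1, dt / 2), W)
    k3 = replicator_rhs(shifted(x, k2, dt / 2), W)
    k4 = replicator_rhs(shifted(x, k3, dt), W)
    new = [xi + dt / 6 * (a + 2 * b + 2 * c + d)
           for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    total = sum(new)
    return [v / total for v in new]


def integrate(x0, W, dt=0.01, steps=5000):
    """Follow one trajectory and return its final point."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, W, dt)
    return x


# Hypothetical matrix: every entry of row 1 exceeds the entries above and below it.
W_demo = [[3.0, 0.0, 2.0],
          [5.0, 1.0, 3.0],
          [4.0, 0.5, 2.5]]
final = integrate([1 / 3, 1 / 3, 1 / 3], W_demo)
print([round(v, 6) for v in final])
```

Running many such trajectories from a grid of initial conditions is one simple way to estimate basins of attraction on the simplex.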

In order to use this dynamics in connection with the experiment of Grujić et al. (2010), we need to recall the payoffs used in that work, namely , , [i.e., a weak prisoner’s dilemma as in Nowak and May (1992b)]. Two consecutive experiments were carried out, leading to two different sets of parameters for the behavior of the players. Figure 2 shows the dynamics resulting for both sets of parameters, whose specific values are listed in the caption. As we may see, there are no interior points, which would indicate equilibria in which the three strategies coexist, as observed in the experiment. The only equilibria we find for these parameters are in the corners of the simplex, C being always a repeller, D an attractor and X being a saddle point or an attractor depending on the parameters. In the case where D and X are both attractors it is X that has the largest basin of attraction (almost the entire simplex). Therefore, the results for this model do not match what is observed in the experiment. However, it is important to keep in mind that in the experiment players played with their eight neighbors, this being the reason why we will later address the dynamics of IMPDs with larger groups.

Notwithstanding this first result, as we will now see it is very interesting to delve into the two-player case in more detail. For the purpose of illustrating our results, let us choose the behavioral parameters to be , , , and , which are values we could consider representative of both experiments. Inserting these parameters into the calculations above, we find that the payoff matrix is given by

(17) |

This type of matrix belongs to a class of games studied by Zeeman (1980) [compare it with matrix (39) in Appendix A]. In fact, in a region of parameters near those that can be inferred from the experiments of Grujić et al. (2010) the game behaves as a Zeeman game. The Zeeman game has five rest points (see Appendix A): an unstable one at the C corner, a stable one at the D corner, a saddle point at the X corner, and two mixed equilibria on the C–X and on the D–X edges of the simplex. Besides, under certain constraints (cf. (41)) there is also an interior point.

Turning now to our example matrix (17), its non-trivial rest points turn out to be , , and . The stability of these mixed, interior equilibria depends on the parameters (see Appendix A). For the present case, the situation is similar to that shown in Figure 10(a). Thus the evolution of this system is governed by the presence of two attractors: the interior point and the D corner, each with a certain basin of attraction. A key feature of the class of problems we are considering is that the precise location of the interior rest point is very sensitive to the values of the parameters. Figures 3–6 illustrate what happens to it when each of the four probabilities that define the strategies is changed around the values given above. Generally speaking, the figures show that the interior point approaches either one of the rest points on the edges C–X and D–X, while these in turn move along their edges. The specific details depend on the parameter one is considering, as can be seen from the plots. We have also found that larger changes in the parameters can make the interior point coalesce with the mixed equilibrium on the C–X edge, thus transforming the dynamics into the one sketched in Figure 10(c), or even change the Zeeman structure of the payoff matrix, yielding different stable equilibria (generally at the corners). Notice that, particularly so in experiment 2, the values of the parameters are not far from those producing the plots of Figures 3–6. This indicates that, while we would not expect a two-person theory to describe quantitatively the experiments, the existence of an interior point with the same kind of mixed population as observed is possible with minor modifications of the parameters.
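Locating an interior rest point of a three-strategy replicator dynamics reduces to a small linear problem: all three strategies must earn the same payoff and the fractions must add up to one. The sketch below solves that system by Cramer's rule. The function names and the test matrix (a rock-paper-scissors game, chosen only because its interior rest point is known to be the barycenter) are assumptions for illustration, and the routine says nothing about stability.

```python
def det3(A):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def interior_rest_point(W, eps=1e-12):
    """Point with equal payoffs for all three strategies and positive fractions.

    Returns None when the linear system is singular or when its solution
    does not lie strictly inside the simplex.
    """
    A = [[W[0][j] - W[1][j] for j in range(3)],  # payoff of 0 equals payoff of 1
         [W[1][j] - W[2][j] for j in range(3)],  # payoff of 1 equals payoff of 2
         [1.0, 1.0, 1.0]]                        # fractions add up to one
    rhs = [0.0, 0.0, 1.0]
    D = det3(A)
    if abs(D) < eps:
        return None
    x = []
    for col in range(3):
        A_col = [[rhs[r] if c == col else A[r][c] for c in range(3)]
                 for r in range(3)]
        x.append(det3(A_col) / D)
    return x if all(v > 0 for v in x) else None


# Rock-paper-scissors: a hypothetical test case, not the matrix of the text.
rps = [[0.0, -1.0, 1.0],
       [1.0, 0.0, -1.0],
       [-1.0, 1.0, 0.0]]
print(interior_rest_point(rps))
```

Scanning the behavioral parameters and calling such a routine on each resulting payoff matrix is one way to trace how the interior point moves, or disappears, as the parameters change.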

## 4 Games involving more than two players

Having discussed in depth the replicator dynamics for the IPD with mostly cooperators, mostly defectors and moody conditional cooperators, with the result that an interior point with a sizable basin of attraction exists for a wide range of parameters, we now increase the number of players to check whether the theory is a valid description of the experimental results. The mathematical approach for the case when more than two players are involved is similar to that for two players, only computationally more involved. The Markov transition matrix (2) now describes a chain containing $2^n$ states, $n$ being the number of players. These are described as all combinations of C or D actions adopted by each of the interacting agents. On the other hand, there will be $3^n$ such matrices displaying all possible combinations of the three strategies (C, D, X). Obtaining the expressions for them is of course straightforward, but doing it analytically for $n > 2$ is out of the question. Once the matrices are obtained, computing the vector containing the stationary probabilities for each of the states simply amounts again to solving the linear system , readily providing the payoffs for any strategy when confronted with any set of strategies of the opponents. The result can be cast in a tensor . For a population composition the payoff received by an individual of strategy will thus be

(18) |

and the average payoff of the population will be

(19) |

Finally, the replicator dynamics is then given by

(20) |

Expression (18) can be further simplified if we exploit the symmetry implicit in public goods games, where the identity of the players is not at all relevant, only the number of them using a given strategy. This means that many payoffs are equal because

(21) |

i.e., the payoff obtained by an strategist only depends on the number of cooperators, of defectors, and of conditional cooperators () she is confronted to. Then

(22) |
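The symmetry argument means that the population-level payoff of a strategy is an average over the possible compositions of the other $n-1$ group members. The sketch below assumes that co-players are drawn independently from a well-mixed population, so that compositions carry multinomial weights; this sampling assumption, the function names and the placeholder payoff functions are illustrative and not taken from the text.

```python
from math import factorial


def compositions(total):
    """All (k_C, k_D, k_X) with non-negative entries adding up to `total`."""
    for k_c in range(total + 1):
        for k_d in range(total - k_c + 1):
            yield k_c, k_d, total - k_c - k_d


def expected_payoff(group_payoff, x, n):
    """Average of group_payoff(k_C, k_D, k_X) over the n - 1 co-players.

    x = (x_C, x_D, x_X) are the population fractions; co-players are
    assumed to be sampled independently, hence the multinomial weights.
    """
    m = n - 1
    total = 0.0
    for k_c, k_d, k_x in compositions(m):
        multiplicity = factorial(m) // (factorial(k_c) * factorial(k_d) * factorial(k_x))
        weight = multiplicity * x[0] ** k_c * x[1] ** k_d * x[2] ** k_x
        total += weight * group_payoff(k_c, k_d, k_x)
    return total


# Placeholder payoff functions, used here only to check the weights.
x_demo = (0.2, 0.3, 0.5)
print(expected_payoff(lambda k_c, k_d, k_x: 1.0, x_demo, n=5))
print(expected_payoff(lambda k_c, k_d, k_x: float(k_c), x_demo, n=5))
```

In an actual calculation `group_payoff` would return the stationary average payoff of the focal strategy in a group with the given composition, obtained from the corresponding Markov chain.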

As in Sec. 3, for the parameters obtained from the experiments there is no interior point that describes the coexistence of the three strategies. We subsequently proceeded as in the previous case and tried to find ranges of parameters for which such an interior point exists. It turns out that for groups of $n = 3$ players sets of parameters can also be found where the dynamics is similar to that for $n = 2$ (see Figure 7 for an example), although the parameters for which this happens are a bit different, yet still reasonably close to those of the experiments of Grujić et al. (2010). As in the two-player case, the structure displayed in these figures turns out to be extremely sensitive to variations in the parameters. Although we will not go into the details of those modifications here, we find it interesting to note that Figure 7 shows an evolution of the interior point with increasing very similar to that for $n = 2$ (cf. Figure 3), albeit with more drastic changes, indicating that the existence of an interior point is less generic. For IMPDs with larger groups we find that, although for groups of $n = 4$ players it is still possible to find a Zeeman-like phase map, one has to choose values for very close to one (meaning that cooperators and defectors are nearly pure strategies) and on top of that the region where this behavior can be obtained is extremely narrow. It can be clearly observed in Figure 8, where several of these maps are shown for different values of , that variations of about noticeably displace the location of the interior point. Importantly, it can be also observed from Figure 7 and Figure 8 that the basin of attraction of the interior point, when it exists, shrinks upon increasing the number of players, i.e., for the fraction of trajectories that end up in the D attractor is larger than those ending in the interior point. Finally, for the largest group size we could handle computationally, $n = 5$ players, we have not been able to find an interior point for any choice of parameters.

It turns out that the outcome of this game for $n = 5$ is well represented by the large group limit $n \to \infty$, which, unlike the case of arbitrary but finite $n$, is amenable to analysis, as we show in the next section.

## 5 Infinitely large groups

Obtaining the payoffs (21) amounts to finding the stationary state of Markov chains, each made of states, where defines the composition of the -player group. The size of the corresponding Markov matrices grows as , which makes it feasible to study groups even larger than players. This will not be necessary though, because the resulting chain can be studied analytically in the limit , which characterizes well the behavior of large groups.

To determine how a group with players of each type will respond in a given iteration of the prisoner’s dilemma we only need to record the vector whose components count how many players of each strategy cooperate in a given round. Then the probability to observe the Markov chain in a certain state given that in the previous round the state was is

(23) |

with the usual convention that for and where we have introduced the short-hand notation

Extracting analytical information for finite from this matrix is not an easy task. However, let us focus on the limit . It is straightforward to show that

(24) |

and

(25) |

Hence, introducing the random variable and denoting , in the limit the probability density of becomes a delta function around , and , this last quantity arising from the solution to the equation

(26) |

If this is a linear equation with solution . If it is a quadratic equation with two solutions. The one that reduces to the solution found for is

(27) | ||||

(28) |

Notice that as long as , as required.
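The self-consistency condition for the cooperation level of moody conditional cooperators in a very large group can be explored numerically. The sketch below rests on assumptions that should be read as such: mostly cooperators cooperate with probability `p` and mostly defectors with probability `1 - p`; a conditional cooperator who defected cooperates next with probability `q`; one who cooperated does so again with a probability that is linear in the overall cooperation level, written here as `p0 + (p1 - p0) * level`. The symbol names and the numbers are hypothetical. Bisection is used instead of the closed-form root so that no choice between the two solutions of the quadratic is needed.

```python
def x_cooperation_level(x, p, q, p0, p1, tol=1e-12):
    """Stationary fraction of cooperative actions among conditional cooperators.

    x = (x_C, x_D, x_X) are population fractions. All probabilities are
    assumed to lie strictly between 0 and 1, so a root exists in (0, 1).
    """
    x_c, x_d, x_x = x
    base = x_c * p + x_d * (1 - p)  # cooperation contributed by C and D types

    def excess(xi):
        level = base + x_x * xi             # overall fraction of cooperative actions
        p_after_c = p0 + (p1 - p0) * level  # assumed linear response
        return q * (1 - xi) + xi * p_after_c - xi

    lo, hi = 0.0, 1.0  # excess(0) = q > 0 and excess(1) = p_after_c - 1 < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative values only; none of them is taken from the text.
x_demo = (0.2, 0.3, 0.5)
xi = x_cooperation_level(x_demo, p=0.9, q=0.2, p0=0.3, p1=0.8)
level = x_demo[0] * 0.9 + x_demo[1] * 0.1 + x_demo[2] * xi
print(xi, level)
```

When there are no conditional cooperators in the population, or when `p1 == p0`, the condition becomes linear and the result can be checked against the direct solution.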

Factors yield the asymptotic, stationary fraction of cooperative actions among players of type in the group. Hence the stationary level of cooperation is given by

(29) |

and the corresponding payoffs of the three type of players are

(30) | ||||

(31) | ||||

(32) |

Notice that

(33) |

so as long as we have i.e., cooperators are always dominated by defectors irrespective of the composition of the population (provided ). This implies that no interior point exists in the limit $n \to \infty$, a property that suggests that the fact that we have not been able to locate an interior point for $n = 5$ is generic for larger values of $n$.

On the other hand,

(34) | ||||

(35) |

so any solution to () determines a rest point on the () edge of the simplex. Taking the first equation and assuming we obtain

Upon simplification this equation becomes

Given that on the edge of the simplex, it turns out that does not hold for any point of this edge. A similar argument yields the same result for on the edge of the simplex (the equations are the same just replacing by and by ).

We have thus established that, depending on the parameters and , on the edge of the simplex either or irrespective of the composition, and on the edge of the simplex either or irrespective of the composition. In order to decide which one of the inequalities holds on each edge we can set an arbitrary composition, namely . At this corner of the simplex

(36) |

Then on if, and only if,

(37) |

and on if, and only if,

(38) |

Notice that if (37) is true so is (38) (but the converse does not hold).

For (37) to hold a necessary condition is that the left-hand side is larger than , a condition that boils down to

As , the only way that this can hold is if . When inequality (37) is satisfied, D is an attractor, X is a repeller, and C a saddle point. Otherwise C is a repeller (obviously, a sufficient condition for this to happen is ). In this case D is an attractor and X a saddle point if (38) holds and vice versa if it does not.

A summary of our results for $n \to \infty$ is shown in the sketch of Figure 9. As we can see from the plot, the main results are that there never exists an interior point, that homogeneous C populations are not stable, and that in two out of three cases the final result of the dynamics is a homogeneous population. Therefore, although there is a region of parameters in which a homogeneous population of moody conditional cooperators is actually stable, we never observe coexistence even of pairs of strategies.

## 6 Discussion

Motivated by the recent experimental work by Grujić et al. (2010), where conditional cooperation depending on the player’s previous action was observed in a spatial prisoner’s dilemma coexisting with cooperation and defection, we have studied the replicator dynamics of the IMPD with these three strategies. The fact that the experimental results indicated that all three strategies were getting on average the same payoff suggested that they were in equilibrium; on the other hand, as the presence of a lattice had no significant consequences on the level of cooperation, it seemed likely that the spatial game could be understood in terms of separate multiplayer games.

Assuming a stylized version of the behaviors mentioned above, we have focused on the problem of their coexistence in well-mixed populations, when they interact in groups of $n$ players through an IMPD. For $n = 2$, in a region of parameters compatible with those of the experiment we do find a mixed equilibrium in which all three types of players coexist, and they do it in a proportion similar to that found in the experiments. The phase portrait of the replicator dynamics reproduces that of a three-strategy game introduced by Zeeman (1980). However, upon increasing $n$, the region of parameters of this Zeeman-like dynamics shrinks, and for $n = 5$, the maximum size we could analyze with our approach, we could not find a mixed equilibrium anymore.

Given that our Markov chain technique becomes computationally intractable for larger sizes, we have carried out a rigorous analysis of the replicator dynamics for this game in the limit . The analysis reveals that in this limit all the rest points that can be found for small , other than the three corners of the simplex, disappear. The dynamics in this limit is determined by who beats whom, depending on the parameters. Cooperators are always defeated by defectors, but depending on the parameters, conditional cooperators are displaced by any other strategy, or only by defectors, or they can displace the other two strategies.

Putting together our numerical results for small and our analytical calculations for large , we can conclude that an imitative evolution like the one represented by replicator dynamics cannot account for the coexistence of strategies observed in the experiments, at least in groups as large as (the case of the experiment). The reasons for this can be many. The most obvious one is that replicator dynamics might not be what describes the evolution of strategies in human subjects. In this regard, we have to make it clear that we are not studying the evolution of the players during the experiment, as it was shown by Grujić et al. (2010) that there is no learning. Our evolutionary approach would apply to much longer time scales, i.e., these strategies would have arisen from interactions of human groups through history. It may then well be the case that this slower evolution of human behavior requires another approach to its dynamics. By the same token, it might also occur that the typical number of iterations of the game is not very large, so the stationary probability density obtained from the Markov chains is not a good approximation to the observed behavior. All in all, it is clear that our analytical model might not be the most appropriate one to describe human behavior on IMPDs.

Nevertheless, another possible explanation for the discrepancy between our predictions and the coexistence of moody conditional cooperators with the cooperator and defector strategists might come from bounded rationality considerations. Thus, people may behave in an IMPD as though they were playing a (two-person) IPD with some kind of an “average” opponent, something that can be reinforced by the computer interface of the experiment, which isolates the subjects from the others with whom they interact. Such a heuristic decision-making process might be the result of cognitive biases or limitations, among which the inability to deal with large numbers may be of relevance here (Kahneman et al., 1982), or else it could arise as an adaptation itself (Gigerenzer and Selten, 2001). Whatever the underlying reason, the fact that for and players we can easily find wide ranges of parameters for which the three strategies coexist and, furthermore, that this coexistence has a large basin of attraction, suggests that the idea that people may be extrapolating their behavior to larger groups should at least be considered, and tested by suitably designed experiments.

On the other hand, it should be borne in mind that the strategies reported by Grujić et al. (2010) are aggregate behaviors, as they attempted to classify the actions of the players into a few archetypal types. Therefore, there may actually be very many different moody conditional cooperators, defined by different , and parameters and different propensities to cooperate (parameter ) among cooperators and defectors. Alternatively, players who were classified as conditional cooperators might be using a totally different strategy, different for every player, which aggregated would look like the conditional cooperation detected in the experiment. This is not included anywhere in our replicator dynamics. It is certainly possible that considering several different subclasses of the strategy X in the replicator dynamics might actually provide an explanation for coexistence in larger groups. However, the corresponding calculations become very involved, and whether this variability can sustain mixed equilibria is an interesting question that remains outside the scope of this work.

As a final remark, we would like to stress that, notwithstanding the issue that the agreement between our results and the experiments is problematic, this study proves that, under replicator dynamics, even for our work predicts the dominance of moody conditional cooperators for certain regions of parameters. It is important to realize that this type of strategy had not been considered prior to the experimental observation, and as we now see it can successfully take over the entire population, even from defection, when playing an IMPD. This suggests that this or similar strategies may actually be more widespread than this simple case, as they might also be the best ones in related games, such as the public goods game. It would be worth widening the scope of this work by analyzing the possible appearance of these conditional cooperators, who are influenced by their own mood, in other contexts, both theoretically and experimentally. In this regard, an explanation of the evolutionary origin of moody conditional cooperators would be a particularly important, albeit rather difficult, goal.

## Acknowledgments

This work was supported in part by MICINN (Spain) through grant MOSAICO, by ERA-NET Complexity-Net RESINEE, and by Comunidad de Madrid (Spain) through grant MODELICO-CM.

## Appendix A Zeeman’s game

Zeeman (1980) analyzed the evolutionary dynamics of three-strategy games. Apart from the well-known rock-paper-scissors game (Hofbauer and Sigmund, 1998), he identified a game with the canonical payoff matrix for the strategies C, D and X, given by

(39) |

where all coefficients are positive. Any payoff matrix can be transformed into a zero-diagonal one because the replicator equation remains invariant if the same constant is subtracted from every element of one of its columns (Hofbauer and Sigmund, 1998). The coefficients of the payoff matrix (39) represent the payoff an invader gets when it invades a homogeneous population. Thus a D or X individual invading a homogeneous C population will get or , respectively. As both are positive, a homogeneous C population is unstable. Similarly, a C or X individual invading a homogeneous D population will get or , respectively. Therefore a homogeneous D population is uninvadable (hence stable). As for a C or a D individual invading a homogeneous X population, it will obtain or , respectively. The X corner is therefore a saddle point, because it cannot be invaded by D individuals but it can be invaded by C individuals.
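The two facts used above (column-shift invariance of the replicator equation and reading stability from invader payoffs) can be checked numerically. The following is a minimal sketch, not the matrix of Eq. (39): the numerical entries of `A` are hypothetical and were chosen only to reproduce the sign pattern described in the text, and the helper names `replicator_rhs`, `shift_column` and `invasion_payoffs` are illustrative.

```python
# Hypothetical zero-diagonal payoff matrix, rows/columns ordered C, D, X.
# A[i][j] is the payoff of strategy i against strategy j. The values are
# assumptions chosen to match the signs described in the text, nothing more.
A = [
    [0.0, -1.0, 2.0],   # C against C, D, X
    [1.0, 0.0, -1.0],   # D against C, D, X
    [3.0, -2.0, 0.0],   # X against C, D, X
]

def replicator_rhs(A, x):
    """Right-hand side of dx_i/dt = x_i * (f_i - fbar) for a state x on the simplex."""
    n = len(x)
    f = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    fbar = sum(x[i] * f[i] for i in range(n))
    return [x[i] * (f[i] - fbar) for i in range(n)]

def shift_column(A, col, c):
    """Copy of A with the constant c added to every entry of column `col`."""
    return [[a + (c if j == col else 0.0) for j, a in enumerate(row)] for row in A]

def invasion_payoffs(A, resident):
    """Payoff of each rare invader against a homogeneous resident population."""
    return {i: A[i][resident] for i in range(len(A)) if i != resident}

x = [0.2, 0.5, 0.3]                      # any point with sum(x) == 1
original = replicator_rhs(A, x)
shifted = replicator_rhs(shift_column(A, 1, 5.0), x)
print(max(abs(a - b) for a, b in zip(original, shifted)))  # numerically zero
print(invasion_payoffs(A, 0))  # both positive: homogeneous C is unstable
print(invasion_payoffs(A, 1))  # both negative: homogeneous D is uninvadable
print(invasion_payoffs(A, 2))  # mixed signs: homogeneous X is a saddle
```

The invariance holds because adding a constant to a column changes every payoff, and hence the mean payoff, by the same amount as long as the frequencies sum to one.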

This simple analysis fixes the flux of the dynamics at the boundary of the simplex (Figure 10). It also implies the existence of two rest points on the boundary of the simplex: one on the D–X edge and another one on the C–X edge. These points are given by

(40) |
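As an illustration of how such boundary rest points arise, the sketch below uses the generic result for a zero-diagonal payoff matrix: on the edge joining strategies i and j, equating the two payoffs gives a rest point at x_i = A_ij / (A_ij + A_ji), which lies inside the edge only when A_ij and A_ji have the same sign. The matrix entries are hypothetical (chosen to match the sign pattern described in the text, not the coefficients of Eq. (39)), and `edge_rest_point` is an illustrative helper name.

```python
# Hypothetical zero-diagonal payoff matrix, rows/columns ordered C, D, X.
A = [
    [0.0, -1.0, 2.0],
    [1.0, 0.0, -1.0],
    [3.0, -2.0, 0.0],
]
C, D, X = 0, 1, 2

def edge_rest_point(A, i, j):
    """Rest point on the edge where only strategies i and j are present.

    With a zero diagonal, f_i = A[i][j] * x_j and f_j = A[j][i] * x_i, so
    f_i == f_j gives x_i = A[i][j] / (A[i][j] + A[j][i]).
    Returns (x_i, x_j), or None when no rest point lies inside the edge.
    """
    a_ij, a_ji = A[i][j], A[j][i]
    if a_ij * a_ji <= 0:
        return None  # one strategy dominates the other along this edge
    x_i = a_ij / (a_ij + a_ji)
    return x_i, 1.0 - x_i

print(edge_rest_point(A, C, D))  # None: D dominates C on the C-D edge
print(edge_rest_point(A, C, X))  # mixed C-X rest point
print(edge_rest_point(A, D, X))  # mixed D-X rest point
```

For these assumed numbers the C–X rest point sits at x_C = 0.4 and the D–X rest point at x_D = 1/3, while the C–D edge has none, in line with the two boundary rest points described above.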

Besides, an interior rest point , with coordinates

(41) |

appears provided all three components have the same sign (Figure 10(a)). Component is proportional to the difference between the payoff of the population at the C–X mixed equilibrium and the payoff of a D invader. When it is negative the C–X rest point becomes a saddle and the interior point is an attractor (the situation depicted in Figure 10(a)). When it is positive a D individual cannot invade the C–X equilibrium, which then becomes an attractor and the interior point becomes a repellor (this is illustrated in Figure 10(b)). If no interior point exists the behavior will be as plotted in Figure 10(c) (Zeeman, 1980).
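The existence condition and the invasion criterion described above can be made concrete with a short sketch. An interior rest point requires equal payoffs for all three strategies, i.e. A x = λ(1, 1, 1) with the components of x summing to one; by Cramer's rule each x_i is proportional to the determinant of A with column i replaced by ones, and the point is interior only if the three determinants share a sign. The matrix below is hypothetical (it only mimics the sign pattern of the text, not Eq. (39)), and the function names are illustrative.

```python
# Hypothetical zero-diagonal payoff matrix, rows/columns ordered C, D, X.
A = [
    [0.0, -1.0, 2.0],
    [1.0, 0.0, -1.0],
    [3.0, -2.0, 0.0],
]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def interior_rest_point(A):
    """Interior rest point of a three-strategy game, or None if there is none."""
    u = []
    for col in range(3):
        # Replace column `col` of A by a column of ones (Cramer's rule).
        M = [[1.0 if j == col else A[i][j] for j in range(3)] for i in range(3)]
        u.append(det3(M))
    if not (all(v > 0 for v in u) or all(v < 0 for v in u)):
        return None  # components differ in sign: no point inside the simplex
    total = sum(u)
    return [v / total for v in u]

def d_invasion_gap(A):
    """Population payoff at the C-X mixed rest point minus the payoff of a D invader."""
    a_cx, a_xc = A[0][2], A[2][0]
    x_c = a_cx / (a_cx + a_xc)
    x_x = 1.0 - x_c
    population = A[0][2] * x_x               # payoff of C (equal to that of X) there
    invader = A[1][0] * x_c + A[1][2] * x_x  # payoff of a rare D individual
    return population - invader

print(interior_rest_point(A))  # frequencies of C, D, X at the interior rest point
print(d_invasion_gap(A))       # positive: D cannot invade the C-X rest point
```

For these assumed numbers the interior rest point is (0.3125, 0.4375, 0.25) and the gap is positive, which by the criterion stated in the text corresponds to the situation of Figure 10(b).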

### Footnotes

- journal: Journal of Theoretical Biology

### References

- Abbot, P., et al., 2011. Inclusive fitness theory and eusociality. Nature 471, E1–E4.
- Axelrod, R., Hamilton, W. D., 1981. The evolution of cooperation. Science 211, 1390–1396.
- Camerer, C. F., 2003. Behavioral Game Theory. Princeton University Press, Princeton.
- Croson, R. T. A., 2007. Theories of commitment, altruism and reciprocity: evidence from linear public goods games. Econ. Inquiry 45, 199–216.
- Darwin, C., 1871. The Descent of Man, and Selection in Relation to Sex. Murray, London.
- Fischbacher, U., Gächter, S., 2010. Social preferences, beliefs, and the dynamics of free riding in public goods experiments. Am. Econ. Rev. 100, 541–556.
- Fischbacher, U., Gächter, S., Fehr, E., 2001. Are people conditionally cooperative? Evidence from a public goods experiment. Econ. Lett. 71, 397–404.
- Fletcher, J. A., Doebeli, M., 2009. A simple and general explanation for the evolution of altruism. Proc. Roy. Soc. London B 276, 13–19.
- Gächter, S., 2007. Conditional cooperation: behavioral regularities from the lab and the field and their policy implications. In: Frey, B. S., Stutzer, A. (Eds.), Economics and Psychology: A Promising New Cross-Disciplinary Field. MIT Press, pp. 19–50.
- Gigerenzer, G., Selten, R. (Eds.), 2001. Bounded Rationality: The Adaptive Toolbox. MIT Press, Cambridge.
- Grujić, J., Fosco, C., Araujo, L., Cuesta, J. A., Sánchez, A., 2010. Social experiments in the mesoscale: Humans playing a spatial prisoner’s dilemma. PLoS ONE 5, e13749.
- Hamilton, W. D., 1964a. The genetical evolution of social behaviour I. J. Theor. Biol. 7, 1–16.
- Hamilton, W. D., 1964b. The genetical evolution of social behaviour II. J. Theor. Biol. 7, 17–52.
- Hofbauer, J., Sigmund, K., 1998. Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge.
- Kagel, J. H., Roth, A. E. (Eds.), 1995. The Handbook of Experimental Economics. Princeton University Press, Princeton.
- Kahneman, D., Slovic, P., Tversky, A. (Eds.), 1982. Judgment under uncertainty: Heuristics and biases. Cambridge University Press, New York.
- Karlin, S., Taylor, H. M., 1975. A First Course in Stochastic Processes, 2nd Edition. Academic Press, New York.
- Kirchkamp, O., Nagel, R., 2007. Naive learning and cooperation in network experiments. Games Econ. Behav. 58, 269–292.
- Kurzban, R., Houser, D., 2005. Experiments investigating cooperative types in humans: A complement to evolutionary theory and experiments. Proc. Natl. Acad. Sci. USA 102, 1803–1807.
- Maynard Smith, J., Szathmary, E., 1995. The Major Transitions in Evolution. Freeman, Oxford.
- Nowak, M. A., 2006. Five rules for the evolution of cooperation. Science 314, 1560–1563.
- Nowak, M. A., Sigmund, K., 1989a. Game-dynamical aspects of the prisoner’s dilemma. Appl. Math. Comp. 30, 191–213.
- Nowak, M. A., Sigmund, K., 1989b. Oscillations in the evolution of reciprocity. J. Theor. Biol. 137, 21–26.
- Nowak, M. A., Sigmund, K., 1990. The evolution of stochastic strategies in the Prisoner’s Dilemma. Acta Appl. Math. 20, 247–265.
- Nowak, M. A., May, R. M., 1992a. Tit for tat in heterogeneous populations. Nature 355, 250–253.
- Nowak, M. A., May, R. M., 1992b. Evolutionary games and spatial chaos. Nature 359, 826–829.
- Nowak, M. A., Sigmund, K., El-Sedy, E., 1995. Automata, repeated games and noise. J. Math. Biol. 33, 703–722.
- Nowak, M. A., Tarnita, C. E., Wilson, E. O., 2010. The evolution of eusociality. Nature 466, 1057–1062.
- Rapoport, A., Guyer, M., 1966. A taxonomy of games. General Systems 11, 203–214.
- Roca, C. P., Cuesta, J. A., Sánchez, A., 2009. Evolutionary game theory: temporal and spatial effects beyond replicator dynamics. Phys. Life Rev. 6, 208–249.
- Sigmund, K., 2010. The Calculus of Selfishness. Princeton University Press, Princeton.
- Szabó, G., Fáth, G., 2007. Evolutionary games on graphs. Phys. Rep. 446, 97–216.
- Szabó, G., Töke, C., 1998. Evolutionary prisoner’s dilemma game on a square lattice. Phys. Rev. E 58, 69–73.
- Traulsen, A., Semmann, D., Sommerfeld, R. D., Krambeck, H.-J., Milinski, M., 2010. Human strategy updating in evolutionary games. Proc. Natl. Acad. Sci. USA 107, 2962–2966.
- Trivers, R. L., 1971. The evolution of reciprocal altruism. Q. Rev. Biol. 46, 35–57.
- Zeeman, E. C., 1980. Population dynamics from game theory. In: Nitecki, Z., Robinson, C. (Eds.), Global Theory of Dynamical Systems. Springer, pp. 471–497.