Aspiration Dynamics of Multi-player Games in Finite Populations

Jinming Du (Center for Systems and Control, College of Engineering, Peking University, Beijing 100871, People’s Republic of China), Bin Wu (Research Group for Evolutionary Theory, Max Planck-Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306 Plön, Germany), Philipp M. Altrock, Long Wang (Center for Systems and Control, College of Engineering, Peking University, Beijing 100871, People’s Republic of China)
July 6, 2019

Studying strategy update rules in the framework of evolutionary game theory, one can differentiate between imitation processes and aspiration-driven dynamics. In the former case, individuals imitate the strategy of a more successful peer. In the latter case, individuals adjust their strategies based on a comparison of their payoffs from the evolutionary game to a value they aspire to, called the level of aspiration. Unlike imitation processes of pairwise comparison, aspiration-driven updates do not require additional information about the strategic environment and can thus be interpreted as being more spontaneous. Recent work has mainly focused on understanding how aspiration dynamics alter the evolutionary outcome in structured populations. However, the baseline case for understanding strategy selection is the well-mixed population, which is still not sufficiently understood. We explore how aspiration-driven strategy-update dynamics under imperfect rationality influence the average abundance of a strategy in multi-player evolutionary games with two strategies. We analytically derive a condition under which a strategy is more abundant than the other in the limiting case of weak selection. This approach has a long-standing history in evolutionary game theory and is mostly applied for its mathematical tractability. Hence, we also explore strong selection numerically, which shows that our weak selection condition is a robust predictor of the average abundance of a strategy. The condition turns out to differ from that of a wide class of imitation dynamics, as long as the game is not dyadic. Therefore a strategy favored under imitation dynamics can be disfavored under aspiration dynamics. This does not require any population structure and thus highlights the intrinsic difference between imitation and aspiration dynamics.

1 Introduction

In the study of population dynamics, it turns out to be very useful to classify individual interactions in terms of evolutionary games [1]. Early mathematical theories of strategic interactions were based on the assumption of rational choice [2, 3]: an agent’s optimal action depends on its expectations about the actions of others, and each of the other agents’ actions depends on their expectations about the focal agent. In evolutionary game theory, successful strategies spread by reproduction or imitation in a population [4, 5, 6, 7, 8].

Evolutionary game theory not only provides a platform for explaining biological problems of frequency dependent fitness and complex individual interactions such as cooperation and coordination [9, 10]; in finite populations, it also links the neutral process of evolution [11] to frequency dependence by introducing an intensity of selection [12, 13, 14, 15]. Evolutionary game theory can also be used to study cultural dynamics including human strategic behavior and updating [16, 17, 18]. One of the most interesting open questions is: how do individuals update their strategies based on the knowledge and conception of others and themselves?

Two fundamentally different mechanisms can be used to classify strategy updating and population dynamics based on individuals’ knowledge about their strategic environment or themselves: imitation of others and self-learning based on one’s own aspiration. In imitation dynamics, players update their strategies after a comparison between their own and another individual’s success in the evolutionary game [19, 20, 21]. For aspiration-driven updating, players switch strategies if an aspiration level is not met, where the level of aspiration is an intrinsic property of the focal individual [22, 23, 24, 25]. In both dynamics, novel strategies cannot emerge without additional mechanisms such as spontaneous exploration of strategy space (similar to mutation) [26, 27, 19, 28, 29, 30]. The major difference is that aspiration-driven updating does not require any knowledge about the payoffs of others. Thus aspiration-level-based dynamics, a form of self-learning, require less information about an individual’s strategic environment than imitation dynamics.

Aspiration-driven strategy-update dynamics are commonly observed in studies of animal and human behavioral ecology. For example, fish ignore social information when they have relevant personal information [31], and experienced ants hunt for food based on their own previous chemical trails rather than imitating others [32]. Furthermore, a form of aspiration-level-driven dynamics plays a key role in individual behavior in rat populations [33]. These examples clearly show that the idea behind aspiration dynamics, i.e., self-evaluation, is present in the animal world. In behavioral sciences, such aspiration-driven strategy adjustments generally operate on the behavioral level. However, it can be speculated that self-learning processes might have a downward impact on the regulatory, and thus genetic, levels of the brain and nervous system. This, in turn, could be seen as a mechanism that alters the rate of genetic change [34]. Whereas such wide-reaching systemic alterations are more speculative, it is clear that aspiration levels play a role in human strategy updating [23].

We study the statistical mechanics of a simple case of aspiration-driven self-learning dynamics in well-mixed populations of finite size. Deterministic and stochastic models of imitation dynamics have been well studied in both well-mixed and structured populations [6, 35, 26, 24, 19, 36]. For aspiration dynamics, numerous works have emerged studying population dynamics on graphs, but its impact in well-mixed populations, a basic reference case, one would think, is far less well understood. Although deterministic aspiration dynamics, i.e., a kind of win-stay-lose-shift dynamics in which individuals are perfectly rational, have been analyzed [37], it is not clear how processes with imperfect rationality influence the evolutionary outcome. Here, we ask whether a strategy favored under pairwise comparison driven imitation dynamics can become disfavored under aspiration-driven self-learning dynamics. To this end, in our analysis, we limit ourselves to the weak selection, or weak rationality, approximation, where payoffs from the game play only a small role in the decision-making [35]. It has been shown that under weak selection the favored strategy is invariant for a wide class of imitation processes [27, 21, 38]. We show that for pairwise games, the aspiration dynamics and the imitation dynamics always share the same favored strategies. For multi-player games, however, the weak selection criterion under aspiration dynamics that determines whether a strategy is more abundant than the other differs from the criterion under imitation dynamics. This paves the way to construct multi-player games for which aspiration dynamics favor one strategy whereas imitation dynamics favor another. Furthermore, in contrast to deterministic aspiration dynamics, if the favored strategy is determined by a global aspiration level, the average abundance of a strategy in the stochastic aspiration dynamics is invariant with respect to the aspiration level, provided selection is weak. We also extrapolate our results to stronger selection through numerical simulation.

2 Mathematical Model

2.1 Evolutionary games

We consider evolutionary game dynamics with two strategies and $d$ players. From these, the more widely studied $2\times 2$ games emerge as a special case [36]. In individual encounters, players obtain their payoffs from simultaneous actions. A focal player can be of type $A$, or $B$, and encounter a group containing $k$ other players of type $A$, to receive the payoff $a_k$, or $b_k$. For example, a $B$ player, which encounters $d-1$ individuals of type $A$, obtains payoff $b_{d-1}$. An $A$ player in a group of one other $A$ player and thus $d-2$ $B$ players obtains payoff $a_1$. All possible payoffs of a focal individual are uniquely defined by the number of $A$ players in the group, such that the payoff matrix reads

$$
\begin{array}{c|ccccc}
\text{Opposing } A \text{ players} & d-1 & \cdots & k & \cdots & 0 \\ \hline
A & a_{d-1} & \cdots & a_k & \cdots & a_0 \\
B & b_{d-1} & \cdots & b_k & \cdots & b_0
\end{array}
\tag{2.1}
$$

For any group engaging in a one-shot game, we can obtain each member’s payoff according to this matrix.

In a finite well-mixed population of size $N$, groups of size $d$ are assembled randomly, such that the probability of choosing a group that consists of another $k$ players of type $A$, and of $d-1-k$ players of type $B$, is given by a hypergeometric distribution [39]. For example, the probability that an $A$ player is in a group of $k$ other $A$s is given by $\binom{i-1}{k}\binom{N-i}{d-1-k}/\binom{N-1}{d-1}$, where $i$ is the number of $A$ players in the population, and $\binom{\cdot}{\cdot}$ is the binomial coefficient.

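The expected payoffs for any $A$ or $B$ in a population of size $N$, with $i$ players of type $A$ and $N-i$ players of type $B$, are given by

$$
\pi_A(i) = \sum_{k=0}^{d-1} \frac{\binom{i-1}{k}\binom{N-i}{d-1-k}}{\binom{N-1}{d-1}}\, a_k, \tag{2.2}
$$

$$
\pi_B(i) = \sum_{k=0}^{d-1} \frac{\binom{i}{k}\binom{N-i-1}{d-1-k}}{\binom{N-1}{d-1}}\, b_k. \tag{2.3}
$$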

In summary, we define a $d$-player stage game [7], shown in Eq. (2.1), from which the evolutionary game emerges such that each individual obtains an expected payoff based on the current composition of the well-mixed population. In the following, we introduce an update rule based on a global level of aspiration. This allows us to define a Markov chain describing the inherently stochastic dynamics in a finite population: probabilistic change of the composition of the population is driven by the fact that each individual compares its actual payoff to an imaginary value that it aspires to. Note here that we are only interested in the simplest way to model such a complex problem and do not address any learning process that may adjust such an aspiration level as the system evolves. For a sketch of the aspiration-driven evolutionary game, see Fig. 1.
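The hypergeometric averaging described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `expected_payoffs` and the convention that `a[k]` and `b[k]` hold the payoffs of a focal $A$ or $B$ player facing `k` other $A$ players are assumptions made here.

```python
from math import comb

def expected_payoffs(i, N, a, b):
    """Expected payoffs (pi_A, pi_B) when i of the N players use strategy A.

    a[k] / b[k]: payoff of a focal A / B player facing k other A players,
    so len(a) == len(b) == d, the group size. None is returned for a
    strategy that is absent from the population.
    """
    d = len(a)
    norm = comb(N - 1, d - 1)  # ways to draw the d-1 co-players
    pi_A = pi_B = None
    if i >= 1:  # a focal A player sees i-1 other A and N-i B players
        pi_A = sum(comb(i - 1, k) * comb(N - i, d - 1 - k) * a[k]
                   for k in range(d)) / norm
    if i <= N - 1:  # a focal B player sees i A and N-i-1 other B players
        pi_B = sum(comb(i, k) * comb(N - i - 1, d - 1 - k) * b[k]
                   for k in range(d)) / norm
    return pi_A, pi_B

# Hypothetical two-player example (d = 2): N = 5 players, i = 3 of type A.
pi_A, pi_B = expected_payoffs(3, 5, a=[1.0, 3.0], b=[0.0, 4.0])
print(pi_A, pi_B)
```

`math.comb` returns zero whenever the lower index exceeds the upper one, so group compositions that cannot occur drop out of the sums automatically.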

Figure 1: Evolutionary game dynamics of $d$-player interactions driven by global aspiration. In our mathematical model of human strategy updating driven by self-learning, a group of $d$ players is chosen randomly from the finite population to play the game. According to this, game players calculate and obtain their actual payoffs. They are more likely to stochastically switch strategies if the payoffs they aspire to are not met. On the other hand, the higher the payoffs compared to the aspiration level $\alpha$ are, the less likely they are to switch their strategies. Besides, strategy switching is also determined by a selection intensity $\omega$. For vanishing selection intensity, switching is entirely random irrespective of payoffs and the aspiration level. For increasing selection intensity, the self-learning process becomes increasingly more “optimal” in the sense that for high $\omega$, individuals tend to always switch when they are dissatisfied, and never switch when they are tolerant. We examine the simplest possible setup, where the level of aspired payoff $\alpha$ is a global parameter that does not change with the dynamics. We show, however, that statements about the average abundance of a strategy do not depend on $\alpha$ under weak selection ($\omega \ll 1$).

2.2 Aspiration-level driven stochastic dynamics

In addition to the inherent stochasticity in finite populations, there is randomness in the process of individual assessments of one’s own payoff as compared to a random sample of the rest of the population; even if an individual knew exactly what to do, he might still fail to switch to an optimal strategy, e.g., due to a trembling hand [40, 41].

Here we examine the simplest case of an entire population having a certain level of aspiration. Players need not see any particular payoffs but their own, which they compare to an aspired value. This level of aspiration, $\alpha$, is a variable that influences the stochastic strategy updating. The probability of switching strategy is random when individuals’ payoffs are close to their level of aspiration, reflecting the basic degree of uncertainty in the population. When payoffs exceed the aspiration, strategy switching is unlikely. At high values of aspiration compared to payoffs, switching probabilities are high.

The level of aspiration provides a global benchmark of tolerance or dissatisfaction in the population. In addition, when modeling human strategy updating, one typically introduces another global parameter that provides a measure for how important individuals deem the impact of the actual game played on their update, the intensity of selection, $\omega$. Irrespective of the aspiration level and the frequency dependent payoff distribution, vanishing values of $\omega$ refer to nearly random strategy updating. For large values of $\omega$, individuals’ deviations from their aspiration level have a strong impact on the dynamics.

Note that although the level of aspiration is a global variable and does not differ individually, due to payoff inhomogeneity there can always be a part of the population that seeks to switch more often due to dissatisfaction with the payoff distribution.

In our microscopic update process, we randomly choose an individual, $x$, from the population, and assume that the payoff of the focal individual is $\pi_x$. To model stochastic self-learning of aspiration-driven switching, we can use the following probability function

$$
\frac{1}{1+e^{-\omega(\alpha-\pi_x)}}, \tag{2.4}
$$

which is similar to the Fermi-rule [42, 22], but replaces a randomly drawn opponent’s payoff by one’s own aspiration. The wider the positive gap between aspiration and payoff, the higher the switching probability. Conversely, if payoffs exceed the level of aspiration, individuals become less active with increasing payoffs. The aspiration level, $\alpha$, provides the benchmark used to evaluate how “greedy” an individual is. Higher aspiration levels mean that individuals aspire to higher payoffs. If payoffs meet aspiration, individuals remain random in their updates. If payoffs are below aspiration, switching occurs with probability larger than random; if they are above aspiration, switching occurs with probability lower than random. The selection intensity governs how strict individuals are in this respect. For $\omega = 0$, strategy switching is entirely random (neutral). Low values of $\omega$ lead to switching only slightly different from random but follow the impact of $\alpha$. For increasing $\omega$, the impact of the difference between payoffs and the aspiration becomes more important. In the case of $\omega \to \infty$, individuals are strict in the sense that they either switch strategies with probability one if they are not satisfied, or stay with their current strategy if their aspiration level is met or overshot.
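The qualitative behavior just described can be checked with a small sketch. It assumes the logistic (Fermi-like) form with argument $\omega(\alpha - \pi)$; the function name `switch_probability` is hypothetical, and the two-branch evaluation is only a numerical safeguard against overflow for very strong selection.

```python
import math

def switch_probability(payoff, alpha, omega):
    """Probability that a focal player switches strategy.

    Assumed form: 1 / (1 + exp(-omega * (alpha - payoff))), evaluated in a
    numerically stable way for large |omega * (alpha - payoff)|.
    """
    x = omega * (alpha - payoff)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Payoff below, at, and above an aspiration level of 1 (illustrative values).
for payoff in (0.0, 1.0, 2.0):
    print(payoff, switch_probability(payoff, alpha=1.0, omega=2.0))
```

For `omega = 0` the function returns one half for every payoff, which is the neutral case; as `omega` grows the output approaches a step function of the sign of `alpha - payoff`.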

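The spread of successful strategies is modeled by a birth and death process in discrete time. In one time step, three events are possible: the abundance of $A$, $i$, can increase by one with probability $T_i^{+}$, decrease by one with probability $T_i^{-}$, or stay the same with probability $T_i^{0}$. All other transitions occur with probability zero. The transition probabilities are given by

$$
T_i^{+} = \frac{N-i}{N}\,\frac{1}{1+e^{-\omega[\alpha-\pi_B(i)]}}, \tag{2.5}
$$

$$
T_i^{-} = \frac{i}{N}\,\frac{1}{1+e^{-\omega[\alpha-\pi_A(i)]}}, \tag{2.6}
$$

$$
T_i^{0} = 1 - T_i^{+} - T_i^{-}. \tag{2.7}
$$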

In each time step, a randomly chosen individual evaluates its success in the evolutionary game, given by Eqs. (2.2), (2.3), compares it to the level of aspiration, and then changes strategy with probability lower than $1/2$ if its payoff exceeds the aspiration. Otherwise, it switches with probability greater than $1/2$, except when the aspiration level is exactly met, in which case it switches randomly (note that this is very unlikely to ever be the case).
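One update step can be turned into transition probabilities as follows. This is a sketch under stated assumptions: the focal player is drawn uniformly at random, so a $B$ player is chosen with probability $(N-i)/N$ and an $A$ player with probability $i/N$, and the chosen player switches with the logistic probability in $\omega(\alpha - \pi)$. All function names are introduced here for illustration.

```python
from math import comb, exp

def logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)

def expected_payoff(i, N, payoffs, focal_is_A):
    """Expected payoff of a focal player when i of the N players use A."""
    d = len(payoffs)
    others_A = i - 1 if focal_is_A else i   # A players among the other N-1
    others_B = N - 1 - others_A
    if others_A < 0 or others_B < 0:        # focal strategy is absent
        return 0.0                          # placeholder, gets zero weight
    return sum(comb(others_A, k) * comb(others_B, d - 1 - k) * payoffs[k]
               for k in range(d)) / comb(N - 1, d - 1)

def transition_probabilities(i, N, a, b, alpha, omega):
    """Return (T_plus, T_minus, T_zero) for the state with i A players."""
    pi_A = expected_payoff(i, N, a, True)
    pi_B = expected_payoff(i, N, b, False)
    t_plus = (N - i) / N * logistic(omega * (alpha - pi_B))   # B -> A
    t_minus = i / N * logistic(omega * (alpha - pi_A))        # A -> B
    return t_plus, t_minus, 1.0 - t_plus - t_minus

# Neutral case (omega = 0): the rates are linear in i.
print(transition_probabilities(3, 10, a=[1.0, 3.0], b=[0.0, 4.0],
                               alpha=1.0, omega=0.0))
```

Because the boundary states keep a positive probability of leaving (a lone strategy can still be abandoned), the chain has no absorbing states, in contrast to imitation processes.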

Compared to imitation (pairwise comparison) dynamics, our self-learning process, which is essentially an Ehrenfest-like Markov chain, has some different characteristics. Even without the introduction of mutation or random strategy exploration, there exists a stationary distribution for the aspiration-driven dynamics. Even in a homogeneous population, there is a positive probability that an individual switches to another strategy due to the dissatisfaction resulting from the payoff-aspiration difference. This facilitates the escape from the states that are absorbing in the pairwise comparison process and other Moran-like evolutionary dynamics. Hence there exists a nontrivial stationary distribution of the Markov chain satisfying detailed balance. Specifically, for the case of $\omega = 0$ (neutral selection), the dynamics defined by Eqs. (2.5)–(2.7) are characterized by linear rates, while these rates are quadratic for the neutral imitation dynamics and Moran process.

In the following analysis and discussion, we are interested in the limit of weak selection, $\omega \ll 1$, and its ability to aptly predict the success of cooperation in commonly used evolutionary multi-player games. The limit of weak selection, which has a long-standing history in population genetics and molecular evolution [11], also plays a role in social learning and cultural evolution. Recent experimental results suggest that the intensity with which human subjects adjust their strategies might be low [18]. Although it has been unclear to what degree and in what way human strategy updating deviates from random [43, 44], the weak selection limit is of importance to quantitatively characterize the evolutionary dynamics. In the limiting case of weak selection, we are able to analytically classify strategies with respect to the neutral benchmark, $\omega \to 0$ [35, 19, 45, 21, 46]. We note that a strategy is favored by selection if its average equilibrium frequency under weak selection is greater than one half. In order to come to such a quantitative observation, we need to calculate the stationary distribution over frequencies of strategy $A$.

2.3 Stationary distribution

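The Markov chain given by Eqs. (2.5)–(2.7) is a one dimensional birth and death process with reflecting boundaries. It satisfies the detailed balance condition $\psi_{j-1}T_{j-1}^{+} = \psi_{j}T_{j}^{-}$, where $\psi_j$ is the stationary distribution over frequencies of $A$ in equilibrium [47, 48]. Considering $\sum_{j=0}^{N}\psi_j = 1$, we find the exact solution by recursion, given by

$$
\psi_j = \frac{\prod_{i=0}^{j-1} T_i^{+} \big/ \prod_{i=1}^{j} T_i^{-}}{\sum_{k=0}^{N}\left(\prod_{i=0}^{k-1} T_i^{+} \big/ \prod_{i=1}^{k} T_i^{-}\right)}, \tag{2.8}
$$

where $\prod_{i=0}^{j-1} T_i^{+}$ is the probability of successive transitions from $0$ to $j$, and empty products are equal to one. The analytical solution Eq. (2.8) allows us to find the exact value of the average abundance of strategy $A$,

$$
\langle X_A \rangle = \sum_{j=0}^{N} \frac{j}{N}\,\psi_j, \tag{2.9}
$$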

for any strength of selection.
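The recursion behind the stationary distribution is easy to evaluate numerically. The sketch below assumes the transition probabilities take the form "probability of picking a player of a given type, times a logistic switching probability in $\omega(\alpha-\pi)$", and builds $\psi_j$ from the detailed balance relation $\psi_j = \psi_{j-1}T_{j-1}^{+}/T_j^{-}$. Function names and the public goods parameters in the example are illustrative assumptions.

```python
from math import comb, exp

def logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)

def expected_payoff(i, N, payoffs, focal_is_A):
    d = len(payoffs)
    others_A = i - 1 if focal_is_A else i
    others_B = N - 1 - others_A
    if others_A < 0 or others_B < 0:
        return 0.0  # focal strategy absent; multiplied by zero below
    return sum(comb(others_A, k) * comb(others_B, d - 1 - k) * payoffs[k]
               for k in range(d)) / comb(N - 1, d - 1)

def rates(i, N, a, b, alpha, omega):
    """(T_plus, T_minus) in the state with i players of type A."""
    t_plus = (N - i) / N * logistic(omega * (alpha - expected_payoff(i, N, b, False)))
    t_minus = i / N * logistic(omega * (alpha - expected_payoff(i, N, a, True)))
    return t_plus, t_minus

def stationary_distribution(N, a, b, alpha, omega):
    weights = [1.0]  # unnormalised psi_0
    for j in range(1, N + 1):
        t_plus = rates(j - 1, N, a, b, alpha, omega)[0]
        t_minus = rates(j, N, a, b, alpha, omega)[1]
        weights.append(weights[-1] * t_plus / t_minus)  # detailed balance
    total = sum(weights)
    return [w / total for w in weights]

def average_abundance(psi):
    N = len(psi) - 1
    return sum(j / N * p for j, p in enumerate(psi))

# Illustrative linear public goods game: d = 5, r = 3, c = 1 (A = cooperate).
d, r, c = 5, 3.0, 1.0
a = [r * (k + 1) * c / d - c for k in range(d)]
b = [r * k * c / d for k in range(d)]
psi = stationary_distribution(N=20, a=a, b=b, alpha=1.0, omega=1.0)
print(average_abundance(psi))
```

For very large `omega` some switching probabilities underflow to zero, so a production version would work with logarithms of the weights; that refinement is omitted here for brevity.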

3 Results and Discussion

It has been shown that imitation processes are similar to each other under weak selection [27, 21, 38]. Thus, in order to compare the essential differences between imitation processes and the aspiration process, we consider this weak selection limit. To better understand the effects of selection intensity, aspiration level, and payoff matrix on the average abundance of strategy $A$, we further analyze which strategy is more abundant based on Eq. (2.8). For a fixed population size, under weak selection, i.e. $\omega \to 0$, the stationary distribution $\psi_j(\omega)$ can be expressed approximately as

$$
\psi_j(\omega) \approx \psi_j(0) + \omega\,\frac{\partial \psi_j(\omega)}{\partial \omega}\bigg|_{\omega=0}, \tag{3.1}
$$

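where the neutral stationary distribution is simply given by $\psi_j(0) = \binom{N}{j}/2^{N}$, and the first order term of this Taylor expansion amounts to

$$
\frac{\partial \psi_j(\omega)}{\partial \omega}\bigg|_{\omega=0} = \frac{\binom{N}{j}}{2^{N+1}}\left\{\sum_{i=1}^{j}\left[\pi_A(i)-\pi_B(i-1)\right] - \frac{1}{2^{N}}\sum_{k=0}^{N}\binom{N}{k}\sum_{i=1}^{k}\left[\pi_A(i)-\pi_B(i-1)\right]\right\}. \tag{3.2}
$$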

Interestingly, in the limiting case of weak selection, the first order approximation of the stationary distribution of $A$ does not depend on the aspiration level. For higher order terms of selection intensity, however, $\psi_j$ does depend on the aspiration level.
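The independence from the aspiration level can be made explicit in code: a first-order expansion of the stationary distribution only involves payoff differences, so no aspiration parameter appears in the function below. The sketch assumes the logistic switching probability in $\omega(\alpha-\pi)$, for which the ratio $T_{i-1}^{+}/T_i^{-}$ expands to $\frac{N-i+1}{i}\{1+\frac{\omega}{2}[\pi_A(i)-\pi_B(i-1)]\}$ with the aspiration level cancelling; the function names are hypothetical.

```python
from math import comb

def payoff_difference(i, N, a, b):
    """pi_A(i) - pi_B(i-1) for i = 1..N.

    Both expected payoffs use the same hypergeometric weights, so the
    difference only involves a[k] - b[k].
    """
    d = len(a)
    return sum(comb(i - 1, k) * comb(N - i, d - 1 - k) * (a[k] - b[k])
               for k in range(d)) / comb(N - 1, d - 1)

def first_order_term(N, a, b):
    """Derivative of psi_j with respect to omega at omega = 0, j = 0..N."""
    S = [0.0]  # S[j] = sum_{i=1}^{j} [pi_A(i) - pi_B(i-1)]
    for i in range(1, N + 1):
        S.append(S[-1] + payoff_difference(i, N, a, b))
    mean_S = sum(comb(N, k) * S[k] for k in range(N + 1)) / 2 ** N
    return [comb(N, j) / 2 ** (N + 1) * (S[j] - mean_S) for j in range(N + 1)]

def weak_selection_distribution(N, a, b, omega):
    """Neutral binomial distribution plus the first-order correction."""
    slope = first_order_term(N, a, b)
    return [comb(N, j) / 2 ** N + omega * slope[j] for j in range(N + 1)]

# Illustrative linear public goods game: d = 5, r = 3, c = 1.
d, r, c = 5, 3.0, 1.0
a = [r * (k + 1) * c / d - c for k in range(d)]
b = [r * k * c / d for k in range(d)]
approx = weak_selection_distribution(20, a, b, omega=0.01)
print(sum(j / 20 * p for j, p in enumerate(approx)))
```

The correction terms sum to zero, so the approximation remains normalized; it is only meaningful while `omega` times the payoff scale stays well below one.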

In the following we discuss the condition under which a strategy is favored and compare the predictions for stationary strategy abundance under self-learning and under imitation dynamics. Thereafter we consider three prominent examples of games with multiple players through analytical, numerical and simulation methods, the results of which are detailed in Figs. 2–4 and Appendix B. All three examples are social dilemmas in the sense that the Nash equilibrium of the one-shot game is not the social optimum. First, the widely studied public goods game represents the class of games where there is only one pure Nash equilibrium [49]. Next, the public goods game with a threshold, a simplified version of the collective risk dilemma [50, 51, 52], represents the class of coordination games with multiple pure Nash equilibria, depending on the threshold. Last, we consider the $d$-player volunteer’s dilemma, or snowdrift game, which has a mixed Nash equilibrium [53, 54].

3.1 Average abundance of strategy $A$

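Based on the approximation (3.1), for any symmetric multi-player game with two strategies of normal form (2.1), we can now calculate a weak selection condition such that in equilibrium $A$ is more abundant than $B$. Since for neutrality, $\langle X_A\rangle = 1/2$ holds and thus $\sum_{j=0}^{N}\frac{j}{N}\psi_j(0) = \frac{1}{2}$, it is sufficient to consider positivity of the sum of $\frac{j}{N}\,\frac{\partial\psi_j}{\partial\omega}\big|_{\omega=0}$ over all $j$. Under weak selection, strategy $A$ is favored by selection, i.e., $\langle X_A\rangle > 1/2$, if

$$
\sum_{k=0}^{d-1}\binom{d-1}{k}\left(a_k - b_k\right) > 0, \tag{3.3}
$$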

which holds for any $d$-player game with two strategies in a population with more than two individuals. For a detailed derivation of our main analytical result, see Appendix A. Note that for a two-player game, $d=2$, the above condition simplifies to $a_1 + a_0 > b_1 + b_0$, which is similar to the concept of risk-dominance translated to finite populations [35].

The left hand side expression of inequality (3.3) can also be compared to a similar condition under the class of pairwise comparison processes [22, 19], where two randomly selected individuals compare their payoffs and switch with a certain probability based on the observed inequality. Typically, weak selection results for pairwise comparison processes lead to the result that strategy $A$ is favored by selection if [55, 56, 35]

$$
\sum_{k=0}^{d-1}\left(a_k - b_k\right) > 0, \tag{3.4}
$$

which applies both to the question of whether fixation of $A$ is more likely than fixation of $B$, and to whether the average abundance of $A$ is greater than one half under weak mutation and weak selection; the latter can be shown using properties of the embedded Markov chain [57]. The sums on the left hand sides of (3.3) and (3.4) can thus be compared with each other in order to reveal the nature of our self-learning process driven by a global aspiration level.
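A short sketch shows how the two criteria can disagree once $d > 2$. It assumes that the aspiration criterion is the binomially weighted sum of $a_k - b_k$ and the imitation criterion the unweighted sum, as discussed above; the function names and the three-player payoff values are hypothetical and chosen only to produce opposite signs.

```python
from math import comb

def aspiration_sum(a, b):
    """Weighted sum: sum_k C(d-1, k) * (a_k - b_k)."""
    d = len(a)
    return sum(comb(d - 1, k) * (a[k] - b[k]) for k in range(d))

def imitation_sum(a, b):
    """Unweighted sum: sum_k (a_k - b_k)."""
    return sum(ak - bk for ak, bk in zip(a, b))

# Hypothetical three-player game, indexed by the number k of other A players.
a = [3.0, 0.0, 0.5]
b = [0.0, 2.0, 0.0]
print("aspiration:", aspiration_sum(a, b))  # negative: A disfavored
print("imitation: ", imitation_sum(a, b))   # positive: A favored
```

The binomial weights emphasize mixed groups (here `k = 1`), where $A$ does poorly in this example, which is why the sign flips. For `d = 2` both weights equal one and the two sums coincide.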

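Our main result, Eq. (3.3), holds for a variety of self-learning dynamics, not only for the probability function given by Eq. (2.4). Consider a general self-learning function $g[\omega(\alpha-\pi)]$ with $g(0)>0$, where $g(x)$ is strictly increasing with increasing $x$. Denoting $x = \omega(\alpha-\pi)$, we have $g'(0)>0$. Then, for $\omega \to 0$, $g(x) \approx g(0) + g'(0)\,\omega(\alpha-\pi)$, and Eq. (3.2) can be rewritten in a more general form

$$
\frac{\partial \psi_j(\omega)}{\partial \omega}\bigg|_{\omega=0} = \frac{g'(0)}{g(0)}\,\frac{\binom{N}{j}}{2^{N}}\left\{\sum_{i=1}^{j}\left[\pi_A(i)-\pi_B(i-1)\right] - \frac{1}{2^{N}}\sum_{k=0}^{N}\binom{N}{k}\sum_{i=1}^{k}\left[\pi_A(i)-\pi_B(i-1)\right]\right\}. \tag{3.5}
$$

Since $g'(0)/g(0)$ is a positive constant, Eq. (3.3) is still valid for any such probability function $g$, see Appendix A.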

3.2 Linear public goods game

Public goods games emerge when groups of players engage in the sustenance of common goods. Cooperators pay an individual cost in form of a contribution $c$ that is pooled into the common pot. Defectors do not contribute. The pot is then multiplied by a characteristic multiplication factor $r$ and shared equally among all individuals in the group, irrespective of contribution. If the multiplication factor is smaller than the size of the group $d$, each cooperator recovers only a fraction of the initial investment. Switching to defection would always be beneficial in a pairwise comparison of the two strategies. The payoff matrix thus reads

$$
\begin{array}{c|ccccc}
\text{Opposing } C \text{ players} & d-1 & \cdots & k & \cdots & 0 \\ \hline
C & rc - c & \cdots & \frac{r(k+1)c}{d} - c & \cdots & \frac{rc}{d} - c \\
D & \frac{r(d-1)c}{d} & \cdots & \frac{rkc}{d} & \cdots & 0
\end{array}
\tag{3.6}
$$

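where $r < d$ is typically assumed. Since $a_k - b_k = c\,(r/d - 1)$ is a negative constant for any number of cooperators in the group, we find that

$$
\sum_{k=0}^{d-1}\binom{d-1}{k}\left(a_k - b_k\right) = 2^{d-1}\,c\left(\frac{r}{d}-1\right) \tag{3.7}
$$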

is always negative. Cooperation cannot be the more abundant strategy in the well-mixed population (see Fig. 2). However, if the self-learning dynamics are driven by a sufficiently high aspiration level, individuals are constantly dissatisfied and switch strategy frequently, even as defectors, such that cooperation can break even if selection is strong enough, namely for all values . On the other hand, if the aspiration level is low, cooperators switch more often than defectors such that the average abundance of assumes a value closer to the evolutionary stable state of full defection, which depends on . In the extreme case of very low and strong selection, defectors fully dominate, thus the stationary measure retracts to the all defection state.

Figure 2: Mean (stationary) fraction of cooperators for the linear public goods game. The common parameters are aspiration level , population size , and cost of cooperation . In both panels, the group sizes are (dark shaded), and (light shaded). Panel A shows the mean fraction of cooperators as a function of selection intensity for , the inset shows a detail for lower selection intensities. Panel B shows the mean fraction of cooperators as a function of selection intensity for . The inset shows the stationary distribution for , and .

3.3 Threshold public goods game

Here we consider the following public goods game with a threshold in the sense that the good becomes strictly unavailable when the number of cooperators in a group is below a critical threshold, $m$. This threshold becomes a new strategic variable. Here, $c$ is an initial endowment given to each player, which is invested in full by cooperators. Whatever the cooperators manage to invest is multiplied by $r$ and redistributed among all players in the group irrespective of strategy, if the threshold investment is met. Defectors do not make any investment, and thus have an additional payoff of $c$, as long as the threshold is met. Once the number of cooperators is below $m$, all payoffs are zero, which compares to the highest risk possible (loss of endowment and investment with certainty) in what is called the collective-risk dilemma [50, 52]. The payoff matrix for the two strategies, cooperation $C$, and defection $D$, reads

$$
\begin{array}{c|ccccccc}
\text{Opposing } C \text{ players} & d-1 & \cdots & m & m-1 & m-2 & \cdots & 0 \\ \hline
C & rc & \cdots & \frac{r(m+1)c}{d} & \frac{rmc}{d} & 0 & \cdots & 0 \\
D & \frac{r(d-1)c}{d} + c & \cdots & \frac{rmc}{d} + c & 0 & 0 & \cdots & 0
\end{array}
\tag{3.8}
$$

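We can examine when the self-learning process favors cooperation. We can also seek to make a statement about whether under self-learning dynamics, cooperation performs better than under the pairwise comparison process. For self-learning dynamics, we find

$$
\sum_{k=0}^{d-1}\binom{d-1}{k}\left(a_k - b_k\right) = c\left[\frac{rm}{d}\binom{d-1}{m-1} + \left(\frac{r}{d}-1\right)\sum_{k=m}^{d-1}\binom{d-1}{k}\right],
$$

while the equivalent statement for pairwise comparison processes based on the same payoff matrix would be $\sum_{k=0}^{d-1}(a_k - b_k) = c\,(r - d + m)$. Thus, the criterion of self-learning dynamics can be written as

$$
r > \frac{d\sum_{k=m}^{d-1}\binom{d-1}{k}}{m\binom{d-1}{m-1} + \sum_{k=m}^{d-1}\binom{d-1}{k}},
$$

whereas positivity of the imitation processes condition, $c\,(r - d + m) > 0$, simply leads to $r > d - m$. Comparing the two conditions, we find

$$
(d-m) - \frac{d\sum_{k=m}^{d-1}\binom{d-1}{k}}{m\binom{d-1}{m-1} + \sum_{k=m}^{d-1}\binom{d-1}{k}} = \frac{m}{m\binom{d-1}{m-1} + \sum_{k=m}^{d-1}\binom{d-1}{k}}\left[(d-m)\binom{d-1}{m-1} - \sum_{k=m}^{d-1}\binom{d-1}{k}\right]. \tag{3.9}
$$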

Since the first factor on the right hand side of Eq. (3.9) is always positive, the factor

$$
(d-m)\binom{d-1}{m-1} - \sum_{k=m}^{d-1}\binom{d-1}{k} \tag{3.10}
$$

determines the relationship between self-learning dynamics and pairwise comparison processes: for sufficiently large threshold $m$, expression (3.10) is positive. In conclusion, the aspiration-level-driven self-learning dynamics can afford to be less strict than the pairwise comparison process. Namely, it requires less reward for cooperators’ contribution to the common pool (lower levels of $r$) in order to promote the cooperative strategy. The abundance of the cooperative strategy depends on the threshold: higher thresholds support cooperation, even for lower multiplication factors (see Fig. 3). For fixed $d$, our self-learning dynamics are more likely to promote cooperation in a threshold public goods game if the threshold for the number of cooperators needed to support the public goods is large enough, i.e., not too different from the total size of the group. For small thresholds, and thus higher temptation to defect in groups with fewer cooperators, we approach the regular public goods game, and the conclusion may be reversed. For such small thresholds, imitation-driven (pairwise comparison) dynamics are more likely to lead to cooperation than aspiration dynamics.
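The threshold dependence can be explored numerically with the sketch below. It rests on an assumed reading of the game described above: with $j \ge m$ cooperators in a group of $d$, a cooperator earns $rjc/d$ and a defector $rjc/d + c$, and all payoffs are zero otherwise. The critical multiplication factors follow from setting the weighted and unweighted sums of $a_k - b_k$ to zero; all function names are hypothetical.

```python
from math import comb

def threshold_payoffs(d, m, r, c=1.0):
    """Payoff lists a[k], b[k] for a focal C / D facing k other cooperators."""
    a = [r * (k + 1) * c / d if k + 1 >= m else 0.0 for k in range(d)]
    b = [r * k * c / d + c if k >= m else 0.0 for k in range(d)]
    return a, b

def aspiration_sum(a, b):
    d = len(a)
    return sum(comb(d - 1, k) * (a[k] - b[k]) for k in range(d))

def critical_r_aspiration(d, m):
    """Smallest r (exclusive) for which the weighted sum is positive, 1 <= m <= d."""
    tail = sum(comb(d - 1, k) for k in range(m, d))
    return d * tail / (m * comb(d - 1, m - 1) + tail)

def critical_r_imitation(d, m):
    """Smallest r (exclusive) for which the unweighted sum is positive."""
    return d - m

d = 5
for m in range(1, d + 1):
    print(m, critical_r_aspiration(d, m), critical_r_imitation(d, m))
```

Under these assumptions, a high threshold such as `m = 4` in groups of five gives a lower critical factor for aspiration dynamics than for imitation, while `m = 1` gives a higher one, in line with the qualitative statement above.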

Figure 3: Mean (stationary) fraction of cooperators for the threshold public goods game. The common parameters are aspiration level , population size , group size , and cost of cooperation . In both panels, the threshold sizes are (dark shaded), and (light shaded). Panel A shows the mean fraction of cooperators as a function of selection intensity for . The inset shows the critical multiplication factor above which cooperation is more abundant as a function of the threshold . High thresholds lower the critical multiplication factor of the public good such that cooperation can become more abundant than defection. Panel B shows the mean fraction of cooperators as a function of selection intensity for . The inset shows the stationary distribution for , , , and .

3.4 $d$-player snowdrift game

Evolutionary games between two strategies can have mixed evolutionarily stable states [6, 36]. Strategy $A$ can invade $B$ and $B$ can invade $A$; a stable coexistence of the two strategies typically evolves. In the replicator dynamics of the snowdrift game, cooperators can be invaded by defectors as the temptation to defect is still larger than the reward of mutual cooperation [54, 58]. In contrast to the public goods game, cooperation with a group of defectors now yields a payoff greater than exclusive defection. The act of cooperation provides a benefit $b$ to all members of the group, and the cost of cooperation $c$ is equally shared among the number of cooperators [59]. Hence, the payoff matrix reads

$$
\begin{array}{c|ccccc}
\text{Opposing } C \text{ players} & d-1 & \cdots & k & \cdots & 0 \\ \hline
C & b - \frac{c}{d} & \cdots & b - \frac{c}{k+1} & \cdots & b - c \\
D & b & \cdots & b & \cdots & 0
\end{array}
\tag{3.11}
$$

If cooperation maintains a minimal positive payoff from the cooperative act, cooperation and defection can coexist. The snowdrift game is a social dilemma, as selection does not favor the social optimum of exclusive cooperation. The level of coexistence depends on the amount of cost that a particular cooperator has to contribute in a certain group. Evaluating the weak selection condition (3.3) in case of the $d$-player snowdrift game leads to the condition

$$
b > c\sum_{k=0}^{d-1}\frac{1}{k+1}\binom{d-1}{k} = \frac{2^{d}-1}{d}\,c \tag{3.12}
$$

in order to observe $\langle X_C\rangle > 1/2$ in aspiration dynamics under weak selection. For imitation processes, on the other hand, we find $b > c\sum_{k=0}^{d-1}\frac{1}{k+1}$. Note that, except for $k=0$ and $k=d-1$, $\binom{d-1}{k} > 1$ holds for any other $k$. Because of this, the different nature of these two conditions, given by the positive coefficients $\binom{d-1}{k}$ for any $k$, reveals that self-learning dynamics narrow down the parameter range for which cooperation can be favored by selection. In the snowdrift game, self-learning dynamics are less likely to favor cooperation than pairwise comparison processes. Larger group size hinders cooperation: the larger the group, the higher the benefit of cooperation, $b$, has to be in order to support cooperation (see Fig. 4).
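The growth of the required benefit with group size can be tabulated in a few lines. The sketch assumes snowdrift payoffs in which a cooperator facing $k$ other cooperators earns $b - c/(k+1)$, a defector earns $b$ if at least one co-player cooperates and zero otherwise, and the two criteria are the binomially weighted and unweighted sums of the payoff differences. Function names are hypothetical.

```python
from math import comb

def critical_benefit_aspiration(d, c=1.0):
    """b must exceed c * sum_k C(d-1, k) / (k+1), which equals c * (2^d - 1) / d."""
    return c * sum(comb(d - 1, k) / (k + 1) for k in range(d))

def critical_benefit_imitation(d, c=1.0):
    """b must exceed c * sum_k 1 / (k+1), the d-th harmonic number times c."""
    return c * sum(1.0 / (k + 1) for k in range(d))

for d in (2, 3, 5, 10):
    print(d, critical_benefit_aspiration(d), critical_benefit_imitation(d))
```

For `d = 2` both thresholds coincide, while for larger groups the aspiration threshold grows roughly like $2^{d}/d$ and the imitation threshold only logarithmically, which quantifies how much narrower the cooperative regime becomes under self-learning.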

Figure 4: Mean (stationary) fraction of cooperators for the -player snowdrift game. The common parameters are aspiration level , population size , and cost of cooperation . In both panels, the group sizes are (dark shaded), and (light shaded). Panel A shows the mean fraction of cooperators as a function of selection intensity for . The inset shows the cooperation condition as a function of group size for benefits . Only for high benefit and low group size, cooperation can be more abundant than defection. Panel B shows the mean fraction of cooperators as a function of selection intensity for . The inset shows the stationary distribution for , , and .

4 Summary and Conclusions

Previous studies have typically investigated self-learning mechanisms on graphs via simulations, which often employ stochastic aspiration-driven update rules [60, 61, 25, 62, 23]. Although results based on mean field approximations are insightful [25, 24], further analytical insights have been lacking so far. Thus it is constructive to introduce and discuss a reference case of stochastic aspiration-driven dynamics of self-learning in well-mixed populations, which is what we do here. Our weak selection analysis is based on a simplified scenario that implements a non-adaptive self-learning process with a global aspiration level.

Probabilistic evolutionary game dynamics driven by aspiration are inherently innovative and do not have absorbing boundaries even in the absence of mutation or random strategy exploration. We study the equilibrium strategy distribution in a finite population and make a weak selection approximation for the average strategy abundance for any multi-player game with two strategies, which turns out to be independent of the level of aspiration. This is different from the aspiration dynamics in infinitely large populations, where the evolutionary outcome crucially depends on the aspiration level [37]. Thus it highlights the intrinsic differences arising from finite stochastic dynamics of multi-player games between two strategies. Based on this we derive a condition for one strategy to be favored over the other. This condition then allows a comparison of a strategy’s performance to other prominent game dynamics based on pairwise comparison between two strategies.

Most of the complex strategic interactions in natural populations, ranging from competition and cooperation in microbial communities to social dilemmas in humans, take place in groups rather than pairs. Thus multi-player games have attracted increasing interest in different areas [63, 64, 65, 66, 36, 67, 68]. The most straightforward form of multi-player games makes use of the generalization of the payoff matrix concept [63]. Such multi-player games are more complex and show intrinsic differences from $2\times 2$ games. Hence, as examples here we have studied the dynamics of one of the most widely studied multi-player games, the linear public goods game [64], a simplified version of a threshold public goods game that requires a group of players to coordinate contributions to a public good [17, 50, 51, 69, 70, 52], as well as a multi-player version of the snowdrift game [66] where coexistence is possible. Our analytical finding allows a characterization of the evolutionary success under the stochastic aspiration-driven update rules introduced here, as well as a comparison to the well known results of pairwise comparison processes. In coordination games, such as the threshold public goods game, the self-learning dynamics support cooperation on a larger set in parameter space; the opposite is true for coexistence games, where the condition for cooperation to be more abundant becomes more strict.

It will be interesting to derive analytical results that either hold for any intensity of selection, or at least for the limiting case of strong selection [13, 71] in finite populations. On the other hand, the update rule presented here does not seem to allow a proper continuous limit in the transition to infinitely large populations [20], which might give rise to interesting rescaling requirements of the demographic noise in the continuous approximation [72] in self-learning dynamics.

Our simple model illustrates that aspiration-driven self-learning dynamics in well-mixed populations alone may be sufficient to alter the expected strategy abundance. In previous studies of such processes in structured populations [60, 61, 25, 62], this effect might have been overshadowed by the properties of the network dynamics studied in silico. Our analytical results hold for weak selection, which might be a useful framework in the study of human interactions [18], where it is still unclear to what role model individuals compare their payoffs and with what strength players update their strategies [18, 44, 30]. Although weak selection approximations are widely applied in the study of frequency-dependent selection [27, 29, 35, 45], it is not clear whether the successful spread of behavioral traits operates in this parameter regime. Thus, by numerical evaluation and simulations, we show that our weak selection predictions also hold for strong selection. Models such as the one presented here may be used in attempts to predict human strategic dynamics [73, 74]. Such predictions, likely to be falsified in their simplicity [75], are essential to our fundamental understanding of complex economic and social behavior and may guide statistical insights into the effective functioning of the human mind.


This work is supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61020106005 and No. 61375120. B.W. gratefully acknowledges generous sponsorship from the Max Planck Society. P.M.A. gratefully acknowledges support from the Deutsche Akademie der Naturforscher Leopoldina, Grant No. LPDS 2012-12.

Appendix A

In this appendix, we detail the derivation of the criterion for general multi-player games. We consider the first-order approximation of the stationary distribution and obtain the criterion (shown in Sec. 3) as follows:


Inserting Eq. (2.8), we have


Denoting , the above equation can be simplified as




We have


Since ,


Then, inserting Eqs. (A.10)–(A.13) into Eq. (A.4),


Similarly, we can get




Therefore, inserting Eqs. (A.14)–(A.17) into Eq. (A.3),


Combined with Eq. (A.1), the criterion is rewritten as


where and refer to Eqs. (2.2), and (2.3). Hence,


Therefore, the criterion is equivalent to


We can prove that the above inequality leads to a general criterion as follows


This is the result we want to show. For this, we only need to demonstrate


This is equivalent to


Since this equation must hold for any choice of ()s, we require


Using the identity , we can simplify the equivalent condition as


This can be proved by mathematical induction.

Thus, we obtain the criterion for general multi-player games, Eq. (A.22), which we rewrite as follows:


Appendix B

In the following tables, we demonstrate how selection intensity and the population size influence the evolutionary results (the average fraction of cooperators) through simulation.

Table 1: Simulation results for the linear public goods game. The parameters are: =10, =1, =2, and =1. Under this setting, the analytically derived criterion Eq. (3.3) reads: , which means the average fraction of cooperators .
Table 2: Simulation results for the threshold public goods game. The parameters are: =10, =1, =1, and =7. Under this setting, the criterion Eq. (3.3) reads: , which means the average fraction of cooperators .
Table 3: Simulation results for the multi-player snowdrift game. The parameters are: =10, =1, =1.5, and =1. Under this setting, the criterion Eq. (3.3) reads: , which means the average fraction of cooperators .

For the examples we discussed, namely the linear public goods game, the threshold collective risk dilemma, and a multi-player snowdrift game, we find that our weak selection result generalizes to a wide range of parameters (higher selection intensities, and both small and large populations).
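A simulation of this kind can be sketched as follows. This is not the code used to produce the tables; it is a minimal Monte Carlo sketch under stated assumptions: a Fermi-type switching probability, co-players drawn without replacement, the realised payoff from a single sampled group (rather than a mean payoff), and a linear public goods payoff table. The function name, parameter values, and random seed are hypothetical.

```python
import math
import random

def simulate_mean_abundance(N, d, a, b, alpha, omega, steps, seed=0):
    """Time-averaged fraction of A players under an assumed aspiration rule."""
    rng = random.Random(seed)
    i = N // 2                      # start with half the population playing A
    total = 0.0
    for _ in range(steps):
        # Pick a focal individual uniformly at random.
        focal_is_a = rng.random() < i / N
        a_others = i - 1 if focal_is_a else i
        # Draw d-1 co-players without replacement from the other N-1
        # individuals; labels below a_others are treated as A players.
        group = rng.sample(range(N - 1), d - 1)
        k = sum(1 for x in group if x < a_others)
        payoff = a[k] if focal_is_a else b[k]
        # Assumed rule: switch more readily when payoff falls short of alpha.
        p_switch = 1.0 / (1.0 + math.exp(-omega * (alpha - payoff)))
        if rng.random() < p_switch:
            i += -1 if focal_is_a else 1
        total += i / N
    return total / steps

# Hypothetical linear public goods game with d = 5, r = 3, c = 1.
d, r, c = 5, 3.0, 1.0
a = [r * c * (k + 1) / d - c for k in range(d)]   # cooperator payoffs
b = [r * c * k / d for k in range(d)]             # defector payoffs

for omega in (0.0, 0.1, 1.0):
    print(omega, simulate_mean_abundance(20, d, a, b, 1.0, omega, 50000))
```

Repeating such runs over several selection intensities and population sizes gives the kind of comparison summarized in the tables. Very large values of omega times the payoff gap could overflow the exponential in this simple form.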


  •  1. Sigmund, K. 2010 The calculus of selfishness, 1st edn. Princeton, NJ: Princeton University Press.
  •  2. von Neumann, J. & Morgenstern, O. 1944 Theory of games and economic behavior. Princeton, NJ: Princeton University Press.
  •  3. Nash, J. F. 1950 Equilibrium points in n-person games. Proc. Natl Acad. Sci. USA 36, 48-49. (doi:10.1073/pnas.36.1.48)
  •  4. Maynard Smith, J. & Price, G. R. 1973 The logic of animal conflict. Nature 246, 15-18. (doi:10.1038/246015a0)
  •  5. Weibull, J. W. 1995 Evolutionary game theory. Cambridge, MA: The MIT Press.
  •  6. Hofbauer, J. & Sigmund, K. 1998 Evolutionary games and population dynamics. Cambridge, UK: Cambridge University Press.
  •  7. Gintis, H. 2009 Game theory evolving: a problem-centered introduction to modeling strategic interaction, 2nd edn. Princeton, NJ: Princeton University Press.
  •  8. Nowak, M. A. 2006 Evolutionary dynamics: exploring the equations of life. Cambridge, MA: Harvard University Press.
  •  9. Nowak, M. A. & Sigmund, K. 2004 Evolutionary dynamics of biological games. Science 303, 793-799. (doi:10.1126/science.1093411)
  •  10. Imhof, L. A. & Nowak, M. A. 2006 Evolutionary game dynamics in a Wright-Fisher process. J. Math. Biol. 52, 667-681. (doi:10.1007/s00285-005-0369-8)
  •  11. Kimura, M. 1983 The neutral theory of molecular evolution. Cambridge, UK: Cambridge University Press.
  •  12. Taylor, C., Fudenberg, D., Sasaki, A. & Nowak, M. A. 2004 Evolutionary game dynamics in finite populations. Bull. Math. Biol. 66, 1621-1644. (doi:10.1016/j.bulm.2004.03.004)
  •  13. Altrock, P. M. & Traulsen, A. 2009 Deterministic evolutionary game dynamics in finite populations. Phys. Rev. E 80, 011909-10. (doi:10.1103/PhysRevE.80.011909)
  •  14. Hilbe, C. 2011 Local replicator dynamics: a simple link between deterministi