Statistical Physics of the Spatial Prisoner’s Dilemma with Memory-Aware Agents
Abstract
We introduce an analytical model to study the evolution towards equilibrium in spatial games with ‘memory-aware’ agents, i.e., agents that accumulate their payoff over time. In particular, we focus our attention on the spatial Prisoner’s Dilemma, as it constitutes an emblematic example of a game whose Nash equilibrium is defection. Previous investigations showed that, under opportune conditions, it is possible to reach an equilibrium of cooperation in the evolutionary Prisoner’s Dilemma. Notably, it seems that mechanisms like motion may lead a population to become cooperative. In the proposed model, we map agents to particles of a gas so that, on varying the system temperature, they randomly move. In doing so, we are able to identify a relation between the temperature and the final equilibrium of the population, explaining how it is possible to break the classical Nash equilibrium in the spatial Prisoner’s Dilemma when considering agents able to increase their payoff over time. Moreover, we introduce a formalism to study order-disorder phase transitions in these dynamics. As a result, we highlight that the proposed model allows one to explain analytically how a population, whose interactions are based on the Prisoner’s Dilemma, can reach an equilibrium far from the expected one, also opening the way to a direct link between evolutionary game theory and statistical physics.
pacs:
89.20.-a Complex Systems; 87.23.Cc Population dynamics and ecological pattern formation; 05.90.+m Other topics in statistical physics, thermodynamics, and nonlinear dynamical systems

1 Introduction
Evolutionary games (1); (2); (3) represent the attempt to study the evolution of populations (4); (5); (6) within the framework of game theory (7). Notably, these games allow us to analyze simplified scenarios in different domains, spanning from socio-economic dynamics to biological systems (8); (9); (1); (10); (11); (12); (13); (14); (15); (16); (17). In general, evolutionary games consider a population of agents whose interactions are based on games like the Prisoner’s Dilemma (hereinafter PD) or the Hawk-Dove game (4), where there are two possible strategies: cooperation and defection. As in classical game theory, the concept of equilibrium represents a core aspect (18). Therefore, we aim to evaluate whether a population reaches an equilibrium equal to or different from the expected one, i.e., the Nash equilibrium of the considered game. At each interaction, agents gain a payoff according to the adopted strategy and to a payoff matrix. The payoff represents a form of reward in the considered domain (e.g., money in an economic system or food in an ecosystem). Remarkably, as agents are allowed to change their strategy over time, we can map them to spins with states $\pm 1$, representing cooperation and defection, respectively. In doing so, we can analyze order-disorder transitions in the spatial PD. Previous studies (19); (20); (21); (22); (23); (24); (25) have shown that, under particular conditions, a population playing a game like the PD, i.e., a game characterized by defection as Nash equilibrium, can reach a final state of full cooperation. For instance, it seems that both motion (19); (20); (21); (22) and competitiveness (24) can lead an agent population to cooperate (26) and, more in general, spatial structure plays a key role in the evolution of cooperation (27); (28). Usually, adding properties to agents, such as motion, conformity, and competitiveness, increases the complexity of the resulting model.
Thus, most investigations on evolutionary games are based on computational approaches. In this work, instead, we aim to provide an analytical description of the spatial PD, in order to explain how a population can become cooperative and to strengthen the link between evolutionary game theory (29) and statistical physics (30). It is worth highlighting that we consider ‘memory-aware’ agents, i.e., agents that accumulate their payoff over time. Remarkably, this last condition represents the major difference with most of the evolutionary game models studied by computational approaches (see for instance (31); (32)). On the other hand, considering ‘memory-aware’ agents makes the problem more tractable from an analytical perspective. The remainder of the paper is organized as follows: Section 2 introduces the proposed model and its analytical formulation. Section 3 shows analytical results. Finally, Section 4 concludes the paper.
2 Model
In the proposed model, we are interested in studying the spatial prisoner’s dilemma by an analytical approach. Let us start by introducing the general form of a payoff matrix
(1) $\begin{array}{c|cc} & C & D \\ \hline C & R & S \\ D & T & P \end{array}$

where the set of strategies is $\{C, D\}$: C stands for ‘Cooperator’ and D for ‘Defector’. In matrix (1), $R$ is the gain obtained by two interacting cooperators; $T$ represents the Temptation, i.e., the payoff that an agent gains if it defects while its opponent cooperates; $S$ is the Sucker’s payoff, i.e., the gain achieved by a cooperator while the opponent defects; finally, $P$ is the payoff of two interacting defectors. In the case of the PD, the elements of matrix (1) satisfy the ordering $T > R > P \ge S$. As stated before, during the evolution of the system agents can change their strategy from cooperation to defection, and vice versa, following an updating rule, as for instance the one named ‘imitation of the best’ (see (19); (4)), where agents imitate the strategy of their richest neighbor.
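As a concrete sketch, the payoff matrix and the ‘imitation of the best’ rule can be written in a few lines of Python. The numeric entries below ($R = 1$, $T = 1.5$, $S = P = 0$) are illustrative assumptions satisfying the PD ordering, not values prescribed by the text:

```python
# Hypothetical PD payoff matrix: entries chosen to satisfy T > R > P >= S.
PAYOFF = {
    ("C", "C"): 1.0,  # R: reward for mutual cooperation
    ("C", "D"): 0.0,  # S: sucker's payoff
    ("D", "C"): 1.5,  # T: temptation to defect
    ("D", "D"): 0.0,  # P: punishment for mutual defection
}

def imitation_of_the_best(strategy, payoff, neighbor_strategies, neighbor_payoffs):
    """'Imitation of the best': copy the strategy of the richest neighbor,
    provided that neighbor is richer than the focal agent."""
    best = max(range(len(neighbor_payoffs)), key=lambda i: neighbor_payoffs[i])
    if neighbor_payoffs[best] > payoff:
        return neighbor_strategies[best]
    return strategy
```

For instance, a defector holding payoff 1.0 next to a cooperator holding 3.0 switches to cooperation under this rule.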
2.1 Mean field approach
Now, we consider a mixed population of agents with, at the beginning, an equal density of cooperators and defectors. Under the hypothesis that all agents interact together, at each time step the payoffs gained by cooperators and defectors are computed as follows
(2) $\begin{cases} \pi^c = \rho^c R + \rho^d S \\ \pi^d = \rho^c T + \rho^d P \end{cases}$

with $\rho^c$ the density of cooperators and $\rho^d$ the density of defectors ($\rho^c + \rho^d = 1$). We recall that defection is the dominant strategy in the PD and, even if we set $S = 0$ and $P = 0$ (the so-called weak Prisoner’s Dilemma), it corresponds to the final equilibrium because $\pi^d$ is always greater than $\pi^c$ (as $T > R$).
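A minimal numerical sketch of these mean-field payoffs, assuming the weak-PD parameterization $R = 1$, $T = b > 1$, $S = P = 0$ (an assumption consistent with the discussion above, not a value the text fixes):

```python
def mean_field_payoffs(rho_c, b):
    """Per-step mean-field payoffs of Eq. (2) for cooperators and defectors,
    with R = 1, T = b, and S = P = 0 (weak Prisoner's Dilemma)."""
    pi_c = 1.0 * rho_c  # cooperators earn R from the cooperator fraction
    pi_d = b * rho_c    # defectors earn T from the cooperator fraction
    return pi_c, pi_d
```

For any `rho_c > 0` and `b > 1` we get `pi_d > pi_c`, which is why the well-mixed population relaxes to defection.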
At this point, it is important to highlight that previous investigations (19); (20); (21) considered ‘memoryless’ agents (i.e., agents that do not accumulate their payoff over time), whose interactions were defined only with their neighbors, and focused on one agent (and on its neighbors) at a time.
These conditions are fundamental. For instance, if at each time step we randomly select one agent interacting only with its neighbors, there is a nonzero probability of consecutively selecting a number of nearby cooperators; in this case, very rich cooperators may emerge and then prevail over defectors, even without introducing mechanisms like motion.
It is also worth observing that, as $P = 0$, a homogeneous population of defectors does not increase its overall payoff. Instead, according to matrix (1), a cooperative population continuously increases its payoff over time.
Now, we consider a population divided into two groups by a wall: a group composed only of cooperators, $G_c$, and a mixed group $G_m$, i.e., composed of cooperators and defectors in equal amount.
Agents interact only with members of the same group; hence the group $G_c$ never changes its composition and, in addition, strongly increases its payoff over time. The opposite occurs in the group $G_m$, as it converges to an ordered phase of defection, limiting its final payoff.
Remarkably, in this scenario, we can introduce a strategy to modify the equilibria of the two groups. In particular, we can both change to cooperation the equilibrium of $G_m$, and to defection that of $G_c$. In the first case, we have to wait a while before moving one or a few cooperators from $G_c$ to $G_m$, so that defectors increase their payoff but, during the revision phase, change strategy to cooperation since the newcomers are richer than them. In the second case, if after a few time steps we move a small group of defectors from $G_m$ to $G_c$, the latter converges to a final defection phase.
These preliminary theoretical observations reveal an important property of the ‘memory-aware’ PD: considering the two groups, cooperators may succeed when they act individually and after a long time, whereas defectors may succeed when they act fast and in groups. Notably, rich cooperators have to move individually since, otherwise, many rich cooperators risk increasing too much the payoff of defectors which, in this case, would not change strategy. The opposite holds for defectors: acting in a group, they may strongly reduce the payoff of a community of cooperators.
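The arithmetic behind this observation can be sketched as follows; it is a toy illustration under the assumptions $R = 1$ and $S = P = 0$, with hypothetical group sizes and waiting times:

```python
def accumulated_cooperator_payoff(n_c, t, R=1.0):
    """Payoff accumulated by each member of a group of n_c cooperators
    after t time steps: every step it earns R from each of the others."""
    return R * (n_c - 1) * t

def defectors_switch(newcomer_payoff, richest_defector_payoff):
    """Under 'imitation of the best', defectors adopt the strategy of a
    newcomer that is richer than all of them."""
    return newcomer_payoff > richest_defector_payoff
```

For example, a cooperator that waits $t = 100$ steps in a group of 10 arrives with payoff 900; since $P = 0$ caps the defectors' income once their local cooperators die out, a long enough wait makes the newcomer the richest agent in the mixed group.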
2.2 Mapping agents to gas particles
We hypothesize that the spatial PD with moving agents can be successfully studied within the framework of kinetic theory (30). Therefore, in the proposed model, we map agents to particles of a gas. In doing so, the average speed of particles is computed as $\langle v \rangle = \sqrt{3 k_B T / m}$, with $T$ the system temperature, $k_B$ the Boltzmann constant, and $m$ the particle mass. Particles are divided into two groups by a permeable wall, so that particles can cross it, but interactions among particles belonging to different groups are prevented. Now, it is worth emphasizing that we can provide a dual description of our system: one in the ‘physical’ domain of particles, the other in the ‘information’ domain of agents. Notably, to analyze the system in the ‘information’ domain we will introduce, as discussed above, the mapping of agents to a spin system (see (33)). Summarizing, we map agents to gas particles in order to represent their ‘physical’ property of motion, and we map agents to spins to represent their ‘information’ property (i.e., their strategy). Remarkably, these two mappings can be viewed as two different layers for studying how the agent population evolves over time. Although the physical property (i.e., the motion) affects the agent strategy (i.e., its spin), the equilibrium can be reached in both layers/domains independently. This last observation is important since we are interested in evaluating only the final equilibrium reached in the ‘information’ domain. Then, as stated before, agents interact only with those belonging to the same group, so the evolution of the mixed group $G_m$ can be described by the following equations
(3) $\begin{cases} \frac{d\rho^c}{dt} = \rho^c(t)\,\rho^d(t)\left[p^c(t) - p^d(t)\right] \\ \frac{d\rho^d}{dt} = \rho^c(t)\,\rho^d(t)\left[p^d(t) - p^c(t)\right] \end{cases}$

with $p^c(t)$ the probability that cooperators prevail on defectors (at time $t$), and $p^d(t)$ the probability that defectors prevail on cooperators (at time $t$). These probabilities are computed according to the payoffs obtained, at each time step, by cooperators and defectors:

(4) $\begin{cases} p^c(t) = \dfrac{\pi^c(t)}{\pi^c(t) + \pi^d(t)} \\ p^d(t) = \dfrac{\pi^d(t)}{\pi^c(t) + \pi^d(t)} \end{cases}$
The system (3) can be analytically solved, provided that at each time step the values of $p^c$ and $p^d$ are updated. The density of cooperators then reads

(5) $\rho^c(t) = \left[1 + \dfrac{1 - \rho^c(0)}{\rho^c(0)}\, e^{-\int_0^t \left[p^c(\tau) - p^d(\tau)\right] d\tau}\right]^{-1}$

with $\rho^c(0)$ the initial density of cooperators in $G_m$, $\rho^d(0) = 1 - \rho^c(0)$, and $N$ the number of agents in $G_m$.
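The dynamics of Eqs. (3)-(4) can be checked numerically. The sketch below uses a simple Euler integration with the weak-PD payoffs ($R = 1$, $T = b$, $S = P = 0$, an assumed parameterization) and confirms that, without migration, the mixed group relaxes to defection:

```python
def simulate_mixed_group(rho_c0=0.5, b=1.5, dt=0.01, steps=20000):
    """Euler integration of the mixed-group dynamics: p^c and p^d are
    recomputed from the instantaneous payoffs at every step."""
    rho_c = rho_c0
    for _ in range(steps):
        pi_c = 1.0 * rho_c      # cooperator payoff (R = 1, S = 0)
        pi_d = b * rho_c        # defector payoff (T = b, P = 0)
        total = pi_c + pi_d
        if total == 0.0:
            break               # no cooperators left: the dynamics freeze
        p_c, p_d = pi_c / total, pi_d / total
        rho_c += rho_c * (1.0 - rho_c) * (p_c - p_d) * dt
    return rho_c
```

For $b > 1$ the difference $p^c - p^d$ is negative (and constant in this parameterization), so $\rho^c$ decays towards 0, the Nash equilibrium; for $b = 1$ the population stays at its initial composition.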
Recall that setting $T = 0$ (not allowed in a thermodynamic system) corresponds to a motionless case, leading to the Nash equilibrium in $G_m$. Instead, for $T > 0$ we can find more interesting scenarios. Now we suppose that, at time $t = 0$, particles of $G_c$ are much closer to the wall than those of $G_m$ (later we will relax this constraint); for instance, let us consider a particle of $G_c$ that, during its random motion, follows a trajectory of length $L$ (in the $d$-dimensional physical space) towards the wall. Assuming this particle moves at the average speed $\langle v \rangle$, we can compute the instant of crossing $t_c = L / \langle v \rangle$, i.e., the instant when it moves from $G_c$ to $G_m$. Thus, on varying the temperature $T$, we can vary $t_c$.
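In code, the dependence of the crossing time on the temperature reads as follows (the particle mass and trajectory length in the test values are placeholders, not parameters from the text):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def crossing_time(L, T, m, k=K_B):
    """Crossing time t_c = L / <v>, with <v> = sqrt(3 k T / m).
    T = 0 is the motionless case: the wall is never crossed."""
    if T <= 0:
        return math.inf
    return L / math.sqrt(3.0 * k * T / m)
```

Raising $T$ increases $\langle v \rangle$ and therefore shortens $t_c$, which is the knob the model turns to control when the rich cooperator enters $G_m$.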
Let us consider the payoff of cooperators in the two groups. Each cooperator in $G_c$ gains, after $t$ time steps,

(6) $\pi^c_{G_c}(t) = R\,(N_c - 1)\,t$

with $N_c$ the number of cooperators in $G_c$.
On the other hand, the situation of cooperators in $G_m$ is quite different since, according to the Nash equilibrium, their amount decreases over time. Therefore, we can consider how the payoff of the last surviving cooperator in $G_m$ changes:

(7) $\pi^c_{G_m}(t) = S\,(N_m - 1)\,t = 0$

with $N_m$ the number of agents in $G_m$; moreover, the defectors’ payoff stops growing as soon as the last cooperator disappears (since $P = 0$). At $t_c$, a new cooperator reaches $G_m$, with a payoff computed by equation (6).
3 Results
The analytical solution (5) allows us to analyze the evolution of the system and to evaluate how initial conditions affect the outcomes of the model.
Let us observe that, if $t_c$ is big enough, the new cooperator may modify the equilibrium of $G_m$, turning defectors into cooperators. Notably, the payoff considered to compute $p^c$, after $t_c$, corresponds to that of equation (6), as the newcomer is the richest cooperator in $G_m$.
Furthermore, we note that $t_c$ depends on $L$; hence we study the evolution of the system on varying the ratio between the numbers of particles in the two groups. Finally, for numerical convenience, we fix the values of the remaining parameters.
Figure 1 shows the evolution of the density of cooperators $\rho^c$ over time, on varying the temperature $T$, and, depicted in the inner insets, the variation of the system magnetization over time (always inside $G_m$), computed as (34)

(8) $M = \frac{1}{N}\sum_{i=1}^{N} s_i$

with $s_i \in \{-1, +1\}$ the strategy of the $i$-th agent.
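The magnetization of Eq. (8) is straightforward to compute from the spin representation of strategies:

```python
def magnetization(spins):
    """M = (1/N) * sum_i s_i, with s_i = +1 for cooperation and
    s_i = -1 for defection."""
    return sum(spins) / len(spins)
```

Here $M = 1$ corresponds to full cooperation, $M = -1$ to the Nash equilibrium of full defection, and $M \approx 0$ to a disordered phase.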
As discussed before, in the physical domain of particles, heating the system increases the average particle speed. Thus, under the assumption that two agents play together only if they stay close (i.e., in the same group) for a long enough time, we hypothesize that there exists a maximum speed beyond which interactions (in terms of the game) do not occur. This hypothesis entails a critical temperature $T_c$, above which no interactions in the ‘information’ domain are possible. As shown in plot f of figure 1, for temperatures in an intermediate range the system converges to a cooperation phase (i.e., $M = 1$), for lower temperatures the system follows the Nash equilibrium (i.e., $M = -1$), and for $T > T_c$ a disordered phase emerges at equilibrium. Remarkably, the results of our model suggest that it is always possible to compute a range of temperatures leading to an equilibrium of full cooperation (see figure 2).
Moreover, we study the variation of $t_c$ on varying $T$ (see figure 3), showing that even for low temperatures it is possible to obtain a crossing time $t_c$ that allows the system to converge towards cooperation.
Eventually, we investigate the relation between the maximum temperature that allows a population to become cooperative and its size (i.e., the number of agents $N$). Remarkably, as shown in figure 4, this maximum temperature scales with $N$ following a power-law function characterized by a scaling parameter (i.e., an exponent). The value of the exponent has been computed by considering the values shown in figure 2. Finally, it is worth highlighting that all analytical results reveal a link between the system temperature and its final equilibrium. Recalling that we are not considering the equilibrium of the gas (i.e., it does not thermalize in the proposed model), we emphasize that the equilibrium is evaluated only in the information domain.
3.1 Phase Transitions in the spatial PD
As discussed before, in the information domain we can study the system by mapping agents to spins, whose value represents their strategy. In addition, we can map the difference between the winning probabilities of cooperators and defectors to an external magnetic field $h = p^c - p^d$. In doing so, by the Landau theory (30), we can analytically identify an order-disorder phase transition. Notably, we analyze the free energy $F$ of the spin system on varying the control parameter $\eta$ (35) (corresponding to the magnetization $M$):

(9) $F(\eta) = -h\eta \pm a\eta^2 + b\eta^4$

with $a, b > 0$, where the sign of the second term depends on the temperature, i.e., positive for $T > T_c$ and negative for $T < T_c$; recall that $T_c$ represents the temperature beyond which it is not possible to play the PD, due to the high particle speed (according to the condition discussed above). For the sake of clarity, we emphasize that the free energy is introduced in order to evaluate the nature of the final equilibrium achieved by the system. In particular, looking for the minima of $F(\eta)$ allows us to investigate whether our population reaches the Nash equilibrium or different configurations (e.g., full cooperation). Figure 5 shows a pictorial representation of the phase transitions that occur in our system, on varying $T$ and the external field $h$.
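A numerical sketch of this picture, assuming the standard quartic Landau form $F(\eta) = -h\eta + a\eta^2 + b\eta^4$ with the sign of $a$ encoding the temperature regime as described above; a simple grid search locates the global minimum of $F$:

```python
def landau_free_energy(eta, h, a, b=1.0):
    """Quartic Landau free energy with external field h; a > 0 mimics
    T > T_c (disordered phase) and a < 0 mimics T < T_c (ordered phase)."""
    return -h * eta + a * eta ** 2 + b * eta ** 4

def minimizing_eta(h, a, b=1.0, grid=2001):
    """Locate the order parameter minimizing F on a uniform grid in [-1.5, 1.5]."""
    etas = [-1.5 + 3.0 * i / (grid - 1) for i in range(grid)]
    return min(etas, key=lambda e: landau_free_energy(e, h, a, b))
```

With $a > 0$ and $h = 0$ the minimum sits at $\eta = 0$ (disorder); with $a < 0$ it splits into two symmetric minima at $|\eta| = \sqrt{-a/2b}$, and a small field $h$ selects one of them (cooperation for $h > 0$).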
Finally, the constraints related to the average speed of particles, and to the distance between each group and the permeable wall, can in principle be relaxed, since we can imagine extending this description to a wider system with several groups (as done in previous investigations, e.g., (20)), where agents are uniformly spread in the whole space. It is worth highlighting that our results are fully in agreement with those achieved by authors who studied the role of motion in the PD (e.g., (19); (20)), explaining why clusters of cooperators emerge in their simulations (20). We also recall that the proposed model uses memory-aware agents, while in previous computational investigations agents reset their payoff at each step, i.e., before starting new interactions.
4 Conclusions
To conclude, in this work we provide an analytical description of the spatial Prisoner’s Dilemma within the framework of statistical physics, studying the particular case of agents provided with memory of their payoff (termed memory-aware agents). This condition entails that their payoff is not reset at each time step, so that they can increase it over time. In particular, we propose a model based on the kinetic theory of gases, showing how motion may lead a population towards an equilibrium far from the expected one (i.e., the Nash equilibrium). Remarkably, the final equilibrium depends on the system temperature, so that we have been able to identify a range of temperatures that triggers cooperation for all values of the payoff matrix (related to the PD). In addition, we found an interesting relation between the maximum temperature that fosters cooperation and the size of the system. Notably, a scaling parameter in that relation has been computed by investigating different orders of magnitude of the system size. Furthermore, the dynamics of the resulting model have also been described in terms of order-disorder phase transitions. Finally, we deem that our results open the way to a direct link between evolutionary game theory and statistical physics.
Acknowledgments
MAJ is extremely grateful to Adriano Barra for all priceless suggestions. Moreover, he wants to thank Mirko Degli Esposti, Marco Lenci, and Giampaolo Cristadoro for the useful comments. This work has been supported by Fondazione Banco di Sardegna.
References
- Perc, M., Grigolini, P.: Collective behavior and evolutionary games – An introduction. Chaos, Solitons & Fractals 56 1–5 (2013)
- Nowak, M.A.: Evolutionary Dynamics: Exploring the Equations of Life. Harvard University Press (2006)
- Tomassini, M.: Introduction to evolutionary game theory. Proc. Conf. on Genetic and evolutionary computation companion (2014)
- Julia, PC, Gomez-Gardenes, J., Traulsen, A., and Moreno, Y.: Evolutionary game dynamics in a growing structured population. New Journal of Physics 11 083031 (2009)
- Floria, L.M., Gracia-Lazaro, C., Gomez-Gardenes, J., and Moreno, Y.: Social network reciprocity as a phase transition in evolutionary cooperation. Phys. Rev. E 79 026106 (2009)
- Hofbauer, J., Sigmund, K.: The theory of evolution and dynamical systems. Cambridge University Press (1988)
- Colman, A.M.: Game Theory and Its Applications. Digital Printing (2008)
- Perc, M., Szolnoki, A.: Social diversity and promotion of cooperation in the spatial prisoner’s dilemma. Phys. Rev. E 77 011904 (2008)
- Szolnoki, A., Perc, M.: Conformity enhances network reciprocity in evolutionary social dilemmas. J. R. Soc. Interface 12 20141299 (2015)
- Wang, Z., Szolnoki, A., and Perc, M.: Interdependent network reciprocity in evolutionary games. Scientific Reports 3 1183 (2013)
- Szolnoki, A., Xie, N.-G., Wang, C. and Perc, M.: Imitating emotions instead of strategies in spatial games elevates social welfare. Europhysics Letters 96 38002 (2011)
- Perc, M. and Szolnoki, A.: Self-organization of punishment in structured populations. New Journal of Physics 14 043013 (2012)
- Friedman, D.: On economic applications of evolutionary game theory. Journal of Evolutionary Economics 8-1 15–43 (1998)
- Schuster, S., de Figueiredo, L., Schroeter, A., and Kaleta, C.: Combining metabolic pathway analysis with evolutionary game theory. Explaining the occurrence of low-yield pathways by an analytic optimization approach. BioSystems 105 147–153 (2011)
- Frey, E.: Evolutionary game theory: theoretical concepts and applications to microbial communities. Physica A 389 4265–4298 (2010)
- Lieberman, E., Hauert, C., Nowak, M.A.: Imitation dynamics of vaccination behavior on social networks. The Royal Society - Proc. B 278 (2011)
- Lieberman, E., Hauert, C., Nowak, M.A.: Evolutionary dynamics on graphs. Nature 433 (2004)
- Galam, S. and Walliser, B.: Ising model versus normal form game. Physica A 389 481-489 (2010)
- Meloni, S., Buscarino, A., Fortuna, L., Frasca, M., Gomez-Gardenes, J., Latora, V., Moreno, Y.: Effects of mobility in a population of prisoner’s dilemma players. Phys. Rev. E 79-6 067101 (2009)
- Antonioni, A., Tomassini, M., Buesser, P.: Random Diffusion and Cooperation in Continuous Two-Dimensional Space. Journal of Theoretical Biology 344 (2014)
- Tomassini, M., Antonioni, A.: Levy flights and cooperation among mobile individuals. Journal of theoretical biology 364 154–161 (2015)
- Antonioni, A., Tomassini, M., Sanchez, A.: Short-Range Mobility and the Evolution of Cooperation: An Experimental Study. Scientific Reports 5 (2015)
- Perc, M., Gomez-Gardenes, J., Szolnoki, A., Floria, L.M., and Moreno, Y.: Evolutionary dynamics of group interactions on structured populations: a review. J. R. Soc. Interface 10-80 20120997 (2013)
- Javarone, M.A., Atzeni, A.E.: The role of competitiveness in the Prisoner’s Dilemma. Computational Social Networks 2 (2015)
- Javarone, M.A., Atzeni, A.E. and Galam, S.: Emergence of Cooperation in the Prisoner’s Dilemma Driven by Conformity. LNCS - Springer 9028 155–163 (2015)
- Nowak, M.A.: Five rules for the evolution of cooperation. Science 314-5805 1560–1563 (2006)
- Szabo, G. and Fath, G.: Evolutionary games on graphs. Physics Reports 446 4-6 97–216 (2007)
- Nowak, M.A. and May, R.M.: Evolutionary games and spatial chaos. Nature 359 826–829 (1992)
- Hauert, C. and Szabo, G.: Game theory and Physics. Am. J. Phys. 73 405 (2005)
- Huang, K.: Statistical Mechanics. Wiley 2nd Ed. (1987)
- Szolnoki, A., Szabo, G., Perc, M.: Phase diagrams for the spatial public goods game with pool punishment. Phys. Rev. E 83 036101 (2011)
- Szolnoki, A. and Perc, M.: Reward and cooperation in the spatial public goods game. EPL 92 38003 (2010)
- Javarone, M.A.: Is Poker a Skill Game? New insights from Statistical Physics. Europhysics Letters 110 (2015)
- Mobilia, M. and Redner, S.: Majority versus minority dynamics: Phase transition in an interacting two-state spin system. Phys. Rev. E 68-4 046106 (2003)
- Barra, A.: The Mean Field Ising Model through Interpolating Techniques. Journal of Statistical Physics 132-5 787–809 (2008)