The Optional Prisoner’s Dilemma in a Spatial Environment: Coevolving Game Strategy and Link Weights
Abstract
In this paper, the Optional Prisoner’s Dilemma game in a spatial environment, with coevolutionary rules for both the strategy and network links between agents, is studied. Using a Monte Carlo simulation approach, a number of experiments are performed to identify favourable configurations of the environment for the emergence of cooperation in adverse scenarios. Results show that abstainers play a key role in the protection of cooperators against exploitation from defectors. Scenarios of cyclic competition and of full dominance of cooperation are also observed. This work provides insights towards gaining an in-depth understanding of the emergence of cooperative behaviour in real-world systems.
Keywords: Coevolution; Optional Prisoner’s Dilemma Game; Spatial Environment; Evolutionary Game Theory.
1 Introduction
Evolutionary game theory in spatial environments has attracted much interest from researchers who seek to understand cooperative behaviour among rational individuals in complex environments. Many models have considered scenarios where participants’ interactions are constrained by particular graph topologies, such as lattices (Szabó and Hauert, 2002; Nowak and May, 1992), small-world graphs (Chen and Wang, 2008; Fu et al., 2007), scale-free graphs (Szolnoki and Perc, 2016; Xia et al., 2015) and bipartite graphs (Gómez-Gardeñes et al., 2011). It has been shown that the spatial organisation of strategies on these topologies affects the evolution of cooperation (Cardinot et al., 2016).
The Prisoner’s Dilemma (PD) game remains one of the most studied games in evolutionary game theory as it provides a simple and powerful framework to illustrate the conflicts in the formation of cooperation. In addition, some extensions of the PD game, such as the Optional Prisoner’s Dilemma game, have been studied in an effort to investigate how levels of cooperation can be increased. In the Optional PD game, participants are afforded a third option, that of abstaining and not playing, and thus obtaining the loner’s payoff ($L$). Incorporating this concept of abstinence leads to a three-strategy game where participants can choose to cooperate, defect or abstain from a game interaction.
The vast majority of the spatial models in previous work have used static and unweighted networks. However, in many social scenarios that we wish to model, such as social networks and real biological networks, the number of individuals, their connections and environment are often dynamic. Thus, recent studies have also investigated the effects of evolutionary games played on dynamically weighted networks (Huang et al., 2015; Wang et al., 2014; Cao et al., 2011; Szolnoki and Perc, 2009; Zimmermann et al., 2004) where it has been shown that the coevolution of both networks and game strategies can play a key role in resolving social dilemmas in a more realistic scenario.
In this paper we adopt a coevolutionary spatial model in which both the game strategies and the link weights between agents evolve over time. The interaction between agents is described by an Optional Prisoner’s Dilemma game. Previous research on spatial games has shown that when the temptation to defect is high, defection is the dominant strategy in most cases. We believe that the combination of both optional games and coevolutionary rules can help in the emergence of cooperation in a wider range of scenarios.
The aims of the work are, given an Optional Prisoner’s Dilemma game in a spatial environment, where links between agents can be evolved, to understand the effect of varying the:

value of the link weight amplitude (the ratio $\Delta/\delta$).

value of the loner’s payoff ($L$).

value of the temptation to defect ($T$).
By investigating the effect of these parameters, we aim to explore the impact of the link update rules and to investigate the evolution of cooperation when abstainers are present in the population.
Although some work has considered coevolving link weights when considering the Prisoner’s Dilemma in a spatial environment, to our knowledge an Optional Prisoner’s Dilemma game on a spatial environment, where both the strategies and link weights are evolved, has not been studied to date.
The results show that cooperation emerges even in extremely adverse scenarios where the temptation of defection is almost at its maximum. It can be observed that the presence of abstainers is fundamental in protecting cooperators from invasion. In general, it is shown that, when the coevolutionary rules are used, cooperators do much better, and are also able to dominate the whole population in many cases. Moreover, for some settings, we also observe interesting phenomena of cyclic competition between the three strategies, in which abstainers invade defectors, defectors invade cooperators and cooperators invade abstainers.
The paper outline is as follows: Section 2 presents a brief overview of the previous work in both spatial evolutionary game theory with dynamic networks and in the Optional Prisoner’s Dilemma game. Section 3 gives an overview of the methodology employed, outlining the Optional Prisoner’s Dilemma payoff matrix, the coevolutionary model used (Monte Carlo simulation), the strategy and link weight update rules, and the parameter values that are varied in order to explore the effect of coevolving both strategies and link weights. Section 4 features the results. Finally, Section 5 summarizes the main conclusions and outlines future work.
2 Related Work
The use of coevolutionary rules constitutes a new trend in evolutionary game theory. These rules were first introduced by Zimmermann et al. (2001), who proposed a model in which agents can adapt their neighbourhood during a dynamical evolution of game strategy and graph topology. Their model uses computer simulations to implement two rules: firstly, agents playing the Prisoner’s Dilemma game update their strategy (cooperate or defect) by imitating the strategy of an agent in their neighbourhood with a higher payoff; and secondly, the network is updated by allowing defectors to break their connection with other defectors and replace it with a connection to a new neighbour selected randomly from the whole network. Results show that such an adaptation of the network is responsible for an increase in cooperation.
In fact, as stated by Perc and Szolnoki (2010), the spatial coevolutionary game is a natural upgrade of the traditional spatial evolutionary game initially proposed by Nowak and May (1992), who considered static and unweighted networks in which each individual can interact only with its immediate neighbours. In general, it has been shown that coevolving the spatial structure can promote the emergence of cooperation in many scenarios (Wang et al., 2014; Cao et al., 2011), but the understanding of cooperative behaviour is still one of the central issues in evolutionary game theory.
Szolnoki and Perc (2009) studied the impact of coevolutionary rules on the spatial version of three different games, i.e., the Prisoner’s Dilemma, the Snow Drift and the Stag Hunt game. They introduce the concept of a teaching activity, which quantifies the ability of each agent to enforce its strategy on the opponent. That is, agents with a higher teaching activity are more likely to reproduce than those with a low teaching activity. Differing from previous research (Zimmermann et al., 2004, 2001), they also consider coevolution affecting either only the defectors or only the cooperators. They report that, in both cases and irrespective of the applied game, their coevolutionary model is much more beneficial to the cooperators than the traditional model.
Huang et al. (2015) present a new model for the coevolution of game strategy and link weight. They consider a population of agents arranged on a regular lattice network which is evolved through a Monte Carlo simulation. An agent’s interaction is described by the classical Prisoner’s Dilemma with a normalized payoff matrix. A new parameter, the link weight amplitude, is defined as the ratio $\Delta/\delta$. They found that some values of $\Delta/\delta$ can provide the best environment for the evolution of cooperation. They also found that their coevolutionary model can promote cooperation efficiently even when the temptation of defection is high.
In addition to investigations of the classical Prisoner’s Dilemma on spatial environments, some extensions of this game have also been explored as a means to favour the emergence of cooperative behaviour. For instance, the Optional Prisoner’s Dilemma game, which introduces the concept of abstinence, has been studied since Batali and Kitcher (1995). In their work, they proposed the opt-out or “loner’s” strategy in which agents could choose to abstain from playing the game, as a third option, in order to avoid cooperating with known defectors. There have been a number of recent studies exploring this type of game (Xia et al., 2015; Ghang and Nowak, 2015; Olejarz et al., 2015; Jeong et al., 2014; Hauert et al., 2008). Cardinot et al. (2016) report that, with the introduction of abstainers, it is possible to observe new phenomena and a larger range of scenarios where cooperators can be robust to invasion by defectors and can dominate.
However, the inclusion of optional games with coevolutionary rules has not been studied yet. Therefore, our work aims to combine both of these trends in evolutionary game theory in order to identify favourable configurations for the emergence of cooperation in adverse scenarios, where, for example, the temptation to defect is very high.
3 Methodology
The goal of the experiments outlined in this section is to investigate the environmental settings when coevolution of both strategy and link weights of the Optional Prisoner’s Dilemma on a weighted network takes place.
Firstly, the Optional Prisoner’s Dilemma (PD) game will be described; secondly, the spatial environment is described; thirdly, the coevolutionary rules for both the strategy and link weights are described and finally, the experimental setup is outlined.
In the classical version of the Prisoner’s Dilemma, two agents can choose either cooperation or defection. Hence, there are four payoffs associated with each pairwise interaction between the two agents. In consonance with common practice (Huang et al., 2015; Nowak and May, 1992), payoffs are characterized by the reward for mutual cooperation ($R$), punishment for mutual defection ($P$), sucker’s payoff ($S$) and temptation to defect ($T$, where ). Note that this parametrization refers to the weak version of the Prisoner’s Dilemma game, where $P$ can be equal to $S$ without destroying the nature of the dilemma. In this way, $T > R > P \geq S$ maintains the dilemma.
The extended version of the PD game presented in this paper includes the concept of abstinence, in which agents can not only cooperate ($C$) or defect ($D$) but can also choose to abstain ($A$) from a game interaction, obtaining the loner’s payoff ($L$), which is awarded to both players if one or both abstain. As defined in other studies (Cardinot et al., 2016; Szabó and Hauert, 2002), abstainers receive a payoff greater than $P$ and less than $R$ (i.e., $P < L < R$). Thus, considering the normalized payoff matrix adopted, . The payoff matrix and the associated values are illustrated in Tables 1 and 2.
Table 1: Payoff matrix of the Optional Prisoner’s Dilemma game. Each cell gives the row player’s payoff followed by the column player’s payoff.
   C     D     A

C  R, R  S, T  L, L
D  T, S  P, P  L, L
A  L, L  L, L  L, L

Table 2: Payoff values.
Payoff  Value

Temptation to defect (T)  
Reward for mutual cooperation (R)  
Punishment for mutual defection (P)  
Sucker’s payoff (S)  
Loner’s payoff (L) 
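As a minimal sketch of how the payoff matrix in Table 1 can be read, the function below returns the payoff to one player given both strategies. The function name `payoff` and the numeric values in `params` are illustrative assumptions only; they are not the values used in the experiments.

```python
# Strategy labels follow Table 1: cooperate (C), defect (D), abstain (A).
C, D, A = "C", "D", "A"


def payoff(own, other, T, R, P, S, L):
    """Payoff to the player using `own` against a player using `other`."""
    if A in (own, other):
        # The loner's payoff is awarded to both players if one or both abstain.
        return L
    if own == C:
        return R if other == C else S
    # own == D
    return T if other == C else P


# Illustrative values only (assumed): they satisfy T > R > P >= S and P < L < R.
params = dict(T=1.5, R=1.0, P=0.0, S=0.0, L=0.5)

for row in (C, D, A):
    print(row, [(payoff(row, col, **params), payoff(col, row, **params))
                for col in (C, D, A)])
```

Each printed pair is (row player's payoff, column player's payoff), mirroring the layout of Table 1.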
In these experiments, the following parameters are used: a () regular lattice grid with periodic boundary conditions is created and fully populated with agents, which can play with their eight immediate neighbours (Moore neighbourhood). We adopt an unbiased environment in which initially each agent is designated as a cooperator ($C$), defector ($D$) or abstainer ($A$) with equal probability. Also, each edge linking agents has the same initial weight ($w = 1$), which will adaptively change in accordance with the interaction.
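The environment described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper names `moore_neighbours` and `init_population`, the dictionary-based storage, and the default `initial_weight=1.0` are all hypothetical choices, and the lattice side `n` is left as a parameter.

```python
import random


def moore_neighbours(i, j, n):
    """Eight neighbours of site (i, j) on an n x n lattice with periodic
    boundary conditions (n >= 3 so that all eight neighbours are distinct)."""
    return [((i + di) % n, (j + dj) % n)
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]


def init_population(n, rng, initial_weight=1.0):
    """Fully populate the lattice: each agent is C, D or A with equal
    probability, and every link starts with the same weight (assumed 1.0)."""
    strategies = {(i, j): rng.choice("CDA")
                  for i in range(n) for j in range(n)}
    weights = {}
    for site in strategies:
        for nb in moore_neighbours(*site, n):
            # One undirected link per pair of neighbouring agents.
            weights[frozenset((site, nb))] = initial_weight
    return strategies, weights


strategies, weights = init_population(10, random.Random(42))
print(len(strategies), "agents,", len(weights), "links")
```

Storing each link under a `frozenset` of its two endpoints keeps the weight shared by both agents, which matches the idea of a single edge weight between two agents.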
Monte Carlo methods are used to perform the Optional Prisoner’s Dilemma game. In one Monte Carlo (MC) step, each player is selected once on average. This means that one MC step comprises $N$ inner steps, where $N$ is the number of agents, in which the following calculations and updates occur:

Select an agent ($x$) at random from the population.

Calculate the utility $u_{xy}$ of each interaction of $x$ with its eight neighbours (each neighbour represented as agent $y$) as follows:
$$u_{xy} = w_{xy} P_{xy} \qquad (1)$$
where $w_{xy}$ is the edge weight between agents $x$ and $y$, and $P_{xy}$ corresponds to the payoff obtained by agent $x$ on playing the game with agent $y$.

Calculate the accumulated utility $U_x$ of $x$, that is:
$$U_x = \sum_{y \in \Omega_x} u_{xy} \qquad (2)$$
where $\Omega_x$ denotes the set of neighbours of the agent $x$.

In order to update the link weights $w_{xy}$ between agents, compare the values of $u_{xy}$ and the average accumulated utility (i.e., $\bar{U}_x = U_x / 8$) as follows:
$$w_{xy} = \begin{cases} w_{xy} + \Delta & \text{if } u_{xy} > \bar{U}_x \\ w_{xy} - \Delta & \text{if } u_{xy} < \bar{U}_x \\ w_{xy} & \text{otherwise} \end{cases} \qquad (3)$$
where $\Delta$ is a constant such that $0 \leq \Delta \leq \delta$.

In line with previous research (Huang et al., 2015; Wang et al., 2014), $w_{xy}$ is adjusted to be within the range of $1 - \delta$ to $1 + \delta$, where $\delta$ () defines the weight heterogeneity. Note that when $\Delta$ or $\delta$ are equal to $0$, the link weight remains constant ($w_{xy} = 1$), which results in the traditional scenario where only the strategies evolve.

In order to update the strategy of $x$, the accumulated utility $U_x$ is recalculated (based on the new link weights) and compared with the accumulated utility of one randomly selected neighbour ($U_y$). If $U_y > U_x$, agent $x$ will copy the strategy of agent $y$ with a probability proportional to the utility difference (Equation 4); otherwise, agent $x$ will keep its strategy for the next step.
$$p(s_x = s_y) = \frac{U_y - U_x}{8 (T - P)} \qquad (4)$$
where $T$ is the temptation to defect and $P$ is the punishment for mutual defection. This equation has been considered previously by Huang et al. (2015).
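The inner step described above can be sketched in code as follows. This is a hedged illustration, not the authors' implementation. The names `inner_step`, `Delta` and `delta`, the clamping of weights to the interval from `1 - delta` to `1 + delta`, the normalisation of the copy probability by the neighbourhood size times `T - P`, and all numeric values are assumptions made for the example.

```python
import random

C, D, A = "C", "D", "A"


def payoff(own, other, T, R, P, S, L):
    """Payoff to a player using `own` against `other` (Table 1)."""
    if A in (own, other):
        return L
    if own == C:
        return R if other == C else S
    return T if other == C else P


def neighbours(site, n):
    """Moore neighbourhood on an n x n periodic lattice (n >= 3)."""
    i, j = site
    return [((i + di) % n, (j + dj) % n)
            for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]


def link_utility(x, y, strategies, weights, pay):
    """Utility of one interaction: link weight times the payoff of x against y."""
    return weights[frozenset((x, y))] * payoff(strategies[x], strategies[y], **pay)


def accumulated_utility(x, strategies, weights, n, pay):
    """Sum of the link utilities of x over its neighbourhood."""
    return sum(link_utility(x, y, strategies, weights, pay)
               for y in neighbours(x, n))


def inner_step(strategies, weights, n, pay, Delta, delta, rng):
    """Select x, adapt its link weights, then possibly imitate one neighbour."""
    x = (rng.randrange(n), rng.randrange(n))
    nbs = neighbours(x, n)
    utils = {y: link_utility(x, y, strategies, weights, pay) for y in nbs}
    mean_u = sum(utils.values()) / len(nbs)
    for y, u in utils.items():
        key = frozenset((x, y))
        if u > mean_u:
            w = weights[key] + Delta
        elif u < mean_u:
            w = weights[key] - Delta
        else:
            continue
        # Assumption: weights are clamped to [1 - delta, 1 + delta].
        weights[key] = min(1.0 + delta, max(1.0 - delta, w))
    # Recalculate with the new weights and compare with a random neighbour.
    U_x = accumulated_utility(x, strategies, weights, n, pay)
    y = rng.choice(nbs)
    U_y = accumulated_utility(y, strategies, weights, n, pay)
    if U_y > U_x:
        # Assumption: normalise by neighbourhood size times (T - P).
        prob = (U_y - U_x) / (len(nbs) * (pay["T"] - pay["P"]))
        if rng.random() < prob:
            strategies[x] = strategies[y]


# Usage: one MC step on a small lattice with illustrative (assumed) parameters.
rng = random.Random(1)
n = 6
pay = dict(T=1.5, R=1.0, P=0.0, S=0.0, L=0.5)
strategies = {(i, j): rng.choice("CDA") for i in range(n) for j in range(n)}
weights = {frozenset((s, nb)): 1.0 for s in strategies for nb in neighbours(s, n)}
for _ in range(n * n):  # one MC step = as many inner steps as there are agents
    inner_step(strategies, weights, n, pay, Delta=0.2, delta=0.4, rng=rng)
```

Because weights may exceed 1, the computed probability can in principle exceed 1; in this sketch that simply means the strategy is always copied.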
Simulations are run for MC steps and the fraction of cooperation is determined by calculating the average of the final 1000 MC steps. To alleviate the effect of randomness in the approach, the final results are obtained by averaging 10 independent runs.
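The averaging procedure can be illustrated with a short sketch. It assumes that each run is stored as a list holding the fraction of cooperators after every MC step; the function names `stationary_level` and `averaged_level` are hypothetical. In the setting described above, `window` would be 1000 and `runs` would hold 10 independent series.

```python
from statistics import mean


def stationary_level(series, window):
    """Average of the last `window` values of one run's per-MC-step series."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return mean(series[-window:])


def averaged_level(runs, window):
    """Average of the stationary levels over independent runs."""
    return mean(stationary_level(run, window) for run in runs)


# Toy data (made up for illustration): two short runs, window of two steps.
runs = [[0.0, 0.2, 0.4, 0.6], [1.0, 1.0, 0.5, 0.5]]
print(averaged_level(runs, window=2))
```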
The following scenarios are investigated:

Exploring the effect of the link update rules by varying the values of $\Delta/\delta$. Specifically, the value of the link weight amplitude $\Delta/\delta$ is varied for a range of fixed values of the loner’s payoff ($L$), temptation to defect ($T$) and the weight heterogeneity ($\delta$).

Investigating the evolution of cooperation when abstainers are present in the population.

Considering snapshots of the evolution of the population over time at Monte Carlo steps of , , , and .

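Investigating the relationship between the temptation to defect ($T$), the loner’s payoff ($L$) and the link weight amplitude ($\Delta/\delta$). Specifically, the values of these three parameters are varied for a fixed value of the weight heterogeneity $\delta$ ().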
It is noteworthy that a wider range of values for , and were considered in our simulations, but for the sake of simplicity, we report only the values of , and , which are representative of the outcomes at other values also.
4 Results
In this section, we present some of the relevant experimental results of the simulations of the Optional Prisoner’s Dilemma game on the weighted network.
4.1 Varying the Values of $\Delta/\delta$
Figure 1 shows the impact of the coevolutionary model on the emergence of cooperation when the link weight amplitude varies for a range of fixed values of the loner’s payoff (), temptation to defect () and weight heterogeneity (). In this experiment, we observe that when , the outcomes of the coevolutionary model for the Optional Prisoner’s Dilemma game are very similar to those in the classical Prisoner’s Dilemma game (Huang et al., 2015). This result can be explained by the normalized payoff matrix adopted in this work (Table 1). Clearly, when , there is no advantage in abstaining from playing the game, thus agents choose the option to cooperate or defect.
In cases where the temptation to defect is less than or equal to (), it can be observed that the level of cooperation does not seem to be affected by the increment of the loner’s payoff, except when the advantage in abstaining is very high, i.e., . However, these results highlight that the presence of the abstainers may protect cooperators from invasion. Moreover, the difference between the traditional case () for and all other values of is strong evidence that our coevolutionary model is very advantageous to the promotion of cooperative behaviour. Namely, when , in the traditional case with a static and unweighted network (), the cooperators have no chance of surviving; in this scenario, when the temptation to defect is low, abstainers always dominate, otherwise, when is high, defection is always the dominant strategy. However, when the coevolutionary rules are used, cooperators do much better, being also able to dominate the whole population in many cases.
4.2 Presence of Abstainers
In addition to the fact that the levels of cooperation are usually improved in the coevolutionary model, as the value of the loner’s payoff increases we also observe newer phenomena.
In the classical Prisoner’s Dilemma in this type of environment, when the defector’s payoff is very high (i.e., greater than ) defectors spread quickly and dominate the environment. However, Figure 1 also shows that, for some values of , it is possible to reach high levels of cooperation even when the temptation of defection is almost at its peak.
Therefore, abstainers seem to help the population to increase their fraction of cooperation in many cases, but mainly in the case where the link weight amplitude is higher than . This is usually a bad scenario in the classical Prisoner’s Dilemma game.
4.3 Snapshots at Different Monte Carlo Steps
In order to further explain the results witnessed in the previous experiments, we investigate how the population evolves over time. Figure 2 features the time course of cooperation for three different values of , which are some of the critical points when , and . Based on these results, in Figure 3 we show snapshots for the Monte Carlo steps , , and for the three scenarios shown in Figure 2.
We see from Figure 2 that for the traditional case (i.e., $\Delta/\delta = 0$), abstainers spread quickly and reach a stable state in which single defectors are completely isolated by abstainers. In this way, as the payoffs obtained by a defector and an abstainer are the same, neither will ever change their strategy. In fact, even if a single cooperator survives up to this stage, for the same reason, its strategy will not change either.
When , it is possible to observe some sort of equilibrium between the three strategies. They reach a state of cyclic competition in which abstainers invade defectors, defectors invade cooperators and cooperators invade abstainers.
This behaviour, of balancing the three possible outcomes, is very common in nature, where species with different reproductive strategies remain in equilibrium in the environment. For instance, the same scenario was observed as being responsible for preserving biodiversity in neighbourhoods of Escherichia coli, a bacterium commonly found in the lower intestine of warm-blooded organisms. According to Fisher (2008), studies were performed with three natural populations mixed together, in which one population produces a natural antibiotic but is immune to its effects; a second population is sensitive to the antibiotic but can grow faster than the third population; and the third population is resistant to the antibiotic.
Because of this balance, they observed that each population ends up establishing its own territory in the environment, as the first population could kill off any other bacteria sensitive to the antibiotic, the second population could use their faster growth rate to displace the bacteria which are resistant to the antibiotic, and the third population could use their immunity to displace the first population.
Another interesting behaviour is noticed for . In this scenario, defectors are dominated by abstainers, allowing a few clusters of cooperators to survive. As a result of the absence of defectors, cooperators invade abstainers and dominate the environment.
4.4 Investigating the Relationship between $\Delta/\delta$, $T$ and $L$
To investigate the outcomes in other scenarios, we explore a wider range of settings by varying the values of the temptation to defect ($T$), the loner’s payoff ($L$) and the link weight amplitude ($\Delta/\delta$) for a fixed value of weight heterogeneity ($\delta$).
As shown in Figure 4, cooperation is the dominant strategy in the majority of cases. Note that in the traditional case, with an unweighted and static network, i.e., $\Delta/\delta = 0$, abstainers dominate in all scenarios illustrated in this ternary diagram. In addition, it is also possible to observe that certain combinations of $T$, $L$ and $\Delta/\delta$ guarantee higher levels of cooperation. In these scenarios, cooperators are protected by abstainers against exploitation from defectors. In most cases, for populations with the loner’s payoff, , cooperation is promoted the most.
Although the combinations shown in Figure 4 for higher values of b () are just a small subset of an infinite number of possible values, it is clearly shown that a reasonable fraction of cooperators can survive even in an extremely adverse situation where the advantage of defecting is very high. Indeed, our results show that some combinations of high values of and such as for and , can further improve the levels of cooperation, allowing for the full dominance of cooperation.
In summary, we see that the use of a coevolutionary model in the Optional Prisoner’s Dilemma game allows for the emergence of cooperation.
5 Conclusions and Future Work
In this paper, we studied the impact of abstinence in the Prisoner’s Dilemma game using a coevolutionary spatial model in which both game strategies and link weights between agents evolve over time. We considered a population of agents who were initially organised on a lattice grid where agents can only play with their eight immediate neighbours. Using a Monte Carlo simulation approach, a number of experiments were performed to observe the emergence of cooperation, defection and abstinence in this environment. At each Monte Carlo time step, an agent’s strategy (cooperate, defect, abstain) and the link weight between agents can be updated.
The utility received by an agent after playing with another agent is the product of the game payoff and the weight of the link between the two agents. We explored the effect of the link update rules by varying the values of the link weight amplitude ($\Delta/\delta$), the loner’s payoff ($L$), and the temptation to defect ($T$). The aims were to understand the relationship between these parameters and also to investigate the evolution of cooperation when abstainers are present in the population.
Results showed that, in adverse scenarios where is very high (i.e. ), some combinations of high values of and , such as for and , can further improve the levels of cooperation, even resulting in the full dominance of cooperation.
When , and , it was possible to observe a balance among the three strategies, indicating that, for some parameter settings, the Optional Prisoner’s Dilemma game is intransitive. In other words, such scenarios produce a loop of dominance in which abstainer agents beat defector agents, defector agents beat cooperator agents and cooperator agents beat abstainer agents.
In summary, the difference between the outcomes of $\Delta/\delta = 0$ (i.e., a static environment with unweighted links) and $\Delta/\delta > 0$ clearly showed that the coevolutionary model is very advantageous to the promotion of cooperative behaviour. In most cases, with populations with the loner’s payoff, , cooperation is promoted the most. Moreover, results also showed that cooperators are protected by abstainers against exploitation from defectors.
Although recent research has considered coevolving game strategy and link weights (Section 2), to our knowledge such a coevolutionary model with optional games has not been studied to date. We conclude that the combination of both of these trends in evolutionary game theory may shed additional light on the emergence of cooperative behaviour in real-world scenarios.
Future work will consider the exploration of different topologies and the influence of a wider range of scenarios, where, for example, agents could rewire their links, which, in turn, adds another level of complexity to the model. Future work will also involve applying our studies and results to model realistic scenarios, such as social networks and real biological networks.
Acknowledgements
This work was supported by the National Council for Scientific and Technological Development (CNPq-Brazil).
Footnotes
 The final publication will be available in the Proceedings of the 8th International Joint Conference on Computational Intelligence. http://www.ecta.ijcci.org
References
 Batali, J. and Kitcher, P. (1995). Evolution of altruism in optional and compulsory games. Journal of Theoretical Biology, 175(2):161–171.
 Cao, L., Ohtsuki, H., Wang, B., and Aihara, K. (2011). Evolution of cooperation on adaptively weighted networks. Journal of Theoretical Biology, 272(1):8–15.
 Cardinot, M., Gibbons, M., O’Riordan, C., and Griffith, J. (2016). Simulation of an optional strategy in the prisoner’s dilemma in spatial and nonspatial environments. In From Animals to Animats 14 (SAB 2016), pages 145–156, Cham. Springer International Publishing.
 Chen, X. and Wang, L. (2008). Promotion of cooperation induced by appropriate payoff aspirations in a small-world networked game. Physical Review E, 77:017103.
 Fisher, L. (2008). Rock, Paper, Scissors: Game Theory in Everyday Life. Basic Books.
 Fu, F., Liu, L.-H., and Wang, L. (2007). Evolutionary prisoner’s dilemma on heterogeneous Newman-Watts small-world network. The European Physical Journal B, 56(4):367–372.
 Ghang, W. and Nowak, M. A. (2015). Indirect reciprocity with optional interactions. Journal of Theoretical Biology, 365:1–11.
 Gómez-Gardeñes, J., Romance, M., Criado, R., Vilone, D., and Sánchez, A. (2011). Evolutionary games defined at the network mesoscale: The public goods game. Chaos, 21(1).
 Hauert, C., Traulsen, A., Brandt, H., and Nowak, M. A. (2008). Public goods with punishment and abstaining in finite and infinite populations. Biological Theory, 3(2):114–122.
 Huang, K., Zheng, X., Li, Z., and Yang, Y. (2015). Understanding cooperative behavior based on the coevolution of game strategy and link weight. Scientific Reports, 5:14783.
 Jeong, H.-C., Oh, S.-Y., Allen, B., and Nowak, M. A. (2014). Optional games on cycles and complete graphs. Journal of Theoretical Biology, 356:98–112.
 Nowak, M. A. and May, R. M. (1992). Evolutionary games and spatial chaos. Nature, 359(6398):826–829.
 Olejarz, J., Ghang, W., and Nowak, M. A. (2015). Indirect Reciprocity with Optional Interactions and Private Information. Games, 6(4):438–457.
 Perc, M. and Szolnoki, A. (2010). Coevolutionary games – A mini review. Biosystems, 99(2):109–125.
 Szabó, G. and Hauert, C. (2002). Evolutionary prisoner’s dilemma games with voluntary participation. Physical Review E, 66:062903.
 Szolnoki, A. and Perc, M. (2009). Promoting cooperation in social dilemmas via simple coevolutionary rules. The European Physical Journal B, 67(3):337–344.
 Szolnoki, A. and Perc, M. (2016). Leaders should not be conformists in evolutionary social dilemmas. Scientific Reports, 6:23633.
 Wang, Z., Szolnoki, A., and Perc, M. (2014). Self-organization towards optimally interdependent networks by means of coevolution. New Journal of Physics, 16(3):033041.
 Xia, C.-Y., Meloni, S., Perc, M., and Moreno, Y. (2015). Dynamic instability of cooperation due to diverse activity patterns in evolutionary social dilemmas. EPL, 109(5):58002.
 Zimmermann, M. G., Eguíluz, V. M., and San Miguel, M. (2001). Cooperation, Adaptation and the Emergence of Leadership, pages 73–86. Springer, Berlin, Heidelberg.
 Zimmermann, M. G., Eguíluz, V. M., and San Miguel, M. (2004). Coevolution of dynamical states and interactions in dynamic networks. Physical Review E, 69:065102.