# Imitation dynamics with payoff shocks

Panayotis Mertikopoulos CNRS (French National Center for Scientific Research), LIG, F-38000 Grenoble, France
and Univ. Grenoble Alpes, LIG, F-38000 Grenoble, France
http://mescal.imag.fr/membres/panayotis.mertikopoulos
and  Yannick Viossat PSL, Université Paris–Dauphine, CEREMADE UMR7534, Place du Maréchal de Lattre de Tassigny, 75775 Paris, France
###### Abstract.

We investigate the impact of payoff shocks on the evolution of large populations of myopic players that employ simple strategy revision protocols such as the “imitation of success”. In the noiseless case, this process is governed by the standard (deterministic) replicator dynamics; in the presence of noise, however, the induced stochastic dynamics are different from previous versions of the stochastic replicator dynamics (such as the aggregate-shocks model of Fudenberg and Harris, 1992). In this context, we show that strict equilibria are always stochastically asymptotically stable, irrespective of the magnitude of the shocks; on the other hand, in the high-noise regime, non-equilibrium states may also become stochastically asymptotically stable and dominated strategies may survive in perpetuity (they become extinct if the noise is low). Such behavior is eliminated if players are less myopic and revise their strategies based on their cumulative payoffs. In this case, we obtain a second-order stochastic dynamical system whose attracting states coincide with the game’s strict equilibria and where dominated strategies become extinct (a.s.), no matter the noise level.

###### Key words and phrases:
Dominated strategies; evolutionary dynamics; replicator dynamics; revision protocols; aggregate payoff shocks; strict equilibria.
Supported in part by the French National Research Agency under grant no. GAGA–13–JS01–0004–01 and the French National Center for Scientific Research (CNRS) under grant no. PEPS–GATHERING–2014.

## 1. Introduction

Evolutionary game dynamics study the evolution of behavior in populations of boundedly rational agents that interact strategically. The most widely studied dynamical model in this context is the replicator dynamics: introduced in biology as a model of natural selection (Taylor and Jonker, 1978), the replicator dynamics also arise from models of imitation of successful individuals (Schlag, 1998; Weibull, 1995; Björnerstedt and Weibull, 1996) and from models of learning in games (Rustichini, 1999; Hofbauer et al., 2009; Mertikopoulos and Moustakas, 2010). Mathematically, they stipulate that the growth rate of the frequency of a strategy is proportional to the difference between the payoff of individuals playing this strategy and the mean payoff in the population. These payoffs are usually assumed deterministic: this is typically motivated by a large population assumption and the premise that, owing to the law of large numbers, the resulting mean field provides a good approximation of a more realistic but less tractable stochastic model. This approach makes sense when the stochasticity affecting payoffs is independent across individuals playing the same strategies, but it fails when the payoff shocks are aggregate, that is, when they affect all individuals playing a given strategy in a similar way.

Such aggregate shocks are not uncommon. Bergstrom (2014) recounts the story of squirrels stocking nuts for the winter months: squirrels may stock a few or a lot of nuts, the latter leading to a higher probability of surviving a long winter but a higher exposure to predation. The unpredictable mildness or harshness of the ensuing winter will then favor one of these strategies in an aggregate way (see also Robson and Samuelson, 2011, Sec. 3.1.1, and references therein). In traffic engineering, one might think of a choice of itinerary to go to work: fluctuations of traffic on some roads affect all those who chose them in a similar way. Likewise, in data networks, a major challenge occurs when trying to minimize network latencies in the presence of stochastic disturbances: in this setting, the travel time of a packet in the network does not depend only on the load of each link it traverses, but also on unpredictable factors such as random packet drops and retransmissions, fluctuations in link quality, excessive backlog queues, etc. (Bertsekas and Gallager, 1992).

Incorporating such aggregate payoff shocks in the biological derivation of the replicator dynamics leads to the stochastic replicator dynamics of Fudenberg and Harris (1992), later studied by (among others) Cabrales (2000), Imhof (2005) and Hofbauer and Imhof (2009). To study the long-run behavior of these dynamics, Imhof (2005) introduced a modified game where the expected payoff of a strategy is penalized by a term which increases with the variance of the noise affecting this strategy’s payoff (see also Hofbauer and Imhof, 2009). Among other results, it was then shown that a) strategies that are iteratively (strictly) dominated in this modified game become extinct almost surely; and b) strict equilibria of the modified game are stochastically asymptotically stable.

In this biological model, noise is detrimental to the long-term survival of strategies: a strategy which is strictly dominant on average (i.e. in the original, unmodified game) but which is affected by shocks of substantially higher intensity becomes extinct almost surely. By contrast, in the learning derivation of the replicator dynamics, noise leads to a stochastic exponential learning model where only iteratively undominated strategies survive, irrespective of the intensity of the noise (Mertikopoulos and Moustakas, 2010); as a result, the frequency of a strictly dominant strategy converges to $1$ almost surely. Moreover, strict Nash equilibria of the original game remain stochastically asymptotically stable (again, independently of the level of the noise), so the impact of the noise in the stochastic replicator dynamics of exponential learning is minimal when compared to the stochastic replicator dynamics with aggregate shocks.

In this paper, we study the effect of payoff shocks when the replicator equation is seen as a model of imitation of successful agents. As in the case of Imhof (2005) and Hofbauer and Imhof (2009), it is convenient to introduce a noise-adjusted game which is reduced to the original game in the noiseless, deterministic regime. We show that: a) strategies that are iteratively strictly dominated in the modified game become extinct almost surely; and b) strict equilibria of the modified game are stochastically asymptotically stable. However, despite the formal similarity, our results are qualitatively different from those of Imhof (2005) and Hofbauer and Imhof (2009): in the modified game induced by imitation of success in the presence of noise, noise is not detrimental per se. In fact, in the absence of differences in expected payoffs, a strategy survives with a probability that does not depend on the random variance of its payoffs: a strategy’s survival probability is simply its initial frequency. Similarly, even if a strategy which is strictly dominant in expectation is subject to arbitrarily high noise, it will always survive with positive probability; by contrast, such strategies become extinct (a.s.) in the aggregate shocks model of Fudenberg and Harris (1992).

That said, the dynamics’ long-term properties change dramatically if players are less “myopic” and, instead of imitating strategies based on their instantaneous payoffs, they base their decisions on the cumulative payoffs of their strategies over time. In this case, we obtain a second-order stochastic replicator equation which can be seen as a noisy version of the higher order game dynamics of Laraki and Mertikopoulos (2013). Thanks to this payoff aggregation mechanism, the noise averages out in the long run and we recover results that are similar to those of Mertikopoulos and Moustakas (2010): strategies that are dominated in the original game become extinct (a.s.) and strict Nash equilibria attract nearby initial conditions with arbitrarily high probability.

### 1.1. Paper Outline

The remainder of our paper is structured as follows: in Section 2, we present our model and we derive the stochastic replicator dynamics induced by imitation of success in the presence of noise. Our long-term rationality analysis begins in Section 3 where we introduce the noise-adjusted game discussed above and we state our elimination and stability results in terms of this modified game. In Section 4, we consider the case where players imitate strategies based on their cumulative payoffs and we show that the adjustment due to noise is no longer relevant. Finally, in Section 5, we discuss some variants of our core model related to different noise processes.

### 1.2. Notational conventions

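The real space spanned by a finite set $\mathcal{S}$ will be denoted by $\mathbb{R}^{\mathcal{S}}$ and we will write $\{e_s\}_{s \in \mathcal{S}}$ for its canonical basis; in a slight abuse of notation, we will also use $\alpha$ to refer interchangeably to either $\alpha$ or $e_\alpha$ and we will write $\delta_{\alpha\beta}$ for the Kronecker delta symbols on $\mathcal{S}$. The set $\Delta(\mathcal{S})$ of probability measures on $\mathcal{S}$ will be identified with the $(|\mathcal{S}| - 1)$-dimensional simplex $\Delta = \{x \in \mathbb{R}^{\mathcal{S}} : \sum_\alpha x_\alpha = 1 \text{ and } x_\alpha \ge 0\}$ of $\mathbb{R}^{\mathcal{S}}$ and the relative interior of $\Delta$ will be denoted by $\Delta^\circ$; also, the support of $p \in \Delta$ will be written $\operatorname{supp}(p) = \{\alpha \in \mathcal{S} : p_\alpha > 0\}$. For simplicity, if $\{\mathcal{S}_k\}_{k \in \mathcal{N}}$ is a finite family of finite sets, we use the shorthand $(\alpha_k; \alpha_{-k})$ for the tuple $(\dots, \alpha_{k-1}, \alpha_k, \alpha_{k+1}, \dots)$ and we write $\sum_\alpha^k$ instead of $\sum_{\alpha \in \mathcal{S}_k}$. Unless mentioned otherwise, deterministic processes will be represented by lowercase letters, while their stochastic counterparts will be denoted by the corresponding uppercase letter. Finally, we will suppress the dependence of the law of a process $X(t)$ on its initial condition $X(0) = x$, and we will write $\mathbb{P}$ instead of $\mathbb{P}_x$.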

## 2. The model

In this section, we recall a few preliminaries from the theory of population games and evolutionary dynamics, and we introduce the stochastic game dynamics under study.

### 2.1. Population games

Our main focus will be games played by populations of nonatomic players. Formally, such games consist of a finite set of player populations $\mathcal{N}$ (assumed for simplicity to have unit mass), each with a finite set of pure strategies (or types) $\mathcal{A}_k$, $k \in \mathcal{N}$. During play, each player chooses a strategy and the state of each population is given by the distribution $x_k = (x_{k\alpha})_{\alpha \in \mathcal{A}_k}$ of players employing each strategy $\alpha \in \mathcal{A}_k$. Accordingly, the state space of the $k$-th population is the simplex $\mathcal{X}_k \equiv \Delta(\mathcal{A}_k)$ and the state space of the game is the product $\mathcal{X} \equiv \prod_k \mathcal{X}_k$.

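The payoff to a player of population $k \in \mathcal{N}$ playing $\alpha \in \mathcal{A}_k$ is determined by the corresponding payoff function $v_{k\alpha}\colon \mathcal{X} \to \mathbb{R}$ (assumed Lipschitz). (Note that we are considering general payoff functions and not only multilinear (resp. linear) payoffs arising from asymmetric (resp. symmetric) random matching in finite $N$-person (resp. $2$-person) games. This distinction is important as it allows our model to cover e.g. general traffic games as in Sandholm (2010).) Thus, given a population state $x \in \mathcal{X}$, the average payoff to population $k$ will be

$$\sum\nolimits_{\alpha}^{k} x_{k\alpha} v_{k\alpha}(x) = \langle v_k(x) | x \rangle, \tag{2.1}$$

where $v_k(x) \equiv (v_{k\alpha}(x))_{\alpha \in \mathcal{A}_k}$ denotes the payoff vector of the $k$-th population in the state $x$. Putting all this together, a population game is then defined as a tuple $\mathcal{G} \equiv \mathcal{G}(\mathcal{N}, \mathcal{A}, v)$ of nonatomic player populations $k \in \mathcal{N}$, their pure strategies $\alpha \in \mathcal{A}_k$ and the associated payoff functions $v_{k\alpha}\colon \mathcal{X} \to \mathbb{R}$.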

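In this context, we say that a pure strategy $\alpha \in \mathcal{A}_k$ is dominated by $\beta \in \mathcal{A}_k$ if

$$v_{k\alpha}(x) < v_{k\beta}(x) \quad \text{for all } x \in \mathcal{X}, \tag{2.2}$$

i.e. the payoff of an $\alpha$-strategist is always inferior to that of a $\beta$-strategist. More generally (and in a slight abuse of terminology), we will say that $p_k \in \mathcal{X}_k$ is dominated by $p'_k \in \mathcal{X}_k$ if

$$\langle v_k(x) | p_k \rangle < \langle v_k(x) | p'_k \rangle \quad \text{for all } x \in \mathcal{X}, \tag{2.3}$$

i.e. when the average payoff of a small influx of mutants in population $k$ is always greater when they are distributed according to $p'_k$ rather than $p_k$ (irrespective of the incumbent population state $x$).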

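Finally, we will say that the population state $x^* \in \mathcal{X}$ is at Nash equilibrium if

$$v_{k\alpha}(x^*) \ge v_{k\beta}(x^*) \quad \text{for all } \alpha \in \operatorname{supp}(x^*_k) \text{ and for all } \beta \in \mathcal{A}_k,\ k \in \mathcal{N}. \tag{NE}$$

In particular, if $x^*$ is pure (in the sense that $\operatorname{supp}(x^*_k)$ is a singleton) and (NE) holds as a strict inequality for all $\beta \notin \operatorname{supp}(x^*_k)$, $x^*$ will be called a strict equilibrium.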

###### Remark 2.1.

Throughout this paper, we will be suppressing the population index $k$ for simplicity, essentially focusing on the single-population case. This is done only for notational clarity: all our results apply as stated to the multi-population model described in detail above.

### 2.2. Revision protocols

A fundamental evolutionary model in the context of population games is provided by the notion of a revision protocol. Following Sandholm (2010, Chapter 3), it is assumed that each nonatomic player receives an opportunity to switch strategies at every ring of an independent Poisson alarm clock, and this decision is based on the payoffs associated to each strategy and the current population state. The players’ revision protocol is thus defined in terms of the conditional switch rates $\rho_{\alpha\beta} \equiv \rho_{\alpha\beta}(v, x)$ that determine the relative mass $dx_{\alpha\beta}$ of players switching from $\alpha$ to $\beta$ over an infinitesimal time interval $dt$ (in other words, $\rho_{\alpha\beta}$ is the probability of an $\alpha$-strategist becoming a $\beta$-strategist up to normalization by the alarm clocks’ rate):

$$dx_{\alpha\beta} = x_\alpha \rho_{\alpha\beta}\,dt. \tag{2.4}$$

The population shares $x_\alpha$ are then governed by the revision protocol dynamics:

$$\dot{x}_\alpha = \sum_\beta x_\beta \rho_{\beta\alpha} - x_\alpha \sum_\beta \rho_{\alpha\beta}, \tag{2.5}$$

with $\rho_{\alpha\alpha}$ defined arbitrarily.

In what follows, we will focus on revision protocols of the general “imitative” form

$$\rho_{\alpha\beta}(v, x) = x_\beta\, r_{\alpha\beta}(v, x), \tag{2.6}$$

corresponding to the case where a player imitates the strategy of a uniformly drawn opponent with probability proportional to the so-called conditional imitation rate $r_{\alpha\beta}$ (assumed Lipschitz). In particular, one of the most widely studied revision protocols of this type is the “imitation of success” protocol (Weibull, 1995) where the imitation rate of a given target strategy is proportional to its payoff (modulo an additive constant which ensures that the rate is positive but which cancels out when it comes to the dynamics), i.e.

$$r_{\alpha\beta}(v, x) = v_\beta. \tag{2.7}$$

On account of (2.5), the mean evolutionary dynamics induced by (2.7) take the form:

$$\dot{x}_\alpha = x_\alpha \Big[ v_\alpha(x) - \sum_\beta x_\beta v_\beta(x) \Big], \tag{RD}$$

which is simply the classical replicator equation of Taylor and Jonker (1978).
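As a purely illustrative sketch of (RD), the snippet below integrates the replicator equation with a forward-Euler scheme. The function names, the step size, and the constant two-strategy payoff vector are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def replicator_rhs(x, payoff):
    """Right-hand side of (RD): x_a * (v_a(x) - sum_b x_b v_b(x))."""
    v = payoff(x)
    return x * (v - x @ v)

def integrate_rd(x0, payoff, dt=0.01, steps=5000):
    """Forward-Euler integration of (RD); a crude scheme chosen for brevity."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * replicator_rhs(x, payoff)
        x = np.clip(x, 0.0, None)
        x = x / x.sum()  # guard against round-off drift away from the simplex
    return x

def constant_payoff(x):
    """Hypothetical payoffs: strategy 0 strictly dominates strategy 1."""
    return np.array([1.0, 0.0])

x_final = integrate_rd([0.1, 0.9], constant_payoff)
print(x_final)  # the share of the dominant strategy approaches 1
```

Renormalizing after each step is a numerical convenience only: the exact flow of (RD) already preserves the simplex.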

The replicator dynamics have attracted significant interest in the literature and their long-run behavior is relatively well understood. For instance, Akin (1980), Nachbar (1990) and Samuelson and Zhang (1992) showed that dominated strategies become extinct under (RD), whereas the (multi-population) “folk theorem” of evolutionary game theory (Hofbauer and Sigmund, 2003) states that a) (Lyapunov) stable states are Nash; b) limits of interior trajectories are Nash; and c) strict Nash equilibria are asymptotically stable under (RD).

### 2.3. Payoff shocks and the induced dynamics

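Our main goal in this paper is to investigate the rationality properties of the replicator dynamics in a setting where the players’ payoffs are subject to exogenous stochastic disturbances. To model these “payoff shocks”, we assume that the players’ payoffs at time $t$ are of the form $\hat{v}_\alpha(t) = v_\alpha(X(t)) + \xi_\alpha(t)$ for some zero-mean “white noise” process $\xi_\alpha$. Then, in Langevin notation, the replicator dynamics (RD) become:

$$\begin{aligned}
\frac{dX_\alpha}{dt} &= X_\alpha \Big[ \hat{v}_\alpha - \sum_\beta X_\beta \hat{v}_\beta \Big] \\
&= X_\alpha \Big[ v_\alpha(X) - \sum_\beta X_\beta v_\beta(X) \Big] + X_\alpha \Big[ \xi_\alpha - \sum_\beta X_\beta \xi_\beta \Big],
\end{aligned} \tag{2.8}$$

or, in stochastic differential equation (SDE) form:

$$\begin{aligned}
dX_\alpha &= X_\alpha \Big[ v_\alpha(X) - \sum_\beta X_\beta v_\beta(X) \Big] dt \\
&\quad + X_\alpha \Big[ \sigma_\alpha(X)\,dW_\alpha - \sum_\beta X_\beta \sigma_\beta(X)\,dW_\beta \Big],
\end{aligned} \tag{SRD}$$

where the diffusion coefficients $\sigma_\alpha\colon \mathcal{X} \to \mathbb{R}$ (assumed Lipschitz) measure the intensity of the payoff shocks and the Wiener processes $W_\alpha$ are assumed independent.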
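To make (SRD) concrete, here is a minimal Euler–Maruyama sketch of a single sample path. The discretization, the clipping safeguard, and the example payoffs and noise levels are assumptions chosen for illustration; the paper itself works with the continuous-time SDE.

```python
import numpy as np

def srd_step(x, v, sigma, dt, dW):
    """One Euler-Maruyama increment of (SRD) for independent shocks dW."""
    drift = x * (v - x @ v)
    shock = sigma * dW                 # sigma_a dW_a for each strategy
    diffusion = x * (shock - x @ shock)
    return x + drift * dt + diffusion

def simulate_srd(x0, payoff, sigma, dt=1e-3, steps=2000, seed=0):
    """Simulate one path; payoff(x) and sigma(x) return arrays shaped like x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = srd_step(x, payoff(x), sigma(x), dt, dW)
        x = np.clip(x, 1e-12, None)    # numerical safeguard near the boundary
        x = x / x.sum()
    return x

# Hypothetical example: constant payoffs and constant noise intensities.
x_final = simulate_srd(
    [0.5, 0.5],
    payoff=lambda x: np.array([1.0, 0.0]),
    sigma=lambda x: np.array([0.5, 0.5]),
)
print(x_final)
```

Note that each increment of `srd_step` sums to zero whenever `x` lies on the simplex, mirroring the fact that (SRD) carries no additional Itô drift on the shares.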

The stochastic dynamics (SRD) will constitute the main focus of this paper, so some remarks are in order:

###### Remark 2.2.

With $v$ and $\sigma$ assumed Lipschitz, it follows that (SRD) admits a unique (strong) solution $X(t)$ for every initial condition $X(0) \in \mathcal{X}$. Moreover, since the drift and diffusion terms of (SRD) all vanish at the boundary of $\mathcal{X}$, standard arguments can be used to show that these solutions exist (a.s.) for all time, and that $X(t)$ remains in the relative interior $\mathcal{X}^\circ$ of $\mathcal{X}$ for all $t \ge 0$ if $X(0) \in \mathcal{X}^\circ$ (Øksendal, 2007; Khasminskii, 2012).

###### Remark 2.3.

The independence assumption for the Wiener processes $W_\alpha$ can be relaxed without qualitatively affecting our analysis (an important special case where it makes sense to consider correlated shocks is if the payoff functions $v_\alpha$ are derived from random matchings in a finite game whose payoff matrix is subject to stochastic perturbations; this specific disturbance model is discussed in Section 5). In particular, as we shall see in the proofs of our results, the rationality properties of (SRD) can be formulated directly in terms of the quadratic (co)variation of the noise processes $W_\alpha$. Doing so, however, would complicate the relevant expressions considerably, so, for clarity, we will retain this independence assumption throughout our paper.

###### Remark 2.4.

The deterministic replicator dynamics (RD) are also the governing dynamics for the “pairwise proportional imitation” revision protocol (Schlag, 1998) where a revising agent imitates the strategy of a randomly chosen opponent only if the opponent’s payoff is higher than his own, and he does so with probability proportional to the payoff difference. Formally, the conditional switch rate $\rho_{\alpha\beta}$ under this revision protocol is:

$$\rho_{\alpha\beta} = x_\beta [v_\beta - v_\alpha]_+, \tag{2.9}$$

where $[x]_+ = \max\{x, 0\}$ denotes the positive part of $x$. Accordingly, if the game’s payoffs at time $t$ are of the perturbed form $\hat{v}_\alpha = v_\alpha + \xi_\alpha$ as before, (2.5) leads to the master stochastic equation:

$$\begin{aligned}
\dot{X}_\alpha &= \sum_\beta X_\beta X_\alpha [\hat{v}_\alpha - \hat{v}_\beta]_+ - X_\alpha \sum_\beta X_\beta [\hat{v}_\beta - \hat{v}_\alpha]_+ \\
&= X_\alpha \Big[ \hat{v}_\alpha - \sum_\beta X_\beta \hat{v}_\beta \Big],
\end{aligned} \tag{2.10}$$

which is simply the stochastic replicator dynamics (2.8). In other words, (SRD) could also be interpreted as the mean dynamics of a pairwise imitation process with perturbed payoff comparisons as above.
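The algebraic identity behind (2.10) can be checked numerically. The sketch below plugs the switch rates (2.9) into the revision protocol dynamics (2.5) and compares the result with the replicator right-hand side; the function name and the test vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairwise_imitation_rhs(x, v):
    """Revision dynamics (2.5) under the switch rates (2.9)."""
    gain = np.maximum(v[:, None] - v[None, :], 0.0)  # gain[a, b] = [v_a - v_b]_+
    inflow = x * (gain @ x)      # x_a * sum_b x_b [v_a - v_b]_+
    outflow = x * (gain.T @ x)   # x_a * sum_b x_b [v_b - v_a]_+
    return inflow - outflow

# Hypothetical state and (possibly perturbed) payoff vector.
x = np.array([0.2, 0.5, 0.3])
v_hat = np.array([1.3, -0.4, 0.7])

replicator = x * (v_hat - x @ v_hat)
print(np.allclose(pairwise_imitation_rhs(x, v_hat), replicator))
```

The positive parts cancel in pairs because $[u]_+ - [-u]_+ = u$, which is why the two expressions agree for any payoff vector.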

### 2.4. Related stochastic models

The replicator dynamics were first introduced in biology, as a model of frequency-dependent selection. They arise from the geometric population growth equation:

$$\dot{z}_\alpha = z_\alpha v_\alpha, \tag{2.11}$$

where $z_\alpha$ denotes the absolute population size of the $\alpha$-th genotype of a given species. (The replicator equation (RD) is obtained simply by computing the evolution of the frequencies $x_\alpha = z_\alpha / \sum_\beta z_\beta$ under (2.11).) This biological model was also the starting point of Fudenberg and Harris (1992) who added aggregate payoff shocks to (2.11) based on the geometric Brownian model:

$$dZ_\alpha = Z_\alpha [v_\alpha\,dt + \sigma_\alpha\,dW_\alpha], \tag{2.12}$$

where the diffusion term $\sigma_\alpha\,dW_\alpha$ represents the impact of random, weather-like effects on the genotype’s fitness (see also Cabrales, 2000; Imhof, 2005; Hofbauer and Imhof, 2009). (Khasminskii and Potsepun (2006) also considered a related evolutionary model with Stratonovich-type perturbations while, more recently, Vlasic (2012) studied the effect of discontinuous semimartingale shocks incurred by catastrophic, earthquake-like events.) Itô’s lemma applied to the population shares $X_\alpha = Z_\alpha / \sum_\beta Z_\beta$ then yields the replicator dynamics with aggregate shocks:

$$\begin{aligned}
dX_\alpha &= X_\alpha \Big[ v_\alpha(X) - \sum_\beta X_\beta v_\beta(X) \Big] dt \\
&\quad + X_\alpha \Big[ \sigma_\alpha\,dW_\alpha - \sum_\beta \sigma_\beta X_\beta\,dW_\beta \Big] \\
&\quad - X_\alpha \Big[ \sigma_\alpha^2 X_\alpha - \sum_\beta \sigma_\beta^2 X_\beta^2 \Big] dt.
\end{aligned} \tag{2.13}$$

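In a repeated game context, the replicator dynamics also arise from a continuous-time variant of the exponential weight algorithm introduced by Vovk (1990) and Littlestone and Warmuth (1994) (see also Sorin, 2009). In particular, if players follow the exponential learning scheme:

$$\begin{aligned}
dy_\alpha &= v_\alpha(x)\,dt, \\
x_\alpha &= \frac{\exp(y_\alpha)}{\sum_\beta \exp(y_\beta)},
\end{aligned} \tag{2.14}$$

that is, if they play a logit best response to the vector of their cumulative payoffs, then the frequencies $x_\alpha$ follow (RD). (The intermediate variable $y_\alpha$ should be thought of as an evaluation of how good the strategy $\alpha$ is, and the formula for $x_\alpha$ as a way of transforming these evaluations into a strategy.) Building on this, Mertikopoulos and Moustakas (2009, 2010) considered the stochastically perturbed exponential learning scheme:

$$\begin{aligned}
dY_\alpha &= v_\alpha(X)\,dt + \sigma_\alpha(X)\,dW_\alpha, \\
X_\alpha &= \frac{\exp(Y_\alpha)}{\sum_\beta \exp(Y_\beta)},
\end{aligned} \tag{2.15}$$

where the cumulative payoffs are perturbed by the observation noise process $W_\alpha$. By Itô’s lemma, we then obtain the stochastic replicator dynamics of exponential learning:

$$\begin{aligned}
dX_\alpha &= X_\alpha \Big[ v_\alpha(X) - \sum_\beta X_\beta v_\beta(X) \Big] dt \\
&\quad + X_\alpha \Big[ \sigma_\alpha\,dW_\alpha - \sum_\beta \sigma_\beta X_\beta\,dW_\beta \Big] \\
&\quad + \frac{X_\alpha}{2} \Big[ \sigma_\alpha^2 (1 - 2X_\alpha) - \sum_\beta \sigma_\beta^2 X_\beta (1 - 2X_\beta) \Big] dt.
\end{aligned} \tag{2.16}$$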

Besides their very distinct origins, a key difference between the stochastic replicator dynamics (SRD) and the stochastic models (2.13)/(2.16) is that there is no Itô correction term in the former. The reason for this is that in (2.13) and (2.16), the noise affects primarily the evolution of an intermediary variable (the absolute population sizes $Z_\alpha$ and the players’ cumulative payoffs $Y_\alpha$ respectively) before being carried over to the evolution of the strategy shares $X_\alpha$. By contrast, the payoff shocks that impact the players’ revision protocol in (SRD) affect the corresponding strategy shares directly, so there is no intervening Itô correction.
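In the noiseless case, the link between the exponential learning scheme (2.14) and (RD) can be illustrated numerically. The sketch below integrates both with forward Euler and compares the resulting states; the rock-paper-scissors payoff matrix, the step size, and the function names are assumptions made for this example only.

```python
import numpy as np

def softmax(y):
    """Logit map of (2.14): x_a = exp(y_a) / sum_b exp(y_b)."""
    z = np.exp(y - y.max())
    return z / z.sum()

def exponential_learning(y0, payoff, dt=1e-3, steps=2000):
    """Euler integration of dy = v(x) dt with x = softmax(y)."""
    y = np.asarray(y0, dtype=float)
    for _ in range(steps):
        y = y + dt * payoff(softmax(y))
    return softmax(y)

def replicator(x0, payoff, dt=1e-3, steps=2000):
    """Euler integration of (RD) started from the same state."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        v = payoff(x)
        x = x + dt * x * (v - x @ v)
    return x

# Hypothetical linear payoffs v(x) = A x (a rock-paper-scissors matrix).
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
payoff = lambda x: A @ x
x0 = np.array([0.5, 0.3, 0.2])

x_learning = exponential_learning(np.log(x0), payoff)
x_replicator = replicator(x0, payoff)
print(x_learning, x_replicator)  # the two agree up to discretization error
```

Once noise enters through the scores $Y_\alpha$ as in (2.15), the same change of variables produces the extra drift of (2.16), which is precisely the term absent from (SRD).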

#### The pure noise case.

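To better understand the differences between our model and previous models of stochastic replicator dynamics, it is useful to consider the case of pure noise, that is, when the expected payoff of each strategy is equal to one and the same constant $C$: $v_\alpha(x) = C$ for all $\alpha \in \mathcal{A}$ and for all $x \in \mathcal{X}$.

For simplicity, let us also assume that $\sigma_\alpha$ is independent of the state of the population $x$. Eq. (2.12) then becomes a simple geometric Brownian motion of the form:

$$dZ_\alpha = Z_\alpha [C\,dt + \sigma_\alpha\,dW_\alpha], \tag{2.17}$$

which readily yields $Z_\alpha(t) = Z_\alpha(0) \exp\big( (C - \tfrac{1}{2}\sigma_\alpha^2) t + \sigma_\alpha W_\alpha(t) \big)$. The corresponding frequency $X_\alpha = Z_\alpha / \sum_\beta Z_\beta$ will then be:

$$X_\alpha(t) = \frac{X_\alpha(0) \exp\big( -\tfrac{1}{2}\sigma_\alpha^2 t + \sigma_\alpha W_\alpha(t) \big)}{\sum_\beta X_\beta(0) \exp\big( -\tfrac{1}{2}\sigma_\beta^2 t + \sigma_\beta W_\beta(t) \big)}. \tag{2.18}$$

If $\sigma_\alpha > 0$, the law of large numbers yields $-\tfrac{1}{2}\sigma_\alpha^2 t + \sigma_\alpha W_\alpha(t) \sim -\tfrac{1}{2}\sigma_\alpha^2 t$ (a.s.). Therefore, letting $\sigma_{\min} = \min_\beta \sigma_\beta$, it follows from (2.18) that strategy $\alpha$ is eliminated if $\sigma_\alpha > \sigma_{\min}$ and survives if $\sigma_\alpha = \sigma_{\min}$ (a.s.). (Elimination is obvious; for survival, simply add $\tfrac{1}{2}\sigma_{\min}^2 t$ to the exponents of (2.18) and recall that any Wiener process has $\limsup_t W(t) = \infty$ and $\liminf_t W(t) = -\infty$ (a.s.).) In particular, if all intensities are equal ($\sigma_\alpha = \sigma_\beta$ for all $\alpha, \beta$), then all strategies survive and the share of each strategy oscillates forever, occasionally taking values arbitrarily close to $0$ and arbitrarily close to $1$. On the other hand, under the stochastic replicator dynamics of exponential learning for the pure noise case, (2.16) readily yields:

$$X_\alpha(t) = \frac{X_\alpha(0) \exp\big( \sigma_\alpha W_\alpha(t) \big)}{\sum_\beta X_\beta(0) \exp\big( \sigma_\beta W_\beta(t) \big)}. \tag{2.19}$$

Therefore, for any value of the diffusion coefficients $\sigma_\alpha$ (and, in particular, even if some strategies are affected by noise much more than others), all pure strategies survive.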
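The closed forms (2.18) and (2.19) are easy to evaluate for a given realization of the Wiener processes. The following sketch does so for hypothetical inputs; the helper names and the numerical values are illustrative assumptions, and the value of $W(t)$ is passed in explicitly rather than sampled.

```python
import numpy as np

def _normalize(logits):
    """Exponentiate and normalize in a numerically stable way."""
    w = np.exp(logits - np.max(logits))
    return w / w.sum()

def shares_aggregate_shocks(x0, sigma, t, W):
    """Eq. (2.18): shares in the aggregate-shocks model under pure noise."""
    x0, sigma, W = (np.asarray(a, dtype=float) for a in (x0, sigma, W))
    return _normalize(np.log(x0) - 0.5 * sigma**2 * t + sigma * W)

def shares_exponential_learning(x0, sigma, W):
    """Eq. (2.19): shares under exponential learning with pure noise."""
    x0, sigma, W = (np.asarray(a, dtype=float) for a in (x0, sigma, W))
    return _normalize(np.log(x0) + sigma * W)

# Hypothetical inputs: equal initial shares, unequal noise intensities, and a
# realization with W(t) = 0 to isolate the effect of the -sigma^2 t / 2 term.
x0, sigma = [0.5, 0.5], [0.5, 1.0]
print(shares_aggregate_shocks(x0, sigma, t=100.0, W=[0.0, 0.0]))
print(shares_exponential_learning(x0, sigma, W=[0.0, 0.0]))
```

With $W(t) = 0$, the noisier strategy is already heavily penalized in (2.18), whereas (2.19) returns the initial shares unchanged.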

Our model behaves differently from both (2.13) and (2.16): in the pure noise case, for any value of the noise coefficients $\sigma_\alpha$ (as long as $\sigma_\alpha > 0$ for all $\alpha$), only a single strategy survives (a.s.), and strategy $\alpha$ survives with probability equal to $X_\alpha(0)$. To see this, consider first the model with pure noise and only two strategies, $\alpha$ and $\beta$. Then, letting $X = X_\alpha$ (so $X_\beta = 1 - X$), we get:

$$dX(t) = X(t)(1 - X(t)) [\sigma_\alpha\,dW_\alpha - \sigma_\beta\,dW_\beta] = X(t)(1 - X(t))\,\sigma\,dW(t), \tag{2.20}$$

where $\sigma^2 = \sigma_\alpha^2 + \sigma_\beta^2$ and we have used the time-change theorem for martingales to write $\sigma\,dW = \sigma_\alpha\,dW_\alpha - \sigma_\beta\,dW_\beta$ for some Wiener process $W(t)$. This diffusion process can be seen as a continuous-time random walk on $[0, 1]$ with step sizes that get smaller as $X$ approaches $\{0, 1\}$. Thus, at a heuristic level, when $X(t)$ starts close to $1$ and takes one step to the left followed by one step to the right (or the opposite), the walk does not return to its initial position, but will approach $1$ (of course, the same phenomenon occurs near $0$). This suggests that the process should eventually converge to one of the vertices: indeed, letting $f(x) = \log x(1 - x)$, Itô’s lemma yields

$$df(X) = (1 - 2X)\,\sigma\,dW - \tfrac{1}{2} \big[ (1 - X)^2 + X^2 \big] \sigma^2\,dt \le (1 - 2X)\,\sigma\,dW - \tfrac{1}{4} \sigma^2\,dt, \tag{2.21}$$

so, by Lemma A.1, we get $\lim_{t\to\infty} f(X(t)) = -\infty$ (a.s.), that is, $\lim_{t\to\infty} X(t) \in \{0, 1\}$.

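More generally, consider the model with pure noise and any finite number of strategies. Then, computing $d[\log X_\alpha(1 - X_\alpha)]$ as above, we readily obtain $\lim_{t\to\infty} X_\alpha(t) \in \{0, 1\}$ (a.s.), for every strategy $\alpha$ with $\sigma_\alpha > 0$. Since $X_\alpha$ is a martingale, we will have $\mathbb{E}[X_\alpha(t)] = X_\alpha(0)$ for all $t \ge 0$ (we are implicitly assuming here deterministic initial conditions, i.e. $X(0) = x$ (a.s.) for some $x \in \mathcal{X}$), so $\lim_{t\to\infty} X_\alpha(t) = 1$ with probability $X_\alpha(0)$ and $\lim_{t\to\infty} X_\alpha(t) = 0$ with probability $1 - X_\alpha(0)$. (If several strategies are unaffected by noise, that is, are such that $\sigma_\alpha = 0$, then their relative shares remain constant: if $\alpha$ and $\beta$ are two such strategies, then $X_\alpha(t)/X_\beta(t) = X_\alpha(0)/X_\beta(0)$ for all $t \ge 0$. It follows from this observation and the above result that, almost surely, either all these strategies are eliminated or they all survive, and they are the only ones to do so.)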
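The two claims above (convergence to a vertex, and survival with probability equal to the initial share) can be probed with a small Monte Carlo experiment on the two-strategy equation (2.20). This is only a sketch: the Euler–Maruyama discretization, the horizon, the number of paths, and the thresholds are arbitrary choices made for illustration, not part of the paper's argument.

```python
import numpy as np

def simulate_pure_noise(x0, sigma, dt=0.01, steps=3000, paths=2000, seed=0):
    """Euler-Maruyama paths of dX = X (1 - X) sigma dW, cf. (2.20)."""
    rng = np.random.default_rng(seed)
    x = np.full(paths, float(x0))
    for _ in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=paths)
        # Clipping guards against rare discretization overshoots of [0, 1].
        x = np.clip(x + x * (1.0 - x) * sigma * dW, 0.0, 1.0)
    return x

final = simulate_pure_noise(x0=0.3, sigma=2.0)
near_vertex = np.minimum(final, 1.0 - final) < 1e-3

print("fraction of paths near a vertex:", near_vertex.mean())
print("empirical mean of X(T):", final.mean(), "(initial share: 0.3)")
```

Since nearly all paths end up next to $0$ or $1$, the empirical mean of $X(T)$ approximates the fraction of paths absorbed near $1$, which the martingale argument predicts to be close to the initial share.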

The above highlights two important differences between our model and the stochastic replicator dynamics of Fudenberg and Harris (1992). First, in our model, noise is not detrimental in itself: in the pure noise case, the expected frequency of a strategy remains constant, irrespective of the noise level; by contrast, in the model of Fudenberg and Harris (1992), the expected frequency of strategies affected by strong payoff noise decreases. (In the pure noise case of the model of Fudenberg and Harris (1992), what remains constant is the expected number of individuals playing a strategy. A crucial point here is that this number may grow to infinity. What happens to strategies affected by large aggregate shocks is that with small probability, the total number of individuals playing this strategy gets huge, but with a large probability (going to 1), it gets small, at least compared to the number of individuals playing other strategies. This can be seen as a gambler’s ruin phenomenon, which explains that even with a higher expected payoff than others (hence a higher expected subpopulation size), the frequency of a strategy may go to zero almost surely; see e.g. Robson and Samuelson, 2011, Sec. 3.1.1. This cannot happen in our model since noise is added directly to the frequencies, which are bounded.) Second, our model behaves in a somewhat more “unpredictable” way: for instance, in the model of Fudenberg and Harris (1992), when there are only two strategies with the same expected payoff, and if one of the strategies is affected by a stronger payoff noise, then it will be eliminated (a.s.); in our model, we cannot say in advance whether it will be eliminated or not.

## 3. Long-term rationality analysis

In this section, we investigate the long-run rationality properties of the stochastic dynamics (SRD); in particular, we focus on the elimination of dominated strategies and the stability of equilibrium play.

### 3.1. Elimination of dominated strategies

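We begin with the elimination of dominated strategies. Formally, given a trajectory of play $x(t) \in \mathcal{X}$, we say that a pure strategy $\alpha \in \mathcal{A}$ becomes extinct along $x(t)$ if $x_\alpha(t) \to 0$ as $t \to \infty$. More generally, following Samuelson and Zhang (1992), we will say that the mixed strategy $p \in \mathcal{X}$ becomes extinct along $x(t)$ if $\min\{x_\alpha(t) : \alpha \in \operatorname{supp}(p)\} \to 0$ as $t \to \infty$; otherwise, we say that $p$ survives.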

Now, with a fair degree of hindsight, it will be convenient to introduce a modified game with payoff functions adjusted for noise as follows:

$$v^\sigma_\alpha(x) = v_\alpha(x) - \tfrac{1}{2} (1 - 2x_\alpha)\,\sigma_\alpha^2(x). \tag{3.1}$$
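For concreteness, the noise adjustment (3.1) can be written as a one-line function. The sketch below assumes that the payoffs and noise intensities have already been evaluated at the state of interest; the function name and the numerical values are hypothetical.

```python
import numpy as np

def noise_adjusted_payoffs(x, v, sigma):
    """Eq. (3.1): v_a - 0.5 * (1 - 2 x_a) * sigma_a^2, with v and sigma evaluated at x."""
    x, v, sigma = (np.asarray(a, dtype=float) for a in (x, v, sigma))
    return v - 0.5 * (1.0 - 2.0 * x) * sigma**2

v, sigma = [1.0, 0.0], [2.0, 3.0]

# At the barycenter of a two-strategy simplex, the adjustment vanishes.
print(noise_adjusted_payoffs([0.5, 0.5], v, sigma))

# At a vertex, the incumbent strategy gains sigma^2 / 2 while the absent one loses it.
print(noise_adjusted_payoffs([1.0, 0.0], v, sigma))
```

The sign of the adjustment flips at $x_\alpha = 1/2$: noise penalizes a strategy while it is rare and favors it once it is prevalent.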

Imhof (2005) introduced a similar modified game to study the long-term convergence and stability properties of the stochastic replicator dynamics with aggregate shocks (2.13) and showed that strategies that are dominated in this modified game are eliminated (a.s.) – cf. Remark 3.7 below. Our main result concerning the elimination of dominated strategies under (SRD) is of a similar nature:

###### Theorem 3.1.

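Let $X(t)$ be an interior solution orbit of the stochastic replicator dynamics (SRD). Assume further that $p \in \mathcal{X}$ is dominated by $p' \in \mathcal{X}$ in the modified game $\mathcal{G}^\sigma$ with payoff functions (3.1). Then, $p$ becomes extinct along $X(t)$ (a.s.).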

###### Remark 3.1.

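As a special case, if the (pure) strategy $\alpha \in \mathcal{A}$ is dominated by the (pure) strategy $\beta \in \mathcal{A}$, Theorem 3.1 shows that $\alpha$ becomes extinct under (SRD) as long as

$$v_\beta(x) - v_\alpha(x) > \tfrac{1}{2} \big[ \sigma_\alpha^2(x) + \sigma_\beta^2(x) \big] \quad \text{for all } x \in \mathcal{X}. \tag{3.2}$$

In terms of the original game, this condition can be interpreted as saying that $\alpha$ is dominated by $\beta$ by a margin no less than $\tfrac{1}{2}[\sigma_\alpha^2(x) + \sigma_\beta^2(x)]$ at every state $x$. Put differently, Theorem 3.1 shows that dominated strategies in the original, unmodified game become extinct provided that the payoff shocks are mild enough.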

###### Proof of Theorem 3.1.

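Following Cabrales (2000), we will show that $p$ becomes extinct along $X(t)$ by studying the “cross-entropy” function:

$$V(x) = D_{\mathrm{KL}}(p, x) - D_{\mathrm{KL}}(p', x) = \sum_\alpha (p_\alpha \log p_\alpha - p'_\alpha \log p'_\alpha) + \sum_\alpha (p'_\alpha - p_\alpha) \log x_\alpha, \tag{3.3}$$

where $D_{\mathrm{KL}}(p, x) = \sum_\alpha p_\alpha \log(p_\alpha / x_\alpha)$ denotes the Kullback–Leibler (KL) divergence of $x$ with respect to $p$. By a standard argument (Weibull, 1995), $p$ becomes extinct along $X(t)$ if $D_{\mathrm{KL}}(p, X(t)) \to \infty$; thus, with $D_{\mathrm{KL}}(p', x) \ge 0$, it suffices to show that $V(X(t)) \to \infty$.

To that end, let $Y_\alpha = \log X_\alpha$ so that

$$dY_\alpha = \frac{dX_\alpha}{X_\alpha} - \frac{1}{2} \frac{1}{X_\alpha^2} (dX_\alpha)^2, \tag{3.4}$$

by Itô’s lemma. Then, writing $dS_\alpha = X_\alpha \big[ \sigma_\alpha\,dW_\alpha - \sum_\beta X_\beta \sigma_\beta\,dW_\beta \big]$ for the martingale term of (SRD), we readily obtain:

$$\begin{aligned}
(dS_\alpha)^2 &= X_\alpha^2 \Big[ \sigma_\alpha\,dW_\alpha - \sum_\beta X_\beta \sigma_\beta\,dW_\beta \Big] \cdot \Big[ \sigma_\alpha\,dW_\alpha - \sum_\gamma X_\gamma \sigma_\gamma\,dW_\gamma \Big] \\
&= X_\alpha^2 \Big[ (1 - 2X_\alpha) \sigma_\alpha^2 + \sum_\beta \sigma_\beta^2 X_\beta^2 \Big] dt,
\end{aligned} \tag{3.5}$$

where we have used the orthogonality conditions $dW_\beta \cdot dW_\gamma = \delta_{\beta\gamma}\,dt$. By the same token, we also get $(dX_\alpha)^2 = (dS_\alpha)^2$, and hence:

$$\begin{aligned}
dY_\alpha &= \big( v_\alpha - \langle v | X \rangle \big) dt - \frac{1}{2} \Big[ (1 - 2X_\alpha) \sigma_\alpha^2 + \sum_\beta \sigma_\beta^2 X_\beta^2 \Big] dt \\
&\quad + \sigma_\alpha\,dW_\alpha - \sum_\beta X_\beta \sigma_\beta\,dW_\beta.
\end{aligned} \tag{3.6}$$

Therefore, after some easy algebra, we obtain:

$$\begin{aligned}
dV &= \langle v(X) | p' - p \rangle\,dt - \frac{1}{2} \sum_\alpha (p'_\alpha - p_\alpha)(1 - 2X_\alpha)\,\sigma_\alpha^2(X)\,dt + \sum_\alpha (p'_\alpha - p_\alpha)\,\sigma_\alpha(X)\,dW_\alpha \\
&= \langle v^\sigma(X) | p' - p \rangle\,dt + \sum_\alpha (p'_\alpha - p_\alpha)\,\sigma_\alpha(X)\,dW_\alpha,
\end{aligned} \tag{3.7}$$

where we have used the fact that $\sum_\alpha (p'_\alpha - p_\alpha) = 0$.

Now, since $p$ is dominated by $p'$ in $\mathcal{G}^\sigma$, we will have $\langle v^\sigma(x) | p' - p \rangle \ge m$ for some positive constant $m > 0$ and for all $x \in \mathcal{X}$. Eq. (3.7) then yields:

$$V(X(t)) \ge V(X(0)) + mt + \xi(t), \tag{3.8}$$

where $\xi$ denotes the martingale part of (3.7), viz.

$$\xi(t) = \sum_\alpha (p'_\alpha - p_\alpha) \int_0^t \sigma_\alpha(X(s))\,dW_\alpha(s). \tag{3.9}$$

Since the diffusion coefficients $\sigma_\alpha(X(t))$ are bounded and continuous (a.s.), Lemma A.1 shows that $mt + \xi(t) \sim mt$ as $t \to \infty$, so the RHS of (3.8) escapes to $\infty$ as $t \to \infty$. This implies $\lim_{t\to\infty} V(X(t)) = \infty$ and our proof is complete. ∎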
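The quadratic variation formula (3.5) used in the proof can be double-checked numerically by writing the martingale term $dS_\alpha$ as a linear combination of the independent increments $dW_\gamma$ and summing the squared coefficients. The function names and test values below are hypothetical and serve only as a sanity check.

```python
import numpy as np

def diffusion_row(x, sigma, a):
    """Coefficients c[g] of dW_g in dS_a = X_a [sigma_a dW_a - sum_b X_b sigma_b dW_b]."""
    e = np.zeros_like(x)
    e[a] = 1.0
    return x[a] * (e - x) * sigma

def quadratic_variation_rate(x, sigma, a):
    """Right-hand side of (3.5), divided by dt."""
    return x[a] ** 2 * ((1.0 - 2.0 * x[a]) * sigma[a] ** 2 + np.sum(sigma**2 * x**2))

# Hypothetical interior state and noise intensities.
x = np.array([0.2, 0.5, 0.3])
sigma = np.array([0.7, 1.1, 0.4])

for a in range(len(x)):
    direct = np.sum(diffusion_row(x, sigma, a) ** 2)  # uses dW_b dW_g = delta_bg dt
    print(a, direct, quadratic_variation_rate(x, sigma, a))
```

The two columns agree because $(1 - x_\alpha)^2 \sigma_\alpha^2 + \sum_{\gamma \neq \alpha} x_\gamma^2 \sigma_\gamma^2 = (1 - 2x_\alpha)\sigma_\alpha^2 + \sum_\gamma x_\gamma^2 \sigma_\gamma^2$.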

Theorem 3.1 is our main result concerning the extinction of dominated strategies under (SRD) so a few remarks are in order:

###### Remark 3.2.

Theorem 3.1 is analogous to the elimination results of Imhof (2005, Theorem 3.1) and Cabrales (2000, Prop. 1A) who show that dominated strategies become extinct under the replicator dynamics with aggregate shocks (2.13) if the shocks satisfy certain “tameness” requirements. On the other hand, Theorem 3.1 should be contrasted to the corresponding results of Mertikopoulos and Moustakas (2010) who showed that dominated strategies become extinct under the stochastic replicator dynamics of exponential learning (2.16) irrespective of the noise level (for a related elimination result, see also Bravo and Mertikopoulos, 2014). The crucial qualitative difference here lies in the Itô correction term that appears in the drift of the stochastic replicator dynamics: the Itô correction in (2.16) is “just right” with respect to the logarithmic variables and this is what leads to the unconditional elimination of dominated strategies. On the other hand, even though there is no additional drift term in (SRD) except for the one driven by the game’s payoffs, the logarithmic transformation incurs an Itô correction which is reflected in the definition of the modified payoff functions (3.1).

###### Remark 3.3.

A standard induction argument based on the rounds of elimination of iteratively dominated strategies (see e.g. Cabrales, 2000 or Mertikopoulos and Moustakas, 2010) can be used to show that the only strategies that survive under the stochastic replicator dynamics (SRD) must be iteratively undominated in the modified game $\mathcal{G}^\sigma$ defined by the payoff functions (3.1).

###### Remark 3.4.

Finally, it is worth mentioning that Imhof (2005) also establishes an exponential rate of extinction of dominated strategies under the stochastic replicator dynamics with aggregate shocks (2.13). Specifically, if $\alpha \in \mathcal{A}$ is dominated, Imhof (2005) showed that there exist constants $A$, $B$ and $A'$, $B'$ such that

$$X_\alpha(t) = o\Big( \exp\big( -At + B \sqrt{t \log\log t} \big) \Big) \quad \text{(a.s.)}, \tag{3.10}$$

and

$$\mathbb{P}[X_\alpha(t) > \varepsilon] \le \tfrac{1}{2} \operatorname{erfc}\big[ A' t^{1/2} + B' \log\varepsilon \cdot t^{-1/2} \big], \tag{3.11}$$

provided that the noise coefficients of (2.13) satisfy a certain “tameness” condition. Following the same reasoning, it is possible to establish similar exponential decay rates for the elimination of dominated strategies under (SRD), but the exact expressions for the constants in (3.10) and (3.11) are more complicated, so we do not present them here.

### 3.2. Stability analysis of equilibrium play

In this section, our goal will be to investigate the stability and convergence properties of the stochastic replicator dynamics (SRD) with respect to equilibrium play. Motivated by a collection of stability results that is sometimes called the “folk theorem” of evolutionary game theory (Hofbauer and Sigmund, 2003), we will focus on the following three properties of the deterministic replicator dynamics (RD):

1. Limits of interior orbits are Nash equilibria.

2. Lyapunov stable states are Nash equilibria.

3. Strict Nash equilibria are asymptotically stable under (RD).

Of course, given the stochastic character of the dynamics (SRD), the notions of Lyapunov and asymptotic stability must be suitably modified. In this SDE context, we have:

###### Definition 3.2.

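Let $x^{\ast}$ be a population state. We will say that:

1. $x^{\ast}$ is stochastically Lyapunov stable under (SRD) if, for every $\varepsilon > 0$ and for every neighborhood $U_{0}$ of $x^{\ast}$ in the state space, there exists a neighborhood $U$ of $x^{\ast}$ such that

   $$\mathbb{P}\big(X(t) \in U_{0} \text{ for all } t \geq 0\big) \geq 1 - \varepsilon \quad \text{whenever } X(0) \in U. \tag{3.12}$$

2. $x^{\ast}$ is stochastically asymptotically stable under (SRD) if it is stochastically Lyapunov stable and attracting: for every $\varepsilon > 0$ and for every neighborhood $U_{0}$ of $x^{\ast}$ in the state space, there exists a neighborhood $U$ of $x^{\ast}$ such that

   $$\mathbb{P}\big(X(t) \in U_{0} \text{ for all } t \geq 0 \text{ and } \lim_{t\to\infty} X(t) = x^{\ast}\big) \geq 1 - \varepsilon \quad \text{whenever } X(0) \in U. \tag{3.13}$$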
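The probability appearing in (3.13) can be probed numerically in the simplest setting. The following is a minimal sketch, not part of the original analysis, under several assumptions: two strategies only, a constant payoff difference `dv` (payoff of the second strategy minus payoff of the first), a constant aggregate noise level `sigma`, and a finite horizon used as a proxy for "all $t \geq 0$". The function name and all parameters are hypothetical. The simulation runs an Euler scheme on the logit coordinate $Z = \log(X/(1-X))$, whose drift `dv - 0.5 * (1 - 2x) * sigma**2` is the two-strategy specialization of (3.17).

```python
import math
import random


def estimate_stability(dv, sigma, x0, x_max, horizon=10.0, dt=0.01,
                       n_paths=200, tol=1e-3, seed=0):
    """Finite-horizon Monte Carlo proxy for the probability in (3.13).

    X is the share of the second strategy, so the vertex of interest is X = 0.
    A path counts as a success if X stays below x_max up to the horizon
    and ends below tol. All modelling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    z_max = math.log(x_max / (1.0 - x_max))   # boundary of U_0 in logit coordinates
    z_tol = math.log(tol / (1.0 - tol))       # "close to the vertex" at the horizon
    n_steps = int(round(horizon / dt))
    successes = 0
    for _ in range(n_paths):
        z = math.log(x0 / (1.0 - x0))
        stayed = True
        for _ in range(n_steps):
            x = 1.0 / (1.0 + math.exp(-z))
            # Drift of Z: payoff difference plus the Ito correction of (3.17).
            drift = dv - 0.5 * (1.0 - 2.0 * x) * sigma ** 2
            z += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            if z >= z_max:
                stayed = False
                break
        if stayed and z <= z_tol:
            successes += 1
    return successes / n_paths


if __name__ == "__main__":
    # First strategy strictly better (dv < 0), moderate noise, start near the vertex.
    print(estimate_stability(dv=-1.0, sigma=0.5, x0=0.01, x_max=0.1))
```

A finite-horizon estimate of this kind can only suggest stability; it does not replace the almost sure statements of the definition.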

For (SRD), we have:

###### Theorem 3.3.

Let $X(t)$ be an interior solution orbit of the stochastic replicator dynamics (SRD) and let $x^{\ast}$ be a population state.

1. If $X(t)$ converges to $x^{\ast}$ with positive probability, then $x^{\ast}$ is a Nash equilibrium of the noise-adjusted game.

2. If $x^{\ast}$ is stochastically Lyapunov stable, then it is also a Nash equilibrium of the noise-adjusted game.

3. If $x^{\ast}$ is a strict Nash equilibrium of the noise-adjusted game, then it is stochastically asymptotically stable under (SRD).

###### Remark 3.5.

By the nature of the modified payoff functions (3.1), strict equilibria of the original game are also strict equilibria of the noise-adjusted game, so Theorem 3.3 implies that strict equilibria of the original game are also stochastically asymptotically stable under the stochastic dynamics (SRD). The converse does not hold: if the noise coefficients are sufficiently large, (SRD) possesses stochastically asymptotically stable states that are not Nash equilibria of the original game. This is consistent with the behavior of (SRD) in the pure noise case that we discussed in the previous section: if $X(t)$ starts within $\varepsilon$ of a vertex of the state space and there are no payoff differences, then $X(t)$ converges to this vertex with probability at least $1 - \varepsilon$.

###### Remark 3.6.

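The condition for a pure strategy $\alpha$ to be a strict equilibrium of the modified game is that

 $$v_{\beta} - v_{\alpha} < \tfrac{1}{2}\big(\sigma_{\alpha}^{2} + \sigma_{\beta}^{2}\big) \quad \text{for all } \beta \neq \alpha, \tag{3.14}$$

where the payoffs and the noise coefficients are evaluated at the vertex of the state space that corresponds to $\alpha$ (note the similarity with (3.2)). To provide some intuition for this condition, consider the case of only two pure strategies, $\alpha$ and $\beta$, and assume constant noise coefficients. Letting $X = X_{\beta}$ and proceeding as in (2.20), we get $dX = X(1-X)\,[(v_{\beta} - v_{\alpha})\,dt + \sigma\,dW]$ where $\sigma^{2} = \sigma_{\alpha}^{2} + \sigma_{\beta}^{2}$ and $W$ is a rescaled Wiener process. Heuristically, a discrete-time counterpart of $X(t)$ is then provided by the random walk:

 $$X(n+1) - X(n) = X(n)\big(1 - X(n)\big)\big[(v_{\beta} - v_{\alpha})\,\delta + \sigma\,\xi_{n}\sqrt{\delta}\,\big] \tag{3.15}$$

where $\xi_{n} \in \{\pm 1\}$ is a zero-mean Bernoulli process, and the noise term is multiplied by $\sqrt{\delta}$ instead of $\delta$ because $dW \cdot dW = dt$. For small $X(n)$ and $\delta$, a simple computation then shows that, in the event $\xi_{n+1} = -\xi_{n}$, we have:

 $$X(n+2) - X(n) = 2\delta X(n)\big[v_{\beta} - v_{\alpha} - \tfrac{1}{2}\sigma^{2}\big] + o(\delta) + o(X(n)). \tag{3.16}$$

Since $\sigma^{2} = \sigma_{\alpha}^{2} + \sigma_{\beta}^{2}$, the bracket is negative (so $X_{\alpha} = 1 - X$ increases) if and only if condition (3.14) is satisfied. Thus, (3.14) may be interpreted as saying that when the discrete-time process is close to the vertex of $\alpha$ and the random noise term takes two successive steps in opposite directions, then the process ends up even closer to that vertex. (Put differently, it is more probable for $X$ to decrease rather than increase: $X(n+2) > X(n)$ with probability $1/4$, i.e. if and only if $\xi$ takes two positive steps, while $X(n+2) < X(n)$ with probability $3/4$.) On the other hand, if the opposite strict inequality holds, then this interpretation suggests that $\beta$ should successfully invade a population where most individuals play $\alpha$, which, in turn, explains (3.2).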
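The two-step computation behind (3.16) is easy to check numerically. The sketch below is only an illustration under the assumptions of this remark (two strategies, constant payoff difference `dv` standing for $v_{\beta} - v_{\alpha}$, constant aggregate noise level `sigma`); the function names are hypothetical. It iterates the random walk (3.15) for a prescribed pair of noise signs and compares the result with the leading term of (3.16).

```python
def step(x, dv, sigma, delta, xi):
    """One step of the random walk (3.15); xi is +1 or -1."""
    return x + x * (1.0 - x) * (dv * delta + sigma * xi * delta ** 0.5)


def two_steps(x, dv, sigma, delta, xi1, xi2):
    """X(n+2) - X(n) for the noise signs xi1 (first step) and xi2 (second step)."""
    return step(step(x, dv, sigma, delta, xi1), dv, sigma, delta, xi2) - x


def leading_term(x, dv, sigma, delta):
    """Leading term of (3.16), valid when the two noise steps have opposite signs."""
    return 2.0 * delta * x * (dv - 0.5 * sigma ** 2)


if __name__ == "__main__":
    x, delta, sigma = 1e-4, 1e-4, 1.0
    for dv in (0.3, 0.8):  # below and above the threshold sigma**2 / 2 = 0.5
        exact = two_steps(x, dv, sigma, delta, +1, -1)
        print(dv, exact, leading_term(x, dv, sigma, delta))
    # Number of sign patterns (out of four) for which X increases when dv = 0.3.
    ups = sum(1 for a in (1, -1) for b in (1, -1)
              if two_steps(x, 0.3, sigma, delta, a, b) > 0)
    print(ups)
```

With `dv` below `sigma**2 / 2` the opposite-sign increment is negative, and only the pattern with two positive steps increases $X$, in line with the probabilities quoted above.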

###### Proof of Theorem 3.3.

In contrast to the approach of Hofbauer and Imhof (2009), we will not employ the stochastic Lyapunov method (see e.g. Khasminskii, 2012), which requires calculating the infinitesimal generator of (SRD). Instead, motivated by the recent analysis of Bravo and Mertikopoulos (2014), our proof will rely on the “dual” variables that were already used in the proof of Theorem 3.1.

#### Part 1.

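We argue by contradiction. Indeed, assume that $\mathbb{P}\big(\lim_{t\to\infty} X(t) = x^{\ast}\big) > 0$ but that $x^{\ast}$ is not Nash for the noise-adjusted game, so $v^{\sigma}_{\alpha}(x^{\ast}) < v^{\sigma}_{\beta}(x^{\ast})$ for some $\alpha \in \operatorname{supp}(x^{\ast})$ and some strategy $\beta$. On that account, let $U$ be a sufficiently small neighborhood of $x^{\ast}$ in the state space such that $v^{\sigma}_{\beta}(x) - v^{\sigma}_{\alpha}(x) \geq m$ for some $m > 0$ and for all $x \in U$. Then, by (3.6), we get:

 $$\begin{aligned} dY_{\alpha} - dY_{\beta} &= [v_{\alpha} - v_{\beta}]\,dt - \tfrac{1}{2}\big[(1 - 2X_{\alpha})\sigma_{\alpha}^{2} - (1 - 2X_{\beta})\sigma_{\beta}^{2}\big]\,dt \\ &\quad + \sigma_{\alpha}\,dW_{\alpha} - \sigma_{\beta}\,dW_{\beta}, \end{aligned} \tag{3.17}$$

so, if $X(t)$ is an interior orbit of (SRD) that converges to $x^{\ast}$, we will have:

 $$dY_{\alpha} - dY_{\beta} \leq -m\,dt - d\xi \quad \text{for all large enough } t > 0, \tag{3.18}$$

where $\xi$ denotes the martingale part of (3.17). Since the diffusion coefficients of (3.17) are bounded, Lemma A.1 shows that $mt + \xi(t) \sim mt$ for large $t$ (a.s.), so

 $$\log\frac{X_{\alpha}(t)}{X_{\beta}(t)} \leq \log\frac{X_{\alpha}(0)}{X_{\beta}(0)} - mt - \xi(t) \sim -mt \to -\infty \quad \text{(a.s.)} \tag{3.19}$$

as $t \to \infty$. This implies that $\lim_{t\to\infty} X_{\alpha}(t) = 0$, contradicting our original assumption that $X(t)$ stays in a small enough neighborhood of $x^{\ast}$ with positive probability (recall that $x^{\ast}_{\alpha} > 0$); we thus conclude that $x^{\ast}$ is a Nash equilibrium of the noise-adjusted game, as claimed.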

#### Part 2.

Assume that $x^{\ast}$ is stochastically Lyapunov stable. Then, every neighborhood $U$ of $x^{\ast}$ admits an interior trajectory $X(t)$ that stays in $U$ for all time with positive probability. The proof of Part 1 shows that this is only possible if $x^{\ast}$ is a Nash equilibrium of the modified game, so our claim follows.

#### Part 3.

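To show that strict Nash equilibria of the noise-adjusted game are stochastically asymptotically stable, let $x^{\ast}$ be a strict equilibrium of this game and let $\alpha^{\ast}$ denote the pure strategy that it selects. Then, suppressing the population index as before, let

 $$Z_{\alpha} = Y_{\alpha} - Y_{\alpha^{\ast}}, \tag{3.20}$$

so that $X(t) \to x^{\ast}$ if and only if $Z_{\alpha}(t) \to -\infty$ for all $\alpha \in \mathcal{A}^{\ast} \equiv \mathcal{A} \setminus \{\alpha^{\ast}\}$. (Simply note that $X_{\alpha^{\ast}} = \big(1 + \sum_{\alpha \in \mathcal{A}^{\ast}} \exp(Z_{\alpha})\big)^{-1}$.)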
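The change of variables (3.20) can be inverted explicitly. As a minimal sketch, assuming that the dual variables are the logarithms of the population shares (which is what (3.19) suggests), so that $Z_{\alpha} = \log(X_{\alpha}/X_{\alpha^{\ast}})$, the shares are recovered from the $Z_{\alpha}$ as follows; the function name is hypothetical.

```python
import math


def shares_from_z(z):
    """Recover population shares from Z_alpha = log(X_alpha / X_alpha*).

    z lists Z_alpha for every alpha different from alpha*.
    Returns (share of alpha*, list of the other shares).
    Assumes Y_alpha = log X_alpha, an assumption made for illustration.
    """
    denom = 1.0 + sum(math.exp(v) for v in z)
    return 1.0 / denom, [math.exp(v) / denom for v in z]


if __name__ == "__main__":
    # Shares (0.5, 0.2, 0.3) with alpha* the first strategy.
    z = [math.log(0.2 / 0.5), math.log(0.3 / 0.5)]
    print(shares_from_z(z))
    # Very negative Z_alpha: almost all the mass sits on alpha*.
    print(shares_from_z([-50.0, -60.0]))
```

In particular, driving every $Z_{\alpha}$ to $-\infty$ pushes the share of $\alpha^{\ast}$ to one, which is the equivalence used in the proof.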

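To proceed, fix some probability threshold $\varepsilon > 0$ and a neighborhood $U_{0}$ of $x^{\ast}$ in the state space. Since $x^{\ast}$ is a strict equilibrium of the noise-adjusted game, there exists a neighborhood $U \subseteq U_{0}$ of $x^{\ast}$ and some $m > 0$ such that

 $$v^{\sigma}_{\alpha^{\ast}}(x) - v^{\sigma}_{\alpha}(x) \geq m \quad \text{for all } x \in U \text{ and for all } \alpha \in \mathcal{A}^{\ast}. \tag{3.21}$$

Let $M > 0$ be sufficiently large so that $X(t) \in U$ if $Z_{\alpha}(t) \leq -M$ for all $\alpha \in \mathcal{A}^{\ast}$; we will show that if $M$ is chosen suitably (in terms of $\varepsilon$) and $Z_{\alpha}(0) \leq -2M$, then $X(t) \in U$ for all $t \geq 0$ and $Z_{\alpha}(t) \to -\infty$ with probability at least $1 - \varepsilon$, i.e. $x^{\ast}$ is stochastically asymptotically stable.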

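To that end, take $Z_{\alpha}(0) \leq -2M$ in (3.20) and define the first exit time:

 $$\tau_{U} = \inf\{t > 0 : X(t) \notin U\}. \tag{3.22}$$

By applying (3.17), we then get:

 $$dZ_{\alpha} = dY_{\alpha} - dY_{\alpha^{\ast}} = [v^{\sigma}_{\alpha} - v^{\sigma}_{\alpha^{\ast}}]\,dt - d\xi, \tag{3.23}$$

where the martingale term $\xi$ is defined as in (3.17), taking $\beta = \alpha^{\ast}$. Hence, for all $t \leq \tau_{U}$, we will have:

 $$Z_{\alpha}(t) = Z_{\alpha}(0) + \int_{0}^{t} \big[v^{\sigma}_{\alpha}(X(s)) - v^{\sigma}_{\alpha^{\ast}}(X(s))\big]\,ds - \xi(t) \leq -2M - mt - \xi(t). \tag{3.24}$$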

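By the time-change theorem for martingales (Øksendal, 2007, Cor. 8.5.4), there exists a standard Wiener process $\widetilde{W}(t)$ such that $\xi(t) = \widetilde{W}(\rho(t))$ where $\rho = [\xi, \xi]$ denotes the quadratic variation of $\xi$; as such, we will have $Z_{\alpha}(t) \leq -M$ whenever $\widetilde{W}(\rho(t)) \geq -M - mt$. However, with the noise coefficients $\sigma$ Lipschitz over the state space, we readily get $\rho(t) \leq Kt$ for some positive constant $K$, so it suffices to show that the hitting time

 $$\tau_{0} = \inf\{t > 0 : \widetilde{W}(t) = -M - mt/K\} \tag{3.25}$$

is finite with probability not exceeding $\varepsilon$. Indeed, if a trajectory of $\widetilde{W}(t)$ has $\widetilde{W}(t) \geq -M - mt/K$ for all $t \geq 0$, we will also have

 $$\widetilde{W}(\rho(t)) \geq -M - m\rho(t)/K \geq -M - mt, \tag{3.26}$$

so $\tau_{U}$ is infinite for every trajectory of $\widetilde{W}$ with infinite $\tau_{0}$, hence $\mathbb{P}(\tau_{U} < \infty) \leq \mathbb{P}(\tau_{0} < \infty)$. Lemma A.2 then shows that $\mathbb{P}(\tau_{0} < \infty) = \exp(-2Mm/K)$, so, if we take $M > -(2m)^{-1} K \log\varepsilon$, we get $\mathbb{P}(\tau_{U} = \infty) \geq 1 - \varepsilon$. Conditioning on the event $\tau_{U} = \infty$, Lemma A.1 applied to (3.24) yields

 $$Z_{\alpha}(t) \leq -2M - mt - \xi(t) \sim -mt \to -\infty \quad \text{(a.s.)} \tag{3.27}$$

so $\lim_{t\to\infty} X(t) = x^{\ast}$ with probability at least $1 - \varepsilon$, as was to be shown. ∎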
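The last step of the proof chooses $M$ so that the hitting time (3.25) is finite with probability at most the prescribed threshold. As a hedged illustration, assume the classical formula for a standard Wiener process hitting a receding line: for $a, b > 0$, the line $-a - bt$ is reached in finite time with probability $\exp(-2ab)$. With $a = M$ and $b = m/K$, the required $M$ can then be computed as below; the function names and the numerical values are hypothetical.

```python
import math


def hitting_probability(M, m, K):
    """Probability that a standard Wiener process ever hits the line -M - (m / K) t.

    Uses the classical exp(-2ab) formula with a = M and b = m / K (assumed here).
    """
    return math.exp(-2.0 * M * m / K)


def smallest_M(eps, m, K):
    """Smallest M for which hitting_probability(M, m, K) is at most eps."""
    return K * math.log(1.0 / eps) / (2.0 * m)


if __name__ == "__main__":
    eps, m, K = 0.05, 0.2, 3.0          # illustrative values only
    M = smallest_M(eps, m, K)
    print(M, hitting_probability(M, m, K))
```

The threshold grows only logarithmically in $1/\varepsilon$, but linearly in $K/m$: a larger noise bound or a smaller payoff margin forces a smaller starting neighborhood.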

###### Remark 3.7.

As mentioned before, Hofbauer and Imhof (2009) state a similar “evolutionary folk theorem” in the context of single-population random matching games under the stochastic replicator dynamics with aggregate shocks (2.13). In particular, Hofbauer and Imhof (2009) consider the modified game:

 $$v^{\sigma}_{\alpha}(x) = v_{\alpha}(x) - \tfrac{1}{2}\sigma_{\alpha}^{2}, \tag{3.28}$$

where $\sigma_{\alpha}$ denotes the intensity of the aggregate shocks in (2.13), and they show that strict Nash equilibria of this noise-adjusted game are stochastically asymptotically stable under (2.13). It is interesting to note that the adjustments (3.1) and (3.28) do not coincide: the payoff shocks affect the deterministic replicator equation (RD) in a different way than the aggregate shocks of (2.13). Heuristically, in the model of Fudenberg and Harris (1992), noise is detrimental because, for a given expected growth rate, noise almost surely lowers the long-term average geometric growth rate of the total number of individuals playing $\alpha$ by the quantity $\tfrac{1}{2}\sigma_{\alpha}^{2}$. In a geometric growth process, the quantities that matter (the proper fitness measures) are these long-term geometric growth rates, so the relevant payoffs are those of this modified game. (In a discrete-time setting where the population is multiplied at each step by a random growth factor, what we mean is that the quantity that a.s. governs the long-term growth of the population is not the arithmetic mean of the growth factors, but their geometric mean.) In our model, noise is not detrimental, but if it is strong enough compared to the deterministic drift, then, with positive probability, it may lead to outcomes other than those of the deterministic model. Instead, the assumptions of Theorems 3.1 and 3.3 should be interpreted as guaranteeing that the deterministic drift prevails. One way to see this is to note that if one strategy strictly dominates another in the original game and both strategies are affected by the same noise intensity $\sigma$, then the former need not dominate the latter in the modified game defined by (3.1), unless the payoff margin in the original game is always greater than $\sigma^{2}$.
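The difference between the two adjustments can be made concrete with a small computation. The sketch below assumes that the adjustment (3.1) takes the form $v_{\alpha} - \tfrac{1}{2}(1 - 2x_{\alpha})\sigma_{\alpha}^{2}$, which is the form suggested by the drift of (3.17), and it considers a dominating strategy $\beta$ and a dominated strategy $\alpha$ with a constant payoff margin and a common noise intensity. The function names and the grid search are illustrative choices only.

```python
def gap_payoff_shocks(margin, sigma, x_alpha, x_beta):
    """Modified payoff of beta minus that of alpha under the assumed form of (3.1)."""
    return margin - sigma ** 2 * (x_alpha - x_beta)


def gap_aggregate_shocks(margin, sigma):
    """Same gap under (3.28): the common term sigma**2 / 2 cancels out."""
    return margin


def worst_case_gap(margin, sigma, grid=100):
    """Minimum of gap_payoff_shocks over a grid of states with x_alpha + x_beta <= 1."""
    worst = float("inf")
    for i in range(grid + 1):
        for j in range(grid + 1 - i):
            worst = min(worst, gap_payoff_shocks(margin, sigma, i / grid, j / grid))
    return worst


if __name__ == "__main__":
    for margin in (0.5, 1.5):
        print(margin, worst_case_gap(margin, sigma=1.0), gap_aggregate_shocks(margin, 1.0))
```

Under these assumptions the worst case occurs when the whole population plays the dominated strategy, where the gap equals the margin minus `sigma**2`; with aggregate shocks of equal intensity the gap is unaffected.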

###### Remark 3.8.

It is also worth contrasting Theorem 3.3 to the unconditional convergence and stability results of Mertikopoulos and Moustakas (2010) for the stochastic replicator dynamics of exponential learning (2.16). As in the case of dominated strategies, the reason for this qualitative difference is the distinct origins of the perturbation process: the Itô correction in (2.16) is “just right” with respect to the dual variables $Y_{\alpha}$, so a state