Electron. Commun. Probab. 0 (2012), no. 0, 1–. DOI: 10.1214/ECP.vVOL-PI. ISSN: 1083-589X. Electronic Communications in Probability.

# A rank-based mean field game in the strong formulation

This research is supported in part by the National Science Foundation under grant DMS-1613170.

Erhan Bayraktar (University of Michigan, United States of America. E-mail: erhan@umich.edu)    Yuchong Zhang (Columbia University, United States of America. E-mail: yz2915@columbia.edu)
###### Abstract

We discuss a natural game of competition and solve the corresponding mean field game with common noise when agents’ rewards are rank-dependent. We use this solution to provide an approximate Nash equilibrium for the finite player game and obtain the rate of convergence.
Keywords: Mean field games; competition; common noise; rank-dependent interaction; non-local interaction; strong formulation.

AMS MSC 2010: 60H; 91A.

Submitted to ECP on March 26, 2016, final version accepted on October 6, 2016.

## 1 Introduction

Mean field games (MFGs), introduced independently by [8] and [6], provide a useful approximation for finite player Nash equilibrium problems in which the players are coupled through their empirical distribution. In particular, the mean field game limit gives an approximate Nash equilibrium in which the agents’ decision making is decoupled. In this paper we consider a particular game in which the interaction of the players is through their ranks. Our main goal is to construct an approximate Nash equilibrium for a finite player game when the agents’ dynamics are modulated by common noise.

Rank-based mean field games, which have non-local mean field interactions, have been suggested in [4] and analyzed more generally in the recent paper by Carmona and Lacker [3] using the weak formulation, when there is no common noise. There are currently no results on rank-dependent mean field games with common noise. In order to solve the problem with common noise, we will make use of the mechanism in [7]: we solve the strong formulation of the rank-dependent mean field game without common noise and then observe that purely rank-dependent reward functions are translation invariant.

The rest of the paper is organized as follows: In Section 2 we introduce the N-player game in which the players are coupled through a rank-based reward function. In Section 3 we consider the case without common noise. We first find the mean field limit, discuss the uniqueness of the Nash equilibrium, and construct an approximate Nash equilibrium using the mean field limit. Using these results, in Section 4 we apply the mechanism in [7] and obtain the corresponding results in the presence of common noise.

## 2 The N-player game

We consider $N$ players, each of whom controls her own state variable and is rewarded based on her rank. We will denote by $X_{i,t}$ the $i$-th player’s state variable, and assume that it satisfies the following stochastic differential equation (SDE)

$$dX_{i,t} = a_{i,t}\,dt + \sigma\,dB_{i,t} + \sigma_0\,dW_t, \qquad X_{i,0} = 0,$$

where $a_{i,t}$ is the control exerted by agent $i$, and $B_1,\ldots,B_N$ and $W$ are independent standard Brownian motions defined on some filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mathbb{P})$, representing the idiosyncratic noises and the common noise, respectively. The game ends at time $T$, when each player receives a rank-based reward minus the running cost of effort, which we assume to be quadratic: $c\,a_{i,t}^2$ for some constant $c > 0$.
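These dynamics are straightforward to simulate. The following is a minimal Euler–Maruyama sketch (the function name, step counts, and the zero-control choice are illustrative, not from the paper); note that the common-noise increment is shared by all players while each idiosyncratic increment is drawn separately:

```python
import math
import random

def simulate_states(n_players, n_steps, T, sigma, sigma0, control, seed=0):
    """Euler-Maruyama simulation of dX_i = a_i dt + sigma dB_i + sigma0 dW.

    `control(t, x)` is a Markovian feedback rule applied by every player;
    B_1, ..., B_N are idiosyncratic noises and W is the common noise.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sq = math.sqrt(dt)
    X = [0.0] * n_players              # X_{i,0} = 0 for all players
    for k in range(n_steps):
        t = k * dt
        dW = sq * rng.gauss(0.0, 1.0)  # common-noise increment, shared
        for i in range(n_players):
            dB = sq * rng.gauss(0.0, 1.0)  # idiosyncratic increment
            X[i] += control(t, X[i]) * dt + sigma * dB + sigma0 * dW
    return X

# zero effort: terminal states are just (correlated) Gaussian variables
states = simulate_states(n_players=50, n_steps=200, T=1.0,
                         sigma=1.0, sigma0=0.5, control=lambda t, x: 0.0)
```

Any Markovian feedback control can be plugged in through the `control` argument, which is how the equilibrium strategies constructed later would be tested numerically.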

In order to precisely define the rank-based reward, let

$$\bar\mu^N := \frac{1}{N}\sum_{i=1}^N \delta_{X_{i,T}}$$

denote the empirical measure of the terminal state of the $N$-player system. Then $\bar\mu^N(-\infty, X_{i,T}]$ gives the fraction of players that finish the same or worse than player $i$. Let $R: \mathbb{R}\times[0,1]\to\mathbb{R}$ be a bounded continuous function that is non-decreasing in both arguments. For any probability measure $\mu$ on $\mathbb{R}$, write $R_\mu(x) := R(x, F_\mu(x))$, where $F_\mu$ denotes the cumulative distribution function of $\mu$. The reward player $i$ receives is given by

$$R_{\bar\mu^N}(X_{i,T}) = R\big(X_{i,T},\, \bar\mu^N(-\infty, X_{i,T}]\big) = R\big(X_{i,T},\, F_{\bar\mu^N}(X_{i,T})\big).$$

When $R$ is independent of its first argument, the compensation scheme is purely rank-based. In general, we could have a mixture of absolute performance compensation and relative performance compensation. The objective of each player is to observe the progress of all players and choose her effort level to maximize the expected payoff, while anticipating the other players’ strategies.
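The reward scheme above is simple to compute from a sample of terminal states: the rank of player $i$ is the value of the empirical CDF at $X_{i,T}$. A short sketch (the helper names and the choice $R(x,r) = r^2$ are illustrative only):

```python
def empirical_rank(xs, x):
    """Fraction of players finishing at or below x, i.e. mu^N(-inf, x]."""
    return sum(1 for y in xs if y <= x) / len(xs)

def reward(R, xs, i):
    """Terminal payoff of player i under the rank-based scheme."""
    return R(xs[i], empirical_rank(xs, xs[i]))

# purely rank-based example: R depends only on the rank, convexly
R = lambda x, r: r ** 2
xs = [0.3, -1.2, 0.7, 0.1]          # terminal states of 4 players
payoffs = [reward(R, xs, i) for i in range(len(xs))]
# the best finisher (0.7) has rank 1 and collects the largest payoff
```

Since $R$ is non-decreasing in the rank argument, payoffs are ordered the same way as the terminal states.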

The players’ equilibrium expected payoffs, as functions of time and state variables, satisfy a system of coupled nonlinear partial differential equations subject to discontinuous boundary conditions, which appears to be analytically intractable. Fortunately, in a large-population game, the impact of any individual on the whole population is very small. So it is often good enough for each player to ignore the private state of any other individual and simply optimize against the aggregate distribution of the population. As a consequence, the equilibrium strategies decentralize in the limiting game as $N\to\infty$. We shall use the mean field limit to construct approximate Nash equilibria for the $N$-player game, both with and without common noise.

## 3 Mean field approximation when there is no common noise

In this section, we assume $\sigma_0 = 0$. Solving the mean field game consists of two sub-problems: a stochastic control problem and a fixed-point problem (also called the consistency condition). For any Polish space $S$, denote by $\mathcal{P}(S)$ the space of probability measures on $S$, and by $\mathcal{P}_1(S)$ the subset of measures with a finite first moment.

We first fix $\mu\in\mathcal{P}(\mathbb{R})$, a candidate distribution of the terminal state of the population, and consider a single player’s optimization problem:

$$v(t,x) := \sup_a \mathbb{E}_{t,x}\Big[R_\mu(X_T) - \int_t^T c\,a_s^2\,ds\Big] \qquad (3.1)$$

where

$$dX_s = a_s\,ds + \sigma\,dB_s, \qquad (3.2)$$

where $B$ is a standard Brownian motion, and $a$ ranges over the set of progressively measurable processes satisfying $\mathbb{E}\int_t^T a_s^2\,ds < \infty$. The associated dynamic programming equation is

$$v_t + \sup_a\Big\{a v_x + \tfrac12\sigma^2 v_{xx} - c a^2\Big\} = 0$$

with terminal condition $v(T,x) = R_\mu(x)$. Using the first-order condition, we obtain that the candidate optimizer is $a^* = \frac{v_x}{2c}$, and the Hamilton-Jacobi-Bellman (HJB) equation can be written as

$$v_t + \frac12\sigma^2 v_{xx} + \frac{(v_x)^2}{4c} = 0.$$

The above equation can be linearized using the Cole-Hopf transformation $u := \exp\big(\frac{v}{2c\sigma^2}\big)$, giving

$$u_t + \frac12\sigma^2 u_{xx} = 0.$$

Together with the boundary condition $u(T,x) = \exp\big(\frac{R_\mu(x)}{2c\sigma^2}\big)$, we can easily write down the solution:

$$u(t,x) = \mathbb{E}\Big[\exp\Big(\frac{1}{2c\sigma^2}\,R_\mu\big(x + \sigma\sqrt{T-t}\,Z\big)\Big)\Big] \qquad (3.3)$$

where $Z$ is a standard normal random variable. Let us further write $u$ as an integral:

$$u(t,x) = \int_{-\infty}^{\infty} \exp\Big(\frac{1}{2c\sigma^2}\,R_\mu\big(x+\sigma\sqrt{T-t}\,z\big)\Big)\,\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{z^2}{2}\Big)\,dz = \int_{-\infty}^{\infty} \exp\Big(\frac{1}{2c\sigma^2}\,R_\mu(y)\Big)\,\frac{1}{\sqrt{2\pi\sigma^2(T-t)}}\exp\Big(-\frac{(y-x)^2}{2\sigma^2(T-t)}\Big)\,dy.$$

Using the dominated convergence theorem, we can differentiate under the integral sign and get

$$u_x(t,x) = \int_{-\infty}^{\infty} \exp\Big(\frac{1}{2c\sigma^2}\,R_\mu(y)\Big)\,\frac{1}{\sqrt{2\pi\sigma^2(T-t)}}\exp\Big(-\frac{(y-x)^2}{2\sigma^2(T-t)}\Big)\,\frac{y-x}{\sigma^2(T-t)}\,dy = \mathbb{E}\Big[\exp\Big(\frac{1}{2c\sigma^2}\,R_\mu\big(x+\sigma\sqrt{T-t}\,Z\big)\Big)\,\frac{Z}{\sigma\sqrt{T-t}}\Big]. \qquad (3.4)$$

Similarly, we obtain

$$u_{xx}(t,x) = \mathbb{E}\Big[\exp\Big(\frac{1}{2c\sigma^2}\,R_\mu\big(x+\sigma\sqrt{T-t}\,Z\big)\Big)\,\frac{Z^2-1}{\sigma^2(T-t)}\Big]. \qquad (3.5)$$

Using (3.3)-(3.5), together with the boundedness and monotonicity of $R$, we easily get the following estimates. Note that all bounds are independent of $\mu$.

###### Lemma 3.1.

The functions $u$, $u_x$, $v_x$ and $v_{xx}$ satisfy

$$e^{-K_1} \le u \le e^{K_1}, \qquad 0 \le u_x \le \frac{e^{K_1}}{\sigma\sqrt{T-t}}\sqrt{\frac{2}{\pi}}, \qquad 0 \le v_x \le \frac{2c\,\sigma K_2}{\sqrt{T-t}}\sqrt{\frac{2}{\pi}}, \qquad |v_{xx}| \le \frac{C}{T-t},$$

where $K_1 := \frac{\|R\|_\infty}{2c\sigma^2}$, $K_2 := e^{2K_1}$, and $C$ is a constant depending only on $c$, $\sigma$ and $\|R\|_\infty$.

Since $v_{xx}$ is bounded on $[0,t]\times\mathbb{R}$ for any $t < T$, the drift coefficient $a^*(t,x) = \frac{v_x(t,x)}{2c}$ is Lipschitz continuous in $x$ away from the terminal time. It follows that the optimally controlled state process, denoted by $X^*$, has a strong solution on $[0,T)$. Observe that

$$0 \le \int_t^T a^*(s, X^*_s)\,ds \le \int_t^T \frac{\sigma K_2\sqrt{2/\pi}}{\sqrt{T-s}}\,ds = 2\sigma K_2\sqrt{\frac{2(T-t)}{\pi}} < \infty.$$

So the optimal cumulative effort is bounded by some constant independent of $\mu$. It also implies that $X^*_t$ has a well-defined limit as $t\to T$. A standard verification theorem yields that the solution to the HJB equation is the value function of the problem (3.1)-(3.2), and that $a^*(t,x) = \frac{v_x(t,x)}{2c}$ is the optimal Markovian feedback control. Finally, using the dominated convergence theorem again, we can show that for $t < T$,

$$\lim_{x\to\pm\infty} u_x(t,x) = 0.$$

The same limits also hold for $v_x$, since $u$ is bounded away from zero. In other words, the optimal effort level is small when the progress is very large in absolute value. This agrees with many real-life observations: when a player has a very big lead, it is easy for her to slack off; and when one is too far behind, she often gives up on the game instead of trying to catch up.
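Because of the Cole-Hopf transformation, the optimal feedback $a^* = \frac{v_x}{2c} = \sigma^2\frac{u_x}{u}$ can be evaluated directly from the expectations (3.3)-(3.4). The following Monte Carlo sketch does this for one illustrative choice of $R_\mu$ (the function names, sample size, and the choice $R_\mu = $ standard normal CDF are assumptions for the example, not from the paper):

```python
import math
import random

def u_and_ux(R_mu, t, x, T, c, sigma, n_mc=200000, seed=1):
    """Monte Carlo evaluation of (3.3) and (3.4): u and u_x at (t, x)."""
    rng = random.Random(seed)
    s = sigma * math.sqrt(T - t)
    u = ux = 0.0
    for _ in range(n_mc):
        z = rng.gauss(0.0, 1.0)
        w = math.exp(R_mu(x + s * z) / (2.0 * c * sigma ** 2))
        u += w
        ux += w * z / s      # the Z/(sigma sqrt(T-t)) factor in (3.4)
    return u / n_mc, ux / n_mc

def optimal_feedback(R_mu, t, x, T, c, sigma):
    """a*(t,x) = v_x/(2c) = sigma^2 u_x/u, via Cole-Hopf."""
    u, ux = u_and_ux(R_mu, t, x, T, c, sigma)
    return sigma ** 2 * ux / u

# illustrative reward: the standard normal CDF (bounded, non-decreasing)
R_mu = lambda y: 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
a_star = optimal_feedback(R_mu, t=0.0, x=0.0, T=1.0, c=1.0, sigma=1.0)
```

Since $R_\mu$ is non-decreasing, the estimate of $u_x$ is positive, so the player exerts positive effort; it also stays below the bound $\sigma K_2\sqrt{2/\pi}/\sqrt{T-t}$ of the lemma above.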

### 3.1 Existence of a Nash equilibrium

For each fixed $\mu$, solving the stochastic control problem (3.1)-(3.2) yields a value function $v(\cdot,\cdot;\mu)$ and a best response $a^*(t,x;\mu) = \frac{v_x(t,x;\mu)}{2c}$. Suppose the game is started at time zero with zero initial progress; then the optimally controlled state process of the generic player satisfies the SDE

$$dX_t = \frac{v_x(t, X_t;\mu)}{2c}\,dt + \sigma\,dB_t, \qquad X_0 = 0. \qquad (3.6)$$

Finding a Nash equilibrium for the limiting game is equivalent to finding a fixed point of the mapping $\Phi: \mu\mapsto \mathcal{L}(X^\mu_T)$, where $X^\mu$ solves (3.6) and $\mathcal{L}(\cdot)$ denotes the law of its argument. We shall sometimes refer to such a fixed point as an equilibrium measure.
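The map $\Phi$ can be approximated numerically: represent $\mu$ by a sample, form $R_\mu$ through the empirical CDF, compute the best-response drift from (3.3)-(3.4) on a fixed grid of normal quantile nodes, and simulate (3.6). The sketch below iterates this picture; all helper names, grid sizes, and the choice $R(x,r) = r$ are assumptions for illustration, not an algorithm from the paper:

```python
import bisect
import math
import random
from statistics import NormalDist

def make_R_mu(sample, R):
    """R_mu(x) = R(x, F_mu(x)) for the empirical measure of `sample`."""
    xs = sorted(sample)
    return lambda x: R(x, bisect.bisect_right(xs, x) / len(xs))

def best_response(R_mu, T, c, sigma, z_nodes):
    """a*(t,x) = sigma^2 u_x/u, evaluating (3.3)-(3.4) on equal-weight
    normal quantile nodes (a crude quadrature surrogate)."""
    def a_star(t, x):
        s = sigma * math.sqrt(T - t)
        u = ux = 0.0
        for z in z_nodes:
            w = math.exp(R_mu(x + s * z) / (2 * c * sigma ** 2))
            u += w
            ux += w * z / s
        return sigma ** 2 * ux / u
    return a_star

def apply_Phi(sample, R, T, c, sigma, n_paths, n_steps, rng, z_nodes):
    """One step of Phi: best-respond to mu, simulate (3.6), and return the
    empirical law of X_T as a new sample."""
    a_star = best_response(make_R_mu(sample, R), T, c, sigma, z_nodes)
    dt = T / n_steps
    out = []
    for _ in range(n_paths):
        x = 0.0
        for k in range(n_steps):
            x += a_star(k * dt, x) * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
        out.append(x)
    return out

rng = random.Random(3)
z_nodes = [NormalDist().inv_cdf((k + 0.5) / 15) for k in range(15)]
T, c, sigma, R = 1.0, 1.0, 1.0, lambda x, r: r   # purely rank-based
mu0 = [sigma * math.sqrt(T) * rng.gauss(0, 1) for _ in range(200)]  # zero-effort law
mu1 = apply_Phi(mu0, R, T, c, sigma, 200, 40, rng, z_nodes)
mu2 = apply_Phi(mu1, R, T, c, sigma, 200, 40, rng, z_nodes)
```

In this example successive iterates stay close in the 1-Wasserstein sense (for equal-size samples, computed by matching sorted values), which is the behavior one would hope for near a fixed point; the sketch is only a heuristic, since the paper establishes existence by Schauder's theorem rather than by iteration.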

###### Theorem 3.2.

The mapping $\Phi$ has a fixed point.

###### Proof.

Similar to [1], we will use Schauder’s fixed point theorem. Observe that for any $\mu\in\mathcal{P}(\mathbb{R})$, we have

$$\mathbb{E}\big[|X^\mu_T|^2\big] \le \mathbb{E}\bigg[\Big(2\sigma K_2\sqrt{\frac{2T}{\pi}} + \sigma|B_T|\Big)^2\bigg] =: C_0.$$

This implies that the set $\{\Phi(\mu): \mu\in\mathcal{P}(\mathbb{R})\}$ is tight in $\mathcal{P}(\mathbb{R})$, hence relatively compact for the topology of weak convergence by Prokhorov’s theorem. Recall that $\mathcal{P}_1(\mathbb{R})$ denotes the set of probability measures on $\mathbb{R}$ with a finite first moment. Equip $\mathcal{P}_1(\mathbb{R})$ with the topology induced by the 1-Wasserstein metric:

$$W_1(\mu,\mu') := \inf\Big\{\int_{\mathbb{R}^2}|x-y|\,d\pi(x,y) : \pi\in\mathcal{P}_1(\mathbb{R}^2) \text{ with marginals } \mu \text{ and } \mu'\Big\} = \sup\Big\{\int_{\mathbb{R}}\psi\,d\mu - \int_{\mathbb{R}}\psi\,d\mu' : \psi\in\mathrm{Lip}_1(\mathbb{R})\Big\}.$$

Here $\mathrm{Lip}_1(\mathbb{R})$ denotes the space of Lipschitz continuous functions on $\mathbb{R}$ whose Lipschitz constant is bounded by one. It is known that $(\mathcal{P}_1(\mathbb{R}), W_1)$ is a complete separable metric space (see e.g. [9, Theorem 6.18]). We shall work with a subset of $\mathcal{P}_1(\mathbb{R})$ defined by

$$E := \Big\{\mu\in\mathcal{P}_1(\mathbb{R}) : \int_{\mathbb{R}}|x|^2\,d\mu(x) \le C_0\Big\}.$$

It is easy to check that $E$ is non-empty, convex and closed (for the topology induced by the $W_1$ metric). Moreover, one can show using [9, Definition 6.8(iii)] that any weakly convergent sequence in $E$ is also $W_1$-convergent. Therefore, $E$ is also relatively compact for the topology induced by the $W_1$ metric. So we have found a non-empty, convex and compact set such that $\Phi$ maps $E$ into itself. It remains to show that $\Phi$ is continuous on $E$. In the rest of the proof, the constant $C$ may change from line to line.

Let $\mu_k, \mu\in E$ be such that $W_1(\mu_k,\mu)\to 0$ as $k\to\infty$. We wish to show $W_1(\Phi(\mu_k),\Phi(\mu))\to 0$. Note that

$$W_1(\Phi(\mu_k),\Phi(\mu)) \le \mathbb{E}\big[|X^{\mu_k}_T - X^\mu_T|\big] \le \frac{1}{2c}\int_0^T \mathbb{E}\big[|v_x(t, X^{\mu_k}_t;\mu_k) - v_x(t, X^\mu_t;\mu)|\big]\,dt.$$

From Lemma 3.1, we know that $|v_x| \le \frac{C}{\sqrt{T-t}}$. Since $\int_0^T \frac{dt}{\sqrt{T-t}} < \infty$, thanks to the dominated convergence theorem, it suffices to show for each $t\in[0,T)$,

$$\mathbb{E}\big[|v_x(t, X^{\mu_k}_t;\mu_k) - v_x(t, X^\mu_t;\mu)|\big] \to 0.$$

By Lemma 3.1 and the mean value theorem, we have that

$$|v_x(t, X^{\mu_k}_t;\mu_k) - v_x(t, X^\mu_t;\mu)| \le |v_x(t, X^{\mu_k}_t;\mu_k) - v_x(t, X^\mu_t;\mu_k)| + |v_x(t, X^\mu_t;\mu_k) - v_x(t, X^\mu_t;\mu)| \le \frac{C}{T-t}|X^{\mu_k}_t - X^\mu_t| + |v_x(t, X^\mu_t;\mu_k) - v_x(t, X^\mu_t;\mu)|.$$

So to show the convergence above, it suffices to show that for each fixed $t\in[0,T)$,

$$\mathbb{E}\big[|v_x(t, X^\mu_t;\mu_k) - v_x(t, X^\mu_t;\mu)|\big] \to 0, \qquad (3.7)$$

and

$$\mathbb{E}\big[|X^{\mu_k}_t - X^\mu_t|\big] \to 0. \qquad (3.8)$$

We first show (3.7). Using the estimates in Lemma 3.1, we get

$$\mathbb{E}\big[|v_x(t, X^\mu_t;\mu_k) - v_x(t, X^\mu_t;\mu)|\big] = C\,\mathbb{E}\bigg[\bigg|\frac{u(t,X^\mu_t;\mu)\big[u_x(t,X^\mu_t;\mu_k) - u_x(t,X^\mu_t;\mu)\big] + u_x(t,X^\mu_t;\mu)\big[u(t,X^\mu_t;\mu) - u(t,X^\mu_t;\mu_k)\big]}{u(t,X^\mu_t;\mu_k)\,u(t,X^\mu_t;\mu)}\bigg|\bigg] \le C\,\mathbb{E}\big[|u_x(t,X^\mu_t;\mu_k) - u_x(t,X^\mu_t;\mu)|\big] + \frac{C}{\sqrt{T-t}}\,\mathbb{E}\big[|u(t,X^\mu_t;\mu) - u(t,X^\mu_t;\mu_k)|\big].$$

Since all integrands are bounded, to show the expectations converge to zero, it suffices to check that the integrands converge to zero a.s. Fix $\omega$; we know from (3.4) that

$$\big|u_x(t, X^\mu_t(\omega);\mu_k) - u_x(t, X^\mu_t(\omega);\mu)\big| \le C\,\mathbb{E}\Big[\frac{|Z|}{\sigma\sqrt{T-t}}\,\big|R_{\mu_k}(x+\sigma\sqrt{T-t}\,Z) - R_\mu(x+\sigma\sqrt{T-t}\,Z)\big|\Big]_{x = X^\mu_t(\omega)}.$$

Since $W_1(\mu_k,\mu)\to 0$, $\mu_k$ also converges to $\mu$ weakly, and the cumulative distribution function $F_{\mu_k}$ converges to $F_\mu$ at every point at which $F_\mu$ is continuous. It follows from the continuity of $R$ that $R_{\mu_k}$ converges to $R_\mu$ at every point at which $F_\mu$ is continuous. Since $F_\mu$ has at most countably many points of discontinuity, the random variable inside the expectation converges to zero a.s. The dominated convergence theorem then allows us to interchange the limit and the expectation, giving that

$$\big|u_x(t, X^\mu_t(\omega);\mu_k) - u_x(t, X^\mu_t(\omega);\mu)\big| \to 0.$$

Similarly, from (3.3) we obtain

$$\big|u(t, X^\mu_t(\omega);\mu_k) - u(t, X^\mu_t(\omega);\mu)\big| \le C\,\mathbb{E}\Big[\big|R_{\mu_k}(x+\sigma\sqrt{T-t}\,Z) - R_\mu(x+\sigma\sqrt{T-t}\,Z)\big|\Big]_{x = X^\mu_t(\omega)}.$$

Again, using that $F_\mu$ has at most countably many points of discontinuity, one can show that

$$\big|u(t, X^\mu_t(\omega);\mu_k) - u(t, X^\mu_t(\omega);\mu)\big| \to 0.$$

Putting everything together, we have proved (3.7).

Next, we show (3.8) by Gronwall’s inequality. Let $\epsilon > 0$ be given. For any $r\in[0,t]$,

$$\mathbb{E}\big[|X^{\mu_k}_r - X^\mu_r|\big] \le \frac{1}{2c}\int_0^r \mathbb{E}\big[|v_x(s, X^{\mu_k}_s;\mu_k) - v_x(s, X^\mu_s;\mu)|\big]\,ds \le \int_0^r \mathbb{E}\Big[\frac{C}{T-s}|X^{\mu_k}_s - X^\mu_s| + \frac{1}{2c}|v_x(s, X^\mu_s;\mu_k) - v_x(s, X^\mu_s;\mu)|\Big]\,ds.$$

By (3.7) and the bounded convergence theorem, we obtain

$$\int_0^t \mathbb{E}\big[|v_x(s, X^\mu_s;\mu_k) - v_x(s, X^\mu_s;\mu)|\big]\,ds \to 0.$$

So for $k$ large enough, we have

$$\mathbb{E}\big[|X^{\mu_k}_r - X^\mu_r|\big] \le \frac{C}{T-t}\int_0^r \mathbb{E}\big[|X^{\mu_k}_s - X^\mu_s|\big]\,ds + \epsilon\, e^{-\frac{Ct}{T-t}}.$$

By Gronwall’s inequality,

$$\mathbb{E}\big[|X^{\mu_k}_t - X^\mu_t|\big] \le \epsilon\, e^{-\frac{Ct}{T-t}} + \frac{C}{T-t}\int_0^t \epsilon\, e^{-\frac{Ct}{T-t}}\, e^{\frac{C(t-s)}{T-t}}\,ds = \epsilon.$$

This completes the proof of (3.8), and thus the continuity of $\Phi$. By Schauder’s fixed point theorem, there exists a fixed point of $\Phi$ in the set $E$. ∎
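The 1-Wasserstein metric used in the proof has a particularly simple form in one dimension: for two empirical measures with the same number of atoms, the optimal coupling matches sorted samples (quantile to quantile). A small sketch (the helper name is mine):

```python
def wasserstein1(xs, ys):
    """W_1 between two empirical measures with equally many atoms.

    In one dimension the optimal coupling pairs the k-th smallest atom of
    one sample with the k-th smallest atom of the other.
    """
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# translating a sample by c moves it exactly distance c in W_1,
# consistent with the dual (Lip_1) formulation above
sample = [0.0, 1.0, 2.5]
shifted = [x + 0.5 for x in sample]
```

This sorted-matching formula is what one would use to monitor convergence of empirical approximations of $\Phi$ in practice.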

### 3.2 Uniqueness of the Nash equilibrium

Let $\mathcal{M}\subseteq\mathcal{P}(\mathbb{R})$ be a class of measures in which uniqueness will be established. We first state a monotonicity assumption which is in the spirit of [8].

###### Assumption 3.3.

For any $\mu,\mu'\in\mathcal{M}$, we have

$$\int_{\mathbb{R}} (R_\mu - R_{\mu'})(x)\,d(\mu-\mu')(x) \le 0.$$
###### Remark 3.4.

Take $\mathcal{M}$ to be the set of all measures in $\mathcal{P}(\mathbb{R})$ that are absolutely continuous with respect to the Lebesgue measure; then Assumption 3.3 is satisfied if the reward function $R$ is Lipschitz continuous and

$$h(x, r_1, r_2) := \frac{R(x,r_1) - R(x,r_2)}{r_1 - r_2}, \qquad x\in\mathbb{R},\ (r_1,r_2)\in[0,1]^2\setminus\{r_1 = r_2\}$$

is differentiable and has non-negative partial derivatives. This includes any continuously differentiable $R$ which satisfies (i) $r\mapsto R(x,r)$ is convex for each $x$, and (ii) $\partial_r R(x,r)$ is non-decreasing in $x$. To see why this is sufficient to verify Assumption 3.3, first note that for any $\mu,\mu'\in\mathcal{M}$, $F_\mu$ and $F_{\mu'}$ are absolutely continuous. Using integration by parts for absolutely continuous functions, we have

$$\int_{\mathbb{R}} (R_\mu - R_{\mu'})(x)\,d(\mu-\mu')(x) = \int_{\mathbb{R}} (F_\mu - F_{\mu'})(x)\, h(x, F_\mu(x), F_{\mu'}(x))\,d(F_\mu - F_{\mu'})(x) = -\int_{\mathbb{R}} (F_\mu - F_{\mu'})(x)\,d\big[(F_\mu - F_{\mu'})(x)\, h(x, F_\mu(x), F_{\mu'}(x))\big] = -\int_{\mathbb{R}} (F_\mu - F_{\mu'})^2(x)\,dh(x, F_\mu(x), F_{\mu'}(x)) - \int_{\mathbb{R}} (R_\mu - R_{\mu'})(x)\,d(\mu-\mu')(x).$$

Re-arranging terms and using that $h$ has non-negative partial derivatives, we get

$$2\int_{\mathbb{R}} (R_\mu - R_{\mu'})(x)\,d(\mu-\mu')(x) = -\int_{\mathbb{R}} (F_\mu - F_{\mu'})^2(x)\,\nabla h(x, F_\mu(x), F_{\mu'}(x))\cdot\big(dx, dF_\mu(x), dF_{\mu'}(x)\big) \le 0.$$

If one measures the rank of $x$ with respect to a given distribution $\mu$ using the "regular" cumulative distribution function $F_\mu$, then for the purely rank-based case, Assumption 3.3 is satisfied with $\mathcal{M} = \mathcal{P}(\mathbb{R})$ (see [5, Theorem B]).
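The monotonicity condition of Assumption 3.3 can be checked numerically for absolutely continuous measures by discretizing the integral $\int (R_\mu - R_{\mu'})\,d(\mu-\mu')$. The sketch below does this for two Gaussian measures and $R(x,r) = r^2$, which is convex in $r$ and independent of $x$, so the remark applies (the helper names, grid, and measure choices are illustrative assumptions):

```python
import math

def norm_cdf(x, m, s):
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))

def norm_pdf(x, m, s):
    return math.exp(-((x - m) / s) ** 2 / 2.0) / (s * math.sqrt(2.0 * math.pi))

def monotonicity_integral(R, mu, nu, lo=-10.0, hi=10.0, n=4000):
    """Midpoint-rule approximation of int (R_mu - R_nu)(x) d(mu - nu)(x)
    for two Gaussian measures mu = (mean, sd), nu = (mean, sd)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        rmu = R(x, norm_cdf(x, *mu))          # R_mu(x) = R(x, F_mu(x))
        rnu = R(x, norm_cdf(x, *nu))
        total += (rmu - rnu) * (norm_pdf(x, *mu) - norm_pdf(x, *nu)) * h
    return total

# convex-in-rank reward: Assumption 3.3 predicts a non-positive integral
I = monotonicity_integral(lambda x, r: r * r, (0.0, 1.0), (1.0, 1.5))
```

Here the integral comes out strictly negative, consistent with the identity $2\int(R_\mu - R_{\mu'})\,d(\mu-\mu') = -\int(F_\mu - F_{\mu'})^2\,dh \le 0$ derived above.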

###### Proposition 3.5.

Under Assumption 3.3, $\Phi$ has at most one fixed point in $\mathcal{M}$.

###### Proof.

Suppose $\mu$ and $\mu'$ are two fixed points of $\Phi$ in $\mathcal{M}$. To simplify notation, write $v := v(\cdot,\cdot;\mu)$ and $v' := v(\cdot,\cdot;\mu')$. Let $X^\mu$ and $X^{\mu'}$ be the optimally controlled state processes (starting at zero) in response to $\mu$ and $\mu'$, respectively. Using Itô’s lemma and the PDEs satisfied by $v$ and $v'$, it is easy to show that

$$\mathbb{E}\big[v(t, X^\mu_t)\big] = v(0,0) + \mathbb{E}\Big[\int_0^t \frac{1}{4c}(v_x)^2(s, X^\mu_s)\,ds\Big], \qquad (3.9)$$

and

$$\mathbb{E}\big[v'(t, X^\mu_t)\big] = v'(0,0) + \mathbb{E}\Big[\int_0^t \frac{1}{4c}\big[2v'_x v_x - (v'_x)^2\big](s, X^\mu_s)\,ds\Big]. \qquad (3.10)$$

Write $\Delta v := v - v'$; we obtain by subtracting (3.10) from (3.9) that

$$\mathbb{E}\big[\Delta v(t, X^\mu_t)\big] = \Delta v(0,0) + \mathbb{E}\Big[\int_0^t \frac{1}{4c}\big[(\Delta v)_x(s, X^\mu_s)\big]^2\,ds\Big].$$

Letting $t\to T$ and using the continuity of $v$ and $v'$ at the terminal time, we get

$$\mathbb{E}\big[(R_\mu - R_{\mu'})(X^\mu_T)\big] = \mathbb{E}\big[\Delta v(T, X^\mu_T)\big] = \Delta v(0,0) + \mathbb{E}\Big[\int_0^T \frac{1}{4c}\big[(\Delta v)_x(s, X^\mu_s)\big]^2\,ds\Big]. \qquad (3.11)$$

Now, exchange the roles of $\mu$ and $\mu'$. We also have

$$\mathbb{E}\big[(R_{\mu'} - R_\mu)(X^{\mu'}_T)\big] = -\mathbb{E}\big[\Delta v(T, X^{\mu'}_T)\big] = -\Delta v(0,0) + \mathbb{E}\Big[\int_0^T \frac{1}{4c}\big[(\Delta v)_x(s, X^{\mu'}_s)\big]^2\,ds\Big]. \qquad (3.12)$$

Adding (3.11) and (3.12), and using that $\mu = \mathcal{L}(X^\mu_T)$ and $\mu' = \mathcal{L}(X^{\mu'}_T)$, we get

$$0 \le \frac{1}{4c}\,\mathbb{E}\Big[\int_0^T \big[(\Delta v)_x(s, X^\mu_s)\big]^2 + \big[(\Delta v)_x(s, X^{\mu'}_s)\big]^2\,ds\Big] = \mathbb{E}\big[(R_\mu - R_{\mu'})(X^\mu_T)\big] + \mathbb{E}\big[(R_{\mu'} - R_\mu)(X^{\mu'}_T)\big] = \int_{\mathbb{R}} (R_\mu - R_{\mu'})(x)\,d(\mu-\mu')(x) \le 0,$$

where the last inequality follows from Assumption 3.3. This implies

$$v_x(s, X^{\mu'}_s) = v'_x(s, X^{\mu'}_s) \qquad d\mathbb{P}\times dt\text{-a.e.}$$

By the uniqueness of the solution of the SDE (3.6), we must have $X^\mu = X^{\mu'}$ a.s., and hence $\mu = \mu'$. ∎

### 3.3 Approximate Nash equilibrium of the N-player game

The MFG solution allows us to construct, using decentralized strategies, an approximate Nash equilibrium of the $N$-player game when $N$ is large. In the MFG literature, this is typically done using results from the propagation of chaos. Here we have a simpler problem, since the mean-field interaction does not enter the dynamics of the state process; it is this special structure that allows us to handle rank-based terminal payoffs, which fail to be Lipschitz continuous in general.

###### Definition 3.6.

A progressively measurable vector $a = (a_1,\ldots,a_N)$ is called an $\epsilon$-Nash equilibrium of the $N$-player game if

• $\mathbb{E}\int_0^T a_{i,t}^2\,dt < \infty$ for any $1\le i\le N$; and

• for any $1\le i\le N$, and any progressively measurable process $\beta$ satisfying $\mathbb{E}\int_0^T \beta_t^2\,dt < \infty$, we have

$$\mathbb{E}\Big[R_{\bar\mu^{N,a}}\big(X^{a_i}_{i,T}\big) - \int_0^T c\,a_{i,t}^2\,dt\Big] + \epsilon \ge \mathbb{E}\Big[R_{\bar\mu^{N,(a_{-i},\beta)}}\big(X^\beta_{i,T}\big) - \int_0^T c\,\beta_t^2\,dt\Big],$$

where $\bar\mu^{N,a}$ denotes the empirical measure of the terminal states under the strategy vector $a$, and $\bar\mu^{N,(a_{-i},\beta)}$ the one obtained when player $i$ deviates from $a$ by playing $\beta$.

We now state an additional Hölder condition on $R$ which allows us to get the convergence rate. It holds, for example, when $R(x,r) = f(x) + g(r)$, where $f$ is bounded continuous and $g$ is $\alpha$-Hölder continuous.

###### Assumption 3.7.

There exist constants $L > 0$ and $\alpha\in(0,1]$ such that $|R(x,r) - R(x,r')| \le L|r - r'|^\alpha$ for any $x\in\mathbb{R}$ and $r, r'\in[0,1]$.

###### Theorem 3.8.

Let Assumption 3.7 hold. For any fixed point $\mu$ of $\Phi$, the strategies

$$\bar a_{i,t} := (2c)^{-1}\,v_x\big(t, X^{\bar a_i}_{i,t};\mu\big), \qquad i = 1,\ldots,N$$

form an $\epsilon_N$-Nash equilibrium of the $N$-player game with $\epsilon_N = O(N^{-\alpha/2})$ as $N\to\infty$.

###### Proof.

Let $\mu$ be a fixed point of $\Phi$, and let $\bar a$ be defined as in the theorem statement. To keep the notation simple, we omit the superscript of any state process if it is controlled by the optimal Markovian feedback strategy $a^*(t,x) = \frac{v_x(t,x;\mu)}{2c}$. Let

$$V := v(0,0;\mu) = \mathbb{E}\Big[R_\mu(X_T) - \int_0^T \frac{1}{4c}\,v_x^2(s, X_s;\mu)\,ds\Big]$$

be the value of the limiting game, where $X$ satisfies (3.6), and

$$J^N_i := \mathbb{E}\Big[R_{\bar\mu^N}(X_{i,T}) - \int_0^T c\,\bar a_{i,s}^2\,ds\Big]$$

be the net gain of player $i$ in an $N$-player game if everybody uses the candidate approximate Nash equilibrium $\bar a$. Here $\bar\mu^N = \frac{1}{N}\sum_{i=1}^N \delta_{X_{i,T}}$. Since our state processes do not depend on the empirical measure (the interaction is only through the terminal payoff), each $X_i$ is simply an independent, identical copy of $X$. Hence

$$V = \mathbb{E}\Big[R_\mu(X_{i,T}) - \int_0^T c\,\bar a_{i,s}^2\,ds\Big].$$

Let us first show that $J^N_i$ and $V$ are close. We have

$$J^N_i - V = \mathbb{E}\big[R_{\bar\mu^N}(X_{i,T}) - R_\mu(X_{i,T})\big].$$

It follows from the $\alpha$-Hölder continuity of $R$ in its second argument that

$$|J^N_i - V| \le L\,\mathbb{E}\big[|F_{\bar\mu^N}(X_{i,T}) - F_\mu(X_{i,T})|^\alpha\big] \le L\,\mathbb{E}\big[\|\hat F^N_\mu - F_\mu\|_\infty^\alpha\big],$$

where $\hat F^N_\mu$ denotes the empirical cumulative distribution function of $N$ i.i.d. random variables with cumulative distribution function $F_\mu$. By the Dvoretzky-Kiefer-Wolfowitz inequality, we have

$$\mathbb{P}\big(\|\hat F^N_\mu - F_\mu\|_\infty > \epsilon\big) \le 2e^{-2N\epsilon^2}.$$

It follows that

$$|J^N_i - V| \le L\,\mathbb{E}\big[\|\hat F^N_\mu - F_\mu\|_\infty^\alpha\big] = L\int_0^\infty \mathbb{P}\big(\|\hat F^N_\mu - F_\mu\|_\infty^\alpha > z\big)\,dz \le L\int_0^\infty 2e^{-2Nz^{2/\alpha}}\,dz = \frac{2L}{(4N)^{\alpha/2}}\int_0^\infty e^{-\frac12 y^{2/\alpha}}\,dy = O(N^{-\alpha/2}) \quad \text{as } N\to\infty.$$
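The $N^{-1/2}$ decay of $\|\hat F^N_\mu - F_\mu\|_\infty$ behind this rate is easy to observe empirically. A small sketch using uniform samples, where the supremum over $x$ is attained at the sample points (the helper name and sample sizes are mine):

```python
import random

def sup_cdf_gap(n, rng):
    """||F_hat_N - F||_inf for n uniforms (F = identity on [0,1]); the
    supremum is attained just before or at an order statistic."""
    u = sorted(rng.random() for _ in range(n))
    gap = 0.0
    for i, x in enumerate(u):
        gap = max(gap, abs((i + 1) / n - x), abs(i / n - x))
    return gap

rng = random.Random(0)
# DKW: P(gap > eps) <= 2 exp(-2 N eps^2), so the gap shrinks like N^{-1/2}
g100 = sum(sup_cdf_gap(100, rng) for _ in range(50)) / 50
g10000 = sum(sup_cdf_gap(10000, rng) for _ in range(50)) / 50
```

Averaged over repetitions, the gap at $N = 10000$ is roughly ten times smaller than at $N = 100$, matching the $N^{-1/2}$ prediction, which Assumption 3.7 then converts into the $O(N^{-\alpha/2})$ payoff error.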

Next, consider the system where player $i$ makes a unilateral deviation from the candidate approximate Nash equilibrium $\bar a$; say, she chooses an admissible control $\beta$. Denote her controlled state process by $X^\beta_i$, and the state processes of all other players by $X_j$, $j\ne i$, as before. Let $\bar\nu^N$ be the corresponding empirical measure of the terminal states, and

$$J^{N,\beta}_i := \mathbb{E}\Big[R_{\bar\nu^N}\big(X^\beta_{i,T}\big) - \int_0^T c\,\beta_s^2\,ds\Big]$$

be the corresponding net gain for player $i$. We have

 JN,βi−V =\mathdsE[R¯