Electron. Commun. Probab. 0 (2012), no. 0.
A rank-based mean field game in the strong formulation

This research is supported in part by the National Science Foundation under grant DMS-1613170.
We discuss a natural game of competition and solve the corresponding mean field game with common noise when agents’ rewards are rank-dependent. We use this solution to provide an approximate Nash equilibrium for the finite player game and obtain the rate of convergence.
Keywords: Mean field games; competition; common noise; rank-dependent interaction; non-local interaction; strong formulation.
AMS MSC 2010: 60H; 91A.
Submitted to ECP on March 26, 2016, final version accepted on October 6, 2016.
Mean field games (MFGs), introduced independently by Lasry and Lions and by Huang, Malhamé and Caines, provide a useful approximation for finite-player Nash equilibrium problems in which the players are coupled through their empirical distribution. In particular, the mean field game limit gives an approximate Nash equilibrium in which the agents’ decision making is decoupled. In this paper we consider a particular game in which the players interact through their ranks. Our main goal is to construct an approximate Nash equilibrium for a finite-player game when the agents’ dynamics are modulated by common noise.
Rank-based mean field games, which have non-local mean field interactions, have been suggested in the literature and were analyzed more generally in the recent paper by Carmona and Lacker using the weak formulation, when there is no common noise. There are currently no results on rank-dependent mean field games with common noise. To solve the problem with common noise, we will make use of a known mechanism: we solve the strong formulation of the rank-dependent mean field game without common noise, and then observe that purely rank-dependent reward functions are translation invariant.
The rest of the paper is organized as follows. In Section 2 we introduce the N-player game, in which the players are coupled through a rank-based reward function. In Section 3 we consider the case without common noise: we first find the mean field limit, discuss the uniqueness of the Nash equilibrium, and construct an approximate Nash equilibrium using the mean field limit. Building on these results, in Section 4 we apply the same mechanism to obtain the corresponding results with common noise.
2 The N-player game
We consider N players, each of whom controls her own state variable and is rewarded based on her rank. The state variable of the i-th player is assumed to satisfy the following stochastic differential equation (SDE)
where the drift is the control chosen by agent i, and the driving processes are independent standard Brownian motions defined on some filtered probability space, representing the idiosyncratic noises and the common noise, respectively. The game ends at a fixed terminal time, when each player receives a rank-based reward minus the running cost of effort, which we assume to be quadratic with a positive constant coefficient.
In order to precisely define the rank-based reward, let
denote the empirical measure of the terminal state of the -player system. Then gives the fraction of players that finish the same or worse than player . Let be a bounded continuous function that is non-decreasing in both arguments. For any probability measure on , write where denotes the cumulative distribution function of . The reward player receives is given by
When is independent of , the compensation scheme is purely rank-based. In general, we could have a mixture of absolute performance compensation and relative performance compensation. The objective of each player is to observe the progress of all players and choose her effort level to maximize the expected payoff, while anticipating the other players’ strategies.
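The rank fraction described above can be sketched numerically. The following is a minimal illustration, not the paper's construction: it evaluates the empirical cumulative distribution function at each player's own terminal state and feeds it into a reward function of two arguments. The function names and the sample data are hypothetical, and the purely rank-based scheme "reward equals rank fraction" is an assumption chosen for simplicity.

```python
import numpy as np

def empirical_cdf_at_own_state(x_terminal):
    """Fraction of players whose terminal state is <= each player's own."""
    x = np.asarray(x_terminal, dtype=float)
    sorted_x = np.sort(x)
    # side="right" counts ties, so every player is included in her own count.
    return np.searchsorted(sorted_x, x, side="right") / x.size

def rank_based_rewards(x_terminal, reward):
    """Evaluate reward(x_i, F(x_i)) for every player i."""
    x = np.asarray(x_terminal, dtype=float)
    return reward(x, empirical_cdf_at_own_state(x))

# Hypothetical purely rank-based scheme: the reward ignores the state itself.
def purely_rank_based(x, r):
    return r

terminal_states = np.array([0.3, -1.2, 2.0, 0.3])
fractions = empirical_cdf_at_own_state(terminal_states)
rewards = rank_based_rewards(terminal_states, purely_rank_based)
# Shifting every state by the same amount leaves purely rank-based rewards unchanged.
shifted_rewards = rank_based_rewards(terminal_states + 5.0, purely_rank_based)
```

The last line illustrates the translation invariance of a purely rank-based reward: a common shift of all terminal states changes no one's rank.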
The players’ equilibrium expected payoffs, as functions of time and state variables, satisfy a system of coupled nonlinear partial differential equations subject to discontinuous boundary conditions, which appears to be analytically intractable. Fortunately, in a large-population game, the impact of any individual on the whole population is very small. So it is often good enough for each player to ignore the private state of any other individual and simply optimize against the aggregate distribution of the population. As a consequence, the equilibrium strategies decentralize in the limiting game as N → ∞. We shall use the mean field limit to construct an approximate Nash equilibrium for the N-player game, both with and without common noise.
3 Mean field approximation when there is no common noise
In this section, we assume . Solving the mean field game consists of two sub-problems: a stochastic control problem and a fixed-point problem (also called the consistency condition). For any Polish space , denote by the space of probability measures on , and .
We first fix a distribution of the terminal state of the population, and consider a single player’s optimization problem:
is a Brownian motion, and ranges over the set of progressively measurable processes satisfying . The associated dynamic programming equation is
with terminal condition . Using the first-order condition, we obtain that the candidate optimizer is , and the Hamilton-Jacobi-Bellman (HJB) equation can be written as
The above equation can be linearized using the Cole-Hopf transformation , giving
Together with the boundary condition , we can easily write down the solution:
where is a standard normal random variable. Let us further write as an integral:
Using the dominated convergence theorem, we can differentiate under the integral sign and get
Similarly, we obtain
Lemma 3.1. The functions and satisfy
Since is bounded, the drift coefficient is Lipschitz continuous in . It follows that the optimally controlled state process, denoted by , has a strong solution on . Observe that
So the optimal cumulative effort is bounded by some constant independent of . It also implies that has a well-defined limit as . A standard verification theorem shows that the solution to the HJB equation is the value function of the problem (3.1)-(3.2), and that is the optimal Markovian feedback control. Finally, using the dominated convergence theorem again, we can show that for ,
The same limits also hold for since is bounded away from zero. In other words, the optimal effort level is small when the progress is very large in absolute value. This agrees with many real-life observations: a player with a very big lead tends to slack off, and a player who is too far behind often gives up on the game instead of trying to catch up.
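The Cole-Hopf representation and the decay of the optimal effort can be checked numerically. The sketch below rests on assumptions that are not taken verbatim from the text: state dynamics dX = a dt + sigma dB, running cost c a^2, a purely rank-based reward, and a standard normal population law. Under these assumptions the HJB equation reads v_t + (sigma^2/2) v_xx + v_x^2/(4c) = 0, the transformation u = exp(v/(2 c sigma^2)) turns it into the backward heat equation, u(t, x) is the expectation of exp(reward(x + sigma sqrt(T - t) Z)/(2 c sigma^2)), and the candidate effort is v_x/(2c) = sigma^2 u_x/u. All names and parameter values are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Assumed model (not taken verbatim from the text): dX = a dt + sigma dB,
# running cost c * a**2, terminal reward reward_mu(X_T).
SIGMA, COST, HORIZON = 1.0, 1.0, 1.0
NODES, WEIGHTS = hermegauss(40)                 # quadrature for weight exp(-z^2/2)
WEIGHTS = WEIGHTS / math.sqrt(2.0 * math.pi)    # normalized: sum(w * f(z)) ~ E[f(Z)]

_std_normal_cdf = np.vectorize(lambda y: 0.5 * (1.0 + math.erf(y / math.sqrt(2.0))))

def reward_mu(y):
    """Hypothetical reward: purely rank-based, population law mu = N(0, 1)."""
    return _std_normal_cdf(y)

def u_and_effort(t, x, reward=reward_mu):
    """Return (u(t, x), candidate optimal effort) for t < HORIZON."""
    scale = SIGMA * math.sqrt(HORIZON - t)
    g = np.exp(reward(x + scale * NODES) / (2.0 * COST * SIGMA**2))
    u = float(np.dot(WEIGHTS, g))
    # Differentiating under the integral sign: u_x = E[g * Z] / (sigma * sqrt(T - t)).
    u_x = float(np.dot(WEIGHTS, g * NODES)) / scale
    # v = 2 c sigma^2 log u, so effort = v_x / (2 c) = sigma^2 u_x / u.
    return u, SIGMA**2 * u_x / u

u_mid, effort_mid = u_and_effort(0.0, 0.0)
_, effort_far_ahead = u_and_effort(0.0, 8.0)
_, effort_far_behind = u_and_effort(0.0, -8.0)
```

In this toy setting the effort is largest near the middle of the field and negligible for a player far ahead or far behind, in line with the limits stated above.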
3.1 Existence of a Nash equilibrium
For each fixed , solving the stochastic control problem (3.1)-(3.2) yields a value function and a best response . Suppose the game starts at time zero with zero initial progress. Then the optimally controlled state process of the generic player satisfies the SDE
Finding a Nash equilibrium for the limiting game is equivalent to finding a fixed point of the mapping , where denotes the law of its argument. We shall sometimes refer to such a fixed point as an equilibrium measure.
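The fixed-point map can be explored with a simple particle iteration. This is a heuristic sketch only: the existence argument below relies on Schauder's theorem, which does not imply that repeated application of the map converges. The sketch assumes dynamics dX = a dt + sigma dB, cost c a^2, a purely rank-based reward, and the effort formula sigma^2 u_x/u obtained from the Cole-Hopf transformation under those assumptions. All names, parameter values and the choice of initial guess are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

SIGMA, COST, HORIZON = 1.0, 1.0, 1.0            # assumed model parameters
NODES, WEIGHTS = hermegauss(32)
WEIGHTS = WEIGHTS / math.sqrt(2.0 * math.pi)    # normalized Gaussian weights

def best_response_effort(t, x, sorted_mu, rank_reward):
    """Effort sigma^2 u_x / u at states x, responding to the sample sorted_mu."""
    scale = SIGMA * math.sqrt(HORIZON - t)
    y = x[:, None] + scale * NODES[None, :]
    cdf = np.searchsorted(sorted_mu, y, side="right") / sorted_mu.size
    g = np.exp(rank_reward(cdf) / (2.0 * COST * SIGMA**2))
    return SIGMA**2 * (g @ (WEIGHTS * NODES)) / (scale * (g @ WEIGHTS))

def simulate_terminal_law(sorted_mu, rank_reward, n_paths, n_steps, rng):
    """Euler-Maruyama for dX = effort dt + sigma dB, started at zero."""
    dt = HORIZON / n_steps
    x = np.zeros(n_paths)
    for k in range(n_steps):
        drift = best_response_effort(k * dt, x, sorted_mu, rank_reward)
        x = x + drift * dt + SIGMA * math.sqrt(dt) * rng.standard_normal(n_paths)
    return np.sort(x)

def iterate_fixed_point(rank_reward, n_iter=5, n_paths=2000, n_steps=20, seed=0):
    """Apply the map 'terminal law -> law of best response' n_iter times."""
    rng = np.random.default_rng(seed)
    # Initial guess: terminal law of the uncontrolled state.
    mu = np.sort(SIGMA * math.sqrt(HORIZON) * rng.standard_normal(n_paths))
    for _ in range(n_iter):
        mu = simulate_terminal_law(mu, rank_reward, n_paths, n_steps, rng)
    return mu

equilibrium_sample = iterate_fixed_point(lambda r: r)
```

The returned sorted sample is only a candidate approximation of an equilibrium measure; whether the iteration stabilizes has to be checked case by case, for instance by comparing successive samples.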
The mapping has a fixed point.
Proof. Similar to , we will use Schauder’s fixed point theorem. Observe that for any , we have
This implies the set of is tight in , hence relatively compact for the topology of weak convergence by Prokhorov’s theorem. Recall that . Equip with the topology induced by the 1-Wasserstein metric:
Here denotes the space of Lipschitz continuous functions on whose Lipschitz constant is bounded by one. It is known that is a complete separable metric space (see e.g. [9, Theorem 6.18]). We shall work with a subset of defined by
It is easy to check that is non-empty, convex and closed (for the topology induced by the metric). Moreover, one can show using [9, Definition 6.8(iii)] that any weakly convergent sequence is also -convergent. Therefore, is also relatively compact for the topology induced by the metric. So we have found a non-empty, convex and compact set such that maps into itself. It remains to show is continuous on . In the rest of the proof, the constant may change from line to line.
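For intuition about the 1-Wasserstein metric used in this argument, here is a small sketch for empirical measures on the real line. It uses two standard facts: in one dimension the optimal coupling matches order statistics, and by Kantorovich-Rubinstein duality any single 1-Lipschitz test function gives a lower bound on the distance. The function names and sample data are hypothetical.

```python
import numpy as np

def wasserstein1_empirical(x, y):
    """W1 between two empirical measures with the same number of atoms."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally many atoms")
    # On the real line the optimal coupling matches order statistics.
    return float(np.mean(np.abs(x - y)))

def lipschitz_gap(x, y, f):
    """Gap between the integrals of f under the two empirical measures.

    If f is 1-Lipschitz, this is a lower bound on the W1 distance.
    """
    return float(abs(np.mean(f(np.asarray(x))) - np.mean(f(np.asarray(y)))))

rng = np.random.default_rng(1)
sample = rng.standard_normal(1000)
shifted = sample + 0.5
w1 = wasserstein1_empirical(sample, shifted)
gap = lipschitz_gap(sample, shifted, np.tanh)   # tanh is 1-Lipschitz
```

For a pure shift the distance equals the size of the shift, while the test-function gap stays below it.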
Let such that as . We wish to show . Note that
From Lemma 3.1, we know that . Since , thanks to the dominated convergence theorem, it suffices to show for ,
By Lemma 3.1 and the mean value theorem, we have that
So to show , it suffices to show that for each fixed ,
Since all integrands are bounded, to show the expectations converge to zero, it suffices to check that the integrands converge to zero a.s. Fix , we know from (3.4) that
Since , also converges to weakly, and the cumulative distribution function converges to at every point at which is continuous. It follows from the continuity of that converges to at every point at which is continuous. Since has at most countably many points of discontinuity, the random variable inside the expectation converges to zero a.s. The dominated convergence theorem then allows us to interchange the limit and the expectation, giving that
Similarly, from (3.3) we obtain
Again, using that has countably many points of discontinuity, one can show that
Putting everything together, we have proved (3.7).
Next, we show (3.8) by Gronwall’s inequality. Let be given. For any ,
By (3.7) and the bounded convergence theorem, we obtain
So for large enough, we have
By Gronwall’s inequality,
This completes the proof of (3.8), and thus the continuity of . By Schauder’s fixed point theorem, there exists a fixed point of in the set . ∎
3.2 Uniqueness of Nash equilibrium
Let be a class of measures in which uniqueness will be established. We first state a monotonicity assumption which is in the spirit of .
Assumption 3.3. For any , we have
Take to be the set of all measures in that are absolutely continuous with respect to the Lebesgue measure, then Assumption 3.3 is satisfied if the reward function is Lipschitz continuous and
is differentiable and has non-negative partial derivatives . This includes any continuously differentiable function which satisfies (i) is convex, and (ii) is non-decreasing. To see why is sufficient to verify Assumption 3.3, first note that for any , and are absolutely continuous. Using integration by parts for absolutely continuous functions, we have
Re-arranging terms and using that , we get
Under Assumption 3.3, has at most one fixed point in .
Proof. Suppose and are two fixed points of in . To simplify notation, write and . Let and be the optimally controlled state processes (starting at zero) in response to and , respectively. Let . Using Itô’s lemma and the PDE satisfied by and , it is easy to show that
Letting and using the continuity of and at the terminal time, we get
Now, exchange the role of and . We also have
where the last inequality follows from Assumption 3.3. This implies
By the uniqueness of the solution of the SDE (3.6), we must have a.s. and . ∎
3.3 Approximate Nash equilibrium of the N-player game
The MFG solution allows us to construct, using decentralized strategies, an approximate Nash equilibrium of the N-player game when N is large. In the MFG literature, this is typically done using results from the propagation of chaos. Here we have a simpler problem, since the mean-field interaction does not enter the dynamics of the state process. It is this special structure that allows us to handle a rank-based terminal payoff, which fails to be Lipschitz continuous in general.
Definition 3.6. A progressively measurable vector is called an ε-Nash equilibrium of the N-player game if
for any ; and
for any , and any progressively measurable process satisfying , we have
where , and .
We now state an additional Hölder condition on which allows us to get the convergence rate. It holds, for example, when where and .
Assumption 3.7. There exist constants and such that for any and .
Theorem 3.8. Let Assumption 3.7 hold. For any fixed point of ,
form an -Nash equilibrium of the -player game as .
Proof. Let be a fixed point of , and let be defined as in the theorem statement. To keep the notation simple, we omit the superscript of any state process if it is controlled by the optimal Markovian feedback strategy . Let
be the value of the limiting game where satisfies (3.6), and
be the net gain of player in an N-player game, if everybody uses the candidate approximate Nash equilibrium . Here . Since our state processes do not depend on the empirical measure (the interaction is only through the terminal payoff), each is simply an independent, identical copy of . Hence
Let us first show that and are close. We have
It follows from the -Hölder continuity of that
where for , denotes the empirical cumulative distribution function of i.i.d. random variables with cumulative distribution function . By the Dvoretzky-Kiefer-Wolfowitz inequality, we have
It follows that
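The role of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality in this step can be illustrated by simulation. With Massart's constant, the inequality bounds the probability that the sup-distance between the empirical and true distribution functions exceeds eps by 2 exp(-2 N eps^2); integrating this tail over eps bounds the expected sup-distance by sqrt(pi/(2N)). If the reward is Hölder continuous with some exponent, written alpha here as an assumed notation, Jensen's inequality then gives a payoff gap of order N^(-alpha/2). The sketch uses uniform samples, and all names and sample sizes are hypothetical.

```python
import math
import numpy as np

def sup_distance_uniform(sample):
    """sup_x |F_N(x) - x| for a sample from the uniform law on [0, 1]."""
    s = np.sort(sample)
    n = s.size
    upper = np.arange(1, n + 1) / n - s     # gap at each atom (after the jump)
    lower = s - np.arange(0, n) / n         # gap just below each atom
    return float(max(upper.max(), lower.max()))

def dkw_tail_bound(n, eps):
    """DKW bound: P(sup |F_N - F| > eps) <= 2 exp(-2 n eps^2)."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps**2))

def dkw_mean_bound(n):
    """Integrating the tail bound over eps: E[sup |F_N - F|] <= sqrt(pi / (2 n))."""
    return math.sqrt(math.pi / (2.0 * n))

rng = np.random.default_rng(2)
n_players, n_trials = 400, 2000
distances = np.array(
    [sup_distance_uniform(rng.random(n_players)) for _ in range(n_trials)]
)
mean_distance = float(distances.mean())
mean_bound = dkw_mean_bound(n_players)
```

Comparing `mean_distance` with `mean_bound` for several values of `n_players` shows the N^(-1/2) scaling that drives the convergence rate.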
Next, consider the system where player makes a unilateral deviation from the candidate approximate Nash equilibrium ; say, she chooses an admissible control . Denote her controlled state process by , and the state processes of all other players by as before for . Let be the corresponding empirical measure of the terminal states, and
be the corresponding net gain for player . We have