On quasi-stationary Mean Field Games models
We explore a mechanism of decision-making in Mean Field Games with myopic players. At each instant, agents set a strategy which optimizes their expected future cost, assuming that their environment is immutable. As the system evolves, the players observe the evolution of the system and adapt to their new environment without anticipating. With specific cost structures, these models give rise to coupled systems of partial differential equations of quasi-stationary nature. We provide sufficient conditions for the existence and uniqueness of classical solutions for these systems, and give a rigorous derivation of these systems from N-player stochastic differential game models. Finally, we show that the population can self-organize and converge exponentially fast to the ergodic Mean Field Games equilibrium, if the initial distribution is sufficiently close to it and the Hamiltonian is quadratic.
Key words and phrases: Mean field games, quasi-stationary models, nonlinear coupled PDE systems, long time behavior, self-organization, N-person games, Nash equilibria, myopic equilibrium.
2010 Mathematics Subject Classification: 35Q91, 49N70, 35B40
The Mean Field Games formalism was introduced some years ago in a series of seminal papers by J.-M. Lasry and P.-L. Lions [LLMFG, LLMFG1, LLMFG2] and M. Huang, R. Malhamé and P. Caines [Malhame, Malhame2]. It describes the evolution of stochastic differential games with a large population of rational players, where the strategies of the agents are affected not only by their own preferences but also by the state of the other players through a global mean field effect. In terms of partial differential equations, these models typically consist of a transport or Fokker-Planck equation for the distribution of the agents, coupled with a Hamilton-Jacobi-Bellman equation.
The motivation of this paper is to study a strategy-choice mechanism that is different from classical Mean Field Games. Our agents are myopic, and choose their actions according to the information available at time , by fixing the future state of their opponents and trying to get the best possible gain in the future . Players anticipate no evolution of the system, undergo changes in their environment and adapt their strategies. A system of interacting agents can have such irrational behavior in panic situations for instance. In this framework, agents build a strategy at each moment and the global (in time) strategy is the history of all the chosen strategies. This decision-making mechanism intrinsically implies the existence of two time scales: a fast time scale which is linked to the optimization of the expected future cost; and a slow time scale linked to the actual evolution of the system. The coexistence of these two time scales gives rise to equations of quasi-stationary nature. We are also interested in the formation of equilibria for this type of evolution systems, and in the rate at which these systems converge towards these equilibria.
In general the decision-making mechanism in mean field games (MFG for short) involves solving a stochastic control problem, that provides a global in time optimal strategy. In the case where the players aim to minimize a long time average cost, it is well known that the MFG system of partial differential equations is stationary and takes the following form [LLMFG, LLMFG1, Feleqi, Feleqi2],
Here , all the functions are -periodic, the unknowns are the constant and the functions and , is the -dimensional torus, is the Hamiltonian and the coupling, both related to the structure of the cost, and is the partial derivative of with respect to the second variable. The solution of the first equation in (1.1) can be interpreted as the equilibrium value function of a “small” player whose cost depends on the density of the other players, while the second equation characterizes the distribution of players at the equilibrium. It is well known (see e.g. [LLMFG, LLMFG1, Feleqi]) that there exists a solution in for all to (1.1), under a wide range of sufficient conditions. Moreover, uniqueness holds under the following monotonicity condition on :
The interpretation of the above monotonicity condition is that the players dislike congested regions and prefer configurations in which they are scattered.
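For orientation, the ergodic system and the monotonicity condition discussed above are usually written as follows in the Mean Field Games literature. This is a sketch in assumed notation: the symbols $u$, $\lambda$, $m$, $H$, $F$, $\sigma$ and the normalization of $u$ are the customary ones and are not taken from the displays (1.1) and (1.2), whose exact statements may differ (for instance, uniqueness results often use a strict version of the inequality).

```latex
% Standard ergodic MFG system (assumed notation, customary in the literature)
\begin{equation*}
\begin{cases}
-\sigma \Delta u + H(x, Du) + \lambda = F(x, m) & \text{in } \mathbb{T}^d,\\
-\sigma \Delta m - \operatorname{div}\big(m\, H_p(x, Du)\big) = 0 & \text{in } \mathbb{T}^d,\\
m \ge 0, \quad \int_{\mathbb{T}^d} m = 1, \quad \int_{\mathbb{T}^d} u = 0.
\end{cases}
\end{equation*}
% Lasry-Lions monotonicity condition on the coupling F
\begin{equation*}
\int_{\mathbb{T}^d} \big(F(x, m) - F(x, m')\big)\, d(m - m')(x) \;\ge\; 0
\qquad \text{for all } m, m' \in \mathcal{P}(\mathbb{T}^d).
\end{equation*}
```

With this form, a coupling that increases with the local density of players satisfies the inequality, which matches the interpretation that players dislike congested regions.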
Another well known example of stationary MFG systems, is the case where players aim to minimize a discounted infinite-horizon cost functional. In that case, the MFG system takes the following form (see e.g. [Feleqi], among others):
where . It is also well known (see [BardiAMS, Arisawa, Feleqi]) that, under several technical conditions on and , there exists a solution for all to (1.3). Moreover, if has a linear growth, i.e.
In this paper, we consider a situation where the evolution of the players is driven by a system of stochastic differential equations, and where choosing a strategy amounts to choosing a drift vector field that has a suitable regularity; at any time , each player seeks to minimize a cost functional which depends on the current state of the system, and on the possible future evolution of the player, which is related to her/his choice of a vector field at time . Thus, choosing the optimal amounts to optimally planning the future evolution of the player, assuming no evolution in her/his environment. Players follow their planned evolution and adjust their drift according to the observed changes. Further details and explanations about the model will be given in Section 3.
For the choice of we consider two different cost structures: a discounted cost functional; and a long-time average cost (see Section 3). As we already pointed out, the scheduling gives rise to two time scales: a slow time scale “” linked to the evolution of the state of the system; and a fast scale “” (which does not appear explicitly in the MFG systems) related to the scheduling. Under some assumptions on and , we show in Section 3 that at the mean field limit one gets systems of equations of the following form:
where is fixed throughout this paper, , is the initial density of players, and all functions are -periodic. Note that (resp. ) depends on time only through (resp. ). The parameters and are respectively: the noise level related to the prediction process (the assessment of the future evolution), and the noise level associated with the evolution of the players. System (1.6) corresponds to the case of a long time average cost functional, while system (1.5) corresponds to the case of a discounted cost functional. We shall see that for any time , (resp. ) characterizes a local Nash equilibrium related to a long time average cost (resp. a discounted cost). The first equations in (1.6) and (1.5) give the “evolution” of the game value of a “small” player, and express the adaptation of the players' choices to the evolution of the environment. The evolution of and expresses the actual evolution of the population density. We refer to Section 3 for more detailed explanations.
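The two time scales can be illustrated numerically. The following is a minimal one-dimensional sketch, not the method of this paper: it assumes that system (1.6) has the form $-\sigma u'' + H(u') + \lambda = F(x, m(t))$, $\partial_t m - \sigma' m'' - (m\,H_p(u'))' = 0$, with the quadratic Hamiltonian $H(p) = p^2/2$. Under that assumption the substitution $u = -2\sigma \log w$ turns the frozen-environment equation into the eigenvalue problem $-2\sigma^2 w'' + F w = \lambda w$, $w > 0$. The potential `V`, the smoothing coupling, the grid size and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def periodic_laplacian(n, h):
    """Second-order finite-difference Laplacian on a periodic grid with n points."""
    eye = np.eye(n)
    return (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1) - 2.0 * eye) / h**2


def coupling(V, m):
    """Hypothetical smoothing coupling F(x, m) = V(x) + (rho * m)(x)."""
    return V + 0.25 * (np.roll(m, 1) + 2.0 * m + np.roll(m, -1))


def frozen_ergodic_step(F, sigma, lap, h):
    """Fast scale: solve -sigma*u'' + |u'|^2/2 + lam = F with the density frozen.

    With u = -2*sigma*log(w) this is the ground-state problem
    -2*sigma^2*w'' + F*w = lam*w with w > 0 (assumed quadratic Hamiltonian).
    Returns lam and a centered-difference approximation of u'.
    """
    A = -2.0 * sigma**2 * lap + np.diag(F)
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    w = np.abs(eigvecs[:, 0])              # ground state has a constant sign
    u = -2.0 * sigma * np.log(w)
    u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * h)
    return eigvals[0], u_x


def simulate(n=32, sigma=0.5, sigma_p=0.5, T=0.5):
    """Slow scale: explicit Euler for m_t = sigma_p*m'' + (m*u')'."""
    h = 1.0 / n
    x = np.arange(n) * h
    lap = periodic_laplacian(n, h)
    V = np.cos(2.0 * np.pi * x)               # hypothetical potential
    m = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)   # initial density with unit mass
    dt = 0.2 * h**2 / sigma_p                 # explicit stability restriction
    lam = np.nan
    for _ in range(int(round(T / dt))):
        # Players re-optimize against the current, frozen density ...
        lam, u_x = frozen_ergodic_step(coupling(V, m), sigma, lap, h)
        # ... and the population then moves with the drift -u'.
        flux = m * u_x
        div_flux = (np.roll(flux, -1) - np.roll(flux, 1)) / (2.0 * h)
        m = m + dt * (sigma_p * (lap @ m) + div_flux)
    return x, m, lam
```

Calling `x, m, lam = simulate()` returns the grid, the density at time `T` and the last ergodic constant. The frozen problem is re-solved at every step of the slow evolution, which is the quasi-stationary structure described above; the discretization is conservative, so the total mass of `m` is preserved up to rounding errors.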
In contrast to most MFG systems, the uniqueness of solutions to systems (1.6) and (1.5) does not require the monotonicity condition (1.2) nor the convexity of with respect to the second variable. This fact is essentially related to the forward-forward structure of the systems. We also show that the small-discount approximation (1.4) holds for quasi-stationary models under the same conditions as for the stationary ones. Under the monotonicity condition (1.2), we prove in Section 4 that for a quadratic Hamiltonian, a solution to (1.6) converges exponentially fast in some sense to the unique equilibrium of (1.1) as , provided that is sufficiently small and . An analogous result also holds for systems (1.3)-(1.5) when the discount rate is small enough. This asymptotic behavior is interpreted as the emergence of a self-organizing phenomenon and a phase transition in the system. Note that this entails in particular that our systems can exhibit a large scale structure even if the cohesion between the agents is only maintained by interactions between neighbors. The techniques used to prove these asymptotic results rely on some algebraic properties pointed out in [Cardaliaguet1] that are specific to the quadratic Hamiltonian. On the other hand, one cannot use the usual duality arguments to show convergence for general data. Therefore the convergence remains an open problem for more general cases.
Similar asymptotic results were established for the MFG system in [Cardaliaguet1, Cardaliaguet3] for local and non-local coupling. Long time convergence of forward-forward MFG models is also discussed in [Gomes, Achdou]. Self-organization and phase transitions in Mean Field Games were addressed in [Mehta1, Mehta2, Mehta3], for applications in neuroscience, biology, economics, and engineering. For an overview of collective motions and self-organization phenomena in mean field models, we refer to [Degond4] and the references therein. The derivation of the Mean Field Games system was addressed in [LLMFG, LLMFG1, Feleqi2] for the ergodic case (long time average cost). More general cases were analyzed in the important recent paper [master-equation] on the master equation and its application to the convergence problem in Mean Field Games. The reader will notice in Section 3 that the analysis of the mean-field limit in our case is very similar to that of the McKean-Vlasov equation. Therefore the proof of convergence is less technical than in [master-equation] and is based on the usual coupling arguments (see e.g. [Snitzman, Mckean, Méléard], among others). MFG models with myopic players are briefly addressed in [Achdou] for applications to urban settlements and residential choice. However, the sense given to “myopic players” is different from the one we are considering in this paper: indeed, “myopic players” in [Achdou] corresponds to individuals who compute their cost functional taking into account only their very close neighbors, while in this paper “myopic players” refers to individuals who anticipate nothing and only undergo the evolution of their environment. In [Cristiani], the authors introduce a model for the study of crowd dynamics that is very similar to the one addressed in this paper: in Section 2.2.2, the authors consider a situation where at any time pedestrians build the optimal path to destination, based on the observed state of the system. Although the approaches are different, the two models have many similarities.
Local Nash equilibria for mean field systems of rational agents were also considered in [Degond1, Degond2, Degond3]. The authors use the “Best Reply Strategy approach” to derive a kinetic equation and provide applications to the evolution of wealth distribution in a conservative [Degond2] and non-conservative [Degond3] economy. The link between Mean Field Games and the “Best Reply Strategy approach” is analyzed in [Degond0].
The paper is organized as follows: In Section 2, we give sufficient conditions for the existence and uniqueness of classical solutions for systems (1.5) and (1.6). The proofs rely on continuous dependence estimates for Hamilton-Jacobi-Bellman equations [Marchi], the small-discount approximation, and the non-local coupling which provides compactness and regularity. Section 3 is devoted to a detailed derivation of systems (1.5) and (1.6) from N-player stochastic differential game models. In Section 4, we prove the exponential convergence result for system (1.6). Finally, the Appendix recalls some elementary facts on the Fokker-Planck equation.
Notations and assumptions
For simplicity, we work in a periodic setting in order to avoid issues related to boundary conditions or conditions at infinity. Therefore we will often consider functions as defined on (the -dimensional torus). Throughout the paper, , and the usual inner product on is denoted by or . We use the notation and denote the set of probability measures on . Recall that becomes a compact topological space when endowed with the -convergence thanks to Prokhorov’s theorem. Moreover, this topology is metrizable, e.g. by the Kantorovich-Rubinstein distance:
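For reference, the Kantorovich-Rubinstein distance between two probability measures on the torus is customarily defined by duality with $1$-Lipschitz test functions. The symbols $\mathbf{d}_1$, $m$, $m'$ and $\phi$ below are assumed notation and may differ from the one used in the rest of the paper.

```latex
\begin{equation*}
\mathbf{d}_1(m, m') := \sup\Big\{ \int_{\mathbb{T}^d} \phi \, d(m - m') \;:\;
\phi : \mathbb{T}^d \to \mathbb{R} \ \text{is $1$-Lipschitz continuous} \Big\}.
\end{equation*}
```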
We denote by the set of -periodic continuous functions on , by , , , the set of -periodic functions having -th order derivatives which are -Hölder continuous, by , , the set of -summable Lebesgue measurable and -periodic functions on , by , , , the Sobolev space of -periodic functions having weak derivatives up to order which are -summable on , and for a given Lipschitz continuous function , we define
For , we use the notation for parabolic Hölder spaces, with the norm , as defined in [Parabolic67]. For a given , we note
and the space of -periodic functions in .
Throughout the proofs, we denote by a generic constant, and we use the notation to point out the dependence of the constant on parameters . For any vector we use the notation . For a random variable defined on some probability space, denotes the law of . Throughout the paper, is a fixed parameter.
This work was supported by LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR), and partially supported by project (ANR-16-CE40-0015-01) on Mean Field Games.
The author would like to thank Martino Bardi and Pierre Cardaliaguet for fruitful discussions.
2. Analysis of the quasi-stationary systems
We shall use the following conditions:
the operator is defined from into , and satisfies
the Hamiltonian is locally Lipschitz continuous, and -periodic with respect to the first variable;
exists and is locally Lipschitz continuous;
and exist and are locally Lipschitz continuous;
there exists a constant such that, for any ,
is a probability measure, absolutely continuous with respect to the Lebesgue measure, and its density belongs to .
The Hamiltonian satisfies one of the following sets of conditions:
grows at most linearly in , i.e., there exists , such that
is superlinear in uniformly in , i.e.,
and there exists , , such that a.e and large enough,
Condition (1) arises naturally in control theory when the controls are chosen in a bounded set, whereas under condition (2) the control variable of each player can take any orientation in the state space and can be arbitrarily large, at a large cost. As pointed out in [Feleqi, LLMFG, LLMFG1], condition (2.2) is interpreted as a condition on the oscillations of and plays no role when .
A triplet is a classical solution to (1.6), if is continuous, of class in space, and of class in time, is of class in space, and satisfies (1.6) in the classical sense. Similarly, a couple is a classical solution to (1.5), if is continuous, of class in space, of class in time, is of class in space and satisfies (1.5) in the classical sense.
In this section, we give an existence and uniqueness result for classical solutions to systems (1.5) and (1.6) under condition (1). In addition, we show that system (1.6) is also well-posed under condition (2).
We start by dealing with the case where the Hamiltonian has a linear growth (condition (1)). Let us consider the quasi-stationary approximate problem (1.5). We start by analyzing the first equation in (1.5).
The proof of existence and uniqueness for equation (2.3) relies on regularity results and a priori estimates from elliptic theory. A detailed proof of this result is given in [Feleqi, Theorem 2.6] in a more general framework. By looking at the extrema of , one easily gets (2.4a). The second bound is proved by contradiction using the strong maximum principle. The details of the proof are given in [Feleqi, Theorem 2.5]. Condition (2.1) ensures that the constant does not depend on . ∎
We now state a continuous dependence estimate due to Marchi [Marchi], which plays a crucial role.
The proof of (2.5b) is similar to [Marchi, Theorem 2.2]. Nevertheless, we give a proof of this result because in this particular framework we do not need to fulfill all the conditions of [Marchi]. We proceed by contradiction, assuming that there exist sequences , , such that for any ,
where , and . Note that the function
satisfies the following equation
Using (2.6) and (2.5a), one checks that In addition, 3 and (2.4b) entail that is uniformly bounded. Moreover, invoking standard regularity theory for linear elliptic equations (see e.g. [Gilbarg]), the sequence is uniformly bounded in for some . We infer that converges uniformly to some in which satisfies
Since is periodic, we deduce from the strong maximum principle that must be constant; this provides the desired contradiction. ∎
We now give an existence and uniqueness result for system (1.5).
Existence : For a constant large enough to be chosen below, let be the set of maps such that
Note that is compact thanks to Ascoli’s Theorem, and the compactness of . We aim to prove our claim using Schauder’s fixed point theorem (see e.g. [Fixedpoint, p. 25]). Set for any ,
such that, for a given , is the solution to the following “McKean-Vlasov” equation
Let us check that is well defined. Note that the above equation can be written as
In the same way, one checks that functions and are in , where , thanks to Lemma 2.1 and 4. Here and are the Hölder exponents appearing in 6 and (2.4b) respectively. We infer that problem (2.8) has a unique solution which satisfies
owing to existence and uniqueness theory for parabolic equations in Hölder spaces [Parabolic67, Theorem IV.5.1 p. 320]. Furthermore, using classical properties of Fokker-Planck equation (see Lemma A.1), it follows that
Therefore for big enough , since and does not depend on nor on . In particular, the operator is well defined from into .
Let us check now that is continuous. Given a sequence in , let
It is possible to show more regularity for the maps , under additional regularity assumptions on and . For instance, if for some , and satisfies
then and are of class in . We refer to [master-equation] for the definition of derivatives in and notations. In addition, we have that
for any in and any signed measure on , where is the solution to the following problem
One has also an analogous result for the map defined in Lemma 2.1. We omit the details and invoke [master-equation, Proposition 3.8] for a similar approach.
We now prove well-posedness for system (1.6).
The proof of existence relies on small-discount approximation techniques. We give here an adaptation of these techniques to the quasi-stationary case. The crucial ingredients in this proof are estimates (2.4a) and (2.4b).
On the other hand, recall that according to [Parabolic67, Theorem IV.5.1 p. 320] it holds that
where , and the constant is independent of thanks to (2.4b). Hence, one can extract a subsequence such that for any
All the results of this section hold true if one replaces the elliptic parts of the equations with a more general operator of the following form:
where is -periodic, , and there exists such that .
3. Models explanation & mean field limit
We provide in this section a rigorous interpretation of the quasi-stationary systems (1.5) and (1.6) in terms of N-player stochastic differential games. We start by writing systems of equations for players, then we pass to the limit as the number of players goes to infinity, assuming that all the players are identical. Throughout this section, we employ the notations introduced in Lemma 2.1 and Lemma 2.6.
3.1. Stochastic differential game models for N players.
We consider a game of N players where at each time agents choose their strategy
assuming no evolution in their environment;
according to an evaluation of their future situation emanating from the choice.
Observing the evolution of the system, players adjust their strategies without anticipating. More precisely, each player observes the state of the system at time and chooses the best drift vector field which optimizes her/his future evolution . The player adapts and corrects her/his choice as the system evolves. This situation amounts to solving at each moment an optimization problem which consists in finding the vector field (strategy) which guarantees the best future cost. Our agents are myopic: they anticipate no evolution and only undergo changes in their environment.
Let us now give a mathematical formalism to our model. Let be a family of independent Brownian motions in over some probability space , and be closed subsets of . We suppose that the probability space is rich enough to fulfill the assumptions that will be formulated in this section. Let be a vector of i.i.d random variables with values in that are independent of and let
be the information available to the players at time . We suppose that contains the -negligible sets of .
Consider a system driven by the following stochastic differential equations
For any , the -th player chooses in the set of admissible strategies denoted by , that is, the set of -periodic processes defined on , indexed by with values in , such that
The reason for considering condition (3.2) will be clear in (3.4) below. At each time , player faces an optimization problem for choosing which ensures the best future cost. We will explain the optimization problem in Section 3.1.1.
These instantaneous choices give rise to global (in time) strategies which do not necessarily guarantee the well-posedness of equations (3.1) in a suitable sense. Hence we need to introduce the following definitions:
Let and . We say that the global strategy is feasible on , if the -th equation of (3.1) is well-posed on .
Note that in contrast to standard optimal control situations, the optimal global strategy is not a solution to a global (in time) optimization problem, but it is the history of all the choices made during the game. The agents plan and correct their plans as the game evolves, and the global strategy is achieved through this process of planning and self-correction.
3.1.1. The case of a long time average cost
Consider the case where the -th player seeks to minimize the following long time average cost:
where and are continuous and -periodic with respect to the first variable. At any time , the process represents the possible future trajectory of player , related to the chosen strategy (vector field) . In other words, is what is likely to happen (in the future ) if player plays at the instant . Mathematically, we consider that are driven by the following (fictitious) stochastic differential equations