# Mean Field Game Theory for Agents with Individual-State Partial Observations

Some of the work in this paper was presented at the 55th IEEE Conference on Decision and Control, Las Vegas, NV, USA, December 2016.

## Abstract

Subject to reasonable conditions, in large population stochastic dynamic games, where the agents are coupled by the system’s mean field (i.e. the state distribution of the generic agent) through their nonlinear dynamics and their nonlinear cost functions, it can be shown that a best response control action for each agent exists which (i) depends only upon the individual agent’s state observations and the mean field, and (ii) achieves an $\epsilon$-Nash equilibrium for the system. In this work we formulate a class of problems where each agent has only partial observations of its individual state. We employ nonlinear filtering theory and the Separation Principle in order to analyze the game in the asymptotically infinite population limit. The main result is that the $\epsilon$-Nash equilibrium property holds where the best response control action of each agent depends upon the conditional density of its own state generated by a nonlinear filter, together with the system’s mean field. Finally, we compare this MFG problem with state estimation to that found in the literature with a major agent, whose partially observed state process is independent of the control action of any individual agent; in contrast, the partially observed state process of any agent in this work depends upon that agent’s control action.

Keywords: mean field games, partially observed stochastic control, nonlinear filtering, stochastic games.

AMS subject classifications: 35Q83, 35R60, 60G35, 91A10, 91A23, 93A14, 93E20

## 1 Introduction

For dynamical games of mean field type it has been demonstrated that when the agents are coupled through their dynamics and their cost functions, the best response control policies in the asymptotically infinite population limit depend only upon their individual state and the system mean field. Furthermore, such policies generate approximate Nash equilibria when they are applied to a large finite population game; see, among others, [12], [15], [16] and [14] by Huang, Malhamé and Caines, and [19], [20] and [21] by Lasry and Lions.

A distinct consequence of such a result is that in the mean field games (MFG) set-up an individual agent does not gain a significant benefit from learning the state of another agent, and therefore the estimation of any other agent’s state process has negligible value. Nonetheless, in practical situations an agent does not have access to complete observations of its own state, and therefore models of such partially observed (PO) MFG systems, where the agents’ controls depend upon the agents’ observation processes, can only represent the controls as functions of the agents’ states via estimates of those states. Such a model for linear quadratic Gaussian (LQG) MFG type problems has been considered in [13], where an approximate Nash equilibrium is obtained on an extended state space. In this work, we consider the nonlinear MFG where an individual agent has noisy observations of its own state.

Recent works ([11] and [22]) consider MFGs involving a major agent and many minor agents (MM-MFG) where, by definition, a minor agent is an agent which, asymptotically as the population size goes to infinity, has a negligible influence on the overall system while the overall population’s effect on it is significant, and where a major agent is an agent which has an asymptotically non-vanishing influence on each minor agent as the population size goes to infinity. A fundamental feature of this setup is that, in contrast to the situation without a major agent, the mean field is stochastic due to the stochastic evolution of the state of the major agent, and the best response processes of each minor agent depend on the state of the major agent. Motivated by this observation, state estimation problems in the nonlinear MFG with a major agent are considered in [9] (see [6] for the LQG case), where the major agent’s state process is partially observed but the agents have complete observations of their own states. Adopting the approach of constructing an equivalent completely observed model via application of nonlinear filtering, the MFG problem is analyzed in the space of conditional densities, and the existence of Nash equilibria for the infinite population game and of $\epsilon$-Nash equilibria for the finite population game is obtained. We finally remark that, in addition to [11] and [22], the MFG setup with major and minor agents has also been considered in [5] and [7]. In the former, the authors generalize the MM-MFG setup to the case where the mean field is determined by the control policy of the major agent; in the latter, a probabilistic approach is taken for the MM-MFG problem in which the major agent’s state appears in both the state and the cost functions.

The individual’s dynamics in the infinite population limit in an MFG setup are characterized by McKean-Vlasov (MV) type stochastic differential equations (SDEs). These SDEs have the property that the dynamics depend on the distribution of the state process. Hence, a PO stochastic optimal control problem (SOCP) is formulated for MV type SDEs and, as a consequence, the filtering equations should first be developed for such SDEs; a theory for joint state and distribution estimation in the case where the measure is stochastic is developed in [24]. Following the standard approach in the literature, once the filtering equations in the form of normalized or unnormalized densities are obtained, it is possible to obtain a form of the Hamilton-Jacobi-Bellman (HJB) equation in function spaces. This is the path that we follow in the paper, using the unnormalized conditional densities.

It is also worthwhile to provide a summary of the technical steps that one must develop in a nonlinear PO MFG setup. We first remark that one can follow different approaches in order to prove the convergence properties of MFGs in the infinite population limit. Among these, the convergence of the dynamics of the controlled state process to MV type dynamics when feedback controls are applied, see [15], greatly simplifies the analysis of the associated optimal control problem. In the partially observed setup we follow this approach. Consequently, as the first step, we prove such a convergence argument for the case where the control policies are in feedback form with respect to the conditional densities. We next analyze the fixed point property on the Wasserstein space of probability measures. Recall, however, that the solution to the completely observed MFG problem is given by a coupled HJB and Fokker-Planck-Kolmogorov (FPK) equation, which essentially requires analyzing the sensitivity of the solutions to the HJB equations with respect to the probability measure representing the mean field. In the PO MFG one needs to generalize such sensitivity results with respect to the conditional density component representing the information state. This is achieved by using the robustness property of the nonlinear filter. In the final stage, we prove the approximate Nash equilibrium property of the best response control policies obtained as the solution to the HJB equation of the infinite population game.

The organization of the paper is as follows. In Section 2 we present an MFG setup with uniform agents and briefly discuss the main results. In Section 3 we formulate the state estimation problem and present the associated completely observed system obtained by applying the separation principle. We also provide a solution in the form of an HJB equation for the completely observed problem. In Section 4 we prove the existence of a Nash equilibrium between an individual and the mass in the infinite population limit, and in Section 5 we demonstrate the approximate Nash equilibrium property of the best response processes obtained in Section 4. In Section 6 we present an example where the completely observed model has a finite dimensional information state and hence provides a more tractable MFG system. In Section 7 we briefly compare the results presented in the paper with a PO MM-MFG model. We conclude the paper with Section 8.

Throughout the paper we use the following notation. For a matrix $A$, $A^T$, $\mathrm{tr}(A)$ and $A_{ij}$ denote the transpose, the trace and the corresponding entry, respectively. $\nabla_x$ and $\nabla^2_{xx}$ denote the gradient and Hessian operators with respect to the variable $x$; in a one-dimensional domain, $\partial_x$ and $\partial^2_{xx}$ will be used instead. Let $S$ be a metric space. Then $\mathcal{B}(S)$ denotes the Borel $\sigma$-algebra and $\mathcal{P}(S)$ denotes the space of probability measures on $S$. Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$ be a filtered probability space satisfying the usual conditions. Conditional expectation with respect to a sigma algebra $\mathcal{G}$ is denoted by $E[\,\cdot\,|\mathcal{G}]$. For a Euclidean space $H$, we denote by $L^2_{\mathcal{F}}([0,T];H)$ the set of all $\{\mathcal{F}_t\}$-adapted $H$-valued processes $g$ such that $E\int_0^T |g(t)|^2\,dt < \infty$.

## 2 Mean Field Games with Uniform Agents

We consider a stochastic dynamic game with $N$ agents, indexed by $1 \le i \le N$, where the dynamics of the agents are given by the following controlled SDEs:

$$dz_i^N(t) = \frac{1}{N}\sum_{j=1}^{N} f\big(t, z_i^N(t), u_i^N(t), z_j^N(t)\big)\,dt + \sigma\,dw_i(t) \tag{1}$$

with terminal time and initial conditions , , where (i) , (ii) is the control input of agent , (iii) is a measurable function, (iv) are independent standard Brownian motions in and; (v) is constant. For we denote by , where agents’ states and controls are taken to be scalar valued for simplicity of notation throughout the paper. The objective of each agent is to minimize its cost-coupling function given by

$$J_i^N(u_i^N, u_{-i}^N) := E\int_0^T \frac{1}{N}\sum_{j=1}^{N} L\big(z_i^N(t), u_i^N(t), z_j^N(t)\big)\,dt \tag{2}$$

where $u_{-i}^N$ denotes the controls of all agents other than agent $i$. Note that the above model can be generalized to the case where the diffusion coefficient depends on the mean field coupling, where the state processes take values in, say, $\mathbb{R}^n$, and where the cost functions are time varying. We assume the following:

• The initial states are mutually independent, independent of all Brownian motions and satisfy , where is independent of . Furthermore, let denote the empirical distribution of agents where if and otherwise. Then we assume converges to a distribution weakly.

• is a compact set.

• The functions , are continuous and bounded in all their parameters and Lipschitz continuous in .

• The first and second order derivatives of and with respect to are all uniformly continuous and bounded with respect to all their parameters and Lipschitz continuous in .

• is Lipschitz continuous in .

For the system described by (1)-(2), the goal is to find individual control strategies and characterize their optimality with regard to Nash equilibrium. Following the standard approach in the literature, the asymptotic analysis ($N \to \infty$) of the above game is considered first and, as a consequence, MV type dynamics approximating the state dynamics of an individual agent are obtained. More explicitly, let $\phi(t,x)$ be a continuous function on $[0,T] \times \mathbb{R}$, taking values in the control set and Lipschitz continuous in $x$, which is used by the agents as their control law. Hence, the closed loop dynamics of the agents are given by

$$d\hat z_i^o(t) = \frac{1}{N}\sum_{j=1}^{N} f\big(t, \hat z_i^o(t), \phi(t, \hat z_i^o(t)), \hat z_j^o(t)\big)\,dt + \sigma\,dw_i(t) \tag{3}$$

for which a unique solution is known to exist [26, Chapter 1, Theorem 6.16]. Consider the following MV type dynamics

$$dz_i^o(t) = f\big[t, z_i^o(t), \phi(t, z_i^o(t)), \mu_t\big]\,dt + \sigma\,dw_i(t), \tag{4}$$

with and . Here and we use the same notation in the rest of the paper. A pair for (4) is said to be a consistent solution if is a solution to the SDE in (4) and for all and . The closed loop dynamics in (3) can be -approximated by the MV type dynamics given by (4) [15].
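The notion of a consistent solution of (4) can be illustrated numerically. The following is a minimal sketch, not the construction used in the paper: it approximates the measure flow by a cloud of `M` particles and applies a Picard-type iteration in which the drift at each step is evaluated against the previous guess of the flow. The drift `tanh(y - x)`, the zero control, and all parameter values are assumptions chosen only so that the example is bounded and Lipschitz.

```python
import numpy as np

def consistent_flow(M=200, T=1.0, steps=50, sigma=0.5, iters=6, seed=1):
    """Picard-type iteration for a particle approximation of the MV dynamics (4).

    Assumed (hypothetical) drift: f(t, x, u, y) = tanh(y - x), with zero control.
    Returns the particle flow (steps + 1, M) and the gap between successive iterates.
    """
    rng = np.random.default_rng(seed)
    dt = T / steps
    z0 = rng.standard_normal(M)                         # samples of the initial state
    dw = np.sqrt(dt) * rng.standard_normal((steps, M))  # noise reused in every iteration
    flow = np.tile(z0, (steps + 1, 1))                  # initial guess: mu_t = mu_0 for all t
    gaps = []
    for _ in range(iters):
        new = np.empty_like(flow)
        new[0] = z0
        for k in range(steps):
            x = new[k]
            # f[t, x, 0, mu_t]: average of tanh(y - x) over the current guess of mu_t
            drift = np.tanh(flow[k][None, :] - x[:, None]).mean(axis=1)
            new[k + 1] = x + drift * dt + sigma * dw[k]  # Euler-Maruyama step
        gaps.append(float(np.abs(new - flow).mean()))
        flow = new
    return flow, gaps

flow, gaps = consistent_flow()
print(gaps)  # successive gaps should shrink as the flow approaches a fixed point
```

When the gap between successive iterates is small, the law generated by the simulated dynamics approximately coincides with the law used inside the drift, which is the consistency requirement stated above.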

## 3 Partially Observed Mean Field Games and Nonlinear Filtering for MV Systems

In this section we formulate the estimation problem associated with the MFG set-up described above. Suppose that each agent has access to a noisy observation of its own state via:

$$dy_i(t) = h\big(t, z_i^o(t)\big)\,dt + d\nu_i(t) \tag{5}$$

where is a Brownian motion independent of and of the other noise processes . We assume the following.

• The function , the space of functions which are differentiable in and twice differentiable in , with and for all .

Following the standard approach to the PO SOCP, we shall construct the associated completely observed system via application of nonlinear filtering for the dynamics described in (4). But prior to that we obtain an MV type approximation result for the state process controlled with filtering dependent policies, since under suitable assumptions the optimal control takes a feedback form given by the solution of an HJB equation with infinite dimensional domain.

### 3.1 Nonlinear Filtering for McKean-Vlasov Dynamics

The nonlinear filtering equations that each agent needs to generate are defined as follows: Given the history of observations , determine a recursive expression for for , the space of all bounded differentiable functions with bounded derivatives up to order 2. Note that the agent’s state has MV type dynamics and so, for a fixed measure flow, we have

$$f[t, z_i^o, u_i, \mu_t] = \int_{\mathbb{R}} f(t, z_i^o, u_i, x)\,\mu_t(dx) =: f^*(t, z_i^o, u_i) \tag{6}$$

where . Hence, consider the SDEs

$$dz_i^o(t) = f^*\big(t, z_i^o(t), u_i(t)\big)\,dt + \sigma\,dw_i(t) \tag{7}$$

$$dy_i(t) = h\big(t, z_i^o(t)\big)\,dt + d\nu_i(t). \tag{8}$$

The filtering problem for the MV system described by (7)-(8) has been analyzed in [24] where filtering equations generating conditional distributions are obtained. We can similarly obtain filtering equations in the form of conditional densities as follows. Define the following innovation process: which can be shown to be a -Brownian motion under the measure . Let and define

$$\mathcal{L}\ell := \frac{1}{2}\sigma^2\,\partial^2_{xx}\ell + f^*\,\partial_x\ell. \tag{9}$$

Define the adjoint operator on as:

$$\mathcal{L}^*\theta(x) = \frac{1}{2}\partial^2_{xx}\big(\sigma^2\theta(x)\big) - \partial_x\big(f^*\theta(x)\big). \tag{10}$$
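As a sanity check on the pair (9)-(10), the duality relation $\int (\mathcal{L}\ell)\,\theta\,dx = \int \ell\,(\mathcal{L}^*\theta)\,dx$ can be verified numerically for rapidly decaying functions. The sketch below is illustrative only: the drift `tanh(x)`, the value of `sigma`, the test function `ell` and the density `theta` are hypothetical choices, and derivatives are approximated by central finite differences.

```python
import numpy as np

def adjoint_pairings(sigma=0.7, n=6001, half_width=12.0):
    """Compare <L ell, theta> with <ell, L* theta> for the operators in (9)-(10)."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    f_star = np.tanh(x)                    # hypothetical bounded drift at frozen (t, u)
    ell = np.exp(-x**2)                    # smooth, rapidly decaying test function
    theta = np.exp(-0.5 * (x - 1.0)**2) / np.sqrt(2.0 * np.pi)  # a probability density

    def d1(g):
        return np.gradient(g, dx)          # first derivative (central differences)

    def d2(g):
        return d1(d1(g))                   # second derivative as a repeated first derivative

    gen_ell = 0.5 * sigma**2 * d2(ell) + f_star * d1(ell)          # operator in (9)
    adj_theta = 0.5 * d2(sigma**2 * theta) - d1(f_star * theta)    # operator in (10)
    lhs = float(np.sum(gen_ell * theta) * dx)
    rhs = float(np.sum(ell * adj_theta) * dx)
    return lhs, rhs

lhs, rhs = adjoint_pairings()
print(lhs, rhs)  # the two pairings should agree up to discretization error
```

Because both functions vanish numerically at the ends of the grid, the boundary terms produced by integration by parts are negligible, which is why the two pairings agree.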

Let denote the probability density for i.e., for , where is -measurable and adapted for each . Then satisfies the following: For every ,

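$$\varphi_i(t,x) = \varphi_i(0,x) + \int_0^t \mathcal{L}^*\varphi_i(s,x)\,ds + \int_0^t \varphi_i(s,x)\Big\{h(s,x) - \int_{\mathbb{R}} h(s,x')\varphi_i(s,x')\,dx'\Big\}\,dI_i(s) \tag{11}$$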

for a.e. with probability and where is the initial conditional density and is the Innovations process, which is a Brownian motion, defined above. This can be shown, for instance, by following [17, Theorem 11.2.1]. Based on the consistency based approach to MFG [15], we now provide a decoupling result which demonstrates that the closed loop dynamics of each agent in the infinite population limit is approximated by MV SDEs in the partially observed setup.
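For intuition about what the conditional density in (11) represents, the filter for (7)-(8) can be approximated by a bootstrap particle filter. This is a different technique from the density equation used in the paper and is offered only as a rough sketch. The drift `f_star`, the observation function `h` (taken linear here for simplicity, which need not satisfy the boundedness assumptions of the paper), and all parameters are hypothetical.

```python
import numpy as np

def f_star(x):
    return -x            # hypothetical drift f*(t, x, u) with the control frozen

def h(x):
    return 5.0 * x       # hypothetical observation function

def bootstrap_filter(T=2.0, steps=400, M=2000, sigma=1.0, seed=0):
    """Bootstrap particle filter approximating E[z(t) | observations up to t]."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    z = rng.standard_normal()               # hidden state z(0)
    particles = rng.standard_normal(M)      # samples from the initial density
    truth, estimate = [], []
    for _ in range(steps):
        # observation increment (8), generated from the current hidden state
        dy = h(z) * dt + np.sqrt(dt) * rng.standard_normal()
        # weight particles by the likelihood of dy, then resample
        logw = h(particles) * dy - 0.5 * h(particles) ** 2 * dt
        w = np.exp(logw - logw.max())
        w /= w.sum()
        particles = particles[rng.choice(M, size=M, p=w)]
        # propagate the hidden state and the particles through (7)
        z = z + f_star(z) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        particles = (particles + f_star(particles) * dt
                     + sigma * np.sqrt(dt) * rng.standard_normal(M))
        truth.append(z)
        estimate.append(particles.mean())
    return np.array(truth), np.array(estimate)

truth, estimate = bootstrap_filter()
print(np.mean(np.abs(truth - estimate)))    # average tracking error along one path
```

The empirical distribution of the particles plays the role of the conditional density, and the particle mean approximates the conditional expectation of the state given the observation history.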

### 3.2 McKean-Vlasov Approximation with Partial Information

Let be a vector space with norm , let be an arbitrary measurable process and assume that

1. and .

The process , , satisfying (11) is a -adapted, -valued process. Assume that the process is used by agent as its control laws in (1) such that for . We then obtain the following closed-loop dynamics:

$$dz_i^N(t) = \frac{1}{N}\sum_{j=1}^{N} f\big(t, z_i^N(t), \alpha(t, \varphi_i(t)), z_j^N(t)\big)\,dt + \sigma\,dw_i(t), \qquad z_i^N(0) = z_i(0). \tag{12}$$

One can show that under the assumptions (A1)-(A4), the system of equations given in (12) has a unique solution by following similar steps to those in the proof of Theorem 6.16 of [26, p. 49] and by using the robustness (i.e., continuity with respect to the observation path) of the nonlinear filter; see Theorem 4. We now introduce the MV system for the generic agent, where the agent’s MV system contains the estimate of its own state obtained via the nonlinear filtering equations:

$$d\hat z(t) = f\big[t, \hat z(t), \alpha(t, \varphi(t)), \mu_t\big]\,dt + \sigma\,dw(t) \tag{13}$$

$$dy(t) = h\big(t, \hat z(t)\big)\,dt + d\nu(t), \qquad 0 \le t \le T \tag{14}$$

with the initial condition and are standard Brownian motion in , which are independent of each other and independent of initial condition . Furthermore, we characterize by , . Finally, is the -adapted solution to filtering equation for the conditional density. We remark that under (A0)-(A4), (A5) and (M1) it can be shown that a unique consistent solution to the above MV system exists; see Theorem 4. Let us also introduce

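$$d\hat z_i(t) = f\big[t, \hat z_i(t), \alpha(t, \varphi_i(t)), \mu_t\big]\,dt + \sigma\,dw_i(t) \tag{15}$$

$$dy_i(t) = h\big(t, \hat z_i(t)\big)\,dt + d\nu_i(t), \qquad 0 \le t \le T, \tag{16}$$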

where $w_i$ and $\nu_i$, $1 \le i \le N$, are Brownian motions in $\mathbb{R}$ which are independent of each other and independent of the initial conditions, and $\mu_t$ is the law of $\hat z_i(t)$. These equations can be considered as independent copies of (13)-(14). We can now state the MV approximation result.

Theorem. Assume (A0)-(A4), (A5) and (M1) hold. Then

$$\sup_{1 \le j \le N}\ \sup_{0 \le t \le T} E\big|z_j^N(t) - \hat z_j(t)\big| = O\Big(\frac{1}{\sqrt{N}}\Big), \tag{17}$$

where $z_j^N$ and $\hat z_j$, $1 \le j \le N$, are given in (12) and (15), respectively, and the right-hand side depends on $T$.

Proof. The proof is an extension of [15, Theorem 12] to the case where control laws depend on the filtering processes. Consider first the $i$th agent and notice that

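$$z_i^N(t) - \hat z_i(t) = \int_0^t \frac{1}{N}\sum_{j=1}^{N} f\big(s, z_i^N(s), \alpha(s,\varphi_i(s)), z_j^N(s)\big)\,ds - \int_0^t f\big[s, \hat z_i(s), \alpha(s,\varphi_i(s)), \mu_s\big]\,ds. \tag{18}$$

Let

$$D_i(s) := \frac{1}{N}\sum_{j=1}^{N} f\big(s, z_i^N(s), \alpha(s,\varphi_i(s)), z_j^N(s)\big) - \int_{\mathbb{R}} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), y\big)\,\mu_s(dy) \tag{19}$$

and observe that

$$\begin{aligned}
D_i(s) &= D_i^1(s) + D_i^2(s) + D_i^3(s), \\
D_i^1(s) &:= \frac{1}{N}\sum_{j=1}^{N} f\big(s, z_i^N(s), \alpha(s,\varphi_i(s)), z_j^N(s)\big) - \frac{1}{N}\sum_{j=1}^{N} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), z_j^N(s)\big), \\
D_i^2(s) &:= \frac{1}{N}\sum_{j=1}^{N} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), z_j^N(s)\big) - \frac{1}{N}\sum_{j=1}^{N} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), \hat z_j(s)\big), \\
D_i^3(s) &:= \frac{1}{N}\sum_{j=1}^{N} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), \hat z_j(s)\big) - \int_{\mathbb{R}} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), y\big)\,\mu_s(dy).
\end{aligned}$$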

By the Lipschitz continuity of and , there exists a constant independent of such that

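$$|D_i^1 + D_i^2| \le C\sum_{j=1}^{N} \frac{1}{N}\Big(\big|z_i^N - \hat z_i\big| + \big|z_j^N - \hat z_j\big|\Big). \tag{20}$$

From (18)-(20), it follows that

$$\sup_{0 \le s \le t}\big|z_i^N(s) - \hat z_i(s)\big| \le C\int_0^t \big|z_i^N(s) - \hat z_i(s)\big|\,ds + C\int_0^t \frac{1}{N}\sum_{j=1}^{N}\big|z_j^N(s) - \hat z_j(s)\big|\,ds + \int_0^t D_i^3(s)\,ds \tag{21}$$

which gives

$$\begin{aligned}
\sum_{i=1}^{N}\sup_{0 \le s \le t}\big|z_i^N(s) - \hat z_i(s)\big|
&\le 2C\sum_{i=1}^{N}\int_0^t \big|z_i^N(s) - \hat z_i(s)\big|\,ds + \int_0^t \sum_{i=1}^{N} D_i^3(s)\,ds \\
&\le 2C\sum_{i=1}^{N}\int_0^t \sup_{0 \le \tau \le s}\big|z_i^N(\tau) - \hat z_i(\tau)\big|\,ds + \int_0^t \sum_{i=1}^{N} D_i^3(s)\,ds.
\end{aligned} \tag{22}$$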

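We consider the last term in (22). We have that

$$E\big|D_i^3(t)\big|^2 \le \int_0^t E\Big|\frac{1}{N}\sum_{j=1}^{N} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), \hat z_j(s)\big) - \int_{\mathbb{R}} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), y\big)\,\mu_s(dy)\Big|^2\,ds. \tag{23}$$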

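Define now $g(s, \hat z_i, \hat z_j) := f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), \hat z_j(s)\big) - \int_{\mathbb{R}} f\big(s, \hat z_i(s), \alpha(s,\varphi_i(s)), y\big)\,\mu_s(dy)$ and recall that $\varphi_i$ depends on $\hat z_i$ through (16). Therefore, for distinct indices $i$, $j$ and $k$, we have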

$$E\big[g(s, \hat z_i, \hat z_j)\,g(s, \hat z_i, \hat z_k)\big] = 0 \tag{24}$$

which implies that there are no cross terms in (23). Consequently, by the boundedness of and the inequality that , we obtain

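$$E\big|D_i^3(t)\big|^2 \le k_1(t)/N = O(1/N) \tag{25}$$

where $k_1(t)$ is an increasing function of $t$ but independent of $N$. Now by (22), (25) and Gronwall’s lemma

$$\sum_{i=1}^{N} E\sup_{0 \le t \le T}\big|z_i^N(t) - \hat z_i(t)\big| = O\Big(\frac{1}{\sqrt{N}}\Big) \tag{26}$$

which yields (17).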
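The rate in (17) can be observed in a small simulation. The sketch below is an illustration under strong simplifying assumptions, not a verification of the theorem: it uses the hypothetical linear drift `f(t, x, u, y) = y - x` with zero control (so no filter is involved, and the boundedness assumption on `f` is not met), for which the mean of the MV law stays constant at `m0`. The particle system and its MV copies are driven by the same noise, as in the coupling used in the proof.

```python
import numpy as np

def coupling_error(N, reps=20, T=1.0, steps=200, sigma=1.0, m0=0.0, seed=0):
    """Estimate sup_t E|z_i^N(t) - zhat_i(t)| for the assumed drift f = y - x.

    The expectation is approximated by averaging over agents and over `reps`
    independent replications of the N-agent system.
    """
    rng = np.random.default_rng(seed)
    dt = T / steps
    z = m0 + rng.standard_normal((reps, N))   # finite population states
    zhat = z.copy()                           # MV copies with the same initial data
    worst = 0.0
    for _ in range(steps):
        dw = np.sqrt(dt) * rng.standard_normal((reps, N))   # shared Brownian increments
        emp_mean = z.mean(axis=1, keepdims=True)            # (1/N) * sum_j z_j
        z = z + (emp_mean - z) * dt + sigma * dw            # empirical-mean coupling
        zhat = zhat + (m0 - zhat) * dt + sigma * dw         # coupling through the MV mean
        worst = max(worst, float(np.abs(z - zhat).mean()))
    return worst

for n in (100, 400, 1600, 6400):
    print(n, coupling_error(n))   # the error should shrink roughly like 1/sqrt(N)
```

Each fourfold increase in `N` should roughly halve the reported error, consistent with the $O(1/\sqrt{N})$ bound.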

### 3.3 A Completely Observed Stochastic Optimal Control Problem for the Generic Agent

The widely adopted procedure in the literature in the construction of a completely observed stochastic optimal control problem from the partially observed one is to use the unnormalized conditional density in the separation principle since it is known that the cost function under an equivalent measure is linear in the initial unnormalized conditional density. Furthermore, the dynamics of the unnormalized conditional density is also a linear functional of the initial density and hence one can significantly benefit from the unnormalized construction including the closed form computation of the first and second order functional (Fréchet) derivatives with respect to the density-valued state component. However, in order to proceed with the unnormalized form, following the standard assumptions in the literature (see [3], [10] and [4]), we shall restrict ourselves to the state dynamics in the following form:

• The function is linear in the control: .

Recall that if the probability measure flow $\mu$ is fixed, $f$ and $L$ become functions of $(t,x,u)$ and $(x,u)$, respectively, and, as before, we denote

$$f^*(t,x,u) := f[t,x,u,\mu], \qquad L^*(x,u) := L[x,u,\mu].$$

We need a further condition that the measure flow satisfies so that the induced functions are well behaved, see Definition 3 and Proposition 4 of [15]. {definition} A probability measure flow on is in if there exists such that for any bounded and Lipschitz continuous function on ,

$$\sup_{1 \le j \le K}\Big|\int_{\mathbb{R}} \psi(y)\,\mu^j_{t'}(dy) - \int_{\mathbb{R}} \psi(y)\,\mu^j_{t''}(dy)\Big| \le B\,|t' - t''|^{\beta} \tag{27}$$

for all where for given , depends on upon the Lipschitz coefficient of . In order to obtain the unnormalized filtering equations for the MV SDE, we first need to define an exponential martingale for the change of measure argument. Consider first the following MV SDE

$$dz^o(t) = f\big[t, z^o(t), \alpha(t), \mu_t\big]\,dt + \sigma\,dw(t), \tag{28}$$

$$dy(t) = h\big(t, z^o(t)\big)\,dt + d\nu(t), \qquad 0 \le t \le T, \tag{29}$$

with , where is an admissible control. In the rest of this section, we assume that is fixed with exponent and we follow the approach presented in [3]. Hence, we define the process

$$w^{-}(t) = w(t) - \int_0^t u(s)\,ds \tag{30}$$

and let where

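$$M_s^t(u) := \exp\Big\{\int_s^t \Big(u(\tau)\,dw^{-}(\tau) + h\big(\tau, z^o(\tau)\big)\,d\nu(\tau)\Big) - \frac{1}{2}\int_s^t \Big(|u(\tau)|^2 + \big|h\big(\tau, z^o(\tau)\big)\big|^2\Big)\,d\tau\Big\}. \tag{31}$$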

It now follows from Girsanov’s theorem that under , is a Brownian motion. Define now the backward differential operator and its adjoint as follows: For

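$$J_t^{a} := \frac{1}{2}\partial^2_{xx} + (f^{\dagger} + a)\,\partial_x, \qquad J_t^{*a} := \frac{1}{2}\partial^2_{xx} - (f^{\dagger} + a)\,\partial_x - \partial_x f. \tag{32}$$

Similarly, for a given control process $u$, we denote the family of operators by

$$\big\{J_t^{u} := J_t^{u_t},\ \ J_t^{*u} := J_t^{*u_t},\ \ 0 \le t \le T\big\}. \tag{33}$$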

Consider a random function with and assume that it is a fundamental solution of the Zakai equation (which is known to exist [3]) given by:

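$$\begin{aligned}
dq(t,z;\tau,\kappa) &= J_t^{*u}\,q(t,z;\tau,\kappa)\,dt + h(t,z)\,q(t,z;\tau,\kappa)\,dy(t), \\
\lim_{t \downarrow \tau} q(t,z;\tau,\kappa) &= \delta_{z-\kappa}, \qquad \tau \le t \le T, \quad \tilde{P}\text{-a.s.}
\end{aligned} \tag{34}$$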

Let denote the density of and set . Then by [3, Theorem 4.1] the function

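$$p_t(z) = \frac{\int_{\mathbb{R}} q_t(z;\kappa)\,p(\kappa)\,d\kappa}{\int_{\mathbb{R}}\int_{\mathbb{R}} q_t(z;\kappa)\,p(\kappa)\,d\kappa\,dz} \tag{35}$$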

is a version of the conditional density of i.e., for and ,

$$E\big[\ell(z^o(T))\,\big|\,\mathcal{F}_T^y\big] = \int_{\mathbb{R}} p_T(z)\,\ell(z)\,dz \qquad \tilde{P}\text{-a.s.} \tag{36}$$

Finally, let us set . Then, by [3, Theorem 4.1], we obtain

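$$\begin{aligned}
d\tilde\varphi(t,z) &= J_t^{*u}\,\tilde\varphi(t,z)\,dt + h(t,z)\,\tilde\varphi(t,z)\,dy(t), \qquad 0 \le t \le T, \\
\tilde\varphi(0,z) &= p(z)
\end{aligned} \tag{37}$$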

where (37) is the Zakai equation for the unnormalized conditional density, which will serve as the infinite dimensional state process of the completely observed optimal control problem. It is worthwhile recalling at this point that the goal is to solve the partially observed SOCP in the infinite population limit, for which we aim to obtain an HJB equation in a function space by constructing the associated completely observed SOCP. We now proceed with such a derivation, the first step of which is to define the cost in terms of the conditional density process and the new measure defined via (31).
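To make the role of the unnormalized density concrete, the sketch below integrates a Zakai-type equation on a fixed spatial grid by a splitting-up scheme: an explicit finite-difference prediction step for the forward operator, followed by multiplication by the likelihood factor of the observation increment. It is a rough illustration only. It uses a generic forward operator with diffusion coefficient `sigma` rather than the exact operator in (32), and the drift `-x`, the observation function `2 tanh(x)`, the grid and the step sizes are all hypothetical.

```python
import numpy as np

def zakai_splitting(T=1.0, steps=2000, sigma=1.0, seed=0):
    """Splitting-up sketch for an unnormalized conditional density on a grid."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = np.linspace(-6.0, 6.0, 241)
    dx = x[1] - x[0]
    drift = -x                        # hypothetical closed-loop drift
    h = 2.0 * np.tanh(x)              # hypothetical bounded observation function
    q = np.exp(-0.5 * x**2)           # unnormalized initial density (standard normal shape)
    z = rng.standard_normal()         # hidden state drawn from the same initial law
    for _ in range(steps):
        # observation increment produced by the hidden state, then state update
        dy = 2.0 * np.tanh(z) * dt + np.sqrt(dt) * rng.standard_normal()
        z += -z * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        # prediction: explicit step for 0.5 * sigma^2 * q_xx - (drift * q)_x
        flux = drift * q
        lap = (q[2:] - 2.0 * q[1:-1] + q[:-2]) / dx**2
        adv = (flux[2:] - flux[:-2]) / (2.0 * dx)
        q[1:-1] = q[1:-1] + dt * (0.5 * sigma**2 * lap - adv)
        q[0] = q[-1] = 0.0            # density assumed negligible at the grid boundary
        q = np.maximum(q, 0.0)
        # correction: multiply by the likelihood factor of dy
        q = q * np.exp(h * dy - 0.5 * h**2 * dt)
    p = q / (q.sum() * dx)            # normalization, in the spirit of (35)
    return x, q, p, z

x, q, p, z = zakai_splitting()
dx = x[1] - x[0]
print((x * p).sum() * dx, z)          # conditional mean estimate vs. hidden state
```

The update of `q` is linear in `q`, which mirrors the linearity of the unnormalized equation in the initial density; normalization is applied only once at the end.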

Indeed, consider the cost function and note that

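$$\begin{aligned}
J(u;p) &= E\int_0^T L\big[z^o(t), u(t), \mu_t\big]\,dt \\
&= \tilde{E}\int_0^T \Big(\int_{\mathbb{R}}\int_{\mathbb{R}} L\big[z, u(t), \mu_t\big]\,q_t(z;x)\,p(x)\,dx\,dz\Big)\,dt
\end{aligned} \tag{38}$$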

where denotes expectation with respect to , , and (38) follows from [3, Equation 5.1]. Note that we explicitly indicate dependence on the initial condition. Define the following space of functions:

$$E^k \triangleq \Big\{p \in L^1(\mathbb{R});\ \|p\|_k = \int_{\mathbb{R}} (1 + |z|^k)\,|p(z)|\,dz < \infty\Big\}. \tag{39}$$

In the derivation of the HJB equation we consider the function space (39) where for the expected total cost incurred during , , we assume that the initial condition satisfies and hence, for a constant control for all , we have . {definition}[3] Consider a probability space and an -valued stochastic process adapted to the filtration with . If

$$E\int_0^T \Big(\int_{\mathbb{R}} (1 + |z|^l)\,|\eta(t,z)|\,dz\Big)^j\,dt < \infty \tag{40}$$

then we say that .

A continuous cost functional is next defined by setting:

$$V^u(\tau, p) := E\int_{T-\tau}^{T} L\big[z^o(t), u(t), \mu_t\big]\,dt \tag{41}$$

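where $\tau \in [0,T]$ and $z^o(T-\tau)$ has a distribution with density $p$. We now recall the definition of the Fréchet derivative. A function $F: X \to Y$ between normed spaces $X$ and $Y$ is said to be Fréchet differentiable at $x \in X$ if there exists $DF(x) \in \mathcal{L}(X,Y)$, where $\mathcal{L}(X,Y)$ denotes the space of bounded linear operators from $X$ to $Y$, such that $\lim_{\|\eta\|_X \to 0} \|F(x+\eta) - F(x) - DF(x)\eta\|_Y / \|\eta\|_X = 0$. One can define higher order Fréchet derivatives in a similar manner. For instance, the second order Fréchet derivative $D^2F(x) \in \mathcal{L}(X, \mathcal{L}(X,Y))$ of $F$ at $x$ satisfies $\lim_{\|\eta\|_X \to 0} \|DF(x+\eta) - DF(x) - D^2F(x)\eta\|_{\mathcal{L}(X,Y)} / \|\eta\|_X = 0$. We introduce the following assumptions.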

• The function possesses continuous first derivatives in and first and second order Fréchet derivatives and with respect to in the form of linear functional and a bilinear form, respectively, which are given by

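$$\begin{aligned}
DV(\tau,p)[\eta] &= \int_{\mathbb{R}} V_p(\tau,p)(z)\,\eta(z)\,dz, \\
D^2V(\tau,p)[\eta,\theta] &= \int_{\mathbb{R}}\int_{\mathbb{R}} V^2_{pp}(\tau,p)(z,z')\,\eta(z)\,\theta(z')\,dz\,dz', \qquad \eta(\cdot), \theta(\cdot) \in E^l,
\end{aligned}$$

where the kernels $V_p$ and $V^2_{pp}$ are continuous in their arguments and satisfy the following:

$$\begin{aligned}
|V_p(\tau,p)(z)| &\le \zeta_1(\tau, \|p\|_l)\,(1 + |z|^l), \\
|V^2_{pp}(\tau,p)(z,z')| &\le \zeta_2(\tau, \|p\|_l)\,(1 + |z|^l)(1 + |z'|^l)
\end{aligned} \tag{42}$$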

for