
# Mean-field optimal control and optimality conditions in the space of probability measures

Martin Burger, Department of Mathematics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 11, 91058 Erlangen
René Pinnau, Department of Mathematics, TU Kaiserslautern, Erwin-Schrödinger-Str. 48, 67663 Kaiserslautern
Claudia Totzeck, Department of Mathematics, TU Kaiserslautern, Erwin-Schrödinger-Str. 48, 67663 Kaiserslautern
Oliver Tse, Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
###### Abstract.

We derive a framework to compute optimal controls for problems with states in the space of probability measures. Since many optimal control problems constrained by a system of ordinary differential equations (ODE) modelling interacting particles converge, in the mean-field limit, to optimal control problems constrained by a partial differential equation (PDE), it is interesting to have a calculus directly on the mesoscopic level of probability measures which allows us to derive the corresponding first-order optimality system. In addition to this new calculus, we relate the resulting system to the first-order optimality system derived on the particle level and, under additional regularity assumptions, to the first-order optimality system based on $L^2$-calculus. We further justify the use of the $L^2$-adjoint in numerical simulations by establishing a link between the adjoint in the space of probability measures and the adjoint corresponding to $L^2$-calculus. Moreover, we prove a convergence rate for the convergence of the optimal controls corresponding to the particle formulation to the optimal controls of the mean-field problem as the number of particles tends to infinity.

Corresponding author: totzeck@mathematik.uni-kl.de

Keywords. Optimal control with ODE/PDE constraints, interacting particle systems, mean-field limits.

AMS subject classifications. 49K15, 49K20

## 1. Introduction

In the past few years, the growing interest in the (optimal) control of interacting particle systems and their corresponding mean-field limits has led to many contributions on their numerical behavior (see, e.g., [diss, sheep1]) as well as their analytical properties, e.g., [FornasierSolombrino, FornasierPP]. Such systems can be found in various fields of application, for example physical or biological models like crowd dynamics [sheep1, Borzi, zuazua, bongini], consensus formation [Dante], or even global optimization [CBO1, CBO2]. Meanwhile, first approaches for stochastic particle systems are also available [bonnet, pham].

Since there are several points of view on this subject, the analytical techniques vary from standard ODE and PDE theory over optimal transport to measure-valued solutions. This induces also different variants for the derivation of first-order optimality conditions and/or gradient information, which clearly also has some impact on the design of appropriate numerical algorithms for the solution of the optimal control problems at hand.

Here, we present a calculus for the derivation of first-order optimality conditions in the space of probability measures and link it to other different approaches discussed in [bonnet, FornasierPP, FornasierSolombrino] (cf. Section 4). As opposed to the first-order optimality conditions found in those papers, our calculus yields vector fields as adjoint variables, which is consistent with the adjoint variables that appear on the level of particle systems. Furthermore, we show, in Section 5, how these new insights might be used for further analytical investigations. To get an idea of the different strategies, we begin with a simple example that displays all the main features of our calculus.

### 1.1. An illustrative example: Controlling a single particle

Let us start with an illustrative example from classical optimal control in order to present the idea without the complication of a mean-field limit. We denote the dimension of the state space by $d$, and the time interval of interest is $(0,T)$ for some $T>0$. We assume that the control variable $u$ acts on the velocity of a single particle with trajectory $x_t \in \mathbb{R}^d$ for $t\in[0,T]$, and we want to optimize a given functional depending on the trajectory, i.e.,

$$(1)\qquad (x,u) = \operatorname*{argmin} \int_0^T g(x_t)\,dt, \quad \text{subject to } \frac{d}{dt}x_t = v(x_t,u_t),$$

where $g$ and $v$ are given, sufficiently regular functions.

Then, the standard Pontryagin maximum principle yields the existence of an adjoint variable $\xi$ satisfying

$$(2)\qquad \frac{d}{dt}\xi_t = \nabla_x g(x_t) + \nabla_x v(x_t,u_t)\,\xi_t,$$

with terminal condition $\xi_T = 0$.

Moreover, the control $u$ satisfies the optimality condition

$$\nabla_u v(x_t,u_t)\cdot\xi_t = 0 \quad \text{a.e. in } (0,T).$$

These conditions can be translated into the calculation of a saddle-point of the microscopic Lagrangian

$$(3)\qquad \mathcal{L}_{\mathrm{micro}}(x,u,\xi) = \int_0^T g(x_t)\,dt + \int_0^T \Big(\frac{d}{dt}x_t - v(x_t,u_t)\Big)\cdot\xi_t\,dt.$$
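The forward-backward structure above lends itself to an adjoint-based gradient sweep. The following sketch is our own illustration, not taken from the paper: it uses the simplest instance $v(x,u)=u$ and $g(x)=(x-1)^2$ in one dimension, and the backward recursion is the exact adjoint of the forward-Euler discretization, so the resulting gradient can be checked against finite differences.

```python
def solve_state(u, x0, dt):
    # forward Euler for dx/dt = v(x, u) with the illustrative choice v(x, u) = u
    x = [x0]
    for uk in u:
        x.append(x[-1] + dt * uk)
    return x

def cost(u, x0, dt):
    # discretized J(u) = sum_k dt * g(x_k) with g(x) = (x - 1)^2
    x = solve_state(u, x0, dt)
    return dt * sum((xk - 1.0) ** 2 for xk in x[:-1])

def gradient_via_adjoint(u, x0, dt):
    # backward sweep: xi_n = 0, xi_k = xi_{k+1} + dt * g'(x_k); then dJ/du_k = dt * xi_{k+1}
    x = solve_state(u, x0, dt)
    n = len(u)
    xi = [0.0] * (n + 1)  # terminal condition
    for k in range(n - 1, -1, -1):
        xi[k] = xi[k + 1] + dt * 2.0 * (x[k] - 1.0)
    return [dt * xi[k + 1] for k in range(n)]
```

Because the adjoint matches the discrete dynamics exactly, the computed sensitivity agrees with a finite-difference quotient up to the perturbation error.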

On the other hand, the microscopic ODE can be translated into a macroscopic formulation via the method of characteristics: with initial value $\mu_0 = \delta_{x(0)}$, the concentrated measure $\mu_t = \delta_{x_t}$ is the unique solution of

$$(4)\qquad \partial_t\mu_t + \nabla_x\cdot\big(v(x_t,u_t)\,\mu_t\big) = 0.$$

Since all measures are concentrated at $x_t$, we can reinterpret $u_t$ as the evaluation of a feedback control at $x_t$ and equivalently obtain

$$(5)\qquad \partial_t\mu_t + \nabla_x\cdot\big(v(x,u_t)\,\mu_t\big) = 0, \qquad \mu_0 = \delta_{x(0)}.$$

Since

$$\int_0^T g(x_t)\,dt = \int_0^T \langle g,\mu_t\rangle\,dt,$$

we can formulate an optimal control problem at the macroscopic level for the measure $\mu$ and the control variable $u$, i.e.,

$$(6)\qquad (\mu,u) = \operatorname*{argmin} \int_0^T \langle g,\mu_t\rangle\,dt \quad \text{subject to } (5).$$

This macroscopic optimal control problem is in fact equivalent to the microscopic one for a single particle, since we can choose the state space as the Banach space of Radon measures and the control space as an appropriate space of reasonably smooth functions on $\mathbb{R}^d$. The uniqueness of solutions to the transport equation and the special initial value $\mu_0 = \delta_{x(0)}$ will always yield a concentrated measure $\mu_t = \delta_{x_t}$, and this identification brings us back to the microscopic control.

However, with the macroscopic formulation we have another option to derive optimality conditions in these larger spaces, based on the Lagrangian

$$(7)\qquad \mathcal{L}_{\mathrm{macro}}(\mu,u,\varphi) = \int_0^T \langle g,\mu_t\rangle\,dt + \int_0^T \big\langle \varphi,\, \partial_t\mu_t + \nabla_x\cdot(v(x,u_t)\,\mu_t)\big\rangle\,dt.$$

Then, the macroscopic adjoint equation becomes

$$(8)\qquad \partial_t\varphi + v(x,u_t)\cdot\nabla_x\varphi = 0$$

and the optimality condition is given by

$$-\big\langle \nabla_x\varphi,\, \nabla_u v(x,u_t)\,\mu_t \big\rangle = 0.$$

Due to the equivalence of the microscopic and macroscopic optimal control problems it is natural to ask for the relation between the adjoint variables $\xi$ and $\varphi$, which is not obvious at first glance and has so far been discussed only very little. For first results in this direction see [herty18]. Using the special structure of the solution $\mu_t = \delta_{x_t}$ and the identification with the microscopic control, we can rewrite the optimality condition as

$$\nabla_u v(x_t,u_t)\cdot\big(-\nabla_x\varphi(x_t,t)\big) = 0,$$

which induces the identification

$$(9)\qquad \xi_t = -\nabla_x\varphi(x_t,t).$$

Indeed, the method of characteristics confirms that $-\nabla_x\varphi(x_t,t)$ satisfies the microscopic adjoint equation. This becomes more apparent if we consider only variations of $\mu$ that respect the nonnegativity and unit-mass condition of the probability measure, i.e.,

$$\mu' = -\nabla\cdot q,$$

with a vector-valued measure $q$ that is absolutely continuous with respect to $\mu$. Then, an integration by parts argument directly reveals the relation to $\nabla_x\varphi$.

By using variations of this kind we reinterpret the state space as a Riemannian manifold of Borel probability measures equipped with the 2-Wasserstein distance, instead of the flat Banach space of Radon measures. The analysis of particle systems and limiting nonlinear partial differential equations in the 2-Wasserstein distance has been a fruitful field of study in recent years, following the seminal papers [Otto, JKO]. It is hence highly overdue to study such an approach also in the optimal control setting.

We mention that the values of $\varphi$ outside the trajectory are irrelevant for the specific control problem. Solving

$$\partial_t\varphi + v(\cdot,u_t)\cdot\nabla_x\varphi = 0, \qquad \nabla_u v(\cdot,u_t)\cdot\nabla\varphi = 0 \qquad \text{on } \mathbb{R}^d\times(0,T),$$

we obtain the adjoints for all possible microscopic control problems with initial values in $\mathbb{R}^d$. This is just the well-known Hamilton-Jacobi-Bellman equation, usually derived with different arguments.

###### Remark 1.1.

The above arguments can also be extended to a stochastic control system (see, e.g., [roy2018]):

$$(10)\qquad (X,u) = \operatorname*{argmin}\int_0^T \mathbb{E}_x[g(X_t)]\,dt, \quad \text{subject to } dX_t = v(X_t,u_t)\,dt + \sigma(X_t,u_t)\,dW_t,$$

with $W_t$ being a Wiener process and $X_t$ the solution to the stochastic differential equation with initial condition $X_0 = x$. In this case the state equation for the probability density becomes

$$(11)\qquad \partial_t\mu_t + \nabla\cdot\big(v(x,u_t)\,\mu_t\big) = \frac{1}{2}\Delta\big(\sigma^2\mu_t\big),$$

and $\mu_t$ does not necessarily remain a concentrated measure in time, which reflects the stochasticity of the model.
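Equation (11) propagates moments in closed form for simple data. As an illustration (our own, with the assumed drift $v(x,u) = -x$ and a constant $\sigma$, in one dimension), multiplying (11) by $|x|^2$ and integrating by parts gives the moment equation $\frac{d}{dt}m_2 = -2m_2 + \sigma^2$; the sketch below integrates it and compares with the closed-form solution.

```python
import math

def m2_ode(m2_0, sigma, dt, steps):
    """Second-moment equation d m2/dt = -2*m2 + sigma^2, obtained from the
    Fokker-Planck equation (11) for the illustrative 1-D drift v(x) = -x."""
    m2 = m2_0
    for _ in range(steps):
        m2 += dt * (-2.0 * m2 + sigma ** 2)
    return m2

def m2_exact(m2_0, sigma, t):
    # closed-form solution: relaxation toward the stationary value sigma^2 / 2
    s = sigma ** 2 / 2.0
    return s + (m2_0 - s) * math.exp(-2.0 * t)
```

Even for a Dirac initial condition ($m_2(0)=0$) the second moment grows immediately, i.e. the measure spreads out.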

### 1.2. Control in the Mean-field Limit

Having understood the relation between microscopic and macroscopic formulations of the optimal control problem, it seems an obvious step to consider optimal control problems for a high number of particles and their mean-field limit as $N\to\infty$, which is also the motivation for this paper. However, in the mean-field limit there is no microscopic particle system and corresponding optimal control problem, hence an additional step is needed to understand the connection in the limit. The basis for such a step is to understand the characteristic flow, which replaces the particle dynamics and naturally leads to an analysis in the Wasserstein distance. We will further investigate this mean-field setting in the remainder of the paper.

Here, we restrict our considerations to first-order dynamics, but the present paper can be seen as an analytical justification of the convergence shown numerically in [sheep1]. It is an additional contribution to the field of optimization of particle systems and their mean-field limits, which has been lively discussed in recent years (e.g., [Borzi, Dante, FornasierSolombrino, FornasierPP, CBO1, CBO2, Giacomo]). Moreover, we would like to connect the fields of optimal control, gradient flows and optimal transport. In particular, we show relations between the adjoints derived by $L^2$-calculus and the adjoints derived in the space of probability measures.

The paper is organized as follows: in Section 2 the microscopic model for $N$ particles and the corresponding mean-field equation are introduced. Further, we formulate the optimal control problems under investigation. The first main contribution of the article is the derivation of the first-order optimality conditions in the mesoscopic formulation, given in Section 3. A discussion of the relation of this new calculus to the first-order optimality systems on the particle level and to the first-order optimality condition based on $L^2$-calculus is the content of Section 4. In Section 5 we show the second main result, which is the convergence rate for the optimal controls as $N\to\infty$.

## 2. Optimal Control Problems

First, we generalize the one-particle case to $N$ interacting particles, modeling, e.g., crowd dynamics [sheep1]. Then, we derive the corresponding mean-field limit, i.e., the mesoscopic approximation. These two are the state systems for the respective optimal control problems. Further, we present the assumptions necessary for the well-posedness of the state systems.

### 2.1. The State Models

As before, $d$ denotes the dimension of the state space and $[0,T]$ with $T>0$ is the time interval of interest.

#### 2.1.1. The particle system

The considered particle system consists of $N$ particles of the same type and $M$ controls, represented by the functions

$$x^i,\,u^\ell \colon [0,T]\to\mathbb{R}^d, \qquad \text{for } i=1,\dots,N \;\text{ and }\; \ell=1,\dots,M.$$

The vectors

$$x := (x^i)_{i=1,\dots,N}, \qquad u := (u^\ell)_{\ell=1,\dots,M},$$

denote the states of the particles and the controls, respectively. The dynamics of the particle system is given by

$$(12)\qquad \frac{d}{dt}x_t = v^N(x_t,u_t), \qquad x_0 = \hat{x},$$

with given $\hat{x}$ defining the initial states of the particles. The operator $v^N$ on the right-hand side strongly depends on the type of application. Here, we assume

1. For each pair $(\mu,u)$ let a velocity field $v(\mu,u)\colon\mathbb{R}^d\to\mathbb{R}^d$ be given, such that for all $(\mu,u)$:

$$\big\langle v(\mu,u)(x) - v(\mu,u)(y),\, x-y \big\rangle \le C_\ell\,|x-y|^2, \qquad x,y\in\mathbb{R}^d,$$

where the constant $C_\ell$ is independent of $(\mu,u)$.

We further define $v^N$ via

$$v^N_i(x,u) := v(\mu^N,u)(x^i), \qquad i=1,\dots,N,$$

where

$$\mu^N(A) = \frac{1}{N}\sum_{i=1}^N \delta_{x^i}(A), \qquad A\in\mathcal{B}(\mathbb{R}^d)\;(=\text{Borel }\sigma\text{-algebra}),$$

is the empirical measure for the state $x$.

2. For any two pairs $(\mu,u)$ and $(\mu',u')$, there exists a constant $C_v$, independent of $\mu,\mu'$ and $u,u'$, such that

$$\|v(\mu,u) - v(\mu',u')\|_{\sup} \le C_v\big(W_2(\mu,\mu') + \|u-u'\|_2\big).$$
###### Remark 2.1.

By definition, $\mu^N_t$ assigns the probability of finding particles with states within a measurable set of the state space at time $t$.

Standard results from ODE theory yield the existence and uniqueness of a global solution.

###### Proposition 2.2.

Assume 1 and 2. Then, for given $\hat{x}$ and $u$ there exists a unique global solution $x$ of (12).

###### Remark 2.3.

In particular, for applications in the control of crowds we have that $v$ models interactions, i.e., particle-particle and particle-control interactions by means of forces (see [CarrilloSurvey] and the references therein). Then, $v^N$ is often given by

$$(13)\qquad v^N_i(x,u) = -\frac{1}{N}\sum_{j=1}^N K_1(x^i - x^j) - \sum_{\ell=1}^M K_2(x^i - u^\ell),$$

for given interaction forces $K_1$ and $K_2$ modeling the interactions within the cloud of particles itself and of the particles with the controls, respectively.
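A direct transcription of (13) (one-dimensional for brevity; the kernels `K1`, `K2` below are placeholders to be supplied by the application):

```python
def v_N(x, u, K1, K2):
    """Evaluate the velocity field (13) for all particles.

    x : list of particle states, u : list of control states (both 1-D here);
    K1, K2 : interaction kernels for particle-particle and particle-control forces.
    """
    N = len(x)
    return [
        -sum(K1(xi - xj) for xj in x) / N - sum(K2(xi - ul) for ul in u)
        for xi in x
    ]
```

For the odd kernel $K_1(z) = z$ and no controls, the field reduces to $-\big(x^i - \tfrac{1}{N}\sum_j x^j\big)$, i.e. linear attraction toward the center of mass, and the total momentum $\sum_i v^N_i$ vanishes.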

#### 2.1.2. The mean-field model

In order to define the limiting problem for an increasing number of particles explicitly, we consider the empirical measure $\mu^N$.

Using the ideas from [Neunzert, BraunHepp, Dobrushin] we derive the corresponding PDE formally as

$$(14)\qquad \partial_t\mu_t + \nabla\cdot\big(v(\mu_t,u_t)\,\mu_t\big) = 0, \qquad \mu_0 = \hat\mu,$$

which is the evolution equation of the mean-field one-particle distribution, supplemented with the initial condition $\hat\mu\in\mathcal{P}_2(\mathbb{R}^d)$, where $\mathcal{P}_2(\mathbb{R}^d)$ denotes the space of Borel probability measures on $\mathbb{R}^d$ with finite second moment, endowed with the 2-Wasserstein distance, which makes $\mathcal{P}_2(\mathbb{R}^d)$ a complete metric space. For the sake of completeness we recall the 2-Wasserstein distance:

$$W_2^2(\mu,\nu) := \inf_{\pi\in\Pi(\mu,\nu)}\Big\{\int_{\mathbb{R}^d\times\mathbb{R}^d} |x-y|^2\,d\pi(x,y)\Big\}, \qquad \mu,\nu\in\mathcal{P}_2(\mathbb{R}^d),$$

where $\Pi(\mu,\nu)$ denotes the set of all Borel probability measures on $\mathbb{R}^d\times\mathbb{R}^d$ that have $\mu$ and $\nu$ as first and second marginals, respectively, i.e.,

$$\pi(B\times\mathbb{R}^d) = \mu(B), \qquad \pi(\mathbb{R}^d\times B) = \nu(B) \qquad \text{for } B\in\mathcal{B}(\mathbb{R}^d).$$

In the rest of the article we denote by $m_2(\mu)$ the second moment of $\mu\in\mathcal{P}_2(\mathbb{R}^d)$.
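In one spatial dimension and for empirical measures with the same number of atoms, the infimum above is attained by the monotone coupling, i.e. by matching sorted samples; a small sketch (our own illustration, valid in 1-D only):

```python
def w2_empirical_1d(xs, ys):
    """2-Wasserstein distance between (1/n) sum_i delta_{x_i} and (1/n) sum_i delta_{y_i}.
    In 1-D the optimal coupling is monotone, i.e. it pairs sorted samples."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return (sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)) ** 0.5
```

For two Dirac measures $\delta_a$ and $\delta_b$ this returns $|a-b|$, and the distance is unaffected by a permutation of the samples.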

###### Remark 2.4.

Here $v(\mu,u)$ denotes the mean-field representation of $v^N$. In fact, for the structure given by (13), we obtain

$$(15)\qquad v(\mu,u) = -K_1 * \mu - \sum_{\ell=1}^M K_2(x - u^\ell).$$

In the mean-field setting we consider the following notion of solution.

###### Definition 2.5.

We call $\mu$ a weak measure solution of (14) with initial condition $\hat\mu$ iff for any test function $h$ we have

$$\int_0^T\int_{\mathbb{R}^d}\big(\partial_t h_t + v(\mu_t,u_t)\cdot\nabla h_t\big)\,d\mu_t\,dt + \int_{\mathbb{R}^d} h_0\,d\mu_0 = 0.$$

An existence and uniqueness result for solutions of (14) may be found, e.g., in [BraunHepp, WassersteinConvergence, Dobrushin, Golse], where the notion of solution is established in the Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$:

###### Proposition 2.6.

Assume 1 and 2 and let $\hat\mu\in\mathcal{P}_2(\mathbb{R}^d)$. Then, for every admissible control $u$ there exists a unique global (weak measure) solution $\mu$ of (14).

Further, for $\hat\mu = \frac{1}{N}\sum_{i=1}^N\delta_{\hat x^i}$ we have $\mu_t = \mu^N_t$, where $\hat x$ is the initial condition of (12).

###### Remark 2.7.

Under the assumptions 1 and 2 we have enough regularity to use the classical method of characteristics to deduce, for any $s\in[0,T]$, the existence of a unique global flow $Q_t(\cdot,s)$ satisfying

$$(16)\qquad \frac{d}{dt}Q_t(x,s) = v(\mu_t,u_t)\circ Q_t(x,s), \qquad Q_s(x,s) = x.$$

In particular, for $s=0$ we obtain the nonlinear flow with a random initial condition distributed according to $\hat\mu$. The solution of (14) may then be explicitly expressed as the push-forward $\mu_t = Q_t(\cdot,0)_{\#}\hat\mu$ for all $t\in[0,T]$. We shall make use of this representation at several points in the remainder. For simplicity we set $Q_t := Q_t(\cdot,0)$.
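The push-forward representation can be checked numerically on a toy case: for the (assumed, interaction-free) field $v(\mu,u)(x) = -x$ the flow is $Q_t(x) = e^{-t}x$, so the second moment of the transported measure decays like $e^{-2t}$. The sketch below moves samples of $\hat\mu$ along the characteristics (16) with forward Euler:

```python
import math

def flow(samples, v, dt, steps):
    """Transport samples of mu_hat along dQ/dt = v(Q) (forward Euler);
    the transported samples represent the push-forward measure."""
    for _ in range(steps):
        samples = [x + dt * v(x) for x in samples]
    return samples

def second_moment(samples):
    return sum(x * x for x in samples) / len(samples)
```

With a velocity field that depends on the measure itself, `v` would have to be re-evaluated from the current samples in every step; the sketch keeps the field fixed for clarity.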

The following stability statement will be useful in the coming results. Its proof may be found in Appendix A.

###### Lemma 2.8.

Let the assumptions 1 and 2 hold, and let $\mu$, $\mu'$ be solutions to the continuity equation (20) for given controls $u$, $u'$ and initial data $\hat\mu$, $\hat\mu'$, respectively. Then, there exist positive constants $a$ and $b$ such that

$$W_2^2(\mu_t,\mu'_t) \le \big(W_2^2(\hat\mu,\hat\mu') + b\,\|u-u'\|^2_{L^2((0,T),\mathbb{R}^{dM})}\big)\,e^{at} \qquad \text{for all } t\in[0,T].$$

We end this section with an important observation:

###### Remark 2.9.

We emphasize that the particle problem is just a special case of the mean-field problem, specified by the initial condition. Indeed, for the initial condition $\hat\mu = \frac{1}{N}\sum_{i=1}^N\delta_{\hat x^i}$ we have $\mu_t = \mu^N_t$, where $\hat x$ is the initial condition of (12). Strictly speaking, we therefore have only one optimization problem to consider in the following. Whether the problem at hand is of microscopic or mesoscopic type is determined by the initial condition.

### 2.2. Optimal Control Problem

We define the set of admissible controls as

This choice of admissible set ensures the continuity of the controls (compare also the previous existence results).

For the study of the respective optimal control problem we require:

1. The cost functional is of separable type, i.e.,

 (18) J(μ,u)=∫T0J1(μt)dt+J2(u),

where $J_2$ is continuously differentiable, weakly lower semicontinuous and coercive on the set of admissible controls. Further, $J_1$ is a cylindrical function of the form

$$J_1(\mu) = j\big(\langle g_1,\mu\rangle,\dots,\langle g_L,\mu\rangle\big),$$

where $j$ and $g_1,\dots,g_L$ are continuously differentiable, and

$$|\nabla g_\ell|(x) \le C_g(1+|x|) \qquad \text{for all } x\in\mathbb{R}^d \text{ and } \ell=1,\dots,L,$$

for some constant $C_g>0$.

2. For the microscopic case, we define $J_1^N(x) := J_1(\mu^N)$ as well as

$$(19)\qquad J^N(x,u) := \int_0^T J_1^N(x_t)\,dt + J_2(u),$$

and assume that $J_1^N$ is continuously differentiable.

###### Remark 2.10.

Note, that the differentiability properties in the previous assumptions are only necessary for the derivation of the optimality conditions in the next sections, and not for the existence of the respective optimal controls.

A direct consequence of assumption 3 is the continuity of $J_1$ in the Wasserstein metric.

###### Lemma 2.11.

Assume 3 and let $\mu,\nu$ be curves of measures with

$$M_1 := \max_{\ell=1,\dots,L}\,\sup_{t\in[0,T]}\big\{|\langle g_\ell,\mu_t\rangle| + |\langle g_\ell,\nu_t\rangle|\big\} < \infty, \qquad M_2 := \sup_{t\in[0,T]}\big\{m_2(\mu_t) + m_2(\nu_t)\big\} < \infty.$$

Then, there exists a constant $C_j > 0$, independent of $t$, such that

$$|J_1(\mu_t) - J_1(\nu_t)| \le C_j\,W_2(\mu_t,\nu_t) \qquad \text{for all } t\in[0,T].$$
###### Proof.

Let $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^d)$ be arbitrary. Then, for each $\ell=1,\dots,L$, we have by 3, the mean-value theorem and Hölder's inequality that

$$\begin{aligned} |\langle g_\ell,\mu\rangle - \langle g_\ell,\nu\rangle| &\le \iint_{\mathbb{R}^d\times\mathbb{R}^d} |g_\ell(x) - g_\ell(y)|\,d\pi \\ &\le \iint_{\mathbb{R}^d\times\mathbb{R}^d}\int_0^1 |\nabla g_\ell|\big((1-\tau)x + \tau y\big)\,|y-x|\,d\tau\,d\pi \\ &\le \iint_{\mathbb{R}^d\times\mathbb{R}^d}\int_0^1 C_g\big(1 + |(1-\tau)x + \tau y|\big)\,|y-x|\,d\tau\,d\pi \\ &\le C_g\big[1 + \big(\sqrt{m_2(\mu)} + \sqrt{m_2(\nu)}\big)\big]\,W_2(\mu,\nu), \end{aligned}$$

where $\pi$ is the optimal coupling between $\mu$ and $\nu$. In particular, the estimate above shows that the mapping $\mu\mapsto\langle g_\ell,\mu\rangle$ is locally Lipschitz for every $\ell=1,\dots,L$.

Denote $p_t := (\langle g_1,\mu_t\rangle,\dots,\langle g_L,\mu_t\rangle)$ and $q_t := (\langle g_1,\nu_t\rangle,\dots,\langle g_L,\nu_t\rangle)$. The assumptions on $j$ and $g_\ell$, together with the previous estimate, yield

$$\begin{aligned} |J_1(\mu_t) - J_1(\nu_t)| &\le \int_0^1 \big|Dj\big(q_t + \tau(p_t - q_t)\big)\big|\,|p_t - q_t|\,d\tau \\ &\le L\,C_g\big(1 + 2\sqrt{M_2}\big)\Big(\sup_{p\in B_{L M_1}}|Dj(p)|\Big)\,W_2(\mu_t,\nu_t), \end{aligned}$$

where we used the fact that $|q_t + \tau(p_t - q_t)| \le L M_1$ for all $\tau\in[0,1]$ and $t\in[0,T]$. ∎

###### Remark 2.12.

Note that cost functionals that track the center of mass and the variance of a crowd satisfy 3 and 4. In fact, for weights $\lambda_1,\lambda_2,\lambda_3 \ge 0$ and a desired position $x_{\mathrm{des}}$,

$$j(y_1,y_2) = \frac{\lambda_1}{2}|y_1 - x_{\mathrm{des}}|^2 + \frac{\lambda_2}{4}|y_2 - y_1|^2, \qquad g_1(x) = x, \quad g_2(x) = |x|^2, \qquad J_2(u) = \frac{\lambda_3}{2}\sum_{m=1}^M \int_0^T \Big|\frac{d}{dt}u^m_t\Big|^2\,dt$$

fit into the setting. Therefore, the assumptions are rather general and not restrictive for applications (cf. [sheep1]).
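Evaluated on an empirical measure, the cylindrical tracking functional of this remark takes only a few lines (a 1-D sketch of our own; $j$ is taken verbatim in the form printed above, with $y_1 = \langle g_1,\mu\rangle$ and $y_2 = \langle g_2,\mu\rangle$):

```python
def J1_empirical(xs, x_des, lam1, lam2):
    """j(<g1,mu>, <g2,mu>) for an empirical measure mu = (1/n) sum_i delta_{x_i},
    with g1(x) = x (center of mass) and g2(x) = x**2 (second moment)."""
    n = len(xs)
    y1 = sum(xs) / n
    y2 = sum(x * x for x in xs) / n
    return lam1 / 2.0 * (y1 - x_des) ** 2 + lam2 / 4.0 * (y2 - y1) ** 2
```

The functional vanishes exactly when the center of mass sits at $x_{\mathrm{des}}$ and the two moments coincide.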

The well-posedness of the state problem justifies the notation $\mu(u)$ for the unique solution of the state equation corresponding to the control $u$. Then, the optimal control problem we investigate in the following is given by

###### Problem 1:

Find an optimal control $\bar u$ such that

$$(\mathrm{P}_\infty)\qquad (\mu(\bar u),\bar u) = \operatorname*{argmin}_{\mu,u} J(\mu,u) \quad \text{subject to } (14).$$

For later use, we note that in the particle case, i.e., for discrete initial data (cf. Remark 2.9), we can rewrite the optimization problem as follows:
For $N$ fixed, find $\bar u^N$ such that

$$(\mathrm{P}_N)\qquad (\bar x^N(\bar u^N),\bar u^N) = \operatorname*{argmin}_{x,u} J^N(x,u) \quad \text{subject to } (12).$$
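On a tiny instance, the particle problem can be attacked directly by descent on the reduced cost. The sketch below is entirely illustrative (our own assumptions: two particles, one control $u$ acting through the kernel $K_2(z)=z$, no particle-particle force, a quadratic tracking cost on the center of mass plus a small control penalty); it uses finite-difference gradients only and merely demonstrates that the reduced cost decreases:

```python
def simulate(u, x0, dt):
    """Forward Euler for x_i' = -(x_i - u_t): each particle is attracted to the control."""
    xs = list(x0)
    traj = [list(xs)]
    for ut in u:
        xs = [x + dt * (ut - x) for x in xs]
        traj.append(list(xs))
    return traj

def reduced_cost(u, x0, dt, target):
    """Running cost on the center of mass plus a small quadratic control penalty."""
    traj = simulate(u, x0, dt)
    track = dt * sum((sum(xs) / len(xs) - target) ** 2 for xs in traj[:-1])
    penalty = 1e-3 * dt * sum(ut * ut for ut in u)
    return track + penalty

def descend(u, x0, dt, target, lr=1.0, iters=30, eps=1e-6):
    """Plain finite-difference gradient descent on the reduced cost."""
    u = list(u)
    for _ in range(iters):
        base = reduced_cost(u, x0, dt, target)
        grad = []
        for k in range(len(u)):
            up = list(u)
            up[k] += eps
            grad.append((reduced_cost(up, x0, dt, target) - base) / eps)
        u = [uk - lr * gk for uk, gk in zip(u, grad)]
    return u
```

A serious implementation would of course replace the finite-difference loop by the adjoint sweep of Section 1.1, which costs one backward solve instead of one forward solve per control degree of freedom.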

Using the standard argument based on the boundedness of a minimizing sequence and the continuity properties of $J$ stated in 3 and 4, we obtain the following existence result:

###### Theorem 2.13.

Assume 1-4. Then, the optimal control problem $(\mathrm{P}_\infty)$ has a solution $(\bar\mu,\bar u)$.

###### Remark 2.14.

The well-posedness of $(\mathrm{P}_N)$ follows directly from the above theorem, as the particle problem is a special case of $(\mathrm{P}_\infty)$, see Remark 2.9. Nevertheless, one can prove the well-posedness of $(\mathrm{P}_N)$ also directly using classical techniques in the optimal control of ODEs.

## 3. First-order optimality conditions in the Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$

The main objective of this section is to derive the first-order optimality conditions (FOC) for the optimal control problem $(\mathrm{P}_\infty)$ in the framework of probability measures with bounded second moment, equipped with the 2-Wasserstein distance. For the sake of a smooth presentation we restrict the interaction terms to the special ones defined in (13) and (15), respectively. This allows us to pose the following regularity assumption:

1. .

###### Remark 3.1.

Note that assumption 5 directly implies that $v(\mu,u)$ defined by (15) is bounded with bounded gradient for every $(\mu,u)$, with

$$K_v := \sup_{\mu,u}\big\{\|v(\mu,u)\|_\infty + \|\nabla v(\mu,u)\|_\infty\big\} < \infty.$$

For given initial condition $\hat\mu$ we define the state space as

$$Y = \big\{\mu\in C([0,T],\mathcal{P}_2(\mathbb{R}^d)) : \mu_t|_{t=0} = \hat\mu\in\mathcal{P}_2(\mathbb{R}^d)\big\}.$$

As optimization in this setting is less standard, we begin by discussing known results (see [Ambrosio, Chapter 8.1]) regarding the constraint

$$(20)\qquad \partial_t\mu_t + \nabla\cdot\big(v(\mu_t,u_t)\,\mu_t\big) = 0, \qquad \mu_t|_{t=0} = \hat\mu.$$

Recall Proposition 2.6, which provides for each control $u$ a unique solution $\mu$ of (20). In particular, $\mu$ satisfies

$$(21)\qquad E(\mu,u)[\varphi] := \langle\varphi_T,\mu_T\rangle - \langle\varphi_0,\hat\mu\rangle - \int_0^T \big\langle \partial_t\varphi_t + v(\mu_t,u_t)\cdot\nabla\varphi_t,\, \mu_t\big\rangle\,dt = 0,$$

for all test functions $\varphi$. Therefore, there is a well-defined solution operator $u\mapsto\mu(u)$, which allows us to recast the constrained minimization problem in reduced form,

$$\min_u \hat{J}(u) := J(\mu(u),u),$$

where $\hat{J}$ is the so-called reduced cost functional.

###### Definition 3.2.

A pair $(\mu,u)$ is said to be admissible if $E(\mu,u)[\varphi] = 0$ for all test functions $\varphi$.

Unfortunately, the reduced cost functional is not handy for deriving the first-order optimality conditions for $(\mathrm{P}_\infty)$. For this reason, we take an extended-Lagrangian approach. We begin by observing that $(\mathrm{P}_\infty)$ may be recast as a minimization of $J$ over admissible pairs $(\mu,u)$, which may be further reformulated as

$$(22)\qquad \min_{(\mu,u)} I(\mu,u) = \min_{(\mu,u)}\Big\{J(\mu,u) + \sup_{\varphi\in A} E(\mu,u)[\varphi]\Big\}.$$

Indeed, notice that $I(\mu,u) = J(\mu,u)$ for every admissible pair, since admissibility implies $E(\mu,u)[\varphi] = 0$ for every $\varphi\in A$. Therefore, if $E(\mu,u)[\varphi]\neq 0$ for some $\varphi\in A$, the linearity in $\varphi$ of $E(\mu,u)$ yields $\sup_{\varphi\in A} E(\mu,u)[\varphi] = +\infty$, which consequently shows that minimizers of $I$ are admissible.

Under the separability assumption (18) on $J$, (22) becomes

$$\min_u \big\{J_2(u) + \chi(u)\big\},$$

with

$$\chi(u) = \min_\mu\,\sup_{\varphi\in A}\big\{J_1(\mu) + E(\mu,u)[\varphi]\big\}.$$

In the following we derive a necessary condition for a stationary point. Let $(\bar\mu,\bar u)$ be an optimal pair, and let $u^\delta = \bar u + \delta h$ be a perturbation of $\bar u$ for an arbitrary smooth map $h$ such that $u^\delta$ is admissible and there exists a unique $\mu^\delta$ satisfying $E(\mu^\delta,u^\delta)[\varphi] = 0$ for all $\varphi\in A$. Then

$$\begin{aligned}\chi(u^\delta) &= \min_\mu\,\sup_{\varphi\in A}\big\{J_1(\mu) + E(\mu,u^\delta)[\varphi]\big\} = J_1(\mu^\delta) \\ &= J_1(\mu^\delta) - J_1(\bar\mu) + \min_\mu\,\sup_{\varphi\in A}\big\{J_1(\mu) + E(\mu,\bar u)[\varphi]\big\} \\ &= J_1(\mu^\delta) - J_1(\bar\mu) + \chi(\bar u),\end{aligned}$$

and the directional derivative of $\chi$ at $\bar u$ along h is given by

$$d\chi(\bar u)[h] = \lim_{\delta\to 0}\frac{J_1(\mu^\delta) - J_1(\bar\mu)}{\delta},$$

which requires us to know the relationship between $\mu^\delta$ and $\bar\mu$.

###### Remark 3.3.

Note that Lemma 2.8 above provides a stability estimate of the form

$$W_2(\mu^\delta_t,\mu_t) \le \delta\,\sqrt{b}\,e^{aT/2}\,\|h\|_{L^2((0,T),\mathbb{R}^{dM})} \qquad \text{for all } t\in[0,T],$$

for appropriate constants $a,b>0$. Hence, for each $t\in[0,T]$, the curve $\delta\mapsto\mu^\delta_t$ starting from $\mu_t$ at $\delta=0$ is absolutely continuous w.r.t. the 2-Wasserstein distance. In this case, there exists a vector field $\psi_t$ for each $t\in[0,T]$ satisfying [Ambrosio, Proposition 8.4.6]

$$(23)\qquad \lim_{\delta\to 0}\frac{W_2\big(\mu^\delta_t,\,(\mathrm{id}+\delta\psi_t)_{\#}\mu_t\big)}{\delta} = 0.$$

Furthermore,

$$W_2^2\big((\mathrm{id}+\delta\psi_t)_{\#}\mu_t,\,\mu_t\big) \le \int_{\mathbb{R}^d} |x + \delta\psi_t(x) - x|^2\,d\mu_t(x) = \delta^2\int_{\mathbb{R}^d} |\psi_t(x)|^2\,d\mu_t(x),$$

where the explicit coupling $\big(\mathrm{id}\times(\mathrm{id}+\delta\psi_t)\big)_{\#}\mu_t$ was used. In particular, we have that

$$\limsup_{\delta\to 0}\frac{W_2(\mu^\delta_t,\mu_t)}{\delta} = \limsup_{\delta\to 0}\frac{W_2\big((\mathrm{id}+\delta\psi_t)_{\#}\mu_t,\,\mu_t\big)}{\delta} \le \sqrt{\int_{\mathbb{R}^d} |\psi_t|^2\,d\mu_t}.$$
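The bound above can be tested numerically for an empirical measure on the line, where $W_2$ is computed by matching sorted samples. A sketch with the illustrative choice $\psi(x)=x^2$: for small $\delta$ the perturbed map is increasing, so the monotone matching is optimal and the inequality is in fact an equality.

```python
def w2_1d(xs, ys):
    """W2 between two empirical measures with equally many atoms (1-D monotone matching)."""
    xs, ys = sorted(xs), sorted(ys)
    return (sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)) ** 0.5

def pushforward(xs, psi, delta):
    """Samples of (id + delta * psi)_# mu for the empirical measure carried by xs."""
    return [x + delta * psi(x) for x in xs]

def l2_norm(xs, psi):
    """L2(mu)-norm of the perturbation field psi."""
    return (sum(psi(x) ** 2 for x in xs) / len(xs)) ** 0.5
```

For larger $\delta$ the map $\mathrm{id}+\delta\psi$ may fail to be monotone, in which case the explicit coupling is no longer optimal and the inequality becomes strict.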

The previous remark allows us to establish an explicit relationship between $\psi$ and h.

###### Lemma 3.4.

Let $(\mu,u)$ be an admissible pair, and let $h$ be such that

1. $u^\delta := u + \delta h$ is an admissible control, and

2. there exists a unique $\mu^\delta$ satisfying $E(\mu^\delta,u^\delta)[\varphi] = 0$ for all $\varphi\in A$,

for $\delta>0$ sufficiently small. Then, the velocity field $\psi$ satisfying (23) fulfills

$$(24)\qquad \mu_t\big(\partial_t\psi_t + D\psi_t\,v(\mu_t,u_t) - K(\mu_t,u_t)[\psi_t,h_t]\big) = 0, \quad \text{in the sense of distributions},$$

with

$$K(\mu_t,u_t)[\psi_t,h_t] := \lim_{\delta\to 0}\frac{1}{\delta}\Big[v\big((\mathrm{id}+\delta\psi_t)_{\#}\mu_t,\,u^\delta_t\big)\circ(\mathrm{id}+\delta\psi_t) - v(\mu_t,u_t)\Big].$$
###### Proof.

Let $\nu^\delta_t := (\mathrm{id}+\delta\psi_t)_{\#}\mu_t$. Then for any test function $F$,

$$\begin{aligned}\frac{d}{dt}\int F\,d\nu^\delta_t &= \frac{d}{dt}\int F\circ(\mathrm{id}+\delta\psi_t)\,d\mu_t \\ &= \int \big\langle (\nabla F)\circ(\mathrm{id}+\delta\psi_t),\, \delta\,\partial_t\psi_t\big\rangle\,d\mu_t + \int \big\langle \nabla\big(F\circ(\mathrm{id}+\delta\psi_t)\big),\, v(\mu_t,u_t)\big\rangle\,d\mu_t \\ &= \int \big\langle (\nabla F)\circ(\mathrm{id}+\delta\psi_t),\, \delta\,\partial_t\psi_t + (I + \delta D\psi_t)\,v(\mu_t,u_t)\big\rangle\,d\mu_t.\end{aligned}$$

On the other hand, since $E(\mu^\delta,u^\delta)[\varphi] = 0$ for all $\varphi\in A$, we also have that

$$\begin{aligned}\frac{d}{dt}\int F\,d\mu^\delta_t &= \int \big\langle \nabla F,\, v(\mu^\delta_t,u^\delta_t)\big\rangle\,d\mu^\delta_t = \int \big\langle \nabla F,\, v(\nu^\delta_t,u^\delta_t)\big\rangle\,d\nu^\delta_t + (\mathrm{I}) \\ &= \int \big\langle (\nabla F)\circ(\mathrm{id}+\delta\psi_t),\, v(\nu^\delta_t,u^\delta_t)\circ(\mathrm{id}+\delta\psi_t)\big\rangle\,d\mu_t + (\mathrm{I}),\end{aligned}$$

where

$$(\mathrm{I}) := \int \big\langle \nabla F,\, v(\mu^\delta_t,u^\delta_t)\big\rangle\,d\mu^\delta_t - \int \big\langle \nabla F,\, v(\nu^\delta_t,u^\delta_t)\big\rangle\,d\nu^\delta_t.$$

We prove that $(\mathrm{I}) = o(\delta)$ for $\delta\to 0$: Let $\pi^\delta_t$ be an optimal coupling between $\mu^\delta_t$ and $\nu^\delta_t$. Then

$$\begin{aligned}(\mathrm{I}) &= \iint \big[\big\langle \nabla F(x),\, v(\mu^\delta_t,u^\delta_t)(x)\big\rangle - \big\langle \nabla F(y),\, v(\nu^\delta_t,u^\delta_t)(y)\big\rangle\big]\,d\pi^\delta_t \\ &= \iint \big\langle \nabla F(x) - \nabla F(y),\, v(\mu^\delta_t,u^\delta_t)(x)\big\rangle\,d\pi^\delta_t + \iint \big\langle \nabla F(y),\, v(\mu^\delta_t,u^\delta_t)(x) - v(\nu^\delta_t,u^\delta_t)(y)\big\rangle\,d\pi^\delta_t\end{aligned}$$