Decentralized Convergence to Nash Equilibria in Constrained Deterministic
Mean Field Control
This paper considers decentralized control and optimization methodologies for large populations of systems, consisting of several agents with different individual behaviors, constraints and interests, and affected by the aggregate behavior of the overall population. For such large-scale systems, the theory of aggregative and mean field games has been established and successfully applied in various scientific disciplines. While the existing literature addresses the case of unconstrained agents, we formulate deterministic mean field control problems in the presence of heterogeneous convex constraints for the individual agents, for instance arising from agents with linear dynamics subject to convex state and control constraints. We propose several model-free feedback iterations to compute in a decentralized fashion a mean field Nash equilibrium in the limit of infinite population size. We apply our methods to the constrained linear quadratic deterministic mean field control problem and to the constrained mean field charging control problem for large populations of plug-in electric vehicles.
Decentralized control and optimization in large populations of systems are of interest to various scientific disciplines, such as engineering, mathematics, social sciences, systems biology and economics. A population of systems comprises several interacting heterogeneous agents, each with its own individual dynamic behavior and interest. For the case of small/medium size populations, such interactions can be analyzed via dynamic noncooperative game theory.
On the other hand, for large populations of systems the analytic solution of the game equations becomes computationally intractable. Aggregative and population games [2, 3, 4, 5] represent a viable solution method to address large population problems where the behavior of each agent is affected by some aggregate effect of all the agents, rather than by specific one-to-one effects. This feature has attracted substantial research interest, motivated by several relevant applications, including demand side management (DSM) for large populations of prosumers in smart grids [6, 7, 8, 9], charging coordination for large fleets of plug-in electric vehicles (PEVs) [10, 11, 12], congestion control for networks of shared resources, and synchronization of populations of coupled oscillators in power networks [14, 15].
Along these lines, Mean Field (MF) games have emerged as a methodology to study multi-agent coordination problems where each individual agent is influenced by the statistical distribution of the population, and its contribution to the population distribution vanishes as the number of agents grows [16, 17, 18]. Specific research attention has been devoted to MF setups where the effect of the population on each individual agent is given by a weighted average among the agents’ strategies. Compared with aggregative games, the distinctive feature of MF games is the emphasis on the limit of infinite population size, as this abstraction allows one to approximate the average population behavior based on its statistical properties only [16, 17, 18]. In the most general case, as the number of agents tends to infinity, the coupled interactions among the agents can be modeled mathematically via a system of two coupled Partial Differential Equations (PDEs), the Hamilton–Jacobi–Bellman (HJB) PDE for the optimal response of each individual agent [16, 17] and the Fokker–Planck–Kolmogorov (FPK) PDE for the dynamical evolution of the population distribution. From the computational point of view, in the classical MF game setups, all the agents need information regarding the statistical properties of the population behavior to solve the MF equations in a decentralized fashion.
In this paper, we consider deterministic MF games, as in [7, 10, 12, 19], with an information structure for the agents which differs from the one of classical MF games. Specifically, we assume that the agents do not have access to the statistical properties of the population but, on the contrary, react optimally to a common external signal, which is broadcast by a central population coordinator. This information structure is typical of many large-scale multi-agent coordination problems, for instance in large fleets of PEVs [10, 11, 12], DSM in smart grids [6, 8, 9], and congestion control. We then define the mean field control problem as the task of designing an incentive signal that the central coordinator should broadcast so that the decentralized optimal responses of the agents satisfy some desired properties, in terms of the original deterministic MF game. Contrary to the standard approach used to solve MF games, our MF control approach allows us to compute (almost) Nash equilibria for deterministic MF games in which the individual agents are subject to heterogeneous convex constraints, for instance arising from different linear dynamics, convex state and input constraints. Our motivation comes from the fact that constrained systems arise naturally in almost all engineering applications, playing an active role in the agent behavior.
In the presence of constraints, the optimal response of each agent is in general not known in closed form. To overcome this difficulty, we build on mathematical definitions and tools from convex analysis and operator theory [20, 21], establishing useful regularity properties of the mapping describing the aggregate population behavior. We solve the constrained deterministic MF control problem via several specific feedback iterations and show convergence to an incentive signal generating a MF equilibrium in a decentralized fashion, making our methods scalable as the population size increases. Analogously to [10, 12, 17, 19], we seek convergence to a MF Nash equilibrium, that is, we focus on equilibria in which each agent has no interest in changing its strategy, given the aggregate strategy of the others.
The contributions of the paper are hence the following:
We address the deterministic mean field control problem for populations of agents with heterogeneous convex constraints.
We show that the set of optimal responses to an incentive signal that is a fixed point of the population aggregation mapping gets arbitrarily close to a mean field Nash equilibrium, as the population size grows.
We show several regularity properties of the mappings arising in constrained deterministic mean field control problems.
We show that specific feedback iterations are suited to solve constrained deterministic mean field control problems with specific regularity.
We apply our results to the constrained linear quadratic deterministic mean field control problem and to the constrained mean field charging control problem for large populations of plug-in electric vehicles, extending results from the literature.
The paper is structured as follows. Section II presents as a motivating example the LQ deterministic MF control problem for agents with linear dynamics, quadratic cost function, convex state and input constraints. Section III shows the general deterministic MF control problem and the technical result about the approximation of a MF Nash equilibrium. Section IV contains the main results, regarding some regularity properties of parametric convex programs arising in deterministic MF problems and the decentralized convergence to a MF Nash equilibrium of specific feedback iterations. Section V discusses two applications of our technical results; it revisits the constrained LQ deterministic MF control problem and presents the constrained MF charging problem for large populations of heterogeneous PEVs. Section VI concludes the paper and highlights several possible extensions and applications. Appendix A presents some background definitions and results from operator theory; Appendix B justifies the use of finite-horizon formulations to approximate infinite-horizon discounted-cost ones; Appendix C contains all the proofs of the main results.
, , respectively denote the set of real, positive real, non-negative real numbers; denotes the set of natural numbers; denotes the set of integer numbers; for , , . denotes the transpose of . Given vectors , denotes . Given matrices , denotes the block diagonal matrix with in block diagonal positions. With we denote the set of symmetric matrices; for a given , the notations () and () denote that is symmetric and has positive (non-negative) eigenvalues. We denote by , with , the Hilbert space with inner product defined as , and induced norm defined as . A mapping is Lipschitz in if there exists such that for all . denotes the identity operator, for all . Every mentioned set is meant to be nonempty, unless explicitly stated. The projection operator in , , is defined as . denotes the -dimensional identity matrix; denotes a matrix of all s; denotes a matrix/vector of all s. denotes the Kronecker product between matrices and . Given , and , denotes the set ; hence given and , . The notation denotes that there exists such that .
II Constrained linear quadratic deterministic mean field control as motivating example
We start by considering a population of agents, where each agent has discrete-time linear dynamics
where is the state variable, is the input variable, and , . For each agent, we consider time-varying state and input constraints
for all , where and are convex compact sets.
Let us consider that each agent seeks a dynamical evolution that, given the initial state , minimizes the finite-horizon cost function
where , for all , , for all , and .
The cost function in (3) is the sum of two cost terms, and ; the first penalizes deviations from the average population behavior plus some constant offset , while the second penalizes the control effort of the single agent. Note that the time-varying weights and can also model an exponential cost-discount factor as in [17, Equation 2.6], e.g., and for some and .
We emphasize that the optimal decision of each agent , that is, a feasible dynamical evolution minimizing the cost in (3), also depends on the decisions of all other agents through the average state among the population. This feature results in an aggregative game [2, 3, 4] among the population of agents, and specifically in a (deterministic) MF game, because the individual agent’s state/decision depends on the mean population state [10, 19].
The constrained LQ deterministic MF control problem then consists in steering the optimal responses to a noncooperative equilibrium of the original MF game, which satisfies the constraints and is convenient for all the individual noncooperative agents, via an appropriate incentive signal. To solve the MF control problem, we consider an algorithmic setup where a central coordinator, called virtual agent in [17, Section IV.B], broadcasts a macroscopic incentive related to the average population state to all the agents. In other words, the individual agents have no detailed information about every other agent, nor about their statistical distribution, but only react to the information broadcast by a central coordinator, which is somehow related to their aggregate behavior.
Formally, we define the optimal response to a given reference vector of each agent as the solution to the following finite horizon optimal control problem:
We assume that the optimization problem (4) is feasible for all agents , that is, given the initial state , we assume that there exists a control input sequence such that the sets are reachable at time steps , respectively [22, Chapter 6]. This assumption can be checked by solving a convex feasibility problem; furthermore, the set of initial states such that (4) is solvable can be computed by solving the feasibility problem parametrically in .
We refer to [17, Section III] for the stochastic continuous-time infinite-horizon unconstrained counterpart of our linear quadratic (LQ) MF game. Here we focus on a discrete-time finite-horizon formulation to effectively address state and input constraints, by embedding them in finite-dimensional convex quadratic programs (QPs) that are efficiently solvable numerically.
Let us now rewrite the optimization problem in (4) in the following compact form:
and, for a given initial condition ,
III Deterministic mean field control problem with convex constraints
III-A Constrained deterministic mean field game with quadratic cost function
We consider a large population of heterogeneous agents, where each agent controls its decision variable , taking values in the compact and convex set . The aim of agent is to minimize its individual deterministic cost which depends on its own strategy and on the weighted average of strategies of all the agents, that is for some aggregation parameters . Technically, each agent aims at computing the best response to the other agents’ strategies , that is,
Note that the best response mapping depends only on the aggregate of the other players' strategies, thus leading to a MF game setup. In classical game theory, a set of strategies in which every agent is playing a best response to the other players' strategies is called Nash equilibrium. In the MF case, the concept is similar: if the population is at a MF Nash equilibrium, then each agent has no individual benefit to change its strategy, given the aggregation among the strategies of the others.
Definition 1 (Mean field Nash equilibrium)
Given a cost function and aggregation parameters , a set of strategies is a MF -Nash equilibrium, with , if for all it holds
It is a MF Nash equilibrium if (10) holds with .
In the sequel, we consider the class of deterministic MF games with convex quadratic cost defined as
where , , and .
The three cost terms in (12) correspond to three different contributions to the cost function: a quadratic cost , typical of LQ MF games [19, 17, 23], a quadratic penalty on the deviations from the aggregate information [17, 10], and an affine price-based incentive [10, 12]. Let us also notice that the agents are fully heterogeneous relative to the constraint sets .
Throughout the paper, we consider uniformly bounded aggregation parameters and individual constraint sets for all population sizes, which is typical of all the mentioned engineering applications.
Standing Assumption 1 (Compactness)
There exist and a compact set such that for all , , and hold for all .
III-B Information structure and mean field control
We notice that to compute the best response strategy each agent would need to know the aggregation among the strategies of all other agents, namely . Motivated by several large-scale multi-agent applications [6, 8, 9, 10, 12, 13], here we consider a different information structure where each individual agent has neither knowledge about the states of the other agents, nor about the aggregation parameters . Instead, here every agent reacts to some macroscopic incentive, which is a function of the aggregate information about the whole population, including its contribution as well, and is broadcast to all the agents. Given this information structure, we assume that each agent reacts to a broadcast signal through the optimal-response mapping defined as
where is as defined in (11).
Moreover, let us formalize the aggregate (e.g., average) population behavior obtained when all the agents react optimally to a macroscopic signal by defining with the aggregation mapping as
The difference between the best response mapping defining the game in (9), and the optimal response mapping in (12) is that, while in the former an agent can also optimize its contribution in , in the latter the signal is fixed and hence the optimization in (12) is carried over the first argument of only.
According to the information structure described above, the MF control addresses the problem of designing a reference signal , such that the set of strategies possesses some desired properties, relative to the deterministic MF game in (9). Specifically, here we require the set of strategies to be an almost MF Nash equilibrium. To solve this MF control problem, we consider a setup where the agents communicate with a central coordinator in a decentralized iterative fashion. Namely, for a given broadcast signal at iteration , each agent computes its optimal response based only on its own constraint set , which is its private information. The central coordinator then receives the aggregate of all the individual responses, computes an updated reference through some feedback mapping , broadcasts it to the whole population, and the process is repeated.
Technically speaking, given the cost function , the agents' constraint sets and the aggregation parameters , the MF control problem consists in designing a signal , for instance via a feedback iteration such that, for any initial condition , which generates a MF (almost) Nash equilibrium for the original MF game in (9).
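The coordinator-agent loop described above can be sketched in Python. This is only a minimal illustration under assumptions that are not taken from the paper: scalar strategies, interval constraints, a pure tracking cost (x - z)^2 (so that each optimal response is the projection of the broadcast signal onto the agent's own interval), unit aggregation parameters, and the plain update z_{k+1} = A(z_k) as feedback mapping. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # hypothetical private constraint set [lo, hi]


def optimal_response(z: float, box: Interval) -> float:
    """Agent side: minimize the assumed cost (x - z)^2 over its own interval."""
    lo, hi = box
    return min(max(z, lo), hi)


def aggregate(z: float, boxes: List[Interval]) -> float:
    """Coordinator side: average of the optimal responses to the signal z."""
    responses = [optimal_response(z, box) for box in boxes]
    return sum(responses) / len(responses)


def run_feedback_iteration(z0: float, boxes: List[Interval], steps: int = 100) -> float:
    """Broadcast z, collect the aggregate response, update z, and repeat."""
    z = z0
    for _ in range(steps):
        z = aggregate(z, boxes)  # assumed feedback mapping: z_{k+1} = A(z_k)
    return z


boxes = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]  # three heterogeneous toy agents
z_bar = run_feedback_iteration(z0=10.0, boxes=boxes)
print(z_bar, aggregate(z_bar, boxes))
```

In this toy instance each agent uses only its own interval and the broadcast value, mirroring the information structure in the text; the returned signal is (numerically) a fixed point of the aggregation.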
III-C Mean field Nash equilibrium in the limit of infinite population size
Since the objective of our MF control problem is to find a MF Nash equilibrium for large population size, we exploit the Nash certainty equivalence principle or mean field approximation idea [17, Section IV.A]. Namely, for any agent , the problem structure is such that the contribution of an individual strategy to the average population behavior is negligible. Therefore, if , then the optimal response approximates the best response of agent to the strategies of the other players, for large population size .
Formally, under the uniform compactness condition for all population sizes in Standing Assumption 1, the following result shows that a fixed point of the aggregation mapping in (13) generates a MF Nash equilibrium in the limit of infinite population size.
Theorem 1 (Infinite population limit)
It follows from the proof of Theorem 1, given in Appendix C, that a fixed point of in (13) with population size is a MF -Nash equilibrium with . Having a uniform upper bound on the aggregation parameters means that no single agent has a disproportionate influence on the population aggregation for large population size, which is a typical feature of MF setups [19, 17, 10].
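The vanishing benefit of unilateral deviations can be checked numerically on a toy game. The sketch below rests on assumptions chosen only for illustration: scalar strategies in intervals [0, u_i] with u_i in {1, 2, 3}, unit aggregation parameters, the hypothetical cost J(x, sigma) = 0.5 x^2 + (sigma - 3) x, and a damped fixed point iteration to compute the signal. For each agent it compares the cost at the optimal response to the fixed point signal with the cost of the exact best response that accounts for the agent's own contribution to the average.

```python
from typing import List


def clip(v: float, lo: float, hi: float) -> float:
    return min(max(v, lo), hi)


def cost(x: float, sigma: float) -> float:
    """Hypothetical cost J(x, sigma) = 0.5 x^2 + (sigma - 3) x."""
    return 0.5 * x * x + (sigma - 3.0) * x


def fixed_point_signal(upper: List[float], steps: int = 200) -> float:
    """Damped iteration z <- 0.5 z + 0.5 A(z), with A(z) = mean_i clip(3 - z, 0, u_i)."""
    z = 0.0
    for _ in range(steps):
        agg = sum(clip(3.0 - z, 0.0, u) for u in upper) / len(upper)
        z = 0.5 * z + 0.5 * agg
    return z


def max_deviation_benefit(n: int) -> float:
    """Largest cost reduction any single agent gets by deviating unilaterally."""
    upper = [1.0 + (i % 3) for i in range(n)]   # heterogeneous bounds 1, 2, 3
    z = fixed_point_signal(upper)
    x = [clip(3.0 - z, 0.0, u) for u in upper]  # optimal responses to the signal z
    total = sum(x)
    worst = 0.0
    for xi, u in zip(x, upper):
        others = total - xi
        # Exact best response: minimizes (0.5 + 1/n) y^2 + (others/n - 3) y on [0, u].
        y = clip((3.0 - others / n) / (1.0 + 2.0 / n), 0.0, u)
        benefit = cost(xi, total / n) - cost(y, (others + y) / n)
        worst = max(worst, benefit)
    return worst


eps_small = max_deviation_benefit(30)
eps_large = max_deviation_benefit(300)
print(eps_small, eps_large)
```

In this instance the maximum benefit is non-negative and shrinks as the population grows, in line with the qualitative statement above; the specific decay observed here is a property of the toy cost, not a claim about the general bound.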
IV The quest for a fixed point of the aggregation mapping
IV-A Mathematical tools from fixed point operator theory
In this section we present the mathematical definitions needed for the technical results in Section IV-B, regarding appropriate fixed point iterations relative to the aggregation mapping. For ease of notation, the statements of this section are formulated in an arbitrary finite-dimensional Hilbert space , that is, in terms of an arbitrary norm on , but in general hold for infinite-dimensional metric spaces.
We start from the property of contractiveness [21, Definition 1.6], exploited in most of the MF control literature [17, 23, 10] to show, under appropriate technical assumptions, convergence to a fixed point of the aggregation mapping.
Definition 2 (Contraction mapping)
A mapping is a contraction (CON) if there exists such that
for all .
If a mapping is CON, then the Picard–Banach iteration, ,
converges, for any initial condition , to its unique fixed point [21, Theorem 2.1].
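As a minimal sketch of this iteration, the snippet below repeatedly applies a scalar mapping to its own output. The affine map used here is a hypothetical example with Lipschitz constant 0.5 and unique fixed point 2; it is not taken from the paper.

```python
from typing import Callable


def picard_banach(f: Callable[[float], float], z0: float, steps: int) -> float:
    """Iterate z_{k+1} = f(z_k) for a fixed number of steps."""
    z = z0
    for _ in range(steps):
        z = f(z)
    return z


def contraction(z: float) -> float:
    """Hypothetical contraction: |f(x) - f(y)| = 0.5 |x - y|, fixed point z = 2."""
    return 0.5 * z + 1.0


z_star = picard_banach(contraction, z0=-7.0, steps=60)
one_step_error = abs(picard_banach(contraction, z0=-7.0, steps=1) - 2.0)
print(z_star, one_step_error)
```

The distance to the fixed point is halved at every step in this example, which is the linear rate one expects from a contraction.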
Although commonly used in the MF game literature [17, 23, 10], contractiveness is a quite restrictive property. In this paper we actually exploit less restrictive properties than contractiveness, starting with nonexpansiveness [20, Definition 4.1 (ii)].
Definition 3 (NonExpansive mapping)
A mapping is nonexpansive (NE) if
for all .
Clearly, a CON mapping is also NE, while the converse does not necessarily hold. Note that, unlike CON mappings, NE mappings, e.g., the identity mapping, may have more than one fixed point. Among NE mappings, let us refer to firmly nonexpansive mappings [20, Definition 4.1 (i)].
Definition 4 (Firmly NonExpansive mapping)
A mapping is firmly nonexpansive (FNE) if
for all .
An example of FNE mapping is the metric projection onto a closed convex set [20, Proposition 4.8].
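The firm nonexpansiveness of a projection can be spot-checked numerically. The sketch below assumes the form of the FNE inequality given in [20, Definition 4.1 (i)], namely ||Px - Py||^2 + ||(x - Px) - (y - Py)||^2 <= ||x - y||^2, and uses the Euclidean projection onto a box as a hypothetical closed convex set; it is a sanity check on random samples, not a proof.

```python
import random
from typing import List


def project_box(x: List[float], lo: float, hi: float) -> List[float]:
    """Euclidean projection onto the box [lo, hi]^n (closed and convex)."""
    return [min(max(xi, lo), hi) for xi in x]


def sq_norm(v: List[float]) -> float:
    return sum(vi * vi for vi in v)


def fne_slack(x: List[float], y: List[float], lo: float, hi: float) -> float:
    """||x - y||^2 - ||Px - Py||^2 - ||(x - Px) - (y - Py)||^2, non-negative if FNE."""
    px, py = project_box(x, lo, hi), project_box(y, lo, hi)
    d = [a - b for a, b in zip(x, y)]
    p = [a - b for a, b in zip(px, py)]
    r = [di - pi for di, pi in zip(d, p)]
    return sq_norm(d) - sq_norm(p) - sq_norm(r)


rng = random.Random(0)  # fixed seed so the check is reproducible
slacks = []
for _ in range(1000):
    x = [rng.uniform(-3.0, 3.0) for _ in range(4)]
    y = [rng.uniform(-3.0, 3.0) for _ in range(4)]
    slacks.append(fne_slack(x, y, lo=-1.0, hi=1.0))
min_slack = min(slacks)
print(min_slack)
```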
The FNE condition is sufficient for the Picard–Banach iteration in (15) to converge to a fixed point [24, Section 1, p. 522]. This is not the case for NE mappings; for example, is NE, but not CON, and the Picard–Banach iteration oscillates indefinitely between and . If a mapping is NE, with compact and convex, then the Krasnoselskij iteration
where , converges, for any initial condition , to a fixed point of [21, Theorem 3.2].
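The difference between the two iterations can be seen on a small example. The sketch below assumes the usual averaged form z_{k+1} = (1 - lam) z_k + lam f(z_k) with a constant lam in (0, 1), and uses the sign flip f(z) = -z on [-1, 1] as a hypothetical NE mapping that is not CON; both choices are illustrative assumptions.

```python
from typing import Callable


def picard_banach(f: Callable[[float], float], z0: float, steps: int) -> float:
    """Plain iteration z_{k+1} = f(z_k)."""
    z = z0
    for _ in range(steps):
        z = f(z)
    return z


def krasnoselskij(f: Callable[[float], float], z0: float, lam: float, steps: int) -> float:
    """Averaged iteration z_{k+1} = (1 - lam) z_k + lam f(z_k), with lam in (0, 1)."""
    z = z0
    for _ in range(steps):
        z = (1.0 - lam) * z + lam * f(z)
    return z


def flip(z: float) -> float:
    """Hypothetical NE mapping on [-1, 1]: an isometry whose only fixed point is 0."""
    return -z


picard_odd = picard_banach(flip, z0=1.0, steps=11)   # keeps bouncing between 1 and -1
picard_even = picard_banach(flip, z0=1.0, steps=12)
averaged = krasnoselskij(flip, z0=1.0, lam=0.25, steps=60)
print(picard_odd, picard_even, averaged)
```

With lam = 0.25 the averaged update reduces to z_{k+1} = 0.5 z_k in this example, so the iterates shrink towards the fixed point while the plain iteration never settles.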
Finally, we consider the even weaker regularity property of strict pseudocontractiveness [21, Remark 4, pp. 12–13].
Definition 5 (Strictly PseudoContractive mapping)
A mapping is strictly pseudocontractive (SPC) if there exists such that
for all .
If a mapping is SPC with compact and convex, then the Mann iteration
It follows from Definitions 2–5 that FNE ⟹ NE and CON ⟹ NE ⟹ SPC. Therefore, the Mann iteration in (20) ensures convergence to a fixed point for CON, FNE, NE and SPC mappings; the Krasnoselskij iteration in (18) ensures convergence for CON, FNE and NE mappings; the Picard–Banach iteration in (15) for CON and FNE mappings.
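To illustrate the hierarchy, the sketch below runs a Mann-type iteration on a mapping that is SPC but not NE. It assumes the common form z_{k+1} = (1 - a_k) z_k + a_k f(z_k) with vanishing, non-summable step sizes (here a_k = 0.5 / (k + 1), a hypothetical choice), and uses f(z) = clip(-2z, -1, 1) on [-1, 1], whose slope lies in [-2, 0] and whose only fixed point is 0.

```python
from typing import Callable


def picard_banach(f: Callable[[float], float], z0: float, steps: int) -> float:
    """Plain iteration z_{k+1} = f(z_k)."""
    z = z0
    for _ in range(steps):
        z = f(z)
    return z


def mann(f: Callable[[float], float], z0: float, steps: int) -> float:
    """Mann-type iteration with assumed step sizes a_k = 0.5 / (k + 1)."""
    z = z0
    for k in range(steps):
        alpha = 0.5 / (k + 1)  # tends to 0, while the sum of all steps diverges
        z = (1.0 - alpha) * z + alpha * f(z)
    return z


def steep_flip(z: float) -> float:
    """Hypothetical SPC mapping on [-1, 1] with Lipschitz constant 2 (hence not NE)."""
    return min(max(-2.0 * z, -1.0), 1.0)


picard_result = picard_banach(steep_flip, z0=0.4, steps=10)  # ends up bouncing between -1 and 1
mann_result = mann(steep_flip, z0=0.4, steps=2000)
print(picard_result, mann_result)
```

The Mann iterates approach the fixed point only slowly here, consistent with the sublinear rate mentioned in the text, whereas the plain iteration is trapped in a two-point cycle.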
The known upper bounds on the convergence rates suggest that a simpler iteration has faster convergence in general. The convergence rate for the Picard–Banach iteration is linear, that is [21, Chapter 1]. Instead, the convergence rate for the Mann iteration is sublinear, specifically [21, Chapter 4], for some .
Note that CON mappings have a unique fixed point [21, Theorem 1.1], whereas FNE, NE, SPC mappings may have multiple fixed points. In our context this implies that, unless the aggregation mapping is CON, there could exist multiple MF Nash equilibria, which is actually the case in multi-agent applications.
IV-B Main results: Regularity and decentralized convergence
Theorem 2 (Regularity of the optimizer)
Consider the following matrix inequality, where are from (12):
We can now exploit the structure of the aggregation mapping in (13) to establish our main result about its regularity. Specifically, under the conditions of Theorem 2, the aggregation mapping inherits the same regularity properties as the individual optimizer mappings.
Theorem 3 (Regularity of the aggregation)
Theorem 3 directly leads to iterative methods for finding a fixed point of the aggregation mapping, that is a solution of the MF control problem in the limit of infinite population size.
Corollary 1 (Decentralized convergence)
The following iterations and conditions guarantee global convergence to a fixed point of in (13), where is as in (12) for all :
Picard–Banach (15) if (21) holds () or ;
Krasnoselskij (18) if (21) holds ();
Mann (20) if (21) holds () or .
Note that convergence is ensured in different norms, namely , if or if ; this is not a limitation since all norms are equivalent in finite-dimensional Hilbert spaces.
We emphasize that each iterative method presented in Corollary 1 has its specific range of applicability depending on the specific MF control problem. This allows us to select one or more fixed point feedback iterations from the specific knowledge of the regularity property at hand. An important advantage of Corollary 1 is that decentralized convergence is guaranteed under conditions that do not depend on the individual constraints , but only on the common cost function in (12). Therefore, our results and methods apply naturally to populations of heterogeneous agents.
Note that under the conditions of Corollary 1, Algorithm 1 guarantees convergence to a fixed point of the aggregation mapping in (13) in a decentralized fashion. Let us also emphasize that any fixed point of generates a MF -Nash equilibrium by Theorem 1, that is not an exact Nash equilibrium for finite population size , mainly because only some aggregate information , which is related to , is broadcast to all the agents. In other words, we consider an information structure where each agent is not aware of the aggregate strategy of the other agents , because this would require that, at each iteration step, the central coordinator communicates different quantities to the agents, namely to agent , to agent , up to to agent .
IV-C Discussion on decentralized convergence results in aggregative games
Decentralized convergence to Nash equilibria in terms of fixed point iterations has been studied in aggregative game theory, for populations of finite size. Most results in the literature show convergence of sequential (i.e., not simultaneous/parallel) best-response updates of the agents [2, Cournot path] [4, Theorem 2], under the assumption that the best-response mappings of the players are non-increasing [4, Assumption 1’], besides continuous and compact valued.
In large-scale games, however, simultaneous/parallel responses as in Algorithm 1 are computationally more convenient with respect to sequential ones. Within the literature of aggregative games, the Mann iteration in (20) has been proposed in [3, Remark 2] for the simultaneous (parallel) best responses of the agents. See  for an application to distributed power allocation and scheduling in congested distributed networks. The aggregative game setup in these papers considers the strategy of the players to be a -dimensional variable taking values in a compact interval of the real numbers. Convergence is then guaranteed if the best-response mappings of the players are continuous, compact valued and non-increasing [3, conditions (i)–(iii), p. 81, Section 2].
It actually follows from the proof of Theorem 3 that the condition implies that the opposite of the aggregation mapping in (13), i.e., , is monotone, which is the -dimensional generalization of the non-increasing property. We conclude that Theorem 3 provides mild sufficient conditions on the problem data such that the convergence result in Corollary 1 subsumes, limited to the quadratic cost function case, the one in [3, Remark 2].
V Deterministic mean field control applications
V-A Solution to the constrained linear quadratic deterministic mean field control
where is defined in (4). In (22), we average the optimal tracking trajectories among the whole population (that is, we take in (13), so that Assumption 1 is satisfied with ) and we require the trajectory to equal such average. For large population size, the interpretation is that each agent responds optimally with state and control trajectory , to the mass behavior [17, Section I, p. 1560].
In the unconstrained linear quadratic setting, that is, and for all and , the mappings and in (4) are known in closed form, in both the continuous- and discrete-time cases, for both infinite and finite horizon [27, Chapter 11]. Using this knowledge, if we replace in (3) by , for small enough, then the corresponding mapping from (22) is CON [17, Theorem 3.4], and therefore the Picard–Banach iteration converges to the unique fixed point of [17, Proposition 3.4]. (Footnote 1: If , then the mapping in (22) is continuous, compact valued and constant, hence CON.)
Unfortunately, it turns out that the mapping in (22) is not necessarily CON. We therefore apply the results in Section IV-B to ensure convergence of suitable fixed point iterations. Following [17, Equation 2.6], for a given , let us consider
Note that for all , therefore does not affect the optimization problem in (5) with cost function in (23). Here we formally consider a vector of the same dimensions as just to recover the same mathematical setting as in Section IV-B.
We can now show conditions for the decentralized convergence to a fixed point of the average mapping in (22) for the discrete-time finite-horizon constrained LQ case, as corollary to our results in Section IV-B.
V-B Production planning example
Let us illustrate the LQ deterministic MF setting with a production planning example inspired by [17, Section II.A]. We consider firms supplying the same product to the market. Let represent the production level of firm at time . We assume that each firm can change its production according to the linear dynamics
where both the states and inputs are subject to heterogeneous constraints of the form and for all . We assume that the price of the product reads as
for . Each firm seeks a production level proportional to the product price , while facing the cost to change its production level (for example, for adding or removing production lines). We can then formulate the associated LQ MF finite horizon cost function as
where , , , , and . Given a signal , each agent, , solves a finite-horizon optimal tracking problem as defined in (4), with cost function in (25). For illustration, we consider the case of a heterogeneous population of firms where we randomly sample the upper bound from a uniform distribution supported on and from a uniform distribution supported on . We consider the parameters , , , and hence . The mapping defined in (22) is then NE, thus the Krasnoselskij iteration in (18) is guaranteed to converge to a fixed point, according to Corollary 2.
For different population sizes , we first numerically compute a fixed point of using the Krasnoselskij iteration in (18) with parameter , and we hence compute the strategies . We then verify that this is an -Nash equilibrium: for each firm , we evaluate the individual cost and the actual optimal cost under the knowledge of the production plan of the other firms at the fixed point . In Figure 1 we plot the maximum benefit that a firm could achieve by unilaterally deviating from the solution computed via the fixed point iteration, normalized by the optimal cost in the homogeneous case with expected constraints (, ). According to Theorem 1, such benefit vanishes as the population size increases.
V-C Decentralized constrained charging control for large populations of plug-in electric vehicles
As second control application, we investigate the problem of coordinating the charging of a large population of PEVs, introduced in  and extended to the constrained case in . For each PEV , we consider the discrete-time, , linear dynamics
where is the state of charge, is the charging control input and represents the charging efficiency.
The objective of each PEV is to acquire a charge amount within a finite charging horizon , hence to satisfy the charging constraint , while minimizing its charging cost , where is the electricity price function over the charging horizon. (Footnote 2: We could also consider more general convex constraints, for instance on the desired state of charge, multiple charging intervals, charging rates, vehicle-to-grid operations. However, we prefer to keep the same setting of [10, 12] for simplicity.) We consider dynamic pricing, where the price of electricity depends on the overall demand, namely the inflexible demand plus the aggregate PEV demand. In particular, in line with the (almost-affine) price function in [10, 12], we consider an affine price function , where represents the inverse of the price elasticity of demand and denotes the average inflexible demand. The interest of each agent is to minimize its own charging cost , which however leads to a linear program with undesired discontinuous optimal solution. Therefore, following [10, 12], we also introduce a quadratic relaxation term as follows.
The optimal charging control of each PEV , given the price signal , is defined as