Decentralized Control Problems with Substitutable Actions
We consider a decentralized system with multiple controllers and define substitutability of one controller by another in open-loop strategies. We explore the implications of this property on the optimization of closed-loop strategies. In particular, we focus on the decentralized LQG problem with substitutable actions. Even though the problem we formulate does not belong to the known classes of “simpler” decentralized problems such as partially nested or quadratically invariant problems, our results show that, under the substitutability assumption, linear strategies are optimal and we provide a complete state space characterization of optimal strategies. We also identify a family of information structures that all give the same optimal cost as the centralized information structure under the substitutability assumption. Our results suggest that open-loop substitutability can work as a counterpart of the information structure requirements that enable simplification of decentralized control problems.
The difficulty of finding optimal strategies in decentralized control problems has been well-established in the literature. In general, the optimization of strategies can be a non-convex problem over infinite-dimensional spaces. Even the celebrated linear quadratic Gaussian (LQG) model of centralized control presents difficulties in the decentralized setting [?]. There has been significant interest in identifying classes of decentralized control problems that are more tractable. Information structures of decentralized control problems, which describe what information is available to which controller, have been closely associated with their tractability. Problems with partially nested or stochastically nested information structures and problems that satisfy quadratic invariance or funnel causality properties have been identified as “simpler” than the general decentralized control problems.
In this paper, instead of starting from the information structure of the problem, we first look at open-loop strategies under which controllers take actions without any observations. Stated another way, we start with a trivially simple information structure: no controller knows anything (except, of course, the model of the system and the cost objective).
We define a property of open-loop decentralized control, namely the substitutability of one controller by another, and explore its implications for the optimization of closed-loop strategies (under which controllers take actions as functions of their observations). In particular, we focus on the decentralized LQG problem with substitutable actions. Even though the problem we formulate does not belong to one of the simpler classes mentioned earlier (partially nested, quadratically invariant, etc.), (i) our results show that linear strategies are optimal; (ii) we provide a complete state space characterization of optimal strategies; (iii) we also identify a family of information structures that all achieve the same cost as the centralized information structure. Our results suggest that open-loop substitutability can work as a counterpart of the information structure requirements that enable simplification of decentralized control problems.
Our work shares conceptual similarities with the work on internal quadratic invariance, which identified problems that are not quadratically invariant but can still be reduced to (infinite-dimensional) convex programs. In contrast to this work, we explicitly identify optimal control strategies.
Uppercase letters denote random variables/vectors and their corresponding realizations are represented by lowercase letters. Uppercase letters are also used to denote matrices. $\mathbb{E}[\cdot]$ denotes the expectation of a random variable. When a random variable $X$ is normally distributed with mean $\mu$ and variance $\Sigma$, it is shown as $X \sim \mathcal{N}(\mu, \Sigma)$.
For a sequence of column vectors $x_1, x_2, \dots$, the notation $x_{s:t}$ denotes the vector $(x_s^\intercal, \dots, x_t^\intercal)^\intercal$. Furthermore, the vector $(x^{1\intercal}, \dots, x^{n\intercal})^\intercal$ is denoted by $\operatorname{vec}(x^1, \dots, x^n)$. The transpose and Moore–Penrose pseudo-inverse of a matrix $A$ are denoted by $A^\intercal$ and $A^+$, respectively. The identity matrix and the zero vector are denoted by $I$ and $0$, respectively, and their dimensions are inferred from the context.
We consider a stochastic system with $n$ controllers. The dynamics of the system are given as:
$X_{t+1} = f_t(X_t, U^1_t, \dots, U^n_t, W_t)$,
where $X_t$ is the state of the system at time $t$, $U^i_t$ is the action of controller $i$ at time $t$ and $W_t$ is a random noise variable. The state takes values in the set $\mathcal{X}$, the control action of the $i$th controller takes values in the set $\mathcal{U}^i$ and the noise takes values in the set $\mathcal{W}$. We use $U_t$ to denote the vector $\operatorname{vec}(U^1_t, \dots, U^n_t)$.
The system operates in discrete time for a horizon $T$. At time step $t$, the system incurs a cost given as a function of the state and control actions: $c_t(X_t, U^1_t, \dots, U^n_t)$. The control objective is to minimize the expected value of the total cost accumulated over the $T$ time steps: $\mathbb{E}\big[\sum_{t=1}^{T} c_t(X_t, U^1_t, \dots, U^n_t)\big]$.
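The setup above can be sketched in simulation. The following is a minimal illustration, not the paper's system: a two-controller linear plant with a quadratic stage cost, where the matrices, the noise scale, and the simple feedback rule for controller 1 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10                                  # horizon
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # illustrative dynamics matrix
B1 = np.array([[0.0], [1.0]])           # controller 1 input matrix
B2 = np.array([[0.0], [1.0]])           # controller 2 input matrix

def step(x, u1, u2, w):
    """One step of the dynamics x_{t+1} = A x + B1 u1 + B2 u2 + w."""
    return A @ x + B1 @ u1 + B2 @ u2 + w

def cost(x, u1, u2):
    """Quadratic stage cost c(x, u) = x'x + u1'u1 + u2'u2."""
    return float(x @ x + u1 @ u1 + u2 @ u2)

x = np.zeros(2)
total = 0.0
for t in range(T):
    u1 = -0.1 * x[[1]]      # a simple (illustrative) feedback action
    u2 = np.zeros(1)        # controller 2 idle
    total += cost(x, u1, u2)
    x = step(x, u1, u2, 0.1 * rng.standard_normal(2))
```

The accumulated `total` is a single sample of the cost whose expectation the control objective minimizes.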
We say that controller $i$ is using an open-loop control strategy if its control actions are a function only of time and not of any information (observations) obtained from the system. Otherwise, we say that controller $i$ is using a closed-loop control strategy. Based on the dynamics and the cost function, we can define a notion of open-loop substitutability among controllers.
A similar notion of open-loop substitutability can be defined for any pair of controllers in the system.
If we are considering only open-loop control strategies for all controllers and if controller 1 can substitute for controller 2 in open-loop control (as per Definition ?), then there is no loss of optimality in fixing all actions of controller 2 to $0$. This is the intuitive meaning of substitutability — since controller 1 can substitute for controller 2, controller 2 does not need to do anything.
The open-loop substitutability property has some closed-loop implications. Let us denote by $I^i_t$ the collection of all observations and past control actions that are available to controller $i$ at time $t$. Under a closed-loop strategy, controller $i$ selects its control action as a function of $I^i_t$, that is, $U^i_t = g^i_t(I^i_t)$. The collection of functions $g^i := (g^i_1, \dots, g^i_T)$ is referred to as controller $i$'s (closed-loop) strategy.
Consider any arbitrary strategies $g^1, g^2$ for the controllers. Define a new strategy $\tilde{g}^1$ for controller 1 as follows:
where $\varphi$ is the substitution function from Definition ?. Firstly, note that $\tilde{g}^1$ is a valid strategy for controller 1 because $I^2_t \subseteq I^1_t$. If this were not the case, the right-hand side would be using information that controller 1 may not have.
The result of the lemma then follows from the observation that the strategy pair $(\tilde{g}^1, 0)$ will always have the same effect on the dynamics and the cost as $(g^1, g^2)$ because of the substitutability conditions.
The condition $I^2_t \subseteq I^1_t$ is necessary for Lemma ? to hold. It is easy to construct examples where $I^1_t$ does not include $I^2_t$ and the second controller cannot be restricted to the “always zero” strategy without losing optimality.
The statement of Lemma ? can be intuitively interpreted as follows: The open-loop substitutability of controller 2 by controller 1 and the fact that controller 1 is better informed make controller 2 essentially redundant for the purpose of cost optimization.
Lemma ? suggests that open-loop substitutability, combined with the information structure of the problem, can have implications for the closed-loop problem. In the rest of the paper, we consider an LQG control problem with multiple controllers and obtain results much sharper than Lemma ? for such problems.
3 LQG problem with state feedback
We consider a stochastic system with $n$ controllers where the state, control, and noise spaces are Euclidean.
The state dynamics are given as
$X_{t+1} = A X_t + B U_t + W_t$,
where $B = [B^1 \; B^2 \; \cdots \; B^n]$ and $U_t = \operatorname{vec}(U^1_t, \dots, U^n_t)$.
The cost at time $t$ is given as
$c_t(X_t, U_t) = X_t^\intercal Q X_t + U_t^\intercal R U_t$.
The initial state $X_1$ and the noise variables $W_1, \dots, W_T$ are independent and have Gaussian distributions.
We will make the following assumption about the system.
The system of equations above is equivalent to a matrix equation of the form $Bz = c$.
The general solution of $Bz = c$ is $z = B^+ c + (I - B^+ B)y$ for arbitrary $y$. By setting $y = 0$, we have $z = B^+ c$.
An example of a system satisfying Assumption 1 is a two-controller LQG problem where the dynamics and the cost are functions only of the sum of the control actions, that is, of $U^1_t + U^2_t$. This happens if $B^1 = B^2$ and the cost depends on the control actions only through $U^1_t + U^2_t$. In this case, controller 1 using the action $U^1_t + U^2_t$ (with controller 2's action set to zero) satisfies ( ?), which means that controller 1 can substitute for controller 2 and vice versa.
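The two-controller example can be verified directly: when $B^1 = B^2$, replacing the action pair $(u^1_t, u^2_t)$ by $(u^1_t + u^2_t, 0)$ reproduces the state trajectory exactly. The matrices and action sequences below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B1 = np.array([[1.0], [0.5]])
B2 = B1.copy()                  # identical input matrices: dynamics see only u1 + u2

def rollout(u1s, u2s, ws):
    """Trajectory of x_{t+1} = A x + B1 u1 + B2 u2 + w for given action/noise sequences."""
    x = np.zeros(2)
    traj = [x]
    for u1, u2, w in zip(u1s, u2s, ws):
        x = A @ x + B1 @ u1 + B2 @ u2 + w
        traj.append(x)
    return np.array(traj)

T = 5
u1s = [rng.standard_normal(1) for _ in range(T)]
u2s = [rng.standard_normal(1) for _ in range(T)]
ws = [0.1 * rng.standard_normal(2) for _ in range(T)]

orig = rollout(u1s, u2s, ws)                          # controller 2 active
subst = rollout([a + b for a, b in zip(u1s, u2s)],    # controller 1 substitutes
                [np.zeros(1)] * T, ws)
assert np.allclose(orig, subst)
```

Since the same noise realization is used in both rollouts, equality of the trajectories shows the substitution is exact, not just in expectation.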
We assume that the state vector consists of $n$ sub-vectors, that is, $X_t = \operatorname{vec}(X^1_t, \dots, X^n_t)$. $X^i_t$ can be interpreted as the state of the $i$th sub-system. We assume a local state feedback with perfect recall information structure, that is, the information available to controller $i$ at time $t$ is $I^i_t = \{X^i_{1:t}, U^i_{1:t-1}\}$.
Controller $i$ chooses action $U^i_t$ as a function of the information available to it. Specifically, for $t = 1, \dots, T$, $U^i_t = g^i_t(I^i_t)$.
The collection $g^i := (g^i_1, \dots, g^i_T)$ is called the control strategy of controller $i$. The performance of the control strategies $g^{1:n} := (g^1, \dots, g^n)$ is measured by the expected cost
$J(g^{1:n}) = \mathbb{E}\big[\sum_{t=1}^{T} c_t(X_t, U_t)\big]$,
where the expectation is with respect to the joint probability distribution on $(X_{1:T}, U_{1:T})$ induced by the choice of $g^{1:n}$.
The optimization problem is defined as follows.
In addition to the decentralized information structure described above, we will also consider the centralized information structure where all controllers have access to the entire state and action history.
In this section, we will show that we can construct optimal strategies in Problem ? from the optimal control strategies of the centralized problem (Problem ?). We start with the following observations.
We can now state our main result for the state feedback case.
Firstly, observe that the strategies given in Theorem ? are valid control strategies under the information structure of Problem ?. The optimal control vector under the centralized strategy is a superposition of terms of the form $K^j_t X^j_t$, where $K^j_t$ is the block of the centralized gain matrix multiplying the sub-state $X^j_t$. Note that the term $K^j_t X^j_t$ consists of $n$ sub-vectors (one corresponding to each controller's action).
Such a control vector cannot be implemented in the decentralized information structure since it requires each controller to have access to $X^j_t$. We now exploit the open-loop substitutability of the problem to state that a vector in which only controller $j$'s component is non-zero will have the same effect as $K^j_t X^j_t$. This allows us to construct a decentralized strategy with the same performance as the centralized one. We provide a detailed proof for the more general case of the output feedback problem in the next section.
We can also derive the following corollary of Theorem ?.
Corollary ? identifies memoryless local state feedback as the minimal information structure that achieves the optimal centralized cost. In other words, it describes the minimal communication and memory requirements for controllers to achieve the optimal centralized cost.
4 LQG problem with output feedback
We consider the system model described in Section 3.1 and assume that each controller makes a noisy observation of the system state given as $Y^i_t = C^i X_t + V^i_t$.
Combining (Equation 9) for all controllers gives $Y_t = C X_t + V_t$,
where $Y_t$ denotes $\operatorname{vec}(Y^1_t, \dots, Y^n_t)$ and $V_t$ denotes $\operatorname{vec}(V^1_t, \dots, V^n_t)$. The initial state $X_1$ and the noise variables $W_t$ and $V_t$ are mutually independent and jointly Gaussian.
The information available to the $i$th controller at time $t$ is $I^i_t = \{Y^i_{1:t}, U^i_{1:t-1}\}$.
Each controller $i$ chooses its action according to $U^i_t = g^i_t(I^i_t)$, and the performance of the control strategies of all controllers, $(g^1, \dots, g^n)$, is measured by (Equation 8).
The optimization problem is defined as follows.
In addition to the decentralized information structure described above, we will also consider the centralized information structure and the corresponding strategy optimization.
The following lemma follows directly from the problem descriptions above and well-known results for the centralized LQG problem with output feedback.
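The well-known centralized solution combines a Kalman filter with certainty-equivalent linear feedback on the state estimate. The following is a minimal numerical sketch of that standard structure, not the paper's specific system: all matrices, covariances, and the horizon are illustrative assumptions.

```python
import numpy as np

# Illustrative centralized LQG with output feedback: u_t = K_t xhat_t, where
# xhat_t is the Kalman-filter estimate of the state given all observations.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)
Q, R = np.eye(2), np.eye(1)
SW, SV = 0.01 * np.eye(2), 0.01 * np.eye(2)   # process/observation noise covariances
T = 10

# Backward Riccati recursion for the certainty-equivalent gains K_t.
P = Q.copy()
gains = []
for _ in range(T):
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    gains.append(K)
    P = Q + A.T @ P @ (A + B @ K)
gains.reverse()                                # gains[t] is K_t

# Forward pass: Kalman filter plus feedback on the estimate.
rng = np.random.default_rng(2)
x = rng.standard_normal(2)
xhat, S = np.zeros(2), np.eye(2)               # prior mean and covariance
cost = 0.0
for t in range(T):
    y = C @ x + rng.multivariate_normal(np.zeros(2), SV)
    L = S @ C.T @ np.linalg.inv(C @ S @ C.T + SV)   # measurement update
    xhat = xhat + L @ (y - C @ xhat)
    S = (np.eye(2) - L @ C) @ S
    u = gains[t] @ xhat                             # certainty-equivalent action
    cost += x @ Q @ x + u @ R @ u
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), SW)
    xhat = A @ xhat + B @ u                         # time update
    S = A @ S @ A.T + SW
```

The separation into estimation (Kalman filter) and control (Riccati gains) is the centralized structure the subsequent decentralized construction mimics.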
In this section, we show that it is possible to construct optimal strategies in Problem ? from the optimal control strategy of Problem ?.
Observe that the strategies given by ( ?) are valid control strategies under the information structure of Problem ? because they depend only on $Y^i_{1:t}$ and $U^i_{1:t-1}$, which are included in $I^i_t$. The states defined in ( ?) are related to the centralized estimate $\hat{X}_t$ by the following result.
We prove the result by induction. For $t = 1$, the claim follows directly from ( ?) and ( ?).
Now assume that the claim holds at time $t$. We need to show that it holds at time $t+1$. From ( ?), it follows that
From ( ?), we have
The following result is an immediate consequence of Theorem ?.
4.3 Proof of Theorem ?
For notational convenience, we will describe the proof for $n = 2$. If $g^{*}$ is the optimal control strategy of Problem ?, then from Lemma ?, we have:
We claim that the decentralized control strategies defined in Theorem ?, that is
yield the same expected cost as the optimal centralized control strategies.
To establish the above claim, we define cost-to-go functions under the optimal centralized strategy and under the strategies defined in Theorem ?. These functions, denoted by $V_t$ and $\hat{V}_t$ for $t = 1, \dots, T$, are defined as follows:
where the control actions are given by the optimal centralized strategy for all times $s \geq t$, and
where the control actions are given by the strategies of Theorem ? for all times $s \geq t$. The function $\hat{V}_t$ in (Equation 16) is defined only for arguments that can arise under these strategies; it is undefined otherwise.
We will show that $V_t = \hat{V}_t$ for $t = 1, \dots, T$, wherever both functions are defined. We follow a backward induction argument. For $t = T$, we have,
Since the only difference between (Equation 17) and ( ?) lies in their control strategies, it suffices to show that the control-dependent term is the same under these two control strategies.
Under the centralized control action, we obtain the corresponding term directly. Under the decentralized control actions of Theorem ?, we have
From the substitutability assumption (Assumption 1) and Lemma ?, for any vector $y$, $B B^+ B y = B y$. Therefore,
(Equation 18) can now be written as,
where the last equality follows from the relation established above. Therefore, the two cost-to-go functions at time $T$ are equal wherever both are defined.
Now, assume that the cost-to-go functions at time $t+1$ are equal wherever both are defined. We need to show that the same holds at time $t$. For this, note that one can use dynamic programming arguments to write the two cost-to-go functions in terms of the instantaneous cost and the next-stage cost-to-go functions:
The first expectation on the right hand side of (Equation 20) can be shown to be equal to the first expectation on the right hand side of (Equation 21) by repeating the arguments used at time $T$. Using Lemma ?, the second expectation on the right hand side of (Equation 20) can be written as,
Furthermore, because of the induction hypothesis, the second expectation on the right hand side of (Equation 21) can be written as,
This expression can be further written as
From the substitutability assumption (Assumption 1) and Lemma ?, for any vector $y$, $B B^+ B y = B y$. Therefore,
The expression can now be written as
This is the same as the corresponding term under the centralized strategy. Therefore, the two cost-to-go functions at time $t$ are equal wherever both are defined.
Now, the expected cost under the centralized control strategy can be written as,
while the expected cost under the decentralized strategies of Theorem ? can be written as
Because the cost-to-go functions are equal wherever both are defined, the two expected costs above are equal. Thus, the decentralized control strategies of Theorem ? achieve the same expected cost as the optimal centralized strategies.
We considered a decentralized system with multiple controllers and defined a property called substitutability of one controller by another in open-loop strategies. For the LQG problem, our results show that, under the substitutability assumption, linear strategies are optimal and we provide a complete state space characterization of optimal strategies. Our results suggest that open-loop substitutability can work as a counterpart of the information structure requirements that enable simplification of decentralized control problems.
- H. S. Witsenhausen, “A counterexample in stochastic optimum control,” SIAM Journal on Control, vol. 6, no. 1, pp. 131–147, 1968.
- G. M. Lipsa and N. C. Martins, “Optimal memoryless control in Gaussian noise: A simple counterexample,” Automatica, vol. 47, no. 3, pp. 552–558, 2011.
- V. D. Blondel and J. N. Tsitsiklis, “A survey of computational complexity results in systems and control,” Automatica, vol. 36, no. 9, pp. 1249–1274, 2000.
- S. Yüksel and T. Başar, Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints. Boston, MA: Birkhäuser, 2013.
- A. Mahajan, N. Martins, M. Rotkowitz, and S. Yuksel, “Information structures in optimal decentralized control,” in IEEE Conference on Decision and Control, 2012, pp. 1291–1306.
- Y.-C. Ho and K.-C. Chu, “Team decision theory and information structures in optimal control problems–Part I,” IEEE Transactions on Automatic Control, vol. 17, no. 1, pp. 15–22, 1972.
- S. Yüksel, “Stochastic nestedness and the belief sharing information pattern,” IEEE Transactions on Automatic Control, pp. 2773–2786, Dec. 2009.
- M. Rotkowitz and S. Lall, “A characterization of convex problems in decentralized control,” IEEE Transactions on Automatic Control, vol. 51, no. 2, pp. 274–286, 2006.
- B. Bamieh and P. Voulgaris, “A convex characterization of distributed control problems in spatially invariant systems with communication constraints,” Systems and Control Letters, vol. 54, no. 6, pp. 575–583, 2005.
- L. Lessard and S. Lall, “Internal quadratic invariance and decentralized control,” in American Control Conference (ACC), June 2010, pp. 5596–5601.
- L. Lessard, “Tractability of complex control systems,” Ph.D. dissertation, Stanford University, 2011.
- S. M. Asghari and A. Nayyar, “Decentralized control problems with substitutable actions,” in IEEE 54th Annual Conference on Decision and Control (CDC), 2015, Dec 2015.
- A. Ben-Israel and T. Greville, Generalized Inverses: Theory and Applications, ser. CMS Books in Mathematics. Springer New York, 2006. [Online]. Available: https://books.google.com/books?id=abEPBwAAQBAJ
- P. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice-Hall, 1986.