# Sparse optimal control of networks with multiplicative noise via policy gradient

###### Abstract:

We give algorithms for designing near-optimal sparse controllers using policy gradient with applications to control of systems corrupted by multiplicative noise, which is increasingly important in emerging complex dynamical networks. Various regularization schemes are examined and incorporated into the optimization by the use of gradient, subgradient, and proximal gradient methods. Numerical experiments on a large networked system show that the algorithms converge to performant sparse mean-square stabilizing controllers.

The University of Texas at Dallas, Richardson, TX 75080 USA

Keywords: Optimal control, multiplicative noise, networks, sensor & actuator placement

This material is based on work supported by the Army Research Office under grant W911NF-17-1-0058 and the National Science Foundation under grant CMMI-1728605.

Emerging highly distributed networked dynamical systems, such as critical infrastructure for power, water, and transportation, are high-dimensional and increasingly instrumented with new sensing, actuation, and communication technologies. A key problem is to design high performance control architectures that limit the number of actuators, sensors, and actuator-sensor communication links to reduce complexity and cost. Sparse control architectures may be crucial for managing complexity in emerging complex networks, but require solution of extremely difficult mixed combinatorial-continuous optimization problems.

The recent literature offers a variety of performance metrics and optimization methodologies for sparse control architecture design. Examples include structural rank conditions from Liu et al. (2011); Ruths and Ruths (2014); Olshevsky (2014), controllability and observability Gramians from Pasqualetti et al. (2014); Summers et al. (2016); Tzoumas et al. (2016); Jadbabaie et al. (2018), and optimal and robust control metrics from Hassibi et al. (1998); Polyak et al. (2013); Jovanović and Dhingra (2016); Summers (2016); Taha et al. (2019); Zare and Jovanović (2018), which are optimized via greedy algorithms, convex and mixed-integer optimization, and randomization.

Here we develop methods for sparse optimal control design in dynamical networks with multiplicative noise via policy gradient algorithms with sparsity-inducing regularization. Multiplicative noise arises in many networked systems when the weights of edges connecting nodes are stochastic in time. The noise is thus on the system parameters themselves and has a fundamentally different effect on the state evolution than additive noise, and indeed can lead to dramatic robustness issues. Specifically, a noise-ignorant classical optimal linear-quadratic (LQ) controller may actually destabilize a multiplicative noise system in the mean-square sense, even if the system was open-loop mean-square stable. Therefore noise-aware control is imperative to network performance and robustness. Moreover, the policy gradient methods we propose here, which operate directly on policy parameters, facilitate data-driven sparse control design when the model is unknown, a topic we are exploring in ongoing work.
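To make the destabilization phenomenon concrete, consider a scalar sketch (our own illustrative example in Python, matching the language of our code repository; all numbers are illustrative and not drawn from the experiments below). For $x_{t+1} = a x_t + (b + \beta_t) u_t$ with multiplicative input noise of variance $\beta$, the second moment under $u_t = k x_t$ obeys $\mathbb{E}\, x_{t+1}^2 = \big((a + bk)^2 + \beta k^2\big) \mathbb{E}\, x_t^2$, so a noise-ignorant LQR gain can push the bracketed factor above 1 even though the open-loop factor $a^2$ is below 1:

```python
import numpy as np

# Scalar system x_{t+1} = a*x_t + (b + noise)*u_t, input noise variance beta.
# Open loop (u = 0) is mean-square stable since a**2 < 1.
a, b, q, r, beta = 0.9, 1.0, 1.0, 0.01, 2.0

# Solve the noise-ignorant (deterministic) scalar ARE by value iteration.
p = q
for _ in range(10000):
    p_new = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    if abs(p_new - p) < 1e-12:
        break
    p = p_new
k_lqr = -a * b * p / (r + b * b * p)  # classical LQR gain, ignores the noise

# Mean-square closed-loop factor: E x_{t+1}^2 = rho * E x_t^2 under u = k*x.
rho_open = a ** 2                                  # open loop, u = 0
rho_lqr = (a + b * k_lqr) ** 2 + beta * k_lqr ** 2  # noise-ignorant LQR loop

print(rho_open, rho_lqr)  # open loop is MS stable, the LQR loop is not
```

The aggressive LQR gain nearly cancels $a$ but amplifies the $\beta k^2$ term, illustrating why noise-aware design is imperative.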

In Section 2 we formulate the problem and discuss a policy gradient approach to optimal control design for linear-quadratic systems with multiplicative noise. In Section 3 we propose several sparse control design methods for sensor and actuator selection and communication network design using gradient, subgradient, and proximal algorithms. In Section 4 we present numerical experiments to illustrate the results. Section 5 concludes.

Consider the discrete-time linear quadratic regulator with multiplicative noise (LQRm) optimal control problem

(1)
$$
\begin{aligned}
\underset{u_0, u_1, \ldots}{\text{minimize}} \quad & \mathbb{E} \sum_{t=0}^{\infty} \big( x_t^\top Q x_t + u_t^\top R u_t \big) \\
\text{s.t.} \quad & x_{t+1} = \Big( A + \sum_{i=1}^{p} \alpha_{ti} A_i \Big) x_t + \Big( B + \sum_{j=1}^{q} \beta_{tj} B_j \Big) u_t,
\end{aligned}
$$

where $x_t \in \mathbb{R}^n$ is the system state, $u_t \in \mathbb{R}^m$ is the control input, $x_0$ is randomly distributed according to $\mathcal{D}$, expectation is with respect to $x_0$ and the noises, and $Q \succeq 0$ and $R \succ 0$. The dynamics incorporate multiplicative noise terms modeled by the mutually independent and i.i.d. (over time) zero-mean random variables $\alpha_{ti}$ and $\beta_{tj}$, which have variances $\alpha_i$ and $\beta_j$, respectively. The matrices $A_i$ and $B_j$ specify how each noise term affects the system dynamics and input matrices. The goal is to determine an optimal closed-loop feedback policy $\pi$ with $u_t = \pi(x_t)$. We assume that the problem data $A$, $B$, $Q$, $R$, $\{\alpha_i, A_i\}$, and $\{\beta_j, B_j\}$ are such that the optimal value of the problem exists and is finite. Feasibility of this problem is ensured if the system is mean-square stabilizable.

###### Definition 1 (Mean-square stability)

The system in (1) is stable in the mean-square sense if $\lim_{t \to \infty} \mathbb{E}\, x_t x_t^\top = 0$ for any given initial covariance $\Sigma_0 = \mathbb{E}\, x_0 x_0^\top$.

We are ultimately interested in the problem

(2)
$$
\underset{K}{\text{minimize}} \quad C(K) + \gamma\, r(K),
$$

where $C(K)$ is the LQRm cost in (1) under the linear state feedback policy $u_t = K x_t$, $r(K)$ is a sparsity-promoting regularizer of the policy, and $\gamma \geq 0$ specifies the importance of sparsity. The regularizer ideally would measure the number of actuators, sensors, or actuator-sensor links, but for computational tractability will be replaced by other functions defined later. We begin by discussing the solution for $\gamma = 0$.

Dynamic programming can be used to show that the optimal policy is linear state feedback $u_t = K^* x_t$ with $K^* \in \mathbb{R}^{m \times n}$, and that the resulting optimal cost for a fixed initial state $x_0$ is quadratic, $C^* = x_0^\top P x_0$, where $P$ is a symmetric positive definite matrix. When the model parameters are known, there are several known ways to compute the optimal feedback gains and corresponding optimal cost. The optimal cost is given by the solution $P$ of the generalized algebraic Riccati equation (ARE) (see, e.g., Damm (2004))

(3)
$$
P = Q + A^\top P A + \sum_{i=1}^{p} \alpha_i A_i^\top P A_i - A^\top P B \Big( R + B^\top P B + \sum_{j=1}^{q} \beta_j B_j^\top P B_j \Big)^{-1} B^\top P A.
$$

This can be solved via value iteration, and the optimal gain matrix is

(4)
$$
K^* = -\Big( R + B^\top P B + \sum_{j=1}^{q} \beta_j B_j^\top P B_j \Big)^{-1} B^\top P A.
$$
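As an illustration, the generalized ARE can be solved by value iteration with a few lines of NumPy. The system matrices below are a small synthetic example of our own choosing, not from the paper, with one state-dependent and one input-dependent noise term:

```python
import numpy as np

# Small synthetic LQRm problem (illustrative values, not from the paper).
A = np.array([[0.8, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.5]])
Q = np.eye(2)
R = np.array([[1.0]])
A1, alpha1 = np.array([[0.0, 0.1], [0.1, 0.0]]), 0.2   # state-dependent noise
B1, beta1 = np.array([[0.1], [0.0]]), 0.3              # input-dependent noise

def gare_rhs(P):
    # Right-hand side of the generalized ARE (3).
    S = R + B.T @ P @ B + beta1 * B1.T @ P @ B1
    return (Q + A.T @ P @ A + alpha1 * A1.T @ P @ A1
            - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A))

# Value iteration: repeatedly apply the GARE map starting from P = Q.
P = Q.copy()
for _ in range(10000):
    P_new = gare_rhs(P)
    if np.linalg.norm(P_new - P) < 1e-13:
        break
    P = P_new

# Optimal gain from (4).
S = R + B.T @ P @ B + beta1 * B1.T @ P @ B1
K_opt = -np.linalg.solve(S, B.T @ P @ A)

res = np.linalg.norm(P - gare_rhs(P))  # GARE residual at the fixed point
```

Value iteration converges here because the example system is mean-square stabilizable with weak noise; the residual certifies the fixed point.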

For a fixed mean-square stabilizing linear state feedback policy $u_t = K x_t$, there exists a positive semidefinite cost matrix $P_K$ which characterizes the cost by

(5)
$$
C(K) = \mathbb{E}_{x_0}\, x_0^\top P_K x_0,
$$

and $P_K$ is the solution to the generalized Lyapunov equation

(6)
$$
P_K = Q + K^\top R K + (A + BK)^\top P_K (A + BK) + \sum_{i=1}^{p} \alpha_i A_i^\top P_K A_i + \sum_{j=1}^{q} \beta_j K^\top B_j^\top P_K B_j K.
$$

Furthermore, there exists a positive semidefinite infinite-horizon aggregate state covariance matrix $\Sigma_K = \sum_{t=0}^{\infty} \mathbb{E}\, x_t x_t^\top$, which is the solution to the generalized Lyapunov equation

(7)
$$
\Sigma_K = \Sigma_0 + (A + BK) \Sigma_K (A + BK)^\top + \sum_{i=1}^{p} \alpha_i A_i \Sigma_K A_i^\top + \sum_{j=1}^{q} \beta_j B_j K \Sigma_K K^\top B_j^\top,
$$

where $\Sigma_0 = \mathbb{E}\, x_0 x_0^\top$. Thus, we have

(8)
$$
C(K) = \operatorname{tr}\big( (Q + K^\top R K)\, \Sigma_K \big) = \operatorname{tr}( P_K \Sigma_0 ).
$$
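The identity in (8) provides a useful numerical cross-check: solving the two generalized Lyapunov equations by fixed-point iteration, the two cost expressions should agree to solver precision. A minimal sketch on a synthetic system of our own choosing, using the zero gain (mean-square stabilizing here because the example system is open-loop mean-square stable):

```python
import numpy as np

# Illustrative system (our own values); K = 0 is mean-square stabilizing
# here because A is stable and the multiplicative noise is weak.
A = np.array([[0.8, 0.2], [0.0, 0.7]])
B = np.array([[1.0], [0.5]])
Q, R = np.eye(2), np.array([[1.0]])
A1, alpha1 = np.array([[0.0, 0.1], [0.1, 0.0]]), 0.2
B1, beta1 = np.array([[0.1], [0.0]]), 0.3
Sigma0 = np.diag([1.0, 2.0])          # initial state covariance
K = np.zeros((1, 2))
AK = A + B @ K

def solve_PK():
    # Generalized Lyapunov equation (6) by fixed-point iteration.
    P = np.zeros((2, 2))
    for _ in range(10000):
        P_new = (Q + K.T @ R @ K + AK.T @ P @ AK
                 + alpha1 * A1.T @ P @ A1
                 + beta1 * K.T @ B1.T @ P @ B1 @ K)
        if np.linalg.norm(P_new - P) < 1e-14:
            break
        P = P_new
    return P

def solve_SigmaK():
    # Generalized Lyapunov equation (7) by fixed-point iteration.
    S = np.zeros((2, 2))
    for _ in range(10000):
        S_new = (Sigma0 + AK @ S @ AK.T
                 + alpha1 * A1 @ S @ A1.T
                 + beta1 * B1 @ K @ S @ K.T @ B1.T)
        if np.linalg.norm(S_new - S) < 1e-14:
            break
        S = S_new
    return S

PK, SK = solve_PK(), solve_SigmaK()
cost_via_P = np.trace(PK @ Sigma0)                  # tr(P_K Sigma_0)
cost_via_Sigma = np.trace((Q + K.T @ R @ K) @ SK)   # tr((Q + K'RK) Sigma_K)
```

The two cost values agree because the two Lyapunov operators are adjoint to each other under the trace inner product.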

This leads to the idea of performing gradient descent on $C(K)$ (i.e., policy gradient) to find the optimal gain matrix:

(9)
$$
K \leftarrow K - \eta\, \nabla C(K)
$$

for a fixed step size $\eta > 0$. In this work we consider only the case where the model parameters are known, but the methods presented are immediately usable in the model-unknown case by estimating the gradient from trajectory data. The policy gradient for linear state feedback policies applied to the LQRm problem has the following form:

###### Lemma 2

The LQRm policy gradient is given by

(10)
$$
\nabla C(K) = 2 \Big[ \Big( R + B^\top P_K B + \sum_{j=1}^{q} \beta_j B_j^\top P_K B_j \Big) K + B^\top P_K A \Big] \Sigma_K.
$$

The proof is omitted due to space constraints and can be found in our technical report (see Gravell et al. (2019)).
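The gradient expression can be sanity-checked against central finite differences of the cost. The sketch below uses a toy instance of our own choosing and fixed-point Lyapunov solvers, assuming the standard LQRm notation above:

```python
import numpy as np

# Toy LQRm instance (illustrative values, not from the paper).
A = np.array([[0.6, 0.3], [0.0, 0.5]])
B = np.array([[1.0], [0.4]])
Q, R = np.eye(2), np.array([[1.0]])
A1, alpha1 = np.array([[0.0, 0.1], [0.1, 0.0]]), 0.1
B1, beta1 = np.array([[0.1], [0.0]]), 0.1
Sigma0 = np.eye(2)

def lyap_P(K):
    # Cost matrix P_K via fixed-point iteration on (6).
    AK = A + B @ K
    P = np.zeros((2, 2))
    for _ in range(20000):
        P_new = (Q + K.T @ R @ K + AK.T @ P @ AK
                 + alpha1 * A1.T @ P @ A1
                 + beta1 * K.T @ B1.T @ P @ B1 @ K)
        if np.linalg.norm(P_new - P) < 1e-14:
            break
        P = P_new
    return P

def lyap_Sigma(K):
    # Aggregate covariance Sigma_K via fixed-point iteration on (7).
    AK = A + B @ K
    S = np.zeros((2, 2))
    for _ in range(20000):
        S_new = (Sigma0 + AK @ S @ AK.T + alpha1 * A1 @ S @ A1.T
                 + beta1 * B1 @ K @ S @ K.T @ B1.T)
        if np.linalg.norm(S_new - S) < 1e-14:
            break
        S = S_new
    return S

def cost(K):
    return np.trace(lyap_P(K) @ Sigma0)

def grad(K):
    # Policy gradient formula from Lemma 2.
    P, S = lyap_P(K), lyap_Sigma(K)
    RK = R + B.T @ P @ B + beta1 * B1.T @ P @ B1
    return 2.0 * (RK @ K + B.T @ P @ A) @ S

K = np.array([[-0.2, -0.1]])   # a mean-square stabilizing gain
G = grad(K)

# Central finite differences, entry by entry.
G_fd = np.zeros_like(K)
h = 1e-6
for i in range(K.shape[0]):
    for j in range(K.shape[1]):
        E = np.zeros_like(K); E[i, j] = h
        G_fd[i, j] = (cost(K + E) - cost(K - E)) / (2 * h)
```

The analytic and finite-difference gradients agree to within finite-difference error, consistent with Lemma 2.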

It was shown recently by Fazel et al. (2018) that although the deterministic LQR cost is nonconvex, it is gradient dominated; gradient domination is also known as the Polyak-Łojasiewicz inequality, originally due to Polyak (1963). It is simple to show that if a function has a Lipschitz continuous gradient and satisfies this condition, then gradient descent with a sufficiently small constant step size converges to the optimal function value at a linear rate (see Karimi et al. (2016)). For the LQRm problem, so long as the initial controller is mean-square stabilizing, the LQRm cost is continuously differentiable over the sublevel set associated with the initial controller, and thus the gradient possesses a local Lipschitz constant on this set. Identifying this Lipschitz constant and the gradient domination constant is necessary for selecting a step size which is guaranteed to give convergence using gradient descent. Quantifying these constants is difficult but possible via lengthy chains of matrix inequalities, as demonstrated by Fazel et al. (2018).

These results extend readily to the LQRm problem with relevant quantities pertaining to Lipschitz continuity of the gradient and the gradient domination conditions modified suitably to accommodate the multiplicative noise. In particular, the effect of the noise is to decrease the maximum step size that can be taken using gradient descent. We now state the relevant lemmas; the proofs are lengthy and can be found in our technical report (see Gravell et al. (2019)).

###### Lemma 3 (Gradient domination)

The LQRm cost satisfies the gradient domination condition

(11)
$$
C(K) - C(K^*) \leq \frac{\| \Sigma_{K^*} \|}{\sigma_{\min}(\Sigma_0)^2\, \sigma_{\min}(R)}\, \| \nabla C(K) \|_F^2.
$$

###### Lemma 4 (Gradient descent, convergence rate)

Using the policy gradient step update

(12)
$$
K_{s+1} = K_s - \eta\, \nabla C(K_s)
$$

with step size $\eta \leq 1/c$ gives global convergence to the optimal gain at a linear rate described by

(13)
$$
C(K_{s+1}) - C(K^*) \leq \bigg( 1 - \eta\, \frac{\sigma_{\min}(\Sigma_0)^2\, \sigma_{\min}(R)}{\| \Sigma_{K^*} \|} \bigg) \big( C(K_s) - C(K^*) \big),
$$

where $c$ is a constant which is polynomial in the problem data, including $\|A\|$, $\|B\|$, $\sigma_{\min}(R)$, $\sigma_{\min}(\Sigma_0)$, the noise variances, and the initial cost $C(K_0)$.

Entrywise, row, and column sparsity in $K$ correspond to actuator-sensor communication, actuator, and sensor sparsity, respectively. With this in mind, we seek the sparsest set of entries, rows, and/or columns of $K$ that achieves some prescribed level of performance in terms of the LQRm cost. However, this is a nonconvex combinatorial problem which is NP-hard; the number of independent problem instances which must be solved grows combinatorially with the dimensions of $K$. We instead turn to regularization as a heuristic for identifying good sparsity patterns.

The most naïve method of inducing sparsity is hard thresholding of the ARE solution, i.e., zeroing every entry of the optimal gain whose magnitude falls below a chosen threshold. However, in general this is not useful since the resulting gains may not be stabilizing: one can construct small example systems without multiplicative noise for which imposing a hard threshold on the ARE solution yields a closed-loop state transition matrix with an eigenvalue outside the unit circle. By contrast, by working with the regularized LQRm cost the optimal gains are always guaranteed to be stabilizing; even in the limit as the regularization weight $\gamma \to \infty$, the sparsity increases until the sparsest stabilizing solution is obtained. In practice, using a small step size helps ensure that each iterate remains inside the set of mean-square stabilizing gains, i.e., the domain on which $C(K)$ is finite.

Certain types of regularization are well-known to be capable of inducing sparsity in the solutions to optimization problems. Perhaps the most basic and well-known is $\ell_1$-norm regularization, which operates on a vector of decision variables; see Tibshirani (1996) for the seminal LASSO problem for sparse least-squares model selection and Hassibi et al. (1998) for sparse control design. In the case of a convex objective, increasing the regularization weight tends to increase sparsity by moving the global minimum onto the coordinate axes. Once the regularized problem has been solved, a sparsity pattern can easily be identified from the (near-)zero entries. In the current work we consider only the problem of identifying sparsity patterns; however, an additional "polishing" step, which involves re-solving the LQRm problem under the sparsity pattern, can be performed to further improve the LQRm cost, as in Lin et al. (2013).

Entrywise sparsity is induced by the vector $\ell_1$-norm

(14)
$$
r(K) = \sum_{i,j} | K_{ij} |.
$$

Row and column sparsity are induced by using matrix row and column norms, respectively defined as

(15)
$$
r_{\text{row}}^{\infty}(K) = \sum_{i} \| K_{i \cdot} \|_\infty, \qquad r_{\text{col}}^{\infty}(K) = \sum_{j} \| K_{\cdot j} \|_\infty,
$$

where $\| K_{i \cdot} \|_\infty$ and $\| K_{\cdot j} \|_\infty$ are the maximum absolute values of the $i$th row and $j$th column, respectively, of $K$. Row and column sparsity are also induced by the row and column group LASSO

(16)
$$
r_{\text{row}}^{2}(K) = \sum_{i} \| K_{i \cdot} \|_2, \qquad r_{\text{col}}^{2}(K) = \sum_{j} \| K_{\cdot j} \|_2,
$$

where $\| K_{i \cdot} \|_2$ and $\| K_{\cdot j} \|_2$ are the vector $\ell_2$-norms of the $i$th row and $j$th column, respectively, of $K$. Combined row and column sparsity can be induced by the row and column sparse group LASSO

(17)
$$
r(K) = \mu \sum_{i,j} | K_{ij} | + (1 - \mu) \sum_{i} \| K_{i \cdot} \|_2,
$$

(18)
$$
r(K) = \mu \sum_{i,j} | K_{ij} | + (1 - \mu) \sum_{j} \| K_{\cdot j} \|_2,
$$

with weight $\mu \in [0, 1]$, or by various other weighted combinations of entrywise, row, and column norms. We refer to $r(K)$ as a generic nondifferentiable sparsity-inducing regularizer.
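For concreteness, the entrywise, row/column max-norm, and group LASSO regularizers evaluate as follows on a small hypothetical gain matrix (values chosen by us purely for illustration):

```python
import numpy as np

K = np.array([[3.0, 0.0, 0.0],
              [-4.0, 0.0, 5.0]])   # hypothetical 2x3 gain matrix

r_l1 = np.sum(np.abs(K))                          # entrywise l1-norm (14)
r_row_inf = np.sum(np.max(np.abs(K), axis=1))     # sum of row max-abs values
r_col_inf = np.sum(np.max(np.abs(K), axis=0))     # sum of column max-abs values
r_row_glasso = np.sum(np.linalg.norm(K, axis=1))  # row group LASSO
r_col_glasso = np.sum(np.linalg.norm(K, axis=0))  # column group LASSO
```

Note how the group variants score whole rows or columns at once, which is what drives entire actuator or sensor channels to zero.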

Before proceeding, we must point out an important consequence of regularizing the LQRm cost. The sum of a convex function and a gradient dominated function is not gradient dominated in general, and in fact can have multiple local minima. For example, one can construct a scalar function

(19)
$$
f(x) = g(x) + h(x),
$$

where $h$ is strongly convex and $g$ is gradient dominated, such that $f$ has two distinct local minima and therefore is not gradient dominated.

As a result, any local first-order search procedure, such as those used by our algorithms, is not guaranteed to find the global minimum. We conjecture that for the regularized LQRm problem there are at most two local minima: one associated with the LQRm cost and one, typically sparser, associated with the regularizer. If this is so, then choosing the initial point carefully may help the local search find the desired (sparser) local minimum. For open-loop mean-square stable systems, this motivates using zero gains as the initial condition. Likewise, in both the open-loop mean-square stable and unstable cases, an effective heuristic is to use the solution of a highly regularized problem instance to "warm start" another nearby problem instance with reduced regularization weight.
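As an independent illustration of this phenomenon (our own construction, not the example from the text), the sum of the gradient-dominated but nonconvex $g(x) = x^2 + 3\sin^2 x$ (a standard Polyak-Łojasiewicz example from Karimi et al. (2016)) and the strongly convex $h(x) = 0.1(x - 17)^2$ has two local minima, which a derivative sign-change scan confirms:

```python
import numpy as np

# g is gradient dominated (PL) but nonconvex; h is strongly convex.
g = lambda x: x ** 2 + 3.0 * np.sin(x) ** 2
h = lambda x: 0.1 * (x - 17.0) ** 2
fprime = lambda x: 2 * x + 3 * np.sin(2 * x) + 0.2 * (x - 17.0)

# Locate local minima of f = g + h via sign changes of f' (minus to plus).
xs = np.linspace(-5.0, 10.0, 300001)
s = np.sign(fprime(xs))
minima = xs[:-1][(s[:-1] < 0) & (s[1:] > 0)]
print(minima)  # two separate local minima, so f is not gradient dominated
```

Since a gradient dominated function can have only global minimizers as stationary points, two separated minima certify that the sum is not gradient dominated.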

Promising choices of step directions other than the gradient are the natural gradient and the Gauss-Newton step as given by Fazel et al. (2018). When $\gamma = 0$, these step directions give faster convergence than the gradient step, and in fact their convergence proofs are much simpler than that for the gradient step. Unfortunately, adding a regularizer makes these steps more difficult to calculate; the regularized step is not simply the sum of the gradient of the regularizer and the unregularized natural gradient or Gauss-Newton step of the LQRm cost. For this reason we restrict our attention to the standard (sub)gradient directions.

In order to use nondifferentiable regularizers we use subgradient methods which take steps in the direction of subgradients. It is known that using a constant step size gives convergence to a bounded neighborhood of the optimum and that a diminishing step size gives asymptotic, albeit slow, convergence (see Nesterov (2013)). One immediate issue is that subgradients are defined only for convex functions; since the LQRm cost is nonconvex, subgradients do not exist for the regularized LQRm cost. However we simply use the gradient of the LQRm cost plus the subgradient of the regularizer as the step direction. Thus our subgradient descent update is

###### Algorithm 1 (Policy subgradient update)

$$
K_{s+1} = K_s - \eta \big( \nabla C(K_s) + \gamma\, g_s \big),
$$

where $\eta$ is the step size and $g_s \in \partial r(K_s)$ is a subgradient of the regularizer.

Another issue is that there is no guarantee of feasibility of each next step; it is possible to take a step so large that the next point is a mean-square unstable controller giving infinite objective cost. It is not straightforward to obtain restrictions on the step size to guarantee this feasibility. Gradient descent does not suffer from this problem since the gradient is guaranteed to be a true descent direction so there is always a sufficiently small step size to give a feasible next step. Nevertheless, in practice it is rare for a sufficiently small subgradient step to be infeasible.

Proximal gradient methods have become a preferred way to solve optimization problems of the form

$$
\underset{x}{\text{minimize}} \quad f(x) + g(x),
$$

where $f$ has a Lipschitz continuous gradient and $g$ is convex and nondifferentiable, as is the case when $g$ is a sparsity-inducing regularizer. The proximal gradient method update is

$$
x_{s+1} = \operatorname{prox}_{\eta g} \big( x_s - \eta\, \nabla f(x_s) \big),
$$

where the proximity operator is defined as

$$
\operatorname{prox}_{\eta g}(v) = \underset{x}{\arg\min} \; g(x) + \frac{1}{2 \eta} \| x - v \|^2.
$$

Much of the existing literature examines the case where $f$ is convex, in which case the proximal gradient method is guaranteed to converge. The proximal operator has closed-form expressions for the $\ell_1$-norm and the $\ell_2$-norm, called soft thresholding and block soft thresholding, respectively (see Parikh et al. (2014)). Thus to solve (2) we also use a proximal policy gradient algorithm:

###### Algorithm 2 (Proximal policy gradient update)

$$
K_{s+1} = \operatorname{prox}_{\eta \gamma r} \big( K_s - \eta\, D_s \big),
$$

where $D_s$ is a generic step direction.

A result from Hassan-Moghaddam and Jovanović (2018) guarantees convergence at a linear rate to the optimal function value using the proximal gradient method on a function satisfying a proximal gradient domination condition. This condition was shown to be equivalent to one given by Karimi et al. (2016) and to an inequality from Kurdyka (1998). However, the condition is not guaranteed to hold merely because $f$ is gradient dominated and $g$ is convex; the full condition must be checked, which involves interaction between $f$ and $g$. It is nontrivial to verify that the condition is satisfied for the regularized LQRm cost. Empirically, it appears that the inequality may be satisfied, since the proximal gradient method converged to solutions similar to those from our other two methods.
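The soft and block soft thresholding operators referenced above have simple closed forms (standard formulas, see Parikh et al. (2014)). The sketch below implements them and brute-force checks the prox definition on a grid for the scalar case:

```python
import numpy as np

def soft_threshold(v, kappa):
    # prox of kappa * ||v||_1 (elementwise soft thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def block_soft_threshold(v, kappa):
    # prox of kappa * ||v||_2 (shrinks the whole block toward zero)
    nv = np.linalg.norm(v)
    return max(1.0 - kappa / nv, 0.0) * v if nv > 0 else v

x = soft_threshold(np.array([3.0, -0.5, 1.0]), 1.0)
y = block_soft_threshold(np.array([3.0, 4.0]), 2.5)

# Brute-force check of the prox definition for the scalar l1 case:
# prox_{kappa*|.|}(v) = argmin_u kappa*|u| + 0.5*(u - v)^2.
v, kappa = 3.0, 1.0
grid = np.linspace(-5, 5, 100001)
u_star = grid[np.argmin(kappa * np.abs(grid) + 0.5 * (grid - v) ** 2)]
```

Both operators set small entries or small blocks exactly to zero, which is what produces exactly sparse iterates in the proximal method.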

Another algorithm for solving (2) is gradient descent:

###### Algorithm 3 (Policy gradient update)

$$
K_{s+1} = K_s - \eta \big( \nabla C(K_s) + \gamma\, \nabla \tilde{r}(K_s) \big).
$$

Here we use differentiable Huber-type losses $\tilde{r}$ in place of the nondifferentiable regularizers; these replace the linear corners with quadratic tips for decision variable values smaller than a specified threshold. Although the solutions produced are not exactly sparse, in practice entries are sufficiently close to zero to identify the sparsity pattern. Furthermore, by iteratively decreasing the threshold the solutions can be made arbitrarily close to truly sparse.

We define the Huber function of a scalar $x$ with threshold $\delta > 0$ as

$$
h_\delta(x) = \begin{cases} x^2 / (2 \delta), & | x | \leq \delta, \\ | x | - \delta / 2, & | x | > \delta, \end{cases}
$$

and the $\ell_2$-Huber function (a smoothed $\ell_2$-norm) of a vector $v$ as $h_\delta( \| v \|_2 )$. We define the vector Huber loss as $\sum_{i,j} h_\delta( K_{ij} )$, the Huber row and column norms as $\sum_i h_\delta( \| K_{i \cdot} \|_\infty )$ and $\sum_j h_\delta( \| K_{\cdot j} \|_\infty )$, and the Huber row and column group LASSO as $\sum_i h_\delta( \| K_{i \cdot} \|_2 )$ and $\sum_j h_\delta( \| K_{\cdot j} \|_2 )$.

Subgradients of two regularizers and the gradients of their differentiable counterparts are given in Table 1.
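As a sketch of one standard Huber parameterization consistent with the description above (quadratic below a threshold $\delta$, linear beyond; the exact form used in Table 1 may differ), the following checks continuity at the joint and agreement with the absolute value far from zero:

```python
import numpy as np

def huber(x, delta):
    # Quadratic for |x| <= delta, linear (slope 1) beyond; C^1 at the joint.
    x = np.abs(x)
    return np.where(x <= delta, x ** 2 / (2 * delta), x - delta / 2)

delta = 0.1
# Far from zero, the Huber loss matches |x| up to the constant offset delta/2.
gap_far = abs(huber(10.0, delta) - (10.0 - delta / 2))
# The two branches agree at the threshold (continuity).
joint_gap = abs(huber(delta, delta) - delta / 2)
# Smoothed l2 ("block") Huber of a vector: huber applied to its norm.
v = np.array([0.6, 0.8])
block_val = huber(np.linalg.norm(v), delta)  # ||v|| = 1, so 1 - delta/2
```

Because the quadratic tip has zero slope at the origin, the gradient of the Huber surrogate vanishes smoothly as entries approach zero, which is what makes plain gradient descent applicable.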

We considered an example system representing diffusion dynamics on an undirected Erdős-Rényi random graph. It is well known that if the edge probability satisfies $p = c \ln(n) / n$ for constant $c > 1$, then the graph is connected almost surely as $n \to \infty$ (see Bollobás and Béla (2001)); we chose the graph size and edge probability accordingly and with high probability obtained a connected graph. The graph was selected so that it was connected, which ensured controllability. The first row and column of the graph Laplacian were removed in order to fix the system's state reference to the first node, which removed the zero eigenvalue otherwise present. The continuous-time system was discretized using a standard bilinear transform (Tustin's approximation), which preserves the open-loop mean stability of this system. Two multiplicative noises act on the dynamics and input matrices, respectively, with entries drawn from a Gaussian distribution. The multiplicative noise variances were set at two levels, low and high, so that the system was open-loop mean-square stable and unstable, respectively.

For the subgradient and proximal gradient methods, we stopped iterating after the best iterate had been held for 100 iterations. For the gradient method, we stopped iterating when the Frobenius norm of the gradient of the cost function fell below a small threshold. We swept through a range of sparsity levels by solving a problem with a low regularization weight $\gamma$, then increasing $\gamma$ and re-solving using the previous solution as the initial guess. The step size was initialized at a fixed value; $\gamma$ was initialized at 10 for the $\ell_1$-norm and at 100 for the row group LASSO. For each successive problem, the regularization weight was multiplied by a fixed ratio and the step size was rescaled accordingly. To determine sparsity patterns, we considered a value to be sparse if it was less than 5% of the maximum sparsity value in $K$. For the $\ell_1$-norm, the sparsity values were the absolute values of the entries; for the group LASSO, they were the $\ell_2$-norms of the rows and columns, respectively. Sparsity patterns are presented in Figs. 1 and 2, with white cells representing near-zero entries. The LQRm costs given in Figs. 3 and 4 are for the sparse gains without any polishing step applied, which otherwise could significantly reduce the cost. We give the total "wall-clock" computation time in Fig. 4 to capture the aggregate computational expense of each algorithm. The main computational expense came from evaluating the LQRm gradient at each iteration, which required solving a generalized discrete Lyapunov equation.

As seen in Fig. 4, the first iteration had the longest compute time since successive iterations benefited from favorable initial conditions from warm-starting. The compute time increased as the regularization weight was increased and a larger number of smaller steps were required to accommodate the increasing gradient magnitude.

From our empirical studies, the three methods presented all gave very similar results with similar efficacy; entrywise- and row-sparse mean-square stabilizing solutions of essentially arbitrary sparsity level were obtained for the low-noise setting after a reasonable amount of computation time. Similarly, very sparse solutions were obtained for the high-noise setting.

Python code which implements the algorithms and generates the figures reported in this work can be found in the GitHub repository at https://github.com/TSummersLab/polgrad-multinoise/.

The code was run on a desktop PC with a quad-core Intel i7 6700K 4.0GHz CPU, 16GB RAM.

We developed three policy gradient algorithms for solving the sparse gain design problem for networked dynamical systems with multiplicative noise. We showed that the regularized LQR cost does not necessarily have a unique local minimum, hampering efforts to guarantee global convergence of the algorithms. Nevertheless, the efficacy of the algorithms was demonstrated empirically via computational simulations. Through various regularization functions we identified sparsity patterns for near-optimal actuator, sensor, and actuator-sensor link removal. This paves the way for data-driven control design in the model-free setting for such systems.

Future work will attempt to prove uniqueness of local minima of the regularized LQR cost, or provide a set of conditions under which such uniqueness holds. A salient issue with policy gradient methods relates to scalability; for large systems the gradient calculation is computationally expensive. Hence we will explore low-rank approximations of the gradient and the consequent effects on convergence. We will also extend this work to the unknown-model setting and explore alternative model-based learning schemes.

## References

- Bollobás and Béla (2001) Bollobás, B. and Béla, B. (2001). Random graphs, volume 73. Cambridge University Press.
- Damm (2004) Damm, T. (2004). Rational matrix equations in stochastic control, volume 297. Springer Science & Business Media.
- Fazel et al. (2018) Fazel, M., Ge, R., Kakade, S., and Mesbahi, M. (2018). Global convergence of policy gradient methods for the linear quadratic regulator. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 1467–1476. PMLR.
- Gravell et al. (2019) Gravell, B., Guo, Y., and Summers, T. (2019). Policy gradient methods for networked dynamical systems with multiplicative noise. URL https://www.utdallas.edu/~tyler.summers/papers/Gravell2019TechReport.pdf. Unpublished.
- Hassan-Moghaddam and Jovanović (2018) Hassan-Moghaddam, S. and Jovanović, M.R. (2018). On the exponential convergence rate of proximal gradient flow algorithms. In 2018 IEEE Conference on Decision and Control (CDC), 4246–4251. IEEE.
- Hassibi et al. (1998) Hassibi, A., How, J., and Boyd, S. (1998). Low-authority controller design via convex optimization. In Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171), volume 1, 140–145 vol.1.
- Jadbabaie et al. (2018) Jadbabaie, A., Olshevsky, A., and Siami, M. (2018). Deterministic and randomized actuator scheduling with guaranteed performance bounds. arXiv preprint arXiv:1805.00606.
- Jovanović and Dhingra (2016) Jovanović, M.R. and Dhingra, N.K. (2016). Controller architectures: Tradeoffs between performance and structure. European Journal of Control, 30, 76–91.
- Karimi et al. (2016) Karimi, H., Nutini, J., and Schmidt, M. (2016). Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition. In Machine Learning and Knowledge Discovery in Databases, 795–811. Springer International Publishing, Cham.
- Kurdyka (1998) Kurdyka, K. (1998). On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier, 48(3), 769–783.
- Lin et al. (2013) Lin, F., Fardad, M., and Jovanović, M.R. (2013). Design of optimal sparse feedback gains via the alternating direction method of multipliers. IEEE Transactions on Automatic Control, 58(9), 2426–2431.
- Liu et al. (2011) Liu, Y.Y., Slotine, J.J., and Barabási, A.L. (2011). Controllability of complex networks. Nature, 473(7346), 167–173.
- Nesterov (2013) Nesterov, Y. (2013). Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media.
- Olshevsky (2014) Olshevsky, A. (2014). Minimal controllability problems. IEEE Transactions on Control of Network Systems, 1(3), 249–258.
- Parikh et al. (2014) Parikh, N., Boyd, S., et al. (2014). Proximal algorithms. Foundations and Trends in Optimization, 1(3), 127–239.
- Pasqualetti et al. (2014) Pasqualetti, F., Zampieri, S., and Bullo, F. (2014). Controllability metrics, limitations and algorithms for complex networks. IEEE Transactions on Control of Network Systems, 1(1), 40–52.
- Polyak et al. (2013) Polyak, B., Khlebnikov, M., and Shcherbakov, P. (2013). An LMI approach to structured sparse feedback design in linear control systems. In Proc. European Control Conference, 833–838.
- Polyak (1963) Polyak, B. (1963). Gradient methods for the minimisation of functionals. USSR Computational Mathematics and Mathematical Physics, 3(4), 864 – 878.
- Ruths and Ruths (2014) Ruths, J. and Ruths, D. (2014). Control profiles of complex networks. Science, 343(6177), 1373–1376.
- Summers et al. (2016) Summers, T., Cortesi, F., and Lygeros, J. (2016). On submodularity and controllability in complex dynamical networks. IEEE Transactions on Control of Network Systems, 3(1), 91–101.
- Summers (2016) Summers, T. (2016). Actuator placement in networks using optimal control performance metrics. In IEEE Conference on Decision and Control, 2703–2708.
- Taha et al. (2019) Taha, A.F., Gatsis, N., Summers, T., and Nugroho, S. (2019). Time-varying sensor and actuator selection for uncertain cyber-physical systems. IEEE Transactions on Control of Network Systems. to appear.
- Tibshirani (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
- Tzoumas et al. (2016) Tzoumas, V., Rahimian, M.A., Pappas, G., and Jadbabaie, A. (2016). Minimal actuator placement with bounds on control effort. IEEE Transactions on Control of Network Systems, 3(1), 67–78.
- Zare and Jovanović (2018) Zare, A. and Jovanović, M.R. (2018). Optimal sensor selection via proximal optimization algorithms. In 2018 IEEE Conference on Decision and Control (CDC), 6514–6518. IEEE.