Contraction and Robustness of Continuous Time Primal-Dual Dynamics

Contraction and Robustness of Continuous Time Primal-Dual Dynamics

Hung D. Nguyen, Thanh Long Vu, Konstantin Turitsyn, Jean-Jacques Slotine . HN is with Nanyang Technological University; TLV, JJS and KT are with the Mechanical Engineering Department, Massachusetts Institute of Technology, Cambridge, MA. KT and HN were supported by the NSF awards 1554171 and 1550015, Siebel Fellowship, and VEF.
Abstract

The Primal-dual (PD) algorithm is widely used in convex optimization to determine saddle points. While the stability of the PD algorithm can be easily guaranteed, for instance by applying Krasovskii’s method, strict contraction is nontrivial to establish in most cases. In this work, we focus on continuous PD dynamics arising in a network context, in distributed optimization, or in multi-layer time-scale systems, and we show that the PD algorithm is indeed strictly contracting in specific metrics. We further analyze the robustness of the algorithm to non-idealities and partial observability. We derive estimates for the performance of hierarchical multi-scale optimization systems using the developed tools. Among other practical realizations of PD dynamics, we consider a simple Automatic Control Generation problem in numerical simulations.

Primal-dual dynamics, continuous optimization, strict contraction, robustness, hierarchical architecture

I Introduction

Multi-scale optimization is ubiquitous in both natural and artificial systems. Multiple time-scales have long been viewed as a fundamental organizing principle in the modular architecture and evolution of complex systems [1]. In natural systems, this perspective has been further strengthened in recent years, particularly in the context of systems biology [2, 3, 4]. In the brain, multiple interactions between different functional levels occur on a broad range of time-scales, involving weak or strong feedback interactions between genes, transcription factors, synapse formation, and global long-range connectivity [5]. In such systems, “general Darwinism” [6, 7] can play the role of an optimization criterion at every level [5], constrained by factors such as energy availability.

In engineering, layering provides a powerful design architecture for decomposition of multiple stage decision processes in networked infrastructures [8]. While, the purely cyber networks offer the engineers full control of the design process, in cyber-physical systems like the power grid, the structure of the dynamic equations is constrained by the physical laws. Nevertheless, it was shown in a number of recent works that the nature of these equations admits a natural optimization based perspective, see e.g. [9, 10, 11]. Largely, the underlying ideas behind the approaches exploited in those works can be traced back to classical saddle point problem, which was first considered in the context of market equilibrium in economics, appears when simultaneously minimizing a function over one set of its variables and maximizing it over the other variable set. Due to the unique characteristic of the cost function, i.e., being convex in the first set variables while being concave in the second, intuitively the primal-dual algorithm is applied by implementing gradient descent compatibly with the convexity/concavity of the cost function. This is the classic Arrow-Hurwicz algorithm [12]. There has been recently rapidly growing literature on the algorithm as it is most efficient for decentralized optimization mostly in network applications [13, 14, 15, 16]. Below we introduce the basics of the PD dynamics then we define the class of systems we focus on in this work.

I-a The PD algorithm in a distributed context

In its simplest form, the primal dual (PD) dynamics minimizes the function subject to a set of linear or nonlinear constraints with . The corresponding Lagrangian function is given by

(1)

with being the dual variables. We use the vector to represent the full state of the system. The continuous time primal dual algorithm defines a dynamic system in and described by the following set of equations

(2)
(3)

There may be two most common applications of primal-dual dynamics discussed in the literature. First, it can be naturally used to design distributed continuous time optimization systems using the Lagrangian relaxation type of approach. Within this formulation the large-scale optimization problem is represented in a superposition form with denoting the set of variables for subsystem . Coupling between the subsystems is enforced through constraining subsets of variables in different subproblems to be equal to each other. These additional constraints are represented as with being an incidence matrix of a directed graph defining equivalencies between variable replicas. In this setting, every subproblem is coupled to common dual variables but not other subproblem variables directly. Every subproblem can then be solved by an individual agent, while the dual dynamics gradually adjusts the dual variables until the equivalence constraints are satisfied.

Another common application arises in network flows of natural or artificial nature characterized by some conservation laws. Assuming that some subset of variables represents the flows of the conserved quantities, the conservation laws are expressed as with being the incidence matrix reflecting the topology of interconnection of individual subsystems and vector of external source/sink flows. In equilibrium the total flow on every node of the interconnection is balanced. However, during the transient dynamics violation can occur due to non-zero . These violations may be interpreted as the accumulation of the transported quantity on the node storage elements. In traditional circuits, the terms represent charging of effective node capacitance. While in the PD formulations of swing equations they represent the acceleration/deceleration of the rotating turbines storing energy in mechanical form. Whenever such a formulation of physical equations exists, one can naturally solve the distributed optimization problem complementing the Lagrangian with additional terms representing the objective of the controller. This approach is closely related to an energy/power shaping techniques based on the port-Hamiltonian formulation of physical equations [17].

Unless otherwise stated here we assume the constraints of the form which lead to a formulation of the form

(4a)
(4b)

I-B Contributions

The goal of this work is to study strict contraction of continuous PD dynamics of the form (4) by constructing explicitly contracting metrics and estimating corresponding contraction rates as shown in Section II. Next, based on the established contraction properties, we give a new look to some of the classical results, especially via Krasovskii method. More importantly, we derive several new results concerning the robustness of the primal-dual systems including the bounds on the long-term steady-state errors induced by various types of disturbances, inaccurate estimations, and multi-layer optimization in Section III. We dedicate Section IV to show a power system example where we present the AGC problem in a PD form then illustrate the error bounds derived in Section III.

Ii strict contraction of PD dynamics

Throughout this paper, we use to denote the norm in which the considered system is strictly contracting. Similarly, will denote the matrix measure of corresponding to the discussed norm. In particular, for the -norm, one has where is the maximal eigenvalue.

We proceed by providing the basis of contraction analysis for nonlinear dynamical systems. For holistic descriptions on this topic, see [18, 19]. Let us consider a nonlinear dynamical system where is a continuous and sufficiently smooth function of the state variable . The infinitesimal dynamics can be given as . Contraction analysis characterizes the dynamics of the distance between two close trajectories in some weighted two norm defined as: where we introduce a differential coordination transformation with an invertible metric transformation . The rate of change of the distance can be calculated accordingly with being the generalized Jacobian. Whenever the symmetric part of the generalized Jacobian is uniformly negative definite, i.e., there exists s.t. , the system is (strictly) -contracting and all trajectories will converge exponentially towards each other with a contraction rate upper bounded by .

Next, we revisit the classical result on the contraction of primal-dual dynamics in identity metric. Specifically, consider a virtual displacement between the two close trajectories characterized by the vector . These displacement vectors are described by the following dynamic equations:

(5a)
(5b)

with being the Hessian of the objective function. One can easily see that the primal-dual dynamics described by (4) is contracting with respect to the traditional Euclidean norm: :

(6)

Among other things, this result implies that the distance between the current point and any equilibrium satisfying the KKT conditions is a non-decreasing function, a well-known result since the classical works on PD dynamics [12]. Moreover, it establishes the connection between this original result and more recent approach to PD systems via Krasovskii type Lyapunov functions [13]. Indeed, the existence of a Krasovskii Lyapunov function implies contraction of the system which in turn implies that the distance between any point and any equilibrium only decreases.

Note, that this system is not strictly contracting, and there may be close trajectories that don’t eventually get closer to each other in Euclidean metrics. This indeed the case for trajectories with the same but different . Moreover, whenever the matrix is not full row rank, the optimum of the original system may not be unique, and the system may converge to different equilibria. In this case, there is no strict contraction, as the distance between two trajectories starting at different equilibria does not change. However, for the systems with full rank the question arises, whether exponential contraction of the trajectories can be established in some other metrics.

Ii-a Strict contraction of PD dynamics

In the following we shall develop a metric which certifies strict contraction of PD systems. Consider a coordinate transformation with invertible “skew” metric transformation :

(7)

where is the identity matrix of an appropriate size. Then, for the two neighbour trajectories with the virtual displacement , we introduce the distance . Following the arguments from the previous sections we arrive at the generalized Jacobian with

(8)

Theorem below establishes sufficient conditions for strict contraction of the primal-dual algorithm in this metric.

Lemma 1.

The primal-dual system defined in (4) is strictly contracting with the rate in the metric whenever also satisfies

(9)
Proof.

The condition ensures not only that the metric transformation is invertible and therefore the distance is a positive definite form:

(10)

The rate of change of this form, according to (4) is given by

Matrix is positive definite whenever satisfies (9), and the system is therefore strictly contracting. ∎

Iii Applications

While the PD dynamics is an extremely flexible framework applicable to a broad variety of continuous time optimization problems, its perfect realization is not viable in most of the practical situations. In this section, we analyze stability and performance of quasi-PD systems that approximate the “true” PD dynamics. Throughout the section, we adopt a number of assumptions and definitions reviewed below. We consider a system characterized by the Lagrangian of the form (1) and the “true” PD dynamics expressed compactly as

(11)

where . We assume that the system is contracting with respect to the uniform metric associated with the norm with some constant matrix as presented in Section II. We assume that the system is strictly -contracting.

While many of the results can be easily extended to a more general case of non-uniform metric explicitly depending on , for the sake of simplicity we restrict the discussion only to the uniform case. In the following, we show that the contraction rate with respect to such a metric can be naturally used to characterize the performance of more realistic approximately primal-dual systems.

In many if not most practical situations the PD algorithms are utilized in a non-stationary setting where the system is subject to constantly changing external conditions. In this case, the PD dynamics allows the system to track the optimal operating condition which also changes in time. For example, in power system context, the secondary frequency control can keep the system close to the optimal power flow solutions as the external factors such as the load power consumption or renewable generation change. Typically there is a time-scale separation between the fast PD dynamics and slow changes of external parameters. In this case, the deviations from equilibrium are small enough and can be safely ignored. At the same time, establishing rigorous bounds on the deviations from the optimum is essential for certifying the performance and safety of the system operation.

In the following, we assume, that the Lagrangian is explicitly dependent on time, and characterize this dependence implicitly through the position of the local optimum . Strict contraction of the PD dynamics provides us with a natural for quantification of the deviations from the optimum.

Lemma 2.

Consider now the distance between the state and the instant optimum . This distance satisfies the following differential inequality:

(12)
Proof.

This result is proven by direct differentiation of and application of the contraction property. The term represents the contraction of the fixed equilibrium system, while the term accounts for the motion of the instant equilibrium point . ∎

Corollary 2.1.

In the steady state , whenever the rate of equilibrium point movement is bounded the “true” PD system above is confined to the ball

(13)

Iii-a Robustness of PD systems

Next, we consider the systems that deviate from the “true” PD dynamics. Our primary motivation is the multi-scale optimization system where the decisions are made by different layers on multiple time-scales (for example, see [20, 21]). It is common in this setting for the higher layers of the hierarchy to have limited observability of the lower layer states. Most commonly, the optimization logic on higher layers either relies on imperfect observations or assumes that the faster lower layers have already equilibrated. To model this setting, we represent the approximate primal-dual dynamics by

(14)

where represents the substitution of true signal with its estimate . In this section we assume that the function is Lipschitz, so that the following two inequalities hold:

(15)
(16)

Then we continue by proving the following intermediate lemma (see also [22] and well as [21]).

Lemma 3.

In steady-state, the distance between the approximate and true primal-dual dynamics systems satisfies the differential inequality:

(17)
Proof.

Consider a trajectory following the original flow, i.e. satisfying and starting at time at , i.e. . Trajectory together with can be considered as two individual trajectories of the same system which is contracting the at the rate of . Strict contraction of the original system implies that the distance between these trajectories will decrease as:

(18)

where we have utilized the fact that at time . At the same time, the distance between and in the same interval has increased by at most . Therefore, we have that

(19)

Combining (18) and (19) via triangle inequality: , and taking the limit , yields (17). ∎

Corollary 3.1.

After exponential transients at rate , the distance between the non-ideal and ideal PD can be bounded as

(20)
Proof.

From Lemma 3 one has the following when the system settles down: . Combining this relation and the inequality (15), yields (20) ∎

Next, we consider a setting where the actual system does not follow exactly the primal-dual dynamics, although the PD system is a reasonable approximation. In practice, this situation can occur for a variety of reasons, for example, due to fast degrees of freedom in plant dynamics lacking the structure, or due to imperfect observers introducing additional delays in the system. While both settings can be modeled in the same way, for exposition purposes we restrict the discussion only to the case of imperfect observers. We assume, that the subset of directly unobservable variables is estimated by a separate observer system that satisfies the following conditions:

  1. A subset of the observer states, is an unbiased estimate of and for constant , the observer converges to the manifold satisfying .

  2. Dynamics of the observer is partially contracting with respect to with a contraction rate of .

The following formal results allow us to characterize the performance of the system.

Lemma 4.

Whenever and is upper bounded, the long-term steady state error of the observer is bounded by

(21)
Proof.

Applying Lemma 2 and the Lipschitz bounds we obtain

(22a)
(22b)
(22c)

where we have used the relation to arrive at (22b) from (22a). Solving for yields (21). ∎

Theorem 5.

Given the conditions stated in Lemma 4 and , the steady state optimal tracking error is bounded from above

(23)
Proof.
(24a)
(24b)
(24c)

where we use relation (21) to arrive at the last inequality from (24b). By solving for we obtain (23). ∎

Note, that in the limit of perfect observer with one recovers the bound (13).

Corollary 5.1.

The distance between the perturbed and “true” Primal-Dual dynamics can be bounded as

(25)

Corollary 5.1 can be easily proved by combining the results from Corollary 3.1 and Lemma 4.

Iii-B Performance of layered optimization architectures

Our results could be naturally applied to multi-layer optimization systems commonly occurring in nature and engineering. In these systems, each layer typically performs its own optimization [6, 1, 8], and interacts with other layers. Usually, the dynamics of the layers are separated in time-scales, such that the dynamics of higher levels is slower in comparison to the lower ones. In engineering systems, the algorithms of the individual layer are often designed with two assumptions in mind: that the lower layer has converged to its optimal equilibrium layer, and that the higher layer can be assumed to be constant. In this section, we quantify the effect of these assumptions on the performance and stability of PD algorithms.

Mathematically, the setting above can be expressed by introducing the subsets of PD variables corresponding to different layers of optimization. For notational simplicity, we assume that each layer interacts directly with two neighboring layers and the “true” PD dynamics is described by

(26)

From the viewpoint of layer , the function can be considered as an exogenous factor, which affects the position of instant equilibrium . On the other hand, we assume that the layer is designed under the assumption that all the faster layers have equilibriated and so are substituted with . In this case, following the same logic as in previous section, the actual dynamics can be represented as with

(27)
Theorem 6.

Consider a multi-layer optimization system described above. Assume that is Lipschitz with respect to and with constants denoted as respectively. Furthermore, assume that functions are also Lipschitz with respect to , with Lipschitz constant . For small enough Lipschitz constants, the system is stable and its performance is characterized by inequalities with given by (13) or (23) for the lowest layer, and the relation below for the higher ones.

(28)
Proof.

We start by bounding the term . Using the definition (27) we have

Hence, whenever we have

(29)

Next, as long as the system is contracting, we have . Using (29), assuming , we arrive at (28). ∎

Iv Numerical simulations

In this section, we will present the secondary frequency control problem in power system, so-called Automatic Generation Control (AGC) [23, 24], in PD form. In the literature, the PD algorithm for more complicated AGC models has been studied recently, for instance, by [14, 9]. As a proof of concept, this work however only considers a simplified AGC with the objective is to bring the system to a steady state characterized by the desired frequency. The frequency dynamics and the AGC are introduced below.

(30)
(31)
(32)

The swing equation(30) is written for the generator at bus which has the inertial and damping . denotes the AGC effort, for instant, the change in the mechanical power input of the generator. (31), which presents the dynamic of the line in terms of frequency, can be derived from the transferred powers where be the electrical angle vector and . is a diagonal matrix consisting of the susceptance of branch . The set denotes the set of branches connecting all buses and corresponds to the connection matrix . In a simple setting, the AGC uses a droop control where the adjustment is given as (32) with a positive gain .

We rescale the variables to retrieve the nominal form of PD with an identity gain; specifically, we introduce new variables, though they are not representing any physical ones, , , . Further let , , where .

Theorem 7.

The simplified AGC with a droop control in the rescaled variable can be represented in PD form, and the Lagrangian function is given below

(33)
Proof.

We define wherein be the primal and be the dual variables. Let represent the dynamics (30)-(32) in the rescaled variables to arrive at a PD form . ∎

Note, that the perfect PD form of the above simplified AGC is an example of the “true” PD system associated with considered in (11). While the dynamics (30)-(32) form an ideal PD system, the approximated counterpart can be created by adding a filter to pass a certain bandwidth of the control command. Particularly we consider a more detailed setting where AGC effort doesn’t enter directly the swing equation, but through intermediate dynamics like that of the turbine characterized by . Here the swing dynamics are governed by the mechanical power, , rather than the original AGC effort . The resulting PD dynamics duly become in which the error accounts for the effect of the turbine. We implement AGC on a SMIB (single-machine infinity-bus) system which consists of a generator with , , and a purely inductive line of . This simple system and AGC, as shown in Theorem 7, can be represented in an ideal PD form.

Fig. 1: The frequency of the machine in SMIB simulated in two cases: the ideal PD() vs. the non-ideal PD ().

We test the AGC by exerting on the machine exogenous torques which can represent, for example, the disturbances caused by persistent load changes. At the beginning of the simulation , we introduce a “jump” in the exogenous torque and later on a sinusoidal disturbance which is specifically of the value for . Figure 1 shows the frequency of the machine in both ideal and non-ideal PD models in . The initial transient due to the step signal dies out after when the system enters its steady state characterized by the continuing sinusoidal disturbance. The instant equilibrium dynamics are modeled in Lemma 2. Figure 2 shows that the long-term steady-state error is bounded from above starting from as shown in Theorem 5.

Fig. 2: The long-term steady-state error between the approximated PD rescaled state and the steady-state equilibrium is curbed above by a bound (the red line) proportional to the rate of change of equilibrium in Theorem 5.

V Conclusions and path forward

In this work, we establish the strict contraction of PD algorithm applicable to a class of optimization problems arising in network flows and distributed optimization. Strict contraction allows us to characterize the performance and robustness of the PD dynamics with respect to common deviations from the “true” PD dynamics. In particular, we consider the case of imperfect observers and derive recursive bounds for hierarchically organized multi-scale optimization systems.

In this work, we restricted ourselves to systems constrained by equality constraints. In practice, most of the engineering systems are constrained by inequalities which introduce additional nonlinearity in the dynamics of dual variables that will be addressed in future works. From a practical perspective, one of the most natural applications of the proposed framework is a characterization of robustness of the recently proposed secondary controls for power dispatch, like in [11] to unmodeled dynamics, such as inductive line flow delays which can compromise the stability of microgrids [25].

References

  • [1] H. A. Simon, “The architecture of complexity,” in Facets of systems science.   Springer, 1991, pp. 457–476.
  • [2] M. Kirschner and J. Gerhart, The Plausibility of Life: Resolving Darwin’s Dilemma.   Yale University Press, 2005.
  • [3] N. Kashtan and U. Alon, “Spontaneous evolution of modularity and network motifs,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 39, pp. 13 773–13 778, 2005.
  • [4] D. Del Vecchio, A. J. Ninfa, and E. D. Sontag, “Modular cell biology: retroactivity and insulation,” Molecular systems biology, vol. 4, no. 1, p. 161, 2008.
  • [5] J.-P. Changeux, “Climbing brain levels of organisation from genes to consciousness,” Trends in cognitive sciences, vol. 21, no. 3, pp. 168–181, 2017.
  • [6] J.-P. Changeux, P. Courrège, and A. Danchin, “A theory of the epigenesis of neuronal networks by selective stabilization of synapses,” Proceedings of the National Academy of Sciences, vol. 70, no. 10, pp. 2974–2978, 1973.
  • [7] G. M. Edelman, Neural Darwinism: The theory of neuronal group selection.   Basic books, 1987.
  • [8] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, “Layering as optimization decomposition: A mathematical theory of network architectures,” Proceedings of the IEEE, vol. 95, no. 1, pp. 255–312, 2007.
  • [9] N. Li, C. Zhao, and L. Chen, “Connecting automatic generation control and economic dispatch from an optimization view,” IEEE Transactions on Control of Network Systems, vol. 3, no. 3, pp. 254–264, Sept 2016.
  • [10] C. Zhao, U. Topcu, N. Li, and S. Low, “Design and stability of load-side primary frequency control in power systems,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1177–1189, May 2014.
  • [11] F. Dörfler, J. W. Simpson-Porco, and F. Bullo, “Breaking the hierarchy: Distributed control and economic optimality in microgrids,” IEEE Transactions on Control of Network Systems, vol. 3, no. 3, pp. 241–253, 2016.
  • [12] K. Arrow, Studies in Linear and Non-linear Programming, ser. Stanford mathematical studies in the social sciences.   Stanford University Press, 1958.
  • [13] D. Feijer and F. Paganini, “Stability of primal–dual gradient dynamics and applications to network optimization,” Automatica, vol. 46, no. 12, pp. 1974 – 1981, 2010.
  • [14] E. Mallada, C. Zhao, and S. Low, “Optimal load-side control for frequency regulation in smart grids,” IEEE Transactions on Automatic Control, vol. 62, no. 12, pp. 6294–6309, 2017.
  • [15] J. W. Simpson-Porco, “Input/output analysis of primal-dual gradient algorithms,” in Communication, Control, and Computing (Allerton), 2016 54th Annual Allerton Conference on.   IEEE, 2016, pp. 219–224.
  • [16] C. Zhao, E. Mallada, S. Low, and J. Bialek, “A unified framework for frequency control and congestion management,” in Power Systems Computation Conference (PSCC), 2016.   IEEE, 2016, pp. 1–7.
  • [17] R. Ortega, D. Jeltsema, and J. M. Scherpen, “Power shaping: A new paradigm for stabilization of nonlinear rlc circuits,” IEEE Transactions on Automatic Control, vol. 48, no. 10, pp. 1762–1767, 2003.
  • [18] W. Lohmiller and J.-J. E. Slotine, “On contraction analysis for non-linear systems,” Automatica, vol. 34, no. 6, pp. 683–696, 1998.
  • [19] J.-J. E. Slotine, “Modular stability tools for distributed computation and control,” International Journal of Adaptive Control and Signal Processing, vol. 17, no. 6, pp. 397–416, 2003.
  • [20] D. Del Vecchio and J.-j. E. Slotine, “A contraction theory approach to singularly perturbed systems,” Automatic Control, IEEE Transactions on, vol. 58, no. 3, pp. 752–757, 2013.
  • [21] G. D. Bousquet and J.-J. E. Slotine, “A contraction based, singular perturbation approach to near-decomposability in complex systems,” arXiv preprint arXiv:1512.08464, 2015.
  • [22] J.-J. E. Slotine, W. Li et al., Applied nonlinear control.   Prentice-Hall Englewood Cliffs, NJ, 1991, vol. 60.
  • [23] P. Kundur, Power System Stability And Control, ser. EPRI power system engineering series.   McGraw-Hill, 1994.
  • [24] J. W. Simpson-Porco, Q. Shafiee, F. Dörfler, J. C. Vasquez, J. M. Guerrero, and F. Bullo, “Secondary frequency and voltage control of islanded microgrids via distributed averaging,” IEEE Transactions on Industrial Electronics, vol. 62, no. 11, pp. 7025–7038, 2015.
  • [25] P. Vorobev, P. H. Huang, M. A. Hosani, J. L. Kirtley, and K. Turitsyn, “High-fidelity model order reduction for microgrids stability assessment,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 874–887, Jan 2018.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
184659
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description