Finite-Time Convergence of Continuous-Time Optimization Algorithms via Differential Inclusions
In this paper, we propose two discontinuous dynamical systems in continuous time with guaranteed prescribed finite-time local convergence to strict local minima of a given cost function. Our approach consists of exploiting a Lyapunov-based differential inequality for differential inclusions, which leads to finite-time stability and thus finite-time convergence with a provable bound on the settling time. In particular, for exact solutions to the aforementioned differential inequality, the settling-time bound is also exact, thus achieving prescribed finite-time convergence. We thus construct a class of discontinuous dynamical systems, of second order with respect to the cost function, that serve as continuous-time optimization algorithms with finite-time convergence and prescribed convergence time. Finally, we illustrate our results on the Rosenbrock function.
In continuous-time optimization, an ordinary differential equation (ODE), partial differential equation (PDE), or differential inclusion is designed in terms of a given cost function, in such a way to lead the solutions to converge (forward in time) to an optimal value of the cost function. To achieve this, tools from Lyapunov stability theory are often employed, mainly because there already exists a rich body of work within the nonlinear systems and control theory community for this purpose. In particular, we seek asymptotically Lyapunov stable gradient-based systems with an equilibrium (stationary point) at an isolated extremum of the given cost function, thus certifying local convergence. Naturally, global asymptotic stability leads to global convergence, though such an analysis will typically require the cost function to be strongly convex everywhere.
For early work in this direction, see (Botsaris, 1978b, a), (Zghier, 1981), (Snyman, 1982, 1983), and (Brown, 1989). Brockett (1988) and, subsequently, Helmke and Moore (1994), studied relationships between linear programming, ODEs, and general matrix theory. Further, Schropp (1995) and Schropp and Singer (2000) explored several aspects linking nonlinear dynamical systems to gradient-based optimization, including nonlinear constraints. Cortés (2006) proposed two discontinuous normalized modifications of gradient flows to attain finite-time convergence. Later, Wang and Elia (2011) proposed a control-theoretic perspective on centralized and distributed convex optimization.
More recently, Su et al. (2014) derived a second-order ODE as the limit of Nesterov’s accelerated gradient method, when the gradient step sizes vanish. This ODE is then used to study Nesterov’s scheme from a new perspective, particularly in a larger effort to better understand acceleration without substantially increasing computational burden. Expanding upon the aforementioned idea, França et al. (2018) derived a second-order ODE that models the continuous-time limit of the sequence of iterates generated by the alternating direction method of multipliers (ADMM). Then, the authors employ Lyapunov theory to analyze the stability at critical points of the dynamical systems and to obtain associated convergence rates.
Later, França et al. (2019) analyze general non-smooth and linearly constrained optimization problems by deriving equivalent (at the limit) non-smooth dynamical systems related to variants of the relaxed and accelerated ADMM.
In the more traditional context of machine learning, only a few papers have adopted the approach of explicitly borrowing or connecting ideas from control and dynamical systems. For unsupervised learning, Plumbley (1995) proposes Lyapunov stability theory as an approach to establish convergence of principal component algorithms. Pequito et al. (2011) and Aquilanti et al. (2019) propose continuous-time generalized expectation-maximization (EM) algorithms, based on mean-field games, for clustering of finite mixture models. Romero et al. (2019) establish convergence of the EM algorithm, and a class of generalized EM algorithms denoted -EM, via discrete-time Lyapunov stability theory. For supervised learning, Liu and Theodorou (2019) provide a review of deep learning from the perspective of control and dynamical systems, with a focus on optimal control. Zhu (2018) and Rahnama et al. (2019) explore connections between control theory and adversarial machine learning.
Statement of Contribution
In this work, we provide a Lyapunov-based tool to both check and construct continuous-time dynamical systems that are finite-time stable and thus lead to finite-time convergence of the candidate Lyapunov function (intended as a surrogate to a given cost function) to its minimum value. In particular, we first extend one of the existing Lyapunov-based inequality conditions for finite-time convergence of the usual Lipschitz continuous dynamical systems to the case of arbitrary differential inclusions. We then use this condition to construct a family of discontinuous, second-order flows, which guarantee local convergence to a local minimum in prescribed finite time. One of the proposed families of continuous-time optimization algorithms is tested on a well-known optimization test case, namely, the Rosenbrock function.
2 Finite-Time Convergence in Optimization via Finite-Time Stability
Consider some objective cost function $f: \mathbb{R}^n \to \mathbb{R}$ that we wish to minimize. In particular, let $x^\star \in \mathbb{R}^n$ be an arbitrary local minimum of $f$ that is unknown to us. In continuous-time optimization, we typically proceed by designing a nonlinear state-space dynamical system
$$\dot{x} = F(x) \tag{1}$$
or a time-varying one replacing $F(x)$ with $F(t, x)$, for which $F(x)$ can be computed without explicit knowledge of $x^\star$ and for which (1) is certifiably asymptotically stable at $x^\star$. Ideally, computing $F(x)$ should be possible using only up to second-order information on $f$.
In this work, however, we seek dynamical systems for which (1) is certifiably finite-time stable at $x^\star$. As will be clear later, such systems need to be possibly discontinuous or non-Lipschitz, based on differential inclusions instead of ODEs. Our approach to achieve this objective is largely based on exploiting the Lyapunov-like differential inequality
$$\dot{E}(t) \le -c\,E(t)^{\alpha} \quad \text{a.e. } t \ge 0 \tag{2}$$
with constants $c > 0$ and $\alpha < 1$, for absolutely continuous functions $E$ such that $E(0) > 0$. Indeed, under the aforementioned conditions, $E(t) = 0$ will be reached in finite time $t^\star \le \frac{E(0)^{1-\alpha}}{c(1-\alpha)} < \infty$.
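As a quick sanity check of the settling-time claim, the following minimal sketch evaluates the equality case of the differential inequality, assuming it takes the form dE/dt = -c E^alpha with c > 0 and alpha < 1. The function names are illustrative and not part of the original development.

```python
def settling_time_bound(E0: float, c: float, alpha: float) -> float:
    """Upper bound E0^(1-alpha) / (c (1-alpha)) on the time at which E reaches zero."""
    if E0 < 0 or c <= 0 or alpha >= 1:
        raise ValueError("requires E0 >= 0, c > 0 and alpha < 1")
    return E0 ** (1.0 - alpha) / (c * (1.0 - alpha))


def energy_equality_case(t: float, E0: float, c: float, alpha: float) -> float:
    """Closed-form solution of dE/dt = -c E^alpha, held at zero after the settling time."""
    remaining = E0 ** (1.0 - alpha) - c * (1.0 - alpha) * t
    return max(remaining, 0.0) ** (1.0 / (1.0 - alpha))


if __name__ == "__main__":
    E0, c, alpha = 4.0, 1.0, 0.5
    T = settling_time_bound(E0, c, alpha)
    # Energy halfway to the settling time, then at the settling time itself.
    print(T, energy_equality_case(T / 2, E0, c, alpha), energy_equality_case(T, E0, c, alpha))
```

When the inequality is strict, the same bound still applies but is no longer attained, which is why the equality case is the one relevant to prescribed settling times.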
We now summarize the problem statement:
Given a sufficiently smooth cost function $f$ with a sufficiently regular local minimizer $x^\star$, solve the following tasks:
Design a sufficiently smooth¹ candidate Lyapunov function $V$ that is defined and positive definite near and w.r.t. $x^\star$.²
By following this strategy, we will therefore achieve (local and strong) finite-time stability, and thus finite-time convergence. Furthermore, if $V(x_0)$ can be upper bounded, then $c$ can be readily tuned to achieve finite-time convergence under a prescribed range for the settling time, or even with exact prescribed settling time if $V(x_0)$ can be explicitly computed and (2) holds exactly.
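The tuning step can be sketched as follows, under the assumption that the settling-time bound has the form V0^(1-alpha) / (c (1-alpha)) and that the gain c is the free design parameter. The helper name and its arguments are hypothetical.

```python
def gain_for_prescribed_time(V0_bound: float, alpha: float, T: float) -> float:
    """Smallest gain c for which V0_bound^(1-alpha) / (c (1-alpha)) <= T."""
    if V0_bound <= 0 or alpha >= 1 or T <= 0:
        raise ValueError("requires V0_bound > 0, alpha < 1 and T > 0")
    return V0_bound ** (1.0 - alpha) / ((1.0 - alpha) * T)


if __name__ == "__main__":
    # Any initial value V0 <= 4 then settles no later than T = 2 (assumed numbers).
    print(gain_for_prescribed_time(V0_bound=4.0, alpha=0.5, T=2.0))
```

If only an upper bound on the initial value is available, the resulting settling time is at most T. If the initial value is known exactly and the inequality holds with equality, the settling time equals T.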
3 A Family of Finite-Time Stable, Second-Order Optimization Flows
We now propose a family of second-order optimization methods with finite-time convergence constructed using two gradient-based Lyapunov functions, namely $V(x) \triangleq \|\nabla f(x)\|^2$ and $V(x) \triangleq \|\nabla f(x)\|_1$. First, we need to assume sufficient smoothness on the cost function.
Assumption 1. $f$ is twice continuously differentiable and strongly convex in an open neighborhood $\mathcal{D}$ of a stationary point $x^\star$.
Since $\nabla V(x) = 2 \nabla^2 f(x) \nabla f(x)$ for $V(x) = \|\nabla f(x)\|^2$ and $\nabla V(x) = \nabla^2 f(x)\, \mathrm{sign}(\nabla f(x))$ a.e. for $V(x) = \|\nabla f(x)\|_1$, we can readily design Filippov differential inclusions that are finite-time stable at $x^\star$. In particular, we may design such differential inclusions to achieve an exact and prescribed finite settling time, at the trade-off of requiring second-order information on $f$.
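Given a symmetric and positive definite matrix $A \in \mathbb{R}^{n \times n}$ with SVD $A = U \Sigma U^\top$, $U^\top U = I$, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, we define $A^r \triangleq U \Sigma^r U^\top$, where $\Sigma^r = \mathrm{diag}(\sigma_1^r, \ldots, \sigma_n^r)$.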
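To make the real matrix power concrete, here is a minimal sketch restricted to the 2x2 case, where the decomposition is available in closed form. It assumes the definition A^r = U diag(s1^r, s2^r) U^T for a symmetric positive definite A. The function name is illustrative, and a general implementation would rely on a numerical eigensolver instead.

```python
import math


def spd_power_2x2(A, r):
    """Return A^r = U diag(s1^r, s2^r) U^T for a symmetric positive definite 2x2 matrix A."""
    (a, b), (b2, d) = A
    if abs(b - b2) > 1e-12 * max(1.0, abs(b)):
        raise ValueError("A must be symmetric")
    half_trace = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    s1, s2 = half_trace + radius, half_trace - radius  # eigenvalues, s1 >= s2
    if s2 <= 0:
        raise ValueError("A must be positive definite")
    if radius == 0.0:
        u1 = (1.0, 0.0)  # A is a multiple of the identity: any orthonormal basis works
    else:
        # Two parallel formulas for the s1-eigenvector; use the better conditioned one.
        v = (s1 - d, b) if abs(s1 - d) >= abs(s1 - a) else (b, s1 - a)
        norm = math.hypot(v[0], v[1])
        u1 = (v[0] / norm, v[1] / norm)
    u2 = (-u1[1], u1[0])  # orthogonal complement, eigenvector for s2
    p1, p2 = s1 ** r, s2 ** r
    return [[p1 * u1[i] * u1[j] + p2 * u2[i] * u2[j] for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]
    print(spd_power_2x2(A, -1.0))  # inverse of A
    print(spd_power_2x2(A, 0.5))   # symmetric square root of A
```

With r = -1 this recovers the inverse and with r = 1/2 the symmetric square root, which is the sense in which non-integer and negative exponents are understood here.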
Let , , and . Under Assumption 1, any maximal Filippov solution to the discontinuous second-order generalized Newton-like flows
with sufficiently small (where ) will converge in finite time to . Furthermore, their convergence times are given exactly by
for (3)-(4), respectively, where . In particular, given any compact and positively invariant subset , both flows converge in finite time with the aforementioned settling time upper bounds (which can be tightened by replacing with ) for any . Furthermore, if , then we have global finite-time convergence, i.e. finite-time convergence of any maximal Filippov solution with arbitrary .
See supplementary material, appendix C. ∎
4 Numerical Experiment: Rosenbrock Function
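We will now test one of our proposed flows on the Rosenbrock function $f: \mathbb{R}^2 \to \mathbb{R}$, given by
$$f(x_1, x_2) = (a - x_1)^2 + b\,(x_2 - x_1^2)^2$$
with parameters $a, b \in \mathbb{R}$. This function is nonlinear and non-convex, but smooth. It possesses exactly one stationary point $(a, a^2)$ for $b \neq 0$, which is a strict global minimum for $b > 0$. If $b < 0$, then $(a, a^2)$ is a saddle point. Finally, if $b = 0$, then $\{(a, x_2) : x_2 \in \mathbb{R}\}$ are the stationary points of $f$, and they are all non-strict global minima.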
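The classification of the stationary points can be checked by a short computation, sketched here under the assumption that the Rosenbrock function is taken in its standard two-parameter form $f(x_1, x_2) = (a - x_1)^2 + b\,(x_2 - x_1^2)^2$.

```latex
\begin{align*}
\nabla f(x_1, x_2) &=
  \begin{bmatrix} -2(a - x_1) - 4 b x_1 (x_2 - x_1^2) \\ 2 b (x_2 - x_1^2) \end{bmatrix}, \\
\nabla^2 f(x_1, x_2) &=
  \begin{bmatrix} 2 - 4 b (x_2 - x_1^2) + 8 b x_1^2 & -4 b x_1 \\ -4 b x_1 & 2 b \end{bmatrix}, \\
\nabla^2 f(a, a^2) &=
  \begin{bmatrix} 2 + 8 b a^2 & -4 a b \\ -4 a b & 2 b \end{bmatrix},
\qquad \det \nabla^2 f(a, a^2) = 4 b .
\end{align*}
```

For $b \neq 0$, the second component of the gradient vanishes only if $x_2 = x_1^2$, and the first then forces $x_1 = a$. The determinant $4b$ together with the trace $2 + 8 b a^2 + 2b$ shows that the Hessian at $(a, a^2)$ is positive definite for $b > 0$ and indefinite for $b < 0$.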
As we can see in Figure 1, this flow converges correctly to the minimum from all the tested initial conditions with an exact prescribed settling time . It should be noted that at any given point in the trajectory , the functions and are not guaranteed to decrease or remain constant, indeed only can be guaranteed to do so, which explains the increase in Figure 1-(d) that could never have occurred in Figure 1-(e).
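The experiment can be mimicked with a rough numerical sketch. Everything below is an assumption made for illustration rather than the setup behind Figure 1: the parameters a = 1 and b = 100, the initial condition (-1.2, 1), the exponent p = 1.5, the prescribed time T = 1, and the particular Newton-like flow x' = -c ||g||^(p-2) H^{-1} g with g the gradient and H the Hessian. Along this flow dg/dt = -c ||g||^(p-2) g, so ||g|| reaches zero exactly at ||g(0)||^(2-p) / (c (2-p)). Forward Euler with a step clip stands in for a proper solver of the discontinuous dynamics.

```python
import math

A_PARAM, B_PARAM = 1.0, 100.0  # assumed Rosenbrock parameters (a, b)


def grad(x1, x2):
    """Gradient of (a - x1)^2 + b (x2 - x1^2)^2."""
    return (-2.0 * (A_PARAM - x1) - 4.0 * B_PARAM * x1 * (x2 - x1 * x1),
            2.0 * B_PARAM * (x2 - x1 * x1))


def hess(x1, x2):
    """Entries (h11, h12, h22) of the symmetric Hessian."""
    h11 = 2.0 - 4.0 * B_PARAM * (x2 - x1 * x1) + 8.0 * B_PARAM * x1 * x1
    return h11, -4.0 * B_PARAM * x1, 2.0 * B_PARAM


def simulate(x0, T, p=1.5, steps=50_000):
    """Forward-Euler integration of x' = -c ||g||^(p-2) H^{-1} g on [0, T]."""
    x1, x2 = x0
    g0 = math.hypot(*grad(x1, x2))
    c = g0 ** (2.0 - p) / ((2.0 - p) * T)  # gain chosen so the predicted settling time is T
    dt = T / steps
    grad_norms = [g0]
    for _ in range(steps):
        g1, g2 = grad(x1, x2)
        gnorm = math.hypot(g1, g2)
        if gnorm > 1e-12:  # the right-hand side is undefined at the stationary point
            h11, h12, h22 = hess(x1, x2)
            det = h11 * h22 - h12 * h12
            n1 = (h22 * g1 - h12 * g2) / det  # Newton direction H^{-1} g (Cramer's rule)
            n2 = (h11 * g2 - h12 * g1) / det
            # Discretization safeguard (not part of the flow): never exceed a full Newton step.
            step = min(dt * c * gnorm ** (p - 2.0), 1.0)
            x1 -= step * n1
            x2 -= step * n2
        grad_norms.append(math.hypot(*grad(x1, x2)))
    return (x1, x2), grad_norms


if __name__ == "__main__":
    x_final, norms = simulate((-1.2, 1.0), T=1.0)
    print(x_final, norms[len(norms) // 2] / norms[0])
```

Under these assumptions the gradient norm follows ||g(0)|| (1 - t/T)^(1/(2-p)), so it should have dropped to roughly a quarter of its initial value at t = T/2, and the iterate should end close to (1, 1) at t = T. Neither the cost nor the distance to the minimizer is tracked here, since only the gradient norm is forced to decrease.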
5 Conclusions and Future Work
We have introduced a new family of second-order flows for continuous-time optimization. The main characteristic of the proposed flows is their finite-time convergence guarantees. Furthermore, they are designed in such a way that the finite convergence time can be prescribed by the user. To be able to analyze these discontinuous flows, we resorted to establishing finite-time stability. In order to do this, we first extended an existing sufficient condition for finite-time stability through a Lyapunov-based inequality, in the case of smooth dynamics, to the case of non-smooth dynamics modeled by differential inclusions. One of the proposed families was tested on a well-known optimization benchmark – the Rosenbrock function.
While the obtained results are encouraging, it should be clear that currently available numerical solvers for our proposed flows do not translate into competitive iterative optimization algorithms. Deciding how to best discretize a given continuous-time optimization algorithm (i.e. as given by an ODE, PDE, or differential inclusion), or how to compare two such continuous-time algorithms in terms of the performance of corresponding iterative schemes suitable for digital computers, largely remains an open problem, but nonetheless a very active topic of research as of right now.
For the aforementioned reasons, future work will be dedicated to studying discretization of our proposed flows (and other finite-time stable flows) that will hopefully lead to either accelerated schemes when compared to currently available methods, or to a better understanding of the intrinsic boundaries of acceleration strategies. Furthermore, we will extend our results to constrained optimization; gradient-free optimization; time-varying optimization; and extremum-seeking control where the derivative of the cost function is estimated from direct measurements of the cost.
The majority of the research that led to this work was conducted when the first author was doing an internship at Mitsubishi Electric Research Laboratories (MERL) in the summer of 2019, under the supervision of the second author.
Appendix A Discontinuous Systems and Filippov Differential Inclusions
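Recall that for an initial value problem (IVP)
$$\dot{x}(t) = F(x(t)), \tag{7a}$$
$$x(0) = x_0, \tag{7b}$$
with $F: \mathbb{R}^n \to \mathbb{R}^n$, the typical way to check for existence of solutions is by establishing continuity of $F$. Likewise, to establish uniqueness of the solution, we typically seek Lipschitz continuity. When $F$ is discontinuous, but nonetheless Lebesgue measurable and locally essentially uniformly bounded, we may understand (7a) as the Filippov differential inclusion
$$\dot{x}(t) \in \mathcal{K}[F](x(t)),$$
where $\mathcal{K}[F]: \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ denotes the Filippov set-valued map given by
$$\mathcal{K}[F](x) \triangleq \bigcap_{\delta > 0} \bigcap_{\mu(S) = 0} \overline{\mathrm{co}}\, F(B_\delta(x) \setminus S),$$
where $B_\delta(x)$ is the open ball of radius $\delta$ centered at $x$, $\mu$ denotes the usual Lebesgue measure, and $\overline{\mathrm{co}}$ the convex closure, i.e. closure of the convex hull $\mathrm{co}$. For more details, see (Paden and Sastry, 1987), on which this section is based.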
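For scalar right-hand sides, the Filippov set-valued map reduces to an interval, which the following minimal sketch approximates by sampling. This is only a fixed-radius approximation of the definition (which intersects over all radii and all zero-measure sets) and is meant for piecewise-continuous scalar functions. The names `filippov_interval` and `sign` are illustrative.

```python
def filippov_interval(F, x, delta=1e-6, samples=1001):
    """Approximate K[F](x) for scalar F by [min, max] of F over a punctured neighborhood of x."""
    values = []
    for k in range(samples):
        y = x - delta + 2.0 * delta * k / (samples - 1)
        if y != x:  # discard the point itself, a crude stand-in for excluding zero-measure sets
            values.append(F(y))
    return min(values), max(values)


def sign(y):
    """Discontinuous scalar right-hand side: -1, 0 or +1."""
    return (y > 0) - (y < 0)


if __name__ == "__main__":
    print(filippov_interval(sign, 0.0))  # whole interval between the one-sided limits
    print(filippov_interval(sign, 1.0))  # singleton away from the discontinuity
```

At the discontinuity the map returns the full interval between the one-sided limits, regardless of the value assigned to the function at that single point, while it collapses to a singleton wherever the function is continuous.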
Assumption 2. $F$ is defined a.e. and is Lebesgue measurable in a non-empty open region $Q \subseteq \mathbb{R}^n$. Further, $F$ is locally essentially bounded, i.e., for every point $x \in Q$, $F$ is bounded a.e. in some bounded neighborhood of $x$.
Definition 1 (Filippov).
For a comprehensive overview of discontinuous systems, including sufficient conditions for existence (Proposition 3) and uniqueness (Propositions 4 and 5) of Filippov solutions, see (Cortés, 2008). In particular, it can be established that Filippov solutions to (7) exist, provided that Assumption 2 holds.
Proposition 1 (Paden and Sastry (1987), Theorem 1).
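For instance, for a gradient flow, we have $\mathcal{K}[-\nabla f](x) = \{-\nabla f(x)\}$ for every $x$, provided that $f$ is continuously differentiable. Furthermore, if $f$ is only locally Lipschitz continuous and regular (see Definition 2 of Appendix B), then $\mathcal{K}[-\nabla f](x) = -\partial f(x)$, where
$$\partial f(x) \triangleq \mathrm{co}\left\{ \lim_{k \to \infty} \nabla f(x_k) : x_k \to x,\ x_k \notin S \cup \Omega_f \right\}$$
denotes Clarke’s generalized gradient (Clarke, 1981) of $f$, with $S$ being any zero-measure set and $\Omega_f$ the zero-measure set over which $f$ is not differentiable.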
Appendix B Finite-Time Stability of Differential Inclusions
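Consider a general differential inclusion (see (Bacciotti and Ceragioli, 1999) for more details)
$$\dot{x}(t) \in \mathcal{F}(x(t)) \tag{12}$$
where $\mathcal{F}: \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is a set-valued map.
Assumption 3. $\mathcal{F}$ has nonempty, compact, and convex values, and is upper semi-continuous.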
Similarly to the previous case of Filippov solutions, we say that $x: [0, \tau) \to \mathbb{R}^n$ with $0 < \tau \le \infty$ is a Carathéodory solution to (12) if $x(\cdot)$ is absolutely continuous and (12) is satisfied a.e. in every compact subset of $[0, \tau)$. Furthermore, we say that $x(\cdot)$ is a maximal Carathéodory solution if no other Carathéodory solution $x'(\cdot)$ exists with $x = x'|_{[0, \tau)}$.
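We say that $x^\star$ is an equilibrium of (12) if $x(t) \equiv x^\star$ on some small enough non-degenerate interval is a Carathéodory solution to (12). In other words, if and only if $0 \in \mathcal{F}(x^\star)$. We say that (12) is (Lyapunov) stable at $x^\star$ if, for every $\varepsilon > 0$, there exists some $\delta > 0$ such that, for every maximal Carathéodory solution $x(\cdot)$ of (12), we have $\|x(0) - x^\star\| < \delta \implies \|x(t) - x^\star\| < \varepsilon$ for every $t$ in the interval where $x(\cdot)$ is defined. Note that, under Assumption 3, if (12) is stable at $x^\star$, then $x^\star$ is an equilibrium of (12) (Bacciotti and Ceragioli, 1999). Furthermore, we say that (12) is (locally and strongly) asymptotically stable at $x^\star$ if it is stable at $x^\star$ and there exists some $\delta > 0$ such that, for every maximal Carathéodory solution $x(\cdot)$ of (12), if $\|x(0) - x^\star\| < \delta$ then $x(t) \to x^\star$ as $t \to \infty$. Finally, (12) is (locally and strongly) finite-time stable at $x^\star$ if it is asymptotically stable and there exists some $\delta > 0$ and a settling-time function $T$ with finite values such that, for every maximal Carathéodory solution $x(\cdot)$ of (12) with $\|x(0) - x^\star\| < \delta$, we have $\lim_{t \to T(x(0))} x(t) = x^\star$.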
We will now construct a Lyapunov-based criterion adapted from the literature of finite-time stability of Lipschitz continuous systems. To do this, we first adapt Lemma 1 in (Benosman et al., 2009) for absolutely continuous functions and non-positive exponents.
Suppose that for every with . Let be the supremum of all such ’s, thus satisfying for every . We will now investigate . First, by continuity of , it follows that . Now, by rewriting
a.e. in , we can thus integrate to obtain
everywhere in , which in turn leads to
where the last inequality follows from for every . Taking the supremum in (18) then leads to the upper bound (14). Finally, we conclude that , since is impossible given that it would mean, due to continuity of , that there exists some such that for every , thus contradicting the construction of .
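The integration argument can be written out explicitly. The sketch below assumes the hypotheses take the form $\dot{E}(t) \le -c\,E(t)^{\alpha}$ a.e., with $c > 0$, $\alpha < 1$, and $E(t) > 0$ on $[0, t^\star)$. The symbols $E$, $c$, $\alpha$, and $t^\star$ are our notation for the quantities discussed above.

```latex
\begin{align*}
\frac{d}{dt}\left[\frac{E(t)^{1-\alpha}}{1-\alpha}\right]
  &= E(t)^{-\alpha}\,\dot{E}(t) \le -c
  && \text{a.e. in } [0, t^\star), \\
E(t)^{1-\alpha}
  &\le E(0)^{1-\alpha} - c\,(1-\alpha)\,t
  && \text{for every } t \in [0, t^\star), \\
t
  &\le \frac{E(0)^{1-\alpha} - E(t)^{1-\alpha}}{c\,(1-\alpha)}
   \le \frac{E(0)^{1-\alpha}}{c\,(1-\alpha)}
  && \text{since } E(t)^{1-\alpha} > 0 .
\end{align*}
```

Taking the supremum over $t \in [0, t^\star)$ gives $t^\star \le E(0)^{1-\alpha} / (c\,(1-\alpha))$. If the differential inequality holds with equality, then $E(t) = \left(E(0)^{1-\alpha} - c\,(1-\alpha)\,t\right)^{1/(1-\alpha)}$ on $[0, t^\star)$ and the bound is attained.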
In their Proposition 2.8, Cortés and Bullo (2005) proposed a Lyapunov-based criterion to establish finite-time stability of discontinuous systems, which fundamentally boils down to Lemma 1 with exponent . This proposition was in turn based on Theorem 2 of Paden and Sastry (1987). Cortés (2006) later proposed a second-order Lyapunov criterion, which fundamentally boils down to being strongly convex. Finally, Corollary 3.1 of Hui et al. (2009) generalized the aforementioned Proposition 2.8 of Cortés and Bullo (2005) to establish semistability. Indeed, these two results coincide for isolated equilibria.
We now present a novel result that generalizes the aforementioned first-order Lyapunov-based results, by exploiting our Lemma 1. More precisely, given a Lyapunov candidate function $V$, the objective is to set $E(t) \triangleq V(x(t))$ and check that the conditions of Lemma 1 hold. To do this, and assuming $V$ to be locally Lipschitz continuous, we first borrow and adapt from Bacciotti and Ceragioli (1999) the definition of set-valued time derivative of $V$ w.r.t. the differential inclusion (12), given by
$$\dot{V}(x) \triangleq \left\{ a \in \mathbb{R} : \exists\, v \in \mathcal{F}(x) \text{ s.t. } a = p \cdot v, \ \forall p \in \partial V(x) \right\}$$
for each $x$ in the domain of $V$. Notice that, under Assumption 3, for Filippov differential inclusions we have $\mathcal{F} = \mathcal{K}[F]$, and the set-valued time derivative of $V$ thus coincides with the set-valued Lie derivative $\tilde{\mathcal{L}}_F V$. Indeed, more generally $\dot{V}$ could be seen as a set-valued Lie derivative $\tilde{\mathcal{L}}_{\mathcal{F}} V$ w.r.t. the set-valued map $\mathcal{F}$.
Definition 2. We say that $V$ is regular if every directional derivative, given by
$$V'(x; v) \triangleq \lim_{h \to 0^+} \frac{V(x + h v) - V(x)}{h},$$
exists and is equal to
$$V^\circ(x; v) \triangleq \limsup_{y \to x,\ h \to 0^+} \frac{V(y + h v) - V(y)}{h},$$
known as Clarke’s upper generalized derivative (Clarke, 1981).
Assumption 4. $V$ is locally Lipschitz continuous and regular.
In practice, regularity is a fairly mild and easy to guarantee condition. For instance, it would suffice that $V$ is convex or continuously differentiable to ensure that it is Lipschitz and regular.
We are now equipped to formally establish the correspondence between the set-valued time-derivative of and the derivative of the energy function associated with an arbitrary Carathéodory solution to the differential inclusion (12).
Lemma 2 (Lemma 1 of Bacciotti and Ceragioli (1999)).
We are now ready to state and prove our Lyapunov-based sufficient condition for finite-time stability of differential inclusions.
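Theorem 2. Suppose that Assumptions 3 and 4 hold for some set-valued map $\mathcal{F}: \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ and some function $V: \mathcal{D} \to \mathbb{R}$, where $\mathcal{D} \subseteq \mathbb{R}^n$ is an open and positively invariant neighborhood of a point $x^\star$. Suppose that $V$ is positive definite w.r.t. $x^\star$ and that there exist constants $c > 0$ and $\alpha < 1$ such that
$$\sup \dot{V}(x) \le -c\,V(x)^{\alpha} \tag{22}$$
a.e. in $x \in \mathcal{D}$. Then, (12) is finite-time stable at $x^\star$, with settling time upper bounded by
$$t^\star \le \frac{V(x_0)^{1-\alpha}}{c\,(1-\alpha)} \tag{23}$$
where $x_0 = x(0)$. In particular, any Carathéodory solution $x(\cdot)$ with $x(0) = x_0 \in \mathcal{D}$ will converge in finite time to $x^\star$ under the upper bound (23). Furthermore, if $\mathcal{D} = \mathbb{R}^n$, then (12) is globally finite-time stable. Finally, if $\dot{V}(x)$ is a singleton a.e. in $x \in \mathcal{D}$ and (22) is exact, then so is (23).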
Note that, by Proposition 1 of Bacciotti and Ceragioli (1999), we know that (12) is Lyapunov stable at . All that remains to show is local convergence towards (which must be an equilibrium) in finite time. Indeed, given any maximal solution to (12) with , we know by Lemma 2, that is absolutely continuous with a.e. in . Therefore, we have
a.e. in . Since , given that , the result then follows by invoking Lemma 1 and noting that . ∎
Appendix C Proof of the main result: Theorem 1
Let us focus first on (3), since the proof for (4) follows similar steps. The idea is to show that it is finite-time stable at , with the inequality in Theorem 2 holding exactly for . First, notice that is continuous near (but not at) , and undefined at itself. Furthermore, we have
with and everywhere near for some (-strong convexity). Therefore, is Lebesgue integrable (and thus measurable) and locally essentially bounded, which means that Assumption 2 is satisfied.
Set , defined over . If is not positively invariant w.r.t. (3), we can always replace it by a smaller open subset that is, e.g. a sufficiently small strict sublevel set contained within . Clearly, is continuously differentiable, thus satisfying Assumption 4. Furthermore, it is positive definite w.r.t. and, given , we have
with . Furthermore, since . The result thus follows by invoking Theorem 2.
We now proceed to establish finite-time stability of (4) at . Like before, we notice that is continuous near, but not at, . Furthermore, notice that
for every , with . Since is a zero-measure set due to being a finite union of hypersurfaces in (), and also recalling that is strongly convex near and , it follows that is Lebesgue integrable and locally essentially bounded. Therefore, Assumption 2 is once again satisfied.
Now consider the candidate Lyapunov function , defined over . Clearly, is not continuously differentiable this time. However, it still satisfies Assumption 2 due to being a.e. differentiable. In particular, we have for every . In other words, we have a.e. in .
Given , we thus have
with . The result once again follows by invoking Theorem 2.
- At least locally Lipschitz continuous and regular (see Definition 2 of the supplementary material, Appendix A).
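- In other words, there exists some open neighborhood $\mathcal{D}$ of $x^\star$ such that $V$ is defined in $\mathcal{D}$ and satisfies $V(x^\star) = 0$ and $V(x) > 0$ for every $x \in \mathcal{D} \setminus \{x^\star\}$.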
- Right-hand side defined at least a.e., Lebesgue measurable, and locally essentially bounded.
- See supplementary material, Appendix A.
- Aquilanti et al. (2019).
- Bacciotti and Ceragioli (1999). Stability and stabilization of discontinuous systems and nonsmooth Lyapunov functions. ESAIM: Control, Optimisation and Calculus of Variations 4, pp. 361–376.
- Benosman et al. (2009). Nonlinear control allocation for non-minimum phase systems. IEEE Transactions on Control Systems Technology 17 (2), pp. 394–404.
- Botsaris (1978a). Differential gradient methods. Journal of Mathematical Analysis and Applications 63 (1), pp. 177–198.
- Botsaris (1978b). A class of methods for unconstrained minimization based on stable numerical integration techniques. Journal of Mathematical Analysis and Applications 63 (3), pp. 729–749.
- Brockett (1988). Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems. pp. 799–803.
- Brown (1989). Some effective methods for unconstrained optimization based on the solution of systems of ordinary differential equations. Journal of Optimization Theory and Applications 62 (2), pp. 211–224.
- Clarke (1981). Generalized gradients of Lipschitz functionals. Advances in Mathematics 40 (1), pp. 52–67.
- Cortés and Bullo (2005). Coordination and geometric optimization via distributed dynamical systems. SIAM Journal on Control and Optimization 44 (5), pp. 1543–1574.
- Cortés (2006). Finite-time convergent gradient flows with applications to network consensus. Automatica 42 (11), pp. 1993–2000.
- Cortés (2008). Discontinuous dynamical systems. IEEE Control Systems Magazine 28 (3), pp. 36–73.
- Filippov (1988). Differential equations with discontinuous righthand sides. Kluwer Academic Publishers Group, Dordrecht, Netherlands.
- França et al. (2018). ADMM and accelerated ADMM as continuous dynamical systems.
- França et al. (2019). A dynamical systems perspective on nonsmooth constrained optimization. arXiv:1808.04048 [math.OC].
- Helmke and Moore (1994). Optimization and dynamical systems. Springer-Verlag.
- Hui et al. (2009). Semistability, finite-time stability, differential inclusions, and discontinuous dynamical systems having a continuum of equilibria. IEEE Transactions on Automatic Control 54, pp. 2465–2470.
- Liu and Theodorou (2019). Deep learning theory review: an optimal control and dynamical systems perspective. arXiv:1908.10920.
- Paden and Sastry (1987). A calculus for computing Filippov’s differential inclusion with application to the variable structure control of robot manipulators. IEEE Transactions on Circuits and Systems 34, pp. 73–82.
- Pequito et al. (2011). Unsupervised learning of finite mixture models using mean field games. 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 321–328.
- Plumbley (1995). Lyapunov functions for convergence of principal component algorithms. Neural Networks 8 (1), pp. 11–23.
- Rahnama et al. (2019). Connecting Lyapunov control theory to adversarial attacks. arXiv:1907.07732.
- Romero et al. (2019). Convergence of the expectation-maximization algorithm through discrete-time Lyapunov stability theory. Proceedings of the American Control Conference (ACC), pp. 163–168.
- Schropp and Singer (2000). A dynamical systems approach to constrained minimization. Numerical Functional Analysis and Optimization 21, pp. 537–551.
- Schropp (1995). Using dynamical systems methods to solve minimization problems. Applied Numerical Mathematics 18 (1), pp. 321–335.
- Snyman (1982). A new and dynamic method for unconstrained minimization. Applied Mathematical Modelling 6 (6), pp. 448–462.
- Snyman (1983). An improved version of the original leap-frog dynamic method for unconstrained minimization: LFOP1(b). Applied Mathematical Modelling 7 (3), pp. 216–218.
- Su et al. (2014). A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. In Advances in Neural Information Processing Systems, pp. 2510–2518.
- Wang and Elia (2011). A control perspective for centralized and distributed convex optimization. In IEEE Conference on Decision and Control and European Control Conference, pp. 3800–3805.
- Zghier (1981). The use of differential equations in optimization. Ph.D. Thesis, Loughborough University.
- Zhu (2018). An optimal control view of adversarial machine learning. arXiv:1811.04422.