Finite-Time Convergence of Continuous-Time Optimization Algorithms via Differential Inclusions
Abstract
In this paper, we propose two discontinuous dynamical systems in continuous time with guaranteed prescribed finite-time local convergence to strict local minima of a given cost function. Our approach consists of exploiting a Lyapunov-based differential inequality for differential inclusions, which leads to finite-time stability and thus finite-time convergence with a provable bound on the settling time. In particular, for exact solutions to the aforementioned differential inequality, the settling-time bound is also exact, thus achieving prescribed finite-time convergence. We thus construct a class of discontinuous dynamical systems, of second order with respect to the cost function, that serve as continuous-time optimization algorithms with finite-time convergence and prescribed convergence time. Finally, we illustrate our results on the Rosenbrock function.
1 Introduction
In continuous-time optimization, an ordinary differential equation (ODE), partial differential equation (PDE), or differential inclusion is designed in terms of a given cost function, in such a way that the solutions converge (forward in time) to an optimal value of the cost function. To achieve this, tools from Lyapunov stability theory are often employed, mainly because a rich body of work already exists within the nonlinear systems and control theory community for this purpose. In particular, we seek asymptotically Lyapunov stable gradient-based systems with an equilibrium (stationary point) at an isolated extremum of the given cost function, thus certifying local convergence. Naturally, global asymptotic stability leads to global convergence, though such an analysis will typically require the cost function to be strongly convex everywhere.
For early work in this direction, see (Botsaris, 1978b, a), (Zghier, 1981), (Snyman, 1982, 1983), and (Brown, 1989). Brockett (1988) and, subsequently, Helmke and Moore (1994) studied relationships between linear programming, ODEs, and general matrix theory. Further, Schropp (1995) and Schropp and Singer (2000) explored several aspects linking nonlinear dynamical systems to gradient-based optimization, including nonlinear constraints. Cortés (2006) proposed two discontinuous normalized modifications of gradient flows to attain finite-time convergence. Later, Wang and Elia (2011) proposed a control-theoretic perspective on centralized and distributed convex optimization.
More recently, Su et al. (2014) derived a second-order ODE as the limit of Nesterov’s accelerated gradient method as the gradient step sizes vanish. This ODE is then used to study Nesterov’s scheme from a new perspective, particularly in a larger effort to better understand acceleration without substantially increasing computational burden. Expanding upon the aforementioned idea, França et al. (2018) derived a second-order ODE that models the continuous-time limit of the sequence of iterates generated by the alternating direction method of multipliers (ADMM). The authors then employ Lyapunov theory to analyze the stability of the dynamical systems at critical points and to obtain associated convergence rates.
Later, França et al. (2019) analyze general nonsmooth and linearly constrained optimization problems by deriving equivalent (in the limit) nonsmooth dynamical systems related to variants of the relaxed and accelerated ADMM, again employing Lyapunov theory to establish stability at critical points and associated convergence rates.
In the more traditional context of machine learning, only a few papers have adopted the approach of explicitly borrowing or connecting ideas from control and dynamical systems. For unsupervised learning, Plumbley (1995) proposes Lyapunov stability theory as an approach to establish convergence of principal component algorithms. Pequito et al. (2011) and Aquilanti et al. (2019) propose continuous-time generalized expectation-maximization (EM) algorithms, based on mean-field games, for clustering of finite mixture models. Romero et al. (2019) establish convergence of the EM algorithm, and a class of generalized EM algorithms denoted EM, via discrete-time Lyapunov stability theory. For supervised learning, Liu and Theodorou (2019) provide a review of deep learning from the perspective of control and dynamical systems, with a focus on optimal control. Zhu (2018) and Rahnama et al. (2019) explore connections between control theory and adversarial machine learning.
Statement of Contribution
In this work, we provide a Lyapunov-based tool to both check and construct continuous-time dynamical systems that are finite-time stable and thus lead to finite-time convergence of the candidate Lyapunov function (intended as a surrogate for a given cost function) to its minimum value. In particular, we first extend one of the existing Lyapunov-based inequality conditions for finite-time convergence of the usual Lipschitz continuous dynamical systems to the case of arbitrary differential inclusions. We then use this condition to construct a family of discontinuous, second-order flows, which guarantee local convergence to a local minimum in prescribed finite time. One of the proposed families of continuous-time optimization algorithms is tested on a well-known optimization test case, namely, the Rosenbrock function.
2 Finite-Time Convergence in Optimization via Finite-Time Stability
Consider some objective cost function that we wish to minimize. In particular, let be an arbitrary local minimum of that is unknown to us. In continuous-time optimization, we typically proceed by designing a nonlinear state-space dynamical system
(1) 
or a time-varying one replacing with , for which can be computed without explicit knowledge of and for which (1) is certifiably asymptotically stable at . Ideally, computing should be possible using only up to second-order information on .
In this work, however, we seek dynamical systems for which (1) is certifiably finite-time stable at . As will become clear later, such systems must be discontinuous or non-Lipschitz, and are therefore based on differential inclusions instead of ODEs. Our approach to achieve this objective is largely based on exploiting the Lyapunov-like differential inequality
(2) 
with constants and , for absolutely continuous functions such that . Indeed, under the aforementioned conditions, will be reached in finite time .
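Although the displayed inequality and its constants did not survive the extraction of this document, the classical form of such a Lyapunov differential inequality is dV/dt ≤ −c·V^α with c > 0 and α ∈ (0, 1), which yields the settling-time bound T* = V(0)^(1−α)/(c(1−α)). A minimal numerical sketch of this mechanism, with illustrative constants that are assumptions here rather than values from the paper:

```python
# Numerically check the settling time of dV/dt = -c * V**alpha,
# the equality case of the finite-time Lyapunov inequality.
c, alpha, v0 = 2.0, 0.5, 1.0      # illustrative constants, not from the paper
dt, t, v = 1e-5, 0.0, v0

while v > 0.0:
    v = max(v - dt * c * v**alpha, 0.0)  # forward Euler, clipped at zero
    t += dt

# Predicted settling time from the classical bound:
t_star = v0**(1 - alpha) / (c * (1 - alpha))
print(t, t_star)  # the observed and predicted times should nearly agree
```

For the equality case simulated above, the bound is tight: the trajectory reaches zero essentially at T*, illustrating why exact solutions of the inequality give an exact prescribed settling time.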
We now summarize the problem statement:
Problem 1.
Given a sufficiently smooth cost function with a sufficiently regular local minimizer , solve the following tasks:

Design a sufficiently smooth^{1} candidate Lyapunov function for which is defined and positive definite near and w.r.t.^{2} .
By following this strategy, we will therefore achieve (local and strong) finite-time stability, and thus finite-time convergence. Furthermore, if can be upper bounded, then can be readily tuned to achieve finite-time convergence under a prescribed range for the settling time, or even with exact prescribed settling time if can be explicitly computed and (2) holds exactly.
3 A Family of Finite-Time Stable, Second-Order Optimization Flows
We now propose a family of second-order optimization methods with finite-time convergence, constructed using two gradient-based Lyapunov functions, namely and . First, we need to assume sufficient smoothness of the cost function.
Assumption 1.
is twice continuously differentiable and strongly convex in an open neighborhood of a stationary point .
Since for and a.e. for , we can readily design Filippov differential inclusions that are finite-time stable at . In particular, we may design such differential inclusions to achieve an exact and prescribed finite settling time, at the trade-off of requiring second-order information on .
Given a symmetric and positive definite matrix with singular value decomposition , , , we define , where .
Theorem 1.
Let , , and . Under Assumption 1, any maximal Filippov solution to the discontinuous second-order generalized Newton-like flows
(3) 
and
(4) 
with sufficiently small (where ) will converge in finite time to . Furthermore, their convergence times are given exactly by
(5) 
for (3)–(4), respectively, where . In particular, given any compact and positively invariant subset , both flows converge in finite time with the aforementioned settling-time upper bounds (which can be tightened by replacing with ) for any . Furthermore, if , then we have global finite-time convergence, i.e., finite-time convergence of any maximal Filippov solution with arbitrary .
Proof.
See supplementary material, Appendix C. ∎
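Since the displayed equations for (3)–(4) were lost in extraction, the sketch below simulates a *hypothetical* member of the family described in the text (a Newton-like direction normalized by the gradient norm, using second-order information); it is an illustrative stand-in, not the paper's exact flow. For dx/dt = −k·H(x)⁻¹∇f(x)/‖∇f(x)‖, the gradient norm decays linearly along the flow and vanishes exactly at time T = ‖∇f(x₀)‖/k, so the settling time can be prescribed by choosing k:

```python
import numpy as np

# Hypothetical normalized Newton flow (assumed form, not the paper's):
#   dx/dt = -k * H(x)^{-1} grad f(x) / ||grad f(x)||
# On f(x) = 0.5 x^T A x, this makes ||grad f|| decay linearly at rate k,
# vanishing exactly at T = ||grad f(x0)|| / k.
A = np.diag([1.0, 10.0])            # strongly convex quadratic test problem
grad = lambda x: A @ x
hess_inv = np.linalg.inv(A)

x = np.array([2.0, -1.0])
k = 1.0
T_star = np.linalg.norm(grad(x)) / k  # predicted (prescribed) settling time

dt, t = 1e-4, 0.0
while np.linalg.norm(grad(x)) > 1e-3:
    g = grad(x)
    x = x - dt * k * (hess_inv @ g) / np.linalg.norm(g)
    t += dt

print(t, T_star)  # observed vs predicted settling time
```

On this quadratic, each forward-Euler step rescales the gradient by a scalar, so the discretization tracks the linear decay of the gradient norm essentially exactly, and the observed settling time matches the prediction.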
4 Numerical Experiment: Rosenbrock Function
We will now test one of our proposed flows on the Rosenbrock function , given by
(6) 
with parameters . This function is nonlinear and nonconvex, but smooth. It possesses exactly one stationary point for , which is a strict global minimum for . If , then is a saddle point. Finally, if , then are the stationary points of , and they are all non-strict global minima.
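For concreteness, the common parameterization of the Rosenbrock function is f(x, y) = (a − x)² + b·(y − x²)² (an assumption here, since the paper's display did not survive extraction). A quick sanity check of the stationary-point claims for b > 0:

```python
import numpy as np

# Rosenbrock in its common parameterization (assumed, since the paper's
# display was lost): f(x, y) = (a - x)^2 + b*(y - x^2)^2.
a, b = 1.0, 100.0

def grad(x, y):
    # [df/dx, df/dy]
    return np.array([-2*(a - x) - 4*b*x*(y - x**2), 2*b*(y - x**2)])

def hess(x, y):
    return np.array([[2 - 4*b*(y - 3*x**2), -4*b*x],
                     [-4*b*x,                2*b]])

# For b > 0, (a, a^2) is the unique stationary point and a strict minimum:
print(grad(a, a**2))                      # gradient vanishes
print(np.linalg.eigvalsh(hess(a, a**2)))  # Hessian eigenvalues are positive
```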
As we can see in Figure 1, this flow converges correctly to the minimum from all the tested initial conditions, with an exact prescribed settling time . It should be noted that, at any given point along the trajectory , the functions and are not guaranteed to decrease or remain constant; indeed, only can be guaranteed to do so, which explains the increase in Figure 1(d) that could never have occurred in Figure 1(e).
5 Conclusions and Future Work
We have introduced a new family of second-order flows for continuous-time optimization. The main characteristic of the proposed flows is their finite-time convergence guarantees. Furthermore, they are designed in such a way that the finite convergence time can be prescribed by the user. To analyze these discontinuous flows, we resorted to establishing finite-time stability. To do so, we first extended an existing sufficient condition for finite-time stability through a Lyapunov-based inequality, from the case of smooth dynamics to the case of nonsmooth dynamics modeled by differential inclusions. One of the proposed families was tested on a well-known optimization benchmark, the Rosenbrock function.
While the obtained results are encouraging, it should be clear that currently available numerical solvers for our proposed flows do not translate into competitive iterative optimization algorithms. Deciding how to best discretize a given continuous-time optimization algorithm (i.e., one given by an ODE, PDE, or differential inclusion), or how to compare two such continuous-time algorithms in terms of the performance of corresponding iterative schemes suitable for digital computers, largely remains an open problem, though it is currently a very active topic of research.
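As a toy illustration of why discretization is delicate, consider fixed-step forward Euler applied to the discontinuous flow dx/dt = −sign(x), which converges in finite time in continuous time. The discretized iterates never settle; they chatter around the origin with amplitude on the order of the step size:

```python
# Fixed-step forward Euler on dx/dt = -sign(x): instead of settling at 0,
# the iterates overshoot the origin and chatter with amplitude ~ dt.
dt, x = 0.1, 1.05   # x0 chosen off the step grid so overshoot is guaranteed
tail = []
for _ in range(30):
    step = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)  # sign(x)
    x -= dt * step
    tail.append(x)

print(tail[-4:])  # late iterates keep bouncing across the origin
```

The continuous-time flow reaches 0 at t = 1.05 and stays there, but the naive discretization oscillates forever; more careful (e.g., adaptive or implicit) schemes are needed to recover the finite-time behavior.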
For the aforementioned reasons, future work will be dedicated to studying discretizations of our proposed flows (and other finite-time stable flows) that will hopefully lead either to accelerated schemes when compared to currently available methods, or to a better understanding of the intrinsic limits of acceleration strategies. Furthermore, we will extend our results to constrained optimization; gradient-free optimization; time-varying optimization; and extremum-seeking control, where the derivative of the cost function is estimated from direct measurements of the cost.
Acknowledgments
The majority of the research that led to this work was conducted while the first author was an intern at Mitsubishi Electric Research Laboratories (MERL) in the summer of 2019, under the supervision of the second author.
Supplementary Material
Appendix A Discontinuous Systems and Filippov Differential Inclusions
Recall that for an initial value problem (IVP)
(7a)  
(7b) 
with , the typical way to check for existence of solutions is by establishing continuity of . Likewise, to establish uniqueness of solutions, we typically seek Lipschitz continuity. When is discontinuous, but nonetheless Lebesgue measurable and locally essentially bounded, we may understand (7a) as the Filippov differential inclusion
(8) 
where denotes the Filippov set-valued map given by
(9) 
where denotes the usual Lebesgue measure and the convex closure, i.e., the closure of the convex hull . For more details, see Paden and Sastry (1987), on which this section is based.
Assumption 2.
is defined a.e. and is Lebesgue measurable in a nonempty open region . Further, is locally essentially bounded, i.e., for every point , is bounded a.e. in some bounded neighborhood of .
Definition 1 (Filippov).
For a comprehensive overview of discontinuous systems, including sufficient conditions for existence (Proposition 3) and uniqueness (Propositions 4 and 5) of Filippov solutions, see (Cortés, 2008). In particular, it can be established that Filippov solutions to (7) exist, provided that Assumption 2 holds.
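As a concrete example, consider the scalar discontinuous system dx/dt = −sign(x). Its Filippov set-valued map at the discontinuity is F(0) = [−1, 1], which contains 0, so the origin is an equilibrium; the unique Filippov solution x(t) = sign(x₀)·max(|x₀| − t, 0) reaches it in the finite time |x₀| and then slides. A small sketch comparing a clamped Euler scheme against this closed-form solution:

```python
def filippov_solution(x0, t):
    # Closed-form unique Filippov solution of dx/dt = -sign(x).
    return (1.0 if x0 >= 0 else -1.0) * max(abs(x0) - t, 0.0)

x0 = 1.5
x, dt, t = x0, 1e-4, 0.0
traj_err = 0.0
while t < 2.0:
    step = -dt if x > 0 else (dt if x < 0 else 0.0)
    # Clamp so a step never overshoots the origin (a crude regularization
    # of the sliding mode that avoids chattering):
    step = max(min(step, abs(x)), -abs(x))
    x += step
    t += dt
    traj_err = max(traj_err, abs(x - filippov_solution(x0, t)))

print(x, traj_err)  # x has settled at 0 after t = |x0| = 1.5
```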
Proposition 1 (Paden and Sastry (1987), Theorem 1).
For instance, for a gradient flow, we have for every , provided that is continuously differentiable. Furthermore, if is only locally Lipschitz continuous and regular (see Definition 2 of Appendix B), then , where
(11) 
denotes Clarke’s generalized gradient (Clarke, 1981) of , with being any zero-measure set and the zero-measure set over which is not differentiable.
Appendix B FiniteTime Stability of Differential Inclusions
Consider a general differential inclusion (see (Bacciotti and Ceragioli, 1999) for more details)
(12) 
where is a set-valued map.
Assumption 3.
has nonempty, compact, and convex values, and is upper semicontinuous.
Filippov and Arscott (1988) proved that, under Assumption 2, the Filippov set-valued map satisfies the conditions of Assumption 3.
Similarly to the previous case of Filippov solutions, we say that with is a Carathéodory solution to (12) if is absolutely continuous and (12) is satisfied a.e. in every compact subset of . Furthermore, we say that is a maximal Carathéodory solution if no other Carathéodory solution exists with .
We say that is an equilibrium of (12) if, on some small enough nondegenerate interval, is a Carathéodory solution to (12). In other words, if and only if . We say that (12) is (Lyapunov) stable at if, for every , there exists some such that, for every maximal Carathéodory solution of (12), we have for every in the interval where is defined. Note that, under Assumption 3, if (12) is stable at , then is an equilibrium of (12) (Bacciotti and Ceragioli, 1999). Furthermore, we say that (12) is (locally and strongly) asymptotically stable at if is stable at and there exists some such that, for every maximal Carathéodory solution of (12), if then as . Finally, (12) is (locally and strongly) finite-time stable at if it is asymptotically stable and there exist some and such that, for every maximal Carathéodory solution of (12) with , we have .
We will now construct a Lyapunov-based criterion adapted from the literature on finite-time stability of Lipschitz continuous systems. To do this, we first adapt Lemma 1 of Benosman et al. (2009) to absolutely continuous functions and nonpositive exponents.
Lemma 1.
Proof.
Suppose that for every with . Let be the supremum of all such ’s, thus satisfying for every . We will now investigate . First, by continuity of , it follows that . Now, by rewriting
(15) 
a.e. in , we can thus integrate to obtain
(16) 
everywhere in , which in turn leads to
(17) 
and
(18) 
where the last inequality follows from for every . Taking the supremum in (18) then leads to the upper bound (14). Finally, we conclude that , since is impossible given that it would mean, due to continuity of , that there exists some such that for every , thus contradicting the construction of . ∎
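Since the displayed relations in this proof did not survive extraction, the standard comparison computation behind this integration argument, written with generic symbols (our notation, not necessarily the paper's), is:

```latex
\begin{align*}
\dot V(t) &\le -c\, V(t)^{\alpha}, && c > 0,\ \alpha \in (0,1), \\
\frac{d}{dt}\, V(t)^{1-\alpha}
  &= (1-\alpha)\, V(t)^{-\alpha}\, \dot V(t) \le -c\,(1-\alpha)
  && \text{a.e.\ where } V(t) > 0, \\
V(t)^{1-\alpha} &\le V(0)^{1-\alpha} - c\,(1-\alpha)\, t
  && \text{(integrating on } [0,t]\text{)}, \\
V(t) &= 0 \quad \text{for all } t \ge T^{*}
  := \frac{V(0)^{1-\alpha}}{c\,(1-\alpha)}.
\end{align*}
```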
In their Proposition 2.8, Cortés and Bullo (2005) proposed a Lyapunov-based criterion to establish finite-time stability of discontinuous systems, which fundamentally boils down to Lemma 1 with exponent . This proposition was in turn based on Theorem 2 of Paden and Sastry (1987). Cortés (2006) later proposed a second-order Lyapunov criterion, which fundamentally boils down to being strongly convex. Finally, Corollary 3.1 of Hui et al. (2009) generalized the aforementioned Proposition 2.8 of Cortés and Bullo (2005) to establish semistability. Indeed, these two results coincide for isolated equilibria.
We now present a novel result that generalizes the aforementioned first-order Lyapunov-based results, by exploiting our Lemma 1. More precisely, given a Lyapunov candidate function , the objective is to set and check that the conditions of Lemma 1 hold. To do this, and assuming to be locally Lipschitz continuous, we first borrow and adapt from Bacciotti and Ceragioli (1999) the definition of the set-valued time derivative of w.r.t. the differential inclusion (12), given by
(19) 
for each . Notice that, under Assumption 3, for Filippov differential inclusions we have , and the set-valued time derivative of thus coincides with the set-valued Lie derivative . Indeed, more generally, could be seen as a set-valued Lie derivative w.r.t. the set-valued map .
Definition 2.
We say that is regular if every directional derivative, given by
(20) 
exists and is equal to
(21) 
known as Clarke’s upper generalized derivative (Clarke, 1981).
Assumption 4.
is locally Lipschitz continuous and regular.
In practice, regularity is a fairly mild condition that is easy to guarantee. For instance, it suffices that is convex or continuously differentiable to ensure that it is Lipschitz and regular.
We are now equipped to formally establish the correspondence between the set-valued time derivative of and the derivative of the energy function associated with an arbitrary Carathéodory solution to the differential inclusion (12).
Lemma 2 (Lemma 1 of Bacciotti and Ceragioli (1999)).
We are now ready to state and prove our Lyapunov-based sufficient condition for finite-time stability of differential inclusions.
Theorem 2.
Suppose that Assumptions 3 and 4 hold for some setvalued map and some function , where is an open and positively invariant neighborhood of a point . Suppose that is positive definite w.r.t. and that there exist constants and such that
(22) 
a.e. in . Then, (12) is finite-time stable at , with settling time upper bounded by
(23) 
where . In particular, any Carathéodory solution with will converge in finite time to under the upper bound (23). Furthermore, if , then (12) is globally finite-time stable. Finally, if is a singleton a.e. in and (22) is exact, then so is (23).
Proof.
Note that, by Proposition 1 of Bacciotti and Ceragioli (1999), we know that (12) is Lyapunov stable at . All that remains is to show local convergence towards (which must be an equilibrium) in finite time. Indeed, given any maximal solution to (12) with , we know by Lemma 2 that is absolutely continuous with a.e. in . Therefore, we have
(24) 
a.e. in . Since , given that , the result then follows by invoking Lemma 1 and noting that . ∎
Appendix C Proof of the main result: Theorem 1
Let us focus first on (3), since the proof for (4) follows similar steps. The idea is to show that it is finite-time stable at , with the inequality in Theorem 2 holding exactly for . First, notice that is continuous near (but not at) , and undefined at itself. Furthermore, we have
(25a)  
(25b)  
(25c) 
with and everywhere near for some (strong convexity). Therefore, is Lebesgue integrable (and thus measurable) and locally essentially bounded, which means that Assumption 2 is satisfied.
Set , defined over . If is not positively invariant w.r.t. (3), we can always replace it by a smaller open subset that is, e.g. a sufficiently small strict sublevel set contained within . Clearly, is continuously differentiable, thus satisfying Assumption 4. Furthermore, it is positive definite w.r.t. and, given , we have
(26a)  
(26b)  
(26c)  
(26d) 
with . Furthermore, since . The result thus follows by invoking Theorem 2.
We now proceed to establish finite-time stability of (4) at . As before, we notice that is continuous near, but not at, . Furthermore, notice that
(27a)  
(27b)  
(27c)  
(27d)  
(27e) 
for every , with . Since is a zero-measure set, due to being a finite union of hypersurfaces in (), and also recalling that is strongly convex near and , it follows that is Lebesgue integrable and locally essentially bounded. Therefore, Assumption 2 is once again satisfied.
Now consider the candidate Lyapunov function , defined over . Clearly, is not continuously differentiable this time. However, it still satisfies Assumption 4, due to being a.e. differentiable. In particular, we have for every . In other words, we have a.e. in .
Given , we thus have
(28a)  
(28b)  
(28c)  
(28d) 
with . The result once again follows by invoking Theorem 2.
Footnotes
At least locally Lipschitz continuous and regular (see Definition 2 of the supplementary material, Appendix B).
 In other words, there exists some open neighborhood of such that is defined in and satisfies and for every .
Right-hand side defined at least a.e., Lebesgue measurable, and locally essentially bounded.
 See supplementary material, Appendix A.
References
 Stability and stabilization of discontinuous systems and nonsmooth Lyapunov functions. ESAIM: Control, Optimisation and Calculus of Variations 4, pp. 361–376.
 Nonlinear control allocation for non-minimum phase systems. IEEE Transactions on Control Systems Technology 17 (2), pp. 394–404.
 Differential gradient methods. Journal of Mathematical Analysis and Applications 63 (1), pp. 177–198.
 A class of methods for unconstrained minimization based on stable numerical integration techniques. Journal of Mathematical Analysis and Applications 63 (3), pp. 729–749.
 Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems. pp. 799–803.
 Some effective methods for unconstrained optimization based on the solution of systems of ordinary differential equations. Journal of Optimization Theory and Applications 62 (2), pp. 211–224.
 Generalized gradients of Lipschitz functionals. Advances in Mathematics 40 (1), pp. 52–67.
 Coordination and geometric optimization via distributed dynamical systems. SIAM Journal on Control and Optimization 44 (5), pp. 1543–1574.
 Finite-time convergent gradient flows with applications to network consensus. Automatica 42 (11), pp. 1993–2000.
 Discontinuous dynamical systems. IEEE Control Systems Magazine 28 (3), pp. 36–73.
 Differential equations with discontinuous right-hand sides. Kluwer Academic Publishers Group, Dordrecht, Netherlands.
 ADMM and accelerated ADMM as continuous dynamical systems.
 A dynamical systems perspective on nonsmooth constrained optimization. arXiv:1808.04048 [math.OC].
 Optimization and dynamical systems. Springer-Verlag.
 Semistability, finite-time stability, differential inclusions, and discontinuous dynamical systems having a continuum of equilibria. IEEE Transactions on Automatic Control 54, pp. 2465–2470.
 Deep learning theory review: an optimal control and dynamical systems perspective. arXiv:1908.10920.
 A calculus for computing Filippov's differential inclusion with application to the variable structure control of robot manipulators. IEEE Transactions on Circuits and Systems 34, pp. 73–82.
 Unsupervised learning of finite mixture models using mean field games. 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 321–328.
 Lyapunov functions for convergence of principal component algorithms. Neural Networks 8 (1), pp. 11–23.
 Connecting Lyapunov control theory to adversarial attacks. arXiv:1907.07732.
 Convergence of the expectation-maximization algorithm through discrete-time Lyapunov stability theory. Proceedings of the American Control Conference (ACC), pp. 163–168.
 A dynamical systems approach to constrained minimization. Numerical Functional Analysis and Optimization 21, pp. 537–551.
 Using dynamical systems methods to solve minimization problems. Applied Numerical Mathematics 18 (1), pp. 321–335.
 A new and dynamic method for unconstrained minimization. Applied Mathematical Modelling 6 (6), pp. 448–462.
 An improved version of the original leapfrog dynamic method for unconstrained minimization: LFOP1(b). Applied Mathematical Modelling 7 (3), pp. 216–218.
 A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. In Advances in Neural Information Processing Systems, pp. 2510–2518.
 A control perspective for centralized and distributed convex optimization. In IEEE Conference on Decision and Control and European Control Conference, pp. 3800–3805.
 The use of differential equations in optimization. Ph.D. Thesis, Loughborough University.
 An optimal control view of adversarial machine learning. arXiv:1811.04422.