Fast convex optimization via inertial dynamics with Hessian driven damping
We first study the fast minimization properties of the trajectories of the second-order evolution equation
\[
\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \beta\,\nabla^{2}\Phi(x(t))\,\dot{x}(t) + \nabla\Phi(x(t)) = 0,
\]
where Φ is a smooth convex function acting on a real Hilbert space H, and α, β are positive parameters. This inertial system combines an isotropic viscous damping which vanishes asymptotically with a geometrical Hessian-driven damping, which makes it naturally related to Newton's and Levenberg–Marquardt methods. For α ≥ 3 and β > 0, along any trajectory, fast convergence of the values,
\[
\Phi(x(t)) - \min_{H}\Phi = \mathcal{O}\left(t^{-2}\right),
\]
is obtained, together with rapid convergence of the gradients ∇Φ(x(t)) to zero. For α > 3, just assuming that argmin Φ ≠ ∅, we show that any trajectory converges weakly to a minimizer of Φ, and that Φ(x(t)) − min Φ = o(t⁻²). Strong convergence is established in various practical situations. In particular, in the strongly convex case, we obtain an even faster speed of convergence, which can be arbitrarily fast depending on the choice of α. More precisely, we have Φ(x(t)) − min Φ = O(t^(−2α/3)). Then, we extend the results to the case of a general proper lower-semicontinuous convex function Φ : H → ℝ ∪ {+∞}. This is based on the crucial property that the inertial dynamic with Hessian-driven damping can be equivalently written as a first-order system in time and space, which allows us to extend it by simply replacing the gradient with the subdifferential. By explicit-implicit time discretization, this opens a gate to new, possibly more rapid, inertial algorithms, expanding the field of FISTA methods for convex structured optimization problems.
Key words and phrases: Convex optimization, fast convergent methods, dynamical systems, gradient flows, inertial dynamics, vanishing viscosity, Hessian-driven damping, non-smooth potential, forward-backward algorithms, FISTA.
Throughout the paper, H is a real Hilbert space endowed with the scalar product ⟨·, ·⟩ and the norm ‖x‖ = √⟨x, x⟩ for x ∈ H. Let Φ : H → ℝ be a twice continuously differentiable convex function (the case of a nonsmooth function will be considered later on). In view of the minimization of Φ, we study the asymptotic behaviour (as t → +∞) of the trajectories of the second-order differential equation
\[
\text{(DIN-AVD)} \qquad \ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \beta\,\nabla^{2}\Phi(x(t))\,\dot{x}(t) + \nabla\Phi(x(t)) = 0, \tag{1}
\]
where α and β are positive parameters.
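Although the analysis below is carried out in a general Hilbert space, the behaviour established in this paper is easy to observe numerically in finite dimension. The following sketch (ours, not part of the original analysis) integrates (DIN-AVD) for a quadratic potential on ℝ² with a semi-implicit Euler scheme; the matrix A, the initial data, and the parameter values α = 3.1, β = 1 are purely illustrative assumptions. It checks that t²(Φ(x(t)) − min Φ) remains bounded along the trajectory, in line with the O(1/t²) rate.

```python
import numpy as np

# Quadratic test potential Phi(x) = 0.5 * x^T A x, so min Phi = 0,
# and the Hessian is the constant matrix A (illustrative choice).
A = np.array([[1.0, 0.0],
              [0.0, 10.0]])

def grad(x):
    return A @ x

alpha, beta = 3.1, 1.0       # damping parameters (illustrative values)
t0, T, dt = 1.0, 100.0, 1e-3

x = np.array([1.0, 2.0])     # x(t0)
v = np.zeros(2)              # x'(t0)

t = t0
while t < T:
    # (DIN-AVD): x'' = -(alpha/t) x' - beta * Hess(x) x' - grad(x);
    # for a quadratic potential, Hess(x) x' is simply A @ v.
    a = -(alpha / t) * v - beta * (A @ v) - grad(x)
    v += dt * a              # semi-implicit (symplectic) Euler step
    x += dt * v
    t += dt

phi_T = 0.5 * x @ A @ x
print(f"Phi(x(T)) = {phi_T:.3e},  T^2 * Phi(x(T)) = {T * T * phi_T:.3e}")
```

With the Hessian-driven term present, the oscillations of the values are strongly attenuated, and the quantity t²·Φ(x(t)) stays bounded over the whole integration interval.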
This inertial system combines two types of damping:
In the first place, the term (α/t) ẋ(t) furnishes an isotropic linear damping with a viscous parameter α/t which vanishes asymptotically, but not too slowly. The asymptotic behavior of the inertial gradient-like system
\[
\ddot{x}(t) + a(t)\,\dot{x}(t) + \nabla\Phi(x(t)) = 0, \tag{2}
\]
with Asymptotic Vanishing Damping ((AVD) for short), has been studied by Cabot, Engler and Gadat. They proved that, under moderate decrease of a(t) to zero, namely lim_{t→+∞} a(t) = 0 and ∫^{+∞} a(t) dt = +∞, every solution x(·) of (2) satisfies lim_{t→+∞} Φ(x(t)) = inf Φ.
Interestingly, with the specific choice a(t) = α/t, (2) becomes
\[
\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \nabla\Phi(x(t)) = 0. \tag{3}
\]
Su, Boyd and Candès proved the fast convergence property
\[
\Phi(x(t)) - \min_{H}\Phi = \mathcal{O}\left(t^{-2}\right), \tag{4}
\]
provided α ≥ 3. In the same article, the authors show that, for α = 3, (3) can be seen as a continuous-time version of the fast convergent method of Nesterov. Attouch, Peypouquet and Redont showed that, for α > 3, each trajectory of (3) converges weakly to an element of argmin Φ. This result is a continuous-time counterpart to the Chambolle–Dossal algorithm, which is a modified Nesterov algorithm specially designed to obtain the convergence of the iterates.
In the second place, a geometrical damping, attached to the term β ∇²Φ(x(t)) ẋ(t), has a natural link with Newton's method. It gives rise to the so-called Dynamical Inertial Newton system ((DIN) for short)
\[
\ddot{x}(t) + \alpha\,\dot{x}(t) + \beta\,\nabla^{2}\Phi(x(t))\,\dot{x}(t) + \nabla\Phi(x(t)) = 0, \tag{5}
\]
which has been introduced by Alvarez, Attouch, Bolte and Redont (here α is a fixed positive parameter). Interestingly, (5) can be equivalently written as a first-order system involving only the gradient of Φ, which allows its extension to the case of a proper lower-semicontinuous convex function Φ : H → ℝ ∪ {+∞}. This led to applications ranging from optimization algorithms to unilateral mechanics and partial differential equations.
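For the reader's convenience, here is a sketch of one such first-order reformulation of (5) with fixed α, β > 0; the auxiliary variable y and the coefficients below are our reconstruction, not a formula quoted from the references.

```latex
% Sketch (our reconstruction): a first-order form of (5), beta > 0 fixed.
% The auxiliary state y and the coefficients are assumptions of this note.
\[
\left\{
\begin{aligned}
&\dot{x}(t) + \beta\,\nabla\Phi(x(t))
   + \Bigl(\alpha - \tfrac{1}{\beta}\Bigr)\,x(t) + \tfrac{1}{\beta}\,y(t) = 0,\\
&\dot{y}(t)
   + \Bigl(\alpha - \tfrac{1}{\beta}\Bigr)\,x(t) + \tfrac{1}{\beta}\,y(t) = 0.
\end{aligned}
\right.
\]
```

Indeed, subtracting the second equation from the first gives ẏ = ẋ + β ∇Φ(x); differentiating the first equation and substituting then recovers (5). Since only ∇Φ, and not ∇²Φ, appears in the system, the gradient can be replaced by the subdifferential ∂Φ when Φ is merely proper, lower-semicontinuous and convex.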
Assuming α ≥ 3 and argmin Φ ≠ ∅, we show the fast convergence property of the values (4), together with the fast convergence to zero of the gradients:
\[
\int_{t_0}^{+\infty} t^{2}\,\|\nabla\Phi(x(t))\|^{2}\,dt < +\infty.
\]
For α > 3, we complete these results by showing that every trajectory converges weakly, with its limit belonging to argmin Φ. Moreover, we obtain the faster order of convergence Φ(x(t)) − min Φ = o(t⁻²).
Also for α > 3, strong convergence is established in various practical situations. In particular, in the strongly convex case, we obtain an even faster speed of convergence, which can be arbitrarily fast according to the choice of α. More precisely, we have Φ(x(t)) − min Φ = O(t^(−2α/3)).
A remarkable property of the system (DIN-AVD) is that these results can be naturally generalized to the non-smooth convex case. The key argument is that it can be reformulated as a first-order system (both in time and space) involving only the gradient and not the Hessian!
Time discretization of (DIN-AVD) provides new ideas for the design of innovative fast-converging algorithms, expanding the field of rapid methods for structured convex minimization of Nesterov [29, 30, 31, 32], Beck–Teboulle, and Chambolle–Dossal. This study, however, goes beyond the scope of this paper, and will be carried out in future research. As briefly evoked above, the continuous (DIN-AVD) system is also linked to the modeling of non-elastic shocks in unilateral mechanics, and to the geometric damping of nonlinear oscillators. These are important areas for applications, which are not considered in this paper.
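To fix ideas, here is one natural explicit (forward) discretization of (DIN-AVD) in the smooth case. It is only a sketch of ours, not the algorithm announced by the authors: the step size s, the extrapolation coefficient 1 − α/k, and the gradient-correction coefficient β√s (a finite-difference surrogate for the Hessian-driven term) are illustrative assumptions.

```python
import numpy as np

def din_avd_step(f_grad, x_prev, x_curr, k, alpha=3.1, beta=1.0, s=0.05):
    """One step of an inertial gradient method with a gradient-correction
    term (illustrative discretization of (DIN-AVD), not the authors' scheme).

    The finite difference grad f(x_k) - grad f(x_{k-1}) plays the role of
    the continuous Hessian-driven term beta * Hess f(x(t)) x'(t)."""
    momentum = (1.0 - alpha / k) * (x_curr - x_prev)
    correction = beta * np.sqrt(s) * (f_grad(x_curr) - f_grad(x_prev))
    y = x_curr + momentum - correction
    return y - s * f_grad(y)   # forward (explicit) gradient step at y

# Illustration on a simple quadratic f(x) = 0.5 x^T A x (min value 0).
A = np.diag([1.0, 10.0])
f_grad = lambda x: A @ x

x_prev = np.array([1.0, 2.0])
x_curr = x_prev.copy()
for k in range(1, 2001):
    x_prev, x_curr = x_curr, din_avd_step(f_grad, x_prev, x_curr, k)

f_final = 0.5 * x_curr @ A @ x_curr
print(f"f(x_k) after 2000 iterations: {f_final:.3e}")
```

For a structured objective f = g + h, with g smooth and h proper lower-semicontinuous convex, the last (explicit) gradient step would be replaced by an implicit proximal step on h, in the spirit of FISTA; this is the explicit-implicit discretization alluded to above.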
1. Smooth potential
The following minimal hypotheses are in force in this section, and are always tacitly assumed:
Φ : H → ℝ is a twice continuously differentiable convex function; and
α > 0, β > 0, t₀ > 0.¹

¹Taking t₀ > 0 comes from the singularity of the damping coefficient α/t at zero. Since we are only concerned about the asymptotic behaviour of the trajectories, the time origin is unimportant. If one insists on starting from t₀ = 0, all the results remain valid with the obvious modification of the damping coefficient near the origin.
In view of minimizing Φ, we study the asymptotic behaviour, as t → +∞, of a solution x(·) to the (DIN-AVD) second-order evolution equation (1). We will successively examine the following points:
existence and uniqueness of a solution to (DIN-AVD) with Cauchy data x(t₀) = x₀ and ẋ(t₀) = v₀;
minimizing properties of the trajectories and convergence of Φ(x(t)) towards min Φ whenever α ≥ 3;
fast convergence of Φ(x(t)) towards min Φ, when the latter is attained and α ≥ 3;
weak convergence of x(t) towards a minimizer of Φ and faster convergence of the values, when α > 3;
some cases of strong convergence of x(t), and faster convergence of the values.
1.1. Existence and uniqueness of solutions
For any Cauchy data (x₀, v₀) ∈ H × H, (DIN-AVD) admits a unique twice continuously differentiable global solution x : [t₀, +∞) → H verifying x(t₀) = x₀ and ẋ(t₀) = v₀.
1.2. Lyapunov analysis and minimizing properties of the solutions for α ≥ 3
In this section, we present a family of Lyapunov functions for (DIN-AVD), and use them to derive the main properties of the solutions to this system. As we shall see, the fact that we have more than one (essentially different) such function will play a crucial role in establishing that the gradient ∇Φ(x(t)) vanishes as t → +∞.
Let x satisfy (DIN-AVD) with Cauchy data x(t₀) = x₀ and ẋ(t₀) = v₀, and let λ ≥ 0. Define the corresponding energy function by
Observe that, for λ = 0, we obtain
\[
\tfrac12\,\|\dot{x}(t)\|^{2} + \Phi(x(t)),
\]
which is the usual global mechanical energy of the system. We shall see that, for each admissible λ, this energy is a strict Lyapunov function for (DIN-AVD).
In order to simplify the notation, write
so that and, for each ,
Using (9) and (DIN-AVD), elementary computations yield
We have the following:
Let α ≥ 3, and suppose x : [t₀, +∞) → H is a solution of (DIN-AVD). Then, for each admissible λ and each t ≥ t₀, we have
Let α ≥ 3, and suppose x : [t₀, +∞) → H is a solution of (DIN-AVD). Then
Since we are interested in asymptotic properties of , we can assume throughout the proof. Take , so that the last term in the definition (7) of vanishes. Given , we define by
By the Chain Rule, we have
and observe that
Next, since , we can write
Dividing by and rearranging the terms, we have
Since , and are bounded from below, we can integrate this inequality from to , and use Lemma 7.3 to obtain such that
Since is nonincreasing, we have
since is nonincreasing and .
for appropriate constants .
Now, take such that for all , and integrate from to to obtain
Since is nonnegative, this implies
for some other constants . As , we obtain (the limit is in ). Since is arbitrary, and for all , the result follows. ∎
By the weak lower-semicontinuity of Φ, Theorem 1.3 immediately yields the following:
Let α ≥ 3, and suppose x is a solution of (DIN-AVD). As t → +∞, every sequential weak cluster point of x(t) belongs to argmin Φ. In particular, if ‖x(t)‖ does not tend to +∞ as t → +∞, then argmin Φ ≠ ∅.
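The first assertion follows from the standard lower-semicontinuity argument, which can be sketched as follows (take t_n → +∞ with x(t_n) converging weakly to x̄):

```latex
% Sketch: weak cluster points are minimizers (standard argument).
\[
\Phi(\bar{x}) \;\le\; \liminf_{n\to\infty} \Phi\bigl(x(t_n)\bigr)
\;=\; \inf_{H}\Phi ,
\]
```

where the inequality is the weak lower-semicontinuity of the convex continuous function Φ, and the equality uses the minimizing property of Theorem 1.3; hence x̄ ∈ argmin Φ. If ‖x(t)‖ does not tend to +∞, such a bounded sequence, and therefore such a weak cluster point, exists.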
If the function Φ is bounded from below, we have the following stability result:
Let α ≥ 3, and suppose x is a solution of (DIN-AVD). If inf Φ > −∞, then
Let α ≥ 3, and suppose x is a solution of (DIN-AVD). If inf Φ > −∞, then
, and ;
For i), observe that . Next, use (17) with to conclude.
1.3. Fast convergence of the values for α ≥ 3
In this part we mainly analyze the fast convergence of the values of Φ along a trajectory of (DIN-AVD). The value α = 3 plays a special role: to our knowledge, it is the smallest for which fast convergence results are proved to hold.
Suppose α ≥ 3 and argmin Φ ≠ ∅. Let x be a solution of (DIN-AVD) with Cauchy data x(t₀) = x₀ and ẋ(t₀) = v₀. For λ ∈ [2, α − 1], we define the function by
Now, . If , we deduce, from (1.3), that
Recall that this function is nonnegative. Let us give a closer look at the coefficients on the right-hand side: the first is nonpositive provided λ ≤ α − 1, and the next is nonpositive whenever λ ≥ 2. A compatibility condition for these two relations to hold is that 2 ≤ λ ≤ α − 1, thus α ≥ 3. The limiting case λ = 2 = α − 1 (thus α = 3) will be included in Lemma 1.9 below. Summarizing, if 2 ≤ λ ≤ α − 1, we immediately deduce that the function is nonincreasing on the interval, and its limit as t → +∞ exists.
Let α ≥ 3 and x* ∈ argmin Φ. Suppose x is a solution of (DIN-AVD). If λ ∈ [2, α − 1], then the function
is nonincreasing, and its limit as t → +∞ exists.
Since we are interested in asymptotic properties of the trajectory, we may assume that min Φ = 0. From (21) we deduce
Multiplying by and noticing we obtain
Now, multiplying by we obtain
whence we deduce
Therefore, the function is nonincreasing. Since it is nonnegative, it has a limit as t → +∞, and, clearly, so does the function in the statement. ∎
An important consequence is the following:
Let α ≥ 3 and x* ∈ argmin Φ. Suppose x is a solution of (DIN-AVD). Then, the trajectory x(·) is bounded. Moreover, set and . For all t ≥ t₀, we have