Fast convex optimization via inertial dynamics with Hessian driven damping
Abstract.
We first study the fast minimization properties of the trajectories of the second-order evolution equation
$$\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \beta\,\nabla^2\Phi(x(t))\,\dot{x}(t) + \nabla\Phi(x(t)) = 0,$$
where $\Phi$ is a smooth convex function acting on a real Hilbert space $\mathcal H$, and $\alpha$, $\beta$ are positive parameters. This inertial system combines an isotropic viscous damping which vanishes asymptotically, and a geometrical Hessian-driven damping, which makes it naturally related to Newton's and Levenberg-Marquardt methods. For $\alpha \geq 3$ and $\beta > 0$, along any trajectory, fast convergence of the values
$$\Phi(x(t)) - \min_{\mathcal H}\Phi = \mathcal O\!\left(\frac{1}{t^2}\right)$$
is obtained, together with rapid convergence of the gradients to zero. For $\alpha > 3$, just assuming that $\operatorname{argmin}\Phi \neq \emptyset$, we show that any trajectory converges weakly to a minimizer of $\Phi$, and that $\Phi(x(t)) - \min_{\mathcal H}\Phi = o\!\left(\frac{1}{t^2}\right)$. Strong convergence is established in various practical situations. In particular, for the strongly convex case, we obtain an even faster speed of convergence which can be arbitrarily fast depending on the choice of $\alpha$. More precisely, we have $\Phi(x(t)) - \min_{\mathcal H}\Phi = \mathcal O\!\left(t^{-\frac{2}{3}\alpha}\right)$. Then, we extend the results to the case of a general proper lower-semicontinuous convex function $\Phi : \mathcal H \to \mathbb R \cup \{+\infty\}$. This is based on the crucial property that the inertial dynamic with Hessian-driven damping can be equivalently written as a first-order system in time and space, allowing us to extend it by simply replacing the gradient with the subdifferential. By explicit-implicit time discretization, this opens a gate to new, possibly more rapid, inertial algorithms, expanding the field of FISTA methods for convex structured optimization problems.
Key words and phrases:
Convex optimization, fast convergent methods, dynamical systems, gradient flows, inertial dynamics, vanishing viscosity, Hessian-driven damping, nonsmooth potential, forward-backward algorithms, FISTA

Introduction
Throughout the paper, $\mathcal H$ is a real Hilbert space endowed with scalar product $\langle\cdot,\cdot\rangle$ and norm $\|x\| = \sqrt{\langle x, x\rangle}$ for $x \in \mathcal H$. Let $\Phi : \mathcal H \to \mathbb R$ be a twice continuously differentiable convex function (the case of a nonsmooth function will be considered later on). In view of the minimization of $\Phi$, we study the asymptotic behaviour (as $t \to +\infty$) of the trajectories of the second-order differential equation
(1) $\ddot{x}(t) + \dfrac{\alpha}{t}\,\dot{x}(t) + \beta\,\nabla^2\Phi(x(t))\,\dot{x}(t) + \nabla\Phi(x(t)) = 0,$
where $\alpha$ and $\beta$ are positive parameters; we refer to (1) as (DIN-AVD).
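Before analyzing (1), it may help to see a trajectory concretely. The following minimal sketch (our own illustration, not taken from the paper) integrates (1) for a one-dimensional quadratic $\Phi$, for which the Hessian-driven term reduces to a constant-coefficient damping; the scheme, parameters and tolerances are arbitrary illustrative choices.

```python
# Numerical sketch of (DIN-AVD) for the 1-D quadratic Phi(x) = q*x^2/2,
# so that grad Phi(x) = q*x and the Hessian is the constant q.  The equation
#     x'' + (alpha/t) x' + beta*q x' + q x = 0
# is integrated with a plain explicit Euler scheme (illustrative only; all
# parameter values below are arbitrary choices, not taken from the paper).

def simulate_dinavd(alpha=3.1, beta=1.0, q=1.0, x0=1.0, v0=0.0,
                    t0=1.0, t_end=50.0, dt=1e-3):
    """Return the trajectory [(t, x(t)), ...] of the damped system."""
    t, x, v = t0, x0, v0
    traj = [(t, x)]
    while t < t_end:
        # acceleration: vanishing viscous damping + Hessian damping + gradient
        a = -(alpha / t) * v - beta * q * v - q * x
        x += dt * v
        v += dt * a
        t += dt
        traj.append((t, x))
    return traj

traj = simulate_dinavd()
phi = lambda x: 0.5 * x * x          # Phi with q = 1
x_end = traj[-1][1]                  # trajectory value at the final time
sup_t2_phi = max(t * t * phi(x) for t, x in traj)
```

As expected from the analysis below, the trajectory approaches the minimizer and the quantity $t^2\,\Phi(x(t))$ stays bounded along the run.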
This inertial system combines two types of damping:
In the first place, the term $\frac{\alpha}{t}\,\dot{x}(t)$ furnishes an isotropic linear damping with a viscous parameter $\frac{\alpha}{t}$ which vanishes asymptotically, but not too slowly. The asymptotic behavior of the inertial gradient-like system
(2) $\ddot{x}(t) + a(t)\,\dot{x}(t) + \nabla\Phi(x(t)) = 0,$
with Asymptotic Vanishing Damping ((AVD) for short), has been studied by Cabot, Engler and Gadat in [21], [22]. They proved that, under moderate decrease of $a(t)$ to zero, namely, that $\lim_{t\to+\infty} a(t) = 0$ and $\int_0^{+\infty} a(t)\,dt = +\infty$, every solution of (2) satisfies $\lim_{t\to+\infty}\Phi(x(t)) = \inf_{\mathcal H}\Phi$.
Interestingly, with the specific choice $a(t) = \frac{\alpha}{t}$:
(3) $\ddot{x}(t) + \dfrac{\alpha}{t}\,\dot{x}(t) + \nabla\Phi(x(t)) = 0,$
Su, Boyd and Candès in [36] proved the fast convergence property
(4) $\Phi(x(t)) - \min_{\mathcal H}\Phi = \mathcal O\!\left(\dfrac{1}{t^2}\right),$
provided $\alpha \geq 3$. In the same article, the authors show that, for $\alpha = 3$, (3) can be seen as a continuous-time version of the fast convergent method of Nesterov [29], [30], [31], [32]. In [13], Attouch, Peypouquet and Redont showed that, for $\alpha > 3$, each trajectory of (3) converges weakly to an element of $\operatorname{argmin}\Phi$. This result is a continuous-time counterpart to the Chambolle-Dossal algorithm [23], which is a modified Nesterov algorithm specially designed to obtain the convergence of the iterates.
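To make the correspondence between (3) and the discrete method tangible, one can compare, on a toy ill-conditioned quadratic, plain gradient descent with a Nesterov-type iteration whose momentum coefficient $(k-1)/(k+\alpha-1)$ mirrors the damping $\frac{\alpha}{t}$. This is a minimal sketch: the objective, step size and iteration count are our own illustrative choices.

```python
# Nesterov-type iteration with momentum (k-1)/(k+alpha-1), applied to the
# toy quadratic Phi(x1, x2) = 0.5*(x1^2 + 100*x2^2); step size s = 1/L
# with L = 100.  All numerical choices are illustrative, not from the paper.

def grad(x):
    return [x[0], 100.0 * x[1]]

def phi(x):
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)

def accelerated(x0, alpha=3.0, s=0.01, iters=200):
    x_prev, x = list(x0), list(x0)
    for k in range(1, iters + 1):
        m = (k - 1.0) / (k + alpha - 1.0)   # momentum, discrete analogue of alpha/t
        y = [x[i] + m * (x[i] - x_prev[i]) for i in range(2)]
        g = grad(y)
        x_prev = x
        x = [y[i] - s * g[i] for i in range(2)]
    return x

def gradient_descent(x0, s=0.01, iters=200):
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [x[i] - s * g[i] for i in range(2)]
    return x

x0 = [1.0, 1.0]
f_acc = phi(accelerated(x0))       # value after the accelerated scheme
f_gd = phi(gradient_descent(x0))   # value after plain gradient descent
```

On this example the inertial scheme reaches a markedly smaller objective value than plain gradient descent after the same number of gradient evaluations, which is the discrete shadow of the $\mathcal O(1/t^2)$ rate above.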
In the second place, a geometrical damping, attached to the term $\beta\,\nabla^2\Phi(x(t))\,\dot{x}(t)$, has a natural link with Newton's method. It gives rise to the so-called Dynamical Inertial Newton system ((DIN) for short)
(5) $\ddot{x}(t) + \gamma\,\dot{x}(t) + \beta\,\nabla^2\Phi(x(t))\,\dot{x}(t) + \nabla\Phi(x(t)) = 0,$
which was introduced by Alvarez, Attouch, Bolte and Redont in [6] ($\gamma$ is a fixed positive parameter). Interestingly, (5) can be equivalently written as a first-order system involving only the gradient of $\Phi$, which allows its extension to the case of a proper lower-semicontinuous convex function $\Phi : \mathcal H \to \mathbb R \cup \{+\infty\}$. This led to applications ranging from optimization algorithms [12] to unilateral mechanics and partial differential equations [11].
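To illustrate the kind of first-order reformulation at stake, here is a sketch for the constant-damping system (5), with an auxiliary variable $y$ introduced for this illustration (see [6] for the precise statement):

```latex
% Sketch: first-order (in time and space) system equivalent to (5),
% with constant damping \gamma and auxiliary variable y.
\left\{
\begin{array}{l}
\dot{x}(t) + \beta\,\nabla\Phi(x(t))
  + \left(\gamma - \dfrac{1}{\beta}\right) x(t) + \dfrac{1}{\beta}\, y(t) = 0,\\[8pt]
\dot{y}(t) + \left(\gamma - \dfrac{1}{\beta}\right) x(t) + \dfrac{1}{\beta}\, y(t) = 0.
\end{array}
\right.
```

Indeed, subtracting the second equation from the first gives $\dot y = \dot x + \beta\,\nabla\Phi(x)$; differentiating the first equation and substituting this identity recovers (5). Only $\nabla\Phi$ appears in the system, so it still makes sense when the gradient is replaced by the subdifferential of a nonsmooth convex function.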
As we shall see, (DIN-AVD) inherits the convergence properties of both (AVD) and (DIN), but exhibits other important features, namely (see Theorems 1.10, 1.14, 1.15, 3.1, 4.8, 4.11, 4.12):

Assuming $\operatorname{argmin}\Phi \neq \emptyset$, $\alpha \geq 3$ and $\beta > 0$, we show the fast convergence property of the values (4), together with the fast convergence to zero of the gradients:
(6) $\displaystyle\int_{t_0}^{+\infty} t^2\,\|\nabla\Phi(x(t))\|^2\,dt < +\infty.$
For $\alpha > 3$, we complete these results by showing that every trajectory converges weakly, with its limit belonging to $\operatorname{argmin}\Phi$. Moreover, we obtain the faster order of convergence $\Phi(x(t)) - \min_{\mathcal H}\Phi = o\!\left(\frac{1}{t^2}\right)$.

Also for $\alpha > 3$, strong convergence is established in various practical situations. In particular, in the strongly convex case, we obtain an even faster speed of convergence, which can be arbitrarily fast according to the choice of $\alpha$. More precisely, we have $\Phi(x(t)) - \min_{\mathcal H}\Phi = \mathcal O\!\left(t^{-\frac{2}{3}\alpha}\right)$.

A remarkable property of the system (DIN-AVD) is that these results can be naturally generalized to the nonsmooth convex case. The key argument is that it can be reformulated as a first-order system (both in time and space) involving only the gradient, and not the Hessian!
Time discretization of (DIN-AVD) provides new ideas for the design of innovative fast converging algorithms, expanding the field of rapid methods for structured convex minimization of Nesterov [29, 30, 31, 32], Beck-Teboulle [16], and Chambolle-Dossal [23]. This study, however, goes beyond the scope of the present paper, and will be carried out in future research. As briefly evoked above, the continuous (DIN-AVD) system is also linked to the modeling of nonelastic shocks in unilateral mechanics, and to the geometric damping of nonlinear oscillators. These are important areas for applications, which are not considered in this paper.
1. Smooth potential
The following minimal hypotheses are in force in this section, and are always tacitly assumed:

$\alpha > 0$, $\beta > 0$;

$\Phi : \mathcal H \to \mathbb R$ is a twice continuously differentiable convex function; and

$t_0 > 0$,¹ $x_0 \in \mathcal H$, $v_0 \in \mathcal H$.
¹Taking $t_0 > 0$ comes from the singularity of the damping coefficient $\frac{\alpha}{t}$ at zero. Since we are only concerned about the asymptotic behaviour of the trajectories, the time origin is unimportant. If one insists in starting from $t_0 = 0$, then all the results remain valid with the damping coefficient suitably modified near the origin.
In view of minimizing $\Phi$, we study the asymptotic behaviour, as $t \to +\infty$, of a solution $x(\cdot)$ to the (DIN-AVD) second-order evolution equation (1). We will successively examine the following points:

existence and uniqueness of a global solution to (DIN-AVD) with Cauchy data $x(t_0) = x_0$ and $\dot{x}(t_0) = v_0$;

minimizing properties of the trajectories, and convergence of the values $\Phi(x(t))$ towards $\inf_{\mathcal H}\Phi$, whenever $\alpha, \beta > 0$;

fast convergence of the values $\Phi(x(t))$ towards $\min_{\mathcal H}\Phi$, when the latter is attained and $\alpha \geq 3$;

weak convergence of $x(t)$ towards a minimizer of $\Phi$, and faster convergence of the values, when $\alpha > 3$;

some cases of strong convergence of $x(t)$, and faster convergence of the values.
1.1. Existence and uniqueness of solution
The following result will be derived in Section 4 from a more general result concerning a convex lower semicontinuous function (see Corollary 4.6 below):
Theorem 1.1.
For any Cauchy data $(x_0, v_0) \in \mathcal H \times \mathcal H$, (DIN-AVD) admits a unique twice continuously differentiable global solution $x : [t_0, +\infty) \to \mathcal H$ verifying $x(t_0) = x_0$ and $\dot{x}(t_0) = v_0$.
1.2. Lyapunov analysis and minimizing properties of the solutions for $\alpha, \beta > 0$
In this section, we present a family of Lyapunov functions for (DIN-AVD), and use them to derive the main properties of the solutions to this system. As we shall see, the fact that we have more than one (essentially different) such function will play a crucial role in establishing that the gradient vanishes as $t \to +\infty$.
Let $x(\cdot)$ satisfy (DIN-AVD) with Cauchy data $x(t_0) = x_0$ and $\dot{x}(t_0) = v_0$, and fix a value of the parameter. Define the corresponding energy function by
(7) 
Observe that, for , we obtain
which is the usual global mechanical energy of the system, $\Phi(x(t)) + \frac{1}{2}\|\dot{x}(t)\|^2$. We shall see that, for each admissible value of the parameter, (7) is a strict Lyapunov function for (DIN-AVD).
In order to simplify the notation, write
(8) 
so that and, for each ,
(9)  
Using (9) and (DIN-AVD), elementary computations yield
(10) 
We have the following:
Proposition 1.2.
Let $\alpha, \beta > 0$, and suppose $x(\cdot)$ is a solution of (DIN-AVD). Then, for each admissible value of the parameter and every $t \geq t_0$, we have
Proof.
Theorem 1.3.
Let $\alpha, \beta > 0$, and suppose $x(\cdot)$ is a solution of (DIN-AVD). Then $\lim_{t\to+\infty}\Phi(x(t)) = \inf_{\mathcal H}\Phi$ and $\lim_{t\to+\infty}\nabla\Phi(x(t)) = 0$.
Proof.
Since we are interested in asymptotic properties of , we can assume throughout the proof. Take , so that the last term in the definition (7) of vanishes. Given , we define by
By the Chain Rule, we have
On the other hand, from (9) and (10), we obtain
(13) 
Set
and observe that
Next, since , we can write
where the last inequality follows from the convexity of and the fact that . Using the definition (7) of , and Proposition 1.2, we get
Dividing by and rearranging the terms, we have
Since , and are bounded from below, we can integrate this inequality from to , and use Lemma 7.3 to obtain such that
(14) 
Since is nonincreasing, we have
(15)  
In turn,
(16)  
since is nonincreasing and .
for appropriate constants .
Now, take such that for all , and integrate from to to obtain
Since is nonnegative, this implies
and so,
(17) 
for some other constants . As , we obtain (the limit is in ). Since is arbitrary, and for all , the result follows. ∎
By the weak lower-semicontinuity of $\Phi$, Theorem 1.3 immediately yields the following:
Corollary 1.4.
Let $\alpha, \beta > 0$, and suppose $x(\cdot)$ is a solution of (DIN-AVD). As $t \to +\infty$, every sequential weak cluster point of $x(t)$ belongs to $\operatorname{argmin}\Phi$. In particular, if $\|x(t)\|$ does not tend to $+\infty$ as $t \to +\infty$, then $\operatorname{argmin}\Phi \neq \emptyset$.
If the function $\Phi$ is bounded from below, we have the following stability result:
Proposition 1.5.
Let $\alpha, \beta > 0$, and suppose $x(\cdot)$ is a solution of (DIN-AVD). If $\inf_{\mathcal H}\Phi > -\infty$, then
Proof.
Proposition 1.6.
Let $\alpha, \beta > 0$, and suppose $x(\cdot)$ is a solution of (DIN-AVD). If $\inf_{\mathcal H}\Phi > -\infty$, then

, and ;

.
1.3. Fast convergence of the values for $\alpha \geq 3$
In this part, we mainly analyze the fast convergence of the values of $\Phi$ along a trajectory of (DIN-AVD). The value $\alpha = 3$ plays a special role: to our knowledge, it is the smallest value of $\alpha$ for which fast convergence results are proved to hold.
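As a purely numerical illustration of this behaviour (our own sketch, not part of the paper's argument), one can integrate (DIN-AVD) for the convex but not strongly convex potential $\Phi(x) = x^4/4$ and monitor $t^2\,\Phi(x(t))$, which should remain bounded when $\alpha \geq 3$; all numerical parameters below are arbitrary choices.

```python
# Monitor t^2 * Phi(x(t)) along a trajectory of (DIN-AVD) for
# Phi(x) = x^4/4, grad Phi(x) = x^3, Hessian Phi(x) = 3*x^2.
# Explicit Euler integration; parameters are illustrative only.

def run(alpha=3.0, beta=1.0, x0=1.0, v0=0.0, t0=1.0, t_end=100.0, dt=1e-3):
    t, x, v = t0, x0, v0
    sup_t2_phi = 0.0
    while t < t_end:
        # viscous damping + Hessian-driven damping + gradient term
        a = -(alpha / t) * v - beta * (3.0 * x * x) * v - x ** 3
        x += dt * v
        v += dt * a
        t += dt
        sup_t2_phi = max(sup_t2_phi, t * t * (x ** 4) / 4.0)
    return x, sup_t2_phi

x_end, sup_t2_phi = run()
phi_end = x_end ** 4 / 4.0   # Phi value at the final time (min is 0)
```

Along the run, $t^2\,\Phi(x(t))$ stays below a moderate constant even though $\Phi$ is flat at its minimizer, in line with the $\mathcal O(1/t^2)$ estimate of this section.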
Suppose $\alpha \geq 3$ and $\beta > 0$. Let $x(\cdot)$ be a solution of (DIN-AVD) with Cauchy data $x(t_0) = x_0$ and $\dot{x}(t_0) = v_0$. For $x^* \in \operatorname{argmin}\Phi$, we define the energy function by
(19) 
where is given by (8), with . To compute we first differentiate each term of in turn (we use (10) in the second derivative).
Whence
Now, . If , we deduce, from (1.3), that
(21) 
Remark 1.8.
Recall that the energy defined above is nonnegative. Let us take a closer look at the coefficients on the right-hand side: the first is nonnegative provided the parameter is not too large, and the second provided it is not too small. A compatibility condition for these two relations to hold is $\alpha \geq 3$. The limiting case $\alpha = 3$ will be included in Lemma 1.9 below. Summarizing, if $\alpha > 3$, we immediately deduce that the energy is nonincreasing on $[t_0, +\infty)$, and that its limit as $t \to +\infty$ exists.
Lemma 1.9.
Let $\alpha \geq 3$ and $\beta > 0$, and suppose $x(\cdot)$ is a solution of (DIN-AVD). If $\operatorname{argmin}\Phi \neq \emptyset$, then the function
is nonincreasing and exists.
Proof.
Since we are interested in asymptotic properties of , we can assume . From (21) we deduce
Multiplying by and noticing we obtain
Now, multiplying by we obtain
whence we deduce
Therefore, the function is nonincreasing. Since it is nonnegative, it has a limit as , and, clearly, so does . ∎
An important consequence is the following:
Theorem 1.10.
Let $\alpha \geq 3$ and $\beta > 0$, and suppose $x(\cdot)$ is a solution of (DIN-AVD). Then, the trajectory $x(\cdot)$ is bounded. Moreover, set and . For all $t \geq t_0$, we have