
# A primal-dual dynamical approach to structured convex minimization problems

Radu Ioan Boţ University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, email: radu.bot@univie.ac.at. Research partially supported by FWF (Austrian Science Fund), project I 2419-N32.    Ernö Robert Csetnek University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, email: ernoe.robert.csetnek@univie.ac.at. Research supported by FWF (Austrian Science Fund), project P 29809-N32.    Szilárd Csaba László Technical University of Cluj-Napoca, Department of Mathematics, Memorandumului 28, Cluj-Napoca, Romania, email: szilard.laszlo@math.utcluj.ro. This work was supported by a grant of Ministry of Research and Innovation, CNCS - UEFISCDI, project number PN-III-P1-1.1-TE-2016-0266, within PNCDI III.

Abstract. In this paper we propose a primal-dual dynamical approach to the minimization of a structured convex function consisting of a smooth term, a nonsmooth term, and the composition of another nonsmooth term with a linear continuous operator. In this scope we introduce a dynamical system for which we prove that its trajectories asymptotically converge to a saddle point of the Lagrangian of the underlying convex minimization problem as time tends to infinity. In addition, we provide rates for both the violation of the feasibility condition by the ergodic trajectories and the convergence of the objective function along these ergodic trajectories to its minimal value. Explicit time discretization of the dynamical system results in a numerical algorithm which is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm.

Keywords. structured convex minimization, dynamical system, proximal ADMM algorithm, primal-dual algorithm

AMS Subject Classification. 37N40, 49N15, 90C25, 90C46

## 1 Introduction and preliminaries

For $H$ and $G$ real Hilbert spaces, we consider the convex minimization problem

$$\inf_{x\in H} f(x)+h(x)+g(Ax), \tag{1}$$

where $f:H\to\overline{\mathbb{R}}$ and $g:G\to\overline{\mathbb{R}}$ are proper, convex and lower semicontinuous functions, $h:H\to\mathbb{R}$ is a convex and Fréchet differentiable function with $L_h$-Lipschitz continuous gradient $\nabla h$, i.e. $\|\nabla h(x)-\nabla h(y)\|\le L_h\|x-y\|$ for every $x,y\in H$, and $A:H\to G$ is a continuous linear operator.

Problem (1) can be rewritten as

$$\inf_{\substack{(x,z)\in H\times G\\ Ax-z=0}} f(x)+h(x)+g(z). \tag{2}$$

Obviously, $x^*$ is an optimal solution of (1) if and only if $(x^*,z^*)$ is an optimal solution of (2) and $z^*=Ax^*$.

Based on this reformulation of problem (1) we define its Lagrangian

$$l:H\times G\times G\to\overline{\mathbb{R}},\qquad l(x,z,y)=f(x)+h(x)+g(z)+\langle y,Ax-z\rangle.$$

An element $(x^*,z^*,y^*)\in H\times G\times G$ is said to be a saddle point of the Lagrangian $l$, if

$$l(x^*,z^*,y)\le l(x^*,z^*,y^*)\le l(x,z,y^*)\qquad\forall (x,z,y)\in H\times G\times G.$$

It is known that $(x^*,z^*,y^*)$ is a saddle point of $l$ if and only if $x^*$ is an optimal solution of (1), $z^*=Ax^*$, and $y^*$ is an optimal solution of the Fenchel dual problem of (1), which reads

$$\sup_{y\in G}\big(-(f^*\,\square\,h^*)(-A^*y)-g^*(y)\big). \tag{3}$$

In this situation the optimal objective values of (1) and (3) coincide.

In the formulation of (3),

$$f^*:H\to\overline{\mathbb{R}},\ f^*(u)=\sup_{x\in H}\big(\langle u,x\rangle-f(x)\big),\qquad h^*:H\to\overline{\mathbb{R}},\ h^*(u)=\sup_{x\in H}\big(\langle u,x\rangle-h(x)\big),$$

and

$$g^*:G\to\overline{\mathbb{R}},\ g^*(y)=\sup_{z\in G}\big(\langle y,z\rangle-g(z)\big),$$

denote the conjugate functions of $f$, $h$ and $g$, respectively, and $A^*$ denotes the adjoint operator of $A$. The infimal convolution of the functions $f^*$ and $h^*$ is defined by

$$(f^*\,\square\,h^*)(x)=\inf_{y\in H}\big(f^*(y)+h^*(x-y)\big).$$
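Since the conjugate is defined pointwise as a supremum, it can be approximated numerically by maximizing over a grid. The following sketch is our own illustration (not part of the paper): for $f(x)=x^2/2$ on the real line, the conjugate is known to be $f^*(u)=u^2/2$, which the grid-based computation recovers.

```python
import numpy as np

def conjugate_on_grid(f, grid):
    """Approximate f*(u) = sup_x (<u, x> - f(x)) by maximizing over a grid."""
    fx = f(grid)
    return lambda u: float(np.max(u * grid - fx))

# For f(x) = x^2/2 the supremum is attained at x = u, so f*(u) = u^2/2.
f = lambda x: 0.5 * x**2
f_star = conjugate_on_grid(f, np.linspace(-10.0, 10.0, 200001))

for u in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(f_star(u) - 0.5 * u**2) < 1e-6
```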

It is also known that $(x^*,z^*,y^*)$ is a saddle point of the Lagrangian $l$ if and only if it is a solution of the following system of primal-dual optimality conditions

$$\begin{cases}0\in\partial f(x)+\nabla h(x)+A^*y\\ Ax=z\\ Ax\in\partial g^*(y).\end{cases}$$

We recall that the convex subdifferential of the function $f$ at $x\in H$ is defined by $\partial f(x)=\{u\in H: f(y)\ge f(x)+\langle u,y-x\rangle\ \forall y\in H\}$, if $f(x)\in\mathbb{R}$, and by $\partial f(x)=\emptyset$, otherwise.

A saddle point of the Lagrangian $l$ exists whenever the primal problem (1) has an optimal solution and the so-called Attouch-Brézis regularity condition

$$0\in\operatorname{sqri}\big(\operatorname{dom}g-A(\operatorname{dom}f)\big)$$

holds. Here,

$$\operatorname{sqri}Q:=\Big\{x\in Q:\bigcup_{\lambda>0}\lambda(Q-x)\ \text{is a closed linear subspace of }G\Big\}$$

denotes the strong quasi-relative interior of a convex set $Q\subseteq G$. We refer the reader to [9, 11, 28] for more insights into the world of regularity conditions and convex duality theory.

Let $S_+(H)$ denote the family of continuous linear operators $U:H\to H$ which are self-adjoint and positive semidefinite. For $U\in S_+(H)$ we introduce the following seminorm on $H$:

$$\|x\|^2_U=\langle x,Ux\rangle\qquad\forall x\in H.$$

This introduces on $S_+(H)$ the following partial ordering: for $U_1,U_2\in S_+(H)$,

$$U_1\succcurlyeq U_2\Leftrightarrow\|x\|^2_{U_1}\ge\|x\|^2_{U_2}\qquad\forall x\in H.$$

For $\alpha>0$ fixed, let $P_\alpha(H)$ be

$$P_\alpha(H)=\{U\in S_+(H):U\succcurlyeq\alpha I\},$$

where $I$ denotes the identity operator on $H$.
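In a finite-dimensional setting, membership $U\in P_\alpha(H)$ can be checked via the smallest eigenvalue of the symmetric matrix representing $U$, since $U\succcurlyeq\alpha I$ holds exactly when $\lambda_{\min}(U)\ge\alpha$. A minimal sketch of our own, with an arbitrarily chosen matrix:

```python
import numpy as np

def in_P_alpha(U, alpha, tol=1e-12):
    """U is in P_alpha iff <x, Ux> >= alpha*||x||^2 for all x, i.e.
    lambda_min(U) >= alpha for a symmetric matrix U."""
    return np.min(np.linalg.eigvalsh(U)) >= alpha - tol

U = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # eigenvalues (3 ± sqrt(2))/2 ≈ 0.79, 2.21
assert in_P_alpha(U, 0.5)           # U ≽ 0.5*I holds
assert not in_P_alpha(U, 1.0)       # U ≽ I fails
```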

The subject of our investigations in this paper will be the following dynamical system, for which we will show that it asymptotically approaches the set of solutions of the primal-dual pair of optimization problems (1)-(3):

$$\begin{cases}\dot{x}(t)+x(t)\in\big(\partial f+cA^*A+M_1(t)\big)^{-1}\big(M_1(t)x(t)+cA^*z(t)-A^*y(t)-\nabla h(x(t))\big)\\ \dot{z}(t)+z(t)\in\big(\partial g+cI+M_2(t)\big)^{-1}\big(M_2(t)z(t)+cA(\gamma\dot{x}(t)+x(t))+y(t)\big)\\ \dot{y}(t)=cA(x(t)+\dot{x}(t))-c(z(t)+\dot{z}(t))\\ x(0)=x_0\in H,\ z(0)=z_0\in G,\ y(0)=y_0\in G,\end{cases}\tag{4}$$

where $c>0$, $\gamma\in[0,1]$, and $M_1:[0,+\infty)\to S_+(H)$ and $M_2:[0,+\infty)\to S_+(G)$.

One of the motivations for the study of this dynamical system comes from the fact that, as we will see in Remark 1, it provides through explicit time discretization a numerical algorithm which is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm.

In the next section we will show the existence and uniqueness of strong global solutions for the dynamical system (4) in the framework of the Cauchy-Lipschitz Theorem. In Section 3 we will prove some technical results, which will play an important role in the asymptotic analysis. In Section 4 we will investigate the asymptotic behaviour of the trajectories as the time tends to infinity. By carrying out a Lyapunov analysis and by relying on the continuous variant of the Opial Lemma, we are able to prove that the trajectories generated by (4) asymptotically converge to a saddle point of the Lagrangian $l$. Furthermore, we provide convergence rates for the violation of the feasibility condition by ergodic trajectories and for the convergence of the objective function along these ergodic trajectories to its minimal value.

The approach of optimization problems by dynamical systems has a long tradition. Crandall and Pazy considered dynamical systems governed by subdifferential operators (and, more generally, by maximally monotone operators) in Hilbert spaces, addressed questions like the existence and uniqueness of solution trajectories, and related the latter to the theory of semi-groups of nonlinear contractions. Brézis studied the asymptotic behaviour of the trajectories of dynamical systems governed by convex subdifferentials, and Bruck carried out a similar analysis for maximally monotone operators. Dynamical systems defined via resolvent/proximal evaluations of the governing operators have enjoyed much attention in the last years, as they result by explicit time discretization in relaxed versions of standard numerical algorithms, with high flexibility and good numerical performance. Abbas and Attouch introduced a forward-backward dynamical system, extending to more general optimization problems an approach proposed by Antipin and by Bolte for a gradient-projection dynamical system associated to the minimization of a smooth convex function over a closed convex set. Implicit dynamical systems were also considered in the context of monotone inclusion problems. A dynamical system of forward-backward-forward type has been considered as well, while a dynamical system of Douglas-Rachford type was recently introduced.

It is important to notice that the approaches mentioned above have been introduced in connection with the study of “simple” monotone inclusion and convex minimization problems. They rely on straightforward splitting strategies and cannot be efficiently used when addressing structured minimization problems like (1), which need to be approached from both a primal and a dual perspective and thus require tools and techniques from convex duality theory. The dynamical approach we introduce and investigate in this paper is, to our knowledge, the first meant to address structured convex minimization problems in the spirit of the full splitting paradigm.

###### Remark 1.

The first inclusion in (4) can be equivalently written as

$$0\in\partial f(\dot{x}(t)+x(t))+cA^*A(\dot{x}(t)+x(t))+M_1(t)\dot{x}(t)-\big(cA^*z(t)-A^*y(t)-\nabla h(x(t))\big)\quad\forall t\in[0,+\infty),\tag{5}$$

while the second one as

$$0\in\partial g(\dot{z}(t)+z(t))+c(\dot{z}(t)+z(t))-cA(\gamma\dot{x}(t)+x(t))-y(t)+M_2(t)\dot{z}(t)\quad\forall t\in[0,+\infty).\tag{6}$$

The explicit discretization of (5) with respect to the time variable and with constant step size yields the iterative scheme

$$0\in\frac{1}{c}\partial f(x^{k+1})+A^*Ax^{k+1}+\frac{M_1^k}{c}(x^{k+1}-x^k)-A^*z^k+\frac{A^*}{c}y^k+\frac{1}{c}\nabla h(x^k)\quad\forall k\ge 0.$$

By convex subdifferential calculus, one can easily see that this can, for every $k\ge0$, be equivalently written as

$$0\in\partial\left(f(x)+\langle x-x^k,\nabla h(x^k)\rangle+\frac{c}{2}\Big\|Ax-z^k+\frac{y^k}{c}\Big\|^2+\frac{1}{2}\|x-x^k\|^2_{M_1^k}\right)\Bigg|_{x=x^{k+1}}$$

and, further, as

$$x^{k+1}\in\operatorname*{argmin}_{x\in H}\left(f(x)+\langle x-x^k,\nabla h(x^k)\rangle+\frac{c}{2}\Big\|Ax-z^k+\frac{y^k}{c}\Big\|^2+\frac{1}{2}\|x-x^k\|^2_{M_1^k}\right).$$

Similarly, (6) leads for every $k\ge0$ to

$$0\in\partial\left(g(z)+\frac{c}{2}\Big\|A(\gamma x^{k+1}+(1-\gamma)x^k)-z+\frac{y^k}{c}\Big\|^2+\frac{1}{2}\|z-z^k\|^2_{M_2^k}\right)\Bigg|_{z=z^{k+1}},$$

which is nothing else than

$$z^{k+1}\in\operatorname*{argmin}_{z\in G}\left(g(z)+\frac{c}{2}\Big\|A(\gamma x^{k+1}+(1-\gamma)x^k)-z+\frac{y^k}{c}\Big\|^2+\frac{1}{2}\|z-z^k\|^2_{M_2^k}\right).$$

Here, $(M_1^k)_{k\ge0}$ and $(M_2^k)_{k\ge0}$ are two operator sequences in $S_+(H)$ and $S_+(G)$, respectively.

Thus the dynamical system (4) leads through explicit time discretization to a numerical algorithm which, for a starting point $(x^0,z^0,y^0)\in H\times G\times G$, generates a sequence $(x^k,z^k,y^k)_{k\ge0}$ for every $k\ge0$ as follows

$$\begin{cases}x^{k+1}\in\operatorname*{argmin}_{x\in H}\left(f(x)+\langle x-x^k,\nabla h(x^k)\rangle+\frac{c}{2}\big\|Ax-z^k+\frac{y^k}{c}\big\|^2+\frac{1}{2}\|x-x^k\|^2_{M_1^k}\right)\\[2mm] z^{k+1}\in\operatorname*{argmin}_{z\in G}\left(g(z)+\frac{c}{2}\big\|A(\gamma x^{k+1}+(1-\gamma)x^k)-z+\frac{y^k}{c}\big\|^2+\frac{1}{2}\|z-z^k\|^2_{M_2^k}\right)\\[2mm] y^{k+1}=y^k+c(Ax^{k+1}-z^{k+1}).\end{cases}\tag{7}$$
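To make the iteration (7) more tangible, the following sketch (our own toy instance, not taken from the paper) runs it on a small one-dimensional total-variation-type problem with $f=0$, $h(x)=\frac12\|x-b\|^2$, $g=\mu\|\cdot\|_1$ and $A$ a finite-difference matrix. We choose $\gamma=1$, $M_2^k=0$ and $M_1^k=\frac{1}{\tau}I-cA^*A$, so that the $x$-update reduces to an explicit gradient-type step; all data and parameter values are arbitrary.

```python
import numpy as np

def soft(v, t):
    """Proximal map of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

n = 30
b = np.concatenate([np.zeros(15), np.ones(15)])   # piecewise-constant data
A = np.diff(np.eye(n), axis=0)                    # forward differences, ||A||^2 < 4
mu, c, tau = 0.1, 1.0, 0.15                       # tau*(L_h + c*||A||^2) <= 1

x, z, y = np.zeros(n), np.zeros(n - 1), np.zeros(n - 1)
for _ in range(20000):
    # x-update of (7) with M_1^k = (1/tau)I - c*A^T A and f = 0
    x = x - tau * ((x - b) + A.T @ (y + c * (A @ x - z)))
    # z-update of (7) with M_2^k = 0: a soft-thresholding step
    z = soft(A @ x + y / c, mu / c)
    # y-update of (7)
    y = y + c * (A @ x - z)

obj = lambda u: 0.5 * np.linalg.norm(u - b)**2 + mu * np.sum(np.abs(A @ u))
assert np.linalg.norm(A @ x - z) < 1e-3   # feasibility Ax = z is approached
assert obj(x) <= obj(b)                   # the objective improves on the data b
```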

The algorithm (7) is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm.

Indeed, in the case when $\gamma=1$, (7) becomes the proximal ADMM algorithm with variable metrics (see also the references therein). If, in addition, $h=0$ and the operator sequences $(M_1^k)_{k\ge0}$ and $(M_2^k)_{k\ge0}$ are constant, then (7) becomes the proximal ADMM algorithm investigated in [25, Section 3.2]. It is known that the proximal ADMM algorithm can be seen as a generalization of the full splitting primal-dual algorithms of Chambolle-Pock and Condat-Vu (see [19, 27]).

On the other hand, in the case when $\gamma=0$, (7) becomes an extension of the linearized proximal method of multipliers of Chen-Teboulle (see [25, Algorithm 1]).

In the following remark we provide a particular choice for the linear maps $M_1(t)$ and $M_2(t)$ which transforms (4) into a dynamical system of primal-dual type formulated in the spirit of the full splitting paradigm.

###### Remark 2.

For every $t\in[0,+\infty)$, define

$$M_1(t)=\frac{1}{\tau(t)}I-cA^*A\quad\text{and}\quad M_2(t)=0,$$

where $\tau(t)>0$ is such that $\tau(t)c\|A\|^2\le1$.

Let $t\in[0,+\infty)$ be fixed. In this particular setting, (5) is equivalent to

$$\left(\frac{1}{\tau(t)}I-cA^*A\right)x(t)+cA^*z(t)-A^*y(t)-\nabla h(x(t))\in\frac{1}{\tau(t)}\dot{x}(t)+\frac{1}{\tau(t)}x(t)+\partial f(\dot{x}(t)+x(t))$$

and further to

$$\dot{x}(t)+x(t)=\big(I+\tau(t)\partial f\big)^{-1}\Big((I-c\tau(t)A^*A)x(t)+c\tau(t)A^*z(t)-\tau(t)A^*y(t)-\tau(t)\nabla h(x(t))\Big).$$

In other words,

$$\dot{x}(t)+x(t)=\operatorname{prox}_{\tau(t)f}\Big((I-c\tau(t)A^*A)x(t)+c\tau(t)A^*z(t)-\tau(t)A^*y(t)-\tau(t)\nabla h(x(t))\Big),$$

where

$$\operatorname{prox}_{\kappa}:H\to H,\qquad\operatorname{prox}_{\kappa}(x)=\operatorname*{argmin}_{y\in H}\Big\{\kappa(y)+\frac{1}{2}\|x-y\|^2\Big\}=(I+\partial\kappa)^{-1}(x),$$

denotes the proximal point operator of a proper, convex and lower semicontinuous function $\kappa:H\to\overline{\mathbb{R}}$.
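The defining argmin of the proximal map can also be evaluated by brute force, which gives a quick way to check closed-form formulas. A sketch of our own: for $\kappa=\tau|\cdot|$ on the real line the proximal map is soft-thresholding, and a grid search over the argmin agrees with it.

```python
import numpy as np

def prox_numeric(kappa, x, grid):
    """Approximate argmin_y kappa(y) + 0.5*(x - y)^2 over a grid."""
    return grid[np.argmin(kappa(grid) + 0.5 * (x - grid)**2)]

tau = 0.7
kappa = lambda y: tau * np.abs(y)
soft = lambda x: np.sign(x) * max(abs(x) - tau, 0.0)   # closed-form prox of tau*|.|

grid = np.linspace(-5.0, 5.0, 100001)
for x in [-3.0, -0.3, 0.0, 1.2, 4.0]:
    assert abs(prox_numeric(kappa, x, grid) - soft(x)) < 1e-3
```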

On the other hand, relation (6) is equivalent to

$$\dot{y}(t)+y(t)+c(\gamma-1)A\dot{x}(t)\in\partial g(\dot{z}(t)+z(t)),$$

hence,

$$\dot{z}(t)+z(t)\in\partial g^*\big(\dot{y}(t)+y(t)+c(\gamma-1)A\dot{x}(t)\big).$$

This is further equivalent to

$$A(\gamma\dot{x}(t)+x(t))+\frac{1}{c}y(t)\in\frac{1}{c}\dot{y}(t)+\frac{1}{c}y(t)+(\gamma-1)A\dot{x}(t)+\partial g^*\big(\dot{y}(t)+y(t)+c(\gamma-1)A\dot{x}(t)\big)$$

and further to

$$\dot{y}(t)+y(t)+c(\gamma-1)A\dot{x}(t)=\big(I+c\partial g^*\big)^{-1}\big(cA(\gamma\dot{x}(t)+x(t))+y(t)\big).$$

In other words,

$$\dot{y}(t)+y(t)+c(\gamma-1)A\dot{x}(t)=\operatorname{prox}_{cg^*}\big(cA(\gamma\dot{x}(t)+x(t))+y(t)\big).$$

Consequently, in this particular setting, the dynamical system (4) can be equivalently written as

$$\begin{cases}\dot{x}(t)+x(t)=\operatorname{prox}_{\tau(t)f}\Big((I-c\tau(t)A^*A)x(t)+c\tau(t)A^*z(t)-\tau(t)A^*y(t)-\tau(t)\nabla h(x(t))\Big)\\ \dot{y}(t)+y(t)+c(\gamma-1)A\dot{x}(t)=\operatorname{prox}_{cg^*}\big(cA(\gamma\dot{x}(t)+x(t))+y(t)\big)\\ \dot{y}(t)=cA(x(t)+\dot{x}(t))-c(z(t)+\dot{z}(t))\\ x(0)=x_0\in H,\ z(0)=z_0\in G,\ y(0)=y_0\in G.\end{cases}\tag{8}$$

Let us also mention that when $\gamma=1$ and $h=0$ the dynamical system (8) reads

$$\begin{cases}\dot{x}(t)+x(t)=\operatorname{prox}_{\tau(t)f}\big(x(t)-\tau(t)A^*(y(t)+cAx(t)-cz(t))\big)\\ \dot{y}(t)+y(t)=\operatorname{prox}_{cg^*}\big(y(t)+cA(\dot{x}(t)+x(t))\big)\\ \dot{y}(t)=cA(x(t)+\dot{x}(t))-c(z(t)+\dot{z}(t))\\ x(0)=x_0\in H,\ z(0)=z_0\in G,\ y(0)=y_0\in G.\end{cases}\tag{9}$$

The explicit time discretization of (9) leads to a numerical algorithm which, for a starting point $(x^0,z^0,y^0)\in H\times G\times G$, generates the sequence $(x^k,z^k,y^k)_{k\ge0}$ for every $k\ge0$ as follows

$$\begin{cases}x^{k+1}=\operatorname{prox}_{\tau_kf}\big(x^k-\tau_kA^*(y^k+cAx^k-cz^k)\big)\\ y^{k+1}=\operatorname{prox}_{cg^*}\big(y^k+cAx^{k+1}\big)\\ y^{k+1}=y^k+c(Ax^{k+1}-z^{k+1}).\end{cases}\tag{10}$$

By substituting in the first equation of (10) the term $y^k+cAx^k-cz^k$ by $2y^k-y^{k-1}$, which is allowed according to the last equation, one can easily see that (10) is equivalent to the following numerical algorithm, which, for a starting point $(x^0,y^0)\in H\times G$ (with the convention $y^{-1}:=y^0$), generates the sequence $(x^k,y^k)_{k\ge0}$ for every $k\ge0$ as follows

$$\begin{cases}x^{k+1}=\operatorname{prox}_{\tau_kf}\big(x^k-\tau_kA^*(2y^k-y^{k-1})\big)\\ y^{k+1}=\operatorname{prox}_{cg^*}\big(y^k+cAx^{k+1}\big).\end{cases}\tag{11}$$
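As a quick sanity check, the scheme (11) can be run on a small instance. The sketch below is our own toy example (not from the paper): $f(x)=\frac12\|x-b\|^2$, $g=\mu\|\cdot\|_1$ and $A$ a finite-difference matrix, with constant $\tau_k=\tau$ satisfying $\tau c\|A\|^2<1$. In this setting $\operatorname{prox}_{\tau f}(v)=(v+\tau b)/(1+\tau)$, and $\operatorname{prox}_{cg^*}$ is the projection onto the box $[-\mu,\mu]^{n-1}$, since $g^*$ is the indicator of that box.

```python
import numpy as np

n = 30
b = np.concatenate([np.zeros(15), np.ones(15)])
A = np.diff(np.eye(n), axis=0)           # forward differences, ||A||^2 < 4
mu, c, tau = 0.1, 0.45, 0.45             # tau*c*||A||^2 = 0.81 < 1

prox_f = lambda v: (v + tau * b) / (1.0 + tau)   # prox of tau*f
prox_gs = lambda v: np.clip(v, -mu, mu)          # prox of c*g* (box projection)

x, y = np.zeros(n), np.zeros(n - 1)
y_prev = y.copy()
for _ in range(20000):
    x = prox_f(x - tau * A.T @ (2 * y - y_prev))   # first line of (11)
    y_prev, y = y, prox_gs(y + c * A @ x)          # second line of (11)

obj = lambda u: 0.5 * np.linalg.norm(u - b)**2 + mu * np.sum(np.abs(A @ u))
assert obj(x) <= obj(b)                          # objective improves on b
assert np.linalg.norm(x - (b - A.T @ y)) < 1e-2  # optimality: 0 = x - b + A^T y
```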

For $\tau_k=\tau>0$ for every $k\ge0$, (11) is nothing else than the primal-dual algorithm proposed by Chambolle and Pock.

Figure 1: First row: the primal trajectory $x(t)$ approaching the primal optimal solution $(0,0)$ for $\tau c=0.49$ and starting point $x_0=(-10,10)$. Second row: the dual trajectory $y(t)$ approaching a dual optimal solution for $\tau c=0.49$ and starting point $y_0=(-10,10)$.
###### Example 1.

In this example we will illustrate via some numerical experiments the way in which the parameters $\gamma$ and $\tau c$ may influence the asymptotic convergence of the primal and dual trajectories. In this scope, we considered the following primal optimization problem

$$\inf_{(x_1,x_2)\in\mathbb{R}^2}\|x\|_1+\sqrt{(x_1-x_2)^2+(x_1+x_2)^2},\tag{12}$$

which is in fact problem (1) written in the following particular setting: $H=G=\mathbb{R}^2$, $f=\|\cdot\|_1$, $h=0$, $g=\|\cdot\|_2$ and $A(x_1,x_2)=(x_1-x_2,x_1+x_2)$ for every $(x_1,x_2)\in\mathbb{R}^2$. One can easily see that $(0,0)$ is the unique optimal solution of (12) and that

$$\sup_{\|(y_1,y_2)\|_2\le1,\ |y_1+y_2|\le1,\ |-y_1+y_2|\le1} 0\tag{13}$$

is the Fenchel dual problem of (12). This means that every feasible element of (13) is a dual optimal solution.

Figure 2: First row: the primal trajectory $x(t)$ approaching the primal optimal solution $(0,0)$ for $\tau c=0.25$ and starting point $x_0=(-10,10)$. Second row: the dual trajectory $y(t)$ approaching a dual optimal solution for $\tau c=0.25$ and starting point $y_0=(-10,10)$.

We considered the dynamical system (8) attached to the primal-dual pair (12)-(13) with starting points $x_0=(-10,10)$ and $y_0=(-10,10)$, and in the case when $t\mapsto\tau(t)=\tau$ is a constant function. In order to solve the resulting dynamical system we used the Matlab function ode15s and, to this end, we reformulated it as

$$\begin{cases}\dot{U}(t)=\Gamma(U(t))\\ U(0)=(x_0,y_0,z_0),\end{cases}$$

where

$$U(t)=(x(t),y(t),z(t))\in H\times G\times G$$

and

$$\Gamma:H\times G\times G\to H\times G\times G,\qquad\Gamma(u_1,u_2,u_3)=(u_4,u_5,u_6),$$

is defined as

$$\begin{cases}u_4=\operatorname{prox}_{\tau f}\big(u_1-\tau A^*(u_2+cAu_1-cu_3)\big)-u_1\\ u_5=\operatorname{prox}_{cg^*}\big(u_2+cA(\gamma u_4+u_1)\big)-u_2-c(\gamma-1)Au_4\\ u_6=A(u_1+u_4)-u_3-\frac{1}{c}u_5.\end{cases}$$

Notice that

$$A^*(x_1,x_2)=(x_1+x_2,-x_1+x_2)\qquad\forall(x_1,x_2)\in\mathbb{R}^2,$$
$$\operatorname{prox}_{\tau f}(x)=x-\tau\operatorname{proj}_{[-1,1]^2}\Big(\frac{1}{\tau}x\Big)\qquad\forall x\in\mathbb{R}^2,$$
$$\operatorname{prox}_{cg^*}(y)=\begin{cases}y,&\text{if }\|y\|_2\le1,\\ \frac{1}{\|y\|_2}y,&\text{otherwise},\end{cases}\qquad\forall y\in\mathbb{R}^2,$$

where $\operatorname{proj}_C$ denotes the projection operator onto a convex and closed set $C$.
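The experiment can be reproduced in Python by integrating the map $\Gamma$ with SciPy's solve_ivp in place of Matlab's ode15s. The sketch below is our own re-implementation; since the initial value of $z$ is not recoverable here, $z_0=(0,0)$ is an arbitrary choice, and we fix $\gamma=1$ and $\tau c=0.25$.

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])            # A(x1,x2) = (x1 - x2, x1 + x2)
c, tau, gamma = 0.5, 0.5, 1.0          # tau*c = 0.25

def prox_tau_f(x):                     # prox of tau*||.||_1 (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_c_gs(y):                      # projection onto the unit 2-ball
    nrm = np.linalg.norm(y)
    return y if nrm <= 1.0 else y / nrm

def Gamma(t, u):
    u1, u2, u3 = u[0:2], u[2:4], u[4:6]          # u = (x, y, z)
    u4 = prox_tau_f(u1 - tau * A.T @ (u2 + c * A @ u1 - c * u3)) - u1
    u5 = prox_c_gs(u2 + c * A @ (gamma * u4 + u1)) - u2 - c * (gamma - 1) * A @ u4
    u6 = A @ (u1 + u4) - u3 - u5 / c
    return np.concatenate([u4, u5, u6])

u0 = np.array([-10.0, 10.0, -10.0, 10.0, 0.0, 0.0])   # (x0, y0, z0)
sol = solve_ivp(Gamma, (0.0, 100.0), u0, rtol=1e-6, atol=1e-9)
assert np.linalg.norm(sol.y[0:2, -1]) < 0.1   # x(t) approaches the solution (0,0)
```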

As we will see later in Theorem 12, the asymptotic convergence of the trajectories as the time tends to infinity can be proved when $\tau c\|A\|^2<1$. Since $\|A\|^2=2$, we considered for $\tau c$ three different choices, namely, $0.49$, $0.25$ and $0.1$. The primal and the dual trajectories generated by the dynamical system for each of these three choices are represented in the figures 1, 2 and 3, respectively. The first row of each figure represents the primal trajectories for different choices of the parameter $\gamma$, while the second row represents the dual trajectories for the same choices of $\gamma$.

One can see that the parameter $\gamma$ plays in the dynamical system a regularizing role. Namely, in all three figures, thus somehow independently of the choice of the parameters $\tau$ and $c$, the convergence behaviour of the primal trajectories, which approach the unique primal optimal solution, is more stable when $\gamma$ gets closer to one endpoint of the interval $[0,1]$. For the dual trajectories we can observe a reverse phenomenon: in all three figures, thus also independently of the choice of the other parameters, the dual trajectories, which approach a dual optimal solution, are more stable when $\gamma$ gets closer to the opposite endpoint.

Figure 3: First row: the primal trajectory $x(t)$ approaching the primal optimal solution $(0,0)$ for $\tau c=0.1$ and starting point $x_0=(-10,10)$. Second row: the dual trajectory $y(t)$ approaching a dual optimal solution for $\tau c=0.1$ and starting point $y_0=(-10,10)$.
###### Notations.

The following two functions will play an important role in the forthcoming analysis:

$$F:[0,+\infty)\times H\to\overline{\mathbb{R}},\qquad F(t,x)=f(x)+\frac{c}{2}\big(\|Ax\|^2-\|x\|^2\big)+\frac{1}{2}\|x\|^2_{M_1(t)},$$

and

$$G:[0,+\infty)\times G\to\overline{\mathbb{R}},\qquad G(t,z)=g(z)+\frac{1}{2}\|z\|^2_{M_2(t)}.$$

With these two notations, the dynamical system (4) can be rewritten as

$$\begin{cases}\dot{x}(t)+x(t)\in\operatorname*{argmin}_{x\in H}\Big(F(t,x)+\frac{c}{2}\big\|x-\big(\frac{1}{c}M_1(t)x(t)+A^*z(t)-\frac{A^*}{c}y(t)-\frac{1}{c}\nabla h(x(t))\big)\big\|^2\Big)\\[2mm] \dot{z}(t)+z(t)=\operatorname*{argmin}_{z\in G}\Big(G(t,z)+\frac{c}{2}\big\|z-\big(\frac{1}{c}M_2(t)z(t)+A(\gamma\dot{x}(t)+x(t))+\frac{1}{c}y(t)\big)\big\|^2\Big)\\[2mm] \dot{y}(t)=cA(x(t)+\dot{x}(t))-c(z(t)+\dot{z}(t))\\ x(0)=x_0\in H,\ y(0)=y_0\in G,\ z(0)=z_0\in G.\end{cases}\tag{14}$$

Let $t\in[0,+\infty)$ be fixed. The function $G(t,\cdot)$ is proper, convex and lower semicontinuous, hence $z\mapsto G(t,z)+\frac{c}{2}\|z-v\|^2$ is proper, strongly convex and lower semicontinuous for every $v\in G$. This allows us to use the equality sign in the second relation of (14). On the other hand, a sufficient condition which guarantees that the function $x\mapsto F(t,x)+\frac{c}{2}\|x-u\|^2$, which is proper and lower semicontinuous, is strongly convex is that there exists $\alpha(t)>0$ such that $cA^*A+M_1(t)\in P_{\alpha(t)}(H)$. This actually ensures that $x\mapsto F(t,x)+\frac{c}{2}\|x-u\|^2$ is proper, strongly convex and lower semicontinuous for every $u\in H$.

This means that if the assumption

$$(C_{\mathrm{weak}})\qquad\text{for every }t\in[0,+\infty)\text{ there exists }\alpha(t)>0\text{ such that }cA^*A+M_1(t)\in P_{\alpha(t)}(H)$$

holds, then we can also use the equality sign in the first relation of (14). It is easy to see that, if $(C_{\mathrm{weak}})$ holds, then $\partial f+cA^*A+M_1(t)$ is $\alpha(t)$-strongly monotone for every $t\in[0,+\infty)$. In other words, for every $t\in[0,+\infty)$, all $(u,u^*)$ and $(x,x^*)$ in the graph of $\partial f+cA^*A+M_1(t)$ satisfy

$$\langle u^*-x^*,u-x\rangle\ge\alpha(t)\|u-x\|^2.$$

Notice that, since $cA^*A\in S_+(H)$ and $M_1(t)\in S_+(H)$ for every $t\in[0,+\infty)$, $(C_{\mathrm{weak}})$ is fulfilled, if

$$\text{for every }t\in[0,+\infty)\text{ there exists }\alpha(t)>0\text{ such that }M_1(t)\in P_{\alpha(t)}(H)\tag{15}$$

or, if

$$\text{there exists }\alpha>0\text{ such that }cA^*A\in P_{\alpha}(H).\tag{16}$$

Notice also that, if $H$ is a finite-dimensional Hilbert space, then (16), which is independent of $t$, is nothing else than saying that $A^*A$ is positive definite or, equivalently, that $A$ is injective.

Let $S:=\{x\in H:\|x\|=1\}$ be the unit sphere of $H$. Assumption $(C_{\mathrm{weak}})$ is fulfilled if and only if $\inf_{x\in S}\big(c\|Ax\|^2+\|x\|^2_{M_1(t)}\big)>0$ for every $t\in[0,+\infty)$. In this case we can take $\alpha(t)=\inf_{x\in S}\big(c\|Ax\|^2+\|x\|^2_{M_1(t)}\big)$ for every $t\in[0,+\infty)$.

## 2 Existence and uniqueness of the trajectories

In this section we will investigate the existence and uniqueness of the trajectories generated by (4). We start by recalling the definition of a locally absolutely continuous map.

###### Definition 1.

A function $x:[0,+\infty)\to H$ is said to be locally absolutely continuous, if it is absolutely continuous on every interval $[0,T]$ with $T>0$; that is, for every $T>0$ there exists an integrable function $y:[0,T]\to H$ such that

$$x(t)=x(0)+\int_0^t y(s)\,ds\qquad\forall t\in[0,T].$$
###### Remark 3.

(a) Every absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere and one can recover the function from its derivative by the above integration formula.

(b) Let $T>0$ and $x:[0,T]\to H$ be an absolutely continuous function. This is equivalent to (see [6, 2]): for every $\varepsilon>0$ there exists $\eta>0$ such that, for any finite family of intervals $I_j=(a_j,b_j)\subseteq[0,T]$, the following property holds:

$$\text{for any subfamily of disjoint intervals }I_j\text{ with }\sum_j|b_j-a_j|<\eta\text{ it holds }\sum_j\|x(b_j)-x(a_j)\|<\varepsilon.$$

From this characterization it is easy to see that, if $x$ is absolutely continuous and $B:H\to H$ is $L$-Lipschitz continuous with $L\ge0$, then the function $B\circ x$ is absolutely continuous, too. This means that $B\circ x$ is differentiable almost everywhere and $\big\|\tfrac{d}{dt}(B\circ x)(t)\big\|\le L\|\dot{x}(t)\|$ holds almost everywhere.

The following definition specifies which type of solutions we consider in the analysis of the dynamical system (4).

###### Definition 2.

Let $c>0$, $\gamma\in[0,1]$, $M_1:[0,+\infty)\to S_+(H)$, $M_2:[0,+\infty)\to S_+(G)$, and $x_0\in H$ and $z_0,y_0\in G$. We say that the function $(x,z,y):[0,+\infty)\to H\times G\times G$ is a strong global solution of (4), if the following properties are satisfied:

1. the functions $x$, $z$ and $y$ are locally absolutely continuous;

2. for almost every $t\in[0,+\infty)$

$$\begin{aligned}\dot{x}(t)+x(t)&\in\big(\partial f+cA^*A+M_1(t)\big)^{-1}\big(M_1(t)x(t)+cA^*z(t)-A^*y(t)-\nabla h(x(t))\big),\\ \dot{z}(t)+z(t)&\in\big(\partial g+cI+M_2(t)\big)^{-1}\big(M_2(t)z(t)+cA(\gamma\dot{x}(t)+x(t))+y(t)\big),\\ \dot{y}(t)&=cA(x(t)+\dot{x}(t))-c(z(t)+\dot{z}(t));\end{aligned}$$

3. $x(0)=x_0$, $z(0)=z_0$ and $y(0)=y_0$.

The following results will be useful in the proof of the existence and uniqueness theorem.

###### Lemma 1.

Assume that $(C_{\mathrm{weak}})$ holds. Then, for every fixed $t\in[0,+\infty)$, the operator

$$S_t:H\to H,\qquad S_t(u)=\operatorname*{argmin}_{x\in H}\Big(F(t,x)+\frac{c}{2}\|x-u\|^2\Big),$$

is Lipschitz continuous.

###### Proof.

Let $t\in[0,+\infty)$ be fixed and $u,v\in H$. By subdifferential calculus we obtain that

$$cu\in\partial f(S_t(u))+\big(cA^*A+M_1(t)\big)(S_t(u))$$

and

$$cv\in\partial f(S_t(v))+\big(cA^*A+M_1(t)\big)(S_t(v)).$$

Using that, due to $(C_{\mathrm{weak}})$, the operator $\partial f+cA^*A+M_1(t)$ is $\alpha(t)$-strongly monotone, we get

$$\alpha(t)\|S_t(u)-S_t(v)\|^2\le c\langle u-v,S_t(u)-S_t(v)\rangle$$