# The Proximal Alternating Minimization Algorithm for two-block separable convex optimization problems with linear constraints

Abstract. The Alternating Minimization Algorithm (AMA) was proposed by Tseng to solve convex programming problems with two-block separable linear constraints and objectives, whereby (at least) one of the components of the latter is assumed to be strongly convex. One of the subproblems to be solved in each AMA iteration usually does not reduce to the evaluation of a proximal operator through a closed formula, which affects the implementability of the algorithm. In this paper we allow in each block of the objective a further smooth convex function and propose a proximal version of AMA, called Proximal AMA, which is obtained by equipping the algorithm with proximal terms induced by variable metrics. For suitable choices of the latter, solving the two subproblems in the iterative scheme reduces to the computation of proximal operators. We investigate the convergence of the proposed algorithm in a real Hilbert space setting and illustrate its numerical performance on two applications in image processing and machine learning.

Key Words. Proximal AMA, Lagrangian, saddle points, subdifferential, convex optimization, Fenchel duality

AMS subject classification. 47H05, 65K05, 90C25

## 1 Introduction and preliminaries

The Alternating Minimization Algorithm (AMA) has been proposed by Tseng (see [16]) in order to solve optimization problems of the form

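$$\inf_{x \in \mathbb{R}^n,\, z \in \mathbb{R}^m} f(x) + g(z) \quad \text{s.t.} \quad Ax + Bz = b, \tag{1}$$

where $f: \mathbb{R}^n \to \overline{\mathbb{R}}$ is a proper, $\gamma$-strongly convex with $\gamma > 0$ (this means that $f - \frac{\gamma}{2}\|\cdot\|^2$ is convex) and lower semicontinuous function, $g: \mathbb{R}^m \to \overline{\mathbb{R}}$ is a proper, convex and lower semicontinuous function, $A \in \mathbb{R}^{r \times n}$, $B \in \mathbb{R}^{r \times m}$ and $b \in \mathbb{R}^r$.

For $c > 0$ we consider the augmented Lagrangian associated with problem (1)

$$L_c: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^r \to \overline{\mathbb{R}}, \quad L_c(x,z,p) = f(x) + g(z) + \langle p, b - Ax - Bz \rangle + \frac{c}{2}\|Ax + Bz - b\|^2.$$

The Lagrangian associated with problem (1) is

$$L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^r \to \overline{\mathbb{R}}, \quad L(x,z,p) = f(x) + g(z) + \langle p, b - Ax - Bz \rangle.$$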

###### Algorithm 1.

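(AMA) Choose $p^0 \in \mathbb{R}^r$ and a sequence of stepsizes $(c_k)_{k \geq 0} \subseteq (0, +\infty)$. For all $k \geq 0$ set:

\begin{align}
x^k &= \operatorname*{argmin}_{x \in \mathbb{R}^n} \left\{ f(x) - \langle p^k, Ax \rangle \right\} \tag{2} \\
z^k &\in \operatorname*{argmin}_{z \in \mathbb{R}^m} \left\{ g(z) - \langle p^k, Bz \rangle + \frac{c_k}{2}\|Ax^k + Bz - b\|^2 \right\} \tag{3} \\
p^{k+1} &= p^k + c_k(b - Ax^k - Bz^k). \tag{4}
\end{align}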
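To make the three steps concrete, here is a minimal sketch of AMA on a toy instance chosen for illustration (it is not taken from the original text): $f(x) = \frac{1}{2}\|x - a\|^2$ (so $\gamma = 1$), $g = \lambda\|\cdot\|_1$, $A = \mathrm{Id}$, $B = -\mathrm{Id}$, $b = 0$. The names `ama_toy` and `soft_threshold` are hypothetical. Because $B = -\mathrm{Id}$ here, step (3) happens to have a closed form; for a general $B$ it would not.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ama_toy(a, lam, c=0.5, iters=200):
    """AMA sketch for min 0.5*||x - a||^2 + lam*||z||_1  s.t.  x - z = 0.

    Assumed setting: A = Id, B = -Id, b = 0, gamma = 1, so any constant
    stepsize c strictly between 0 and 2 satisfies the stepsize rule.
    """
    p = np.zeros_like(a)
    for _ in range(iters):
        x = a + p                               # (2): minimizer of f(x) - <p, x>
        z = soft_threshold(x - p / c, lam / c)  # (3): closed form since B = -Id
        p = p + c * (z - x)                     # (4): b - A x - B z = z - x
    return x, z, p

a = np.array([3.0, 0.5, -2.0])
x, z, p = ama_toy(a, lam=1.0)
```

For this instance the constraint forces $x = z$, so the problem is equivalent to minimizing $\frac{1}{2}\|x - a\|^2 + \lambda\|x\|_1$, whose solution is the soft-thresholding of $a$ at level $\lambda$; the iterates can be compared against that value.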

The main convergence properties of this numerical algorithm are summarized in the theorem below (see [16]).

###### Theorem 2.

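Let $A \neq 0$ and $(x, z) \in \operatorname{ri}(\operatorname{dom} f) \times \operatorname{ri}(\operatorname{dom} g)$ be such that $Ax + Bz = b$. Assume that the sequence of stepsizes $(c_k)_{k \geq 0}$ satisfies

$$\epsilon \leq c_k \leq \frac{2\gamma}{\|A\|^2} - \epsilon \quad \forall k \geq 0,$$

where $0 < \epsilon < \frac{\gamma}{\|A\|^2}$. Let $(x^k, z^k, p^k)_{k \geq 0}$ be the sequence generated by Algorithm 1. Then there exist $x^* \in \mathbb{R}^n$ and an optimal Lagrange multiplier $p^* \in \mathbb{R}^r$ associated with the constraint $Ax + Bz = b$ such that

$$x^k \to x^*, \quad Bz^k \to b - Ax^*, \quad p^k \to p^* \quad (k \to +\infty).$$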

If the function $z \mapsto g(z) + \|Bz\|^2$ has bounded level sets, then $(z^k)_{k \geq 0}$ is bounded and any of its cluster points $z^*$ provides with $(x^*, z^*)$ an optimal solution of (1).

The strong convexity of $f$ allows one to reduce the minimization problem in (2) to the calculation of the proximal operator of a proper, convex and lower semicontinuous function. Due to the presence of the linear operator $B$, this is in general not the case for the minimization problem in (3), which makes AMA hard to implement in practice. With the exception of some very particular cases, one has to use a subroutine in order to compute $z^k$, a fact which can have a negative influence on the convergence behaviour of the algorithm. One possibility to avoid this, without losing the convergence properties of AMA, is to replace (3) by a proximal step of $g$. The papers [3] and [9] provide convincing evidence for the versatility and efficiency of proximal point algorithms for solving nonsmooth convex optimization problems.

In this paper we address, in a real Hilbert space setting, a problem of type (1) which is obtained by adding in each block of the objective a further smooth convex function. To solve this problem we propose a so-called Proximal Alternating Minimization Algorithm (Proximal AMA), which is obtained by introducing in each of the minimization problems (2) and (3) additional proximal terms defined by means of positive semidefinite operators. The two smooth convex functions in the objective are evaluated via gradient steps. We will show that, for appropriate choices of these operators, the minimization problem in (3) reduces to performing a proximal step. We carry out the convergence analysis of the proposed method and show that the generated sequence converges weakly to a saddle point of the Lagrangian associated with the optimization problem under investigation. The numerical performance of Proximal AMA, in particular in comparison with AMA, is illustrated on two applications in image processing and machine learning.

A similarity of AMA to the classical ADMM algorithm, introduced by Gabay and Mercier in [12], is evident. In [10, 15] (see also [1, 5]) proximal versions of the ADMM algorithm have been proposed and investigated from the point of view of their convergence properties. Parts of the convergence analysis for Proximal AMA are carried out in a similar spirit to the convergence proofs in these papers.

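In the remainder of this section, we discuss some notations, definitions and basic properties we will use in this paper (see [2]). Let $H$ and $G$ be real Hilbert spaces with corresponding inner products $\langle \cdot, \cdot \rangle$ and associated norms $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$. In both spaces we denote by $\rightharpoonup$ the weak convergence and by $\to$ the strong convergence.

We say that a function $f: H \to \overline{\mathbb{R}}$ is proper, if $\operatorname{dom} f := \{x \in H : f(x) < +\infty\} \neq \emptyset$ and $f(x) > -\infty$ for all $x \in H$. Let

$$\Gamma(H) = \{ f: H \to \overline{\mathbb{R}} : f \text{ is proper, convex and lower semicontinuous} \}.$$

Let $f \in \Gamma(H)$. The (Fenchel) conjugate function $f^*: H \to \overline{\mathbb{R}}$ of $f$ is defined as

$$f^*(p) = \sup_{x \in H} \{ \langle p, x \rangle - f(x) \} \quad \forall p \in H$$

and is a proper, convex and lower semicontinuous function. It also holds $f^{**} = f$, where $f^{**}$ is the conjugate function of $f^*$. The (convex) subdifferential of $f$ is defined as $\partial f(x) = \{u \in H : f(y) \geq f(x) + \langle u, y - x \rangle \ \forall y \in H\}$, if $f(x) \in \mathbb{R}$, and as $\partial f(x) = \emptyset$, otherwise.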
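As a small worked example (an illustration added here, not part of the original text), take $H = \mathbb{R}$ and $f = |\cdot|$. Both objects can be computed directly from the definitions of the conjugate and the subdifferential:

```latex
f^*(p) = \sup_{x \in \mathbb{R}} \{ px - |x| \} =
\begin{cases} 0, & |p| \leq 1, \\ +\infty, & |p| > 1, \end{cases}
\qquad
\partial f(x) =
\begin{cases} \{\operatorname{sign}(x)\}, & x \neq 0, \\ [-1, 1], & x = 0. \end{cases}
```

So $f^*$ is the indicator function of $[-1, 1]$, and conjugating once more gives $f^{**}(x) = \sup_{|p| \leq 1} px = |x|$, in line with $f^{**} = f$.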

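The infimal convolution of two proper functions $f, g: H \to \overline{\mathbb{R}}$ is the function $f \Box g: H \to \overline{\mathbb{R}}$, defined by $(f \Box g)(x) = \inf_{y \in H} \{ f(y) + g(x - y) \}$.

The proximal point operator of parameter $\gamma$ of $f$ at $x$, where $\gamma > 0$, is defined as

$$\operatorname{Prox}_{\gamma f}: H \to H, \quad \operatorname{Prox}_{\gamma f}(x) = \operatorname*{argmin}_{y \in H} \left\{ \gamma f(y) + \frac{1}{2}\|y - x\|^2 \right\}.$$

According to Moreau's decomposition formula we have

$$\operatorname{Prox}_{\gamma f}(x) + \gamma \operatorname{Prox}_{(1/\gamma) f^*}(\gamma^{-1}x) = x \quad \forall x \in H.$$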
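The decomposition can be checked numerically in a simple case. The sketch below assumes $H = \mathbb{R}^n$ and $f = \|\cdot\|_1$, for which $\operatorname{Prox}_{\gamma f}$ is componentwise soft-thresholding and $f^*$ is the indicator of the unit $\ell_\infty$ ball, so that the proximal operator of any positive multiple of $f^*$ is the projection onto that ball. The function names are hypothetical.

```python
import numpy as np

def prox_l1(x, gamma):
    """Prox of gamma * ||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_conj_l1(x, gamma):
    """Prox of gamma * f^*, where f^* is the indicator of the unit l_inf ball.

    The result is the projection onto [-1, 1]^n and does not depend on gamma.
    """
    return np.clip(x, -1.0, 1.0)

def moreau_residual(x, gamma):
    """Left-hand side minus right-hand side of Moreau's decomposition."""
    return prox_l1(x, gamma) + gamma * prox_conj_l1(x / gamma, 1.0 / gamma) - x

x = np.array([-3.0, -0.2, 0.0, 0.5, 2.0])
residual = moreau_residual(x, gamma=0.7)  # expected to vanish up to rounding
```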

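Let $C \subseteq H$ be a convex and closed set. The strong quasi-relative interior of $C$ is

$$\operatorname{sqri}(C) = \left\{ x \in C : \bigcup_{\lambda > 0} \lambda(C - x) \text{ is a closed linear subspace of } H \right\}.$$

We always have $\operatorname{int}(C) \subseteq \operatorname{sqri}(C)$ and, if $H$ is finite dimensional, then $\operatorname{sqri}(C) = \operatorname{ri}(C)$, where $\operatorname{ri}(C)$ denotes the relative interior of $C$, that is, the interior of $C$ relative to its affine hull.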

We set

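$$S_+(H) = \{ M: H \to H : M \text{ is linear, continuous, self-adjoint and positive semidefinite} \}.$$

For $M \in S_+(H)$ we define the seminorm $\|\cdot\|_M: H \to [0, +\infty)$, $\|x\|_M = \sqrt{\langle x, Mx \rangle}$. We consider the Loewner partial ordering on $S_+(H)$, defined for $M_1, M_2 \in S_+(H)$ by

$$M_1 \succcurlyeq M_2 \Leftrightarrow \|x\|_{M_1}^2 \geq \|x\|_{M_2}^2 \quad \forall x \in H.$$

Furthermore, we define for $\alpha > 0$

$$P_\alpha(H) := \{ M \in S_+(H) : M \succcurlyeq \alpha \mathrm{Id} \},$$

where $\mathrm{Id}: H \to H$, $\mathrm{Id}(x) = x$ for all $x \in H$, denotes the identity operator on $H$.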
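In finite dimensions these conditions can be tested through eigenvalues: for symmetric matrices, $M_1 \succcurlyeq M_2$ holds exactly when $M_1 - M_2$ is positive semidefinite. A minimal sketch, assuming symmetric NumPy arrays as inputs; the helper names and the tolerance `tol` are choices made for this illustration.

```python
import numpy as np

def loewner_geq(M1, M2, tol=1e-12):
    """True if M1 >= M2 in the Loewner order, i.e. M1 - M2 is positive semidefinite.

    Both inputs are assumed to be symmetric matrices of the same size.
    """
    return bool(np.linalg.eigvalsh(M1 - M2).min() >= -tol)

def in_P_alpha(M, alpha, tol=1e-12):
    """True if the symmetric matrix M satisfies M >= alpha * Id."""
    return loewner_geq(M, alpha * np.eye(M.shape[0]), tol)

M = np.array([[2.0, 0.0], [0.0, 1.0]])
check_one = in_P_alpha(M, 1.0)      # smallest eigenvalue of M is 1
check_large = in_P_alpha(M, 1.5)    # fails: M - 1.5*Id has a negative eigenvalue
```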

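Let $A: H \to G$ be a linear continuous operator. The operator $A^*: G \to H$, fulfilling $\langle A^*y, x \rangle = \langle y, Ax \rangle$ for all $x \in H$ and $y \in G$, denotes the adjoint operator of $A$, while $\|A\| := \sup\{ \|Ax\| : \|x\| \leq 1 \}$ denotes the norm of $A$.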

## 2 The Proximal Alternating Minimization Algorithm

The two-block separable optimization problem we are going to investigate has the following formulation.

###### Problem 3.

Let $H$, $G$ and $K$ be real Hilbert spaces, $f \in \Gamma(H)$ $\gamma$-strongly convex with $\gamma > 0$, $g \in \Gamma(G)$, $h_1: H \to \mathbb{R}$ a convex and Fréchet differentiable function with $L_1$-Lipschitz continuous gradient, $L_1 \geq 0$, $h_2: G \to \mathbb{R}$ a convex and Fréchet differentiable function with $L_2$-Lipschitz continuous gradient, $L_2 \geq 0$, $A: H \to K$ and $B: G \to K$ linear continuous operators such that $A \neq 0$ and $b \in K$. Consider the following optimization problem with two-block separable objective function and linear constraints

$$\min_{x \in H,\, z \in G} f(x) + h_1(x) + g(z) + h_2(z) \quad \text{s.t.} \quad Ax + Bz = b. \tag{5}$$

Notice that we allow the Lipschitz constant of the gradient of the function $h_1$ to be zero. In this case $h_1$ is an affine function. The same applies to the function $h_2$.

The Lagrangian associated with the optimization problem (5) is

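$$L: H \times G \times K \to \overline{\mathbb{R}}, \quad L(x,z,p) = f(x) + h_1(x) + g(z) + h_2(z) + \langle p, b - Ax - Bz \rangle.$$

We say that $(x^*, z^*, p^*) \in H \times G \times K$ is a saddle point of the Lagrangian $L$, if

$$L(x^*, z^*, p) \leq L(x^*, z^*, p^*) \leq L(x, z, p^*)$$

holds for all $(x, z, p) \in H \times G \times K$.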

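One can show that $(x^*, z^*, p^*)$ is a saddle point of the Lagrangian $L$ if and only if $(x^*, z^*)$ is an optimal solution of (5), $p^*$ is an optimal solution of its Fenchel dual problem

$$\sup_{\lambda \in K} \{ -(f^* \Box h_1^*)(A^*\lambda) - (g^* \Box h_2^*)(B^*\lambda) + \langle \lambda, b \rangle \}, \tag{6}$$

and the optimal objective values of (5) and (6) coincide. The existence of saddle points for $L$ is guaranteed when (5) has an optimal solution and, for instance, the Attouch-Brézis-type condition

$$b \in \operatorname{sqri}(A(\operatorname{dom} f) + B(\operatorname{dom} g)) \tag{7}$$

holds (see [4, Theorem 3.4]). In the finite dimensional setting, this asks for the existence of $x \in \operatorname{ri}(\operatorname{dom} f)$ and $z \in \operatorname{ri}(\operatorname{dom} g)$ satisfying $Ax + Bz = b$ and coincides with the assumption used by Tseng in [16].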

The system of optimality conditions for the primal-dual pair of optimization problems (5)-(6) reads:

$$A^*p^* - \nabla h_1(x^*) \in \partial f(x^*), \quad B^*p^* - \nabla h_2(z^*) \in \partial g(z^*) \quad \text{and} \quad Ax^* + Bz^* = b. \tag{8}$$

This means that if (5) has an optimal solution $(x^*, z^*)$ and a qualification condition, like for instance (7), is fulfilled, then there exists an optimal solution $p^*$ of (6) such that (8) holds; consequently, $(x^*, z^*, p^*)$ is a saddle point of the Lagrangian $L$. Conversely, if $(x^*, z^*, p^*)$ is a saddle point of the Lagrangian $L$, and thus satisfies relation (8), then $(x^*, z^*)$ is an optimal solution of (5) and $p^*$ is an optimal solution of (6).

###### Remark 4.

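If $(x_1^*, z_1^*, p_1^*)$ and $(x_2^*, z_2^*, p_2^*)$ are two saddle points of the Lagrangian $L$, then $x_1^* = x_2^*$. This follows easily by using the strong monotonicity of $\partial f$, the monotonicity of $\partial g$, $\nabla h_1$ and $\nabla h_2$, and the relations in (8).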

In the following we formulate the Proximal Alternating Minimization Algorithm to solve (5). To this end, we modify Tseng's AMA by evaluating in each of the two subproblems the functions $h_1$ and $h_2$ via gradient steps, respectively, and by introducing proximal terms defined through two sequences of positive semidefinite operators $(M_1^k)_{k \geq 0}$ and $(M_2^k)_{k \geq 0}$.

###### Algorithm 5.

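(Proximal AMA) Let $(M_1^k)_{k \geq 0} \subseteq S_+(H)$ and $(M_2^k)_{k \geq 0} \subseteq S_+(G)$. Choose $(x^0, z^0, p^0) \in H \times G \times K$ and a sequence of stepsizes $(c_k)_{k \geq 0} \subseteq (0, +\infty)$. For all $k \geq 0$ set:

\begin{align}
x^{k+1} &= \operatorname*{argmin}_{x \in H} \left\{ f(x) - \langle p^k, Ax \rangle + \langle x - x^k, \nabla h_1(x^k) \rangle + \frac{1}{2}\|x - x^k\|_{M_1^k}^2 \right\} \tag{9} \\
z^{k+1} &\in \operatorname*{argmin}_{z \in G} \left\{ g(z) - \langle p^k, Bz \rangle + \frac{c_k}{2}\|Ax^{k+1} + Bz - b\|^2 + \langle z - z^k, \nabla h_2(z^k) \rangle + \frac{1}{2}\|z - z^k\|_{M_2^k}^2 \right\} \tag{10} \\
p^{k+1} &= p^k + c_k(b - Ax^{k+1} - Bz^{k+1}). \tag{11}
\end{align}

###### Remark 6.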

The sequence $(z^k)_{k \geq 0}$ is uniquely determined if there exists $\alpha_k > 0$ such that $c_k B^*B + M_2^k \in P_{\alpha_k}(G)$ for all $k \geq 0$. This actually ensures that the objective function in the subproblem (10) is strongly convex.

###### Remark 7.

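Let $k \geq 0$ be fixed and $M_2^k := \frac{1}{\sigma_k}\mathrm{Id} - c_k B^*B$, where $\sigma_k > 0$ and $\sigma_k c_k \|B\|^2 \leq 1$. Then $M_2^k$ is positive semidefinite and the update of $z^{k+1}$ in the Proximal AMA method becomes a proximal step. Indeed, (10) holds if and only if

$$0 \in \partial g(z^{k+1}) + (c_k B^*B + M_2^k)z^{k+1} + c_k B^*(Ax^{k+1} - b) - M_2^k z^k + \nabla h_2(z^k) - B^*p^k$$

or, equivalently,

$$0 \in \partial g(z^{k+1}) + \frac{1}{\sigma_k}z^{k+1} - \left( \frac{1}{\sigma_k}\mathrm{Id} - c_k B^*B \right)z^k + \nabla h_2(z^k) + c_k B^*(Ax^{k+1} - b) - B^*p^k.$$

But this is nothing else than

$$\begin{aligned}
z^{k+1} &= \operatorname*{argmin}_{z \in G} \left\{ g(z) + \frac{1}{2\sigma_k}\left\| z - \left( z^k - \sigma_k \nabla h_2(z^k) + \sigma_k c_k B^*(b - Ax^{k+1} - Bz^k) + \sigma_k B^*p^k \right) \right\|^2 \right\} \\
&= \operatorname{Prox}_{\sigma_k g}\left( z^k - \sigma_k \nabla h_2(z^k) + \sigma_k c_k B^*(b - Ax^{k+1} - Bz^k) + \sigma_k B^*p^k \right).
\end{aligned}$$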
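The proximal form of the $z$-update can be sketched on a small illustrative instance (an assumption of this example, not an experiment from the paper): $f(x) = \frac{1}{2}\|x - a\|^2$ with $\gamma = 1$, $g = \lambda\|\cdot\|_1$, $h_1 = h_2 = 0$, $A = \mathrm{Id}$, $B = -D$, $b = 0$, $M_1^k = 0$ and $M_2^k = \frac{1}{\sigma}\mathrm{Id} - cB^*B$ with constant $c$ and $\sigma$. The function name `proximal_ama_lasso` and the factor `0.9` used to keep $\sigma c \|B\|^2 < 1$ are hypothetical choices.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_ama_lasso(D, a, lam, c=1.0, iters=500):
    """Proximal AMA sketch for min 0.5*||x - a||^2 + lam*||z||_1  s.t.  x - D z = 0.

    Assumed setting: A = Id, B = -D, b = 0, h1 = h2 = 0, M1 = 0 and
    M2 = (1/sigma) Id - c B^T B, with c in (0, 2) since gamma = ||A|| = 1.
    """
    B = -D
    sigma = 0.9 / (c * np.linalg.norm(D, 2) ** 2)  # ensures sigma * c * ||B||^2 < 1
    z = np.zeros(D.shape[1])
    p = np.zeros(D.shape[0])
    for _ in range(iters):
        x = a + p                                   # (9) with M1 = 0, h1 = 0, A = Id
        w = z + sigma * c * B.T @ (-x - B @ z) + sigma * B.T @ p  # argument of the prox, b = 0
        z = soft_threshold(w, sigma * lam)          # z-update as Prox_{sigma g}(w)
        p = p + c * (-x - B @ z)                    # (11) with b = 0
    return x, z, p

D = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
a = np.array([3.0, 1.0, 5.0])
x, z, p = proximal_ama_lasso(D, a, lam=1.0)
```

Eliminating $x = Dz$ turns this instance into $\min_z \frac{1}{2}\|Dz - a\|^2 + \lambda\|z\|_1$; with the diagonal-like $D$ above the minimizer can be computed by hand as $z^* = (2, 0.25)$, which gives a reference value for the iterates.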

The convergence of the Proximal AMA method is addressed in the next theorem.

###### Theorem 8.

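In the setting of Problem 3 let the set of the saddle points of the Lagrangian $L$ be nonempty. Assume that $M_1^k - \frac{L_1}{2}\mathrm{Id} \in S_+(H)$, $M_2^k - \frac{L_2}{2}\mathrm{Id} \in S_+(G)$, $M_2^k \succcurlyeq M_2^{k+1}$ for all $k \geq 0$ and that $(c_k)_{k \geq 0}$ is a monotonically decreasing sequence satisfying

$$\epsilon \leq c_k \leq \frac{2\gamma}{\|A\|^2} - \epsilon \quad \forall k \geq 0, \tag{12}$$

where $0 < \epsilon < \frac{\gamma}{\|A\|^2}$. If one of the following assumptions:

1. there exists $\alpha > 0$ such that $M_2^k - \frac{L_2}{2}\mathrm{Id} \in P_\alpha(G)$ for all $k \geq 0$;

2. there exists $\beta > 0$ such that $B^*B \in P_\beta(G)$;

holds true, then the sequence $(x^k, z^k, p^k)_{k \geq 0}$ generated by Algorithm 5 converges weakly to a saddle point of the Lagrangian $L$.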

###### Proof.

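Let $(x^*, z^*, p^*)$ be a fixed saddle point of the Lagrangian $L$. This means that it fulfils the system of optimality conditions

\begin{align}
A^*p^* - \nabla h_1(x^*) &\in \partial f(x^*) \tag{13} \\
B^*p^* - \nabla h_2(z^*) &\in \partial g(z^*) \tag{14} \\
Ax^* + Bz^* &= b. \tag{15}
\end{align}

We start by proving that

$$\sum_{k \geq 0} \|x^{k+1} - x^*\|^2 < +\infty, \quad \sum_{k \geq 0} \|Bz^{k+1} - Bz^*\|^2 < +\infty, \quad \sum_{k \geq 0} \|z^{k+1} - z^k\|_{M_2^k - \frac{L_2}{2}\mathrm{Id}}^2 < +\infty$$

and that the sequences $(z^k)_{k \geq 0}$ and $(p^k)_{k \geq 0}$ are bounded.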

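Assume that $L_1 > 0$ and $L_2 > 0$. Let $k \geq 0$ be fixed. Writing the optimality conditions for the subproblems (9) and (10) we obtain

$$A^*p^k - \nabla h_1(x^k) + M_1^k(x^k - x^{k+1}) \in \partial f(x^{k+1}) \tag{16}$$

and

$$B^*p^k - \nabla h_2(z^k) + c_k B^*(-Ax^{k+1} - Bz^{k+1} + b) + M_2^k(z^k - z^{k+1}) \in \partial g(z^{k+1}), \tag{17}$$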

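respectively. Combining (13), (14), (16), (17) with the strong monotonicity of $\partial f$ and the monotonicity of $\partial g$, we obtain

$$\langle A^*(p^k - p^*) - \nabla h_1(x^k) + \nabla h_1(x^*) + M_1^k(x^k - x^{k+1}), x^{k+1} - x^* \rangle \geq \gamma \|x^{k+1} - x^*\|^2$$

and

$$\langle B^*(p^k - p^*) - \nabla h_2(z^k) + \nabla h_2(z^*) + c_k B^*(-Ax^{k+1} - Bz^{k+1} + b) + M_2^k(z^k - z^{k+1}), z^{k+1} - z^* \rangle \geq 0,$$

which, after summation, lead to

$$\begin{aligned}
&\langle p^k - p^*, Ax^{k+1} - Ax^* \rangle + \langle p^k - p^*, Bz^{k+1} - Bz^* \rangle \\
&+ \langle c_k(-Ax^{k+1} - Bz^{k+1} + b), Bz^{k+1} - Bz^* \rangle \\
&- \langle \nabla h_1(x^k) - \nabla h_1(x^*), x^{k+1} - x^* \rangle - \langle \nabla h_2(z^k) - \nabla h_2(z^*), z^{k+1} - z^* \rangle \\
&+ \langle M_1^k(x^k - x^{k+1}), x^{k+1} - x^* \rangle + \langle M_2^k(z^k - z^{k+1}), z^{k+1} - z^* \rangle \\
&\geq \gamma \|x^{k+1} - x^*\|^2.
\end{aligned} \tag{18}$$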

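According to the Baillon-Haddad Theorem (see [2, Corollary 18.16]) the gradients of $h_1$ and $h_2$ are $\frac{1}{L_1}$- and $\frac{1}{L_2}$-cocoercive, respectively, thus

$$\begin{aligned}
\langle \nabla h_1(x^*) - \nabla h_1(x^k), x^* - x^k \rangle &\geq \frac{1}{L_1}\|\nabla h_1(x^*) - \nabla h_1(x^k)\|^2 \\
\langle \nabla h_2(z^*) - \nabla h_2(z^k), z^* - z^k \rangle &\geq \frac{1}{L_2}\|\nabla h_2(z^*) - \nabla h_2(z^k)\|^2.
\end{aligned}$$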
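For intuition, the cocoercivity inequality can be checked numerically for a convex quadratic. The sketch assumes $h(x) = \frac{1}{2}x^\top Q x$ with $Q$ symmetric positive semidefinite and nonzero, so that $\nabla h(x) = Qx$ and the Lipschitz constant of the gradient is the largest eigenvalue of $Q$; the function name `cocoercivity_gap` is hypothetical.

```python
import numpy as np

def cocoercivity_gap(Q, x, y):
    """<grad h(x) - grad h(y), x - y> - (1/L) * ||grad h(x) - grad h(y)||^2
    for h(x) = 0.5 * x^T Q x, with L the largest eigenvalue of Q.

    Q is assumed symmetric positive semidefinite and nonzero; the returned
    value is then nonnegative up to rounding.
    """
    L = np.linalg.eigvalsh(Q).max()
    d = Q @ (x - y)  # gradient difference
    return float(d @ (x - y) - (d @ d) / L)

Q = np.array([[2.0, 1.0], [1.0, 3.0]])
gap = cocoercivity_gap(Q, np.array([1.0, -2.0]), np.array([0.5, 4.0]))
```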

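On the other hand, by taking into account (11) and (15), it holds:

$$\begin{aligned}
\langle p^k - p^*, Ax^{k+1} - Ax^* \rangle + \langle p^k - p^*, Bz^{k+1} - Bz^* \rangle &= \langle p^k - p^*, Ax^{k+1} + Bz^{k+1} - b \rangle \\
&= \frac{1}{c_k}\langle p^k - p^*, p^k - p^{k+1} \rangle.
\end{aligned}$$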

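By employing the last three relations in (18), we obtain

$$\begin{aligned}
&\frac{1}{c_k}\langle p^k - p^*, p^k - p^{k+1} \rangle + c_k \langle -Ax^{k+1} - Bz^{k+1} + b, Bz^{k+1} - Bz^* \rangle \\
&+ \langle M_1^k(x^k - x^{k+1}), x^{k+1} - x^* \rangle + \langle M_2^k(z^k - z^{k+1}), z^{k+1} - z^* \rangle \\
&+ \langle \nabla h_1(x^*) - \nabla h_1(x^k), x^{k+1} - x^* \rangle + \langle \nabla h_1(x^*) - \nabla h_1(x^k), x^* - x^k \rangle \\
&- \frac{1}{L_1}\|\nabla h_1(x^*) - \nabla h_1(x^k)\|^2 + \langle \nabla h_2(z^*) - \nabla h_2(z^k), z^{k+1} - z^* \rangle \\
&+ \langle \nabla h_2(z^*) - \nabla h_2(z^k), z^* - z^k \rangle - \frac{1}{L_2}\|\nabla h_2(z^*) - \nabla h_2(z^k)\|^2 \\
&\geq \gamma \|x^{k+1} - x^*\|^2,
\end{aligned}$$

which, after expressing the inner products by means of norms, becomes

$$\begin{aligned}
&\frac{1}{2c_k}\left( \|p^k - p^*\|^2 + \|p^k - p^{k+1}\|^2 - \|p^{k+1} - p^*\|^2 \right) \\
&+ \frac{c_k}{2}\left( \|Ax^* - Ax^{k+1}\|^2 - \|b - Ax^{k+1} - Bz^{k+1}\|^2 - \|Ax^* + Bz^{k+1} - b\|^2 \right) \\
&+ \frac{1}{2}\left( \|x^k - x^*\|_{M_1^k}^2 - \|x^k - x^{k+1}\|_{M_1^k}^2 - \|x^{k+1} - x^*\|_{M_1^k}^2 \right) \\
&+ \frac{1}{2}\left( \|z^k - z^*\|_{M_2^k}^2 - \|z^k - z^{k+1}\|_{M_2^k}^2 - \|z^{k+1} - z^*\|_{M_2^k}^2 \right) \\
&+ \langle \nabla h_1(x^*) - \nabla h_1(x^k), x^{k+1} - x^k \rangle - \frac{1}{L_1}\|\nabla h_1(x^*) - \nabla h_1(x^k)\|^2 \\
&+ \langle \nabla h_2(z^*) - \nabla h_2(z^k), z^{k+1} - z^k \rangle - \frac{1}{L_2}\|\nabla h_2(z^*) - \nabla h_2(z^k)\|^2 \\
&\geq \gamma \|x^{k+1} - x^*\|^2.
\end{aligned}$$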

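Using again (11), the inequality $\|Ax^* - Ax^{k+1}\|^2 \leq \|A\|^2 \|x^* - x^{k+1}\|^2$ and the following expressions

$$\begin{aligned}
&\langle \nabla h_1(x^*) - \nabla h_1(x^k), x^{k+1} - x^k \rangle - \frac{1}{L_1}\|\nabla h_1(x^*) - \nabla h_1(x^k)\|^2 \\
&= -L_1 \left\| \frac{1}{L_1}\left( \nabla h_1(x^*) - \nabla h_1(x^k) \right) + \frac{1}{2}(x^k - x^{k+1}) \right\|^2 + \frac{L_1}{4}\|x^k - x^{k+1}\|^2,
\end{aligned}$$

and

$$\begin{aligned}
&\langle \nabla h_2(z^*) - \nabla h_2(z^k), z^{k+1} - z^k \rangle - \frac{1}{L_2}\|\nabla h_2(z^*) - \nabla h_2(z^k)\|^2 \\
&= -L_2 \left\| \frac{1}{L_2}\left( \nabla h_2(z^*) - \nabla h_2(z^k) \right) + \frac{1}{2}(z^k - z^{k+1}) \right\|^2 + \frac{L_2}{4}\|z^k - z^{k+1}\|^2,
\end{aligned}$$

we obtain

$$\begin{aligned}
&\frac{1}{2c_k}\|p^{k+1} - p^*\|^2 + \frac{1}{2}\|z^{k+1} - z^*\|_{M_2^k}^2 \\
&\leq \frac{1}{2c_k}\|p^k - p^*\|^2 + \frac{1}{2}\|z^k - z^*\|_{M_2^k}^2 - \frac{c_k}{2}\|Ax^* + Bz^{k+1} - b\|^2 \\
&\quad - \frac{1}{2}\|z^k - z^{k+1}\|_{M_2^k}^2 - \left( \gamma - \frac{c_k}{2}\|A\|^2 \right)\|x^{k+1} - x^*\|^2 - \frac{1}{2}\|x^k - x^{k+1}\|_{M_1^k}^2 \\
&\quad - L_1 \left\| \frac{1}{L_1}\left( \nabla h_1(x^*) - \nabla h_1(x^k) \right) + \frac{1}{2}(x^k - x^{k+1}) \right\|^2 + \frac{L_1}{4}\|x^k - x^{k+1}\|^2 \\
&\quad - L_2 \left\| \frac{1}{L_2}\left( \nabla h_2(z^*) - \nabla h_2(z^k) \right) + \frac{1}{2}(z^k - z^{k+1}) \right\|^2 + \frac{L_2}{4}\|z^k - z^{k+1}\|^2.
\end{aligned}$$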

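Finally, by using the monotonicity of $(c_k)_{k \geq 0}$ and of $(M_2^k)_{k \geq 0}$, we obtain

$$\|p^{k+1} - p^*\|^2 + c_{k+1}\|z^{k+1} - z^*\|_{M_2^{k+1}}^2 \leq \|p^k - p^*\|^2 + c_k\|z^k - z^*\|_{M_2^k}^2 - R_k, \tag{19}$$

where

$$\begin{aligned}
R_k := {} & c_k(2\gamma - c_k\|A\|^2)\|x^{k+1} - x^*\|^2 + c_k^2\|Bz^{k+1} - Bz^*\|^2 \\
&+ c_k\|z^k - z^{k+1}\|_{M_2^k - \frac{L_2}{2}\mathrm{Id}}^2 + c_k\|x^k - x^{k+1}\|_{M_1^k - \frac{L_1}{2}\mathrm{Id}}^2 \\
&+ 2c_k L_1 \left\| \frac{1}{L_1}\left( \nabla h_1(x^*) - \nabla h_1(x^k) \right) + \frac{1}{2}(x^k - x^{k+1}) \right\|^2 \\
&+ 2c_k L_2 \left\| \frac{1}{L_2}\left( \nabla h_2(z^*) - \nabla h_2(z^k) \right) + \frac{1}{2}(z^k - z^{k+1}) \right\|^2.
\end{aligned}$$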

If $L_1 = 0$ (and, consequently, $\nabla h_1$ is constant) and $L_2 > 0$, then, by using the same arguments, we obtain again (19), but with

 Rk:= ck(2γ−ck∥A∥2)∥xk+1−x∗∥2+c2k∥Bzk+1−Bz∗∥2+ ck∥zk−zk+1∥2Mk2−L22Id+ck∥xk−xk+1∥2Mk1+ 2