
# Solving Coupled Composite Monotone Inclusions by Successive Fejér Approximations of Their Kuhn-Tucker Set

Received by the editors XXX XX, 2013; accepted for publication XXX XX, 2014; published electronically DATE.

Abdullah Alotaibi King Abdulaziz University, Department of Mathematics, Jeddah 21859, Saudi Arabia (aalotaibi@kau.edu.sa)    Patrick L. Combettes    Sorbonne Universités – UPMC Univ. Paris 06, UMR 7598, Laboratoire Jacques-Louis Lions, F-75005, Paris, France (plc@ljll.math.upmc.fr)    Naseer Shahzad King Abdulaziz University, Department of Mathematics, Jeddah 21859, Saudi Arabia (nshahzad@kau.edu.sa)
###### Abstract

We propose a new class of primal-dual Fejér monotone algorithms for solving systems of composite monotone inclusions. Our construction is inspired by a framework used by Eckstein and Svaiter for the basic problem of finding a zero of the sum of two monotone operators. At each iteration, points in the graph of the monotone operators present in the model are used to construct a half-space containing the Kuhn-Tucker set associated with the system. The primal-dual update is then obtained via a relaxed projection of the current iterate onto this half-space. An important feature that distinguishes the resulting splitting algorithms from existing ones is that they do not require prior knowledge of bounds on the linear operators involved or the inversion of linear operators.

Keywords: duality, Fejér monotonicity, monotone inclusion, monotone operator, primal-dual algorithm, splitting algorithm

AMS subject classifications: Primary 47H05; Secondary 65K05, 90C25, 94A08

## 1 Introduction

The first monotone operator splitting methods arose in the late 1970s and were motivated by applications in mechanics and partial differential equations [32, 35, 39]. In recent years, the field of monotone operator splitting algorithms has benefited from a new impetus, fueled by emerging application areas such as signal and image processing, statistics, optimal transport, machine learning, and domain decomposition methods [3, 5, 24, 36, 41, 43, 46]. Three main algorithms dominate the field explicitly or implicitly: the forward-backward method [38], the Douglas-Rachford method [37], and the forward-backward-forward method [47]. These methods were originally designed to solve inclusions of the type 0 ∈ Ax + Bx, where A and B are maximally monotone operators acting on a Hilbert space H (via product space reformulations, they can also be extended to problems involving sums of more than two operators [9, 45]). Until recently, a significant challenge in the field was to design splitting techniques for inclusions involving linearly composed operators, say

 (1)  0 ∈ Ax + L*BLx,

where A: H → 2^H and B: G → 2^G are maximally monotone operators acting on real Hilbert spaces H and G, respectively, and L is a bounded linear operator from H to G. In the case when A and B are subdifferentials, say A = ∂f and B = ∂g, where f: H → ]−∞,+∞] and g: G → ]−∞,+∞] are lower semicontinuous convex functions satisfying a suitable constraint qualification, (1) corresponds to the minimization problem

 (2)  minimize_{x ∈ H}  f(x) + g(Lx).

The Fenchel-Rockafellar dual of this problem is

 (3)  minimize_{v* ∈ G}  f*(−L*v*) + g*(v*)

and the associated Kuhn-Tucker set is

 (4)  Z = {(x, v*) ∈ H ⊕ G ∣ −L*v* ∈ ∂f(x) and Lx ∈ ∂g*(v*)}.

The importance of this set is discussed extensively in [44], notably in connection with the fact that Kuhn-Tucker points provide solutions to (2) and (3). To the best of our knowledge, the first splitting method for composite problems of the form (1) is that proposed in [16], which was developed around the following formulation.

###### Problem \thetheorem

Let H and G be real Hilbert spaces, and set K = H ⊕ G. Let A: H → 2^H and B: G → 2^G be maximally monotone operators, and let L: H → G be a bounded linear operator. Consider the inclusion problem

 (5)  find x̄ ∈ H such that 0 ∈ Ax̄ + L*BLx̄,

the dual problem

 (6)  find v̄* ∈ G such that 0 ∈ −LA^{−1}(−L*v̄*) + B^{−1}v̄*,

and the associated Kuhn-Tucker set

 (7)  Z = {(x, v*) ∈ K ∣ −L*v* ∈ Ax and Lx ∈ B^{−1}v*}.

The problem is to find a point in Z. The sets of solutions to (5) and (6) are denoted by 𝒫 and 𝒟, respectively.

The Kuhn-Tucker set (7) is a natural extension of (4) to general monotone operators. In [16], a point in Z was obtained by applying the forward-backward-forward method to a suitably decomposed inclusion in K (the use of Douglas-Rachford splitting was also discussed there). Subsequently, the idea of using traditional splitting techniques to find Kuhn-Tucker points was further exploited in a variety of settings, e.g., [1, 12, 14, 23, 25, 26, 48]. Despite their broad range of applicability, existing splitting methods suffer from two shortcomings that preclude their use in certain settings. Thus, a shortcoming of splitting methods based on the forward-backward-forward [16, 25] or the forward-backward algorithms [2, 26, 48] is that they require knowledge of the norm ‖L‖; this is also true for the Douglas-Rachford-based method of [14]. On the other hand, a shortcoming of splitting methods based on the Douglas-Rachford [16, Remark 2.9] or Spingarn [1] algorithms is that they require the inversion of linear operators, as does [12, Algorithm 3]. In some applications, however, ‖L‖ cannot be evaluated reliably and the inversion of linear operators is not numerically feasible. As will be seen in Section 4, this issue becomes particularly acute when dealing with systems of coupled monotone inclusions, which constitute the main motivation for our investigation.

Our objective is to devise a new class of algorithms for solving Problem 1 that alleviate the above-mentioned shortcomings of existing methods. Our approach is inspired by an original splitting framework proposed in [28] for solving the basic inclusion (see also [29] for the extension to the sum of several operators)

 (8)  0 ∈ Ax + Bx.

The main idea of [28] is to use points in the graphs of A and B to construct a sequence of Fejér approximations to the so-called extended solution set

 (9)  {(x, v*) ∈ H ⊕ H ∣ −v* ∈ Ax and v* ∈ Bx}

and to iterate by projection onto these successive approximations. This extended solution set is actually nothing but the specialization of the Kuhn-Tucker set (7) to the case when G = H and L = Id. This construction led to novel splitting methods for solving (8) that do not seem to derive from the traditional methods mentioned above. In the present paper, we extend it significantly beyond (8) in order to design new primal-dual splitting algorithms for Problem 1.

The paper is organized as follows. Preliminary results are established in Section 2 and algorithms for solving Problem 1 are developed in Section 3. These results are then used in Section 4 to solve systems of composite monotone inclusions in duality.

Notation. The scalar product of a Hilbert space is denoted by ⟨· ∣ ·⟩ and the associated norm by ‖·‖. The symbols ⇀ and → denote, respectively, weak and strong convergence, and Id denotes the identity operator. Let H and G be real Hilbert spaces, let 2^H be the power set of H, and let A: H → 2^H. We denote by ran A the range of A, by gra A the graph of A, and by A^{−1} the inverse of A, which is defined through its graph gra A^{−1} = {(u*, x) ∈ H × H ∣ (x, u*) ∈ gra A}. The resolvent of A is J_A = (Id + A)^{−1}. We say that A is monotone if

 (10)  (∀(x, u) ∈ gra A)(∀(y, v) ∈ gra A)  ⟨x − y ∣ u − v⟩ ⩾ 0,

and maximally monotone if there exists no monotone operator B: H → 2^H such that gra A is properly contained in gra B. In this case, J_A is firmly nonexpansive and defined everywhere on H. The Hilbert direct sum of H and G is denoted by H ⊕ G. The projection operator onto a nonempty closed convex subset C of H is denoted by P_C. The necessary background on convex analysis and monotone operators will be found in [9].
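To illustrate the notions just defined, consider A = ∂|·| on the real line: its resolvent J_{γA} = (Id + γA)^{−1} is the soft-thresholding map, and, as for any maximally monotone operator, this resolvent is firmly nonexpansive. The following minimal Python sketch (our own illustration; the names are not from the paper) checks this numerically on a grid.

```python
# Illustration: the resolvent J_{gamma*A} = (Id + gamma*A)^{-1} of A = d|.| on R
# is the soft-thresholding map; resolvents of maximally monotone operators are
# firmly nonexpansive. (A sanity check of our own, not part of the paper.)

def soft(z, gamma):
    """Resolvent of gamma * (subdifferential of |.|) evaluated at z."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

gamma = 1.0
pts = [i / 4.0 for i in range(-20, 21)]
for zx in pts:
    for zy in pts:
        jx, jy = soft(zx, gamma), soft(zy, gamma)
        # firm nonexpansiveness: |Jx - Jy|^2 <= <Jx - Jy | x - y>
        assert (jx - jy) ** 2 <= (jx - jy) * (zx - zy) + 1e-12
```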

## 2 Preliminary results

We first investigate some basic properties of Problem 1, starting with the fact that Kuhn-Tucker points automatically provide primal and dual solutions.

{proposition}

In the setting of Problem 1, the following hold:

1. Z is a closed convex subset of K.

2. Z ⊂ 𝒫 × 𝒟, where 𝒫 and 𝒟 denote the sets of solutions to (5) and (6), respectively.

{proof}

A fundamental concept in algorithmic nonlinear analysis is that of Fejér monotonicity: a sequence (x_n)_{n∈ℕ} in a Hilbert space H is said to be Fejér monotone with respect to a set C ⊂ H if

 (11)  (∀z ∈ C)(∀n ∈ ℕ)  ‖x_{n+1} − z‖ ⩽ ‖x_n − z‖.

Alternatively (see [8, Section 2]), (x_n)_{n∈ℕ} is Fejér monotone with respect to C if, for every n ∈ ℕ, x_{n+1} is a relaxed projection of x_n onto a closed affine half-space H_n containing C, i.e.,

 (12)  (∀n ∈ ℕ)  x_{n+1} = x_n + λ_n(P_{H_n}x_n − x_n), where 0 ⩽ λ_n ⩽ 2 and C ⊂ H_n.

The half-spaces (H_n)_{n∈ℕ} in (12) are called Fejér approximations to C. The Fejér monotonicity property (11) makes it possible to greatly simplify the analysis of the asymptotic behavior of a broad class of algorithms; see [7, 9, 21, 22, 30, 31] for background, examples, and historical notes.
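For intuition, the relaxed projection update (12) is elementary to realize in a finite-dimensional setting once the half-space is given by a normal vector s and an offset η; the sketch below (an illustration under our own naming, not an algorithm from the paper) implements one such step.

```python
# Illustrative sketch of the relaxed projection step (12) in R^d.
# The names proj_halfspace and fejer_step are ours, not from the paper.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def proj_halfspace(x, s, eta):
    """Project x onto H = {z : <z | s> <= eta} (cf. the explicit formula (20))."""
    gap = dot(x, s) - eta
    if gap <= 0 or dot(s, s) == 0:     # x already lies in H
        return list(x)
    c = gap / dot(s, s)
    return [xi - c * si for xi, si in zip(x, s)]

def fejer_step(x, s, eta, lam):
    """One relaxed projection x + lam*(P_H x - x), with 0 <= lam <= 2."""
    p = proj_halfspace(x, s, eta)
    return [xi + lam * (pi - xi) for xi, pi in zip(x, p)]

# Example: H = {z in R^2 : z_1 <= 1}; the origin lies in H, so the
# distance from the iterate to the origin cannot increase.
x = [2.0, 0.0]
p = proj_halfspace(x, [1.0, 0.0], 1.0)    # lands on the boundary z_1 = 1
x1 = fejer_step(x, [1.0, 0.0], 1.0, 1.5)  # overrelaxed step toward H
```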

In the following proposition, we consider the problem of constructing a Fejér approximation to the Kuhn-Tucker set (7).

{proposition}

In the setting of Problem 1, for every (a, a*) ∈ gra A and every (b, b*) ∈ gra B, set

 (13)  H_{a,b} = {x ∈ K ∣ ⟨x ∣ s*_{a,b}⟩ ⩽ η_{a,b}}, where s*_{a,b} = (a* + L*b*, b − La) and η_{a,b} = ⟨a ∣ a*⟩ + ⟨b ∣ b*⟩.

Then the following hold:

1. Let (a, a*) ∈ gra A and (b, b*) ∈ gra B. Then [s*_{a,b} = 0 ⇒ ((a, b*) ∈ Z, η_{a,b} = 0, and H_{a,b} = K)].

2. Let (a, a*) ∈ gra A and (b, b*) ∈ gra B. Then Z ⊂ H_{a,b}.

3. Z = ⋂_{(a,a*) ∈ gra A, (b,b*) ∈ gra B} H_{a,b}.

4. Let (a, a*) ∈ gra A, (b, b*) ∈ gra B, and (x, v*) ∈ K. Set s* = a* + L*b*, t = b − La, and σ = √(‖s*‖² + ‖t‖²); if σ > 0, set Δ = (⟨x ∣ s*⟩ + ⟨t ∣ v*⟩ − η_{a,b})/σ. Then

 (14)  P_{H_{a,b}}(x, v*) = (x − (Δ/σ)s*, v* − (Δ/σ)t) if σ > 0 and Δ > 0; (x, v*) otherwise.
{proof}

(i): Suppose that s*_{a,b} = 0. Then a* = −L*b* and b = La. Hence, (7) implies that (a, b*) ∈ Z. In addition,

 (15)  η_{a,b} = ⟨a ∣ a*⟩ + ⟨b ∣ b*⟩ = ⟨a ∣ −L*b*⟩ + ⟨La ∣ b*⟩ = −⟨La ∣ b*⟩ + ⟨La ∣ b*⟩ = 0

and therefore H_{a,b} = K.

(ii): Suppose that (x, v*) ∈ Z. Then (x, −L*v*) ∈ gra A and, by monotonicity of A,

 (16)  ⟨a − x ∣ a* + L*v*⟩ ⩾ 0.

Likewise, since (Lx, v*) ∈ gra B, we have

 (17)  ⟨b − Lx ∣ b* − v*⟩ ⩾ 0.

Using (16) and (17), we obtain

 (18)  ⟨(x, v*) ∣ s*_{a,b}⟩ = ⟨x ∣ a* + L*b*⟩ + ⟨b − La ∣ v*⟩
       = ⟨x ∣ a* + L*v*⟩ + ⟨Lx ∣ b* − v*⟩ + ⟨b − Lx ∣ v*⟩ + ⟨x − a ∣ L*v*⟩
       = ⟨x − a ∣ a* + L*v*⟩ + ⟨a ∣ a*⟩ + ⟨La ∣ v*⟩ + ⟨Lx − b ∣ b* − v*⟩ + ⟨b ∣ b*⟩ − ⟨b ∣ v*⟩ + ⟨b − Lx ∣ v*⟩ + ⟨x − a ∣ L*v*⟩
       ⩽ ⟨a ∣ a*⟩ + ⟨La − b ∣ v*⟩ + ⟨b ∣ b*⟩ + ⟨b − Lx ∣ v*⟩ + ⟨x − a ∣ L*v*⟩
       = ⟨a ∣ a*⟩ + ⟨b ∣ b*⟩
       = η_{a,b}.

Thus, (x, v*) ∈ H_{a,b}.

(iii): By (ii), Z ⊂ ⋂_{(a,a*) ∈ gra A, (b,b*) ∈ gra B} H_{a,b}. Conversely, fix (a, a*) ∈ gra A and (b, b*) ∈ gra B, and let (x, v*) ∈ ⋂_{(a,a*) ∈ gra A, (b,b*) ∈ gra B} H_{a,b}. Then ⟨(x, v*) ∣ s*_{a,b}⟩ ⩽ η_{a,b} and therefore

 (19)  ⟨(a, b*) − (x, v*) ∣ (a*, b) − (−L*v*, Lx)⟩ = ⟨(a − x, b* − v*) ∣ (a* + L*v*, b − Lx)⟩ = ⟨a − x ∣ a* + L*v*⟩ + ⟨b − Lx ∣ b* − v*⟩ = η_{a,b} − ⟨(x, v*) ∣ s*_{a,b}⟩ ⩾ 0.

Now set M: K → 2^K: (x, y*) ↦ (Ax) × (B^{−1}y*). Then, since ((a, b*), (a*, b)) is an arbitrary point in gra M and since [9, Propositions 20.22 and 20.23] imply that M is maximally monotone, we derive from (19) that ((x, v*), (−L*v*, Lx)) ∈ gra M, i.e., that (x, v*) ∈ Z.

(iv): Let (x, v*) ∈ K. As seen in (i), if s*_{a,b} = 0, then η_{a,b} = 0 and H_{a,b} = K. Hence P_{H_{a,b}}(x, v*) = (x, v*) and (14) holds. Otherwise, it follows from [9, Example 28.16] that

 (20)  P_{H_{a,b}}(x, v*) = (x, v*) − [(⟨(x, v*) ∣ s*_{a,b}⟩ − η_{a,b})/‖s*_{a,b}‖²] s*_{a,b} if ⟨(x, v*) ∣ s*_{a,b}⟩ > η_{a,b}; (x, v*) otherwise.

In view of (13), the proof is complete.

###### Remark \thetheorem
1. The fact that Z is closed and convex (Proposition 2(i)) is also apparent in Proposition 2(iii), which exhibits Z as an intersection of closed affine half-spaces.

2. The inclusion Z ⊂ H_{a,b} (Proposition 2(ii)) will play a key role in the paper. This construction is inspired by that of [28, Lemma 3], where G = H and L = Id.

Our analysis will require the following asymptotic principle, which is of interest in its own right.

{proposition}

In the setting of Problem 1, let (a_n, a*_n)_{n∈ℕ} be a sequence in gra A, let (b_n, b*_n)_{n∈ℕ} be a sequence in gra B, and let (x̄, v̄*) ∈ K. Suppose that a_n ⇀ x̄, b*_n ⇀ v̄*, a*_n + L*b*_n → 0, and La_n − b_n → 0. Then (x̄, v̄*) ∈ Z and ⟨a_n ∣ a*_n⟩ + ⟨b_n ∣ b*_n⟩ → 0. {proof} Define

 (21)  V = {(x, y) ∈ K ∣ Lx = y}.

Then

 (22)  V^⊥ = {(u*, v*) ∈ K ∣ u* = −L*v*}.

Now set

 (23)  𝑨: K → 2^K: (x, y) ↦ (Ax) × (By).

We deduce from (7) that, for every (x, v*) ∈ K,

 (24)  (x, v*) ∈ Z ⇔ [(x, −L*v*) ∈ gra A and (Lx, v*) ∈ gra B] ⇔ (𝒙, 𝒖) = ((x, Lx), (−L*v*, v*)) ∈ (V × V^⊥) ∩ gra 𝑨.

On the other hand, [1, Lemma 3.1] asserts that

 (25)  (∀(x, y) ∈ K)  P_V(x, y) = ((Id + L*L)^{−1}(x + L*y), L(Id + L*L)^{−1}(x + L*y)) and P_{V^⊥}(x, y) = (L*(Id + LL*)^{−1}(Lx − y), −(Id + LL*)^{−1}(Lx − y)).

Now set

 (26)  𝒙̄ = (x̄, Lx̄), 𝒖̄ = (−L*v̄*, v̄*), and (∀n ∈ ℕ) 𝒙_n = (a_n, b_n) and 𝒖_n = (a*_n, b*_n).

Since La_n − b_n → 0 and a*_n + L*b*_n → 0, we derive from (25) that P_{V^⊥}𝒙_n → 0 and P_V𝒖_n → 0. Altogether, since L and L* are weakly continuous, the assumptions yield

 (27)  (∀n ∈ ℕ) (𝒙_n, 𝒖_n) ∈ gra 𝑨,  𝒙_n ⇀ 𝒙̄,  𝒖_n ⇀ 𝒖̄,  P_{V^⊥}𝒙_n → 0,  P_V𝒖_n → 0.

However, (27) and [9, Proposition 25.3] imply that

 (28)  ⟨𝒙_n ∣ 𝒖_n⟩ → 0 and (𝒙̄, 𝒖̄) ∈ (V × V^⊥) ∩ gra 𝑨.

In view of (24), the proof is complete.

###### Remark \thetheorem

In the special case when G = H and L = Id, Proposition 2 reduces to [6, Corollary 3] (see also [9, Corollary 25.5] for an alternate proof). The decomposition K = V ⊕ V^⊥, where V is as in (21), is used in [1] in a different context.
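To make the projection formulas (25) concrete, consider the scalar case H = G = ℝ with L: x ↦ ℓx, so that V = {(x, ℓx)} and V^⊥ = {(−ℓv, v)}. The following sketch (our own illustration, not part of the paper) evaluates both formulas and checks that they decompose a point as P_V + P_{V^⊥}.

```python
# Scalar instance of the projection formulas (25): H = G = R, L = l (a number).
# Here V = {(x, y) : l*x = y} and V_perp = {(u, v) : u = -l*v}; cf. (21)-(22).

def proj_V(x, y, l):
    w = (x + l * y) / (1.0 + l * l)     # (Id + L*L)^{-1}(x + L*y)
    return (w, l * w)

def proj_Vperp(x, y, l):
    r = (l * x - y) / (1.0 + l * l)     # (Id + LL*)^{-1}(Lx - y)
    return (l * r, -r)

l = 2.0
x, y = 3.0, 1.0
pv = proj_V(x, y, l)        # lies in V: second component is l times the first
pw = proj_Vperp(x, y, l)    # lies in V_perp: first component is -l times the second
# Orthogonal decomposition: the two projections sum back to (x, y).
assert abs(pv[0] + pw[0] - x) < 1e-12 and abs(pv[1] + pw[1] - y) < 1e-12
```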

## 3 Finding Kuhn-Tucker points by Fejér approximations

In view of Proposition 2(i), Problem 1 reduces to finding a point in a nonempty closed convex subset of a Hilbert space. This can be achieved via the following generic Fejér-monotone algorithm.

{proposition}

[22] Let H be a real Hilbert space, let C be a nonempty closed convex subset of H, and let x_0 ∈ H. Iterate

 (29)  for n = 0, 1, …:
         H_n is a closed affine half-space such that C ⊂ H_n
         λ_n ∈ ]0, 2[
         x_{n+1} = x_n + λ_n(P_{H_n}x_n − x_n).

Then the following hold:

1. (x_n)_{n∈ℕ} is Fejér monotone with respect to C: (∀z ∈ C)(∀n ∈ ℕ) ‖x_{n+1} − z‖ ⩽ ‖x_n − z‖.

2. ∑_{n∈ℕ} λ_n(2 − λ_n)‖P_{H_n}x_n − x_n‖² < +∞.

3. Suppose that, for every x ∈ H and every strictly increasing sequence (k_n)_{n∈ℕ} in ℕ, x_{k_n} ⇀ x ⇒ x ∈ C. Then (x_n)_{n∈ℕ} converges weakly to a point in C.

We now derive from the above convergence principle a conceptual primal-dual splitting framework.

{proposition}

Consider the setting of Problem 1. Suppose that Z ≠ ∅, let x_0 ∈ H, let v*_0 ∈ G, and iterate

 (30)  for n = 0, 1, …:
         (a_n, a*_n) ∈ gra A
         (b_n, b*_n) ∈ gra B
         s*_n = a*_n + L*b*_n
         t_n = b_n − La_n
         σ_n = √(‖s*_n‖² + ‖t_n‖²)
         if σ_n = 0:
           x̄ = a_n; v̄* = b*_n; terminate.
         if σ_n > 0:
           λ_n ∈ ]0, 2[
           Δ_n = max{0, (⟨x_n ∣ s*_n⟩ + ⟨t_n ∣ v*_n⟩ − ⟨a_n ∣ a*_n⟩ − ⟨b_n ∣ b*_n⟩)/σ_n}
           θ_n = λ_nΔ_n/σ_n
           x_{n+1} = x_n − θ_n s*_n
           v*_{n+1} = v*_n − θ_n t_n.

Then either (30) terminates at a solution (x̄, v̄*) ∈ Z in a finite number of iterations or it generates infinite sequences (x_n)_{n∈ℕ} and (v*_n)_{n∈ℕ} such that the following hold:

1. ((x_n, v*_n))_{n∈ℕ} is Fejér monotone with respect to Z.

2. ∑_{n∈ℕ} λ_n(2 − λ_n)Δ_n² < +∞.

3. Suppose that, for every x ∈ H, every v* ∈ G, and every strictly increasing sequence (k_n)_{n∈ℕ} in ℕ,

 (31)  [x_{k_n} ⇀ x and v*_{k_n} ⇀ v*] ⇒ (x, v*) ∈ Z.

Then (x_n)_{n∈ℕ} converges weakly to a point x̄ ∈ H, (v*_n)_{n∈ℕ} converges weakly to a point v̄* ∈ G, and (x̄, v̄*) ∈ Z.

{proof}

We first observe that, by Proposition 2, Z is nonempty, closed, and convex. Two alternatives are possible. First, suppose that, for some n ∈ ℕ, σ_n = 0. Then Proposition 2(i) asserts that the algorithm terminates at (x̄, v̄*) = (a_n, b*_n) ∈ Z. Now suppose that (∀n ∈ ℕ) σ_n > 0. For every n ∈ ℕ, set

 (32)  𝒙_n = (x_n, v*_n), 𝒔*_n = (s*_n, t_n), and η_n = ⟨a_n ∣ a*_n⟩ + ⟨b_n ∣ b*_n⟩,

and define

 (33)  H_n = {𝒙 ∈ K ∣ ⟨𝒙 ∣ 𝒔*_n⟩ ⩽ η_n}.

Then we derive from (30) and Proposition 2(ii) that (∀n ∈ ℕ) Z ⊂ H_n. On the other hand, Proposition 2(iv) implies that

 (34)  (∀n ∈ ℕ)  Δ_n = ‖P_{H_n}𝒙_n − 𝒙_n‖ and 𝒙_{n+1} = 𝒙_n + λ_n(P_{H_n}𝒙_n − 𝒙_n).

Thus, the conclusions follow from Proposition 2(i) and Proposition 3.

At the nth iteration of algorithm (30), one picks a quadruple (a_n, b_n, a*_n, b*_n) such that (a_n, a*_n) ∈ gra A and (b_n, b*_n) ∈ gra B. In the following corollary, this quadruple is taken in a more restricted set adapted to the current primal-dual iterate (x_n, v*_n), which leads to more explicit convergence conditions.

{corollary}

Consider the setting of Problem 1. Suppose that Z ≠ ∅, let α ∈ ]0, +∞[, let ε ∈ ]0, 1[, let x_0 ∈ H, and let v*_0 ∈ G. For every (x, v*) ∈ K, set

 (35)  G_α(x, v*) = {(a, b, a*, b*) ∈ K × K ∣ (a, a*) ∈ gra A, (b, b*) ∈ gra B, and ⟨x − a ∣ a* + L*v*⟩ + ⟨Lx − b ∣ b* − v*⟩ ⩾ α(‖a* + L*b*‖² + ‖La − b‖²)}.

Iterate

 (36)  for n = 0, 1, …:
         (a_n, b_n, a*_n, b*_n) ∈ G_α(x_n, v*_n)
         s*_n = a*_n + L*b*_n
         t_n = b_n − La_n
         τ_n = ‖s*_n‖² + ‖t_n‖²
         if τ_n = 0:
           x̄ = a_n; v̄* = b*_n; terminate.
         if τ_n > 0:
           λ_n ∈ [ε, 2 − ε]
           θ_n = λ_n(⟨x_n ∣ s*_n⟩ + ⟨t_n ∣ v*_n⟩ − ⟨a_n ∣ a*_n⟩ − ⟨b_n ∣ b*_n⟩)/τ_n
           x_{n+1} = x_n − θ_n s*_n
           v*_{n+1} = v*_n − θ_n t_n.

Then either (36) terminates at a solution (x̄, v̄*) ∈ Z in a finite number of iterations or it generates infinite sequences (x_n)_{n∈ℕ} and (v*_n)_{n∈ℕ} such that the following hold:

1. ∑_{n∈ℕ} ‖s*_n‖² < +∞ and ∑_{n∈ℕ} ‖t_n‖² < +∞.

2. ∑_{n∈ℕ} ‖x_{n+1} − x_n‖² < +∞ and ∑_{n∈ℕ} ‖v*_{n+1} − v*_n‖² < +∞.

3. Suppose that

 (37)  x_n − a_n ⇀ 0 and v*_n − b*_n ⇀ 0.

Then (x_n)_{n∈ℕ} converges weakly to a point x̄ ∈ H, (v*_n)_{n∈ℕ} converges weakly to a point v̄* ∈ G, and (x̄, v̄*) ∈ Z.

{proof}

This corollary is an application of Proposition 3. To see this, let (x, v*) ∈ K. First, to show that the algorithm is well defined, we must prove that G_α(x, v*) ≠ ∅. Since Z ≠ ∅, we can fix (z, w*) ∈ Z and set (a, a*) = (z, −L*w*) and (b, b*) = (Lz, w*). Then (7) yields (a, a*) ∈ gra A and (b, b*) ∈ gra B. Moreover,

 (38)  ⟨x − a ∣ a* + L*v*⟩ + ⟨Lx − b ∣ b* − v*⟩ = −⟨x − a ∣ L*(b* − v*)⟩ + ⟨L(x − a) ∣ b* − v*⟩ = 0 = α(‖a* + L*b*‖² + ‖La − b‖²).

Hence (a, b, a*, b*) ∈ G_α(x, v*), and (36) is therefore well defined. Next, to show that (36) is a special case of (30), it is enough to consider the case when (∀n ∈ ℕ) τ_n > 0. Note that (36) yields

 (39)  (∀n ∈ ℕ)  ⟨x_n ∣ s*_n⟩ + ⟨t_n ∣ v*_n⟩ − ⟨a_n ∣ a*_n⟩ − ⟨b_n ∣ b*_n⟩ = ⟨x_n − a_n ∣ a*_n + L*v*_n⟩ + ⟨Lx_n − b_n ∣ b*_n − v*_n⟩ ⩾ α(‖a*_n + L*b*_n‖² + ‖La_n − b_n‖²) = ατ_n > 0.

In turn, if we define (Δ_n)_{n∈ℕ} as in (30), we obtain

 (40)  (∀n ∈ ℕ)  Δ_n = (⟨x_n ∣ s*_n⟩ + ⟨t_n ∣ v*_n⟩ − ⟨a_n ∣ a*_n⟩ − ⟨b_n ∣ b*_n⟩)/√τ_n ⩾ α√τ_n > 0.

Hence (36) is a special case of (30). Moreover, it follows from (40) and Proposition 3(ii) that

 (41)  ∑_{n∈ℕ}(‖s*_n‖² + ‖t_n‖²) = ∑_{n∈ℕ} τ_n ⩽ (1/α²)∑_{n∈ℕ} Δ_n² ⩽ (1/(αε)²)∑_{n∈ℕ} λ_n(2 − λ_n)Δ_n² < +∞,

which establishes (i). On the other hand, (ii) results from (36) and (41) since

 (42)  ∑_{n∈ℕ}(‖x_{n+1} − x_n‖² + ‖v*_{n+1} − v*_n‖²) = ∑_{n∈ℕ} θ_n²τ_n = ∑_{n∈ℕ} λ_n²Δ_n² ⩽ (2 − ε)²∑_{n∈ℕ} Δ_n² < +∞.

Finally, to prove (iii), it remains to check (31). Take x ∈ H, v* ∈ G, and a strictly increasing sequence (k_n)_{n∈ℕ} in ℕ such that x_{k_n} ⇀ x and v*_{k_n} ⇀ v*. Then it follows from (37) and (i) that

 (43)  a_{k_n} ⇀ x,  b*_{k_n} ⇀ v*,  a*_{k_n} + L*b*_{k_n} → 0,  and  La_{k_n} − b_{k_n} → 0,

and from (36) that (∀n ∈ ℕ) (a_{k_n}, a*_{k_n}) ∈ gra A and (b_{k_n}, b*_{k_n}) ∈ gra B. We therefore appeal to Proposition 2 to conclude that (x, v*) ∈ Z.
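To illustrate the behavior of iteration (36), the sketch below runs it on a one-dimensional toy problem with L = Id, A = ∂f, B = ∂g, f = ½|·|², and g = ½|· − 2|², so that the Kuhn-Tucker point is (x̄, v̄*) = (1, −1). The quadruple is selected via resolvents, a natural choice in the spirit of [28] rather than the paper's own prescription, and γ, λ, and all names here are our illustrative assumptions. The Fejér monotonicity of Corollary 3(i) can be observed numerically: the distance to the Kuhn-Tucker point never increases.

```python
# Toy run of algorithm (36) on H = G = R with L = Id, A = df, B = dg,
# f(x) = x**2/2 and g(y) = (y - 2)**2/2; the Kuhn-Tucker point is (1, -1).
# Resolvent-based selection of (a_n, b_n, a*_n, b*_n) is our own illustrative
# choice; gamma and lam are hypothetical tuning parameters.
import math

gamma, lam = 1.0, 1.0
JA = lambda z: z / (1.0 + gamma)                  # (Id + gamma*A)^{-1}
JB = lambda w: (w + 2.0 * gamma) / (1.0 + gamma)  # (Id + gamma*B)^{-1}
xbar, vbar = 1.0, -1.0                            # known Kuhn-Tucker point

x, v = 0.0, 0.0
dist = math.hypot(x - xbar, v - vbar)
for n in range(100):
    z = x - gamma * v                 # L* = Id here
    a = JA(z); astar = (z - a) / gamma   # (a, astar) lies in gra A
    w = x + gamma * v                 # L = Id here
    b = JB(w); bstar = (w - b) / gamma   # (b, bstar) lies in gra B
    sstar = astar + bstar             # s*_n = a*_n + L* b*_n
    t = b - a                         # t_n = b_n - L a_n
    tau = sstar**2 + t**2
    if tau == 0:
        break                         # (a, bstar) already solves the problem
    theta = lam * max(0.0, x * sstar + t * v - a * astar - b * bstar) / tau
    x, v = x - theta * sstar, v - theta * t
    # Fejer monotonicity with respect to Z: distance to (xbar, vbar) never grows.
    new_dist = math.hypot(x - xbar, v - vbar)
    assert new_dist <= dist + 1e-12
    dist = new_dist

print(x, v)  # approaches the Kuhn-Tucker point (1.0, -1.0)
```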

###### Remark \thetheorem

In the special case when G = H and L = Id, Corollary 3(iii) was established in [28, Proposition 2] under additional assumptions, among them that the sum of the two operators is maximally monotone or that the underlying space is finite-dimensional.

Corollary 3 is conceptual in that it does not specify a rule for selecting the quadruple (a_n, b_n, a*_n, b*_n) in G_α(x_n, v*_n) at iteration n. We now provide an example of a concrete selection rule.

{proposition}

Consider the setting of Problem 1. Suppose that Z ≠ ∅, let