# Complexity of the relaxed Peaceman-Rachford splitting method for the sum of two maximal strongly monotone operators

## Abstract

This paper considers the relaxed Peaceman-Rachford (PR) splitting method for finding an approximate solution of a monotone inclusion whose underlying operator consists of the sum of two maximal strongly monotone operators. Using general results obtained in the setting of a non-Euclidean hybrid proximal extragradient framework, we extend a previous convergence result on the iterates generated by the relaxed PR splitting method, as well as establish new pointwise and ergodic convergence rate results for the method whenever an associated relaxation parameter is within a certain interval. An example is also discussed to demonstrate that the iterates may not converge when the relaxation parameter is outside this interval.

## 1 Introduction

In this paper, we consider the relaxed Peaceman-Rachford (PR) splitting method for solving the monotone inclusion

 0∈(A+B)(u) (1)

where $A$ and $B$ are maximal $\beta$-strongly monotone (point-to-set) operators for some $\beta \ge 0$ (with the convention that $0$-strongly monotone means simply monotone, and $\beta$-strongly monotone with $\beta > 0$ means strongly monotone in the usual sense). Recall that the relaxed PR splitting method is given by

 xk=xk−1+θ(JB(2JA(xk−1)−xk−1)−JA(xk−1)), (2)

where $\theta > 0$ is a fixed relaxation parameter and $J_T := (I+T)^{-1}$ denotes the resolvent of an operator $T$. The special case of the relaxed PR splitting method in which $\theta = 2$ is known as the Peaceman-Rachford (PR) splitting method and the one with $\theta = 1$ is the widely-studied Douglas-Rachford (DR) splitting method. Convergence results for them are studied for example in [1, 2, 3, 4, 8, 13, 14, 22].
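As a rough illustration of recursion (2), the following minimal sketch runs it on a scalar toy problem. Everything specific here is an assumption made for the example and is not taken from the paper: the operators are the affine maps A(u) = u - 1 and B(u) = u + 3, the resolvent is taken to be J_T = (I + T)^{-1}, and the helper names `make_affine_resolvent` and `relaxed_pr` are hypothetical.

```python
from typing import Callable


def make_affine_resolvent(slope: float, offset: float) -> Callable[[float], float]:
    """Resolvent J_T = (I + T)^{-1} of the scalar operator T(u) = slope*u - offset.

    Solving y + slope*y - offset = x for y gives y = (x + offset) / (1 + slope).
    """
    return lambda x: (x + offset) / (1.0 + slope)


def relaxed_pr(j_a, j_b, x0: float, theta: float, iters: int) -> float:
    """Run `iters` steps of x_k = x_{k-1} + theta*(J_B(2 J_A(x_{k-1}) - x_{k-1}) - J_A(x_{k-1}))."""
    x = x0
    for _ in range(iters):
        ja_x = j_a(x)
        x = x + theta * (j_b(2.0 * ja_x - x) - ja_x)
    return x


# Hypothetical toy data: A(u) = u - 1 and B(u) = u + 3, so 0 = (A + B)(u) at u = -1.
j_a = make_affine_resolvent(slope=1.0, offset=1.0)
j_b = make_affine_resolvent(slope=1.0, offset=-3.0)

x_dr = relaxed_pr(j_a, j_b, x0=10.0, theta=1.0, iters=60)  # theta = 1: Douglas-Rachford
u_dr = j_a(x_dr)  # a zero of A + B is recovered from a fixed point x through J_A(x)
```

On this toy data the update reduces to x_k + 3 = (1 - theta/2)(x_{k-1} + 3), so the behavior for a given relaxation parameter can be read off directly; this closed form is specific to the example and says nothing about general operators.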

The analysis of the relaxed PR splitting method for the case in which has been undertaken in a number of papers which are discussed in this paragraph. Convergence of the sequence of iterates generated by the relaxed PR splitting method is well-known when (see for example [1, 7, 14]) and, according to [16], its limiting behavior for the case in which is not known. We actually show in Subsection 5.2 that the sequence (2) does not necessarily converge when . An (strong) pointwise convergence rate result is established in [18] for the relaxed PR splitting method when . Moreover, when and where and are proper lower semi-continuous convex functions, papers [9, 10, 11] derive strong pointwise (resp., ergodic) convergence rate bounds for the relaxed PR method when (resp., ) under different assumptions on the functions. Assuming only -strong monotonicity of , where , some smoothness property on , and maximal monotonicity of , [16] shows that the relaxed PR splitting method has linear convergence rate for for some . Linear rate of convergence of the relaxed PR splitting method and its two special cases, namely, the DR splitting and PR splitting methods, are established in [2, 3, 4, 11, 15, 16, 22] under relatively strong assumptions on and/or (see also Table 2).

This paper assumes that , and hence its analysis applies to the case in which both and are monotone () and the case in which both and are strongly monotone (). This paragraph discusses papers dealing with the latter case. Paper [12] establishes convergence of the sequence generated by the relaxed PR splitting method for any and, under some strong assumptions on and , establishes its linear convergence rate. We complement the convergence results in [12] by showing that for , the sequence of iterates generated by the relaxed PR splitting method also converges, and describe an instance showing its nonconvergence when . Moreover, we establish strong pointwise and ergodic convergence rate results (Theorems 4.6 and 4.8) for the relaxed PR splitting method when and , respectively.

Finally, by imposing strong assumptions requiring one of the operators to be strongly monotone and one of them to be Lipschitz (and hence point-to-point), [11, 15, 16] establish a linear convergence rate for the relaxed PR splitting method. In contrast to these papers, the assumptions in [12] and this paper do not imply that the operators $A$ or $B$ are point-to-point.

Our analysis of the relaxed PR splitting method for solving (1) is based on viewing it as an inexact proximal point method, more specifically, as an instance of a non-Euclidean hybrid proximal extragradient (HPE) framework for solving the monotone inclusion problem. The proximal point method, proposed by Rockafellar [28], is a classical iterative scheme for solving the latter problem. Paper [29] introduces a Euclidean version of the HPE framework which is an inexact version of the proximal point method based on a certain relative error criterion. Iteration-complexities of the latter framework are established in [24] (see also [25]). Generalizations of the HPE framework to the non-Euclidean setting are studied in [17, 21, 30]. Applications of the HPE framework can be found for example in [19, 20, 24, 25].

This paper is organized as follows. Section 2 describes basic concepts and notation used in the paper. Section 3 discusses the non-Euclidean HPE framework, which is used to study the convergence properties of the relaxed PR splitting method in Sections 4 and 5. Section 4 derives convergence rate bounds for the relaxed PR splitting method. Section 5, which consists of two subsections, discusses a convergence result for the relaxed PR splitting method in the first subsection and, in the second subsection, provides an example showing that its iterates may not converge when the relaxation parameter lies outside a certain interval. Section 6 discusses the numerical performance of the relaxed PR splitting method for solving the weighted Lasso minimization problem. Finally, Section 7 gives some concluding remarks.

## 2 Basic concepts and notation

This section presents some definitions, notation and terminology which will be used in the paper.

We denote the set of real numbers by and the set of non-negative real numbers by . Let and be functions with the same domain and whose values are in . We write that if there exists constant such that . Also, we write if and .

Let be a finite-dimensional real vector space with inner product denoted by (an example of is endowed with the standard inner product) and let denote an arbitrary seminorm in . Its dual (extended) seminorm, denoted by , is defined as . It is easy to see that

 ⟨z,v⟩≤∥z∥∥v∥∗∀z,v∈Z. (3)

The following straightforward result states some basic properties of the dual seminorm associated with a matrix seminorm. Its proof can be found for example in Lemma A.1(b) of [23].

###### Proposition 2.1

Let be a self-adjoint positive semidefinite linear operator and consider the seminorm in given by for every . Then, and for every .

Given a set-valued operator , its domain is denoted by and its inverse operator is given by . The graph of is defined by . The operator is said to be monotone if

 ⟨z−z′,t−t′⟩≥0∀(z,t),(z′,t′)∈Gr(T).

Moreover, is maximal monotone if it is monotone and, additionally, if is a monotone operator such that for every , then . The sum of two set-valued operators is defined by for every . Given a scalar , the -enlargement of a monotone operator is defined as

 T[ε](z):={t∈Z:⟨t−t′,z−z′⟩≥−ε,∀z′∈Z,∀t′∈T(z′)}∀z∈Z. (4)
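To make definition (4) concrete, here is a small worked example under assumptions chosen purely for illustration: the space is the real line with the usual product, and $T$ is the identity map, so that $T(z') = \{z'\}$.

```latex
% Assumed setting (not from the paper): Z = \mathbb{R}, T(z') = \{z'\}.
\begin{align*}
t \in T^{[\varepsilon]}(z)
  &\iff (t - z')(z - z') \ge -\varepsilon \quad \forall z' \in \mathbb{R} \\
  &\iff \min_{z' \in \mathbb{R}} \bigl[ z'^2 - (t+z)\,z' + tz \bigr] \ge -\varepsilon \\
  &\iff -\tfrac{1}{4}(t - z)^2 \ge -\varepsilon
   \qquad \text{(the minimum is attained at } z' = \tfrac{t+z}{2}\text{)} \\
  &\iff |t - z| \le 2\sqrt{\varepsilon}.
\end{align*}
% Hence T^{[\varepsilon]}(z) = [\,z - 2\sqrt{\varepsilon},\; z + 2\sqrt{\varepsilon}\,],
% which collapses to T(z) = \{z\} when \varepsilon = 0.
```

In this example the enlargement grows like $\sqrt{\varepsilon}$ around the exact value, which gives some intuition for why inclusions in $T^{[\varepsilon]}$ are a natural way to measure inexact solutions.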

## 3 A non-Euclidean hybrid proximal extragradient framework

This section discusses the non-Euclidean hybrid proximal extragradient (NE-HPE) framework and describes its associated convergence and iteration complexity results. The results of the section will be used in Sections 4 and 5 to study the convergence and iteration complexity properties of the relaxed PR splitting method (2). It contains two subsections. The first one describes a class of distance generating functions introduced in [17] and derives some of its basic properties. The second one describes the NE-HPE framework and its corresponding convergence and iteration complexity results.

### 3.1 A class of distance generating functions

We start by introducing a class of distance generating functions (and its corresponding Bregman distances) which is needed for the presentation of the NE-HPE framework in Subsection 3.2.

###### Definition 3.1

For a given convex set , a seminorm in and scalars , we let denote the class of real-valued functions which are differentiable on and satisfy

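$$w(z') - w(z) - \langle \nabla w(z), z' - z\rangle \ge \frac{m}{2}\|z - z'\|^2 \quad \forall z, z' \in Z, \tag{5}$$

$$\|\nabla w(z) - \nabla w(z')\|_* \le M\|z - z'\| \quad \forall z, z' \in Z. \tag{6}$$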

A function is referred to as a distance generating function with respect to the seminorm and its associated Bregman distance is defined as

 (dw)(z′;z)=(dw)z(z′):=w(z′)−w(z)−⟨∇w(z),z′−z⟩∀z,z′∈Z. (7)

Throughout our presentation, we use the second notation instead of the first one although the latter one makes it clear that is a function of two arguments, namely, and . Clearly, it follows from (5) that is a convex function on which is in fact -strongly convex on whenever is a norm.
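The definition in (7) can be sketched numerically as follows. The choice of $w$ is an assumption made only for this illustration: a quadratic $w(z) = \tfrac12\langle z, Qz\rangle$ with the positive semidefinite matrix $Q = \mathrm{diag}(1, 0)$, for which the Bregman distance equals $\tfrac12\|z'-z\|^2$ in the seminorm $\|z\| = \sqrt{\langle z, Qz\rangle}$. The helper names `dot` and `bregman` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def bregman(w: Callable[[Vector], float],
            grad_w: Callable[[Vector], List[float]],
            z: Vector, z_prime: Vector) -> float:
    """(dw)_z(z') = w(z') - w(z) - <grad w(z), z' - z>, as in (7)."""
    diff = [b - a for a, b in zip(z, z_prime)]
    return w(z_prime) - w(z) - dot(grad_w(z), diff)


# Assumed example: w(z) = 0.5 * <z, Q z> with Q = diag(1, 0), so that
# ||z|| = sqrt(<z, Q z>) = |z_1| is a seminorm but not a norm.
Q_DIAG = (1.0, 0.0)


def w(z: Vector) -> float:
    return 0.5 * sum(q * zi * zi for q, zi in zip(Q_DIAG, z))


def grad_w(z: Vector) -> List[float]:
    return [q * zi for q, zi in zip(Q_DIAG, z)]


d = bregman(w, grad_w, z=[1.0, 5.0], z_prime=[4.0, -2.0])        # 0.5 * (4 - 1)**2
d_flat = bregman(w, grad_w, z=[1.0, 5.0], z_prime=[1.0, -2.0])   # distinct points, zero distance
```

The second call shows the degenerate behavior that a seminorm allows: two different points can be at Bregman distance zero when they differ only in a direction the seminorm does not see.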

The following simple result summarizes the main identities about the Bregman distance .

###### Lemma 3.2

For some convex set and scalars , let be given. Then, the following identities hold for every :

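$$\nabla (dw)_z(z') = -\nabla (dw)_{z'}(z) = \nabla w(z') - \nabla w(z), \tag{8}$$

$$(dw)_v(z') - (dw)_v(z) = \langle \nabla (dw)_v(z), z' - z\rangle + (dw)_z(z') \quad \forall v \in Z, \tag{9}$$

$$\frac{m}{2}\|z - z'\|^2 \le (dw)_z(z') \le \frac{M}{2}\|z - z'\|^2, \tag{10}$$

$$\|\nabla (dw)_{z'}(z)\|_*^2 \le \frac{2M^2}{m}\min\{(dw)_z(z'), (dw)_{z'}(z)\}. \tag{11}$$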

Proof: Identities (8) and (9) follow straightforwardly from the definition of the Bregman distance in (7). The first inequality in (10) follows easily from (5) and the definition of in (7). The second inequality in (10) follows from (3), (6), the definition of in (7), and the identity

$$w(z') - w(z) = \int_0^1 \langle \nabla w(z + t(z' - z)), z' - z\rangle\, dt \quad \forall z, z' \in Z.$$

It is easy to see that (11) immediately follows from (6), (8) and (10).
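For readers who want the intermediate steps, the second inequality in (10) and the bound (11) can be spelled out as below. This is only one way to fill in the argument from the ingredients the proof names, namely (3), (6), (7), (8) and the integral identity; it uses the convexity of $Z$ to ensure that $z + t(z'-z) \in Z$ for $t \in [0,1]$.

```latex
\begin{align*}
(dw)_z(z')
  &= \int_0^1 \langle \nabla w(z + t(z'-z)) - \nabla w(z),\, z' - z \rangle \, dt
     && \text{by (7) and the integral identity} \\
  &\le \int_0^1 \|z' - z\| \, \|\nabla w(z + t(z'-z)) - \nabla w(z)\|_* \, dt
     && \text{by (3)} \\
  &\le \int_0^1 M t \, \|z' - z\|^2 \, dt \;=\; \frac{M}{2}\|z' - z\|^2
     && \text{by (6)}, \\[6pt]
\|\nabla (dw)_{z'}(z)\|_*^2
  &= \|\nabla w(z) - \nabla w(z')\|_*^2
     && \text{by (8)} \\
  &\le M^2 \|z - z'\|^2
     && \text{by (6)} \\
  &\le \frac{2M^2}{m} \min\{(dw)_z(z'),\, (dw)_{z'}(z)\}
     && \text{by the first inequality in (10)}.
\end{align*}
```

The last step applies the lower bound in (10) twice, once with the roles of $z$ and $z'$ exchanged, which is what produces the minimum.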

Note that if the seminorm in Definition 3.1 is a norm, then (5) implies that is strongly convex on , in which case the corresponding is said to be nondegenerate on . However, since Definition 3.1 does not necessarily assume that is a norm, it admits the possibility of being not strongly convex on , or equivalently, being degenerate on .

The following result gives some useful properties of distance generating functions.

###### Lemma 3.3

For some convex set and scalars , let be given. Then, for every and , we have

$$(dw)_{z_0}(z_l) \le \frac{lM}{m}\sum_{i=1}^{l}\min\{(dw)_{z_{i-1}}(z_i), (dw)_{z_i}(z_{i-1})\}. \tag{12}$$

Proof: By (10), the triangle inequality for norms and the fact that the 1-norm of an $l$-vector is bounded by $\sqrt{l}$ times its 2-norm, we have

$$(dw)_{z_0}(z_l) \le \frac{M}{2}\|z_l - z_0\|^2 \le \frac{M}{2}\Big(\sum_{i=1}^{l}\|z_i - z_{i-1}\|\Big)^2 \le \frac{lM}{2}\sum_{i=1}^{l}\|z_i - z_{i-1}\|^2,$$

which clearly implies (12) due to the first inequality in (10).
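Inequality (12) can be sanity-checked numerically. The sketch below does so under assumptions chosen for simplicity rather than taken from the paper: the space is the real line, $w(z) = z^2/2$, and hence $m = M = 1$ with $(dw)_z(z') = (z'-z)^2/2$. The function names are hypothetical.

```python
import random
from typing import List


def bregman_quadratic(z: float, z_prime: float) -> float:
    """Bregman distance of w(z) = z**2 / 2 on the real line: (z' - z)**2 / 2."""
    return 0.5 * (z_prime - z) ** 2


def chain_bound_gap(points: List[float], m: float = 1.0, M: float = 1.0) -> float:
    """Right-hand side of (12) minus its left-hand side for the chain z_0, ..., z_l."""
    l = len(points) - 1
    lhs = bregman_quadratic(points[0], points[-1])
    rhs = (l * M / m) * sum(
        min(bregman_quadratic(a, b), bregman_quadratic(b, a))
        for a, b in zip(points, points[1:])
    )
    return rhs - lhs


random.seed(0)  # fixed seed so the experiment is reproducible
gaps = [chain_bound_gap([random.uniform(-5.0, 5.0) for _ in range(4)]) for _ in range(1000)]
worst_gap = min(gaps)                                  # should never be negative
tight_gap = chain_bound_gap([0.0, 1.0, 2.0, 3.0])      # equally spaced chain
```

In this quadratic setting the bound is attained by an equally spaced chain, which suggests that the factor $l$ in (12) cannot be improved in general.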

### 3.2 The NE-HPE framework

This subsection describes the NE-HPE framework and its corresponding convergence and iteration complexity results.

Throughout this subsection, we assume that scalars , convex set , seminorm and distance generating function with respect to are given. Our problem of interest in this section is the monotone inclusion problem (MIP)

 0∈T(z) (13)

where is a maximal monotone operator satisfying the following conditions:

• ;

• the solution set of (13) is nonempty.

We now state a non-Euclidean HPE (NE-HPE) framework for solving the MIP (13) which generalizes its Euclidean counterparts studied in the literature (see for example in [24, 26, 29]).

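Framework 1 (An NE-HPE framework for solving (13)).

(0) Let $z_0 \in Z$ and $\sigma \in [0,1]$ be given, and set $k = 1$;

(1) choose $\lambda_k > 0$ and find $(\tilde z_k, z_k, \varepsilon_k) \in Z \times Z \times \mathbb{R}_+$ such that

$$r_k := \frac{1}{\lambda_k}\nabla (dw)_{z_k}(z_{k-1}) \in T^{[\varepsilon_k]}(\tilde z_k), \tag{14}$$

$$(dw)_{z_k}(\tilde z_k) + \lambda_k\varepsilon_k \le \sigma\,(dw)_{z_{k-1}}(\tilde z_k); \tag{15}$$

(2) set $k \leftarrow k+1$ and go to step 1.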

We now make some remarks about Framework 1. First, it does not specify how to find and satisfying (14) and (15). The particular scheme for computing and will depend on the instance of the framework under consideration and the properties of the operator . Second, if is strongly convex on and , then (15) implies that and for every , and hence that in view of (14). Therefore, the HPE error conditions (14)-(15) can be viewed as a relaxation of an iteration of the exact non-Euclidean proximal point method, namely,

$$0 \in \frac{1}{\lambda_k}\nabla (dw)_{z_{k-1}}(z_k) + T(z_k).$$
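The exact proximal point inclusion above can be sketched on a scalar problem. The setting is an assumption made for illustration only: $T(z) = az - b$ on the real line and $w(z) = z^2/2$, so that $\nabla (dw)_{z_{k-1}}(z_k) = z_k - z_{k-1}$ and each step has a closed form. The function name `proximal_point` and the data are hypothetical.

```python
from typing import List, Tuple


def proximal_point(a: float, b: float, z0: float, lam: float,
                   iters: int) -> List[Tuple[float, float]]:
    """Exact proximal point steps for T(z) = a*z - b with w(z) = z**2 / 2.

    Each step solves 0 = (z_k - z_prev)/lam + a*z_k - b in closed form and
    records the residual r_k = (z_prev - z_k)/lam, which equals T(z_k).
    """
    z = z0
    history = []
    for _ in range(iters):
        z_next = (z + lam * b) / (1.0 + lam * a)
        r = (z - z_next) / lam
        history.append((z_next, r))
        z = z_next
    return history


# Assumed data: T(z) = 2z - 4 has the unique zero z* = 2.
history = proximal_point(a=2.0, b=4.0, z0=10.0, lam=1.0, iters=40)
z_last, r_last = history[-1]
```

For this choice of $w$, the recorded residual coincides with $\lambda_k^{-1}\nabla (dw)_{z_k}(z_{k-1})$, and it shrinks to zero as the iterates approach the zero of $T$; an inexact scheme would only require such a residual to lie in an enlargement of $T$ at a nearby point.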

We observe that NE-HPE frameworks have already been studied in [17], [21] and [30]. The approach presented in this section differs from these three papers as follows. Assuming that is an open convex set, is continuously differentiable on and continuous on its closure, [30] studies a special case of the NE-HPE framework in which for every , and presents results on convergence of sequences rather than iteration complexity. Paper [21] deals with distance generating functions which do not necessarily satisfy conditions (5) and (6), and as a consequence, obtains results which are more limited in scope, i.e., only an ergodic convergence rate result is obtained for operators with bounded feasible domains (or, more generally, for the case in which the sequence generated by the HPE framework is bounded). Paper [17] introduces the class of distance generating functions but only analyzes the behavior of an HPE framework for solving inclusions whose operators are strongly monotone with respect to a fixed (see condition A1 in Section 2 of [17]). This section on the other hand assumes that but it does not assume any strong monotonicity of with respect to .

Before presenting the main results about the NE-HPE framework, namely, Theorems 3.8 and 3.9 establishing its pointwise and ergodic iteration complexities, respectively, and Propositions 3.10 and 3.11 showing that and/or approach in terms of the Bregman distance , we first establish a few preliminary technical results.

###### Lemma 3.4

For every and , we have:

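$$(dw)_{z_{k-1}}(z) - (dw)_{z_k}(z) = (dw)_{z_{k-1}}(\tilde z_k) - (dw)_{z_k}(\tilde z_k) + \lambda_k\langle r_k, \tilde z_k - z\rangle; \tag{16}$$

$$(dw)_{z_{k-1}}(z) - (dw)_{z_k}(z) \ge (1-\sigma)(dw)_{z_{k-1}}(\tilde z_k) + \lambda_k\big(\langle r_k, \tilde z_k - z\rangle + \varepsilon_k\big); \tag{17}$$

$$(dw)_{z_0}(z) - (dw)_{z_k}(z) \ge (1-\sigma)\sum_{i=1}^{k}(dw)_{z_{i-1}}(\tilde z_i) + \sum_{i=1}^{k}\lambda_i\big[\langle r_i, \tilde z_i - z\rangle + \varepsilon_i\big]. \tag{18}$$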

Proof: Using (9) twice and the definition of in (14), we conclude that

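$$\begin{aligned}
(dw)_{z_{k-1}}(z) - (dw)_{z_k}(z) &= (dw)_{z_{k-1}}(z_k) + \langle \nabla (dw)_{z_{k-1}}(z_k), z - z_k\rangle \\
&= (dw)_{z_{k-1}}(z_k) + \langle \nabla (dw)_{z_{k-1}}(z_k), \tilde z_k - z_k\rangle + \langle \nabla (dw)_{z_{k-1}}(z_k), z - \tilde z_k\rangle \\
&= (dw)_{z_{k-1}}(\tilde z_k) - (dw)_{z_k}(\tilde z_k) + \langle \nabla (dw)_{z_{k-1}}(z_k), z - \tilde z_k\rangle \\
&= (dw)_{z_{k-1}}(\tilde z_k) - (dw)_{z_k}(\tilde z_k) + \lambda_k\langle r_k, \tilde z_k - z\rangle,
\end{aligned}$$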

and hence that (16) holds. Inequality (17) follows immediately from (16) and (15). Moreover, (18) follows by adding (17) from to .

###### Proposition 3.5

For every and , we have

 (dw)zk−1(z∗)−(dw)zk(z∗)−(1−σ)(dw)zk−1(~zk)≥λk[⟨rk,~zk−z∗⟩+εk]≥0. (19)

As a consequence, the following statements hold:

• is non-increasing;

• ;

• .

Proof: Let be given. The first inequality in (19) follows from (17) with and the last inequality in (19) follows from the fact that and , and the definition of . Finally, statements (a) and (b) follow immediately from (19) while (c) follows by adding (19) over and using the fact that for every .

For the purpose of stating the convergence rate results below, define

$$(dw)_0 := \inf\{(dw)_{z_0}(z^*) : z^* \in T^{-1}(0)\}. \tag{20}$$

###### Lemma 3.6

For every , define

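$$\theta_i := \max\left\{\frac{\lambda_i^2\|r_i\|_*^2}{\tau^2(1+\sqrt{\sigma})^2},\ \frac{\lambda_i\varepsilon_i}{\sigma}\right\}, \quad \text{where } \tau := \frac{\sqrt{2}\,M}{\sqrt{m}}. \tag{21}$$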

Then, .

Proof: For every , it follows from (14), (8), (11), (15), the triangle inequality for norms and the above definition of , that

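$$\begin{aligned}
\lambda_i\|r_i\|_* &= \|\nabla (dw)_{z_i}(z_{i-1})\|_* = \|\nabla (dw)_{z_{i-1}}(\tilde z_i) - \nabla (dw)_{z_i}(\tilde z_i)\|_* \\
&\le \|\nabla (dw)_{z_{i-1}}(\tilde z_i)\|_* + \|\nabla (dw)_{z_i}(\tilde z_i)\|_* \le \tau\big[(dw)_{z_{i-1}}(\tilde z_i)^{1/2} + (dw)_{z_i}(\tilde z_i)^{1/2}\big] \\
&\le \tau(1+\sqrt{\sigma})\,(dw)_{z_{i-1}}(\tilde z_i)^{1/2}.
\end{aligned}$$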

The last inequality, (15) and the definition of then imply that for every . Hence, if , it follows that

$$(1-\sigma)\sum_{i=1}^{k}\theta_i \le (1-\sigma)\sum_{i=1}^{k}(dw)_{z_{i-1}}(\tilde z_i) \le (dw)_{z_0}(z^*)$$

where the last inequality follows from Proposition 3.5(c). The lemma now follows from the latter relation and the definition of in (20).

###### Lemma 3.7

Let be as in (20) and be as in (21), and assume that . Then, for every and every , there exists an such that

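$$\|r_i\|_* \le \tau(1+\sqrt{\sigma})\sqrt{\frac{(dw)_0}{1-\sigma}\left(\frac{\lambda_i^{\alpha-2}}{\sum_{j=1}^{k}\lambda_j^{\alpha}}\right)}, \qquad \varepsilon_i \le \frac{\sigma (dw)_0}{1-\sigma}\left(\frac{\lambda_i^{\alpha-1}}{\sum_{j=1}^{k}\lambda_j^{\alpha}}\right). \tag{22}$$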

Proof: It follows from Lemma 3.6 that

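$$\frac{(dw)_0}{1-\sigma} \ge \sum_{i=1}^{k}\theta_i = \sum_{i=1}^{k}\frac{\theta_i}{\lambda_i^{\alpha}}\,\lambda_i^{\alpha} \ge \left(\min_{i=1,\ldots,k}\frac{\theta_i}{\lambda_i^{\alpha}}\right)\left(\sum_{i=1}^{k}\lambda_i^{\alpha}\right)$$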

which, in view of the definition of in (21), can be easily seen to be equivalent to the conclusion of the lemma.

The following pointwise convergence rate result describes the convergence rate of the sequence of residual pairs associated to the sequence . Note that its convergence rate bounds are derived on the best residual pair among for rather than on the last residual pair .

###### Theorem 3.8

(Pointwise convergence) Let be as in (20) and be as in (21), and assume that . Then, the following statements hold:

• if , then for every there exists such that

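$$\|r_i\|_* \le \tau(1+\sqrt{\sigma})\sqrt{\frac{(dw)_0}{1-\sigma}\left(\frac{\underline{\lambda}^{-1}}{\sum_{j=1}^{k}\lambda_j}\right)} \le \frac{\tau(1+\sqrt{\sigma})}{\underline{\lambda}\sqrt{k}}\sqrt{\frac{(dw)_0}{1-\sigma}},$$

$$\varepsilon_i \le \frac{\sigma (dw)_0}{1-\sigma}\,\frac{1}{\sum_{j=1}^{k}\lambda_j} \le \frac{\sigma (dw)_0}{(1-\sigma)\,\underline{\lambda}\,k};$$
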
• for every , there exists an index such that

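$$\|r_i\|_* \le \tau(1+\sqrt{\sigma})\sqrt{\frac{(dw)_0}{(1-\sigma)\sum_{j=1}^{k}\lambda_j^{2}}}, \qquad \varepsilon_i \le \frac{\sigma (dw)_0\,\lambda_i}{(1-\sigma)\sum_{j=1}^{k}\lambda_j^{2}}. \tag{23}$$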

Proof: Statement (a) (resp., (b)) follows from Lemma 3.7 with $\alpha = 1$ (resp., $\alpha = 2$).

From now on, we focus on the ergodic convergence rate of the NE-HPE framework. For $k \ge 1$, define $\Lambda_k := \sum_{i=1}^{k}\lambda_i$ and the ergodic sequences

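$$\tilde z^a_k := \frac{1}{\Lambda_k}\sum_{i=1}^{k}\lambda_i\tilde z_i, \qquad r^a_k := \frac{1}{\Lambda_k}\sum_{i=1}^{k}\lambda_i r_i, \qquad \varepsilon^a_k := \frac{1}{\Lambda_k}\sum_{i=1}^{k}\lambda_i\big(\varepsilon_i + \langle r_i, \tilde z_i - \tilde z^a_k\rangle\big). \tag{24}$$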
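The ergodic quantities in (24) are simple weighted averages, as the following sketch for scalar iterates shows. It takes $\Lambda_k$ to be the sum of the stepsizes, which is what the identity $\Lambda_k r^a_k = \sum_i \lambda_i r_i$ used in the proof of Theorem 3.9 indicates. The data and the function name `ergodic_triple` are hypothetical; the residuals are generated from the monotone map $T(z) = 2z - 4$ with $\varepsilon_i = 0$.

```python
from typing import Sequence, Tuple


def ergodic_triple(lams: Sequence[float], z_tildes: Sequence[float],
                   rs: Sequence[float], epsilons: Sequence[float]) -> Tuple[float, float, float]:
    """Return (z^a_k, r^a_k, eps^a_k) from (24) for scalar iterates."""
    big_lambda = sum(lams)  # Lambda_k, the sum of the stepsizes
    z_avg = sum(l * z for l, z in zip(lams, z_tildes)) / big_lambda
    r_avg = sum(l * r for l, r in zip(lams, rs)) / big_lambda
    eps_avg = sum(
        l * (e + r * (z - z_avg))
        for l, z, r, e in zip(lams, z_tildes, rs, epsilons)
    ) / big_lambda
    return z_avg, r_avg, eps_avg


# Assumed data with r_i = T(z~_i) for T(z) = 2z - 4 and eps_i = 0.
lams = [1.0, 2.0, 1.0]
z_tildes = [5.0, 3.0, 2.5]
rs = [2.0 * z - 4.0 for z in z_tildes]
z_avg, r_avg, eps_avg = ergodic_triple(lams, z_tildes, rs, [0.0, 0.0, 0.0])
```

Even though every $\varepsilon_i$ is zero in this data, the ergodic error `eps_avg` comes out strictly positive: averaging trades exact inclusions at the individual points for an inclusion in an enlargement at the averaged point.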

The following ergodic convergence result describes the association between the ergodic iterate and the residual pair , and gives a convergence rate bound on the latter residual pair.

###### Theorem 3.9

(Ergodic convergence) Let be as in (20) and be as in (21). Then, for every , we have

 εak≥0,rak∈T[εak](~zak)

and

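$$\|r^a_k\|_* \le \frac{2\tau\sqrt{(dw)_0}}{\Lambda_k}, \qquad \varepsilon^a_k \le \left(\frac{3M}{m}\right)\frac{2(dw)_0 + \rho_k}{\Lambda_k}$$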

where

 ρk:=maxi=1,…,k(dw)zi(~zi). (25)

Moreover, the sequence is bounded under either one of the following situations:

• , in which case

$$\rho_k \le \frac{\sigma (dw)_0}{1-\sigma}; \tag{26}$$

• is bounded, in which case

$$\rho_k \le \frac{2M}{m}\big[(dw)_0 + D\big] \tag{27}$$

where is the diameter of with respect to .

Proof: The inequality and the inclusion follow from (24) and the transportation formula (see [5, Theorem 2.3]). Now, let be given. Using (8), (14) and (24), we easily see that

 Λkrak=k∑i=1λiri=k∑i=1∇(dw)zi(zi−1)=∇(dw)zk(z∗)−∇(dw)z0(z∗).

Hence, in view of Proposition 3.5(a), and relations (11) and (21), we have

$$\Lambda_k\|r^a_k\|_* \le \|\nabla (dw)_{z_0}(z^*)\|_* + \|\nabla (dw)_{z_k}(z^*)\|_* \le \tau\big[(dw)_{z_0}(z^*)^{1/2} + (dw)_{z_k}(z^*)^{1/2}\big] \le 2\tau\,(dw)_{z_0}(z^*)^{1/2}.$$

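This inequality together with the definition of $(dw)_0$ in (20) clearly implies the bound on $\|r^a_k\|_*$. We now establish the bound on $\varepsilon^a_k$. Using inequality (18) with $z = \tilde z^a_k$, noting (24), and using the fact that $(dw)_{z_0}(\cdot)$ is convex and $\sigma \le 1$, we conclude that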

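$$\Lambda_k\varepsilon^a_k = \sum_{i=1}^{k}\lambda_i\big(\langle r_i, \tilde z_i - \tilde z^a_k\rangle + \varepsilon_i\big) \le (dw)_{z_0}(\tilde z^a_k) \le \max_{i=1,\ldots,k}(dw)_{z_0}(\tilde z_i).$$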

On the other hand, (12) with implies that for every and ,

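$$\begin{aligned}
(dw)_{z_0}(\tilde z_i) &\le \frac{3M}{m}\big[(dw)_{z_i}(\tilde z_i) + (dw)_{z_i}(z^*) + (dw)_{z_0}(z^*)\big] \\
&\le \frac{3M}{m}\big[(dw)_{z_i}(\tilde z_i) + 2(dw)_{z_0}(z^*)\big]
\end{aligned}$$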

where the last inequality is due to Proposition 3.5(a). Combining the above two relations and using the definitions of and , we then conclude that the bound on holds.

We now establish the bounds on under either one of the conditions (a) or (b). First, if , then it follows from (15) and Proposition 3.5 that

$$(dw)_{z_i}(\tilde z_i) \le \sigma\,(dw)_{z_{i-1}}(\tilde z_i) \le \frac{\sigma}{1-\sigma}(dw)_{z_{i-1}}(z^*) \le \frac{\sigma}{1-\sigma}(dw)_{z_0}(z^*)$$

for every and . Noting (20) and (25), we then conclude that (26) holds. Assume now that is bounded. Using (12) with and Proposition 3.5(a), and noting the definition of in (b), we conclude that

$$(dw)_{z_i}(\tilde z_i) \le \frac{2M}{m}\big[(dw)_{z_i}(z^*) + \min\{(dw)_{\tilde z_i}(z^*), (dw)_{z^*}(\tilde z_i)\}\big] \le \frac{2M}{m}\big[(dw)_{z_0}(z^*) + D\big]$$

for every and . Hence, noting (20) and (25), we conclude that (27) holds.

In the remaining part of this subsection, we state some results about the sequence generated by an instance of the NE-HPE framework. We assume from now on that such instance generates an infinite sequence of iterates, i.e., the instance does not terminate in a finite number of steps and no termination criterion is checked. Since we are not assuming that the distance generating function is nondegenerate on , it is not possible to establish convergence of the sequence generated by the NE-HPE framework to a solution of (13). However, under some mild assumptions, it is possible to establish that approaches a point if the proximity measure used is the actual Bregman distance.

###### Proposition 3.10

Assume that for some infinite index set and some , we have

 limk→K(rk,εk)=(0,0),limk→K~zk=~z. (28)

Then, . If, in addition, , then .

Proof: Using the two limits in (28), and the fact that every maximal monotone operator is closed and for every , we conclude that . This conclusion together with Assumption A0 then imply that the first assertion of the proposition holds and that is non-increasing in view of Proposition 3.5(a). To show the second assertion, assume that . Since Lemma 3.3 with implies

$$(dw)_{z_k}(\tilde z) \le \frac{2M}{m}\big[(dw)_{z_k}(\tilde z_k) + (dw)_{\tilde z_k}(\tilde z)\big],$$

and the second limit in (28) clearly implies that , we then conclude that . Clearly, since is non-increasing, we have that , and hence that the second assertion holds.

###### Proposition 3.11

Assume that , and is bounded. Then, there exists such that

 limk→∞(dw)zk(~z)=limk→∞(dw)~zk(~z)=0. (29)

Proof: The assumption that and together with Theorem 3.8(b) imply that there exists a subsequence converging to zero. Since is bounded, we may assume without loss of generality (by passing to a subsequence if necessary) that converges to some . Hence, by the first part of Proposition 3.10, we conclude that