# Projected Reflected Gradient Methods for Monotone Variational Inequalities

Yu. Malitsky (Department of Cybernetics, Taras Shevchenko National University of Kyiv, 64/13, Volodymyrska Str., Kyiv, 01601, Ukraine; y.malitsky@gmail.com)

###### Abstract

This paper is concerned with some new projection methods for solving variational inequality problems with a monotone and Lipschitz-continuous mapping in a Hilbert space. First, we propose the projected reflected gradient algorithm with a constant stepsize. It is similar to the projected gradient method: it requires only one projection onto the feasible set and only one value of the mapping per iteration. This distinguishes our method from most other projection-type methods for variational inequalities with a monotone mapping. We also prove that it has an R-linear rate of convergence under the strong monotonicity assumption. The usual drawback of algorithms with a constant stepsize is the requirement to know the Lipschitz constant of the mapping. To avoid this, we modify our first algorithm so that it needs at most two projections per iteration. In fact, our computational experience shows that iterations with two projections are very rare. This scheme, at least theoretically, seems to be very effective. All methods are shown to be globally convergent to a solution of the variational inequality. Preliminary results from numerical experiments are quite promising.

Key words. variational inequality, projection method, monotone mapping, extragradient method

AMS subject classifications. 47J20, 90C25, 90C30, 90C52

## 1 Introduction

We consider the classical variational inequality problem (VIP), which is to find a point $x^* \in C$ such that

$$\langle F(x^*), x - x^*\rangle \ge 0 \quad \forall x \in C, \tag{1.1}$$

where $C$ is a closed convex set in a Hilbert space $H$, $\langle \cdot, \cdot\rangle$ denotes the inner product in $H$, and $F\colon H \to H$ is some mapping. We assume that the following conditions hold:

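• (C1) The solution set of (1.1), denoted by $S$, is nonempty.

• (C2) The mapping $F$ is monotone, i.e.,

$$\langle F(x) - F(y), x - y\rangle \ge 0 \quad \forall x, y \in H.$$

• (C3) The mapping $F$ is Lipschitz-continuous with constant $L > 0$, i.e., there exists $L > 0$ such that

$$\|F(x) - F(y)\| \le L\|x - y\| \quad \forall x, y \in H.$$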

The variational inequality problem is one of the central problems in nonlinear analysis (see [2, 5, 21]). Monotone operators have also turned out to be an important tool in the study of various problems arising in optimization, nonlinear analysis, differential equations and other related fields (see [3, 22]). Therefore, numerical methods for VIPs with monotone operators have been extensively studied in the literature; see [5, 10] and references therein. In this section we briefly review the development of projection methods for monotone variational inequalities that provide weak convergence to a solution of (1.1).

The simplest iterative procedure is the well-known projected gradient method

$$x_{n+1} = P_C(x_n - \lambda F(x_n)),$$

where $P_C$ denotes the metric projection onto the set $C$ and $\lambda$ is some positive number. In order to converge, however, this method requires the restrictive assumption that $F$ be strongly (or inverse strongly) monotone. The extragradient method proposed by Korpelevich and Antipin [11, 1] overcomes this difficulty by means of the following formula:

$$\begin{cases} y_n = P_C(x_n - \lambda F(x_n)),\\ x_{n+1} = P_C(x_n - \lambda F(y_n)), \end{cases} \tag{1.2}$$

where $\lambda \in (0, \frac{1}{L})$. The extragradient method has received a great deal of attention from many authors, who improved it in various ways; see, e.g., [9, 7, 17, 4] and references therein. We restrict our attention to only one extension of the extragradient method, proposed by Censor, Gibali and Reich [4]:

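$$\begin{cases} y_n = P_C(x_n - \lambda F(x_n)),\\ T_n = \{w \in H \mid \langle x_n - \lambda F(x_n) - y_n, w - y_n\rangle \le 0\},\\ x_{n+1} = P_{T_n}(x_n - \lambda F(y_n)), \end{cases} \tag{1.3}$$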

where $\lambda \in (0, \frac{1}{L})$. Since the second projection in (1.3) can be found in closed form, this method is more applicable when a projection onto the closed convex set $C$ is a nontrivial problem.
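The closed-form second projection can be made concrete. The following Python sketch assumes the usual form of the Censor–Gibali–Reich subgradient extragradient method, namely $y_n = P_C(x_n - \lambda F(x_n))$, $T_n = \{w \in H \mid \langle x_n - \lambda F(x_n) - y_n, w - y_n\rangle \le 0\}$ and $x_{n+1} = P_{T_n}(x_n - \lambda F(y_n))$; since $T_n$ is a half-space, $P_{T_n}$ is an explicit formula. The test mapping $F(x) = Ax + b$ with a skew-symmetric $A$ (monotone, $L = 1$), the box $C = [-1,1]^2$ and all function names are hypothetical choices for illustration.

```python
import math

# Hypothetical test problem: F(x) = A x + b with skew-symmetric A (monotone, L = 1).
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [0.5, -0.3]


def F(x):
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def proj_box(x, lo=-1.0, hi=1.0):
    # P_C for the box C = [lo, hi]^n: componentwise clipping.
    return [min(max(xi, lo), hi) for xi in x]


def proj_halfspace(u, a, y):
    # Closed-form projection onto T = {w : <a, w - y> <= 0}; T is the whole space if a = 0.
    aa = dot(a, a)
    excess = dot(a, [ui - yi for ui, yi in zip(u, y)])
    if aa == 0.0 or excess <= 0.0:
        return list(u)
    return [ui - (excess / aa) * ai for ui, ai in zip(u, a)]


def subgradient_extragradient(x, lam, iters):
    for _ in range(iters):
        u = [xi - lam * f for xi, f in zip(x, F(x))]   # x_n - lam F(x_n)
        y = proj_box(u)                                # y_n = P_C(u)
        a = [ui - yi for ui, yi in zip(u, y)]          # normal vector of T_n
        v = [xi - lam * f for xi, f in zip(x, F(y))]   # x_n - lam F(y_n)
        x = proj_halfspace(v, a, y)                    # x_{n+1} = P_{T_n}(v)
    return x


x = subgradient_extragradient([0.0, 0.0], lam=0.5, iters=500)
print(x, math.dist(x, [-0.3, -0.5]))
```

Because $C \subset T_n$, only the first projection in each iteration involves $C$; on this test problem the iterates should approach the solution $(-0.3, -0.5)$.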

An alternative to the extragradient method and its modifications is the following remarkable scheme proposed by Tseng [20]:

$$\begin{cases} y_n = P_C(x_n - \lambda F(x_n)),\\ x_{n+1} = y_n + \lambda(F(x_n) - F(y_n)), \end{cases} \tag{1.4}$$

where $\lambda \in (0, \frac{1}{L})$. Both algorithms (1.3) and (1.4) have the same complexity per iteration: we need to compute one projection onto the set $C$ and two values of $F$.
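To illustrate the per-iteration costs of (1.2) and (1.4), here is a minimal Python sketch that counts evaluations of $F$ and projections onto $C$. The test problem ($F(x) = Ax + b$ with a skew-symmetric $A$, so $F$ is monotone with $L = 1$, on the box $C = [-1,1]^2$), the stepsize $\lambda = 0.5$ and the helper names are assumptions made only for this example.

```python
import math

# Hypothetical test problem (monotone, not strongly monotone, L = 1):
# F(x) = A x + b with skew-symmetric A; its solution on C = [-1, 1]^2 is (-0.3, -0.5).
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [0.5, -0.3]
calls = {"F": 0, "P_C": 0}  # evaluation counters


def F(x):
    calls["F"] += 1
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def proj_box(x, lo=-1.0, hi=1.0):
    calls["P_C"] += 1
    return [min(max(xi, lo), hi) for xi in x]


def step(x, t, d):
    # Return x + t * d.
    return [xi + t * di for xi, di in zip(x, d)]


def extragradient(x, lam, iters):
    # Scheme (1.2): two projections onto C and two values of F per iteration.
    for _ in range(iters):
        y = proj_box(step(x, -lam, F(x)))
        x = proj_box(step(x, -lam, F(y)))
    return x


def tseng(x, lam, iters):
    # Scheme (1.4): one projection onto C and two values of F per iteration.
    for _ in range(iters):
        Fx = F(x)
        y = proj_box(step(x, -lam, Fx))
        x = step(y, lam, [p - q for p, q in zip(Fx, F(y))])
    return x


for method in (extragradient, tseng):
    calls.update(F=0, P_C=0)
    x = method([0.0, 0.0], lam=0.5, iters=400)
    print(method.__name__, x, calls)
```

On this problem both sketches should approach $(-0.3, -0.5)$; the counters show two values of $F$ per iteration for both methods, but two projections per iteration for (1.2) against one for (1.4).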

Popov [16] proposed an ingenious method which is similar to the extragradient method but uses only one value of the mapping $F$ per iteration. Using the idea from [4, 12], Malitsky and Semenov improved Popov's algorithm. They presented in [13] the following algorithm:

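$$\begin{cases} T_n = \{w \in H \mid \langle x_n - \lambda F(y_{n-1}) - y_n, w - y_n\rangle \le 0\},\\ x_{n+1} = P_{T_n}(x_n - \lambda F(y_n)),\\ y_{n+1} = P_C(x_{n+1} - \lambda F(y_n)), \end{cases} \tag{1.5}$$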

where $\lambda \in (0, \frac{1}{3L})$. It is easy to see that this method needs only one projection onto the set $C$ (as in (1.3) or (1.4)) and only one value of $F$ per iteration. The latter makes algorithm (1.5) very attractive when evaluating $F$ is expensive. This often happens, for example, in huge-scale VIPs or in VIPs arising from optimal control.
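To see the "one value of $F$ per iteration" property concretely, here is a minimal Python sketch of scheme (1.5). It caches $F(y_n)$, which is used for both $x_{n+1}$ and $y_{n+1}$ and is reused as $F(y_{n-1})$ in the next half-space. The initialization $y_{-1} = x_0$, $y_0 = P_C(x_0 - \lambda F(x_0))$ (which makes every $T_n$ contain $C$), the stepsize $\lambda = 0.3 < 1/(3L)$ with $L = 1$, and the test problem are assumptions of this sketch, not details taken from [13].

```python
import math

# Hypothetical test problem: F(x) = A x + b, skew-symmetric A, L = 1, solution (-0.3, -0.5).
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [0.5, -0.3]
n_calls = 0  # number of evaluations of F


def F(x):
    global n_calls
    n_calls += 1
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def proj_box(x, lo=-1.0, hi=1.0):
    return [min(max(xi, lo), hi) for xi in x]


def proj_halfspace(u, a, y):
    # Projection onto {w : <a, w - y> <= 0}; the whole space when a = 0.
    aa = dot(a, a)
    excess = dot(a, [ui - yi for ui, yi in zip(u, y)])
    if aa == 0.0 or excess <= 0.0:
        return list(u)
    return [ui - (excess / aa) * ai for ui, ai in zip(u, a)]


def malitsky_semenov(x, lam, iters):
    F_prev = F(x)                                     # F(y_{-1}) with y_{-1} = x_0
    y = proj_box([xi - lam * f for xi, f in zip(x, F_prev)])  # y_0
    for _ in range(iters):
        Fy = F(y)                                     # the only F-value of this iteration
        a = [xi - lam * f - yi for xi, f, yi in zip(x, F_prev, y)]  # normal of T_n
        x = proj_halfspace([xi - lam * f for xi, f in zip(x, Fy)], a, y)  # x_{n+1}
        y = proj_box([xi - lam * f for xi, f in zip(x, Fy)])             # y_{n+1}
        F_prev = Fy
    return x, y


x, y = malitsky_semenov([0.0, 0.0], lam=0.3, iters=1000)
print(x, y, n_calls)  # n_calls counts one initial evaluation plus one per iteration
```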

In this work we propose the following scheme

$$x_{n+1} = P_C(x_n - \lambda F(2x_n - x_{n-1})), \tag{1.6}$$

where $\lambda \in \left(0, \frac{\sqrt{2} - 1}{L}\right)$. Again we see that algorithms (1.5) and (1.6) have the same computational complexity per iteration, but the latter has a simpler and more elegant structure. Algorithm (1.6) resembles the projected gradient method; however, the value of the mapping $F$ is taken at the point $2x_n - x_{n-1}$, which is the reflection of $x_{n-1}$ in $x_n$. Preliminary results from numerical experiments comparing algorithm (1.6) to others are promising. Also note that the simple structure of (1.6) allows us to obtain, in a very convenient way, a nonstationary variant of (1.6) with variable stepsize.
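A minimal Python sketch of scheme (1.6) follows; each iteration performs exactly one evaluation of $F$ (at the reflected point $2x_n - x_{n-1}$) and one projection onto $C$. The test problem, the box $C = [-1,1]^2$, the start $x_{-1} = x_0$ and the stepsize $\lambda = 0.4 < (\sqrt2 - 1)/L \approx 0.414$ (here $L = 1$) are assumptions chosen for illustration. For this rotation-type $A$ the plain projected gradient step expands the error by the factor $\sqrt{1 + \lambda^2}$ while the projection is inactive, which is exactly the situation the reflected scheme is designed to handle.

```python
import math

# Hypothetical test problem: F(x) = A x + b with skew-symmetric A (monotone, L = 1).
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [0.5, -0.3]
x_star = [-0.3, -0.5]  # A x_star + b = 0 and x_star lies inside C = [-1, 1]^2
n_F = 0  # evaluations of F
n_P = 0  # projections onto C


def F(x):
    global n_F
    n_F += 1
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def proj_box(x, lo=-1.0, hi=1.0):
    global n_P
    n_P += 1
    return [min(max(xi, lo), hi) for xi in x]


def reflected_gradient(x0, lam, iters):
    # Scheme (1.6): x_{n+1} = P_C(x_n - lam * F(2 x_n - x_{n-1})), with x_{-1} = x_0.
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        y = [2.0 * xi - xp for xi, xp in zip(x, x_prev)]  # reflection of x_{n-1} in x_n
        Fy = F(y)
        x_prev, x = x, proj_box([xi - lam * f for xi, f in zip(x, Fy)])
    return x


x = reflected_gradient([0.0, 0.0], lam=0.4, iters=500)
print(x, math.dist(x, x_star), n_F, n_P)  # one F-value and one projection per iteration
```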

The paper is organized as follows. Section 2 presents some basic facts which we use throughout the paper. In Section 3 we prove the convergence of method (1.6). Under a more restrictive assumption we also establish its rate of convergence. In Section 4 we present the nonstationary Algorithm 4.2, which is more flexible, since it does not use the Lipschitz constant of the mapping $F$. This makes it much more convenient in practical applications than method (1.6). Section 5 contains the results of numerical experiments.

## 2 Preliminaries

The next three statements are classical. For the proofs we refer the reader to [2, 3].

###### Lemma 2.1

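Let $M$ be a nonempty closed convex set in $H$ and $x \in H$. Then

• $\langle P_M x - x, y - P_M x\rangle \ge 0$ for all $y \in M$;

• $\|P_M x - y\|^2 \le \|x - y\|^2 - \|x - P_M x\|^2$ for all $y \in M$.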
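Both properties in Lemma 2.1 are easy to check numerically when $P_M$ is explicit. The Python sketch below takes $M$ to be a box (a hypothetical choice, since then $P_M$ is componentwise clipping) and tests both inequalities at random points, up to a small floating-point tolerance; the function names are illustrative only.

```python
import random


def proj_box(x, lo=-1.0, hi=1.0):
    # P_M for the box M = [lo, hi]^n (a nonempty closed convex set).
    return [min(max(xi, lo), hi) for xi in x]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def check_lemma_2_1(trials=10_000, dim=3, seed=0, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3.0, 3.0) for _ in range(dim)]  # arbitrary point of H
        y = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # arbitrary point of M
        p = proj_box(x)
        # First property: <P_M x - x, y - P_M x> >= 0.
        if dot(sub(p, x), sub(y, p)) < -tol:
            return False
        # Second property: ||P_M x - y||^2 <= ||x - y||^2 - ||x - P_M x||^2.
        lhs = dot(sub(p, y), sub(p, y))
        rhs = dot(sub(x, y), sub(x, y)) - dot(sub(x, p), sub(x, p))
        if lhs > rhs + tol:
            return False
    return True


print(check_lemma_2_1())  # True
```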

###### Lemma 2.2 (Minty)

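Assume that $F\colon C \to H$ is a continuous and monotone mapping. Then $x^*$ is a solution of (1.1) if and only if $x^*$ is a solution of the following problem: find $x \in C$ such that

$$\langle F(y), y - x\rangle \ge 0 \quad \forall y \in C.$$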

###### Remark 2.1

The solution set $S$ of variational inequality (1.1) is closed and convex.

As usual, the symbol $\rightharpoonup$ denotes weak convergence in $H$.

###### Lemma 2.3 (Opial)

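Let $(x_n)$ be a sequence in $H$ such that $x_n \rightharpoonup x$. Then for all $y \ne x$

$$\liminf_{n\to\infty}\|x_n - x\| < \liminf_{n\to\infty}\|x_n - y\|.$$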

The proofs of the next four statements are omitted owing to their simplicity.

###### Lemma 2.4

Let $M$ be a nonempty closed convex set in $H$, $x \in H$ and $\bar x = P_M x$. Then for all $\lambda \ge 0$

$$P_M(\bar x + \lambda(x - \bar x)) = \bar x.$$

###### Lemma 2.5

Let $u, v \in H$. Then

$$\|2u - v\|^2 = 2\|u\|^2 - \|v\|^2 + 2\|u - v\|^2.$$

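A quick numerical check of Lemma 2.5, written as a Python sketch with hypothetical helper names. In the convergence analysis the identity is applied with $u = x_n - z$ and $v = x_{n-1} - z$, so that $2u - v = y_n - z$ for $y_n = 2x_n - x_{n-1}$.

```python
import random


def sqnorm(u):
    return sum(ui * ui for ui in u)


def lemma_2_5_gap(u, v):
    # ||2u - v||^2 - (2||u||^2 - ||v||^2 + 2||u - v||^2); zero up to rounding.
    lhs = sqnorm([2.0 * ui - vi for ui, vi in zip(u, v)])
    rhs = 2.0 * sqnorm(u) - sqnorm(v) + 2.0 * sqnorm([ui - vi for ui, vi in zip(u, v)])
    return lhs - rhs


rng = random.Random(1)
gaps = [
    lemma_2_5_gap([rng.gauss(0.0, 1.0) for _ in range(4)],
                  [rng.gauss(0.0, 1.0) for _ in range(4)])
    for _ in range(1000)
]
print(max(abs(g) for g in gaps))  # only rounding error remains
```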
###### Lemma 2.6

Let $(a_n)$ and $(b_n)$ be two real sequences. Then

$$\liminf_{n\to\infty} a_n + \liminf_{n\to\infty} b_n \le \liminf_{n\to\infty}(a_n + b_n).$$

###### Lemma 2.7

Let $(a_n)$, $(b_n)$ be two nonnegative real sequences such that

$$a_{n+1} \le a_n - b_n.$$

Then $(a_n)$ is bounded and $\lim_{n\to\infty} b_n = 0$.

In order to establish the rate of convergence, we need the following lemma.

###### Lemma 2.8

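Let $(a_n)$, $(b_n)$ be two nonnegative real sequences, and let $\alpha_0 > 0$ and $\beta \in (0, 1)$ be such that for all $\alpha \in (0, \alpha_0]$ and all $n \ge 1$ the following holds:

$$a_{n+1} + b_{n+1} \le (1 - 2\alpha)a_n + \alpha a_{n-1} + \beta b_n. \tag{2.1}$$

Then there exist $\gamma \in (0, 1)$ and $M \ge 0$ such that $a_{n+1} \le \gamma^n M$ for any $n \ge 1$.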

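Proof. First, note that a simple calculus shows that the function $f(t) = \frac{1 - 2t + \sqrt{1 + 4t^2}}{2}$ is decreasing on $[0, \infty)$, since $f'(t) = -1 + \frac{2t}{\sqrt{1 + 4t^2}} < 0$. Since $f$ is continuous and $f(0) = 1 > \beta$, we can choose $\alpha \in (0, \alpha_0]$ such that $f(\alpha) \ge \beta$. Our next task is to find some $\gamma$ and $\delta$ such that (2.1), with $\alpha$ defined as above, will be equivalent to the inequality

$$a_{n+1} + \delta a_n + b_{n+1} \le \gamma(a_n + \delta a_{n-1}) + \beta b_n.$$

This requires $\gamma - \delta = 1 - 2\alpha$ and $\gamma\delta = \alpha$. It is easy to check that such numbers are

$$\gamma = \frac{1 - 2\alpha + \sqrt{1 + 4\alpha^2}}{2} = f(\alpha) \in (0, 1), \qquad \delta = \frac{-1 + 2\alpha + \sqrt{1 + 4\alpha^2}}{2} > 0.$$

Then, since $\beta \le \gamma$, we conclude that

$$a_{n+1} + \delta a_n + b_{n+1} \le \gamma(a_n + \delta a_{n-1} + b_n).$$

Iterating the last inequality leads to the desired result:

$$a_{n+1} \le a_{n+1} + \delta a_n + b_{n+1} \le \gamma(a_n + \delta a_{n-1} + b_n) \le \dots \le \gamma^n(a_1 + \delta a_0 + b_1) = \gamma^n M,$$

where $M = a_1 + \delta a_0 + b_1$.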
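The constants $\gamma$ and $\delta$ from the proof can be checked directly. The Python sketch below verifies $\gamma - \delta = 1 - 2\alpha$ and $\gamma\delta = \alpha$ for one value of $\alpha$, then builds a pair of sequences satisfying (2.1) with equality and tests the bound $a_{n+1} \le \gamma^n M$. The values $\alpha = 0.1$, $\beta = 0.5$ and the initial terms are arbitrary illustrative choices.

```python
import math


def gamma_delta(alpha):
    # gamma and delta from the proof of Lemma 2.8.
    s = math.sqrt(1.0 + 4.0 * alpha * alpha)
    return (1.0 - 2.0 * alpha + s) / 2.0, (-1.0 + 2.0 * alpha + s) / 2.0


alpha, beta = 0.1, 0.5
gamma, delta = gamma_delta(alpha)
assert abs((gamma - delta) - (1.0 - 2.0 * alpha)) < 1e-12  # gamma - delta = 1 - 2 alpha
assert abs(gamma * delta - alpha) < 1e-12                   # gamma * delta = alpha
assert 0.0 < gamma < 1.0 and delta > 0.0 and beta <= gamma

# Sequences satisfying (2.1) with equality for n >= 1:
# a_{n+1} = (1 - 2 alpha) a_n + alpha a_{n-1} and b_{n+1} = beta b_n.
a, b = [1.0, 0.8], [0.0, 0.3]
for n in range(1, 60):
    a.append((1.0 - 2.0 * alpha) * a[n] + alpha * a[n - 1])
    b.append(beta * b[n])

M = a[1] + delta * a[0] + b[1]
bound_ok = all(a[n + 1] <= gamma ** n * M * (1.0 + 1e-12) for n in range(1, 60))
print(gamma, delta, bound_ok)
```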

## 3 Algorithm and its convergence

We first note that solutions of (1.1) coincide with zeros of the following projected residual function:

$$r(x, y) := \|y - P_C(x - \lambda F(y))\| + \|x - y\|,$$

where $\lambda$ is some positive number. Now we formally state our algorithm.

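###### Algorithm 3.1
1. Choose $x_0 = y_0 \in H$ and $\lambda \in \left(0, \frac{\sqrt{2} - 1}{L}\right)$.

2. Given the current iterates $x_n$ and $y_n$, compute

$$x_{n+1} = P_C(x_n - \lambda F(y_n)).$$

3. If $r(x_n, y_n) = 0$, then stop: $x_n$ is a solution. Otherwise compute

$$y_{n+1} = 2x_{n+1} - x_n,$$

set $n := n + 1$, and return to step 2.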
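A minimal Python sketch of Algorithm 3.1 follows. It uses that, with $x_{n+1} = P_C(x_n - \lambda F(y_n))$, the residual equals $r(x_n, y_n) = \|y_n - x_{n+1}\| + \|x_n - y_n\|$, so the stopping test costs no extra projection or evaluation of $F$. Replacing the exact test $r = 0$ by $r \le \text{tol}$, the box $C = [-1,1]^2$, the test mapping and the function name `algorithm_3_1` are assumptions of this sketch.

```python
import math

# Hypothetical test problem (not from the paper): F(x) = A x + b, A skew-symmetric, L = 1.
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [0.5, -0.3]
L = 1.0


def F(x):
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def proj_box(x, lo=-1.0, hi=1.0):
    return [min(max(xi, lo), hi) for xi in x]


def algorithm_3_1(x0, lam, tol=1e-10, max_iter=10_000):
    """Projected reflected gradient method with the residual stopping test.

    Since x_{n+1} = P_C(x_n - lam F(y_n)), r(x_n, y_n) = ||y_n - x_{n+1}|| + ||x_n - y_n||.
    The exact test r = 0 is relaxed to r <= tol (an assumption of this sketch).
    """
    if not 0.0 < lam < (math.sqrt(2.0) - 1.0) / L:
        raise ValueError("lambda must lie in (0, (sqrt(2) - 1) / L)")
    x, y = list(x0), list(x0)                                          # step 1: x_0 = y_0
    for n in range(max_iter):
        x_next = proj_box([xi - lam * f for xi, f in zip(x, F(y))])   # step 2
        r = math.dist(y, x_next) + math.dist(x, y)                    # step 3: r(x_n, y_n)
        if r <= tol:
            return x, n
        y = [2.0 * p - q for p, q in zip(x_next, x)]                  # y_{n+1} = 2 x_{n+1} - x_n
        x = x_next
    return x, max_iter


x, n = algorithm_3_1([0.0, 0.0], lam=0.4)
print(x, n)
```

The explicit check on $\lambda$ mirrors step 1; with a stepsize outside $(0, (\sqrt2 - 1)/L)$ the sketch refuses to run rather than iterate without the guarantee of Theorem 3.2.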

The next lemma is central to our proof of the convergence theorem.

###### Lemma 3.1

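Let $(x_n)$ and $(y_n)$ be two sequences generated by Algorithm 3.1 and let $z \in S$. Then

$$\begin{aligned} \|x_{n+1} - z\|^2 \le{} & \|x_n - z\|^2 - \big(1 - \lambda L(1 + \sqrt{2})\big)\|x_n - x_{n-1}\|^2 + \lambda L\|x_n - y_{n-1}\|^2 \\ & - (1 - \sqrt{2}\lambda L)\|x_{n+1} - y_n\|^2 - 2\lambda\langle F(z), y_n - z\rangle. \end{aligned} \tag{3.1}$$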

Proof. By Lemma 2.1 we have

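$$\begin{aligned} \|x_{n+1} - z\|^2 &\le \|x_n - \lambda F(y_n) - z\|^2 - \|x_n - \lambda F(y_n) - x_{n+1}\|^2 \\ &= \|x_n - z\|^2 - \|x_{n+1} - x_n\|^2 - 2\lambda\langle F(y_n), x_{n+1} - z\rangle. \end{aligned} \tag{3.2}$$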

Since $F$ is monotone, $2\lambda\langle F(y_n) - F(z), y_n - z\rangle \ge 0$. Thus, adding this term to the right-hand side of (3.2), we get

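$$\begin{aligned} \|x_{n+1} - z\|^2 \le{} & \|x_n - z\|^2 - \|x_{n+1} - x_n\|^2 + 2\lambda\langle F(y_n), y_n - x_{n+1}\rangle - 2\lambda\langle F(z), y_n - z\rangle \\ ={} & \|x_n - z\|^2 - \|x_{n+1} - x_n\|^2 + 2\lambda\langle F(y_n) - F(y_{n-1}), y_n - x_{n+1}\rangle \\ & + 2\lambda\langle F(y_{n-1}), y_n - x_{n+1}\rangle - 2\lambda\langle F(z), y_n - z\rangle. \end{aligned} \tag{3.3}$$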

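As $x_n = P_C(x_{n-1} - \lambda F(y_{n-1}))$, we have by Lemma 2.1

$$\langle x_n - x_{n-1} + \lambda F(y_{n-1}), x_n - x_{n+1}\rangle \le 0, \qquad \langle x_n - x_{n-1} + \lambda F(y_{n-1}), x_n - x_{n-1}\rangle \le 0.$$

Adding these two inequalities and using $2x_n - x_{n-1} - x_{n+1} = y_n - x_{n+1}$, we obtain

$$\langle x_n - x_{n-1} + \lambda F(y_{n-1}), y_n - x_{n+1}\rangle \le 0,$$

from which we conclude

$$\begin{aligned} 2\lambda\langle F(y_{n-1}), y_n - x_{n+1}\rangle &\le 2\langle x_n - x_{n-1}, x_{n+1} - y_n\rangle = 2\langle y_n - x_n, x_{n+1} - y_n\rangle \\ &= \|x_{n+1} - x_n\|^2 - \|x_n - y_n\|^2 - \|x_{n+1} - y_n\|^2, \end{aligned} \tag{3.4}$$

since $x_n - x_{n-1} = y_n - x_n$.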

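We next turn to estimating $2\lambda\langle F(y_n) - F(y_{n-1}), y_n - x_{n+1}\rangle$. By the Cauchy–Schwarz inequality, the Lipschitz continuity of $F$, and the elementary inequality $\|y_n - y_{n-1}\|^2 \le (\|y_n - x_n\| + \|x_n - y_{n-1}\|)^2 \le (2 + \sqrt{2})\|y_n - x_n\|^2 + \sqrt{2}\|x_n - y_{n-1}\|^2$, it follows that

$$\begin{aligned} 2\lambda\langle F(y_n) - F(y_{n-1}), y_n - x_{n+1}\rangle &\le 2\lambda L\|y_n - y_{n-1}\|\,\|x_{n+1} - y_n\| \\ &\le \lambda L\Big(\tfrac{1}{\sqrt{2}}\|y_n - y_{n-1}\|^2 + \sqrt{2}\|x_{n+1} - y_n\|^2\Big) \\ &\le \lambda L\Big((1 + \sqrt{2})\|x_n - y_n\|^2 + \|x_n - y_{n-1}\|^2 + \sqrt{2}\|x_{n+1} - y_n\|^2\Big). \end{aligned} \tag{3.5}$$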

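Using (3.4) and (3.5), we deduce from (3.3) that

$$\begin{aligned} \|x_{n+1} - z\|^2 \le{} & \|x_n - z\|^2 - \big(1 - \lambda L(1 + \sqrt{2})\big)\|x_n - y_n\|^2 + \lambda L\|x_n - y_{n-1}\|^2 \\ & - (1 - \sqrt{2}\lambda L)\|x_{n+1} - y_n\|^2 - 2\lambda\langle F(z), y_n - z\rangle. \end{aligned}$$

Since $\|x_n - y_n\| = \|x_n - x_{n-1}\|$, this is exactly (3.1), which completes the proof.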
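Inequality (3.1) can also be checked along actual iterates. The Python sketch below runs Algorithm 3.1 on a hypothetical monotone test problem, starting from $x_0 = y_0 \in C$, and returns the smallest value of (right-hand side minus left-hand side) of (3.1) over $n \ge 1$; a value that is nonnegative up to rounding is consistent with the lemma. The derivation above places no bound on $\lambda$; the bound $\lambda < (\sqrt2 - 1)/L$ is only needed later to make the coefficients positive.

```python
import math

# Hypothetical test problem: F(x) = A x + b, skew-symmetric A (monotone), L = 1.
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [0.5, -0.3]
L = 1.0
z = [-0.3, -0.5]  # solution: F(z) = 0 and z lies in C = [-1, 1]^2


def F(x):
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def proj_box(x, lo=-1.0, hi=1.0):
    return [min(max(xi, lo), hi) for xi in x]


def sq(u, v):
    # ||u - v||^2
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def lemma_3_1_slack(x0, lam, iters):
    """Return min over 1 <= n <= iters of RHS(3.1) - LHS(3.1) along Algorithm 3.1."""
    xs, ys = [list(x0)], [list(x0)]  # x_0 = y_0, taken inside C
    for n in range(iters + 1):
        xs.append(proj_box([xi - lam * f for xi, f in zip(xs[n], F(ys[n]))]))
        ys.append([2.0 * p - q for p, q in zip(xs[n + 1], xs[n])])
    Fz = F(z)
    slack = math.inf
    for n in range(1, iters + 1):
        lhs = sq(xs[n + 1], z)
        rhs = (sq(xs[n], z)
               - (1.0 - lam * L * (1.0 + math.sqrt(2.0))) * sq(xs[n], xs[n - 1])
               + lam * L * sq(xs[n], ys[n - 1])
               - (1.0 - math.sqrt(2.0) * lam * L) * sq(xs[n + 1], ys[n])
               - 2.0 * lam * inner(Fz, [p - q for p, q in zip(ys[n], z)]))
        slack = min(slack, rhs - lhs)
    return slack


print(lemma_3_1_slack([0.0, 0.0], lam=0.4, iters=200))  # nonnegative up to rounding
```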

Now we can state and prove our main convergence result.

###### Theorem 3.2

Assume that (C1)–(C3) hold. Then any sequence $(x_n)$ generated by Algorithm 3.1 weakly converges to a solution of (1.1).

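Proof. Let us show that the sequence $(x_n)$ is bounded. Fix any $z \in S$. Since $\langle F(z), x_n - z\rangle \ge 0$ (because $x_n \in C$ for $n \ge 1$) and

$$\langle F(z), y_n - z\rangle = 2\langle F(z), x_n - z\rangle - \langle F(z), x_{n-1} - z\rangle \ge \langle F(z), x_n - z\rangle - \langle F(z), x_{n-1} - z\rangle,$$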

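we can deduce from inequality (3.1), using also that $\lambda L \le 1 - \sqrt{2}\lambda L$ (a consequence of $\lambda L(1 + \sqrt{2}) < 1$), that

$$\begin{aligned} \|x_{n+1} - z\|^2 + \lambda L\|x_{n+1} - y_n\|^2 + 2\lambda\langle F(z), x_n - z\rangle \le{} & \|x_n - z\|^2 + \lambda L\|x_n - y_{n-1}\|^2 + 2\lambda\langle F(z), x_{n-1} - z\rangle \\ & - \big(1 - \lambda L(1 + \sqrt{2})\big)\|x_n - x_{n-1}\|^2. \end{aligned} \tag{3.6}$$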

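For $n \ge 1$ let

$$a_n = \|x_n - z\|^2 + \lambda L\|x_n - y_{n-1}\|^2 + 2\lambda\langle F(z), x_{n-1} - z\rangle, \qquad b_n = \big(1 - \lambda L(1 + \sqrt{2})\big)\|x_n - x_{n-1}\|^2.$$

Then we can rewrite inequality (3.6) as $a_{n+1} \le a_n - b_n$. By Lemma 2.7, we conclude that $(a_n)$ is bounded and $\lim_{n\to\infty}\|x_n - x_{n-1}\| = 0$. Therefore, the sequence $(\|x_n - z\|)$ and hence $(x_n)$ are also bounded. From the inequality

$$\|x_{n+1} - y_n\| \le \|x_{n+1} - x_n\| + \|x_n - y_n\| = \|x_{n+1} - x_n\| + \|x_n - x_{n-1}\|$$

we also have that $\lim_{n\to\infty}\|x_{n+1} - y_n\| = 0$.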

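As $(x_n)$ is bounded, there exists a subsequence $(x_{n_i})$ of $(x_n)$ such that $(x_{n_i+1})$ converges weakly to some $x^* \in H$. Since $\|x_{n+1} - y_n\| \to 0$, it is clear that $(y_{n_i})$ also converges weakly to $x^*$; moreover, $x^* \in C$, because $C$ is closed and convex and hence weakly closed. We show $x^* \in S$. From Lemma 2.1 it follows that

$$\langle x_{n_i+1} - x_{n_i} + \lambda F(y_{n_i}), y - x_{n_i+1}\rangle \ge 0 \quad \forall y \in C.$$

From this we conclude that, for all $y \in C$,

$$\begin{aligned} 0 &\le \langle x_{n_i+1} - x_{n_i}, y - x_{n_i+1}\rangle + \lambda\langle F(y_{n_i}), y - y_{n_i}\rangle + \lambda\langle F(y_{n_i}), y_{n_i} - x_{n_i+1}\rangle \\ &\le \langle x_{n_i+1} - x_{n_i}, y - x_{n_i+1}\rangle + \lambda\langle F(y), y - y_{n_i}\rangle + \lambda\langle F(y_{n_i}), y_{n_i} - x_{n_i+1}\rangle. \end{aligned} \tag{3.7}$$

In the last inequality we used condition (C2). Taking the limit as $i \to \infty$ in (3.7) and using that $\lim_{n\to\infty}\|x_{n+1} - x_n\| = \lim_{n\to\infty}\|x_{n+1} - y_n\| = 0$ (note that $(F(y_{n_i}))$ is bounded, since $F$ is Lipschitz-continuous), we obtain

$$0 \le \langle F(y), y - x^*\rangle \quad \forall y \in C,$$

which implies by Lemma 2.2 that $x^* \in S$.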

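Let us show $x_n \rightharpoonup x^*$. From (3.6) it follows that the sequence $(a_n)$ is monotone for any $z \in S$. Taking into account its boundedness, we deduce that it is convergent. Finally, since $\lim_{n\to\infty}\|x_n - y_{n-1}\| = 0$, the sequence $\big(\|x_n - z\|^2 + 2\lambda\langle F(z), x_{n-1} - z\rangle\big)$ is also convergent for any $z \in S$.

We want to prove that $(x_n)$ is weakly convergent. On the contrary, assume that the sequence $(x_n)$ has at least two weak cluster points $\bar x \in S$ and $\tilde x \in S$ such that $\bar x \ne \tilde x$ (by the argument above, every weak cluster point of $(x_n)$ belongs to $S$). Let $(x_{n_k})$ be a sequence such that $x_{n_k} \rightharpoonup \tilde x$ as $k \to \infty$; then also $x_{n_k-1} \rightharpoonup \tilde x$, since $\|x_n - x_{n-1}\| \to 0$. Then by Lemmas 2.6 and 2.3 we have

$$\begin{aligned} \lim_{n\to\infty}\big(\|x_n - \bar x\|^2 + 2\lambda\langle F(\bar x), x_{n-1} - \bar x\rangle\big) &= \liminf_{k\to\infty}\big(\|x_{n_k} - \bar x\|^2 + 2\lambda\langle F(\bar x), x_{n_k-1} - \bar x\rangle\big) \\ &\ge \liminf_{k\to\infty}\|x_{n_k} - \bar x\|^2 + 2\lambda\langle F(\bar x), \tilde x - \bar x\rangle \\ &> \liminf_{k\to\infty}\|x_{n_k} - \tilde x\|^2 + 2\lambda\langle F(\bar x), \tilde x - \bar x\rangle \\ &= \lim_{n\to\infty}\big(\|x_n - \tilde x\|^2 + 2\lambda\langle F(\tilde x), x_{n-1} - \tilde x\rangle\big) + 2\lambda\langle F(\bar x), \tilde x - \bar x\rangle. \end{aligned}$$

We can now proceed analogously to prove that

$$\lim_{n\to\infty}\big(\|x_n - \tilde x\|^2 + 2\lambda\langle F(\tilde x), x_{n-1} - \tilde x\rangle\big) > \lim_{n\to\infty}\big(\|x_n - \bar x\|^2 + 2\lambda\langle F(\bar x), x_{n-1} - \bar x\rangle\big) + 2\lambda\langle F(\tilde x), \bar x - \tilde x\rangle.$$

Adding these two inequalities gives $0 > 2\lambda\langle F(\bar x), \tilde x - \bar x\rangle + 2\lambda\langle F(\tilde x), \bar x - \tilde x\rangle \ge 0$, where the last inequality holds because $\bar x, \tilde x \in S$. This is impossible. Hence we can conclude that $(x_n)$ weakly converges to some $x^* \in S$.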

It is well known (see [19]) that under some suitable conditions the extragradient method has an R-linear rate of convergence. In the following theorem we show that our method has the same rate of convergence under a strong monotonicity assumption on the mapping $F$, i.e.,

$$\text{(C2*)}\qquad \langle F(x) - F(y), x - y\rangle \ge m\|x - y\|^2 \quad \forall x, y \in H$$

for some $m > 0$.

###### Theorem 3.3

Assume that (C1), (C2*), (C3) hold. Then any sequence $(x_n)$ generated by Algorithm 3.1 converges to the solution of (1.1) at least R-linearly.

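Proof. Since $F$ is strongly monotone, (1.1) has a unique solution, which we denote by $z$. Note that by Lemma 2.5, applied with $u = x_n - z$ and $v = x_{n-1} - z$,

$$\|y_n - z\|^2 = \|2(x_n - z) - (x_{n-1} - z)\|^2 = 2\|x_n - z\|^2 - \|x_{n-1} - z\|^2 + 2\|x_n - x_{n-1}\|^2 \ge 2\|x_n - z\|^2 - \|x_{n-1} - z\|^2.$$

From this and from (C2*) we conclude that, for all $m_1 \in (0, m]$,

$$2\lambda\langle F(y_n) - F(z), y_n - z\rangle \ge 2\lambda m\|y_n - z\|^2 \ge 2\lambda m_1\big(2\|x_n - z\|^2 - \|x_{n-1} - z\|^2\big). \tag{3.8}$$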

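From now on, let $m_1$ be any number in $(0, m]$. Then, adding the nonnegative difference between the left- and right-hand sides of (3.8) to the right-hand side of (3.2), we get

$$\begin{aligned} \|x_{n+1} - z\|^2 \le{} & \|x_n - z\|^2 - \|x_{n+1} - x_n\|^2 + 2\lambda\langle F(y_n), y_n - x_{n+1}\rangle \\ & - 2\lambda\langle F(z), y_n - z\rangle - 2\lambda m_1\big(2\|x_n - z\|^2 - \|x_{n-1} - z\|^2\big) \\ ={} & (1 - 4\lambda m_1)\|x_n - z\|^2 - \|x_{n+1} - x_n\|^2 + 2\lambda m_1\|x_{n-1} - z\|^2 \\ & + 2\lambda\langle F(y_n), y_n - x_{n+1}\rangle - 2\lambda\langle F(z), y_n - z\rangle. \end{aligned}$$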

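For the terms $2\lambda\langle F(y_n) - F(y_{n-1}), y_n - x_{n+1}\rangle$ and $2\lambda\langle F(y_{n-1}), y_n - x_{n+1}\rangle$, whose sum is $2\lambda\langle F(y_n), y_n - x_{n+1}\rangle$, we use estimates (3.4) and (3.5) from Lemma 3.1. Therefore, we obtain

$$\begin{aligned} \|x_{n+1} - z\|^2 \le{} & (1 - 4\lambda m_1)\|x_n - z\|^2 + 2\lambda m_1\|x_{n-1} - z\|^2 - \big(1 - \lambda L(1 + \sqrt{2})\big)\|x_n - x_{n-1}\|^2 \\ & - (1 - \sqrt{2}\lambda L)\|x_{n+1} - y_n\|^2 + \lambda L\|x_n - y_{n-1}\|^2 - 2\lambda\langle F(z), y_n - z\rangle. \end{aligned}$$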

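From the last inequality it follows that

$$\begin{aligned} \|x_{n+1} - z\|^2 &+ (1 - \sqrt{2}\lambda L)\|x_{n+1} - y_n\|^2 + 4\lambda\langle F(z), x_n - z\rangle \\ &\le (1 - 4\lambda m_1)\|x_n - z\|^2 + 2\lambda m_1\|x_{n-1} - z\|^2 + \lambda L\|x_n - y_{n-1}\|^2 + 2\lambda\langle F(z), x_{n-1} - z\rangle \\ &\le (1 - 4\lambda m_1)\|x_n - z\|^2 + 2\lambda m_1\|x_{n-1} - z\|^2 \\ &\quad + \max\Big\{\frac{\lambda L}{1 - \sqrt{2}\lambda L}, \frac{1}{2}\Big\}\Big((1 - \sqrt{2}\lambda L)\|x_n - y_{n-1}\|^2 + 4\lambda\langle F(z), x_{n-1} - z\rangle\Big). \end{aligned} \tag{3.9}$$

In the first inequality we used that $\langle F(z), y_n - z\rangle = 2\langle F(z), x_n - z\rangle - \langle F(z), x_{n-1} - z\rangle$ and $\lambda L(1 + \sqrt{2}) < 1$, and in the second we used that

$$\lambda L \le (1 - \sqrt{2}\lambda L)\max\Big\{\frac{\lambda L}{1 - \sqrt{2}\lambda L}, \frac{1}{2}\Big\}, \qquad 2 \le 4\max\Big\{\frac{\lambda L}{1 - \sqrt{2}\lambda L}, \frac{1}{2}\Big\}.$$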

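Set

$$a_n = \|x_n - z\|^2, \quad b_n = (1 - \sqrt{2}\lambda L)\|x_n - y_{n-1}\|^2 + 4\lambda\langle F(z), x_{n-1} - z\rangle, \quad \beta = \max\Big\{\frac{\lambda L}{1 - \sqrt{2}\lambda L}, \frac{1}{2}\Big\}, \quad \alpha = 2\lambda m_1, \quad \alpha_0 = 2\lambda m.$$

As $m_1$ is arbitrary in $(0, m]$, we can rewrite (3.9) in the new notation as

$$a_{n+1} + b_{n+1} \le (1 - 2\alpha)a_n + \alpha a_{n-1} + \beta b_n \quad \forall \alpha \in (0, \alpha_0].$$

Since $\beta < 1$ (this follows from $\lambda L(1 + \sqrt{2}) < 1$), we can conclude by Lemma 2.8 that $\|x_{n+1} - z\|^2 \le \gamma^n M$ for some $\gamma \in (0, 1)$ and $M \ge 0$. This means that $(x_n)$ converges to $z$ at least R-linearly.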
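The R-linear rate can be observed numerically. The Python sketch below applies Algorithm 3.1 to a hypothetical strongly monotone problem $F(x) = Ax + b$ with $A = mI + J$, where $J$ is a rotation by $90^\circ$, so that (C2*) holds with $m = 0.5$ and $L = \sqrt{1 + m^2}$; it records $\|x_n - z\|$. With $\lambda = 0.3$ the errors should decay roughly geometrically until they reach rounding level. All names and parameter values are illustrative assumptions.

```python
import math

# Hypothetical strongly monotone test problem: F(x) = A x + b with A = m I + J,
# J a rotation by 90 degrees, so (C2*) holds with m = 0.5 and L = sqrt(1 + m^2).
m = 0.5
A = [[m, 1.0], [-1.0, m]]
b = [0.3, 0.4]
z = [0.2, -0.4]  # A z + b = 0 and z lies inside C = [-1, 1]^2
L = math.sqrt(1.0 + m * m)


def F(x):
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def proj_box(x, lo=-1.0, hi=1.0):
    return [min(max(xi, lo), hi) for xi in x]


def errors_of_algorithm_3_1(x0, lam, iters):
    # Record ||x_n - z|| along Algorithm 3.1 started from x_0 = y_0.
    x_prev, x = list(x0), list(x0)
    errs = [math.dist(x, z)]
    for _ in range(iters):
        y = [2.0 * p - q for p, q in zip(x, x_prev)]
        x_prev, x = x, proj_box([xi - lam * f for xi, f in zip(x, F(y))])
        errs.append(math.dist(x, z))
    return errs


lam = 0.3  # satisfies lam < (sqrt(2) - 1) / L, which is about 0.370 here
errs = errors_of_algorithm_3_1([0.0, 0.0], lam, 200)
print(errs[50], errs[100], errs[200])
```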

## 4 Modified Algorithm

The main shortcoming of all algorithms mentioned in Section 1 is the requirement to know the Lipschitz constant, or at least some estimate of it. Usually it is difficult to estimate the Lipschitz constant accurately, so the stepsizes become quite small, which is impractical. For this reason, algorithms with a constant stepsize are not applicable in most cases of interest. The usual approaches to overcome this difficulty consist either in predicting a stepsize and then correcting it (see [9, 14, 20]) or in using an Armijo-type linesearch procedure along a feasible direction (see [7, 17]). Usually the latter approach is more effective, since the former very often requires too many projections onto the feasible set per iteration.

Nevertheless, our modified method uses the prediction-correction strategy. However, in contrast to the algorithms in [9, 14, 20], we need at most two projections per iteration. This is because for the direction $y_n$ in Algorithm 3.1 we use a very simple and cheap formula: $y_n = 2x_n - x_{n-1}$. Although we cannot explain this theoretically, numerical experiments show that iterations with two projections are quite rare, so usually we have only one projection per iteration, which is in drastic contrast to other existing methods.

Looking at the proof of Theorem 3.2, we can conclude that inequality (3.5) is the only place where we use the Lipschitz constant $L$. Therefore, by choosing $\lambda_n$ such that the inequality $\lambda_n\|F(y_n) - F(y_{n-1})\| \le \alpha\|y_n - y_{n-1}\|$ holds for some fixed $\alpha \in (0, \sqrt{2} - 1)$, we can obtain an estimate similar to (3.5). All this leads us to the following algorithm.

###### Algorithm 4.1
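$$\begin{cases} \text{Choose } x_0 = y_0 \in H,\ \lambda_0 > 0,\ \alpha \in (0, \sqrt{2} - 1).\\ \text{Choose } \lambda_n \text{ s.t. } \lambda_n\|F(y_n) - F(y_{n-1})\| \le \alpha\|y_n - y_{n-1}\|, \quad n > 0.\\ x_{n+1} = P_C(x_n - \lambda_n F(y_n)),\\ y_{n+1} = 2x_{n+1} - x_n. \end{cases}$$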
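Algorithm 4.1 leaves the choice of $\lambda_n$ open. A minimal Python sketch, assuming the rule $\lambda_n = \min\{\bar\lambda, \alpha\|y_n - y_{n-1}\|/\|F(y_n) - F(y_{n-1})\|\}$ (the largest admissible step below a cap $\bar\lambda$, here the hypothetical parameter `lam_max`, which is also used when the denominator vanishes), is given below. On the hypothetical test mapping $F(x) = sAx + b$ with a rotation $A$, the ratio $\|F(y) - F(y')\|/\|y - y'\|$ equals $s = L$ for every pair of points, so the rule settles at $\lambda_n = \alpha/L$ although the code never uses $L$.

```python
import math

# Hypothetical test problem: F(x) = s * A x + b with a rotation A, so L = s = 5.
# The method below never uses s or L.
s = 5.0
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [2.5, -1.5]
z = [-0.3, -0.5]  # solution: F(z) = 0 and z lies inside C = [-1, 1]^2


def F(x):
    return [s * sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def proj_box(x, lo=-1.0, hi=1.0):
    return [min(max(xi, lo), hi) for xi in x]


def algorithm_4_1(x0, lam0, alpha, lam_max, iters):
    # Step rule (an assumption of this sketch): the largest lam_n <= lam_max with
    # lam_n * ||F(y_n) - F(y_{n-1})|| <= alpha * ||y_n - y_{n-1}||.
    x, y = list(x0), list(x0)          # x_0 = y_0
    Fy = F(y)
    y_prev = Fy_prev = None
    lam = lam0
    steps = [lam]
    for n in range(iters):
        if n > 0:
            dF = math.dist(Fy, Fy_prev)
            dy = math.dist(y, y_prev)
            lam = lam_max if dF == 0.0 else min(lam_max, alpha * dy / dF)
            steps.append(lam)
        x_next = proj_box([xi - lam * f for xi, f in zip(x, Fy)])
        y_prev, Fy_prev = y, Fy
        y = [2.0 * p - q for p, q in zip(x_next, x)]   # y_{n+1} = 2 x_{n+1} - x_n
        x = x_next
        Fy = F(y)                                      # one F-value per iteration
    return x, steps


x, steps = algorithm_4_1([0.0, 0.0], lam0=0.01, alpha=0.4, lam_max=1.0, iters=200)
print(math.dist(x, z), steps[1:4])  # steps settle at alpha / L = 0.08
```

Recall that convergence of Algorithm 4.1 is not proved in general; the sketch only illustrates how the stepsize rule adapts to an unknown $L$.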

Although numerical results show that Algorithm 4.1 is effective, we cannot prove its convergence in all cases. Nevertheless, we note that we have not found any example where Algorithm 4.1 fails. Thus, even Algorithm 4.1 seems to be reliable for many problems.

Now our task is to modify Algorithm 4.1 in such a way that we can prove convergence of the resulting algorithm. For this we need to distinguish the good cases, where Algorithm 4.1 works well, from the bad ones, where it possibly does not.

From now on we adopt the convention that . Clearly, it follows that as well. The following algorithm gets around the difficulty of the bad cases of Algorithm 4.1.

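###### Algorithm 4.2
1. Choose $x_0 \in H$, $\lambda_{-1} > 0$, $\alpha \in (0, \sqrt{2} - 1)$, and some large $\bar\lambda > 0$. Compute

$$y_0 = P_C(x_0 - \lambda_{-1}F(x_0)), \qquad \lambda_0 = \min\Big\{\frac{\alpha\|x_0 - y_0\|}{\|F(x_0) - F(y_0)\|}, \bar\lambda\Big\}, \qquad x_1 = P_C(x_0 - \lambda_0 F(y_0)).$$

2. Given $x_n$ and $x_{n-1}$, set $\tau_n = 1$ and let $\lambda(y, \tau)$ be defined by

$$\lambda(y, \tau) := \min\Big\{\frac{\alpha\|y - y_{n-1}\|}{\|F(y) - F(y_{n-1})\|}, \frac{1 + \tau_{n-1}}{\tau}\lambda_{n-1}, \bar\lambda\Big\}. \tag{4.1}$$

Compute

$$y_n = 2x_n - x_{n-1}, \qquad \lambda_n = \lambda(y_n, \tau_n), \qquad x_{n+1} = P_C(x_n - \lambda_n F(y_n)).$$

3. If $r(x_n, y_n) = 0$ (with $\lambda = \lambda_n$), then stop: $x_n$ is a solution. Otherwise compute

$$\begin{aligned} t_n ={} & -\|x_{n+1} - x_n\|^2 + 2\lambda_n\langle F(y_n), y_n - x_{n+1}\rangle + \big(1 - \alpha(1 + \sqrt{2})\big)\|x_n - y_n\|^2 \\ & - \alpha\|x_n - y_{n-1}\|^2 + (1 - \sqrt{2}\alpha)\|x_{n+1} - y_n\|^2. \end{aligned}$$

4. If $t_n \le 0$, then set $n := n + 1$ and go to step 2. Otherwise we have two cases: $\lambda_n \ge \lambda_{n-1}$ and $\lambda_n < \lambda_{n-1}$.

(i) If $\lambda_n \ge \lambda_{n-1}$, then choose $\lambda_n'$ such that

$$\|\lambda_n' F(y_n) - \lambda_{n-1}F(y_{n-1})\| \le \alpha\|y_n - y_{n-1}\|. \tag{4.2}$$

Compute

$$x_{n+1} = P_C(x_n - \lambda_n' F(y_n)).$$

Set $\lambda_n := \lambda_n'$, $n := n + 1$, and go to step 2.

(ii) If $\lambda_n < \lambda_{n-1}$, then find $\tau_n'$ such that

$$y_n' = x_n + \tau_n'(x_n - x_{n-1}), \qquad \lambda(y_n', \tau_n') \ge \tau_n'\lambda_{n-1}. \tag{4.3}$$

Then choose $\lambda_n'$ such that

$$\|\lambda_n' F(y_n') - \tau_n'\lambda_{n-1}F(y_{n-1})\| \le \alpha\|y_n' - y_{n-1}\|. \tag{4.4}$$

Compute

$$x_{n+1} = P_C(x_n - \lambda_n' F(y_n')).$$

Set $\lambda_n := \lambda_n'$, $\tau_n := \tau_n'$, $y_n := y_n'$, $n := n + 1$, and go to step 2.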
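The two computable quantities that drive Algorithm 4.2, the candidate stepsize (4.1) and the test value $t_n$, can be sketched in Python as follows. Treating a zero denominator in (4.1) as $+\infty$, the test mapping and all function names are assumptions of this sketch. Since the right-hand side of (4.6) equals $\|x_n - z\|^2 - \|x_{n+1} - x_n\|^2 + 2\lambda_n\langle F(y_n), y_n - x_{n+1}\rangle - 2\lambda_n\langle F(z), y_n - z\rangle$, the condition $t_n \le 0$ says that it does not exceed the right-hand side of (4.5). In a run with a constant stepsize $\lambda = \alpha/L$ the estimates (3.4) and (3.5) apply, so the computed $t_n$ should be nonpositive up to rounding.

```python
import math

# Hypothetical test problem: F(x) = s * A x + b, rotation A, so L = s.
s = 5.0
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [2.5, -1.5]


def F(x):
    return [s * sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def proj_box(x, lo=-1.0, hi=1.0):
    return [min(max(xi, lo), hi) for xi in x]


def sq(u, v):
    # ||u - v||^2
    return sum((p - q) ** 2 for p, q in zip(u, v))


def lam_candidate(y, y_prev, tau, tau_prev, lam_prev, alpha, lam_bar):
    # Formula (4.1); a zero denominator is treated as +infinity (assumption).
    dF = math.dist(F(y), F(y_prev))
    ratio = math.inf if dF == 0.0 else alpha * math.dist(y, y_prev) / dF
    return min(ratio, (1.0 + tau_prev) / tau * lam_prev, lam_bar)


def t_value(x, x_next, y, y_prev, lam, alpha):
    # t_n from step 3 of Algorithm 4.2.
    inner = sum(f * (p - q) for f, p, q in zip(F(y), y, x_next))  # <F(y_n), y_n - x_{n+1}>
    r2 = math.sqrt(2.0)
    return (-sq(x_next, x) + 2.0 * lam * inner
            + (1.0 - alpha * (1.0 + r2)) * sq(x, y)
            - alpha * sq(x, y_prev)
            + (1.0 - r2 * alpha) * sq(x_next, y))


alpha = 0.4
lam = alpha / s  # constant step with lam * L = alpha, so lam_n = lam_{n-1} for all n
xs, ys = [[0.0, 0.0]], [[0.0, 0.0]]
for n in range(100):
    xs.append(proj_box([xi - lam * f for xi, f in zip(xs[n], F(ys[n]))]))
    ys.append([2.0 * p - q for p, q in zip(xs[n + 1], xs[n])])
t_max = max(t_value(xs[n], xs[n + 1], ys[n], ys[n - 1], lam, alpha) for n in range(1, 100))
print(t_max)  # nonpositive up to rounding in this constant-step run
print(lam_candidate([1.0, 0.0], [0.0, 0.0], 1.0, 1.0, 0.1, alpha, 1.0))  # min(0.08, 0.2, 1.0)
```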

It is clear that at every iteration of Algorithm 4.2 we need to use the residual function $r$ with a different $\lambda$, namely $\lambda = \lambda_n$.

First, let us show that Algorithm 4.2 is correct, i.e., that it is always possible to choose $\lambda_n'$ and $\tau_n'$ in steps (i) and (ii). For this we need two simple lemmas.

###### Lemma 4.1

Step (i) in Algorithm 4.2 is well-defined.

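Proof. In step (i) we have $\lambda_{n-1} \le \lambda_n$, and by (4.1) $\lambda_n\|F(y_n) - F(y_{n-1})\| \le \alpha\|y_n - y_{n-1}\|$. From the inequality

$$\|\lambda_{n-1}F(y_n) - \lambda_{n-1}F(y_{n-1})\| \le \lambda_n\|F(y_n) - F(y_{n-1})\| \le \alpha\|y_n - y_{n-1}\|$$

we can see that it is sufficient to take $\lambda_n' = \lambda_{n-1}$. (However, for practical reasons it seems better to choose $\lambda_n'$ as large as possible.)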

###### Lemma 4.2

Step (ii) in Algorithm 4.2 is well-defined.

Proof. First, let us show that for all and . It is clear that for every

Then and hence by induction

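$$\lambda(y, \tau) = \min\Big\{\frac{\alpha\|y - y_{n-1}\|}{\|F(y) - F(y_{n-1})\|}, \frac{1 + \tau_{n-1}}{\tau}\lambda_{n-1}, \bar\lambda\Big\} \ge \frac{\alpha}{L}.$$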

Therefore, it is sufficient to take . (But, as above, it seems better to choose as large as possible.)

Finally, the existence of $\lambda_n'$ such that (4.4) holds can be proved by the same arguments as in Lemma 4.1.

The following lemma yields an inequality analogous to (3.1).

###### Lemma 4.3

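Let $(x_n)$ and $(y_n)$ be two sequences generated by Algorithm 4.2 and let $z \in S$ and $n \ge 1$. Then

$$\begin{aligned} \|x_{n+1} - z\|^2 \le{} & \|x_n - z\|^2 - \big(1 - \alpha(1 + \sqrt{2})\big)\|x_n - y_n\|^2 - (1 - \sqrt{2}\alpha)\|x_{n+1} - y_n\|^2 \\ & + \alpha\|x_n - y_{n-1}\|^2 - 2\lambda_n\langle F(z), y_n - z\rangle. \end{aligned} \tag{4.5}$$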

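Proof. Proceeding analogously as in (3.2) and (3.3), we get

$$\begin{aligned} \|x_{n+1} - z\|^2 \le{} & \|x_n - z\|^2 - \|x_{n+1} - x_n\|^2 + 2\lambda_n\langle F(y_n) - F(y_{n-1}), y_n - x_{n+1}\rangle \\ & + 2\lambda_n\langle F(y_{n-1}), y_n - x_{n+1}\rangle - 2\lambda_n\langle F(z), y_n - z\rangle. \end{aligned} \tag{4.6}$$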

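The same arguments as in (3.4) yield

$$\begin{aligned} 2\lambda_{n-1}\langle F(y_{n-1}), y_n - x_{n+1}\rangle &\le 2\langle x_n - x_{n-1}, x_{n+1} - y_n\rangle = 2\langle y_n - x_n, x_{n+1} - y_n\rangle \\ &= \|x_{n+1} - x_n\|^2 - \|x_n - y_n\|^2 - \|x_{n+1} - y_n\|^2. \end{aligned} \tag{4.7}$$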

Using (3.5) and the inequality