Quantitative convergence analysis of iterated expansive, set-valued mappings


D. Russell Luke Institut für Numerische und Angewandte Mathematik, Universität Göttingen, 37083 Göttingen, Germany. DRL was supported in part by German Israeli Foundation Grant G-1253-304.6 and Deutsche Forschungsgemeinschaft Collaborative Research Center SFB755 E-Mail: r.luke@math.uni-goettingen.de    Nguyen H. Thao Institut für Numerische und Angewandte Mathematik, Universität Göttingen, 37083 Göttingen, Germany. NHT was supported by German Israeli Foundation Grant G-1253-304.6. E-Mail: h.nguyen@math.uni-goettingen.de    Matthew K. Tam Institut für Numerische und Angewandte Mathematik, Universität Göttingen, 37083 Göttingen, Germany. MKT was supported by Deutsche Forschungsgemeinschaft Research Training Grant 2088. E-Mail: m.tam@math.uni-goettingen.de
Abstract

We develop a framework for quantitative convergence analysis of Picard iterations of expansive, set-valued fixed point mappings. There are two key components of the analysis. The first is a natural generalization of single-valued averaged mappings to expansive, set-valued mappings that characterizes a type of strong calmness of the fixed point mapping. The second component is an extension of the well-established notion of metric subregularity – or inverse calmness – of the mapping at fixed points. Convergence of expansive fixed point iterations is proved using these two properties, and quantitative estimates are a natural byproduct of the framework. To demonstrate the application of the theory, we prove for the first time a number of results showing local linear convergence of nonconvex cyclic projections for inconsistent (and consistent) feasibility problems, local linear convergence of the forward-backward algorithm for structured optimization without convexity, strong or otherwise, and local linear convergence of the Douglas–Rachford algorithm for structured nonconvex minimization. This theory includes earlier approaches to known results, convex and nonconvex, as special cases.

2010 Mathematics Subject Classification: Primary 49J53, 65K10 Secondary 49K40, 49M05, 49M27, 65K05, 90C26.

Keywords: Almost averaged mappings, averaged operators, calmness, cyclic projections, elemental regularity, feasibility, fixed points, forward-backward, Douglas–Rachford, Hölder regularity, hypomonotone, Krasnoselski-Mann iteration, Picard iteration, piecewise linear-quadratic, polyhedral mapping, metric regularity, metric subregularity, nonconvex, nonexpansive, structured optimization, submonotone, subtransversality, transversality

1 Introduction

We present a program of analysis that enables one to quantify the rate of convergence of sequences generated by fixed point iterations of expansive, set-valued mappings. The framework presented here subsumes earlier approaches for analyzing fixed point iterations of relaxed nonexpansive mappings and opens up new results for expansive mappings. Our approach has its roots in the pioneering work of Mann, Krasnoselski, Edelstein, Gurin (whose name, as we learned from Alex Kruger, was misprinted as Gubin in the translation of his work into English), Polyak and Raik, who wrote seminal papers in the analysis of (firmly) nonexpansive and averaged mappings [53, 40, 30, 31], although the terminology “averaged” wasn’t coined until sometime later [8]. Our strategy is also indebted to the developers of notions of stability, in particular metric regularity and its more recent refinements [63, 7, 28, 35, 36]. We follow a pattern of proof used in [32] and [3] for Picard iterations of set-valued mappings, though this approach was actually inspired by the analysis of alternating projections in [31].

The idea is to isolate two properties of the fixed point mapping. The first property is a generalization of the averaging property, which we call almost averaging. When a self-mapping is averaged and fixed points exist, the Picard iteration converges to a fixed point (weakly in the Hilbert space setting) without any additional assumptions. (See [61, Theorem 3]. See also [70, 3. Satz] for the statement under the assumption that the mapping is weakly continuous.) In order to quantify convergence, a second property is needed. In their analysis of Krasnoselski-Mann relaxed cyclic projections for convex feasibility, Gurin, Polyak and Raik assume that the set-intersection has interior [31, Theorem 1]. Interiority is an assumption about the stability of the fixed points of the mapping, and it can be generalized considerably. Even if rates of convergence are not the primary interest, if the averaging property is relaxed in any meaningful way, monotonicity of the Picard iteration with respect to the set of fixed points is lost. To recover convergence in this case, we appeal to stability of the set of fixed points to overcome the lack of monotonicity of the fixed point mapping. The second property we require of the mapping is thus a characterization of the needed stability at fixed points. Metric subregularity of the mapping at fixed points is one well-established notion that delivers this stability and provides quantitative estimates for the rate of convergence of the iterates. This is closely related to (indeed, synonymous with) the existence of error bounds. The almost averaging and stability properties are defined and quantified on local neighborhoods, but our approach is not asymptotic. Indeed, when convexity or nonexpansivity is assumed, these local neighborhoods extend to the whole space, and the corresponding results are global and recover the classical ones.

We take care to introduce the notions of almost averaging, stability and metric subregularity, and to present the most general abstract results, in Section 2. Almost averaged mappings are developed first in Section 2.1, after which abstract convergence results are presented in Section 2.2. In Section 2.3 the notion of metric regularity and its variants are presented and applied to the abstract results of Section 2.2. The rest of the paper, Section 3, is a tutorial on the application of these ideas to quantitative convergence analysis of algorithms for, respectively, nonconvex and inconsistent feasibility (Section 3.1) and structured optimization (Section 3.2). We focus our attention on just a few simple algorithms, namely cyclic projections, projected gradients and Douglas–Rachford.

Among the new and recent concepts are: almost nonexpansive/averaged mappings (Section 2.1), which are a generalization of averaged mappings [8] and satisfy a type of strong calmness of set-valued mappings; submonotonicity of set-valued self-mappings (Definition 2.9), which is equivalent to almost firm nonexpansiveness of their resolvents (Proposition 2.8), generalizing Minty’s classical identification of monotone mappings with firmly nonexpansive resolvents [54, 67]; elementally subregular sets (Definition 3.1 from [42, Definition 5]); subtransversality of collections of sets at points of nonintersection (Definition 3.6); and gauge metric subregularity (Definition 2.17 from [35, 36]). These objects are applied to obtain a number of new results: local linear convergence of nonconvex cyclic projections for inconsistent feasibility problems (Theorem 3.14), with some surprising special cases like two nonintersecting circles (Example 3.18) and practical (inconsistent) phase retrieval (Example 3.20); global R-linear convergence of cyclic projections onto convex sets (Corollary 3.15); local linear convergence of forward-backward-type algorithms without convexity or strong monotonicity (Theorem 3.24); local linear convergence of the Douglas–Rachford algorithm for structured nonconvex optimization (Theorem 3.33); and a specialization to the relaxed averaged alternating reflections (RAAR) algorithm [46, 47] for inconsistent phase retrieval (Example 3.35).

The quantitative convergence results presented here focus on linear convergence, but this framework is appropriate for a wider range of behaviors, particularly sublinear convergence. The emphasis on linear convergence is in part due to its simplicity, but also because it is surprisingly prevalent in first order algorithms for common problem structures (see the discussions of phase retrieval in Examples 3.20 and 3.35). To be sure, there are constants that would, if known, determine the exact rate, and these are either hard or impossible to calculate. But in many instances the order of convergence – linear or sublinear – can be determined a priori. As such, a posteriori error bounds can be estimated in some cases, with the usual epistemological caveats, from the observed behavior of the algorithm. For problems where the solution to the underlying variational problem, as opposed to its optimal value, is the only meaningful result of the numerical algorithm, such error bounds are essential. One important example is image processing with statistical constraints studied in [3] and [51]. Here the images are physical measurements and solutions to the variational image processing problems have a quantitative statistical interpretation in terms of the experimental data. In contrast, the more common analysis determining that an algorithm for computing these solutions merely converges, or even that the objective value converges at a given rate, leads unavoidably to vacuous assurances.

1.1 Basic definitions and notation

The setting throughout this work is a finite-dimensional Euclidean space $\mathbb{E}$. The norm $\|\cdot\|$ denotes the Euclidean norm. The open unit ball and the unit sphere in a Euclidean space are denoted $\mathbb{B}$ and $\mathbb{S}$, respectively, and $\mathbb{B}_{\delta}(x)$ stands for the open ball with radius $\delta>0$ and center $x$. We denote the extended reals by $(-\infty,+\infty]:=\mathbb{R}\cup\{+\infty\}$. The domain of a function $f:\mathbb{E}\to(-\infty,+\infty]$ is defined by $\operatorname{dom}f:=\{x\in\mathbb{E}\,:\,f(x)<+\infty\}$. The subdifferential of $f$ at $\bar{x}$, for our purposes, can be defined by

$\partial f(\bar{x}):=\left\{v\in\mathbb{E}\,:\,\exists\,x^k\xrightarrow{f}\bar{x},\ v^k\to v\ \text{with}\ \liminf_{x\to x^k,\,x\neq x^k}\frac{f(x)-f(x^k)-\langle v^k,\,x-x^k\rangle}{\|x-x^k\|}\geq 0\right\}.$   (1)

Here the notation $x\xrightarrow{f}\bar{x}$ means that $x\to\bar{x}$ and $f(x)\to f(\bar{x})$. When $f$ is convex, (1) reduces to the usual convex subdifferential given by

$\partial f(\bar{x}):=\left\{v\in\mathbb{E}\,:\,\langle v,\,x-\bar{x}\rangle\leq f(x)-f(\bar{x})\ \text{for all}\ x\in\mathbb{E}\right\}.$   (2)

When $\bar{x}\notin\operatorname{dom}f$ the subdifferential is defined to be empty. Elements of the subdifferential are called subgradients.

A set-valued mapping $T$ from $\mathbb{E}$ to another Euclidean space $\mathbb{Y}$ is denoted $T:\mathbb{E}\rightrightarrows\mathbb{Y}$ and its inverse is given by

$T^{-1}(y):=\{x\in\mathbb{E}\,:\,y\in T(x)\}.$   (3)

The mapping $T:\mathbb{E}\rightrightarrows\mathbb{E}$ is said to be monotone on $\Omega\subset\mathbb{E}$ if

$\langle v'-v,\,x'-x\rangle\geq 0\quad\text{whenever}\ v\in T(x),\ v'\in T(x')\ \text{for}\ x,x'\in\Omega.$   (4)

$T$ is called strongly monotone on $\Omega$ if there exists a $\tau>0$ such that

$\langle v'-v,\,x'-x\rangle\geq\tau\|x'-x\|^2\quad\text{whenever}\ v\in T(x),\ v'\in T(x')\ \text{for}\ x,x'\in\Omega.$   (5)

A maximally monotone mapping is one whose graph cannot be augmented by any more points without violating monotonicity. The subdifferential of a proper, l.s.c., convex function, for example, is a maximally monotone set-valued mapping [68, Theorem 12.17]. We denote the resolvent of $T$ by $J_T:=(\operatorname{Id}+T)^{-1}$, where $\operatorname{Id}$ denotes the identity mapping. The corresponding reflector is defined by $R_T:=2J_T-\operatorname{Id}$. A basic and fundamental fact is that the resolvent of a monotone mapping is firmly nonexpansive and hence single-valued [54, 22]. Of particular interest are polyhedral (or piecewise polyhedral [68]) mappings, that is, mappings whose graph is the union of finitely many sets that are polyhedral convex in $\mathbb{E}\times\mathbb{Y}$ [28].
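To make the resolvent and reflector concrete, here is a minimal numerical sketch (ours, not part of the original text) for the assumed example $f(x)=|x|$ on $\mathbb{R}$: the resolvent of the maximally monotone mapping $T=\partial f$ is the soft-thresholding operator, and firm nonexpansiveness of the resolvent can be verified empirically on random samples.

```python
import numpy as np

# Sketch (ours): for T = ∂f with f(x) = |x|, the resolvent
# J_T = (Id + T)^(-1) is soft thresholding, and R_T = 2 J_T - Id
# is the corresponding reflector.

def resolvent(x, lam=1.0):
    """J_{lam*T} for T = ∂|.|: soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def reflector(x, lam=1.0):
    return 2.0 * resolvent(x, lam) - x

# Empirical check of firm nonexpansiveness of the resolvent:
# ||Jx - Jy||^2 + ||(x - Jx) - (y - Jy)||^2 <= ||x - y||^2.
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(scale=3.0, size=2)
    Jx, Jy = resolvent(x), resolvent(y)
    lhs = (Jx - Jy) ** 2 + ((x - Jx) - (y - Jy)) ** 2
    assert lhs <= (x - y) ** 2 + 1e-12
print("firm nonexpansiveness verified on random samples")
```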

Notions of continuity of set-valued mappings have been thoroughly developed over the last several decades. Readers are referred to the monographs [5, 68, 28] for basic results. A mapping $T:\mathbb{E}\rightrightarrows\mathbb{Y}$ is said to be Lipschitz continuous on $D\subset\mathbb{E}$ if it is closed-valued and there exists a $\kappa\geq 0$ such that, for all $x,x'\in D$,

$T(x')\subset T(x)+\kappa\|x'-x\|\overline{\mathbb{B}}.$   (6)

Lipschitz continuity is, however, too strong a notion for set-valued mappings. We will mostly only require calmness, which is a pointwise version of Lipschitz continuity. A mapping $T:\mathbb{E}\rightrightarrows\mathbb{Y}$ is said to be calm at $\bar{x}$ for $\bar{y}$ if $(\bar{x},\bar{y})\in\operatorname{gph}T$ and there is a constant $\kappa\geq 0$ together with neighborhoods $U$ of $\bar{x}$ and $V$ of $\bar{y}$ such that

$T(x)\cap V\subset T(\bar{x})+\kappa\|x-\bar{x}\|\overline{\mathbb{B}}\quad\text{for all}\ x\in U.$   (7)

When $T$ is single-valued, calmness is just pointwise Lipschitz continuity:

$\|T(x)-T(\bar{x})\|\leq\kappa\|x-\bar{x}\|\quad\text{for all}\ x\in U.$   (8)

Closely related to calmness is metric subregularity, which can be understood as the property corresponding to calmness of the inverse mapping. As the name suggests, it is a weaker property than metric regularity which, in the case of an $m\times n$ matrix for instance ($m\leq n$), is equivalent to surjectivity. Our definition follows the characterization of this property given in [35, 36], and appropriates the terminology of [28] with slight but significant variations. The graphical derivative of a mapping $T:\mathbb{E}\rightrightarrows\mathbb{Y}$ at a point $(\bar{x},\bar{y})\in\operatorname{gph}T$ is denoted $DT(\bar{x}\mid\bar{y}):\mathbb{E}\rightrightarrows\mathbb{Y}$ and defined as the mapping whose graph is the tangent cone to $\operatorname{gph}T$ at $(\bar{x},\bar{y})$ (see [6] where it is called the contingent derivative). That is,

$v\in DT(\bar{x}\mid\bar{y})(u)\iff(u,v)\in\mathcal{T}_{\operatorname{gph}T}(\bar{x},\bar{y}),$   (9)

where $\mathcal{T}_{\Omega}$ is the tangent cone mapping associated with the set $\Omega$ defined by

$\mathcal{T}_{\Omega}(\bar{x}):=\left\{w\,:\,\frac{x^k-\bar{x}}{\tau_k}\to w\ \text{for some}\ x^k\xrightarrow{\Omega}\bar{x},\ \tau_k\searrow 0\right\}.$   (10)

Here the notation $x^k\xrightarrow{\Omega}\bar{x}$ means that the sequence of points $\{x^k\}$ approaches $\bar{x}$ from within $\Omega$.

The distance to a set $\Omega\subset\mathbb{E}$ with respect to the bivariate function $\operatorname{dist}(\cdot,\cdot)$ is defined by

$\operatorname{dist}(x,\Omega):=\inf_{y\in\Omega}\operatorname{dist}(x,y)$   (11)

and the set-valued mapping

$P_{\Omega}(x):=\left\{y\in\Omega\,:\,\operatorname{dist}(x,y)=\operatorname{dist}(x,\Omega)\right\}$   (12)

is the corresponding projector. An element $\bar{y}\in P_{\Omega}(x)$ is called a projection. Closely related to the projector is the prox mapping [56]

$\operatorname{prox}_{\lambda,f}(x):=\operatorname*{argmin}_{y\in\mathbb{E}}\left\{f(y)+\tfrac{1}{2\lambda}\|y-x\|^2\right\}.$

When $f=\iota_{\Omega}$, the indicator function of $\Omega$, then $\operatorname{prox}_{\lambda,\iota_{\Omega}}=P_{\Omega}$ for all $\lambda>0$. The value function corresponding to the prox mapping is known as the Moreau envelope, which we denote by $e_{\lambda,f}(x):=\inf_{y\in\mathbb{E}}\left\{f(y)+\tfrac{1}{2\lambda}\|y-x\|^2\right\}$. When $\lambda=1$ and $f=\iota_{\Omega}$ the Moreau envelope is just one-half the squared distance to the set $\Omega$: $e_{1,\iota_{\Omega}}(x)=\tfrac{1}{2}\operatorname{dist}^2(x,\Omega)$. The inverse projector $P_{\Omega}^{-1}$ is defined by

$P_{\Omega}^{-1}(\bar{y}):=\left\{x\in\mathbb{E}\,:\,\bar{y}\in P_{\Omega}(x)\right\}.$   (13)

Throughout this note we will assume the distance corresponds to the Euclidean norm, $\operatorname{dist}(x,y)=\|x-y\|$, though most of the statements are not limited to this case. When $\Omega\subset\mathbb{E}$ is nonempty and closed, one has the following variational characterization of the projector: $\bar{y}\in P_{\Omega}(x)$ if and only if

$\langle x-\bar{y},\,y-\bar{y}\rangle\leq\tfrac{1}{2}\|y-\bar{y}\|^2\quad\text{for all}\ y\in\Omega.$   (14)
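As a quick sanity check of (14) (our illustration, not part of the original text), consider the unit circle in $\mathbb{R}^2$, a nonconvex set whose Euclidean projector is $P_A(x)=x/\|x\|$ for $x\neq 0$; the inequality can be verified on random samples:

```python
import numpy as np

# Numerical check (ours) of the variational characterization (14) for the
# unit circle A = {a in R^2 : ||a|| = 1}, with projector P_A(x) = x/||x||.
rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal(size=2)
    p = x / np.linalg.norm(x)                      # the projection of x onto A
    theta = rng.uniform(0.0, 2.0 * np.pi)
    a = np.array([np.cos(theta), np.sin(theta)])   # an arbitrary point of A
    # (14):  <x - p, a - p>  <=  (1/2) ||a - p||^2  for all a in A
    assert (x - p) @ (a - p) <= 0.5 * (a - p) @ (a - p) + 1e-10
print("variational characterization (14) verified on all samples")
```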

Following [16], we use this object to define the various normal cone mappings, which in turn lead to the subdifferential of the indicator function $\iota_{\Omega}$.

The $\varepsilon$-normal cone to $\Omega\subset\mathbb{E}$ at $\bar{x}\in\Omega$ is defined by

$\widehat{N}^{\varepsilon}_{\Omega}(\bar{x}):=\left\{v\,:\,\limsup_{x\xrightarrow{\Omega}\bar{x},\,x\neq\bar{x}}\frac{\langle v,\,x-\bar{x}\rangle}{\|x-\bar{x}\|}\leq\varepsilon\right\}.$   (15)

The (limiting) normal cone to $\Omega$ at $\bar{x}$, denoted $N_{\Omega}(\bar{x})$, is defined as the limsup of the $\varepsilon$-normal cones. That is, a vector $v\in N_{\Omega}(\bar{x})$ if there are sequences $x^k\xrightarrow{\Omega}\bar{x}$, $v^k\to v$ with $v^k\in\widehat{N}^{\varepsilon_k}_{\Omega}(x^k)$ and $\varepsilon_k\searrow 0$. The proximal normal cone to $\Omega$ at $\bar{x}$ is the set

$N^{\mathrm{prox}}_{\Omega}(\bar{x}):=\operatorname{cone}\left(P_{\Omega}^{-1}(\bar{x})-\bar{x}\right).$   (16)

If $\bar{x}\notin\Omega$, then all normal cones are defined to be empty.

The proximal normal cone need not be closed. The limiting normal cone is, of course, closed by definition. See [55, Definition 1.1] or [68, Definition 6.3] (where this is called the regular normal cone) for an in-depth treatment as well as [55, page 141] for historical notes. When the projection is with respect to the Euclidean norm, the limiting normal cone can be written as the limsup of proximal normals:

$N_{\Omega}(\bar{x})=\limsup_{x\xrightarrow{\Omega}\bar{x}}N^{\mathrm{prox}}_{\Omega}(x).$   (17)
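For intuition, the proximal normal cone (16) is easy to compute for simple sets. The following sketch (ours, with the unit circle as an assumed example) generates elements of $N^{\mathrm{prox}}_A(\bar{a})$ directly from the definition, as directions $x-\bar{a}$ for points $x$ that project onto $\bar{a}$; for the circle these directions sweep out the line through the origin spanned by $\bar{a}$.

```python
import numpy as np

# Illustration (ours) of the proximal normal cone (16) for the unit circle A:
# N^prox_A(abar) = cone( P_A^{-1}(abar) - abar ). The nonzero points
# projecting onto abar are the positive multiples t * abar (t > 0), so the
# proximal normal directions x - abar are the multiples (t - 1) * abar.
abar = np.array([1.0, 0.0])
for t in (0.5, 2.0, 7.0):
    x = t * abar
    p = x / np.linalg.norm(x)                   # P_A(x)
    assert np.allclose(p, abar)                 # x indeed projects onto abar
    print("proximal normal direction:", x - p)  # a multiple of abar
```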

2 General theory: Picard iterations

2.1 Almost averaged mappings

Our ultimate goal is a quantitative statement about convergence to fixed points for set-valued mappings. Preparatory to this, we first must be clear what is meant by a fixed point of a set-valued mapping.

Definition 2.1 (fixed points of set-valued mappings).

The set of fixed points of a set-valued mapping $T:\mathbb{E}\rightrightarrows\mathbb{E}$ is defined by $\operatorname{Fix}T:=\{x\in\mathbb{E}\,:\,x\in T(x)\}.$

In the set-valued setting, it is important to keep in mind a few things that can happen which cannot when the mapping is single-valued.

Example 2.2 (inhomogeneous fixed point sets).

Let where

Here and the point is a fixed point of since . However, the point is also in , and this is not a fixed point of .

To help rule out inhomogeneous fixed point sets like the one in the previous example, we introduce the following strong calmness of fixed point mappings, which is an extension of conventional nonexpansiveness and firm nonexpansiveness. What we call almost nonexpansive mappings below were called $(S,\varepsilon)$-nonexpansive mappings in [32, Definition 2.3], and almost averaged mappings are a slight generalization of $(S,\varepsilon)$-firmly nonexpansive mappings also defined there.

Definition 2.3 (almost nonexpansive/averaged mappings).

Let $D$ be a nonempty subset of $\mathbb{E}$ and let $T$ be a (set-valued) mapping from $D$ to $\mathbb{E}$.

  1. $T$ is said to be pointwise almost nonexpansive on $D$ at $y\in D$ if there exists a constant $\varepsilon\geq 0$ such that

    $\|x^+-y^+\|\leq\sqrt{1+\varepsilon}\,\|x-y\|\quad\text{for all}\ y^+\in T(y),\ x^+\in T(x)\ \text{whenever}\ x\in D.$   (18)

    The violation is a constant $\varepsilon$ for which (18) holds. If (18) holds with $\varepsilon=0$ then $T$ is called pointwise nonexpansive at $y$ on $D$.

    If $T$ is pointwise (almost) nonexpansive at every point on a neighborhood of $y$ (with the same violation constant $\varepsilon$) on $D$, then $T$ is said to be (almost) nonexpansive at $y$ (with violation $\varepsilon$) on $D$.

    If $T$ is pointwise (almost) nonexpansive on $D$ at every point $y\in D$ (with the same violation constant $\varepsilon$), then $T$ is said to be pointwise (almost) nonexpansive on $D$ (with violation $\varepsilon$). If $D$ is open and $T$ is pointwise (almost) nonexpansive on $D$, then it is (almost) nonexpansive on $D$.

  2. $T$ is called pointwise almost averaged on $D$ at $y\in D$ if there is an averaging constant $\alpha\in(0,1)$ and a violation constant $\varepsilon\geq 0$ such that the mapping $\widetilde{T}$ defined by

    $T=(1-\alpha)\operatorname{Id}+\alpha\widetilde{T}$

    is pointwise almost nonexpansive at $y$ with violation $\varepsilon/\alpha$ on $D$.

    Likewise, if $\widetilde{T}$ is (pointwise) (almost) nonexpansive on $D$ (at $y$) (with violation $\varepsilon/\alpha$), then $T$ is said to be (pointwise) (almost) averaged on $D$ (at $y$) (with averaging constant $\alpha$ and violation $\varepsilon$).

    If the averaging constant $\alpha=\tfrac12$, then $T$ is said to be (pointwise) (almost) firmly nonexpansive on $D$ (with violation $\varepsilon$) (at $y$).

Note that the mapping $T$ need not be a self-mapping from $D$ to itself. In the special case where $T$ is pointwise (firmly) nonexpansive with $\varepsilon=0$ at all points $y\in\operatorname{Fix}T$, mappings satisfying (18) are also called quasi-(firmly) nonexpansive [11].
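Definition 2.3 lends itself to direct numerical estimation. The following sketch (ours, not from the original text) estimates the smallest violation $\varepsilon$ for which (18) holds when $T$ is the projector onto the unit circle, the reference points $y$ lie on the circle (and so are fixed points of $T$), and $D$ is an annulus around the circle; for this configuration the true violation is $4/(1+0.5)^2-1\approx 0.78$, which the sample estimate approaches from below.

```python
import numpy as np

# Direct numerical reading of Definition 2.3 (our sketch): estimate the
# violation eps in (18) for T = projector onto the unit circle, with
# reference points y on the circle (fixed points of T) and test points x
# in an annulus D around the circle.
rng = np.random.default_rng(3)

def T(x):
    return x / np.linalg.norm(x)

worst = 1.0
for _ in range(50000):
    y = T(rng.normal(size=2))            # y on the circle, so T(y) = y
    u = rng.normal(size=2)
    x = T(u) * rng.uniform(0.5, 1.5)     # x in the annulus D
    worst = max(worst, np.sum((T(x) - y) ** 2) / np.sum((x - y) ** 2))
# (18): ||T(x) - T(y)|| <= sqrt(1 + eps) ||x - y||  =>  eps ~ worst - 1
print("estimated violation eps ~", worst - 1.0)
```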

The term “almost nonexpansive” has been used for different purposes by Nussbaum [60] and Rouhani [69]. Rouhani uses the term to indicate sequences, in the Hilbert space setting, that are asymptotically nonexpansive. Nussbaum’s definition is the closest in spirit to ours, though the details of his notion of locally almost nonexpansive mappings differ. In this context, see also [66]. At the risk of some confusion, we re-purpose the term here. Our definition of pointwise almost nonexpansiveness of $T$ at $\bar{x}$ is stronger than calmness [68, Chapter 8.F] with constant $\sqrt{1+\varepsilon}$ since the inequality (18) must hold for all pairs $x^+\in T(x)$ and $y^+\in T(\bar{x})$, while for calmness the inequality would hold only for points $x^+\in T(x)$ and their projections onto $T(\bar{x})$. We have avoided the temptation to call this property “strong calmness” in order to make clearer the connection to the classical notions of (firm) nonexpansiveness. A theory based only on calm mappings, what one might call “weakly almost averaged/nonexpansive” operators, is possible and would yield statements about the existence of convergent selections from sequences of iterated set-valued mappings. In light of the other requirement we will place on the mapping in Section 2.3, namely metric subregularity, this would illuminate an aesthetically pleasing and fundamental symmetry between requirements on the mapping and its inverse. We leave this avenue of investigation open. Our development of the properties of almost averaged operators parallels the treatment of averaged operators in [11].

Proposition 2.4 (characterizations of almost averaged operators).

Let $T:\mathbb{E}\rightrightarrows\mathbb{E}$, let $D\subset\mathbb{E}$ be nonempty, and let $y\in D$, $\varepsilon\geq 0$ and $\alpha\in(0,1)$. The following are equivalent.

  1. $T$ is pointwise almost averaged at $y$ on $D$ with violation $\varepsilon$ and averaging constant $\alpha$.

  2. $\widetilde{T}:=\left(1-\tfrac{1}{\alpha}\right)\operatorname{Id}+\tfrac{1}{\alpha}T$ is pointwise almost nonexpansive at $y$ on $D$ with violation $\varepsilon/\alpha$.

  3. For all $x\in D$, $x^+\in T(x)$ and $y^+\in T(y)$ it holds that

    $\|x^+-y^+\|^2\leq(1+\varepsilon)\|x-y\|^2-\tfrac{1-\alpha}{\alpha}\left\|(x-x^+)-(y-y^+)\right\|^2.$   (19)

Consequently, if $T$ is pointwise almost averaged at $y$ on $D$ with violation $\varepsilon$ and averaging constant $\alpha$ then $T$ is pointwise almost nonexpansive at $y$ on $D$ with violation at most $\varepsilon$.

Proof.

This is a slight extension of [11, Proposition 4.25]. ∎

Example 2.5 (alternating projections).

Let $T:=P_AP_B$ for the closed sets $A$ and $B$ defined below.

  1. If $A$ and $B$ are convex, then $T$ is nonexpansive and averaged (i.e., pointwise everywhere, no violation).

  2. Packman eating a piece of pizza:

    The mapping is not almost nonexpansive on any neighborhood for any finite violation at , but it is pointwise nonexpansive (no violation) at and nonexpansive at all on small enough neighborhoods of these points.

  3. is pointwise averaged at when

    This illustrates that whether or not $A$ and $B$ have points in common is not relevant to the property.

  4. is not pointwise almost averaged at for any when

    In light of Example 2.2, this shows that the pointwise almost averaged property is incompatible with inhomogeneous fixed points (see Proposition 2.6).     
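To see alternating projections in action numerically, here is a small illustration (ours, with an assumed example of the unit circle and a horizontal line, in the spirit of items 1 and 3 above): the iteration converges locally linearly, with an empirical contraction factor that is roughly constant along the tail of the iteration.

```python
import numpy as np

# Illustration (ours): alternating projections T = P_A P_B between the unit
# circle A (nonconvex) and the line B = {(t, 0.5) : t in R}. The iterates
# converge locally linearly to a point of A ∩ B, consistent with the local
# linear convergence theory developed in Section 3.1.
def proj_circle(x):
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.array([1.0, 0.0])  # arbitrary selection at 0

def proj_line(x, b=0.5):
    return np.array([x[0], b])

x = np.array([2.0, 2.0])
target = np.array([np.sqrt(1.0 - 0.25), 0.5])        # a point of A ∩ B
dists = []
for _ in range(25):
    x = proj_circle(proj_line(x))                    # one step of T = P_A P_B
    dists.append(np.linalg.norm(x - target))
rates = [dists[k + 1] / dists[k] for k in range(len(dists) - 1) if dists[k] > 1e-14]
print("empirical contraction factors:", np.round(rates[:8], 4))
```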

Proposition 2.6 (pointwise single-valuedness of pointwise almost nonexpansive mappings).

If $T:\mathbb{E}\rightrightarrows\mathbb{E}$ is pointwise almost nonexpansive on $D\subset\mathbb{E}$ at $\bar{x}\in D$ with violation $\varepsilon\geq 0$, then $T$ is single-valued at $\bar{x}$. In particular, if $\bar{x}\in\operatorname{Fix}T$ (that is, $\bar{x}\in T(\bar{x})$) then $T(\bar{x})=\{\bar{x}\}$.

Proof.

By the definition of pointwise almost nonexpansiveness of $T$ on $D$ at $\bar{x}$, it holds that

$\|x^+-\bar{x}^+\|\leq\sqrt{1+\varepsilon}\,\|x-\bar{x}\|$

for all $x^+\in T(x)$ and $\bar{x}^+\in T(\bar{x})$ whenever $x\in D$. In particular, setting $x=\bar{x}$ yields

$\|\bar{x}^+-\widetilde{x}^+\|\leq 0\quad\text{for all}\ \bar{x}^+,\widetilde{x}^+\in T(\bar{x}).$

That is, $\bar{x}^+=\widetilde{x}^+$, and hence we conclude that $T$ is single-valued at $\bar{x}$. ∎

Example 2.7 (pointwise almost nonexpansive mappings not single-valued on neighborhoods).

Although a pointwise almost nonexpansive mapping is single-valued at the reference point, it need not be single-valued on neighborhoods of the reference point. Consider, for example, the coordinate axes in $\mathbb{R}^2$,

$A:=\left(\mathbb{R}\times\{0\}\right)\cup\left(\{0\}\times\mathbb{R}\right).$

The metric projector $P_A$ is single-valued and even pointwise nonexpansive (no almost) at every point in $A$ on sufficiently small neighborhoods of those points, but multivalued on the diagonals $\{(t,\pm t)\,:\,t\neq 0\}$, which meet every neighborhood of the origin.
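A direct computation (ours) makes the multivaluedness in this example explicit:

```python
import numpy as np

# Illustration (ours) for Example 2.7: the projector onto the coordinate
# axes A = (R x {0}) ∪ ({0} x R) in R^2, returned as the set of projections.
def proj_axes(x):
    p1 = np.array([x[0], 0.0])   # nearest point on the horizontal axis
    p2 = np.array([0.0, x[1]])   # nearest point on the vertical axis
    d1, d2 = np.linalg.norm(x - p1), np.linalg.norm(x - p2)
    if np.isclose(d1, d2):
        return [p1, p2]          # multivalued on the diagonals
    return [p1] if d1 < d2 else [p2]

print(proj_axes(np.array([1.0, 0.3])))   # single projection: (1, 0)
print(proj_axes(np.array([1.0, 1.0])))   # two projections: (1, 0) and (0, 1)
```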

Almost firmly nonexpansive mappings have particularly convenient characterizations. In our development below and thereafter we use the set $S$ to denote the collection of points at which the property holds. This is useful for distinguishing points where the regularity holds from other distinguished points, like fixed points. In Section 2.3 the set $S$ is used to isolate a subset of fixed points. The idea here is that the properties needed to quantify convergence need not hold on the space where a problem is formulated, but may only hold on a subset of this space to which the iterates of a particular algorithm are, naturally, confined. This is used in [3] to achieve linear convergence results for the alternating directions method of multipliers algorithm. Alternatively, $S$ can also include points that are not fixed points of constituent operators in an algorithm, but are closely related to fixed points. One example of this is local best approximation points, that is, points in one set that are locally nearest to another. In Section 3.1 we will need to quantify the violation of the averaging property of a projector onto a nonconvex set at points in another set that are locally nearest to the first. This will allow us to tackle inconsistent feasibility, where the alternating projections iteration converges not to the intersection, but to local best approximation points.

Proposition 2.8 (almost firmly nonexpansive mappings).

Let $D$ and $S$ be nonempty subsets of $\mathbb{E}$ and let $T:D\rightrightarrows\mathbb{E}$. The following are equivalent.

  1. $T$ is pointwise almost firmly nonexpansive on $D$ at all $y\in S$ with violation $\varepsilon$.

  2. The mapping $\widetilde{T}:D\rightrightarrows\mathbb{E}$ given by

    $\widetilde{T}:=2T-\operatorname{Id}$   (20)

    is pointwise almost nonexpansive on $D$ at all $y\in S$ with violation $2\varepsilon$, that is, $T$ can be written as

    $T=\tfrac12\left(\widetilde{T}+\operatorname{Id}\right).$   (21)

  3. $\|x^+-y^+\|^2+\|(x-x^+)-(y-y^+)\|^2\leq(1+\varepsilon)\|x-y\|^2$ for all $x^+\in T(x)$ and $y^+\in T(y)$ at each $y\in S$ whenever $x\in D$.

  4. Let $F:\mathbb{E}\rightrightarrows\mathbb{E}$ be a mapping whose resolvent is $T$, i.e., $T=(\operatorname{Id}+F)^{-1}$. At each $y\in S$, for all $x\in D$, $u\in T(x)$ and $v\in T(y)$, the points $(u,z)$ and $(v,w)$ are in $\operatorname{gph}F$ where $z:=x-u$ and $w:=y-v$, and satisfy

    $-\tfrac{\varepsilon}{2}\left\|(u+z)-(v+w)\right\|^2\leq\langle z-w,\,u-v\rangle.$   (22)
Proof.

1⟺2: Follows from Proposition 2.4 with $\alpha=\tfrac12$.

2⟹3: Note first that at each $x\in D$ and $y\in S$,

$\|\widetilde{x}-\widetilde{y}\|^2=2\|x^+-y^+\|^2+2\|(x-x^+)-(y-y^+)\|^2-\|x-y\|^2$   (23a)

for all $\widetilde{x}:=2x^+-x$ and $\widetilde{y}:=2y^+-y$ with $x^+\in T(x)$ and $y^+\in T(y)$. By the definition of pointwise almost nonexpansiveness of $\widetilde{T}$ at $y$ with violation $2\varepsilon$ on $D$,

$\|\widetilde{x}-\widetilde{y}\|^2\leq(1+2\varepsilon)\|x-y\|^2.$   (23b)

Together (23) yields

$\|x^+-y^+\|^2+\|(x-x^+)-(y-y^+)\|^2\leq(1+\varepsilon)\|x-y\|^2,$

as claimed.

3⟹2: Use (23a) to replace $\|\widetilde{x}-\widetilde{y}\|^2$ in 3 and rearrange the resulting inequality to conclude that $\widetilde{T}$ is pointwise almost nonexpansive at each $y\in S$ with violation $2\varepsilon$ on $D$.

4⟺3: First, note that $z\in F(u)$ if and only if $u\in T(u+z)$. From this it follows that for $u\in T(x)$ and $v\in T(y)$, the points $(u,z)$ and $(v,w)$ with $z:=x-u$ and $w:=y-v$ are in $\operatorname{gph}F$. So starting with 3, at each $y\in S$ and for all $x\in D$,

$\|u-v\|^2+\|z-w\|^2\leq(1+\varepsilon)\|(u+z)-(v+w)\|^2$   (24)
$\phantom{\|u-v\|^2+\|z-w\|^2}=(1+\varepsilon)\left(\|u-v\|^2+2\langle u-v,\,z-w\rangle+\|z-w\|^2\right)$   (25)

for all $u\in T(x)$ and $v\in T(y)$. Separating out the inner product in (25) yields the result. ∎

Property 4 of Proposition 2.8 describes a type of submonotonicity of the mapping $F$ on $D$ with respect to $S$. We use this descriptor to distinguish the notion from another well-established property known as hypomonotonicity [65].

Definition 2.9 ((sub/hypo)monotone mappings).

  1. A mapping $\Phi:\mathbb{E}\rightrightarrows\mathbb{E}$ is pointwise submonotone at $\bar{x}$ if there is a constant $\tau\geq 0$ together with a neighborhood $U$ of $\bar{x}$ such that

    $-\tfrac{\tau}{2}\left\|(u+z)-(v+w)\right\|^2\leq\langle z-w,\,u-v\rangle$ for all $(u,z)\in\operatorname{gph}\Phi$ with $u+z\in U$ and all $(v,w)\in\operatorname{gph}\Phi$ with $v+w=\bar{x}$.   (26)

    The mapping $\Phi$ is said to be submonotone on $U$ if (26) holds at every $\bar{x}\in U$.

  2. The mapping $\Phi$ is said to be pointwise hypomonotone at $\bar{x}$ with constant $\tau$ on $U$ if

    $\langle v-\bar{v},\,x-\bar{x}\rangle\geq-\tau\|x-\bar{x}\|^2$ for all $v\in\Phi(x)$, $x\in U$ and $\bar{v}\in\Phi(\bar{x})$.   (27)

    If (27) holds for all $\bar{x}\in U$ then $\Phi$ is said to be hypomonotone with constant $\tau$ on $U$.

In the event that $T$ is in fact firmly nonexpansive (that is, $\varepsilon=0$ and $\alpha=\tfrac12$) then Proposition 2.8(4) just establishes the well-known equivalence between monotonicity of a mapping and firm nonexpansiveness of its resolvent [54]. Moreover, if a single-valued mapping $T$ is calm at $\bar{x}$ with calmness modulus $L$, then it is pointwise hypomonotone at $\bar{x}$ with constant at most $L$. Indeed,

$\langle T(x)-T(\bar{x}),\,x-\bar{x}\rangle\geq-\|T(x)-T(\bar{x})\|\,\|x-\bar{x}\|\geq-L\|x-\bar{x}\|^2.$   (28)

This also points to a relationship to cohypomonotonicity developed in [26]. More recently the notion of pointwise quadratically supportable functions was introduced [51, Definition 2.1]; for smooth functions, this class – which is not limited to convex functions – was shown to include functions whose gradients are pointwise strongly monotone (pointwise hypomonotone with constant $\tau<0$) [51, Proposition 2.2]. A deeper investigation of the relationships between these different notions is postponed to future work.
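A tiny numerical confirmation of (28) (ours, using the deliberately extreme example $T(x)=-Lx$, for which the inequality is tight):

```python
import numpy as np

# Confirmation (ours) of (28): a single-valued Lipschitz (hence calm)
# mapping with modulus L is pointwise hypomonotone with constant at most L.
# For T(x) = -L*x both inequalities in (28) hold with equality.
L = 2.0
T = lambda x: -L * x
rng = np.random.default_rng(4)
xbar = rng.normal(size=3)
for _ in range(100):
    x = xbar + rng.normal(size=3)
    inner = (T(x) - T(xbar)) @ (x - xbar)
    assert inner >= -L * (x - xbar) @ (x - xbar) - 1e-12
print("hypomonotonicity inequality (27) holds with tau = L")
```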

The next result shows the inheritance of the averaging property under compositions and averages of averaged mappings.

Proposition 2.10 (compositions and averages of relatively averaged operators).

Let $T_j:\mathbb{E}\rightrightarrows\mathbb{E}$ for $j=1,2,\dots,m$ be pointwise almost averaged on $U_j$ at all $y\in S_j$ with violation $\varepsilon_j$ and averaging constant $\alpha_j\in(0,1)$, where $U_j\supset S_j$ for $j=1,2,\dots,m$.

  1. If $U:=U_1=U_2=\cdots=U_m$ and $S:=S_1=S_2=\cdots=S_m$, then the weighted mapping $T:=\sum_{j=1}^{m}w_jT_j$ with weights $w_j\in[0,1]$, $\sum_{j=1}^{m}w_j=1$, is pointwise almost averaged at all $y\in S$ with violation $\varepsilon=\sum_{j=1}^{m}w_j\varepsilon_j$ and averaging constant $\alpha=\max_{j}\{\alpha_j\}$ on $U$.

  2. If $T_j(U_j)\subset U_{j+1}$ and $T_j(S_j)\subset S_{j+1}$ for $j=1,2,\dots,m-1$, then the composite mapping $T:=T_m\circ T_{m-1}\circ\cdots\circ T_1$ is pointwise almost nonexpansive at all $y\in S_1$ on $U_1$ with violation at most

    $\varepsilon=\prod_{j=1}^{m}(1+\varepsilon_j)-1.$   (29)

  3. If $T_j(U_j)\subset U_{j+1}$ and $T_j(S_j)\subset S_{j+1}$ for $j=1,2,\dots,m-1$, then the composite mapping $T:=T_m\circ T_{m-1}\circ\cdots\circ T_1$ is pointwise almost averaged at all $y\in S_1$ on $U_1$ with violation at most $\varepsilon$ given by (29) and averaging constant at least

    $\alpha=\frac{m}{m-1+\frac{1}{\max_{j}\{\alpha_j\}}}.$   (30)
Proof.

Statement 1 is a formal generalization of [11, Proposition 4.30] and follows directly from convexity of the squared norm and Proposition 2.4(3).

Statement 2 follows from applying the definition of almost nonexpansiveness to each of the operators in turn, from $T_m$ to $T_1$.

Statement 3 is a formal generalization of [11, Proposition 4.32] and follows from more or less the same pattern of proof. Since it requires a little more care, the proof is given here. Define and set . Identify with any for and choose any . Likewise, identify with any for and choose any . Denote for and for . By convexity of the squared norm and Proposition 2.4(3) one has

Replacing by yields

(31)

From part 2 one has

so that

(32)

Putting (31) and (32) together yields

(33)

The composition is therefore almost averaged with violation

and averaging constant . Finally, an induction argument shows that

which is the claimed violation. ∎

Remark 2.11.

We remark that Proposition 2.10(2) holds in the case when the $T_j$ ($j=1,2,\dots,m$) are merely pointwise almost nonexpansive. The counterpart of Proposition 2.10(1) for pointwise almost nonexpansive $T_j$ ($j=1,2,\dots,m$) is obtained by allowing $\alpha_j=1$.
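The violation formula (29) can be sanity-checked on the simplest conceivable example (ours, not from the original text): scalar multiples of the identity, for which the composition's violation is exact rather than merely an upper bound.

```python
import numpy as np

# Sanity check (ours) of the violation formula (29): T_j = c_j * Id is
# pointwise almost nonexpansive with violation eps_j = c_j**2 - 1, and the
# composition T_2 ∘ T_1 = (c_1 * c_2) * Id has violation (c_1*c_2)**2 - 1.
c1, c2 = 1.1, 1.25
eps1, eps2 = c1**2 - 1, c2**2 - 1
eps_composite = (c1 * c2) ** 2 - 1            # exact violation of T_2 ∘ T_1
eps_bound = (1 + eps1) * (1 + eps2) - 1       # the bound (29)
print(np.isclose(eps_composite, eps_bound))   # True: the bound is tight here
```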

Corollary 2.12 (Krasnoselski–Mann relaxations).

Let $\lambda\in(0,1]$ and define $T_\lambda:=(1-\lambda)\operatorname{Id}+\lambda T$ for $T:\mathbb{E}\rightrightarrows\mathbb{E}$ pointwise almost averaged at $y$ with violation $\varepsilon$ and averaging constant $\alpha$ on $U$. Then $T_\lambda$ is pointwise almost averaged at $y$ with violation $\lambda\varepsilon$ and averaging constant $\lambda\alpha$ on $U$. In particular, when $\lambda=\tfrac{1}{2\alpha}\leq 1$ the mapping $T_\lambda$ is pointwise almost firmly nonexpansive at $y$ with violation $\tfrac{\varepsilon}{2\alpha}$ on $U$.

Proof.

Noting that $\operatorname{Id}$ is averaged everywhere on $\mathbb{E}$ with zero violation and any averaging constant $\alpha'\in(0,1)$, the statement is an immediate specialization of Proposition 2.10(1). ∎

A particularly attractive consequence of Corollary 2.12 is that the violation of almost averaged mappings can be mitigated by taking smaller steps via Krasnoselski-Mann relaxation.
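The following sketch (ours) illustrates this numerically for the projector onto the unit circle: the empirically estimated violation of the relaxed mapping $T_\lambda=(1-\lambda)\operatorname{Id}+\lambda T$ on an annulus around the circle decreases as $\lambda$ shrinks, in line with the bound $\lambda\varepsilon$ of Corollary 2.12.

```python
import numpy as np

# Empirical illustration (ours) of Corollary 2.12: Krasnoselski-Mann
# relaxation reduces the violation of an almost nonexpansive mapping.
# T is the projector onto the unit circle, sampled on an annulus.
rng = np.random.default_rng(2)
proj = lambda x: x / np.linalg.norm(x)

def empirical_violation(lam, n=20000, delta=0.3):
    T = lambda x: (1 - lam) * x + lam * proj(x)
    worst = 0.0
    for _ in range(n):
        x, y = (rng.normal(size=2) for _ in range(2))
        x, y = (z / np.linalg.norm(z) * rng.uniform(1 - delta, 1 + delta)
                for z in (x, y))
        ratio = np.sum((T(x) - T(y)) ** 2) / np.sum((x - y) ** 2)
        worst = max(worst, ratio - 1.0)
    return worst

for lam in (1.0, 0.5, 0.25):
    print(f"lam = {lam:4.2f}: estimated violation ~ {empirical_violation(lam):.3f}")
```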

To conclude this section we prove the following lemma, which relates the fixed point set of a composition of pointwise almost averaged operators to the corresponding difference vectors; a special case of this result will be required in Section 3.1.3.

Definition 2.13 (difference vectors of composite mappings).

For a collection of operators () and the set of difference vectors of at is given by the mapping defined by

(34)

where is the permutation mapping on the product space for and

Lemma 2.14 (difference vectors of averaged compositions).

Given a collection of operators (), set . Let , let be a neighborhood of and define . Fix and the difference vector with for the point having . Let be pointwise almost averaged at with violation and averaging constant on where denotes the th coordinate projection operator (). Then, for and with for having ,

(35)

If the mapping is in fact pointwise averaged at on (), then the set of difference vectors of is a singleton and independent of the initial point; that is, there exists such that for all .

Proof.

First observe that, since , there exists