Local convergence for alternating and averaged nonconvex projections

A.S. Lewis ORIE, Cornell University, Ithaca, NY 14853, U.S.A. aslewis@orie.cornell.edu people.orie.cornell.edu/~aslewis. Research supported in part by National Science Foundation Grant DMS-0504032.    D.R. Luke Department of Mathematical Sciences, University of Delaware. rluke@math.udel.edu    J. Malick CNRS, Lab. Jean Kunztmann, University of Grenoble. jerome.malick@inria.fr
Abstract

The idea of a finite collection of closed sets having “strongly regular intersection” at a given point is crucial in variational analysis. We show that this central theoretical tool also has striking algorithmic consequences. Specifically, we consider the case of two sets, one of which we assume to be suitably “regular” (special cases being convex sets, smooth manifolds, or feasible regions satisfying the Mangasarian-Fromovitz constraint qualification). We then prove that von Neumann’s method of “alternating projections” converges locally to a point in the intersection, at a linear rate associated with a modulus of regularity. As a consequence, in the case of several arbitrary closed sets having strongly regular intersection at some point, the method of “averaged projections” converges locally at a linear rate to a point in the intersection. Inexact versions of both algorithms also converge linearly.

Key words: alternating projections, averaged projections, linear convergence, metric regularity, distance to ill-posedness, variational analysis, nonconvex, extremal principle, prox-regularity

AMS 2000 Subject Classification: 49M20, 65K10, 90C30

1 Introduction

An important theme in computational mathematics is the relationship between the “conditioning” of a problem instance and the speed of convergence of iterative solution algorithms on that instance. A classical example is the method of conjugate gradients for solving a positive definite system of linear equations: we can bound the linear convergence rate in terms of the relative condition number of the associated matrix. More generally, Renegar [33, 32, 34] showed that the rate of convergence of interior-point methods for conic convex programming can be bounded in terms of the “distance to ill-posedness” of the program.

In studying the convergence of iterative algorithms for nonconvex minimization problems or nonmonotone variational inequalities, we must content ourselves with a local theory. A suitable analogue of the distance to ill-posedness is then the notion of “metric regularity”, fundamental in variational analysis. Loosely speaking, a generalized equation, such as a system of inequalities, for example, is metrically regular when, locally, we can bound the distance from a trial solution to an exact solution by a constant multiple of the error in the equation generated by the trial solution. The constant needed is called the “regularity modulus”, and its reciprocal has a natural interpretation as a distance to ill-posedness for the equation [15].

This philosophy suggests understanding the speed of convergence of algorithms for solving generalized equations in terms of the regularity modulus at a solution. Recent literature focuses in particular on the proximal point algorithm (see for example [29, 22, 1]). A unified approach to the relationship between metric regularity and the linear convergence of a family of conceptual algorithms appears in [23].

We here study a very basic algorithm for a very basic problem. We consider the problem of finding a point in the intersection of several closed sets, using the method of averaged projections: at each step, we project the current iterate onto each set, and average the results to obtain the next iterate. Global convergence of this method in the case of two closed convex sets was proved in 1969 in [2]. In this work we show, in complete generality, that this method converges locally to a point in the intersection of the sets, at a linear rate governed by an associated regularity modulus. Our linear convergence proof is elementary: although we use the idea of the normal cone, we apply only the definition, and we discuss metric regularity only to illuminate the rate of convergence.

Our approach to the convergence of the method of averaged projections is standard [30, 4]: we identify the method with von Neumann’s alternating projection algorithm [40] on two closed sets (one of which is a linear subspace) in a suitable product space. A nice development of the classical method of alternating projections may be found in [11]. The linear convergence of the method for two closed convex sets with regular intersection was proved in [4], strengthening a classical result of [21]. Remarkably, we show that, assuming strong regularity, local linear convergence requires good geometric properties (such as convexity, smoothness, or more generally, “amenability” or “prox-regularity”) of only one of the two sets.

One consequence of our convergence proof is an algorithmic demonstration of the “exact extremal principle” described in [26, Theorem 2.8]. This result, a unifying theme in [26], asserts that if several sets have strongly regular intersection at a point, then that point is not “locally extremal” [26]: in other words, translating the sets by small vectors cannot render the intersection empty locally. To prove this result, we simply apply the method of averaged projections, starting from the point of regular intersection. In a further section, we show that inexact versions of the method of averaged projections, closer to practical implementations, also converge linearly.

The method of averaged projections is a conceptual algorithm that might appear hard to implement on concrete nonconvex problems. However, the projection problem for some nonconvex sets is relatively easy. A good example is the set of matrices of some fixed rank: given a singular value decomposition of a matrix, projecting it onto this set is immediate. Furthermore, nonconvex iterated projection algorithms and analogous heuristics are quite popular in practice, in areas such as inverse eigenvalue problems [8, 7], pole placement [27, 42], information theory [39], low-order control design [20, 19, 28] and image processing [41, 5]. Previous convergence results on nonconvex alternating projection algorithms have been uncommon, and have either focused on a very special case (see for example [7, 25]), or have been much weaker than for the convex case [10, 39]. For more discussion, see [25].
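
As a small illustration of how easy this projection can be, here is a NumPy sketch (the function name and test matrix are our own choices, not the paper's): by the Eckart-Young theorem, truncating the singular value decomposition yields a nearest matrix of rank at most k in the Frobenius norm.

```python
import numpy as np

def project_rank_at_most_k(A, k):
    """Project A onto the nonconvex set of matrices of rank at most k.

    Truncating the singular value decomposition gives a nearest such
    matrix in the Frobenius norm (Eckart-Young).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s[k:] = 0.0  # keep only the k largest singular values
    return U @ np.diag(s) @ Vt

A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
P = project_rank_at_most_k(A, 1)  # nearest rank-1 matrix: diag(3, 0)
```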

Our results primarily concern R-linear convergence: in other words, we show that our sequences of iterates converge, with error bounded by a geometric sequence. In a penultimate section, we employ a completely different approach to show that the method of averaged projections, for prox-regular sets with regular intersection, has a Q-linear convergence property: each iteration guarantees a fixed rate of improvement. In a final section, we illustrate these theoretical results with an elementary numerical example coming from signal processing.

2 Notation and definitions

We begin by fixing some notation and definitions. Our underlying setting throughout this work is a Euclidean space $\mathbf{E}$ with corresponding closed unit ball $\mathbf{B}$. For any point $x \in \mathbf{E}$ and radius $\rho > 0$, we write $B_\rho(x)$ for the set $x + \rho\mathbf{B}$.

Consider first two sets $F_1, F_2 \subset \mathbf{E}$. A point $\bar{x} \in F_1 \cap F_2$ is locally extremal [26] for this pair of sets if restricting to a neighborhood of $\bar{x}$ and then translating the sets by small distances can render their intersection empty: in other words, there exist a $\rho > 0$ and a sequence of vectors $z_r \to 0$ in $\mathbf{E}$ such that

$$(F_1 + z_r) \cap F_2 \cap B_\rho(\bar{x}) = \emptyset \quad \text{for all } r = 1, 2, \ldots.$$

Clearly $\bar{x}$ is not locally extremal if and only if

$$0 \in \operatorname{int}\Big( \big((F_1 - \bar{x}) \cap \rho\mathbf{B}\big) - \big((F_2 - \bar{x}) \cap \rho\mathbf{B}\big) \Big) \quad \text{for all } \rho > 0.$$

For recognition purposes, it is easier to study a weaker property than local extremality. Following the terminology of [24], we say the two sets have strongly regular intersection at the point $\bar{x}$ if there exists a constant $\alpha > 0$ such that

$$\alpha \rho \mathbf{B} \subset \big((F_1 - x) \cap \rho\mathbf{B}\big) - \big((F_2 - y) \cap \rho\mathbf{B}\big)$$

for all small $\rho > 0$, all points $x \in F_1$ near $\bar{x}$, and all points $y \in F_2$ near $\bar{x}$. By considering the case $x = y = \bar{x}$, we see that strong regularity implies that $\bar{x}$ is not locally extremal. This “primal” definition of strong regularity is often not the most convenient one to work with, either conceptually or theoretically. By contrast, a “dual” approach, using normal cones, is very helpful.

Given a set $F \subset \mathbf{E}$, we define the distance function and the (multivalued) projection for $F$ by

$$d(x, F) = \inf_{z \in F} \|x - z\| \quad \text{and} \quad P_F(x) = \operatorname*{argmin}_{z \in F} \|x - z\| \qquad (x \in \mathbf{E}).$$
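
For intuition, here is a toy numerical sketch of these two objects for a set represented by finitely many sample points (the names `dist` and `proj` are our own, not the paper's notation); note that the projection can genuinely be multivalued for a nonconvex set.

```python
import numpy as np

def dist(x, F):
    """Distance from the point x to the finite set F of points."""
    return min(np.linalg.norm(x - z) for z in F)

def proj(x, F):
    """(Possibly multivalued) projection: all points of F nearest to x."""
    d = dist(x, F)
    return [z for z in F if np.linalg.norm(x - z) <= d + 1e-12]

# A nonconvex two-point set: the projection of the origin is multivalued,
# since both points are at distance 1.
F = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
nearest = proj(np.array([0.0, 0.0]), F)
```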

The central tool in variational analysis is the normal cone to a closed set $F \subset \mathbf{E}$ at a point $\bar{x} \in F$, which can be defined (see [9, 26, 35]) as

$$N_F(\bar{x}) = \Big\{ \lim_{i \to \infty} t_i (x_i - z_i) : t_i \ge 0,\ x_i \to \bar{x},\ z_i \in P_F(x_i) \Big\}.$$

Notice two properties in particular. First,

$$x \in \mathbf{E},\ z \in P_F(x) \;\Longrightarrow\; x - z \in N_F(z).$$

(2.1)

Secondly, the normal cone is a “closed” multifunction: for any sequence of points $x_r \to \bar{x}$ in $F$, any limit of a sequence of normals $y_r \in N_F(x_r)$ must lie in $N_F(\bar{x})$. Indeed, the definition of the normal cone is in some sense driven by these two properties: it is the smallest cone satisfying them. Notice also that we have the equivalence $N_F(\bar{x}) = \{0\}$ if and only if $\bar{x} \in \operatorname{int} F$.

Normal cones provide an elegant alternative approach to defining strong regularity. In general, a family of closed sets $F_1, F_2, \ldots, F_m \subset \mathbf{E}$ has strongly regular intersection at a point $\bar{x} \in \bigcap_i F_i$ if the only solution to the system

$$\sum_{i=1}^m y_i = 0, \qquad y_i \in N_{F_i}(\bar{x}) \quad (i = 1, 2, \ldots, m),$$

is $y_i = 0$ for each $i$. In the case $m = 2$, this condition can be written

$$N_{F_1}(\bar{x}) \cap -N_{F_2}(\bar{x}) = \{0\},$$

and it is equivalent to our previous definition (see [24, Cor 2], for example). We also note that this condition appears throughout variational-analytic theory. For example, it guarantees the important inclusion (see [35, Theorem 6.42])

$$N_{\bigcap_i F_i}(\bar{x}) \subset \sum_{i=1}^m N_{F_i}(\bar{x}).$$
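
To make the two-set criterion concrete, here is a small sketch of our own construction for two lines through the origin in the plane. The normal cone to a line through the origin is its orthogonal line, so the normal cones meet only at zero exactly when the two lines are distinct.

```python
import numpy as np

def lines_strongly_regular(d1, d2):
    """Check the two-set normal-cone criterion for two lines through 0.

    d1, d2 are direction vectors of the lines. The normal cone to a
    line is the perpendicular line (a subspace, so -N = N), and the
    criterion fails exactly when the two normal lines coincide, i.e.
    when the original lines coincide.
    """
    n1 = np.array([-d1[1], d1[0]])  # normal direction to line 1
    n2 = np.array([-d2[1], d2[0]])  # normal direction to line 2
    # the two normal lines meet only at 0 iff n1, n2 are independent
    return abs(n1[0] * n2[1] - n1[1] * n2[0]) > 1e-12

distinct = lines_strongly_regular([1.0, 0.0], [1.0, 1.0])   # True
coincide = lines_strongly_regular([1.0, 0.0], [1.0, 0.0])   # False
```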

We will find it helpful to quantify the notion of strong regularity (cf. [24]). A straightforward compactness argument shows the following result.

Proposition 2.2 (quantifying strong regularity)

A collection of closed sets $F_1, F_2, \ldots, F_m \subset \mathbf{E}$ has strongly regular intersection at a point $\bar{x} \in \bigcap_i F_i$ if and only if there exists a constant $k > 0$ such that the following condition holds:

$$y_i \in N_{F_i}(\bar{x}) \quad (i = 1, 2, \ldots, m) \;\Longrightarrow\; \sqrt{\sum_i \|y_i\|^2} \le k \Big\| \sum_i y_i \Big\|.$$

(2.3)

We define the condition modulus $\operatorname{cond}(F_1, \ldots, F_m \,|\, \bar{x})$ to be the infimum of all constants $k > 0$ such that property (2.3) holds. Using the triangle and Cauchy-Schwarz inequalities, we notice that vectors $y_1, y_2, \ldots, y_m \in \mathbf{E}$ always satisfy the inequality

$$\Big\| \sum_i y_i \Big\| \le \sum_i \|y_i\| \le \sqrt{m} \sqrt{\sum_i \|y_i\|^2},$$

(2.4)

which yields

$$\operatorname{cond}(F_1, \ldots, F_m \,|\, \bar{x}) \ge \frac{1}{\sqrt{m}},$$

(2.5)

except in the special case when $N_{F_i}(\bar{x}) = \{0\}$ (or equivalently $\bar{x} \in \operatorname{int} F_i$) for all $i$; in this case the condition modulus is zero.

One goal of this paper is to show that, far from being of purely analytic significance, strong regularity has central algorithmic consequences, specifically for the method of averaged projections for finding a point in the intersection $F_1 \cap F_2 \cap \cdots \cap F_m$. Given any initial point $x_0 \in \mathbf{E}$, the algorithm proceeds iteratively as follows:

$$z_i^n \in P_{F_i}(x_n) \quad (i = 1, 2, \ldots, m); \qquad x_{n+1} = \frac{1}{m}\big(z_1^n + z_2^n + \cdots + z_m^n\big).$$

Our main result shows, assuming only strong regularity, that provided the initial point $x_0$ is near $\bar{x}$, any sequence generated by the method of averaged projections converges linearly to a point in the intersection $\bigcap_i F_i$, at a rate governed by the condition modulus.
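
The iteration just described can be sketched in a few lines of code; the two projection maps below, onto the coordinate axes of the plane, are our own toy example of sets with strongly regular intersection at the origin.

```python
import numpy as np

def averaged_projections(x0, projections, iters=50):
    """Method of averaged projections: project the current iterate onto
    each set, then average the results to obtain the next iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = np.mean([P(x) for P in projections], axis=0)
    return x

# Two lines (the coordinate axes) with strongly regular intersection at 0.
P1 = lambda x: np.array([x[0], 0.0])  # projection onto the x-axis
P2 = lambda x: np.array([0.0, x[1]])  # projection onto the y-axis

# Each step halves the iterate, so convergence to the origin is linear.
x_star = averaged_projections([1.0, 1.0], [P1, P2])
```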

3 Strong and metric regularity

The notion of strong regularity is well-known to be closely related to another central idea in variational analysis: “metric regularity”. A concise summary of the relationships between a variety of regular intersection properties and metric regularity appears in [24]. We summarize the relevant ideas here.

Consider a set-valued mapping $\Phi : \mathbf{E} \rightrightarrows \mathbf{Y}$, where $\mathbf{Y}$ is a second Euclidean space. The inverse mapping $\Phi^{-1} : \mathbf{Y} \rightrightarrows \mathbf{E}$ is defined by

$$x \in \Phi^{-1}(y) \;\Longleftrightarrow\; y \in \Phi(x) \qquad (x \in \mathbf{E},\ y \in \mathbf{Y}).$$

For vectors $\bar{x} \in \mathbf{E}$ and $\bar{y} \in \Phi(\bar{x})$, we say $\Phi$ is metrically regular at $\bar{x}$ for $\bar{y}$ if there exists a constant $\kappa > 0$ such that all vectors $x$ close to $\bar{x}$ and vectors $y$ close to $\bar{y}$ satisfy

$$d\big(x, \Phi^{-1}(y)\big) \le \kappa\, d\big(y, \Phi(x)\big).$$

Intuitively, this inequality gives a local linear bound for the distance from a trial point $x$ to the solutions of the generalized equation $y \in \Phi(x)$ (where the vector $y$ is given and we seek the unknown vector $x$), in terms of the distance from $y$ to the set $\Phi(x)$. The infimum of all such constants $\kappa$ is called the modulus of metric regularity of $\Phi$ at $\bar{x}$ for $\bar{y}$, denoted $\operatorname{reg}\Phi(\bar{x}\,|\,\bar{y})$. This modulus is a measure of the sensitivity or “conditioning” of the generalized equation $y \in \Phi(x)$. To take one simple example, if $\Phi$ is a single-valued linear map, the modulus of regularity is the reciprocal of its smallest singular value. In general, variational analysis provides a powerful calculus for computing the regularity modulus. In particular, we have the following formula [35, Thm 9.43]:

(3.1)

where $D^{*}\Phi$ denotes the “coderivative”.
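
For the linear-map example mentioned above, the modulus can be computed directly. In this sketch (the matrix is our own choice), the regularity modulus of an invertible linear map is the reciprocal of its smallest singular value.

```python
import numpy as np

# For an invertible linear map A, the metric regularity bound
#     d(x, A^{-1} y) <= kappa * || y - A x ||
# holds with best constant kappa = 1 / sigma_min(A).
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
sigma_min = np.linalg.svd(A, compute_uv=False).min()
modulus = 1.0 / sigma_min  # here sigma_min = 0.5, so the modulus is 2
```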

We now study these ideas for a particular mapping, highlighting the connections between metric and strong regularity. As in the previous section, consider closed sets $F_1, F_2, \ldots, F_m \subset \mathbf{E}$ and a point $\bar{x} \in \bigcap_i F_i$. We endow the space $\mathbf{E}^m$ with the inner product

$$\big\langle (x_1, x_2, \ldots, x_m), (y_1, y_2, \ldots, y_m) \big\rangle = \sum_{i=1}^m \langle x_i, y_i \rangle,$$

and define the set-valued mapping $\Phi : \mathbf{E} \rightrightarrows \mathbf{E}^m$ by

$$\Phi(x) = (F_1 - x) \times (F_2 - x) \times \cdots \times (F_m - x).$$

Then the inverse mapping is given by

$$\Phi^{-1}(y) = \bigcap_{i=1}^m (F_i - y_i) \qquad (y \in \mathbf{E}^m),$$

and finding a point in the intersection $\bigcap_i F_i$ is equivalent to finding a solution of the generalized equation $0 \in \Phi(x)$. By definition, the mapping $\Phi$ is metrically regular at $\bar{x}$ for $0$ if and only if there is a constant $\kappa > 0$ such that the following strong metric inequality holds:

$$d\Big(x, \bigcap_i (F_i - y_i)\Big) \le \kappa \sqrt{\sum_i d^2(x + y_i, F_i)} \qquad \text{for all } (x, y) \text{ near } (\bar{x}, 0).$$

(3.2)

Furthermore, the regularity modulus $\operatorname{reg}\Phi(\bar{x}\,|\,0)$ is just the infimum of those constants $\kappa > 0$ such that inequality (3.2) holds.

To compute the coderivative , we decompose the mapping as , where, for points ,

The calculus rule [35, 10.43] yields . Then, by definition,

and since , we deduce

and hence

From the coderivative formula (3.1) we now obtain

(3.3)

where, following the usual convention, we interpret the right-hand side as zero if $N_{F_i}(\bar{x}) = \{0\}$ (or equivalently $\bar{x} \in \operatorname{int} F_i$) for all $i$. Thus the regularity modulus agrees exactly with the condition modulus that we defined in the previous section:

$$\operatorname{reg}\Phi(\bar{x}\,|\,0) = \operatorname{cond}(F_1, F_2, \ldots, F_m \,|\, \bar{x}).$$

Furthermore, as is well-known [24], strong regularity is equivalent to the strong metric inequality (3.2).

4 Clarke regularity and refinements

Even more central than strong regularity in variational analysis is the concept of “Clarke regularity”. In this section we study a slight refinement, crucial for our development. In the interest of maintaining as elementary an approach as possible, we use the following geometric definition of Clarke regularity.

Definition 4.1 (Clarke regularity)

A closed set $F \subset \mathbf{E}$ is Clarke regular at a point $\bar{x} \in F$ if, given any $\delta > 0$, any two points $u, z$ near $\bar{x}$ with $z \in F$, and any point $y \in P_F(u)$, satisfy

$$\langle z - \bar{x}, u - y \rangle \le \delta \|z - \bar{x}\| \, \|u - y\|.$$

In other words, the angle between the vectors $u - y$ and $z - \bar{x}$, whenever it is defined, cannot be much less than $\frac{\pi}{2}$ when the points $u$ and $z$ are near $\bar{x}$.

Remark 4.2

This property is equivalent to the standard notion of Clarke regularity. To see this, suppose the property in the definition holds. Consider any unit vector , and any unit “tangent direction” to at . By definition, there exist sequences , , and with , such that

By assumption, given any , for all large the angle between the two vectors on the left-hand side is at least , and hence so is the angle between and . Thus , so Clarke regularity follows, by [35, Cor 6.29]. Conversely, if the property described in the definition fails, then for some and some sequences , , and with , the angle between the unit vectors

(4.3)

is less than . Then any cluster points and of the two sequences (4.3) are respectively an element of and a tangent direction to at , and satisfy , contradicting Clarke regularity.

The property we need for our development is an apparently-slight modification of Clarke regularity.

Definition 4.4 (super-regularity)

A closed set $F \subset \mathbf{E}$ is super-regular at a point $\bar{x} \in F$ if, given any $\delta > 0$, any two points $u, z$ near $\bar{x}$ with $z \in F$, and any point $y \in P_F(u)$, satisfy

$$\langle z - y, u - y \rangle \le \delta \|z - y\| \, \|u - y\|.$$

In other words, the angle between the vectors $u - y$ and $z - y$, whenever it is defined, cannot be much less than $\frac{\pi}{2}$ when the points $u$ and $z$ are near $\bar{x}$.

An equivalent statement involves the normal cone.

Proposition 4.5 (super-regularity and normal angles)

A closed set $F \subset \mathbf{E}$ is super-regular at a point $\bar{x} \in F$ if and only if, for all $\delta > 0$, the inequality

$$\langle v, z - y \rangle \le \delta \|v\| \, \|z - y\|$$

holds for all points $y, z \in F$ near $\bar{x}$ and all normal vectors $v \in N_F(y)$.

Proof  Super-regularity follows immediately from the normal cone property described in the proposition, by property (2.1). Conversely, suppose the normal cone property fails, so for some and sequences of distinct points approaching and unit normal vectors , we have, for all ,

Fix an index . By definition of the normal cone, there exist sequences of distinct points and such that

Since , we must have, for all large ,

Choose sufficiently large to ensure both the above inequality and the inequality , and then define points and .

We now have sequences of points approaching with , and , and satisfying

Hence is not super-regular at .

Super-regularity is a strictly stronger property than Clarke regularity, as the following result and example make clear.

Corollary 4.6 (super-regularity implies Clarke regularity)

If a closed set is super-regular at a point, then it is also Clarke regular there.

Proof  Suppose the point in question is $\bar{x}$. Fix any $\delta > 0$, and set $y = \bar{x}$ in Proposition 4.5. Then clearly any unit tangent direction $d$ to $F$ at $\bar{x}$ and any unit normal vector $v \in N_F(\bar{x})$ satisfy $\langle v, d \rangle \le \delta$. Since $\delta > 0$ was arbitrary, in fact $\langle v, d \rangle \le 0$, so Clarke regularity follows by [35, Cor 6.29].

Example 4.7

Consider the following function , taken from an example in [37]:

The epigraph of this function is Clarke regular at , but it is not hard to see that it is not super-regular there. Indeed, a minor refinement of this example (smoothing the set slightly close to the nonsmooth points and ) shows that a set can be everywhere Clarke regular, and yet not super-regular.

Super-regularity is a common property: indeed, it is implied by two well-known properties, which we discuss next. Following [35], we say that a set $F \subset \mathbf{E}$ is amenable at a point $\bar{x} \in F$ when there exist a neighborhood $U$ of $\bar{x}$, a $\mathcal{C}^1$ mapping $G : U \to \mathbf{R}^\ell$, and a closed convex set $D \subset \mathbf{R}^\ell$ containing $G(\bar{x})$, and satisfying the constraint qualification

$$N_D(G(\bar{x})) \cap \ker\big(\nabla G(\bar{x})^*\big) = \{0\},$$

(4.8)

such that points $x \in U$ near $\bar{x}$ lie in $F$ exactly when $G(x) \in D$. In particular, if $F$ is defined by $\mathcal{C}^1$ equality and inequality constraints and the Mangasarian-Fromovitz constraint qualification holds at $\bar{x} \in F$, then $F$ is amenable at $\bar{x}$.

Proposition 4.9 (amenable implies super-regular)

If a closed set $F \subset \mathbf{E}$ is amenable at a point $\bar{x} \in F$, then it is super-regular there.

Proof  Suppose the result fails at some point . Assume as in the definition of amenability that, in a neighborhood of , the set is identical with the inverse image , where the map and the closed convex set satisfy the condition (4.8). Then by definition, for some , there are sequences of points and unit normal vectors satisfying

It is easy to check the condition

for all large , since otherwise we contradict assumption (4.8). Consequently, using the standard chain rule from [35], we deduce

so there are normal vectors such that . The sequence must be bounded, since otherwise, by taking a subsequence, we could suppose and approaches some unit vector , leading to the contradiction

For all large , we now have

and by convexity we know

Adding these two inequalities gives

But as , the left-hand side is , since the sequence is bounded and is . This contradiction completes the proof.

A rather different refinement of Clarke regularity is the notion of “prox-regularity”. Following [31, Thm 1.3], we call a set $F \subset \mathbf{E}$ prox-regular at a point $\bar{x} \in F$ if the projection mapping $P_F$ is single-valued around $\bar{x}$. (In this case, clearly $F$ must be locally closed around $\bar{x}$.) For example, if, in the definition of an amenable set that we gave earlier, we strengthen our assumption on the map to be $\mathcal{C}^2$ rather than just $\mathcal{C}^1$, the resulting set must be prox-regular. On the other hand, the set

is amenable at the point (and hence super-regular there), but is not prox-regular there.

Proposition 4.10 (prox-regular implies super-regular)

If a closed set $F \subset \mathbf{E}$ is prox-regular at a point $\bar{x} \in F$, then it is super-regular there.

Proof  If the result fails at , then for some constant , there exist sequences of points converging to the point , and a sequence of normal vectors satisfying the inequality

By [31, Proposition 1.2], there exist constants such that

for all large . This gives a contradiction, since eventually.

Super-regularity is related to various other notions in the literature. We end this section with a brief digression to discuss these relationships. First note the following equivalent definition, which is an immediate consequence of Proposition 4.5, and which gives an alternate proof of Proposition 4.10 via “hypomonotonicity” of the truncated normal cone mapping for prox-regular sets [31, Thm 1.3].

Corollary 4.11 (approximate monotonicity)

A closed set $F \subset \mathbf{E}$ is super-regular at a point $\bar{x} \in F$ if and only if, for all $\delta > 0$, the inequality

$$\langle v - w, y - z \rangle \ge -\delta\,\big(\|v\| + \|w\|\big)\,\|y - z\|$$

holds for all points $y, z \in F$ near $\bar{x}$ and all normal vectors $v \in N_F(y)$ and $w \in N_F(z)$.

If we replace the normal cone in the property described in the result above by its convex hull, the “Clarke normal cone”, we obtain a stronger property, called “subsmoothness” in [3]. Similar proofs to those above show that, like super-regularity, subsmoothness is a consequence of either amenability or prox-regularity. However, subsmoothness is strictly stronger than super-regularity. To see this, consider the graph of the function defined by the following properties: , for all integers , is linear on each interval , and for all . The graph of is super-regular at , but is not subsmooth there.

In a certain sense, however, the distinction between subsmoothness and super-regularity is slight. Suppose the set $F$ is super-regular at every point in $F \cap U$, for some open set $U \subset \mathbf{E}$. Since super-regularity implies Clarke regularity, the normal cone and Clarke normal cone coincide throughout $F \cap U$, and hence $F$ is also subsmooth throughout $F \cap U$. In other words, “local” super-regularity coincides with “local” subsmoothness, which in turn, by [3, Thm 3.16], coincides with the “first order Shapiro property” [36] (also called “near convexity” in [38]) holding locally.

5 Alternating projections with nonconvexity

Having reviewed or developed over the last few sections the key variational-analytic properties that we need, we now turn to projection algorithms. In this section we develop our convergence analysis of the method of alternating projections. The following result is our basic tool, guaranteeing conditions under which the method of alternating projections converges linearly. For flexibility, we state it in a rather technical manner. For clarity, we point out afterward that the two main conditions, (5.2) and (5.3), are guaranteed in applications via assumptions of strong regularity and super-regularity (or in particular, amenability or prox-regularity) respectively.

Theorem 5.1 (linear convergence of alternating projections)

Consider the closed sets , and a point . Fix any constant . Suppose for some constant , the following condition holds:

(5.2)

Suppose furthermore for some constant the following condition holds:

(5.3)

Define a constant . Then for any initial point satisfying , any sequence of alternating projections on the sets and ,

must converge with R-linear rate to a point satisfying the inequality .

Proof  First note, by the definition of the projections we have

(5.4)

Clearly we therefore have

(5.5)

We next claim

(5.6)

To see this, note that if , the result is trivial, and if then so again the result is trivial. Otherwise, we have

while

Furthermore, using inequality (5.4), the left-hand side of the implication (5.6) ensures

Hence, by assumption (5.2) we deduce

so

On the other hand, by assumption (5.3) we know

using inequality (5.5). Adding this inequality to the previous inequality then gives the right-hand side of (5.6), as desired.

Now let . We will show by induction the inequalities

(5.7)
(5.8)
(5.9)

Consider first the case . Since and , we deduce , which is inequality (5.8). Furthermore,

which shows inequality (5.7). Finally, since and , the implication (5.6) shows

which is inequality (5.9).

For the induction step, suppose inequalities (5.7), (5.8), and (5.9) all hold for some . Inequalities (5.4) and (5.9) imply

(5.10)

We also have, using inequalities (5.10), (5.9), and (5.7)

so

(5.11)

Now implication (5.6) with replaced by implies

and using inequality (5.10) we deduce

(5.12)

Since inequalities (5.11), (5.10), and (5.12) are exactly inequalities (5.7), (5.8), and (5.9) with replaced by , the induction step is complete and our claim follows.

We can now easily check that the sequence is Cauchy and therefore converges. To see this, note for any integer and any integer , we have

so

and a similar argument shows

(5.13)

Hence converges to some point , and for all we have

(5.14)

We deduce that the limit lies in the intersection and satisfies the inequality , and furthermore that the convergence is R-linear with rate , which completes the proof.
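
As a concrete illustration of the scheme analyzed above, the following sketch (the set choices are our own) alternates projections between the unit circle, which is prox-regular and hence super-regular at each of its points, and a horizontal line meeting it with strongly regular intersection; the iterates converge linearly to a point of the intersection.

```python
import numpy as np

def alternating_projections(x0, proj_F, proj_C, iters=60):
    """von Neumann alternating projections: x -> proj_C(proj_F(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = proj_C(proj_F(x))
    return x

proj_circle = lambda x: x / np.linalg.norm(x)  # unit circle (for x != 0)
proj_line = lambda x: np.array([x[0], 0.5])    # line y = 1/2

# The circle and the line cross transversally at (±sqrt(3)/2, 1/2),
# so the intersection is strongly regular there and the iterates,
# started nearby, converge linearly to the nearby intersection point.
x_star = alternating_projections([0.3, 0.9], proj_circle, proj_line)
```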

To apply Theorem 5.1 to alternating projections between a closed set and a super-regular set, we make use of the key geometric property of super-regular sets (Proposition 4.5): at any point near a point where a set is super-regular, the angle between any normal vector and the direction to any nearby point in the set cannot be much less than $\frac{\pi}{2}$.

We can now prove our key result.

Theorem 5.15 (alternating projections with a super-regular set)

Consider closed sets $F, C \subset \mathbf{E}$ and a point $\bar{x} \in F \cap C$. Suppose $C$ is super-regular at $\bar{x}$ (as holds, for example, if it is amenable or prox-regular there). Suppose furthermore that $F$ and $C$ have strongly regular intersection at $\bar{x}$: that is, the condition

$$N_F(\bar{x}) \cap -N_C(\bar{x}) = \{0\}$$

holds, or equivalently, the constant

$$\bar{c} = \max\big\{ \langle u, v \rangle : u \in N_F(\bar{x}),\ v \in -N_C(\bar{x}),\ \|u\| = \|v\| = 1 \big\}$$

(5.16)

is strictly less than one. Fix any constant $c \in (\bar{c}, 1)$. Then, for any initial point $x_0$ close to $\bar{x}$, any sequence of iterated projections