# Relative Interior Rule in Block-Coordinate Minimization

Tomáš Werner, Daniel Průša
Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University

## 1 Introduction

(Block-)coordinate minimization is an iterative optimization method which in every iteration finds a global minimum of the objective over a variable or a subset of variables, while keeping the remaining variables constant. For some problems, coordinate minimization converges to a global minimum. This class includes unconstrained problems with convex differentiable objective function [1, §2.7] or convex objective function whose non-differentiable part is separable [11]. For general convex problems, the method need not converge to a global minimum but only to a local one, where ‘local’ is meant with respect to moves along (subsets of) coordinates.

For large-scale non-differentiable convex problems, (block-)coordinate minimization can be an acceptable option despite its inability to converge to a global minimum. An example is a class of methods to solve the linear programming relaxation of the discrete energy minimization problem (also known as MAP inference in graphical models). These methods apply (block-)coordinate minimization to various forms of the dual linear programming relaxation. Examples are max-sum diffusion [7, 9, 12], TRW-S [5], MPLP [2], and SRMP [6]. For many problems from computer vision, it has been observed [10, 4] that TRW-S converges faster than the competing methods and its fixed points are often not far from global minima, especially for large sparse instances.

When block-coordinate minimization is applied to a general convex problem, in every iteration the minimizer over the current coordinate block need not be unique and therefore a single minimizer must be chosen. These choices can significantly affect the quality of the achieved local minima. We propose that this minimizer should always be chosen from the relative interior of the set of all minimizers over the current block. Indeed, it can be easily verified that max-sum diffusion satisfies this condition. We show that block-coordinate minimization methods satisfying this condition are not worse, in a certain precise sense, than any other block-coordinate minimization methods.

## 2 Main Results

For brevity, we will use

$$M(X,f)=\{\,x\in X \mid f(x)\le f(y)\ \ \forall y\in X\,\} \tag{1}$$

to denote the set of all global minima of a function $f$ on a set $X$.

Suppose we want to minimize a convex function $f\colon X\to\mathbb{R}$ on a closed convex set $X\subseteq V$, where $V$ is a finite-dimensional vector space over $\mathbb{R}$. For that, we consider a coordinate-free generalization of block-coordinate minimization. Let $\mathcal{I}$ be a finite set of subspaces of $V$, which represent search directions. Having an estimate $x_n$ of the minimum, the next estimate $x_{n+1}$ is always chosen such that

$$x_{n+1}\in M(X\cap(x_n+I_n),f) \tag{2}$$

for some $I_n\in\mathcal{I}$. Clearly, $f(x_{n+1})\le f(x_n)$. A point $x\in X$ satisfying

$$x\in M(X\cap(x+I),f)\quad\forall I\in\mathcal{I} \tag{3}$$

has the property that $f$ cannot be improved by moving from $x$ within $X$ along any single subspace from $\mathcal{I}$. We call such a point a local minimum of $f$ on $X$ with respect to $\mathcal{I}$. When $\mathcal{I}$ and/or $(f,X)$ are clear from context, we will speak only about a local minimum of $f$ on $X$ or just a local minimum. Note that the term ‘local minimum’ is used here in a different meaning than is usual in optimization and calculus.

Coordinate minimization and block-coordinate minimization are special cases of this formulation. In the former, we have $V=\mathbb{R}^d$ and $\mathcal{I}=\{\operatorname{span}\{e_1\},\dots,\operatorname{span}\{e_d\}\}$, where $e_i$ denotes the $i$th vector of the standard basis of $\mathbb{R}^d$. In the latter, we have $V=\mathbb{R}^d$ and each element of $\mathcal{I}$ is the span of a subset of the standard basis of $\mathbb{R}^d$.

Recall [8, 3] that the relative interior of a convex set $X\subseteq V$, denoted by $\operatorname{ri}X$, is the topological interior of $X$ with respect to the affine hull of $X$. We propose to modify condition (2) such that the minimum is always chosen from the relative interior of the current optimal set. Thus, (2) changes to

$$x_{n+1}\in\operatorname{ri}M(X\cap(x_n+I_n),f). \tag{4}$$

A point $x_{n+1}$ satisfying (4) always exists because the relative interior of any non-empty convex set is non-empty. We call a point $x\in X$ that satisfies

$$x\in\operatorname{ri}M(X\cap(x+I),f)\quad\forall I\in\mathcal{I} \tag{5}$$

an interior local minimum of $f$ on $X$ with respect to $\mathcal{I}$. Clearly, every interior local minimum is a local minimum.

In our analysis, another type of local minimum will naturally appear: pre-interior local minimum. It will be precisely defined later; informally, it is only a finite number of iterations (4) away from an interior local minimum.

Consider a sequence $(x_n,I_n)_{n\in\mathbb{N}}$ satisfying (2) resp. (4), where $\mathbb{N}$ denotes the positive integers. To ensure that each search direction is always visited again after a finite number of iterations, we assume that the sequence $(I_n)_{n\in\mathbb{N}}$ contains each element of $\mathcal{I}$ an infinite number of times. For brevity, we will often write only $(x_n)$ and $(I_n)$ instead of $(x_n)_{n\in\mathbb{N}}$ and $(I_n)_{n\in\mathbb{N}}$. The following facts, proved in the sequel, show that methods satisfying (4) are not worse, in a precise sense, than methods satisfying (2):

• For every sequence $(x_n)$ satisfying (4), if $x_1$ is an interior local minimum then $x_n$ is an interior local minimum for all $n$.

• For every sequence $(x_n)$ satisfying (4), if $x_1$ is a pre-interior local minimum then $x_n$ is an interior local minimum for some $n$.

• For every sequence $(x_n)$ satisfying (2), if $x_1$ is a pre-interior local minimum then $f(x_n)=f(x_1)$ for all $n$.

• For every sequence $(x_n)$ satisfying (4), if $x_1$ is not a pre-interior local minimum then $f(x_n)<f(x_1)$ for some $n$.

To illustrate this, consider an example of coordinate minimization applied to a simple linear program (see the picture below). Let $V=\mathbb{R}^2$, let $X$ be the convex polygon shown in the picture, let $f$ be a linear function that is constant vertically and decreases to the right, and let $\mathcal{I}=\{\operatorname{span}\{e_1\},\operatorname{span}\{e_2\}\}$. The set of global minima is a line segment, and the sets of local minima, interior local minima, and pre-interior local minima are each specified in terms of the points labelled in the picture. The thick polyline shows the first few points of a sequence satisfying (4), where the sequence $(I_n)$ alternates between the two subspaces from $\mathcal{I}$. When starting from a point that is not a pre-interior local minimum, every sequence satisfying (4) leaves any non-interior local minimum after a finite number of iterations, while improving the objective function. Informally, this is because when the objective cannot be decreased by moving along any single subspace from $\mathcal{I}$, condition (4) at least forces the point to move to a face of $X$ of a higher dimension (if such a face exists), thus providing ‘more room’ to hopefully decrease the objective in future iterations. In contrast, condition (2) allows a sequence to stay in any (possibly non-interior) local minimum forever. Of course, when starting from a pre-interior local minimum that is not a global minimum, every sequence satisfying (2) will stay among such points forever. This just confirms the well-known fact that for some non-smooth convex problems, coordinate minimization can get stuck in a point that is not a global minimum.
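The behaviour described above can be reproduced on a small instance. The following is a minimal sketch, not the instance from the picture: the polygon (vertices (0,0), (1,0), (2,1), (2,2), (0,2)), the objective $f(x)=-x_1$, the starting point, and all function names are assumptions chosen for illustration. When the objective is constant along the current coordinate, the minimizers form a segment; rule (4) is implemented by taking its midpoint, while rule (2) is implemented by the (equally valid) choice of staying put.

```python
# Hypothetical polygon X = {x : A x <= b}; it is bounded, so every step interval is finite.
A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 0.0), (0.0, 1.0), (1.0, -1.0)]
B = [0.0, 0.0, 2.0, 2.0, 1.0]
C = (-1.0, 0.0)  # f(x) = -x1: constant vertically, decreasing to the right


def objective(x):
    return C[0] * x[0] + C[1] * x[1]


def step_interval(x, i):
    """Return (lo, hi) such that x + t*e_i lies in X exactly for lo <= t <= hi."""
    lo, hi = float("-inf"), float("inf")
    for a, b in zip(A, B):
        slack = b - (a[0] * x[0] + a[1] * x[1])
        if a[i] > 0:
            hi = min(hi, slack / a[i])
        elif a[i] < 0:
            lo = max(lo, slack / a[i])
    return lo, hi


def update(x, i, interior):
    """One coordinate update along e_i, following rule (4) if interior else rule (2)."""
    lo, hi = step_interval(x, i)
    if C[i] > 0:
        t = lo                      # unique minimizer: move as far as possible backwards
    elif C[i] < 0:
        t = hi                      # unique minimizer: move as far as possible forwards
    elif interior:
        t = 0.5 * (lo + hi)         # f constant along e_i: midpoint is a relative interior point
    else:
        t = 0.0                     # rule (2) allows any minimizer; staying put is one choice
    y = list(x)
    y[i] += t
    return tuple(y)


def run(x0, interior, rounds=5):
    """Alternate between the two coordinate directions for a fixed number of rounds."""
    x = x0
    for _ in range(rounds):
        for i in (0, 1):
            x = update(x, i, interior)
    return x
```

In this instance, `run((0.0, 0.0), interior=False)` stops at the vertex (1, 0), a local minimum with objective value -1, whereas `run((0.0, 0.0), interior=True)` first moves to (1, 1) and then reaches the edge $x_1=2$, where the objective attains its global minimum -2.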

Moreover, we prove the following convergence result: if the choices in (4) are fixed such that $x_{n+1}$ is a continuous function of $x_n$, the elements of $\mathcal{I}$ are visited in a cyclic order, and the sequence $(x_n)$ is bounded, then the distance of $x_n$ from the set of pre-interior local minima converges to zero.

## 3 Global Minima Are Local Minima

As a warm-up, we prove one expected property of local minima: every element of $M(X,f)$ (global minimum) is a local minimum and every element of $\operatorname{ri}M(X,f)$ (which could be called interior global minimum) is an interior local minimum. Noting that global minima are local minima with respect to $\mathcal{I}=\{V\}$, we actually prove, in Theorem 2 below, a more general fact. For sets $\mathcal{I}$ and $\mathcal{I}'$ of subspaces of $V$, we say that $\mathcal{I}'$ dominates $\mathcal{I}$ if for every $I\in\mathcal{I}$ there is $I'\in\mathcal{I}'$ such that $I\subseteq I'$.

###### Lemma 1.

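Let $Y\subseteq X\subseteq V$ and $f\colon X\to\mathbb{R}$. Let $Y\cap M(X,f)\neq\emptyset$. Then $M(Y,f)=Y\cap M(X,f)$.

###### Proof.

To prove $\supseteq$, we need to prove that $x\in Y\cap M(X,f)$ implies $x\in M(Y,f)$. This is obvious because if $f(x)\le f(y)$ holds for all $y\in X$, then it holds for all $y\in Y$.

To prove $\subseteq$, we need to prove that $x\in M(Y,f)$ and $Y\cap M(X,f)\neq\emptyset$ imply $x\in M(X,f)$. For that, it suffices to show that $x\in M(Y,f)$ and $y\in Y\cap M(X,f)$ imply that $f(x)\le f(z)$ for all $z\in X$. This is true, because $f(x)>f(z)$ for some $z\in X$ would imply $f(x)>f(z)\ge f(y)$, contradicting $x\in M(Y,f)$. ∎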

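Now we will use the property of the relative interior [8, 3] that for any convex sets $X,Y\subseteq V$,

$$\operatorname{ri}X\cap\operatorname{ri}Y\neq\emptyset \implies \operatorname{ri}X\cap\operatorname{ri}Y=\operatorname{ri}(X\cap Y). \tag{6}$$

###### Theorem 2.

Let $X\subseteq V$ be a convex set and $f\colon X\to\mathbb{R}$ be a convex function. Let $\mathcal{I}$ and $\mathcal{I}'$ be finite sets of subspaces of $V$ such that $\mathcal{I}'$ dominates $\mathcal{I}$.

• Every local minimum with respect to $\mathcal{I}'$ is a local minimum with respect to $\mathcal{I}$.

• Every interior local minimum with respect to $\mathcal{I}'$ is an interior local minimum with respect to $\mathcal{I}$.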

###### Proof.

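We just need to consider two subspaces $I,I'\subseteq V$ such that $I\subseteq I'$.

• Noting that $X\cap(x+I)\subseteq X\cap(x+I')$, by Lemma 1 we have $x\in M(X\cap(x+I'),f)\implies x\in M(X\cap(x+I),f)$.

• Noting that $M(X\cap(x+I),f)=(x+I)\cap M(X\cap(x+I'),f)$ whenever $x\in M(X\cap(x+I'),f)$ (by Lemma 1) and that $\operatorname{ri}(x+I)=x+I$, by (6) we have $x\in\operatorname{ri}M(X\cap(x+I'),f)\implies x\in\operatorname{ri}M(X\cap(x+I),f)$. ∎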

## 4 Linear Objective Function

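Using the epigraph form, the minimization of a convex function on a closed convex set can be transformed to the minimization of a linear function on a closed convex set. Therefore, further in §4 we assume that $X\subseteq V$ is closed convex and $f\colon V\to\mathbb{R}$ is linear. We will return to the case of non-linear convex $f$ later in §5.

For $x,y\in V$, we denote

$$[x,y]=\operatorname{conv}\{x,y\}=\{(1-\alpha)x+\alpha y\mid 0\le\alpha\le 1\}. \tag{7}$$

For $x\neq y$ this is a line segment, for $x=y$ it is a singleton. It holds that

$$\operatorname{ri}[x,y]=\{(1-\alpha)x+\alpha y\mid 0<\alpha<1\}. \tag{8}$$

For $x\neq y$ we have $\operatorname{ri}[x,y]=[x,y]\setminus\{x,y\}$, for $x=y$ we have $\operatorname{ri}[x,y]=\{x\}$.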

We recall basic facts about faces of a convex set [8, 3]. A face of a convex set $X\subseteq V$ is a convex set $F\subseteq X$ such that every line segment from $X$ whose relative interior intersects $F$ lies in $F$, i.e.,

$$x,y\in X,\ \ F\cap\operatorname{ri}[x,y]\neq\emptyset \implies x,y\in F. \tag{9}$$

The set of all faces of a closed convex set partially ordered by inclusion is a complete lattice; in particular, it is closed under (possibly infinite) intersections. For a point $x\in X$, let $F(X,x)$ denote the intersection of all faces (equivalently, the smallest face) of $X$ that contain $x$. For every $x,y\in X$,

$$\begin{aligned}
y\in F(X,x) &\iff F(X,y)\subseteq F(X,x), &&\text{(9a)}\\
y\in\operatorname{ri}F(X,x) &\iff F(X,y)=F(X,x), &&\text{(9b)}\\
y\in\operatorname{rb}F(X,x) &\iff F(X,y)\subsetneq F(X,x), &&\text{(9c)}
\end{aligned}$$

where $\operatorname{rb}X=X\setminus\operatorname{ri}X$ denotes the relative boundary of a closed convex set $X$. Equivalence (9b) shows that $F(X,x)$ is in fact the unique face of $X$ having $x$ in its relative interior. Note that (9c) follows from (9a) and (9b).
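As a small worked illustration of the smallest-face map $F(X,\cdot)$ (our own example, not taken from the paper), consider the unit square and three assumed points: a vertex $a$, an edge midpoint $b$, and the centre $c$.

```latex
% Assumed example: X = [0,1]^2, a = (0,0), b = (1/2,0), c = (1/2,1/2).
\begin{align*}
F(X,a) &= \{(0,0)\}, & F(X,b) &= [0,1]\times\{0\}, & F(X,c) &= X, \\
F(X,a) &\subsetneq F(X,b) \;\Longrightarrow\; a \in \operatorname{rb} F(X,b), &
F(X,b) &\subsetneq F(X,c) \;\Longrightarrow\; b \in \operatorname{rb} F(X,c). &&
\end{align*}
% Any other point of the open bottom edge, e.g. b' = (1/4,0), has the same smallest face:
\[
F(X,b') = F(X,b) \;\Longrightarrow\; b' \in \operatorname{ri} F(X,b).
\]
```

Each point thus lies in the relative interior of exactly one face, namely its own smallest face.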

The following simple lemmas will be used several times later:

###### Lemma 3.

Let $X\subseteq V$ be a convex set. We have $x\in\operatorname{ri}X$ iff for every $y\in X$ there exists $u\in X$ such that $x\in\operatorname{ri}[u,y]$.

###### Proof.

The ‘only-if’ direction is immediate from the definition of relative interior. For the ‘if’ direction see, e.g., [8, Theorem 6.4]. ∎

###### Lemma 4.

Let be closed convex sets such that . Let . Then {IEEEeqnarray}rCrCrCr y && Y &  & y && F(X,x) \IEEEyesnumber\IEEEyessubnumber*
y &
& riY &  & y && riF(X,x)
y &
& rbY &  & y && rbF(X,x)

###### Proof.

To see (4), let and . Thus, by Lemma 3, there is such that . Since and , the definition of face yields . Implications (4) and (4) follow from (4) and (9). ∎

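###### Lemma 5.

Let $u,y,z\in V$ and $x\in\operatorname{ri}[u,y]$. Then $\operatorname{ri}[u,z]\cap\operatorname{ri}[x,x+z-y]\neq\emptyset$.

###### Proof.

Let $0<\alpha<1$ be such that $x=(1-\alpha)u+\alpha y$ (note that if $u\neq y$ then $\alpha$ is unique, otherwise we can choose any $0<\alpha<1$). Let $v=(1-\alpha)u+\alpha z$, hence $v\in\operatorname{ri}[u,z]$. Subtracting the two equations yields $v-x=\alpha(z-y)$, hence $v=(1-\alpha)x+\alpha(x+z-y)\in\operatorname{ri}[x,x+z-y]$. ∎

[Figure: illustration of Lemma 5 for points in general position (i.e., not collinear).]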

### 4.1 Structure of the Set of Local Minima

It is well-known that the set of global minima of a linear function $f$ on a closed convex set $X$ is an (exposed) face of $X$. We show that local resp. interior local minima also cluster into faces of $X$. Moreover, just as the set of all faces of $X$ is closed under intersections, we show that the set of faces of $X$ containing local resp. interior local minima is closed under intersections.

In the theorems in the rest of this section, the letter $I$ will always denote a subspace of $V$.

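###### Theorem 6.

Let $x\in M(X\cap(x+I),f)$ and $y\in F(X,x)$. Then $y\in M(X\cap(y+I),f)$.

###### Proof.

Let $z\in X\cap(y+I)$. We need to prove that $f(z)\ge f(y)$. Since $y\in F(X,x)$ and $x\in\operatorname{ri}F(X,x)$, by Lemma 3 there is $u\in F(X,x)$ such that $x\in\operatorname{ri}[u,y]$. By Lemma 5, there is a point

$$v\in\operatorname{ri}[u,z]\cap\operatorname{ri}[x,x+z-y].$$

Since $u,z\in X$, from convexity of $X$ we have $v\in X$. Since $z-y\in I$, we have $v\in x+I$. Since $x\in M(X\cap(x+I),f)$, we thus have $f(v)\ge f(x)$, hence $f(x+z-y)\ge f(x)$. Since $f(x+z-y)=f(x)+f(z)-f(y)$, by linearity of $f$ we have $f(z)\ge f(y)$. ∎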

###### Corollary 7.

If $x$ is a local minimum, then every point of $F(X,x)$ is a local minimum.

But notice that if $x$ and $y$ are local minima such that $y\in F(X,x)$, then we can have $f(x)\neq f(y)$.

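###### Lemma 8.

Let $x\in\operatorname{ri}M(X\cap(x+I),f)$ and $y\in F(X,x)$. Then $M(X\cap(y+I),f)\subseteq F(X,x)$.

###### Proof.

Let $z\in M(X\cap(y+I),f)$. By Theorem 6 we have $y\in M(X\cap(y+I),f)$, hence $f(z)=f(y)$. Since $y\in F(X,x)$, by Lemma 3 there is $u\in F(X,x)$ such that $x\in\operatorname{ri}[u,y]$. By Lemma 5, there is

$$v\in\operatorname{ri}[u,z]\cap\operatorname{ri}[x,x+z-y].$$

Since $u,z\in X$ and $z-y\in I$, we have $v\in X\cap(x+I)$. Since $f(z)=f(y)$, by linearity of $f$ we have $f(v)=f(x)$, hence $v\in M(X\cap(x+I),f)$. Lemma 4 yields $v\in F(X,x)$. Since $v\in\operatorname{ri}[u,z]$ and $u,z\in X$, the definition of face yields $z\in F(X,x)$. ∎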

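###### Lemma 9.

Let $x\in M(X\cap(x+I),f)\subseteq F(X,x)$. Then $x\in\operatorname{ri}M(X\cap(x+I),f)$.

###### Proof.

Let $y\in M(X\cap(x+I),f)$. Therefore $y\in F(X,x)$. Moreover, by Lemma 3 there is $u\in F(X,x)$ such that $x\in\operatorname{ri}[u,y]$. Since $x,y\in x+I$, we have $u\in x+I$. By linearity of $f$ we have $f(u)=f(x)$, therefore $u\in M(X\cap(x+I),f)$. By Lemma 3, $x\in\operatorname{ri}M(X\cap(x+I),f)$. ∎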

###### Theorem 10.

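Let $Y\subseteq X$. Let $y\in\operatorname{ri}M(X\cap(y+I),f)$ for all $y\in Y$. Let $x\in\operatorname{ri}\bigcap_{y\in Y}F(X,y)$. Then $x\in\operatorname{ri}M(X\cap(x+I),f)$.

###### Proof.

Since $\bigcap_{y\in Y}F(X,y)$ is a face of $X$, we have $x\in\operatorname{ri}\bigcap_{y\in Y}F(X,y)$ iff $F(X,x)=\bigcap_{y\in Y}F(X,y)$. By Theorem 6, $x\in M(X\cap(x+I),f)$. By Lemma 8, $M(X\cap(x+I),f)\subseteq F(X,y)$ for every $y\in Y$, hence $M(X\cap(x+I),f)\subseteq F(X,x)$. By Lemma 9, $x\in\operatorname{ri}M(X\cap(x+I),f)$. ∎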

###### Corollary 11.

Let $Y\subseteq X$. If every point from $Y$ is an interior local minimum, then every relative interior point of the face $\bigcap_{y\in Y}F(X,y)$ is an interior local minimum.

###### Corollary 12.

If $x$ is an interior local minimum, then every point of $\operatorname{ri}F(X,x)$ is an interior local minimum.

###### Proof.

This is Corollary 11 for $Y=\{x\}$. ∎

The results from this section lead to the following definitions and facts:

• We call a face of $X$ a local minima face if all its points are local minima. Since the set of faces of $X$ is closed under intersection, it follows from Corollary 7 that the set of all local minima faces of $X$ (assuming fixed $f$ and $\mathcal{I}$) is closed under intersections. Thus, it is a complete meet-semilattice (but not a lattice, because it need not have a greatest element).

• We call a face of $X$ an interior local minima face if all its relative interior points are interior local minima. Corollary 11 shows that the set of all interior local minima faces of $X$ (assuming fixed $f$ and $\mathcal{I}$) is closed under intersections. Thus, it again is a complete meet-semilattice.

We finally define one more type of local minimum: a point $x$ is a pre-interior local minimum if $x\in F(X,y)$ for some interior local minimum $y$. Motivation for introducing this concept will become clear later.

### 4.2 The Effect of Iterations

Here we prove properties of sequences satisfying conditions (2) resp. (4) under various assumptions.

###### Theorem 13.

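Let $(x_n)$ be a sequence satisfying (4) such that $x_1$ is an interior local minimum. Then for all $n$ we have $f(x_n)=f(x_1)$, $F(X,x_n)=F(X,x_1)$, and $x_n$ is an interior local minimum.

###### Proof.

Suppose that for some $n$, $x_n$ is an interior local minimum. Considering (4), by Lemma 4 we thus have $x_{n+1}\in\operatorname{ri}F(X,x_n)$, i.e., $F(X,x_{n+1})=F(X,x_n)$. By Corollary 11, $x_{n+1}$ is an interior local minimum. Since $x_n\in M(X\cap(x_n+I_n),f)$, we have $f(x_{n+1})=f(x_n)$. ∎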

###### Theorem 14.

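Let $(x_n)$ be a sequence satisfying (4) and $f(x_n)=f(x_1)$ for all $n$. Then for all $n$ we have $F(X,x_n)\subseteq F(X,x_{n+1})$, there exists $n$ such that $x_n$ is an interior local minimum, and $x_1$ is a pre-interior local minimum.

###### Proof.

Combining $f(x_{n+1})=f(x_n)$ with (4) yields $x_n\in M(X\cap(x_n+I_n),f)$. Thus, for every $n$ there are two possibilities:

• If $x_n\in\operatorname{ri}M(X\cap(x_n+I_n),f)$ then, by Lemma 4, we have $F(X,x_{n+1})=F(X,x_n)$. By Theorem 10, we have $x_m\in\operatorname{ri}M(X\cap(x_m+I_n),f)$ for all $m$ such that $F(X,x_m)=F(X,x_n)$.

• If $x_n\in\operatorname{rb}M(X\cap(x_n+I_n),f)$ then, by Lemma 4, we have $F(X,x_n)\subsetneq F(X,x_{n+1})$.

In either case, we have $F(X,x_n)\subseteq F(X,x_{n+1})$. Moreover, if $x_n$ is not an interior local minimum for some $n$, then after some finite number $m$ of iterations the second case occurs, therefore $F(X,x_n)\subsetneq F(X,x_{n+m})$. But this implies $\dim F(X,x_n)<\dim F(X,x_{n+m})$. If $x_n$ were not an interior local minimum for any $n$, for some $n$ we would have $\dim F(X,x_n)>\dim X$, which is impossible.

Since $F(X,x_n)\subseteq F(X,x_{n+1})$ for all $n$, the faces $F(X,x_n)$ form a non-decreasing chain. In particular, $x_1\in F(X,x_n)$ for all $n$. Since there is $n$ such that $x_n$ is an interior local minimum, $x_1$ is a pre-interior local minimum. ∎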

###### Theorem 15.

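Let $(x_n)$ be a sequence satisfying (2) such that $x_1$ is a pre-interior local minimum, i.e., $x_1\in F(X,x)$ for some interior local minimum $x$. Then for all $n$ we have $f(x_n)=f(x_1)$ and $x_n\in F(X,x)$.

###### Proof.

We will use induction on $n$. The claim trivially holds for $n=1$. We will show that for every $n$, $x_n\in F(X,x)$ implies $f(x_{n+1})=f(x_n)$ and $x_{n+1}\in F(X,x)$.

Let $x_n\in F(X,x)$. By Lemma 3, there is $u\in F(X,x)$ such that $x\in\operatorname{ri}[u,x_n]$. By Lemma 5, there is

$$v\in\operatorname{ri}[u,x_{n+1}]\cap\operatorname{ri}[x,x+x_{n+1}-x_n].$$

Since $u,x_{n+1}\in X$, we have $v\in X$. Since $x_{n+1}-x_n\in I_n$, we have $v\in x+I_n$. Since $x\in M(X\cap(x+I_n),f)$, this implies $f(v)\ge f(x)$. Since $v\in\operatorname{ri}[x,x+x_{n+1}-x_n]$, by linearity of $f$ we have $f(x_{n+1})\ge f(x_n)$. But from (2) we have also $f(x_{n+1})\le f(x_n)$, hence $f(x_{n+1})=f(x_n)$. This in turn implies $f(v)=f(x)$. Since $v\in X\cap(x+I_n)$, we have $v\in M(X\cap(x+I_n),f)$. By Lemma 4, $v\in F(X,x)$. Since $v\in\operatorname{ri}[u,x_{n+1}]$ and $u,x_{n+1}\in X$, the definition of face gives $x_{n+1}\in F(X,x)$. ∎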

###### Corollary 16.

Let $(x_n)$ be a sequence satisfying (4) such that $x_1$ is a pre-interior local minimum. Then there exists $n$ such that $x_n$ is an interior local minimum.

###### Proof.

First apply Theorem 15 and then Theorem 14. ∎

###### Corollary 17.

Let $(x_n)$ be a sequence satisfying (4). Then $x_1$ is a pre-interior local minimum iff $f(x_n)=f(x_1)$ for all $n$.

###### Proof.

The ‘if’ direction follows from Theorem 14. The ‘only-if’ direction follows from Theorem 15. ∎

### 4.3 Convergence

So far, we have not examined the convergence properties of sequences satisfying (4). For that, we impose some additional restrictions on the sequences $(x_n)$ and $(I_n)$. Namely, we assume that the action of every iteration is continuous and the elements of $\mathcal{I}$ are visited in a regular order.

Formally, we assume that for each $I\in\mathcal{I}$ a continuous map $p_I\colon X\to X$ is given that satisfies

$$p_I(x)\in\operatorname{ri}M(X\cap(x+I),f) \tag{10}$$

for every $x\in X$. This map describes the action of one iteration. Let the map

$$p_\sigma=p_{\sigma(1)}\circ\cdots\circ p_{\sigma(m)} \tag{11}$$

denote the action of one round of iterations, in which all elements of $\mathcal{I}$ are visited (some possibly more than once) in the order given by a surjective map $\sigma\colon\{1,\dots,m\}\to\mathcal{I}$ where $m\ge|\mathcal{I}|$.
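The construction of one round from the per-subspace maps can be sketched as follows. This is only an illustration under stated assumptions: the helper `compose_round`, the string labels for subspaces, and the instance ($X=[0,1]^2$, $f(x)=-x_1$, coordinate directions) are hypothetical and not from the paper. For this instance the two maps below are continuous and satisfy (10): along $e_1$ the minimizer is unique, and along $e_2$ the objective is constant, so the midpoint of the feasible segment is a relative interior point of the set of minimizers.

```python
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]
Map = Callable[[Point], Point]


def compose_round(maps: Dict[str, Map], sigma: Sequence[str]) -> Map:
    """Return p_sigma = p_{sigma(1)} o ... o p_{sigma(m)}.

    As in ordinary function composition, the rightmost map p_{sigma(m)}
    is applied first.
    """
    if set(sigma) != set(maps):
        raise ValueError("sigma must visit every subspace at least once")

    def p_sigma(x: Point) -> Point:
        for label in reversed(sigma):
            x = maps[label](x)
        return x

    return p_sigma


# Hypothetical instance: X = [0,1]^2, f(x) = -x1.
def p_e1(x: Point) -> Point:
    return (1.0, x[1])   # unique minimizer of -x1 on the horizontal segment through x


def p_e2(x: Point) -> Point:
    return (x[0], 0.5)   # f is constant vertically; take the midpoint of {x1} x [0,1]


MAPS = {"e1": p_e1, "e2": p_e2}
p = compose_round(MAPS, ["e1", "e2"])
```

Here `p((0.2, 0.9))` first applies `p_e2` and then `p_e1`, giving `(1.0, 0.5)`, which is already a fixed point of `p`.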

In Theorem 14, the sequence is assumed to contain every element of  an infinite number of times. The form of iterations given by  gives a stronger property: each element of  is always visited again after at most  iterations. We adapt Theorem 14 to this situation. For that, we denote (i.e.,  is obtained by composing  with itself -times) where .

###### Theorem 18.

Let and . Then is an interior local minimum and  is a pre-interior local minimum.

###### Proof.

By similar arguments as in the proof of Theorem 14, for every it holds that:

• If  is an interior local minimum, then .

• If  is not an interior local minimum, then , hence .

Therefore, if and were not an interior local minimum, we would have , a contradiction. Since ,  is a pre-interior local minimum. ∎

Starting from some , we will examine convergence properties of the sequence defined by . Recall that a limit point (also known as an accumulation point or cluster point) of a sequence is the limit point of its converging subsequence.

###### Theorem 19.

Let . Let the sequence be bounded. Then every limit point of the sequence satisfies .

###### Proof.

Let us denote . Let  be a limit point of the sequence , i.e., for some strictly increasing function we have

$$\lim_{n\to\infty}x_{k(n)}=y. \tag{12}$$

Since $p$ is a composition of a finite number of continuous maps, it is continuous. Applying $p$ to (12) yields

$$p\Bigl(\lim_{n\to\infty}x_{k(n)}\Bigr)=\lim_{n\to\infty}p(x_{k(n)})=\lim_{n\to\infty}x_{k(n)+1}=p(y). \tag{13}$$

We show that

$$f(y)=\lim_{n\to\infty}f(x_{k(n)})=\lim_{n\to\infty}f(x_n)=\lim_{n\to\infty}f(x_{k(n)+1})=f(p(y)). \tag{14}$$

The first and last equalities hold by applying the continuous function $f$ to equalities (12) and (13), respectively. The second and third equalities hold because the sequence $(f(x_n))$ is convergent (being bounded and non-increasing), hence every subsequence of it converges to the same limit. ∎

###### Corollary 20.

Let . Let the sequence be bounded. Then every limit point of the sequence is a pre-interior local minimum.

###### Proof.

Combine Theorems 19 and 18. ∎

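Let $d$ be a metric on $V$. Denote the distance of a point $x\in V$ from a set $X\subseteq V$ as

$$d(X,x)=\inf_{y\in X}d(x,y). \tag{15}$$

###### Lemma 21.

For any non-empty $X\subseteq V$, the function $d(X,\cdot)$ is Lipschitz, hence continuous.

###### Proof.

For all $x,y\in V$ and $z\in X$ we have $d(X,x)\le d(x,z)\le d(x,y)+d(y,z)$. Taking the infimum over $z$ on the right gives $d(X,x)-d(X,y)\le d(x,y)$. Swapping $x$ and $y$ gives $|d(X,x)-d(X,y)|\le d(x,y)$. ∎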
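The Lipschitz bound can be checked numerically. The sketch below is an illustration only, under the assumptions that the metric is Euclidean, the set is a small finite point set in the plane (so the infimum in (15) is a minimum), and the test points are drawn at random; the function names are ours.

```python
import math
import random


def dist_to_set(points, x):
    """d(X, x) for a finite set X: the smallest Euclidean distance from x to a point of X."""
    return min(math.dist(x, y) for y in points)


def max_lipschitz_violation(points, trials=1000, seed=0):
    """Largest observed value of |d(X,x) - d(X,y)| - d(x,y) over random pairs (x, y).

    The bound |d(X,x) - d(X,y)| <= d(x,y) predicts that this is never
    positive, up to floating-point rounding.
    """
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        gap = abs(dist_to_set(points, x) - dist_to_set(points, y)) - math.dist(x, y)
        worst = max(worst, gap)
    return worst


X_SAMPLE = [(0.0, 0.0), (1.0, 2.0), (-3.0, 1.0)]
```

A random check of this kind cannot prove the lemma, but a positive value of `max_lipschitz_violation(X_SAMPLE)` beyond rounding error would indicate a mistake in the implementation of the distance.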

###### Lemma 22.

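Let $X\subseteq V$ be closed, $Y\subseteq X$ bounded, and $g\colon X\to\mathbb{R}$ continuous. Then $g(Y)$ is bounded.

###### Proof.

By monotonicity of closure, $\operatorname{cl}Y\subseteq\operatorname{cl}X=X$. The set $\operatorname{cl}Y$ is compact (closed and bounded), therefore $g(\operatorname{cl}Y)$ is also compact. Hence $g(Y)\subseteq g(\operatorname{cl}Y)$ is bounded. ∎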

###### Lemma 23.

A sequence in a finite-dimensional normed space is convergent iff it is bounded and has a unique limit point.

###### Proof.

The ‘only-if’ direction is obvious. To see the ‘if’ direction, let $x$ be the unique limit point of a bounded sequence $(x_n)$. For contradiction, suppose $(x_n)$ does not converge to $x$. Then for some $\epsilon>0$, for every $n$ there is $m>n$ such that $d(x_m,x)\ge\epsilon$. So there is a subsequence $(y_n)$ such that $d(y_n,x)\ge\epsilon$ for all $n$. As $(y_n)$ is bounded, by Bolzano-Weierstrass it has a convergent subsequence, $(z_n)$. But clearly $(z_n)$ cannot converge to $x$, so its limit is a second limit point of $(x_n)$, a contradiction. ∎

###### Theorem 24.

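Let $(x_n)$ be a bounded sequence from a closed set $X\subseteq V$. Let $Y\subseteq X$ be such that every limit point of $(x_n)$ is in $Y$. Then $\lim_{n\to\infty}d(Y,x_n)=0$.

###### Proof.

By Lemmas 21 and 22, the sequence $(d(Y,x_n))$ is bounded. Thus it has a convergent subsequence, $(d(Y,y_n))$ where $(y_n)$ is a subsequence of $(x_n)$. By Lemma 23, it suffices to show that $\lim_{n\to\infty}d(Y,y_n)=0$.

Being a subsequence of $(x_n)$, the sequence $(y_n)$ is bounded. Therefore, it has a convergent subsequence, $(z_n)$. Thus, $z=\lim_{n\to\infty}z_n$ is a limit point of $(x_n)$. Therefore, $z\in Y$. Applying the continuous function $d(Y,\cdot)$ to this limit yields $\lim_{n\to\infty}d(Y,z_n)=d(Y,z)=0$. Since the sequence $(d(Y,y_n))$ is convergent, every convergent subsequence of it converges to the same number. Since $(d(Y,z_n))$ is one such subsequence, we have $\lim_{n\to\infty}d(Y,y_n)=0$. ∎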

###### Corollary 25.

Let . Let the sequence be bounded. Let  be the set of all pre-interior local minima of  on . Then .

###### Proof.

Combine Theorem 24 and Corollary 20. ∎

For the sequence $(x_n)$ to be bounded, it clearly suffices that $X$ is bounded. But there is a weaker sufficient condition: as the sequence $(f(x_n))$ is non-increasing, it suffices that the set $\{x\in X\mid f(x)\le f(x_1)\}$ is bounded (note that $\{x\in V\mid f(x)\le f(x_1)\}$ is the half-space whose boundary is the contour of $f$ passing through the initial point $x_1$).

## 5 Non-linear Objective Function

As we said, the minimization of a convex function on a convex set can be transformed to the epigraph form, which is the minimization of a linear function on a convex set. Here we show that this transformation allows us to generalize the results from §4 to non-linear convex objective functions.

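The epigraph of a function $f\colon X\to\mathbb{R}$ is the set

$$\operatorname{epi}f=\{(x,t)\in X\times\mathbb{R}\mid f(x)\le t\}. \tag{16}$$

If $X$ is closed convex and $f$ is convex, then $\operatorname{epi}f$ is closed convex. We have

$$\min_{x\in X}f(x)=\min_{(x,t)\in\operatorname{epi}f}t=\min_{\bar x\in\operatorname{epi}f}\pi(\bar x) \tag{17}$$

where $\pi\colon V\times\mathbb{R}\to\mathbb{R}$ is the linear function defined by $\pi(x,t)=t$, i.e., the projection on the $t$-coordinate. For every $(x,t)\in M(\operatorname{epi}f,\pi)$ we have $t=\min_{y\in X}f(y)$, i.e., $t$ is the minimum value of $f$ on $X$. Moreover,

$$\begin{aligned}
M(X,f)\times\{t\} &= M(\operatorname{epi}f,\pi), &&\text{(18a)}\\
\operatorname{ri}M(X,f)\times\{t\} &= \operatorname{ri}M(\operatorname{epi}f,\pi), &&\text{(18b)}
\end{aligned}$$

which can equivalently be written as

$$\begin{aligned}
x\in M(X,f) &\iff (x,f(x))\in M(\operatorname{epi}f,\pi), &&\text{(19a)}\\
x\in\operatorname{ri}M(X,f) &\iff (x,f(x))\in\operatorname{ri}M(\operatorname{epi}f,\pi). &&\text{(19b)}
\end{aligned}$$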

The following lemma will allow us to show that the concepts of local minima and the updates (2) and (4) remain ‘the same’ if we pass to the epigraph form, provided that instead of a subspace $I\subseteq V$ we use the subspace $\bar I=I\times\mathbb{R}$. To illustrate this, consider the case $V=\mathbb{R}^d$ and coordinate minimization. In every iteration, we minimize $f(x_1,\dots,x_d)$ over a single variable $x_i$. In the epigraph form, we would minimize $t$ subject to $f(x_1,\dots,x_d)\le t$ over the pair $(x_i,t)$. Clearly, both forms are equivalent.
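The equivalence of the two forms of a coordinate update can be illustrated numerically. The sketch below is our own and rests on several assumptions: the convex function $f(x_1,x_2)=\max(|x_1|,|x_2-1|)$, the fixed value $x_2=0$, and a brute-force search over a finite grid (multiples of 1/4, which are exact in floating point) in place of exact minimization. It compares the minimizers over $x_1$ of the direct form with the $x_1$-components of the minimizers over $(x_1,t)$ of the epigraph form.

```python
def f(x1, x2):
    """Hypothetical convex, non-differentiable objective."""
    return max(abs(x1), abs(x2 - 1.0))


X1_GRID = [k / 4 for k in range(-8, 9)]   # candidate values of x1: -2, -1.75, ..., 2
T_GRID = [k / 4 for k in range(0, 13)]    # candidate values of t: 0, 0.25, ..., 3


def argmin_direct(x2):
    """Minimize f(x1, x2) over x1 on the grid; return (minimum value, minimizers)."""
    best = min(f(x1, x2) for x1 in X1_GRID)
    return best, [x1 for x1 in X1_GRID if f(x1, x2) == best]


def argmin_epigraph(x2):
    """Minimize t over pairs (x1, t) with f(x1, x2) <= t; return (minimum t, x1-components)."""
    feasible = [(x1, t) for x1 in X1_GRID for t in T_GRID if f(x1, x2) <= t]
    best_t = min(t for _, t in feasible)
    return best_t, [x1 for x1, t in feasible if t == best_t]
```

For $x_2=0$ both functions return the value 1.0 and the same set of minimizers, the grid points of $[-1,1]$. The minimizer is not unique here, and the midpoint $x_1=0$ is one choice from the relative interior of the set of minimizers.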

###### Lemma 26.

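Let $X\subseteq V$ be convex, $f\colon X\to\mathbb{R}$ be convex. Let $I\subseteq V$ be a subspace and $\bar I=I\times\mathbb{R}$. Let $x\in X$ and $\bar x=(x,f(x))$. Then

$$\begin{aligned}
y\in M(X\cap(x+I),f) &\iff (y,f(y))\in M(\operatorname{epi}f\cap(\bar x+\bar I),\pi),\\
y\in\operatorname{ri}M(X\cap(x+I),f) &\iff (y,f(y))\in\operatorname{ri}M(\operatorname{epi}f\cap(\bar x+\bar I),\pi).
\end{aligned}$$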

###### Proof.

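One can verify from (16) that for every $Y\subseteq V$ we have

$$\operatorname{epi}f\cap(Y\times\mathbb{R})=\operatorname{epi}f|_{X\cap Y}$$

where $f|_{X\cap Y}$ denotes the restriction of the function $f$ to the set $X\cap Y$. Since $\bar x+\bar I=(x+I)\times\mathbb{R}$, we thus have $\operatorname{epi}f\cap(\bar x+\bar I)=\operatorname{epi}f|_{X\cap(x+I)}$. We see that the two equivalences of the lemma are the two equivalences stated just before the lemma (relating $M(X,f)$ to $M(\operatorname{epi}f,\pi)$), applied to the function $f|_{X\cap(x+I)}$. ∎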

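By letting $y=x$ and $\bar{\mathcal{I}}=\{I\times\mathbb{R}\mid I\in\mathcal{I}\}$, the lemma shows that $x$ is an [interior] local minimum of $f$ on $X$ with respect to $\mathcal{I}$ iff $(x,f(x))$ is an [interior] local minimum of $\pi$ on $\operatorname{epi}f$ with respect to $\bar{\mathcal{I}}$. Similarly, the results from §4.2 and §4.3 can be extended from linear to non-linear convex functions $f$.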

## References

• [1] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 2nd edition, 1999.
• [2] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Neural Information Processing Systems, pages 553–560, 2008.
• [3] J. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex Analysis. Grundlehren Text Editions. Springer, 2004.
• [4] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother. A comparative study of modern inference techniques for structured discrete energy minimization problems. Intl. J. of Computer Vision, 115(2):155–184, 2015.
• [5] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Analysis and Machine Intelligence, 28(10):1568–1583, 2006.
• [6] V. Kolmogorov. A new look at reweighted message passing. IEEE Trans. on Pattern Analysis and Machine Intelligence, 37(5), May 2015.
• [7] V. A. Kovalevsky and V. K. Koval. A diffusion algorithm for decreasing the energy of the max-sum labeling problem. Glushkov Institute of Cybernetics, Kiev, USSR. Unpublished, approx. 1975.
• [8] R. T. Rockafellar. Convex analysis. Princeton Mathematical Series. Princeton University Press, 1970.
• [9] M. I. Schlesinger and K. Antoniuk. Diffusion algorithms and structural recognition optimization problems. Cybernetics and Systems Analysis, 47:175–192, 2011.
• [10] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(6):1068–1080, 2008.
• [11] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl., 109(3):475–494, June 2001.
• [12] T. Werner. A linear programming approach to max-sum problem: A review. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(7):1165–1179, July 2007.