A Multiscale Framework for Challenging Discrete Optimization



Current state-of-the-art discrete optimization methods lag behind when it comes to challenging contrast-enhancing discrete energies (i.e., energies favoring different labels for neighboring variables). This work suggests a multiscale approach for these challenging problems. Deriving an algebraic representation allows us to coarsen any pair-wise energy using any interpolation in a principled algebraic manner. Furthermore, we propose an energy-aware interpolation operator that efficiently exposes the multiscale landscape of the energy, yielding an effective coarse-to-fine optimization scheme. Results on challenging contrast-enhancing energies show significant improvement over state-of-the-art methods.


1 Introduction

We consider discrete pair-wise energies, defined over a (weighted) graph G = (V, E):

E(x) = Σ_{i∈V} φ_i(x_i) + Σ_{ij∈E} w_{ij} · φ(x_i, x_j)    (1)

where V is the set of variables and E is the set of edges. The sought solution is a discrete vector x ∈ {1, …, l}^n, with n variables each taking one of l possible labels, minimizing (1).

Most energy instances of form (1) considered in the literature are smoothness preserving: that is, assigning neighboring variables to the same label costs less energy. Smoothness preserving energies include submodular [[15]], metric and semi-metric [[4]] energies. State-of-the-art optimization algorithms (e.g., TRW-S [[11]], large move [[4]] and dual decomposition (DD) [[13]]) handle smoothness preserving energies well, yielding close-to-optimal results. However, when it comes to contrast-enhancing energies (i.e., energies favoring different labels for neighboring variables) existing algorithms provide poor approximations (see e.g., [[17], example 8.1], [[11], §5.1]). For contrast-enhancing energies the relaxations underlying TRW and DD are no longer tight, and therefore these methods converge to far-from-optimal solutions.

This work suggests a multiscale approach to the optimization of contrast-enhancing energies. Coarse-to-fine exploration of the solution space allows us to effectively avoid getting stuck in local minima. Our work makes two major contributions: (i) An algebraic representation of the energy allows for a principled derivation of the coarse scale energy using any linear coarse-to-fine interpolation. (ii) An energy-aware method for computing the interpolation operator which efficiently exposes the multiscale landscape of the energy.

Multiscale approaches for discrete optimization have been proposed in the past (e.g., [[7], [14], [6], [10], [12], [9]]). However, they focus mainly on accelerating the optimization process of smoothness preserving energies. Furthermore, these methods are usually restricted to a dyadic coarsening of grid-based energies, and suggest ad-hoc, heuristic derivations of the coarse-scale energy (e.g., [[10], §3]). In contrast, our framework suggests a principled derivation of the coarse scale energy using a novel energy-aware interpolation, yielding low energy solutions.

2 Multiscale Energy Pyramid

Our algebraic representation requires the substitution of the vector x in (1) with an equivalent binary matrix representation U ∈ {0,1}^{n×l}. The rows of U correspond to the variables, and the columns correspond to the labels: U_{ia} = 1 iff variable i is labeled "a" (x_i = a). Expressing the energy (1) using U yields a quadratic representation:

E(U; D, W, V) = Tr( D^T U + W U V U^T )    (2)

s.t.  U ∈ {0,1}^{n×l},  Σ_a U_{ia} = 1  ∀i    (3)

where D ∈ R^{n×l} s.t. D_{ia} = φ_i(a) (the unary cost of assigning label a to variable i), W ∈ R^{n×n} s.t. W_{ij} = w_{ij}, and V ∈ R^{l×l} s.t. V_{ab} = φ(a, b). An energy over n variables with l labels is now parameterized by (D, W, V).

Let (D^0, W^0, V^0) be the fine scale energy, defined over n^0 variables. We wish to generate a coarser representation (D^1, W^1, V^1) with fewer variables n^1 < n^0. This representation approximates E(U^0; D^0, W^0, V^0) using fewer variables: U^1 with only n^1 rows.

An interpolation matrix P ∈ [0,1]^{n^0 × n^1}, s.t. Σ_j P_{ij} = 1, maps a coarse assignment to a fine assignment: U^0 = P U^1. For any fine assignment U^0 that can be approximated by a coarse assignment U^1, i.e., U^0 ≈ P U^1, we can write eq. (2):

E(P U^1; D^0, W^0, V^0) = Tr( (D^0)^T P U^1 + W^0 P U^1 V^0 (U^1)^T P^T )
                        = Tr( (P^T D^0)^T U^1 + (P^T W^0 P) U^1 V^0 (U^1)^T )
                        = E(U^1; D^1, W^1, V^1)    (4)

with D^1 = P^T D^0, W^1 = P^T W^0 P and V^1 = V^0. We have generated a coarse energy parameterized by (D^1, W^1, V^1) that approximates the fine energy (D^0, W^0, V^0). This coarse energy is of the same form as the original energy, allowing us to apply the coarsening procedure recursively to construct an energy pyramid.

Our principled algebraic representation allows us to perform label coarsening in a similar manner. Looking at a different interpolation matrix P̂ ∈ [0,1]^{l × l̂}, we interpolate a coarse solution by U = Û P̂^T. This time the interpolation matrix acts on the labels, i.e., the columns of U. The coarse labeling matrix Û has the same number of rows (variables), but fewer columns (labels). Coarsening the labels yields:

E(Û P̂^T; D, W, V) = Tr( D^T Û P̂^T + W Û P̂^T V P̂ Û^T )
                   = Tr( (D P̂)^T Û + W Û (P̂^T V P̂) Û^T )
                   = E(Û; D̂, W, V̂)    (5)

Again, we end up with the same type of energy, but this time it is defined over a smaller number of discrete labels: l̂ < l, where D̂ = D P̂ and V̂ = P̂^T V P̂ (W is unchanged).

Equations (4) and (5) encapsulate one of our key contributions: constructing an energy pyramid depends only on the interpolation matrix P (P̂ resp.). For any interpolation it is straightforward to derive the coarse-scale energy in a principled manner. But what is an appropriate interpolation?

3 Energy-aware Interpolation

The effectiveness of the multiscale approximation of (4) and (5) heavily depends on the interpolation matrix P (P̂ resp.). The matrix P can be interpreted as an operator that aggregates fine-scale variables into coarse ones (Fig. 1). Aggregating fine variables i and j into a coarser one excludes from the search space all assignments for which x_i ≠ x_j. This aggregation is undesired if assigning i and j to different labels yields low energy. However, when variables i and j are in agreement under the energy (i.e., assignments with x_i = x_j yield low energy), aggregating them together allows for efficient exploration of low energy assignments. A desired interpolation therefore aggregates i and j when they are in agreement under the energy.



Figure 1: Interpolation as soft variable aggregation: fine variables 1, 2, 3 and 4 are softly aggregated into coarse variables 1 and 2. For example, fine variable 1 is a convex combination of the two coarse variables. Hard aggregation is a special case where P is a binary matrix; in that case each fine variable is influenced by exactly one coarse variable.

To estimate these agreements we empirically generate several samples with relatively low energy, and measure the label agreement between neighboring variables i and j in these samples. We use Iterated Conditional Modes (ICM) [[3]] to obtain locally low energy assignments. This procedure may be interpreted as Gibbs sampling from the Gibbs distribution at the zero-temperature limit. Performing ICM with K random restarts provides us with K samples {x^(k)}_{k=1}^K. The disagreement between neighboring variables i and j is estimated as d_{ij} = (1/K) Σ_k 1[x_i^(k) ≠ x_j^(k)], where x_i^(k) is the label of variable i in the k-th sample. Their agreement is then given by c_{ij} = exp(−d_{ij}/σ), with σ a fixed scale parameter.
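A compact NumPy sketch of this sampling step (our own simplified reading; the number of sweeps, the unit scale σ = 1, and all sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, l, K = 8, 3, 10                          # variables, labels, ICM restarts
D = rng.normal(size=(n, l))                 # unary costs
W = np.triu(rng.normal(size=(n, n)), k=1)
W = W + W.T                                 # symmetric edge weights, zero diagonal
V = 1.0 - np.eye(l)                         # Potts label cost

def icm(x):
    """Greedy coordinate descent: relabel one variable at a time."""
    for _ in range(20):                     # a few sweeps suffice for a sketch
        for i in range(n):
            # cost of each candidate label a: D[i,a] + sum_j W[i,j] V[a, x_j]
            x[i] = int(np.argmin(D[i] + W[i] @ V[:, x].T))
    return x

samples = np.stack([icm(rng.integers(0, l, size=n)) for _ in range(K)])

# d_ij: fraction of samples where i and j disagree; c_ij: resulting agreement
d = (samples[:, :, None] != samples[:, None, :]).mean(axis=0)
c = np.exp(-d)                              # sigma = 1 assumed
```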

Using the variable agreements c_{ij}, we follow the Algebraic Multigrid (AMG) method of [[5]]: we first determine the set of coarse scale variables, and then construct an interpolation matrix P that softly aggregates fine scale variables according to their agreement with the coarse ones.

We begin by selecting a set of coarse representative variables C ⊂ V, such that every variable in V ∖ C is in agreement with C. A variable i is considered in agreement with C if Σ_{j∈C} c_{ij} ≥ β Σ_{j∈V} c_{ij}. That is, every variable in V is either in C or is in agreement with other variables in C, and thus well represented in the coarse scale.

We perform this selection greedily and sequentially, starting with C = ∅ and adding i to C if it is not yet in agreement with C. The parameter β affects the coarsening rate, i.e., the ratio n^1/n^0 between the number of coarse and fine variables; a smaller β results in a lower ratio.

At the end of this process we have a set of coarse representatives C. The interpolation matrix P is then defined by:

P_{ij} = c_{ik} / Σ_{k'∈C} c_{ik'}   if i ∈ V ∖ C, where k ∈ C is the representative with coarse index j
P_{ij} = 1                           if i ∈ C and j = I(i)
P_{ij} = 0                           otherwise    (6)

where I(i) is the coarse index of the representative variable whose fine index is i.

We further prune rows of P, leaving only the r maximal entries in each row. Each row is then normalized to sum to 1. Throughout our experiments we use fixed values of K and r for computing P.
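The selection and construction steps above can be sketched as follows (our simplified reading of the AMG-style procedure; β, the agreement matrix c, and all sizes are assumed inputs, and the row pruning step is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 8, 0.2
c = rng.random((n, n))                      # stand-in agreement matrix c_ij
c = (c + c.T) / 2
np.fill_diagonal(c, 0)

# Greedy selection: add i to C unless it already agrees with C.
C = []
for i in range(n):
    if c[i, C].sum() < beta * c[i].sum():
        C.append(i)

# Soft interpolation: representatives map to themselves; other rows
# spread over C in proportion to their agreement, eq. (6).
P = np.zeros((n, len(C)))
for j, k in enumerate(C):                   # coarse index j represents fine var k
    P[k, j] = 1.0
for i in range(n):
    if i not in C:
        P[i] = c[i, C] / c[i, C].sum()
```

By construction every row of P sums to 1, as required of an interpolation operator.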

4 A Unified Discrete Multiscale Framework

Given an energy (D^0, W^0, V^0) at the finest scale, our framework first works fine-to-coarse to compute the interpolation matrices {P^s} that construct the "energy pyramid": (D^0, W^0, V^0), (D^1, W^1, V^1), …, (D^S, W^S, V^S). Typically we reduce the number of variables by a constant factor between consecutive levels, resulting in only a handful of variables at the coarsest scale. Since there are very few degrees of freedom at the coarsest scale, ICM¹ is likely to obtain a low-energy coarse solution. Then, at each scale s the coarse solution U^s is interpolated to a finer scale s−1: Ũ^{s−1} = P^{s−1} U^s. At the finer scale, Ũ^{s−1} serves as a good initialization for ICM (fractional solutions are rounded). These two steps of interpolation followed by refinement are repeated for all scales from coarse to fine.
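An end-to-end sketch of the coarse-to-fine scheme (ours, heavily simplified: random hard aggregations stand in for the energy-aware P, and all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def energy(U, D, W, V):
    return np.trace(D.T @ U + W @ U @ V @ U.T)

def icm(x, D, W, V, sweeps=10):
    for _ in range(sweeps):
        for i in range(len(x)):
            x[i] = int(np.argmin(D[i] + W[i] @ V[:, x].T))
    return x

n, l = 16, 3
D = rng.normal(size=(n, l))
W = np.triu(rng.normal(size=(n, n)), k=1)
W = W + W.T
V = 1.0 - np.eye(l)                         # Potts label cost

# Fine-to-coarse: build the energy pyramid via eq. (4).
pyramid, Ps = [(D, W, V)], []
while pyramid[-1][0].shape[0] > 4:
    Dl, Wl, Vl = pyramid[-1]
    nc = Dl.shape[0] // 2
    P = np.eye(nc)[rng.integers(0, nc, size=Dl.shape[0])]
    Wc = P.T @ Wl @ P
    np.fill_diagonal(Wc, 0)                 # self-edges contribute nothing under Potts V
    Ps.append(P)
    pyramid.append((P.T @ Dl, Wc, Vl))

# Coarse-to-fine: solve the coarsest level, then interpolate and refine.
Dl, Wl, Vl = pyramid[-1]
x = icm(rng.integers(0, Vl.shape[0], size=Dl.shape[0]), Dl, Wl, Vl)
for P, (Dl, Wl, Vl) in zip(reversed(Ps), reversed(pyramid[:-1])):
    U = P @ np.eye(Vl.shape[0])[x]          # interpolate the coarse labeling
    x = icm(U.argmax(axis=1), Dl, Wl, Vl)   # round, then refine with ICM
```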

Our energy-aware interpolation and ICM play complementary roles in this multiscale framework. ICM makes fine scale local refinements of a given labeling, while the energy-aware interpolation makes coarse groupings of variables to expose the global behavior of the energy. In a sense, ICM is a discrete analogue of the Gauss-Seidel relaxation used in continuous-domain multiscale schemes.

5 Experimental Results

We evaluated our multiscale framework on challenging contrast-enhancing synthetic energies, as well as on co-clustering energies. We follow the protocol of [[16]] that uses the lower bound as a baseline for comparing the performance of different optimization methods on different energies. We report the ratio between the resulting energy and the lower bound (in percent); closer to 100% is better².

Table 1: Synthetic results: percent of the achieved energy value relative to the lower bound computed by TRW-S (closer to 100% is better) for our multiscale framework and for single-scale ICM and TRW-S, for varying strengths λ of the pair-wise term (stronger λ is harder to optimize).
Table 2: Co-clustering results: the baseline for comparison is the state-of-the-art results of [[8]]. (a) We report our results as a percent of the baseline: smaller is better; lower than 100% even outperforms state-of-the-art. (b) We also report the fraction of energies for which our multiscale framework outperforms state-of-the-art.

Synthetic:  We begin with synthetic contrast-enhancing energies defined over a 4-connected grid graph with n variables and l labels. The unary term D and the pair-wise weights w_ij are drawn at random, with V a Potts term; a parameter λ scales the relative strength of the pair-wise term. Stronger pair-wise terms (i.e., larger λ) result in energies that are more difficult to optimize (see [[11]]). The resulting synthetic energies are contrast-enhancing (since w_ij may become negative). Table 1 shows results, averaged over 100 experiments. Using our multiscale framework to perform coarse-to-fine optimization of the energy yields significantly lower energies than the single-scale methods used (ICM and TRW-S).
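A hedged sketch of such a synthetic energy (the exact distributions, grid size, and label count used in the experiments are not specified here; the choices below are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
n_side, l, lam = 10, 4, 5.0                 # grid side, labels, pair-wise strength
n = n_side * n_side

D = rng.normal(size=(n, l))                 # random unary term

W = np.zeros((n, n))                        # 4-connected grid edges
for i in range(n_side):
    for j in range(n_side):
        v = i * n_side + j
        if j + 1 < n_side:
            W[v, v + 1] = lam * rng.normal()        # right neighbor
        if i + 1 < n_side:
            W[v, v + n_side] = lam * rng.normal()   # bottom neighbor

V = 1.0 - np.eye(l)                         # Potts label term
# negative w_ij entries make the energy contrast-enhancing
```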

Co-clustering (Correlation-Clustering):  The problem of co-clustering addresses the matching of superpixels within and across frames in a video sequence. Following [[2], §6.2], we treat co-clustering as the minimization of a discrete Potts energy, adaptively adjusting the number of labels. The resulting energies are contrast-enhancing (with some w_ij < 0), have no underlying regular grid, no data term, and are very challenging to optimize. We obtained 77 co-clustering energies, courtesy of [[8]], used in their experiments. Table 2 compares our discrete multiscale framework to the state-of-the-art results of [[8]], obtained by applying a specially tailored convex relaxation method. Our multiscale framework improves state-of-the-art for this family of challenging energies and significantly outperforms TRW-S.

6 Extensions

It is rather straightforward to extend our framework to handle energies with a different pair-wise term V_{ij} for every pair ij. Moreover, higher order potentials can also be considered using the same algebraic representation. A detailed derivation may be found in [[1]].


Acknowledgements

We would like to thank Irad Yavneh, Maria Zontak and Daniel Glasner for their insightful remarks and discussions. Special thanks go to Michal Irani for her exceptional encouragement and support.


  1. Our framework is not restricted to ICM and may utilize other single-scale optimization algorithms.
  2. Matlab implementation is available at: www.wisdom.weizmann.ac.il/~bagon/matlab.html


  1. S. Bagon. Discrete Energy Minimization, beyond Submodular: Applications and Approximations. PhD thesis, Weizmann Institute of Science, http://arxiv.org/abs/1210.7362, 2012.
  2. S. Bagon and M. Galun. Large scale correlation clustering optimization. arXiv, 2011.
  3. J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, 1986.
  4. Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 2002.
  5. A. Brandt. Algebraic multigrid theory: The symmetric case. Applied Mathematics and Computation, 1986.
  6. P. Felzenszwalb and D. Huttenlocher. Efficient belief propagation for early vision. IJCV, 2006.
  7. B. Gidas. A renormalization group approach to image processing problems. PAMI, 1989.
  8. D. Glasner, S. N. Vitaladevuni, and R. Basri. Contour-based joint clustering of multiple segmentations. In CVPR, 2011.
  9. T. Kim, S. Nowozin, P. Kohli, and C. Yoo. Variable grouping for energy minimization. In CVPR, 2011.
  10. P. Kohli, V. Lempitsky, and C. Rother. Uncertainty driven multiscale optimization. In DAGM, 2010.
  11. V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. PAMI, 2006.
  12. N. Komodakis. Towards more efficient and effective LP-based algorithms for MRF optimization. In ECCV, 2010.
  13. N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. PAMI, 2011.
  14. P. Pérez and F. Heitz. Restriction of a Markov random field on a graph and multiresolution statistical image modeling. IEEE Trans. on Inf. Theory, 1996.
  15. D. Schlesinger and B. Flach. Transforming an arbitrary minsum problem into a binary one. Technical report, TU, Fak. Informatik, 2006.
  16. R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. PAMI, 2008.
  17. M. Wainwright, T. Jaakkola, and A. Willsky. MAP estimation via agreement on trees: message-passing and linear programming. IEEE Transactions on Information Theory, 2005.