Compassionately Conservative Balanced Cuts for Image Segmentation


The Normalized Cut (NCut) objective function, widely used in data clustering and image segmentation, quantifies the cost of graph partitioning in a way that assigns lower values to balanced clusters or segments than to unbalanced partitionings. However, this bias is so strong that it avoids any singleton partitions, even when vertices are very weakly connected to the rest of the graph. Motivated by the Bühler-Hein family of balanced cut costs, we propose the family of Compassionately Conservative Balanced (CCB) Cut costs, which are indexed by a parameter that can be used to strike a compromise between the desire to avoid too many singleton partitions and the notion that all partitions should be balanced. We show that CCB-Cut minimization can be relaxed into an orthogonally constrained $\ell_\tau$-minimization problem that coincides with the problem of computing Piecewise Flat Embeddings (PFE) for one particular index value, and we present an algorithm for solving the relaxed problem by iteratively minimizing a sequence of reweighted Rayleigh quotients (IRRQ). Using images from the BSDS500 database, we show that image segmentation based on CCB-Cut minimization provides better accuracy with respect to ground truth and greater variability in region size than NCut-based image segmentation.


1 Introduction

The Normalized Cut (NCut) graph partitioning cost function was first introduced over two decades ago to tackle the perceptual grouping problem [18, 19], and its approximate minimization via continuous relaxation into a generalized eigenvector problem has emerged as one of the fundamental techniques of data clustering [13].

Figure 1 (panels: (a) Toy Graph, (b) Cut, (c) NCut): (a) A graph having unit-weight edges except for two edges of smaller weight. (b) Minimizing the Cut cost removes the weakly-connected vertex, whereas (c) minimizing the Normalized Cut (NCut) cost yields a more “balanced” partitioning.

The key observation that motivated the normalization of graph cut cost functions is that when graph partitioning is performed by minimizing the sum of the weights of edges that are cut (the Cut cost), the resulting partitions will be unbalanced; vertices that are weakly connected to the rest of the graph are likely to be separated to form singleton partitions, while the rest of the graph vertices are likely to remain in large partitions. Normalizing the cut cost by the degrees of each partition (NCut [18, 19]), the size of each partition (Average Cut [17]), or the minimum of the size of each partition and its complement (Ratio or Cheeger Cut [6]) yields balanced partitions that have similar total degree or size when minimized.

Is it possible that these types of normalization go too far? Consider the example of Figure 1, in which we wish to partition the toy graph into two subgraphs. This graph (Fig. 1(a)) contains seven vertices and nine edges; all but two of the edges have unit weight, and the two indicated edges share a smaller weight. Because this weight is small, the degree of the vertex sharing these two edges is guaranteed to be smaller than the degree of any other vertex in the graph. Hence, as shown in Fig. 1(b), minimizing the Cut cost will separate that weakly-connected vertex from the rest of the graph. Minimizing the NCut cost creates a more balanced partitioning as shown in Fig. 1(c); however, this partitioning will always have a lower-cost NCut than removing the weakly-connected vertex, even when the weight is infinitesimally small. Perhaps the Ratio Cut (RCut) is a better choice: for this graph, there is a critical value of the weight above which Fig. 1(c) minimizes the RCut cost, and below which Fig. 1(b) yields the minimum. But is there really anything special about this critical value?
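The comparison above can be checked numerically. The following is a minimal sketch under stated assumptions: the seven-vertex, nine-edge graph built here (two unit-weight triangles joined by a unit-weight bridge, plus one vertex attached by two edges of weight `eps`) is a hypothetical layout consistent with the description of Figure 1, not necessarily the graph drawn there, and the Ratio Cut is taken in its common form Cut(A, B)(1/|A| + 1/|B|). All function and variable names are illustrative.

```python
def build_toy_graph(eps):
    """Hypothetical 7-vertex, 9-edge graph (an assumed layout, see lead-in)."""
    return {
        (0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,   # first unit-weight triangle
        (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,   # second unit-weight triangle
        (2, 3): 1.0,                              # unit-weight bridge
        (0, 6): eps, (5, 6): eps,                 # weakly connected vertex 6
    }


def cut_costs(edges, part_a, n_vertices=7):
    """Return (Cut, NCut, RCut) for the 2-way partition (part_a, complement)."""
    part_a = set(part_a)
    part_b = set(range(n_vertices)) - part_a
    degree = [0.0] * n_vertices
    cut = 0.0
    for (i, j), w in edges.items():
        degree[i] += w
        degree[j] += w
        if (i in part_a) != (j in part_a):        # edge crosses the partition
            cut += w
    vol_a = sum(degree[i] for i in part_a)        # volume = total degree
    vol_b = sum(degree[i] for i in part_b)
    ncut = cut * (1.0 / vol_a + 1.0 / vol_b)
    rcut = cut * (1.0 / len(part_a) + 1.0 / len(part_b))
    return cut, ncut, rcut


if __name__ == "__main__":
    for eps in (0.1, 0.5):
        graph = build_toy_graph(eps)
        print("eps =", eps)
        print("  singleton {6}:    ", cut_costs(graph, {6}))
        print("  balanced {0,1,2}: ", cut_costs(graph, {0, 1, 2}))
```

For this assumed graph, the singleton split has the lower Cut cost and the balanced split has the lower NCut cost at both tested weights, while the Ratio Cut preference switches between the two weights.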

If this graph represented an image to be segmented, one would want to use ground-truth segmentation maps from training images to suggest an optimal critical value for this edge weight, above which the more “balanced” partitioning is desired, and below which the weakly-connected vertex should be separated. The ground truth might dictate that even the Ratio Cut cost provides too much normalization, or instead, that it does not provide enough.

One potential idea is to use the infinite families of balanced cost functions defined by Bühler and Hein [4] and indexed by a parameter $p$. We refer to these families as the Bühler-Hein $p$-Normalized and the Bühler-Hein $p$-Ratio Cuts. The $p$-Normalized Cut is equivalent to NCut for $p = 2$ and approaches the Normalized Cheeger (NCh) Cut as $p \to 1$, and the $p$-Ratio Cut is equivalent to RCut for $p = 2$ and approaches the Ratio Cheeger (RCh) Cut as $p \to 1$, so both families of cut costs can be thought of as interpolating between different normalization factors. However, as shown in Fig. 2, when we minimize the $p$-Normalized and $p$-Ratio Cut costs for the graph in Fig. 1, there are substantial ranges of the weak-edge weight for which no value of $p$ will allow a bifurcation between solutions.

Figure 2: Using the Bühler-Hein $p$-Normalized (a) and $p$-Ratio (b) Cut costs to partition the graph in Fig. 1(a). Over all possible parameter values, either the partitioning in Fig. 1(b) or the one in Fig. 1(c) yields the minimum value. For both families of costs, there are substantial ranges of the weak-edge weight for which only one partitioning emerges for all $p$.

In this paper, we show that although the Bühler-Hein costs provide a mechanism for interpolating between Normalized (or Ratio) Cuts and Normalized (or Ratio) Cheeger Cuts, they do not provide a path for interpolating all the way to the unnormalized Cut cost. However, a different set of infinite families of balanced cost functions can be constructed that do enable such interpolation. Our proposed families are called the Compassionately Conservative Normalized (CCN) Cut and the Compassionately Conservative Ratio (CCR) Cut. Both are indexed by a parameter $\tau$; the CCN Cut is equivalent to NCut for $\tau = 2$ and the unnormalized Cut cost for $\tau = 0$, and the CCR Cut is equivalent to RCut for $\tau = 2$ and the unnormalized Cut cost for $\tau = 0$. As we can see from Fig. 3, when we minimize the CCN and CCR Cut costs for the graph in Fig. 1, different ranges of $\tau$ can be selected that will allow a bifurcation between solutions for any value of the weak-edge weight.

Figure 3: Using the CCN (a) and CCR (b) Cut costs to partition the graph in Fig. 1(a). Over all possible parameter values, either the partitioning in Fig. 1(b) or the one in Fig. 1(c) yields the minimum value. For both families of costs, for any choice of the weak-edge weight, ranges of $\tau$ exist that enable either partitioning.

This paper makes four novel contributions to graph partitioning in computer vision:

  • presenting Compassionately Conservative Balanced (CCB) Cuts: families of cut costs that enable normalizations ranging from Normalized/Ratio Cuts to unnormalized Cuts and can be naturally extended to $K$-way partitionings with $K > 2$;

  • presenting a continuous relaxation of CCB Cut minimization and illuminating its connection to computing Piecewise Flat Embeddings (PFE) [23];

  • presenting an efficient algorithm for solving the relaxed problem via minimizing a succession of reweighted Rayleigh quotients (IRRQ); and

  • demonstrating empirical advantages of the CCB Cut costs and the IRRQ minimization algorithm for the application of image segmentation.

2 Balanced Cut Costs

Consider an undirected weighted graph $G = (V, E)$ that we wish to partition into two disjoint subgraphs $A$ and $B$ by removing the edges connecting $A$ and $B$. A standard cost of partitioning is the Cut cost, defined as the total weight of the edges that have been removed:

$$\mathrm{Cut}(A, B) = \sum_{i \in A,\, j \in B} W_{ij}, \qquad (1)$$

where the vertex set $V = A \cup B$, and $W$ is the weighted adjacency matrix (or affinity matrix) of $G$.

Minimizing the Cut cost is undesirable, however, as it can often yield partitionings that simply disconnect one vertex from the rest of the graph [21]. More balanced partitions emerge if the Cut cost is normalized by some function of the total sizes or total degrees (volumes) of the subgraphs. Such balanced cut costs can be expressed generally by:


where is a symmetric function that decreases with increasing differences in the sizes or degrees of and . A variety of balanced cut costs have been proposed in the literature, including the Normalized Cut [18, 19], Average Cut [17], Ratio Cut [7, 9], and Normalized and Ratio Cheeger Cuts [4, 6].

In Theorem 4.1 of [4], Bühler and Hein show that there exists an infinite family of balanced cut costs that contains Normalized, Average, Ratio, and Cheeger Cuts. If we consider the following function:


for and , then with is the aforementioned Cut, and it approaches the Ratio Cheeger Cut as and equals the Average/Ratio Cuts for . If , is the Cut, and it approaches the Normalized Cheeger Cut as and equals the Normalized Cut for . (The volume of a subgraph is defined as , where is the degree of vertex .) Note that this is a rescaling of the balanced cut costs in [4], defined so that the maximum value of is always unity.

In addition to providing a way to interpolate between Cheeger Cuts and Normalized/Ratio Cuts for $1 < p \le 2$, (3) enables cut costs with normalizations that are slightly more conservative (closer to unity) than Normalized/Ratio Cuts by selecting $p > 2$. It is straightforward to show this by noting that:


and so for all , , which can be seen in the left side of Figure 4.

The behavior of (3) as suggests that there is room to define even more conservative normalization functions. We do so in this paper by considering the following transformation of :


where . (The reason for switching parametrizations from to will be addressed at the end of Section 3.) Clearly , and so Normalized/Ratio Cuts arise from employing (5) in place of (3) for . More interestingly, however, is that , and so for , defining with or provides a way to interpolate between Normalized/Ratio Cuts and Cuts with no normalization at all. This behavior of increasing as can be seen in the right side of Figure 4.

Of pedagogical interest is that , and so as , employing (5) to define would yield balanced cut costs that diverge whenever the sizes or degrees of and are unequal. Since our interest in this paper is to explore the impact of cut costs having more conservative normalizations than Normalized/Ratio Cuts, we will restrict our attention to the use of for .

Figure 4: Left: the family that enables the balanced cut costs defined by [4], which includes Normalized/Ratio Cuts and Cheeger Cuts but limits the extent of more conservative normalizations. Right: the family that enables the proposed balanced cut costs; for , the proposed costs interpolate from Normalized/Ratio Cuts to Cuts with no normalization at all.

In addition to enabling a greater range of conservative normalizations, the use of (5) in place of (3) yields an added benefit: the resulting Balanced Cut costs can be directly extended to form costs of partitionings into subgraphs. (As discussed in [4], it is not obvious whether such a direct extension to the Bühler-Hein cut costs exists.) If we now consider that we wish to partition into disjoint subgraphs , , then an unbalanced way to measure the cost of the partitioning is the multiway cut cost, defined in terms of pairwise cut costs as:


The family of multiway balanced cut costs that we propose is described generally by the Compassionately Conservative Balanced Cut:


where is a user-defined vector of positive weights. If , then , and we refer to (7) as the Compassionately Conservative Ratio Cut (). If (the degree vector), then , and we refer to (7) as the Compassionately Conservative Normalized Cut (). With some algebraic manipulation, it is straightforward to see that (7) is a generalization of (2); when , (7) reduces to (2) with . Furthermore, (7) generalizes multiclass cut costs presented in the research literature; , , and are scaled versions of the Multiclass Penalized Cut [24], the Multiclass Ratio Cut [20], and the Multiclass Normalized Cut [22, 20], respectively, and so choosing generates Multiclass Balanced Cut costs that interpolate between (6) and any of these previously developed costs.

3 Relaxing the CCB Cut

As with many balanced cut costs, minimizing the CCB Cut is NP-hard. However, we can formulate a continuous relaxation of for all . Unfortunately, the spectral clustering relaxation of and in terms of the second eigenvector of the graph -Laplacian [4] is of no help; it is only tight as (and therefore not for ), and furthermore, CCB Cuts are not even members of the Bühler-Hein family of cuts except for when . Hence, we must identify a different relaxation.

To do so, we first reformulate to express it in terms of an indicator matrix so that if and otherwise. If is the column of , then can be written in terms of the diagonal matrix as , and the pairwise cut cost between and can be written as , where . This allows us to express (7) as:


Minimizing (7) is equivalent to minimizing (8) subject to the constraint that is positive diagonal, which ensures that none of the ’s will collapse to the empty set.

To guide us towards an appropriate relaxation, we first consider how the Multiclass Penalized Cut () can be relaxed. Using an argument similar to Yu and Shi [22], we see that if , then and (7) is equivalent to a scalar multiple of . Hence, the solution to minimizing a relaxed version of is , where is the matrix whose columns are the orthonormal eigenvectors , , corresponding to the smallest nontrivial eigenvalues of , and is an arbitrary orthogonal matrix. The optimal solution to (7) with can then be approximated by -means clustering [15], nonmaximal suppression [22] or Procrustean rounding [24] on .

Note that if we define to be the transpose of the row of , then minimizing the relaxed version of is equivalent to solving the constrained minimization problem:

subject to:

which, when , is identical to the Laplacian Eigenmaps (LE) problem [3] for computing embeddings of data that are assumed to lie on a manifold. The balance constraint is necessary to avoid eigenvectors of corresponding to the trivial eigenvalue.

Turning our attention now to the more general case where , if we define for , and we note that , we can write:


Hence, the relaxation of (8) is obtained by dropping the condition that and solving the constrained minimization problem:

subject to:

Note that the balance constraint is only necessary for the case .

Remarkably, in the special case where and , (11) is exactly the Piecewise Flat Embedding (PFE) problem [23], whose solution is an embedding in which the data are naturally clustered due to the promotion of sparsity in the differences between rows of . It is for this reason that we choose to parametrize CCB Cuts in terms of and not : solutions to the relaxed versions of for can be expected to be piecewise constant, consistent with the sparse nature of solutions to -minimization problems for .

4 IRRQ Minimization Algorithm

While (11) with has a solution that can be written in terms of the eigenvectors corresponding to the smallest nontrivial eigenvalues of , no analytical form of the solution is known for . In [23], Yu et al. proposed approximating the solution to (11) with via the Splitting Orthogonality Constraint (SOC) algorithm [14], which requires an initial estimate of and performs a nested iteration with parameters at each level that must be tuned. While it is possible to partially generalize this approach to handle (11) with , is non-convex for , and so the SOC algorithm is not applicable in this regime. Furthermore, even in the regime in which it is applicable, the use of splitting in the algorithm formulation means that the solutions will not strictly satisfy the orthogonality constraint .

Here, we propose an alternative algorithm for solving (11) that can be applied for all $\tau$ in the range of interest, that does not require an initial estimate of the embedding, and that does strictly satisfy the orthogonality constraint. Our alternative algorithm is motivated by the Iteratively Reweighted Least Squares (IRLS) algorithms commonly used to solve $\ell_\tau$-minimization problems [8]. In IRLS, $\ell_\tau$-minimization is performed by iteratively solving a succession of weighted least-squares ($\ell_2$-minimization) problems, with the weights updated at each iteration so as to decrease the impact of large residual errors. IRLS algorithms do require initialization, but it is the weights that must be initialized as opposed to the solution. Weights are typically initialized to unity (although they can be initialized differently by an expert user), and in specific cases [8], IRLS algorithms have provable convergence guarantees.
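The IRLS idea described above can be illustrated on a deliberately tiny problem. The sketch below is not the authors' algorithm: it minimizes the scalar objective sum_i |x - a_i|^tau, where each weighted least-squares step has a closed form (a weighted mean). The function name, the fixed regularization constant `eps`, and the test data are assumptions made for illustration; in particular, a fixed `eps` replaces the decreasing schedule used in practice.

```python
def irls_location(data, tau=1.0, n_iter=100, eps=1e-8):
    """Minimize sum_i |x - a_i|**tau over a scalar x by IRLS.

    Each iteration solves a weighted least-squares problem (here, a weighted
    mean) and then recomputes the weights from the current residuals.
    """
    weights = [1.0] * len(data)      # the weights, not the solution, are initialized
    x = 0.0
    for _ in range(n_iter):
        # Weighted least-squares step: closed-form weighted mean.
        x = sum(w * a for w, a in zip(weights, data)) / sum(weights)
        # Reweighting step: regularized |r|**(tau - 2), so that w * r**2 ~ |r|**tau.
        weights = [((x - a) ** 2 + eps ** 2) ** (tau / 2.0 - 1.0) for a in data]
    return x


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0, 100.0]
    print("tau = 2 (least squares):", irls_location(samples, tau=2.0))
    print("tau = 1 (robust):       ", irls_location(samples, tau=1.0))
```

With `tau = 2` the weights stay at unity and the result is the ordinary mean; with `tau = 1` the large residual of the outlier is progressively down-weighted and the iterate moves toward the median.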

In our relaxed problem (11), the presence of the orthogonality constraint renders IRLS algorithms invalid. However, solutions to (11) can be approximated by iteratively solving a series of constrained weighted $\ell_2$-minimization problems, each of the form:

subject to:

where is the matrix of weights (with entries ) that is updated at each iteration in a manner similar to IRLS.

To establish a connection between (11) and (12), we first eliminate the balance constraint from (12) using the result of the following Lemma, which is proved in Appendix C:

Lemma 4.1.


and define for , where , is a full rank matrix with , and . Then is a bijection from to .

Using this Lemma allows us to solve (12) by first solving:


and then computing .

An analogy to IRLS suggests that under the assumption that component-wise differences in the embedding do not vanish, the best choice of weights for (12) would be for , where is the solution to (11). This choice of weights would yield . In practice, however, is unknown, and many of the component-wise differences in the embedding will vanish. Hence, we propose using regularized weights as suggested in [8]:


and we update according to the schedule prescribed by [8], which suggests:


where is the largest element of .
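To see why weights of this kind are natural, a short worked identity may help. The notation is assumed here for illustration ($r$ for a scalar residual, $\omega$ for its weight, $\epsilon$ for the regularization parameter, $\tau$ for the index); it is a scalar simplification in the style of [8], not a restatement of (14):

```latex
\omega = |r|^{\tau-2}
\;\Longrightarrow\;
\omega\, r^{2} = |r|^{\tau},
\qquad
\omega_{\epsilon} = \left(r^{2} + \epsilon^{2}\right)^{\frac{\tau}{2}-1}
\;\xrightarrow[\;\epsilon \to 0\;]{}\;
|r|^{\tau-2} \quad (r \neq 0).
```

The unregularized weight reproduces the $\ell_\tau$ penalty exactly but is undefined at $r = 0$ when $\tau < 2$; the regularized weight remains finite there (it equals $\epsilon^{\tau-2}$) and approaches the unregularized one as $\epsilon$ shrinks.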

Combining the steps of solving (12) and updating (14)–(15) into a sequence of iterations yields Algorithm 1 for computing the solution to (11). Since (12) is equivalent to (13), and (13) can be transformed into an unconstrained minimization of a Rayleigh Quotient, we term this algorithm Iteratively Reweighted Rayleigh Quotient (IRRQ) Minimization.

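procedure IRRQ
     initialize all weights to unity
     while not converged do
          (a) solve (12) with the current weights to obtain the embedding
          (b), (c) update the weights and the regularization parameter according to (14)–(15)
     end while
end procedure
Algorithm 1 IRRQ Algorithm for Solving (11)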

Unlike SOC, IRRQ requires tuning of only a single hyperparameter , it can be used for any , and it guarantees a solution in which the orthogonality constraint is strictly enforced. Furthermore, IRRQ does not require a preprocessing step to estimate an initial clustering; rather, only the weights must be initialized, and they can all be initialized to unity. Interestingly, this initialization is equivalent to implicitly using an initial clustering that corresponds to the solution of the relaxed NCut problem. This is because is equivalent to the LE objective function in (9). If different initializations are desired, for instance, by computing an initial embedding using a Gaussian Mixture Model as in [23], these can be incorporated by setting the initial weights to be or .
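The overall loop can be sketched in a simplified setting. The code below is a hedged illustration in the spirit of Algorithm 1, not the authors' implementation: it handles only a one-dimensional embedding (2-way case) with degree-weighted constraints y^T D y = 1 and y^T D 1 = 0, uses a small dense Jacobi eigensolver so that it stays self-contained, and halves the regularization parameter at every iteration instead of following the schedule of [8]. All names (`jacobi_eigh`, `irrq_two_way`, `shrink`) and the toy graph are assumptions.

```python
import math


def jacobi_eigh(matrix, sweeps=60):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, v) with the eigenvectors stored as columns of v."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Small-angle rotation (|theta| <= pi/4) that zeroes a[p][q].
                theta = 0.5 * math.atan2(2.0 * a[p][q] * math.copysign(1.0, diff), abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # update columns p and q of a and v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):   # update rows p and q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def irrq_two_way(weights, tau=1.0, n_iter=20, eps=1.0, shrink=0.5):
    """Simplified 2-way reweighted-Rayleigh-quotient loop on a dense affinity matrix."""
    n = len(weights)
    deg = [sum(row) for row in weights]
    omega = [[1.0] * n for _ in range(n)]            # unit initial weights
    y = [0.0] * n
    for _ in range(n_iter):
        # Laplacian of the reweighted graph (Hadamard product of W and omega),
        # symmetrically normalized by the ORIGINAL degrees.
        rw = [[weights[i][j] * omega[i][j] for j in range(n)] for i in range(n)]
        m = [[((sum(rw[i]) if i == j else 0.0) - rw[i][j]) / math.sqrt(deg[i] * deg[j])
              for j in range(n)] for i in range(n)]
        vals, vecs = jacobi_eigh(m)
        k = sorted(range(n), key=lambda idx: vals[idx])[1]   # smallest nontrivial eigenvalue
        y = [vecs[i][k] / math.sqrt(deg[i]) for i in range(n)]
        # Reweighting with regularized |y_i - y_j|**(tau - 2); assumed halving of eps.
        omega = [[((y[i] - y[j]) ** 2 + eps ** 2) ** (tau / 2.0 - 1.0)
                  for j in range(n)] for i in range(n)]
        eps *= shrink
    return y


if __name__ == "__main__":
    w = [[0.0] * 6 for _ in range(6)]
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
        w[i][j] = w[j][i] = 1.0                      # two triangles joined by a bridge
    print("tau = 2:", [round(t, 4) for t in irrq_two_way(w, tau=2.0)])
    print("tau = 1:", [round(t, 4) for t in irrq_two_way(w, tau=1.0)])
```

On this toy graph, `tau = 2` leaves the weights at unity and returns the usual spectral embedding, whereas `tau = 1` drives the coordinates within each triangle toward a common value, which is the piecewise-flat behavior the relaxation is meant to promote.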

Solving IRRQ Step (a)

Computing in step (a) of the IRRQ minimization algorithm is nontrivial. From the relationship between (12)–(13), we can see that computing is equivalent to solving


and then computing . Problem (16) can be expressed as the Rayleigh quotient minimization:


where is the Laplacian of the graph having weight matrix and denotes Hadamard product. The solution to (4) is given by , where is the matrix whose columns are the orthonormal eigenvectors , , corresponding to the smallest eigenvalues of , and is an arbitrary orthogonal matrix. (Note that by eliminating the balance constraint, we also eliminate the possibility of a trivial eigenvalue of . Such an eigenvalue would have eigenvector for which is in the direction of ; however, this is contradicted by the fact that .)

The ability of this solution to scale to large depends critically on the structure of , and on whether or not must explicitly be constructed. Since we are free to choose any full-rank matrix such that , we make the particular choice , where and is the identity matrix. This particular choice of is sparse, and therefore, as shown in Appendix D, the multiplication of an arbitrary vector by can be performed efficiently without explicitly constructing .

A final note is that the solution to step (a) is not actually unique: can be postmultiplied by and still be a valid solution (recalling that is orthogonal). This is not a problem for steps (b) and (c) of Algorithm 1, because and are invariant to such transformations of . As a consequence, IRRQ minimization could yield an entire family of solutions to the PFE problem. This could be problematic because the -norms/pseudo-norms are not invariant under orthogonal transformations for . In practice, however, we have found that the -norms/pseudo-norms are minimized for the choice and we therefore suggest this choice. Proof that this is the best choice remains an open problem.
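The lack of rotation invariance noted above is easy to check numerically. A minimal sketch, with illustrative names and an arbitrary small matrix (none of it taken from the authors' code): the sum of squared entries is unchanged when a matrix is postmultiplied by a rotation, while the sum of absolute values is not.

```python
import math


def entrywise_measure(rows, tau):
    """Sum of |entry|**tau over all entries of a small matrix given as rows."""
    return sum(abs(value) ** tau for row in rows for value in row)


def rotate_columns(rows, angle):
    """Postmultiply an n-by-2 matrix by a 2-by-2 rotation (an orthogonal matrix)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * a - s * b, s * a + c * b] for a, b in rows]


if __name__ == "__main__":
    embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    rotated = rotate_columns(embedding, math.pi / 4)
    for tau in (2.0, 1.0):
        print("tau =", tau,
              "original:", entrywise_measure(embedding, tau),
              "rotated:", entrywise_measure(rotated, tau))
```

In this example the axis-aligned matrix has the smaller value for `tau = 1`, which is consistent with the observation that a particular orthogonal factor can be preferable even though every rotation satisfies the constraints.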

Choosing for Rapid Convergence

In IRLS algorithms for -minimization, linear convergence can typically be achieved for and super-linear convergence for if is chosen large enough so that if the resulting solution is -sparse, then . (Actually, there are much more sophisticated convergence results in [8], but this is a good rule-of-thumb.) Without attempting to prove convergence results for the IRRQ algorithm, we use a similar strategy in choosing . In practice, the main difficulty in choosing is that is not known exactly until the problem is solved. To approximate , we use an estimate equal to twice the number of graph edges that connect different clusters from a -means clustering performed on the initial embedding . For “scale-free” behavior, we introduce the hyperparameter that can then be mapped to by , where is the total number of edges in the graph.

5 CCB Cuts for Segmentation

Figure 5 (column titles: Original, NCutM, RCutM, NCutH, RCutH): Example BSDS500 test image and segmentation results. First row: Lab affinities; second row: mPb affinities; third row: PMI affinities. *M indicates multiway segmentation; *H indicates hierarchical 2-way segmentation. The same number of clusters is specified for each case. Images appearing to have fewer segments actually have a number of small/singleton segments.
Figure 6: Hierarchical clustering results for NCuts ( Cuts) and Cuts for different numbers of clusters and Lab affinities. Top row: segmentation maps; bottom row: colorized segmentation maps.

To investigate the performance of CCB Cuts for image segmentation, we segment the 200 test images in the BSDS500 dataset [1], each of which has a variety of manually-labeled segmentations with different numbers of segments that can be used as ground truth. We explore three different affinity constructions for each image:

  • Lab: a Gaussian kernel on squared Euclidean distances in color space;

  • mPb: an exponentiated negative maximum of the multiscale probability of boundary (mPb) using the code provided with the BSDS500 dataset [1]; and,

  • PMI: an exponentiated version of a statistic related to pointwise mutual information (PMI) used in crisp boundary detection [12].

We use a -pixel radius to define neighborhoods for affinity construction. For efficiency, we downsample each affinity matrix by two scale levels (and subsequently upsample the computed embeddings) using the strategy in [2, 16].

Figure 5 illustrates segmentation results on a BSDS500 test image for both multi-way and hierarchical 2-way segmentation using various types of balanced cuts. Although different segmentations appear to have different numbers of regions, many of the regions are small/singleton, and all segmentation results have the same number of regions. Figure 6 shows how hierarchical 2-way segmentation using CCN Cuts with does not exhibit the iterative “chopping” behavior common to hierarchical NCuts minimization.

Quantitative Validation

Drawing any conclusions based on visual assessment of segmentation maps can be problematic in the absence of quantitative validation. First, to validate that CCB segmentation yields less balanced partitions than NCut and Cheeger Cut-based segmentation, we show in Figure 7 the “degree spread” for segmentation results from various methods across all test images, as measured by the ratio of the standard deviation of the partition volumes to the mean partition volume.
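The “degree spread” statistic described above can be computed as follows. This is a minimal sketch with assumed names; it takes a dense symmetric affinity matrix and one label per vertex, and it uses the population standard deviation, which is an assumption since the text does not say which variant is used.

```python
import math


def degree_spread(weights, labels):
    """Ratio of the standard deviation of partition volumes to their mean.

    The volume of a partition is the sum of the degrees of its vertices.
    """
    volumes = {}
    for i, row in enumerate(weights):
        volumes[labels[i]] = volumes.get(labels[i], 0.0) + sum(row)
    values = list(volumes.values())
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)  # population variance
    return math.sqrt(variance) / mean


if __name__ == "__main__":
    # Complete graph on four vertices with unit weights: every degree is 3.
    k4 = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
    print(degree_spread(k4, [0, 0, 1, 1]))   # two equal-volume partitions
    print(degree_spread(k4, [0, 1, 1, 1]))   # one singleton partition
```

A perfectly balanced partitioning gives a spread of zero, and the value grows as partitions become more unequal in volume.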

Next, we evaluate segmentation performance with respect to ground truth using the three criteria discussed in [1] and used in [23]: Segmentation covering, Probabilistic Rand Index (PRI), and Variation of Information (VI). Covering and PRI increase and VI decreases as segmentations become closer to the ground truth. We segment each BSDS500 test image multiple times by minimizing CCB Cuts for various values of , once corresponding to each value of (number of segments) reflected in one of the ground-truth segmentations, and once for each of , , , , and , if any of these are not reflected in the ground truth. Each multi-way segmentation algorithm proceeds by solving/approximating the corresponding embedding and then subsequently performing -means clustering on the result. We also perform segmentation using hierarchical 2-way cuts; in this case, clustering is performed at each step by identifying the threshold that best minimizes the balanced cut cost in question. For comparison purposes, we also apply the inverse power method of [10]; this method hierarchically approximates 2-way cuts that minimize the Normalized/Ratio Cheeger costs.
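Of the three criteria, Variation of Information has a particularly compact definition, VI(A, B) = H(A) + H(B) - 2 I(A; B). The sketch below is an illustration with assumed names and is not the BSDS500 benchmark code; it works on flat lists of per-pixel labels and uses natural logarithms, which is an assumption (other implementations may use base 2).

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """VI between two labelings of the same pixels, in nats (lower is better)."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    entropy_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * math.log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in joint.items()
    )
    return entropy_a + entropy_b - 2.0 * mutual


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(variation_of_information(truth, [5, 5, 7, 7]))   # same partition, relabeled
    print(variation_of_information(truth, [0, 1, 0, 1]))   # independent partition
```

The measure depends only on the partitions, not on the label values, so a relabeled copy of the ground truth scores zero.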

Figure 7: Ratio of the standard deviation of partition volumes to the mean partition volume for , NCut and Normalized Cheeger Cut-based segmentation of BSDS images for various numbers of segments.

In CCB Cut minimization (for both multi-way and hierarchical 2-way segmentation), we iterate until the relative change in cost between two iterations falls below , or until a maximum of 50 iterations are performed, whichever occurs first. The hyperparameter is set to for Lab affinities and to for mPb and PMI affinities. Following the strategy of [23], we report results for both a fixed scheme, in which we run the algorithm repeatedly with corresponding to each number of segments in the ground-truth and average the performance results from the multiple runs, and the dynamic scheme, in which we choose the value of from , , , , and that yields the best performance for a particular image. Table 1 shows the performance results for a subset of the cut costs; to save space, we excluded results from smaller values of , which typically performed worse than but better than , and we excluded results from ratio-costs, which are similar to those of normalized-costs.

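| Affinity | Method | Covering (fixed) | Covering (dynamic) | PRI (fixed) | PRI (dynamic) | VI (fixed) | VI (dynamic) |
|---|---|---|---|---|---|---|---|
| Lab | NCutM | 0.48 | 0.65 | 0.70 | 0.83 | 2.17 | 1.50 |
| | | 0.49 | 0.66 | 0.69 | 0.83 | 2.13 | 1.47 |
| | NCutH | 0.51 | 0.67 | 0.78 | 0.87 | 1.98 | 1.43 |
| | | 0.57 | 0.69 | 0.69 | 0.78 | 1.81 | 1.38 |
| | NCheeger | 0.46 | 0.62 | 0.78 | 0.87 | 2.21 | 1.66 |
| mPb | NCutM | 0.29 | 0.48 | 0.75 | 0.84 | 2.86 | 2.10 |
| | | 0.32 | 0.51 | 0.75 | 0.83 | 2.75 | 1.96 |
| | NCutH | 0.30 | 0.48 | 0.83 | 0.89 | 2.61 | 2.00 |
| | | 0.38 | 0.52 | 0.73 | 0.78 | 2.35 | 1.83 |
| | NCheeger | 0.27 | 0.48 | 0.83 | 0.88 | 2.90 | 2.31 |
| PMI | NCutM | 0.34 | 0.53 | 0.76 | 0.85 | 2.62 | 1.86 |
| | | 0.40 | 0.58 | 0.77 | 0.86 | 2.34 | 1.65 |
| | NCutH | 0.37 | 0.54 | 0.87 | 0.90 | 2.37 | 1.75 |
| | | 0.47 | 0.60 | 0.78 | 0.81 | 1.99 | 1.55 |
| | NCheeger | 0.33 | 0.51 | 0.84 | 0.89 | 2.65 | 2.04 |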
Table 1: Comparison of various segmentation methods on BSDS500 test set, averaged across images. For each affinity construction, the best results for each performance measure are highlighted in bold. *M indicates multiway segmentation; *H indicates hierarchical 2-way segmentation.

A few interesting conclusions can be drawn. First, the use of simple Lab affinities surprisingly yields superior results compared to the more complicated mPb and PMI affinities. Second, PRI values appear to indicate that greater degrees of balance are better, whereas VI and Covering values seem to indicate that CCN with provides the best results. Third, better results are found when hierarchical 2-way segmentation is performed as opposed to simultaneous multiway segmentation. However, in our MATLAB implementation, multiway segmentation is significantly faster than hierarchical 2-way segmentation, as shown in Table 2. In the future, it would be interesting to compare these results to those of hierarchical 2-way segmentation via minimizing ratios of differences of set functions [11, 5].

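| Cost Function | Quartile I | Median | Quartile III | Max |
|---|---|---|---|---|
| | 2.0 | 2.9 | 4.2 | 14.9 |
| | 13.0 | 23.4 | 41.6 | 258.2 |
| NCheeger | 2.4 | 4.0 | 9.4 | 434.6 |
| | 1.0 | 1.4 | 1.9 | 8.0 |
| | 10.0 | 18.6 | 31.6 | 180.7 |
| RCheeger | 1.4 | 2.6 | 9.5 | 378.1 |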
Table 2: Statistics of computation times (minutes) required for segmentation, excluding affinity construction. Statistics are computed across all values of , all numbers of clusters, all images, and all affinity types.

6 Conclusion

Compassionately Conservative Balanced Cuts enable normalizations ranging from Normalized/Ratio Cuts to unnormalized Cuts, allowing a tradeoff between generating partitions that are too similar in size/degree and avoiding all singletons. They can be directly applied to the $K$-way partitioning problem for any number of partitions $K$, and their minimization can be relaxed into a problem that can be solved by minimizing a succession of Rayleigh quotients. Image segmentation experiments show that minimizing the CCB Cut yields more accurate results than minimizing NCuts/RCuts and generates regions having greater variability in size/degree.

Appendix A Code

Prototype implementations of algorithms in this paper are available at the MATLAB Central File Exchange under File ID #66158.

Appendix B Acknowledgements

NDC thanks Thomas Bühler and Matthias Hein for helpful discussions.

Appendix C Proof of Lemma 4.1

Let . First, note that , which vanishes because annihilates vectors in the direction of . Second, note that . From these two properties, we see that , and so .

Now, define for , where . Then . Noting that is the orthogonal projector onto the subspace orthogonal to , and that this orthogonal projector can be expressed alternatively as , we can write because for . Hence, .

Now, suppose . Then such that . Note that . Since is the orthogonal projector onto the column space of , and this column space is equivalent to the column space of , . Hence, , a contradiction. Hence, , and therefore, is a surjection from to .

Finally, , and so is invertible. Hence, is a bijection from to .

Appendix D Computing

If , it can be verified that . Hence, can be computed without constructing any dense matrix by performing the following steps:

  1. ,

  2. ,

  3. , where .


  1. P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 33(5):898–916, May 2011.
  2. P. Arbeláez, J. Pont-Tuset, J. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping. In Proc. Computer Vision and Pattern Recognition, CVPR, pages 328–335, 2014.
  3. M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
  4. T. Bühler and M. Hein. Spectral clustering based on the graph p-Laplacian. In Proc. International Conference on Machine Learning, ICML, pages 81–88. ACM, 2009.
  5. T. Bühler, S. S. Rangapuram, S. Setzer, and M. Hein. Constrained fractional set programs and their application in local clustering and community detection. In Proc. International Conference on Machine Learning, ICML, pages 624–632, 2013.
  6. J. Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In Proceedings of the Princeton Conference in Honor of Professor S. Bochner, pages 195–199, 1969.
  7. C.-K. Cheng and Y.-C. A. Wei. An improved two-way partitioning algorithm with stable performance. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 10(12):1502–1511, 1991.
  8. I. Daubechies, R. DeVore, M. Fornasier, and C. S. Güntürk. Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics, 63(1):1–38, 2010.
  9. L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 11(9):1074–1085, 1992.
  10. M. Hein and T. Bühler. An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In Advances in Neural Information Processing Systems, NIPS, pages 847–855, 2010.
  11. M. Hein and S. Setzer. Beyond spectral clustering - tight relaxations of balanced graph cuts. In Advances in Neural Information Processing Systems, NIPS, pages 2366–2374, 2011.
  12. P. Isola, D. Zoran, D. Krishnan, and E. H. Adelson. Crisp boundary detection using pointwise mutual information. In Proc. European Conference on Computer Vision, ECCV, pages 799–814. Springer, 2014.
  13. A. K. Jain. Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8):651–666, 2010.
  14. R. Lai and S. Osher. A splitting method for orthogonality constrained problems. Journal of Scientific Computing, 58(2):431–449, 2014.
  15. A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, NIPS, pages 849–856, 2002.
  16. J. Pont-Tuset, P. Arbeláez, J. T. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Trans. Pattern Analysis and Machine Intelligence, 39(1):128–140, 2017.
  17. S. Sarkar and P. Soundararajan. Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(5):504–525, 2000.
  18. J. Shi and J. Malik. Normalized cuts and image segmentation. In Proc. Computer Vision and Pattern Recognition, CVPR, pages 731–737, Jun 1997.
  19. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888–905, Aug 2000.
  20. U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
  21. Z. Wu and R. Leahy. An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(11):1101–1113, Nov 1993.
  22. S. X. Yu and J. Shi. Multiclass spectral clustering. In Proc. International Conference on Computer Vision, ICCV, pages 313–319, 2003.
  23. Y. Yu, C. Fang, and Z. Liao. Piecewise flat embedding for image segmentation. In Proc. International Conference on Computer Vision, ICCV, pages 1368–1376, 2015.
  24. Z. Zhang and M. I. Jordan. Multiway spectral clustering: A margin-based perspective. Statistical Science, 23(3):383–403, 2008.