
Filtrated Spectral Algebraic Subspace Clustering

This work was supported by grant NSF 1447822.

Manolis C. Tsakiris and René Vidal
Center for Imaging Science, Johns Hopkins University
3400 N. Charles Street, Baltimore, MD, 21218, USA
m.tsakiris,rvidal@jhu.edu
Abstract

Algebraic Subspace Clustering (ASC) is a simple and elegant method based on polynomial fitting and differentiation for clustering noiseless data drawn from an arbitrary union of subspaces. In practice, however, ASC is limited to equi-dimensional subspaces because the estimation of the subspace dimension via algebraic methods is sensitive to noise. This paper proposes a new ASC algorithm that can handle noisy data drawn from subspaces of arbitrary dimensions. The key ideas are (1) to construct, at each point, a decreasing sequence of subspaces containing the subspace passing through that point; (2) to use the distances from any other point to each subspace in the sequence to construct a subspace clustering affinity, which is superior to alternative affinities both in theory and in practice. Experiments on the Hopkins 155 dataset demonstrate the superiority of the proposed method with respect to sparse and low rank subspace clustering methods.

1 Introduction

Subspace clustering is the problem of clustering a collection of points drawn approximately from a union of linear subspaces. This is an important problem in pattern recognition with diverse applications from computer vision [22] to genomics [15].

Related Work. Early subspace clustering methods were based on alternating between finding the subspaces given the clustering and vice versa [2, 21, 17], and were very sensitive to initialization. The need for good initialization motivated the development of an algebraic technique called Generalized Principal Component Analysis (GPCA) [26], which solves the problem in closed form. The key idea behind GPCA is that a union of n subspaces can be represented by a collection of polynomials of degree n, with the property that their gradients at a data point give the normals to the subspace passing through that point. This is exploited in [24] and [8] for clustering a known number of subspaces. The recent Abstract Algebraic Subspace Clustering (AASC) method of [19, 20] unifies the ideas of [8, 26] into a provably correct method for decomposing a union of subspaces into its constituent subspaces. However, while in theory GPCA and AASC are applicable to subspaces of arbitrary dimensions, in practice the estimation of the subspaces is sensitive to data corruptions.

The need for methods that can handle high-dimensional data corrupted by noise and outliers motivated the quest for better subspace clustering affinities. State-of-the-art methods, such as Sparse Subspace Clustering (SSC) [4, 5, 6] and Low Rank Subspace Clustering [12, 7, 23, 11], exploit the fact that a point in a union of subspaces can always be expressed as a linear combination of other points in the subspaces. Sparse and low-rank representation techniques are used to compute the coefficients, which are then used to build a subspace clustering affinity. These methods perform very well when the subspace dimensions are much smaller than the dimension of the ambient space, the subspaces are sufficiently separated, and the data are well distributed inside the subspaces [6, 16]. However, these methods fail when the dimensions of the subspaces are large, e.g., for a union of hyperplanes, which is the case where GPCA, henceforth called Algebraic Subspace Clustering (ASC), performs best. In addition, sparse methods produce low inter-class connectivity, but their intra-class connectivity is also low due to sparsity, leading to over-segmentation issues. Conversely, low-rank methods produce high intra-class connectivity (since they are less sparse), but this also leads to high inter-class connectivity. Consequently, there is a strong need for methods that produce high intra-class and low inter-class connectivity.

Paper Contributions. The main contribution of this paper is to propose a new subspace clustering algorithm that can handle noisy data drawn from a union of subspaces of different dimensions. The key idea is to construct for each data point (the reference point) a sequence of projections onto hyperplanes that contain the reference subspace (the subspace associated to the reference point). The norms of the projected data points are used to define their affinity with the reference point. This process leads to an affinity matrix of high intra-class and low cross-class connectivity, upon which spectral clustering is applied. We provide a theorem of correctness of the proposed algorithm in the absence of noise as well as a variation suitable for noisy data. As a secondary contribution, we propose to replace the angle-based affinity proposed in [26] by a superior distance-based affinity. This modification is motivated by the fact that the angle-based affinity is theoretically correct only in the case of hyperplanes, and is not a good affinity for subspaces of varying dimensions. Our experiments demonstrate that the proposed method outperforms other subspace clustering algorithms on the Hopkins 155 motion segmentation database as well as on synthetic experiments for arbitrary-dimensional subspaces of a low-dimensional ambient space.

2 Algebraic Subspace Clustering: A Review

We begin with a brief overview of the ASC theory and algorithms. We refer the reader to [25, 26, 3, 14] for details.

Subspace Clustering Problem. Let X = {x_1, …, x_N} be a set of points that lie in an unknown union of n linear subspaces A = S_1 ∪ ⋯ ∪ S_n of R^D, where S_i is a linear subspace of R^D of dimension d_i < D. The goal of subspace clustering is to find the number of subspaces, a basis for each subspace, and cluster the data points based on their subspace membership, i.e., find the correct decomposition or clustering of X as X = X_1 ∪ ⋯ ∪ X_n, where X_i = X ∩ S_i. To make the subspace clustering problem well-defined, we need to make certain assumptions on the geometry of both the subspaces and the data X. In this work we assume that the underlying union of subspaces is transversal [14], which in particular implies that there are no inclusions between subspaces. Moreover, we assume that each of the given points is associated to a unique subspace, i.e., no point lies in the intersection of two distinct subspaces. This guarantees that the above decomposition of X is in fact a partition, and that it is unique. A final assumption that we need is that the data are rich enough and in general position (see Definition 1).

Unions of Subspaces as Algebraic Varieties. A key idea behind ASC is that a union of n subspaces of R^D is the zero set of a finite collection of homogeneous polynomials of degree n with real coefficients in D indeterminates x = [x_1, …, x_D]^T. Such a zero set is called an algebraic variety [1]. For example, a union of n hyperplanes H_1 ∪ ⋯ ∪ H_n of R^D, where the i-th hyperplane H_i is defined by its normal vector b_i ∈ R^D, is the zero set of

p(x) = (b_1^T x)(b_2^T x) ⋯ (b_n^T x).   (1)

Likewise, the union of a plane of R^3 with normal b_1 and a line of R^3 with normals b_2, b_3 is the zero set of the two polynomials p_1(x) = (b_1^T x)(b_2^T x) and p_2(x) = (b_1^T x)(b_3^T x). Observe that these vanishing polynomials are homogeneous of degree n, where n = 2 is the number of subspaces. Moreover, they are factorizable into linear forms, with each subspace contributing a linear form to the product. Each such linear form is in turn defined by a normal vector to the subspace.

Finding Vanishing Polynomials. Note that the coefficients of the polynomials associated with a union of subspaces can be obtained from sufficiently many samples in general position by solving a linear system of equations.

Definition 1

We say that the data X is in general position if a polynomial of degree n vanishes on X if and only if it vanishes on the underlying union of subspaces A.

For example, if A is a union of two planes in R^3 with normals b_1 and b_2, then we can write p(x) = (b_1^T x)(b_2^T x) as

p(x) = (b_1^T x)(b_2^T x) = c^T ν_2(x),   (2)

where ν_2(x) := [x_1^2, x_1 x_2, x_1 x_3, x_2^2, x_2 x_3, x_3^2]^T and c ∈ R^6 is the vector of coefficients of p. Thus, we can find the vector of coefficients c by solving the linear equations c^T ν_2(x_j) = 0 for j = 1, …, N. More generally, each homogeneous polynomial p of degree n can be written as p(x) = c^T ν_n(x), where ν_n is the Veronese embedding of degree n that maps a point x ∈ R^D to the vector of all distinct monomials of degree n in the entries of x. Consequently, a basis for the set of polynomials of degree n that vanish on X can be found by computing a basis for the right nullspace of the embedded data matrix, i.e., by solving the linear system

[ν_n(x_1)  ν_n(x_2)  ⋯  ν_n(x_N)]^T c = 0.   (3)

However, the polynomials obtained by the above procedure may not factorize into a product of linear forms, because the space of factorizable polynomials is not a linear space; e.g., x_1^2 + x_2^2 does not factor into real linear forms.
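To make this step concrete, the following minimal Python sketch (our own illustration, not the authors' code; the helper names veronese and vanishing_polynomials are ours) builds the degree-n Veronese embedding of the data and reads the coefficients of the vanishing polynomials off the right nullspace of the embedded data matrix, as in (3).

import itertools
import numpy as np

def veronese(X, n):
    """Degree-n Veronese embedding of the columns of X (D x N): one row per monomial."""
    D, N = X.shape
    monomials = list(itertools.combinations_with_replacement(range(D), n))
    V = np.empty((len(monomials), N))
    for i, idx in enumerate(monomials):
        V[i] = np.prod(X[list(idx), :], axis=0)
    return V, monomials

def vanishing_polynomials(X, n, tol=1e-8):
    """Coefficient vectors (columns) of degree-n polynomials vanishing on the columns of X."""
    V, _ = veronese(X, n)
    _, s, Vt = np.linalg.svd(V.T, full_matrices=False)   # rows of V.T are nu_n(x_j)^T
    return Vt[s < tol * s[0]].T                           # right singular vectors with ~zero singular value

# Example: two planes in R^3 with normals e_1 and e_2; the only degree-2
# vanishing polynomial (up to scale) is p(x) = x_1 * x_2.
rng = np.random.default_rng(0)
plane1 = np.vstack([np.zeros((1, 50)), rng.standard_normal((2, 50))])                                # x_1 = 0
plane2 = np.vstack([rng.standard_normal((1, 50)), np.zeros((1, 50)), rng.standard_normal((1, 50))])  # x_2 = 0
coeffs = vanishing_polynomials(np.hstack([plane1, plane2]), 2)
print(coeffs.shape)   # (6, 1): one polynomial in the 6 degree-2 monomials of R^3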

Polynomial Differentiation Algorithm. Even though an elegant solution based on polynomial factorization exists for the case of hyperplanes [25], it has not been generalized to subspaces of different dimensions. However, an alternative solution has been obtained by observing that, given any degree-n polynomial p vanishing on A and a point x of A lying in only one of the subspaces, say S_i, the gradient ∇p(x) of p evaluated at x is orthogonal to the subspace S_i associated with x (see [26] and [14] for a geometric and an algebraic argument, respectively). Consequently, for the purpose of computing normal vectors to the subspaces, it is enough to compute vanishing polynomials of degree n. The set of all such polynomials is a finite-dimensional vector space, and a basis for it can be computed as a basis of the right nullspace of the embedded data matrix [ν_n(x_1) ⋯ ν_n(x_N)]^T, where ν_n is the Veronese embedding of degree n that maps a point x to all distinct monomials of degree n in the entries of x. Having such a basis, it can be shown that the subspace associated to a point x ∈ X can be identified as the orthogonal complement of the span of the gradients at x of all degree-n vanishing polynomials [26, 14]. Then we can remove the points that lie in the same subspace as x and iterate the procedure with the remaining points until all subspaces have been identified. It is remarkable that this procedure is provably correct for a known number of subspaces of arbitrary dimensions. Even though this result is general and insightful, algorithms that are directly based on it are extremely sensitive to noise. The main reason is that any procedure for estimating the dimension of the nullspace will unavoidably involve thresholding the singular values of the embedded data matrix, which in turn yields very unstable estimates of the subspaces and subsequently a poor clustering of the points.
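The gradient of p(x) = c^T ν_n(x) can be evaluated in closed form by applying the product rule to each monomial. The sketch below (ours, continuing the previous sketch and reusing its monomial ordering) returns the normal vector ∇p(x) that the polynomial differentiation step associates with a data point x.

def poly_gradient(c, x, n):
    """Gradient at x of p(x) = c^T nu_n(x), with c indexed like veronese's monomials."""
    D = x.size
    monomials = itertools.combinations_with_replacement(range(D), n)
    g = np.zeros(D)
    for ci, idx in zip(c, monomials):
        for pos in range(n):                          # product rule over the n factors of the monomial
            rest = idx[:pos] + idx[pos + 1:]
            g[idx[pos]] += ci * np.prod(x[list(rest)])
    return g

# Continuing the example above: for p(x) = x_1 * x_2 and a point on the plane x_1 = 0,
# the gradient points along e_1, i.e., along the normal of the plane containing that point.
print(poly_gradient(coeffs[:, 0], np.array([0.0, 1.0, 2.0]), 2))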

Spectral Algebraic Subspace Clustering Algorithm. In the interest of enhancing the robustness of ASC in the presence of noise and obtaining a working algebraic algorithm, the standard practice has been to apply a variation of the polynomial differentiation algorithm based on spectral clustering. More specifically, given noisy data X lying close to a union of n subspaces, one computes a single approximate vanishing polynomial p(x) = c^T ν_n(x), whose coefficient vector c is the right singular vector of the embedded data matrix corresponding to its smallest singular value. Given p, one computes the gradient of p at each point of X (which gives a normal vector associated with each point of X), and builds an affinity between points x_j and x_{j'} as the cosine of the angle between their corresponding normal vectors, i.e.,

C^{ang}_{jj'} = |∇p(x_j)^T ∇p(x_{j'})| / (‖∇p(x_j)‖ ‖∇p(x_{j'})‖).   (4)

This affinity is then used as input to any spectral clustering algorithm to obtain a clustering of X into n groups. We call this Spectral ASC method with angle-based affinity SASC-A. To gain some intuition on (4), suppose that A is a union of n hyperplanes and that there is no noise. Then p must be of the form in (1). In that case, C^{ang}_{jj'} is simply the cosine of the angle between the normals to the hyperplanes associated with points x_j and x_{j'}. If both points lie in the same hyperplane, their normals must be equal up to sign, and hence C^{ang}_{jj'} = 1. Otherwise, C^{ang}_{jj'} is the cosine of the angle between the two distinct hyperplanes. Thus, assuming that these angles are not small, and that the points are well distributed on the union of the hyperplanes, spectral clustering applied to the affinity matrix C^{ang} will in general yield the correct clustering. Even though SASC-A is much more robust in the presence of noise than purely algebraic methods for the case of a union of hyperplanes, it is fundamentally limited by the fact that it applies only to unions of hyperplanes. Indeed, if the orthogonal complement of a subspace S_i has dimension greater than 1, there may be points x_j, x_{j'} inside S_i such that the angle between ∇p(x_j) and ∇p(x_{j'}) is as large as 90°. In such instances, points associated to the same subspace may be weakly connected, and thus there is no guarantee for the success of spectral clustering.
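As an illustration of SASC-A (ours, not the authors' implementation), the affinity in (4) can be assembled from a single approximate vanishing polynomial and its gradients, reusing the veronese and poly_gradient sketches above, and then passed to any spectral clustering routine.

def angle_affinity(X, n):
    """SASC-A style affinity: |cosine| of the angle between the gradients at x_j and x_j'."""
    V, _ = veronese(X, n)
    _, _, Vt = np.linalg.svd(V.T, full_matrices=False)
    c = Vt[-1]                                            # single approximate vanishing polynomial
    G = np.stack([poly_gradient(c, X[:, j], n) for j in range(X.shape[1])], axis=1)
    G = G / (np.linalg.norm(G, axis=0, keepdims=True) + 1e-12)
    return np.abs(G.T @ G)                                # entries in [0, 1]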

Abstract Filtration Scheme. Motivated by the limitation of the polynomial differentiation algorithm to a known number of subspaces, and by the undesired ghost subspaces that appear in the recursive method of [8], an alternative algebraic subspace clustering procedure based on filtrations of subspace arrangements was proposed in [19, 20]. The procedure is abstract in the sense that it receives as input a union A of an unknown number of subspaces of arbitrary dimensions, and it decomposes it into the list of its constituent subspaces. This is done recursively by identifying a single subspace at a time: A is intersected with a hyperplane V_1 whose normal vector is the gradient of a vanishing polynomial of A at a point x ∈ A. Then V_1 contains the subspace S associated to x, and so does the new, smaller union of subspaces A ∩ V_1. Next, A ∩ V_1 is intersected with a hyperplane V_2 of V_1, whose normal is the gradient of a vanishing polynomial of A ∩ V_1 evaluated at x. As before, A ∩ V_1 ∩ V_2 contains S, and the process repeats until no non-zero vanishing polynomial exists, in which case the current intersection is precisely S. By picking a point of A outside S, a new subspace is identified, and so on. This method has very strong theoretical guarantees (for noiseless data) but is fairly abstract in nature. It is the very purpose of the remainder of this paper to adapt the work of [19, 20] into a numerical algorithm and to experimentally demonstrate its merit.

3 Filtrated Spectral ASC

In this section, we propose a new subspace clustering procedure which addresses the robustness of ASC with respect to noise and unknown subspace dimensions, especially in the case of subspaces of varying dimensions.

3.1 A Distance-Based Affinity

Our first contribution is to replace the angle-based affinity in (4) by a distance-based affinity and to show that the new affinity possesses superior theoretical guarantees.

Given unit-norm data points x_1, …, x_N lying close to an unknown union of subspaces, let p(x) = c^T ν_n(x) be an approximate vanishing polynomial whose coefficient vector c is the right singular vector of the embedded data matrix associated with its smallest singular value. We define the distance-based affinity as

C^{dist}_{jj'} = 1 − (1/2) |x_j^T ∇p(x_{j'})| − (1/2) |x_{j'}^T ∇p(x_j)|,   (5)

where the gradient vectors are assumed to be normalized to unit Euclidean norm. We will refer to this Spectral ASC method with the distance-based affinity in (5) as SASC-D. The denomination distance-based comes from the fact that the Euclidean distance from point x_{j'} to the hyperplane H_j defined by the unit normal vector ∇p(x_j)/‖∇p(x_j)‖ is precisely |x_{j'}^T ∇p(x_j)|/‖∇p(x_j)‖. Moreover, H_j contains the subspace passing through x_j. Thus, if x_j and x_{j'} are in the same subspace, then the distance from x_j to H_{j'} is zero and so is the distance from x_{j'} to H_j. This implies that C^{dist}_{jj'} = 1. Of course, it may be the case that C^{dist}_{jj'} = 1 for points x_j and x_{j'} coming from distinct subspaces. For instance, consider a union of two lines in R^3 and choose a plane containing one of the lines. If the plane happens to contain both lines, then C^{dist}_{jj'} = 1 for all pairs of points drawn from the two lines.
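For comparison, here is a corresponding sketch of SASC-D (again ours): it uses the same single polynomial and unit-normalized gradients, but scores each pair by the two point-to-hyperplane distances, symmetrized by averaging as in our reading of (5). The columns of X are assumed to have unit norm.

def distance_affinity(X, n):
    """SASC-D style affinity (columns of X assumed to have unit norm)."""
    V, _ = veronese(X, n)
    _, _, Vt = np.linalg.svd(V.T, full_matrices=False)
    c = Vt[-1]
    G = np.stack([poly_gradient(c, X[:, j], n) for j in range(X.shape[1])], axis=1)
    G = G / (np.linalg.norm(G, axis=0, keepdims=True) + 1e-12)
    Dist = np.abs(X.T @ G)          # Dist[j', j] = distance of x_j' to the hyperplane produced by x_j
    return 1.0 - 0.5 * (Dist + Dist.T)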

Theorem 1

Let x_1, …, x_N be unit-norm points of R^D lying in a union of subspaces A = S_1 ∪ ⋯ ∪ S_n. Let p be a homogeneous polynomial of any degree vanishing on A. Then the distance-based affinity in (5) is such that, if points x_j and x_{j'} lie in the same subspace, then C^{dist}_{jj'} = 1. The converse is not true in general.

3.2 Filtrated ASC

Theorem 1 shows the superiority of the distance-based affinity in (5) over the angle-based affinity in (4), because it ensures that points from the same subspace will be given an affinity of maximal value 1. What still limits the theoretical guarantees of (5) is the fact that points from distinct subspaces may also have a maximal affinity of 1.

In this section, we show that it is possible to further refine (5) by a filtration process, illustrated in Figure 1. Let X be a set of N points of R^D in general position in a transversal union of n subspaces A = S_1 ∪ ⋯ ∪ S_n. Assume that each point lies in only one of the subspaces and is normalized to have unit norm. The key idea behind the filtration process is that, given an arbitrary reference point x ∈ X lying in one of the subspaces, say S_1, we can identify all other data points in the same subspace as x by 1) projecting all data points in X onto S_1 and 2) finding the points in X whose norm after projection remains equal to one.

Figure 1: Commutative diagram of the filtration associated with a reference point x. The arrows denote embeddings.

The fundamental challenge, however, is that we do not know S_1. The filtration process in Figure 1 is designed precisely to perform a sequence of projections which ultimately yields the projection onto S_1 without knowing S_1.

At step 1 of the filtration, choose a polynomial p_1 of degree n vanishing on X, obtained from the nullspace of the embedded data matrix, such that ∇p_1(x) ≠ 0. One can show that such a p_1 always exists. Let b_1 := ∇p_1(x)/‖∇p_1(x)‖ and let V_1 be the hyperplane of R^D defined by the normal vector b_1. If b_1^T x_j ≠ 0, then by Theorem 1 we know that point x_j is not in S_1. Consequently, we can filter the set X to obtain the subset X^{(1)} of points at zero distance from V_1. Geometrically, X^{(1)} is precisely the subset of X that lies inside the hyperplane V_1, i.e., X^{(1)} = X ∩ V_1.

The key observation now is that X^{(1)} is a set of points of R^D drawn from the union of subspaces A ∩ V_1. But A ∩ V_1 is, up to isomorphism, a union of subspaces of R^{D−1}, since it is embedded in the hyperplane V_1 ≅ R^{D−1}. In particular, consider the composite linear transformation π_1 : R^D → V_1 → R^{D−1}, where the first arrow is the orthogonal projection of R^D onto V_1 and the second arrow maps an orthonormal basis of V_1 to the standard basis of R^{D−1}. We can replace the redundant representation X^{(1)} ⊂ R^D by π_1(X^{(1)}) ⊂ R^{D−1}. It is important to note that the norm of every point in X^{(1)} remains unchanged and equal to 1 under the transformation π_1. Note also that π_1(X^{(1)}) may actually be a subset of a union of fewer than n subspaces of R^{D−1}, as it is quite possible that all points of X lying in some subspace were filtered out.

Now, since X is in general position inside A, π_1(X^{(1)}) will be in general position inside π_1(A ∩ V_1), and from this one can deduce that every polynomial vanishing on π_1(X^{(1)}) has gradient orthogonal to π_1(S_1) at π_1(x), and that there is a vanishing polynomial p_2 of degree at most n such that ∇p_2(π_1(x)) ≠ 0. Let V_2 be the hyperplane of R^{D−1} defined by the normal vector b_2 := ∇p_2(π_1(x))/‖∇p_2(π_1(x))‖. Note that V_2 contains all the points of π_1(X^{(1)}) that correspond to S_1. As before, we can filter the set π_1(X^{(1)}) to obtain a new set X^{(2)} := π_1(X^{(1)}) ∩ V_2. Once again, X^{(2)} lies in a union of at most n subspaces of R^{D−1}, which is however embedded in the hyperplane V_2, and thus we can replace X^{(2)} by its image under the composite linear transformation π_2 : R^{D−1} → V_2 → R^{D−2}, in which the first arrow is the orthogonal projection of R^{D−1} onto V_2 and the second arrow maps an orthonormal basis of V_2 to the standard basis of R^{D−2}. Proceeding inductively, this process terminates precisely after c steps, where c := D − d_1 is the codimension of S_1. More specifically, after c steps there is no non-zero polynomial vanishing on the surviving points, and the image of S_1 under π_c ∘ ⋯ ∘ π_1 is all of R^{d_1}. Thus the final set X^{(c)} consists of the images under π_c ∘ ⋯ ∘ π_1 of the points of X that lie in S_1. We note that the norm of these points remains unchanged and equal to 1 under π_c ∘ ⋯ ∘ π_1.
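A single noiseless filtration step can be sketched as follows (our illustration, reusing veronese and poly_gradient from Section 2; passing the original degree n at every step is a simplifying assumption): fit a vanishing polynomial, use its gradient at the reference point as the hyperplane normal, keep the points lying numerically in that hyperplane, and map the hyperplane isometrically onto R^{d−1} so that the norms of the surviving points are preserved.

def filtration_step(Z, xbar, n, delta=1e-9):
    """One filtration step: returns (projected surviving points, projected reference, keep mask)."""
    V, _ = veronese(Z, n)
    _, _, Vt = np.linalg.svd(V.T, full_matrices=False)
    c = Vt[-1]                                            # (approximately) vanishing polynomial on Z
    b = poly_gradient(c, xbar, n)
    b = b / np.linalg.norm(b)                             # unit normal of the hyperplane
    keep = np.abs(b @ Z) <= delta                         # points (approximately) inside the hyperplane
    Q = np.linalg.svd(b[:, None], full_matrices=True)[0][:, 1:]   # orthonormal basis of the hyperplane
    pi = Q.T                                              # isometry from the hyperplane onto R^{d-1}
    return pi @ Z[:, keep], pi @ xbar, keep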

Once the points of X that lie in S_1 have been identified, we can remove them and repeat the process starting with the remaining set X \ (X ∩ S_1), which lies in general position inside S_2 ∪ ⋯ ∪ S_n. This leads to Algorithm 1, which we term Filtrated Algebraic Subspace Clustering (FASC), and which is guaranteed to return the correct clustering:

1:procedure FASC(X = {x_1, …, x_N} ⊂ R^D, n)
2:     W ← X;
3:     for i = 1, …, n do
4:         Z ← W; d ← D;
5:         take any reference point x ∈ Z and set x̄ ← x;
6:         while there exists a non-zero polynomial vanishing on Z do
7:              find such a polynomial p with ∇p(x̄) ≠ 0;
8:              b ← ∇p(x̄)/‖∇p(x̄)‖;
9:              Z ← {z ∈ Z : b^T z = 0};
10:             let π be the orthogonal projection of R^d onto the hyperplane with normal b, followed by an isometry onto R^{d−1};
11:             Z ← π(Z); x̄ ← π(x̄); d ← d − 1;
12:         end while
13:         Y_i ← the points of W whose images survived the filtration;
14:         W ← W \ Y_i;
15:     end for
16:     return {Y_1, …, Y_n};
17:end procedure
Algorithm 1 Filtrated Algebraic Subspace Clustering
Theorem 2

Let x_1, …, x_N be points of R^D lying in a transversal union of n subspaces A = S_1 ∪ ⋯ ∪ S_n, and let X_i := X ∩ S_i. Assuming that the points of X are in general position inside A, Algorithm 1 returns a collection of sets {Y_1, …, Y_n} such that Y_i = X_{σ(i)} for i = 1, …, n, where σ is a permutation on n symbols.

3.3 Filtrated Spectral ASC

Let us now consider the case where the data are corrupted by noise. In this case, Algorithm 1 (FASC) is not applicable, because the noisy embedded data matrix is in general full rank, so that no exactly vanishing polynomial exists. Nonetheless, we will show next that we can still exploit the insights revealed by the theoretical guarantees of FASC to construct a robust algebraic subspace clustering algorithm.

To begin with, note that Algorithm 1 requires a single vanishing polynomial at each step of each filtration. We can use any approximate vanishing polynomial at step s. For example, letting X^{(s)} be the points that have passed through the filtration up to step s, we can let p_s be the polynomial whose coefficients are given by the right singular vector of the embedded data matrix of X^{(s)} corresponding to its smallest singular value. Notice that no thresholding is required to choose such a p_s. This is in sharp contrast to the polynomial differentiation algorithm described in Section 2, which requires thresholding the singular values of the embedded data matrix in order to estimate a basis for the space of vanishing polynomials. Now, for any point x in X^{(s)}, ∇p_s(x) defines a hyperplane that approximately contains the subspace associated to point x. However, we cannot proceed to the next step due to the following problems.

Problem 1

In general, two points lying approximately in the same subspace S will produce different hyperplanes that approximately contain S, with different levels of accuracy. In the noiseless case any point would be equally good. In the presence of noise, though, the choice of the reference point becomes significant. How should the reference point be chosen?

Problem 2

Given a hyperplane produced by a reference point, we need to determine which other points in X lie approximately in the hyperplane and filter out the remaining points. A simple approach is to filter out a point if its distance to the hyperplane is above a threshold δ, or if the relative change in its norm after projection is more than δ. Clearly the choice of δ will affect the performance of the algorithm. How should δ be chosen?

Problem 3

Finally, we also need to determine the number of steps after which to stop the filtration. This is equivalent to determining the codimension of the subspace associated to the reference point of that filtration. In the noiseless case, one stops as soon as the norm of the reference point drops below 1. In the noisy case, because the hyperplanes used to construct the filtration are only approximate, the norm of the reference point could drop at every step of the filtration. Hence a suitable stopping criterion needs to be devised.

Inspired by the SASC-D algorithm, which handles noise by computing a normal vector for each data point and uses these normal vectors to define a distance-based affinity, we propose to address Problem 1 by constructing one filtration per data point x_j, with x_j as the reference point, and using the norms of the projected data points to construct the affinity.

Let π̂_j^{(s)} denote the composition of the first s projections of the filtration with reference point x_j. Recall that at step s, only a subset of the original points will remain, while the others will have been filtered out. We can define an affinity matrix as

C_{jj'} = ‖π̂_j^{(c_j)}(x_{j'})‖ if x_{j'} survives all c_j steps of the filtration with reference point x_j, and C_{jj'} = 0 otherwise,   (6)

where c_j denotes the number of steps taken by the filtration for x_j. This affinity captures the fact that if points x_j and x_{j'} are in the same subspace, then the norm of x_{j'} should not change from step to step of the filtration computed with reference point x_j. Otherwise, if x_j and x_{j'} are in different subspaces, the norm of x_{j'} is expected to be reduced by the time the filtration reaches step c_j, where c_j is the codimension of the reference subspace associated to x_j. In the case of noiseless data, only the points in the correct subspace survive step c_j, and their norms are precisely equal to one. Therefore, C_{jj'} = 1 if points x_j and x_{j'} are in the same subspace, and C_{jj'} = 0 otherwise. In the case of noisy data, the above affinity will not be perfect due to Problems 2 and 3, which we address next.
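Reusing the filtration_step sketch from Section 3.2, one row of the affinity in (6) could be computed as follows (our illustration). The number of steps c_steps is treated as an input here, whereas in practice it is determined by the stopping criteria discussed next.

def filtration_affinity_row(X, j, n, c_steps, delta=1e-9):
    """Row j of the affinity in (6): norms of the points surviving the filtration for x_j."""
    N = X.shape[1]
    Z, xbar, alive = X.copy(), X[:, j].copy(), np.arange(N)
    for _ in range(c_steps):
        Z, xbar, keep = filtration_step(Z, xbar, n, delta)
        alive = alive[keep]                               # indices (into X) of the surviving points
    row = np.zeros(N)
    row[alive] = np.linalg.norm(Z, axis=0)                # filtered-out points keep affinity 0
    return row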

To address Problem 2, let p(x) = c^T ν_n(x) be the approximate vanishing polynomial whose coefficient vector c is the right singular vector of the embedded data matrix corresponding to its smallest singular value. Let

ε := (1/N) Σ_{j=1}^{N} |x_j^T ∇p(x_j)| / ‖∇p(x_j)‖.   (7)

Notice that ε = 0 in the noiseless case. In the presence of noise, ε is the average over all points of the distance of a point from the hyperplane that it produces. Evidently, small levels of noise will correspond to small values of ε. Thus, we propose to set the threshold as δ = γ ε, where γ is a user-defined parameter. To determine γ, we propose to construct multiple filtrations for different values γ_1, …, γ_M of γ. Each value results in a different affinity matrix. Suppose we have defined a stopping criterion to terminate each filtration, so that we can use the affinity matrix at the last step of each filtration (see below for stopping criteria). Given these affinity matrices, we choose the one whose normalized Laplacian has the largest eigengap λ_{n+1} − λ_n, where the eigenvalues are ordered increasingly and n is the number of clusters.
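A sketch of this model-selection step (ours, with hypothetical helper names): compute the normalized Laplacian of each candidate affinity and keep the affinity whose gap between the n-th and (n+1)-th smallest eigenvalues is largest.

def normalized_laplacian(C):
    """Symmetric normalized Laplacian L = I - D^{-1/2} C D^{-1/2} of a symmetric affinity C."""
    d = C.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(C.shape[0]) - d_is[:, None] * C * d_is[None, :]

def pick_affinity_by_eigengap(affinities, n_clusters):
    """Return the candidate affinity with the largest eigengap lambda_{n+1} - lambda_n."""
    best, best_gap = None, -np.inf
    for C in affinities:
        C = 0.5 * (C + C.T)                               # symmetrize, as in the final FSASC step
        w = np.sort(np.linalg.eigvalsh(normalized_laplacian(C)))
        gap = w[n_clusters] - w[n_clusters - 1]
        if gap > best_gap:
            best, best_gap = C, gap
    return best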

To address Problem 3, we stop the filtration at step s if 1) the number of surviving points is less than the dimension of the Veronese-embedded points; 2) the reference point is filtered out at the s-th step; or 3) the number of points that passed through the filtration at step s is less than some integer L. This integer L is the smallest number of points that our algorithm is allowed to consider as a cluster.

Finally, the resulting affinity is symmetrized and used for spectral clustering, as described in Algorithm 2. In Algorithm 2, SpectralClustering(C, n) denotes spectral clustering applied to the affinity C to obtain n clusters, and p denotes the approximate vanishing polynomial whose coefficients are given by the right singular vector of the embedded data matrix corresponding to its smallest singular value.

1:procedure FSASC(X = {x_1, …, x_N} ⊂ R^D, n, γ_1, …, γ_M, L)
2:     if N is smaller than the dimension of the degree-n Veronese embedding then
3:         return ('Not enough points');
4:     else
5:         eigengap ← 0; C* ← 0;
6:         compute the approximate vanishing polynomial p from the embedded data matrix of X;
7:         compute ε as in (7) using p;
8:         for k = 1 : M do
9:              δ ← γ_k ε;
10:             for j = 1 : N do
11:                  C(j, :) ← Filtration(X, x_j, δ, L);
12:             end for
13:             C ← (C + C^T)/2;
14:             if the eigengap λ_{n+1} − λ_n of the normalized Laplacian of C is larger than eigengap then
15:                  eigengap ← λ_{n+1} − λ_n; C* ← C;
16:             end if
17:         end for
18:         clusters ← SpectralClustering(C*, n);
19:         return clusters;
20:     end if
21:end procedure
22:
23:function Filtration(X, x_j, δ, L)
24:     Z ← X; x̄ ← x_j; d ← D; flag ← true;
25:     while flag and the number of points in Z is at least max(L, dimension of the degree-n Veronese embedding of R^d) do
26:         fit an approximate vanishing polynomial p on Z and set b ← ∇p(x̄)/‖∇p(x̄)‖;
27:         if |b^T x̄| > δ then
28:              flag ← false;               ▷ the reference point itself is filtered out: stop
29:         else
30:              keep in Z only the points z with |b^T z| ≤ δ and filter out the rest;
31:              project Z and x̄ onto the hyperplane with normal b, map them isometrically to R^{d−1}, and set d ← d − 1;
32:         end if
33:     end while
34:     return the row a ∈ R^N with a_{j'} = ‖image of x_{j'}‖ if x_{j'} survived every step and a_{j'} = 0 otherwise;
35:end function
Algorithm 2 Filtrated Spectral ASC

4 Experiments

Synthetic Data. We randomly generate three subspaces of varying dimensions in a low-dimensional ambient space. For each choice of subspace dimensions, we randomly generate a fixed number of unit-norm points per subspace and add zero-mean Gaussian noise in the direction orthogonal to the subspace. For each choice of dimensions and noise level, we perform independent subspace clustering experiments using the algebraic methods FSASC, SASC-D and SASC-A, and compare to state-of-the-art methods such as SSC [6], LRR [11, 12], LRSC [23] and LSR (using equation (16) in [13]). We also use the heuristic post-processing of the affinity for LRR (LRR-H) and LSR (LSR-H). The parameters of each method are kept fixed across all experiments. We report average clustering errors, intra-cluster connectivities of the affinity matrices produced by the methods (defined to be the minimum algebraic connectivity among the subgraphs corresponding to each of the three subspaces, where the algebraic connectivity of a graph is the second smallest eigenvalue of its normalized Laplacian) and inter-cluster connectivities. Due to lack of space, we report results for all methods only at selected noise levels.

Table 1 reports the mean clustering errors. Observe that FSASC is the only method that gives zero error for noiseless data for all dimension configurations, thus verifying experimentally its strong theoretical guarantees: no restrictions on the dimensions of the subspaces are required for correctness. As expected, SSC, LRR, LRSC and LSR yield perfect clustering when the subspace dimensions are small relative to the ambient dimension, but their performance degrades significantly as the subspace dimensions grow. Observe also that, although SASC-D is much simpler than FSASC and has similar complexity to SASC-A, its performance is very close to that of FSASC, and much better than that of SASC-A. We attribute this phenomenon to the correctness guarantee of Theorem 1 for SASC-D. As the noise level increases, FSASC remains stable across all dimension configurations, with the best behavior among all compared methods. SASC-D is less robust in the presence of noise, except for the case of hyperplanes, in which it is the best method. This phenomenon is expected, since SASC-D is essentially equivalent to FSASC if the latter is configured to take only one step in each filtration of Figure 1, and one step is precisely the optimal stopping point in every filtration when the subspaces are hyperplanes. In this case, if the data are noisy, the criterion for stopping the FSASC filtrations is determined by the parameter γ and by the level of noise via the quantity ε in (7), which can lead to suboptimal behavior (i.e., more than one step may be taken in the filtration).

Tables 2 and 3 indicate that FSASC yields higher-quality affinity graphs for the purpose of clustering. To see why this is the case, observe that, with the exception of FSASC, we can distinguish two kinds of behavior among the remaining methods. The first kind gives high intra-cluster connectivity at the cost of high inter-cluster connectivity; such methods are SASC-D, SASC-A, LRR, LRSC and LSR. The second kind gives low inter-cluster connectivity at the expense of low intra-cluster connectivity, leading to unstable clustering results in the spectral clustering step; such methods are SSC, LRR-H and LSR-H. This is expected, because these methods use sparse affinities. On the other hand, FSASC circumvents this trade-off by giving high intra-cluster connectivity and low inter-cluster connectivity, thus enhancing the success of the spectral clustering step.

Table 1: Mean clustering error (%) over independent experiments on synthetic data, for subspaces of varying dimensions and varying levels of noise. Methods compared: FSASC, SASC-D, SASC-A, SSC, LRR, LRR-H, LRSC, LSR, LSR-H.
Table 3: Mean inter-cluster connectivity (%) on the synthetic data. Methods compared: FSASC, SASC-D, SASC-A, SSC, LRR, LRR-H, LRSC, LSR, LSR-H.
Table 2: Mean intra-cluster connectivity (%) on the synthetic data. Methods compared: FSASC, SASC-D, SASC-A, SSC, LRR, LRR-H, LRSC, LSR, LSR-H.

Motion Segmentation. We evaluate the different methods on the Hopkins155 motion segmentation data set [18], which contains 155 videos of 2 or 3 moving objects, each one with several dozen to several hundred feature-point trajectories whose dimension is twice the number of video frames. While SSC, LRR, LRSC and LSR can operate directly on the raw data, the algebraic methods require a low ambient dimension, since the dimension of the Veronese embedding grows quickly with the ambient dimension. Hence, for the algebraic methods, we project the raw data onto the subspace spanned by their top principal components, choosing the projection dimension as large as the algebraic methods can afford, and then normalize each point to have unit norm. We apply SSC to i) the raw data (SSC-raw) and ii) the raw points projected onto their top principal components and normalized to unit norm (SSC-proj). For FSASC, LRR, LRSC and LSR we use the same parameters as before, while for SSC the parameters are tuned for this dataset.
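A minimal sketch of this preprocessing (ours): project onto the top principal directions without mean subtraction, since the motion subspaces are treated as linear, and normalize each projected point to unit norm. The projection dimension d_proj is an input here, chosen as described above.

def project_and_normalize(X, d_proj):
    """Project columns of X (ambient_dim x N) onto their top d_proj principal directions
    and scale each projected point to unit norm."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)       # no mean subtraction: linear subspaces
    Y = U[:, :d_proj].T @ X
    return Y / (np.linalg.norm(Y, axis=0, keepdims=True) + 1e-12)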

The clustering errors and the intra-/inter-cluster connectivities are reported in Table 4 and Fig. 2. Notice the clustering errors of about 5% and 37% for SASC-A, which is the classical GPCA algorithm. Notice how replacing the angle-based affinity with the distance-based affinity (SASC-D) already reduces the errors to around 5.5% and 14%. Most dramatically, notice how FSASC further reduces those errors to 0.8% and 2.48%. This clearly demonstrates the advantage of FSASC over classical ASC. Moreover, even though the dimensions of the subspaces (at most 4 for motion segmentation) are low relative to the ambient space dimension - a case that is specifically suited for SSC, LRR, LRSC and LSR - projecting the data to a low-dimensional space, which makes the subspace dimensions comparable to the ambient dimension, is sufficient for FSASC to obtain superior performance relative to the best performing algorithms on Hopkins 155. We believe that this is because, overall, FSASC produces a much higher intra-cluster connectivity, without increasing the inter-cluster connectivity too much.

Table 4: Mean clustering error (%), intra-cluster connectivity (%) and inter-cluster connectivity (%) on the Hopkins155 data, reported separately for 2-motion sequences, 3-motion sequences, and all sequences. Methods compared: FSASC, SASC-D, SASC-A, SSC-raw, SSC-proj, LRR, LRR-H, LRSC, LSR, LSR-H.
Figure 2: Clustering error ratios for 2- and 3-motion sequences in Hopkins155, ordered increasingly for each method; for readability, only the larger errors of each method are shown.

Handwritten Digit Clustering. In this section we consider the problem of clustering images of two digits, one of which is the digit 1 (see Benford's law, e.g., [9]). For each pair of digits we randomly select a fixed number of images from the MNIST database [10] for each digit and compute the clustering errors averaged over independent experiments. SSC, LRR and LSR operate on the raw data, with LRR and LSR using the same parameters as before; FSASC and SSC also use the same parameters as before. For the three algebraic methods we first project the raw data onto their top principal components and then normalize each point to have unit norm. For comparison, we also run SSC on the projected data. Mean errors are reported in Table 5, with SASC-A, LRR and LRSC omitted since they perform poorly (LRR performing worse with the post-processing). We also do not show the numbers for SSC-proj, since they are very close to those of SSC-raw. As in the case of motion segmentation, we observe that FSASC outperforms SASC-D (this time by a large margin), which in turn significantly outperforms SASC-A. This confirms the superiority of FSASC over previous algebraic methods. As before, FSASC is also superior to SSC. The only method that performs better is LSR-H. We note that projecting the 784-dimensional raw images onto a low dimension reduces the angles between the subspaces, thus making the clustering problem harder. As a result, for more than two digits the performance of FSASC degrades significantly, even for larger projection dimensions, since it becomes harder for the method to distinguish the subspaces. To circumvent this issue, a higher projection dimension would be required, which currently cannot be handled by FSASC due to its high computational complexity.

Table 5: Mean clustering error (%) for clustering pairs of digits from the MNIST dataset. Methods compared: FSASC, SASC-D, SSC-raw, LSR, LSR-H.

5 Conclusions

We presented a novel algebraic subspace clustering method based on the geometric idea of filtrations, and we experimentally demonstrated its robustness to noise on synthetic and real data, as well as its superiority to state-of-the-art algorithms in several settings. Overall, the method works very well for subspaces of arbitrary dimensions in a low-dimensional ambient space, and it can handle higher-dimensional data via a projection. The main weakness of the method is its high computational complexity, which comes from the large number of filtrations required, as well as from the exponential cost of fitting polynomials to subspaces. Future research will be concerned with reducing this complexity, as well as with handling outliers and missing entries.

References

  • [1] M. Atiyah and I. MacDonald. Introduction to Commutative Algebra. Westview Press, 1994.
  • [2] P. S. Bradley and O. L. Mangasarian. k-plane clustering. Journal of Global Optimization, 16(1):23–32, 2000.
  • [3] H. Derksen. Hilbert series of subspace arrangements.