# List-Decodable Subspace Recovery via Sum-of-Squares

## Abstract

We give the first efficient algorithm for the problem of list-decodable subspace recovery. Our algorithm takes as input samples, an $\alpha$ fraction of which are generated i.i.d. from a Gaussian distribution with a covariance of rank $r$, while the rest are arbitrary, potentially adversarial outliers. It outputs a list of projection matrices guaranteed to contain a projection matrix $\Pi$ that is close in Frobenius norm to $\Pi_*$, the projection matrix onto the range space of the inlier covariance. The sample complexity and running time of the algorithm depend on $\kappa$, the ratio of the largest to smallest non-zero eigenvalues of the inlier covariance.

Our algorithm builds on the recently developed framework for list-decodable learning via the sum-of-squares (SoS) method [KKK19, RY20] with some key technical and conceptual advancements. Our key conceptual contribution involves showing a (SoS “certified”) lower bound on the eigenvalues of covariances of arbitrary small subsamples of an i.i.d. sample of a certifiably anti-concentrated distribution. One of our key technical contributions gives a new method that allows error reduction “within SoS” with only a logarithmic cost in the exponent in the running time (in contrast to polynomial cost in [KKK19, RY20]).

In a concurrent and independent work, Raghavendra and Yau proved related results for list-decodable subspace recovery [RY20].

## 1 Introduction

An influential recent line of work [KLS09, ABL13, DKK16, LRV16, CSV17, KS17a, KS17b, HL17, DKK18, DKS17a, KKM18] has focused on designing algorithms for basic statistical estimation tasks in the presence of adversarial outliers. This has led to a body of work on outlier-robust estimation of basic parameters of distributions such as mean, covariance [DKK16, DKS17b, CDG19, DKK17, DKK18, CDGW19] and moment tensors [KS17b] along with applications to “downstream” learning tasks such as linear and polynomial regression [DKS17c, KKM18, DKK19, PSBR18]. The upshot of this line of work is a detailed understanding of efficient robust estimation when the inliers form a majority of the input data and only a fixed fraction of the data consists of arbitrary adversarial outliers.

In this work, we focus on the harsher list-decodable estimation model where the fraction of inliers $\alpha$ is less than $1/2$, i.e., a majority of the input samples are outliers. First considered in [BBV08] in the context of clustering, this was proposed as a model for untrusted data in a recent influential work of Charikar, Steinhardt and Valiant [CSV17]. Since unique recovery is information-theoretically impossible in this setting, the goal is to recover a small (ideally $O(1/\alpha)$) size list of parameters, one of which is guaranteed to be close to those of the inlier distribution. A recent series of works has resulted in a high-level blueprint based on the sum-of-squares method for list-decodable estimation, yielding algorithms for list-decodable mean estimation [DKS18, KS17a] and linear regression [KKK19, RY20].

We extend this line of work by giving the first efficient algorithm for list-decodable subspace recovery. In this setting, we are given data with an $\alpha$ fraction of inliers generated i.i.d. according to a Gaussian distribution with a (possibly low-rank, say rank $r$) covariance matrix, the rest being arbitrary outliers. We give an algorithm that succeeds in returning a list of size $O(1/\alpha)$ that contains a projection matrix close in Frobenius norm to $\Pi_*$, the projector to the range space of the inlier covariance, with an error that depends on $\kappa$, the ratio of the largest to smallest non-zero eigenvalues of the covariance. Our Frobenius norm recovery guarantees are the strongest possible and imply guarantees in other well-studied norms such as the spectral norm or the principal angle distance between subspaces. The precise running time and sample complexity appear in Section 1.1.

Our results work more generally for any distribution that satisfies certifiable anti-concentration and mild concentration properties (concentration of PSD forms). Certifiable anti-concentration was first defined and studied in recent works on list-decodable regression [KKK19, RY20]. The Gaussian distribution and the uniform distribution on the sphere (restricted to a subspace) are natural examples of distributions satisfying this property. We note that Karmalkar et al. [KKK19] proved that anti-concentration of the inlier distribution is necessary for list-decodable regression (and thus also subspace recovery) to be information-theoretically possible.

#### Why List-Decodable Estimation?

List-decodable estimation is a strict generalization of related and well-studied clustering problems (e.g., list-decodable mean estimation generalizes clustering spherical mixture models, and list-decodable regression generalizes mixed linear regression). In our case, list-decodable subspace recovery generalizes the well-studied problem of subspace clustering where, given a mixture of distributions with covariances non-zero in different subspaces, the goal is to recover the underlying subspaces [AGGR98, CFZ99, GNC99, PJAM02, AY00]. Algorithms in this model thus naturally yield robust algorithms for the related clustering formulations. In contrast to known results, such algorithms allow “partial recovery” (e.g., recovering only some of the clusters) even in the presence of outliers that garble up one or more clusters completely.

Another important implication of list-decodable estimation is algorithms for unique recovery that work all the way down to the information-theoretic threshold (i.e., fraction of inliers $\alpha > 1/2$). Thus, specifically in our case, we obtain an algorithm for (uniquely) estimating the subspace spanned by the inlier distribution whenever the fraction of inliers satisfies $\alpha > 1/2$, the information-theoretically minimum possible value. We note that such a result will follow from outlier-robust covariance estimation algorithms [DKK16, LRV16, CDGW19] whenever $\alpha$ is sufficiently close to $1$. While prior works do not specify precise constants, all known works appear to require a substantially larger fraction of inliers.

### 1.1 Our Results

We are ready to formally state our results. Our results apply to input samples generated according to the following model:

###### Model 1 (Robust Subspace Recovery with Large Outliers).

For and , let , be a rank PSD matrix and let be a distribution on with mean and covariance . Let denote the following probabilistic process to generate samples, with inliers and outliers :

1. Construct by choosing i.i.d. samples from .

2. Construct by choosing the remaining points arbitrarily and potentially adversarially w.r.t. the inliers.
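To make the generative process concrete, here is a minimal sketch in Python (NumPy) of one way to produce such a sample. The function name `sample_model`, the choice of eigenvalues, and the use of isotropic noise as outliers are all assumptions made for illustration; the model itself allows the outliers to be chosen adversarially.

```python
import numpy as np

def sample_model(n, d, r, alpha, rng):
    """Draw n points in R^d: alpha*n Gaussian inliers with a rank-r covariance,
    and the rest outliers (here isotropic noise, one arbitrary choice)."""
    n_in = int(round(alpha * n))
    # Random orthonormal basis U (d x r) of the inlier subspace.
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))
    # Non-zero eigenvalues of the covariance (illustrative values only).
    eigs = np.linspace(1.0, 2.0, r)
    # Inliers have covariance U diag(eigs) U^T and mean zero.
    inliers = (rng.standard_normal((n_in, r)) * np.sqrt(eigs)) @ U.T
    # The model permits adversarial outliers; isotropic noise is only a stand-in.
    outliers = rng.standard_normal((n - n_in, d))
    X = np.vstack([inliers, outliers])
    is_inlier = np.arange(n) < n_in
    perm = rng.permutation(n)  # hide which points are inliers
    return X[perm], is_inlier[perm], U @ U.T

rng = np.random.default_rng(0)
X, is_inlier, Pi_star = sample_model(n=200, d=10, r=3, alpha=0.2, rng=rng)
```

Here `Pi_star` is the projector onto the range space of the inlier covariance, so every inlier row `x` satisfies `Pi_star @ x == x` up to floating-point error.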

###### Remark 2.

We will mainly focus on the case when . The case of non-zero can be easily reduced to the case of by modifying samples by randomly pairing them up and subtracting off samples in each pair (this changes the fraction of inliers from to ).
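The pairing reduction in this remark can be sketched as follows in Python (NumPy). The helper name `center_by_pairing` is hypothetical. The difference of two independent draws from the same distribution has mean zero and twice the covariance, so the range space of the covariance is unchanged; a difference is an inlier only when both members of its pair are inliers.

```python
import numpy as np

def center_by_pairing(X, rng):
    """Randomly pair up the rows of X and return the difference within each pair."""
    n = X.shape[0] - (X.shape[0] % 2)       # drop one sample if the count is odd
    perm = rng.permutation(X.shape[0])[:n]
    a, b = perm[: n // 2], perm[n // 2:]
    return X[a] - X[b], a, b

rng = np.random.default_rng(1)
mu = np.array([5.0, -3.0, 2.0])             # a non-zero mean, for illustration
X = mu + rng.standard_normal((10000, 3))    # covariance is the identity
Y, a, b = center_by_pairing(X, rng)         # Y has mean ~0 and covariance ~2I
```

Returning the index arrays `a` and `b` lets a caller track which differences come from two inliers when ground-truth labels are available in a simulation.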

###### Remark 3.

Our results naturally extend to the harsher strong contamination model (where one first chooses an i.i.d. sample from and then corrupts an arbitrary fraction of them) with no change in the algorithm.

An -approximate list-decodable subspace recovery algorithm takes input a sample drawn according to and outputs a list of absolute constant (depending only on ) such that there exists a satisfying , where is the projector to the range space of .

Before stating our results, we observe that since list-decodable subspace recovery strictly generalizes list-decodable regression (by viewing samples as dimensional points with a rank covariance), we can import the result of Karmalkar, Klivans and Kothari [KKK19] that shows the information-theoretic necessity of anti-concentration of the distribution .

###### Fact 4 (Theorem 6.1, Page 19 in [KKK19]).

There exists a distribution that is -anti-concentrated for every but there is no algorithm for -approximate list-decodable subspace recovery for that outputs a list of size .

The distribution is simply the uniform distribution on an affine subcube of dimension of (and more generally, -ary discrete cube).

Our first main result shows that given any arbitrarily small , we can recover a polynomial (in the rank ) size list of subspaces that contains a satisfying . The surprising aspect of this result is that we can get an error that can be made arbitrarily small (independent of the rank or the dimension ) at the cost of increasing the list size from a fixed constant to polynomially large in the rank of . This result crucially relies on our new exponential error reduction method (see Lemma 7).

###### Theorem 5 (Large-List Subspace Recovery).

Let be such that has rank and condition number , and is -certifiably -anti-concentrated. For any , there exists an algorithm that takes input samples from and outputs a list of size of projection matrices such that with probability at least over the draw of the sample and the randomness of the algorithm, there is a satisfying . The algorithm has time complexity at most .

We use a new pruning procedure to get the optimal list size of at the cost of increasing the Frobenius error to .

###### Theorem 6 (List-Decodable Subspace Recovery).

Let be such that has rank and condition number , and is . Then, there exists an algorithm that takes as input samples from and outputs a list of projection matrices such that with probability at least over the draw of the sample and the randomness of the algorithm, there is a satisfying . The algorithm has time complexity at most .

As discussed above, our results immediately extend, by means of a simple reduction, to the case when the mean of the inlier distribution is non-zero.

###### Corollary 7 (Large-List Affine Recovery).

Let be such that has rank and condition number , and is . Then, there exists an algorithm that takes as input samples from and outputs a list of projection matrices such that with probability at least over the draw of the sample and the randomness of the algorithm, there is a satisfying . The algorithm has time complexity at most .

### 1.2 Related Work

#### Subspace Clustering.

Prior work on subspace recovery focused on the closely related problem of subspace clustering in high dimension, where the goal is to partition a set of points into clusters according to their underlying subspaces. Subspace clustering methods have found numerous applications in tasks such as image compression [HWHM06], motion segmentation [CK98], data mining [PHL04], disease classification [MM14], and recommendation systems [ZFIM12]. Algorithms for subspace clustering include iterative methods, algebraic and statistical methods and spectral techniques. We refer the reader to the surveys [EV13, PHL04] for a comprehensive overview. Elhamifar and Vidal [EV13] also introduced sparse subspace clustering, building on the compressed sensing and matrix completion literature. Soltanolkotabi et al. [SEC14] extend sparse subspace clustering to work in the presence of noise and provide rigorous algorithmic guarantees. They assume the outliers contribute a small fraction of the input and are distributed uniformly on the unit sphere.

#### Robust Subspace Recovery.

A recent line of work on robust subspace recovery has focused on projection pursuit techniques, $\ell_1$-PCA (robust PCA), exhaustive subspace search and robust covariance estimation. Here, the goal is to recover a set of inliers that span a single low-dimensional space. Projection pursuit algorithms iteratively find directions that maximize a scale function. The scale function often accounts for outliers and thus may be non-convex. McCoy and Tropp [MT11] consider one such function and develop a rounding which approximates the global optimizer. The $\ell_1$-PCA or Robust PCA objective replaces the Frobenius norm objective with a sum of absolute values objective, since it is less sensitive to outliers. While this formulation is non-convex and NP-hard in general, many special cases are tractable, as discussed in [VN18]. Hardt and Moitra [HM13] provide a worst-case exhaustive search algorithm, where both the inliers and outliers are required to be in general position and the inliers are generated deterministically. For a more comprehensive treatment of robust subspace recovery we refer the reader to [LM18].

In a concurrent and independent work, Raghavendra and Yau proved related results for list-decodable subspace recovery [RY20].

## 2 Technical Overview

In this section, we give a high-level overview of our algorithm and the new ideas that go into making it work. At a high level, our algorithm generalizes the framework for list-decodable estimation used to obtain an efficient algorithm for list-decodable regression in the recent work of [KKK19].

In the list-decodable subspace recovery problem, our input is a collection of $n$ samples $x_1, x_2, \ldots, x_n$, an $\alpha$ fraction of which are drawn i.i.d. from a distribution with mean $0$ and unknown covariance of rank $r$. For the purpose of this overview, we will think of the covariance itself as being a projection matrix $\Pi_*$. Our algorithm starts from a polynomial feasibility program that simply tries to find a subset of the sample that contains at least $\alpha n$ points such that all of these points lie in a subspace of dimension $r$. We can encode these two requirements as the following system of polynomial constraints:

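$$
\mathcal{A}_{w,\Pi}\colon
\left\{
\begin{aligned}
&& \textstyle\sum_{i \in [n]} w_i &= \alpha n \\
&\forall i \in [n]. & w_i (I - \Pi) x_i &= 0 \\
&\forall i \in [n]. & w_i^2 &= w_i \\
&& \Pi^2 &= \Pi \\
&& \mathrm{Tr}(\Pi) &= r
\end{aligned}
\right\}
\tag{2.1}
$$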

In this system of constraints, $w_1, \ldots, w_n$ are indicators (due to the constraint $w_i^2 = w_i$) of the subset of the sample we pick. Since $\sum_{i \in [n]} w_i = \alpha n$, the constraints force $w$ to indicate a subset of the sample of size $\alpha n$. To force that all the points indicated by $w$ lie in a subspace of dimension $r$, we define a variable $\Pi$ intended to be the projector to this unknown subspace. The constraint $\Pi^2 = \Pi$ forces $\Pi$ to be a projection matrix and $\mathrm{Tr}(\Pi) = r$ forces its rank to be $r$. Given these constraints, it’s easy to verify that the constraint $w_i (I - \Pi) x_i = 0$ forces $x_i$ to be in the subspace projected to by $\Pi$ whenever $w_i = 1$.
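As a sanity check on how the constraints in (2.1) interact, the following Python (NumPy) sketch tests whether a concrete pair $(w, \Pi)$ is a feasible solution for a given sample. The function name `satisfies_constraints` and the tolerance parameter are assumptions introduced for illustration; the sketch only verifies a candidate and does not solve the program.

```python
import numpy as np

def satisfies_constraints(w, Pi, X, alpha, r, tol=1e-8):
    """Check the five constraints of the system (2.1) for a concrete (w, Pi)."""
    n = X.shape[0]
    checks = [
        np.all(np.abs(w * w - w) <= tol),        # w_i^2 = w_i (w is 0/1)
        abs(w.sum() - alpha * n) <= tol,          # sum_i w_i = alpha * n
        np.allclose(Pi @ Pi, Pi, atol=tol),       # Pi^2 = Pi
        abs(np.trace(Pi) - r) <= tol,             # Tr(Pi) = r
        # w_i (I - Pi) x_i = 0 for every i; rows of X are the samples x_i
        np.allclose(w[:, None] * (X - X @ Pi.T), 0.0, atol=tol),
    ]
    return all(bool(c) for c in checks)

# Five points in R^4; the first two lie in span(e1, e2).
X = np.array([[1, 2, 0, 0], [3, -1, 0, 0], [0, 0, 1, 0],
              [1, 1, 1, 1], [0, 2, 0, 5]], dtype=float)
Pi = np.diag([1.0, 1.0, 0.0, 0.0])
w_good = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
w_bad = np.array([1.0, 0.0, 1.0, 0.0, 0.0])   # selects a point outside the subspace
ok_good = satisfies_constraints(w_good, Pi, X, alpha=0.4, r=2)
ok_bad = satisfies_constraints(w_bad, Pi, X, alpha=0.4, r=2)
```

With these inputs `w_good` is feasible while `w_bad` violates the constraint $w_i (I - \Pi) x_i = 0$ at the third point.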

### 2.1 Designing an Inefficient Algorithm

A feasible solution to the aforementioned constraint system (ignoring, for now, the issue of efficiency) results in a subset of samples that span a subspace of dimension $r$. However, there can be multiple $r$-dimensional subspaces that satisfy this requirement for various subsets chosen entirely out of the outliers. Thus, even if we were to find a solution to this program, it’s not clear how to recover a subspace close to the one spanned by the inliers.

#### High-Entropy Distributions.

In order to force our solution to (2.1) to give us information about the true inliers, it seems beneficial to try to find not one but multiple solution pairs $(w, \Pi)$ so that at least one of the $w$ indicates a subset that has a substantial intersection with the true inliers. An important conceptual insight (see the Overview section in [KKK19] for a longer discussion) is to thus ask for a probability distribution (which, at this point, can be thought of as a method to ask for multiple solutions) over solutions $(w, \Pi)$ satisfying (2.1). It turns out that we can make sure that there are solutions $(w, \Pi)$ in the support of the distribution where $w$ indicates a subset with a non-trivial intersection with the inliers by finding a distribution so that $\|\mathbb{E}[w]\|_2$ is minimized. This constraint serves as a proxy for high entropy distributions. Formally, we can conclude the following useful result that shows that the expected (over the distribution) intersection of a subset indicated by $w$ and the inliers is at least an $\alpha$ fraction of the inliers.

###### Proposition 1.

Let be a distribution on satisfying . Then, .

This result follows by a simple “weight-shifting” argument: if the distribution is over $w$ that do not intersect enough with the inliers, we can shift probability mass onto the inliers and decrease the objective.

#### Anti-Concentration.

Our distribution over $(w, \Pi)$ is guaranteed to contain a $w$ that indicates a subset containing at least an $\alpha$ fraction of the inliers. Our hopes of finding information about the true subspace are pinned on such “good” $w$ at this point. We would like that for such $w$, $\Pi$ matches the ground truth subspace projected to by $\Pi_*$. Why should this be true? Consider the “intersection indices”, i.e., the set of indices of inlier samples for which $w_i = 1$. Since we have no control over this set, it could, a priori, consist of inliers that span only a proper subspace of the ground truth subspace. In this case, $\Pi$ may not equal $\Pi_*$.

The key observation is that in this “bad” case, there is a vector that is in the orthogonal complement of inside such that for every . That is, there’s a direction that inliers have a zero projection in fraction of the times. Such an eventuality is ruled out if we force , the distribution of the inliers to be anti-concentrated.

###### Definition 2 (Anti-Concentration).

A -valued random variable with mean and covariance is -anti-concentrated if for all satisfying , . A set is -anti-concentrated if the uniform distribution on is -anti-concentrated.

The following proposition is now a simple corollary:

###### Proposition 3 (High Intersection Implies Same Subspace (TV Distance to Parameter Distance)).

Let be a sample of size from for a projection matrix of rank such that the inliers are -anti-concentrated. Let be a subset of size such that for every for some projection matrix of rank . Suppose . Then, .

###### Proof.

Let for an orthonormal set of vectors s. Since for every , for every . Thus, . Since is -anti-concentrated, this must mean that .

Thus, . Or . On the other hand, by Cauchy-Schwarz inequality, with equality iff . Here, we used the facts that and that . Thus, . ∎
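The phenomenon behind Proposition 3 can be observed numerically. The following Python (NumPy) sketch assumes Gaussian inliers supported on a random rank-$r$ subspace (an illustrative choice, as are all variable names): because such inliers do not concentrate on any proper subspace of their range space, the projector onto the span of even a small subset of them already coincides with $\Pi_*$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, n_in = 8, 3, 50

# Ground-truth projector Pi_star onto a random r-dimensional subspace.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
Pi_star = U @ U.T
inliers = rng.standard_normal((n_in, r)) @ U.T   # Gaussian inside the subspace

# Any subset of inliers (here 10 of them) stands in for the "intersection".
S = rng.choice(n_in, size=10, replace=False)

# Projector onto the span of the selected points, via the SVD.
_, s, Vt = np.linalg.svd(inliers[S], full_matrices=False)
rank = int(np.sum(s > 1e-10 * s[0]))             # numerical rank of the subset
V = Vt[:rank].T
Pi = V @ V.T

frob_sq = np.linalg.norm(Pi - Pi_star, "fro") ** 2
```

In this simulation the subset has numerical rank $r$ and `frob_sq` is zero up to floating-point error. A distribution that placed a large fraction of its mass on a proper subspace could break this, which is the case anti-concentration rules out.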

#### Inefficient Algorithm for Anti-Concentrated Distributions.

We can use the lemma above to give an inefficient algorithm for list-decodable subspace recovery.

###### Lemma 4 (Identifiability for Anti-Concentrated inliers).

Let be a sample drawn according to such that the inliers are -anti-concentrated for . Then, there is an (inefficient) randomized algorithm that finds a list of projectors of rank of size such that with probability at least .

###### Proof.

Let be any maximally uniform distribution over soluble subset-projection pairs where indicates a set of size at least . For , let be i.i.d. samples from . Output . To finish the proof, we will show that there is an such that . We can then apply Proposition 3 to conclude that .

By Proposition 1, . Thus, by averaging, . Thus, the probability that at least one of satisfy is at least . ∎

### 2.2 Efficient Algorithm

Our key technical contributions are in making the above inefficient algorithm into an efficient algorithm using the sum-of-squares method. As in prior works, it is natural at this point to consider the algorithm that finds a pseudo-distribution minimizing and satisfying . This is indeed our starting point.

A precise discussion of pseudo-distributions and sum-of-squares proofs appears in Section 3; at this point, we can simply think of pseudo-distributions as objects similar to the distribution that appeared above for all “properties” that have a low-degree sum-of-squares proof. Sum-of-squares proofs are a system of reasoning about polynomial inequalities under polynomial inequality constraints. It turns out that the analog of Proposition 1 can be proven easily even for pseudo-distributions.

In the following we point out three novel technical contributions that go into making the inefficient algorithm discussed above into an efficient one.

#### Unconstrained Formalization of Certifiable Anti-Concentration

The key technical step is to find a sum-of-squares proof of the “high-intersection implies same subspace” property. This is a bit tricky because it relies on the anti-concentration property of which does not have natural formalization as a polynomial inequality. Thankfully, recent works [KKK19, RY20] formalized this property within the SoS system in slightly different ways.

Our proofs are more attuned to the formalization in  [KKK19]. But for technical reasons the precise formulation proposed in  [KKK19] is not directly useful for us. Briefly and somewhat imprecisely put, anti-concentration formalizations posit that there be a low-degree SoS proof (in the variable ) for polynomial inequalities of the form for a univariate polynomial that approximates a Dirac Delta function at . In the prior works, this requirement was formulated in a constrained manner (“ implies ”). For the application to subspace recovery, natural arguments require unconstrained versions of the above inequality (i.e. that hold without the norm bound constraint on ). Definition 1 formulates this condition precisely. One can then modify the constructions of polynomials used in  [KKK19] and show that this notion of anti-concentration holds for natural distribution families such as Gaussians.

#### Spectral Bound on Subsamples

Given our modified formalization of anti-concentration, we give a sum-of-squares proof of the analog of Proposition 3. This statement (see Lemma 5) is a key technical contribution of our work and we expect it will find applications in future works. It can be seen as an SoS version of results that relate the total variation distance between two certifiably anti-concentrated distributions (which is determined by the normalized intersection size) to the Frobenius norm distance between their covariances.

#### Exponential Error Reduction and Large List Rounding

The proof of Lemma 4.5 involves a new technical powering step that allows exponential error reduction. This step allows exponentially reducing the error guarantee of list-decoding at the cost of blowing up the list-size by applying a natural extension of the rounding “by votes” method introduced in [KKK19]. Our powering technique is quite general and will likely find new uses in list-decodable estimation.

#### Pruning Lists

In order to get optimal list size bounds, the last step in our algorithm introduces a “pruning method” on the list obtained by rounding pseudo-distributions. It involves a simple test that uses additional fresh samples, say , and selects a member of the large list such that is a large enough fraction of .

## 3 Preliminaries

Throughout this paper, for a vector , we use to denote the Euclidean norm of . For a matrix , we use to denote the spectral norm of and to denote the Frobenius norm of . For symmetric matrices we use to denote the PSD/Loewner ordering over eigenvalues of . For a , rank- symmetric matrix , we use to denote the Eigenvalue Decomposition, where is a matrix with orthonormal columns and is a diagonal matrix denoting the eigenvalues. We use to denote the Moore-Penrose Pseudoinverse, where inverts the non-zero eigenvalues of . If , we use to denote taking the square-root of the non-zero eigenvalues. We use to denote the Projection matrix corresponding to the column/row span of . Since , the pseudo-inverse of is itself, i.e. .

In the following, we define pseudo-distributions and sum-of-squares proofs. Detailed exposition of the sum-of-squares method and its usage in average-case algorithm design can be found in  [FKP19] and the lecture notes [BS16].

### 3.1 Pseudo-distributions

Let be a tuple of indeterminates and let be the set of polynomials with real coefficients and indeterminates . We say that a polynomial is a sum-of-squares (sos) if there are polynomials such that .

Pseudo-distributions are generalizations of probability distributions. We can represent a discrete (i.e., finitely supported) probability distribution over by its probability mass function such that and . Similarly, we can describe a pseudo-distribution by its mass function by relaxing the constraint to passing certain low-degree non-negativity tests.

Concretely, a level- pseudo-distribution is a finitely-supported function such that and for every polynomial of degree at most . (Here, the summations are over the support of .) A straightforward polynomial-interpolation argument shows that every level--pseudo distribution satisfies and is thus an actual probability distribution. We define the pseudo-expectation of a function on with respect to a pseudo-distribution , denoted , as

$$\tilde{\mathbb{E}}_{D(x)} f(x) = \sum_{x} D(x) f(x). \tag{3.1}$$
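A small numerical example may help build intuition for (3.1). The Python sketch below uses a hypothetical signed mass function on three points of $\mathbb{R}$ (the helper name `pseudo_expectation` and the specific values are illustrative assumptions). The mass function sums to one and its pseudo-expectation is non-negative on squares of degree-1 polynomials, yet it is negative on the square of the degree-2 polynomial $1 - x^2$, so it passes only the lowest-degree non-negativity tests and is not an actual probability distribution.

```python
import numpy as np

def pseudo_expectation(support, mass, f):
    """Return sum_x D(x) f(x) for a finitely supported, possibly signed, mass function D."""
    return sum(m * f(x) for x, m in zip(support, mass))

support = [-1.0, 0.0, 1.0]
mass = [0.6, -0.2, 0.6]        # sums to 1 but has a negative entry

total = pseudo_expectation(support, mass, lambda x: 1.0)      # normalization
m1 = pseudo_expectation(support, mass, lambda x: x)           # first moment
m2 = pseudo_expectation(support, mass, lambda x: x ** 2)      # second moment

# Squares of degree-1 polynomials are non-negative in pseudo-expectation
# exactly when this 2x2 moment matrix is positive semidefinite.
moment_matrix = np.array([[total, m1], [m1, m2]])
min_eig = np.linalg.eigvalsh(moment_matrix).min()

# The square of the degree-2 polynomial 1 - x^2 has negative pseudo-expectation.
neg = pseudo_expectation(support, mass, lambda x: (1 - x ** 2) ** 2)
```

The moment matrix here is diagonal with entries close to 1 and 1.2, while `neg` is close to -0.2, since $1 - x^2$ vanishes at $\pm 1$ and equals $1$ at the point carrying negative mass.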

The degree- moment tensor of a pseudo-distribution is the tensor . In particular, the moment tensor has an entry corresponding to the pseudo-expectation of all monomials of degree at most in . The set of all degree- moment tensors of probability distribution is a convex set. Similarly, the set of all degree- moment tensors of degree pseudo-distributions is also convex. Unlike moments of distributions, there’s an efficient separation oracle for moment tensors of pseudo-distributions.

###### Fact 1 ([Sho87, Par00, Nes00, Las01]).

For any , the following set has a -time weak separation oracle (in the sense of [GLS81]):

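$$\left\{ \tilde{\mathbb{E}}_{D(x)} (1, x_1, x_2, \ldots, x_n)^{\otimes d} \;\middle|\; \text{degree-}d \text{ pseudo-distribution } D \text{ over } \mathbb{R}^n \right\}. \tag{3.2}$$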

This fact, together with the equivalence of weak separation and optimization [GLS81] allows us to efficiently optimize over pseudo-distributions (approximately)—this algorithm is referred to as the sum-of-squares algorithm. The level- sum-of-squares algorithm optimizes over the space of all level- pseudo-distributions that satisfy a given set of polynomial constraints (defined below).

###### Definition 2 (Constrained pseudo-distributions).

Let be a level- pseudo-distribution over . Let be a system of polynomial inequality constraints. We say that satisfies the system of constraints at degree , denoted , if for every and every sum-of-squares polynomial with , .

We write (without specifying the degree) if holds. Furthermore, we say that holds approximately if the above inequalities are satisfied up to an error of , where denotes the Euclidean norm of the coefficients of a polynomial in the monomial basis.

We remark that if is an actual (discrete) probability distribution, then we have if and only if is supported on solutions to the constraints . We say that a system of polynomial constraints is explicitly bounded if it contains a constraint of the form . The following fact is a consequence of Fact 1 and [GLS81],

###### Fact 3 (Efficient Optimization over Pseudo-distributions).

There exists an -time algorithm that, given any explicitly bounded and satisfiable system of polynomial constraints in variables, outputs a level- pseudo-distribution that satisfies approximately.

### 3.2 Sum-of-squares proofs

Let and be multivariate polynomials in . A sum-of-squares proof that the constraints imply the constraint consists of polynomials such that

$$g = \sum_{S \subseteq [m]} p_S \cdot \prod_{i \in S} f_i. \tag{3.3}$$

We say that this proof has degree if for every set , the polynomial has degree at most . If there is a degree SoS proof that implies , we write:

$$\{f_i \geq 0 \mid i \leq r\} \sststile{\ell}{} \{g \geq 0\}. \tag{3.4}$$

For all polynomials and for all functions , , such that each of the coordinates of the outputs are polynomials of the inputs, we have the following inference rules.

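The first one derives new inequalities by addition/multiplication:

$$\frac{\mathcal{A} \sststile{\ell}{} \{f \geq 0, g \geq 0\}}{\mathcal{A} \sststile{\ell}{} \{f + g \geq 0\}}, \qquad \frac{\mathcal{A} \sststile{\ell}{} \{f \geq 0\}, \; \mathcal{A} \sststile{\ell'}{} \{g \geq 0\}}{\mathcal{A} \sststile{\ell + \ell'}{} \{f \cdot g \geq 0\}}, \tag{Addition/Multiplication Rules}$$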

The next one derives new inequalities by transitivity:

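$$\frac{\mathcal{A} \sststile{\ell}{} \mathcal{B}, \; \mathcal{B} \sststile{\ell'}{} \mathcal{C}}{\mathcal{A} \sststile{\ell \cdot \ell'}{} \mathcal{C}}, \tag{Transitivity Rule}$$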

Finally, the last rule derives new inequalities via substitution:

$$\frac{\{F \geq 0\} \sststile{\ell}{} \{G \geq 0\}}{\{F(H) \geq 0\} \sststile{\ell \cdot \deg(H)}{} \{G(H) \geq 0\}}. \tag{Substitution Rule}$$

Low-degree sum-of-squares proofs are sound and complete if we take low-level pseudo-distributions as models. Concretely, sum-of-squares proofs allow us to deduce properties of pseudo-distributions that satisfy some constraints.

###### Fact 4 (Soundness).

If for a level- pseudo-distribution and there exists a sum-of-squares proof , then .

If the pseudo-distribution satisfies only approximately, soundness continues to hold if we require an upper bound on the bit-complexity of the sum-of-squares (number of bits required to write down the proof). In our applications, the bit complexity of all sum of squares proofs will be (assuming that all numbers in the input have bit complexity ). This bound suffices in order to argue about pseudo-distributions that satisfy polynomial constraints approximately.

The following fact shows that every property of low-level pseudo-distributions can be derived by low-degree sum-of-squares proofs.

###### Fact 5 (Completeness).

Suppose and is a collection of polynomial constraints with degree at most , and for some finite .

Let be a polynomial constraint. If every degree- pseudo-distribution that satisfies also satisfies , then for every , there is a sum-of-squares proof .

We will use the following Cauchy-Schwarz inequality for pseudo-distributions:

###### Fact 6 (Cauchy-Schwarz for Pseudo-distributions).

Let be polynomials of degree at most in indeterminate . Then, for any degree d pseudo-distribution , .

###### Fact 7 (Hölder’s Inequality for Pseudo-Distributions).

Let be polynomials of degree at most in indeterminate . Fix . Then, for any degree pseudo-distribution , .

The following fact is a simple corollary of the fundamental theorem of algebra:

###### Fact 8.

For any univariate degree polynomial for all , .

This can be extended to univariate polynomial inequalities over intervals of $\mathbb{R}$.

###### Fact 9 (Fekete and Markov-Lukacs, see [Lau09]).

For any univariate degree polynomial for , .

###### Fact 10.

Let be a matrix. Then,

$$\sststile{2}{v} \left\{ v^\top A v \geq 0 \right\}.$$

#### Reweighting Pseudo-distributions.

The following fact is easy to verify and has been used in several works (see [BKS17] for example).

###### Fact 11 (Reweighting).

Let be a pseudo-distribution of degree satisfying a set of polynomial constraints in variable . Let be a sum-of-squares polynomial of degree such that . Let be the pseudo-distribution defined so that for any polynomial , . Then, is a pseudo-distribution of degree satisfying .

## 4 Algorithm

In this section, we describe an efficient algorithm for list-decodable subspace recovery. Let be the following system of polynomial inequality constraints in indeterminates .

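$$
\mathcal{A}_{w,\Pi}\colon
\left\{
\begin{aligned}
&& \textstyle\sum_{i \in [n]} w_i &= \alpha n \\
&\forall i \in [n]. & w_i (I - \Pi) x_i &= 0 \\
&\forall i \in [n]. & w_i^2 &= w_i \\
&& \Pi^2 &= \Pi \\
&& \mathrm{Tr}(\Pi) &= r
\end{aligned}
\right\}
\tag{4.1}
$$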

Our algorithm finds a pseudo-distribution consistent with $\mathcal{A}_{w,\Pi}$. It then uses the large-list rounding algorithm as a first step to get a polynomial (in $r$) size list that contains a subspace close in Frobenius norm to the range space of the inlier covariance. Finally, we apply a pruning procedure to reduce the large list to a list of optimal size.

###### Algorithm 1.

List-Decodable Subspace Recovery

Given:

Sample of size drawn according to such that the inlier distribution is -certifiably -anti-concentrated, has mean and the condition number of is .

Operation:

1. Let for a large enough constant .

2. Compute a -degree pseudo-distribution satisfying that minimizes .

3. Run Large-List Rounding with (Algorithm 2) to output a sized list .

4. Run pruning (Algorithm 3) on and output the resulting list .

Output:

A list of projection matrices containing a satisfying .

###### Algorithm 2.

Large List Rounding

Given:

A pseudo-distribution of degree satisfying and minimizing such that , for a large constant , accuracy parameter .

Operation:

Repeat times:

1. Let such that . Draw with probability proportional to .

2. Let be the corresponding matrix. Compute the Eigenvalue Decomposition of and let , where are the eigenvectors corresponding to the top- eigenvalues of .

3. Add to the list .

Output:

A list of size containing a Projection matrix satisfying .
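Step 2 of the rounding, projecting onto the top-$r$ eigenspace of a symmetric matrix, can be sketched in Python (NumPy) as follows. The helper name `round_to_projector` is hypothetical, and the matrix `M` below is a synthetic stand-in (a scaled projector plus small symmetric noise), not a matrix extracted from an actual pseudo-distribution.

```python
import numpy as np

def round_to_projector(M, r):
    """Projector onto the span of the eigenvectors of symmetric M with the r largest eigenvalues."""
    M = (M + M.T) / 2.0                    # symmetrize against numerical noise
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    top = eigvecs[:, -r:]                  # eigenvectors of the r largest eigenvalues
    return top @ top.T

rng = np.random.default_rng(3)
d, r = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
Pi_star = U @ U.T

# Synthetic input: a damped, slightly perturbed copy of Pi_star (assumption).
noise = rng.standard_normal((d, d))
M = 0.7 * Pi_star + 0.01 * (noise + noise.T) / 2.0

Pi_hat = round_to_projector(M, r)
err = np.linalg.norm(Pi_hat - Pi_star, "fro")
```

The output is always an exact rank-$r$ orthogonal projector, and when `M` is close to a multiple of `Pi_star` the rounded projector stays close to `Pi_star` in Frobenius norm.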

###### Algorithm 3.

Pruning Lists

Given:

A list of projection matrices, a threshold , fresh samples , drawn according to .

Operation:

For :

1. Compute the subset of matrices such that .

2. If is non-empty, pick an arbitrary matrix from this set and add it to .

Output:

A of size such that there exists a Projection matrix satisfying .
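The pruning loop can be sketched in Python (NumPy) as below. The exact test in step 1 is not recoverable from the text here, so the sketch assumes, following the description of the pruning method in Section 2.2, that a candidate $\Pi$ passes for a fresh sample $x$ when $\|\Pi x\|_2 \geq \tau \|x\|_2$ for a threshold $\tau$. The function name `prune`, this specific test, and the tie-breaking rule are assumptions.

```python
import numpy as np

def prune(candidates, fresh, tau):
    """For each fresh sample, keep one candidate projector that preserves
    at least a tau fraction of the sample's norm (assumed test)."""
    pruned = []
    for x in fresh:
        passing = [P for P in candidates
                   if np.linalg.norm(P @ x) >= tau * np.linalg.norm(x)]
        if passing:
            pruned.append(passing[0])   # "arbitrary" choice: take the first
    return pruned

I4 = np.eye(4)
P1 = I4[:, :2] @ I4[:, :2].T            # projector onto span(e1, e2)
P2 = I4[:, 2:] @ I4[:, 2:].T            # projector onto span(e3, e4)
fresh = np.array([[1.0, 2.0, 0.0, 0.0],     # lies in span(e1, e2)
                  [1.0, 1.0, 1.0, 1.0]])    # not close to either subspace
pruned = prune([P2, P1], fresh, tau=0.9)
```

Each fresh sample contributes at most one projector, so the output has at most as many entries as there are fresh samples, regardless of how long the candidate list is.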

### 4.1 Analysis of Algorithm 1.

The following theorem captures the guarantees we prove on Algorithm 1.

###### Theorem 4 (List-Decodable Subspace Recovery, restated).

Let be such that has rank and condition number , and is . Then, Algorithm 1 takes as input samples from and outputs a list of projection matrices such that with probability at least over the draw of the sample and the randomness of the algorithm, there is a satisfying . Further, Algorithm 1 has time complexity at most .

Our proof of Theorem 4 is based on the following four pieces. The key technical piece is the following consequence of the constraint system in the low-degree SoS proof system.

###### Lemma 5.

Given and any , and an instance of , such that the inlier distribution has mean and is -certifiably -anti-concentrated,

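$$\mathcal{A}_{w,\Pi} \sststile{2k+t}{\Pi, w} \left\{ \left( \frac{1}{|\mathcal{I}|} \sum_{i \in \mathcal{I}} w_i \right)^{t} \|\Pi - \Pi_*\|_F^{k} = \left( \frac{1}{|\mathcal{I}|} \sum_{i \in \mathcal{I}} w_i \right)^{t} 2^{k/2} \, \mathrm{Tr}(M \Pi_* M)^{k/2} \leq (2 r \kappa)^{k/2} \delta^{t} \right\}.$$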

where is the condition number of and is the corresponding rank- Projection matrix.

Next, we show that “high-entropy” pseudo-distributions must place a large enough weight on the inliers. This is similar to the usage of high-entropy pseudo-distributions in  [KKK19].

###### Lemma 6 (Large weight on inliers from high-entropy constraints).

Let pseudo-distribution of degree that satisfies and minimizes . Then, .

The above two lemmas allow us to argue that our large-list rounding algorithm (Algorithm 2) succeeds.

###### Lemma 7 (Large-List Subspace Recovery, Theorem 5 restated).

Let be such that has rank and condition number , and is -certifiably -anti-concentrated. For any , there exists an algorithm that takes input samples from and outputs a list of size of projection matrices such that with probability at least over the draw of the sample and the randomness of the algorithm, there is a satisfying . The algorithm has time complexity at most .

Finally, we show that we can prune the list output by Algorithm 2 to a list of size such that it still contains a Projection matrix close to . Formally,

###### Lemma 8 (Pruning Algorithm).

Let be the list output by Algorithm 2. Given fresh samples from , Algorithm 3 outputs a list of size such that with probability at least , there exists a projection matrix satisfying .

Theorem 4 follows easily by combining the above claims:

###### Proof of Theorem  4.

It follows from Theorem 2 that is -certifiably -anti-concentrated. Since the inliers are drawn from it suffices to set . By Lemma 8, the uniform distribution on is also -certifiably -anti-concentrated if the number of samples is at least . Since the hypothesis of Lemma 7 is now satisfied with and , Algorithm 2 runs in time and outputs a list of size such that with probability at least , it contains a projector satisfying . Recall, is the projector corresponding to .

Since we now have a list satisfying the hypothesis of Lemma 8 and access to fresh samples, we can conclude that Algorithm 3 outputs a list of size which contains a projection matrix satisfying , as desired. The overall running time is dominated by Algorithm 2, which completes the proof. ∎

### 4.2 Analyzing $\mathcal{A}_{w,\Pi}$: Proof of Lemma 5

We first show that covariance of all large enough subsamples of certifiably anti-concentrated samples have lower-bounded eigenvalues. Recall, for a PSD matrix , denotes the Eigenvalue Decomposition and denotes the corresponding rank- Projection matrix.

###### Lemma 9 (Covariance of Subsets of Certifiably Anti-Concentrated Distributions).

Let be -certifiably -anti-concentrated with . Then,

 {w2i=wi∣∀i}\sststile2kw,v{1nn∑i=1∥v∥k−22wi⟨