List-Decodable Subspace Recovery via Sum-of-Squares
Abstract
We give the first efficient algorithm for the problem of list-decodable subspace recovery. Our algorithm takes input samples, an α fraction of which are generated i.i.d. from a Gaussian distribution on ℝ^d with covariance Σ* of rank r, while the rest are arbitrary, potentially adversarial outliers. It outputs a list of projection matrices guaranteed to contain a projection matrix Π that is close in Frobenius norm to Π*, the projector to the range of Σ*.
Our algorithm builds on the recently developed framework for list-decodable learning via the sum-of-squares (SoS) method [KKK19, RY20] with some key technical and conceptual advancements. Our key conceptual contribution involves showing a (SoS "certified") lower bound on the eigenvalues of covariances of arbitrary small subsamples of an i.i.d. sample of a certifiably anti-concentrated distribution. One of our key technical contributions gives a new method that allows error reduction "within SoS" with only a logarithmic cost in the exponent of the running time (in contrast to the polynomial cost in [KKK19, RY20]).
In concurrent and independent work, Raghavendra and Yau proved related results for list-decodable subspace recovery [RY20].
1 Introduction
An influential recent line of work [KLS09, ABL13, DKK16, LRV16, CSV17, KS17a, KS17b, HL17, DKK18, DKS17a, KKM18] has focused on designing algorithms for basic statistical estimation tasks in the presence of adversarial outliers. This has led to a body of work on outlier-robust estimation of basic parameters of distributions such as the mean, covariance [DKK16, DKS17b, CDG19, DKK17, DKK18, CDGW19] and moment tensors [KS17b], along with applications to "downstream" learning tasks such as linear and polynomial regression [DKS17c, KKM18, DKK19, PSBR18]. The upshot of this line of work is a detailed understanding of efficient robust estimation when the inliers form a majority of the input data but a fixed fraction of the input consists of arbitrary adversarial outliers.
In this work, we focus on the harsher list-decodable estimation model where the fraction of inliers is α < 1/2, i.e., a majority of the input samples are outliers. First considered in [BBV08] in the context of clustering, this was proposed as a model for untrusted data in an influential recent work of Charikar, Steinhardt and Valiant [CSV17]. Since unique recovery is information-theoretically impossible in this setting, the goal is to recover a small (ideally O(1/α)) size list of parameters, one of which is guaranteed to be close to those of the inlier distribution. A recent series of works has resulted in a high-level blueprint based on the sum-of-squares method for list-decodable estimation, yielding algorithms for list-decodable mean estimation [DKS18, KS17a] and linear regression [KKK19, RY20].
We extend this line of work by giving the first efficient algorithm for list-decodable subspace recovery. In this setting, we are given data with an α fraction of inliers generated i.i.d. according to a Gaussian distribution whose covariance has rank r.
Our results work more generally for any distribution that satisfies certifiable anti-concentration and mild concentration properties (concentration of PSD forms). Certifiable anti-concentration was first defined and studied in recent works on list-decodable regression [KKK19, RY20]. The Gaussian distribution and the uniform distribution on the sphere (restricted to a subspace) are natural examples of distributions satisfying this property. We note that Karmalkar et al. [KKK19] proved that anti-concentration of the inlier distribution is necessary for list-decodable regression (and thus also subspace recovery) to be information-theoretically possible.
Why List-Decodable Estimation?
List-decodable estimation is a strict generalization of related and well-studied clustering problems (e.g., list-decodable mean estimation generalizes clustering spherical mixture models, and list-decodable regression generalizes mixed linear regression). In our case, list-decodable subspace recovery generalizes the well-studied problem of subspace clustering, where, given a mixture of distributions with covariances nonzero in different subspaces, the goal is to recover the underlying subspaces [AGGR98, CFZ99, GNC99, PJAM02, AY00]. Algorithms in this model thus naturally yield robust algorithms for the related clustering formulations. In contrast to known results, such algorithms allow "partial recovery" (for example, recovery of fewer clusters) even in the presence of outliers that garble up one or more clusters completely.
Another important implication of list-decodable estimation is algorithms for unique recovery that work all the way down to the information-theoretic threshold (i.e., fraction of inliers α > 1/2). Thus, specifically in our case, we obtain an algorithm for (uniquely) estimating the subspace spanned by the inlier distribution whenever the fraction of inliers satisfies α > 1/2, the information-theoretically minimum possible value. We note that such a result would follow from outlier-robust covariance estimation algorithms [DKK16, LRV16, CDGW19] whenever α is sufficiently close to 1. While prior works do not specify precise constants, all known works appear to require a fraction of inliers well above 1/2.
1.1 Our Results
We are ready to formally state our results. Our results apply to input samples generated according to the following model:
Model 1 (Robust Subspace Recovery with Large Outliers).
For α ∈ (0, 1) and n ∈ ℕ, let μ ∈ ℝ^d, let Σ* be a rank-r PSD matrix, and let D be a distribution on ℝ^d with mean μ and covariance Σ*. Let the following probabilistic process generate n samples, with αn inliers I and (1 − α)n outliers O:

1. Construct I by choosing αn i.i.d. samples from D.

2. Construct O by choosing the remaining (1 − α)n points arbitrarily and potentially adversarially w.r.t. the inliers.
Remark 2.
We will mainly focus on the case when μ = 0. The case of nonzero μ can be easily reduced to the case of μ = 0 by modifying samples: randomly pair them up and subtract off the samples in each pair (this changes the fraction of inliers from α to α²).
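The pairing reduction in the remark above can be sketched in a few lines; a minimal numpy illustration, where the function name and the toy parameters are ours rather than the paper's:

```python
import numpy as np

def pair_and_subtract(samples, rng):
    """Reduce nonzero-mean data to the mean-zero case by randomly pairing
    samples and subtracting within each pair. If both samples in a pair are
    inliers (probability roughly alpha^2 when an alpha fraction are inliers),
    their difference is a mean-zero inlier with covariance 2 * Sigma."""
    n = len(samples) - (len(samples) % 2)   # drop one sample if n is odd
    idx = rng.permutation(n)
    paired = samples[idx].reshape(n // 2, 2, -1)
    return paired[:, 0, :] - paired[:, 1, :]

rng = np.random.default_rng(0)
mu = np.array([5.0, -3.0, 1.0])
X = rng.normal(size=(1000, 3)) + mu         # i.i.d. samples with mean mu
Y = pair_and_subtract(X, rng)
print(np.abs(Y.mean(axis=0)).max())         # near 0: differences are mean-zero
```

Note that the differences have covariance 2Σ rather than Σ, which only rescales the eigenvalues and leaves the range space (and hence the projector to be recovered) unchanged.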
Remark 3.
Our results naturally extend to the harsher strong contamination model (where one first chooses an i.i.d. sample from and then corrupts an arbitrary fraction of them) with no change in the algorithm.
An approximate list-decodable subspace recovery algorithm takes as input a sample drawn according to this model and outputs a list of projection matrices whose size is an absolute constant (depending only on α) such that there exists a Π in the list satisfying ‖Π − Π*‖_F ≤ ε, where Π* is the projector to the range space of Σ*.
Before stating our results, we observe that since list-decodable subspace recovery strictly generalizes list-decodable regression (by viewing regression samples (x, y) as (d+1)-dimensional points with a rank-d covariance), we can import the result of Karmalkar, Klivans and Kothari [KKK19] that shows the information-theoretic necessity of anti-concentration of the inlier distribution D.
Fact 4 (Theorem 6.1, Page 19 in [KKK19]).
There exists a distribution D that is not anti-concentrated, and for which there is no algorithm for approximate list-decodable subspace recovery for D that outputs a small list.
The distribution D is simply the uniform distribution on an affine subcube of the Boolean hypercube (and more generally, of the q-ary discrete cube).
Our first main result shows that given any arbitrarily small ε > 0, we can recover a polynomial (in the rank r) size list of subspaces that contains a Π satisfying ‖Π − Π*‖_F ≤ ε. The surprising aspect of this result is that we can get an error that can be made arbitrarily small (independent of the rank r or the dimension d) at the cost of increasing the list size from a fixed constant to polynomially large in the rank of Σ*. This result crucially relies on our new exponential error reduction method (see Lemma 7).
Theorem 5 (Large-List Subspace Recovery).
Let D be a distribution whose covariance Σ* has rank r and condition number κ, and which is certifiably anti-concentrated. For any ε > 0, there exists an algorithm that takes as input n samples from the model and outputs a list of projection matrices of size polynomial in r such that, with high probability over the draw of the sample and the randomness of the algorithm, there is a Π in the list satisfying ‖Π − Π*‖_F ≤ ε. The algorithm runs in polynomial time for every fixed α and ε.
We use a new pruning procedure to get the optimal list size of O(1/α) at the cost of an increased (but still dimension-independent) Frobenius error.
Theorem 6 (List-Decodable Subspace Recovery).
Let D be a distribution whose covariance Σ* has rank r and condition number κ, and which is certifiably anti-concentrated. Then, there exists an algorithm that takes as input n samples from the model and outputs a list of O(1/α) projection matrices such that, with high probability over the draw of the sample and the randomness of the algorithm, there is a Π in the list close to Π* in Frobenius norm. The algorithm runs in polynomial time for every fixed α.
As discussed above, our results immediately extend, by means of a simple reduction, to the case when the mean μ is nonzero.
Corollary 7 (Large-List Affine Recovery).
Let D be a distribution with mean μ whose covariance Σ* has rank r and condition number κ, and which is certifiably anti-concentrated. Then, there exists an algorithm that takes as input n samples from the model and outputs a list of projection matrices such that, with high probability over the draw of the sample and the randomness of the algorithm, there is a Π in the list close to Π* in Frobenius norm. The algorithm runs in polynomial time for every fixed α.
1.2 Related Work
Subspace Clustering.
Prior work on subspace recovery focused on the closely related problem of subspace clustering in high dimension, where the goal is to partition a set of points into clusters according to their underlying subspaces. Subspace clustering methods have found numerous applications in computer vision tasks such as image compression [HWHM06] and motion segmentation [CK98], as well as in data mining [PHL04], disease classification [MM14], recommendation systems [ZFIM12], etc. Algorithms for subspace clustering include iterative methods, algebraic and statistical methods, and spectral techniques. We refer the reader to the following surveys for a comprehensive overview [EV13, PHL04]. Elhamifar and Vidal [EV13] also introduced sparse subspace clustering, building on the compressed sensing and matrix completion literature. Soltanolkotabi et al. [SEC14] extend sparse subspace clustering to work in the presence of noise and provide rigorous algorithmic guarantees. They assume the outliers constitute a small fraction of the input and are distributed uniformly on the unit sphere.
Robust Subspace Recovery.
A recent line of work on robust subspace recovery has focused on projection pursuit techniques, robust PCA, exhaustive subspace search and robust covariance estimation. Here, the goal is to recover a set of inliers that span a single low-dimensional subspace. Projection pursuit algorithms iteratively find directions that maximize a scale function. The scale function often accounts for outliers and thus may be non-convex. McCoy and Tropp [MT11] consider one such function and develop a rounding which approximates the global optimizer. The ℓ1 or Robust PCA objective replaces the Frobenius norm objective with a sum of absolute values objective, since it is less sensitive to outliers. While this formulation is non-convex and NP-hard in general, many special cases are tractable, as discussed in [VN18]. Hardt and Moitra [HM13] provide a worst-case exhaustive search algorithm, where both the inliers and outliers are required to be in general position and the inliers are generated deterministically. For a more comprehensive treatment of robust subspace recovery we refer the reader to [LM18].
In concurrent and independent work, Raghavendra and Yau proved related results for list-decodable subspace recovery [RY20].
2 Technical Overview
In this section, we give a high-level overview of our algorithm and the new ideas that go into making it work. At a high level, our algorithm generalizes the framework for list-decodable estimation used to obtain an efficient algorithm for list-decodable regression in the recent work [KKK19].
In the list-decodable subspace recovery problem, our input is a collection of n samples x_1, x_2, …, x_n ∈ ℝ^d, an α fraction of which are drawn i.i.d. from a distribution D with mean 0 and unknown covariance Σ* of rank r. For the purpose of this overview, we will think of Σ* itself as being a projection matrix Π*. Our algorithm starts from a polynomial feasibility program that simply tries to find a subset of the sample that contains at least αn points such that all of these points lie in a subspace of dimension r. We can encode these two requirements as the following system of polynomial constraints:
(2.1)  { w_i² = w_i for all i ∈ [n],  Σ_{i∈[n]} w_i = αn,  Π = Πᵀ,  Π² = Π,  tr(Π) = r,  w_i(Πx_i − x_i) = 0 for all i ∈ [n] }
In this system of constraints, the w_i are indicators (due to the constraint w_i² = w_i) of the subset of the sample we pick. Since Σ_i w_i = αn, the constraints force w to indicate a subset of the sample of size αn. To force that all the points indicated by w lie in a subspace of dimension r, we define a matrix-valued variable Π intended to be the projector to this unknown subspace. The constraint Π² = Π = Πᵀ forces Π to be a projection matrix, and tr(Π) = r forces its rank to be r. Given these constraints, it is easy to verify that the constraint w_i(Πx_i − x_i) = 0 forces x_i to be in the subspace projected to by Π whenever w_i = 1.
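As a sanity check on the intent of these constraints, one can verify them numerically for a candidate pair (w, Π). The sketch below is illustrative (the function name, tolerances and toy instance are ours, not part of the paper's algorithm):

```python
import numpy as np

def satisfies_constraints(w, P, X, alpha, r, tol=1e-8):
    """Numerically check the intended constraints: w is a 0/1 indicator of
    an alpha*n subset, P is a rank-r projection matrix, and every indicated
    point is fixed by P (i.e. lies in its range)."""
    n = len(X)
    ok = np.all(np.abs(w * w - w) < tol)        # w_i^2 = w_i
    ok &= abs(w.sum() - alpha * n) < tol         # sum_i w_i = alpha * n
    ok &= np.allclose(P, P.T, atol=tol)          # P symmetric
    ok &= np.allclose(P @ P, P, atol=tol)        # P^2 = P
    ok &= abs(np.trace(P) - r) < tol             # tr(P) = r
    ok &= np.allclose(w[:, None] * (X @ P.T - X), 0, atol=1e-6)
    return bool(ok)

rng = np.random.default_rng(1)
d, r, n, alpha = 6, 2, 10, 0.5
U, _ = np.linalg.qr(rng.normal(size=(d, r)))
P = U @ U.T                                      # rank-r projector
X = np.zeros((n, d))
X[:5] = rng.normal(size=(5, r)) @ U.T            # 5 points in range(P)
X[5:] = rng.normal(size=(5, d))                  # 5 arbitrary outliers
w = np.array([1.0] * 5 + [0.0] * 5)
print(satisfies_constraints(w, P, X, alpha, r))  # True
```

The SoS algorithm of course never enumerates candidate pairs; it relaxes this feasibility system to a pseudo-distribution over its solutions, as described next.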
2.1 Designing an Inefficient Algorithm
A feasible solution to the aforementioned constraint system (ignoring, for now, the issue of efficiency) results in a subset of αn samples that span a subspace of dimension r. However, there can be multiple r-dimensional subspaces that satisfy this requirement for various subsets chosen entirely out of the outliers.
High-Entropy Distributions.
In order to force our solution to (2.1) to give us information about the true inliers, it seems beneficial to try to find not one but multiple solution pairs (w, Π), so that at least one of the w indicates a subset that has a substantial intersection with the true inliers. An important conceptual insight (see the overview section in [KKK19] for a longer discussion) is thus to ask for a probability distribution μ (which, at this point, can be thought of as a method to ask for multiple solutions) over solutions satisfying (2.1). It turns out that we can make sure that there are solutions (w, Π) in the support of μ where w indicates a subset with a non-trivial intersection with the inliers by finding a distribution μ for which ‖E_μ[w]‖₂ is minimized. This constraint serves as a proxy for high-entropy distributions. Formally, we can conclude the following useful result, which shows that the expected (over μ) intersection of a subset indicated by w and the inliers is at least an α fraction of the inliers.
Proposition 1.
Let μ be a distribution on pairs (w, Π) satisfying (2.1) that minimizes ‖E_μ[w]‖₂. Then, E_μ Σ_{i∈I} w_i ≥ α|I|.
This result follows by a simple "weight-shifting" argument (if the distribution μ is over w that do not intersect enough with the inliers, we can shift probability mass onto the inliers and decrease ‖E_μ[w]‖₂).
Anti-Concentration.
Our distribution μ over (w, Π) is guaranteed to contain pairs where the subset indicated by w includes at least an α fraction of the points of I. Our hopes of finding information about the true subspace are pinned on such "good" w at this point. We would like that for such w, Π matches the ground-truth projector Π*. Let T be the "intersection indices", i.e., the set of indices of samples in I for which w_i = 1. Why should Π = Π* hold? Since we have no control over T, it could, a priori, consist of points in I that span only a proper subspace, say S, of the ground-truth subspace. In this case, Π may not equal Π*.
The key observation is that in this "bad" case, there is a vector v in the orthogonal complement of S inside the ground-truth subspace such that ⟨x_i, v⟩ = 0 for every i ∈ T. That is, there is a direction in which the inliers have zero projection an α fraction of the time. Such an eventuality is ruled out if we force D, the distribution of the inliers, to be anti-concentrated.
Definition 2 (Anti-Concentration).
An ℝ^d-valued random variable Y with mean 0 and covariance Σ is δ-anti-concentrated if for all v satisfying vᵀΣv > 0, Pr[⟨Y, v⟩ = 0] < δ. A set S is δ-anti-concentrated if the uniform distribution on S is δ-anti-concentrated.
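For intuition, a standard Gaussian satisfies a strong quantitative form of this property: its projection onto any unit direction is again a standard Gaussian, so the mass it places near any hyperplane scales linearly with the width of the window. A Monte-Carlo sketch (the dimensions and window widths are illustrative):

```python
import numpy as np

# The projection of a standard Gaussian onto any unit direction v is N(0, 1),
# so Pr[|<Y, v>| <= delta] is about delta * sqrt(2/pi), i.e. only O(delta)
# mass lies within distance delta of the hyperplane orthogonal to v.
rng = np.random.default_rng(0)
d, n = 10, 200_000
Y = rng.normal(size=(n, d))            # n samples of a standard Gaussian
v = rng.normal(size=d)
v /= np.linalg.norm(v)                 # a fixed unit direction

for delta in (0.05, 0.1, 0.2):
    frac = np.mean(np.abs(Y @ v) <= delta)
    print(delta, frac)                 # frac stays below delta
```

The certifiable variant used by the algorithm additionally demands a low-degree SoS certificate of this smallness, which is what makes the property usable inside the relaxation.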
The following proposition is now a simple corollary:
Proposition 3 (High Intersection Implies Same Subspace (TV Distance to Parameter Distance)).
Let X be a sample of size n whose inliers I are δ-anti-concentrated, drawn with ground-truth projection matrix Π* of rank r. Let T ⊆ X be a subset of size αn such that Πx = x for every x ∈ T, for some projection matrix Π of rank r. Suppose |T ∩ I| ≥ δ|I|. Then, Π = Π*.
Proof.
Let Π = Σ_{j≤r} u_j u_jᵀ for an orthonormal set of vectors u_j. Since Πx = x for every x ∈ T, we have ⟨x, v⟩ = 0 for every x ∈ T and every v orthogonal to the range of Π. Thus, Pr_{x∼I}[⟨x, v⟩ = 0] ≥ δ for every such v. Since I is δ-anti-concentrated, this must mean that vᵀΠ*v = 0 for every v orthogonal to the range of Π.
Thus, the range of Π* is contained in the range of Π, or ΠΠ* = Π*, and hence ⟨Π, Π*⟩ = tr(ΠΠ*) = tr(Π*) = r. On the other hand, by the Cauchy-Schwarz inequality, ⟨Π, Π*⟩ ≤ ‖Π‖_F ‖Π*‖_F = r, with equality iff Π = Π*. Here, we used the facts that ‖Π‖²_F = tr(Π) = r and ‖Π*‖²_F = tr(Π*) = r. Thus, ‖Π − Π*‖²_F = 2r − 2⟨Π, Π*⟩ = 0. ∎
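The Cauchy-Schwarz step above rests on the identity that for rank-r projectors the squared Frobenius distance equals 2r minus twice the trace inner product; a quick numerical check (the helper name is ours):

```python
import numpy as np

def random_projector(d, r, rng):
    """A random rank-r projection matrix in d dimensions."""
    U, _ = np.linalg.qr(rng.normal(size=(d, r)))
    return U @ U.T

rng = np.random.default_rng(2)
d, r = 8, 3
P = random_projector(d, r, rng)
Q = random_projector(d, r, rng)

# For rank-r projectors, ||P||_F^2 = tr(P) = r, hence
# ||P - Q||_F^2 = 2r - 2<P, Q>, and <P, Q> <= r with equality iff P = Q.
lhs = np.linalg.norm(P - Q, "fro") ** 2
rhs = 2 * r - 2 * np.trace(P @ Q)
print(abs(lhs - rhs))          # ~0: the identity holds
print(np.trace(P @ Q) <= r)    # True: the Cauchy-Schwarz bound
```

This is why bounding the trace inner product ⟨Π, Π*⟩ from below is equivalent to bounding the Frobenius distance from above.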
Inefficient Algorithm for Anti-Concentrated Distributions.
We can use the proposition above to give an inefficient algorithm for list-decodable subspace recovery.
Lemma 4 (Identifiability for Anti-Concentrated Inliers).
Let X be a sample drawn according to the model such that the inliers I are δ-anti-concentrated for some δ < α. Then, there is an (inefficient) randomized algorithm that finds a list L of rank-r projectors of size O(1/α) such that Π* ∈ L with probability close to 1.
Proof.
Let μ be any maximally uniform distribution over feasible subset-projector pairs (w, Π), where w indicates a set of size at least αn. For some k, let (w_1, Π_1), …, (w_k, Π_k) be i.i.d. samples from μ. Output L = {Π_1, …, Π_k}. To finish the proof, we will show that there is an i such that the set T_i indicated by w_i satisfies |T_i ∩ I| ≥ δ|I|. We can then apply Proposition 3 to conclude that Π_i = Π*.
By Proposition 1, E_μ Σ_{i∈I} w_i ≥ α|I|. Thus, by averaging, Pr_μ[Σ_{i∈I} w_i ≥ (α/2)|I|] ≥ α/2. Thus, the probability that at least one of the k draws satisfies this is at least 1 − (1 − α/2)^k. ∎
2.2 Efficient Algorithm
Our key technical contributions lie in turning the above inefficient algorithm into an efficient one using the sum-of-squares method. As in prior works, it is natural at this point to consider the algorithm that finds a pseudo-distribution minimizing ‖Ẽ[w]‖₂ and satisfying the constraints (2.1). This is indeed our starting point.
A precise discussion of pseudo-distributions and sum-of-squares proofs appears in Section 3; at this point, we can simply think of pseudo-distributions as objects similar to the distribution μ that appeared above, with respect to all "properties" that have low-degree sum-of-squares proofs. Sum-of-squares proofs are a system of reasoning about polynomial inequalities under polynomial inequality constraints. It turns out that the analog of Proposition 1 can be proven easily even for pseudo-distributions.
In the following we point out three novel technical contributions that go into making the inefficient algorithm discussed above into an efficient one.
Unconstrained Formalization of Certifiable Anti-Concentration
The key technical step is to find a sum-of-squares proof of the "high intersection implies same subspace" property. This is a bit tricky because it relies on the anti-concentration property of D, which does not have a natural formalization as a polynomial inequality. Thankfully, recent works [KKK19, RY20] formalized this property within the SoS system in slightly different ways.
Our proofs are more attuned to the formalization in [KKK19], but for technical reasons the precise formulation proposed there is not directly useful for us. Briefly and somewhat imprecisely put, anti-concentration formalizations posit that there be a low-degree SoS proof (in the variable v) of polynomial inequalities involving a univariate polynomial that approximates a Dirac delta function at 0. In the prior works, this requirement was formulated in a constrained manner ("a norm bound on v implies the inequality"). For the application to subspace recovery, natural arguments require unconstrained versions of the above inequality (i.e., versions that hold without the norm-bound constraint on v). Definition 1 formulates this condition precisely. One can then modify the constructions of polynomials used in [KKK19] and show that this notion of anti-concentration holds for natural distribution families such as Gaussians.
Spectral Bound on Subsamples
Given our modified formalization of anti-concentration, we give a sum-of-squares proof of the analog of Proposition 3. This statement (see Lemma 5) is a key technical contribution of our work, and we expect it will find applications in future works. It can be seen as an SoS version of results that relate the total variation distance between two certifiably anti-concentrated distributions (here, the normalized intersection size plays this role) to the Frobenius norm distance between their covariances.
Exponential Error Reduction and Large List Rounding
The proof of Lemma 4.5 involves a new technical powering step that allows exponential error reduction. This step exponentially reduces the error guarantee of list-decoding at the cost of blowing up the list size, by applying a natural extension of the rounding "by votes" method introduced in [KKK19]. Our powering technique is quite general and will likely find new uses in list-decodable estimation.
Pruning Lists
In order to get optimal list-size bounds, the last step in our algorithm introduces a "pruning method" applied to the list obtained by rounding pseudo-distributions. It involves a simple test that uses additional fresh samples and selects a member Π of the large list for which the fraction of fresh samples (approximately) fixed by Π is large enough.
3 Preliminaries
Throughout this paper, for a vector v, we use ‖v‖₂ to denote the Euclidean norm of v. For a matrix M, we use ‖M‖₂ to denote the spectral norm of M and ‖M‖_F to denote the Frobenius norm of M. For symmetric matrices A, B we use A ⪰ B to denote the PSD/Loewner ordering. For a rank-r symmetric matrix Σ, we use Σ = UΛUᵀ to denote the eigenvalue decomposition, where U is a matrix with orthonormal columns and Λ is a diagonal matrix containing the nonzero eigenvalues. We use Σ† = UΛ⁻¹Uᵀ to denote the Moore-Penrose pseudoinverse, which inverts the nonzero eigenvalues of Σ. If Σ ⪰ 0, we use Σ^{1/2} = UΛ^{1/2}Uᵀ to denote taking the square root of the nonzero eigenvalues. We use Π = UUᵀ to denote the projection matrix corresponding to the column/row span of Σ. Since Π² = Π, the pseudoinverse of Π is Π itself, i.e., Π† = Π.
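A small numpy illustration of the pseudoinverse and projection conventions above (the specific eigenvalues are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 6, 2
U, _ = np.linalg.qr(rng.normal(size=(d, r)))
lam = np.array([4.0, 1.0])                 # nonzero eigenvalues
Sigma = U @ np.diag(lam) @ U.T             # rank-r PSD matrix

Sigma_pinv = U @ np.diag(1 / lam) @ U.T    # Moore-Penrose pseudoinverse
Pi = U @ U.T                               # projector onto range(Sigma)

# Sigma * Sigma^+ acts as the identity on range(Sigma):
print(np.allclose(Sigma @ Sigma_pinv, Pi))    # True
# A projection matrix is its own pseudoinverse:
print(np.allclose(np.linalg.pinv(Pi), Pi))    # True
```

In this toy instance the condition number of Sigma, i.e. the ratio of its largest to smallest nonzero eigenvalue, is 4.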
In the following, we define pseudodistributions and sumofsquares proofs. Detailed exposition of the sumofsquares method and its usage in averagecase algorithm design can be found in [FKP19] and the lecture notes [BS16].
3.1 Pseudodistributions
Let x = (x₁, x₂, …, x_n) be a tuple of n indeterminates and let ℝ[x] be the set of polynomials with real coefficients in the indeterminates x. We say that a polynomial p ∈ ℝ[x] is a sum-of-squares (sos) if there are polynomials q₁, …, q_t such that p = q₁² + ⋯ + q_t².
Pseudo-distributions are generalizations of probability distributions. We can represent a discrete (i.e., finitely supported) probability distribution over ℝⁿ by its probability mass function D: ℝⁿ → ℝ such that D ≥ 0 and Σ_{x ∈ supp(D)} D(x) = 1. Similarly, we can describe a pseudo-distribution by its mass function by relaxing the constraint D ≥ 0 to passing certain low-degree non-negativity tests.
Concretely, a level-ℓ pseudo-distribution is a finitely-supported function D: ℝⁿ → ℝ such that Σ_x D(x) = 1 and Σ_x D(x) f(x)² ≥ 0 for every polynomial f of degree at most ℓ/2. (Here, the summations are over the support of D.) A straightforward polynomial-interpolation argument shows that every level-∞ pseudo-distribution satisfies D ≥ 0 and is thus an actual probability distribution. We define the pseudo-expectation of a function f on ℝⁿ with respect to a pseudo-distribution D, denoted Ẽ_{D(x)}[f(x)], as
(3.1)  Ẽ_{D(x)}[f(x)] = Σ_x D(x) f(x).
The degree-ℓ moment tensor of a pseudo-distribution D is the tensor Ẽ_{D(x)}[(1, x₁, x₂, …, x_n)^{⊗ℓ}]. In particular, the moment tensor has an entry corresponding to the pseudo-expectation of every monomial of degree at most ℓ in x. The set of all degree-ℓ moment tensors of probability distributions is a convex set. Similarly, the set of all degree-ℓ moment tensors of level-ℓ pseudo-distributions is also convex. Unlike moments of distributions, there is an efficient separation oracle for moment tensors of pseudo-distributions.
Fact 1 ([Sho87, Par00, Nes00, Las01]).
For any n, ℓ ∈ ℕ, the following set has an n^{O(ℓ)}-time weak separation oracle (in the sense of [GLS81]):
(3.2)  { Ẽ_{D(x)}[(1, x₁, x₂, …, x_n)^{⊗ℓ}] : D is a level-ℓ pseudo-distribution over ℝⁿ }.
This fact, together with the equivalence of weak separation and optimization [GLS81], allows us to efficiently optimize over pseudo-distributions (approximately); this algorithm is referred to as the sum-of-squares algorithm. The level-ℓ sum-of-squares algorithm optimizes over the space of all level-ℓ pseudo-distributions that satisfy a given set of polynomial constraints (defined below).
Definition 2 (Constrained pseudo-distributions).
Let D be a level-ℓ pseudo-distribution over ℝⁿ. Let A = {f₁ ≥ 0, f₂ ≥ 0, …, f_m ≥ 0} be a system of m polynomial inequality constraints. We say that D satisfies the system of constraints A at degree r, denoted D ⊨_r A, if for every S ⊆ [m] and every sum-of-squares polynomial h with deg(h) + Σ_{i∈S} deg(f_i) ≤ r, it holds that Ẽ_D[h · Π_{i∈S} f_i] ≥ 0.
We write D ⊨ A (without specifying the degree) if D ⊨₀ A holds.
Furthermore, we say that D ⊨_r A holds approximately if the above inequalities are satisfied up to an additive error of 2^{−nℓ} · ‖h‖ · Π_{i∈S} ‖f_i‖, where ‖·‖ denotes the Euclidean norm of the coefficients of a polynomial in the monomial basis.
We remark that if D is an actual (discrete) probability distribution, then we have D ⊨ A if and only if D is supported on solutions to the constraints A. We say that a system A of polynomial constraints is explicitly bounded if it contains a constraint of the form {‖x‖² ≤ M}. The following fact is a consequence of Fact 1 and [GLS81]:
Fact 3 (Efficient Optimization over Pseudo-distributions).
There exists an (n + m)^{O(ℓ)}-time algorithm that, given any explicitly bounded and satisfiable system A of m polynomial constraints in n variables, outputs a level-ℓ pseudo-distribution that satisfies A approximately.
3.2 Sumofsquares proofs
Let f, g₁, g₂, …, g_m be multivariate polynomials in x. A sum-of-squares proof that the constraints {g₁ ≥ 0, …, g_m ≥ 0} imply the constraint {f ≥ 0} consists of sum-of-squares polynomials (p_S)_{S⊆[m]} such that
(3.3)  f = Σ_{S⊆[m]} p_S · Π_{i∈S} g_i.
We say that this proof has degree ℓ if for every set S ⊆ [m], the polynomial p_S · Π_{i∈S} g_i has degree at most ℓ. If there is a degree-ℓ SoS proof that {g_i ≥ 0 : i ∈ [m]} implies {f ≥ 0}, we write:
(3.4)  {g_i ≥ 0 : i ∈ [m]} ⊢_ℓ {f ≥ 0}.
For all polynomials f, g and for all functions F, G, H such that each of the coordinates of the outputs is a polynomial of the inputs, we have the following inference rules.

The first one derives new inequalities by addition/multiplication:

(Addition/Multiplication Rules)  A ⊢_ℓ {f ≥ 0, g ≥ 0} implies A ⊢_ℓ {f + g ≥ 0}, and A ⊢_ℓ {f ≥ 0}, A ⊢_{ℓ'} {g ≥ 0} implies A ⊢_{ℓ+ℓ'} {f · g ≥ 0}.

The next one derives new inequalities by transitivity:

(Transitivity Rule)  A ⊢_ℓ B, B ⊢_{ℓ'} C implies A ⊢_{ℓ·ℓ'} C.

Finally, the last rule derives new inequalities via substitution:

(Substitution Rule)  {F ≥ 0} ⊢_ℓ {G ≥ 0} implies {F(H) ≥ 0} ⊢_{ℓ·deg(H)} {G(H) ≥ 0}.
Low-degree sum-of-squares proofs are sound and complete if we take low-level pseudo-distributions as models. Concretely, sum-of-squares proofs allow us to deduce properties of pseudo-distributions that satisfy some constraints.
Fact 4 (Soundness).
If D ⊨_r A for a level-ℓ pseudo-distribution D and there exists a sum-of-squares proof A ⊢_{r'} B, then D ⊨_{r·r'+r'} B.
If the pseudo-distribution D satisfies A only approximately, soundness continues to hold if we require an upper bound on the bit-complexity of the sum-of-squares proof (the number of bits required to write down the proof). In our applications, the bit complexity of all sum-of-squares proofs will be polynomial in the instance size (assuming that all numbers in the input have polynomial bit complexity). This bound suffices in order to argue about pseudo-distributions that satisfy polynomial constraints approximately.
The following fact shows that every property of low-level pseudo-distributions can be derived by low-degree sum-of-squares proofs.
Fact 5 (Completeness).
Suppose d ≥ r' and A is a collection of polynomial constraints with degree at most r, and A ⊢ {Σ_{i≤n} x_i² ≤ B} for some finite B.
Let {g ≥ 0} be a polynomial constraint. If every degree-d pseudo-distribution D that satisfies D ⊨_r A also satisfies D ⊨_{r'} {g ≥ 0}, then for every ε > 0, there is a sum-of-squares proof A ⊢_d {g ≥ −ε}.
We will use the following Cauchy-Schwarz inequality for pseudo-distributions:
Fact 6 (Cauchy-Schwarz for Pseudo-distributions).
Let f, g be polynomials of degree at most d in indeterminate x ∈ ℝⁿ. Then, for any degree-2d pseudo-distribution D̃, Ẽ_{D̃}[fg] ≤ √(Ẽ_{D̃}[f²]) · √(Ẽ_{D̃}[g²]).
Fact 7 (Hölder's Inequality for Pseudo-Distributions).
Let f, g be polynomials of degree at most d in indeterminate x ∈ ℝⁿ. Fix t ∈ ℕ. Then, for any degree-dt pseudo-distribution D̃, Ẽ_{D̃}[f^{t−1} g] ≤ (Ẽ_{D̃}[f^t])^{(t−1)/t} · (Ẽ_{D̃}[g^t])^{1/t}.
The following fact is a simple corollary of the fundamental theorem of algebra:
Fact 8.
For any univariate degree-d polynomial p satisfying p(x) ≥ 0 for all x ∈ ℝ, ⊢_d {p(x) ≥ 0}.
This can be extended to univariate polynomial inequalities over intervals of ℝ.
Fact 9 (Fekete and Markov-Lukacs, see [Lau09]).
For any univariate degree-d polynomial p satisfying p(x) ≥ 0 for all x ∈ [a, b], {x ≥ a, x ≤ b} ⊢_d {p(x) ≥ 0}.
Fact 10.
Let A be a matrix. Then,
Reweighting Pseudo-distributions.
The following fact is easy to verify and has been used in several works (see [BKS17] for example).
Fact 11 (Reweighting).
Let D be a pseudo-distribution of degree k satisfying a set of polynomial constraints A in variable x. Let p be a sum-of-squares polynomial of degree t such that Ẽ_D[p] ≠ 0. Let D' be the pseudo-distribution defined so that for any polynomial f, Ẽ_{D'}[f] = Ẽ_D[p·f] / Ẽ_D[p]. Then, D' is a pseudo-distribution of degree k − t satisfying A.
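For an actual discrete distribution, the reweighting in Fact 11 is simply tilting the mass function by p and renormalizing; a minimal sketch (the function name and toy distribution are ours):

```python
import numpy as np

def reweight(points, probs, p):
    """Reweight a discrete distribution D by a nonnegative polynomial p:
    D'(x) = D(x) * p(x) / E_D[p]. For actual distributions this works for
    any degree; for pseudo-distributions the available degree drops by
    the degree of p."""
    w = probs * np.array([p(x) for x in points])
    return w / w.sum()

points = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
probs = np.full(5, 0.2)                 # uniform distribution on 5 points
p = lambda x: x ** 2                    # a sum-of-squares polynomial

new_probs = reweight(points, probs, p)
print(new_probs)             # mass shifts toward points where p is large
print(points @ new_probs)    # new mean (0 here, by symmetry)
```

Reweighting by an indicator-like sos polynomial is the mechanism behind conditioning a pseudo-distribution on an event such as "w_i = 1", which is how the rounding in Section 4 extracts candidate projectors.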
4 Algorithm
In this section, we describe an efficient algorithm for list-decodable subspace recovery. Let A be the following system of polynomial inequality constraints in the indeterminates w = (w₁, …, w_n) and Π:
(4.1)  A: { w_i² = w_i for all i ∈ [n],  Σ_{i∈[n]} w_i = αn,  Π = Πᵀ,  Π² = Π,  tr(Π) = r,  w_i(Πx_i − x_i) = 0 for all i ∈ [n] }
Our algorithm finds a pseudo-distribution consistent with A. It then uses the large-list rounding algorithm as a first step to get a polynomial-size list that contains a subspace close in Frobenius norm to the range space of Σ*. Finally, we apply a pruning procedure to obtain a list of size O(1/α) from the large list.
Algorithm 1.
List-Decodable Subspace Recovery
 Given:

A sample of size n drawn according to the model such that D is certifiably anti-concentrated, has mean 0, and the condition number of Σ* is κ.
 Operation:

1. Find a pseudo-distribution D̃ of large enough degree satisfying A (4.1) and minimizing ‖Ẽ[w]‖₂.

2. Run Algorithm 2 (Large List Rounding) on D̃ to obtain a large list of candidate projection matrices.

3. Run Algorithm 3 (Pruning Lists) on this list with fresh samples to obtain the final list L.
 Output:

A list L of O(1/α) projection matrices containing a Π close to Π* in Frobenius norm.
Algorithm 2.
Large List Rounding
 Given:

A pseudo-distribution D̃ of degree t satisfying A and minimizing ‖Ẽ[w]‖₂, for a large constant t, and an accuracy parameter ε.
 Operation:

Repeat k times:

Draw an index i ∈ [n] with probability proportional to Ẽ_{D̃}[w_i].

Let Q_i = Ẽ_{D̃}[w_i Π] / Ẽ_{D̃}[w_i] be the corresponding matrix. Compute the eigenvalue decomposition of Q_i and let Π̂ = UUᵀ, where the columns of U are the eigenvectors corresponding to the top r eigenvalues of Q_i.

Add Π̂ to the list L.

 Output:

A list L of polynomial size containing a projection matrix Π satisfying ‖Π − Π*‖_F ≤ ε.
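The eigenvalue-decomposition step of Algorithm 2 amounts to rounding a (generally non-idempotent) matrix to a genuine rank-r projector by keeping its top-r eigenvectors; a hypothetical sketch of just that step, on a toy noisy candidate:

```python
import numpy as np

def round_to_projector(Q, r):
    """Round a symmetric matrix Q (e.g. a rescaled pseudo-expectation of
    the projector-valued variable) to a genuine rank-r projection matrix
    by keeping the eigenvectors of its top-r eigenvalues."""
    vals, vecs = np.linalg.eigh(Q)      # eigenvalues in ascending order
    U = vecs[:, -r:]                     # eigenvectors of the top-r eigenvalues
    return U @ U.T

rng = np.random.default_rng(4)
d, r = 6, 2
U0, _ = np.linalg.qr(rng.normal(size=(d, r)))
P_true = U0 @ U0.T                       # ground-truth rank-r projector

noise = 0.01 * rng.normal(size=(d, d))
Q = P_true + (noise + noise.T) / 2       # noisy, non-idempotent candidate

P_hat = round_to_projector(Q, r)
print(np.linalg.norm(P_hat - P_true, "fro"))  # small: rounding recovers P_true
```

Because the eigengap of a projector is 1, a candidate that is close to Π* in Frobenius norm yields a rounded projector that is also close, which is all the rounding step needs.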
Algorithm 3.
Pruning Lists
 Given:

A list L of projection matrices, a threshold τ, and fresh samples y₁, y₂, …, drawn according to the model.
 Operation:

For each fresh sample y_j:

Compute the subset L_j ⊆ L of matrices Π such that ‖Πy_j − y_j‖₂ is small.

If L_j is nonempty, pick an arbitrary matrix from this set and add it to L'.

 Output:

A list L' of size O(1/α) such that there exists a projection matrix Π ∈ L' close to Π* in Frobenius norm.
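The pruning test can be sketched as follows: keep a candidate only if a large enough fraction of fresh samples is (nearly) fixed by it. The function, threshold and tolerance below are illustrative, not the paper's exact procedure:

```python
import numpy as np

def prune(candidates, fresh, threshold, tol=1e-3):
    """Keep a matrix P from the large list only if the fraction of fresh
    samples x with ||P x - x|| small is at least the given threshold.
    Projectors onto the true subspace fix every fresh inlier exactly."""
    kept = []
    for P in candidates:
        frac = np.mean([np.linalg.norm(P @ x - x) < tol for x in fresh])
        if frac >= threshold:
            kept.append(P)
    return kept

rng = np.random.default_rng(5)
d, r = 6, 2
U, _ = np.linalg.qr(rng.normal(size=(d, r)))
P_good = U @ U.T                     # projector onto the true subspace
V, _ = np.linalg.qr(rng.normal(size=(d, r)))
P_bad = V @ V.T                      # projector onto an unrelated subspace

# Fresh samples: half drawn from the true subspace, half arbitrary outliers.
inliers = rng.normal(size=(50, r)) @ U.T
outliers = rng.normal(size=(50, d))
fresh = np.vstack([inliers, outliers])

kept = prune([P_good, P_bad], fresh, threshold=0.4)
print(len(kept))   # 1: only P_good passes the test
```

With an α fraction of fresh inliers, a threshold on the order of α suffices to retain the correct projector while discarding candidates supported purely on outliers.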
4.1 Analysis of Algorithm 1.
The following theorem captures the guarantees we prove on Algorithm 1.
Theorem 4 (List-Decodable Subspace Recovery, restated).
Let D be a distribution whose covariance Σ* has rank r and condition number κ, and which is certifiably anti-concentrated. Then, Algorithm 1 takes as input n samples from the model and outputs a list of O(1/α) projection matrices such that, with high probability over the draw of the sample and the randomness of the algorithm, there is a Π in the list close to Π* in Frobenius norm. Further, Algorithm 1 runs in polynomial time for every fixed α.
Our proof of Theorem 4 is based on the following four pieces. The key technical piece is the following consequence of the constraint system A in the low-degree SoS proof system.
Lemma 5.
Given t ∈ ℕ and any ε > 0, and an instance of the model such that the inlier distribution D has mean 0 and is certifiably anti-concentrated, the constraint system A certifies, in the low-degree SoS proof system, that ‖Π − Π*‖_F is small, where κ is the condition number of Σ* and Π* is the corresponding rank-r projection matrix.
Next, we show that “highentropy” pseudodistributions must place a large enough weight on the inliers. This is similar to the usage of highentropy pseudodistributions in [KKK19].
Lemma 6 (Large weight on inliers from high-entropy constraints).
Let D̃ be a pseudo-distribution that satisfies A and minimizes ‖Ẽ[w]‖₂. Then, Ẽ_{D̃} Σ_{i∈I} w_i ≥ α|I|.
The above two lemmas allow us to argue that our largelist rounding algorithm (Algorithm 2) succeeds.
Lemma 7 (Large-List Subspace Recovery, Theorem 5 restated).
Let D be a distribution whose covariance Σ* has rank r and condition number κ, and which is certifiably anti-concentrated. For any ε > 0, there exists an algorithm that takes as input n samples from the model and outputs a list of projection matrices of size polynomial in r such that, with high probability over the draw of the sample and the randomness of the algorithm, there is a Π in the list satisfying ‖Π − Π*‖_F ≤ ε. The algorithm runs in polynomial time for every fixed α and ε.
Finally, we show that we can prune the list output by Algorithm 2 to a list of size O(1/α) such that it still contains a projection matrix close to Π*. Formally,
Lemma 8 (Pruning Algorithm).
Theorem 4 follows easily by combining the above claims:
Proof of Theorem 4.
It follows from Theorem 2 that D is certifiably anti-concentrated. Since the inliers are drawn from D, it suffices to set the parameters accordingly. By Lemma 8, the uniform distribution on I is also certifiably anti-concentrated if the number of samples is large enough. Since the hypothesis of Lemma 7 is now satisfied, Algorithm 2 runs in polynomial time and outputs a polynomial-size list such that, with high probability, it contains a projector close to Π* in Frobenius norm. Recall that Π* is the projector corresponding to Σ*.
Since we now have a list satisfying the hypothesis of Lemma 8 and access to fresh samples, we can conclude that Algorithm 3 outputs a list of size O(1/α) which contains a projection matrix satisfying the desired guarantee. The overall running time is dominated by Algorithm 2, which completes the proof.
∎
4.2 Analyzing the Constraint System: Proof of Lemma 5
We first show that the covariance of every large-enough subsample of a certifiably anti-concentrated sample has lower-bounded eigenvalues. Recall that for a PSD matrix Σ, Σ = UΛUᵀ denotes the eigenvalue decomposition and Π = UUᵀ denotes the corresponding projection matrix.
Lemma 9 (Covariance of Subsets of Certifiably Anti-Concentrated Distributions).
Let D be certifiably anti-concentrated with covariance Σ*. Then,