Interlacing Families III: Sharper Restricted Invertibility Estimates

Many of the results in this paper were first announced in lectures by the authors in 2013. This research was partially supported by NSF grants CCF-0915487, CCF-1111257, CCF-1562041, CCF-1553751, DMS-0902962, DMS-1128155 and DMS-1552520, a Sloan Research Fellowship to Nikhil Srivastava, a Simons Investigator Award to Daniel Spielman, and a MacArthur Fellowship.
Abstract
We use the method of interlacing families of polynomials to derive a simple proof of Bourgain and Tzafriri’s Restricted Invertibility Principle, and then to sharpen the result in two ways. We show that the stable rank can be replaced by the Schatten 4-norm stable rank, and that tighter bounds hold when the number of columns in the matrix under consideration does not greatly exceed its number of rows. Our bounds are derived from an analysis of the smallest zeros of Jacobi and associated Laguerre polynomials.
1 Introduction
The Restricted Invertibility Principle of Bourgain and Tzafriri [BT87] is a quantitative generalization of the assertion that the rank of a matrix is the maximum number of linearly independent columns that it contains. It says that if an $n \times m$ matrix $B$ has high stable rank,
$$\operatorname{srank}(B) := \frac{\|B\|_F^2}{\|B\|^2},$$
then it must contain a large column submatrix $B_S$, for some $S \subseteq [m]$, with large least singular value, defined as
$$\sigma_{\min}(B_S) := \min_{x \neq 0} \frac{\|B_S x\|}{\|x\|}.$$
The least singular value of $B_S$ is a measure of how far the matrix is from being singular. Bourgain and Tzafriri’s result was strengthened in the works of [Ver01, SS12, You14, NY17], and has since been a useful tool in Banach space theory, data mining, and more recently theoretical computer science.
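As a concrete sanity check, the quantities above are easy to compute directly; the sketch below (our illustration in NumPy, with function names of our own choosing) computes the stable rank of a matrix and the least singular value of a column submatrix.

```python
import numpy as np

def stable_rank(B):
    # ||B||_F^2 / ||B||^2: sum of squared singular values over the largest one.
    s = np.linalg.svd(B, compute_uv=False)
    return np.sum(s**2) / s[0]**2

def sigma_min(B, S):
    # Least singular value of the column submatrix B_S.
    return np.linalg.svd(B[:, sorted(S)], compute_uv=False)[-1]

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 12))
print(stable_rank(B))          # lies between 1 and rank(B) = 4
print(sigma_min(B, {0, 3, 7}))
```

The stable rank is at most the rank, with equality exactly when all nonzero singular values coincide (e.g. for an identity matrix).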
Prior to this work, the sharpest result of this type was the following theorem of Spielman and Srivastava [SS12]:
Theorem 1.1.
Suppose $B$ is an $n \times m$ matrix and $k < \operatorname{srank}(B)$ is an integer. Then there exists a subset $S \subset [m]$ of size $k$ such that
$$\sigma_k(B_S)^2 \geq \left(1 - \sqrt{\frac{k}{\operatorname{srank}(B)}}\right)^2 \frac{\|B\|_F^2}{m}. \tag{1}$$
Note that when $k$ is proportional to $\operatorname{srank}(B)$, Theorem 1.1 produces a submatrix whose squared least singular value is at least a constant times the average squared norm of the columns of $B$, a bound which cannot be improved even for $k = 1$. Thus, the theorem tells us that the columns of $B_S$ are “almost orthogonal” in that their squared least singular value is comparable to the average squared norm of the vectors individually.
To understand the form of the bound in (1), consider the case when $BB^T = I$, which is sometimes called the “isotropic” case. In this situation we have $\operatorname{srank}(B) = \|B\|_F^2 = n$, and the right hand side of (1) becomes
$$\left(1 - \sqrt{\frac{k}{n}}\right)^2 \frac{n}{m}. \tag{2}$$
The number $\left(1 - \sqrt{k/n}\right)^2$ may seem familiar, and arises in the following two contexts.

It is an asymptotically sharp lower bound on the least zero of the associated Laguerre polynomial $L_k^{(n-k)}$, after an appropriate scaling. In Section 3 we derive the isotropic case of Theorem 1.1 from this fact, using the method of interlacing families of polynomials. (A version of this proof appeared in the authors’ survey paper [MSS14].)

It is the lower edge of the support of the Marchenko–Pastur distribution [MP67], which is the limiting spectral distribution of a sequence of random matrices $\frac{1}{n} G G^T$, where $G$ is $k \times n$ with appropriately normalized i.i.d. Gaussian entries, as $k, n \to \infty$ with $k/n$ fixed [MP67]. This convergence result along with large deviation estimates may be used to show that it is not possible to obtain a bound of
$$(1 + \epsilon)\left(1 - \sqrt{\frac{k}{n}}\right)^2 \frac{n}{m} \tag{3}$$
in Theorem 1.1 for any constant $\epsilon > 0$, when $m$ goes to infinity significantly faster than $n$. Thus, the bound of Theorem 1.1 is asymptotically sharp. See [Sri] for details.
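The appearance of the Marchenko–Pastur lower edge can be observed empirically; the following sketch (our illustration, not part of the paper) compares the smallest eigenvalue of a normalized Wishart matrix against $(1-\sqrt{k/n})^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 200, 800                               # aspect ratio k/n = 1/4
G = rng.standard_normal((k, n))
lam_min = np.linalg.eigvalsh(G @ G.T / n)[0]  # smallest eigenvalue
edge = (1 - np.sqrt(k / n)) ** 2              # Marchenko-Pastur lower edge
print(lam_min, edge)                          # the two should be close
```

For this aspect ratio the edge is exactly $1/4$; the empirical smallest eigenvalue fluctuates around it on a scale that shrinks as $n$ grows.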
In Section 3 we present a proof of Theorem 1.1 in the isotropic case using the method of interlacing polynomials. This proof considers the expected characteristic polynomial of $\sum_{i \in S} v_i v_i^T$ for a randomly chosen $S$, where the $v_i$ are the columns of $B$. In this first proof, we choose $S$ by sampling $k$ columns with replacement. This seems like a suboptimal thing to do, since it may select a column twice (corresponding to a trivial bound of $\sigma_k(B_S) = 0$), but it allows us to easily prove that the expected characteristic polynomial is an associated Laguerre polynomial and that the family of polynomials that arise in the expectation form an interlacing family, which we define below. Because these polynomials form an interlacing family, there is some polynomial in the family whose $k$th largest zero is at least the $k$th largest zero of the expected polynomial. The bound (2) then follows from lower bounds on the zeros of associated Laguerre polynomials.
In Section 4, we extend this proof technique to show that Theorem 1.1 remains true in the nonisotropic case. In addition, we show a bound that replaces the stable rank with a Schatten 4-norm stable rank, defined by
$$\operatorname{srank}_4(B) := \left(\frac{\|B\|_{S_2}}{\|B\|_{S_4}}\right)^{4},$$
where $\|B\|_{S_p}$ denotes the Schatten $p$-norm, i.e., the $\ell_p$ norm of the singular values of $B$. That is,
$$\|B\|_{S_p} = \Big(\sum_{i} \sigma_i^{p}\Big)^{1/p},$$
where $\sigma_1 \geq \sigma_2 \geq \cdots$ are the singular values of $B$. As
$$\operatorname{srank}_4(B) \geq \operatorname{srank}(B),$$
this is a strict improvement on Theorem 1.1. The above inequality is far from tight when $B$ has many moderately large singular values. In Section 4.1 we give a polynomial time algorithm for finding the subset guaranteed to exist by Theorem 1.1.
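The two notions of stable rank are easy to compare numerically. In this sketch (our code; `srank` and `srank4` are our names), the Schatten 4-norm stable rank $(\sum_i \sigma_i^2)^2 / \sum_i \sigma_i^4$ is computed alongside the ordinary stable rank and always dominates it.

```python
import numpy as np

def srank(B):
    # ||B||_F^2 / ||B||^2
    s = np.linalg.svd(B, compute_uv=False)
    return np.sum(s**2) / np.max(s)**2

def srank4(B):
    # (||B||_{S2} / ||B||_{S4})^4 = (sum s_i^2)^2 / sum s_i^4
    s = np.linalg.svd(B, compute_uv=False)
    return np.sum(s**2) ** 2 / np.sum(s**4)

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 8))
print(srank(B), srank4(B))   # srank4(B) >= srank(B)
```

The domination follows from $\sum_i \sigma_i^4 \leq \sigma_1^2 \sum_i \sigma_i^2$, and both quantities equal the rank when all nonzero singular values coincide.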
In Section 5, we improve on Theorem 1.1 in the isotropic case by sampling the sets $S$ without replacement. We show that the resulting expected characteristic polynomials are scaled Jacobi polynomials. We then derive a new bound on the smallest zeros of Jacobi polynomials which implies that there exists a set $S$ of $k$ columns for which
$$\sigma_k(B_S)^2 \geq \frac{\left(\sqrt{n(m-k)} - \sqrt{k(m-n)}\right)^2}{m^2}. \tag{4}$$
As
$$\frac{\left(\sqrt{n(m-k)} - \sqrt{k(m-n)}\right)^2}{m^2} \geq \left(1 - \sqrt{\frac{k}{n}}\right)^2 \frac{n}{m},$$
this improves on Theorem 1.1 by a constant factor when $m$ is a constant multiple of $n$. Note that this does not contradict the lower bound (3) from [Sri], which requires that $m \gg n$.
A number of the results in this paper require a bound on the smallest root of a polynomial. In order to be as self-contained as possible, we will either prove such bounds directly or take the best known bound directly from the literature. It is worth noting, however, that a more generic way of proving each of these bounds is provided by the framework of polynomial convolutions developed in [MSS15a]. The necessary inequalities in [MSS15a] are known to be asymptotically tight (as shown in [Mar15]) and in some cases improve on the bounds given here.
2 Preliminaries
2.1 Notation
We denote the Euclidean norm of a vector by . We denote the operator norm by:
This also equals the largest singular value of the matrix $A$. The Frobenius norm of $A$, also known as the Hilbert–Schmidt norm and written $\|A\|_F$, is the square root of the sum of the squares of the singular values of $A$. It is also equal to the square root of the sum of the squares of the entries of $A$.
For a real rooted polynomial , we let denote the th largest zero of . When we want to refer to the smallest zero of a polynomial without specifying its degree, we will call it . We define the th elementary symmetric function of a matrix with eigenvalues to be the th elementary symmetric function of those eigenvalues:
Thus, the characteristic polynomial of may be expressed as
By inspecting the Leibniz expression for the determinant in terms of permutations, it is easy to see that these elementary symmetric functions may also be expanded in terms of minors. The Cauchy–Binet identity says that for every $n \times m$ matrix $B$,
$$e_k(BB^T) = \sum_{|S| = k} \det\!\left(B_S^T B_S\right),$$
where $S$ ranges over all subsets of size $k$ of indices in $[m]$, and $B_S$ denotes the matrix formed by the columns of $B$ specified by $S$.
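For a small random matrix this identity can be verified by brute force. The sketch below (ours) reads the $k$th elementary symmetric function of the eigenvalues of $BB^T$ off the characteristic polynomial and compares it with the sum of $\det(B_S^T B_S)$ over all size-$k$ column subsets.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 3, 5, 2
B = rng.standard_normal((n, m))

# e_k of the eigenvalues of BB^T, read off the characteristic polynomial:
# np.poly gives [1, -e1, e2, -e3, ...], so e_k = (-1)^k * coeffs[k].
coeffs = np.poly(B @ B.T)
e_k = (-1) ** k * coeffs[k]

# Cauchy-Binet: sum of det(B_S^T B_S) over all column subsets S of size k.
cb = sum(np.linalg.det(B[:, list(S)].T @ B[:, list(S)])
         for S in itertools.combinations(range(m), k))
print(e_k, cb)   # the two agree
```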
We will use the following two formulas to calculate determinants and characteristic polynomials of matrices. You may prove them yourself, or find proofs in [Mey00, Chapter 6] or [Har97, Section 15.8].
Lemma 2.1.
For any invertible matrix and vector
Lemma 2.2 (Jacobi’s formula).
For any square matrices ,
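Both formulas are easy to sanity-check numerically. The sketch below (our code) verifies the rank-one determinant update $\det(A + vv^T) = \det(A)(1 + v^T A^{-1} v)$ of Lemma 2.1, and a finite-difference version of Jacobi's formula in the form $\frac{d}{dt}\det(A + tM)\big|_{t=0} = \det(A)\operatorname{tr}(A^{-1}M)$ for invertible $A$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # comfortably invertible
v = rng.standard_normal(4)
M = rng.standard_normal((4, 4))

# Matrix determinant lemma: det(A + v v^T) = det(A) (1 + v^T A^{-1} v).
lhs = np.linalg.det(A + np.outer(v, v))
rhs = np.linalg.det(A) * (1 + v @ np.linalg.solve(A, v))
print(lhs, rhs)

# Jacobi's formula via a central finite difference in t at t = 0.
h = 1e-6
fd = (np.linalg.det(A + h * M) - np.linalg.det(A - h * M)) / (2 * h)
jac = np.linalg.det(A) * np.trace(np.linalg.solve(A, M))
print(fd, jac)
```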
We also use the following consequence of these formulas that was derived in [MSS15c, Lemma 4.2].
Lemma 2.3.
For every square matrix and random vector ,
2.2
We bound the zeros of the polynomials we construct by using the barrier function arguments developed in [BSS12], [MSS15c], and [MSS15a]. For , we define the of a polynomial to be the least root of , where indicates partial differentiation with respect to . We sometimes write this in the compact form
We may also define this in terms of the lower barrier function
by observing
As
we see that is less than the least root of .
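The paper's exact normalization is elided above, but the mechanism can be illustrated generically (our code, our naming): below the least root, the lower barrier $\Phi_p(b) = \sum_i 1/(\lambda_i - b)$ is positive and increasing, so the point where it equals a given $\varphi > 0$ is a lower bound on the least root, and it tightens as $\varphi \to \infty$.

```python
import numpy as np

roots = np.array([1.0, 2.5, 4.0])   # the zeros lambda_i of a test polynomial

def barrier(b):
    # Lower barrier function: sum_i 1 / (lambda_i - b), for b below all roots.
    return np.sum(1.0 / (roots - b))

def soft_min(phi, lo=-1e6):
    # Bisection for the b < min(roots) with barrier(b) = phi.
    hi = roots.min() - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if barrier(mid) < phi else (lo, mid)
    return lo

for phi in [0.5, 2.0, 50.0]:
    print(phi, soft_min(phi))   # increases toward the least root 1.0
```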
The following claim is elementary.
Claim 2.4.
For ,
2.3 Interlacing Families
We use the method of interlacing families of polynomials developed in [MSS15b, MSS15c] to relate the zeros of sums of polynomials to individual polynomials in the sum. The results in Section 5 will require the following variant of the definition, which is more general than the ones we have used previously.
Definition 2.5.
An interlacing family consists of a finite rooted tree and a labeling of the nodes by monic real-rooted polynomials, with two properties:

Every polynomial corresponding to a nonleaf node is a convex combination of the polynomials corresponding to the children of .

For all nodes with a common parent, all convex combinations of and are real-rooted. (This condition implies that all convex combinations of all the children of a node are real-rooted; the equivalence is discussed in [MSS14].)
We say that a set of polynomials is an interlacing family if they are the labels of the leaves of such a tree.
In the applications in this paper, the leaves of the tree will naturally correspond to elements of a probability space, and the internal nodes will correspond to conditional expectations of the corresponding polynomials over this probability space.
In Sections 3 and 4, as in [MSS15b, MSS15c], we consider interlacing families in which the nodes of the tree at distance $\ell$ from the root are indexed by sequences $(j_1, \dots, j_\ell)$. We denote the empty sequence and the root node of the tree by $\emptyset$.
The leaves of the tree correspond to sequences of length $k$, and each leaf $(j_1, \dots, j_k)$ is labeled by a polynomial $p_{j_1, \dots, j_k}$. Each intermediate node is labeled by the average of the polynomials labeling its children. So, for $\ell < k$,
and
A fortunate choice of polynomials labeling the leaves yields an interlacing family.
Theorem 2.6 (Theorem 4.5 of [MSS15c]).
Let be vectors in and let
Then, these polynomials form an interlacing family.
Interlacing families are useful because they allow us to relate the zeros of the polynomial labeling the root to those labeling the leaves. In particular, we will prove the following slight generalization of Theorem 4.4 of [MSS15b].
Theorem 2.7.
Let be an interlacing family of degree polynomials with root labeled by and leaves by . Then for all indices , there exist leaves and such that
(5) 
To prove this theorem, we first explain why we call these families “interlacing”.
Definition 2.8.
We say that a polynomial interlaces a polynomial if
We say that polynomials have a common interlacing if there is a single polynomial that interlaces for each .
The common interlacing assertions in this paper stem from the following fundamental example.
Claim 2.9.
Let $A$ be an $n$-dimensional symmetric matrix and let $v_1, \dots, v_m$ be vectors in $\mathbb{R}^n$. Then the polynomials
have a common interlacing.
Proof.
Let . For any , let . Cauchy’s interlacing theorem tells us that
So, for sufficiently large , interlaces each . ∎
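The interlacing in Claim 2.9 can be observed directly in a small example. This sketch (ours) checks, for a random symmetric $A$ and a rank-one positive update $A + vv^T$, the Cauchy/Weyl interlacing inequalities $\lambda_i(A) \leq \lambda_i(A + vv^T) \leq \lambda_{i+1}(A)$ in ascending order.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((5, 5))
A = (X + X.T) / 2                 # random symmetric matrix
v = rng.standard_normal(5)

ev_A = np.linalg.eigvalsh(A)                   # eigenvalues, ascending
ev_B = np.linalg.eigvalsh(A + np.outer(v, v))

# Interlacing for a rank-one positive update:
#   ev_A[i] <= ev_B[i]  and  ev_B[i] <= ev_A[i+1].
ok = (np.all(ev_B >= ev_A - 1e-10)
      and np.all(ev_A[1:] >= ev_B[:-1] - 1e-10))
print(ok)
```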
The connection between interlacing and the real-rootedness of convex combinations is given by the following theorem (see [Ded92], [Fel80, Theorem ], and [CS07, Theorem 3.6]).
Theorem 2.10.
Let be real-rooted (univariate) polynomials of the same degree with positive leading coefficients. Then have a common interlacing if and only if is real-rooted for all convex combinations .
Theorem 2.7 follows from an inductive application of the following lemma, which generalizes the $k = 1$ case that was proven in [MSS15b, MSS15c].
Lemma 2.11.
Let be real-rooted degree polynomials that have a common interlacing. Then for every index and for every nonnegative such that , there exist an and a so that
Proof.
By restricting our attention to the polynomials for which is positive, we may assume without loss of generality that each is positive for every . Define
and let
We seek and for which .
Let
be a polynomial that interlaces every . As each has a positive leading coefficient, we know that is at least for odd and at most for even.
We first consider the case in which do not have any zero in common. In this case, : if some then for all . Moreover, there must be some for which is nonzero. As all the are positive, is positive for odd and negative for even.
As there must be some for which , there must be an for which and a for which . We now show that if is odd, then . As is nonnegative, must have a zero between and . As interlaces , this is the th largest zero of . Similarly, the nonpositivity of implies that has a zero between and . This must be the th largest zero of .
The case of even is symmetric, except that we reverse the choice of and .
We finish by observing that it suffices to consider the case in which do not have any zero in common. If they do, we let be their greatest common divisor, define , and observe that do not have any common zeros. Thus, we may apply the above argument to these polynomials. As multiplying all the polynomials by adds the same zeros to for and , the lemma holds for these as well. ∎
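The conclusion of Lemma 2.11 can be watched in action on the polynomials of Claim 2.9. In this sketch (ours), the $\chi(A + v_i v_i^T)$ have a common interlacing, and for every index $k$ some member of the family has its $k$th largest root at least as large as the $k$th largest root of the uniform average.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 4, 6
X = rng.standard_normal((n, n))
A = (X + X.T) / 2
vs = [rng.standard_normal(n) for _ in range(m)]

# Characteristic polynomials of A + v_i v_i^T, as coefficient vectors.
polys = [np.poly(A + np.outer(v, v)) for v in vs]
avg = np.mean(polys, axis=0)        # uniform convex combination

avg_roots = np.sort(np.roots(avg).real)[::-1]       # kth largest roots
leaf_roots = [np.sort(np.roots(p).real)[::-1] for p in polys]

for k in range(n):
    best = max(r[k] for r in leaf_roots)
    print(k, best >= avg_roots[k] - 1e-8)           # True for every k
```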
3 The Isotropic Case with Replacement: Laguerre Polynomials
We now prove Theorem 1.1 in the isotropic case. Let the columns of $B$ be the vectors $v_1, \dots, v_m$. The condition $BB^T = I$ is equivalent to $\sum_{i=1}^m v_i v_i^T = I$. For a set $S$ of size $k$,
$$\sigma_k(B_S)^2 = \lambda_k\Big(\sum_{i \in S} v_i v_i^T\Big).$$
We now consider the expected characteristic polynomial of the sum of the outer products of $k$ of these vectors chosen uniformly at random, with replacement. We indicate one such polynomial by a vector of indices $(j_1, \dots, j_k) \in [m]^k$, where we recall $[m] = \{1, \dots, m\}$. These are the leaves of the tree in the interlacing family:
As in Section 2.3, the intermediate nodes in the tree are labeled by subsequences of this form, and the polynomial at the root of the tree is
We now derive a formula for .
Lemma 3.1.
Proof.
For $k < n$, the polynomial is divisible by $x^{n-k}$. So, the $k$th largest root of it is equal to the smallest root of
To bound the smallest root of this polynomial, we observe that it is a slight transformation of an associated Laguerre polynomial. We use the definition of the associated Laguerre polynomial of degree $d$ and parameter $\alpha$, $L_d^{(\alpha)}(x)$, given by Rodrigues’ formula [Sze39, (5.1.5)]:
$$L_d^{(\alpha)}(x) = \frac{x^{-\alpha} e^{x}}{d!} \, \frac{d^d}{dx^d}\!\left(e^{-x} x^{d+\alpha}\right).$$
Thus,
We now employ a lower bound on the smallest root of associated Laguerre polynomials due to Krasikov [Kra06, Theorem 1].
Theorem 3.2.
For ,
where and .
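As a numeric illustration (the exact constants of Krasikov's bound are elided above, so we only compare against the asymptotic edge), the smallest zero of $L_k^{(n-k)}$ sits near the limiting value $(\sqrt{n} - \sqrt{k})^2$:

```python
import numpy as np
from scipy.special import genlaguerre

n, k = 36, 9
L = genlaguerre(k, n - k)      # associated Laguerre polynomial L_k^{(n-k)}
min_zero = np.min(np.roots(L.coeffs).real)
asym = (np.sqrt(n) - np.sqrt(k)) ** 2   # = 9, the asymptotic lower edge
print(min_zero, asym)
```

For fixed ratio $k/n$, the smallest zero approaches $(\sqrt{n}-\sqrt{k})^2$ as $k, n \to \infty$; at these small sizes it sits somewhat above the edge.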
Corollary 3.3.
Proof.
Lemma 3.1 and Corollary 3.3 are all one needs to establish Theorem 1.1 in the isotropic case. Theorems 2.6 and 2.7 tell us that there exists a sequence $j_1, \dots, j_k$ for which
As this is the characteristic polynomial of $\sum_{i=1}^k v_{j_i} v_{j_i}^T$, the sequence must consist of $k$ distinct elements: if not, the matrix in the sum would have rank at most $k-1$, and thus $\lambda_k$ would be $0$. So, we conclude that there exists a set $S$ of size $k$ for which
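In small dimensions the guarantee can be confirmed exhaustively. The sketch below (our code) builds an isotropic $B$ from the first $n$ rows of a random orthogonal matrix, brute-forces over all size-$k$ column subsets, and checks that the best one meets the isotropic bound $\frac{n}{m}(1 - \sqrt{k/n})^2$ from (2).

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
n, m, k = 3, 7, 2

# An isotropic B: the first n rows of a random m x m orthogonal matrix,
# so that B B^T = I_n.
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
B = Q[:n, :]

best = max(
    np.linalg.svd(B[:, list(S)], compute_uv=False)[-1] ** 2
    for S in itertools.combinations(range(m), k)
)
bound = (n / m) * (1 - np.sqrt(k / n)) ** 2
print(best, bound)             # best >= bound
```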
4 The Nonisotropic Case and the Schatten 4-norm
In this section we prove the promised strengthening of Theorem 1.1 in terms of the Schatten 4-norm. In the proof it will be more natural to work with eigenvalues of $BB^T$ rather than singular values of $B$ (and its submatrices). For a symmetric matrix $A$, we define
(6) 
With this definition and this change of notation, the theorem may be stated as follows.
Theorem 4.1.
Suppose are vectors with . Then for every integer , there exists a set of size with
(7) 
We prove this theorem by examining the same interlacing family as in the previous section. As we are no longer in the isotropic case, we need to recalculate the polynomial at the root of the tree, which will not necessarily be a Laguerre polynomial. We give the formula for general random vectors with finite support, but will apply it to the special case in which each random vector is uniformly chosen from $\{v_1, \dots, v_m\}$.
Lemma 4.2.
Let $r$ be a random $n$-dimensional vector with finite support. If $r_1, \dots, r_k$ are i.i.d. copies of $r$, then
where are the eigenvalues of .
Proof.
Let . By introducing variables and applying Lemma 2.3 times, we obtain
By computing
we simplify the above expression to
Since for and for , we can rewrite this as
as desired. ∎
We now require a lower bound on the smallest zero of the expected characteristic polynomial. We will use the following lemma, which tells us that the barrier-function lower bound from Section 2.2 grows in a controlled way when we apply a $(1 - \partial_x)$ operator to the polynomial. This is similar to Lemma 3.4 of [BSS12], which was written in the language of random rank-one updates of matrices.
Lemma 4.3.
If is a real-rooted polynomial and , then is real-rooted and
Proof.
It is well known that the resulting polynomial is real-rooted: see [PS76, Problem V.1.18], [Mar49, Corollary 18.1], or [MSS14, Lemma 3.7] for a proof. To see that for , recall that . So, both and have the same single sign for all , and thus cannot be zero there.
Let $p$ have degree $d$ and zeros $\lambda_1 \geq \cdots \geq \lambda_d$. Let , so that . To prove the claim it is enough to show for
that and that
(8) 
The first statement is true because
so .
We begin our proof of the second statement by expressing in terms of and :
(9) 
wherever all quantities are finite, which happens everywhere except at the zeros of and . Since is strictly below the zeros of both, it follows that:
After replacing by and rearranging terms (noting the positivity of ), we see that (8) is equivalent to
We now finish the proof by expanding and in terms of the zeros of :
as all terms are positive  
as  
∎
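As a small illustration of the operator in Lemma 4.3 (under our reading that it acts as $p \mapsto (1-\partial_x)p = p - p'$), one can check numerically that $p - p'$ stays real-rooted and that its least root does not move below that of $p$:

```python
import numpy as np

# A polynomial p with prescribed real roots, and q = p - p'
# (the (1 - d/dx) operator applied to p).
roots_p = np.array([-2.0, 0.5, 1.0, 3.0])
p = np.poly(roots_p)
q = np.polysub(p, np.polyder(p))

roots_q = np.roots(q)
print(np.max(np.abs(roots_q.imag)))          # ~0: q is again real-rooted
print(roots_p.min(), np.min(roots_q.real))   # least root moves rightward
```

This matches the identity $p - p' = -e^{x}\,(e^{-x}p)'$: Rolle's theorem applied to $e^{-x}p$ places one zero of $q$ in each gap between zeros of $p$ and one above the largest zero.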
Theorem 4.4.
Let $r$ be a random $n$-dimensional vector with finite support such that , let $r_1, \dots, r_k$ be i.i.d. copies of $r$, and let
Then
where is defined as in (6).