Interlacing Families III: Sharper Restricted Invertibility Estimates

Many of the results in this paper were first announced in lectures by the authors in 2013. This research was partially supported by NSF grants CCF-0915487, CCF-1111257, CCF-1562041, CCF-1553751, DMS-0902962, DMS-1128155, and DMS-1552520, a Sloan Research Fellowship to Nikhil Srivastava, a Simons Investigator Award to Daniel Spielman, and a MacArthur Fellowship.


Adam W. Marcus
Princeton University
   Daniel A. Spielman
Yale University
   Nikhil Srivastava
UC Berkeley
Abstract

We use the method of interlacing families of polynomials to derive a simple proof of Bourgain and Tzafriri’s Restricted Invertibility Principle, and then to sharpen the result in two ways. We show that the stable rank can be replaced by the Schatten 4-norm stable rank, and that tighter bounds hold when the number of columns in the matrix under consideration does not greatly exceed its number of rows. Our bounds are derived from an analysis of the smallest zeros of Jacobi and associated Laguerre polynomials.

1 Introduction

The Restricted Invertibility Principle of Bourgain and Tzafriri [BT87] is a quantitative generalization of the assertion that the rank of a matrix is the maximum number of linearly independent columns that it contains. It says that if an $n \times m$ matrix $B$ has high stable rank,

$$\operatorname{srank}(B) := \frac{\|B\|_F^2}{\|B\|^2},$$

then it must contain a large column submatrix $B_S$, of size $n \times k$, with large least singular value, defined as:

$$\sigma_k(B_S) := \min_{\|x\| = 1} \|B_S x\|.$$

The least singular value of $B_S$ is a measure of how far the matrix is from being singular. Bourgain and Tzafriri’s result was strengthened in the works of [Ver01, SS12, You14, NY17], and has since been a useful tool in Banach space theory, data mining, and, more recently, theoretical computer science.

Prior to this work, the sharpest result of this type was the following theorem of Spielman and Srivastava [SS12]:

Theorem 1.1.

Suppose $B$ is an $n \times m$ matrix and $k < \operatorname{srank}(B)$ is an integer. Then there exists a subset $S \subseteq [m]$ of size $k$ such that

$$\sigma_k(B_S)^2 \geq \left(1 - \sqrt{\frac{k}{\operatorname{srank}(B)}}\right)^{2} \frac{\|B\|_F^2}{m}. \qquad (1)$$

Note that when $k$ is proportional to $\operatorname{srank}(B)$, Theorem 1.1 produces a submatrix whose squared least singular value is at least a constant times the average squared norm of the columns of $B$, a bound which cannot be improved even for $k = 1$. Thus, the theorem tells us that the columns of $B_S$ are “almost orthogonal” in that their squared least singular value is comparable to the average squared norm of the vectors individually.

To understand the form of the bound in (1), consider the case when $BB^T = I$, which is sometimes called the “isotropic” case. In this situation we have $\operatorname{srank}(B) = \|B\|_F^2 = n$, and the right hand side of (1) becomes

$$\left(1 - \sqrt{\frac{k}{n}}\right)^{2} \frac{n}{m}. \qquad (2)$$

The number $\left(1 - \sqrt{k/n}\right)^{2} \frac{n}{m}$ may seem familiar, and arises in the following two contexts.

  1. It is an asymptotically sharp lower bound on the least zero of the associated Laguerre polynomial $L_k^{(n-k)}$, after an appropriate scaling. In Section 3 we derive the isotropic case of Theorem 1.1 from this fact, using the method of interlacing families of polynomials (a version of this proof appeared in the authors’ survey paper [MSS14]). A small numerical check of the Laguerre bound appears after this list.

  2. It is the lower edge of the support of the Marchenko-Pastur distribution [MP67], which is the limiting spectral distribution of a sequence of random matrices $B_S^T B_S$, where $B_S$ is $n \times k$ with appropriately normalized i.i.d. Gaussian entries, as $n, k \to \infty$ with $k/n$ fixed. This convergence result along with large deviation estimates may be used to show that it is not possible to obtain a bound of

    $$\sigma_k(B_S)^2 \geq c\left(1 - \sqrt{\frac{k}{n}}\right)^{2} \frac{n}{m} \qquad (3)$$

    in Theorem 1.1 for any constant $c > 1$, when $m$ goes to infinity significantly faster than $n$. Thus, the bound of Theorem 1.1 is asymptotically sharp. See [Sri] for details.
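The following small script (not part of the original paper) illustrates item 1 numerically; it assumes numpy and scipy are available and that scipy's generalized Laguerre zeros use the standard normalization of $L_k^{(n-k)}$.

    # Illustrative check: compare the smallest zero of L_k^{(n-k)} with
    # (sqrt(n) - sqrt(k))^2 = n * (1 - sqrt(k/n))^2 for one choice of n and k.
    import numpy as np
    from scipy.special import roots_genlaguerre

    n, k = 100, 25
    zeros, _ = roots_genlaguerre(k, n - k)          # zeros of L_k^{(n-k)}
    print(zeros.min(), (np.sqrt(n) - np.sqrt(k)) ** 2)
    # the smallest zero should lie above the closed-form expression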

In Section 3 we present a proof of Theorem 1.1 in the isotropic case using the method of interlacing polynomials. This proof considers the expected characteristic polynomial of for a randomly chosen . In this first proof, we choose by sampling columns with replacement. This seems like a suboptimal thing to do since it may select a column twice (corresponding to a trivial bound of ), but it allows us to easily prove that the expected characteristic polynomial is an associated Laguerre polynomial and that the family of polynomials that arise in the expectation form an interlacing family, which we define below. Because these polynomials form an interlacing family, there is some polynomial in the family whose th largest zero is at least the th largest zero of the expected polynomial. The bound (2) then follows from lower bounds on the zeros of associated Laguerre polynomials.
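As an illustration of the sampling-with-replacement construction (not part of the original paper; it assumes numpy), the following script computes the exact average of the characteristic polynomials over all $m^k$ column sequences for a tiny isotropic example and compares it with $\left(1 - \frac{1}{m}\partial_x\right)^k x^n$, the formula we give for Lemma 3.1 in Section 3.

    import itertools
    import numpy as np

    n, m, k = 2, 3, 2
    theta = 2 * np.pi * np.arange(m) / m
    V = np.sqrt(2.0 / m) * np.stack([np.cos(theta), np.sin(theta)])   # columns v_i with sum v_i v_i^T = I

    avg = np.zeros(n + 1)
    for seq in itertools.product(range(m), repeat=k):
        avg += np.poly(sum(np.outer(V[:, i], V[:, i]) for i in seq))  # characteristic polynomial coefficients
    avg /= m ** k

    p = np.zeros(n + 1)
    p[0] = 1.0                                                        # the polynomial x^n
    for _ in range(k):
        q = p.copy()
        q[1:] -= np.polyder(p) / m                                    # apply (1 - (1/m) d/dx)
        p = q
    print(np.allclose(avg, p))                                        # expect True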

In Section 4, we extend this proof technique to show that Theorem 1.1 remains true in the nonisotropic case. In addition, we show a bound that replaces the stable rank with a Schatten 4-norm stable rank, defined by

$$\operatorname{srank}_4(B) := \left(\frac{\|B\|_{S_2}}{\|B\|_{S_4}}\right)^{4},$$

where $\|B\|_{S_p}$ denotes the Schatten $p$-norm, i.e., the $\ell_p$ norm of the singular values of $B$. That is,

$$\|B\|_{S_p} = \left(\sum_i \sigma_i^p\right)^{1/p},$$

where $\sigma_1, \dots, \sigma_n$ are the singular values of $B$. As

$$\operatorname{srank}_4(B) \geq \operatorname{srank}(B),$$

this is a strict improvement on Theorem 1.1. The above inequality is far from tight when $B$ has many moderately large singular values. In Section 4.1 we give a polynomial-time algorithm for finding the subset guaranteed to exist by Theorem 1.1.
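The following snippet (not part of the original paper; it assumes numpy) computes both notions of stable rank from the singular values, using the definitions as displayed above, and illustrates that the Schatten 4-norm stable rank is never smaller.

    import numpy as np

    rng = np.random.default_rng(6)
    B = rng.standard_normal((50, 200)) @ np.diag(rng.random(200))
    s = np.linalg.svd(B, compute_uv=False)
    srank = (s ** 2).sum() / s.max() ** 2              # ||B||_F^2 / ||B||^2
    srank4 = (s ** 2).sum() ** 2 / (s ** 4).sum()      # (||B||_{S_2} / ||B||_{S_4})^4
    print(srank, srank4, srank4 >= srank)              # expect srank4 >= srank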

In Section 5, we improve on Theorem 1.1 in the isotropic case by sampling the sets $S$ without replacement. We show that the resulting expected characteristic polynomials are scaled Jacobi polynomials. We then derive a new bound on the smallest zeros of Jacobi polynomials which implies that there exists a set $S$ of $k$ columns for which

$$\sigma_k(B_S)^2 \geq \left(\sqrt{\frac{n}{m}\left(1 - \frac{k}{m}\right)} - \sqrt{\frac{k}{m}\left(1 - \frac{n}{m}\right)}\right)^{2}. \qquad (4)$$

As

$$\left(\sqrt{\frac{n}{m}\left(1 - \frac{k}{m}\right)} - \sqrt{\frac{k}{m}\left(1 - \frac{n}{m}\right)}\right)^{2} \geq \left(1 - \sqrt{\frac{k}{n}}\right)^{2} \frac{n}{m},$$

this improves on Theorem 1.1 by a constant factor when $m$ is a constant multiple of $n$. Note that this does not contradict the lower bound (3) from [Sri], which requires that $m$ grow significantly faster than $n$.
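As a rough illustration (not part of the original paper, and using the form of (4) as displayed above, which should be read as a sketch of the bound rather than its definitive statement; it assumes numpy), the following snippet compares the two closed-form expressions when $m$ is a small multiple of $n$.

    import numpy as np

    n, m = 100, 300
    k = np.arange(1, n)
    laguerre_bound = (1 - np.sqrt(k / n)) ** 2 * n / m                    # bound (2)
    jacobi_bound = (np.sqrt((n / m) * (1 - k / m))
                    - np.sqrt((k / m) * (1 - n / m))) ** 2                # bound (4), as written above
    print(np.all(jacobi_bound >= laguerre_bound),
          (jacobi_bound / laguerre_bound).min())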

A number of the results in this paper require a bound on the smallest root of a polynomial. In order to be as self-contained as possible, we will either prove such bounds directly or quote the best known bound from the literature. It is worth noting, however, that a more generic way of proving each of these bounds is provided by the framework of polynomial convolutions developed in [MSS15a]. The necessary inequalities in [MSS15a] are known to be asymptotically tight (as shown in [Mar15]) and in some cases improve on the bounds given here.

2 Preliminaries

2.1 Notation

We denote the Euclidean norm of a vector $v$ by $\|v\|$. We denote the operator norm of a matrix $B$ by

$$\|B\| := \max_{\|v\| = 1} \|Bv\|.$$

This also equals the largest singular value of the matrix $B$. The Frobenius norm of $B$, also known as the Hilbert-Schmidt norm and written $\|B\|_F$, is the square root of the sum of the squares of the singular values of $B$. It is also equal to the square root of the sum of the squares of the entries of $B$.

For a real-rooted polynomial $p$, we let $\lambda_k(p)$ denote the $k$th largest zero of $p$. When we want to refer to the smallest zero of a polynomial without specifying its degree, we will call it $\lambda_{\min}(p)$. We define the $k$th elementary symmetric function of an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \dots, \lambda_n$ to be the $k$th elementary symmetric function of those eigenvalues:

$$e_k(A) := \sum_{|T| = k} \prod_{i \in T} \lambda_i.$$

Thus, the characteristic polynomial of $A$ may be expressed as

$$\chi[A](x) := \det(xI - A) = \sum_{k=0}^{n} (-1)^k e_k(A)\, x^{n-k}.$$

By inspecting the Leibniz expression for the determinant in terms of permutations, it is easy to see that the $e_k$ may also be expanded in terms of $k \times k$ minors. The Cauchy-Binet identity says that for every $n \times m$ matrix $B$,

$$e_k\!\left(BB^T\right) = \sum_{|S| = k} \det\!\left(B_S^T B_S\right),$$

where $S$ ranges over all subsets of size $k$ of indices in $[m]$, and $B_S$ denotes the matrix formed by the columns of $B$ specified by $S$.
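A direct numerical check of this Cauchy-Binet expansion (not part of the original paper; it assumes numpy) is easy to carry out for a small random matrix.

    import itertools
    import numpy as np

    n, m, k = 3, 6, 2
    B = np.random.default_rng(5).standard_normal((n, m))
    char = np.poly(B @ B.T)                        # [1, -e_1, e_2, -e_3]
    e_k = (-1) ** k * char[k]
    total = sum(np.linalg.det(B[:, list(S)].T @ B[:, list(S)])
                for S in itertools.combinations(range(m), k))
    print(np.isclose(e_k, total))                  # expect True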

We will use the following two formulas to calculate determinants and characteristic polynomials of matrices. You may prove them yourself, or find proofs in [Mey00, Chapter 6] or [Har97, Section 15.8].

Lemma 2.1.

For any invertible matrix $A$ and vector $v$,

$$\det\!\left(A + vv^T\right) = \det(A)\left(1 + v^T A^{-1} v\right).$$

Lemma 2.2 (Jacobi’s formula).

For any square matrices $A$ and $B$,

$$\partial_t \det(A + tB) = \operatorname{Tr}\!\left(\operatorname{adj}(A + tB)\, B\right).$$

We also use the following consequence of these formulas that was derived in [MSS15c, Lemma 4.2].

Lemma 2.3.

For every square matrix $A$ and random vector $r$,

$$\mathbb{E}\,\det\!\left(A - rr^T\right) = \left(1 - \partial_t\right)\det\!\left(A + t\,\mathbb{E}\!\left[rr^T\right]\right)\Big|_{t=0}.$$
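The identity of Lemma 2.3 (in the form stated above) can be checked numerically as follows; this is not part of the original paper, assumes numpy, and uses the evaluation $(1-\partial_t)\det(A + tM)\big|_{t=0} = \det(A) - \operatorname{Tr}(\operatorname{adj}(A)M)$, which follows from Lemma 2.2.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    support = [rng.standard_normal(4), rng.standard_normal(4)]   # r is uniform on these two vectors

    lhs = np.mean([np.linalg.det(A - np.outer(r, r)) for r in support])
    M = np.mean([np.outer(r, r) for r in support], axis=0)       # E[r r^T]
    adj_A = np.linalg.det(A) * np.linalg.inv(A)                  # adjugate of the (invertible) matrix A
    rhs = np.linalg.det(A) - np.trace(adj_A @ M)
    print(np.isclose(lhs, rhs))                                   # expect True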

2.2 Barrier Functions

We bound the zeros of the polynomials we construct by using the barrier function arguments developed in [BSS12], [MSS15c], and [MSS15a]. For , we define the of a polynomial to be the least root of , where indicates partial differentiation with respect to . We sometimes write this in the compact form

We may also define this in terms of the lower barrier function

by observing

As

we see that is less than the least root of .

The following claim is elementary.

Claim 2.4.

For ,

2.3 Interlacing Families

We use the method of interlacing families of polynomials developed in [MSS15b, MSS15c] to relate the zeros of sums of polynomials to individual polynomials in the sum. The results in Section 5 will require the following variant of the definition, which is more general than the ones we have used previously.

Definition 2.5.

An interlacing family consists of a finite rooted tree $T$ and a labeling of its nodes $v \in T$ by monic real-rooted polynomials $p_v(x)$, with two properties:

  1. Every polynomial $p_v$ corresponding to a non-leaf node $v$ is a convex combination of the polynomials corresponding to the children of $v$.

  2. For all nodes $v_1$ and $v_2$ with a common parent, all convex combinations of $p_{v_1}$ and $p_{v_2}$ are real-rooted. (This condition implies that all convex combinations of all of the children of a node are real-rooted; the equivalence is discussed in [MSS14].)

We say that a set of polynomials is an interlacing family if they are the labels of the leaves of such a tree.

In the applications in this paper, the leaves of the tree will naturally correspond to elements of a probability space, and the internal nodes will correspond to conditional expectations of the corresponding polynomials over this probability space.

In Sections 3 and 4, as in [MSS15b, MSS15c], we consider interlacing families in which the nodes of the tree at distance $j$ from the root are indexed by sequences $(u_1, \dots, u_j) \in [m]^j$. We denote the empty sequence and the root node of the tree by $\emptyset$.

The leaves of the tree correspond to sequences of length $k$, and each is labeled by a polynomial $p_{u_1, \dots, u_k}(x)$. Each intermediate node is labeled by the average of the polynomials labeling its children. So, for $j < k$,

$$p_{u_1, \dots, u_j}(x) = \frac{1}{m} \sum_{u_{j+1} = 1}^{m} p_{u_1, \dots, u_j, u_{j+1}}(x),$$

and

$$p_{\emptyset}(x) = \frac{1}{m^k} \sum_{u_1, \dots, u_k \in [m]} p_{u_1, \dots, u_k}(x).$$

A fortunate choice of polynomials labeling the leaves yields an interlacing family.

Theorem 2.6 (Theorem 4.5 of [MSS15c]).

Let $v_1, \dots, v_m$ be vectors in $\mathbb{R}^n$ and let

$$p_{u_1, \dots, u_k}(x) = \chi\!\left[\sum_{i=1}^{k} v_{u_i} v_{u_i}^T\right]\!(x) \qquad \text{for } u_1, \dots, u_k \in [m].$$

Then, these polynomials form an interlacing family.

Interlacing families are useful because they allow us to relate the zeros of the polynomial labeling the root to those labeling the leaves. In particular, we will prove the following slight generalization of Theorem 4.4 of [MSS15b].

Theorem 2.7.

Let $\{p_v\}$ be an interlacing family of degree $n$ polynomials with root labeled by $p_{\emptyset}$ and leaves by $p_{\ell_1}, \dots, p_{\ell_t}$. Then for all indices $k \leq n$, there exist leaves $\ell_a$ and $\ell_b$ such that

$$\lambda_k\!\left(p_{\ell_a}\right) \geq \lambda_k\!\left(p_{\emptyset}\right) \geq \lambda_k\!\left(p_{\ell_b}\right). \qquad (5)$$

To prove this theorem, we first explain why we call these families “interlacing”.

Definition 2.8.

We say that a polynomial $g(x) = \prod_{i=1}^{n-1} (x - \alpha_i)$ interlaces a polynomial $f(x) = \prod_{i=1}^{n} (x - \beta_i)$ if

$$\beta_n \leq \alpha_{n-1} \leq \beta_{n-1} \leq \cdots \leq \alpha_1 \leq \beta_1.$$

We say that polynomials $f_1, \dots, f_t$ have a common interlacing if there is a single polynomial $g$ that interlaces $f_i$ for each $i$.

The common interlacing assertions in this paper stem from the following fundamental example.

Claim 2.9.

Let $A$ be an $n$-dimensional symmetric matrix and let $v_1, \dots, v_t$ be vectors in $\mathbb{R}^n$. Then the polynomials

$$\chi\!\left[A + v_i v_i^T\right]\!(x), \qquad i = 1, \dots, t,$$

have a common interlacing.

Proof.

Let . For any , let . Cauchy’s interlacing theorem tells us that

So, for sufficiently large , interlaces each . ∎

The connection between interlacing and the real-rootedness of convex combinations is given by the following theorem (see [Ded92], [Fel80, Theorem ], and [CS07, Theorem 3.6]).

Theorem 2.10.

Let $f_1, \dots, f_t$ be real-rooted (univariate) polynomials of the same degree with positive leading coefficients. Then $f_1, \dots, f_t$ have a common interlacing if and only if $\sum_i \mu_i f_i$ is real-rooted for every convex combination, i.e., for all $\mu_i \geq 0$ with $\sum_i \mu_i = 1$.
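The following small experiment (not part of the original paper; it assumes numpy) combines Claim 2.9 with Theorem 2.10: a random convex combination of the polynomials $\chi[A + v_i v_i^T]$ should have only real roots.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 5
    A = rng.standard_normal((d, d))
    A = (A + A.T) / 2                                  # a symmetric matrix
    polys = [np.poly(A + np.outer(v, v)) for v in rng.standard_normal((3, d))]
    w = rng.random(3)
    w /= w.sum()                                       # a random convex combination
    mix = sum(wi * p for wi, p in zip(w, polys))
    print(np.max(np.abs(np.roots(mix).imag)))          # expect (numerically) zero: all roots real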

Theorem 2.7 follows from an inductive application of the following lemma, which generalizes the case $k = 1$ that was proven in [MSS15b, MSS15c].

Lemma 2.11.

Let $f_1, \dots, f_t$ be real-rooted degree $n$ polynomials that have a common interlacing. Then for every index $k$ and for every choice of nonnegative $\mu_1, \dots, \mu_t$ such that $\sum_i \mu_i = 1$, there exist an $i$ and a $j$ so that

$$\lambda_k\!\left(f_i\right) \geq \lambda_k\!\left(\sum_{l=1}^{t} \mu_l f_l\right) \geq \lambda_k\!\left(f_j\right).$$

Proof.

By restricting our attention to the polynomials $f_i$ for which $\mu_i$ is positive, we may assume without loss of generality that $\mu_i$ is positive for every $i$. Define

and let

We seek and for which .

Let

be a polynomial that interlaces every . As each has a positive leading coefficient, we know that is at least for odd and at most for even.

We first consider the case in which do not have any zero in common. In this case, : if some then for all . Moreover, there must be some for which is nonzero. As all the are positive, is positive for odd and negative for even.

As there must be some for which , there must be an for which and a for which . We now show that if is odd, then . As is nonnegative, must have a zero between and . As interlaces , this is the th largest zero of . Similarly, the nonpositivity of implies that has a zero between and . This must be the th largest zero of .

The case of even is symmetric, except that we reverse the choice of and .

We finish by observing that it suffices to consider the case in which do not have any zero in common. If they do, we let be their greatest common divisor, define , and observe that do not have any common zeros. Thus, we may apply the above argument to these polynomials. As multiplying all the polynomials by adds the same zeros to for and , the theorem holds for these as well. ∎

Proof of Theorem 2.7.

For every node in the tree defining an interlacing family, the subtree rooted at that node and the polynomials on the nodes of that tree form an interlacing family of their own. Thus, we may prove the theorem by induction on the height of the tree. Lemma 2.11 handles trees of height one.

For trees of greater height, Lemma 2.11 tells us that there are children $v_a$ and $v_b$ of the root that satisfy (5). If $v_a$ is not a leaf, then it is the root of its own interlacing family, and the inductive hypothesis tells us that this family has a leaf node $\ell_a$ for which

$$\lambda_k\!\left(p_{\ell_a}\right) \geq \lambda_k\!\left(p_{v_a}\right) \geq \lambda_k\!\left(p_{\emptyset}\right).$$

The same holds for $v_b$. ∎
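To see Theorems 2.6 and 2.7 in action on a toy instance (not part of the original paper; it assumes numpy), one can enumerate all leaves of the tree described in Section 2.3, average them to obtain the root polynomial, and confirm that some leaf has a $k$th largest zero at least that of the root.

    import itertools
    import numpy as np

    rng = np.random.default_rng(2)
    n, m, k = 3, 4, 2
    V = rng.standard_normal((n, m))
    leaves = [np.poly(sum(np.outer(V[:, i], V[:, i]) for i in seq))
              for seq in itertools.product(range(m), repeat=k)]
    root = np.mean(leaves, axis=0)

    def kth_largest_zero(p, j):
        return np.sort(np.roots(p).real)[::-1][j - 1]

    print(max(kth_largest_zero(p, k) for p in leaves) >= kth_largest_zero(root, k))   # expect True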

3 The Isotropic Case with Replacement: Laguerre Polynomials

We now prove Theorem 1.1 in the isotropic case. Let the columns of $B$ be the vectors $v_1, \dots, v_m \in \mathbb{R}^n$. The condition $BB^T = I$ is equivalent to $\sum_{i=1}^{m} v_i v_i^T = I$. For a set $S \subseteq [m]$ of size $k$,

$$\sigma_k(B_S)^2 = \lambda_k\!\left(\sum_{i \in S} v_i v_i^T\right).$$

We now consider the expected characteristic polynomial of the sum of the outer products of $k$ of these vectors chosen uniformly at random, with replacement. We indicate one such polynomial by a vector of indices such as $(u_1, \dots, u_k)$, where we recall that each $u_i \in [m]$. These are the leaves of the tree in the interlacing family:

$$p_{u_1, \dots, u_k}(x) = \chi\!\left[\sum_{i=1}^{k} v_{u_i} v_{u_i}^T\right]\!(x).$$

As in Section 2.3, the intermediate nodes in the tree are labeled by subsequences of this form, and the polynomial at the root of the tree is

$$p_{\emptyset}(x) = \frac{1}{m^k} \sum_{u_1, \dots, u_k \in [m]} p_{u_1, \dots, u_k}(x) = \mathbb{E}_{u_1, \dots, u_k}\left[p_{u_1, \dots, u_k}(x)\right].$$

We now derive a formula for $p_{\emptyset}(x)$.

Lemma 3.1.

$$p_{\emptyset}(x) = \left(1 - \frac{1}{m}\,\partial_x\right)^{k} x^{n}.$$

Proof.

For every $0 \leq j \leq k$, define

$$p_j(x) = \mathbb{E}_{u_1, \dots, u_j}\,\chi\!\left[\sum_{i=1}^{j} v_{u_i} v_{u_i}^T\right]\!(x).$$

We will prove by induction on $j$ that

$$p_j(x) = \left(1 - \frac{1}{m}\,\partial_x\right)^{j} x^{n}.$$

The base case of $j = 0$ is trivial. To establish the induction, we use Lemma 2.1, the identity $\sum_{i=1}^{m} v_i v_i^T = I$, and Lemma 2.2 to compute

$$\mathbb{E}_{u_{j+1}}\,\det\!\left(xI - \sum_{i \leq j} v_{u_i} v_{u_i}^T - v_{u_{j+1}} v_{u_{j+1}}^T\right) = \left(1 - \frac{1}{m}\,\partial_x\right)\det\!\left(xI - \sum_{i \leq j} v_{u_i} v_{u_i}^T\right).$$

For $k \leq n$, the polynomial $p_{\emptyset}(x)$ is divisible by $x^{n-k}$. So, the $k$th largest root of $p_{\emptyset}$ is equal to the smallest root of

$$x^{-(n-k)} \left(1 - \frac{1}{m}\,\partial_x\right)^{k} x^{n}.$$

To bound the smallest root of this polynomial, we observe that it is a slight transformation of an associated Laguerre polynomial. We use the definition of the associated Laguerre polynomial of degree $k$ and parameter $\alpha$, $L_k^{(\alpha)}(x)$, given by Rodrigues’ formula [Sze39, (5.1.5)]:

$$L_k^{(\alpha)}(x) = \frac{x^{-\alpha} e^{x}}{k!}\,\frac{d^{k}}{dx^{k}}\left(e^{-x} x^{k+\alpha}\right).$$

Thus,

$$\left(1 - \frac{1}{m}\,\partial_x\right)^{k} x^{n} = \frac{(-1)^k\, k!}{m^{k}}\, x^{n-k}\, L_k^{(n-k)}(m x),$$

so the $k$th largest root of $p_{\emptyset}$ equals $\frac{1}{m}\,\lambda_{\min}\!\left(L_k^{(n-k)}\right)$.
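The displayed identity can be verified symbolically for small parameters; the following check is not part of the original paper, assumes sympy, and assumes that sympy's assoc_laguerre follows the standard normalization used in Rodrigues' formula above.

    import sympy as sp

    x = sp.symbols('x')
    n, m, k = 6, 9, 3
    p = x ** n
    for _ in range(k):
        p = sp.expand(p - sp.diff(p, x) / m)                  # apply (1 - (1/m) d/dx)
    claimed = sp.expand((-1) ** k * sp.factorial(k) / m ** k
                        * x ** (n - k) * sp.assoc_laguerre(k, n - k, m * x))
    print(sp.simplify(p - claimed) == 0)                       # expect True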

We now employ a lower bound on the smallest root of associated Laguerre polynomials due to Krasikov [Kra06, Theorem 1].

Theorem 3.2.

For ,

where and .

Corollary 3.3.

For $k \leq n$,

$$\lambda_{\min}\!\left(L_k^{(n-k)}\right) \geq \left(\sqrt{n} - \sqrt{k}\right)^{2}.$$
Proof.

Applying Theorem 3.2 with and thus

gives

Lemma 3.1 and Corollary 3.3 are all one needs to establish Theorem 1.1 in the isotropic case. Theorems 2.6 and 2.7 tell us that there exists a sequence $u_1, \dots, u_k$ for which

$$\lambda_k\!\left(p_{u_1, \dots, u_k}\right) \geq \lambda_k\!\left(p_{\emptyset}\right) \geq \frac{1}{m}\left(\sqrt{n} - \sqrt{k}\right)^{2} = \left(1 - \sqrt{\frac{k}{n}}\right)^{2} \frac{n}{m}.$$

As $p_{u_1, \dots, u_k}$ is the characteristic polynomial of $\sum_{i=1}^{k} v_{u_i} v_{u_i}^T$, this sequence must consist of distinct elements. If not, then the matrix in the sum would have rank at most $k - 1$ and thus $\lambda_k\!\left(p_{u_1, \dots, u_k}\right) = 0$. So, we conclude that there exists a set $S$ of size $k$ for which

$$\lambda_k\!\left(\sum_{i \in S} v_i v_i^T\right) \geq \left(1 - \sqrt{\frac{k}{n}}\right)^{2} \frac{n}{m}.$$
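The conclusion just derived can be confirmed by brute force on a tiny example (not part of the original paper; it assumes numpy): one simply checks every subset of size $k$.

    import itertools
    import numpy as np

    rng = np.random.default_rng(3)
    n, m, k = 3, 7, 2
    G = rng.standard_normal((n, m))
    B = np.linalg.inv(np.linalg.cholesky(G @ G.T)) @ G          # now B B^T = I
    bound = (1 - np.sqrt(k / n)) ** 2 * n / m
    best = max(np.linalg.svd(B[:, list(S)], compute_uv=False)[k - 1] ** 2
               for S in itertools.combinations(range(m), k))
    print(best, bound, best >= bound)                            # expect True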

4 The Nonisotropic Case and the Schatten 4-norm

In this section we prove the promised strengthening of Theorem 1.1 in terms of the Schatten 4-norm. In the proof it will be more natural to work with eigenvalues of $BB^T$ rather than singular values of $B$ (and its submatrices). For a symmetric matrix $A$, we define

$$\operatorname{srank}_4(A) := \frac{\left(\operatorname{Tr} A\right)^{2}}{\operatorname{Tr}\!\left(A^{2}\right)}, \qquad (6)$$

so that $\operatorname{srank}_4(B) = \operatorname{srank}_4\!\left(BB^T\right)$.

With this definition and the change of notation from the matrix $B$ to its columns $v_1, \dots, v_m$, the theorem may be stated as follows.

Theorem 4.1.

Suppose $v_1, \dots, v_m$ are vectors with $\sum_{i=1}^{m} v_i v_i^T = A$. Then for every integer $k < \operatorname{srank}_4(A)$, there exists a set $S \subseteq [m]$ of size $k$ with

$$\lambda_k\!\left(\sum_{i \in S} v_i v_i^T\right) \geq \left(1 - \sqrt{\frac{k}{\operatorname{srank}_4(A)}}\right)^{2} \frac{\operatorname{Tr} A}{m}. \qquad (7)$$

We prove this theorem by examining the same interlacing family as in the previous section. As we are no longer in the isotropic case, we need to recalculate the polynomial at the root of the tree, which will not necessarily be a Laguerre polynomial. We give the formula for general random vectors with finite support, but will apply it to the special case in which each random vector is uniformly chosen from $\{v_1, \dots, v_m\}$.

Lemma 4.2.

Let $r$ be a random $n$-dimensional vector with finite support. If $r_1, \dots, r_k$ are i.i.d. copies of $r$, then

$$\mathbb{E}\,\chi\!\left[\sum_{i=1}^{k} r_i r_i^T\right]\!(x) = \left(1 - \partial_t\right)^{k} \prod_{j=1}^{n}\left(x + t\,\mu_j\right)\Bigg|_{t=0},$$

where $\mu_1, \dots, \mu_n$ are the eigenvalues of $\mathbb{E}\!\left[rr^T\right]$.

Proof.

Let $M = \mathbb{E}\!\left[rr^T\right]$. By introducing variables $t_1, \dots, t_k$ and applying Lemma 2.3 $k$ times, we obtain

$$\mathbb{E}\,\chi\!\left[\sum_{i=1}^{k} r_i r_i^T\right]\!(x) = \prod_{i=1}^{k}\left(1 - \partial_{t_i}\right)\det\!\left(xI + \left(t_1 + \cdots + t_k\right) M\right)\Bigg|_{t_1 = \cdots = t_k = 0}.$$

By computing

$$\det\!\left(xI + s M\right) = \prod_{j=1}^{n}\left(x + s\,\mu_j\right),$$

we simplify the above expression to

$$\prod_{i=1}^{k}\left(1 - \partial_{t_i}\right) \prod_{j=1}^{n}\left(x + \left(t_1 + \cdots + t_k\right)\mu_j\right)\Bigg|_{t_1 = \cdots = t_k = 0}.$$

Since the expression being differentiated depends on $t_1, \dots, t_k$ only through their sum, each $\partial_{t_i}$ acts in the same way as differentiation in the single variable $t = t_1 + \cdots + t_k$, so we can rewrite this as

$$\left(1 - \partial_t\right)^{k} \prod_{j=1}^{n}\left(x + t\,\mu_j\right)\Bigg|_{t=0},$$

as desired. ∎
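The formula of Lemma 4.2 (in the form stated above) can also be checked numerically; the following script is not part of the original paper and assumes numpy and sympy.

    import itertools
    import numpy as np
    import sympy as sp

    rng = np.random.default_rng(4)
    n, k = 3, 2
    support = [rng.standard_normal(n) for _ in range(3)]      # r is uniform on these three vectors

    # left-hand side: exact expectation over all 3^k outcomes
    lhs = np.zeros(n + 1)
    for seq in itertools.product(support, repeat=k):
        lhs += np.poly(sum(np.outer(r, r) for r in seq))
    lhs /= 3 ** k

    # right-hand side: (1 - d/dt)^k prod_j (x + t*mu_j) evaluated at t = 0
    mu = np.linalg.eigvalsh(np.mean([np.outer(r, r) for r in support], axis=0))
    x, t = sp.symbols('x t')
    q = sp.Integer(1)
    for mu_j in mu:
        q *= (x + t * sp.Float(mu_j))
    for _ in range(k):
        q = sp.expand(q - sp.diff(q, t))
    rhs = [float(c) for c in sp.Poly(q.subs(t, 0), x).all_coeffs()]
    print(np.allclose(lhs, rhs))                               # expect True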

We now require a lower bound on the smallest zero of . We will use the following lemma, which tells us that the of a polynomial grows in a controlled way as a function of when we apply a operator to it. This is similar to Lemma 3.4 of [BSS12], which was written in the language of random rank one updates of matrices.

Lemma 4.3.

If is a real-rooted polynomial and , then is real-rooted and

Proof.

It is well known that is real-rooted: see [PS76, Problem V.1.18], [Mar49, Corollary 18.1], or [MSS14, Lemma 3.7] for a proof. To see that for , recall that . So, both and have the same single sign for all , and thus cannot be zero there.

Let $p$ have degree $n$ and zeros $\lambda_1 \geq \cdots \geq \lambda_n$. Let , so that . To prove the claim it is enough to show for

that and that

(8)

The first statement is true because

so .

We begin our proof of the second statement by expressing in terms of and :

(9)

wherever all quantities are finite, which happens everywhere except at the zeros of and . Since is strictly below the zeros of both, it follows that:

After replacing by and rearranging terms (noting the positivity of ), we see that (8) is equivalent to

We now finish the proof by expanding and in terms of the zeros of :

as all terms are positive
as

Theorem 4.4.

Let $r$ be a random $n$-dimensional vector with finite support such that , let $r_1, \dots, r_k$ be i.i.d. copies of $r$, and let

Then

where $\operatorname{srank}_4$ is defined as in (6).

Proof.

By multiplying through by a constant, we may assume without loss of generality that . In this case, we need to prove

Let $\mu_1, \dots, \mu_n$ be the eigenvalues of $\mathbb{E}\!\left[rr^T\right]$, so that Lemma 4.2 implies

Applying Lemma 4.3 times for any yields

by Claim 2.4.

To lower bound this expression, observe that the function