Subset Selection for Matrices with Fixed Blocks
Abstract.
Subset selection for matrices is the task of extracting a column submatrix from a given matrix with such that the pseudoinverse of the sampled matrix has as small a Frobenius or spectral norm as possible. In this paper, we consider the more general problem of subset selection for matrices in which a block is fixed at the beginning. Under this setting, we provide a deterministic method for selecting a column submatrix from . We also present a bound for both the Frobenius and the spectral matrix norms of the pseudoinverse of the sampled matrix and show that the bound is asymptotically optimal. The main tool for proving this result is the method of interlacing families of polynomials developed by Marcus, Spielman and Srivastava. This idea also yields a deterministic greedy selection algorithm that produces the submatrix promised by our result.
1. Introduction
1.1. Subset selection for matrices
Subset selection for matrices aims to select a column submatrix from a given matrix with such that the sampled matrix is well-conditioned. For convenience, we will assume is full-rank, i.e., . Given , the cardinality of the set is denoted by . We use to denote the submatrix of obtained by extracting the columns of indexed by and use to denote the Moore-Penrose pseudoinverse of . Let be a sampling parameter. We can state the subset selection problem for matrices as follows:
Problem 1.1.
Find a subset with cardinality at most such that and is minimized, i.e.,
where , denotes the spectral and the Frobenius matrix norm, respectively.
Problem 1.1 arises in many applied areas such as preconditioning for solving linear systems [1], sensor selection [18], graph signal processing [12, 34], and feature selection in k-means clustering [7, 8]. In [2], Avron and Boutsidis show an interesting connection between Problem 1.1 and the combinatorial problem of finding a low-stretch spanning tree in an undirected graph. The subset selection problem has also been studied in the statistics literature. For instance, for , the solution to Problem 1.1 is a statistically optimal design for linear regression [15, 28].
One simple method for solving Problem 1.1 is to evaluate the performance of all possible subsets of size , but this is evidently computationally expensive unless or is very small. In [14], Çivril and Magdon-Ismail study the complexity of the spectral norm version of Problem 1.1 and show that this problem is NP-hard. Several heuristics have been proposed to approximately solve the subset selection problem. Section 1.3 provides a summary of known results from prior literature.
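The exhaustive method mentioned above can be sketched as follows. This is only an illustration, not the algorithm of this paper: the helper name `brute_force_subset` is hypothetical, and for simplicity it searches subsets of exactly size `k` rather than of size at most `k`. Its cost grows like the binomial coefficient "m choose k", which is why it is practical only for very small instances.

```python
import itertools
import numpy as np

def brute_force_subset(B, k, norm="fro"):
    """Exhaustive search over column subsets of size k (hypothetical helper).

    Returns the subset S minimising the chosen norm of pinv(B[:, S]) among
    all subsets for which B[:, S] keeps the full row rank of B.
    """
    n, m = B.shape
    ord_ = "fro" if norm == "fro" else 2   # 2 = spectral norm
    best_S, best_val = None, np.inf
    for S in itertools.combinations(range(m), k):
        B_S = B[:, list(S)]
        if np.linalg.matrix_rank(B_S) < n:
            continue                        # rank condition violated
        val = np.linalg.norm(np.linalg.pinv(B_S), ord_)
        if val < best_val:
            best_S, best_val = S, val
    return best_S, best_val

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 8))             # small n x m instance, m > n
S, val = brute_force_subset(B, k=4)
```

Even for this 3 x 8 example the loop visits 70 subsets; the count becomes prohibitive quickly as the number of columns grows.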
1.2. Our contribution
In this paper we consider a generalized version of subset selection for matrices in which a matrix is fixed at the outset and is then complemented by adding columns of such that has as small a Frobenius or spectral norm as possible. Usually, is chosen as a column submatrix of . Keeping a fixed block of is useful if we already know that such a block has some distinguished properties. We state the problem as follows:
Problem 1.2.
Suppose that and with and . Find a subset with cardinality at most such that and is minimized, i.e.,
where , denotes the spectral and the Frobenius matrix norm, respectively.
We would like to mention that the Frobenius norm version of Problem 1.2 was considered in [33]. If we take , then Problem 1.2 reduces to Problem 1.1. Hence, the results presented in this paper also provide a solution to Problem 1.1. We next state the main result of this paper. For convenience, for , throughout this paper we set
(1) 
We have the following result for Problem 1.2.
Theorem 1.3.
Suppose that and with and . Then for any fixed , there exists a subset with cardinality such that is full-rank and for both ,
(2) 
The proof of Theorem 1.3 provides a deterministic algorithm for computing the subset in time where is the exponent of matrix multiplication. We will introduce it in Section 4.
If we take in Theorem 1.3, then we can obtain the following corollary:
Corollary 1.4.
Suppose that with . Then for any fixed , there exists a subset with cardinality such that and for both ,
1.3. Related work
We now give a summary of known results regarding both Problem 1.1 and Problem 1.2 and provide comparisons between our result and the known results.
1.3.1. Lower bounds
The lower bound is defined as the nonnegative number such that for every of cardinality , there exists a matrix satisfying
The lower bounds for Problem 1.1 have been developed in [2]. For , Theorem in [2] shows the bound is ; for and , Theorem in [2] shows the bound is . The approximation bound in Corollary 1.4 asymptotically matches those bounds.
1.3.2. Restricted invertibility principle
The restricted invertibility problem asks whether one can select a large number of linearly independent columns of and provide an estimate for the norm of the restricted inverse. To be more precise, one wants to find a subset , with cardinality as large as possible, such that for all and to estimate the constant . In [6], Bourgain and Tzafriri study the restricted invertibility problem and show its applications in geometry and analysis. Their results were later improved in [30, 32, 29]. In [24], Marcus, Spielman and Srivastava employ the method of interlacing families of polynomials to sharpen this result and present a simple proof of the restricted invertibility principle. See [27] for a survey of recent developments in restricted invertibility.
Problem 1.1 differs from the restricted invertibility problem. In Problem 1.1, we require , while in the restricted invertibility problem one only considers the case where . Our proof of Theorem 1.3 is inspired by the method used by Marcus, Spielman and Srivastava [24] to prove the restricted invertibility principle. We will introduce the main idea of the proof in Section 1.4.
1.3.3. Approximation bounds for
We first focus on and with presenting known bounds for the approximation ratio
In [2, 16, 17], the authors develop a greedy removal algorithm where one “bad” column of is removed at each step. They show that this algorithm can find a subset such that in time. If is fixed, the approximation bound in [2, 16, 17] is which is the same as that of Corollary 1.4.
In [33], Youssef considers the Frobenius norm version of Problem 1.2. Let be the fixed matrix which is chosen at the beginning. Theorem in [33] shows that for any sampling parameter , one can produce a subset in time together with an upper bound of . Note that
and hence . Hence Theorem 1.3 applies to a wider range of the sampling parameter .
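To make the idea of greedy removal concrete, the following is a naive sketch in which, at each step, the column whose removal leaves the smallest Frobenius norm of the pseudoinverse is dropped. It is not the algorithm of [2, 16, 17]: those works choose the column to remove by their own criterion and use efficient updates, whereas this version simply recomputes the pseudoinverse from scratch. The function name `greedy_removal` is hypothetical, and the sketch assumes that the target size `k` is at least the number of rows.

```python
import numpy as np

def greedy_removal(B, k):
    """Naive greedy removal (hypothetical helper, assumes k >= B.shape[0]).

    Starts from all columns and repeatedly drops the column whose removal
    keeps the Frobenius norm of the pseudoinverse as small as possible,
    never dropping a column that would reduce the rank below n.
    """
    n, m = B.shape
    S = list(range(m))
    while len(S) > k:
        best_j, best_val = None, np.inf
        for j in S:
            T = [i for i in S if i != j]
            B_T = B[:, T]
            if np.linalg.matrix_rank(B_T) < n:
                continue                      # removing j would lose rank
            val = np.linalg.norm(np.linalg.pinv(B_T), "fro")
            if val < best_val:
                best_j, best_val = j, val
        if best_j is None:                    # no admissible removal left
            break
        S.remove(best_j)
    return S

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 10))
S = greedy_removal(B, k=5)
```

Each outer step costs one pseudoinverse per remaining column, so this sketch is only meant to show the selection rule on small inputs.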
1.3.4. Approximation bounds for
For , Corollary in [2] gives an algorithm for computing which runs in time and achieves the bound
(3) 
If is fixed, the asymptotic bound in (3) is which is larger than that in Corollary 1.4. For the spectral norm, to our knowledge, Problem 1.2 has not been considered in previous work, and Theorem 1.3 provides the first approximation bound as well as the first deterministic algorithm for Problem 1.2.
1.3.5. Approximation bounds for both
In [2], a deterministic algorithm is also presented for both . The algorithm, which runs in time, outputs a set with satisfying
(4) 
Noting that
we obtain that
Hence our result in Corollary 1.4 improves the bound in (4). In particular, when tends to , the approximation bound in (4) goes to infinity while remains finite. Hence, the bound is far better than the one in (4) when is close to .
1.3.6. Algorithms
Many randomized algorithms have been developed for solving Problem 1.1 (see [2]). In this paper, we focus on deterministic algorithms. Motivated by the proof of Theorem 1.3, we introduce a deterministic algorithm in Section 4 which outputs a subset such that
for any fixed . As shown in Theorem 4.1, the complexity of the algorithm is where is the exponent of matrix multiplication. We emphasize that our algorithm is faster than all of the algorithms mentioned in Section 1.3.3 and Section 1.3.4 when is large enough, since the computational cost of each of those algorithms contains a factor , while the time complexity of our algorithm is linear in .
Note that the time complexity of the algorithm mentioned in Section 1.3.5 is much better than that of our algorithm. However, as noted above, the approximation bound obtained by our algorithm is far better than the one provided by the algorithm mentioned in Section 1.3.5. Moreover, our algorithm can solve both Problem 1.1 and Problem 1.2, while all of the other algorithms only work for Problem 1.1.
1.4. Our techniques
Our proof of Theorem 1.3 builds on the method of interlacing families, a powerful technique developed in [22, 23] (see also [24, 25]) by Marcus, Spielman and Srivastava in their solution of the Kadison-Singer problem. Recall that an interlacing family of polynomials always contains a polynomial whose th largest root is at least the th largest root of the sum of the polynomials in the family (or the expected polynomial).
Our selection is based on the observation that the space spanned by the columns of the matrix is actually the space spanned by the columns of a matrix with rows consisting of the left singular vectors of . Note that the left singular vectors are a set of orthonormal vectors, so which is sometimes called the “isotropic” case. Then we consider the subset selection in the isotropic case while fixing at the beginning, where is a submatrix of corresponding to . We then prove that if is selected by randomly sampling columns from without replacement, the related characteristic polynomials of form an interlacing family. This implies that there is a subset such that the smallest root of the characteristic polynomial of is at least the smallest root of the expected characteristic polynomial of certain sums of those characteristic polynomials. We then need to present a lower bound on the smallest root of this expected characteristic polynomial. We do this by using the lower barrier function argument [4, 29, 23] together with an analysis of the behavior of the roots of a real-rooted polynomial under the operator .
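The reduction to the isotropic case can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the proof: the variable names, the random test matrix and the index set `S` are all hypothetical. It uses the thin SVD as returned by NumPy, checks that the factor `Vt` has orthonormal rows, and verifies that selecting columns of the original matrix amounts to selecting the same columns of `Vt`, up to the invertible factor formed by the other two SVD factors.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 7
B = rng.standard_normal((n, m))        # full row rank with probability one

# Thin SVD: B = U @ diag(s) @ Vt, where Vt has shape (n, m).
U, s, Vt = np.linalg.svd(B, full_matrices=False)

# Isotropy: the rows of Vt are orthonormal, so Vt @ Vt.T is the identity.
gram = Vt @ Vt.T

# Hypothetical index set; column selection commutes with the SVD factors.
S = [0, 2, 3, 6]
B_S = B[:, S]
V_S = Vt[:, S]

# U @ diag(s) is invertible and V_S has full row rank, hence
# pinv(B_S) = pinv(V_S) @ diag(1/s) @ U.T.
lhs = np.linalg.pinv(B_S)
rhs = np.linalg.pinv(V_S) @ np.diag(1.0 / s) @ U.T

# Consequence for the spectral norm: ||pinv(B_S)|| <= ||pinv(V_S)|| / s_min.
spec_B_S = np.linalg.norm(lhs, 2)
spec_bound = np.linalg.norm(np.linalg.pinv(V_S), 2) / s[-1]
```

The last two quantities show why a bound for the isotropic matrix transfers to the original one at the cost of a factor depending only on the smallest singular value.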
1.5. Organization
2. Preliminaries
2.1. Notations and Lemmas
We use to denote the operator that performs partial differentiation in . We say that a univariate polynomial is real-rooted if all of its coefficients and roots are real. For a real-rooted polynomial , we let and denote the smallest and the largest root of , respectively. We use to denote the th largest root of . Let and be two sets and we use to denote the set of elements in but not in . We use to denote the expectation of a random variable.
Singular Value Decomposition. For a matrix , we denote the operator norm and the Frobenius norm of by and , respectively. The (thin) singular value decomposition (SVD) of of rank is
with singular values . Here, is some rank parameter . The matrices and contain the left singular vectors of ; and similarly, the matrices and contain the right singular vectors of . We see that and .
Moore-Penrose pseudoinverse. Suppose that and its thin SVD is . We write for the Moore-Penrose pseudoinverse of , where is the inverse of . It has the following properties.
Lemma 2.1 ([5], Fact ).
Let and . If or , then .
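As a numerical sanity check, the sketch below first computes the pseudoinverse from the thin SVD and compares it with `numpy.linalg.pinv`, and then tests the product rule $(AB)^{\dagger} = B^{\dagger}A^{\dagger}$. The hypotheses checked here are an assumption: the sketch uses the standard sufficient condition that the left factor has full column rank and the right factor has full row rank, which is the situation in which the lemma is applied in the proof of Theorem 1.3. The helper name `pinv_via_svd` and the test matrices are hypothetical.

```python
import numpy as np

def pinv_via_svd(M, rtol=1e-12):
    """Pseudoinverse from the thin SVD M = U diag(s) V^T (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    r = int(np.sum(s > rtol * s.max())) if s.size else 0   # numerical rank
    return Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))        # full column rank (almost surely)
B = rng.standard_normal((3, 6))        # full row rank (almost surely)

# The SVD formula agrees with NumPy's built-in pseudoinverse.
svd_formula_ok = np.allclose(pinv_via_svd(B), np.linalg.pinv(B))

# Product rule under the assumed rank conditions.
product_rule_holds = np.allclose(
    np.linalg.pinv(A @ B), np.linalg.pinv(B) @ np.linalg.pinv(A)
)

# Without those conditions the rule can fail: A2 has full row rank but not
# full column rank, and B2 has full column rank but not full row rank.
A2 = np.array([[1.0, 1.0]])
B2 = np.array([[1.0], [0.0]])
lhs2 = np.linalg.pinv(A2 @ B2)                   # pseudoinverse of [[1.0]]
rhs2 = np.linalg.pinv(B2) @ np.linalg.pinv(A2)   # equals [[0.5]]
```

The small counterexample at the end shows that some rank hypothesis is genuinely needed.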
In general, if is not full rank. However, if is a nonsingular square matrix, the following lemma shows that . Lemma 2.2 is useful in our argument and we believe that it is of independent interest.
Lemma 2.2.
Let be an invertible matrix. Then for any , .
Proof.
Set . Then . It suffices to prove
(5) 
Let be the singular value decomposition of , where and are two unitary matrices,
with and . Note that
(6) 
Recall that . Then
(7) 
implies (5). Denote the standard basis by . Since is invertible, the linear systems and have the same solutions. Hence for . This implies
Jacobi’s formula and Jensen’s Inequality.
Lemma 2.3 (Jacobi’s formula).
Let and be two square matrices. Then,
We will utilize Jensen’s inequality to estimate the lower bound of the sum of a certain concave function.
Lemma 2.4 (Jensen’s Inequality).
Let be a function from to . Then is concave if and only if
whenever .
We also need the following lemma.
Lemma 2.5 ([5], Fact ).
If is an invertible matrix, then for any vector ,
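The following sketch checks one standard identity of this type numerically. It rests on an assumption: the identity intended here is taken to be the rank-one matrix determinant lemma, $\det(A + uu^{T}) = \det(A)\,(1 + u^{T}A^{-1}u)$. The test matrix is hypothetical and is built so that its symmetric part is positive definite, which guarantees invertibility while keeping the matrix non-symmetric.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
G = rng.standard_normal((d, d))
K = rng.standard_normal((d, d))

# Symmetric part G G^T + I is positive definite, so A is invertible;
# the skew-symmetric part K - K^T makes A non-symmetric.
A = G @ G.T + np.eye(d) + (K - K.T)
u = rng.standard_normal(d)

# Left-hand side: determinant after the rank-one update.
lhs = np.linalg.det(A + np.outer(u, u))

# Right-hand side: det(A) * (1 + u^T A^{-1} u), using a linear solve
# instead of forming the inverse explicitly.
rhs = np.linalg.det(A) * (1.0 + u @ np.linalg.solve(A, u))
```

Identities of this kind are what allow the characteristic polynomial of a rank-one update to be written in terms of the original one.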
2.2. Interlacing Families
Our proof of Theorem 1.3 builds on the method of interlacing families, a powerful technique introduced in [22, 23] by Marcus, Spielman and Srivastava in their solution of the Kadison-Singer problem [19, 9, 10, 11, 23, 26].
Let and be two real-rooted polynomials. We say interlaces if
We say that polynomials have a common interlacing if there is a polynomial so that interlaces for each .
Following [24], we define the notion of an interlacing family of polynomials as follows.
Definition 2.6 ([24], Definition 2.5).
An interlacing family consists of a finite rooted tree and a labeling of the nodes by monic real-rooted polynomials , with two properties:

Every polynomial corresponding to a nonleaf node is a convex combination of the polynomials corresponding to the children of .

For all nodes with a common parent, the polynomials have a common interlacing. (Footnote 1: This condition is equivalent to requiring that all convex combinations of all the children of a node are real-rooted; the equivalence is implied by Helly’s theorem and Lemma 2.9.)
We say that a set of polynomials form an interlacing family if they are the labels of the leaves of such a tree.
The following lemma, which was proved in [24, Theorem ], shows the utility of forming an interlacing family.
Lemma 2.7 ([24], Theorem ).
Let be an interlacing family of degree polynomials with root labeled by and leaves by . Then for all indices , there exist leaves and such that
In Section 3, we will prove that the polynomials obtained by average subset selection form an interlacing family. According to the above definition, this requires establishing the existence of certain common interlacings. The following lemma will be used to show the common interlacing.
Lemma 2.8 ([24], Claim ).
If is a symmetric matrix and are vectors in , then the polynomials
have a common interlacing.
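A numerical illustration of this lemma is sketched below, under the assumption that the polynomials in question are $\det(xI - A - v_j v_j^{T})$, as in [24]. By Cauchy's interlacing theorem for rank-one updates, the eigenvalues of the symmetric matrix $A$ separate the eigenvalues of each $A + v_j v_j^{T}$, so the roots of $\det(xI - A)$ separate the roots of every polynomial simultaneously. The helper name `interlaced_by` and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 5, 3
G = rng.standard_normal((d, d))
A = (G + G.T) / 2                      # symmetric test matrix
vs = rng.standard_normal((m, d))       # rows are the vectors v_1, ..., v_m

lam = np.linalg.eigvalsh(A)            # eigenvalues of A, ascending

def interlaced_by(lam, mu, tol=1e-10):
    """Check lam_1 <= mu_1 <= lam_2 <= mu_2 <= ... <= lam_d <= mu_d."""
    lower_ok = np.all(lam <= mu + tol)
    upper_ok = np.all(mu[:-1] <= lam[1:] + tol)
    return bool(lower_ok and upper_ok)

# Roots of det(xI - A - v v^T) are the eigenvalues of A + v v^T.
checks = [
    interlaced_by(lam, np.linalg.eigvalsh(A + np.outer(v, v))) for v in vs
]
```

Since the same sequence `lam` works for every vector, the check exhibits a single separating sequence of roots for all the polynomials at once.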
The following lemma shows that the common interlacings are equivalent to the real-rootedness of convex combinations.
Lemma 2.9 ([13], Theorem ).
Let be real-rooted (univariate) polynomials of the same degree with positive leading coefficients. Then have a common interlacing if and only if is real-rooted for all convex combinations .
2.3. Lower barrier function and properties
In this section, we introduce the lower barrier potential function from [4, 23]. For a real-rooted polynomial , one can use the evolution of such a barrier function to track the approximate locations of the roots .
Definition 2.10.
For a real-rooted polynomial with roots , define the lower barrier function of as
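As a numerical illustration, assume the lower barrier function takes the standard form used in [4, 23], namely $\Phi_p(x) = -p'(x)/p(x) = \sum_i 1/(\lambda_i - x)$ for $x$ strictly below the smallest root. The sketch below evaluates it both from the roots and from the polynomial coefficients; the function names and the example roots are hypothetical.

```python
import numpy as np

def lower_barrier_from_roots(roots, x):
    """Sum of 1 / (lambda_i - x), for x strictly below the smallest root."""
    roots = np.asarray(roots, dtype=float)
    if x >= roots.min():
        raise ValueError("x must lie strictly below the smallest root")
    return float(np.sum(1.0 / (roots - x)))

def lower_barrier_from_poly(roots, x):
    """Same quantity computed as -p'(x) / p(x) from the coefficients of p."""
    coeffs = np.poly(roots)                      # monic p with these roots
    return float(-np.polyval(np.polyder(coeffs), x) / np.polyval(coeffs, x))

roots = [1.0, 2.0, 4.0]
phi_a = lower_barrier_from_roots(roots, 0.0)     # 1/1 + 1/2 + 1/4 = 1.75
phi_b = lower_barrier_from_poly(roots, 0.0)      # -p'(0)/p(0) = 14/8 = 1.75
```

The barrier value grows without bound as `x` approaches the smallest root from below, which is what makes it useful for certifying a lower bound on that root.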
We have the following technical lemma for the lower barrier function. This result can be obtained from Lemma in [23]. Here we include a proof for completeness.
Lemma 2.11.
Suppose that is a real-rooted polynomial and . Suppose that and
Then
(8) 
Proof.
Suppose that the degree of is and its zeros are . We first need to prove . According to
we have . Noting that , we obtain that . Next we will express in terms of and :
wherever all quantities are finite, which happens everywhere except at the zeros of and . Since is strictly below the zeros of both, it follows that:
So (8) is equivalent to
i.e.,
By expanding and in terms of the zeros of , we can see that (8) is equivalent to
Noting , we have
as desired. Here the first and the second inequalities are due to , i.e., and the Cauchy-Schwarz inequality. ∎
3. Proof of Theorem 1.3
In this section, we present the proof of Theorem 1.3. Our proof provides a deterministic greedy algorithm which will be proposed in Section 4. To state our proof clearly, we first introduce the following result and postpone its proof to the end of this section.
Theorem 3.1.
Let which satisfies . Assume that with . Let be a submatrix of whose columns are indexed by . Set . Then for any fixed there exists a subset of size such that
where is defined by (1).
Using this theorem, we next present the proof of Theorem 1.3.
Proof of Theorem 1.3.
Let be the SVD of . Suppose that and are two index sets so that and .
Recall that , which implies that . Applying Theorem 3.1 with and , we obtain that there exists a subset with size such that
(9) 
Considering the left-hand side of (2), we have
(10) 
From (9), we know that the matrix has full row rank. Since also has full column rank, by Lemma 2.1 we know that and
(11) 
where follows from standard properties of matrix norms and the definition of the pseudoinverse of and , and follows from (9). It remains to present an upper bound of . Note that
(12) 
where follows from , follows from Lemma 2.2, follows from , follows from , the standard properties of matrix norm and , and follows from . Thus combining (10), (11) and (12), we can obtain (2). ∎
The rest of this section aims to prove Theorem 3.1 by using the method of interlacing families. The proof consists of two main parts. Firstly, we will prove that the characteristic polynomials of the matrices that arise in Theorem 3.1 form an interlacing family and present an expression for the expected characteristic polynomial (the summation of the polynomials in the family). Secondly, we use the barrier function argument to establish a lower bound on the smallest zero of the expected characteristic polynomial.
3.1. Interlacing family for subset selection
In this subsection, we show that the expected characteristic polynomials obtained by average subset selection over while keeping the given matrix form an interlacing family.
Let the columns of be the vectors and let be a given matrix with . Since , we obtain that . Denote the nonzero singular values of as . For each , set
For any fixed set of size less than , we define the polynomial
where the expectation is taken uniformly over sets of size containing . Building on the ideas of Marcus, Spielman and Srivastava [24], we can derive expressions for the polynomials .
We begin with the following result.
Lemma 3.2.
Suppose that and . Then
holds for every subset of size .