Testing Matrix Rank, Optimally
We show that for the problem of testing if a matrix has rank at most , or requires changing an -fraction of entries to have rank at most , there is a non-adaptive query algorithm making queries. Our algorithm works for any field . This improves upon the previous bound (Krauthgamer and Sasson, SODA ’03), and bypasses an lower bound of (Li, Wang, and Woodruff, KDD ’14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of . We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of queries in the sensing model for which query access comes in the form of ; perhaps surprisingly these bounds do not depend on .
Testing rank is only one of many tasks in determining if a matrix has low intrinsic dimensionality. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where is required to have entries bounded by in absolute value. Such a model provides a meaningful framework for testing numerical quantities and avoids trivialities caused by single entries being arbitrarily large. It is also well motivated by recommendation systems. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above. We obtain several results for estimating the operator norm that may be of independent interest. For example, we show that if the stable rank is constant, , and the singular value gap for any constant , then the operator norm can be estimated up to a -factor non-adaptively by querying entries. This should be contrasted with adaptive methods such as the power method, or previous non-adaptive sampling schemes based on matrix Bernstein inequalities which read a submatrix and thus make queries. Similar to our non-adaptive algorithm for testing rank, our scheme instead reads a carefully selected pattern of entries.
- 1 Introduction
- 2 Preliminaries
- 3 Non-Adaptive Rank Testing
- 4 Non-Adaptive Stable Rank Testing
- 5 Non-Adaptive Testing of Schatten-$p$ Norm
- 6 Non-Adaptive Testing of Matrix Entropy
- A Other Related Works
- B New Operator Norm Estimators
1 Introduction
Data intrinsic dimensionality is a central object of study in compressed sensing, sketching, numerical linear algebra, machine learning, and many other domains [34, 25, 48, 47, 14, 52, 51]. In compressed sensing and sketching, the study of intrinsic dimensionality has led to significant advances in compressing the data to a size that is far smaller than the ambient dimension while still preserving useful properties of the signal [38, 3]. In numerical linear algebra and machine learning, understanding intrinsic dimensionality serves as a necessary condition for the success of various subspace recovery problems, e.g., matrix completion [49, 18, 21, 42] and robust PCA [6, 50, 10]. The focus of this work is on the intrinsic dimensionality of matrices, such as the rank, stable rank, Schatten-$p$ norms, and SVD entropy. The stable rank is defined to be the squared ratio of the Frobenius norm and the largest singular value, and the Schatten-$p$ norm is the $\ell_p$ norm of the singular values (see Section 6 for our definition of SVD entropy). We study these quantities in the framework of non-adaptive property testing [39, 12, 15]: given non-adaptive query access to the unknown matrix over a field , our goal is to determine whether is of dimension (where dimension depends on the specific problem), or is -far from having this property. The latter means that at least an -fraction of entries of should be modified in order to have dimension . Query access typically comes in the form of reading a single entry of the matrix, though we will also discuss sensing models where a query returns the value for a given . Without making assumptions on , we would like to choose our sample pattern or set of query matrices so that the query complexity is as small as possible.
Despite a large amount of work on testing matrix rank, many fundamental questions remain open. In the rank testing problem in the sampling model, one such question is to design an efficient algorithm that can distinguish rank- vs. -far from rank- with optimal sample complexity. The best-known sampling upper bound for non-adaptive rank testing for general is , which is achieved simply by sampling an submatrix uniformly at random. For arbitrary fields , only an lower bound for constant is known.
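The submatrix-sampling baseline described above can be sketched as follows. This is only an illustration under stated assumptions: the side length `s` is a free parameter here (the bound from the literature is not reproduced), the helper names `rank_exact` and `submatrix_rank_test` are hypothetical, and the rank is computed exactly over the rationals rather than over an arbitrary field.

```python
import random
from fractions import Fraction


def rank_exact(M):
    """Rank of a matrix (list of rows) over the rationals, via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for c in range(n_cols):
        # Find a pivot in column c among the rows not yet used as pivots.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            if rows[r][c] != 0:
                f = rows[r][c] / rows[rank][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def submatrix_rank_test(A, d, s, seed=None):
    """Accept (return True) iff a uniformly random s x s submatrix has rank <= d.

    The side length s is an assumed parameter, not the paper's bound.
    """
    rng = random.Random(seed)
    rows = rng.sample(range(len(A)), min(s, len(A)))
    cols = rng.sample(range(len(A[0])), min(s, len(A[0])))
    sub = [[A[i][j] for j in cols] for i in rows]
    return rank_exact(sub) <= d
```

Because every submatrix of a matrix of rank at most `d` also has rank at most `d`, this sketch never rejects such an input; all of the difficulty lies in choosing how many entries must be read to catch a far instance.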
Besides the rank problem above, testing many numerical properties of real matrices has yet to be explored. For example, it is unknown what the query complexity is for the stable rank, which is a natural relaxation of rank in applications. Other examples for which previously we had no bounds are the Schatten-$p$ norms and SVD entropy. We discuss these problems in a new property testing framework that we call the bounded entry model. This model has many realistic applications in the Netflix challenge, where each entry of the matrix corresponds to the rating from a customer to a movie, ranging from 1 to 5. Understanding the query complexity of testing numerical properties in the bounded entry model is an important problem in recommendation systems and applications of matrix completion, where often entries are bounded.
1.1 Problem Setup, Related Work, and Our Results
Our work has two parts: (1) we resolve the query complexity of non-adaptive matrix rank testing, a well-studied problem in this model, and (2) we develop a new framework for testing numerical properties of real matrices, including the stable rank, the Schatten-$p$ norms and the SVD entropy. Our results are summarized in Table 1. We use and notation to hide polylogarithmic factors in the arguments inside. For the rank testing results, the hidden polylogarithmic factors depend only on and and do not depend on ; for the other problems, they may depend on .
Rank Testing. We first study the rank testing problem when we can only non-adaptively query entries. The goal is to design a sampling scheme on the entries of the unknown matrix and an algorithm so that we can distinguish whether is of rank , or at least an -fraction of entries of should be modified in order to reduce the rank to . This problem was first proposed by Krauthgamer and Sasson (SODA ’03), with a sample complexity upper bound of . In this work, we improve this to for every and , and complement this with a matching lower bound, showing that any algorithm with constant success probability requires at least samples:
Theorems 3.5, 3.11, and 3.14 (Informal). For any matrix over any field, there is a randomized non-adaptive sampling algorithm which reads entries and runs in time, and with high probability correctly solves the rank testing problem. Further, any non-adaptive algorithm with constant success probability requires samples over or any finite field.
Our non-adaptive sample complexity bound of matches what is known with adaptive queries, and thus we show the best known upper bound might as well be non-adaptive.
|Testing Problems||Rank||Stable Rank||Schatten-$p$ Norm||Entropy|
|(finite fields and )||†||()|
† The lower bound involves a reparameterization of the testing problem. Please see the respective theorem for details.
New Framework for Testing Matrix Properties. Testing rank is only one of many tasks in determining if a matrix has low intrinsic dimensionality. In several applications, we require a less fragile measure of the collinearity of rows and columns, which is known as the stable rank. We introduce what we call the bounded entry model as a new framework for studying such problems through the lens of property testing. In this model, we require all entries of a matrix to be bounded by in absolute value. Boundedness has many natural applications in recommendation systems, e.g., the user-item matrix of preferences for products by customers has bounded entries in the Netflix challenge. Indeed, many user rating matrices naturally take a small number of discrete values, and therefore fit into a bounded entry model. The boundedness of entries also avoids trivialities in which one can modify a matrix to have a property by setting a single entry to be arbitrarily large, which, e.g., could make the stable rank arbitrarily close to .
Our model is a generalization of previous work in which stable rank testing was done in a model for which all rows had to have bounded norm, and the algorithm is only allowed to change entire rows at a time. As our non-adaptive rank testing algorithm will illustrate, one can sometimes do better by only reading certain carefully selected entries in rows and columns. Indeed, this is precisely the source of our improvement over prior work. Thus, the restriction of having to read an entire row is often unnatural, and further motivates our bounded entry model. We first informally state our main theorems on stable rank testing in this model.
Theorem 4.3 (Informal). There is a randomized algorithm for the stable rank testing problem to decide whether a matrix is of stable rank at most or is -far from stable rank at most , with failure probability at most , and which reads entries.
Theorem 4.3 relies on a new -approximate non-adaptive estimator of the largest singular value of a matrix, which may be of independent interest.
Theorem B.2 (Informal). Suppose that has stable rank and . Then in the bounded entry model, there is a randomized non-adaptive sampling algorithm which reads entries and with probability at least , outputs a -approximation to the largest singular value of .
We remark that when the stable rank is constant and the singular value gap for an arbitrary constant , the operator norm can be estimated up to a -factor by querying entries non-adaptively. We defer these and related results to Appendix B.1.2.
Other measures of intrinsic dimensionality include matrix norms, such as the Schatten-$p$ norm, which measures the central tendency of the singular values. Familiar special cases are , and , which have applications in differential privacy and non-convex optimization [6, 16] for , and in numerical linear algebra for . Matrix norms have been studied extensively in the streaming literature [28, 31, 32, 33], though their study in property testing models is lacking.
We study non-adaptive algorithms for these problems in the bounded entry model. We consider distinguishing whether is at least for (at least for ), or at least an -fraction of entries of should be modified in order to have this property, where is a constant (depending only on ). We choose the threshold for and for because they are the largest possible value of for under the bounded entry model. When , is maximized when is of rank , and so this gives us an alternative “measure” of how close we are to a rank- matrix. Testing whether is large in sublinear time allows us to quickly determine whether can be well approximated by a low-rank matrix, which could save us from running more expensive low-rank approximation algorithms. In contrast, when , is maximized when has a flat spectrum, and so is a measure of how well-conditioned is. A fast tester could save us from running expensive pre-conditioning algorithms. We state our main theorems informally below.
Theorem 5.2 (Informal). For constant , there is a randomized algorithm for the Schatten-$p$ norm testing problem with failure probability at most which reads entries.
Results for Sensing Algorithms. We also consider a more powerful query oracle known as the sensing model, where query access comes in the form of for some sensing matrices of our choice. These matrices are chosen non-adaptively. We show differences in the complexity of the above problems in this and the above sampling model. For the testing and the estimation problems above, we have the following results in the sensing model:
Theorem 3.17 (Informal). Over an arbitrary finite field, any non-adaptive algorithm with constant success probability for the rank testing problem in the sensing model requires queries.
Theorems 4.3 and 4.7 (Informal). There is a randomized algorithm for the stable rank testing problem with failure probability at most in the sensing model with queries. Further, any algorithm with constant success probability requires queries.
Theorem 5.4 (Informal). For , any algorithm for the Schatten-$p$ norm testing problem with failure probability at most requires queries.
Theorem B.4 (Informal). Suppose that has stable rank and . In the bounded entry model, there is a randomized sensing algorithm with sensing complexity which outputs a -approximation to the largest singular value with probability at least . This sensing complexity is optimal up to polylogarithmic factors.
We also provide an query lower bound for the SVD entropy testing in the sensing model. We defer the definition of the problem and related results to Section 6.
1.2 Our Techniques
We now discuss the techniques in more detail, starting with the rank testing problem.
Prior to the work of , the only known algorithm for was to sample an submatrix. In contrast, for rank an algorithm in samples blocks of varying shapes “within a random submatrix” and argues that these shapes are sufficient to expose a rank- submatrix. For , the goal is to augment a matrix to a full-rank matrix. One can show that with good probability, one of the shapes “catches” an entry that enlarges the matrix to a full-rank matrix. For instance, in Figure 1, is our matrix and the leftmost vertical block catches an “augmentation element” which makes a full-rank matrix. Here, an “augmentation element” is an entry whose addition augments a matrix to a matrix. In , an argument was claimed for , though we note an omission in their analysis. Namely, the “augmentation entry” can be the matrix we begin with (meaning that , which might not be true), and since one can show that both and fall inside the same sampling block with good probability, the matrix would be fully observed and the algorithm would thus be able to determine that it has rank . However, it is possible that and would not be a starting point (i.e., a rank- matrix), and in this case, may not be observed, as illustrated in Figure 1. In this case the algorithm will not be able to determine whether the augmented matrix is of full rank. For , nothing was known. One issue is that the probability of fully observing a submatrix within these shapes is very small. To overcome this, we propose what we call rebasing and transformation to a canonical structure. These arguments allow us to tolerate unobserved entries and conveniently obtain an algorithm for every , completing the analysis of for in the process.
Rebasing Argument + Canonical Structure. The best previous result for the rank testing problem uniformly samples an submatrix and argues that one can find a full-rank submatrix within it when is -far from rank- . In contrast, our algorithm follows from subsampling an -fraction of entries in this submatrix. Let and be the indices of subsampled rows and columns, respectively, with . We choose these indices uniformly at random such that and , and sample the entries in all blocks determined by the (see Figure 1, where our sampled regions are enclosed by the dotted lines). Since there are blocks and in each block we sample entries, the sample complexity of our algorithm is as small as .
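The shape of such a sampling pattern can be sketched in a few lines. This is a hedged illustration, not the paper's Algorithm 1: the function name `nested_block_pattern` is hypothetical, the block sizes are supplied by the caller rather than taken from the analysis, and the nesting (row sets growing while column sets shrink) is an assumption inferred from the later remark that the upper-left corner belongs to every block.

```python
import random


def nested_block_pattern(n, row_sizes, col_sizes, seed=None):
    """Return (blocks, queried) for a staircase-like non-adaptive pattern.

    blocks[i] = (rows_i, cols_i), where the row sets are nested increasing and
    the column sets are nested decreasing. Sizes are caller-supplied assumptions.
    """
    if len(row_sizes) != len(col_sizes):
        raise ValueError("need one row size and one column size per block")
    if list(row_sizes) != sorted(row_sizes):
        raise ValueError("row sizes must be non-decreasing")
    if list(col_sizes) != sorted(col_sizes, reverse=True):
        raise ValueError("column sizes must be non-increasing")
    rng = random.Random(seed)
    # Prefixes of a uniformly random ordering are nested uniformly random subsets.
    row_order = rng.sample(range(n), n)
    col_order = rng.sample(range(n), n)
    blocks = [(row_order[:r], col_order[:c]) for r, c in zip(row_sizes, col_sizes)]
    # The set of queried entries is the union of all blocks.
    queried = {(i, j) for rows, cols in blocks for i in rows for j in cols}
    return blocks, queried


if __name__ == "__main__":
    blocks, queried = nested_block_pattern(64, [2, 4, 8, 16], [16, 8, 4, 2], seed=1)
    print(len(queried), "entries queried, versus", 16 * 16, "for the enclosing submatrix")
```

With geometrically changing sizes as in the usage example, every block contains the same number of entries, so the total number of queries grows with the number of blocks rather than with the area of the enclosing submatrix.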
The correctness of our algorithm for follows from what we call a rebasing argument. Starting from an empty matrix, our goal is to maintain and augment the matrix to a full-rank matrix when is -far from rank-. By a level-set argument, we show an oracle lemma which states that we can augment any full-rank matrix to an full-rank matrix by an augmentation entry in the sampled region, as long as and is -far from rank-. Therefore, as a first step we successfully find a full-rank matrix, say with index , in the sampled region. We then argue that we can either (a) find a fully-observed full-rank submatrix or a submatrix which is not fully observed but we know must be of full rank, or (b) move our maintained full-rank submatrix upwards or leftwards to a new full-rank submatrix and repeat checking whether case (a) happens or not; if not, we implement case (b) again and repeat the procedure. To see case (a), by the oracle lemma, if the augmented entry is (see Figure 1), then we fully observe the submatrix determined by and and so the algorithm is correct in this case. On the other hand, if the augmented entry is , then we fail to see the entry at . In this case, when , then we must have ; otherwise, is not an augment of , which leads to a contradiction with the oracle lemma. Thus we find a matrix with structure
which must be of rank despite an unobserved entry, and the algorithm therefore is correct in this case. The remaining case of the analysis above is when . Instead of trying to augment , we augment in the next step. Note that the index is to the left of . This leads to case (b). In the worst case, we move the non-zero matrix to the uppermost left corner,
The analysis becomes more challenging for general , since the number of unobserved/unimportant entries (i.e., those entries marked as “”) may propagate as we augment an submatrix () in each round. To resolve the issue, we maintain a structure (modulo elementary transformations) similar to structure (1) for the submatrix, that is,
Since the proposed structure has non-zero determinant, the submatrix is always of full rank. Similar to the case for , we show that we can either (a) augment the submatrix to an submatrix with the same structure (2) (modulo elementary transformations); or (b) find another submatrix of structure (2) that is closer to the upper-left corner than the original matrix. Hence the algorithm is correct for general . More details are provided in the proof of Theorem 3.5.
Pivot-Node Assignment. Our rank testing lower bound under the sampling model over a finite field follows from distinguishing two hard instances vs. , where and have i.i.d. entries that are uniform over . For an observed subset of entries with , we bound the total variation distance between the distributions of the observed entries in the two cases by a small constant. In particular, we show that the probability is large for any observation , by a pivot-node assignment argument, as follows. We reformulate our problem as a bipartite graph assignment problem , where corresponds to the rows of , the rows of and each edge of one entry in . We want to assign each node a vector/affine subspace, meaning that the corresponding row in or will be that vector or in that affine subspace, such that they agree with our observation, i.e., . Since are random matrices, we assign random vectors to nodes adaptively, one at a time, and try to maintain consistency with the fact that . Note that the order of the assignment is important, as a bad choice for an earlier node may invalidate any assignment to a later node. To overcome this issue, we choose nodes of large degrees as pivot nodes and assign each non-pivot node adaptively in a careful manner so as to guarantee that the incident pivot nodes will always have valid assignments (which in fact form an affine subspace). In the end we assign the pivot node vectors from their respective affine subspaces. We employ a counting argument for each step in this assignment procedure to lower bound the number of valid assignments, and thus lower bound the probability .
The above analysis gives us an lower bound for constant since is constant-far from being of rank . The desired lower bound follows from planting vs. with into an matrix at uniformly random positions, and padding zeros everywhere else.
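The planted construction can be illustrated with a small generator. This is a sketch under assumptions that are not in the text: the field is taken to be the integers modulo a prime `p`, the low-rank instance is a product of two uniform factors, the block is planted on uniformly random rows and columns, and the names `planted_instance` and `rank_mod_p` as well as all dimensions are hypothetical.

```python
import random


def rank_mod_p(M, p):
    """Rank of an integer matrix over GF(p), p prime, via Gaussian elimination."""
    rows = [[x % p for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][c], -1, p)  # modular inverse (Python 3.8+)
        for r in range(rank + 1, len(rows)):
            f = rows[r][c] * inv % p
            if f:
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def planted_instance(n, m, d, p, low_rank, seed=None):
    """Plant an m x m block into an n x n zero matrix over GF(p).

    low_rank=True : block = U V^T with U, V uniform m x d (rank at most d).
    low_rank=False: block has i.i.d. uniform entries.
    """
    rng = random.Random(seed)
    if low_rank:
        U = [[rng.randrange(p) for _ in range(d)] for _ in range(m)]
        V = [[rng.randrange(p) for _ in range(d)] for _ in range(m)]
        block = [[sum(u * v for u, v in zip(U[i], V[j])) % p for j in range(m)]
                 for i in range(m)]
    else:
        block = [[rng.randrange(p) for _ in range(m)] for _ in range(m)]
    rows = rng.sample(range(n), m)  # random positions for the planted block
    cols = rng.sample(range(n), m)
    A = [[0] * n for _ in range(n)]
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            A[i][j] = block[a][b]
    return A
```

A tester that reads few entries sees only a handful of positions inside the planted block, which is why shrinking the block relative to the ambient matrix makes the two cases harder to tell apart.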
New Analytical Framework for Stable Rank, Schatten-$p$ Norm, and Entropy Testing. We propose a new analytical framework by reducing the testing problem to a sequence of estimation problems without involving in the sample complexity. There is a two-stage estimation in our framework: (1) a constant-approximation to some statistic of interest (e.g., stable rank) which enables us to distinguish vs. for the threshold parameter of interest. If , we can safely output “ is far from ”; otherwise, the statistic is at most , and (2) we show that has a -factor difference between “” and “far from ”, and so we implement a more accurate -approximation to distinguish the two cases. The sample complexity does not depend on polynomially because (1) the first estimator is “rough” and gives only a constant-factor approximation and (2) the second estimator operates under the condition that and thus has a low intrinsic dimension. We apply the proposed framework to the testing problems of the stable rank and the Schatten-$p$ norm by plugging in our estimators in Theorem B.2 and Theorem B.4. This analytical framework may be of independent interest to other property testing problems more broadly.
In a number of these problems, a key difficulty is arguing about spectral properties of a matrix when it is -far from having a property, such as having stable rank at most . Because the entries must always be bounded by in absolute value, it becomes non-trivial to argue, for example, that if is -far from having stable rank at most , then its stable rank is even slightly larger than . A natural approach is to argue that one could change an -fraction of rows of to agree with a multiple of the top left or right singular vector of , and since we are still guaranteed to have stable rank at least after changing such entries, it means that the operator norm of must have been small to begin with (which says something about the original stable rank of , since its Frobenius norm can also be estimated). The trouble is, if the top singular vector has some entries that are very large, and others that are small, one cannot scale the singular vector by a large amount since then we would violate the boundedness criterion of our model. We get around this by arguing there either needs to exist a left or a right singular vector of large $\ell_1$-norm (in some cases such vectors may only be right singular vectors, and in other cases only left singular vectors). The $\ell_1$-norm is a natural norm to study in this context, since it is dual to the $\ell_\infty$-norm, which we use to capture the boundedness property of the matrix.
2 Preliminaries
We shall use bold capital letters , , … to indicate matrices, bold lower-case letters , , … to indicate vectors, and lower-case letters , , … to indicate scalars. We adopt the convention of abbreviating the set as . We write (resp. ) if there exists a constant such that (resp. ).
For matrix functions, denote by and the rank and the stable rank of , respectively. It always holds that . For matrix norms, let denote the Schatten-$p$ norm of , defined as . The Frobenius norm is the special case of the Schatten-$p$ norm when $p = 2$, and the operator norm or the spectral norm (the largest singular value) of equals the limit as $p \to \infty$. When $p < 1$, is not a norm but is still a well-defined quantity, and it tends to as . Let denote the number of non-zero entries in , and denote the entrywise norm of , i.e., . The rigidity of a matrix over a field , denoted by , is the least number of entries of that must be changed in order to reduce the rank of to a value at most :
Sometimes we may omit the subscript in when the matrix of interest is clear from the context.
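The spectral quantities above are easy to compute for small dense matrices, which can help build intuition for the bounded entry model. The sketch below assumes NumPy and uses hypothetical helper names; it follows the verbal definitions (stable rank as the squared ratio of the Frobenius norm to the largest singular value, Schatten-$p$ norm as the $\ell_p$ norm of the singular values) and is not the paper's estimator, which reads only a few entries.

```python
import numpy as np


def singular_values(A):
    """Singular values of A in non-increasing order."""
    return np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)


def stable_rank(A):
    """Squared Frobenius norm divided by the squared largest singular value."""
    s = singular_values(A)
    return float(np.sum(s ** 2) / s[0] ** 2)


def schatten_norm(A, p):
    """l_p norm of the singular values (a genuine norm only for p >= 1)."""
    s = singular_values(A)
    return float(np.sum(s ** p) ** (1.0 / p))


if __name__ == "__main__":
    n = 4
    ones = np.ones((n, n))  # rank one, entries bounded by 1
    h2 = np.array([[1, 1], [1, -1]])
    hadamard = np.kron(h2, h2)  # flat spectrum, entries bounded by 1
    for name, M in [("all-ones", ones), ("Hadamard", hadamard)]:
        print(name, np.linalg.matrix_rank(M), stable_rank(M),
              schatten_norm(M, 1), schatten_norm(M, 4))
```

The two example matrices illustrate the extremes mentioned in the introduction: among these two, the rank-one all-ones matrix has the larger Schatten-4 norm, while the Hadamard matrix, whose singular values are all equal, has the larger Schatten-1 norm.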
We define the entropy of an unnormalized distribution ( with for all ) to be
For , we define its entropy as
with the convention that . For matrices satisfying , it holds that for all and the entropy above coincides with the usual Shannon entropy. Note that scaling only changes the entropy additively; that is, .
Let denote the distribution of a matrix with i.i.d. standard Gaussian entries over , and let (or ) denote the distribution of a matrix with i.i.d. uniform entries over a finite field (or a finite set ). We use to denote the total variation distance between two distributions and .
We shall also frequently use , , , , , , etc., to represent constants, which are understood to be absolute constants unless the dependency is otherwise specified.
3 Non-Adaptive Rank Testing
In this section, we study the following problem of testing low-rank matrices.
Problem 3.1 (Rank Testing with Parameter in the Sampling Model).
Given a field and a matrix which has one of the following promised properties:
has rank at most ;
is -far from having rank at most , meaning that requires changing at least an -fraction of its entries to have rank at most .
The problem is to design a property testing algorithm that outputs with probability if , and outputs with probability at least 0.99 if , using the least number of queried entries.
3.1 Positive Results
Below we provide a non-adaptive algorithm for the rank testing problem under the sampling model with queries when . Let be such that and let .
We note that the number of entries that Algorithm 1 queries is
Definition 1 (Augment).
For fixed matrix , we call an augment for if , and . We denote by the set of all the augments for , namely,
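The notion of an augment can be made concrete with a brute-force check. The sketch below encodes one reading of Definition 1 that is an assumption on our part: a pair (i, j) outside the current row set R and column set C is an augment if adjoining row i and column j strictly increases the rank of the submatrix. The helper names are hypothetical, and the rank is computed exactly over the rationals.

```python
from fractions import Fraction


def rank_exact(M):
    """Rank of a matrix (list of rows) over the rationals, via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            if rows[r][c] != 0:
                f = rows[r][c] / rows[rank][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def is_augment(A, R, C, i, j):
    """True if adjoining row i and column j to A[R, C] strictly increases its rank."""
    if i in R or j in C:
        return False
    base = [[A[r][c] for c in C] for r in R]
    extended = [[A[r][c] for c in list(C) + [j]] for r in list(R) + [i]]
    return rank_exact(extended) > rank_exact(base)


def augment_set(A, R, C):
    """All augments of A[R, C], found by checking every candidate pair."""
    return {(i, j) for i in range(len(A)) for j in range(len(A[0]))
            if is_augment(A, R, C, i, j)}
```

For the 3 x 3 identity matrix with R = C = [0], the only augments under this reading are the remaining diagonal positions, whereas a rank-one all-ones matrix has none; enumerating all pairs costs far more than the tester is allowed to spend, so this is useful only for checking small examples.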
Definition 2 (Augment Pattern).
For fixed , and , define (where ) to be the number of ’s such that . Let be the non-increasing reordering of the sequence , and for . We say that has augment pattern on if and only if .
Lemma 3.1.
Let be a full-rank matrix. If is -far from having rank and , then
Let be the set of entries in such that , i.e., . We will show that .
Let be the complement of inside the set . For any , we discuss the following two cases.
Case (i). There is such that or such that
In the former case, the row vector is a linear combination of the rows of . So we can change the value of so that is a linear combination of with the same representation coefficients as that of . Therefore, augmenting by the pair would not increase . Similarly, if there is such that , we can change the value of so that augmenting by the pair would not increase . We change at most entries for both cases combined.
Case (ii). for all and for all
In this case, we can change the entire -th row and -th column of so that does not increase by augmenting it with any pair in . Recall that and . It follows that . Therefore, this specific pair would lead to the change of at most entries. For all such ’s, we change at most entries in this case.
In summary, we can change at most entries of so that cannot increase by augmenting with any pair . Since is -far from being rank , we must have . Namely, . ∎
Lemma 3.2.
Let be a full-rank matrix. If is -far from being rank and , then there exists such that has augment pattern .
Suppose that does not have any augment pattern in . That is
which leads to a contradiction with Lemma 3.1. ∎
Lemma 3.3.
For fixed , suppose that has augment pattern on . Let be uniformly random such that , . Then the probability that contains at least one augment of on is at least .
Since has augment pattern on matrix , the probability that (and ) does not hit row (and column) of any augment is (and ). Therefore, the probability that hits at least one augment is given by
Warm-Up: The Case of
Without loss of generality, we may permute the rows and columns of and assume that and for all .
Theorem 3.4.
Let and . For any matrix , the probability that Algorithm 1 fails is at most .
If is of rank at most , then the algorithm will never make a mistake, so we assume that is -far from being rank in the proof below.
Again by Lemma 3.2, has an augment pattern ; otherwise, is not -far from being rank-, and with probability at least there exists such that . We now discuss three cases based on the position of in relation to .
Case (i). .
By Lemma 3.3, with probability at least , contains an augment for , denoted by . By construction of and , and are also queried (see Figure 2(a)). Thus we find a non-singular matrix. The algorithm answers correctly with probability at least in this case.
Case (ii). or .
In this case, we show that starting from , we can always find a path for the non-singular submatrix such that the index always moves to the left or upwards, so we make progress towards Case (i). Indeed, the non-zero element in the uppermost left corner can always be augmented with three queried elements in the same augment pattern (i.e., Case (i)), because the uppermost left corner belongs to all ’s by construction. We now show how to find the path (please refer to Figure 2(b) for the following argument).
For index such that , if or (say at the moment), then by Lemma 3.3, there exists an index such that can be augmented by . However, we cannot observe so we do not find a submatrix at the moment. To make progress, we further discuss three cases.
Case (ii.1). and .
This case is impossible; otherwise, cannot be an augment of .
Case (ii.2). and .
Since , and , no matter what is, the submatrix
is always non-singular. (Here denotes an entry that may be observed or unobserved; its specific value is unimportant for our purpose.) So the algorithm answers correctly with probability at least .
Case (ii.3). . Instead of augmenting , we shall pick to be our new base entry ( matrix) and try to augment it to a matrix. In this way, we have moved our base matrix towards the upper-left corner. We can repeat the preceding arguments of different cases.
If Case (i) happens for , we immediately have a rank- submatrix and the algorithm answers correctly with a good probability. If Case (i) does not happen, we shall demonstrate that we can make further progress. Suppose that is an augment of and . We intend to look at the submatrix
Here we cannot observe . We know that and cannot be both , otherwise would not be an augment for . If and , this matrix is nonsingular regardless of the value of and the algorithm will answer correctly. If , we can rebase our base matrix to be and try to augment it. Since is above , we have again moved towards the upper-left corner.
Note that there are at most different augment patterns and each time we rebase, moves from one to another for some . Hence, after repeating the argument above at most times, the algorithm is guaranteed to observe a non-singular submatrix. Since the failure probability in each round is at most , by a union bound over rounds, the overall failure probability is at most , provided that .
In summary, the overall probability is at least that the algorithm answers correctly in all cases by finding a submatrix of rank , when is -far from being rank-. ∎
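The step in Case (ii.2) that tolerates an unobserved entry comes down to a one-line determinant computation. As an illustration only, write $a$ and $b$ for the two observed non-zero entries and $x$ for the unobserved one; these symbols, and the exact placement of the entries, are assumptions made for this sketch rather than the paper's notation:

```latex
\det\begin{pmatrix} a & x \\ 0 & b \end{pmatrix}
  = a\,b - x \cdot 0
  = a\,b \neq 0
  \qquad \text{whenever } a \neq 0 \text{ and } b \neq 0 .
```

The determinant does not involve $x$ at all, so the submatrix has rank 2 for every value of the unobserved entry. This is why the tester can certify full rank without querying that position.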
Extension to General Rank
Theorem 3.5.
Let and . For any matrix , the probability that Algorithm 1 fails is at most .
If is of rank at most , then the algorithm will never make a mistake, so we assume that is -far from being rank in the proof below.
The idea is that we start with the base case of an empty matrix and augment it to a full-rank matrix in rounds, where in each round we increase the dimension of the matrix by exactly one. Each round may contain several steps in which we move the intermediate matrix () towards the upper-left corner without augmenting it; here, moving the matrix towards the upper-left corner means changing to , of the same rank, with and and , where means that, suppose that