On Identity Testing of Tensors, Low-rank Recovery and Compressed Sensing

Michael A. Forbes Email: miforbes@mit.edu, Department of Electrical Engineering and Computer Science, MIT CSAIL, 32 Vassar St., Cambridge, MA 02139, Supported by NSF grant 6919791, MIT CSAIL and a Siebel Scholarship.    Amir Shpilka Faculty of Computer Science, Technion — Israel Institute of Technology, Haifa, Israel, shpilka@cs.technion.ac.il. The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 257575.
Abstract

We study the problem of obtaining efficient, deterministic, black-box polynomial identity testing algorithms for depth-3 set-multilinear circuits (over arbitrary fields). This class of circuits has an efficient, deterministic, white-box polynomial identity testing algorithm (due to Raz and Shpilka [RS05]), but has no known such black-box algorithm. We recast this problem as a question of finding a low-dimensional subspace $\mathcal{H}$, spanned by rank 1 tensors, such that any non-zero tensor in the dual space $\ker(\mathcal{H})$ has high rank. We obtain explicit constructions of essentially optimal-size hitting sets for tensors of degree 2 (matrices), and obtain quasi-polynomial sized hitting sets for arbitrary tensors (but this second hitting set is less explicit).

We also show connections to the task of performing low-rank recovery of matrices, which is studied in the field of compressed sensing. Low-rank recovery asks (say, over $\mathbb{R}$) to recover a matrix $M$ from few measurements, under the promise that $M$ is rank $\le r$. In this work, we restrict our attention to recovering matrices that are exactly rank $\le r$ using deterministic, non-adaptive, linear measurements, that are free from noise. Over $\mathbb{R}$, we provide a set (of size $4nr$) of such measurements, from which $M$ can be recovered in $\mathrm{poly}(n)$ field operations, and the number of measurements is essentially optimal. Further, the measurements can be taken to be all rank-1 matrices, or all sparse matrices. To the best of our knowledge no explicit constructions with those properties were known prior to this work.

We also give a more formal connection between low-rank recovery and the task of sparse (vector) recovery: any sparse-recovery algorithm that exactly recovers vectors of length $n$ and sparsity $2r$, using $m$ non-adaptive measurements, yields a low-rank recovery scheme for exactly recovering $n \times n$ matrices of rank $\le r$, making $2nm$ non-adaptive measurements. Furthermore, if the sparse-recovery algorithm runs in time $\tau$, then the low-rank recovery algorithm runs in time $O(rn^2 + n\tau)$. We obtain this reduction using linear-algebraic techniques, and not using convex optimization, which is more commonly seen in compressed sensing algorithms.

Finally, we also make a connection to rank-metric codes, as studied in coding theory. These are codes with codewords consisting of matrices (or tensors), where the distance of matrices $A$ and $B$ is $\operatorname{rank}(A - B)$, as opposed to the usual Hamming metric. We obtain essentially optimal-rate codes over matrices, and provide an efficient decoding algorithm. We obtain codes over tensors as well, with poorer rate, but still with efficient decoding.

1 Introduction

We start with a motivating example. Let $\vec{x} = (x_0, \ldots, x_{n-1})$ and $\vec{y} = (y_0, \ldots, y_{n-1})$ be vectors of $n$ variables each. Let $M$ be an $n \times n$ matrix (over some field, say $\mathbb{R}$), and define the quadratic form

$$f_M(\vec{x}, \vec{y}) := \sum_{i,j=0}^{n-1} M_{i,j} \, x_i y_j = \vec{x}^{\,t} M \vec{y}.$$

Suppose now that we are given an oracle to $f_M$, that can evaluate $f_M$ on inputs that we supply. The type of question we consider is: how many (deterministically chosen) evaluations of $f_M$ must we make in order to determine whether $f_M$ is non-zero?

It is not hard to show that $n^2$ evaluations of $f_M$ are necessary and sufficient to determine whether $f_M$ is non-zero. The question becomes more interesting when we are promised that $\operatorname{rank}(M) \le r$. That is, given that $\operatorname{rank}(M) \le r$, can we (deterministically) determine whether $f_M \ne 0$ using $\ll n^2$ evaluations of $f_M$? It is not hard to show that there (non-explicitly) exist $O(nr)$ evaluations to determine whether $f_M \ne 0$, and one of the new results in this paper is to give an explicit construction of such evaluations (over any field).
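
To make the measurement viewpoint concrete, the following is a minimal numerical sketch (in Python with NumPy; the sizes and random data are our own illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.normal(size=(n, n))
x, y = rng.normal(size=n), rng.normal(size=n)

# An evaluation of the quadratic form f_M(x, y) = x^T M y is a linear
# measurement of M against the rank-1 matrix x y^T.
assert np.isclose(x @ M @ y, np.sum(M * np.outer(x, y)))

# The n^2 evaluations at all pairs of standard basis vectors read off every
# entry of M individually, so n^2 evaluations always suffice to test M = 0.
I = np.eye(n)
entries = np.array([[I[i] @ M @ I[j] for j in range(n)] for i in range(n)])
assert np.allclose(entries, M)
```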

We also consider various generalizations of this problem. The first generalization is to move from matrices (which are in a sense 2-dimensional) to the more general notion of tensors (which are in a sense $d$-dimensional). That is, a (degree $d$) tensor is a map $T : [n]^d \to \mathbb{F}$, and like a matrix we can define a polynomial

$$f_T(\vec{x}_1, \ldots, \vec{x}_d) := \sum_{i_1, \ldots, i_d} T(i_1, \ldots, i_d) \, x_{1,i_1} \cdots x_{d,i_d}.$$

As with matrices, tensors have a notion of rank (defined later), and we can ask: given that $\operatorname{rank}(T) \le r$, how many (deterministically chosen) evaluations of $f_T$ are needed to determine whether $f_T \equiv 0$? As $f_T \equiv 0$ iff $T = 0$, we see that this problem is an instance of polynomial identity testing, which asks: given oracle access to a polynomial that is somehow “simple”, how many (deterministically chosen) queries to the polynomial are needed to determine whether it is identically zero?

The above questions ask whether a certain matrix or tensor is zero. However, we can also ask for more, and seek to reconstruct this matrix/tensor fully. That is, how many (deterministically chosen) evaluations of $f_M$ are needed to determine $M$? This question can be seen to be related to compressed sensing and sparse recovery, where the goal is to reconstruct a “simple” object from “few” measurements. In this case, “simple” refers to the matrix being low-rank, as opposed to a vector being sparse. As above, it is not hard to show that there exist $O(nr)$ evaluations that determine a rank $\le r$ matrix $M$, and this paper gives an explicit construction of such evaluations, as well as an efficient algorithm to reconstruct $M$ from these evaluations.

We will now place this work in a broader context by providing background on polynomial identity testing, compressed sensing and low-rank recovery, and the theory of rank-metric codes.

1.1 Polynomial Identity Testing

Polynomial identity testing (PIT) is the problem of deciding whether a polynomial (specified by an arithmetic circuit) computes the identically zero polynomial. The obvious deterministic algorithm that completely expands the polynomial unfortunately takes exponential time. This is in contrast to the fact that there are several (quite simple) randomized algorithms that solve this problem quite efficiently. Further, some of these randomized algorithms treat the polynomial as a black-box, so that they only use the arithmetic circuit to evaluate the polynomial on chosen points, as opposed to a white-box algorithm which can examine the internal structure of the circuit. Even in the white-box model, no efficient deterministic algorithms are known for general circuits.

Understanding the deterministic complexity of PIT has come to be an important problem in theoretical computer science. Starting with the work of Kabanets and Impagliazzo [KI04], it has been shown that the existence of efficient deterministic (white-box) algorithms for PIT has a tight connection with the existence of explicit functions with large circuit complexity. As proving lower bounds on circuit complexity is one of the major goals of theoretical computer science, this has led to much research into PIT.

Stronger connections are known when the deterministic algorithms are black-box. For, any such algorithm corresponds to a hitting set, which is a set of evaluation points such that any small arithmetic circuit computing a non-zero polynomial must evaluate to non-zero on at least one point in the set. Heintz and Schnorr [HS80], as well as Agrawal [Agr05], showed that any deterministic black-box PIT algorithm very easily yields explicit polynomials that have large arithmetic circuit complexity. Moreover, Agrawal and Vinay [AV08] showed that a deterministic construction of a polynomial size hitting set for arithmetic circuits of depth-4 gives rise to a quasi-polynomial sized hitting set for general arithmetic circuits. Thus, the black-box deterministic complexity of PIT becomes interesting even for constant-depth circuits. However, currently no polynomial size hitting sets are known for general depth-3 circuits. Much of the recent work on black-box deterministic PIT has identified certain subclasses of circuits for which small hitting sets can be constructed, and this work fits into that paradigm. See [SY10] for a survey of recent results on PIT.

One subclass of depth-3 circuits is the model of set-multilinear depth-3 circuits, first introduced by Nisan and Wigderson [NW96]. Raz and Shpilka [RS05] gave a polynomial-time white-box PIT algorithm for non-commutative arithmetic formulas, which contains set-multilinear depth-3 circuits as a subclass. However, no polynomial-time black-box deterministic PIT algorithm is known for set-multilinear depth-3 circuits. The best known black-box PIT results for the class of set-multilinear circuits, with top fan-in $r$ and degree $d$, are hitting sets of size $\min\big(n^d, (nd)^{O(r)}\big)$, where the first part of the bound comes from a simple argument (presented in Lemma 3.11), and the second part of the bound ignores that we have set-multilinear polynomials, and simply uses the best known hitting sets for so-called $\Sigma\Pi\Sigma(r)$ circuits as established by Saxena and Seshadhri [SS11]. For non-constant $r$ and $d$, these bounds are super-polynomial. Improving the size of these hitting sets is the primary motivation for this work.

To connect PIT for set-multilinear depth-3 circuits with the above questions on matrices and tensors, we now note that any such circuit of top fan-in $r$, degree $d$, on $nd$ variables (and thus size $O(ndr)$), computes a polynomial $f_T$, where $T$ is a size-$n$ degree-$d$ tensor of rank $\le r$. Conversely, any such $f_T$ can be computed by such a circuit. Thus, constructing better hitting sets for this class of circuits is exactly the question of finding smaller sets of (deterministically chosen) evaluations of $f_T$ to determine whether $f_T \equiv 0$.

1.2 Low-Rank Recovery and Compressed Sensing

Low-rank Recovery (LRR) asks (for matrices) to recover an $n \times n$ matrix $M$ from few measurements of $M$. Here, a measurement is some inner product $\langle M, E \rangle$, where $E$ is an $n \times n$ matrix and the inner product is the natural inner product on $n^2$-long vectors. This can be seen as the natural generalization of the sparse recovery problem, which asks to recover sparse vectors from few linear measurements. For, over matrices, our notion of sparsity is simply that of being low-rank.

Sparse recovery and compressed sensing are active areas of research, see for example [CSw]. Much of this area focuses on constructing distributions of measurements such that the unknown sparse vector can be recovered efficiently, with high probability. Also, it is often assumed that the sequence of measurements will not depend on any of the measurement results, and this is known as non-adaptive sparse recovery. We note that Indyk, Price and Woodruff [IPW11] showed that adaptive sparse recovery can outperform non-adaptive measurements in certain regimes. Much of the existing work also focuses on efficiency concerns, and various algorithms coming from convex programming have been used. As such, these algorithms tend to be stable under noise, and can recover approximations to the sparse vector (and can even do so only if the original vector was approximately sparse). One of the initial achievements in this field is an efficient algorithm for recovery of a $k$-sparse (a vector is $k$-sparse if it has at most $k$ non-zero entries) approximation of an $n$-entry vector in $O(k \log(n/k))$ measurements [CRT05].

Analogous questions for low-rank recovery have also been explored (for example, see [lrr] and references therein). Initial work (such as [CT09, CP09]) asked the question of low-rank matrix completion, where entries of a low-rank matrix are revealed individually (as opposed to measuring linear combinations of matrix entries). It was shown in these works that for an $n \times n$ rank $r$ matrix, $O(nr \cdot \mathrm{polylog}(n))$ noisy samples suffice for nuclear-norm minimization to complete the matrix efficiently. Further works (such as [ENP11]) prove that a randomly chosen set of measurements (with appropriate parameters) gives enough information for low-rank recovery, other works (such as [CP11, RFP10]) give explicit conditions on the measurements that guarantee that the nuclear norm minimization algorithm works, and finally other works seek alternative algorithms for certain ensembles of measurements (such as [KOH11]; interestingly, [KOH11] use what they call subspace expanders, a notion that was studied before in a different context in theoretical computer science and mathematics under the name of dimension expanders [LZ08, DS08]). As in the sparse recovery case, most of these works seek stable algorithms that can deal with noisy measurements as well as matrices that are only approximately low-rank. Finally, we note that some applications (such as quantum state tomography) have additional requirements for their measurements (for example, they should be easy to prepare as quantum states) and some work has gone into this as well [GLF10, Gro09].

We now make a crucial observation which shows that black-box PIT for the quadratic form $f_M$ is actually very closely related to low-rank recovery of $M$. That is, note that $f_M(\vec{x}, \vec{y}) = \vec{x}^{\,t} M \vec{y} = \langle M, \vec{x}\vec{y}^{\,t} \rangle$. That is, an evaluation of $f_M$ corresponds to a measurement of $M$, and in particular this measurement is realized as a rank-1 matrix. Thus, we see that any low-rank-recovery algorithm that only uses rank-1 measurements can also determine if $M$ is non-zero, and thus also performs PIT for quadratic forms. Conversely, suppose we have a black-box PIT algorithm for rank $\le 2r$ quadratic forms. Note then that for any $M \ne M'$ of rank $\le r$, $M - M'$ has rank $\le 2r$. Thus, if $M \ne M'$ then $f_{M - M'}$ will evaluate to non-zero on some point in the hitting set. As $f_{M - M'} = f_M - f_{M'}$, it follows that a hitting set for rank $\le 2r$ matrices will distinguish $M$ and $M'$. In particular, this shows that information-theoretically any hitting set for rank $\le 2r$ matrices is also an LRR set for rank $\le r$ matrices. Thus, in addition to constructing hitting sets for the quadratic forms $f_M$, this paper will also use those hitting sets as LRR sets, and also give efficient LRR algorithms for these constructions.

1.3 Rank-Metric Codes

Most existing work on LRR has focused on random measurements, whereas the interesting aspect of PIT is to develop deterministic evaluations of polynomials. As the main motivation for this paper is to develop new PIT algorithms, we will seek deterministic LRR schemes. Further, we will want results that are field independent, and so this work will focus on noiseless measurements (and matrices that are exactly of rank $\le r$). In such a setting, LRR constructions are very related to rank-metric codes. These codes (related to array codes) are error-correcting codes where the messages are matrices (or tensors) and the normal notion of distance (the Hamming metric) is replaced by the rank metric (that is, the distance of matrices $A$ and $B$ is $\operatorname{rank}(A - B)$). Over matrices, these codes were originally introduced independently by Gabidulin, Delsarte and Roth [GK72, Gab85b, Gab85a, Del78, Rot91]. They showed, using ideas from BCH codes, how to get optimal (that is, meeting an analogue of the Singleton bound) rank-metric codes over matrices, as well as how to decode these codes efficiently. A later result by Meshulam [Mes95] constructed rank-metric codes where every codeword is a Hankel matrix. Roth [Rot91] also showed how to construct rank-metric codes from any Hamming-metric code, but did not provide a decoding algorithm. Later, Roth [Rot96] considered rank-metric codes over tensors and gave decoding algorithms for a constant number of errors. Roth also discussed analogues to the Gilbert-Varshamov and Singleton bounds in this regime. This alternate metric is motivated by crisscross errors in data storage scenarios, where corruption can occur in bursts along a row or column of a matrix (and are thus rank-1 errors).

We now explain how rank-metric codes are related to LRR. Suppose we have a set $\mathcal{E}$ of matrices which form a set of (non-adaptive, deterministically chosen) LRR measurements that can recover rank $\le r$ matrices. Define the code $\mathcal{C}$ as the set of matrices orthogonal to each matrix in $\mathcal{E}$. Thus, $\mathcal{C}$ is a linear code. Further, given some $C \in \mathcal{C}$ and $E$ of rank $\le r$ such that $y = C + E$, it follows that $\langle y, E' \rangle = \langle E, E' \rangle$ for each $E' \in \mathcal{E}$ (where we abuse notation and treat $y$ and $C$ as $n^2$-long vectors, and $E$ as an $n \times n$ matrix). That $\mathcal{E}$ is an LRR set means that $E$ can be recovered from the measurements $\{\langle E, E' \rangle\}_{E' \in \mathcal{E}}$. Thus the code $\mathcal{C}$ can correct $r$ rank errors (and has minimum rank-distance $> 2r$, by a standard coding theory argument, as encapsulated in Lemma 8.4). Similarly, given a rank-metric code that can correct up to $r$ rank errors, the parity checks of this code define an LRR scheme. Thus, a small LRR set is equivalent to a rank-metric code with good rate.
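
The following sketch (hedged: random measurements stand in for an actual LRR set, and all parameters are illustrative) shows the syndrome computation that underlies this equivalence — the syndrome of a corrupted codeword depends only on the low-rank error, so decoding reduces to low-rank recovery:

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 5, 1

# a measurement set E (as rows of a matrix), and the code C = its null space
E = rng.normal(size=(3 * n * r, n * n))
C_basis = np.linalg.svd(E)[2][3 * n * r:]      # rows spanning ker(E)

codeword = C_basis.T @ rng.normal(size=C_basis.shape[0])
error = np.outer(rng.normal(size=n), rng.normal(size=n)).ravel()  # rank-1 error
received = codeword + error

# the syndrome of the received word equals the measurements of the error
# alone, so recovering the error is exactly a low-rank recovery problem
assert np.allclose(E @ received, E @ error)
```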

The previous subsection showed the tight connection between LRR and PIT. Via the above paragraph, we see that hitting sets for quadratic forms are equivalent to rank-metric codes, when the parity check constraints are restricted to be rank 1 matrices.

1.4 Reconstruction of Arithmetic Circuits

Even more general than the PIT and LRR problems, we can consider the problem of reconstruction of general arithmetic circuits, given only oracle access to the evaluations of the circuit. This is the arithmetic analog of the problem of learning a function using membership queries. For more background on reconstruction of arithmetic circuits we refer the reader to [SY10]. Just as with the PIT and LRR connection, PIT for a specific circuit class gives information-theoretic reconstruction for that circuit class. As we consider the PIT question for tensors, we can also consider the reconstruction problem.

The general reconstruction problem for tensors of degree $d$ and rank $r$ was considered before in the literature [BBV96, BBB00, KS06], where learning algorithms were given for any value of $r$. However, those algorithms are inherently randomized. Also of note is that the algorithms of [BBB00, KS06] output a multiplicity automaton, which in the context of arithmetic circuits can be thought of as an arithmetic branching program. In contrast, the most natural form of the reconstruction question would be to output a degree $d$ tensor.

1.5 Our Results

In this subsection we informally summarize our results. We again stress that our results handle matrices of rank exactly $\le r$ (with noiseless measurements), and we consider non-adaptive, deterministic measurements. The culminating result of this work is the connection showing that low-rank recovery reduces to performing sparse-recovery, and that we can use dual Reed-Solomon codes to instantiate the sparse-recovery oracle to achieve a low-rank recovery set that only requires rank-1 (or even sparse) measurements. We find the fact that we can transform an algorithm for a combinatorial property (recovering sparse signals) into an algorithm for an algebraic property (recovering low-rank matrices) quite interesting.

Hitting Sets for Matrices and Tensors

We begin with constructions of hitting sets for matrices, so as to get black box PIT for quadratic forms. By improving a construction of rank-preserving matrices from Gabizon-Raz [GR08], we are able to show the following result, which we can then leverage to construct hitting sets.

Theorem (Theorem 5.1).

Let $0 \le r \le n$. Let $\mathbb{F}$ be a “large” field, and let $g \in \mathbb{F}$ have “large” multiplicative order. Let $M$ be an $n \times n$ matrix of rank $\le r$ over $\mathbb{F}$. Let $f_M(x, y)$ be the bivariate polynomial defined by the vectors $\vec{x}$ and $\vec{y}$ such that (in this paper, vectors and matrices are indexed from zero, so $\vec{x} = (x_0, \ldots, x_{n-1})$) $x_i = x^i$ and $y_j = y^j$; that is, $f_M(x, y) = \sum_{i,j=0}^{n-1} M_{i,j} \, x^i y^j$.

Then $f_M$ is non-zero iff one of the $r$ univariate polynomials $f_M(x, g^0 x), f_M(x, g^1 x), \ldots, f_M(x, g^{r-1} x)$ is non-zero.

Intuitively this says that we can test if the quadratic form $f_M$ is zero by testing whether each of $r$ univariate polynomials is zero. As these univariate polynomials are of degree $\le 2(n-1)$, it follows that we can interpolate them fully using $2n - 1$ evaluations. As such a univariate polynomial is zero iff all of these evaluations are zero, this yields a $(2n-1) \cdot r$ sized hitting set. While this only works for “large” fields, we can combine this with results on simulation of large fields (see Section 6.3) to derive results over any field with some loss. This is encapsulated in the next results for black-box PIT, where the log factors are unnecessary over large fields.
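
Over $\mathbb{R}$ (where any $g$ that is not a root of unity has infinite multiplicative order), this test is simple to simulate numerically. Below is a hedged sketch of the resulting hitting set, assuming the substitution form stated above; all concrete values ($n$, $r$, $g$ and the interpolation points) are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, g = 6, 2, 2.0          # over R, g = 2 is not a root of unity

# a random n x n matrix of rank <= r
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))

def hitting_set_evals(M, r):
    """Evaluate f_M at the (2n-1)*r points (x, y) = ((t^i)_i, ((g^k t)^j)_j)."""
    n = M.shape[0]
    ts = 1.0 + np.arange(2 * n - 1)           # any 2n-1 distinct points
    vals = []
    for k in range(r):
        for t in ts:
            x = t ** np.arange(n)             # x_i = t^i
            y = (g**k * t) ** np.arange(n)    # y_j = (g^k t)^j
            vals.append(x @ M @ y)            # f_M(x, y) = x^T M y
    return np.array(vals)

# a non-zero matrix of rank <= r is hit by at least one evaluation, while the
# zero matrix of course evaluates to zero everywhere
assert np.any(np.abs(hitting_set_evals(M, r)) > 1e-9)
assert np.allclose(hitting_set_evals(np.zeros((n, n)), r), 0.0)
```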

Theorem (Corollaries 6.13 and 6.17).

Let $0 \le r \le n$. Let $\mathbb{F}$ be any field; then there is a $\mathrm{poly}(n)$-explicit (a matrix is $t$-explicit if each entry can be (deterministically) computed in $t$ steps, where field operations are considered unit cost) hitting set for $n \times n$ matrices of rank $\le r$, of size $O(nr \log n)$.

Theorem (Corollary 6.18).

Let $0 \le r \le n$ and $d \ge 1$. Let $\mathbb{F}$ be any field; then there is a quasi-polynomially explicit hitting set for size-$n$ degree-$d$ tensors of rank $\le r$, of size $\mathrm{poly}(n, r)^{O(\log d)} \cdot O(\log n)$.

If $\mathbb{F}$ is large enough then the $O(\log n)$ term is unnecessary. In such a situation, this is a quasi-polynomial sized hitting set, improving on the $(nd)^{O(r)}$ sized hitting set achievable by invoking the best known results for $\Sigma\Pi\Sigma(r)$ circuits [SS11]. However, this hitting set is not as explicit as the construction of [SS11], since it takes at least quasi-polynomial time to compute, as opposed to $\mathrm{poly}(n, d, r)$. Nevertheless, although it takes quasi-polynomial time to construct the set, the fact that it is of quasi-polynomial size is quite interesting and novel. Indeed, in general it is not clear at all how to construct a quasi-polynomial sized hitting set for general arithmetic circuits (or just for depth-3 circuits), when one is allowed even quasi-polynomial construction time (where $n$ is the number of variables, and $d$ is the degree of the output polynomial). We note that this result improves on the two obvious hitting sets seen in Lemmas 3.11 and 3.13. The first gives $n^d$ tensors in the hitting set and is $\mathrm{poly}(n, d)$-explicit, while the second gives a set of size $O(ndr)$ while not being explicit at all. The above result non-trivially interpolates between these two results. Finally, we mention that in Remark 6.9 we explain how one can achieve (roughly) an $n^{O(\log d)}$-constructible hitting set of the same size. As this is a somewhat mild improvement (this is still not the explicitness that we were looking for) we only briefly sketch the argument.

Low-Rank Recovery

As mentioned in the previous section, black-box PIT results imply LRR constructions in an information theoretic sense. Thus, the above hitting sets imply LRR constructions, but the algorithm for recovery is not implied by the above result. To yield algorithmic results, we actually establish a stronger claim. That is, we first show that the above hitting sets embed a natural sparse-recovery set arising from the dual Reed-Solomon code. Then we develop an algorithm that shows that any sparse-recovery set gives rise to a low-rank-recovery set, and that recovery can be performed efficiently given an oracle for sparse recovery. This connection (in the context that any error-correcting code in the Hamming metric yields an error-correcting code in the rank metric) was independently made by Roth [Rot91] (see Theorem 3 there), who did not give a recovery procedure for the resulting LRR scheme. The next theorem, which is the main result of the paper, shows this connection is also efficient with respect to recovery.

Theorem (Theorem 7.19).

Let $0 \le r \le n$. Let $\mathcal{V}$ be a set of $m$ (non-adaptive) measurements for $2r$-sparse-recovery of $n$-long vectors. Then there is a $\mathrm{poly}(n)$-explicit set $\mathcal{W}$, which is a (non-adaptive) rank-$r$ low-rank-recovery set for $n \times n$ matrices, with a recovery algorithm running in time $O(rn^2 + n\tau)$, where $\tau$ is the amount of time needed to do sparse-recovery from $\mathcal{V}$. Further, $|\mathcal{W}| \le (2n-1) \cdot m$, and each matrix in $\mathcal{W}$ is $n$-sparse.

This result shows that sparse-recovery and low-rank recovery (at least in the exact case) are very closely connected: sparse-recovery (which can be regarded as a combinatorial property) and low-rank recovery (which can be regarded as an algebraic property) are tightly linked. Many fruitful connections have taken this form, such as in spectral graph theory, and perhaps the connection presented here will yield yet further results.

Also, the algorithm used in the above result is purely linear-algebraic, in contrast to the convex optimization approaches that many compressed sensing works use. However, we do not know if the above result is stable to noise, and regard this issue as an important question left open by this work.

When the above result is combined with our hitting set results, we achieve the following LRR scheme for matrices (and an LRR scheme for tensors, with parameters similar to Corollary 6.18 mentioned above, and Corollary 8.6 mentioned below, is derived in Corollary 8.2).

Theorem (Corollary 7.26).

Let $0 \le r \le n$. Over any field $\mathbb{F}$, there is a $\mathrm{poly}(n)$-explicit set $\mathcal{W}$, of size $O(nr \log n)$, such that measurements against $\mathcal{W}$ allow recovery of $n \times n$ matrices of rank $\le r$ in $\mathrm{poly}(n)$ time. Further, the matrices in $\mathcal{W}$ can be chosen to be all rank 1, or all $n$-sparse.

We note again that over large fields these logarithmic factors are seen to be unneeded.

Some prior work [GK72, Gab85b, Gab85a, Del78, Rot91] on LRR focused on finite fields, and as such based their results on BCH codes. The above result is based on (dual) Reed-Solomon codes, and as such works over any field (when combined with results allowing simulation of large fields by small fields). Other prior work [RFP10] on exact LRR permitted randomized measurements, while we achieve deterministic measurements.

Further, we are able to do LRR with measurements that are either all $n$-sparse, or all rank-1. As Roth [Rot91] independently observed, the sparse LRR measurements can arise from any (Hamming-metric) error-correcting code (but he did not provide decoding). Tan, Balzano and Draper [TBD11] showed that random sparse measurements provide essentially the same low-rank recovery properties as random measurements. Thus, our results essentially achieve this deterministically.

We further observe that a specific code (the dual Reed-Solomon code) allows a change of basis for the measurements, and in this new basis the measurements are all rank 1. Recht et al. [RFP10] asked whether low-rank recovery was possible when the measurements were rank 1 (or “factored”), as such measurements could be more practical as they are simpler to generate and store in memory. Thus, our construction answers this question in the positive direction, at least for exact LRR.

Rank-Metric Codes

Appealing to the connection between LRR and rank-metric codes, we achieve the following constructions of rank-metric codes.

Theorem (Corollary 8.5).

Let $\mathbb{F}$ be any field, and $0 \le r \le n$. Then there are $\mathrm{poly}(n)$-explicit rank-metric codes with $\mathrm{poly}(n)$-time decoding for up to $r$ rank errors, with codewords in $\mathbb{F}^{n \times n}$, co-dimension $O(nr \log n)$ (only $4nr$ over large fields), and minimum rank-distance $> 2r$; and the parity checks on this code can be chosen to be all rank-1 matrices, or all $n$-sparse matrices.

Earlier work on rank-metric codes over finite fields [GK72, Gab85b, Gab85a, Del78, Rot91] achieved rank-metric codes of co-dimension $2nr$ (correcting $r$ rank errors), with efficient decoding algorithms. These are optimal (meeting the analogue of the Singleton bound for rank-metric codes). However, these constructions only work over finite fields. While our code achieves a worse rate, its construction works over any field, and over infinite fields the $\log n$ term is unneeded. Further, Roth [Rot91] observed that the resulting code is optimal (see the discussion of his Theorem 3) over algebraically closed fields (which are infinite).

We are also able to give rank-metric codes over tensors, which can correct errors of rank up to $r$ (out of a maximum possible rank of roughly $n^{d-1}$), while still achieving constant rate. The rank-metric code arising from the naive low-rank recovery of Lemma 3.11 never achieves constant rate, and prior work by Roth [Rot96] only gave decoding against a constant number of errors.

Theorem (Corollary 8.6).

Let $\mathbb{F}$ be any field, $d \ge 1$, and $0 \le r \le n$. Then there are $\mathrm{poly}(n^d)$-explicit rank-metric codes with $\mathrm{poly}(n^d)$-time decoding for up to $r$ errors, with codewords being size-$n$ degree-$d$ tensors and co-dimension $\mathrm{poly}(n, r)^{O(\log d)}$.

We note here that our decoding algorithm will return the entire tensor, which is of size $n^d$. Trivially, any algorithm returning the entire tensor must take at least $n^d$ time. In this case, the level of explicitness of the code we achieve is reasonable. However, a more desirable result would be for the algorithm to return a rank $\le r$ representation of the tensor, and thus the lower bound would not apply, so that one could hope for faster decoding algorithms. Unfortunately, even for $d = 3$ an efficient algorithm to do so would imply $\mathsf{P} = \mathsf{NP}$. That is, if an algorithm (even one which is not a rank-metric decoding or low-rank recovery algorithm) could produce a rank $\le r$ decomposition for any rank $\le r$ tensor, then one could compute tensor-rank by trying successive values of $r$, as the rank is the minimum $r$ such that the resulting rank $\le r$ decomposition actually computes the desired tensor (this can be checked in $\mathrm{poly}(n^d)$ time). However, Håstad [Hås90] showed that tensor-rank (over finite fields) is $\mathsf{NP}$-hard for any fixed $d \ge 3$. It follows that for any (fixed) $d \ge 3$, if one could recover (even in $\mathrm{poly}(n^d)$-time) a rank $\le r$ tensor into its rank $\le r$ decomposition, then $\mathsf{P} = \mathsf{NP}$. Thus, we only discuss recovery of a tensor by reproducing its entire list of entries, as opposed to its more concise representation.

Finally, we remark that in [Rot96] Roth discussed the question of decoding rank-metric codes of degree $d \ge 3$, gave decoding algorithms for errors of rank 1 and 2, and wrote that “Since computing tensor rank is an intractable problem, it is unlikely that we will have an efficient decoding algorithm; otherwise, we could use the decoder to compute the rank of any tensor. Hence, if there is any efficient decoding algorithm, then we expect such an algorithm to recover the error tensor without necessarily obtaining its rank. Such an algorithm, that can handle any prescribed number of errors, is not yet known.” Thus, our work gives the first such algorithm for tensors of degree $d \ge 3$.

1.6 Proof Overview

In this section we give proof outlines of the results mentioned so far.

Hitting Sets for Matrices

The main idea for our hitting set construction is to reduce the question of hitting (non-zero) $n \times n$ matrices to a question of hitting (non-zero) $r \times r$ matrices. Once this reduction is performed, we can then run the naive hitting set of Lemma 3.11, which queries all $r^2$ entries. This can loosely be seen in analogy with the kernelization process in fixed-parameter tractability, where a problem depending on the input size, $n$, and some parameter, $k$, can be solved by first reducing to an instance of size $\mathrm{poly}(k)$, and then brute-forcing this instance.

To perform this kernelization, we first note that any matrix $M$ of rank exactly $r$ can be written as $M = U V^t$, where $U$ and $V$ are $n \times r$ matrices of rank exactly $r$. To reduce $M$ to an $r \times r$ matrix, it thus suffices to reduce $U$ and $V$ each to $r \times r$ matrices, denoted $\hat{U}$ and $\hat{V}$. As this reduction must preserve the fact that $M$ is non-zero, we need that $\hat{U}\hat{V}^t \ne 0$. We enforce this requirement by insisting that $\hat{U}$ and $\hat{V}$ are also rank exactly $r$, so that $\hat{M} := \hat{U}\hat{V}^t$ is also non-zero.

To achieve this rank-preservation, we turn to a lemma of Gabizon-Raz [GR08] (we note that this lemma has been used before for black-box PIT [KS08, SS11]). They gave an explicit family of $O(nr^2)$-many $r \times n$ matrices $\{A_\alpha\}_\alpha$, such that for any $U$ and $V$ of rank exactly $r$, at least one matrix $A_\alpha$ from the family is such that $\operatorname{rank}(A_\alpha U) = \operatorname{rank}(A_\alpha V) = r$. Translating this result into our problem, it follows that one of the matrices $\hat{M}_\alpha := (A_\alpha U)(A_\alpha V)^t = A_\alpha M A_\alpha^t$ is full-rank. The $(k, l)$-th entry of $\hat{M}_\alpha$ is $\vec{a}_k^{\,t} M \vec{a}_l = f_M(\vec{a}_k, \vec{a}_l)$, where $\vec{a}_k$ is the $k$-th row of $A_\alpha$. It follows that querying each entry in these matrices corresponds to a rank 1 measurement of $M$, and thus these queries make up a hitting set. As there were $O(nr^2)$ choices of $\alpha$ and $r^2$ choices of entry, this gives an $O(nr^4)$-sized hitting set.
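
The rank-preservation phenomenon itself is easy to observe numerically. The sketch below uses the original Gabizon-Raz family $(A_t)_{i,j} = t^{ij}$ (our improved family differs, as described next); the dimensions and the sample values of $t$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 8, 3
U = rng.normal(size=(n, r))    # an n x r matrix, of rank exactly r generically

def A(t, r, n):
    # the Gabizon-Raz style family: (A_t)_{i,j} = t^(i*j), an r x n matrix
    i, j = np.mgrid[0:r, 0:n]
    return t ** (i * j)

# for all but a bounded number of values of t, the r x n matrix A_t compresses
# F^n down to F^r while preserving the rank of U
ranks = [np.linalg.matrix_rank(A(t, r, n) @ U) for t in (1.1, 1.3, 1.7)]
assert all(rk == r for rk in ranks)
```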

To achieve a smaller hitting set, we use the following sequence of ideas. First, we observe that in the above, we can always restrict attention to the first row of $\hat{M}_\alpha$. Loosely, this is because $\hat{M}_\alpha$ is always full-rank, or zero, and a full-rank matrix has no zero row; thus, only the first row of $\hat{M}_\alpha$ needs to be queried to determine whether it is non-zero. Second, we improve upon the Gabizon-Raz lemma, and provide an explicit family of rank-preserving matrices with size $O(nr)$. This follows from modifying their construction so the degree of a certain determinant is smaller. To ensure that the determinant is a non-zero polynomial, we show that it has a unique monomial that achieves maximal degree, and that the term achieving maximal degree has a non-zero coefficient, as a Vandermonde determinant (formed from powers of an element $g$, which has large multiplicative order) is non-zero. Finally, we observe that the hitting set constraints can be viewed as constraints regarding polynomial interpolation. This view shows that some of the constraints are linearly-dependent, and thus can be removed. Each of the above observations saves a factor of $r$ in the size of the hitting set, and thus produces an $O(nr)$-sized hitting set.

Low-Rank Recovery

Having constructed hitting sets, Lemma 3.10 implies that the same construction yields low-rank-recovery sets. As this lemma does not provide a recovery algorithm, we provide one. To do so, we must first change the basis of our hitting set. That is, the hitting set yields a set of constraints on a matrix $M$, and we are free to choose another basis for these constraints, which we call $\mathcal{W}$. The virtue of this new basis is that each constraint is non-zero only on some $s$-diagonal (the entries $(i, j)$ such that $i + j = s$). It turns out that these constraints are the parity checks of a dual Reed-Solomon code with distance $2r + 1$. This code can be decoded efficiently using what is known as Prony’s method [dP95], which was developed in 1795. We give an exposition in Section 7.1, where we show how to syndrome-decode this code up to half its minimum distance, counting erasures as half-errors. Thus, given an $r$-sparse vector (which can be thought of as errors from the all-zeros vector) these parity checks impose constraints from which the sparse vector can be recovered. Put another way, our low-rank-recovery set naturally embeds a sparse-recovery set along each $s$-diagonal.
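
For the reader's convenience, here is a compact, self-contained sketch of Prony's method over $\mathbb{R}$ (the node base $g = 2$, the vector length, sparsity, and support are all illustrative choices; the paper's construction works over general fields):

```python
import numpy as np

g, n, k = 2.0, 8, 3    # illustrative: length-8 vectors, sparsity 3, nodes g**j

# a k-sparse vector c, measured by the 2k "power sums" m_s = sum_j c_j (g**j)**s;
# these are exactly parity checks of a dual Reed-Solomon code
c = np.zeros(n)
c[[1, 4, 6]] = [1.0, -2.0, 0.5]
nodes = g ** np.arange(n)
m = np.array([c @ nodes**s for s in range(2 * k)])

# Prony step 1: the monic degree-k polynomial whose roots are the support
# nodes satisfies the Hankel system  sum_t a_t m[s+t] = -m[s+k], s = 0..k-1
H = np.array([[m[s + t] for t in range(k)] for s in range(k)])
a = np.linalg.solve(H, -m[k:2 * k])
roots = np.roots(np.concatenate(([1.0], a[::-1])))

# Prony step 2: read the support off the roots, then solve a Vandermonde
# system for the values
support = np.sort(np.round(np.log(np.abs(roots)) / np.log(g)).astype(int))
V = np.array([[(g**j) ** s for j in support] for s in range(2 * k)])
vals, *_ = np.linalg.lstsq(V, m, rcond=None)

rec = np.zeros(n)
rec[support] = vals
assert np.allclose(rec, c, atol=1e-5)
```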

Thus, in designing a recovery algorithm for our low-rank recovery set, we do more and show how to recover from any set of measurements which embeds a sparse-recovery set along each $s$-diagonal. In terms of error-correcting codes, this shows that any Hamming-metric code yields a rank-metric code over matrices, and that decoding the rank-metric code efficiently reduces to decoding the Hamming-metric code.

To perform recovery, we introduce the notion of a matrix being in $s$-upper-echelon form. Loosely, this says that $M_{<s}$, the entries of the matrix with $i + j < s$, are in row-reduced echelon form. We then show that for any matrix of rank $\le r$ in $s$-upper-echelon form, the $s$-diagonal is $2r$-sparse. As an example, suppose $M_{<s}$ was entirely zero. It follows then that $M$ is in $s$-upper-echelon form. Further, the rows that have non-zero entries on the $s$-diagonal of $M$ are then linearly-independent, as they form a triangular system. It follows that the $s$-diagonal can only have $r$ non-zero entries. The more general case is slightly more complicated technically, but not conceptually. Thus, this echelon-form translates the notion of low-rank into the notion of sparsity.

The algorithm then follows naturally. We induct on $s$, first putting $M$ into $s$-upper-echelon form (using row-reduction), and then invoking a sparse-recovery oracle on the $s$-diagonal of $M$ to recover it. This then yields $M_{\le s}$, and we increment $s$. However, as described so far, the use of the sparse-recovery oracle is adaptive. We show that the row-reduction procedure can be understood such that the adaptive use of the sparse-recovery oracle can be simulated using non-adaptive calls to the oracle. More specifically, we will apply the measurements of the sparse-recovery oracle on each $s$-diagonal of the original matrix (which may not be sparse), and show how to compute the measurements of the adaptive algorithm (where the $s$-diagonals are sparse) from the measurements made. Putting these steps together, this shows that exact non-adaptive low-rank-recovery reduces to exact non-adaptive sparse-recovery. Instantiating this claim with our hitting sets from above gives a concrete low-rank-recovery set, with an accompanying recovery algorithm.

Hitting Sets and Low-Rank Recovery for Tensors

The results for matrices naturally generalize to tensors in the sense that a size-$n$ degree-$d$ tensor can be viewed as an $n^{\lceil d/2 \rceil} \times n^{\lfloor d/2 \rfloor}$ matrix. However, we can do better. Specifically, the hitting set results were done via variable reduction, as encapsulated by Theorem 5.1, which shows that a rank $\le r$ bivariate polynomial is zero iff a set of $r$ univariate polynomials are all zero. Further, the degrees of these polynomials are only twice the original degree. As each univariate polynomial can be interpolated using $O(n)$ evaluations, this yields $O(nr)$ measurements in total. This motivates the more general idea of treating a degree $d$ tensor as a $d$-variate polynomial, and showing that we can test whether this polynomial is zero by testing if a collection of $k$-variate polynomials are zero, for $k < d$. Recursing on this procedure then reduces the $d$-variate case to the univariate case, and the univariate case is brute-force interpolated.

The recursion scheme we develop for this is to show that a $2k$-variate polynomial is zero iff $\mathrm{poly}(n, r)$-many $k$-variate polynomials are zero, and this naturally leads to a $\mathrm{poly}(n, r)^{O(\log d)}$-sized hitting set. To prove its correctness, we show that the bivariate case (corresponding to matrices) applied to two groups of variables allows us to reduce to a single group of variables (with an increase in the number of polynomials to test). Finally, since we saw how to do low-rank recovery for matrices, and the tensor case essentially only uses the matrix case, we can also turn this hitting set procedure into a low-rank recovery algorithm.

Simulation of Large Fields by Small Fields

Almost all of the results mentioned require a field of size $\ge \mathrm{poly}(n, d)$. When getting results over small fields, we show that, with some loss, we can simulate such large fields inside the hitting sets. We break up each tensor $H$ in the original hitting set into new tensors $H_1, H_2, \ldots$ such that for any tensor $T$ over the small field, $\langle T, H \rangle$ can be reconstructed from the set of values $\{\langle T, H_i \rangle\}_i$. To do so, we use the well-known representation of a degree $k$ extension field of $\mathbb{F}$ as a field of $k \times k$ matrices over $\mathbb{F}$. As the entries of a rank-1 tensor are multiplications of elements of the extension field, we can expand these multiplications out as iterated matrix multiplications, which yields a small number of terms to consider, each of which corresponds to some $H_i$.
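
As a tiny concrete instance of this representation (a hedged sketch; the field and basis are our own choices), one can check that $\mathbb{F}_4 = \mathbb{F}_2[z]/(z^2 + z + 1)$ embeds multiplicatively into $2 \times 2$ matrices over $\mathbb{F}_2$:

```python
import numpy as np

# represent a + b*z in GF(4) = GF(2)[z]/(z^2 + z + 1) by the matrix of
# "multiplication by a + b*z" in the basis (1, z), over GF(2)
def mat(a, b):
    return np.array([[a, b], [b, (a + b) % 2]]) % 2

def gf4_mul(p, q):
    # multiply (a + b z)(c + d z) directly, reducing z^2 = z + 1
    (a, b), (c, d) = p, q
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

# the representation is multiplicative: mat(p) @ mat(q) = mat(p * q) mod 2
for p in [(0, 1), (1, 1), (1, 0)]:
    for q in [(1, 0), (0, 1), (1, 1)]:
        assert np.array_equal((mat(*p) @ mat(*q)) % 2, mat(*gf4_mul(p, q)))
```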

Rank-Metric Codes

The above techniques give the existence of low-rank-recovery sets (and corresponding algorithms) for tensors, over any field. Via the connections presented in Section 1.3, this readily yields rank-metric codes with corresponding parameters.

2 Notation

We now fix some notation. For a positive integer $n$ we denote $[n] := \{1, \ldots, n\}$ and $\llbracket n \rrbracket := \{0, 1, \ldots, n-1\}$. We use $\binom{[n]}{r}$ to denote the set of all subsets of $[n]$ of size $r$. Given a set $S$ of integers, we denote $\sigma(S) := \sum_{j \in S} j$. All logarithms will be base 2. Given a polynomial $f$, $\deg(f)$ will denote the total degree of $f$, and $\deg_{x}(f)$ will denote the individual degree of $f$ in the variable $x$. That is, the polynomial $xy$ has total degree 2 and individual degree 1 in the variable $x$ and individual degree 0 in the variable $z$. Given a monomial $m$, $[m](f)$ will denote the coefficient of $m$ in the polynomial $f$.

Vectors, matrices, and tensors will all begin indexing from 0, instead of from 1. The number $n$ will typically refer to the number of rows of a matrix, and $m$ the number of columns. $I_n$ will denote the $n \times n$ identity matrix. Denote $E_{i,j}$ to be the square matrix with its $(i, j)$-th entry being 1, and all other entries being zero. A vector is $k$-sparse if it has at most $k$ non-zero entries. Given a matrix $M$, $M^t$ will denote its transpose. Given a vector $\vec{x}$, $\vec{x}^{\,t}$ will denote its transpose (a row vector).

A list of values in $\mathbb{F}$ is $t$-explicit if each entry can be computed in $t$ steps, where we allow operations in $\mathbb{F}$ to be done at unit cost.

Frequently throughout this paper we will divide a matrix into its diagonals, which we define as the entries $(i, j)$ where $i + j$ is constant. The following notation will make this discussion more convenient.

Notation 2.1.

Let $M$ be an $n \times m$ matrix. The $s$-diagonal of $M$ is the set of entries $\{(i, j) : i + j = s\}$. The $(\le s)$-diagonals of $M$ is the set of entries $\{(i, j) : i + j \le s\}$. The $(< s)$-diagonals of $M$ is the set of entries $\{(i, j) : i + j < s\}$.

$M_s$, $M_{\le s}$ and $M_{< s}$ will denote the $s$-diagonal, the $(\le s)$-diagonals and the $(< s)$-diagonals of $M$, respectively.

This notation will be frequently abused, in that a diagonal will refer to a set of positions in a matrix in addition to referring to the values in those positions. However, the main diagonal of a matrix will refer to the entries $(i, i)$ of that matrix.
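
In code, extracting an $s$-diagonal is a one-liner; the following sketch (with the hypothetical helper name s_diagonal) fixes the index bookkeeping once and for all:

```python
import numpy as np

def s_diagonal(M, s):
    """The entries M[i, j] with i + j = s (indexing from 0)."""
    n, m = M.shape
    return np.array([M[i, s - i] for i in range(max(0, s - m + 1), min(n, s + 1))])

M = np.arange(12).reshape(3, 4)
assert list(s_diagonal(M, 2)) == [M[0, 2], M[1, 1], M[2, 0]]
```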

3 Preliminaries

In this section we formally define tensors as well as the PIT and LRR problems. We first discuss tensors, and their notion of rank. Rank-metric codes will be defined and discussed in Section 8. Recall that we index starting at 0, so we will use the product space $\llbracket n_1 \rrbracket \times \cdots \times \llbracket n_d \rrbracket$ instead of $[n_1] \times \cdots \times [n_d]$ for the domains of tensors.

Definition 3.1.

A tensor $T$ over a field $\mathbb{F}$ is a function $T : \llbracket n_1 \rrbracket \times \cdots \times \llbracket n_d \rrbracket \to \mathbb{F}$. It is said to have degree $d$ and size $n_1 \times \cdots \times n_d$. If all of the $n_c$ are equal to $n$, then $T$ is said to have size $n$.

Given two tensors $S, T$ of size $n_1 \times \cdots \times n_d$, their inner product is $\langle S, T \rangle := \sum_{i_1, \ldots, i_d} S(i_1, \ldots, i_d) \, T(i_1, \ldots, i_d)$.

Note that the above inner product is the natural inner product when regarding a tensor as a vector of dimension $n_1 \cdots n_d$. We now define the notion of rank. Loosely, a tensor is rank 1 if it can be “factored” along each dimension, and a tensor is rank $r$ if it can be expressed as the sum of $r$ rank 1 tensors.

Definition 3.2.

A tensor $T$ is rank-one if for each $c \in [d]$ there is a vector $\vec{v}_c \in \mathbb{F}^{n_c}$ such that $T = \vec{v}_1 \otimes \cdots \otimes \vec{v}_d$. That is, $T(i_1, \ldots, i_d) = \prod_{c=1}^{d} (\vec{v}_c)_{i_c}$ for all $(i_1, \ldots, i_d)$, where $(\vec{v}_c)_{i_c}$ denotes the $i_c$-th coordinate of $\vec{v}_c$.

The rank of a tensor $T$, $\operatorname{rank}(T)$, is defined as the minimum number of terms in a summation of rank-1 tensors expressing $T$; that is, $\operatorname{rank}(T) := \min\{r : T = \sum_{l=1}^{r} T_l \text{ with each } T_l \text{ rank-one}\}$.

As one might hope, when $d = 2$ the above definitions reduce to the definition of a matrix, and matrix-rank, respectively. Further, the inner-product is then the Frobenius inner product of matrices. That is, $\langle M, N \rangle = \sum_{i,j} M_{i,j} N_{i,j} = \operatorname{tr}(M N^t)$.

We now define the polynomial of a tensor.

Definition 3.3.

Let $T$ be a tensor of size $n_1 \times \cdots \times n_d$, and let $\vec{x}_1, \ldots, \vec{x}_d$ be vectors of variables, so $\vec{x}_c = (x_{c,0}, \ldots, x_{c,n_c - 1})$ for all $c \in [d]$. Then define

$$f_T(\vec{x}_1, \ldots, \vec{x}_d) := \sum_{i_1, \ldots, i_d} T(i_1, \ldots, i_d) \, x_{1,i_1} \cdots x_{d,i_d} = \langle T, \vec{x}_1 \otimes \cdots \otimes \vec{x}_d \rangle,$$

and define the $d$-variate polynomial

$$\hat{f}_T(x_1, \ldots, x_d) := f_T(\vec{x}_1, \ldots, \vec{x}_d)\big|_{x_{c,i} = x_c^i},$$

where each variable $x_{c,i}$ is replaced by the power $x_c^i$.

Note that the second equality in the first equation of the above definition follows from the definition of the inner product over tensors. As a matrix is also a tensor, we will also use this notation when considering the polynomial $f_M$, as the above definition readily generalizes the notion of a quadratic form. Note that $\hat{f}_T$ allows us to consider any $d$-variate polynomial (with individual degrees less than $n_c$) to be a tensor, and the rank of such a polynomial will simply be the rank of the corresponding tensor.
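
The equivalence between evaluating $f_T$ and measuring $T$ against a rank-1 tensor is easy to verify numerically. A minimal sketch for $d = 3$ (all sizes and data illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 4, 3, 2

# a rank-<=r degree-3 tensor, as a sum of r outer products
factors = [[rng.normal(size=n) for _ in range(d)] for _ in range(r)]
T = sum(np.einsum('i,j,k->ijk', *fs) for fs in factors)

# evaluating f_T at (x1, x2, x3) equals the inner product of T with the
# rank-1 tensor x1 (x) x2 (x) x3
xs = [rng.normal(size=n) for _ in range(d)]
f_T = np.einsum('ijk,i,j,k->', T, *xs)
direct = sum(np.prod([u @ x for u, x in zip(fs, xs)]) for fs in factors)
assert np.isclose(f_T, direct)
```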

We now show the connection of these polynomials to set-multilinear depth-3 circuits. We do not seek to define all of the relevant terms in this notion, and instead refer the reader to the recent survey [SY10], and will simply define the subclass we are interested in.

Definition 3.4.

For $c \in [d]$, let $\vec{x}_c = (x_{c,0}, \ldots, x_{c,n-1})$ be vectors of variables. A degree $d$, set-multilinear, $\Sigma\Pi\Sigma$ circuit with top fan-in $r$, is a polynomial of the following form

$$f(\vec{x}_1, \ldots, \vec{x}_d) = \sum_{l=1}^{r} \prod_{c=1}^{d} L_{l,c}(\vec{x}_c),$$

where each $L_{l,c}$ is a linear form in the variables $\vec{x}_c$.

We now see the following connection between these circuits and tensors.

Lemma 3.5.

The polynomials computed by degree $d$ set-multilinear $\Sigma\Pi\Sigma$ circuits, with top fan-in $r$, on the variables $\vec{x}_1, \ldots, \vec{x}_d$, are exactly the polynomials $f_T$, for tensors $T$ with rank $\le r$.

Proof.

($\Leftarrow$): Suppose $T$ is of rank $\le r$, so $T = \sum_{l=1}^{r} \vec{v}_{l,1} \otimes \cdots \otimes \vec{v}_{l,d}$ for vectors $\vec{v}_{l,c}$. Then $f_T = \sum_{l=1}^{r} \prod_{c=1}^{d} \langle \vec{v}_{l,c}, \vec{x}_c \rangle$, and this final polynomial is computed as a set-multilinear $\Sigma\Pi\Sigma$ circuit.

($\Rightarrow$): This argument is simply the reverse of the above. ∎

We also get the following result for the polynomial .

Lemma 3.6.

For $T$ with rank $\le r$, $\hat{f}_T(x_1, \ldots, x_d) = \sum_{l=1}^{r} \prod_{c=1}^{d} u_{l,c}(x_c)$, where $u_{l,c}(x_c) := \sum_{i=0}^{n_c - 1} (\vec{v}_{l,c})_i \, x_c^i$.

Proof.

As $T$ is rank $\le r$, $T = \sum_{l=1}^{r} \vec{v}_{l,1} \otimes \cdots \otimes \vec{v}_{l,d}$ for vectors $\vec{v}_{l,c}$. Then $f_T = \sum_{l=1}^{r} \prod_{c=1}^{d} \langle \vec{v}_{l,c}, \vec{x}_c \rangle$. Taking $x_{c,i} = x_c^i$ yields the result. ∎

Recall that, as discussed in the introduction, set-multilinear circuits have a white-box polynomial-time PIT algorithm due to Raz and Shpilka [RS05] but no known polynomial-sized black-box PIT algorithm. By the above connection, this is the same as creating hitting sets for tensors, which we will now define.

Definition 3.7.

Let $\mathbb{K}$ be an extension field of $\mathbb{F}$. A hitting set $\mathcal{H}$ for tensors of rank $\le r$ over $\mathbb{F}$ is a set of points in $\mathbb{K}^{n_1} \times \cdots \times \mathbb{K}^{n_d}$ such that for any tensor $T$ of rank $\le r$, $f_T$ is non-zero iff there exists a point $\vec{\alpha} \in \mathcal{H}$ such that $f_T(\vec{\alpha}) \ne 0$.

However, we saw in Definition 3.3 that evaluating $f_T$ is equivalent to taking an inner product of $T$ with a rank-1 tensor. This leads to the following equivalent definition.

Definition 3.8 (Reformulation of Definition 3.7).

Let $\mathbb{K}$ be an extension field of $\mathbb{F}$. A hitting set for tensors of rank $\le r$ over $\mathbb{F}$ is a set $\mathcal{H}$ of rank-1 tensors over $\mathbb{K}$ such that for any tensor $T$ of rank $\le r$, $T$ is non-zero iff there exists $H \in \mathcal{H}$ such that $\langle T, H \rangle \ne 0$.

If instead $\mathcal{H}$ is not constrained to consist of rank-1 tensors, then we say $\mathcal{H}$ is an improper hitting set.

As is common in the PIT literature, we allow the use of the extension field $\mathbb{K}$, and in our case a field of size $\mathrm{poly}(n, d)$ will be sufficient. However, the results of Section 6.3 will show how to remove the need for $\mathbb{K}$ from our results (with some loss).

We now define our notion of a low-rank recovery set, extending Definition 3.8. Note that we drop here the restriction that the tensors must be rank 1.

Definition 3.9.

A set $\mathcal{R}$ of tensors is an $r$-low-rank-recovery set if for every tensor $T$ with rank $\le r$, $T$ is uniquely determined by $m(T)$, where $m(T)$ is the vector of measurements defined by $m(T)_H := \langle T, H \rangle$, for $H \in \mathcal{R}$.

An algorithm performs recovery from $\mathcal{R}$ if, for each such $T$, it recovers $T$ given $m(T)$.

We now show that, despite low-rank recovery being a stronger notion than a hitting set, hitting sets imply low-rank recovery with some loss in parameters, as seen by the following lemma.

Lemma 3.10.

If $\mathcal{H}$ is a (proper or improper) hitting-set for tensors of rank $\le 2r$, then $\mathcal{H}$ is an $r$-low-rank-recovery set for tensors also.

Proof.

Let $S$ and $T$ be two tensors of rank $\le r$ such that their inner products with the tensors in $\mathcal{H}$ are the same. By linearity of the inner product, it follows then that the tensor $S - T$ has rank $\le 2r$ and has zero inner product with each tensor in $\mathcal{H}$. As $\mathcal{H}$ is a hitting set, it follows that $S - T = 0$, and thus $S = T$. Therefore, tensors of rank $\le r$ are determined by their inner products with $\mathcal{H}$ and thus $\mathcal{H}$ is an $r$-low-rank-recovery set. ∎

We now discuss some trivial LRR results. The first result is the obvious low-rank recovery construction, which is extremely explicit but requires many measurements.

Lemma 3.11.

For $n, d \ge 1$ and any $0 \le r \le n$, there is a $\mathrm{poly}(n, d)$-explicit $r$-low-rank-recovery set for size-$n$ degree-$d$ tensors, of size $n^d$. Further, recovery of $T$ is possible in $O(n^d)$ time.

Proof.

For $(i_1, \ldots, i_d) \in \llbracket n \rrbracket^d$, let $E_{i_1, \ldots, i_d} := \vec{e}_{i_1} \otimes \cdots \otimes \vec{e}_{i_d}$ be the rank 1 tensor which is the indicator function for the point $(i_1, \ldots, i_d)$. Thus, $\langle T, E_{i_1, \ldots, i_d} \rangle = T(i_1, \ldots, i_d)$. It follows that $T = 0$ iff each such inner product is zero, and further that recovery of $T$ is possible (in $O(n^d)$ time). The explicitness of the recovery set is also clear. ∎

We now will show that, via the probabilistic method, one can show that much smaller low-rank recovery sets exist. To do so, we first cite the following form of the Schwartz-Zippel Lemma.

Lemma 3.12 (Schwartz-Zippel Lemma [Sch80, Zip79]).

Let $f \in \mathbb{F}[x_1, \ldots, x_m]$ be a non-zero polynomial of total degree $\le \delta$, and let $S \subseteq \mathbb{F}$ be finite. Then $\Pr_{\vec{\alpha} \in S^m}[f(\vec{\alpha}) = 0] \le \delta / |S|$.

We now give a (standard) probabilistic method proof that small hitting sets exist (over finite fields). We present this not as a tight result, but as an example of what parameters one can hope to achieve.

Lemma 3.13.

Let $\mathbb{F}_q$ be the field on $q$ elements. Let $n, d, r \ge 1$ and $q \ge d^2$. Then there is a hitting set for size-$n$ degree-$d$ tensors of rank $\le r$, of size $O(ndr)$. Further, there is an $r$-low-rank recovery set of size $O(ndr)$.

Proof.

For any non-zero tensor $T$, $f_T$ has degree $d$, and thus by the Schwartz-Zippel Lemma, for a random point $\vec{\alpha} \in (\mathbb{F}_q^n)^d$, $f_T(\vec{\alpha}) = 0$ with probability at most $d/q$. There are at most $q^{ndr}$ such non-zero tensors of rank $\le r$. By a union bound, it follows that $s$ random points are not a hitting set for rank $\le r$ tensors with probability at most $q^{ndr} \cdot (d/q)^s$, which is $< 1$ if $s \ge 2ndr$ (using $q \ge d^2$). The low-rank-recovery set follows from Lemma 3.10. ∎

We now briefly remark on the tightness of the above result. The general case of tensors is not well understood, as it is not well-understood how many tensors there are of a given rank. For matrices, the situation is much more clear. In particular, Roth [Rot91] showed (using the language of rank-metric codes) that over finite fields the best (improper) hitting set for $n \times n$ matrices of rank $\le r$ is of size $nr$, and over algebraically closed fields the best (improper) hitting set is of size $r(2n - r)$. As we will aim to be field independent, the second bound is more relevant, and we indeed match this bound (as seen in Theorem 5.10) with a proper hitting set.

Clearly, the above lemma is non-explicit. However, it yields a much smaller hitting set than the result given in Lemma 3.11. Note that previous work (even for $d = 2$) on LRR and rank-metric codes did not focus on requiring that the measurements are rank-1 tensors, and thus cannot be used for PIT. Given this lack of knowledge, this paper seeks to construct proper hitting sets, and low-rank-recovery sets, that are both explicit and small.

We remark that any explicit hitting set naturally leads to tensor rank lower bounds (this connection, along with the connection to rank-metric codes mentioned earlier, can be put in a more broad setting: hitting sets (and thus lower-bounds) for circuits from some class $\mathcal{C}$ are in a sense equivalent to $\mathcal{C}$-metric linear codes, that is, codes where the weight of a word $w$ is defined as the size of the smallest circuit in $\mathcal{C}$ whose truth table is the string $w$; we do not pursue this idea further in this work). The following lemma, which can be seen as a special case of the more general results of Heintz-Schnorr [HS80] and Agrawal [Agr05], shows this connection more concretely.

Lemma 3.14.

Let $\mathcal{H}$ be a hitting set for size-$n$ degree-$d$ tensors of rank $\le r$, such that $|\mathcal{H}| < n^d$. Then there is a $\mathrm{poly}(n^d)$-explicit tensor of rank $> r$.

Proof.

Consider the constraints imposed on a tensor $T$ by the system of equations $\{\langle T, H \rangle = 0\}_{H \in \mathcal{H}}$. There are $|\mathcal{H}| < n^d$ constraints and $n^d$ variables. It follows that there is a non-zero $T$ solving this system. By the definition of a hitting set, it follows that $\operatorname{rank}(T) > r$. That $T$ is explicit follows from Gaussian Elimination. ∎
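
The proof is entirely constructive, as the following sketch shows (hedged: over $\mathbb{R}$, random rank-1 measurements form a hitting set for rank $\le r$ matrices with probability 1 once there are enough of them, and a null-space vector is found by SVD rather than Gaussian elimination):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 6, 2

# 2nr < n^2 generic rank-1 measurements: over R these hit every non-zero
# matrix of rank <= r with probability 1
Hs = [np.outer(rng.normal(size=n), rng.normal(size=n)) for _ in range(2 * n * r)]

# a non-zero matrix orthogonal to every element of the hitting set must
# therefore have rank > r
A = np.array([H.ravel() for H in Hs])      # |H| x n^2 constraint matrix
M = np.linalg.svd(A)[2][-1].reshape(n, n)  # a null-space element (|H| < n^2)
assert np.linalg.matrix_rank(M) > r
```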

For $d = 2$, the above is less interesting, as matrix rank is well understood and we know many matrices of high rank. For $d \ge 3$, tensor rank is far less understood. For $d = 3$, the best known lower bounds for the rank of explicit tensors, over arbitrary fields, due to Alexeev, Forbes, and Tsimerman [AFT11], are $3n - o(n)$ (over $\mathbb{R}$, a somewhat stronger linear lower bound is known, essentially due to Brown and Dobkin [BD80]). More generally, for any fixed $d$, no explicit tensors are known with tensor rank $\omega(n^{\lfloor d/2 \rfloor})$. The above lemma shows that constructing hitting sets is at least as hard as getting a lower bound on any specific tensor. In particular, constructing a hitting set for tensors of rank $\le r$ of size $< n^d$, for $r = \omega(n^{\lfloor d/2 \rfloor})$, would yield new tensor rank lower bounds for odd $d$, in particular for $d = 3$. Such lower bounds would imply new circuit lower bounds, using the results of Strassen [Str73] and Raz [Raz10]. Our results give a hitting set of size roughly $\mathrm{poly}(n, r)^{O(\log d)}$, and we leave open whether further improvements are possible.

We will mention the definitions and preliminaries of rank-metric codes in Section 8.

3.1 Paper Outline

We briefly outline the rest of the paper. In Section 4 we give our improved construction of rank-preserving matrices, which were first constructed by Gabizon-Raz [GR08]. In Section 5 we then use this construction to give our reduction from bivariate identity testing to univariate identity testing (Section 5.1), which then readily yields our hitting set for matrices (Section 5.2). In Section 5.3 we show an equivalent hitting set, which is more useful for low-rank-recovery.

Section 6 extends the above results to tensors, where Section 6.1 reduces $d$-variate identity testing to univariate identity testing, and Section 6.2 uses this reduction to construct hitting sets for tensors. Finally, Section 6.3 shows how to extend these results to any field.

Low-rank recovery of matrices is discussed in Section 7. It is split into two parts. Section 7.1 shows how to decode dual Reed-Solomon codes, which we use as a sparse-recovery oracle. Section 7.2 shows how to, given any such sparse-recovery oracle, perform low-rank-recovery of matrices. Instantiating the oracle with dual Reed-Solomon codes gives our low-rank-recovery construction.

Section 8 shows how to extend our LRR algorithms to tensors, and how to use these results to construct rank-metric codes. Finally, Section 9 discusses some problems left open by this work.

4 Improved Construction of Rank-preserving Matrices

In this section we will give an improved version of the Gabizon-Raz lemma [GR08] on the construction of rank-preserving matrices. The goal is to transform an $r$-dimensional subspace living in an $n$-dimensional ambient space to an $r$-dimensional subspace living in an $r$-dimensional ambient space. We will later show (see Theorem 5.1) how to use such a transformation to reduce the problem of PIT for $n \times n$ matrices of rank $\le r$ to the problem of PIT for $r \times r$ matrices of rank $\le r$.

We first present the Gabizon-Raz lemma ([GR08], Lemma 6.1), stated in the language of this paper.

Lemma (Gabizon-Raz ([GR08], Lemma 6.1)).

Let $0 \le r \le n$. Let $V \in \mathbb{F}^{n \times r}$ be of rank $r$. Define $A_t \in \mathbb{F}^{r \times n}$ by $(A_t)_{i,j} = t^{ij}$. Then there are at most $O(nr^2)$ values of $t$ such that $\operatorname{rank}(A_t V) < r$.

Our version of this lemma gives a set of matrices parameterized by $\alpha$ where there are only $O(nr)$ values of $\alpha$ that lead to $\operatorname{rank}(A_\alpha V) < r$. This saved factor of $r$ allows us to achieve an $O(nr)$-sized hitting set for matrices instead of an $O(nr^2)$-sized hitting set. We comment more on the necessity of this improvement in Remark 5.3. We now state our version of this lemma. Our proof is very similar to that of Gabizon-Raz.

Theorem 4.1.

Let $0 \le r \le n$. Let $V \in \mathbb{F}^{n \times r}$ be of rank $r$. Let $\mathbb{K}$ be a field extending $\mathbb{F}$, and let $g \in \mathbb{K}$ be an element of multiplicative order at least $n$. Define $A_\alpha \in \mathbb{K}^{r \times n}$ by $(A_\alpha)_{i,j} = g^{ij} \alpha^j$. Then there are fewer than $nr$ values of $\alpha$ such that $\operatorname{rank}(A_\alpha V) < r$.

Proof.

We will now treat $\alpha$ as a variable, and thus refer to $A_\alpha$ simply as $A$. The matrix $AV$ is an $r \times r$ matrix, and thus the claim will follow from showing that $\det(AV)$ is a non-zero polynomial in $\alpha$ of degree $< nr$. As $\deg_\alpha (A)_{i,j} = j \le n - 1$, $\deg_\alpha \det(AV) \le r(n - 1) < nr$.

To analyze this determinant, we invoke the Cauchy-Binet formula.

Lemma (Cauchy-Binet Formula, Lemma A.1).

Let $r \le n$. Let $A \in \mathbb{F}^{r \times n}$, $B \in \mathbb{F}^{n \times r}$. For $S \in \binom{[n]}{r}$, let $A_S$ be the $r \times r$ matrix formed from $A$ by taking the columns with indices in $S$. Let $B_S$ be defined analogously, but with rows. Then

$$\det(AB) = \sum_{S \in \binom{[n]}{r}} \det(A_S) \det(B_S),$$

so that

$$\det(AV) = \sum_{S \in \binom{[n]}{r}} \det(A_S) \det(V_S).$$

For $S = \{j_1 < j_2 < \cdots < j_r\}$,

$$\det(A_S) = \det\big(g^{i j_k} \alpha^{j_k}\big)_{i,k} = \alpha^{\sigma(S)} \cdot \det\big((g^{j_k})^i\big)_{i,k} = \alpha^{\sigma(S)} \prod_{k < k'} \big(g^{j_{k'}} - g^{j_k}\big).$$

By assumption the order of $g$ is at least $n$, so the elements $g^{j_1}, \ldots, g^{j_r}$ are distinct, implying that the above Vandermonde determinant is non-zero.

Further, we observe that $\deg_\alpha \det(A_S) = \sigma(S)$. As $\sigma(S) \le (n - r) + \cdots + (n - 1) < nr$, it follows that $\deg_\alpha \det(A_S) < nr$, and thus $\deg_\alpha \det(AV) < nr$ also.

We now show $\det(AV)$ is not identically zero, as a polynomial in $\alpha$. We show this by showing that there is no cancellation of terms at the highest degree of $\alpha$. That is, there is a unique set $S$ maximizing $\sigma(S)$ subject to $\det(V_S) \ne 0$. This is proven by the following lemma.

Lemma 4.2.

Let $r \le n$. Let $V \in \mathbb{F}^{r \times n}$ be a matrix of rank $r$. For $S \in \binom{[n]}{r}$, denote $V_S$ as the $r \times r$ matrix formed by taking the columns in $V$ (in order) whose indices are in $S$. Denote $\sigma(S) := \sum_{j \in S} j$. Then there is a unique set $S$ that maximizes $\sigma(S)$ subject to $\det(V_S) \ne 0$.

Proof.

The proof uses the ideas of the Steinitz Exchange Lemma. That is, recall the following facts in linear algebra. If $A$ and $B$ are both sets of linearly independent vectors, and $|A| < |B|$, then there is some $\vec{b} \in B \setminus A$ such that $A \cup \{\vec{b}\}$ is linearly independent. Thus, if $A$ and $B$ are both sets of linearly independent vectors with $|A| = |B|$ and $A \ne B$, then for any $\vec{a} \in A \setminus B$ there is a vector $\vec{b} \in B \setminus A$ such that $(A \setminus \{\vec{a}\}) \cup \{\vec{b}\}$ is linearly independent.

Now suppose (for contradiction) that there are two different sets $S \ne T$ that maximize $\sigma$ over the sets $S'$ such that $\det(V_{S'}) \ne 0$, so that $\sigma(S) = \sigma(T)$. Pick the smallest index in the (non-empty) symmetric difference