Classical and Quantum Algorithms for Tensor Principal Component Analysis
We present classical and quantum algorithms based on spectral methods for a problem in tensor principal component analysis. The quantum algorithm achieves a quartic speedup while using exponentially smaller space than the fastest classical spectral algorithm, and a super-polynomial speedup over classical algorithms that use only polynomial space. The classical algorithms that we present are related to, but slightly different from those presented recently in Ref. wein2019kikuchi (). In particular, we have an improved threshold for recovery and the algorithms we present work for both even and odd order tensors. These results suggest that large-scale inference problems are a promising future application for quantum computers.
Principal component analysis is a fundamental technique that finds applications in reducing the dimensionality of data and denoising. While an optimal choice of principal components for a matrix can be computed efficiently using linear algebra, the corresponding problem for tensors is much less well-understood. Ref. richard2014statistical () introduced a simple statistical model for tensor principal component analysis, termed the “spiked tensor” problem, and this paper has lead to a large amount of follow-up research. The model consists of (see below for more precise definitions) randomly choosing some unknown “signal vector” ; then, the -th order tensor
is formed, where is noise chosen from some random distribution and where is some scalar representing a signal-to-noise ratio. One task called recovery is to infer (to some accuracy) given . A simpler task called detection is to distinguish the case from for some , again just given .
Ref. richard2014statistical () presented a variety of algorithms for this problem. Following the normalization of Ref. wein2019kikuchi (), the entries of are chosen independently from a Gaussian distribution of zero mean and unit variance, with . Then, information theoretically, it is possible to recover for much larger than richard2014statistical (); Lesieur_2017 (). However, no polynomial time algorithm is known that achieves this performance. Rather, the two best known algorithms are spectral and sum-of-squares. Spectral algorithms were first suggested in Ref. richard2014statistical (). There, a matrix is formed from (if is even, the matrix is -by-, with its entries given by entries of ) and the leading eigenvector of the matrix is used to determine . For even , this method works for much larger than , and a variant of it is conjectured to perform similarly for odd . Methods based on the sum-of-squares also perform similarly to the spectral method. The sum-of-squares method hopkins2015tensor (); Hopkins_2016 () for this problem gives rise to a sequence of algorithms tradeoff (); tradeoff2 (), in which one can recover at smaller than at the cost of runtime and space increasing exponentially in . In Ref. wein2019kikuchi (), a sequence of spectral algorithms with similar performance was shown.
In this paper, we present another spectral algorithm for the problem. Our spectral algorithm for even is closely related to that of Ref. wein2019kikuchi () which we became aware of while this paper was in preparation, and we use the normalization in that paper. However, we make several changes. Our main technical results are the following. First, we prove an improved threshold for recovery for even using our algorithm; the improvement is by a constant factor and relies on a randomized method of recovering. Second, we provide a different algorithm for odd with provable guarantees on while no guarantees were given in Ref. wein2019kikuchi () for odd . For both even and odd , we have provable bounds on recovery for of order (without any polylogarithmic factors) and we have a sequence of algorithms similar to that above for small compared to . Third, we give a quantum algorithm for our spectral method, achieving a quartic speedup and exponential reduction in space. This quantum algorithm involves two main ideas. The first uses phase estimation and amplitude amplification to obtain a quadratic speedup in computing the largest eigenvector. The second idea uses a chosen input state to obtain a further quadratic speedup, giving the overall quartic speedup.
We emphasize that the quantum speedup is quartic compared to classical spectral algorithms presented here and in previous work. We are not able to make an accurate comparison of the runtime to sum-of-squares methods. In part, given that the runtime of all of these algorithms increases exponentially in , a change in prefactors in some estimates for threshold can give rise to polynomial changes in runtime. We expect that many of these estimates of thresholds are not tight (indeed, we expect that they are off by a polylogarithmic factor), and so either improved analytic methods or numerical simulations are needed to give an accurate comparison.
At a more heuristic level, we present a rather different motivation for our spectral algorithm compared to Ref. wein2019kikuchi (). Rather than being motivated by the so-called Kikuchi free energy, we instead are motivated by mean-field approximations to quantum many-body systems. We consider a system of a some number of qudits, each of dimension , and use the tensor to construct a quantum Hamiltonian on these qudits. Increasing gives rise to a similar sequence of algorithms as above, with increased runtime but improved performance: the required increases polynomially in as , but the runtime increases exponentially.
Restricting to the symmetric subspace, these qudits can be thought of as a system of bosons. In the case , for example, our Hamiltonian has pairwise interaction terms for all pairs of qudits. It is natural from the viewpoint of mean-field theory in physics then to expect that the leading eigenvector of the problem, for large , can be approximated by a product state. While the bounds for arbitrary pairwise Hamiltonians would require rather large for given in order for such a mean-field approximation to be accurate Werner_1992 (); Kraus_2013 (); B2016 (), we will be able to prove that for the statistical model above the mean-field approximation becomes accurate with high probablity at much smaller , depending upon the value of . In this mean-field regime, the product state is an -fold tensor product of a single particle state, and this single particle state is given by in an obvious way, regarding the vector as a vector in the single-particle Hilbert space. While we will not prove that this state is a good approximation to the leading eigenvector, it will be a good approximation to some state in an eigenspace with large eigenvalue. Then, the single particle density matrix allows one to infer (a similar matrix was used in Ref. wein2019kikuchi () where it was termed a voting matrix).
Classically, implementing this spectral algorithm requires high-dimensional linear algebra, in particular finding the leading eigenvector of a matrix of dimension . This makes it a natural candidate for a quantum algorithm. Since the Hamiltonian here is fairly simple, it can be simulated efficiently using standard techniques in the literature reviewed later. This allows us to give a simple algorithm based on preparing a random initial state and then phase estimating in an attempt to project onto the leading eigenvector. The probability of success in this projection is inverse in the dimension of the matrix, so this simple algorithm leads to no speedup over classical. However, we show that it is possible to apply amplitude amplification to give a quadratic speedup over classical. More surprisingly, we show that one can use the tensor to prepare an input state to the algorithm with improved overlap with the leading eigenvector, giving the quantum algorithm a quartic speedup over classical. Here, when comparing to classical we are considering classical algorithms based on the power method or similar algorithms such as Lanczos; these algorithms require exponential space while the quantum algorithm uses only polynomial space. We also consider classical algorithms based on ideas in Ref. aaronson2017complexity () which use polynomial space but the quantum algorithm is super-polynomially faster than these algorithms. We also present some minor improvements to the quantum algorithm which may be useful in practice.
1.1 Definitions, Random Ensembles, and Notation
Let us make some formal definitions. A tensor of order and dimension is a multi-dimensional array. The entries of the tensor are written where is an integer and each ranges from . Generalizing previous work on this problem, we consider two possible cases, one in which entries of a tensor are chosen to be real numbers, and one in which they may be complex numbers, so that either or ; we explain later the reason for this generalization; a tensor with all entries real will be called a real tensor. A symmetric tensor is one that is invariant under permutation of its indices. The symmetrization of a tensor is equal to times the sum of tensors given by permuting indices.
The spiked tensor model for given is defined as follows. Let be a vector in , normalized by , chosen from some probability distribution; this is the “signal vector”. Let be a real tensor of order with entries chosen from a Gaussian distribution with vanishing mean. We let as above, where is defined to be the tensor with entries
Here we use the notation that a subscript on a vector denotes an entry of that vector; we use a similar notation for matrices later.
Remark: some of the best sum-of-squares results are for a different distribution in which the entries of are chosen from a biased distribution on , rather than for the Gaussian distribution. We expect that using that distribution would not affect the results here too much, but we avoid treating that case also for simplicity.
Since the tensor is symmetric, of course it is natural to replace by its symmetrization. Indeed, no information can be lost by this replacement since given a tensor one can symmetrize the tensor, and then add back in Gaussian noise chosen to vanish under symmetrization to obtain a tensor drawn from the same distribution as was. That is, the cases in which is symmetrized or not can be reduced to each other.
A generalization of this problem is the case in which is chosen to have complex entries, with each entry having real and imaginary parts chosen from a Gaussian distribution with vanishing mean and variance . We refer to this as the complex ensemble, while we refer to the case where has real entries as the real ensemble; the choice of reducing the variance to is a convenient normalization for later. It is clear that since is real, the case of complex can be reduced to the real case (up to an overall rescaling for the different variance) simply by taking the real part of , and similarly the real case can be reduced to the complex case (again up to an overall rescaling) by adding Gaussian distributed imaginary terms to the entries of . We will see that for odd , at least for reasons of analyzing the algorithms, it is convenient not to symmetrize and to take complex , while for even this is not necessary. It may be possible to avoid doing this for odd (which may improve the detection and recovery threshold of the algorithm by constant factors) and we comment on this later.
We treat as fixed in the asymptotic notation, but consider the dependence on . So, throughout this paper, when we refer to a polynomial in , we mean a polynomial independent of the parameter . The polynomial may, however, depend on , such as .
We make additionally the following assumptions:
We assume that for some -dependent constant chosen sufficiently small. We will also assume that is for some -dependent constant .
Finally, we assume that . Remark: there is of course no reason to consider larger than this since simple spectral methods succeed if is , but we state this assumption explicitly as it simplifies some of the big-O notation.
We will explicitly state this Assumption 1 in all theorems where it is needed; the assumption will be implicit in the statement of the lemmas and will not be explicitly stated to avoid cluttering the statement of the results.
The first of these assumptions, that , is useful to simplify some of the statements of the results to avoid having to specify the allowed range of in each case. For example, we will say that a quantity such as is , meaning that we must take . We do not specify the exact value of but it can be deduced from the proofs if desired.
The second of these assumptions, that is , also helps simplify some of the statements of the results. Since we have assumed that and we will see that the required increases polynomially with , this assumed lower bound on is not a further restriction on our results.
We write to denote expectation values and to denote a probability. Usually these are expectation values or probabilities over choices of , though in some cases we consider expectation values over other random variables. We use to denote the operator norm of an operator, i.e., the largest singular value of that operator. We use to denote either the norm of a tensor or the norm of a quantum state. All logarithms are natural logarithms.
We use to denote inner products and use bra-ket notation both for vectors and for quantum mechanical states.
We will need to compute expectation values over random choices of and also compute expectation values of certain operators in quantum states, such as for some state and operator . We refer to the latter as a quantum expectation value of in state to distinguish it from an expectation value over random variables.
In section 2, we review some results on recovery and boosting from Ref. wein2019kikuchi () and present a randomized recovery procedure that will help in improving the recovery threshold. In section 3, we give spectral algorithms for the spiked tensor problem for the case of both even and odd . In that section, we present algorithms in terms of eigenvalues and eigenvectors of a matrix (more precisely, vectors in some eigenspace and quantum expectation values of operators in those vectors) that we call a Hamiltonian. We leave the method to compute these eigenvalues and expectation values for later in section 5, where we give classical and quantum algorithms for this and give time bounds for those algorithms. In section 4, we give some results on the spectra of random tensors needed for the analysis of these algorithms. A key idea here is reducing the case of a -th order tensor for odd to the case of a -th order tensor for even . One interesting corollary of this technique, see corollary 2, is that for odd and for the minimal value of we are able remove a logarithmic factor in some of the bounds (a similar logarithmic factor has been removed also using in hopkins2015tensor () using what they termed an “unfolding” algorithm). An appendix A gives an introduction to some techniques used to evaluate expectation values of tensors networks whose tensors are chosen from a Gaussian distribution; these techniques are used earlier in the paper. In section 6, we further discuss tensor networks and use this to consider limitations of certain algorithms and also to explain further some of the motivation for this algorithm. In section 7, we discuss some extensions of the results.
The proof of detection is in theorem 2 for the even case and 4 for the odd case. The proof of recovery is in theorem 3 for the even case and theorem 5 for the odd case. The runtime bound for the fastest quantum algorithm is in theorem 6. This theorem gives a quartic improvement in the runtime compared to the fastest classical spectral algorithm; more precisely the log of the runtime with the quantum algorithm divided by the log of the runtime of the classical algorithm approaches as at fixed .
In this section we discuss recovery and define randomized procedures for recovery that will be useful in boosting the threshold. Following the notation of Ref. wein2019kikuchi (), define the correlation between two vectors by a normalized overlap
The goal of an algorithm is to produce a vector with large . Note that we take the absolute value in the correlation, ignoring the sign. For even , the sign of is irrelevant, while for odd it is easy, given a guess of up to sign, to try both choices of sign and see which is most likely.
Strong recovery means that . Proposition 2.6 of Ref. wein2019kikuchi (), which is noted in that reference as being implicit in Ref. richard2014statistical (), shows how to “boost” a weaker correlation to strong recovery. It is shown that given a vector one can apply a single iteration of the the tensor power algorithm to obtain a new vector such that, with high probability,
where is a constant depending on . So, for any , if , we have .
Thus, it suffices to construct which outputs some vector which has, with high probability, . This is termed weak recovery; indeed, one can be satisfied with even weaker assumptions depending on the value of ; for close to , one may have polynomially small .
The spectral algorithms that we construct later will output a matrix that we will write . This matrix will be a positive semi-definite Hermitian matrix with trace . In physics terms, such a matrix is termed a density matrix. For sufficiently large , the leading eigenvector of the matrix will have a large overlap with . However, for smaller , we will still have some lower bound that, with high probability, . The randomized algorithm 1 below shows how to use this to obtain weak recovery. This randomized algorithm allows us to improve the threshold over the recovery method wein2019kikuchi () of simply using the leading eigenvector since it works even in some cases where the leading eigenvector has small correlation with .
Putting these results together we find that
Given an algorithm that, with high probability, outputs a density matrix with
then in polynomial time, with high probability, strong recovery is possible.
We will simply say that such an algorithm achieves recovery.
We present the algorithm for the case that the matrix may be complex; if the matrix is real, one can instead sample from a real Gaussian distribution and the proof of lemma 1 goes through will slightly different constants.
For algorithm 1, with probability at least ,
for some scalar .
We have . Hence, with probability at least we have that . We have and since is a Gaussian random variable with mean , with probability at least its absolute value is at least some positive constant (the exact constant can be deduced from the error function) times its standard deviation. Hence, the lemma follows for . ∎
3 Spectral Algorithm
We now give the spectral algorithms for the spiked tensor problem for the case of both even and odd . In subsection 3.1, we define a Hamiltonian given an arbitrary tensor . Then, in subsection 3.2, we present the spectral algorithm in terms of .
For even , the Hamiltonian that we present is very similar to the matrix given in Ref. wein2019kikuchi () but it has some minor differences. In our language (explained below), this matrix is obtained by projecting our Hamiltonian of Eq. (5) into the subspace of the “symmetric subspace” spanned by with all distinct. The relative reduction in the size of the matrix is only in the limit of large .
Also, in our method, we have an rotational symmetry of the basis which is very useful in analysis, for example showing that the eigenvalues of are independent of choice of . For the matrix of wein2019kikuchi (), this is not obvious to us and we do not fully understand the claimed behavior of the largest eigenvalue in that case. We will use a different notation, using creation and annihilation operators, which will help make this rotational symmetry more explicit.
For odd , the Hamiltonian that we use is unrelated to that of Ref. wein2019kikuchi ().
3.1 Hamiltonian Definition
For even , given a tensor we define an linear operator that we call a Hamiltonian as follows. This Hamiltonian is a linear operator on a vector space or , for integer chosen below. We write basis elements of this space as , and we call this space the full Hilbert space. We define
where the sum is over distinct so that there are terms in the sum and where means adding the Hermitian conjugate of the given terms, so that is Hermitian and where denotes the operator on qudit . We require that or else is trivially zero.
Note of course that if is real and symmetric, then the term is already Hermitian. can be regarded as a Hamiltonian acting on a space of qudits, each of dimension , and with interaction between sets of particles at a time.
Even if is not symmetrized, is unchanged if one applies an arbitrary permutation to the first indices of and applies the same permutation to the last indices of .
We may restrict to the symmetric subspace of this Hilbert space. We write to indicate the dimension of this subspace. For , we can approximate .
Within the symmetric subspace, we can write this Hamiltonian in a so-called “second-quantized” form:
This replacement by a second-quantized Hamiltonian is simply a convenient notation. The operators are bosonic creation and annihilation operators, obeying canonical commutation relations . See appendix B for a brief review of this formalism. We restrict to the subspace with a total of bosons, i.e., we define the number operator by
and restrict to An orthonormal basis of states for this symmetric subspace is given by all states equal to some normalization constant multiplying , where is the vacuum state (i.e., the state annihilated by for all ), and where is some sequence.
The second quantized Hamiltonian for the symmetric subspace is unchanged under arbitrary permutation of the first indices of and arbitrary (not necessarily the same) permutation of the last indices of .
For odd , we define the Hamiltonian as follows. Given a tensor of odd order , define a new tensor of even order with components
Now we require that as otherwise is trivially zero. For this Hamiltonian, it is convenient to take from the complex ensemble because, as we explain more below, it makes equal to zero, as well as canceling out certain terms in higher order moments, making the proof of the spectral properties of simpler. We discus later to what extent we can avoid using the complex ensemble.
3.2 Spectral Algorithms
The spectral algorithm for detection and recovery is algorithm 2. In this subsection we prove correctness of this algorithm, using statistical properties of proven later.
This algorithm uses quantities and defined later; roughly is an upper bound on the eigenvalues of and is the largest eigenvalue of . The algorithm can then achieve detection by verifying that the largest eigenvalue is significantly larger than , which occurs when is large enough. Indeed, we will see that it suffices to have for any fixed (some results can be extended to the case that decays slowly with but we omit this for brevity). This fixes the scaling of with so that we need (up to polylogarithmic factors) .
One interesting feature of the algorithm is that in step 3, we compute the density matrix of the leading eigenvector or of any vector in the eigenspace of eigenvalue , for defined in the algorithm. This might seem surprising, given that the leading eigenvector is computed in step 1; one might wonder why some other vector should be taken. We describe the algorithm in this way since, in later classical and quantum algorithms that we give to compute the spectral properties of the matrix, we might not extract the leading eigenvector but instead extract only some vector in this eigenspace due to use of the power method in a classical algorithm or due to approximations in phase estimation in a quantum algorithm. Thus, much of our analysis is given to showing that some eigenvalue larger than exists by lower bounding the leading eigenvalue , but given that some such eigenvalue exists, we do not worry too much about exactly what mixture of eigenvectors in the given eigenspace we compute.
Compute the eigenvector of and the leading eigenvalue, denoted .
where for even , and for odd , and where is defined in theorem 1, then report that . Otherwise report .
(Recovery) Compute the single particle density matrix (defined below) of the leading eigenvector or of any vector in the eigenspace of eigenvalue . Apply algorithm 1 to recover an approximation to .
In section 4, we consider the largest eigenvalue of and show the following theorem which summarizes the results of lemmas 5,7. Roughly, up to prefactors, the result for odd is given by considering the result for a -th order tensor for even and then multiplying the largest eigenvalue by a factor of .
Let be the largest eigenvalue of . For even let
and for odd let
where is a scalar depends that implicitly on and tends to some function depending only on for large . More precisely, for even , is equal to for the real ensemble and is twice that for the complex ensemble, and for odd , is equal to that for the even case for .
Then, for any , assuming Assumption 1,
with for even
and for odd
So, for any which is , with high probability .
Now consider the eigenvectors and eigenvalues of . For any symmetric tensor of order , let be the vector on qudits (each of dimension ) with amplitudes given by the entries of the tensor in the obvious way:
This vector is only normalized if . So, is a normalized vector.
We have the following simple property:
Let be the largest eigenvalues of . Then, .
Immediate from the variational principle. ∎
We now show correctness of the algorithm. All results in this subsubsection refer to even even if we do not state it explicitly. First we estimate of lemma 2 to show detection. Then, we show recovery.
To evaluate it is convenient to exploit a rotational invariance of this problem. We can apply a rotation using any matrix in the orthogonal group , rotating the creation and annihilation operators by making the replacement
This rotation preserves the canonical commutation relations and is equivalent to rotating the basis states on each qudit by . To preserve the Hamiltonian , we rotate each leg of the tensor :
This rotation preserves the Gaussian measure on but changes . So, we can rotate so that is some fixed vector, say so . Then is equal to multiplied by a single entry of , i.e., the entries with all indices equal to , which is some quantity chosen from a Gaussian distribution with zero mean and unit variance. So, with high probability, for any .
This follows from the estimate of which lowers bounds the largest eigenvalue when and from theorem 1 which upper bounds . ∎
Given any state, define the single particle density matrix in the basis for the full Hilbert space used in Eq. (5) to be the reduced density matrix on any of the qudits. Equivalently, this single particle density matrix can be expressed in terms of creation and annihilation operators by
Note that is positive semi-definite, Hermitian, and trace .
Let Assumption 1 hold. Given any vector such that for any scalar , then with high probability the corresponding single particle density matrix obeys
In particular, for with we have so with high probability
Hence, algorithm 2 achieves recovery.
We have . By theorem 1, with high probability . Hence, with high probability .
Rotate so that . Then, the Hamiltonian is diagonal in the computational basis for the full Hilbert space. Let be the probability, for state , that exactly out of qudits are in state . Then, . ∎
Remark: more generally, the same result holds for any mixed state: given any mixed state such that for any scalar , then with high probability the corresponding single particle density matrix obeys Eq. (17).
We now show correctness of the algorithm for odd . All results in this subsubsection refer to odd even if we do not state it explicitly.
Let us first estimate to show detection. We have , but now is a quadratic function of so there are some cross-terms. We have
The cross-term is
Exploiting the same rotational invariance as in the even case, this is equal to multiplied by the real part of a single entry of , i.e., the entry with all indices equal to . So, with high probability, this cross-term is bounded by for any .
The term quadratic in is
Exploiting the same rotational invariance as before, fixing , we must have all . So, this is a sum of squares of different entries of , corresponding to possible choices of ; since we use the complex ensemble, this has vanishing mean. So, with high probability this term is for any .
So, with high probability,
for any .
We now consider recovery. We will first give a more general bound on cross-terms that will be useful. We have
Let us first bound the cross-terms.
With high probability, the operator norm of the cross-terms is bounded by .
The cross-terms are equal to where is a tensor for even which has components
Rotating so that , we have
Clearly, where is an -by- matrix, whose entries are determined by the entries of in an obvious way so that the first indices of determine a column index in and the last indices of determine a row index of . Regard as being the entries of some tensor of order and let by the -by- matrix whose entries are determined by the entries of this tensor, again in the obvious way so that the first indices of the tensor determine a column index and the last indices determine a row index.
However, since the entries of are independent Gaussian random variables, it follows from standard random matrix theory results mehta2004random () that with high probability . Remark: the dependence on is not tight here. ∎
So as in the even case, we have
Let Assumption 1 hold. Given any vector such that