Computing tensor Zeigenvectors with dynamical systems
Abstract
We present a new framework for computing Zeigenvectors of general tensors based on numerically integrating a dynamical system that must converge to a Zeigenvector. Our motivation comes from our recent research on spacey random walks, where the longterm dynamics of a stochastic process are governed by a dynamical system that must converge to a Zeigenvectors of a given transition probability tensor. Here, we apply the ideas more broadly to general tensors and find that our method can compute Zeigenvectors that algebraic methods like the higherorder power method cannot compute.
remarkRemark \newsiamremarkobservationObservation \newsiamremarkhypothesisHypothesis \newsiamthmclaimClaim
1 Tensor eigenvectors
Computing matrix eigenvalues is a classic problem in numerical linear algebra and scientific computing. Given a square matrix , the goal is to find a vectorscalar pair with satisfying
(1) 
The pair is called the eigenpair, the eigenvector, and the eigenvalue. After several decades of research and development, we have, by and large, reliable methods and software for computing all eigenpairs of a given matrix . (Experts will, of course, be aware of exceptions, but we hope they would agree with the general sentiment of the statement.)
In numerical multilinear algebra, there are analogous eigenvector problems (note the plurality).
For example, given a threemode cubic tensor (here meaning that is a multidimensional
array with entries ,
We use the “” and “” terminology instead of “” and “”. Both  and eigenvectors are defined for tensors with the dimension equal in all modes (such a tensor is called cubic [7]), and the definitions can be derived by showing that the eigenpairs are KKT points of a variational form for a generalization of a Rayleigh quotient to tensors [18]. One key difference between the types is that eigenvectors are scale invariant, while eigenvectors are not—this is why we put a norm constraint on the vector. Specifically, if we ignore the norm constraint and scale by a constant, the corresponding eigenvalue would change; for eigenpairs it is easy to see that this is not the case. If is symmetric, then it has a finite set of eigenvalues and moreover, there must be a real eigenpair where the tensor order is odd [6].
This paper presents a new framework for computing eigenpairs. Tensor eigenvectors show up in a variety of applications, including evolutionary biology [5, 20], lowrank factorizations and compression [1, 12, 16, 21], signal processing [11, 15], quantum geometry [10, 27], medical imaging [25], and data mining [3, 9, 28]. These eigenpairs can be computed with a Lassere type semidefinite programming hierarchy [8, 21, 22], although the scalability of such methods is limited. Thus, we still lack robust and scalable generalpurpose methods for computing these eigenvectors.
We introduce two special cases of tensor contractions that will be useful:

The tensor apply takes a cubic tensor and a vector and produces a vector:

The tensor collapse takes a cubic tensor and a vector and produces a matrix:
For the tensor collapse operator, the “:” symbol signifies taking all entries along that index, so is a square matrix. We note that the tensor may not be symmetric, but we are always contracting onto the first mode (tensor apply) or first and second modes (tensor collapse). We assume that has been permuted in the appropriate manner for the problem at hand. With this notation, it is easy to see that the tensor eigenvector problem reduces to the following analog of Eq. 1:
(2) 
The crux of our computational method is based on the following observation that relates tensor and matrix eigenvectors. {observation} A tensor eigenvector of an mode tensor must be a matrix eigenvector of the collapsed matrix , i.e.,
(3) 
The trick, of course, is that the matrix itself depends on the tensor eigenvector we want to compute. Therefore, we still have a nonlinear problem.
2 A new method for computing tensor eigenvectors
Section 1 provides a different perspective on the tensor eigenvector problem. Specifically, the tensor eigenvector problem is actually a matrix eigenvector problem, but for some unknown matrix. Our computational approach is based on the following continuoustime dynamical system, for reasons that we will make clear in Section 2.2:
(4) 
where is some fixed map that takes as input a matrix and produces as output some prescribed eigenvector of the matrix with unit norm. For a matrix , could be defined to compute several objects:

the eigenvector of with smallest/largest magnitude eigenvalue

the eigenvector of with smallest/largest algebraic eigenvalue

the eigenvector of with th smallest/largest magnitude eigenvalue

the eigenvector of with th smallest/largest algebraic eigenvalue
We resolve the ambiguity in the sign of the eigenvector by picking the sign based on the first element. In the case of multiple eigenvectors sharing an eigenvalue, we propose using the closest eigenvector to , although we have not evaluated this technique.
Proposition 1
Let be a prescribed map from a matrix to one of its eigenvectors. Then if the dynamical system in Eq. 4 converges to a nonzero solution, it must converge to a tensor eigenvector.
Proof
If the dynamical system converges, then it converges to a stationary point. Any stationary point has zero derivative, so
One must be a bit careful with the input and output values of . If is not symmetric, then might not be diagonalizable; thus we may have to deal with complex eigenvalues. To keep the dynamical system realvalued, one could always modify the map to output the real part. However, the tensor need not be symmetric (nor normal for all ) for the dynamical system to maintain real values. In fact, our motivation for this dynamical system comes from a tensor that is not necessarily symmetric, which we will discuss in Section 2.2.
Proposition 1 leads to a broad framework for computing eigenvectors:

Choose a map and a numerical integration scheme.

Numerically integrate Eq. 4.
Different choices of may converge to different eigenvectors and different numerical integration schemes may lead to different convergence properties. Figure 1 shows a concrete example, where picks the eigenvector corresponding to the largest eigenvalue in magnitude and the integration scheme is the forward Euler method.
2.1 Forward Euler convergence with diagonal tensors
As an illustrative example, we consider the special case of using the forward Euler numerical integration scheme for computing the tensor eigenvalues of an dimensional, mode diagonal tensor . Without loss of generality, assume that the diagonal entries of are decreasing so that ordered so that if . This tensor has eigenpairs: for , where is the th standard basis vector. Suppose that we want to compute the th eigenvector and set to map to the largest algebraic eigenvalue (with unit norm). Let be the residual at the th iteration. If the step size is , then
(5)  
(6) 
Thus, the forward Euler scheme converges if and converges in one step if .
2.2 Motivating the dynamical system with spacey random walks
The motivation for the dynamical system comes from our previous analysis of a stochastic process known as the “spacey random walk” that relates tensor eigenvectors of a particular class of tensors to a stochastic process [4]. Specifically, the class of tensors are irreducible transition probability tensors (any tensor with for ). For simplicity, we will now discuss a threemode transition probability tensor , where the entries can be interpreted as coming from a secondorder Markov chain; the entry is the probability of transitioning to state given that the last two states were and . Due to the theory of Li and Ng [17], there exists a tensor eigenvector with eigenvalue satisfying
(7) 
The vector is stochastic, but it does not represent the stationary distribution of a Markov chain. Instead, we showed that is the limiting distribution of a nonMarkovian, generalized vertexreinforced random walk [2] that we call the spacey random walk [4]. In the th step of a spacey random walk, after the process has visited states , it spaces out and forgets its second last state (that is, the state ). It then invents a new history state by randomly drawing a past state . Finally, it transitions to via the secondorder Markov chain represented by as if its last two states were and , i.e., it transitions to with probability . (In contrast, a true secondorder Markov chain would transition with probability .)
Using results from Benaïm [2], we showed that the longterm dynamics of the spacey random walk for an mode transition probability tensor are governed by the following dynamical system [4]:
(8) 
where is a map that takes a columnstochastic transition matrix and maps it to the Perron vector of the matrix. In other words, if the spacey random walk converges, it must converge to an attractor of the dynamical system in Eq. 8. The dynamical system in Eq. 8 is a special case of the more general system in Eq. 4, where the map picks the eigenvector with largest algebraic eigenvalue (the Perron vector), and the tensor has certain structural properties (it is a transition probability tensor).
To summarize, our prior work studied a specific case of the general dynamical system in Eq. 4 to understand the stochastic process behind principal eigenvectors of transition probability tensors. The general dynamical system provides a new framework for computing general tensor eigenvectors—if the dynamical system in Eq. 4 converges, then it converges to a tensor eigenvector. The dynamical system may not converge [23], but it usually does in practice, as we will see in Section 3.
2.3 Relationship with the shifted higherorder power method
The shifted higherorder power method [14] can be derived by noticing that
(9) 
for any eigenpair. This yields the iteration
(10) 
for any shift parameter (the case where is just the “higherorder power method” [12, 16, 26]). Kolda and Mayo showed that when is symmetric, the iterates in Eq. 10 converge monotonically to a tensor eigenvector given an appropriate shift .
If for some transition probability tensor and we are interested in the case when , then one can also derive these iterates by the dynamical system
(11) 
(c.f. Eq. 8). If this dynamical system converges (), then , and is a tensor eigenvector with eigenvalue . If we numerically integrate Eq. 11 using the forward Euler method with step size and any starting vector satisfying and , then the iterates are
(12)  
(13) 
which are the same as the shifted higherorder power method iterates in Eq. 10. The last equality comes from the fact that and by a simple induction argument: the base case holds by the initial conditions and
(14) 
since and are both stochastic vectors.
For a general tensor , we can use the dynamical system
(15) 
If , then , so is a eigenvector of with eigenvalue . Now suppose that we numerically integrate the dynamical system in Eq. 15 by

taking a forward Euler step to produce the iterate ; and

projecting onto the unit sphere by .
If the step size of the forward Euler method is , then
(16) 
since . The projection onto the unit sphere then gives the shifted higherorder power method iterates in Eq. 10.
3 Numerical examples
We now demonstrate our method on two tensors used in prior work. Section 3.1 shows that our approach can compute all eigenvalues of a specific tensor, while the (shifted) higherorder power method cannot compute all of the eigenvalues. Section 3.2 shows verifies that our approach can compute all eigenvalues of a tensor whose eigenvalues were found with semidefinite programming.
3.1 Example 3.6 from Kolda and Mayo [14]
Our first test case is a symmetric tensor from Kolda and Mayo [14, Example 3.6]:
The tensor has 7 eigenvalues, which Kolda and Mayo classify as “positive stable”, “negative stable”, or “unstable” (see Fig. 2, top), corresponding to positive definiteness, negative definiteness, or indefiniteness of the projected Hessian of the Lagrangian of their optimization function [14]. (Note that since the tensor has odd number of modes, eigenvalues are only defined up to sign.) Kolda and Mayo showed that their shifted symmetric higherorder power method (SSHOPM), a generalization of the symmetric higherorder power method (SHOPM) [12, 16, 26], only converges to eigenvectors of the positive or negative stable eigenvalues.
Of the 7 eigenpairs for the above tensor, 3 are unstable. Our dynamical systems approach can compute all 7 eigenpairs, using 5 variations of the dynamical system:

maps to the eigenvector with largest magnitude eigenvalue;

maps to the eigenvector with smallest magnitude eigenvalue;

maps to the eigenvector with largest algebraic eigenvalue;

maps to the eigenvector with smallest algebraic eigenvalue; and

maps to the eigenvector with second smallest algebraic eigenvalue.
We used the forward Euler method with step size set to 0.5 in order to compute the eigenvalues. Empirically, convergence is fast, requiring 10 or fewer iterations (see the bottom row of Fig. 2). We note that one can also also compute these eigenvectors with a Lassere type semidefinite programming hierarchy [8, 21, 22], although the scalability of such methods is limited. We provide numerical results from a tensor in this literature in the next section.
3.2 Example 4.11 from Cui, Dai, and Nie [8]
Our second test case is a symmetric tensor from Cui, Dai, and Nie [8, Example 4.11]:
The tensor has 3 eigenvalues (again, the tensor has an odd number of modes, so the eigenvalues are only defined up to sign). We use the same 5 variations of our algorithm to compute the eigenpairs (Fig. 3). Again, we are able to compute all of the eigenvalues of the tensor, and convergence is rapid.
4 Discussion
Scalable methods for computing tensor eigenvectors remain a challenge. Our new framework for computing eigenvectors offers insights through two observations. First, a tensor eigenvector is a matrix eigenvector of some matrix. However, the matrix is obtained by applying the tensor collapse operator with the eigenvector itself. Second, for a certain class of tensors where eigenvectors have a stochastic interpretation, the dynamical system in Eq. 4 is the one that governs the longterm dynamics of the stochastic process. Nevertheless, the same dynamical system seems to work for more general tensors. In fact, we can compute tensor eigenvectors that the (shifted) higherorder power method cannot. However, one major challenge is knowing what map to choose—different choices lead to different eigenvectors and there is no immediate relationship between them for general tensors. The structure of the tensor can offer some guidance; for the special case of a transition probability tensor and eigenvalue , the right map is the one that outputs the Perron vector of a column stochastic matrix. Finally, our method is not immediately applicable to eigenvectors because Section 1 no longer holds. Adapting our methodology to this class of eigenvectors is an area for future research.
Our framework came from relating tensor eigenvectors to stochastic processes. This is quite different from the core ideas in the tensor literature, which are firmly rooted in algebraic generalizations. We hope that our ideas encourage the use of stochastics and other areas of applied mathematics in solving difficult tensor problems.
Acknowledgments
We thank Brad Nelson for providing valuable feedback. ARB is supported by NSF TRIPODS Award #1740822. DFG is supported by NSF award CCF1149756, IIS1422918, IIS1546488, the NSF Center for Science of Information STC, CCF0939370, DARPA SIMPLEX, and the Sloan Foundation.
Footnotes
References
 A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. Journal of Machine Learning Research, 15(1):2773–2832, 2014.
 M. Benaïm. Vertexreinforced random walks and a conjecture of Pemantle. The Annals of Probability, 25(1):361–392, 1997.
 A. R. Benson, D. F. Gleich, and J. Leskovec. Tensor spectral clustering for partitioning higherorder network structures. In Proceedings of the 2015 SIAM International Conference on Data Mining, pages 118–126, 2015.
 A. R. Benson, D. F. Gleich, and L.H. Lim. The spacey random walk: a stochastic process for higherorder data. SIAM Review, 59(2):321–345, 2017.
 D. A. Bini, B. Meini, and F. Poloni. On the solution of a quadratic vector equation arising in markovian binary trees. Numerical Linear Algebra with Applications, 18(6):981–991, 2011.
 D. Cartwright and B. Sturmfels. The number of eigenvalues of a tensor. Linear Algebra and its Applications, 438(2):942–952, 2013.
 P. Comon, G. Golub, L.H. Lim, and B. Mourrain. Symmetric tensors and symmetric tensor rank. SIAM Journal on Matrix Analysis and Applications, 30(3):1254–1279, 2008.
 C.F. Cui, Y.H. Dai, and J. Nie. All real eigenvalues of symmetric tensors. SIAM Journal on Matrix Analysis and Applications, 35(4):1582–1601, 2014.
 D. F. Gleich, L.H. Lim, and Y. Yu. Multilinear PageRank. SIAM Journal on Matrix Analysis and Applications, 36(4):1507–1541, 2015.
 S. Hu, L. Qi, and G. Zhang. Computing the geometric measure of entanglement of multipartite pure states by means of nonnegative tensors. Physical Review A, 93(1), 2016.
 E. Kofidis and P. A. Regalia. Tensor approximation and signal processing applications. Contemporary Mathematics, 280:103–134, 2001.
 E. Kofidis and P. A. Regalia. On the best rank1 approximation of higherorder supersymmetric tensors. SIAM Journal on Matrix Analysis and Applications, 23(3):863–884, 2002.
 T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
 T. G. Kolda and J. R. Mayo. Shifted power method for computing tensor eigenpairs. SIAM Journal on Matrix Analysis and Applications, 32(4):1095–1124, 2011.
 L. Lathauwer. Signal processing based on multilinear algebra. PhD thesis, Katholieke Universiteit Leuven, 1997.
 L. D. Lathauwer, B. D. Moor, and J. Vandewalle. On the Best Rank and Rank(, , , ) Approximation of HigherOrder Tensors. SIAM Journal on Matrix Analysis and Applications, 21(4):1324–1342, 2000.
 W. Li and M. K. Ng. On the limiting probability distribution of a transition probability tensor. Linear and Multilinear Algebra, Online:1–24, 2013.
 L.H. Lim. Singular values and eigenvalues of tensors: a variational approach. In 1st IEEE International Workshop on Computational Advances in MultiSensor Adaptive Processing, pages 129–132, 2005.
 L.H. Lim. Tensors and hypermatrices. In Handbook of Linear Algebra, Second Edition, chapter 15, pages 231–260. Chapman and Hall/CRC, 2013.
 B. Meini and F. Poloni. A perron iteration for the solution of a quadratic vector equation arising in markovian binary trees. SIAM Journal on Matrix Analysis and Applications, 32(1):248–261, 2011.
 J. Nie and L. Wang. Semidefinite relaxations for best rank1 tensor approximations. SIAM Journal on Matrix Analysis and Applications, 35(3):1155–1179, 2014.
 J. Nie and X. Zhang. Real eigenvalues of nonsymmetric tensors. Computational Optimization and Applications, 70(1):1–32, 2017.
 J. Peterson. Personal communication, 2018.
 L. Qi. Eigenvalues of a real supersymmetric tensor. J. Symb. Comput., 40(6):1302–1324, 2005.
 L. Qi, Y. Wang, and E. X. Wu. eigenvalues of diffusion kurtosis tensors. Journal of Computational and Applied Mathematics, 221(1):150–157, 2008.
 P. A. Regalia and E. Kofidis. The higherorder power method revisited: convergence proofs and effective initialization. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2000.
 T.C. Wei and P. M. Goldbart. Geometric measure of entanglement and applications to bipartite and multipartite quantum states. Physical Review A, 68(4), 2003.
 T. Wu, A. Benson, and D. F. Gleich. General tensor spectral coclustering for higherorder data. In Advances in Neural Information Processing Systems, pages 2559–2567, 2016.