Computing tensor Z-eigenvectors with dynamical systems

# Computing tensor Z-eigenvectors with dynamical systems

Austin R. Benson Department of Computer Science, Cornell University ().    David F. Gleich Department of Computer Science, Purdue University ().
###### Abstract

We present a new framework for computing Z-eigenvectors of general tensors based on numerically integrating a dynamical system that must converge to a Z-eigenvector. Our motivation comes from our recent research on spacey random walks, where the long-term dynamics of a stochastic process are governed by a dynamical system that must converge to a Z-eigenvectors of a given transition probability tensor. Here, we apply the ideas more broadly to general tensors and find that our method can compute Z-eigenvectors that algebraic methods like the higher-order power method cannot compute.

\newsiamremark

remarkRemark \newsiamremarkobservationObservation \newsiamremarkhypothesisHypothesis \newsiamthmclaimClaim

## 1 Tensor eigenvectors

Computing matrix eigenvalues is a classic problem in numerical linear algebra and scientific computing. Given a square matrix , the goal is to find a vector-scalar pair with satisfying

 \mA\vx=λ\vx. (1)

The pair is called the eigenpair, the eigenvector, and the eigenvalue. After several decades of research and development, we have, by and large, reliable methods and software for computing all eigenpairs of a given matrix . (Experts will, of course, be aware of exceptions, but we hope they would agree with the general sentiment of the statement.)

In numerical multilinear algebra, there are analogous eigenvector problems (note the plurality). For example, given a three-mode cubic tensor (here meaning that is a multi-dimensional array with entries , 111In true pure mathematical parlance, this is the definition of a hypermatrix, not a tensor (see the book chapter by Lim for the precise definitions [19].) However, “tensor” has become synonymous with a multi-dimensional array of numbers [13], and we adopt this terminology.), the two most common tensor eigenvector problems are:

 Z-eigenvectors~{}\@@cite[cite]{[% \@@bibref{}{Qi-2005-Z-eigenvalues}{}{}]}H-eigenvectors~{}% \@@cite[cite]{[\@@bibref{}{Qi-2005-Z-eigenvalues}{}{}]}l2-eigenvectors~{}\@@cite[cite]{[\@@bibref{}{Lim-2005-eigenvalues}{% }{}]}lk-eigenvectors~{}\@@cite[cite]{[\@@bibref{}{Lim-% 2005-eigenvalues}{}{}]}∑jk\celTi,j,kxjxk=λxi,1≤i≤n∑jk\celTi,j,kxjxk=λx2i,1≤i≤n\normof[2]\vx=1\vx≠0

We use the “” and “” terminology instead of “” and “”. Both - and -eigenvectors are defined for tensors with the dimension equal in all modes (such a tensor is called cubic [7]), and the definitions can be derived by showing that the eigenpairs are KKT points of a variational form for a generalization of a Rayleigh quotient to tensors [18]. One key difference between the types is that -eigenvectors are scale invariant, while -eigenvectors are not—this is why we put a norm constraint on the vector. Specifically, if we ignore the norm constraint and scale by a constant, the corresponding eigenvalue would change; for -eigenpairs it is easy to see that this is not the case. If is symmetric, then it has a finite set of -eigenvalues and moreover, there must be a real eigenpair where the tensor order is odd [6].

This paper presents a new framework for computing -eigenpairs. Tensor -eigenvectors show up in a variety of applications, including evolutionary biology [5, 20], low-rank factorizations and compression [1, 12, 16, 21], signal processing [11, 15], quantum geometry [10, 27], medical imaging [25], and data mining [3, 9, 28]. These eigenpairs can be computed with a Lassere type semidefinite programming hierarchy [8, 21, 22], although the scalability of such methods is limited. Thus, we still lack robust and scalable general-purpose methods for computing these eigenvectors.

We introduce two special cases of tensor contractions that will be useful:

1. The tensor apply takes a cubic tensor and a vector and produces a vector:

 three-mode tensor\vy=\cmT\vx2yi=∑j,k\celTi,j,kxjxkm-mode tensor\vy=\cmT\vxm−1yi=∑i2,…,in\celTi1,…,imxi2⋯xim
2. The tensor collapse takes a cubic tensor and a vector and produces a matrix:

 three-mode tensor\mY=\cmT[\vx]1\mY=∑k\celT:,:,kxkYij=∑k\celTi,j,kxkm-mode tensor\mY=\cmT[\vx]m−2\mY=∑i3,…,im\celT:,:,i3,…imxi3⋯ximYij=∑i3,…,im\celTi,j,i3,…,imxi3⋯xim.

For the tensor collapse operator, the “:” symbol signifies taking all entries along that index, so is a square matrix. We note that the tensor may not be symmetric, but we are always contracting onto the first mode (tensor apply) or first and second modes (tensor collapse). We assume that has been permuted in the appropriate manner for the problem at hand. With this notation, it is easy to see that the tensor -eigenvector problem reduces to the following analog of Eq. 1:

 \cmT\vxm−1=λ\vx,\normof[2]\vx=1. (2)

The crux of our computational method is based on the following observation that relates tensor and matrix eigenvectors. {observation} A tensor -eigenvector of an -mode tensor must be a matrix eigenvector of the collapsed matrix , i.e.,

 \cmT\vxm−1=λ\vx⟺\cmT[\vx]m−2\vx=λ\vx. (3)

The trick, of course, is that the matrix itself depends on the tensor eigenvector we want to compute. Therefore, we still have a nonlinear problem.

## 2 A new method for computing tensor Z-eigenvectors

Section 1 provides a different perspective on the tensor -eigenvector problem. Specifically, the tensor -eigenvector problem is actually a matrix eigenvector problem, but for some unknown matrix. Our computational approach is based on the following continuous-time dynamical system, for reasons that we will make clear in Section 2.2:

 d\vxdt=Λ(\cmT[\vx]m−2)−\vx, (4)

where is some fixed map that takes as input a matrix and produces as output some prescribed eigenvector of the matrix with unit norm. For a matrix , could be defined to compute several objects:

1. the eigenvector of with smallest/largest magnitude eigenvalue

2. the eigenvector of with smallest/largest algebraic eigenvalue

3. the eigenvector of with th smallest/largest magnitude eigenvalue

4. the eigenvector of with th smallest/largest algebraic eigenvalue

We resolve the ambiguity in the sign of the eigenvector by picking the sign based on the first element. In the case of multiple eigenvectors sharing an eigenvalue, we propose using the closest eigenvector to , although we have not evaluated this technique.

###### Proposition 1

Let be a prescribed map from a matrix to one of its eigenvectors. Then if the dynamical system in Eq. 4 converges to a non-zero solution, it must converge to a tensor -eigenvector.

###### Proof

If the dynamical system converges, then it converges to a stationary point. Any stationary point has zero derivative, so

 d\vxdt=0⟺Λ(\cmT[\vx]m−2)=\vx ⟺\cmT[\vx]m−2\vx=λ\vx for some λ that % depends on Λ ⟺\cmT\vxm−1=λ\vx.

One must be a bit careful with the input and output values of . If is not symmetric, then might not be diagonalizable; thus we may have to deal with complex eigenvalues. To keep the dynamical system real-valued, one could always modify the map to output the real part. However, the tensor need not be symmetric (nor normal for all ) for the dynamical system to maintain real values. In fact, our motivation for this dynamical system comes from a tensor that is not necessarily symmetric, which we will discuss in Section 2.2.

1. Choose a map and a numerical integration scheme.

2. Numerically integrate Eq. 4.

Different choices of may converge to different -eigenvectors and different numerical integration schemes may lead to different convergence properties. Figure 1 shows a concrete example, where picks the eigenvector corresponding to the largest eigenvalue in magnitude and the integration scheme is the forward Euler method.

### 2.1 Forward Euler convergence with diagonal tensors

As an illustrative example, we consider the special case of using the forward Euler numerical integration scheme for computing the tensor eigenvalues of an -dimensional, -mode diagonal tensor . Without loss of generality, assume that the diagonal entries of are decreasing so that ordered so that if . This tensor has -eigenpairs:  for , where is the th standard basis vector. Suppose that we want to compute the th eigenvector and set to map to the largest algebraic eigenvalue (with unit norm). Let be the residual at the th iteration. If the step size is , then

 ∥\vrk+1∥=∥\vxk+1−\vei∥ =∥\vxk+h(\vei−\vxk)−\vei∥ (5) =(1−h)∥\vxk−\vei∥=(1−h)∥\vrk∥=(1−h)k∥\vr0∥. (6)

Thus, the forward Euler scheme converges if and converges in one step if .

### 2.2 Motivating the dynamical system with spacey random walks

The motivation for the dynamical system comes from our previous analysis of a stochastic process known as the “spacey random walk” that relates tensor eigenvectors of a particular class of tensors to a stochastic process [4]. Specifically, the class of tensors are irreducible transition probability tensors (any tensor with for ). For simplicity, we will now discuss a three-mode transition probability tensor , where the entries can be interpreted as coming from a second-order Markov chain; the entry is the probability of transitioning to state given that the last two states were and . Due to the theory of Li and Ng [17], there exists a tensor -eigenvector with eigenvalue satisfying

 \cmP\vx2=\vx,∑ixi=1,xi≥0. (7)

The vector is stochastic, but it does not represent the stationary distribution of a Markov chain. Instead, we showed that is the limiting distribution of a non-Markovian, generalized vertex-reinforced random walk [2] that we call the spacey random walk [4]. In the th step of a spacey random walk, after the process has visited states , it spaces out and forgets its second last state (that is, the state ). It then invents a new history state by randomly drawing a past state . Finally, it transitions to via the second-order Markov chain represented by as if its last two states were and , i.e., it transitions to with probability . (In contrast, a true second-order Markov chain would transition with probability .)

Using results from Benaïm [2], we showed that the long-term dynamics of the spacey random walk for an -mode transition probability tensor are governed by the following dynamical system [4]:

 d\vxdt=Π(\cmP[\vx]m−2)−\vx, (8)

where is a map that takes a column-stochastic transition matrix and maps it to the Perron vector of the matrix. In other words, if the spacey random walk converges, it must converge to an attractor of the dynamical system in Eq. 8. The dynamical system in Eq. 8 is a special case of the more general system in Eq. 4, where the map picks the eigenvector with largest algebraic eigenvalue (the Perron vector), and the tensor has certain structural properties (it is a transition probability tensor).

To summarize, our prior work studied a specific case of the general dynamical system in Eq. 4 to understand the stochastic process behind principal -eigenvectors of transition probability tensors. The general dynamical system provides a new framework for computing general tensor eigenvectors—if the dynamical system in Eq. 4 converges, then it converges to a tensor -eigenvector. The dynamical system may not converge [23], but it usually does in practice, as we will see in Section 3.

### 2.3 Relationship with the shifted higher-order power method

The shifted higher-order power method [14] can be derived by noticing that

 (1+γ)λ\vx=\cmT\vxm−1+γλ\vx (9)

for any eigenpair. This yields the iteration

 \vxk+1=11+γ(\cmT\vxm−1k+γ\vxk)∥11+γ(\cmT\vxm−1k+γ\vxk)∥ (10)

for any shift parameter (the case where is just the “higher-order power method” [12, 16, 26]). Kolda and Mayo showed that when is symmetric, the iterates in Eq. 10 converge monotonically to a tensor eigenvector given an appropriate shift .

If for some transition probability tensor and we are interested in the case when , then one can also derive these iterates by the dynamical system

 d\vxdt=\cmP\vxm−1−\vx (11)

(c.f. Eq. 8). If this dynamical system converges (), then , and is a tensor -eigenvector with eigenvalue . If we numerically integrate Eq. 11 using the forward Euler method with step size and any starting vector satisfying and , then the iterates are

 \vxk+1 =\vxk+11+γ(\cmP\vxm−1k−\vxk) (12) =11+γ(\cmP\vxm−1k+γ\vxk)=11+γ(\cmP\vxm−1k+γ\vxk)∥11+γ(\cmP\vxm−1k+γ\vxk)∥1, (13)

which are the same as the shifted higher-order power method iterates in Eq. 10. The last equality comes from the fact that and by a simple induction argument: the base case holds by the initial conditions and

 ∥\cmP\vxm−1k+γ\vxk∥1=∥\cmP\vxm−1k∥1+γ=1+γ (14)

since and are both stochastic vectors.

For a general tensor , we can use the dynamical system

 d\vxdt=\cmT\vxm−1−∥\vx∥2\vx. (15)

If , then , so is a -eigenvector of with eigenvalue . Now suppose that we numerically integrate the dynamical system in Eq. 15 by

1. taking a forward Euler step to produce the iterate ; and

2. projecting onto the unit sphere by .

If the step size of the forward Euler method is , then

 \vx′k+1 =\vxk+11+γ(\cmT\vxm−1k−∥\vxk∥2\vxk)=11+γ(\cmP\vxm−1k+γ\vxk) (16)

since . The projection onto the unit sphere then gives the shifted higher-order power method iterates in Eq. 10.

## 3 Numerical examples

We now demonstrate our method on two tensors used in prior work. Section 3.1 shows that our approach can compute all eigenvalues of a specific tensor, while the (shifted) higher-order power method cannot compute all of the eigenvalues. Section 3.2 shows verifies that our approach can compute all eigenvalues of a tensor whose eigenvalues were found with semi-definite programming.

### 3.1 Example 3.6 from Kolda and Mayo [14]

Our first test case is a symmetric tensor from Kolda and Mayo [14, Example 3.6]:

The tensor has 7 eigenvalues, which Kolda and Mayo classify as “positive stable”, “negative stable”, or “unstable” (see Fig. 2, top), corresponding to positive definiteness, negative definiteness, or indefiniteness of the projected Hessian of the Lagrangian of their optimization function [14]. (Note that since the tensor has odd number of modes, eigenvalues are only defined up to sign.) Kolda and Mayo showed that their shifted symmetric higher-order power method (SS-HOPM), a generalization of the symmetric higher-order power method (S-HOPM) [12, 16, 26], only converges to eigenvectors of the positive or negative stable eigenvalues.

Of the 7 eigenpairs for the above tensor, 3 are unstable. Our dynamical systems approach can compute all 7 eigenpairs, using 5 variations of the dynamical system:

1. maps to the eigenvector with largest magnitude eigenvalue;

2. maps to the eigenvector with smallest magnitude eigenvalue;

3. maps to the eigenvector with largest algebraic eigenvalue;

4. maps to the eigenvector with smallest algebraic eigenvalue; and

5. maps to the eigenvector with second smallest algebraic eigenvalue.

We used the forward Euler method with step size set to 0.5 in order to compute the eigenvalues. Empirically, convergence is fast, requiring 10 or fewer iterations (see the bottom row of Fig. 2). We note that one can also also compute these eigenvectors with a Lassere type semidefinite programming hierarchy [8, 21, 22], although the scalability of such methods is limited. We provide numerical results from a tensor in this literature in the next section.

### 3.2 Example 4.11 from Cui, Dai, and Nie [8]

Our second test case is a symmetric tensor from Cui, Dai, and Nie [8, Example 4.11]:

 \cTi,j,k=(−1)ii+(−1)jj+(−1)kk,1≤i,j,k≤5.

The tensor has 3 eigenvalues (again, the tensor has an odd number of modes, so the eigenvalues are only defined up to sign). We use the same 5 variations of our algorithm to compute the eigenpairs (Fig. 3). Again, we are able to compute all of the eigenvalues of the tensor, and convergence is rapid.

## 4 Discussion

Scalable methods for computing tensor eigenvectors remain a challenge. Our new framework for computing -eigenvectors offers insights through two observations. First, a tensor -eigenvector is a matrix eigenvector of some matrix. However, the matrix is obtained by applying the tensor collapse operator with the -eigenvector itself. Second, for a certain class of tensors where eigenvectors have a stochastic interpretation, the dynamical system in Eq. 4 is the one that governs the long-term dynamics of the stochastic process. Nevertheless, the same dynamical system seems to work for more general tensors. In fact, we can compute tensor eigenvectors that the (shifted) higher-order power method cannot. However, one major challenge is knowing what map to choose—different choices lead to different eigenvectors and there is no immediate relationship between them for general tensors. The structure of the tensor can offer some guidance; for the special case of a transition probability tensor and eigenvalue , the right map is the one that outputs the Perron vector of a column stochastic matrix. Finally, our method is not immediately applicable to -eigenvectors because Section 1 no longer holds. Adapting our methodology to this class of eigenvectors is an area for future research.

Our framework came from relating tensor eigenvectors to stochastic processes. This is quite different from the core ideas in the tensor literature, which are firmly rooted in algebraic generalizations. We hope that our ideas encourage the use of stochastics and other areas of applied mathematics in solving difficult tensor problems.

## Acknowledgments

We thank Brad Nelson for providing valuable feedback. ARB is supported by NSF TRIPODS Award #1740822. DFG is supported by NSF award CCF-1149756, IIS-1422918, IIS-1546488, the NSF Center for Science of Information STC, CCF-0939370, DARPA SIMPLEX, and the Sloan Foundation.

## References

• [1] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. Journal of Machine Learning Research, 15(1):2773–2832, 2014.
• [2] M. Benaïm. Vertex-reinforced random walks and a conjecture of Pemantle. The Annals of Probability, 25(1):361–392, 1997.
• [3] A. R. Benson, D. F. Gleich, and J. Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In Proceedings of the 2015 SIAM International Conference on Data Mining, pages 118–126, 2015.
• [4] A. R. Benson, D. F. Gleich, and L.-H. Lim. The spacey random walk: a stochastic process for higher-order data. SIAM Review, 59(2):321–345, 2017.
• [5] D. A. Bini, B. Meini, and F. Poloni. On the solution of a quadratic vector equation arising in markovian binary trees. Numerical Linear Algebra with Applications, 18(6):981–991, 2011.
• [6] D. Cartwright and B. Sturmfels. The number of eigenvalues of a tensor. Linear Algebra and its Applications, 438(2):942–952, 2013.
• [7] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain. Symmetric tensors and symmetric tensor rank. SIAM Journal on Matrix Analysis and Applications, 30(3):1254–1279, 2008.
• [8] C.-F. Cui, Y.-H. Dai, and J. Nie. All real eigenvalues of symmetric tensors. SIAM Journal on Matrix Analysis and Applications, 35(4):1582–1601, 2014.
• [9] D. F. Gleich, L.-H. Lim, and Y. Yu. Multilinear PageRank. SIAM Journal on Matrix Analysis and Applications, 36(4):1507–1541, 2015.
• [10] S. Hu, L. Qi, and G. Zhang. Computing the geometric measure of entanglement of multipartite pure states by means of non-negative tensors. Physical Review A, 93(1), 2016.
• [11] E. Kofidis and P. A. Regalia. Tensor approximation and signal processing applications. Contemporary Mathematics, 280:103–134, 2001.
• [12] E. Kofidis and P. A. Regalia. On the best rank-1 approximation of higher-order supersymmetric tensors. SIAM Journal on Matrix Analysis and Applications, 23(3):863–884, 2002.
• [13] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
• [14] T. G. Kolda and J. R. Mayo. Shifted power method for computing tensor eigenpairs. SIAM Journal on Matrix Analysis and Applications, 32(4):1095–1124, 2011.
• [15] L. Lathauwer. Signal processing based on multilinear algebra. PhD thesis, Katholieke Universiteit Leuven, 1997.
• [16] L. D. Lathauwer, B. D. Moor, and J. Vandewalle. On the Best Rank- and Rank-(, , , ) Approximation of Higher-Order Tensors. SIAM Journal on Matrix Analysis and Applications, 21(4):1324–1342, 2000.
• [17] W. Li and M. K. Ng. On the limiting probability distribution of a transition probability tensor. Linear and Multilinear Algebra, Online:1–24, 2013.
• [18] L.-H. Lim. Singular values and eigenvalues of tensors: a variational approach. In 1st IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, pages 129–132, 2005.
• [19] L.-H. Lim. Tensors and hypermatrices. In Handbook of Linear Algebra, Second Edition, chapter 15, pages 231–260. Chapman and Hall/CRC, 2013.
• [20] B. Meini and F. Poloni. A perron iteration for the solution of a quadratic vector equation arising in markovian binary trees. SIAM Journal on Matrix Analysis and Applications, 32(1):248–261, 2011.
• [21] J. Nie and L. Wang. Semidefinite relaxations for best rank-1 tensor approximations. SIAM Journal on Matrix Analysis and Applications, 35(3):1155–1179, 2014.
• [22] J. Nie and X. Zhang. Real eigenvalues of nonsymmetric tensors. Computational Optimization and Applications, 70(1):1–32, 2017.
• [23] J. Peterson. Personal communication, 2018.
• [24] L. Qi. Eigenvalues of a real supersymmetric tensor. J. Symb. Comput., 40(6):1302–1324, 2005.
• [25] L. Qi, Y. Wang, and E. X. Wu. -eigenvalues of diffusion kurtosis tensors. Journal of Computational and Applied Mathematics, 221(1):150–157, 2008.
• [26] P. A. Regalia and E. Kofidis. The higher-order power method revisited: convergence proofs and effective initialization. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2000.
• [27] T.-C. Wei and P. M. Goldbart. Geometric measure of entanglement and applications to bipartite and multipartite quantum states. Physical Review A, 68(4), 2003.
• [28] T. Wu, A. Benson, and D. F. Gleich. General tensor spectral co-clustering for higher-order data. In Advances in Neural Information Processing Systems, pages 2559–2567, 2016.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters