
# Shifted Power Method for Computing Tensor Eigenpairs

*This work was funded by the applied mathematics program at the U.S. Department of Energy and by an Excellence Award from the Laboratory Directed Research & Development (LDRD) program at Sandia National Laboratories. Sandia National Laboratories is a multiprogram laboratory operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.*

Tamara G. Kolda and Jackson R. Mayo (Sandia National Laboratories, Livermore, CA. Email: {tgkolda,jmayo}@sandia.gov)
###### Abstract

Recent work on eigenvalues and eigenvectors for tensors of order $m \geq 3$ has been motivated by applications in blind source separation, magnetic resonance imaging, molecular conformation, and more. In this paper, we consider methods for computing real symmetric-tensor eigenpairs of the form $Ax^{m-1} = \lambda x$ subject to $\|x\| = 1$, which is closely related to optimal rank-1 approximation of a symmetric tensor. Our contribution is a shifted symmetric higher-order power method (SS-HOPM), which we show is guaranteed to converge to a tensor eigenpair. SS-HOPM can be viewed as a generalization of the power iteration method for matrices or of the symmetric higher-order power method. Additionally, using fixed point analysis, we can characterize exactly which eigenpairs can and cannot be found by the method. Numerical examples are presented, including examples from an extension of the method to finding complex eigenpairs.

**Keywords:** tensor eigenvalues, E-eigenpairs, Z-eigenpairs, $l^2$-eigenpairs, rank-1 approximation, symmetric higher-order power method (S-HOPM), shifted symmetric higher-order power method (SS-HOPM)

## 1 Introduction

Tensor eigenvalues and eigenvectors have received much attention lately in the literature [13, 18, 20, 19, 4, 15, 27]. The tensor eigenproblem is important because it has applications in blind source separation [11], magnetic resonance imaging [25, 22], molecular conformation [7], etc. There is more than one possible definition for a tensor eigenpair [18]; in this paper, we specifically use the following definition. {definition} Assume that $A$ is a symmetric $m$th-order, $n$-dimensional real-valued tensor. For any $n$-dimensional vector $x$, define

$$(Ax^{m-1})_{i_1} \equiv \sum_{i_2=1}^{n} \cdots \sum_{i_m=1}^{n} a_{i_1 i_2 \cdots i_m} x_{i_2} \cdots x_{i_m} \quad \text{for } i_1 = 1, \ldots, n. \tag{1}$$

Then $\lambda \in \mathbb{R}$ is an eigenvalue of $A$ if there exists $x \in \mathbb{R}^n$ such that

$$Ax^{m-1} = \lambda x \quad \text{and} \quad x^T x = 1. \tag{2}$$

The vector $x$ is a corresponding eigenvector, and $(\lambda, x)$ is called an eigenpair.
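As a quick sanity check of this definition, note that for $m = 2$ it reduces to the ordinary symmetric matrix eigenproblem. The following numpy sketch (the `tensor_apply` helper is ours, not from the paper) verifies the defining equation numerically in that case:

```python
import numpy as np

def tensor_apply(A, x, times):
    """Contract the last `times` modes of the array A with copies of x."""
    out = A
    for _ in range(times):
        out = np.tensordot(out, x, axes=([-1], [0]))
    return out

# m = 2: A x^{m-1} = A x, so the definition recovers matrix eigenpairs.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, V = np.linalg.eigh(A)
x = V[:, 0]                      # unit-norm eigenvector
residual = tensor_apply(A, x, 1) - lam[0] * x
print(np.linalg.norm(residual))  # near zero
```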

Definition 1 is equivalent to the Z-eigenpairs defined by Qi [18, 19] and the $l^2$-eigenpairs defined by Lim [13]. In particular, Lim [13] observes that any eigenpair $(\lambda, x)$ is a Karush-Kuhn-Tucker (KKT) point (i.e., a constrained stationary point) of the nonlinear optimization problem

$$\max_{x \in \mathbb{R}^n} Ax^m \quad \text{subject to } x^T x = 1, \quad \text{where } Ax^m \equiv \sum_{i_1=1}^{n} \cdots \sum_{i_m=1}^{n} a_{i_1 \cdots i_m} x_{i_1} \cdots x_{i_m}. \tag{3}$$

This is equivalent to the problem of finding the best symmetric rank-1 approximation of a symmetric tensor [6]. We present the more general definition that incorporates complex-valued eigenpairs in §5.

In this paper, we build upon foundational work by Kofidis and Regalia [11] for solving (3). Their paper is extremely important for computing tensor eigenvalues even though it predates the definition of the eigenvalue problem by three years. Kofidis and Regalia consider the higher-order power method (HOPM) [6], a well-known technique for rank-1 approximation of higher-order tensors, and show that its symmetric generalization (S-HOPM) is not guaranteed to converge. They go on, however, to use convexity theory to provide theoretical results (as well as practical examples) explaining conditions under which the method is convergent for even-order tensors (i.e., $m$ even). Further, these conditions are shown to hold for many problems of practical interest.

In the context of independent component analysis (ICA), both Regalia and Kofidis [23] and Erdogan [8] have developed shifted variants of the power method and shown that they are monotonically convergent. We present a similar method in the context of finding real-valued tensor eigenpairs, called the shifted symmetric higher-order power method (SS-HOPM), along with theory showing that it is guaranteed to converge to a constrained stationary point of (3). The proof is general and works for both odd- and even-order tensors (i.e., all $m$). The effectiveness of SS-HOPM is demonstrated on several examples, including a problem noted previously [11] for which S-HOPM does not converge. We also present a version of SS-HOPM for finding complex-valued tensor eigenpairs and provide examples of its effectiveness.

As mentioned, there is more than one definition of a tensor eigenpair. In the case of the $l^m$-eigenpair (we use $m$ for the tensor order instead of $k$ as in some references) or H-eigenpair, the eigenvalue equation becomes $Ax^{m-1} = \lambda x^{[m-1]}$, where $x^{[m-1]}$ denotes the vector $x$ with each element raised to the $(m-1)$st power [13, 18]. In this context, Qi, Wang, and Wang [21] propose some methods specific to third-order tensors ($m = 3$). Unlike the ($l^2$-)eigenvalues we consider here, it is possible to guarantee convergence to the largest $l^m$-eigenvalue for certain classes of nonnegative tensors. For example, see the power methods proposed by Ng, Qi, and Zhou [15] and Liu, Zhou, and Ibrahim [14], the latter of which also uses a shift to guarantee convergence for any irreducible nonnegative tensor.

## 2 Preliminaries

Throughout, let $\Gamma$ and $\Sigma$ denote the unit ball and sphere in $\mathbb{R}^n$, i.e.,

$$\Gamma = \{x \in \mathbb{R}^n : \|x\| \leq 1\} \quad \text{and} \quad \Sigma = \{x \in \mathbb{R}^n : \|x\| = 1\}.$$

Additionally, define

$$\Pi_m \equiv \text{the set of all permutations of } (1, \ldots, m).$$

Throughout, $\|\cdot\|$ denotes the two-norm. Let $\rho(A)$ denote the spectral radius of a square matrix $A$, i.e., the maximum of the magnitudes of its eigenvalues.

### 2.1 Tensors

A tensor is an $m$-way array. We let $\mathbb{R}^{[m,n]}$ denote the space of $m$th-order real-valued tensors of dimension $n$, e.g., $\mathbb{R}^{[3,2]} = \mathbb{R}^{2 \times 2 \times 2}$. We adopt the convention that $\mathbb{R}^{[1,n]} = \mathbb{R}^n$.

We formally introduce the notion of a symmetric tensor, sometimes also called supersymmetric, which is invariant under any permutation of its indices. Further, we define a generalization of the tensor-vector multiplication in equations (1) and (3).

{definition}

[Symmetric tensor [5]] A tensor $A$ is symmetric if

$$a_{i_{p(1)} \cdots i_{p(m)}} = a_{i_1 \cdots i_m} \quad \text{for all } i_1, \ldots, i_m \in \{1, \ldots, n\} \text{ and } p \in \Pi_m.$$
{definition}

[Symmetric tensor-vector multiply] Let $A$ be a symmetric $m$th-order tensor and $x \in \mathbb{R}^n$. Then for $0 \leq r \leq m-1$, the $(m-r)$-times product of the tensor $A$ with the vector $x$ is denoted by $Ax^{m-r}$ (an $r$th-order tensor) and defined by

$$(Ax^{m-r})_{i_1 \cdots i_r} \equiv \sum_{i_{r+1}, \ldots, i_m} a_{i_1 \cdots i_m} x_{i_{r+1}} \cdots x_{i_m} \quad \text{for all } i_1, \ldots, i_r \in \{1, \ldots, n\}.$$
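This product can be computed by repeatedly contracting the last mode of the array with $x$; because the tensor is symmetric, it does not matter which modes are contracted. A small numpy sketch (the `symmetrize` and `multiply` helpers are our own names, not from the paper):

```python
import numpy as np
from itertools import permutations
from math import factorial

def symmetrize(T):
    """Average over all index permutations to produce a symmetric tensor."""
    m = T.ndim
    return sum(np.transpose(T, p) for p in permutations(range(m))) / factorial(m)

def multiply(A, x, r):
    """Return A x^{m-r}: contract the last m-r modes of A with copies of x."""
    out = A
    for _ in range(A.ndim - r):
        out = np.tensordot(out, x, axes=([-1], [0]))
    return out

rng = np.random.default_rng(0)
A = symmetrize(rng.standard_normal((3, 3, 3)))   # symmetric, m = 3, n = 3
x = rng.standard_normal(3)

# r = 1 gives the vector A x^{m-1}; r = 0 gives the scalar A x^m.
v = multiply(A, x, 1)
s = multiply(A, x, 0)
print(np.allclose(v, np.einsum('ijk,j,k->i', A, x, x)))     # True
print(np.isclose(s, np.einsum('ijk,i,j,k->', A, x, x, x)))  # True
```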
###### Example \thetheorem

The identity matrix plays an important role in matrix analysis. This notion can be extended in a sense to the domain of tensors. We may define an identity tensor as a symmetric tensor $E$ such that

$$Ex^{m-1} = x \quad \text{for all } x \in \Sigma.$$

We restrict $x \in \Sigma$ since it is not possible to have a tensor with $m > 2$ such that the above equation holds for all $x \in \mathbb{R}^n$. For any $x \neq 0$, the above equation implies

$$Ex^{m-1} = \|x\|^{m-1} E(x/\|x\|)^{m-1} = \|x\|^{m-1} (x/\|x\|) = \|x\|^{m-2} x.$$

Consider the case of $m = 4$ and $n = 2$. The system of equations that must be satisfied for all $x \in \Sigma$ is

$$\begin{aligned} e_{1111} x_1^3 + 3 e_{1112} x_1^2 x_2 + 3 e_{1122} x_1 x_2^2 + e_{1222} x_2^3 &= x_1, \\ e_{1112} x_1^3 + 3 e_{1122} x_1^2 x_2 + 3 e_{1222} x_1 x_2^2 + e_{2222} x_2^3 &= x_2. \end{aligned}$$

Consider $x = \begin{bmatrix} 1 & 0 \end{bmatrix}^T$. This yields $e_{1111} = 1$ and $e_{1112} = 0$. Similarly, $x = \begin{bmatrix} 0 & 1 \end{bmatrix}^T$ yields $e_{2222} = 1$ and $e_{1222} = 0$. The only remaining unknown is $e_{1122}$, and choosing, e.g., $x = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}^T$ yields $e_{1122} = 1/3$. In summary, the identity tensor for $m = 4$ and $n = 2$ is

$$e_{ijkl} = \begin{cases} 1 & \text{if } i = j = k = l, \\ 1/3 & \text{if } i = j \neq k = l, \\ 1/3 & \text{if } i = k \neq j = l, \\ 1/3 & \text{if } i = l \neq j = k, \\ 0 & \text{otherwise.} \end{cases}$$
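The derived entries can be verified numerically: building this tensor and contracting three times with $x$ should return $x$ itself on the unit sphere, and $\|x\|^{m-2} x = \|x\|^2 x$ off it. A brief numpy check (our own sketch):

```python
import numpy as np

n, m = 2, 4
E = np.zeros((n,) * m)
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                if i == j == k == l:
                    E[i, j, k, l] = 1.0
                elif (i == j and k == l) or (i == k and j == l) or (i == l and j == k):
                    E[i, j, k, l] = 1.0 / 3.0   # the 1/3 cases above

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                          # x on the unit sphere
Ex3 = np.einsum('ijkl,j,k,l->i', E, x, x, x)    # E x^{m-1}
print(np.allclose(Ex3, x))                      # True: identity on Sigma

z = 2.0 * x                                     # ||z|| = 2
Ez3 = np.einsum('ijkl,j,k,l->i', E, z, z, z)
print(np.allclose(Ez3, 4.0 * z))                # True: E z^{m-1} = ||z||^{m-2} z
```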

We generalize this idea in the next property.

###### Property \thetheorem

For $m$ even, the identity tensor $E$ satisfying $Ex^{m-1} = x$ for all $x \in \Sigma$ is given by

$$e_{i_1 \cdots i_m} = \frac{1}{m!} \sum_{p \in \Pi_m} \delta_{i_{p(1)} i_{p(2)}} \delta_{i_{p(3)} i_{p(4)}} \cdots \delta_{i_{p(m-1)} i_{p(m)}} \tag{4}$$

for $i_1, \ldots, i_m \in \{1, \ldots, n\}$, where $\delta$ is the standard Kronecker delta, i.e.,

$$\delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

This identity tensor appears in a previous work [18], where it is used to define a generalization of the characteristic polynomial for symmetric even-order tensors.

###### Example \thetheorem

There is no identity tensor for $m$ odd. To see this, suppose $Ex^{m-1} = x$ for some odd $m$ and all $x \in \Sigma$; then, since $m-1$ is even, we would have $E(-x)^{m-1} = Ex^{m-1} = x$, but the identity property requires $E(-x)^{m-1} = -x$.

For any even-order tensor (i.e., $m$ even), observe that if $(\lambda, x)$ is an eigenpair, then $(\lambda, -x)$ is also an eigenpair since

$$A(-x)^{m-1} = -Ax^{m-1} = -\lambda x = \lambda(-x).$$

Likewise, for any odd-order tensor (i.e., $m$ odd), $(-\lambda, -x)$ is also an eigenpair since

$$A(-x)^{m-1} = Ax^{m-1} = \lambda x = (-\lambda)(-x).$$

These are not considered to be distinct eigenpairs.

We later present, as Theorem 5.1, a recently derived result [3] that bounds the number of real eigenpairs by $((m-1)^n - 1)/(m-2)$. We defer discussion of this result until §5, where we discuss complex eigenpairs.

Because the tensor eigenvalue equation for $m > 2$ amounts to a system of nonlinear equations in the components of $x$, a direct solution is challenging. Numerical algorithms exist for finding all solutions of a system of polynomial equations, but they become computationally expensive for systems with many variables (here, large $n$) and with high-order polynomials (here, large $m$). A polynomial system solver (NSolve) using a Gröbner basis method is available in Mathematica [28] and has been employed to generate a complete list of eigenpairs for some of the examples in this paper. The solver is instructed to find all solutions of the system (2). Redundant solutions with the opposite sign of $x$ (for even $m$) or the opposite signs of $\lambda$ and $x$ (for odd $m$) are then eliminated.
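For the special case $n = 2$ one can avoid a general polynomial solver altogether: writing $x = (\cos\theta, \sin\theta)$, the eigenvector condition is that $Ax^{m-1}$ is parallel to $x$, i.e., the scalar function $(Ax^{m-1})_1 x_2 - (Ax^{m-1})_2 x_1$ vanishes. The sketch below (our own illustrative tensor and root-finding approach, not the paper's Mathematica procedure) locates all real eigenpairs of a small $m = 3$, $n = 2$ tensor by sign-change bisection; for $m = 3$, $n = 2$ there are at most three real eigenpairs.

```python
import numpy as np

# Hypothetical small symmetric 2x2x2 tensor (our example, not from the paper).
A = np.zeros((2, 2, 2))
A[0, 0, 0] = 2.0
for idx in [(0, 0, 1), (0, 1, 0), (1, 0, 0)]:
    A[idx] = 1.0        # a_112 and its permutations
for idx in [(0, 1, 1), (1, 0, 1), (1, 1, 0)]:
    A[idx] = -1.0       # a_122 and its permutations
A[1, 1, 1] = 0.5

def Axm1(x):
    return np.einsum('ijk,j,k->i', A, x, x)

def g(t):
    """Zero iff x = (cos t, sin t) is an eigenvector: A x^2 parallel to x."""
    x = np.array([np.cos(t), np.sin(t)])
    v = Axm1(x)
    return v[0] * x[1] - v[1] * x[0]

# Scan [0, pi): for m = 3, (lambda, x) and (-lambda, -x) are the same eigenpair.
ts = np.linspace(0.0, np.pi, 2001)
pairs = []
for a, b in zip(ts[:-1], ts[1:]):
    if g(a) * g(b) < 0:                   # sign change: bracket a root
        lo, hi = a, b
        for _ in range(60):               # bisection to machine precision
            mid = 0.5 * (lo + hi)
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        x = np.array([np.cos(lo), np.sin(lo)])
        pairs.append((Axm1(x) @ x, x))    # lambda = A x^m

for lam, x in pairs:
    print(f"lambda = {lam:+.4f}, x = {x}")
```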

### 2.2 Convex functions

Convexity theory plays an important role in our analysis. Here we recall two important properties of convex functions [2].

###### Property \thetheorem (Gradient of convex function)

A differentiable function $f : \Omega \to \mathbb{R}$ is convex if and only if $\Omega \subseteq \mathbb{R}^n$ is a convex set and $f(y) \geq f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \Omega$.

###### Property \thetheorem (Hessian of convex function)

A twice differentiable function $f : \Omega \to \mathbb{R}$ is convex if and only if $\Omega \subseteq \mathbb{R}^n$ is a convex set and the Hessian¹ of $f$ is positive semidefinite on $\Omega$, i.e., $w^T \nabla^2 f(x) w \geq 0$ for all $x \in \Omega$ and $w \in \mathbb{R}^n$. (¹By $\nabla^2 f$ we denote the Hessian matrix and not its trace, the Laplacian.)

We prove an interesting fact about convex functions on vectors of unit norm that will prove useful in our later analysis. This fact is implicit in a proof given previously [11, Theorem 4] and explicit in [23, Theorem 1]. {theorem}[Kofidis and Regalia [11, 23]] Let $f$ be a function that is convex and continuously differentiable on $\Gamma$. Let $x \in \Sigma$ with $\nabla f(x) \neq 0$, and define $v \equiv \nabla f(x)/\|\nabla f(x)\|$. If $v \neq x$, then $f(v) - f(x) > 0$. {proof} For arbitrary nonzero $w \in \mathbb{R}^n$, the quantity $w^T v$ is strictly maximized over $v \in \Sigma$ by $v = w/\|w\|$. Substituting $w = \nabla f(x)$, it follows that $\nabla f(x)^T v > \nabla f(x)^T x$ since $x \in \Sigma$ and $v \neq x$. By the convexity of $f$ on $\Gamma$ and Property 2.2, we have $f(y) \geq f(x) + \nabla f(x)^T (y - x)$ for all $x, y \in \Gamma$. Consequently, $f(v) - f(x) \geq \nabla f(x)^T (v - x) > 0$.
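A small numerical illustration of this fact (our own sketch): for the convex function $f(x) = x^T M x$ with $M$ positive semidefinite, repeatedly mapping $x$ to the normalized gradient never decreases $f$. (For this particular $f$, the iteration is exactly the matrix power method on $M$.)

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
M = B @ B.T                          # positive semidefinite, so f is convex on R^n

f = lambda x: x @ M @ x
grad = lambda x: 2.0 * M @ x

x = rng.standard_normal(4)
x /= np.linalg.norm(x)
values = [f(x)]
for _ in range(50):                  # x <- grad f(x) / ||grad f(x)||
    w = grad(x)
    x = w / np.linalg.norm(w)
    values.append(f(x))

# The theorem guarantees a nondecreasing sequence of function values.
print(all(b >= a - 1e-12 for a, b in zip(values, values[1:])))  # True
```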

### 2.3 Constrained optimization

Here we extract relevant theory from constrained optimization [17].

{theorem}

Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable. A point $x_* \in \Sigma$ is a (constrained) stationary point of

$$\max f(x) \quad \text{subject to } x \in \Sigma$$

if there exists $\mu_* \in \mathbb{R}$ such that $\nabla f(x_*) + \mu_* x_* = 0$. The point $x_*$ is a (constrained) isolated local maximum if, additionally,

$$w^T \big( \nabla^2 f(x_*) + \mu_* I \big) w < 0 \quad \text{for all } w \in \Sigma \cap x_*^\perp.$$
{proof}

The constraint $x \in \Sigma$ can be expressed as $c(x) \equiv (x^T x - 1)/2 = 0$. The Lagrangian for the constrained problem is then given by

$$L(x, \mu) = f(x) + \mu \, c(x) = f(x) + \frac{\mu}{2}\left(x^T x - 1\right).$$

Its first and second derivatives with respect to are

$$\nabla L(x, \mu) = \nabla f(x) + \mu x \quad \text{and} \quad \nabla^2 L(x, \mu) = \nabla^2 f(x) + \mu I.$$

By assumption, $\nabla L(x_*, \mu_*) = 0$ and $c(x_*) = 0$. Therefore, the pair $(x_*, \mu_*)$ satisfies the Karush-Kuhn-Tucker (KKT) conditions [17, Theorem 12.1], and so $x_*$ is a constrained stationary point. It is additionally a constrained isolated local maximum if it meets the second-order sufficient condition [17, Theorem 12.6].

### 2.4 Fixed point theory

We consider the properties of iterations of the form

$$x_{k+1} = \phi(x_k).$$

Under certain conditions, the iterates are guaranteed to converge to a fixed point. In particular, we are interested in “attracting” fixed points.

{definition}

[Fixed point] A point $x_*$ is a fixed point of $\phi$ if $\phi(x_*) = x_*$. Further, $x_*$ is an attracting fixed point if there exists $\epsilon > 0$ such that the sequence $\{x_k\}$ defined by $x_{k+1} = \phi(x_k)$ converges to $x_*$ for any $x_0$ such that $\|x_0 - x_*\| \leq \epsilon$.

{theorem}

[[24, Theorem 2.8]] Let $x_*$ be a fixed point of $\phi$, and let $J(x_*)$ be the Jacobian of $\phi$ at $x_*$. Then $x_*$ is an attracting fixed point if $\sigma \equiv \rho(J(x_*)) < 1$; further, if $\sigma > 0$, then the convergence of $\{x_k\}$ to $x_*$ is linear with rate $\sigma$.

This condition on the Jacobian for an attracting fixed point is sufficient but not necessary. In particular, if $\rho(J(x_*)) = 1$, then $x_*$ may or may not be attracting, but there is no neighborhood of linear convergence to it. For $\sigma \in (0, 1)$, the rate of linear convergence depends on $\sigma$ and is slower for values closer to 1. On the other hand, for $\rho(J(x_*)) > 1$, an attractor is ruled out by the following.

{theorem}

[[26, Theorem 1.3.7]] Let $x_*$ be a fixed point of $\phi$, and let $J(x_*)$ be the Jacobian of $\phi$ at $x_*$. Then $x_*$ is an unstable fixed point if $\rho(J(x_*)) > 1$.
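These fixed point results are easy to observe in one dimension, where the Jacobian is the scalar $\phi'(x_*)$. For $\phi(x) = \cos x$ (our own toy example), the fixed point $x_* \approx 0.739$ has $|\phi'(x_*)| = \sin x_* \approx 0.674 < 1$, so it is attracting and successive errors shrink by roughly that factor:

```python
import math

phi = math.cos                      # phi'(x) = -sin(x)

# Locate the fixed point x* by iterating far past convergence.
x = 1.0
for _ in range(500):
    x = phi(x)
xstar = x
rate = abs(math.sin(xstar))         # spectral radius of the 1x1 Jacobian, < 1

# Successive errors contract by a factor approaching `rate`.
errs = []
y = 0.5
for _ in range(40):
    y = phi(y)
    errs.append(abs(y - xstar))
print(errs[-1] / errs[-2])          # close to `rate`
```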

## 3 Symmetric higher-order power method (S-HOPM)

We review the symmetric higher-order power method (S-HOPM), introduced by De Lathauwer et al. [6] and analyzed further by Kofidis and Regalia [11]. The purpose of S-HOPM is to solve the optimization problem

$$\max_{x \in \mathbb{R}^n} |Ax^m| \quad \text{subject to } x \in \Sigma. \tag{5}$$

The solution of this problem will be a solution of either the following maximization problem (lacking the absolute value) or its opposite minimization problem:

$$\max_{x \in \mathbb{R}^n} f(x) \quad \text{subject to } x \in \Sigma, \quad \text{where } f(x) = Ax^m. \tag{6}$$

Setting $\lambda = Ax^m$, these problems are equivalent to finding the best symmetric rank-1 approximation of a symmetric tensor $A$, i.e.,

$$\min_{\lambda, x} \|A - B\| \quad \text{subject to } b_{i_1 \cdots i_m} = \lambda \, x_{i_1} \cdots x_{i_m} \text{ and } x \in \Sigma. \tag{7}$$

Details of the connection between (6) and (7) are available elsewhere [6]. The S-HOPM algorithm is shown in Algorithm 1. We discuss its connection to the eigenvalue problem in §3.1 and its convergence properties in §3.2.
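Although Algorithm 1 is not reproduced here, in outline it repeats $x_{k+1} = Ax_k^{m-1}/\|Ax_k^{m-1}\|$ with $\lambda_{k+1} = Ax_{k+1}^m$. A minimal numpy sketch (helper and function names are ours):

```python
import numpy as np

def Axm1(A, x):
    """Compute A x^{m-1} by contracting all but the first mode with x."""
    y = A
    for _ in range(A.ndim - 1):
        y = np.tensordot(y, x, axes=([-1], [0]))
    return y

def s_hopm(A, x0, maxiter=1000, tol=1e-12):
    """Symmetric higher-order power method (sketch of Algorithm 1)."""
    x = x0 / np.linalg.norm(x0)
    lam = Axm1(A, x) @ x            # lambda = A x^m
    for _ in range(maxiter):
        y = Axm1(A, x)
        x = y / np.linalg.norm(y)
        lam_new = Axm1(A, x) @ x
        if abs(lam_new - lam) < tol:
            return lam_new, x, True
        lam = lam_new
    return lam, x, False            # did not converge

# For m = 2 this reduces to the classical power method on a symmetric matrix.
M = np.diag([3.0, 1.0, -1.0])
lam, x, ok = s_hopm(M, np.ones(3))
print(ok, lam)                      # converges to the dominant eigenvalue 3
```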

### 3.1 Properties of $f(x) = Ax^m$

The function $f(x) = Ax^m$ plays an important role in the analysis of eigenpairs of $A$ because all eigenpairs are constrained stationary points of $f$, as we show below.

We first need to derive the gradient of . This result is perhaps generally well known [13, Equation 4], but here we provide a proof.

{lemma}

Let $A$ be symmetric. The gradient of $f(x) = Ax^m$ is

$$g(x) \equiv \nabla f(x) = m A x^{m-1} \in \mathbb{R}^n. \tag{8}$$
{proof}

We use the basic relation $\partial x_i / \partial x_k = \delta_{ik}$. Applying the product rule to (6), we find

$$\nabla_k f(x) = \sum_{i_1, \ldots, i_m} \sum_{q=1}^{m} a_{i_1 i_2 \cdots i_m} \, x_{i_1} x_{i_2} \cdots x_{i_{q-1}} \, \delta_{i_q k} \, x_{i_{q+1}} \cdots x_{i_m}.$$

Upon bringing the sum over $q$ to the outside, we observe that for each $q$ the dummy indices $i_1$ and $i_q$ can be interchanged (without affecting the symmetric tensor $A$), and the result is independent of $q$:

$$\nabla_k f(x) = \sum_{q=1}^{m} \sum_{i_1, \ldots, i_m} a_{i_1 i_2 \cdots i_m} \, \delta_{i_1 k} \, x_{i_2} \cdots x_{i_m} = \sum_{q=1}^{m} \sum_{i_2, \ldots, i_m} a_{k i_2 \cdots i_m} x_{i_2} \cdots x_{i_m} = m \, (A x^{m-1})_k.$$

Hence, $\nabla f(x) = m A x^{m-1}$.
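The gradient formula is easy to check against central finite differences on a random symmetric tensor (a sketch with our own helper names):

```python
import numpy as np
from itertools import permutations
from math import factorial

rng = np.random.default_rng(2)
m, n = 3, 4
T = rng.standard_normal((n,) * m)
A = sum(np.transpose(T, p) for p in permutations(range(m))) / factorial(m)  # symmetrize

def f(x):
    """f(x) = A x^m: contract all m modes with x."""
    y = A
    for _ in range(m):
        y = np.tensordot(y, x, axes=([-1], [0]))
    return y

def grad(x):
    """Lemma: grad f(x) = m A x^{m-1}."""
    y = A
    for _ in range(m - 1):
        y = np.tensordot(y, x, axes=([-1], [0]))
    return m * y

x = rng.standard_normal(n)
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.max(np.abs(fd - grad(x))))  # small finite-difference discrepancy
```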

{theorem}

Let $A$ be symmetric. Then $(\lambda, x)$ is an eigenpair of $A$ if and only if $x$ is a constrained stationary point of (6). {proof} By Theorem 2.3, any constrained stationary point $x_*$ of (6) must satisfy $m A x_*^{m-1} = -\mu_* x_*$ for some $\mu_* \in \mathbb{R}$. Thus, $\lambda_* = -\mu_*/m = A x_*^m$ is the eigenvalue corresponding to $x_*$. Conversely, any eigenpair $(\lambda, x)$ meets the condition for being a constrained stationary point with $\mu_* = -m\lambda$.

This is the connection between (6) and the eigenvalue problem. It will also be useful to consider the Hessian of $f$, which we present here.

{lemma}

Let $A$ be symmetric. The Hessian of $f(x) = Ax^m$ is

$$H(x) \equiv \nabla^2 f(x) = m(m-1) A x^{m-2} \in \mathbb{R}^{n \times n}. \tag{9}$$
{proof}

The $(j,k)$ entry of $H(x)$ is given by the $k$th entry of $\nabla g_j(x)$. The function $g_j$ can be rewritten as

$$g_j(x) = m \sum_{i_2, \ldots, i_m} a_{j i_2 \cdots i_m} x_{i_2} \cdots x_{i_m} = m \, B^{(j)} x^{m-1},$$

where $B^{(j)}$ is the order-$(m-1)$ symmetric tensor that is the $j$th subtensor of $A$, defined by $b^{(j)}_{i_2 \cdots i_m} \equiv a_{j i_2 \cdots i_m}$. From Lemma 3.1, we have

$$\nabla g_j(x) = m(m-1) \, B^{(j)} x^{m-2}.$$

Consequently,

$$(H(x))_{jk} = m(m-1) \sum_{i_3, \ldots, i_m} a_{jk i_3 \cdots i_m} x_{i_3} \cdots x_{i_m},$$

that is, $H(x) = m(m-1) A x^{m-2}$.

From Theorem 2.3, we know that the projected Hessian of the Lagrangian plays a role in determining whether or not a constrained stationary point is a local maximum or minimum. In our case, since $\mu_* = -m\lambda_*$, for any eigenpair $(\lambda_*, x_*)$ (which must correspond to a constrained stationary point by Theorem 3.1) we have

$$\nabla^2 L(x_*, \lambda_*) = m(m-1) A x_*^{m-2} - m \lambda_* I.$$

Specifically, Theorem 2.3 is concerned with the behavior of the Hessian of the Lagrangian in the subspace orthogonal to $x_*$. Thus, we define the projected Hessian of the Lagrangian as

$$C(\lambda_*, x_*) \equiv U_*^T \big( (m-1) A x_*^{m-2} - \lambda_* I \big) U_* \in \mathbb{R}^{(n-1) \times (n-1)}, \tag{10}$$

where the columns of $U_*$ form an orthonormal basis for $x_*^\perp$. Note that we have removed a factor of $m$ for convenience. We now classify eigenpairs according to the spectrum of $C(\lambda_*, x_*)$. The import of this classification will be made clear in §4.2.

{definition}

Let $A$ be a symmetric tensor. We say an eigenpair $(\lambda, x)$ of $A$ is positive stable if $C(\lambda, x)$ is positive definite, negative stable if $C(\lambda, x)$ is negative definite, and unstable if $C(\lambda, x)$ is indefinite.

These labels are not exhaustive because we do not name the cases where $C(\lambda, x)$ is only semidefinite, with a zero eigenvalue. Such cases do not occur for generic tensors.

If $m$ is odd, then $(\lambda, x)$ is positive stable if and only if $(-\lambda, -x)$ is negative stable, even though these eigenpairs are in the same equivalence class. On the other hand, if $m$ is even, then $(\lambda, x)$ is a positive (negative) stable eigenpair if and only if $(\lambda, -x)$ is also positive (negative) stable.

### 3.2 S-HOPM convergence analysis

S-HOPM has been deemed unreliable [6] because convergence is not guaranteed. Kofidis and Regalia [11] provide an analysis explaining that S-HOPM will converge if certain conditions are met, as well as an example where the method does not converge, which we reproduce here.

###### Example \thetheorem (Kofidis and Regalia [11, Example 1])

Let $A$ be the symmetric tensor of order $m = 4$ and dimension $n = 3$ defined by

$$\begin{aligned}
a_{1111} &= -0.2883, & a_{1112} &= -0.0031, & a_{1113} &= -0.1973, \\
a_{1122} &= -0.2485, & a_{1123} &= -0.2939, & a_{1133} &= -0.3847, \\
a_{1222} &= -0.2972, & a_{1223} &= -0.1862, & a_{1233} &= -0.0919, \\
a_{1333} &= -0.3619, & a_{2222} &= -0.1241, & a_{2223} &= -0.3420, \\
a_{2233} &= -0.2127, & a_{2333} &= -0.2727, & a_{3333} &= -0.3054.
\end{aligned}$$

Kofidis and Regalia [11] observed that Algorithm 1 does not converge for this tensor. Because this problem is small, all eigenpairs can be calculated by Mathematica as described in §2.1. From Theorem 5.1, this problem has at most 13 eigenpairs; we list the 11 real eigenpairs in Table 1. We ran 100 trials of S-HOPM using different random starting points chosen from a uniform distribution on $[-1, 1]^n$. For these experiments, we allow up to 1000 iterations and say that the algorithm has converged if the change $|\lambda_{k+1} - \lambda_k|$ falls below a fixed tolerance. In every single trial for this tensor, the algorithm failed to converge. In Figure 1, we show an example sequence $\{\lambda_k\}$. This coincides with the results reported previously [11].

###### Example \thetheorem

As a second illustrative example, we consider an odd-order tensor $A$ of order $m = 3$ and dimension $n = 3$ defined by

$$\begin{aligned}
a_{111} &= -0.1281, & a_{112} &= -0.0516, & a_{113} &= -0.0954, \\
a_{122} &= -0.1958, & a_{123} &= -0.1790, & a_{133} &= -0.2676, \\
a_{222} &= -0.3251, & a_{223} &= -0.2513, & a_{233} &= -0.1773, \\
a_{333} &= -0.0338.
\end{aligned}$$

From Theorem 5.1, $A$ has at most 7 eigenpairs; in this case the bound is achieved, and the eigenpairs are listed in Table 2. We ran 100 trials of S-HOPM as described for the previous example. Every trial converged to one of two eigenvalues, as summarized in Table 3. Therefore, S-HOPM finds 2 of the 7 possible eigenvalues.

In their analysis, Kofidis and Regalia [11] proved that the sequence $\{\lambda_k\}$ in Algorithm 1 converges if $A$ is even-order and the function $f(x) = Ax^m$ is convex or concave on $\mathbb{R}^n$. Setting $\ell = m/2$ (an integer because $m$ is even), $f$ can be expressed as

$$f(x) = (\underbrace{x \otimes \cdots \otimes x}_{\ell \text{ times}})^T \mathbf{A} \, (\underbrace{x \otimes \cdots \otimes x}_{\ell \text{ times}}),$$

where $\mathbf{A} \in \mathbb{R}^{n^\ell \times n^\ell}$ is an unfolded version of the tensor $A$, obtained by mapping the first $\ell$ modes to rows and the remaining $\ell$ modes to columns in matricization notation [12]. Since $A$ is symmetric, it follows that $\mathbf{A}$ is symmetric. The condition that $f$ is convex (concave) is satisfied if the Hessian

$$\nabla^2 f(x) = (\,I \otimes \underbrace{x \otimes \cdots \otimes x}_{\ell-1 \text{ times}}\,)^T \mathbf{A} \, (\,I \otimes \underbrace{x \otimes \cdots \otimes x}_{\ell-1 \text{ times}}\,)$$

is positive (negative) semidefinite for all $x \in \mathbb{R}^n$.

We make a few notes regarding these results. First, even though $f$ may be convex on $\mathbb{R}^n$, its restriction to the nonconvex set $\Sigma$ is not. Second, $\{\lambda_k\}$ is increasing if $f$ is convex and decreasing if $f$ is concave. Third, only the sequence $\{\lambda_k\}$ is proved to converge for S-HOPM [11, Theorem 4]; the iterates $\{x_k\}$ may not. In particular, it is easy to observe that the sign of $x_k$ may flip back and forth if the concave case is not handled correctly.

## 4 Shifted symmetric higher-order power method (SS-HOPM)

In this section, we show that S-HOPM can be modified by adding a "shift" that guarantees that the method will always converge to an eigenpair. In the context of ICA, this idea has also been proposed by Regalia and Kofidis [23] and Erdogan [8]. Based on the observation that S-HOPM is guaranteed to converge if the underlying function is convex or concave on $\mathbb{R}^n$, our method works with a suitably modified function

$$\hat{f}(x) \equiv f(x) + \alpha \, (x^T x)^{m/2}. \tag{11}$$

Maximizing $\hat{f}$ on $\Sigma$ is the same as maximizing $f$ plus a constant, yet the properties of the modified function force convexity or concavity and consequently guarantee convergence to a KKT point (not necessarily the global maximum or minimum). Note that previous papers [23, 8] have proposed similar shifted functions, differing only in the exponent of the shift term.

An advantage of our choice of shift in (11) is that, for even $m$, it can be interpreted as

$$\hat{f}(x) = \hat{A} x^m \equiv (A + \alpha E) x^m,$$

where $E$ is the identity tensor as defined in (4). Thus, for even $m$, our proposed method can be interpreted as S-HOPM applied to a modified tensor $\hat{A}$ that directly satisfies the convexity properties needed to guarantee convergence [11]. Because $Ex^{m-1} = x$ for $x \in \Sigma$, the eigenvectors of $\hat{A}$ are the same as those of $A$ and the eigenvalues are shifted by $\alpha$. Our results, however, are for both odd- and even-order tensors.

Algorithm 2 presents the shifted symmetric higher-order power method (SS-HOPM). Without loss of generality, we assume that a positive shift ($\alpha > 0$) is used to make the modified function in (11) convex and a negative shift ($\alpha < 0$) to make it concave. We have two key results. Theorem 4.1 shows that for any starting point $x_0 \in \Sigma$, the sequence $\{\lambda_k\}$ produced by Algorithm 2 is guaranteed to converge to an eigenvalue in the convex case if

$$\alpha > \beta(A) \equiv (m-1) \cdot \max_{x \in \Sigma} \rho(A x^{m-2}). \tag{12}$$

Corollary 4.1 handles the concave case, where we require $\alpha < -\beta(A)$. Theorem 4.2 further shows that Algorithm 2 in the convex case will generically converge to an eigenpair that is negative stable. Corollary 4.2 proves that Algorithm 2 in the concave case will generically converge to an eigenpair that is positive stable. Generally, neither version will converge to an eigenpair that is unstable.
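The convex case of the iteration is simply $x_{k+1} = (Ax_k^{m-1} + \alpha x_k)/\|Ax_k^{m-1} + \alpha x_k\|$. The sketch below applies it to an order-3 tensor built from the distinct entries printed in the odd-order example above (signs as they appear there); the shift uses the crude but valid bound $\rho(Ax^{m-2}) \leq \|A\|_F$ for $x \in \Sigma$, so $\alpha = (m-1)\|A\|_F + 0.1 > \beta(A)$. All helper names are ours.

```python
import numpy as np
from itertools import permutations

# Order-3 symmetric tensor from the distinct entries listed in the odd-order
# example above (signs as printed there).
entries = {(0,0,0): -0.1281, (0,0,1): -0.0516, (0,0,2): -0.0954,
           (0,1,1): -0.1958, (0,1,2): -0.1790, (0,2,2): -0.2676,
           (1,1,1): -0.3251, (1,1,2): -0.2513, (1,2,2): -0.1773,
           (2,2,2): -0.0338}
A = np.zeros((3, 3, 3))
for idx, v in entries.items():
    for p in set(permutations(idx)):    # fill all symmetric positions
        A[p] = v

def Axm1(x):
    """A x^{m-1} for m = 3."""
    return np.einsum('ijk,j,k->i', A, x, x)

def ss_hopm(x0, alpha, maxiter=10000, tol=1e-14):
    """SS-HOPM sketch, convex case (alpha > 0)."""
    x = x0 / np.linalg.norm(x0)
    lam = Axm1(x) @ x
    for _ in range(maxiter):
        y = Axm1(x) + alpha * x         # shifted update
        x = y / np.linalg.norm(y)
        lam_new = Axm1(x) @ x
        diff = abs(lam_new - lam)
        lam = lam_new
        if diff < tol:
            break
    return lam, x

# Crude shift bound: rho(A x^{m-2}) <= ||A||_F on the unit sphere.
alpha = 2 * np.linalg.norm(A) + 0.1     # (m-1)*||A||_F + margin > beta(A)
lam, x = ss_hopm(np.ones(3), alpha)
print(np.linalg.norm(Axm1(x) - lam * x))  # eigenpair residual, near zero
```

Note that a tighter shift gives faster convergence; the guarantee only requires $\alpha > \beta(A)$, and an overly large $\alpha$ slows the linear rate.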

### 4.1 SS-HOPM convergence analysis

We first establish a few key lemmas that guide the choice of the shift in SS-HOPM.

{lemma}

Let $A$ be symmetric and let $\beta(A)$ be as defined in (12). Then $\beta(A)$ is finite; in particular, $\beta(A) \leq (m-1) \sum_{i_1, \ldots, i_m} |a_{i_1 \cdots i_m}|$. {proof} For all $x \in \Sigma$, we obtain $\rho(Ax^{m-2}) \leq \sum_{i_1, \ldots, i_m} |a_{i_1 \cdots i_m}|$ by bounding the spectral radius with the maximum absolute row sum of $Ax^{m-2}$ and applying the triangle inequality to the terms defining each entry (noting $|x_i| \leq 1$ on $\Sigma$). Thus $(m-1)\,\rho(Ax^{m-2}) \leq (m-1) \sum_{i_1, \ldots, i_m} |a_{i_1 \cdots i_m}|$ for all $x \in \Sigma$, and the result follows.

{lemma}

Let $A$ be symmetric, let $x \in \Sigma$, and let $\beta(A)$ be as defined in (12). Then $|Ax^m| \leq \beta(A)$. {proof} We have $|Ax^m| = |x^T (Ax^{m-2}) x| \leq \rho(Ax^{m-2}) \leq \beta(A)/(m-1) \leq \beta(A)$.

The preceding lemma upper bounds the magnitude of any eigenvalue of $A$ by $\beta(A)$ since any eigenpair $(\lambda, x)$ satisfies $\lambda = Ax^m$. Thus, choosing $\alpha > \beta(A)$ implies that $\alpha$ is greater than the magnitude of any eigenvalue of $A$.

{lemma}

Let $A$ be symmetric and let $H(x)$ and $\beta(A)$ be as defined in (9) and (12). Then