On the numerical ranks of tensorsSubmitted to the editors July 1, 2019.      Funding: This work is supported by National Science Foundation grant no. 1818757.

# On the numerical ranks of tensors††thanks: Submitted to the editors July 1, 2019.      Funding: This work is supported by National Science Foundation grant no. 1818757.

Tianyi Shi Center for Applied Mathematics, Cornell University, Ithaca, NY 14853. (ts777@cornell.edu)    Alex Townsend Department of Mathematics, Cornell University, Ithaca, NY 14853. (townsend@cornell.edu)
###### Abstract

Tensors are often compressed by expressing them in low rank tensor formats. In this paper, we develop three methodologies that bound the compressibility of a tensor: (1) Algebraic structure, (2) Smoothness, and (3) Displacement structure. For each methodology, we derive bounds on numerical tensor ranks that partially explain the abundance of compressible tensors in applied mathematics. For example, we show that the solution tensor of a discretized Poisson equation on with zero Dirichlet conditions can be approximated to a relative accuracy of by a tensor-train decomposition of rank , where and .

Key words. numerical low rank, Tensor-Train, Multilinear, Canonical Polyadic, displacement

AMS subject classifications. 15A69, 65F99

## 1 Introduction

A wide variety of applications [19, 7, 24] lead to problems involving data or solutions that can be represented by tensors [21]. A general -order tensor has entries, which prevents it from being stored explicitly except for modest . It is often essential to represent or approximate tensors using sparse data formats, such as low rank tensor decompositions [10, 21]. However, the need for data sparse formats does not imply that such approximations are always mathematically possible. In this paper, we derive bounds on numerical tensor ranks (see LABEL:def:NumericalCPRank, LABEL:def:NumericalTuckerRank and LABEL:def:NumericalTTrank) for certain families of tensors, and in doing so, we partially justify the use of low rank tensor decompositions. Analogous theoretical results have already been derived that explicitly bound the numerical rank of matrices [2, 26, 30, 34].

The situation for tensors is more complicated than for matrices and this is reflected in several distinct low rank tensor decompositions [21, 28]. Here, we consider three such decompositions: (a) Tensor-train decomposition (see LABEL:sec:TT), (b) Orthogonal Tucker decomposition (see LABEL:sec:OrthogonalTucker), and (c) Canonical Polyadic (CP) decomposition (see LABEL:sec:CPdecomposition). These three tensor decompositions supply three different definitions of tensor rank, and therefore each one requires separate attention.

For a given tensor , we are also interested in developing a variety of tools to theoretically explain whether there exists a low rank tensor , in one or more of the tensor formats, such that

 ∥X−~X∥F≤ϵ∥X∥F,∥X∥2F=n1∑i1=1⋯nd∑id=1|Xi1,…,id|2, (1.1)

where is an accuracy tolerance. If can be well-approximated by , then dramatic storage and computational benefits can be achieved by replacing by  [10, 12]. In this paper, we explore three methodologies to bound the numerical rank of a tensor:

• Algebraic structures: If a tensor is constructed by sampling a multivariable function that can be expressed as a sum of products of single-variable functions, then that tensor is usually a low rank tensor. Occasionally, one may have to perform algebraic manipulations to a function to explicitly reveal its low rank structure, for example, by using trigonometric identities (see LABEL:sec:algebraic).

• Smoothness: If a tensor can be constructed by sampling a smooth function on a tensor-product grid, then that tensor is often well-approximated by a low rank tensor. This observation can be made rigorous by using the fact that smooth functions on compact domains can be well-approximated by polynomials [35, Chap. 6 & 7].

• Displacement structure: If a tensor satisfies a multidimensional Sylvester equation of the form:

 X×1A(1)+⋯+X×dA(d)=G,A(k)∈Cnk×nk,G∈Cn1×⋯×nd, (1.2)

where ‘’ denotes the -mode matrix product of a tensor (see LABEL:eq:kfold), then this—under additional assumptions—can ensure that the tensor is well-approximated by a low rank tensor. Multidimensional Sylvester equations such as LABEL:eq:TensorDisplacement appear when discretizing certain partial differential equations with finite differences [23] and are satisfied by several classes of structured tensors [11]. For example, we show that the solution tensor to on can be approximated up to a relative accuracy of by a tensor-train decomposition of rank , where and (see LABEL:sec:PoissonExample). This means that the solution tensor can be accurately represented in just degrees of freedom, despite the solution having weak corner singularities.

After some experience, one can successfully identify which methodology is likely to result in the best theoretical bounds on a tensor’s numerical tensor ranks. We emphasize that these three methodologies provide upper bounds on numerical tensor ranks, and do not provide a complete characterization of low rank tensors. Another methodology that partially explains the abundance of low rank tensors is artificial coordinate alignment [36], though we do not know how to use this observation to derive explicit bounds on tensor ranks.

### 1.1 Tensor notation

We follow as closely as possible the notation for tensors found in [21], which we briefly review now for the reader’s convenience.

The -mode product.

The -fold (or -mode) product of a tensor with a matrix is denoted by , and defined elementwise as

 (X×kA)i1,…,ik−1,j,ik+1,…,id=nk∑ik=1Xi1,…,idAj,ik. (1.3)

It corresponds to each mode- fiber of being multiplied by the matrix .

Double bracket.

In the tensor literature, the double bracket is used to denote a weighted sums of rank-1 tensors, i.e.,

 \llbracketG;A(1),…,A(d)\rrbracket=r1∑i1=1⋯rd∑id=1Gi1,…,idA(1)i1∘⋯∘A(d)id,A(k)∈Cnk×rk, (1.4)

where is often referred to as the core tensor and is the -way outer-product of vectors [21].

Flattening by reshaping.

One can always reorganize the entries of a tensor into a matrix and this idea is fundamental to the tensor-train decomposition [28]. Conventionally, one reorganizes the entries so that the mode-1 fibers are stacked. This is equivalent to .111In MATLAB, the reshape command reorganizes the entries of a tensor. For example, if , then returns a matrix of size formed by stacking entries according to their multi-index. We call the th unfolding of .

Flattening via matricization.

Another way to flatten a tensor is called mode- matricization (or th matricization), which arranges the mode- fibers to be the columns of a matrix [20]. We denote the mode- matricization of a tensor by . It is easy to see that the first unfolding and the mode-1 matricization of a tensor are identical, i.e., . In this paper, for a tensor , matricizations are constructed so that there exists another tensor satisfying [5]

 Yj(1)=X(j), …, Yj(d−j+1)=X(d), Yj(d−j+2)=X(1), …, Yj(d)=X(j−1). (1.5)

### 1.2 Summary of paper

In the next section, we review three tensor decompositions and define numerical tensor ranks in each format. In LABEL:sec:algebraic, we study the mathematical ranks of tensors constructed by sampling multivariate functions that have some algebraic structure. In LABEL:sec:smoothness, we consider the numerical tensor ranks of tensors constructed by sampling smooth multivariate functions. Finally, in LABEL:sec:displacement we consider the numerical tensor ranks of tensors that satisfy a multidimensional Sylvester equation.

## 2 Three tensor decompositions

In this section, we review three tensor decompositions: (a) Tensor-train decomposition, (b) Orthogonal Tucker decomposition, and (c) CP decomposition. Each one has an associated tensor rank and numerical tensor rank. Since the tensor ranks for tensor-train and Tucker decompositions are vectors, not numbers, we use lexicographical ordering to compare them. We say that a tensor rank of is less than , denoted by , if in the first entry for which the vectors differ, we find . In addition, we write if or for all .

### 2.1 Tensor-train decomposition and the numerical tensor-train rank

The tensor-train decomposition is a generalization of the singular value decomposition (SVD) that can be computed by a sequence of matrix SVDs [28]. A tensor has a tensor-train rank of at most if there exists matrix-valued functions for such that

 Xi1,…,id=G1(i1)G2(i2)⋯Gd(id),1≤ik≤nk. (2.1)

This decomposition writes each entry of as a product of matrices, where the th matrix in the “train” of length is determined by . Since the product of the matrices must always be a scalar, we have . Each can be represented by an tensor so a rank tensor-train decomposition requires at most elements to store. LABEL:fig:TT illustrates a tensor-train decomposition of rank at most .

We say that a tensor has a tensor-train rank of exactly , denoted by , if can be represented as in LABEL:eq:TensorTrain, and for all it is impossible to find such that for and .

Normally a tensor-train decomposition is constructed by separating out one dimension at a time, and compressing each dimension in turn [28]. For simplicity, the decomposition considered in this paper is performed in the order of dimension 1, dimension 2, and so on. In this way, the entries of the tensor-train rank are bounded from above by the ranks of matrices formed by flattening [28]. That is, for we have

 sk≤rank(Xk),Xk=reshape(X,k∏s=1ns,d∏s=k+1ns), (2.2)

where is the rank of the matrix . Therefore, if the ranks of all the matrices for are small, then the tensor can be exactly represented in a data-sparse format as a tensor-train decomposition.

In computational mathematics, we are interested in full rank tensors that are well-approximated (up to an accuracy of ) by low rank tensors. Therefore, we focus on the bounds of numerical tensor-train ranks.

###### Definition 2.1.

Given an , we say that the numerical tensor-train rank of is if is the smallest vector (with respect to lexicographical ordering) so that there exists a tensor with and . We denote this by .

For , we have and the numerical tensor-train rank can be dramatically less than the mathematical tensor-train rank.

### 2.2 Orthogonal Tucker decomposition and the numerical multilinear rank

The orthogonal Tucker decomposition is a factorization of a tensor into a set of matrices and a core tensor, where the matrices have orthonormal columns [16, 21, 5]. A tensor has a multilinear rank222The closely related Tucker rank of a tensor is also associated to the Tucker decomposition, except the matrices in LABEL:eq:Tucker are not constrained to have orthonormal columns. Since multilinear ranks are more commonly used in applications, we do not consider Tucker ranks in this paper. of at most if there are matrices with orthonormal columns and a core tensor such that

 X=\llbracketG;A(1),…,A(d)\rrbracket,A(k)∈Cnk×tk. (2.3)

Such a decomposition contains at most elements, and can be computed by the so-called higher-order singular value decomposition [5].

A tensor has a multilinear rank of exactly , denoted by , if can be represented as in LABEL:eq:Tucker, and for all it is impossible to write , where and .

As with tensor-train decomposition, we are interested in low rank tensors so our theoretical results derive upper bounds on the numerical multilinear ranks.

###### Definition 2.2.

Given , we say that the numerical multilinear rank of is if is the smallest vector (in lexicographical ordering) so that there exists a tensor with and . We denote this by .

For , we have , where the numerical multilinear rank can be dramatically less than the mathematical multilinear rank.

The CP decomposition expresses a tensor as a sum of rank-1 tensors. We say that a tensor is of rank at most if there are matrices and a diagonal tensor such that

 X=\llbracketD;A(1),…,A(d)\rrbracket,A(k)∈Cnk×r,D∈Cr×⋯×r, (2.4)

where the only nonzero entries of are for . If is omitted in this bracket notation, then by convention all the nonzero entries of are 1. The CP decomposition in LABEL:eq:CPdecomposition looks similar to the orthogonal Tucker decomposition, but there are two important differences: (1) The matrices in LABEL:eq:CPdecomposition do not need to have orthogonal columns and (2) The core tensor must be diagonal. This means that LABEL:eq:CPdecomposition is equivalent to expressing a tensor as a sum of rank-1 tensors.

If the tensor cannot be expressed as a sum of fewer than rank-1 tensors, then we say the tensor has rank . We denote this integer by . This tensor decomposition can be stored in at most entries, but the decomposition is NP-hard to compute for worst case examples [14].

Since we are aiming for upper bounds on the rank of a tensor, we can take any decomposition of the form in LABEL:eq:CPdecomposition with a potentially large , and see if its factor matrices are themselves low rank. For example, we find that [22, Lem. 1]:333Lemma 1 of [22] shows that the dimension of the vector space that spans the slices in the th index is equal to the rank of . The inequality in LABEL:eq:Kruskal follows from the extra assumption that the slices are themselves low rank tensors.

 rank(X)≤min1≤j≤d1rjd∏i=1ri, (2.5)

where for . The bound in LABEL:eq:Kruskal is potentially useful because it allows one to derive upper bounds on the rank of a tensor via bounds on the rank of factor matrices from any representation of the tensor as a sum of rank-1 tensors.

Again, we derive upper bounds for the numerical rank of a tensor, which can often be dramatically less than the mathematical rank of a tensor.

###### Definition 2.3.

Given , we say that the numerical rank of is if is the smallest integer so that there exists a tensor with and . We denote this integer by .

Since and , it is tempting to assume that . But, this is not the case. Instead, the weaker statement of is true. The concept of border rank captures this surprise and is one of the new features in multilinear algebra that is not present in linear algebra [4].

### 2.4 Inequalities between tensor ranks

Let be a tensor, and construct as in LABEL:eq:cyc_mat. Suppose that , , and . Then, the following inequalities hold [21, 28]:

 sj1≤tj≤max1≤i≤dti≤r≤min1≤j≤d1njd∏i=1ni. (2.6)

These inequalities remain valid when the ranks are replaced by numerical ranks too, but are generally not useful for determining whether a tensor is compressible. However, they do reveal an interdependency between the tensor formats. For example, if is small, then is low rank in any of the three tensor formats.

## 3 Tensors constructed via sampling algebraically structured functions

It is common in applications to encounter tensors that are constructed by sampling multivariate functions [18, 17]. For example, one can take a continuous function of three variables, , and sample on a tensor grid to obtain a tensor:

 Xijk=f(xi,yj,zk),1≤i,j,k≤n,

where , , and are sets of points.

### 3.1 Polynomials and algebraic structure

One common scenario where it is easy to spot mathematical tensor ranks is when the tensor is sampled from a polynomial. To be specific, if a tensor is derived by sampling a multivariate polynomial of degree at most in the variable from a tensor-product grid. Then, one finds that the tensor ranks of are bounded by a function of .

###### Lemma 3.1.

Let be a polynomial of degree at most in the variable for , and let be the tensor constructed by sampling , i.e.,

 Xi1,…,id=p(x(1)i1,…,x(d)id),1≤ij≤nj,1≤j≤d,

where are sets of nodes, respectively. Then,

• , where ,

• , and

• .

Here, the tensor-train decomposition is constructed in the order of .

###### Proof.

According to the degree assumptions on , we can write as

 p(x1,…,xd)=N1−1∑q1=0⋯Nk−1∑qk=0aq1,…,qk(xk+1,…,xd)xq11⋯xqkk,1≤k≤d,

where is a polynomial in the variables and for , has degree at most . After sampling, this means that and the bound on follows.

Another way to write is

 p(x1,…,xd)=Nk−1∑j=0bj(x1,…,xk−1,xk+1,…,xd)xjk,1≤k≤d,

where is a polynomial in of degree at most in . After sampling, this shows that and the bound on follows.

Finally, separating out , we can also write as

 p(x1,…,xd)=N1−1∑q1=0⋯Nd−1−1∑qd−1=0cq1,…,qd−1(xd)xq11⋯xqd−1d−1, (3.1)

where each term in LABEL:eq:CPpoly is a rank 1 tensor after sampling. We can do this to each variable and thus .

A special case of LABEL:lem:polyRanks is when , then the polynomial has maximal degree of at most 444We say that a polynomial has maximal degree if is a polynomial of degree at most in all the variables ., and the bounds on the tensor ranks of simplify to the following:

• ,

• , and

• .

The important observation is that tensors constructed by sampling polynomials on a sufficiently large tensor-product grid has tensor ranks that are bounded by , as opposed to .

### 3.2 Other special cases of algebraic structure

Similar to multivariate polynomials, it is easy to spot—after some experience—the tensor ranks of tensors constructed by sampling functions that have explicit algebraic structure since each variable in the function can be thought as a fiber of the tensor. The easiest ones to spot are those tensors derived from functions that are the sums of products of single-variable functions, such as

 f(x,y,z)=1+tan(x)y+y2z3,g(x,y,z,w)=cos(x)sin(y)+e10ze100w.

If and are tensors constructed by sampling and on a tensor-product grid, respectively, then the tensor ranks are bounded by

 rankTT(F)≤lex(1,2,2,1), rankML(F) ≤lex(2,3,2), rank(F)≤3, rankTT(G)≤lex(1,2,2,2,1), rankML(G) ≤lex(2,2,2,2), rank(G)≤2,

where the tensor-train decompositions are performed in the order . Other examples are functions that can be expressed with exponentials and powers.

Some special functions require reorganizations to reveal their algebraic structures. If the function is expressed with trigonometric functions, then the sampled tensor can often be low rank due to trigonometric identities. For example, since can be written as

 f(x,y,z)=(cos(x)cos(y)−sin(x)sin(y))cos(z)−(sin(x)cos(y)+cos(x)sin(y))sin(z),

any tensor constructed by sampling on a tensor-product grid satisfies

 rankTT(F)≤lex(1,2,2,1),rankML(F)≤lex(2,2,2),rank(F)≤4.

These examples can often be combined to build more complicated functions that result in low rank sampled tensors. This is an process and requires human ingenuity to express the sampled function in a revealing form. Again, tensors constructed by sampling such algebraically structured functions on a sufficiently large tensor-product grid has tensor ranks that are bounded by the structures, instead of the size of the sampling grid.

## 4 Tensors derived by sampling smooth functions

Although most functions do not have the algebraic structure specified in LABEL:sec:algebraic, tensors that are constructed by sampling smooth functions are often well approximated by low rank tensors. We make this observation precise by using the results in LABEL:sec:algebraic when is a multivariate polynomial.

An analogous observation is routinely made for matrices derived from sampling smooth bivariate functions with accompanying theoretical justifications already derived in the literature [30, 33].

### 4.1 The numerical tensor ranks of tensors derived by sampling smooth functions

In light of LABEL:lem:polyRanks, our idea to understand the numerical tensor ranks of a tensor derived from sampling a function is to first approximate that function by a multivariate polynomial, which is already a routine procedure for computing with low rank approximations to multivariate functions [9, 13].

Without loss of generality, suppose that is formed by sampling a smooth function on a tensor-product grid in , i.e.,

 Xi1,…,id=f(x(1)i1,…,x(d)id),1≤i1,…,id≤d, (4.1)

where are sets of nodes in . Our idea is to find a multivariate polynomial of degree in the variable that approximates in and then set . By LABEL:lem:polyRanks, has tensor ranks that only depend on and is an approximation to . In particular, we have

 ∥X−Y∥F≤(d∏i=1ni)12∥X−Y∥max≤(d∏i=1ni)12∥f−pN∥∞, (4.2)

where denotes the supremum norm on and is the absolute maximum entry norm. Therefore, if is a good approximation to , then is a good approximation to too.

One can now propose any linear or nonlinear approximation scheme to find a polynomial approximation of on . Clearly, excellent bounds on are obtained by finding a so that

 ∥f−p∥∞≈infq∈PN1,…,Nd∥f−q∥∞,

where is the space of -variate polynomials of maximal degree in for . This best multivariable polynomial problem is often, but not always, tricky to solve directly. Instead, we usually set to be the multivariate Chebyshev projection of . That is,

 pchebN1,…,Nd(x1,…,xd) =N1−1∑i1=0′⋯Nd−1∑id=0′ci1,…,idTi1(x1)⋯Tid(xd), ci1,…,id =(2π)d∫1−1⋯∫1−1f(x1,…,xd)Ti1(x1)⋯Tid(xd)√1−x21⋯√1−x2ddx1⋯dxd,

where the primes indicate that the first term in the sum is halved and is the Chebyshev polynomial of degree . Importantly, is a near-best polynomial approximation to  [35], and the error can be bounded. Thus, this choice of leads to bounds on the numerical tensor ranks of .

### 4.2 Worked examples

Here, we give two examples that illustrate how to employ our smoothness result and understand the numerical ranks of tensors constructed by sampling smooth functions. We consider two functions: (1) A Fourier-like function, where we use best polynomial approximation, and (2) A sum of Gaussian bumps, where we use Chebyshev approximation.

#### 4.2.1 Fourier-like function

Consider a tensor constructed by sampling the following Fourier-like function on a tensor-product grid [33]:

 f(x,y,z)=eiMxyz,x,y,z∈[−1,1],

where is a real parameter. While is a full rank tensor, numerically it has tensor ranks that depend linearly on . To see this, let and be the best minimax polynomial approximations of degree to and on , respectively, and define . Note that has maximal degree at most so that

 infwk−1∈Pk−1supx,y,z∈[−1,1]∣∣eiMπxyz−wk−1(x,y,z)∣∣ ≤supx,y,z∈[−1,1]∣∣eiMπxyz−sk−1(xyz)∣∣ =supt∈[−1,1]∣∣eiMπt−sk−1(t)∣∣,

where is the space of trivariate polynomials of maximal degree and the equality follows since if . Furthermore, we have and so

 supt∈[−1,1]∣∣eiMπt−sk−1(t)∣∣≤supt∈[−1,1]∣∣cos(Mπt)−pbestk−1(t)∣∣+supt∈[−1,1]∣∣sin(Mπt)−qbestk−1(t)∣∣.

By the equioscillation theorem [29, Thm. 7.4], for since equioscillates times in . Similarly, equioscillates times in and hence, for . However, for , decays super-geometrically to zero as . Hence, the numerical maximal degree, , of satisfies for some constant as , and by LABEL:lem:polyRanks, the numerical tensor ranks of can be estimated. For example, if is the first numerical tensor-train rank, then as .

Figure LABEL:fig:smooth_ex (left) plots the ratio of the first numerical tensor-train rank, , of a tensor sampled from the Fourier-like function and . We observe that as .

#### 4.2.2 A sum of Gaussian bumps

Consider a tensor constructed by sampling a sum of Gaussian bumps, centered at arbitrary locations in , i.e.,

 f(x,y,z)=M∑j=1e−γ((x−xj)2+(y−yj)2+(z−zj)2),γ>0. (4.3)

Each Gaussian bump is a separable function of three variables so, mathematically, the tensor ranks of depend linearly on . However, since the sum is a smooth function, the numerical tensor ranks are related to the polynomial degree required to approximate in to an accuracy of . Hence, the numerical tensor ranks of depend on and has very mild growth in .

Due to the symmetry in , , and as well as separability of each term in LABEL:eq:Bumps, we find that the Chebyshev approximation to can be bounded by

 supx,y,z∈[−1,1]∣∣ ∣∣f(x,y,z)−M∑j=1pjℓ(x)qjℓ(y)rjℓ(z)∣∣ ∣∣≤3Msupx∈[−1,1]∣∣e−γx2−sℓ(x)∣∣,

where , , , and are Chebyshev approximations of degree to , , , and , respectively. An explicit Chebyshev expansion for is known and given by [25, p. 32]

 e−γx2=∞∑j=0′(−1)je−γ/2Ij(γ/2)T2j(x),

where the prime on the summation indicates that the first term is halved, and is the modified Bessel function of the first kind with parameter  [27, (10.25.2)]. This means that one can show that [3]

 supx∈[−1,1]∣∣e−γx2−sℓ(x)∣∣≤2e−γ/4I⌊ℓ/2⌋+1(γ/4), sℓ(x)=ℓ∑j=0′(−1)je−γ/2Ij(γ/2)T2j(x).

By LABEL:lem:polyRanks and LABEL:eq:functionBound, the first numerical tensor-train rank of is bounded by the smallest integer such that .

Figure LABEL:fig:smooth_ex (right) shows the first numerical tensor-train rank and its bound of the tensor constructed by sampling the sum of Gaussian bumps with different values of . We observe that the bounds are relatively tight when is small.

## 5 Tensors with displacement structure

We say that has an -displacement structure of if satisfies the multidimensional Sylvester equation

 X×1A(1)+⋯+X×dA(d)=G,A(k)∈Cnk×nk, (5.1)

where ‘’ is the -mode matrix product of a tensor. In this section, we show that when