Geometry and matrix multiplication decompositions II

# The geometry of rank decompositions of matrix multiplication II: 3×3 matrices

Grey Ballard PO Box 7311, Department of Computer Science, Wake Forest University, Winston-Salem, NC 27109 Christian Ikenmeyer Max Planck Institute for Informatics, Saarland Informatics Campus, Building E1.4, D-66123 Saarbrücken, Germany J.M. Landsberg Department of Mathematics
Texas A&M University
Mailstop 3368
College Station, TX 77843-3368, USA
and  Nick Ryder Department of Mathematics
UC Berkeley
###### Abstract.

This is the second in a series of papers on rank decompositions of the matrix multiplication tensor. We present new rank decompositions for the matrix multiplication tensor . All our decompositions have symmetry groups that include the standard cyclic permutation of factors but otherwise exhibit a range of behavior. One of them has 11 cubes as summands and admits an unexpected symmetry group of order 12.

We establish basic information regarding symmetry groups of decompositions and outline two approaches for finding new rank decompositions of for larger .

###### Key words and phrases:
matrix multiplication complexity, alternating least squares, MSC 68Q17, 14L30, 15A69
Landsberg supported by NSF grant DMS-1405348. Ballard supported in part by an appointment to the Sandia National Laboratories Truman Fellowship in National Security Science and Engineering, sponsored by Sandia Corporation (a wholly owned subsidiary of Lockheed Martin Corporation) as Operator of Sandia National Laboratories under its U.S. Department of Energy Contract No. DE-AC04-94AL85000.

## 1. Introduction

This is the second in a planned series of papers on the geometry of rank decompositions of the matrix multiplication tensor . Our goal is to obtain new rank decompositions of by exploiting symmetry. For a tensor , the rank of is the smallest such that , with . The rank of is a standard complexity measure of matrix multiplication, in particular, it governs the total number of arithmetic operations needed to multiply two matrices.

In this paper we present rank 23 decompositions of that have large symmetry groups, in particular all admit the standard cyclic -symmetry of permuting the three tensor factors. Although many rank 23 decompositions of are known [15, 13], none of them was known to admit the standard cyclic -symmetry.

We describe techniques to determine symmetry groups of decompositions and to determine if two decompositions are in the same family, as defined in §4. We also develop a framework for using representation theory to write down new rank decompositions for for all . Similar frameworks are also being developed implicitly and explicitly in [5] and [12].

As discussed below, decompositions come in families. DeGroote [8] has shown, in contrast to , the family generated by Strassen’s decomposition is the unique family of rank seven decompositions of . Unlike , it is still not known if the tensor rank of is indeed . The best lower bound on the rank is [17]. There have been substantial unsuccessful efforts to find smaller decompositions by numerical methods. While some researchers have taken this as evidence that might be optimal, it just might be the case that rank (or smaller) decompositions might be much rarer than border rank decompositions, and so using numerical search methods, and given initial search points, when they converge, with probability nearly one would converge to border rank decompositions.

In this paper we focus on decompositions with standard cyclic symmetry: viewed as a trilinear map, matrix multiplication is , where are matrices. Since , the matrix multiplication tensor has a symmetry by cyclically permuting the three factors. If one applies this cyclic permutation to a tensor decomposition of , it is transformed to another decomposition of . When a decomposition is transformed to itself, we say the decomposition has cyclic symmetry or -invariance.

In §2 we present three of our examples. We explain the search methods used in §3. We then discuss techniques for determining symmetry groups in §4 and determine the symmetry groups of our decompositions in §5. Our searches sometimes found equivalent decompositions but in different coordinates, and the same techniques enabled us to identify when two decompositions are equivalent. Further techniques for studying decompositions are presented in §6 and §8, respectively in terms of configurations of points in and eigenvalues. In §7, we precisely describe the subspace of -invariant tensors in , as well as the subspaces invariant under other finite groups. In an appendix §A, we present additional decompositions that we found.

### Why search for decompositions with symmetry?

There are many examples where the optimal decompositions (or expressions) of tensors with symmetry have some symmetry. This is true of [6, 4], the monomial [19], also see [16, §7.1], the optimal determinantal expression of [18], and other cases. In any case, imposing the symmetry i) reduces the size of the search space, and ii) provides a guide for constructing decompositions through the use of “building blocks”.

### Notation and conventions

are vector spaces, is the dual vector space to , denotes the group of invertible linear maps , the maps with determinant one, and the group of projective transformations of projective space . The action of on descends to an action of . If , denotes the corresponding point in projective space. denotes the permutation group on elements and denotes the cyclic group of order . For , denotes its pre-image under the projection map union the origin. denotes the quotient . We write , , and is the subgroup of diagonal matrices. For a matrix , denotes the entry in the -th row and -th column.

### Acknowledgements

Work on this paper began during the fall 2014 semester program Algorithms and Complexity in Algebraic Geometry at the Simons Institute for the Theory of Computing. We thank the Institute for bringing us together and making this paper possible.

## 2. Examples

### 2.1. A rank 23 decomposition of M⟨3⟩ with Z4×Z3 symmetry

Let be generated by

 (1) a0=⎛⎜⎝00−110−101−1⎞⎟⎠,

where the action on where are -matrices is and let denote the standard cyclic symmetry.

Here is the decomposition, we call it :

 (2) M⟨3⟩= −⎛⎜⎝00−110−101−1⎞⎟⎠⊗3 (3) (4) +Za04⋅⎛⎜⎝100000000⎞⎟⎠⊗3 (5) +Za04⋅⎛⎜⎝0−101−10000⎞⎟⎠⊗3 (6)

The decomposition is a sum of terms, each of which is a trilinear form on matrices. For example, the term sends a triple of matrices to the number . This expresses in terms of five -orbits, of sizes .

### 2.2. An element of the SZ4×Z3 family with a0 diagonalized

It is also illuminating to diagonalize , in other words decompose the decomposition with respect to the action: In the following plot each row of matrices forms an orbit under the -action. In the first four rows are the eleven matrices that appear as cubes in the decomposition. In remaining 4 rows each of the four columns forms three rank one tensors by tensoring the three matrices in the column in three different orders: 1-2-3, 2-3-1, and 3-1-2.

Each complex number in each matrix is depicted by plotting its position in the complex plane. To help identify the precise position, a square is drawn with vertices , , , . To quickly identify the absolute value of a complex number they are color coded:
Symbol for the complex number absolute value 1

Numbers with a blue background are the sum of two numbers with a yellow background. Numbers with a purple background are twice the numbers with a yellow background. Numbers with a red background are twice the numbers with a blue background. Numbers with a green background are the sum of a number with a yellow background and a number with a blue background.

Matrix cells with zeros are left empty.

### 2.3. A cyclic-invariant decomposition in the Laderman family

Let denote the rank Laderman decomposition of . By [4] the symmetry group of this decomposition (see §4) is , where . We found a decomposition, which we call that we identified as a -invariant member of the Laderman family in the sense of §4.

Let

 τ23=⎛⎜⎝100001010⎞⎟⎠, ϵ2=⎛⎜⎝1000−10001⎞⎟⎠, τ13=⎛⎜⎝001010100⎞⎟⎠,

Let , and let . Let be the generator of . Let denote the group generated by , and .

 (7) M⟨3⟩ =⎛⎜⎝000010000⎞⎟⎠⊗3 (8) +Γ/(Zπ3⋊Zζ2)⋅⎛⎜⎝000000001⎞⎟⎠⊗3 (9) +Γ/(Zπ3⋊Zζ2)⋅⎛⎜⎝−110−100000⎞⎟⎠⊗3 (10) +Γ/(Zϕ2⋊Zζ2)⋅⎛⎜⎝0−10000000⎞⎟⎠⊗⎛⎜⎝1−101−1−1011⎞⎟⎠⊗⎛⎜⎝000100000⎞⎟⎠ (11) +Γ/Zπ3⋅⎛⎜⎝100100000⎞⎟⎠⊗3

These five orbits are respectively of sizes . There are five invariant terms, one from each of (7),(8),(9) and two from (11) because preserves -invariance.

We originally found this decomposition by numerical methods. The incidence graphs discussed in §4.3 gave us evidence that it should be in the Laderman family, and then it was straightforward to find the transformation that exchanged and , namely . See §4.3 for more discussion. As shown in in [4], is isomorphic to and these are all the symmetries of the Laderman family. Translated to as presented in [4], these five orbits are respectively , , , and .

###### Remark 2.1.

As pointed out in [20], there are similarities between this decomposition and Strassen’s.

###### Remark 2.2.

The decompositions of Johnson-McLouglin [13] cannot have -invariant decompositions for rank reasons: The first space of decompositions has five terms (those numbered 3,12,16,22,23 in [13]) where there are two matrices of rank one and one of rank greater than one, while to have any external -symmetry, the number of such would have to be a multiple of three. The second space has a unique matrix of rank three appearing (23b), so is ruled out for the same reason.

### 2.4. Decomposition S2fix−Z3

Here is a decomposition with two -fixed points:

 (12) M⟨3⟩ =⎛⎜⎝100001001⎞⎟⎠⊗3 (13) +⎛⎜⎝00001−1000⎞⎟⎠⊗3 (14) +Zstd3⋅⎛⎜⎝010001000⎞⎟⎠⊗⎛⎜⎝00001−101−1⎞⎟⎠⊗⎛⎜⎝00010−1000⎞⎟⎠ (15) +Zstd3⋅⎛⎜⎝0−11000000⎞⎟⎠⊗⎛⎜⎝000000001⎞⎟⎠⊗⎛⎜⎝000000100⎞⎟⎠ (16) +Zstd3⋅⎛⎜⎝100100000⎞⎟⎠⊗⎛⎜⎝0−11000000⎞⎟⎠⊗⎛⎜⎝000000010⎞⎟⎠ (17) +Zstd3⋅⎛⎜⎝100001000⎞⎟⎠⊗⎛⎜⎝010001001⎞⎟⎠⊗⎛⎜⎝00010−101−1⎞⎟⎠ (18) +Zstd3⋅⎛⎜⎝100000000⎞⎟⎠⊗⎛⎜⎝001001001⎞⎟⎠⊗⎛⎜⎝0000001−10⎞⎟⎠ (19) +Zstd3⋅⎛⎜⎝000001000⎞⎟⎠⊗⎛⎜⎝010010010⎞⎟⎠⊗⎛⎜⎝000−110000⎞⎟⎠ (20) +Zstd3⋅⎛⎜⎝000001001⎞⎟⎠⊗⎛⎜⎝100100100⎞⎟⎠⊗⎛⎜⎝−110000000⎞⎟⎠

An interesting feature of this decomposition is that it “nearly” has a transpose-like symmetry, as discussed in later sections.

## 3. Discussion on the numerical methods used

Our techniques for discovering cyclic-invariant decompositions use numerical optimization methods that are designed to compute approximations rather than exact decompositions. We also use heuristics in order to encourage sparsity in the solutions, and a fortunate by-product of the sparsity is that the nonzero values often tend towards a discrete set of values from which an exact decomposition can be recognized. Our methods are based on techniques that have proved successful in discovering generic exact decompositions (those that have no noticeable symmetries) [1, 22]; we summarize this approach in Section 3.1. Our search process can be divided into two phases. First, as we discuss in Section 3.2, we find a dense, cyclic-invariant, approximate solution using nonlinear optimization methods. Then, as we describe in Section 3.3, we transform the dense approximate solution to an exact cyclic-invariant decomposition using heuristics that encourage sparsity.

### 3.1. Alternating Least Squares

Computing low-rank approximations of tensors is a common practice in data analysis when the data represents multi-way relationships. The most popular algorithm for computing approximations with CANDECOMP/PARAFAC (CP) structure, i.e., rank decompositions, is known as alternating least squares (ALS) [14]. In particular, in the case of the matrix multiplication tensor, the objective function of the optimization problem is given by

 (21) argminX,Y,Z∥∥M⟨n⟩−\tsum\slimits@Rr=1xr⊗yr⊗zr∥∥,

where , , and are factor matrices with th columns given by , , and and this and all norms correspond to the square root of the sum of squares of the entries of the tensor (or matrix). The objective function itself is nonlinear and non-convex and cannot be solved in closed form. However, if two of the three factor matrices are held fixed, then the resulting objective function is a linear least squares problem and can be solved using linear algebra. Thus, ALS works by alternating over the factor matrices, holding two fixed and updating the third, and iterating until convergence.

In general, ALS iterations are performed in floating point arithmetic. While an objective function value of 0 corresponds to a rank decomposition, with ALS we can hope for an objective function value only as small as the finite precision allows. In the case of the matrix multiplication tensor, there are multiple pitfalls that make it difficult to find approximations that approach objective function values of 0. The most successful technique was proposed by Smirnov and uses regularization, with objective function given by

 (22)

for judicious choices of scalar and matrices , , and [22]. We discuss effective choices for the regularization parameters in Section 3.3. This method works for the cases of non-square matrix multiplication and has been used to discover exact rank decompositions for many small cases [1, 22], but it does not encourage solutions to reflect any symmetries.

### 3.2. Nonlinear Optimization for Dense Approximations

To enforce cyclic invariance on rank decompositions, we can impose structure on the factor matrices , , and . In particular, if a cyclic-invariant approximation includes the component , then it must also include and . This implies either that all three components appear or that , which means the factor matrices have the following structure:

 X =(ABCD) (23) Y =(ADBC) Z =(ACDB),

where is an matrix and are matrices with . With this structure, (21) becomes

 (24) argminA,B,C,D∥∥M⟨n⟩−\tsum\slimits@Pp=1ap⊗ap⊗ap−\tsum\slimits@Qq=1(bq⊗cq⊗dq+dq⊗bq⊗cq+cq⊗dq⊗bq)∥∥.

Note that the total number of variables (now spread across 4 matrices) is reduced by a factor of 3.

Again, this objective function is nonlinear and non-convex. It is possible to use an ALS approach to drive the objective function value to 0; however, while (24) is linear in , , and , it is not linear in . Thus, the optimal update for , for fixed , , and , cannot be computed in closed form. While there are numerical optimization techniques for , we were not successful in using an ALS approach to drive the objective function value close to zero.

Instead, we used generic nonlinear optimization software to search for cyclic-invariant approximations. In particular, we used the LOQO software package [23], which relies on the AMPL [9] modeling language to specify the optimization problem. In order to find solutions, we try all possible values of and , use multiple random starting points, and constrain the variable entries to be no greater than 1 in absolute value. For fixed , there are possible values for and . Multiple starting points are required because the objective function is non-convex: numerical optimization techniques are sensitive to starting points in this case, with approximations often getting stuck at local minima. Constraining the variables to be no greater than 1 in absolute value is a technique to avoid converging numerically to border rank decompositions, in which case some variable values must grow in magnitude to continue to improve the objective function value. Driving the objective function value to zero with bounded variable values ensures that the approximation corresponds to a rank decomposition.

When we are successful, we obtain matrices that correspond to an objective function value very close to zero. Thus, the approximation has the cyclic invariance we desire. However, these matrices are dense and have floating point values throughout the range. Rounding these floating point values, even to a large set of discrete rational values, typically does not yield an exact rank decomposition. The next section describes the techniques we use to convert dense approximate solutions to exact solutions.

### 3.3. Heuristics to Encourage Sparsity for Exact Decompositions

In order to obtain exact decompositions, we use the ALS regularization heuristics that proved effective for the non-invariant case, given in equation (22). However, those techniques ignore the cyclic-invariant structure in the dense approximations we obtain from the techniques described in §3.2. The heuristic we used to discover cyclic-invariant rank decomposition consists of alternating between ALS iterations with regularization and projection of the approximation back to the set of cyclic-invariant solutions.

We now describe the specifics of the regularization terms from (22). The scalar parameter determines the relative importance between an accurate approximation of the matrix multiplication tensor and adherence to the regularization terms. In the context of ALS, only one of the regularization terms affects the optimization problem when updating one factor matrix. The target matrices , , are designed to encourage the corresponding factor matrices to match a desired structure, and they can be defined differently for each iteration of ALS. Here is the technique proposed by Smirnov [22]. Consider the update of factor matrix . By default, is set to have the values of from the previous iteration. If any of the values are larger than 1 (or any specified maximum value), then the corresponding value of is set to magnitude 1 with the corresponding sign. Then, for a given number of desired zeros, the smallest values of are set to exactly 0. Thus, the regularization term will encourage any large values of to tend towards and the smallest values to tend toward 0. The parameter can be varied over iterations and also across factor matrices (though in the case of cyclic-invariant approximations the set of values within each factor matrix is the same as those of the other factor matrices).

Because ALS does not enforce cyclic invariance, the approximation will tend to deviate from the structure given in equation (23). We project back to a cyclic-invariant approximation by setting all three values that should be the same in each of the factor matrices to their average value.

Fortunately, perhaps miraculously, by encouraging sparsity and bounding the variable values, when many entries are driven to zero, the nonzero values tend towards a discrete set of values (usually ). However, the process of converting a dense approximation to an exact decomposition involves manual tinkering and much experimentation. The basic approach we have used successfully is to maintain a tight approximation to the matrix multiplication tensor, start with , gradually increase , frequently project back to cyclic invariance, and play with the parameter between values of and . ALS iterations are relatively cheap, so often 100 or 1000 iterations can be taken with a given parameter setting before changing the configuration. While this process is artful and lacks any guarantees of success, we have nearly always been able to convert cyclic-invariant dense approximations of to exact decompositions.

## 4. Symmetry groups of tensors and decompositions

In this section we explain how we found the additional symmetries beyond the that was built into the search, and describe the full symmetry groups of the decompositions. We establish additional properties regarding symmetries for use in future work. We begin with a general discussion of symmetry groups of tensors and their decompositions.

### 4.1. Symmetry groups of tensors

Let be a complex vector space and let . Define the symmetry group of , to be the subgroup preserving , where acts by permuting the factors.

For a rank decomposition , where each has rank one, i.e., , define the set , which we also call the decomposition, and the symmetry group of the decomposition . We also consider as a point of the variety of -tuples of unordered points on the Segre variety of rank one tensors, denoted .

If , then is also a rank decomposition of , and (see [6]), so decompositions come in families parametrized by , and each member of the family has the same abstract symmetry group. We reserve the term family for -orbits. The quasi-projective subvariety of all rank decompositions is a union of orbits.

If one is not concerned with the rank of a decomposition, then for any finite subgroup , admits rank decompositions with , by taking any rank decomposition of and then averaging it with its -translates. We will be concerned with rank decompositions that are minimal or close to minimal, so only very special groups can occur.

### 4.2. Matrix multiplication

The symmetry groups of matrix multiplication decompositions are useful for determining if two decompositions lie in the same family, and the groups that appear in known decompositions will be a guide for constructing decompositions in future work.

The symmetry groups of many of the decompositions that have already appeared in the literature are determined in [5]. Burichenko does not use the associated graphs discussed below. One can recover the results of [5] with shorter proofs by using them.

Let where . Recall that is the re-ordering of and

 (25) GM⟨n⟩=[(PGL(U)×PGL(V)×PGL(W))⋊Z3]⋊Z2⊂GL(A)×GL(B)×GL(C)⋊S3,

see, e.g., [7, Thms. 3.3,3.4], [6, Prop. 4.1], [11, Thm. 2.9] or [5, Prop. 4.7]. The may be generated by e.g., , and the by cyclically permuting the factors . The is isomorphic to , but it is more naturally thought of as , since it is not the appearing in (25).

Thus if is a rank decomposition of , then .

###### Remark 4.1.

As pointed out by Burichenko [4], for matrix multiplication rank decompositions one can define an extended symmetry group by viewing . We do not study such groups in this paper.

We call a a transpose like symmetry if it corresponds to the symmetry of given by , or a cyclic variant of it such as , composed with an element of such that the total map is an involution on the elements of . Transpose like symmetries where the elements of are all the identity (which we will call convenient transpose symmetries) do not appear to be compatible with standard cyclic symmetries in minimal decompositions, at least this is the case for rank seven decompositions of and the known rank decompositions of .

###### Example 4.2.

Let , be a basis of , a basis of , and a basis of . Consider the standard decomposition of