# Limits on All Known (and Some Unknown) Approaches to Matrix Multiplication

Josh Alman (MIT CSAIL and EECS, jalman@mit.edu; supported by two NSF Career Awards)    Virginia Vassilevska Williams (MIT CSAIL and EECS, virgi@mit.edu; partially supported by an NSF Career Award, a Sloan Fellowship, NSF Grants CCF-1417238, CCF-1528078 and CCF-1514339, and BSF Grant BSF:2012338)
###### Abstract

We study the known techniques for designing Matrix Multiplication algorithms. The two main approaches are the Laser method of Strassen, and the Group theoretic approach of Cohn and Umans. We define a generalization based on zeroing outs which subsumes these two approaches, which we call the Solar method, and an even more general method based on monomial degenerations, which we call the Galactic method.

We then design a suite of techniques for proving lower bounds on the value of ω, the exponent of matrix multiplication, which can be achieved by algorithms using many tensors T and the Galactic method. Some of our techniques exploit ‘local’ properties of T, like finding a sub-tensor of T which is so ‘weak’ that T itself couldn’t be used to achieve a good bound on ω, while others exploit ‘global’ properties, like T being a monomial degeneration of the structural tensor of a group algebra.

Our main result is that there is a universal constant ℓ > 2 such that a large class of tensors generalizing the Coppersmith-Winograd tensor CW_q cannot be used within the Galactic method to show a bound on ω better than ℓ, for any q. We give evidence that previous lower-bounding techniques were not strong enough to show this. We also prove a number of complementary results along the way, including that for any group G, the structural tensor of the group algebra F[G] can be used to recover the best bound on ω which the Coppersmith-Winograd approach gets using CW_{|G|−2}, as long as the asymptotic rank of the structural tensor is not too large.

## 1 Introduction

A fundamental problem in theoretical computer science is to determine the time complexity of Matrix Multiplication (MM), one of the most basic linear algebraic operations. The question typically translates to determining the exponent of matrix multiplication: the smallest real number ω such that the product of two n × n matrices over a field F can be computed using n^{ω+o(1)} operations over F. Trivially, ω ≥ 2. Many have conjectured over the years that ω = 2. This conjecture is extremely attractive: a near-linear time (in the input size) algorithm for MM would immediately imply near-optimal algorithms for many problems.

Almost 50 years have passed since Strassen [Str69] first showed that ω < 2.81. Since then, an impressive toolbox of techniques has been developed for obtaining faster MM algorithms, culminating in the current best bound ω < 2.373 [LG14, Wil12]. Unfortunately, this bound is far from 2, and the current methods seem to have reached a standstill. Recent research has turned to proving limitations on the two main MM techniques: the Laser method of Strassen [Str86] and the Group theoretic method of Cohn and Umans [CU03].
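As a concrete arithmetic aside (ours, not from the paper): a rank-r algorithm for multiplying k × k matrices, applied recursively, multiplies n × n matrices in O(n^{log_k r}) operations, so it proves ω ≤ log_k r. A minimal Python sketch:

```python
from math import log

# A rank-r bilinear algorithm for k x k matrix multiplication, applied
# recursively to n x n matrices, runs in O(n^{log_k r}) time, proving
# the exponent bound omega <= log_k(r).

def omega_bound(k: int, r: int) -> float:
    """Exponent bound implied by a rank-r algorithm for <k,k,k>."""
    return log(r) / log(k)

print(omega_bound(2, 7))  # Strassen's algorithm: ~2.807
print(omega_bound(2, 8))  # the trivial algorithm: ~3
```

For example, Strassen's 7-multiplication algorithm for 2 × 2 matrices gives ω ≤ log₂ 7 ≈ 2.807, while the trivial 8-multiplication algorithm only gives ω ≤ 3.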

Both Coppersmith and Winograd [CW90] and Cohn et al. [CKSU05] proposed conjectures which, if true, would imply that ω = 2. The first conjecture works in conjunction with the Laser method, and the second with the Group-theoretic method. The first “technique limitation” result was by Alon, Shpilka and Umans [ASU13], who showed that both conjectures would contradict the widely believed Sunflower conjecture of Erdős and Rado.

Ambainis, Filmus and Le Gall [AFLG15] formalized the specific implementation of the Laser method proposed by Coppersmith and Winograd [CW90] which is used in the recent papers on MM. They gave limitations of this implementation, and in particular showed that the exact approach used in [CW90, DS13, LG14, Wil12] cannot achieve a bound on ω better than 2.3725. The analyzed approach, the “Laser Method with Merging”, is a bit more general than the approaches in [CW90, DS13, LG14, Wil12]: in a sense it corresponds to a dream implementation of the exact approach.

Blasiak et al. [BCC17a] considered the group theoretic framework for developing MM algorithms proposed by Cohn and Umans [CU03], and showed that this approach cannot prove ω = 2 using any fixed abelian group. In follow-up work, Sawin [Saw17] extended this to any fixed non-abelian group, and Blasiak et al. [BCC17b] extended it to a host of families of non-abelian groups.

Alman and Vassilevska W. [AW18] considered a generalization of the Laser method and proved limitations on this generalization when it is applied to any tensor which is a monomial degeneration of the structure tensor of the group algebra of the cyclic group C_n of order n. (See Section 3 for the definitions.) The bounds on ω achieved by known implementations of the Laser method [Str86, CW90, DS13, LG14, Wil12] can all be obtained from tensors of this form. The formalization also subsumes the group theoretic approach applied to C_n. The main result of [AW18] is that this generalized approach cannot achieve ω < 2.3078 for any fixed n.

All limitations proven so far suffer from several weaknesses:

• All three of [BCC17a], [BCC17b] and [AW18] show that some approach that can yield the current best bounds on ω cannot give ω = 2. None of the three works actually proves that one cannot use the particular tensor CW_q used in recent work [CW90, DS13, Wil12, LG14] to show ω = 2. [AW18] proved this limitation for a rotated version of CW_q, but only for small q. Although [BCC17a] and [BCC17b] do not say which version their proofs apply to, in this paper we give evidence that CW_q does not embed easily in a group tensor, and so it is likely that their proofs could also only apply to a rotated version of CW_q, and not to CW_q itself. Moreover, even for the Coppersmith-Winograd-like tensors for which the known limitations do apply, it is only shown that for a fixed q one cannot derive ω = 2. In particular, so far the lower bounds on what one can achieve for a fixed q approached 2 as q grew. This left open the possibility of proving ω = 2 by analyzing CW_q in the limit as q → ∞.

• All limitations proven so far are for very specific attacks on proving ω = 2. While the proofs of [AFLG15] apply directly to CW_q, they only apply to the restricted Laser Method with Merging, and no longer apply to slight changes of this method. The proofs in [BCC17a] and [BCC17b] are tailored to the group theoretic approach and do not apply (for instance) to the Laser method on “non-group” tensors. While the limits in [AW18] do apply to a more general method than both the group theoretic approach and the Laser method, they only work for specific types of tensors, which in particular do not include CW_q.

##### Our results.

All known approaches to matrix multiplication follow the same outline. First, obtaining a bound on ω corresponds to determining the asymptotic rank of the matrix multiplication tensor (see the Preliminaries for a formal definition). Because getting a handle on this asymptotic rank seems difficult, one typically works with a different tensor T (or a tensor family) whose asymptotic rank is known. Then, to analyze the asymptotic rank of matrix multiplication, one considers large tensor powers T^⊗n of T and attempts to “embed” the matrix multiplication tensor ⟨m, m, m⟩ into T^⊗n for large m without increasing the asymptotic rank. In effect, one is showing that the recursive algorithm for computing T^⊗n can be used to multiply m × m matrices, which gives a bound on ω of the form m^ω ≤ ~R(T)^n. The larger m is in terms of n, the smaller the bound on ω.
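The arithmetic of this outline can be made concrete in a small sketch (ours, not from the paper): if T has asymptotic rank R and ⟨m, m, m⟩ embeds in T^⊗n, the implied exponent bound is n·log R / log m. The numbers plugged in below are purely hypothetical.

```python
from math import log

# If a tensor T has asymptotic rank R and <m,m,m> embeds into T^{(x)n}
# without increasing asymptotic rank, then m^omega <= R^n, i.e.
# omega <= n * log(R) / log(m). The larger m is relative to n, the
# better the resulting bound.

def omega_from_embedding(R: float, n: int, m: int) -> float:
    return n * log(R) / log(m)

# Hypothetical numbers for illustration only: a tensor of asymptotic
# rank 8 whose 10th power contains <1024, 1024, 1024>.
print(omega_from_embedding(R=8, n=10, m=1024))  # ~3, i.e. no gain here
```

In this hypothetical example the embedding is too weak to beat the trivial bound; a useful embedding must make m grow faster than R^{n/3}.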

When embedding matrix multiplication into a tensor power T^⊗n, we would like the embedding to have the property that if A embeds in B, then the asymptotic rank of A is upper bounded by the asymptotic rank of B. This way, our embedding gives an upper bound on the asymptotic rank of matrix multiplication, and hence on ω. The most general type of embedding that preserves asymptotic rank in this way is a so-called degeneration of the tensor T^⊗n. A more restricted type of rank-preserving embedding is a so-called monomial degeneration. The embeddings used in all known approaches for upper bounding ω so far are even more restricted zeroing outs. The Laser method is a restricted type of zeroing out that has only been applied so far to tensors that look like matrix multiplication tensors or to ones related to the Coppersmith-Winograd tensor. The group theoretic approach gives clean definitions that imply the existence of a zeroing out of a group tensor into a matrix multiplication tensor. (See the Preliminaries for formal definitions.)

We define three very general methods of analyzing tensors. There are no known techniques to analyze tensors in this generality.

• The Solar Method applied to a tensor T of asymptotic rank R considers T^⊗n for large n, then considers all possible ways to zero out T^⊗n into a disjoint sum of matrix multiplication tensors, giving a bound on ω from the asymptotic sum inequality of Schönhage, and then takes the infimum of all bounds on ω which can be achieved in this way. This method already subsumes both the group theoretic method and the laser method. It is also much more general, as it is unclear whether the two known techniques produce the best possible zeroing outs even for specific tensors.

• The Galactic Method replaces the zeroing out in the Solar Method with more powerful monomial degenerations. Since monomial degenerations are strictly more powerful than zeroing outs in general, this leads to even more possible embeddings of disjoint sums of matrix multiplication tensors.

• The Universal Method again replaces the monomial degenerations of the Galactic Method with even more powerful arbitrary degenerations.

We note that the methods only differ in what they can achieve when applied to the same tensor T. Trivially, any one of the methods can find the best bound on ω if it is “applied” to the matrix multiplication tensor ⟨n, n, n⟩ itself. Starting with the same tensor T, however, the Universal method can in principle give much better bounds on ω than the Solar or Galactic methods applied to the same T.

For a tensor T, let ω_g(T) be the best bound on ω that one can obtain by applying the Galactic method to T. We define a class of generalized CW tensors that contains CW_q and many more tensors related to it, such as the rotated version of CW_q used in [AW18]. Our main result is:

###### Theorem 1.1 (Informal).

There is a universal constant c > 2, independent of q, so that for every one of the generalized CW tensors T, ω_g(T) ≥ c.

Thus, if one uses a generalized CW tensor, even in the limit as q → ∞, and even if one uses the Galactic method subsuming all known approaches, one cannot prove ω = 2.

To prove this result, we develop several tools for proving lower bounds on ω_g(T) for structured tensors T. Most are relatively simple combinatorial arguments, but they are still powerful enough to show strong lower bounds on ω_g.

We also study the relationship between the generalized CW tensors and the structure tensors of group algebras. We show several new results:

1. A Limit on the Group-Theoretic Approach. The original tensor CW_q is not a sub-tensor (and hence also not a monomial degeneration) of the structure tensor T_G of F[G] for any group G of order not much larger than q, when (a) G is abelian and q is arbitrary, or (b) G is non-abelian and q is sufficiently small. Note that the tensors CW_q for these small values of q are of particular interest: the best known bounds on ω have been proved using them. This shows that lower bound techniques based on tri-colored sum-free sets and group tensors cannot be easily applied to CW_q.

2. All Finite Groups Suffice for Current Bounds. For every finite group G, the group tensor T_G has a monomial degeneration to some generalized CW tensor of parameter |G| − 2. Thus, applying the Galactic method to T_G for every G with sufficiently small asymptotic rank (i.e. ~R(T_G) close to |G|) can yield the current best bounds on ω.

3. New Tri-Colored Sum-Free Set Constructions. For every finite group G, there is a constant c_G > 0 depending only on G such that, for all large enough n, the nth tensor power G^n has a tri-colored sum-free set of size at least |G|^{c_G·n}. For moderate |G|, the constant c_G is quite a bit larger than what previously known constructions give. To our knowledge, such a general result was not known until now.

For more details on our results, see Section 2 below.

## 2 Overview of Results and Proofs

In this section, we give an outline of the techniques used to prove our main result: that there exists a universal constant c > 2 such that the Galactic method, when applied to any generalized Coppersmith-Winograd tensor, cannot prove an upper bound on ω better than c. We will assume familiarity with standard notions and notation about tensors related to matrix multiplication algorithms in this section; we refer the reader to the Preliminaries, in Section 3, where these are defined. For a tensor T, we will write ω_g(T) to denote the best upper bound on ω which can be achieved using the Galactic method applied to T.

##### Step 1: The Relationship Between Matrix Multiplication and Independent Tensors.

In Section 4, we begin by laying out the main framework for proving lower bounds on ω_g(T). The key is to consider a different property of T, the asymptotic independence number of T, denoted ~I(T). Loosely, ~I(T) gives a measure of how large an independent tensor T^⊗n can monomially degenerate into, for large n. From the definition, we will get a simple upper bound ~I(T) ≤ ~R(T), the asymptotic rank of T. By constructing upper bounds on ~I(T), we will show in Corollary 4.3 that:

• For any tensor T, if ~I(T) < ~R(T), then ω_g(T) > 2, and moreover,

• For every constant c < 1, there is a constant δ > 0 (which is increasing as c decreases), such that if ~I(T) ≤ ~R(T)^c, then ω_g(T) ≥ 2 + δ.

Hence, upper bounds on ~I(T) give lower bounds on ω_g(T). We will thus present a number of different ways to prove upper bounds on ~I(T) in the next steps.
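To illustrate the flavor of this conversion, here is a small Python sketch (ours, not the paper's exact constants), under the simplifying assumption that an embedded ⟨m, m, m⟩ forces an independent tensor of size roughly m² into T^⊗n, so that any Galactic-method bound satisfies ω_g(T) ≥ 2·log ~R(T) / log ~I(T):

```python
from math import log

# Simplifying assumption (ours): an embedded <m,m,m> yields an
# independent tensor of size ~ m^2 inside T^{(x)n}, so any bound proved
# by the Galactic method satisfies
#     omega_g(T) >= 2 * log(asym_rank) / log(asym_indep).

def omega_g_lower_bound(asym_rank: float, asym_indep: float) -> float:
    return 2 * log(asym_rank) / log(asym_indep)

# If ~I(T) = ~R(T)^c for some c < 1, the bound is 2/c > 2:
R = 100.0
for c in (1.0, 0.9, 0.5):
    print(c, omega_g_lower_bound(R, R ** c))
```

The key qualitative point survives any choice of constants: whenever ~I(T) is polynomially smaller than ~R(T), the lower bound on ω_g(T) is bounded away from 2.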

##### Step 2: Partitioning Tools for Upper Bounding ~I.

In Section 5, we present our first suite of tools for proving upper bounds on ~I(T). These tools are based on finding ‘local’ combinatorial properties of the tensor T which imply that ~I(T) can’t be too large. They are loosely summarized as follows; in the below, let T be a tensor over the variable sets X, Y, Z:

• Theorem 5.1: Let X′ be any subset of the x-variables of T, and let T′ be the tensor T restricted to X′ (i.e. with all the x-variables outside X′ zeroed out). If ~I(T′) is sufficiently smaller than the number of variables of T′, then ~I(T) is correspondingly smaller than the number of variables of T.

In other words, if T′ has a sufficiently small ~I, so that it is relatively far away from being able to prove ω = 2, then no matter how we complete T′ to get to T, the tensor T will still not be able to prove ω = 2.

• Theorem 5.2: If T is a tensor such that ~I(T) is close to its maximum possible value, then there is a probability distribution on the terms of T such that each x, y, and z variable is assigned almost the same probability mass.

For many tensors of interest, one or more of the variables ‘behave differently’ from the rest, and this can be used to prove that such a probability distribution cannot exist. For one example, we prove in Corollary 5.1 that if T is a tensor with two ‘corner terms’ – terms x_i y_j z_k such that no other term in T contains their x-variable x_i – then ~I(T) is bounded away from its maximum possible value.

These ‘corner terms’ are actually quite common in tensors which have been analyzed with the Laser Method. For instance, one of the main improvements of Coppersmith-Winograd [CW90] over Strassen [Str86] was noticing that the border rank expression of Strassen could be augmented by adding in three corner terms, resulting in the Coppersmith-Winograd tensor.
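Corner terms are easy to detect mechanically. The following sketch (ours, not from the paper) represents a tensor by its support triples and finds the terms whose x-variable appears in no other term, using the Coppersmith-Winograd tensor CW_q (defined in Section 3.3) as the example:

```python
from collections import Counter

# A tensor is represented here by its support: a set of index triples
# (i, j, k), one per term x_i y_j z_k.

def cw_support(q):
    """Support of the Coppersmith-Winograd tensor CW_q."""
    s = {(0, 0, q + 1), (q + 1, 0, 0), (0, q + 1, 0)}
    for i in range(1, q + 1):
        s |= {(i, 0, i), (0, i, i), (i, i, 0)}
    return s

def corner_terms(support):
    """Terms whose x-variable appears in no other term."""
    x_counts = Counter(i for (i, j, k) in support)
    return {t for t in support if x_counts[t[0]] == 1}

# In CW_q, the x-index q+1 occurs in exactly one term, x_{q+1} y_0 z_0:
print(corner_terms(cw_support(2)))  # {(3, 0, 0)}
```

Symmetric checks over the y- and z-coordinates find the other two corner terms of CW_q.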

• Theorem 5.3: For a tensor T over variables X, Y, Z, where each of these variables appears in the support of T, we define the measure of T, denoted μ(T), by μ(T) = min(|X|, |Y|, |Z|). Suppose the terms of T can be partitioned (we mean ‘partitioned’ as in a set partition, not any restricted notion like the ‘block partitions’ of the Laser Method) into tensors T_1, …, T_k. Then, ~I(T) ≤ μ(T_1) + ⋯ + μ(T_k).

This gives a generalization of the basic inequality that ~I(T) ≤ min(|X|, |Y|, |Z|). Whenever T can be partitioned into parts which each do not have many of one or more types of variables, we can get a nontrivial upper bound on ~I(T). Many natural border rank expressions naturally give rise to such partitions, as do the ‘blockings’ used in the Laser method.
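A toy illustration (ours, not from the paper) of the measure under this reading of Theorem 5.3, where μ of a part is the minimum number of distinct x-, y-, or z-variables appearing in it:

```python
# mu(T) = min(#x-variables, #y-variables, #z-variables) appearing in T.
# Tensors are given by their supports (sets of index triples).

def mu(support):
    xs = {i for (i, j, k) in support}
    ys = {j for (i, j, k) in support}
    zs = {k for (i, j, k) in support}
    return min(len(xs), len(ys), len(zs))

# A toy tensor partitioned into two parts, each starved of one
# variable type:
part1 = {(0, 0, 0), (0, 1, 1), (0, 2, 2)}  # only one x-variable: mu = 1
part2 = {(1, 0, 1), (2, 0, 2)}             # only one y-variable: mu = 1

print(mu(part1) + mu(part2))  # 2, beating the whole tensor's min(3,3,3) = 3
```

Here the partition certifies a bound of 2, whereas the basic inequality applied to the whole tensor only gives 3.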

As we will see, ~I is neither additive nor multiplicative: there are tensors A and B such that ~I(A + B) > ~I(A) + ~I(B), and tensors A and B such that ~I(A ⊗ B) > ~I(A) · ~I(B). One of the main components of the proofs of correctness of each of the three tools above will be narrowing in on classes of tensors A and B such that ~I(A + B) is not too much greater than ~I(A) + ~I(B), or classes of tensors A and B such that ~I(A ⊗ B) is not too much greater than ~I(A) · ~I(B). Our proofs will then manipulate our tensors using partitionings so that they fall into these classes.

##### The Main Result.

The three partitioning tools are designed to be useful for proving nontrivial upper bounds on ~I for general classes of tensors. They are especially well-suited to tensors which have structures that make them amenable to known techniques like the Laser Method. In particular, we will ultimately show that any generalized Coppersmith-Winograd tensor has all three of these properties. Indeed, our main result, Theorem 7.1, follows from these tools: for any generalized CW tensor T, a lower bound on ω_g(T) for small q will follow from Corollary 5.1, and a lower bound on ω_g(T) as q gets large (one which gets larger as q increases, not smaller) will follow from either Theorem 5.1 or Theorem 5.3.

##### Bounds on ~I for Group Tensors.

In addition to the above, we also study group tensors. For a finite group G, we call the structural tensor of the group algebra F[G] the group tensor of G, denoted T_G. We are able to achieve both nontrivial upper bounds and lower bounds on ~I(T_G) for any finite group G, including non-abelian groups.

##### Upper Bounds on ~I(T_G).

We first show that for any finite group G, we have ~I(T_G) < |G|, and hence ω_g(T_G) > 2. In other words, no fixed group G can yield ω = 2 via the Galactic method applied to T_G. By comparison, the Group Theoretic approach for G can be viewed as analyzing T_G using a particular technique within the Solar method (see Section 3.4 for more details). This therefore generalizes a remark which is already known within the Group Theoretic community [BCC17b]: that the Group Theoretic approach (using the so-called ‘Simultaneous Triple Product Property’) cannot yield ω = 2 using any fixed finite group G. It does not, however, rule out proving ω = 2 using a sequence of groups whose corresponding bounds on ω approach 2.

Our proof begins by proving a generalization of a remark from [AW18]: that lower bounds on ~I(T_G) give rise to constructions of ‘tri-colored sum-free sets’ in G^n for sufficiently large integers n ([AW18] proved this when G is a cyclic group, although our proof is almost identical). Tri-colored sum-free sets are objects from extremal combinatorics which have been studied extensively recently. We will, in particular, use a recent result of Sawin [Saw17], who showed that for any finite group G and sufficiently large n, the power G^n does not have particularly large tri-colored sum-free sets.

We give this proof in Section 6. In that section, we also show that there are natural tensors, like the Coppersmith-Winograd tensors used to give the best known upper bounds on ω, which cannot even be written as sub-tensors of relatively small group tensors. In other words, the high-powered hammer of tri-colored sum-free set upper bounds cannot be used to give a lower bound on ω_g for every tensor of interest, and other techniques, like the combinatorial partitioning techniques from step 2 above, are needed.

##### Lower Bounds on ~I(T_G).

Although our main framework involves proving upper bounds on ~I(T) for tensors T in order to prove lower bounds on ω_g(T), step 1 of our proof actually involves constructing lower bounds on ~I(T) when T has a monomial degeneration to a matrix multiplication tensor. In Section 7.2, we use this to give lower bounds on ~I(T_G) for any finite group G.

We show in Theorem 7.2 that for any finite group G, there is a monomial degeneration of T_G into a generalized Coppersmith-Winograd tensor of parameter |G| − 2. We will see that the Laser method applies just as well to any generalized Coppersmith-Winograd tensor of parameter q as it does to the original CW_q, and so the best-known approach for finding matrix multiplication tensors as monomial degenerations of a tensor can be applied to any group tensor T_G as well. Two important consequences of this are:

1. For any group G such that ~R(T_G) is sufficiently close to |G|, we can use the Galactic method to achieve the best known upper bound on ω (currently obtained from the Coppersmith-Winograd tensor) by using T_G as the underlying tensor instead. We think this has exciting prospects for designing new matrix multiplication algorithms; see Remark 7.1 for further discussion.

2. Once T_G has been monomially degenerated into a Coppersmith-Winograd tensor, and thus a matrix multiplication tensor, we can then apply the tools from step 1 above to show that T_G has a monomial degeneration to a relatively large independent tensor. In particular, we show that for any group G, ~I(T_G) ≥ |G|^{c_G} for some constant c_G > 0 which depends only on |G|. Combining this with the connection between ~I and tri-colored sum-free sets, we see that for any finite group G, the power G^n has a tri-colored sum-free set of size at least |G|^{c_G·n}. See Theorem 7.3 and the remainder of Section 7.2 for the details. We will find that c_G is much bigger than what previous general constructions give for reasonable |G|.

## 3 Preliminaries

### 3.1 Tensor Notation and Definitions

Let X, Y, and Z be three sets of formal variables. A tensor T over X, Y, Z is a trilinear form

$$T = \sum_{x_i \in X,\; y_j \in Y,\; z_k \in Z} T_{ijk}\, x_i y_j z_k,$$

where the coefficients T_{ijk} come from an underlying field F. One writes T ∈ F^{|X| × |Y| × |Z|}, and the triads x_i y_j z_k are typically written x_i ⊗ y_j ⊗ z_k; we omit the ⊗ for ease of notation. When X, Y and Z are clear from context, we will just call T a tensor. The support of a tensor T is the set of all triples (i, j, k) for which T_{ijk} ≠ 0. The size of a tensor T, denoted |T|, is the size of its support. We will write x_i y_j z_k ∈ T to denote that (i, j, k) is in the support of T, and in this case we call x_i y_j z_k a term of T. We will call the elements of X the ‘x-variables of T’, and similarly for Y and Z.
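For readers who like executable definitions, here is one convenient concrete representation (ours, not from the paper): a tensor as a Python dict from support triples to nonzero coefficients.

```python
# A tensor over X, Y, Z, represented as a dict mapping index triples
# (i, j, k) to nonzero coefficients T_{ijk}.

def support(T):
    """The set of triples (i, j, k) with a nonzero coefficient."""
    return {t for t, coeff in T.items() if coeff != 0}

def size(T):
    """|T|: the size of the support of T."""
    return len(support(T))

# Example: T = x_0 y_0 z_0 + 2 x_1 y_0 z_1
T = {(0, 0, 0): 1, (1, 0, 1): 2}
print(size(T))  # 2
```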

If A ∈ F^{k × m × n} and B ∈ F^{k′ × m′ × n′}, then the tensor product (or Kronecker product) of A and B, denoted A ⊗ B, is a tensor in F^{kk′ × mm′ × nn′} over new variables x̄_{ii′}, ȳ_{jj′}, z̄_{kk′} given by

$$A \otimes B = \sum_{(i,i') \in [k] \times [k']} \; \sum_{(j,j') \in [m] \times [m']} \; \sum_{(k,k') \in [n] \times [n']} A_{ijk} B_{i'j'k'}\, \bar{x}_{ii'}\, \bar{y}_{jj'}\, \bar{z}_{kk'}.$$

The nth tensor power of a tensor T, denoted T^⊗n, is the result of tensoring n copies of T together, so T^⊗1 = T, and T^⊗n = T ⊗ T^⊗(n−1).

Intuitively, if A is over X_A, Y_A, Z_A and B is over X_B, Y_B, Z_B, then the variables of A ⊗ B can be viewed as pairs of the original variables; e.g. its x-variables are the pairs in X_A × X_B. We will use this view in some of our proofs. For instance, when considering T^⊗n we will often view the x, y and z variables of T^⊗n as ordered n-tuples of x, y and z variables of T. Then we can discuss, for instance, in how many positions of an x variable of T^⊗n a given x variable of T appears.
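This pairing view is exactly what a direct implementation of the tensor product produces; a small sketch (ours), with tensors as dicts from support triples to coefficients:

```python
# Tensor (Kronecker) product of tensors given as dicts from support
# triples to coefficients: each variable of the product is a pair of
# original variables, so each index of the product is a pair of
# original indices.

def tensor_product(A, B):
    return {((i, ip), (j, jp), (k, kp)): a * b
            for (i, j, k), a in A.items()
            for (ip, jp, kp), b in B.items()}

def tensor_power(T, n):
    out = T
    for _ in range(n - 1):
        out = tensor_product(out, T)
    return out

A = {(0, 0, 0): 1, (1, 1, 1): 1}  # an independent tensor of size 2
print(len(tensor_power(A, 3)))     # 8: the cube has 2^3 terms
```

Note how an independent tensor of size 2 has nth powers that are independent tensors of size 2^n, which is the multiplicativity underlying asymptotic quantities like ~I.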

#### 3.1.1 Tensor Rank

A tensor T has rank one if there are values a_i ∈ F for each x_i ∈ X, b_j ∈ F for each y_j ∈ Y, and c_k ∈ F for each z_k ∈ Z, such that T_{ijk} = a_i b_j c_k, or in other words,

$$T = \sum_{x_i \in X,\; y_j \in Y,\; z_k \in Z} a_i b_j c_k \cdot x_i y_j z_k = \left(\sum_{x_i \in X} a_i x_i\right) \left(\sum_{y_j \in Y} b_j y_j\right) \left(\sum_{z_k \in Z} c_k z_k\right).$$

More generally, the rank of T, denoted R(T), is the smallest nonnegative integer r such that T can be written as the sum of r rank-one tensors.
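A quick illustration (ours, not from the paper) with numpy: a rank-one tensor is an outer product of three vectors, and a rank-r tensor is a sum of r of them.

```python
import numpy as np

# A rank-one tensor is the outer product of three vectors; its entries
# are the products a_i * b_j * c_k.

def rank_one(a, b, c):
    return np.einsum('i,j,k->ijk', a, b, c)

a, b, c = np.array([1., 2.]), np.array([1., 0.]), np.array([3., 1.])
T = rank_one(a, b, c)
assert T[1, 0, 0] == a[1] * b[0] * c[0]  # = 6.0

# The sum of two rank-one tensors has rank at most 2:
T2 = rank_one(a, b, c) + rank_one(b, c, a)
print(T2.shape)  # (2, 2, 2)
```

Deciding the exact rank of a given tensor is a hard problem in general; this sketch only constructs tensors of a given rank, it does not compute rank.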

Let λ be a formal variable, and suppose T is a tensor over F. The border rank of T, denoted ¯R(T) (much of the literature uses a different underline notation for border rank; we instead use ¯R for ease of notation), is the smallest r such that there is a tensor T_λ with coefficients in F[λ] (polynomials in λ), so that T_λ evaluated at any fixed λ has rank at most r, and so that there is an integer h for which:

$$T_\lambda = \lambda^h\, T + O(\lambda^{h+1}).$$

The above notation means that for every (i, j, k), the polynomial (T_λ)_{ijk} ∈ F[λ] has no monomials λ^{h′} with h′ < h, and the coefficient in front of λ^h in (T_λ)_{ijk} is exactly T_{ijk}. In a sense, the family of rank-r tensors λ^{−h} T_λ can get arbitrarily close to T – over ℝ or ℂ, we could think of evaluating λ^{−h} T_λ at smaller and smaller λ ≠ 0 and taking the limit as λ → 0.
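A standard numerical illustration (ours, not from the paper): the small tensor W = x₀y₀z₁ + x₀y₁z₀ + x₁y₀z₀ has rank 3 but border rank 2, witnessed by the two-term family (e₀ + λe₁)^⊗3 − e₀^⊗3 = λ·W + O(λ²).

```python
import numpy as np

# W = x0 y0 z1 + x0 y1 z0 + x1 y0 z0 has rank 3 but border rank 2:
# the rank-2 family (e0 + lam*e1)^{(x)3} - e0^{(x)3} equals lam*W up to
# O(lam^2), so dividing by lam and shrinking lam approaches W.

def rank_one(a, b, c):
    return np.einsum('i,j,k->ijk', a, b, c)

e0, e1 = np.array([1., 0.]), np.array([0., 1.])
W = rank_one(e0, e0, e1) + rank_one(e0, e1, e0) + rank_one(e1, e0, e0)

lam = 1e-4
u = e0 + lam * e1
approx = (rank_one(u, u, u) - rank_one(e0, e0, e0)) / lam

print(np.max(np.abs(approx - W)))  # O(lam): about 1e-4
```

Shrinking λ makes the error vanish, even though no rank-2 tensor equals W exactly; this is precisely the gap between rank and border rank.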

The asymptotic rank ~R(T) of a tensor T is defined as ~R(T) := lim_{n→∞} R(T^⊗n)^{1/n}. The limit exists and equals inf_n R(T^⊗n)^{1/n}. It is known that for any tensor T,

$$R(T) \ge \overline{R}(T) \ge \tilde{R}(T),$$

and that each of these inequalities can be strict (for example, the first inequality is strict for the Coppersmith-Winograd tensor, and the second inequality is strict for the matrix multiplication tensor; both of these tensors will be defined shortly). One of the most common ways to show asymptotic rank upper bounds is to give border rank upper bounds, frequently using a tool called a ‘monomial degeneration’ which we will define shortly.

The tensor ⟨r⟩ ∈ F^{r × r × r} is defined as follows: ⟨r⟩_{iii} = 1 for all i ∈ [r], and ⟨r⟩_{ijk} = 0 for all other entries. ⟨r⟩ clearly has rank r; it is the natural generalization of an identity matrix. If a tensor is equivalent to ⟨r⟩ up to permutation of the indices, we say that it is an independent tensor of size r.

#### 3.1.2 Sub-Tensors and Degenerations

We call a tensor A a sub-tensor of a tensor B, denoted A ⊆ B, if A can be obtained from B by removing triples from its support, i.e. for every (i, j, k), either A_{ijk} = B_{ijk}, or A_{ijk} = 0.

A tensor A over X_A, Y_A, Z_A is a restriction of a tensor B over X_B, Y_B, Z_B, written A ≤ B, if there are homomorphisms α : F^{X_B} → F^{X_A}, β : F^{Y_B} → F^{Y_A}, and γ : F^{Z_B} → F^{Z_A}, so that A = (α ⊗ β ⊗ γ)(B). (The notation (α ⊗ β ⊗ γ)(B) means the following. Let B = Σ_{i=1}^{r} a_i ⊗ b_i ⊗ c_i be any decomposition of B into a sum of rank-one tensors. Then (α ⊗ β ⊗ γ)(B) = Σ_{i=1}^{r} α(a_i) ⊗ β(b_i) ⊗ γ(c_i), which is well-defined.) The rank of A is at most r if and only if A ≤ ⟨r⟩.

A special type of restriction is the so-called zeroing out (also called combinatorial restriction): let A be a tensor over X, Y, Z; B is a zeroing out of A if it is obtained by selecting subsets X′ ⊆ X, Y′ ⊆ Y, Z′ ⊆ Z and setting to zero all variables outside X′, Y′, and Z′; thus, B is a tensor over X′, Y′, Z′ and it equals A on all triples over these sets.

A degeneration of a tensor t, written t′ ⊴ t, is obtained as follows. Similarly to the definition of border rank, let λ be a formal variable. We say that t′ ⊴ t if there exist matrices A(λ), B(λ), C(λ) with entries which are polynomials in λ (i.e. in F[λ]), and an integer q, so that

$$\lambda^q\, t' = (A(\lambda) \otimes B(\lambda) \otimes C(\lambda))\, t + O(\lambda^{q+1}).$$

Similarly to the relationship between rank and restriction, the border rank of t′ is at most r if and only if t′ ⊴ ⟨r⟩.

A special type of degeneration is the so-called monomial degeneration (also called combinatorial degeneration or toric degeneration), in which the matrices A(λ), B(λ), C(λ) have entries that are monomials in λ. An equivalent definition of monomial degeneration [AW18] is as follows: suppose that B is a tensor over X, Y, Z, A is a sub-tensor of B, and there are functions a : X → ℤ, b : Y → ℤ, and c : Z → ℤ such that (1) whenever A_{ijk} ≠ 0, we have a(x_i) + b(y_j) + c(z_k) = 0, (2) if B_{ijk} ≠ 0, then a(x_i) + b(y_j) + c(z_k) ≥ 0, and (3) if B_{ijk} ≠ 0 but A_{ijk} = 0, then a(x_i) + b(y_j) + c(z_k) > 0. Then A is a monomial degeneration of B.
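These combinatorial conditions are straightforward to check mechanically; a sketch (ours, not from the paper), with tensors given by their supports and a toy example:

```python
# Check the combinatorial conditions for a monomial degeneration, as
# formalized above: A must be a sub-tensor of B, terms of A must get
# total weight 0, and terms of B outside A must get strictly positive
# weight (the weights become exponents of lambda).

def is_monomial_degeneration(A, B, a, b, c):
    if not A <= B:  # A's support must be contained in B's
        return False
    for (i, j, k) in B:
        w = a[i] + b[j] + c[k]
        if (i, j, k) in A:
            if w != 0:
                return False
        elif w <= 0:
            return False
    return True

# Toy example: drop the term x_1 y_1 z_0 from a 3-term tensor.
B = {(0, 0, 0), (1, 0, 1), (1, 1, 0)}
A = {(0, 0, 0), (1, 0, 1)}
a, b, c = {0: 0, 1: 1}, {0: 0, 1: 1}, {0: 0, 1: -1}
print(is_monomial_degeneration(A, B, a, b, c))  # True
```

Here the kept terms get weight 0 while the dropped term gets weight 2, so substituting x_i → λ^{a(x_i)} x_i (and similarly for y, z) kills exactly the unwanted term in the limit.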

#### 3.1.3 Structural Properties of Tensors

We say that a tensor T is partitioned into tensors T_1, …, T_k if T = T_1 + ⋯ + T_k, and for every triple (i, j, k) in the support of T, there is exactly one ℓ such that (T_ℓ)_{ijk} = T_{ijk} and (T_{ℓ′})_{ijk} = 0 for all ℓ′ ≠ ℓ. In other words, the triples in the support of T are partitioned into parts, forming tensors summing to T. (Note that this notion of partitioning is more general than ‘block partitioning’ from the Laser Method, which we define shortly, although ‘block partitioning’ is occasionally referred to as just ‘partitioning’ in the literature.)

A direct sum A ⊕ B of two tensors A and B over disjoint variable sets (X_A, Y_A, Z_A) and (X_B, Y_B, Z_B) is the tensor over the variable sets (X_A ∪ X_B, Y_A ∪ Y_B, Z_A ∪ Z_B) which is exactly A on triples over X_A, Y_A, Z_A, exactly B on triples over X_B, Y_B, Z_B, and 0 on all other triples. In contrast, a regular sum A + B could have A and B share variables.

Similar to how a matrix in F^{n × m} can be viewed as a linear map from F^m to F^n, a tensor T in F^{|X| × |Y| × |Z|} can be viewed as a linear map T_X which maps F^{|X|} to matrices in F^{|Y| × |Z|}. One can also exchange the roles of the variables, so that T can also be viewed as a linear map T_Y : F^{|Y|} → F^{|X| × |Z|}, or a linear map T_Z : F^{|Z|} → F^{|X| × |Y|}. The tensor T is called concise if all three of these maps are injective. It is not hard to see that R(T) ≥ max(rank(T_X), rank(T_Y), rank(T_Z)), so that for concise tensors, R(T) ≥ max(|X|, |Y|, |Z|). All the explicit tensors we will discuss throughout this paper, including the tensor of matrix multiplication and the Coppersmith-Winograd tensor, are concise.

### 3.2 The Matrix Multiplication Tensor and Methods for Analyzing ω

Let m, n, p be positive integers. The tensor of m × n by n × p matrix multiplication over a field F, denoted ⟨m, n, p⟩, lies in F^{mn × np × pm}, and in trilinear notation looks like this:

$$\langle m, n, p \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{p} x_{ij}\, y_{jk}\, z_{ki}.$$
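One way to see that this tensor ‘is’ matrix multiplication: evaluating the trilinear form on matrices X, Y, Z gives Σ_{ijk} X[i,j] Y[j,k] Z[k,i] = trace(XYZ), whose partial derivatives in the z-entries are exactly the entries of XY. A quick numerical check (ours, not from the paper):

```python
import numpy as np

# Evaluating the trilinear form of <m,n,p> on matrices X, Y, Z gives
# sum over i,j,k of X[i,j] * Y[j,k] * Z[k,i], which equals trace(XYZ):
# that is why this tensor encodes matrix multiplication.

def mm_trilinear(X, Y, Z):
    m, n = X.shape
    p = Y.shape[1]
    return sum(X[i, j] * Y[j, k] * Z[k, i]
               for i in range(m) for j in range(n) for k in range(p))

rng = np.random.default_rng(0)
X, Y, Z = rng.random((2, 3)), rng.random((3, 4)), rng.random((4, 2))
print(np.isclose(mm_trilinear(X, Y, Z), np.trace(X @ Y @ Z)))  # True
```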

The theory of matrix multiplication algorithms is concerned with determining the value ω, defined as ω := inf{τ : R(⟨n, n, n⟩) ≤ O(n^τ)}. (As shown by Coppersmith and Winograd [CW82], ω is a limit point that cannot be achieved by any single algorithm.)

Getting a handle on ω has been difficult. Over the years, various methods have been developed to obtain a better understanding of the rank of ⟨n, n, n⟩. The basic idea of all methods is as follows. Although we do not know the true rank of ⟨n, n, n⟩ as n grows, there are many other tensors for which we know the rank, and even the asymptotic rank, exactly. Hence, the approach is: take a tensor T whose asymptotic rank we understand, take a large tensor power T^⊗n of T, and “embed” ⟨m, m, m⟩ into T^⊗n so that the embedding shows that R(⟨m, m, m⟩) ≤ R(T^⊗n). From this inequality we can get a bound on ω, namely m^ω ≤ ~R(T)^n, by taking n to infinity. More generally, by Schönhage’s Asymptotic Sum Inequality (Theorem 3.1 below), it is actually sufficient to embed a direct sum of many smaller matrix multiplication tensors into T^⊗n to get a similar bound on ω.

The way in which the approaches differ is mainly in how the embedding into T^⊗n is obtained. All known approaches to embed a matrix multiplication tensor into a tensor power of some other tensor actually zero out variables in T^⊗n and argue that after the zeroing out, the remaining tensor is a matrix multiplication tensor.

There are two main approaches for obtaining good bounds on ω via zeroing out T^⊗n: the laser method and the group theoretic approach. We will describe them both shortly.

Zeroing out is a very restricted border-rank preserving operation on a tensor. The most general embedding of a matrix multiplication tensor into T^⊗n would be a potentially complicated degeneration of T^⊗n. In fact, since every tensor of border rank r is a degeneration of the structure tensor T_{C_r} for addition modulo r (this folklore fact follows from inverting the DFT over cyclic groups; see e.g. [AW18, Section 3.1]), it would suffice to find a degeneration of T_{C_r}^⊗n into a large matrix multiplication tensor, for large n. Unfortunately, we currently do not have techniques to find good degenerations. We call this hypothetical method the Universal method.

Instead of considering arbitrary degenerations of T^⊗n, we could instead consider monomial degenerations of T^⊗n into a large matrix multiplication tensor. This approach would subsume both the Laser Method and the Group Theoretic approach. Although again there are no known techniques to obtain better monomial degenerations than zeroing outs, monomial degenerations seem easier to argue about than arbitrary degenerations. We call the method of finding the optimal (with respect to bounding ω) monomial degeneration of a tensor power T^⊗n into a matrix multiplication tensor the Galactic method. (Reaching the end of our Galaxy is more feasible than seeing the entire Universe.) To complete the analogy, we call the method using zeroing outs the Solar method (i.e. exploring the Solar System).

The Solar method subsumes the Group Theoretic Approach and the Laser Method, but is more general, and current techniques do not suffice to find the optimal zeroing out of T^⊗n into matrix multiplication tensors even for simple tensors T. Our lower bounds will apply not only to the Solar method, but also to the Galactic method, which is even further out of reach for the current matrix multiplication techniques.

To be clear, the Solar method, Galactic method, and Universal method give us successively more power when analyzing specific tensors. For example, it may be the case that for a specific tensor T, the Solar method applied to T cannot get as low an upper bound on ω as the Universal method applied to T can. This captures the known methods to get bounds on ω by using tensors like the Coppersmith-Winograd tensor or a group tensor, which we will define shortly. The three different methods will trivially give the same bound, ω itself, when applied to matrix multiplication tensors themselves, but this is not particularly interesting: the entire point of these different methods is that the asymptotic rank of matrix multiplication tensors is not well-understood, and applying the methods to other tensors can help us get better bounds on it.

We will now describe the two approaches that follow the Solar method.

### 3.3 The Laser Method

Strassen [Str86] proposed a method for embedding a matrix multiplication tensor into a large tensor power of a starting tensor; he called it the Laser Method. In this method, we start with a tensor t over variables X, Y, Z whose asymptotic rank ~R(t) is essentially as small as possible relative to |X|, |Y|, |Z|, so that t has essentially optimal asymptotic rank. The variable sets are then partitioned into blocks: X = X_1 ∪ ⋯ ∪ X_a, Y = Y_1 ∪ ⋯ ∪ Y_b, Z = Z_1 ∪ ⋯ ∪ Z_c. Define by t_{IJK} the sub-tensor of t obtained by zeroing out all variables outside X_I, Y_J, and Z_K. We obtain a partitioning

$$t = \sum_{I \in [a],\, J \in [b],\, K \in [c]} t_{IJK}.$$

Ideally, the constituent tensors t_{IJK} should be matrix multiplication tensors, but this is not necessary.

In the large tensor power t^⊗n, one is then allowed to zero out x, y, and z variables (removing all triples containing them). This zeroing out is not arbitrary, however: if some variable, say an x variable x̄ = (x_{i_1}, …, x_{i_n}), is zeroed out, consider its index – it is a sequence of length n of original indices i_1, …, i_n. Say that i_j ∈ X_{I_j} (i.e. X_{I_j} is the block that x̄ uses in its jth coordinate). Then every other variable x̄′ = (x_{i′_1}, …, x_{i′_n}) for which i′_j ∈ X_{I_j} for all j must be zeroed out as well. That is, variables with the same block sequence (I_1, …, I_n) must either all be kept or all zeroed out.

One considers all such possible zeroing outs and attempts to argue that one of them leaves exactly a direct sum of matrix multiplication tensors (possibly of different dimensions). Then one uses the asymptotic sum inequality of Schönhage [Sch81] to obtain a bound on $\omega$:

###### Theorem 3.1 (Asymptotic Sum Inequality [Sch81]).

If $\bigoplus_{i=1}^{k}\langle m_i, n_i, p_i\rangle$ has border rank $r$, and $r > k$, then $\sum_{i=1}^{k}(m_i n_i p_i)^{\tau} \le r$, where $\tau = \omega/3$.
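As a worked example (not from the paper): applying the inequality with a single summand ($k=1$) to Strassen's bound, that the border rank of $\langle 2,2,2\rangle$ is at most $7$, gives $(2\cdot 2\cdot 2)^{\omega/3} \le 7$, i.e. $\omega \le \log_2 7 \approx 2.807$. The helper below is a hypothetical sketch for the special case where all $k$ summands have equal dimensions.

```python
import math

def omega_bound(m, n, p, k, r):
    """Bound on omega from k * (m*n*p)^(omega/3) <= r, the special case of
    the asymptotic sum inequality with k equal summands <m, n, p> of
    border rank r together. Solving gives omega <= 3 log(r/k) / log(mnp)."""
    return 3 * math.log(r / k) / math.log(m * n * p)

# Strassen: one copy of <2,2,2> with border rank 7 yields omega <= log2(7).
print(omega_bound(2, 2, 2, 1, 7))
```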

Looking at Schönhage’s proof of the asymptotic sum inequality, however, we see that what it is actually doing is taking a large tensor power of $\bigoplus_{i=1}^{k}\langle m_i, n_i, p_i\rangle$ and zeroing out variables to obtain independent copies of the same single matrix multiplication tensor, i.e. a direct sum $\langle m,n,p\rangle^{\oplus s}$ for some $m,n,p,s$. Thus, we can think of the laser method as zeroing out a tensor power in a block-preserving fashion, to obtain copies of the same matrix multiplication tensor.

We now turn to the most successful implementation of the Laser Method: the Coppersmith-Winograd approach.

The Coppersmith-Winograd (CW) family of tensors is as follows: Let $q$ be a positive integer.

$$CW_q = x_0y_0z_{q+1} + x_{q+1}y_0z_0 + x_0y_{q+1}z_0 + \sum_{i=1}^{q}\left(x_iy_0z_i + x_0y_iz_i + x_iy_iz_0\right).$$

$CW_q$ is a concise tensor over $(q+2)$-dimensional variable sets $X$, $Y$, $Z$, of border rank (and hence also asymptotic rank) $q+2$.
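The support of $CW_q$ is easy to write down explicitly. A minimal sketch (the function name is a hypothetical choice): the tensor has three "corner" terms plus three terms per $i \in \{1,\dots,q\}$, so $3q+3$ terms in total, over indices $\{0,1,\dots,q+1\}$ on each side.

```python
def cw_terms(q):
    """Index triples (i, j, k) of the terms x_i y_j z_k appearing in CW_q,
    matching the displayed formula: three corner terms plus 3q inner terms."""
    terms = {(0, 0, q + 1), (q + 1, 0, 0), (0, q + 1, 0)}
    for i in range(1, q + 1):
        terms |= {(i, 0, i), (0, i, i), (i, i, 0)}
    return terms

# CW_q has exactly 3q + 3 terms over (q+2)-dimensional variable sets.
assert len(cw_terms(6)) == 21
```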

Coppersmith and Winograd [CW90] followed the laser method. The tensors $CW_q$ have a natural partitioning $CW_q = \sum_{I,J,K} t_{IJK}$, where the sum ranges over $I,J,K \in \{0,1,2\}$ with $I+J+K=2$.

The partitioning is actually a block partitioning: the $t_{IJK}$ are obtained by blocking the $x$, $y$ and $z$ variables into three blocks: the indices $\{0,1,\dots,q+1\}$ are blocked into block $0$ containing $\{0\}$, block $1$ containing $\{1,\dots,q\}$ and block $2$ containing $\{q+1\}$, and then, block $I$ of $X$ (resp. $Y$ and $Z$) contains all $x_i$ (resp. $y_i$ and $z_i$) with index $i$ in block $I$ of the indices. Then $t_{IJK}$ is the block tensor formed by the triples with $x$ variables in block $I$, $y$ variables in block $J$ and $z$ variables in block $K$.

The sub-tensors $t_{IJK}$ have two useful properties: (1) they are all matrix multiplication tensors, (2) for each $t_{IJK}$ above, $I+J+K=2$.
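Property (2) can be checked mechanically. A small sketch, with hypothetical helper names: we block the indices of $CW_q$ as described above (block $0$ is $\{0\}$, block $1$ is $\{1,\dots,q\}$, block $2$ is $\{q+1\}$) and verify that every term of $CW_q$ lies in a block triple $(I,J,K)$ with $I+J+K=2$.

```python
def cw_terms(q):
    """Index triples of the terms of CW_q."""
    terms = {(0, 0, q + 1), (q + 1, 0, 0), (0, q + 1, 0)}
    for i in range(1, q + 1):
        terms |= {(i, 0, i), (0, i, i), (i, i, 0)}
    return terms

def block_of(i, q):
    """The Coppersmith-Winograd blocking of the index set {0, ..., q+1}."""
    if i == 0:
        return 0
    if i == q + 1:
        return 2
    return 1

# Property (2): the three block indices of every term sum to 2.
q = 6
assert all(sum(block_of(i, q) for i in triple) == 2 for triple in cw_terms(q))
```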

The Coppersmith-Winograd implementation of the laser method uses these properties together with sets excluding 3-term arithmetic progressions (in conjunction with property (2) above) to decide which blocks of variables to zero out in $CW_q^{\otimes n}$. Since the zeroing out proceeds by zeroing out variables that have the same block sequences, due to property (1) one obtains in the end a sum of matrix multiplication tensors, and due to the use of sets excluding 3-term arithmetic progressions one can guarantee that this is in fact a direct sum of many large matrix multiplication tensors. Then one can use the asymptotic sum inequality to obtain a bound on $\omega$. To optimize the bound on $\omega$, one selects the best $q$, which ends up being a small constant. Coppersmith and Winograd then achieve a slightly better bound on $\omega$ by analyzing the square $CW_q^{\otimes 2}$ in a similar way.

The later improvements on the Coppersmith-Winograd bounds by Stothers [DS13], Vassilevska W. [Wil12] and Le Gall [LG14] instead used the laser method with the CW tools starting from $CW_q^{\otimes 4}$, $CW_q^{\otimes 8}$, and $CW_q^{\otimes 16}$ and $CW_q^{\otimes 32}$, respectively. Each new analysis used different, but related, blockings and partitionings, and each ultimately optimized the resulting bound on $\omega$ by picking the best small constant $q$, and hence using that $CW_q$ as the base tensor.

The Coppersmith-Winograd analysis works for any blocking of the variables of a tensor into blocks with integer names so that there exists an integer $P$ such that for every triple $(I,J,K)$ appearing in the tensor, where $I$ is an $x$-block, $J$ is a $y$-block and $K$ is a $z$-block, $I+J+K=P$. For such a blocking, each constituent tensor $t_{IJK}$ should ideally be a matrix multiplication tensor itself. In recent applications of the method, the constituent tensors need not be matrix multiplication tensors, but then one needs to perform a Coppersmith-Winograd analysis on them to obtain a bound known as their Value, which roughly says how good they are at supporting matrix multiplication.

The Coppersmith-Winograd approach doesn’t exploit very much about the block tensors $t_{IJK}$. In particular, one can replace each $t_{IJK}$ with another tensor over the same sets of variables, as long as the replacement has the same “value”, and the modified full tensor has the same border rank as the original; the bound on $\omega$ the approach would give would be exactly the same! When $t_{IJK}$ is a matrix multiplication tensor, for instance, one can replace it with another matrix multiplication tensor as long as the new tensor uses the same variables and dimensions, and as long as the produced full tensor has the same border rank. For instance, if we take $CW_q$ and replace one of its matrix multiplication sub-tensors with a rotated copy, then we would get the rotated tensor studied in [AW18]. This tensor still has border rank $q+2$, and this gives the same upper bound on $\omega$ using the CW approach.

We can thus define a family of generalized CW tensors, as follows.

###### Definition 3.1.

The family of tensors $\{CW_q^{\sigma}\}$ includes, for every permutation $\sigma$ of $\{1,\dots,q\}$, the tensor

$$CW^{\sigma}_q=(x_0y_0z_{q+1}+x_0y_{q+1}z_0+x_{q+1}y_0z_0)+\sum_{i=1}^{q}\left(x_iy_{\sigma(i)}z_0+x_iy_0z_i+x_0y_iz_i\right).$$

We remark that the family above contains all tensors obtained from $CW_q$ by replacing $\sum_{i=1}^{q} x_iy_iz_0$ with $\sum_{i=1}^{q} x_iy_{\sigma(i)}z_0$ for any choice of $\sigma$.

The constituent tensor $t_{110}$ of $CW^{\sigma}_q$ is $\sum_{i=1}^{q} x_iy_{\sigma(i)}z_0$, which is still a matrix multiplication tensor of the same dimensions. Thus, for any such tensor from the family $\{CW_q^{\sigma}\}$, if its border rank is $q+2$, the Coppersmith-Winograd approach would give exactly the same bound on $\omega$ as with $CW_q$.
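The family is simple to generate explicitly. A hypothetical sketch (the function name and the representation of $\sigma$ as a dict are illustration choices): every member of the family has the same $3q+3$ terms count as $CW_q$, and the identity permutation recovers $CW_q$ itself.

```python
from itertools import permutations

def cw_sigma_terms(q, sigma):
    """Index triples of CW_q^sigma, for sigma a permutation of {1..q}
    given as a dict i -> sigma(i), following the displayed formula."""
    terms = {(0, 0, q + 1), (0, q + 1, 0), (q + 1, 0, 0)}
    for i in range(1, q + 1):
        terms |= {(i, sigma[i], 0), (i, 0, i), (0, i, i)}
    return terms

q = 3
# Each of the q! family members has the same number of terms as CW_q.
for p in permutations(range(1, q + 1)):
    sigma = dict(zip(range(1, q + 1), p))
    assert len(cw_sigma_terms(q, sigma)) == 3 * q + 3
```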

### 3.4 Group-theoretic approach

Cohn and Umans [CU03] pioneered a new group-theoretic approach for matrix multiplication. The idea is as follows. Take a finite group $G$ and consider its group tensor $T_G$ defined below. (Throughout this paper, we write groups in multiplicative notation.)

###### Definition 3.2.

For any finite group $G$, the group tensor of $G$, denoted $T_G$, is a tensor over $X$, $Y$, $Z$ where $X = \{x_g\}_{g\in G}$, $Y = \{y_g\}_{g\in G}$, and $Z = \{z_g\}_{g\in G}$, given by

$$T_G:=\sum_{g,h\in G}x_gy_hz_{gh}.$$

(Note that the group tensor of $G$ is really the structural tensor of the group algebra $\mathbb{F}[G]$, often written as $T_{\mathbb{F}[G]}$. We use $T_G$ for ease of notation.)
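A minimal sketch of the definition for a concrete group (the function name is a hypothetical choice): for the cyclic group $\mathbb{Z}_n$, written additively here, the term for the pair $(g,h)$ has $z$-index $g+h \bmod n$, so $T_G$ has exactly $|G|^2$ terms.

```python
from itertools import product

def cyclic_group_tensor(n):
    """Index triples (g, h, gh) of T_G for the cyclic group Z_n,
    written additively: the z-index of the pair (g, h) is (g + h) mod n."""
    return {(g, h, (g + h) % n) for g, h in product(range(n), repeat=2)}

# |G|^2 terms in total: one per pair (g, h); each z_k appears in exactly
# |G| of them, since for each g there is a unique h with g*h = k.
T = cyclic_group_tensor(5)
assert len(T) == 25
```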

The group-theoretic approach first bounds the asymptotic rank of $T_G$ using representation theory, as follows. Let $d_1,\dots,d_\ell$ be the dimensions of the irreducible representations of $G$ (i.e. the $d_u$s are the character degrees). Then $T_G$ can be seen to degenerate from $\bigoplus_{u=1}^{\ell}\langle d_u,d_u,d_u\rangle$. In particular, we get that

$$\tilde{R}(T_G)=\tilde{R}\left(\bigoplus_{u=1}^{\ell}\langle d_u,d_u,d_u\rangle\right)=\sum_{u=1}^{\ell}d_u^{\omega}.$$

(It is more straightforward to see that this holds with inequalities (‘$\le$’ instead of ‘$=$’), but in fact equality holds because the degeneration of $T_G$ is invertible, and $\omega$ is defined in terms of the asymptotic rank of matrix multiplication tensors.)
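As a numeric sanity check on the formula (a worked example, not from the paper): for the smallest non-abelian group $S_3$, the character degrees are $(1,1,2)$. The degrees always satisfy $\sum_u d_u^2 = |G|$, and since $2 \le \omega \le 3$, evaluating $\sum_u d_u^{\omega}$ at the two extremes brackets the asymptotic rank of $T_{S_3}$.

```python
# Character degrees of S_3: two 1-dimensional irreducibles and one
# 2-dimensional one.
degrees = (1, 1, 2)

# Wedderburn: the squared degrees sum to the group order, |S_3| = 6.
assert sum(d * d for d in degrees) == 6

# sum_u d_u^omega, evaluated at omega = 2 and omega = 3, brackets
# the asymptotic rank of T_{S_3} between 6 and 10.
lower = sum(d ** 2 for d in degrees)  # omega >= 2 gives a lower value
upper = sum(d ** 3 for d in degrees)  # omega <= 3 gives an upper value
print(lower, upper)
```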

Now suppose that we can find any degeneration (e.g. a zeroing out) of $T_G$ into