An Almost Optimal Rank Bound for Depth-3 Identities
We show that the rank of a depth-3 circuit (over any field) that is simple, minimal and zero, with top fanin $k$ and degree $d$, is at most $O(k^3\log d)$. The previous best rank bound known was $2^{O(k^2)}(\log d)^{k-2}$ by Dvir and Shpilka (STOC 2005). This almost resolves the rank question first posed by Dvir and Shpilka (as we also provide a simple and minimal identity of rank $\Omega(k\log d)$).
Our rank bound significantly improves (dependence on $k$ exponentially reduced) the best known deterministic black-box identity tests for depth-3 circuits by Karnin and Shpilka (CCC 2008). Our techniques also shed light on the factorization pattern of nonzero depth-3 circuits, most strikingly: the rank of linear factors of a simple, minimal and nonzero depth-3 circuit (over any field) is at most $O(k^3\log d)$.
The novel feature of this work is a new notion of maps between sets of linear forms, called ideal matchings, used to study depth-3 circuits. We prove interesting structural results about depth-3 identities using these techniques. We believe that these can lead to the goal of a deterministic polynomial time identity test for these circuits.
Polynomial identity testing (PIT) ranks as one of the most important open problems at the intersection of algebra and computer science. We are given an arithmetic circuit that computes a polynomial $P$ over a field $\mathbb{F}$, and we wish to test if $P$ is identically zero. In the black-box setting, the circuit is provided as a black-box and we are only allowed to evaluate the polynomial at various domain points. The main goal is to devise a deterministic polynomial time algorithm for PIT. Kabanets and Impagliazzo [KI04] and Agrawal [Agr05] have shown connections between deterministic algorithms for identity testing and circuit lower bounds, emphasizing the importance of this problem.
The first randomized polynomial time PIT algorithm, which was a black-box algorithm, was given (independently) by Schwartz [Sch80] and Zippel [Zip79]. Randomized algorithms that use less randomness were given by Chen & Kao [CK00], Lewin & Vadhan [LV98], and Agrawal & Biswas [AB03]. Klivans and Spielman [KS01] observed that even for depth-3 circuits with bounded top fanin, deterministic identity testing was open. Progress towards this was first made by Dvir and Shpilka [DS06], who gave a quasi-polynomial time algorithm, although with a doubly-exponential dependence on the top fanin. The problem was resolved by a polynomial time algorithm given by Kayal and Saxena [KS07], with a running time exponential in the top fanin. For a special case of depth-4 circuits, Saxena [Sax08] has designed a deterministic polynomial time algorithm for PIT. Why is progress restricted to small depth circuits? Agrawal and Vinay [AV08] recently showed that an efficient black-box identity test for depth-4 circuits will actually give a quasi-polynomial black-box test for circuits of all depths.
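The randomized black-box idea of Schwartz and Zippel can be made concrete for the depth-3 circuits studied here. The sketch below is only an illustration under stated assumptions: a circuit is a list of terms, each term a list of linear forms written as integer coefficient tuples over $x_1,\ldots,x_n$; evaluation is done modulo an arbitrarily chosen large prime $p$; and the function names are hypothetical. It uses only evaluations of the circuit, and relies on the Schwartz-Zippel bound: a nonzero polynomial of degree $d$ vanishes at a uniformly random point of $\mathbb{F}_p^n$ with probability at most $d/p$.

```python
import random

P = 2**31 - 1  # a large prime; working over GF(P) is an illustrative assumption

def eval_form(coeffs, point, p=P):
    """Value of the linear form sum_j coeffs[j] * x_j at `point`, modulo p."""
    return sum(c * x for c, x in zip(coeffs, point)) % p

def eval_circuit(terms, point, p=P):
    """Value of C = T_1 + ... + T_k at `point`; each T_i is a list of linear forms."""
    total = 0
    for forms in terms:
        prod = 1
        for f in forms:
            prod = prod * eval_form(f, point, p) % p
        total = (total + prod) % p
    return total

def probably_zero(terms, n, trials=20, p=P, rng=random):
    """Randomized black-box identity test that only evaluates C.

    A nonzero value at any sample point proves C != 0. If C is nonzero modulo p
    and has degree d, all `trials` samples vanish with probability at most
    (d/p)**trials, so a True answer is correct with high probability.
    """
    for _ in range(trials):
        point = [rng.randrange(p) for _ in range(n)]
        if eval_circuit(terms, point, p) != 0:
            return False  # witness point found: C is certainly nonzero
    return True
```

For instance, the toy identity $x^2 - y^2 - (x+y)(x-y) = 0$ can be encoded as `[[(1, 0), (1, 0)], [(0, -1), (0, 1)], [(-1, -1), (1, -1)]]` and always passes, while $x^2 + y^2$ is reported nonzero with overwhelming probability.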
For deterministic black-box testing, the first results were given by Karnin and Shpilka [KS08]. Based on results in [DS06], they gave an algorithm for depth-3 circuits having a quasi-polynomial running time (with a doubly-exponential dependence on the top fanin). ([KS08] had a better running time for read-$k$ depth-3 circuits, where each variable appears at most $k$ times, but even there the dependence on $k$ is doubly-exponential.) One of the consequences of our result will be a significant improvement in the running time of their deterministic black-box tester.
This work focuses on depth-3 circuits. A structural study of depth-3 identities was initiated in [DS06] by defining a notion of rank of simple and minimal identities. A depth-3 circuit $C$ over a field $\mathbb{F}$ is:

$$C(x_1,\ldots,x_n) = \sum_{i=1}^{k} T_i, \qquad T_i = \prod_{j=1}^{d_i} \ell_{ij},$$

where $T_i$ (a multiplication term) is a product of $d_i$ linear functions $\ell_{ij}$ over $\mathbb{F}$. Note that for the purposes of studying identities we can assume wlog (by homogenization) that the $\ell_{ij}$'s are linear forms (i.e. linear polynomials with a zero constant coefficient) and that $d_1 = \cdots = d_k = d$. Such a circuit is referred to as a $\Sigma\Pi\Sigma(k,d)$ circuit, where $k$ is the top fanin of $C$ and $d$ is the degree of $C$. We give a few definitions from [DS06].
[Simple Circuits] $C$ is a simple circuit if there is no nonzero linear form dividing all the $T_i$'s.
[Minimal Circuits] $C$ is a minimal circuit if for every nonempty proper subset $S \subsetneq [k]$, $\sum_{i\in S} T_i$ is nonzero.
[Rank of a circuit] The rank of the circuit, $\mathrm{rk}(C)$, is defined as the rank of the linear forms $\ell_{ij}$ viewed as $n$-dimensional vectors over $\mathbb{F}$.
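These three definitions can be checked mechanically on small examples. The sketch below uses hypothetical helper names, represents linear forms as coefficient tuples over $\mathbb{Q}$, and evaluates modulo an illustrative prime for the minimality test. Simplicity uses unique factorization: a linear form divides a product of linear forms only if it is similar to one of the factors, so $C$ is simple iff no form (up to a nonzero scalar) occurs in every term. Minimality is tested by random evaluation of every nonempty proper sub-sum, and the rank by exact Gaussian elimination with `fractions.Fraction`.

```python
import itertools
import random
from fractions import Fraction

P = 2**31 - 1  # prime modulus for random evaluation (illustrative choice)

def normalize(form):
    """Scale a nonzero form so its first nonzero coefficient is 1.
    Two forms are similar iff their normalizations coincide."""
    lead = next(c for c in form if c != 0)
    return tuple(Fraction(c) / lead for c in form)

def is_simple(terms):
    """True iff no linear form (up to scaling) occurs in every term."""
    common = set(map(normalize, terms[0]))
    for t in terms[1:]:
        common &= set(map(normalize, t))
    return not common

def eval_circuit(terms, point, p=P):
    total = 0
    for t in terms:
        prod = 1
        for f in t:
            prod = prod * sum(c * x for c, x in zip(f, point)) % p
        total = (total + prod) % p
    return total

def is_minimal(terms, n, trials=20, rng=random):
    """Randomized check. True is certain (every nonempty proper sub-sum was seen
    to be nonzero); False means some sub-sum vanished on all sample points."""
    points = [[rng.randrange(P) for _ in range(n)] for _ in range(trials)]
    for size in range(1, len(terms)):
        for sub in itertools.combinations(terms, size):
            if all(eval_circuit(sub, pt) == 0 for pt in points):
                return False
    return True

def rank(forms):
    """Rank over Q of linear forms given as coefficient tuples."""
    rows = [[Fraction(c) for c in f] for f in forms]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def circuit_rank(terms):
    """rk(C): the rank of all linear forms occurring in C."""
    return rank([f for t in terms for f in t])
```

On the toy identity $x^2 - y^2 - (x+y)(x-y) = 0$ (our choice of example, not one from the paper) the circuit is simple, minimal, and of rank $2$; appending the pair $xy$ and $-xy$ keeps it an identity but destroys minimality.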
Can all the $kd$ forms be independent, or must there be relations between them? The rank can be interpreted as the minimum number of variables that are required to express $C$: there exists a linear transformation converting the $n$ variables of the circuit into $\mathrm{rk}(C)$ independent variables. A trivial bound on the rank (for any $\Sigma\Pi\Sigma(k,d)$ circuit) is $kd$, since that is the total number of linear forms involved in $C$. The rank is a fundamental property of a circuit and it is crucial to understand how large it can be for identities. A rank bound substantially smaller than $kd$ shows that identities do not have as many “degrees of freedom” as general circuits, and leads to deterministic identity tests (though we usually do not get a polynomial time algorithm). Furthermore, the techniques used to prove rank bounds reveal structural properties of identities that may suggest directions to resolve PIT for $\Sigma\Pi\Sigma$ circuits.
Dvir and Shpilka [DS06] proved that the rank is bounded by $2^{O(k^2)}(\log d)^{k-2}$, and Karnin and Shpilka [KS08] translated this bound into a quasi-polynomial time black-box identity tester. Note that for large $k$ these bounds become trivial, since they exceed the trivial bound $kd$.
Our present understanding of $\Sigma\Pi\Sigma$ identities is very poor when $k$ is larger than a constant. We present the first result in this direction.
Theorem 2 (Main Theorem).
The rank of a simple and minimal $\Sigma\Pi\Sigma(k,d)$ identity is $O(k^3\log d)$.
This gives an exponential improvement on the previously known dependence on $k$, and is strictly better than the previous rank bound for all but the smallest values of $k$. We also give a simple construction of identities with rank $\Omega(k\log d)$ in Section 2, showing that the above theorem is almost optimal. As mentioned above, we can interpret this bound as saying that any simple and minimal $\Sigma\Pi\Sigma(k,d)$ identity can be expressed using $O(k^3\log d)$ independent variables. One of the most interesting features of this result is a novel technique developed to study depth-3 circuits. We introduce the concepts of ideal matchings and ordered matchings, which allow us to analyze the structure of depth-3 identities. These matchings are studied in detail to get the rank bound. Along the way we initiate a theory of matchings, viewing a matching as a fundamental map between sets of linear forms.
Why are the simplicity and minimality restrictions required? Take the non-simple identity . This has rank . Similarly, we can take the non-minimal identity that has rank . In some sense, these restrictions only ignore identities that are composed of smaller identities.
Apart from being an interesting structural result about identities, the rank bound gives useful algorithmic results. Our rank bound immediately gives faster deterministic black-box identity testers for $\Sigma\Pi\Sigma(k,d)$ circuits. A direct application of Lemma 4.10 in [KS08] to our rank bound gives an exponential improvement in the dependence on $k$ compared to previous black-box testers (whose running time was doubly-exponential in $k$).
There is a deterministic black-box identity tester for circuits that runs in time.
The above black-box tester is now much closer in complexity to the best non-black-box tester known ($\mathrm{poly}(n, d^k)$ time by [KS07]).
Our result also applies to black-box identity testing of read-$k$ $\Sigma\Pi\Sigma$ circuits, where each variable occurs at most $k$ times. We get a similar immediate improvement in the dependence on $k$ (the previous running time was doubly-exponential in $k$).
There is a deterministic black-box identity tester for read- circuits that runs in time.
Although it is not immediate from Theorem 2, our technique also provides an interesting algebraic result about polynomials computed by simple, minimal, and nonzero $\Sigma\Pi\Sigma(k,d)$ circuits. (Here we can also consider circuits where the different terms in $C$ have different degrees; the parameter $d$ is then an upper bound on the degree of $C$.) Consider such a circuit $C$ that computes a polynomial $f$. Let us factorize $f$ into $\prod_i g_i$, where each $g_i$ is a nonconstant and irreducible polynomial. We denote by $\mathrm{Lin}(f)$ the set of linear factors of $f$ (that is, $g_i \in \mathrm{Lin}(f)$ iff $g_i$ is linear).
Theorem 5. If $f$ is computed by a simple, minimal, nonzero $\Sigma\Pi\Sigma(k,d)$ circuit then the rank of $\mathrm{Lin}(f)$ is at most $O(k^3\log d)$.
We first give a simple construction of identities with rank $\Omega(k\log d)$ in Section 2. Section 3 contains the proof of our main theorem. We give some preliminary notation in Section 3.1 before explaining an intuitive picture of our ideas (Section 3.2). We then explain our main tool of ideal matchings (Section 3.3) and prove some useful lemmas about them. We move to Section 3.4 where the concepts of ordered matchings and simple parts of circuits are introduced. We motivate these definitions and then prove some easy facts about them. We are then ready to tackle the problem of bounding the rank. We describe our proof in terms of an iterative procedure in Section 3.5. Everything is put together in Section 3.6 to bound the rank. Finally (it should hopefully be obvious by then), we show how to apply our techniques to prove Theorem 5.
2 High Rank Identities
The following identity was constructed in [KS07]: over (with ),
It was shown that, over , is a simple and minimal zero circuit of degree with multiplication terms and . For this section let , , denote the three multiplication terms of . We now build a high rank identity based on . Our basic step is given by the following lemma that was used in [DS06] to construct identities of rank .
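[DS06] Let $C = \sum_{i\in[k]} T_i$ be a simple, minimal and zero circuit, over $\mathbb{F}[x_1,\ldots,x_n]$, with degree $d$, fanin $k$ and rank $r$. Define a new circuit $C'$ over $\mathbb{F}[x_1,\ldots,x_n,y_1,\ldots,y_n]$ using $C(\bar x)$ and $C(\bar y)$:

$$C'(\bar x,\bar y) := \sum_{i\in[k-1]} T_i(\bar x)\,T_k(\bar y) \;-\; \sum_{i\in[k-1]} T_k(\bar x)\,T_i(\bar y).$$

Then $C'$ is a simple, minimal and zero circuit with degree $2d$, fanin $2k-2$ and rank $2r$.

Since $C$ is an identity, we get that $\sum_{i\in[k-1]} T_i = -T_k$. Therefore,

$$C'(\bar x,\bar y) = -T_k(\bar x)\,T_k(\bar y) + T_k(\bar x)\,T_k(\bar y) = 0.$$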
The terms do not share any variables with (). Since and are simple, is also simple. Suppose is not minimal. We have some subset such that , where . If both and are , then we get , now must be the whole set , because is minimal. On the other hand, if both are , then which is impossible as is minimal. The only remaining possibility is (wlog) . As is coprime to and , this is impossible. Therefore, is minimal.
It is easy to see the parameters of $C'$: $\deg(C') = 2d$ and its fanin is $2k-2$. Because the $T_i(\bar y)$'s do not share any variables with the $T_j(\bar x)$'s, the rank is $\mathrm{rk}(C') = 2r$. ∎
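The lemma can be checked numerically on a small example. The sketch below assumes the combined circuit has the form $C'(\bar x,\bar y) = \sum_{i<k} T_i(\bar x)T_k(\bar y) - \sum_{i<k} T_k(\bar x)T_i(\bar y)$, a construction with exactly the stated parameters, and applies it to the toy identity $x^2 - y^2 - (x+y)(x-y) = 0$ (simple, minimal, $k=3$, $d=2$, rank $2$). The toy identity and all function names are our own choices for illustration, not the [KS07] identity used in this section; forms are integer coefficient tuples and zero-testing is done by random evaluation modulo an illustrative prime.

```python
import random
from fractions import Fraction

P = 2**31 - 1  # prime modulus for random evaluation (illustrative choice)

def rank(forms):
    """Rank over Q of linear forms given as coefficient tuples."""
    rows = [[Fraction(c) for c in f] for f in forms]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def circuit_rank(terms):
    return rank([f for t in terms for f in t])

def eval_circuit(terms, point, p=P):
    total = 0
    for t in terms:
        prod = 1
        for f in t:
            prod = prod * sum(c * x for c, x in zip(f, point)) % p
        total = (total + prod) % p
    return total

def lift(forms, n, shift):
    """Embed forms in n variables into 2n variables: shift=0 uses the x-block,
    shift=n uses the y-block."""
    return [tuple([0] * shift + list(f) + [0] * (n - shift)) for f in forms]

def negate(forms):
    """Multiply a term by -1 by negating its first linear form."""
    return [tuple(-c for c in forms[0])] + list(forms[1:])

def doubled_circuit(terms, n):
    """C'(x, y) = sum_{i<k} T_i(x) T_k(y) - sum_{i<k} T_k(x) T_i(y)."""
    *first, last = terms
    plus = [lift(t, n, 0) + lift(last, n, n) for t in first]
    minus = [negate(lift(last, n, 0) + lift(t, n, n)) for t in first]
    return plus + minus
```

Starting from the toy identity, `doubled_circuit` returns $2k-2 = 4$ terms of degree $2d = 4$ whose forms have rank $2r = 4$, and every random evaluation of the result is zero.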
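Family of High Rank Identities: Now we will start with $C$ and apply the above lemma iteratively. The $m$-th circuit we get is $C_m$ with degree $d_m$, fanin $k_m$ and rank $r_m$. So $C_{m+1}$ relates to $C_m$ as:

$$d_{m+1} = 2d_m, \qquad k_{m+1} = 2k_m - 2, \qquad r_{m+1} = 2r_m.$$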
Also it can be seen that if then . Thus after simplification, we have for any , . This gives us an infinite family of identities over with rank . A similar family can be obtained over as well.
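Unrolling the recurrence is a short calculation: since $k_{m+1} - 2 = 2(k_m - 2)$, we get $k_m = 2^m(k_0 - 2) + 2$, $d_m = 2^m d_0$ and $r_m = 2^m r_0$. With $k_0 = 3$ (the base identity has three terms) this gives $r_m = r_0(k_m - 2)$, so the rank grows linearly in the fanin while $\log_2 d_m = m + \log_2 d_0$ grows only additively. The check below assumes the parameter map $(k,d,r)\mapsto(2k-2,2d,2r)$ of the lemma; the base values $d_0 = 16$, $r_0 = 6$ are hypothetical placeholders, not the parameters of the [KS07] identity.

```python
def iterate(k0, d0, r0, m):
    """Apply the parameter map (k, d, r) -> (2k - 2, 2d, 2r) m times."""
    k, d, r = k0, d0, r0
    for _ in range(m):
        k, d, r = 2 * k - 2, 2 * d, 2 * r
    return k, d, r

def closed_form(k0, d0, r0, m):
    """k_m = 2^m (k_0 - 2) + 2,  d_m = 2^m d_0,  r_m = 2^m r_0."""
    return 2**m * (k0 - 2) + 2, 2**m * d0, 2**m * r0
```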
3 Rank Bound
Our technique to bound the rank of identities relies mainly on two notions, form-ideals and matchings by them, that occur naturally in studying a $\Sigma\Pi\Sigma$ circuit $C$. Using these tools we can perform surgery on the circuit and extract smaller circuits and smaller identities. Before explaining our basic idea we need to develop a small theory of matchings and define gcd and simple parts of a subcircuit in that framework.
We set down some preliminary definitions before giving an imprecise, yet intuitive explanation of our idea and an overall picture of how we bound the rank.
We will denote the set $\{1,\ldots,n\}$ by $[n]$.
In this paper we will study identities over a field $\mathbb{F}$. So the circuits compute multivariate polynomials in the polynomial ring $\mathbb{F}[x_1,\ldots,x_n]$. We will be studying $\Sigma\Pi\Sigma(k,d)$ circuits $C$: such a circuit is an expression in $\mathbb{F}[x_1,\ldots,x_n]$ given by a depth-3 circuit, with the top gate being an addition gate, the second level having $k$ multiplication gates, the last level having addition gates, and the leaves being variables. The edges of the circuit have elements of $\mathbb{F}$ (constants) associated with them (signifying multiplication by a constant). The top fanin is $k$ and $d$ is the degree of the polynomial computed by $C$. We will call $C$ a $\Sigma\Pi\Sigma(k,d)$-identity if $C$ is an identically zero $\Sigma\Pi\Sigma(k,d)$-circuit.
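A linear form is a linear polynomial in $\mathbb{F}[x_1,\ldots,x_n]$ with zero constant coefficient. We will denote the set of all linear forms by $L(\mathbb{F}[x_1,\ldots,x_n])$:

$$L(\mathbb{F}[x_1,\ldots,x_n]) := \Big\{\sum_{i=1}^{n} a_i x_i \;:\; a_1,\ldots,a_n\in\mathbb{F}\Big\}.$$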
Much of what we do shall deal with sets of linear forms, and various maps between them. A list of linear forms is a multi-set of forms with an arbitrary order associated with them. The actual ordering is unimportant: we merely have it to distinguish between repeated forms in the list. Among the fundamental constructs we use are maps between lists, which could have many copies of the same form. The ordering allows us to define these maps unambiguously. All lists we consider will be finite.
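[Multiplication term] A multiplication term $f$ is an expression in $\mathbb{F}[x_1,\ldots,x_n]$ given as (the product may have repeated $\ell_i$'s):

$$f := c\cdot\prod_{i=1}^{s}\ell_i, \qquad c\in\mathbb{F}^*,\quad \ell_1,\ldots,\ell_s\in L(\mathbb{F}[x_1,\ldots,x_n])\setminus\{0\}.$$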
The list of linear forms in $f$, $L(f) := \{\ell_1,\ldots,\ell_s\}$, is just the list of forms occurring in the product above. $s$ is naturally called the degree of the multiplication term. For a list $S$ of linear forms we define the multiplication term of $S$, $M(S)$, as $\prod_{\ell\in S}\ell$, or $1$ if $S = \emptyset$.
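[Forms in a Circuit] We will represent a $\Sigma\Pi\Sigma(k,d)$ circuit $C$ as a sum of $k$ multiplication terms of degree $d$, $C = \sum_{i=1}^{k} T_i$. The list of linear forms occurring in $C$ is $L(C) := \bigcup_{i\in[k]} L(T_i)$. Note that $L(C)$ is a list of size exactly $kd$. The rank of $C$, $\mathrm{rk}(C)$, is just the number of linearly independent linear forms in $L(C)$.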
We set the scene for proving the rank bound of a $\Sigma\Pi\Sigma(k,d)$ identity by giving a combinatorial/graphical picture to keep in mind. Our circuits consist of $k$ multiplication terms, and each term is a product of $d$ linear forms. Think of there being $k$ groups of $d$ nodes, so each node corresponds to a form (a form that appears many times corresponds to that many nodes) and each group represents a term. We will incrementally construct a small basis for all these forms. This process will be described as a kind of coloring procedure.
At any intermediate stage, we have a partial basis of forms. These are all linearly independent, and the corresponding nodes (we will use node and form interchangeably) are colored red. Forms not in the basis that are linear combinations of the basis forms (and are therefore in the span of the basis) are colored green. Once all the forms are colored, either green or red, the red forms form a basis of all forms, and the number of red forms is the rank of the circuit. When we have a partial basis, we carefully choose some uncolored forms and color them red (add them to the basis). As a result, some other forms get “automatically” colored green (they get added to the span). We “pay” only for the red forms, and we would like to get many green forms for “free”. Note that we are trying to prove that the rank is $O(k^3\log d)$, when the total number of forms is $kd$. Roughly speaking, for every small batch of forms we color red (of size depending only on $k$), we need to show that the number of green forms doubles.
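The red/green bookkeeping itself is easy to simulate. The sketch below uses the simplest possible rule for choosing red forms (always the first uncolored form); the whole point of the argument above is to choose red forms more cleverly, which this greedy rule does not attempt. Whatever the choice rule, each new red form lies outside the current span, so the red forms end up forming a basis and their number equals the rank. Forms are coefficient tuples over $\mathbb{Q}$, and `coloring` and `in_span` are hypothetical names.

```python
from fractions import Fraction

def rank(forms):
    """Rank over Q of linear forms given as coefficient tuples."""
    rows = [[Fraction(c) for c in f] for f in forms]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def in_span(form, basis):
    """True iff `form` is a linear combination of the forms in `basis`."""
    return rank(basis + [form]) == rank(basis)

def coloring(forms):
    """Greedy coloring: color the first uncolored form red, then color green
    every uncolored form in the span of the red forms; repeat until done."""
    colors = [None] * len(forms)
    red = []
    while None in colors:
        i = colors.index(None)
        colors[i] = "red"
        red.append(forms[i])
        for j, f in enumerate(forms):
            if colors[j] is None and in_span(f, red):
                colors[j] = "green"
    return colors
```

For the six forms of the toy identity $x^2 - y^2 - (x+y)(x-y)$, two forms are colored red and the remaining four become green.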
So far nothing ingenious has been done. Nonetheless, this image of coloring forms is very useful to get an intuitive and clear idea of how the proof works. The main challenge comes in choosing the right forms to color red. Once that is done, how do we keep an accurate count on the forms that get colored green? One of the main conceptual contributions of this work is the idea of matchings, which aid us in these tasks. Let us start from a trivial example. Suppose we have two terms that sum to zero, i.e. $T_1 + T_2 = 0$. By unique factorization of polynomials, for every form $\ell\in L(T_1)$, there is a unique form $\ell'\in L(T_2)$ such that $\ell = c\ell'$, where $c\in\mathbb{F}^*$ (we will denote this by $\ell\sim\ell'$). By associating the forms in $L(T_1)$ to those in $L(T_2)$, we create a matching between the forms in these two groups (or terms). This rather simple observation is the starting point for the construction of matchings.
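For $k = 2$ the matching can be computed directly from unique factorization: normalize each form by its first nonzero coefficient, so that similar forms become equal, and pair equal normalized forms, respecting multiplicity. A sketch under the assumption that forms are coefficient tuples over $\mathbb{Q}$; `similarity_matching` is a hypothetical name.

```python
from collections import defaultdict
from fractions import Fraction

def normalize(form):
    """Scale a nonzero form so its first nonzero coefficient is 1."""
    lead = next(c for c in form if c != 0)
    return tuple(Fraction(c) / lead for c in form)

def similarity_matching(l1, l2):
    """Return a bijection {i: j} with l2[j] similar to l1[i], or None if the
    lists are not similar. Repeated forms are handled by keeping, for each
    similarity class, a list of indices of l2 that are still unused."""
    if len(l1) != len(l2):
        return None
    unused = defaultdict(list)
    for j, f in enumerate(l2):
        unused[normalize(f)].append(j)
    match = {}
    for i, f in enumerate(l1):
        bucket = unused[normalize(f)]
        if not bucket:
            return None
        match[i] = bucket.pop()
    return match
```

For example, $T_1 = x(x+y)$ and $T_2 = -(x+y)x$ sum to zero, and the forms `[(1, 0), (1, 1)]` of $T_1$ are matched to positions `1` and `0` of `[(-1, -1), (1, 0)]`.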
Let us now move to $k=3$, so we have a simple circuit $C = T_1 + T_2 + T_3 = 0$. Therefore, there are no common factors in the terms. To get matchings, we will look at $C$ modulo some forms in $L(T_1)$. By looking at $C$ modulo various forms in $L(T_1)$, we reduce the fanin of $C$ and get many matchings. Then we can deduce structural results about $C$. Similar ideas were used by Dvir and Shpilka [DS06] for their rank bound. Taking a form $\ell\in L(T_1)$, we look at $C\pmod{\ell}$, which gives $T_2 + T_3 \equiv 0 \pmod{\ell}$. By unique factorization of polynomials modulo $\ell$, we get an $\langle\ell\rangle$-matching between $L(T_2)$ and $L(T_3)$. Suppose $(\ell_2,\ell_3)$ is an edge in this matching, so that $\ell_3 = c\ell_2 + c'\ell$ for some $c\in\mathbb{F}^*$ and $c'\in\mathbb{F}$. In terms of the coloring procedure, this means that if $\ell$ is colored and $\ell_2$ gets colored, then $\ell_3$ must also be colored. At some intermediate stage of the coloring, let us choose an uncolored form $\ell\in L(T_1)$. A key structural lemma that we will prove is that in the $\langle\ell\rangle$-matching (between $L(T_2)$ and $L(T_3)$) any neighbor of a colored form must be uncolored. This crucially requires the simplicity of $C$. We will color $\ell$ red, and thus all neighbors of the colored forms in $L(T_2)\cup L(T_3)$ will be colored green. By coloring $\ell$ red, we can double the number of colored forms. It is the various matchings (combined with the above property) that allow us to show an exponential growth in the colored forms as forms in $L(T_1)$ are colored red. By continuing this process, we can color all forms by coloring at most $O(\log d)$ forms red. Quite surprisingly, the above verbal argument can be formalized easily to prove that the rank of a minimal, simple $\Sigma\Pi\Sigma$ identity with top fanin $3$ is at most $O(\log d)$. For this case of $k=3$, the logarithmic rank bound was already present in a lemma of Dvir and Shpilka [DS06], though they did not present the proof idea in this form; in particular, their rank bound grew to $2^{O(k^2)}(\log d)^{k-2}$ for $k>3$.
The major difficulty arises when we try to push these arguments to higher values of $k$. In essence, the ideas are the same, but many technical and conceptual issues arise. Let us go to $k=4$. The first attempt is to take a form $\ell\in L(T_1)$ and look at $C\pmod{\ell}$ as a fanin $3$ circuit. Can we now simply apply the above argument recursively, and cover all the forms in $L(C)$? No, the possible lack of simplicity in $C\pmod{\ell}$ blocks this simple idea. It may be the case that $T_2, T_3, T_4$ have no common factors, but once we go modulo $\ell$, there could be many common factors! (For example, if $\ell = x_1$ and $x_2\in L(T_2)$, $x_2 + x_1\in L(T_3)$, $x_2 - x_1\in L(T_4)$, these three forms are pairwise non-similar, yet modulo $x_1$ each of them becomes $x_2$, a common factor.)
Instead of doing things recursively (both [DS06] and [KS07] used recursive arguments), we generate matchings iteratively. By performing a careful iterative analysis that keeps track of many relations between the linear forms, we achieve a stronger bound for $k>3$. We start with a form $\ell_1$, and look at $C\pmod{\ell_1}$. From $C\pmod{\ell_1}$, we remove all common factors. This common factor part we shall refer to as the gcd of $C\pmod{\ell_1}$; its removal leaves the simple part of $C\pmod{\ell_1}$. Now, we choose an appropriate form $\ell_2$ from the simple part, and look at $C\pmod{\ell_1,\ell_2}$. We then choose an $\ell_3$, and so on. For each $\ell_i$ that we choose, the top fanin decreases by at least $1$, so we end up with a matching modulo the ideal $I = \langle\ell_1,\ldots,\ell_t\rangle$, where $t\le k$. We call these special ideals form-ideals (as they are generated by forms), and the main structures that we find are matchings modulo form-ideals. The coloring procedure will color the forms $\ell_1,\ldots,\ell_t$ generating the form-ideal red. Of course, it is not as simple as the case of $k=3$: for one thing, we have to deal with the simple and gcd parts. Many other problems arise, but we will explain them as and when we see them. For now, it suffices to understand the overall picture and the concept of matchings among the linear forms in $L(C)$.
We now start by setting some notation and giving some key definitions.
3.3 Ideal Matchings
We will use the concept of ideal matchings to develop tools to prove Theorem 2. In this subsection, we provide the necessary definitions and prove some basic facts about these matchings.
First, we discuss similarity between forms and form ideals.
We give several definitions:
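[Similar forms] For any two polynomials $f, g\in\mathbb{F}[x_1,\ldots,x_n]$ we call $f$ similar to $g$ if there is a $c\in\mathbb{F}^*$ such that $f = cg$. We say $f$ is similar to $g$ mod $I$, for some ideal $I$ of $\mathbb{F}[x_1,\ldots,x_n]$, if there is a $c\in\mathbb{F}^*$ such that $f\equiv cg\pmod{I}$. We also denote this by $f\sim g\pmod{I}$, or say that $f$ is $I$-similar to $g$.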
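[Similar lists] Let $L_1$ and $L_2$ be two lists of linear forms with a bijection $\pi: L_1\to L_2$ between them. $L_1$ and $L_2$ are called similar under $\pi$ if for all $\ell\in L_1$, $\ell$ is similar to $\pi(\ell)$. Any two lists of linear forms are called similar if there exists such a $\pi$. Empty lists of linear forms are similar vacuously. For any $\ell\in L(\mathbb{F}[x_1,\ldots,x_n])$ and list $L_1$, we define the list of forms in $L_1$ similar to $\ell$ as the following list (unique up to ordering):

$$L_1|_{\sim\ell} := \{\ell'\in L_1 \;:\; \ell'\sim\ell\}.$$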
We call $L_1, L_2$ coprime lists if for all $\ell_1\in L_1$ and all $\ell_2\in L_2$, $\ell_1\not\sim\ell_2$.
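[Form-ideal] A form-ideal is the ideal of $\mathbb{F}[x_1,\ldots,x_n]$ generated by some nonempty $S\subseteq L(\mathbb{F}[x_1,\ldots,x_n])$. Note that if $I = \langle 0\rangle$ then $f\sim g\pmod{I}$ simply means that $f\sim g$ absolutely.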
[Span] For any $S\subseteq L(\mathbb{F}[x_1,\ldots,x_n])$ we let $\mathrm{sp}(S)$ be the linear span of the linear forms in $S$ over the field $\mathbb{F}$.
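[Orthogonal sets of forms] Let $S_1,\ldots,S_m$ be sets of linear forms. We call $S_1,\ldots,S_m$ orthogonal if for all $i\in[m]$:

$$\mathrm{sp}(S_i)\cap\mathrm{sp}\Big(\bigcup_{j\in[m]\setminus\{i\}} S_j\Big) = \{0\}.$$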
Similarly, we can define orthogonality of form-ideals $I_1,\ldots,I_m$ through the sets of linear forms generating them.
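For concrete lists these notions reduce to simple computations. Similarity and coprimality only compare normalized forms (each form scaled so its first nonzero coefficient is $1$), and orthogonality is a dimension count: $S_1,\ldots,S_m$ are orthogonal exactly when their spans form a direct sum, i.e. when $\dim\mathrm{sp}(\bigcup_i S_i) = \sum_i \dim\mathrm{sp}(S_i)$. The sketch assumes forms are coefficient tuples over $\mathbb{Q}$; all function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def normalize(form):
    """Scale a nonzero form so its first nonzero coefficient is 1."""
    lead = next(c for c in form if c != 0)
    return tuple(Fraction(c) / lead for c in form)

def rank(forms):
    """Rank over Q of linear forms given as coefficient tuples."""
    rows = [[Fraction(c) for c in f] for f in forms]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def similar_lists(l1, l2):
    """Similar iff the multisets of normalized forms coincide."""
    return Counter(map(normalize, l1)) == Counter(map(normalize, l2))

def coprime_lists(l1, l2):
    """No form of l1 is similar to any form of l2."""
    return not (set(map(normalize, l1)) & set(map(normalize, l2)))

def similar_sublist(form, lst):
    """The sublist of lst consisting of forms similar to `form`."""
    key = normalize(form)
    return [f for f in lst if normalize(f) == key]

def orthogonal(sets):
    """True iff the spans of the given sets form a direct sum."""
    return rank([f for s in sets for f in s]) == sum(rank(s) for s in sets)
```

Note that three pairwise independent forms such as $x$, $y$, $x+y$ are not orthogonal as three singleton sets, since their spans do not form a direct sum.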
We give a few simple facts based on these definitions. It will be helpful to have these explicitly stated.
Let be lists of linear forms and be a form-ideal. If are similar then their sublists and are also similar.
If are similar then for some , . This implies:
Since elements of are not in , for any , does not divide . In other words divides , and vice versa. Thus, are similar and hence by unique factorization in , lists are similar. ∎
Let $I_1, I_2$ be two orthogonal form-ideals of $\mathbb{F}[x_1,\ldots,x_n]$ and let $C$ be a circuit such that $C$ has all its linear forms in $I_1$. If $C\equiv 0\pmod{I_2}$ then $C = 0$.
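As $I_1, I_2$ are orthogonal we can assume $I_1$ to be $\langle\ell_1,\ldots,\ell_a\rangle$ and $I_2$ to be $\langle\ell_{a+1},\ldots,\ell_b\rangle$, where the ordered set $\{\ell_1,\ldots,\ell_b\}$ has $b$ linearly independent linear forms. Clearly, there exists an invertible linear transformation $\tau$ on $\mathbb{F}[x_1,\ldots,x_n]$ that maps the elements of $\{\ell_1,\ldots,\ell_b\}$ bijectively, in that order, to $\{x_1,\ldots,x_b\}$. On applying $\tau$ to the equation $C\equiv 0\pmod{I_2}$ we get:

$$\tau(C)\equiv 0 \pmod{x_{a+1},\ldots,x_b}.$$

Since $\tau(C)$ involves only the variables $x_1,\ldots,x_a$, this means that $\tau(C) = 0$, which by the invertibility of $\tau$ implies $C = 0$. ∎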
We now come to the most important definition of this section. We motivated the notion of ideal matchings in the intuition section. Thinking of two lists of linear forms as two sets of vertices, a matching between them signifies some linear relationship between the forms modulo a form-ideal.
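[Ideal matchings] Let $L_1, L_2$ be lists of linear forms and $I$ be a form-ideal. An ideal matching $\pi$ between $L_1, L_2$ by $I$ is a bijection $\pi: L_1\to L_2$ between lists such that: for all $\ell\in L_1$, $\pi(\ell) = c\ell + v$ for some $c\in\mathbb{F}^*$ and $v\in I$. The matching $\pi$ is called trivial if $L_1, L_2$ are similar under $\pi$.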
Note that $\pi$ being a bijection and $c$ being nonzero together imply that $\pi^{-1}$ can also be viewed as a matching between $L_2, L_1$ by $I$. We will also use the terminology $I$-matching between $L_1$ and $L_2$ for the above. Similarly, an $I$-matching between multiplication terms $f, g$ is one that matches $L(f), L(g)$. (For convenience, we will just say “matching” instead of “ideal matching”.)
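For linear forms the matching condition says exactly that $\pi(\ell)$ and $\ell$ are similar modulo $I$. Since the linear forms lying in a form-ideal generated by a set $S$ are exactly $\mathrm{sp}(S)$, this can be decided with rank computations: if $\ell\in\mathrm{sp}(S)$, we need $\pi(\ell)\in\mathrm{sp}(S)$ too; otherwise we need $\pi(\ell)\in\mathrm{sp}(S\cup\{\ell\})\setminus\mathrm{sp}(S)$ (the second condition forces $c\neq 0$). The sketch assumes coefficient tuples over $\mathbb{Q}$, the ideal given by a Python list `gens` of generating forms, and a matching given as a dict of indices; the names are hypothetical.

```python
from fractions import Fraction

def rank(forms):
    """Rank over Q of linear forms given as coefficient tuples."""
    rows = [[Fraction(c) for c in f] for f in forms]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def similar_mod(f, g, gens):
    """True iff f = c*g + v with c != 0 and v in sp(gens), i.e. f ~ g (mod I)."""
    base = rank(gens)
    f_in = rank(gens + [f]) == base
    g_in = rank(gens + [g]) == base
    if g_in:
        return f_in  # both forms vanish modulo I
    return (not f_in) and rank(gens + [g, f]) == base + 1

def is_ideal_matching(l1, l2, pi, gens):
    """Check that pi (dict: index in l1 -> index in l2) is an I-matching."""
    if len(l1) != len(l2):
        return False
    if sorted(pi) != list(range(len(l1))) or sorted(pi.values()) != list(range(len(l2))):
        return False
    return all(similar_mod(l2[pi[i]], l1[i], gens) for i in range(len(l1)))
```

With `gens = [(0, 0)]`, i.e. $I = \langle 0\rangle$, `similar_mod` reduces to absolute similarity, matching the remark after the definition of form-ideals.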
The following is an easy fact about matchings.
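Let $\pi$ be a matching between lists of linear forms $L_1, L_2$ by $I$ and let $L_1'\subseteq L_1$, $L_2'\subseteq L_2$ be similar sublists. Then there exists a matching $\pi'$ between $L_1, L_2$ by $I$ such that $L_1'$, $L_2'$ are similar under $\pi'$.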
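Let $\ell\in L_1'$ be such that $\pi(\ell) = c\ell + v$ (for some $c\in\mathbb{F}^*$ and $v\in I$) is not in $L_2'$ or is not similar to $\ell$. As $L_1'$ is similar to $L_2'$, there exists a form equal to $\alpha\ell$ in $L_2'$, for some $\alpha\in\mathbb{F}^*$ (we choose a copy that is not already the image of a form of $L_1'$ similar to it; such a copy exists because $L_1'$ and $L_2'$ contain equally many forms similar to $\ell$), and $\pi$ being a matching must be mapping some $\ell_1\in L_1$ to $\alpha\ell$ in $L_2$. Also from the matching condition there must be some $c_1\in\mathbb{F}^*$ and $v_1\in I$ such that $\alpha\ell = c_1\ell_1 + v_1$.

Now we define a new matching $\pi'$ by flipping the images of $\ell$ and $\ell_1$ under $\pi$, i.e., define $\pi'$ to be the same as $\pi$ on $L_1\setminus\{\ell,\ell_1\}$ and: $\pi'(\ell) := \alpha\ell$ and $\pi'(\ell_1) := \pi(\ell)$. Note that $\pi'$ inherits the bijection property from $\pi$ and it is an $I$-matching because $\pi'(\ell) = \alpha\ell$ for $\ell$, and more importantly,

$$\pi'(\ell_1) = c\ell + v = \frac{c}{\alpha}\,(c_1\ell_1 + v_1) + v = \frac{cc_1}{\alpha}\,\ell_1 + \Big(\frac{c}{\alpha}\,v_1 + v\Big).$$

The form $\frac{c}{\alpha}v_1 + v$ is clearly in $I$. Thus, we have now obtained a matching $\pi'$ between $L_1, L_2$ by $I$ such that $\pi'(\ell)$ is similar to $\ell$.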
Note that we increased the number of forms in $L_1'$ that are matched to similar forms in $L_2'$. If we find another form in $L_1'$ that is not matched to a similar form in $L_2'$, we can just repeat the above process. We will end up with the desired matching in at most $|L_1'|$ iterations. ∎
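The exchange argument in this proof is constructive. The sketch below (hypothetical names; forms as coefficient tuples over $\mathbb{Q}$, the matching as a dict from indices of $L_1$ to indices of $L_2$, the similar sublists given as index lists `s1`, `s2`) repeats the flip until every form indexed by `s1` is matched to a similar form indexed by `s2`. When choosing the target copy it skips copies already used by a correctly matched pair, which guarantees progress when forms repeat. The $I$-matching property is not re-checked inside the loop, since the computation above shows that it is preserved; `is_ideal_matching` is included so the result can be verified.

```python
from fractions import Fraction

def normalize(form):
    """Scale a nonzero form so its first nonzero coefficient is 1."""
    lead = next(c for c in form if c != 0)
    return tuple(Fraction(c) / lead for c in form)

def rank(forms):
    """Rank over Q of linear forms given as coefficient tuples."""
    rows = [[Fraction(c) for c in f] for f in forms]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def similar_mod(f, g, gens):
    base = rank(gens)
    f_in, g_in = rank(gens + [f]) == base, rank(gens + [g]) == base
    if g_in:
        return f_in
    return (not f_in) and rank(gens + [g, f]) == base + 1

def is_ideal_matching(l1, l2, pi, gens):
    return (len(l1) == len(l2)
            and sorted(pi) == list(range(len(l1))) == sorted(pi.values())
            and all(similar_mod(l2[pi[i]], l1[i], gens) for i in pi))

def align_similar_sublists(l1, l2, pi, s1, s2):
    """Flip images (as in the proof) until each index in s1 is matched to a
    similar form indexed in s2. Assumes the sublists s1, s2 are similar."""
    pi, s2set = dict(pi), set(s2)

    def good(i):
        return pi[i] in s2set and normalize(l2[pi[i]]) == normalize(l1[i])

    while True:
        bad = [i for i in s1 if not good(i)]
        if not bad:
            return pi
        i = bad[0]
        taken = {pi[u] for u in s1 if good(u)}
        # a copy of a multiple of l1[i] in s2 not used by a correctly matched pair
        t = next(j for j in s2 if j not in taken and normalize(l2[j]) == normalize(l1[i]))
        i1 = next(u for u, v in pi.items() if v == t)
        pi[i], pi[i1] = t, pi[i]  # swap the images of l1[i] and l1[i1]
```

For example, with $I = \langle x\rangle$, $L_1 = (y, y)$, $L_2 = (x+y, y)$ and the identity matching, one flip produces a matching that sends the first $y$ to the $y$ of $L_2$, and the result is still an $\langle x\rangle$-matching.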
We are ready to present the most important lemma of this section. The following lemma shows that there cannot be too many matchings between two given nonsimilar lists of linear forms. It is at the heart of our rank bound proof and the reason for the logarithmic dependence of the rank on the degree. It can be considered as an algebraic generalization of the combinatorial result used by Dvir & Shpilka (Corollary 2.9 of [DS06]).
Let be lists of linear forms each of size and be orthogonal form-ideals such that for all , there is a matching between by . If then are similar lists.
Before giving the proof, let us first put it in the context of our overall approach. In the sketch that we gave for $k=3$, at each step we were generating orthogonal matchings between two terms. For each orthogonal matching we got, we colored one linear form red (added one form to our basis) and doubled the number of green forms (doubled the number of forms in the circuit that are in the span of the basis). This showed that there is a logarithmic-sized basis for all of $L(C)$. If we take the contrapositive of this, we get that there cannot be too many orthogonal matchings between two (nonsimilar) lists of forms. For dealing with larger $k$, it will be convenient to state things in this way.
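Let $S_1\subseteq L_1$ be a sublist such that there exists a sublist $S_2\subseteq L_2$ similar to $S_1$ for which $L_1' := L_1\setminus S_1$ and $L_2' := L_2\setminus S_2$ are coprime lists. Let $L_1'$, $L_2'$ be of size $d'$. If $d' = 0$ then $L_1, L_2$ are indeed similar and we are done already. So assume that $d' > 0$. By the hypothesis and Fact 13, for all $i\in[r]$, there exists a matching $\pi_i'$ between $L_1, L_2$ by $I_i$ such that $S_1$, $S_2$ are similar under $\pi_i'$, and $\pi_i'$ is a matching between $L_1'$, $L_2'$ by $I_i$. Our subsequent argument will only consider the latter property of $\pi_i'$ for all $i\in[r]$.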
Intuitively, it is best to think of the various $\pi_i'$'s as bipartite matchings. The graph $G$ has the forms of $L_1'\cup L_2'$ as vertices, each labelled with the respective form. For each $i\in[r]$ and each $\ell\in L_1'$, we add an (undirected) edge tagged with $i$ between $\ell$ and $\pi_i'(\ell)$. There may be many tagged edges between a pair of vertices (it can be shown, using the orthogonality of the $I_i$'s, that an edge can have at most two distinct tags). We call $\pi_i'(\ell)$ the $i$-neighbor of $\ell$ (and vice versa, since the edges are undirected). Abusing notation, we use vertex $\ell$ to refer to a form in $L_1'\cup L_2'$. We will denote by .
We will now show that $G$ cannot contain too many such perfect matchings. The proof is done by following an iterative process that has $r$ phases, one for each $I_i$. This is essentially the coloring process that we described earlier. We maintain a partial basis for the forms in $L_1'\cup L_2'$ which will be updated iteratively. This basis is kept in the set $B$. Note that although we only want to span $L_1'\cup L_2'$, we will use forms in the various $I_i$'s for spanning.
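We start with $B$ empty and initialize it by adding some $\ell\in L_1'$ to $B$. In the $i$th round, we add all linear forms of $I_i$ to $B$. All forms of $L_1'\cup L_2'$ in $\mathrm{sp}(B)$ are now spanned. We then proceed to the next round. To introduce some colorful terminology, a green vertex is one that is in the set $\mathrm{sp}(B)$ (a form in $(L_1'\cup L_2')\cap\mathrm{sp}(B)$). Here is a nice fact: at the end of a round, the number of green vertices in $L_1'$ and $L_2'$ is the same. Why? At the end of the $i$th round, all forms of $I_i$ are in $\mathrm{sp}(B)$. Let vertex $\ell'\in L_1'$ be green, so $\ell'\in\mathrm{sp}(B)$. The $i$-neighbor of $\ell'$ is a linear combination of $\ell'$ and a form in $I_i$. Therefore, the neighbor is in $\mathrm{sp}(B)$ and is colored green. Since $\pi_i'$ is a bijection and the same argument applies to green vertices of $L_2'$, the number of green vertices in $L_1'$ is equal to the number of those in $L_2'$.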
Let be the least index such that , are not orthogonal, if it does not exist then set . Now we have the following easy claim.
The sets , are orthogonal and the sets:
Proof of Claim 15. The ideals , are orthogonal by the minimality of .
As are orthogonal but , are not orthogonal we deduce that . Thus, which is orthogonal to the sets by the orthogonality of .
We now show that the green vertices double in at least many rounds.
For , the number of green vertices doubles in the th round.
Proof of Claim 16. Let $\ell_1$ be a green vertex, say in $L_1'$, at the end of the $(i-1)$th round. Consider the $i$-neighbor $\ell_2$ of $\ell_1$. This is in $L_2'$ and is equal to $c\ell_1 + v$ where $c\in\mathbb{F}^*$ and $v\in I_i$ is a nonzero linear form (it is nonzero because $L_1', L_2'$ are coprime). If this neighbor is green, then $v$ would be a linear combination of two green forms, implying $v\in\mathrm{sp}(B)$. But by Claim 15, $I_i$ is orthogonal to $\mathrm{sp}(B)$, implying $v = 0$, which is a contradiction. Therefore, the $i$-neighbor of any green vertex is not green. On adding the forms of $I_i$ to $B$, all these neighbors will become green. This completes the proof.
We started off with one green vertex , and , each of size . This doubling can happen at most times, implying that .
The bound of is achievable by lists of linear forms inspired by Section 2. Fix an odd and define:
It is easy to see that over rationals, and for all , there is a matching between by , furthermore, there is a matching by . Thus there are many orthogonal matchings between these nonsimilar ; showing that our Lemma is tight.
3.4 Ordered Matchings and Simple Parts of Circuits
Before we delve into the definitions and proofs, let us motivate them by an intuitive explanation.
Our main goal is to deal with the case $k>3$. The overall picture is still the same. We keep updating a partial basis $B$ for $L(C)$. This process goes through various rounds, each round consisting of iterations. At the end of each round, we obtain a form-ideal $I$ that is orthogonal to $\mathrm{sp}(B)$. In the first iteration of a round, we start by choosing a form $\ell_1$ in $L(C)$ that is not in $\mathrm{sp}(B)$, and adding it to $I$. We look at $C\pmod{\ell_1}$ in the next iteration, which is obviously an identity, and try to repeat this step. The top fan-in has gone down by at least one; in other words, some multiplication terms have become identically zero modulo $\ell_1$. We will say that the other terms have survived. The major obstacle to proceeding is that our circuit is not simple any more, because there can be common factors among multiplication terms modulo $\ell_1$. This seems to be a difficulty, since it appears that our matchings will not give us a proper handle on these common factors. Suppose that a form $\ell$ is now a common factor. That means that in every surviving term, there is a form that is similar to $\ell$ modulo $\ell_1$. So these forms can be $\langle\ell_1\rangle$-matched to each other! We have converted the obstacle into a kind of partial matching, which we can hopefully exploit.
Let us go back to $C\pmod{\ell_1}$. Let us remove all common factors from this circuit. This stripped-down identity is the simple part, denoted by $\mathrm{sim}(C\bmod\ell_1)$. The removed portion, called the gcd part, is referred to as $\gcd(C\bmod\ell_1)$. By the above observation, the gcd part has $\langle\ell_1\rangle$-matchings. A key observation is that none of the forms in the gcd part is similar to $\ell_1$, i.e. they are all nonzero modulo $\ell_1$. This is because we were only looking at nonzero terms in $C\pmod{\ell_1}$. Having (somewhat) dealt with $\gcd(C\bmod\ell_1)$ by finding $\langle\ell_1\rangle$-matchings, let us focus on the smaller circuit $\mathrm{sim}(C\bmod\ell_1)$.
We try to find an $\ell_2\in L(\mathrm{sim}(C\bmod\ell_1))$ that is not in $\mathrm{sp}(B\cup\{\ell_1\})$. Suppose we can find such an $\ell_2$. Then, we add $\ell_2$ to $I$ and proceed to the next iteration. In a given iteration, we start with a form-ideal $I = \langle\ell_1,\ldots,\ell_j\rangle$ and a circuit $C_j$ (the simple part left by the previous iteration). We find a form $\ell_{j+1}\in L(C_j)\setminus\mathrm{sp}(B\cup\{\ell_1,\ldots,\ell_j\})$. We add $\ell_{j+1}$ to $I$ (for convenience, let us set $I' := \langle\ell_1,\ldots,\ell_{j+1}\rangle$) and look at $C_j\pmod{I'}$. We now have new terms in the gcd part, which we can match through $I'$-matchings. As we observed earlier, all the terms that have forms in $I'$ are removed, so the terms we match here are all nonzero modulo $I'$. We remove the gcd part to get $C_{j+1} := \mathrm{sim}(C_j\bmod I')$, and go to the next iteration with $I'$ as the new $I$. When does this stop? If there is no such $\ell_{j+1}$ in $L(C_j)$, then all of $L(C_j)$ is in our current span. So we happily stop here with all the matchings obtained from the gcd parts. Also, if the fan-in reaches $2$, then we can imagine that the whole circuit is itself in the gcd portion. At each iteration, the fan-in goes down by at least one, so we can have at most $k$ iterations in a round; hence the $I$ in any round is generated by at most $k$ forms. When we finish a round obtaining an ideal $I$, there are some multiplication terms in $C$ that are nonzero modulo $I$ after the gcd parts in the various iterations are removed from these terms. These we shall refer to as constituting the blocking subset of $C$ for that round.
The way we prove rank bounds is by invoking Lemma 14. Each round constructs a new orthogonal form-ideal. At the end of a round, we have a set $B$, which is a partial basis. If $\mathrm{sp}(B)$ does not cover all of $L(C)$, then we use the above process (of iterations) to generate a form-ideal $I$ orthogonal to $\mathrm{sp}(B)$. Consider two terms $T_1$ and $T_2$ that survive this process (mod $I$). At each stage, when we add a form $\ell_j$ to $I$, we remove forms from $T_1$ and $T_2$, $\langle\ell_1,\ldots,\ell_j\rangle$-matching them. When we stop with our form-ideal $I$, we can think of $T_1$ and $T_2$ as split into two parts: one consisting of the forms remaining in the final simple part, and the other which is $I$-matched. For each orthogonal form-ideal we generate, we match subsets of terms. We use Lemma 14 to tell us that we cannot have too many such form-ideals, which leads to the rank bound.
We start by looking at the particular kind of matchings that we get. Take two terms $T_1$ and $T_2$ that survive a round, where we find the form-ideal $I$ generated by $\ell_1,\ldots,\ell_t$. At the end of the first iteration, we add $\ell_1$ to $I$. No form in $\gcd(C\bmod\ell_1)$ can be similar to $\ell_1$. We match some forms in $T_1$ to forms in $T_2$ via $\langle\ell_1\rangle$-matchings. They are removed, and then we proceed to the next iteration. We now match some forms via $\langle\ell_1,\ell_2\rangle$-matchings, and none of these forms are in $\mathrm{sp}(\ell_1,\ell_2)$. So in each iteration, the forms that are matched (and then removed) are nonzero modulo the partial $I$ obtained by that iteration. We formalize this as an ordered matching.
[Ordered matching] Let $L_1, L_2$ be lists of linear forms and let $I = \langle\ell_1,\ldots,\ell_t\rangle$ be a form-ideal given by an ordered set of $t$ linearly independent linear forms. A matching $\pi$ between $L_1, L_2$ by $I$ is called an ordered $I$-matching if:
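Let $\ell_0$ be zero. For all $\ell\in L_1$, $\pi(\ell) = c\ell + v$, where $c\in\mathbb{F}^*$, and $v\in\mathrm{sp}(\ell_0,\ldots,\ell_j)$ for some $0\le j\le t$ satisfying $\ell\notin\mathrm{sp}(\ell_0,\ldots,\ell_j)$.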
We add the zero element $\ell_0$ just to deal with similar forms in $L_1$ and $L_2$. Note that the inverse bijection $\pi^{-1}$ is also an ordered matching between $L_2, L_1$ by $I$. It is also easy to see that if $\pi_1$ and $\pi_2$ are ordered matchings between lists $L_1, L_2$ and lists $L_3, L_4$ respectively by the same ordered form-ideal $I$, then their disjoint union, $\pi_1\sqcup\pi_2$, is an ordered matching between lists $L_1\cup L_3$, $L_2\cup L_4$ by $I$.
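The ordered condition can be tested pair by pair. The sketch below assumes the condition reads as follows (our reading of the definition above): $\pi(\ell) = c\ell + v$ with $c\neq 0$ and $v\in\mathrm{sp}(\ell_0,\ldots,\ell_j)$ for some $j$ such that $\ell\notin\mathrm{sp}(\ell_0,\ldots,\ell_j)$, where $\ell_0 = 0$, so that $j = 0$ covers similar forms. For each prefix span $V$ it checks $\ell\notin V$, $\pi(\ell)\notin V$ and $\pi(\ell)\in V + \mathrm{sp}(\ell)$. Forms are coefficient tuples over $\mathbb{Q}$ and the ordered generators are a Python list `ell` (the zero form $\ell_0$ is implicit); the names are hypothetical.

```python
from fractions import Fraction

def rank(forms):
    """Rank over Q of linear forms given as coefficient tuples."""
    rows = [[Fraction(c) for c in f] for f in forms]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def ordered_pair_ok(l, m, ell):
    """True iff m = c*l + v with c != 0, v in sp(ell_1..ell_j) for some
    0 <= j <= t, and l outside that span (j = 0 means the span {0})."""
    for j in range(len(ell) + 1):
        prefix = list(ell[:j])
        base = rank(prefix)
        l_out = rank(prefix + [l]) > base
        m_out = rank(prefix + [m]) > base
        if l_out and m_out and rank(prefix + [l, m]) == base + 1:
            return True
    return False

def is_ordered_matching(l1, l2, pi, ell):
    """Check that pi (dict: index in l1 -> index in l2) is an ordered matching."""
    return (len(l1) == len(l2)
            and sorted(pi) == list(range(len(l1))) == sorted(pi.values())
            and all(ordered_pair_ok(l1[i], l2[pi[i]], ell) for i in pi))
```

With ordered generators $x, y$ (in variables $x, y, z$), the pairs $z\mapsto z+x$, $y+z\mapsto z$ and $x\mapsto 2x$ are ordered, but $y\mapsto x$ is not: both forms vanish modulo $\langle x, y\rangle$, so the pair is allowed in an ordinary $\langle x,y\rangle$-matching, yet $y$ lies in $\mathrm{sp}(x,y)$ while $x$ already lies in $\mathrm{sp}(x)$.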
We will stick to the notation in Definition 18. For convenience, let . Let , where but