Unbalancing Sets and an Almost Quadratic Lower Bound for Syntactically Multilinear Arithmetic Circuits
We prove a lower bound of $\Omega(n^2/\log^2 n)$ on the size of any syntactically multilinear arithmetic circuit computing some explicit multilinear polynomial $f(x_1, \ldots, x_n)$. Our approach expands and improves upon a result of Raz, Shpilka and Yehudayoff ([RSY08]), who proved a lower bound of $\Omega(n^{4/3}/\log^2 n)$ for the same polynomial. Our improvement follows from an asymptotically optimal lower bound for a generalized version of Galvin’s problem in extremal set theory.
An arithmetic circuit is one of the most natural and standard computational models for computing multivariate polynomials. Such circuits provide a succinct representation of multivariate polynomials, and in some sense, they can be thought of as algebraic analogs of Boolean circuits. Formally, an arithmetic circuit over a field $\mathbb{F}$ and a set of variables $X = \{x_1, \ldots, x_n\}$ is a directed acyclic graph in which every vertex has in-degree either zero or two. The vertices of in-degree zero (called leaves) are labeled by variables in $X$ or elements of $\mathbb{F}$, and the vertices of in-degree two are labeled by either $+$ (called sum gates) or $\times$ (called product gates). A circuit can have one or more vertices of out-degree zero, known as the output gates. The polynomial computed by a vertex in any given circuit is naturally defined in an inductive way (throughout this paper, we use the terms gates and vertices interchangeably): a leaf computes the polynomial which is equal to its label, a sum gate computes the sum of the polynomials computed at its children, and a product gate computes the product of the polynomials computed at its children. The polynomials computed by a circuit are the polynomials computed by its output gates. The size of an arithmetic circuit is the number of vertices in it.
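The inductive definition above can be made concrete with a short sketch. This is only an illustration under our own conventions: a circuit is a hypothetical dictionary mapping gate names to tuples `('var', x)`, `('const', c)`, `('+', a, b)` or `('*', a, b)`, coefficients are Python integers (so we work over the rationals), and a polynomial is a dictionary from monomials (sorted tuples of (variable, exponent) pairs) to coefficients. Each gate is expanded once, following the leaf, sum and product rules.

```python
from collections import defaultdict

def _clean(poly):
    # Drop zero coefficients so that equal polynomials compare equal.
    return {m: c for m, c in poly.items() if c != 0}

def poly_add(p, q):
    out = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            out[mono] += coeff
    return _clean(out)

def mono_mul(m1, m2):
    # Multiply two monomials by adding the exponents of each variable.
    exps = defaultdict(int)
    for var, e in m1 + m2:
        exps[var] += e
    return tuple(sorted(exps.items()))

def poly_mul(p, q):
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[mono_mul(m1, m2)] += c1 * c2
    return _clean(out)

def evaluate_circuit(gates):
    """Return {gate name: polynomial computed at that gate}."""
    memo = {}

    def poly_at(g):
        if g not in memo:
            kind = gates[g]
            if kind[0] == 'var':      # leaf labeled by a variable
                memo[g] = {((kind[1], 1),): 1}
            elif kind[0] == 'const':  # leaf labeled by a field element
                memo[g] = _clean({(): kind[1]})
            elif kind[0] == '+':      # sum gate
                memo[g] = poly_add(poly_at(kind[1]), poly_at(kind[2]))
            else:                     # '*': product gate
                memo[g] = poly_mul(poly_at(kind[1]), poly_at(kind[2]))
        return memo[g]

    return {g: poly_at(g) for g in gates}
```

For example, the four-gate circuit `{'x1': ('var', 'x1'), 'x2': ('var', 'x2'), 's': ('+', 'x1', 'x2'), 'out': ('*', 's', 's')}` has size 4 and computes $x_1^2 + 2x_1x_2 + x_2^2$ at `out`. Explicit expansion like this is only feasible for tiny circuits, since the number of monomials can grow exponentially with the size.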
It is not hard to show (see, e.g., [CKW11]) that, with overwhelmingly high probability, a random polynomial of degree $n$ in $n$ variables cannot be computed by an arithmetic circuit of size polynomial in $n$. A fundamental problem in this area of research is to prove a similar super-polynomial lower bound for an explicit polynomial family. Unfortunately, the problem remains wide open, and the best lower bound currently known for general arithmetic circuits (in the rest of the paper, when we say a lower bound, we always mean it for an explicit polynomial family) is an $\Omega(n \log n)$ lower bound due to Strassen [Str73] and Baur and Strassen [BS83] from more than three decades ago. The absence of substantial progress on this general question has led to a focus on proving better lower bounds for restricted and more structured subclasses of arithmetic circuits. Arithmetic formulas [Kal85], non-commutative arithmetic circuits [Nis91], algebraic branching programs [Kum17], and low depth arithmetic circuits [NW97, GK98, GR00, Raz10, GKKS14, FLMS14, KLSS14, KS14, KS17] are some such subclasses which have been studied from this perspective. For an overview of the definitions of these models and the state of the art for lower bounds for them, we refer the reader to the surveys of Shpilka and Yehudayoff [SY10] and Saptharishi [Sap16].
Several of the most important polynomials in algebraic complexity and in mathematics in general are multilinear. Notable examples include the determinant, the permanent, and the elementary symmetric polynomials. Therefore, one subclass which has received a lot of attention in the last two decades and will be the focus of this paper is the class of multilinear arithmetic circuits.
1.1 Multilinear arithmetic circuits
For an arithmetic circuit $\Phi$ and a vertex $v$ in $\Phi$, we denote by $X_v$ the set of variables $x_i$ such that there is a directed path from a leaf labeled by $x_i$ to $v$; in this case, we also say that $v$ depends on $x_i$. (We remark that this is a syntactic notion of dependency, since it is possible that every monomial containing $x_i$ gets canceled in the intermediate computation and does not eventually appear in the polynomial computed at $v$.) A polynomial $P$ is said to be multilinear if the individual degree of every variable in $P$ is at most one.
An arithmetic circuit $\Phi$ is said to be syntactically multilinear if for every multiplication gate $v$ in $\Phi$ with children $u$ and $w$, the sets of variables $X_u$ and $X_w$ are disjoint. We say that $\Phi$ is semantically multilinear if the polynomial computed at every vertex is multilinear. Observe that if $\Phi$ is syntactically multilinear, then it is also semantically multilinear. However, it is not clear whether every semantically multilinear circuit can be efficiently simulated by a syntactically multilinear circuit.
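Syntactic multilinearity is a purely structural property, so it can be checked without expanding any polynomial. A minimal sketch, assuming the same hypothetical gate encoding as a dictionary of `('var', x)`, `('const', c)`, `('+', a, b)`, `('*', a, b)` tuples: compute the variable set of every gate by taking unions bottom-up, then require disjoint children at every product gate.

```python
def variable_sets(gates):
    """X_v for every gate v: the variables with a directed path to v."""
    memo = {}

    def vars_at(g):
        if g not in memo:
            kind = gates[g]
            if kind[0] == 'var':
                memo[g] = frozenset([kind[1]])
            elif kind[0] == 'const':
                memo[g] = frozenset()
            else:  # '+' or '*': union of the children's variable sets
                memo[g] = vars_at(kind[1]) | vars_at(kind[2])
        return memo[g]

    return {g: vars_at(g) for g in gates}

def is_syntactically_multilinear(gates):
    """True iff every product gate has children with disjoint variable sets."""
    X = variable_sets(gates)
    return all(X[k[1]].isdisjoint(X[k[2]])
               for k in gates.values() if k[0] == '*')
```

The circuit computing $x_1 \cdot (x_1 + (-1)\cdot x_1)$ shows the gap between the two notions: every gate computes a multilinear polynomial (the output is $0$), yet the output gate multiplies two subcircuits that both depend on $x_1$, so the check returns `False`.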
A multilinear circuit is a natural model for computing multilinear polynomials, but it is not necessarily the most efficient one. Indeed, it is remarkable that all the constructions of polynomial size arithmetic circuits for the determinant [Csa76, Ber84, MV97], which are fundamentally different from one another, nevertheless share the property of being non-multilinear, namely, they involve non-multilinear intermediate computations which eventually cancel out. There are no subexponential-size multilinear circuits known for the determinant, and one may very well conjecture that these do not exist at all.
Multilinear circuits were first studied by Nisan and Wigderson [NW97]. Subsequently, Raz [Raz09] defined the notion of multilinear formulas (for formulas, it is known that syntactic multilinearity and semantic multilinearity are equivalent; see, e.g., [Raz09]) and showed that any multilinear formula computing the determinant or the permanent of an $n \times n$ variable matrix must have super-polynomial size. In a follow-up work [Raz06], Raz further strengthened the results in [Raz09] and showed that there is a family of multilinear polynomials in $n$ variables which can be computed by $\mathrm{poly}(n)$-size syntactically multilinear arithmetic circuits but require multilinear formulas of size $n^{\Omega(\log n)}$.
Building on the ideas and techniques developed in [Raz09], Raz and Yehudayoff [RY09] showed an exponential lower bound for syntactically multilinear circuits of constant depth. Interestingly, they also showed a super-polynomial separation between depth $\Delta$ and depth $\Delta + 1$ syntactically multilinear circuits for constant $\Delta$.
In spite of the aforementioned progress on the question of lower bounds for multilinear formulas and bounded depth syntactically multilinear circuits, there were no lower bounds of the form $\Omega(n^{1+\varepsilon})$ known for general syntactically multilinear circuits, for any constant $\varepsilon > 0$. In fact, the results in [Raz06] show that the main technical idea underlying the results in [Raz09, Raz06, RY09] is unlikely to directly give a super-polynomial lower bound for general syntactically multilinear circuits. However, a weaker super-linear lower bound still seemed conceivable via similar techniques.
Raz, Shpilka and Yehudayoff [RSY08] showed that this is indeed the case. By a sophisticated and careful application of the techniques in [Raz09] along with several additional ideas, they established an $\Omega(n^{4/3}/\log^2 n)$ lower bound for an explicit $n$-variate polynomial. Since then, this has remained the best lower bound known for syntactically multilinear circuits. In this paper, we improve this result by showing an almost quadratic lower bound for syntactically multilinear circuits for an explicit $n$-variate polynomial. In fact, the family of hard polynomials in this paper is the same as the one used in [RSY08]. We now formally state our result.
Theorem 1.1. There is an explicit family of polynomials $\{f_n\}_{n \in \mathbb{N}}$, where $f_n$ is an $n$-variate multilinear polynomial, such that any syntactically multilinear arithmetic circuit computing $f_n$ must have size at least $\Omega(n^2/\log^2 n)$.
What is the minimal integer for which there is a family of subsets , each satisfying such that for every , there exists an with ?
Raz, Shpilka and Yehudayoff [RSY08] showed that . For our proof, we show that .
In addition to its application to the proof of Theorem 1.1, Question 1.2 seems to be a natural problem in extremal combinatorics that might be of independent interest, and special cases of it were studied in the combinatorics literature. In the next section, we briefly discuss the state of the art of this question and state our main technical result about it in Theorem 1.3.
1.2 Unbalancing Sets
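The following question, which is of very similar nature to Question 1.2, is known as Galvin’s problem (see [FR87, EFIN87]): What is the minimal integer $m = m(n)$ for which there exists a family of subsets $A_1, \ldots, A_m \subseteq [4n]$, each of size $2n$, such that for every subset $T \subseteq [4n]$ of size $2n$ there exists some $i \in [m]$ such that $|A_i \cap T| = n$?
It is not hard to show that $m(n) \le 2n$. Indeed, let $A_i = \{i, i+1, \ldots, i+2n-1\}$ for $i \in [2n+1]$, and let $f(i) = |A_i \cap T| - |A_i \setminus T|$. Then $f(i) = 2|A_i \cap T| - 2n$ is always an even integer, $f(i+1) - f(i) \in \{-2, 0, 2\}$ for every $i \in [2n]$ (since $A_{i+1}$ is obtained from $A_i$ by removing $i$ and adding $i+2n$), and $f(2n+1) = -f(1)$, since $A_{2n+1} = [4n] \setminus A_1$ and $|T| = 2n$. By a discrete version of the intermediate value theorem, it follows that there exists $i \in [2n+1]$ such that $f(i) = 0$, which implies that exactly $n$ elements of $A_i$ belong to $T$. Moreover, since $f(2n+1) = 0$ if and only if $f(1) = 0$, we may take $i \in [2n]$. Thus, the family $A_1, \ldots, A_{2n}$ satisfies this property.
As for lower bounds, a counting argument shows that $m(n) = \Omega(\sqrt{n})$, since for each fixed $A$ of size $2n$ and random $T$ of size $2n$,
$$\Pr\big[|A \cap T| = n\big] = \frac{\binom{2n}{n}^2}{\binom{4n}{2n}} = \Theta\!\left(\frac{1}{\sqrt{n}}\right),$$
so by a union bound, a family that splits every such $T$ must satisfy $m \cdot \Theta(1/\sqrt{n}) \ge 1$.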
Frankl and Rödl [FR87] were able to show that $m(n) \ge \varepsilon n$ for some $\varepsilon > 0$ if $n$ is odd, and Enomoto, Frankl, Ito and Nomura [EFIN87] proved that $m(n) = 2n$ if $n$ is odd, which implies that even the constant in the construction given above is optimal. Until this work, the question was still open for even values of $n$: in fact, Markert and West (unpublished, see [EFIN87]) showed that for $n \in \{2, 4\}$, $m(n) < 2n$.
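The construction and the counting bound for Galvin’s problem are easy to check by brute force for small $n$. The sketch below is our own illustration (function names are hypothetical, and sets are 0-indexed, so the windows live in $\{0, \ldots, 4n-1\}$). It verifies that the $2n$ windows of length $2n$ split every $2n$-subset exactly in half for $n \le 4$, checks that dropping one window from the $n = 3$ family breaks the property (consistent with the result of [EFIN87] for odd $n$), and computes the probability $\binom{2n}{n}^2/\binom{4n}{2n}$ used in the union bound.

```python
from itertools import combinations
from math import comb, sqrt, pi

def galvin_family(n):
    """Windows A_i = {i, ..., i+2n-1} for i = 0, ..., 2n-1 inside {0, ..., 4n-1}."""
    return [set(range(i, i + 2 * n)) for i in range(2 * n)]

def splits_every_half(n, family):
    """True iff every T of size 2n meets some set of the family in exactly n elements."""
    for T in combinations(range(4 * n), 2 * n):
        T = set(T)
        if not any(len(A & T) == n for A in family):
            return False
    return True

def hit_probability(n):
    """Pr[|A ∩ T| = n] for a fixed A of size 2n and a uniform T of size 2n in [4n]."""
    return comb(2 * n, n) ** 2 / comb(4 * n, 2 * n)
```

Since every $T$ must be split by some set, the union bound forces the family size to be at least `1 / hit_probability(n)`, which grows like $\sqrt{\pi n/2}$; for instance, $\sqrt{1000} \cdot$ `hit_probability(1000)` is within $10^{-3}$ of $\sqrt{2/\pi}$.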
For our purposes, we need to generalize Galvin’s problem in two ways. The first is to lift the restriction on the set sizes. The second is to ask how small the family can be if we merely assume that each balanced partition is “$c$-balanced” on some $A_i$, namely, that $\big||A_i \cap T| - |A_i \setminus T|\big| \le c$ for some $i$ (the main case of interest for us is $c = O(\log n)$). Of course, since $[n]$ itself is balanced, very small or very large sets are always $c$-balanced, and thus we impose the (tight) non-triviality condition $c < |A_i| < n - c$ for every $i$.
Once again, by defining ( is always assumed to be even), the family gives a construction of size such that every balanced partition is -balanced on some .
It is natural to conjecture that, perhaps up to a constant, this construction is optimal. Indeed, this is what we prove here.
Let be any large enough even number, and let be an integer. Let be sets such that for all , . Further, assume that for every of size there exists such that . Then, .
In particular, Theorem 1.3 proves a linear lower bound for the original problem of Galvin, even when the universe size is of the form $4n$ for even $n$.
We remark that the relevance of problems of this form to lower bounds in algebraic complexity was also observed by Jansen [Jan08], who considered the problem of obtaining a lower bound on homogeneous syntactically multilinear algebraic branching programs (a weaker model than syntactically multilinear circuits), and essentially proposed Theorem 1.3 as a conjecture. In fact, a special case of this theorem (see Theorem 3.1), which has a simpler proof, is already enough to derive the improved lower bounds for syntactically multilinear circuits.
Alon, Bergmann, Coppersmith and Odlyzko [ABCO88] considered a very similar problem of balancing -vectors: they studied families of vectors such that for , which satisfy the properties that for every (not necessarily balanced), there exists such that . They generalized a construction of Knuth [Knu86] and proved a matching lower bound which together showed that is both necessary and sufficient for such a set to exist. Galvin’s problem seems like “the version” of the same problem, but, to quote from [ABCO88], there does not seem to be any simple dependence between the problems.
1.3 Proof overview
In this section, we discuss the main ideas and give a brief sketch of the proofs of Theorem 1.1 and Theorem 1.3. Since our proof heavily depends on the proof in [RSY08] and follows the same strategy, we start by revisiting the main steps in their proof and noting the key differences between the proof in [RSY08] and our proof. We also outline the reduction to the combinatorial problem of unbalancing set families in Question 1.2.
Proof sketch of [RSY08]
The proof in [RSY08] starts by proving a syntactically multilinear analog of a classical result of Baur and Strassen [BS83], which shows that if an $n$-variate polynomial $f$ is computable by an arithmetic circuit $\Phi$ of size $s$, then there is an arithmetic circuit $\Psi$ of size $O(s)$ with $n$ outputs such that the $i$-th output gate of $\Psi$ computes $\partial f/\partial x_i$. Raz, Shpilka and Yehudayoff show that if $\Phi$ is syntactically multilinear, then the circuit $\Psi$ continues to be syntactically multilinear. Additionally, there is no directed path from a leaf labeled by $x_i$ to the output gate computing $\partial f/\partial x_i$. (See Theorem 4.2 for a formal statement.)
Once we have this structural result, it suffices to prove a lower bound on the size of $\Psi$. For brevity, we denote the subcircuit of $\Psi$ rooted at the output gate computing $\partial f/\partial x_i$ by $\Psi_i$. As a key step of the proof in [RSY08], the authors identify certain sets of vertices $U_1, \ldots, U_n$ in $\Psi$ with the following properties.
For every $i \in [n]$, $U_i$ is a subset of vertices in $\Psi_i$.
For every and , the number of such that is not too large (at most ).
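Algorithmically, the Baur-Strassen step is reverse-mode differentiation: one forward pass computes the value at every gate, and one backward pass accumulates, for every gate, the derivative of the output with respect to that gate, so all $n$ partial derivatives cost $O(s)$ arithmetic operations. The sketch below illustrates this at the level of evaluation at a point rather than circuit construction; the gate encoding and the topological `order` (children before parents, output last) are our own assumptions, and it does not reproduce the syntactically multilinear construction of [RSY08].

```python
def value_and_gradient(gates, order, point):
    """gates: name -> ('var', x) | ('const', c) | ('+', a, b) | ('*', a, b);
    order: gate names in topological order with the output gate last;
    point: variable -> value. Returns (output value, {variable: partial derivative})."""
    val = {}
    for g in order:  # forward pass: value at every gate
        k = gates[g]
        if k[0] == 'var':
            val[g] = point[k[1]]
        elif k[0] == 'const':
            val[g] = k[1]
        elif k[0] == '+':
            val[g] = val[k[1]] + val[k[2]]
        else:
            val[g] = val[k[1]] * val[k[2]]
    adj = {g: 0 for g in order}  # adj[g] = d(output) / d(value at g)
    adj[order[-1]] = 1
    for g in reversed(order):  # backward pass: chain rule, O(1) work per gate
        k = gates[g]
        if k[0] == '+':
            adj[k[1]] += adj[g]
            adj[k[2]] += adj[g]
        elif k[0] == '*':
            adj[k[1]] += adj[g] * val[k[2]]
            adj[k[2]] += adj[g] * val[k[1]]
    grad = {}
    for g in order:  # collect adjoints of the variable leaves
        if gates[g][0] == 'var':
            x = gates[g][1]
            grad[x] = grad.get(x, 0) + adj[g]
    return val[order[-1]], grad
```

For $f = x_1x_2 + x_2x_3$ at $(2, 3, 5)$ this returns the value $21$ and the partial derivatives $3$, $7$ and $3$. For a multilinear $f$, the derivative returned for $x_i$ does not change when only $x_i$ is modified, which is the semantic counterpart of the absence of a path from $x_i$ to the output computing $\partial f/\partial x_i$.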
Observe that at this point, showing a lower bound of on the size of each implies a lower bound of on the size of and hence . In [RSY08], the authors show that there is an explicit such that each must have size at least , thereby getting a lower bound of on the size of .
For our proof, we follow precisely this high level strategy. Our improvement in the lower bound comes from showing that each must be of size at least and not just as shown in [RSY08]. We now elaborate further on the main ideas in this step in [RSY08] and the differences with the proofs in this paper.
We start with some intuition into the definition of the sets in [RSY08]. Consider a vertex in which depends on at least variables. Without loss of generality, let these variables be . From item 4 in Theorem 4.2, we know that the variable does not appear in the subcircuit . Therefore, the vertex cannot appear in the subcircuits . So, if we define the set as the set of vertices in which depend on at least variables, then must be disjoint from vertices in at least of the subcircuits . Picking would give us the desired property. So, if we can prove a lower bound on the size of the set , we would be done. However, the definition of the set so far turns out to be too general, and we do not know a way of directly proving a lower bound on its size. (Indeed, it is not even immediately clear if the has any other gates apart from the output gate of .)
To circumvent this obstacle, [RSY08] define the set (called the upper leveled gates in ) as the set of all vertices in which depend on at least variables and have a child which depends on more than variables and less than variables. This additional structure is helpful in proving a lower bound on the size of . We now discuss this in some more detail.
For every , let be the set of vertices in , such that , and has a parent in . These gates are referred to as lower leveled gates. Observe that , since the in-degree of every vertex in is at most . The key structural property of the set is the following (see Proposition 5.5 in [RSY08]).
Let , and let be the polynomials computed by the gates in . Then, there exist multilinear polynomials such that
For every , and are variable disjoint.
The degree of is at most .
Observe that Equation 1.5 is basically a decomposition of a potentially-hard polynomial in terms of the sum of products of multilinear polynomials in an intermediate number of variables. The goal is to show that for an appropriate explicit , the number of summands on the right hand side of Equation 1.5 cannot be too small. A similar scenario also appears in the multilinear formula lower bounds and bounded depth multilinear formula lower bounds of [Raz09, Raz06, RY09] (albeit with some key differences). Hence, a natural approach at this point would be to use the tools in [Raz09, Raz06, RY09], namely the rank of the partial derivative matrix, to attempt to prove this lower bound. We refer the reader to Section 2.2 for the definitions and properties of the partial derivative matrix and proceed with the overview. For each , let the polynomial in 1.4 depend on the variables . The key technical step in the rest of the proof is to show that there is a partition of the set of variables into and such that and for every , . In [RSY08], the authors show that there is an absolute constant such that if , then there is an equipartition of which unbalances all the sets by at least . Our key technical contribution (Theorem 1.3) in this paper is to show that as long as , there is an equipartition which unbalances all the ’s by at least . This implies an on the size of each set , and thus an lower bound on the circuit size.
Before we dive into a more detailed discussion of the overview and main ideas in the proof of Theorem 1.3 in the next section, we would like to remark that the lower bound question in Equation 1.5 seems to be a trickier question than what is encountered while proving multilinear formula lower bounds [Raz09, Raz06] or bounded depth syntactically multilinear circuit lower bounds [RY09]. The main differences are that in the proofs in [Raz09, Raz06, RY09], the sets have a stronger guarantee on their size (at least and at most ), and each of the summands on the right has many variable disjoint factors and not just two factors as in Equation 1.5. For instance, in the formula lower bound proofs the number of variable disjoint factors in each summand on the right is , and for constant depth circuit lower bounds it is . Together, these properties make it possible to show much stronger lower bounds on . In particular, it is known that a random equipartition works for these two applications, in the sense that it unbalances sufficiently many factors in each summand, thereby implying that the rank of the partial derivative matrix of the polynomial is small. Hence, for an appropriate choice of the polynomial (chosen so that its partial derivative matrix has full rank for every equipartition), the number of summands must be large. However, since a set of size is balanced under a random equipartition with probability and the identity in Equation 1.5 involves just two variable disjoint factors, taking a random equipartition would not enable us to prove any meaningful bounds.
Proof sketch of Theorem 1.3
Recall that our task is, given a small collection of subsets of , to find a balanced partition which is unbalanced on each of the sets. Equivalently, we would like to prove that if is a family of subsets such that every balanced partition balances at least one set in , then must be large (of course, must satisfy the conditions in Theorem 1.3).
We first sketch the proof of a special case (which suffices for the main application here; see Theorem 3.1), in which $n/4$ is a prime. For the sake of simplicity, suppose also that all subsets are of even size, and assume further that for every subset $T \subseteq [n]$ of size $n/2$ there exists $S_i$ such that $T$ completely balances $S_i$, namely, $|S_i \cap T| = |S_i|/2$. One possible approach to obtaining lower bounds on $k$ is via an application of the polynomial method as done, for example, in [ABCO88]. Define the following polynomial over, say, the rationals:
$$P(x_1, \ldots, x_n) = \prod_{i=1}^{k} \Big( \sum_{j \in S_i} x_j - \frac{|S_i|}{2} \Big).$$
By the assumption on $S_1, \ldots, S_k$, the polynomial $P$ evaluates to $0$ over all points in $\{0,1\}^n$ with Hamming weight exactly $n/2$. We can also argue, using the assumption on the set sizes, that $P$ is not identically zero, and clearly $\deg(P) \le k$. Thus, a lower bound on $\deg(P)$ translates to a lower bound on $k$.
This idea, however, seems like a complete nonstarter, since there exists a non-zero polynomial of degree $1$ which evaluates to $0$ over the middle layer of $\{0,1\}^n$, namely, $\sum_{i=1}^{n} x_i - n/2$.
A very clever solution to this potential obstacle was found by Hegedűs [Heg10]. Suppose $n = 4p$ for some prime $p$. The main insight in [Heg10] is to consider the polynomial $P$ over $\mathbb{F}_p$, and to add the requirement that there exists some $a \in \{0,1\}^n$ of Hamming weight exactly $p$ such that $P(a) \neq 0$. This requirement rules out the trivial example $\sum_{i=1}^{n} x_i - n/2$, which over $\mathbb{F}_p$ also vanishes at every point of weight $p$ (since $p - 2p \equiv 0 \pmod p$), and Hegedűs was able to show that the degree of any polynomial with these properties must be at least $p$ (see Lemma 2.1 for the complete statement).
We are thus left with the task of proving that our polynomial $P$ evaluates to a non-zero value over some point of Hamming weight $p$. This turns out to be not very hard to show, assuming each set is of size at least, say, $C \log n$ and at most $n - C \log n$ for a suitable constant $C$, by choosing a random such vector $a$. Indeed, it is not surprising that it is much easier to directly show that a highly unbalanced partition of $[n]$ (into $n/4$ vs. $3n/4$ elements) unbalances all the sets $S_i$. (In our case, we need to argue that the imbalance is non-zero modulo $p$, which adds an extra layer of complication, although again, one which is not hard to solve.)
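A small brute-force check illustrates both the obstacle and Hegedűs’s fix. The sketch below is our own illustration (helper names are hypothetical): it takes $p = 3$, $n = 4p = 12$, and, as the sets $S_i$, the $2p$ windows of length $2p$ from Galvin’s construction. Over $\mathbb{F}_p$, the linear polynomial $\sum_i x_i - n/2$ vanishes at every point of weight $p$ as well as $2p$, whereas the product polynomial vanishes on the middle layer but not at the weight-$p$ point supported on $\{0, 4, 8\}$, so the hypothesis of Lemma 2.1 holds for it and forces degree at least $p$ (here its degree is $2p$).

```python
from itertools import combinations

def points_of_weight(n, w):
    """All 0/1 vectors of length n with exactly w ones."""
    for ones in combinations(range(n), w):
        x = [0] * n
        for j in ones:
            x[j] = 1
        yield x

def trivial_poly_mod_p(x, p):
    # sum_i x_i - n/2 with n = 4p, reduced modulo p
    return (sum(x) - 2 * p) % p

def product_poly_mod_p(x, p):
    # prod_i (sum_{j in A_i} x_j - |A_i|/2) mod p, with A_i = {i, ..., i+2p-1}, i < 2p
    value = 1
    for i in range(2 * p):
        value = value * (sum(x[i:i + 2 * p]) - p) % p
    return value
```

The point $a$ with support $\{0, 4, 8\}$ meets the six windows in $2, 1, 1, 2, 2, 1$ elements, none of which is $\equiv p \equiv 0 \pmod 3$, which is exactly the “imbalance non-zero modulo $p$” condition that the random choice of $a$ has to guarantee in general.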
As mentioned earlier, the case and in Theorem 1.3 is considerably easier to prove and suffices for the application to circuit lower bounds. Proving this theorem for every even and every requires further technical ideas. We postpone this discussion to Section 3.2.
Even though Lemma 2.1 seems to be a fundamental statement about polynomials over finite fields and could conceivably have an elementary proof, the proof in [Heg10] uses more advanced techniques. It relies on the description of Gröbner bases for ideals of polynomials in $\mathbb{F}[x_1, \ldots, x_n]$ which vanish on all points of $\{0,1\}^n$ of a fixed Hamming weight. A complete description of the reduced Gröbner basis for such ideals was given by Hegedűs and Rónyai [HR03], and their proof builds on a number of earlier partial results [ARS02, FG06] on this problem.
Organization of the paper
In the rest of the paper, we set up some notation and discuss some preliminary notions in Section 2, prove Theorem 1.3 in Section 3 and complete the proof of Theorem 1.1 in Section 4. Throughout the paper we assume, whenever this is needed, that $n$ is sufficiently large, and we make no attempt to optimize the absolute constants.
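For $n \in \mathbb{N}$, we denote $[n] = \{1, \ldots, n\}$. For a prime $p$, we denote by $\mathbb{F}_p$ the finite field with $p$ elements. For two integers $a, b$ with $a \le b$, we denote $[a, b] = \{a, a+1, \ldots, b\}$. The characteristic vector of a set $S \subseteq [n]$ is denoted by $\mathbf{1}_S \in \{0,1\}^n$.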
As is standard, $\binom{[n]}{k}$ denotes the family $\{S \subseteq [n] : |S| = k\}$.
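For an even $n$ and $\sigma : [n] \to \{-1, 1\}$ such that $\sum_{i \in [n]} \sigma(i) = 0$, we call $\sigma$ a balanced partition of $[n]$, with the implied meaning that $\sigma$ partitions $[n]$ evenly into $\sigma^{-1}(1)$ and $\sigma^{-1}(-1)$. The imbalance of a set $S \subseteq [n]$ under $\sigma$ is $\sigma(S) = \sum_{i \in S} \sigma(i)$. Observe the useful symmetry $\sigma([n] \setminus S) = -\sigma(S)$, which follows from the fact that $\sigma([n]) = 0$. We say $S$ is $c$-unbalanced under $\sigma$ if $|\sigma(S)| > c$.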
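These definitions translate directly into code. The sketch below assumes that a balanced partition is stored as a Python list of $\pm 1$ values with zero sum, that the imbalance of $S$ is the sum of $\sigma$ over $S$, and that “$c$-unbalanced” means an imbalance of absolute value larger than $c$; the helper names are hypothetical. It also brute-forces the complement symmetry for a small example.

```python
from itertools import combinations

def is_balanced_partition(sigma):
    """sigma is a list of +1/-1 values; it is balanced iff the values sum to 0."""
    return all(s in (1, -1) for s in sigma) and sum(sigma) == 0

def imbalance(sigma, S):
    """sigma(S): the sum of sigma(i) over i in S."""
    return sum(sigma[i] for i in S)

def is_c_unbalanced(sigma, S, c):
    return abs(imbalance(sigma, S)) > c

def complement_symmetry_holds(sigma):
    """Check sigma([n] minus S) == -sigma(S) for every S (exponential; small n only)."""
    n = len(sigma)
    for r in range(n + 1):
        for S in combinations(range(n), r):
            rest = [i for i in range(n) if i not in S]
            if imbalance(sigma, rest) != -imbalance(sigma, S):
                return False
    return True
```

The symmetry check succeeds exactly because the total sum is zero; for a non-balanced $\sigma$ such as `[1, 1, 1, -1]` it already fails at $S = \emptyset$.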
We use the following lemma from [Heg10].
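Lemma 2.1. Let $p$ be a prime, and let $P \in \mathbb{F}_p[x_1, \ldots, x_{4p}]$ be a polynomial. Suppose that for all $x \in \{0,1\}^{4p}$ of Hamming weight $2p$, it holds that $P(x) = 0$, and that there exists $a \in \{0,1\}^{4p}$ of Hamming weight $p$ such that $P(a) \neq 0$. Then $\deg(P) \ge p$.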
2.1 Hypergeometric distribution
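For parameters $N, K, m \in \mathbb{N}$, where $K, m \le N$, we denote by $\mathcal{H}(N, K, m)$ the distribution of $|S \cap T|$, where $S$ is any fixed subset of $[N]$ of size $K$, and $T$ is a uniformly random subset of $[N]$ of size equal to $m$. Clearly,
$$\Pr\big[|S \cap T| = i\big] = \frac{\binom{K}{i}\binom{N-K}{m-i}}{\binom{N}{m}}.$$
The expected value of $|S \cap T|$ under this distribution is equal to $Km/N$. We need the following tail bound for the hypergeometric distribution in our proof.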
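The hypergeometric probabilities and their mean can be checked exactly with rational arithmetic. In this sketch the parameter names $N$ (universe size), $K$ (size of the fixed set) and $m$ (size of the random set) are our own labels, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, K, m, i):
    """Pr[|S ∩ T| = i] for a fixed S of size K and a uniform m-subset T of [N]."""
    if i < 0 or i > min(K, m):
        return Fraction(0)
    # math.comb returns 0 when m - i > N - K, so impossible values get probability 0.
    return Fraction(comb(K, i) * comb(N - K, m - i), comb(N, m))

def hypergeom_mean(N, K, m):
    return sum(i * hypergeom_pmf(N, K, m, i) for i in range(min(K, m) + 1))
```

For example, with $N = 20$, $K = 7$, $m = 9$ the probabilities sum to exactly $1$ and the mean is exactly $63/20 = Km/N$.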
Let , and be as defined above. Then, for every
Lemma (Hoeffding’s inequality, [AS16]).
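Let $X_1, \ldots, X_n$ be independent random variables taking values in $[0, 1]$, and let $X = \sum_{i=1}^{n} X_i$. Then, for every $t > 0$,
$$\Pr\big[|X - \mathbb{E}[X]| \ge t\big] \le 2e^{-2t^2/n}.$$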
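As a sanity check of the two-sided form $\Pr[|X - \mathbb{E}[X]| \ge t] \le 2e^{-2t^2/n}$ (we assume this standard form of the statement), the sketch below compares it with the exact tail of a sum of $n$ independent fair $\{0,1\}$ coin flips, computed with `math.comb`. Function names are hypothetical.

```python
from math import comb, exp

def exact_two_sided_tail(n, t):
    """Pr[|X - n/2| >= t] for X ~ Binomial(n, 1/2), computed exactly and returned as a float."""
    hits = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= t)
    return hits / 2 ** n

def hoeffding_bound(n, t):
    return 2 * exp(-2 * t * t / n)
```

For $n = 100$ the bound holds for every integer $t$ between $0$ and $50$; it is loose by a constant factor in the exponent for fair coins, which is harmless for the union-bound arguments it is used in.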
2.2 Partial derivative matrix
For a circuit $\Phi$, we denote by $|\Phi|$ the size of $\Phi$, namely, the number of gates in it. For a gate $v$, we denote by $X_v$ the set of variables that occur in the subcircuit rooted at $v$.
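Let $X$ be a set of variables, let $Y \subseteq X$ (not necessarily of size $|X|/2$) and let $Z = X \setminus Y$. For a multilinear polynomial $f \in \mathbb{F}[X]$, we define the partial derivative matrix of $f$ with respect to $(Y, Z)$, denoted $M_{Y,Z}(f)$, as follows: the rows of $M_{Y,Z}(f)$ are indexed by multilinear monomials in $Y$, and the columns of $M_{Y,Z}(f)$ are indexed by multilinear monomials in $Z$. The entry which corresponds to $(m_1, m_2)$ is the coefficient of the monomial $m_1 \cdot m_2$ in $f$. We define $\mathrm{rank}_{Y,Z}(f) = \mathrm{rank}(M_{Y,Z}(f))$.
The following properties of the partial derivative matrix are easy to prove and well-documented (see, e.g., [RSY08]).
The following properties hold:
For every multilinear polynomial $f \in \mathbb{F}[X]$ and every partition $X = Y \cup Z$, $\mathrm{rank}_{Y,Z}(f) \le 2^{\min(|Y|, |Z|)}$.
For every two multilinear polynomials $f, g \in \mathbb{F}[X]$ and for every partition $X = Y \cup Z$, $\mathrm{rank}_{Y,Z}(f + g) \le \mathrm{rank}_{Y,Z}(f) + \mathrm{rank}_{Y,Z}(g)$.
Let $f$ and $g$ be multilinear polynomials such that $X_f \cap X_g = \emptyset$, where $X_h$ denotes the set of variables on which $h$ depends. Let $Y_h = Y \cap X_h$ and $Z_h = Z \cap X_h$ for $h \in \{f, g\}$. Set $q = f \cdot g$. Then $\mathrm{rank}_{Y,Z}(q) = \mathrm{rank}_{Y_f, Z_f}(f) \cdot \mathrm{rank}_{Y_g, Z_g}(g)$.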
Let be a multilinear polynomial such that and . Suppose , and let for some . Then .
Let be a multilinear polynomial of total degree . Then for every partition such that , .
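The partial derivative matrix and its basic properties are easy to experiment with on small examples. In this sketch a multilinear polynomial is a hypothetical dictionary from monomials (frozensets of variable names) to rational coefficients, and the rank is computed by exact Gauss-Jordan elimination over $\mathbb{Q}$. The example checks that the rank of a product of variable-disjoint polynomials is the product of their ranks.

```python
from fractions import Fraction
from itertools import combinations

def monomials(variables):
    """All multilinear monomials over `variables`, as frozensets."""
    vs = sorted(variables)
    return [frozenset(c) for r in range(len(vs) + 1) for c in combinations(vs, r)]

def pd_matrix(f, Y, Z):
    """Rows: monomials in Y; columns: monomials in Z; entry = coefficient of m1*m2 in f."""
    return [[Fraction(f.get(m1 | m2, 0)) for m2 in monomials(Z)] for m1 in monomials(Y)]

def rank(M):
    """Rank over the rationals via Gauss-Jordan elimination."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def multiply_disjoint(f, g):
    """Product of two multilinear polynomials on disjoint sets of variables."""
    out = {}
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            if m1 & m2:
                raise ValueError("polynomials are not variable disjoint")
            out[m1 | m2] = out.get(m1 | m2, 0) + c1 * c2
    return out
```

For instance, $g = x_1y_1 + 1$ and $h = x_2 + y_2$ each have rank $2$ with respect to their own parts of $Y = \{x_1, x_2\}$ and $Z = \{y_1, y_2\}$, and $g \cdot h$ has rank $4 = 2^{\min(|Y|, |Z|)}$, the maximum possible.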
3 Unbalancing sets under a balanced partition
In this section, we prove Theorem 1.3. We start by proving a special case (see Theorem 3.1 below) when equals for some prime , and . This special case already suffices for the application to the proof of Theorem 1.1 (for infinitely many values of ), and has a somewhat simpler proof. We then move on to prove the case for general and , which while being similar to the proof of Theorem 3.1, needs some additional ideas and care.
3.1 Special case : and
Let be a large enough prime, and let . Let be sets such that for all , . Further, assume that for every balanced partition of there exists such that . Then, .
We start with the following lemma, which shows that a small collection of sets can be unbalanced (modulo ) by a partition which is very unbalanced.
Let be a large enough prime, and let . Let be sets such that for all , . Assume further . Then, there exists , such that for all and for all , .
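To prove Lemma 3.2, we use the following two technical claims. Let $\mathcal{D}$ denote the probability distribution on subsets of $[n]$ obtained by putting each $i \in [n]$ in the set with probability $1/4$, independently of all other elements.
For a random set $T \sim \mathcal{D}$, $\Pr[|T| = n/4] = \Theta(1/\sqrt{n})$.
The probability that $|T| = n/4$ is given by $\binom{n}{n/4}(1/4)^{n/4}(3/4)^{3n/4}$, which is $\Theta(1/\sqrt{n})$, by Stirling’s approximation. ∎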
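The $\Theta(1/\sqrt{n})$ behaviour can be seen numerically. Assuming $\mathcal{D}$ includes each of the $n = 4p$ elements independently with probability $1/4$, the size $|T|$ has variance $n \cdot \frac14 \cdot \frac34 = 3n/16$, so the local limit theorem predicts $\sqrt{p} \cdot \Pr[|T| = p] \to \sqrt{2/(3\pi)} \approx 0.46$. The sketch below (hypothetical names) evaluates the exact probability in log-space via `math.lgamma`, our own choice to avoid overflow for large $p$.

```python
from math import lgamma, log, exp, sqrt, pi

def prob_weight_exactly_p(p):
    """Pr[|T| = p] when each of n = 4p elements joins T independently with prob. 1/4.
    Computed in log-space: log C(n, p) + p*log(1/4) + (n - p)*log(3/4)."""
    n = 4 * p
    log_binom = lgamma(n + 1) - lgamma(p + 1) - lgamma(n - p + 1)
    return exp(log_binom + p * log(0.25) + (n - p) * log(0.75))

# Predicted limit of sqrt(p) * Pr[|T| = p] (local limit theorem).
LOCAL_LIMIT_CONSTANT = sqrt(2 / (3 * pi))
```

For $p = 1$ the function returns $4 \cdot \frac14 \cdot (\frac34)^3 = 0.421875$, and already for $p = 10^5$ the scaled value $\sqrt{p} \cdot$ `prob_weight_exactly_p(p)` agrees with the limit to three decimal places.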
Let and let such that . For a random set , the probability that for some integer it holds that is at most .
Denote . Then . We say is bad for if for some and . We claim this in particular implies that . Indeed, since is an integer in the interval , and by the bounds on , the only cases needed to be analyzed are .
If , then clearly which implies the statement.
If , then, as and ,
(The “” accounts for the fact that might not be an integer).
Finally, if , it holds that
which again implies the statement.
By Chernoff Bound (see, e.g., [AS16]), , hence is bad for with at most that probability. ∎
The proof of Lemma 3.2 is now fairly immediate.
Proof of Lemma 3.2.
It follows that with probability at most , either or is bad for some , and hence there exists a selection of such that and is good for all ’s. ∎
We are now ready to prove Theorem 3.1.
Proof of Theorem 3.1.
Let be a collection of sets as stated in the theorem. Since , we can assume without loss of generality, by possibly replacing a set with its complement, that for all . We may further assume as otherwise the statement directly follows. For , define the following polynomials over :
where and is the usual inner product. Further, define
as a polynomial over .
By assumption, for every , . This follows because , and by assumption, for some it holds that , so it must be that , so that .
By Lemma 2.1, , and by construction, , which implies the desired lower bound on . ∎
3.2 General and
In this section, we extend Theorem 3.1 for a more general range of parameters, by proving the following.
Let be a large enough even natural number, and let be a parameter. Let be sets such that for each , . Furthermore, assume that for every balanced partition of , there exists an such that . Then, .
Recall that in Theorem 3.1 we required the universe size to be of the form $4p$ for a prime $p$, and the sets to be of size at least logarithmic in $n$ (as commented earlier, we may assume $|S_i| \le n/2$ for every $i$, by possibly replacing $S_i$ with its complement).
Our strategy for general even $n$ and general $c$ will be very similar to that of the previous special case. (In order to talk about balanced partitions of the universe, $n$ clearly must be even. However, our techniques can be easily extended to odd integers, if one is willing to replace balanced partitions by almost-balanced partitions, that is, partitions of $[n]$ into two parts whose sizes differ by exactly one. We omit the straightforward details.) In order to apply the useful Lemma 2.1, we start by “forcing” the universe size to be of the form $4p$. This is done by picking the largest number of the form $4p$, with $p$ prime, which is smaller than $n$ (known results about the distribution of prime numbers guarantee the existence of such a prime $p$ with $n - 4p = o(n)$). We then randomly pick a subset of $[n]$ of size $n - 4p$ avoiding all the small sets, and partition it in an arbitrary balanced manner. Such a subset is guaranteed, with high probability, to have a small intersection with every $S_i$, and thus for every such set the values of very few elements have been determined. Again, this intersection property is easier to show, by standard concentration bounds, when the sets are somewhat large, whereas in our case they can be small. However, the fact that $n - 4p$ itself is sublinear in $n$ enables us to handle all cases.
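The first step of this strategy needs a prime $p$ with $4p$ just below $n$. The sketch below (hypothetical helper names) finds the largest prime $p$ with $4p < n$ by plain trial division, which is enough for illustration; the argument itself only relies on known bounds on gaps between primes to ensure that $n - 4p$ is small compared to $n$.

```python
def is_prime(q):
    """Trial division primality test (fine for illustration-sized q)."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True

def largest_prime_with_4p_below(n):
    """Largest prime p with 4p < n; requires n >= 9."""
    p = (n - 1) // 4
    while p >= 2 and not is_prime(p):
        p -= 1
    if p < 2:
        raise ValueError("no prime p with 4p < n")
    return p
```

For $n = 1000$ this gives $p = 241$, so $4p = 964$ and only $36$ elements of the universe have to be set aside and partitioned arbitrarily.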
We now denote and , and, as before, we would like to find a set of size exactly that is unbalanced, modulo , on every (and since is a very large subset of , this property will extend to itself). A naïve random choice, as is done in the proof of Theorem 3.1, will not work, since the probability of failure for very small sets will be too large to apply a union bound over all sets. Thus, we pick using a different, and slightly more complicated, random procedure.
Given such and , the proof follows from a construction of a polynomial similar to the one above, together with a similar application of Lemma 2.1. We now provide the details.
We start by proving the existence of a set as described above.
Let be an integer and be subsets of , such that . Then, for every integer , there exists an of size exactly such that for every , . Moreover, for each , if , then .
Let and let . Since , we know that . Let be a uniformly random subset of of size .
We now show that with high probability satisfies for every . We consider three cases.
Small sets: . By the choice of , we know that is disjoint from all subsets of size at most .
Large sets: . For any fixed set of size at least , by Lemma 2.2, we know that
Since and , this probability is at most . Thus, by a union bound, we know that with probability at least , for each with , .
Sets of intermediate size: . We now argue that for all such sets, , with high probability.
To this end, we first upper bound the probability that the set contains a fixed set of size , and then take a union bound over all sets of size which are a subset of some of intermediate size. Let be a fixed set of size . Then,
For each of size at most there are at most subsets of size . Therefore, by a union bound, the probability that for any subset of size at most is at most .
A union bound over all three cases completes the proof of the lemma. ∎
Having shown the existence of the set as described in the proof outline, we now turn to showing the existence of a set .
Let be a natural number, be a prime satisfying and let be an integer satisfying . Let be subsets of , such that and for every , . Let be a set of size such that for every , and is disjoint from all sets of size at most . Let be an arbitrary subset of . Then, there exists a set of size exactly , such that for every , if then for every integer with , it holds that . If , the same holds for .
Denote , and for all . We note that if , then . We construct the set by a randomized algorithm, which consists of several steps. In the first step, we greedily select a small number of elements from each set . The purpose of this step is to guarantee that is sufficiently far from , for every . Next, we pick each of the remaining elements of to with probability . This constant is chosen so that with high probability (assuming is sufficiently large), the intersection is non-zero modulo (and since and are very close, the same holds for ), and also with high probability the number of elements we have picked so far does not exceed .
The next step is again a deterministic, greedy step, which adds to sufficiently many elements from each “bad” set . Those are the sets of which too few elements were picked before. By standard concentration bounds, we do not expect to have many such large sets, and thus again we can control the number of elements added in this step.
Finally, assuming the number of elements that were picked so far is less than (which happens with high probability), we add arbitrary elements to our set so that it will be of size exactly . Of course, we also have to argue that this step preserves the previous intersection requirements. This follows from the fact that we do not expect to add many elements in this step.
We now provide the more formal details. is constructed using the following randomized algorithm.