
# Quadratic polynomials of small modulus cannot represent OR

Holden Lee Department of Mathematics, Princeton University. Email: holdenl@math.princeton.edu.
July 6, 2019
###### Abstract

An open problem in complexity theory is to find the minimal degree of a polynomial representing the $n$-bit OR function modulo composite $m$. This problem is related to understanding the power of circuits with $\mathrm{MOD}_m$ gates where $m$ is composite. The OR function is of particular interest because it is the simplest function not amenable to bounds from communication complexity. Tardos and Barrington [TB] established a lower bound of $\Omega_m\big((\log n)^{1/(r-1)}\big)$, and Barrington, Beigel, and Rudich [BBR] established an upper bound of $O(n^{1/r})$, where $r$ is the number of distinct prime factors of $m$. No progress has been made on closing this gap for twenty years, and progress will likely require new techniques [BL].

We make progress on this question viewed from a different perspective: rather than fixing the modulus $m$ and bounding the minimum degree in terms of the number of variables $n$, we fix the degree and bound $n$ in terms of the modulus $m$. For degree 2, we prove a quasipolynomial bound of $n\le m^{C\lg m}$ for a constant $C$, improving the previous best bound, which is exponential, implied by Tardos and Barrington’s general bound.

To understand the computational power of quadratic polynomials modulo $m$, we introduce a certain dichotomy which may be of independent interest. Namely, we define a notion of boolean rank of a quadratic polynomial $f$ and relate it to the notion of diagonal rigidity. Using additive combinatorics, we show that when the rank is low, $f(\mathbf{x})=0$ must have many solutions. Using techniques from exponential sums, we show that when the rank of $f$ is high, $f$ is close to equidistributed. In either case, $f$ cannot represent the OR function in many variables.


## 1 Introduction

### 1.1 Overview

A major open problem in complexity theory is to characterize the computational power of modular counting. For instance, for any composite , the question is still open, where is the class of functions computable by constant-depth circuits allowing gates.

One technique to tackle such problems is to relate circuits containing gates to polynomials over . This has been successful when is prime. For example, to show for prime and any not a power of , Razborov and Smolensky [R, S] showed that functions in can be approximated by polynomials of degree , and then proved that cannot be approximated by such polynomials. See [B93] for a survey of the polynomial method in circuit complexity. (See also [V].) What if we allow arbitrary moduli? Building on work of Yao [Y], Beigel and Tarui [BT] show that functions in can be written in the form where is a polynomial over of degree and is some function. Thus, to show an explicit family of functions is not in , it suffices to lower-bound the minimum degree of polynomials representing in this way. However, currently there are few techniques for doing so.

As a first step towards such lower bounds, Barrington, Beigel, and Rudich [BBR] consider a similar question over rather than . Write $B=\{0,1\}$ below.

###### Definition 1.

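Let $g\colon B^n\to\{0,1\}$ be a function. A function $f\colon B^n\to\mathbb{Z}_m$ weakly represents $g$ if there exists a partition $\mathbb{Z}_m=A\cup A^c$ such that

$$g(\mathbf{x})=0\iff f(\mathbf{x})\in A,\qquad g(\mathbf{x})=1\iff f(\mathbf{x})\in A^c.$$

Define the weak degree $\Delta(g,m)$ to be the minimal degree of a polynomial $f$ that weakly represents $g$.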

The goal is to estimate for specific functions , and in particular exhibit functions with large weak degree.

One way to bound is using communication complexity. Grolmusz [Gr95] noted that if a function has -party communication complexity , then its weak degree is at least . From Babai, Nisan, and Szegedy’s [BNS] lower bound for the communication complexity of the generalized inner product function he concluded that the GIP function has weak degree . Current techniques in communication complexity only give superconstant bounds when the number of parties is  [KN], so improvement along these lines is difficult.

Researchers have proved bounds for the more rigid notion of 1-sided representation, which requires in Definition 1, obtaining bounds of for the equality function [KW91] and the majority function [Ts93], and a bound of for the when has a prime not dividing [BBR]. However, 1-sided representation does not capture the full power of modular counting.

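A natural function to consider is the OR function $\mathrm{OR}_n\colon B^n\to\{0,1\}$, defined by $\mathrm{OR}_n(\mathbf{0})=0$ and $\mathrm{OR}_n(\mathbf{x})=1$ for $\mathbf{x}\neq\mathbf{0}$. $\mathrm{OR}_n$ (equivalently, $\mathrm{AND}_n$) is in a sense the simplest function, and its communication complexity is trivial, so other techniques are necessary to lower-bound its degree. Note that because $\mathrm{OR}_n$ takes the value 0 only on $\mathbf{0}$, $\Delta(\mathrm{OR}_n,m)$ is the minimal degree of a polynomial $f$ such that for $\mathbf{x}\in B^n$, $f(\mathbf{x})=0$ iff $\mathbf{x}=\mathbf{0}$ (i.e., weak representation is equivalent to 1-sided representation).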
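As a small illustration of this characterization, the following sketch brute-forces the condition "$f(\mathbf{x})=0$ iff $\mathbf{x}=\mathbf{0}$" over the boolean cube. The function names and the example polynomial $x_1+\dots+x_n$ are hypothetical choices made here for illustration; they do not come from the paper.

```python
from itertools import product

def weakly_represents_or(f, n, m):
    """Check that f(x) = 0 (mod m) holds on {0,1}^n exactly when x = 0."""
    for x in product((0, 1), repeat=n):
        vanishes = f(x) % m == 0
        if vanishes != (not any(x)):
            return False
    return True

def coordinate_sum(x):
    # Hypothetical example polynomial of degree 1: x_1 + ... + x_n.
    return sum(x)

if __name__ == "__main__":
    # With n = m - 1 variables the sum never reaches a nonzero multiple of m.
    print(weakly_represents_or(coordinate_sum, 4, 5))
    # With n = m variables the all-ones point is a second zero.
    print(weakly_represents_or(coordinate_sum, 5, 5))
```

The check is exponential in $n$, so it is only meant for experimenting with very small cases.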

When $m$ is a prime power it is folklore [TB] that

$$\frac{n}{m-1}\le\Delta(\mathrm{OR}_n,m)\le n,$$

because one can turn a polynomial weakly representing , into a polynomial representing , with at most a factor increase in degree. See also [CFS] for general theorems on the zero sets of polynomials over finite fields.

Most interesting is the regime where $m$ is a fixed composite number (say, 6), and $n\to\infty$. Suppose $m$ has $r$ distinct prime factors. Barrington, Beigel, and Rudich [BBR] show the upper bound

$$\Delta(\mathrm{OR}_n,m)=O\big(n^{1/r}\big).$$

This bound is attained by a symmetric polynomial. Moreover, they prove that any symmetric polynomial representing modulo has degree .

Alon and Beigel [AB] proved the first superconstant lower bound on the weak degree of . Later Tardos and Barrington [TB] proved the bound

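$$\Delta(\mathrm{OR}_n,m)\ge\left(\left(\frac{1}{q-1}-o(1)\right)\log n\right)^{\frac{1}{r-1}}=\Omega_m\!\left((\log n)^{\frac{1}{r-1}}\right)\qquad(1)$$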

where $q$ is the smallest prime power fully dividing $m$. Their proof proceeded by finding a subcube of $B^n$ where the polynomial is constant modulo a prime power $q$ dividing $m$; then the polynomial represents OR modulo $m/q$ on this subcube. An induction on the number of distinct prime factors results in the $\frac{1}{r-1}$ exponent. This technique has also been used to show structural theorems of polynomials over , with applications to affine and variety extractors [CT].

In this work, we make modest progress on this question. Rather than fixing the modulus and bounding the minimum degree , we fix the degree and bound the minimum modulus . Specifically, we focus on the degree 2 case, and prove the following.

###### Theorem 2.

There exists a constant $C$ such that the following holds. If $m$ has $d$ prime factors, counted with multiplicity, and the quadratic polynomial $f$ weakly represents $\mathrm{OR}_n$ modulo $m$, then

$$n\le m^{Cd}\le m^{C\lg m}.$$

The lower bound by Tardos and Barrington (1) gives where is the smallest prime power factor of , and is the number of distinct prime factors. This gives . Hence, Theorem 2 improves this exponential upper bound to a quasipolynomial upper bound.

We conjecture that the correct upper bound is , or at the very least, we have . The loss comes from an inefficient way of dealing with multiple factors.

To prove Theorem 2, we define a new notion of boolean rank (Definition 1) for a quadratic polynomial $f$, which differs from the ordinary notion of rank in that it captures rank only over the boolean cube, and has connections to matrix rigidity. This notion of boolean rank enables us to split the proof into two cases that we consider independently. When the rank is low, we use additive combinatorics to show $f(\mathbf{x})=0$ must have many solutions. When the rank is high, we use Weyl differencing to show that $f$ is close to equidistributed. In either case, when $m$ is small $f(\mathbf{x})=0$ will have more than one solution and hence $f$ cannot represent $\mathrm{OR}_n$.

#### Organization:

The outline of the rest of the paper is as follows. In the remainder of the introduction, we introduce related work and notations. In Section 2 we give a more detailed overview of the proof. In Sections 3 and 4 we consider the low and high rank cases, respectively. In Section 5 we prove the main theorem. In Section 6 we speculate on ways to extend the argument to higher degree. Appendix A contains facts we will need about linear algebra over $\mathbb{Z}_m$ when $m$ is composite.

### 1.2 Related work

The problem of finding the weak degree of is connected to several other interesting problems. Firstly, polynomials representing modulo can be used to construct matching vector families (MVF) [Gr00], which can then be used to build constant-query locally decodable codes (LDCs) [E, DGY]. A matching vector family modulo is a pair of lists such that

$$\langle s_i,t_j\rangle\begin{cases}=0,& i=j\\ \neq 0,& i\neq j.\end{cases}$$

If is a polynomial representing , then iff . If this polynomial is , then the corresponding MVF consists of the vectors and vectors . The representation of by symmetric polynomials already gives a subexponential-length LDC. There is a large gap between the upper bound and lower bound for constant-query locally decodable codes. For each positive integer , there is a family of constant-query LDCs taking messages of length to length , while the best lower bound is for queries. Thus narrowing the gap for is a first step towards narrowing the gap for LDCs.

Secondly, OR representations give explicit constructions of Ramsey graphs, and encompass many previous such constructions [Gr00, Gr00]. Gopalan defines OR representations slightly differently, as a pair of polynomials and such that for , and simultaneously only at . The construction puts an edge between iff . The probabilistic method gives nonexplicit graphs with vertices with clique number and independence number at most ; the best OR representations give explicit graphs with .

Recently, Bhowmick and Lovett [BL] showed a barrier to lower bounds for the weak degree of : to prove strong lower bounds, one has to use properties of polynomials that are not shared by nonclassical polynomials, because there exist nonclassical polynomials of degree that represent . A nonclassical polynomial of degree is a function such that for all , where . Thus, to go beyond , one cannot rely exclusively on the fact that the th difference of a degree polynomial is constant, which is the core of techniques such as Weyl differencing. This barrier is not directly relevant to our work because nonclassical polynomials for degree can only appear in characteristic 2, and any such nonclassical polynomial can be realized as a polynomial modulo 4, .

The maximum such that a degree 2 polynomial can weakly represent is not known. The best symmetric polynomial has , but the true answer lies in the interval [TB], as the polynomial works for .

### 1.3 Notation

We use the following notation.

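• $B=\{0,1\}$. Note that we regard $B$ as a subset of $\mathbb{Z}$, hence distinguishing it from $\mathbb{Z}_2$.

• Boldface font represents vectors; for instance $\mathbf{x}$ is the vector $(x_1,\dots,x_n)$.

• $\mathbb{Z}_m$ is the ring of integers modulo $m$.

• For $q=p^{\alpha}$ a prime power, write $q\,\|\,m$ ($q$ fully divides $m$) to mean that $p^{\alpha}\mid m$ but $p^{\alpha+1}\nmid m$.

• Let $e_m(x)=e^{2\pi i x/m}$. Note this is well defined on $\mathbb{Z}_m$.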

## Acknowledgements

Thanks to Zeev Dvir for his guidance and comments on this paper, and to Sivakanth Gopi for useful discussions.

## 2 Proof overview

It suffices to show that if and is a quadratic polynomial modulo , then the number of zeros of is either 0 or .

We first define the notion of boolean rank (Definition 1 in Section 3). We say a quadratic $f$ has boolean rank at most $r$ if on the Boolean cube, it can be written as a function of $r$ linear forms. Boolean rank is useful because low boolean rank implies $f$ has many zeros, as we will show in Section 3. This is because if $f$ has low boolean rank, then $f(\mathbf{x})=0$ whenever $\mathbf{x}$ solves a small system of linear equations modulo $m$. For example, if , then any solution to is a solution to . Because we have reduced the problem to a linear problem, additive combinatorics comes into play. We use bounds on the Davenport constant [GG] to show that there are many solutions.

The difficult case is when has large boolean rank. In Section 4, we show that roughly speaking, this implies is equidistributed (Theorem 1). Using orthogonality of characters, the fact that for ,

$$\frac{1}{m}\sum_{j \bmod m}e_m(jy)=\begin{cases}0,& y\neq 0\\ 1,& y=0\end{cases}$$

for any function , we can count the number of zeros of using the following exponential sum. (For a similar application of exponential sums in complexity theory, see [Bo].)

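$$\begin{aligned}
\left|\{\mathbf{x}\in B^n: f(\mathbf{x})=0\}\right| &=\sum_{\mathbf{x}\in B^n}\frac{1}{m}\sum_{j \bmod m}e_m(jf(\mathbf{x})) &&(2)\\
\implies\ \frac{1}{2^n}\left|\{\mathbf{x}\in B^n: f(\mathbf{x})=0\}\right| &=\frac{1}{m}+\frac{1}{m}\sum_{j\not\equiv 0 \ (\mathrm{mod}\ m)}\mathbb{E}_{\mathbf{x}\in B^n}\,e_m(jf(\mathbf{x})) &&(3)
\end{aligned}$$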

If each exponential sum is small, then the proportion of zeros approximately equals $\frac{1}{m}$. We show that high boolean rank implies that these sums are small.
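The counting identity (3) can be checked numerically on a small instance. The sketch below compares the fraction of zeros obtained by direct enumeration with the value obtained from the exponential sums; the coefficient matrix `A_EXAMPLE`, the linear part `B_EXAMPLE`, and all function names are arbitrary illustrative choices, not data from the paper.

```python
import cmath
from itertools import product

def e_m(m, y):
    """The additive character e_m(y) = exp(2*pi*i*y/m)."""
    return cmath.exp(2j * cmath.pi * y / m)

def quadratic(x, A, b, m):
    """Evaluate sum_{i,j} A[i][j] x_i x_j + sum_i b[i] x_i modulo m."""
    n = len(x)
    value = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    value += sum(b[i] * x[i] for i in range(n))
    return value % m

def zero_fraction_direct(A, b, m):
    """Left side of (3): fraction of x in {0,1}^n with f(x) = 0."""
    points = list(product((0, 1), repeat=len(b)))
    zeros = sum(1 for x in points if quadratic(x, A, b, m) == 0)
    return zeros / len(points)

def zero_fraction_from_sums(A, b, m):
    """Right side of (3): 1/m plus the averaged sums for j = 1, ..., m-1."""
    points = list(product((0, 1), repeat=len(b)))
    total = 1 / m
    for j in range(1, m):
        average = sum(e_m(m, j * quadratic(x, A, b, m)) for x in points) / len(points)
        total += average / m
    return total  # complex, with imaginary part ~0 up to rounding

# Arbitrary illustrative instance with n = 4 and m = 6.
A_EXAMPLE = [[0, 1, 0, 2], [0, 0, 3, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
B_EXAMPLE = [1, 2, 3, 4]

if __name__ == "__main__":
    print(zero_fraction_direct(A_EXAMPLE, B_EXAMPLE, 6))
    print(zero_fraction_from_sums(A_EXAMPLE, B_EXAMPLE, 6))
```

Both quantities agree up to floating-point error, which is all the identity asserts; the interesting part of the argument is bounding the individual sums.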

A standard technique to bound an exponential sum is by Weyl differencing: squaring the sum effectively reduces the degree of $f$. Complications arise due to the fact that we are working in $B^n$ rather than the group $\mathbb{Z}_2^n$. We will find that the sum is small when the matrix $A_f$ corresponding to $f$ has an off-diagonal submatrix of high rank ((10) and Lemma 6). We show that high boolean rank is equivalent to having high diagonal rigidity (Proposition 3), which in turn implies that $A_f$ has such an off-diagonal submatrix of high rank (Lemma 5), as desired. Note that diagonal rigidity is a special case of the widely studied notion of matrix rigidity due to Valiant [Val].

Finally, we note two technical points. First, we need to define a notion of rank over . We collect the relevant definitions and facts in Appendix A. This makes the proof more technical. For simplicity, the reader may consider the case when is a product of distinct primes, so that the usual notion of rank over suffices.

Secondly, note that if is already biased modulo for some , then we expect (3) to be biased as well. Thus we factor and break the sum in (3) up into and . Consider moving prime factors from to . If the boolean rank increases slowly at each step, then the boolean rank modulo the “worst” prime is bounded, and we are in the low rank case. If the boolean rank increases too fast at any step, we will be in the high rank case. We conclude the theorem in this fashion in Section 5.

## 3 Low rank quadratic polynomials have many solutions

###### Definition 1.

The rank of a quadratic polynomial $f$ modulo $m$ is the minimal $r$ such that there exists a function $F\colon\mathbb{Z}_m^r\to\mathbb{Z}_m$ and vectors $\mathbf{v}_1,\dots,\mathbf{v}_r\in\mathbb{Z}_m^n$ such that for all $\mathbf{x}\in\mathbb{Z}_m^n$,

$$f(\mathbf{x})=F(\mathbf{v}_1^T\mathbf{x},\dots,\mathbf{v}_r^T\mathbf{x}).\qquad(4)$$

Note this extends the definition of rank of a quadratic form (the homogeneous case).

The boolean rank is defined the same way, except that (4) only has to hold for $\mathbf{x}\in B^n$.

Note that $F$ in Definition 1 has a special form here: it is a sum of squares with coefficients. However, we will not use the structure of $F$ in our arguments.

###### Theorem 2.

Let be a quadratic polynomial modulo . Suppose that for each prime power , has boolean rank . Let . If has a solution , then the following hold.

1. If then has at least 2 solutions.

2. has at least

$$2^{\,n-mr\log m\log n}$$

solutions in .

The theorem will be a consequence of the following.

###### Theorem 3.

Let be a collection of vectors. Then the number of solutions to the system

$$\mathbf{v}_{p,i}^T\mathbf{x}=0,\qquad 1\le i\le r_q,\quad q\,\|\,m$$

in is at least 2 if , and is at least .

The proof of this relies on a well-studied problem in additive combinatorics, that of determining the Davenport constant of a group. See [GG] for a survey.

###### Definition 4.

Let $G$ be an abelian group. The Davenport constant of $G$, denoted $d(G)$, is the minimal $d$ such that for all $n>d$ and all $g_1,\dots,g_n\in G$, the equation

$$\sum_{i=1}^{n}x_ig_i=0$$

has a nontrivial solution $\mathbf{x}\in B^n$.
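For intuition, the Davenport constant of a small cyclic group can be found by exhaustive search. The sketch below assumes the reading of the definition in which $d(G)$ is the largest length of a sequence with no nonempty zero-sum subsequence (equivalently, every sequence of more than $d(G)$ elements has a nontrivial 0/1 solution); the function names are hypothetical and the search is only feasible for very small $m$.

```python
from itertools import combinations_with_replacement, product

def has_zero_sum_subsequence(seq, m):
    """True if some nonzero x in {0,1}^n satisfies sum_i x_i * seq[i] = 0 in Z_m."""
    for x in product((0, 1), repeat=len(seq)):
        if any(x) and sum(xi * g for xi, g in zip(x, seq)) % m == 0:
            return True
    return False

def davenport_cyclic(m):
    """Brute-force d(Z_m): the largest length of a zero-sum-free sequence.

    Order does not matter, so it suffices to range over multisets. Once every
    sequence of length d + 1 has a zero-sum subsequence, so does every longer
    sequence, and the search can stop.
    """
    d = 0
    while True:
        exists_free = any(
            not has_zero_sum_subsequence(seq, m)
            for seq in combinations_with_replacement(range(m), d + 1)
        )
        if not exists_free:
            return d
        d += 1

if __name__ == "__main__":
    for m in (2, 3, 4, 5):
        print(m, davenport_cyclic(m))
```

For cyclic groups the search returns $m-1$, which matches the bound of Theorem 5 with $|G|=m$.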

###### Theorem 5 ([GG, Theorem 3.6]).

Let $G$ be a nontrivial abelian group with exponent $m$. Then

$$d(G)\le(m-1)+m\log\frac{|G|}{m}.$$

We need to turn this existence result into a lower bound on the number of solutions.

###### Lemma 6.

Let $G$ be a nontrivial abelian group. The number of solutions to

$$\sum_{i=1}^{n}x_ig_i=0$$

is at least

$$2^{\,n-(d(G)+1)\log n}.$$
###### Proof.

Given a solution , we can apply the definition of to . Hence we see that any -dimensional slice of that has 1 solution must have another solution.

Now we claim that every Hamming ball of radius must have at least 1 solution. Consider a point . Take a point solving the equation such that is minimal. If , then consider the -dimensional slice of that contains and such that moving in any of the directions brings closer to . There must be another point in this hypercube that solves the equation, contradicting the minimality of .

Every Hamming ball of radius has at least 1 solution, so by counting in two ways, the number of solutions is at least . ∎

###### Proof of Theorem 3.

This is exactly the equation in the definition of the Davenport constant, where and . The Davenport constant satisfies

$$d(G)\le(m-1)+m\log\frac{|G|}{m}.$$

Now apply Lemma 6. ∎

###### Proof of Theorem 2.

By definition of boolean rank there exist such that for all ,

$$f(\mathbf{x})=F(\mathbf{v}_1^T\mathbf{x},\dots,\mathbf{v}_r^T\mathbf{x}).$$

Without loss of generality , , so that whenever . Now use Theorem 3. ∎

## 4 High rank implies equidistribution

In this section we prove the following theorem.

###### Theorem 1 (High rank implies equidistribution).

Let be a positive integer. Let be a quadratic polynomial in variables. If there exists a factor such that modulo has boolean rank at least , then

$$\left|\mathbb{E}_{\mathbf{x}\in B^n}\,e_m(f(\mathbf{x}))\right|<\varepsilon.$$

First we give a different interpretation for the (boolean) rank. For simplicity, suppose the modulus is prime. The boolean rank does not change if $f$ changes by a constant, so assume $f$ has constant term 0. For any linear form $f_0$, on $B^n$ we can treat $f_0$ as a quadratic form because if $x_i\in B$, then $x_i^2=x_i$. Hence,

$$\operatorname{brank}(f)\le 1+\min_{f_0\ \text{linear}}\operatorname{rank}(f+f_0).$$

Equivalently, when the modulus is odd, we can think in terms of the matrix $A_f$ corresponding to $f$. Here $A_f$ is the matrix such that , i.e., the matrix of the bilinear form . By using $x_i^2=x_i$, we have that linear forms correspond to diagonal matrices, so

$$\operatorname{brank}(f)\le 1+\min_{D\ \text{diagonal}}\operatorname{rank}(A_f+D).$$
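The correspondence between diagonal matrices and linear forms on the boolean cube is easy to verify by enumeration. The following sketch checks that $\mathbf{x}^T(A+D)\mathbf{x}\equiv\mathbf{x}^TA\mathbf{x}+\sum_i d_ix_i \pmod m$ for every $\mathbf{x}\in\{0,1\}^n$; the helper names and the random test matrix are assumptions made for illustration.

```python
import random
from itertools import product

def quad_form(A, x, m):
    """Evaluate x^T A x modulo m."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) % m

def diagonal_acts_as_linear_form(A, d, m):
    """Check x^T (A + diag(d)) x == x^T A x + sum_i d_i x_i (mod m) on {0,1}^n."""
    n = len(d)
    shifted = [[(A[i][j] + (d[i] if i == j else 0)) % m for j in range(n)]
               for i in range(n)]
    for x in product((0, 1), repeat=n):
        lhs = quad_form(shifted, x, m)
        rhs = (quad_form(A, x, m) + sum(d[i] * x[i] for i in range(n))) % m
        if lhs != rhs:
            return False
    return True

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    n, m = 4, 6
    A = [[rng.randrange(m) for _ in range(n)] for _ in range(n)]
    d = [rng.randrange(m) for _ in range(n)]
    print(diagonal_acts_as_linear_form(A, d, m))
```

The identity relies on $x_i^2=x_i$ and therefore fails off the cube: for instance, with the $1\times 1$ matrix $A=(0)$, $d=(1)$, $m=5$ and $x=2$, the two sides are $4$ and $2$.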

This motivates the following definition. (For the definition of matrix rank when is composite, see Appendix A.)

###### Definition 2.

Let be a matrix over . We say is -diagonal rigid if for all diagonal matrices , .

Diagonal rigidity is related to a more widely studied notion of matrix rigidity, in which the matrix can be any sparse matrix. Matrix rigidity is an extensively studied problem with many applications to complexity theory. (See [rigid] for a survey.)

We formalize our argument above as the following proposition. The argument extends to prime powers because it still holds that a quadratic form depends only on the projection of in directions (Proposition 4).

###### Proposition 3.

Let be a prime power, a quadratic polynomial. If , assume has even coefficients. If is -rigid, then

$$\operatorname{brank}(f)\le r+1.$$

Before we prove Theorem 1, we need a few lemmas.

###### Lemma 4.

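Let $m$ be a positive integer and let $f\colon B^n\to\mathbb{Z}_m$ be given by a linear polynomial modulo $m$ involving $t$ variables:

$$f(\mathbf{x})=\sum_{j=1}^{t}a_jx_{i_j},\qquad a_j\neq 0.$$

Then

$$\left|\mathbb{E}_{\mathbf{x}\in B^n}\,e_m(f(\mathbf{x}))\right|\le\left(1-\frac{1}{m^2}\right)^{t}\le e^{-t/m^2}.$$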
###### Proof.

The sum decomposes as a product over the coordinates:

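$$\begin{aligned}
\left|\mathbb{E}_{\mathbf{x}\in B^n}\,e_m(f(\mathbf{x}))\right| &\le\left|\prod_{j=1}^{t}\mathbb{E}_{x_{i_j}\in B}\,e_m(a_jx_{i_j})\right|\\
&=\prod_{j=1}^{t}\left|\frac{1+e_m(a_j)}{2}\right|\\
&\le\prod_{j=1}^{t}\left|\frac{1+e_m(1)}{2}\right|\\
&\le\left(1-\frac{1}{m^2}\right)^{t}.
\end{aligned}$$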

In the last step we use . ∎
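The bound in Lemma 4 can be sanity-checked by enumeration for small parameters. The sketch below computes the exponential sum of a linear form both by brute force and by the product formula from the proof, then compares against $(1-1/m^2)^t$; variables not appearing in the form are omitted since averaging over them changes nothing. All function names are illustrative assumptions.

```python
import cmath
import math
from itertools import product

def exp_sum_brute(a, m):
    """|E_{x in {0,1}^t} e_m(sum_j a_j x_j)| by direct enumeration."""
    t = len(a)
    total = sum(
        cmath.exp(2j * math.pi * sum(aj * xj for aj, xj in zip(a, x)) / m)
        for x in product((0, 1), repeat=t)
    )
    return abs(total) / 2 ** t

def exp_sum_product(a, m):
    """The same quantity via the product of |1 + e_m(a_j)| / 2 over coordinates."""
    value = 1.0
    for aj in a:
        value *= abs(1 + cmath.exp(2j * math.pi * aj / m)) / 2
    return value

def lemma_bound_holds(m, t):
    """Check the bound for every choice of nonzero coefficients a_1, ..., a_t."""
    bound = (1 - 1 / m ** 2) ** t
    for a in product(range(1, m), repeat=t):
        s = exp_sum_brute(a, m)
        if abs(s - exp_sum_product(a, m)) > 1e-9:
            return False
        if s > bound + 1e-12:
            return False
    return bound <= math.exp(-t / m ** 2)

if __name__ == "__main__":
    print(lemma_bound_holds(6, 3))
    print(lemma_bound_holds(5, 2))
```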

Next we show that a symmetric, rigid matrix has a large off-diagonal submatrix of full rank. The main technicality comes from working over composite moduli.

###### Lemma 5.

Let be a matrix over , where is a prime power.

Suppose is symmetric and -rigid, . Then there exist disjoint sets of indices such that is a square matrix of full rank, with rank at least .

###### Proof.

Suppose is a matrix.

If there are disjoint , such that has rank at least , then the result follows because we can find a square submatrix of full rank (Proposition 3).

We show the contrapositive: if the maximum rank of an off-diagonal submatrix is , then there exists a diagonal matrix so that .

Take the off-diagonal matrix of maximal rank. To break ties, choose the matrix whose rows generate the largest subgroup. By Proposition 3 there is a submatrix whose rows and columns generate an isomorphic subgroup. Without loss of generality, assume that it has row indices and column indices . The matrix , also rank .

Now we show that we can pick the first entries of so that has rank at most . We will also be able to carry out the same procedure on the last rows by considering the reflection of across the diagonal, giving the total of .

For , consider the matrix . Let be its rows. Of all off-diagonal rank- matrices, generates the largest subgroup. Now contains this matrix so its th row is a linear combination of the previous rows,

$$\mathbf{v}_t=\sum_i a_i\mathbf{v}_i.\qquad(5)$$

Let us be more precise: The set of a that satisfy (5) is where lnull denotes the left nullspace and is a particular solution to (5). In other words,

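$$\operatorname{lnull}\left(A_{([1,s]\cup\{t\})\times([s+1,n]\setminus\{t\})}\right)=\left(\operatorname{lnull}(A_{I_1\times I_2}),0\right)+\langle(\mathbf{a}_t,-1)\rangle\subseteq\mathbb{Z}_m^{s}\times\mathbb{Z}_m\qquad(6)$$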

Now add in the th column: consider the matrix . Choose so that

$$(D+A)_{tt}=\sum_{i=1}^{s}a_iA_{it}.$$

Choosing in this way for , we find that the left nullspace of is generated by

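$$\begin{matrix}
(\operatorname{lnull}(A_{I_1\times I_2}), & 0,\dots,0)\\
(\mathbf{a}_{s+1}, & -1,\dots,0)\\
\vdots & \\
(\mathbf{a}_{\lfloor n/2\rfloor}, & 0,\dots,-1),
\end{matrix}$$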

and hence isomorphic to . Thus as groups,

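$$\begin{aligned}
\operatorname{rowspace}\left((D+A)_{[1,\lfloor n/2\rfloor]\times[s+1,n]}\right) &\cong\mathbb{Z}_m^{\lfloor n/2\rfloor}\big/\operatorname{lnull}\left((D+A)_{[1,\lfloor n/2\rfloor]\times[s+1,n]}\right)\\
&\cong\mathbb{Z}_m^{\lfloor n/2\rfloor}\big/\left(\operatorname{lnull}(A_{I_1\times I_2})\times\mathbb{Z}_m^{\lfloor n/2\rfloor-s}\right)\\
&\cong\mathbb{Z}_m^{s}\big/\operatorname{lnull}(A_{I_1\times I_2})\cong\operatorname{rowspace}(A_{I_1\times I_2}).
\end{aligned}$$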

Hence

$$\operatorname{rank}\left((D+A)_{[1,\lfloor n/2\rfloor]\times[s+1,n]}\right)=\operatorname{rank}(A_{I_1\times I_2})=s,$$

as needed.

Finally, for any choice of , has rank . This completes the proof. ∎

###### Proof of Theorem 1.

By Proposition 3, a lower bound for the boolean rank gives a lower bound for the rigidity of . If is a power of 2 and has odd coefficients, then is not well defined. In this case we can replace by and by . This neither changes the boolean rank nor the exponential sum. Hence we can assume is -rigid over .

We use Weyl’s differencing technique. To bound the exponential sum we square it to reduce the degree of the polynomial in the exponent. We have to be careful of the fact that we are working in $B^n$ rather than $\mathbb{Z}_2^n$, so the differences are not allowed to “wrap around.” For a function $f$ defined on $B^n$, and $\mathbf{h}\in\{-1,0,1\}^n$, define

$$\Delta_{\mathbf{h}}f(\mathbf{x})=f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})$$

when $\mathbf{x},\mathbf{x}+\mathbf{h}\in B^n$.

We have

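$$\begin{aligned}
\left|\mathbb{E}_{\mathbf{x}\in B^n}\,e_m(f(\mathbf{x}))\right|^2 &=\frac{1}{2^{2n}}\sum_{\mathbf{x},\mathbf{y}\in B^n}e_m(f(\mathbf{y})-f(\mathbf{x})) &&(7)\\
&=\frac{1}{2^{2n}}\sum_{\mathbf{h}\in\{-1,0,1\}^n}\ \sum_{\substack{x_i=0\ \text{if}\ h_i=1\\ x_i=1\ \text{if}\ h_i=-1}}e_m(\Delta_{\mathbf{h}}f(\mathbf{x})) &&(8)\\
&\le\frac{1}{2^{2n}}\sum_{\mathbf{h}\in\{-1,0,1\}^n}\left|\sum_{\substack{x_i=0\ \text{if}\ h_i=1\\ x_i=1\ \text{if}\ h_i=-1}}e_m(\Delta_{\mathbf{h}}f(\mathbf{x}))\right| &&(9)
\end{aligned}$$

Here we used the fact that the set of pairs $(\mathbf{x},\mathbf{y})\in B^n\times B^n$ is the same as the set of pairs $(\mathbf{x},\mathbf{x}+\mathbf{h})$ where $\mathbf{x}$ and $\mathbf{h}$ satisfy the conditions below the sum.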
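The reindexing from (7) to (8) can be confirmed numerically for a small quadratic. The sketch below computes the squared exponential sum directly and again by summing over difference vectors $\mathbf{h}\in\{-1,0,1\}^n$ with $\mathbf{x}$ restricted so that both $\mathbf{x}$ and $\mathbf{x}+\mathbf{h}$ stay in the cube. The example coefficients and function names are arbitrary assumptions for illustration.

```python
import cmath
import math
from itertools import product

def e_m(m, y):
    return cmath.exp(2j * math.pi * y / m)

def f(x, A, b, m):
    """Quadratic sum_{i,j} A[i][j] x_i x_j + sum_i b[i] x_i modulo m."""
    n = len(x)
    value = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return (value + sum(b[i] * x[i] for i in range(n))) % m

def squared_sum_direct(A, b, m):
    """|E_x e_m(f(x))|^2 computed from the definition."""
    n = len(b)
    total = sum(e_m(m, f(x, A, b, m)) for x in product((0, 1), repeat=n))
    return abs(total) ** 2 / 4 ** n

def squared_sum_by_differences(A, b, m):
    """Right side of (8): sum over h, then over x with x and x + h in the cube."""
    n = len(b)
    total = 0
    for h in product((-1, 0, 1), repeat=n):
        # x_i is forced to 0 when h_i = 1, forced to 1 when h_i = -1, free otherwise.
        choices = [(0,) if hi == 1 else (1,) if hi == -1 else (0, 1) for hi in h]
        for x in product(*choices):
            y = tuple(xi + hi for xi, hi in zip(x, h))
            total += e_m(m, f(y, A, b, m) - f(x, A, b, m))
    return total / 4 ** n  # complex, with imaginary part ~0 up to rounding

A_SMALL = [[0, 1, 2], [0, 0, 1], [0, 0, 0]]
B_SMALL = [1, 0, 3]

if __name__ == "__main__":
    print(squared_sum_direct(A_SMALL, B_SMALL, 6))
    print(squared_sum_by_differences(A_SMALL, B_SMALL, 6))
```

Each pair $(\mathbf{x},\mathbf{y})$ is visited exactly once in the second computation, with $\mathbf{h}=\mathbf{y}-\mathbf{x}$, which is why the two values agree.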

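Let $\operatorname{Supp}(\mathbf{h})$ be the set of nonzero entries of $\mathbf{h}$ and $|\operatorname{Supp}(\mathbf{h})|$ be the number of nonzero entries of $\mathbf{h}$. Let $N_{\mathbf{h}}$ denote the number of nonzero (nonconstant) coefficients of the linear function $\Delta_{\mathbf{h}}f$ restricted to the subcube of $\mathbf{x}\in B^n$ such that $\mathbf{x}+\mathbf{h}\in B^n$; note that this subcube is of size $2^{n-|\operatorname{Supp}(\mathbf{h})|}$. By Lemma 4 the exponential sum is at most

$$\left|\mathbb{E}_{\mathbf{x}\in B^n}\,e_m(f(\mathbf{x}))\right|^2\le\frac{1}{2^{2n}}\sum_{\mathbf{h}\in\{-1,0,1\}^n}2^{\,n-|\operatorname{Supp}(\mathbf{h})|}e^{-N_{\mathbf{h}}/m^2}=\sum_{\mathbf{h}\in\{-1,0,1\}^n}P(\mathbf{h})\,e^{-N_{\mathbf{h}}/m^2}$$

where in the last expression we think of $\mathbf{h}$ as a random variable with $P(h_i=0)=\frac{1}{2}$, $P(h_i=\pm 1)=\frac{1}{4}$.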

We show that if is -rigid mod , then with high probability is large, so that is small.

Note that can be computed as follows. We have that . Since we are considering the restriction of to a subcube where only the with are free, is the number of nonzero entries in . We can consider choosing in 2 stages. First choose a random partition ; will contain the indices where is 0 and will contain the indices where is . Then choose uniformly at random. Now

$$(A_f\mathbf{h})_{[n]\setminus\operatorname{Supp}(\mathbf{h})}=\left\|(A_f)_{I_1\times I_2}\mathbf{h}_{I_2}\right\|_0$$

so the expected value is

 (10)

We need the following claim.

###### Lemma 6.

Suppose that is a matrix over with rank . Suppose that is given and is chosen uniformly at random. Let . Then

$$P\left(\|\mathbf{v}+A\mathbf{w}\|_0\le tr\right)\le 2^{(t+H(t)-1)r+o(1)}$$

as , where .

###### Proof.

We may reduce to the case where has rows by Proposition 3, because having at most nonzero entries in a given set of entries is a weaker condition than having at most nonzero entries.

First we claim that for any -dimensional hyperplane , the number of solutions to is at most . Suppose the column space of is isomorphic to . There exists an invertible matrix such that . We are interested in solutions to

$$\mathbf{v}+A\mathbf{w}\in H\iff A\mathbf{w}\in-\mathbf{v}+H\iff D\mathbf{w}\in-M\mathbf{v}+MH.$$

Let be a matrix whose columns generate . We would like to count the number of solutions to

$$D\mathbf{w}=-M\mathbf{v}+M(N\mathbf{u})\iff\forall i,\ 0\ \text{or}\ a_i=(-M\mathbf{v}+M(N\mathbf{u}))_i$$

By putting in “column-echelon form,” we find that there are at most possibilities for . This proves the claim.

Now note the set is defined by hyperplanes. Thus

$$\left|\{\mathbf{w}\in B^r:\|\mathbf{v}+A\mathbf{w}\|_0\le tr\}\right|\le\binom{r}{tr}2^{tr}=2^{(H(t)+t)r+o(1)},$$

giving the bound. ∎

Let be a prime power fully dividing , and suppose is -rigid for , to be chosen. By Lemma 5, there exist disjoint such that has rank and is full rank.

Let be a small constant. We have the following with high probability.

1. If and are random subsets, where each element is included individually with probability , with high probability and

$$\operatorname{rank}\left((A_f)_{J_1'\times J_2}\right)\ge\frac{(1-\delta)r}{8}.$$

The probability of failure is .

2. If item 1 holds, choose any columns of that generate a rank subgroup. With high probability, will intersect at least of them, and

$$\operatorname{rank}\left((A_f)_{J_1'\times J_2'}\right)\ge\frac{(1-\delta)^2r}{16}$$

The probability of failure is again .

3. For a random partition, the intersections , are random, so they can be modeled by and we get

$$\operatorname{rank}\left((A_f)_{I_1'\times I_2'}\right)\ge\frac{(1-\delta)^2r}{16}$$

By Lemma 6, , i.e., with high probability

$$\left\|(A_f)_{I_1'\times I_2'}\mathbf{h}\right\|_0>\frac{(1-\delta)^2r}{64}.$$

The probability of failure is .

Thus separating out the terms in the sum which have in (10), we get

 (11)

In our setting , so (11) equals . This proves the theorem. ∎

## 5 Proof of main theorem

###### Proof of Theorem 2.

Note that if (not necessarily relatively prime) and the proportion of zeros is already biased, we expect (3) to be biased as well. To take this into account, we separate out the terms where and use . Then (3) becomes

 (???) =1m+1m∑j(mod m)≢0(mod m1)Ex∈Bnem(jf(x))+1m∑j(mod m)≡0(mod m1)em(jf(x)) (12) =1m+1m∑j(mod m)≢0(mod m1)Ex∈Bnem(jf(x))+1m∑k≢0(