Polynomial Representations of Threshold Functions and Algorithmic Applications

Josh Alman1    Timothy M. Chan2    Ryan Williams3
1 Computer Science Department, Stanford University, jalman@cs.stanford.edu. Supported by NSF CCF-1212372 and NSF DGE-114747.
2 Cheriton School of Computer Science, University of Waterloo, tmchan@uwaterloo.ca. Supported by an NSERC grant.
3 Computer Science Department, Stanford University, rrw@cs.stanford.edu. Supported in part by NSF CCF-1212372 and CCF-1552651 (CAREER). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Abstract

We design new polynomials for representing threshold functions in three different regimes: probabilistic polynomials of low degree, which need far less randomness than previous constructions, polynomial threshold functions (PTFs) with “nice” threshold behavior and degree almost as low as the probabilistic polynomials, and a new notion of probabilistic PTFs where we combine the above techniques to achieve even lower degree with similar “nice” threshold behavior. Utilizing these polynomial constructions, we design faster algorithms for a variety of problems:

  • Offline Hamming Nearest (and Furthest) Neighbors: Given n red and n blue points in d-dimensional Hamming space for d = c log n, we can find an (exact) nearest (or furthest) blue neighbor for every red point in randomized time n^{2−1/O(√c · log^{3/2} c)} or deterministic time n^{2−1/O(c log^2 c)}. These improve on a randomized bound of n^{2−1/O(c log^2 c)} by Alman and Williams (FOCS’15), and also lead to faster MAX-SAT algorithms for sparse CNFs.

  • Offline Approximate Nearest (and Furthest) Neighbors: Given n red and n blue points in d-dimensional ℓ_1 or Euclidean space, we can find a (1+ε)-approximate nearest (or furthest) blue neighbor for each red point in randomized time near dn + n^{2−Ω(ε^{1/3}/log(1/ε))}. This improves on an algorithm by Valiant (FOCS’12) with randomized time near dn + n^{2−Ω(√ε)}, which in turn improves previous methods based on locality-sensitive hashing.

  • SAT Algorithms and Lower Bounds for Circuits With Linear Threshold Functions: We give a satisfiability algorithm for AC^0[m]∘LTF∘LTF circuits with a subquadratic number of linear threshold gates on the bottom layer, and a subexponential number of gates on the other layers, that runs in deterministic 2^{n−n^ε} time. This strictly generalizes a SAT algorithm for ACC^0∘LTF circuits of subexponential size by Williams (STOC’14) and also implies new circuit lower bounds for threshold circuits, improving a recent gate lower bound of Kane and Williams (STOC’16). We also give a randomized 2^{n−n^ε}-time SAT algorithm for subexponential-size MAJ∘AC^0∘LTF∘AC^0∘LTF circuits, where the top MAJ gate and middle LTF gates have n^{6/5−δ} fan-in.

1 Introduction

The polynomial method is a powerful tool in circuit complexity. The idea of the method is to transform all circuits of some class 𝒞 into “nice” polynomials which represent the circuit in some way. If the polynomial is always sufficiently nice (e.g. has low degree), and one can prove that a certain Boolean function f cannot be represented so nicely, one concludes that the circuit class 𝒞 is unable to compute f.

Recently, these tools have found surprising uses in algorithm design. If a subproblem of an algorithmic problem can be modeled by a simple circuit, and that circuit can be transformed into a “nice” polynomial (or “nice” distribution of polynomials), then fast algebraic algorithms can be applied to evaluate or manipulate the polynomial quickly. This approach has led to advances on problems such as All-Pairs Shortest Paths [Wil14a], Orthogonal Vectors and Constraint Satisfaction [WY14, AWY15, Wil14d], All-Nearest Neighbor problems [AW15], and Stable Matching [MPS16].

In most applications, the key step is to randomly convert simple circuits into so-called probabilistic polynomials. If f is a Boolean function on n variables, and R is a ring, a probabilistic polynomial over R for f with error ε and degree d is a distribution 𝒟 of degree-d polynomials over R such that for all x ∈ {0,1}^n, Pr_{p∼𝒟}[p(x) = f(x)] ≥ 1 − ε. Razborov [Raz87] and Smolensky [Smo87] introduced the notion of a probabilistic polynomial, and showed that any low-depth circuit consisting of AND, OR, and PARITY gates can be transformed into a low-degree probabilistic polynomial by constructing constant-degree probabilistic polynomials for those three gates. Many polynomial method algorithms use this transformation.
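To make the definition concrete, here is the classic Razborov–Smolensky-style probabilistic polynomial for OR over F_2, sketched in Python. This is an illustration of the notion, not a construction from this paper: each sampled polynomial has degree t and errs on any fixed input with probability at most 2^{−t}.

```python
import random

def sample_or_polynomial(n, t):
    """Sample a probabilistic polynomial for OR on n bits over F_2.
    Draw t independent random parities and combine them as
    p(x) = 1 - prod_j (1 - <r_j, x> mod 2); degree t, error <= 2^-t."""
    R = [[random.randrange(2) for _ in range(n)] for _ in range(t)]
    def p(x):
        out = 1
        for r in R:
            parity = sum(ri * xi for ri, xi in zip(r, x)) % 2
            out *= (1 - parity)
        return 1 - out
    return p

random.seed(0)
n, t, trials = 20, 10, 2000
x = [1] + [0] * (n - 1)   # OR(x) = 1
# Each sampled polynomial is wrong on x only if all t parities vanish.
errs = sum(sample_or_polynomial(n, t)(x) != 1 for _ in range(trials))
assert errs / trials <= 2 ** -t + 0.05
# On the all-zeros input the sampled polynomial is always correct:
assert sample_or_polynomial(n, t)([0] * n) == 0
```

On a nonzero input each random parity is a uniform bit, so all t parities vanish with probability exactly 2^{−t}, matching the error bound checked above.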

In this work, we are interested in polynomial representations of threshold functions. The threshold function THR_θ determines whether at least a θ fraction of its input bits are 1s. Threshold functions are among the simplest Boolean functions that do not have constant-degree probabilistic polynomials: Razborov and Smolensky showed that the MAJORITY function (a special case of a threshold function) requires degree Ω(√n). Nonetheless, as we will see throughout this paper, there are many important problems which can be reduced to evaluating circuits involving threshold gates on many inputs, and so further study of polynomial representations of threshold functions is warranted.

Threshold functions have been extensively studied in theoretical computer science for many years; there are numerous applications of linear and polynomial threshold functions to complexity and learning theory (a sample includes  [BRS91, BS92, ABFR94, Bei95, KS01, OS10, She14]).

1.1 Our Results

We consider three different notions of polynomials representing threshold functions. Each achieves a different tradeoff between polynomial degree, the randomness required, and how accurately the polynomial represents the function. Each leads to improved algorithms in our applications.

Less Randomness. First, we revisit probabilistic polynomials. Alman and Williams [AW15] designed a probabilistic polynomial for MAJORITY which already achieves a tight degree bound of Θ(√(n log(1/ε))) for error ε. However, their construction uses Ω(n) random bits, which makes it difficult to apply in deterministic algorithms. We show how their low-degree probabilistic polynomials for threshold functions can use substantially fewer random bits:

Theorem 1.1.

For any 0 < ε < 1, there is a probabilistic polynomial for the function THR_θ of degree O(√(n log(1/ε))) on n bits with error ε that can be randomly sampled using only O(log n · log(n/ε)) random bits.

Polynomial Threshold Function Representations. Second, we consider deterministic Polynomial Threshold Functions (PTFs). A PTF for a Boolean function f is a polynomial P (not a distribution on polynomials) such that P(x) is smaller than a fixed value when f(x) = 0, and is larger than that value when f(x) = 1. In our applications, we seek PTFs with “good threshold behavior”, such that P(x) stays within a small interval when f(x) = 0, and is very large otherwise. We can achieve almost the same degree for a PTF as for a probabilistic polynomial, and even better degree for an approximate threshold function:

Theorem 1.2.

We can construct a polynomial P : {0,1}^n → ℝ of degree O(√(1/ε) · log s), such that

  • if Σ_i x_i ≤ θn, then P(x) ∈ [−1, 1];

  • if Σ_i x_i > θn, then P(x) > 1;

  • if Σ_i x_i ≥ (θ + ε)n, then P(x) ≥ s.

For the “exact” setting with ε = 1/n, we can alternatively bound the degree by O(√(n log s)).

By summing multiple copies of the polynomial from Theorem 1.2, we immediately obtain a PTF with the same degree for the OR of s threshold functions (needed in our applications). This theorem follows directly from known extremal properties of Chebyshev polynomials, as well as the lesser-known discrete Chebyshev polynomials. Because Theorem 1.2 gives a single polynomial instead of a distribution on polynomials, it is especially helpful for designing deterministic algorithms. Chebyshev polynomials are well known to yield good approximate polynomials for computing certain Boolean functions over the reals [NS94, Pat92, KS01, She13, Val12] (please see the Preliminaries for more background).
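The extremal property driving these PTF constructions is that a degree-q Chebyshev polynomial stays bounded on [−1, 1] yet grows exponentially fast just outside that interval (see Fact 4.1 in Section 4). A quick numerical sketch:

```python
def cheb(q, x):
    """Evaluate the Chebyshev polynomial T_q at x via the recurrence
    T_0 = 1, T_1 = x, T_{k+1}(x) = 2x*T_k(x) - T_{k-1}(x)."""
    t_prev, t_cur = 1.0, x
    if q == 0:
        return t_prev
    for _ in range(q - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur

# Bounded by 1 in absolute value throughout [-1, 1] ...
assert all(abs(cheb(20, -1 + i / 50)) <= 1 + 1e-9 for i in range(101))

# ... but grows exponentially just outside: T_q(1 + delta) >= 2^(q*sqrt(delta) - 1)
q, delta = 30, 0.04
assert cheb(q, 1 + delta) >= 2 ** (q * delta ** 0.5 - 1)
```

This gap between “bounded inside” and “huge outside” is exactly what a PTF with good threshold behavior needs: feed in a normalized count so that false inputs land inside [−1, 1] and true inputs land past 1 + δ.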

Probabilistic PTFs. Third, we introduce a new (natural) notion of a probabilistic PTF for a Boolean function f. This is a distribution on PTFs, where for each input x, a PTF drawn from the distribution is highly likely to agree with f on x. Combining the techniques from probabilistic polynomials for threshold functions and the deterministic PTFs in a simple way, we construct a probabilistic PTF with good threshold behavior whose degree is lower than both the deterministic PTF and the degree bounds attainable by probabilistic polynomials (surprisingly breaking the “square-root barrier”):

Theorem 1.3.

We can construct a probabilistic polynomial F of degree Õ((1/ε)^{1/3} log s), such that

  • if Σ_i x_i ≤ θn, then F(x) ∈ [−1, 1] with probability at least 1 − 1/s;

  • if Σ_i x_i > θn, then F(x) > 1 with probability at least 1 − 1/s;

  • if Σ_i x_i ≥ (θ + ε)n, then F(x) ≥ s with probability at least 1 − 1/s.

For the “exact” setting with ε = 1/n, we can alternatively bound the degree by Õ(n^{1/3} log s).

The PTFs of Theorem 1.3 can be sampled using only O(log n · log(ns)) random bits as well; their lower degree will allow us to design faster randomized algorithms for a variety of problems. For emphasis, we will sometimes refer to PTFs as deterministic PTFs to distinguish them from probabilistic PTFs.

These polynomials for threshold functions can be applied to many different problems:

Offline Hamming Nearest Neighbor Search. In the Hamming Nearest Neighbor problem, we wish to preprocess a set D of n points in {0,1}^d such that, for a query q ∈ {0,1}^d, we can quickly find the p ∈ D with smallest Hamming distance to q. This problem is central to many problems throughout Computer Science, especially in search and error correction [Ind04]. However, it suffers from the curse of dimensionality phenomenon, where known algorithms achieve the nearly trivial runtimes of either roughly 2^d or roughly n·d per query, with matching lower bounds in many data structure models (see e.g. [BR02]). Using our PTFs, we instead design a new algorithm for the natural offline version of this problem:

Theorem 1.4.

Given n red and n blue points in {0,1}^d for d = c log n, we can find an (exact) Hamming nearest/farthest blue neighbor for every red point in randomized time n^{2−1/O(√c · log^{3/2} c)}.

Using the same ideas, we are also able to derandomize our algorithm, to achieve deterministic time n^{2−1/O(c log^2 c)} (see Remark 3 in Section 5). When d = c log n for constant c, these algorithms both have “truly subquadratic” runtimes. They both improve on Alman and Williams’ algorithm [AW15], which runs in randomized time n^{2−1/O(c log^2 c)} and only gives a nontrivial algorithm for a smaller range of c. Applying reductions from [AW15], we can achieve similar runtimes for finding closest pairs in ℓ_1 for vectors with small integer entries, and pairs with maximum inner product or Jaccard coefficient.

It is worth noting that there may be a serious limit to solving this problem much faster. Theorem 1.4 (and [AW15]) shows that for every c there is a δ > 0 such that Offline Hamming Nearest Neighbor search in dimension c log n takes n^{2−δ} time. Showing that there is a universal δ > 0 that works for all c would disprove the Strong Exponential Time Hypothesis [AW15, Theorem 1.4].

Offline Approximate Nearest Neighbor Search. The problem of finding high-dimensional approximate nearest neighbors has received even more attention. Locality-sensitive hashing yields data structures that can find (1+ε)-factor approximate nearest neighbors to any query point in (randomized) time roughly dn^{1/(1+ε)} after preprocessing in roughly dn^{1+1/(1+ε)} time and space, for not only Hamming space but also ℓ_1 and ℓ_2 space [HIM12, AI06]. (Throughout the paper, the Õ notation hides polylogarithmic factors, Ω̃ likewise hides polylogarithmic factors in lower bounds, and poly(n) denotes a fixed polynomial in n.) Thus, a batch of n queries can be answered in roughly dn^{1+1/(1+ε)} randomized time. Exciting recent work on locality-sensitive hashing [AINR14, AR15] has improved the constant factor in the exponent, but not the growth rate in ε. In 2012, G. Valiant [Val12] reported a surprising algorithm running in randomized time near dn + n^{2−Ω(√ε)} for the offline version of the problem in ℓ_2. We obtain a still faster algorithm for the offline problem, with √ε improved to about ε^{1/3}:

Theorem 1.5.

Given n red and n blue points in d dimensions and ε ∈ (0, 1), we can find a (1+ε)-approximate ℓ_1 or ℓ_2 nearest/farthest blue neighbor for each red point in dn + n^{2−Ω(ε^{1/3}/log(1/ε))} randomized time.

Valiant’s algorithm, like Alman and Williams’ [AW15], relied on fast matrix multiplication, and it also used Chebyshev polynomials but in a seemingly more complicated way. Our new probabilistic PTF construction is inspired by our attempt to unify Valiant’s approach with Alman and Williams’, which leads to not only a simplification but also an improvement of Valiant’s algorithm. (We also almost succeed in derandomizing Valiant’s result in the Hamming case, except for an initial dimension reduction step; see Remark 3 in Section 5.)

Numerous applications to high-dimensional computational geometry follow; for example, we can approximate the diameter or Euclidean minimum spanning tree in roughly the same running time.

MAX-SAT. Another application is MAX-SAT: finding an assignment that satisfies the maximum number of clauses in a given CNF formula with n variables. In the sparse case when the number of clauses is m = cn, a series of papers has given faster exact algorithms, including those of Dantsin and Wolpert [DW06], Sakai et al. [SSTT15a], and Chen and Santhanam [CS15]. Using the polynomial method and our new probabilistic PTF construction, we obtain the following improved result:

Theorem 1.6.

Given a CNF formula with n variables and cn clauses, we can find an assignment that satisfies the maximum number of clauses in randomized time 2^{n−n/Õ(c^{1/3})}.

For general dense instances, the problem becomes tougher. Williams [Wil04] gave an O(2^{ωn/3}) ≤ O(1.74^n)-time algorithm for MAX-2-SAT, but an O((2−δ)^n)-time algorithm for MAX-3-SAT (for a universal δ > 0) has remained open; the best reported time bound [SSTT15b] improves on 2^n only by a modest factor, and can be slightly improved further with more care. We make new progress on not only MAX-3-SAT but also MAX-4-SAT:

Theorem 1.7.

Given a weighted 4-CNF formula with n variables with positive integer weights bounded by poly(n), we can find an assignment that maximizes the total weight of satisfied clauses in randomized time 2^{n−n/polylog(n)}. In the sparse case when the clauses have total weight cn, the time bound improves to 2^{n−n/Õ(c^{1/3})}.

LTF-LTF Circuit SAT Algorithms and Lower Bounds. Using our small sample space for probabilistic MAJORITY polynomials (Theorem 1.1), we construct a new circuit satisfiability algorithm for circuits with linear threshold functions (LTFs) which improves over several prior results. Let AC^0[m]∘LTF∘LTF be the class of circuits with a layer of LTFs at the bottom layer (nearest the inputs), a layer of LTFs above the bottom layer, and an AC^0[m] circuit of depth d above the two LTF layers. (Recall that for an integer m, AC^0[m] refers to constant-depth unbounded fan-in circuits over the basis {AND, OR, NOT, MOD_m}, where MOD_m outputs 1 iff the sum of its input bits is divisible by m.)

Theorem 1.8.

For every integer d > 0, m > 1, and δ > 0, there is an ε > 0 and an algorithm for satisfiability of depth-d AC^0[m]∘LTF∘LTF circuits with 2^{n^ε} gates above the bottom layer and n^{2−δ} LTF gates on the bottom layer, running in deterministic 2^{n−n^ε} time.

Williams [Wil14b] gave a comparable SAT algorithm for ACC^0∘LTF circuits of 2^{n^ε} size, where ε > 0 is sufficiently small. (Recall ACC^0 is the infinite union of AC^0[m] over all integers m.) Theorem 1.8 strictly generalizes the previous algorithm, allowing another layer of linear threshold functions below the existing LTF layer. Theorem 1.8 also trivially implies deterministic SAT algorithms for LTF∘LTF circuits with up to n^{2−δ} gates, improving over the recent SAT algorithms of Chen, Santhanam, and Srinivasan [CSS16], which only work for LTF∘LTF circuits with a subquadratic number of wires, and the SAT algorithms of Impagliazzo, Paturi, and Schneider [IPS13].

Here we sketch the ideas in the SAT algorithm for AC^0[m]∘LTF∘LTF. Similar to the SAT algorithm for ACC^0∘LTF circuits [Wil14b], the bottom layer of LTFs can be replaced by a layer of DNFs, via a weight reduction trick. We replace LTFs in the middle layer with MAJORITY-based circuitry (modifying a construction of Maciel and Thérien [MT98] to keep the fan-in of MAJORITY gates low), then replace these MAJORITY gates of low fan-in with low-degree probabilistic PARITY-of-AND polynomials over a small sample space, provided by Theorem 1.1. Taking a majority vote over all samples, and observing that a PARITY-of-AND polynomial is a small constant-depth circuit with modular gates, we obtain an ACC^0-style circuit, but with exponentially large size in some of its layers. By carefully applying known depth reduction techniques, we can convert the circuit into a depth-two circuit of 2^{n^{O(ε)}} size which can then be evaluated efficiently on many inputs. (This is not obvious: applying the Beigel–Tarui depth reduction directly to the large intermediate circuit would make its new size quasi-polynomial in that size, yielding an intractable bound well above 2^n.)

Applying the known connection between circuit satisfiability algorithms and circuit lower bounds for NEXP problems [Wil10, Wil14c, JMV15], the following is immediate:

Corollary 1.1.

For every d > 0, m > 1, and δ > 0, there is an ε > 0 such that the class NEXP does not have non-uniform AC^0[m]∘LTF∘LTF circuits in which the AC^0[m] subcircuit has 2^{n^ε} size and the bottom layer has n^{2−δ} LTF gates. In particular, for every δ > 0, NEXP does not have LTF∘LTF circuits where the bottom layer has n^{2−δ} gates.

Most notably, Corollary 1.1 proves lower bounds against circuits with LTFs on the bottom layer and subexponentially many LTFs on the second layer. This improves upon recent gate lower bounds of Kane and Williams [KW16], at the cost of raising the complexity of the hard function from P to NEXP. Suguru Tamaki [Tam16] has recently reported similar results for depth-two circuits with both symmetric and threshold gates.

A Powerful Randomized SAT Algorithm. Finally, combining the probabilistic PTF for MAJORITY (Theorem 1.3) with the probabilistic polynomial of [AW15], we give a randomized SAT algorithm for a rather powerful class of circuits. The class MAJ∘AC^0∘LTF∘AC^0∘LTF denotes the class of circuits with a majority gate at the top, along with two layers of linear threshold gates, and arbitrary O(1)-depth AC^0 circuitry between these three layers. This circuit class is arguably much more powerful than depth-two threshold circuits (MAJ∘MAJ), based on known low-depth circuit constructions for arithmetic functions (e.g. [CSV84, MT98, MT99]).

Theorem 1.9.

For all δ > 0 and integers d, m > 1, there is an ε > 0 and a randomized satisfiability algorithm for MAJ∘AC^0∘LTF∘AC^0∘LTF circuits of depth d running in 2^{n−n^ε} time, on circuits with the following properties:

  • the top MAJ gate, along with every LTF on the middle layer, has n^{6/5−δ} fan-in, and

  • there are 2^{n^ε} many AC^0 gates (anywhere) and 2^{n^ε} LTF gates at the bottom layer.

Theorem 1.9 applies the probabilistic PTF of degree about k^{1/3} (Theorem 1.3) to the top MAJ gate of fan-in k, probabilistic polynomials of degree about √k (Theorem 1.1) to the middle LTFs, and weight reduction to the bottom LTFs; the rest can be represented with low degree. Composing the two constructions yields degree about k^{1/3} · k^{1/2} = k^{5/6}, which remains a sufficiently small power of n when k ≤ n^{6/5−δ}.

It would not be surprising (to at least one author) if the above circuit class contained strong pseudorandom function candidates; that is, it seems likely that the Natural Proofs barrier applies to this circuit class. Hence from the circuit lower bounds perspective, the problem of derandomizing the SAT algorithm of Theorem 1.9 is extremely interesting.

2 Preliminaries

Notation. In what follows, for a positive integer n define [n] := {1, …, n}. For a logical predicate P, we use the notation [P] to denote the function which outputs 1 when P is true, and 0 when P is false.

For θ ∈ (0, 1), define THR_θ : {0,1}^n → {0,1} to be the threshold function THR_θ(x) := [Σ_i x_i ≥ θn]. In particular, MAJORITY(x) = THR_{1/2}(x).

For classes of circuits 𝒞 and 𝒟, 𝒞∘𝒟 denotes the class of circuits consisting of a single circuit from 𝒞 whose inputs are the outputs of some circuits from 𝒟. That is, 𝒞∘𝒟 is simply the composition of circuits from 𝒞 and 𝒟.

Rectangular Matrix Multiplication. One of our key tools is fast rectangular matrix multiplication:

Lemma 2.1 (Coppersmith [Cop82]).

For all sufficiently large N, multiplication of an N × N^{0.172} matrix with an N^{0.172} × N matrix can be done in N^2 · poly(log N) arithmetic operations over any field.

A proof can be found in the appendix of [Wil14b].
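To illustrate why rectangular matrix multiplication is the right tool here: if a polynomial p(x, y) splits as a sum of products of functions of x and functions of y, then its values on all pairs of points from two sets are obtained from a single rectangular matrix product. A toy numpy sketch, with ordinary matrix multiplication standing in for Coppersmith's algorithm and an invented example polynomial:

```python
import numpy as np

# Evaluate p(x, y) = sum_k A_k(x) * B_k(y) on all (red, blue) pairs at once:
# form the matrix of A_k-values over red points and of B_k-values over blue
# points; one rectangular product then yields all n^2 evaluations.
rng = np.random.default_rng(1)
n, d, m = 8, 5, 4                      # points, dimension, number of product terms
red = rng.integers(0, 2, (n, d))
blue = rng.integers(0, 2, (n, d))

# Toy split polynomial: A_k(x) = x_k and B_k(y) = 1 - y_k, so
# p(x, y) = sum_{k<m} x_k * (1 - y_k) counts one kind of mismatch.
A = red[:, :m].astype(float)           # n x m matrix of A_k(red_i)
B = (1 - blue[:, :m]).astype(float).T  # m x n matrix of B_k(blue_j)
P = A @ B                              # n x n: P[i, j] = p(red_i, blue_j)

i, j = 3, 6
direct = sum(red[i, k] * (1 - blue[j, k]) for k in range(m))
assert P[i, j] == direct
```

In the algorithms of Section 5, the inner dimension m is the number of monomials of a low-degree PTF, which is kept below N^{0.172} precisely so that Lemma 2.1 applies and the whole batch costs near-quadratic time.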

Chebyshev Polynomials in TCS. Another key to our work is that we find new applications of Chebyshev polynomials to algorithm design. This is certainly not a new phenomenon in itself; here we briefly survey some prior related usages of Chebyshev polynomials. First, Nisan and Szegedy [NS94] used Chebyshev polynomials to compute the OR function on n Boolean variables with an “approximating” polynomial p of degree only O(√n), such that |p(x) − OR(x)| ≤ 1/3 for all x ∈ {0,1}^n. They also proved the O(√n) degree bound is tight up to constants in the big-O; Paturi [Pat92] generalized the upper and lower bounds to all symmetric functions.

This work has led to several advances in learning theory. Building on the polynomials of Nisan and Szegedy, Klivans and Servedio [KS01] showed how to compute an OR of s ANDs of w variables with a PTF of degree O(√w · log s), similar to our degree bound for computing an OR of s MAJORITYs of Theorem 1.2 (however, note our bound in the “exact” setting is a bit better, due to our use of discrete Chebyshev polynomials). They also show how to compute an OR of s ANDs on n variables with a deterministic PTF of O(n^{1/3} log s) degree, similar to our cube-root-degree probabilistic PTF for the OR of MAJORITY of Theorem 1.3 in the “exact” setting. However, it looks difficult to generalize Klivans–Servedio’s degree bound to compute an OR of MAJORITY: part of their construction uses a reduction to decision lists which works for conjunctions but not for MAJORITY functions. Klivans, O’Donnell and Servedio [KOS04] show how to compute an AND of s MAJORITYs on n variables with a PTF of degree O(√n · log s). By a simple transformation via De Morgan’s law, there is a polynomial for OR of MAJORITY with the same degree. Their degree is only slightly worse than ours in terms of s (because we use discrete Chebyshev polynomials).

In streaming algorithms, Harvey, Nelson, and Onak [HNO08] use Chebyshev polynomials to design efficient algorithms for computing various notions of entropy in a stream. As a consequence of a query upper bound in quantum computing, Ambainis et al. [ACR10] show how to approximate any Boolean formula of size n with a polynomial of degree n^{1/2+o(1)}, improving on earlier bounds of O’Donnell and Servedio [OS10] that use Chebyshev polynomials. Sachdeva and Vishnoi [SV13] give applications of Chebyshev polynomials to graph algorithms and matrix algebra. Linial and Nisan [LN90] use Chebyshev polynomials to approximate inclusion-exclusion formulas, and Sherstov [She08] extends this to arbitrary symmetric functions.

3 Derandomizing Probabilistic Polynomials for Threshold Functions

In this section, we revisit the previous probabilistic polynomial for the majority function on n bits, and show it can be implemented using only O(log n · log(n/ε)) random bits. Our construction is essentially identical to that of [AW15], except that we use far fewer random bits to sample entries from the input vector in the recursive step of the construction.

For the analysis, we need a Chernoff bound for bits with limited independence:

Lemma 3.1 ([SSS95] Theorem 5 (I)(b)).

If X is the sum of k-wise independent random variables, each of which is confined to the interval [0, 1], with μ = E[X], δ ≤ 1, and k ≤ ⌊δ^2 μ e^{−1/3}⌋, then

Pr[|X − μ| ≥ δμ] ≤ e^{−⌊k/2⌋}.

In particular, the following inequality appears in the analysis of [AW15]:

Corollary 3.1.

If x ∈ {0,1}^n, and x̃ ∈ {0,1}^m is a vector each of whose entries is a k-wise independently chosen uniformly random entry of x, where k ≤ ⌊δ^2 μ e^{−1/3}⌋, then for every δ ≤ 1,

Pr[|Σ_i x̃_i − μ| ≥ δμ] ≤ e^{−⌊k/2⌋},

where μ := (m/n) · Σ_i x_i.

Proof.

Apply Lemma 3.1 with X := Σ_i x̃_i and the given k, δ, and μ. ∎

Reminder of Theorem 1.1 For any 0 < ε < 1, there is a probabilistic polynomial for the threshold function THR_θ of degree O(√(n log(1/ε))) on n bits with error ε that can be randomly sampled using O(log n · log(n/ε)) random bits.

Proof.

Our polynomial is defined recursively, just as in [AW15]. Using their notation, the polynomial M_{n,ε} for computing MAJORITY on n bits with error ε is built from the same recursion as in [AW15], combining a recursive call on a sample x̃ of the input with low-degree correction terms.

In [AW15], x̃ was a sample of bits of x chosen independently at random. Here, we instead pick x̃ to be a sample of the same number of bits chosen k-wise independently, for k = O(log(n/ε)). The other polynomials in this recursive definition are as in [AW15]:

  • the (recursively defined) probabilistic polynomial for MAJORITY on fewer bits and a suitably adjusted error;

  • low-degree polynomials combining the recursive value with the sampled sum; and

  • an exact polynomial of degree at most O(√(n log(1/ε))) which gives the correct answer to MAJORITY for any vector whose sum lies close to the threshold, and may give arbitrary answers on other vectors.

Examining the proof of correctness in Alman and Williams [AW15], we see that the only requirement of the randomness is that it satisfies their Lemma 3.4, a concentration inequality for sampling from . Our Corollary 3.1 is identical to their Lemma 3.4, except that it replaces their method of sampling with -wise sampling; the remainder of the proof of correctness is exactly as before.

Our polynomial construction is recursive: we shrink n and adjust ε each time we move from one recursive layer to the next. At the i-th recursive level of our construction, we need to k-wise independently sample entries from a vector whose length shrinks geometrically with i. Summing across all of the layers, we need a total of O(n) samples from a k-wise independent space, where k is never more than O(log(n/ε)). This can be done all together using O(n) samples from [n] which are O(log(n/ε))-wise independent. Using standard constructions, this requires O(log n · log(n/ε)) random bits. ∎
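The standard construction referred to here draws a k-wise independent sequence from the evaluations of a uniformly random polynomial of degree below k over a prime field GF(p), which costs k·⌈log_2 p⌉ random bits in total. A small sketch with assumed toy parameters:

```python
import random

def kwise_sequence(k, m, p):
    """Generate m values in {0, ..., p-1} that are k-wise independent:
    the evaluations f(0), f(1), ..., f(m-1) of a uniformly random
    polynomial f of degree < k over GF(p) (p prime) are k-wise
    independent. Randomness used: k coefficients, i.e. k*ceil(log2 p) bits."""
    assert m <= p  # need distinct evaluation points in the field
    coeffs = [random.randrange(p) for _ in range(k)]
    def f(i):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod p
            acc = (acc * i + c) % p
        return acc
    return [f(i) for i in range(m)]

# Sanity check: the marginals of pairwise-independent (k=2) samples are uniform.
random.seed(3)
p, m, trials = 101, 8, 4000
counts = [0] * p
for _ in range(trials):
    for v in kwise_sequence(2, m, p):
        counts[v] += 1
mean = trials * m / p
assert all(abs(c - mean) < mean for c in counts)
```

To sample indices from [n] as in the proof, one takes p to be a prime slightly larger than n and maps values into [n]; the O(log n · log(n/ε)) bit count is exactly k coefficients of ⌈log_2 p⌉ bits each with k = O(log(n/ε)).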

4 PTFs for ORs of Threshold Functions

In this section, we show how to construct low-degree PTFs representing threshold functions that have good threshold behavior, and consequently obtain low-degree PTFs for an OR of many threshold functions.

4.1 Deterministic Construction

We begin by reviewing some basic facts about Chebyshev polynomials. The degree-q Chebyshev polynomial of the first kind T_q may be defined by the recurrence T_0(x) = 1, T_1(x) = x, and T_{q+1}(x) = 2x·T_q(x) − T_{q−1}(x); it satisfies T_q(x) = cos(q·arccos x) for x ∈ [−1, 1] and T_q(x) = cosh(q·arccosh x) for x ≥ 1.

Fact 4.1.

For any q ∈ ℕ and δ ∈ (0, 1],

  • if x ∈ [−1, 1], then T_q(x) ∈ [−1, 1];

  • if x ≥ 1, then T_q(x) ≥ 1;

  • if x ≥ 1 + δ, then T_q(x) ≥ 2^{q√δ − 1}.

Proof.

The first property easily follows from the formula T_q(x) = cos(q·arccos x). The second and third properties follow from another known formula, T_q(x) = cosh(q·arccosh x) for x ≥ 1, which for x ≥ 1 + δ with δ ∈ (0, 1] implies T_q(x) ≥ cosh(q√δ) ≥ 2^{q√δ − 1}. ∎

In certain scenarios, we obtain slightly better results using a (lesser known) family of discrete Chebyshev polynomials D_q^{(n)}, defined via an explicit sum of products of binomial coefficients [Hir03, page 59]. (See also [Sze75, pages 33–34] or Chebyshev’s original paper [Che99] with an essentially equivalent definition up to rescaling.)

Fact 4.2.

Let q ≤ n. For all x,

  • if x ∈ {0, 1, …, n}, then |D_q^{(n)}(x)| ≤ 1;

  • if x ≥ n + 1, then D_q^{(n)}(x) ≥ 2^{Ω(q^2/n)}.

Proof.

From [Hir03, page 61], the values of D_q^{(n)} at the integer points are bounded: for every integer x ∈ {0, 1, …, n}, we have |D_q^{(n)}(x)| ≤ 1.

For x ≥ n + 1, all terms of the defining sum have the same sign, and by the Chu–Vandermonde identity the sum can be bounded from below by 2^{Ω(q^2/n)}. ∎

Reminder of Theorem 1.2 We can construct a polynomial P : {0,1}^n → ℝ of degree O(√(1/ε) · log s), such that

  • if Σ_i x_i ≤ θn, then P(x) ∈ [−1, 1];

  • if Σ_i x_i > θn, then P(x) > 1;

  • if Σ_i x_i ≥ (θ + ε)n, then P(x) ≥ s.

For the “exact” setting with ε = 1/n, we can alternatively bound the degree by O(√(n log s)).

Proof.

Set P(x) := T_q(Σ_i x_i / (θn)) for a parameter q to be determined. The first two properties are obvious from Fact 4.1. On the other hand, if Σ_i x_i ≥ (θ + ε)n, then the argument of T_q is at least 1 + ε, so Fact 4.1 shows that P(x) ≥ 2^{q√ε − 1} ≥ s, provided we set q = O(√(1/ε) · log s). This achieves O(√(1/ε) · log s) degree.

When ε = 1/n the above yields O(√n · log s) degree; we can reduce the log s factor by instead using the discrete Chebyshev polynomial, applying D_q^{(m)} to Σ_i x_i with m = ⌊θn⌋. Now, if Σ_i x_i > θn, then P(x) ≥ 2^{Ω(q^2/n)} ≥ s by Fact 4.2, by setting q = O(√(n log s)). ∎

Using Theorem 1.2, we can construct a low-degree PTF for computing an OR of s thresholds of n bits:

Corollary 4.1.

Given thresholds θ_1, …, θ_s, we can construct a polynomial P of degree at most Δ = O(√(1/ε) · log s) and at most n^{O(Δ)} monomials, such that

  • if the formula ∨_{j=1}^s [Σ_i (x_j)_i ≥ θ_j n] is false, then P(x_1, …, x_s) ∈ [−s, s];

  • if some j satisfies Σ_i (x_j)_i ≥ (θ_j + ε)n, then P(x_1, …, x_s) > 2s.

For the exact setting with ε = 1/n, we can alternatively bound Δ by O(√(n log s)).

Proof.

Define P := Σ_{j=1}^s P_j, where P_j is the polynomial from Theorem 1.2 for the j-th threshold, instantiated with s ← 3s. The stated properties clearly hold. (In the second case, the output is at least 3s − (s − 1) > 2s.) ∎
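A numerical sketch of this summing trick, with illustrative parameters rather than the exact ones of Theorem 1.2: each Chebyshev-based term stays in [−1, 1] on a false input, so the sum of s terms has magnitude at most s, while a single threshold exceeded with a margin makes the sum enormous.

```python
def cheb(q, x):
    # Chebyshev polynomial T_q via the standard recurrence.
    t_prev, t_cur = 1.0, x
    if q == 0:
        return t_prev
    for _ in range(q - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur

def threshold_ptf(q, theta_n):
    """PTF for [sum(x) > theta_n] in the style of Theorem 1.2:
    bounded by 1 in magnitude when the threshold is not exceeded,
    huge when it is exceeded by a margin (illustrative parameters)."""
    return lambda x: cheb(q, sum(x) / theta_n)

s, q = 5, 40
ptf = threshold_ptf(q, 50)                          # threshold: more than 50 ones
or_ptf = lambda inputs: sum(ptf(x) for x in inputs)  # OR of s thresholds

all_false = [[1] * 40 + [0] * 60 for _ in range(s)]    # 40 ones each: false
assert abs(or_ptf(all_false)) <= s
one_true = all_false[:-1] + [[1] * 70 + [0] * 30]      # 70 ones: true w/ margin
assert or_ptf(one_true) > 10 * s
```

Here T_40(70/50) = T_40(1.4) is astronomically larger than the at most s − 1 bounded terms, which is exactly the separation the corollary exploits.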

4.2 Probabilistic Construction

Allowing ourselves a distribution of PTFs to randomly draw from, we can achieve noticeably lower degree than in the previous section. We start with a fact which follows easily from the (tight) probabilistic polynomial for MAJORITY:

Fact 4.3.

(Alman–Williams [AW15], or Theorem 1.1) We can construct a probabilistic polynomial M of degree O(√(n log s)), such that

  • if Σ_i x_i ≤ θn, then M(x) = 0 with probability at least 1 − 1/s;

  • if Σ_i x_i > θn, then M(x) = 1 with probability at least 1 − 1/s.

Reminder of Theorem 1.3 We can construct a probabilistic polynomial F of degree Õ((1/ε)^{1/3} log s), such that

  • if Σ_i x_i ≤ θn, then F(x) ∈ [−1, 1] with probability at least 1 − 1/s;

  • if Σ_i x_i > θn, then F(x) > 1 with probability at least 1 − 1/s;

  • if Σ_i x_i ≥ (θ + ε)n, then F(x) ≥ s with probability at least 1 − 1/s.

For the “exact” setting with ε = 1/n, we can alternatively bound the degree by Õ(n^{1/3} log s).

Proof.

Let a (a sample size) and ε' (a margin) be parameters to be set later. Draw a random sample R of size a. Let

θ̃ := θ + c·√(log(s)/a)

for a sufficiently large constant c. Define F(x) := P(Σ_{i∈R} x_i), where P is the polynomial from Theorem 1.2, instantiated on a bits with threshold θ̃ and margin ε', combined with the probabilistic polynomial of Fact 4.3 to correct inputs whose count falls near the threshold.

To verify the stated properties, consider three cases:

  • Case 1: Σ_i x_i ≤ θn. By a standard Chernoff bound, with probability at least 1 − 1/s the sampled sum Σ_{i∈R} x_i does not exceed θ̃a, and so F(x) ∈ [−1, 1].

  • Case 2: θn < Σ_i x_i < (θ + ε)n. With probability at least 1 − 1/s the near-threshold correction is correct, and so F(x) > 1.

  • Case 3: Σ_i x_i ≥ (θ + ε)n. By a standard Chernoff bound, with probability at least 1 − 1/s the sampled sum clears the shifted threshold with margin ε', and so F(x) ≥ s.

The degree of F is governed by two contributions — the degree of P on a bits with margin ε', and the degree of the near-threshold correction — and we can set a to balance them, yielding Õ((1/ε)^{1/3} log s). For the exact setting, the same balancing yields degree Õ(n^{1/3} log s). ∎
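The overall shape of the construction — subsample the coordinates, then apply a Chebyshev-style PTF to the sampled count with a shifted threshold that absorbs sampling deviations — can be sketched as follows. The parameters and the `slack` term are illustrative stand-ins, not the paper's exact setting, and the near-threshold correction is omitted:

```python
import random

def cheb(q, x):
    # Chebyshev polynomial T_q via the standard recurrence.
    t_prev, t_cur = 1.0, x
    if q == 0:
        return t_prev
    for _ in range(q - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur

def sample_prob_ptf(n, a, q, theta, slack):
    """Sample one probabilistic PTF for a threshold on n bits:
    fix a random subset R of a coordinates, then apply the Chebyshev
    PTF to the sampled count with the threshold shifted by `slack`
    (a hypothetical margin term) to absorb sampling deviations."""
    R = random.sample(range(n), a)
    shifted = theta * a + slack
    return lambda x: cheb(q, sum(x[i] for i in R) / shifted)

random.seed(2)
n, a, q = 10_000, 400, 25
F = sample_prob_ptf(n, a, q, theta=0.5, slack=40.0)

far_below = [1] * 4_000 + [0] * 6_000   # 40% ones: well below threshold
far_above = [1] * 7_500 + [0] * 2_500   # 75% ones: well above threshold
random.shuffle(far_below)
random.shuffle(far_above)

assert abs(F(far_below)) <= 1   # sampled count ~160, under the shifted 240
assert F(far_above) > 100       # sampled count ~300, clears 240 by a margin
```

The degree saving comes from evaluating the Chebyshev part on only a ≪ n sampled bits; the price is that counts landing within the sampling noise of the threshold need the separate probabilistic-polynomial correction described in the proof.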

Remark 1.

Using the same techniques as in Theorem 1.1, we can sample a probabilistic polynomial from Theorem 1.3 with only O(log n · log(ns)) random bits.

Corollary 4.2.

Given thresholds θ_1, …, θ_s, we can construct a probabilistic polynomial P of degree at most Δ = Õ((1/ε)^{1/3} log s) with at most n^{O(Δ)} monomials, such that

  • if the formula ∨_{j=1}^s [Σ_i (x_j)_i ≥ θ_j n] is false, then P(x_1, …, x_s) ∈ [−s, s] with probability at least 2/3;

  • if some j satisfies Σ_i (x_j)_i ≥ (θ_j + ε)n, then P(x_1, …, x_s) > 2s with probability at least 2/3.

For the exact setting with ε = 1/n, we can alternatively bound Δ by Õ(n^{1/3} log s).

Proof.

Define P := Σ_{j=1}^s F_j, where F_j is the probabilistic polynomial from Theorem 1.3 for the j-th threshold, instantiated with an error parameter small enough that a union bound over the s terms succeeds with probability at least 2/3. ∎

Remark 2.

The coefficients of the polynomials from Fact 4.3 are poly(n)-bit integers, and it can be checked that the coefficients of all our deterministic and probabilistic polynomials are rational numbers with poly(n)-bit numerators and a common poly(n)-bit denominator, and that the same bound for the number of monomials holds for the construction time, up to poly(n) factors. That is, computations with these polynomials have low computational overhead relative to their size.

5 Exact and Approximate Offline Nearest Neighbor Search

We now apply our new probabilistic PTF construction to obtain a faster algorithm for offline exact nearest/farthest neighbor search in Hamming space:

Reminder of Theorem 1.4 Given n red and n blue points in {0,1}^d for d = c log n, we can find an (exact) Hamming nearest/farthest blue neighbor for every red point in randomized time n^{2−1/O(√c · log^{3/2} c)}.

Proof.

We proceed as in Abboud, Williams, and Yu’s algorithm for Boolean orthogonal vectors [AWY15] or Alman and Williams’ algorithm for Hamming closest pair [AW15]. For a fixed t, we first solve the decision problem of testing, for each red point, whether its nearest blue neighbor is at distance less than t. (Farthest neighbors are similar.) Let s be a group-size parameter to be set later. Arbitrarily divide the blue point set into n/s groups of s points. For every group G of s blue points and every red point q, we want to test whether

∨_{b∈G} [Σ_{i=1}^d (q_i ⊕ b_i) < t]

(where b_i denotes the i-th coordinate of a point b). By Corollary 4.2, we can express this OR as a probabilistic polynomial that has the following number of monomials: