Quantified Derandomization of Linear Threshold Circuits

One of the prominent current challenges in complexity theory is the attempt to prove lower bounds for $\mathcal{TC}^0$, the class of constant-depth, polynomial-size circuits with majority gates. Relying on the results of Williams (2013), an appealing approach to prove such lower bounds is to construct a non-trivial derandomization algorithm for $\mathcal{TC}^0$. In this work we take a first step towards the latter goal, by proving the first positive results regarding the derandomization of $\mathcal{TC}^0$ circuits of depth $d>2$.

Our first main result is a quantified derandomization algorithm for $\mathcal{TC}^0$ circuits with a super-linear number of wires. Specifically, we construct an algorithm that gets as input a $\mathcal{TC}^0$ circuit $C$ over $n$ input bits with depth $d$ and $n^{1+\exp(-d)}$ wires, runs in almost-polynomial time, and distinguishes between the case that $C$ rejects at most $2^{n^{1-1/5d}}$ inputs and the case that $C$ accepts at most $2^{n^{1-1/5d}}$ inputs. In fact, our algorithm works even when the circuit $C$ is a linear threshold circuit, rather than just a $\mathcal{TC}^0$ circuit (i.e., $C$ is a circuit with linear threshold gates, which are stronger than majority gates).

Our second main result is that even a modest improvement of our quantified derandomization algorithm would yield a non-trivial algorithm for standard derandomization of all of $\mathcal{TC}^0$, and would consequently imply that $\mathcal{NEXP}\not\subseteq\mathcal{TC}^0$. Specifically, if there exists a quantified derandomization algorithm that gets as input a $\mathcal{TC}^0$ circuit $C$ with depth $d$ and $n^{1+O(1/d)}$ wires (rather than $n^{1+\exp(-d)}$ wires), runs in time at most $2^{n^{\exp(-d)}}$, and distinguishes between the case that $C$ rejects at most $2^{n^{1-1/5d}}$ inputs and the case that $C$ accepts at most $2^{n^{1-1/5d}}$ inputs, then there exists an algorithm with running time $2^{n^{1-\Omega(1)}}$ for standard derandomization of $\mathcal{TC}^0$.

## 1 Introduction

The classical problem of derandomization of a circuit class is the following: Given a circuit , deterministically distinguish between the case that the acceptance probability of is at least and the case that the acceptance probability of is at most . When , this problem can be solved in polynomial time if and only if . However, at the moment we do not know how to solve the problem in polynomial time even if is the class of polynomial-sized CNFs.

The derandomization problem for a circuit class is tightly related to lower bounds for that class. Relying on the classic hardness-randomness paradigm [Yao82, BM84, NW94], sufficiently strong lower bounds for a class imply the existence of pseudorandom generators with short seed for the class, which allow one to derandomize it (see, e.g., [AB09, Chp. 20] and [Gol08, Chp. 8.3]). On the other hand, the existence of a non-trivial derandomization algorithm for a circuit class typically implies (weak) lower bounds for the class. Specifically, for many specific classes (e.g., ), the existence of a derandomization algorithm for running in time implies that , and in some cases also that (see [Wil13, SW13, BV14], which build on [IW98, IKW02]).

Following Williams’ proof that $\mathcal{ACC}^0$ does not contain $\mathcal{NEXP}$ [Wil11], one of the prominent current challenges in complexity theory is the attempt to prove similar lower bounds for the complexity class $\mathcal{TC}^0$ (i.e., the class of constant-depth, polynomial-sized circuits with majority gates, which extends $\mathcal{ACC}^0$). Even after extensive efforts during the last few decades (and with renewed vigor recently), the best-known lower bounds for $\mathcal{TC}^0$ assert the existence of functions in $\mathcal{P}$ that require $\mathcal{TC}^0$ circuits with a slightly super-linear number of wires, or with a linear number of gates (see Section 2 for further background).

Since derandomization algorithms imply lower bounds in general, an appealing approach to prove lower bounds for $\mathcal{TC}^0$ is to construct derandomization algorithms for this class. Moreover, a non-trivial derandomization of $\mathcal{TC}^0$ would separate $\mathcal{TC}^0$ from $\mathcal{NEXP}$ (and not only from $\mathcal{E}^{\mathcal{NP}}$; see [SW13, BV14]). Accordingly, the problem of either derandomizing $\mathcal{TC}^0$ or constructing a deterministic algorithm for satisfiability of $\mathcal{TC}^0$ (which would be a stronger result) was recently suggested as a central open problem in complexity theory both by Williams [Wil14a] and by Aaronson [Aar17].

An intensive recent effort has been devoted to constructing deterministic algorithms for satisfiability of $\mathcal{TC}^0$. Such algorithms (with non-trivial running time) have been constructed for circuits of depth two, and for certain “structured subclasses” of $\mathcal{TC}^0$ (see [IPS13, Wil14b, AS15, SSTT16, Tam16]). However, much less is known about derandomization algorithms for $\mathcal{TC}^0$. Following an intensive effort to construct pseudorandom generators for a single linear threshold function [DGJ10, RS10, GOWZ10, KRS12, MZ13, Kan11, Kan14, KM15, GKM15] (i.e., a single “gate”; for background see Sections 2.2 and 4.2), a first step towards derandomizing $\mathcal{TC}^0$ circuits was very recently undertaken by Servedio and Tan [ST17b], who considered the problem of derandomizing circuits of depth two.

In this work we take a significant additional step towards the derandomization of $\mathcal{TC}^0$, by proving the first positive results regarding the derandomization of $\mathcal{TC}^0$ circuits of any constant depth. Loosely speaking, we first construct an algorithm for a “relaxed” type of derandomization problem of sparse $\mathcal{TC}^0$ circuits of any constant depth. As far as we are aware, this is the first deterministic circuit-analysis algorithm for $\mathcal{TC}^0$ circuits of any constant depth that do not admit any special structure (other than being sparse). Then, we show that even a modest improvement in the parameters of the foregoing algorithm (for the “relaxed” problem) would yield a non-trivial algorithm for standard derandomization of all of $\mathcal{TC}^0$; indeed, as mentioned above, such a result would imply that $\mathcal{NEXP}\not\subseteq\mathcal{TC}^0$. We thus suggest this approach (of the “relaxed” derandomization problem) as a potentially tractable line-of-attack towards proving $\mathcal{NEXP}\not\subseteq\mathcal{TC}^0$ (see Section 1.1.3).

### 1.1 Our results

Our two main results lie within the framework of quantified derandomization. Quantified derandomization, which was introduced by Goldreich and Wigderson [GW14], is the relaxed derandomization problem of distinguishing between a circuit that accepts of its inputs and a circuit that rejects of its inputs (where the term replaces the original term in standard derandomization).

On the one hand, this relaxation potentially allows one to construct more efficient derandomization algorithms. On the other hand, the standard derandomization problem can be reduced to quantified derandomization, by applying strong error-reduction within the relevant circuit class (such that a circuit with acceptance probability is transformed to a circuit with acceptance probability ). Of course, a main goal underlying this approach is to reduce standard derandomization to a parameter setting for which we are able to construct a corresponding algorithm for quantified derandomization.

#### A quantified derandomization algorithm

Our first result is a quantified derandomization algorithm for circuits with a slightly super-linear number of wires. In fact, our algorithm works not only for , but also for the class of linear threshold circuits: While in circuits each gate computes the majority function, in linear threshold circuits each gate computes a linear threshold function (i.e., a function of the form , for and ; see Section 4.2 for definitions). Towards stating this first result, denote by the class of linear threshold circuits over input bits of depth and with at most wires.
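The difference between the two gate types can be illustrated with a minimal Python sketch. The $\{-1,1\}$ input encoding, the `>=` threshold convention, and the names `ltf`, `majority`, and `dictator_like` are assumptions made here for illustration; the formal definitions are those of Section 4.2.

```python
from itertools import product

def ltf(weights, theta, x):
    """Linear threshold gate on x in {-1, 1}^n: outputs 1 iff <weights, x> >= theta."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= theta else 0

def majority(x):
    """Majority is the special case of unit weights and threshold 0."""
    return ltf([1] * len(x), 0, x)

def dictator_like(x):
    """An LTF with weights (3, 1, 1) and threshold 1: the heavy weight decides the output."""
    return ltf([3, 1, 1], 1, x)

# Compare the two gates on every input of length 3.
for x in product((-1, 1), repeat=3):
    print(x, majority(x), dictator_like(x))
```

With these weights the second gate depends only on its first input, so it differs from majority on some inputs even though both are linear threshold functions.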

###### Theorem 1.1

(quantified derandomization of linear threshold circuits). There exists a deterministic algorithm that, when given as input a circuit , runs in time , and satisfies the following:

1. If accepts all but at most of its inputs, then the algorithm accepts .

2. If rejects all but at most of its inputs, then the algorithm rejects .

Observe that as grows larger, the algorithm in Theorem 1.1 solves a more difficult derandomization task (since is larger), but only has to handle circuits with fewer wires (i.e., ). Also note that the algorithm in Theorem 1.1 is “whitebox”: That is, the algorithm gets as input an explicit description of a specific linear threshold circuit $C$, and uses this description when estimating the acceptance probability of $C$. The actual algorithm that we construct works for a more general parameter regime, which exhibits a trade-off between the number of exceptional inputs for $C$ and the number of wires of $C$ (see Theorem 5.1 for a precise statement).

The limitation on the number of wires of in Theorem 1.1 (i.e., ) essentially matches the best-known lower bounds for linear threshold circuits. This is no coincidence: Our algorithm construction follows a common theme in the design of circuit-analysis algorithms (e.g., derandomization algorithms or algorithms for satisfiability), which is the conversion of techniques that underlie lower bound proofs into algorithmic techniques. In this case, we observe that certain proof techniques for correlation bounds for a circuit class can be used to obtain algorithmic techniques for quantified derandomization of . In particular, to construct the algorithm in Theorem 1.1, we leverage the techniques underlying the recent proof of Chen, Santhanam, and Srinivasan [CSS16] of correlation bounds for linear threshold circuits. A high-level description of our algorithm appears in Section 3.1.

#### A reduction of standard derandomization to quantified derandomization

Our second result reduces the standard derandomization problem of $\mathcal{TC}^0$ to the quantified derandomization problem of circuits with a super-linear number of wires. In fact, we show that even a modest improvement of Theorem 1.1 would yield a non-trivial algorithm for standard derandomization of all of $\mathcal{TC}^0$.

###### Theorem 1.2

(a reduction of standard derandomization to quantified derandomization). Assume that there exists a deterministic algorithm that, when given as input a circuit , runs in time at most , and for the parameter satisfies the following: If accepts all but at most of its inputs then the algorithm accepts , and if rejects all but at most of its inputs then the algorithm rejects .

Then, there exists an algorithm that for every and , when given as input a circuit , runs in time , and satisfies the following: If accepts at least of its inputs then the algorithm accepts , and if rejects at least of its inputs then the algorithm rejects .

The gap between the algorithm constructed in Theorem 1.1 and the algorithm assumed in the hypothesis of Theorem 1.2 is quantitatively very small: Specifically, the algorithm in Theorem 1.1 works when the number of wires in the input circuit is , whereas the algorithm in the hypothesis of Theorem 1.2 is required to work when the number of wires is . Moreover, Theorem 1.2 holds even if this improvement (in the number of wires) comes at the expense of a longer running time; specifically, the conclusion of Theorem 1.2 holds even if the algorithm runs in (sufficiently small) sub-exponential time.

As mentioned in the beginning of Section 1, a non-trivial derandomization of $\mathcal{TC}^0$ implies lower bounds for this class. Specifically, combining Theorem 1.2 with [SW13, Thm 1.5] (see also [BV14]), we obtain the following corollary:

###### Corollary 1.3

(quantified derandomization implies lower bounds for $\mathcal{TC}^0$). Assume that there exists a deterministic algorithm as in the hypothesis of Theorem 1.2. Then, $\mathcal{NEXP}\not\subseteq\mathcal{TC}^0$.

The result that we actually prove is stronger and more general than the one stated in Theorem 1.2 (see Theorem 6.10). First, the result holds even if we limit ourselves only to the class , rather than to the class of linear threshold circuits (i.e., if we interpret the class as the class of circuits over inputs of depth and with wires). And secondly, the hypothesis of the theorem can be modified via a trade-off between the number of exceptional inputs for the circuit and the number of wires in .

The proof of Theorem 1.2 is based on developing a very efficient method for error-reduction within sparse $\mathcal{TC}^0$. Specifically, we construct a seeded extractor such that there exists a circuit that gets input and computes the outputs of the extractor on and on all seeds using only a super-linear number of wires (i.e., a circuit of depth uses wires); as far as we know, this is the first construction of a seeded extractor that is specific to $\mathcal{TC}^0$. This construction extends the study of randomness extraction in weak computational models, which has so far focused on , on , and on streaming algorithms [BYRST02, Vio05, Hea08, GVW15, CL16]. The construction is described at a high level in Section 3.2, and a precise statement appears in Proposition 6.9.

#### Restrictions for sparse $\mathcal{TC}^0$ circuits: A potential path towards $\mathcal{NEXP}\not\subseteq\mathcal{TC}^0$

Recall that the best-known lower bounds for circuits of arbitrary constant depth are for circuits with wires. Our results imply that a certain type of analysis of circuits with only wires, which is common when proving correlation bounds (i.e., average-case lower bounds), might suffice to deduce a lower bound for all of $\mathcal{TC}^0$.

Specifically, a common technique to prove correlation bounds for a circuit is the “restriction method”, which (loosely speaking) consists of proving the existence of certain subsets of the domain on which “simplifies” (i.e., agrees with a simpler function on the subset). We pose the following open problem: Construct a deterministic algorithm that gets as input a circuit with wires, runs in sufficiently small sub-exponential time, and finds a subset of size larger than such that the acceptance probability of can be approximated in sufficiently small sub-exponential time (see Open Problem 1 in Section 8 for a precise statement). In Section 8 we show that a resolution of the foregoing problem would imply that $\mathcal{NEXP}\not\subseteq\mathcal{TC}^0$; this follows from Theorem 1.2 and from the techniques that underlie the proof of Theorem 1.1.

#### The special case of depth-2 circuits

In addition to our main results, we also construct an alternative quantified derandomization algorithm for the special case of linear threshold circuits of depth two. Specifically, we construct a pseudorandom generator with seed length for the class of depth-2 linear threshold circuits with wires that either accept all but of their inputs or reject all but of their inputs. This result is not a corollary of Theorem 1.1, and is incomparable to the pseudorandom generator of Servedio and Tan [ST17b].

The precise result statement and proof appear in Section 7. The generator construction is obtained by leveraging the techniques of Kane and Williams [KW16] for correlation bounds for linear threshold circuits of depth two.

### 1.2 Organization

In Section 2 we provide background and discuss some relevant previous works. In Section 3 we give high-level overviews of the proofs of Theorems 1.1 and 1.2. After presenting preliminary formal definitions in Section 4, we prove Theorem 1.1 in Section 5 and Theorem 1.2 in Section 6. In Section 7 we construct the pseudorandom generator mentioned in Section 1.1.4. Finally, in Section 8 we formally pose the open problem that was mentioned in Section 1.1.3 and show the consequences of a solution to the problem.

## 2 Background and previous work

### 2.1 Lower bounds for linear threshold circuits

The best-known lower bounds for computing explicit functions by linear threshold circuits of a fixed small depth have been recently proved by Kane and Williams [KW16]. Specifically, they showed that any depth-two linear threshold circuit computing Andreev’s function requires gates and wires. They also showed correlation bounds (i.e., average-case lower bounds with respect to the uniform distribution) for such circuits with Andreev’s function. Extending their worst-case lower bounds to depth three, they proved that any depth-3 circuit with a top majority gate that computes a specific polynomial-time computable function also requires gates and wires (the “hard” function is a modification of Andreev’s function).

For linear threshold circuits of arbitrary constant depth , the best-known lower bounds on the number of wires required to compute explicit functions are only slightly super-linear. Specifically, Impagliazzo, Paturi, and Saks [IPS97] proved that any linear threshold circuit of depth requires at least wires to compute the parity function; Chen, Santhanam, and Srinivasan [CSS16] strengthened this by showing correlation bounds for such circuits with parity (as well as with the generalized Andreev function). These lower bounds for parity are essentially tight, since Beame, Brisson, and Ladner [BBL92] (and later [PS94]) constructed a linear threshold circuit with wires that computes parity. We also mention that linear lower bounds on the number of linear threshold gates required to compute explicit functions (e.g., the inner-product function) have been proved in several works during the early ’90s, and these gate lower bounds apply even for circuits of unrestricted depth (see [Smo90, GT91, ROS94, Nis93]).

### 2.2 Derandomization of LTFs and of functions of LTFs

There has been an intensive effort in the last decade to construct pseudorandom generators for a single linear threshold function. This problem was first considered by Diakonikolas et al. [DGJ10] (see also [RS10]), and the current state-of-the-art, following [GOWZ10, Kan11, KRS12, MZ13, Kan14, KM15], is the pseudorandom generator of Gopalan, Kane, and Meka [GKM15], which -fools any LTF with input bits using a seed of length . Harsha, Klivans, and Meka [HKM12] considered a conjunction of linear threshold functions, and constructed a pseudorandom generator for a subclass of such functions (i.e., for a conjunction of regular LTFs; see Section 4.2 for a definition). Gopalan et al. [GOWZ10] constructed pseudorandom generators for small decision trees in which the leaves are linear threshold functions.

Very recently, Servedio and Tan [ST17b] considered the problem of derandomizing linear threshold circuits. For every , they constructed a pseudorandom generator that -fools any depth-2 linear threshold circuit with at most wires, using a seed of length , where is a small constant that depends on . This yields a derandomization of depth-2 linear threshold circuits with wires in time .

### 2.3 Quantified derandomization

The quantified derandomization problem, which was introduced by Goldreich and Wigderson [GW14], is a generalization of the standard derandomization problem. For a circuit class and a parameter , the -derandomization problem is the following: Given a description of a circuit over input bits, deterministically distinguish between the case that accepts all but of its inputs and the case that rejects all but of its inputs. Indeed, the standard derandomization problem is represented by the parameter value . Similarly to standard derandomization, a solution for the quantified derandomization problem of a class via a “black-box” algorithm (e.g., via a pseudorandom generator) yields a corresponding lower bound for (see Appendix A).
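As a concrete reading of the promise problem, the following sketch decides it by exhaustive enumeration. It is only a brute-force reference with exponential running time, not the algorithm of this paper; the function name `quantified_promise`, the $\{0,1\}$ input encoding, and the toy circuits are assumptions for illustration.

```python
from itertools import product

def quantified_promise(circuit, n, B):
    """Brute-force reference for the B-quantified derandomization promise problem.

    Returns "accept" if the circuit rejects at most B inputs, "reject" if it
    accepts at most B inputs, and None if the promise is violated.
    """
    assert 2 * B < 2 ** n, "the two promised cases must be disjoint"
    accepted = sum(circuit(x) for x in product((0, 1), repeat=n))
    rejected = 2 ** n - accepted
    if rejected <= B:
        return "accept"
    if accepted <= B:
        return "reject"
    return None

# Toy circuits over 4 input bits, given as Python functions.
or4 = lambda x: int(any(x))      # rejects exactly one input
and4 = lambda x: int(all(x))     # accepts exactly one input
parity4 = lambda x: sum(x) % 2   # accepts exactly half of the inputs

print(quantified_promise(or4, 4, 2),
      quantified_promise(and4, 4, 2),
      quantified_promise(parity4, 4, 2))
```

A smaller value of `B` makes the promise stronger, which is the sense in which the quantified problem relaxes the standard one.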

Prior to this work, quantified derandomization algorithms have been constructed for , for subclasses of , for polynomials over that vanish rarely, and for a subclass of . On the other hand, reductions of standard derandomization to quantified derandomization are known for , for , for polynomials over large finite fields, and for the class (both the algorithms and the reductions appear in [GW14, Tel17]). In some cases, most notably for , the parameters of the known quantified derandomization algorithms are very close to the parameters of quantified derandomization to which standard derandomization can be reduced (see [Tel17, Thms 1 & 2]).

## 3 Overviews of the proofs

### 3.1 A quantified derandomization algorithm for linear threshold circuits

The high-level strategy of the quantified derandomization algorithm is as follows. Given a circuit , the algorithm deterministically finds a set of size on which the circuit simplifies; that is, agrees with a function from some “simple” class of functions on almost all points in . If accepts all but of its inputs, then the acceptance probability of will be very high, and similarly, if rejects all but of its inputs, then the acceptance probability of will be very low. The algorithm then distinguishes between the two cases, by enumerating the seeds of a pseudorandom generator for the “simple” class of functions.

Our starting point in order to construct a deterministic algorithm that finds a suitable set is the recent proof of correlation bounds for sparse linear threshold circuits by Chen, Santhanam, and Srinivasan [CSS16]. Their proof is based on a randomized “whitebox” algorithm that gets as input a linear threshold circuit with depth and wires, and restricts all but of the variables such that the restricted circuit can be approximated by a single linear threshold function. Thus, if we are able to modify their algorithm to a deterministic one, we will obtain a quantified derandomization algorithm with the parameters asserted in Theorem 1.1 (i.e., if , then ).

Converting the randomized restriction algorithm into a deterministic algorithm poses several challenges, which will be our focus in this overview. Let us first describe the original algorithm, in high-level. The algorithm iteratively reduces the depth of the circuit. In each iteration it applies a random restriction that keeps every variable alive with probability , and otherwise assigns a random value to the variable. The main structural lemma of [CSS16] asserts that such a random restriction turns any LTF to be very biased (i.e., -close to a constant function), with probability . Hence, after applying the restriction, most gates in the bottom layer of the circuit become very biased, and the fan-in of the rest of the gates in the bottom layer significantly decreases (i.e., we expect it to reduce by a factor of ). The algorithm replaces the very biased gates with the corresponding constants, thereby obtaining a circuit that approximates the original circuit (i.e., the two circuits agree on all but of the inputs); and in [CSS16] it is shown that the algorithm can afterwards fix relatively few variables such that the fan-in of each gate that did not become very biased decreases to be at most one (such a gate can be replaced by a variable or a constant). Thus, if the circuit in the beginning of the iteration was of depth , we obtain a circuit of depth that approximates .
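The effect of a single random restriction on one gate can be explored with the following sketch, which applies a truly random restriction (the original randomized setting, not the pseudorandom one constructed later) to an LTF and measures its distance from a constant by brute force. The names `random_restriction`, `restrict_ltf`, and `distance_to_constant`, the $\{-1,1\}$ encoding, and the numeric choices `n = 40` and `p = 0.2` are assumptions for illustration.

```python
import random
from itertools import product

def random_restriction(n, p, rng):
    """Keep each variable alive with probability p; otherwise fix it to a uniform value in {-1, 1}.

    Returns a dict mapping each fixed variable index to its value.
    """
    return {i: rng.choice((-1, 1)) for i in range(n) if rng.random() >= p}

def restrict_ltf(weights, theta, fixed):
    """Restrict the LTF 1[<w, x> >= theta]: return the alive weights and the shifted threshold."""
    alive = [w for i, w in enumerate(weights) if i not in fixed]
    shift = sum(weights[i] * v for i, v in fixed.items())
    return alive, theta - shift

def distance_to_constant(weights, theta):
    """Fraction of inputs on which the LTF differs from its more common output (brute force)."""
    total = 2 ** len(weights)
    ones = sum(
        sum(w * xi for w, xi in zip(weights, x)) >= theta
        for x in product((-1, 1), repeat=len(weights))
    )
    return min(ones, total - ones) / total

rng = random.Random(0)
n, p = 40, 0.2
rho = random_restriction(n, p, rng)
alive, theta = restrict_ltf([1] * n, 0, rho)  # restrict the majority function
print(len(alive), distance_to_constant(alive, theta))
```

The brute-force distance computation enumerates all assignments to the living variables, so it is only feasible when few variables remain alive.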

One obvious challenge in converting the randomized restriction algorithm into a deterministic algorithm is “derandomizing” the main structural lemma; that is, we need to construct a pseudorandom distribution of restrictions that turns any LTF to be very biased, with high probability. The second challenge is more subtle: In each iteration we replace the “current” circuit by a circuit that agrees with on almost all inputs in the subcube of the living variables (i.e., the circuits disagree on at most inputs). However, in subsequent iterations we will fix almost all of these variables, such that only variables will remain alive. Thus, we have no guarantee that and will remain close after additional restrictions in subsequent iterations; in particular, and might disagree on all of the inputs in the subcube of living variables in the end of the entire process. Of course, this is very unlikely to happen when values for fixed variables are chosen uniformly, but we need to construct a pseudorandom distribution of restrictions such that the approximation of each by is likely to be maintained throughout the process.

#### Derandomizing the main structural lemma of [CSS16].

Let be an LTF over input bits, and consider a random restriction that keeps each variable alive with probability . Peres’ theorem implies that the expected distance of from a constant function is approximately (see, e.g., [O’D14, Sec. 5.5]). A natural question is whether we can prove a concentration of measure for this distribution. As an illustrative example, consider the majority function ; for any , with probability roughly it holds that is -close to a constant function (see Fact 5.3). The main structural lemma in [CSS16] asserts that a similar statement indeed holds for any LTF ; specifically, they showed that with probability at least it holds that is -close to a constant function.

We construct a distribution over restrictions that can be efficiently sampled using random bits such that for any LTF and any , with probability at least it holds that is -close to a constant function. (The actual statement that we prove is more general; see Proposition 5.8 for precise details.) Indeed, this is both an “almost-full derandomization” of the lemma of [CSS16] as well as a refinement of the quantitative bound in the lemma.

The original proof of [CSS16] relies on a technical case analysis that is reminiscent of other proofs that concern LTFs, and is based on the notion of a critical index of a vector (they refer to the ideas underlying such analyses as “the structural theory of linear threshold functions”; see, e.g., [Ser07, DGJ10], and Definitions 4.3 and 4.4). In each case, the main technical tools that are used are concentration and anti-concentration theorems for random weighted sums (i.e., Hoeffding’s inequality and the Berry-Esséen theorem, respectively), which are used to bound the probability that several specific random weighted sums that are related to the restricted function fall in certain intervals.

To derandomize the original proof, an initial useful observation is the following. We say that a distribution over is -pseudorandomly concentrated if for any and any interval , the probability that falls in is -close to the probability that falls in (where is the uniform distribution over ). In particular, the Berry-Esséen theorem and Hoeffding’s inequality approximately hold for pseudorandom sums when is pseudorandomly concentrated. The observation is that being -pseudorandomly concentrated is essentially equivalent to being -pseudorandom for LTFs (see Claim 4.11). In particular, if a distribution over is chosen using the pseudorandom generator of Gopalan, Kane, and Meka [GKM15] for LTFs, which has seed length , then is -pseudorandomly concentrated.

The main part in the proof of the derandomized lemma is a (non-trivial) modification of the original case analysis, in order to obtain an analysis in which all claims hold under a suitably-chosen pseudorandom distribution of restrictions. Since this part of the proof is quite technical and low-level, we defer its detailed description to Section 5.1.1. However, let us mention that our pseudorandom distribution itself is relatively simple: We first choose the variables to keep alive such that each variable is kept alive with probability approximately , and the choices are -wise independent; and then we independently choose values for the fixed variables, using the generator of [GKM15] with error parameter . We also note that it is surprising that in our setting the case analysis can be modified in order to obtain an “almost-full derandomization” (i.e., seed length ), since previous derandomizations of similar case analyses regarding LTFs for different settings required much larger seed for error (see [DGJ10]).

#### Preserving the closeness of the circuit to its approximations.

Consider some iteration of the restriction algorithm, in which we start with a circuit of depth , and replace it by a circuit of depth that only approximates . (In particular, and disagree on more inputs than the number of inputs in the final subcube of living variables in the end of the entire restriction process.) Recall that was obtained by replacing very biased gates in with corresponding constants.

Our goal now is to show how to choose subsequent restrictions such that with high probability and will remain close even after applying these restrictions. We will in fact choose each restriction such that the following holds: For each gate that was replaced by a constant , with probability over the choice of restriction it holds that is still -close to (i.e., ; the claim that and remain close with high probability follows by a union-bound on the gates). Specifically, we prove that if an LTF is, say, -close to a constant , and a restriction is chosen such that the distribution of values for the fixed variables is -pseudorandom for LTFs, then with probability it holds that is -close to (see Lemma 5.10).

A natural approach to prove such a statement is the following. For any fixed choice of a set of variables to keep alive, we want to choose the values for the fixed variables from a distribution that “fools” a test that checks whether or not is close to . That is, consider a test that gets as input values for the fixed variables , and decides whether or not remains close to in the subcube corresponding to . When is chosen uniformly, with high probability remains close to , and hence the acceptance probability of is high; thus, any distribution over that is pseudorandom for also yields, with high probability, values such that remains close to . The problem with this approach is that a test for such a task above might be very inefficient, since it needs to evaluate on all points in the subcube corresponding to ; thus, we might not be able to construct a pseudorandom generator with short seed to “fool” such a “complicated” test.

To solve this problem, we use the following general technique that was introduced in our previous work [Tel17], which is called randomized tests. Loosely speaking, a lemma from our previous work implies the following: Assume that there exists a distribution over tests such that for every fixed input for which is -close to it holds that , with high probability, and for every fixed input for which is not -close to it holds that , with high probability. That is, the distribution constitutes a “randomized test” that distinguishes, with high probability, between “excellent” ’s (such that is very close to ) and “bad” ’s (such that is relatively far from ). Also assume that almost all tests in the support of are “fooled” by a pseudorandom generator . Then, with high probability over choice of seed for the pseudorandom generator , the generator outputs such that is -close to (see Lemma 5.12 for a precise and general statement). The main point is that the distribution , which may have very high entropy, is only part of the analysis; the actual algorithm that generates is simply the pseudorandom generator .

The distribution that we will use is equivalent to the following random process: Given , uniformly sample points in the subcube corresponding to , and accept if evaluates to the constant on all the sample points. We show how to construct such a distribution such that almost all of the residual deterministic tests are conjunctions of LTFs, and have very high acceptance probability (at least ). Thus, any distribution that is -pseudorandom for LTFs is also -pseudorandom for almost all tests in the support of (for details see the proof of Lemma 5.13). Combining this statement with the aforementioned general lemma, we deduce the following: If whenever we fix variables we choose the values for the fixed variables according to a distribution that is -pseudorandom for LTFs, then with high probability the circuit will remain close to the circuit .
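The random process described above can be sketched as follows. This is only an illustration of the sampling test itself; in the paper the test appears solely in the analysis, never in the algorithm. The function name `randomized_test`, the $\{-1,1\}$ encoding, the `>=` threshold convention, and the representation of the fixed values as a dictionary `z` are assumptions.

```python
import random

def randomized_test(weights, theta, alive, z, sigma, samples, rng):
    """Accept iff the gate outputs sigma on `samples` uniform points of the subcube defined by z.

    weights, theta: the LTF 1[<w, x> >= theta] over x in {-1, 1}^n.
    alive: indices of the living variables; z: dict of fixed index -> value.
    """
    for _ in range(samples):
        x = dict(z)
        x.update({i: rng.choice((-1, 1)) for i in alive})  # uniform point in the subcube
        value = 1 if sum(w * x[i] for i, w in enumerate(weights)) >= theta else 0
        if value != sigma:
            return False
    return True

rng = random.Random(1)
weights, theta, alive = [1] * 5, 0, [3, 4]
print(randomized_test(weights, theta, alive, {0: 1, 1: 1, 2: 1}, 1, 20, rng))
print(randomized_test(weights, theta, alive, {0: -1, 1: -1, 2: -1}, 1, 20, rng))
```

Once the sample points are fixed, each individual check is a threshold condition on the fixed values `z`, which is consistent with the residual deterministic tests being conjunctions of LTFs.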

### 3.2 Reduction of standard derandomization to quantified derandomization

Given a circuit $C$ of depth over input bits, our goal is to construct a circuit $C'$ of depth over input bits such that if $C$ accepts (resp., rejects) at least of its inputs then $C'$ accepts (resp., rejects) all but of its inputs. The circuit $C'$ will use its input in order to sample inputs for $C$ by a seeded extractor, and then compute the majority of the evaluations of $C$ on these inputs. Specifically, fixing an extractor for min-entropy , the circuit $C'$ gets input , and outputs the majority of the values .
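The wiring pattern of this error-reduction step can be sketched in a few lines. The sketch assumes the extractor is supplied as a Python function `ext(z, s)`; the `toy_sampler` below is a hypothetical placeholder with no extraction guarantee, used only so that the code runs, and the names `amplify` and `toy_sampler` are not from the paper.

```python
from itertools import product

def amplify(circuit, ext, seed_len):
    """Build C'(z) = majority, over all seeds s, of circuit(ext(z, s))."""
    def c_prime(z):
        votes = [circuit(ext(z, s)) for s in product((0, 1), repeat=seed_len)]
        return 1 if 2 * sum(votes) > len(votes) else 0
    return c_prime

def toy_sampler(z, s, m=2):
    """Placeholder for an extractor: a cyclic window of z at the offset encoded by the seed."""
    offset = int("".join(map(str, s)), 2) if s else 0
    return tuple(z[(offset + i) % len(z)] for i in range(m))

first_bit = lambda x: x[0]  # a trivial stand-in for the circuit C
c_prime = amplify(first_bit, toy_sampler, seed_len=2)
print(c_prime((1, 1, 1, 0)), c_prime((0, 0, 1, 0)))
```

The number of evaluations of the inner circuit is exponential in the seed length, which is why the seed length directly controls the number of copies of the circuit inside the amplified one.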

The main technical challenge underlying this strategy is to construct an extractor such that the mapping of input to the outputs of the extractor on all seeds (i.e., the mapping ) can be computed by a circuit with as few wires as possible. In our construction, the seed length will be , and thus the number of output bits will be ; we will construct a circuit that computes the mapping of to these output bits with only a super-linear number of wires (i.e., the number of wires is only slightly larger than the number of output bits). Indeed, a crucial point in our construction is that we will efficiently compute the outputs of the extractor on all seeds in a “batch”, rather than compute the extractor separately for each seed.

#### Our starting point: A construction of C′ with $n^{3.01}$ wires

As our starting point, let us construct a suitable circuit that has wires and is based on Trevisan’s extractor [Tre01]. Given an input and seed , Trevisan’s extractor first computes an encoding of by an -balanced error-correcting code (i.e., a code in which every non-zero codeword has relative Hamming weight ). Fixing a suitable combinatorial design of sets of size in a universe of size , the output of is the bits of in the coordinates specified by .

An initial important observation is that the circuit only needs to compute the encoding of once, and then each of the copies of can take its inputs directly from the bits of (i.e., each copy of corresponds to a fixed seed , and takes its inputs from locations in that are determined by and by the predetermined combinatorial design). This is indeed a form of “batch computation” of the extractor on all seeds.

Let us see why this construction uses wires. To encode into we can use known polynomial-time constructions of suitable linear codes that map bits to bits (e.g., [NN93, ABN92, TS17]). Since the code is linear in , each bit of can be computed by a circuit with wires, and thus the number of wires that we use to compute is . Now, recall that we want the extractor to work for min-entropy ; relying on Trevisan’s proof and on standard constructions of combinatorial designs, the required seed length is . Therefore, the number of copies of in is , and the overall number of wires in is .

#### The actual construction of C′ with $n^{1.01}$ wires

There are two parts in the construction above that led us to use a large number of wires: First, the seed length of the extractor is , which yields copies of ; and secondly, the number of wires required to compute the encoding of is super-quadratic, rather than super-linear. Let us now describe how to handle each of these two problems, and obtain a construction with only wires.

To reduce the seed length of the extractor, we follow the approach of Raz, Reingold, and Vadhan [RRV02]. They showed that Trevisan’s extractor works even if we replace standard combinatorial designs by a more relaxed notion that they called weak designs (see Definition 6.1). Indeed, weak designs can be constructed with a smaller universe size , which yields a smaller seed length for the extractor. Their construction yields , and we show a modified construction of weak designs that for our setting of parameters yields (see Lemma 6.2).

The second challenge is to construct an -balanced error-correcting code that maps bits to bits, and can be computed by a circuit of depth with wires (this is the code that we will use to compute from ; see Corollary 6.8). To describe the code, we describe the encoding process of , which has two steps: First we encode by a code with constant rate and constant relative distance, and then perform a second encoding that amplifies the distance of the code to .

Computing a code with distance . In the first step, we encode by a linear error-correcting code that has distance , instead of , and also has rate and can be computed in with wires. This will be done using tensor codes that are based on any (arbitrary) initial good linear error-correcting code.

To see why tensor codes are helpful, assume that , for some , and fix a linear code that maps bits to bits and has constant relative distance. Thinking of the input as an matrix, we first encode each row of the matrix using , to obtain an matrix , and then encode each column of using , to obtain an matrix . By well-known properties of tensor codes, this yields a linear error-correcting code with constant rate and constant relative distance. Moreover, computing the code in only requires wires: This is because the strings that we encode with (which are the rows of in the first step and then the columns of in the second step) are each of length . Thus, each of the bits in is a linear function of bits, and the latter can be computed by a circuit with wires.
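The row-then-column encoding can be made concrete with a very small example. The sketch below assumes, purely for illustration, the [3,2,2] single-parity-check code as the base code (the paper allows any good linear code); the names `G`, `encode`, `tensor_encode` and `min_weight` are hypothetical.

```python
from itertools import product

# Generator matrix of the [3,2,2] single-parity-check code over GF(2).
# Assumption: an arbitrary tiny base code, chosen only so that the
# example can be checked exhaustively.
G = [[1, 0, 1],
     [0, 1, 1]]

def encode(msg, G=G):
    """Encode a bit vector by the linear code with generator matrix G."""
    return [sum(msg[i] * G[i][j] for i in range(len(G))) % 2
            for j in range(len(G[0]))]

def tensor_encode(M, G=G):
    """Encode every row of M, then every column of the result."""
    rows = [encode(r, G) for r in M]                 # k rows of length 3
    cols = [encode(list(c), G) for c in zip(*rows)]  # one codeword per column
    return [list(r) for r in zip(*cols)]             # transpose back

def min_weight(k=2):
    """Minimum Hamming weight over all non-zero k x k messages."""
    best = None
    for bits in product((0, 1), repeat=k * k):
        if any(bits):
            M = [list(bits[i * k:(i + 1) * k]) for i in range(k)]
            w = sum(map(sum, tensor_encode(M)))
            best = w if best is None else min(best, w)
    return best
```

In this toy instance the minimum weight of the tensor code is the product of the base code's minimum weights, and every intermediate bit is a parity of at most one row or one column of the previous stage, which is the source of the wire savings.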

To obtain a code with wires instead of wires we can use a tensor code of higher order. Specifically, assume that , for some large constant , and think of as a tensor of dimensions . The encoding process will consist of iterations, and in each iteration we encode strings of length in the tensor by . The final codeword will be of length , will have constant relative distance, and can be computed by a circuit with only wires. (See Section 6.2 for further details.)

Amplifying the distance from to . Assume that the previous step mapped the input to , where . If was a non-zero message, then has relative Hamming weight . Our goal now is to increase the Hamming weight of to , using as few wires as possible. To do so we rely on the strategy of Naor and Naor [NN93], which is based on expander random walks. (This strategy was also recently used by Ta-Shma [TS17] to construct almost-optimal -balanced codes.)

Specifically, fix a graph on vertices with constant degree and constant spectral gap. Associate the vertices of with the coordinates of , and consider a random walk on that starts at a uniformly-chosen vertex and walks steps. With probability at least , such a walk meets the set of coordinates in which is non-zero (since this set has constant density). Thus, if we take such a random walk on the coordinates of , and output the parity of a random subset of the bits of that we encountered, with probability at least we will output one.

The encoding of is thus the following. Every coordinate in is associated with a specific walk of length on and with a subset ; thus, has coordinates. The bit of at a coordinate associated with a walk and with a subset is the parity of the bits of encountered in the walk . Thus, each bit in is the parity of at most bits in , so computing from only requires wires. Recall that in our setting we need ; the number of wires is thus at most . By the preceding paragraph, if has Hamming weight then has Hamming weight at least .
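The walk-and-parity encoding described above can be sketched as follows. The graph used here is a toy 3-regular graph standing in for a constant-degree expander (no spectral-gap claim is made for it), and the function names are hypothetical; subsets are taken over the positions of the walk, which is one natural reading of the construction.

```python
from itertools import product

def neighbors(v, N):
    """Toy 3-regular graph on N vertices (N even); a stand-in for an expander."""
    return [(v + 1) % N, (v - 1) % N, (v + N // 2) % N]

def walks(N, length):
    """All walks with `length` steps, as tuples of length + 1 vertices."""
    for start in range(N):
        for choices in product(range(3), repeat=length):
            walk = [start]
            for c in choices:
                walk.append(neighbors(walk[-1], N)[c])
            yield tuple(walk)

def amplify_distance(x_hat, length):
    """One output bit per (walk, subset of walk positions): the parity of
    the bits of x_hat at the selected positions of the walk."""
    N = len(x_hat)
    out = []
    for walk in walks(N, length):
        for subset in product((0, 1), repeat=length + 1):
            out.append(sum(x_hat[v] for v, s in zip(walk, subset) if s) % 2)
    return out
```

Every walk that meets the support of the input contributes ones on exactly half of its subsets, so the relative weight of the output is half the probability that a walk hits the support.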

## 4 Preliminaries

Throughout the paper, the letter will always denote the number of inputs to a function or a circuit. We denote random variables by boldface letters, and denote by the uniform distribution on bits.

We are interested in Boolean functions, represented as functions . We say that a function accepts an input if . For two Boolean functions and over a domain , we say that and are -close if .

For a vector , we denote by the standard -norm . For , we denote and . For two vectors , we denote .

### 4.1 Two probabilistic inequalities

We will rely on two standard facts from probability theory that assert concentration and anti-concentration bounds for certain distributions. Specifically, we will need a standard version of Hoeffding’s inequality, and a corollary of the Berry-Esséen theorem:

###### Theorem 4.1

(Hoeffding’s inequality; for a proof see, e.g., [DP09, Sec. 1.7]). Let , and let be a uniformly-chosen random vector in . Then, for any it holds that

###### Theorem 4.2

(a corollary of the Berry-Esséen theorem; see, e.g., [DGJ10, Thm 2.1, Cor 2.2]). Let and such that for every it holds that , and let be a uniformly-chosen random vector in . Then, for any and it holds that:

$$\Pr\big[\langle w,z\rangle\in\theta\pm t\cdot\|w\|_2\big]\le 2\cdot(t+\mu).$$
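For small dimensions the anti-concentration bound can be checked by brute force. The sketch below assumes that the hypothesis of the theorem is the usual regularity condition, namely that every weight has magnitude at most μ times the 2-norm of the weight vector; the helper names are hypothetical.

```python
from itertools import product
from math import sqrt

def prob_in_window(w, theta, t):
    """Exact probability, over uniform z in {-1,1}^n, that <w,z> lies
    within t * ||w||_2 of theta."""
    norm = sqrt(sum(a * a for a in w))
    n = len(w)
    hits = sum(1 for z in product((-1, 1), repeat=n)
               if abs(sum(a * b for a, b in zip(w, z)) - theta) <= t * norm)
    return hits / 2 ** n

def regularity(w):
    """Smallest mu such that |w_i| <= mu * ||w||_2 for every i
    (assumed form of the theorem's hypothesis)."""
    norm = sqrt(sum(a * a for a in w))
    return max(abs(a) for a in w) / norm
```

For example, with twelve unit weights, threshold zero and t = 0.1, `prob_in_window` is the probability that a sum of twelve random signs is exactly zero, which can be compared against `2 * (t + regularity(w))`.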

### 4.2 Linear threshold functions and circuits

A linear threshold function (or LTF, in short) is a function of the form , where is a vector of real “weights”, and is a real number (the “threshold”), and denotes the standard inner-product over the reals. Indeed, the majority function is the special case where the weights are identical (e.g., for all ) and the threshold is zero (i.e., ).
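A minimal sketch of this definition, assuming inputs in {-1, 1}^n and the convention that the sign of zero is +1 (the paper's own convention may differ); `ltf` and `majority` are hypothetical names.

```python
def ltf(w, theta):
    """Return the LTF x -> sgn(<w, x> - theta) on inputs in {-1, 1}^n.
    Assumption: sgn(0) is taken to be +1 here."""
    def f(x):
        return 1 if sum(a * b for a, b in zip(w, x)) - theta >= 0 else -1
    return f

def majority(n):
    """Majority as the LTF with all weights equal to 1 and threshold 0."""
    return ltf([1] * n, 0)
```

For instance, `majority(3)` outputs 1 on `(1, 1, -1)` and -1 on `(-1, -1, 1)`, whereas `ltf([3, 1, 1], 0)` outputs 1 on `(1, -1, -1)` because the first weight outweighs the other two.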

We will be interested in linear threshold circuits, which are circuits that consist only of LTF gates with unbounded fan-in and fan-out. We assume that linear threshold circuits are layered, in the sense that for each gate , all the gates feeding into have the same distance from the inputs. For , let be the class of linear threshold circuits over input bits of depth and with at most wires. For some fixed sizes and depths, linear threshold circuits are known to be stronger than circuits with majority gates; however, linear threshold circuits can be simulated by circuits with majority gates with a polynomial size overhead and with one additional layer (see [GHR92, GK98]). Thus, the class as a whole equals the class of linear threshold circuits.

The following are standard definitions (see, e.g., [Ser07, DGJ10]), which refer to “structural” properties of LTFs and will be useful for us throughout the paper.

###### Definition 4.3

(regularity). For , we say that a vector is -regular if for every it holds that . An LTF is -regular if is -regular.

###### Definition 4.4

(critical index). When satisfies , the -critical index of is defined as the smallest such that is -regular (and if no such exists). The critical index of an LTF is the critical index of , where is the vector that is obtained from by permuting the coordinates in order to have .

###### Definition 4.5

(balanced LTF). For , we say that an LTF is -balanced if ; otherwise, we say that is -imbalanced.
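The three structural notions above can be sketched as follows. Since the formulas are not reproduced here, the sketch assumes their usual forms from the cited literature: a vector is μ-regular when every weight has magnitude at most μ times its 2-norm, the critical index is the smallest number of largest-magnitude weights whose removal leaves a μ-regular tail, and an LTF is t-balanced when the magnitude of its threshold is at most t times the 2-norm of its weights. Indexing conventions vary between papers, and all function names are hypothetical.

```python
from math import sqrt

def norm2(w):
    return sqrt(sum(a * a for a in w))

def is_regular(w, mu):
    """mu-regular (assumed form): every |w_i| is at most mu * ||w||_2."""
    return all(abs(a) <= mu * norm2(w) for a in w)

def critical_index(w, mu):
    """Smallest k such that the tail after the k largest-magnitude weights
    is mu-regular; None stands in for 'infinity'."""
    s = sorted(w, key=abs, reverse=True)
    for k in range(len(s)):
        if is_regular(s[k:], mu):
            return k
    return None

def is_balanced(w, theta, t):
    """t-balanced (assumed form): |theta| <= t * ||w||_2."""
    return abs(theta) <= t * norm2(w)
```

A vector of equal weights has critical index 0 for moderate μ, a single dominant weight followed by many equal ones has critical index 1, and geometrically decaying weights may have no finite critical index at all.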

Representation of linear threshold circuits. The algorithm in Theorem 1.1 gets as input an explicit representation of a linear threshold circuit , where the weights and thresholds of the LTFs in may be arbitrary real numbers. Throughout the paper we will not be specific about how exactly is represented as an input to the algorithm, since the algorithm works in any reasonable model. In particular, the algorithm only performs addition, subtraction, and comparison operations on the weights and thresholds of the LTFs in .

Explicitly suggesting one convenient model, one may assume that the weights and threshold of each LTF are integers of unbounded magnitude (since the real numbers can be truncated at some finite precision without changing the function). In this case, the circuit has a binary representation, and the required time to perform addition, subtraction, and comparison on these integers is linear in the representation size.

### 4.3 Pseudorandomness

We need the following two standard definitions of pseudorandom distributions and of pseudorandom generators (or PRGs, in short).

###### Definition 4.6

(pseudorandom distribution). For and a domain , we say that a distribution over is -pseudorandom for a class of functions if for every it holds that .
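For small domains, the distinguishing advantage in this definition can be computed exactly. The sketch below assumes the domain {-1, 1}^n and represents a distribution as the uniform distribution over a list of points; both choices, and the function names, are illustrative assumptions.

```python
from itertools import product

def advantage(f, support, n):
    """|Pr_{x ~ D}[f(x) = 1] - Pr_{x ~ U_n}[f(x) = 1]|, where D is uniform
    over the list `support` (a hypothetical way to represent D)."""
    p_d = sum(f(x) == 1 for x in support) / len(support)
    p_u = sum(f(x) == 1 for x in product((-1, 1), repeat=n)) / 2 ** n
    return abs(p_d - p_u)

def is_pseudorandom(support, n, tests, eps):
    """Check eps-pseudorandomness against a finite list of test functions."""
    return all(advantage(f, support, n) <= eps for f in tests)
```

As a usage example, the uniform distribution over the four points of {-1, 1}^3 whose coordinates multiply to 1 has advantage 0 against the function that outputs the first coordinate, but advantage 1/4 against the majority of the three coordinates.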

###### Definition 4.7

(pseudorandom generator). Let , where for every it holds that is a set of functions , and let and . An algorithm is a pseudorandom generator for with error parameter and seed length if for every , when is given as input and a random seed of length , the output distribution of is -pseudorandom for .

We will rely on the following recent construction of a pseudorandom generator for LTFs, by Gopalan, Kane, and Meka [GKM15]:

###### Theorem 4.8

(a PRG for LTFs; [GKM15, Cor. 1.2]). For every , there exists a polynomial-time pseudorandom generator for the class of LTFs with seed length .

A distribution over is -almost -wise independent if for every of size it holds that is -close to the uniform distribution over in statistical distance. We will need the following standard tail bound for such distributions.

###### Fact 4.9

(tail bound for almost -wise independent distributions). Let be an even number, and let . Let be variables in that are -almost -wise independent, and denote . Then, for any it holds that .

In particular, for and and , where is a sufficiently large polynomial, we have that

$$\Pr\left[\frac{1}{n}\cdot\sum_{i\in[n]}x_i\notin\mu\pm(\mu/2)\right]=O\left((\mu\cdot n)^{-t/2}\right).$$

We now define the notion of a distribution that is -pseudorandomly concentrated, and show that it is essentially equivalent to the notion of being -pseudorandom for LTFs. The equivalence was communicated to us by Rocco Servedio, and is attributed to Li-Yang Tan.

###### Definition 4.10

(-pseudorandomly concentrated distribution). For and , we say that a distribution over is -pseudorandomly concentrated if the following holds: For every and every it holds that .

###### Claim 4.11

(being pseudorandomly concentrated is equivalent to being pseudorandom for LTFs). Let be a distribution over . Then,

1. If is -pseudorandom for LTFs, then is -pseudorandomly concentrated.

2. If is -pseudorandomly concentrated, then is -pseudorandom for LTFs.

###### Proof.

Let us first prove Item (1). Fix and . For any fixed , exactly one of three events happens: Either , or , or . Since the event can be tested by an LTF (i.e., by the LTF ), this event happens with probability under a choice of . Similarly, the event happens with probability under a choice of . Thus, the probability under a choice of that is .

To see that Item (2) holds, let be an LTF over input bits, and let . Then, for every it holds that if and only if . Thus, .

### 4.4 Restrictions

A restriction for functions is a subset of . We will be interested in restrictions that are subcubes, and such restrictions can be described by a string in the natural way (i.e., the subcube consists of all strings such that for every such it holds that ). We will sometimes describe a restriction by a pair , where is the set of variables that the restriction keeps alive, and is the sequence of values that assigns to the variables that are fixed.

We identify strings , where , with restrictions , as follows: Each variable is assigned a block of bits in the string; the variable remains alive if the first bits in the block are all , and otherwise takes the value of the bit. When we refer to a “block” in the string that corresponds to a restriction, we mean a block of bits that corresponds to some variable. When we say that a restriction is chosen from a distribution over , we mean that a string is chosen according to , and interpreted as a restriction.
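The identification of strings with restrictions can be sketched as below. Because the symbols are elided in the text, the sketch assumes blocks of q + 1 bits, that a variable stays alive when the first q bits of its block are all zero, and that it is otherwise fixed to the last bit of its block; the function names and the use of 0/1 values are illustrative assumptions.

```python
def decode_restriction(bits, n, q):
    """Interpret a 0/1 sequence of length n * (q + 1) as a restriction.
    Assumption: a variable is alive iff the first q bits of its block are
    all 0, and is otherwise fixed to the last bit of the block.
    Returns a list over {0, 1, '*'}, where '*' marks a living variable."""
    assert len(bits) == n * (q + 1)
    rho = []
    for i in range(n):
        block = bits[i * (q + 1):(i + 1) * (q + 1)]
        rho.append('*' if not any(block[:q]) else block[q])
    return rho

def restrict(f, rho):
    """Return f restricted to the subcube described by rho, as a function
    of the living variables only (in their original order)."""
    def g(y):
        it = iter(y)
        return f([next(it) if r == '*' else r for r in rho])
    return g
```

Under these assumptions, a uniformly random string keeps each variable alive with probability 2^(-q), independently of the value that would otherwise be assigned to it.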

In addition, we will sometimes identify a pair of strings and with a restriction . In this case, the restriction is the restriction that is obtained by combining and to a string in the natural way (i.e., appending a bit from to each block of bits in ). Note that the string determines which variables