Deterministic Approximate Counting for
We give a deterministic algorithm for approximately computing the fraction of Boolean assignments that satisfy a degree- polynomial threshold function. Given a degree-2 input polynomial and a parameter , the algorithm approximates
to within an additive in time . Note that it is NP-hard to determine whether the above probability is nonzero, so any sort of multiplicative approximation is almost certainly impossible even for efficient randomized algorithms. This is the first deterministic algorithm for this counting problem in which the running time is polynomial in for . For “regular” polynomials (those in which no individual variable’s influence is large compared to the sum of all variable influences) our algorithm runs in time. The algorithm also runs in time to approximate to within an additive , for any degree-2 polynomial .
As an application of our counting result, we give a deterministic FPT multiplicative -approximation algorithm to approximate the -th absolute moment of a degree-2 polynomial. The algorithm’s running time is of the form .
A degree- polynomial threshold function (PTF) is a Boolean function defined by where is a degree- polynomial. In the special case where , degree- PTFs are often referred to as linear threshold functions (LTFs) or halfspaces. While LTFs and low-degree PTFs have been researched for decades (see e.g., [MK61, MTT61, MP68, Mur71, GHR92, Orp92, Hås94, Pod09] and many other works) their study has recently received new impetus as they have played important roles in complexity theory [She08, She09, DHK10, Kan10, Kan12c, Kan12a, KRS12], learning theory [KKMS08, SSSS11, DOSW11, DDFS12], voting theory [APL07, DDS12] and other areas.
An important problem associated with LTFs and PTFs is that of deterministically estimating the fraction of assignments that satisfy a given LTF or PTF over . In particular, in this paper we are interested in deterministically estimating the fraction of satisfying assignments for PTFs of degree . This problem is motivated by the long line of work on deterministic approximate counting algorithms, starting with the seminal work of Ajtai and Wigderson [AW85] who gave non-trivial deterministic counting algorithms for constant-depth circuits. Since then much progress has been made on the design of deterministic counting algorithms for other classes of Boolean functions like DNFs, low-degree polynomials and LTFs [LV96, GMR13, Vio09, GKM11]. Problems of this sort can be quite challenging; after close to three decades of effort, deterministic polynomial time counting algorithms are not yet known for a simple class like polynomial-size DNFs.
Looking beyond Boolean functions, there has been significant work on obtaining deterministic approximate counting algorithms for combinatorial problems using ideas and techniques from statistical physics. This includes work on counting matchings [BGK07], independent sets [Wei06], proper colorings [LLY13] and other problems in statistical physics [BG08]. We note that there is interest in obtaining such deterministic algorithms despite the fact that in some of these cases an optimal randomized algorithm is already known (e.g., for counting matchings [JSV01]) and the performance of the corresponding deterministic algorithm is significantly worse [BGK07]. For this paper, the most relevant prior work is that of Gopalan et al. and Stefankovic et al. [GKM11], who independently obtained deterministic time algorithms for counting the satisfying assignments of an LTF up to multiplicative error. (As we discuss later, in contrast with LTFs it is NP-hard to count the satisfying assignments of a degree- PTF for any up to any multiplicative factor. Thus, the right notion of approximation for degree- PTFs is additive error.)
There has recently been significant work in the literature on unconditional derandomization of LTFs and PTFs. The starting points of these works are the results of Rabani and Shpilka [RS09] and Diakonikolas et al. [DGJ09], who gave explicit constructions of polynomial-sized hitting sets and pseudorandom generators for LTFs. Building on these works, Meka and Zuckerman [MZ10] and subsequently Kane [Kan11a, Kan11b, Kan12b] constructed polynomial-sized PRGs for degree- PTFs for . These PRGs trivially imply deterministic polynomial-time counting algorithms for any fixed and fixed . While there has been significant research on improving the dependence of the size of these PRGs on , the best construction in the current state of the art is due to Kane [Kan12c] who gave an explicit PRG for degree- polynomial threshold functions over of size . (In a related but different line of work [Kan11a, Kan11b, Kan12b] focusing on PRGs for degree- PTFs over the Gaussian distribution, the strongest result to date is that of [Kan12b] which for any constant degree gives an explicit PRG of size for degree- PTFs; here is a slightly sub-polynomial function of , even for .) As a consequence, the resulting deterministic counting algorithms have a running time which is at least and thus the running time of these algorithms is not a fixed polynomial in . In particular, for any , the running time of these algorithms is super-polynomial in .
1.1 Our contributions.
In this work we give the first fixed polynomial time deterministic algorithm for a PTF problem of this sort. As our main result, for all we give a fixed poly-time algorithm to deterministically -approximately count the number of satisfying assignments to a degree-2 PTF:
Theorem 1 [Deterministic approximate counting of degree-2 PTFs over , informal statement] There is a deterministic algorithm which, given a degree- polynomial and as input, runs in time and outputs a value such that
Note that, as a consequence of this theorem, we get a time deterministic algorithm to count the fraction of satisfying assignments of a degree- PTF over up to error .
The influence of variable on a polynomial , denoted , is the sum of squares of all coefficients of that are on monomials containing ; it is a measure of how much “effect” the variable has on the outcome of . Following previous work [DHK10] we say that a polynomial is -regular if For sufficiently regular polynomials, our algorithm runs in fully polynomial time :
Theorem 2 [Deterministic approximate counting of regular degree-2 PTFs over , informal statement] Given and an -regular degree-2 polynomial our algorithm runs (deterministically) in time and outputs a value such that
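To make the notions of influence and regularity concrete, the following is a minimal sketch. It assumes a multilinear polynomial stored as a dictionary from tuples of variable indices to coefficients, and it assumes the regularity condition reads "the largest influence is at most eps times the sum of all influences", which is how the abstract describes it informally. The names `influences` and `is_regular` are illustrative only.

```python
def influences(coeffs, n):
    """Influence of each variable: sum of squared coefficients over all
    monomials that contain that variable.

    coeffs maps a tuple of variable indices (the monomial) to its
    coefficient; the empty tuple () is the constant term.
    """
    inf = [0.0] * n
    for monomial, c in coeffs.items():
        for i in set(monomial):
            inf[i] += c * c
    return inf


def is_regular(coeffs, n, eps):
    """Assumed regularity test: max influence <= eps * total influence."""
    inf = influences(coeffs, n)
    return max(inf) <= eps * sum(inf)


# Hypothetical example: p(x) = x0*x1 + x1*x2 + x0*x2.
triangle = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
```

In this example every variable has influence 2 out of a total of 6, so the polynomial passes the test for any eps of at least 1/3 and fails it below that.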
We note that the regular case has been a bottleneck in all known constructions of explicit PRGs for PTFs; the seed-length of known generators for regular PTFs is no better than for general PTFs. Given Theorem 2, the only obstacle to improved running times for deterministic approximate counting algorithms is improving the parameters of the “regularity lemma” which we use. (Indeed, Kane [Kan13] has suggested that using the notions of regularity and invariance from [Kan12c] may result in an improved, though still , running time for our approach; we have not explored that in this work.)
Discussion. Our counting results described above give deterministic additive approximations to the desired probabilities. While additive approximation is not as strong as multiplicative approximation, we recall that the problem of determining whether is nonzero is well-known to be NP-hard for degree-2 polynomials even if all nonconstant monomials in are restricted to have coefficient 1 (this follows by a simple reduction from Max-Cut, see the polynomial defined below). Thus, no efficient algorithm, even allowing randomness, can give any multiplicative approximation to unless . Given this, it is natural to consider additive approximation.
Our results for degree- PTFs yield efficient deterministic algorithms for a range of natural problems. As a simple example, consider the following problem: Given an undirected -node graph and a size parameter , the goal is to estimate the fraction of all cuts that contain at least edges. (Recall that exactly counting the number of cuts of at least a given size is known to be #P-hard [Pap94].) We remark that a simple sampling-based approach yields an efficient randomized -additive approximation algorithm for this problem. Now note that the value of the polynomial on input equals the number of edges in the cut corresponding to (where vertices such that are on one side of the cut and vertices such that are on the other side). It is easy to see that if then must be -regular. Theorem 2 thus provides a deterministic poly-time algorithm that gives an -additive approximation to the fraction of all cuts that have size at least in -node graphs with at least edges, and Theorem 1 gives a deterministic poly-time -approximation algorithm for all -node graphs.
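The quantity discussed in this example can be illustrated by brute force on a tiny graph. The sketch below assumes the cut polynomial is the standard one, p(x) = sum over edges (i, j) of (1 - x_i x_j)/2 with x in {-1, 1}^n; the explicit formula is not visible in the text above, so this is an assumption. Exhaustive enumeration takes time exponential in n and only serves to define the target quantity; it is not the algorithm of Theorem 1 or Theorem 2.

```python
from itertools import product


def cut_size(edges, x):
    """Value of the assumed cut polynomial: number of edges whose
    endpoints receive different signs under the assignment x."""
    return sum((1 - x[u] * x[v]) // 2 for u, v in edges)


def fraction_of_cuts_at_least(edges, n, k):
    """Fraction of assignments in {-1, 1}^n whose cut has >= k edges."""
    hits = sum(1 for x in product((-1, 1), repeat=n)
               if cut_size(edges, x) >= k)
    return hits / 2 ** n


# Hypothetical example: a triangle on vertices 0, 1, 2.
triangle_edges = [(0, 1), (1, 2), (0, 2)]
```

For the triangle, six of the eight assignments cut exactly two edges and the remaining two cut none.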
As another example, consider the polynomial . In this case, we have that equals the number of edges in the subgraph of that is induced by vertex set . Similarly to the example of the previous paragraph, Theorem 2 yields a deterministic poly-time algorithm that gives a -additive approximation to the fraction of all induced subgraphs that have at least edges in -node graphs with at least edges, and Theorem 1 gives a deterministic poly-time -additive approximation algorithm for any graph.
Estimating moments. Our results also imply deterministic fixed-parameter tractable (FPT) algorithms for approximately computing moments of degree- polynomials. Consider the following computational problem Absolute-Moment-of-Quadratic: given as input a degree-2 polynomial and an integer parameter , output the value It is clear that the raw moment can be computed exactly in time by expanding out the polynomial , performing multilinear reduction, and outputting the constant term. Since the -th raw moment equals the -th absolute moment when is even, this gives an time algorithm for Absolute-Moment-of-Quadratic for even . However, for any fixed odd the Absolute-Moment-of-Quadratic problem is #P-hard (see Appendix B). Thus, it is natural to seek approximation algorithms for this problem.
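The exact computation of raw moments described above (expand the power, reduce multilinearly, read off the constant term) can be sketched as follows. The sketch assumes the Boolean domain is {-1, 1}^n, so that x_i^2 = 1 and multiplying two monomials amounts to taking the symmetric difference of their variable sets. The representation and the names `multiply` and `raw_moment` are illustrative choices.

```python
def multiply(p, q):
    """Product of two multilinear polynomials over {-1, 1}^n.

    A polynomial is a dict mapping frozenset(variable indices) to a
    coefficient. Because x_i * x_i = 1, the product of two monomials is
    the symmetric difference of their variable sets.
    """
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = m1 ^ m2
            out[m] = out.get(m, 0) + c1 * c2
    return out


def raw_moment(p, k):
    """E[p(x)^k] for uniform x in {-1, 1}^n: the constant term of p^k
    after multilinear reduction."""
    acc = {frozenset(): 1}
    for _ in range(k):
        acc = multiply(acc, p)
    return acc.get(frozenset(), 0)


# Hypothetical example: p(x) = x0*x1 + x0.
example = {frozenset({0, 1}): 1, frozenset({0}): 1}
```

Here p(x) = x0 (x1 + 1), so the odd raw moments vanish, the second moment is 2, and the fourth is 8.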
Using the hyper-contractive inequality [Bon70, Bec75] it can be shown that the natural randomized algorithm – draw uniform points from and use them to empirically estimate – with high probability gives a -accurate estimate of the -th absolute moment of in time. Using Theorem 1 we are able to derandomize this algorithm and obtain a deterministic FPT -multiplicative approximation algorithm for Absolute-Moment-of-Quadratic:
There is a deterministic algorithm which, given any degree- polynomial with -bit integer coefficients, any integer , and any , runs in time and outputs a value that multiplicatively -approximates the -th absolute moment of .
The major technical work in this paper goes into proving Theorem 2. Given Theorem 2, we use a (deterministic) algorithmic version of the “regularity lemma for degree- PTFs” from [DSTW10] to reduce the case of general degree-2 PTFs to that of regular degree-2 PTFs. (The regularity lemma that is implicit in [HKM09] can also be used for this purpose.)
As is usual in this line of work, we can use the invariance principle of Mossel et al. [MOO10] to show that for an -regular degree- polynomial , we have Thus, to prove Theorem 2, we are left with the task of additively estimating .
The first conceptual idea towards achieving the aforementioned task is this: Since Gaussian distributions are invariant under rotations, computing the probability of interest is equivalent to computing for a “decoupled” polynomial . More precisely, there exists a polynomial of the form such that the distributions of and (where ) are identical. Indeed, consider the symmetric matrix associated with the quadratic part of and let be the spectral decomposition of . It is easy to show that is a decoupled polynomial with the same distribution as , . The counting problem for should intuitively be significantly easier since there are no correlations between ’s monomials, and hence it would be useful if could be efficiently exactly obtained from . Strictly speaking, this is not possible, as one cannot obtain the exact spectral decomposition of a symmetric matrix (for example, can have irrational eigenvalues). For the sake of this informal discussion, we assume that one can in fact obtain the exact decomposition and hence the polynomial .
Suppose we have obtained the decoupled polynomial . The second main idea in our approach is the following: We show that one can efficiently construct a -variable “junta” polynomial , with , such that the distribution of is -close to the distribution of in Kolmogorov distance. (Recall that the Kolmogorov distance between two random variables is the maximum distance between their CDFs.) To prove this, we use a powerful recent result of Chatterjee [Cha09] (Theorem 42), proved using Stein’s method, which provides a central limit theorem for functions of Gaussians. Informally, this CLT says that for any function satisfying some technical conditions, if are independent random variables, then is close in total variation distance ( distance between the pdfs) to a Gaussian distribution with the “right” mean and variance. (We refrain from giving a more detailed description of the theorem here as the technical conditions stem from considering generators of the Ornstein-Uhlenbeck process, thus rendering it somewhat unsuitable for an intuitive discussion.) Using this result, we show that if (i.e., if the maximum magnitude eigenvalue of the symmetric matrix corresponding to is “small”), then the distribution of (hence, also of ) is -close to , and hence the one-variable polynomial is the desired junta. In the other case, i.e., the case that , one must resort to a more involved approach as described below.
If , we perform a “critical index based” case analysis (in the style of Servedio, see [Ser07]) appropriately tailored to the current setting. We remark that such analyses have been used several times in the study of linear and polynomial threshold functions (see e.g., [DGJ09, FGRW09, DHK10, DSTW10]). In all these previous works the critical index analysis was performed on influences of variables in the original polynomial (or linear form). Two novel aspects of the analysis in the current work are that (i) we must transform the polynomial from its original form into the “decoupled” version before carrying out the critical index analysis; and (ii) in contrast to previous works, we perform the critical index analysis not on the influences of variables, but rather on the eigenvalues of the quadratic part of the decoupled polynomial, i.e., on the values , ignoring the linear part of the decoupled polynomial. The following paragraph explains the situation in detail.
Suppose that the eigenvalues are ordered so that . Consider the polynomials (the “head part”) and (the “tail part”). Define the -critical index as the minimum such that . Let . If the -critical index is more than then we show that the “head part” (appropriately shifted) captures the distribution of up to a small error. In particular, the distribution of is -close to that of in Kolmogorov distance. On the other hand, if the critical index is , then it follows from Chatterjee’s CLT that the polynomial is -close to in total variation distance (hence, also in Kolmogorov distance). Note that in both cases, has at most variables and hence setting , we obtain a polynomial on variables whose distribution is Kolmogorov -close to that of .
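As a rough illustration of the critical index on a list of eigenvalues, the sketch below assumes the usual Servedio-style definition: with eigenvalues sorted by non-increasing magnitude, the critical index is the least position at which the current eigenvalue has magnitude at most tau times the Euclidean norm of the tail starting at that position. The precise condition is not recoverable from the text above, so this form is an assumption, and the function name is hypothetical.

```python
import math


def critical_index(lams, tau):
    """Least index ell (0-based) such that
    |lam_ell| <= tau * sqrt(sum_{j >= ell} lam_j^2),
    after sorting by non-increasing magnitude.
    Returns len(lams) if no such index exists.
    """
    lams = sorted(lams, key=abs, reverse=True)
    tail_sq = sum(l * l for l in lams)
    for ell, lam in enumerate(lams):
        if abs(lam) <= tau * math.sqrt(max(tail_sq, 0.0)):
            return ell
        tail_sq -= lam * lam  # drop the current eigenvalue from the tail
    return len(lams)
```

Under this assumed definition, a critical index of zero means that no single eigenvalue dominates, which is the regime in which the text applies the central limit theorem; a large critical index means the eigenvalue magnitudes decay quickly, so a short head carries most of the weight.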
Thus, we have effectively reduced our initial task to the deterministic approximate computation of . This task can potentially be achieved in a number of different ways (see the discussion at the start of Section 2.4); with the aim of giving a self-contained and poly-time algorithm, we take a straightforward approach based on dynamic programming. To do this, we start by discretizing the random variable to obtain a distribution (supported on many points) which is such that Since is a decoupled polynomial, computing can be reduced to the counting version of the knapsack problem where the weights are integers of magnitude , and therefore can be solved exactly in time by standard dynamic programming.
We note that the dynamic programming approach we employ could be used to do deterministic approximate counting for a decoupled -variable Gaussian degree-2 polynomial in time even without the junta condition. However, the fact that is Kolmogorov-close to a junta polynomial is a structural insight which has already proved useful in followup work. Indeed, achieving a junta structure is absolutely crucial for recent extensions of this result [DDS13, DS13] which generalize the deterministic approximate counting algorithm presented here (to juntas of degree-2 PTFs in [DDS13] and to general degree- PTFs in [DS13], respectively).
Singular Value Decomposition: The above informal description glossed over the fact that given a matrix , it is in general not possible to exactly represent the SVD of using a finite number of bits (let alone to exactly compute the SVD in polynomial time). In our actual algorithm, we have to deal with the fact that we can only achieve an “approximate” SVD. We define a notion of approximation that is sufficient for our purposes and show that such an approximation can be computed efficiently. Our basic algorithmic primitive is (a variant of the) well-known “powering method” (see [Vis13] for a nice overview). Recall that the powering method efficiently computes an approximation to the eigenvector corresponding to the highest magnitude eigenvalue. In particular, the method has the following guarantee: given that the largest-magnitude eigenvalue of has absolute value , the powering method runs in time and returns a unit vector such that .
For our purposes, we require an additional criterion: namely, that the vector is almost parallel to . (This corresponds to the notion of “decoupling” the polynomial discussed earlier.) It can be shown that if one naively applies the powering method, then it is not necessarily the case that the vector it returns will also satisfy this requirement. To get around this, we modify the matrix before applying the powering method and show that the vector so returned provably satisfies the required criterion, i.e., is almost parallel to . An additional caveat is that the standard “textbook” version of the method is a randomized algorithm, and we of course need a deterministic algorithm. This can be handled by a straightforward derandomization, resulting in only a linear time overhead.
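For orientation, here is a minimal sketch of the textbook powering method with one simple derandomization: start from each standard basis vector in turn and keep the best result. This is an assumption-laden illustration, not the modified procedure analyzed in the paper; it uses floating point, a fixed iteration count, and none of the matrix modification or rounding described here.

```python
import math


def mat_vec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def power_method(A, iters=50):
    """Approximate the largest-magnitude eigenvalue of a symmetric
    matrix A and a corresponding unit eigenvector.

    Deterministic start: try every standard basis vector e_i. At least
    one of them has inner product of magnitude >= 1/sqrt(n) with the
    top eigenvector, which is what power iteration needs to converge.
    """
    n = len(A)
    best = None
    for i in range(n):
        v = [1.0 if j == i else 0.0 for j in range(n)]
        for _ in range(iters):
            w = mat_vec(A, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # e_i lies in the kernel of A; try the next start
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(A, v)))  # Rayleigh quotient
        if best is None or abs(lam) > abs(best[0]):
            best = (lam, v)
    return best


# Hypothetical example: eigenvalues 3 and 1, top eigenvector (1, 1)/sqrt(2).
lam, vec = power_method([[2.0, 1.0], [1.0, 2.0]])
```

Trying all n starting vectors multiplies the cost by n, which matches the linear overhead mentioned above. Plain iteration on the matrix itself can fail to converge when two eigenvalues have equal magnitude and opposite sign, one reason a real implementation would work with a modified matrix.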
We record basic background facts from linear algebra, probability, and analysis in Appendix A, along with our new extended notion of the “critical index” of a pair of sequences. Section 2 establishes our main technical result – an algorithm for deterministically approximately counting satisfying assignments of a degree-2 PTF under the Gaussian distribution. Section 3 extends this result to satisfying assignments over . Finally, in Section 4 we give the application to deterministic approximation of absolute moments.
2 Deterministic approximate counting for Gaussian distributions
Our goal is to compute, up to an additive , the probability . The algorithm has two main stages. In the first stage (Section 2.3) we transform the original -variable degree- polynomial into an essentially equivalent polynomial with a “small” number of variables – independent of – and a nice special form (a degree- polynomial with no “cross terms”). The key algorithmic tool used in this transformation is the routine APPROXIMATE-DECOMPOSE which is described and analyzed in Section 2.2. In particular, suppose that the original degree– polynomial is of the form . The first stage constructs a degree- “junta” polynomial with no cross terms (that is, every non-constant monomial in is either of the form or ) where , such that Theorem 27 summarizes what is accomplished in the first stage. We view this stage as the main contribution of the paper.
In the second stage (Section 2.4) we give an efficient deterministic algorithm to approximately count the fraction of satisfying assignments for . Our algorithm exploits both the fact that depends on only variables and the special form of . Theorem 43 summarizes what is accomplished in the second stage. Theorem 50 combines these two results and gives our main result for deterministic approximate counting of Gaussian degree-2 PTFs.
The first stage: Constructing a degree-2 junta PTF.
To implement the first step we take advantage of the fact that in order to “decouple” the variables. Suppose we have computed the spectral decomposition of as . (We remark that our algorithm does not compute this decomposition explicitly; rather, it iteratively approximates the eigenvector corresponding to the largest magnitude eigenvalue of , as is described in detail in the pseudocode of algorithm Construct-Junta-PTF. For the sake of this intuitive explanation, we assume that we construct the entire spectrum.) Then, we can write as
where and . Since is orthonormal, it follows that and that the desired probability can be equivalently written as .
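The decoupling step can be checked numerically in the smallest nontrivial case. The sketch below assumes a two-variable polynomial p(x) = a x1^2 + 2b x1 x2 + d x2^2 + lin·x + const and diagonalizes its 2x2 symmetric matrix with a closed-form rotation; the function names and the parametrization are hypothetical. After the change of variables y = V^T x the cross term disappears, and because the rotation is orthonormal, y is again a standard Gaussian vector whenever x is.

```python
import math


def decouple_2x2(a, b, d, lin):
    """Diagonalize [[a, b], [b, d]] by a rotation and rotate the linear
    part along with it. Returns (eigenvalues, rotated linear
    coefficients, orthonormal eigenvectors)."""
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    v1, v2 = (c, s), (-s, c)
    lam1 = a * c * c + 2 * b * c * s + d * s * s
    lam2 = a * s * s - 2 * b * c * s + d * c * c
    mu1 = lin[0] * c + lin[1] * s
    mu2 = -lin[0] * s + lin[1] * c
    return (lam1, lam2), (mu1, mu2), (v1, v2)


def p_value(x, a, b, d, lin, const):
    """Original polynomial, including the cross term 2*b*x1*x2."""
    return (a * x[0] ** 2 + 2 * b * x[0] * x[1] + d * x[1] ** 2
            + lin[0] * x[0] + lin[1] * x[1] + const)


def q_value(y, lams, mus, const):
    """Decoupled polynomial: sum of lam_i*y_i^2 + mu_i*y_i, plus const."""
    return sum(l * t * t + m * t for l, m, t in zip(lams, mus, y)) + const


def rotate(x, V):
    """Coordinates of x in the eigenvector basis: y_i = <v_i, x>."""
    return tuple(v[0] * x[0] + v[1] * x[1] for v in V)
```

Evaluating `p_value` at a point x and `q_value` at `rotate(x, V)` should give the same number up to floating-point error.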
At this point, let us arrange the variables in order, so that the sequence is non-increasing. We now consider the -critical index of the pair of sequences and (here is the “auxiliary sequence”; see Definition 71). The starting point of our analysis is the following result.
Informal theorem: If the -critical index is zero, then the random variable , where , is -close in total variation distance to where and .
As mentioned earlier, the proof of the above theorem uses a recent result of Chatterjee [Cha09] (Theorem 42) which provides a central limit theorem for functions of Gaussians. With this as a starting point, we consider a case analysis depending on the value of the -critical index of the pair of sequences and ( is the auxiliary sequence). Let be the value of the -critical index of the pair. If , then the tail can be replaced by where and . On the other hand, if , then the distribution of differs from the distribution of by in Kolmogorov distance. In either case, we end up with a degree- polynomial on at most variables whose distribution is close to the distribution of in Kolmogorov distance.
The main difficulty in the real algorithm and analysis vis-a-vis the idealized version described above is that computationally, it is not possible to compute the exact spectral decomposition. Rather, what one can achieve is some sort of an approximate decomposition (we are deliberately being vague here about the exact nature of the approximation that can be achieved). Roughly speaking, at every stage of the algorithm constructing several approximations are introduced and non-trivial technical work is required in bounding the corresponding error. See Sections 2.2 and 2.3 for the detailed analysis.
The second stage: Counting satisfying assignments of degree- juntas over Gaussian variables.
We are now left with the task of (approximately) counting . To do this we start by discretizing each normal random variable to a sufficiently fine granularity – it turns out that a grid of size suffices. Let us denote by the discretized approximation to . We also round the coefficients of to a suitable granularity and denote by the rounded polynomial. It can be shown that and are -close in Kolmogorov distance. Finally, this reduces computing to computing . Since the terms in are decoupled (i.e., there are no cross terms) and have small integer coefficients, can be expressed as a read-once branching program of size . Using dynamic programming, one can efficiently compute the exact probability in time . See Section 2.4 for the details.
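The dynamic program in this stage can be sketched as follows, under simplifying assumptions: each variable has already been discretized to a finite list of (integer contribution, probability) pairs, where the contribution is that variable's decoupled term evaluated at a grid point, and the event of interest is that the total is nonnegative. The function name and input format are hypothetical, and exact rational arithmetic is used so that the result is not affected by rounding.

```python
from collections import defaultdict
from fractions import Fraction


def prob_nonneg_decoupled(terms, const=0):
    """Exact probability that const + sum of independent integer-valued
    contributions is >= 0.

    terms is a list with one entry per variable; each entry is a list
    of (value, probability) pairs for that variable's contribution.
    The table maps each reachable partial sum to its probability and
    is extended one variable at a time, as in counting knapsack.
    """
    dist = {const: Fraction(1)}
    for term in terms:
        nxt = defaultdict(Fraction)
        for partial, p_partial in dist.items():
            for value, p_value in term:
                nxt[partial + value] += p_partial * p_value
        dist = nxt
    return sum(pr for total, pr in dist.items() if total >= 0)


def term_on_grid(lam, mu, grid):
    """Contribution lam*z^2 + mu*z at each grid point z with its weight."""
    return [(lam * z * z + mu * z, w) for z, w in grid]


# Hypothetical three-point grid standing in for a discretized Gaussian.
grid = [(-1, Fraction(1, 4)), (0, Fraction(1, 2)), (1, Fraction(1, 4))]
terms = [term_on_grid(2, 1, grid), term_on_grid(-3, 0, grid)]
```

The table size is bounded by the number of distinct partial sums, which is why small integer coefficients keep the running time polynomial. For the toy input above the probability works out to 5/8.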
We note that alternative algorithmic approaches could potentially be used for this stage. We chose our approach of discretizing and using dynamic programming because we feel that it is intuitive and self-contained and because it easily gives a -time algorithm for this stage.
2.2 A useful algorithmic primitive.
In this section we state and prove correctness of the main algorithmic primitive APPROXIMATE-DECOMPOSE that our procedure for constructing a degree- junta over Gaussian variables will use. This primitive partially “decouples” a given input degree- polynomial by transforming the polynomial into an (essentially equivalent) polynomial in which a new variable (intuitively corresponding to the largest eigenvector of the input degree-2 polynomial’s matrix) essentially does not appear together with any other variables in any monomials.
Theorem 6 gives a precise statement of APPROXIMATE-DECOMPOSE’s performance. The reader who is eager to see how APPROXIMATE-DECOMPOSE is used may wish to proceed directly from the statement of Theorem 6 to Section 2.3.
We require the following definition to state Theorem 6. (Below a “normalized linear form” is an expression with )
Definition 5. Given a degree-2 polynomial defined by and a normalized linear form , we define the residue of with respect to , , to be the polynomial obtained by the following process: For each , express as where is orthogonal to the linear form . Now, consider the polynomial . is defined as the homogeneous multilinear degree-2 part of which has the variable present in all its terms.
Theorem 6. Let be a degree- polynomial (with constant term ) whose entries are -bit integers and let . There exists a deterministic algorithm APPROXIMATE-DECOMPOSE which on input an explicit description of , and runs in time and has the following guarantee:
(a) If , then the algorithm outputs rational numbers , and a degree- polynomial with the following property: for , the two distributions and are identical, where equals Further, and .
(b) If , then the algorithm either outputs “small max eigenvalue” or has the same guarantee as (a).
In the rest of Section 2.2 we prove Theorem 6, but first we give some high-level intuition. Recall from the introduction that we would like to compute the SVD of the symmetric matrix corresponding to the quadratic part of the degree-2 polynomial , but the exact SVD is hard to compute. APPROXIMATE-DECOMPOSE works by computing an approximation to the largest eigenvalue-eigenvector pair, and using the approximate eigenvector to play the role of in Definition 5.
The case that is of most interest to us is when the largest eigenvalue has large magnitude compared to the square root of the variance of (since we will use Chatterjee’s theorem to deal with the complementary case) so we focus on this case below. For this case, part (a) of Theorem 6 says that the algorithm outputs a degree- polynomial with the same distribution as . Crucially, in this polynomial , the first variable is “approximately decoupled” from the rest of the polynomial, namely (because is small), and moreover is substantially smaller than (this is important because intuitively it means we have “made progress” on the polynomial ). Note that if we were given the exact eigenvalue-eigenvector pair corresponding to the largest magnitude eigenvalue, it would be possible to meet the conditions of case (a) with .
While approximating the largest eigenvector is a well-studied problem, we could not find any off-the-shelf solution with the guarantees we required. APPROXIMATE-DECOMPOSE adapts the well-known powering method for finding the largest eigenvector to give the desired guarantees.
2.2.1 Decomposing a matrix.
In order to describe the APPROXIMATE-DECOMPOSE algorithm we first need a more basic procedure which we call APPROXIMATE-LARGEST-EIGEN. Roughly speaking, given a real symmetric matrix with a large-magnitude eigenvalue, the APPROXIMATE-LARGEST-EIGEN procedure outputs approximations of the largest-magnitude eigenvalue and its corresponding eigenvector. Theorem 7 gives a precise performance guarantee:
Theorem 7. Let be a symmetric matrix whose entries are -bit integers (not all 0) and be given rational numbers. There exists a deterministic algorithm APPROXIMATE-LARGEST-EIGEN which on input , and , runs in time and has the following behavior:
(a) If , the algorithm outputs a number and unit vector such that
the matrix satisfies ; and
(b) If , the algorithm either outputs “small max eigenvalue” or behaves as in case (a).
Let us describe the APPROXIMATE-LARGEST-EIGEN algorithm. Let . The running time of the algorithm will have a polynomial dependence on . Without loss of generality, assume that is a positive number. Instead of working directly with the matrix , we will work with the matrix where . Note that an eigenvector-eigenvalue pair of maps to the pair for .
For , the APPROXIMATE-LARGEST-EIGEN algorithm works as follows:
For unit vectors and , the algorithm computes
Let , and define
Note that since can have irrational entries, exact computation of and is not feasible. However, in time , we can compute a unit vector so that . Define as rounded to a precision . It is easy to see that .
If , then output the pair . Else, output “small max eigenvalue”.
Proof of Theorem 7: We start with the following simple claim:
If , then APPROXIMATE-LARGEST-EIGEN outputs “small max eigenvalue”.
Note that if , then . By our choice of we have that , hence the algorithm will output “small max eigenvalue”. ∎
Next let us recall the “powering method” to compute the largest eigenvalue of a symmetric matrix. See the monograph by Vishnoi [Vis13] (the following statement is implicit in Lemma 8.1).
Let be a symmetric matrix, be the largest magnitude eigenvalue of and be the corresponding eigenvector. Let be any unit vector such that . Then, for , .
Let be the eigenvector corresponding to the largest eigenvalue of and be the corresponding eigenvalue. It is clear that there is some such that .
Let be any such index. We will show that
Let be the eigenvectors of (and hence of ) and be the corresponding eigenvalues of . Let and let .
We have and hence . As all eigenvalues of are non-negative, for we have . If , then
The last inequality uses that . Thus, and hence . ∎
If , then for as defined above, .
Recall that an eigenvector with eigenvalue of maps to an eigenvalue of . Thus, if is such that , then . Since , if we choose , then . Thus, we have
Now, observe that . Hence,
where the second inequality uses Proposition 10. ∎
For as defined in Equation (1) and , if and , then for , .
We begin by noting that for , . This implies that . Using the bounds and , we get that .
By assumption we have that , and since we also have that . As a consequence, we have that for every , . Note that