
# Deterministic Approximate Counting for Juntas of Degree-2 Polynomial Threshold Functions

Anindya De
anindya@math.ias.edu. Research supported by Umesh Vazirani’s Templeton Foundation Grant 21674.
Ilias Diakonikolas
University of Edinburgh
ilias.d@ed.ac.uk. Supported in part by a SICSA PECE grant and a Carnegie research grant.
Rocco A. Servedio
Columbia University
rocco@cs.columbia.edu. Supported by NSF grant CCF-1115703.
###### Abstract

Let $g:\{-1,1\}^k\to\{-1,1\}$ be any Boolean function and $q_1,\dots,q_k$ be any degree-2 polynomials over $\{-1,1\}^n$. We give a deterministic algorithm which, given as input explicit descriptions of $g,q_1,\dots,q_k$ and an accuracy parameter $\epsilon>0$, approximates

$$\Pr_{x\sim\{-1,1\}^n}\bigl[g(\mathrm{sign}(q_1(x)),\dots,\mathrm{sign}(q_k(x)))=1\bigr]$$

to within an additive $\pm\epsilon$. For any constant $k$ and $\epsilon$ the running time of our algorithm is a fixed polynomial in $n$ (in fact, this is true even for some not-too-small $\epsilon=o(1)$ and not-too-large $k=\omega(1)$). This is the first fixed-polynomial-time algorithm that can deterministically approximately count satisfying assignments of a natural class of depth-3 Boolean circuits.

Our algorithm extends a recent result [DDS13] which gave a deterministic approximate counting algorithm for a single degree-2 polynomial threshold function, corresponding to the $k=1$ case of our result. Note that even in the $k=1$ case it is NP-hard to determine whether $\Pr_{x\sim\{-1,1\}^n}[\mathrm{sign}(q_1(x))=1]$ is nonzero, so any sort of multiplicative approximation is almost certainly impossible even for efficient randomized algorithms.

Our algorithm and analysis require several novel technical ingredients that go significantly beyond the tools required to handle the $k=1$ case in [DDS13]. One of these is a new multidimensional central limit theorem for degree-2 polynomials in Gaussian random variables which builds on recent Malliavin-calculus-based results from probability theory. We use this CLT as the basis of a new decomposition technique for $k$-tuples of degree-2 Gaussian polynomials and thus obtain an efficient deterministic approximate counting algorithm for the Gaussian distribution, i.e., an algorithm for estimating

$$\Pr_{x\sim N(0,1)^n}\bigl[g(\mathrm{sign}(q_1(x)),\dots,\mathrm{sign}(q_k(x)))=1\bigr].$$

Finally, a third new ingredient is a “regularity lemma” for $k$-tuples of degree-$d$ polynomial threshold functions. This generalizes both the regularity lemmas of [DSTW10, HKM09] (which apply to a single degree-$d$ polynomial threshold function) and the regularity lemma of Gopalan et al. [GOWZ10] (which applies to $k$-tuples of linear threshold functions, i.e., the $d=1$ case). Our new regularity lemma lets us extend our deterministic approximate counting results from the Gaussian to the Boolean domain.

## 1 Introduction

Unconditional derandomization has been an important research area in computational complexity theory over the past two decades [AW85, Nis91, Nis92, NW94]. A major research goal in this area is to obtain efficient deterministic approximate counting algorithms for “low-level” complexity classes such as constant-depth circuits, small-space branching programs, polynomial threshold functions, and others [LVW93, LV96, Tre04, GMR13, Vio09, GKM11, DDS13]. Under the widely-believed hypothesis $\mathrm{P}=\mathrm{BPP}$, there must be a polynomial-time deterministic algorithm that can approximate the fraction of satisfying assignments to any polynomial-size circuit. Since finding such an algorithm seems to be out of reach of present-day complexity theory [KI02], research efforts have been directed to the aforementioned low-level classes.

A natural class of Boolean functions to consider in this context is the class of polynomial threshold functions (PTFs). Recall that a degree-$d$ PTF is a Boolean function $F:\{-1,1\}^n\to\{-1,1\}$ defined by $F(x)=\mathrm{sign}(p(x))$, where $p$ is a degree-$d$ polynomial over the reals and $\mathrm{sign}(z)$ is $1$ iff $z\ge 0$. In the special case where $d=1$, degree-$d$ PTFs are often referred to as linear threshold functions (LTFs). Understanding the structure of these functions has been a topic of extensive investigation for decades (see e.g. [MK61, MTT61, MP68, Mur71, GHR92, Orp92, Hås94, Pod09] and many other works) due to their importance in fields such as concrete complexity theory [She08, She09, DHK10, Kan10, Kan12b, Kan12a, KRS12], learning theory [KKMS08, SSSS11, DOSW11, DDFS12], voting theory [APL07, DDS12], and others.

In the context of approximate counting, there is a significant gap in our understanding of low-degree PTFs. An outstanding open problem is to design a deterministic algorithm that approximates the fraction of satisfying assignments to a constant-degree PTF over $\{-1,1\}^n$ to an additive $\pm\epsilon$ and runs in time $\mathrm{poly}(n/\epsilon)$. Even for the class of degree-2 PTFs, until recently no deterministic algorithm was known with running time $\mathrm{poly}(n)$ for any sub-constant value of the error $\epsilon$. In previous work [DDS13] we obtained such an algorithm. In the present paper we make further progress on this problem by developing the first efficient deterministic counting algorithm for the class of juntas of (any constant number of) degree-2 PTFs.

### 1.1 Our main result.

As our main result, we give a polynomial-time deterministic approximate counting algorithm for any Boolean function of constantly many degree-2 polynomial threshold functions.

###### Theorem 1.

[Deterministic approximate counting of functions of degree-2 PTFs over $\{-1,1\}^n$] There is an algorithm with the following properties: given an arbitrary function $g:\{-1,1\}^k\to\{-1,1\}$, degree-2 polynomials $q_1,\dots,q_k$ over $\{-1,1\}^n$, and an accuracy parameter $\epsilon>0$, the algorithm runs (deterministically) in time $\mathrm{poly}(n)$ for any constant $k$ and $\epsilon$, and outputs a value $v\in[0,1]$ such that

$$\Bigl|\Pr_{x\in\{-1,1\}^n}\bigl[g(\mathrm{sign}(q_1(x)),\dots,\mathrm{sign}(q_k(x)))=1\bigr]-v\Bigr|\le\epsilon.$$

Our result may be (somewhat informally) restated in terms of Boolean circuits as a fixed-polynomial-time deterministic approximate counting algorithm for the class of depth-3 circuits that have an arbitrary gate (i.e., a $k$-junta) at the top level, arbitrary weighted threshold gates at the middle level, and fanin-2 $\mathrm{AND}$ gates at the bottom level. Theorem 1 is a broad generalization of the main result of [DDS13], which establishes the special case $k=1$ of the current result.

As noted in [DDS13], the problem of determining whether $\Pr_{x\sim\{-1,1\}^n}[\mathrm{sign}(q(x))=1]$ is nonzero for a degree-2 polynomial $q$ is well known to be NP-hard, and hence no efficient algorithm, even allowing randomness, can give a multiplicative approximation to this probability unless NP $\subseteq$ RP. Given this, it is natural to work towards an additive approximation, which is what we achieve.

Previous work. For $d=1$ and $k=1$, Gopalan et al. in [GKM11] obtained a multiplicatively $(1\pm\epsilon)$-accurate deterministic $\mathrm{poly}(n,1/\epsilon)$-time approximate counting algorithm. For $d=2$, however, as noted above additive approximation is the best one can hope for. For the special case $k=1$, in separate work [DDS13], the authors gave a deterministic approximate counting algorithm for a single degree-2 PTF. As we explain in detail in the rest of this introduction, more sophisticated ideas and techniques are required to obtain the results of the current paper for general $k$. These include a new central limit theorem based on Malliavin calculus and Stein’s method, and a new decomposition procedure that goes well beyond the decomposition approach employed in [DDS13].

We remark that the only previous deterministic approximate counting algorithm for $k$-juntas of degree-2 PTFs follows from the pseudorandom generators (PRGs) of [DKN10] (which are based on bounded independence). The running time of the resulting algorithm is $n^{\mathrm{poly}(1/\epsilon)}$, even for $k=1$.

### 1.2 Techniques.

Our high-level approach to establishing Theorem 1 follows a by-now-standard approach in this area. We first (i) establish the result for general degree-2 polynomials over Gaussian inputs; then (ii) use a “regularity lemma” to show that every degree-2 polynomial over Boolean inputs can be decomposed into a “small” number of regular polynomials over Boolean inputs; and finally (iii) use an invariance principle to reduce the problem for “regular” polynomials over Boolean inputs to the problem for regular polynomials over Gaussian inputs. This general approach has been used in a number of previous works, including constructions of unconditional PRGs [DGJ10, MZ10, GOWZ10, DKN10, Kan11, Kan12b], learning and property testing [MORS10, OS11], and other works. However, we emphasize that significant novel conceptual and technical work is required to make this approach work in our setting. More specifically, to achieve step (i), we require (i.a) a new multidimensional CLT for degree-2 Gaussian polynomials with small eigenvalues and (i.b) a new decomposition procedure that transforms a $k$-dimensional vector of Gaussian polynomials into a tractable form for the purpose of approximate counting. For step (ii) we establish a novel regularity lemma for $k$-vectors of low-degree polynomials. Finally, step (iii) follows by an application of the invariance principle of Mossel [Mos10] combined with appropriate mollification arguments [DKN10]. In the rest of this section we discuss our new approaches to steps (i) and (ii).

##### Step (i): The counting problem over Gaussian inputs.

The current paper goes significantly beyond the techniques of [DDS13]. To explain our new contributions let us first briefly recall the [DDS13] approach.

The main observation enabling the result in [DDS13] is this: because of the rotational symmetry of the Gaussian distribution, a degree-2 Gaussian polynomial $q$ can be “diagonalized” so that there exist no “cross terms” in its representation. In a little more detail, if $q(x)=x^\top Ax$ (we ignore the linear term for simplicity), where $A$ is a symmetric matrix, then $q$ can be rewritten in the form $q(y)=\sum_i\lambda_iy_i^2$, where the $y_i$’s are independent standard Gaussians and the $\lambda_i$’s are the eigenvalues of the corresponding matrix $A$. Roughly speaking, once such a representation has been (approximately) constructed, the counting problem can be solved efficiently by dynamic programming. To construct such a decomposition, [DDS13] employs a “critical-index” based analysis on the eigenvalues of the corresponding matrix. For the analysis of its algorithm, [DDS13] proves a CLT for a single degree-2 Gaussian polynomial with small eigenvalues (this CLT is based on a result of Chatterjee [Cha09]). (We note that this informal description suppresses several non-trivial technical issues; see [DDS13] for details.)
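To make the diagonalization step concrete, the following sketch (our illustration, not the paper's code; the matrix and dimension are arbitrary) diagonalizes the quadratic part of a degree-2 polynomial via an eigendecomposition and checks that, after rotating the input, the polynomial is a weighted sum of squares with no cross terms.

```python
import numpy as np

# A degree-2 Gaussian polynomial q(x) = x^T A x with A symmetric.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2  # symmetrize

# Eigendecomposition A = U diag(lam) U^T; columns of U are orthonormal.
lam, U = np.linalg.eigh(A)

# If x ~ N(0, I_n) then y = U^T x ~ N(0, I_n) as well (rotational
# symmetry), and q(x) = x^T A x = sum_i lam_i * y_i^2.
x = rng.standard_normal(n)
y = U.T @ x
q_x = x @ A @ x
q_diag = np.sum(lam * y**2)
assert np.isclose(q_x, q_diag)
print("eigenvalues:", np.round(lam, 3))
```

Because the rotation $y=U^\top x$ preserves the Gaussian distribution, counting for $q(x)$ reduces to counting for the diagonal form.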

At a high level, the approach of the current paper builds on the approach of [DDS13]. To solve the Gaussian counting problem we use a combination of (i.a) a new multidimensional CLT for $k$-tuples of degree-2 Gaussian polynomials with small eigenvalues, and (i.b) a novel decomposition result for $k$-tuples of degree-2 Gaussian polynomials. We now elaborate on these steps.

• As our first contribution, we prove a new multidimensional central limit theorem for $k$-tuples of degree-2 Gaussian polynomials (Theorem 8). Roughly speaking, our CLT states that if each polynomial in the $k$-tuple has small eigenvalues, then the joint distribution of the $k$-tuple is close to a $k$-dimensional Gaussian random variable with matching mean and covariance matrix. The closeness here is with respect to the $k$-dimensional Kolmogorov distance over $\mathbb{R}^k$ (a natural generalization of Kolmogorov distance to vector-valued random variables, which we denote $d_K$ and which is useful for analyzing PTFs). To establish our new CLT, we proceed in two steps: in the first (main) step, we make essential use of a recent multidimensional CLT due to Nourdin and Peccati [NP09] (Theorem 11) which is proved using a combination of Malliavin calculus and Stein’s method. To use this theorem in our setting, we perform a linear-algebraic analysis which allows us to obtain precise bounds on the Malliavin derivatives of degree-2 Gaussian polynomials with small eigenvalues. An application of [NP09] then gives us a version of our desired CLT with respect to “test functions” with bounded second derivatives (Theorem 12). In the second step, we use tools from mollification [DKN10] to translate this notion of closeness into closeness with respect to $k$-dimensional Kolmogorov distance, thus obtaining our intended CLT. (As a side note, we believe that this work is the first to use Malliavin-calculus-based tools in the context of derandomization.)

• As our second contribution, we give an efficient procedure that transforms a $k$-tuple of degree-2 Gaussian polynomials into an essentially equivalent $k$-tuple of degree-2 Gaussian polynomials such that: (1) the two $k$-tuples are $\epsilon$-close as random variables, and (2) the new $k$-tuple has a “nice structure” that allows for efficient deterministic approximate counting. In particular, there is a “small” set of variables such that for each restriction fixing this set, the restricted $k$-tuple of polynomials is well-approximated by a $k$-dimensional Gaussian random variable (with the appropriate mean and covariance matrix). Once such a transformed $k$-tuple has been obtained, deterministic approximate counting is straightforward via an appropriate discretization of the $k$-dimensional Gaussian distribution (see Section 5).

We now elaborate on Item (1) above. At a high level, the main step of our transformation procedure performs a “change of basis” to convert the original $k$-tuple into an essentially equivalent (for the purpose of approximate counting) vector of polynomials. The high-level approach is reminiscent of (and inspired by) the decomposition procedure for vectors of linear forms in [GOWZ10]. However, there are significant complications that arise in our setting. In the [GOWZ10] approach, a vector of linear forms is simplified by “collecting” variables in a greedy fashion as follows: each of the $k$ linear forms has a bounded “budget,” meaning that at most that many variables will be collected on its behalf, so the overall number of variables that are collected is bounded as well. At each stage some variable is collected which has large influence in the remaining (uncollected) portion of some linear form. The [GOWZ10] analysis shows that after each linear form has exhausted its budget, each of the linear forms will either be regular or its remaining portion (consisting of the uncollected variables) will have small variance. In our current setting, we are dealing with degree-2 Gaussian polynomials instead of linear forms. Recall that every degree-2 polynomial can be expressed as a linear combination of squares of linear forms (i.e., it can be diagonalized). Intuitively, since Gaussians are invariant under change of basis, we can attempt to use an approach where linear forms play the role that variables had in [GOWZ10]. Mimicking the [GOWZ10] strategy, each quadratic polynomial would have a bounded number of linear forms collected on its behalf, and a bounded number of linear forms would be collected overall. Unfortunately, this vanilla strategy does not work even for $k=2$, as it requires a single orthonormal basis in which all $k$ degree-2 polynomials are simultaneously diagonalized.

Instead, we resort to a more refined strategy. Starting with the $k$ quadratic polynomials, we use the following iterative algorithm: if the largest-magnitude eigenvalue of each quadratic form is small, we are already in the regular case (and we can appeal to our multidimensional CLT). Otherwise, there exists at least one polynomial with a large-magnitude eigenvalue. We proceed to collect the corresponding linear form and “reduce” every polynomial by this linear form. (The exact description of this reduction is somewhat involved, but intuitively, it uses the fact that Gaussians are invariant under orthogonal transformations.) This step is repeated iteratively; an argument similar to [GOWZ10] shows that only a bounded number of linear forms is collected on behalf of each quadratic polynomial. At the end of this procedure, each of the quadratic polynomials will either be “regular” (have small largest-magnitude eigenvalue compared to the variance of the remaining portion), or else the variance of the remaining portion will be small. This completes the informal description of our transformation.
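The iterative loop above can be caricatured in a few lines. The sketch below is our own schematic, not the paper's actual reduction (which works with the polynomials themselves and tracks variances, not just matrices): while some quadratic form has an eigenvalue that is large relative to its Frobenius norm, "collect" the corresponding eigendirection and project every quadratic form onto its orthogonal complement.

```python
import numpy as np

def decompose(mats, tau):
    """Repeatedly project out top eigendirections until every symmetric
    matrix is 'regular' (max |eigenvalue| <= tau * Frobenius norm) or
    negligible. Returns the reduced matrices and the collected directions."""
    mats = [M.copy() for M in mats]
    collected = []  # unit vectors: the collected "linear forms"
    while True:
        found = None
        for M in mats:
            fro = np.linalg.norm(M)
            if fro < 1e-12:
                continue  # this polynomial's remaining portion is negligible
            w, V = np.linalg.eigh(M)
            i = int(np.argmax(np.abs(w)))
            if abs(w[i]) > tau * fro:
                found = V[:, i]  # a large-magnitude eigendirection
                break
        if found is None:
            return mats, collected
        collected.append(found)
        # "Reduce" every matrix: restrict to the complement of 'found'.
        P = np.eye(len(found)) - np.outer(found, found)
        mats = [P @ M @ P for M in mats]

mats = [np.diag([5.0, 0.1, 0.1, 0.1, 0.1]),
        np.diag([0.2, 4.0, 0.1, 0.1, 0.1])]
reduced, collected = decompose(mats, tau=0.5)
print("collected", len(collected), "linear forms")
```

Each collected direction lies in the orthogonal complement of all previously collected ones, so the loop makes at most $n$ passes; the real procedure's budget argument is what keeps the count bounded in terms of $k$ and $\epsilon$ alone.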

Our main result for the Gaussian setting is the following theorem:

###### Theorem 2.

[Deterministic approximate counting of functions of degree-2 PTFs over Gaussians] There is an algorithm with the following properties: it takes as input explicit descriptions of $n$-variable degree-2 polynomials $q_1,\dots,q_k$, an explicit description of a $k$-bit Boolean function $g:\{-1,1\}^k\to\{-1,1\}$, and a value $\epsilon>0$. It runs (deterministically) in time $\mathrm{poly}(n)$ for any constant $k$ and $\epsilon$, and outputs a value $\tilde v\in[0,1]$ such that

$$\Bigl|\Pr_{G\sim N(0,1)^n}\bigl[g(Q_1(G),\dots,Q_k(G))=1\bigr]-\tilde v\Bigr|\le\epsilon,\tag{1}$$

where $Q_i(x)=\mathrm{sign}(q_i(x))$ for $i=1,\dots,k$.

We note that in the case $k=1$, the algorithm of the current work is not the same as the algorithm of [DDS13] (indeed, the above algorithm runs in time exponential in $1/\epsilon$ even for $k=1$, whereas the algorithm of [DDS13] runs in time polynomial in $n$ and $1/\epsilon$ for a single Gaussian polynomial).

##### Step (ii): The regularity lemma.

Recall that the influence of variable $i$ on a multilinear polynomial $p=\sum_S\hat p(S)\prod_{i\in S}x_i$ over $\{-1,1\}^n$ (under the uniform distribution) is $\mathrm{Inf}_i(p)=\sum_{S\ni i}\hat p(S)^2$, and that the variance of $p$ is $\mathrm{Var}(p)=\sum_{\emptyset\neq S}\hat p(S)^2$. For a degree-$d$ polynomial $p$ we have $\mathrm{Var}(p)\le\sum_{i=1}^n\mathrm{Inf}_i(p)\le d\cdot\mathrm{Var}(p)$, so for small constant $d$ the variance and the total influence are equal up to a small constant factor. A polynomial $p$ is said to be $\tau$-regular if $\mathrm{Inf}_i(p)\le\tau\cdot\mathrm{Var}(p)$ for all $i\in[n]$.

As noted earlier, by adapting known invariance principles from the literature [Mos08] it is possible to show that an algorithm for approximately counting satisfying assignments of a junta of degree-2 PTFs over Gaussian inputs will in fact also succeed for approximately counting satisfying assignments of a junta of sufficiently regular degree-2 PTFs over $\{-1,1\}^n$. Since Theorem 2 gives us an algorithm for the Gaussian problem, to complete the chain we need a reduction from the problem of counting satisfying assignments of a junta of arbitrary degree-2 PTFs over $\{-1,1\}^n$ to the problem of counting satisfying assignments of a junta of regular degree-2 PTFs over $\{-1,1\}^n$.

We accomplish this by giving a novel regularity lemma for $k$-tuples of degree-2 (or more generally, degree-$d$) polynomials. Informally speaking, this is an efficient deterministic algorithm with the following property: given as input a $k$-tuple of arbitrary degree-2 polynomials over $\{-1,1\}^n$, it constructs a decision tree of restrictions such that for almost every root-to-leaf path (i.e., restriction $\rho$) in the decision tree, all $k$ restricted polynomials are “easy to handle” for deterministic approximate counting, in the following sense: each restricted polynomial is either highly regular, or else highly skewed, in the sense that its constant term is so large compared to its variance that the corresponding PTF is guaranteed to be very close to a constant function. Such leaves are “easy to handle” because we can set the PTFs corresponding to “skewed” polynomials to constants (incurring only small error); we are then left with a junta of regular degree-2 PTFs, which can be handled using the Gaussian algorithm as sketched above.

A range of related “regularity lemmas” have been given in the LTF/PTF literature [DSTW10, HKM09, BELY09, GOWZ10], but none with all the properties that we require. [Ser07] implicitly gave a regularity lemma for a single LTF, and [DSTW10, HKM09, BELY09] each gave (slightly different flavors of) regularity lemmas for a single degree-$d$ PTF. Subsequently [GOWZ10] gave a regularity lemma for $k$-tuples of LTFs; as noted earlier, our decomposition for $k$-tuples of degree-2 polynomials over Gaussian inputs given in Section 5 uses ideas from their work. However, as we describe in Section 7, their approach does not seem to extend to degrees $d\ge2$, so we must use a different approach to prove our regularity lemma.

### 1.3 Organization.

After giving some useful background in Section 2, we prove our new multidimensional CLT in Section 3. We give the transformation procedure that is at the heart of our decomposition approach in Section 4, and present the actual deterministic counting algorithm for the Gaussian case that uses this transformation in Section 5. Section 6 shows how the new regularity lemma for -tuples of Boolean PTFs gives the main Boolean counting result, and finally the regularity lemma is proved in Section 7.

## 2 Definitions, Notation and Useful Background

##### Polynomials and PTFs.

Throughout the paper we use lower-case letters $p,q$, etc. to denote low-degree multivariate polynomials. We use capital letters to denote the corresponding polynomial threshold functions that map to $\{-1,1\}$, so typically $P=\mathrm{sign}(p)$, $Q=\mathrm{sign}(q)$, etc.

We consider multivariate polynomials over the domains $\mathbb{R}^n$ (endowed with the standard normal distribution $N(0,1)^n$) and $\{-1,1\}^n$ (endowed with the uniform distribution). Since $x_i^2=1$ for $x\in\{-1,1\}^n$, in dealing with polynomials over the domain $\{-1,1\}^n$ we may without loss of generality restrict our attention to multilinear polynomials.
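As a tiny concrete check of the multilinearity remark (our example, not from the paper): over the Boolean domain every square $x_i^2$ is identically $1$, so for instance $x_1^2+x_1x_2$ coincides with the multilinear polynomial $1+x_1x_2$ on every point of $\{-1,1\}^2$.

```python
from itertools import product

# Over {-1,1}^2, x1^2 + x1*x2 agrees with the multilinear 1 + x1*x2,
# since x1^2 = 1 on every Boolean point.
for x1, x2 in product((-1, 1), repeat=2):
    assert x1**2 + x1 * x2 == 1 + x1 * x2
print("multilinear reduction verified on all of {-1,1}^2")
```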

##### Kolmogorov distance between $\mathbb{R}^k$-valued random variables.

It will be convenient for us to use a natural $k$-dimensional generalization of the Kolmogorov distance between two real-valued random variables, which we now describe. Let $X=(X_1,\dots,X_k)$ and $Y=(Y_1,\dots,Y_k)$ be two $\mathbb{R}^k$-valued random variables. We define the $k$-dimensional Kolmogorov distance between $X$ and $Y$ to be

$$d_K(X,Y)=\sup_{(\theta_1,\dots,\theta_k)\in\mathbb{R}^k}\Bigl|\Pr\bigl[\forall\,i\in[k]\ X_i\le\theta_i\bigr]-\Pr\bigl[\forall\,i\in[k]\ Y_i\le\theta_i\bigr]\Bigr|.$$

This will be useful to us when we are analyzing $k$-juntas of degree-2 PTFs over Gaussian random variables; we will typically have $X=(q_1(x),\dots,q_k(x))$ where $x\sim N(0,1)^n$ and each $q_i$ is a degree-2 polynomial, and have $Y$ be a $k$-dimensional Gaussian random variable whose mean and covariance matrix match those of $X$.
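For intuition, here is a direct (and deliberately inefficient) evaluation of the $k$-dimensional Kolmogorov distance between two finite samples, scanning sample points as candidate thresholds; the sample sizes, shift, and grid choice are all our own illustration.

```python
import numpy as np

def dk_empirical(X, Y, grid):
    """Empirical k-dimensional Kolmogorov distance between samples X and Y
    (rows are k-dimensional points), maximized over thresholds in `grid`."""
    best = 0.0
    for theta in grid:
        px = np.mean(np.all(X <= theta, axis=1))  # Pr[all coords <= theta]
        py = np.mean(np.all(Y <= theta, axis=1))
        best = max(best, abs(px - py))
    return best

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 2))
Y = rng.standard_normal((2000, 2)) + np.array([1.0, 0.0])  # shifted mean
grid = np.vstack([X, Y])  # candidate thresholds: the sample points themselves
print("empirical d_K:", round(dk_empirical(X, Y, grid), 3))
```

Shifting one coordinate's mean makes the joint CDFs disagree, so the distance is bounded away from zero; identical samples give distance exactly zero.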

##### Notation and terminology for degree-2 polynomials.

Let $q=(q_1,\dots,q_k)$ be a vector of polynomials over $\mathbb{R}^n$. We endow $\mathbb{R}^n$ with the $N(0,1)^n$ distribution, and hence we may view $q$ as a $k$-dimensional random variable. We sometimes refer to the $q_i$’s as Gaussian polynomials.

For a real matrix $A$ we write $\|A\|_2$ to denote the operator norm $\|A\|_2=\max_{0\neq x\in\mathbb{R}^n}\|Ax\|_2/\|x\|_2$.

Given a degree-2 polynomial $q(x)=\sum_{1\le i\le j\le n}a_{ij}x_ix_j+\sum_{1\le i\le n}b_ix_i+c$, we define the (symmetric) matrix $A$ corresponding to its quadratic part as: $A_{ii}=a_{ii}$ and $A_{ij}=A_{ji}=a_{ij}/2$ for $i<j$. Note that with this definition we have that $x^\top Ax=\sum_{1\le i\le j\le n}a_{ij}x_ix_j$ for the vector $x=(x_1,\dots,x_n)$.

Throughout the paper we adopt the convention that the eigenvalues $\lambda_1(A),\dots,\lambda_n(A)$ of a real symmetric matrix $A$ satisfy $|\lambda_1(A)|\ge\cdots\ge|\lambda_n(A)|$. We sometimes write $\lambda_{\max}(A)$ to denote $\lambda_1(A)$, and we sometimes write $\lambda_i(q)$ to refer to the $i$-th eigenvalue of the matrix $A$ defined based on the polynomial $q$ as described above.
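A small sketch (ours) of the conventions just described: build the symmetric matrix $A$ from coefficients $a_{ij}$ with $i\le j$, confirm that $x^\top Ax$ reproduces the quadratic part, and order the eigenvalues by decreasing magnitude. The particular coefficients are arbitrary.

```python
import numpy as np

n = 3
# Quadratic part sum_{i<=j} a_ij x_i x_j, stored with i <= j (0-indexed).
a = {(0, 0): 2.0, (0, 2): -4.0, (1, 1): 1.0, (1, 2): 0.5}

# Symmetric matrix: A_ii = a_ii, and A_ij = A_ji = a_ij / 2 for i < j.
A = np.zeros((n, n))
for (i, j), coef in a.items():
    if i == j:
        A[i, i] = coef
    else:
        A[i, j] = A[j, i] = coef / 2

x = np.array([1.0, -2.0, 3.0])
quad = sum(coef * x[i] * x[j] for (i, j), coef in a.items())
assert np.isclose(x @ A @ x, quad)  # x^T A x matches the quadratic part

# Eigenvalues ordered so that |lam_1| >= |lam_2| >= ... >= |lam_n|.
lam = np.linalg.eigvalsh(A)
lam = lam[np.argsort(-np.abs(lam))]
print("magnitude-ordered eigenvalues:", np.round(lam, 3))
```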

##### Degree-2 polynomials and their heads and tails.

The following notation will be useful for us, especially in Section 4. Let $q(y_1,\dots,y_n)=\sum_{1\le i\le j\le n}a_{ij}y_iy_j+\sum_{1\le i\le n}b_iy_i+c$ be a degree-2 polynomial. For $0\le t\le n$ we say the $t$-head of $q$, denoted $\mathrm{Head}_t(q)$, is the polynomial

$$\mathrm{Head}_t(q)(y)\ \stackrel{\mathrm{def}}{=}\ \sum_{1\le i\le t,\ i\le j\le n}a_{ij}y_iy_j+\sum_{1\le i\le t}b_iy_i,$$

and the $t$-tail of $q$, denoted $\mathrm{Tail}_t(q)$, is the polynomial

$$\mathrm{Tail}_t(q)(y)\ \stackrel{\mathrm{def}}{=}\ \sum_{t<i\le j\le n}a_{ij}y_iy_j+\sum_{t<i\le n}b_iy_i+c,$$

so clearly we have $q=\mathrm{Head}_t(q)+\mathrm{Tail}_t(q)$. (Intuitively, $\mathrm{Tail}_t(q)$ is the part of $q$ which does not “touch” any of the first $t$ variables and $\mathrm{Head}_t(q)$ is the part which does “touch” those variables.)

###### Remark 3.

Note that if $\rho$ is a restriction fixing variables $y_1,\dots,y_t$, then the restricted polynomial $q|_\rho$ is of the form $\mathrm{Tail}_t(q)+\ell$, where $\ell$ is an affine form in $y_{t+1},\dots,y_n$.

We further define $\mathrm{QuadTail}_t(q)$, the “quadratic portion of the $t$-tail,” to be

$$\mathrm{QuadTail}_t(q)(y)\ \stackrel{\mathrm{def}}{=}\ \sum_{t<i\le j\le n}a_{ij}y_iy_j.$$

Setting aside heads and tails, it will sometimes be useful for us to consider the sum of squares of all the (non-constant) coefficients of a degree-2 polynomial. Towards that end we have the following definition:

###### Definition 4.

Given $q(x)=\sum_{1\le i\le j\le n}a_{ij}x_ix_j+\sum_{1\le i\le n}b_ix_i+c$, define $\mathrm{SS}(q)$ as $\mathrm{SS}(q)\ \stackrel{\mathrm{def}}{=}\ \sum_{i\le j}a_{ij}^2+\sum_ib_i^2$.

The following straightforward claim is established in [DDS13]:

###### Claim 5.

[Claim 20 of [DDS13]] Given a degree-2 polynomial $q$ as above, we have that $\mathrm{SS}(q)/2\le\mathrm{Var}(q)\le2\,\mathrm{SS}(q)$.
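For Gaussian inputs the variance of a degree-2 polynomial has a simple closed form: if $A$ is the symmetric matrix of the quadratic part and $b$ the vector of linear coefficients, then $\mathrm{Var}(q)=2\|A\|_F^2+\|b\|_2^2$ (a standard fact, not taken from this paper). The quick numerical check below, our own sketch with the coefficient conventions of Definition 4, confirms that this places $\mathrm{Var}(q)$ within a factor of 2 of $\mathrm{SS}(q)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# q(x) = sum_{i<=j} a_ij x_i x_j + sum_i b_i x_i  (constant term irrelevant).
a = np.triu(rng.standard_normal((n, n)))  # upper-triangular coefficients
b = rng.standard_normal(n)

# Sum of squared non-constant coefficients, as in Definition 4.
SS = np.sum(a**2) + np.sum(b**2)

# Symmetric matrix of the quadratic part: A_ii = a_ii, A_ij = a_ij/2 (i != j).
A = (a + a.T) / 2

# Closed form over Gaussian inputs: Var(q) = 2*||A||_F^2 + ||b||_2^2.
var = 2 * np.sum(A**2) + np.sum(b**2)

assert SS <= var <= 2 * SS  # Var and SS agree up to a factor of 2
print(f"SS = {SS:.3f}, Var = {var:.3f}, ratio = {var / SS:.3f}")
```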

##### Tail bounds and anti-concentration bounds on low-degree polynomials in Gaussian variables.

We will need the following standard concentration bound for low-degree polynomials over independent Gaussians.

###### Theorem 6 (“degree-d Chernoff bound”, [Jan97]).

Let $p:\mathbb{R}^n\to\mathbb{R}$ be a degree-$d$ polynomial. For any $t>0$, we have

$$\Pr_{x\sim N(0,1)^n}\Bigl[|p(x)-\mathbf{E}[p(x)]|>t\cdot\sqrt{\mathrm{Var}(p(x))}\Bigr]\le d\cdot e^{-\Omega(t^{2/d})}.$$

We will also use the following anti-concentration bound for degree-$d$ polynomials over Gaussians:

###### Theorem 7 ([CW01]).

Let $p:\mathbb{R}^n\to\mathbb{R}$ be a degree-$d$ polynomial that is not identically 0. Then for all $\theta\in\mathbb{R}$ and all $\epsilon>0$, we have

$$\Pr_{x\sim N(0,1)^n}\Bigl[|p(x)-\theta|<\epsilon\sqrt{\mathrm{Var}(p)}\Bigr]\le O(d\cdot\epsilon^{1/d}).$$

The model. Throughout this paper, our algorithms will repeatedly perform basic linear-algebraic operations, in particular SVD computation and Gram–Schmidt orthogonalization. In the bit-complexity model, it is well known that these linear-algebraic operations can be performed (by deterministic algorithms) up to additive error $\eta$ in time polynomial in the input size and $\log(1/\eta)$. For example, let $A\in\mathbb{R}^{n\times n}$ have $b$-bit rational entries. It is known (see [GL96] for details) that in time $\mathrm{poly}(n,b,\log(1/\eta))$ it is possible to compute a value $\tilde\sigma_1$ and vectors $u,v$ such that $|\tilde\sigma_1-\sigma_1|\le\eta$ and $\|Av-\tilde\sigma_1u\|_2\le\eta$, where $\sigma_1$ is the largest singular value of $A$. Likewise, given linearly independent vectors $v_1,\dots,v_k$ with $b$-bit rational entries, it is possible to compute vectors $\tilde u_1,\dots,\tilde u_k$ in time $\mathrm{poly}(n,k,b,\log(1/\eta))$ such that if $u_1,\dots,u_k$ is a Gram–Schmidt orthogonalization of $v_1,\dots,v_k$, then we have $\|u_i-\tilde u_i\|_2\le\eta$ for all $i$.

In this paper, we work in a unit-cost real-number model of computation. This allows us to assume that given a real matrix $A$ with $b$-bit rational entries, we can compute the SVD of $A$ exactly in time $\mathrm{poly}(n,b)$. Likewise, given vectors over $\mathbb{R}^n$, each of whose entries is a $b$-bit rational number, we can perform an exact Gram–Schmidt orthogonalization in time $\mathrm{poly}(n,k,b)$. Using high-accuracy approximations of the sort described above throughout our algorithms, it is straightforward to translate our unit-cost real-number algorithms into the bit-complexity setting, at the cost of some additional error in the resulting bound.
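Both primitives are available in standard numerical libraries; the snippet below (our own sanity check, not tied to the paper) computes a top singular value and a Gram–Schmidt orthogonalization (realized numerically via a QR decomposition) and verifies the expected identities.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))

# Largest singular value via the SVD; it equals the operator norm ||A||_2.
sigma = np.linalg.svd(A, compute_uv=False)
sigma1 = sigma[0]
assert np.isclose(sigma1, np.linalg.norm(A, 2))

# Gram-Schmidt orthogonalization of linearly independent columns,
# computed stably via QR: the columns of Q are orthonormal.
V = rng.standard_normal((5, 3))
Q, _ = np.linalg.qr(V)
assert np.allclose(Q.T @ Q, np.eye(3), atol=1e-10)
print("top singular value:", round(sigma1, 3))
```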

Using these two observations, it can be shown that by making sufficiently accurate approximations at each stage where a numerical computation is performed by our “idealized” algorithm, the cumulative error resulting from all of the approximations can be absorbed into the final error bound. Since inverse polynomial levels of error can be achieved in polynomial time for all of the approximate numerical computations that our algorithm performs, and since only poly many such approximation steps are performed by poly-time algorithms, the resulting approximate implementations of our algorithms in a bit-complexity model also achieve the guarantee of our main results, at the cost of a fixed overhead in the running time. For the sake of completeness, such a detailed numerical analysis was performed in our previous paper [DDS13]. Since working through the details of such an analysis is tedious and detracts from the clarity of the presentation, we content ourselves with this brief discussion in this work.

## 3 A multidimensional CLT for degree-2 Gaussian polynomials

In this section we prove a central limit theorem which plays a crucial role in the decomposition result that we establish in the following sections. Let $q=(q_1,\dots,q_k)$, where each $q_i$ is a degree-2 polynomial in Gaussian random variables $x_1,\dots,x_n$. Our CLT states that under suitable conditions on $q_1,\dots,q_k$ — all of them have only small-magnitude eigenvalues, no variance is too large and at least one variance is not too small — the distribution of $q$ is close (in $k$-dimensional Kolmogorov distance) to the distribution of the $k$-dimensional Gaussian random variable whose mean and covariance matrix match $q$.

###### Theorem 8.

Let $q=(q_1,\dots,q_k)$, where each $q_i$ is a degree-2 Gaussian polynomial that satisfies $\mathrm{Var}[q_i]\le1$ and $|\lambda_j(q_i)|\le\epsilon$ for all $j$. Suppose that $\mathrm{Var}[q_i]\ge\lambda$ for each $i\in[k]$. Let $C$ denote the covariance matrix of $q$ and let $N$ be a $k$-dimensional Gaussian random variable with covariance matrix $C$ and mean $\mu$, where $\mu_i=\mathbf{E}[q_i]$. Then

$$d_K(q,N)\le O\Bigl(\frac{k^{2/3}\epsilon^{1/6}}{\lambda^{1/6}}\Bigr).$$

Looking ahead to motivate this result for our ultimate purposes, Theorem 8 is useful for deterministic approximate counting because if $q=(q_1,\dots,q_k)$ satisfies the conditions of the theorem, then the theorem ensures that $\Pr[\forall\,i\ q_i\le\theta_i]$ is close to $\Pr[\forall\,i\ N_i\le\theta_i]$ for every threshold vector $\theta$. Note that the latter quantity can be efficiently estimated by a deterministic algorithm.
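To see why probabilities of the form $\Pr[\forall\,i\ N_i\le\theta_i]$ are deterministically estimable, here is a crude grid discretization (our sketch, not the paper's actual procedure, and restricted to a 2-dimensional Gaussian with identity covariance): a midpoint Riemann sum of the density over a truncated box. For independent coordinates the joint probability factors into a product of one-dimensional CDFs.

```python
import numpy as np

def orthant_prob_2d(theta, lo=-8.0, m=400):
    """Deterministic estimate of Pr[N1 <= theta1, N2 <= theta2] for
    N ~ N(0, I_2), via a midpoint Riemann sum of the standard normal
    density (truncated at `lo`, where the mass is negligible)."""
    def cdf_1d(t):
        xs, w = np.linspace(lo, t, m, retstep=True)
        xs = xs[:-1] + w / 2  # midpoints of the m-1 subintervals
        dens = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)
        return np.sum(dens) * w
    # Identity covariance: coordinates independent, so the CDFs multiply.
    return cdf_1d(theta[0]) * cdf_1d(theta[1])

p = orthant_prob_2d((0.0, 0.0))
print(round(p, 4))  # close to 0.25 by symmetry
```

A general covariance matrix would require a change of variables (or a genuinely multidimensional grid); the point is only that no randomness is needed.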

A key ingredient in the proof of Theorem 8 is a CLT due to Nourdin and Peccati [NP09] which gives a bound that involves the Malliavin derivatives of the functions $q_1,\dots,q_k$. In Section 3.1 we give the necessary background from Malliavin calculus and build on the [NP09] result to prove a result which is similar to Theorem 8 but gives a bound on $|\mathbf{E}[h(q)]-\mathbf{E}[h(N)]|$ rather than on $d_K(q,N)$ for a broad class of “test functions” $h$ (see Theorem 12 below). In Section 3.2 we show how Theorem 12 can be combined with standard “mollification” techniques to yield Theorem 8.

### 3.1 Malliavin calculus and test functions with bounded second derivative.

We need some notation and conceptual background before we can state the Nourdin-Peccati multi-dimensional CLT from [NP09]. Their CLT is proved using Stein’s method; while there is a rich theory underlying their result we give only the absolute basics that suffice for our purposes. (See e.g. [NP09, Nou12] for detailed treatments of Malliavin calculus and its interaction with Stein’s Method.)

We will use $L^2=L^2(\mathbb{R}^n,N(0,1)^n)$ to denote the space of square-integrable functions over $\mathbb{R}^n$ endowed with the standard normal measure, and $\mathcal{P}$ to denote the family of all polynomials over $\mathbb{R}^n$. For integer $q\ge0$ we let $\mathcal{H}_q$ denote the “$q$-th Wiener chaos” of $L^2$, namely the span of all homogeneous degree-$q$ Hermite polynomials over $x_1,\dots,x_n$. We define the operator $I_q:\mathcal{P}\to\mathcal{H}_q$ as follows: $I_q$ maps $p\in\mathcal{P}$ to the degree-$q$ part of its Hermite expansion, so if $p$ has degree $d$ then $p=\sum_{q=0}^dI_q(p)$.

We next define the generator of the Ornstein–Uhlenbeck semigroup. This is the operator $L$, which is defined on polynomials via

$$Lp=\sum_{q=0}^{\infty}-q\cdot I_q(p).$$

It is easy to see that for mean-zero $p$ (i.e., $I_0(p)=0$) we have the inverse operator

$$L^{-1}p=\sum_{q=1}^{\infty}-\frac{1}{q}\,I_q(p).$$

Next we introduce the notion of the Malliavin derivative. The Malliavin derivative operator $D$ maps a real-valued random variable (defined over $\mathbb{R}^n$ by a differentiable real-valued function $f$) to an $n$-dimensional vector of random variables in the following way: for differentiable $f:\mathbb{R}^n\to\mathbb{R}$,

$$Df=\Bigl(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\Bigr).$$

The following key identity provides the fundamental connection between Malliavin Calculus and Stein’s method, which is used to prove Theorem 11 below:

###### Claim 9 (see e.g. Equation (2.22) of [NP09]).

Let $h:\mathbb{R}\to\mathbb{R}$ be a continuous function with a bounded first derivative. Let $p$ and $q$ be polynomials over $\mathbb{R}^n$ with $\mathbf{E}[q]=0$. Then $\mathbf{E}[q\,h(p)]=\mathbf{E}[h'(p)\cdot\langle Dp,-DL^{-1}q\rangle]$.

Specializing to the case $h(x)=x$, we have:

###### Corollary 10.

Let $p$ and $q$ be finite-degree polynomials over $\mathbb{R}^n$ with $\mathbf{E}[q]=0$. Then $\mathbf{E}[q\,p]=\mathbf{E}[\langle Dp,-DL^{-1}q\rangle]$.

We now recall the following CLT due to Nourdin and Peccati:

###### Theorem 11.

[[NP09], see also [Nou12], Theorem 6.1] Let $p=(p_1,\dots,p_k)$, where each $p_i$ is a Gaussian polynomial with $\mathbf{E}[p_i]=0$. Let $C$ be a symmetric PSD matrix in $\mathbb{R}^{k\times k}$ and let $N$ be a mean-0 $k$-dimensional Gaussian random variable with covariance matrix $C$. Then for any $h:\mathbb{R}^k\to\mathbb{R}$ such that $\|h''\|_\infty<\infty$, we have

$$\bigl|\mathbf{E}[h(p)]-\mathbf{E}[h(N)]\bigr|<\frac12\|h''\|_\infty\cdot\Bigl(\sum_{i=1}^k\sum_{j=1}^k\mathbf{E}\bigl[|C(i,j)-Y(i,j)|\bigr]\Bigr)$$

where $Y(i,j)\ \stackrel{\mathrm{def}}{=}\ \langle Dp_i,-DL^{-1}p_j\rangle$.

We now use Theorem 11 to prove our main result of this subsection, which is the following CLT for multidimensional degree-2 Gaussian polynomials with small-magnitude eigenvalues. Our CLT says that such multidimensional random variables must in fact be close to multidimensional Gaussian distributions, where “closeness” here is measured using test functions with bounded second derivative. (In the next subsection we extend this result using mollification techniques to obtain Theorem 8, which uses multidimensional Kolmogorov distance.)

###### Theorem 12.

Let $q=(q_1,\dots,q_k)$, where each $q_i$ is a degree-2 mean-0 Gaussian polynomial with $\mathrm{Var}[q_i]\le1$ and $|\lambda_{\max}(q_i)|\le\epsilon$. Let $C$ denote the covariance matrix of $q$, so $C(i,j)=\mathrm{Cov}(q_i,q_j)$. Let $N$ be a mean-zero $k$-dimensional Gaussian random variable with covariance matrix $C$. Then for any $h:\mathbb{R}^k\to\mathbb{R}$ such that $\|h''\|_\infty<\infty$, we have

$$\bigl|\mathbf{E}[h(q)]-\mathbf{E}[h(N)]\bigr|\le O(k^2\epsilon)\cdot\|h''\|_\infty.$$
###### Proof.

As in Theorem 11, we write $Y(a,b)$ to denote $\langle Dq_a,-DL^{-1}q_b\rangle$. For any $a,b\in[k]$, we have

$$C(a,b)=\mathrm{Cov}(q_a,q_b)=\mathbf{E}[q_aq_b]=\mathbf{E}[Y(a,b)],\tag{5}$$

where the second equality is because $q_a$ and $q_b$ have mean 0 and the third equality is by Corollary 10. Since $C$ is a covariance matrix and every covariance matrix is PSD, we may apply Theorem 11, and we get that

$$\bigl|\mathbf{E}[h(q)]-\mathbf{E}[h(N)]\bigr|<\frac12\|h''\|_\infty\cdot\sum_{a=1}^k\sum_{b=1}^k\mathbf{E}\bigl[\bigl|\mathbf{E}[Y(a,b)]-Y(a,b)\bigr|\bigr],$$

where we used (5) for the equality. By Jensen’s inequality we have $\mathbf{E}[|\mathbf{E}[Y(a,b)]-Y(a,b)|]\le\sqrt{\mathrm{Var}[Y(a,b)]}$. Lemma 13 below gives us that $\mathrm{Var}[Y(a,b)]\le O(\epsilon^2)$, and the theorem is proved. ∎

It remains to establish the following lemma:

###### Lemma 13.

For each $a,b\in[k]$, we have that $\mathrm{Var}[Y(a,b)]\le O(\epsilon^2)$.

###### Proof.

Fix any $a,b\in[k]$, so $q_a$ and $q_b$ are degree-2 Gaussian polynomials with mean 0. Recalling the spherical symmetry of the $N(0,1)^n$ distribution, by a suitable choice of basis that diagonalizes the quadratic part of $q_a$ we may write

$$q_a(x)=\sum_{i=1}^n\lambda_ix_i^2+\sum_{i=1}^n\beta_ix_i+\gamma\qquad\text{and}\qquad q_b(x)=\sum_{i,j=1}^n\delta_{ij}x_ix_j+\sum_{i=1}^n\kappa_ix_i+\rho,$$

where we take $\delta_{ij}=\delta_{ji}$ for all $i,j\in[n]$.

Recalling that $Y(a,b)=\langle Dq_a,-DL^{-1}q_b\rangle$, we start by observing that $Dq_a=(2\lambda_\ell x_\ell+\beta_\ell)_{\ell=1,\dots,n}$. For $q_b$, we have that $I_1(q_b)=\sum_{i=1}^n\kappa_ix_i$. Recalling that the first two normalized Hermite polynomials are $h_1(x)=x$ and $h_2(x)=(x^2-1)/\sqrt2$, it is straightforward to verify that $I_2(q_b)$ (the homogeneous degree-2 part of the Hermite expansion of $q_b$) is

$$I_2(q_b)=\sum_{1\le i\neq j\le n}\delta_{ij}h_1(x_i)h_1(x_j)+\sum_{i=1}^n\sqrt2\,\delta_{ii}h_2(x_i).$$

Hence

$$L^{-1}q_b=-\sum_{i=1}^n\kappa_ix_i-\frac12\sum_{1\le i\neq j\le n}\delta_{ij}x_ix_j-\frac12\sum_{i=1}^n\delta_{ii}(x_i^2-1),$$

so

$$-DL^{-1}q_b=\Bigl(\kappa_\ell+\sum_{i=1}^n\delta_{i\ell}x_i\Bigr)_{\ell=1,\dots,n}.$$

We thus can write $Y(a,b)=\langle Dq_a,-DL^{-1}q_b\rangle$ as a degree-2 polynomial in the variables $x_1,\dots,x_n$ as

$$Y(a,b)=\sum_{\ell=1}^n(2\lambda_\ell x_\ell+\beta_\ell)\cdot\Bigl(\kappa_\ell+\sum_{i=1}^n\delta_{i\ell}x_i\Bigr)=\sum_{i=1}^n\sum_{\ell=1}^n2\lambda_\ell\delta_{i\ell}x_ix_\ell+\sum_{\ell=1}^n2\kappa_\ell\lambda_\ell x_\ell+\sum_{i=1}^n\Bigl(\sum_{\ell=1}^n\beta_\ell\delta_{i\ell}\Bigr)x_i+\sum_{\ell=1}^n\kappa_\ell\beta_\ell.$$

By Claim 5, we know that $\mathrm{Var}[Y(a,b)]\le2\,\mathrm{SS}(Y(a,b))$. Using the inequality $(u+v)^2\le2u^2+2v^2$ for the degree-1 coefficients, to prove the lemma it suffices to show that

$$\sum_{i=1}^n\sum_{\ell=1}^n(\lambda_\ell\delta_{i\ell})^2+\sum_{\ell=1}^n(\kappa_\ell\lambda_\ell)^2+\sum_{i=1}^n\Bigl(\sum_{\ell=1}^n\beta_\ell\delta_{i\ell}\Bigr)^2\le O(\epsilon^2).\tag{6}$$

We bound each term of (6) in turn. For the first, we recall that each $\lambda_\ell$ is an eigenvalue of (the quadratic part of) $q_a$ and hence satisfies $|\lambda_\ell|\le\epsilon$; hence we have

$$\sum_{i=1}^n\sum_{\ell=1}^n(\lambda_\ell\delta_{i\ell})^2\le\epsilon^2\sum_{i=1}^n\sum_{\ell=1}^n(\delta_{i\ell})^2\le\epsilon^2,$$

where we have used Claim 5 again to get that $\sum_{i,\ell}(\delta_{i\ell})^2\le1$. For the second term, we have

$$\sum_{\ell=1}^n(\kappa_\ell\lambda_\ell)^2\le\epsilon^2\cdot\sum_{\ell=1}^n\kappa_\ell^2\le\epsilon^2\cdot\mathrm{SS}(q_b)\le\epsilon^2.$$

Finally, for the third term, let us write $M$ for the matrix corresponding to the quadratic part of $q_b$ and $\bar\beta$ for the column vector whose $\ell$-th entry is $\beta_\ell$. Then we have that

$$\sum_{i=1}^n\Bigl(\sum_{\ell=1}^n\beta_\ell\delta_{i\ell}\Bigr)^2=\|M\bar\beta\|_2^2\le\bigl(\lambda_{\max}(M)\bigr)^2\|\bar\beta\|_2^2\le\epsilon^2\|\bar\beta\|_2^2\le\epsilon^2,$$

where the second inequality is because each eigenvalue of $M$ has magnitude at most $\epsilon$, and the third is because $\|\bar\beta\|_2^2\le\mathrm{SS}(q_a)\le1$. This concludes the proof of Lemma 13. ∎

### 3.2 From test functions with bounded second derivative to multidimensional Kolmogorov distance.

In this subsection we show how “mollification” arguments can be used to extend Theorem 12 to Theorem 8. The main idea is to approximate the (discontinuous) indicator function of an appropriate region by an appropriately “mollified” function (continuous with bounded second derivative) so that the corresponding expectations are approximately preserved. There are several different mollification constructions in the literature that could potentially be used for this purpose; we use the following theorem from [DKN10].

###### Theorem 14.

[[DKN10], Theorem 4.8 and Theorem 4.10] Let $I:\mathbb{R}^k\to\{0,1\}$ be the indicator of a region $R\subseteq\mathbb{R}^k$ and let $c>0$ be arbitrary. Then there exists a function $\tilde I_c:\mathbb{R}^k\to[0,1]$ satisfying:

• $\|\partial^\beta\tilde I_c\|_\infty\le O(c)^{|\beta|}$ for any multi-index $\beta$, and

• $|\tilde I_c(x)-I(x)|\le\min\bigl\{1,O\bigl((c\cdot d(x))^{-2}\bigr)\bigr\}$ for all $x\in\mathbb{R}^k$,

where $d(x)$ is the Euclidean distance of the point $x$ to the closest point in the boundary of $R$.

We use this to prove the following lemma, which says that if a $k$-dimensional Gaussian $Y$ “mimics” the joint distribution $X$ of a vector of degree-2 Gaussian polynomials (in the sense of “fooling” all test functions with bounded second derivative), then $Y$ must have small $k$-dimensional Kolmogorov distance from $X$:

###### Lemma 15.

Let $q_1,\dots,q_k:\mathbb{R}^n\to\mathbb{R}$ be degree-2 polynomials with $\mathrm{Var}(q_i)\ge\lambda$ for all $i$, and let $X=(q_1(x),\dots,q_k(x))$ be their joint distribution when $x$ is drawn from $N(0,1)^n$. Let $Y$ be a jointly normal distribution whose mean and covariance matrix match those of $X$. Suppose that for all functions $h:\mathbb{R}^k\to\mathbb{R}$ with $\|h''\|_\infty<\infty$, it holds that $|\mathbf{E}[h(X)]-\mathbf{E}[h(Y)]|\le\eta\cdot\|h''\|_\infty$. Then we have

$$d_K(X,Y)\le O\Bigl(\frac{k^{1/3}\eta^{1/6}}{\lambda^{1/6}}\Bigr).$$
###### Proof.

Fix any $\theta=(\theta_1,\dots,\theta_k)\in\mathbb{R}^k$ and define the function $I:\mathbb{R}^k\to\{0,1\}$ to be the indicator of the region $\{x\in\mathbb{R}^k:x_i\le\theta_i\ \text{for all}\ i\in[k]\}$. Let $c>0$ be a parameter to be fixed later, and let $\tilde I_c$ be the mollification of $I$ given by Theorem 14. We have

$$\bigl|\Pr[\forall\,i\ X_i\le\theta_i]-\Pr[\forall\,i\ Y_i\le\theta_i]\bigr|=\bigl|\mathbf{E}[I(X)]-\mathbf{E}[I(Y)]\bigr|\le\bigl|\mathbf{E}[\tilde I_c(X)]-\mathbf{E}[\tilde I_c(Y)]\bigr|+\bigl|\mathbf{E}[\tilde I_c(X)]-\mathbf{E}[I(X)]\bigr|+\bigl|\mathbf{E}[\tilde I_c(Y)]-\mathbf{E}[I(Y)]\bigr|.$$