# Gaussian width bounds with applications to arithmetic progressions in random settings

Jop Briët CWI, Science Park 123, 1098 XG Amsterdam, The Netherlands  and  Sivakanth Gopi Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
###### Abstract.

Motivated by two problems on arithmetic progressions (APs)—concerning large deviations for AP counts in random sets and random differences in Szemerédi’s theorem—we prove upper bounds on the Gaussian width of the image of the -dimensional Boolean hypercube under a mapping , where each coordinate is a constant-degree multilinear polynomial with  coefficients. We show the following applications of our bounds. Let be the random subset of  containing each element independently with probability .

• Let be the number of -term APs in . We show that a precise estimate on the large deviation rate due to Bhattacharya, Ganguly, Shao and Zhao is valid if for , which slightly improves their bound of for (and matching their  and ).

• A set is -intersective if every dense subset of  contains a non-trivial -term AP with common difference in . We show that  is -intersective with probability  provided for , improving the bound due to Frantzikinakis, Lesigne and Wierdl for  and reproving more directly the same result shown recently by the authors and Dvir.

In addition, we discuss some intriguing connections with special kinds of error correcting codes (locally decodable codes) and the Banach-space notion of type for injective tensor products of -spaces.

J.B. was supported by a VENI grant and the Gravitation-grant NETWORKS-024.002.003 from the Netherlands Organisation for Scientific Research (NWO)
S.G. was supported by NSF CAREER award 1451191 and NSF grant CCF-1523816

## 1. Introduction

The Gaussian width of a point set measures the average maximum correlation between a standard Gaussian vector  and the points in ,

$$\mathrm{GW}(T) = \mathbb{E}\Big[\sup_{t\in T}\langle t, g\rangle\Big].$$
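To make the definition concrete, the following is a minimal Monte Carlo sketch of the Gaussian width of a finite point set. The function name `gaussian_width`, the trial count and the seed are illustrative choices and do not come from the paper; for a finite set the supremum is a maximum.

```python
import random

def gaussian_width(points, trials=20000, seed=0):
    """Monte Carlo estimate of E[max_{t in T} <t, g>] for a finite set T."""
    rng = random.Random(seed)
    dim = len(points[0])
    total = 0.0
    for _ in range(trials):
        # Draw a standard Gaussian vector g and record the best correlation.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += max(sum(ti * gi for ti, gi in zip(t, g)) for t in points)
    return total / trials

# T = {+1, -1} in dimension one: the width is E|g| = sqrt(2/pi).
segment = [(1.0,), (-1.0,)]
estimate = gaussian_width(segment)

# A single point has width E<t, g> = 0.
single = gaussian_width([(1.0, 2.0)])
```

The symmetric two-point set illustrates the remark on terminology: its Gaussian width is the expected half-width of the set in the random direction $g$.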

The terminology reflects the fact that if  is symmetric around the origin, then its Gaussian width is closely related to its average width in a random direction. Motivated by two applications to arithmetic progressions in random settings discussed below, we bound the Gaussian width of certain sets given by the image of the -dimensional Boolean hypercube under a polynomial mapping , where each coordinate is a constant-degree multilinear polynomial given by a hypergraph. An -hypergraph  consists of a vertex set  and a multiset , also denoted , of subsets of  of size at most , called the edges. A hypergraph is -uniform if each edge has size exactly . The degree of a vertex is the number of edges containing it and the degree of , denoted , is the maximum degree among its vertices. Associate with a hypergraph , the multilinear polynomial given by

$$p_H(x_1,\dots,x_n) = \frac{1}{n}\sum_{e\in E}\prod_{i\in e} x_i. \tag{1}$$

Note that for a subset , the value counts the number of edges of  which lie completely inside . Associate with a collection of -vertex hypergraphs the polynomial mapping given by
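As a small illustration of the edge-counting interpretation, the sketch below evaluates the polynomial in (1) on an indicator vector and compares it with a direct count of the edges inside the subset. It assumes the normalization $1/n$ as displayed in (1), so the count is recovered as $n\cdot p_H$; the helper names and the toy hypergraph are hypothetical.

```python
from fractions import Fraction

def p_H(edges, x):
    """Evaluate (1): (1/n) * sum over edges e of prod_{i in e} x_i."""
    n = len(x)
    total = 0
    for e in edges:
        prod = 1
        for i in e:
            prod *= x[i]
        total += prod
    return Fraction(total, n)

def edges_inside(edges, S):
    """Directly count the edges that lie completely inside S."""
    return sum(1 for e in edges if set(e) <= set(S))

# Toy hypergraph on vertices 0..5 with edges of size at most 3.
n = 6
toy_edges = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (0, 5)]
S = {0, 1, 2, 3}
indicator = [1 if v in S else 0 for v in range(n)]

count_from_polynomial = n * p_H(toy_edges, indicator)
count_direct = edges_inside(toy_edges, S)
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.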

$$\psi_{H_1,\dots,H_k}(x) = \begin{pmatrix} p_{H_1}(x)\\ \vdots\\ p_{H_k}(x) \end{pmatrix}. \tag{2}$$

Our main result is then as follows.

###### Theorem 1.1.

Let  be positive integers and let be -hypergraphs with vertex set . Then,

In the following two subsections we discuss two applications of this result.

### 1.1. Large deviations for arithmetic progressions

Let be a hypergraph over a finite vertex set  of cardinality  and for denote by  the random binomial subset where each element of  appears independently of all others with probability . Let  be the number of edges in  that are induced by . Important instances of the random variable  include the count of triangles in an Erdős–Rényi random graph and the count of arithmetic progressions of a given length in the random set .

The study of the asymptotic behavior of  when  is allowed to depend on  and  grows to infinity motivates a large body of research in probabilistic combinatorics. Of particular interest is the problem of determining the probability that significantly exceeds its expectation for , referred to as the upper tail. Despite the fact that standard probabilistic methods fail to give satisfactory bounds on the upper tail in general, advances were made recently for special instances, in particular for triangle counts [LZ17] and general subgraph counts [BGLZ17]. For more general hypergraphs, progress was made by Chatterjee and Dembo [CD16] using a novel nonlinear large deviation principle (LDP), which was improved by Eldan [Eld16] shortly after. The LDPs give estimates on the upper tail in terms of a parameter  whose value is determined by the solution to a certain variational problem, for a range of values of  depending on . This splits the problem of estimating the upper tail into two sub-problems: (1) determining for what range of  the estimate in terms of  holds true and (2) solving the variational problem to determine the value of . The answer to problem (1) turns out to depend on the Gaussian width of a point set related to .

This approach was pursued in [CD16] in the context of 3-term arithmetic progressions, for which problem (1) was solved. The case of longer APs was treated by Bhattacharya et al. [BGSZ16], who solved the variational problem (2) and gave bounds for the relevant Gaussian width. Based on this, they showed that for every , fixed and  tending to zero sufficiently slowly as , the upper tail probability for the count  of -term arithmetic progressions in  is given by

$$\Pr[X_k \ge (1+\delta)\mathbb{E}X_k] = p^{(1+o(1))\sqrt{\delta}\,p^{k/2}N}. \tag{3}$$

The rate at which  is allowed to decay for (3) to hold depends on the Gaussian width of the image of  under the gradient , where  is the hypergraph over  whose edges are formed by -term arithmetic progressions. The bounds on the Gaussian width of this set proved in [BGSZ16] imply that (3) holds provided for

$$c_3 = \frac{1}{18},\qquad c_4 = \frac{1}{48}\qquad\text{and}\qquad c_k = \frac{1}{6k(k-1)}\quad\text{for } k\ge 5,$$

and absolute constants  depending only on . However, the authors conjecture that a probability  slightly larger than  suffices. Evidence for this conjecture is given by a result of Warnke [War16] showing that for all , the logarithm of the upper tail is given by , where the asymptotic notation hides constants depending only on . Notice that (3) improves on this by (almost) determining those constants. The main motivation for finding such precise estimates of the upper tail probability is not so much the problem itself as it is to understand the structure of the set  conditioned on  being much larger than its expectation (see [BGSZ16]). With regard to the constants , Theorem 1.1 implies that for all it suffices to set

$$c_k = \frac{1}{6k\left\lceil\frac{k-1}{2}\right\rceil},$$

which slightly improves on the range of  for which (3) was known to hold for .

### 1.2. Random differences in Szemerédi’s Theorem

In 1975 Szemerédi [Sze75] proved that any subset of the integers of positive upper density contains arbitrarily long arithmetic progressions, answering a famous open question of Erdős and Turán. It is well known that this is equivalent to the assertion that for every positive integer  and any , there exists an such that if and is a set of size , then  must contain a non-trivial -term arithmetic progression. Certain refinements of Szemerédi’s theorem concern sets such that the theorem still holds true when the arithmetic progressions are required to have common difference from . Such sets are usually referred to as intersective sets in number theory, or recurrent sets in ergodic theory. More precisely, a set is -intersective (or -recurrent) if any set of positive upper density has an -term arithmetic progression with common difference in . Szemerédi’s theorem then states that is -intersective for every , but much smaller intersective sets exist. For example, for any , the set is -intersective for every , which is a special case of more general results of Sárközy [Sár78a] when and of Bergelson and Leibman [BL96] for all . The shifted primes and are also -intersective for every , shown by Sárközy [Sár78b] when  and in a more general setting by Wooley and Ziegler [WZ12] for all .

It is natural to ask at what density random sets become -intersective. To simplify the discussion, we will look at the analogous question in .

###### Definition 1.2.

Let  be a positive integer and . A subset is -intersective if any subset of size  must contain a non-trivial -term arithmetic progression with common difference in .

It was proved independently by Frantzikinakis et al. [FLW12] and Christ [Chr11] that for and , the random set  is -intersective with probability , provided . This was improved for all in [FLW16], where it was shown that the same result holds with , though it was conjectured there that suffices for all . Based on Theorem 1.1 we obtain the following result, which improves on the latter bounds.

###### Theorem 1.3.

For every there exists an such that the following holds. Let be an integer and let

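$$\beta_\ell = \frac{1}{\left\lceil\frac{\ell+1}{2}\right\rceil}\qquad\text{and}\qquad p \ge \omega\big(N^{-\beta_\ell}\log N\big).$$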

Then, with probability , the set is -intersective.

### 1.3. Locally decodable codes

There is a close connection between the Gaussian widths considered in Theorem 1.1 and special error-correcting codes known as locally decodable codes (LDCs). A map is a -query LDC if for every and , the value  can be retrieved by reading at most  coordinates of the codeword , even if the codeword is corrupted in a not too large (but possibly constant) fraction of coordinates. A main open problem is to determine the optimal trade-off between  and  when  is a fixed constant. Currently this problem is settled only in the cases  [KT00, KW04, GKST06] and remains wide open for the case . We refer to the extensive survey [Yek12] for more information on this problem. The connection with Gaussian width was established by the authors and Dvir in [BDG17], where we showed that there is a -query LDC from to if and only if there are -matchings on  of size  such that the set has Gaussian width . It was observed there that the best-known lower bounds on the length  of -query LDCs (proved using techniques from quantum information theory [KW04]) imply a slightly different but equivalent version of Theorem 1.3 (see Section 5). The proof of Theorem 1.1 is based on ideas from [KW04], but uses a 1974 random matrix inequality of Tomczak-Jaegermann instead of quantum information theory. (Not surprisingly, the LDC lower bounds of [KW04] are also implied by Theorem 1.1.)

### 1.4. Gaussian width bounds from type constants

We observe that the Gaussian width in Theorem 1.1 can be bounded in terms of type constants of certain Banach spaces. Unfortunately, we do not have good enough bounds on the type constants of the required spaces to improve Theorem 1.1. But we hope that this connection will motivate progress on understanding these spaces.

A Banach space  is said to have (Rademacher) type  if there exists a constant such that for every  and ,

$$\mathbb{E}_\varepsilon\Big\|\sum_{i=1}^k \varepsilon_i x_i\Big\|_X^p \le T^p\sum_{i=1}^k \|x_i\|_X^p, \tag{4}$$

where the expectation is over a uniformly random . The smallest  for which (4) holds is referred to as the type- constant of , denoted . Type, and its dual notion cotype, play an important role in Banach space theory as they are tightly linked to local geometric properties (we refer to [LT79] and [Mau03] for extensive surveys). Some fundamental facts are as follows. It follows from the triangle inequality that every Banach space has type 1 and from the Khintchine inequality that no Banach space has type . The parallelogram law implies that Hilbert spaces have type 2. An easy but important fact is that fails to have type . Indeed, a famous result of Maurey and Pisier [MP73] asserts that a Banach space fails to have type if and only if it contains  uniformly. Finite-dimensional Banach spaces have type- for all . But of importance to Theorem 1.1 are the actual type constants  of a certain family of finite-dimensional Banach spaces. Let be such that and let be the space of -linear forms on  ( times) endowed with the norm

This space is also known as the injective tensor product of for and as such plays an important role in the theory of tensor products of Banach spaces [Rya02]. The relevance of the type constants of this space to Theorem 1.1 is captured by the following lemma, proved in Section 6.

###### Lemma 1.4.

Let  be positive integers and let be -hypergraphs with vertex set . Then for any such that and any ,

Observe that the space may be identified with the space of matrices endowed with the spectral norm (or operator norm). A key ingredient in the proof of Theorem 1.1, Theorem 2.2 below, easily implies that the type-2 constant of this space is of order . A well-known lower bound of the same order follows for instance from the connection between Gaussian width and LDCs and a basic construction of a 2-query LDC known as the Hadamard code. More generally, lower bounds on the type constants of  are implied by -query LDCs [BNR12, Bri16].

### Acknowledgements

We thank Sean Prendiville, Fernando Xuancheng Shao and Yufei Zhao for helpful discussions. This work was in part carried out while the authors were visiting the Simons Institute during the Pseudorandomness program of Spring 2017 and we thank the institute and organizers for their hospitality.

## 2. Proof of Theorem 1.1

In this section we prove Theorem 1.1. We begin by giving a high-level overview of the ideas. The proof is based on a classic random matrix inequality of Tomczak-Jaegermann which bounds the expected operator norm of a sum of matrices weighted by independent standard normal random variables (Theorem 2.2 below). On its own, this inequality easily implies the result for graphs. To treat the general case, we first reduce to the case of -uniform matchings for , at a cost of a factor  in the number of vertices; a matching is a hypergraph where no two edges intersect. Then we reduce to the case of graphs (unless ) and apply the random matrix inequality. This involves constructing graphs  on approximately vertices, with the property that for each it holds that for some constant  depending only on  (Lemma 2.3). (Switching from Boolean vectors to sign vectors can only make the Gaussian width larger.) To illustrate how these graphs are constructed, we consider a 4-matching  on  vertices and let . It follows from the Birthday Paradox and symmetry that the number of strings in  containing at least two elements of a given edge  is . We let  be the graph with vertex set  with the edges formed by the strings that “cover” some edge in  and “complement” each other, meaning: there are indices such that and for all . The -fold tensor product of a vector is given by . If is an edge in  and , it then follows that . It can then be observed that , modulo the relations , is a linear combination of the monomials appearing in . A more careful analysis shows that the two evaluations are in fact related by a constant factor.

To make the above precise, we first collect some basic facts about hypergraphs. The edge chromatic number of a hypergraph , denoted by , is the minimum number of colors needed to color the edges of such that no two edges which intersect have the same color. Note that  equals the smallest number of matchings into which  can be partitioned. For small values of , the parameters and are closely related.

###### Lemma 2.1.

Let be an -hypergraph. Then,

$$\Delta(H) \le \chi_E(H) \le s(\Delta(H)-1)+1.$$

###### Proof.

Clearly since edges containing a maximum degree vertex should get different colors. To prove the upper bound, form a graph whose vertices are , and add edges between intersecting hypergraph edges. Then  is equal to the vertex chromatic number of the graph , which, by Brooks’ Theorem, is at most . Since an edge in  can intersect at most other edges, . ∎
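The two bounds of Lemma 2.1 can be checked on a small example. The sketch below partitions the edges into matchings with a greedy coloring of the intersection graph; this is a simpler stand-in for the Brooks' Theorem argument used in the proof, and it still uses at most $s(\Delta(H)-1)+1$ colors because each edge meets at most $s(\Delta(H)-1)$ others. The function names and the example hypergraph are hypothetical.

```python
def greedy_edge_coloring(edges):
    """Give each edge the smallest color not used by an earlier intersecting edge."""
    colors = []
    for idx, e in enumerate(edges):
        used = {colors[j] for j in range(idx) if set(edges[j]) & set(e)}
        c = 0
        while c in used:
            c += 1
        colors.append(c)
    return colors

def max_degree(edges):
    """Maximum number of edges containing a single vertex."""
    deg = {}
    for e in edges:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return max(deg.values())

# Example hypergraph with edges of size at most s = 3.
example_edges = [(0, 1, 2), (2, 3, 4), (4, 5, 0), (1, 3, 5), (0, 3), (2, 5)]
colors = greedy_edge_coloring(example_edges)
num_colors = max(colors) + 1
s = max(len(e) for e in example_edges)
delta = max_degree(example_edges)

# Each color class is a matching: same-colored edges are pairwise disjoint.
classes_are_matchings = all(
    not (set(e) & set(f))
    for i, e in enumerate(example_edges)
    for j, f in enumerate(example_edges)
    if i < j and colors[i] == colors[j]
)
```

The greedy coloring need not be optimal, so `num_colors` only upper bounds the edge chromatic number; it nevertheless lands between the two bounds of the lemma.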

The proof of Theorem 1.1 uses a non-commutative Khintchine inequality, which is a special case of a result of Tomczak-Jaegermann [TJ74, Theorem 3.1]. Let  be the standard inner product on  and denote by the Euclidean unit ball in . Given a matrix , its operator norm (or spectral norm) is given by .

###### Theorem 2.2 (Tomczak-Jaegermann).

There exists an absolute constant  such that the following holds. Let  be a collection of matrices and let be independent Gaussian random variables with mean zero and variance 1. Then,

$$\mathbb{E}\Big[\Big\|\sum_{i=1}^k g_i A_i\Big\|\Big] \le C\sqrt{\log N}\,\Big(\sum_{i=1}^k \|A_i\|^2\Big)^{1/2}.$$

To apply Theorem 2.2 we use the following matrix lemma, proved in the next section.

###### Lemma 2.3.

For every  there exist a and such that the following holds. Let , and . Let be a -uniform hypergraph and let be the polynomial as in (1). Then, there exists a matrix such that and for every ,

$$p_H(x) = \frac{1}{c_r N}\langle x^{\otimes m}, A x^{\otimes m}\rangle.$$

Moreover,  is the adjacency matrix of a graph (with possible parallel edges).

###### Proof of Theorem 1.1.

Assume as in Lemma 2.3. Let  and . We start by reducing to the setting where  are -uniform of degree at most . Given an -hypergraph  over  with degree at most , we first add new vertices to edges with fewer than  vertices to make all the edges have size . By grouping the edges into  sets of size at most  and using the same new vertices for all edges in the same group, we can obtain a new hypergraph  on vertices which is -uniform and whose degree is still at most . In terms of the polynomials, we are homogenizing the polynomial with new variables to get a new multilinear polynomial of degree where are the new variables added. Clearly, when the variables are set to 1. So,

$$\mathrm{GW}\big(\psi_{H_1,\dots,H_k}(\{0,1\}^n)\big) \lesssim_r \mathrm{GW}\big(\psi_{H'_1,\dots,H'_k}(\{0,1\}^{n'})\big).$$

Since our claimed bound on the Gaussian width is  , the extra vertices will result in at most an extra factor . It thus suffices to prove the theorem for the case where are -uniform. Also observe that since the polynomials are multilinear, the Gaussian width is bounded from above by replacing binary vectors with sign vectors

Let and and for each , let be a matrix for as in Lemma 2.3. Then, for every ,

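$$\begin{aligned}
\sum_{i=1}^k g_i\, p_{H_i}(x) &= \frac{1}{c_r N}\sum_{i=1}^k g_i \langle x^{\otimes m}, A_i x^{\otimes m}\rangle\\
&= c_r^{-1}\Big\langle \frac{x^{\otimes m}}{\sqrt{N}}, \Big(\sum_{i=1}^k g_i A_i\Big)\frac{x^{\otimes m}}{\sqrt{N}}\Big\rangle\\
&\le c_r^{-1}\Big\|\sum_{i=1}^k g_i A_i\Big\|.
\end{aligned}$$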

Hence, by Theorem 2.2,

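$$\begin{aligned}
\mathbb{E}\Big[\sup_{x\in\{-1,1\}^n}\sum_{i=1}^k g_i\, p_{H_i}(x)\Big] &\le c_r^{-1}\,\mathbb{E}\Big[\Big\|\sum_{i=1}^k g_i A_i\Big\|\Big]\\
&\le c_r^{-1} C\sqrt{\log N}\,\Big(\sum_{i=1}^k \|A_i\|^2\Big)^{1/2}\\
&\lesssim_r K\sqrt{k\,n^{1-1/r}\log n},
\end{aligned}$$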

where in the last line we used that for each . ∎

## 3. Matrix lemma

Here we prove Lemma 2.3. Let  be a maximal family of disjoint -sets of . Let . Given a string write its -fold tensor product as

$$x^{\otimes m} = \Big(\prod_{i=1}^m x_{f(i)}\Big)_{f:[m]\to[n]}.$$
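The indexing of the tensor power by maps $f:[m]\to[n]$ can be spelled out directly. The sketch below is illustrative only: it encodes a map $f$ as a tuple $(f(1),\dots,f(m))$ with 0-based vertex labels, and the function name `tensor_power` is hypothetical.

```python
from itertools import product

def tensor_power(x, m):
    """Return x^{tensor m} as a dict keyed by maps f:[m]->[n], encoded as tuples."""
    n = len(x)
    out = {}
    for f in product(range(n), repeat=m):
        val = 1
        for i in f:
            val *= x[i]
        out[f] = val
    return out

# A sign vector of length n = 3 and its 2-fold tensor power (N = n^m = 9 entries).
x = [1, -1, 1]
xm = tensor_power(x, 2)
squared_norm = sum(v * v for v in xm.values())
```

For a sign vector every coordinate of the tensor power is $\pm 1$, so its squared Euclidean norm equals the number of coordinates $n^m$. This is why dividing by $\sqrt{N}$ gives a unit vector when the operator norm is applied.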

Given a mapping and set , let

$$\mu_S(f) = \sum_{T\in\binom{S}{r}}\prod_{i\in T}|f^{-1}(i)|.$$

Note that this is a count of the -subsets such that . Denote

$$\phi(f) = \sum_{S\in M}\mu_S(f).$$
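The quantities $\mu_S(f)$ and $\phi(f)$ are easy to compute from the histogram of $f$. The sketch below also includes a brute-force count that rests on an assumed reading of the counting remark above: it counts the $r$-subsets of positions on which $f$ is injective with image inside $S$. The sizes of the sets in the family are not visible in the text at this point; the example takes them of size $2r$ as an assumption. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mu(S, f, r):
    """mu_S(f): sum over r-subsets T of S of prod_{i in T} |f^{-1}(i)|."""
    hist = Counter(f)  # hist[i] = |f^{-1}(i)|, zero for values f never takes
    total = 0
    for T in combinations(sorted(S), r):
        prod = 1
        for i in T:
            prod *= hist[i]
        total += prod
    return total

def phi(M, f, r):
    """phi(f): sum of mu_S(f) over the sets S in the family M."""
    return sum(mu(S, f, r) for S in M)

def count_subsets(S, f, r):
    """Brute force: r-subsets R of positions with f injective on R and f(R) in S."""
    count = 0
    for R in combinations(range(len(f)), r):
        image = {f[i] for i in R}
        if len(image) == r and image <= set(S):
            count += 1
    return count

# Assumed toy setup: r = 2, n = 8, disjoint sets of size 2r, and m = 4.
r = 2
M = [{0, 1, 2, 3}, {4, 5, 6, 7}]
f = (0, 1, 1, 5)  # f maps positions 0..3 to vertices 0..7
g = (0, 1, 4, 5)
```

In this example `f` hits the first set in two distinct vertices (one of them twice) and the second set in only one vertex, so only the first set contributes to `phi`.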

For , say that is -good if . Say that complements  if it satisfies the following two criteria:

1. There exists exactly one such that .

2. For all , we have .

If complements  then clearly the converse also holds. Say that the complementary pair covers if . Observe that if covers , then for every , we have

$$(x^{\otimes m})_f\,(x^{\otimes m})_g = \prod_{i=1}^m x_{f(i)}x_{g(i)} = \prod_{j\in S} x_j. \tag{5}$$

Define the set of ordered pairs

$$P = \{(f,g) : f \text{ is } t\text{-good and } g \text{ complements } f\}. \tag{6}$$

###### Proposition 3.1.

Let  be as in (6). Then, for every , the number of pairs that cover  equals .

###### Proof.

Fix distinct sets and let be a permutation such that and for all . Let be the set of pairs which cover  and define similarly. We claim that the map is an injective map from  to . It follows that  is covered by at least as many pairs from  as  is. Similarly, interchanging and , the converse also holds. To prove the claim, note that if covers , then covers . Moreover, because maps edges of the matching to edges of . Thus . Finally is injective because if for some , then . Hence  covers all  equally. ∎

###### Proposition 3.2.

For every , we have that is -good.

###### Proof.

Let and be such that covers . Consider the histograms given by and for each . Then  and differ only in . In particular, there is an -set such that for each and for each . Hence,

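$$\begin{aligned}
\mu_S(g) &= \sum_{T\in\binom{S}{r}}\prod_{i\in T} G(i)\\
&\le \sum_{T\in\binom{S}{r}}\prod_{i\in T}(F(i)+1)\\
&\le \sum_{T\in\binom{S}{r}}\Big(1 + 2^r\prod_{i\in T}F(i)\Big)\\
&\le 4^r + 2^r\mu_S(f).
\end{aligned}$$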

For all other , we have . Moreover, must be -good for to belong to . It follows that

$$\phi(g) = \sum_{S'\in M}\mu_{S'}(g) \le 4^r + 2^r\sum_{S'\in M}\mu_{S'}(f) = 4^r + 2^r\phi(f) \le t^2,$$

where in the last line we used the choice of . ∎

###### Lemma 3.3 (Generalized birthday paradox).

For every there exists a and an such that the following holds. Let  be a uniformly distributed random variable over the set of maps from  to . Then, provided and ,

$$\Pr[h \text{ is } t\text{-good}] \ge \frac{1}{2}.$$

We postpone the proof of Lemma 3.3 to Section 3.1.

###### Corollary 3.4.

Let  be as in (6) and let be its incidence matrix, that is . Then, and every row and every column of  has at most ones.

###### Proof.

The first claim follows from Lemma 3.3 and the fact that is at least the number of -good mappings. If  is -good, then there are at most  mappings from that complement . Hence, every row of  has at most  ones and by Proposition 3.2, every column of  has at most  ones. ∎

###### Proof of Lemma 2.3.

By Lemma 2.1, the hypergraph can be decomposed into matchings, which we denote by . Complete each  to a maximal family  of disjoint -subsets of  in some arbitrary way. For each , let  be as in (6) and let be its incidence matrix. Set to zero all the entries of  that correspond to a pair covering a set in . Let and . It follows from (5) and Proposition 3.1 that for every , we have

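$$\Big\langle x^{\otimes m}, \sum_{i=1}^{\chi_E(H)}\big(A_i + A_i^{\mathsf T}\big)x^{\otimes m}\Big\rangle = 2\sum_{i=1}^{\chi_E(H)}\frac{|P_i|}{|M_i|}\sum_{S\in F_i}\prod_{j\in S}x_j. \tag{7}$$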

Since all are maximal, they have the same size, as do the . Hence, by Corollary 3.4, there exists a constant such that the right-hand side of (7) equals . Let  be the graph with adjacency matrix , allowing for parallel edges. Then  has degree at most  and it follows from Lemma 2.1 that  can be partitioned into  matchings. Since the adjacency matrix of a matching has unit norm, we get that . ∎

### 3.1. Proof of the generalized birthday paradox.

For the proof of Lemma 3.3, we use a standard Poisson approximation result for “balls and bins” problems [MU05, Theorem 5.10]. A discrete Poisson random variable  with expectation  is nonnegative, integer valued, and has probability mass function

$$\Pr[Y=\ell] = \frac{e^{-\mu}\mu^{\ell}}{\ell!},\qquad \forall\,\ell = 0,1,2,\dots \tag{8}$$

###### Proposition 3.5.

If are independent Poisson random variables with expectations , respectively, then is a Poisson random variable with expectation .
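Proposition 3.5 can be sanity-checked numerically from (8): the distribution of a sum of two independent variables is the convolution of their distributions, and for Poisson variables this convolution is again Poisson with the expectations added. The sketch below is illustrative; the function names and the two expectations are arbitrary choices.

```python
import math

def poisson_pmf(mu, l):
    """Probability that a Poisson variable with expectation mu equals l, as in (8)."""
    return math.exp(-mu) * mu ** l / math.factorial(l)

def pmf_of_sum(mu1, mu2, l):
    """Convolution: probability that two independent Poisson variables sum to l."""
    return sum(poisson_pmf(mu1, j) * poisson_pmf(mu2, l - j) for j in range(l + 1))

# Largest discrepancy between the convolution and the Poisson(mu1 + mu2) formula.
mu1, mu2 = 0.7, 1.8
max_gap = max(
    abs(pmf_of_sum(mu1, mu2, l) - poisson_pmf(mu1 + mu2, l)) for l in range(10)
)
```

The agreement is exact up to floating-point rounding, by the binomial theorem applied to $(\mu_1+\mu_2)^\ell$.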

###### Lemma 3.6.

Let  be a uniformly distributed map from  to . For each , let  and let . Let  be a vector of independent Poisson random variables with expectation . Then, for any nonnegative function such that  decreases or increases monotonically with , we have

$$\mathbb{E}[\Phi(X)] \le 2\,\mathbb{E}[\Phi(Y)].$$

###### Proof of Lemma 3.3.

Let  be a parameter depending only on to be set later. Let and assume that . For a random map as in Lemma 3.6, we begin by lower bounding the probability of the event that . Recall that this occurs if there exists an and an -subset such that . Let  be as in Lemma 3.6. Let be the function

$$\psi(x) = \prod_{S\in M}\prod_{T\in\binom{S}{r}}\Big(1 - \prod_{i\in T}1_{\ge 1}(x_i)\Big).$$

Then if and  decreases monotonically with . Hence, for  a Poisson random vector as in Lemma 3.6, we have

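$$\begin{aligned}
\Pr[\phi(h)=0] &= \mathbb{E}[\psi(X)]\\
&\le 2\,\mathbb{E}[\psi(Y)]\\
&= 2\prod_{S\in M}\mathbb{E}\Big[\prod_{T\in\binom{S}{r}}\Big(1 - \prod_{i\in T}1_{\ge 1}(Y_i)\Big)\Big],
\end{aligned}\tag{9}$$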

where in the last line we used the fact that since the sets  are disjoint, the random variables

$$\prod_{T\in\binom{S}{r}}\Big(1 - \prod_{i\in T}1_{\ge 1}(Y_i)\Big)$$

are independent. The random variables , , are independent Bernoullis that are zero with probability . The expectation in (9) equals the probability that these random variables form a string of Hamming weight strictly less than . Using that and the fact that when , this probability is at most

$$1 - \Pr[\forall i\in T:\ 1_{\ge 1}(Y_i)=1] = 1-(1-e^{-\mu})^r \le 1-\big(\mu(1-\mu/2)\big)^r \le 1-\frac{C_r^r}{en} \le \exp\Big(-\frac{C_r^r}{en}\Big)$$

where is some fixed subset of size . Hence, since  is maximal, the above and (9) give

$$\Pr[\phi(h)=0] \le 2\exp\Big(-\frac{C_r^r|M|}{en}\Big) \le 2\exp\Big(-\frac{C_r^r\lfloor n/r\rfloor}{en}\Big) \le 2\exp\Big(-\frac{C_r^r}{2er}\Big). \tag{10}$$

Set , then the above right-hand side is at most . Next, we upper bound the probability that . Define by

$$\chi(x) = \sum_{S\in M}\sum_{T\in\binom{S}{r}}\prod_{i\in T}x_i.$$

Then, . Moreover, increases monotonically with . It thus follows from Lemma 3.6 that

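$$\begin{aligned}
\mathbb{E}[\phi(h)] &\le 2\,\mathbb{E}[\chi(Y)]\\
&= 2\sum_{S\in M}\sum_{T\in\binom{S}{r}}\prod_{i\in T}\mathbb{E}[Y_i]\\
&\le 2|M|\binom{2r}{r}\Big(\frac{m}{n}\Big)^r\\
&\le 2\cdot\frac{n}{r}\cdot 4^r\cdot(6er)\,n^{-1} \le 50\cdot 4^r,
\end{aligned}$$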

where in the second line we used the fact that the are independent. By Markov’s inequality, . With (10), we get that  is -good with probability at least . ∎

## 4. Arithmetic progressions in random sets

Below we state a special case of Eldan’s LDP [Eld16], similar to how it is stated in [BGSZ16]. Consider a multilinear polynomial with zero constant term. The discrete Lipschitz constant of  is given by

$$\mathrm{Lip}(F) = \max\big\{\|(\nabla F)(y)\|_{\ell_\infty} : y\in\{0,1\}^N\big\},$$

where  is the gradient of . For , define

$$I_p(q) = q\log\frac{q}{p} + (1-q)\log\frac{1-q}{1-p}.$$
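For reference, $I_p(q)$ is the relative entropy between Bernoulli distributions with parameters $q$ and $p$. A minimal sketch of its evaluation follows; it assumes $0<p<1$, $0\le q\le 1$ and the usual convention $0\log 0 = 0$ (neither is visible in the text above), and the function name `rel_entropy` is hypothetical.

```python
import math

def rel_entropy(q, p):
    """I_p(q) = q log(q/p) + (1-q) log((1-q)/(1-p)), with 0 log 0 taken as 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(q, p) + term(1 - q, 1 - p)

# Zero cost when q = p, and cost log(1/p) for forcing the outcome 1.
same = rel_entropy(0.3, 0.3)
forced = rel_entropy(1.0, 0.25)
values = [rel_entropy(q / 10, 0.3) for q in range(11)]
```

The function vanishes exactly at $q=p$ and is nonnegative elsewhere, which is what makes the sum of the $I_p(q_i)$ a natural cost in a variational problem such as (11).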

For a vector , let be a random vector of independent random variables . Define by

$$\phi_p(t) = \inf_{q\in[0,1]^N}\Big\{\sum_{i=1}^N I_p(q_i) : \mathbb{E}F(Y_q)\ge tN\Big\}. \tag{11}$$

###### Theorem 4.1 (Eldan).

Let be a vector of independent Bernoulli random variables and let  be a multilinear form with zero constant term. Let be real numbers such that . Then,

$$\log\Pr[F(X)\ge tN] \le -\Big(1 - \frac{6L(\log N)^{1/6}}{\varepsilon N^{1/3}}\Big)\phi_p(t-\varepsilon),$$

where

 (12) L

Moreover, if , then

$$\log\Pr[F(X)\ge (t-\varepsilon)N] \ge -\Big(1 + \frac{2\,\mathrm{Lip}(F)^2}{\varepsilon^2 N}\Big)\phi_p(t) - \log 10.$$

Theorem 1.1 can be applied to get an upper bound on the parameter  given by (12) when  is a polynomial  given in terms of a hypergraph  with the property that only few edges are incident on any two vertices. For example, if the edges are the (unordered) -term arithmetic progressions in  with non-zero common difference, then any pair of distinct vertices appears in at most  edges. Indeed, if and both belong to an -term AP, then there is a step count and common difference such that . This leaves at most  possibilities for  and  possibilities for the position of  in the AP.
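The pair-incidence property of arithmetic progressions can be checked by brute force for small parameters. The sketch below enumerates the unordered $k$-term APs in $\mathbb{Z}/N\mathbb{Z}$ with non-zero common difference and records how many contain a given pair. The explicit bound from the text is not reproduced here; the check compares against $k(k-1)$, the number of ordered pairs of positions, which is an elementary bound valid for prime $N>k$ and is used only as an assumption of this example. The function names and the choice $N=11$, $k=3$ are hypothetical.

```python
from itertools import combinations

def ap_edges(N, k):
    """Unordered k-term APs in Z/NZ with non-zero common difference, as frozensets."""
    edges = set()
    for a in range(N):
        for d in range(1, N):
            ap = frozenset((a + j * d) % N for j in range(k))
            if len(ap) == k:  # keep only progressions with k distinct terms
                edges.add(ap)
    return edges

def max_pair_codegree(edges):
    """Largest number of edges that contain a common pair of vertices."""
    count = {}
    for e in edges:
        for pair in combinations(sorted(e), 2):
            count[pair] = count.get(pair, 0) + 1
    return max(count.values())

N, k = 11, 3
edges = ap_edges(N, k)
codegree = max_pair_codegree(edges)
```

For prime $N$ the positions of the two points in the progression determine the common difference and the starting point, which is the counting argument sketched in the paragraph above.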

###### Proposition 4.2.

Let  be positive integers. Let  be a -hypergraph such that at most  edges are incident on any given pair of vertices. Then, and

 GW((∇pH)({0,1}N))