Gaussian width bounds with applications to arithmetic progressions in random settings
Abstract.
Motivated by two problems on arithmetic progressions (APs)—concerning large deviations for AP counts in random sets and random differences in Szemerédi’s theorem—we prove upper bounds on the Gaussian width of the image of the $n$-dimensional Boolean hypercube under a mapping $f:\{0,1\}^n \to \mathbb{R}^k$, where each coordinate is a constant-degree multilinear polynomial. We show the following applications of our bounds. Let $[N]_p$ be the random subset of $[N]$ containing each element independently with probability $p$.

Let $X$ be the number of $k$-term APs in $[N]_p$. We show that a precise estimate on the large deviation rate of $X$ due to Bhattacharya, Ganguly, Shao and Zhao is valid if $p \geq N^{-c_k}$ for a constant $c_k > 0$, which slightly improves their range of $p$ for $k \geq 5$ (and matches their constants $c_3$ and $c_4$).

A set $D \subseteq \mathbb{N}$ is intersective if every dense subset of $\mathbb{N}$ contains a nontrivial $k$-term AP with common difference in $D$. We show that $[N]_p$ is intersective with probability $1 - o(1)$ provided $p \geq N^{-c_k}$ for a constant $c_k > 0$, improving the bound due to Frantzikinakis, Lesigne and Wierdl and reproving more directly the same result shown recently by the authors and Dvir.
In addition, we discuss some intriguing connections with special kinds of error-correcting codes (locally decodable codes) and the Banach-space notion of type for injective tensor products of $\ell_p$ spaces.
1. Introduction
The Gaussian width of a point set $S \subseteq \mathbb{R}^k$ measures the average maximum correlation between a standard Gaussian vector $g \sim N(0, I_k)$ and the points in $S$,
$$w(S) = \mathbb{E}_{g}\Big[\sup_{s \in S}\, \langle g, s \rangle\Big].$$
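To make this definition concrete, here is a small Monte Carlo sketch (our own illustration, not from the paper; the function name and the two-point example are ours) that estimates $w(S)$ for a finite point set:

```python
import math
import random

def gaussian_width(points, trials=200_000, seed=1):
    """Monte Carlo estimate of w(S) = E[ sup_{s in S} <g, s> ] for a finite set S."""
    rng = random.Random(seed)
    k = len(points[0])
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(k)]
        total += max(sum(gi * si for gi, si in zip(g, s)) for s in points)
    return total / trials

# The two-point set {e_1, -e_1} in R^2 has width E|g_1| = sqrt(2/pi) ~ 0.798.
est = gaussian_width([(1.0, 0.0), (-1.0, 0.0)])
```

The two-point example is one of the few cases where the width can be computed in closed form, which makes it a convenient sanity check for the estimator.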
The terminology reflects the fact that if $S$ is symmetric around the origin, then its Gaussian width is closely related to its average width in a random direction. Motivated by two applications to arithmetic progressions in random settings discussed below, we bound the Gaussian width of certain sets given by the image of the $n$-dimensional Boolean hypercube under a polynomial mapping $f = (f_1, \dots, f_k)$, where each coordinate $f_i$ is a constant-degree multilinear polynomial given by a hypergraph. An $r$-hypergraph $H$ consists of a vertex set $V$ and a multiset $E$, also denoted $E(H)$, of subsets of $V$ of size at most $r$, called the edges. A hypergraph is $r$-uniform if each edge has size exactly $r$. The degree of a vertex is the number of edges containing it and the degree of $H$, denoted $\Delta(H)$, is the maximum degree among its vertices. Associate with a hypergraph $H$ the multilinear polynomial $f_H$ given by
(1) $f_H(x) = \sum_{e \in E(H)} \prod_{i \in e} x_i.$
Note that for a subset $S \subseteq V$, the value $f_H(\mathbf{1}_S)$ counts the number of edges of $H$ which lie completely inside $S$. Associate with a collection $H_1, \dots, H_k$ of $n$-vertex hypergraphs the polynomial mapping $f : \{0,1\}^n \to \mathbb{R}^k$ given by
(2) $f(x) = \big(f_{H_1}(x), \dots, f_{H_k}(x)\big).$
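To illustrate the counting property behind (1), here is a minimal sketch (the function name and example hypergraph are ours, not the paper’s) evaluating $f_H$ at the indicator vector of a subset:

```python
def f_H(edges, x):
    """Multilinear polynomial of a hypergraph as in (1): sum over edges e of prod_{i in e} x_i."""
    total = 0
    for e in edges:
        term = 1
        for i in e:
            term *= x[i]
        total += term
    return total

# A small 3-uniform hypergraph on vertex set {0,...,4}.
edges = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
S = {1, 2, 3}
x = [1 if i in S else 0 for i in range(5)]   # indicator vector of S
count = f_H(edges, x)  # number of edges lying completely inside S
```

For a 0/1 input, each monomial contributes 1 exactly when its edge is contained in $S$, so `count` agrees with a direct containment check.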
Our main result is then as follows.
Theorem 1.1.
Let $k, n, r$ be positive integers and let $H_1, \dots, H_k$ be $r$-hypergraphs with vertex set $[n]$. Then,
In the following two subsections we discuss two applications of this result.
1.1. Large deviations for arithmetic progressions
Let $H$ be a hypergraph over a finite vertex set $V$ of cardinality $n$ and, for $p \in (0,1)$, denote by $V_p$ the random binomial subset where each element of $V$ appears independently of all others with probability $p$. Let $X$ be the number of edges in $E(H)$ that are induced by $V_p$. Important instances of the random variable $X$ include the count of triangles in an Erdős–Rényi random graph and the count of arithmetic progressions of a given length in the random set $[n]_p$.
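For concreteness, a small simulation of this random variable (our own sketch; `ap_count` is a hypothetical helper, and for simplicity it counts APs inside $\{0,\dots,n-1\}$ rather than a cyclic group):

```python
import random

def ap_count(S, k, n):
    """Number of k-term APs with positive common difference contained in S, S a subset of {0,...,n-1}."""
    S = set(S)
    count = 0
    for a in range(n):
        for d in range(1, (n - 1 - a) // (k - 1) + 1):
            if all(a + j * d in S for j in range(k)):
                count += 1
    return count

# X = number of 3-term APs in a binomial random subset of {0,...,n-1}.
n, k, p = 200, 3, 0.3
rng = random.Random(7)
sample = {i for i in range(n) if rng.random() < p}
X = ap_count(sample, k, n)
```

Running this for many samples would give an empirical picture of the upper tail discussed below, though the probabilities in question are far too small to estimate by naive sampling.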
The study of the asymptotic behavior of $X$ when $p$ is allowed to depend on $n$ and $n$ grows to infinity motivates a large body of research in probabilistic combinatorics. Of particular interest is the problem of determining the probability that $X$ significantly exceeds its expectation, referred to as the upper tail. Despite the fact that standard probabilistic methods fail to give satisfactory bounds on the upper tail in general, advances were made recently for special instances, in particular for triangle counts [LZ17] and general subgraph counts [BGLZ17]. For more general hypergraphs, progress was made by Chatterjee and Dembo [CD16] using a novel nonlinear large deviation principle (LDP), which was improved by Eldan [Eld16] shortly after. The LDPs give estimates on the upper tail in terms of a parameter whose value is determined by the solution to a certain variational problem, for a range of values of $p$ depending on $n$. This splits the problem of estimating the upper tail into two subproblems: (1) determining for what range of $p$ the estimate in terms of this parameter holds true and (2) solving the variational problem to determine the parameter’s value. The answer to problem (1) turns out to depend on the Gaussian width of a point set related to $H$.
This approach was pursued in [CD16] in the context of 3-term arithmetic progressions, for which problem (1) was solved. The case of longer APs was treated by Bhattacharya et al. [BGSZ16], who solved the variational problem (2) and gave bounds for the relevant Gaussian width. Based on this, they showed that for every $k \geq 3$, fixed $\delta > 0$ and $p$ tending to zero sufficiently slowly as $n \to \infty$, the upper tail probability for the count of $k$-term arithmetic progressions in $[n]_p$ is given by
(3) 
The rate at which $p$ is allowed to decay for (3) to hold depends on the Gaussian width of the image of $\{0,1\}^n$ under the gradient $\nabla f_H$, where $H$ is the hypergraph over $[n]$ whose edges are formed by the $k$-term arithmetic progressions. The bounds on the Gaussian width of this set proved in [BGSZ16] imply that (3) holds provided $p \geq n^{-c_k}$ for
absolute constants $c_k > 0$ depending only on $k$. However, the authors conjecture that (3) holds for much smaller $p$. Evidence for this conjecture is given by a result of Warnke [War16] showing that, for a much wider range of $p$, the logarithm of the upper tail is determined up to multiplicative constants, where the asymptotic notation hides constants depending only on $k$ and $\delta$. Notice that (3) improves on this by (almost) determining those constants. The main motivation for finding such precise estimates of the upper tail probability is not so much the problem itself as it is to understand the structure of the set $[n]_p$ conditioned on $X$ being much larger than its expectation (see [BGSZ16]). With regard to the constants $c_k$, Theorem 1.1 implies that it suffices to set
which slightly improves on the range of $p$ for which (3) was known to hold for $k \geq 5$.
1.2. Random differences in Szemerédi’s Theorem
In 1975 Szemerédi [Sze75] proved that any subset of the integers of positive upper density contains arbitrarily long arithmetic progressions, answering a famous open question of Erdős and Turán. It is well known that this is equivalent to the assertion that for every positive integer $k$ and any $\alpha \in (0,1]$, there exists an $n_0 \in \mathbb{N}$ such that if $n \geq n_0$ and $A \subseteq [n]$ is a set of size $\alpha n$, then $A$ must contain a nontrivial $k$-term arithmetic progression. Certain refinements of Szemerédi’s theorem concern sets $D$ such that the theorem still holds true when the arithmetic progressions are required to have common difference from $D$. Such sets are usually referred to as intersective sets in number theory, or recurrent sets in ergodic theory. More precisely, a set $D$ is intersective (or recurrent) if any set of positive upper density has a $k$-term arithmetic progression with common difference in $D$. Szemerédi’s theorem then states that $\mathbb{N}$ is intersective for every $k$, but much smaller intersective sets exist. For example, for any $\ell \in \mathbb{N}$, the set $\{d^\ell : d \in \mathbb{N}\}$ is intersective for every $k$, which is a special case of more general results of Sárközy [Sár78a] for $k = 2$ and of Bergelson and Leibman [BL96] for all $k$. The shifted primes $\{p - 1 : p \text{ prime}\}$ and $\{p + 1 : p \text{ prime}\}$ are also intersective for every $k$, shown by Sárközy [Sár78b] for $k = 2$ and in a more general setting by Wooley and Ziegler [WZ12] for all $k$.
It is natural to ask at what density random sets become intersective. To simplify the discussion, we will look at the analogous question in $\mathbb{Z}_N$.
Definition 1.2.
Let $k$ be a positive integer and $\alpha \in (0,1]$. A subset $D \subseteq \mathbb{Z}_N$ is $(k,\alpha)$-intersective if any subset $A \subseteq \mathbb{Z}_N$ of size $\alpha N$ must contain a nontrivial $k$-term arithmetic progression with common difference in $D$.
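Definition 1.2 can be checked by brute force for very small parameters. The following sketch (entirely our own, and exponential in $N$) makes the quantifiers explicit, with arithmetic in $\mathbb{Z}_N$ taken modulo $N$:

```python
from itertools import combinations

def has_ap_with_diff(A, D, k, N):
    """Does A contain a nontrivial k-term AP a, a+d, ..., a+(k-1)d (mod N) with d in D?"""
    A = set(A)
    for a in A:
        for d in D:
            if d % N == 0:
                continue  # nontrivial means nonzero common difference
            if all((a + j * d) % N in A for j in range(k)):
                return True
    return False

def is_intersective(D, k, N, min_size):
    """Check every subset of Z_N of the given size (feasible only for tiny N)."""
    return all(has_ap_with_diff(set(A), D, k, N)
               for A in combinations(range(N), min_size))

# With D = all nonzero residues, every 2-element subset of Z_5 trivially
# contains a 2-term AP with difference in D.
ok = is_intersective({1, 2, 3, 4}, 2, 5, 2)
```

Only subsets of the threshold size need to be enumerated, since supersets of an intersective witness inherit the progression.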
It was proved independently by Frantzikinakis et al. [FLW12] and Christ [Chr11] that for $k = 3$, the random set $[\mathbb{Z}_N]_p$ is intersective with probability $1 - o(1)$, provided $p$ exceeds a suitable inverse power of $N$. This was improved for all $k$ in [FLW16], where it was shown that the same result holds with a smaller power of $N$, though it was conjectured there that $p = \omega(\log N / N)$ suffices for all $k$. Based on Theorem 1.1 we obtain the following result, which improves on the latter bounds.
Theorem 1.3.
For every $k \in \mathbb{N}$ there exists a constant $C_k > 0$ such that the following holds. Let $N$ be an integer and let
Then, with probability $1 - o(1)$, the set $[\mathbb{Z}_N]_p$ is intersective.
1.3. Locally decodable codes
There is a close connection between the Gaussian widths considered in Theorem 1.1 and special error-correcting codes known as locally decodable codes (LDCs). A map $C : \{0,1\}^k \to \{0,1\}^n$ is a $q$-query LDC if for every $i \in [k]$ and $x \in \{0,1\}^k$, the value $x_i$ can be retrieved by reading at most $q$ coordinates of the codeword $C(x)$, even if the codeword is corrupted in a not too large (but possibly constant) fraction of coordinates. A main open problem is to determine the optimal trade-off between $k$ and $n$ when $q$ is a fixed constant. Currently this problem is settled only in the cases $q \in \{1, 2\}$ [KT00, KW04, GKST06] and remains wide open for the case $q \geq 3$. We refer to the extensive survey [Yek12] for more information on this problem. The connection with Gaussian width was established by the authors and Dvir in [BDG17], where we showed that there is a $q$-query LDC from $\{0,1\}^k$ to $\{0,1\}^n$ if and only if there are matchings $M_1, \dots, M_k$ on $[n]$ of linear size such that the image of $\{0,1\}^n$ under the corresponding mapping (2) has large Gaussian width. It was observed there that the best-known lower bounds on the length of $q$-query LDCs—proved using techniques from quantum information theory [KW04]—imply a slightly different but equivalent version of Theorem 1.3 (see Section 5). The proof of Theorem 1.1 is based on ideas from [KW04], but uses a 1974 random matrix inequality of Tomczak-Jaegermann instead of quantum information theory.¹

¹Not surprisingly, the LDC lower bounds of [KW04] are also implied by Theorem 1.1.
1.4. Gaussian width bounds from type constants
We observe that the Gaussian width in Theorem 1.1 can be bounded in terms of type constants of certain Banach spaces. Unfortunately, we do not have good enough bounds on the type constants of the required spaces to improve Theorem 1.1. But we hope that this connection will motivate progress on understanding these spaces.
A Banach space $X$ is said to have (Rademacher) type $p \in [1,2]$ if there exists a constant $T < \infty$ such that for every $n$ and $x_1, \dots, x_n \in X$,
(4) $\displaystyle \mathbb{E}_{\varepsilon}\Big\| \sum_{i=1}^n \varepsilon_i x_i \Big\|_X \leq T \Big( \sum_{i=1}^n \|x_i\|_X^p \Big)^{1/p},$
where the expectation is over a uniformly random $\varepsilon \in \{-1,1\}^n$. The smallest $T$ for which (4) holds is referred to as the type-$p$ constant of $X$, denoted $T_p(X)$. Type, and its dual notion cotype, play an important role in Banach space theory as they are tightly linked to local geometric properties (we refer to [LT79] and [Mau03] for extensive surveys). Some fundamental facts are as follows. It follows from the triangle inequality that every Banach space has type 1 and from the Khintchine inequality that no Banach space has type $p > 2$. The parallelogram law implies that Hilbert spaces have type 2. An easy but important fact is that $\ell_1$ fails to have type $p > 1$. Indeed, a famous result of Maurey and Pisier [MP73] asserts that a Banach space fails to have type $p > 1$ if and only if it contains the spaces $\ell_1^n$ uniformly. Finite-dimensional Banach spaces have type $p$ for all $p \in [1,2]$. But of importance to Theorem 1.1 are the actual type constants of a certain family of finite-dimensional Banach spaces. Let $p, q \in [1,\infty]$ be such that $1/p + 1/q = 1$ and let $X$ be the space of linear forms on $\ell_q^n \otimes \cdots \otimes \ell_q^n$ ($r$ times) endowed with the norm
This space is also known as the injective tensor product of $r$ copies of $\ell_p^n$ and as such plays an important role in the theory of tensor products of Banach spaces [Rya02]. The relevance of the type constants of this space to Theorem 1.1 is captured by the following lemma, proved in Section 6.
Lemma 1.4.
Let $k, n, r$ be positive integers and let $H_1, \dots, H_k$ be $r$-hypergraphs with vertex set $[n]$. Then, for any $p, q \in [1,\infty]$ such that $1/p + 1/q = 1$,
Observe that for $r = 2$ and $p = q = 2$ the space may be identified with the space of $n \times n$ matrices endowed with the spectral norm (or operator norm). A key ingredient in the proof of Theorem 1.1, Theorem 2.2 below, easily implies that the type-2 constant of this space is of order $\sqrt{\log n}$. A well-known lower bound of the same order follows for instance from the connection between Gaussian width and LDCs and a basic construction of a 2-query LDC known as the Hadamard code. More generally, lower bounds on the type constants of these spaces are implied by $q$-query LDCs [BNR12, Bri16].
Acknowledgements
We thank Sean Prendiville, Fernando Xuancheng Shao and Yufei Zhao for helpful discussions. This work was in part carried out while the authors were visiting the Simons Institute during the Pseudorandomness program of Spring 2017 and we thank the institute and organizers for their hospitality.
2. Proof of Theorem 1.1
In this section we prove Theorem 1.1. We begin by giving a high-level overview of the ideas. The proof is based on a classic random matrix inequality of Tomczak-Jaegermann which bounds the expected operator norm of a sum of matrices weighted by independent standard normal random variables (Theorem 2.2 below). On its own, this inequality easily implies the result for graphs. To treat the general case, we first reduce to the case of $r$-uniform matchings, at a cost of a constant factor in the number of vertices; a matching is a hypergraph in which no two edges intersect. Then we reduce to the case of graphs (unless $r \leq 2$) and apply the random matrix inequality. This involves constructing graphs with the property that for each sign vector $x \in \{-1,1\}^n$, the quadratic form of the graph agrees with $f_{H_i}(x)$ up to a constant factor depending only on $r$ (Lemma 2.3). (Switching from Boolean vectors to sign vectors can only make the Gaussian width larger.) To illustrate how these graphs are constructed, we consider a 4-matching on $n$ vertices. It follows from the Birthday Paradox and symmetry that many strings over $[n]$ of a suitable length contain at least two elements of a given edge. We let $G$ be the graph whose vertices are such strings, with the edges formed by the pairs of strings that “cover” some edge of the matching and “complement” each other, in a sense made precise in Section 3. The $k$-fold tensor product of a vector $x \in \{-1,1\}^n$ is given by $x^{\otimes k} = x \otimes \cdots \otimes x$. If $e$ is an edge of the matching covered by a complementary pair of strings, it then follows that the product of the two corresponding coordinates of $x^{\otimes k}$ equals $\prod_{i \in e} x_i$ times a product of squares. It can then be observed that the quadratic form of $G$ evaluated at $x^{\otimes k}$, modulo the relations $x_i^2 = 1$, is a linear combination of the monomials appearing in the matching’s polynomial. A more careful analysis shows that the two evaluations are in fact related by a constant factor.
To make the above precise, we first collect some basic facts about hypergraphs. The edge chromatic number of a hypergraph $H$, denoted by $\chi'(H)$, is the minimum number of colors needed to color the edges of $H$ such that no two edges which intersect have the same color. Note that $\chi'(H)$ equals the smallest number of matchings into which $E(H)$ can be partitioned. For small values of $r$, the parameters $\chi'(H)$ and $\Delta(H)$ are closely related.
Lemma 2.1.
Let $H$ be an $r$-hypergraph. Then, $\Delta(H) \leq \chi'(H) \leq r(\Delta(H) - 1) + 1$.
Proof.
Clearly $\chi'(H) \geq \Delta(H)$, since edges containing a maximum-degree vertex must get different colors. To prove the upper bound, form a graph $G$ whose vertices are the edges of $H$, and add edges between intersecting hypergraph edges. Then $\chi'(H)$ is equal to the vertex chromatic number of the graph $G$, which, by Brooks’ Theorem, is at most $\Delta(G) + 1$. Since an edge in $H$ can intersect at most $r(\Delta(H) - 1)$ other edges, $\Delta(G) \leq r(\Delta(H) - 1)$. ∎
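The line-graph argument above can be sketched directly. A simple first-fit coloring of intersecting edges (our own illustration; it uses the greedy bound $\Delta(G)+1$ rather than Brooks’ Theorem) already stays within the bound of Lemma 2.1:

```python
def greedy_edge_coloring(edges):
    """Color hypergraph edges so that intersecting edges get distinct colors.
    Greedy vertex coloring of the intersection ('line') graph from the proof."""
    colors = []
    for i, e in enumerate(edges):
        used = {colors[j] for j in range(i) if edges[j] & e}
        c = 0
        while c in used:
            c += 1
        colors.append(c)
    return colors

# A 3-uniform hypergraph with maximum degree 2: at most r(Delta-1)+1 = 4 colors.
edges = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}, {1, 3, 5}]
colors = greedy_edge_coloring(edges)
r, Delta = 3, 2
```

Each color class is a matching, so the number of colors used also witnesses the partition of $E(H)$ into matchings mentioned above.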
The proof of Theorem 1.1 uses a noncommutative Khintchine inequality, which is a special case of a result of Tomczak-Jaegermann [TJ74, Theorem 3.1]. Let $\langle \cdot, \cdot \rangle$ be the standard inner product on $\mathbb{R}^n$ and denote by $B_2^n$ the Euclidean unit ball in $\mathbb{R}^n$. Given a matrix $A \in \mathbb{R}^{n \times n}$, its operator norm (or spectral norm) is given by $\|A\| = \max\{ \langle u, Av \rangle : u, v \in B_2^n \}$.
Theorem 2.2 (Tomczak-Jaegermann).
There exists an absolute constant $c > 0$ such that the following holds. Let $A_1, \dots, A_k \in \mathbb{R}^{n \times n}$ be a collection of matrices and let $g_1, \dots, g_k$ be independent Gaussian random variables with mean zero and variance 1. Then,
$$\mathbb{E}\Big\| \sum_{i=1}^k g_i A_i \Big\| \leq c\, \sqrt{\log n}\, \Big( \sum_{i=1}^k \|A_i\|^2 \Big)^{1/2}.$$
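The phenomenon behind Theorem 2.2 can be observed numerically. The following sketch (ours, assuming NumPy; the choice of permutation matrices, which have unit operator norm, is an assumption for illustration) compares the average norm of a Gaussian combination with the naive triangle-inequality benchmark $\mathbb{E}|g| \cdot \sum_i \|A_i\|$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 50, 30, 20

# k fixed matrices of unit operator norm: permutation matrices.
mats = []
for _ in range(k):
    perm = rng.permutation(n)
    A = np.zeros((n, n))
    A[np.arange(n), perm] = 1.0
    mats.append(A)

# Average operator norm of sum_i g_i A_i over independent standard Gaussians g_i.
norms = []
for _ in range(trials):
    g = rng.standard_normal(k)
    norms.append(np.linalg.norm(sum(gi * Ai for gi, Ai in zip(g, mats)), 2))
avg = float(np.mean(norms))

# Triangle-inequality benchmark: E|g| * sum_i ||A_i|| = sqrt(2/pi) * k.
naive = float(np.sqrt(2 / np.pi) * k)
```

In this experiment `avg` stays well below `naive`, consistent with a $\sqrt{\log n}$-type bound rather than a bound linear in $k$.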
To apply Theorem 2.2 we use the following matrix lemma, proved in the next section.
Lemma 2.3.
For every $r$ there exist a constant $c_r > 0$ and an $n_0 \in \mathbb{N}$ such that the following holds. Let $n \geq n_0$. Let $H$ be an $r$-uniform hypergraph with vertex set $[n]$ and let $f_H$ be the polynomial as in (1). Then, there exists a matrix $A$, with suitably bounded operator norm, such that for every $x \in \{-1,1\}^n$,
Moreover, $A$ is the adjacency matrix of a graph (with possibly parallel edges).
Proof of Theorem 1.1.
Assume the setting of Lemma 2.3. We start by reducing to the case where the $H_i$ are $r$-uniform. Given an $r$-hypergraph $H$ over $[n]$ with degree at most $\Delta$, we first add new vertices to edges with fewer than $r$ vertices to make all the edges have size $r$. By grouping the edges into sets of size at most $\Delta$ and using the same new vertices for all edges in the same group, we can obtain a new hypergraph on $O_r(n)$ vertices which is $r$-uniform and whose degree is still at most $\Delta$. In terms of the polynomials, we are homogenizing the polynomial $f_H(x)$ with new variables $y$ to get a new multilinear polynomial $\tilde{f}_H(x, y)$ of degree $r$, where $y$ are the new variables added. Clearly, $\tilde{f}_H(x, y) = f_H(x)$ when the variables $y$ are set to 1. So,
Since our claimed bound on the Gaussian width is polynomial in the number of vertices, the extra vertices will result in at most an extra constant factor. It thus suffices to prove the theorem for the case where the $H_i$ are $r$-uniform. Also observe that since the polynomials are multilinear, the Gaussian width is bounded from above by replacing binary vectors $x \in \{0,1\}^n$ with sign vectors $x \in \{-1,1\}^n$.
For each $i \in [k]$, let $A_i$ be a matrix for $H_i$ as in Lemma 2.3. Then, for every $x \in \{-1,1\}^n$,
Hence, by Theorem 2.2,
where in the last line we used the bound of Lemma 2.3 on $\|A_i\|$ for each $i \in [k]$. ∎
3. Matrix lemma
Here we prove Lemma 2.3. Let $M$ be a maximal family of disjoint $r$-element subsets of $[n]$. Given a string $x \in \{-1,1\}^n$, write its $k$-fold tensor product as
Given a mapping and set , let
Note that this is a count of the subsets such that . Denote
For , say that is good if . Say that complements if it satisfies the following two criteria:

There exists exactly one such that .

For all , we have .
If complements then clearly the converse also holds. Say that the complementary pair covers if . Observe that if covers , then for every , we have
(5) 
Define the set of ordered pairs
(6) 
Proposition 3.1.
Let be as in (6). Then, for every , the number of pairs that cover equals .
Proof.
Fix distinct sets and let be a permutation such that and for all . Let be the set of pairs which cover and define similarly. We claim that the map is an injective map from to . It follows that is covered by at least as many pairs from as is. Similarly, interchanging and , the converse also holds. To prove the claim, note that if covers , then covers . Moreover, because maps edges of the matching to edges of . Thus . Finally is injective because if for some , then . Hence covers all equally. ∎
Proposition 3.2.
For every , we have that is good.
Proof.
Let and be such that covers . Consider the histograms given by and for each . Then and differ only in . In particular, there is an set such that for each and for each . Hence,
For all other , we have . Moreover, must be good for to belong to . It follows that
where in the last line we used the choice of . ∎
Lemma 3.3 (Generalized birthday paradox).
For every there exists a and an such that the following holds. Let be a uniformly distributed random variable over the set of maps from to . Then, provided and ,
Corollary 3.4.
Let be as in (6) and let be its incidence matrix, that is . Then, and every row and every column of has at most ones.
Proof of Lemma 2.3.
By Lemma 2.1, the hypergraph $H$ can be decomposed into $\chi'(H)$ matchings, which we denote by $M_1, \dots, M_{\chi'}$. Complete each $M_j$ to a maximal family of disjoint $r$-element subsets of $[n]$ in some arbitrary way. For each $j$, let $P_j$ be as in (6) and let $B_j$ be its incidence matrix. Set to zero all the entries of $B_j$ that correspond to a pair covering a set not in $M_j$. Let $A$ denote the sum of the resulting matrices. It follows from (5) and Proposition 3.1 that for every $x \in \{-1,1\}^n$, we have
(7) 
Since all the families are maximal, they have the same size, as do the sets they induce. Hence, by Corollary 3.4, there exists a constant such that the right-hand side of (7) equals this constant times $f_H(x)$. Let $G$ be the graph with adjacency matrix $A$, allowing for parallel edges. Then $G$ has bounded degree and it follows from Lemma 2.1 that $E(G)$ can be partitioned into a bounded number of matchings. Since the adjacency matrix of a matching has unit norm, we get the claimed bound on $\|A\|$. ∎
3.1. Proof of the generalized birthday paradox.
For the proof of Lemma 3.3, we use a standard Poisson approximation result for “balls and bins” problems [MU05, Theorem 5.10]. A discrete Poisson random variable $X$ with expectation $\mu$ is nonnegative, integer-valued, and has probability mass function
(8) $\Pr[X = j] = \dfrac{e^{-\mu} \mu^j}{j!}, \qquad j = 0, 1, 2, \dots$
Proposition 3.5.
If $X_1, \dots, X_m$ are independent Poisson random variables with expectations $\mu_1, \dots, \mu_m$, respectively, then $X_1 + \cdots + X_m$ is a Poisson random variable with expectation $\mu_1 + \cdots + \mu_m$.
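As a quick sanity check of Proposition 3.5 (our own illustration, not part of the paper), convolving two Poisson probability mass functions as in (8) reproduces the pmf of the sum:

```python
import math

def poisson_pmf(mu, j):
    """Pr[X = j] for a Poisson random variable with expectation mu, as in (8)."""
    return math.exp(-mu) * mu**j / math.factorial(j)

# Convolution of Poisson(a) and Poisson(b) equals Poisson(a + b), term by term.
a, b = 1.5, 2.0
for j in range(10):
    conv = sum(poisson_pmf(a, i) * poisson_pmf(b, j - i) for i in range(j + 1))
    assert abs(conv - poisson_pmf(a + b, j)) < 1e-12
```

The identity is just the binomial theorem applied inside the convolution sum, which is the standard proof of Proposition 3.5.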
Lemma 3.6.
Let $\sigma$ be a uniformly distributed map from $[m]$ to $[n]$. For each $i \in [n]$, let $X_i = |\sigma^{-1}(i)|$ and let $X = (X_1, \dots, X_n)$. Let $Y = (Y_1, \dots, Y_n)$ be a vector of independent Poisson random variables, each with expectation $m/n$. Then, for any nonnegative function $f$ such that $\mathbb{E}[f(X)]$ decreases or increases monotonically with $m$, we have $\mathbb{E}[f(X)] \leq 2\,\mathbb{E}[f(Y)]$.
Proof of Lemma 3.3.
Let be a parameter depending only on to be set later. Let and assume that . For a random map as in Lemma 3.6, we begin by lower bounding the probability of the event that . Recall that this occurs if there exists an and an subset such that . Let be as in Lemma 3.6. Let be the function
Then if and decreases monotonically with . Hence, for a Poisson random vector as in Lemma 3.6, we have
(9) 
where in the last line we used the fact that since the sets are disjoint, the random variables
are independent. The random variables , , are independent Bernoullis that are zero with probability . The expectation in (9) equals the probability that these random variables form a string of Hamming weight strictly less than . Using that and the fact that when , this probability is at most
where is some fixed subset of size . Hence, since is maximal, the above and (9) give
(10) 
Setting the parameter suitably, the above right-hand side is at most the claimed bound. Next, we upper bound the probability that . Define by
Then, . Moreover, increases monotonically with . It thus follows from Lemma 3.6 that
where in the second line we used the fact that the are independent. By Markov’s inequality, . With (10), we get that is good with probability at least . ∎
4. Arithmetic progressions in random sets
Below we state a special case of Eldan’s LDP [Eld16], similar to how it is stated in [BGSZ16]. Consider a multilinear polynomial $f : \mathbb{R}^n \to \mathbb{R}$ with zero constant term. The discrete Lipschitz constant of $f$ is given by
where $\nabla f$ is the gradient of $f$. For $\delta > 0$, define
For a vector , let be a random vector of independent random variables . Define by
(11) 
Theorem 4.1 (Eldan).
Let $X = (X_1, \dots, X_n)$ be a vector of independent Bernoulli random variables and let $f$ be a multilinear form with zero constant term. Let $\delta$ and $\varepsilon$ be real numbers such that $0 < \varepsilon < \delta$. Then,
where
(12) 
Moreover, if , then
Theorem 1.1 can be applied to get an upper bound on the parameter given by (12) when $f$ is a polynomial given in terms of a hypergraph with the property that only few edges are incident on any two vertices. For example, if the edges are the (unordered) $k$-term arithmetic progressions in $[n]$ with nonzero common difference, then any pair of distinct vertices appears in at most $k(k-1)$ edges. Indeed, if $x$ and $y$ both belong to a $k$-term AP, then there is a step count $s \in \{1, \dots, k-1\}$ and common difference $d$ such that $y = x + sd$. This leaves at most $k - 1$ possibilities for $d$ and at most $k$ possibilities for the position of $x$ in the AP.
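The counting argument in this paragraph is easy to confirm by exhaustive search for small parameters (our own sketch, with APs taken inside $\{0,\dots,n-1\}$; the helper name is ours):

```python
def aps_through_pair(x, y, k, n):
    """All k-term APs in {0,...,n-1} with positive common difference containing both x and y."""
    aps = set()
    for d in range(1, n):
        for a in range(max(0, n - (k - 1) * d)):
            terms = tuple(a + j * d for j in range(k))
            if x in terms and y in terms:
                aps.add(terms)
    return aps

# Any fixed pair lies in at most k(k-1) APs: the gap y - x must equal s*d for a
# step count s in {1,...,k-1}, and x occupies one of at most k positions.
n, k = 100, 4
worst = max(len(aps_through_pair(x, y, k, n))
            for x in range(20) for y in range(x + 1, 20))
```

The search never finds more than $k(k-1)$ progressions through a pair, in line with the bound used above.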
Proposition 4.2.
Let be positive integers. Let be a hypergraph such that at most edges are incident on any given pair of vertices. Then, and