Simple Analysis of Sparse, Sign-Consistent JL

Meena Jagadeesan
Abstract.

Allen-Zhu, Gelashvili, Micali, and Shavit constructed a sparse, sign-consistent Johnson-Lindenstrauss distribution, and proved that this distribution yields an essentially optimal dimension for the correct choice of sparsity. However, their analysis of the upper bound on the dimension and sparsity required a complicated combinatorial graph-based argument similar to Kane and Nelson’s analysis of sparse JL. We present a simple, combinatorics-free analysis of sparse, sign-consistent JL that yields the same dimension and sparsity upper bounds as the original analysis. Our proof also yields dimension/sparsity tradeoffs, which were not previously known.

As with previous proofs in this area, our analysis is based on applying Markov’s inequality to a large moment of an error term that can be expressed as a quadratic form of Rademacher variables. Interestingly, we show that, unlike in previous work in the area, the traditionally used Hanson-Wright bound is not strong enough to yield our desired result. Indeed, although the Hanson-Wright bound is known to be optimal for gaussian degree-2 chaos, it was already shown to be suboptimal for Rademachers. Surprisingly, we are able to show a simple moment bound for quadratic forms of Rademachers that is sufficiently tight to achieve our desired result, which, given the ubiquity of moment and tail bounds in theoretical computer science, is likely to be of broader interest.

Harvard University. mjagadeesan@college.harvard.edu. Supported in part by a Harvard PRISE fellowship, Herchel-Smith Fellowship, and an REU supplement to NSF IIS-1447471.

1. Introduction

In many modern algorithms that process high-dimensional data, it is beneficial to pre-process the data through a dimensionality reduction scheme that preserves the geometry of the data. Such schemes have been applied in streaming algorithms [15], as well as in algorithms for numerical linear algebra [22], graph sparsification [19], and many other areas.

The geometry-preserving objective can be expressed mathematically as follows. Given parameters $0 < \varepsilon, \delta < 1/2$, the goal is to construct a probability distribution over $m \times n$ real matrices that satisfies the following condition for any $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$:

(1)   $\Pr\bigl[\,\bigl|\|Ax\|_2^2 - 1\bigr| > \varepsilon\,\bigr] < \delta$, where the probability is over the draw of the matrix $A$ from the distribution.

An upper bound on the dimension $m$ achievable by a probability distribution that satisfies (1) is given in the following lemma, which is a central result in the area of dimensionality reduction:

Lemma 1.1 (Johnson-Lindenstrauss[9]).

For any positive integer $n$ and parameters $0 < \varepsilon, \delta < 1/2$, there exists a probability distribution over $m \times n$ real matrices with $m = O(\varepsilon^{-2}\log(1/\delta))$ that satisfies (1).

The optimality of the dimension achieved by Lemma 1.1 was recently proven in [10, 8].
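As a quick numerical illustration of condition (1), the following minimal sketch (ours, for illustration only; it is not the construction from [9] nor any construction analyzed in this paper, and the constant 8, the random seed, and the test parameters are arbitrary choices) checks empirically that a dense gaussian matrix with $m = \Theta(\varepsilon^{-2}\log(1/\delta))$ rows preserves the squared norm of a unit vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_gaussian_jl(n, eps, delta, c=8.0):
    """Sample a dense m x n gaussian matrix with m = c * eps^-2 * log(1/delta) rows,
    scaled so that E ||Ax||^2 = ||x||^2.  (The constant c = 8 is an arbitrary choice.)"""
    m = int(np.ceil(c * np.log(1.0 / delta) / eps ** 2))
    return rng.normal(size=(m, n)) / np.sqrt(m)

n, eps, delta = 1000, 0.2, 0.01
x = rng.normal(size=n)
x /= np.linalg.norm(x)  # unit vector, as in condition (1)

# Empirically estimate Pr[ | ||Ax||^2 - 1 | > eps ] over independent draws of A.
trials = 200
failures = sum(abs(np.linalg.norm(dense_gaussian_jl(n, eps, delta) @ x) ** 2 - 1) > eps
               for _ in range(trials))
print(failures / trials)  # should be far below delta = 0.01
```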

For many applications of dimensionality reduction schemes, it can be useful to consider probability distributions over sparse matrices (sparsity refers to the constraint that there are a small number of nonzero entries in each column) in order to speed up the projection time. In this context, Kane and Nelson [11] constructed a sparse JL distribution and proved the following:

Theorem 1.2 (Sparse JL[11]).

For any positive integer $n$ and parameters $0 < \varepsilon, \delta < 1/2$, there exists a probability distribution over $m \times n$ real matrices with $m = O(\varepsilon^{-2}\log(1/\delta))$ and sparsity $s = O(\varepsilon^{-1}\log(1/\delta))$ that satisfies (1).

Notice that this probability distribution, even with its sparsity guarantee, achieves the same dimension as Lemma 1.1. The proof of Theorem 1.2 presented in [11] involved complicated combinatorics; however, Nelson[16] recently constructed a simple, combinatorics-free proof of this result.

Neuroscience-based constraints give rise to the additional condition of sign-consistency (sign-consistency refers to the constraint that the nonzero entries of each column are either all positive or all negative) on the matrices in the probability distribution. The relevance of dimensionality reduction schemes in neuroscience is described in a survey by Ganguli and Sompolinsky [4]. In convergent pathways in the brain, information stored in a massive number of neurons is compressed into a small number of neurons, and nonetheless the ability to perform the relevant computations is preserved. Modeling this information compression scheme requires a hypothesis regarding what properties of the original information must be accurately transmitted to the receiving neurons. A plausible minimum requirement is that convergent pathways preserve the similarity structure of neuronal representations at the source area. (This requirement is based on the experimental evidence that semantically similar objects in higher perceptual or association areas in the brain elicit similar neural activity patterns [12] and on the hypothesis that the similarity structure of the neural code is the basis of our ability to categorize objects and generalize appropriate responses to new objects [18].)

It remains to select the appropriate mathematical measure of similarity. The candidate similarity measure considered in [4] is the vector inner product, which conveniently gives rise to a model based on the JL distribution. (It is not difficult to see that for vectors $x$ and $y$ in the unit ball, a $(1 \pm \varepsilon)$-approximation of $\|x\|_2^2$, $\|y\|_2^2$, and $\|x - y\|_2^2$ implies an additive $O(\varepsilon)$-error approximation of the inner product $\langle x, y \rangle$.) Suppose there are $n$ “input” neurons at a source area and $m$ “output” neurons at a target area. In this framework, the information at the input neurons is represented as a vector in $\mathbb{R}^n$, the synaptic connections to output neurons are represented as an $m \times n$ matrix (with $(r,i)$th entry corresponding to the strength of the connection between input neuron $i$ and output neuron $r$), and the information received by the output neurons is represented as a vector in $\mathbb{R}^m$. Taking the similarity measure between two vectors of neural information to be the inner product motivates modeling a synaptic connectivity matrix as a random matrix drawn from a probability distribution that satisfies (1). Certain constraints on synaptic connectivity matrices arise from the biological limitations of neurons: the matrices must be sparse, since a neuron is only connected to a small number (e.g. a few thousand) of postsynaptic neurons, and sign-consistent, since a neuron is usually purely excitatory or purely inhibitory.
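To spell out the inner-product remark above (a standard computation included for completeness, not taken from [4] or [1]): suppose each of $\|Ax\|_2^2$, $\|Ay\|_2^2$, and $\|A(x-y)\|_2^2$ equals its true value up to a multiplicative $(1 \pm \varepsilon)$ factor, which follows from (1) applied (after rescaling) to the three vectors $x$, $y$, and $x - y$ with a union bound. The polarization identity
\[
\langle Ax, Ay \rangle \;=\; \tfrac{1}{2}\bigl(\|Ax\|_2^2 + \|Ay\|_2^2 - \|A(x - y)\|_2^2\bigr)
\]
then gives
\[
\bigl|\langle Ax, Ay \rangle - \langle x, y \rangle\bigr| \;\le\; \tfrac{\varepsilon}{2}\bigl(\|x\|_2^2 + \|y\|_2^2 + \|x - y\|_2^2\bigr) \;\le\; 3\varepsilon
\]
for $x, y$ in the unit ball, i.e., an additive $O(\varepsilon)$ approximation of the inner product.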

This biological setting motivates the purely mathematical question: What are the optimal dimension and sparsity that can be achieved by a probability distribution over sparse, sign-consistent matrices that satisfies (1)? Related mathematical work includes, in addition to sparse JL [11], a construction of a dense, sign-consistent JL distribution [17, 5]. In [1], Allen-Zhu, Gelashvili, Micali, and Shavit constructed a sparse, sign-consistent JL distribution and proved the following upper bound:

Theorem 1.3 (Sparse, sign-consistent JL[1]).

For every , and , there exists a probability distribution over real, sign-consistent matrices with and sparsity .

In [1], it was also proven that the additional factor on $m$ is essentially necessary for any distribution over real, sign-consistent matrices satisfying (1). In order to achieve an (essentially) matching upper bound on $m$, the proof of Theorem 1.3 in [1] involved complicated combinatorics even more delicate than in the analysis of sparse JL in [11].

We present a simpler, combinatorics-free proof of Theorem 1.3. Our proof also yields dimension/sparsity tradeoffs, which were not previously known (Cohen [2] showed a similar dimension/sparsity tradeoff for sparse JL as a corollary of his combinatorics-free analysis of oblivious subspace embeddings; it is not known, for either sparse JL or sparse, sign-consistent JL, how to obtain such a tradeoff via the combinatorial approaches of [11] or [1]):

Theorem 1.4.

For every , , and , there exists a probability distribution over real, sign-consistent matrices with and sparsity .

Notice Theorem 1.3 is recovered if .

As in [1, 11, 16], our analysis is based on applying Markov’s inequality to a large moment of an error term. As in the combinatorics-free analysis of sparse JL in [16], we express this error term as a quadratic form of Rademachers (uniform random variables on $\{-1, +1\}$), and our analysis then boils down to analyzing the moments of this quadratic form. While the analysis in [16] achieves the optimal dimension for sparse JL using an upper bound on the moments of quadratic forms of subgaussians due to Hanson and Wright [6], we give a counterexample in Section 3.2 that shows that the Hanson-Wright bound is too loose in the sign-consistent setting to result in the optimal dimension. Since the Hanson-Wright bound is tight for quadratic forms of gaussians, we thus require a separate treatment of quadratic forms of Rademachers. We construct a simple bound on moments of quadratic forms of Rademachers that, unlike the Hanson-Wright bound, is sufficiently tight in our setting to prove Theorem 1.4. Our bound borrows some of the ideas from Latała’s tight bound on the moments of quadratic forms of Rademachers [14]. Although our bound is much weaker than the bound in [14] in the general case, it has the advantage of being considerably simpler, while still retaining the precision needed to recover the optimal dimension for sparse, sign-consistent JL.

1.1. A digression on Rademachers versus gaussians

The concept that drives our moment bound can be illustrated in the linear form setting. Suppose $\sigma_1, \dots, \sigma_n$ are i.i.d. Rademachers, $a$ is a vector in $\mathbb{R}^n$ such that $a_1 \ge a_2 \ge \dots \ge a_n \ge 0$, and $p \ge 1$. We use the notation $f \simeq g$ to denote that there exist positive universal constants $C_1, C_2$ such that $C_1 g \le f \le C_2 g$, and the notation $f \lesssim g$ to denote that there exists a positive universal constant $C$ such that $f \le C g$. The Khintchine inequality, which is tight for linear forms of gaussians, yields the $\ell_2$-norm bound $\bigl\|\sum_i a_i \sigma_i\bigr\|_p \lesssim \sqrt{p}\,\|a\|_2$. However, this bound can’t be a tight bound on $\bigl\|\sum_i a_i \sigma_i\bigr\|_p$ for the following reason: as $p \to \infty$, the quantity $\sqrt{p}\,\|a\|_2$ goes to infinity, while for any $p$, the quantity $\bigl\|\sum_i a_i \sigma_i\bigr\|_p$ is bounded by $\|a\|_1$. Surprisingly, a result due to Hitczenko [7] indicates that the tight bound is actually the following combination of the $\ell_1$ and $\ell_2$ norm bounds:
\[
\Bigl\|\sum_{i=1}^{n} a_i \sigma_i\Bigr\|_p \;\simeq\; \sum_{i \le p} a_i \;+\; \sqrt{p}\,\Bigl(\sum_{i > p} a_i^2\Bigr)^{1/2}.
\]
In this bound, the “big” terms (i.e. terms involving $a_1, a_2, \dots, a_p$) are handled with an $\ell_1$-norm bound, while the remaining terms are approximated as gaussians and bounded with an $\ell_2$-norm bound.
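As a concrete instance of this phenomenon (our example, using the notation above), take $a = (1/n, \dots, 1/n)$, so that $\|a\|_1 = 1$ and $\|a\|_2 = 1/\sqrt{n}$. The Khintchine bound gives
\[
\Bigl\|\sum_{i} a_i \sigma_i\Bigr\|_p \;\lesssim\; \sqrt{\tfrac{p}{n}},
\]
which tends to infinity as $p \to \infty$, even though $\bigl|\sum_i a_i \sigma_i\bigr| \le \|a\|_1 = 1$ always. The combined bound instead evaluates to $\min(p, n)/n + \sqrt{p}\,\sqrt{\max(n - p, 0)}/n$, which is $\lesssim \sqrt{p/n}$ for $p \le n$ and equals $1$ for $p \ge n$, so it is of the correct order in both regimes.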

Our quadratic form bound is based on a degree-2 analog of this technique. We analogously handle the “big” terms with an $\ell_1$-type bound (using only that each Rademacher has magnitude $1$) and bound the remaining terms by approximating some of the Rademachers by gaussians. From this, we obtain a combination of $\ell_1$ and $\ell_2$ norm bounds, as in the linear form setting. Our simple bound has the surprising feature that it yields tighter guarantees than the Hanson-Wright bound yields for our error term. For this reason, we believe that it is likely to be of interest in other theoretical computer science settings involving moments or tail bounds of Rademacher forms.

1.2. Outline for the rest of the paper

In Section 2, we describe the construction and analysis of [1] for sparse, sign-consistent JL. In Section 3, we present Nelson’s combinatorics-free approach for sparse JL that uses the Hanson-Wright bound, and we discuss why this approach does not yield the optimal dimension in the sign-consistent setting. In Section 4, we derive our bound on the moments of quadratic forms of Rademachers and use this bound to construct a combinatorics-free proof of Theorem 1.4.

2. Existing Analysis for Sparse, Sign-Consistent JL

In Section 2.1, we describe how to construct the probability distribution of sparse, sign-consistent matrices analyzed in Theorem 1.3. In Section 2.2, we briefly describe the combinatorial proof of Theorem 1.3 in [1].

2.1. Construction of Sparse, Sign-Consistent JL

The entries of a matrix $A \in \mathbb{R}^{m \times n}$ drawn from the distribution are generated as follows. Let $A_{r,i} = \frac{\sigma_i \eta_{r,i}}{\sqrt{s}}$, where the random variables $\{\sigma_i\}_{i \in [n]}$ and $\{\eta_{r,i}\}_{r \in [m], i \in [n]}$ are defined as follows:

  • The families $\{\sigma_i\}$ and $\{\eta_{r,i}\}$ are independent from each other.

  • The variables $\sigma_i$ are i.i.d. Rademachers (uniform random variables on $\{-1, +1\}$).

  • The variables $\eta_{r,i}$ are identically distributed Bernoulli random variables (random variables with support $\{0, 1\}$) with expectation $s/m$.

  • The $\eta_{r,i}$ are independent across columns but not independent within each column. For every column $i$, it holds that $\sum_{r=1}^{m} \eta_{r,i} = s$. For every subset $S \subseteq [m]$ and every column $i$, it holds that $\mathbb{E}\bigl[\prod_{r \in S} \eta_{r,i}\bigr] \le (s/m)^{|S|}$. (One common definition of the distribution of the $\eta_{r,i}$ that satisfies these conditions is obtained by uniformly choosing exactly $s$ of these variables per column to be $1$; a code sketch of this choice follows the list.)
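The following is a minimal sketch (ours, for illustration only) of one way to sample such a matrix, using the choice from the last item that places exactly $s$ nonzero entries per column uniformly at random; the variable names and the sanity check at the end are our own additions, not part of the construction in [1].

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_sign_consistent(m, n, s):
    """Sample an m x n sparse, sign-consistent matrix: each column has exactly s
    nonzero entries (rows chosen uniformly at random), all equal to sigma_i / sqrt(s),
    where sigma_i is a single Rademacher sign shared by the entire column."""
    A = np.zeros((m, n))
    for i in range(n):
        rows = rng.choice(m, size=s, replace=False)  # support of column i (the eta_{r,i})
        sigma_i = rng.choice([-1.0, 1.0])             # one sign per column (sign-consistency)
        A[rows, i] = sigma_i / np.sqrt(s)
    return A

# Sanity check: ||Ax||_2^2 should concentrate around ||x||_2^2 = 1.
m, n, s = 400, 2000, 20
x = rng.normal(size=n)
x /= np.linalg.norm(x)
errors = [abs(np.linalg.norm(sparse_sign_consistent(m, n, s) @ x) ** 2 - 1)
          for _ in range(100)]
print(np.mean(errors), np.max(errors))
```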

For every $x \in \mathbb{R}^n$ such that $\|x\|_2 = 1$, we need to analyze an error term, which for this construction is the following random variable:
\[
Z \;=\; \|Ax\|_2^2 - 1 \;=\; \frac{1}{s} \sum_{r=1}^{m} \sum_{i \neq j} \eta_{r,i}\,\eta_{r,j}\,\sigma_i\,\sigma_j\,x_i\,x_j .
\]

Proving that the distribution satisfies (1) boils down to proving that $\Pr[|Z| > \varepsilon] < \delta$. The main technique to prove this tail bound is the moment method. Bounding a large moment of $Z$ is useful since it follows from Markov’s inequality that, for any even integer $p$,
\[
\Pr\bigl[|Z| > \varepsilon\bigr] \;\le\; \varepsilon^{-p}\,\mathbb{E}\bigl[Z^{p}\bigr] \;=\; \Bigl(\frac{\|Z\|_p}{\varepsilon}\Bigr)^{p}.
\]

The usual approach, used in the analyses in [1, 11, 16] as well as in our analysis, is to take $p$ to be an even integer and analyze the $L_p$-norm $\|Z\|_p = (\mathbb{E}[Z^p])^{1/p}$ of the error term.

2.2. Discussion of the combinatorial analysis of [1]

In the analysis in [1], a complicated combinatorial argument was used to prove the following lemma, from which Theorem 1.3 follows:

Lemma 2.1 ([1]).

If and , then .

The argument in [1] to prove Lemma 2.1 was based on expanding $\mathbb{E}[Z^p]$ into a polynomial with a large number of terms, establishing a correspondence between the monomials and certain multigraphs, and then doing combinatorics to analyze the resulting sum. The approach of mapping monomials to graphs is commonly used in analyzing the eigenvalue spectrum of random matrices [21, 3]. This approach was also used in [11] to analyze sparse JL. The analysis in [1] borrowed some methods from the analysis in [11]; however, the additional correlations between the Rademachers imposed by sign-consistency forced the analysis in [1] to require more delicate manipulations at several stages of the computation.

The expression to be analyzed was the moment $\mathbb{E}[Z^p]$ of the error term, expanded into a sum of monomials.

After several layers of computation, it was shown that this moment can be bounded by a weighted sum over a set of directed multigraphs with labeled vertices and labeled edges, where the weight of a graph is determined by the total degrees of its vertices and by its edge structure and edge colorings. The problem then boiled down to carefully enumerating the graphs in this set in six stages and analyzing the resulting expression.

3. Discussion of Combinatorics-Free Approaches

The main ingredient of the combinatorics-free approach for sparse JL in [16] is the Hanson-Wright bound on the moments of quadratic forms of subgaussians. In Section 3.1, we discuss the approach in [16]. In Section 3.2, we discuss why this approach, if applied directly to sparse, sign-consistent JL, fails to yield the optimal dimension.

3.1. Hanson-Wright approach for sparse JL in [16]

The relevant random variable for sparse JL is
\[
Z' \;=\; \frac{1}{s} \sum_{r=1}^{m} \sum_{i \neq j} \eta_{r,i}\,\eta_{r,j}\,\sigma_{r,i}\,\sigma_{r,j}\,x_i\,x_j ,
\]
where the $n$ independent Rademachers $\sigma_i$ from the sign-consistent case are replaced by the $mn$ independent Rademachers $\sigma_{r,i}$. The main idea in [16] was to view $Z'$ as a quadratic form $\sigma^{\top} T \sigma$. Here, $\sigma$ is an $mn$-dimensional vector of independent Rademachers and $T$ is a symmetric, zero-diagonal, block diagonal matrix with $m$ blocks of size $n \times n$, where the $(i,j)$th entry (for $i \neq j$) of the $r$th block is $\frac{1}{s}\,\eta_{r,i}\,\eta_{r,j}\,x_i\,x_j$. The quantity $\|Z'\|_p$ was analyzed using the Hanson-Wright bound [6]:

Lemma 3.1 (Hanson-Wright[6]).

Let $\sigma$ be a $d$-dimensional vector of independent subgaussians, and let $T$ be a $d \times d$ symmetric matrix with zero diagonal. Then, for any $p \ge 1$,
\[
\bigl\|\sigma^{\top} T \sigma\bigr\|_p \;\lesssim\; \sqrt{p}\,\|T\|_F \;+\; p\,\|T\|_{2 \to 2},
\]
where $\|T\|_F$ denotes the Frobenius norm and $\|T\|_{2 \to 2}$ the operator norm.

In order to bound $\|Z'\|_p$, since $T$ is a random matrix whose entries depend on the $\eta_{r,i}$ values, an expectation had to be taken over the $\eta_{r,i}$ in the expression given by the Hanson-Wright bound. This resulted in the following:

(2)   $\|Z'\|_p \;\lesssim\; \sqrt{p}\,\bigl\|\,\|T\|_F\,\bigr\|_p \;+\; p\,\bigl\|\,\|T\|_{2 \to 2}\,\bigr\|_p$, where the outer norms are taken with respect to the randomness of the $\eta_{r,i}$.

The analysis then boiled down to bounding the RHS of (2).

3.2. Failure of the Hanson-Wright approach for sparse, sign-consistent JL

The approach for sparse JL in [16] using the Hanson-Wright bound cannot be directly applied to the sign-consistent case to obtain a tight bound on $\|Z\|_p$. The loss arises from the fact that while the Hanson-Wright bound (Lemma 3.1) is tight for quadratic forms of gaussians, it is not guaranteed to be tight for quadratic forms of Rademachers. We give a counterexample, i.e. a vector $x$, that shows that the Hanson-Wright bound is too loose to give the optimal dimension for the sign-consistent case (when the random variables $\eta_{r,i}$ have the distribution defined by uniformly choosing exactly $s$ of the variables per column to be $1$). The details of our construction are given in Appendix C.

4. Simple Proof of Theorem 1.4

The main ingredient in our combinatorics-free proof of Theorem 1.4 is the following bound on $\|Z\|_p$:

Lemma 4.1.

Let . If , then

Theorem 1.4 follows from Lemma 4.1 via Markov’s inequality, as we show in Section 4.4.

In order to analyze $\|Z\|_p$, we view $Z$ as a quadratic form $\sigma^{\top} T \sigma$, where the vector $\sigma$ is an $n$-dimensional vector of independent Rademachers, and $T$ is a symmetric, zero-diagonal matrix whose $(i,j)$th entry (for $i \neq j$) is $\frac{1}{s}\, x_i x_j \sum_{r=1}^{m} \eta_{r,i}\eta_{r,j}$. Since the distribution of $Z$ is invariant under permuting and flipping the signs of the coordinates of $x$, we can assume WLOG that $x_1 \ge x_2 \ge \dots \ge x_n \ge 0$. For convenience, we define, like in [16],

(3)   $Q_{i,j} \;=\; \sum_{r=1}^{m} \eta_{r,i}\,\eta_{r,j}$

to be the number of collisions between the nonzero entries of the $i$th column and the nonzero entries of the $j$th column. Now, the $(i,j)$th entry of $T$ (for $i \neq j$) can be rewritten as $\frac{1}{s}\, x_i x_j Q_{i,j}$.
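For completeness, the identity $Z = \sigma^{\top} T \sigma$ can be verified by the following short expansion (our rederivation from the construction in Section 2.1, using $\sum_r \eta_{r,i} = s$, $\eta_{r,i}^2 = \eta_{r,i}$, $\sigma_i^2 = 1$, and $\|x\|_2 = 1$):
\[
\|Ax\|_2^2 \;=\; \sum_{r=1}^{m}\Bigl(\sum_{i=1}^{n}\frac{\sigma_i\,\eta_{r,i}}{\sqrt{s}}\,x_i\Bigr)^{2}
\;=\; \frac{1}{s}\sum_{i=1}^{n}x_i^2\sum_{r=1}^{m}\eta_{r,i}
\;+\;\frac{1}{s}\sum_{i\neq j}\sigma_i\sigma_j\,x_i x_j\sum_{r=1}^{m}\eta_{r,i}\eta_{r,j}
\;=\;1 \;+\; \frac{1}{s}\sum_{i\neq j}\sigma_i\sigma_j\,x_i x_j\,Q_{i,j},
\]
so the error term $Z = \|Ax\|_2^2 - 1$ is exactly the quadratic form $\sigma^{\top} T \sigma$ for the matrix $T$ described above.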

Our method to prove Lemma 4.1 resolves the issues that arise from directly applying the approach in [16]. We derive the following moment bound on quadratic forms of Rademachers, which yields tighter guarantees than the Hanson-Wright bound yields for $\|Z\|_p$. (Latała [14] provides a tight bound on the moments of such quadratic forms, and, in fact, on the moments of quadratic forms of much more general random variables. However, his proof is quite complicated, and his bound is more difficult to apply to this setting, though it can be used as a black box to generate a slightly messier solution. Our simplified bound, though much weaker in the general case, is sufficiently tight in this setting and has a cleaner proof.)

Lemma 4.2.

If $T$ is a symmetric square matrix with zero diagonal, $\sigma$ is a vector of independent Rademachers, and , then

We defer our proof of Lemma 4.2 to Section 4.1. Applying Lemma 4.2 coupled with an expectation over the $\eta_{r,i}$ values yields the following bound on $\|Z\|_p$:

(4)

We first discuss some intuition for why using Lemma 4.2 avoids the issues of using the Hanson-Wright bound. In the Hanson-Wright bound, all of the Rademachers are essentially approximated by gaussians. In Lemma 4.2, we instead make use of an $\ell_1$-type bound on the upper left minor of $T$ (the region where the $x_i$ and $x_j$ values are the largest), which avoids the loss incurred in our setting by approximating the Rademachers in this range by gaussians. Since the original matrix is symmetric, it only remains to consider the entries outside this minor. In this range, we approximate one set of the (decoupled) Rademachers by gaussians and use an $\ell_2$-norm bound. Approximating the other set of Rademachers by gaussians as well would yield too loose of a bound for our application, so we preserve those Rademachers. For the remaining Rademacher linear forms, the interaction between the coordinate values (all of which are small in this range) and the Rademachers yields the desired bound.

In order to prove Lemma 4.1, it remains to prove Lemma 4.2 as well as to bound the two terms on the right-hand side of (4). In Section 4.1, we prove Lemma 4.2. The main ingredients in our analysis of these two terms are moment bounds on sums of independent random variables. In Section 4.2, we present these moment bounds. In Section 4.3, we use these moment bounds to bound the two terms, and then finish our proof of Lemma 4.1. In Section 4.4, we show how Lemma 4.1 implies Theorem 1.4.

4.1. Proof of Lemma 4.2

We use the following standard lemmas in our proof of Lemma 4.2.

The first lemma allows us to decouple the two sets of Rademachers in our quadratic form so that we can reduce analyzing the moments of the quadratic form to analyzing the moments of a linear form.

Lemma 4.3 (Decoupling, Theorem 6.1.1 of [20]).

If $T$ is a symmetric, zero-diagonal $n \times n$ matrix and $\sigma_1, \dots, \sigma_n, \sigma'_1, \dots, \sigma'_n$ are independent Rademachers, then for every $p \ge 1$,
\[
\Bigl\|\sum_{i \neq j} T_{i,j}\,\sigma_i\,\sigma_j\Bigr\|_p \;\le\; 4\,\Bigl\|\sum_{i,j} T_{i,j}\,\sigma_i\,\sigma'_j\Bigr\|_p .
\]

The next lemma is due to Khintchine and gives an $\ell_2$-norm bound on linear forms of Rademachers. Since the Khintchine bound is derived from approximating the Rademachers by i.i.d. gaussians, we only use this bound outside of the delicate upper left minor of our matrix $T$.

Lemma 4.4 (Khintchine).

If $\sigma_1, \dots, \sigma_n$ are independent Rademachers, then for all $p \ge 1$ and $a \in \mathbb{R}^n$,
\[
\Bigl\|\sum_{i=1}^{n} a_i \sigma_i\Bigr\|_p \;\lesssim\; \sqrt{p}\,\|a\|_2 .
\]
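For orientation, a standard way to see this bound is via the subgaussian moment generating function (a sketch included here for completeness, not necessarily the proof used in the references): for any $\lambda \in \mathbb{R}$,
\[
\mathbb{E}\,\exp\Bigl(\lambda \sum_{i} a_i \sigma_i\Bigr) \;=\; \prod_{i} \cosh(\lambda a_i) \;\le\; \exp\Bigl(\tfrac{\lambda^2 \|a\|_2^2}{2}\Bigr),
\]
so optimizing over $\lambda$ in Markov’s inequality gives $\Pr\bigl[\bigl|\sum_i a_i \sigma_i\bigr| > t\bigr] \le 2 e^{-t^2/(2\|a\|_2^2)}$, and integrating this gaussian-type tail yields $\bigl\|\sum_i a_i \sigma_i\bigr\|_p \lesssim \sqrt{p}\,\|a\|_2$.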

Now, we are ready to prove Lemma 4.2.

Proof of Lemma 4.2.

By Lemma 4.3 and the triangle inequality, we know

We first bound . Since a Rademacher satisfies , it follows that as desired. We now bound using Lemma 4.4 to obtain

We now bound . From an analogous computation, it follows that , which implies that . ∎

4.2. Useful Moment Bounds

The main tools that we use in analyzing moments are the following bounds for moments of sums of nonnegative random variables and sums of symmetric random variables due to Latała[13]. Although the bounds in [13] are tight, we only use the upper bounds in our proof of Theorem 1.4. The proofs of these bounds in [13] are not complicated, and for completeness, we sketch proofs of the upper bounds in Appendix A.

Lemma 4.5 ([13]).

If and are independent symmetric random variables, then

Lemma 4.6 ([13]; this result was actually first due to S. J. Montgomery-Smith through a private communication with Latała, but it is also a corollary of a result in [13]).

If and are i.i.d nonnegative random variables, then

We use Lemma 4.5 in analyzing a Rademacher linear form that arises in the proof of Lemma 4.9. We use Lemma 4.6 to obtain the following bound on moments of binomial random variables, which we use in analyzing both of the quantities bounded in Section 4.3. We defer the proof of this bound to Appendix D.

Proposition 4.7.

Suppose that is a random variable distributed as for any and any integer . If and , then

4.3. Bounding the two terms of (4) to prove Lemma 4.1

We bound the two terms of (4) in the following sublemmas, which assume the notation used throughout the paper:

Lemma 4.8.

If , then

Lemma 4.9.

If , then

Both proofs use the following properties of the random variables $Q_{i,j}$.

Proposition 4.10.

Let $B$ be a random variable distributed as $\mathrm{Binomial}(s, s/m)$. For a fixed $i$, the random variables $Q_{i,j}$ for $j \neq i$ are independent. For any $j \neq i$ and any $p \ge 1$,
\[
\mathbb{E}\bigl[Q_{i,j}^{\,p}\bigr] \;\le\; \mathbb{E}\bigl[B^{\,p}\bigr].
\]

Proof.

Let $A$ be a matrix drawn from the distribution, and pick any $i$. Suppose the nonzero entries in column $i$ of $A$ occur at rows $r_1, \dots, r_s$. For $1 \le t \le s$ and $j \neq i$, let $\eta_{r_t, j}$ be the indicator variable for whether the $r_t$th row of the $j$th column of $A$ is nonzero, so that $Q_{i,j} = \sum_{t=1}^{s} \eta_{r_t, j}$. To prove the first statement, notice that the sets of random variables $\{\eta_{r,j}\}_{r \in [m]}$ for different columns $j$ are independent from each other, which means the random variables in the set $\{Q_{i,j}\}_{j \neq i}$ are independent. We now prove the second statement. For $1 \le t \le s$ and $j \neq i$, let $\eta'_{r_t, j}$ be distributed as i.i.d. Bernoulli random variables with expectation $s/m$. Notice that for a fixed $j$, the random variables in $\{\eta_{r_t, j}\}_{1 \le t \le s}$ are negatively correlated (and nonnegative), which means
\[
\mathbb{E}\bigl[Q_{i,j}^{\,p}\bigr] \;=\; \mathbb{E}\Bigl[\Bigl(\sum_{t=1}^{s} \eta_{r_t, j}\Bigr)^{p}\Bigr] \;\le\; \mathbb{E}\Bigl[\Bigl(\sum_{t=1}^{s} \eta'_{r_t, j}\Bigr)^{p}\Bigr] \;=\; \mathbb{E}\bigl[B^{\,p}\bigr]. \;\blacksquare
\]
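As a small sanity check on these properties (a computation we include for orientation; it is not used verbatim in the proofs below): for $i \neq j$, the entries $\eta_{r,i}$ and $\eta_{r,j}$ lie in different columns and are therefore independent, so
\[
\mathbb{E}\bigl[Q_{i,j}\bigr] \;=\; \sum_{r=1}^{m} \mathbb{E}[\eta_{r,i}]\,\mathbb{E}[\eta_{r,j}] \;=\; m \cdot \frac{s}{m} \cdot \frac{s}{m} \;=\; \frac{s^2}{m},
\]
so collisions between two columns are rare on average once $m \gg s^2$, and Proposition 4.10 upgrades this expectation bound to domination of every moment of $Q_{i,j}$ by the corresponding moment of a $\mathrm{Binomial}(s, s/m)$ random variable, which is exactly where Proposition 4.7 enters.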

We first prove Lemma 4.8.

Proof of Lemma 4.8.

Naively applying the triangle inequality yields a suboptimal bound, so we must more carefully analyze this expression. We know

By Proposition 4.10, the moments of each $Q_{i,j}$ are dominated by those of a random variable $B$ distributed as $\mathrm{Binomial}(s, s/m)$. The result now follows from Proposition 4.7. ∎

We defer the proof of Lemma 4.9 to Appendix B (this proof boils down to a relatively short computation involving Lemma 4.5). We show Lemma 4.1 follows from the sublemmas introduced thus far:

Proof of Lemma 4.1.

By (4), we know

Applying Lemma 4.8 and Lemma 4.9 gives us the desired result. ∎

4.4. Proof of Theorem 1.4

We show Lemma 4.1 implies Theorem 1.4.

Proof of Theorem 1.4.

It suffices to show $\Pr[|Z| > \varepsilon] < \delta$. By Markov’s inequality, we know

Suppose that . Then by Lemma 4.1, we know

Thus, to upper bound this quantity by , we can set and . We impose the additional constraint that to guarantee that . This proves the desired result. (An alternative choice of parameters, analyzed via Lemma 4.1, yields no better dimension or sparsity values than those achieved by this choice.) ∎
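For orientation, the parameter setting in this step follows the standard moment-method pattern; the following chain is our schematic paraphrase (with the exact bound from Lemma 4.1 suppressed), assuming $p$ is an even integer with $p \ge \ln(1/\delta)$ and that the dimension and sparsity are chosen so that $\|Z\|_p \le \varepsilon/e$:
\[
\Pr\bigl[|Z| > \varepsilon\bigr] \;\le\; \frac{\mathbb{E}\,Z^{p}}{\varepsilon^{p}} \;=\; \Bigl(\frac{\|Z\|_p}{\varepsilon}\Bigr)^{p} \;\le\; e^{-p} \;\le\; \delta .
\]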

5. Acknowledgments

I would like to thank Prof. Jelani Nelson for proposing and advising this project.

References

  • [1] Z. Allen-Zhu, R. Gelashvili, S. Micali, and N. Shavit. Sparse sign-consistent Johnson–Lindenstrauss matrices: Compression with neuroscience-based constraints. In Proceedings of the National Academy of Sciences, volume 111, pages 16872–16876, 2014.
  • [2] M. B. Cohen. Nearly tight oblivious subspace embeddings by trace inequalities. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 278–287, 2016.
  • [3] Z. Füredi and J. Komlós. The eigenvalues of random symmetric matrices. Combinatorica, 1(3):233–241, 1981.
  • [4] S. Ganguli and H. Sompolinsky. Compressed sensing, sparsity, and dimensionality in neuronal information processing and data analysis. Annual Review of Neuroscience, 35:485–508, 2012.
  • [5] R.T. Gray and P.A. Robinson. Stability and structural constraints of random brain networks with excitatory and inhibitory neural populations. Journal of Computational Neuroscience, 27(1):81–101, 2009.
  • [6] D. L. Hanson and F. T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. Annals of Mathematical Statistics, 42(3):1079–1083, 1971.
  • [7] P. Hitczenko. Domination inequality for martingale transforms of Rademacher sequence. Israel Journal of Mathematics, 84:161–178, 1993.
  • [8] T. S. Jayram and D. P. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. In ACM Transactions on Algorithms (TALG) - Special Issue on SODA’11, volume 9, pages 1–26, 2013.
  • [9] W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.
  • [10] D. M. Kane, R. Meka, and J. Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Proceedings of the Fourteenth International Workshop and Fifteenth International Conference on Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, pages 628–639, 2011.
  • [11] D. M. Kane and J. Nelson. Sparser Johnson-Lindenstrauss transforms. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). ACM Press, 2012.
  • [12] R. Kiani, H. Esteky, K. Mirpour, and K. Tanaka. Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. Journal of Neurophysiology, 97:4296–4309, 2007.
  • [13] R. Latała. Estimation of moments of sums of independent real random variables. Annals of Probability, 25(3):1502–1513, 1997.
  • [14] R. Latała. Tail and moment estimates for some types of chaos. Studia Mathematica, 135(1):39–53, 1999.
  • [15] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1, 2005.
  • [16] J. Nelson. Dimensionality Reduction - Notes 2. Available at http://people.seas.harvard.edu/~minilek/madalgo2015/notes2.pdf, 2015.
  • [17] K. Rajan and L. F. Abbott. Eigenvalue spectra of random matrices for neural networks. Physical Review Letters, 97:188104, 2006.
  • [18] T. Rogers and J. McClelland. Semantic Cognition: A Parallel Distributed Processing Approach. MIT Press, 2004.
  • [19] D. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing (SICOMP), 40:1913–1926, 2011.
  • [20] R. Vershynin. High-Dimensional Probability. Cambridge University Press, To appear. Available at https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.html.
  • [21] E.P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics, 62:548–564, 1955.
  • [22] D.P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10:1–157, 2014.

Appendix A Proof Sketches of Latała’s Moment Bounds[13]

We sketch the proofs of the upper bounds of Lemma 4.5 and Lemma 4.6. Full proofs of these lemmas can be found in [13]. For a random variable , we define

We begin with the following proposition that relates to the -norm, which is useful in proving Lemma 4.6 and Lemma 4.5.

Proposition A.1.

If independent random variables and value satisfy the following inequality for any :

then

Proof.

Suppose that . Then,

The proofs of the upper bounds of Lemma 4.5 and Lemma 4.6 boil down to showing that the condition of Proposition A.1 is satisfied. In Section A.1, we sketch a proof of the upper bound of Lemma 4.5. In Section A.2, we sketch a proof of the upper bound of Lemma 4.6.

A.1. Proof Sketch of Lemma 4.5 (Upper Bound) [13]

We first state the following two propositions. The proofs of these propositions are straightforward calculations and can be found in [13].

Proposition A.2.

If and are independent symmetric random variables, then

Proposition A.3.

If and is a symmetric random variable, then

From Proposition A.2 and Proposition A.3, coupled with the fact that , and are independent symmetric random variables, it follows that the condition of Proposition A.1 is satisfied.

A.2. Proof Sketch of Lemma 4.6 (Upper Bound)

We first sketch the proof of the upper bound of the following sublemma, which is analogous to Lemma 4.5:

Lemma A.4 (Latała[13]).

If and are independent nonnegative random variables then

The upper bound of Lemma A.4 follows from the following propositions, coupled with the fact that , , …, , and are independent nonnegative random variables. The proofs of these propositions are straightforward calculations and can be found in [13].

Proposition A.5.

If and are independent nonnegative random variables, then

Proposition A.6.

If and is a nonnegative random variable, then

Now, we describe how the upper bound of Lemma A.4 implies the upper bound of Lemma 4.6. It suffices to show if we take

then

(5)

Since , it follows that

from which (5) follows.

Appendix B Proof of Lemma 4.9

The main tool that we use in this proof is Lemma 4.5.

Proof of Lemma 4.9.

We know

Approximating the Rademachers by gaussians yields a suboptimal bound, so we must more carefully analyze this sum using Lemma 4.5. By Proposition 4.10, we can apply Lemma 4.5 to obtain:

Thus, it suffices to show

satisfies . We see

By Proposition 4.10 and Proposition 4.7, we know if that there exists a universal constant such that . Thus, we obtain

Since , if we set , then we obtain

as desired. An analogous argument shows that if , we can set . ∎

Appendix C Weakness of the bound on $\|Z\|_p$ obtained from Lemma 3.1

As in Section 4, we view the random variable $Z$ as a quadratic form $\sigma^{\top} T \sigma$, where $\sigma$ is an $n$-dimensional vector of independent Rademachers and $T$ is a symmetric, zero-diagonal matrix whose $(i,j)$th entry (for $i \neq j$) is $\frac{1}{s}\, x_i x_j Q_{i,j}$. Applying Lemma 3.1 followed by an expectation over the $\eta_{r,i}$ values yields