
# Random Tensors and Planted Cliques

## Abstract

The $r$-parity tensor of a graph is a generalization of the adjacency matrix, where the tensor's entries denote the parity of the number of edges in subgraphs induced by $r$ distinct vertices. For $r=2$, it is the adjacency matrix with $1$'s for edges and $-1$'s for nonedges. It is well-known that the $2$-norm of the adjacency matrix of a random graph is $O(\sqrt{n})$. Here we show that the $2$-norm of the $r$-parity tensor is at most $f(r)\sqrt{n}\log^{O(r)} n$, answering a question of Frieze and Kannan [3] who proved this for $r=3$. As a consequence, we get a tight connection between the planted clique problem and the problem of finding a vector that approximates the $2$-norm of the $r$-parity tensor of a random graph. Our proof method is based on an inductive application of concentration of measure.

## 1 Introduction

It is well-known that a random graph $G_{n,1/2}$ almost surely has a clique of size $(2-o(1))\log_2 n$ and a simple greedy algorithm finds a clique of size $(1-o(1))\log_2 n$. Finding a clique of size even $(1+\epsilon)\log_2 n$ for some $\epsilon>0$ in a random graph is a long-standing open problem posed by Karp in 1976 [6] in his classic paper on probabilistic analysis of algorithms.

In the early nineties, a very interesting variant of this question was formulated by Jerrum [5] and by Kucera [7]. Suppose that a clique of size $p$ is planted in a random graph, i.e., a random graph is chosen and all the edges within a subset of $p$ vertices are added to it. Then for what value of $p$ can the planted clique be found efficiently? It is not hard to see that $p > c\sqrt{n\log n}$ suffices, since then the vertices of the clique will have larger degrees than the rest of the graph, with high probability [7]. This was improved by Alon et al. [1] to $p=\Omega(\sqrt{n})$ using a spectral approach. This was refined by McSherry [8] and considered by Feige and Krauthgamer in the more general semi-random model [2]. For $p$ at least a sufficiently large constant multiple of $\sqrt{n}$, the following simple algorithm works: form a matrix with $1$'s for edges and $-1$'s for nonedges; find the largest eigenvector of this matrix and read off the top $p$ entries in magnitude; return the set of vertices that have sufficiently high degree within this subset.

The reason this works is the following: the top eigenvector of a symmetric matrix can be written as

$$\max_{x:\|x\|=1} x^{T}Ax \;=\; \max_{x:\|x\|=1}\sum_{ij}A_{ij}x_ix_j$$

maximizing a quadratic polynomial over the unit sphere. The maximum value is the spectral norm or $2$-norm of the matrix. For a random matrix with $1,-1$ entries, the spectral norm (largest eigenvalue) is $O(\sqrt{n})$. In fact, as shown by Füredi and Komlós [4, 9], a random matrix with i.i.d. entries of variance at most $1$ has the same bound on the spectral norm. On the other hand, after planting a clique of size $\sqrt{n}$ times a sufficient constant factor, the indicator vector of the clique (normalized) achieves a higher norm. Thus the top eigenvector points in the direction of the clique (or very close to it).
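The variational view of the top eigenvector can be illustrated on a tiny example. The sketch below uses power iteration, which is a technique swapped in here for illustration and not taken from the paper; it converges to the eigenvalue of largest absolute value, which coincides with the maximum of the quadratic form only when that eigenvalue is positive, as it is for this matrix. The matrix `M` (the $\pm 1$ matrix of a triangle plus one isolated vertex) and the function names are assumptions made for the example.

```python
import math

def quadratic_form(A, x):
    """Evaluate x^T A x for a square matrix A given as a list of rows."""
    n = len(A)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def power_iteration(A, steps=50):
    """Repeatedly apply A and renormalize, starting from the first basis vector."""
    n = len(A)
    x = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# +1 for edges of the triangle {0, 1, 2}, -1 for nonedges, 0 on the diagonal.
M = [
    [0, 1, 1, -1],
    [1, 0, 1, -1],
    [1, 1, 0, -1],
    [-1, -1, -1, 0],
]

top = power_iteration(M)
top_value = quadratic_form(M, top)  # approximates the largest eigenvalue of M
```

Here `M` equals $ss^{T}-I$ with $s=(1,1,1,-1)$, so its eigenvalues are $3$ and $-1$, and the iteration settles on the direction of $s$.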

Given the numerous applications of eigenvectors (principal components), a well-motivated and natural generalization of this optimization problem to an $r$-dimensional tensor is the following: given a symmetric tensor $A$ with entries $A_{i_1 i_2 \ldots i_r}$, find

$$\|A\|_2 = \max_{x:\|x\|=1} A(x,\ldots,x),$$

where

$$A(x^{(1)},\ldots,x^{(r)}) = \sum_{i_1 i_2 \ldots i_r} A_{i_1 i_2 \ldots i_r}\, x^{(1)}_{i_1} x^{(2)}_{i_2} \cdots x^{(r)}_{i_r}.$$

The maximum value is the spectral norm or $2$-norm of the tensor. The complexity of this problem is open for any $r>2$, assuming the entries with repeated indices are zeros.

A beautiful application of this problem was given recently by Frieze and Kannan [3]. They defined the following tensor associated with an undirected graph $G=(V,E)$:

$$A_{ijk} = E_{ij}E_{jk}E_{ki}$$

where $E_{ij}$ is $1$ if $ij\in E$ and $-1$ otherwise, i.e., $A_{ijk}$ is the parity of the number of edges between $i,j,k$ present in $G$. They proved that for the random graph $G_{n,1/2}$, the $2$-norm of the random tensor $A$ is $\tilde{O}(\sqrt{n})$, i.e.,

$$\sup_{x:\|x\|=1}\sum_{i,j,k}A_{ijk}x_ix_jx_k \le C\sqrt{n}\log^{c} n$$

where $c, C$ are absolute constants. This implied that if such a maximizing vector could be found (or approximated), then we could find planted cliques of size as small as $n^{1/3}$ times polylogarithmic factors in polynomial time, improving substantially on the long-standing threshold of $\Omega(\sqrt{n})$.

Frieze and Kannan ask the natural question of whether this connection can be further strengthened by going to $r$-dimensional tensors for $r>3$. The tensor itself has a nice generalization. For a given graph $G=(V,E)$ the $r$-parity tensor is defined as follows. Entries with repeated indices are set to zero; any other entry is the parity of the number of edges in the subgraph induced by the subset of vertices corresponding to the entry, i.e.,

$$A_{k_1,\ldots,k_r} = \prod_{1\le i<j\le r} E_{k_i k_j}.$$
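The definition can be made concrete with a brute-force sketch. It assumes the graph is given as a symmetric table `sign` with `sign[a][b] = +1` for an edge and `-1` for a nonedge; the names `parity_entry`, `multilinear_form`, and `sign` are illustrative and not from the paper, and the evaluation enumerates all $n^r$ index tuples, so it is only meant for very small graphs.

```python
from itertools import combinations, product

def parity_entry(sign, idx):
    """Tensor entry for the index tuple idx: 0 if an index repeats,
    otherwise the product of the edge signs over all pairs in idx."""
    if len(set(idx)) < len(idx):
        return 0
    value = 1
    for a, b in combinations(idx, 2):
        value *= sign[a][b]
    return value

def multilinear_form(sign, vectors):
    """Evaluate A(x^(1), ..., x^(r)) by summing over every index tuple."""
    n = len(sign)
    total = 0.0
    for idx in product(range(n), repeat=len(vectors)):
        entry = parity_entry(sign, idx)
        if entry == 0:
            continue
        term = float(entry)
        for vec, k in zip(vectors, idx):
            term *= vec[k]
        total += term
    return total

# Example graph: a triangle on {0, 1, 2} plus an isolated vertex 3.
edges = {(0, 1), (1, 2), (0, 2)}
n = 4
sign = [[0] * n for _ in range(n)]
for a in range(n):
    for b in range(n):
        if a != b:
            sign[a][b] = 1 if (min(a, b), max(a, b)) in edges else -1

chi = [3 ** -0.5] * 3 + [0.0]  # normalized indicator vector of the triangle
value = multilinear_form(sign, [chi, chi, chi])
```

For this example every ordered triple of distinct triangle vertices contributes $3^{-3/2}$, so `value` equals $6\cdot 3^{-3/2}$.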

Frieze and Kannan's proof for $r=3$ is combinatorial (as is the proof by Füredi and Komlós for $r=2$), based on counting the number of subgraphs of a certain type. It is not clear how to extend this proof.

Here we prove a nearly optimal bound on the spectral norm of this random tensor for any $r$. This substantially strengthens the connection between the planted clique problem and the tensor norm problem. Our proof is based on a concentration of measure approach. In fact, we first reprove the result for $r=3$ using this approach and then generalize it to tensors of arbitrary dimension. We show that the norm of the subgraph parity tensor of a random graph is at most $f(r)\sqrt{n}\log^{O(r)} n$ whp. More precisely, our main theorem is the following.

###### Theorem 1.

There is a constant such that with probability at least the norm of the -dimensional subgraph parity tensor for the random graph is bounded by

$$\|A\|_2 \le C_1^{r}\, r^{(5r-1)/2}\sqrt{n}\,\log^{(3r-1)/2} n.$$

The main challenge to the proof is the fact that the entries of the tensor are not independent. Bounding the norm of the tensor where every entry is independently $1$ or $-1$ with probability $1/2$ is substantially easier via a combination of an $\epsilon$-net and a Hoeffding bound. In more detail, we approximate the unit ball with a finite (exponential) set of vectors. For each vector $x$ in the discretization, the Hoeffding inequality gives an exponential tail bound on $A(x,\ldots,x)$. A union bound over all points in the discretization then completes the proof. For the parity tensor, however, the Hoeffding bound does not apply as the entries are not independent. Moreover, all the $n^r$ entries of the tensor are fixed by just the $\binom{n}{2}$ edges of the graph. In spite of this heavy inter-dependence, it turns out that $A(x,\ldots,x)$ does concentrate. Our proof is inductive and bounds the norms of vectors encountered in a certain decomposition of the tensor polynomial.

Using Theorem 1, we can show that if the norm problem can be solved for tensors of dimension $r$, one can find planted cliques of size as low as $Cn^{1/r}\,\mathrm{poly}(r,\log n)$. While the norm of the parity tensor for a random graph remains bounded, when a clique of size $p$ is planted, the norm becomes roughly $p^{r/2}$ (using the indicator vector of the clique). Therefore, $p$ only needs to be a little larger than $n^{1/r}$ in order for the clique to become the dominant term in the maximization of $A(\cdot)$. More precisely, we have the following theorem.

###### Theorem 2.

Let be random graph with a planted clique of size , and let be the -parity tensor for . For , let be the time to compute a vector such that whp. Then, for such that

$$n \ge p > C_0\,\alpha^{-2} r^{5} n^{1/r}\log^{3} n,$$

the planted clique can be recovered with high probability in time , where is a fixed constant.
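The lower bound on the norm after planting, mentioned just before Theorem 2, can be checked directly. The following is a sketch under the assumption that the clique occupies a vertex set $S$ with $|S| = p \ge r$; the symbols $S$ and $x = p^{-1/2}\chi^{S}$ are introduced here for illustration. Every entry indexed by $r$ distinct clique vertices is a product of $+1$'s, and there are $p(p-1)\cdots(p-r+1)$ such ordered index tuples, so

```latex
\begin{align*}
A(x,\ldots,x)
  &= p^{-r/2}\, p(p-1)\cdots(p-r+1) \\
  &\ge p^{-r/2}\,(p-r+1)^{r}
   = p^{r/2}\Bigl(1-\frac{r-1}{p}\Bigr)^{r}
   \ge p^{r/2}\Bigl(1-\frac{r(r-1)}{p}\Bigr).
\end{align*}
```

The last step is Bernoulli's inequality, so the value is close to $p^{r/2}$ whenever $r^2$ is small compared to $p$.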

On one hand, this highlights the benefits of finding an efficient (approximation) algorithm for the tensor problem. On the other, given the lack of progress on the clique problem, this is perhaps evidence of the hardness of the tensor maximization problem even for a natural class of random tensors. For example, if finding a clique of size is hard, then by setting we see that even a certain polynomial approximation to the norm of the parity tensor is hard to achieve.

###### Corollary 3.

Let be random graph with a planted clique of size , and let be the -parity tensor for . Let be a small constant and let be the time to compute a vector such that . Then, for

$$p \ge C_0\, r^{5} n^{\frac{1}{2}-\epsilon}\log^{3} n,$$

the planted clique can be recovered with high probability in time , where is a fixed constant.

### 1.1 Overview of analysis

The majority of the paper is concerned with proving Theorem 1. In Section 2.1, we first reduce the problem of bounding over the unit ball to bounding it over a discrete set of vectors that have the same value in every non-zero coordinate. In Section 2.2, we further reduce the problem to bounding the norm of an off-diagonal block of , using a method of Frieze and Kannan. This enables us to assume that if is a valid index, then the random variables used to compute are independent. In Section 2.3, we prove a large deviation inequality (Lemma 6) that allows us to bound norms of vectors encountered in a certain decomposition of the tensor polynomial. This inequality gives us a considerably sharper bound than the Hoeffding or McDiarmid inequalities in our context. We then apply this lemma to bound for as a warm-up and then give the proof for general in Section 3.

In Section 4 we prove Theorem 2. We first show that any vector that comes close to maximizing must be close to the indicator vector of the clique (Lemma 4). Finally, we show that given such a vector it is possible to recover the clique (Lemma 14).

## 2 Preliminaries

### 2.1 Discretization

The analysis of is greatly simplified when is proportional to some indicator vector. Fortunately, analyzing these vectors is sufficient, as any vector can be approximated as a linear combination of relatively few indicator vectors.

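For any vector $x$, we define $x^{(+)}$ to be the vector such that $x^{(+)}_i = x_i$ if $x_i > 0$ and $x^{(+)}_i = 0$ otherwise. Similarly, let $x^{(-)}_i = x_i$ if $x_i < 0$ and $x^{(-)}_i = 0$ otherwise. For a set $S\subseteq[n]$, let $\chi^{S}$ be the indicator vector for $S$, where the $i$th entry is $1$ if $i\in S$ and $0$ otherwise.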

###### Definition 1 (Indicator Decomposition).

For a unit vector $x$, define the sets $S_1, S_2, \ldots$ and $T_1, T_2, \ldots$ through the recurrences

$$S_j = \Bigl\{ i\in[n] : \Bigl(x^{(+)} - \sum_{k=1}^{j-1} 2^{-k}\chi^{S_k}\Bigr)_i > 2^{-j} \Bigr\}$$

and

$$T_j = \Bigl\{ i\in[n] : \Bigl(x^{(-)} - \sum_{k=1}^{j-1} 2^{-k}\chi^{S_k}\Bigr)_i < -2^{-j} \Bigr\}.$$

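Let $y^{(0)}(x) = 0$. For $j \ge 1$, let $y^{(j)}(x) = 2^{-j}\chi^{S_j}$ and let $y^{(-j)}(x) = -2^{-j}\chi^{T_j}$. We call the set $\{y^{(j)}(x)\}_{j=-\infty}^{\infty}$ the indicator decomposition of $x$.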

Clearly,

$$\|y^{(i)}(x)\| \le \max\{\|x^{(+)}\|, \|x^{(-)}\|\} \le 1$$

and

$$\Bigl\| x - \sum_{j=-N}^{N} y^{(j)}(x) \Bigr\| \le \sqrt{n}\,2^{-N}.$$
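A small sketch of the decomposition may help. It peels off the levels $2^{-1}, 2^{-2}, \ldots, 2^{-N}$ greedily from the positive part, and, as an assumption about the intended reading, treats the negative part by applying the same peeling to its magnitudes. The function names `indicator_decomposition` and `reconstruct` are illustrative and not from the paper.

```python
import math

def indicator_decomposition(x, N):
    """Return (S, T) where S[j] and T[j] are the index sets at level 2^-j
    for the positive and negative parts of x (index 0 is unused)."""
    n = len(x)
    pos = [max(v, 0.0) for v in x]
    neg = [max(-v, 0.0) for v in x]  # magnitudes of the negative entries

    def peel(w):
        residual = list(w)
        sets = [set()]
        for j in range(1, N + 1):
            level = 2.0 ** -j
            chosen = {i for i in range(n) if residual[i] > level}
            for i in chosen:
                residual[i] -= level
            sets.append(chosen)
        return sets

    return peel(pos), peel(neg)

def reconstruct(S, T, n):
    """Sum the scaled indicator vectors back into an approximation of x."""
    x_hat = [0.0] * n
    for j in range(1, len(S)):
        for i in S[j]:
            x_hat[i] += 2.0 ** -j
        for i in T[j]:
            x_hat[i] -= 2.0 ** -j
    return x_hat

x = [0.6, -0.8, 0.0]  # a unit vector
N = 10
S, T = indicator_decomposition(x, N)
x_hat = reconstruct(S, T, len(x))
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
```

Each coordinate of a unit vector has magnitude at most $1$, so after $N$ levels the per-coordinate residual is at most $2^{-N}$, which is where the $\sqrt{n}\,2^{-N}$ bound comes from.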

We use this decomposition to prove the following lemma.

###### Lemma 4.

Let

$$U = \bigl\{ k|S|^{-1/2}\chi^{S} : S\subseteq[n],\ k\in\{-1,1\} \bigr\}.$$

For any tensor over where

$$\max_{x^{(1)},\ldots,x^{(r)}\in B(0,1)} A(x^{(1)},\ldots,x^{(r)}) \le (2\lceil r\log n\rceil)^{r} \max_{x^{(1)},\ldots,x^{(r)}\in U} A(x^{(1)},\ldots,x^{(r)})$$

###### Proof.

Consider a fixed set of vectors $x^{(1)},\ldots,x^{(r)}$ and let $N = \lceil r\log n\rceil$. For each $i$, let

$$\hat{x}^{(i)} = \sum_{j=-N}^{N} y^{(j)}(x^{(i)}).$$

We first show that replacing $x^{(i)}$ with $\hat{x}^{(i)}$ gives a good approximation to $A(x^{(1)},\ldots,x^{(r)})$. Letting $\epsilon$ be the maximum difference between an $x^{(i)}$ and its approximation, we have that

$$\max_{i\in[r]} \|x^{(i)} - \hat{x}^{(i)}\| = \epsilon \le \frac{n^{-r/2}}{2r}$$

Because of the multilinear form of $A$ we have

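$$\bigl|A(x^{(1)},\ldots,x^{(r)}) - A(\hat{x}^{(1)},\ldots,\hat{x}^{(r)})\bigr| \le \sum_{i=1}^{r}\epsilon^{i}r^{i}\|A\| \le \frac{\epsilon r}{1-\epsilon r}\|A\| \le n^{-r/2}\|A\| \le 1.$$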

Next, we bound $A(\hat{x}^{(1)},\ldots,\hat{x}^{(r)})$. For convenience, let $Y^{(i)} = \{y^{(j)}(x^{(i)}) : -N \le j \le N\}$. Then using the multilinear form of $A$ and bounding the sum by its maximum term, we have

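$$A(\hat{x}^{(1)},\ldots,\hat{x}^{(r)}) \le (2N)^{r}\max_{v^{(1)}\in Y^{(1)},\ldots,v^{(r)}\in Y^{(r)}} A(v^{(1)},\ldots,v^{(r)}) \le (2N)^{r}\max_{v^{(1)},\ldots,v^{(r)}\in U} A(v^{(1)},\ldots,v^{(r)}).$$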

### 2.2 Sufficiency of off-diagonal blocks

Analysis of $A(x^{(1)},\ldots,x^{(r)})$ is complicated by the fact that all terms with repeated indices are zero. Off-diagonal blocks of $A$ are easier to analyze because no such terms exist. Thankfully, as Frieze and Kannan [3] have shown, analyzing these off-diagonal blocks suffices. Here we generalize their proof to $r>3$.

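For a collection $\{V_1,\ldots,V_r\}$ of subsets of $[n]$, we define

$$A|_{V_1\times\ldots\times V_r}(x^{(1)},\ldots,x^{(r)}) = \sum_{k_1\in V_1,\ldots,k_r\in V_r} A_{k_1\ldots k_r}\, x^{(1)}_{k_1} x^{(2)}_{k_2}\cdots x^{(r)}_{k_r}$$

###### Lemma 5.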

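Let $P$ be the class of partitions of $[n]$ into $r$ equally sized sets $V_1,\ldots,V_r$ (assume wlog that $r$ divides $n$). Let $V = V_1\times\ldots\times V_r$. Let $A$ be a random tensor over $[n]^{r}$ where each entry is in $[-1,1]$ and let $R\subseteq B(0,1)$. If for every fixed $\{V_1,\ldots,V_r\}\in P$, it holds that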

$$\Pr\Bigl[\max_{x^{(1)},\ldots,x^{(r)}\in R} A|_V(x^{(1)},\ldots,x^{(r)}) \ge f(n)\Bigr] \le \delta,$$

then

$$\Pr\Bigl[\max_{x^{(1)},\ldots,x^{(r)}\in R} A(x^{(1)},\ldots,x^{(r)}) \ge 2r^{r}f(n)\Bigr] \le \frac{\delta\, n^{r/2}}{f(n)}.$$

###### Proof of Lemma 5.

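Each $r$-tuple appears in an equal number of partitions and this number is slightly more than an $r^{-r}$ fraction of the total. Therefore,

$$\bigl|A(x^{(1)},\ldots,x^{(r)})\bigr| \le \frac{r^{r}}{|P|}\Bigl|\sum_{\{V_1,\ldots,V_r\}\in P} A|_V(x^{(1)},\ldots,x^{(r)})\Bigr| \le \frac{r^{r}}{|P|}\sum_{\{V_1,\ldots,V_r\}\in P}\bigl|A|_V(x^{(1)},\ldots,x^{(r)})\bigr|$$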

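We say that a partition $\{V_1,\ldots,V_r\}$ is good if

$$\max_{x^{(1)},\ldots,x^{(r)}\in R} A|_V(x^{(1)},\ldots,x^{(r)}) < f(n).$$

Let the good partitions be denoted by $G$ and let $\bar{G} = P\setminus G$. Although the upper bound $f$ does not hold for partitions in $\bar{G}$, the trivial upper bound of $n^{r/2}$ does (recall that every entry in the tensor is in the range $[-1,1]$ and $R\subseteq B(0,1)$). Therefore

$$\bigl|A(x^{(1)},\ldots,x^{(r)})\bigr| \le r^{r}\Bigl(\frac{|G|}{|P|}f + \frac{|\bar{G}|}{|P|}n^{r/2}\Bigr).$$

Since $\mathbb{E}\bigl[|\bar{G}|/|P|\bigr] \le \delta$ by hypothesis, Markov's inequality gives

$$\Pr\Bigl[\frac{|\bar{G}|}{|P|}n^{r/2} > f\Bigr] \le \frac{\delta\, n^{r/2}}{f}$$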

and thus proves the result. ∎

### 2.3 A concentration bound

The following concentration bound is a key tool in our proof of Theorem 1. We apply it for .

###### Lemma 6.

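Let $\{u^{(i)}\}_{i=1}^{N}$ and $\{v^{(i)}\}_{i=1}^{N}$ be collections of vectors of dimension $N'$ where each entry of $u^{(i)}$ is $1$ or $-1$ with probability $1/2$ and $\|v^{(i)}\|_2 \le 1$. Then for any $t > 0$,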

$$\Pr\Bigl[\sum_{i=1}^{N}(u^{(i)}\cdot v^{(i)})^{2} \ge t\Bigr] \le e^{-t/18}\,(4\sqrt{e\pi})^{N}.$$

Before giving the proof, we note that this lemma is stronger than what a naive application of standard theorems would yield for . For instance, one might treat each as an independent random variable and apply a Hoeffding bound. The quantity can vary by as much as , however, so the bound would be roughly for some constant . Similarly, treating each as an independent random variable and applying McDiarmid’s inequality, we find that every can affect the sum by as much as (simultaneously). For instance suppose that every and every . Then flipping would have an effect of , so the bound would be roughly for some constant .

###### Proof of Lemma 6.

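Observe that $\sqrt{\sum_{i=1}^{N}(u^{(i)}\cdot v^{(i)})^{2}}$ is the length of the vector whose $i$th coordinate is $u^{(i)}\cdot v^{(i)}$. Therefore, this is also equivalent to the maximum projection of this vector onto a unit vector:

$$\sqrt{\sum_{i=1}^{N}(u^{(i)}\cdot v^{(i)})^{2}} = \max_{y\in B(0,1)}\sum_{i=1}^{N}\sum_{j=1}^{N'} y_i\, u^{(i)}_j v^{(i)}_j.$$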

We will use an $\epsilon$-net to approximate the unit ball and give an upper bound for this quantity. Let $L$ be the lattice $\bigl(\tfrac{1}{2\sqrt{N}}\mathbb{Z}\bigr)^{N}$.

###### Claim 7.

For any vector $x$,

$$\|x\|_2 \le 2\max_{y\in L\cap B(0,3/2)} y\cdot x.$$

Thus,

$$\sqrt{\sum_{i=1}^{N}(u^{(i)}\cdot v^{(i)})^{2}} \le 2\max_{y\in L\cap B(0,3/2)}\sum_{i=1}^{N} y_i\sum_{j=1}^{N'} u^{(i)}_j v^{(i)}_j.$$

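Consider a fixed $y\in L\cap B(0,3/2)$. Each $u^{(i)}_j$ is $1$ or $-1$ with equal probability, so the expectation for each term is zero. The difference between the upper and lower bounds for a term is

$$2\,\bigl|2y_i u^{(i)}_j v^{(i)}_j\bigr| = 4\,\bigl|y_i v^{(i)}_j\bigr|$$

Therefore,

$$16\sum_{i=1}^{N}\sum_{j=1}^{N'}\bigl(y_i u^{(i)}_j v^{(i)}_j\bigr)^{2} \le 16\sum_{i=1}^{N} y_i^{2}\sum_{j=1}^{N'}\bigl(v^{(i)}_j\bigr)^{2} \le 36.$$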

Applying the Hoeffding bound gives that

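$$\Pr\Bigl[\sum_{i=1}^{N}(u^{(i)}\cdot v^{(i)})^{2} \ge t\Bigr] \le \Pr\Bigl[2\sum_{i=1}^{N} y_i\sum_{j=1}^{N'} u^{(i)}_j v^{(i)}_j \ge \sqrt{t}\Bigr] \le e^{-t/18}.$$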

The result follows by taking a union bound over $L\cap B(0,3/2)$, whose cardinality is bounded according to Claim 8. ∎
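The Hoeffding step in the proof above can be spelled out as follows. This is a sketch assuming the standard form of Hoeffding's inequality for a sum $Z$ of independent mean-zero terms, where the term indexed by $(i,j)$ has range $c_{ij}$; the symbols $Z$, $s$, and $c_{ij}$ are introduced here for illustration only.

```latex
\begin{align*}
\Pr[Z \ge s] &\le \exp\Bigl(-\frac{2s^{2}}{\sum_{i,j} c_{ij}^{2}}\Bigr),
  \qquad Z = 2\sum_{i=1}^{N} y_i \sum_{j=1}^{N'} u^{(i)}_j v^{(i)}_j, \\
\sum_{i,j} c_{ij}^{2} &\le 36 \ \text{ and } \ s = \sqrt{t}
  \quad\Longrightarrow\quad
  \Pr[Z \ge \sqrt{t}] \le \exp\Bigl(-\frac{2t}{36}\Bigr) = e^{-t/18}.
\end{align*}
```

Multiplying by the number of lattice points from Claim 8 then gives the factor $(4\sqrt{e\pi})^{N}$ in Lemma 6.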

###### Claim 8.

The number of lattice points in $L\cap B(0,3/2)$ is at most $(4\sqrt{e\pi})^{N}$.

###### Proof of Claim 8.

Consider the set of hypercubes where each cube is centered on a distinct point in and each has side length of . These cubes are disjoint and their union contains the ball . Their union is also contained in the ball . Thus,

$$|L\cap B(0,3/2)| \le \frac{\mathrm{Vol}(B(0,2))}{(2\sqrt{N})^{-N}} \le \frac{\pi^{N/2}2^{N}}{\Gamma(N/2+1)}\,2^{N}N^{N/2} \le (4\sqrt{e\pi})^{N}.$$

###### Proof of Claim 7.

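Without loss of generality, we assume that $x$ is a unit vector. Let $y$ be the closest point to $x$ in the lattice. In each coordinate $i$, we have $|x_i - y_i| \le (4\sqrt{N})^{-1}$, so overall $\|x-y\| \le 1/4$.

Letting $\theta$ be the angle between $x$ and $y$, we have

$$\frac{x\cdot y}{\|x\|\|y\|} = \cos\theta = \sqrt{1-\sin^{2}\theta} \ge \Bigl(1 - \frac{\|x-y\|^{2}}{\max\{\|x\|^{2},\|y\|^{2}\}}\Bigr)^{1/2} \ge \sqrt{\frac{15}{16}}.$$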

Therefore,

$$x\cdot y \ge \|y\|\sqrt{\frac{15}{16}} \ge \frac{3}{4}\sqrt{\frac{15}{16}} \ge \frac{1}{2}.$$
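Claim 7 can be sanity-checked numerically on a single vector. The sketch below assumes the lattice has spacing $1/(2\sqrt{N})$ in every coordinate, which is the spacing suggested by the constants in the proofs of Claims 7 and 8; the function name `round_to_lattice` is illustrative and not from the paper.

```python
import math

def round_to_lattice(x):
    """Round each coordinate of x to the nearest multiple of 1 / (2 * sqrt(N)),
    where N is the dimension of x (assumed lattice spacing)."""
    N = len(x)
    step = 1.0 / (2.0 * math.sqrt(N))
    return [step * round(v / step) for v in x]

x = [0.6, -0.8, 0.0]  # a unit vector
y = round_to_lattice(x)

dot = sum(a * b for a, b in zip(x, y))
norm_y = math.sqrt(sum(b * b for b in y))
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For this vector the rounded point stays within distance $1/4$ of $x$, lies inside $B(0,3/2)$, and satisfies $2\,y\cdot x \ge \|x\|_2 = 1$, matching the claim.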

## 3 A bound on the norm of the parity tensor

In this section, we prove Theorem 1. First, however, we consider the somewhat more transparent case of $r=3$ using the same proof technique.

### 3.1 Warm-up: third order tensors

For $r=3$ the tensor $A$ is defined as follows:

$$A_{k_1k_2k_3} = E_{k_1k_2}E_{k_2k_3}E_{k_1k_3}.$$

###### Theorem 9.

There is a constant such that with probability

$$\|A\| \le C_1\sqrt{n}\log^{4} n.$$

###### Proof.

Let $V_1, V_2, V_3$ be a partition of the vertices and let $V = V_1\times V_2\times V_3$. The bulk of the proof consists of the following lemma.

###### Lemma 10.

There is some constant $C_3$ such that

$$\max_{x^{(1)},x^{(2)},x^{(3)}\in U} A|_V(x^{(1)},x^{(2)},x^{(3)}) \le C_3\sqrt{n}\log n$$

with probability .

If this bound holds, Lemma 4 then implies that there is some $C_2$ such that

$$\max_{x^{(1)},x^{(2)},x^{(3)}\in B(0,1)} A|_V(x^{(1)},x^{(2)},x^{(3)}) \le C_2\sqrt{n}\log^{4} n.$$

And finally, Lemma 5 implies that for some constant $C_1$

$$\max_{x^{(1)},x^{(2)},x^{(3)}\in B(0,1)} A(x^{(1)},x^{(2)},x^{(3)}) \le C_1\sqrt{n}\log^{4} n$$

with probability for some constant . ∎

###### Proof of Lemma 10.

Define

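$$U_k = \{x\in U : |\mathrm{supp}(x)| = k\} \tag{1}$$

and consider a fixed $n_1, n_2, n_3$. We will show that

$$\max_{(x^{(1)},x^{(2)},x^{(3)})\in U_{n_1}\times U_{n_2}\times U_{n_3}} A|_V(x^{(1)},x^{(2)},x^{(3)}) \le C_3\sqrt{n}\log n$$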

with probability for some constant . Taking a union bound over the choices of then proves the lemma.

We bound the cubic form as

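$$\begin{aligned}
\max_{(x^{(1)},x^{(2)},x^{(3)})\in U_{n_1}\times U_{n_2}\times U_{n_3}} A|_V(x^{(1)},x^{(2)},x^{(3)})
&= \max_{(x^{(1)},x^{(2)},x^{(3)})\in U_{n_1}\times U_{n_2}\times U_{n_3}} \sum_{k_1\in V_1,\,k_2\in V_2,\,k_3\in V_3} A_{k_1k_2k_3}\, x^{(1)}_{k_1}x^{(2)}_{k_2}x^{(3)}_{k_3} \\
&\le \max_{(x^{(2)},x^{(3)})\in U_{n_2}\times U_{n_3}} \sqrt{\sum_{k_1\in V_1}\Bigl(\sum_{k_2\in V_2,\,k_3\in V_3} A_{k_1k_2k_3}\, x^{(2)}_{k_2}x^{(3)}_{k_3}\Bigr)^{2}} \\
&= \max_{(x^{(2)},x^{(3)})\in U_{n_2}\times U_{n_3}} \sqrt{\sum_{k_1\in V_1}\Bigl(\sum_{k_2\in V_2} E_{k_1k_2}\, x^{(2)}_{k_2}\sum_{k_3\in V_3} E_{k_2k_3}\, x^{(3)}_{k_3}E_{k_1k_3}\Bigr)^{2}}.
\end{aligned}$$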

Note that each of the inner sums (over $k_2$ and $k_3$) is the dot product of a random vector (the $E_{k_1k_2}$ and $E_{k_2k_3}$ terms) and another vector. Our strategy will be to bound the norm of this other vector and apply Lemma 6.

In more detail, we view the expression inside the square root as

$$\sum_{k_1\in V_1}\Biggl(\underbrace{\sum_{k_2\in V_2}\underbrace{E_{k_1k_2}}_{u^{(k_1)}_{k_2}}\;\underbrace{x^{(2)}_{k_2}\underbrace{\sum_{k_3\in V_3}\underbrace{E_{k_2k_3}}_{u^{(k_2)}_{k_3}}\;\underbrace{x^{(3)}_{k_3}E_{k_1k_3}}_{v^{(k_1k_2)}(x^{(3)})_{k_3}}}_{u^{(k_2)}\cdot v^{(k_1k_2)}(x^{(3)})}}_{v^{(k_1)}(x^{(2)},x^{(3)})_{k_2}}}_{u^{(k_1)}\cdot v^{(k_1)}(x^{(2)},x^{(3)})}\Biggr)^{2} \tag{2}$$

where $u^{(k_1)}_{k_2} = E_{k_1k_2}$ and $u^{(k_2)}_{k_3} = E_{k_2k_3}$, while

$$v^{(k_1k_2)}(x^{(3)})_{k_3} = x^{(3)}_{k_3}E_{k_1k_3}$$

and

$$v^{(k_1)}(x^{(2)},x^{(3)})_{k_2} = x^{(2)}_{k_2}\bigl(u^{(k_2)}\cdot v^{(k_1k_2)}(x^{(3)})\bigr).$$

Clearly, the $u$'s play the role of the random vectors and we will bound the norms of the $v$'s in the application of Lemma 6.

To apply Lemma 6 with being the index , above, we need a bound for every on the norm of . We argue

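$$\sum_{k_2}\Bigl(x^{(2)}_{k_2}\sum_{k_3\in V_3} E_{k_2k_3}\, x^{(3)}_{k_3}E_{k_1k_3}\Bigr)^{2} \le \max_{k_1\in V_1}\ \max_{x^{(2)}\in U_{n_2}}\ \max_{x^{(3)}\in U_{n_3}} \frac{1}{n_2}\sum_{k_2\in \mathrm{supp}(x^{(2)})}\Bigl(\sum_{k_3} E_{k_2k_3}\, x^{(3)}_{k_3}E_{k_1k_3}\Bigr)^{2} = F_1^{2}$$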

Here we used the fact that . Note that is a function of the random variables only.

To bound $F_1$, we observe that we can apply Lemma 6 to the expression being maximized above, i.e.,

$$\sum_{k_2}\Bigl(\sum_{k_3} E_{k_2k_3}\bigl(x^{(3)}_{k_3}E_{k_1k_3}\bigr)\Bigr)^{2}$$

over the index , with . Now we need a bound, for every and on the norm of the vector . We argue

$$\sum_{k_3}\bigl(x^{(3)}_{k_3}E_{k_1k_3}\bigr)^{2} \le \|x^{(3)}\|_\infty^{2}\sum_{k_3} E_{k_1k_3}^{2} \le 1.$$

Applying Lemma 6 for a fixed and implies

$$\frac{1}{n_2}\sum_{k_2\in \mathrm{supp}(x^{(2)})}\Bigl(\sum_{k_3} E_{k_2k_3}\, x^{(3)}_{k_3}E_{k_1k_3}\Bigr)^{2} > C_3\log n$$

with probability at most

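$$\exp\Bigl(-\frac{C_3 n_2\log n}{18}\Bigr)(4\sqrt{e\pi})^{n_2}.$$

Taking a union bound over the at most $n$ choices of $k_1$, and the at most $n^{n_2}n^{n_3}$ choices for $x^{(2)}$ and $x^{(3)}$, we show that

$$\Pr\bigl[F_1^{2} > C_3\log n\bigr] \le \exp\Bigl(-\frac{C_3 n_2\log n}{18}\Bigr)(4\sqrt{e\pi})^{n_2}\, n\, n^{n_2} n^{n_3}.$$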

This probability is at most for a large enough constant .

Thus, for a fixed and , we can apply Lemma 6 to Eqn. 2 with to get:

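$$\sum_{k_1\in V_1}\Bigl(\sum_{k_2\in V_2} E_{k_1k_2}\Bigl(x^{(2)}_{k_2}\sum_{k_3\in V_3} E_{k_2k_3}\, x^{(3)}_{k_3}E_{k_1k_3}\Bigr)\Bigr)^{2} > F_1^{2}\, C_3\, n\log n$$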

with probability at most . Taking a union bound over the at most choices for and , the bound holds with probability

$$\exp(-C_3 n\log n/18)\,(4\sqrt{e\pi})^{n}\, n^{n_2} n^{n_3} \le n^{-10}/2$$

for large enough constant $C_3$.

Thus, we can bound the squared norm:

 max(x(1),x(2),x(3))∈Un1×Un2×Un3A|V(x(1),