ETH Hardness for Densest-k-Subgraph with Perfect Completeness
Abstract
We show that, assuming the (deterministic) Exponential Time Hypothesis, distinguishing between a graph with an induced k-clique and a graph in which all k-subgraphs have density at most 1 − ε requires n^{Ω̃(log n)} time. Our result essentially matches the quasi-polynomial algorithms of Feige and Seltser [FS97] and Barman [Bar15b] for this problem, and is the first one to rule out an additive PTAS for Densest k-Subgraph. We further strengthen this result by showing that our lower bound continues to hold when, in the soundness case, even subgraphs smaller by a near-polynomial factor are assumed to be at most (1 − ε)-dense.
Our reduction is inspired by recent applications of the “birthday repetition” technique [AIM14, BKW15]. Our analysis relies on information-theoretic machinery and is similar in spirit to analyzing a parallel repetition of two-prover games in which the provers may choose to answer some challenges multiple times, while completely ignoring other challenges.
1 Introduction
Clique is one of the most fundamental problems in computer science: given a graph G and a number k, decide whether G has a fully connected induced subgraph on k vertices. Since it was proven NP-complete by Karp [Kar72], extensive research has investigated the complexity of relaxed versions of this problem.
This work focuses on two natural relaxations of Clique which have received significant attention from both the algorithmic and complexity communities. The first one is to relax the “k”, i.e. to look for a smaller subgraph:
Problem 1.1 (Approximate Max Clique, Informal).
Given an n-vertex graph G, decide whether G contains a clique of size k, or all induced cliques of G are of size at most δk, for some constant δ < 1.
The second natural relaxation is to relax the “Clique” requirement, replacing it with the more modest goal of finding a subgraph that is almost a clique:
Problem 1.2 (Densest Subgraph with perfect completeness, Informal).
Given an n-vertex graph G containing a clique of size k, find an induced subgraph of G of size k with (edge) density at least 1 − ε, for some small constant ε > 0. (More modestly, given an n-vertex graph G, decide whether G contains a clique of size k, or all induced subgraphs of G of size k have density at most 1 − ε.)
Today, after a long line of research [FGL96, AS98, ALM98, Hås99, Kho01, Zuc07], we have a solid understanding of the inapproximability of Problem 1.1. In particular, we know that it is NP-hard to distinguish between a graph that has a clique of size n^{1−ε}, and a graph whose largest induced clique is of size at most n^{ε}, for any constant ε > 0 [Zuc07]. The computational complexity of the second relaxation (Problem 1.2) remained largely open. There are a couple of quasi-polynomial algorithms that guarantee finding a (1 − ε)-dense subgraph of size k in every graph containing a k-clique [FS97, Bar15b] (Barman [Bar15b] approximates the Densest k-Bi-Subgraph problem; Densest k-Subgraph can be handled via a simple modification [Bar15a]), suggesting that this problem is not NP-hard. Yet we know neither polynomial-time algorithms, nor general impossibility results for this problem.
In this work we provide strong evidence that the aforementioned quasi-polynomial time algorithms for Problem 1.2 [FS97, Bar15b] are essentially tight, assuming the (deterministic) Exponential Time Hypothesis (ETH), which postulates that any deterministic algorithm for 3SAT requires 2^{Ω(n)} time [IP01]. In fact, we show that under ETH, both parameters of the above relaxations are simultaneously hard to approximate:
Theorem 1.3 (Main Result).
There exists a universal constant ε > 0 such that, assuming the (deterministic) Exponential Time Hypothesis, distinguishing between the following requires time n^{Ω̃(log n)}, where n is the number of vertices of G.
Completeness

G has an induced k-clique; and

Soundness

Every induced subgraph of G of size k has density at most 1 − ε.
Our result has implications for two major open problems whose computational complexity has remained elusive for more than two decades: the (general) Densest k-Subgraph problem, and the Planted Clique problem.
The Densest k-Subgraph problem, DkS(d, d′), is the same as (the decision version of) Problem 1.2, except that in the “completeness” case, G has a k-subgraph with density d, and in the “soundness” case, every k-subgraph of G is of density at most d′, where d′ < d. Since Problem 1.2 is a special case of this problem, our main theorem can also be viewed as a new inapproximability result for DkS. We remark that the aforementioned quasi-polynomial algorithms for the “perfect completeness” regime completely break in the sparse regime, and indeed it is believed that DkS (for d′ < d = o(1)) in fact requires much more than quasi-polynomial time [BCV12]. The best to-date algorithm for Densest k-Subgraph, due to Bhaskara et al., is guaranteed to find a k-subgraph whose density is within an n^{1/4+ε} multiplicative factor of that of the densest subgraph of size k [BCV12], and thus DkS can be solved efficiently whenever the density gap is larger than this factor (this improved upon a previous n^{1/3−δ}-approximation, for some small constant δ > 0, of Feige et al. [FKP01]). Making further progress on either the lower or upper bound frontier of this problem is a major open problem.
Several inapproximability results for Densest k-Subgraph were known against specific classes of algorithms [BCV12] or under assumptions that are incomparable to, or stronger (thus giving weaker hardness results) than, ETH: [Kho06], Unique Games with expansion [RS10], and hardness of random k-CNF [Fei02, AAM11]. The most closely related result is by Khot [Kho06], who shows that the Densest k-Subgraph problem has no PTAS unless SAT can be solved in randomized subexponential time. The result of [Kho06], as well as the other aforementioned works, focuses on the sub-constant density regime, i.e. they show hardness of distinguishing between a graph where every k-subgraph is sparse, and one where every k-subgraph is extremely sparse. In contrast, our result has perfect completeness and provides the first additive inapproximability for Densest k-Subgraph — the best one can hope for, as per the upper bound of [Bar15b].
The Planted Clique problem is a special case of our problem, where the inputs come from a specific distribution (G(n, 1/2) versus G(n, 1/2) plus “a planted clique of size n^δ”, where δ is some constant, typically δ < 1/2). The Planted Clique Conjecture ([AAK07, AKS98, Jer92, Kuc95, FK00, DGGP10]) asserts that distinguishing between the aforementioned cases cannot be done in polynomial time, and has served as the underlying hardness assumption in a variety of recent applications, including machine learning and cryptography (e.g. [AAK07, BBB13, BR13]), that inherently use the average-case nature of the problem, as well as in reductions to worst-case problems (e.g. [HK11, AAM11, CLLR15, BPR15b]).
The main drawback of average-case hardness assumptions is that many average-case instances (even those of worst-case-hard problems) are in fact tractable. In recent years, the centrality of the planted clique conjecture inspired several works that obtain lower bounds in restricted models of computation [FGR13, MPW15, DM15]. Nevertheless, a general lower bound for the average-case planted clique problem appears out of reach for existing lower bound techniques. Therefore, an important potential application of our result is replacing average-case assumptions, such as the planted-clique conjecture, in applications that do not inherently rely on the distributional nature of the inputs (e.g., when the ultimate goal is to prove a worst-case hardness result). In such applications, there is a good chance that planted clique hardness assumptions can be replaced with a more “conventional” hardness assumption, such as ETH, even when the problem has a quasi-polynomial algorithm. Recently, such a replacement of the planted clique conjecture with ETH was obtained for the problem of finding an approximate Nash equilibrium with approximately optimal social welfare [BKW15].
We also remark that, while showing hardness for Planted Clique from worst-case assumptions seems beyond the reach of current techniques, our result can also be seen as circumstantial evidence that this problem may indeed be hard. In particular, any polynomial time algorithm (if one exists) would have to inherently use the (rich and well-understood) structure of G(n, 1/2).
Techniques
Our simple construction is inspired by the “birthday repetition” technique which appeared recently in [AIM14, BKW15, BPR15a]: given a 2CSP (e.g. 3COL), we have a vertex for each tuple of roughly √n variables together with an assignment to those variables (respectively, 3COL vertices and their colorings). We connect two vertices by an edge whenever their assignments are consistent and satisfy all 2CSP constraints induced on these tuples. In the completeness case, a clique consists of all the vertices that correspond to a fixed satisfying assignment. In the soundness case (where the value of the 2CSP is low), the “birthday paradox” guarantees that most pairs of vertices (i.e. two tuples of roughly √n variables) will have a significant probability of intersecting and inducing non-empty CSP constraints, thus resulting in lower densities whenever the 2CSP does not have a satisfying assignment. In the language of two-prover games, the intuition here is that the verifier has a constant chance of catching the players in a lie if they are trying to cheat in the game while not satisfying the CSP.
While our construction is simple, analyzing it is intricate. The main challenge is to rule out a “cheating” dense subgraph that consists of different assignments to the same variables (inconsistent colorings of the same vertices in 3COL). Intuitively, this is similar in spirit to proving a parallel repetition theorem where the provers can answer some questions multiple times, and completely ignore other questions. Continuing with the parallel repetition metaphor, notice that the challenge is doubled: in addition to a cheating prover correlating her answers (the standard obstacle to parallel repetition), each prover can now also correlate which questions she chooses to answer. Our argument follows by showing that a sufficiently large subgraph must accumulate many non-edges (violations of either 2CSP or consistency constraints). To this end we introduce an information-theoretic argument that carefully counts the entropy of choosing a random vertex in the dense subgraph.
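The “birthday” intuition above can be checked with a short calculation: two independent random tuples of about √n out of n variables intersect with constant probability. The following standalone sketch uses illustrative sizes (n = 100, k = 10 = √n); the names are ours, not the paper's.

```python
from math import comb

# Two players each receive a uniformly random k-subset of the n variables.
# For k ~ sqrt(n), the subsets intersect with constant probability -- the
# "birthday paradox" that drives the birthday-repetition reduction.
n, k = 100, 10  # illustrative sizes, with k = sqrt(n)

# Pr[two random k-subsets of [n] are disjoint] = C(n - k, k) / C(n, k)
p_disjoint = comb(n - k, k) / comb(n, k)
p_intersect = 1 - p_disjoint
```

For these sizes the intersection probability is roughly 2/3, i.e. a constant, which is what gives the verifier a constant chance of checking a shared variable or constraint.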
1.1 Open problems
There are several interesting open problems related to our work. Below we list four of them that are of particular interest and have potential applications.
Strengthening the inapproximability factor
Our result states that it is hard to distinguish between a graph containing a k-clique and a graph that does not contain a very dense ((1 − ε)-dense) k-subgraph. The latter seems to be a limitation of our technique. None of the algorithms we know (including the two quasi-polynomial time algorithms mentioned above) can distinguish in polynomial time between a graph containing a k-clique and a graph that does not contain even a slightly dense (δ-dense) k-subgraph, for any constant δ > 0, and in fact even for some sub-constant values of δ. Furthermore, there is evidence [AAM11] that this problem may indeed be hard. This naturally leads to the following problem.
Problem 1.4 (Hardness Amplification).
Show that for every given constant δ > 0, distinguishing between the following two cases is ETH-hard:

There exists S of size k such that den(S) = 1.

All S of size k have den(S) ≤ δ.
We remark that a similar amplification, from “clique versus dense” (density 1 vs. 1 − ε) to “clique versus sparse” (density 1 vs. δ), was shown by Alon et al. in the case where the “clique vs. dense” instance is drawn at random according to the planted clique model [AAM11]. (Unfortunately, their techniques do not seem to apply to our hard instance.)
An easier variant of Problem 1.4 is to show hardness for a large gap in the imperfect completeness regime.
Problem 1.5 (Hardness Amplification, imperfect completeness).
Show that there exist parameters 1 > d > d′ > 0 for which distinguishing between the following two cases is ETH-hard:

There exists S of size k such that den(S) ≥ d.

All S of size k have den(S) ≤ d′.
Beyond quasipolynomial hardness
Another interesting challenge is to trade the perfect completeness in our main result for stronger notions of hardness. Indeed, there is substantial evidence suggesting that the “sparse vs. very sparse” regime (d′ < d = o(1)) is much harder to solve. The gap instance in [BCV12], on which all known linear and semidefinite programming techniques fail, is a very sparse instance with a polynomial integrality gap. In particular, every vertex in that instance has very low degree, compared to the almost-linear average degree in our instance. Since no other algorithms succeed in this regime (even in quasi-polynomial time), it is natural to look for stronger lower bounds on the running time.
Problem 1.6 (Tradingoff perfect completeness for stronger lower bounds).
Show that there exist parameters 1 > d > d′ > 0 for which distinguishing between the following two cases is NP-hard:

There exists S of size k such that den(S) ≥ d.

All S of size k have den(S) ≤ d′.
Finding Stable Communities
The problem of finding Stable Communities is tightly related to Densest Subgraph, and has received recent attention in the context of social networks and learning theory [AGSS12, AGM13, BL13].
Definition 1.7 (Stable Communities [BBB13]).
Let α, β with α > β be two positive parameters. Given an undirected graph G = (V, E), a set S ⊆ V is an (α, β)-cluster if S is:

Internally Dense: for every v ∈ S, |N(v) ∩ S| ≥ α|S|.

Externally Sparse: for every u ∉ S, |N(u) ∩ S| ≤ β|S|.
Currently, only planted-clique-based hardness is known.
Theorem 1.8 ([BBB13]).
For sufficiently small (constant) β, finding an (α, β)-cluster is at least as hard as Planted Clique.
As suggested in the introduction, we believe it is plausible and interesting to see whether the hardness assumption of the theorem above can be replaced with ETH.
Problem 1.9 (Hardness of Stable Communities).
Show that for some parameters α > β, finding an (α, β)-cluster is ETH-hard.
2 Preliminaries
Throughout the paper we use den(S) to denote the (edge) density of a subgraph S: den(S) := |E(S)| / (|S| choose 2), i.e. the fraction of pairs of vertices of S that are connected by an edge.
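As a running sanity check, this density can be computed directly from an edge list. The sketch below assumes the standard normalization den(S) = |E(S)| / (|S| choose 2); the function name is ours.

```python
def density(vertices, edges):
    """Edge density of the subgraph induced on `vertices`:
    number of induced edges divided by the number of vertex pairs."""
    vs = set(vertices)
    induced = sum(1 for u, v in edges if u in vs and v in vs)
    pairs = len(vs) * (len(vs) - 1) // 2
    return induced / pairs

# A triangle is 1-dense; a path on 3 vertices is 2/3-dense.
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
```

In this normalization a clique has density exactly 1, which is what “perfect completeness” refers to.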
2.1 Information theory
In this section, we introduce the information-theoretic quantities used in this paper. For a more thorough introduction, the reader should refer to [CT12]. Unless stated otherwise, all logarithms in this paper are base 2.
Definition 2.1.
Let μ be a probability distribution on sample space Ω. The Shannon entropy (or just entropy) of μ, denoted by H(μ), is defined as H(μ) := Σ_{x∈Ω} μ(x) log(1/μ(x)).
Definition 2.2 (Binary Entropy Function).
For p ∈ [0, 1], the binary entropy function is defined as follows (with a slight abuse of notation): H(p) := p log(1/p) + (1 − p) log(1/(1 − p)).
Fact 2.3 (Concavity of Binary Entropy).
Let μ be a distribution on [0, 1], and let p = E_{x∼μ}[x]. Then E_{x∼μ}[H(x)] ≤ H(p).
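Definition 2.2 and Fact 2.3 can be illustrated numerically (base-2 logarithms, as everywhere in the paper); the helper below is our own naming, not the paper's.

```python
from math import log2

def binary_entropy(p):
    """H(p) = p log(1/p) + (1-p) log(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Concavity: an average of entropies never exceeds the entropy of the average.
ps, weights = [0.2, 0.8], [0.5, 0.5]
avg_of_H = sum(w * binary_entropy(p) for w, p in zip(weights, ps))
H_of_avg = binary_entropy(sum(w * p for w, p in zip(weights, ps)))
```

Here avg_of_H ≈ 0.72 while H_of_avg = H(0.5) = 1, consistent with Fact 2.3.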
For a random variable A we shall write H(A) to denote the entropy of the induced distribution on the support of A. We use the same abuse of notation for other information-theoretic quantities appearing later in this section.
Definition 2.4.
The conditional entropy of a random variable A conditioned on B is defined as H(A | B) := E_{b∼B}[H(A | B = b)].
Fact 2.5 (Chain Rule).
H(AB) = H(A) + H(B | A).
Fact 2.6 (Conditioning Decreases Entropy).
H(A | B) ≤ H(A).
Another measure we will use (briefly) in our proof is that of mutual information, which informally captures the correlation between two random variables.
Definition 2.7 (Conditional Mutual Information).
The mutual information between two random variables A and B, denoted by I(A; B), is defined as I(A; B) := H(A) − H(A | B) = H(B) − H(B | A).
The conditional mutual information between A and B given C, denoted by I(A; B | C), is defined as I(A; B | C) := H(A | C) − H(A | BC).
The following is a wellknown fact on mutual information.
Fact 2.8 (Data processing inequality).
Suppose we have the following Markov chain: A → B → C,
where A and C are independent conditioned on B. Then I(A; B) ≥ I(A; C), or equivalently, H(A | B) ≤ H(A | C).
Mutual Information is related to the following distance measure.
Definition 2.9 (Kullback-Leibler Divergence).
Given two probability distributions μ and ν on the same sample space Ω such that supp(μ) ⊆ supp(ν), the Kullback-Leibler divergence (also known as relative entropy) between them is defined as D_KL(μ ‖ ν) := Σ_{x∈Ω} μ(x) log(μ(x)/ν(x)).
The connection between the mutual information and the Kullback-Leibler divergence is provided by the following fact.
Fact 2.10.
For random variables A and B we have I(A; B) = E_{b∼B}[D_KL((A | B = b) ‖ A)].
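Fact 2.10 can be checked numerically on a small joint distribution: below, the mutual information is computed once as H(A) − H(A | B) and once as the expected KL divergence between the conditional and marginal distributions of A. This is a sketch with our own naming.

```python
from math import log2

# Joint distribution of (A, B) over {0,1} x {0,1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(dist):
    """Shannon entropy (base 2) of a distribution given as {outcome: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

pA = {a: sum(p for (x, _), p in joint.items() if x == a) for a in (0, 1)}
pB = {b: sum(p for (_, y), p in joint.items() if y == b) for b in (0, 1)}
cond = {b: {a: joint[(a, b)] / pB[b] for a in (0, 1)} for b in (0, 1)}

# I(A;B) = H(A) - H(A|B)
mi = H(pA) - sum(pB[b] * H(cond[b]) for b in (0, 1))

# I(A;B) = E_b[ D_KL( (A | B=b) || A ) ]
kl = sum(pB[b] * sum(cond[b][a] * log2(cond[b][a] / pA[a])
                     for a in (0, 1) if cond[b][a] > 0)
         for b in (0, 1))
```

Both computations agree (here I(A; B) ≈ 0.278 bits), as Fact 2.10 predicts.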
2.2 2CSP and the PCP Theorem
In the 2CSP problem, we are given a graph G = (V, E) on n vertices, where each of the edges (u, v) ∈ E is associated with some constraint function ψ_{u,v} : Σ × Σ → {0, 1} which specifies a set of legal “colorings” of u and v, from some finite alphabet Σ (the “2” in the term “2CSP” stands for the “arity” of each constraint, which always involves two variables). Let us denote by ψ the entire 2CSP instance, and define by OPT(ψ) the maximum fraction of satisfied constraints in the associated graph G, over all possible assignments (colorings) of V.
The starting point of our reduction is the following version of the PCP theorem, which asserts that it is NP-hard to distinguish between a 2CSP instance whose value is 1, and one whose value is 1 − η, where η is some small constant:
Theorem 2.11 (PCP Theorem [Din07]).
Given a 3SAT instance φ of size n, there is a polynomial time reduction that produces a 2CSP instance ψ, with |ψ| = n · polylog n variables and constraints, and constant alphabet size, such that:

(Completeness) If OPT(φ) = 1 then OPT(ψ) = 1.

(Soundness) If OPT(φ) < 1 then OPT(ψ) < 1 − η, for some constant η > 0.

(Balance) Every vertex in ψ has degree d, for some constant d.
In the appendix, we describe in detail how to derive this formulation of the PCP Theorem from that of e.g. [AIM14].
Notice that since the size of the reduction is near-linear, ETH implies that solving the above problem requires near-exponential time.
Corollary 2.12.
Let ψ be as in Theorem 2.11. Then assuming ETH, distinguishing between OPT(ψ) = 1 and OPT(ψ) < 1 − η requires time 2^{Ω̃(n)}.
3 Main Proof
3.1 Construction
Let ψ be the 2CSP instance produced by the reduction in Theorem 2.11, i.e. a constraint graph over n variables with an alphabet Σ of constant size. We construct the following graph G′:

Let and .

Vertices of G′ correspond to all possible assignments (colorings) to all k-tuples of variables in ψ. Each vertex is of the form (V_1, …, V_k, α_1, …, α_k), where V_1, …, V_k are the chosen variables of ψ, and α_i is the corresponding assignment to variable V_i.

If a vertex (V_1, …, V_k, α_1, …, α_k) violates any 2CSP constraints, i.e. if there is a constraint on (V_i, V_j) in ψ which is not satisfied by (α_i, α_j), then it is an isolated vertex in G′.

Let u = (V_1, …, V_k, α_1, …, α_k) and v = (V′_1, …, V′_k, α′_1, …, α′_k). (u, v) ∈ E(G′) iff:

(u, v) does not violate any consistency constraints: for every shared variable V_i = V′_j, the corresponding assignments agree, α_i = α′_j;

and (u, v) also does not violate any 2CSP constraints: for every constraint of ψ on a pair (V_i, V′_j) (if one exists), the assignments (α_i, α′_j) satisfy that constraint.

Notice that the size of our reduction (number of vertices of ) is .
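The construction above can be prototyped on toy instances. The following sketch is entirely ours (names, data representation, and the tuple size k are illustrative stand-ins, not the paper's exact parameters): vertices are assignments to k-tuples of variables, and two vertices are adjacent iff they agree on shared variables and violate no 2CSP constraint.

```python
from itertools import combinations, product

def ok(assign, u, v, constraints):
    """Check the 2CSP constraint on the pair {u, v}, if one exists.
    `constraints` maps (min, max) variable pairs to sets of allowed labels."""
    key = (min(u, v), max(u, v))
    c = constraints.get(key)
    return c is None or (assign[key[0]], assign[key[1]]) in c

def build_graph(n_vars, alphabet, constraints, k):
    """One vertex per (k-tuple of variables, assignment to that tuple);
    vertices whose own tuple violates a constraint are dropped (they are
    isolated in the paper's construction)."""
    vertices = []
    for tup in combinations(range(n_vars), k):
        for labels in product(alphabet, repeat=k):
            a = dict(zip(tup, labels))
            if all(ok(a, u, v, constraints) for u, v in combinations(tup, 2)):
                vertices.append(a)

    def adjacent(a, b):
        shared = set(a) & set(b)
        if any(a[x] != b[x] for x in shared):        # consistency violation
            return False
        merged = {**a, **b}
        return all(ok(merged, u, v, constraints)     # 2CSP violation
                   for u, v in combinations(sorted(merged), 2))

    return vertices, adjacent

# Toy satisfiable 2CSP: equality constraints on the pairs (0,1) and (2,3).
constraints = {(0, 1): {(0, 0), (1, 1)}, (2, 3): {(0, 0), (1, 1)}}
vertices, adjacent = build_graph(4, [0, 1], constraints, k=2)

# Completeness: the 6 vertices consistent with the all-zeros satisfying
# assignment (one per 2-tuple of the 4 variables) form a clique.
clique = [a for a in vertices if all(lbl == 0 for lbl in a.values())]
```

On this toy instance the clique has (4 choose 2) = 6 vertices, matching the completeness argument below.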
Completeness
If OPT(ψ) = 1, then G′ has a large clique: fix a satisfying assignment for ψ, and let S be the set of all vertices that are consistent with this assignment. Notice that |S| = (n choose k), as each k-tuple of variables contributes exactly one vertex consistent with the assignment. Furthermore, its vertices do not violate any consistency constraints (since they all agree with a single assignment), or 2CSP constraints (since we started from a satisfying assignment).
4 Soundness
Suppose that OPT(ψ) < 1 − η, and let ε be some constant to be determined later. We shall show that for any sufficiently large subset S, den(S) ≤ 1 − ε, where ε is some constant depending on η. The remainder of this section is devoted to proving the following theorem:
Theorem 4.1 (Soundness).
If OPT(ψ) < 1 − η, then den(S) ≤ 1 − ε for every sufficiently large subset S, for some constant ε > 0.
4.1 Setting up the entropy argument
Fix some sufficiently large subset S, and let X be a uniformly chosen vertex in S (recall that X corresponds to a tuple of chosen variables together with labels for those variables). Let V_i denote the indicator variable associated with X such that V_i = 1 if the i'th variable appears in X and V_i = 0 otherwise. We let A_i represent the coloring assignment (label) for the i'th variable whenever V_i = 1. Throughout the proof, let
X_{<i} = (V_1, A_1, …, V_{i−1}, A_{i−1})
denote the i'th prefix corresponding to X. We can write, using the chain rule:
H(X) = Σ_i H(V_i, A_i | X_{<i}),
since (V, A) determines X. Notice that since X and (V, A) determine each other, and X was uniform on a set of size |S|, we have
Observation 4.2.
H(X) = log |S|.
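Spelled out, the decomposition used here is the entropy chain rule applied to the n (indicator, label) coordinate pairs. The following LaTeX sketch is our rendering in the prefix notation defined above, not the paper's verbatim display:

```latex
H(X) \;=\; \sum_{i=1}^{n} H\left(V_i, A_i \mid X_{<i}\right)
      \;=\; \sum_{i=1}^{n} \left[ H\left(V_i \mid X_{<i}\right)
            + H\left(A_i \mid V_i, X_{<i}\right) \right],
\qquad\text{and}\qquad
H(X) \;=\; \log |S| .
```

The proof proceeds by splitting this budget of log |S| bits between the "challenge" terms H(V_i | X_{<i}) and the "assignment" terms H(A_i | V_i, X_{<i}).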
Thus, in total, the choice of challenge (which variables appear) and the choice of assignments (their labels) should contribute log |S| to the entropy of X. If much of the entropy comes from the assignment distribution (conditioned on the fixed challenge variables), we will show that S must have many consistency violations, implying that S is sparse. If, on the other hand, almost all the entropy comes from the challenge distribution, we will show that this implies many CSP constraint violations (using the soundness assumption). From now on, we denote
When conditioning on the i'th prefix, we shall write , and similarly for . Also, for brevity, we denote
Prefix graphs
The consistency constraints induce, for each index i, a graph over the prefixes: the vertices are the prefixes, and two prefixes are connected by an edge if their labels are consistent. (We can ignore the 2CSP constraints for now — the prefix graph will be used only in the analysis of the consistency constraints.) Formally,
Definition 4.3 (Prefix graph).
For i ∈ [n], let the i'th prefix graph be defined over the prefixes of length i − 1 as follows. We say that one prefix is a neighbor of another if they do not violate any consistency constraints. Namely, for every j < i, if V_j = 1 in both prefixes, then both prefixes assign the same label to the j'th variable.
In particular, we will heavily use the following notation: let N(x_{<i}) be the prefix neighborhood of x_{<i}, i.e. the set of all prefixes (of length i − 1) that are consistent with x_{<i}. For technical reasons of normalization, we let x_{<i} ∈ N(x_{<i}), i.e. all the prefixes have self-loops.
Notice that the prefix graph is defined over (prefixes of) the vertices of S (the original subgraph). The set of edges of S is contained in the set of edges of the prefix graph, since in the latter we only remove pairs that violate consistency constraints (recall that we ignore the 2CSP constraints).
Unless stated otherwise, we always think of prefixes as weighted by their probabilities. Naturally, we also define the weighted degree and weighted edge density of the prefix graph.
Definition 4.4 (Prefix degree and density).
The prefix degree of is given by:
Similarly, we define the prefix density of as:
When it is clear from the context, we henceforth drop the prefix qualification, and simply refer to the neighborhood or degree, etc., of .
Notice that in S, the probabilities are uniformly distributed. In particular, the density of each prefix graph is at least den(S), since, as we mentioned earlier, the set of edges of S is contained in that of the prefix graph. Finally, observe also that, because we accumulate violations, the density of the prefix graphs is monotonically non-increasing with i.
Observation 4.5.
Useful approximations
We use the following bounds on H(V_i | X_{<i}) and H(A_i | V_i, X_{<i}) many times throughout the proof:
Fact 4.6.
Fact 4.7.
Proof.
The first bound follows from the concavity of entropy (Fact 2.3). For the second bound, observe that H(A_i | V_i, X_{<i}) is maximized by spreading the mass uniformly over the alphabet Σ. ∎
We also recall some elementary approximations to logarithms and entropies that will be useful in the analysis. The proofs are deferred to the appendix.
Fact 4.8.
For then,
More useful to us will be the following bounds on :
Fact 4.9.
Let , and as specified in the construction. Then,
In particular, this means that most indices should contribute roughly entropy to the choice of .
We will also need the following bound which relates the entropies of a very biased coin and a slightly less biased one:
Fact 4.10.
Let , then
4.2 Consistency violations
In this section, we show that if the entropy contribution of the assignments (the A_i's) is large, then there are many consistency violations between vertices, which lead to a constant loss in density. First, we show that if the assignment entropy is large, then at least a constant fraction of this entropy is concentrated on “good” variables.
Definition 4.11 (Good Variables).
We say that an index is good if
where is a constant to be determined later in the proof.
Claim 4.12.
For any constant , if ,
Proof.
We want to show that many of the indices have both a large and a large simultaneously. We can write
Using Facts 4.6 and 4.7, we have
(1) 
Because the subgraph is of size , from the expansion of (Fact 4.9),
where the second inequality follows from the concavity of entropy. Plugging into (1), we have
Rearranging, we get
(2) 
For all the i's in the LHS summation, by Fact 4.7. From now on, we will consider only i's that satisfy this condition. Now, using the premise on and (2) we have:
where the second inequality follows from our approximation for (Fact 4.8).
We want to further restrict our attention to i's for which is at least (i.e., good i's). Note that the above inequality can be decomposed as
Now via a simple sum bound,
Rearranging, we get,
By Cauchy-Schwarz we have:
Finally, since ,
∎
In the same spirit, we now define a notion of a “good” prefix. Intuitively, conditioning on a good prefix leaves a significant amount of entropy on the i'th index. We also require that a good prefix have a high prefix degree; that is, it has many neighbors that it could potentially lose when revealing the i'th label.
Definition 4.13 (Good Prefixes).
We say is a good prefix if

is good;

;

where , with an arbitrarily small constant that denotes the fraction of assignments that disagree with the majority of the assignments, factor, and a constant that satisfies , with .
In the following claim, we show that these prefixes contribute some constant fraction of entropy, assuming that our subset is dense.
Claim 4.14.
If , where and , then for every good index , it holds that
Proof.
We begin by proving that most prefixes satisfy the degree condition of Definition 4.13. Let be popular if is a good variable and its degree in the prefix graph is at least . Recall that (by Observation 4.5). Thus, by Markov's inequality, at most a fraction of the prefixes are unpopular.
Let be the indicator variable for being popular. For the sake of contradiction, suppose that more than fraction of the mass is concentrated on unpopular prefixes, that is:
(3) 
We would like to argue that this condition implies that the distribution on the ’s is highly biased by the conditioning on the (popularity of the) prefix; this in turn implies that , the expected conditional entropy of , must be low, contradicting the assumption that is good. Indeed, by the data-processing inequality (Fact 2.8),
(4)  
Since we can write mutual information as an expected KL-divergence (Fact 2.10), and KL-divergence is nonnegative, we get
where the second inequality follows from the fact that for all good i's, our degree assumption implies , and our assumption in (3) implies, via Bayes' rule, that , and therefore . Note that by our setting of parameters .
Plugging into (4) we have:
(5) 
On the other hand, recall that since is good, . Recall also that , and therefore . Thus, we get a contradiction to (3). From now on we assume
(6) 
This implies that even if the assignment is uniform over the alphabet, the contribution to from unpopular prefixes is small:
where the first inequality follows from Fact 4.7, the second from (6), the third from our setting of , and the fourth from since is good. Therefore,
Using a similar argument, we show that for any popular , most of the mass is concentrated on its neighbors. Consider any popular , and let denote the complement of . Then we can rewrite as:
Notice that since is popular, has measure at most . Thus, if an fraction of the mass is concentrated on , we once again (as in (5)) have
which would again yield a contradiction to being a good variable. Therefore every popular prefix also satisfies the weighted condition on the degree:
(7) 
Recall that a prefix is good if it also satisfies . Fortunately, prefixes that violate this condition (i.e. those with small ), cannot account for much of the weight on :
Since is good and , this implies:
since
where the last inequality follows from being good. ∎
Corollary 4.15.
For every good index ,
Lemma 4.16 (Labeling Entropy Bound).
If , then .
Proof.
Assume for a contradiction that . For prefix , let denote the induced distribution on labels of the i'th variable, conditioned on and . (If , take an arbitrary distribution.) After revealing each variable, the loss in prefix density is given by the probability of “fresh violations”: the sum over all prefix edges of the probability that they assign different labels to the i'th variable: