ETH Hardness for Densest k-Subgraph with Perfect Completeness
We show that, assuming the (deterministic) Exponential Time Hypothesis, distinguishing between a graph with an induced $k$-clique and a graph in which all $k$-subgraphs have density at most $1-\varepsilon$ requires time $n^{\tilde{\Omega}(\log n)}$. Our result essentially matches the quasi-polynomial algorithms of Feige and Seltser [FS97] and Barman [Bar15b] for this problem, and is the first one to rule out an additive PTAS for Densest $k$-Subgraph. We further strengthen this result by showing that our lower bound continues to hold when, in the soundness case, even subgraphs smaller by a near-polynomial factor are assumed to be at most $(1-\varepsilon)$-dense.
Our reduction is inspired by recent applications of the “birthday repetition” technique [AIM14, BKW15]. Our analysis relies on information theoretical machinery and is similar in spirit to analyzing a parallel repetition of two-prover games in which the provers may choose to answer some challenges multiple times, while completely ignoring other challenges.
$k$-Clique is one of the most fundamental problems in computer science: given a graph, decide whether it has a fully connected induced subgraph on $k$ vertices. Since it was proven NP-complete by Karp [Kar72], extensive research has investigated the complexity of relaxed versions of this problem.
This work focuses on two natural relaxations of $k$-Clique which have received significant attention from both the algorithmic and complexity communities. The first one is to relax "$k$", i.e. to look for a smaller subgraph:
Problem 1.1 (Approximate Max Clique, Informal).
Given an $n$-vertex graph $G$, decide whether $G$ contains a clique of size $k$, or all induced cliques of $G$ are of size at most $\delta k$, for some constant $\delta < 1$.
The second natural relaxation is to relax the “Clique” requirement, replacing it with the more modest goal of finding a subgraph that is almost a clique:
Problem 1.2 (Densest -Subgraph with perfect completeness, Informal).
Given an $n$-vertex graph $G$ containing a clique of size $k$, find an induced subgraph of $G$ of size $k$ with (edge) density at least $1-\varepsilon$, for some small $\varepsilon > 0$. (More modestly, given an $n$-vertex graph $G$, decide whether $G$ contains a clique of size $k$, or all induced $k$-subgraphs of $G$ have density at most $1-\varepsilon$.)
Today, after a long line of research [FGL96, AS98, ALM98, Hås99, Kho01, Zuc07], we have a solid understanding of the inapproximability of Problem 1.1. In particular, we know that it is NP-hard to distinguish between a graph that has a clique of size $k$, and a graph whose largest induced clique is of size at most $\delta k$, for any constant $\delta > 0$ [Zuc07]. The computational complexity of the second relaxation (Problem 1.2) remained largely open. There are a couple of quasi-polynomial algorithms that guarantee finding a $(1-\varepsilon)$-dense $k$-subgraph in every graph containing a $k$-clique [FS97, Bar15b] (Barman [Bar15b] approximates the Densest $k$-Bi-Subgraph problem; Densest $k$-Subgraph can be handled via a simple modification [Bar15a]), suggesting that this problem is not NP-hard. Yet we know neither polynomial-time algorithms, nor general impossibility results for this problem.
In this work we provide strong evidence that the aforementioned quasi-polynomial time algorithms for Problem 1.2 [FS97, Bar15b] are essentially tight, assuming the (deterministic) Exponential Time Hypothesis (ETH), which postulates that any deterministic algorithm for 3SAT requires time $2^{\Omega(n)}$ [IP01]. In fact, we show that under ETH, both parameters of the above relaxations are simultaneously hard to approximate:
Theorem 1.3 (Main Result).
There exists a universal constant $\varepsilon > 0$ such that, assuming the (deterministic) Exponential Time Hypothesis, distinguishing between the following requires time $n^{\tilde{\Omega}(\log n)}$, where $n$ is the number of vertices of $G$:
$G$ has an induced $k$-clique; and
every induced subgraph of $G$ of size $k$ has density at most $1-\varepsilon$.
Our result has implications for two major open problems whose computational complexity remained elusive for more than two decades: The (general) Densest -Subgraph problem, and the Planted Clique problem.
The Densest $k$-Subgraph problem is the same as (the decision version of) Problem 1.2, except that in the "completeness" case, $G$ has a $k$-subgraph with density $d$, and in the "soundness" case, every $k$-subgraph is of density at most $c \cdot d$, for some $c < 1$. Since Problem 1.2 is a special case of this problem, our main theorem can also be viewed as a new inapproximability result for Densest $k$-Subgraph. We remark that the aforementioned quasi-polynomial algorithms for the "perfect completeness" regime completely break in the sparse regime, and indeed it is believed that (for small densities $d$) Densest $k$-Subgraph in fact requires much more than quasi-polynomial time [BCV12]. The best algorithm to date for Densest $k$-Subgraph, due to Bhaskara et al., is guaranteed to find a $k$-subgraph whose density is within an $O(n^{1/4+\varepsilon})$-multiplicative factor of the densest subgraph of size $k$ [BCV12] (this improved upon a previous $O(n^{1/3-\delta})$-approximation of Feige et al. [FKP01]). Making further progress on either the lower or upper bound frontier of this problem is a major open problem.
Several inapproximability results for Densest $k$-Subgraph were known against specific classes of algorithms [BCV12] or under assumptions that are incomparable to or stronger (thus giving weaker hardness results) than ETH: [Kho06], Unique Games with expansion [RS10], and hardness of random $k$-CNF [Fei02, AAM11]. The most closely related result is by Khot [Kho06], who shows that the Densest $k$-Subgraph problem has no PTAS unless SAT can be solved in randomized subexponential time. The result of [Kho06], as well as the other aforementioned works, focuses on the sub-constant density regime, i.e. they show hardness of distinguishing between a graph where every $k$-subgraph is sparse, and one where every $k$-subgraph is extremely sparse. In contrast, our result has perfect completeness and provides the first additive inapproximability for Densest $k$-Subgraph: the best one can hope for, as per the upper bound of [Bar15b].
The Planted Clique problem is a special case of our problem, where the inputs come from a specific distribution ($G(n, 1/2)$ versus $G(n, 1/2)$ with a planted clique of size $n^{\delta}$, where $\delta$ is some constant, typically $\delta = 1/2$). The Planted Clique Conjecture [AAK07, AKS98, Jer92, Kuc95, FK00, DGGP10] asserts that distinguishing between the aforementioned cases cannot be done in polynomial time, and it has served as the underlying hardness assumption in a variety of recent applications including machine learning and cryptography (e.g. [AAK07, BBB13, BR13]) that inherently use the average-case nature of the problem, as well as in reductions to worst-case problems (e.g. [HK11, AAM11, CLLR15, BPR15b]).
The main drawback of average-case hardness assumptions is that many average-case instances (even those of worst-case-hard problems) are in fact tractable. In recent years, the centrality of the planted clique conjecture inspired several works that obtain lower bounds in restricted models of computation [FGR13, MPW15, DM15]. Nevertheless, a general lower bound for the average-case planted clique problem appears out of reach for existing lower bound techniques. Therefore, an important potential application of our result is replacing average-case assumptions such as the planted-clique conjecture, in applications that do not inherently rely on the distributional nature of the inputs (e.g., when the ultimate goal is to prove a worst-case hardness result). In such applications, there is a good chance that planted clique hardness assumptions can be replaced with a more “conventional” hardness assumption, such as the ETH, even when the problem has a quasi-polynomial algorithm. Recently, such a replacement of the planted clique conjecture with ETH was obtained for the problem of finding an approximate Nash equilibrium with approximately optimal social welfare [BKW15].
We also remark that, while showing hardness for Planted Clique from worst-case assumptions seems beyond the reach of current techniques, our result can also be seen as circumstantial evidence that this problem may indeed be hard. In particular, any polynomial time algorithm (if one exists) would have to inherently use the (rich and well-understood) structure of $G(n, 1/2)$.
Our simple construction is inspired by the "birthday repetition" technique which appeared recently in [AIM14, BKW15, BPR15a]: given a 2CSP (e.g. 3COL), we have a vertex for each $\sqrt{n}$-tuple of variables and assignments (respectively, 3COL vertices and colorings). We connect two vertices by an edge whenever their assignments are consistent and satisfy all 2CSP constraints induced on these tuples. In the completeness case, a clique consists of choosing all the vertices that correspond to a fixed satisfying assignment. In the soundness case (where the value of the 2CSP is low), the "birthday paradox" guarantees that most pairs of vertices (i.e. two $\sqrt{n}$-tuples of variables) will have a significant intersection (nonempty CSP constraints), thus resulting in lower densities whenever the 2CSP does not have a satisfying assignment. In the language of two-prover games, the intuition here is that the verifier has a constant chance of catching the players in a lie if they are trying to cheat in the game without satisfying the CSP.
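To make the construction concrete, here is a toy Python sketch of the birthday-repetition graph (our own illustration; the miniature 2CSP, the variable names, and the tiny parameters are hypothetical, whereas the real reduction uses tuples of size roughly $\sqrt{n}$ with the parameters fixed in Section 3):

```python
import itertools

# Toy 2CSP: variables 0..n-1 over a constant alphabet; each constraint maps
# a pair of variables to the set of allowed label pairs.
n, t = 4, 2                               # t plays the role of ~sqrt(n)
sigma = (0, 1)                            # constant-size alphabet
constraints = {(0, 1): {(0, 1), (1, 0)}}  # a single "not-equal"-style constraint

def satisfies(x, alpha):
    """Check that the partial assignment alpha to the tuple x violates no
    constraint whose endpoints both lie inside x."""
    pos = {v: a for v, a in zip(x, alpha)}
    for (u, v), allowed in constraints.items():
        if u in pos and v in pos and (pos[u], pos[v]) not in allowed:
            return False
    return True

# One vertex per (t-tuple of variables, assignment to that tuple).
vertices = [(x, a) for x in itertools.combinations(range(n), t)
            for a in itertools.product(sigma, repeat=t)]

def edge(u, v):
    """Edge iff the two partial assignments agree on shared variables
    (consistency) and jointly violate no 2CSP constraint."""
    (x, a), (y, b) = u, v
    joint = dict(zip(x, a))
    for var, lab in zip(y, b):
        if joint.setdefault(var, lab) != lab:
            return False                   # consistency violation
    return satisfies(tuple(joint), tuple(joint.values()))
```

In the completeness case, fixing one global satisfying assignment and taking every tuple labeled accordingly yields a clique, exactly as described above.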
While our construction is simple, analyzing it is intricate. The main challenge is to rule out a “cheating” dense subgraph that consists of different assignments to the same variables (inconsistent colorings of the same vertices in 3COL). Intuitively, this is similar in spirit to proving a parallel repetition theorem where the provers can answer some questions multiple times, and completely ignore other questions. Continuing with the parallel repetition metaphor, notice that the challenge is doubled: in addition to a cheating prover correlating her answers (the standard obstacle to parallel repetition), each prover can now also correlate which questions she chooses to answer. Our argument follows by showing that a sufficiently large subgraph must accumulate many non-edges (violations of either 2CSP or consistency constraints). To this end we introduce an information theoretic argument that carefully counts the entropy of choosing a random vertex in the dense subgraph.
1.1 Open problems
There are several interesting open problems related to our work. We list four of them that are of particular interest and have potential applications.
Strengthening the inapproximability factor
Our result states that it is hard to distinguish between a graph containing a $k$-clique and a graph that does not contain a very dense ($(1-\varepsilon)$-dense) $k$-subgraph. The latter seems to be a limitation of our technique. None of the algorithms we know (including the two quasi-polynomial time algorithms mentioned above) can distinguish in polynomial time between a graph containing a $k$-clique and a graph that does not contain even a slightly dense ($\delta$-dense) $k$-subgraph, for any constant $\delta > 0$, and in fact even for some sub-constant values of $\delta$. Furthermore, there is evidence [AAM11] that this problem may indeed be hard. This naturally leads to the following problem.
Problem 1.4 (Hardness Amplification).
Show that for every given constant $\delta > 0$, distinguishing between the following two cases is ETH-hard:
There exists $S \subseteq V$ of size $k$ such that $\mathrm{den}(S) = 1$.
All $S \subseteq V$ of size $k$ have $\mathrm{den}(S) \le \delta$.
We remark that a similar amplification, from "clique versus dense" ($1$ vs. $1-\varepsilon$) to "clique versus sparse" ($1$ vs. $\delta$), was shown by Alon et al. when the "clique vs. dense" instance is drawn at random according to the planted clique model [AAM11]. (Unfortunately, their techniques do not seem to apply to our hard instance.)
An easier variant of Problem 1.4 is to show hardness for a large gap in the imperfect completeness regime.
Problem 1.5 (Hardness Amplification - imperfect completeness).
Show that there exist parameters $d$ and $c < 1$ for which distinguishing between the following two cases is ETH-hard:
There exists $S \subseteq V$ of size $k$ such that $\mathrm{den}(S) \ge d$.
All $S \subseteq V$ of size $k$ have $\mathrm{den}(S) \le c \cdot d$.
Beyond quasi-polynomial hardness
Another interesting challenge is to trade the perfect completeness in our main result for stronger notions of hardness. Indeed, there is substantial evidence suggesting that the "sparse vs. very-sparse" regime is much harder to solve. The gap instance in [BCV12], on which all known linear and semidefinite programming techniques fail, is a very sparse instance with a polynomial integrality gap. In particular, every vertex in that instance has low degree, compared to the almost linear average degree in our instance. Since no other algorithms succeed in this regime (even in quasi-polynomial time), it is natural to look for stronger lower bounds on the running time.
Problem 1.6 (Trading-off perfect completeness for stronger lower bounds).
Show that there exist parameters $d$ and $c < 1$ for which distinguishing between the following two cases is NP-hard:
There exists $S \subseteq V$ of size $k$ such that $\mathrm{den}(S) \ge d$.
All $S \subseteq V$ of size $k$ have $\mathrm{den}(S) \le c \cdot d$.
Finding Stable Communities
Definition 1.7 (Stable Communities [BBB13]).
Let $\alpha < \beta$ be two positive parameters. Given an undirected graph $G = (V, E)$, a subset $S \subseteq V$ is an $(\alpha, \beta)$-cluster if $S$ is:
Internally Dense: $\forall v \in S$, $|N(v) \cap S| \ge \beta |S|$.
Externally Sparse: $\forall u \in V \setminus S$, $|N(u) \cap S| \le \alpha |S|$.
Currently, only planted clique based hardness is known.
Theorem 1.8 ([BBB13]).
For sufficiently small (constant) $\alpha < \beta$, finding an $(\alpha, \beta)$-cluster is at least as hard as Planted Clique.
As suggested in the introduction, we believe it is plausible and interesting to see whether the hardness assumption of the theorem above can be replaced with ETH.
Problem 1.9 (Hardness of Stable Communities).
Show that for some constants $\alpha < \beta$, finding an $(\alpha, \beta)$-cluster is ETH-hard.
Throughout the paper we use $\mathrm{den}(S)$ to denote the (edge) density of a subgraph $S$: $\mathrm{den}(S) = |E(S)| / \binom{|S|}{2}$, where $E(S)$ is the set of edges with both endpoints in $S$.
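In code, this density measure is simply (an illustrative helper of ours, using a 4-cycle as example input):

```python
from itertools import combinations

def density(vertices, edges):
    """Edge density of the induced subgraph on `vertices`:
    |E(S)| / (|S| choose 2).  `edges` is a set of frozensets."""
    S = set(vertices)
    if len(S) < 2:
        return 1.0
    internal = sum(1 for u, v in combinations(S, 2)
                   if frozenset((u, v)) in edges)
    return internal / (len(S) * (len(S) - 1) / 2)

# A 4-cycle on {0,1,2,3}: 4 of the 6 possible edges are present.
E = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
```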
2.1 Information theory
In this section, we introduce the information-theoretic quantities used in this paper. For a more thorough introduction, the reader should refer to [CT12]. Unless stated otherwise, all logarithms in this paper are base $2$.
Let $A$ be a probability distribution on a sample space $\Omega$. The Shannon entropy (or just entropy) of $A$, denoted by $H(A)$, is defined as $H(A) = \sum_{\omega \in \Omega} A(\omega) \log \frac{1}{A(\omega)}$.
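This definition translates directly into code (a minimal helper of ours, used only for illustration):

```python
from math import log2

def H(dist):
    """Shannon entropy (base 2) of a distribution given as {outcome: prob}.
    Terms with zero probability contribute nothing, by convention."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)
```

For example, a fair coin has entropy exactly one bit, and a uniform distribution on $2^{k}$ outcomes has entropy $k$.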
Definition 2.2 (Binary Entropy Function).
For $p \in [0, 1]$, the binary entropy function is defined as follows (with a slight abuse of notation): $H(p) = p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p}$, with the convention $0 \log \frac{1}{0} = 0$.
Fact 2.3 (Concavity of Binary Entropy).
Let $B$ be a distribution on $[0, 1]$, and let $b \sim B$. Then $\mathbb{E}_{b}[H(b)] \le H(\mathbb{E}[b])$.
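Fact 2.3 is Jensen's inequality for the concave binary entropy function; a quick numerical sanity check (our own illustration, with an arbitrary finitely supported distribution on $[0,1]$):

```python
from math import log2

def h(p):
    """Binary entropy function H(p)."""
    return 0.0 if p in (0.0, 1.0) else \
        p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

# A finitely supported distribution on [0, 1]: {value: probability}.
A = {0.1: 0.3, 0.5: 0.4, 0.9: 0.3}
lhs = sum(q * h(a) for a, q in A.items())      # E[H(b)]
rhs = h(sum(q * a for a, q in A.items()))      # H(E[b])
```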
For a random variable $A$ we shall write $H(A)$ to denote the entropy of the induced distribution on the support of $A$. We use the same abuse of notation for other information-theoretic quantities appearing later in this section.
Definition 2.4 (Conditional Entropy).
The conditional entropy of a random variable $A$ conditioned on $B$ is defined as $H(A \mid B) = \mathbb{E}_{b \sim B}\left[H(A \mid B = b)\right]$.
Fact 2.5 (Chain Rule).
$H(AB) = H(A) + H(B \mid A)$.
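The chain rule can be verified numerically on any finite joint distribution; a small sketch (the joint distribution below is an arbitrary example of ours):

```python
from math import log2
from collections import defaultdict

def H(dist):
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

# A joint distribution on pairs (a, b).
joint = {('x', 0): 0.3, ('x', 1): 0.2, ('y', 0): 0.1, ('y', 1): 0.4}

marg_a = defaultdict(float)
for (a, b), p in joint.items():
    marg_a[a] += p

# H(B|A) = sum_a Pr[a] * H(B | A = a)
H_b_given_a = sum(
    pa * H({b: p / pa for (a2, b), p in joint.items() if a2 == a})
    for a, pa in marg_a.items()
)
chain_lhs = H(joint)                         # H(A, B)
chain_rhs = H(dict(marg_a)) + H_b_given_a    # H(A) + H(B|A)
```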
Fact 2.6 (Conditioning Decreases Entropy).
$H(A \mid B) \le H(A)$.
Another measure we will use (briefly) in our proof is that of Mutual Information, which informally captures the correlation between two random variables.
Definition 2.7 (Conditional Mutual Information).
The mutual information between two random variables $A$ and $B$, denoted by $I(A; B)$, is defined as $I(A; B) := H(A) - H(A \mid B) = H(B) - H(B \mid A)$.
The conditional mutual information between $A$ and $B$ given $C$, denoted by $I(A; B \mid C)$, is defined as $I(A; B \mid C) := H(A \mid C) - H(A \mid BC)$.
The following is a well-known fact on mutual information.
Fact 2.8 (Data processing inequality).
Suppose we have the following Markov Chain: $A \to B \to C$,
where $A$ and $C$ are independent conditioned on $B$. Then $I(A; B) \ge I(A; C)$ or, equivalently, $H(A \mid B) \le H(A \mid C)$.
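A small numerical check of the data processing inequality on a binary Markov chain (the flip probabilities below are arbitrary choices of ours): $B$ is a noisy copy of a uniform bit $A$, and $C$ is a noisy copy of $B$, so $C$ can carry no more information about $A$ than $B$ does.

```python
from math import log2
from itertools import product

def H(dist):
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def mutual_info(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint {(x, y): prob}."""
    mx, my = {}, {}
    for (x, y), p in joint.items():
        mx[x] = mx.get(x, 0) + p
        my[y] = my.get(y, 0) + p
    return H(mx) + H(my) - H(joint)

# Markov chain A -> B -> C with independent bit flips.
flip_ab, flip_bc = 0.1, 0.2
joint_ab, joint_ac = {}, {}
for a, b, c in product((0, 1), repeat=3):
    p = 0.5
    p *= (1 - flip_ab) if b == a else flip_ab
    p *= (1 - flip_bc) if c == b else flip_bc
    joint_ab[(a, b)] = joint_ab.get((a, b), 0) + p
    joint_ac[(a, c)] = joint_ac.get((a, c), 0) + p
```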
Mutual Information is related to the following distance measure.
Definition 2.9 (Kullback-Leibler Divergence).
Given two probability distributions $P$ and $Q$ on the same sample space $\Omega$, such that $P$ is absolutely continuous with respect to $Q$, the Kullback-Leibler divergence (also known as relative entropy) between them is defined as $D_{KL}(P \,\|\, Q) = \sum_{\omega \in \Omega} P(\omega) \log \frac{P(\omega)}{Q(\omega)}$.
The connection between the mutual information and the Kullback-Leibler divergence is provided by the following fact.
Fact 2.10.
For random variables $A$ and $B$ we have $I(A; B) = \mathbb{E}_{b \sim B}\left[ D_{KL}\big( (A \mid B = b) \,\big\|\, A \big) \right]$.
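This identity can also be checked numerically (the joint distribution below is an arbitrary example of ours):

```python
from math import log2

def H(dist):
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def kl(P, Q):
    """D_KL(P || Q) = sum_x P(x) log(P(x)/Q(x))."""
    return sum(p * log2(p / Q[x]) for x, p in P.items() if p > 0)

joint = {('x', 0): 0.3, ('x', 1): 0.2, ('y', 0): 0.1, ('y', 1): 0.4}
marg_a = {'x': 0.5, 'y': 0.5}
marg_b = {0: 0.4, 1: 0.6}

# I(A;B) = E_{b ~ B}[ D_KL( (A | B=b) || A ) ]
expected_kl = sum(
    pb * kl({a: joint[(a, b)] / pb for a in marg_a}, marg_a)
    for b, pb in marg_b.items()
)
mi = H(marg_a) + H(marg_b) - H(joint)   # I(A;B)
```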
2.2 2CSP and the PCP Theorem
In the 2CSP problem, we are given a graph $G = (V, E)$ on $n$ vertices, where each of the edges $(u, v) \in E$ is associated with some constraint function $\psi_{u,v} \colon \Sigma \times \Sigma \to \{0, 1\}$ which specifies a set of legal "colorings" of $u$ and $v$, from some finite alphabet $\Sigma$ (the "2" in the term "2CSP" stands for the arity of each constraint, which always involves two variables). Let us denote by $\psi$ the entire instance, and define by $\mathrm{OPT}(\psi)$ the maximum fraction of satisfied constraints in the associated graph $G$, over all possible assignments (colorings) of $V$.
The starting point of our reduction is the following version of the PCP theorem, which asserts that it is NP-hard to distinguish between a 2CSP instance whose value is $1$, and one whose value is at most $1 - \eta$, where $\eta$ is some small constant:
Theorem 2.11 (PCP Theorem [Din07]).
Given a 3SAT instance $\varphi$ of size $n$, there is a polynomial time reduction that produces a 2CSP instance $\psi$, with $|\psi| = n \cdot \mathrm{polylog}\, n$ variables and constraints, and constant alphabet size, such that
(Completeness) If $\mathrm{SAT}(\varphi) = 1$ then $\mathrm{OPT}(\psi) = 1$.
(Soundness) If $\mathrm{SAT}(\varphi) < 1$ then $\mathrm{OPT}(\psi) < 1 - \eta$, for some constant $\eta > 0$.
(Balance) Every vertex in $\psi$ has degree $d_0$, for some constant $d_0$.
In the appendix, we describe in detail how to derive this formulation of the PCP Theorem from that of e.g. [AIM14].
Notice that since the size of the reduction is near linear, ETH implies that solving the above problem requires near exponential time.
Let $\psi$ be as in Theorem 2.11. Then, assuming ETH, distinguishing between $\mathrm{OPT}(\psi) = 1$ and $\mathrm{OPT}(\psi) < 1 - \eta$ requires time $2^{\tilde{\Omega}(n)}$.
3 Main Proof
Let $\psi$ be the instance produced by the reduction in Theorem 2.11, i.e. a 2CSP constraint graph over $n$ variables with an alphabet $\Sigma$ of constant size. We construct the following graph $G$:
Let $t = \sqrt{n}$ and $k = \binom{n}{t}$.
Vertices of $G$ correspond to all possible assignments (colorings) to all $t$-tuples of variables in $\psi$. Each vertex is of the form $v = (\mathbf{x}, \boldsymbol{\alpha})$, where $\mathbf{x} = (x_1, \dots, x_t)$ are the chosen variables of $v$, and $\alpha_i \in \Sigma$ is the corresponding assignment to variable $x_i$.
If $v = (\mathbf{x}, \boldsymbol{\alpha})$ violates any constraints, i.e. if there is a constraint of $\psi$ on some pair $(x_i, x_j)$ which is not satisfied by $(\alpha_i, \alpha_j)$, then $v$ is an isolated vertex in $G$.
Let $u = (\mathbf{x}, \boldsymbol{\alpha})$ and $v = (\mathbf{y}, \boldsymbol{\beta})$. Then $(u, v) \in E(G)$ iff:
$(u, v)$ does not violate any consistency constraints: for every shared variable $x_i = y_j$, the corresponding assignments agree, $\alpha_i = \beta_j$;
and $(u, v)$ also does not violate any 2CSP constraints: for every constraint of $\psi$ on a pair $(x_i, y_j)$ (if one exists), the assignments $(\alpha_i, \beta_j)$ satisfy the constraint.
Notice that the size of our reduction (the number of vertices of $G$) is $N = \binom{n}{t} \cdot |\Sigma|^{t}$.
If $\mathrm{OPT}(\psi) = 1$, then $G$ has a $k$-clique: Fix a satisfying assignment for $\psi$, and let $S$ be the set of all vertices that are consistent with this assignment. Notice that $|S| = \binom{n}{t} = k$. Furthermore, its vertices do not violate any consistency constraints (since they agree with a single assignment), or 2CSP constraints (since we started from a satisfying assignment).
Suppose that $\mathrm{OPT}(\psi) < 1 - \eta$, and let $\varepsilon > 0$ be some constant to be determined later. We shall show that for any subset $S$ of size $k$, $\mathrm{den}(S) \le 1 - \varepsilon$, where $\varepsilon$ is some constant depending on $\eta$. The remainder of this section is devoted to proving the following theorem:
Theorem 4.1 (Soundness).
If $\mathrm{OPT}(\psi) < 1 - \eta$, then $\mathrm{den}(S) \le 1 - \varepsilon$ for every $S$ of size $k$, for some constant $\varepsilon > 0$.
4.1 Setting up the entropy argument
Fix some subset $S$ of size $k$, and let $v = (\mathbf{x}, \boldsymbol{\alpha})$ be a uniformly chosen vertex in $S$ (recall that $\boldsymbol{\alpha}$ is a vector of $t$ coordinates, corresponding to labels for a subset of $t$ chosen variables). Let $X_i$ denote the indicator variable associated with variable $i$, such that $X_i = 1$ if the $i$'th variable appears in $\mathbf{x}$ and $X_i = 0$ otherwise. We let $A_i$ represent the coloring assignment (label) for the $i$'th variable whenever $X_i = 1$, so that $v$ is of the form $(X_1 A_1, \dots, X_n A_n)$. Throughout the proof, let
$(XA)_{<i} := (X_1 A_1, \dots, X_{i-1} A_{i-1})$ denote the $i$'th prefix corresponding to $v$. By the chain rule (Fact 2.5), we can write $H(v)$:
$H(v) = \sum_{i=1}^{n} H(X_i A_i \mid (XA)_{<i})$. Notice that since $v$ and $(X_1 A_1, \dots, X_n A_n)$ determine each other, and $v$ was uniform on a set of size $k$, we have $H(v) = \log k$.
Thus, in total, the choice of challenge ($X$) and the choice of assignments ($A$) should contribute $\log k$ to the entropy of $v$. If much of the entropy comes from the assignment distribution (conditioned on the fixed challenge variables), we will show that $S$ must have many consistency violations, implying that $S$ is sparse. If, on the other hand, almost all the entropy comes from the challenge distribution, we will show that this implies many CSP constraint violations (using the soundness assumption). From now on, we denote
When conditioning on the ’th prefix, we shall write , and similarly for . Also for brevity, we denote
The consistency constraints induce, for each $i$, a graph over the prefixes: the vertices are the prefixes, and two prefixes are connected by an edge if their labels are consistent. (We can ignore the 2CSP constraints for now; the prefix graph will be used only in the analysis of the consistency constraints.) Formally,
Definition 4.3 (Prefix graph).
For $i \in [n]$, let the $i$-th prefix graph be defined over the prefixes of length $i$ as follows. We say that $(xa)_{<i}$ is a neighbor of $(x'a')_{<i}$ if they do not violate any consistency constraints. Namely, for all $j < i$, if $x_j = x'_j = 1$ for both prefixes, then both assign the same label, $a_j = a'_j$.
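The consistency relation of Definition 4.3 can be sketched as follows (our own encoding: a prefix is a tuple of (indicator, label) pairs, with label None where the variable is absent; note in the example that consistency is not transitive):

```python
def consistent(p, q):
    """Two prefixes are neighbors in the prefix graph iff every variable
    revealed by BOTH of them receives the same label."""
    return all(lp == lq
               for (xp, lp), (xq, lq) in zip(p, q)
               if xp == 1 and xq == 1)

p1 = ((1, 'a'), (0, None), (1, 'b'))
p2 = ((1, 'a'), (1, 'c'), (0, None))
p3 = ((0, None), (1, 'c'), (1, 'z'))
```

Here p1 is consistent with p2, and p2 with p3, yet p1 and p3 disagree on the third variable.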
In particular, we will heavily use the following notation: the prefix neighborhood of a prefix is the set of all prefixes (of length $i$) that are consistent with it. For technical reasons of normalization, we let every prefix be a neighbor of itself, i.e. all the prefixes have self-loops.
Notice that the final prefix graph is defined over the vertices of $S$ (the original subgraph). The set of edges on $S$ is contained in the set of edges of the prefix graph, since in the latter we only remove pairs that violate consistency constraints (recall that we ignore the 2CSP constraints).
Unless stated otherwise, we always think of prefixes as weighted by their probabilities. Naturally, we also define the weighted degree and weighted edge density of the prefix graph.
Definition 4.4 (Prefix degree and density).
The prefix degree of is given by:
Similarly, we define the prefix density of as:
When it is clear from the context, we henceforth drop the prefix qualification, and simply refer to the neighborhood or degree, etc., of .
Notice that in , the probabilities are uniformly distributed. In particular, , since, as we mentioned earlier, the set of edges in is contained in that of . Finally, observe also that because we accumulate violations, the density of the prefix graphs is monotonically non-increasing with .
We use the following bounds on and many times throughout the proof:
The bound on follows from concavity of entropy (Fact 2.3). For the second bound, observe that is maximized by spreading mass uniformly over alphabet . ∎
We also recall some elementary approximations to logarithms and entropies that will be useful in the analysis. The proofs are deferred to the appendix.
More useful to us will be the following bounds on :
Let , and as specified in the construction. Then,
In particular, this means that most indices should contribute roughly entropy to the choice of .
We will also need the following bound which relates the entropies of a very biased coin and a slightly less biased one:
Let , then
4.2 Consistency violations
In this section, we show that if the entropy contribution of the assignments ($A$) is large, there are many consistency violations between vertices, which lead to a constant density loss. First, we show that if this contribution is large, then at least a constant fraction of it is concentrated on "good" variables.
Definition 4.11 (Good Variables).
We say that an index is good if
where is a constant to be determined later in the proof.
For any constant , if ,
We want to show that many of the indices have both a large and a large simultaneously. We can write
Because the subgraph is of size , from the expansion of (Fact 4.9),
where the second inequality follows from the concavity of entropy. Plugging into (1), we have
Rearranging, we get
where the second inequality follows from our approximation for (Fact 4.8).
We want to further restrict our attention to ’s for which is at least (aka good ’s). Note that the above inequality can be decomposed to
Now via a simple sum bound,
Rearranging, we get,
By the Cauchy-Schwarz inequality we have:
Finally, since ,
In the same spirit, we now define a notion of a “good” prefix. Intuitively, conditioning on a good prefix leaves a significant amount of entropy on the ’th index. We also require that a good prefix has a high prefix degree; that is, it has many neighbors it could potentially lose when revealing the -th label.
Definition 4.13 (Good Prefixes).
We say is a good prefix if
where the first quantity is an arbitrarily small constant that denotes the fraction of assignments that may disagree with the majority of the assignments, and the remaining constants satisfy the relations fixed later in the proof.
In the following claim, we show that these prefixes contribute some constant fraction of entropy, assuming that our subset is dense.
If , where and , then for every good index , it holds that
We begin by proving that most prefixes satisfy the degree condition of Definition 4.13. Let a prefix be popular if it corresponds to a good variable and its degree in the prefix graph is at least the threshold above. Recall the density bound of Observation 4.5. Thus, by Markov's inequality, at most a small fraction of the prefixes are unpopular.
Let be the indicator variable for being popular. For the sake of contradiction, suppose that more than -fraction of the -mass is concentrated on unpopular prefixes, that is:
We would like to argue that this condition implies that the distribution on the prefixes is highly biased by the conditioning on the (popularity of the) prefix; this in turn implies that the expected conditional entropy must be low, contradicting the assumption that the index is good. Indeed, by the data processing inequality (Fact 2.8),
Since we can write mutual information as expected KL-divergence (Fact 2.10), and KL-divergence is non-negative, we get
where the second inequality follows from the fact that for all good ’s, our degree assumption implies , and our assumption in (3) implies, via Bayes rule, that , and therefore . Note that by our setting of parameters .
Plugging into (4) we have:
On the other hand, recall that since is good, . Recall also that , and therefore . Thus, we get a contradiction to (3). From now on we assume
This implies that even if the assignment is uniform over the alphabet, the contribution to from unpopular prefixes is small:
Using a similar argument, we show that for any popular , most of the mass is concentrated on its neighbors. Consider any popular , and let denote the complement of . Then we can rewrite as:
Notice that since is popular, has measure at most . Thus, if an -fraction of the mass is concentrated on , we once again (like in (5)) have
which would again yield a contradiction to being a good variable. Therefore every popular prefix also satisfies the -weighted condition on the degree:
Recall that a prefix is good if it also satisfies . Fortunately, prefixes that violate this condition (i.e. those with small ), cannot account for much of the weight on :
Since is good and , this implies:
where the last inequality follows from the index being good. ∎
For every good index ,
Lemma 4.16 (Labeling Entropy Bound).
If , then .
Assume for a contradiction that . For prefix , let denote the induced distribution on labels to the -th variable, conditioned on and . (If , take an arbitrary distribution.) After revealing each variable , the loss in prefix density is given by the probability of “fresh violations”: the sum over all prefix edges of the probability that they assign different labels to the -th variable: