Maximum Cliques in Graphs with Small Intersection Number and Random Intersection Graphs
In this paper, we relate the problem of finding a maximum clique to the intersection number of the input graph (i.e. the minimum number of cliques needed to edge cover the graph). In particular, we consider the maximum clique problem for graphs with small intersection number and random intersection graphs (a model in which each one of $m$ labels is chosen independently with probability $p$ by each one of $n$ vertices, and there is an edge between any two vertices whose label choices overlap).
We first present a simple algorithm which, on input finds a maximum clique in time steps, where is an upper bound on the intersection number and is the number of vertices. Consequently, when the running time of this algorithm is polynomial.
We then consider random instances of the random intersection graphs model as input graphs. As our main contribution, we prove that, when the number of labels is not too large (), we can use the label choices of the vertices to find a maximum clique in polynomial time whp. The proof of correctness for this algorithm relies on our Single Label Clique Theorem, which roughly states that whp a “large enough” clique cannot be formed by more than one label. This theorem generalizes and strengthens other related results in the state of the art, but also broadens the range of values considered (see e.g. [S95] and [BTU08]).
As an important consequence of our Single Label Clique Theorem, we prove that the problem of inferring the complete information of label choices for each vertex from the resulting random intersection graph (i.e. the label representation of the graph) is solvable whp; namely, the maximum likelihood estimation method will provide a unique solution (up to permutations of the labels). Finding efficient algorithms for constructing such a label representation is left as an interesting open problem for future research.
A clique in an undirected graph $G$ is a subset of vertices any two of which are connected by an edge. The cardinality of a maximum clique is called the clique number of $G$. The problem of finding a maximum clique in an arbitrary graph is fundamental in Theoretical Computer Science and appears in many different settings. As an example, consider a social network where vertices represent people and edges represent mutual acquaintance. Finding a maximum clique in this network corresponds to finding the largest subset of people who all know each other. More generally, the analysis of large networks in order to identify communities, clusters, and other latent structure has come to the forefront of much research. The Internet, social networks, bibliographic databases, energy distribution networks, and global networks of economies are some of the examples motivating the development of the field.
It is well known that determining the clique number of an arbitrary graph is NP-complete [K72]. In fact, the fastest algorithm known today runs in time [R01], where is the number of vertices in the graph. Moreover, the best known approximation algorithm for the clique number has a performance guarantee of [F04] (there are algorithms with better approximation ratios for graphs with large clique number; see e.g. [AK98]). Even though this approximation ratio appears to be weak at first glance, there are several results on hardness of approximation which suggest that there can be no approximation algorithm with an approximation ratio significantly less than linear (see e.g. [H99]). It was also shown in [CHKX06] that, if is the clique number, then the clique problem cannot be solved in time , unless the exponential time hypothesis fails (note that the brute force search algorithm runs in time , which seems quite close).
The intractability of the maximum clique problem for arbitrary graphs led researchers to the study of the problem for appropriately generated random graphs. In particular, for Erdős-Rényi random graphs (i.e. random graphs in which each edge appears independently with probability ), there are several greedy algorithms that find a clique of size about with high probability (whp, i.e. with probability that tends to 1 as goes to infinity), see e.g. [GM75, K76]. Since the clique number of is asymptotically equal to with high probability, these algorithms approximate the clique number by a factor of 2. In fact, it was conjectured that finding a clique of size (for a constant ), with probability at least , would require techniques beyond the current limits of complexity theory. This belief was strengthened by the fact that the Metropolis algorithm also fails to find the maximum clique in (see [J92]). A stronger version of the above conjecture was presented in [J92], stating that the problem of finding an clique remains hard even if the input graph is a random graph in which we have planted a randomly chosen clique of size . This conjecture has some interesting cryptographic consequences, as shown in [JP98]. It also seems tight, since finding the maximum clique in the case where the planted clique has size at least can be done in polynomial time by using spectral properties of the adjacency matrix of the graph (see [AKS98]). We finally note that there are quite a few nice results concerning generalizations of the planted clique problem in various (quite general) random graph models (see e.g. [C06, CL09]).
1.1 Our Contribution
In this work, we complement the state of the art by relating the maximum clique problem to the intersection number of the input graph (i.e. the minimum number of cliques that can edge cover ). In particular, we consider the maximum clique problem for graphs with small intersection number and random intersection graphs.
More specifically, we begin by considering arbitrary graphs with small intersection number. We present a simple algorithm which, on input finds a maximum clique in time steps, where is an upper bound on the intersection number of and is the number of vertices. Consequently, when the running time of this algorithm is polynomial. We note here that computing the exact value of the intersection number of is itself an NP-complete problem, but this knowledge is only needed in the analysis of the algorithm.
We then consider random instances of the random intersection graphs model (introduced in [KSS99, S95]) as input graphs. In this model, denoted by $\mathcal{G}_{n,m,p}$, each one of $m$ labels is chosen independently with probability $p$ by each one of $n$ vertices, and there is an edge between any two vertices whose label choices overlap. Random intersection graphs are relevant to social networking and capture it quite well. Indeed, a social network is a structure made of nodes (individuals or organizations) tied by one or more specific types of interdependency, such as values, visions, financial exchange, friends, conflicts, web links etc. Social network analysis views social relationships in terms of nodes and ties. Nodes are the individual actors within the networks and ties are the relationships between the actors. Other applications include oblivious resource sharing in a (general) distributed setting, efficient and secure communication in sensor networks [NRS11], interactions of mobile agents traversing the web etc. Even epidemiological phenomena (like spread of disease) tend to be more accurately captured by this “interaction-sensitive” random graph model.
As our main contribution, we prove that, when the number of labels is not too large, we can use the label choices of the vertices to find a maximum clique in polynomial time (in the number of labels and vertices of the graph). Most of the work in this paper is devoted to proving our Single Label Clique Theorem (see Section 4). Our proof technique is original and employs a probabilistic contradiction argument. The theorem states that when the number of labels is less than the number of vertices, any large enough clique in a random instance of is formed by a single label. This statement may seem obvious when is small, but it is hard to imagine that it still holds for all “interesting” values for (see also the discussion in Section 2). Indeed, when , by slightly modifying an argument of [BTU08], we can see that almost surely has no cycle of size whose edges are formed by distinct labels (alternatively, the intersection graph produced by reversing the roles of labels and vertices is a tree). On the other hand, for larger a random instance of is far from perfect (a perfect graph is a graph in which the chromatic number of every induced subgraph equals the size of the largest clique of that subgraph; consequently, the clique number of a perfect graph is equal to its chromatic number), and the techniques of [BTU08] do not apply (for a more thorough discussion see the beginning of Section 4). By using the Single Label Clique Theorem, we provide a tight bound on the clique number of when . A lower bound in the special case where is constant was given in [S95]. We considerably broaden this range of values to also include vanishing values for and also provide an asymptotically tight upper bound.
We claim that our proof also applies for , provided is not too small. We should note here that in [FSS00] the authors prove the equivalence (measured in terms of total variation distance) of random intersection graphs and Erdős-Rényi random graphs, when . This bound on the number of labels was improved in [R11], by showing equivalence of sharp threshold functions between the two models for . In view of these results, we expect that our work will also shed light on the problem of finding maximum cliques in Erdős-Rényi random graphs.
Finally, as yet another consequence of our Single Label Clique Theorem, we prove that the problem of inferring the complete information of label choices for each vertex from the resulting random intersection graph (i.e. the label representation of the graph) is solvable whp; namely, the maximum likelihood estimation method will provide a unique solution (up to permutations of the labels). (More precisely, if is the set of different label choices that can give rise to a graph , then the problem of inferring the complete information of label choices from is solvable if there is some such that , for all .) In particular, given values and , such that , and given a random instance of the model, the label choices for each vertex are uniquely defined. Finding efficient algorithms for constructing such a label representation is left as an open problem for future research.
1.2 Organization of the paper
In Section 2 we formally define random intersection graphs. We also provide some useful definitions and notation which are used throughout the paper. The relation of the intersection number to the clique number of an arbitrary graph is discussed in Section 3. Section 4 is devoted to the proof of our Single Label Clique Theorem for random intersection graphs. The consequences of our main theorem concerning the efficient construction of a maximum clique and the uniqueness of the label representation are presented in a subsequent section. Finally, we discuss the presented results and further research in the concluding section.
2 Definitions and Preliminaries
The formal definition of the random intersection graphs model is as follows:
Definition 1 (Random Intersection Graph - [KSS99, S95])
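Consider a universe $\mathcal{M} = \{1, 2, \ldots, m\}$ of elements and a set of vertices $V = \{v_1, v_2, \ldots, v_n\}$. Assign independently to each vertex $v \in V$ a subset $S_v$ of $\mathcal{M}$, choosing each element independently with probability $p$, and draw an edge between two distinct vertices $u, v$ if and only if $S_u \cap S_v \neq \emptyset$. The resulting graph is an instance $G_{n,m,p}$ of the random intersection graphs model.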
In this model we also denote by the set of vertices that have chosen label . Given , we will refer to as its label representation. Consider the bipartite graph with vertex set and edge set . We will refer to this graph as the bipartite random graph associated to . Notice that the associated bipartite graph is uniquely defined by the label representation.
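As an illustration of the definition, the following is a minimal sketch of how an instance and its label representation could be sampled. The function name and the parameter names `n`, `m`, `p` (vertices, labels, selection probability) are our own assumptions and are not taken from the paper.

```python
import random


def sample_random_intersection_graph(n, m, p, seed=None):
    """Sample one instance; returns (labels_of, vertices_of, edges).

    n, m, p are assumed names for the number of vertices, the number of
    labels and the probability that a vertex selects a given label.
    """
    rng = random.Random(seed)
    # Each vertex picks each label independently with probability p.
    labels_of = {v: {l for l in range(m) if rng.random() < p} for v in range(n)}
    # Label representation seen from the label side:
    # for each label, the set of vertices that chose it.
    vertices_of = {l: {v for v in range(n) if l in labels_of[v]} for l in range(m)}
    # Two vertices are adjacent iff their label sets intersect.
    edges = {
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if labels_of[u] & labels_of[v]
    }
    return labels_of, vertices_of, edges
```

The pair (`labels_of`, `vertices_of`) plays the role of the associated bipartite graph: it determines the edge set, while the edge set alone does not obviously determine it.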
It follows from the definition of the model that the edges in $G_{n,m,p}$ are not independent. In particular, the (unconditioned) probability that a specific edge exists is $1-(1-p^{2})^{m}$. Therefore, if $mp^{2}$ goes to infinity with $n$, then this probability goes to 1. In the paper, we will thus consider the “interesting” range of values $mp^{2} = O(1)$ (i.e. the range of values for which the unconditioned probability that an edge exists does not go to 1). Furthermore, as is usual in the literature, we will assume that the number of labels is some power of the number of vertices, i.e. $m = n^{\alpha}$, for some $\alpha > 0$.
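The edge probability mentioned above can be derived in one line from the definition. The symbols below are assumptions made for this sketch: $m$ labels, selection probability $p$, and $S_u$ for the label set of vertex $u$.

```latex
% Assumed symbols: m labels, selection probability p, S_u = label set of vertex u.
\begin{align*}
\Pr[\{u,v\} \in E]
  &= 1 - \Pr[S_u \cap S_v = \emptyset]
   = 1 - \prod_{\ell=1}^{m} \Pr[\ell \notin S_u \cap S_v] \\
  &= 1 - (1 - p^{2})^{m} \;\ge\; 1 - e^{-mp^{2}} .
\end{align*}
```

Under these assumptions the probability tends to 1 whenever $mp^{2} \to \infty$, which is why only the complementary range is of interest.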
The following definitions will also be useful:
Definition 2 (Intersection number)
The intersection number of a graph $G$ is the smallest number of cliques needed to cover all of the edges of $G$.
Equivalently, the intersection number is the smallest number of elements in a representation of $G$ as an intersection graph of finite sets.
Definition 3 (Edge clique cover)
A set of cliques is an edge clique cover of a graph if for every edge there is at least one clique such that and for every non edge , there is no such clique in .
Therefore, the intersection number of is the minimum such that is an edge clique cover of .
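To make Definition 3 concrete, the following minimal sketch checks whether a given family of vertex sets is an edge clique cover. The function name and the input encoding (edges as pairs, cliques as sets) are assumptions made for illustration only.

```python
from itertools import combinations


def is_edge_clique_cover(edges, cliques):
    """Return True iff `cliques` is an edge clique cover of the graph.

    edges   : iterable of 2-element pairs (u, v)
    cliques : iterable of vertex sets
    """
    edge_set = {frozenset(e) for e in edges}
    covered = set()
    for clique in cliques:
        for u, v in combinations(clique, 2):
            pair = frozenset((u, v))
            if pair not in edge_set:
                # A non-edge lies inside one of the sets, so the set is
                # not a clique of the graph.
                return False
            covered.add(pair)
    # Every edge must be contained in at least one clique.
    return covered == edge_set
```

For the path 0-1-2, the family `[{0, 1}, {1, 2}]` is accepted, while `[{0, 1, 2}]` is rejected because it contains the non-edge between 0 and 2.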
We use the convention that the random intersection graphs model is denoted by $\mathcal{G}_{n,m,p}$ (i.e. with a calligraphic $\mathcal{G}$), while a specific random instance of the model is denoted by $G_{n,m,p}$ (i.e. with a plain $G$).
For a vertex , we denote by the set of neighbors of in . We will say that two vertices belong to the same closed neighborhood in and we will write if and only if .
Let denote a partition of the vertex set of a graph and let . We will denote by the unique set inside that contains , that is .
Throughout the paper, we make use of the well known asymptotic notation and . Furthermore, we use the relation “” for asymptotically equal. In particular, if are two functions of , then means that or equivalently .
3 An Algorithm for Maximum Clique
In this section we consider arbitrary graphs as input graphs for the maximum clique problem. In particular, we relate the running time of the following algorithm to the intersection number of the input graph .
An example of how the graph is constructed (in step 6) for a specific graph is shown in Figure 1. Notice that has five closed neighborhoods (whereas its intersection number is 3), which are shown in dashed squares, so the graph has 5 vertices. The corresponding clique of that maximizes is .
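The steps referred to in the analysis below (partitioning the vertices into closed neighborhoods, building an auxiliary graph on them, and searching it exhaustively) can be sketched as follows. This is a reconstruction from the surrounding description and not the authors' listing; the function name, the input encoding and all variable names are assumptions.

```python
from itertools import combinations


def find_max_clique(vertices, edges):
    """Sketch: exhaustive search over unions of closed neighborhoods."""
    # Closed neighborhood of v: v itself together with its neighbors.
    closed = {v: {v} for v in vertices}
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)

    # Partition the vertices into classes with identical closed neighborhoods.
    classes = {}
    for v in vertices:
        classes.setdefault(frozenset(closed[v]), []).append(v)
    groups = list(classes.values())

    # Auxiliary graph on the classes: two classes are either completely
    # joined or not joined at all, so testing one representative suffices.
    def adjacent(a, b):
        return b[0] in closed[a[0]]

    # Try every subset of classes; keep the clique covering most vertices.
    best = []
    for r in range(1, len(groups) + 1):
        for subset in combinations(range(len(groups)), r):
            if all(adjacent(groups[i], groups[j])
                   for i, j in combinations(subset, 2)):
                candidate = [v for i in subset for v in groups[i]]
                if len(candidate) > len(best):
                    best = candidate
    return best
```

The search is exponential in the number of classes rather than in the number of vertices, which is the point of grouping vertices first.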
3.1 Analysis of FIND_MAX-CLIQUE
We first present the following lemma that concerns basic properties of the relation .
Lemma 1. The closed neighborhood relation has the following properties:
(1) It is an equivalence relation which partitions the vertex set in equivalence classes called closed neighborhoods.
(2) A closed neighborhood is a clique. Two closed neighborhoods either form a clique, or no edge between their vertices exists.
Proof. (1) The fact that is an equivalence relation follows directly by its definition. Therefore, every vertex belongs to exactly one equivalence class (i.e. exactly one closed neighborhood).
(2) By definition, a closed neighborhood forms a clique. Let now be two distinct closed neighborhoods and let . Suppose that there is an edge between and in , i.e. . Consider now any two vertices (including ). By definition of the closed neighborhood relation, we must have that . Since the closed neighborhoods are disjoint, this means that . Therefore, either every edge between and appears in , and forms a clique, or no edge between them exists. This completes the proof.
We now prove the following theorem about the correctness of the Algorithm FIND_MAX-CLIQUE.
Theorem 3.1 (Correctness)
FIND_MAX-CLIQUE correctly outputs a maximum clique in .
Proof. Notice that, by the second part of Lemma 1 and by construction of , any clique in corresponds to a clique in .
Therefore, we only need to show that a maximum clique of corresponds to a clique in , because then the algorithm will be able to find it in step 7. Equivalently, we need to show that there are closed neighborhoods which constitute a partition of , that is . Indeed, by construction of , the vertices in that correspond to these closed neighborhoods will form a clique in (any choice of two vertices will be connected).
To prove the above, let be a closed neighborhood that has at least one common vertex with , i.e. . Then, by definition of the relation, every vertex is connected to and to all the vertices that is connected to (including all vertices in ). Therefore, by maximality of , all the vertices in must be contained in the maximum clique, i.e. . Consequently, a closed neighborhood is either entirely contained in , or disjoint from it. By the first part of Lemma 1, we can then partition using all the closed neighborhoods that have common vertices with . This completes the proof.
The following result relates the running time of Algorithm FIND_MAX-CLIQUE to the intersection number of its input graph .
Theorem 3.2 (Efficiency)
Let be a graph with intersection number . Then FIND_MAX-CLIQUE on input finds a maximum clique in time steps.
Proof. By definition, since the intersection number of is , there is a set of cliques that is an edge clique cover of . For a vertex , we denote by the set of cliques in that include . Notice then that if , then not only are and connected, but they also have the exact same set of neighbors in , i.e. .
Given now a specific edge clique cover , there are at most different ways in which we can construct a set . Consequently, there are at most distinct closed neighborhoods in which constitute a partition of the set of non-isolated vertices. Note also that determining whether or not for any two vertices requires steps. Therefore, steps 2 to 5 needed for partitioning the vertex set in closed neighborhoods in the algorithm require time.
From the above, we also conclude that the number of vertices in is at most . Therefore, the time needed to construct in step 6 in the algorithm is . Finally, there are at most subsets of vertices in , so step 7 in the algorithm takes time. This completes the proof.
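The counting behind the last step can be summarized as follows. This is a rough accounting under our own assumptions: $m$ denotes the intersection number, $n$ the number of vertices, $k$ the number of closed neighborhoods among non-isolated vertices, and each candidate subset is assumed to be checked in $O(k^{2})$ time.

```latex
% Assumed symbols: m = intersection number, n = number of vertices,
% k = number of closed neighborhoods among non-isolated vertices.
\begin{align*}
k &\le \min\{2^{m},\, n\}, \\
\#\{\text{subsets of closed neighborhoods}\} &= 2^{k} \le 2^{2^{m}}, \\
\text{cost of exhaustive search} &\le 2^{k} \cdot O(k^{2})
   \le 2^{2^{m}} \cdot O(2^{2m}) = 2^{2^{m} + O(m)} .
\end{align*}
```

In particular, the search term stays polynomial in $n$ only when $2^{m}$ is at most logarithmic in $n$.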
Note that the algorithm does not need the actual value of the intersection number. We only use this information for bounding its running time. The following is a direct consequence of Theorem 3.2.
Let be an upper bound on the intersection number of an arbitrary undirected graph on vertices. Then there is an algorithm that finds the maximum clique of in time .
As a final remark, since the intersection number of is at most (but could be even less), the above result also holds for any random instance of the random intersection graphs model with at most labels.
4 Clique number for
In this section we give a tight bound on the clique number of when . A lower bound in the special case where is constant, was given in [S95]. We considerably broaden this range of values to also include vanishing values for and also provide a tight upper bound.
We will also assume, without loss of generality, that . Indeed, when , by slightly modifying an argument of [BTU08], we can see that almost surely has no cycle of size whose edges are formed by distinct labels. Therefore, the maximum clique of when , is formed by exactly one label. As a matter of fact, if is the set of vertices that have chosen label , then the maximum clique is equal to , where . Furthermore, since is chordal whp (see Lemma 5 in [BTU08]), the maximum clique can be found in polynomial time.
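When the maximum clique is formed by a single label, as described above, it can be read off directly from the label choices. The following is a minimal sketch under that assumption; the function name and the dictionary encoding of the label representation are hypothetical.

```python
def largest_single_label_clique(labels_of):
    """Return (label, vertex set) for a label chosen by the most vertices.

    labels_of maps each vertex to the set of labels it selected. All
    vertices sharing a label are pairwise adjacent, so the returned set is
    always a clique; it is a maximum clique only when the maximum clique is
    formed by a single label.
    """
    vertices_of = {}
    for v, labels in labels_of.items():
        for l in labels:
            vertices_of.setdefault(l, set()).add(v)
    if not vertices_of:
        # No vertex selected any label.
        return None, set()
    best = max(vertices_of, key=lambda l: len(vertices_of[l]))
    return best, vertices_of[best]
```

The running time is linear in the total size of the label sets, which is the sense in which knowing the label choices makes the problem easy in this regime.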
We stress that the techniques employed to provide the algorithmic and structural results in [BTU08] cannot be used in the case where . In particular, is far from perfect, especially in the case (which is included in the range of values that we study here). An intuitive justification is as follows: when , then the sizes of the label sets of the vertices are highly concentrated around their mean value . Therefore, the statistical behavior of is expected to be similar to the statistical behavior of uniform random intersection graphs , in which each vertex selects exactly labels from . It was proved in [TCS10] (part (iii) in Corollary 2), that the size of the maximum independent set when and , is asymptotically equal to . Therefore, when , the size of the maximum independent set in will be around , so its chromatic number will be . However, as can be seen from the corollary on the maximum clique size (which is a direct consequence of our main theorem), the size of the maximum clique in when and is asymptotically equal to . This is much smaller than the lower bound on the chromatic number in the case . Therefore, is far from perfect in this range of values.
We first provide some concentration results concerning the number of vertices that have chosen a particular label and the number of vertices that have chosen two particular labels.
Let be a random instance of the random intersection graphs model with and . Then the following hold:
Let be the set of vertices that have chosen label . Then
Let also denote the set of labels that were chosen by vertex . Then
For the first part, fix a label . Notice that is a binomial random variable with parameters , i.e. . By Chernoff bounds, for any , we have that
Setting and noting that , we then have that and the lemma follows from Boole’s inequality.
For the second part, fix a vertex . Notice that is a binomial random variable with parameters , i.e. . By Chernoff bounds, for any , we have that
Setting and using Boole’s inequality we get the desired result.
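For reference, one standard multiplicative form of the Chernoff bound, combined with Boole's inequality over all labels, reads as follows. The constants may differ from the form used in the proof above, and the symbols ($X$, $\mu$, $L_l$, $n$, $m$, $p$) are assumptions made for this sketch.

```latex
% Assumed symbols: X ~ Bin(n, p), \mu = np, m labels,
% L_l = set of vertices that chose label l.
\begin{align*}
\Pr\bigl[\,|X - \mu| \ge \delta \mu\,\bigr]
  &\le 2\exp\!\left(-\frac{\delta^{2}\mu}{3}\right), \qquad 0 < \delta < 1, \\
\Pr\bigl[\,\exists\, l :\ \bigl|\,|L_l| - np\,\bigr| \ge \delta np\,\bigr]
  &\le 2m \exp\!\left(-\frac{\delta^{2} np}{3}\right).
\end{align*}
```

The second line is useful only when $\delta^{2} np$ grows faster than $\ln m$, which indicates how small $\delta$ can be chosen for a given $p$.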
Notice that the above lemma provides a lower bound on the clique number. However, a clique in can be formed by combining more than one label. Clearly, a clique which is not formed by a single label will need at least 3 labels, since 2 labels cannot cover all the edges needed for to be a clique. In the discussion below, we will provide a much larger lower bound on the number of labels needed to form a clique of size which is not formed by a single label. The following definition will be useful.
Denote by the event that there are two disjoint sets of vertices , where and such that the following hold:
All vertices in have chosen some label , i.e. .
None of the vertices in has chosen , i.e. .
Every vertex in is connected to every vertex in .
As a warm-up, we prove the following technical lemma, which is a first indication that in a graph, whp we cannot have too large and too small at the same time. This lemma will also be used as a starting step in the proof of our main theorem.
Let be a random instance of the random intersection graphs model with and and . Then, for any , .