Percolation transition and distribution of connected components in generalized random network ensembles
Abstract
In this work, we study the percolation transition and large deviation properties of generalized canonical network ensembles. This new type of random network can have a very rich structure, including highly heterogeneous degree sequences, nontrivial community structure, or a specific spatial dependence of the link probability for networks embedded in a metric space. We find the cluster distribution of the networks in these ensembles by mapping the problem to a fully connected Potts model with heterogeneous couplings. We show that the nature of the Potts model phase transition, linked to the birth of a giant component, has a crossover from second to first order at the critical number of colors $q_c = 2$ in all the networks under study. These results shed light on the properties of dynamical processes defined on these network ensembles.
pacs:
00.00, 20.00, 42.10
1 Introduction
Recently the study of critical phenomena in complex networks has attracted a great deal of interest [Dorogovtsev]. One of the main critical phenomena occurring in networks is the percolation transition, a continuous structural phase transition that can be characterized by critical indices like a second-order phase transition in statistical mechanics. This phase transition determines the robustness properties of complex networks [MR, Attack, Cohen1, Cohen2] and the critical temperature of the Ising [Ising1, Ising2, Ising3] and XY models [Isaac, Coolen] on complex networks. Moreover, the onset of a percolating cluster marks a transition between a phase in which small loops are suppressed and a phase in which the expectation value of the number of small loops is positive in the limit of large network sizes [Noh].
The percolation phase transition in Erdős-Rényi networks is a classic subject of graph theory [Bollobas]. For this network ensemble the large deviations of the number of connected components (or clusters) have been characterized [Monasson] by mapping the problem to a fully connected Potts model [Fortuin].
In uncorrelated complex networks characterized by a non-Poisson degree distribution, the percolation transition depends on the second moment of the degree distribution [Cohen1, Cohen2] and can show nontrivial critical exponents [Dorogovtsev].
This phase transition has also been studied in directed networks [Boguna] and in networks with degree-degree correlations [Doro2].
In this paper we study the percolation properties and the large deviations of the cluster distribution of the recently proposed generalized canonical random network ensembles [entropy1, entropy2] with nontrivial degree distribution and an additional community or spatial structure. These network ensembles can be cast in the wide category of Configuration or "hidden variable" models extensively studied in the recent literature [MR, hv1, hv2, hv3, hv4, hv5]. We study the percolation properties and the large deviations of the cluster distribution in these ensembles by mapping the problem to a fully connected Potts model with heterogeneous couplings. We find results in agreement with reference [Lee], where the Potts model formulation was first used for the study of the percolation properties of complex networks with heterogeneous degrees. In particular, our framework generalizes the results of [Lee] and can be applied to network ensembles with very diverse structure: not only network ensembles with heterogeneous degree distribution, but also network ensembles with an additional nontrivial community or spatial structure.
The paper is organized as follows. In section 2 we introduce the generalized canonical random ensembles. In section 3 we introduce the generating functions for the cluster distribution and we characterize its large deviations. In section 4 we relate the problem of finding the cluster distribution in the generalized canonical ensembles, and their percolation transition, to the study of a fully connected Potts model with heterogeneous couplings. In section 5 we solve the fully connected Potts model with heterogeneous couplings and we find the percolation threshold and critical exponents for the generalized canonical network ensembles. In section 6 we find the cluster distribution in the generalized canonical network ensembles. In section 7 we compare our theoretical predictions with simulation results. Finally, in section 8 we give the conclusions.
2 Random network ensembles
In this section we introduce the generalized random ensembles described in [entropy1, entropy2]. The generalized random ensembles are an extension of the known $G(N,L)$ and $G(N,p)$ random network ensembles and are related to Configuration and "hidden variable" ensembles [MR, hv1, hv2, hv3, hv4, hv5].
2.1 The $G(N,L)$ and $G(N,p)$ random network ensembles
The mathematical literature has widely studied the properties of the $G(N,L)$ and $G(N,p)$ random network ensembles.

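A random network in the $G(N,L)$ ensemble is a network having $N$ nodes and $L$ undirected links. If we indicate with $\mathbf{a}$ the adjacency matrix of the network (with $a_{ij}=1$ if there is a link between nodes $i$ and $j$ and $a_{ij}=0$ otherwise), the probability $P(\mathbf{a})$ that a network $G$, associated to the adjacency matrix $\mathbf{a}$, belongs to the $G(N,L)$ ensemble is given by
(1) $P_{G(N,L)}(\mathbf{a}) = \frac{1}{Z}\,\delta\Big(L, \sum_{i<j} a_{ij}\Big)$
with
(2) $Z = \binom{N(N-1)/2}{L}$
and with $\delta(\cdot,\cdot)$ indicating the Kronecker delta. The probability of each link in this ensemble of networks is given by $p = L/[N(N-1)/2]$.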

A network in the $G(N,p)$ ensemble is a network in which each possible link between a pair of nodes is present with probability $p$. Therefore the probability of a specific network in this ensemble is equal to
(3) $P_{G(N,p)}(\mathbf{a}) = \prod_{i<j} p^{a_{ij}} (1-p)^{1-a_{ij}}$
where $\mathbf{a}$ is the adjacency matrix. In the $G(N,p)$ ensemble the total number of links is not fixed but is Poisson distributed in the sparse limit, with mean $\langle L \rangle = pN(N-1)/2$.
The $G(N,L)$ and the $G(N,p)$ ensembles with $p = L/[N(N-1)/2]$ are linked by a Legendre transform and, in the asymptotic limit $N \to \infty$, they share the same statistical properties.
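As an illustration of the two ensembles, the following minimal Python sketch draws one network with a fixed number of links and several networks with independent links at the matching link probability. The function names and the parameter values are hypothetical and chosen only for this example.

```python
import random
from itertools import combinations

def sample_gnl(n, num_links, rng):
    """Draw uniformly among networks with n nodes and exactly num_links links."""
    pairs = list(combinations(range(n), 2))
    return set(rng.sample(pairs, num_links))

def sample_gnp(n, p, rng):
    """Include each of the n(n-1)/2 possible links independently with probability p."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}

rng = random.Random(0)
n, num_links = 200, 300
p = num_links / (n * (n - 1) / 2)        # matching link probability
g_fixed = sample_gnl(n, num_links, rng)  # hard constraint: exactly num_links links
counts = [len(sample_gnp(n, p, rng)) for _ in range(100)]
mean_links = sum(counts) / len(counts)   # soft constraint: num_links only on average
print(len(g_fixed), round(mean_links, 1))
```

In the first ensemble the number of links is the same in every sample, while in the second it fluctuates around the same value from sample to sample.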
2.2 Generalized random network ensembles
Recently a statistical mechanics approach has been proposed [entropy1, entropy2] that generalizes the random network ensembles to network ensembles with a much more complex structure, including networks with highly heterogeneous degree sequences and nontrivial community structure or spatial dependence of the link probability. The statistical mechanics approach is able to describe both "microcanonical" network ensembles (which satisfy hard structural constraints and generalize the $G(N,L)$ random ensembles) and "canonical" network ensembles (which satisfy the structural constraints when their properties are averaged over the whole ensemble and generalize the $G(N,p)$ ensemble).

The "microcanonical" networks have to satisfy a series of hard constraints, and the probability of these networks is given by
(4) with indicating the cardinality of the ensemble. The probability of each link is computed introducing some Lagrange multipliers [entropy1, entropy2].

The "canonical" conjugated ensemble can be built starting from the probability $p_{ij}$ of the links in the "microcanonical" one. We assign to each network the probability
(5) $P_C(\mathbf{a}) = \prod_{i<j} p_{ij}^{a_{ij}} (1-p_{ij})^{1-a_{ij}}$
which generalizes (3) to heterogeneous networks. In the "canonical" ensembles the structural constraints are satisfied on average
(6) Here and in the following we always indicate by the average over the ensemble probability given by and with the average over all the nodes .
In this paper we focus on generalized ”canonical” networks. Each node in this ensemble is characterized by two discrete hidden variables and . We consider in this paper the link probability given by
(7) 
and is fully specified once the function is given. The link probability (7) corresponds to maximally entropic ensembles with given degree structural constraints [entropy1, entropy2].
In the ensembles described by (7), the degree of each node is a Poisson variable [hv5] with average
(8) 
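The construction of a canonical ensemble with hidden variables can be sketched in a few lines of Python. The sketch assumes a link probability of the form $p_{ij} = x_{ij}/(1+x_{ij})$ with $x_{ij} = \theta_i \theta_j W(\alpha_i, \alpha_j)$; the symbol names `theta`, `alpha` and `W`, the functional form, and all numerical values are assumptions made for illustration, not taken from the text.

```python
import random

def link_probability(theta_i, theta_j, w_ij):
    """Assumed form p_ij = x / (1 + x), with x = theta_i * theta_j * W."""
    x = theta_i * theta_j * w_ij
    return x / (1.0 + x)

def expected_degrees(theta, alpha, W):
    """Average degree of each node: sum of p_ij over all other nodes j."""
    n = len(theta)
    return [
        sum(link_probability(theta[i], theta[j], W[alpha[i]][alpha[j]])
            for j in range(n) if j != i)
        for i in range(n)
    ]

def sample_network(theta, alpha, W, rng):
    """Draw each link independently with its own probability p_ij."""
    n = len(theta)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = link_probability(theta[i], theta[j], W[alpha[i]][alpha[j]])
            if rng.random() < p:
                edges.append((i, j))
    return edges

rng = random.Random(1)
n = 300
theta = [rng.choice([0.03, 0.06, 0.12]) for _ in range(n)]  # hypothetical values
alpha = [i % 2 for i in range(n)]                            # two communities
W = [[4.0, 1.0], [1.0, 4.0]]                                 # favors links inside a community
kbar = expected_degrees(theta, alpha, W)
edges = sample_network(theta, alpha, W, rng)
print(round(sum(kbar) / 2, 1), len(edges))
```

The two printed numbers are the expected and the sampled number of links; they agree up to sample-to-sample fluctuations, as expected when the constraints hold only on average.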
In the following we specifically comment on some relevant limiting cases for the general distribution .

The $G(N,p)$ ensemble
If the values of the hidden variables ’s are equal, i.e. and , the probability of a link is given by (9) The degree of each node is a Poisson variable with equal average . Performing also the average over all the nodes of the network we get
(10) We recover therefore the Erdős-Rényi ensemble by taking
(11) where the last expression is valid for sparse networks with finite.

The Configuration model
If the linking probability of equation depends only on and , (i.e. ), then (12) This ensemble is the canonical version of the Configuration model, each node having a degree distributed according to a Poisson variable with average
(13) This ensemble has in general nontrivial degree-degree correlations that disappear for . In this last case, the linking probability defined in equation can be approximated as
(14) Therefore in this limit the networks of the ensemble are uncorrelated and there is a simple relation between the hidden variables and the average degree of the node , i.e.
(15) Finally we observe that if we use the linking probability can be written in the well-known form for uncorrelated networks
(16) 
Structured networks
In the more general structured case we have two possibilities:
i) The index with can indicate the community of a node and the function can be a matrix. In this case the number of links between the community and the community will be distributed according to a Poisson distribution with average
(17) 
ii) The index can indicate a position in a metric space that determines the link probability. In this case the function is a vector depending only on the metric distance , i.e. .
For structured networks with a generic distribution of ’s and a non trivial function of we can consider the limit when the . In this limit the linking probability given by equation reduces to the simple form
(18) and we have
(19) with .

3 Large deviation of the cluster distribution
The number of connected components or "clusters" of a network gives direct information on the topological structure of the network and its percolation properties. Indeed if is small there are few large connected components, while in the opposite case the network is divided into a huge number of small clusters. In the limit of large network sizes each canonical generalized network ensemble will be characterized by a typical value of the number of clusters . The typical distribution of clusters gives the percolation properties of the networks belonging to the ensemble and characterizes the critical exponents of the percolation phase transition. Moreover, different network realizations of a generalized canonical ensemble will have a number of clusters that is subject to large deviations with respect to the typical value .
Given the probability of a network in the canonical generalized random ensembles, as defined in equation , we can define the probability density of generating a random network in this ensemble with clusters as in the following:
(20) 
In the thermodynamic limit, $N \to \infty$, the probability is centered at some typical value and decays extremely fast away from . Let us indicate with the number of connected components per vertex. The typical value of this quantity converges in the thermodynamic limit to a size-independent value . Therefore, in order to characterize in the thermodynamic limit, we consider the function defined as
(21) 
implying clearly for all
and .
Finally we introduce the generating function of the cluster probability
(22) 
where in the last expression we have used equation defining the generalized random ensembles. We characterize the asymptotic limit of the cluster generating function by the defined as
(23) 
From equation we obtain, with a saddle point calculation, that the conjugated Legendre transform of the quantity can be expressed in terms of according to the relation
(24) 
The cluster distribution is therefore fully characterized in the asymptotic limit if we know the function .
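The quantities introduced in this section can be explored numerically on small networks. The sketch below counts the connected components of sampled networks with a union-find structure and forms a naive Monte Carlo estimate of $(1/N)\ln\langle q^{C}\rangle$, where $C$ is the number of clusters. It assumes a generating function of the form $\langle q^{C}\rangle$ and, for simplicity, samples from the homogeneous ensemble with independent links; the function names and parameters are hypothetical. Naive sampling only probes the neighborhood of the typical number of clusters, so it cannot resolve the far tails of the large deviation function.

```python
import math
import random

def count_clusters(n, edges):
    """Number of connected components of a network with n nodes (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    clusters = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1  # each merge removes one component
    return clusters

def sample_edges(n, p, rng):
    """Independent links, each present with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def log_generating_function(n, p, q, samples, rng):
    """Monte Carlo estimate of (1/N) ln <q^C> over sampled networks."""
    log_terms = [count_clusters(n, sample_edges(n, p, rng)) * math.log(q)
                 for _ in range(samples)]
    shift = max(log_terms)  # log-sum-exp shift for numerical stability
    total = sum(math.exp(t - shift) for t in log_terms)
    return (shift + math.log(total / samples)) / n

rng = random.Random(2)
n, c = 200, 1.5  # c is the average degree, so p = c / n
for q in (0.5, 1.0, 2.0):
    print(q, round(log_generating_function(n, c / n, q, 50, rng), 4))
```

At $q = 1$ the estimate vanishes identically, which reflects the normalization of the cluster probability.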
4 The fully connected heterogeneous Potts Model and the Percolation transition of the generalized random networks ensembles
In this section we reduce the problem of finding the cluster distribution in generalized canonical random ensembles to the study of a mean-field Potts model with heterogeneous couplings. We will prove that , given by , has a formal relation with the free energy of the mean-field Potts model with heterogeneous couplings, after a suitable analytic continuation. This relation generalizes the known connection between the fully connected Potts model and the generating function of the cluster distribution of a random network [Fortuin, Monasson].
In order to present the results of the paper in a self-contained way, we describe here the cluster expansion of the fully connected Potts model. The Potts model is a well-known statistical mechanical problem [Potts_rev] describing classical degrees of freedom associated to the nodes of a given network. Each variable can take different values, namely , and is coupled to all the other degrees of freedom by means of a two-body interaction of strength . This interaction favors configurations where all the nodes in the network have the same value of . Thus the energy reads
(25) 
where we assume that all the couplings are positive, and that the first sum in runs over all the pairs of nodes of a fully connected network. Moreover, we take the auxiliary field parallel to the direction . The partition function of the model is
(26) 
where is the inverse temperature and the summation runs over all spin configurations. In order to map the Potts model to the cluster structure of the generalized random network ensembles, we expand the partition function following the article [Fortuin]
(27) 
where we have defined
(28) 
Expanding equation (27) we obtain
(29)  
Each term in the expansion corresponds to a possible network formed by a subset of edges on the complete network. Each contribution from a network is weighted by the probability and the sum is made over all possible networks of nodes. Using this expansion, after performing the sum over the configurations , we can write the partition function reported in , in the form:
(30) 
with given by the set of all edges in , given by the number of connected components in the network and denotes the size of the th component. From the previous equation it follows that in absence of external field
(31) 
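The cluster expansion at zero external field can be checked by brute force on a very small system. The sketch below assumes the standard Fortuin-Kasteleyn form of the identity: with dimensionless couplings $\beta J_{ij}$ and link probabilities $p_{ij} = 1 - e^{-\beta J_{ij}}$, the Potts partition function equals $\prod_{i<j} e^{\beta J_{ij}}$ times the sum over networks of the network probability times $q^{C}$, with $C$ the number of connected components. The prefactor convention and all names are assumptions made for this example.

```python
import math
from itertools import combinations, product

def count_clusters(n, edges):
    """Number of connected components of a network with n nodes (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    return clusters

def potts_partition_function(n, q, beta_j):
    """Direct sum over all q^n colorings, zero external field."""
    total = 0.0
    for sigma in product(range(q), repeat=n):
        exponent = sum(bj for (i, j), bj in beta_j.items() if sigma[i] == sigma[j])
        total += math.exp(exponent)
    return total

def cluster_expansion(n, q, beta_j):
    """Sum over all networks G of P(G) * q^C(G), times prod exp(beta J_ij)."""
    pairs = list(beta_j)
    prefactor = math.exp(sum(beta_j.values()))
    total = 0.0
    for mask in range(1 << len(pairs)):  # every subset of the possible links
        edges, weight = [], 1.0
        for k, pair in enumerate(pairs):
            p = 1.0 - math.exp(-beta_j[pair])
            if (mask >> k) & 1:
                edges.append(pair)
                weight *= p
            else:
                weight *= 1.0 - p
        total += weight * q ** count_clusters(n, edges)
    return prefactor * total

n, q = 4, 3
# Hypothetical heterogeneous couplings beta * J_ij on the complete graph
beta_j = {pair: 0.1 * (1 + pair[0] + pair[1]) for pair in combinations(range(n), 2)}
z_spins = potts_partition_function(n, q, beta_j)
z_graphs = cluster_expansion(n, q, beta_j)
print(z_spins, z_graphs)
```

The two numbers coincide up to floating-point rounding. The check is only feasible for a handful of nodes, since both sums grow exponentially with the system size.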
By comparing the definition of the cluster generating function and the expression for the partition function of the Potts Model, we observe that the two functions can be related by the following simple expression:
(32) 
and the associated logarithmic function reads
(33) 
where and is defined at null external field . In the high temperature limit the couplings given by are linked to the edge probability by means of the equation (28) so that
(34) 
Therefore, in order to find the cluster generating function, we can simply solve the fully connected Potts model with heterogeneous couplings. Any assumption on the network ensemble will have a direct counterpart on the structure of the couplings in the Potts model.
We will solve the model in this framework, specializing the results for the cases of our interest . Using equation we obtain
(35) 
In the various different cases under study the function takes different values:

The $G(N,p)$ ensemble
For the characterization of the cluster distribution of a Poisson random network in the $G(N,p)$ ensemble with we take (36) for all pairs .

The Configuration model
For the characterization of the cluster distribution in the Configuration model we take (37) In the case of an uncorrelated network we have and we can express the hidden variables in terms of the expected average degree , as . Consequently, the couplings of the Potts model take the form
(38) 
Structured networks
For the characterization of the cluster distribution in structured network ensembles with community structure or spatial dependence on the embedding geometric space, we have
(39) In the case in which the previous equation simplifies to
(40)
For $q \to 1$ the properties of the partition function (31) are in correspondence with the percolation properties [Fortuin] of the generalized canonical network ensembles with linking probabilities given by (7). We will sketch the proof following [Lub]. It is straightforward that in the limit , the partition function so that
(41) 
We can choose the parameter so that the external field favors the state. The partition function reported in (30) then simplifies to
(42) 
Using the fact that , where is the number of nodes in the same cluster and the number of clusters with nodes, the previous equation becomes
(43) 
Performing the summation over the graphs with a saddle point approximation, we obtain that, in the thermodynamic limit, equation (41) becomes
(44) 
where and . Differentiating the previous equation with respect to the external field we obtain that the node probability to be in the percolating cluster is linked to the free energy function of the Potts model in the limit
(45) 
The second derivative gives the mean clusters per node. Using the Potts model, we are also able to compute the probability that two given nodes belong to the percolating component. Let us introduce the node-node correlation in the limit
(46) 
that measures the probability that two nodes have the same colour. We can easily compute this quantity and we obtain
(47) 
where is the indicator function: if nodes and are in the same cluster it has the value one, otherwise it vanishes. We want to underline the fact that the probability that two nodes are in the same nonpercolating component is defined through the following relation
(48) 
This shows how solving the Potts model in the $q \to 1$ limit gives us information on the percolation transition in generalized network ensembles.
5 Free energy of the Potts model and the percolation phase transition
In order to solve the mean-field Potts model we introduce the order parameters
(49) 
where
(50) 
is the number of nodes with given hidden variables and . The order parameters satisfy their proper normalization
(51) 
The energy of the Potts model in absence of external field , expressed in terms of the order parameters , takes the form
(52) 
where we have explicitly shown the dependence of the coupling on the external parameters and . In order to express the partition function as a sum over the collective variables , we need to take into account the entropic contribution, counting the number of microscopic configurations with a given value of . To the leading order in we get
(53) 
where the free energy density functional reads
(54)  
In the large $N$ limit one can evaluate the sum in (53) by the saddle-point method. As a function of , the Potts model undergoes a phase transition. For the order parameter is invariant under the permutation of the spin values . Above the percolation transition, for , the ground state instead breaks the symmetry of the Hamiltonian.
5.1 Symmetric saddle point
The free energy of the Potts model is invariant under the permutation of the colors. When this symmetry is also shared by the ground state, the fraction of nodes of a given color can be written as
(55) 
which ensures different colors to be identical. Inserting this ansatz in equation (54) we get
(56) 
Computing the second-order derivatives of the free energy density functional, we can study the stability of the symmetric solution. When an eigenvalue of the Hessian matrix of the free energy changes sign and becomes negative, the ansatz (55) is no longer valid. The Hessian matrix reads
(57)  
and the related eigenvalue problem is
(58) 
where the quantity is defined as
(59) 
Inserting equation (58) into (59), we find
(60) 
defining the eigenvalues of the Hessian matrix in (57). In order to obtain the critical values of the external parameters at which the free energy density becomes unstable, we have to find where the eigenvalues change sign. Upon imposing , we find that this condition is
(61) 
In the general case , the stability condition can be expressed as
(62) 
with indicating the maximal eigenvalue of the matrix
(63) 
In the following we study in detail the critical point defined by and in a few relevant cases of the generalized network ensembles.

The $G(N,p)$ ensemble
In the special case of networks in the $G(N,p)$ ensemble, with a delta-like distribution , the critical point for percolation provided by the expressions and is the well-known percolation condition for a random network .
The Configuration model
In the case of the Configuration model the couplings factorize, . The stability condition becomes (64) In the case in which the network is uncorrelated we have and the degree of a node is a Poisson variable with average . The critical point can then be expressed in terms of the actual degree of the canonical Configuration ensemble as
(65) In the typical case limit, i.e. , the previous equation corresponds to the condition for the percolation transition in Configuration networks [Cohen1, Noh, Boguna].
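For uncorrelated networks the eigenvalue criterion can be illustrated numerically. The sketch below assumes that, in the percolation limit, the transition occurs when the largest eigenvalue of the matrix of link probabilities equals one, and that the link probabilities take the factorized form $p_{ij} = \kappa_i \kappa_j / (\langle\kappa\rangle N)$, where $\kappa_i$ denotes the expected degree of node $i$. The symbol $\kappa$, the function names, and the degree values are hypothetical. For a factorized matrix the largest eigenvalue is $\langle\kappa^2\rangle/\langle\kappa\rangle$, so a giant component is expected when this ratio exceeds one.

```python
def largest_eigenvalue(matrix, iterations=50):
    """Power iteration for a symmetric matrix with non-negative entries."""
    n = len(matrix)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm_w = sum(x * x for x in w) ** 0.5
        norm_v = sum(x * x for x in v) ** 0.5
        lam = norm_w / norm_v          # converges to the largest eigenvalue
        v = [x / norm_w for x in w]
    return lam

n = 100
kappa = [0.2 + 0.3 * (i % 10) for i in range(n)]   # hypothetical expected degrees
mean_k = sum(kappa) / n
mean_k2 = sum(k * k for k in kappa) / n

# Factorized link probabilities (diagonal kept for simplicity; it is an O(1/N) effect)
p = [[kappa[i] * kappa[j] / (mean_k * n) for j in range(n)] for i in range(n)]

lam = largest_eigenvalue(p)
ratio = mean_k2 / mean_k
print(round(lam, 6), round(ratio, 6), "giant component expected:", lam > 1.0)
```

Because the matrix has rank one, power iteration converges after a single step; for structured networks the same routine can be applied to a non-factorized matrix, where no closed form is available in general.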

Structured networks
In the general case of structured networks the complete eigenvalue problem in equation and equation has to be solved on a case-by-case basis in order to find the percolation critical point. Nevertheless, in the following we present two simple cases in which the problem can be simplified.

First case
We present a case in which a perturbative analysis can give a good approximation to the critical point. This is the case in which the network has a detailed structure made of different communities labeled with an index and . Each community has well-defined features such as the average degree and the number of links shared with other communities. This naturally leads to an interaction between nodes that depends on the community they belong to, encoded in the following matrix (66) In this hypothesis the matrix takes the form
(67) where we indicated with the average over one single component . The eigenvalue problem that we have to solve to find the critical point of the Potts model can be solved perturbatively in the limit . In this case the matrix is
(68) where is a diagonal matrix and has vanishing diagonal elements
(69) It is well known from perturbation theory for nondegenerate states that the eigenvalues of this problem show second-order corrections to the diagonal entries in the parameter . Finally we obtain that the onset of instability occurs when the following relation is satisfied
(70) This set of coupled equations reduces to the value found in the Configuration model, i.e. when there is only one single community. Here we report the condition for the leading term in that has the following form
(71) We want to underline that the new percolation condition becomes
(72) meaning that the percolation transition depends strongly on the number of links of the most connected community.
Whenever different communities have the same distribution, i.e. the same second moment , we are able to perform the calculation exactly and the critical value reads (73) 
Second case
The second case that we consider is formed by sparse structured networks with the couplings taking the expression that we write here for convenience (74) In the further approximation that the density of nodes with "hidden variables" and is factorizable, i.e. , we can simplify the eigenvalue problem , to find the critical point of the Potts model as
(75) where is the maximal eigenvalue of the matrix defined as
(76)

5.2 Asymmetric saddle point
Below the phase transition the symmetric solution is no longer stable, as shown in the previous section. In the stationary state of the Potts model a giant component appears, and a more complicated saddle point has to be found. Since one single color becomes dominant, we generalize the ansatz made for the Potts model with homogeneous couplings [Monasson] and propose the following ansatz for the parameter 
(77) 
The free energy density functional thus reads
(78)  
where we have to minimize over the variational parameters . Solving the equation
(79) 
we finally obtain the self-consistent condition for the parameter , which we solve numerically
(80) 
with given by
(81) 
Therefore equation can be expressed as a closed expression for , which is the order parameter for the Potts phase transition. In particular we find
(82) 
The solution of this equation is for and develops a nonzero solution for . The transition can be continuous or discontinuous. In all the ensembles studied in this paper, $q_c = 2$ marks the crossover between a second-order phase transition and a first-order one. This can be understood in the general framework of Landau theory. As is well known, the Hamiltonian of the Potts model is invariant under the permutation symmetry of the colors, which in the case $q = 2$ is accidentally equivalent to the $Z_2$ symmetry. From equation (78) it is easy to show that the free energy is explicitly even under the transformation when $q = 2$, while for higher $q$ the free energy density contains all possible powers of the order parameter . As a consequence, within Landau theory, this property of the free energy for $q = 2$ necessarily results in a continuous phase transition, at least in the absence of an external field. Thus on general grounds we expect that the crossover from a second-order to a first-order transition can occur only at the value $q_c = 2$, independently of the network we choose. If we expand to the first order in we get the equation
(83) 
with the matrix given by . We recover therefore the same critical point , where is the maximal eigenvalue of the matrix as was found by studying the stability of the Potts model above the phase transition.
For the case of the percolation transition, i.e. $q \to 1$ [Fortuin], we have a continuous phase transition and we can study equation for small values of to find the critical exponents of the percolation transition.

The ensemble
In this case the order parameter is independent of and the self-consistent equation simplifies to (84) with . The expansion for small values of and gives
(85) therefore we can derive the known result that
(86) with the mean field critical exponent given by .
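The mean-field behavior near the threshold can be checked numerically in the homogeneous case. The sketch below assumes the standard self-consistent equation for the fraction $S$ of nodes in the giant component of a Poisson random network with average degree $c$, namely $S = 1 - e^{-cS}$, and solves it by fixed-point iteration. The symbols $S$ and $c$ and the function name are assumptions made for this example. Just above $c = 1$ the solution grows linearly, $S \simeq 2(c-1)$, which corresponds to a mean-field exponent equal to one.

```python
import math

def giant_component_fraction(c, tol=1e-12, max_iter=100000):
    """Solve S = 1 - exp(-c S) by fixed-point iteration starting from S = 1."""
    s = 1.0
    for _ in range(max_iter):
        s_new = 1.0 - math.exp(-c * s)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s

for c in (0.5, 1.05, 1.5, 2.0):
    s = giant_component_fraction(c)
    print(c, round(s, 4))

# Linear growth just above the threshold: S / (2 (c - 1)) approaches one as c -> 1
eps = 0.05
slope_ratio = giant_component_fraction(1.0 + eps) / (2.0 * eps)
```

Starting the iteration from $S = 1$ selects the nonzero solution whenever it exists; below the threshold the iteration decays to zero.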

The Configuration model
In the case of the Configuration model the order parameter is independent of , i.e. and the self-consistent equation (82) reduces to (87) where . The expansion of this equation for small values of provides the critical exponents for networks in the Configuration model and generalizes the results of uncorrelated networks to networks with the correlations imposed by the Configuration model. In the case in which is finite, the expansion of (