Random Walks, Markov Processes and the Multiscale Modular Organization of Complex Networks
Most methods proposed to uncover communities in complex networks rely on combinatorial graph properties. Usually an edge-counting quality function, such as modularity, is optimized over all partitions of the graph compared against a null random graph model. Here we introduce a systematic dynamical framework to design and analyze a wide variety of quality functions for community detection. The quality of a partition is measured by its Markov Stability, a time-parametrized function defined in terms of the statistical properties of a Markov process taking place on the graph. The Markov process provides a dynamical sweeping across all scales in the graph, and the time scale is an intrinsic parameter that uncovers communities at different resolutions. This dynamic-based community detection leads to a compound optimization, which favours communities of comparable centrality (as defined by the stationary distribution), and provides a unifying framework for spectral algorithms, as well as different heuristics for community detection, including versions of modularity and Potts model. Our dynamic framework creates a systematic link between different stochastic dynamics and their corresponding notions of optimal communities under distinct (node and edge) centralities. We show that the Markov Stability can be computed efficiently to find multi-scale community structure in large networks.
How the structure of a network affects the dynamics (e.g., diffusion or synchronization) that takes place on it has been studied extensively in recent years [review, bocca, syn2]. This relationship is particularly relevant when the network is composed of tightly-knit modules or communities [GN, simon, fortunato2010community, revporter, sales, clauset], which can lead, for instance, to partially coherent dynamics [syn1, neave], or to the emergence of co-operation [cooperation] and coexistence of heterogeneous ideas in a social network [lambi]. Conversely, it has been proposed that dynamical processes such as random walks [JC, PL, rosvall, MuchaEtAl10] and synchronization [syn1] could be used as empirical means to extract information about the network and, specifically, to uncover its community structure.
Recently, there has been extensive research on the detection of communities and hierarchies in real world systems, ranging from social systems to technological and bio-chemical systems (for a review see [fortunato2010community]). Most of these studies follow from the classical problem of graph partitioning and are thus based on structural properties of graphs [fortunato2010community, revporter]. In order to discover communities, such methods usually proceed by optimizing a quantity that captures what is thought to be the goodness of a partition in terms of combinatorial properties of the graph. A variety of such quality functions (and associated optimization strategies) have been proposed, including different versions of balanced and normalized cuts, as well as modularity and its extensions [fortunato2010community, revporter]. In general, these combinatorial definitions operate by counting the number of links within and between the communities, and are thus blind to the flows of information taking place on the network.
In contrast, we adopt here a dynamical viewpoint for the analysis of community structure in graphs. Specifically, we use statistical properties of a random walk (or its associated Markov processes) evolving on a given network to quantify the quality of partitions across all time scales. Consider, for instance, the simple random walk, where a random walker jumps at every step from the node where it sits to one of its immediate neighbours with a probability proportional to the weight of the link joining the nodes. We define the Markov Stability [JC, lambiotte2008laplacian, Delvenne13, schaub2012markov] of a partition of the graph at time as the probability of a walker to be in the same community at time zero and at time when the system is at stationarity, discounting the expected probability as . For an ergodic and mixing random walk (i.e., on an aperiodic, strongly connected graph), this limiting probability is the probability of two independent walkers to be in the same community. The Markov Stability so defined measures the quality of a partition in terms of the persistence of the Markov dynamics within the communities of the partition within the time scale , i.e., the Markov Stability is large when it is unlikely that a random walker will escape the communities within time . Alternatively, the Markov stability can also be understood as the time auto-correlation of a coarse-grained signal. Hence, a large Markov Stability is equivalent to a non-asymptotic time scale separation [SimonAndo61, kokotovic86] within the diffusion dynamics, where the fast dynamics mixes the probability flow inside the communities and the slow dynamics describes the transfer of probability between the communities. It can be shown that the Markov Stability so defined, which we will make more explicit below, is monotonically decreasing for most partitions on most graphs [JC].
The dynamics-based Markov Stability framework for community detection introduced in [JC, lambiotte2008laplacian, Delvenne13] has mathematical connections with the wider literature relating random walks on graphs and graph properties and allows us to link those results with applications in community detection. A strong initial motivation for our work was the theory of quasi-stationary distributions in Markov chains [darrochseneta_discrete, darrochseneta_continuous], and the theory of quasi-stable (long-lived) states in the physics of energy landscapes [wales2003energy]. Random walks have been used by a variety of methods in graph partitioning and clustering. For example, the mixing rate of the random walker is closely related to the conductance, a measure of quality for two-way partitions [alon1985lambda, lovasz1990mixing, chung2007four]. Through their commute times [YenFoussDecaestFrancqSaerens08] or through more general spectral embeddings [coifman2006diffusion], random walks also allow representations of the graph in a Euclidean space on which classic machine learning techniques can be used, including clustering. Other partitioning algorithms have also made use of random walk measures [PL, meila2001random, piccardi2011finding, van2008graph]. The distinguishing feature of the Markov Stability approach is the systematic sweeping through all time scales, fast to slow, in order to discover fine or coarse partitions, thus relating characteristic time scales of the dynamics to the structural scales present in the network. In constrast, the precited methods focus on a fixed time scale (e.g., one-step) or a fixed number of communities (e.g., two) and hence do not exploit fully the dynamical aspects of the random walk. See [JC] for a more extensive discussion, and Section 2 for an overview of the unifying character of the Markov Stability framework, whose dynamical character allows the interpolation between the structural (edge-counting) measures and the spectral approach to community detection.
In this article, we extend the Markov Stability formalism and show that any random walk on a given network, whether in discrete or continuous time, generates a different partition Stability function, and therefore a different notion of community reliant on specific measures of node and/or edge centrality. Indeed, classical notions of centrality (e.g., degree, eigencentrality, pagerank) can be shown to correspond to different random walks on the networks. Within this framework, we observe that good communities appear as a result of an optimization that balances the cost of severing many or highly central edges against a maximum-entropy spread of the centrality across communities. This compound optimization is parametrically modulated by time, which gives varying weight to the energetic cost of the cut against the maximum entropy term. At long times, the problem turns out to be solved exactly by spectral methods. We show how these dynamical, graph-theoretical and optimization concepts are intertwined, providing insight on the nature of different community structures, the centrality optimizations they entail, and associated spectral partitioning algorithms known in the literature. Our work thus provides a unifying viewpoint for different variants and heuristics used in the graph-partitioning, clustering and community detection literatures, including several variants of null-model-based modularity or spectral algorithms, which appear as particular cases of our formalism. Conceptually, our work indicates that, rather than searching for a single partition at a particular scale, dynamics can be used to unfold and detect systematically the relevant partitions by scanning across all scales in the graph [JC, Delvenne13]. Similarly, we show here that the choice of dynamics can also be used to find the most appropriate community structure (if particular information about the system is available) or to explore the network under different (and complementary) viewpoints to gain deeper information about the system.
The paper is organized as follows. First, the framework is introduced via the standard (simple) random walk and its associated continuous-time processes, including those generated by the normalized and combinatorial Laplacians. We show how the relevant centrality measure in this case is the degree, yet different continuous-time Markov processes (potentially relevant for different network dynamics) lead to different communities linked to particular heuristic null models used in the community detection literature. The dynamical scanning implicit in our framework is used to illustrate the detection of community structure across scales in several examples without imposing the scale or number of communities a priori. Part of these results were reported in the unpublished preprint [lambiotte2008laplacian]. We then consider the analysis of less standard random walks, specifically the Ruelle-Bowen case, and show that its notion of community is based on a different kind of centrality, i.e., eigencentrality. This is followed by a brief section where we show how the dynamical viewpoint afforded by Markov Stability seamlessly extends to the case of directed graphs, thus allowing us to recast the concept of structural communities in terms of flow communities. The final section illustrates the framework with the analysis of synthetic benchmarks and real-world examples, and discusses computational and practical issues for Markov Stability, e.g. assessing the presence of robust partitions, or of a hierarchical structure.
2 The simple random walk and community detection: Discrete-time Markov Stability for undirected graphs
To make our arguments more precise, we first review briefly some of the notation and results from [JC, Delvenne13], where mathematical proofs and further results can be found. For simplicity, we start by considering the case of undirected graphs, although we will see below that the arguments extend to directed graphs too.
Consider an undirected graph with nodes and weighted adjacency matrix , such that the weight of the link between node and node is given by . The vector containing the degrees (or strengths) of the nodes is , where is the vector of ones, and we also define the diagonal matrix . The sum of all degrees is . The combinatorial graph Laplacian is defined as and the normalized graph Laplacian is defined as . Both Laplacians are symmetric nonnegative definite, with a simple zero eigenvalue when the graph is connected [chung1997spectral]. We denote the trace with the notation .
Consider the simple (unbiased) random walk governed by the standard dynamics:
where denotes the dimensional probability vector and is the transition matrix. Note that following the Markov chain literature, the probability vectors are defined as row vectors. Under the assumptions of a connected, undirected, and non-bipartite graph this dynamics converge to a unique stationary distribution
Each partition of the graph into communities is encoded by a indicator matrix with , where a 1 denotes that node belongs to community . Given a partition , the clustered autocovariance matrix of the diffusion process at time is:
where . The matrix reflects the probability of the random walk to remain within each block (diagonal elements) and to transfer between blocks (off diagonal elements) after a time . Consequently, we define the Markov Stability of the partition as
the approximation coming from the computational observation that is mostly monotonically decreasing for empirical graphs [LeMartelot2012]. A ‘good’ partition over a time scale has well-defined communities that preserve probability flows within them, hence maximizing the trace of and, conversely, the Markov Stability can be seen as a quality function for a partition of a graph as a function of the time horizon of the random walk.
The Markov Stability can be used to rank partitions of a given graph at different time scales or, alternatively, can be used as an objective function to be maximized for every time in the space of all possible partitions of the graph:
Such an optimization results in a sequence of partitions optimal over different time interval. Although this optimization is NP-hard, a variety of efficient optimization heuristics for graph clustering can be used, as discussed in later sections.
The discrete-time Markov Stability for undirected graphs encompasses several well-known heuristics and has other desirable theoretical properties, some of which we highlight here succinctly (see [JC, Delvenne13] for proofs):
Discrete-time Markov Stability at time is equal to the ‘usual’ modularity , i.e., with the configuration model as null model [newman_modul_PNAS, GN]:
Markov Stability at time is equivalent to the Gini-Simpson diversity index of the partition [simpson1949measurement]:
where is the -th column of the matrix . GS is a measure of entropy of the partition according to the values of , i.e., the degree. GS is large when the partition has many communities of equal size (according to ), and is low when the partition has few and uneven communities. GS is maximum for the partition into one-node communities. This index is well known in economics (Hirschman-Herfindahl index [hirschman1964paternity]) and information theory (Rényi entropy [renyi1961measures]), among others.
The probability of changing community in one step
is a measure of the cut induced by the partition, i.e., the fraction of edges between all the communities.
The long-term behavior of is governed by the normalized Fiedler eigenvector associated with the second dominant eigenvalue of , i.e., that which is closest to in absolute value. Hence the optimal community structure as is typically 111Close-to-bipartite graphs are the exception: they have a strongly negative eigenvalue whose odd and even powers generate an alternating . given by the bipartition according to the sign of the entries of the normalized Fiedler eigenvector [JC, Delvenne13].
Spectral algorithms (either iterative or based on several eigenvectors at a time) are classic relaxation heuristics [Fiedler73, Fiedler75] for the optimization of a variety of NP-hard partitioning quality functions, including modularity [Newman06] or normalized cut [ShiMalik00]. We have shown that spectral clustering methods provide exact procedures for the optimization of Markov Stability at long times.
3 Continuous-time Markov Stability: the dynamical origin of different quality functions
We now consider continuous-time Markov processes associated with the simple random walk (1) in order to extend our dynamics-based framework for community detection in undirected graphs.
3.1 Normalized Laplacian Markov Stability
Given the random walk (1) on an undirected graph, a standard way to derive a continuous-time model is to assign a continuous Poisson process of given density at each node [Bremaud, tijms2003first]. If we assume identically distributed Poisson processes (i.e., with identical waiting times) for all nodes, we obtain the standard diffusive dynamics:
Note that the operator is isospectral with the normalized Laplacian since they are related by the similarity transformation . Hence the dynamics of (9) is dictated by the spectral properties of . In particular, this process converges to the same unique stationary distribution (2) as the (discrete-time) simple random walk. As above, we thus define the continuous-time Markov Stability as:
where the notation emphasizes the connection with the normalized Laplacian. This continuous-time version of Markov Stability shares broadly similar properties with the discrete-time version (4), and most of the discussion presented in Section 2 applies here. For instance, Figure 1 shows the results of the optimization of over time and over the space of partitions for a simple example. Note that the Markov Stability explores the community structure at all scales (from finer to coarser) using the dynamic zooming provided by the Markov time of the diffusion process . The relevant (time) scales emerge as the ones leading to persistent (robust) partitions over extended intervals of time. See Section 6 and Refs. [Delvenne13, schaub2012markov] for a discussion of some of the practical issues of the computational implementation and more illustrative examples.
It is also instructive to consider the behavior of (10) in the limit of small times, . Keeping terms to first order, we obtain the linearized Markov stability:
Analogously to (8), the instantaneous probability rate of the walker escaping from its initial community is the Cut.
The Potts model heuristic proposed by Reichardt & Bornholdt [ReichardtBornholdt06] is exactly recovered as the linearized Markov stability. Hence we can see the Markov time as the equivalent of a resolution parameter. From (12) it also follows that the ‘usual’ modularity [newman_modul_PNAS, GN] is recovered at for undirected graphs:
Equation (11) provides an interpretation of Markov Stability as a compound quality function to be optimized under two competing objectives: minimize the Cut size while trying to maximize the diversity , which favours a large number of equally-sized communities according to , thus resulting in more balanced partitions. The relative weight between both objectives is modulated as the Markov time increases.
The stationary distribution plays a key role in the definition of the community quality function:
Firstly, can be understood as originating the null model of modularity, i.e., the model of random graph against which the network is compared to detect the significance of the communities. The null model in the ‘usual’ modularity is the configuration model, which randomly rewires the edges of a given graph preserving the degree of every node. The probabilistic description of this model is given by the outer product , which in our dynamical interpretation corresponds to the expected transfer probabilities at stationarity for this Markov process.
Secondly, measures the diversity of the partitions according to the node property . Hence, as the value of grows, the optimization leads to balanced distributions of across communities, splitting nodes with high values of into different communities. In this case, we tend to segregate nodes with high degree into different groups.
3.2 Combinatorial Laplacian Markov Stability
Given a discrete-time random walk, a variety of continuous-time Markov processes are possible. Although in (9) we assumed identical Poisson processes at all nodes, we have the flexibility to assign different waiting times at each node. An interesting choice is to consider that the waiting time at each node is inversely proportional to its degree, i.e., the walker spends less time on nodes of high degree. Using an inhomogeneous rescaling of time this leads to a Markov process governed by the combinatorial Laplacian:
where is the average degree and . The stationary distribution of (3.2) is now the uniform distribution over the nodes:
and the combinatorial continuous-time Markov Stability is:
The corresponding linearized version is then:
In this case, the stationary distribution leads to a different diversity index:
where is the number of nodes of community . The modularity associated with this process is:
which is precisely the modularity based on the Erdös-Rényi (ER) null model with a probabilistic description given by the outer product . This version of modularity was originally discussed by Newman [GN, newman_modul_PNAS] and has been recently studied against network benchmarks [traag2011narrow]. Based on our arguments above, the combinatorial Markov Stability optimizes partitions that balance the Cut against the diversity , which ignores degrees and counts only the fraction of nodes present in each community. Hence, it is more likely to group nodes with high degree in the same community when using combinatorial Markov Stability, as we will discuss below.
Finally, we remark that at long Markov time scales, the combinatorial Laplacian dynamics recovers the bipartition based on the classic heuristic of the signs of the components of the Fiedler eigenvector [Fiedler73], which constitutes the basis of several spectral algorithms. As stated above, the normalized Laplacian version converges to the bipartition based on the normalized Fiedler eigenvector, which is also used in other spectral algorithms like Shi-Malik [ShiMalik00]. Seeing those algorithms as the coarser extreme of a range of community detection problems provides additional insight into the meaning and differences between those popular spectral algorithms.
3.3 Normalized vs Combinatorial Markov Stability: some examples
The relevance of dynamical coherence
As discussed above, a driving force in the definition of quality functions for community detection has been the use of null models, i.e., random graph models that preserve certain properties of the graph under study and act as bootstraps to establish the significance of communities. Early on, it was proposed [GN, newman_modul_PNAS] that the configuration model should be preferable to Erdös-Rényi as the null model, because the former takes into account the degree heterogeneity typically found in realistic networks. However, it has been recently shown[traag2011narrow] that the Erdös-Rényi model behaves at least as well as the configuration modularity on benchmarks [LancichinettiEtAl08] and leads to improved results in particular graphs.
Under our dynamical framework, the two null models correspond to the stationary distributions of the Markov processes governed by the normalized and combinatorial Laplacians. The two Laplacian dynamics can emerge naturally in the modelling of different continuous-time dynamics on networks, such as heat diffusion [kondor2002diffusion, chung1997spectral], the linearization of Kuramoto oscillators [syn1, syn2], or consensus dynamics [ali, ROS-JAF-RMM:07, neave]. In the important cases when the dynamics of the system is governed by the combinatorial Laplacian (e.g., synchronization, consensus, or vibrational dynamics), we expect that the relevant dynamical groupings should correspond to communities obtained using the combinatorial version of Markov Stability (i.e., corresponding to the ER null model) and not the canonical configuration model.
Figure 2 illustrates this point by examining relevance of dynamic communities in synchronization dynamics on a toy network made of two triangles: links in the upper triangle have weight 5; links in the lower triangle have weight 25; and they are connected by links of weight 1. The dynamics of the network is given by the Kuramoto model with uniform frequencies, a prototypal model for synchronization where each node has a phase evolving as
The coherence between nodes and is measured by the order parameter , where the average is performed over an ensemble of random initial conditions. The coherence computed from simulations (bottom panel) shows that the lower triangle is always more coherent than the upper triangle, as expected. If we threshold to find coherent clusters [syn1], the first group detected is the lower triangle, followed by the upper triangle at later times. If we use the combinatorial Markov Stability on this toy graph, this sequence of partitions is correctly uncovered. This follows unsurprisingly from our dynamical interpretation since the linearization of the Kuramoto dynamics leads to the combinatorial Laplacian. In contrast, does not recover this result, as it first uncovers a dynamically irrelevant partition where the upper triangle is found. Interestingly, numerics on Kuramoto dynamics [syn1, arenas2] have shown that the ‘usual’ modularity is only optimized for near-regular graphs, i.e., when it is equivalent to the true optimization performed by the dynamics, . Therefore, if we are interested in coherent Kuramoto communities (e.g., motivated by power grid applications [dorfler2013synchronization]), the partitions found with the ‘usual’ modularity could be misleading. On the other hand, if we are interested in the study of probabilistic diffusive dynamics, the relevant communities should follow from the study of .
An optimization perspective: distinct cost functions
Further insight into the communities for each version of Markov Stability can be gained by examining the role of the stationary distribution of the Markov process in the definition of the diversity index appearing in the compound cost function to be optimized. From the definitions (7) and (19) of the diversity indices GS and GS (associated with the normalized and combinatorial versions of Markov Stability, respectively), it follows that the normalized version balances communities with respect to their edge volume while the combinatorial version balances communities with respect to their node volume. Therefore, the normalized version (related to the ‘usual’ modularity) tends to separate nodes with high degree into different communities. This may lead to unexpected results, e.g., in assortative networks, where high degree nodes tend to be strongly connected to one another, yet could be split when using quality functions based on the configuration model.
To illustrate this point, consider the community structure uncovered in the symmetrized version of the C. elegans neural network, a weighted network with nodes and edges. The partitions found by the combinatorial and normalized versions of Markov Stability are significantly different—not unexpectedly since the graph is far from being degree-homogeneous. In Fig. 3, we present the partitions at for both versions consisting of mainly 3 large communities. As discussed, the optimization of tends to balance the total degree of the communities , while tends to balance the number of nodes of the communities. Indeed, for the combinatorial Laplacian, the total degree of each of the three communities are , whereas these numbers are more balanced for the normalized Laplacian: . On the other hand, the fact that the combinatorial Markov Stability does not penalize as much grouping together nodes with high degree into the same community can also be seen in Fig. 3. The high degree nodes tend to be split evenly among the three communities for the normalized Laplacian, while the combinatorial Laplacian has a disproportionately large number of high degree nodes grouped together in the red community, less so in the green community and even fewer in the blue community. More specifically, the top 20 nodes with the highest degree are distributed among the three communities in the ratios for while the corresponding ratios for are .
3.4 The simple random walk and its continuous-time versions: degree as centrality
Our discussion above leads to the following generalization of the continuous-time versions of the simple (unbiased) random walk. When taking the continuum limit, the waiting times at each node can be weighted by any power of the degree:
where the notation denotes the average over all the nodes, i.e., , and we have introduced the -scaled Laplacian:
The stationary distribution of (3.4) is then
and the corresponding -scaled Markov Stability is:
The linearized version reads:
and, again, the diversity index of the partition is measured as a function of the stationary distribution :
Clearly, corresponds to making the waiting time independent of the degree and leads to the normalized Laplacian Markov Stability, while corresponds to making the waiting time inversely proportional to the degree and produces the combinatorial Laplacian version.
This generalization allows us the flexibility to modulate the effect of degree centrality in community detection using other continuous-time dynamics. We could consider a model where the waiting time is proportional to the degree, i.e., . This could be interpreted as the model of a random web surfer, spending on average more time reading a page with higher number of links. The community detection on such a system would then be based on the non-standard Laplacian and the diversity index (27) will try and balance communities according to the square of the degree, making it even more unlikely to group high degree nodes in the same community. If, on the contrary, we consider a model where the waiting times have an inverse square dependence on the degree (), the diversity index (27) would then be based on the inverse of the degree, and the community detection will tend to push neighboring high degree nodes together in a single community, while low degree nodes stand separated, as in a core-periphery decomposition. This phenomenon will be more acute as we make more negative, whereas, conversely, a large and positive will put the emphasis on separating the few top degree nodes, disregarding almost entirely the effect of the majority of nodes.
This extended discussion of the simple random walk and associated Markov processes highlights the connection of dynamical community detection with concepts of centrality. Measures of centrality aim at rating how connected nodes are with the rest of the network. The weighted degree is perhaps the most elementary concept of centrality—indeed, it is sometimes referred as ‘degree centrality’. As shown above, the degree appears as the stationary distribution of the simple random walk (1), and the optimization of the quality function for community detection balances the partitions according to the diversity of degree centrality. In particular, it is optimal to split apart highly central nodes (i.e., with high degree in this case) into different communities for short enough Markov time scales, and to aim towards balanced intra-community edge centrality. The continuous-time versions are able to modulate, amplify, attenuate, cancel or even invert the effect of degree centrality as the power is varied. We consider the connection of dynamical community detection with other measures of centrality in the following section.
4 Community detection based on other notions of centrality: the Ruelle-Bowen random walk
4.1 The role of centrality in community detection
In different applications, it might be desirable to employ other measures of centrality as the linchpin for community detection. We can achieve this using the random-walk framework discussed above. Many discrete-time random walks other than the simple random walk may be performed on a network. We then may think of the stationary distribution of every random walk as a centrality measure. Every random walk with transition matrix will then be associated with a dynamical Markov Stability quality function, and the corresponding community detection will produce optimized partitions which are balanced according to different measures of centrality. A generic way to generate random walks is to bias the simple random walk [lambiotte2011flow]. For instance, one may attribute a positive number to every node (e.g., a property related to a measure of centrality) and let a random walker at jump to with probability proportional to .
Once the discrete-time random walk (and its associated centrality) is chosen, different continuous-time processes can be obtained. Generically, this is done by combining two ingredients: the transition probabilities of the discrete-time random walk (i.e., the row-stochastic matrix ) and the waiting times of the continuous-time process at each node (compiled in a node vector ). The resulting process is then:
with . These two ingredients come into play differently in determining the corresponding Markov Stability function for community detection. The discrete-time random walk defined by determines the stationary distribution on nodes. On the other hand, the continuous-time stationary distribution on node , or node centrality, is given by , where is the normalization constant . As shown in the examples above, the choice of waiting times can thus modulate the effect of the node centralities. The centrality of edge , on the other hand, is the probability that an observed transition links to , which does not depend on the time elapsed between transition but rather on the respective frequencies of transitions given by . Edge centralities are therefore given by , hence completely determined by the discrete-time transitions and unaffected by waiting times. As a result, the discrete-time transitions and waiting times have a different effect on the resulting Markov Stability function: waiting times have no influence on the edge centrality but afford complete control over the node centrality (and on the Gini-Simpson term of the cost function), whereas the Cut term is completely determined by the edge centralities (i.e., the underlying discrete-time random walk). At long times, the optimal split is provided by the sign pattern of the second eigenvector of the ‘generalized Laplacian’ , which depends both on the discrete-time transitions and the waiting times . We now explore a classic discrete-time random walk with distinctive properties.
4.2 Community detection according to the Ruelle-Bowen random walk
A particularly interesting example is the random walk introduced by Ruelle, Bowen and others [Ruelle1978]. Consider a graph with adjacency matrix , under the usual assumptions of connected, undirected, and non-bipartite, for simplicity. An important notion of centrality is associated with , the dominant eigenvector of A (i.e., the eigenvector with the largest eigenvalue):
The eigencentrality [Bonacich72] of node is given by , its correspondent component of this eigenvector.
The discrete-time Ruelle-Bowen (RB) random walk is defined such that the transition between nodes and occurs with probability :
with the probability vector and . Under such assumptions, the unique stationary distribution of the RB random walk is
since for the normalized eigenvector. The stationary distribution can be seen as a centrality measure, which is called entropy rank (for the unweighted case) or free energy rank (for the weighted case) [delvenne2011centrality], thus essentially equivalent to eigencentrality in terms of ranking (although the concepts diverge in the directed case, not analyzed here).
This classic random walk has an interesting interpretation in terms of entropy: it is maximally exploratory in the sense that its per-step entropy is maximal. More precisely, let denote the (Kolmogorov-Sinai) entropy rate of the random walk, which is the average per-step entropy that is asymptotically approached for long paths, and let be the expectation of the edge transition energies , such that . Then the RB random walk maximizes the ‘free energy’ . It therefore tends to make all paths of same length equiprobable, with a bias to make high energy paths more probable [parry1964intrinsic]. Beyond its thermodynamic properties, the Ruelle-Bowen walk naturally emerges in other contexts, such as the computation of quasi-stationary distributions [darrochseneta_discrete, darrochseneta_continuous].
Similarly to the simple random walk, we can associate continuous-time Markov processes to the RB random walk. The simplest is given by the homogeneous waiting times:
with as in (30). The node stationary distribution of (32) is given by (31), whereas the edge centralities are given by the matrix . The full and linearized versions of the RB Markov Stability follow closely the expressions in (10)–(12). This continuous-time process can be generalized through the choice of waiting times.
The RB Markov Stability has connections with other heuristics in the literature. For instance, the spectral algorithm associated with the RB random walk on an undirected graph makes use of the second eigenvector of the adjacency matrix , similarly to the ‘adjacency spectral clustering’ of Sussman et al. [sussman2012consistent]. To illustrate the flexibility of the framework in designing cost functions associated to different notions of communities, let us consider waiting times . This choice makes thenode centralities proportional to the degree, since the discrete-time RB walk induces stationary probability on nodes proportional to (see Eq. (31), while the edge centralities, unaffected by waiting times, are still determined by the edge entropy rank. The linearized Markov Stability optimization will now look for communities balanced in terms of number of edges (through diversity term) while cutting edges with low entropy rank (through the Cut term).
As a simple example of the impact of such a choice on the outcome of partitioning, consider the graph composed of two -cliques and and a -cycle , interconnected by single edges. From the point of view of the simple random walk Markov Stability, cutting the edge or the edge is indifferent as far as the cut term is concerned. However, RB Markov Stability favours cutting the less central edge, thus isolating first the ‘hollow’ module on the account of cut minimization, while the Gini-Simpson term tends in this case to keep apart high-degree nodes, thus inducing non-trivial results [ochab2013maximal]. This priming of eigencentrality in the allocation of community splits could be desirable for particular applications, e.g., when analyzing networks with highly heterogeneous eigencentrality across the nodes. This will be particularly important in networks whose node eigencentrality is not fully captured by the degree centrality [Bonacich2007], e.g., when a low-degree individual is connected to high degree others or in which a high-degree node is only connected to low degree others.
Finally, an interesting property of the Ruelle-Bowen random walk is its universality. Any linear dynamics , where is a row vector of real entries over the nodes and is a nonnegative primitive matrix, can be transformed to make it interpretable as a random walk [ES:81]. Hence, besides consensus, heat diffusion, linearized synchronization, etc, random walks can also be used to represent a wider class of dynamics on networks.
5 Markov Stability for Directed Graphs
Another advantage of the dynamical framework for community detection introduced above is that it extends naturally to directed graphs, whereas the extension of structural quality functions, such as modularity, to the case of directed graphs is not trivial. For instance, although it has been argued [leicht2008community, Arenas] that the null configuration model in modularity should become in order to account for the directionality of the links, this choice and justification of the null model for directed graphs is not unique. Under our dynamical viewpoint, the notion of community becomes that of flow community, and the relevant centrality is pagerank with its associated null model, as we show below.
Consider the simple random walk for a directed graph wit the (non-symmetric) adjacency matrix: . Each node has an in-degree, collected in the vector , and an out-degree, collected in the vector , i.e., the sum of the weights of the edges directed at and departing from the node, respectively. The simple random walk in this case is given by
where and . For nodes where , we set .
For simplicity, consider first the case when the graph is strongly connected and aperiodic. Then the random walk (33) is ergodic and has a unique, stationary distribution corresponding to the dominant left eigenvector of . The stationary distribution is called pagerank, a key measure of centrality in directed graphs [perra2008spectral]. We can then define the directed Markov Stability based on the random walk (33), which has the same form as (4) and (3). This quality function can be used the same way as the undirected version to extract multiscale structure in graphs by using the Markov time as a resolution parameter. The directed Markov Stability at time which, following (6) above, corresponds to our quality function most closely related to ‘directed modularity’ :
Note that the null model we obtained here corresponds to the outer product of the normalized pagerank vector , in lieu of in- and/or out-degree vectors [leicht2008community, Arenas].
Clearly, using (34) gives different results to structural versions of directed modularity based on in- and out-degree null models. While optimization of (34) favours partitions with persistent flows of probability within modules, modularity favours partitions with high densities of links and is blind to the flow actually taking place on these links. To illustrate the difference, consider the toy example given by [rosvall] (Fig. 4), on which the directed random walk is ergodic. In this case, optimizing the in/out-degree modularity of this toy network leads to a partition where heavily weighted links are concentrated inside communities, as expected. On the other hand, optimization of directed Markov Stability leads to a partition where flows are trapped within modules. It is also interesting to stress that the partition that optimizes (34) also optimizes the map equation proposed by Rosvall and Bergstrom[rosvall]. For an independent study of directed modularity based on other arguments, see Kim et al [fKim].
Our definition of directed Markov Stability relies on the condition that the dynamics is ergodic. When the directed network is not ergodic, it is common to generalize the standard random walk by incorporating a random teleportation term (also known as ‘Google teleportation’). If the walker is located on a node with at least one outlink, it follows one of those outlinks with probability . Otherwise, with probability , the random walker teleports with a uniform probability to a random node. Instead of , the new transition matrix of the random walk (33) becomes:
where the vector is an indicator for dangling nodes: if (and the corresponding row of is assumed to be zero) and otherwise. Upon visiting a dangling node, a random walker is teleported with probability 1. It is customary to use the value . The teleportation scheme is known to make the dynamics ergodic and to ensure the existence of a single stationary solution that is an attractor of the dynamics. Indeed, teleportation is sometimes introduced even in the ergodic case to improve the numerical convergence of pagerank computation.
Finally, we remark that, as for the undirected case, there are continuous-time versions of directed Markov Stability. The simplest is given by the corresponding Kolmogorov equation:
and our discussion above applies to these processes too. An application to a large graph of airport connections is presented in the next section. See also [Beguerisse2014riots] for an application to social network analysis.
6 Computational methodology and practical considerations
Given a network, and based on modelling considerations or other assumptions, we can choose a discrete- or continuous-time Markov process to scan dynamically the structure of the graph at all scales. As shown in the toy example of Figure 1, the optimization of the chosen Markov Stability across time leads to a sequence of partitions that are optimal at different time scales. The extraction of these optimized partitions is the first step to uncover the multi-scale modular structure of the network (if present), but the practical application of the method still involves at least two non-trivial steps, which we now discuss in conjunction with several larger examples. Although the examples in this section exhibit a relatively hierarchical community structure, in Supp.Inf. we illustrate and measure quantitatively non-hierarchical multi-scale structures.
6.1 Optimization of Markov Stability
Although it has been shown that modularity optimization is NP-hard [NP], several heuristic algorithms have been proposed to provide satisfactory solutions, in the sense that they efficiently recover planted solutions in benchmark graphs, or that they can uncover groups that are clearly meaningful (e.g. classes in a school social network, etc) [fortunato2010community]. It has also been shown that in real-world examples the modularity landscape over partitions tends to exhibit large rugged plateaux, making it possible to find an approximately optimal partition [Good].
We will now show that it is always possible to rewrite the Markov Stability for any choice of random walk as the modularity of another symmetric graph. This observation has important practical implications, as it makes it possible to use any modularity-maximization algorithm, e.g. spectral or greedy, for the optimization of any version of Markov Stability. For example, consider the discrete-time stability , for transition matrix and the corresponding centrality . It is easy to see that this is the usual modularity for the graph of weighted adjacency matrix , a symmetric matrix of degree sequence . A similar observation holds in continuous time (where the exponential can be evaluated by Padé approximations), and also for the linearized versions of Markov Stability.
Any modularity maximization algorithm can therefore be used for Markov Stability optimization. As some of those algorithms [Blondel] are empirically known to run in on -edge graphs, the most expensive step turns out to be matrix multiplication or computation of the exponential, which limits the application of full Markov Stability to graphs with nodes. These overheard costs are spared when using the linearized version of Stability, which becomes the most suitable for the multi-scale analysis of very large networks . In our applications below, we have used mainly the Louvain algorithm [Blondel] adapted to the optimization of Markov Stability222An efficient code, also with a Matlab interface, can be downloaded at http://wwwf.imperial.ac.uk/~mpbara/Partition_Stability/ or http://michaelschaub.github.io/PartitionStability/, although spectral bisection methods [shimalik] for the generation of optimized partitions yields good results [JC].
6.2 Robustness of partitions
Once the sequence of optimized partitions is obtained, we need to select the most relevant scales (partitions) for our description. This is a well-known challenge for multi-resolution methods. Notions of robustness are usually considered when dealing with NP-hard optimizations to reflect the ruggedness of the landscape of the quality function to be optimized [Good]. In our approach, we establish the significance of a particular partition based on its robustness in three different ways [ronhovde, Delmotte2011, stability1, karrer, reichardt, multiscalelambiotte]: (i) robust (persistent) across time; (ii) robust to small perturbations to the graph; and (iii) robust to the optimization algorithm and the starting point of the optimization. We now exemplify (i) and (iii).
The basic notion is to evaluate the effect of these perturbing factors on the optimized partition: a partition is robust if such perturbations have little effect on the outcome and the perturbed result remains close to the unperturbed one. A popular way to compare two partitions and is the normalized variation of information [meila]
where is the conditional entropy of the partition given , i.e., the additional information needed to describe once is known assuming a uniform probability on the nodes. The conditional entropy is defined in the standard way for the joint distribution that a node belongs to a community of and to a community of . The normalized variation of information has been shown to be a true metric on the space of partitions and vanishes only when the two partitions are identical.
Within the Markov Stability framework, we use this metric to evaluate the persistence of partitions across time. By looking for block-diagonal regions with low values of , as well as plateaux in the number of communities as a function of time [mason], we can detect the relevant partitions and scales without assuming them a priori. Two examples of this approach are shown in Fig. 5, where we illustrate the detection of the relevant scale (12 communities) in a small real-life network of American football teams (), as well as three scales in a hierarchical benchmark random network with nodes. The same notion is evaluated in Fig. 6, where we detect 4 hierarchical levels in a larger benchmark network with nodes by comparing partitions across time using the scaling factor to evaluate .
In addition to the robustness of partitions based on persistence across time, it is also helpful to evaluate the robustness of the solution with respect to the optimization. We do this by repeating the Louvain optimization many times (in excess of 100 random initial seeds for each Markov time) and evaluating the average normalized variation of information within the ensemble of optimized solutions. If a partition is robust to the optimization, we expect a small value (or a dip) in the normalized variation of information of the ensemble of optimized solutions, signaling a relevant partition. This robustness to the optimization probes the ruggedness of the landscape and can be tested for different optimization algorithms [Good]. Here we use the Louvain algorithmic heuristic, which has been shown to perform well both in benchmarks and real-life examples [multiscalelambiotte]. In Figures 7 and 8, we show the application of this approach to two large networks: an undirected, weighted atomic protein network with nodes; and a directed, weighted network of airport connections with nodes. In both networks, we find relevant structure at different resolutions. Of note is that our results in the protein network are able to identify partitions corresponding to relevant chemical structures (involving only a few nodes), through secondary structures such as helices (involving several hundreds of atoms) to large conformational domains and, importantly, the subunits (involving several thousands of atoms). In the case of the airport network, the different levels of resolution reveal geographical and socio-political groupings. In this case, the directed character of Markov Stability is able to reveal communities with specific flow characteristics, including regions with focalized entry points coupled to a local asymmetric distribution network (e.g., Alaska and Greenland).
The selection of the relevant scales is still an open area of research in multiscale community methods and has strong links with non-convex optimization. Our notions of robustness reveal that the optimized partitions found at peaks of the variation of information tend to be hybrid combinations of natural partitions with non-uniform resolution, splitting some but not all the coarser communities, thereby explaining a high sensitivity to the random seed or Markov time. In other cases, such peaks correspond to the coexistence of a few ‘good’ partitions, which might indicate a tendency to flip between such outcomes and, hence, a lack of robustness. In this sense, the peaks in the variation of information tend to signal the separation between the relevant scales in the community structure of the network, and can also be related to the existence of non-hierarchical (yet multi-scale) community structure (see Supp. Info. for some examples). These topics will be the object of further work.
Our work emphasizes the importance of choosing proper dynamical processes in order to uncover information in networked systems. Here, we have focused on random walk processes, which are known to be mathematically equivalent to a broad range of diffusive processes: heat diffusion, evolution on a (free) energy landscape [wales2003energy], opinion dynamics on social networks and other kinds of consensus problems [fax2004information, ROS-JAF-RMM:07], linearization of synchronization [StevenH20001, syn2] and power networks [dorfler2013synchronization], among others. Importantly, using the random walk corresponding to the natural dynamics of the system allows us to find its central nodes (according to its intrinsic centrality measure) and to recover dynamically meaningful communities, i.e., the communities of nodes that best retain the diffusive flow for a certain time scale. If there is no intrinsic dynamics in the system, and hence no unique choice for the exploratory Markov dynamics, our approach provides tools to understand the effect of the different choices of random walks and associated centrality measures on the community structure obtained through Markov Stability optimization.
|Simple Random Walk||Normalized Laplacian||Combinatorial Laplacian||Ruelle-Bowen|
|Linearized Stability||Potts model[ReichardtBornholdt06]||Potts model[ReichardtBornholdt06]||Potts model[traag2011narrow, ReichardtBornholdt06]|
|Time-one (linearized) stability||Modularity[NG]||Modularity[NG]||Modularity[NG]|
|Null model||Configuration model||Configuration model||Erdös-Rényi|
|Spectral Algorithm||Shi-Malik[shimalik]||Shi-Malik[shimalik]||Fiedler[Fiedler73, Fiedler75]||Sussman[sussman2012consistent]|
More generally, our approach provides a unified viewpoint for a number of existing approaches, as summarized on Fig. 9, and Our approach paves the way for the development of metrics and algorithms that exploit real-world non-Markovian random walks [Rosval] or incorporate non-trivial temporal patterns into diffusive models [Hoffman]. Our work also opens perspectives in community detection by providing a dynamical interpretation of quality functions, and by interpreting the standard null-model paradigm in terms of stationary distributions [NG, GN]. The dynamical approach that we advocate here, not only generalizes the null model paradigm, but can also lead to fundamentally different quality functions. For instance, even the simple random walk on a directed graph leads to a Stability function containing the pagerank, which is not expressible in terms of combinatorial quantities, hence different from any null-model-based variant of modularity. The dynamic and null-model paradigms do overlap in a number of interesting cases. We have shown that for undirected networks, the two most common continuous-time dynamics, described by the normalized and combinatorial Laplacians, correspond to the two most meaningful null models, i.e., the configuration model and the Erdös-Rényi model. Through the intuition gained from the corresponding dynamics, we reinterpret the Erdös-Rényi null model (long considered as inferior in the null-model literature [NG, GN]) and show that it is linked to an optimization that tends to produce node-balanced communities, and can be more relevant under particular dynamical processes, consistent with the findings of Traag et al [traag2011narrow]. The exploration of alternative random walks, such as the Ruelle-Bowen walk, also highlights the capability of introducing alternative measures of centrality and extending community detection to include non-standard Markov processes.
We thank M.T. Schaub and A. Delmotte for extended discussions and for help with some of the figures and computations. We also acknowledge T. Evans, S.N. Yaliraki, M. Draief, H.J. Jensen, V. Blondel, M.A. Porter, V. Latora, M. Rosvall and J. Saramäki for fruitful discussions. J.-C.D. acknowledges support from the Belgian Programme of Interuniversity Attraction Poles, an Action de Recherche Concertée (ARC) of the French Community of Belgium, and FP7 STREP project EULER of the European Commission. R.L. acknowledges support from FNRS and project Optimizr of the European Commission. M.B. acknowledges funding from grant EP/I017267/1 of the UK EPSRC (Engineering and Physical Sciences Research Council) under the Mathematics Underpinning the Digital Economy program.
Hierarchical versus multiscale organization
Our use of time as a resolution parameter enables Markov Stability to detect robust partitions at different scales without imposing a priori the coarseness of the partitions. Although some of the methods used to optimize Markov Stability can lead to hierarchical community structure (e.g., the use of recursive bipartitions via Shi-Malik [JC]), we also use optimization heuristics that do not impose such a constraint (e.g., the use of the Louvain algorithm [Blondel] optimized independently at each time). It is then interesting to check whether or not the sequence of partitions is compatible with a hierarchical organization. This problem requires the introduction of a quantity that measures whether the communities at time are nested into the communities at a subsequent time . A well-known information theoretic measure that is particular adapted for such a purpose is the normalized conditional entropy:
which is also constrained to the interval but is now an asymmetric quantity that vanishes only if each community of is the union of communities of . The combined knowledge of and therefore allows us to uncover the significant partitions of the system and to verify if those partitions are organized in a hierarchical manner. For instance, the benchmark in Fig. 5b is clearly hierarchical, as can be seen in Figure 10a, whereas the toy network in Fig. 10b shows that the sequence of the optimal partitions is not necessarily hierarchical.