A local perspective on community structure in multilayer networks

Abstract

The analysis of multilayer networks is among the most active areas of network science, and there are now several methods to detect dense “communities” of nodes in multilayer networks. One way to define a community is as a set of nodes that trap a diffusion-like dynamical process (usually a random walk) for a long time. In this view, communities are sets of nodes that create bottlenecks to the spreading of a dynamical process on a network. We analyze the local behavior of different random walks on multiplex networks (which are multilayer networks in which different layers correspond to different types of edges) and show that they have very different bottlenecks that hence correspond to rather different notions of what it means for a set of nodes to be a good community. This has direct implications for the behavior of community-detection methods that are based on these random walks.

LUCAS G. S. JEUB
Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, OX2 6GG, UK
MICHAEL W. MAHONEY
International Computer Science Institute, Berkeley, CA 94704
Department of Statistics, University of California at Berkeley, Berkeley, CA 94720
PETER J. MUCHA
Carolina Center for Interdisciplinary Applied Mathematics, Department of Mathematics, University of North Carolina, Chapel Hill, NC 27599-3250, USA
MASON A. PORTER
Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, OX2 6GG, UK
CABDyN Complexity Centre, University of Oxford, Oxford, OX1 1HP, UK

A “community” in a network describes a densely connected set of nodes (often relative to a null model), and communities can reveal regularities in processes of network formation, strongly influence the behavior of dynamical processes that take place on a network, and be related to functional groups of nodes [Porter et al., 2009; Fortunato, 2010; Coscia et al., 2011]. One can examine community structure in a network from either a global perspective or a local one. When using a global perspective, one typically partitions a network into a set of (potentially overlapping) communities; by contrast, when taking a local perspective, one seeks to determine the community or communities associated with a given node (or set of nodes). A local perspective naturally allows the detection of overlapping communities, as local communities for different seed nodes can share nodes without having to be identical.

From either a global or local perspective, communities can be viewed as dependent not only on network structure but also on dynamical processes (as a surrogate for function) on a network. Moreover, the choices of both dynamical process and initial conditions are very important [Jeub et al., 2015]. A popular and successful approach for identifying community structure, both globally [Rosvall & Bergstrom, 2008; Delvenne et al., 2010] and locally [Andersen et al., 2006; Leskovec et al., 2009; Jeub et al., 2015], is to analyze the behavior of a diffusion, random walk, or other spreading process on a network. This exploits the connection between the presence of communities in a network and the behavior of associated dynamical processes on that network [Lambiotte et al., 2009; Lambiotte et al., 2015].

Although numerous tools have been developed for the analysis of networks [Newman, 2010], most of them concentrate on time-independent networks with only a single type of tie between entities. Such ordinary networks are often unable to capture the complex interactions among entities in the real world. In general, interactions (and the entities themselves) can change over time, and there can also be multiple types of interactions between the same pair of entities. Temporal networks allow one to examine the former situation [Holme & Saramäki, 2012; Holme, 2015], and multiplex networks allow consideration of the latter [Wasserman & Faust, 1994]. The use of multilayer networks [Boccaletti et al., 2014; Kivelä et al., 2014] allows one to examine either temporal networks or multiplex networks. In the former case, each layer represents a time or a time window (though it is important to think about issues such as discrete versus continuous time). In the latter case, each layer represents a type of interaction. One can also represent a multiplex temporal network by using a multilayer framework.

Because multilayer networks are graphical structures with nodes and edges, the notion of bottlenecks to dynamical processes on networks extends in a natural way to multilayer networks. (See Boccaletti et al. [2014], Kivelä et al. [2014], and Salehi et al. for discussions of numerous dynamical processes on such networks.) The notion that diffusion-like dynamics should exhibit bottlenecks when there are good communities has been a fruitful perspective for generalizing algorithmic detection of global community structure from single-layer networks to multilayer networks [Mucha et al., 2010; De Domenico et al., 2015]. In the present paper, we view community structure in multilayer networks from a local perspective, and we demonstrate that one can directly apply the methodology from Jeub et al. [2015] by considering a dynamical process that traverses both intralayer and interlayer edges. (One can also study community structure in multilayer networks using other approaches, such as ones based on stochastic block models [Peixoto, 2015].) In particular, one way to do this is to define an appropriate random walk on a multilayer network.

In the present article, we illustrate some features that one can encounter as a consequence of the particular structure of multiplex networks. As examples, we use two different random walks to explore the structure of synthetic benchmark multiplex networks and two empirical multiplex networks. In Section 1, we discuss random walks on multilayer networks in general and the two random walks that we choose to explore in more detail. In Section 2, we introduce our methodology for identifying and summarizing community structure in networks. We then use this methodology to explore the behavior of the different random walks on synthetic benchmark networks in Section 3 and on a transportation and a social multiplex network in Section 4. We conclude in Section 5.

1 Random Walks on Multilayer Networks

As with a traditional network, different choices are possible when defining a random walk on a multilayer network [Mucha et al., 2010; De Domenico et al., 2014; De Domenico et al., 2015; Kuncheva & Montana, 2015]. Following Kivelä et al. [2014], a multilayer network $M = (V_M, E_M, V, L)$ is a graph $(V_M, E_M)$ with an additional layer structure. Here $V$ is a set of nodes, $L$ is a set of layers, $V_M \subseteq V \times L$ is a set of state nodes, and $E_M \subseteq V_M \times V_M$ is a set of (directed) edges. We use $(i,\alpha)$ to denote the state node that represents node $i$ in layer $\alpha$ and $((i,\alpha),(j,\beta))$ to denote a directed edge from state node $(i,\alpha)$ to state node $(j,\beta)$. One can encode the connectivity structure of a multilayer network, including both intralayer and interlayer edges, using an adjacency tensor $A$ (the analogue of the adjacency matrix for single-layer networks) with elements

$$A^{i\alpha}_{j\beta} = \begin{cases} 1, & ((i,\alpha),(j,\beta)) \in E_M, \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$

One can write a discrete-time random walk on a multilayer network as

$$p_{j\beta}(t+1) = \sum_{i,\alpha} P^{i\alpha}_{j\beta}\, p_{i\alpha}(t), \tag{2}$$

where $p_{i\alpha}(t)$ is the probability for a random walker to be at node $i$ in layer $\alpha$ at time $t$ and $P^{i\alpha}_{j\beta}$ is the probability for a random walker at node $i$ in layer $\alpha$ to transition to node $j$ in layer $\beta$ in a time step. The transition tensor $P$ encodes both the intralayer and interlayer behavior of the random walk. We also want the random walk to be ergodic, so that it has a well-defined stationary distribution $p(\infty)$. The stationary distribution is a fixed point of Eq. 2. That is, it satisfies

$$p_{j\beta}(\infty) = \sum_{i,\alpha} P^{i\alpha}_{j\beta}\, p_{i\alpha}(\infty). \tag{3}$$

There are different ways to define a random walk on a multilayer network that reduce to the usual definition of a random walk for a single-layer network. The most direct way to generalize the concept of a random walk to a multilayer network is the classical random walk [Mucha et al., 2010; De Domenico et al., 2014], which treats interlayer edges and intralayer edges as equivalent objects (though they can be differentiated using heterogeneous spreading rates). The elements of the transition tensor for a classical random walk on a multilayer network are thus

$$P^{i\alpha}_{j\beta} = \frac{A^{i\alpha}_{j\beta}}{\sum_{k,\lambda} A^{i\alpha}_{k\lambda}}. \tag{4}$$
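As a minimal sketch of the classical random walk and its stationary distribution (not the authors' code), one can flatten the adjacency tensor into a supra-adjacency matrix whose rows and columns index state nodes. The layer-major indexing, the helper names `classical_transition` and `stationary_distribution`, and the toy two-layer network with interlayer weight `omega = 1` are assumptions made for illustration.

```python
import numpy as np


def classical_transition(supra_adj):
    """Row-normalize a supra-adjacency matrix (row = source state node)."""
    supra_adj = np.asarray(supra_adj, dtype=float)
    strength = supra_adj.sum(axis=1)
    if np.any(strength == 0):
        raise ValueError("every state node needs at least one outgoing edge")
    return supra_adj / strength[:, None]


def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Iterate p(t+1) = p(t) P until the update is smaller than tol."""
    p = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        p_next = p @ P
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    raise RuntimeError("power iteration did not converge; is the walk ergodic?")


# Toy multiplex network with 3 physical nodes and 2 layers (hypothetical data).
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
omega = 1.0  # interlayer edge weight (assumed value)
coupling = omega * np.eye(3)
# State node (i, layer) has index layer * 3 + i.
supra = np.block([[path, coupling], [coupling, triangle]])

P = classical_transition(supra)
p_inf = stationary_distribution(P)
```

For an undirected network such as this one, the stationary distribution should be proportional to the strengths of the state nodes, which gives a quick sanity check on the iteration.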

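An alternative way to generalize the concept of a random walk is by using a physical random walk [De Domenico et al., 2015; De Domenico et al., 2014] with transition-tensor elements

$$P^{i\alpha}_{j\beta} = \frac{A^{i\alpha}_{i\beta}}{\sum_{\lambda} A^{i\alpha}_{i\lambda}}\,\frac{A^{i\beta}_{j\beta}}{\sum_{k} A^{i\beta}_{k\beta}}. \tag{5}$$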

One time step of this physical random walk corresponds to a random walker first switching layers with probabilities proportional to the weights of the interlayer edges and then performing an ordinary random-walk step in the new layer2. This type of physical random walk on a multilayer network is equivalent to a classical random walk on a transformed multilayer network with adjacency-tensor elements . This transformed multilayer network has non-diagonal, directed interlayer edges even when the original multilayer network is undirected.

(a) Classical random walk

(b) Relaxed random walk
Figure 3: Illustration of two types of random walk on a multilayer network. The walks differ in the way that random walkers change layers. (a) Classical random walk [Mucha et al., 2010; De Domenico et al., 2014], in which we introduce interlayer edges with weight $\omega$ between state nodes (i.e., node-layer tuples) that represent the same physical node in adjacent layers. (b) Relaxed random walk [De Domenico et al., 2015], in which a random walker is constrained at each step to follow an edge within the same layer with probability $1-r$ and can choose any intralayer edge attached to the same physical node in any layer with probability $r$. In the latter case, the walker chooses uniformly at random from the set of all neighbors across all layers.

In this paper, we consider two types of random walks that have been proposed to study communities in multiplex networks: the classical random walk with uniform categorical coupling [Mucha et al., 2010] and the relaxed random walk [De Domenico et al., 2015]. For the classical random walk with uniform categorical coupling (which we henceforth call the “classical random walk” for short), we introduce interlayer edges with uniform weight $\omega$ between all pairs of state nodes that correspond to the same physical node. That is, we define $A^{i\alpha}_{i\beta} = \omega$ for $\alpha \neq \beta$ in Eq. 4. For the relaxed random walk, we constrain the random walker to follow an edge within the same layer with probability $1-r$ (so $\beta = \alpha$) and allow it to choose uniformly at random among all intralayer edges attached to the same physical node with probability $r$. Thus, the transition tensor for the relaxed random walk has elements

$$P^{i\alpha}_{j\beta} = (1-r)\,\delta_{\alpha\beta}\,\frac{A^{i\alpha}_{j\beta}}{\sum_{k} A^{i\alpha}_{k\alpha}} + r\,\frac{A^{i\beta}_{j\beta}}{\sum_{k,\lambda} A^{i\lambda}_{k\lambda}}. \tag{6}$$

Alternatively, one can think of the relaxed random walk as a physical random walk with the interlayer edges defined as

$$A^{i\alpha}_{i\beta} = (1-r)\,\delta_{\alpha\beta} + r\,\frac{\sum_{k} A^{i\beta}_{k\beta}}{\sum_{k,\lambda} A^{i\lambda}_{k\lambda}}.$$

In Fig. 3, we illustrate the classical and relaxed random walks. In Section 2, we discuss how we use random walks to identify communities and characterize mesoscale structures in multilayer networks.
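The verbal description of the relaxed random walk above can be sketched as follows. This is an illustrative reading of that description rather than the authors' implementation: the function name `relaxed_transition`, the symbol `r` for the layer-jumping probability, the layer-major indexing, and the toy layers are assumptions, and the sketch requires every state node to have at least one intralayer edge.

```python
import numpy as np


def relaxed_transition(layers, r):
    """Transition matrix of a relaxed random walk with layer-jumping probability r.

    layers: list of (n x n) intralayer adjacency matrices, one per layer.
    State node (i, alpha) has index alpha * n + i (an assumed convention).
    """
    layers = [np.asarray(a, dtype=float) for a in layers]
    n_layers, n = len(layers), layers[0].shape[0]
    layer_strength = np.array([a.sum(axis=1) for a in layers])  # shape (layer, node)
    total_strength = layer_strength.sum(axis=0)  # strength summed over all layers
    if np.any(layer_strength == 0):
        raise ValueError("sketch assumes every state node has an intralayer edge")
    P = np.zeros((n_layers * n, n_layers * n))
    for alpha in range(n_layers):
        rows = slice(alpha * n, (alpha + 1) * n)
        for beta in range(n_layers):
            cols = slice(beta * n, (beta + 1) * n)
            # With probability r, follow any intralayer edge of the physical node.
            P[rows, cols] += r * layers[beta] / total_strength[:, None]
        # With probability 1 - r, follow an edge within the current layer.
        P[rows, rows] += (1 - r) * layers[alpha] / layer_strength[alpha][:, None]
    return P


# Hypothetical two-layer example with 3 physical nodes.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
P_relaxed = relaxed_transition([path, triangle], r=0.25)
```

With `r = 0`, the layers decouple completely; with `r = 1`, the next step no longer depends on the walker's current layer.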

2 Local Communities and Network Community Profiles

We seek to contrast the behavior of different types of random walks on multilayer networks. From a dynamical perspective, communities correspond to sets of state nodes that create bottlenecks to a diffusive dynamical process. For a random walk, one measure to quantify bottlenecks is the conductance [Jerrum & Sinclair, 1988]

$$\phi(S) = \frac{\sum_{(i,\alpha)\in S}\,\sum_{(j,\beta)\notin S} P^{i\alpha}_{j\beta}\, p_{i\alpha}(\infty)}{\sum_{(i,\alpha)\in S} p_{i\alpha}(\infty)} \tag{7}$$

of a set $S$ of state nodes. Conductance measures the outflow of random walkers from a set of state nodes relative to the total number of random walkers within the set at stationarity. If a set of state nodes constitutes a bottleneck to a random walk, only a few of the random walkers present within the set should leave the set in a given time step, so the set should have low conductance. The two extreme cases are $\phi(S) = 1$ if $S$ has no internal flow (i.e., no state node in $S$ is adjacent to any other state node in $S$) and $\phi(S) = 0$ if $S$ is disconnected from the rest of a network.
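The verbal definition of conductance (stationary outflow from a set divided by the stationary mass of the set) can be sketched in a few lines. The function name `conductance` and the toy graph of two triangles joined by a single edge are assumptions for illustration; for simplicity the example is a single-layer network, but the same code applies to any row-stochastic matrix over state nodes.

```python
import numpy as np


def conductance(P, stationary, members):
    """Stationary outflow from a set of state nodes divided by its stationary mass."""
    inside = np.zeros(P.shape[0], dtype=bool)
    inside[list(members)] = True
    flow = stationary[:, None] * P  # flow[s, t]: stationary flow from s to t
    return flow[inside][:, ~inside].sum() / stationary[inside].sum()


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3 (hypothetical graph).
A = np.zeros((6, 6))
for s, t in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[s, t] = A[t, s] = 1.0
degree = A.sum(axis=1)
P = A / degree[:, None]
stationary = degree / degree.sum()  # exact for an undirected network

phi_triangle = conductance(P, stationary, {0, 1, 2})   # 1/7: one of 7 edge ends leaves
phi_single = conductance(P, stationary, {0})           # 1.0: no internal flow
phi_everything = conductance(P, stationary, range(6))  # 0.0: nothing can leave
```

The last two values correspond to the two extreme cases described above: a set without internal flow and a set that is disconnected from the rest of the network.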

Different types of random walks correspond to different notions of what it means for a set of nodes to be a good community. Our choice of conductance as a measure of community quality is motivated by its nice theoretical properties. The presence of low-conductance sets (i.e., sets that are considered to be “good” communities based on the conductance measure) relates directly to slow mixing of a random walk [Mihail, 1989]. There are efficient algorithms for identifying low-conductance sets with known approximation guarantees [Andersen et al., 2006; Leskovec et al., 2009]. There is also some empirical evidence that conductance is a reasonably effective measure for evaluating community quality and that other measures for evaluating community quality give reasonably similar results in practice [Yang & Leskovec, 2012]. However, conductance also has some limitations as a measure of community quality. Most notably, it is not very sensitive to the internal connectivity of putative communities. In the most extreme case, low-conductance sets may even be internally disconnected [Leskovec et al., 2009; Leskovec et al., 2010; Jeub et al., 2015]. Our choice of algorithm for identifying local communities (see our discussion below) somewhat mitigates this problem, as it implicitly optimizes the internal connectivity of the identified communities [Leskovec et al., 2010].

We use the ACLcut method [Andersen et al., 2006; Leskovec et al., 2009; Jeub et al., 2015] to identify putative communities. The ACLcut method is based on locally ranking nodes near a seed node by approximating a personalized PageRank (PPR) score. Given an appropriate random walk (or other Markov process), one can define the associated PPR score $\mathrm{PPR}(\gamma, s)_{j\beta}$ of state node $(j,\beta)$ as the solution to the equation

$$\mathrm{PPR}(\gamma, s)_{j\beta} = \gamma\, s_{j\beta} + (1-\gamma) \sum_{i,\alpha} P^{i\alpha}_{j\beta}\, \mathrm{PPR}(\gamma, s)_{i\alpha}, \tag{8}$$

where $s$ is a probability distribution that determines the seed nodes for the method [Gleich, 2015] and $\gamma$ is a teleportation parameter. We use two different types of seeding procedure with the ACLcut method:

  • seeding using a state node , where in Eq. (8); and

  • seeding using a physical node , where in Eq. (8).

We describe the ACLcut method in more detail in Appendix A.
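The following sketch illustrates the two ingredients described above, a personalized PageRank vector and a ranking-based sweep for a low-conductance set. It is not the ACLcut method itself: ACLcut approximates the PPR vector locally, whereas this sketch swaps in a dense fixed-point iteration. The convention that `gamma` multiplies the seed vector, the uniform seed over the state nodes of a physical node, all function names, and the toy graph are assumptions.

```python
import numpy as np


def personalized_pagerank(P, seed, gamma, tol=1e-12, max_iter=10_000):
    """Solve ppr = gamma * seed + (1 - gamma) * ppr @ P by fixed-point iteration."""
    ppr = seed.astype(float)
    for _ in range(max_iter):
        ppr_next = gamma * seed + (1 - gamma) * (ppr @ P)
        if np.abs(ppr_next - ppr).sum() < tol:
            return ppr_next
        ppr = ppr_next
    raise RuntimeError("PPR iteration did not converge")


def state_node_seed(n_state_nodes, index):
    """All seed mass on a single state node."""
    seed = np.zeros(n_state_nodes)
    seed[index] = 1.0
    return seed


def physical_node_seed(n_nodes, n_layers, node):
    """Uniform seed over the state nodes of one physical node (an assumption).

    Uses layer-major indexing: state node (i, alpha) has index alpha * n_nodes + i.
    """
    seed = np.zeros(n_nodes * n_layers)
    seed[node::n_nodes] = 1.0 / n_layers
    return seed


def sweep_cut(P, stationary, ppr):
    """Best-conductance prefix of the state nodes ranked by ppr / stationary."""
    order = np.argsort(-ppr / stationary, kind="stable")
    flow = stationary[:, None] * P
    inside = np.zeros(len(ppr), dtype=bool)
    best_phi, best_size = np.inf, 0
    for size, node in enumerate(order[:-1], start=1):  # skip the trivial full set
        inside[node] = True
        phi = flow[inside][:, ~inside].sum() / stationary[inside].sum()
        if phi < best_phi:
            best_phi, best_size = phi, size
    return set(order[:best_size].tolist()), best_phi


# Two triangles joined by the edge 2-3 (hypothetical single-layer example).
A = np.zeros((6, 6))
for s, t in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[s, t] = A[t, s] = 1.0
degree = A.sum(axis=1)
P = A / degree[:, None]
stationary = degree / degree.sum()

ppr = personalized_pagerank(P, state_node_seed(6, 0), gamma=0.1)
community, phi = sweep_cut(P, stationary, ppr)
```

Ranking by the score divided by the stationary probability mirrors the degree-normalized PPR vector that the figure captions mention; varying `gamma` and the seed gives communities at different scales.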

Our main tool for summarizing size-resolved community structure is a network community profile (NCP) [Leskovec et al., 2009]. An NCP shows the quality of the best community of a given size (i.e., number of nodes) as a function of community size. Because we are using conductance as a measure of community quality, we define the “best” community as the one with the lowest conductance. Hence, we define the NCP as

$$\mathrm{NCP}(k) = \min_{|S| = k} \phi(S), \tag{9}$$

where $\phi(S)$ denotes the conductance (Eq. 7) of a set $S$ of state nodes. We also use local NCPs, where we constrain the communities to contain a given seed set $S_0$ of state nodes. That is,

$$\mathrm{NCP}_{S_0}(k) = \min_{|S| = k,\ S_0 \subseteq S} \phi(S). \tag{10}$$

An NCP of a network can reveal interesting structural features about the network. In particular, its qualitative shape can reveal the global organization of a network [Jeub et al., 2015]. A local NCP is useful for identifying communities at different scales associated with a particular seed node (or seed set of nodes).

Our code for identifying local communities and visualizing networks is available at https://github.com/LJeub. There is also a recently proposed extension of the ACLcut method [Kloster & Gleich, 2015] that allows one to sample local NCPs more efficiently, although we did not use it for our computations in this paper.
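To make the definitions of the NCP and the local NCP concrete, the following sketch evaluates them by exhaustive search on a tiny graph. Exhaustive search is feasible only for a handful of state nodes and is not how NCPs are sampled in practice; the names `brute_force_ncp` and `conductance` and the toy graph are assumptions for illustration.

```python
from itertools import combinations

import numpy as np


def conductance(P, stationary, members):
    """Stationary outflow from a set of state nodes divided by its stationary mass."""
    inside = np.zeros(P.shape[0], dtype=bool)
    inside[list(members)] = True
    flow = stationary[:, None] * P
    return flow[inside][:, ~inside].sum() / stationary[inside].sum()


def brute_force_ncp(P, stationary, seed_set=()):
    """Minimum conductance for each community size that contains seed_set.

    With an empty seed_set this is the (global) NCP; otherwise it is a local NCP.
    The trivial community that contains every state node is excluded.
    """
    n_state = P.shape[0]
    seed_set = frozenset(seed_set)
    others = [s for s in range(n_state) if s not in seed_set]
    ncp = {}
    for size in range(max(1, len(seed_set)), n_state):
        extra = size - len(seed_set)
        ncp[size] = min(
            conductance(P, stationary, seed_set | set(rest))
            for rest in combinations(others, extra)
        )
    return ncp


# Two triangles joined by the edge 2-3 (hypothetical example).
A = np.zeros((6, 6))
for s, t in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[s, t] = A[t, s] = 1.0
degree = A.sum(axis=1)
P = A / degree[:, None]
stationary = degree / degree.sum()

ncp = brute_force_ncp(P, stationary)
local_ncp = brute_force_ncp(P, stationary, seed_set={5})
```

In this example, the profile has its minimum at size 3, which corresponds to either of the two triangles.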

3 Synthetic Benchmark Multiplex Networks

We now explore the behavior of the two different random walks on synthetic networks with known, planted community structure. The networks that we consider each have nodes and layers (for a total of state nodes, as we assume that every node exists on all layers) and planted communities.

We sample the planted community structure in the different layers in the following way. We first sample a background community structure by sampling the community assignment for each node uniformly at random from . We then sample the community assignments for the state nodes such that a state node inherits the background community assignment of the corresponding physical node with probability and otherwise its community assignment is sampled uniformly at random from .

Given the community assignments for the state nodes, we sample the intralayer edges for the network independently from a block model, such that an edge between two state nodes in the same layer and with the same community assignment is present with probability and an edge between two state nodes in the same layer but with different community assignment is present with probability . The ratio determines the strength of the community structure of the benchmark, where a small ratio indicates strong community structure. The parameter controls the dependency between the layers; the layers have identical community structure for , and community structures in different layers are progressively less related to each other with increasing .
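The sampling procedure described above can be sketched as follows. The parameter names (`mu` for the probability that a state node draws a fresh label instead of inheriting the background label, `p_in` and `p_out` for the edge probabilities) and the numerical values in the example are hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_benchmark(n_nodes, n_layers, n_communities, mu, p_in, p_out, rng):
    """Sample planted labels and intralayer adjacency matrices for a multiplex benchmark."""
    # Background community assignment of the physical nodes.
    background = rng.integers(n_communities, size=n_nodes)
    labels = np.empty((n_layers, n_nodes), dtype=int)
    layers = []
    for alpha in range(n_layers):
        # Each state node keeps the background label with probability 1 - mu.
        resample = rng.random(n_nodes) < mu
        fresh = rng.integers(n_communities, size=n_nodes)
        labels[alpha] = np.where(resample, fresh, background)
        # Block model within the layer: p_in inside communities, p_out between them.
        same = labels[alpha][:, None] == labels[alpha][None, :]
        prob = np.where(same, p_in, p_out)
        upper = np.triu(rng.random((n_nodes, n_nodes)) < prob, k=1)
        layers.append((upper | upper.T).astype(float))  # undirected, no self-loops
    return background, labels, layers


rng = np.random.default_rng(0)
background, labels, layers = sample_benchmark(
    n_nodes=60, n_layers=3, n_communities=4, mu=0.2, p_in=0.5, p_out=0.05, rng=rng
)
```

Setting `mu = 0` gives identical planted structure in all layers, and increasing `mu` makes the layers progressively less related, in line with the description above.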

Classical random walk Relaxed random walk
Figure 4: Recoverability of planted community structure using local NCPs. For each value of , we sample local NCPs for 100 uniformly-random seed nodes and compare the best community identified by the local NCP to the planted community for the seed node using the Jaccard coefficient. The curves indicate the median of the Jaccard-coefficient distributions, the dark shaded regions indicate the second and third quartiles of the distributions, and the light shaded regions indicate the bulk of the distributions. Markers indicate outliers.

In Fig. 4, we illustrate the ability of our local community-detection methods to recover the planted structure as we vary and . We fix , , and . We use local NCPs to identify communities in the following way. First, we select a state node uniformly at random as a seed node for the local NCP. We then identify the best community for as the community that achieves the minimum conductance

among all communities that contain the state node . This construction throws away a lot of information, as local minima of a local NCP can reveal interesting aspects of community structure. However, examining only globally optimal communities for a seed node enables one to easily compare algorithmically obtained community structure with the planted structure. To compare the performance of the different random walks, we use the Jaccard coefficient [Jaccard, 1912]

between the planted community and the best identified community for the seed node. (We obtain the same qualitative results when we compute normalized mutual information.) In Fig. 4, we show the distributions of the Jaccard coefficients for samples of 100 seed nodes.
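For reference, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. A minimal sketch, with the hypothetical function name `jaccard` and made-up sets of state-node indices:

```python
def jaccard(planted, detected):
    """Size of the intersection divided by the size of the union of two sets."""
    planted, detected = set(planted), set(detected)
    union = planted | detected
    if not union:
        raise ValueError("Jaccard coefficient is undefined for two empty sets")
    return len(planted & detected) / len(union)


# Hypothetical planted and detected communities.
overlap = jaccard({0, 1, 2}, {1, 2, 3})  # 2 shared state nodes out of 4 in total
```

A value of 1 means that the detected community coincides with the planted one, and a value of 0 means that the two sets are disjoint.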

As we can see from Fig. 4, the performances of the two different random walks are comparable to each other; neither one is clearly better than the other. One interesting result is that, as we increase , it is increasingly pronounced that there are “good” and “bad” seed nodes for identifying community structure. For , the variability in the Jaccard coefficient for different seed nodes is fairly small, but as we increase , the number of outliers increases and we observe increasing variability in the distribution of the Jaccard coefficients.

For small , strong interlayer coupling (i.e., for large values of or , so that the rate of switching layers is high) helps identify the planted partition. For large enough values of , there is a range of for which community structure is sufficiently strong that random walks with weaker interlayer coupling can outperform those with stronger interlayer coupling. This is already true when , and the difference becomes more pronounced as one increases .

For strong interlayer coupling, the two types of random walks (i.e., the classical random walk with and the relaxed walk with ) lose their ability to identify the planted structure as we increase in rather different ways. For small values of (in particular, for ), the classical random walk identifies the background community structure rather than the planted community structure. As one increases further, its performance at detecting both background and planted community structure decreases gradually for all seed nodes.

In contrast, the performance of the relaxed walk deteriorates in a different way. It loses the ability to identify the planted structure for progressively more choices of seed nodes as we increase , but it still performs remarkably well for some seed nodes even when .

4 Empirical Multiplex Networks

We now illustrate our methodology on two empirical multiplex networks. Our first example is the European Airline Network [Cardillo et al., 2013], a multiplex transportation network with 37 layers, where each layer includes the flights for a single airline. Our second example is the Lazega Law Firm Network [Lazega & Pattison, 1999; Lazega, 2001; Snijders et al., 2006], a multiplex social network with three layers that represent advice, friendship, and co-work relationships between partners and associates of a corporate law firm. In Table 1, we highlight some key properties of these two networks.

Table 1: Example network data sets

Network | Nodes | Edges | Layers
European Airline Network [Cardillo et al., 2013] | 450 airports | 3558 (undirected, unweighted) | 37 airlines
Lazega Law Firm Network [Lazega & Pattison, 1999; Lazega, 2001; Snijders et al., 2006] | 71 employees | 2223 (directed, unweighted) | 3 (advice, friendship, co-work)
(a) European Airline Network
(b) Lazega Law Firm Network
Figure 7: Network community profiles (NCPs) of two aggregated empirical networks. We plot the quality (as measured by conductance) of the best community of each size (as measured by the number of nodes that are a member of the community). (a) The NCP of the aggregate European Airline Network has a shape that one sees in networks with a core–periphery structure. (b) The NCP of the aggregate Lazega Law Firm Network is slightly downward-sloping, and the high minimum conductance indicates that the aggregate network has no clear communities.

In Fig. 7, we show the NCPs of aggregated networks that we construct from our example multiplex networks. We define the weight of an edge between two nodes in an aggregated network as the number of edges in the corresponding multilayer network between the associated state nodes. That is, the adjacency matrix of the aggregate network has entries

The plots in Fig. 7 give a reference for the NCPs of the multilayer networks (see Figs. 12 and 17). At the aggregate level, the European Airline Network has an NCP that is suggestive of a core–periphery structure (see Csermely et al. [2013] for a review of such structure), although one cannot conclude this with certainty because the network is small and the conductance values are large. We do not observe any clear structure (and, in particular, no clear community structure) in the Lazega Law Firm Network.
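The aggregation step described above (edge weights that count the edges between the associated state nodes) can be sketched as a sum over layers. The sketch assumes that the multiplex data are given as a list of intralayer adjacency matrices without interlayer edges; the function name `aggregate` and the toy layers are hypothetical.

```python
import numpy as np


def aggregate(layers):
    """Sum intralayer adjacency matrices so that weights count edges across layers."""
    return np.sum([np.asarray(a, dtype=float) for a in layers], axis=0)


# Hypothetical two-layer example with 3 physical nodes.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
aggregated = aggregate([path, triangle])
```

Here the pair of nodes 0 and 1 is connected in both layers and therefore gets weight 2 in the aggregated network, whereas the pair 0 and 2 is connected only in the second layer and gets weight 1.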

(a) Classical random walk
(b) Relaxed random walk
(c) Best community with state nodes for (physical node as seed)
(d) Best community with state nodes for (state node as seed)
Figure 12: European Airline Network. Panels (a) and (b) show NCPs for this network. We plot the quality (as measured by conductance) of the best community of each size (as measured by the number of state nodes that are a member of the community). Sampling using physical nodes (thin curves) versus using state nodes (thick curves) leads to very similar results, and the thin curves are typically hidden underneath the thick curves in this example. Panels (c) and (d) illustrate some of the communities that we obtain. We shade the state nodes in a community from dark red to light gray based on their rank (of their corresponding component) in the degree-normalized PPR-vector that we use to identify the community. (See Appendix A for details.) The large arrows point to the seed nodes. For small layer-jumping probability in the relaxed random walk and small interlayer edge weight in the classical random walk, the best communities tend to consist of sets of similar types of airlines (e.g., they fly to the same airport, are low-cost airlines, or share some other feature). The prominent dips in the NCPs in panel (b) for communities consisting of two state nodes are the result of a spurious connection in the data set that creates a bottleneck for the relaxed random walk. Even for , the relaxed walk still predominantly identifies this type of community. By contrast, for large values of , the classical random walk finds relatively geographically localized communities.
(a) Classical random walk
(b) Relaxed random walk
(c) Best community with state nodes for (state node as seed)
(d) Best community with state nodes for (state node as seed)
Figure 17: Lazega Law Firm Network. Panels (a) and (b) show NCPs for this network. As with the European Airline Network, when we use a small layer-jumping probability in the relaxed random walk and a small interlayer edge weight in the classical random walk, we obtain similar results even though we consider two different dynamical processes. We also again obtain similar results whether we use a state node or a physical node as a seed. For both types of random walks, with our choice of interlayer connection probability, the communities tend to be localized to a single layer. The prominent minimum in the NCPs at 71 nodes is the result of a community that contains all state nodes in the “co-work” layer. The communities that we highlight in panels (c) and (d) are responsible for the other, less-pronounced minima in the NCPs at 19 nodes. They contain the members of the firm who are based in the Hartford office.
(e) Best community with state nodes for (physical node as seed)
(f) Best community with state nodes for (physical node as seed)
Figure 20: For large , the classical random walk yields communities that are largely “coherent” across layers: if a state node is a member of a community, then the other state nodes associated with the same physical node also tend to be in that community. By contrast, communities from the relaxed random walk are not as coherent across layers.

In Figs. 12 and 17, we explore the multilayer structure of the airline and law-firm networks. An interesting aspect of multilayer networks is that one can use either physical nodes or state nodes as seeds to sample local communities. We compare the results of these two sampling procedures in Figs. 12 and 17. In these two networks, the two sampling procedures produce very similar results. In some cases, sampling using physical nodes can result in slightly better communities. (See the thick and thin solid blue curves in Fig. (a)a for community sizes between to state nodes.) In other cases, sampling using physical nodes results in slightly worse communities. (See the thick and thin solid blue curves and dashed red curves in panels (a) and (b) of Fig. 17.)

For directed networks (e.g., the network in Fig. 17), random walks are not necessarily ergodic, so they might not have a unique stationary distribution. We use unrecorded edge teleportation [Lambiotte & Rosvall, 2012; De Domenico et al., 2015] to ensure that the random walk is ergodic. This corresponds to replacing the stationary distribution of the random walk in the definition of conductance (Eq. 7) by a PPR vector in which the seed vector is proportional to the vector of in-degrees of the state nodes. That is,

For our results in Fig. 17, we use a teleportation rate of . For weighted networks in which intralayer weights are very different in different layers, one may also need to rescale the edge weights appropriately [Cranmer et al., 2015] to obtain results that are not dominated by a single layer (or small set of layers). However, this issue does not arise in our example networks in the present paper.

As one can see from Figs. 12 and 17, the NCPs for the multilayer networks look very different from those of the aggregated networks in Fig. 7. One exception are the NCPs for the relaxed walk with rate ; its shape resembles that of the NCPs for the aggregated network. For each of these networks, the multilayer structure introduces bottlenecks to the spreading of the random walks that are not present in the associated aggregated networks. This is also reflected in the types of communities that underlie these bottlenecks.

For the airline network, the best communities identified by random walks with weak interlayer coupling tend to contain all state nodes from a given layer or a set of layers. Effectively, the local communities are identifying sets of similar airlines that share many common destinations. However, as we illustrate in Fig. 12, the exact communities identified by the classical random walk and relaxed random walk can be very different. Once the interlayer coupling becomes sufficiently strong, the nature of the communities identified by the classical random walk changes completely. The best communities identified by the classical random walk with strong interlayer coupling tend to be localized geographically and span all layers. The relaxed random walk, however, still predominately identifies sets of airlines even when (i.e., when the interlayer coupling is maximal). In fact, the communities identified by the relaxed random walk with are often rather similar to those identified with .

For the Lazega Law Firm Network, the layer structure also results in bottlenecks to the random walks when the interlayer coupling is weak. This is the cause of the sharp minima in the NCPs in Fig. 17 for community sizes of 71 state nodes. At smaller community sizes, one can identify layer-internal structures, most notably a community in the “co-work” layer that consists of the members of the firm who work in the Hartford office. Unlike for the airline network, in the case of the Lazega Law Firm Network, both types of random walk predominantly identify communities that span all layers when the interlayer coupling is sufficiently strong. However, the two types of random walks explore the law-firm network in rather different ways. The classical random walk explores the different layers of the network in a “coherent” manner when the interlayer coupling is strong. That is, if a state node associated with a particular physical node is included in a community, then the other state nodes of that physical node (i.e., its manifestation in the other layers) tend to also belong to the community. However, as we illustrate in panels (e) and (f) of Fig. 17, the communities identified by the relaxed random walk tend to be less coherent across layers than those identified by the classical random walk.

To understand the difference in behavior between the relaxed and classical random walks at high layer-switching rates (i.e., for $r = 1$ and for large $\omega$), it is important to examine the behavior of the two dynamical processes that are induced on the aggregated networks. The dynamical process induced on the aggregated network by the relaxed random walk with $r = 1$ is simply a standard random walk. However, the process induced by the classical random walk for large $\omega$ explores the aggregated network much more slowly than a standard random walk, as most of the transitions are between state nodes that represent the same physical node. This results in a downward shift of the NCPs as one increases $\omega$. This effect is similar to that of introducing a self-loop at each node. (As discussed in Arenas et al. [2008], introducing self-loops is one way to introduce a resolution parameter in the modularity quality function.)
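A small worked example makes the slowdown of the classical random walk concrete. Under uniform categorical coupling, a state node with intralayer strength `k` in a network with `n_layers` layers has interlayer strength `omega * (n_layers - 1)`, so a single step moves the walker to a different physical node with probability `k / (k + omega * (n_layers - 1))`. The function name and the numerical values below are hypothetical.

```python
def physical_move_probability(intralayer_strength, omega, n_layers):
    """Probability that one classical-walk step changes the physical node."""
    interlayer_strength = omega * (n_layers - 1)
    return intralayer_strength / (intralayer_strength + interlayer_strength)


# A state node with 4 intralayer edges in a 3-layer network (made-up numbers).
weak = physical_move_probability(4, omega=0.1, n_layers=3)   # 4 / 4.2
strong = physical_move_probability(4, omega=10, n_layers=3)  # 4 / 24
```

With strong coupling, most steps only switch layers, which is why the induced process on the aggregated network behaves like a random walk with heavy self-loops.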

5 Discussion and Conclusions

We have seen using example synthetic networks that bottlenecks of random walks on a multilayer network can reveal nontrivial multiplex community structure in which community structure in different layers of a network is related but not identical. We explored two types of random walks (a classical random walk and a relaxed random walk) for identifying structure in our synthetic benchmarks, and we found that they can behave rather differently from each other in some situations. Consequently, different random walks give different community structures in a network, and one thus also expects (although we did not test this directly) that global methods based on the two types of random walks (e.g., methods that optimize a quality function) yield different community structures. Similar results have been noted previously in other contexts [Lambiotte et al., 2009, Lambiotte et al., 2015].

As we saw in Section 4, the behavior of the random walks on a multilayer network is in general very different from the behavior of a random walk on a corresponding aggregated network. Consequently, examining a multilayer network can reveal important information that is not visible in a corresponding aggregated network. In particular, bottlenecks to random walks on a multilayer network can reveal structures, such as sets of related layers and communities confined to a single layer, that are impossible to see in aggregated networks.

Our approach is very general, and a suite of other dynamical processes can also be used to develop a diverse family of local community-detection methods. In addition to considering other processes, in advancing our work further, it will also be interesting to exploit transformations between ordinary random walks and other types of random walks [Lambiotte et al., 2011, Yan et al., 2016]. Another interesting extension of our approach would be to use it as the seed-set-expansion part of a seed-centric algorithm [Kanawati, 2014, Hmimida & Kanawati, 2015, Whang et al., 2016] for detecting communities in multilayer networks.

Acknowledgements

LGSJ acknowledges a CASE studentship award from the EPSRC (BK/10/039), and LGSJ and MAP were supported in part by the James S. McDonnell Foundation 21st Century Science Initiative - Complex Systems Scholar Award grant # 220020177 and the FET-Proactive project PLEXMATH (FP7-ICT-2011-8; grant # 317614) funded by the European Commission. MAP was also supported by the EPSRC (EP/J001759/1). MWM acknowledges funding from the Army Research Office and from the Defense Advanced Research Projects Agency. PJM was supported by the James S. McDonnell Foundation 21st Century Science Initiative - Complex Systems Scholar Award grant # 220020315. MAP also thanks SAMSI for supporting several visits and MWM for his hospitality during a sabbatical at Stanford.

Appendix A Sampling Network Community Profiles (the ACLcut method)

  1. Set up the seed vector.
  2. Compute the $\epsilon$-approximate PageRank vector.
  3. Perform a normalized sweep cut.
  4. Return the conductance values and communities.

Figure 21: ACLcut method for sampling local communities. The inputs are the transition tensor of the random walk, a seed set of state nodes, and a vector of node volumes, which is proportional either to the stationary distribution of the random walk or (when considering teleportation) to a PageRank vector. The resolution of the method is controlled by the teleportation parameter and the truncation parameter.

  1. Convert to the equivalent lazy-walk teleportation.
  2. Initialize the PageRank vector.
  3. Initialize the residual.
  4. Keep track of the nodes to update.
  5. Select a node to update.
  6. Push probability mass to the PageRank vector.
  7. Check the selected node and its neighbors for further updates (repeating steps 5–7 while nodes remain to update).
  8. Return the $\epsilon$-approximate PageRank vector.

Figure 22: APPR procedure for computing $\epsilon$-approximate PageRank vectors using only local information. See Fig. 21 for a description of the input arguments.

  1. Sort the state nodes in descending order of the ranking vector.
  2. Get the next state node to consider.
  3. Update the conductance.
  4. Return the conductance values and sweep sets.

Figure 23: SweepCut procedure for identifying communities based on a ranking vector for the state nodes.
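
The sweep-cut step of Fig. 23 can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation. It assumes a dense row-stochastic transition matrix `P`, node volumes `d` proportional to the stationary distribution, a ranking by `x / d` (our reading of “normalized”), and conductance defined as the stationary flow leaving a set divided by the smaller of the volumes of the set and its complement. The function name `sweep_cut` and the barbell example are hypothetical.

```python
import numpy as np

def sweep_cut(P, d, x):
    """Conductance of each prefix of the nodes ranked by x / d (descending).

    Assumptions: P is row-stochastic, and d is proportional to the stationary
    distribution of P, so d[i] * P[i, j] is the stationary flow from i to j.
    Returns the ranked node order and one conductance value per prefix.
    """
    ranked = np.argsort(-x / d, kind="stable")
    order = [int(i) for i in ranked if x[i] > 0]
    flow = d[:, None] * P
    total = d.sum()
    in_set = np.zeros(len(d), dtype=bool)
    conductances = []
    for i in order:
        in_set[i] = True
        # Recomputed from scratch for clarity; an efficient version would
        # update the cut and the volume incrementally.
        cut = flow[np.ix_(in_set, ~in_set)].sum()
        vol = d[in_set].sum()
        denom = min(vol, total - vol)
        conductances.append(cut / denom if denom > 0 else np.inf)
    return order, conductances

# Toy example: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
d = A.sum(axis=1)              # degrees, proportional to the stationary distribution
P = A / d[:, None]
x = np.array([0.3, 0.3, 0.25, 0.05, 0.05, 0.05])  # ranking vector localized on one triangle

order, conductances = sweep_cut(P, d, x)
best = int(np.argmin(conductances))
print(sorted(order[:best + 1]), conductances[best])
```

For an unweighted, undirected network with `d` equal to the degrees, the flow matrix reduces to the adjacency matrix, so this conductance coincides with the usual edge-cut definition.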

We use the ACLcut method [Andersen et al., 2006, Leskovec et al., 2009, Jeub et al., 2015] to sample local communities and network community profiles (NCPs). In this appendix, we briefly discuss how we apply this procedure to identify communities using a general random walk on a multilayer network.

The key idea behind the ACLcut method is the use of a “push” procedure [Andersen et al., 2006], which pushes probability mass from the residual vector to the PageRank vector while preserving the invariant that the current approximation plus the PageRank vector of the residual equals the exact PageRank vector of the seed vector. We describe the different parts of the ACLcut method in Figs. 21–23. In addition to the teleportation parameter, the ACLcut method also depends on a truncation parameter. The ACLcut method terminates once the residual at every state node is small relative to the truncation parameter times the volume of that state node, where the volumes are given by a vector of node volumes. We set the node volumes by rescaling a vector that is either the stationary distribution of the random walk or (when considering teleportation) a PageRank vector. The rescaling is purely for computational convenience, as from a theoretical perspective the results are invariant under rescaling of the node volumes (because one also rescales the truncation parameter).
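
To make the push procedure concrete, here is a minimal sketch in the style of Andersen et al. (2006), written for a generic row-stochastic transition matrix. It is not the authors' code. It assumes a dense matrix `P`, works directly with the lazy walk W = (I + P)/2 (so it skips the conversion of the teleportation parameter mentioned in Fig. 22), and stops once every residual entry is below `eps * d[i]`. The names `appr`, `alpha`, and `eps` are hypothetical.

```python
import numpy as np

def appr(P, seed, d, alpha, eps):
    """Approximate personalized PageRank by repeated push operations.

    Assumptions: P is a dense row-stochastic matrix, the walk is the lazy
    walk W = (I + P) / 2, and d holds positive node volumes.
    Returns the approximation p and the residual r.
    """
    n = P.shape[0]
    p = np.zeros(n)
    r = np.zeros(n)
    r[list(seed)] = 1.0 / len(seed)
    queue = list(seed)
    while queue:
        u = queue.pop()
        if r[u] < eps * d[u]:
            continue  # residual already small enough at u
        mass = r[u]
        p[u] += alpha * mass            # a fraction alpha settles at u
        spread = (1.0 - alpha) * mass   # the rest takes one lazy-walk step
        r[u] = 0.5 * spread
        r += 0.5 * spread * P[u]
        # Only u and the nodes that received mass can now exceed the threshold.
        for v in np.flatnonzero(P[u]):
            if r[v] >= eps * d[v]:
                queue.append(int(v))
        if r[u] >= eps * d[u]:
            queue.append(u)
    return p, r

# Toy example: two triangles joined by the single edge (2, 3), seeded at node 0.
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1.0
d = A.sum(axis=1)
P = A / d[:, None]

p, r = appr(P, [0], d, alpha=0.1, eps=1e-4)
print(p.round(3), r.max())
```

Each push moves at least `alpha * eps * d[u]` of probability mass into `p`, and the total mass is 1, so the loop ends after finitely many pushes.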

To sample an NCP, we use the ACLcut method to sample communities for different values of and different seed nodes, and we take the lower envelope of the conductance values. To sample a local NCP, we only vary and use the seed set of the local NCP as a seed set for the ACLcut method. We use 20 logarithmically spaced values for in the interval , and we fix . For each value of , we initially set the sample set of potential seed nodes to be either the set of all state nodes (i.e., ) or the set of all physical nodes (i.e., ). We then sample seed nodes uniformly at random without replacement from until is empty. To avoid excessive computations for small values of , we remove nodes from once they have been included in the best local community returned by the ACLcut method 10 times. This sampling procedure allows one to estimate an NCP in a time that scales almost linearly with the number of state nodes while ensuring good coverage of the structure of a network.
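
The seed-sampling loop described above can be sketched as follows for a single parameter value. This is an illustration under stated assumptions, not the authors' implementation. `acl_cut` stands in for a call to the ACLcut method and is assumed to return a list of `(members, conductance)` pairs, and `toy_acl_cut` is a hypothetical stub used only to make the example runnable.

```python
import random

def sample_ncp(nodes, acl_cut, max_hits=10, seed=0):
    """Sample an NCP for one parameter value.

    Seeds are drawn uniformly at random without replacement until the pool is
    empty. A node leaves the pool early once it has appeared max_hits times in
    the best community returned for a seed. Returns the lower envelope of
    conductance by community size and the number of acl_cut calls.
    """
    rng = random.Random(seed)
    pool = set(nodes)
    hits = {}
    ncp = {}
    calls = 0
    while pool:
        start = rng.choice(sorted(pool))
        pool.discard(start)
        communities = acl_cut(start)
        calls += 1
        for members, phi in communities:
            size = len(members)
            ncp[size] = min(phi, ncp.get(size, float("inf")))  # lower envelope
        best_members, _ = min(communities, key=lambda c: c[1])
        for v in best_members:
            hits[v] = hits.get(v, 0) + 1
            if hits[v] >= max_hits:
                pool.discard(v)
    return ncp, calls

def toy_acl_cut(start):
    """Hypothetical stand-in: nodes 0-2 and 3-5 form two planted blocks."""
    block = frozenset(range(3)) if start < 3 else frozenset(range(3, 6))
    return [(frozenset([start]), 1.0), (block, 0.2)]

ncp, calls = sample_ncp(range(6), toy_acl_cut, max_hits=1)
print(ncp, calls)
```

To sample a full NCP, one would repeat this loop for each parameter value and take the minimum conductance per community size across all runs.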

Footnotes

  1. Following [De Domenico et al., 2015], we use the terms state node to refer to a node-layer tuple and physical node to refer to the collection of all state nodes that represent the same node.
  2. Note that this definition of a physical random walk assumes that the multilayer network has diagonal coupling (i.e., that all interlayer edges are between state nodes that represent the same node).
  3. For a random walk on an undirected, single-layer network, this definition of conductance is equivalent to the conductance in [Leskovec et al., 2009, Jeub et al., 2015].
  4. More generally, it would also be both fruitful and interesting to develop local community-detection methods using dynamical processes that are not Markovian. A good start would be to use our approach through suitable adaptations of other processes that have been used to examine community structure in networks. Examples include Kuramoto phase oscillators [Arenas et al., 2006]; epidemic spreading processes [Ghosh et al., 2014]; and higher-order Markovian processes, such as those that have been employed in the study of “memory networks” [Rosvall et al., 2014].
  5. Note that for many multilayer networks (and, in particular, for the example networks that we examine in this paper), data on the weights of interlayer edges are not explicitly available [Kivelä et al., 2014].

References

  1. Andersen, R., Chung, F. R. K., & Lang, K. J. (2006). Local graph partitioning using PageRank vectors. Pages 475–486 of: Proceedings of the 47th Annual Symposium on Foundations of Computer Science. IEEE.
  2. Arenas, A, Díaz-Guilera, A, & Pérez-Vicente, C J. (2006). Synchronization reveals topological scales in complex networks. Physical Review Letters, 96(11), 114102.
  3. Arenas, A, Fernández, A, & Gómez, S. (2008). Analysis of the structure of complex networks at different resolution levels. New Journal of Physics, 10(5), 053039.
  4. Boccaletti, S., Bianconi, G., Criado, R., del Genio, C. I., Gómez-Gardenes, J., Romance, M., Sendiña-Nadal, I., Wang, Z., & Zanin, M. (2014). The structure and dynamics of multilayer networks. Physics Reports, 544(1), 1–122.
  5. Cardillo, A, Gómez-Gardeñes, J, Zanin, M, Romance, M, Papo, D, del Pozo, F, & Boccaletti, S. (2013). Emergence of network features from multiplexity. Scientific Reports, 3, 1344.
  6. Coscia, M, Giannotti, F, & Pedreschi, D. (2011). A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining, 4(5), 512–546.
  7. Cranmer, S J, Menninga, E J, & Mucha, P J. (2015). Kantian fractionalization predicts the conflict propensity of the international system. Proceedings of the National Academy of Sciences of the United States of America, 112(38), 11812–11816.
  8. Csermely, P., London, A., Wu, L.-Y., & Uzzi, B. (2013). Structure and dynamics of core–periphery networks. Journal of Complex Networks, 1, 93–123.
  9. De Domenico, M., Solè-Ribalta, A., Gómez, S., & Arenas, A. (2014). Navigability of interconnected networks under random failures. Proceedings of the National Academy of Sciences of the United States of America, 111(23), 8351–8356.
  10. De Domenico, M., Lancichinetti, A., Arenas, A., & Rosvall, M. (2015). Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems. Physical Review X, 5(1), 011027.
  11. Delvenne, J.-C., Yaliraki, S. N., & Barahona, M. (2010). Stability of graph communities across time scales. Proceedings of the National Academy of Sciences of the United States of America, 107(29), 12755–12760.
  12. Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75–174.
  13. Ghosh, R, Lerman, K, Teng, S-H, & Yan, X. (2014). The interplay between dynamics and networks: Centrality, communities, and Cheeger inequality. Pages 1406–1415 of: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’14. New York, NY, USA: ACM.
  14. Gleich, D F. (2015). PageRank beyond the Web. SIAM Review, 57(3), 321–363.
  15. Hmimida, M, & Kanawati, R. (2015). Community detection in multiplex networks: A seed-centric approach. Networks and Heterogeneous Media, 10(1), 71–85.
  16. Holme, P. (2015). Modern temporal network theory: A colloquium. The European Physical Journal B, 88, 234.
  17. Holme, P., & Saramäki, J. (2012). Temporal networks. Physics Reports, 519, 97–125.
  18. Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11(2), 37–50.
  19. Jerrum, M, & Sinclair, A. (1988). Conductance and the rapid mixing property for Markov chains: The approximation of the permanent resolved. Proceedings of the 20th Annual ACM Symposium on Theory of Computing. ACM.
  20. Jeub, L. G. S., Balachandran, P., Porter, M. A., Mucha, P. J., & Mahoney, M. W. (2015). Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Physical Review E, 91, 012821.
  21. Kanawati, R. (2014). Seed-centric approaches for community detection in complex networks. Pages 197–208 of: Meiselwitz, Gabriele (ed), Lecture Notes in Computer Science, vol. 8531. Springer International Publishing.
  22. Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer networks. Journal of Complex Networks, 2(3), 203–271.
  23. Kloster, K, & Gleich, D F. (2015). Seeded PageRank solution paths. arXiv:1503.00322v2 [cs.SI].
  24. Kuncheva, Z, & Montana, G. (2015). Community detection in multiplex networks using locally adaptive random walks. Pages 1308–1315 of: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. ASONAM ’15. New York, NY, USA: ACM.
  25. Lambiotte, R, & Rosvall, M. (2012). Ranking and clustering of nodes in networks with smart teleportation. Physical Review E, 85(5), 056107.
  26. Lambiotte, R, Delvenne, J-C, & Barahona, M. (2009). Laplacian dynamics and multiscale modular structure in networks. arXiv:0812.1770v3 [physics.soc-ph].
  27. Lambiotte, R., Sinatra, R., Delvenne, J.-C., Evans, T. S., Barahona, M., & Latora, V. (2011). Flow graphs: Interweaving dynamics and structure. Physical Review E, 84(1), 017102.
  28. Lambiotte, R., Delvenne, J.-C., & Barahona, M. (2015). Random walks, Markov processes and the multiscale modular organization of complex networks. Transactions on Network Science and Engineering, 1(2), 76–90.
  29. Lazega, E. (2001). The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership. Oxford, UK: Oxford University Press.
  30. Lazega, E., & Pattison, P. E. (1999). Multiplexity, generalized exchange and cooperation in organizations: A case study. Social Networks, 21(1), 67–90.
  31. Leskovec, J., Lang, K. J., Dasgupta, A., & Mahoney, M. W. (2009). Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1), 29–123.
  32. Leskovec, J, Lang, K J, & Mahoney, M W. (2010). Empirical comparison of algorithms for network community detection. Pages 631–640 of: Proceedings of the 19th International Conference on World Wide Web. ACM.
  33. Mihail, M. (1989). Conductance and convergence of Markov chains — A combinatorial treatment of expanders. Pages 526–531 of: Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE.
  34. Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J.-P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 876–878.
  35. Newman, M. E. J. (2010). Networks: An Introduction. Oxford, UK: Oxford University Press.
  36. Peixoto, T. P. (2015). Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. Physical Review E, 92(Oct), 042807.
  37. Porter, M A, Onnela, J-P, & Mucha, P J. (2009). Communities in networks. Notices of the American Mathematical Society, 56(9), 1082–1097, 1164–1166.
  38. Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences of the United States of America, 105(4), 1118–1123.
  39. Rosvall, M., Esquivel, A. V., Lancichinetti, A., West, J. D., & Lambiotte, R. (2014). Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications, 5, 4630.
  40. Salehi, M, Sharma, R, Marzolla, M, Magnani, M, Siyari, P, & Montesi, D. (2015). Spreading processes in multilayer networks. IEEE Transactions on Network Science and Engineering, 2(2), 65–83.
  41. Snijders, T. A. B., Pattison, P. E., Robins, G. L., & Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology, 36(1), 99–153.
  42. Wasserman, S, & Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge, UK: Cambridge University Press.
  43. Whang, J J, Gleich, D F, & Dhillon, I S. (2016). Overlapping community detection using neighborhood-inflated seed expansion. IEEE Transactions on Knowledge and Data Engineering, 28(5), 1272–1284.
  44. Yan, X, Teng, S-H, Lerman, K, & Ghosh, R. (2016). Capturing the interplay of dynamics and networks through parameterizations of Laplacian operators. PeerJ Computer Science, 2(May), e57.
  45. Yang, J, & Leskovec, J. (2012). Defining and evaluating network communities based on ground-truth. Pages 3:1–3:8 of: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. New York: ACM.