# Generative benchmark models for mesoscale structure in multilayer networks

###### Abstract

Multilayer networks allow one to represent diverse and interdependent connectivity patterns — e.g., time-dependence, multiple subsystems, or both — that arise in many applications and which are difficult or awkward to incorporate into standard network representations. In the study of multilayer networks, it is important to investigate “mesoscale” (i.e., intermediate-scale) structures, such as dense sets of nodes known as “communities” that are connected sparsely to each other, to discover network features that are not apparent at the microscale or the macroscale. A variety of methods and algorithms are available to identify communities in multilayer networks, but they differ in their definitions and/or assumptions of what constitutes a community, and many scalable algorithms provide approximate solutions with little or no theoretical guarantee on the quality of their approximations. Consequently, it is crucial to develop generative models of networks to use as a common test of community-detection tools. In the present paper, we develop a family of benchmarks for detecting mesoscale structures in multilayer networks by introducing a generative model that can explicitly incorporate dependency structure between layers. Our benchmark provides a standardized set of null models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. We discuss the parameters and properties of our generative model, and we illustrate its use by comparing a variety of community-detection methods.

^{†} Both authors contributed equally.


## I Introduction

Many physical, technological, biological, financial, and social systems can be modeled as networks, which in their simplest form are represented as graphs Newman (2010). A graph consists of a set of nodes that represent entities and a set of edges between pairs of nodes that represent interactions between those entities. One can consider either unweighted graphs or weighted graphs, in which each edge has a weight that quantifies the strength of the associated interaction. Edges can also have a direction to represent asymmetric interactions or a sign to differentiate between positive and negative interactions.

Given a network representation of a system, it is often useful to apply a coarse-graining technique to investigate features that lie between those at the “microscale” (e.g., nodes, pairwise interactions between nodes, or local properties of nodes) and those at the “macroscale” (e.g., total edge weight, degree distribution, and mean clustering coefficient) Porter et al. (2009); Fortunato (2010). One thereby studies “mesoscale” features such as community structure Porter et al. (2009), core–periphery structure Csermely et al. (2013), role structure Rossi and Ahmed (2015), or others. To make our discussions concrete, we focus primarily on community structure in this paper. We then briefly explain how our results also apply to other types of mesoscale structures in Section VI.

Loosely speaking, a community in a network is a set of nodes that are “more densely” connected to each other than they are to nodes in the rest of the network Porter et al. (2009); Fortunato (2010); Fortunato and Hric (2016); Schaub et al. (2016). Communities thus represent an “assortative structure” — i.e., intracommunity edges are more likely than intercommunity edges — between sets of nodes in a network. Most scholars agree that a “good community” should be a set of nodes that are ‘surprisingly well-connected’ in some sense, but what one actually means by ‘surprising’ and ‘well-connected’ can be application-dependent (and sometimes appears to be somewhat subjective). In many cases, a precise definition of “community” ultimately depends on the method and algorithm that one uses to detect them. Myriad methods have been developed to algorithmically detect communities, and efforts at community detection have led to insights in applications such as granular force networks Bassett et al. (2015), committee and voting networks in political science Porter et al. (2005); Mucha et al. (2010), friendship networks at universities and schools Traud et al. (2012); González et al. (2007), protein interaction networks in biology Lewis et al. (2010); Guimerà et al. (2005), brain and behavioral networks in neuroscience Bassett et al. (2011); Wymbs et al. (2012), human communication networks Onnela et al. (2007); Expert et al. (2011), and more.

A popular approach to community detection is the optimization of a quality function, such as modularity Girvan and Newman (2002); Mucha et al. (2010); Bazzi et al. (2016), stability Petri and Expert (2014); Lambiotte et al. (2015); Delvenne et al. (2010), Infomap and its variants De Domenico et al. (2015); Rosvall and Bergstrom (2008), and likelihood functions derived from “stochastic block models” (SBMs) Holland et al. (1983); Karrer and Newman (2011); Stanley et al. (2016); Peixoto (2015a); Kloumann et al. (2016), which are models for partitioning a network into blocks of nodes with statistically homogeneous connectivity patterns. Different quality functions are motivated by different interpretations of “community” (e.g., as bottlenecks of dynamical processes Lambiotte et al. (2015); Delvenne et al. (2010); Rosvall and Bergstrom (2008)), and different SBMs make different assumptions on the underlying network structure Karrer and Newman (2011); Stanley et al. (2016); Peixoto (2015a). Further complications arise from the fact that community-assignment problems for many notions of community structure — including the most prominent methods — cannot be solved exactly in polynomial time (unless P = NP) Fortunato (2010); Brandes et al. (2008); Jacobs and Clauset (2014). Together with the need for methods to scale to be usable for very large networks, this necessitates the use of computational heuristics that only find approximate solutions, and there is often little theoretical understanding of how closely such approximate solutions resemble an optimal solution.

Benchmark networks with known structural properties are important for (1) analyzing and comparing the performance of different community-detection methods and algorithms and (2) determining which one(s) are most appropriate in a given situation. The ill-defined nature of community detection makes it particularly crucial to develop good benchmark networks. We propose to do this using generative models. In many benchmark networks, one plants a partition (i.e., an assignment of nodes to communities) of a network into well-separated communities, and one thereby imposes a so-called “ground truth” (should one wish to use such a notion) that a properly deployed community-detection method ought to be able to find Peel et al. (2016). However, another complication arises: there is a “detectability limit” for communities, as one can plant partitions that, under suitable conditions, cannot subsequently be detected algorithmically even though they exist by construction Decelle et al. (2011); Ghasemian et al. (2016); Taylor et al. (2016).

For a single-layer network (i.e., a “monolayer network”), which is the standard type of network and is represented mathematically as a graph, many different types of benchmark networks have been developed to capture different aspects of community structure. A benchmark that was employed in early studies of community structure is the “planted $\ell$-partition model” Condon and Karp (2000), which was used in Girvan and Newman (2002). The most popular family of benchmark networks is the Lancichinetti–Fortunato–Radicchi (LFR) networks Lancichinetti et al. (2008), which were generalized subsequently in Lancichinetti and Fortunato (2009) to be able to generate weighted and directed networks with either disjoint or overlapping communities. To attempt to capture the heterogeneity observed in many real networks Newman (2010), LFR networks have power-law degree distributions and community-size distributions. A similar benchmark, based on a degree-corrected SBM, was suggested in Karrer and Newman (2011). In Section IV, we point out some advantages of this benchmark over LFR networks when used to generate multilayer networks. In our numerical experiments in Section V, we use a slight variant of the degree-corrected SBM benchmark in which we impose the additional constraint that there can be neither self-edges nor multi-edges (see Algorithm 2). There have also been some efforts towards highlighting unrealistic features of LFR networks Jeub et al. (2015a) and towards developing more realistic benchmarks Orman et al. (2013). Other benchmarks have also been developed to capture particular aspects of some networks, such as being embedded in or influenced by space Sarzynska et al. (2015).

In many applications, monolayer networks are insufficient for capturing the intricacies of connectivity patterns between entities. For example, this arises in both temporal Holme and Saramäki (2012); Holme (2015); Valdano et al. (2015); Bazzi et al. (2016) and multiplex networks Cranmer et al. (2015); Gallotti and Barthélemy (2014); Fan and Yeung (2015); Cantini et al. (2015); Kéfi et al. (2016), which we represent as multilayer networks Kivelä et al. (2014); Boccaletti et al. (2014). Our motivation for considering a single multilayer network instead of several networks independently is that connectivity patterns in different layers often depend on each other in some way. For example, the connectivity patterns in somebody’s Facebook friendship network today are not independent either of the connectivity patterns in that person’s Facebook friendship network last year (temporal) or of the connectivity patterns in that person’s Twitter follower/following network today (multiplex). Data sets that have multilayer structures are becoming increasingly available, and several methods to detect communities in multilayer networks have now been developed Mucha et al. (2010); De Domenico et al. (2015); Stanley et al. (2016); Peixoto (2015a); Kivelä et al. (2014); Jeub et al. (2015b); Wilson et al. (2016).

Existing benchmark networks for multilayer community detection assume either a temporal structure Sarzynska et al. (2015); Granell et al. (2015); Ghasemian et al. (2016); Zhang et al. (2016); Hulovatyy and Milenković (2016); Pasta and Zaidi (2016) or a simplified multiplex structure with independent blocks of layers in which layers in the same block have identical community structures De Domenico et al. (2015). Existing multilayer SBMs make different assumptions on the underlying network structure. For example, Paul and Chen (2015); Vallès-Català et al. (2016); Barbillon et al. (2016) used the same monolayer SBM for each layer, and Stanley et al. (2016) used independent monolayer SBMs (one for each set of “similar” layers). We also note Ref. Paul and Chen (2016), which developed several multilayer random-graph models (including an SBM) to use as null models. Much of the theory and examples in Peixoto (2015a) entail fitting a monolayer SBM with the same partition (but in which other model parameters may differ) to each layer, although the author pointed out that his framework allows more flexibility if one allows overlapping communities in the model Peixoto (2015a, b). Importantly, none of the aforementioned SBMs include an explicit parameter to capture dependency structure between layers. Therefore, although one may be able to use those SBMs to recover mesoscale structure that fits the employed model, one cannot use them to generate multilayer networks with a prescribed interlayer dependency structure other than the specific one(s) that are built into the model.

In this paper, we propose a method for generating random multilayer partitions with arbitrary interlayer dependency structures. Given a user-specified interlayer dependency structure, we define a Markov chain on the space of multilayer partitions and sample a multilayer partition from its stationary distribution. (See Section III for our generative model of a multilayer partition.) Our sampling procedure includes the temporal structure in Sarzynska et al. (2015); Ghasemian et al. (2016) and simplified multiplex structure in De Domenico et al. (2015) as special cases. Broadly speaking, our generation process is threefold: (1) a user specifies the interlayer dependency structure; (2) we then sample a multilayer partition of the set of nodes that satisfies the specified dependency structure; and (3) finally, we sample edges between nodes in a way that is consistent with the planted multilayer partition. We explain steps (1) and (2) in Section III, and we explain step (3) in Section IV. We treat the process of generating a multilayer partition separately from the process of generating edges for a given multilayer partition. That is, we carry out steps (2) and (3) successively (and not in parallel). This yields a modular approach for generating benchmark multilayer networks because one can modify steps (2) and (3) independently.

In our numerical experiments, we generate intralayer edges (i.e., edges within layers) using degree-corrected SBMs for each layer. Our construction produces a multilayer network with no edges between layers but in which the connectivity patterns in different layers depend on each other (i.e., they are “interdependent”). This is the most commonly studied type of multilayer network (see Table 2 of Kivelä et al. (2014)). One can also use our multilayer formulation of the degree-corrected SBM in Section IV to generate interlayer edges (i.e., edges between layers).

One can combine our approach for generating multilayer partitions with different network generating models that capture various important features. For example, one can use an SBM to generate intralayer edges and/or interlayer edges, or one can replace the degree-corrected SBM in Section IV with any other monolayer network model with a planted partition (e.g., other variants of SBMs Holland et al. (1983); Jacobs and Clauset (2014) and models for spatially embedded networks Sarzynska et al. (2015); Newman and Peixoto (2015)). In all of these examples, dependencies between connectivities in different layers result only from the planted multilayer partition. (See Zhang et al. (2016) for a different approach, in which dependencies between different layers are introduced via the network generation process.) It is also possible to modify our network-generation process to introduce additional interdependencies between connectivity patterns in different layers beyond that induced by planted mesoscale structure (see Section VI). Our generative model is a very general one (and, as discussed above, we have constructed it to easily admit additional salient features of empirical multilayer networks), and we illustrate its use with a few important special cases. Note, however, that it is not the goal of our paper to cover as many special cases as possible.

Along with this paper, we include publicly available code Jeub and Bazzi (2016) that users can modify to readily incorporate different types of interlayer dependency structures (see Section III.1), null distributions (see Section III.3), and monolayer network models with a planted partition (see Section IV).

The rest of the paper is organized as follows. We start in Section II with an overview of the definitions and notation that we use. In Section III, we explain in detail how we generate a multilayer partition with a specified dependency structure between layers. We give some properties of a sampled partition and discuss the effect of some parameter choices on the resulting partition. In Section IV, we describe how we generate edges that are consistent with the planted partition. The modular nature of our approach enables one to generate edges using any monolayer network model with a planted partition. In Section V, we describe numerical experiments to compare the performance of various community-detection methods. In Section VI, we provide a summary of our main results and we outline how one can incorporate more realistic features into the generative model (e.g., change points Peel and Clauset (2015); Peixoto (2015a), mesoscale structures other than community structure, and others) and discuss directions for future work.

## II Multilayer Networks: Preliminaries

The simplest type of network is a graph $G = (V, E)$, where $V$ is a set of nodes and $E \subseteq V \times V$ is a set of edges. Using a graph, one can encode the presence or absence of connections (the edges) between entities (the nodes). However, in many situations, one would like to include more detailed information about connections between entities. A common extension is to allow edges to have a weight, which one can use to represent the strength of a connection. We represent edge weights using a weight function $w \colon E \to \mathbb{R}$, which assigns a weight $w(e)$ to each edge $e \in E$.

As we mentioned in Section I, one can further extend the network framework to represent different “aspects” of connections between entities, such as connections at different points in time or different types of connections that occur simultaneously. We adopt a “multilayer network” framework Kivelä et al. (2014); De Domenico et al. (2013) to represent such connections. In a multilayer network, a node is present in a variety of different “states”, where each state is further characterized by a variety of different aspects. In this setting, edges join “state nodes”, each of which is the representation of a given node in a particular state. One can think of the aspects as features that one needs to specify to identify the state of a node. In other words, a state is a collection of exactly one element from each aspect. For convenience, we introduce a mapping that assigns an integer label to each element of an aspect. That is, the elements of an aspect with $\ell$ elements are mapped to the elements of the set $\{1, \ldots, \ell\}$ of integers. Aspects can be unordered (e.g., social-media platform) or ordered (e.g., time). For an ordered aspect, we require that the mapping respects the order of the aspect (e.g., for time, the ordered elements $t_1 < t_2 < \cdots < t_{|T|}$ of a set $T$ of discrete time points are mapped to the integers $1 < 2 < \cdots < |T|$). A multilayer network can include an arbitrary number of ordered aspects and an arbitrary number of unordered aspects, and one can generalize these ideas further (e.g., by introducing a time horizon) Kivelä et al. (2014).

To illustrate the above ideas, consider a hypothetical social network in which we observe different types of connections (‘friendship’ and ‘following’) on different platforms (‘Facebook’, ‘Twitter’, and ‘LinkedIn’) between the same set of people at different points in time. In our example, we have three aspects: type of connection, social-media platform, and time. The first aspect is unordered and consists of two elements: ‘friendship’ (which we map to the integer 1) and ‘following’ (which we map to the integer 2). The second aspect is also unordered and consists of three elements: ‘Facebook’ (which we map to the integer 1), ‘Twitter’ (which we map to the integer 2), and ‘LinkedIn’ (which we map to the integer 3). The third aspect is ordered and consists of as many elements as there are time points or time intervals. If we assume that the time resolution is daily and spans the year 2010, an example of a state is the triple (‘following’, ‘Facebook’, ‘01-Jan-2010’) or equivalently $(2, 1, 1)$.
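As a minimal sketch (with the time aspect truncated to three days for brevity; the variable names and helper function are ours, not part of the paper), one can construct the integer labels for each aspect and enumerate the states as the Cartesian product of the aspects:

```python
from itertools import product

# Aspects of the hypothetical example from the text.
connection_types = ["friendship", "following"]        # unordered aspect
platforms = ["Facebook", "Twitter", "LinkedIn"]       # unordered aspect
days = ["01-Jan-2010", "02-Jan-2010", "03-Jan-2010"]  # ordered aspect (truncated)

def label(aspect):
    """Map each element of an aspect to an integer label (1-based).
    For an ordered aspect, listing elements in order makes the mapping
    respect that order."""
    return {elem: k + 1 for k, elem in enumerate(aspect)}

type_label, platform_label, day_label = map(label, (connection_types, platforms, days))

# A state selects exactly one element from each aspect, so the set of states
# is the Cartesian product of the aspects (one layer per state).
states = list(product(connection_types, platforms, days))
assert len(states) == 2 * 3 * 3

# The state ('following', 'Facebook', '01-Jan-2010') corresponds to (2, 1, 1).
state = ("following", "Facebook", "01-Jan-2010")
assert (type_label[state[0]], platform_label[state[1]], day_label[state[2]]) == (2, 1, 1)
```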

We refer to the set of all state nodes that represent a given entity as a “physical node” and the set of all state nodes in a given state as a “layer”. Note that there is a bijective mapping between nodes and physical nodes and a bijective mapping between states and layers. One can have connections between nodes in the same state (i.e., intralayer edges) and nodes in different states (i.e., interlayer edges). An example of an intralayer edge is (1, (‘following’, ‘Twitter’, ‘01-Jan-2010’)) $\to$ (2, (‘following’, ‘Twitter’, ‘01-Jan-2010’)), indicating that entity 1 was following entity 2 on Twitter on 1 January 2010. An example of an interlayer edge is (1, (‘following’, ‘Twitter’, ‘01-Jan-2010’)) $\to$ (1, (‘following’, ‘Twitter’, ‘02-Jan-2010’)), indicating that information can flow from entity 1 on 1 January 2010 to entity 1 on 2 January 2010.

More formally, following the notation of Kivelä et al. (2014), we consider a general multilayer network $M$ with $n$ nodes (or physical nodes) and $l$ states (or layers). For ease of writing, we use the same notation for nodes and physical nodes, and we also use the same notation for states and layers. We use $d$ to denote the number of aspects and use $L_a$ to denote the set of labels of the $a$th aspect (where $a \in \{1, \ldots, d\}$). We use $\hat{A}_{\mathrm{o}}$ to denote the set of ordered aspects and $\hat{A}_{\mathrm{u}}$ to denote the set of unordered aspects. The set $\mathcal{L} = L_1 \times \cdots \times L_d$ of states is the Cartesian product of the aspects, where a state $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)$ is an integer vector of length $d$ and each entry $\alpha_a$ specifies an element of the corresponding aspect. Note that $l = |\mathcal{L}| = |L_1| \times \cdots \times |L_d|$.

We use $(i, \boldsymbol{\alpha})$ to denote the state node (i.e., “node-layer tuple” Kivelä et al. (2014)) representing node $i$ in state $\boldsymbol{\alpha}$. We only include a state node in the set $V_M$ of state nodes if the corresponding node exists in that state. The edges in a multilayer network are between state nodes, where we use $\big((i,\boldsymbol{\alpha}),(j,\boldsymbol{\beta})\big)$ to denote a directed edge from $(j,\boldsymbol{\beta})$ to $(i,\boldsymbol{\alpha})$. For two state nodes, $(i,\boldsymbol{\alpha})$ and $(j,\boldsymbol{\beta})$, connected by a directed edge $\big((i,\boldsymbol{\alpha}),(j,\boldsymbol{\beta})\big)$, we say that $(j,\boldsymbol{\beta})$ is an in-neighbor of $(i,\boldsymbol{\alpha})$ and $(i,\boldsymbol{\alpha})$ is an out-neighbor of $(j,\boldsymbol{\beta})$. We categorize the edges into intralayer edges $E_{\mathrm{intra}}$, which have the form $\big((i,\boldsymbol{\alpha}),(j,\boldsymbol{\alpha})\big)$ and link entities $i$ and $j$ in the same state $\boldsymbol{\alpha}$, and interlayer (or coupling) edges $E_{\mathrm{inter}}$, which have the form $\big((i,\boldsymbol{\alpha}),(j,\boldsymbol{\beta})\big)$ for $\boldsymbol{\alpha} \neq \boldsymbol{\beta}$. We thereby decompose the edge set $E = E_{\mathrm{intra}} \cup E_{\mathrm{inter}}$.

We define a weighted multilayer network by introducing a weight function $w \colon E \to \mathbb{R}$ (analogous to the weight function for weighted monolayer networks), which encodes the edge weights within and between layers. For an unweighted multilayer network, we define $w(e) = 1$ for all $e \in E$ to simplify our discussion. We encode the connectivity of a multilayer network using an adjacency tensor $\mathcal{A}$, analogous to the adjacency matrix for monolayer networks, with entries

$$\mathcal{A}_{(i,\boldsymbol{\alpha})(j,\boldsymbol{\beta})} = \begin{cases} w\big(\big((i,\boldsymbol{\alpha}),(j,\boldsymbol{\beta})\big)\big)\,, & \big((i,\boldsymbol{\alpha}),(j,\boldsymbol{\beta})\big) \in E\,,\\ 0\,, & \text{otherwise}\,. \end{cases} \qquad (1)$$

Note that $G_M = (V_M, E)$ is a graph on the state nodes of the multilayer network $M$. We refer to $G_M$ as the flattened network associated with $M$. The adjacency matrix of the flattened network is the “supra-adjacency matrix” Kivelä et al. (2014); De Domenico et al. (2014, 2013); Gómez et al. (2013) of the multilayer network. One obtains the supra-adjacency matrix by flattening^{1} the adjacency tensor (Eq. (1)) of the multilayer network. The multilayer network and the corresponding flattened network encode the same information Kolda and Bader (2009); Kivelä et al. (2014), provided one keeps track of the mapping between state nodes, nodes, and layers.

^{1} Flattening (or ‘matricizing’) Kivelä et al. (2014) a tensor corresponds to writing its entries in matrix form. Consequently, instead of having a tensor with elements $\mathcal{A}_{(i,\boldsymbol{\alpha})(j,\boldsymbol{\beta})}$, one has a matrix with entries $A_{kl}$, where there are bijective mappings between the indexes.
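Under one possible indexing bijection (our own choice; any bijection between tensor indices and matrix indices works), the flattening can be sketched as follows for a single-aspect toy network:

```python
import numpy as np

# Toy sizes: 3 physical nodes, 2 layers. We index state node (i, alpha)
# by the flat index alpha * n + i (an arbitrary but bijective choice).
n, l = 3, 2

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(n, l, n, l))  # toy adjacency tensor A[i, alpha, j, beta]

def flatten(A):
    """Write a rank-4 adjacency tensor as an (n*l) x (n*l) supra-adjacency matrix."""
    n, l = A.shape[0], A.shape[1]
    supra = np.zeros((n * l, n * l), dtype=A.dtype)
    for i in range(n):
        for alpha in range(l):
            for j in range(n):
                for beta in range(l):
                    supra[alpha * n + i, beta * n + j] = A[i, alpha, j, beta]
    return supra

supra = flatten(A)
# The index mapping is bijective, so the flattened network loses no information:
assert np.array_equal(supra.reshape(l, n, l, n).transpose(1, 0, 3, 2), A)
```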

We denote a multilayer partition with $K$ elements by $S = \{S_1, \ldots, S_K\}$, where $S_k \subseteq V_M$ and $S_k \cap S_{k'} = \emptyset$ for $k \neq k'$. We represent a partition using a partition tensor $\mathcal{S}$ with entries $S_{(i,\boldsymbol{\alpha})}$, where $S_{(i,\boldsymbol{\alpha})} = k$ if and only if the state node $(i,\boldsymbol{\alpha})$ is in set $S_k$. A multilayer partition induces a partition on each layer, where the induced partition on layer $\boldsymbol{\alpha}$ consists of the nonempty intersections of the sets $S_k$ with the set of state nodes in layer $\boldsymbol{\alpha}$. For convenience, we refer to an element $S_k$ of a partition as a community and to its restriction to layer $\boldsymbol{\alpha}$ as the induced community on layer $\boldsymbol{\alpha}$. We call $k$ the label of community $S_k$. We use the word “community” to make our discussions more concrete, but we emphasize that our model (see Section III) for generating a multilayer partition only assumes that its set of state nodes can be partitioned into subsets and makes no assumption on the multilayer network edge structure. In particular, one can use a planted multilayer partition to generate dependent mesoscale structures beyond community structure (e.g., including non-assortative structures) in different layers (see Section VI).
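As a toy illustration (the labels and sizes are ours), one can store a partition tensor as a nested list and read off the partitions that it induces on each layer:

```python
# Partition tensor for n = 4 physical nodes in l = 2 layers (toy labels):
# S[i][alpha] is the community label of state node (i, alpha).
S = [[1, 1],
     [1, 2],
     [2, 2],
     [2, 2]]

def induced_partition(S, alpha):
    """Return the partition that the multilayer partition induces on layer
    alpha, as a mapping from community label to the set of physical nodes."""
    labels = {row[alpha] for row in S}
    return {k: {i for i, row in enumerate(S) if row[alpha] == k} for k in labels}

# Community 1 induces {0, 1} on layer 0 but only {0} on layer 1.
assert induced_partition(S, 0) == {1: {0, 1}, 2: {2, 3}}
assert induced_partition(S, 1) == {1: {0}, 2: {1, 2, 3}}
```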

## III Generating sampled multilayer partitions

We generate multilayer networks with planted mesoscale structure by proceeding in two steps. First, we sample a multilayer partition that has user-specified dependency properties between induced partitions in different layers. Second, we generate a random multilayer network that reflects this prescribed partition. Importantly, we generate a multilayer network after generating a multilayer partition rather than in parallel. Our approach for generating benchmark multilayer networks is thus modular, and each of these steps can be modified separately to incorporate features however one desires (see Section VI). In this section, we focus on how we generate a multilayer partition. In Section IV, we discuss how we generate multilayer networks.

We use an iterative process to introduce dependencies between induced partitions in different layers. Our community-assignment update process for generating a multilayer partition consists of two parts: (1) incorporating interlayer dependencies and (2) incorporating layer-specific random components. The first part is governed by a user-specified “interlayer dependency tensor” that determines the extent to which an induced partition in one layer can be inferred directly from the induced partition of another layer. The second part is governed by independent layer-specific “null distributions” that determine the community assignment of state nodes whenever these assignments are not updated by the interlayer dependency tensor. Our independence assumption on the null distributions allows us to keep all of the interlayer dependencies in a single object (the interlayer dependency tensor).

Our procedure for generating a multilayer partition with a prescribed interlayer dependency structure has four main stages: (1) a user specifies the interlayer dependency structure via an interlayer dependency tensor; (2) one initializes the community assignments of state nodes using independent layer-specific null distributions; (3) one updates the community assignments of state nodes using a copying process that is governed by the interlayer dependency tensor and null distributions; (4) one repeats step (3) until the partition converges to a multilayer partition with the prescribed interlayer dependency structure. We illustrate these stages in Fig. 2.

In Section III.1, we explain our model for generating a multilayer partition in its most general form. In Section III.2, we restrict our discussions to the specific situation in which a physical node is present in all layers (i.e., the network is “fully interconnected” Kivelä et al. (2014)), and in which interlayer dependencies occur only between state nodes that correspond to the same physical node (i.e., “diagonal” coupling Kivelä et al. (2014)) and are uniform across state nodes for a given pair of layers (i.e., “layer-coupled” Kivelä et al. (2014)). In Section III.3, we describe possible choices for the layer-specific null distributions. In Section III.4, we focus on the case of temporal networks with interlayer dependencies only between contiguous layers. We take advantage of the analytical tractability of this special case to discuss some of its properties and to illustrate some effects of the choice of null distribution on a multilayer partition.

### III.1 A general class of multilayer partitions

We begin by defining the interlayer dependency tensor. We then explain how, given a choice of null distributions, one can use the interlayer dependency tensor to generate multilayer partitions with a desired dependency structure. To do this, we use a copying process on the community assignment of state nodes to generate dependencies between induced partitions in different layers. The copying probabilities are user-specified and govern the dependencies between induced partitions.

We encode the copying probabilities in an interlayer dependency tensor $\mathcal{P}$, where the entry $P_{(i,\boldsymbol{\alpha})(j,\boldsymbol{\beta})}$ is the probability that state node $(i,\boldsymbol{\alpha})$ copies its community assignment from state node $(j,\boldsymbol{\beta})$. The total probability that state node $(i,\boldsymbol{\alpha})$ copies its community assignment from another state node is $p_{(i,\boldsymbol{\alpha})} = \sum_{(j,\boldsymbol{\beta})} P_{(i,\boldsymbol{\alpha})(j,\boldsymbol{\beta})}$, and we thus require that $p_{(i,\boldsymbol{\alpha})} \leq 1$ for each state node $(i,\boldsymbol{\alpha})$. Furthermore, we assume that a state node can only copy its community assignment from a state node in a different layer (i.e., $P_{(i,\boldsymbol{\alpha})(j,\boldsymbol{\alpha})} = 0$ for all $i$, $j$, and $\boldsymbol{\alpha}$). We call the network with adjacency tensor $\mathcal{P}$ the interlayer dependency network, and we note that it has only interlayer edges. In general, the interlayer dependency network is directed, with edges pointing in the direction of information flow between layers. The in-neighbors (see Section II) of a state node $(i,\boldsymbol{\alpha})$ in the interlayer dependency network thus correspond to the set of state nodes from which $(i,\boldsymbol{\alpha})$ can copy a community assignment.
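For a concrete special case (a sketch under simplifying assumptions of our own: a single ordered aspect with diagonal, layer-coupled, uniform coupling between consecutive layers), the dependency tensor reduces to a matrix indexed by layer pairs, and the constraints above become simple checks:

```python
import numpy as np

def temporal_dependency_matrix(num_layers, p):
    """Layer-coupled interlayer dependency structure for a temporal network:
    each layer copies from its immediate predecessor with uniform probability
    p (a toy choice; the framework allows arbitrary dependency structures)."""
    P = np.zeros((num_layers, num_layers))
    for alpha in range(1, num_layers):
        P[alpha, alpha - 1] = p  # copy only from the previous layer
    # A state node's total copying probability must not exceed 1 ...
    assert np.all(P.sum(axis=1) <= 1)
    # ... and a state node never copies from its own layer.
    assert np.all(np.diag(P) == 0)
    return P

P = temporal_dependency_matrix(4, 0.8)  # the first layer has no predecessor
```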

As we mentioned in Section II, an aspect of a multilayer network can either be ordered or unordered, and we want to treat these two cases differently when generating multilayer partitions. For an ordered aspect, the structure of the interlayer dependency tensor should reflect the causality implied by the order of the aspect's elements: structure in a given layer can depend directly only on structure in previous layers. Formally, an aspect $a$ of a multilayer network is causally ordered if

$$P_{(i,\boldsymbol{\alpha})(j,\boldsymbol{\beta})} = 0 \quad \text{for } \alpha_a \leq \beta_a\,, \qquad (2)$$

where $\alpha_a$ denotes the element of state $\boldsymbol{\alpha}$ corresponding to aspect $a$ and where, as stated in Section II, we require that the labels of an aspect's elements reflect the ordering of those elements. We define a partial order $\preceq$ for the layers based on the ordered aspects, where $\boldsymbol{\alpha} \preceq \boldsymbol{\beta}$ if and only if $\alpha_a \leq \beta_a$ for all ordered aspects $a$. This partial order of the layers allows us to combine the different notions of causality implied by the ordered aspects and to respect them through the order in which we update community assignments of state nodes.

A single community-assignment update step is independent of the partial order of the layers; it depends only on the choice of state node to update and the current multilayer partition. In this paragraph, we describe an update step for any given interlayer dependency tensor. Suppose that we are updating the community assignment of state node $(i,\boldsymbol{\alpha})$ at step $t$ of the copying process and that the current multilayer partition is $S^{t-1}$ (with partition tensor $\mathcal{S}^{t-1}$). The community assignment of state node $(i,\boldsymbol{\alpha})$ is updated either by copying the community assignment in $\mathcal{S}^{t-1}$ from one of its in-neighbors in the interlayer dependency network or by obtaining a new, random, community assignment from the specified null distribution for layer $\boldsymbol{\alpha}$. In particular, with probability $p_{(i,\boldsymbol{\alpha})}$, a state node copies its community assignment from one of its neighbors in the interlayer dependency network; and with probability $1 - p_{(i,\boldsymbol{\alpha})}$, it obtains its community assignment from the null distribution $P^{0}_{\boldsymbol{\alpha}}$. We use the notation $P^{0} = \{P^{0}_{\boldsymbol{\alpha}}\}$ for the set of null distributions. (Note that one can use this framework to represent overlapping communities by identifying multiple state nodes in a single layer with the same physical node.) This yields the following update equation at step $t$ of our copying process:

$$\mathbb{P}\big[S^{t}_{(i,\boldsymbol{\alpha})} = s \,\big|\, \mathcal{S}^{t-1}\big] = \sum_{(j,\boldsymbol{\beta})} P_{(i,\boldsymbol{\alpha})(j,\boldsymbol{\beta})}\, \delta\big(S^{t-1}_{(j,\boldsymbol{\beta})},\, s\big) + \big(1 - p_{(i,\boldsymbol{\alpha})}\big)\, P^{0}_{\boldsymbol{\alpha}}(s)\,, \qquad (3)$$

where $\delta(\cdot\,,\cdot)$ denotes the Kronecker delta.

The update equation in Eq. (3) is at the heart of our generative model for mesoscale structure in multilayer networks. It is clear from Eq. (3) that the null distributions are responsible for the specification of community assignments in the absence of interlayer dependencies (i.e., if $p_{(i,\boldsymbol{\alpha})} = 0$ for all state nodes $(i,\boldsymbol{\alpha})$). In particular, the support of the null distributions corresponds to the possible community assignments for a state node whenever the state node does not copy its assignment from one of its neighbors. In Section III.3, we discuss the choice of $P^{0}$ and its effect on a sampled multilayer partition.
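The update step can be sketched as follows for a diagonal, layer-coupled special case (a simplified illustration of our own, with one state node per physical node per layer; `null_dist` stands in for the layer-specific null distribution):

```python
import random

def update_state_node(i, alpha, S, P, null_dist, rng=random):
    """One community-assignment update for state node (i, alpha).

    S[alpha][i] is the current community assignment; P[alpha][beta] is the
    probability of copying from the same physical node in layer beta
    (diagonal, layer-coupled coupling); null_dist(alpha) samples a fresh
    assignment from the layer-specific null distribution.
    """
    r = rng.random()
    cumulative = 0.0
    for beta, prob in P[alpha].items():
        cumulative += prob
        if r < cumulative:
            return S[beta][i]        # copy assignment from an in-neighbor
    return null_dist(alpha)          # reached with probability 1 - p_{(i,alpha)}

# Toy usage: layer 1 copies from layer 0 with probability 1, so the update
# deterministically copies; layer 0 has no in-neighbors, so it resamples.
S = {0: {0: "a"}, 1: {0: "b"}}
P = {0: {}, 1: {0: 1.0}}
assert update_state_node(0, 1, S, P, null_dist=lambda a: "c") == "a"
assert update_state_node(0, 0, S, P, null_dist=lambda a: "c") == "c"
```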

We want to sample multilayer partitions that are consistent with both the order of the layers and the conditional probability distributions defined by Eq. (3). We use Gibbs sampling Gelfand (2000), as it relies only on the conditional distributions and does not require the joint distribution for the community assignments. Our sampling algorithm proceeds as follows: (1) we sample an initial multilayer partition from the null distributions (i.e., $S^{0}_{(i,\boldsymbol{\alpha})} \sim P^{0}_{\boldsymbol{\alpha}}$ for each state node); and (2) we then repeatedly update the multilayer partition by sequentially updating the community assignment of each state node using Eq. (3). When updating community assignments, we respect the order of the layers: if state node $(i,\boldsymbol{\alpha})$ is updated before state node $(j,\boldsymbol{\beta})$, then layer $\boldsymbol{\beta}$ does not precede layer $\boldsymbol{\alpha}$ in the partial order of the layers. This updating process defines a Markov chain on the space of multilayer partitions, and its stationary distribution is the desired joint distribution for the community assignments. We will sample from this joint distribution.

By running the updating process for a sufficiently large number of iterations (the so-called "burn-in period"), the state of the Markov chain is sampled approximately from its stationary distribution. The stationary distribution of the Markov chain defined by Gibbs sampling does not depend on the order in which we update the state nodes. However, the order of the update steps can influence the convergence speed Roberts and Sahu (1997). In particular, for fully ordered multilayer networks (i.e., multilayer networks in which the layers are totally ordered), updating state nodes in an order that is consistent with the order of the layers ensures that the sampling algorithm converges after a single pass over all state nodes, as the results of subsequent passes are independent of each other in this case. We expect that respecting the partial order of the layers also helps in situations in which a multilayer network is only partially ordered, though to our knowledge this question remains open.
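To make the copying process concrete, the following Python sketch implements one Gibbs sweep under an arbitrary interlayer dependency network. The data structures and function names are our own illustrative assumptions (not code from a reference implementation): the dependency network is given, for each state node, as a list of `(in-neighbor, copy-probability)` pairs whose probabilities sum to at most 1.

```python
import random

def update_state_node(assignment, node, in_neighbors, null_dist):
    """One update step of the copying process: copy from an in-neighbor
    with the given probabilities; otherwise resample from the layer's
    null distribution.

    assignment   : dict mapping state node -> community label
    node         : state node to update
    in_neighbors : list of (neighbor, probability) pairs; sum of
                   probabilities is the copying probability p <= 1
    null_dist    : dict mapping label -> probability (sums to 1)
    """
    r = random.random()
    cumulative = 0.0
    for neighbor, prob in in_neighbors:
        cumulative += prob
        if r < cumulative:
            assignment[node] = assignment[neighbor]  # copy step
            return
    # with probability 1 - p, resample from the null distribution
    labels = list(null_dist)
    weights = [null_dist[c] for c in labels]
    assignment[node] = random.choices(labels, weights=weights)[0]

def gibbs_sweep(assignment, dependency, null_dists, order):
    """One pass over all state nodes in the given order.

    dependency : state node -> list of (in-neighbor, probability)
    null_dists : layer -> null distribution; a state node is a
                 (node, layer) pair
    order      : list of state nodes (should respect the layer order)
    """
    for state_node in order:
        _, layer = state_node
        update_state_node(assignment, state_node,
                          dependency.get(state_node, []), null_dists[layer])
```

For a fully ordered multilayer network, a single such sweep in layer order suffices; otherwise one repeats the sweep for a burn-in period.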

Sampling initial conditions in an 'unbiased' way can be important for ensuring that one successfully explores the space of possible partitions, as the updating process is not necessarily ergodic over the supports of the null distributions (which is a subspace of the space of all multilayer partitions) when $p_{i\alpha} = 1$ for some state nodes. A non-ergodic updating process corresponds to the case in which the conditional probabilities given by Eq. (3) do not define a unique joint probability distribution for the community assignments. Instead, there are multiple joint distributions that are consistent with the conditional distributions. The updating process converges to one of the possible joint distributions, and the joint distribution that is selected depends both on the initial condition and on the exact sequence of random steps taken by the updating process. For example, if $p_{i\alpha} = 1$ for all state nodes, then any partition with a single community is an "absorbing state" of the Markov chain (i.e., a state that, once reached, cannot be changed by the updating process). In this example, the updating process eventually reaches an absorbing state, and the distribution of final absorbing states depends on the initial partition. The absorbing states in this example correspond to partitions in which all state nodes in each component of the interlayer dependency network have the same community assignment.

Importantly, provided the updating process has converged, any partition that one generates still has the desired dependencies between induced partitions in different layers. Thus, to work around the problem of non-unique joint distributions when the updating process is not ergodic, we reinitialize the updating process by sampling from the null distributions for each partition that we want to generate. Reinitializing from a fixed distribution for each sample always defines a unique joint distribution, which is a mixture of all possible joint distributions when the updating process is not ergodic. When the updating process is ergodic, reinitializing for each sample and drawing multiple partitions from a single long chain are equivalent. The second approach is usually more efficient when many samples are needed and some dependence between samples is not an issue Tierney (1994). However, in our case, one usually needs only a few samples with the same parameters, and independence tends to be more important than ensuring perfect convergence (provided the generated partitions exhibit the desired interlayer dependency structure). Using multiple chains thus has clear advantages even when the updating process is ergodic, and it is necessary to use multiple chains when it is not ergodic.

Determining whether a Markov chain has converged to a stationary distribution is a difficult problem, mostly because it is difficult to distinguish the case of a slow-mixing chain becoming stuck in a particular part of the state space from the case in which the chain has converged to a stationary distribution. There has been much work on trying to define a convergence criterion for Markov chains Cowles and Carlin (1996), but none of the approaches are entirely successful. In practice, one usually runs a Markov chain (or chains) for a predetermined number of steps (e.g., 1000). One manually checks on a few examples that the resulting chains exhibit behavior that is consistent with convergence by examining autocorrelations between samples of the same chain and cross-correlations between samples of independent chains with the same initial state. When feasible, one can also check whether parts of different, independent chains or different parts of the same chain are consistent with being sampled from the same distribution.

### III.2 Fully-interconnected, diagonal, and layer-coupled multilayer partitions

In many situations, the complexity of allowing arbitrary interlayer dependencies is unnecessary (and even undesirable) when designing benchmark multilayer networks. A particularly useful restriction that still allows us to represent many situations of interest is to require the interlayer dependency network to be "fully interconnected", "diagonal", and "layer-coupled" Kivelä et al. (2014). A multilayer network is fully interconnected if each node is represented in each layer, it is diagonal if interlayer edges exist only between state nodes that correspond to the same physical node, and it is layer-coupled if the weight of the interlayer edges depends only on the layers and not on the nodes. For a fully interconnected, diagonal, layer-coupled dependency network, we can represent the interlayer dependency tensor using a layer dependency tensor $\check{P}$ whose elements satisfy $P^{i\alpha}_{j\beta} = \delta_{ij}\,\check{p}_{\beta\alpha}$. The update equation (Eq. (3)) then simplifies to

$$\mathbb{P}\bigl[s^{(t)}_{i\alpha} = c \,\big|\, S^{(t-1)}\bigr] = \sum_{\beta} \check{p}_{\beta\alpha}\, \delta\bigl(c,\, s^{(t-1)}_{i\beta}\bigr) + \bigl(1 - p_\alpha\bigr)\, P^0_\alpha(c)\,, \qquad p_\alpha = \sum_{\beta} \check{p}_{\beta\alpha}\,, \qquad (4)$$

which depends only on the layer dependency tensor $\check{P}$ and the null distributions $\{P^0_\alpha\}$. Each term $\check{p}_{\beta\alpha}$ quantifies the extent to which an induced partition in layer $\alpha$ can be inferred directly from an induced partition in layer $\beta$. As before, we require that $p_\alpha \leq 1$. By changing the structure of $\check{P}$, one can generate multilayer networks that correspond to several of the most important scenarios, including temporal networks, multiplex networks, and multilayer networks with more than one aspect (e.g., combinations of temporal and multiplex features). That is, the above restriction simplifies matters considerably while still allowing us to analyze several very important situations.

In Fig. 3, we show the dependency tensors for single-aspect multilayer networks. For a temporal network, one usually assumes that induced partitions in a layer depend directly only on induced partitions in the previous layer. There is thus one copying probability for each pair of consecutive layers that we are free to choose. Typical examples include choosing the same probability for each pair of consecutive layers to obtain a uniformly-evolving network Ghasemian et al. (2016); Sarzynska et al. (2015) or making some of the probabilities significantly smaller than the others to introduce change points that one may wish to detect Peel and Clauset (2015).

For a multiplex network, an induced partition in any layer can in principle depend directly on induced partitions in all other layers. This yields one copying probability for each ordered pair of distinct layers to choose. In Fig. 3(b), we illustrate the simplest case, in which each layer depends equally on every other layer. In Fig. 4 and Fig. 5, we show example multilayer partitions obtained with the interlayer dependency tensors of Fig. 3.
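For illustration, the layer dependency matrices for these two single-aspect cases can be constructed as follows. This is a sketch with our own names; the entry `[beta][alpha]` is the probability that a state node in layer `alpha` copies its assignment from layer `beta`.

```python
def temporal_dependency_matrix(num_layers, p):
    """Layer dependency matrix for a temporal network: layer alpha
    copies only from the previous layer alpha - 1, with probability p."""
    P = [[0.0] * num_layers for _ in range(num_layers)]
    for alpha in range(1, num_layers):
        P[alpha - 1][alpha] = p
    return P

def multiplex_dependency_matrix(num_layers, p):
    """Uniform multiplex case: each layer depends equally on every other
    layer, with total copying probability p split evenly among them."""
    P = [[0.0] * num_layers for _ in range(num_layers)]
    for alpha in range(num_layers):
        for beta in range(num_layers):
            if beta != alpha:
                P[beta][alpha] = p / (num_layers - 1)
    return P
```

In both cases, each column sums to the total copying probability `p` (except the first column of the temporal matrix, where there is no previous layer to copy from).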

We can also generate multilayer networks with more than one aspect and can thereby combine temporal and multiplex features. In Fig. 6, we illustrate how to construct an appropriate layer-dependency tensor to generate such a multilayer network on a simple example with two aspects, one of which is multiplex and the other of which is temporal.

### III.3 Categorical null distribution

The null distributions in Eqs. (3) and (4) determine the multilayer partition in the absence of interlayer dependencies, thereby fixing the numbers and sizes of the communities in the sampled partitions. We seek to allow the possibility of having heterogeneous community-size distributions. Although the LFR benchmarks Lancichinetti et al. (2008) incorporate such heterogeneity, the procedure used in Lancichinetti et al. (2008) to sample community sizes and then assign nodes to communities in a way that exactly preserves these community sizes does not mesh well with the copying process that we use to induce dependencies between community structures in different layers. In particular, we consider a process that updates the community assignment of a single node at a time, which does not guarantee preserving community sizes. Instead, we only fix expected community sizes by defining null distributions that we can use in Eqs. (3) and (4).

A simple choice for the null distributions is a categorical distribution: for each layer $\alpha$ and each community label $c$, we fix the probability $p^\alpha_c$ for a random state node in layer $\alpha$ to be assigned to community $c$ in the absence of interlayer dependencies. That is,

$$P^0_\alpha(c) = p^\alpha_c\,, \qquad c \in \{1, \ldots, K\}\,, \qquad (5)$$

where $K$ is the total number of communities in the multilayer partition and $\sum_{c=1}^{K} p^\alpha_c = 1$ for all $\alpha$. The set $\{1, \ldots, K\}$ corresponds to the set of community labels, and the support of a null distribution corresponds to the set of labels that have nonzero probability. In other words, the support of the null distribution $P^0_\alpha$ is given by $\mathcal{C}_\alpha = \{c : p^\alpha_c > 0\}$. We say that a label $c$ is active in a layer $\alpha$ if it is in the support of the null distribution (i.e., if $c \in \mathcal{C}_\alpha$), and we say that a label $c$ is inactive in layer $\alpha$ if it is in the complement of the support of $P^0_\alpha$ (i.e., if $c \notin \mathcal{C}_\alpha$). In the absence of interlayer dependencies, a categorical null distribution corresponds to fixing the expected size of each induced community in each layer. The actual community sizes follow a multinomial distribution. Therefore, by choosing the probabilities $p^\alpha_c$, one has some control over the community-size distribution of the sampled multilayer partitions. A natural choice is to sample the vector $(p^\alpha_1, \ldots, p^\alpha_K)$ from a Dirichlet distribution, which is the conjugate prior for the categorical distribution Raiffa and Schlaifer (2000); Agarwal and Daumé (2010). One can think of the Dirichlet distribution, which is the multivariate form of the beta distribution, as a probability distribution over the space of all possible categorical distributions with a given number of categories. Any other (probabilistic or deterministic) choice for the probabilities $p^\alpha_c$ is also possible.

The Dirichlet distribution over $K$ variables takes $K$ parameters $\theta_1, \ldots, \theta_K > 0$ (one for each variable). Its probability density function is

$$f(p_1, \ldots, p_K;\, \theta_1, \ldots, \theta_K) = \frac{\Gamma\bigl(\sum_{c=1}^{K} \theta_c\bigr)}{\prod_{c=1}^{K} \Gamma(\theta_c)} \prod_{c=1}^{K} p_c^{\,\theta_c - 1}\,,$$

where $\sum_{c=1}^{K} p_c = 1$ and $p_c \geq 0$ for all $c$. The case in which all $\theta_c$ are equal is called a symmetric Dirichlet distribution, which we parametrize instead by the common value $\theta$ (the so-called "concentration parameter") of the parameters and the number $K$ of variables.

The concentration parameter $\theta$ determines the kinds of discrete probability distributions that one is likely to obtain from the symmetric Dirichlet distribution. For $\theta = 1$, the symmetric Dirichlet distribution is the continuous uniform distribution over the space of all discrete probability distributions with $K$ states. As $\theta \to \infty$, the Dirichlet distribution becomes increasingly concentrated near the discrete uniform distribution, such that all entries are approximately equal. As $\theta \to 0$, it becomes increasingly concentrated away from the uniform distribution, such that a sampled distribution tends to have one (or a few) large entries, with all other entries close to $0$. Consequently, to have very heterogeneous community sizes, one would choose a small value of $\theta$; to have all communities be of similar sizes, one would choose a large value of $\theta$; and to have a few large communities and many small communities, one would choose $\theta$ sufficiently below $1$. The number of possible labels also affects the amount of community-label overlap across layers that is not a result of our copying process. For example, if the number of possible labels is the same for all layers, then a larger number of labels incentivizes less label overlap across layers (because there are more possible labels for each layer), and a smaller number of labels incentivizes more label overlap across layers (because there are fewer possible labels for each layer).
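For illustration, one can sample from a symmetric Dirichlet distribution with Python's standard library by normalizing independent Gamma($\theta$, 1) variates, which is a standard construction; the function name below is ours.

```python
import random

def symmetric_dirichlet(theta, K, rng=random):
    """Sample a categorical null distribution (p_1, ..., p_K) from a
    symmetric Dirichlet distribution with concentration parameter theta,
    via normalized Gamma(theta, 1) variates."""
    draws = [rng.gammavariate(theta, 1.0) for _ in range(K)]
    total = sum(draws)
    return [x / total for x in draws]
```

Sampling with a large `theta` yields near-uniform probability vectors (similar community sizes), whereas a small `theta` concentrates the mass on a few labels (heterogeneous community sizes).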

In some situations (e.g., when modeling the birth and death of communities in temporal networks), it is desirable to have communities that have a nonzero probability of obtaining nodes from the null distribution only in some layers. For example, if a label has probability $0$ in a given layer, it may be desirable to ensure that it also has probability $0$ in all subsequent layers. For these situations, we suggest sampling the supports (i.e., the sets of active communities in each layer) of the distributions before sampling the probabilities $p^\alpha_c$. As stated earlier in this section, the support of the null distribution $P^0_\alpha$ is given by $\mathcal{C}_\alpha = \{c : p^\alpha_c > 0\}$. Given supports $\mathcal{C}_\alpha$ for each layer, the total number of communities is $K = \bigl|\bigcup_\alpha \mathcal{C}_\alpha\bigr|$. We then write $\overline{\mathcal{C}}_\alpha = \{1, \ldots, K\} \setminus \mathcal{C}_\alpha$ for the complement. Given the supports for each layer, one samples the corresponding probabilities from a symmetric Dirichlet distribution. That is,

$$\bigl(p^\alpha_c\bigr)_{c \in \mathcal{C}_\alpha} \sim \mathrm{Dirichlet}(\theta)\,, \qquad p^\alpha_c = 0 \;\text{ for } c \in \overline{\mathcal{C}}_\alpha\,. \qquad (6)$$

A simple example of a birth/death process for communities is the following. First, fix a number of communities and a support (i.e., a set of active community labels) for the first layer. One then sequentially initializes the supports for the other layers by removing each community present in the support of the previous layer with probability $r$ and adding a number of new communities (with new community labels that are not active in any previous layer) that is sampled from a Poisson distribution with rate $\lambda$. In temporal networks, for example, this ensures that if a community label is not in the support of a given layer, then the label is also not in the support of any subsequent layer. For this process, the expected number of communities in a given layer approaches $\lambda / r$ as the number of iterations increases. Hence, one should initialize the size of the support for the first layer close to this value to avoid transients in the number of communities. For this process, the lifetime of communities follows a geometric distribution. The nature of the copying process that we use to introduce dependencies between induced partitions in different layers tends to imply that communities that have been removed from the support of the null distribution do not lose all of their members instantly but instead shrink at a speed that depends on the parameters (e.g., the values of the copying probabilities in the interlayer dependency tensor).
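The birth/death process for the supports can be sketched as follows (illustrative names of our own; `r_death` is the per-layer death probability and `lam` is the Poisson birth rate):

```python
import math
import random

def poisson_sample(lam, rng=random):
    """Poisson sample via Knuth's multiplication method (stdlib only)."""
    L = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def birth_death_supports(num_layers, initial_labels, r_death, lam, rng=random):
    """Sample the supports (sets of active labels) for each layer: each
    label active in the previous layer dies with probability r_death,
    and Poisson(lam) brand-new labels are born per layer."""
    next_label = len(initial_labels)
    supports = [set(initial_labels)]
    for _ in range(1, num_layers):
        # death step: each surviving label is kept with prob. 1 - r_death
        alive = {c for c in supports[-1] if rng.random() >= r_death}
        # birth step: add brand-new labels never used before
        for _ in range(poisson_sample(lam, rng)):
            alive.add(next_label)
            next_label += 1
        supports.append(alive)
    return supports
```

Because dead labels never return and new labels are always fresh, each label is active on a contiguous range of layers, so its lifetime is geometrically distributed, as stated above.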

One can also allow labels to appear and disappear when examining multiplex multilayer partitions. For example, given a value for $K$, one can generate the support for each layer by allowing each of the $K$ labels to be present with some probability $q$ and absent with complementary probability $1 - q$. This yields sets of active and inactive community labels for each layer. One can then sample the nonzero probabilities in $P^0_\alpha$ that correspond to active labels from a Dirichlet distribution and set $p^\alpha_c$ to $0$ for every inactive label $c$. Because multiplex partitions are unordered, there is no notion of one layer occurring after another one, so we do not need to ensure that an inactive label in a given layer is also inactive in "subsequent" layers.

In general, the choice of the null distributions $\{P^0_\alpha\}$, which encode both the expected community sizes and the supports (by allowing some community labels to have probability $0$), can have a very large effect on the set of sampled multilayer partitions. We illustrate some effects of this choice for the case of fully ordered temporal multilayer networks in Section III.4. For our numerical examples in Section V, we fix a value of $K$ and use a symmetric Dirichlet distribution with concentration parameter $\theta$ to sample probability vectors of length $K$. This produces multilayer partitions in which the active community labels are the same across layers (and given by $\{1, \ldots, K\}$) and for which the expected induced community sizes (given by $N p^\alpha_c$) vary across layers.

### III.4 Temporal multilayer partitions

For the important special case of temporal networks that we illustrated in Fig. 3(a) (where each layer depends only on the previous one), our generative model for multilayer partitions simplifies significantly. We take advantage of the analytic tractability of this special case to point out some of its properties that hold for any choice of null distributions, and we use it to illustrate some features of the categorical null distribution of Section III.3. In our discussion, we assume that dependencies between contiguous layers are uniform; that is, $\check{p}_{\alpha-1,\alpha} = p$ for all $\alpha$ in Fig. 3(a). (We note that one could generate change points Peel and Clauset (2015); Peixoto (2015a) by relaxing this assumption and allowing some probabilities to be significantly smaller than others. We show an example of this in Section V.)

This example includes a single ordered aspect, so the layer index $\alpha$ is a scalar and the order of the layers corresponds to temporal ordering. Furthermore, as we mentioned in Section III.1, for a fully ordered multilayer network, we require the order of the community-assignment update process in Eq. (3) to respect the order of the layers. The update order of state nodes within any given layer $\alpha$ can be arbitrary (i.e., these can be updated simultaneously), but each update is conditional on the community assignments of state nodes in layer $\alpha - 1$. The update process described in Section III.1 for generating a multilayer partition thus reduces to the procedure described in Algorithm 1. In accord with the discussion in Section III.1, note that convergence is not an issue for this case, as only one pass is needed. In Fig. 4, we show example multilayer partitions for this scenario using different values of $p$.
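A minimal sketch of this temporal special case follows (our own illustrative names, not the paper's reference implementation; `null_dists[alpha]` is assumed to be a `(labels, weights)` pair for layer `alpha`):

```python
import random

def sample_temporal_partition(num_nodes, num_layers, p, null_dists, rng=random):
    """Algorithm-1-style sketch: layer 0 is drawn from its null
    distribution; in each later layer, a node keeps its previous label
    with probability p (copying step) and otherwise resamples from that
    layer's null distribution (reallocation step)."""
    def draw(alpha):
        labels, weights = null_dists[alpha]
        return rng.choices(labels, weights=weights)[0]

    partition = [[draw(0) for _ in range(num_nodes)]]
    for alpha in range(1, num_layers):
        layer = [prev if rng.random() < p else draw(alpha)
                 for prev in partition[-1]]
        partition.append(layer)
    return partition
```

A single pass over the layers suffices: with `p = 1` every layer is an exact copy of the first, and with `p = 0` each layer is sampled independently from its null distribution.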

The generative model for temporal multilayer partitions in Algorithm 1 was also suggested in Ghasemian et al. (2016). The authors of that paper used the model to derive a detectability threshold for a planted partition for the case in which the null distributions are uniform across communities (i.e., $p^\alpha_c = 1/K$ in the notation of Section III.3) and intralayer edges are generated independently using the standard SBM (i.e., one replaces the DCSBM of Section IV with the non-degree-corrected SBM of Holland et al. (1983)). In the following paragraphs, we highlight properties of the generative model in Algorithm 1 that hold for any choice of null distributions, and we illustrate that the choice of null distributions can greatly influence the resulting partitions. Our observations are independent of one's choice of monolayer network model with a planted partition. In Section VI, we also discuss how to generalize the temporal multilayer partition model in Algorithm 1 to include memory effects Rosvall et al. (2014) and "burstiness" Lambiotte et al. (2013).

A first important feature of the generative model in Algorithm 1 is that it respects the arrow of time. In particular, community assignments in a given layer depend only on community assignments in the previous layer (e.g., the previous temporal snapshot) and on the null distributions $\{P^0_\alpha\}$. That is, for all layers $\alpha > 1$ and all nodes $i$, the following condition is satisfied:

$$\mathbb{P}\bigl[s_{i\alpha} = c \,\big|\, S_{\alpha - 1}\bigr] = p\, \delta\bigl(c,\, s_{i,\alpha-1}\bigr) + (1 - p)\, P^0_\alpha(c)\,, \qquad (7)$$

where $S_{\alpha - 1}$ is the $N$-node partition induced on layer $\alpha - 1$ by the multilayer partition $S$. The relative importance of the previous layer versus the null distribution is determined by the value of $p$. When $p = 0$, community assignments in a given layer depend only on the null distribution of that layer [i.e., on the second term on the right-hand side of Eq. (7)]. When $p = 1$, community assignments in a given layer are identical to the community assignments of the previous layer (and, by recursion, to the community assignments in all previous layers).

Using Eq. (7), one can easily compute the marginal probability that a given state node has a specific community assignment:

$$\mathbb{P}\bigl[s_{i\alpha} = c\bigr] = p\, \mathbb{P}\bigl[s_{i,\alpha-1} = c\bigr] + (1 - p)\, P^0_\alpha(c)\,, \qquad \mathbb{P}\bigl[s_{i1} = c\bigr] = P^0_1(c)\,.$$

Computing marginal probabilities can be useful for computing expected community sizes for a given choice of null distributions.
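Assuming the marginal recursion P(s_α = c) = p · P(s_{α−1} = c) + (1 − p) · P⁰_α(c), with the first layer drawn directly from its null distribution, the marginals can be iterated numerically (an illustrative sketch with our own names):

```python
def marginal_label_probabilities(p, null_dists):
    """Iterate the marginal recursion layer by layer.

    p          : uniform copying probability between consecutive layers
    null_dists : list of dicts mapping label -> probability, one per layer
    Returns a list of dicts: the marginal label distribution per layer.
    """
    marginals = [dict(null_dists[0])]  # first layer: null distribution
    labels = set().union(*[set(d) for d in null_dists])
    for P0 in null_dists[1:]:
        prev = marginals[-1]
        marginals.append({c: p * prev.get(c, 0.0) + (1 - p) * P0.get(c, 0.0)
                          for c in labels})
    return marginals
```

Multiplying a layer's marginals by the number of nodes per layer then gives the expected induced community sizes for that layer.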

We now highlight how the copying step (i.e., Step C) and the reallocation step (i.e., Step R) in Algorithm 1 govern the evolution of community assignments between consecutive layers. Steps C and R deal with the movement of nodes by first removing some nodes ("subtraction") and then reallocating them ("addition"). In Step C, a community in layer $\alpha - 1$ can lose a number of nodes that ranges from none of them to all of them: it can keep all of its nodes in layer $\alpha$, lose some of its nodes, or disappear entirely. The null distribution in Step R is responsible for a community gaining new nodes or for a community label appearing. One needs to bear the interplay between Step C and Step R in mind when defining the null distributions $\{P^0_\alpha\}$.

To illustrate how the community-assignment copying process and the null distribution in Algorithm 1 can interact with each other, we give the conditional probability that a label disappears in layer $\alpha$ and the conditional probability that a label appears in layer $\alpha$. For all $\alpha > 1$ and all labels $c$, the conditional probability that a label disappears in layer $\alpha$ is

$$\mathbb{P}\bigl[n^\alpha_c = 0 \,\big|\, n^{\alpha-1}_c > 0\bigr] = \bigl[(1 - p)\bigl(1 - P^0_\alpha(c)\bigr)\bigr]^{\,n^{\alpha-1}_c}\, \bigl[1 - (1 - p)\, P^0_\alpha(c)\bigr]^{\,N - n^{\alpha-1}_c}\,,$$

where $n^\alpha_c$ denotes the number of state nodes with label $c$ in layer $\alpha$ and $N$ is the number of nodes in a layer. This expression simplifies to $(1 - p)^{\,n^{\alpha-1}_c}$, which depends only on our copying process, when $P^0_\alpha(c) = 0$ (i.e., when the probability of being assigned to label $c$ using the null distribution of layer $\alpha$ is $0$). Furthermore, as $n^{\alpha-1}_c$ increases, the probability that a label disappears decreases.

For all $\alpha > 1$ and all labels $c$, the conditional probability that a label appears in layer $\alpha$ is

$$\mathbb{P}\bigl[n^\alpha_c > 0 \,\big|\, n^{\alpha-1}_c = 0\bigr] = 1 - \bigl[1 - (1 - p)\, P^0_\alpha(c)\bigr]^{N}\,.$$

This expression gives the probability that at least one node in layer $\alpha$ has the label $c$, given that no node in layer $\alpha - 1$ has the label $c$. When $p = 0$, the probability that a label appears depends only on the null distribution and is given by $1 - \bigl[1 - P^0_\alpha(c)\bigr]^{N}$. Furthermore, larger values of $p$ decrease the probability that a label appears in layer $\alpha$.

With the exception of Section III.3, our discussion thus far holds for any choice of null distributions $\{P^0_\alpha\}$. In the next two paragraphs, we give two examples to illustrate some features of the categorical null distribution of Section III.3. In particular, we focus on the effect of the support of a categorical null distribution on a sampled multilayer partition. As we stated in Section III.3, the support of a categorical null distribution is given by $\mathcal{C}_\alpha = \{c : p^\alpha_c > 0\}$. An important feature of the support for a multilayer partition generated with the interlayer dependency matrix in Fig. 3(a) is that overlap between $\mathcal{C}_{\alpha-1}$ and $\mathcal{C}_\alpha$ (i.e., $\mathcal{C}_{\alpha-1} \cap \mathcal{C}_\alpha \neq \emptyset$) is a necessary condition for communities in layer $\alpha - 1$ to gain new members in layer $\alpha$.

Let $\mathbf{n}^\alpha$ denote the vector of expected induced community sizes in layer $\alpha$ (i.e., $n^\alpha_c = N p^\alpha_c$), and suppose that the probabilities are the same in each layer (i.e., $p^\alpha_c = p_c$ for all $\alpha$). The expected number of community labels is then the same for each layer, and the expected number of nodes with community label $c$ is also the same in each layer and is given by $N p_c$. This choice produces a temporal network in which nodes change community labels across layers in a way that preserves both the expected number of induced communities in a layer and the expected sizes of induced communities in a layer.

Now suppose that one chooses the probabilities $p^\alpha_c$ such that the supports are nonoverlapping (i.e., $\mathcal{C}_\alpha \cap \mathcal{C}_\beta = \emptyset$ for all $\alpha \neq \beta$). At each iteration of Step C in Algorithm 1, an existing community label can then only lose members; and with nonzero probability, at least one new label appears in every subsequent layer. For this case, one expects some members of a community $c$ in layer $\alpha - 1$ to remain in community $c$ in layer $\alpha$ (via copying) and the remaining members to be assigned to new communities (because the supports are nonoverlapping) in layer $\alpha$. This choice thus produces multilayer partitions in which the expected number of new community labels per layer is nonzero (unless $p = 1$) and the expected size of a given induced community decreases in time.

## IV Sampling network edges

Having generated a multilayer partition , we want to sample multilayer networks in a way that reflects the desired mesoscale structure. We assume that all interdependencies between the different layers of the multilayer network are a result of dependencies between the partitions induced on the different layers. Therefore, we can generate edges independently for each layer. In Section VI, we point out how one can generalize our network generation process to include dependencies between layers beyond those induced by planted mesoscale structures. As we mentioned in Section I, for the purpose of the numerical examples in Section V, we generate multilayer networks without interlayer edges and with intralayer edges that reflect planted community structure in each layer. We consider multilayer networks without interlayer edges and with planted community structure (rather than another type of mesoscale structure) as an illustrative example, because it is the most commonly studied case. However, the multilayer SBM that we discuss in this section can be used to generate not only intralayer edges but also interlayer ones, and it can also be used to generate mesoscale structures other than the assortative structures of “communities”.

The generative network model that we discuss in this section is a generalization to multilayer networks of the degree-corrected SBM (DCSBM) Karrer and Newman (2011). In other words, it is a multilayer DCSBM (M-DCSBM). The parameters of a general, directed M-DCSBM are a multilayer partition (which determines the assignment of state nodes to communities), a block tensor (which determines the expected number of edges between communities in different layers), and a set of state-node parameters (which determine the allocation of edges to state nodes within communities).

The probability of observing an edge (or the expected number of edges if we allow multi-edges) from state node $(i,\alpha)$ to state node $(j,\beta)$ with community assignments $s_{i\alpha}$ and $s_{j\beta}$ in an M-DCSBM is

$$\mathbb{E}\bigl[A^{i\alpha}_{j\beta}\bigr] = \omega_{s_{i\alpha}\alpha,\, s_{j\beta}\beta}\; \sigma^{\mathrm{out}}_{i\alpha}\, \sigma^{\mathrm{in}}_{j\beta}\,, \qquad (8)$$

where $\omega_{c\alpha, d\beta}$ is the expected number of edges from state nodes in layer $\alpha$ and community $c$ to state nodes in layer $\beta$ and community $d$; the quantity $\sigma^{\mathrm{out}}_{i\alpha}$ is the probability for a random edge starting in community $s_{i\alpha}$ in layer $\alpha$ and ending in layer $\beta$ to be attached to state node $(i,\alpha)$ (note that the dependence on $s_{i\alpha}$ is implicit in $\sigma^{\mathrm{out}}_{i\alpha}$); and $\sigma^{\mathrm{in}}_{j\beta}$ is the probability for an edge starting in layer $\alpha$ and ending in community $s_{j\beta}$ in layer $\beta$ to be attached to state node $(j,\beta)$ (note that the dependence on $s_{j\beta}$ is implicit in $\sigma^{\mathrm{in}}_{j\beta}$). For an undirected M-DCSBM, both the block tensor and the state-node parameters are symmetric. That is, $\omega_{c\alpha, d\beta} = \omega_{d\beta, c\alpha}$ and $\sigma^{\mathrm{in}}_{i\alpha} = \sigma^{\mathrm{out}}_{i\alpha}$.

The above M-DCSBM can generate multilayer networks with arbitrary expected layer-specific in-degrees and out-degrees for each state node. (Note that the DCSBM Karrer and Newman (2011) can generate monolayer networks with arbitrary expected degrees.) Given a multilayer network with adjacency tensor $\mathcal{A}$, the layer-$\beta$-specific in-degree of state node $(i,\alpha)$ is

$$k^{\mathrm{in},\beta}_{i\alpha} = \sum_{j} A^{j\beta}_{i\alpha}\,,$$

and the layer-$\beta$-specific out-degree of state node $(i,\alpha)$ is

$$k^{\mathrm{out},\beta}_{i\alpha} = \sum_{j} A^{i\alpha}_{j\beta}\,.$$

(Note that the layer-$\alpha$-specific in-degree and out-degree of state node $(i,\alpha)$ are the "intralayer in-degree" and "intralayer out-degree" of $(i,\alpha)$.) For an undirected multilayer network, layer-$\beta$-specific in-degrees and out-degrees are equal (i.e., $k^{\mathrm{in},\beta}_{i\alpha} = k^{\mathrm{out},\beta}_{i\alpha}$). We refer to their common value as the "layer-$\beta$-specific degree" of a state node. For an ensemble of networks generated from an M-DCSBM, the associated means are

$$\bigl\langle k^{\mathrm{in},\beta}_{i\alpha} \bigr\rangle = \sigma^{\mathrm{in}}_{i\alpha} \sum_{d} \omega_{d\beta,\, s_{i\alpha}\alpha} \qquad (9)$$

and

$$\bigl\langle k^{\mathrm{out},\beta}_{i\alpha} \bigr\rangle = \sigma^{\mathrm{out}}_{i\alpha} \sum_{d} \omega_{s_{i\alpha}\alpha,\, d\beta}\,. \qquad (10)$$

For the experiments in Section V, we use a model that is a slight variant (avoiding the creation of self-loops and multi-edges) of the DCSBM benchmark suggested in Karrer and Newman (2011). As we mentioned earlier, we consider only undirected multilayer networks with only intralayer edges for our experiments. The block tensor thus does not have any interlayer contributions (i.e., $\omega_{c\alpha, d\beta} = 0$ if $\alpha \neq \beta$). Furthermore, we can reduce the number of node parameters that we need to specify the M-DCSBM to a single parameter $\sigma_{i\alpha}$ for each state node $(i,\alpha)$. For this case, the M-DCSBM reduces to using independent monolayer DCSBMs for each layer. In Fig. 7, we describe the main stages for generating edges for a given multilayer partition with an arbitrary choice of monolayer network model with a planted partition.

We parametrize the DCSBM benchmark in terms of its distribution of expected degrees and a community-mixing parameter $\mu \in [0, 1]$ that controls the strength of the community structure in the benchmark. For $\mu = 0$, all edges lie within communities; for $\mu = 1$, edges are distributed independently of the communities, and the probability of observing an edge between two state nodes in the same layer depends only on the expected degrees of those two state nodes. We choose a truncated power law as the distribution for expected degrees.

We sample the expected intralayer degrees $k_{i\alpha}$ for the state nodes from a truncated power law with exponent $\eta$, minimum cutoff $k_{\min}$, and maximum cutoff $k_{\max}$. (A power law with cutoffs $k_{\min}$ and $k_{\max}$ and exponent $\eta$ is a continuous probability distribution with probability density function $f(k) = k^{-\eta} / Z$ for $k \in [k_{\min}, k_{\max}]$ and $f(k) = 0$ otherwise, where $Z = \int_{k_{\min}}^{k_{\max}} k^{-\eta}\, \mathrm{d}k$ is the normalization constant.) We then construct the block tensor and the state-node parameters for the M-DCSBM from the sampled expected degrees and the community assignments $s_{i\alpha}$. Let

$$\kappa_{c\alpha} = \sum_{i\,:\, s_{i\alpha} = c} k_{i\alpha}$$

be the expected degree of community $c$ in layer $\alpha$, and let

$$m_\alpha = \frac{1}{2} \sum_{i} k_{i\alpha}$$

be the expected number of edges in layer $\alpha$. Consequently,

$$\sigma_{i\alpha} = \frac{k_{i\alpha}}{\kappa_{s_{i\alpha}\alpha}}$$

is the probability for an intralayer edge in layer $\alpha$ to be attached to the particular state node $(i,\alpha)$, given that the edge is attached to a state node in layer $\alpha$ and is attached to community $s_{i\alpha}$.
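The expected intralayer degrees can be drawn from the truncated power law via inverse-transform sampling. The following sketch uses our own function and parameter names (`eta`, `k_min`, `k_max`), under the density proportional to $k^{-\eta}$ on the stated interval:

```python
import random

def truncated_power_law(eta, k_min, k_max, rng=random):
    """Inverse-CDF sample from a continuous power law with density
    proportional to k**(-eta) on [k_min, k_max] (requires eta != 1)."""
    u = rng.random()
    a = k_min ** (1.0 - eta)
    b = k_max ** (1.0 - eta)
    # the CDF is F(k) = (k**(1-eta) - a) / (b - a); invert it at u
    return (a + u * (b - a)) ** (1.0 / (1.0 - eta))
```

Because the density decays as a power of `k`, most sampled expected degrees fall near the lower cutoff, which produces the heavy-tailed degree heterogeneity that the degree-corrected model is designed to accommodate.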

The elements

$$\omega_{c\alpha, d\alpha} = (1 - \mu)\, \delta_{cd}\, \kappa_{c\alpha} + \mu\, \frac{\kappa_{c\alpha}\, \kappa_{d\alpha}}{2 m_\alpha}$$

of the block tensor give, for $c \neq d$, the expected number of edges between state nodes in community $c$ in layer $\alpha$ and state nodes in community $d$ in layer $\alpha$. For $c = d$, the block-tensor element instead gives twice the expected number of edges. One way to think of the DCSBM benchmark is that we categorize each edge that we want to sample as an intracommunity edge with probability $1 - \mu$ or as a "random edge" (i.e., an edge that can be either an intracommunity edge or an intercommunity edge) with probability $\mu$. To sample an edge, we sample two state nodes (which we then join by an edge). We call these two state nodes the "end points" of the edge. The two end points of an intracommunity edge are sampled with a frequency that is proportional to the expected degrees of their associated state nodes, conditional on the end points being in the same community. By contrast, the two end points of a random edge are sampled with a frequency proportional to the expected degrees of their associated state nodes (without conditioning on anything). We assume that the total number of edges in a layer is sampled from a Poisson distribution with mean $m_\alpha$. (The choice of a Poisson distribution for the number of edges ensures that this approach for sampling edges is approximately consistent with sampling edges independently from Bernoulli distributions with success probabilities given by Eq. (8). The exact distribution for the number of edges is a Poisson–binomial distribution, from which it is difficult to sample; a Poisson–binomial distribution is well-approximated by a Poisson distribution if none of the individual edge probabilities are too large.) One can easily extend this model to generate interlayer edges. In the most general case of a directed multilayer network with interlayer edges, one would sample expected layer-$\beta$-specific in-degrees and out-degrees for each state node and layer from appropriate distributions.
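The edge-categorization scheme described above can be sketched as follows. This is a simplified, illustrative implementation for a single layer (our own names): it fixes the number of edges rather than drawing it from a Poisson distribution, and it rejects self-loops and multi-edges as in the variant used for the experiments.

```python
import random

def sample_layer_edges(labels, expected_degrees, mu, num_edges, rng=random):
    """Sample undirected intralayer edges, DCSBM-benchmark style.

    Each edge is an intracommunity edge with probability 1 - mu and a
    'random' edge with probability mu; end points are chosen with
    probability proportional to expected degree. Self-loops and
    multi-edges are rejected.
    """
    nodes = list(range(len(labels)))
    weights = list(expected_degrees)
    edges = []
    while len(edges) < num_edges:
        i = rng.choices(nodes, weights=weights)[0]
        if rng.random() < 1.0 - mu:
            # intracommunity edge: second end point is degree-proportional
            # within the first end point's community
            same = [v for v in nodes if labels[v] == labels[i]]
            j = rng.choices(same, weights=[weights[v] for v in same])[0]
        else:
            # random edge: second end point is degree-proportional overall
            j = rng.choices(nodes, weights=weights)[0]
        edge = (min(i, j), max(i, j))
        if i != j and edge not in set(edges):
            edges.append(edge)
    return edges
```

Running this independently for each layer, with the community labels taken from a sampled multilayer partition, yields a multilayer network whose only interlayer dependencies are those inherited from the planted partition.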
Given expected layer-specific degrees and a multilayer partition $S$, one then constructs the block tensor and the state-node parameters for the M-DCSBM analogously to the special case described in the main text. Let

$$\kappa^{\mathrm{in},\beta}_{c\alpha} = \sum_{i\,:\, s_{i\alpha} = c} k^{\mathrm{in},\beta}_{i\alpha}\,, \qquad \kappa^{\mathrm{out},\beta}_{c\alpha} = \sum_{i\,:\, s_{i\alpha} = c} k^{\mathrm{out},\beta}_{i\alpha}$$

be the expected layer-$\beta$-specific in-degree and out-degree of community $c$ in layer $\alpha$, and let

$$m_{\alpha\beta} = \sum_{i} k^{\mathrm{out},\beta}_{i\alpha} = \sum_{j} k^{\mathrm{in},\alpha}_{j\beta}$$

be the expected number of edges from layer $\alpha$ to layer $\beta$ (note the consistency constraint on expected in-degrees and out-degrees). It then follows that