Disentangling group and link persistence in dynamic stochastic block models
Abstract.
We study the inference of a model of dynamic networks in which both communities and links keep memory of previous network states. By considering maximum likelihood inference from single snapshot observations of the network, we show that link persistence makes the inference of communities harder, increasing the detectability threshold, while community persistence tends to make it easier. We analytically show that communities inferred from a single network snapshot can share a maximum overlap with the underlying communities of a specific previous instant in time. This leads to time-lagged inference: the identification of past communities rather than present ones. Finally, we compute the time lag and propose a corrected algorithm, the Lagged Snapshot Dynamic (LSD) algorithm, for community detection in dynamic networks. We analytically and numerically characterize the detectability transitions of this algorithm as a function of the memory parameters of the model.
Community detection in time-evolving interacting systems is an open problem in data mining. Temporal networks [1] provide a framework to study the dynamic evolution of interacting systems, and can be regarded as a sequence of network snapshots. In this paper we study the problem of learning the dynamic evolution of the community structure of a temporal network with link and community persistence. Community detection is a long-standing problem that has been thoroughly studied in the static network case with various approaches: modularity maximization [2], spectral methods [3, 4], belief propagation [5], and other heuristic algorithms [6].
For analytical tractability, we focus on stochastic block models with dynamic community structure and link persistence, which introduce time correlations in the network structure. When time correlations are present, the information obtained from the inference on individual snapshots might be contaminated by the past history of the system. This is analogous to what happens in multilayer networks [7], for which the analysis cannot be decomposed into the separate analysis over each layer if they are correlated.
Static stochastic block models have been shown to display a detectability transition [5, 8] when the ratio between the average degree within a block of nodes and the average degree towards different blocks, i.e. the assortativity parameter, becomes too low: below a critical value of assortativity, detection becomes computationally hard.
Recently the problem was also investigated in temporal networks [9, 10, 11, 12] and in a specific case of Markovian community structures [13]. In this dynamic network model, it was shown that persistence in communities can help detection by decreasing the detectability threshold: a weaker assortativity is required to infer communities with respect to the static case. On the contrary, we show that persistence in relations can hinder detection, eventually causing the detection of old communities instead of the ones present at the time the detection is performed. We compute analytically the time lag in community detection and provide a first dynamic community detection algorithm for the model under study. The method is built upon optimal static algorithms applied to individual snapshots, combined with our analytic result to correct for the time lag.
The paper is divided into three sections: in Section 1 we define the dynamic stochastic block model where both communities and links are persistent in time. In Section 2 we study single snapshot inference and we show how link persistence leads to time-lagged inference, that is, the detection of past communities rather than present ones. In Section 3 we introduce the Lagged Snapshot Dynamic (LSD) algorithm, which corrects static detection algorithms for the time lag. Finally, we discuss the need for correcting snapshot algorithms of community detection for time-lagged inference, and suggest new directions of research in Bayesian inference for temporal networks.
1. Definition of the model
We consider a Dynamic Stochastic Block Model (DSBM) with link persistence, i.e. at each time step the presence of a link between two nodes is copied from the previous time with probability $\xi$, while with probability $1-\xi$ the link is generated according to a SBM whose community structure changes over time. Several models of DSBM were previously introduced for community detection in dynamic networks [14, 15, 16, 13]. Our variant includes both link and community persistence. The SBM is a classical generative model for static networks with community structure, where a network with $N$ nodes and adjacency matrix $A$ is generated as follows. According to a prior $\{p_a\}_{a=1,\dots,q}$ over the $q$ possible choices, each node $i$ is assigned to a community $g_i$ with probability $p_{g_i}$. Edges are then generated according to an affinity matrix $p_{ab}$ and the community structure $\{g_i\}$: each couple of nodes $(i,j)$ is linked independently with probability $p_{g_i g_j}$.
In the DSBM the community structure changes over time. The model consists of a sequence of networks $\{A^t\}_{t=1,\dots,T}$, each with its own community structure $g^t = \{g_i^t\}$. We will indicate with $\mathbf{A} = \{A^1, \dots, A^T\}$ the sequence of observed adjacency matrices and with $\mathbf{g} = \{g^1, \dots, g^T\}$ the sequence of community structures. As in [13], the dynamics of each node's assignment is an independent Markov process with transition probability $P(g_i^t \mid g_i^{t-1}) = \eta\,\delta_{g_i^t, g_i^{t-1}} + (1-\eta)\,p_{g_i^t}$, meaning that with probability $\eta$ a node remains in the same community, otherwise it changes randomly to a group $a$ (including the current one) with probability $p_a$. Since at $t=1$ labels are assigned according to the prior, it is
(1.1) $P(\mathbf{g}) = \prod_{i=1}^{N} p_{g_i^1} \prod_{t=2}^{T} \prod_{i=1}^{N} \left[ \eta\,\delta_{g_i^t, g_i^{t-1}} + (1-\eta)\,p_{g_i^t} \right]$
Adding link persistence to the DSBM we obtain the persistent dynamic model, see the flow in Fig. 1,

(1.2) $P(\mathbf{A} \mid \mathbf{g}) = \prod_{i<j} P(A_{ij}^1 \mid g^1) \prod_{t=2}^{T} \left[ \xi\,\delta_{A_{ij}^t, A_{ij}^{t-1}} + (1-\xi)\, p_{g_i^t g_j^t}^{A_{ij}^t} \left(1 - p_{g_i^t g_j^t}\right)^{1 - A_{ij}^t} \right]$

where the network at $t=1$ is generated according to a static SBM from $g^1$. Thus the two parameters $\eta$ and $\xi$ can be interpreted as, respectively, the persistence of communities and the persistence of links. Community persistence models the tendency of nodes to remain in the same group over time. Link persistence models the preference of nodes for keeping pre-existing relations over time, for example because of the cost of adding or removing links in socio-economic networks [17].
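The generative process described above can be sketched in a few lines. The following snippet is an illustrative implementation under our notational assumptions ($\eta$ for group persistence, $\xi$ for link persistence, planted partition affinity as parametrized below); function and parameter names are ours, not part of the model specification.

```python
import numpy as np

def generate_dsbm(N, q, T, c, eps, eta, xi, seed=None):
    """Sample group sequences and adjacency matrices from the persistent DSBM:
    labels follow independent Markov chains with persistence eta (Eq. 1.1);
    each link is copied from the previous snapshot with probability xi,
    otherwise redrawn from the planted partition SBM (Eq. 1.2)."""
    rng = np.random.default_rng(seed)
    # affinity matrix p_ab = (c/N) [q eps delta_ab + (1 - eps)]
    p = (c / N) * (q * eps * np.eye(q) + (1.0 - eps))
    groups, nets = [], []
    g = rng.integers(q, size=N)                 # t = 1: labels from uniform prior
    for t in range(T):
        if t > 0:
            move = rng.random(N) > eta          # nodes that resample their label
            g = np.where(move, rng.integers(q, size=N), g)
        A = rng.random((N, N)) < p[np.ix_(g, g)]
        A = np.triu(A, 1)
        A = (A | A.T).astype(int)               # undirected, no self-loops
        if t > 0:
            keep = np.triu(rng.random((N, N)) < xi, 1)
            keep = keep | keep.T                # symmetric persistence mask
            A = np.where(keep, nets[-1], A)
        groups.append(g.copy())
        nets.append(A)
    return groups, nets
```

Setting $\eta = 1$ freezes the labels and setting $\xi = 1$ freezes the links, which is a quick sanity check on the two memory channels of the model.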
Here we focus on the common choices of a uniform prior, i.e. $p_a = 1/q$, and of an affinity matrix with a constant on the diagonal and another constant off the diagonal, the so-called assortative planted partition model, widely used as a benchmark in the mathematics and computer science community detection literature [5, 4, 23, 24]. Moreover we measure the level of assortativity with a parameter $\epsilon \in [0, 1]$ such that
(1.3) $p_{ab} = \frac{c}{N}\left[ q\,\epsilon\,\delta_{ab} + (1 - \epsilon) \right]$
interpolating between a fully assortative ($\epsilon = 1$, proportional to the identity matrix) and a fully random ($\epsilon = 0$, proportional to a matrix of ones) affinity matrix, with fixed mean degree $c$. We are interested in the sparse regime $c = O(1)$, which is the most challenging from the inference perspective, since most real networks of interest are sparse and because sparsity allows one to carry out an asymptotically optimal analysis.
The central problem is to study under which conditions we can detect, better than chance, the correct labeling of the latent communities from the observation of $\mathbf{A}$, together with the most probable model parameters. For the static SBM, it was shown (and proved at least for $q = 2$ [18]) that there exists a sharp threshold below which no algorithm can perform better than chance in recovering the planted community structure. This threshold occurs, in terms of the parametrization (1.3), at $\epsilon_s = 1/\sqrt{c}$, meaning that there is a necessary minimum signal to noise ratio, in terms of assortativity, under which a community structure may still exist but is undetectable. The Bayesian inference approach considers the posterior distribution of the latent assignments
(1.4) $P(\mathbf{g} \mid \mathbf{A}, \theta) = \frac{P(\mathbf{A}, \mathbf{g} \mid \theta)}{\sum_{\mathbf{g}'} P(\mathbf{A}, \mathbf{g}' \mid \theta)}$
where we have defined $\theta = \{q, \{p_a\}, c, \epsilon, \eta, \xi\}$, for inferring a set of statistically significant communities, and the posterior distribution over the model parameters
(1.5) $P(\theta \mid \mathbf{A}) = \frac{P(\theta)}{P(\mathbf{A})} \sum_{\mathbf{g}} P(\mathbf{A}, \mathbf{g} \mid \theta)$
to learn the most likely set of parameters given the data. Using smooth priors $P(\theta)$, the estimate $\hat\theta$ is obtained by maximizing the likelihood with respect to $\theta$, i.e. by solving the equations
(1.6) $\frac{\partial}{\partial \theta} \log \sum_{\mathbf{g}} P(\mathbf{A}, \mathbf{g} \mid \theta) = \left\langle \frac{\partial}{\partial \theta} \log P(\mathbf{A}, \mathbf{g} \mid \theta) \right\rangle_{P(\mathbf{g} \mid \mathbf{A}, \theta)} = 0$
Since the maximization of the likelihood (1.5) requires computing expectations w.r.t. the posterior (1.4), this is called the Expectation-Maximization (EM) procedure [19]. The critical point of this approach is the summation over all possible assignments, whose number grows exponentially with $N$. This problem is usually overcome by Monte Carlo (MC) sampling [20] or by belief propagation (BP) algorithms [5, 21]. Both provide an estimate of the posterior in terms of its marginals $\nu_i(g_i)$. From them, a partition is obtained by assigning each node to its most likely group, $\hat g_i = \mathrm{argmax}_a\, \nu_i(a)$.
This is known [22] to be an optimal estimator, maximising the overlap with the planted assignment
(1.7) $Q(\{g_i\}, \{\hat g_i\}) = \max_{\pi} \frac{\frac{1}{N} \sum_i \delta_{g_i, \pi(\hat g_i)} - \frac{1}{q}}{1 - \frac{1}{q}}$
where $\pi$ ranges over the permutations of the $q$ group labels and the normalization is chosen to ensure $Q = 0$ if labels are assigned randomly.
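As an illustration, the overlap above can be computed by brute force over the $q!$ label permutations (feasible for the small $q$ used throughout); this is a sketch under our naming and the uniform-prior normalization, not the authors' code.

```python
from itertools import permutations

import numpy as np

def overlap(g_true, g_hat, q):
    """Permutation-maximized overlap of Eq. (1.7) for a uniform prior:
    0 for random guessing (on average), 1 for perfect recovery up to a
    global relabeling of the q groups."""
    g_true = np.asarray(g_true)
    g_hat = np.asarray(g_hat)
    agree = max(np.mean(g_true == np.asarray(pi)[g_hat])
                for pi in permutations(range(q)))
    return (agree - 1.0 / q) / (1.0 - 1.0 / q)
```

For example, a labeling that is the exact complement of the planted one in a two-group model still has overlap 1, since it differs only by a global swap of the labels.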
In a static network generated by a static SBM, the EM procedure described above provides a set of inferred assignments $\{\hat g_i\}$ together with an estimate $\hat c_{ab}$ of the affinity matrix, obtained using the static posterior corresponding to $T = 1$ and solving iteratively the equation
(1.8) $\hat c_{ab} = \frac{1}{N \hat p_a \hat p_b} \sum_{(i,j) \in E} \nu_{ij}(a, b)$

where $E$ is the set of observed edges and $\nu_{ij}(a,b)$ is the posterior joint marginal of the labels of the endpoints of edge $(i,j)$.
This is the equivalent of equation (1.6), obtained by deriving w.r.t. $c_{ab}$. The value of $\hat\epsilon$ is then obtained by fitting the inferred affinity matrix with Eq. (1.3). Finally, throughout the paper we will use the static EM procedure introduced in [5], where a BP algorithm is used for the expectation step, i.e. the estimate of the posterior marginals.
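To make the M-step concrete, here is a point-estimate version of the update (1.8), in which the posterior edge marginals are replaced by hard assignments, so that $\hat c_{ab}$ reduces to counting edges between groups; the full algorithm of [5] uses BP two-point marginals instead. The simplification and the naming are ours.

```python
import numpy as np

def affinity_mstep(A, g, q):
    """Hard-assignment version of the M-step (1.8): estimate c_ab from the
    number m_ab of edges between groups a and b, as c_ab = m_ab / (N n_a n_b)
    off the diagonal and c_aa = 2 m_aa / (N n_a^2) on it, where n_a is the
    fraction of nodes in group a (so that p_ab = c_ab / N)."""
    A = np.asarray(A)
    g = np.asarray(g)
    N = len(g)
    n = np.array([(g == a).mean() for a in range(q)])
    c = np.zeros((q, q))
    for a in range(q):
        for b in range(a, q):
            m = A[np.ix_(g == a, g == b)].sum()
            if a == b:
                # the symmetric submatrix double counts each within-group edge,
                # which is exactly the factor 2 of the diagonal estimator
                c[a, a] = m / (N * n[a] ** 2)
            else:
                c[a, b] = c[b, a] = m / (N * n[a] * n[b])
    return c
```

On a toy graph of four nodes split into two groups, with one edge inside each group and one across, this returns $\hat c_{aa} = 2$ and $\hat c_{ab} = 1$.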
2. Single snapshot inference
The inference for the full dynamical model is complicated by the presence of both link and community persistence. Here we first ask which community structure is inferred from a single snapshot $A^t$ of the dynamic network at time $t$. This might occur, for example, if one is unaware that $A^t$ is one observation of a dynamic process. Thus we need to compute the posterior $P(g^t \mid A^t)$, giving the probability of the community structure $g^t$ when only the information on the network at time $t$ is used. It holds that the posterior is that of a static SBM with an effective assortativity
(2.1) $\epsilon_{\mathrm{eff}}(t) = \epsilon \left[ (1-\xi)\, \frac{1 - (\xi \eta^2)^{t-1}}{1 - \xi \eta^2} + (\xi \eta^2)^{t-1} \right]$
In fact it is sufficient to note that, from Bayes' rule, $P(g^t \mid A^t) \propto P(A^t \mid g^t)$, which can always be written as
(2.2) $P(A^t \mid g^t) = \prod_{i<j} \left( \tilde p^{(t)}_{g_i^t g_j^t} \right)^{A_{ij}^t} \left( 1 - \tilde p^{(t)}_{g_i^t g_j^t} \right)^{1 - A_{ij}^t}$
with $\tilde p^{(t)}_{ab} = P(A_{ij}^t = 1 \mid g_i^t = a,\, g_j^t = b)$. Marginalising over previous network instances we get the recursive equation
(2.3) $\tilde p^{(t)}_{ab} = (1-\xi)\, p_{ab} + \xi\, P(A_{ij}^{t-1} = 1 \mid g_i^t = a, g_j^t = b) = (1-\xi)\, p_{ab} + \xi \sum_{a', b'} P(g_i^{t-1} = a', g_j^{t-1} = b' \mid g_i^t = a, g_j^t = b)\; \tilde p^{(t-1)}_{a' b'}$

where in the first equality we have conditioned and summed over $A_{ij}^{t-1}$, while in the second over $(g_i^{t-1}, g_j^{t-1})$, and where we used that $P(g_i^{t-1} = a' \mid g_i^t = a) = \eta\,\delta_{a' a} + (1-\eta)\, p_{a'}$ and that $\tilde p^{(s)}_{ab}$ keeps the form (1.3) with an effective assortativity $\epsilon_s$ for every $s$, which can be proved recursively. Since $\tilde p^{(1)}_{ab}$ is simply $p_{ab}$, we get

$\epsilon_t = (1-\xi)\,\epsilon + \xi\,\eta^2\,\epsilon_{t-1}, \qquad \epsilon_1 = \epsilon,$

that gives Eq. (2.1) once the representation (1.3) is used.
Eq. (2.1) states that the posterior of a single snapshot of a DSBM is equal to the posterior of a static SBM with a modified assortativity parameter. It is important to stress that it does not imply that a single snapshot inference gives the planted assignments with a modified assortativity parameter. Instead it states that, if the inferred assignments are the planted ones, then the estimated assortativity is the one of Eq. (2.1), i.e. $\epsilon_{\mathrm{eff}} \le \epsilon$, smaller than the value of the model. This happens because link persistence decreases the effective assortative structure of the network, increasing the number of links assigned randomly with respect to those assigned on the basis of their group labels. This effect is partially mitigated by the persistence of communities, since it increases the probability that a link copied from a previous time is not actually random but was in turn assigned through the same community structure.
One of the consequences of Eq. (2.1) is that the signal provided by the observation of $A^t$ on the community structure at the same time decreases by the effect of the dynamics: for $t \to \infty$, it is $\epsilon_{\mathrm{eff}} = \epsilon\,(1-\xi)/(1-\xi\eta^2)$, reducing to the static value in the absence of link persistence ($\xi = 0$).^1 Figure 2 shows the asymptotic phase space as a function of $(\eta, \xi)$, where we have defined, in the same spirit of the static case, a detectability line as $\epsilon_{\mathrm{eff}} = 1/\sqrt{c}$.

^1 Note that the detectability threshold from a single snapshot is however higher than the threshold of the full dynamic problem, i.e. the inference of all the assignments given the observation of the entire network series. For example, [13] considers a DSBM without link persistence and shows that the detectability threshold is in general lowered by community persistence.
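Numerically, the effective assortativity can be obtained by iterating the recursion $\epsilon_t = (1-\xi)\epsilon + \xi\eta^2\epsilon_{t-1}$ underlying Eq. (2.1); the sketch below (our naming) also illustrates the two limits quoted in the text.

```python
def eps_eff(eps, xi, eta, t):
    """Effective single-snapshot assortativity at time t, from the recursion
    eps_s = (1 - xi) * eps + xi * eta**2 * eps_{s-1}, with eps_1 = eps."""
    e = eps
    for _ in range(t - 1):
        e = (1.0 - xi) * eps + xi * eta ** 2 * e
    return e
```

For $\xi = 0$ the output equals $\epsilon$ at every $t$, while for large $t$ it converges to the asymptotic value $\epsilon(1-\xi)/(1-\xi\eta^2)$.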
Figure 3 compares the theoretical predictions of $\epsilon_{\mathrm{eff}}$ with numerical simulations and BP inference of a DSBM in different regions of the parameter space. In panel (a) the agreement is very good, and this holds also in the other panels in the regions where $\xi$ is small. However, panels (b) and (c) show that when both $\xi$ and $\eta$ are large, some discrepancies between the theoretical curve and the simulations appear. This does not necessarily contradict Eq. (2.1), which gives the assortativity parameter only if the inferred assignments are the planted ones (or at least close to them). We now show that the observed discrepancies can indeed be explained by the fact that, for large persistences, the inferred assignments are closer to a past planted assignment than to the one at the time when the single snapshot inference is performed.
Given a network sequence of length $T$ generated with parameters $(q, c, \epsilon, \eta, \xi)$, we call time-lagged inference the problem of inferring the communities at time $t - \tau$ given the observation of the network at time $t$. It holds that the posterior $P(g^{t-\tau} \mid A^t)$ is that of a static SBM with an effective assortativity
(2.4) $\epsilon_\tau = \epsilon\,(1-\xi) \left[ \frac{\xi^{\tau+1} - \eta^{2(\tau+1)}}{\xi - \eta^2} + \frac{\xi^{\tau+1}\,\eta^2}{1 - \xi\eta^2} \right]$
for large $t$, where $\epsilon_0$ coincides with the $t \to \infty$ limit of Eq. (2.1). In fact, as for Eq. (2.3), it is sufficient to compute the quantity $\tilde p^{(t,\tau)}_{ab} = P(A_{ij}^t = 1 \mid g_i^{t-\tau} = a,\, g_j^{t-\tau} = b)$, evaluated at large $t$. The link $A_{ij}^t$ was last generated from the SBM at time $t - s$ with probability $(1-\xi)\,\xi^s$. For $s \le \tau$, keeping $t - \tau$ fixed and using the representation (1.3), the labels evolve $\tau - s$ steps between $t - s$ and $t - \tau$, so these terms contribute to the effective assortativity

(2.5) $\epsilon\,(1-\xi) \sum_{s=0}^{\tau} \xi^s\, \eta^{2(\tau - s)} = \epsilon\,(1-\xi)\, \frac{\xi^{\tau+1} - \eta^{2(\tau+1)}}{\xi - \eta^2}$

Moreover, defining $s - \tau$ the backward distance from $t - \tau$, for $s > \tau$ the link was generated before $t - \tau$ and the labels evolve $s - \tau$ steps, contributing

(2.6) $\epsilon\,(1-\xi) \sum_{s=\tau+1}^{\infty} \xi^s\, \eta^{2(s - \tau)} = \epsilon\,(1-\xi)\, \frac{\xi^{\tau+1}\,\eta^2}{1 - \xi\eta^2}$
Summing (2.5) and (2.6), i.e. solving the recursive equation as before, we get

$\epsilon_\tau = \epsilon\,(1-\xi) \sum_{s=0}^{\infty} \xi^s\, \eta^{2|\tau - s|}$

Since $\epsilon_0$ corresponds to the non-lagged $\epsilon_{\mathrm{eff}}$ in Eq. (2.1), we get Eq. (2.4) simply using the representation (1.3).
The meaning of Eq. (2.4) is that every lagged inference problem has the posterior of a static SBM with effective assortativity $\epsilon_\tau$. Thus, fixing $t$ and varying $\tau$, we have a sequence of inference problems with the same form of the posterior, the same input data $A^t$, and only a different effective assortativity, thus a different detectability threshold. Fig. 4 shows the overlap of Eq. (1.7) between the inferred communities and the planted ones at time $t - \tau$. For small $\xi$ the maximum overlap is at $\tau = 0$, while for larger $\xi$ we observe a series of transitions where the largest overlap is attained at some $\tau > 0$. We now show that the $\tau$ that maximizes the overlap is the one for which the effective assortativity is maximal. To this end we define
(2.7) $\tau^*(\xi, \eta) = \underset{\tau \ge 0}{\mathrm{argmax}}\; \epsilon_\tau$
Panel (a) of Fig. 5 shows that for small link persistence $\xi$ it is $\tau^* = 0$, i.e. a single snapshot inference solves the problem at the time of the observed snapshot. Above a critical $\xi_c$, depending on $\eta$, it is $\tau^* > 0$, suggesting that the inference procedure converges to the assignments at time $t - \tau^*$. In fact the dashed lines in Fig. 4 are computed by solving the problem in Eq. (2.7) and it is clear that they correspond to the transitions in the overlap. Moreover the theoretical $\epsilon_{\tau^*}$ is shown in Fig. 3 to be in perfect agreement with the inferred assortativity.
To get more intuition, we note that for large $\tau$ (and $\xi > \eta^2$)

(2.8) $\epsilon_\tau \simeq \epsilon\,(1-\xi)\, \frac{1 - \eta^4}{(\xi - \eta^2)(1 - \xi\eta^2)}\; \xi^{\tau+1}$
Since $\epsilon_\tau \to 0$ as $\tau \to \infty$, the maximum of $\epsilon_\tau$ is not at $\tau = 0$ anymore as soon as $\epsilon_1 > \epsilon_0$, i.e.

(2.9) $\xi > \frac{1}{1 + \eta^2}$

(see panel (b) of Fig. 5).
For finite $t$, there is a finite size effect since the range of $\tau$ is bounded by $t - 1$. In this situation, for large $\xi$ and $\eta$, the maximum of $\epsilon_\tau$ is achieved at the extremum $\tau = t - 1$ (panels (a) and (b) of Fig. 5). Finally, panel (c) of Fig. 5 compares the corresponding regimes of $\tau^*$. The black squares indicate the two transitions, the first one from $\tau^* = 0$ to $\tau^* > 0$ (computed with Eq. 2.9) and the second when $\tau^* = t - 1$ due to the finite size effect. These correspond to the transitions observed in the empirical analysis of Fig. 3.
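The lagged assortativity and the optimal lag of Eqs. (2.4) and (2.7) are easy to evaluate numerically. The following sketch (our naming; the infinite series is truncated at a cutoff) reproduces the transition of Eq. (2.9): for $\eta = 0.9$ the critical link persistence is $1/(1+\eta^2) \approx 0.55$.

```python
def eps_lagged(eps, xi, eta, tau, cutoff=400):
    """Asymptotic lagged assortativity eps_tau of Eq. (2.4): a link last
    refreshed s steps in the past (probability (1 - xi) * xi**s) carries a
    group signal damped by eta**(2 * |tau - s|)."""
    return eps * (1.0 - xi) * sum(
        xi ** s * eta ** (2 * abs(tau - s)) for s in range(cutoff))

def optimal_lag(xi, eta, tau_max=50):
    """Lag tau* maximizing the effective assortativity, Eq. (2.7); eps is a
    common prefactor, so the argmax does not depend on it."""
    return max(range(tau_max), key=lambda tau: eps_lagged(1.0, xi, eta, tau))
```

Below the critical link persistence the maximum sits at $\tau^* = 0$ and naive snapshot inference targets the current communities; above it the maximum moves to a strictly positive lag.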
3. Lagged snapshot dynamic (LSD) algorithm
In this Section we propose a single snapshot algorithm for the inference of the optimal assignments together with a set of learned model parameters from the observation of a dynamic network. In Section 2 we showed how a naive single snapshot inference procedure, applied to a dynamic network with link and group persistence, introduces a systematic bias in the result. This bias takes the shape of a temporal lag: communities inferred at time $t$ share a maximum overlap with the planted communities at time $t - \tau^*$. This can also affect the goodness of the optimal parameters learned from data; for example, the measured effective assortativity parameter is systematically overestimated at high link persistence. For this reason we now propose a single snapshot algorithm able to detect, and thus correct, the possible presence of a temporal lag. Using only the observations of the time series $\{A^t\}$, we look for a set of inferred parameters $\hat\epsilon$, $\hat\eta$, $\hat\xi$ and group assignments $\{\hat g^t\}$ using the following scheme, whose details are presented below:

(1) for each snapshot $A^t$ we estimate the assortativity $\hat\epsilon_t$ and the assignments $\{\hat g_i^t\}$ using a static method (e.g. BP on the SBM);

(2) we estimate the link and group persistences $\hat\xi$ and $\hat\eta$ from the sequence of inferred assignments;

(3) we compute the optimal lag $\hat\tau$ to get an unbiased estimation of the assortativity parameter and the correct assignments at time $t$ by considering the inferred assignments at time $t + \hat\tau$.
We now detail the three phases of the LSD algorithm.
Single snapshot estimations. For each snapshot observation $A^t$ we perform the inference from a static SBM, as in [5]. The result is a set of assignments $\{\hat g_i^t\}$ and an effective assortativity $\hat\epsilon_t$. As shown in Section 2, the use of a static procedure introduces a bias in the result: $\hat\epsilon_t$ is a downward biased estimation of the assortativity parameter $\epsilon$, and the inferred sequence is an estimate of the planted assignment sequence shifted by a lag $\tau^*$, i.e. $\hat g^t \approx g^{t - \tau^*}$. Clearly, at this point $\tau^*$ is still unknown.
Estimation of the persistence parameters. The inference of the persistence parameters $\eta$ and $\xi$ is performed by maximizing the likelihood (1.5). Deriving the log-likelihood w.r.t. $\eta$ we get
(3.1) $\frac{\partial}{\partial \eta} \log \sum_{\mathbf{g}} P(\mathbf{A}, \mathbf{g} \mid \theta) = \left\langle \frac{\partial}{\partial \eta} \log P(\mathbf{g}) \right\rangle_{P(\mathbf{g} \mid \mathbf{A}, \theta)}$

(3.2) $= \left\langle \sum_{t=2}^{T} \sum_{i=1}^{N} \frac{\delta_{g_i^t, g_i^{t-1}} - p_{g_i^t}}{\eta\, \delta_{g_i^t, g_i^{t-1}} + (1-\eta)\, p_{g_i^t}} \right\rangle$

(3.3) $= N(T-1) \left\langle \frac{\left(1 - \frac{1}{q}\right) f(\mathbf{g})}{\eta + \frac{1-\eta}{q}} - \frac{1 - f(\mathbf{g})}{1 - \eta} \right\rangle$
where $f(\mathbf{g})$ denotes the empirical frequency of the event $\{g_i^t = g_i^{t-1}\}$ over space and time. The quantity inside the brackets in equation (3.3) is exactly what we would obtain by fitting a given observed assignment sequence with a Markov chain. The difference is that now it is averaged over the assignments' posterior. As a first approximation, assuming the posterior to be peaked around $\{\hat g^t\}$, the assignments inferred from the single snapshot procedure, we can simply find the solution of the polynomial equation

(3.4) $\frac{\left(1 - \frac{1}{q}\right) \hat f}{\eta + \frac{1-\eta}{q}} = \frac{1 - \hat f}{1 - \eta}, \qquad \text{i.e.} \quad \hat\eta = \frac{q \hat f - 1}{q - 1},$

where $\hat f = f(\{\hat g^t\})$.
Similarly, deriving the log-likelihood with respect to $\xi$, we get

(3.6) $\frac{\partial}{\partial \xi} \log \sum_{\mathbf{g}} P(\mathbf{A}, \mathbf{g} \mid \theta) = \left\langle \sum_{t=2}^{T} \sum_{i<j} \frac{\delta_{A_{ij}^t, A_{ij}^{t-1}} - Q_{ij}^t(\mathbf{g})}{\xi\, \delta_{A_{ij}^t, A_{ij}^{t-1}} + (1-\xi)\, Q_{ij}^t(\mathbf{g})} \right\rangle$
having the same structure of (3.3), averaged over the assignments' posterior, and where we have introduced the quantities

(3.7) $Q_{ij}^t(\mathbf{g}) = \left( p_{g_i^t g_j^t} \right)^{A_{ij}^t} \left( 1 - p_{g_i^t g_j^t} \right)^{1 - A_{ij}^t}$
Again, as soon as the posterior is concentrated around a set of inferred assignments $\{\hat g^t\}$, we can simply find the solution of the equation

(3.8) $\sum_{t=2}^{T} \sum_{i<j} \frac{\delta_{A_{ij}^t, A_{ij}^{t-1}} - Q_{ij}^t(\hat{\mathbf{g}})}{\xi\, \delta_{A_{ij}^t, A_{ij}^{t-1}} + (1-\xi)\, Q_{ij}^t(\hat{\mathbf{g}})} = 0$
Note that as soon as we use the inferred assignments instead of the full posterior distribution, equations (3.4) and (3.8) are not coupled, thus $\hat\eta$ and $\hat\xi$ can be obtained independently. It is worth noticing that the presence of a temporal lag does not affect the result of learning the link and group persistences even if we use $\{\hat g^t\}$ instead of $\{g^t\}$. This is because asymptotically, at large $t$, the lag is constant, thus preserving the ordering, and the bias of the procedure can be considered as just a uniform shift of the inferred communities. At the same time, equations (3.4) and (3.8) work as soon as a sequence of consecutive assignments is considered. In the next subsection we numerically test this procedure to infer the persistence parameters.
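With hard assignments, the $\eta$ estimator of Eq. (3.4) is a simple counting exercise; the sketch below (our naming) implements it, while the $\xi$ equation (3.8) would require a one-dimensional numerical root search over the snapshots and is omitted.

```python
def estimate_eta(g_seq, q):
    """Point-estimate solution of Eq. (3.4): with hard assignments the
    maximum-likelihood group persistence is (q f - 1) / (q - 1), where f is
    the empirical frequency of a node keeping its label between consecutive
    snapshots (counted over all nodes and all T - 1 transitions)."""
    stay = total = 0
    for g_prev, g_next in zip(g_seq[:-1], g_seq[1:]):
        stay += sum(a == b for a, b in zip(g_prev, g_next))
        total += len(g_prev)
    f = stay / total
    return (q * f - 1.0) / (q - 1.0)
```

Note that sampling noise can push $\hat f$ slightly below $1/q$ and thus $\hat\eta$ below zero; in practice the estimate can be clipped to $[0, 1]$.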
Lagged inference. Starting from the estimates $\hat\eta$ and $\hat\xi$ we get an estimate of the asymptotic optimal lag as

(3.9) $\hat\tau = \underset{\tau \ge 0}{\mathrm{argmax}}\; \epsilon_\tau(\hat\xi, \hat\eta)$

with $\epsilon_\tau$ given by Eq. (2.4), from which we can shift back the inferred assignments and correct the effective learned assortativity to

(3.10) $\hat\epsilon = \frac{\hat\epsilon_t}{\Lambda_{\hat\tau}}, \qquad \Lambda_\tau = \frac{\epsilon_\tau}{\epsilon},$

where the damping factor $\Lambda_\tau$ depends only on $\hat\xi$ and $\hat\eta$.
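Step (3) can then be sketched as follows: from the learned persistences we locate the optimal lag by maximizing the damping factor $\Lambda_\tau = \epsilon_\tau/\epsilon$ (which does not depend on the unknown $\epsilon$) and divide it out of the measured assortativity, in the spirit of Eqs. (3.9)-(3.10). Naming and the series cutoff are our assumptions.

```python
def lsd_correct(eps_hat, xi_hat, eta_hat, tau_max=50, cutoff=400):
    """Return (tau_hat, corrected assortativity): tau_hat maximizes the
    damping factor Lambda_tau = eps_tau / eps (Eq. 3.9), and the measured
    assortativity is divided by Lambda at the optimum (Eq. 3.10)."""
    def lam(tau):  # Lambda_tau, a function of xi_hat and eta_hat only
        return (1.0 - xi_hat) * sum(
            xi_hat ** s * eta_hat ** (2 * abs(tau - s)) for s in range(cutoff))
    tau_hat = max(range(tau_max), key=lam)
    return tau_hat, eps_hat / lam(tau_hat)
```

The corrected assignments at time $t$ are then read off from the snapshot inference at time $t + \hat\tau$.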
3.1. Results
We perform extensive numerical simulations to test the effectiveness of the LSD algorithm. Before showing the results of the full LSD, we first test step (2) of the algorithm, which estimates the persistence parameters from the (biased) estimates of the assignments. Fig. 6 shows the result of learning $\eta$ and $\xi$ from equations (3.4) and (3.8) using the assignments from the single snapshot procedure. The learned parameters $\hat\eta$ and $\hat\xi$ are in agreement with the planted ones, at least as long as the overlap between the planted and inferred communities is far from zero.
We then test the performance of the LSD procedure against synthetic dynamic networks generated according to the DSBM with persistences. We use $T$ snapshots of networks with $N$ nodes, mean degree $c$, $q$ equally sized evolving communities, and a wide range of planted parameters $\epsilon$, $\eta$, $\xi$. In Fig. 7 (top left) we show the overlap between planted and inferred assignments as a function of $\eta$ and $\xi$. For a large region of the parameter space the overlap is very high, showing that the LSD algorithm is able to recover the planted assignments. The black dots indicate the detectability transition line; as expected, in the region to the right of this line the overlap is very small. The top right panel shows the estimated value of $\hat\tau$ as a function of the persistence parameters: the top right corner is the region where lagged inference is necessary. The bottom left panel highlights the role of the time shift in assignment inference: as expected, the region where the time shift is critical is the one where $\hat\tau$ is different from zero. The transition line between these two regions is described by Eq. (2.9) (dashed line). Finally, the bottom right panel shows the inferred $\hat\epsilon$, which in the detectability region is always very close to the planted value $\epsilon$.
4. Conclusions
We studied the inference problem in a temporal network model where both communities and links are time varying. We focused on static algorithms applied to temporal networks, where inference is performed on each snapshot, and found that link persistence is the driver of a new kind of detectability transition, time-lagged inference, i.e. the detection of a past community structure rather than the present one. Analyzing static detection of dynamic communities, we defined a first algorithm for time-lag corrected inference, the Lagged Snapshot Dynamic (LSD) algorithm, which can serve as a benchmark for the performance analysis of other algorithms on dynamic networks. Such an efficient and parsimonious algorithm leaves room for improvement by new algorithms that, using the information given by the full temporal network, might reach optimality and solve efficiently the inference problem for persistent dynamic block models up to its detectability threshold.
5. Acknowledgments
Authors acknowledge support from the grant SNS16LILLB  Financial networks: statistical models, inference, and shock propagation; PB acknowledges support from FET Project DOLFINS nr. 640772 and FET IP Project MULTIPLEX nr. 317532; DT acknowledges support from the grant GR15ATANTARI and was supported by National Group of Mathematical Physics (GNFMINDAM).
References
 [1] P. Holme and J. Saramäki, Physics Reports 519, no. 3: 97-125 (2012).
 [2] M. E. J. Newman, Phys. Rev. E 94, no. 5: 052315 (2016).
 [3] B. Hendrickson and R. Leland, SIAM Journal on Scientific Computing 16, no. 2: 452-469 (1995).
 [4] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, P. Zhang, Proc. Natl. Acad. Sci. USA 110: 20935-20940 (2013).
 [5] A. Decelle, F. Krzakala, C. Moore, L. Zdeborová, Phys. Rev. E 84 (6), 066106 (2011).
 [6] V. D. Blondel et al., Journal of statistical mechanics: theory and experiment 2008.10: P10008 (2008).
 [7] S. Boccaletti et al., Physics Reports, 544, 1122 (2014).
 [8] E. Mossel, J. Neeman, and A. Sly, arXiv preprint arXiv:1311.4115 (2013).
 [9] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J. Onnela, Science 328, no. 5980: 876-878 (2010).
 [10] T. Yang, Y. Chi, S. Zhu, Y. Gong, and R. Jin, Machine Learning 82, no. 2: 157-189 (2011).
 [11] D. S. Bassett, M. A. Porter, N. F. Wymbs, S. T. Grafton, J. M. Carlson, and P. J. Mucha, Chaos: An Interdisciplinary Journal of Nonlinear Science 23, no. 1: 013142 (2013).
 [12] M. Bazzi, M. A. Porter, S. Williams, M. McDonald, D. J. Fenn, and S. D. Howison, Multiscale Modeling & Simulation 14, no. 1: 141 (2016).
 [13] A. Ghasemian, P. Zhang, A. Clauset, C. Moore, L. Peel, Phys. Rev. X 6, 031005 (2016).
 [14] X. Zhang, C. Moore, and M.E.J. Newman, arXiv preprint arXiv:1607.07570, (2016).
 [15] K. S. Xu, and A. O. Hero, IEEE Journal of Selected Topics in Signal Processing 8, no. 4: 552562 (2014).
 [16] K. S. Xu, Stochastic Block Transition Models for Dynamic Networks, AISTATS (2015).
 [17] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley, Proc. Natl. Acad. Sci. USA 97, no. 21: 11149-11152 (2000).
 [18] E. Mossel, J. Neeman, and A. Sly, Probability Theory and Related Fields , 1 (2012).
 [19] J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning (Springer, Berlin, 2001). Vol. 1.
 [20] T. P. Peixoto, Phys. Rev. Lett. 110, 148701 (2013).
 [21] A. Decelle, F. Krzakala, C. Moore, L. Zdeborová, Phys. Rev. Lett. 107 (6), 065701 (2011).
 [22] Y. Iba, Journal of Physics A: Mathematical and General 32, 3875 (1999)
 [23] M. E. Dyer and A. M. Frieze, J. Algorithm 10, 451 (1989).
 [24] A. Condon and R. M. Karp, Random Struct. Algor. 18, 116 (2001).