Exact ICL maximization in a nonstationary temporal extension of the stochastic block model for dynamic networks
Abstract
The stochastic block model (SBM) is a flexible probabilistic tool that can be used to model interactions between clusters of nodes in a network. However, it does not account for interactions of time varying intensity between clusters. The extension of the SBM developed in this paper addresses this shortcoming through a temporal partition: assuming interactions between nodes are recorded on fixedlength time intervals, the inference procedure associated with the model we propose allows to cluster simultaneously the nodes of the network and the time intervals. The number of clusters of nodes and of time intervals, as well as the memberships to clusters, are obtained by maximizing an exact integrated completedata likelihood, relying on a greedy search approach. Experiments on simulated and real data are carried out in order to assess the proposed methodology.
keywords:
Dynamic networks, stochastic block models, Exact ICL.1 Introduction
Network analysis has been applied since the 30s to many scientific fields. Indeed graph based modelling has been used in social sciences since the pioneer work of Jacob Moreno Moreno (). Nowadays, network analyses are used for instance in physics (Albert2002, ), economics 1979 (), biology barabasi2004 (); Palla2005 () and history villa2008mining (), among other fields.
One of the main tools of network analysis is clustering which aims at detecting clusters of nodes sharing similar connectivity patterns. Most of the clustering techniques look for communities, a pattern in which nodes of a given cluster are more likely to connect to members of the same cluster than to members of other clusters (see FortunatoSurveyGraphs2010 () for a survey). Those methods usually rely on the maximization of the modularity, a quality measure proposed by Girvan and Newman girvan2002community (). However, maximizing the modularity has been shown to be asymptotically biased bickel2009nonparametric ().
In a probabilistic perspective, the stochastic block model (SBM) Hollands83 () assumes that nodes of a graph belong to hidden clusters and probabilities of interactions between nodes depend only on these clusters. The SBM can characterize the presence of communities but also more complicated patterns inbooklatouche2009 (). Many inference procedures have been derived for the SBM such as variational expectation maximization (VEM) daudin2008mixture (), variational Bayes EM (VBEM) articlelatouche2012 (), Gibbs sampling nowicki2001estimation (), allocation sampler mcdaid11 (), greedy search Come_Latouche15 () and non parametric schemes kemp2006learning (). A detailed survey on the statistical and probabilistic take on network analysis can be found in Goldenberg09 ().
While the original SBM was developed for static networks, extensions have been proposed recently to deal with dynamic graphs. In this context, both nodes memberships to a cluster and interactions between nodes can be seen as stochastic processes. For instance, in the model of Yang et al. (yang2011detecting, ), the connectivity pattern between clusters is fixed through time and a hidden Markov model is used to describe cluster evolution: the cluster of a node at time is obtained from its cluster at time via a Markov chain. Conversely, Xu et al. xu2013dynamic () as well as Xing et al. (xing2010, ) used a state space model to describe temporal changes at the level of the connectivity pattern. In the latter, the authors developed a method to retrieve overlapping clusters through time.
Other temporal variations of the SBM have been proposed. They generally share with the ones described above a major assumption: the data set consists in a sequence of graphs. This is by far the most common setting for dynamic networks. Some papers remove those assumptions by considering continuous time models in which edges occur at specific instants (for instance when someone sends an email). This is the case of e.g. proceedingsdubois2013 () and of Rossi12 (); GuigouresEtAl2015 (). The model developed in the present paper introduces a sequence of graphs as an explicit aggregated view of a continuous time model.
More precisely, our model, that we call the temporal SBM (TSBM), assumes that nodes belong to clusters that do not change over time but that interaction patterns between those clusters have a time varying structure. The time interval over which interactions are studied is first segmented into subintervals of fixed identical duration. The model assumes that those subintervals can be clustered into classes of homogeneous interaction patterns: the distribution of the number of interactions that take place between nodes of two given clusters during a subinterval depends only on the clusters of the nodes and on the cluster of the subinterval. This provides a non stationary extension of the SBM, which is based on the simultaneous modelling of clusters of nodes and of subintervals of the time horizon. Notice that a related approach is adopted in (Come_Govaert14, ), but with a substantial difference: they consider time intervals whose membership is known and hence exogenous, whereas in this paper the membership of each interval is hidden and therefore inferred from the data.
The greedy search strategy proposed for the (original) stationary SBM was compared with other SBM inference tools in many scenarios using both simulated and real data in Come_Latouche15 (). Experimental results emerged illustrating the capacity of the method to retrieve relevant clusters. Note that the same framework was considered for the (related) latent block model wyse2014inferring (), in the context of biclustering, and similar conclusions were drawn. Indeed, contrary to most other techniques, this approach relies on an exact likelihood criterion, so called integrated completedata likelihood (ICL), for optimization. In particular, it does not involve any variational approximations. Moreover, it allows the clustering of the nodes and the estimation of the number of clusters to be performed simultaneously. Alternative strategies usually do first the clustering for various number of clusters, by maximizing a given criterion, typically a lower bound. Then, they rely on a model selection criterion to estimate the number of clusters (see daudin2008mixture () for instance). Some sampling strategies also allow the simultaneous estimation kemp2006learning (); mcdaid11 (). However, the corresponding Markov chains tend to exhibit poor mixing properties, i.e. low acceptance rates, for large networks. Finally, the greedy search incurs Come_Latouche15 () a smaller computational cost than existing techniques. Therefore, we follow the greedy search approach and derive an inference algorithm, for the new model we propose, which estimates the number of clusters, for both nodes and time intervals, as well as memberships to clusters.
Finally, we cite the recent work of Matias et al. Matias_Poisson () who independently developed a temporal stochastic block model, related to the one proposed in this paper. Interactions in continuous time are counted by non homogeneous Poisson processes whose intensity functions only depend on the nodes clusters. A variational EM algorithm was derived to maximize an approximation of the likelihood and non parametric estimates of the intensity functions are provided.
2 A non stationary stochastic block model
We describe in this section the proposed extension of the stochastic block model (SBM) to non stationary situations. First, we recall the standard modeling assumptions of the SBM, then introduce our temporal extension and finally derive an exact integrated classification likelihood (ICL) for this extension.
2.1 Stochastic block model
We consider a set of nodes and the adjacency matrix such that counts the number of direct interactions from to over the time interval . Self loops are not considered here, so the diagonal of is made of zeros (). Nodes in are assumed to belong to disjoint clusters
We introduce a hidden random vector , labelling each node’s membership
The are assumed to be independent and identically distributed random variables with a multinomial probability distribution depending on a common parameter
Thus, node belongs to cluster with probability . As a consequence, the joint probability of vector is
(1) 
where denotes the number of nodes in cluster (we denote the cardinal of a set ).
The first assumption of the original (stationary) SBM is that interactions between nodes are independent given the cluster membership vector , that is
In addition, is assumed to depend only on and . More precisely, let us introduce a matrix of model parameters
Then, if is such that and , we assume that is such that
Combining the two assumptions, the probability of observing the adjacency matrix , conditionally to , is given by
When characterizes interaction counts, a common choice for is the Poisson distribution.
2.2 A non stationary approach
In order to introduce a temporal structure, we modify the model described in the previous section. The main idea is to allow interaction counts to follow different regimes through time. The model assumes that interaction counts are stationary at some minimal time resolution. This resolution is modeled via a decomposition of the time interval in subintervals delimited by the following instants
whose increments
have all the same fixed value denoted .
As for the nodes, a partition is considered for the time subintervals. Thus, each is assumed to belong to one of hidden clusters and the random vector is such that
A similar multinomial distribution as the one of , is used to model that is
(2) 
where is the cardinal of cluster and .
We now define as the number of observed interactions from to , in the time interval . With the notations above, we have
Following the SBM case, we assume conditional independence between all the given the two hidden vectors and . Denoting , the three dimensional tensor of interaction counts, this translates into
Given a three dimensional tensor of parameters , we assume that when is such that and , and is such that , then
In addition, is assumed to be a Poisson distributed random variable, that is
(3) 
Remark 1
In the standard SBM, the adjacency matrix is a classical matrix and the parameter matrix is also a classical matrix. In the proposed extension, those matrices are replaced by three dimensional tensors, with dimensions and with dimensions .
Remark 2
For and fixed and known, the random variables are independent but are not identically distributed. As corresponds to time this induces a non stationary structure as an extension of the traditional SBM.
Notation 1
To simplify the rest of the paper, let us denote
and similarly for and .
As in the case of the SBM, the distribution of , conditional to and , can be computed explicitly
(4) 
where
The full generative model is obtained by adding an independence assumptions between and which gives to those vectors the following joint distribution (obtained using equations (1) and (2))
(5) 
where .
The identifiability of the proposed model could be assessed in future works, being outside the scope of the present paper. For a detailed and more general survey of the identifiability of the model parameters, in dynamic stochastic block models, the reader is referred to (matias_dynamic_SBM, ).
2.3 Exact ICL for non stationary SBM
The assumptions we have made so far are conditional on the number of clusters and being known, which is not the case in real applications. A standard solution to estimate the labels and as well as the number of clusters would consist in fixing the values of and at first and then in estimating the labels through one of the methods mentioned in the introduction (e.g. variational EM). A model selection criterion could finally be used to choose the values of and . Many model selection criteria exist, such as the Akaike Information Criterion (AIC) articleakaike1974 (), the Bayesian Information Criterion (BIC) Schwarz () and the integrated classification likelihood (ICL), introduced in the context of Gaussian mixture models by Biernacki et al. biernacki2000assessing (). Authors in Come_Latouche15 () proposed an alternative approach: they introduced an exact version of the ICL for the stochastic block model, based on a Bayesian approach and maximized it directly with respect to the number of clusters and to cluster memberships. They ran several experiments on simulated and real data showing that maximizing the exact ICL through a greedy search algorithm provided more accurate estimates than those obtained by variational inference or MCMC techniques. Similar results are provided in wyse2014inferring (), in the context of the latent block model (LBM) for bipartite graphs: the greedy ICL approach outperforms its competitors in both computational terms and in the accuracy of the provided estimates. Therefore, in this paper, we chose to extend the proposed greedy search algorithm to the temporal model. More details are provided in Section 1.
In the following, the expressions “ICL” or “exact ICL” will be used interchangeably.
Following the Bayesian approach, we introduce a prior distribution over the model parameters and , given the meta parameters and , denoted . Then the ICL is the complete data loglikelihood given by
(6) 
where the model parameters and have been integrated out, that is
(7) 
We emphasize that the marginalization over all model parameters naturally induces a penalization on the number of clusters. For more details, we refer to biernacki2000assessing (); Come_Latouche15 (). The integral can be simplified by a natural independence assumption on the prior distribution
which gives
(8) 
Notice that we use in this derivation the implicit hypothesis from equation (5) which says that is independent from (given , and ).
2.4 Conjugated a priori distributions
A sensible choice of prior distributions over the model parameters is a necessary condition to have an explicit form of the ICL.
2.4.1 Gamma a priori
In order to integrate out and obtain a closed formula for the first term on the right hand side of (2.3), we impose a Gamma a priori distribution over
leading to following joint density
(9) 
where and is the gamma function. By multiplying (2.2) and (9), the joint density for the pair follows
This quantity can now be easily integrated w.r.t. to obtain
(10) 
with
(11) 
A non informative prior for the Poisson distribution corresponds to limiting cases of the Gamma family, when tends to zero. In all the experiments we carried out, we set the parameters and to one, in order to have unitary mean and variance for the Gamma distribution.
2.4.2 Dirichlet a priori
We attach a factorizing Dirichlet a priori distribution to , namely
where the parameters of each distribution have been set constant for simplicity. It can be proved (A) that the joint integrated density for the pair , reduces to
(12) 
A common choice consists in fixing these parameters to 1 to get a uniform distribution, or to to obtain a Jeffreys non informative prior.
3 ICL maximization
The integrated complete likelihood (ICL) in equation (2.3) has to be maximized with respect to the four unknowns , , , and which are discrete variables. Obviously no closed formulas can be obtained and it would computationally prohibitive to test every combination of the four unknowns. Following the approach described in Come_Latouche15 (), we rely on a greedy search strategy. The main idea is to start with a fine clustering of the nodes and of the intervals (possibly size one clusters) and then to alternate between an exchange phase where nodes/intervals can move from one cluster to another and a merge phase where clusters are merged. Exchange and merge operations are locally optimal and are guaranteed to improve the ICL.
The algorithm is described in detail in the rest of the section. An analysis of its computational complexity is provided in B.
Remark 3
The algorithm is guaranteed to increase the ICL at each step and thus to converge to a local maximum. Randomization can be used to explore several local maxima but the convergence to a global maximum is not guaranteed. Moreover, let us denote by , , , the estimators of , , and , respectively, obtained through the maximization of the function in equation (2.3). A formal proof of the consistency of these estimators is outside the scope of this paper. More in general, the consistency of this kind of estimators, maximizing the exact ICL, is still an open issue.
3.1 Initialization
Initial values are fixed for both and , say and . These values may be fixed equal to and respectively and each node (interval) would be alone in its own cluster (time cluster). Alternatively, simple clustering algorithms (kmeans, hierarchical clustering) may be used to reduce and up to a certain threshold. This choice should be preferred to speed up the greedy search.
3.2 Greedy  Exchange (GE)
A shuffled sequence of all the nodes (time intervals) in the graph is created. One node (time interval) is chosen and is moved from its current (time) cluster into the (time) cluster leading to the highest increase in the exact ICL, if any. This is called a greedy exchange (GE). This routine is applied to every node (time interval) in the shuffled sequence. This iterative procedure is repeated until no further improvement in the exact ICL is possible. In case a node (time interval) is alone inside its cluster, an exchange becomes a merge of two clusters (see below).
The ICL does not have to be completely evaluated before and after each swap: possible increases can be computed directly, reducing the computational cost. Let us consider first the case of temporal intervals. Moving interval from the cluster to cluster induces a modification of the ICL given by
where and refer to the new configuration where . It can easily be shown that reduces to
(13) 
The case of nodes is slightly more complex. When a node is moved from cluster to , with , the change in the ICL is
which simplifies into
where and refer to the new configuration.
3.3 Greedy  Merge (GM)
Once the GE step is concluded, all possible merges of pairs of clusters (time clusters) are tested and the best merge is finally retained. This is called a greedy merge (GM).This procedure is repeated until no further improvement in the ICL is possible.
In this case too, the ICL does not need to be explicitly computed. Merging in fact time clusters and into leads to the following ICL modification
(14) 
Notice that if , then has to be replaced by inside .
When merging clusters and into the cluster , the change in the ICL can be expressed as follows
3.4 Optimization strategies
We have to deal with two different issues:

the optimization order of nodes and times: we could either run the greedy algorithm for nodes and times separately or choose an hybrid strategy that switches and merges nodes and time intervals alternatively, for instance;

whether to execute merge or switching movements at first.
The second topic has been largely discussed in the context of modularity maximization for community detection in static graphs. One of the most commonly used algorithms is the socalled Louvain method (Blondel08fastunfolding, ) which proceeds in a rather similar way as the one chosen here: switching nodes from clusters to clusters and then merging clusters. This is also the strategy used in (Come_Latouche15, ) for stationary SBM. Combined with a choice of sufficiently small values of and , this approach gives very good results at a reasonable computational cost. It should be noted that more complex approaches based on multilevel refinements of a greedy merge procedure have been shown to give better results than the Louvain method in the case of modularity maximization (see (NoakRotta, )). However, the computation complexity of those approaches is acceptable only because of the very specific nature of the modularity criterion and with the help of specialized data structures. We cannot leverage such tools for ICL maximization.
The first issue is hard to manage since the shape of the function is unknown. We developed three optimization strategies:

GE + GM for time intervals and then GE + GM for nodes (Strategy A);

GE + GM for nodes and then GE + GM for times (Strategy B);

Mixed GE + mixed GM (Strategy C).
In the mixed GE a node is chosen in the shuffled sequence of nodes and moved to the cluster leading to the highest increase in the ICL. Then a time interval is chosen in the shuffled sequence of time intervals and placed in the best time cluster and so on alternating between nodes and time intervals until no further increase in the ICL is possible. The mixed GM works similarly. In all the experiments, the three optimization strategies are tested and the one leading to the highest ICL is retained.
4 Experiments
To assess the reliability of the proposed methodology some experiments on synthetic and real data were conducted. All runtimes mentioned in the next two sections are measured on a twelve cores Intel Xeon server with 92 GB of main memory running a GNU Linux operating system. The greedy algorithm described in Section 3 was implemented in C++. An euclidean hierarchical clustering algorithm was used to initialize the labels and and have been set equal to and respectively.
4.1 Simulated Data
4.1.1 First Scenario
We simulated interactions between 50 nodes, belonging to three clusters . Interactions take place over 50 times intervals of unitary length, belonging to three time clusters (denoted ). Clusters are assumed to be balanced on average by fixing . Notice that while the clusters are balanced on average they can be relatively imbalanced in some particular cases.
A community structure setting is chosen, corresponding to the following diagonal form for the intensity matrix
where is a free parameter in . A non stationary behaviour is obtained by modifying the intensity matrix over time as follows
(15) 
where is a free parameter in and denotes the indicator function over a set . In other words, is equal to when belongs to , to when belongs to and to when belongs to . The overall community pattern does not evolve through time but the average interaction intensity is different in the three time clusters. Both the community structure and the non stationary behaviour can be made more or less obvious based on the value of and .
For several values of the pair (), 50 dynamic graphs were sampled according to the Poisson intensities in equation (15). Estimates of labels vectors and are provided for each graph^{1}^{1}1The average runtime of our implementation on those artificial data is 0.96 second.. The greedy algorithm following the optimization strategy A, led to the best results (see next paragraph for more details). In order to avoid convergence to local maxima, ten estimates of labels are provided for each graph and the pair () leading to the highest ICL is retained^{2}^{2}2Calculations are done in parallel as they are independent. The reported runtime is the wall clock time..
Experiments show that for sufficiently large values of and , the true structure can always be recovered. We can see this in detail for two special cases, as illustrated in Figure 1 .
In Figure 1, we set , which means there is not any community structure and let varying in the range . Adjusted Rand Indexes (ARIs) rand1971objective () are used to assess the time clustering, varying between zero (null clustering) and one (optimal clustering). When we are in a degenerate case and no time structure affects the interactions: not surprisingly the algorithm assigns all the intervals to the same cluster (null ARI). The higher the value of the more effective the clustering is up to a perfect recovery of the planted structure (ARI of 1). In particular the true time structure is fully recovered for all the fifty graphs when is higher than .
Similar results can be observed in Figure 1 about nodes clustering: by setting , we removed any time structure and a stationary community structure is detected by the model. In this case it is interesting to make a comparison with a traditional SBM, which is expected to give similar results to those shown in Figure 1. For a fixed value of we simulated a dynamic graph, corresponding to 50 adjacency matrices, one per time interval. Then a static graph is obtained by summing up these adjacency matrices. The temporal SBM (TSBM) we propose deals with the dynamic graph, whereas a SBM is used on the static graph ^{3}^{3}3This choice is the most natural one to compare the two models. Alternatively, the SBM could be used on a single adjacency matrix among the fifty adjacency matrices provided, at each iteration. In the experiments we carried out, we obtained similar results for the two options.. The Gibbs sampling algorithm introduced in nouedoui2013 () was used to recover the number of clusters and cluster memberships according to a SBM (with Poisson distributed edge values). The experiment was repeated 50 times for each value of in the set . In Figure 2 we compare the ARIs of the two models for each value of .
The greedy ICL TSBM (faster than the Gibbs sampling algorithm, who has an average runtime of 15.15 seconds) recovers the true structure at levels of contrast lower than those required by the Gibbs sampling algorithm (SBM). This comparison aims at showing that, in a stationary framework, the TSBM works at least as well as a standard SBM. The difference in terms of performance of the two models in this context can certainly be explained by the greedy search approach which is more effective than Gibbs sampling, as expected (see (Come_Latouche15, ) and section 1).
4.1.2 Optimization strategies
As mentioned in the previous section, in the present experiments, the optimization strategy A is more efficient than the two other strategies outlined in Section 3.4. We illustrate this superiority in the following test: the pair is set to and 50 dynamic graphs are simulated according to the same settings discussed so far. Three different estimations are obtained, one for each strategy, and ARIs for nodes labels are computed. Results in Figure (3) can be compared with the mean value of the final ICL for each strategy:
mean ICL  

strategy A  
strategy B  
strategy C 
4.1.3 Scalability
A full scalability analysis of the proposed algorithm is out of the scope of this paper (see B), but we have performed a limited assessment in this direction with a simple example.
A fixed is maintained and for several values of and dynamic graphs with 100 nodes and 100 times intervals were sampled according to the intensity in equation (15). The mean runtime for reading and providing labels estimates for dynamic graph is 13.16 seconds. As expected, the algorithm needs a lower contrast to recover the true structure as the reader can observe by comparing Figure (4) with Figure (1). This is a consequence of the increase in the number of interactions (induced by the longer time frame).
In terms of computational burden, each dynamic graph is handled in a average time of 13.16 seconds, that is less than 14 slower than in the case of a graph with 50 nodes and 50 time intervals. As we use and , the worst case cost of one “iteration” of the algorithm is and thus doubling both and should multiply the runtime by 16. On this limited example, the growth is slightly less than expected.
4.1.4 Non community structure
We now consider a different scenario showing how the TSBM model can perfectly recover a clustering structure in a situation where the SBM fails. We considered two clusters of nodes and and two time clusters and (clusters are balanced in average as in the previous examples). We simulated directed interactions between nodes over time intervals according to the following intensity matrix
(16) 
where
In this scenario, a clustering structure is persistent over time, but the agents behaviour changes abruptly depending on the time cluster the interactions are taking place, moving from a community like pattern to a bipartite like one. When aggregating observations, since the expected percentage of time intervals belonging to cluster is , the two opposite effects compensate each other (on average) and the SBM cannot detect any community structure. This can be seen in Figure 5: we simulated 50 dynamic graphs according to the Poisson intensities in equation (16) and estimates of and are provided for each graph by both TSBM and SBM. The outliers ARIs in the right hand side figure (7 over 50) correspond to sampled vectors in which the proportion of time intervals belonging to cluster is far from . No outlier is observed when the experiment is performed with a fixed label vector placing the same number of time intervals in each cluster.
The optimization strategy A has been used to produce the results shown in Figure 3. Very similar results can be obtained through optimization strategies B and C: with these settings the greedy ICL algorithm can always estimate the true vectors and .
4.2 Real Data
The data set we used was collected during the ACM Hypertext conference held in Turin, June 29th  July 1st, 2009. It represents the dynamic network of facetoface proximity interactions of 113 conference attendees over about 2.5 days^{4}^{4}4More informations can be found at: http://www.sociopatterns.org/datasets/hypertext2009dynamiccontactnetwork/.
We focused on the first conference day, namely the twenty four hours going from 8am of June 29th to 7.59am of June 30th. The day was partitioned in small time intervals of 20 seconds in the original data frame and interactions of facetoface proximity (less than 1.5 meters) were monitored by electronic badges that attendees volunteered to wear. Further details can be found in Isella:2011qo (). We considered 15 minutes time aggregations, thus leading to a partition of the day made of 96 consecutive quarterhours ( with previous notation). A typical row of the aggregated data set looks like the following one:
ID  ID  Time Interval ()  Number of interactions 
52  26  5  16 
It means that conference attendees 52 and 26, between 9am and 9.15am, have spoken for .
In Figure 5(a), we computed the total number of interactions for each quarter hour. The presence of a time pattern is clear: the volume of interactions, for example, is much higher at pm than at am. The greedy ICL algorithm found 20 clusters for nodes (people) and 4 time clusters. Figure 5(b) shows how daily quarterhours are assigned to each cluster: it can clearly be seen how time intervals corresponding to the highest number of interactions have been placed in cluster , those corresponding to an intermediate interaction intensity, in (yellow) and (green). Cluster (magenta) contains intervals marked by a weaker intensity of interactions. It is interesting to note how the model closely recovers times of social gathering^{5}^{5}5A complete program of the day can be found at http://www.ht2009.org/program.php.:

9.0010.30  setup time for posters and demos.

13.0015.00  lunch break.

18.0019.00  wine and cheese reception.
Results in Figure 6 are obtained through the optimization strategy A. To make a comparison with the other two optimization strategies, we run the algorithm ten times for each strategy (A, B and C) and compare the final values of the ICL. Labels and are randomly initialized before each run, according to multinomial distributions (no hierarchical clustering was used) and and are set equal to and , respectively. The mean final values of the ICL are reported in the following table:
mean ICL  

Strategy A  
Strategy B  
Strategy C 
As it can be seen, the hybrid strategy C is the one leading to the highest final ICL, on average. In Figure (7) we report the final value of the ICL for each run (from 1 to 10) for each strategy. The optimization strategy C always outperforms the remaining two patterns.
5 Conclusion
We proposed a nonstationary extension of the stochastic block model (SBM) allowing us to simultaneously cluster nodes and infer the time structure of a network. The approach we chose consists in partitioning the time interval over which interactions are studied into subinterval of fixed identical duration. Those intervals provide aggregated interaction counts that are studied with a SBM inspired model: nodes and time intervals are clustered in such a way that aggregated interaction counts are homogeneous over clusters. We derived an exact integrated classification likelihood (ICL) for such a model and proposed to maximize it with a greedy search strategy. The experiments we run on artificial and real world networks highlight the capacity of the model to capture nonstationary structures in dynamic graphs.
Appendix A Joint integrated density for labels
Consider at first the vector , whose joint probability function is given by (1). We attach a Dirichlet a priori distribution to the vector
The joint probability density for the pair is obtained by multiplying (1) by the prior density
This is still a Dirichlet probability density function of parameters () and integration with respect to is straightforward
This integrated density corresponds to the first term on the right hand side of (12). The second term is obtained similarly and the joint density follows by independence.
Appendix B Computational complexity
To evaluate the computational complexity of the proposed algorithm, we assume that the gamma function can be computed in constant time (see numericalRecipes2007 ()). The core computation task consists in evaluating the change in ICL induced by exchanges and merges. The main quantities involved in those computations are the . We first describe how to handle those quantities and then analyze the cost of the exchange and merge operations.
b.1 Data structures
The quantities are stored in a three dimensional array that is never resized (it occupies a memory space) so that at any time during the algorithm, accessing to a value or modifying it can be done in constant time. The quantities needed to compute , the , and are handled in a similar way.
In addition, we maintain aggregated interaction counts for each time interval and each node. More precisely, we have for instance for a time interval
and similar quantities such as . For a node , we have e.g.
and other related quantities. The memory occupied by those structures is in . Cluster memberships and clusters sizes are also stored in arrays.
In order to evaluate the ICL change induced by an operation, we need to compute its effect on in order to obtain . This can be done in constant time for one value. For instance moving time interval from to implies the following modifications:

is reduced by while is increased by the same quantity;

is divided by while is multiplied by the same quantity;

is decreased by (or ) while is increased by the same quantity.
When an exchange or a fusion is actually implemented, we update all the data structures. The update cost is dominated by the other phases of the algorithm. For instance when is moved from to , we need to update:

cluster memberships and cluster sizes, which is done in ;

and for all and , which is done in ;

aggregated counts and products, such as and , which is done in .
Considering that and , the total update cost is in for time interval related operations and in for node related operations.
b.2 Exchanges
The calculation of for a time interval cluster exchange from equation (13) involves a sum with terms. As explained above each term is obtained in constant time, thus the total computation time is in . This has to be evaluated for all time clusters and for all time interval, summing to a total cost of .
Similarly, the calculation of involves a fix number of sums with at most terms in each sum. The total computation time is therefore in . This had to be evaluated for each node and for all node clusters, summing to a total cost of .
Notice that we have evaluated the total cost of one exchange round, i.e., in the case where all time intervals (or all nodes) are considered once. This evaluation does not take into account the reduction in the number of clusters generally induced by exchanges.
b.3 Merges
Merges are very similar to exchange in terms of computational complexity. They involve comparable sums that can be computed efficiently using the data structures described above. The computational cost for one time cluster merge round is in while it is in for node clusters.
b.4 Total cost
The worst case complexity of one full exchange phase (with each node and each time interval considered once) is . The worst case complexity of one merge with mixed GM is which is smaller than the previous one for and . Thus the worst case complexity of one “iteration” of the algorithm is .
Unfortunately, the actual complexity of the algorithm, while obviously related to this quantity, is difficult to evaluate for two reasons. Firstly, we have no way to estimate the number of exchanges needed in the exchange phase (apart from bounding them with the number of possible partitions). Secondly, we observe in practice that exchanges reduce the number of clusters, especially when and are high (i.e. close to and , respectively). Thus the actual cost of one individual exchange reduces very quickly during the first exchange phase leading to a vast overestimation of its cost using the proposed bounds. As a consequence, the merge phase is also quicker than evaluated by the bounds.
A practical evaluation of the behaviour of the algorithm, while outside the scope of this paper, would be very interesting to assess its potential use on large data sets.
References
References
 (1) J. Moreno, Who shall survive?: A new approach to the problem of human interrelations., Nervous and Mental Disease Publishing Co, 1934.
 (2) R. Albert, A. Barabási, Statistical mechanics of complex networks, Modern Physics 74 (2002) 47–97.

(3)
D. Snyder, E. L. Kick, Structural
position in the world system and economic growth, 19551970: A
multiplenetwork analysis of transnational interactions, American Journal of
Sociology 84 (5) (1979) pp. 1096–1126.
URL http://www.jstor.org/stable/2778218  (4) A. Barabási, Z. Oltvai, Network biology: understanding the cell’s functional organization, Nature Rev. Genet 5 (2004) 101–113.
 (5) G. Palla, I. Derenyi, I. Farkas, T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435 (2005) 814–818.
 (6) N. Villa, F. Rossi, Q. Truong, Mining a medieval social network by kernel som and related methods, Arxiv preprint arXiv:0805.1374.
 (7) S. Fortunato, Community detection in graphs, Physics Reports 486 (35) (2010) 75 – 174.
 (8) M. Girvan, M. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences 99 (12) (2002) 7821.
 (9) P. Bickel, A. Chen, A nonparametric view of network models and newman–girvan and other modularities, Proceedings of the National Academy of Sciences 106 (50) (2009) 21068–21073.
 (10) P. Holland, K. Laskey, S. Leinhardt, Stochastic blockmodels: first steps, Social Networks 5 (1983) 109–137.
 (11) P. Latouche, E. Birmelé, C. Ambroise, Bayesian methods for graph clustering, Springer, 2009, pp. 229–239.
 (12) J.J. Daudin, F. Picard, S. Robin, A mixture model for random graphs, Statistics and Computing 18 (2) (2008) 173–183.
 (13) P. Latouche, E. Birmelé, C. Ambroise, Variational bayesian inference and complexity control for stochastic block models, Statistical Modelling 12 (1) (2012) 93–115.
 (14) K. Nowicki, T. Snijders, Estimation and prediction for stochastic blockstructures, Journal of the American Statistical Association 96 (455) (2001) 1077–1087.
 (15) A. Mc Daid, T. Murphy, F. N., N. Hurley, Improved bayesian inference for the stochastic block model with application to large networks, Computational Statistics and Data Analysis 60 (2013) 12–31.
 (16) E. Côme, P. Latouche, Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood, Statistical Modelling 15 (6) (2015) 564–589. doi:10.1177/1471082X15577017.
 (17) C. Kemp, J. Tenenbaum, T. Griffiths, T. Yamada, N. Ueda, Learning systems of concepts with an infinite relational model, in: Proceedings of the National Conference on Artificial Intelligence, Vol. 21, 2006, pp. 381–391.
 (18) A. Goldenberg, X. Zheng, S. E. Fienberg, E. M. Airoldi, A survey of statistical network models, Machine Learning 2 (2) (2009) 129–133. doi:10.1561/2200000005.
 (19) T. Yang, Y. Chi, S. Zhu, Y. Gong, R. Jin, Detecting communities and their evolutions in dynamic social networks – a bayesian approach, Machine learning 82 (2) (2011) 157–189.
 (20) K. S. Xu, A. O. Hero III, Dynamic stochastic blockmodels: Statistical models for timeevolving networks, in: Social Computing, BehavioralCultural Modeling and Prediction, Springer, 2013, pp. 201–210.
 (21) E. P. Xing, W. Fu, L. Song, A statespace mixed membership blockmodel for dynamic network tomography, Ann. Appl. Stat. 4 (2) (2010) 535–566. doi:10.1214/09AOAS311.
 (22) C. Dubois, C. Butts, P. Smyth, Stochastic blockmodelling of relational event dynamics, in: International Conference on Artificial Intelligence and Statistics, Vol. 31 of the Journal of Machine Learning Research Proceedings, 2013, pp. 238–246.
 (23) R. Guigourès, M. Boullé, F. Rossi, A triclustering approach for time evolving graphs, in: Coclustering and Applications, IEEE 12th International Conference on Data Mining Workshops (ICDMW 2012), Brussels, Belgium, 2012, pp. 115–122. doi:10.1109/ICDMW.2012.61.

(24)
R. Guigourès, M. Boullé, F. Rossi,
Discovering patterns in
timevarying graphs: a triclustering approach, Advances in Data Analysis and
Classification (2015) 1–28doi:10.1007/s1163401502186.
URL http://dx.doi.org/10.1007/s1163401502186  (25) A. Randriamanamihaga, E. Côme, L. Oukhellou, G. Govaert, Clustering the vélib dynamic origin/destination flows using a family of poisson mixture models, Neurocomputing 141 (2014) 124–138.
 (26) J. Wyse, N. Friel, P. Latouche, Inferring structure in bipartite networks using the latent block model and exact icl, Network Science, to appear.
 (27) C. Matias, T. Rebafka, F. Villers, Estimation and clustering in a semiparametric Poisson process stochastic block model for longitudinal networks, ArXiv eprintsarXiv:1512.07075.

(28)
C. Matias, V. Miele,
Statistical clustering
of temporal networks through a dynamic stochastic block model, working
paper or preprint (Jun. 2015).
URL https://hal.archivesouvertes.fr/hal01167837  (29) H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control 19 (1974) 716–723.
 (30) G. Schwarz, Estimating the dimension of a model, Annals of Statistics 6 (1978) 461–464.
 (31) C. Biernacki, G. Celeux, G. Govaert, Assessing a mixture model for clustering with the integrated completed likelihood, Pattern Analysis and Machine Intelligence, IEEE Transactions on 22 (7) (2000) 719–725.
 (32) V. D. Blondel, J. loup Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks (2008).

(33)
A. Noack, R. Rotta, Multilevel
algorithms for modularity clustering, CoRR abs/0812.4073.
URL http://arxiv.org/abs/0812.4073  (34) W. M. Rand, Objective criteria for the evaluation of clustering methods, Journal of the American Statistical association 66 (336) (1971) 846–850.
 (35) L. Nouedoui, P. Latouche, Bayesian non parametric inference of discrete valued networks, in: 21th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2013), Bruges, Belgium, 2013, pp. 291–296.
 (36) L. Isella, J. StehlÃ©, A. Barrat, C. Cattuto, J. Pinton, W. Van den Broeck, What’s in a crowd? analysis of facetoface behavioral networks, Journal of Theoretical Biology 271 (1) (2011) 166–180. doi:DOI:10.1016/j.jtbi.2010.11.033.
 (37) W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 3rd Edition, Cambridge University Press, 2007.