A Poisson Gamma Probabilistic Model for Latent Node-group Memberships in Dynamic Networks
Abstract
We present a probabilistic model for learning from dynamic relational data, wherein the observed interactions among networked nodes are modeled via the Bernoulli-Poisson link function, and the underlying network structure is characterized by nonnegative latent node-group memberships, which are assumed to be gamma distributed. The latent memberships evolve according to Markov processes. The optimal number of latent groups can be determined from the data itself. The computational complexity of our method scales with the number of nonzero links, which makes it scalable to large sparse dynamic relational data. We present batch and online Gibbs sampling algorithms to perform model inference. Finally, we demonstrate the model’s performance on both synthetic and real-world datasets compared to state-of-the-art methods.
Sikun Yang, Heinz Koeppl
Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt
64283 Darmstadt, Germany
{sikun.yang, heinz.koeppl}@bcs.tu-darmstadt.de
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Introduction
Considerable work has been done on the analysis of static networks in terms of community detection or link prediction (?; ?; ?). However, due to the temporal evolution of nodes (e.g., individuals), their role within a network can change, and hence observed links among nodes may appear or disappear over time (?; ?). Given such dynamic network data, one may be interested in understanding the temporal evolution of groups in terms of their size and node-group memberships, and in predicting missing or future unobserved links based on historical records.
A dynamic network of N nodes can be represented as a sequence of adjacency matrices B^(1), ..., B^(T), where b_ij^(t) = 1 indicates the presence of a link between nodes i and j at time point t, and b_ij^(t) = 0 otherwise. For the sake of clarity we focus on undirected and unweighted networks, but the presented method can be extended to weighted networks via the compound Poisson distribution (?) and to multi-relational networks (?; ?).
Many of the current probabilistic methods for dynamic networks map observed binary edge variables (either links or non-links) to latent Gaussian random variables via the logistic or probit function (?). The time complexity of these approaches often scales quadratically with the number of nodes, i.e., O(N^2). This becomes infeasible for large networks and is especially inefficient for sparse networks. In this work we leverage the Bernoulli-Poisson link function (?; ?; ?) to map an observed binary edge to a latent Poisson count random variable, which leads to a computational cost that scales only with the number of nonzero edges. As large real-world networks are usually very sparse, the proposed method yields significant speedups and enables the analysis of larger dynamic networks. We allow for a time-varying nonnegative degree of membership of a node to a group. We realize this by constructing a gamma Markov chain to capture the time evolution of these latent membership variables. Inspired by recent advances in data augmentation and marginalization techniques (?), we develop an efficient, easy-to-implement Gibbs sampling algorithm for model inference. We also present an online Gibbs sampling algorithm that processes data in mini-batches and thus readily scales to massive sparse dynamic networks. The algorithms perform favorably on standard datasets when compared to state-of-the-art methods (?; ?; ?).
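To make the link function concrete, the following sketch shows the Bernoulli-Poisson link in code; the function name and the dense representation of memberships are our own illustration, not taken from the paper.

```python
import numpy as np

def link_probability(phi_i, phi_j, Lam):
    """Bernoulli-Poisson link: P(b_ij = 1) = 1 - exp(-phi_i^T Lam phi_j).

    phi_i, phi_j : nonnegative node-group membership vectors (length K)
    Lam          : nonnegative K x K group-interaction weight matrix
    """
    rate = phi_i @ Lam @ phi_j  # Poisson rate of the latent edge count
    return 1.0 - np.exp(-rate)
```

Because P(b_ij = 0) = exp(-rate), the log-likelihood contribution of all non-links reduces to a sum of rates computable from aggregate statistics, and latent counts need to be sampled only for observed links; this is what makes the cost scale with the number of nonzero edges.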
Dynamic Poisson Gamma Membership Model
In the proposed model, each node is characterized by a time-dependent latent membership variable that determines its interactions with or involvement in each group at each snapshot of the dynamic network. This latent node-group membership is modeled by a gamma random variable and is thus naturally nonnegative real-valued. This is in contrast to multi-group membership models (or latent feature models) (?; ?; ?), where each node-group membership is represented by a binary latent feature vector. These models assume that each node either associates with a group or not, simply by a binary feature. The proposed model, on the other hand, can characterize how strongly each node associates with multiple groups.
Dynamics of latent node-group memberships. For dynamic networks, the latent node-group memberships can evolve over time to explain the interaction dynamics among the nodes. For example, a latent group could represent “plays soccer”, and the corresponding membership value could encode how frequently a person plays soccer or how strongly that person likes playing soccer. The person’s degree of association with this group could increase over time due to, for instance, increased interaction with professional soccer players, or decrease over time as a consequence of sickness. Hence, in order to model the temporal evolution of the latent node-group memberships, we assume the individual memberships to form a gamma Markov chain. More specifically, each membership is drawn from a gamma distribution whose shape parameter is determined by the latent membership at the previous time point,
where the parameter controls the variance without affecting the mean.
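As a sketch, such a chain can be simulated as follows. The parameterization Gam(shape = τ·φ^(t-1), rate = τ) is one mean-preserving choice consistent with the stated property; the paper's exact parameterization is not recoverable from the text here, so treat it as an assumption.

```python
import numpy as np

def simulate_gamma_chain(phi0, tau, T, rng):
    """Simulate one latent membership trajectory phi_1, ..., phi_T.

    Assumed parameterization: phi_t ~ Gamma(shape = tau * phi_{t-1}, rate = tau),
    so E[phi_t | phi_{t-1}] = phi_{t-1} and Var[phi_t | phi_{t-1}] = phi_{t-1} / tau.
    """
    phi = np.empty(T)
    phi[0] = phi0
    for t in range(1, T):
        phi[t] = rng.gamma(shape=tau * phi[t - 1], scale=1.0 / tau)
    return phi
```

Larger τ yields smoother trajectories, while a small τ lets memberships change quickly between snapshots.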
Model of latent groups. We characterize the interactions or correlations among latent groups by a K × K matrix, whose entries relate to the probability of there being a link between a node affiliated with one group and a node affiliated with another. Specifically, we assume the latent groups to be generated by the following hierarchical process: we first generate a separate weight for each group as
(1) 
and then generate the inter-group interaction weights and the intra-group weights as
(2) 
The appropriate number of latent groups can be inferred from the dynamic relational data itself via the shrinkage mechanism of our model. More specifically, for a fixed truncation level on the number of groups, the redundant groups are effectively shrunk, as many of the group weights tend to be small when the truncation level is large. Thus, the interaction weights between a redundant group and all other groups, as well as all node memberships to that group, are shrunk accordingly. In practice, the intra-group weight of a group with a small weight would tend to almost zero, and the corresponding group would inevitably disappear. Hence, we use a separate intra-group variable to avoid overly shrinking small groups with few interactions with other groups. As this variable has a large effect on the number of latent groups, we do not treat it as a fixed parameter but place a gamma prior over it. Given the latent node-group memberships and the interaction weights among groups, the probability of there being a link between two nodes is given by
(3) 
Interestingly, we can also generate the binary link by truncating a latent count random variable at one, where the count can be seen as an integer-valued weight for the node pair and can be interpreted as the number of times the two nodes interacted. More specifically, the latent count can be drawn as
(4)  
(5) 
where Po(·) indicates the Poisson distribution. We can obtain Eq. (3) by marginalizing out the latent count from the above expression. The conditional distribution of the latent count can then be written as
where the first factor is the zero-truncated Poisson distribution, whose probability mass function (PMF) is that of a Poisson conditioned on being positive, and the conditioning set comprises all node-group membership variables. The usefulness of this construction will become clear in the inference section. We note that the latent count only needs to be sampled for node pairs with an observed link, using the rejection sampling detailed in (?). In summary, the proposed hierarchical generative model is as follows.
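The zero-truncated Poisson draw can also be made exactly without rejection; the following sketch (our own illustration, not necessarily the sampler cited above) uses the first-arrival decomposition of a unit-rate Poisson process on [0, λ].

```python
import numpy as np

def sample_ztp(lam, rng):
    """Draw x ~ Poisson(lam) conditioned on x >= 1 (zero-truncated Poisson).

    View x as the number of arrivals of a unit-rate Poisson process on [0, lam].
    Conditioned on at least one arrival, the first arrival time t has CDF
    (1 - exp(-t)) / (1 - exp(-lam)); the remaining arrivals on (t, lam] form
    an independent Poisson(lam - t) count.
    """
    u = rng.uniform()
    t = -np.log1p(-u * (1.0 - np.exp(-lam)))  # first arrival time, given >= 1 arrival
    return 1 + rng.poisson(lam - t)
```

Unlike naive rejection (redraw until nonzero), this costs one uniform and one Poisson draw regardless of how small λ is.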

For our model’s remaining hyperparameters, we place gamma priors and sample them within the Gibbs sampler. The graphical model is shown in Fig. 1.
Related Work
Approaches to analyzing dynamic networks range from non-Bayesian methods, such as exponential random graph models (?) and matrix and tensor factorization based methods (?), to Bayesian latent variable models (?; ?; ?; ?; ?). Our work falls into the latter class, and we hence confine ourselves to discussing its relation to this class. Dynamic extensions of mixed membership models, where each node is assigned to a set of latent groups via a multinomial distribution, have been developed (?; ?). One limitation of mixed membership models is that if the probability that a node associates with one group increases, its probability of associating with the other groups must decrease. Multi-group membership models instead use a binary latent feature vector to characterize each node’s group memberships, so a node’s membership to one group does not limit its memberships to other groups. However, differences in the degree of association of a node with different groups cannot be captured by such models (?; ?; ?). One possible extension is to introduce Gaussian distributed random variables to characterize how strongly each node is associated with different groups, as previously done for latent factor models (?). Such approaches, where membership variables evolve according to linear dynamical systems (?), can exploit a rich and efficient toolset for inference, such as Kalman filtering. However, the resulting signed latent features lack an intuitive interpretation, e.g., in terms of a degree of membership to a group. In contrast to these approaches, our model is based on a bilinear Poisson factor model (?), where each node’s group memberships are represented by nonnegative real-valued membership variables. The model not only allows each node to be associated with multiple groups but also captures the degree to which each node is associated with a group.
Our model thus combines the advantages of both mixed membership and multi-group membership models. We exploit a recent data augmentation technique (?) to construct a sampling scheme for the time evolution of the nonnegative latent features. Related to our work is the dynamic gamma process Poisson factorization model (D-GPPF) (?), where the underlying group structure can evolve over time but each node-group membership is static. This is in contrast to our approach, where the nodes’ memberships evolve over time. We note that the gamma Markov chain used by our method and by D-GPPF is motivated by the augmentation techniques in (?). In the experiment section we compare our model to (1) the hierarchical gamma process edge partition model (HGP-EPM) (?), which is the static counterpart of our model, (2) the dynamic relational infinite feature model (DRIFT) (?), which uses binary latent features to represent the node-group memberships and characterizes the temporal dependencies of latent features via a hidden Markov process, and (3) the D-GPPF model.
Inference
We present a Gibbs sampling procedure to draw samples of the model parameters from their posterior distribution given the observed dynamic relational data and the hyperparameters. To circumvent the technical challenge that the gamma Markov chain does not yield a closed-form posterior, we make use of data augmentation and marginalization techniques and of the gamma-Poisson conjugacy to derive closed-form updates.
Notation. When expressing the full conditionals for Gibbs sampling, we use the shorthand “–” to denote all other variables, or equivalently those in the Markov blanket of the respective variable according to Fig. 1. We use “·” as an index-summation shorthand.
We repeatedly exploit the following three results (?; ?; ?) to derive the conditional distributions used in our sampling algorithm.
Result 1. A negative binomially (NB) distributed random variable can be generated as a gamma-mixed Poisson: first drawing a gamma-distributed rate and then a Poisson count with that rate yields an NB-distributed count, as seen by marginalizing over the rate.
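Result 1 can be checked numerically. The sketch below assumes the NB(r, p) convention with mean r·p/(1−p), which pairs with a Gamma(r, scale = p/(1−p)) mixing distribution; NumPy’s `negative_binomial(n, p)` counts failures before the n-th success, hence the 1−p below.

```python
import numpy as np

rng = np.random.default_rng(0)
r, p = 3.0, 0.4
n = 100_000

# Gamma-mixed Poisson: draw a gamma rate, then a Poisson count with that rate
lam = rng.gamma(shape=r, scale=p / (1 - p), size=n)
x_mix = rng.poisson(lam)

# Direct NB draws with matching parameters
x_nb = rng.negative_binomial(r, 1 - p, size=n)
```

Both samples share the NB mean r·p/(1−p) and variance r·p/(1−p)², confirming the equivalence empirically.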
Result 2. A Poisson-logarithmic bivariate distributed pair (?), composed of an NB count together with a Chinese restaurant table (CRT) (?) distributed variable, can equivalently be expressed as a sum-logarithmic (SumLog) variable together with a Poisson variable.
Result 3. Let a count be the sum of independent Poisson random variables. Then, according to the Poisson-multinomial equivalence (?), the sum is itself Poisson with the summed rates, and, conditioned on the sum, the individual counts follow a multinomial distribution with probabilities proportional to their rates.
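Result 3 is what later lets the total latent count of an edge be thinned into per-group counts. A minimal numerical check (the rates are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
rates = np.array([0.5, 1.5, 3.0])
n = 50_000

# Forward construction: independent Poisson counts, then their sum
parts = rng.poisson(rates, size=(n, len(rates)))
totals = parts.sum(axis=1)

# Equivalent construction: draw the total, then split it multinomially
totals2 = rng.poisson(rates.sum(), size=n)
parts2 = np.array([rng.multinomial(t, rates / rates.sum()) for t in totals2])
```

The per-component empirical means of `parts` and `parts2` agree with the rates, and the totals have matching distributions.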
Gibbs sampling
Sampling latent counts. We sample a latent count for each observed edge at each time slice as
(6) 
Sampling individual counts. We can partition the latent count of each edge into per-group counts using the Poisson additive property. Then, via the Poisson-multinomial equivalence, we sample the individual latent counts as
(7) 
Sampling group weights. Via the Poisson additive property, we have
(8) 
where we introduced the correspondingly aggregated counts and rates. We can marginalize out the inter-group weights from Eqs. (8) and (2) using the gamma-Poisson conjugacy, which gives
where δ(·) denotes the Kronecker delta. According to Result 2, we introduce auxiliary variables as
(9) 
We then re-express the bivariate distribution over the latent counts and the auxiliary variables as
(10) 
Using Eqs. (1) and (10), via the gamma-Poisson conjugacy, we obtain the conditional distribution of the group weights as
(11) 
Sampling the intra-group weight. We resample the auxiliary variables using Eq. (9) and then exploit the gamma-Poisson conjugacy to sample the intra-group weight as
(12) 
Sampling inter-group weights. We sample each inter-group weight from its conditional distribution, obtained from Eqs. (2) and (8) via the gamma-Poisson conjugacy, as
(13) 
Sampling the hyperparameter. Using Eq. (10) and the Poisson additive property, we have
Marginalizing out the group weights using the gamma-Poisson conjugacy, we have
We introduce auxiliary variables and re-express the bivariate distribution as
(14) 
Using Eq. (14), we can then sample the hyperparameter via the gamma-Poisson conjugacy as
(15) 
Sampling latent memberships. Since the latent memberships evolve over time according to our Markovian construction, backward and forward information needs to be incorporated into their updates. We start from the last time slice. Via the gamma-Poisson conjugacy, we have
(16) 
Marginalizing out the latent membership yields
(17) 
According to Result 2, the NB distribution can be augmented with an auxiliary variable as
(18) 
We re-express the bivariate distribution over the count and the auxiliary variable as
(19) 
where
(20) 
Given the auxiliary variables, via the Poisson additive property, we have
(21) 
Combining the likelihood in Eq. (21) with the gamma prior placed on the membership, we immediately obtain its conditional distribution via the gamma-Poisson conjugacy as
(22)  
Here, the auxiliary variable can be considered as backward information passed between adjacent time slices. Recursively, we augment the count at each time slice with an auxiliary variable as
(23) 
where the NB distribution is obtained via the Poisson additive property and the gamma-Poisson conjugacy. Repeatedly using Result 2, we have
By repeatedly exploiting the Poisson additive property and the gamma-Poisson conjugacy, we obtain
(24)  
We sample the auxiliary variables and update the corresponding quantities recursively from the last time slice down to the first, which can be considered the backward filtering step. Then, in the forward pass, we sample the latent memberships from the first time slice to the last.
Sampling hyperparameters.
Via the gamma-gamma conjugacy, we sample the remaining hyperparameters as
(25)  
Algorithm 1 summarizes the full procedure.
Online Gibbs sampling
To make our model applicable to large-scale dynamic networks, we propose an online Gibbs sampling algorithm based on the recently developed Bayesian conditional density filtering (BCDF) (?), which has also been adapted for Poisson tensor factorization (?). The main idea of BCDF is to partition the data into small mini-batches and then to perform inference by updating sufficient statistics with one mini-batch per iteration. Specifically, the sufficient statistics used in our model are the latent counts. We use two index sets to denote the entire data and the mini-batch processed in the current iteration, respectively. We define the quantities updated with the mini-batch in each iteration as:
The main procedure of our online Gibbs sampler is then as follows. We first update the sufficient statistics used to sample model parameters as
where the update is a weighted average whose step size involves the decay factor commonly used for online methods (?). We calculate the sufficient statistics for each mini-batch and then resample the model parameters using the batch Gibbs sampling procedure outlined in Algorithm 1.
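A sketch of one such sufficient-statistic update follows. The step-size schedule ρ = (step + τ0)^(−κ) and the knobs τ0 and κ are our assumptions; the text only states that a decay factor is used.

```python
def update_suff_stats(s_old, batch_stat, n_total, n_batch, step, tau0=1.0, kappa=0.7):
    """One BCDF-style online update of a sufficient statistic (a sketch).

    The mini-batch statistic is rescaled to the full data size, then blended
    with the running statistic using a decaying step size rho.
    """
    rho = (step + tau0) ** (-kappa)          # decay factor, rho in (0, 1]
    scaled = (n_total / n_batch) * batch_stat  # unbiased full-data estimate
    return (1.0 - rho) * s_old + rho * scaled
```

Early iterations (large ρ) let the statistic track the data quickly; later iterations (small ρ) average over many mini-batches, stabilizing the estimate.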
Experiments
We evaluate our model by performing experiments on both synthetic and real-world datasets. First, we generate synthetic data with a true underlying network structure that evolves over time to test our model on dynamic community detection. For quantitative evaluation, we determine the model’s ability to predict held-out missing links. Our baseline methods are DRIFT, D-GPPF and HGP-EPM, as discussed before. For DRIFT, we use the default settings of the code released online (http://jfoulds.informationsystems.umbc.edu/code/DRIFT.tar.gz). We implemented D-GPPF ourselves, setting the hyperparameters and initializing the model parameters with the values provided in (?). For HGP-EPM, we used the code released for (?) (https://github.com/mingyuanzhou/EPM). In the following, we refer to our model as DPGM (Dynamic Poisson Gamma Membership model). For DPGM, we set the initial number of groups relative to the number of nodes; a sensitivity analysis with alternative settings yields similar results. For online Gibbs sampling, we fix the decay parameters and the mini-batch size. All our experiments were performed on a standard desktop with a 2.7 GHz CPU and 24 GB RAM. Following (?), we generate a set of small-scale dynamic networks from real-world relational data and use held-out relational data to evaluate our model. The following three real-world datasets are used in our experiments; their details are summarized in Table 1.
NIPS. The dataset records the co-authorship information among 5722 authors on publications in NIPS conferences over the past ten years (?). We first take the 70 authors who are most connected across all years to evaluate all methods (NIPS 70). We also use the whole dataset (NIPS 5K) for evaluation.
Enron. The dataset contains 136776 emails among 2359 persons over 28 months (?). We generate a binary symmetric matrix for each monthly snapshot: the presence or absence of an email between each pair of persons during one month is encoded by the binary link at that time. We first select 61 persons by taking a 7-core of the aggregated network over the entire time span and filtering out persons with email records in fewer than 5 snapshots (Enron 61). We also use the whole dataset for evaluation (Enron 2K).
DBLP. The DBLP dynamic network dataset (?) is generated from the co-authorship records of 347013 authors over 25 years, a subset of the data contained in the DBLP database. We first choose 7750 authors by taking a 7-core of the aggregated network over the entire time span (DBLP 7K) and subsequently filter out authors with less than 10 years of consecutive publication activity to generate a small dataset (DBLP 96). The proposed method and the baselines are applied to all six datasets, except that DRIFT could not be applied to NIPS 5K, Enron 2K and DBLP 7K due to its unfavorable computational complexity. Most of these datasets exhibit strong sparsity, which the proposed algorithm can exploit through its Bernoulli-Poisson link function.
Table 1: Summary of the datasets.

Datasets       | NIPS 70 | DBLP 96 | Enron 61
Nodes          | 70      | 96      | 61
Time slices    | 10      | 25      | 28
Nonzero links  | 528     | 1392    | 1386

Datasets       | NIPS 5K | DBLP 7K | Enron 2K
Nodes          | 5722    | 7750    | 2359
Time slices    | 10      | 10      | 28
Nonzero links  | 5514    | 108980  | 76828
Dynamic community detection


[Figure 2: panels (a), (b), (c), (d), (e).]
Table 2: AUC-ROC and AUC-PR for each model (values per dataset).

Model          | NIPS 70 (AUC-ROC, AUC-PR) | DBLP 96 (AUC-ROC, AUC-PR) | Enron 61 (AUC-ROC, AUC-PR)
HGP-EPM        |                           |                           |
D-GPPF         |                           |                           |
DRIFT          |                           |                           |
DPGM (batch)   |                           |                           |

Model          | NIPS 5K (AUC-ROC, AUC-PR) | DBLP 7K (AUC-ROC, AUC-PR) | Enron 2K (AUC-ROC, AUC-PR)
HGP-EPM        |                           |                           |
D-GPPF         |                           |                           |
DPGM (batch)   |                           |                           |
DPGM (online)  |                           |                           |
Table 3: Per-iteration computation time.

Model          | NIPS 70 | DBLP 96 | Enron 61
D-GPPF         | 0.0388  | 0.1350  | 0.2161
DRIFT          | 11.7047 | 42.1853 | 24.7505
DPGM (batch)   | 0.1283  | 0.6302  | 0.7364

Model          | NIPS 5K | DBLP 7K | Enron 2K
D-GPPF         | 7.6440  | 7.9160  | 7.8227
DPGM (batch)   | 10.6240 | 15.6576 | 15.8584
DPGM (online)  | 8.9501  | 10.8152 | 10.4521
We adapt the synthetic example used in (?) to generate a dynamic network that evolves over five time slices, as shown in Fig. 2. More specifically, we generate three groups at the first time slice and split the second group at a later slice; subsequently, the second and third group merge into one group. In Fig. 2, columns (b) and (d) show the latent groups discovered over time by D-GPPF and DPGM, respectively. D-GPPF captures the evolution of the discovered groups but has difficulties characterizing the changes of node-group memberships over time. Our model (DPGM) detects the dynamic groups quite distinctively. We also show the associations of the nodes to the inferred latent groups by D-GPPF and DPGM in columns (c) and (e), respectively. In particular, we calculate the association weight of each node to each latent group for both models. For both models, most of the redundant groups are effectively shrunk, even though both algorithms are initialized with a large number of groups. The node-group association weights estimated by DPGM vary smoothly over time and capture the evolution of the node-group memberships, consistent with the ground truth shown in column (a).
Missing link prediction
For the task of missing link prediction, we randomly hold out a fraction of the observed interactions (either links or non-links) at each time slice as test data; the remaining data is used for training. HGP-EPM, DRIFT and D-GPPF are considered as the baseline methods. We train an HGP-EPM model on the training entries of each time slice separately. For each method, we use 2000 burn-in iterations and collect 1000 samples of the model posterior. We estimate the posterior mean of the link probability for each held-out edge in the test data by averaging over the collected Gibbs samples. We then use these link probabilities to evaluate the predictive performance of each model by calculating the area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR). Table 2 shows the average evaluation metrics for each model over 10 runs. Overall, our model (DPGM) shows the best performance. We observe that both DRIFT and DPGM outperform D-GPPF because the evolution of individual node-group memberships is explicitly captured by these two models. D-GPPF essentially assumes that the nodes’ memberships are static over time and thus has difficulties fully capturing the dynamics of each node’s interactions caused by that node’s evolving memberships. DPGM also outperforms its static counterpart, HGP-EPM, by capturing the evolution of the nodes’ memberships over time. We also compare the per-iteration computation time of each model (all models are implemented in Matlab), as shown in Table 3. The computational cost of DRIFT scales quadratically with the number of nodes. Both D-GPPF and DPGM are much faster than DRIFT because the former two models scale only with the number of nonzero edges. We also report the per-iteration computation time of D-GPPF and DPGM with a Matlab/MEX/C implementation on medium-scale data in Table 3.
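For reference, AUC-ROC can be computed directly from the posterior-mean link probabilities with a rank statistic (a generic sketch assuming no tied scores; any standard library routine would serve equally well):

```python
import numpy as np

def auc_roc(y_true, scores):
    """Rank-based AUC-ROC: the probability that a randomly chosen held-out link
    receives a higher score than a randomly chosen held-out non-link."""
    scores = np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = np.asarray(y_true) == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    # Mann-Whitney U statistic, normalized to [0, 1]
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

AUC-PR is computed analogously from the precision-recall curve; it is the more informative metric here because held-out non-links vastly outnumber held-out links in sparse networks.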
Conclusion
We have presented a probabilistic model for learning from dynamic relational data. The evolution of the underlying structure is characterized by the Markovian construction of latent memberships. We also proposed efficient batch and online Gibbs sampling algorithms that make use of data augmentation techniques. Experimental results on synthetic and real datasets illustrate our model’s interpretable latent representations and competitive performance. Our model is dedicated to dynamic network modeling but can also be considered for related problems such as dynamic multi-relational graph modeling (?). Another interesting direction is to scale up the model inference via stochastic gradient variational Bayes (?).
Acknowledgments
The authors thank Adrian Šošić and the reviewers for constructive comments. This work is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement 668858.
References
 [Acharya et al. 2015] Acharya, A., et al. 2015. Nonparametric dynamic network modeling. In KDD Workshop on MLT, 104–113.
 [Airoldi et al. 2008] Airoldi, E. M., et al. 2008. Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9:1981–2014.
 [Basbug and Engelhardt 2016] Basbug, M. E., and Engelhardt, B. E. 2016. Hierarchical compound Poisson factorization. In ICML, 1795–1803.
 [Bordes et al. 2011] Bordes, A., et al. 2011. Learning structured embeddings of knowledge bases. In AAAI, 301–306.
 [Caron and Fox 2015] Caron, F., and Fox, E. B. 2015. Sparse graphs using exchangeable random measures. CoRR arXiv:1401.1137.
 [Dunlavy et al. 2011] Dunlavy, D. M., et al. 2011. Temporal link prediction using matrix and tensor factorizations. ACM Trans. Knowl. Discov. Data 5(2):10:1–10:27.
 [Dunson and Herring 2005] Dunson, D. B., and Herring, A. H. 2005. Bayesian latent variable models for mixed discrete outcomes. Biostatistics 6(1):11–25.
 [Durante et al. 2014] Durante, D., et al. 2014. Bayesian logistic Gaussian process models for dynamic networks. In AISTATS, 194–201.
 [Foulds et al. 2011] Foulds, J. R., et al. 2011. A dynamic relational infinite feature model for longitudinal social networks. In AISTATS, 287–295.
 [Fu et al. 2009] Fu, W., et al. 2009. Dynamic mixed membership blockmodel for evolving networks. In ICML, 329–336.
 [Goldenberg et al. 2010] Goldenberg, A., et al. 2010. A survey of statistical network models. Found. Trends Mach. Learn. 2(2):129–233.
 [Guhaniyogi et al. 2014] Guhaniyogi, R., et al. 2014. Bayesian conditional density filtering for big data. CoRR abs/1401.3632.
 [Heaukulani et al. 2013] Heaukulani, C., et al. 2013. Dynamic probabilistic models for latent feature propagation in social networks. In ICML, 275–283.
 [Ho et al. 2011] Ho, Q., et al. 2011. Evolving cluster mixedmembership blockmodel for timeevolving networks. In AISTATS, 342–350.
 [Hoff et al. 2001] Hoff, P. D., et al. 2001. Latent space approaches to social network analysis. JASA 97:1090–1098.
 [Hu et al. 2015] Hu, C., et al. 2015. Zerotruncated Poisson tensor factorization for massive binary tensors. In UAI, 375–384.
 [Ishiguro et al. 2010] Ishiguro, K., et al. 2010. Dynamic infinite relational model for timevarying relational data analysis. In NIPS, 919–927.
 [Kemp et al. 2006] Kemp, C., et al. 2006. Learning systems of concepts with an infinite relational model. In AAAI, 381–388.
 [Kim et al. 2013] Kim, M., et al. 2013. Nonparametric multigroup membership model for dynamic networks. In NIPS, 1385–1393.
 [Knowles 2015] Knowles, D. A. 2015. Stochastic gradient variational Bayes for gamma approximating distributions. arXiv preprint arXiv:1509.01631.
 [Mucha et al. 2010] Mucha, P. J., et al. 2010. Community structure in timedependent, multiscale, and multiplex networks. Science 328(5980):876–878.
 [Nickel et al. 2016] Nickel, M., et al. 2016. A review of relational machine learning for knowledge graphs. Proc. of the IEEE 104(1):11–33.
 [Pitman 2006] Pitman, J. 2006. Combinatorial stochastic processes. Berlin: SpringerVerlag. Lectures on Probability Theory.
 [Quenouille 1949] Quenouille, M. H. 1949. A relation between the logarithmic, Poisson, and negative binomial series. Biometrics 5(2):162–164.
 [Rai and Daume III 2008] Rai, P., and Daume III, H. 2008. The infinite hierarchical factor regression model. In NIPS.
 [Robins et al. 2007] Robins, G.; Pattison, P.; Kalish, Y.; and Lusher, D. 2007. An introduction to exponential random graph (p*) models for social networks. Social Networks 29(2):173–191.
 [Rosenfeld et al. 2014] Rosenfeld, N., et al. 2014. Learning structured models with the auc loss and its generalizations. In AISTATS, 841–849.
 [Tang et al. 2008] Tang, L., et al. 2008. Community evolution in dynamic multimode networks. In KDD, 677–685.
 [Tay et al. 2017] Tay, Y., et al. 2017. Nonparametric estimation of multiple embeddings for link prediction on dynamic knowledge graphs. In AAAI, 1243–1249.
 [Xing et al. 2010] Xing, E. P., et al. 2010. A statespace mixed membership blockmodel for dynamic network tomography. Ann. Appl. Stat. 4(2):535–566.
 [Zhou and Carin 2015] Zhou, M., and Carin, L. 2015. Negative binomial process count and mixture modeling. IEEE Trans. Pattern Anal. Mach. Intell. 37(2):307–320.
 [Zhou 2015] Zhou, M. 2015. Infinite edge partition models for overlapping community detection and link prediction. In AISTATS, 1135–1143.