Dynamic Nonparametric Edge-Clustering Model
for Time-Evolving Sparse Networks
Interaction graphs, such as those recording emails between individuals or transactions between institutions, tend to be sparse yet structured, and often grow in an unbounded manner. Such behavior can be captured well by structured, nonparametric edge-exchangeable graphs. However, such exchangeable models necessarily ignore temporal dynamics in the network. We propose a dynamic nonparametric model for interaction graphs that combines the sparsity of the exchangeable models with dynamic clustering patterns that tend to reinforce recent behavioral patterns. We show that our method yields improved held-out likelihood over stationary variants, and strong predictive performance relative to a range of state-of-the-art dynamic interaction graph models.
Elahe Ghalebi (TU Wien), Hamidreza Mahyar (TU Wien), Radu Grosu (TU Wien), Sinead A. Williamson (University of Texas, Austin)
Preprint. Under review.
Many forms of social interaction can be represented in terms of a multigraph, where each individual interaction corresponds to an edge in the graph, and repeated interactions may occur between two individuals. For example, we might have multigraphs where the value of an edge corresponds to the number of emails between two individuals, or the number of packages sent between two computers.
Recently, the class of edge-exchangeable graphs [1, 2, 3] has been proposed for modeling networks as exchangeable sequences of edges. These models are able to capture many properties of large-scale social networks, such as sparsity, community structure, and power-law degree distributions.
Being explicit models for sequences of edges, the edge-exchangeable models are appropriate for networks that grow over time: we can add more edges by expanding the sequence, and their nonparametric nature means that we expect to introduce previously unseen vertices as the network expands. However, their exchangeable nature precludes graphs whose properties change over time. In practice, the dynamics of social interactions tend to vary over time. In particular, in models that aim to capture community dynamics, the popularity of a given community can wax and wane over time.
We propose a new model for sparse multigraphs with clustered edges that breaks the exchangeability of existing models by preferentially assigning edges to clusters that have been recently active. We show that incorporating dynamics using a mechanism based on the distance-dependent Chinese restaurant process (ddCRP) [4] leads to improved test-set predictive likelihood over exchangeable models. Further, when used in a link prediction task, our model shows improved performance over both its exchangeable counterpart and a range of state-of-the-art dynamic network models.
2 Background and related work
Our goal is to construct a Bayesian model for sparse interaction multigraphs where the interaction patterns can vary over time. Like many dynamic graph models, our model incorporates temporal dynamics into a previously defined stationary model. We begin by discussing Bayesian stationary models, before moving on to dynamic models. We end this section by discussing dynamic extensions of the Chinese restaurant process, which we will use in our construction.
2.1 Bayesian models for multigraphs
Bayesian models for multigraphs can loosely be divided into three camps. First, we have graphs where the value of an edge between vertices $i$ and $j$ is a random variable whose distribution is parametrized by the value of some function of $i$ and $j$. In the multigraph setting we consider in this paper, that distribution might be a Poisson distribution.
We refer to these graphs as jointly vertex-exchangeable, since the distribution over the adjacency matrix is invariant to jointly permuting the row and column indices. In this class, we have models such as the stochastic blockmodel, where vertices are clustered into finitely many communities and a parameter is associated with each community-community pair [5, 6]; the infinite relational model, where the number of communities is unbounded [7]; mixed-membership stochastic blockmodels, where edges are generated according to an admixture model [8]; the latent feature relational model, where parameter values are distributed according to a latent feature model based on the Indian buffet process [9]; and Poisson factor analysis, where the parameter values are distributed according to a gamma process-based latent factor model [10, 11]. While these models are able to capture interesting community structure, the resulting graphs are dense almost surely [12, 13]. This makes them a poor choice for large networks, which are typically sparse.
Second, we have multigraphs where the edges occur according to a Poisson process on the space of potential edges. The main example of such a model is [14], where the edges are sampled according to a Poisson process whose rate measure is distributed according to a generalized gamma process. Such a model has been extended to have community structure [15]. These models yield sparse graphs with power-law degree distributions, properties that are common in large social networks.
Third, we have multigraphs constructed using an exchangeable sequence of edges [1, 2]. Here, we assume the edges are generated by sequentially sampling pairs of vertices. These pairs of vertices are i.i.d. given some nonparametric prior, such as a Dirichlet process, a normalized generalized gamma process, or a Pitman-Yor process, resulting in a sparse multigraph. Unlike the Poisson process-based models, edge-exchangeable multigraphs can grow over time by adding new edges, either between new or previously seen vertices.
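As a minimal sketch of this third construction, the snippet below draws a sequence of edges whose endpoints follow a Pitman-Yor urn scheme, i.e. the predictive rule obtained after integrating out the random measure. The function name `sample_py_edges` and the parameters `concentration` and `discount` are illustrative assumptions, not notation taken from the cited models.

```python
import random
from collections import Counter


def sample_py_edges(num_edges, concentration=1.0, discount=0.5, seed=0):
    """Draw edges whose endpoints follow a Pitman-Yor urn scheme.

    Assumes concentration > 0 and 0 <= discount < 1.
    """
    rng = random.Random(seed)
    counts = []   # counts[v] = number of times vertex v has been an endpoint
    total = 0     # total number of endpoints drawn so far
    edges = []
    for _ in range(num_edges):
        pair = []
        for _ in range(2):  # sender first, then recipient
            num_vertices = len(counts)
            # Probability that this endpoint is a previously unseen vertex.
            p_new = (concentration + discount * num_vertices) / (concentration + total)
            if rng.random() < p_new:
                counts.append(1)
                vertex = num_vertices
            else:
                # An existing vertex v is chosen with weight counts[v] - discount.
                weights = [c - discount for c in counts]
                vertex = rng.choices(range(num_vertices), weights=weights)[0]
                counts[vertex] += 1
            total += 1
            pair.append(vertex)
        edges.append(tuple(pair))
    return edges


edges = sample_py_edges(2000)
multigraph = Counter(edges)  # maps (sender, recipient) to edge multiplicity
num_vertices = 1 + max(max(s, r) for s, r in edges)
```

Extending the sequence simply means calling the urn for more endpoints, which is why new vertices keep appearing as the multigraph grows.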
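Several extensions have been made to these edge-exchangeable graphs to incorporate community structure [3, 16]. Most relevant to this paper, the mixture of Dirichlet network distributions (MDND) [3] uses a mixture of edge-exchangeable models with shared infinite-dimensional support. Concretely, the MDND assumes that exchangeable sequences of links, where link $n$ has sender $s_n$ and recipient $r_n$, are generated according to:

$$
\begin{aligned}
H &\sim \mathrm{DP}(\gamma, \Theta), \\
A_k, B_k &\sim \mathrm{DP}(\tau, H), \quad k = 1, 2, \dots, \\
(c_n)_{n \ge 1} &\sim \mathrm{CRP}(\alpha), \\
s_n \mid c_n &\sim A_{c_n}, \qquad r_n \mid c_n \sim B_{c_n},
\end{aligned}
\tag{3}
$$

where $\Theta$ is a diffuse base measure over potential vertices.
Each link $n$ is associated with a cluster $c_n$, which governs which edge-exchangeable model it is generated from. The clusters are distributed according to a Chinese restaurant process (CRP), allowing an unbounded number of clusters. A hierarchical Dirichlet process formulation [17] ensures that the component edge-exchangeable models have common support, so that a vertex can appear in edges associated with multiple clusters.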
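One way to make this generative story concrete is to simulate it under a finite truncation. The sketch below is an illustration only and not the code used in the experiments: it assumes the shared vertex distribution is truncated to `max_vertices` atoms via stick-breaking, so that each cluster-specific sender and recipient distribution becomes a finite Dirichlet draw. The names `alpha`, `gamma`, `tau`, `stick_breaking` and `sample_mdnd` are introduced here for illustration.

```python
import numpy as np


def stick_breaking(rng, concentration, size):
    """Truncated stick-breaking weights; the last stick absorbs the remainder."""
    sticks = rng.beta(1.0, concentration, size=size)
    sticks[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    return sticks * remaining


def sample_mdnd(num_links, alpha=1.0, gamma=3.0, tau=5.0, max_vertices=30, seed=0):
    """Simulate (cluster, sender, recipient) triples from a truncated MDND."""
    rng = np.random.default_rng(seed)
    h = stick_breaking(rng, gamma, max_vertices)  # shared vertex popularity
    sender_dists, recipient_dists = [], []        # one pair per cluster
    cluster_sizes = []
    links = []
    for _ in range(num_links):
        # CRP step: join an existing cluster in proportion to its size,
        # or open a new one with weight alpha.
        weights = np.array(cluster_sizes + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(cluster_sizes):
            cluster_sizes.append(0)
            # With a finitely supported base measure, a DP draw is a Dirichlet draw.
            sender_dists.append(rng.dirichlet(tau * h))
            recipient_dists.append(rng.dirichlet(tau * h))
        cluster_sizes[k] += 1
        s = int(rng.choice(max_vertices, p=sender_dists[k]))
        r = int(rng.choice(max_vertices, p=recipient_dists[k]))
        links.append((k, s, r))
    return links


links = sample_mdnd(200)
```

Because every cluster draws its sender and recipient distributions from the same `h`, a vertex that is popular globally can appear in several clusters, which is the shared-support property described above.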
2.2 Models for dynamic graphs
There has been significant research attention on modelling dynamic (time-evolving) networks, ranging from non-Bayesian methods such as dynamic extensions of the exponential random graph model (ERGM) [18], or matrix and tensor factorization-based methods [19], to Bayesian latent variable models [20, 21, 22, 23, 24, 25, 26, 27]. A common approach relies on extending static network models to a dynamic framework. We focus here on dynamic extensions of Bayesian models of the forms discussed in Section 2.1.
Most dynamic Bayesian network models extend jointly vertex-exchangeable graphs. For example, [28] extends the stochastic blockmodel using an extended Kalman filter (EKF) based algorithm, and the stochastic block transition model [29] relaxes a hidden Markov assumption on the edge-level dynamics, allowing the presence or absence of edges to directly influence future edge probabilities. Several methods have also been used to incorporate temporal dynamics into the mixed membership stochastic blockmodel framework [30, 31, 32] and the latent feature relational model [33, 34, 35]. Most recently, several models have extended Poisson factor analysis. The dynamic gamma process Poisson factorization (DGPPF) [36] introduces dependency by incorporating a Markov chain of marginally gamma random variables into the latent representation. The dynamic Poisson gamma model (DPGM) [37] extends a bilinear form of Poisson factor analysis [38] in a similar manner; the dynamic relational gamma process model (DRGPM) [27] also incorporates a temporally dependent thinning process.
Much less work has been carried out on dynamic extensions of the sparse graphs generated using Poisson processes or via a sequence of exchangeable edges. In the Poisson process-based space, [25] uses a time-dependent base measure, and assumes edges have a geometric lifespan. In the edge-exchangeable case, [26] incorporates temporal dynamics into the MDND by introducing a latent Gaussian Markov chain and a Poisson vertex birth mechanism.
2.3 Dynamic nonparametric priors
Our model extends the MDND by replacing the exchangeable CRP-based clustering mechanism with a temporally varying clustering mechanism. A number of methods exist for incorporating temporal dynamics into the CRP, e.g. [39, 40, 41]. For our purposes, we choose to use the distance-dependent CRP (ddCRP) [4]. Recall that the CRP can be described in terms of a restaurant analogy, where customers select tables (clusters) with probability proportional to the number of people seated at that table, or sit at a new table with probability proportional to a concentration parameter $\alpha$. The ddCRP modifies this by encouraging customers to sit next to “similar” customers. In a time-dependent setting, similarity is evaluated based on arrival time using some non-negative, non-increasing decay function $f$ such that $f(\infty) = 0$. Concretely, if customers $i$ and $j$ arrive at times $t_i$ and $t_j$, with customers indexed in order of arrival, let

$$
d_{ij} = \begin{cases} t_i - t_j & \text{if } j < i, \\ \infty & \text{if } j > i. \end{cases}
$$

Then the $i$th customer picks a customer $\phi_i$ to sit next to (and therefore a cluster) according to

$$
P(\phi_i = j \mid D, \alpha) \propto \begin{cases} \alpha & \text{if } j = i, \\ f(d_{ij}) & \text{if } j \neq i, \end{cases}
$$

where $D = (d_{ij})$ collects the pairwise distances.
The CRP is recovered if $f(d) = 1$ for all $d < \infty$.
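The sequential, time-based ddCRP described above can be sketched in a few lines of Python. The code assumes that arrival times are sorted, that each customer may only follow itself or an earlier customer, and that `alpha > 0`; the names `sample_ddcrp`, `follows` and `clusters` are illustrative and not taken from [4].

```python
import math
import random


def sample_ddcrp(times, alpha, decay, seed=0):
    """Sample follow-links and induced clusters for a sequential ddCRP.

    `times` must be sorted in non-decreasing order and `alpha` must be > 0.
    """
    rng = random.Random(seed)
    follows = []
    for i in range(len(times)):
        # Customer i follows an earlier customer j with weight f(t_i - t_j),
        # or itself with weight alpha.
        weights = [decay(times[i] - times[j]) for j in range(i)] + [alpha]
        follows.append(rng.choices(range(i + 1), weights=weights)[0])
    # Links only point backwards, so a customer inherits the cluster of the
    # customer it follows; a self-link opens a new cluster.
    clusters = []
    num_clusters = 0
    for i, j in enumerate(follows):
        if j == i:
            clusters.append(num_clusters)
            num_clusters += 1
        else:
            clusters.append(clusters[j])
    return follows, clusters


arrival_times = [0, 0, 1, 1, 2, 5, 5, 6]
follows, clusters = sample_ddcrp(arrival_times, alpha=1.0, decay=lambda d: math.exp(-d))
```

Passing `decay=lambda d: 1.0` makes every earlier customer equally attractive, which gives each existing cluster total weight equal to its size and so reproduces the CRP.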
3 A dynamic edge-clustering graph model for time-evolving sparse graphs
We propose a dynamic extension of the MDND, which is appropriate for sparse, structured graphs with temporal dynamics. The MDND is based on a sequence of Dirichlet processes (see Eqn 3). One Dirichlet process (the distribution over the cluster indicators $c_n$, represented in Eqn 3 in terms of a CRP) governs the clustering structure of the edges. Another (the distribution over the shared base measure $H$) controls the number of vertices, and their overall popularity, within the graph. Finally, the distributions over the cluster-specific measures $A_k$ and $B_k$ control the cluster-specific distributions over the “sender” and “recipient” of edges in the graph.
Any of these distributions could be replaced with dynamic or dependent clustering models to generate a temporally evolving graph. In practice, replacing all of the distributions with dynamic alternatives is likely to lead to overspecification of the dependencies, making inference challenging. We choose to retain stationary models for $H$, $A_k$ and $B_k$, implying that a cluster’s representation stays stable over time, and allow the cluster popularities to vary by making the sequence $(c_n)$ of cluster indicators time-varying.
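We capture this variation using a ddCRP (see Section 2.3), yielding the generative process (for some decay function $f$)

$$
\begin{aligned}
H &\sim \mathrm{DP}(\gamma, \Theta), \\
A_k, B_k &\sim \mathrm{DP}(\tau, H), \quad k = 1, 2, \dots, \\
(c_n)_{n \ge 1} &\sim \mathrm{ddCRP}(\alpha, f, D), \\
s_n \mid c_n &\sim A_{c_n}, \qquad r_n \mid c_n \sim B_{c_n},
\end{aligned}
$$

where $c_n$ is the cluster of link $n$, $s_n$ and $r_n$ are its sender and recipient, $D$ holds the pairwise time distances between links, and the remaining components are as in the MDND.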
The ddCRP is well-suited to our use case, as it captures the tendency for recently active clusters to reappear. In an interaction network context, this implies that we are more likely to see modes of communication that have been popular in recent time periods than modes of communication that have fallen out of popularity. Another reason to favor the ddCRP is ease of inference: its construction lends itself to an easy-to-implement Gibbs sampler, allowing us to apply our method to larger graphs. By contrast, many other dependent Dirichlet processes have much more complicated inference algorithms, which would limit scalability.
A limitation of the ddCRP is that it assumes that all data has been observed up to the current time point; the distribution is not invariant to adding edges at previously observed time points. This is not a concern in our setting, since we are typically able to observe past instances of the full graph, and are interested in predicting future edges.
We perform inference by combining the ddCRP sampler of [4] with the original MDND sampler [3]. The original MDND sampler is based on the direct assignment inference algorithm for the hierarchical Dirichlet process [17], which assigns observations (in our case, links) to “tables” and represents the shared base measure $H$ using a finite-dimensional vector $\beta$ whose length is determined by $V$, the number of observed vertices. Concretely, let $\rho^{(s)}_{k,i}$ and $\rho^{(r)}_{k,i}$ be the number of tables in cluster $k$ associated with vertex $i$ as a sender and a recipient, respectively. Our procedure for inferring $\beta$, $\rho^{(s)}$ and $\rho^{(r)}$ exactly follows [3].
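Rather than sample the cluster assignment of link $n$ directly, we sample the link $\phi_n$ that link $n$ “follows”, or sits next to. Following [4], we first set the $n$th link to follow itself, i.e. $\phi_n = n$, and then sample a new value for $\phi_n$ based on the conditional probability that $\phi_n = j$,

$$
P(\phi_n = j \mid \phi_{-n}, z) \propto \begin{cases} \alpha \, p(z \mid \phi_{-n}, \phi_n = n) & \text{if } j = n, \\ f(d_{nj}) \, p(z \mid \phi_{-n}, \phi_n = j) & \text{if } j \neq n, \end{cases}
$$

where $z$ represents the edge structure of the graph.
Rather than directly calculate $p(z \mid \phi_{-n}, \phi_n = j)$, the likelihood of the graph given the entire partition, we calculate the ratio

$$
\Lambda_{nj} = \frac{p(z \mid \phi_{-n}, \phi_n = j)}{p(z \mid \phi_{-n}, \phi_n = n)},
$$

which is 1 if the clustering structure implied by $\phi_n = j$ is the same as that implied by $\phi_n = n$. Alternatively, if $\phi_n = j$ joins two clusters $k$ and $l$ that would be separate if $\phi_n = n$, then the ratio becomes

$$
\Lambda_{nj} = \frac{p(z_{k \cup l})}{p(z_k) \, p(z_l)},
$$

where $z_k$ is the subset of edges that are in cluster $k$ (note that since we have assigned edge $n$ to follow itself, edge $n$ belongs to exactly one of the two clusters). The cluster likelihood $p(z_k)$ depends on the edges only through $m^{(q)}_{k,i}$, the number of times vertex $i$ appears in cluster $k$ in role $q$ (sender or recipient), and $m_k$, the number of edges in cluster $k$. The inference algorithm is summarized in Algorithm 1.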
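To illustrate how the merge ratio can be computed from per-cluster counts, the sketch below assumes a Dirichlet-multinomial form for the cluster likelihood: conditional on a global vertex weight vector `beta` and a concentration `tau`, the sender counts and the recipient counts of a cluster are each treated as Dirichlet-multinomial with parameter `tau * beta`. This form is an assumption made here for illustration rather than a transcription of the paper's expression, and the function names are hypothetical.

```python
import math


def log_cluster_likelihood(sender_counts, recipient_counts, beta, tau):
    """Log Dirichlet-multinomial marginal likelihood of one cluster's edges.

    sender_counts[i] and recipient_counts[i] count how often vertex i appears
    in the cluster in each role; beta[i] > 0 is the global weight of vertex i.
    """
    concentration = tau * sum(beta)
    total = 0.0
    for counts in (sender_counts, recipient_counts):
        m_k = sum(counts)  # number of edges in the cluster
        total += math.lgamma(concentration) - math.lgamma(concentration + m_k)
        for m_ki, b_i in zip(counts, beta):
            total += math.lgamma(tau * b_i + m_ki) - math.lgamma(tau * b_i)
    return total


def log_merge_ratio(counts_k, counts_l, beta, tau):
    """log of p(z_{k u l}) / (p(z_k) p(z_l)) for two clusters k and l.

    Each argument is a (sender_counts, recipient_counts) pair.
    """
    (s_k, r_k), (s_l, r_l) = counts_k, counts_l
    s_kl = [a + b for a, b in zip(s_k, s_l)]
    r_kl = [a + b for a, b in zip(r_k, r_l)]
    return (log_cluster_likelihood(s_kl, r_kl, beta, tau)
            - log_cluster_likelihood(s_k, r_k, beta, tau)
            - log_cluster_likelihood(s_l, r_l, beta, tau))


beta = [0.5, 0.3, 0.2]
same_edge = ([1, 0, 0], [0, 1, 0])   # one edge from vertex 0 to vertex 1
other_edge = ([0, 0, 1], [1, 0, 0])  # one edge from vertex 2 to vertex 0
ratio_same = log_merge_ratio(same_edge, same_edge, beta, tau=2.0)
ratio_other = log_merge_ratio(same_edge, other_edge, beta, tau=2.0)
```

In a Gibbs sweep, the unnormalized log weight of having a link follow link $j$ would then be the log decay value plus this log ratio when the choice merges two clusters, and the log decay value alone when it does not.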
In this section, we address the following questions: (1) How well does DynMDND capture the underlying network behavior, as evaluated using test set log likelihoods, and (2) How well does DynMDND perform in a link prediction task, compared with other state-of-the-art dynamic network models? We explored these questions on four real-world datasets, detailed in Section 5.1.
We evaluated DynMDND using two decay functions, an exponential decay and a logistic decay. We explored several values for the decay parameter between 0.5 and 2, and found that a single setting worked well on all datasets. We placed Gamma(1, 1) priors on the concentration parameters $\alpha$, $\gamma$, and $\tau$, and sampled their values using the augmented samplers described in [17]. We initialized our algorithm using the Louvain graph clustering method. All the experiments were run on a single node of a compute cluster with 48 cores, and 2.67 GHz RAM, using Python code attached to this submission (the code will be made public following submission).
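For reference, the exponential and logistic decays can be sketched as below, assuming the parameterizations commonly used in the ddCRP literature [4], with a single decay parameter `a`. Whether the experiments used exactly these forms is an assumption, and the function names are illustrative.

```python
import math


def exponential_decay(d, a=1.0):
    """f(d) = exp(-d / a); equals 1 at d = 0 and tends to 0 as d grows."""
    return math.exp(-d / a)


def logistic_decay(d, a=1.0):
    """f(d) = exp(-d + a) / (1 + exp(-d + a)), written in a stable form."""
    x = a - d
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # underflows to 0.0 for very large distances
    return e / (1.0 + e)


def constant_decay(d, a=None):
    """f(d) = 1 for every finite distance, which recovers the CRP."""
    return 1.0


distances = [0.0, 0.5, 1.0, 2.0, 5.0]
exp_values = [exponential_decay(d, a=1.0) for d in distances]
log_values = [logistic_decay(d, a=1.0) for d in distances]
```

The exponential decay penalizes any gap in time, whereas the logistic decay behaves like a smooth window that treats links closer than roughly `a` as near-equivalent.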
We evaluate our model on four real-world networks. (1) The Face-to-Face dynamic contacts network (FFDC; http://www.sociopatterns.org/datasets/high-school-contact-and-friendship-networks/) [42] records face-to-face contacts among students across school days in Marseilles, France. We treat each day as one time slot, and place an edge between a pair of students in a time slot if at least one contact between them was recorded in that slot. (2) The Social Evolution network (SocialEv; http://realitycommons.media.mit.edu/socialevolution.html) [43], released by the MIT Human Dynamics Lab, tracks the everyday life of a whole undergraduate dormitory using mobile phones. We use the proximity surveys (observed Bluetooth connections), calls and SMS messages as the event-time observations. This network has a high clustering coefficient. (3) DBLP [44] maintains information on more than 800,000 computer scientist publications among 958 authors over ten years (1997-2006) in 28 conferences. We extract a subset of the most connected authors over the whole time period. We choose the snapshot interval to be one year, resulting in consecutive yearly snapshot networks. (4) Enron (https://www.cs.cmu.edu/~enron/) contains email interactions among users over 38 months (May 1999 to June 2002). We place an edge between a pair of users in a month if at least one email between them was recorded in that month. We use an initial subset of the monthly snapshots for the evaluation results.
5.2 Evaluation metrics
We study the effectiveness of DynMDND by evaluating it on two tasks: dynamic test-set likelihood prediction and dynamic link prediction.
Test set log likelihood. We held out 100 edges from each time slot as a test set, and trained our model on the remaining data. (We held out the values of the sender and recipient for the test set, but kept the time stamps, since the ddCRP method assumes the arrival times of all edges are known.) We then used a Chib-style estimator [45] to estimate the log likelihood of the test set, and report the mean and standard error of the log likelihood for each decay method. We compare our dynamic model with the various decays against the exchangeable MDND [3]. We implement the MDND using the same code, but with a distance of 1 between all edges; this setting reduces the ddCRP to the CRP.
Link Prediction. Test set log likelihood is useful for evaluating whether the model is a good fit for the data. However, in practical applications we often want to make concrete predictions for future network values. In the datasets described in Section 5.1, each discrete time step is associated with multiple edges, and within each time period there are no repeated edges. To predict the next $k$ edges in this context, we consider the probability distribution over the location of the next edge, and pick the $k$ highest-probability edges. This task allows us to compare with models that are not explicitly designed for edge prediction. We consider three state-of-the-art network models, discussed in Section 2.2: DRGPM [27], DPGM [37], and DGPPF [36]. None of these models is designed for explicit link prediction, but each can be modified to give predictions using the above procedure of selecting the highest-probability edges. These models also have the limitation of assuming a fixed number of vertices. While the edge-based dynamic model of [26] is a dynamic extension of the MDND and an appropriate comparison method, we were not able to compare against it due to a lack of available code.
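The selection step described above can be sketched as follows, assuming the predictive distribution over the next edge is available as a mixture over clusters. The inputs `cluster_weights`, `sender_probs` and `recipient_probs` are hypothetical stand-ins for whatever a fitted model provides, and the function names are illustrative.

```python
import numpy as np


def predictive_edge_probs(cluster_weights, sender_probs, recipient_probs):
    """P(sender = s, recipient = r) = sum_k w_k * A[k, s] * B[k, r]."""
    return np.einsum("k,ks,kr->sr", cluster_weights, sender_probs, recipient_probs)


def top_k_edges(edge_probs, k):
    """Return the k (sender, recipient) pairs with the highest probability."""
    probs = np.asarray(edge_probs, dtype=float)
    # Sort the flattened matrix in descending order; ties keep index order.
    flat_order = np.argsort(-probs, axis=None, kind="stable")[:k]
    senders, recipients = np.unravel_index(flat_order, probs.shape)
    return list(zip(senders.tolist(), recipients.tolist()))


# Toy example with two clusters and three vertices.
cluster_weights = np.array([0.7, 0.3])
sender_probs = np.array([[0.8, 0.1, 0.1],
                         [0.1, 0.1, 0.8]])
recipient_probs = np.array([[0.1, 0.8, 0.1],
                            [0.8, 0.1, 0.1]])
edge_probs = predictive_edge_probs(cluster_weights, sender_probs, recipient_probs)
predicted = top_k_edges(edge_probs, k=2)
```

Any model that yields a score for every candidate (sender, recipient) pair can be plugged into `top_k_edges`, which is what makes the comparison with models not designed for link prediction possible.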
For performance comparison, we use the F1 score, MAP@$k$ and Hits@$k$. The F1 score is 2 (precision × recall) / (precision + recall). Precision is the fraction of edges in the predicted future network that are present in the true network; recall is the fraction of edges of the true network that are present in the predicted future network. MAP@$k$ is the classical mean average precision measure, and Hits@$k$ is the hit rate of the top-$k$ ranked edges.
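The three metrics can be computed as in the sketch below. The exact conventions are assumptions made for illustration: average precision at $k$ is normalized by `min(k, number of true edges)`, and Hits@$k$ is taken to be the fraction of the top-$k$ ranked edges that are true edges. MAP@$k$ would then be the mean of `average_precision_at_k` over the predicted time slots.

```python
def f1_score(predicted, actual):
    """Harmonic mean of precision and recall over two edge sets."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


def average_precision_at_k(ranked, actual, k):
    """Average of precision@rank over the ranks (up to k) that hit a true edge."""
    actual = set(actual)
    hits, score = 0, 0.0
    for rank, edge in enumerate(ranked[:k], start=1):
        if edge in actual:
            hits += 1
            score += hits / rank
    denom = min(k, len(actual))
    return score / denom if denom else 0.0


def hits_at_k(ranked, actual, k):
    """Fraction of the top-k ranked edges that are true edges."""
    actual = set(actual)
    return sum(edge in actual for edge in ranked[:k]) / k


ranked_edges = [(0, 1), (2, 3), (1, 2)]   # predictions, best first
true_edges = {(0, 1), (1, 2)}
f1 = f1_score(ranked_edges[:2], true_edges)
ap3 = average_precision_at_k(ranked_edges, true_edges, k=3)
hits3 = hits_at_k(ranked_edges, true_edges, k=3)
```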
| Enron | -1032.94 ± 147.18 | -640.73 ± 90.51 | -700.88 ± 22.97 |
Table 1 shows the predictive log likelihood computed by our DynMDND method using two different decays (i.e. exponential and logistic) in comparison to the CRP decay function. At each time slot $t$, we use 80% of the network data for training the model and the remaining 20% for the test set. Incorporating time dependency into our mixture model results in a better predictive log likelihood. We use a sample size of 1000. We repeat the experiments 10 times and report the mean and standard deviation of the results over the four real networks.
Figure 1 compares the log likelihood over time achieved by DynMDND on the underlying evolving network, on all four datasets. DynMDND outperforms the CRP-based model, yielding better clustering, which suggests that accounting for time in the model can significantly improve inference.
Figure 2 illustrates the F1 score, MAP@$k$ and Hits@$k$ for DynMDND with all three decay types (exponential, logistic and CRP) against DRGPM, DPGM and DGPPF for dynamic link prediction. We use the networks of time slots 1 to $t$ as the training set and predict the network edges of time slot $t+1$. We report the results on three datasets, FFDC, DBLP and Enron, using time intervals of one day, one year and one month respectively. For each task, we repeat the experiments 10 times and report the mean and standard deviation of each evaluation metric.
We see that DynMDND significantly outperforms DRGPM, DPGM and DGPPF on all metrics for the task of dynamic link prediction. We hypothesise that this is due to several reasons. First, DynMDND is explicitly designed in terms of a predictive distribution over edges, making it well-suited to predicting future edges. Second, DynMDND is able to increase the number of vertices over time, and is likely better able to capture natural network growth. By contrast, the other methods assume the number of vertices is fixed, and explicitly incorporate the absence of edges at earlier time points into the likelihood.
We have presented a new model for interaction networks that can be represented in terms of sequences of links, such as email interaction graphs and collaboration graphs. Using a nonparametric sequence of links makes our model well-suited to predicting future links and, unlike many vertex-based graph models, allows for an unbounded number of vertices.
Unlike previous edge-sequence models, we explicitly incorporate temporal dynamics in our construction. As we saw in Section 5, this allows us to make more accurate predictions in real-world multigraphs where the underlying patterns of behavior move over time.
In this paper, we incorporate dynamics using a ddCRP model, which encourages edges to belong to clusters that have been recently active. An interesting avenue for future research would be to explore alternative forms of dependency, and to incorporate mechanisms that can capture link reciprocity [46].
References
[1] Diana Cai, Trevor Campbell, and Tamara Broderick. Edge-exchangeable graphs and sparsity. In Advances in Neural Information Processing Systems, pages 4249–4257, 2016.
[2] Harry Crane and Walter Dempsey. Edge exchangeable models for interaction networks. Journal of the American Statistical Association, 113(523):1311–1326, 2018.
[3] Sinead A. Williamson. Nonparametric network models for link prediction. Journal of Machine Learning Research, 17(1):7102–7121, 2016.
[4] David M. Blei and Peter I. Frazier. Distance dependent Chinese restaurant processes. Journal of Machine Learning Research, 12(Aug):2461–2488, 2011.
[5] T.A.B. Snijders and T. Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1):75–100, 1997.
[6] B. Karrer and M.E.J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.
[7] C. Kemp, J.B. Tenenbaum, T.L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In National Conference on Artificial Intelligence (AAAI), pages 381–388, 2006.
[8] Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9(Sep):1981–2014, 2008.
[9] Kurt Miller, Michael I. Jordan, and Thomas L. Griffiths. Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems, pages 1276–1284, 2009.
[10] Mingyuan Zhou and Lawrence Carin. Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):307–320, 2013.
[11] Prem Gopalan, Jake M. Hofman, and David M. Blei. Scalable recommendation with hierarchical Poisson factorization. In UAI, pages 326–335, 2015.
[12] David J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4):581–598, 1981.
[13] D.N. Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, 1979.
[14] François Caron and Emily B. Fox. Sparse graphs using exchangeable random measures. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5):1295–1366, 2017.
[15] Juho Lee, Lancelot F. James, Seungjin Choi, and François Caron. A Bayesian model for sparse graphs with flexible degree distribution and overlapping community structure. arXiv preprint arXiv:1810.01778, 2018.
[16] Tue Herlau, Mikkel N. Schmidt, and Morten Mørup. Completely random measures for modelling block-structured sparse networks. In Advances in Neural Information Processing Systems, pages 4260–4268, 2016.
[17] Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
[18] Fan Guo, Steve Hanneke, Wenjie Fu, and Eric P. Xing. Recovering temporally rewiring networks: A model-based approach. In Proceedings of the 24th International Conference on Machine Learning, pages 321–328. ACM, 2007.
[19] Daniel M. Dunlavy, Tamara G. Kolda, and Evrim Acar. Temporal link prediction using matrix and tensor factorizations. ACM Transactions on Knowledge Discovery from Data (TKDD), 5(2):10, 2011.
[20] Purnamrita Sarkar, Sajid M. Siddiqi, and Geoffrey J. Gordon. A latent space approach to dynamic embedding of co-occurrence data. In International Conference on Artificial Intelligence and Statistics, volume 2, pages 420–427, 2007.
[21] Katsuhiko Ishiguro, Tomoharu Iwata, Naonori Ueda, and Joshua B. Tenenbaum. Dynamic infinite relational model for time-varying relational data analysis. In Advances in Neural Information Processing Systems, pages 919–927, 2010.
[22] Purnamrita Sarkar, Deepayan Chakrabarti, Michael Jordan, et al. Nonparametric link prediction in large scale dynamic networks. Electronic Journal of Statistics, 8(2):2022–2065, 2014.
[23] Daniele Durante and David B. Dunson. Nonparametric Bayes dynamic modelling of relational data. Biometrika, 101(4):883–898, 2014.
[24] Aaron Schein, Mingyuan Zhou, David M. Blei, and Hanna Wallach. Bayesian Poisson Tucker decomposition for learning the structure of international relations. In Proceedings of the 33rd International Conference on Machine Learning, pages 2810–2819, 2016.
[25] Konstantina Palla, François Caron, and Yee Whye Teh. Bayesian nonparametrics for sparse dynamic networks. arXiv preprint arXiv:1607.01624, 2016.
[26] Yin Cheng Ng and Ricardo Silva. A dynamic edge exchangeable model for sparse temporal networks. arXiv preprint arXiv:1710.04008, 2017.
[27] Sikun Yang and Heinz Koeppl. Dependent relational gamma process models for longitudinal networks. In International Conference on Machine Learning, pages 5547–5556, 2018.
[28] Kevin S. Xu and Alfred O. Hero. Dynamic stochastic blockmodels for time-evolving social networks. IEEE Journal of Selected Topics in Signal Processing, 8(4):552–562, 2014.
[29] Kevin Xu. Stochastic block transition models for dynamic networks. In International Conference on Artificial Intelligence and Statistics, pages 1079–1087, 2015.
[30] Wenjie Fu, Le Song, and Eric P. Xing. Dynamic mixed membership blockmodel for evolving networks. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 329–336, 2009.
[31] Eric P. Xing, Wenjie Fu, Le Song, et al. A state-space mixed membership blockmodel for dynamic network tomography. The Annals of Applied Statistics, 4(2):535–566, 2010.
[32] Qirong Ho, Le Song, and Eric Xing. Evolving cluster mixed-membership blockmodel for time-evolving networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 342–350, 2011.
[33] James Foulds, Christopher DuBois, Arthur Asuncion, Carter Butts, and Padhraic Smyth. A dynamic relational infinite feature model for longitudinal social networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 287–295, 2011.
[34] Creighton Heaukulani and Zoubin Ghahramani. Dynamic probabilistic models for latent feature propagation in social networks. In International Conference on Machine Learning, pages 275–283, 2013.
[35] Myunghwan Kim and Jure Leskovec. Nonparametric multi-group membership model for dynamic networks. In Advances in Neural Information Processing Systems, pages 1385–1393, 2013.
[36] Ayan Acharya, Joydeep Ghosh, and Mingyuan Zhou. Nonparametric Bayesian factor analysis for dynamic count matrices. arXiv preprint arXiv:1512.08996, 2015.
[37] Sikun Yang and Heinz Koeppl. A Poisson gamma probabilistic model for latent node-group memberships in dynamic networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[38] Mingyuan Zhou. Infinite edge partition models for overlapping community detection and link prediction. In Artificial Intelligence and Statistics, pages 1135–1143, 2015.
[39] Steven N. MacEachern. Dependent Dirichlet processes. Unpublished manuscript, Department of Statistics, The Ohio State University, pages 1–40, 2000.
[40] Dahua Lin, Eric Grimson, and John W. Fisher. Construction of dependent Dirichlet processes based on Poisson processes. In Advances in Neural Information Processing Systems, pages 1396–1404, 2010.
[41] Lu Ren, David B. Dunson, and Lawrence Carin. The dynamic hierarchical Dirichlet process. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pages 824–831, New York, NY, USA, 2008. ACM.
[42] Rossana Mastrandrea, Julie Fournet, and Alain Barrat. Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE, 10(9):e0136497, 2015.
[43] Anmol Madan, Manuel Cebrian, Sai Moturu, Katayoun Farrahi, et al. Sensing the "health state" of a community. IEEE Pervasive Computing, 11(4):36–45, 2011.
[44] Sitaram Asur, Srinivasan Parthasarathy, and Duygu Ucar. An event-based framework for characterizing the evolutionary behavior of interaction graphs. ACM Transactions on Knowledge Discovery from Data (TKDD), 3(4):16, 2009.
[45] Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. Evaluation methods for topic models. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1105–1112, 2009.
[46] Charles Blundell, Jeff Beck, and Katherine A. Heller. Modelling reciprocating relationships with Hawkes processes. In Advances in Neural Information Processing Systems, pages 2600–2608, 2012.