Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data
We propose a novel class of network models for temporal dyadic interaction data. Our goal is to capture a number of important features often observed in social interactions: sparsity, degree heterogeneity, community structure and reciprocity. We propose a family of models based on self-exciting Hawkes point processes in which events depend on the history of the process. The key component is the conditional intensity function of the Hawkes process, which captures the fact that interactions may arise as a response to past interactions (reciprocity), or due to shared interests between individuals (community structure). In order to capture the sparsity and degree heterogeneity, the base (non-time-dependent) part of the intensity function builds on compound random measures following Todeschini2016. We conduct experiments on a variety of real-world temporal interaction data and show that the proposed model outperforms many competing approaches for link prediction, and leads to interpretable parameters.
There is a growing interest in modelling and understanding temporal dyadic interaction data. Temporal interaction data take the form of time-stamped triples $(t, i, j)$ indicating that an interaction occurred between individuals $i$ and $j$ at time $t$. Interactions may be directed or undirected. Examples of such interaction data include commenting on a post on an online social network, exchanging an email, or meeting in a coffee shop. An important challenge is to understand the underlying structure that underpins these interactions. To do so, it is important to develop statistical network models with interpretable parameters.
One important aspect to capture is the community structure of the interactions. Individuals are often affiliated to some latent communities (e.g. work, sport, etc.), and their affiliations determine their interactions: they are more likely to interact with individuals sharing the same interests than with individuals affiliated to different communities. Another important aspect is reciprocity. Many interactions are responses to recent interactions. For example, if Helen sends an email to Mary, then Mary is more likely to send an email to Helen shortly afterwards.
A number of papers have proposed statistical models to capture both community structure and reciprocity in temporal interaction data (Blundell2012; Dubois2013; Linderman2014). They use models based on Hawkes processes for capturing reciprocity and stochastic block-models or latent feature models for capturing community structure.
In addition to the above two properties, it is important to capture the global properties of the interaction data. Interaction data are often sparse: only a small fraction of the pairs of nodes actually interact. Additionally, they typically exhibit high degree (number of interactions per node) heterogeneity: some individuals have a large number of interactions, whereas most individuals have very few, therefore resulting in empirical degree distributions being heavy-tailed. As shown by Karrer2011, Gopalan2013 and Todeschini2016, failing to account explicitly for degree heterogeneity in the model can have devastating consequences on the estimation of the latent structure.
Recently, two classes of statistical models, based on random measures, have been proposed to capture sparsity and power-law degree distribution in network data. The first one is the class of models based on exchangeable random measures (Caron2017; Veitch2015; Herlau2015; Borgs2016; Todeschini2016; Palla2016; Janson2017). The second one is the class of edge-exchangeable models (Crane2015; Crane2017; Cai2016; Williamson2016; Janson2017a; Ng2017). Both classes of models can handle both sparse and dense networks and, although the two constructions are different, connections have been highlighted between the two approaches (Cai2016; Janson2017a).
The objective of this paper is to propose a class of statistical models for temporal dyadic interaction data that can capture sparsity, degree heterogeneity, community structure and reciprocity. The model builds on both Hawkes processes and the (static) model of Todeschini2016 for sparse and modular graphs with overlapping community structure. The approach uses multivariate completely random measures, and can be seen as a natural extension of both models based on exchangeable random measures and edge-exchangeable models. Moreover, our model can be seen as a generalisation of existing reciprocating relationships models (Blundell2012) to the sparse and power-law regime. In Section 2, we present Hawkes processes and compound completely random measures, which form the basis of our model's construction. The statistical model for temporal dyadic data is presented in Section 3 and its properties are derived in Section 4. The inference algorithm is described in Section 5. Section 6 presents experiments in which we show that this model outperforms alternative models for link prediction.
2 Background material
2.1 Hawkes processes
Let $(t_1, t_2, \ldots)$ be a sequence of event times with $t_i \geq 0$, and let $\mathcal{H}_t = (t_i)_{i \geq 1,\, t_i \leq t}$ be the subset of event times between time $0$ and time $t$. Let $N(t) = \sum_{i \geq 1} 1_{t_i \leq t}$ denote the number of events between time $0$ and time $t$, where $1_A = 1$ if $A$ is true, and 0 otherwise. Assume that $N(t)$ is a counting process with conditional intensity function $\lambda(t)$, that is, for any $t \geq 0$ and any infinitesimal interval $[t, t + dt)$,
$$\Pr\left(N(t + dt) - N(t) = 1 \mid \mathcal{H}_t\right) = \lambda(t)\,dt. \tag{1}$$
The counting process $N(t)$ is called a Hawkes process with self-excitation (self_exc_HP) if the conditional intensity function takes the form
$$\lambda(t) = \mu + \int_{[0,t)} g_\phi(t - u)\,dN(u) = \mu + \sum_{i:\, t_i < t} g_\phi(t - t_i), \tag{2}$$
where $\mu > 0$ is the so-called base intensity and $g_\phi$ is a non-negative kernel function parameterised by $\phi$. We will assume that the kernel satisfies $g_\phi(t) \geq 0$ for $t > 0$ and $g_\phi(t) = 0$ for $t < 0$. This defines a process in which the current rate of events depends on the occurrence of past events. If $g_\phi$ decays quickly, this results in strong local effects. If instead it peaks away from the origin, longer-term effects are likely to occur. We consider here an exponential kernel
$$g_\phi(t) = \eta\, e^{-\delta t}, \tag{3}$$
where $\phi = (\eta, \delta)$, $\eta \geq 0$ determines the sizes of the self-excited jumps and $\delta > 0$ is the constant rate of exponential decay. The stationarity condition for this Hawkes process is $\eta < \delta$. Figure 1 gives an illustration of a Hawkes process with exponential kernel and its conditional intensity.
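To make the role of the exponential kernel concrete, the following minimal Python sketch simulates a self-exciting process of this kind with Ogata's thinning algorithm. It is an illustration only, not code from this paper: the function names `intensity` and `simulate_hawkes`, and the argument names `mu` (base intensity), `eta` (jump size) and `delta` (decay rate), are assumptions made for the example.

```python
import math
import random


def intensity(t, events, mu, eta, delta):
    """Conditional intensity: mu plus an exponentially decaying jump per past event."""
    return mu + sum(eta * math.exp(-delta * (t - s)) for s in events if s < t)


def simulate_hawkes(mu, eta, delta, horizon, seed=0):
    """Simulate event times on [0, horizon] by thinning (requires mu > 0)."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Between events the intensity only decays, so its value just after
        # the current time (counting an event at t, if any) is an upper bound.
        bound = mu + sum(eta * math.exp(-delta * (t - s)) for s in events)
        # Propose the next candidate from a homogeneous process of rate `bound`.
        t += rng.expovariate(bound)
        if t > horizon:
            return events
        # Accept the candidate with probability intensity(t) / bound.
        if rng.random() * bound <= intensity(t, events, mu, eta, delta):
            events.append(t)


if __name__ == "__main__":
    times = simulate_hawkes(mu=0.5, eta=0.8, delta=1.2, horizon=50.0, seed=1)
    print(len(times), "events; first few:", [round(s, 3) for s in times[:5]])
```

With `eta < delta`, each event triggers on average `eta / delta < 1` further events, which is why the simulated process does not explode.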
2.2 Compound completely random measures
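A homogeneous completely random measure (CRM) (Kingman1967; Kingman1993) on $\mathbb{R}_+$, without fixed atoms nor deterministic component, takes the form
$$W = \sum_{i \geq 1} w_i\, \delta_{\theta_i}, \tag{4}$$
where $(w_i, \theta_i)_{i \geq 1}$ are the points of a Poisson process on $(0, \infty) \times \mathbb{R}_+$ with mean measure $\rho(dw)\,H(d\theta)$, where $\rho$ is a Lévy measure, $H$ is a locally bounded measure and $\delta_x$ is the Dirac delta mass at $x$. The homogeneous CRM is completely characterized by $\rho$ and $H$, and we write $W \sim \mathrm{CRM}(\rho, H)$, or simply $W \sim \mathrm{CRM}(\rho)$ when $H$ is taken to be the Lebesgue measure. Griffin2017 proposed a multivariate generalisation of CRMs, called compound CRM (CCRM). A compound CRM $(W_1, \ldots, W_p)$ with independent scores is defined as
$$W_k = \sum_{i \geq 1} w_{ik}\, \delta_{\theta_i}, \qquad k = 1, \ldots, p, \tag{5}$$
where $w_{ik} = \beta_{ik} w_{i0}$, the scores $\beta_{ik} \geq 0$ are independently distributed from some probability distribution $F_k$, and $W_0 = \sum_{i \geq 1} w_{i0}\, \delta_{\theta_i}$ is a CRM with mean measure $\rho_0(dw_0)\,H_0(d\theta)$. In the rest of this paper, we will assume that $F_k$ is a gamma distribution with parameters $(a_k, b_k)$, $H_0$ is the Lebesgue measure and $\rho_0$ is the Lévy measure of a generalized gamma process
$$\rho_0(dw) = \frac{1}{\Gamma(1 - \sigma)}\, w^{-1-\sigma} e^{-\tau w}\,dw, \tag{6}$$
where $\sigma \in (-\infty, 1)$ and $\tau > 0$.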
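As an illustration of the compound construction, the sketch below draws the atoms of a compound CRM with gamma scores whose labels fall in a window $[0, \alpha]$. It rests on assumptions that should be read as such: the Lévy measure of the generalized gamma process is taken to be $\rho_0(dw) = \Gamma(1-\sigma)^{-1} w^{-1-\sigma} e^{-\tau w}\,dw$, the gamma scores use a shape/rate parameterisation, and only the finite-activity case $\sigma < 0$ is covered, where the total mass of $\rho_0$ is $\tau^{\sigma}/(-\sigma)$ and the normalised measure is a Gamma($-\sigma$, $\tau$) density. The case $\sigma \geq 0$ has infinitely many atoms and needs a truncation or dedicated sampler, which is not attempted here. The function name `sample_ccrm` is hypothetical.

```python
import random


def sample_ccrm(alpha, sigma, tau, a, b, seed=0):
    """Sample compound-CRM atoms with labels in [0, alpha], for sigma < 0 only.

    Returns a list of (theta_i, w_i0, [w_i1, ..., w_ip]) tuples.
    """
    if not (sigma < 0 and tau > 0):
        raise ValueError("sketch covers only the finite-activity case sigma < 0, tau > 0")
    rng = random.Random(seed)
    # Total mass of rho_0 is tau**sigma / (-sigma); with Lebesgue base measure
    # the number of atoms with label in [0, alpha] is Poisson with this mean.
    mean_atoms = alpha * tau ** sigma / (-sigma)
    # Poisson draw: count unit-rate exponential arrivals before mean_atoms.
    n_atoms, clock = 0, rng.expovariate(1.0)
    while clock < mean_atoms:
        n_atoms += 1
        clock += rng.expovariate(1.0)
    atoms = []
    for _ in range(n_atoms):
        theta = rng.uniform(0.0, alpha)  # node label
        # Base weight: Gamma(shape=-sigma, rate=tau); gammavariate takes a scale.
        w0 = rng.gammavariate(-sigma, 1.0 / tau)
        # Community weights: base weight times an independent gamma score.
        w = [w0 * rng.gammavariate(ak, 1.0 / bk) for ak, bk in zip(a, b)]
        atoms.append((theta, w0, w))
    return atoms


if __name__ == "__main__":
    for theta, w0, w in sample_ccrm(20.0, -1.0, 1.0, a=[1.0, 2.0], b=[1.0, 2.0])[:3]:
        print(round(theta, 2), round(w0, 3), [round(x, 3) for x in w])
```

Each atom shares one base weight across all communities, so a node with a large base weight tends to have large weights in every community, while the scores redistribute that mass between communities.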
3 Statistical model for temporal interaction data
The temporal interaction data are of the form $\mathcal{D} = (t_k, i_k, j_k)_{k \geq 1}$ where $(t_k, i_k, j_k)$ represents an (undirected) interaction at time $t_k$ between nodes/individuals $i_k$ and $j_k$. For example, the data may correspond to the exchange of messages between students on an online social network.
Our construction will model this using a point process on $\mathbb{R}_+^3$, by considering that each node $i$ is assigned some continuous label $\theta_i \geq 0$. Note that the labels are just used for the model construction, similarly to Caron2017 and Todeschini2016, and these labels will be neither observed nor inferred from the data. A point at location $(t, \theta_i, \theta_j)$ indicates that the nodes with labels $\theta_i$ and $\theta_j$ interact at time $t$. See Figure 2 for an illustration.
For each pair of nodes $i$ and $j$ with labels $\theta_i, \theta_j$, let $N_{ij}(t)$ be the counting process defined by
$$N_{ij}(t) = \sum_{k \geq 1:\ (i_k, j_k) = (i, j)} 1_{t_k \leq t},$$
which counts the number of interactions between nodes $i$ and $j$ in the time interval $[0, t]$. We assume that, for each pair $(i, j)$, the counting process $N_{ij}(t)$ is modelled by a self-exciting Hawkes process with conditional intensity function
$$\lambda_{ij}(t) = \mu_{ij} + \int_{[0,t)} g_\phi(t - u)\,dN_{ji}(u),$$
where $g_\phi$ is the exponential kernel defined in Equation (3). New interactions between two individuals may arise as a response to past interactions through the kernel $g_\phi$, or via the base intensity $\mu_{ij}$. We want to model assortativity so that individuals with similar interests are more likely to interact than individuals with different interests. We therefore assume that each node $i$ has a set of positive latent parameters $(w_{i1}, \ldots, w_{ip})$, where $w_{ik}$ can be interpreted as its level of affiliation to a latent community $k$, and where the number of communities $p$ is assumed to be known. The base rate is then defined as
$$\mu_{ij} = \sum_{k=1}^{p} w_{ik} w_{jk}.$$
Two nodes with high levels of affiliation to the same communities will be more likely to interact than nodes with affiliation to different communities, favouring assortativity.
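The way the two ingredients combine can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: it assumes a base rate equal to the sum over communities of the product of the two nodes' affiliation levels, and an exponential excitation of the $i \to j$ intensity by past interactions sent from $j$ to $i$ (the reciprocity mechanism described in the introduction). The names `base_rate`, `pair_intensity`, `weights` and `history` are hypothetical.

```python
import math


def base_rate(w_i, w_j):
    """Base rate: sum over communities of the product of affiliation levels."""
    return sum(x * y for x, y in zip(w_i, w_j))


def pair_intensity(t, i, j, weights, history, eta, delta):
    """Intensity of interactions from i to j at time t.

    `history` is a list of (time, sender, receiver) triples. Only past events
    sent by j to i excite the i -> j intensity (an assumption of this sketch).
    """
    excitation = sum(
        eta * math.exp(-delta * (t - s))
        for (s, src, dst) in history
        if s < t and src == j and dst == i
    )
    return base_rate(weights[i], weights[j]) + excitation


if __name__ == "__main__":
    # Two communities; Helen and Mary share the first one, Tom sits in the second.
    weights = {"helen": [0.9, 0.1], "mary": [0.8, 0.2], "tom": [0.05, 1.0]}
    history = [(1.0, "helen", "mary")]  # Helen emails Mary at time 1
    for src, dst in [("mary", "helen"), ("helen", "mary"), ("tom", "helen")]:
        lam = pair_intensity(1.5, src, dst, weights, history, eta=0.8, delta=1.2)
        print(f"{src} -> {dst}: {lam:.3f}")
```

In this toy example Mary's intensity towards Helen is raised shortly after Helen's email, while Helen's intensity towards Mary stays at the shared base rate, and the pair from different communities has a much lower base rate.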
In order to capture sparsity and power-law properties, and as in Todeschini2016, the set of affiliation parameters $(w_{i1}, \ldots, w_{ip})$ and node labels $\theta_i$ will be modelled using a compound CRM with gamma scores, that is
$$W_0 = \sum_{i \geq 1} w_{i0}\, \delta_{\theta_i} \sim \mathrm{CRM}(\rho_0),$$
where the Lévy measure $\rho_0$ is defined by Equation (6), and for each node $i \geq 1$ and community $k = 1, \ldots, p$
$$w_{ik} = w_{i0}\, \beta_{ik}, \qquad \beta_{ik} \overset{\text{ind}}{\sim} \mathrm{Gamma}(a_k, b_k).$$
The parameter $w_{i0}$ is a degree correction for node $i$ and can be interpreted as measuring the overall popularity/sociability of a given node $i$ irrespective of its level of affiliation to the different communities. An individual with a high sociability parameter will be more likely to have interactions overall than individuals with low sociability parameters. The scores $\beta_{ik}$ measure the level of affiliation of individual $i$ to the community $k$. The model is defined on $\mathbb{R}_+^3$. We assume that we observe interactions over a subset $[0, T] \times [0, \alpha]^2$ where $\alpha$ and $T$ tune both the number of nodes and number of interactions. The whole model is illustrated in Figure 2.
The model admits the following set of hyperparameters, each of which tunes a particular property of the model:
The hyperparameters $\phi = (\eta, \delta)$ of the kernel, where $\eta \geq 0$ and $\delta > 0$, tune the reciprocity in the model.
The hyperparameters $a = (a_1, \ldots, a_p)$ and $b = (b_1, \ldots, b_p)$ tune the community structure of the interactions. The mean $a_k / b_k$ of the scores $\beta_{ik}$ tunes the size of community $k$ while their variance $a_k / b_k^2$ tunes the variability of the level of affiliation to this community; larger values imply more separated communities.
The hyperparameter $\sigma$ tunes the sparsity and the degree heterogeneity: larger values imply higher sparsity and heterogeneity. It also tunes the slope of the degree distribution. Parameter $\tau$ tunes the exponential cut-off in the degree distribution. This is illustrated in Figure 3.
Finally, the hyperparameters $\alpha$ and $T$ tune the overall number of interactions and nodes.
As in Blundell2012, we use uniform priors on $\eta$ and $\delta$. Following Todeschini2016, we set vague Gamma priors on $\alpha$, $1 - \sigma$, $\tau$, $a_k$ and $b_k$. The right limit of the time window, $T$, is considered known.
4.1 Connection to sparse vertex-exchangeable and edge-exchangeable models
The model presented here can be seen as a natural extension of sparse vertex-exchangeable and edge-exchangeable graph models. Let $z_{ij}$ be a binary variable indicating if there is at least one interaction in $[0, T]$ between nodes $i$ and $j$. We have
$$\Pr\left(z_{ij} = 1 \mid (w_{ik}, w_{jk})_{k = 1, \ldots, p}\right) = 1 - \exp\Big(-2T \sum_{k=1}^{p} w_{ik} w_{jk}\Big),$$
which corresponds to the probability of a connection in the static simple graph model proposed by Todeschini2016. Additionally, for fixed $\alpha$ and $\eta = 0$ (no reciprocal relationships), the model corresponds to a rank-$p$ extension of the rank-1 Poissonized version of edge-exchangeable models considered by Cai2016 and Janson2017. The sparsity properties of our model will follow from the sparsity properties of these two classes of models.
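The connection probability can be checked with a short argument, sketched here under assumptions that are spelled out rather than taken from the text: write $\mu_{ij} = \mu_{ji}$ for the base rate of the pair $i \neq j$, and suppose the intensity in each direction equals this base rate plus excitation triggered only by earlier interactions in the reverse direction. Until the first interaction within the pair there is no excitation, so the first interaction in either direction arrives at the constant rate $2\mu_{ij}$:

```latex
\begin{align*}
\Pr\big(\text{no interaction between } i \text{ and } j \text{ in } [0,T]\big)
  &= \exp\Big(-\int_0^T (\mu_{ij} + \mu_{ji})\,dt\Big) = e^{-2T\mu_{ij}}, \\
\Pr\big(\text{at least one interaction between } i \text{ and } j \text{ in } [0,T]\big)
  &= 1 - e^{-2T\mu_{ij}} .
\end{align*}
```

In particular, under these assumptions the kernel parameters do not enter this probability: reciprocity affects how many interactions a connected pair has, not whether the pair is connected.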
The size of the dataset is tuned by both $\alpha$ and $T$. Given these quantities, both the number of interactions and the number of nodes with at least one interaction are random variables. We now study the behaviour of these quantities, showing that the model exhibits sparsity. Define $I_{\alpha,T}$ to be the overall number of interactions between nodes with label $\theta_i \leq \alpha$ until time $T$, $E_{\alpha,T}$ the total number of pairs of nodes with label $\theta_i \leq \alpha$ who had at least one interaction before time $T$, and $V_{\alpha,T}$ the number of nodes with label $\theta_i \leq \alpha$ who had at least one interaction before time $T$. We will refer to $I_{\alpha,T}$, $E_{\alpha,T}$ and $V_{\alpha,T}$ respectively as the number of interactions, number of edges and number of nodes.
The expected number of interactions , edges and nodes are given as follows:
where and is the multivariable Laplace exponent given by The multivariate Lévy measure is given by where is the pdf of a Gamma random variable with parameters and and is defined in Equation (6).
The proof of Theorem 1 for the case of , and follows the lines of Theorem 3 in Todeschini2016. Detailed proofs are given in the appendix.
We now consider the asymptotic behaviour of the expectations of , and as and go to infinity. In the following we will use the asymptotic notations , and which are defined as follows:
First consider the asymptotic behaviour for fixed $T$ and for $\alpha$ that tends to infinity. We have
as $\alpha$ tends to infinity. For $\sigma < 0$, the numbers of edges and interactions grow quadratically with the number of nodes, and we are in the dense regime. When $\sigma \geq 0$, the numbers of edges and interactions grow subquadratically, and we are in the sparse regime. Higher values of $\sigma$ lead to higher sparsity. For fixed $\alpha$, we have
as tends to infinity. Sparsity in arises when for the number of edges and when for the number of interactions.
The derivation of the asymptotic behaviour of expectations of , and follows the lines of the proofs of Theorems 3 and 5.3 in Todeschini2016 and Lemma D.6 in the supplementary material of Cai2016, and is omitted here.
5 Approximate Posterior Inference
Assume that we have observed a set of interactions between individuals over a period of time . We aim at inferring the parameters of the Hawkes kernel as well as the parameters and hyperparameters of the compound CRM model . Therefore, our objective is to approximate the posterior distribution . Using the representation of the Hawkes process as a cluster process, it is possible to derive a Gibbs sampler which targets this posterior distribution, using a data augmentation scheme which associates a latent variable to each interaction, similar to the approach taken in Rasmussen2013. However, such an algorithm may be slow to converge and not scale well with the number of observed interactions. Additionally, we would like to make use of existing code for posterior inference with Hawkes processes and graphs based on compound CRMs, and therefore propose a two-step approximate inference procedure, motivated by recent work on modular approximate Bayesian inference (Jacob2017).
Let $Z = (z_{ij})$ be the adjacency matrix defined by $z_{ij} = 1$ if there is at least one interaction between $i$ and $j$ in the interval $[0, T]$, and 0 otherwise. We have
The idea of the two-step procedure is to