Propagation of moments in Hawkes networks
The present paper provides a mathematical description of high-order moments of spiking activity in a recurrently-connected network of Hawkes processes. It extends previous studies that have explored the case of a (linear) Hawkes network driven by deterministic rate functions to the case of stimulation by external inputs (rate functions or spike trains) with arbitrary correlation structure. Our approach describes the spatio-temporal filtering induced by the afferent and recurrent connectivities using operators acting on the input moments. This algebraic viewpoint provides intuition about how the network ingredients shape the input-output mapping for moments, as well as cumulants.
Immense efforts in neuroscience have been invested in measuring neuronal activity as well as the detailed connectivity between neurons. Such studies have been too often conducted separately, despite the fact that neuronal activity and synaptic connectivity are deeply intertwined. Indeed, the synaptic connectome determines the neuronal activity, while the latter reshapes the connectome through activity-dependent plasticity. To better understand the intricate link between activity and connectivity at the neuronal level, it is important to build tractable network models that relate one to the other.
In this paper we analytically compute the statistics of neuronal activity —described via moments and cumulants— from the connectivity. In particular, we investigate how the spiking statistics propagates from an input population of neurons to an output population of recurrently-connected neurons as a function of the synaptic kernels, see Fig. 1A.
In order to remain tractable, the neuronal activity is modeled using a Hawkes process [Hawkes 1971a, Hawkes 1971b], also known as (linear) Poisson neurons [Kempter, Gerstner, and Van Hemmen 1999] whose firing activity depends on upstream neurons as represented in Fig. 1B and C. Here we refer to the multivariate Hawkes process as Hawkes network. Hawkes’ model has attracted much attention in various disciplines such as neuroscience [Gilson, Burkitt, and van Hemmen 2010, Mei and Eisner 2017], artificial intelligence [Etesami, Kiyavash, Zhang, and Singhal 2016], seismology [Le 2018, Lima and Choi 2018], epidemiology [Saichev, Maillart, and Sornette 2013] and finance [Errais, Giesecke, and Goldberg 2010, Bacry, Mastromatteo, and Muzy 2015]. Due to the event-like nature of its activity, intrinsic correlations arise and reverberate as echoes induced by the recurrent connectivity. The present study builds upon Hawkes’ results that describe second-order correlations for mutually exciting point processes [Hawkes 1971a, Hawkes 1971b] and extends them to higher orders. In particular, we show how moments of arbitrary order propagate from one layer to the next.
The vast majority of studies focus on the first and second orders of spiking statistics [Hawkes 1971a, Hawkes 1971b, Gilson, Burkitt, and van Hemmen 2010, Brémaud, Massoulié, and Ridolfi 2005, Tannenbaum and Burak 2017]. To our knowledge, only two recent studies have investigated higher-order cumulants [Jovanović, Hertz, and Rotter 2015, Ocker, Josić, Shea-Brown, and Buice 2017]. In the first [Jovanović, Hertz, and Rotter 2015], the authors derived a recursive algorithm based on the theory of branching Hawkes processes to calculate the cumulants of the spiking activity. The second study [Ocker, Josić, Shea-Brown, and Buice 2017] relies on a path-integral representation to explore the cumulants, which are closely related to moments, for Hawkes processes with possible non-linearities. While the path-integral representation derived from field theory is adequate to tackle non-linearities, it provides limited intuition about the geometrical aspects of the transmission of spiking density across layers. A common limitation of both studies is that they provide little intuition about how the moments propagate in neuronal networks, which we aim to address here. Moreover, the case of neurons stimulated by inputs with correlated activity has not yet been explored beyond second order.
A motivation for investigating higher-than-second orders of correlations in Hawkes networks comes from the study of spike-timing-dependent plasticity (STDP). The established formula [Hawkes 1971a] is sufficient to analyze the effect of so-called pairwise STDP in recurrently-connected networks: as the synaptic weights between neurons are modified depending on the time difference between input and output spikes, the overall effect can be captured by the spiking covariances [Gilson, Burkitt, Grayden, Thomas, and van Hemmen 2009a, Gilson, Burkitt, Grayden, Thomas, and van Hemmen 2009b, Pfister and Tass 2010]. However, the more elaborate model of triplet STDP [Pfister and Gerstner 2006, Gjorgjieva, Clopath, Audet, and Pfister 2011] requires knowledge of the third order of the spike statistics, involving input-output-output spikes. To gain intuition, a key step is understanding how the synaptic connectivity shapes the input correlation structure in a network, as illustrated in Fig. 1A.
Another motivation is that, although pairwise correlations have been argued to be sufficient to represent experimental data [Barreiro, Gjorgjieva, Rieke, and Shea-Brown 2014], this view has been recently challenged and mechanisms related to higher-order correlations have been found to improve descriptive statistical models [Shimazaki, Sadeghi, Ishikawa, Ikegaya, and Toyoizumi 2015]. In dynamic neuron models, even though population mean-field dynamics can be captured by non-spiking models [Helias, Tetzlaff, and Diesmann 2013, Grytskyy, Tetzlaff, Diesmann, and Helias 2013], networks with realistic sizes exhibit finite-size effects in their pairwise correlations [van Albada, Helias, and Diesmann 2015]. This calls for analytical techniques to evaluate those at arbitrary orders, as was done recently for binary neurons [Dahmen, Bos, and Helias 2016].
This led us to investigate a general solution for the spatio-temporal correlation structure via moments of arbitrary orders in Hawkes processes, as a function of the moments in the input population. Our results are structured around three theorems. The first one describes how moments (of arbitrary orders) propagate in feedforward networks, thereby generalizing the results by [Kempter, Gerstner, and Van Hemmen 1999]. The second theorem describes the effect of recurrent connectivity within the output population, extending [Gilson, Burkitt, Grayden, Thomas, and van Hemmen 2009b, Pfister and Tass 2010]. The last theorem translates the mappings for moments into mappings for cumulants, in line with a recent line of work [Jovanović, Hertz, and Rotter 2015, Ocker, Josić, Shea-Brown, and Buice 2017].
Let us consider an input population of neurons whose spiking activity (superposition of Dirac deltas at spike times) is denoted by the vector of functions . This input population together with some stochastic input rates feed a network (output) population of neurons denoted by which are also a superposition of Dirac deltas.
Definition 1 (Hawkes Process)
The Hawkes process is a point-emission process whose intensity depends upon both the past input spiking activity and its own past spiking activity with as well as on a time-dependent (possibly stochastic) rate :
where is a matrix of “synaptic” kernels, made of functions that describe the causal effect from input neuron on network neuron . These functions are equal to zero for all . Similarly, is a matrix of kernels , each corresponding to the recurrent interaction from neuron to neuron . Note that the convolution operator is a matrix convolution; see Eq. (5) below. In this paper we omit the summation symbol, in line with Einstein’s convention for tensor calculus. All kernel functions and spontaneous rate functions are assumed to be positive-valued.
Let be the counting process associated with the spiking activity , which gives the number of spikes from to for the network neuron . The increment of the counting process in an infinitesimally small bin size is given by
As a consequence, we have , hence .
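To make the definition concrete, the process can be simulated on a discretized time grid where each bin of width Δ spikes with probability λ(t)Δ. The following minimal sketch (a single neuron with an exponential self-excitation kernel; the spontaneous rate ν, time constant τ and branching ratio w are hypothetical choices, not taken from the paper) also compares the empirical mean rate against the stationary value ν/(1−w) known for a stable Hawkes process.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 1e-3, 100.0                   # bin size (s) and duration (s)
n_steps = int(T / dt)
nu = 5.0                              # spontaneous rate (Hz), hypothetical
tau, w = 0.02, 0.5                    # kernel time constant (s) and branching ratio
L = int(6 * tau / dt)                 # truncated kernel support in bins
phi = (w / tau) * np.exp(-np.arange(1, L + 1) * dt / tau)  # causal kernel samples

spikes = np.zeros(n_steps)
for t in range(n_steps):
    k = min(t, L)
    # intensity = spontaneous rate + filtered past spiking (newest spike first)
    lam = nu + phi[:k] @ spikes[t - k:t][::-1]
    spikes[t] = rng.random() < lam * dt       # Bernoulli approximation of the bin

rate_hat = spikes.mean() / dt                 # empirical mean rate (Hz)
rate_theory = nu / (1.0 - w)                  # stationary rate for a stable process
```

The empirical rate fluctuates around the theoretical value; the excess over ν reflects the self-excitation echoes discussed in the introduction.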
Remark 1 (Atomic contributions and contraction of indices)
Note that for an infinitesimally small , the increment can take only 2 values: 0 or 1. In that case, we have for any ,
More generally, atomic contributions arise from the point-process nature of spike trains when taking expectations of products of for all possible redundancies in the time variables together with the “spatial” coordinates [Daley and Vere-Jones 1988]. For the example of the 2nd-order moment, this corresponds to the twofold condition and , which leads to the second term in
In the remainder we refer to terms involving Kronecker deltas and Dirac delta as contractions of indices (here and ).
Definition 2 (Matrix convolution)
In Eq. (1), the standard convolution is extended to a matrix form, which involves a matrix multiplication. For the kernel matrix and vector , the element of the matrix convolution is given by
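Numerically, the matrix convolution amounts to summing, for each target index i, the one-dimensional convolutions of each source signal x_j with the kernel Φ_ij. A minimal discretized sketch (the kernel shapes and input signals below are hypothetical):

```python
import numpy as np

dt, tau = 1e-3, 0.02
s = np.arange(0, 0.2, dt)                      # kernel support grid (s)
Phi = np.zeros((2, 2, s.size))                 # Phi[i, j]: kernel from neuron j to neuron i
Phi[0, 1] = np.exp(-s / tau) / tau             # only the 1 -> 0 connection is nonzero

t = np.arange(0, 1.0, dt)
x = np.zeros((2, t.size))
x[1] = np.sin(2 * np.pi * 5 * t) ** 2          # hypothetical input on neuron 1

def matrix_convolve(Phi, x, dt):
    """Element i of (Phi * x)(t): sum over j of the causal convolution of Phi[i, j] with x[j]."""
    out = np.zeros_like(x)
    for i in range(Phi.shape[0]):
        for j in range(Phi.shape[1]):
            out[i] += np.convolve(x[j], Phi[i, j])[: x.shape[1]] * dt
    return out

y = matrix_convolve(Phi, x, dt)
```

Since only the 1 → 0 kernel is nonzero, y[1] vanishes while y[0] is a low-pass-filtered version of x[1]; scaling x confirms the linearity of the operation.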
Definition 3 (Moments of order )
Let denote a set of coordinates . The moment of order of the input population evaluated at times is defined as
Similarly, the moment of order of the output population for the coordinates and the time variables is defined as
Note that the mathematical expectation corresponds to three sources of stochasticity, as indicated by the superscript. Note that, due to the recurrent connectivity, the dependency of on itself also concerns the past activity.
Remark 2 (Symmetry of moments)
The moments and have many symmetries. For the example of the input moments, any permutation of such that the transformed coordinates and leaves invariant:
Definition 4 (Generalized Spatio-Temporal Delta Function)
Let be the generalized delta function defined for the set of coordinates and times , which combines the Kronecker and Dirac delta functions as
Note that for , one recovers the product of the standard Kronecker delta with the Dirac delta: . Note also that when the lower index is omitted, we will assume that (i.e. single neuron case).
Example 1 (Moment for a single spike train with oscillatory firing rate)
Before presenting the general result, we provide an illustrative example to fix ideas and help the reader with concepts and notation.
For a single (input) neuron driven by a deterministic rate function , the contraction in the 2nd-order moment corresponds to the condition without “spatial” coordinates here, simplifying Eq. (4):
The 3rd-order moment for a single spike train is given by
This expression exhibits two “extreme” cases: one where all time variables are equal, corresponding to the two Dirac deltas for the partition , and one where they are all distinct, giving for . In addition, the three remaining terms involve a contraction for 2 out of the 3 variables.
Fig. 2 illustrates the moments for and with a single spike train driven by an oscillatory firing rate. Note that “spatial” coordinates in the above equation are simply ignored, together with the Kronecker deltas. Fig. 2B, C and E highlight the atomic contributions along the various “diagonals” where the time variables coincide. Away from those subspaces, the spike densities are much lower, as can be seen in the scaling of values in the middle and right plots of Fig. 2C and E. Note that the main diagonal for is slightly larger than that for , as autocorrelation effects accumulate.
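The structure of the single-train second moment can be checked by Monte Carlo: for an inhomogeneous Poisson train binned at width Δ, the empirical second moment of the increments should equal r(t1)r(t2)Δ² off the diagonal, plus the atomic term r(t1)Δ on it. The oscillatory rate below is a hypothetical stand-in for the one used in Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T, n_trials = 1e-3, 0.5, 2000
t = np.arange(0, T, dt)
rate = 20.0 * (1.0 + 0.8 * np.sin(2 * np.pi * 3 * t))   # hypothetical oscillatory rate (Hz)

# independent inhomogeneous Poisson trains, one Bernoulli draw per bin
S = (rng.random((n_trials, t.size)) < rate * dt).astype(float)

m2 = S.T @ S / n_trials                                  # empirical second moment of increments
# prediction: rate product off the diagonal, plus the atomic (Dirac) term on it
theory = np.outer(rate, rate) * dt**2 + np.diag(rate * dt)

off = ~np.eye(t.size, dtype=bool)
ratio_off = m2[off].mean() / theory[off].mean()
ratio_diag = np.diag(m2).mean() / np.diag(theory).mean()
```

Both ratios should be close to 1; the diagonal is dominated by the atomic term, which is a factor 1/Δ larger than the smooth part, matching the scale separation visible in Fig. 2.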
Proposition 1 (Moments for inputs driven by deterministic rate functions)
Let denote the set of all partitions of the set . If each input neuron is independent from each other and the firing rate of neuron is given by , then the input moment of order over the coordinates at times can be expressed as
where spans the disjoint subsets of whose union is , with and . In addition, the instantaneous firing rate appears with a representative index, here taken as the minimum . Recall the convention when is a singleton.
Proof of Proposition 1:
Eq. (12) can be obtained using the moment generating function [Daley and Vere-Jones 1988, Ocker, Josić, Shea-Brown, and Buice 2017], via its derivative for order . Here we provide a proof by induction, which highlights the key observation that every combination of contractions can be described by a partition.
Let us assume that Eq. (12) is valid for all orders . Now considering the order with given coordinates and time variables in , we denote by the set of order indices in such that the coordinates and times are identical to their counterparts for , namely . Using the probabilistic independence as before, we can write:
In the second line, we have used the hypothesis for order where is the number of elements in for the indices that are not in , as well as the contraction for all elements in . The previous expression is valid for each containing , which is determined by and . We conclude by observing that the above dichotomy of partitions actually spans the whole set :
which accounts for all possible configurations of and . This is also related to the decomposition of the Bell number —giving the number of partitions — into the sum of the Stirling numbers of the second kind —giving the number of partitions that have groups. They satisfy the relationship for all (corresponding to the above dichotomy), as well as the “boundary” condition when or .
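The combinatorial bookkeeping in the proof can be verified directly by enumerating set partitions. The sketch below checks that the partition counts match the Bell numbers, that summing the Stirling numbers of the second kind over the number of groups recovers them, and that the dichotomy used in the induction (choosing the block containing a distinguished element, then partitioning the rest) corresponds to the standard recurrence for Bell numbers.

```python
from itertools import combinations
from math import comb

def partitions(s):
    """Yield all set partitions of the list s as lists of blocks."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    # dichotomy from the proof: fix the block containing `first`,
    # then partition the remaining elements independently
    for r in range(len(rest) + 1):
        for others in combinations(rest, r):
            block = [first, *others]
            remaining = [x for x in rest if x not in others]
            for p in partitions(remaining):
                yield [block, *p]

# Bell numbers: total number of partitions of n elements
bell = [sum(1 for _ in partitions(list(range(n)))) for n in range(8)]
# Stirling numbers of the second kind: partitions of n elements into k groups
stirling = [[sum(1 for p in partitions(list(range(n))) if len(p) == k)
             for k in range(n + 1)] for n in range(8)]
```

For n = 3 the enumeration yields the 5 partitions underlying the third-order moment of Example 1 (one all-together, three pairings, one all-distinct).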
2.1 Network with afferent connectivity
Now that we have introduced definitions and concepts that will be useful to characterize the high-order moments, we turn to the case of a network with afferent connections, but no recurrent connections. The following theorem is the first of our two core results. We denote the driving rate function of the network neurons that lumps together the spontaneous activity and the input influx by
Definition 5 (Moment for the input rate )
Considering the stochastic spontaneous rate function (e.g., a Cox process), we define the corresponding moment of order for the coordinates at times as
Definition 6 (Tensor convolution operator)
Let be a matrix of kernels. We define the -dimensional tensor that replicates the matrix for all pairs of indices :
with , and . For a -order tensor with coordinates , the tensor convolution between and evaluated at times gives the following tensor of order :
The second line is a reformulation to stress that the convolutions of are applied on each of the dimensions —as indicated above each asterisk— on the tensor , followed by the summation for the tensor product (similar to a matrix product), in line with the definition in Eq. (5).
In essence, this convolution operator involves the same joint “multiplication” on paired spatial and temporal dimensions (related to and , the temporal convolution being seen as a function multiplication operator) as the matrix convolution in Eq. (5), but extended on all dimensions of the tensor. In particular, this operation is linear.
Definition 7 (Moment for the filtered inputs)
As with the spontaneous rate , we define the following moments of the input filtered by the afferent kernels :
with , and .
Note that this definition implicitly involves the averaging over the statistics of the inputs , that is .
Definition 8 (Moment symmetrical expansion operator)
Let us consider two tensors of order and , say with coordinates and as well as with coordinates and . For any given , we define the following tensor operation that constructs a moment of order with and from the tensors and of smaller orders and :
Here we have defined , the set of minima for the groups in the partition . By convention, the 0-order tensors are valued 1 when or .
Eq. (20) uses contractions to augment the order of the combinations of tensors and from to with all possible symmetries. In particular, if and are symmetric tensors (see Remark 2) with respect to all their own dimensions, the output of is symmetric as well.
Theorem 1 (Input-output mapping for afferent connectivity)
Consider an uncoupled Hawkes network (Definition 1) whose neurons are excited by both inputs (via afferent connections) and spontaneous rate , which are probabilistically independent. The moment of order of the network population depends on all smaller-order moments of the input population, as well as moments of the spontaneous firing rate :
Proof of Theorem 1:
Provided the statistics of inputs and the spontaneous rate is known, the spiking activity of the network neurons is determined by the driving rate function in Eq. (15). Similar to Eq. (12) in Proposition 1, the Poisson nature of the spiking of the network neurons thus gives the following expression for the unconnected neurons with spike trains :
In the previous expression, the contractions basically extend the moment of smaller order for the driving rate to the order .
The product involving the sum of gives terms with being the number of elements in . Now we develop this product to isolate the contributions originating from the input moments of the same order, as well as with the spontaneous rate. To this end, we define the subset of indices belonging to that concern input neurons in Eq. (22), while is the subset of indices that concern . This gives
From the second line to the last line, we have swapped the summation terms of the partitions and the decomposition of in two subsets. The important point here is to understand that the construction of from and exactly spans the whole set of partitions . Note that and can be empty sets. Finally, we simply group the subsets of the same size , and similarly of the same size :
When the network of unconnected neurons is not driven by a spontaneous rate (), the moment expansion operator in Eq. (20) can be simplified with as
which means that
Conversely, in the absence of spiking inputs () and when the driving rates are deterministic, the moments simply come from the multiplication of the rate functions:
2.2 Network with recurrent connectivity
The last step is to consider the recurrent connections between the network neurons, which constitute the second half of our core result.
Definition 9 (Effective recurrent kernel)
Let denote the effective recurrent kernel, defined as
where denotes the -th order convolution.
Recall that the convolution is defined for kernel matrices, see Eq. (5). Because of the causality requirement, for and all pairs , and likewise for .
The effective recurrent kernel satisfies the following self-consistency equation:
where . Therefore, can be thought of as the inverse of for the convolution operator.
Proof of Property 1:
By convolving the kernel with the effective recurrent kernel , we find (omitting the time variables)
which we reorganize to factorize , obtaining Eq. (30).
Example 2 (Single neuron with self-connection and with spontaneous rate )
We first present an illustrative version of our proof by induction for a single neuron with self-feedback, driven by a deterministic rate , in the cases . In this example, , as there is no other source of stochasticity. Note that corresponds to Hawkes’ results [Hawkes 1971a] with moments instead of (auto)covariances. The motivation is to provide a concrete case for stepping from orders to , which is formalized in the proof below.
Cases and :
The first-order moment for corresponds to the mean firing rate and can be calculated from the spontaneous rate function by solving the self-consistency equation given by the second line of Eq. (1) using the equality for the instantaneous firing rate :
For the second order, the point is to take into account the effects of spikes upon the future spiking probability, with the effect of the self-feedback loop. Assuming (purple semi-plane in Fig. 3A), we can develop in using Eq. (1). This holds because the rate function requires the knowledge of past spiking activity with , as illustrated by the blue arrow in Fig. 3A, moving toward the diagonal . This development gives
Note that is inside the angular brackets on the right-hand side of the first line because and are not independent when the difference in the time variables lies within the range of . The last line is simply a rewriting using a specific notation, with a line above multivariate functions to indicate the order of the functions with respect to the time variables, which will be useful for this example. In addition, we use the notation introduced in Eq. (18), where indicates the convolution performed on the second time variable and the Dirac delta is a redundant expression as a function of , while keeping the information about .
The solution must satisfy Eq. (33) for all , which is a Wiener-Hopf equation. The atomic contribution (Dirac delta) acts as a “boundary condition” when . Our strategy is the following: we propose a solution for the moment of order and verify that it satisfies Eq. (33). As the solution is fully symmetric in and , it is also valid on the complementary space , and hence for all . The putative 2nd-order moment is:
Note that our notation does not require the time variables, allowing for compact writing. We use the equality in Eq. (30) on to obtain
For the first term on the right-hand side of the upper line, the convolution by on the second variable has been rewritten by moving out, while the rest is in fact in Eq. (34). In the second term, the convolution by the Dirac delta acts on , and we obtain two terms involving ; see the solution for the 1st-order moment in Eq. (32). Together, these three terms form the right-hand side of Eq. (33), which is thus satisfied.
Note also that for (reflecting causality of the overall “feedback” kernel), which implies that the operator applied on the 2-dimensional function under the overline only “spreads” the function mass toward the future (see Fig. 3B).
Following the previous section, we extend the calculations to the case in order to prepare for the generalization to arbitrary . As with , we consider the ordering (purple subspace in Fig. 3C), which allows the development of the third time variable, as was done in Eq. (33):
with the Dirac delta corresponding to the “boundary condition” when , i.e., the “lower” tilted plane of the purple subspace to which the blue arrow in Fig. 3B points. Note that this involves only the atomic contribution ( is in , corresponding to ); the other contraction alone is not possible in this subspace. See also the discussion in Example 1 for the second-order input moments. We now pursue the calculations without the time variables in the arguments, as before for . The putative symmetric solution is
which involves the contractions for all partitions of , in a similar fashion to Eq. (21). We use again Eq. (30) as in Eq. (35) to obtain the convolution of with on and regroup the other terms where the convolution with vanishes because of the Dirac in order to use the expression of the 2nd-order moment in Eq. (34), namely :
It remains to verify that the condition always implies : when , we in fact have , which corresponds to . This means that the last term in Eq. (38) vanishes and Eq. (36) is satisfied. The symmetry argument ensures validity over all , as will be formalized below.
The upper plot in Fig. 4A illustrates that the rhythm of the output spiking is altered by the recurrent self-connection. This comes from the fact that, for an excitatory self-connection, output spikes momentarily increase the firing rate, as can be seen when comparing the green curve with the dotted black curve in the bottom plot. The output first-order moment in Fig. 4B (solid gray curve for the simulation, dashed black curve for the prediction) lies above the input first-order moment related to the underlying firing rate (dotted black curve). Note also the shift toward later times.
The decomposition of the second-order moment in Fig. 4C illustrates that the effect of autocorrelations (right plot) spreads from the diagonal due to the self-connection. The main diagonal for in Fig. 4D has larger values than the curve for in Fig. 4B. In Fig. 4E, the main diagonal for (cyan curve) is even larger, indicating that effects due to autocorrelation accumulate (as for the input moments in Fig. 2).
The slices of the output third-order moment in Fig. 4E have different scales, but note the high spike density along the diagonal of the red matrix, due to the spreading of atomic contributions by the recurrent kernel .
Theorem 2 (Input-output mapping for recurrent connectivity)
The moment of order of the Hawkes process (Definition 1) of the network population can be expressed as
The effects of the recurrent connectivity on the input moments are determined by spatio-temporal filtering described by the effective recurrent kernel defined similarly to Eq. (17) on the moment for uncoupled neurons in Eq. (21).
Proof of Theorem 2:
where the conditioning is over the input activity and the spontaneous activity . Note that the statistical averaging over and of the conditional moment gives the (unconditional) moment defined in Eq. (7): . To demonstrate Eq. (39), we prove by induction the following result on , which straightforwardly leads to the expression in Theorem 2 by taking the same statistical averaging over and as done above: