Infinite systems of interacting chains with memory of variable length - a stochastic model for biological neural nets

# Infinite systems of interacting chains with memory of variable length - a stochastic model for biological neural nets

## Abstract

We consider a new class of non Markovian processes with a countable number of interacting components. At each time unit, each component can take two values, indicating if it has a spike or not at this precise moment. The system evolves as follows. For each component, the probability of having a spike at the next time unit depends on the entire time evolution of the system after the last spike time of the component. This class of systems extends in a non trivial way both the interacting particle systems, which are Markovian, and the stochastic chains with memory of variable length which have finite state space. These features make it suitable to describe the time evolution of biological neural systems. We construct a stationary version of the process by using a probabilistic tool which is a Kalikow-type decomposition either in random environment or in space-time. This construction implies uniqueness of the stationary process. Finally we consider the case where the interactions between components are given by a critical directed Erdös-Rényi-type random graph with a large but finite number of components. In this framework we obtain an explicit upper-bound for the correlation between successive inter-spike intervals which is compatible with previous empirical findings.

Dedicated to Errico Presutti, frateddu e mastru

Key words : Biological neural nets, interacting particle systems, chains of infinite memory, chains of variable length memory, Hawkes process, Kalikow-decomposition.

AMS Classification : 60K35, 60G99

## 1 Introduction

A biological neural system has the following characteristics. It is a system with a huge (about ) number of interacting components, the neurons. This system evolves in time, and its time evolution is not described by a Markov process (Cessac 2011). In particular, the times between successive spikes of a single neuron are not exponentially distributed (see, for instance, Brillinger 1988).

This is the motivation for the introduction of the class of models that we consider in the present paper. To cope with the problem of the large number of components it seems natural to consider infinite systems with a countable number of components. In this new class of stochastic systems, each component depends on a variable length portion of the history. Namely, the spiking probability of a given neuron depends on the accumulated activity of the system after its last spike time. This implies that the system is not Markovian. The time evolution of each single neuron looks like a stochastic chain with memory of variable length, even if the influence from the past is actually of infinite order. This class of systems represents a non trivial extension of the class of interacting particle systems introduced in 1970 by Spitzer. It is also a non trivial extension of the class of stochastic chains with memory of variable length introduced in 1983 by Rissanen.

The particular type of dependence from the past considered here is motivated both by empirical as well as theoretical considerations.

From a theoretical point of view, Cessac (2011) suggested the same kind of dependence from the past. In the framework of leaky integrate and fire models, he considers a system with a finite number of membrane potential processes. The image of this process in which only the spike times are recorded is a stochastic chain of infinite order where each neuron has to look back into the past until its last spike time. Cessac’s process is a finite dimensional version of the model considered here.

Finite systems of point processes in discrete or continuous time aiming to describe biological neural systems have a long history whose starting points are probably Hawkes (1971) from a probabilistic point of view and Brillinger (1988) from a statistical point of view, see also the interesting paper by Krumin et al. (2010) for a review of the statistical aspects. For non-linear Hawkes processes, but in the frame of a finite number of components, Brémaud and Massoulié (1994) address the problem of existence, uniqueness and stability. Møller and coauthors propose a perfect simulation algorithm in the linear case, see Møller and Rasmussen (2005). In spite of the great interest in Hawkes processes during the last years, especially in association with modeling problems in finance and biology, all the studies are reduced to the case of systems with a finite number of components. Here we propose a new approach which enables us to deal also with infinite systems with a countable number of components, without any assumption of the type linearity or attractiveness.

This paper is organized as follows. In Section 2 we state two Theorems proving the existence and uniqueness of infinite systems of interacting chains with memory of variable length, under suitable conditions. Our main technical tool is a Kalikow-type decomposition of the infinite order transition probabilities which is a non trivial extension of previous results of the authors in the case of Markovian systems, cf. Galves et al. (2013). The decomposition considered here has two major differences with respect to what has been done before. Firstly this is due to the non-Markovian nature of the system. Secondly, and most importantly, the structure of the transition laws leads to the need of either considering a decomposition depending on a random environment or considering a space-time decomposition. Using the Kalikow-type decomposition we prove the existence, the uniqueness as well as a property of loss of memory of the stationary process.

In Section 3 we study the correlation between successive inter-spike intervals (ISI). This aims at explaining empirical results presented in the neuroscientific literature. Gerstner and Kistler (2002), quoting Goldberg et al. (1964), observe that in many experimental setups the empirical correlation between successive inter-spike intervals is very small “indicating that a description of spiking as a stationary renewal process is a good approximation”. However, Nawrot et al. (2007) find statistical evidence that neighboring inter-spike intervals are correlated, having negative correlation. We show that we can account for these apparently contradictory facts within our model. This requires the introduction of a new setup in which the synaptic weights define a critical directed Erdös-Rényi random graph with a large but finite number of components. We obtain in Theorem 3 an explicit upper bound for the correlations involving the number of components of the system, as well as the typical length of one inter-spike interval. For a system having a large number of components, our result is compatible with the discussion in Gerstner and Kistler (2002). Gerstner and Kistler (2002) deduce from this that spiking can be described by a renewal process. However, for systems with a small number of components, the correlation might as well be quite big, as reported by Nawrot et al. (2007) who show that neighboring inter-spike intervals are negatively correlated. Therefore, both features are captured by our model, depending on the scale we are working in.

The proofs of all the results are presented in Sections 4, 5 and 6.

## 2 Systems of interacting chains with memory of variable length: Existence, uniqueness and loss of memory

We consider a stochastic chain taking values in for some countable set of neurons defined on a suitable probability space For each neuron at each time if neuron has a spike at that time , and otherwise. The global configuration of neurons at time is denoted We define the filtration

 Ft=σ(Xs,s∈Z,s≤t),t∈Z.

For each neuron and each time let

 Lit=sup{s

be the last spike time of neuron strictly before time We introduce a family of “synaptic” weights for for all is the “synaptic weight of neuron on neuron ”. We suppose that the synaptic weights have the following property of uniform summability.

 supi∈I∑j|Wj→i|<∞. (2.2)

Now we are ready to introduce the dynamics of our process. At each time conditionally on the whole past, sites update independently. This means that for any finite subset we have

 P(Xt(i)=ai,i∈J|Ft−1)=∏i∈JP(Xt(i)=ai|Ft−1). (2.3)

Moreover, the probability of having a spike in neuron at time is given by

 P(Xt(i)=1|Ft−1)=ϕi⎛⎜⎝∑jWj→it−1∑s=Litgj(t−s)Xs(j),t−Lit⎞⎟⎠, (2.4)

where and are measurable functions for all We assume that is uniformly Lipschitz continuous, i.e. there exists a positive constant such that for all

 |ϕi(s,n)−ϕi(s′,n)|≤γ|s−s′|. (2.5)

Observe that in the case where the function is increasing with respect to the first coordinate, the contribution of components is either excitatory or inhibitory, depending on the sign of This is reminiscent of the situation in biological neural nets in which neurons can either stimulate or inhibit the expression of other neurons.

It is natural to ask if there exists at least one stationary chain which is consistent with the above dynamics, and if so, if this process is unique. In what follows we shall construct a probability measure on the configuration space of all space-time configurations of spike trains, equipped with its natural sigma algebra On this probability space, we consider the canonical chain where for each neuron and each time is the projection of onto the coordinate of

For each neuron we introduce

 V⋅→i={j∈I,j≠i:Wj→i≠0},

the set of all neurons that have a direct influence on neuron Notice that in our model, can be both finite or infinite. We fix a growing sequence of subsets of such that if and

We consider two types of systems. The first system incorporates spontaneous spike times, see Condition (2.6) below. These spontaneous spikes can be interpreted as external stimulus or, alternatively, as autonomous activity of the brain. The existence and uniqueness of this class is granted in our first theorem.

###### Theorem 1

[Existence and uniqueness in systems with spontaneous spikes]
Grant conditions (2.2) and (2.5). Assume that the functions and satisfy moreover the following assumptions:

1. There exists such that for all

 ϕi(s,n)≥δ. (2.6)
2. We have that

 G(1)+∞∑n=2(1−δ)n−2n2G(n)<∞, (2.7)

where and where is as in condition 1.

3. We have fast decay of the synaptic weights, i.e.

 supi∑k≥1|Vi(k)|⎛⎝∑j∉Vi(k−1)|Wj→i|⎞⎠<∞. (2.8)

Then under these Conditions (2.6)–(2.8), there exists a critical parameter such that for any there exists a unique probability measure on under which the canonical chain satisfies (2.3) and (2.4).

###### Remark 1

The stochastic chain introduced in Theorem 1 is a chain having memory of infinite order (cf. Doeblin and Fortet 1937, Harris 1955, Berbee 1987, Bressaud, Fernández and Galves 1999, Johannson and Öberg 2003 and Fernández and Maillard 2004). The setup we consider here extends what has been done in the above cited papers. First of all, the chain we consider takes values in the infinite state space Moreover, in Theorem 1 no summability assumption is imposed on the functions In particular, the choice is possible. This implies that the specification of the chain is not continuous. More precisely, introducing

 p(i,t)(1|x)=ϕi⎛⎜⎝∑jWj→it−1∑s=Lit(x)gj(t−s)xj(j),t−Lit⎞⎟⎠,

where we have that

 supx,y:x=y\small on Vi(k)×[t−k,t−1]|p(i,t)(1|x)−p(i,t)(1|y)|↛0 as k→∞

in the case for all which can be seen by taking configurations and such that and A similar type of discontinuity has been considered in Gallo (2011) for stochastic chains with memory of variable length taking values in a finite alphabet.

As an illustration of Theorem 1 we give the following example of a system with interactions of infinite range.

###### Example 1

We give an example of a system satisfying the assumptions of Theorem 1. Take for all and for some fixed where is the norm of In this case, if we choose we have and condition (2.8) is satisfied, since

 ∑k≥1|Vi(k)|⎛⎝∑j∉Vi(k−1)|Wj→i|⎞⎠=∑k≥1(k+1)d∞∑l=kcard{j:∥j−i∥1=l}1l2d+α≤C(d)∑k≥1(k+1)d∞∑l=kld−1l2d+α≤C(d)d+α∑k≥1(k+1)d(k−1)d+α<∞,

as

The next theorem deals with the second type of system. Now we don’t assume a minimal spiking rate. But additionally to the fast decay of the synaptic weights we also assume a sufficiently fast decay of the aging factor see Condition (2.9) below. This additional assumption implies that the specification of the chain is continuous. This is the main difference with the setup of Theorem 1.

###### Theorem 2

[Existence and uniqueness in systems with uniformly summable memory] Suppose that does not depend on Assume conditions (2.2) and (2.5) and suppose moreover that

 supi∑k≥0(k+1)⋅|Vi(k)|⎛⎝∑j∉Vi(k−1)|Wj→i|∞∑n=1gj(n)+∑j∈Vi(k−1)|Wj→i|∞∑n=k∨1gj(n)⎞⎠<1γ, (2.9)

where is given in (2.5).

Then there exists a unique probability measure on such that under the canonical chain satisfies (2.3) and (2.4).

Now, for any let the trajectory of between times and As a byproduct of the proof of Theorems 1 and 2 we obtain the following loss of memory property.

###### Corollary 1
1. Under the assumptions of either Theorem 1 or Theorem 2, there exists a non increasing function such that for any the following holds. For all for all bounded measurable functions

 ∣∣E[f(Xts(i))|F0]−E[f(Xts(i))]∣∣≤(t−s+1)∥f∥∞φ(s). (2.10)

Moreover, for some fixed constant

2. Under the assumptions of Theorem 2, suppose moreover that there exists a constant such that

 gj(n)≤Ce−βn and supi∑j∉Vi(n)|Wi→j|≤Ce−βn, (2.11)

for all for some

Then there exists a critical parameter such that if (2.10) holds with

 φ(s)=Cϱs for some ϱ∈]0,1[ depending only % on β. (2.12)

The proof of Theorem 1 is given in Section 4 below. It is based on a conditional Kalikow-type decomposition of the transition probabilities where we decompose with respect to all possible interaction neighborhoods of site A main ingredient is the construction of an associated branching process in random environment. The setup of Theorem 2 is conceptually less difficult, since in this case the transition probabilities are continuous. This follows from the summability of the aging factors The proof of Theorem 2 relies on a space-time Kalikow-type decomposition presented in Section 5.

###### Remark 2

The proofs of both Theorem 1 and Theorem 2 imply the existence of a perfect simulation algorithm of the stochastic chain By a perfect simulation algorithm we mean a simulation which samples in a finite space-time window precisely from the stationary law In the setup of Theorem 2 the simulation can be implemented analogously to what is presented in Galves et al. (2013). The setup of Theorem 1 requires a conditional approach, conditionally on the realization of the spontaneous spike times.

## 3 Correlations between inter-spike intervals in the critical directed Erdös-Rényi random graph

We consider a finite system consisting of a large number of neurons with random synaptic weights The sequence is a sequence of i.i.d. Bernoulli random variables defined on some probability space with parameter i.e.

 ~P(Wi→j=1)=1−~P(Wi→j=0)=pN,

where

 pN=λ/N and λ=1+ϑ/N for some 0<ϑ<∞. (3.13)

If we represent this as a directed graph where the directed link is present if and only if we obtain what is called a “critical directed Erdös-Rényi random graph”. For a general reference on random graphs we refer the reader to the classical book by Bollobás (2001).

We put for all Notice that the synaptic weights and are distinct and independent random variables. Conditionally on the choice of the connectivities the dynamics of the chain are then given by

 PW(Xt(i)=1|Ft−1)=ϕi(∑jWj→it−1∑s=Litgj(t−s)Xs(j)).

Here we suppose that is a function only of the accumulated and weighted number of spikes coming from interacting neurons, but does not depend directly on the time elapsed since the last spike.

denotes the conditional law of the process, conditioned on the choice of We write for the annealed law where we also average with respect to the random weights, i.e. where denotes the expectation with respect to

Fix a neuron and consider its associated sequence of successive spike times

 …

where

 Si1=inf{t≥1:Xt(i)=1},…,Sin=inf{t>Sin−1:Xt(i)=1},n≥2,

and

 Si0=sup{t≤0:Xt(i)=1},…,Si−n=sup{t

Let us fix We are interested in the covariance between successive inter-spike intervals

 CovW(Sik+1−Sik,Sik−Sik−1)=EW[(Sik+1−Sik)(Sik−Sik−1)]−EW(Sik+1−Sik)EW(Sik−Sik−1),

for any Since the process is stationary, the above covariance does not depend on the particular choice of The next theorem shows that neighboring inter-spike intervals are asymptotically uncorrelated as the number of neurons tends to infinity.

###### Theorem 3

Assume that (2.5), (2.6) and (2.7) are satisfied. Then there exists a measurable subset such that on

 |CovW(Si3−Si2,Si2−Si1)|≤3δ2N(1−δ)√N,

where is the lower bound appearing in Condition (2.6). Moreover,

 P(Ac)≤e2ϑN−1/2.

For large if the graph of synaptic weights belongs to the “good” set the above result is compatible with the discussion in Gerstner and Kistler (2002). Gerstner and Kistler (2002) deduce from this that spiking can be described by a renewal process. However, for small or on the correlation might as well be quite big, as reported by Nawrot et al. (2007) who show that neighboring inter-spike intervals are negatively correlated. Therefore, both features are captured by our model, depending on the scale we are working in.

The proof of the above theorem is given in Section 6 below.

## 4 Conditional Kalikow-type decomposition and proof of Theorem 1

In order to prove existence and uniqueness of a chain consistent with (2.3) and (2.4), we introduce a Kalikow-type decomposition of the infinite order transition probabilities. This type of decomposition was considered in the papers by Ferrari, Maass, Martínez and Ney (2000), Comets, Fernández and Ferrari (2002) and Galves et al. (2013). All these papers deal with the case in which the transition probabilities are continuous. This is not the case here. We are dealing with a more general case in which the transition probabilities might as well be discontinuous, see the discussion in Remark 1. This makes our approach new and interesting by itself. The new ingredient is a construction of a decomposition depending on a random environment. This random environment is given by the realization of the spontaneous spikes.

More precisely, Condition (2.6) allows to introduce a sequence of i.i.d. Bernoulli random variables of parameter such that positions and times with are spike times for any realization of the chain. We call these times “spontaneous spike times”. We work conditionally on the choice of In particular we will restrict everything to the state space

 Sξ={x∈{0,1}I×Z:xt(i)≥ξt(i),∀i∈I,t∈Z}

which is the space of all configurations compatible with i.e. all neural systems such that every spike time of is also a spike time of We write

 Rit=sup{s

for the last spontaneous spike time of neuron before time Moreover, for we put

Consider a couple with In order to prove Theorem 1 we need to introduce the following quantities which depend on the realization of namely

 r[−1](i,t)(1)=infx∈Sξϕi⎛⎜⎝∑jWj→it−1∑s=Lit(x)gj(t−s)xs(j),t−Lit(x)⎞⎟⎠, (4.16)

which is the minimal probability that neuron spikes at time uniformly with respect to all configurations, and

 r[−1](i,t)(0)=infx∈Sξ⎡⎢⎣1−ϕi⎛⎜⎝∑jWj→it−1∑s=Lit(x)gj(t−s)xs(j),t−Lit(x)⎞⎟⎠⎤⎥⎦, (4.17)

which is the minimal probability that neuron does not spike at time

Notice that for all Hence in the above formulas, only a finite time window of the configuration is used, and uniformly in this time window is contained in In particular the above quantities are well-defined.

Now fix For any we write for short for the space-time configuration

 xt−1Lit(Vi(k))=(xs(j):Lit≤s≤t−1,j∈Vi(k))

and put

 r[k](i,t)(1|xt−1Lit(Vi(k)))=infz∈Sξ:z(Vi(k))=x(Vi(k))ϕi⎛⎜⎝∑jWj→it−1∑s=Lit(x)gj(t−s)zs(j),t−Lit(x)⎞⎟⎠, (4.18)
 r[k](i,t)(0|xt−1Lit(Vi(k)))= infz∈Sξ:z(Vi(k))=x(Vi(k))⎡⎢⎣1−ϕi⎛⎜⎝∑jWj→it−1∑s=Lit(x)gj(t−s)zs(j),t−Lit(x)⎞⎟⎠⎤⎥⎦. (4.19)

In what follows and whenever there is no danger of ambiguity, we will write for short and or instead of and

We put

 α(i,t)(−1)=λ(i,t)(−1)=1∑a=0r[−1](i,t)(a), (4.20)
 α(i,t)(k)=infx∈Sξ(1∑a=0r[k](i,t)(a|x(Vi(k)))),k≥0, (4.21)

and

 λ(i,t)(k)=α(i,t)(k)−α(i,t)(k−1),k≥0. (4.22)

Note that and that almost surely with respect to the realization of

###### Remark 3

The are random variables that depend only on the realization of More precisely, for any and is measurable with respect to the sigma-algebra We write in order to emphasize the dependance on the external field

We introduce the short hand notation

 p(i,t)(1|x)=ϕi⎛⎜⎝∑jWj→it−1∑s=Lit(x)gj(t−s)xs(j),t−Lit(x)⎞⎟⎠ (4.23)

and

 p(i,t)(0|x)=1−p(i,t)(1|x).

The proof of Theorem 1 is based on the following proposition.

###### Proposition 1

Given for all with there exists a family of conditional probabilities satisfying the following properties.

1. For all depends only on the variables

2. For all

3. For all is a measurable random variable.

4. For all we have the following convex decomposition.

 p(i,t)(a|x)=λ(i,t)(−1)p[−1],ξ(i,t)(a)+∑k≥0λ(i,t)(k)p[k],ξ(i,t)(a|x(Vi(k))), (4.24)

where

 p[−1],ξ(i,t)(a)=r[−1](i,t)(a)λ(i,t)(−1).

From now on, we shall omit the subscript whenever there is no danger of ambiguity and write instead of

###### Remark 4

The decomposition (4.24) of the transition probability can be interpreted as follows. In a first step, we choose a random spatial interaction range according to the probability distribution Once the range of the spatial interaction is fixed, we then perform a transition according to which depends only on the finite space-time configuration A comprehensive introduction to this technique can be found in the lecture notes of Fernández et al. (2001).

###### Example 2

Suppose that all interactions are excitatory, i.e. for all Suppose further that for all and that does not depend on and is non-decreasing in Then

 r[−1](i,t)(1)=Φi(∑jWj→iξt−1(j))%andr[−1](i,t)(0)=1−Φi(∑jWj→i(t−Rit)).

Hence,

 λ(i,t)(−1)=1+Φi(∑jWj→iξt−1(j))−Φi(∑jWj→i(t−Rit)).

Similarly,

 r[k](i,t)(1|x)=Φi⎛⎜⎝∑j∉Vi(k)Wj→it−1∑m=Lit(x)ξs(j)+∑j∈Vi(k)Wj→it−1∑s=Lit(x)xs(j)⎞⎟⎠

and

 r[k](i,t)(0|x)=1−Φi⎛⎜⎝∑j∉Vi(k)Wj→i(t−Lit(x))+∑j∈Vi(k)Wj→it−1∑s=Lit(x)xs(j)⎞⎟⎠.

Proof of Proposition 1 We have for any and

 p(i,t)(a|x)=r[−1](i,t)(a)+(N∑k=0Δ[k](i,t)(a|x(Vi(k))))+(p(i,t)(a|x)−r[N](i,t)(a|x)),

where

 Δ[k](i,t)(a|x(Vi(k)))=r[k](i,t)(a|x(Vi(k)))−r[k−1](i,t)(a|x(Vi(k−1))).

Now, due to condition (2.5),

 |p(i,t)(a|x)−r[N](i,t)(a|x)|≤γ⎛⎜⎝t−1∑s=Ritsupjgj(t−s)⎞⎟⎠∑j∉Vi(N)|Wj→i|→0

as due to (2.2). In the above upper bound we used that

 t−1∑s=Litgj(t−s)|zs(j)−xs(j)|≤t−1∑s=Ritsupjgj(t−s)<∞

almost surely, which is a consequence of (2.7). Therefore we obtain the following decomposition.

 p(i,t)(a|x)=r[−1](i,t)(a)+∞∑k=0Δ[k](i,t)(a|x(Vi(k))),a∈{0,1}, for all x∈Sξ. (4.25)

Now let

 p[−1](i,t)(a)=r[−1](i,t)(a)λ(i,t)(−1).

Moreover, for put

 ~λ(i,t)(k,x(Vi(k)))=∑aΔ[k](i,t)(a|x(Vi(k))), (4.26)

and for any such that we define

 ~p[k](i,t)(a|x(Vi(k)))=Δ[k](i,t)(a|x(Vi(k)))~λ(i,t)(k,x(Vi(k))).

For such that define in an arbitrary fixed way. Hence

 p(i,t)(a|x)=λ(i,t)(−1)p[−1](i,t)(a)+∞∑k=0~λ(i,t)(k,x(Vi(k)))~p[k](i,t)(a|x(Vi(k))). (4.27)

In (4.27) the factors still depend on To obtain the desired decomposition, we must rewrite it as follows.

For any take the sequences as defined in (4.21) and (4.22), respectively. Define the new quantities

 α(i,t)(k,x(Vi(k)))=∑l≤k~λ(i,t)(l,x(Vi(l)))

and notice that

 α(i,t)(k,x(Vi(k)))=∑ar[k](i,t)(a,x(Vi(k)))

is the total mass associated to From now on and for the rest of the proof we shall write for short instead of writing

Reading (4.27) again, this means that for any we have to use the transition probability on the interval

By definition of in (4.21), is the smallest total mass associated to uniformly with respect to all possible neighborhoods Hence, in order to get the decomposition (4.24) with weights not depending on the configuration, we have to define a partition of the interval