Interacting neurons : estimation of the spiking rate

Non-parametric estimation of the spiking rate in systems of interacting neurons

Abstract.

We consider a model of interacting neurons where the membrane potentials of the neurons are described by a multidimensional piecewise deterministic Markov process (PDMP) with values in where is the number of neurons in the network. A deterministic drift attracts each neuron’s membrane potential to an equilibrium potential When a neuron jumps, its membrane potential is reset to a resting potential, here while the other neurons receive an additional amount of potential We are interested in the estimation of the jump (or spiking) rate of a single neuron based on an observation of the membrane potentials of the neurons up to time We study a Nadaraya-Watson type kernel estimator for the jump rate and establish its rate of convergence in This rate of convergence is shown to be optimal for a given Hölder class of jump rate functions. We also obtain a central limit theorem for the error of estimation. The main probabilistic tools are the uniform ergodicity of the process and a fine study of the invariant measure of a single neuron.

Key words and phrases:
Piecewise deterministic Markov processes. Kernel estimation. Nonparametric estimation. Biological neural nets.
2010 Mathematics Subject Classification:
62G05; 60J75; 62M05

1. Introduction

This paper is devoted to the statistical study of certain Piecewise Deterministic Markov Processes (PDMP) modeling the activity of a biological neural network. More precisely, we are interested in estimating the the underlying jump rate of the process, i.e. the spiking rate function of each single neuron.

Piecewise Deterministic Markov Processes (PDMP’s) have been introduced by Davis ([8] and [9]) as a family of càdlàg Markov processes following a deterministic drift with random jumps. PDMP’s are widely used in probabilistic modeling of e.g. biological or chemical phenomena (see e.g. [7] or [28], see [1] for an overview). In the present paper, we study the particular case of PDMP’s which are systems of interacting neurons. Building a model for the activity of a neural network that can fit biological considerations is crucial in order to understand the mechanics of the brain. Many papers in the literature use Hawkes Processes in order to describe the spatio-temporal dependencies which are typical for huge systems of interacting neurons, see [12], [15] and [19] for example. Our model can be interpreted as Hawkes process with memory of variable length (see [13]); it is close to the model presented in [10]. It is of crucial interest for modern neuro-mathematics to be able to statistically identify the basic parameters defining the dynamics of a model for neural networks. The most relevant mechanisms to study are the way the neurons are connected to each other and the way that a neuron deals with the information it receives. In [11] and in [15], the authors build an estimator for the interaction graph, in discrete or in continuous time. In the present work, we assume that we observe a subsystem of neurons which are all interconnected and behaving in a similar way. We then focus on the estimation of the firing rate of a neuron within this system. This rate depends on the membrane potential of the neuron, influenced by the activity of the other neurons.

More precisely, we consider a process where is the number of neurons in the network and where each variable represents the membrane potential of neuron for Each membrane potential takes values in a compact interval where is interpreted as resting potential (corresponding to in real neurons) and where (see e.g. [21]). This process has the following dynamic. A deterministic drift attracts the membrane potential of each neuron to an equilibrium potential with an exponential speed of parameter Moreover, a neuron with membrane potential “fires” (i.e., jumps) with intensity where is a given intensity function. When a neuron fires, its membrane potential is reset to interpreted as resting potential, while the membrane potentials of the other neurons are increased by until they reach the maximal potential height

The goal of this paper is to explore the statistical complexity of the model described above in a non-parametric setting. We aim at giving precise statistical characteristics (such as optimal rates of convergence, estimation procedures) such that we are able to compare systems of interacting neurons to benchmark non-parametric models like density estimation or nonlinear regression. More precisely, given the continuous observation 1 of the system of interacting neurons over a time interval (with asymptotics being taken as ), we infer on the different parameters of the model which are: the equilibrium potential the speed of attraction and the spiking rate function . Since in a continuous time setting, the coefficients and are known (they can be identified by any observation of the continuous trajectory of a neuron’s potential between two successive jumps), the typical problem is the estimation of the unknown spiking rate

Therefore we restrict our attention to the estimation of the unknown spiking rate We measure smoothness of the spiking rate by considering Hölder classes of possible shapes for the spiking rate and suppose that the spiking rate has smoothness of order in a Hölder sense. To estimate the jump rate in a position we propose a Nadaraya-Watson type kernel estimator which is roughly speaking of the form

 ^ft(a)=♯ spikes in positions in Bh(a) during [0,t] occupation time of Bh(a) during [0,t] ,

where is a neighborhood of size of the position where we estimate the jump rate function A rigorous definition of the estimator is given in terms of the jump measure and an occupation time measure of the process The convergence of the estimator is implied by the fact that the compensator of the jump measure is the occupation time measure integrated against the jump rate function together with uniform ergodicity of the process. Assuming that the jump rate function has smoothness of order in a Hölder sense, we obtain the classical rate of convergence of order for the point-wise error of the estimator. This rate is shown to be optimal. We also state two important probabilistic tools that are needed in order to obtain the statistical results. The first one is the uniform positive Harris recurrence of process. The second one is the existence of a regular density function of the invariant measure of a single neuron.

In the literature, non-parametric estimation for PDMP’s has already been studied, see for example [2] and, more particularly concerning the estimation of the jump rate, [3]. On the contrary to these studies, the framework of the present work is more difficult for two reasons. The first reason is the fact that our process is multidimensional, presenting real interactions between the neurons. Of course, estimation problems for multidimensional PDMP’s have already been studied. However, in all cases we are aware of, a so-called “many-to-one formula” (see [24], see also [18]) allows to express the occupation time measure of the whole system in terms of a single “typical” particle. This is not the case in the present paper – and it is for this reason that we have to work under the relatively strong condition of uniform ergodicity which is implied by compact state space – a condition which is biologically meaningful. The second, more important, reason is the fact that the transition kernel associated to jumps is degenerate. This is why the construction of our estimator is different from other constructions in previous studies. The degeneracy of the transition kernel also leads to real difficulties in the study of the regularity of the invariant density of a single neuron, see [27] and the discussions therein.

In Section 2, we describe more precisely our model and state our main results. We first provide two probabilistic results necessary to prove the convergence of the estimator: firstly, the positive Harris recurrence of the process in Theorem 1 and secondly the properties of the invariant measure in Theorem 2. The speed of convergence of our estimator is established in Theorem 3. Finally, Theorem 4 states that our speed of convergence is optimal for the point-wise error, uniformly in The key tool to prove this optimality is to study the asymptotic properties of the likelihood process for a small perturbation of the function close to

The proofs of Theorems 1,3 and 4 are respectively given in Sections 3, 4 and 5. We refer the reader to [27] for a proof of Theorem 2.

2. The model

2.1. The dynamics

Let be fixed and be a family of i.i.d. Poisson random measures on having intensity measure We study the Markov process taking values in and solving, for , for ,

 (2.1) Xit = Xi0−λ∫t0(Xis−m)ds−∫t0∫∞0Xis−1{z≤f(Xis−)}Ni(ds,dz) +∑j≠i∫t0∫∞0aK(Xis−)1{z≤f(Xjs−)}Nj(ds,dz).

In the above equation, is a positive number, is the equilibrium potential value such that Moreover, we will always assume that Finally, the functions and satisfy (at least) the following assumption.

Assumption 1.

1. is non-increasing and smooth, for all and for all
2. is non-decreasing, and there exists non-decreasing, such that for all .

All membrane potentials take values in where is the maximal height of the membrane potential of a single neuron. is interpreted as resting potential (corresponding to in real neurons) and (see e.g. [21]). In (2.1), gives the speed of attraction of the potential value of each single neuron to an equilibrium value The function denotes the increment of membrane potential received by a neuron when an other neuron fires. For neurons with membrane potential away from the bound this increment is equal to However, for neurons with membrane potential close to this increment may bring their membrane potential above the bound This is why we impose this dynamic close to the bound

In what follows, we are interested in the estimation of the intensity function assuming that the parameters and are known and that the function belongs to a certain Hölder class of functions. The parameters of this class of functions are also supposed to be known. The assumption comes from biological considerations and expresses the fact that a neuron, once it has fired, has a refractory period during which it is not likely to fire.

The generator of the process is given for any smooth test function and by

 (2.2) Lφ(x)=N∑i=1f(xi)[φ(Δi(x))−φ(x)]−λ∑i(∂φ∂xi(x)[xi−m]),

where

 (2.3) (Δi(x))j={xj+aK(xj)j≠i0j=i}.

The existence of a process with such dynamics is ensured by an acceptance/rejection procedure that allows to construct solutions to (2.1) explicitly. More precisely, since each neuron spikes at maximal intensity we can work conditionally on the realization of a Poisson process with intensity We construct the process considering the jump times of as candidates for the jump times of and accepting them with probability

 ∑Ni=1f(Xi¯Tn−)Nf(K).

It is then possible to construct a solution to (2.1) step by step, following the deterministic drift between the jump times of and jumping according to this acceptance/rejection procedure. We refer the reader to Theorem 9.1 in chapter IV of [20] for a proof of the existence of the process

We denote by the probability measure under which the solution of (2.1) starts from Moreover, denotes the probability measure under which the process starts from Figure 1 is an example of trajectory for neurons, choosing and

The aim of this work is to estimate the unknown firing rate function based on an observation of continuously in time. Notice that for all reaches the value only through jumps. Therefore, the following definition gives the successive spike times of the th neuron, We put

 Ti0=0,Tin=inf{t>Tin−1:Xit−>0,Xit=0},n≥1,

and introduce the jump measures

 μi(ds,dy)=∑n≥11{Tin<∞}δ(Tin,XiTin−)(dt,dy),μ(dt,dx)=N∑i=1μi(ds,dx).

By our assumptions, is compensated by and therefore the compensator of is given by

 ^μ(dt,dy)=f(y)η(dt,dy), where η(A×B)=∫A(N∑i=11B(Xis))ds

is the total occupation time measure of the process

We will also write for the successive jump times of the process i.e.

 T0=0,Tn=inf{Tik:Tik>Tn−1,k≥1,1≤i≤N},n≥1.

For some kernel function such that

 (2.4) Q∈Cc(R),∫RQ(y)dy=1,

we define the kernel estimator for the unknown function at a point with bandwidth , based on observation of up to time by

 (2.5) ^ft,h(a)=∫t0∫RQh(y−a)μ(ds,dy)∫t0∫RQh(y−a)η(ds,dy), where Qh(y):=1hQ(yh) and 00:=0.

For small, is a natural estimator for Indeed, this expression as a ratio follows the intuitive idea to count the number of jumps that occurred with a position close to and to divide by the occupation time of a neighborhood of which is natural to estimate an intensity function depending on the position More precisely, by the martingale convergence theorem, the numerator should behave, for large, as But by the ergodic theorem,

 ∫t0∫RQh(y−a)f(y)η(ds,dy)∫t0∫RQh(y−a)η(ds,dy)→π1(Qh(⋅−a)f)π1(Qh(⋅−a))

as where is the stationary measure of each neuron Finally, if the invariant measure is sufficiently regular, then

 π1(Qh(⋅−a)f)π1(Qh(⋅−a))→f(a)

as

We restrict our study to fixed Hölder classes of rate functions For that sake, we introduce the notation for and We consider the following Hölder class for arbitrary constants and a function as in Assumption 1.

 (2.6) H(β,F,L,fmin)={f∈Ck(R+):|dldxlf(x)|≤F, for all 0≤l≤k,x∈[0,K],f(x)≥fmin(x) for all x∈[0,K], |f(k)(x)−f(k)(y)|≤L|x−y|α for all x,y∈[0,K]}.

2.2. Probabilistic results

In this Section, we collect important probabilistic results. We first establish that the process is recurrent in the sense of Harris.

Theorem 1.

Grant Assumption 1. Then the process is positive Harris recurrent having unique invariant probability measure i.e. for all

 (2.7) π(B)>0 implies Px(∫∞01B(Xs)ds=∞)=1

for all Moreover, there exist constants and which do only depend on the class but not on such that

 (2.8) supf∈H(β,F,L,fmin)∥Pt(x,⋅)−π∥TV≤Cκ−t.

It is well-known that the behavior of a kernel estimator such as the one introduced in (2.5) depends heavily on the regularity properties of the invariant probability measure of the system. Our system is however very degenerate. Firstly, it is a piecewise deterministic Markov process (PDMP) in dimension with interactions between particles. Hence, no Brownian noise is present to smoothen things. Moreover, the transition kernels associated to the jumps of system (2.1) are highly degenerate (recall (2.3)). The transition kernel

 K(x,dy)=L(XT1|XT1−=x)(dy)=N∑i=1f(xi)¯f(x)δΔi(x)(dy)

with puts one particle (the one which is just spiking) to the level As a consequence, the above transition does not create density – and it even destroys smoothness due to the reset to of the spiking neuron. Finally, the only way that “smoothness” is generated by the process is the smoothness which is present in the “noise of the jump times” (which are basically of exponential density). For this reason, we have to stay away from the point where the drift of the flow vanishes. Moreover, the reset-to- of the spiking particles implies that we are not able to say anything about the behavior of the invariant density of a single particle in (actually, near to ) neither. Finally, we also have to stay strictly below the upper bound of the state space That is why we introduce the following open set given by

 (2.9) Sd,β:={w∈[0,K]:⌊β⌋Nd},

where is the smoothness of the fixed class that we consider and where is fixed such that Notice that also depends on and which are supposed to be known. We are able to obtain a control of the invariant measure only on this set The dependence in is due to the fact that the regularity of is transmitted to the invariant measure by the means of successive integration by parts (see [27] for more details).

We quote the following theorem from [27].

Theorem 2.

(Theorem 5 of [27])

Suppose that Let

 π1:=Lπ(X1t)

be the invariant measure of a single neuron, i.e.  Then possesses a bounded continuous Lebesgue density on for any such that which is bounded on uniformly in Moreover, and

 (2.10) supℓ≤⌊β⌋,w∈Sd,β|π(ℓ)1(w)|+supw≠w′,w,w′∈Sd,βπ(⌊β⌋)1(w)−π(⌊β⌋)1(w′)|w−w′|α≤CF,

where the constant depends on and on the smoothness class but on nothing else.

2.3. Statistical results

We can now state the main theorem of our paper which describes the quality of our estimator in the minimax theory. We assume that and are known and that is the only parameter of interest of our model. We shall always write and in order to emphasize the dependence on the unknown Fix some and some suitable point For any possible rate of convergence increasing to and for any process of measurable estimators we shall consider point-wise square risks of the type

 supf∈H(β,F,L,fmin)r2tEfx[|^ft(a)−f(a)|2|At,r],

where

 At,r:={1Nt∫t0∫RQh(y−a)η(ds,dy)≥r}

is roughly the event ensuring that sufficiently many observations have been made near during the time interval We are able to choose small enough such that

 (2.11) liminft→∞inff∈H(β,F,L,fmin)Pfx(At,r)=1,

see Proposition 8 below.

Recall that the kernel is chosen to be of compact support. Let us write for the diameter of the support of therefore if For any fixed write Here,

Theorem 3.

Let and choose such that for all and Then there exists such that the following holds for any and for any

(i) For the kernel estimate (2.5) with bandwidth for all

 limsupt→∞supf∈H(β,F,L,fmin)t2β2β+1Efx[|^ft,ht(a)−f(a)|2|At,r]<∞.

(ii) Moreover, for for every and

 √tht(^ft,ht(a)−f(a))→N(0,Σ(a))

weakly under where

The next theorem shows that the rate of convergence achieved by the kernel estimate is indeed optimal.

Theorem 4.

Let and be any starting point. Then we have

 (2.12) liminft→∞inf^ftsupf∈H(β,F,L,fmin)t2β1+2βEfx[|^ft(a)−f(a)|2]>0,

where the infimum is taken over the class of all possible estimators of

The proofs of Theorems 3 and 4 are given in Sections 4 and 5.

2.4. Simulation results

In this subsection, we present some results on simulations, for different jump rates The other parameters are fixed: and The dynamics of the system are the same when and have the same ratio. In other words, variations of and keeping the same ratio between the two parameters lead to the same law for the process rescaled in time. This is why we fix and propose different choices for The kernel used here is a truncated Gaussian kernel with standard deviation 1.

We present for each choice of a jump rate function the associated estimated function and the observed distribution of or more precisely of Figures 2, 3 and 4 correspond respectively to the following definitions of and

For Figures 2, 3 and 4, we fixed the length of the time interval for observations respectively to and This allows us to obtain a similar number of jump for each simulation, respectively equal to and These simulations are realized with the software R.

The optimal bandwidth depends on the regularity of given by the parameter Therefore, we propose a data-driven bandwidth chosen according to a Cross-validation procedure. For that sake, we define the sequence by for all For each and each sample for we define the random variable by

 ^πℓ,n,h1(a)=1(n−ℓ)Nn∑k=ℓ+1N∑i=1Qh(Zik−a).

can be seen as an estimator of the invariant measure of the discrete Markov chain.

We propose an adaptive estimation procedure at least for this simulation part. We use a Smoothed Cross-validation (SCV) to choose the bandwidth (see for example the paper of Hall, Marron and Park [14]), following ideas which were first published by Bowmann [5] and Rudemo [29]. As the bandwidth is mainly important for the estimation of the invariant probability , we use a Cross validation procedure for this estimation. More precisely, we use a first part of the trajectory to estimate and then another part of the trajectory to minimize the Cross validation in In order to be closer to the stationary regime, we chose the two parts of the trajectory far from the starting time. Moreover we chose two parts of the trajectory sufficiently distant from each other. This is why we consider and such that

We use the method of the least squares Cross validation and minimize

 SCV(h)=∫(ˆπℓ,n,h1(x))2dx−2N(m2−m1)m2∑k=m1+1N∑i=1ˆπℓ,n,h1(Zik)

(where we have approximated the integral term by a Riemann approximation), giving rise to a minimizer We then calculate the estimator the long of the trajectory. In the next figure, we use this method to find the reconstructed with an adaptive choice of

As expected, we can see that the less observations we have, the worse is our estimator. Note that close to the observed density of explodes. This was also expectable due to the reset to of the jumping neurons. Moreover, the simulations show a lack of regularity of the observed density close to which is consistent with our results, but this does not seem to affect the quality of the estimator.

3. Harris recurrence of X and speed of convergence to equilibrium – Proof of Theorem 1

In this section, we give the proof of Theorem 1 and show that the process is positive recurrent in the sense of Harris. We follow a classical approach and prove the existence of regeneration times. This is done in the next subsection and follows ideas given in Duarte and Ost [10].

3.1. Regeneration

The main idea of proving a regeneration property of the process is to find some uniform “noise” for the whole process on some “good subsets” of the state space. Since the transition kernel associated to the jumps of our process is not creating any density (and actually destroys it for the spiking neurons which are reset to ), the only source of noise is given by the random times of spiking. These random times are then transported through the deterministic flow which is given for any starting configuration by

 (3.13) γs,t(vi)=e−λ(t−s)vi+(1−e−λ(t−s))m,0≤s≤t,γt(vi):=γ0,t(vi).

The key idea of what follows – which is entirely borrowed from [10] – is the following.

Write for the sequence giving the index of the spiking neuron at time i.e. if and only if for some It is clear that in order to produce an absolute continuous law with respect to the Lebesgue measure on we need at least jumps of the process. On any event of the type it is possible to write the position of the process at time as a concatenation of the deterministic flows given by

 (3.14) Γ(t1,…,tN,i1,…,iN)(t,v)=γtN,t(ΔiN(γtN−1,tN(ΔiN−1(…Δi1(γ0,t1(v)))))).

Proving absolute continuity amounts to prove that the determinant of the Jacobian of the map does not vanish. For general sequences of this will not be true (think e.g. of the sequence ).

The main idea is however to consider the sequence and to use the regeneration property of spiking, i.e. the fact that the neuron spiking at time is reset to zero at time In this case, for all later times, its position does not depend on any more. In other words, the Jacobian of is a diagonal matrix, and all we have to do is to control that all diagonal elements do not vanish. The second idea is to linearize the flow, i.e. to consider the flow during very short time durations, and to use that, just after spiking, each diagonal element is basically of the form

 ∂γs,t(0)∂s∼−λm, as t−s→0.

The important fact here is that the absolute value of the drift term of the deterministic flow of one neuron is strictly positive when starting from the initial value

In the following, this idea is made rigorous. Our proof follows the approach given in Section 4 of [10]. We fix and put

 Aε={iε−ε/4

and

 S={I1=1,I2=2,…,IN=N}

which is the event that all neurons have spiked in the fixed order given by their numbers, i.e. neuron spikes first, then neuron then and so on. We introduce

 u∗=(N−1N,N−2N,…,1N,0)

which would be the position of neurons after spikes and on the event if (here, we suppose w.l.o.g. that ).

Now we fix any initial configuration and introduce the sequence of configurations given by and

 (3.15) vi(k)=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩k−iN,ik⎫⎪ ⎪ ⎪⎬⎪ ⎪ ⎪⎭.

Notice that if Notice also that

We cite the following lemma from [10].

Lemma 1 (Lemma 4.1 of [10]).

If then on the event we have for all
(i) if

(ii) if

(iii) if and

Here, and Moreover, the remainder functions are of order

 Rδε(Tk1,u)=O(δε),Rε2(Tk1,u)=O(ε2),Rδ(u)=O(δ),…,

and all partial derivatives are of order either or uniformly in

Remark 1.

Our model is slightly different from the model in [10]: instead of an attraction to the empirical mean of the system, we have an attraction to a fixed equilibrium value This leads to our definition of which is slightly different from the one used in [10].

Corollary 1 ([10], Corollary 3).

Put Then we have on

 (3.16) Xi(t∗)=u∗i+λ(t∗−TN)d∗i+N∑r=i+1λ(Tr−Tr−1)di(r−1)+Rδε(TN1,u)+Rε2(TN1,u),

where

We put as in [10] where

 γ0i(tN1):=u∗i+λ(t∗−tN)d∗i+N∑r=i+1λ(tr−tr−1)di(r−1),1≤i≤N.

Hence models how the successive jump times are mapped, through the deterministic flow, into a final position at time on the event In order to control how the law of the successive jump times is transported through this flow, we calculate the partial derivatives of with respect to One sees immediately that

 ∂γ0i∂tk=0,k

whence

Corollary 2 (Corollary 4 of [10]).

For each the determinant of the Jacobian of the map is given by

 λNmN+Rε(tN1,u)+Rδ(tN1,u)

which is different from zero for and small enough, for all

As in Proposition 4.1 of [10], we now have two important conclusions from the above discussion.

Proposition 1.

There exists and such that for

 (3.17) Pt∗(x,⋅)≥η11Bδ∗(u∗)(x)ν,

where is a probability measure and

The lower bound (3.17) is a local Doeblin condition, and its proof is given in Proposition 4.1 of [10]. We call a regeneration set: if the process visits this regeneration set, then after a time there is a probability that the law of the process is independent from its initial position

To be able to make use of the local Doeblin condition, we have to be sure that the process actually does visit the regeneration set This is granted by the following result.

Proposition 2.

There exist and such that

 inff∈H(β,F,L,fmin)infv∈[0,K]NPt∗(v,Bδ∗(u∗))≥η2,

for

Proof.

By (3.16), there exists such that for all we have that on when Hence

 Pt∗(v,Bδ∗(u∗))≥Pv(Aε∩S).

Recalling (3.13), we then obtain

 Pv(Aε∩S)=