
# Hearing the Maximum Entropy Potential of neuronal networks

R. Cofré and B. Cessac
NeuroMathComp team (INRIA, UNSA LJAD), 2004 Route des Lucioles, 06902 Sophia-Antipolis, France
July 15, 2019
###### Abstract

We consider a spike-generating stationary Markov process whose transition probabilities are known. We show that there is a canonical potential whose Gibbs distribution, obtained from the Maximum Entropy Principle (MaxEnt), is the equilibrium distribution of this process. We provide a method to compute this potential explicitly and exactly, as a linear combination of spatio-temporal interactions. In particular, our results establish an explicit relation between Maximum Entropy models and the neuro-mimetic models used in spike train statistics.

###### pacs:
87.19.lo 05.10.-a 87.10.-e 87.85.dq

The spike train response of a neuronal network to external stimuli is largely conditioned by the stimulus itself, synaptic interactions and the network's history [11]. Understanding these dependencies is a current challenge in neuroscience [17]. One current research trend is based on the assumption that spikes are generated by a Markov process whose transition probabilities are derived from our knowledge of neuronal network dynamics ("neuro-mimetic" models). Prominent examples are the Linear-Nonlinear (LN) model and the Generalized Linear Model (GLM) [3, 6], or Integrate-and-Fire models [11, 8]. In all these examples the transition probabilities are explicit functions of "structural" parameters of the neural network (the synaptic weights matrix and the stimulus) (Fig. 1a). Another trend is based on the Maximum Entropy Principle (MaxEnt) [13]. It consists of fixing a set of constraints, determined as the empirical averages of observables measured from the spiking activity. Maximizing the statistical entropy under those constraints provides a unique probability distribution, called a Gibbs distribution. The choice of constraints determines a "model". Prominent examples are the Ising model [18, 19], where constraints are firing rates and the probabilities that pairs of neurons fire at the same time; the Ganmor-Schneidman-Segev model [9], which additionally considers the probabilities of triplets and quadruplets of spikes; and the Tkačik et al. model [20], where the probability that $K$ out of the $N$ cells in the network generate simultaneous action potentials is an additional constraint.
In these examples dynamics has no memory, but Markovian models where the probability of a spike pattern at a given time depends on the spike history can be considered as well [15, 21]. These models depend on a set of parameters (Lagrange multipliers) which are naturally interpreted, in the case of pairwise constraints, as "functional interactions" between neurons [9] (see Fig. 1b for the Ising model).

To summarize, (at least) two different representations can be used to analyze spike train statistics in neuronal networks (Fig. 1).

The goal of this paper is to establish an explicit and exact link between these two representations. An important work in this direction has been done in [7], linking the "neuro-mimetic" Integrate-and-Fire model to the Ising model. We propose here a generalization which allows us to handle more general types of neuro-mimetic models as well as general MaxEnt distributions (including memory). The method we use is based on the Hammersley-Clifford decomposition theorem [12] and the periodic orbit decomposition in ergodic theory [16]. The techniques are therefore different from those of [7]. More generally, we solve the following problem: assuming that a spike train has been generated by a Markov process whose transition probabilities are given (and strictly positive), can we construct a MaxEnt model, with a minimum of constraints, reproducing exactly the (spatio-temporal) statistics of this process? When the Markov process is generated by a neuro-mimetic model, this establishes an exact correspondence between the structural parameters and the parameters of the MaxEnt model.

We study a network composed of $N$ neurons. Time has been discretized so that a neuron can fire at most one spike within one time bin. A spike train is represented by a binary time series with entries $\omega_k(n) = 1$ if neuron $k$ fires at time $n$ and $\omega_k(n) = 0$ otherwise. The spiking pattern at time $n$ is the vector $\omega(n) = \left(\omega_k(n)\right)_{k=1}^{N}$. A spike block $\omega_m^n$ is the ordered list of spiking patterns $\omega(m), \ldots, \omega(n)$, where spike times range from $m$ to $n$.
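To fix ideas, this binary notation is easy to manipulate numerically. The following Python sketch is our own illustration, not part of the paper; the toy sizes and the `spike_block` helper are hypothetical choices:

```python
import numpy as np

N, T = 3, 5                      # toy sizes: N neurons, T time bins
rng = np.random.default_rng(0)
omega = rng.integers(0, 2, size=(N, T))   # omega[k, n] = 1 iff neuron k spikes at time n

def spike_block(omega, m, n):
    """Spike block omega_m^n: the ordered list of spiking patterns from time m to n."""
    return omega[:, m:n + 1]

block = spike_block(omega, 1, 3)
assert block.shape == (N, 3)     # three consecutive spiking patterns (columns)
```

Columns of `block` are the spiking patterns $\omega(1), \omega(2), \omega(3)$.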

In a neuronal network the probability that a spike pattern occurs at time $n$ is the result of complex membrane potential dynamics [11]. A simplification consists of assuming that this probability is only a function of the spike history up to a certain memory depth $D$. This provides a family of conditional probabilities $P_n\!\left[\omega(n)\,\middle|\,\omega_{n-D}^{n-1}\right]$, which may depend explicitly on time, as indicated by the sub-index $n$. They define a Markov chain. In neuro-mimetic models these probabilities depend on parameters that mimic biophysical quantities such as the synaptic weights matrix $W$ and the stimulus, e.g.:

$$P_n\!\left[\,\omega(n)\,\middle|\,\omega_{n-D}^{n-1}\,\right]=g\!\left[\,\omega(n),\,W X\!\left(\omega_{n-D}^{n-1}\right)+i\!\left(\omega_{n-D}^{n}\right)\,\right] \tag{1}$$

where $X$ integrates the spike history, whereas $i$ integrates the stimulus effect. $g$ is a non-linear function (typically a sigmoid) of its second argument. The probability also depends on the spike pattern $\omega(n)$ at time $n$ (first argument of $g$).

In this paper, we make two assumptions. (i) Transition probabilities do not depend on time, i.e. spike statistics is stationary; we can then drop the index $n$ in $P_n$. (ii) Transition probabilities are strictly positive. This ensures that there is a unique invariant distribution for the Markov chain, denoted $\mu$. We note:

$$\phi\!\left(\omega_0^D\right)=\log P\!\left[\,\omega(D)\,\middle|\,\omega_0^{D-1}\,\right] \tag{2}$$

From the Chapman-Kolmogorov equation we have, for $n > D$, $\mu\!\left[\omega_0^{n}\right]=\mu\!\left[\omega_0^{D-1}\right] e^{\sum_{l=0}^{n-D}\phi\left(\omega_l^{l+D}\right)}$. This suggests that $\mu$ is a Gibbs distribution with potential $\phi$ and normalization factor $1$. We see below that this is indeed the case. We call $\phi$ a normalized potential.
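As a toy numerical check (our own sketch; the strictly positive transition matrix below is randomly generated, not derived from a neuro-mimetic model), one can tabulate a kernel for $N = 2$ neurons with depth $D = 1$ and verify that exponentiating the normalized potential (2) recovers a probability:

```python
import numpy as np
import itertools

N, D = 2, 1
patterns = list(itertools.product([0, 1], repeat=N))    # the 2^N spiking patterns

rng = np.random.default_rng(1)
# Strictly positive transition matrix P[w' | w], rows indexed by the history w.
P = rng.uniform(0.1, 1.0, size=(len(patterns), len(patterns)))
P /= P.sum(axis=1, keepdims=True)

def phi(w_prev, w_next):
    """Normalized potential phi(omega_0^D) = log P[omega(D) | omega_0^{D-1}] (eq. 2)."""
    return np.log(P[patterns.index(w_prev), patterns.index(w_next)])

# phi exponentiates back to a transition probability, so it sums to one:
total = sum(np.exp(phi((0, 1), w)) for w in patterns)
assert abs(total - 1.0) < 1e-12
```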

The MaxEnt also defines a Markov chain in the following way. A potential of range $R = D+1$ is a function $H$ which associates to a spike block $\omega_0^D$ a real value. We assume $H$ to be finite. Any such potential can be written:

$$H\!\left(\omega_0^D\right)=\sum_{l=0}^{L}h_l\,m_l\!\left(\omega_0^D\right) \tag{3}$$

where $L = 2^{NR}-1$. The $h_l$'s are real numbers, whereas the function $m_l\!\left(\omega_0^D\right)=\prod_{u=1}^{d}\omega_{k_u}(n_u)$, with $(k_u, n_u)$, $u = 1, \ldots, d$, a set of spike events, is called a monomial. $d$ is the degree of the monomial. By analogy with spin systems, we see from (3) that monomials somewhat constitute spatio-temporal interactions: degree-one monomials correspond to "self-interactions", degree-two to pairwise interactions, and so on. In many examples most $h_l$'s are equal to zero. For example, the Ising model considers only monomials of the form $\omega_i(0)$ (singlets) or $\omega_i(0)\,\omega_j(0)$ (pairwise events). More generally, the potential (3) considers spatio-temporal events occurring within a time horizon $D$.
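A minimal Python sketch of monomials and of the expansion (3) (our own illustration; the coefficients and spike events below are hypothetical):

```python
import numpy as np

# A monomial picks a set of spike events {(k_1, n_1), ..., (k_d, n_d)} and returns
# the product omega_{k_1}(n_1) * ... * omega_{k_d}(n_d): 1 iff all those spikes occur.
def monomial(events):
    return lambda block: int(all(block[k, n] for (k, n) in events))

# Potential of range R = 2 with one rate term and one pairwise (Ising-like) term.
h = [0.5, -0.3]                          # hypothetical coefficients h_l
m = [monomial([(0, 1)]),                 # degree 1: neuron 0 fires at time 1
     monomial([(0, 1), (1, 1)])]         # degree 2: neurons 0 and 1 fire together

def H(block):
    """H(omega_0^D) = sum_l h_l m_l(omega_0^D)  (eq. 3)."""
    return sum(hl * ml(block) for hl, ml in zip(h, m))

block = np.array([[0, 1],
                  [1, 1]])               # columns are the spiking patterns at times 0, 1
assert abs(H(block) - 0.2) < 1e-12       # both monomials equal 1 on this block
```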

There is a unique stationary probability $\mu$, called the Gibbs distribution with potential $H$, satisfying:

$$P[H]=\sup_{\nu\in\mathcal{M}}\big(S[\nu]+\nu[H]\big)=S[\mu]+\mu[H] \tag{4}$$

where $\mathcal{M}$ is the set of stationary probabilities on the set of spike trains, $S[\nu]$ is the entropy of $\nu$, and $\nu[H]$ is the average of $H$ with respect to $\nu$. The average value of each $m_l$, $\mu[m_l]$, is fixed, e.g. by experiments. This constitutes the set of constraints in (4). The quantity $P[H]$ is the free energy; this is a convex function of the $h_l$'s.

Two distinct potentials of range $R$ can correspond to the same Gibbs distribution (we call them equivalent). A standard result in ergodic theory states that $H^{(1)}$ and $H^{(2)}$ are equivalent if and only if there exists a range-$D$ function $f$ such that [5]:

$$H^{(2)}\!\left(\omega_0^D\right)=H^{(1)}\!\left(\omega_0^D\right)-f\!\left(\omega_0^{D-1}\right)+f\!\left(\omega_1^{D}\right)+\Delta \tag{5}$$

where $\Delta = P[H^{(2)}] - P[H^{(1)}]$. This relation establishes a strict equivalence and does not correspond, e.g., to renormalization. The "if" part is easy. Indeed, injecting (5) in the variational form (4), the terms corresponding to $f$ cancel because $\nu$ is time-translation invariant. Therefore, the supremum is reached for the same Gibbs distribution as $H^{(1)}$, whereas $\Delta$ is indeed the difference of free energies. The "only if" part is trickier.

Let us give a key example. To any potential $H$ of the form (3) corresponds a unique equivalent normalized potential $\phi$ of the form (2). In this case the function $f$ acts as a normalization function. For potentials of range $1$ ($D=0$), $f = \log Z$, where $Z$ is the partition function, and one obtains the standard normalization of spatial Gibbs distributions. In the spatio-temporal case $\phi$ is written in terms of the largest eigenvalue and the corresponding right eigenvector of a transfer matrix associated with $H$ [10].
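The transfer-matrix normalization can be sketched numerically as follows (our own toy example: `H` is a random range-2 potential on pairs of spiking patterns, not one derived from data). Dividing $e^{H}$ by the largest eigenvalue and the right Perron eigenvector yields a stochastic matrix, i.e. the normalized potential:

```python
import numpy as np
import itertools

N = 2
patterns = list(itertools.product([0, 1], repeat=N))
rng = np.random.default_rng(4)
H = rng.normal(size=(len(patterns), len(patterns)))   # hypothetical H(w, w')

L_mat = np.exp(H)                          # positive transfer matrix
eigvals, eigvecs = np.linalg.eig(L_mat)
k = int(np.argmax(eigvals.real))
lam = eigvals[k].real                      # largest (Perron) eigenvalue, real and > 0
r = np.abs(eigvecs[:, k].real)             # right Perron eigenvector, strictly positive

# Normalized potential: phi(w, w') = H(w, w') + log r(w') - log r(w) - log lam
phi = H + np.log(r)[None, :] - np.log(r)[:, None] - np.log(lam)
T = np.exp(phi)                            # transition matrix of the associated chain
assert np.allclose(T.sum(axis=1), 1.0)     # each row sums to one: phi is normalized
```

Here $\log\lambda$ plays the role of the free energy, in line with the variational principle (4).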

This result establishes a relation between Markov chain-normalized potentials (2) on one hand and potentials of the form (3) on the other hand (the arrow in Fig. 1). However, due to the arbitrariness in the choice of $f$ in (5), there are infinitely many potentials corresponding to the same Gibbs distribution (the same normalized potential $\phi$). So the next question is: can we find a canonical form of $H$, with a minimal number of terms, equivalent to a given normalized potential $\phi$? The situation is a bit like normal forms in bifurcation theory, where changes of variables permit the elimination of non-resonant terms in the Taylor expansion of the vector field [1]. Here, the role of the change of variables is played by $f$. By suitable choices of $f$ one should be able to eliminate some monomials in the expansion (3). An evident situation corresponds to monomials related by time translation, e.g. $\omega_k(0)$ and $\omega_k(1)$: since any $\nu \in \mathcal{M}$ is time-translation invariant, $\nu[\omega_k(0)] = \nu[\omega_k(1)]$, the firing rate of neuron $k$. Such monomials correspond to the same constraint in (4) and can therefore be eliminated. A potential where monomials related by time translation to a given monomial have been eliminated (the corresponding $h_l$ vanishes) is called canonical. A canonical potential thus contains, in general, a reduced, though still exponentially large, number of terms. One can show that only these monomials can be eliminated by a transformation like (5). This implies that two canonical potentials are equivalent if and only if their coefficients $h_l$, $l \geq 1$, are equal [4]. There is still an arbitrariness due to the constant term $h_0$ ("gauge" invariance). One can set it equal to $0$ without loss of generality. In this way, there is only one canonical potential, with a minimal number of monomials, corresponding to a given stationary Markov chain.

The next question is: how to compute the coefficients $h_l$ of the canonical potential from the knowledge of $\phi$? Given a spike block, a periodic orbit of period $\tau$ is a sequence of spike blocks $\omega^{(l_1)}, \ldots, \omega^{(l_\tau)}$, read from a $\tau$-periodic spike train, where each block is obtained from the previous one by a one-time-step shift and $\omega^{(l_{\tau+1})} = \omega^{(l_1)}$. Now, from (5) we have, for such a periodic orbit, $\sum_{n=1}^{\tau} H^{(2)}\!\left(\omega^{(l_n)}\right) = \sum_{n=1}^{\tau} H^{(1)}\!\left(\omega^{(l_n)}\right) + \tau\Delta$, because the $f$-terms disappear when summed along a periodic orbit. It follows that the sum of a potential along a periodic orbit is an invariant (up to the constant term $\tau\Delta$) in the class of equivalent potentials. This is a classical result in ergodic theory, extending to infinite-range potentials [14]. Therefore, in the correspondence between $\phi$ and its canonical potential $H$ we have:

$$\sum_{n=1}^{\tau}\sum_{l=0}^{L}h_l\,m_l\!\left(\omega^{(l_n)}\right)=\sum_{n=1}^{\tau}\phi\!\left(\omega^{(l_n)}\right)+\tau\,P[H] \tag{6}$$

This equation is especially useful if one takes advantage of an existing hierarchy between blocks and between monomials, the Hammersley-Clifford hierarchy [12]. Each spike block is associated to a unique integer (index) $l$. We denote $\omega^{(l)}$ the spike block corresponding to the index $l$. Likewise, since a monomial is defined by a set of spike events $(k_u, n_u)$, $u = 1, \ldots, d$, one can associate to each monomial a spike block or "mask" whose only bits '1' are located at $(k_u, n_u)$, $u = 1, \ldots, d$. This mask has therefore an index. Whereas the labeling of monomials in (3) was arbitrary, $m_l$ denotes from now on the monomial with mask $\omega^{(l)}$. We define the block inclusion $\sqsubseteq$ by $\omega^{(l)} \sqsubseteq \omega^{(l')}$ if every '1' of $\omega^{(l)}$ is also a '1' of $\omega^{(l')}$, with the convention that the block of degree $0$, $\omega^{(0)}$, is included in all blocks. This inclusion defines the Hammersley-Clifford hierarchy. Then, for two integers $l, l'$, $m_l\!\left(\omega^{(l')}\right) = 1$ if and only if $\omega^{(l)} \sqsubseteq \omega^{(l')}$. It follows that (6) becomes

$$\sum_{n=1}^{\tau}\,\sum_{l\sqsubseteq l_n}h_l\,m_l\!\left(\omega^{(l_n)}\right)=\sum_{n=1}^{\tau}\phi\!\left(\omega^{(l_n)}\right)+\tau\,P[H] \tag{7}$$

where, with a slight abuse of notation, $l \sqsubseteq l_n$ stands for $\omega^{(l)} \sqsubseteq \omega^{(l_n)}$.
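In practice this indexing is convenient because block inclusion becomes a bitwise test once each mask is flattened into an integer. A small sketch (our own; the bit ordering is an arbitrary choice):

```python
# Each spike block of N neurons over R time steps is a binary word, hence an
# integer index; block inclusion l ⊑ l' then reduces to a bitwise AND test.
N, R = 2, 2

def block_to_index(block):
    # Flatten the N x R binary block into an integer (row-major bit order).
    bits = [block[k][n] for k in range(N) for n in range(R)]
    return sum(b << i for i, b in enumerate(bits))

def included(l, lp):
    """l ⊑ l': every '1' of mask l is also a '1' of mask l'."""
    return (l & lp) == l

l_single = block_to_index([[0, 1], [0, 0]])   # neuron 0 spikes at time 1
l_pair   = block_to_index([[0, 1], [0, 1]])   # neurons 0 and 1 spike at time 1
assert included(l_single, l_pair)
assert not included(l_pair, l_single)
assert included(0, l_pair)                    # the empty mask is included in all blocks
```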

What is the use of this relation? Start from the first mask in the hierarchy, the mask $\omega^{(0)}$ containing only '0's, whose corresponding monomial is $m_0 = 1$, and consider the periodic orbit, of period $1$, $\omega^{(0)}$. The application of (7) gives $h_0 = \phi\!\left(\omega^{(0)}\right) + P[H]$ and, since we choose $h_0 = 0$ for the canonical potential, we obtain a direct way to compute the free energy, $P[H] = -\phi\!\left(\omega^{(0)}\right)$. In the memory-less case one has $P[H] = \log Z$. To save space we now give the following examples with $R = 2$ (the generalization is straightforward). Consider the mask $\omega^{(l_i)}$, corresponding to the monomial $\omega_i(1)$, and the periodic orbit obtained by a $1$-circular shift of this block: $\omega^{(l_i)}, \omega^{(\sigma l_i)}$. Since $\omega_i(0)$ and $\omega_i(1)$ are related by time translation, the coefficient of one of these monomials is set to $0$ in the canonical potential. We use the convention to keep the monomial whose mask contains a '1' in the last column. This convention extends to the monomials considered below. Then, from (7) we obtain $h_i = \phi\!\left(\omega^{(l_i)}\right) + \phi\!\left(\omega^{(\sigma l_i)}\right) - 2\,\phi\!\left(\omega^{(0)}\right)$. This procedure gives all degree-one interaction terms.
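These first steps can be reproduced on a toy chain (our own sketch: a random, strictly positive transition matrix for $N = 2$, $D = 1$; variable names are hypothetical). The final assertion checks the periodic-orbit invariance behind (6): adding a coboundary $f(\omega_1^{D}) - f(\omega_0^{D-1})$ to $\phi$ does not change sums along periodic orbits:

```python
import numpy as np
import itertools

N, D = 2, 1
R = D + 1
patterns = list(itertools.product([0, 1], repeat=N))

rng = np.random.default_rng(2)
P = rng.uniform(0.1, 1.0, size=(len(patterns), len(patterns)))
P /= P.sum(axis=1, keepdims=True)          # strictly positive transition matrix

def phi(block):
    """phi(omega_0^1) = log P[omega(1) | omega(0)]  (eq. 2)."""
    w0, w1 = tuple(block[:, 0]), tuple(block[:, 1])
    return np.log(P[patterns.index(w0), patterns.index(w1)])

zero = np.zeros((N, R), dtype=int)

# Period-1 orbit of the all-'0' block, with the gauge h_0 = 0, yields P[H]:
free_energy = -phi(zero)

# Degree-one coefficient: mask with a single '1' (neuron 0, last column),
# summed over its R circular shifts (the period-R orbit):
mask = zero.copy(); mask[0, R - 1] = 1
h_coeff = sum(phi(np.roll(mask, n, axis=1)) for n in range(R)) - R * phi(zero)

# Orbit invariance: phi and phi + coboundary give the same orbit sums.
f = {w: float(i) for i, w in enumerate(patterns)}   # arbitrary range-D function
def phi2(block):
    w0, w1 = tuple(block[:, 0]), tuple(block[:, 1])
    return phi(block) - f[w0] + f[w1]
s1 = sum(phi(np.roll(mask, n, axis=1)) for n in range(R))
s2 = sum(phi2(np.roll(mask, n, axis=1)) for n in range(R))
assert abs(s1 - s2) < 1e-12
```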

The idea is then to proceed recursively, by increasing degree in the Hammersley-Clifford hierarchy. Let us consider pairwise interactions. For Ising coefficients $J_{ij}$, corresponding to blocks of the form $\omega_i(1)\,\omega_j(1)$, the procedure is the same as above: take the periodic orbit $\omega^{(l_{ij})}, \omega^{(\sigma l_{ij})}$, remark that $\omega^{(\sigma l_{ij})}$ corresponds to a non-canonical interaction, and apply (7), where the l.h.s. contains $J_{ij}$ and coefficients corresponding to masks of lower degree, which have already been computed. Then the Ising coefficient can be determined (see eq. (9) for an explicit form). For the one-step-of-memory pairwise coefficients (e.g. the coefficient of $\omega_i(0)\,\omega_j(1)$) appearing in [15], the situation is slightly different. Consider e.g. the mask corresponding to the monomial $\omega_i(0)\,\omega_j(1)$. The circular-shift orbit is not useful because it leads to two unknowns in eq. (7). Instead, another periodic orbit, of longer period, can be used. When applying (7) there is then only one unknown, corresponding to the mask of $\omega_i(0)\,\omega_j(1)$, whereas the coefficients of the other blocks have a lower degree and have therefore already been computed. The general procedure for arbitrary blocks is described in [4].

Obviously, when getting to larger degrees the method rapidly becomes intractable because of the exponential increase in the number of terms. The hope is that the influence of monomials decays rapidly with their degree. Additionally, applying the method to real data, where transition probabilities are not exactly known, leads to severe difficulties. These aspects will be treated in a separate paper. Our goal here was to answer the questions asked in the introduction and, as a by-product, to establish a link between neuro-mimetic models for spike train statistics and MaxEnt models. This goal is now achieved. To illustrate it, we show how the coefficients shaping the Ising model are written in terms of the transition probabilities of a Markov process with range $R = D+1$. The Ising potential only provides an approximation of the equilibrium distribution of the process, corresponding to a degree-two expansion in the Hammersley-Clifford hierarchy, where, besides, memory effects are neglected. Let us write the Ising potential in the form $H = \sum_i h_i\,\omega_i(0) + \sum_{i<j} J_{ij}\,\omega_i(0)\,\omega_j(0)$. We keep here the spike description with '0's and '1's, but the relation with the classical Ising Hamiltonian with spins $\pm 1$ is straightforward. From the method described above we obtain:

$$h_i=\sum_{n=1}^{R}\phi\!\left(\omega^{(\sigma^n l_i)}\right)-R\,\phi\!\left(\omega^{(0)}\right) \tag{8}$$

where $\omega^{(l_i)}$ is the mask having '0's everywhere but on row $i$, last column, whereas $\sigma$ denotes the periodic shift. In the same way, denoting $\omega^{(l_{ij})}$ the mask with '0's everywhere but on rows $i, j$, last column:

$$J_{ij}=\sum_{n=1}^{R}\phi\!\left(\omega^{(\sigma^n l_{ij})}\right)-h_i-h_j-R\,\phi\!\left(\omega^{(0)}\right) \tag{9}$$

Note that these coefficients depend on the memory depth $D$ of the Markov process.
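Equations (8) and (9) translate directly into code. The sketch below (our own toy example with a random transition matrix; `orbit_sum` is a hypothetical helper) computes $h_i$ and $J_{ij}$ from $\phi$ by summing over the $R$ circular shifts of the corresponding masks:

```python
import numpy as np
import itertools

N, D = 2, 1
R = D + 1
patterns = list(itertools.product([0, 1], repeat=N))
rng = np.random.default_rng(3)
P = rng.uniform(0.1, 1.0, size=(len(patterns), len(patterns)))
P /= P.sum(axis=1, keepdims=True)          # toy strictly positive kernel

def phi(block):
    w0, w1 = tuple(block[:, 0]), tuple(block[:, 1])
    return np.log(P[patterns.index(w0), patterns.index(w1)])

zero = np.zeros((N, R), dtype=int)

def orbit_sum(mask):
    # Sum of phi over the R circular shifts sigma^n of the mask.
    return sum(phi(np.roll(mask, n, axis=1)) for n in range(R))

def h(i):
    mask = zero.copy(); mask[i, R - 1] = 1             # '1' on row i, last column
    return orbit_sum(mask) - R * phi(zero)             # eq. (8)

def J(i, j):
    mask = zero.copy(); mask[i, R - 1] = 1; mask[j, R - 1] = 1
    return orbit_sum(mask) - h(i) - h(j) - R * phi(zero)   # eq. (9)

assert abs(J(0, 1) - J(1, 0)) < 1e-12                  # symmetric by construction
```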

When $\phi$ is derived from a neuro-mimetic model (e.g. eq. (1)), it follows that the "local field" $h_i$ depends non-linearly on the complete stimulus (not only the stimulus applied to neuron $i$); it is also a non-linear function of the synaptic weights matrix $W$. This is not that surprising. Even in an Ising model of two neurons with no memory, a strong favorable pairwise interaction between the two neurons firing simultaneously will increase the average firing rate of both neurons, even in the absence of an external field. Likewise, $J_{ij}$ depends on the whole synaptic weights matrix and not only on the connection between $i$ and $j$. This example clearly shows that there is no straightforward relation between the so-called "functional connectivity" in the Ising model (the $J_{ij}$'s) and the real connectivity.

It results from our analysis that the $h_l$'s of a canonical potential corresponding to a neuro-mimetic model are generically non-zero: considering e.g. random synaptic weights $W$, the probability that some $h_l$ in (7) vanishes is zero. Therefore, while the transition probabilities of neuro-mimetic models are constrained by relatively few parameters (e.g. the synaptic weights), the number of MaxEnt parameters increases exponentially fast with $N$. Thus, there is a great amount of redundant information in the $h_l$'s, which are related by non-linear relations. However, real neural networks are non-generic: synaptic weights are not drawn at random but result from a long phylogenetic and ontogenetic evolution. When trying to "explain" the spike statistics of real neural networks with the Maximum Entropy Principle, one is seeking general laws that have to be expressed with relatively few phenomenological parameters in the potential (3). The hope is that many coefficients coming from real data are $0$ or close to $0$. This could explain the efficiency of pairwise MaxEnt models [2] for spike train analysis. Our method provides a way to detect this, by checking whether the l.h.s. in (7) is close to $0$ [4].

More generally, our method opens up new possibilities for a better understanding of the role of different neural network topologies and stimuli on spike responses. It is not limited to spike trains, however, and could also impact other areas of scientific knowledge where binary time series are considered.

This work was supported by the French Ministry of Research and the University of Nice (EDSTIC), INRIA, ERC-NERVI number 227747, KEOPS ANR-CONICYT, the European Union project FP7-269921 (BrainScales), and Renvision 600847. We thank the reviewers for constructive criticism.

## References

• [1] V. I. Arnold. Geometrical Methods in the Theory of Ordinary Differential Equations. Springer–Verlag New York Inc., 1983.
• [2] W. Bialek and R. Ranganathan. arXiv:0712.4397., 2007.
• [3] D. R. Brillinger. Biol Cybern, 59(3):189–200, 1988.
• [4] B. Cessac and R. Cofré. Research report, INRIA, 2013.
• [5] R. Bowen. Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms. Springer–Verlag Berlin, 2008.
• [6] E. J. Chichilnisky. Network: Comput. Neural Syst., 12:199–213, 2001.
• [7] S. Cocco, S. Leibler, and R. Monasson. PNAS, 106(33):14058–14062, 2009.
• [8] R. Cofré and B. Cessac. Chaos, Solitons and Fractals, 50(8):13–31, 2013.
• [9] E. Ganmor, R. Segev, and E. Schneidman. The Journal of Neuroscience, 31(8):3044–3054, 2011.
• [10] H.-O. Georgii. Gibbs measures and phase transitions. De Gruyter Studies in Mathematics:9. Berlin; New York, 1988.
• [11] W. Gerstner and W. Kistler. Spiking Neuron Models. Cambridge University Press, 2002.
• [12] J. M. Hammersley and P. Clifford. Markov fields on finite graphs and lattices. unpublished, 1971.
• [13] E.T. Jaynes. Phys. Rev., 106:620, 1957.
• [14] A. Livšic. Math. USSR-Izvestiya, (6):1278–1301, 1972.
• [15] O. Marre, S. El Boustani, Y. Frégnac, and A. Destexhe. Phys. Rev. Lett., 102:138101, 2009.
• [16] M. Pollicott and H. Weiss. Communications in Mathematical Physics, 240:457–482, 2003.
• [17] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek. Spikes: Exploring the Neural Code. Bradford Books, 1997.
• [18] E. Schneidman, M.J. Berry, R. Segev, and W. Bialek. Nature, 440(7087):1007–1012, 2006.
• [19] J. Shlens, G.D. Field, J.L. Gauthier, M.I. Grivich, D. Petrusca, A. Sher, A.M. Litke, and E.J. Chichilnisky. Journal of Neuroscience, 26(32):8254, 2006.
• [20] G. Tkačik, O. Marre, T. Mora, D. Amodei, M.J. Berry 2nd, and W. Bialek. J Stat Mech, page P03011, 2013.
• [21] J. C. Vasquez, O. Marre, A. G. Palacios, M. J Berry, and B. Cessac. J. Physiol. Paris, 106(3-4):120–127, 2012.