Theory of population coupling and applications to describe high order correlations in large populations of interacting neurons
To understand the collective spiking activity in neuronal populations, it is essential to reveal basic circuit variables responsible for these emergent functional states. Here, I develop a mean field theory for the population coupling recently proposed in the studies of visual cortex of mouse and monkey, relating the individual neuron activity to the population activity, and extend the original form to the second order, relating neuron-pair’s activity to the population activity, to explain the high order correlations observed in the neural data. I test the computational framework on the salamander retinal data and the cortical spiking data of behaving rats. For the retinal data, the original form of population coupling and its advanced form can explain a significant fraction of two-cell correlations and three-cell correlations, respectively. For the cortical data, the performance becomes much better, and the second order population coupling reveals non-local effects in local cortical circuits.
pacs:87.19.L-, 02.50.Tt, 75.10.Nr
To uncover the neural circuit mechanisms underlying animal behavior, e.g., working memory or decision making, is a fundamental issue in systems neuroscience [1, 2]. Recent developments in multi-neuron recording methods make simultaneous recording of neuronal population activity possible, which gives rise to the challenging computational tasks of finding basic circuit variables responsible for the observed collective behavior of neural populations . The collective behavior arises from interactions among neurons, and forms the high dimensional neural code. To search for a low dimensional and yet neurobiologically plausible representation of the neural code, thus becomes a key step to understand how the collective states generate behavior and cognition.
Correlations among neurons’ spiking activities play a prominent role in deciphering the neural code . Various models were proposed to understand the pairwise correlations in the population activity [5, 6, 7]. Modeling these correlations sheds light on the functional organization of the nervous system . However, as the population size grows, higher order correlations have to be taken into account for modeling synchronous spiking events, which are believed to be crucial for neural information transmission [9, 10, 11]. In addition, the conclusion drawn from small size populations may not be correct for large size populations. Theoretical studies have already proved that high order interactions among neurons are necessary for generating widespread population activity [12, 13]. However, introduction of high order multi-neuron couplings always suffers from a combinatorial explosion of model parameters to be estimated from the finite neural spike train data.
To account for high order correlations, various models with different levels of approximation were proposed, for example, the reliable interaction model  with the main caveat that the rare patterns are discarded during inference of the coupling terms, the dichotomized Gaussian model [7, 15] in which correlations among neurons are caused by common Gaussian inputs to threshold neurons, the K-pairwise model [16, 17] in which an effective potential related to the synchronous firing of neurons was introduced, yet hard to be interpreted in terms of functional connectivity, and the restricted Boltzmann machine  where hidden units were shown to be capable of capturing high order dependences but their number should be predefined and difficult to infer from the data . One can also take into account part of the statistical features of the population activity (e.g., simultaneous silent neural pattern) and assume homogeneity for high order interactions among neurons due to the population size limitation . In this paper, I provide a low dimensional neurobiological model for describing the high order correlations and extracting useful information about neural functional organization and population coding.
In this study, I interpret correlations in terms of population coupling, a concept recently proposed to understand the multi-neuron firing patterns of the visual cortex of mouse and monkey . The population coupling characterizes the relationship of the activity of a single neuron with the population activity; this is because, the firing of one neuron is usually correlated with the firing pattern of other neurons. I further generalize the original population coupling to its higher order form, i.e., the relationship of pairwise firing with the population activity. I then derive the practical dimensionality reduction method for both types of population couplings, and test the method on different types of neural data, including ganglion cells in the salamander retina onto which a repeated natural movie was projected , and layer 2/3 as well as layer 5 cortical cells in the medial prefrontal cortex (MPC) of behaving rats .
In this paper, I develop a theoretical model of population coupling and its advanced form, to explain higher order correlations in the neural data. Methodologically, I propose the fast mean field method not only to learn the population couplings but also to evaluate the high order correlations. Note that this is computationally hard in a traditional maximum entropy model by using sampling-based method. Conceptually, I generalize the normal population coupling by introducing the second-order population coupling, which reveals interesting features from the data. First, it can explain a significant amount of three-cell correlations, and it works much better in cortical data than in retinal data. Second, the second-order population coupling matrix has distinct features in retinal and cortical data. The cortical one shows clear stripe-like structure while the retinal one has no such apparent structure. Altogether, this work marks a major step to understand the low-order representation of complex neural activity in both concepts and methods.
Ii Model description and mean-field methods
For a neuronal population of size , the neural spike trains of duration are binned with temporal resolution , yielding samples of -dimensional binary neural firing patterns. I use to denote firing state of neuron , and for silent state. Neural responses to repeated stimulus (or the same behavioral tasks) vary substantially (so-called trial-to-trial variability) [23, 24]. To model the firing pattern statistics, I assign each firing pattern a cost function (energy in statistical physics jargon) , then the probability of observing one pattern can be written as , where
This is the first low dimensional representation to be studied. High energy state corresponds to low probability of observation. is the firing bias constraining the firing rate of neuron , while characterizes how strongly neuron ’s spiking activity correlates with the population activity measured by the sum of other neurons’ activity. I name the first order population coupling (). Thus, only parameters needs to be estimated from the neural data. This number of model parameters is much less than that in conventional maximum entropy model .
To model the high order correlation (e.g., three neuron firing correlation), I further generalize to its advanced form, i.e., the second order population coupling, namely , describing the relationship of pairwise firing with the population activity, and the corresponding energy is given by
where characterizes how strongly the firing state of the neuron pair correlates with the firing activities of other neurons. This term is expected to increase the prediction ability for modeling high order correlations in the neural data. Under the framework of , the total number of parameters to be estimated from the data is . and have a clear neurobiological interpretation (for , see a recent study , and the results obtained under the can also be experimentally tested), and moreover they can be interpreted in terms of functional interactions among neurons (as shown later).
To find the model parameters as a low dimensional representation, I apply the maximum likelihood learning principle corresponding to maximizing the log-likelihood with respect to the parameters. The learning equation for is given by
where and denote the learning step and learning rate, respectively. The maximum likelihood learning shown here has a simple interpretation of minimizing the Kullback-Leibler divergence between the observation probability and the model probability [25, 26]. In an analogous way, one gets the learning equation for ,
In the learning equations Eq. (3) and Eq. (4), the data dependent terms can be easily computed from the binned neural data. However, the model expectations of the firing rate (magnetization in statistical physics) and correlations are quite hard to evaluate without any approximations. Here I use the mean field method to tackle this difficulty.
First, I write the energy term into a unified form,
where denotes the interaction index and denotes the neuron set involved in the interaction . for and for . Therefore, introduces the pairwise interaction as , while introduces the triplet interaction as . The multi-neuron interaction in the conventional Ising model is decomposed into first order or second order population coupling terms. This decomposition still maintains the functional heterogeneity of single neurons or neuron pairs, but reduces drastically the dimensionality of the neural representation for explaining high order correlations. In principle, one can combine and to predict both pairwise and triplet correlations. However, in this work, I focus on the pure effect of each type of population coupling.
In fact, the conventional Ising model  can be recovered by setting , which is pairwise interaction. The learning equation is derived similarly, and is run by reducing the deviation between the model pairwise correlation and the clamped one (computed from the data) .
Second, the statistical properties of the model (Eq. 5) can be analyzed by the cavity method in the mean field theory . The self-consistent equations are written in the form of message passing (detailed derivations were given in Refs [30, 31], see also Appendix A) as
where denotes the member of interaction except , and denotes the interaction set is involved in with removed. is interpreted as the message passing from the neuron to the interaction it participates in, while is interpreted as the message passing from the interaction to its member . This iteration equation is also called the belief propagation (BP), which serves as the message passing algorithm for the statistical inference of the model parameters. Iteration of the message passing equation on the inferred model would converge to a fixed point corresponding to a global minimum of the free energy (in the cavity method approximation )
where is the normalization constant of the model probability . The free energy contribution of one neuron is and the free energy contribution of one interaction is . I define the function . At the same time, the model firing rate and multi-neuron correlation can be estimated as
Magnetization and correlation are defined as and , respectively.
A brief derivation of Eq. (8) is given in Appendix A. Here the multi-neuron correlation is calculated directly from the cavity method approximation and expected to be accurate enough for current neural data analysis . This is because, correlations under the model are evaluated taking into account nearest-neighbor interactions, rather than naive full independence among neurons. This approximation is expected to work well in a weakly-correlated neural population [5, 8], where long-range strong correlations do not develop. A similar application of this principle revealed a non-trivial geometrical structure of population codes in salamander retina . Another advantage is the low computation cost. Both the free energy and the pairwise correlations can be estimated by the time complexity of the order for , and for triplet correlations in , which is one order of magnitude lower than the tractable model of recently proposed in Ref. . A more accurate expression could be derived from linear response theory  with much more expensive computational cost (increased by an order of magnitude ().
To estimate the information carried by a neural population, one needs to compute the entropy, which is defined as , and it measures the capacity of the neural population for information transmission. The more obvious variability the neural responses have, the larger the entropy value is. The entropy of the model can be estimated from the fixed point of the message passing equation. Based on the standard thermodynamic relation, , where is the energy of the neural population and given by
The basic procedure to infer population couplings is given as follows. At the beginning, all model parameters are assigned zero value. It is followed by three steps: () Messages are initialized randomly and uniformly in the interval . () Eq. (6) are then run until converged, and the magnetizations as well as multi-neuron correlations are estimated using Eq. (8). () The estimated magnetizations and correlations are used at each gradient ascent learning step (Eq. (3) or Eq. (4)). When one gradient learning step is finished, another step starts by repeating the above procedure (from () to ()). To learn the higher order population coupling, the damping technique is used to avoid oscillation behavior, i.e., where is the damping factor taking a small value.
The inferred model can also be used to generate the distribution of spike synchrony, i.e., the probability of simultaneous spikes. This distribution can be estimated by using Monte Carlo (MC) simulation on the model. The standard procedure goes as follows. The simulation starts from a random initial configuration , and tries to search for the low energy state, then the energy is lowered by a series of elementary updates, and for each elementary update, proposed neuronal state flips are carried out. That is, the transition probability from state to with only flipped () is expressed as where . The equilibrium samples are collected after sufficient thermal equilibration. These samples (a total of samples in simulations) are finally used to estimate the distribution of spike synchrony.
By using the mean field method, I first test both types of population couplings on the retina data, which is the spike train of ganglion cells in a small patch of the salamander retina . The retina was stimulated with a repeated natural movie. The spike train data is binned with the bin size equal to reflecting the temporal correlation time scale, yielding about binary firing patterns for data modeling.
I then test the same concepts on the cortical data of behaving rats. Rats performed the odor-place matching working memory during one task session, and spiking activities of cells in both superficial layer and deep layer (layer ) of medial prefrontal cortex were simultaneously recorded (for detailed experiments, see Ref. ). One task session consists of about trials, yielding a spike train of these cortical cells binned with the temporal resolution (a total of firing patterns).
iii.1 Inference performances on the retinal data
Fig. 1 reports the inference result on a network example of neurons selected randomly from the original dataset. The firing rate is predicted faithfully by the model using either MC or BP (Fig. 1 (A)). Inferring only , one could predict about of entire pairwise correlation (a precision criterion is set to in this paper) (Fig. 1 (B)). This means that of the whole correlation set have the absolute value of the deviation between the predicted correlation and measured one () smaller than the precision criterion. Using the sampled configurations of neural firing activity from the MC simulation, one could also predict three-cell correlations (Fig. 1 (C)), whereas, the prediction fraction can be improved by a significant amount after introducing , as I shall show later. In addition, fitting only model parameters in analysis could not predict the tail of spike synchrony distribution (Fig. 1 (D)); this is expected as no higher order interaction terms are included in the model, and rare events of large spikes are also difficult to observe in a finite sampling during MC simulations.
The inference results of are given in Fig. 2. Note that, by considering the correlation between the pairwise firing activity and the global population activity, i.e., the second order population coupling, the three-cell correlation could be predicted partially (), and this fraction is much larger than that of (Fig. 1 (C)). This is due to the specific structure of , which incorporates explicitly three-cell correlations into the construction of couplings (Eq. (4)). Technically, the mean-field theory for avoids the slow sampling and evaluates the high order correlations in a fast way. Alternatively, one could fit the data using the conventional Ising model  with the same number of model parameters as , whereas, the three-cell correlations are hard to predict using MC samplings, and a similar phenomenon was also observed in a previous work for modeling pairwise correlations . Therefore I speculate that acts as a key circuit variable for third order correlations.
The interaction matrix of reveals how important each pair of neurons is for the entire population activity (emergent functional state of the whole network). As shown in Fig. 2 (C), matrix has no apparent structure of organization, i.e., each neuron can be paired with both positive and negative couplings. Some pairs have large negative , suggesting that these components are anti-correlated with the population activity. That is to say, the activity of these neuron-pairs is not synchronized to the population activity characterized by the summed activity over all neurons except these pairs. In the network, there also exist positive s, which shows that these neuron-pairs are positively correlated with the population in neural activity. The interaction matrix shown here may be related to the revealed overlapping modular structure of retinal neuron interactions [8, 14]. In this structure, neurons interact locally with their adjacent neurons, and in particular this feature is scalable and applicable for larger networks. It seems that one individual neuron does not impact directly the entire population, and a small group of neighboring neurons have similar visual feature selectivity . This result is also consistent with two-neuron interaction map of the conventional Ising model (Fig. 3 (a)). Note that in functional interpretation, these two-neuron interactions are inherently different from , which is designed to explain high-order correlations by using less model parameters than necessary.
behaves better than in predicting the spike synchrony distribution (Fig. 2 (D)) in the small regime (the prediction is improved from for to for ). An intuitive explanation is that introduces equivalently triplet interactions among neurons, and it is known that high order interactions are necessary for generating widespread population activity . However, overestimates the distribution when rare events of synchronous spiking are considered. This may be related to the difficulty of obtaining sufficient equilibrium samples of the model, especially those samples with large population activity. The spike synchrony distribution is also compared with that obtained under Ising model (Fig. 3 (b)). Different performances are related to the multi-information measure of neural population explained below.
The amount of statistical structure in the neural data due to introducing interactions among neurons can be measured by the multi-information . I first introduce an independent model where only the firing rates of individual neurons are fitted and the corresponding entropy is defined as . The multi-information is then defined as , in which , where , and is assumed to be an upper bound to the true entropy. The true entropy for large populations is difficult to estimate since it requires including all possible interactions among neurons. However, the model entropy with low order interaction parameters could be an approximate information capacity for the neural population, which depends on how significant the higher order correlations are in the population.
Fig. 4 (a) shows the multi-information as a function of the network size. and are compared with the Ising model , which reconstructs faithfully the pairwise correlations. improves significantly over in capturing the information content of the network, but its multi-information is still below that of the Ising model, which is much more evident for larger network size. This is expected, because only part of third order correlations are captured by , while the Ising model describes accurately the entire pairwise correlation profile which may be the main contributor to the collective behavior observed in the population. However, provides us an easy way to understand the higher order correlation, while in the Ising model, it is computationally difficult to estimate the higher order correlations. The average prediction fraction of correlations by and is plotted in Fig. 4 (b). predicts more than of the pairwise correlations, while predicts more than of the triplet correlations. The prediction fraction changes slightly with the network size.
iii.2 Inference performances on the cortical data
To show the inference performance of both types of population couplings on the cortical data, I randomly select a typical network example of neurons from the original dataset, and then apply the computation scheme to this typical example. Results are shown in Fig. 5. Surprisingly, the simplified is able to capture as high as of pairwise correlations, implying that when a rat performed working memory tasks, there exists a simplified model to describe emergent functional states in the medial prefrontal cortical circuit. Moreover, MC sampling of the model also predicts well the spike synchrony distribution (Fig. 5 (D)). This is very different from that observed in the retinal data. In this sense, the MPC circuit is simple in its functional states when the subject is performing specified tasks.
More interesting circuit features are revealed by , which is shown in Fig. 6. About of three-cell correlations are explained by in the MPC circuit. The interaction matrix of in Fig. 6 (C) shows a clear non-local structure in the cortical circuit (stripe-like structure). That is, some neurons interact strongly with nearly all the other neurons in the selected population, and these interactions have nearly identical strength of . Such neurons having stripe-like structure in the matrix may receive a large number of excitatory inputs from pyramidal neurons , and thus play a key role in shaping the collective spiking behavior during the working memory task. The non-local effects are consistent with findings reported in the original experimental paper (cross-correlogram analysis)  and the two-neuron interaction map under Ising model (Fig. 7 (a)). Thus, to some extent, may reflect intrinsic connectivity in the cortical circuit, although the relationship between functional connections and anatomical connections has not yet been well established . Lastly, overestimates the tail of the spike synchrony distribution (Fig. 6 (D)), which may be caused by the sampling difficulty of the inferred model (a model with triplet interactions among its elements). The spike synchrony distribution of Ising model is also compared (Fig. 7 (b)).
Multi-information versus the cortical network size is plotted in Fig. 8 (a). In the cortical circuit, behaves comparably with the Ising model; even for some network size (), it reports a higher information content than the Ising model in the randomly selected subpopulations, which may be caused by the nature of the selected neurons (e.g., inhibitory interneurons , and they have stripe-like structure in the matrix). Note that gives an information close to zero for small network sizes, suggesting that by introducing , one could not increase significantly the amount of statistical structure in the network activity explained by the model. However, the multi-information of grows with the network size, indicating that the role of would be significant for larger neural populations. Fig. 8 (b) reports the prediction fraction of the correlation profile by applying and . Both population couplings can capture over of correlations, which is significantly different from that observed in the retinal data.
The emergent properties of the neural code arise from interactions among individual neurons. A complete characterization of the population activity is difficult, because on the one hand, the number of potential interactions suffers from a combinatorial explosion, on the other hand, the collective behavior at the network level would become much more complex as the network size grows. In this paper, I develop a theoretical framework to understand how pairwise or higher order correlations arise and the basic circuit variables corresponding to these correlation structures. The model is based on the concept of population coupling, characterizing the relationship between local firing activity of individual neuron or neuron-pair and the global neural activity. An advantage is that, it provides a low dimensional and neurobiologically interpretable representation to understand the functional interaction between neurons and their correlation structures. In particular, the concept of population coupling and the associated mean field method used in this paper offer an easy way to evaluate higher order correlations, while the usual sampling method is computationally hard and traditional models (e.g., Ising model) lack a direct interpretation of higher order correlations in terms of simplified (population) couplings.
With the mean field method, the concept of population coupling is tested on two different types of neural data. One is the firing neural activities of retinal ganglion cells under natural movie stimuli. The other is the population activities of medial prefrontal cortex when a rat was performing odor-place matching working memory tasks.
For the retinal data, on average accounts for more than of pairwise correlations, and accounts for over of three-cell correlations. The interaction matrix of contains information about the functional interaction features in the retinal circuitry. It seems that a retinal neuron can be paired with not only negatively strong couplings, but also slightly positive couplings. Only a few pairs of neurons have strong correlations with the global activity of the population. To describe the spike synchrony distribution, performs better than , nevertheless, both of them could not capture the trend of the tail (rare events related to higher order interactions existing in the network). This is not surprising, because and are simplified descriptions of the original high dimensional neural activity, taking the trade-off between the computation complexity and the description goodness.
To extract the statistical structure embedded in the neural population, improves significantly over , and has further additional benefit of describing the third-order correlations observed in the data, as could be used to construct triplet interactions among neurons, although direct constructing all possible triplet interactions is extremely computationally difficult.
Unlike the retinal circuit, the cortical circuit yields a much smaller absolute value of the multi-information, implying that no significant higher order correlations (interactions) were present in the neural circuit when the circuit was carrying out task-related information processing rather than encoding well-structured stimuli (as in the retinal network). This also explains why a simplified description such as and is accurate enough to capture the main features of the population activity, including the spike synchrony distribution. The inferred model on the cortical data reveals a different interaction map from that of the retinal circuit. In the cortical circuit, neurons form the stripe-like structure in the interaction matrix, suggesting that these neurons may receive a large number of excitatory inputs . These inputs may come from different layers of cortex, and they can execute top-down or bottom-up information processing, thus modulate the global brain state in the target cortex during behavioral tasks.
Before summary of this work, I made some discussions about two relevant recent studies on population coupling (see notes added). Ref.  modeled directly the joint probability distribution of individual neural response and population rate (the number of neurons simultaneously active) by linear coupling and complete coupling models. The linear coupling reproduces separately the distribution of individual neuronal state and the population rate distribution, and their couplings, while introduced in my work reproduces mean firing rate and the correlation between individual neuronal state and the background population activity (except the neuron itself). Note that does not model population rate distribution explicitly, which is hard to interpret in terms of functional connectivity. The complete-coupling model reproduces the joint probability distributions between the response of each neuron and the population rate, from which it is hard to conclude that the high-order interactions responsible for high-order correlations can be interpreted and tested. However, reproduces mean firing rate and the correlation between neuron-pair activities and the background population activity (except the neuron-pair itself), and thus explains high-order correlations by an energy model. Furthermore, in this sense, this work overcame a weakness pointed out in another independent later work of population coupling , which fitted directly the population rate distribution and the firing probability for each neuron conditioned on the population rate, and analogously the corresponding model parameters can not be readily interpretable in a biological setting. Due to intrinsic difference in model definitions, these two relevant works have nice properties of studying tuning curves of individual neurons to the population rate, and sampling from the model to reproduce the population synchrony distribution.
In summary, I develop a theoretical model of population coupling and its advanced form, to relate the correlation profile in the neural collective activity to the basic circuit variables. The practical dimensional reduction method is tested on different types of neural data, and specific features of neural circuit are revealed. This model aiming at describing high order correlations with a low order representation, is expected to be useful for modeling big neural data. Note that the interaction matrices shown in Fig. 2 and Fig. 6 are qualitatively robust to changes of the data size to only the first half (data not shown), verifying that the revealed features are not an artifact of overfitting. However, it still deserves further studies by introducing regularization in the learning equation. It is also very interesting to incorporate more physiologically plausible parameters to explain how the collective spiking behavior arises from the microscopic interactions among the basic units. Another interesting study is to clarify the role of higher order correlations in decoding performances based on maximum likelihood principles [37, 38].
Acknowledgements.I am grateful to Shigeyoshi Fujisawa and Michael J Berry for sharing me the cortical and retinal data, respectively. I also thank Hideaki Shimazaki and Taro Toyoizumi for stimulating discussions. This work was supported by the program for Brain Mapping by Integrated Neurotechnologies for Disease Studies (Brain/MINDS) from Japan Agency for Medical Research and development, AMED.
note added.—After I submitted this work to arXiv:1602.08299, I became aware of Ref.  (arXiv:1606.08889), and later Ref.  (bioRxiv, 2016). Discussions about these two recent relevant works are made in the last section of this paper.
Appendix A Derivation of mean-field equations
In this appendix, I give a simple derivation of mean-field equations given in Sec. II. More details can be obtained from Refs [30, 31]. First, after removing an interaction , one defines the cavity probability , where the product comes from the physical meaning of the second kind of cavity probability, namely defined as the cavity probability when only the connection from to is retained while other neighbors of are removed (so-called cavity probability). is thus formulated as . With these two probabilities, it follows that the cavity magnetization , where is named cavity bias in physics . It is related to by .
More specifically, the cavity magnetization is derived as follows,
where I have used the definition . Next, I show how to derive the cavity bias. First, for pairwise interaction,
where I have used the parameterization , and the mathematical identity . Similarly, for triplet interaction,
These results are written in a compact form as Eq. (6) in the main text.
Similarly, the single neuron magnetization is obtained via , where is a normalization constant, and the multi-neuron correlation , where is a normalization constant. Note that and are also related to the free energy contribution of single neuron and neuronal interaction , respectively. The full (non-cavity) magnetization can be derived in a similar manner to Eq. (10), as . In detail, the two-point correlation () is computed as follows,
Analogously, three-point correlation () can be evaluated as
These results are written in a compact form as Eq. (8b) in the main text.
Finally, the partition function for pairwise interaction is computed as follows,
where I have used and the magnetization parameterization of . For triplet interaction, , and the corresponding . Following the same line, the partition function is evaluated for pairwise interaction as
Similarly, the partition function for triplet interaction can be computed as
which is exactly the compact equation given in the main text.
-  R. Q. Quiroga and S. Panzeri. Extracting information from neuronal populations: information theory and decoding approaches. Nat Rev Neurosci, 10:173, 2009.
-  R. Yuste. From the neuron doctrine to neural networks. Nat Rev Neurosci, 16:487, 2015.
-  I. H. Stevenson and K. P. Kording. How advances in neural recording affect data analysis. Nat Rev Neurosci, 14:139, 2011.
-  M. R. Cohen and A. Kohn. Measuring and interpreting neuronal correlations. Nat Neurosci, 14:811, 2011.
-  E. Schneidman, M. J. Berry, R. Segev, and W. Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440:1007, 2006.
-  S. Cocco, S. Leibler, and R. Monasson. Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods. Proc. Natl. Acad. Sci. USA, 106:14058, 2009.
-  J. H. Macke, M. Opper, and M. Bethge. Common input explains higher-order correlations and entropy in a simple model of neural population activity. Phys. Rev. Lett, 106:208102, 2011.
-  Elad Ganmor, Ronen Segev, and Elad Schneidman. The architecture of functional interaction networks in the retina. The Journal of Neuroscience, 31:3044, 2011.
-  L. Martignon, G. Deco, K. Laskey, M. Diamond, W. Freiwald, and E. Vaadia. Neural coding: higher-order temporal patterns in the neurostatistics of cell assemblies. Neural Computation, 12:2621, 2000.
-  M. J. Schnitzer and M. Meister. Multineuronal firing patterns in the signal from eye to brain. Neuron, 37:499, 2003.
-  I. E. Ohiorhenuan, F. Mechler, K. P. Purpura, A. M. Schmid, Q. Hu, and J. D. Victor. Sparse coding and high-order correlations in fine-scale cortical networks. Nature, 466:617, 2010.
-  S.-I. Amari, H. Nakahara, S. Wu, and Y. Sakai. Synchronous firing and higher-order interactions in neuron pool. Neural Computation, 15:127, 2003.
-  F. Montani, E. Phoka, M. Portesi, and S. R. Schultz. Statistical modelling of higher-order correlations in pools of neural activity. Physica A: Statistical Mechanics and its Applications, 392:3066, 2013.
-  E. Ganmor, R. Segev, and E. Schneidman. Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proc. Natl. Acad. Sci. USA, 108:9679, 2011.
-  Shan Yu, Hongdian Yang, Hiroyuki Nakahara, Gustavo S. Santos, Danko Nikolić, and Dietmar Plenz. Higher-order interactions characterized in cortical activity. The Journal of Neuroscience, 31:17514, 2011.
-  G. Tkacik, O. Marre, T. Mora, D. Amodei, M. J. Berry II, and W. Bialek. The simplest maximum entropy model for collective behavior in a neural network. Journal of Statistical Mechanics: Theory and Experiment, 2013:P03011, 2013.
-  G. Tkacik, O. Marre, D. Amodei, E. Schneidman, W. Bialek, and M. J. Berry II. Searching for collective behavior in a large network of sensory neurons. PLoS Comput Biol, 10:e1003408, 2014.
-  U. Köster, J. Sohl-Dickstein, C. M. Gray, and B. A. Olshausen. Modeling higher-order correlations within cortical microcolumns. PLoS Comput Biol, 10:e1003684, 2014.
-  Haiping Huang. Effects of hidden nodes on network structure inference. Journal of Physics A: Mathematical and Theoretical, 48:355002, 2015.
-  H. Shimazaki, K. Sadeghi, T. Ishikawa, Y. Ikegaya, and T. Toyoizumi. Simultaneous silence organizes structured higher-order interactions in neural populations. Scientific Reports, 5:9821, 2015.
-  M. Okun, N. A. Steinmetz, L. Cossell, M. F. Iacaruso, H. Ko, P. Bartho, T. Moore, S. B. Hofer, T. D. Mrsic-Flogel, M. Carandini, and K. D. Harris. Diverse coupling of neurons to populations in sensory cortex. Nature, 521:511, 2015.
-  S. Fujisawa, A. Amarasingham, M. T. Harrison, and G. Buzsaki. Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex. Nat Neurosci, 11:823, 2008.
-  Rob R. de Ruyter van Steveninck, Geoffrey D. Lewen, Steven P. Strong, Roland Koberle, and William Bialek. Reproducibility and variability in neural spike trains. Science, 275:1805, 1997.
-  Michael Okun, Pierre Yger, Stephan L Marguet, Florian Gerard-Mercier, Andrea Benucci, Steffen Katzner, Laura Busse, Matteo Carandini, and Kenneth D Harris. Population rate dynamics and multineuron firing patterns in sensory cortex. The Journal of Neuroscience, 32:17108, 2012.
-  Simona Cocco and Rémi Monasson. Adaptive cluster expansion for the inverse ising problem: convergence, algorithm and tests. J. Stat. Phys, 147:252, 2012.
-  H. Huang. Sparse hopfield network reconstruction with regularization. Eur. Phys. J. B, 86:484, 2013.
-  Y. Roudi, E. Aurell, and J. Hertz. Statistical physics of pairwise probability models. Front. Comput. Neurosci, 3:1, 2009.
-  Haiping Huang and Taro Toyoizumi. Clustering of neural code words revealed by a first-order phase transition. Phys. Rev. E, 93:062416, 2016.
-  M. Mézard and G. Parisi. The bethe lattice spin glass revisited. Eur. Phys. J. B, 20:217, 2001.
-  H. Huang and H. Zhou. Cavity approach to the sourlas code system. Phys. Rev. E, 80:056113, 2009.
-  M. Mézard and A. Montanari. Information, Physics, and Computation. Oxford University Press, Oxford, 2009.
-  Christophe Gardella, Olivier Marre, and Thierry Mora. A Tractable Method for Describing Complex Couplings between Neurons and Population Rate. eneuro, 3:e0160–15.2016, 2016. arXiv:1606.08889.
-  H. Huang and H. Zhou. Counting solutions from finite samplings. Phys. Rev. E, 85:026118, 2012.
-  Ko Ho, Hofer Sonja B., Pichler Bruno, Buchanan Katherine A., Sjostrom P. Jesper, and Mrsic-Flogel Thomas D. Functional specificity of local synaptic connections in neocortical networks. Nature, 473:87, 2011.
-  Dimitri Yatsenko, Josic Kresimir, Ecker Alexander S., Froudarakis Emmanouil, Cotton R. James, and Tolias Andreas S. Improved Estimation and Interpretation of Correlations in Neural Circuits. PLoS Comput Biol, 11:e1004083, 2015.
-  Cian O’Donnell, J Tiago Goncalves, Nick Whiteley, Carlos Portera-Cailliau, and Terrence J Sejnowski. The population tracking model: A simple, scalable statistical model for neural population data. Neural Computation, 29:50–93, 2017.
-  Michael T. Schaub and Simon R. Schultz. The ising decoder: reading out the activity of large neural ensembles. J Comput Neurosci, 32:101, 2012.
-  Greg Schwartz, Jakob Macke, Dario Amodei, Hanlin Tang, and Michael J. Berry. Low error discrimination using a correlated population code. J Neurophysiol, 108:1069, 2012.