# A Practical Approximation Method for Firing Rate Models of Coupled Neural Networks with Correlated Inputs

###### Abstract

Rapid experimental advances now enable simultaneous electrophysiological recording of neural activity at single-cell resolution across large regions of the nervous system. Models of this neural network activity will necessarily increase in size and complexity, thus increasing the computational cost of simulating them and the challenge of analyzing them. Here we present a novel method to approximate the activity and firing statistics of a general firing rate network model (of Wilson-Cowan type) subject to noisy correlated background inputs. The method requires solving a system of transcendental equations and is fast compared to Monte Carlo simulations of coupled stochastic differential equations. We implement the method with several examples of coupled neural networks and show that the results are quantitatively accurate even with moderate coupling strengths and an appreciable amount of heterogeneity in many parameters. This work should be useful for investigating how various neural attributes qualitatively effect the spiking statistics of coupled neural networks. Matlab code implementing the method is freely available at GitHub (http://github.com/chengly70/FiringRateModReduction).

###### pacs:

02.60.-x, 87.18.Tt, 84.35.+i, 87.18.Sn^{†}

^{†}preprint: APS/123-QED

## I Introduction

With advances in neural recording technologies, experimentalists can now record simultaneous activity across multiple brain regions at single cell resolution Ahrens et al. (2013); Prevedel et al. (2014); Kandel et al. (2013); Lemon et al. (2015). However, it is still a technical challenge to measure the interactions within and across brain regions that govern this multi-region activity. This challenge is heightened by the fact that cortical neurons are heterogeneous and show substantial trial-to-trial variability Cohen and Kohn (2011). Numerous theoretical studies have examined how neural networks can lead to cortex-like dynamics Abbott and van Vreeswijk (1993); Van Vreeswijk et al. (1994); van Vreeswijk and Sompolinsky (1996); Brunel (2000); Brunel and Hakim (1999); Buice and Chow (2007); Bressloff (2009); Renart et al. (2010); J.Touboul and Ermentrout (2011); however, most have been limited to a single region, leaving open the question of how inter-region connection strengths contribute to network processing.

One challenge presented by analyzing multi-region neural networks, is the increased number of parameters which must be specified. To survey a high-dimensional parameter space, one must have a way to efficiently simulate (as in Stringer et al. (2016)) or approximate network statistics (as in Gerstner and Kistler (2002)). Here we present a novel approximation method for calculating the statistics of a general coupled firing rate model (based on Wilson and Cowan (1972)) of neural networks where we: i) assume the activity (not the firing statistics) are pairwise normally distributed, ii) take the entire probability distribution of the presynaptic neurons/populations (providing input) into account. Our method is fast because it requires solving nonlinear equations self-consistently rather than simulating stochastic differential equations. Several example neural networks are considered and compared with Monte Carlo simulations. A specific version of this method was presented in Barreiro et al. (2017) to model the olfactory sensory pathway; here, we derive formulas in a general way which is easy to evaluate and can accommodate heterogeneous networks. We also demonstrate the method’s efficacy on several example networks with much larger dimension than the specific networks examined in our previous work.

## Ii Neural Network Model

Each cell (or homogeneous population) has a prescribed activity that is modeled by the following equation Wilson and Cowan (1972) for :

(1) |

where is a transfer function mapping activity to firing rate (in some units), related to the so-called F-I curve, for the cell/population. Thus, the instantaneous firing rate of the neuron is:

(2) |

Depending on the context, the activity variable may represent membrane voltage, calcium concentration, or some other quantity associated with a neuron’s internal state Dayan and Abbott (2001). This type of equation has historically been used to capture the average activity of a population of neurons but from here on out we will use the term “cell” for exposition purposes. All cells receive background noise , the increment of a Weiner process, uncorrelated in time but potentially correlated at each instant: , , and for with . The parameters and are constants that give the background input mean and input standard deviation, respectively. The parameter represents coupling strength from the presynaptic cell and is a signed quantity; represents inhibitory coupling.

We would like to compute the following statistics:

(3) | |||||

(4) | |||||

(5) | |||||

covariance of activity | |||||

(6) | |||||

(7) | |||||

(8) | |||||

covariance of spiking | |||||

(9) | |||||

correlation of spiking |

where the angular brackets denote averaging over time and realizations ^{1}^{1}1We assume the networks of interest satisfy the conditions of the Ergodic Theorems so that averaging over time and state are the same.
We will use the following definitions for the following Normal/Gaussian probability density functions (PDF):

(10) |

the standard normal PDF, and

a bivariate normal distribution with mean, unit variance, and covariance .

In the absence of coupling, i.e. , Eq. (1) would describe a multi-dimensional Ornstein-Uhlenbeck process. Such a process is well-understood: any pair of activity variables, , are bivariate normal random variables Gardiner (1985). To see this, consider the following two equations without synaptic coupling:

(12) | |||||

Note that we have re-written as sums of independent white noise processes . Since (where we have taken the initial time to be in the far past to eliminate any impact from the initial conditions), we calculate marginal statistics using Itô isometries:

(14) |

A similar calculation shows that in general we have:

(15) |

Thus, . Statistics for the firing rates, , are inherited from this normal distribution, since the firing rate is simply a nonlinear function of the activity .

When coupling is included, i.e. for some indices and , it may no longer be true that the activity variables remain normally distributed. However, it is reasonable to suppose that, for sufficiently weak coupling, the deviations from a normal distribution will be small. Furthermore, if the firing rate function has thresholding and saturating behavior (as does a sigmoidal function), then higher moments of have limited impact on statistics of . Thus, our first assumption will be that each pair of activity variables , can be approximated by a bivariate normal, even when coupling is present. We can think of this as a weak coupling assumption, as it holds exactly only with no coupling.

## Iii Reduction Method

Abbreviation | Definition |
---|---|

In our method, we assume that time is dimensionless so that the subsequent assumptions have the proper units. Note that our method can in principle be applied to systems where time has a dimension, as long they are of the form in Eq. (1) (with appropriate units for the parameters).

To compute statistics, we start by writing Eq. (1) as a low-pass filter of the right-hand-side:

used as the basis for calculating the desired moments of . For example, when is desired, we use the previous equation for and , multiply, then take the expected value . By letting the initial time , we eliminate transients; the resulting statistics will be stationary. The resulting exact formulas are complicated by the network coupling, so we simplify the calculation(s) as follows.

We only account for direct connections in the formulas for the first and second order statistics, assuming the terms from the indirect connections are either small or already accounted for in the direct connections. For example: although on the RHS of Eq. (III) itself depends on coupling terms of the form , etc., we will neglect such terms. We further make the following assumptions:

(17) | |||

(18) | |||

(19) |

See Table 1 for the definition of the symbols: .

Each assumption is equivalent to the assumption that two of the random variables of interest are -correlated in time; thus avoiding the need to compute autocorrelation functions explicitly. The first assumption, Eq. (17), states that is -correlated with itself; the second, Eq. (18), addresses and . The final assumption, Eq. (19), states that and are -correlated. We provide a detailed derivation of Eqs. (17)–(19) in an Appendix (section VIII). After testing our method on several examples in section IV, we will revisit the accuracy of these assumptions in section V.

We arrive at the following (approximation) formulas for the statistics of the activity:

(20) | |||

(21) | |||

(22) |

See Table 1 for the definition of the symbols: , which all depend on the statistical quantities and of the activity . Our approximation formulas form a system of equations in , , (i.e. for the activity only, as defined by Eqs. (3)–(5), not the firing) when considering all possible . This large system of equations, although nonlinear, is simple to solve because it requires a sequence of function evaluations and matrix multiplications, rather than random sampling.

Note that the normal distribution assumptions allow us to conveniently write the average quantities as integrals with respect to standard normal distributions but with shifted integrands, which leads to faster calculations because one does not have to calculate new probability density functions at each step of the iteration when solving the system self-consistently.

The resulting formulas can be written compactly with matrices; Eq. (20) for the mean activity can easily be written as a matrix-vector equation and is thus omitted. Let denote the covariance matrix of the activity with , represent the coupling strengths , and denote the correlation matrix of the background noise (i.e. ). Then we have

where represents element-wise multiplication, denotes matrix transposition, and

(24) | |||

(25) | |||

(26) | |||

(27) |

Note that the matrices and have the same nonzero entries as . Denoting as the diagonal matrix with diagonal , the unperturbed covariance (Eq. (25)) can also be expressed in matrix form as:

Once the statistics of the activity (, , and ) are solved for self-consistently, the firing statistics are solved as follows.

(28) | |||||

(29) |

(30) |

where is a bivariate normal PDF with zero mean and covariance: The off-diagonal terms are obtained from the second order statistics of the activity, Eq. (21)–(22).

## Iv Example Networks and Results

Network I. We first consider a network that allows us to systematically explore algorithm performance as two key parameters vary. Specifically, we consider two cells () that are reciprocally coupled without autaptic (i.e. self) coupling. For simplicity, we set the intrinsic parameters for the two cells to be identical, with , (arbitrary units), but the mean and variance of the background input differ: , , , . We vary two parameters: (input strength from to ), and , with fixed.

In Fig. 1, we see that all of the activity and firing statistics are accurate compared to Monte Carlo simulations. Figure 1(a) shows the mean of as the input strength varies from negative (inhibitory) to positive (excitatory); this statistic is independent of background correlation. Figure 1(b) shows the variance of ; deviations are apparent when the magnitude of the coupling is large. The covariance of the activity (Fig. 1(c)) is also accurate. Even the statistics of the firing rate are relatively accurate; the mean firing rate (Fig. 1(d)) is only weakly dependent on background correlation whereas the variance of (Fig. 1(e)) appears to vary more with background correlation. In Fig. 1(f), the strong dependence of the covariance of the firing rate on background correlation is captured by our method. For brevity, we omit the corresponding statistics for ; the method performs equally well there.

Network II. We next consider an all-to-all coupled network of neurons with heterogeneity in all parameters. The parameter values were selected from specific distributions and gave rise to quenched variability. The transfer function was set to , where and are fixed parameters that depend on the the neuron. The distributions of the parameters for this network are:

(31) | |||||

(32) | |||||

(33) | |||||

(34) | |||||

(35) |

where is a uniform random variable, and is normally distributed with the mean and variance as the arguments. The covariance matrix of the background noise was randomly selected as follows:

(36) |

where the entries of the matrix are independently chosen from a normal distribution: and is the inverse square-root of the diagonal of ; i.e., if we set with entries , then . By construction, is symmetric positive semidefinite with 1’s on the diagonal.

Finally, the entries of the coupling matrix are randomly chosen, but the parameters of the distribution were varied:

(37) |

where for . There are no zero entries in (i.e. coupling is all-to-all), with both inhibition, excitation, and autaptic (self) coupling.

For each of the four values for the variance of the normal distribution, we chose a single realization of a coupling matrix G and computed first and second-order statistics of and . In Fig. 2 we compared our analytic vs. Monte Carlo results for each cell or cell pair. Each realization is identified by a different color; in Fig. 2(a) for example, there are red data points, corresponding to each for . Points that are on the black diagonal line represent a perfect match between Monte Carlo simulations and our method.

First-order statistics and are well-captured by the analytic method, even for the largest coupling strength (Fig. 2(a,b)). This excellent agreement is present despite the substantial amount of heterogeneity in these networks: note that and that , and thus that single-cell firing rates in Fig. 2(b) have a relatively large range. Second-order statistics (variances and covariances: Fig. 2(c-f)) are captured well for smaller coupling values (blue and cyan) but become less accurate for the largest coupling value (red). In particular, the analytic method appears to overestimate variance for the largest coupling strength (Fig. 2(c)).

Network III. Finally we consider a moderately sized network of neurons with quenched heterogeneity in all of the intrinsic parameters, but with more physiological connectivity structure than Network II. The first 50 neurons are excitatory ( for ) and the last 50 are inhibitory ( for ). We choose a sparse (random) background correlation matrix via:

where as before is a Gaussian random variable. That is, each cell shares correlated input with its nearest-neighbors of the same type (excitatory vs. inhibitory), and a single cell of the opposite type, where cell location varies along a one-dimensional line. This results in a correlation matrix which is tridiagonal, with an antidiagonal band for the E and I correlation; this sparsity structure is shown in Fig. 3(a).

In a variety of cortical areas, there is evidence that the correlation of neural activity within a population is on average positive with a wide distribution Poulet and Petersen (2008); Yu and Ferster (2010); Gentet et al. (2010); thus we set the distributions of excitatory and inhibitory correlation coefficients to and respectively (second and third lines of Eq. (LABEL:parms_cr_net3a)). Also, there is evidence that E and I neurons are positively correlated (i.e., the synaptic currents are negatively correlated) Borg-Graham et al. (1998, 1996); Okun and Lampl (2008), so we set the average background E-I correlation (, fourth line of Eq. (LABEL:parms_cr_net3a)) to a higher value than correlations within E or I (second and third lines respectively).

In order to capture some realistic features of cortical neural networks, we impose sparse but clustered connectivity. Specifically, we have 5 clusters of E cells of size 10 with all-to-all connectivity and no autaptic (self-coupling) connections, and sparse random coupling within the I population (no autaptic connections) and between E and I cells (35% connection probability). See Fig. 3(b) for the sparsity structure of . This is motivated by experimental evidence that E cells show clustered connectivity Song et al. (2005); Perin et al. (2011); Ko et al. (2011), and that cells tuned for specific stimulus features can be more connected, while inhibitory connections have less structure Fino and Yuste (2011).

Synaptic connection strengths were chosen randomly for each realization with the following distributions:

(39) |

where again is a uniform random variable. The value is used for all nonzero E to E connections: i.e. with ; is used for all nonzero I to E connections: i.e. with and ; for all nonzero E to I: i.e. with and ; similarly for .

The distributions for the rest of the parameters were similar to Network II, with only inconsequential differences:

(40) | |||||

(41) | |||||

(42) | |||||

(43) | |||||

(44) |

The choices for and intrinsic parameters are not physiologically motivated, but rather chosen so that we can examine how the algorithm performs on cells with a wide range of intrinsic and network parameters.

In Fig. 3(c) and (d) we show the results of the analytic approximation compared to Monte Carlo simulations for the activity and firing rates, respectively. In each panel, we have combined the mean, variance and covariance and, as in Fig. 2, a data point is plotted for each cell (for means and variances) or cell pair (for covariances). Also, we show data from two (2) instances of the network, labeled A and B; for each instance a new realization of the coupling matrix and the coupling parameters (Eq. (39)) are generated (see Fig. 3 caption for values), but each of the other randomly selected parameters were kept fixed. Points that are on the black diagonal line represent a perfect match between Monte Carlo simulations and our method. As with Network II, the analytic method accurately captures the statistics cell-by-cell, despite an appreciable degree of heterogeneity.

Finally, we test how well our method approximates firing rate correlation, which is an important normalized measure of trial-to-trial variability (or noise correlations). The Pearson’s correlation coefficient is the predominant measure in neuroscience: , i.e. the ratio of two quantities which we must estimate using the analytic method. Since this is the ratio of estimated quantities, we might expect larger errors. In Fig. 4, we show comparisons between the analytic method and Monte Carlo simulations for Network II and Network III. The method is accurate for a wide range of correlations: Fig. 4(a) shows correlations as low as and as high as . Thus, the viability of our approximation is not limited to small correlation values, but can robustly capture the full range of correlation values observed in cortical neurons Cohen and Kohn (2011); Doiron et al. (2016).

## V The -correlation Assumption

The assumptions made in deriving Eqs. (17)–(19) — each equivalent to an assumption that two random variables are -correlated in time — might suggest that the error of our method compared to Monte Carlo simulations would increase as increases, or perhaps that the method breaks down when the distribution of has larger variance. Thus far, we have only considered relatively narrow distributions of . We now test this possibility in a setting where we can examine how the method performs as is increased without other confounding effects on the error.

Specifically, we simulate a network of cells with the majority of the network parameters set to be homogenous values:

The correlation matrix for background noise is a tridiagonal matrix with in the upper and lower diagonal bands, and the coupling matrix is the same as in Network II: with and . Finally, we set

(45) |

so that varies uniformly over an order of magnitude: and .

Figure 5(a) shows that the method is accurate for all possible first and second order statistics despite this large variation in . Figures 5(b) and (c) show that the L-error between the Monte Carlo simulations and our method does not depend in any apparent way on the value of the time constant . Even when analyzed by a particular statistic, there is no trend with time constant. We conclude that our method is robust to large and disparate values of .

## Vi Accuracy of the reduction method

As with most calculations that assume small/large values in the parameters, an exact analytic determination of when the approximation fails is difficult, if not impossible. To capture how our method deviates as the coupling strength is increased, we performed further computations varying the coupling strengths , system size , and setting the other parameters to a variety of values.

The coupling matrix was randomly chosen with half of the entries set to zero, 25% of the entries set to and 25% set to , where is a scale parameter representing the magnitude of the coupling (Note that was chosen only once for each given network size , i.e. it was held fixed while other random parameters were varied). The correlation matrix of background noise is a banded matrix with between 1 and 4 bands above/below the diagonal set to ; that is, each cell shared noise with its -nearest neighbors, for or . The rest of the parameters were chosen randomly as for Network III, Eq. (41)–(44).

Figure 6 shows these results; for each network size and shared common noise footprint , the error of our method compared to Monte Carlo simulations is plotted as a function of the magnitude of the coupling strength . Each point represents the L error between Monte Carlo and our method, averaged over the entire set of six spiking/activity statistics (mean, variances, and covariances of both activity and firing rate ). For each system size (except ; see explanation below), four curves show results for and respectively. We see that as the magnitude of coupling values increase, any given error curve tends to increase.

In performing computations for Fig. 6, we made the following modifications for computational tractability: for larger values (), we augmented our method to use a subset of correlation/covariance values; we chose the main diagonal and the super/sub-diagonal ; we further computed only one instance for (i.e. there is only one curve). We further note that some entries are not plotted because our method did not converge to a solution and/or the resulting covariance matrices are not positive definite, which is not unexpected with randomly chosen parameters.

In summary, Fig. 6 provides a representative snapshot of the range of possible error values. While the average error increases nonlinearly as coupling strength increases, overall the error appears to be relatively insensitive to system size. We note that the scaling of coupling strengths by the square root of system size is considered to be “strong scaling” popularized by the theory of balanced networks van Vreeswijk and Sompolinsky (1996, 1998), compared to the relatively weak scaling .

## Vii Discussion

There has been a long history of analytic reduction methods for neural network models, both to enhance efficiency in simulation and to aid mathematical analyses. Here, we summarize some of this literature and its relationship to the work presented here.

The simplest approach is a mean-field analysis, which would self-consistently estimate the mean values, assuming the variances to be 0 Tuckwell (2005). However, this neglects the fluctuations which we know to be important in neural systems; therefore many authors have augmented these theories with corrections to capture second-order statistics, higher-order statistics, or time-dependent correlations. Several authors have proposed to derive these corrections by starting with the microscopic dynamics of single neurons in the network. The microscopic dynamics in question may be given by a master equation Buice and Cowan (2007); Bressloff (2009); Buice et al. (2010); J.Touboul and Ermentrout (2011); Bressloff (2015), a generalized linear model Toyoizumi et al. (2009); Ocker et al. (2017), or the theta model Buice and Chow (2013a, b). The result is a principled theory for the second-order statistics of the network, however, the resulting calculations are often complicated and hard to execute.

Here, we aimed to take a middle road between simple (but inaccurate) mean-field calculations, and principled (but complicated) theories to compute network fluctuations from microscopic dynamics. Specifically, we start with a system of coupled stochastic differential equations, each of which may represent either a single neuron or a homogeneous population, and sought to quickly and accurately estimate statistics of the coupled system. Importantly, the unperturbed state in our system is not one in which all neurons are independent; instead we perturb from a state with background noise correlations. Thus, we anticipate this approximation can be used to probe a range of neural networks, in which correlations can be significant and activity-modulated.

While the coupled firing rate models we study here were not derived directly from the microscopic dynamics of a spiking network, our results can still yield insight into spiking networks Rosenbaum et al. (2017); Kanashiro et al. (2017); Ledoux and Brunel (2011). Our previous work Barreiro et al. (2017) used the qualitative principles and intuitions gained from a simple firing rate model to characterize relationships between the analogous parameters in a full spiking model of a multi-region olfactory network. In that paper, a small system with simple coupling and background correlations was studied, whereas this paper treats networks of arbitrary size, and arbitrary coupling and input correlation structures. The work here is thus a generalization of the calculations in Barreiro et al. (2017).

In other models, represents the function that maps firing rate to synaptic input. Here, we assume that the effective synaptic input is a fixed scaling of the firing rate . In other biophysical models the effective synaptic input may be a more complex transformation of the firing rate (e.g., an alpha function convolved with firing rate): the methods presented here can easily be altered to account for this. To do this, the only change would be to use in Eq. (1) instead of , where is some synaptic activation function.

Our method relies on the assumption that statistics are stationary in time; this assumption allows a set of statistics to be solved self-consistently. Thus we have not addressed complex network dynamics, such as oscillations or time-varying statistics. However, this limitation is not specific to our method, but also applies to related work. Previously developed approximation methods may fail when the system undergoes a bifurcation Buice and Cowan (2007); Buice et al. (2010), and truncation methods (or moment closure methods) are known to fail in certain parameter regimes Ly and Tranchina (2007). When the set of self-consistent equations cannot be solved, there may be other methods available to characterize the oscillatory dynamics (see Nicola et al. (2015) where this is done for the adaptive quadratic integrate-and-fire model). Likewise, we did not consider time-lagged network statistics (i.e., the entire cross-correlation functions) but rather only the instantaneous statistics. This perhaps enables the delta-correlation assumption in our method to give accurate approximations even with disparate time-scales (see section V). Such considerations are a fruitful path of future work.

###### Acknowledgements.

Cheng Ly is supported by a grant from the Simons Foundation (#355173).## Viii Appendix: Derivation of Equations (17)–(19)

Here, we provide a formal derivation of the assumptions that we make to derive our main result Eqs. (20)–(22). Our challenge is that while we are principally interested in zero-time-lag statistics, computing second-order statistics (such as ) using Eq. (III) requires us to know the autocorrelation function, i.e. . Therefore we need to close the equation by making some kind of assumption about these temporal correlations.

The key assumption to justify Eq. (17) is that is -correlated in time: i.e.

where is the mean. Using this in the integral on the left-hand side of Eq. (17), we find that