A Distribution Free Unitary Events Method based on Delayed Coincidence Count
Univ. Nice Sophia Antipolis, CNRS, LJAD, UMR 7351, 06100 Nice, France.
Univ. Nice Sophia Antipolis, CNRS, LPMC, UMR 7336, 06100 Nice, France.
Univ. Européenne de Bretagne, CNRS, IRMAR, UMR 6625, 35043 Rennes Cedex, France.
Keywords: Unitary Events - Synchronization - Multiple testing - Independence tests - Trial-Shuffling - Permutation - Bootstrap - Point process
We investigate several distribution free dependence detection procedures, mainly based on bootstrap principles and their approximation properties. Thanks to this study, we introduce a new distribution free Unitary Events (UE) method, named Permutation UE, which consists in a multiple testing procedure based on permutation and delayed coincidence count. Each involved single test of this procedure achieves the prescribed level, so that the corresponding multiple testing procedure controls the False Discovery Rate (FDR), and this with as few assumptions as possible on the underneath distribution. Some simulations show that this method outperforms the trial-shuffling and the MTGAUE method in terms of single levels and FDR, for a comparable amount of false negatives. Application on real data is also provided.
The eventual time dependence either between cerebral areas or between neurons, and in particular the synchrony phenomenon, has been vastly debated and investigated as a potential element of the neuronal code (Singer, 1993). To detect such a phenomenon at the microscopic level, multielectrodes are usually used to record the nearby electrical activity. After pretreatment, the time occurrences of action potentials (spikes) for several neurons are therefore available. One of the first steps of analysis is then to understand whether and how two simultaneously recorded spike trains, corresponding to two different neurons, are dependent or not.
Several methods have been used to detect synchrony (Perkel et al., 1967; Aertsen et al., 1989). Among the most popular ones, the UE method, due to Grün and collaborators (Grün, 1996; Grün et al., 2010), has been applied in the last decade on a vast amount of real data (see for instance (Kilavik et al., 2009) and references therein). Two of its main features are at the root of its popularity: UE is not only able to give a precise location in time of the dependence periods, but also to quantify the degree of dependence by providing -values for the independence tests.
From the initial method, substantial upgrades have been developed:
(i) In the original UE method, the point processes modelling the data are binned and clipped at a rough level, so that the final data have a quite low dimension (around a few hundreds per spike train). However, it is proved in (Grün et al., 1999) that the binned coincidence count as a result of this preprocessing may induce a loss in synchrony detection of about 60% in certain cases. The idea of (Grün et al., 1999) was therefore to keep the data at the initial resolution level despite its high dimension, but to define the notion of multiple shift (MS) coincidence count, nicely condensing the dependence feature that neurobiologists want to analyze without any loss in synchrony detection.
(ii) The original UE method assesses -values by assuming that the underlying point processes are Poisson (or Bernoulli) processes. As there is still no commonly validated and accepted model for spike trains (see for instance Pouzat & Chaffiol (2009) for a thorough data analysis), several surrogate data methods have been proposed (Louis et al., 2010). In particular, trial-shuffling methods (Pipa & Grün, 2003; Pipa et al., 2003) allow to assess -values based on the fact that i.i.d. trials are available, using bootstrap paradigm and without making any assumption on the underlying point processes distribution. However, up to our knowledge, surrogate data methods are always based on binned coincidence count (see section 2 for a precise definition) whose low complexity combined to parallel programming (Denker et al., 2010) make algorithms usable in practice.
(iii) The original UE method is based on two main statistical approximations. First, it involves the underlying intensities (or firing rates) of the Poisson processes, which are unknown in practice, and so replaced by estimates. However, this plug-in procedure is not taken into account in the -values computations. Then, the detection of dependence through time is done by applying several tests at once without correcting for the multiplicity of the tests. In the recent work of (Tuleau-Malot et al., 2014), a multiple testing procedure based on a Gaussian approximation of the Unitary Events (MTGAUE) corrects those two facts, moreover including a generalization of the notion of MS coincidence count, namely the delayed coincidence count, which does not suffer from any loss in synchrony detection. But MTGAUE is still based on the assumption that the underlying point processes are Poisson.
Our aim is here to go further by proposing a new delayed coincidence count-based multiple testing procedure, which does not need any binning preprocessing of the data as in (Tuleau-Malot et al., 2014), but which does not assume any model on the underlying point processes anymore. This procedure combines a permutation approach in the line of (Hoeffding, 1952; Romano, 1989; Romano & Wolf, 2005), with the multiple testing procedure of (Benjamini & Hochberg, 1995).
To do so, we first propose a fast algorithm to compute the delayed coincidence count, with a computational cost equivalent to the one of the binned coincidence count. Next we study several distribution free tests, most of them based on bootstrap approaches, as the trial-shuffling, or the permutation approach. We finally propose a complete multiple testing algorithm which satisfies similar properties as existing UE methods, but without sharing any of the previous drawbacks. Some simulations and applications to real data complete this study.
In all the sequel, and denote two point processes modelling the spike trains of two simultaneously recorded neurons and represents the couple . The abbreviation ”i.i.d.” stands for independent and identically distributed. In this sense, by assuming that independent trials are observed, the observation is modeled by an i.i.d. sample of size of couples from the same distribution as , meaning i.i.d. copies of . This sample is denoted in the sequel by . The corresponding probability and expectation are respectively denoted by and .
Since the independence between and is the main focus of the present work, the following notation is useful: denotes a couple such that (resp. ) has the same distribution as (resp. ), but is independent of . In particular, the couple has the same marginals as the couple . Moreover, denotes an i.i.d. sample of size from the distribution of , and and are the corresponding probability and expectation.
Note in particular that if the two observed neurons indeed behave independently, then the observed sample has the same distribution as .
The notation stands for a function whose value is 1 if the event holds and 0 otherwise.
Finally, for any point process (), stands for its associated point measure, defined by:
and for any interval , denotes the number of points of observed in .
2 Binned and delayed coincidence counts
Because of the way neurons transmit information through action potentials, it is commonly admitted that the dependence between the spike trains of two neurons is due to temporal correlations between spikes produced by both neurons. Informally, a coincidence occurs when two spikes (one from each neuron) appear with a delay less than a fixed (of the order of a few milliseconds). Several coincidence count functions have been defined in the neuroscience literature, and among them the classical binned coincidence count, used for instance in (Pipa & Grün, 2003; Pipa et al., 2003).
The binned coincidence count between point processes and on the interval with for an integer and a fixed delay is given by
where is the th bin of length , i.e. and
More informally, the binned coincidence count is the number of bins that contain at least one spike of each spike trains. The binned coincidence count computation algorithm is usually performed on already binned data. Therefore, given two sequences of and of length , operations are needed to compute the binned coincidence count, without counting the number of operations needed for the binning preprocessing.
The more recent notion of delayed coincidence count, introduced in (Tuleau-Malot et al., 2014), is a generalization of the multiple-shift coincidence count, defined in (Grün et al., 1999) for discretized point processes, to non necessarily discretized point processes.
The delayed coincidence count between point processes and on the interval is given by
More informally, is the number of couples of spikes (one spike from and one from ) appearing in with delay at most equal to .
The delayed coincidence count can be computed using the following algorithm.
It is easy to see that the complexity of this algorithm is not governed only by the lengths and but also by the random numbers of points in intervals of length for step 4.b. More precisely, it is bounded by (for steps 1, 3 and 4.a), (for all steps 2 on all points of ) and times the number of points of in a segment (namely ) of length (for step 4.b). In average, if and are for instance independent homogeneous Poisson processes of respective intensities and , at most operations are required. So, for typical parameters (s, s, Hz), operations in average are required to compute the binned coincidence count, against operations for the delayed coincidence count. Both algorithms are therefore linear in with a slight advantage for the delayed coincidence count which exploits the sparsity of the spike trains, in the usual range of parameters. Notice that all surrogate data methods (see (Louis et al., 2010)) could in principle be applied on this new notion of coincidence, at least when only two neurons are involved.
3 Some distribution free independence tests
Given the observation of a sample corresponding to different trials, the aim is to test:
All existing UE methods are based on a statistic equal to the total number of coincidences:
where generically denotes either , or , or other coincidence count functions that practitioners would like to use (see (Albert et al., 2014) for other choices).
To underline what is observable or not, when is computed on the observation of , it is denoted by , the total number of observed coincidences.
In the following, several of these UE methods or testing procedures are described, which all rely on the same paradigm: ”reject when is significantly different from what is expected under ”. More precisely, the independence is rejected and the dependence is detected when a quantity, based on the difference between the observed coincidence count and what is expected under , is smaller or larger than some critical values. Those critical values are obtained in various ways, each of them being peculiar to each method. Note that the following procedures could be applied to any chosen coincidence count function , though the implemented procedures of the present simulations and data analysis only use the delayed coincidence count, that is .
3.1 A naive approach and the centering issue
As noticed above, the only question, when considering UE methods, is how to construct the critical values.
If the values of the expectation and the variance of under , that is
are precisely known, then the classical Central Limit Theorem gives under independence that
Then, given in , the test which consists in rejecting when is larger than , the quantile of a standard Gaussian distribution, is asymptotically of level . It means that, for this test, the probability of rejecting independence, whereas independence holds, is asymptotically (in , the number of trials) smaller than the prescribed .
In the present point processes framework, strong distribution assumptions, for which the values of and are precisely known, are unrealistic. Even if the spike trains are assumed to be homogeneous Poisson processes as in (Tuleau-Malot et al., 2014), those quantities depend, through some formulas, on the unknown firing rates that have to be estimated and plugged into these precise formulas. It has been shown that this modifies the asymptotic variance shape, and tests under Poisson assumptions with unknown firing rates have been developed in (Tuleau-Malot et al., 2014).
Since Poisson assumptions are quite doubtful on real data (Pouzat & Chaffiol, 2009), the aim of the present work is to go further by not assuming any Poisson or other model assumptions for the spike trains. In this sense, the aim is to develop ”distribution free” methods that are completely agnostic with respect to the underlying distribution of the spike trains. In this case, a preliminary step is to estimate , only using the sample without any distribution assumption. Note that
and that for , as is always assumed to be independent of ,
Therefore, can always be estimated in a distribution free manner by
Hence a reasonable test statistic would be based on the difference:
its observed version being denoted by . Here, is not an empirical mean, but a -statistic, so it does not satisfy the classical Central Limit Theorem. Hence, its limit distribution under is not as straightforward as usual. Nevertheless, some asymptotic theorems, proved in (Albert et al., 2014), show that under mild conditions (always satisfied in practice in the present cases) the following convergence result holds:
As above, denoting by the quantity computed on the observed sample, (2) implies that for some fixed in , the test that consists in rejecting when , is asymptotically of level .
Clearly, one can see that the distribution approximation is good when is large () as expected, but not so convincing for small values of (, or even ), particularly in the tail parts of the distributions. However, as it is especially the tails of the distributions that are involved in the test through the quantile , one can wonder, by looking at Figure 1, if it may perform reasonably well in practice with a usual number of a few tens of trials.
Furthermore, looking informally at Equation (2), readers could think of two approximations that could be roughly formulated in the following way:
This is illustrated on Figure 2.
Looking at the first line of Figure 2, one can see that the approximation formulated in (3) is actually conceivable for large values of . Note that in practice, one cannot have access to and it has to be replaced by , meaning that it is computed with the observed sample. This does not change anything under since is in this case distributed as . But this is a particularly important sticking point if is not satisfied as one can see on the third line of Figure 2: the distribution of does not look like a centered Gaussian distribution of variance , when does not satisfy .
(i) moves around its expectation (which is also the expectation of ) with realizations of . These fluctuations have an order of magnitude of and are therefore perfectly observable on the distribution of whose variance is also of order .
(ii) estimates the variance of and not the one of or . This explains why not only the mean but also the variance are badly estimated in the second line of Figure 2. Both randomness (the one of and the one of ) have to be taken into account to estimate the variance of .
As a conclusion of this first naive approach, the test of purely asymptotic nature, which consists in rejecting when may work for large enough, as the variance is here computed by considering the correctly recentered statistic , and this even if the behavior of the statistic under is not clear. But an ad hoc and more naive test statistic, based on an estimation of the variance of directly and without taking into account the fact that the centering term is also random, would not lead to a meaningful test.
3.2 The bootstrap approaches
It is well known (Giné, 1997) that tests of purely asymptotic nature as the one presented above are less accurate for small values of than more involved procedures. In this article, the focus is on bootstrap/resampling procedures that are usually known to improve the performance from moderate to large sample sizes. Three main procedures are investigated: the trial-shuffling introduced in (Pipa & Grün, 2003; Pipa et al., 2003), the classical full bootstrap of independence and the permutation approach (Romano, 1989).
The main common paradigm of these three methods, as described in the sequel, is that starting from an observation of the sample , they randomly generate via a computer another sample , whose distribution should be close to the distribution of (see also Figure 4).
In particular, the corresponding bootstrapped coincidence count is
This algorithm seems natural with respect to (1) because it avoids the diagonal terms of the square . Hence as a result,
In particular, the corresponding bootstrapped coincidence count is
Note that this algorithm draws uniformly at random in the square and therefore does not avoid the diagonal terms. The idea behind this algorithm is to mimic the independence under of and by drawing the indexes and independently. However
Hence under , but, under , and are only asymptotically equivalent.
In particular, the corresponding bootstrapped coincidence count is
The idea is to use permutations to avoid picking twice the same spike train of the same trial. In particular under , the sum in is still a sum of independent variables, which is not the case in both of the previous algorithms. However, under , the behavior is not as limpid. As for the full bootstrap,
Hence under , and are only asymptotically equivalent.
To compare those three bootstrap/resampling algorithms, the first thing to wonder is whether, at least under , the introduced extra randomness has not impacted the distribution. More precisely, as stated above, all three procedures satisfy
but is the full distribution of the same as the one of ?
The first line of Figure 3 shows as expected that the permutation does not change the distribution of , since, as said above, no spike train is picked twice. However, clearly the trial-shuffling and the full bootstrap have not the same property, even if the distributions are quite close.
Nevertheless, this is not completely convincing. Indeed, the main advantage of bootstrap procedures is to be able for one current observation of to generate several realizations of to obtain not the unconditional distribution of but the conditional distribution of given . Figure 4 gives a more visual representation of the difference between conditional and unconditional distributions.
In particular, in a bootstrap testing procedure, the critical values are quantiles
The second line of Figure 3 shows what happens for the approximated conditional distribution of given under in the three considered bootstrap approaches. Surprisingly none of these three conditional distributions seems to fit the distribution of . One may eventually think that this is due to the Monte-Carlo approximation of the conditional distributions, but for the trial-shuffling approach, Pipa and Grün developed an algorithm for exact computation of the conditional distribution (Pipa et al., 2003): both Monte-Carlo and exact conditional distribution are so close that it is difficult to make any difference between them. Hence there should be another explanation. In fact, the curves on the second line of Figure 3 are similar to the ones on the second line of Figure 2. In both set-ups, one wonders if the distribution of can or cannot be approximated by a distribution depending on the observation of : a very basic Gaussian distribution for Figure 2 and a more intricate distribution using the bootstrap paradigm for Figure 3. Nevertheless both distributions are too widely spread around the aim which is the distribution of . Since the explanation for Figure 2 was a centering defect that can be corrected by considering , one of the possible explanation here is a centering defect for the bootstrap procedures too.
3.3 Which centering for which bootstrap ?
All the bootstrap approaches that have been proved to work from a mathematical point of view are based on centered quantities (Giné, 1997). In particular, the precursor work of Bickel and Freedman (Bickel & Freedman, 1981) on the bootstrap of the mean can be heuristically explained as follows.
Given a sample of i.i.d. real random variables with mean and a corresponding bootstrap sample , it is not possible to estimate the distribution of the empirical mean directly. However one can estimate the centered distribution, i.e. the distribution of . To do so, it is sufficient to replace ”empirical mean” by ”empirical bootstrap mean” and ”expectation” by ”conditional expectation”. More explicitly, denoting by the empirical mean of the bootstrap sample , the distribution of is approximated by the conditional distribution given of
Transposed in our framework, this paradigm would mean that one can obtain a good fit of the distribution of by the conditional distribution of given . But as seen above, constructing a test with test statistic is impossible in a full distribution free context where the value of is unknown.
Therefore one needs to find a quantity that is completely observable but whose mean is still null under . The statistic introduced in Section 3.1 is suitable from this point of view. What one needs to check is whether the distribution of under , that is of (which has zero mean), is well approximated by the distribution of .
For the trial-shuffling, since
one can easily see that because the couple is drawn uniformly at random in the set of the ’s such that (set of cardinality ),
Hence the correct bootstrap statistic is
However similar computations show that the full bootstrap and the permutation satisfy
so and can be used directly.
Figure 5 shows the quality of approximation of the distribution of by the conditional distribution given the observation of either or . The approximation is accurate under but it is actually also accurate even if the observed sample is simulated under , which is in complete accordance with the mathematical results of consistence in Wasserstein distance proved in (Albert et al., 2014). The approximation is just as accurate for the recentered statistic . Note that the difference between the conditional c.d.f. of and the one of is particularly visible under when . Hence, as explained by the computations above, in a trial-shuffling approach, the recentered version leads to the correct bootstrap distribution. Note finally that this corroborates the previous intuition: the reason why the approximation works for and not for is exactly the same as for the naive approach of Figure 2. The centering is indeed random (here it can be viewed as ) and one needs to take it into account to have a correct approximation.
Finally an extra simplification holds in the permutation case, which may seem very surprising. One can easily rewrite on the one hand,
and, on the other hand, for the permutation sample
Note that the sum is invariant by the action of the permutation. Hence if denotes the quantile of order of the conditional distribution of given and if denotes the quantile of order of the conditional distribution of given , this very simple relationship holds
Hence the test that rejects when is exactly the one that rejects when . Therefore despite the fact that the conditional distribution of is not close at all to the one of , the test based on works, because it is equivalent to the test based on , for which the approximation of the conditional distribution works. Note however that this phenomenon happens only in the permutation approach, but not in the trial-shuffling or the full bootstrap approaches.
3.4 Practical testing procedures and -values
From the considerations given above, five different tests may be investigated, the first one based on a purely asymptotic approach, and the four other ones based on bootstrap approaches, with critical values approximated through a Monte-Carlo method. For each test, the corresponding -values (i.e. the values of for which the test passes from acceptance to rejection) are given.
The naive test (N)
It consists in rejecting when
The corresponding -value is given by:
where is the c.d.f. of a standard Gaussian distribution.
The Trial-Shuffling test, version C (Tsc)
It consists in rejecting when
where is the empirical quantile of order of the conditional distribution of given . This empirical quantile is estimated over ( usually) realizations given the observed sample . The corresponding -value is given by:
Despite the problems underlined in Section 3.3, we kept this test in the present study since it corresponds to the one programmed in (Pipa & Grün, 2003) and since this bootstrap procedure is usually the one applied by neuroscientists.
The Trial-Shuffling test, version recentered U (Tsu)
It consists in rejecting when
where is the empirical quantile of order of the conditional distribution of (the correctly recentered statistic) given . This empirical quantile and the corresponding -value are obtained in a similar way as for the above (TSC).
The Full Bootstrap test, version U (Fbu)
It consists in rejecting when
where is the empirical quantile of order of the conditional distribution of given . This empirical quantile and the corresponding -value are obtained in a similar way as for the above (TSC).
The permutation test (P)
The reader may think that it should consist in rejecting when
where is the empirical quantile of order of the conditional distribution of given . But the test by permutation is in fact directly defined by its -value, which is slightly different here, equal to:
The permutation test then consists in rejecting when this -value is less than . Indeed, such a permutation test, with such a slightly different version of -value, has been proved to be exactly of level , whatever (Romano & Wolf, 2005).
Note however that such a slight correction does not work for full bootstrap or trial-shuffling approaches, where the tests are only guaranteed to be asymptotically of level .
Saying that a test rejects at level is exactly equivalent to saying that its -value is less than . If a test is of level for any in , the c.d.f. of its -values should therefore be smaller that the one of a uniform variable (i.e. the diagonal). Between several tests with this guarantee, the less conservative one is the one for which the c.d.f of its -values is the closest to the diagonal. The left hand-side of Figure 6 shows the c.d.f. under of the corresponding -values for the five considered testing procedures and focuses on small -values, which are the only ones usually involved in testing, to highlight the main differences between the five methods. For the chosen value of (), the c.d.f. of the (TSU) and (FBU) -values are almost identical and above the diagonal, meaning that the corresponding tests do not guarantee the level. On the contrary, the c.d.f. of the (N) and (TSC) -values are clearly under the diagonal and far from it, meaning that the corresponding tests are too conservative. As guaranteed by (Romano & Wolf, 2005), the permutation approach guarantees the level of the test: the c.d.f. of the (P) -values is also under the diagonal, but much closer to the diagonal than the one of the (N) and (TSC) -values.
Furthermore, the behavior of the c.d.f. of the -values under gives an indication of the power of the test: the highest the c.d.f. of the -values, the most powerful the corresponding test. Hence among the tests that guarantee the level, the permutation test (P) is the most powerful one. Note that other simulations in more various cases have been performed in (Albert et al., 2014) leading to the same conclusion.
In the sequel, the focus is therefore on the permutation approach, keeping also the trial-shuffling version C approach as a variant of the method developed in (Pipa & Grün, 2003).
4 Multiple tests
4.1 Description of the complete multiple testing algorithm
To detect precise locations of dependence periods that can be matched to some experimental or behavioral events, it is classical to consider a family of windows of cardinal , which is a collection of potentially overlapping intervals covering the whole interval on which trials have been recorded (Grün et al., 1999; Tuleau-Malot et al., 2014). Then, some independence tests are implemented on each window of the collection. Here we propose a complete algorithm which takes into account the multiplicity of the tests, and which moreover enables to see if the coincidence count is significantly too large or too small on each window as in (Tuleau-Malot et al., 2014).
Permutation UE algorithm
Fix a real number in and an integer larger than .
|- Do in parallel for all window in :|