Bayesian log-Gaussian Cox process regression with applications to fMRI meta-analysis

Bayesian log-Gaussian Cox process regression with applications to fMRI meta-analysis

Pantelis Samartsidis MRC Biostatistics Unit Tor D. Wager University of Colorado at Boulder Lisa Feldman Barrett Northeastern University Shir Atzil Massachusetts General Hospital Simon B. Eickhoff Heinrich-Heine University Düsseldorf Thomas E. Nichols University of Warwick Timothy D. Johnson University of Michigan

A typical neuroimaging study will produce a 3D brain statistic image that summarises the evidence for activation during the experiment. However, for practical reasons those images are rarely published; instead, authors only report the (x,y,z) locations of local maxima in the statistic image. Neuroimaging meta-analyses use these foci from multiple studies to find areas of consistent activation across the human brain. However, current methods in the field do not account for study-specific characteristics. Therefore, we propose a fully Bayesian model based on log-Gaussian Cox processes that allows for the inclusion of study-specific variables. We present an efficient MCMC scheme based on the Hamiltonian Monte Carlo algorithm to simulate draws from the posterior. Computational time is significantly reduced through a parallel implementation using a graphical processing unit card. We evaluate the method on both simulated and real data.

1 Introduction

Functional magnetic resonance imaging (fMRI) has experienced an explosive growth over the past two decades, leading to significant advances in our understanding of human brain function (Raichle, 2003). The objective of an fMRI study is to identify the neural correlates of a physical, mental, or perceptual process. During the experiment, participants are asked to perform a series of tasks, such as viewing images or reading text, while the MRI device measures changes in the Blood Oxygenation Level Dependent (BOLD) signal, a surrogate for neuronal activity. A series of 3D volumes of BOLD measurements are collected for each participant and at each voxel (volume element) a hierarchical multi-subject time series model is fit. The ultimate result of this process is a statistic image that assesses the evidence for a change in the BOLD signal induced by the task. For more details on the statistical analysis of fMRI data see Lindquist (2008) or Poldrack et al. (2011).

While fMRI is now a standard tool, the importance of the findings can be undermined by the inherent limitations of a single fMRI experiment. In particular, there are three facets of fMRI experiments that put the quality of the results into question. First, fMRI studies suffer from low power. The typical sample size of an fMRI study is small, with the majority of experiments using less than 20 participants (Carp, 2012). For example, a recent meta-analysis of emotions found a median sample size of 13 (Lindquist et al., 2012). Such small sample sizes limit power and surely result in Type II errors. Furthermore, Type I errors are also prevalent (Wager et al., 2009), as a result of searching for significant effects over thousands of voxels. While methods exist to correct multiplicity, sometimes they either are not applied or are applied inappropriately. Finally, fMRI experiments suffer from low reliability (Raemaekers et al., 2007); that is, an individual examined under the same experimental conditions may give different results on two different sessions.

For all these reasons, there has been a growing interest in meta-analysis of functional neuroimaging studies. A meta-analysis can increase power by combining the available information across studies, hence separating the consistent findings from those occurring by chance. In fMRI there are two broad approaches for meta-analysis. When the full statistical images are available, that is effect sizes and associated standard errors for all voxels in the brain, an Intensity-Based Meta-Analysis (IBMA) can proceed by means of standard meta-analytic mathods (see Hartung et al. (2008) for an overview). However, these statistic images (200,000+ voxels) traditionally have not been shared by authors. Instead, researchers only publish the brain atlas coordinates of the local maxima in significant regions of the statistic image. We call these coordinates the foci (singular focus). When only foci are available then a Coordinate-Based Meta-Analysis (CBMA) is conducted. As can be expected, the transition from full images to the lists of foci involves a heavy loss of information (Salimi-Khorshidi et al., 2009). However, since the vast majority of researchers rarely provide the full images, CBMA constitutes the main approach for fMRI meta-analysis.

Most work in the field is focused on the so-called kernel-based methods such as Activation Likelihood Estimation (Turkeltaub et al., 2002; Eickhoff et al., 2012, ALE), Multilevel Kernel Density Analysis (Wager et al., 2004, 2007, MKDA) and Signed Differential Mapping (Radua and Mataix-Cols, 2009; Radua et al., 2012, SDM). Briefly, these methods construct a statistic map as the convolution of the foci111Precisely, this is a convolution of a Dirac delta function located at each foci with a given kernel. with 3D spatial kernels. Areas of the map with large values suggest brain regions of consistent activation across studies. For statistical inference, the map is thresholded by reference to a Monte Carlo distribution under the null hypothesis of uniform spatial foci allocation. Kernel-based methods are not based on an explicit probabilistic model and hence often lack interpretability. Moreover, for some methods it is difficult to obtain standard errors and hence only -values are reported for each voxel. Recently, some model-based methods were proposed to address the limitations of kernel-based methods, such as the Bayesian Hierarchical Independent Cox Cluster Process model of Kang et al. (2011) and the Bayesian Nonparametric Binary Regression model of Yue et al. (2012). However, none of these methods account for study-specific characteristics in the meta-analysis.

In this work we propose a Bayesian spatial point process model, an extension of the log-Gaussian Cox process model (Møller et al., 1998) that can include study specific characteristics as explanatory variables, thus introducing the notion of meta-regression (Greenland, 1994) to CBMA. Meta-regression is an important facet of meta-analysis, especially when there is appreciable heterogeneity among the studies. For example, in a review of 241 fMRI studies, Carp observed 223 different analytical strategies. It is therefore essential to explore the effect that study characteristics have on the analyses outcomes, as well as to identify which of the outcomes replicate regardless of the analytical pipelines. Further, our method takes advantage of some recent advances in parallel computing hardware, specifically the use of graphical processing units (GPUs).

The remainder of this manuscript is organised as follows. In Section 2 we outline the details of our model and describe the approximations required for posterior inferences. In Section 3 we describe the Hamiltonian Monte Carlo algorithm used for simulating samples from the posterior distribution of our model parameters. In Sections 4 and 5 we apply the model to simulated and real data respectively. In Section 6 we conclude and set some possible directions for future work.

2 Model specifications

Suppose that there are a total number of studies in the meta analysis and that each study comes with a point pattern , a set of foci , where is the support of the analysis, usually set from a standard atlas of the brain, and , where is the number of foci in a study. Additionally, suppose that for each point pattern there is a set of study specific characteristics, .

We will assume that each point pattern is the realisation of a Cox point process defined on , driven by a random intensity . We can then model the intensity function at each point as:


where is the baseline intensity parameter, and the are the regression coefficients. Equation (1) defines a spatial log-linear model over the brain. Foci are more likely to occur in regions of the brain with high intensity values whereas we expect almost no foci in regions as the intensity approaches zero. The exact rates are given by the properties of a Cox process. In particular, given , the expected number of foci in any bounded is a Poisson random variable with mean (Møller and Waagepetersen, 2004).

In practice we expect that some covariates will have a global (homogenous) effect. Therefore, we split the covariates into the that have a local effect and that have a global effect:


where for all .

A Bayesian model is defined with prior distributions on model parameters, here the functional parameters , , and scalar parameters , . A natural way to proceed is to assume that are realisations of Gaussian processes and that the have normal distributions. That way, the right hand side of Equation (2) is also a Gaussian process, and each point process is a log-Gaussian Cox process (LGCP) (Møller et al., 1998). We will assume an isotropic, power exponential correlation structure, that is for points we have:


for , where are the correlation decay parameters and are the smoothness parameters. The same correlation structure was used by Møller et al. (1998) and Møller and Waagepetersen (2003) in the context of LGCPs. Of course, one could consider alternative correlation structures such as the Matérn covariance function (see for example Rasmussen and Williams (2005)).

The log-Gaussian Cox process is a flexible model for spatial point data that can account for aggregation (Møller et al., 1998; Møller and Waagepetersen, 2007) or even repulsion between points (Illian et al., 2012a) and has therefore found applications in several fields such as disease mapping (Benes et al., 2002; Liang et al., 2009) and ecology (Møller and Waagepetersen, 2003; Illian et al., 2012b).

By the definition of a Cox process, is a Poisson point process on conditional on (Møller and Waagepetersen, 2004). The density (Radon-Nikodym derivative) of this point process with respect to the unit rate Poisson process is:


for , with denoting the volume of the brain. We can view as the density of the sampling distribution of the data; if we further assume independent studies, then we obtain the likelihood for the model as:


where is as defined in Equation (2). Inference can be then achieved through the posterior distribution of the model which is given, up to a normalising constant, by:


where and are the priors on the functional and scalar parameters, respectively. We discuss the priors below in Section 2.1.

2.1 Posterior approximation

Calculation of the posterior in Equation (6) requires the evaluation of the infinite dimensional Gaussian processes , , which we approximate with a finite dimensional distribution. Following Møller et al. (1998) and Benes et al. (2002), we consider the discretisation of the 3D volume with a regular rectangular grid . We use cubic cells (i.e. voxels) in with volume , where is the length of the side. In neuroimaging, analysis with cubic voxels is typical, leading to a box-shaper grid of about 1 million voxels, of which about 200,000 are in the brain or cerebellum. Note that for simplicity, we consider both grey matter and white matter voxels in our implementations. Voxels are indexed , and the coordinate of is the location of the center .

For any , the Gaussian process can be now approximated with a step function which is constant within each voxel and equal to the value of at the location of the center, i.e. . Waagepetersen (2004) shows that the accuracy of this approximation improves as goes to zero. By definition, are multivariate Gaussian vectors. We parametrise as:


where are the overall (scalar) means, are the marginal standard deviations, are the correlation matrices with elements , and are the a priori vectors, . The same parametrisation is used by Møller et al. (1998), Christensen and Waagepetersen (2002) and is advocated by Christensen et al. (2006) because it allows for computationally efficient posterior simulations. For the purposes of this work we will only consider the case where i.e. the Gaussian correlation function. This choice may seem rather simplistic but is justified by the sparsity of points in CBMA data and ubiquitous use of Gaussian smoothing kernels in neuroimaging data analysis.

Priors for the -vectors are induced by the parametrisation of Equation (7). For the remaining parameters we will assume weekly informative priors: , , and 222This prior may not be uninformative for applications with a different scale of distances.

Once the latent Gaussian processes are approximated, one can also approximate with a step function as before. The intensities at the center of each voxel are given by:


where is the -vector, the discretised intensity. We will write for the -element of study ’s intensity, and note we require to capture the mean effect. The approximated posterior is:


where , takes on the value when and 0 otherwise, is the index of the voxel containing , and is the joint prior distribution of the parameters. The posterior distribution in Equation (9) is still analytically intractable due to the presence of an unknown normalising constant and thus we need to resort to Monte Carlo simulation or approximation techniques to obtain samples from it.

3 Sampling algorithm details

Bayesian methodology for inference on LGCPs can be broadly divided into two main categories: simulation based approximations of the posterior such as Markov Chain Monte Carlo (Møller et al., 1998) and elliptical slice sampling (Murray et al., 2010), and deterministic approximations to the posterior such as integrated nested Laplace approximations (Illian et al., 2012a; Simpson et al., 2016, INLA) and variational Bayes (Jaakkola and Jordan, 2000). In a recent study, Taylor and Diggle (2014) compare the Metropolis-adjusted Langevin (MALA) algorithm with INLA and find that both methods give similar results. In our application, we choose to use simulation based methods because application on our 3D problem is more straightforward.

Of course, building an algorithm for such high-dimensional problem is non-trivial. Girolami and Calderhead (2011) showed that of all possible strategies, their Riemann Manifold Hamiltonian Monte Carlo (RMHMC) sampler is the computationally most efficient for LGCPs in a 2D setting. Unfortunately, application in this problem (3D setting) is prohibitive as it would require the inversion of a huge matrix. Therefore, we choose to use the standard Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 2011, HMC) algorithm which Girolami and Calderhead (2011) found to be the second most efficient.

HMC initially appeared in the physics literature by Duane et al. (1987) under the name Hybrid Monte Carlo, and later emerged into statistics literature by Neal (2011). HMC emulates the evolution of a particle system which is characterised by its position () and momentum () over time. In our case, will be the parameter vector of interest and will be introduced artificially from a distribution, with being the dimensionality of the problem and the mass matrix. The dynamics of the system are described by a set of differential equations, known as Hamilton’s equations.

HMC alternates between moves for the position vector and the momentum vector based on Hamilton’s equations. If the solutions of the equations can be found analytically then moves will be deterministic; if not, numerical integration is required and an acceptance/rejection step must be performed to account for integration error. Integration is done in fictitious time , where is the stepsize and is the number of steps. Typically the leapfrog integrator is employed, which for and starting at time is performed as (Neal, 2011):


Overall, if the method is applied correctly, it will produce a time-reversible Markov chain that has the desired distribution as its stationary distribution. As we show in Appendix A, gradient expressions are available in closed form for all model parameters including correlation parameters . We therefore choose to update all and jointly with HMC. The solutions to Hamilton’s equations are not available analytically so we need to use the Leapfrog integrator and include an accept/reject step at the end of it.

The procedure requires the specification of a stepsize and a total number of leapfrog steps . Hoffman and Gelman (2014) show how tuning can be achieved automatically but when we applied this method to our problem running time was increased substantially. Therefore we use an alternative approach to tune these parameters. The stepsize is automatically adjusted during the burn-in phase of the HMC to give an overall acceptance rate close to the suggested by Neal (2011). In particular, if is the stepsize at iteration and is the acceptance rate over the past iterations, then every iterations we calculate the new stepsize as:


Specifically we use and A similar approach is empoyed by Marshall and Roberts (2012) for MALA. The latter (number of leapfrog steps), is always fixed to . We took this approach because we found that, for our LGCP application, the mixing properties of the algorithm scale linearly with but also with the total number of HMC iterations. Hence one can use a relatively large and few iterations or relatively smaller and more iterations, the total computation time staying relatively constant.

The last tuning parameter in the HMC algorithm is the variance-covariance matrix of the zero mean normal momentum parameters, . To our knowledge, there is only limited off the shelf methodology on how to adjust . As a starting place we set . Neal (1996) suggests that if an estimate of the posterior variance is available then a good practice is to set . In principle, can be estimated during the burn-in phase of HMC but in practice this is not possible due to the dimensionality of the problem. In our simulations, we found that the mean posterior variance of the elements of the was higher compared to the scalar parameters, followed by or and then . Especially for the the scale is typically much smaller compared to the other parameters in our applications and so we use instead of . After the reparametrisation we found that setting the mass for parameters of , , and equal to 1, 3, 3 and 10 respectively worked well in all implementations.

The most computationally demanding part of the algorithm is the the calculation of the large matrix-vector products appearing in the intensity functions of Equation (8). Luckily, an elegant solution to this problem is given by Møller et al. (1998) based on circulant embedding that was first proposed by Dietrich and Newsam (1993) and Wood and Chan (1994). The key to the approach is the linear algebra result that a circulant matrix has the discrete Fourier basis as its eigenvectors. is not circulant but is block toeplitz and can be embeded in a matrix that is circulant. Thus the matrix square root, inversion and multiplication can be accelerated by using (the highly efficient) discrete Fourier transform (DFT) of the embedded matrix and manipulating Fourier coefficients, followed by inverse DFT and extracting the appropriate sub-matrix/sub-vector. See Rue and Held (2005) for more details.

We close this section by stressing that despite the massive dimensionality of the parameter vector, the problem has a very high degree of parallelisation. Intensities can be evaluated in blocks of thousands of voxels simultaneously making the algorithm suitable for implementation in a graphical processing unit (GPU). The most computationally intensive part of our model, namely operations with DFTs, is also amenable to parallelisation and there exist libraries such as NVIDIA’s cuFFT library that are designed for this specific task. Overall, we believe that implementation of the log-Gaussian Cox process model described above will soon become a routine task for any moderately powerful GPU device.

4 Simulation studies

We consider two simulation setups. In the first we draw samples directly from the log-Gaussian Cox process whereas in the second we create synthetic studies based on a different model to assess the robustness of our method to model misspecification. For consistency, all processes are defined on the same brain atlas used in the application of Section 5, consisting of cubic voxels. The average number of foci per simulated study is kept low (mean number of foci per study is 5) to resemble the sparsity of points observed in real CBMA data. Finally, the total number of studies is fixed to in both analyses, similar to the sample sizes available in real applications (Kang et al., 2011, for example).

4.1 Setup 1

In this setting we simulate 200 studies, with two spatially varying covariates that account for the mean of two groups of studies, and two non-spatially varying covariates. For we set:


where , , and . Note that this parametrisation of the covariates implies existence of two types of studies, say type 1 and 2, with different spatially varying means and the effect of one continuous and one categorical covariate. The expected total number of foci is 3.99 and 4.16 for studies of type 1 and 2 respectively. We draw from their prior and fix the values of the scalar parameters shown in Table 1. We run the HMC algorithm of Section 3 for 10,000 iterations, discarding the first 4,000 as a burn-in and save every 6 iterations for a total of 1,000 saved posterior samples. This took roughly 14 hours on an NVIDIA Tesla K20c GPU card.

Results are summarised in Table 1 and Figure 1. In Table 1 we see that the scalar parameters are estimated accurately despite the sparsity of points in the realisations. The 95% credible intervals contain the true values of all the parameters in the setup. Some traceplots for these parameters can be found in Section B of the Appendix.

For , the median expected number of points is 3.97 (95% CI [3.84,4.10])for type 1 and 4.61 (95% CI [4.46,4.78]) for type 2. These values are very similar to the values we observe in the simulated dataset, that is 3.98 for type 1 and 4.53 for type 2. This indicates that our model does a good job fitting the data. The shape of the latent Gaussian processes is generally captured for both types as can be seen in Figure 1. In particular, we can see that the maxima in the true and estimated images appear roughly in the same locations. The same cannot be said about the other values but this is expected given the dearth of information in regions of low intensity.

Parameter True Value Posterior median credible interval
-13.7 -13.72 [ -13.99 , -13.48 ]
-14.2 -14.14 [ -14.47 , -13.86 ]
1.2 1.19 [ 1.01 , 1.38 ]
1.6 1.61 [ 1.43 , 1.81 ]
1 0.93 [ 0.69 , 1.27 ]
2 2.30 [ 1.69 , 3.15 ]
2 1.44 [ 0.22 , 2.52 ]
1 0.95 [ 0.32 , 1.65 ]
Table 1: Posterior summaries of the scalar parameters of the LGCP model, fit to the simulated data of Section 4.1. Results are based on 1,000 posterior draws. The values for the correlation parameters , are multiplied by 100. The values for and are multiplied by 10.

4.2 Setup 2

In this setup we create datasets with a pattern of points that follows brain structures of interest. Again there are two types of studies, say type 1 and type 2. For each study , , we generate the total number of points from a Negative Binomial distribution with mean and variance . For the covariates, and . Once we know the exact number of foci per study, we assign the study uniformly at random to one of the 2 types and the distribute its foci as follows. For type 1, foci appear systematically in the following regions: each focus can be observed in the right amygdala () with probability 55%, the orbifrontal cortex () with probability 30% or anywhere else in the brain with probability 15%. The configuration for type 2 differs in that most of the points will go to the left amygdala () instead of the right amygdala. If a focus is assigned to one of the three broad regions, the exact location has a uniform distribution over the region. In the fourth column of Figure 2 the regions in red and blue correspond to the left and right amygdala respectively while the orbifrontal cortex is coloured in green.

HMC is run for 10,000 iterations, discarding the 4,000 first as a burn-in and saving every 6 to obtain a total of 1,000 samples from the posterior. The run took approximately 15 hours on a Tesla K20c GPU card.

Results are shown in Figure 2 where in the first two columns we see median posterior log-intensities for the two types, in different axial slices. In both cases, we find that the regions with the highest intensities are the amygdalae and that the orbifrontal cortex is a region of high posterior intensity as well. The median expected number of points is 5.81 for type 1 (95% CI [5.36,6,32]) and 6.45 for type 2 (95% CI [5.97,6.97]). The observed values are 6.27 and 6.73 respectively.

Conditional on there being exactly one focus, we can estimate the probability that this focus appears in any subset as . Using the posterior draws obtained from the HMC algorithm, we can obtain the posterior distribution of any such quantity. For our simulated type 1 data we find that the median posterior probability of observing a focus in the right amygdala () is 0.43 (95% CI [0.40,0.48]). For type 2, the probability of observing a focus in the left amygdala () is 0.42 (95% CI [0.39,0.46]). For the orbifrontal cortex () the median posterior probabilities are 0.25 for type 1 and 0.23 for type 2, with 95% credible intervals and respectively. We therefore see that the model underestimates the probabilities for , and . This bias can be attributed to the smoothness that is imposed by our parameter thus leading to increases intensities just outside these regions as well as regions where noise foci appear.

An interesting question one may ask is which are the regions of the brain that are activated by one type or the other, but not both. To answer this, one can construct the mean standardised posterior difference map computed as the ratio of the posterior mean of the difference , to the posterior standard deviation of that difference: . Extreme negative or positive values are evidence of differences between the two types. We show the difference map in the third column of Figure 2. As we see, the model distinguishes the the two types in the amygdala but the differences are small in the rest of the brain. This is a very interesting feature of the model, especially for applications in CBMA where researchers are sometimes interested in comparing a similar process in a different domain, see for example Rottschy et al. (2012) or Section 5 for an application. An alternative way to do the comparison would be to use the posterior intensity draws to find , where is a threshold difference, is the posterior intensity for type in voxel obtained from the -th iteration of the algorithm, and is the total number of HMC draws. However, in this approach it is hard to choose the threshold difference .

Figure 1: Some true (top row) and estimated (bottom row) latent Gaussian processes for type 1 (columns 1 and 2) and type 2 (columns 3 and 4) in the simulation setup 1 of Section 4.1. Columns 1 and 3 correspond to axial slice ; columns 2 and 4 correspond to axial slice . While they may appear dissimilar at first, observe that the most intense regions of the true and estimated intensity match up; in this punishingly sparse setting (mean 4 foci per 3D image), the less intense regions have too few points to learn the intensity.
Figure 2: Results for simulation setup 2 of Section 4.2. The top row corresponds to axial slice whereas the bottom row corresponds to axial slice . Columns 1 and 2 are the estimated log-intensities for type 1 and type 2 respectively. The third column is the standardised mean posterior difference between the two latent Gaussian processes in the corresponding slice; bright colours indicate areas mostly activated by type 1 process. The fourth column shows the regions of the brain systematically activated by the two processes; red for type 1, blue for type 2 and green for both.

5 Application: meta-analysis of emotion and executive control studies

In this Section we apply our model to a real meta-analysis data set.

5.1 Data despcription

Our dataset consists of 1,193 neuroimaging studies. The studies were conducted between 1985 and 2015 and the average number of participants was 16. Of these, 855 studies are on emotion and the remaining 338 are on executive control. The sample has a total of 10,266 foci, 6,112 from emotion (7.15 on average) and 4,154 from executive control (12.29 on average). Figure 3 is a graphical representation of the data. We see that even though there is some clustering, the foci are distributed throughout the brain. Our application will focus on identifying the regions of consistent activation across studies and infer on possible difference between the two types.

Figure 3: Graphical representation of the meta-analysis dataset. Data consist of foci from 1,193 studies of two types, 855 emotion (left, red) and 338 exectutive control (right, blue) studies. The overall number of foci is 6,112 and 4,154 for emotion and executive control respectively, for a combined total of 10,266 foci. The code used to generate the figure is courtesy of Jian Kang.

5.2 Algorithm details and convergence diagnostics

We use the same parametrisation as in Section 4: we have 2 spatially varying intercepts, one for emotion and one for executive control. We run the HMC for 15,000 iterations in total, discarding the first 5,000 as a burn-in. The total number of leapfrog steps is set to and the stepsize is initialised at . We use a diagonal mass matrix with units specified as explained in Section 3.

Convergence of the HMC chain is assessed visually by inspection of posterior traceplots for the model parameters. We run a total of 3 HMC chains in order to examine if they all converge to the same values. Posterior traceplots, along with autocorrelation plots are shown in Section C of the Appendix. Due to the large number of parameters we only focus on the scalars , and , as well as intensities in voxels and where the highest median posterior values are observed for the two types. Integrated intensities over the entire brain are also examined. Results indicate that our chains have converged to their stationary distribution. This is verified by the fact that posterior values from the 3 different runs overlap one with another for all the quantities that we plot. Finally it is worth noting that our chains show very good mixing properties since autocorrelation falls to low values after only a small number of iterations.

5.3 Results

In Figure 4 we plot median posterior intensities for emotion in several axial slices of the brain. Our results look qualitatively similar to the results obtained by Kober et al. (2008), Yue et al. (2012) and Kang et al. (2011) in their meta-analyses of emotion studies. The expected number of foci is 7.15 (95% CI [6.99,7.33]) and the regions mostly activated by emotions are the right and the left amygdala where the peak intensities in Figure 4 appear. Executive control processing generally recruits more regions of the brain and hence the median expected number of foci is 12.30 (95% CI [11.92,12.68]). The main effects are localised in the right and left cerebral cortex as can be seen in Figure 5.

Several quantities of interest can be obtained from our model, based on the properties of the spatial Poisson process. For example, one may calculate the probability of observing at least one focus in a set of voxels, e.g. a region of interest (ROI) or the entire brain. In Figure 7 we show the posterior distribution of , the probability of observing at least one focus in B, for several ROIs . The division of the brain in ROIs is done according to the Harvard-Oxford atlas (Desikan et al., 2006). A full brain analysis can be found in Section D of the Appendix. Note that this type of inference cannot be easily obtained from kernel-based methods such as MKDA (Wager et al., 2007) or ALE (Eickhoff et al., 2012) and therefore is a relative merit of our Bayesian point process model.

Comparison of the two types can be made as described in Section 4.2. The mean standardised posterior difference of the two intercepts is shown in Figure 6. Generally, we see that the two types have entirely distinct localisations and the main differences are observed where the main effects appear that is, the amygdala and regions of the cortex including front cortex, dorsolateral prefrontal cortex and parietal cortex.

Figure 4: Emotion results. The figure presents the median posterior intensity at each voxel for several axial slices, as obtained from 1,000 independent draws from the posterior. Top row (left to right): , , , and . Middle row (left to right): , , , and . Bottom row (left to right): , , , and .
Figure 5: Executive control results. The figure presents the median posterior intensity at each voxel for several axial slices, as obtained from 1,000 independent draws from the posterior. Top row (left to right): , , , and . Middle row (left to right): , , , and . Bottom row (left to right): , , , and .
Figure 6: Comparison of the two types. The figure presents mean standardised posterior differences of the latent Gaussian processes at each voxel for several axial slices, as obtained from 1,000 independent draws from the posterior. Top row (left to right): , , , and . Middle row (left to right): , , , and . Bottom row (left to right): , , , and .
Figure 7: Posterior % probabilities of observing at least one focus for several ROIs. The points represent the median posterior values, red for emotion and blue for executive control. The errors represent the 95% posterior credible intervals. Black asterisks are the empirical values as obtained from the data. The ROIs are (left to right): left amygdala, right amygdala, anterior cingulate gyrus, left hippocampus, right hippocampus and inferior frontal gyrus pars opercularis.

5.4 Model assessment

Posterior predictive checks (Gelman et al., 1996) can be found in Section E of the Appendix. In particular, we compare first and second order properties of observed data with samples obtained from the posterior predictive distribution. Results indicate some weakness in capturing the second order properties of the data but are overall satisfactory.

6 Discussion

In this work we have presented a new coordinate-based meta-analysis model, extension of the log-Gaussian Cox process model of Møller et al. (1998). To our knowledge, this is the first application of the LGCP with multiple realisations in a 3D problem. Note that even though our application is focused on neuroimaging, the method is directly applicable to any multi-type point pattern problem.

The model has an appealing interpretation being a spatial GLM and several interesting inferences can be obtained based on the properties of the spatial Cox process. A significant advantage of our approach compared to existing methods is the inclusion of covariates in the analysis thus introducing the notion of meta-regression in CBMA. Another very interesting feature of the model is that it allows multi-type comparison directly from the posterior without having to run the model several times.

The main weakness of our approach is the large amount of computational effort required. Nevertheless, the proposed HMC algorithm exhibits quick convergence and good mixing properties and hence less samples from the posterior are required. Additionally, implementation of the method on a GPU vastly reduces computation time and thus makes the method applicable on big meta-analysis problems.

Application of the method on a meta-analysis of emotion and executive control studies has given valuable insights on the data. In particular, we have found that the main structures activated by emotion processing are the right and left amygdala, a finding which is consistent with previous studies on the topic. Executive control functions engage more regions compared to emotion and the main areas activated are the right and left cerebral cortex. Furthermore, comparison of the two types has allowed the detection of several regions in which they differ significantly.

There are several ways in which our work can be extended. A potential future direction is to study the conditions, such as sample size or minimum number of foci, under which it is possible to estimate several global or spatially varying effects using the LGCP. Such work can be of importance for practical implementations since it will provide some guidance regarding the complexity of meta-regression models that can be fit to a given dataset. One other option would be consider a random effects LGCP model where:


where are the random effect terms with . This extension would account for further variability among the observed point patterns. Gradient expression are still available in closed form and hence inference for this model is still feasible by jointly updating the random effects parameters along with the remaining parameters of the LGCP.

Another possibility is to use some additional information about the foci such as -values or scores. These values can be attached as marks to the existing point patterns in order to improve estimation of the intensity function. Such an approach can enrich the inferences obtained from a meta-analysis by characterising the magnitude of activation in each region as opposed to the localisation of activations, which is the question that current methods address. Finally, it would be interesting to see how point pattern data can be combined with IBMA data so that a meta-analysis can take advantage of all the available information on a topic. Radua et al. (2012) have already done some work for kernel-based methods but the problem still needs to be tackled for spatial models as well.


The authors would like to thank Derek Nee, John Jonides, David Hayes, Brian T. Gold, Georg Northoff and Raffael Kalisch for their help collecting and sharing the data. PS, TEN & TDJ were supported by NIH grant R01 NS075066-01A1; TEN was supported by a Wellcome Trust fellowship 100309/Z/12/Z.


  • Benes et al. (2002) Benes, V., Bodlák, K., Møller, J., and Waagepetersen, R. P. (2002). Bayesian analysis of log Gaussian Cox processes for disease mapping. Technical report, Depertment of Mathematical Sciences, Aalborg University.
  • Carp (2012) Carp, J. (2012). The secret lives of experiments: Methods reporting in the fMRI literature. NeuroImage, 63(1), 289–300.
  • Christensen and Waagepetersen (2002) Christensen, O. F. and Waagepetersen, R. P. (2002). Bayesian prediction of spatial count data using generalized linear mixed models. Biometrics, 58(2), 280–286.
  • Christensen et al. (2006) Christensen, O. F., Roberts, G. O., and Sköld, M. (2006). Robust Markov chain Monte Carlo methods for spatial generalized linear mixed models. Journal of Computational and Graphical Statistics, 15(1), 1–17.
  • Desikan et al. (2006) Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., Buckner, R. L., Dale, A. M., Maguire, R. P., Hyman, B. T., Albert, M. S., and Killiany, R. J. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980.
  • Dietrich and Newsam (1993) Dietrich, C. R. and Newsam, G. N. (1993). A fast and exact method for multidimensional Gaussian stochastic simulations. Water Resources Research, 29(8), 2861–2869.
  • Duane et al. (1987) Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216–222.
  • Eickhoff et al. (2012) Eickhoff, S. B., Bzdok, D., Laird, A. R., Kurth, F., and Fox, P. T. (2012). Activation likelihood estimation meta-analysis revisited. NeuroImage, 59(3), 2349–2361.
  • Gelman et al. (1996) Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733–807.
  • Girolami and Calderhead (2011) Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 73(2), 123–214.
  • Greenland (1994) Greenland, S. (1994). Invited commentary: a critical look at some popular meta-analytic methods. American Journal of Epidemiology, 140(3), 290–296.
  • Hartung et al. (2008) Hartung, J., Knapp, G., and Sinha, B. K. (2008). Statistical Meta-Analysis with Applications. John Wiley & Sons, Hoboken.
  • Hoffman and Gelman (2014) Hoffman, M. and Gelman, A. (2014). The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.
  • Illian et al. (2009) Illian, J. B., Møller, J., and Waagepetersen, R. P. (2009). Hierarchical spatial point process analysis for a plant community with high biodiversity. Environmental and Ecological Statistics, 16(3), 389–405.
  • Illian et al. (2012a) Illian, J. B., Sørbye, S. H., and Rue, H. (2012a). A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA). The Annals of Applied Statistics, 6(4), 1499–1530.
  • Illian et al. (2012b) Illian, J. B., Sørbye, S. H., Rue, H., and Hendrichsen, D. K. (2012b). Using INLA to fit a complex point process model with temporally varying effects – A case study. Journal of Environmentl Statistics, 3(7), 1–25.
  • Jaakkola and Jordan (2000) Jaakkola, T. and Jordan, M. (2000). Bayesian parameter estimation via variational methods. Statistics and Computing, 10(1), 25–37.
  • Kang et al. (2011) Kang, J., Johnson, T. D., Nichols, T. E., and Wager, T. D. (2011). Meta analysis of functional neuroimaging data via Bayesian spatial point processes. Journal of the American Statistical Association, 106(493), 124–134.
  • Kober et al. (2008) Kober, H., Barrett, L. F., Joseph, J., Bliss-Moreau, E., Lindquist, K., and Wager, T. D. (2008). Functional grouping and cortical and subcortical interactions in emotion: a meta-analysis of neuroimaging studies. NeuroImage, 42(2), 998–1031.
  • Liang et al. (2009) Liang, S., Carlin, B. P., and Gelfand, A. E. (2009). Analysis of Minnesota colon and rectum cancer point patterns with spatial and nonspatial covariate information. The Annals of Applied Statistics, 3(3), 943–962.
  • Lindquist et al. (2012) Lindquist, K. A., Wager, T. D., Kober, H., Bliss-Moreau, E., and Barrett, L. F. (2012). The brain basis of emotion: a meta-analytic review. Behavioral and Brain Sciences, 35(3), 121–143.
  • Lindquist (2008) Lindquist, M. A. (2008). The statistical analysis of fMRI data. Statistical Science, 23(4), 439–464.
  • Marshall and Roberts (2012) Marshall, T. and Roberts, G. (2012). An adaptive approach to Langevin MCMC. Statistics and Computing, 22(5), 1041–1057.
  • Møller and Waagepetersen (2003) Møller, J. and Waagepetersen, R. P. (2003). An introduction to simulation-based inference for spatial point processes. In J. Møller, editor, Spatial Statistics and Computational Methods, chapter 4, pages 143–198. Springer-Verlag.
  • Møller and Waagepetersen (2004) Møller, J. and Waagepetersen, R. P. (2004). Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall/CRC, Boca Raton.
  • Møller and Waagepetersen (2007) Møller, J. and Waagepetersen, R. P. (2007). Modern statistics for spatial point processes. Scandinavian Journal of Statistics, 34(4), 643–684.
  • Møller et al. (1998) Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998). Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3), 451–482.
  • Murray et al. (2010) Murray, I., Adams, R. P., and MacKay, D. J. (2010). Elliptical slice sampling. Journal of Machine Learning Research: Workshop and Conference Proceeding, 9(6), 541–548.
  • Neal (1996) Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer-Verlag New York, Inc., Secaucus, NJ, USA.
  • Neal (2011) Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L. Jones, and X. Meng, editors, Handbook of Markov Chain Mote Carlo, chapter 5, pages 113–162. Chapman & Hall/CRC.
  • Poldrack et al. (2011) Poldrack, R. A., Mumford, J. A., and Nichols, T. (2011). Handbook of Functional MRI Data Analysis. Cambridge University Press.
  • Radua and Mataix-Cols (2009) Radua, J. and Mataix-Cols, D. (2009). Voxel-wise meta-analysis of grey matter changes in obsessive-compulsive disorder. The British Journal of Psychiatry : the Journal of Mental Science, 195(5), 393–402.
  • Radua et al. (2012) Radua, J., Mataix-Cols, D., Phillips, M. L., El-Hage, W., Kronhaus, D. M., Cardoner, N., and Surguladze, S. (2012). A new meta-analytic method for neuroimaging studies that combines reported peak coordinates and statistical parametric maps. European Psychiatry, 27(8), 605–611.
  • Raemaekers et al. (2007) Raemaekers, M., Vink, M., Zandbelt, B., van Wezel, R., Kahn, R., and Ramsey, N. (2007). Test-retest reliability of fMRI activation during prosaccades and antisaccades. NeuroImage, 36(3), 532–542.
  • Raichle (2003) Raichle, M. E. (2003). Functional brain imaging and human brain function. Journal of Neuroscience, 23(10), 3959–3962.
  • Rasmussen and Williams (2005) Rasmussen, C. E. and Williams, C. K. I. (2005). Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.
  • Rottschy et al. (2012) Rottschy, C., Langner, R., Dogan, I., Reetz, K., Laird, A. R., Schulz, J. B., Fox, P. T., and Eickhoff, S. B. (2012). Modelling neural correlates of working memory: a coordinate-based meta-analysis. NeuroImage, 60(1), 830–846.
  • Rue and Held (2005) Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis.
  • Salimi-Khorshidi et al. (2009) Salimi-Khorshidi, G., Smith, S. M., Keltner, J. R., Wager, T. D., and Nichols, T. E. (2009). Meta-analysis of neuroimaging data: a comparison of image-based and coordinate-based pooling of studies. NeuroImage, 45(3), 810–823.
  • Simpson et al. (2016) Simpson, D., Illian, J., Lindgren, F., Sørbye, S., and Rue, H. (2016). Going off grid: computationally efficient inference for log-Gaussian Cox processes. Biometrika, 103(1), 49–70.
  • Taylor and Diggle (2014) Taylor, B. M. and Diggle, P. J. (2014). INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. Journal of Statistical Computation and Simulation, 84(10), 2266–2284.
  • Turkeltaub et al. (2002) Turkeltaub, P. E., Eden, G. F., Jones, K. M., and Zeffiro, T. A. (2002). Meta-analysis of the functional neuroanatomy of single-word reading: method and validation. NeuroImage, 16(3, Part A), 765–780.
  • Waagepetersen (2004) Waagepetersen, R. P. (2004). Convergence of posteriors for discretized log Gaussian Cox processes. Statistics and Probability Letters, 66(3), 229–235.
  • Wager et al. (2004) Wager, T. D., Jonides, J., and Reading, S. (2004). Neuroimaging studies of shifting attention: a meta-analysis. NeuroImage, 22(4), 1679–1693.
  • Wager et al. (2007) Wager, T. D., Lindquist, M., and Kaplan, L. (2007). Meta-analysis of functional neuroimaging data: current and future directions. Social Cognitive and Affective Neuroscience, 2(2), 150–158.
  • Wager et al. (2009) Wager, T. D., Lindquist, M. a., Nichols, T. E., Kober, H., and Van Snellenberg, J. X. (2009). Evaluating the consistency and specificity of neuroimaging data using meta-analysis. NeuroImage, 45(Supplement 1), S210––S221.
  • Wood and Chan (1994) Wood, A. T. A. and Chan, G. (1994). Simulation of stationary Gaussian processes in . Journal of Computational and Graphical Statistics, 3(4), 409–432.
  • Yue et al. (2012) Yue, Y. R., Lindquist, M. A., and Loh, J. M. (2012). Meta-analysis of functional neuroimaging data using Bayesian nonparametric binary regression. The Annals of Applied Statistics, 6(2), 697–718.

Appendix A Gradient expressions for the LGCP

The log-posterior, up to a normalising constant is given by:


where , , , and the intensity function at each voxel is defined as:


We now calculate the derivatives with respect to the parameters of interest.

Partial derivatives with respect to

We have that:


As a result:


where is the total number of foci in study .

Partial derivatives with respect to

We have that:



Partial derivatives with respect to



For ease of exposition we will complete the derivation for the one-dimensional case; however, similar arguments hold when . Matrices are circulant and so, the matrix-vector product can be found using the discrete Fourier transform as:


where are the diagonal matrices containing the eigenvalues of and is the matrix of eigenvectors. In Equation (20), the only term depending on is and, hence:


We know that , where for we have that:


where is the imaginary unit. Now it is straightforward to see that for :


where can be viewed as the -th eigenvalue of the of a circulant matrix with base and , . Overall we see that:


where stands for element wise division. Combining Equations (19) and (24), we find that:



Partial derivatives with respect to



where is the -th row of the matrix . Now we can see that:


since is a nested block circulant matrix, where are -vectors with elements .

Appendix B LGCP simulation setup I traceplots

In this section we provide traceplots for the scalar parameters of the LGCP model, as fit to the simulated data of Section 4.1. Trace plots for the parameters