Bayesian log-Gaussian Cox process regression with applications to fMRI meta-analysis
Abstract
A typical neuroimaging study will produce a 3D brain statistic image that summarises the evidence for activation during the experiment. However, for practical reasons those images are rarely published; instead, authors only report the (x,y,z) locations of local maxima in the statistic image. Neuroimaging meta-analyses use these foci from multiple studies to find areas of consistent activation across the human brain. However, current methods in the field do not account for study-specific characteristics. Therefore, we propose a fully Bayesian model based on log-Gaussian Cox processes that allows for the inclusion of study-specific variables. We present an efficient MCMC scheme based on the Hamiltonian Monte Carlo algorithm to simulate draws from the posterior. Computational time is significantly reduced through a parallel implementation using a graphical processing unit card. We evaluate the method on both simulated and real data.
1 Introduction
Functional magnetic resonance imaging (fMRI) has experienced an explosive growth over the past two decades, leading to significant advances in our understanding of human brain function (Raichle, 2003). The objective of an fMRI study is to identify the neural correlates of a physical, mental, or perceptual process. During the experiment, participants are asked to perform a series of tasks, such as viewing images or reading text, while the MRI device measures changes in the Blood Oxygenation Level Dependent (BOLD) signal, a surrogate for neuronal activity. A series of 3D volumes of BOLD measurements are collected for each participant and at each voxel (volume element) a hierarchical multi-subject time series model is fit. The ultimate result of this process is a statistic image that assesses the evidence for a change in the BOLD signal induced by the task. For more details on the statistical analysis of fMRI data see Lindquist (2008) or Poldrack et al. (2011).
While fMRI is now a standard tool, the importance of the findings can be undermined by the inherent limitations of a single fMRI experiment. In particular, there are three facets of fMRI experiments that put the quality of the results into question. First, fMRI studies suffer from low power. The typical sample size of an fMRI study is small, with the majority of experiments using fewer than 20 participants (Carp, 2012). For example, a recent meta-analysis of emotion studies found a median sample size of 13 (Lindquist et al., 2012). Such small sample sizes limit power and surely result in Type II errors. Furthermore, Type I errors are also prevalent (Wager et al., 2009), as a result of searching for significant effects over thousands of voxels. While methods exist to correct for multiplicity, they are sometimes either not applied or applied inappropriately. Finally, fMRI experiments suffer from low reliability (Raemaekers et al., 2007); that is, an individual examined under the same experimental conditions may give different results in two different sessions.
For all these reasons, there has been growing interest in meta-analysis of functional neuroimaging studies. A meta-analysis can increase power by combining the available information across studies, hence separating the consistent findings from those occurring by chance. In fMRI there are two broad approaches to meta-analysis. When the full statistic images are available, that is, effect sizes and associated standard errors for all voxels in the brain, an Intensity-Based Meta-Analysis (IBMA) can proceed by means of standard meta-analytic methods (see Hartung et al. (2008) for an overview). However, these statistic images (200,000+ voxels) traditionally have not been shared by authors. Instead, researchers only publish the brain atlas coordinates of the local maxima in significant regions of the statistic image. We call these coordinates the foci (singular: focus). When only foci are available, a Coordinate-Based Meta-Analysis (CBMA) is conducted. As can be expected, the transition from full images to lists of foci involves a heavy loss of information (Salimi-Khorshidi et al., 2009). However, since the vast majority of researchers rarely provide the full images, CBMA constitutes the main approach for fMRI meta-analysis.
Most work in the field has focused on the so-called kernel-based methods, such as Activation Likelihood Estimation (Turkeltaub et al., 2002; Eickhoff et al., 2012, ALE), Multilevel Kernel Density Analysis (Wager et al., 2004, 2007, MKDA) and Signed Differential Mapping (Radua and Mataix-Cols, 2009; Radua et al., 2012, SDM). Briefly, these methods construct a statistic map as the convolution of the foci (precisely, of a Dirac delta function located at each focus) with 3D spatial kernels. Areas of the map with large values suggest brain regions of consistent activation across studies. For statistical inference, the map is thresholded by reference to a Monte Carlo distribution under the null hypothesis of uniform spatial allocation of foci. Kernel-based methods are not based on an explicit probabilistic model and hence often lack interpretability. Moreover, for some methods it is difficult to obtain standard errors, and hence only p-values are reported for each voxel. Recently, some model-based methods were proposed to address the limitations of kernel-based methods, such as the Bayesian Hierarchical Independent Cox Cluster Process model of Kang et al. (2011) and the Bayesian Nonparametric Binary Regression model of Yue et al. (2012). However, none of these methods account for study-specific characteristics in the meta-analysis.
In this work we propose a Bayesian spatial point process model, an extension of the log-Gaussian Cox process model (Møller et al., 1998) that can include study-specific characteristics as explanatory variables, thus introducing the notion of meta-regression (Greenland, 1994) to CBMA. Meta-regression is an important facet of meta-analysis, especially when there is appreciable heterogeneity among the studies. For example, in a review of 241 fMRI studies, Carp (2012) observed 223 different analytical strategies. It is therefore essential to explore the effect that study characteristics have on the analysis outcomes, as well as to identify which of the outcomes replicate regardless of the analytical pipeline. Further, our method takes advantage of recent advances in parallel computing hardware, specifically the use of graphical processing units (GPUs).
The remainder of this manuscript is organised as follows. In Section 2 we outline the details of our model and describe the approximations required for posterior inference. In Section 3 we describe the Hamiltonian Monte Carlo algorithm used for simulating samples from the posterior distribution of our model parameters. In Sections 4 and 5 we apply the model to simulated and real data, respectively. In Section 6 we conclude and discuss possible directions for future work.
2 Model specifications
Suppose that there are $I$ studies in total in the meta-analysis and that each study $i$ comes with a point pattern $\mathbf{y}_i = \{y_{i1}, \dots, y_{in_i}\}$, a set of foci $y_{im} \in \mathcal{B}$, where $\mathcal{B} \subset \mathbb{R}^3$ is the support of the analysis, usually set from a standard atlas of the brain, and $n_i$ is the number of foci in study $i$. Additionally, suppose that for each point pattern there is a set of study-specific characteristics, $\mathbf{x}_i = (x_{i1}, \dots, x_{iP})$.
We will assume that each point pattern is the realisation of a Cox point process defined on $\mathcal{B}$, driven by a random intensity $\lambda_i$. We can then model the intensity function at each point $s \in \mathcal{B}$ as:
$$\lambda_i(s) = \exp\Big( \beta_0(s) + \sum_{j=1}^{P} x_{ij}\,\beta_j(s) \Big) \qquad (1)$$
where $\beta_0(s)$ is the baseline intensity parameter, and the $\beta_j(s)$ are the regression coefficients. Equation (1) defines a spatial log-linear model over the brain. Foci are more likely to occur in regions of the brain with high intensity values, whereas we expect almost no foci in regions where the intensity approaches zero. The exact rates are given by the properties of a Cox process. In particular, given $\lambda_i$, the expected number of foci in any bounded $B \subseteq \mathcal{B}$ is a Poisson random variable with mean $\int_B \lambda_i(s)\,\mathrm{d}s$ (Møller and Waagepetersen, 2004).
In practice we expect that some covariates will have a global (homogeneous) effect. Therefore, we split the covariates into the $P_1$ covariates $x_{ij}$ that have a local effect and the $P_2$ covariates $z_{ij}$ that have a global effect:
$$\lambda_i(s) = \exp\Big( \beta_0(s) + \sum_{j=1}^{P_1} x_{ij}\,\beta_j(s) + \sum_{j=1}^{P_2} z_{ij}\,\delta_j(s) \Big) \qquad (2)$$
where $\delta_j(s) = \delta_j$ for all $s \in \mathcal{B}$.
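To make Equations (1)–(2) concrete, the following is a minimal numerical sketch of a discretised intensity and the resulting Poisson count; all array sizes, parameter values and variable names here are invented for illustration and are not taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discretised "brain" of V cells, each of volume A (the paper uses a
# 3D voxel grid; sizes and values here are purely illustrative).
V, A = 50, 8.0                            # e.g. 2 mm cubic voxels -> 8 mm^3

beta0 = rng.normal(-6.0, 0.1, size=V)     # baseline log-intensity surface
beta1 = rng.normal(0.3, 0.05, size=V)     # spatially varying covariate effect
delta1 = 0.2                              # global (constant) covariate effect

x_i1, z_i1 = 1.0, 0.5                     # covariates of a hypothetical study i
lam_i = np.exp(beta0 + x_i1 * beta1 + z_i1 * delta1)   # Equation (2), per cell

# Cox process property: given lam_i, the number of foci in the region is
# Poisson with mean equal to the integrated intensity (Riemann sum over cells).
mean_count = A * lam_i.sum()
n_foci = rng.poisson(mean_count)
```

The Riemann sum `A * lam_i.sum()` is exactly the discretised version of the integral $\int_B \lambda_i(s)\,\mathrm{d}s$ used throughout Section 2.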
A Bayesian model is defined with prior distributions on the model parameters, here the functional parameters $\beta_0(s), \dots, \beta_{P_1}(s)$ and the scalar parameters $\delta_1, \dots, \delta_{P_2}$. A natural way to proceed is to assume that the $\beta_j$ are realisations of Gaussian processes and that the $\delta_j$ have normal distributions. That way, the log-intensity on the right-hand side of Equation (2) is also a Gaussian process, and each point process is a log-Gaussian Cox process (LGCP) (Møller et al., 1998). We will assume an isotropic, power exponential correlation structure, that is, for points $s_1, s_2 \in \mathcal{B}$ we have:
$$r_j(s_1, s_2) = \exp\big( -\rho_j \| s_1 - s_2 \|^{\nu_j} \big) \qquad (3)$$
for $j = 0, \dots, P_1$, where $\rho_j > 0$ are the correlation decay parameters and $\nu_j \in (0, 2]$ are the smoothness parameters. The same correlation structure was used by Møller et al. (1998) and Møller and Waagepetersen (2003) in the context of LGCPs. Of course, one could consider alternative correlation structures such as the Matérn covariance function (see for example Rasmussen and Williams (2005)).
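The power exponential correlation of Equation (3) can be sketched as follows; the grid, decay and smoothness values are hypothetical, chosen only to illustrate the functional form.

```python
import numpy as np

def power_exp_corr(coords, rho, nu):
    """Power exponential correlation r(s1, s2) = exp(-rho * ||s1 - s2||**nu).

    coords: (V, d) array of cell-centre coordinates; rho > 0 is the decay
    parameter and nu in (0, 2] the smoothness.  nu = 2 gives the Gaussian
    correlation adopted later in the paper.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-rho * dist**nu)

# Hypothetical 1D grid of 5 cell centres.
coords = np.arange(5.0)[:, None]
R = power_exp_corr(coords, rho=0.5, nu=2.0)
```

Note that for a regular grid this matrix depends only on coordinate differences, which is exactly the (block-)Toeplitz structure exploited in Section 3.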
The log-Gaussian Cox process is a flexible model for spatial point data that can account for aggregation (Møller et al., 1998; Møller and Waagepetersen, 2007) or even repulsion between points (Illian et al., 2012a) and has therefore found applications in several fields such as disease mapping (Benes et al., 2002; Liang et al., 2009) and ecology (Møller and Waagepetersen, 2003; Illian et al., 2012b).
By the definition of a Cox process, $\mathbf{y}_i$ conditional on $\lambda_i$ is a Poisson point process on $\mathcal{B}$ (Møller and Waagepetersen, 2004). The density (Radon-Nikodym derivative) of this point process with respect to the unit rate Poisson process is:
$$\pi(\mathbf{y}_i \mid \lambda_i) = \exp\Big( |\mathcal{B}| - \int_{\mathcal{B}} \lambda_i(s)\,\mathrm{d}s \Big) \prod_{y \in \mathbf{y}_i} \lambda_i(y) \qquad (4)$$
for finite $\mathbf{y}_i \subset \mathcal{B}$, with $|\mathcal{B}|$ denoting the volume of the brain. We can view $\pi(\mathbf{y}_i \mid \lambda_i)$ as the density of the sampling distribution of the data; if we further assume that studies are independent, then we obtain the likelihood of the model as:
$$L = \prod_{i=1}^{I} \pi(\mathbf{y}_i \mid \lambda_i) = \prod_{i=1}^{I} \exp\Big( |\mathcal{B}| - \int_{\mathcal{B}} \lambda_i(s)\,\mathrm{d}s \Big) \prod_{y \in \mathbf{y}_i} \lambda_i(y) \qquad (5)$$
where $\lambda_i$ is as defined in Equation (2). Inference can then be achieved through the posterior distribution of the model, which is given, up to a normalising constant, by:
$$\pi\big( \{\beta_j\}, \{\delta_j\} \mid \mathbf{y}_1, \dots, \mathbf{y}_I \big) \propto \Big[ \prod_{i=1}^{I} \pi(\mathbf{y}_i \mid \lambda_i) \Big]\, \pi\big( \{\beta_j\} \big)\, \pi\big( \{\delta_j\} \big) \qquad (6)$$
where $\pi(\{\beta_j\})$ and $\pi(\{\delta_j\})$ are the priors on the functional and scalar parameters, respectively. We discuss the priors below in Section 2.1.
2.1 Posterior approximation
Calculation of the posterior in Equation (6) requires the evaluation of the infinite-dimensional Gaussian processes $\beta_0(s), \dots, \beta_{P_1}(s)$, which we approximate with a finite dimensional distribution. Following Møller et al. (1998) and Benes et al. (2002), we consider the discretisation of the 3D volume with a regular rectangular grid $\mathcal{G}$. We use cubic cells (i.e. voxels) of volume $A = a^3$, where $a$ is the length of the side. In neuroimaging, analysis with 2 mm cubic voxels is typical, leading to a box-shaped grid of about 1 million voxels, of which about 200,000 are in the brain or cerebellum. Note that for simplicity, we consider both grey matter and white matter voxels in our implementations. Voxels are indexed $k = 1, \dots, V$, and the coordinate of voxel $k$ is the location of its centre, $v_k$.
For any $j$, the Gaussian process $\beta_j(s)$ can now be approximated with a step function which is constant within each voxel and equal to the value of $\beta_j$ at the location of the centre, i.e. $\beta_j(s) \approx \beta_j(v_{k(s)})$, where $k(s)$ is the index of the voxel containing $s$. Waagepetersen (2004) shows that the accuracy of this approximation improves as $a$ goes to zero. By definition, the discretised vectors $\boldsymbol{\beta}_j = \big(\beta_j(v_1), \dots, \beta_j(v_V)\big)^\top$ are multivariate Gaussian. We parametrise $\boldsymbol{\beta}_j$ as:
$$\boldsymbol{\beta}_j = \mu_j \mathbf{1}_V + \sigma_j \boldsymbol{\Gamma}_j^{1/2} \boldsymbol{\gamma}_j \qquad (7)$$
where $\mu_j$ are the overall (scalar) means, $\sigma_j$ are the marginal standard deviations, $\boldsymbol{\Gamma}_j$ are the $V \times V$ correlation matrices with elements $\boldsymbol{\Gamma}_j[k,l] = r_j(v_k, v_l)$, and $\boldsymbol{\gamma}_j$ are a priori $N(\mathbf{0}, \mathbf{I}_V)$ vectors. The same parametrisation is used by Møller et al. (1998) and Christensen and Waagepetersen (2002), and is advocated by Christensen et al. (2006) because it allows for computationally efficient posterior simulations. For the purposes of this work we will only consider the case where $\nu_j = 2$, i.e. the Gaussian correlation function. This choice may seem rather simplistic but is justified by the sparsity of points in CBMA data and the ubiquitous use of Gaussian smoothing kernels in neuroimaging data analysis.
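The parametrisation of Equation (7) can be sketched on a toy grid as follows. A dense eigendecomposition is used for the matrix square root here purely for clarity; the paper's implementation replaces this step with the FFT-based circulant embedding of Section 3, and all sizes and values below are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
V = 20

# Gaussian correlation matrix (nu = 2) on a hypothetical 1D grid.
coords = np.arange(float(V))[:, None]
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Gamma = np.exp(-0.3 * dist**2)

# Dense matrix square root via eigendecomposition (illustration only).
w, U = np.linalg.eigh(Gamma)
Gamma_half = U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T

mu_j, sigma_j = -5.0, 1.2                 # illustrative scalar mean and sd
gamma_j = rng.standard_normal(V)          # a priori N(0, I) vector
beta_j = mu_j + sigma_j * (Gamma_half @ gamma_j)   # Equation (7)
```

Sampling the a priori standard normal vector and transforming it is what makes the later HMC updates operate on well-conditioned, uncorrelated coordinates.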
Priors for the $\boldsymbol{\beta}_j$ vectors are induced by the parametrisation of Equation (7). For the remaining parameters we will assume weakly informative priors on the $\mu_j$, $\sigma_j$, $\delta_j$ and $\rho_j$ (note that the prior on $\rho_j$ may not be uninformative for applications with a different scale of distances).
Once the latent Gaussian processes are approximated, one can also approximate $\lambda_i(s)$ with a step function as before. The intensities at the centres of the voxels are given by:
$$\boldsymbol{\lambda}_i[k] = \exp\Big( \sum_{j=0}^{P_1} x_{ij}\,\boldsymbol{\beta}_j[k] + \sum_{j=1}^{P_2} z_{ij}\,\delta_j \Big), \quad k = 1, \dots, V \qquad (8)$$
where $\boldsymbol{\lambda}_i$ is the $V$-vector of the discretised intensity of study $i$. We will write $\boldsymbol{\lambda}_i[k]$ for the $k$-th element of study $i$'s intensity, and note that we require $x_{i0} = 1$ to capture the mean effect. The approximated posterior is:
$$\pi\big( \{\boldsymbol{\gamma}_j\}, \boldsymbol{\theta} \mid \mathbf{y}_1, \dots, \mathbf{y}_I \big) \propto \Bigg[ \prod_{i=1}^{I} \exp\Big( -A \sum_{k=1}^{V} \boldsymbol{\lambda}_i[k] \Big) \prod_{k=1}^{V} \boldsymbol{\lambda}_i[k]^{c_i[k]} \Bigg]\, \pi\big( \{\boldsymbol{\gamma}_j\}, \boldsymbol{\theta} \big) \qquad (9)$$
where $c_i[k] = \sum_{m=1}^{n_i} \mathbb{1}\{k_{im} = k\}$, $\mathbb{1}\{\cdot\}$ takes on the value 1 when its argument holds and 0 otherwise, $k_{im}$ is the index of the voxel containing focus $y_{im}$, and $\pi(\{\boldsymbol{\gamma}_j\}, \boldsymbol{\theta})$ is the joint prior distribution of the parameters, with $\boldsymbol{\theta}$ collecting the scalar parameters. The posterior distribution in Equation (9) is still analytically intractable due to the presence of an unknown normalising constant, and thus we need to resort to Monte Carlo simulation or approximation techniques to obtain samples from it.
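The data term of the discretised posterior in Equation (9) is cheap to evaluate once the foci have been binned into voxels; a minimal sketch (toy sizes, invented values):

```python
import numpy as np

def log_lik_discretised(log_lam, counts, A):
    """Discretised LGCP log-likelihood, up to an additive constant.

    log_lam: (I, V) log-intensities per study and voxel; counts: (I, V)
    numbers of foci of each study falling in each voxel; A: voxel volume.
    Implements the data term of Equation (9):
        sum_i [ sum_k counts[i,k] * log_lam[i,k] - A * sum_k lam[i,k] ].
    """
    return float(np.sum(counts * log_lam) - A * np.exp(log_lam).sum())

# Toy check: 2 studies, 4 voxels, 3 foci in total.
log_lam = np.full((2, 4), -2.0)
counts = np.array([[1, 0, 0, 0], [0, 2, 0, 0]])
ll = log_lik_discretised(log_lam, counts, A=1.0)
```

Working with `log_lam` directly (rather than exponentiating first) is also what the HMC gradient computations of Section 3 require.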
3 Sampling algorithm details
Bayesian methodology for inference on LGCPs can be broadly divided into two main categories: simulation-based approximations of the posterior, such as Markov chain Monte Carlo (Møller et al., 1998) and elliptical slice sampling (Murray et al., 2010), and deterministic approximations to the posterior, such as integrated nested Laplace approximations (Illian et al., 2012a; Simpson et al., 2016, INLA) and variational Bayes (Jaakkola and Jordan, 2000). In a recent study, Taylor and Diggle (2014) compare the Metropolis-adjusted Langevin algorithm (MALA) with INLA and find that both methods give similar results. In our application, we choose to use simulation-based methods because their application to our 3D problem is more straightforward.
Of course, building an algorithm for such a high-dimensional problem is non-trivial. Girolami and Calderhead (2011) showed that, of all possible strategies, their Riemann manifold Hamiltonian Monte Carlo (RMHMC) sampler is the most computationally efficient for LGCPs in a 2D setting. Unfortunately, its application in our problem (a 3D setting) is prohibitive, as it would require the inversion of a huge matrix. Therefore, we choose to use the standard Hamiltonian Monte Carlo (Duane et al., 1987; Neal, 2011, HMC) algorithm, which Girolami and Calderhead (2011) found to be the second most efficient.
HMC initially appeared in the physics literature (Duane et al., 1987) under the name hybrid Monte Carlo, and later emerged in the statistics literature (Neal, 2011). HMC emulates the evolution over time of a particle system which is characterised by its position ($\mathbf{q}$) and momentum ($\mathbf{p}$). In our case, $\mathbf{q}$ will be the parameter vector of interest and $\mathbf{p}$ will be introduced artificially from a $N_d(\mathbf{0}, \mathbf{M})$ distribution, with $d$ being the dimensionality of the problem and $\mathbf{M}$ the mass matrix. The dynamics of the system are described by a set of differential equations, known as Hamilton's equations.
HMC alternates between moves for the position vector and the momentum vector based on Hamilton's equations. If the solutions of the equations can be found analytically then moves will be deterministic; if not, numerical integration is required and an acceptance/rejection step must be performed to account for integration error. Integration is done in fictitious time $L\epsilon$, where $\epsilon$ is the stepsize and $L$ is the number of steps. Typically the leapfrog integrator is employed, which starting at time $t$ is performed as (Neal, 2011):
$$\begin{aligned}
\mathbf{p}(t + \epsilon/2) &= \mathbf{p}(t) + \tfrac{\epsilon}{2}\, \nabla_{\mathbf{q}} \log \pi\big( \mathbf{q}(t) \big) \\
\mathbf{q}(t + \epsilon) &= \mathbf{q}(t) + \epsilon\, \mathbf{M}^{-1} \mathbf{p}(t + \epsilon/2) \\
\mathbf{p}(t + \epsilon) &= \mathbf{p}(t + \epsilon/2) + \tfrac{\epsilon}{2}\, \nabla_{\mathbf{q}} \log \pi\big( \mathbf{q}(t + \epsilon) \big)
\end{aligned} \qquad (10)$$
Overall, if the method is applied correctly, it will produce a time-reversible Markov chain that has the desired distribution as its stationary distribution. As we show in Appendix A, gradient expressions are available in closed form for all model parameters, including the correlation parameters $\rho_j$. We therefore choose to update all parameters jointly with HMC. The solutions to Hamilton's equations are not available analytically, so we need to use the leapfrog integrator and include an accept/reject step at the end of it.
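A compact sketch of one HMC transition using the leapfrog updates of Equation (10) is given below, for a generic log-posterior and its gradient. The standard normal target, step counts and seeds are invented for illustration; the paper applies the same scheme to the LGCP posterior of Equation (9).

```python
import numpy as np

def leapfrog(q, p, grad, eps, n_steps, m_inv):
    """Leapfrog integration of Hamilton's equations (Equation (10))."""
    q, p = q.copy(), p.copy()
    p = p + 0.5 * eps * grad(q)               # initial half step for momentum
    for step in range(n_steps):
        q = q + eps * m_inv * p               # full step for position
        if step < n_steps - 1:
            p = p + eps * grad(q)             # full step for momentum
    p = p + 0.5 * eps * grad(q)               # final half step for momentum
    return q, p

def hmc_step(q, log_post, grad, eps, n_steps, m=1.0, rng=None):
    """One HMC transition: sample momentum, integrate, accept or reject."""
    if rng is None:
        rng = np.random.default_rng()
    p0 = rng.normal(0.0, np.sqrt(m), size=q.shape)
    q1, p1 = leapfrog(q, p0, grad, eps, n_steps, 1.0 / m)
    h0 = -log_post(q) + 0.5 * np.sum(p0**2) / m    # Hamiltonian before
    h1 = -log_post(q1) + 0.5 * np.sum(p1**2) / m   # Hamiltonian after
    return (q1, True) if np.log(rng.uniform()) < h0 - h1 else (q, False)

# Illustrative target: standard normal, log pi(q) = -0.5 * q^2 + const.
log_post = lambda q: -0.5 * np.sum(q**2)
grad = lambda q: -q
rng = np.random.default_rng(2)
q = np.array([3.0])
for _ in range(200):
    q, _accepted = hmc_step(q, log_post, grad, eps=0.2, n_steps=10, rng=rng)
```

The accept/reject step based on the change in the Hamiltonian is what corrects for the leapfrog integration error mentioned above.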
The procedure requires the specification of a stepsize $\epsilon$ and a total number of leapfrog steps $L$. Hoffman and Gelman (2014) show how tuning can be achieved automatically, but when we applied this method to our problem the running time increased substantially. Therefore we use an alternative approach to tune these parameters. The stepsize is automatically adjusted during the burn-in phase of the HMC to give an overall acceptance rate close to the 65% suggested by Neal (2011). In particular, if $\epsilon_s$ is the stepsize at iteration $s$ and $\alpha_s$ is the acceptance rate over the recent iterations, then at regular intervals we calculate the new stepsize as:
(11) 
Specifically we use and . A similar approach is employed by Marshall and Roberts (2012) for MALA. The latter tuning parameter, the number of leapfrog steps $L$, is always held fixed. We took this approach because we found that, for our LGCP application, the mixing properties of the algorithm scale linearly with $L$ but also with the total number of HMC iterations. Hence one can use a relatively large $L$ and few iterations, or a relatively smaller $L$ and more iterations, the total computation time staying roughly constant.
The last tuning parameter in the HMC algorithm is $\mathbf{M}$, the variance-covariance matrix of the zero-mean normal momentum parameters. To our knowledge, there is only limited off-the-shelf methodology on how to adjust $\mathbf{M}$. As a starting place we set $\mathbf{M} = \mathbf{I}$. Neal (1996) suggests that if an estimate of the posterior variance is available then good practice is to set the mass matrix equal to its inverse. In principle, this estimate can be obtained during the burn-in phase of HMC, but in practice this is not possible due to the dimensionality of the problem. In our simulations, we found that the mean posterior variance of the elements of the $\boldsymbol{\gamma}_j$ was higher compared to the scalar parameters. Especially for the correlation decay parameters $\rho_j$, the scale is typically much smaller compared to the other parameters in our applications, and so we work with $\log \rho_j$ instead of $\rho_j$. After the reparametrisation we found that setting the masses of the four parameter blocks equal to 1, 3, 3 and 10, respectively, worked well in all implementations.
The most computationally demanding part of the algorithm is the calculation of the large matrix-vector products appearing in the intensity functions of Equation (8). Luckily, an elegant solution to this problem is given by Møller et al. (1998), based on the circulant embedding first proposed by Dietrich and Newsam (1993) and Wood and Chan (1994). The key to the approach is the linear algebra result that a circulant matrix has the discrete Fourier basis as its eigenvectors. $\boldsymbol{\Gamma}_j$ is not circulant, but it is block-Toeplitz and can be embedded in a larger matrix that is circulant. Thus the matrix square root, inversion and multiplication can be accelerated by taking the (highly efficient) discrete Fourier transform (DFT) of the embedded matrix, manipulating Fourier coefficients, applying the inverse DFT, and extracting the appropriate submatrix/subvector. See Rue and Held (2005) for more details.
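The circulant-embedding trick is easiest to see in 1D, where the correlation matrix on a regular grid is symmetric Toeplitz; the 3D case embeds a block-Toeplitz matrix analogously using 3D FFTs. The function below is an illustrative sketch, not the paper's GPU implementation.

```python
import numpy as np

def toeplitz_matvec_fft(first_col, v):
    """Multiply a symmetric Toeplitz matrix by a vector via circulant embedding.

    first_col: first column of the V x V symmetric Toeplitz matrix.
    The matrix is embedded in a circulant matrix of order 2V - 2 whose
    eigenvalues are the DFT of its first column; the product is computed
    in the Fourier domain and the first V entries are extracted.
    """
    V = len(first_col)
    c = np.concatenate([first_col, first_col[-2:0:-1]])  # circulant 1st column
    v_pad = np.concatenate([v, np.zeros(len(c) - V)])
    out = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v_pad)).real
    return out[:V]

# Check against a dense multiply, with a Gaussian correlation on a 1D grid.
V = 16
col = np.exp(-0.1 * np.arange(V, dtype=float)**2)
T = np.array([[col[abs(i - j)] for j in range(V)] for i in range(V)])
v = np.sin(np.arange(V, dtype=float))
fast = toeplitz_matvec_fft(col, v)
```

This reduces an $O(V^2)$ product to $O(V \log V)$, which is the saving that makes the voxel-level model tractable.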
We close this section by stressing that, despite the massive dimensionality of the parameter vector, the problem has a very high degree of parallelisation. Intensities can be evaluated in blocks of thousands of voxels simultaneously, making the algorithm suitable for implementation on a graphical processing unit (GPU). The most computationally intensive part of our model, namely the operations with DFTs, is also amenable to parallelisation, and there exist libraries such as NVIDIA's cuFFT library that are designed for this specific task. Overall, we believe that implementation of the log-Gaussian Cox process model described above will soon become a routine task for any moderately powerful GPU device.
4 Simulation studies
We consider two simulation setups. In the first we draw samples directly from the log-Gaussian Cox process, whereas in the second we create synthetic studies based on a different model to assess the robustness of our method to model misspecification. For consistency, all processes are defined on the same brain atlas used in the application of Section 5, consisting of cubic voxels. The average number of foci per simulated study is kept low (mean number of foci per study is 5) to resemble the sparsity of points observed in real CBMA data. Finally, the total number of studies is fixed to $I = 200$ in both analyses, similar to the sample sizes available in real applications (Kang et al., 2011, for example).
4.1 Setup 1
In this setting we simulate 200 studies, with two spatially varying covariates that account for the means of two groups of studies, and two non-spatially varying covariates. For each study $i$ we set:
(12) 
Note that this parametrisation of the covariates implies the existence of two types of studies, say type 1 and 2, with different spatially varying means, along with the effect of one continuous and one categorical covariate. The expected total number of foci is 3.99 and 4.16 for studies of type 1 and 2, respectively. We draw the $\boldsymbol{\gamma}_j$ from their prior and fix the values of the scalar parameters as shown in Table 1. We run the HMC algorithm of Section 3 for 10,000 iterations, discarding the first 4,000 as burn-in and saving every 6th iteration, for a total of 1,000 saved posterior samples. This took roughly 14 hours on an NVIDIA Tesla K20c GPU card.
Results are summarised in Table 1 and Figure 1. In Table 1 we see that the scalar parameters are estimated accurately despite the sparsity of points in the realisations. The 95% credible intervals contain the true values of all the parameters in the setup. Some traceplots for these parameters can be found in Section B of the Appendix.
For each type, the median expected number of points is 3.97 (95% CI [3.84, 4.10]) for type 1 and 4.61 (95% CI [4.46, 4.78]) for type 2. These values are very similar to the values we observe in the simulated dataset, that is, 3.98 for type 1 and 4.53 for type 2. This indicates that our model fits the data well. The shape of the latent Gaussian processes is generally captured for both types, as can be seen in Figure 1. In particular, we can see that the maxima in the true and estimated images appear roughly in the same locations. The same cannot be said about the other values, but this is expected given the dearth of information in regions of low intensity.
Parameter    True value    Posterior median    95% credible interval
—            13.7          13.72               [13.99, 13.48]
—            14.2          14.14               [14.47, 13.86]
—            1.2           1.19                [1.01, 1.38]
—            1.6           1.61                [1.43, 1.81]
—            1             0.93                [0.69, 1.27]
—            2             2.30                [1.69, 3.15]
—            2             1.44                [0.22, 2.52]
—            1             0.95                [0.32, 1.65]
4.2 Setup 2
In this setup we create datasets with a pattern of points that follows brain structures of interest. Again there are two types of studies, say type 1 and type 2. For each study $i$, we generate the total number of points from a Negative Binomial distribution. Once we know the exact number of foci per study, we assign the study uniformly at random to one of the two types and then distribute its foci as follows. For type 1, foci appear systematically in the following regions: each focus can be observed in the right amygdala with probability 55%, the orbitofrontal cortex with probability 30%, or anywhere else in the brain with probability 15%. The configuration for type 2 differs in that most of the points will go to the left amygdala instead of the right amygdala. If a focus is assigned to one of the three broad regions, its exact location has a uniform distribution over the region. In the fourth column of Figure 2, the regions in red and blue correspond to the left and right amygdala respectively, while the orbitofrontal cortex is coloured in green.
HMC is run for 10,000 iterations, discarding the first 4,000 as burn-in and saving every 6th iteration to obtain a total of 1,000 samples from the posterior. The run took approximately 15 hours on a Tesla K20c GPU card.
Results are shown in Figure 2, where in the first two columns we see median posterior log-intensities for the two types, in different axial slices. In both cases, we find that the regions with the highest intensities are the amygdalae, and that the orbitofrontal cortex is a region of high posterior intensity as well. The median expected number of points is 5.81 for type 1 (95% CI [5.36, 6.32]) and 6.45 for type 2 (95% CI [5.97, 6.97]). The observed values are 6.27 and 6.73, respectively.
Conditional on there being exactly one focus, we can estimate the probability that this focus appears in any subset $A \subseteq \mathcal{B}$ as $\int_A \lambda(s)\,\mathrm{d}s / \int_{\mathcal{B}} \lambda(s)\,\mathrm{d}s$. Using the posterior draws obtained from the HMC algorithm, we can obtain the posterior distribution of any such quantity. For our simulated type 1 data we find that the median posterior probability of observing a focus in the right amygdala is 0.43 (95% CI [0.40, 0.48]). For type 2, the probability of observing a focus in the left amygdala is 0.42 (95% CI [0.39, 0.46]). For the orbitofrontal cortex, the median posterior probabilities are 0.25 for type 1 and 0.23 for type 2. We therefore see that the model underestimates the probabilities for these regions. This bias can be attributed to the smoothness imposed by our correlation structure, leading to increased intensities just outside these regions as well as in regions where noise foci appear.
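Under the Poisson process, conditional on exactly one focus, the focus location has density proportional to the intensity, so the region probability above is a ratio of integrated intensities and can be computed per posterior draw. A sketch with invented draws and sizes:

```python
import numpy as np

def prob_focus_in_region(lam_draws, region_mask, vol):
    """Posterior draws of P(focus in region | exactly one focus).

    lam_draws: (S, V) posterior intensity draws over the V voxels;
    region_mask: boolean (V,) selecting the region; vol: voxel volume.
    Conditional on one focus, its location has density lam / integral(lam),
    so the probability is a ratio of integrated intensities.
    """
    num = vol * lam_draws[:, region_mask].sum(axis=1)
    den = vol * lam_draws.sum(axis=1)
    return num / den

# Invented example: 3 posterior draws over 10 voxels, region = first 2 voxels.
rng = np.random.default_rng(3)
lam_draws = rng.uniform(0.01, 0.1, size=(3, 10))
mask = np.zeros(10, dtype=bool)
mask[:2] = True
p = prob_focus_in_region(lam_draws, mask, vol=8.0)
```

Summarising `p` by its median and quantiles yields posterior point estimates and credible intervals of the kind reported above.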
An interesting question one may ask is which regions of the brain are activated by one type or the other, but not both. To answer this, one can construct the mean standardised posterior difference map, computed as the ratio of the posterior mean of the difference of the two type-specific log-intensities to the posterior standard deviation of that difference. Extreme negative or positive values are evidence of differences between the two types. We show the difference map in the third column of Figure 2. As we see, the model distinguishes the two types in the amygdala, but the differences are small in the rest of the brain. This is a very interesting feature of the model, especially for applications in CBMA where researchers are sometimes interested in comparing a similar process in a different domain; see for example Rottschy et al. (2012) or Section 5 for an application. An alternative way to do the comparison would be to use the posterior intensity draws to estimate exceedance probabilities of the form $\frac{1}{S} \sum_{s=1}^{S} \mathbb{1}\big\{ \lambda_1^{(s)}[k] - \lambda_2^{(s)}[k] > t \big\}$, where $t$ is a threshold difference, $\lambda_c^{(s)}[k]$ is the posterior intensity for type $c$ in voxel $k$ obtained from the $s$-th iteration of the algorithm, and $S$ is the total number of HMC draws. However, in this approach it is hard to choose the threshold difference $t$.
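The standardised difference map described above is a straightforward voxel-wise computation over the posterior draws; a minimal sketch with invented draws:

```python
import numpy as np

def std_diff_map(log_lam_a, log_lam_b):
    """Mean standardised posterior difference of two log-intensity surfaces.

    log_lam_a, log_lam_b: (S, V) arrays of S posterior draws over V voxels.
    Returns, per voxel, the posterior mean of (a - b) divided by the
    posterior standard deviation of (a - b).
    """
    diff = log_lam_a - log_lam_b
    return diff.mean(axis=0) / diff.std(axis=0, ddof=1)

# Invented draws: type a exceeds type b by about 1 in every voxel.
rng = np.random.default_rng(4)
a = rng.normal(1.0, 0.1, size=(5000, 5))
b = rng.normal(0.0, 0.1, size=(5000, 5))
z = std_diff_map(a, b)
```

Voxels with large |z| are those flagged as differing between the two types in Figures 2 and 6.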
5 Application: meta-analysis of emotion and executive control studies
In this section we apply our model to a real meta-analysis data set.
5.1 Data description
Our dataset consists of 1,193 neuroimaging studies. The studies were conducted between 1985 and 2015 and the average number of participants was 16. Of these, 855 studies are on emotion and the remaining 338 are on executive control. The sample has a total of 10,266 foci, 6,112 from emotion studies (7.15 on average) and 4,154 from executive control studies (12.29 on average). Figure 3 is a graphical representation of the data. We see that even though there is some clustering, the foci are distributed throughout the brain. Our application will focus on identifying the regions of consistent activation across studies and on inferring possible differences between the two types.
5.2 Algorithm details and convergence diagnostics
We use the same parametrisation as in Section 4: we have two spatially varying intercepts, one for emotion and one for executive control. We run the HMC for 15,000 iterations in total, discarding the first 5,000 as burn-in. The total number of leapfrog steps and the initial stepsize are chosen as described in Section 3, and we use a diagonal mass matrix with entries specified as explained there.
Convergence of the HMC chain is assessed visually by inspection of posterior traceplots for the model parameters. We run a total of 3 HMC chains in order to examine whether they all converge to the same values. Posterior traceplots, along with autocorrelation plots, are shown in Section C of the Appendix. Due to the large number of parameters, we focus only on the scalar parameters, as well as on the intensities in the voxels where the highest median posterior values are observed for the two types. Integrated intensities over the entire brain are also examined. Results indicate that our chains have converged to their stationary distribution. This is verified by the fact that posterior values from the 3 different runs overlap with one another for all the quantities that we plot. Finally, it is worth noting that our chains show very good mixing properties, since autocorrelation falls to low values after only a small number of iterations.
5.3 Results
In Figure 4 we plot median posterior intensities for emotion in several axial slices of the brain. Our results look qualitatively similar to the results obtained by Kober et al. (2008), Yue et al. (2012) and Kang et al. (2011) in their meta-analyses of emotion studies. The expected number of foci is 7.15 (95% CI [6.99, 7.33]) and the regions most strongly activated by emotion are the right and the left amygdala, where the peak intensities in Figure 4 appear. Executive control processing generally recruits more regions of the brain, and hence the median expected number of foci is higher, 12.30 (95% CI [11.92, 12.68]). The main effects are localised in the right and left cerebral cortex, as can be seen in Figure 5.
Several quantities of interest can be obtained from our model, based on the properties of the spatial Poisson process. For example, one may calculate the probability of observing at least one focus in a set of voxels, e.g. a region of interest (ROI) or the entire brain. In Figure 7 we show the posterior distribution of $P(N(B) \geq 1) = 1 - \exp\big( -\int_B \lambda(s)\,\mathrm{d}s \big)$, the probability of observing at least one focus in $B$, for several ROIs $B$. The division of the brain into ROIs is done according to the Harvard-Oxford atlas (Desikan et al., 2006). A full brain analysis can be found in Section D of the Appendix. Note that this type of inference cannot be easily obtained from kernel-based methods such as MKDA (Wager et al., 2007) or ALE (Eickhoff et al., 2012), and it is therefore a relative merit of our Bayesian point process model.
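The ROI probability above follows directly from the Poisson property of the discretised process; a sketch with invented draws and sizes:

```python
import numpy as np

def prob_at_least_one_focus(lam_draws, region_mask, vol):
    """Posterior draws of P(N(B) >= 1) for a region B.

    lam_draws: (S, V) posterior intensity draws; region_mask: boolean (V,)
    selecting B; vol: voxel volume.  Uses the Poisson property:
    N(B) | lam ~ Poisson(vol * sum of lam over B), hence
    P(N(B) >= 1) = 1 - exp(-vol * sum of lam over B).
    """
    integral = vol * lam_draws[:, region_mask].sum(axis=1)
    return 1.0 - np.exp(-integral)

# Toy check: constant intensity 0.1 over a 5-voxel region of unit volume.
lam_draws = np.full((2, 8), 0.1)
mask = np.zeros(8, dtype=bool)
mask[:5] = True
p = prob_at_least_one_focus(lam_draws, mask, vol=1.0)
```

Evaluating this for each draw gives the full posterior distribution of the ROI probability, as plotted in Figure 7.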
Comparison of the two types can be made as described in Section 4.2. The mean standardised posterior difference of the two intercepts is shown in Figure 6. Generally, we see that the two types have entirely distinct localisations, and the main differences are observed where the main effects appear, that is, the amygdalae and regions of the cortex including the frontal cortex, dorsolateral prefrontal cortex and parietal cortex.



5.4 Model assessment
Posterior predictive checks (Gelman et al., 1996) can be found in Section E of the Appendix. In particular, we compare first and second order properties of observed data with samples obtained from the posterior predictive distribution. Results indicate some weakness in capturing the second order properties of the data but are overall satisfactory.
6 Discussion
In this work we have presented a new coordinate-based meta-analysis model, an extension of the log-Gaussian Cox process model of Møller et al. (1998). To our knowledge, this is the first application of the LGCP with multiple realisations in a 3D problem. Note that even though our application is focused on neuroimaging, the method is directly applicable to any multi-type point pattern problem.
The model has an appealing interpretation as a spatial GLM, and several interesting inferences can be obtained based on the properties of the spatial Cox process. A significant advantage of our approach compared to existing methods is the inclusion of covariates in the analysis, thus introducing the notion of meta-regression to CBMA. Another very interesting feature of the model is that it allows multi-type comparisons directly from the posterior, without having to run the model several times.
The main weakness of our approach is the large amount of computational effort required. Nevertheless, the proposed HMC algorithm exhibits quick convergence and good mixing properties, and hence fewer samples from the posterior are required. Additionally, implementation of the method on a GPU vastly reduces computation time and thus makes the method applicable to large meta-analysis problems.
Application of the method to a meta-analysis of emotion and executive control studies has given valuable insights into the data. In particular, we have found that the main structures activated by emotion processing are the right and left amygdala, a finding which is consistent with previous studies on the topic. Executive control functions engage more regions than emotion, and the main areas activated are the right and left cerebral cortex. Furthermore, comparison of the two types has allowed the detection of several regions in which they differ significantly.
There are several ways in which our work can be extended. A potential future direction is to study the conditions, such as sample size or minimum number of foci, under which it is possible to estimate several global or spatially varying effects using the LGCP. Such work can be of importance for practical implementations, since it will provide some guidance regarding the complexity of meta-regression models that can be fit to a given dataset. Another option would be to consider a random effects LGCP model where:
(13)  $\lambda_i(\nu) = \exp\left\{ \sum_{j=1}^{p} x_{ij}\,\beta_j(\nu) + \varepsilon_i \right\}$,
where $\varepsilon_i$ are the random effect terms with $\varepsilon_i \sim N(0, \tau^2)$, independently across studies. This extension would account for further variability among the observed point patterns. Gradient expressions are still available in closed form, and hence inference for this model is still feasible by jointly updating the random effect terms along with the remaining parameters of the LGCP.
Another possibility is to use some additional information about the foci, such as $t$ values or $z$ scores. These values can be attached as marks to the existing point patterns in order to improve estimation of the intensity function. Such an approach can enrich the inferences obtained from a meta-analysis by characterising the magnitude of activation in each region, as opposed to the localisation of activations, which is the question that current methods address. Finally, it would be interesting to see how point pattern data can be combined with IBMA data so that a meta-analysis can take advantage of all the available information on a topic. Radua et al. (2012) have already done some work for kernel-based methods, but the problem still needs to be tackled for spatial models as well.
Acknowledgements
The authors would like to thank Derek Nee, John Jonides, David Hayes, Brian T. Gold, Georg Northoff and Raffael Kalisch for their help collecting and sharing the data. PS, TEN & TDJ were supported by NIH grant R01 NS07506601A1; TEN was supported by a Wellcome Trust fellowship 100309/Z/12/Z.
References
 Benes et al. (2002) Benes, V., Bodlák, K., Møller, J., and Waagepetersen, R. P. (2002). Bayesian analysis of log Gaussian Cox processes for disease mapping. Technical report, Department of Mathematical Sciences, Aalborg University.
 Carp (2012) Carp, J. (2012). The secret lives of experiments: Methods reporting in the fMRI literature. NeuroImage, 63(1), 289–300.
 Christensen and Waagepetersen (2002) Christensen, O. F. and Waagepetersen, R. P. (2002). Bayesian prediction of spatial count data using generalized linear mixed models. Biometrics, 58(2), 280–286.
 Christensen et al. (2006) Christensen, O. F., Roberts, G. O., and Sköld, M. (2006). Robust Markov chain Monte Carlo methods for spatial generalized linear mixed models. Journal of Computational and Graphical Statistics, 15(1), 1–17.
 Desikan et al. (2006) Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., Buckner, R. L., Dale, A. M., Maguire, R. P., Hyman, B. T., Albert, M. S., and Killiany, R. J. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980.
 Dietrich and Newsam (1993) Dietrich, C. R. and Newsam, G. N. (1993). A fast and exact method for multidimensional Gaussian stochastic simulations. Water Resources Research, 29(8), 2861–2869.
 Duane et al. (1987) Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216–222.
 Eickhoff et al. (2012) Eickhoff, S. B., Bzdok, D., Laird, A. R., Kurth, F., and Fox, P. T. (2012). Activation likelihood estimation meta-analysis revisited. NeuroImage, 59(3), 2349–2361.
 Gelman et al. (1996) Gelman, A., Meng, X.L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733–807.
 Girolami and Calderhead (2011) Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 73(2), 123–214.
 Greenland (1994) Greenland, S. (1994). Invited commentary: a critical look at some popular meta-analytic methods. American Journal of Epidemiology, 140(3), 290–296.
 Hartung et al. (2008) Hartung, J., Knapp, G., and Sinha, B. K. (2008). Statistical Meta-Analysis with Applications. John Wiley & Sons, Hoboken.
 Hoffman and Gelman (2014) Hoffman, M. and Gelman, A. (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.
 Illian et al. (2009) Illian, J. B., Møller, J., and Waagepetersen, R. P. (2009). Hierarchical spatial point process analysis for a plant community with high biodiversity. Environmental and Ecological Statistics, 16(3), 389–405.
 Illian et al. (2012a) Illian, J. B., Sørbye, S. H., and Rue, H. (2012a). A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA). The Annals of Applied Statistics, 6(4), 1499–1530.
 Illian et al. (2012b) Illian, J. B., Sørbye, S. H., Rue, H., and Hendrichsen, D. K. (2012b). Using INLA to fit a complex point process model with temporally varying effects – A case study. Journal of Environmental Statistics, 3(7), 1–25.
 Jaakkola and Jordan (2000) Jaakkola, T. and Jordan, M. (2000). Bayesian parameter estimation via variational methods. Statistics and Computing, 10(1), 25–37.
 Kang et al. (2011) Kang, J., Johnson, T. D., Nichols, T. E., and Wager, T. D. (2011). Meta-analysis of functional neuroimaging data via Bayesian spatial point processes. Journal of the American Statistical Association, 106(493), 124–134.
 Kober et al. (2008) Kober, H., Barrett, L. F., Joseph, J., Bliss-Moreau, E., Lindquist, K., and Wager, T. D. (2008). Functional grouping and cortical and subcortical interactions in emotion: a meta-analysis of neuroimaging studies. NeuroImage, 42(2), 998–1031.
 Liang et al. (2009) Liang, S., Carlin, B. P., and Gelfand, A. E. (2009). Analysis of Minnesota colon and rectum cancer point patterns with spatial and nonspatial covariate information. The Annals of Applied Statistics, 3(3), 943–962.
 Lindquist et al. (2012) Lindquist, K. A., Wager, T. D., Kober, H., Bliss-Moreau, E., and Barrett, L. F. (2012). The brain basis of emotion: a meta-analytic review. Behavioral and Brain Sciences, 35(3), 121–143.
 Lindquist (2008) Lindquist, M. A. (2008). The statistical analysis of fMRI data. Statistical Science, 23(4), 439–464.
 Marshall and Roberts (2012) Marshall, T. and Roberts, G. (2012). An adaptive approach to Langevin MCMC. Statistics and Computing, 22(5), 1041–1057.
 Møller and Waagepetersen (2003) Møller, J. and Waagepetersen, R. P. (2003). An introduction to simulation-based inference for spatial point processes. In J. Møller, editor, Spatial Statistics and Computational Methods, chapter 4, pages 143–198. Springer-Verlag.
 Møller and Waagepetersen (2004) Møller, J. and Waagepetersen, R. P. (2004). Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall/CRC, Boca Raton.
 Møller and Waagepetersen (2007) Møller, J. and Waagepetersen, R. P. (2007). Modern statistics for spatial point processes. Scandinavian Journal of Statistics, 34(4), 643–684.
 Møller et al. (1998) Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998). Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3), 451–482.
 Murray et al. (2010) Murray, I., Adams, R. P., and MacKay, D. J. (2010). Elliptical slice sampling. Journal of Machine Learning Research: Workshop and Conference Proceedings, 9(6), 541–548.
 Neal (1996) Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer-Verlag New York, Inc., Secaucus, NJ, USA.
 Neal (2011) Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L. Jones, and X. Meng, editors, Handbook of Markov Chain Monte Carlo, chapter 5, pages 113–162. Chapman & Hall/CRC.
 Poldrack et al. (2011) Poldrack, R. A., Mumford, J. A., and Nichols, T. (2011). Handbook of Functional MRI Data Analysis. Cambridge University Press.
 Radua and Mataix-Cols (2009) Radua, J. and Mataix-Cols, D. (2009). Voxel-wise meta-analysis of grey matter changes in obsessive-compulsive disorder. The British Journal of Psychiatry, 195(5), 393–402.
 Radua et al. (2012) Radua, J., Mataix-Cols, D., Phillips, M. L., El-Hage, W., Kronhaus, D. M., Cardoner, N., and Surguladze, S. (2012). A new meta-analytic method for neuroimaging studies that combines reported peak coordinates and statistical parametric maps. European Psychiatry, 27(8), 605–611.
 Raemaekers et al. (2007) Raemaekers, M., Vink, M., Zandbelt, B., van Wezel, R., Kahn, R., and Ramsey, N. (2007). Test-retest reliability of fMRI activation during prosaccades and antisaccades. NeuroImage, 36(3), 532–542.
 Raichle (2003) Raichle, M. E. (2003). Functional brain imaging and human brain function. Journal of Neuroscience, 23(10), 3959–3962.
 Rasmussen and Williams (2005) Rasmussen, C. E. and Williams, C. K. I. (2005). Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.
 Rottschy et al. (2012) Rottschy, C., Langner, R., Dogan, I., Reetz, K., Laird, A. R., Schulz, J. B., Fox, P. T., and Eickhoff, S. B. (2012). Modelling neural correlates of working memory: a coordinate-based meta-analysis. NeuroImage, 60(1), 830–846.
 Rue and Held (2005) Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis.
 Salimi-Khorshidi et al. (2009) Salimi-Khorshidi, G., Smith, S. M., Keltner, J. R., Wager, T. D., and Nichols, T. E. (2009). Meta-analysis of neuroimaging data: a comparison of image-based and coordinate-based pooling of studies. NeuroImage, 45(3), 810–823.
 Simpson et al. (2016) Simpson, D., Illian, J., Lindgren, F., Sørbye, S., and Rue, H. (2016). Going off grid: computationally efficient inference for log-Gaussian Cox processes. Biometrika, 103(1), 49–70.
 Taylor and Diggle (2014) Taylor, B. M. and Diggle, P. J. (2014). INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. Journal of Statistical Computation and Simulation, 84(10), 2266–2284.
 Turkeltaub et al. (2002) Turkeltaub, P. E., Eden, G. F., Jones, K. M., and Zeffiro, T. A. (2002). Meta-analysis of the functional neuroanatomy of single-word reading: method and validation. NeuroImage, 16(3, Part A), 765–780.
 Waagepetersen (2004) Waagepetersen, R. P. (2004). Convergence of posteriors for discretized log Gaussian Cox processes. Statistics and Probability Letters, 66(3), 229–235.
 Wager et al. (2004) Wager, T. D., Jonides, J., and Reading, S. (2004). Neuroimaging studies of shifting attention: a meta-analysis. NeuroImage, 22(4), 1679–1693.
 Wager et al. (2007) Wager, T. D., Lindquist, M., and Kaplan, L. (2007). Meta-analysis of functional neuroimaging data: current and future directions. Social Cognitive and Affective Neuroscience, 2(2), 150–158.
 Wager et al. (2009) Wager, T. D., Lindquist, M. A., Nichols, T. E., Kober, H., and Van Snellenberg, J. X. (2009). Evaluating the consistency and specificity of neuroimaging data using meta-analysis. NeuroImage, 45(Supplement 1), S210–S221.
 Wood and Chan (1994) Wood, A. T. A. and Chan, G. (1994). Simulation of stationary Gaussian processes in $[0, 1]^d$. Journal of Computational and Graphical Statistics, 3(4), 409–432.
 Yue et al. (2012) Yue, Y. R., Lindquist, M. A., and Loh, J. M. (2012). Meta-analysis of functional neuroimaging data using Bayesian nonparametric binary regression. The Annals of Applied Statistics, 6(2), 697–718.
Appendix A Gradient expressions for the LGCP
The log-posterior, up to a normalising constant, is given by:
(14)  $\log \pi(\theta \mid y) \propto \sum_{i=1}^{I} \sum_{k=1}^{n} \left\{ y_{ik} \log \lambda_i(\nu_k) - A\,\lambda_i(\nu_k) \right\} + \log p(\theta)$,
where $\theta = (\mu, \sigma, \rho, \gamma_1, \dots, \gamma_p)$, $y_{ik}$ is the number of foci of study $i$ falling in voxel $\nu_k$, $A$ is the volume of a voxel, and the intensity function at each voxel is defined as:
(15)  $\lambda_i(\nu_k) = \exp\left\{ \sum_{j=1}^{p} x_{ij} \left( \mu_j + \sigma_j (Q_j \gamma_j)_k \right) \right\}$,
where $x_{ij}$ is the $j$th covariate of study $i$, $\mu_j$ and $\sigma_j$ are the mean and marginal standard deviation of the $j$th Gaussian process, $Q_j$ is the square root of its correlation matrix, with parameter $\rho_j$, and $\gamma_j \sim N(0, I_n)$.
We now calculate the derivatives with respect to the parameters of interest.
Partial derivatives with respect to $\mu_j$
We have that:
(16)  $\dfrac{\partial \lambda_i(\nu_k)}{\partial \mu_j} = x_{ij}\,\lambda_i(\nu_k)$.
As a result:
(17)  $\dfrac{\partial \log \pi}{\partial \mu_j} = \sum_{i=1}^{I} x_{ij} \left( n_i - A \sum_{k=1}^{n} \lambda_i(\nu_k) \right) + \dfrac{\partial \log p(\mu_j)}{\partial \mu_j}$,
where $n_i = \sum_k y_{ik}$ is the total number of foci in study $i$.
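Gradients of this form are easy to validate numerically. The sketch below checks the mean-gradient of a discretised Poisson (LGCP-style) log-likelihood against a central finite difference, for a single intercept with the latent field held fixed; all quantities are illustrative stand-ins, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vox, n_stud, A = 50, 4, 0.5               # voxels, studies, voxel volume
gamma = rng.normal(0.0, 0.3, n_vox)         # fixed latent field (illustrative)
y = rng.poisson(0.2, size=(n_stud, n_vox))  # observed voxel counts

def loglik(mu):
    """Discretised log-likelihood with intensity exp(mu + gamma) per study."""
    lam = np.exp(mu + gamma)
    return np.sum(y * (mu + gamma)) - A * n_stud * lam.sum()

# Analytic gradient: total foci count minus the integrated intensity.
mu0 = -1.0
grad = y.sum() - A * n_stud * np.exp(mu0 + gamma).sum()

# Central finite difference for comparison.
h = 1e-6
fd = (loglik(mu0 + h) - loglik(mu0 - h)) / (2 * h)
print(grad, fd)
```

The same check extends voxel-by-voxel to the latent-field and hyperparameter gradients derived below.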
Partial derivatives with respect to $\sigma_j$
We have that:
(18)  $\dfrac{\partial \lambda_i(\nu_k)}{\partial \sigma_j} = x_{ij}\,(Q_j \gamma_j)_k\,\lambda_i(\nu_k)$.
Therefore:
$\dfrac{\partial \log \pi}{\partial \sigma_j} = \sum_{i=1}^{I} x_{ij} \sum_{k=1}^{n} \left( y_{ik} - A\,\lambda_i(\nu_k) \right) (Q_j \gamma_j)_k + \dfrac{\partial \log p(\sigma_j)}{\partial \sigma_j}$.
Partial derivatives with respect to $\rho_j$
Again:
(19)  $\dfrac{\partial \lambda_i(\nu_k)}{\partial \rho_j} = x_{ij}\,\sigma_j \left( \dfrac{\partial Q_j \gamma_j}{\partial \rho_j} \right)_k \lambda_i(\nu_k)$.
For ease of exposition we will complete the derivation for the one-dimensional case; however, similar arguments hold in three dimensions. The matrices $Q_j$ are circulant and so the matrix–vector product $Q_j \gamma_j$ can be found using the discrete Fourier transform as:
(20)  $Q_j \gamma_j = F \Lambda_j^{1/2} F^{H} \gamma_j$,
where $\Lambda_j$ are the diagonal matrices containing the eigenvalues of the correlation matrices and $F$ is the matrix of eigenvectors, that is, the DFT matrix. In Equation (20), the only term depending on $\rho_j$ is $\Lambda_j$ and, hence:
(21)  $\dfrac{\partial Q_j \gamma_j}{\partial \rho_j} = F\, \dfrac{\partial \Lambda_j^{1/2}}{\partial \rho_j}\, F^{H} \gamma_j$.
We know that $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, where for $k = 1, \dots, n$ we have that:
(22)  $\lambda_k = \sum_{m=0}^{n-1} c_m \exp\left\{ -\dfrac{2\pi \iota\, m (k-1)}{n} \right\}$,
where $c = (c_0, \dots, c_{n-1})$ is the base of the circulant matrix and $\iota$ is the imaginary unit. Now it is straightforward to see that for $k = 1, \dots, n$:
(23)  $\dfrac{\partial \lambda_k}{\partial \rho} = \sum_{m=0}^{n-1} \dfrac{\partial c_m}{\partial \rho} \exp\left\{ -\dfrac{2\pi \iota\, m (k-1)}{n} \right\} = \tilde{\lambda}_k$,
where $\tilde{\lambda}_k$ can be viewed as the $k$th eigenvalue of a circulant matrix with base $\tilde{c}$ and $\tilde{c}_m = \partial c_m / \partial \rho$, $m = 0, \dots, n-1$. Overall we see that:
(24)  $\dfrac{\partial \Lambda^{1/2}}{\partial \rho} = \dfrac{1}{2} \operatorname{diag}\left( \tilde{\lambda} \oslash \lambda^{1/2} \right)$,
where $\oslash$ stands for element-wise division. Combining Equations (19) and (24), we find that:
(25)  $\dfrac{\partial \lambda_i(\nu_k)}{\partial \rho_j} = x_{ij}\,\dfrac{\sigma_j}{2} \left( F \operatorname{diag}\left( \tilde{\lambda}_j \oslash \lambda_j^{1/2} \right) F^{H} \gamma_j \right)_k \lambda_i(\nu_k)$.
So:
$\dfrac{\partial \log \pi}{\partial \rho_j} = \dfrac{\sigma_j}{2} \sum_{i=1}^{I} x_{ij} \sum_{k=1}^{n} \left( y_{ik} - A\,\lambda_i(\nu_k) \right) \left( F \operatorname{diag}\left( \tilde{\lambda}_j \oslash \lambda_j^{1/2} \right) F^{H} \gamma_j \right)_k + \dfrac{\partial \log p(\rho_j)}{\partial \rho_j}$.
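The derivation above rests on the fact that circulant matrices are diagonalised by the DFT. The following sketch verifies, for a small one-dimensional example with an arbitrary (illustrative) exponential base, that the eigenvalues of a circulant matrix are the FFT of its base and that the matrix–vector product costs two FFTs instead of a dense multiply.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
# Base of the circulant matrix: an exponential correlation on the circle
# (symmetric in the index, so the matrix is symmetric and the FFT is real).
d = np.minimum(np.arange(n), n - np.arange(n))
c = np.exp(-0.3 * d)

# Dense circulant matrix with first column c: C[j, k] = c[(j - k) mod n].
C = np.array([[c[(j - k) % n] for k in range(n)] for j in range(n)])
v = rng.standard_normal(n)

# Eigenvalues of a circulant matrix are the DFT of its base ...
eig_fft = np.fft.fft(c)
# ... so the matrix-vector product is a circular convolution via two FFTs.
Cv_fft = np.fft.ifft(eig_fft * np.fft.fft(v)).real

print(np.max(np.abs(C @ v - Cv_fft)))  # agreement up to floating point
```

The same identity, applied to the base of $\partial c / \partial \rho$, is what makes the eigenvalue derivatives in (23) computable by a single additional FFT.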
Partial derivatives with respect to $\gamma_j$
Finally:
(26)  $\dfrac{\partial \lambda_i(\nu_k)}{\partial \gamma_{jm}} = x_{ij}\,\sigma_j \left( F_k \Lambda_j^{1/2} F^{H} \right)_m \lambda_i(\nu_k)$,
where $F_k$ is the $k$th row of the matrix $F$. Now we can see that:
(27)  $\nabla_{\gamma_j} \log \pi = \sigma_j F \Lambda_j^{1/2} F^{H} w_j - \gamma_j$,
since $Q_j$ is a nested block circulant matrix and therefore symmetric, where the $w_j$ are vectors with elements $w_{jk} = \sum_{i=1}^{I} x_{ij} \left( y_{ik} - A\,\lambda_i(\nu_k) \right)$, and the term $-\gamma_j$ arises from the standard normal prior on $\gamma_j$.
Appendix B LGCP simulation setup I traceplots
In this section we provide traceplots for the scalar parameters of the LGCP model, as fit to the simulated data of Section 4.1. Trace plots for the parameters