Parameter clustering in Bayesian functional PCA of fMRI data

Parameter clustering in Bayesian functional PCA of fMRI data

NICOLÒ MARGARITELLA, VANDA INÁCIO DE CARVALHO, RUTH KING
University of Edinburgh, School of Mathematics, James Clerk Maxwell Building, The King’s Buildings, Peter Guthrie Tait Road, Edinburgh, UK
N.Margaritella@sms.ed.ac.uk
Abstract

The extraordinary advancements in neuroscientific technology for brain recordings over the last decades have led to increasingly complex spatio-temporal datasets. To reduce oversimplifications, new models have been developed to be able to identify meaningful patterns and new insights within a highly demanding data environment. To this extent, we propose a new model that merges ideas from Functional Data Analysis and Bayesian nonparametrics to obtain a flexible and computationally feasible exploration of spatio-temporal neuroscientific data. In particular, we make use of a Dirichlet process Gaussian mixture model to cluster functional Principal Component (fPC) scores within the standard Bayesian functional Principal Component Analysis (fPCA) framework; this allows us to capture the structure of spatial dependence among smoothed time series (curves) and its interaction with the time domain. Moreover, by moving the mixture from data to fPC scores, we obtain a more general clustering procedure, thus allowing much finer curve classification and higher level of intricate insight and understanding of the data. We present results from a Monte Carlo simulation study showing improvements in curves and correlation reconstruction compared with the standard Bayesian fPCA model before applying our method to a resting-state fMRI data analysis providing a rich exploration of the spatio-temporal dependence in brain time series that offer further insights into the underlying neurophysiological processes. Bayesian hierarchical model; Clustering; Dirichlet process; Functional data analysis; Neuroscientific data; Spatio-temporal data.

\@sect

section1[Introduction]Introduction

Several tools for the recording of different brain processes, such as functional Magnetic Resonance Imaging (fMRI) and Electroencephalogram (EEG) produce remarkable amounts of spatio-temporal data which challenge researchers to find suitable models for increasingly complex datasets. Consequently, the last decade has seen a marked increase in flexible methods for high dimensional data. Functional Data Analysis (FDA) is a fairly recent research field in statistics concerned with the analysis of data providing information about curves, shapes and images which vary over a continuum, usually time or space (see Ramsay and Silverman (2005) for an overview). In the FDA framework, data can be considered as noise-corrupted, discretised realisations of underlying smooth functions (curves or trajectories) which are recovered using basis expansions and smoothing (Ramsay and others, 2009). Many standard statistical tools have been translated into the FDA framework. Functional PCA (fPCA) is a technique that defines a set of smooth trajectories as an expansion of orthonormal bases (eigenfunctions) and weights which are called functional Principal Component scores (fPC scores, Ramsay and Silverman, 2005, Section 7). One of the advantages of fPCA is that it can be conveniently represented as a hierarchical mixed model in the Bayesian setting, with the joint posterior distribution of the fPC scores being the main target of inference (Crainiceanu and Goldsmith, 2010).
There has been a growing interest in applying FDA to neuroscientific data (see, among others, Hasenstab and others, 2017; Tian, 2010; Viviani and others, 2005). Often, in the FDA literature, underlying random curves are assumed to be independent and their correlation is ignored if believed to be mild (Liu and others, 2017). However, curve dependence is of particular importance in the analysis of brain activity because of the complex architecture of spatio-temporal connections between brain areas (Wolfson, 2018). Recently, Liu and others (2017) considered spatial dependence among trajectories by modelling the covariance of the fPC scores within a frequentist approach. Their results showed significant improvements in curves reconstruction compared to the standard approach assuming independence, especially with low signal-to-noise ratios.
The present study introduces a new method for the analysis of functional data in neuroscience. We develop a modified version of the Bayesian fPCA model (Crainiceanu and Goldsmith, 2010) that makes use of a Dirichlet Process (DP) mixture (Rasmussen, 2000; Neal, 2000; Escobar and West, 1995) to model the prior distribution of the fPC scores. Different functional mixture models that cluster functions through clustering of the coefficients in a basis expansion have been proposed in the literature (Angelini and others, 2012; James and Sugar, 2003; Ray and Mallick, 2006; Zhou and Wakefield, 2006). In our work we use the principal component bases for their straightforward interpretation and employ independent DP mixture priors for every eigendimension retained. By allowing a different partition for each mode of variation, we avoid the limitations of assuming separability of the cross-covariance, obtaining further insights from space-time interactions. We show that our approach has multiple advantages in the analysis of neuroscientific data as it improves curve reconstruction thanks to the local borrowing of information; it offers further insights into the spatio-temporal structure of the data as a result of dimension-specific curve classification; and it can be defined as a simple and computationally feasible hierarchical model which can be easily implemented in R.
The rest of the paper is structured as follows: in Section Parameter clustering in Bayesian functional PCA of fMRI data we overview the standard Bayesian fPCA model and introduce the new method, along with computational details. Section Parameter clustering in Bayesian functional PCA of fMRI data reports the setting and results of a Monte Carlo simulation study where we compare the performance of the new model with the standard Bayesian fPCA approach under different noise levels. In all scenarios our model showed improvements in both curve reconstruction and their correlation, especially in the case of low signal-to-noise ratio. Section Parameter clustering in Bayesian functional PCA of fMRI data addresses the application of our method to a resting-state fMRI dataset recorded from a healthy subject and we discuss the further insights obtained in the spatio-temporal structure of the data and the underlying neurophysiological processes. Conclusions are drawn in Section Parameter clustering in Bayesian functional PCA of fMRI data.

\@sect

section1[Methods]Methods

\@sect

subsection2[Functional Bayesian PCA]Functional Bayesian PCA

The standard FDA model is given by

(0.0)

where denote the noise-corrupted, discretised observed data for every trajectory and time point ; the associated underlying random curve and the noise term with zero mean and precision .
Functional PCA assumes that the process can be represented by the Karhunen-Loève expansion so that each realisation takes the form

(0.0)

where represents a population average and the infinite sum is a linear combination of orthonormal eigenfunctions , which are usually assumed to be observed, and fPC scores , which are the main goal of inference. In practice, only terms of the linear expansion are retained pertaining to those that explain a sufficiently large part of the total variability in the data (Sørensen and others, 2013). Often the case is assumed and the centered data are obtained by subtracting an estimate of the population average (Crainiceanu and Goldsmith, 2010).
The fPC scores are given prior probability distributions in the Bayesian framework. The standard Bayesian fPCA model (Crainiceanu and Goldsmith, 2010) assumes fPC scores to be independent draws from a univariate zero-centred normal distribution whose variance is dependent on the eigendimension . The most straightforward hierarchical representation of the standard Bayesian fPCA model is

(0.0)

with usually set to low values (e.g. ). In this model the noise term is assumed to be Gaussian and independent gamma priors are placed over the precision parameters because of their conjugacy property, permitting closed-form conditional posterior distributions and the use of Gibbs sampling. However, alternative prior distributions for the precision parameter have been proposed (Gelman, 2006).
Recently, Liu and others (2017) proposed to capture spatial dependence through a suitable model for the covariance of fPC scores. In particular, they defined as a function of the correlation coefficient which they modelled using the Matérn function family and estimating the relative parameters. This approach implies separability between temporal and spatial components as well as a covariance structure that depends on the distance between observations; such assumptions might not be suitable for complex spatio-temporal phenomena such as brain activity where dependencies are the result of both structural and functional neuronal pathways as well of task-specific characteristics. In this study, we overcome these limitations to achieve a higher level of flexibility in the modelling of spatio-temporal covariance of neuroscientific data.

\@sect

subsection2[Proposed model]Proposed model In this section we present the the structure of the proposed model and the features of this approach that improve the current methods for functional PCA. The following hierarchical model defines the probability distribution generating observed time series. We present and comment each level separately.
Level 1: As the standard Bayesian fPCA model in Equation (Parameter clustering in Bayesian functional PCA of fMRI data), the distribution of the centred data given the parameters of the underlying smooth function and the noise term is given by:

(0.0)

where and are T-dimensional vectors and is the probability density function of a multivariate Gaussian random variable with mean and variance-covariance matrix such that I denotes the identity matrix. Here we assume constant noise for simplicity although other characterisations are possible (Wang and others, 2016). It follows that the likelihood function is given by

(0.0)

Level 2: To encode fPC scores cluster membership we introduce a classification variable as a stochastic indicator that identifies which latent class is associated with parameter . Prior distributions of the fPC scores , given the parameters of underlying clusters and the classification variable , are given by

(0.0)

where and are mean and precision for the -th cluster in the -th eigendimension, respectively. Here we use a dimensional mixture of Gaussian distributions, independently for each retained eigendimension as we admit different (independent) partitions for each mode of variation.
Level 3: Prior distributions for and , given hyperparameters and parameters , are given by

(0.0)

where is the probability mass function of a categorical random variable which is the generalisation of a Bernoulli random variable to outcomes. Cluster precision can be modelled also by using Uniform distributions on the cluster standard deviation .
Hyperparameters and are often centred around empirical estimates in the literature (Richardson and Green, 1997); here, we take advantage of the properties of fPCA decomposition to tune the higher hierarchical levels in our model around weakly informative prior distributions. It follows from the Karhunen-Loève representation that, for any given , are uncorrelated fPC scores with monotonically decreasing variance given by the eigenvalues (Liu and others, 2017); therefore, sensible functions of the empirical estimates of the eigenvalues can be used to fix and under the assumption that, for every eigendimension , the position and dispersion of a cluster are both functions of . We note that setting and appeared to work well in our applications.
Level 4 and 5: Prior distribution for , given hyperparameter and prior distribution for are given by

(0.0)

where Dir is a symmetric Dirichlet distribution with parameter that models the prior belief over the mixing proportions . The dispersion parameter is usually fixed or modelled with a prior distribution; here we used the uniform distribution with a sufficiently large (Rasmussen, 2000; Medvedovic and Sivaganesan, 2002; Medvedovic and others, 2004; De Iorio and others, 2018).
Different tuning of and can be employed for and to incorporate the knowledge that the first eigendimension is more likely to capture global patterns in the data while the following dimensions are more sensitive to local features. We provide specific examples in Section Parameter clustering in Bayesian functional PCA of fMRI data.
The model structure can be displayed with a direct acyclic graph (DAG) (Figure 1). As approaches infinity the model corresponds to a DP mixture model (Rasmussen, 2000; Medvedovic and Sivaganesan, 2002; Medvedovic and others, 2004; McDowell and others, 2018) with the difference that we have placed here multiple independent mixtures over the prior distribution of the fPC scores. In practice we used the truncated stick-breaking construction and tested the model with different values of . All the conditional posteriors of this model (most of them available in closed form) are provided in Appendix A. Markov Chain Monte Carlo (MCMC) techniques are used to simulate from the joint posterior distribution of all parameters given the data. Reconstruction of the smooth trajectories is made easy by its linear relationship with the model parameters ; thus it is possible to obtain the posterior distribution of the -th curve for every and at every MCMC iteration ,

(0.0)

where is the smoothed estimate of the sample mean. It follows that symmetric point-wise credible intervals for each trajectory-specific mean can be obtained easily from Equation 0.0 by considering the and quantiles of the empirical distribution.

\@sect

subsection2[Clustering]Clustering

In this section we focus on the clustering of fPC scores. DP mixture models implicitly produce classification through the allocation of each data point to a generating distribution with some probability. Clustering uncertainty can be evaluated at different levels such as the number of clusters, the size of each cluster and the observations (or parameters) assigned to them. Although a single partition is often of interest in applications, we note that exploring the cluster uncertainty can result in a better interpretation of the findings. In order to explore cluster uncertainty, we employ the empirical distribution of generated clusterings which can be considered a good approximation of the true posterior distribution after burn-in iterations (Neal, 2000). We use functions of to obtain other distributions of interest, such as the number and size of clusters, as well as to compute Maximum a Posteriori Probabilities (MAPs) and pairwise probability matrices. MAPs are commonly used to identify the most probable clustering for each observation while pair-wise probability matrices represent the posterior belief for all pair of curves to belong to the same cluster (Medvedovic and Sivaganesan, 2002; Medvedovic and others, 2004; McDowell and others, 2018). Each of these distributions has some limitations. Simply considering the number of clusters does not account for size and stability (i.e. the number of times a cluster appears in the MCMC chain); however, we find it useful to test the presence of more than one cluster using the Bayes factor. The size of clusters informs on the most relevant groups, though it does not guarantee that the observations joining a cluster remain loyal to it. In addition, MAPs are known to be limited by the possible presence of multiple modes and cases where individuals who share the same modal group are less frequently together than with others in different clusters. These issues can be addressed by the pairwise probability matrices but they can be hard to interpret, especially for large and high cluster uncertainty.
Although we find each analysis by itself inadequate to draw robust conclusions, considering them together as a whole provides rich information on the (a posteriori) most likely partition for each eigendimension. We present an application of these analyses to fMRI data in Section Parameter clustering in Bayesian functional PCA of fMRI data.

\@sect

subsection2[fPC scores clustering as generalisation of standard clustering]fPC scores clustering as generalisation of standard clustering

In the standard infinite mixture model based clustering, the indicators would associate a couple of trajectories to a certain cluster with probability . On the other hand, by placing infinite mixtures over the fPC scores for every eigendimension retained, we allow for a more complex network of dependence among curves. In our model, and would associate fPC scores and to potentially different clusters in every eigendimension with probability . It follows that a pair of curves could happen to share the same cluster in only part of the eigendimension retained, expanding the standard model based clustering to a richer classification method. Furthermore, as each dimension represents a mode of variation (eigenfunction) and its importance (eigenvalue), our method offers additional insights into the underlying spatio-temporal structure of the data. In the following sections we show how clustering fPC scores produces a rich spatio-temporal exploration of complex neuroscientific data.

\@sect

section1[Simulation study]Simulation study

\@sect

subsection2[Simulation scenarios]Simulation scenarios

We performed a Monte Carlo (MC) simulation study to assess the performance of the proposed model and compare it to the standard Bayesian fPCA model in terms of both curves reconstruction and classification. In order to test model performance with simulated data matching those of the targeted neuroscientific applications as closely as possible, we used curves resembling evoked responses in the brain.
We generated two groups (Group 1 and 2) of length time series for a total of curves. Group 1 was composed of trajectories representing active brain areas, while Group 2 represented all other brain regions. Within Group 1, two different sub-patterns (1.1 and 1.2) represented meaningful differences in the way brain areas are activated by the stimulus. We applied a random Gaussian noise and tested the models with both high and low signal-to-noise ratios (STN). Figure 2 shows an example from the set of generated curves where either a low or high random noise is added.
One hundred datasets () for each STN were input to fPCA first for curve smoothing using cubic B-splines and dimension reduction by estimating the respective eigenvalues and eigenfunctions. We retained a number of dimensions explaining at least of the total variability in curves. Figure 2 shows eigenfunctions and their weights extracted after smoothing a set of low-noise curves. The first eigenfunction explains more than of the overall variability and captures the main peak in Group 1 representing the general response to the stimulus. The second eigenfunction explains about of the total variability and represents peaks that feature the subset of curves in group 1.1 with a slightly different response than group 1.2.
We adapted the general model presented in Section Parameter clustering in Bayesian functional PCA of fMRI data to the specific simulation analysis using eigenvalues and their properties to develop vaguely informative prior distributions for the parameters and (Equations (0.0) and (0.0)) in the two eigendimensions retained .
We set and as well as setting and . We made sure that even the smallest upper-bound of the dispersion parameter distribution represented an expected number of clusters a priori far higher than the ground truth (expected mean 14, variance 9) (Escobar, 1994; Jara and others, 2007). A similar choice for was specified by De Iorio and others (2018) due to the resulting stable computations.
We coded the model in R using the rjags package (Plummer, 2013); we employed a conservative approach (Medvedovic and Sivaganesan, 2002; Gelman and others, 2013) using iterations for the burn-in and retaining the subsequent MCMC iterations. This ensured good convergence results for all the parameters of interest. We used a thinning of 5 to store efficiently results from 100 simulated datasets (approximately 70 MB each with K=2). It takes 36 minutes on average (2.9 SD) to complete one simulation run on a 2-core Intel CPU running at 2.7 GHz.
We used Integrated Mean Square Error (IMSE) to measure and compare reconstruction performance between the new model and the standard Bayesian fPCA model. IMSE and its associated MC approximation for every curve are given by

(0.0)

where the expectation is taken with respect to the underlying curve . The IMSE is a useful measure of performance in density estimation and is frequently used in curves reconstruction (Gentle, 2009; Rasheed and Aref, 2016). In addition, as curves correlation is often of interest in neuroscientific applications (e.g. for measuring the degree of functional connectivity between brain areas), we measured correlations reconstruction using the L2 norm and compared it with the same distance computed using the estimates of the standard Bayesian fPCA model.
In order to assess clustering performance we adopted the Adjusted Rand Index (ARI) to quantify the similarity between the estimated partitions (using MAP) and the ground truth for every MC dataset and eigendimension . The ARI is commonly used in literature to compare different clustering methods as it varies between exact partitions agreement (1) and when partitions agree no more than is expected by chance (0) (McDowell and others, 2018; Hubert and Arabie, 1985). Moreover, we measured the improvement in distance (L2 norm) between the posterior pair-wise probability matrix and the ground truth compared to that of the standard Bayesian fPCA model where only one cluster for each dimension is implied. For clarity we named the following formula Clustering Improvement Index (CII):

(0.0)

where are the clustering matrices for the two competing models (new and standard), the underlying truth and the worst case-scenario we could incur with (i.e. the case where the true 0s and 1s are inverted). It follows that CII is 1 (i.e. improvement) when the clustering is completely accurate and values around 0 indicate no real advantage compared with the standard model. Differently to the ARI, this index accounts for cluster uncertainty in . An example of pair-wise probability matrix for is shown in Appendix B (Figure 1).

\@sect

subsection2[Simulation results]Simulation results

Results of curve reconstruction are reported in Table Parameter clustering in Bayesian functional PCA of fMRI data and Figure 4. The case of high noise (STN=1) shows all curves improved their IMSE in the proposed model compared to the standard model; moreover, we can see a substantial improvement in the outcome of the proposed model with of all trajectories having the IMSE diminished by at least compared to the standard model (Table Parameter clustering in Bayesian functional PCA of fMRI data, top; Figure 4). Results of low noise scenario (STN=6) show less improvement but still support the use of our model with of curves with IMSE improved and a median improvement of about compared to the standard model (Table Parameter clustering in Bayesian functional PCA of fMRI data, bottom; Figure 4). In addition, correlation reconstruction showed results in line with those of curve reconstruction with improvements of RMSE in the new model both in low and high noise scenarios; the latter achieving the largest reduction in RMSE (Table Parameter clustering in Bayesian functional PCA of fMRI data).
The clustering performance of the proposed model in terms of group identification and classification of the fPC scores is reported in Table Parameter clustering in Bayesian functional PCA of fMRI data, both in the case of low and high noise. The new model scored high in the two classification indices in both eigendimensions studied; clusters in the first eigendimension were almost always correctly identified by ARI and CII for both high and low signal to noise ratios. The identification of three clusters in the second eigendimension was more challenging as they were smaller and nearer to each other; however, high scores were almost always obtained when the low noise scenario was tested and even in the case of high noise we obtained fairly high scores in both indices (Table Parameter clustering in Bayesian functional PCA of fMRI data). Figure 2 in Appendix B provides further evidence of the improved level of information obtained by using the new model. Overall, the proposed model outperformed the standard Bayesian fPCA model in curve reconstruction, especially in the case of high noise in the data; moreover, it was able to retrieve the original spatial partition of curves and bring to light important relationships between clusters. These results together could further help the understanding of the underlying neuroscientific phenomenon in a real data scenario.

\@sect

section1[Application]Application

\@sect

subsection2[fMRI setting]fMRI setting

The study relates to a thirty-year-old healthy woman volunteer who underwent a resting-state fMRI at the Department of Radiology, Scientific Institute Santa Maria Nascente, Don Gnocchi Foundation (Milan, Italy) during February 2015. The recording was carried out using a 1.5 T Siemens Magnetom Avanto (Erlangen, Germany) MRI scanner with 8-channel head coil. The subject was asked to lie down in the MRI machine in supine position with eyes closed for 20 minutes. She was instructed to keep alert and relaxed; no specific mental task was requested.
BOLD EPI images were acquired at rest for approximately 8 minutes. High resolution T1-weighted 3D scans were also collected to be employed as anatomical references for fMRI data analysis. Standard pre-processing involved the following steps: motion and EPI distortion corrections, non-brain tissues removal, high-pass temporal filtering (cut-off Hz) and artefacts removal using the FMRIB ICA-based Xnoiseifier (FIX) toolbox (Griffanti and others, 2014).
After the pre-processing, the resulting 4D dataset was aligned to the subject’s high-resolution T1-weighted image, registered to MNI152 standard space and resampled to resolution. One minute length series (sampled at 0.5 Hz) were subsequently extracted as the average signal within each of 90 regions of interest (ROIs) according to the Automated Anatomical Labeling (AAL90) coordinates. The resulting dataset was input to fPCA for curve smoothing and dimension reduction. The set of 90 smooth curves and the retained eigendimensions are shown in Figure 3. We kept the first three dimensions explaining more than of the total variability while accounting for more than each.
We adapted the general model in section Parameter clustering in Bayesian functional PCA of fMRI data following the approach taken in the simulation study (section Parameter clustering in Bayesian functional PCA of fMRI data), favouring global patterns in the first eigendimension and local patterns in the remaining dimensions. Furthermore, we carried out a sensitivity analysis by varying the values of the hyperparameters and as well as the shape of the prior cluster variance in each dimension (Appendix C).

\@sect

subsection2[fMRI analysis results]fMRI analysis results

The posterior probabilities associated with the single cluster (i.e. no clusters) scenario were and for the three eigendimensions , respectively. The Bayes factors (BF) for the first eigendimension was , which indicates some mild evidence against no clusters. Conversely, the second and third dimensions returned and respectively, which can be interpret as mild evidence in favour of a single cluster.
Figure 5 shows the posterior probability for a cluster being empty and the posterior distributions of cluster size given it is not empty. Two to three clusters seem to emerge in dimension 1; the size of the second cluster (green) has a peak around , very small mass near zero, and a very low probability of being empty. The third cluster (cyan) has a size peaking at but more mass near zero and a higher probability of being empty. On the other hand, dimension 2 and 3 seem to suggest the presence of no more than one cluster each. The second cluster in both dimensions have higher probabilities of being empty (especially in dimension 2) and the distributions of their size have much more mass around zero. Furthermore, the distributions of the first cluster (red) in both dimensions have a notable peak around suggesting that, even when more than one cluster is considered, the large majority of fPC scores in dimension 2 and 3 tends to be gathered within a single large cluster.
The use of MAPs suggests there might be no more than 2 groups in the first dimension and 1 group in the second and third dimensions. Clustering with MAPs in the first dimension identified of curves whose trajectories are wigglier and with a visibly shorter inter-peak difference between the first positive and negative peaks compared to the other group (Figure 6). Figure 7 shows an example of curve reconstruction using the posterior mean and point-wise credible bands of the subject specific mean. Curves in cluster 2 pertain to brain areas from the occipital lobe (Calcarine, Cuneus, Lingual, Inferior Occipital Gyrus) and parietal lobe (Precuneus).
By analysing the pairwise probability matrix, a more comprehensive classification emerged. The previously dichotomous partition is now enriched by a third group of brain areas with no clear clustering preference (grey band at the top of the pariwise probability matrix in Figure 8). Cluster 2 comprises of curves which all represent areas from the occipital lobe (yellow dots) while curves in the third group (blue dots) belong to the cingulate cortex (Middle and Posterior Cingulate Cortex), parietal (Parietal Superior Lobule, Precuneus) and temporal (Middle and Inferior Temporal Gyrus) lobes (Figure 8).
We note that even if these results are relative to a single subject experiment, they are supported in the neuroimaging literature. For instance, it is well established that primary and extra-striate visual regions are active at rest (Van Den Heuvel and Pol, 2010) and have a role in processing mental imagery (Zhang and others, 2018). Just outside the visual cortex, The Temporal Inferior Gyrus takes part to the visual ventral stream which links information from the visual cortex to memory and recognition (Milner, 2017). Moreover, the Posterior Cingulate Cortex is known to interact with several different brain networks simultaneously and it participates in the Default Mode Network together with part of the parietal lobe (Leech and others, 2012). Conversely, it has been suggested that areas pertain to the Prefrontal Cortex (all included in cluster 1) have less long-range connectivity in the resting state condition (Tomasi and Volkow, 2011).
Finally, the sensitivity analysis further confirmed our findings as they were robust to changes in both shape and value of the hyperparameters (Appendix C).

\@sect

section1[Discussion]Discussion

The processing of the human brain is a complex phenomenon in both time and space. The modelling of spatio-temporal datasets in the big data era is a challenge becoming every day more demanding as we struggle to keep up with the overwhelmingly larger datasets we are required to make sense of. Moreover, the extraordinary advancements in neuroimaging of the last decades have focused large part of neuroscientists’ efforts on the spatial domain both in clinical practice and research. Nonetheless, the time domain retains important neurophysiological information on brain functioning and neuronal health and without it we are at risk of drawing partial and possibly wrong conclusions on how the brain works.
In the present study we proposed a model that combines functional PCA and Bayesian nonparametric techniques to explore spatio-temporal datasets flexibly. We combined the interesting idea of introducing spatial dependence among curves through the fPC scores proposed by Liu and others (2017) with the infinite Gaussian mixture model (Rasmussen, 2000; Medvedovic and Sivaganesan, 2002; Medvedovic and others, 2004; McDowell and others, 2018) to obtain a flexible modelling of the covariance structure. The main results show a clear improvement in curves reconstruction, particularly in the presence of high noise (as it is often the case in brain recordings) and the ability of exploring curves dependence dynamically allowing for different spatial patterns for each eigendimension retained.
Improvements in the reconstruction of high-noise corrupted curves were also reported by Liu and others (2017); in fact, the beneficial effect of accounting for curves similarity is more evident when the true signal is well masked behind the noise. Nevertheless, a direct modelling of large covariance matrices often resorts to the use of common covariance functions to avoid overparametrisation. The use of functions such as Matérn or rational quadratic implies a priori knowledge on the shape of spatial dependence. We believe that this approach does not suit highly complex phenomena, such as brain processing, where dependence has a much more elaborate architecture than a simple function of spatial proximity. Clustering the fPC scores allowed us to capture dependence among curve flexibly without the need to estimate the relative spatial covariance matrix.
DP mixture models have been also used for clustering time series through the clustering of the relative coefficients in a basis expansion representation (Angelini and others, 2012; James and Sugar, 2003; Ray and Mallick, 2006; Zhou and Wakefield, 2006). In the present study we moved from a global clustering of the data to a local clustering of fPC scores to address both the exploration of brain activity data and to improve curve reconstruction. This approach offers a richer classification technique as curves can potentially be assigned to different clusters in each eigendimension and a different number of clusters in every eigendimension is allowed. It follows that the assumption of separability of the cross-covariance matrix is avoided and rich time-space interactions can be captured by the model; as a consequence, this local borrowing of information also improves the reconstruction of the underlying smooth process. In addition, we benefit from the properties of the fPCA expansion to tune the hyperparameters and improve the MCMC convergence.
Cross-covariance matrices are often intractable if we do not resort to compromises in our models. A sensible compromise should be tailored to the type of specific data. In this study, we compromised with the time domain by using fPCA with a fixed number of eigendimensions while giving flexibility in the modelling of spatial dependence. This served the purpose of breaking off from the separability assumption while, at the same time, favouring interpretation and a simple model structure.
By means of a simulation study and the analysis of an fMRI dataset we demonstrate that our model is effective in recovering the underlying smooth curves and it produces a valuable exploration of the spatio-temporal dependence in brain time series. In our future work we plan to expand our approach to replicated data and multiple subjects experiments. Exploring inter-individual patterns of functional connectivity and their uncertainty can help answer important questions not only in the study of brain processes but also in the characterisation, early diagnosis and prognosis of brain diseases.

\@sect

section1[Software]Software

Software in the form of R code, together with a sample input data set and complete documentation is available on request from the corresponding author (N.Margaritella@sms.ed.ac.uk).

\@ssect

Acknowledgments

fMRI data were kindly provided by IRCCS Santa Maria Nascente, Don Carlo Gnocchi foundation of Milan (Italy). Conflict of Interest: None declared.

References

  • Angelini and others (2012) Angelini, Claudia, De Canditiis, Daniela and Pensky, Marianna. (2012). Clustering time-course microarray data using functional Bayesian infinite mixture model. Journal of Applied Statistics 39(1), 129–149.
  • Crainiceanu and Goldsmith (2010) Crainiceanu, Ciprian M and Goldsmith, A Jeffrey. (2010). Bayesian functional data analysis using WinBUGS. Journal of Statistical Software 32(11).
  • De Iorio and others (2018) De Iorio, Maria, Gallot, Natacha, Valcarcel, Beatriz and Wedderburn, Lucy. (2018). A Bayesian semiparametric Markov regression model for juvenile dermatomyositis. Statistics in Medicine 37(10), 1711–1731.
  • Escobar (1994) Escobar, Michael D. (1994). Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association 89(425), 268–277.
  • Escobar and West (1995) Escobar, Michael D and West, Mike. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90(430), 577–588.
  • Gelman (2006) Gelman, Andrew. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis 1(3), 515–534.
  • Gelman and others (2013) Gelman, Andrew, Stern, Hal S, Carlin, John B, Dunson, David B, Vehtari, Aki and Rubin, Donald B. (2013). Bayesian Data Analysis, 3rd edition. Chapman and Hall/CRC.
  • Gentle (2009) Gentle, James E. (2009). Computational Statistics. Springer.
  • Griffanti and others (2014) Griffanti, Ludovica, Salimi-Khorshidi, Gholamreza, Beckmann, Christian F, Auerbach, Edward J, Douaud, Gwenaëlle, Sexton, Claire E, Zsoldos, Enikő, Ebmeier, Klaus P, Filippini, Nicola, Mackay, Clare E and others. (2014). Ica-based artefact removal and accelerated fMRI acquisition for improved resting state network imaging. Neuroimage 95, 232–247.
  • Hasenstab and others (2017) Hasenstab, Kyle, Scheffler, Aaron, Telesca, Donatello, Sugar, Catherine A, Jeste, Shafali, DiStefano, Charlotte and Şentürk, Damla. (2017). A multi-dimensional functional principal components analysis of EEG data. Biometrics 73(3), 999–1009.
  • Hubert and Arabie (1985) Hubert, Lawrence and Arabie, Phipps. (1985). Comparing partitions. Journal of Classification 2(1), 193–218.
  • James and Sugar (2003) James, Gareth M and Sugar, Catherine A. (2003). Clustering for sparsely sampled functional data. Journal of the American Statistical Association 98(462), 397–408.
  • Jara and others (2007) Jara, Alejandro, García-Zattera, María José and Lesaffre, Emmanuel. (2007). A Dirichlet process mixture model for the analysis of correlated binary responses. Computational Statistics & Data Analysis 51(11), 5402–5415.
  • Leech and others (2012) Leech, Robert, Braga, Rodrigo and Sharp, David J. (2012). Echoes of the brain within the posterior cingulate cortex. Journal of Neuroscience 32(1), 215–222.
  • Liu and others (2017) Liu, Chong, Ray, Surajit and Hooker, Giles. (2017). Functional principal component analysis of spatially correlated data. Statistics and Computing 27(6), 1639–1654.
  • McDowell and others (2018) McDowell, Ian C, Manandhar, Dinesh, Vockley, Christopher M, Schmid, Amy K, Reddy, Timothy E and Engelhardt, Barbara E. (2018). Clustering gene expression time series data using an infinite gaussian process mixture model. PLoS Computational Biology 14(1), e1005896.
  • Medvedovic and Sivaganesan (2002) Medvedovic, Mario and Sivaganesan, Siva. (2002). Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics 18(9), 1194–1206.
  • Medvedovic and others (2004) Medvedovic, Mario, Yeung, Ka Yee and Bumgarner, Roger Eugene. (2004). Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20(8), 1222–1232.
  • Milner (2017) Milner, AD. (2017). How do the two visual streams interact with each other? Experimental Brain Research 235(5), 1297–1308.
  • Neal (2000) Neal, Radford M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9(2), 249–265.
  • Plummer (2013) Plummer, Martyn. (2013). rjags: Bayesian graphical models using MCMC. R package version 3(10).
  • Ramsay and others (2009) Ramsay, James, Hooker, Giles and Graves, Spencer. (2009). Functional Data Analysis with R and MATLAB. Springer Science & Business Media.
  • Ramsay and Silverman (2005) Ramsay, James and Silverman, Bernard Walter. (2005). Functional Data Analysis. Springer Series in Statistics.
  • Rasheed and Aref (2016) Rasheed, Huda A and Aref, Raghda. (2016). Bayesian inference for parameter and reliability function of inverse Rayleigh distribution under modified squared error loss function. Australian Journal of Basic and Applied Sciences 10(16), 241–248.
  • Rasmussen (2000) Rasmussen, Carl Edward. (2000). The infinite Gaussian mixture model. In: Advances in neural information processing systems. pp. 554–560.
  • Ray and Mallick (2006) Ray, Shubhankar and Mallick, Bani. (2006). Functional clustering by Bayesian wavelet methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(2), 305–332.
  • Richardson and Green (1997) Richardson, Sylvia and Green, Peter J. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology) 59(4), 731–792.
  • Sørensen and others (2013) Sørensen, Helle, Goldsmith, Jeff and Sangalli, Laura M. (2013). An introduction with medical applications to functional data analysis. Statistics in Medicine 32(30), 5222–5240.
  • Tian (2010) Tian, T Siva. (2010). Functional data analysis in brain imaging studies. Frontiers in Psychology 1, 35.
  • Tomasi and Volkow (2011) Tomasi, Dardo and Volkow, Nora D. (2011). Functional connectivity hubs in the human brain. Neuroimage 57(3), 908–917.
  • Van Den Heuvel and Pol (2010) Van Den Heuvel, Martijn P and Pol, Hilleke E Hulshoff. (2010). Exploring the brain network: a review on resting-state fMRI functional connectivity. European Neuropsychopharmacology 20(8), 519–534.
  • Viviani and others (2005) Viviani, Roberto, Grön, Georg and Spitzer, Manfred. (2005). Functional principal component analysis of fMRI data. Human Brain Mapping 24(2), 109–129.
  • Wang and others (2016) Wang, Jane-Ling, Chiou, Jeng-Min and Mueller, Hans-Georg. (2016). Review of functional data analysis. Annual Review of Statistics and its Application 3, 257–295.
  • Wolfson (2018) Wolfson, Ouri. (2018). Understanding the human brain via its spatio-temporal properties (vision paper). In: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM. pp. 85–88.
  • Zhang and others (2018) Zhang, Zheng, Zhang, Delong, Wang, Zengjian, Li, Junchao, Lin, Yuting, Chang, Song, Huang, Ruiwang and Liu, Ming. (2018). Intrinsic neural linkage between primary visual area and default mode network in human brain: evidence from visual mental imagery. Neuroscience 379, 13–21.
  • Zhou and Wakefield (2006) Zhou, Chuan and Wakefield, Jon. (2006). A Bayesian mixture model for partitioning gene expression data. Biometrics 62(2), 515–525.

Figure 1: Parameters Clustering Bayesian fPCA model. Squares= observed random variables; circles= unobserved random variables; dashed= deterministic relationship; solid= stochastic relationship.
Figure 2: Top: an example of curves obtained in one simulation with low random noise (STN 6) and high random noise (STN 1); group 1 has two subsets of curves with slightly different patterns (wiggly curves). Bottom: the first and second eigenfunctions explaining about of the total variability in the curves.
Figure 3: fMRI data analysis: fPCA decomposition. The dataset composed by 90 curves (after smoothing) and the three retained eigendimensions explaining of the total variability in the curves.
Figure 4: Simulation study: curve reconstruction. IMSE distribution and density stratified for the curve groups (1.1, 1.2, 2) and noise levels (STN=6, STN=1).
Figure 5: fMRI data analysis: cluster identification. The first row shows the posterior probabilities of being empty for the second to tenth clusters in the three eigendimensions analysed. The second row shows the posterior distributions of cluster size (given it is not empty) among the first four clusters (Cl1:Cl4, right to left).
Figure 6: fMRI data analysis: cluster identification with MAPs. Top row: two clusters obtained in the first eigendimension together with the respective overall means (thick line). Bottom row: 3D localisation of cluster 2 over sagittal and axial slices of human brain.
Figure 7: fMRI data analysis: posterior means and credible bands for the resting-state fMRI data recorded from 2 sites in Frontal (flat curve) and Occipital (wiggly) brain regions.
Figure 8: fMRI data analysis: cluster identification with pairwise probabilities. Top, left: pairwise probabilities suggesting a different classification for the curves previously classified in cluster 2 according to MAPs (Figure 7). Top, right: cluster 2 updated according to pairwise probabilities. Bottom: the updated 3-D representation of cluster 2 over sagittal and axial slices of the human brain.

Table 1.  Simulation study: curves reconstruction under low (STN=6) and high (STN=1) noise in the data. The first column (left) reports median and IQR of IMSE distribution (across curves) of the new model; the second column shows the difference (percentage) in IMSE between the standard Bayesian fPCA model (std) and the proposed model. The third column counts the number of curves (out of ) with a lower IMSE in the proposed model than the standard model. Results are also stratified by the three curve groups.


Group STN=1 Overall 0.71 [0.47, 1.09] 21.53 [19.01, 23.85] 100% (100/100) Group 1.1 1.96 [1.80, 2.10] 21.54 [17.98, 23.30] 100% (20/20) Group 1.2 0.97 [0.90, 1.06] 21.32 [19.90, 22.29] 100% (30/30) Group 2.0 0.47 [0.42, 0.51] 22.63 [19.55, 24.99] 100% (50/50) STN=6 Overall 0.20 [0.14, 0.30] 2.99 [0.90, 4.19] 80% (80/100) Group 1.1 0.83 [0.74, 0.89] 4.33 [3.10, 5.66] 85% (17/20) Group 1.2 0.27 [0.24, 0.29] 2.99 [-0.69, 4.75] 73% (22/30) Group 2.0 0.14 [0.12, 0.15] 2.41 [1.10, 3.60] 84% (45/50)

Table 2.  Simulation study: correlation reconstruction under low (STN=6) and high (STN=1) noise in the data. The first column (left) reports median and IQR of RMSE distribution (of MC simulations) of the proposed model; the second column shows the difference (percentage) in RMSE between the standard Bayesian fPCA model (std) and the proposed model. The third column counts the number of MC datasets (out of ) with a lower RMSE in the new model than the standard model.


Correlation STN=1 0.111 [0.104, 0.115] 19.91 [16.0, 23.56] 99% (99/100) STN=6 0.050 [0.047, 0.053] 5.97 [2.19, 9.17] 80% (80/100)

Table 3.  Simulation study: clustering performance. The table reports median and IQR of ARI (Adjusted Rand Index) and CII (Clustering Improvement Index) computed for each MC dataset and every STN and eigendimension analysed.


Eigendimension STN=1 1st dim 1 [1,1] 0.993 [0.993,0.986] 2nd dim 0.753 [0.444,0.868] 0.645 [0.693,0.523] STN=6 1st dim 1 [1,1] 0.962 [0.968,0.950] 2nd dim 0.966 [0.933,0.966] 0.828 [0.855,0.797]

\@ssect

APPENDIX

\@sect

section1[Posterior conditional distributions]Posterior conditional distributions In this section we present the posterior conditional distributions for the parameters of our model.

In our model we fixed , and .

\@sect

section1[Additional figures]Additional figures Additional figures not shown in the paper that further clarify the features of our simulation study.

Figure  A.9: Simulation study: Posterior pair-wise probability matrices for the first (Left) and second (Right) eigendimension. These are examples of the matrices used to compute the CII for every MC dataset; as expected, more uncertainty is visible in the second eigendimension.
Figure  A.10: Simulation study: from noise-corrupted time-series to spatio-temporal classified smooth curves. The figure exemplifies information obtained by using the proposed model. Among 100 noise-corrupted time-series, the model identified three spatial clusters; each group can be represented by the mean of the relative reconstructed curves and the two modes of variation which characterise it (red: 1st eigendimension, blue: 2nd). The new classification method brings to light a strong link (red arrow) between two clusters via the first eigendimension explaining of the variability in the data.
\@sect

section1[Sensitivity analysis]Sensitivity analysis In the sensitivity analysis we tested how changing the prior expected number of clusters and cluster size impacted on our findings. The Table below shows four different settings tested for the cluster precision (or standard deviation ) and Dirichlet precision for each of the dimensions. Overall, our results are substantially robust to changes in the shape and value of the hyperparameters. For the first eigendimension we found only few curves transitioning from cluster 2 to cluster 1 as the prior size and number of clusters increase; vice versa, we found some of the curves with high classification uncertainty (see the analysis of pairwise probabilities in Section 4.2) moving into cluster 2 as the prior size and number of clusters decrease. The other eigendimensions showed only 1 cluster each in every scenario tested.

dimension cluster variability precision parameter
1
k1
k2
k3
2
k1
k2
k3
3
k1
k2
k3
4
k1
k2
k3
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
358669
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description