A nonstationary nonparametric Bayesian approach to dynamically modeling effective connectivity in functional magnetic resonance imaging experiments
Abstract
Effective connectivity analysis provides an understanding of the functional organization of the brain by studying how activated regions influence one another. We propose a nonparametric Bayesian approach to model effective connectivity assuming a dynamic nonstationary neuronal system. Our approach uses the Dirichlet process to specify an appropriate (most plausible according to our prior beliefs) dynamic model as the “expectation” of a set of plausible models upon which we assign a probability distribution. This addresses model uncertainty associated with dynamic effective connectivity. We derive a Gibbs sampling approach to sample from the joint (and marginal) posterior distributions of the unknowns. Results on simulation experiments demonstrate that our model is flexible and outperforms its competitors in many situations. We also apply our approach to the analysis of functional magnetic resonance imaging (fMRI) data on a Stroop task: our analysis provides new insight into the mechanism by which an individual brain distinguishes and learns about shapes of objects.
DOI: 10.1214/11-AOAS470. Volume 5, Issue 2B (2011), pages 1183–1206.
Dynamic effective connectivity in fMRI
Sourabh Bhattacharya (sourabh@isical.ac.in) and Ranjan Maitra (maitra@iastate.edu). Supported in part by NSF CAREER Grant DMS-0437555 and by the National Institutes of Health (NIH) Award DC0006740.
Keywords: attentional control network; Bayesian analysis; Dirichlet process; effective connectivity analysis; fMRI; Gibbs sampling; temporal correlation.
1 Introduction
Functional magnetic resonance imaging (fMRI) is a noninvasive technique for detecting regions in the brain that are activated by the application of a stimulus or the performance of a task. Although important neuronal activities are responsible for such activation, these are very subtle and cannot be detected directly. Instead, local changes during neuronal activity in the flow, volume, oxygen level and other characteristics of blood, called the blood oxygen level dependent (BOLD) response, form a proxy. Much research in fMRI has focused on identifying regions of cerebral activation in response to the activity of interest. There is, however, growing interest in obtaining a better understanding of the interactions between different brain regions during the operation of the BOLD response. The study of how one neuronal system interacts with another is called effective connectivity analysis [Friston (1994); Nyberg and McIntosh (2001)]. We illustrate this in the context of obtaining greater insight into how an individual brain performs a Stroop task, which is also the main application studied in this paper.
1.1 Investigating the attentional control network in a Stroop task
The human brain’s information processing capability is limited, so it sifts out irrelevant details from task-relevant information using the cognitive function called attention. Specifically, task-irrelevant information is filtered out either because of intrinsic properties of the stimulus (bottom-up selection) or independently of them (top-down selection) [Frith (2001)]. The brain’s preference for task-related information in top-down selection requires coordination of neural activity via an Attentional Control Network (ACN), which has systems to process task-relevant and irrelevant information and also a “higher-order executive control system” to modulate the frequency of neuronal firings in each [Banich et al. (2000)]. Thus, the higher-order system can execute top-down selection by increasing neuronal activity in the task-relevant processing system while suppressing it in its task-irrelevant counterpart. Many studies have empirically found the dorsal lateral prefrontal cortex (DLPFC) to be the main source of attentional control, while the task-relevant and irrelevant processing sites depend on whether the stimulus is visual, auditory or in some other form.
Jaensch (1929) and Stroop (1935) discovered that the brain is quicker at reading named color words (e.g., blue, yellow, green, etc.) when they are in the concordant color than if they are in a discordant color. Tasks structured along these lines are now called Stroop tasks. A much-studied two-phase experiment [Milham et al. (2002, 2003); Ho, Ombao and Shumway (2003, 2005); Milham, Banich and Barad (2003); Bhattacharya, Ho and Purkayastha (2006)] designed around such a task provided the dataset for our investigation. In the first phase, a subject was trained to associate each of three unfamiliar shapes with a unique color word (“Blue,” “Yellow” and “Green”) with 100% accuracy. The second (testing) phase involved alternating six times between blocks of eighteen interference and eighteen neutral trials. The neutral trial consisted of printing the shape in a neutral color (white). The interference trial involved presenting the subject with one of the learned shapes, but printed in a color different from that learned to be represented by that shape in the learning phase. The subject’s task was to subvocally name the shape’s color as trained in the learning phase, ignoring the color presented in the testing phase. Each neutral or interference trial consisted of a 0.3s fixation cross, a 1.2s stimulus presentation stage and a 0.5s waiting state until the next trial. fMRI images were acquired and processed to obtain three activated regions, whose averaged post-processed time series are what we analyze further to investigate attentional control. These three regions—also denoted as Regions 1, 2 and 3 in this paper—were the lingual gyrus (LG), the middle occipital gyrus (MOG) and the DLPFC, and were chosen as representatives of task-irrelevant, task-relevant and executive-control systems, respectively. The LG is a visual area for processing color information [Corbetta et al. (1991)], which in our context is task-irrelevant [Kelley et al. (1998)].
The MOG is another visual area but processes shape information, which is the task-related information (form of the shape) in the experiment. We refer to Bhattacharya, Ho and Purkayastha (2006) for further details on data collection and post-processing, noting here that, as in that and other preceding papers, the objective is to investigate and to understand the working of the ACN mechanism in performing a Stroop task.
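As an illustrative aside (not part of the original analysis), the testing-phase timing described above can be sketched programmatically. In the fragment below, the scan repetition time (TR) and the convention that interference blocks come first are assumptions not stated in the text:

```python
# Testing-phase design: six alternations between blocks of 18 interference
# and 18 neutral trials; each trial lasts 0.3 s (fixation) + 1.2 s (stimulus)
# + 0.5 s (wait) = 2.0 s.  TR and block ordering are illustrative assumptions.
TRIAL_SEC = 0.3 + 1.2 + 0.5        # duration of one trial
TRIALS_PER_BLOCK = 18
N_ALTERNATIONS = 6

def block_boxcar(tr=2.0):
    """Per-scan indicator: 1 during interference blocks, 0 during neutral."""
    block_sec = TRIALS_PER_BLOCK * TRIAL_SEC       # 36 s per block
    total_sec = 2 * N_ALTERNATIONS * block_sec     # 432 s in all
    n_scans = int(total_sec / tr)
    boxcar = []
    for k in range(n_scans):
        t = k * tr
        block_index = int(t // block_sec)
        boxcar.append(1 if block_index % 2 == 0 else 0)  # even blocks: interference
    return boxcar

timeline = block_boxcar()
```

Under these assumptions the session comprises 216 scans, half of them in interference blocks.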
1.2 Background and related work
Structural equation modeling [McIntosh and Gonzalez-Lima (1994); Kirk et al. (2005); Penny et al. (2004)] and time-varying parameter regression [Büchel and Friston (1998)] are two early approaches that have been used to determine effective connectivity. In general, both approaches ignore dynamic modeling of the observed system, even though the latter accounts for temporal correlation in the analysis. There is, however, strong empirical evidence [Aertsen and Preißl (1991); Friston (1994); McIntosh and Gonzalez-Lima (1994); Büchel and Friston (1998); McIntosh (2000)] that effective connectivity is dynamic in nature, which means that the time-invariant model assumed by both approaches may not be appropriate. Ho, Ombao and Shumway (2005) overcame some of these limitations by modeling the data using a state-space approach, but did not account for the time-varying nature of the effective connectivity parameters.
An initial attempt at explicitly incorporating the time-varying nature of effective connectivity in addition to dynamic modeling of neuronal systems was by Bhattacharya, Ho and Purkayastha (2006), who adopted a Bayesian approach to inference and developed and illustrated their methodology with specific regard to the ACN mechanism of the LG, MOG and DLPFC regions in conducting the Stroop task outlined above. We summarize their model—framing it within the context of more recent literature in dynamic modeling of effective connectivity—and discuss their findings and some limitations next. In doing so, we also introduce the setup followed throughout this paper.
1.2.1 Bayesian modeling of dynamic effective connectivity
Let $y_i(t)$ be the observed fMRI signal (or the measured BOLD response) corresponding to the $i$th region at time $t$, $i=1,\ldots,r$; $t=1,\ldots,n$. Specifically, $y_i(t)$ is some voxel-wise summary (e.g., regional average) of the corresponding detrended time series in the $i$th region. Following Bhattacharya, Ho and Purkayastha (2006), let $x_i(t)$ be the modeled BOLD response [as opposed to the measured BOLD response, $y_i(t)$], that is, the stimulus convolved with the hemodynamic response function (HRF) for the $i$th region and time point $t$. In this paper, the HRF is assumed to be the very widely-used standard model of Glover (1999), which differences two gamma functions and has some very appealing properties vis-à-vis other HRFs [Lu et al. (2006, 2007)]. Then the model for the observed fMRI signal can be hierarchically specified as
(1)  $y_i(t) = \mu_i + \lambda_i(t)\,x_i(t) + \varepsilon_i(t)$,
where $\mu_i$ and $\lambda_i(t)$ are the baseline trend and activation coefficients for the $i$th region, the latter at time $t$. The errors $\varepsilon_i(t)$ are all assumed to be independent zero-mean Gaussian, following Worsley et al. (2002). From Bhattacharya, Ho and Purkayastha [(2006), page 797], we assume that $x_i(t) = x(t)$ for $i=1,\ldots,r$, that is, we use the same HRF for each of the regions. Note that, as argued in that paper, this homogeneous assumption on the $x_i(t)$ is inconsequential because it is compensated by the $\lambda_i(t)$, which are associated with $x(t)$ and allowed to be inhomogeneous with respect to the different regions. Also, following Bhattacharya, Ho and Purkayastha [(2006), page 799], we initialize the activation coefficients at zero: $\lambda_i(0) = 0$; $i=1,\ldots,r$. Actually, (1) is a generalization of a very standard model used extensively in the literature; see, for example, Lindquist [(2008), equation (9)] or Henson and Friston [(2007), page 179, equation (14.1)], who use the same model but with a constant time-invariant activation coefficient. (Indeed, as very helpfully pointed out by a reviewer, this last specification is also the general linear model commonly used to analyze fMRI data voxel-wise, such as in statistical parametric mapping and related conventional whole-brain activation studies.) Our specific generalization incorporates time-varying $\lambda_i(t)$ and follows Ho, Ombao and Shumway (2005), Bhattacharya, Ho and Purkayastha (2006) or Harrison, Stephan and Friston [(2007), cf. page 516, equation 38.18]; note, however, that the latter model $\lambda_i(t)$ as a random walk [see equation 38.19, page 516, of Harrison, Stephan and Friston (2007)]. We prefer allowing for time-varying activation in order to address the “learning” effect often reported in fMRI studies, whereby strong activation in the initial stages of the experiment dissipates over time [Gössl, Auer and Fahrmeir (2001); Milham et al. (2002, 2003); Milham, Banich and Barad (2003)]. Further modeling specifies the activation coefficient in the $i$th region at the $t$th timepoint in terms of the noise-free BOLD signals in the regions at the previous timepoint. Thus,
(2)  $\lambda_i(t) = \sum_{j=1}^{r} \gamma_{ij}(t)\,\nu_j(t-1) + \eta_i(t)$,
(3)  $\nu_j(t) = \mu_j + \lambda_j(t)\,x(t)$,
where $\nu_j(t)$ is the noise-free BOLD signal in the $j$th region at time $t$, the $\eta_i(t)$ are independent zero-mean Gaussian errors and $\gamma_{ij}(t)$ is the influence of the $j$th region on the $i$th region at time $t$. Under (2), functionally specified cerebral areas are not constrained to act independently but can interact with other regions. Our objective is to make inferences on the $\gamma_{ij}(t)$ in order to understand the functional circuitry in the brain as it processes a certain (in this paper, Stroop) task.
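The modeled BOLD response, the stimulus convolved with the Glover (1999) HRF, can be sketched as follows. The double-gamma parameter values below are the commonly quoted defaults and should be read as assumptions, not the exact values used in the paper:

```python
import numpy as np

def glover_hrf(t, a1=6.0, b1=0.9, a2=12.0, b2=0.9, c=0.35):
    """Glover (1999)-style difference-of-gammas HRF.
    Parameter values are commonly quoted defaults (illustrative)."""
    d1, d2 = a1 * b1, a2 * b2                      # times to peak/undershoot
    t = np.asarray(t, dtype=float)
    h = (t / d1) ** a1 * np.exp(-(t - d1) / b1) \
        - c * (t / d2) ** a2 * np.exp(-(t - d2) / b2)
    h[t < 0] = 0.0
    return h

def modeled_bold(stimulus, tr=2.0):
    """x(t): stimulus time course convolved with the HRF on the scan grid."""
    t = np.arange(0, 32, tr)                       # 32 s of HRF support
    h = glover_hrf(t)
    x = np.convolve(stimulus, h)[: len(stimulus)]
    return x / np.max(np.abs(x))                   # unit-peak normalization
```

The normalization to unit peak is a convenience choice here; any scaling of $x(t)$ is absorbed by the activation coefficients in (1).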
Equations (1) and (2) together specify one of many Vector Autoregressive (VAR) models proposed by several authors [Harrison, Penny and Friston (2003); Goebel et al. (2003); Rykhlevskaia, Fabiani and Gratton (2006); Sato et al. (2007); Thompson and Siegle (2009); Patriota, Sato and Achic (2010)]. To see this, note that, from (1), $\nu_j(t-1)$ depends linearly upon $y_j(t-1)$ for each $j$. Hence, substituting this in (2) yields $\lambda_i(t)$ as a known function of $(y_1(t-1), \ldots, y_r(t-1))$, which is linear in $y_j(t-1)$, $j=1,\ldots,r$. Then substituting in (1), we see that for each $i$, $y_i(t)$ is a linear function of $(y_1(t-1), \ldots, y_r(t-1))$. Hence, the vector $(y_1(t), \ldots, y_r(t))'$ is a linear function of the vector $(y_1(t-1), \ldots, y_r(t-1))'$. As a result, our model is a first-order VAR model from the viewpoint of the responses. It is of first order since the response vector at time $t$ depends upon that at time $t-1$, given the model parameters. Moreover, (2) shows that the activation coefficients are modeled as first-order VAR; that is, the $r$-component vector $(\lambda_1(t), \ldots, \lambda_r(t))'$ depends linearly upon $\lambda_j(t-1)$, $j=1,\ldots,r$.
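The VAR(1) structure just described can be illustrated with a small simulation. The influence matrix below is hypothetical and time-invariant, unlike the paper's time-varying coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var1(A, n_time, noise_sd=0.1):
    """Simulate a first-order VAR: z(t) = A' z(t-1) + eta(t).
    A[j, i] plays the role of the influence of region j on region i;
    holding it constant over time is an illustrative simplification."""
    r = A.shape[0]
    z = np.zeros((n_time, r))
    for t in range(1, n_time):
        z[t] = A.T @ z[t - 1] + noise_sd * rng.standard_normal(r)
    return z

# Hypothetical 3-region influence matrix (rows: source, cols: target).
A = np.array([[0.5, 0.2, 0.0],
              [0.1, 0.4, 0.0],
              [0.3, 0.3, 0.6]])
z = simulate_var1(A, n_time=216)
```

With all eigenvalues of the influence matrix inside the unit circle, the simulated coefficient process remains stable; a unit root would instead produce random-walk behavior, foreshadowing the comparison in Section 1.2.2.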
VAR models provide an alternative or a substantial generalization [Friston (2009)] to the Dynamic Causal Modeling (DCM) approach proposed by Friston, Harrison and Penny (2003), at least in continuous time, to model the change of the neuronal state vector over time, using stochastic differential equations. In DCM, the observed BOLD signal is modeled as the sum of a modeled BOLD response, nuisance effects and measurement error, where the modeled response is obtained by first using a bilinear differential (neural state) equation, parametrized in terms of effective connectivity parameters, and then applying a “balloon model” transformation [Buxton, Wong and Frank (1998), or extensions: Friston et al. (2000); Stephan et al. (2007)] to the solution of the bilinear differential equation. DCM thus uses both the modeled response and the nuisance effects to model the observed BOLD response, with its modeled response playing the same role as our modeled BOLD response, with the exception that the latter is obtained using the more widely-used Glover (1999) HRF model. Further, DCM assumes a deterministic relationship between the different brain regions, unlike (2), which allows for noisy dynamics [Bhattacharya, Ho and Purkayastha (2006)].
Thompson and Siegle (2009) contend that VAR models have gained popularity in recent years because “the direction and valence of effective connectivity relationships do not need to be prespecified.” As such, these models have provided a useful framework for effective connectivity analysis.
Bhattacharya, Ho and Purkayastha (2006) proposed a symmetric random walk model for the effective connectivity parameters:
(4)  $\gamma_{ij}(t) = \gamma_{ij}(t-1) + \delta_{ij}(t)$,
where the $\delta_{ij}(t)$ are independent zero-mean Gaussian errors. In this paper we use $\mathcal{M}_{RW}$ to refer to the model specified by (1), (2) and (4). The effective connectivity parameters $\gamma_{ij}(t)$; $i, j = 1, \ldots, r$, also form a VAR model of the first order. To see this, let $\boldsymbol\gamma(t)$ and $\boldsymbol\delta(t)$ denote the $q$-component vectors stacking the $\gamma_{ij}(t)$ and the $\delta_{ij}(t)$, respectively. Then it follows that $\boldsymbol\gamma(t) = I_q\,\boldsymbol\gamma(t-1) + \boldsymbol\delta(t)$, where $I_q$ is the $q$th-order identity matrix, indicating that the $\gamma_{ij}(t)$ are within the framework of a VAR model.
Bhattacharya, Ho and Purkayastha (2006) specified prior distributions on the parameters and hyperparameters of this model and used Gibbs sampling to learn the posterior distributions of the unknowns. We refer to that paper for details and for results on simulation experiments using $\mathcal{M}_{RW}$, noting here only that their Bayesian-derived inference supported ACN theory and, more importantly, the notion that effective connectivity is indeed dynamic in the network. Further, they found that the restricted model with no direct feedback from the two sites of control (LG and MOG) to the source (DLPFC) was the best performer. Interestingly, however, and perhaps surprisingly, their estimated effective connectivity parameters (see Figure 6 in their paper) had very little relationship with the nature of the BOLD response (see Figure 1, bottom panel, in that paper). This is surprising because, from (1), we have $\nu_j(t-1) = y_j(t-1) - \varepsilon_j(t-1)$, which when substituted on the right-hand side of (2) makes it independent of $x$. This means that the effective connectivity parameters depend upon $x$ only through $\lambda_i(t)$, the left-hand side of (2). Since $\lambda_i(t)$ is a function of $x(t)$, it is reasonable to expect the $\gamma_{ij}(t)$ to depend upon $x(t)$, but such a relationship was not found in Bhattacharya, Ho and Purkayastha (2006). This perplexing finding led us to first investigate robustness of $\mathcal{M}_{RW}$ to even slight misspecifications.
1.2.2 Robustness of the random walk model
We tested the effect of a slight departure from $\mathcal{M}_{RW}$ by simulating the $\gamma_{ij}(t)$, instead of from (4), from the following stationary autoregressive model:
(5)  $\gamma_{ij}(t) = \rho\,\gamma_{ij}(t-1) + \delta_{ij}(t)$, \qquad $|\rho| < 1$.
We call this slightly modified model $\mathcal{M}_{AR}$. Here, $\rho$ was chosen close to unity, so that (5) departs only slightly from (4), and the numbers of regions and timepoints were chosen to match the details of the dataset of Section 1.1. We fit $\mathcal{M}_{RW}$ to data simulated from $\mathcal{M}_{AR}$. Figure 1 displays the estimated posterior distributions of the $\gamma_{ij}(t)$. The marginal posterior distribution of each is represented here by eight quantiles, each containing 12.5% of the distribution: increased opacity in shading denotes denser regions. Solid lines represent true values. As seen, many parts of the posterior distribution have very little coverage of the true effective connectivity parameters: this finding is also supported by Table 1, which provides the proportion of true values included in the 95% highest posterior density (HPD) credible intervals [Berger (1985)] (these are the shortest intervals with posterior probability 0.95). Thus, performance degrades substantially even though $\mathcal{M}_{AR}$ is not all that different from $\mathcal{M}_{RW}$. Hence, modeling the $\gamma_{ij}(t)$ process by a random walk may be too restrictive, and a better approach may be needed. We do so in this paper by embedding an (asymptotically) stationary first-order autoregressive AR(1) model in a larger class of models. Formally, we employ a Bayesian nonparametric framework using a Dirichlet process (DP) prior whose base distribution is assumed to be that implied by an AR(1) model. The intuition behind this modeling style is that although one might expect the actual process to be stationary, the assumption might be too simplistic, and it is more logical to think of the stationary model as an “expected model,” thus allowing for nonstationarity (quantified by the DP prior) in the actual model. Theoretical issues related to the construction of DP-based nonstationary processes are discussed in Section 2.1. In Section 2.2 we introduce our new modeling ideas using the developments in Section 2.1. The efficacy of the new model is compared with its competitors
Table 1. Proportions of the true $\gamma_{ij}(t)$ values included in the 95% HPD credible intervals: 0.99, 0.99, 0.91, 1.0, 0, 0.05, 0.05, 1.0, 0.60.
on some simulated datasets in Section 3. The new approach is applied in Section 4 to the dataset introduced in Section 1.1 to investigate effective connectivity between the LG, MOG and DLPFC regions. We conclude in Section 5 with some discussion. Additional derivations and further details on experiments and data analyses are provided in the supplement [Bhattacharya and Maitra (2011)], whose sections, figures and tables have the prefix “S” when referred to in this paper.
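The contrast between the random walk (4) and its stationary autoregressive perturbation (5), which drives the robustness study above, can be visualized by simulating both. The autoregressive coefficient, innovation scale and series length below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma_path(n_time=216, rho=1.0, sd=0.1):
    """One simulated effective-connectivity path gamma(t).
    rho = 1 gives the symmetric random walk of (4); |rho| < 1 gives the
    stationary AR(1) alternative of (5).  All settings are illustrative."""
    g = np.zeros(n_time)
    for t in range(1, n_time):
        g[t] = rho * g[t - 1] + sd * rng.standard_normal()
    return g

random_walk = gamma_path(rho=1.0)   # variance grows linearly in t
ar1 = gamma_path(rho=0.8)           # variance settles near sd**2 / (1 - rho**2)
```

For rho close to unity the two paths are visually hard to distinguish over short stretches, which is exactly why the misspecification studied here is a "slight" one.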
2 Modeling and methodology
2.1 A nonstationary Dirichlet process model for time series observations
A random probability measure $P$ on a probability space, sampled from the Dirichlet process (DP) denoted by $DP(\alpha, G_0)$, with known base distribution $G_0$ and precision parameter $\alpha$, can be represented almost surely, using the constructive method provided in Sethuraman (1994), as
(6)  $P = \sum_{k=1}^{\infty} p_k\,\delta_{\theta_k}$,
where $\delta_\theta$ denotes point mass at $\theta$, $p_1 = V_1$ and $p_k = V_k \prod_{l<k}(1 - V_l)$ for $k \ge 2$, with the $V_k$ being independent, identically distributed (henceforth i.i.d.) $\mathrm{Beta}(1, \alpha)$ random variables. The values $\theta_k$ are i.i.d. realizations from $G_0$, for $k = 1, 2, \ldots$, and are also independent of the $V_k$. Note that (6) implies that $P$ is discrete with probability one, and has expectation $G_0$. DPs thus provide ways to place priors on probability measures.
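A truncated version of Sethuraman's construction (6) is easy to simulate; the truncation level and the standard normal base measure below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_stick_breaking(alpha, base_sampler, n_atoms=500):
    """Truncated stick-breaking draw from DP(alpha, G0):
    P = sum_k p_k * delta_{theta_k}, with p_1 = V_1 and
    p_k = V_k * prod_{l<k}(1 - V_l), V_k ~ Beta(1, alpha), theta_k ~ G0.
    Truncating at n_atoms atoms approximates the infinite representation."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    # log of the remaining stick length before each break (numerically stable)
    log_remain = np.concatenate(([0.0], np.cumsum(np.log1p(-v))[:-1]))
    weights = v * np.exp(log_remain)
    atoms = base_sampler(n_atoms)
    return weights, atoms

w, th = dp_stick_breaking(alpha=1.0,
                          base_sampler=lambda n: rng.standard_normal(n))
```

The truncated weights sum to slightly less than one; the leftover mass corresponds to the discarded tail of atoms and shrinks geometrically with the truncation level.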
The dependent Dirichlet process (DDP) is an extension of the DP in the sense that it allows for a prior distribution to be specified on a set of random probability measures, rather than on a single random probability measure. In other words, the atoms in (6) can be extended to accommodate an entire time-series domain $\mathcal{T}$, such that a random measure $P_t$ obtains for each $t \in \mathcal{T}$. Following (6), the random process thus constructed can be represented as
(7)  $P_t = \sum_{k=1}^{\infty} p_k\,\delta_{\theta_k(t)}$,
with form similar to that used for spatial DP models [see Gelfand, Kottas and MacEachern (2005)]. Note that the $\theta_k(t)$ in (7) are realizations of some stochastic process $\theta_k = \{\theta_k(t); t \in \mathcal{T}\}$, with distribution $G_0$ for $k = 1, 2, \ldots.$ Hence, Kolmogorov's consistency holds for each $\theta_k$. That is, finite-dimensional joint distributions of $(\theta_k(t_1), \ldots, \theta_k(t_m))$, for ordered timepoints $t_1 < \cdots < t_m$ (here $\{t_1, \ldots, t_m\}$ is a finite set), can be obtained from all finite but higher-dimensional joint distributions specified by the process, by marginalizing over the additional timepoints. Since (7) shows that $P_t$ is specified completely by the processes $\theta_k$ and the weights $p_k$, and since the latter are independent of $t$, it follows that Kolmogorov's consistency holds for $\{P_t\}$, providing a formal setup of a stochastic process of random distributions. In particular, for any $t$, $P_t \sim DP(\alpha, G_{0t})$, where $G_{0t}$ is the marginal of $G_0$ at $t$ [and $P_t$ admits the representation (6)]. The collection of random measures $\{P_t; t \in \mathcal{T}\}$ is said to follow the DDP [see, e.g., MacEachern (2000); De Iorio et al. (2004); Gelfand, Kottas and MacEachern (2005)].
The process $\theta = \{\theta(t); t \in \mathcal{T}\}$ may be a time series that is stationary or, as adopted in our application and more realistically, asymptotically so. Indeed, while asymptotic stationarity is a very slight departure from stationarity, Section 1.2.2 demonstrates that it can have quite a significant impact on inference. It is also important to observe that although the process $\theta$ may be stationary or asymptotically stationary under $G_0$, the same process when conditioned on $\{P_t\}$ is not even asymptotically stationary. Specifically,
$E\{\theta(t) \mid P_t\} = \sum_{k=1}^{\infty} p_k\,\theta_k(t)$
and
$\operatorname{Cov}\{\theta(s), \theta(t) \mid P_s, P_t\} = \sum_{k=1}^{\infty} p_k\,\theta_k(s)\,\theta_k(t) - \Bigl\{\sum_{k=1}^{\infty} p_k\,\theta_k(s)\Bigr\}\Bigl\{\sum_{k=1}^{\infty} p_k\,\theta_k(t)\Bigr\}.$
Thus, $\theta$ conditional on $\{P_t\}$ is nonstationary, although under $G_0$, $\theta$ may have a stationary model so that the mean $E\{\theta(t)\}$ is constant and the covariance $\operatorname{Cov}\{\theta(s), \theta(t)\}$ depends upon time only through the time lag $|t - s|$. Thus, we have defined here a process that is centered around a stationary process, but is actually nonstationary. For purposes of applications, we have, given (ordered) timepoints $t_1 < \cdots < t_m$, a distribution on the space of all $m$-variate distributions of $(\theta(t_1), \ldots, \theta(t_m))$ with mean being the $m$-variate distribution implied by a standard time series.
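A truncated draw from the process $\{P_t\}$ in (7), with stationary AR(1) sample paths playing the role of the atom processes, can be sketched as follows (all numerical settings are illustrative). The conditional mean varies with $t$ even though the base process is stationary, illustrating the conditional nonstationarity just noted:

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1_path(n_time, rho=0.8, sd=1.0):
    """One stationary AR(1) sample path: an atom process theta_k."""
    z = np.empty(n_time)
    z[0] = sd * rng.standard_normal() / np.sqrt(1 - rho ** 2)  # stationary start
    for t in range(1, n_time):
        z[t] = rho * z[t - 1] + sd * rng.standard_normal()
    return z

def ddp_draw(alpha=1.0, n_time=100, n_atoms=200):
    """Truncated draw of {P_t}: shared stick-breaking weights across t,
    AR(1) sample paths as atoms (cf. the spatial construction of
    Gelfand, Kottas and MacEachern)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    weights = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))
    paths = np.stack([ar1_path(n_time) for _ in range(n_atoms)])
    return weights, paths      # P_t puts mass weights[k] on paths[k, t]

w, paths = ddp_draw()
cond_mean = w @ paths          # E{theta(t) | P_t}: varies with t
```

Because the weights do not depend on $t$, consistency of the finite-dimensional distributions is inherited directly from the atom processes, exactly as argued above.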
The development of our nonstationary temporal process here technically resembles that of a similar spatial process in Gelfand, Kottas and MacEachern (2005), but differs from the latter in that it is actually embedded in the model for the observed fMRI signals. As a result, the full conditional distributions of ’s in our model are much more general and complicated than similar derivations following Gelfand, Kottas and MacEachern (2005). Another important difference between our approach and that of Gelfand, Kottas and MacEachern (2005) is that the latter had to introduce a pure error (“nugget”) process to avoid discreteness of the distribution of their spatial data. Such discreteness of the distribution (of our temporal data) is naturally avoided here, however, owing to the embedding approach used in our modeling. Gelfand, Kottas and MacEachern (2005) also rely on the availability of replications of the spatial dataset: our embedding approach obviates this requirement by merely assuming the availability of replicated (unobserved) random processes. We now introduce our dynamic effective connectivity model.
2.2 A Dirichlet processbased dynamic effective connectivity model
2.2.1 Hierarchical modeling
For $1 \le i, j \le r$, define the $n$-component vectors $\boldsymbol\gamma_{ij} = (\gamma_{ij}(1), \ldots, \gamma_{ij}(n))'$. Further, let the $\boldsymbol\gamma_{ij}$ be i.i.d. $P$, where $P \sim DP(\alpha, G_0)$, with $\alpha$ denoting the scale parameter quantifying uncertainty in the base prior distribution $G_0$. Also, assume that under $G_0$, $\gamma_{ij}(1) \sim N(\mu_0, \sigma^2_0)$ and, for $t = 2, \ldots, n$, $\gamma_{ij}(t) = \mu_\gamma + \rho\{\gamma_{ij}(t-1) - \mu_\gamma\} + \delta_{ij}(t)$, where $|\rho| < 1$ and the $\delta_{ij}(t)$ are i.i.d. $N(0, \sigma^2_\delta)$ for $t = 2, \ldots, n$. It follows that, under $G_0$, $\boldsymbol\gamma_{ij} \sim N_n(\mathbf{m}, \Sigma)$, where $\mathbf{m}$ has $t$th element $\mu_\gamma + \rho^{t-1}(\mu_0 - \mu_\gamma)$ and, for $1 \le s \le t \le n$, $\Sigma$ has $(s,t)$th element
(8)  $\sigma_{st} = \rho^{t-s}\Bigl\{\rho^{2(s-1)}\sigma^2_0 + \sigma^2_\delta\,\frac{1 - \rho^{2(s-1)}}{1 - \rho^2}\Bigr\}.$
Note that with $G_0$ as described above, the process is stationary if we choose $\sigma^2_0 = \sigma^2_\delta/(1 - \rho^2)$ and $\mu_0 = \mu_\gamma$; otherwise the process converges to stationarity for large $t$. In other words, under $G_0$, $E\{\gamma_{ij}(t)\} - \mu_\gamma = \rho^{t-1}(\mu_0 - \mu_\gamma)$, which converges to 0 as $t \to \infty$, while from (8) it follows that, as $s, t \to \infty$ with $t - s$ held fixed, $\sigma_{st} \to \rho^{t-s}\sigma^2_\delta/(1 - \rho^2)$. The case for the variance is similar. Using the above developments, we specify our dynamic effective connectivity model hierarchically, by augmenting (1) and (2) with the following model for the $\gamma_{ij}(t)$:
$\boldsymbol\gamma_{ij} \overset{\mathrm{i.i.d.}}{\sim} P, \quad 1 \le i, j \le r; \qquad P \sim DP(\alpha, G_0).$
Distributional assumptions on the $\varepsilon_i(t)$, $\eta_i(t)$ and $\lambda_i(t)$ are as in Section 1.2.1. We use $\mathcal{M}_{DP}$ to refer to this model; note also that as $\alpha \to \infty$, our DP-based model converges to the AR(1) model, which we denote by $\mathcal{M}_{AR}$. We note in closing that the effective connectivity parameters are AR(1), hence VAR, under the expected distribution $G_0$ of $P$. Of course, they are trivially also so under $\mathcal{M}_{AR}$. Note, however, that given a realization of a random distribution $P$ from the Dirichlet process, such a VAR representation does not hold.
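The role of the precision parameter in this convergence can be checked empirically: as it grows, the stick-breaking weights spread over ever more atoms, so the random measure concentrates near its base measure. The sketch below summarizes a draw by its largest weight; all numerical settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def dp_weights(alpha, n_atoms=2000):
    """Truncated stick-breaking weights of a DP(alpha, G0) draw."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    return v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))

# Largest single-atom mass for increasing precision: a rough summary of how
# far the random measure can sit from the base measure G0.
max_w = {a: float(dp_weights(a).max()) for a in (0.5, 5.0, 500.0)}
```

Small precision values allow a single atom to carry most of the mass (a strongly nonstationary realization); very large values spread mass thinly, recovering behavior close to the AR(1) base model.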
2.2.2 Other prior distributions
We specify independent prior distributions on each of the remaining parameters and hyperparameters. Specifically, the $\mu_i$ are assumed to be i.i.d. Gaussian for $i = 1, \ldots, r$. Also, $\mu_\gamma$ is normally distributed with given mean and variance, $\rho$ is uniformly distributed in $(-1, 1)$, and the precisions $\sigma^{-2}_{\varepsilon}$, $\sigma^{-2}_{\eta}$ and $\sigma^{-2}_{\delta}$ each have densities of the Gamma functional form. Here the Gaussian prior moments, the Gamma shape and rate parameters, and $a_\alpha$ and $b_\alpha$ (the shape and rate of the Gamma prior on $\alpha$) are all hyperparameters. In our examples, we take the Gamma shape and rate parameters for the precisions equal to 0, reflecting our ignorance of these unknown parameters. Although the Gamma priors with zero shape and rate are improper, they yielded proper posteriors in our case, vindicated by fast convergence of the corresponding marginal chains and resulting right-skewed posterior density estimates, which are expected of proper posteriors having positive support. For $\alpha$ we first fix the expected value (given by $a_\alpha/b_\alpha$) to be such that, in the full conditional distribution of $\boldsymbol\gamma_{ij}$, given by (9), the “expected” probability of simulating a new realization from the “prior” base measure approximately equals the probability of selecting one of the existing realizations $\boldsymbol\gamma_{i'j'}$, for some $(i', j') \ne (i, j)$. Hence, if there are $m$ nonzero $\boldsymbol\gamma_{ij}$ in the model, then setting $a_\alpha/b_\alpha = m - 1$ serves the purpose. The resulting prior distribution has variance equal to its expectation if $b_\alpha = 1$. To achieve large variance, we set $b_\alpha < 1$; the associated prior worked well in our examples. We also experimented with other choices of $b_\alpha$ and noted that while one choice provided the best results (see Tables S1 and S2), inferences related to the posterior distributions of the observed data were fairly robust with respect to different choices of $b_\alpha$. Moreover, the results demonstrate that, in terms of the percentage of inclusion of the true $\gamma_{ij}(t)$, all inclusion percentages, with two exceptions, were quite robust with respect to $b_\alpha$. Further, the other hyperparameters were estimated empirically from the data, as in Bhattacharya, Ho and Purkayastha (2006), using Berger's (1985) ML-II approach.
2.2.3 Full conditional distributions
The posterior distribution of the parameters is specified by their full conditionals, which are needed for Gibbs sampling. The full conditional distributions of the baseline coefficients, the activation coefficients and the error variances are of standard form (see Section S1.1), while those of the $\boldsymbol\gamma_{ij}$ require some careful derivation. To describe these, note that, on integrating out $P$, the prior conditional distribution of $\boldsymbol\gamma_{ij}$, given $\{\boldsymbol\gamma_{i'j'}; (i',j') \ne (i,j)\}$, follows a Polya urn scheme, and is given by
(9)  $[\boldsymbol\gamma_{ij} \mid \{\boldsymbol\gamma_{i'j'}; (i',j') \ne (i,j)\}] \sim \frac{\alpha}{\alpha + m - 1}\,G_0 + \frac{1}{\alpha + m - 1}\sum_{(i',j') \ne (i,j)} \delta_{\boldsymbol\gamma_{i'j'}}$, where $m$ denotes the number of nonzero connectivity vectors and $\delta_{\mathbf{z}}$ denotes point mass at $\mathbf{z}$.
The above Polya urn scheme shows that marginalization with respect to $P$ induces dependence among the $\boldsymbol\gamma_{ij}$ in the form of clusterings, while maintaining the same (asymptotically stationary) marginal $G_0$ for each. For Gibbs sampling we need to combine (9) with the rest of the model to obtain the full conditional distribution of each $\boldsymbol\gamma_{ij}$, given all the other parameters and the data. We obtain the full conditionals by first defining, for each pair $(i,j)$, $n \times n$ diagonal matrices whose diagonal elements collect the regression weights implied by (2), along with $n$-variate vectors, with first element equal to zero, whose remaining elements collect the corresponding lagged cross-terms. Further, we note that, thanks to conditional independence, it is only necessary to combine (9) with (2) to obtain the required full conditionals. It follows that
(10)  $[\boldsymbol\gamma_{ij} \mid \cdots] \sim q_{0}\,G^{*}_{ij} + \sum_{(i',j') \ne (i,j)} q_{i'j'}\,\delta_{\boldsymbol\gamma_{i'j'}}$,
where $G^{*}_{ij} = N_n(\boldsymbol\mu^{*}_{ij}, \Sigma^{*}_{ij})$ is the $n$-variate normal distribution with mean $\boldsymbol\mu^{*}_{ij}$ and variance $\Sigma^{*}_{ij}$, obtained by updating $G_0$ with the Gaussian likelihood terms of (2) using the diagonal matrices and vectors defined above. Also,
(12)  $q_{i'j'}$ is proportional to the $n$-variate Gaussian density implied by (2), evaluated at $\boldsymbol\gamma_{ij} = \boldsymbol\gamma_{i'j'}$, and $q_0$ is proportional to $\alpha$ times the corresponding marginal density under $G_0$,
for $(i',j') \ne (i,j)$, with the weights chosen to satisfy $q_0 + \sum_{(i',j') \ne (i,j)} q_{i'j'} = 1$. Observe that, unlike all DP-based approaches hitherto considered in the statistics literature, in our case the conditional posterior base measure $G^{*}_{ij}$ is not independent of $\boldsymbol\gamma_{i'j'}$ for $(i',j') \ne (i,j)$, which is a consequence of the fact that, thanks to (2), the $\boldsymbol\gamma_{ij}$ are not conditionally independent of each other. Thus, our methodology generalizes other DP-based methods, including that of Gelfand, Kottas and MacEachern (2005).
Section S1.2 presents an alternative algorithm for updating the $\boldsymbol\gamma_{ij}$ using configuration indicators, which are updated sequentially using themselves and only the distinct $\boldsymbol\gamma_{ij}$, given everything else. MacEachern (1994) has argued that such an updating procedure theoretically improves convergence properties of the Markov chain; however, Section S1.3 shows that in our case the associated conditional distributions need to be obtained separately for each of the possible configuration indicators. This being infeasible, we recommend (10) for updating the $\boldsymbol\gamma_{ij}$. [We remark here that full conditionals are easily obtained using configuration indicators in the case of Gelfand, Kottas and MacEachern (2005), thanks to the relative simplicity of their spatial problem.] Also, (2.2.3) and (12) imply that as $\alpha \to \infty$, the full conditional distribution (10) converges to $G^{*}_{ij}$, which is actually the full conditional distribution of the entire $n$-dimensional parameter vector $\boldsymbol\gamma_{ij}$ under the AR(1) model. In either case, we provide computationally efficient multivariate updates for our Gibbs steps: this makes our problem computationally tractable.
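The clustering behavior induced by the Polya urn scheme (9) can be illustrated with a direct sequential simulation. The sketch below is the generic DP urn, not the paper's full conditional (which also involves likelihood-dependent weights); the precision and base measure are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def polya_urn_draws(alpha, base_sampler, n):
    """Sequential draws theta_1, ..., theta_n from the Polya urn implied by
    marginalizing a DP(alpha, G0): draw i equals an earlier draw with
    probability proportional to 1, or a fresh G0 draw with probability
    proportional to alpha."""
    draws = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            draws.append(base_sampler())            # new cluster from G0
        else:
            draws.append(draws[rng.integers(i)])    # copy an existing draw
    return draws

theta = polya_urn_draws(2.0, lambda: rng.standard_normal(), 50)
n_clusters = len(set(theta))                        # ties define the clusters
```

The number of distinct values grows only logarithmically with the number of draws, which is the clustering effect exploited by the model.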
To obtain the full conditional of $\alpha$, let $k$ denote the number of distinct $\boldsymbol\gamma_{ij}$. Then, note that, as in Escobar and West (1995), for a $\mathrm{Gamma}(a_\alpha, b_\alpha)$ prior on $\alpha$, the full conditional distribution of the latter, given $k$ and another continuous random variable $\eta$, is a mixture of two Gamma distributions, specifically $\pi_\eta\,\mathrm{Gamma}(a_\alpha + k, b_\alpha - \log\eta) + (1 - \pi_\eta)\,\mathrm{Gamma}(a_\alpha + k - 1, b_\alpha - \log\eta)$, where $\pi_\eta/(1 - \pi_\eta) = (a_\alpha + k - 1)/\{m(b_\alpha - \log\eta)\}$. Also, the full conditional of $\eta$ is $\mathrm{Beta}(\alpha + 1, m)$. Finally, the full conditional distributions of $\rho$ and the other base-measure parameters are not very standard and need careful derivation. Section S1.4 describes a Gibbs sampling approach using configuration sets for updating them. For implementing this Gibbs step, one does not need to simulate the configuration indicators, as they can be determined after simulating the $\boldsymbol\gamma_{ij}$ using (10). Hence, this step is feasible. However, we failed to achieve sufficiently good convergence with this approach, and hence used a Metropolis–Hastings step. The acceptance ratio for the Metropolis–Hastings step is given by the ratio of the products of the prior densities and the Polya urn factors, following easily from (9), evaluated, respectively, at the new and the old values of the parameters. Once again, note the use of multivariate updates in the MCMC steps, making our updating approach computationally feasible and easily implemented.
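The Escobar–West update for the precision parameter can be sketched as follows; the Gamma hyperparameters, cluster count and number of draws are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

def update_alpha(alpha, k, n, a=2.0, b=2.0):
    """One Escobar-West (1995) Gibbs update for the DP precision alpha,
    given k distinct clusters among n draws and a Gamma(a, b) prior
    (shape, rate).  a and b are illustrative hyperparameters.
    Draws auxiliary eta ~ Beta(alpha + 1, n), then alpha from a
    two-component Gamma mixture."""
    eta = rng.beta(alpha + 1.0, n)
    odds = (a + k - 1.0) / (n * (b - np.log(eta)))
    pi_eta = odds / (1.0 + odds)
    rate = b - np.log(eta)                     # numpy uses scale = 1 / rate
    if rng.random() < pi_eta:
        return rng.gamma(a + k, 1.0 / rate)
    return rng.gamma(a + k - 1.0, 1.0 / rate)

alpha = 1.0
for _ in range(100):                           # a short chain for alpha alone
    alpha = update_alpha(alpha, k=5, n=9)
```

In the model above, n would be the number of nonzero connectivity vectors and k the number of distinct ones at the current iteration.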
We conclude this section by noting that our model is structured to be identifiable. The priors of the various parameter blocks are all different and informative. Further, (2) shows that the model is not permutation-invariant with respect to the indices of the $\boldsymbol\gamma_{ij}$. Identifiability of our model is further supported by the results in this paper, which show all posteriors (based on MCMC) to be distinct and different. This is unlike the case of the usual Dirichlet process-based mixture models, which are permutation-invariant, as in Escobar and West (1995), where the parameters have the same posterior due to nonidentifiability. We now investigate the performance of our methodology.
3 Simulation studies
We performed a range of simulation experiments to investigate the performance of our approach relative to its alternatives. Since there are 9 nonzero $\boldsymbol\gamma_{ij}$ in our model, we followed the recipe provided in Section 2 and put a Gamma prior with expectation 8 on the DP scale parameter $\alpha$. We investigated fitting $\mathcal{M}_{DP}$, $\mathcal{M}_{AR}$ and $\mathcal{M}_{RW}$ to the simulated data of Section 1.2.2 and also to data simulated from the $\mathcal{M}_{DP}$ and $\mathcal{M}_{AR}$ models, the latter with both a small value of $\rho$ (clearly stationary model) and an intermediate value (where the model is not so clearly distinguished from nonstationarity, but more clearly distinguished than in Section 1.2.2). The Gibbs sampling procedure for model $\mathcal{M}_{AR}$ in our simulations was very similar to that of $\mathcal{M}_{RW}$ detailed in Bhattacharya, Ho and Purkayastha (2006): we omit details. For all experiments in this paper and in the supplement, we discarded the first 10,000 MCMC iterations as burn-in and stored the following 20,000 iterations for Bayesian inference. Our results are summarized here for want of space, but presented in detail in Section S2, with performance evaluated graphically [in terms of the posterior densities of the $\gamma_{ij}(t)$] and numerically using coverage and average lengths of the 95% HPD credible intervals of the posterior predictive distributions (for details, see Section S2).
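The 95% HPD intervals used in these evaluations can be approximated from MCMC output with a standard shortest-interval computation (valid for unimodal posteriors); the draws below are synthetic:

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws: a
    sample-based approximation to the highest posterior density interval."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    m = int(np.ceil(prob * n))               # points the interval must contain
    widths = x[m - 1:] - x[: n - m + 1]      # width of every candidate window
    j = int(np.argmin(widths))               # narrowest window wins
    return x[j], x[j + m - 1]

draws = np.random.default_rng(7).standard_normal(20000)
lo, hi = hpd_interval(draws)
```

For a symmetric unimodal posterior the HPD interval coincides with the equal-tailed interval; for skewed posteriors, such as those of the variance parameters, it is strictly shorter.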
The results of our experiments using the simulated data of Section 1.2.2 showed that $\mathcal{M}_{AR}$ performed better than $\mathcal{M}_{RW}$, but model $\mathcal{M}_{DP}$ was the clear winner. Indeed, the supports of the posterior distributions of the $\gamma_{ij}(t)$ obtained using $\mathcal{M}_{RW}$ and $\mathcal{M}_{AR}$ were much too wide to be of much use, but substantially narrower under $\mathcal{M}_{DP}$. $\mathcal{M}_{DP}$ also outperformed the other two models in terms of the proportion of true $\gamma_{ij}(t)$ included in the corresponding 95% HPD CIs. These CIs also captured almost all of the true values under $\mathcal{M}_{DP}$, but far fewer values using $\mathcal{M}_{RW}$. $\mathcal{M}_{DP}$ also exhibited better predictive performance than $\mathcal{M}_{AR}$ and $\mathcal{M}_{RW}$. All these findings which favor our DP-based model were implicitly the consequence of the fact that the true model in our experiment was approximately nonstationary, and modeled more flexibly by our nonstationary DP model rather than the stationary AR(1) model. That this borderline between stationarity and nonstationarity of the true model is important was vindicated by the results of fitting $\mathcal{M}_{DP}$, $\mathcal{M}_{AR}$ and $\mathcal{M}_{RW}$ on the dataset simulated using $\mathcal{M}_{AR}$ with the intermediate $\rho$. Here, $\mathcal{M}_{AR}$ outperformed both $\mathcal{M}_{DP}$ and $\mathcal{M}_{RW}$ in terms of coverage of the true values of the $\gamma_{ij}(t)$, indicating that $\mathcal{M}_{DP}$ may underperform when compared to the true model, in terms of coverage of parameter values, when the true model can be clearly identified. In terms of prediction ability, however, $\mathcal{M}_{DP}$ was still the best performer, with the best coverage of the data points by the posterior predictive distribution and the shortest lengths of the associated 95% CIs. This finding was not unexpected, since $\mathcal{M}_{DP}$ involves model averaging (see Section S1.5), which improves predictive performance [see, e.g., Kass and Raftery (1995)]. For the dataset simulated from $\mathcal{M}_{AR}$ with the small value of $\rho$, the true model ($\mathcal{M}_{AR}$) outperformed $\mathcal{M}_{DP}$ marginally and $\mathcal{M}_{RW}$ substantially, while for the intermediate value of $\rho$, $\mathcal{M}_{AR}$ provided a much better fit than $\mathcal{M}_{DP}$ or $\mathcal{M}_{RW}$. We have already mentioned that $\mathcal{M}_{AR}$ outperformed $\mathcal{M}_{DP}$ (and $\mathcal{M}_{RW}$) for the borderline case of the intermediate $\rho$: the experiment with the small $\rho$ demonstrated good performance of $\mathcal{M}_{AR}$ even in relatively more distinguishable situations.
At the same time, the experiment with warns against overoptimism regarding ; for clearly stationary data, we are at least marginally better off replacing with a stationary model such as . In spite of this caveat for clearly stationary situations, our simulation experiments indicated that our DP-based approach is flexible enough to handle stationary models as well as deviations from stationarity. We now analyze the Stroop task dataset introduced in Section 1.1.
4 Application to Stroop task data
The dataset was preprocessed following Ho, Ombao and Shumway (2005) and Bhattacharya, Ho and Purkayastha (2006), to which we refer for details, providing only a brief summary here. For each of the three regions (LG, MOG and DLPFC), a spherical region of 33 voxels was drawn around the location of peak activation. The voxelwise time series of the selected voxels in each region were then subjected to higher-order (multilinear) singular value decomposition (HOSVD) using methods in Lathauwer, Moor and Vandewalle (2000). The first mode of this HOSVD, after detrending with a running-line smoother as in Marchini and Ripley (2000), provided us with our detrended time series response for the th region [see Figure S4 for ’s as well as ].
We compared results obtained using with those using and . We refer to Bhattacharya, Ho and Purkayastha (2006) and the supplement for detailed results using and , respectively, only summarizing them here in comparison with results obtained using , which we also discuss in greater detail here. Detailed studies on MCMC convergence are in Section S3.2.
4.1 Results
Figure 2 displays the Gibbs-estimated marginal posterior distributions of the ’s for each time point obtained using . A striking feature of the marginal posterior densities in Figure 2 is the very strong oscillatory nature of these effective connectivity parameters with the modeled BOLD response . This is quite different from the posterior distributions of ’s obtained using (see Figure S7). Table 2 evaluates the performance of the two models in terms of the length and proportion of observations contained in the 95% HPD credible intervals of the posterior predictive distributions: the intervals obtained using have greater coverage but are also much narrower, making it by far the better choice among the models. Figure 2 also shows that and —and, to a lesser extent, and —oscillate differently from the others in that their amplitude is close to zero. We examined this issue further through
Proportions  Average length  

0.92  0.99  4,960.9  2,215.1  
1.00  1.00  3,864.2  2,068.1  
1.00  1.00  4,352.8  2,084.3 
Figure 3, which provides a map of the proportions of the cases for which each estimated marginal posterior density of has positive support at time . The intensities are mapped via a red-blue diverging palette: thus, darker hues of blue and red indicate high and low values, respectively, for the proportions, while lighter hues of red or blue indicate intermediate values. Clearly, only a small proportion of the marginal density lies on either the positive or the negative part of the real line for the cases of , and . We therefore investigated the performance of models modified to exclude some or all of these regional influences.
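The quantity mapped in Figure 3 can be sketched as follows; the array shapes, names and synthetic draws here are illustrative assumptions, not the actual sampler output. For each connectivity parameter and time point, one computes the proportion of posterior draws that are positive: values near 1 (or 0) mean the marginal posterior sits almost entirely on the positive (or negative) half-line, while values near 0.5 mean the density straddles zero.

```python
import numpy as np

# Synthetic stand-in: posterior draws of one effective connectivity
# parameter over n_time time points, with an oscillating posterior mean
# (mimicking the oscillatory densities of Figure 2).
rng = np.random.default_rng(1)
n_draws, n_time = 20_000, 100
post_mean = np.sin(np.linspace(0, 4 * np.pi, n_time))
draws = rng.normal(loc=post_mean, scale=0.3, size=(n_draws, n_time))

# Proportion of positive posterior mass at each time point: this is the
# intensity that a red-blue diverging palette would display.
prop_positive = (draws > 0).mean(axis=0)
```

Parameters whose `prop_positive` stays near 0.5 at all time points are the candidates for exclusion considered next.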
4.1.1 Investigating restricted submodels of
Bhattacharya, Ho and Purkayastha (2006) found that the model with the constraint (henceforth ) provided better results than the unconstrained . Figure 2 also points to the possibility that models with some might provide better performance. We explored these aspects quantitatively using the models and , by considering the proportion of data contained in, and the average lengths of, the 95% HPD CIs of the corresponding posterior predictive distributions of . A systematic evaluation of all possible submodels is computationally very time-consuming, so we investigated models with combinations of as in Bhattacharya, Ho and Purkayastha (2006) and with null ’s for those ’s whose posterior distributions exhibited a smaller amplitude of oscillation as per Figure 2. Table 3 summarizes the performances of the top three submodels; others are in Tables S11 and S12. The top three performers were the following:

: but with .

: but with .

: but with .
Thus, and both beat (of Table 2). The average 95% posterior predictive length using is about midway between and the unrestricted DP-based model, so we report our final findings and conclusions only using .
Proportion  Mean length  

0.99  0.99  1.0  2,097.6  2,140.4  2,276.5  
1.0  1.0  1.0  1,971.6  2,019.5  2,127.8  
1.0  1.0  1.0  1,985.0  2,021.3  2,125.4 
4.2 Summary of findings
Figure 4(a)–(h) display the posterior densities of the nonnull regional influences ’s over time. These ’s are very similar to those in Figure 2(a)–(h), with nonzero effective connectivity parameters again exhibiting a very pronounced oscillation synchronous with the modeled BOLD response: indeed, only the of Figure 4(f) has an oscillation slightly more damped than in Figure 2. Further, Figure 4(i) indicates that the estimated posterior densities put most of their mass either below zero [when is negative] or above zero [when is positive]. Indeed, these densities have substantial mass around zero only when is around zero. We also smoothed the modeled BOLD response to explore further its relationship with each of the estimated posterior mean ’s from . For each , we specified where are i.i.d. , is the amplitude of the time series, is the oscillation frequency and is a phase shift. Equivalently, with and . We obtain using the periodogram approach [see, e.g., Shumway and Stoffer (2006)]. Thus, each cycle in has a length of about 50 time points. A least squares fit yields and, hence, and . Figure S8 shows that the smoothed BOLD response closely approximates the original time series . The correlations of with each of , , , , , , and are 0.959, 0.909, 0.952, 0.950, 0.922, 0.874, 0.949 and 0.929, respectively. Thus, the ’s are not completely linear in the BOLD response, but are very close to being so with regard to its transformed version.
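The smoothing step described above can be sketched as follows. This is a minimal illustration, assuming a synthetic series standing in for the modeled BOLD response: the dominant frequency is read off the periodogram, and the amplitude and phase are then recovered by least squares on a cosine/sine design, since a single sinusoid with unknown phase is linear in its cosine and sine components.

```python
import numpy as np

# Fit mu_t = beta1*cos(2*pi*w*t) + beta2*sin(2*pi*w*t) to a series v,
# recovering amplitude A = sqrt(beta1^2 + beta2^2) and phase
# phi = atan2(-beta2, beta1). The series below is a synthetic stand-in
# with one cycle per 50 time points, echoing the fitted cycle length.
rng = np.random.default_rng(2)
n = 300
t = np.arange(n)
true_w = 1.0 / 50.0
v = 3.0 * np.cos(2 * np.pi * true_w * t + 0.7) + rng.normal(0, 0.5, n)

# Periodogram: pick the Fourier frequency with the largest power,
# skipping the zero frequency (the mean).
freqs = np.fft.rfftfreq(n, d=1.0)
power = np.abs(np.fft.rfft(v - v.mean())) ** 2
w_hat = freqs[1:][np.argmax(power[1:])]

# Least squares on the cosine/sine design at the estimated frequency.
X = np.column_stack([np.cos(2 * np.pi * w_hat * t),
                     np.sin(2 * np.pi * w_hat * t)])
beta, *_ = np.linalg.lstsq(X, v, rcond=None)
amplitude = np.hypot(beta[0], beta[1])
phase = np.arctan2(-beta[1], beta[0])
cycle_length = 1.0 / w_hat
```

The cosine/sine reparameterization is what makes the fit a linear least squares problem once the frequency is fixed; only the frequency itself requires the periodogram search.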
The results of our analysis indicate that the region LG, centered around zero, exhibits very strong evidence of self-feedback, oscillatory with high amplitude and a period of about 50, matching the period of the modeled BOLD response . Similar influences are exerted by both MOG and DLPFC on LG and by the MOG region on itself. Indeed, Figure 4 indicates that these four inter- and intraregional influences have, broadly, a similar pattern in terms of amplitude. The influences of LG on MOG and DLPFC are smaller and similar to each other. Further, Figure 4(f) and (h) indicate that the feedback provided by DLPFC on MOG [] is similar to that in the reverse direction []. Thus, there are three broad patterns in the way that inter- and intraregional influences occur.
Our analysis also demonstrates the existence of the ACN and its mechanism while performing a Stroop task. Thus, the executive control system (DLPFC) provides instruction to both the task-irrelevant (LG) and task-relevant (MOG) processing sites but gets similar levels of feedback from the task-relevant processor (MOG). LG, which sifts out the task-irrelevant color information, gets a lot of feedback in doing so from both itself and MOG. However, it provides far less feedback to the task-relevant, shape-information-processing MOG and the executive control DLPFC. MOG itself provides substantial self-feedback while processing shape information. Finally, note that while our results indicate higher amplitudes for interregional feedback involving ’s when they involve LG rather than MOG, this is consistent with the established notion that processing shape information is a higher-level (more difficult) cognitive function than distinguishing color.
The results on the effective connectivity parameters using are very different from those obtained using [see Figure 5 of Bhattacharya, Ho and Purkayastha (2006)] or . Using , Bhattacharya, Ho and Purkayastha (2006) found some evidence of self-feedback only in LG: the 95% HPD BCRs contained zero unless increased. Further, while the relationship of the posterior mean appeared somewhat linear in , there was no relationship with the modeled BOLD response. Most ’s [with the exception of ] were almost invariant with respect to time , unlike the clearly oscillatory nature of the time series obtained here using (or even ). The fact that the BOLD response had very little relationship with these effective connectivity parameters is perplexing, given that these regions were the ones found to be activated in the preprocessing of the fMRI dataset. The results on ’s using were also very surprising: while the posterior means oscillated synchronously with only for the task-irrelevant LG, with a correlation of 0.943, there was no evidence of nonzero values for any of the other effective connectivity parameters (including those involving the task-relevant MOG), since their pointwise 95% HPD credible regions all contained zero for all time . This is very unlike the results obtained using , which also established the existence of the ACN in performing this task. Indeed, among all the approaches considered in the literature and here on this dataset, only the DP-based analyses have been able to capture both the dynamic as well as the oscillatory nature of the effective connectivity parameters. In doing so, we also obtain further insight into how an individual brain performs a Stroop task.
5 Conclusions and future work
Effective connectivity analysis provides an important approach to understanding the functional organization of the human brain. Bhattacharya, Ho and Purkayastha (2006) provide a coherent and elegant Bayesian approach to incorporating uncertainty in the analysis. In this paper we note that this approach also brings with it some limitations, and we therefore propose a nonstationary and nonparametric Bayesian approach using a DP-based model that embeds an AR(1) process in a class of many possible models. Heuristically, our suggestion has some connection with model averaging, where we have, a priori, an AR(1) model in mind for specifying dynamic effective connectivity: the DP provides a coherent way to formalize this intuition. We have also derived an easily implemented Gibbs sampling algorithm for learning about the posterior distributions of all the unknown quantities. Simulation studies show that our model is a better candidate for the analysis of effective connectivity in many cases, with the advantage more pronounced with increasing departures from stationarity in the true model. We also applied our methodology to investigate the feedback mechanisms between the task-irrelevant LG, the task-relevant MOG and the “executive control” DLPFC in the context of a single-subject Stroop task study. Our results showed strong self-feedback for LG and MOG, but not for DLPFC. Further, MOG and DLPFC influence LG strongly, but the reverse influence is rather mild. The influences of MOG on DLPFC and vice versa are very similar. All these discovered feedback mechanisms oscillate strongly in the manner of the BOLD signal and are supportive of the framework postulated by ACN theory. Our analysis thus provides insight into the mechanism by which the brain performs a Stroop task. All these are novel findings not previously reported in the context of fMRI analysis in the literature.
Thus, adoption of our DPbased approach not only provided interpretable results, but—as very kindly pointed out by a reviewer—yielded additional insights into the workings of the brain.
There are several aspects of our methodology and analysis that deserve further attention. For one, we have investigated the ACN in the context of a Stroop task for a single male volunteer. It would be of interest to study other tasks and responses to other stimuli, to see how our results on a Stroop task translate to multiple subjects, and to investigate how these mechanisms differ from one person to another. Our modeling approach can easily be extended to incorporate such scenarios. Further, our methodology, while developed and evaluated in the context of modeling dynamic effective connectivity in fMRI datasets, can be applied to other settings as well, especially in situations where the actual models for the unknowns may be quite difficult to specify correctly. Thus, we note that while this paper has made an interesting contribution to analyzing dynamic effective connectivity in single-subject fMRI datasets, several interesting questions and extensions meriting further attention remain.
Contents \slink[doi]10.1214/11AOAS470SUPP \slink[url]http://lib.stat.cmu.edu/aoas/470/supplement.pdf \sdatatype.pdf \sdescriptionSection S1 contains additional details regarding our methodology, including explicit forms of the full conditional distributions of specific parameters, the configuration indicators and the distinct parameters associated with the Dirichlet process needed for Gibbs sampling. Detailed arguments that show the model averaging associated with our DP-based model are also presented there. Section S2 provides additional information on our simulation experiments, including associated methodology and results. Section S3 presents further details on the analysis of the Stroop task experiment, including a display of the data, a detailed assessment of convergence of our MCMC samplers when using and and MCMC-based posterior analysis using and other additional models obtained by setting some effective connectivity parameters to zero. Additional methodological details and results regarding the smoothing of the modeled BOLD signal are also presented there.
Acknowledgments
The authors are very grateful to the Editor and two reviewers, whose very detailed and insightful comments on earlier versions of this manuscript greatly improved its content and presentation.
References
 Aertsen and Preißl (1991) {binproceedings}[author] \bauthor\bsnmAertsen, \bfnmA.\binitsA. \AND\bauthor\bsnmPreißl, \bfnmH.\binitsH. (\byear1991). \btitleDynamics of activity and connectivity in physiological neuronal networks. In \bbooktitleNonlinear Dynamics and Neuronal Networks (\beditor\bfnmH. G.\binitsH. G. \bsnmSchuster, ed.) \bpages281–302. \bpublisherVCH, \baddressNew York. \endbibitem
 Banach et al. (2000) {barticle}[author] \bauthor\bsnmBanach, \bfnmM. T.\binitsM. T., \bauthor\bsnmMilham, \bfnmM. P.\binitsM. P., \bauthor\bsnmAtchley, \bfnmR.\binitsR., \bauthor\bsnmCohen, \bfnmN. J.\binitsN. J., \bauthor\bsnmWebb, \bfnmA.\binitsA., \bauthor\bsnmWszalek, \bfnmT.\binitsT., \bauthor\bsnmKramer, \bfnmA. F.\binitsA. F., \bauthor\bsnmLiang, \bfnmZ. P.\binitsZ. P., \bauthor\bsnmWright, \bfnmA.\binitsA., \bauthor\bsnmShenker, \bfnmJ.\binitsJ. \AND\bauthor\bsnmMagin, \bfnmR.\binitsR. (\byear2000). \btitlefMRI studies of Stroop tasks reveal unique roles of anterior and posterior brain systems in attentional selection. \bjournalJournal of Cognitive Neuroscience \bvolume12 \bpages988–1000. \endbibitem
 Berger (1985) {bbook}[author] \bauthor\bsnmBerger, \bfnmJ. O.\binitsJ. O. (\byear1985). \btitleStatistical Decision Theory and Bayesian Analysis. \bpublisherSpringer, \baddressNew York. \MR0804611 \endbibitem
 Bhattacharya, Ho and Purkayastha (2006) {barticle}[author] \bauthor\bsnmBhattacharya, \bfnmS.\binitsS., \bauthor\bsnmHo, \bfnmM. R.\binitsM. R. \AND\bauthor\bsnmPurkayastha, \bfnmS.\binitsS. (\byear2006). \btitleA Bayesian approach to modeling dynamic effective connectivity with fMRI data. \bjournalNeuroImage \bvolume30 \bpages794–812. \endbibitem