A simple and objective method for reproducible resting state network (RSN) detection in fMRI
Contents
 1 Introduction
 2 Methods
 3 Experiments and Results
 4 Group comparison of ICA results
 5 Discussion
 6 Conclusions
 A Figure Legends
Abstract
Spatial Independent Component Analysis (ICA) decomposes the time by space functional MRI (fMRI) matrix into a set of 1D basis time courses and their associated 3D spatial maps that are optimized for mutual independence. When applied to resting state fMRI (rsfMRI), ICA produces several spatial independent components (ICs) that seem to have biological relevance: the so-called resting state networks (RSNs). The ICA problem is well posed, in terms of the identifiability of the mixing matrix, when the true data generating process follows a linear mixture of ICs model. However, the contrast function used for promoting mutual independence in ICA depends on the finite amount of observed data and is potentially non-convex with multiple local minima. Hence, each run of ICA could produce different IC estimates, even for the same data. One technique to deal with this run-to-run variability of ICA was proposed by Yang et al. (2008) in their algorithm RAICAR, which allows for the selection of only those ICs that have a high run-to-run reproducibility. We propose an enhancement to the original RAICAR algorithm that enables us to assign reproducibility p-values to each IC and allows for an objective assessment of both within subject and across subjects reproducibility. We call the resulting algorithm RAICAR-N (N stands for null hypothesis test), and we have applied it to publicly available human rsfMRI data (http://www.nitrc.org). Our reproducibility analyses indicated that many of the published RSNs in the rsfMRI literature are highly reproducible. However, we found several other RSNs that are highly reproducible but not frequently listed in the literature.
Notation

Scalar variables and functions will be denoted in a non-bold font (e.g., $x$ or $f$). Vectors will be denoted in a bold font using lower case letters (e.g., $\mathbf{x}$). Matrices will be denoted in a bold font using upper case letters (e.g., $\mathbf{A}$). The transpose of a matrix $\mathbf{A}$ will be denoted by $\mathbf{A}^T$ and its inverse will be denoted by $\mathbf{A}^{-1}$. $\mathbf{I}$ will denote the identity matrix and $\mathbf{0}$ will denote a vector or matrix of all zeros whose size should be clear from context. $\binom{n}{k}$ is the number of ways of choosing $k$ objects from $n$ objects when order does not matter.

The $i$th component of vector $\mathbf{x}$ will be denoted by $x_i$ whereas the $i$th component of vector $\mathbf{x}_j$ will be denoted by $x_{ij}$. The $(i,j)$ element of matrix $\mathbf{A}$ will be denoted by $A_{ij}$ or $(\mathbf{A})_{ij}$. Estimates of variables will be denoted by putting a hat on top of the variable symbol. For example, an estimate of $\mathbf{A}$ will be denoted by $\hat{\mathbf{A}}$.

If $\mathbf{x}$ is a random vector with a multivariate Normal distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ then we will denote this distribution by $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. The joint density of vectors $\mathbf{x}$ and $\mathbf{y}$ will be denoted by $p(\mathbf{x}, \mathbf{y})$ whereas the marginal density of $\mathbf{x}$ will be denoted by $p(\mathbf{x})$. $\mathbb{E}_{\mathbf{x},\mathbf{y}}\left[f(\mathbf{x}, \mathbf{y})\right]$ denotes the expectation of $f(\mathbf{x}, \mathbf{y})$ with respect to both random variables $\mathbf{x}$ and $\mathbf{y}$.
1 Introduction
Independent component analysis (ICA) (Jutten and Herault, 1991; Comon, 1994; Bell and Sejnowski, 1995; Attias, 1999) models the observed data as a linear combination of a set of statistically independent and unobservable sources. (McKeown et al., 1998) first proposed the application of ICA to the analysis of functional magnetic resonance imaging (fMRI) data. Subsequently, ICA has been applied to fMRI both as an exploratory tool for the purpose of identifying task related components (McKeown et al., 1998) as well as a signal clean-up tool for the purpose of removing artifacts from the fMRI data (Tohka et al., 2008). Recently, it has been shown that ICA applied to resting state fMRI (rsfMRI) in healthy subjects reveals a set of biologically meaningful spatial maps of independent components (ICs) that are consistent across subjects: the so-called resting state networks (RSNs) (Beckmann et al., 2005). Hence, there is considerable interest in applying ICA to rsfMRI data in order to define the set of RSNs that characterize a particular group of human subjects, a disease, or a pharmacological effect.
Several variants of the linear ICA model have been applied to fMRI data including square ICA (with an equal number of sources and sensors) (McKeown et al., 1998), non-square ICA (with more sensors than sources) (Calhoun et al., 2001), and non-square ICA with additive Gaussian noise (noisy ICA) (Beckmann and Smith, 2004). All of these models are well known in the ICA literature (Jutten and Herault, 1991; Cardoso, 1998; Comon, 1994; Attias, 1999). Since the other ICA models are specializations of the noisy ICA model, we will assume a noisy ICA model henceforth.
Remarkably, the ICA estimation problem is well posed in terms of the identifiability of the mixing matrix, provided there are several non-Gaussian sources and at most one Gaussian source in the overall linear mixture (Rao, 1969; Comon, 1994; Theis, 2004; Davies, 2004). In the presence of more than one Gaussian source, such as in noisy ICA, the mixing matrix corresponding to the non-Gaussian part of the linear mixture is identifiable (up to permutation and scaling). In addition, the source distributions are uniquely identifiable (up to permutation and scaling) given a noisy ICA model with a particular Gaussian covariance structure, for example, the isotropic diagonal covariance. For details, see section 2.1.2.
While these uniqueness results are reassuring, a number of practical difficulties prevent the reliable estimation of ICs on real data. These difficulties include (1) true data not describable by an ICA model, (2) ICA contrast function approximations, (3) multiple local minima in the ICA contrast function, (4) confounding Gaussian noise and (5) model order overestimation. See section 2.1.3 for more details. A consequence of these difficulties is that multiple ICA runs on the same data or different subsets of the data produce different estimates of the IC realizations.
One technique to account for this run-to-run variability in ICA was proposed by (Himberg et al., 2004) in their algorithm ICASSO. Using repeated runs of ICA on bootstrapped data with various initial conditions, ICASSO clusters ICs across ICA runs using agglomerative hierarchical clustering and also helps in visualizing the estimated ICs. The logic is that reliable ICs will show up in almost all ICA runs and thus will form a tight cluster well separated from the rest. (Esposito et al., 2005) proposed a technique similar to ICASSO called self-organizing group ICA (sogICA), which allows for clustering of ICs via hierarchical clustering in across subject ICA runs. When applied to multiple ICA runs across subjects, ICASSO does not restrict the IC clusters to contain only 1 IC from each subject per ICA run. In contrast, sogICA allows the user to select the minimum number of subjects for a "group representative" IC cluster containing distinct subjects. By labelling each ICA run as a different "subject", sogICA can also be applied to analyze multiple ICA runs across subjects.
Similar in spirit to ICASSO and sogICA, Yang et al. (2008) proposed an intuitive approach called RAICAR (Ranking and Averaging Independent Component Analysis by Reproducibility) for reproducibility analysis of estimated ICs. The basic idea in RAICAR is to select as "interesting" or "stable" only those ICs which show a high run-to-run "reproducibility". RAICAR uses a simple and automated spatial cross-correlation matrix based IC alignment which has been shown to be more accurate than that of ICASSO (Yang et al., 2008). RAICAR is applicable to both within subject as well as across subjects reproducibility analysis.
A few limitations of ICASSO, sogICA and RAICAR are worth noting:

ICASSO requires the user to select the number of IC clusters and is inapplicable without modification for across subjects analysis of ICA runs since the IC clusters are not restricted to contain only 1 IC per ICA run.

sogICA requires the user to select the minimum number of subjects for a "group representative" cluster and also a cutoff on within cluster distances.

RAICAR uses an arbitrary threshold on the reproducibility indices, selected "by eye" or set at an arbitrary value, such as a fixed fraction of the maximum reproducibility value.
We propose a simple extension to RAICAR that avoids subjective user decisions and allows for an automatic reproducibility cutoff. The reproducibility indices calculated in RAICAR differ significantly in magnitude depending on whether the input to RAICAR:

(a) is generated using multiple ICA runs on the same data

(b) comes from multiple ICA runs on varying data sets (e.g. between and across subject runs)
See Figure 1 for an illustration of this effect. Obviously, the reproducibility indices are much lower in case (b) since we account for both within subject and between subjects variability in estimating ICs. Case (b) is also of great interest from a practical point of view since we are often interested in making statements about a group of subjects. Hence, it is clear that a cutoff on RAICAR reproducibility values for the purposes of selecting the "highly reproducible" components should be data dependent. In this work,

We propose a modification of the original RAICAR algorithm by introducing an explicit "null" model of no reproducibility.

We use this "null" model to automatically generate p-values for each IC via simulation. This allows for an objective cutoff specification for extracting reproducible ICs (e.g., reproducible at a chosen significance level) within and across subjects. We call the resulting algorithm RAICAR-N (N stands for "null" hypothesis test).

We validate RAICAR-N by applying it to publicly available human rsfMRI data.
2 Methods
The organization of this article revolves around the following sequence of questions, which ultimately lead to the development of RAICAR-N:

How does the original RAICAR algorithm assess reproducibility? The answer to this question in section 2.3 will set the stage for RAICAR-N.

How does RAICAR-N permit calculation of reproducibility p-values? In section 2.4, we describe the RAICAR-N "null" model and a simulation based approach for assigning p-values to ICs.

How can RAICAR-N be extended for between-group comparison of ICs and how does it compare to other approaches in the literature? This question is addressed in section 5.4.
2.1 ICA background
In this section, we provide a brief introduction to ICA along with a discussion of associated issues related to model order selection, identifiability and run-to-run variability. The noisy ICA model assumes that the observed data $\mathbf{x}$ is generated as a linear combination of unobservable independent sources $\mathbf{s}$ confounded with additive Gaussian noise $\mathbf{n}$:
(2.1) $\mathbf{x} = \mathbf{A}\mathbf{s} + \boldsymbol{\mu} + \mathbf{n}$
In this model,
(2.2) $\mathbf{x}, \boldsymbol{\mu}, \mathbf{n} \in \mathbb{R}^T$, $\mathbf{A} \in \mathbb{R}^{T \times q}$, $\mathbf{s} \in \mathbb{R}^q$, $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$, with $\mathbf{s}$ independent of $\mathbf{n}$.
If the marginal density of the $i$th source $s_i$ is $p_i(s_i)$ then the joint source density factorizes as $p(\mathbf{s}) = \prod_{i=1}^{q} p_i(s_i)$ because of the independence assumption, but is otherwise assumed to be unknown. Also, since the elements of $\mathbf{s}$ are independent, their covariance matrix $\boldsymbol{\Lambda}$ is diagonal. The set of variables $\{\mathbf{A}, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\Lambda}, p_1, \ldots, p_q\}$ represents the unknown parameters in the noisy ICA model. Before discussing the identifiability of model 2.1, we briefly discuss the choice of model order, i.e., the assumed number of ICs $q$.
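As a concrete illustration of this generative model, the following sketch (our own notation and toy dimensions, not from the paper) draws data from a noisy ICA model of the form $\mathbf{x} = \mathbf{A}\mathbf{s} + \boldsymbol{\mu} + \mathbf{n}$ with non-Gaussian (Laplace) sources:

```python
import numpy as np

# Illustrative sketch: generate data from a noisy ICA model x = A s + mu + n,
# where the q sources are independent non-Gaussian (Laplace) variables and
# n is isotropic Gaussian noise. All sizes are toy values chosen for clarity.
rng = np.random.default_rng(0)
T, q, V = 50, 3, 1000                    # time points, sources, realizations

A = rng.standard_normal((T, q))          # mixing matrix (basis time courses)
mu = rng.standard_normal(T)              # mean vector
S = rng.laplace(size=(q, V))             # independent non-Gaussian sources
sigma = 0.1
N = sigma * rng.standard_normal((T, V))  # additive Gaussian noise

X = A @ S + mu[:, None] + N              # observed time-by-realization matrix
print(X.shape)
```

Each column of `X` is one independent realization of the model; in the fMRI setting described later, columns correspond to voxels.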
2.1.1 Estimating the model order
Rigorous estimation of the model order $q$ in noisy ICA is difficult as the IC densities are unknown. This means that $p(\mathbf{x} \mid q)$, the marginal density of the observed data given the model order and the ICA parameters, cannot be derived in closed form (by integrating out the ICs) without making additional assumptions on the form of the IC densities. Consequently, standard model selection criteria such as the Bayes information criterion (BIC) (Kass and Raftery, 1993) cannot be easily applied to the noisy ICA model to estimate $q$. One solution is to use a factorial mixture of Gaussians (MOG) joint source density model as in (Attias, 1999), and use the analytical expression for $p(\mathbf{x} \mid q)$ in conjunction with BIC. This solution is quite general in terms of allowing for an arbitrary Gaussian noise covariance $\boldsymbol{\Sigma}$, but the maximization becomes computationally intractable using an expectation maximization (EM) algorithm for a large number of ICs (Attias, 1999). Another rigorous non-parametric approach for estimating $q$, applicable to the noisy ICA model with isotropic diagonal Gaussian noise covariance (i.e., with $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}$), is the random matrix theory based sequential hypothesis testing approach of Kritchman and Nadler (2009). To the best of our knowledge, these are the only two rigorous approaches for estimating $q$ in the noisy ICA model.
Approximate approaches for estimating $q$ commonly used in the fMRI literature (e.g., (Beckmann and Smith, 2004)) consist of first relaxing the isotropic diagonal noisy ICA model (with $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}$) into the probabilistic PCA (PPCA) model of (Tipping, 1999), in which the source densities are assumed to be Gaussian, i.e., $\mathbf{s} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. When using the PPCA model, it becomes possible to integrate out the Gaussian sources to get an expression for $p(\mathbf{x} \mid q)$ that can be analytically maximized (Tipping, 1999). Subsequently, methods such as BIC can be applied to estimate $q$. Alternative approaches for estimating $q$ in the PPCA model consist of the Bayesian model selection of Minka (2000), or, in data-rich situations such as fMRI, even the standard technique of cross-validation (Hastie et al., 2009).
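The PPCA-plus-BIC route can be sketched as follows. This is an illustrative implementation under our own assumptions (not the paper's code): it uses the closed-form maximized PPCA log-likelihood, which is a function of the sample covariance eigenvalues, and a standard PPCA free-parameter count.

```python
import numpy as np

# Toy data with a known number of signal components (q_true = 3).
rng = np.random.default_rng(1)
n, d, q_true = 1000, 15, 3
W = rng.standard_normal((d, q_true))
X = rng.standard_normal((n, q_true)) @ W.T + 0.1 * rng.standard_normal((n, d))

lam = np.linalg.eigvalsh(np.cov(X.T))[::-1]   # sample eigenvalues, descending

def ppca_bic(lam, n, d, q):
    # Maximized PPCA log-likelihood in terms of covariance eigenvalues:
    # noise variance is the mean of the discarded eigenvalues.
    sigma2 = lam[q:].mean()
    ll = -n / 2 * (d * np.log(2 * np.pi) + np.log(lam[:q]).sum()
                   + (d - q) * np.log(sigma2) + d)
    k = d * q - q * (q - 1) / 2 + 1 + d       # free parameters: W, sigma2, mean
    return -2 * ll + k * np.log(n)

best_q = min(range(1, d), key=lambda q: ppca_bic(lam, n, d, q))
print(best_q)
```

With a clear eigenvalue gap, as constructed here, the BIC minimum recovers the planted model order.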
From a biological point of view, it has been argued (Cole et al., 2010) that the number of extracted ICs simply reflects various equally valid views of human functional neurobiology: a smaller number of ICs represents a coarse view while a larger number of ICs represents a more fine-grained view. However, it is worth noting that from a statistical point of view, overspecification of $q$ will lead to overfitting of the ICA model, which might render the estimated ICs less generalizable across subjects. On the other hand, underspecification of $q$ will result in incomplete IC separation. Both of these scenarios are undesirable.
2.1.2 Identifiability of the noisy ICA model
To what extent is the noisy linear ICA model identifiable? Consider a potentially different decomposition of the noisy ICA model 2.1:
(2.3) $\mathbf{x} = \tilde{\mathbf{A}}\tilde{\mathbf{s}} + \tilde{\boldsymbol{\mu}} + \tilde{\mathbf{n}}$
where
(2.4) $\tilde{\mathbf{n}} \sim \mathcal{N}(\mathbf{0}, \tilde{\boldsymbol{\Sigma}})$, with $\tilde{\mathbf{s}}$ independent of $\tilde{\mathbf{n}}$ and with diagonal covariance $\tilde{\boldsymbol{\Lambda}}$.
What can be said about the equivalence between the parameterizations in 2.1 and 2.3?
Identifiability of the mean $\boldsymbol{\mu}$: Taking expectations on both sides of 2.1 and 2.3, and noting that the sources and the noise have zero mean, we get: (2.5) $\tilde{\boldsymbol{\mu}} = \boldsymbol{\mu}$.
Identifiability of the mixing matrix $\mathbf{A}$:
A fundamental decomposition result states that the noisy ICA problem is well-posed in terms of the identifiability of the mixing matrix up to permutation and scaling, provided that the components of $\mathbf{s}$ are independent and non-Gaussian (Rao, 1969; Comon, 1994; Theis, 2004; Davies, 2004). If $\mathbf{D}$ is a diagonal scaling matrix and $\mathbf{P}$ is a permutation matrix then the identifiability result can be stated as:
(2.6) $\tilde{\mathbf{A}} = \mathbf{A}\mathbf{D}\mathbf{P}$
where 2.3 is another decomposition of $\mathbf{x}$ with $\tilde{\mathbf{s}}$ containing independent and non-Gaussian components. In other words, the mixing matrix is identifiable up to permutation and scaling.
Identifiability of the noise covariance $\boldsymbol{\Sigma}$ and source covariance $\boldsymbol{\Lambda}$:
Equating the second moments of the right hand sides of 2.3 and 2.1, and noting the equality of means 2.5 and the independence of the sources and the noise, we get:
(2.7) $\mathbf{A}\boldsymbol{\Lambda}\mathbf{A}^T + \boldsymbol{\Sigma} = \tilde{\mathbf{A}}\tilde{\boldsymbol{\Lambda}}\tilde{\mathbf{A}}^T + \tilde{\boldsymbol{\Sigma}}$
Case 1: $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$ and $\tilde{\boldsymbol{\Sigma}} = \tilde{\sigma}^2\mathbf{I}$
The second equation in 2.9 along with the orthogonality of gives and thus . If we fix the scaling of by selecting then from the first equation in 2.9 we get:
(2.10)  
( is diagonal)  
In other words, the noise covariance is uniquely determined and, for a fixed scaling, the source variances are also uniquely determined up to permutation.
Case 2: $\boldsymbol{\Sigma}$ and $\tilde{\boldsymbol{\Sigma}}$ arbitrary positive definite matrices
Suppose $\mathbf{M}$ is a square matrix; let $\mathrm{diag}(\mathbf{M})$ be the diagonal matrix obtained by setting the non-diagonal elements of $\mathbf{M}$ to 0 and, similarly, let $\mathrm{offdiag}(\mathbf{M})$ be the matrix obtained by setting the diagonal elements of $\mathbf{M}$ to 0. The noise covariance is partially identifiable by the following conditions:
(2.11)  
For a fixed scaling, the source variances are constrained by:
(2.12) 
In general, the source variances cannot be uniquely determined as noted in (Davies, 2004).
Identifiability of the distribution of the sources $\mathbf{s}$:
Is the distribution of the nonGaussian components of identifiable? From 2.1 and 2.3:
(2.13) 
Substituting 2.5 and 2.6 in 2.13 we get:
(2.14) 
Premultiplying both sides by from 2.8 we get:
(2.15) 
Let be the characteristic functions of and respectively. Then
(2.16)  
where and is a vector of real numbers of length equal to that of the corresponding random vectors in 2.16. Using 2.15, we can write:
(2.17) 
Noting the independence of the sources and the noise:
(2.18)  
Now the two noise vectors are multivariate Gaussian random vectors, both with mean $\mathbf{0}$ and with covariance matrices $\boldsymbol{\Sigma}$ and $\tilde{\boldsymbol{\Sigma}}$ respectively. Hence, their characteristic functions are given by (Feller, 1966; Wlodzimierz, 1995):
(2.19)  
Claim 2.1.
Proof.
(2.21) 
Thus from 2.19,
(2.22) 
From 2.19, these Gaussian characteristic functions are not equal to 0 for any finite argument; therefore, from 2.22 and 2.18 we get:
(2.23) 
Note that is a diagonal scaling matrix with entries on the diagonal and is a permutation matrix. Thus,
(2.24) 
where is some permutation of integers . Suppose is the characteristic function of the th component of and is the characteristic function of the th component of . Since the components of and are independent by assumption, the joint characteristic functions and factorize:
(2.25)  
Since is simply a permutation of integers , there exists a such that . Then set in 2.26. Then 2.27 and 2.26 imply:
(2.28) 
Select the scaling matrix as and thus is a diagonal matrix with elements on the diagonal. Thus and 2.28 can be rewritten as:
(2.29) 
Therefore,
(2.30)  
or  
Hence the characteristic function of the first component of $\tilde{\mathbf{s}}$ is identical to the characteristic function of the (possibly sign-flipped) matching component of $\mathbf{s}$. Since characteristic functions uniquely characterize a probability distribution (Feller, 1966), the two distributions are identical. Next, by a similar choice, we can find a source distribution that matches the second component. Proceeding in a similar fashion, it is clear that the distribution of each component is uniquely identifiable up to sign flips for this choice of scaling. For a general scaling, the source distributions are uniquely identifiable up to permutation and (possibly negative) scaling, as claimed. ∎
While the source distributions might not be uniquely identifiable for arbitrary covariance matrices, they are indeed uniquely identifiable up to permutation and scaling for the noisy ICA model with isotropic Gaussian noise covariance. For more general conditions that guarantee uniqueness of source distributions, please see Eriksson and Koivunen (2004, 2006).
Corollary 2.2.
If and , then the source distributions are uniquely identifiable up to sign flips for .
Corollary 2.3.
If , then the source distributions are uniquely identifiable up to sign flips for .
2.1.3 Why is there run-to-run variability in estimated ICs?
From the discussion in section 2.1.2, it is clear that for a noisy ICA model with isotropic diagonal additive Gaussian noise covariance:

The noisy ICA parameters are uniquely identifiable up to permutation and scaling.

The source distributions are uniquely identifiable up to permutation and scaling.
While the above theoretical properties of ICA are reassuring, there are a number of practical difficulties that prevent the reliable estimation of ICs on real data:

Validity of the ICA model:
The assumption that the observed real data are generated by an ICA model is only that: an "assumption". If this assumption is not valid, then the uniqueness results no longer hold.

Mutual information approximations:
From an information theoretic point of view, the ICA problem is solved by minimizing a contrast function which is an approximation to the mutual information (Hyvarinen, 1998) between the ICs and which depends on the finite amount of observed data. Such an approximation is necessary since we do not have access to the marginal source densities. Different approximations to mutual information will lead to different objective functions and hence different solutions. This is one of the reasons why different ICA algorithms often produce different IC estimates even for the same data.

Nonconvexity of ICA objective functions:
The ICA contrast function is potentially non-convex and hence has multiple local minima. Since global minimization is a challenging problem in itself, most ICA algorithms only converge to a local minimum of the ICA contrast function. The run-to-run variability of IC estimates will also depend on the number of local minima in a particular ICA contrast function.

IC estimate corruption by Gaussian noise:
For noisy ICA, the IC realizations cannot be recovered exactly even if the true mixing matrix $\mathbf{A}$ and mean vector $\boldsymbol{\mu}$ in 2.1 are known. Commonly used estimators for recovering realizations of ICs include least squares (Beckmann and Smith, 2004) as well as minimum mean square error (MMSE) (Davies, 2004). Consider the least squares estimate of a realization of $\mathbf{s}$ based on $\mathbf{x}$:
(2.31) $\hat{\mathbf{s}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T(\mathbf{x} - \boldsymbol{\mu}) = \mathbf{s} + (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{n}$. This means that even for known parameters, IC realization estimates will be corrupted by correlated Gaussian noise. Hence, using different subsets of the data under the true model will also lead to variability in estimated ICs.
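This noise corruption is easy to verify numerically. The sketch below (our notation, toy sizes, zero mean assumed for brevity) checks that the least squares residual has covariance $\sigma^2(\mathbf{A}^T\mathbf{A})^{-1}$, i.e., correlated Gaussian noise even with the true mixing matrix in hand:

```python
import numpy as np

# Sketch: with known A, the least squares source estimate equals the true
# sources plus correlated Gaussian noise with covariance sigma^2 (A^T A)^{-1}.
rng = np.random.default_rng(2)
T, q, V = 100, 4, 20000
A = rng.standard_normal((T, q))
S = rng.laplace(size=(q, V))                      # non-Gaussian sources
sigma = 0.5
X = A @ S + sigma * rng.standard_normal((T, V))   # zero-mean model for brevity

S_hat = np.linalg.inv(A.T @ A) @ A.T @ X          # least squares estimate

resid_cov = np.cov(S_hat - S)                     # empirical residual covariance
expected = sigma**2 * np.linalg.inv(A.T @ A)      # theoretical covariance
print(np.abs(resid_cov - expected).max())
```

The printed discrepancy is on the order of the Monte Carlo error, confirming the correlated-noise structure of the estimate.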

Overfitting of the ICA model:
Overspecification of the model order leads to the problem of overfitting in ICA. As we describe below, this can lead to (1) the phenomenon of IC "splitting" and (2) an increase in the variance of the IC estimates.
1. IC ”splitting”
Suppose that the true model order, i.e., the number of non-Gaussian sources in an ICA decomposition of $\mathbf{x}$ such as 2.1, is $q$. Then a fundamental result (Rao, 1969, Theorem 1) states that for any other ICA decomposition of $\mathbf{x}$, the number of non-Gaussian sources remains the same while the number of Gaussian sources can change. In other words, $\mathbf{x}$ cannot have two different ICA decompositions containing different numbers of non-Gaussian sources.
In view of this fact, how can a model order $q$ ICA decomposition containing $q$ non-Gaussian sources be "split" into a model order $q+1$ ICA decomposition containing $q+1$ non-Gaussian sources when performing ICA estimation using an assumed model order of $q+1$? As we describe below, the order $q+1$ ICA decomposition is only an approximation to the order $q$ ICA decomposition.
Let $\mathbf{a}_i$ be the $i$th column of $\mathbf{A}$ in 2.1. In the presence of noise, it might be possible to approximate:
(2.32) $\mathbf{a}_i s_i \approx \mathbf{a}_{i1} s_{i1} + \mathbf{a}_{i2} s_{i2}$. Here:

$\mathbf{a}_i s_i$ is the contribution of the $i$th non-Gaussian source to the ICA model 2.1.

$s_{i1}$ and $s_{i2}$ are independent non-Gaussian random variables that are also independent of all other non-Gaussian sources in 2.1.

$\mathbf{a}_{i1}$ and $\mathbf{a}_{i2}$ are the basis time courses corresponding to $s_{i1}$ and $s_{i2}$ respectively.

The time courses $\mathbf{a}_{i1}$ and $\mathbf{a}_{i2}$ look similar to each other.
Note that if $\mathbf{a}_{i1} = \mathbf{a}_{i2} = \mathbf{a}_i$, then 2.32 can be made into an equality by choosing $s_{i1} + s_{i2} = s_i$. By replacing $\mathbf{a}_i s_i$ in 2.1 using 2.32, we arrive at an approximate model order $q+1$ decomposition of $\mathbf{x}$. In this decomposition, the component $\mathbf{a}_i s_i$ from a model order $q$ decomposition appears to be "split" into two subcomponents: $\mathbf{a}_{i1} s_{i1}$ and $\mathbf{a}_{i2} s_{i2}$.
2. Inflated variance of IC estimates
Overestimation of the model order will lead to overfitting of the mixing matrix $\mathbf{A}$. In other words, $\hat{\mathbf{A}}$ could have several columns that are highly correlated with each other. This could happen as a result of IC "splitting" as discussed above. Now, for a given realization, the covariance of the least squares estimate $\hat{\mathbf{s}}$ is given by $\sigma^2(\mathbf{A}^T\mathbf{A})^{-1}$ (for isotropic Gaussian noise covariance). An increase in the number of columns of $\mathbf{A}$, together with the fact that many of them are highly correlated, implies that the variability of the IC estimates is inflated.
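The inflation effect can be sketched numerically (our own toy construction): appending a near-duplicate column to a mixing matrix, mimicking IC "splitting", blows up the diagonal of $\sigma^2(\mathbf{A}^T\mathbf{A})^{-1}$.

```python
import numpy as np

# Sketch of variance inflation: a near-duplicate column in A (as produced by
# IC "splitting") makes A^T A nearly singular and inflates estimator variance.
rng = np.random.default_rng(6)
T = 100
A = rng.standard_normal((T, 3))
A_split = np.column_stack([A, A[:, 0] + 0.05 * rng.standard_normal(T)])

def ls_var(Amat, sigma=1.0):
    """Diagonal of the least squares estimator covariance sigma^2 (A^T A)^{-1}."""
    return sigma**2 * np.diag(np.linalg.inv(Amat.T @ Amat))

print(ls_var(A).max(), ls_var(A_split).max())
```

With the nearly collinear pair of columns, the worst-case per-component variance is orders of magnitude larger than for the well-conditioned matrix.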

In other words, running ICA multiple times on the same data or variations thereof with random initialization could produce different ICs.
2.2 ICA algorithms, single subject ICA and group ICA
In this section, we give a brief summary of how the ICA parameters are estimated in practice and also summarize the two most common modes of ICA application to fMRI data: single subject ICA (section 2.2.1) and temporal concatenation based group ICA (section 2.2.2).
Given several independent observations as per the noisy ICA model 2.1, most ICA algorithms estimate the ICA parameters and the realizations of $\mathbf{s}$ in two steps. We only consider the case with $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$ since, as shown in section 2.1.2, the mixing matrix and source distributions are identifiable up to permutation and scaling for this case.

First, the diagonal source covariance is arbitrarily set as $\boldsymbol{\Lambda} = \mathbf{I}$. The mean vector $\boldsymbol{\mu}$ is estimated as the sample mean of the observations. Then, using PCA or PPCA (Tipping, 1999), the mixing matrix is estimated, up to an orthogonal rotation matrix $\mathbf{R}$, to lie in a signal subspace spanned by the $q$ principal eigenvectors corresponding to the $q$ largest eigenvalues of the data covariance matrix. The noise variance $\sigma^2$ is estimated in this step as well.

Next, an estimator for the source realizations is defined using techniques such as least squares or MMSE. The only unknown involved in these estimates is the orthogonal rotation matrix $\mathbf{R}$.
For more details on noisy ICA estimation, please see (Beckmann and Smith, 2004) and for more details on ICA algorithms, please see (Hyvarinen et al., 2001).
2.2.1 Single subject ICA
How is ICA applied to single subject fMRI data? Suppose we are given a single subject fMRI scan which we rearrange as a 2D matrix $\mathbf{X}$ of size $T \times V$ in which column $v$ is the observed time course in the brain at voxel $v$. The observed time courses are considered to be $V$ independent realizations of $\mathbf{x}$ as per the linear ICA model 2.1. Suppose $\hat{\mathbf{S}}$ is the $q \times V$ matrix containing the estimated source realizations at the $V$ voxels. The $i$th row of $\hat{\mathbf{S}}$ is the $i$th IC. In other words, we decompose the time by space fMRI 2D matrix into a set of basis time courses and a set of 3D IC maps using ICA.
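The rearrangement step can be sketched as follows (toy dimensions, not real fMRI data): a 4D scan becomes a time-by-voxel matrix whose columns are voxel time courses, ready for ICA.

```python
import numpy as np

# Sketch: rearrange a 4D (x, y, z, time) volume into a T x V matrix whose
# column v is the time course of voxel v. Sizes are toy values.
rng = np.random.default_rng(3)
vol = rng.standard_normal((8, 8, 4, 50))   # V = 8 * 8 * 4 = 256 voxels
T = vol.shape[-1]
X = vol.reshape(-1, T).T                   # T x V data matrix
print(X.shape)
```

Each estimated IC (a row of the $q \times V$ source matrix) can be reshaped back to the original 3D grid to visualize its spatial map.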
2.2.2 Group ICA
How is ICA applied to data from a group of subjects in fMRI? Suppose we collect fMRI images from $M$ subjects. First, we register all subjects to a common space using a registration algorithm (e.g., affine registration). Next, we rearrange each of the $M$ fMRI scans into 2D matrices $\mathbf{X}_1, \ldots, \mathbf{X}_M$, each of size $T \times V$. Column $v$ in $\mathbf{X}_m$ is the demeaned time course observed at voxel location $v$ for subject $m$. The matrices are temporally concatenated to get a $MT \times V$ matrix $\mathbf{X}$ as follows:
(2.33) $\mathbf{X} = \left[\mathbf{X}_1^T, \mathbf{X}_2^T, \ldots, \mathbf{X}_M^T\right]^T$
Column $v$ of $\mathbf{X}$ is assumed to follow a linear ICA model 2.1, and the $V$ columns are considered to be independent realizations of the model. Suppose $\hat{\mathbf{S}}$ is a $q \times V$ matrix containing the estimated source realizations at the $V$ voxels. The $i$th row of $\hat{\mathbf{S}}$ is the $i$th group IC. In group ICA, the joined time series across subjects is thus modeled using noisy linear ICA. In practice, $\mathbf{X}_m$ is the PCA reduced data set for subject $m$. The PCA reduction is either done separately for each subject using a subject specific data covariance (Calhoun et al., 2001) or using an average data covariance across subjects (Beckmann et al., 2005). The average covariance approach requires each subject to have the same number of time points in their fMRI scans.
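The concatenation step can be sketched as follows (toy sizes; the per-subject PCA reduction is omitted for brevity): demean each subject's voxel time courses, then stack the matrices in time.

```python
import numpy as np

# Sketch of temporal concatenation for group ICA: demean each subject's
# T x V matrix along time, then stack them into an (M*T) x V matrix.
rng = np.random.default_rng(4)
M, T, V = 3, 40, 500
subjects = [rng.standard_normal((T, V)) for _ in range(M)]
demeaned = [Y - Y.mean(axis=0, keepdims=True) for Y in subjects]
X = np.vstack(demeaned)                    # (M*T) x V concatenated matrix
print(X.shape)
```

ICA applied to `X` then yields group ICs whose spatial maps are shared across all $M$ subjects, which is the spatial constraint discussed in section 2.5.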
2.3 The original RAICAR algorithm
In this section, we give a brief introduction to the RAICAR algorithm of (Yang et al., 2008). Suppose we are given a data set which we decompose into $q$ ICs using ICA (e.g., single subject or group ICA). Our goal is to assess which ICs consistently show up in multiple ICA runs, i.e., the reproducibility of each of these ICs. To that end, we run the ICA algorithm $K$ times. Suppose $\mathbf{y}_i^{(k)}$ is the vector (e.g., a spatial ICA map rearranged into a vector) of the $i$th IC from the $k$th ICA run. Suppose $\mathbf{C}^{(k,l)}$ is the $q \times q$ absolute spatial cross-correlation coefficient matrix between the ICs from runs $k$ and $l$:
(2.34) $C^{(k,l)}_{ij} = \left| \mathrm{corr}\!\left(\mathbf{y}_i^{(k)}, \mathbf{y}_j^{(l)}\right) \right|$
where $|\cdot|$ denotes absolute value. $C^{(k,l)}_{ij}$ is the absolute spatial cross-correlation coefficient between IC $i$ from run $k$ and IC $j$ from run $l$. The matrices are then arranged as elements of a block matrix $\mathbf{G}$ such that the block in the $k$th row and $l$th column of $\mathbf{G}$ is $\mathbf{C}^{(k,l)}$ (see Figure 2). This block matrix is the starting point for the RAICAR across-run component matching process.
Since ICs within a particular run cannot be matched to each other, the matrices along the block diagonal of $\mathbf{G}$ are set to $\mathbf{0}$, as shown in Figure 2 in gray. The following steps are involved in a RAICAR analysis:

Find the maximal element of $\mathbf{G}$. Suppose this maximum occurs in matrix $\mathbf{C}^{(k,l)}$ at position $(i,j)$. Hence component $i$ from run $k$ matches component $j$ from run $l$. Let us label this matched component by $Z_1$ (the first matched component).

Next, we attempt to find from each remaining run $r$ ($r \neq k$ and $r \neq l$) a component that matches component $Z_1$. Suppose element $(u, i)$ is the maximal element in the $i$th column of $\mathbf{C}^{(r,k)}$. Then component $u$ is the best matching component from run $r$ with the $i$th component from run $k$.
Similarly, suppose element $(j, v)$ is the maximal element in the $j$th row of $\mathbf{C}^{(l,r)}$. Then component $v$ is the best matching component from run $r$ with component $j$ from run $l$. As noted in (Yang et al., 2008), in most cases $u = v$. However, it is possible that $u \neq v$. Hence the component matched from run $r$ is defined as follows:
(2.35) the component from run $r$ is taken to be whichever of $u$ and $v$ achieves the larger absolute correlation with its counterpart. We would also like to remove this component of run $r$ from further consideration during the matching process. To that end, we zero out its row in each block $\mathbf{C}^{(r,\cdot)}$ and its column in each block $\mathbf{C}^{(\cdot,r)}$.

Once a matching component has been found for all runs $r$, we also zero out the $i$th row from each block $\mathbf{C}^{(k,\cdot)}$ and the $i$th column from each block $\mathbf{C}^{(\cdot,k)}$. Similarly, we zero out the $j$th column from each block $\mathbf{C}^{(\cdot,l)}$ and the $j$th row from each block $\mathbf{C}^{(l,\cdot)}$. This eliminates component $i$ from run $k$ and component $j$ from run $l$ from further consideration during the matching process.

Steps 1-3 complete the matching process for one component across runs. These steps are repeated until $q$ components are matched across the runs. We label the $m$th matched component as $Z_m$, which contains a set of $K$ matching ICs, one from each of the $K$ ICA runs.
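The matching steps above can be sketched as follows. This is an illustrative simplification in our own notation, not the authors' code; in particular, the disagreement case of step 2 is resolved naively by taking the best match against run $k$ (or run $l$) directly.

```python
import numpy as np

def abs_corr(Y1, Y2):
    """Absolute cross-correlation matrix between rows of Y1 and rows of Y2."""
    q = Y1.shape[0]
    return np.abs(np.corrcoef(Y1, Y2)[:q, q:])

def raicar_match(runs):
    """Greedy RAICAR-style matching over K runs of q components each.
    Returns, for each matched component, a tuple of one IC index per run."""
    K, q = len(runs), runs[0].shape[0]
    # G[(k, l)] holds the q x q abs correlation block between runs k and l.
    G = {(k, l): abs_corr(runs[k], runs[l])
         for k in range(K) for l in range(k + 1, K)}
    matches = []
    for _ in range(q):
        # Step 1: global maximum over all off-diagonal blocks.
        k, l = max(G, key=lambda kl: G[kl].max())
        i, j = np.unravel_index(G[(k, l)].argmax(), (q, q))
        match = {k: int(i), l: int(j)}
        # Step 2: best-matching component from every other run (vs. run k).
        for r in range(K):
            if r in match:
                continue
            a, b = (r, k) if r < k else (k, r)
            col = G[(a, b)][:, i] if r < k else G[(a, b)][i, :]
            match[r] = int(col.argmax())
        matches.append(tuple(match[r] for r in range(K)))
        # Step 3: remove matched components from further consideration.
        for (a, b), C in G.items():
            C[match[a], :] = 0.0
            C[:, match[b]] = 0.0
    return matches
```

For example, feeding in $K$ runs that are noisy permutations of the same underlying sources recovers groups of ICs that all correspond to a single source.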
Suppose matched component $Z_m$ consists of the matched ICs $\mathbf{z}_1, \ldots, \mathbf{z}_K$, one per run. Form the $K \times K$ cross-correlation matrix between the matched components in $Z_m$. The $(k,l)$th element of this matrix is simply:
(2.36) $c_{kl} = \left| \mathrm{corr}(\mathbf{z}_k, \mathbf{z}_l) \right|$
The normalized reproducibility of $Z_m$ is then defined as:
(2.37) $r_m = \frac{1}{\binom{K}{2}} \sum_{k=1}^{K-1} \sum_{l=k+1}^{K} c_{kl}$
The double sum in 2.37 is simply the sum of the upper triangular part of the cross-correlation matrix excluding the diagonal. The normalizing factor $\binom{K}{2}$ is simply the maximum possible value of this sum. Hence the normalized reproducibility satisfies $0 \leq r_m \leq 1$.
Note that our definition of normalized reproducibility is slightly different from that in Yang et al. (2008). Whereas Yang et al. (2008) averages the thresholded absolute correlation coefficients, we simply average the unthresholded absolute correlation coefficients to compute reproducibility thereby avoiding the selection of a threshold on the absolute correlation coefficients.
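This unthresholded definition can be sketched in a few lines (our notation; the function name is ours):

```python
import numpy as np

def normalized_reproducibility(components):
    """Mean absolute pairwise correlation among K matched IC maps, as in the
    unthresholded variant of eq. 2.37 described above."""
    Y = np.asarray(components)             # K x V matched IC maps
    K = Y.shape[0]
    C = np.abs(np.corrcoef(Y))             # K x K absolute correlation matrix
    iu = np.triu_indices(K, k=1)           # upper triangle, diagonal excluded
    return C[iu].sum() / (K * (K - 1) / 2)

rng = np.random.default_rng(7)
v = rng.standard_normal(200)
print(normalized_reproducibility([v, v, -v]))   # copies up to sign: close to 1.0
```

Note that sign flips do not hurt reproducibility, since absolute correlations are used; independent random maps instead give values near 0.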
2.4 The RAICARN enhancement
In this section, we describe how to compute reproducibility p-values for each matched component in RAICAR. Note that the RAICAR "component matching" process can be used to assess the reproducibility of any spatial component maps, not necessarily ICA maps. For instance, RAICAR can be used to assess the reproducibility of a set of PCA maps across subjects.
In order to generate reproducibility p-values for the matched component maps:

We need to determine the distribution of the normalized reproducibility that we get from the RAICAR "component matching" process when the input to RAICAR represents a set of "non-reproducible component maps" across the runs.

In addition, we would also like to preserve the overall structure seen in the observed sets of spatial component maps when generating sets of "non-reproducible component maps" across the runs.
Hence, for IC reproducibility assessment, we propose to use the original set of ICs across the runs to generate the "non-reproducible component maps" across the runs.
Suppose $K$ ICA runs are submitted to RAICAR, which gives us a vector of $q$ observed normalized reproducibility values, one for each matched IC. We propose to attach p-values measuring the reproducibility of each IC in a data-driven fashion as follows:

First, we label the ICs across the runs using unique integers. In run 1, the ICs are labelled using integers $1, \ldots, q$. In run 2, the ICs are labelled using integers $q+1, \ldots, 2q$, and so on. In run $K$, the ICs are labelled using integers $(K-1)q+1, \ldots, Kq$.

Our "null" hypothesis $H_0$ is:
(2.38) None of the ICs are reproducible. Hence, under $H_0$, we can randomly relabel any component from one run as a component of any other run. To do this, we randomly permute the integers $1, \ldots, Kq$ to get the permuted integers $\pi(1), \ldots, \pi(Kq)$.

The sets of "non-reproducible component runs under $H_0$" are constructed by assigning the components with labels:

$\pi(1), \ldots, \pi(q)$ to run 1 under $H_0$,

$\pi(q+1), \ldots, \pi(2q)$ to run 2 under $H_0$,

…

$\pi((K-1)q+1), \ldots, \pi(Kq)$ to run $K$ under $H_0$.
After all $K$ runs have been generated under $H_0$, we subject them to a RAICAR analysis. This gives us $q$ values of normalized reproducibility, one for each matched component under $H_0$.

Steps 1-4 are repeated $B$ times to build up a pooled vector of $Bq$ normalized reproducibility values under $H_0$.

Finally, we assign a p-value for reproducibility to each matched IC across the runs. The observed normalized reproducibility for the i-th matched IC is r_i and its p-value is:

p_i = (number of pooled null normalized reproducibility values >= r_i) / (BN). (2.39)
Only those components with p_i below a cutoff are considered to be significantly reproducible. We can use a fixed and objective cutoff such as p_i < 0.05. Note that this fixed cutoff is independent of the amount of variability in the input to RAICARN. Please see Figure 3 for a pictorial depiction of this process.
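The permutation scheme above can be sketched in a few lines of NumPy. This is a toy illustration, not the published implementation: the RAICAR "component matching" step is replaced by a simple greedy absolute-correlation match seeded from run 1, and all names (match_reproducibility, raicar_n_pvalues, n_null) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def match_reproducibility(runs):
    """Toy stand-in for RAICAR "component matching": seed each matched
    component with a run-1 map, pick its best |correlation| partner in
    every other run, and report the mean absolute cross-run correlation
    as that component's normalized reproducibility."""
    K, N = len(runs), runs[0].shape[0]
    repro = np.empty(N)
    for i in range(N):
        seed = runs[0][i]
        best = [max(abs(np.corrcoef(seed, runs[k][j])[0, 1]) for j in range(N))
                for k in range(1, K)]
        repro[i] = np.mean(best)
    return repro

def raicar_n_pvalues(runs, n_null=20):
    """RAICARN-style p-values: pool all K*N maps, deal them back into K
    pseudo-runs of N maps each (the "non-reproducible" runs under H0),
    rematch, and score each observed reproducibility against the pooled
    null values, in the spirit of equation 2.39."""
    observed = match_reproducibility(runs)
    K, N = len(runs), runs[0].shape[0]
    pool = np.concatenate(runs, axis=0)        # all K*N labelled maps
    null_vals = []
    for _ in range(n_null):                    # B repetitions
        shuffled = pool[rng.permutation(K * N)].reshape(K, N, -1)
        null_vals.append(match_reproducibility(list(shuffled)))
    null_vals = np.concatenate(null_vals)      # pooled B*N null values
    pvals = np.array([(null_vals >= r).mean() for r in observed])
    return observed, pvals

# Toy example: 4 runs of 3 maps each; map 0 is a shared (reproducible)
# component, the other maps are independent noise.
V = 500
shared = rng.standard_normal(V)
runs = [np.vstack([shared + 0.1 * rng.standard_normal(V),
                   rng.standard_normal((2, V))]) for _ in range(4)]
obs, p = raicar_n_pvalues(runs)
```

In this toy setting the shared component should stand out with reproducibility near 1 and a small p-value, while the noise components do not.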
2.5 How many subjects should be used per group ICA run in RAICARN?
The input to RAICARN can be either single subject ICA runs or group ICA runs across a set of subjects. Note that individual subject ICA runs are spatially unconstrained, whereas a group ICA spatially constrains the group ICs across a set of subjects. Hence, the number of ICs that can be declared significantly reproducible at the group level is usually larger than the number that can be declared significantly reproducible at the single subject level. Hence the following question is relevant:
Suppose we have a group of N subjects. We randomly select R subjects and form a single group of R subjects. We repeat this process M times to get M groups of R subjects, each of which is subjected to a group ICA analysis. Given the number of subjects N, how should we choose R and M?
First, we discuss the choice of R. If R = N then each of the M groups will contain the same N subjects and hence there will be no diversity in the groups. We would like to control the amount of diversity in the groups of subjects. Consider any 2 subjects s_i and s_j. The probability that both s_i and s_j appear in a set of R randomly chosen subjects from N subjects is given by:
p = C(N-2, R-2) / C(N, R) = R(R-1) / (N(N-1)) (2.40)
The expected number of times that s_i and s_j appear together out of M independently drawn sets of R subjects is:

E = M p = M R(R-1) / (N(N-1)) (2.41)
Ideally, we would like E to be only a small fraction of M. Hence we impose the restriction:

E <= αM (2.42)

where α is a user-defined constant such as α = 0.05. This implies that the chosen value of R must satisfy:

R(R-1) <= αN(N-1) (2.43)
In practice, we choose the largest value of R that satisfies this inequality. As shown in Figure 5, if N = 23 and α = 0.05 then the largest value of R that satisfies 2.43 is R = 5.
The number of group ICA runs M should be as large as possible. From our experiments on real fMRI data we can roughly say that values of M >= 50 give equivalent results.
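Inequality 2.43 is easy to apply in code. The sketch below uses our own helper names; the closed form R(R-1)/(N(N-1)) follows from C(N-2, R-2)/C(N, R) after cancelling factorials.

```python
from math import comb

def cooccurrence_prob(N, R):
    """Probability that two fixed subjects both land in one random
    R-subject draw from N subjects: C(N-2, R-2) / C(N, R),
    which simplifies to R(R-1) / (N(N-1)) (equation 2.40)."""
    return comb(N - 2, R - 2) / comb(N, R)

def largest_group_size(N, alpha=0.05):
    """Largest R with R(R-1) <= alpha * N * (N-1), i.e. the largest
    group size that keeps the expected pair co-occurrence fraction
    below alpha (inequality 2.43). R starts at 2 since a pair needs
    at least two slots."""
    R = 2
    while R < N and (R + 1) * R <= alpha * N * (N - 1):
        R += 1
    return R

# With N = 23 subjects and alpha = 0.05 this yields R = 5,
# matching the grouping used in sections 3.4 and 3.5.
```

The helper makes the trade-off explicit: R = 5 gives a pair co-occurrence probability of 20/506 ≈ 0.04, while R = 6 already exceeds the 0.05 threshold.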
2.6 How to display the estimated non-Gaussian spatial structure in ICA maps?
The ICs have been optimized for non-Gaussianity. However, there can be many types of non-Gaussian distributions. It has been empirically found that the non-Gaussian distributions of ICs in fMRI data have the following structure:

A central Gaussian-looking part, and

A tail that extends out on either end of the Gaussian.
It has been suggested in (Beckmann and Smith, 2004) that a Gaussian/Gamma mixture model can be fitted to this distribution and that the Gamma components can be thought of as representatives of the non-Gaussian structure. We follow a similar approach:

The output of a RAICARN analysis is a set of spatial ICA maps (either transformed maps or raw maps) concatenated into a 4D volume.

We do a voxelwise transformation to Normality using the voxelwise empirical cumulative distribution function as described in (van Albada and Robinson, 2007).

Next, we submit the resulting 4D volume to a voxelwise group analysis using ordinary least squares. The design matrix for group analysis depends on the question being considered. In our case, the design matrix was simply a single group average design.

The resulting statistic maps are subjected to Student-t, positive Gamma and negative Gamma mixture modeling. The logic is that if the original ICA maps are purely Gaussian (i.e., have no interesting non-Gaussian structure) then the result of a group average analysis will be a pure Student-t map, which will be captured by the single Student-t component (i.e., the positive Gamma and negative Gamma components will be driven to zero class fractions). Hence the "null" hypothesis will be correctly accounted for.

If the Gamma distributions have posterior probability exceeding 0.5 at some voxels then those voxels are displayed in color to indicate the presence of significant non-Gaussian structure over and above the background Student-t distribution.
Examples of Student-t, positive Gamma and negative Gamma mixture model fits are shown in Figure 6.
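The voxelwise transformation to Normality in step 2 (the empirical-CDF rank transform of van Albada and Robinson, 2007) can be sketched as follows. This is our own minimal version, operating on one voxel's vector of values across the concatenated maps, with no tie handling.

```python
import numpy as np
from statistics import NormalDist

def gaussianize(x):
    """Map a 1-D sample onto standard-normal quantiles via its empirical
    CDF: replace each value by its rank r (1..n), then return the normal
    quantile at r / (n + 1), which stays strictly inside (0, 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.empty(n)
    ranks[np.argsort(x)] = np.arange(1, n + 1)   # ranks 1..n (no ties assumed)
    nd = NormalDist()
    return np.array([nd.inv_cdf(r / (n + 1)) for r in ranks])

# Example: a strongly skewed sample becomes approximately standard normal
# while its rank order is preserved.
rng = np.random.default_rng(1)
x = rng.exponential(size=1000)
z = gaussianize(x)
```

Applied independently to every voxel of the concatenated 4D volume, this makes each voxel's across-map distribution approximately standard normal before the mixture modeling stage.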
3 Experiments and Results
3.1 Human rsfMRI data
rsfMRI data titled "Baltimore" (Pekar, J.J./Mostofsky, S.H.; n = 23 [8M/15F]; ages: 20-40; TR = 2.5 s; # slices = 47; # timepoints = 123), a part of the 1000 Functional Connectomes Project, was downloaded from the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC): http://www.nitrc.org/projects/fcon_1000/.
3.2 Preprocessing
Data was analyzed using tools from the FMRIB software library (FSL: http://www.fmrib.ox.ac.uk/fsl/). Preprocessing steps included motion correction, brain extraction, spatial smoothing with an isotropic Gaussian kernel of 5 mm FWHM and 100 s high-pass temporal filtering. Spatial ICA was performed using a noisy ICA model as implemented in FSL MELODIC (Beckmann and Smith, 2004) in either single subject or multi-subject temporal concatenation mode, also called group ICA. Please see section 2.2 for a brief summary of single subject ICA and group ICA. In each case, we fixed the model order of ICA at a value consistent with the model order range typically extracted in rsfMRI and fMRI (Smith et al., 2009; Esposito et al., 2005). For temporal concatenation based group ICA, single subject data was first affinely registered to the MNI 152 brain and subsequently resampled to 4x4x4 mm resolution (MNI 4x4x4) to decrease computational load.
3.3 RAICARN analysis with 1 ICA run per subject
Spatial ICA was run once for each of the 23 subjects in their native space. The resulting sets of ICA components across subjects were transformed to MNI 4x4x4 space and were submitted to a RAICARN analysis. (In all RAICARN analyses reported in this article, we used the transformed IC maps, which are basically the raw IC maps divided by a voxelwise estimate of the noise standard deviation, named melodic_IC.nii.gz in MELODIC. It is also possible to use the raw IC maps as inputs to RAICARN.) ICA components were sorted according to their reproducibility and p-values were computed for each ICA component. Please see Figure 8.
We compared the reproducible RSNs from the single subject RAICARN analysis to the group RSN maps reported in the literature (Beckmann et al., 2005). Please see Figure 9.
To summarize, when single subject ICA runs are combined across subjects:

We are able to declare 4 "standard" RSNs as significantly reproducible at a p-value < 0.05.

There are 2 other "standard" RSNs that achieve a reproducibility p-value between 0.05 and 0.06.

There are 2 other "non-standard" RSNs that are of interest: one achieves a p-value of 0.0125 and the other achieves a p-value of 0.05699.
3.4 RAICARN on random sets of 5 subjects: 50 group ICA runs
To promote diversity across the group ICA runs, as discussed in section 2.5, 5 subjects were drawn at random from the group of 23 subjects and submitted to a temporal concatenation based group ICA. This process was repeated 50 times and the resulting 50 sets of group ICA maps were submitted to a RAICARN analysis. ICA components were sorted according to their reproducibility and p-values were computed for each ICA component. Please see Figure 10.
We compared the reproducible RSNs from this group ICA based RAICARN analysis to the RSN maps reported in the literature (Beckmann et al., 2005). Please see Figure 11.
In summary, when 50 random 5 subject group ICA runs (from a population of 23 subjects) are combined using RAICARN:

We are able to declare 8 "standard" RSNs as significantly reproducible at a p-value < 0.05.

There are 6 other "non-standard" RSNs that can be declared as significantly reproducible at a p-value < 0.05.

There is 1 other "non-standard" RSN that achieves a p-value of 0.05299.
3.5 RAICARN on random sets of 5 subjects: 100 group ICA runs
To promote diversity across the group ICA runs, as discussed in section 2.5, 5 subjects were drawn at random from the group of 23 subjects and submitted to a temporal concatenation based group ICA. This process was repeated 100 times and the resulting 100 sets of group ICA maps were submitted to a RAICARN analysis. ICA components were sorted according to their reproducibility and p-values were computed for each ICA component. Please see Figure 12.
We compared the reproducible RSNs from this group ICA based RAICARN analysis to the RSN maps reported in the literature (Beckmann et al., 2005). Please see Figure 13.
In summary, when 100 random 5 subject group ICA runs (from a population of 23 subjects) are combined using RAICARN:

We are able to declare 8 "standard" RSNs as significantly reproducible at a p-value < 0.05.

There are 6 other "non-standard" RSNs that can be declared as significantly reproducible at a p-value < 0.05.

There is 1 other "non-standard" RSN that achieves a p-value of 0.05824.
4 Group comparison of ICA results
In this section, we summarize the main approaches for group analysis of ICA results, which can be broadly classified into two categories: (1) Approaches based on a single ICA run or no ICA run and (2) Approaches based on multiple ICA runs. To make things concrete, suppose we have two groups of subjects, A and B.
4.1 Approaches based on a single group ICA run or no ICA run
The main idea in these approaches is to use the results of a group ICA using all subjects to derive subject specific spatial maps for group comparison. A typical sequence of steps is as follows:

The first step involves extraction of a set of template IC maps or a set of template mixing matrix time courses. This can be accomplished using two techniques:

Group ICA based template IC maps or time courses:
A temporal concatenation based group ICA is run using data from all subjects in groups A and B. This usually involves two PCA data reductions. The first reduction is based on a subject-wise PCA decomposition (Calhoun et al., 2001) or an average PCA decomposition (Beckmann et al., 2005), as discussed in section 2.2.2. The second reduction is a PCA of the temporally concatenated, subject-wise reduced data. Subsequently, the group ICs and the dual PCA reduced mixing matrix time courses are estimated using an ICA algorithm.
User supplied set of template IC maps:
The user supplies a set of spatial maps, perhaps corresponding to an ICA decomposition on an independent data set.


The next step uses either the template IC maps or the template time courses.

Template time course based approach:
First, the mixing matrix is PCA back projected and partitioned into subject specific sub-matrices. Next, subject specific spatial maps corresponding to the group ICs are estimated via least squares, and a second PCA back projection is used to estimate the corresponding subject specific time courses. This is the approach proposed in (Calhoun et al., 2001), which we will refer to as the group ICA back projection approach.
Template IC based approach:
First, spatial multiple regression using the template ICs as regressors is performed against the original data of each subject to derive subject specific time courses corresponding to each template IC. Next, a second (temporal) multiple regression using the subject specific time courses is performed against the original data of each subject to derive subject specific spatial maps corresponding to each template IC. This approach, called "dual regression", has been proposed by (Beckmann et al., 2009). A similar approach called fixed average spatial ICA (FAS ICA) had also been proposed earlier in (Calhoun et al., 2004). Both dual regression and FAS ICA involve the first spatial regression stage, but dual regression also includes the second temporal regression stage.


Once subject specific spatial maps and time courses corresponding to group ICs have been determined, they are entered into a random effects analysis for group comparison.
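The two regression stages of dual regression described above reduce to a pair of least-squares fits. The sketch below is our own minimal illustration (no demeaning or variance normalization, which production pipelines add):

```python
import numpy as np

def dual_regression(Y, templates):
    """Dual regression sketch (cf. Beckmann et al., 2009).
    Y         : (T, V) one subject's time-by-voxel data matrix
    templates : (C, V) template IC spatial maps
    Stage 1 regresses the template maps spatially onto the data to get
    subject-specific time courses A (T, C); stage 2 regresses those time
    courses temporally onto the data to get subject-specific maps (C, V)."""
    A = Y @ np.linalg.pinv(templates)   # stage 1: spatial least squares
    maps = np.linalg.pinv(A) @ Y        # stage 2: temporal least squares
    return A, maps

# Toy check: data built exactly as (time courses x maps) plus a little
# noise should be recovered almost perfectly.
rng = np.random.default_rng(0)
S_true = rng.standard_normal((3, 200))   # 3 "template" maps, 200 voxels
A_true = rng.standard_normal((50, 3))    # 50 timepoints
Y = A_true @ S_true + 0.01 * rng.standard_normal((50, 200))
A_est, S_est = dual_regression(Y, S_true)
```

The pseudoinverse solves each stage's ordinary least-squares problem in one call; with full-rank templates, stage 1 is exactly the spatial multiple regression described in the text.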
4.1.1 Advantages of single group ICA based approaches

Much reduced computational load compared to multiple ICA based approaches.

Ability to take advantage of constrained spatial IC estimation across all subjects via group ICA.
Please see section 5 for discussion.
4.2 Approaches based on multiple single subject or group ICA runs
In these approaches, the results of multiple ICA runs in groups A and B are used for a between group analysis. A typical sequence of steps is as follows:

The first step involves:

running a separate single subject ICA for all subjects from groups A and B (possibly with multiple runs per subject), or

running a set of group ICA runs across various subsets of subjects separately, with each subset containing subjects either from group A or group B.


The next step is to establish a correspondence between the ICs within and across groups. There are two main techniques of establishing this correspondence:

Template based methods:
In these approaches, the user defines a template or a spatial map containing the network of interest. Examples of templates include a spatial map of the default mode network (DMN) derived from a separate ICA analysis, a spatial map from a separate PCA analysis, or even a binary mask defining the regions of interest. The template is then used to select, from each run of ICA (single subject or group ICA) in each group (A and B), an IC that best matches the template using a predefined metric such as the spatial correlation coefficient or goodness of fit (GOF) (Greicius et al., 2004).
Template free methods:
These approaches do not need a predefined template from the user, but instead attempt to match or cluster all ICs simultaneously within and across groups. Examples of such approaches include self-organizing group ICA (sogICA, (Esposito et al., 2005)) and RAICAR (Yang et al., 2008). Each matched component or IC cluster includes one IC from each ICA run (single subject or group ICA) in each group (A and B).


Finally, the selected ICs in template based methods or ICs from a selected IC cluster/matched component in template free methods are then entered into a random effects group analysis (with repeated measures for multiple single subject ICA runs) for between group comparison.
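The template based selection step above amounts to scoring every IC map against the template. A small sketch of our own, using absolute spatial correlation as the match metric (a GOF metric as in Greicius et al., 2004 would instead compare mean |z| inside versus outside a template mask):

```python
import numpy as np

def best_matching_ic(template, ic_maps):
    """Return the index of the IC map with the highest absolute spatial
    correlation to the template, plus all match scores.
    template : (V,) template spatial map
    ic_maps  : (N, V) candidate IC maps from one ICA run."""
    t = template - template.mean()
    t /= np.linalg.norm(t)
    scores = []
    for m in ic_maps:
        mc = m - m.mean()
        scores.append(abs(t @ mc) / np.linalg.norm(mc))
    return int(np.argmax(scores)), scores

# Toy example: candidate 2 is a noisy, sign-flipped copy of the template.
rng = np.random.default_rng(0)
V = 300
template = rng.standard_normal(V)
ic_maps = rng.standard_normal((4, V))
ic_maps[2] = -template + 0.2 * rng.standard_normal(V)
idx, scores = best_matching_ic(template, ic_maps)
```

Taking the absolute correlation deliberately ignores the sign ambiguity inherent in ICA maps, so a sign-flipped network still matches its template.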
4.2.1 Advantages of multiple ICA run approaches

They account for both algorithmic and data set variability of ICA.

Group comparisons happen on true ICs, i.e., optimal solutions of the ICA problem.
Please see section 5 for discussion.
5 Discussion
As discussed in section 2.1.2, in the noisy linear ICA model with isotropic diagonal Gaussian noise covariance, for a given true model order, the mixing matrix and the source distributions are identifiable up to permutation and scaling. However, as pointed out in section 2.1.3, various factors prevent the convergence of ICA algorithms to unique IC estimates. These factors include the ICA model not being the true data generating model, approximations to mutual information used in ICA algorithms, multiple local minima in ICA contrast functions, confounding Gaussian noise, as well as variability due to model order overestimation. A practical implication of these factors is that ICA algorithms converge to different IC estimates depending on how they are initialized and on the specific data used as input to ICA. Hence, there is a need for a rigorous assessment of reproducibility or generalizability of IC estimates. A set of reproducible ICs can then be used as ICA based characteristics of a particular group of subjects.
We proposed an extension to the original RAICAR algorithm for reproducibility assessment of ICs within or across subjects. The modified algorithm, called RAICARN, builds up a "null" distribution of normalized reproducibility values under a random assignment of observed ICs across the runs. This "null" distribution is used to compute reproducibility p-values for each observed matched component from RAICAR. An objective cutoff such as p < 0.05 can be used to detect "significantly reproducible" components. This avoids subjective user decisions such as the selection of the number of clusters in ICASSO, the reproducibility cutoff in RAICAR, or a cutoff on intra-cluster distance in sogICA.
5.1 Results for publicly available rsfMRI data
We applied RAICARN to the publicly available 23-subject rsfMRI data from http://www.nitrc.org/. We analyzed the data in 2 different ways:

ICs were extracted separately for each of the 23 subjects. The single subject ICA runs were subjected to a RAICARN analysis (after registration to standard space).
In the single subject ICA based RAICARN analysis (see Figures 8 and 9), we are able to declare 6 out of the 8 ICs reported in (Beckmann et al., 2005) (which used group ICA) as "reproducible" (4 ICs have p-values < 0.05 and 2 ICs have p-values between 0.05 and 0.06). This is consistent with the 5 reproducible RSNs reported in (DeLuca et al., 2005) using single subject ICA analysis.

5 subjects were randomly drawn from the 23 subjects to create one group of 5 subjects, which was subjected to a group ICA analysis in which a fixed number of components was extracted. This process was repeated 50 or 100 times and the resulting group ICA runs were subjected to a RAICARN analysis.
In the group ICA based RAICARN analysis (see Figures 10 to 13), we are able to declare all 8 components reported in (Beckmann et al., 2005) as "reproducible" (at p < 0.05). Some of the ICs detected as "reproducible" in the group ICA based RAICARN on human rsfMRI data are not shown in (Beckmann et al., 2005) but do appear in the more recent paper (Smith et al., 2009). RAICARN results for 100 group ICA runs are almost identical to those for 50 runs, suggesting that 50 runs of group ICA are sufficient for a RAICARN reproducibility analysis.
5.2 Single subject ICA vs Group ICA
Based on our results, it appears that single subject ICA maps are less reproducible than group ICA maps, as illustrated in Figures 8 and 10. A single subject ICA based analysis is more resistant to subject specific artifacts. On the other hand, a group ICA based analysis makes the strong assumption that ICs are spatially identical across subjects. If this assumption is true, group ICA takes advantage of temporal concatenation to constrain the ICs spatially across subjects, thereby reducing their variance. Hence, when there are no gross artifacts in the individual rsfMRI data sets, group ICA is expected to be more sensitive for reproducible IC detection. As seen in Figures 9 and 11, our results agree with this proposition. All ICs declared "reproducible" in the single subject based RAICARN analysis remain "reproducible" in the group ICA based RAICARN analysis.
5.3 How should subjects be grouped for group ICA?
This raises the question of how the subjects should be grouped together for individual group ICA runs in preparation for RAICARN. If all N subjects are used in all group ICA analyses then there is no diversity in the individual group ICA runs. In this case, a RAICARN analysis will capture algorithmic variability due to the non-convexity of the ICA objective function but not data set variability. Hence, our conclusions might not be generalizable to a different set of subjects.
Another option is to randomly select R subjects out of N for each group ICA run and submit the resulting M group ICA runs to RAICARN. In this case, we will account for both algorithmic and data set variability via a RAICARN analysis. In other words, we will be able to determine those ICs that are "reproducible" across different sets of subjects and across multiple ICA runs. A key question is: how should we choose R and M? In section 2.5, we proposed a simple method to determine the number of subjects R to be used in a single group ICA run out of the N subjects; the key idea is to form groups with enough "diversity". Multiple such group ICA runs can then be submitted to a RAICARN analysis for reproducibility assessment. Clearly, the larger the value of N, the larger the admissible value of R. Hence, increasing the number of subjects in a study will allow us to make conclusions that are generalizable to a larger set of subjects. Also, conclusions generalizable to a larger number of subjects are expected to hold for a smaller subset but not vice versa.
5.4 RAICARN for group comparisons of reproducible ICs
In the present work, our focus was on enabling the selection of reproducible ICs for a single group of subjects. However, RAICARN can be extended for between group analysis of reproducible components as well. Before we describe how to do so, it is useful to discuss the other approaches for group analysis of RSNs described in section 4. Suppose we have two groups of subjects, A and B.
5.4.1 Discussion of single group ICA based approaches

Subject specific maps corresponding to group ICA maps derived using ICA back projection or dual regression are not true ICs, i.e., they are not solutions to an ICA problem.

These approaches do not account for either the algorithmic or the data set variability of an ICA decomposition. The single group ICA decomposition will contain both reproducible and nonreproducible ICs, but there is no systematic way to differentiate between the two.

Both dual regression and ICA back projection using data-derived IC templates are circular analyses. First, a group ICA using all of the data D is used to derive template IC maps or template time courses. Next, least-squares based ICA back projection or dual regression using a subset of the same data D is used to derive subject specific maps and time courses corresponding to each IC. Thus, a model (group ICA) fitted on data D is used to learn an assumption (template IC maps or template time courses) that is then used to fit another model (dual regression or ICA back projection) on a subset of the same data D. This is circular analysis (Kriegeskorte et al., 2009; Vul and Kanwisher, 2010).
It is easy to avoid circular analysis in a dual regression approach via cross-validation. For example, one can split the groups A and B into two random parts, a "training" set and a "test" set. First, the "training" set can be used to derive template IC maps using group ICA. Next, the "training" set based template IC maps can be used as spatial regressors for dual regression on the "test" set. Alternatively, the template ICs for dual regression can also come from a separate ICA decomposition on an independent data set unrelated to groups A and B, such as a publicly available human rsfMRI data set. This train/test approach cleanly avoids the circular analysis problem. It is not clear how to use cross-validation in an ICA back projection approach since template time courses cannot be assumed to remain the same across ICA decompositions.

Subject specific structured noise is quite variable in terms of its spatial structure. Hence, a group ICA analysis cannot easily model or account for subject specific structured noise via group level ICs. Consequently, subject specific spatial maps in ICA back projection or dual regression will have a noise component that is purely driven by the amount of structured noise in individual subjects. On the other hand, a single subject ICA based analysis can accurately model subject specific structured noise via single subject ICs.
5.4.2 Discussion of multiple ICA run approaches

(Zuo et al.,