A Manifold Regularized Multi-Task Learning Model for IQ Prediction from Multiple fMRI Paradigms
Abstract
Multimodal brain functional connectivity (FC) data have shown great potential for providing insights into individual variations in behavioral and cognitive traits. The joint learning of multimodal imaging data can utilize the intrinsic association, and thus can boost the learning performance. Although several multi-task based learning models have already been proposed by viewing the feature learning on each modality as one task, most of them ignore the geometric structure information inherent in the modalities, which may play an important role in extracting discriminative features. In this paper, we propose a new manifold regularized multi-task learning model by simultaneously considering between-subject and between-modality relationships. Besides employing a group-sparsity regularizer to jointly select a few common features across multiple tasks (modalities), we design a novel manifold regularizer to preserve the structure information both within and between modalities in our model. This will make our model more adaptive for realistic data analysis. Our model is then validated on the Philadelphia Neurodevelopmental Cohort dataset, where we regard our modalities as functional MRI (fMRI) data collected under two paradigms. Specifically, we conduct experimental studies on fMRI based FC network data in two task conditions for intelligence quotient (IQ) prediction. The results demonstrate that our proposed model can not only achieve improved prediction performance, but also yield a set of IQ-relevant biomarkers.
I Introduction
In recent decades, the human brain functional connectome has emerged as an important “fingerprint” to provide insights into individual variations in behavioral and cognitive traits [1, 2, 3]. The functional connectome is quantitatively characterized by a functional connectivity network (FCN) based on graph theory, where the spatially distributed but functionally linked regions-of-interest (ROIs) in the brain represent the nodes and the functional connectivities (FCs) defined as the correlations between the time courses of ROIs represent the edges. The Pearson correlation is widely adopted to measure the FC for its efficiency. It is also worth noting that among neuroimaging studies, functional magnetic resonance imaging (fMRI) is one of the most popular modalities to analyze brain FCNs due to its non-invasiveness, high spatial resolution, and good temporal resolution [4, 5, 6].
Functional connectome-based analyses using fMRI have offered great potential for understanding the brain-behavior and cognition relationship, while accounting for variables such as age, gender, intelligence, and disease [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. For instance, Meier et al. [7] constructed FCNs from resting-state fMRI, and then based on these resting-state FCNs, healthy younger and older adults were discriminated by a support vector machine (SVM) classifier. Tian et al. [9] investigated gender-related differences in the topological organization of resting-state FCNs within the hemispheres on the basis of typical statistical tests. While FCNs are usually constructed from resting-state fMRI, task fMRI based FCs can better explore how individual traits are influenced by brain activity changes induced by trait-related tasks [18, 19]. Calhoun et al. [14] used independent component analysis to study fMRI based FCNs from a large group of schizophrenia patients, individuals with bipolar disorder, and healthy controls while performing an auditory oddball task, followed by a multivariate statistical testing framework to infer group differences in properties of identified FCNs. Greene and Gao et al. [16, 17] showed, through experiments on two large, independent datasets, that predictive models built from task fMRI based FC data (e.g., working memory or emotion) can lead to better predictions of fluid intelligence than models built from resting-state fMRI based FC data. As such, certain tasks may bring about meaningful findings across subjects with different traits, essentially facilitating biomarker identification beyond what can be found in the resting state.
The majority of previous work has focused on one imaging modality (e.g., resting-state or task fMRI). In neuroimaging research studies, it is common to acquire multimodal imaging from the same experimental subjects to provide complementary information. It has also been suggested in [20, 21] that there is a commonality between different modalities (in brain imaging, a modality can refer to different functional tasks or different imaging modalities) implicated by the same underlying pathology. To this end, it is highly desirable to develop an approach for a joint analysis of multiple modalities to boost learning performance. Recently, there have been notable efforts to incorporate multiple modalities in a unified framework for schizophrenia and Alzheimer’s disease (AD) diagnosis [22, 23, 24, 25, 26, 27, 28]. Specifically, Zhang et al. [25] proposed a multimodal multi-task learning model, where multi-task feature learning jointly selected a small number of common features from multiple modalities, and then a multimodal SVM fused these selected features for both classification and regression. Jie et al. [26] and Lei et al. [27] studied a manifold regularized multi-task learning model by viewing the feature learning on each modality as one task. In addition to the group-sparsity regularizer that ensures a few common features are jointly selected across multiple modalities (tasks), this model included a manifold regularizer that preserves the structure information of the data (also called the subject-subject relation) within each single modality. Zhu et al. [28] extended the model in [26] by imposing another two manifold regularizers that preserve the feature-feature relation and response-response relation, respectively. However, all these multi-task based models ignore the subject-subject relation between modalities, which could otherwise improve the final performance.
In this paper, motivated by the work in [26], we propose a new manifold regularized multi-task learning model, which considers not only the relation of subjects within each single modality but also the relation of subjects between modalities. We extend the model in [26] by replacing the manifold regularizer with a novel one which defines the similarity (or the relation) of subjects by using the Gaussian radial basis function. In particular, the similarity of subjects between different modalities is calculated by propagating the similarity information of subjects within each individual modality based on a weighted graph diffusion process. Motivation for this idea is derived from multi-view spectral clustering studied in [29, 30], and we will introduce it in detail in the next section. From the machine learning point of view, this well-designed manifold regularizer can extract more discriminative features and thereby improve the performance of subsequent prediction. To validate the efficiency and effectiveness of our proposed model, we perform extensive experiments on the publicly available Philadelphia Neurodevelopmental Cohort (PNC) dataset [31, 32]. Here we predict the continuous-value intelligence quotient (IQ) scores of subjects by using fMRI data in two task conditions (working memory and emotion), and the goal of this study is to investigate which common FCs from the two functional imaging modalities (here our modalities refer to fMRI data collected under two paradigms) contribute most to individual variations in IQ. To be specific, we first construct two FCNs from the two corresponding task fMRI datasets for each subject, respectively. We then regard these FCs as features extracted from the fMRI data and input them into our proposed model for subsequent analysis. It is shown that our proposed model yields improved performance in comparison to the competing models under the metrics of the root mean square error and the correlation coefficient.
The main contributions of this paper are twofold. First, we propose a new manifold regularized multi-task learning model that has two apparent advantages: 1) it incorporates complementary information from multiple modalities by jointly learning a small number of common features; and 2) it employs a novel manifold regularizer to preserve the structure information of the data both within and between modalities. Second, we apply the proposed model on the real PNC dataset to identify relevant FC biomarkers for IQ prediction using two sets of task fMRI data, and the experimental results show that the proposed model can not only outperform the existing state-of-the-art models, but also discover IQ-relevant predictors that are in accordance with prior studies.
The remainder of this paper is organized as follows. Section II describes the existing multi-task based learning models and our proposed new model. Section III presents the experimental results on the PNC data and some discussions. Finally, we conclude this paper in Section IV.
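Notations: Throughout this paper, uppercase boldface, lowercase boldface, and normal italic letters are used to denote matrices, vectors, and scalars, respectively. The superscript $\top$ denotes the transpose of a vector or a matrix. For a matrix $\mathbf{A}$, we denote its $i$-th row, $j$-th column, $(i,j)$-th entry, and trace as $\mathbf{a}^i$, $\mathbf{a}_j$, $a_{ij}$, and $\mathrm{tr}(\mathbf{A})$, respectively. For a vector $\mathbf{a}$, its $i$-th entry is denoted as $a_i$. We further denote the Frobenius norm and $\ell_{2,1}$ norm of a matrix $\mathbf{A}$ as $\|\mathbf{A}\|_F = \sqrt{\sum_i \|\mathbf{a}^i\|_2^2}$ and $\|\mathbf{A}\|_{2,1} = \sum_i \|\mathbf{a}^i\|_2$, respectively. Let $\mathbb{R}$ denote the set of real numbers.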
II Methods
Multi-task learning (MTL) aims to improve the performance of multiple tasks by exploiting their relationships, particularly when these tasks have some relatedness or commonality [33, 34]. In [26], a manifold regularized multi-task learning model was recently proposed for jointly selecting a small number of common features from multiple modalities and achieved superior performance in AD classification, where each modality was viewed as one task. Importantly, compared with the classical multi-task learning model, this model considered the structure information of the data within each single modality by adding a manifold regularizer. Motivated by the approach in [26], in this paper we propose a new manifold regularized multi-task learning model, which includes our newly designed manifold regularizer that considers the structure information of the data both within each single modality and between modalities. In this section, we first briefly introduce the existing multi-task based learning models, and subsequently present our proposed model as well as the optimization algorithm.
II-A Classical multi-task learning (MTL)
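Assume that there are $M$ different modalities (i.e., tasks). We denote the $m$-th modality as $\mathbf{X}^{(m)} = [\mathbf{x}_1^{(m)}, \mathbf{x}_2^{(m)}, \ldots, \mathbf{x}_n^{(m)}]^\top \in \mathbb{R}^{n \times d}$ for $m = 1, 2, \ldots, M$, where $\mathbf{x}_i^{(m)} \in \mathbb{R}^{d}$ represents the feature vector of the $i$-th subject in the $m$-th modality, and $d$ and $n$ respectively stand for the numbers of features and subjects. Let $\mathbf{y} = [y_1, y_2, \ldots, y_n]^\top \in \mathbb{R}^{n}$ be the response vector from these $n$ subjects, and $\mathbf{w}^{(m)} \in \mathbb{R}^{d}$ be the regression coefficient vector for the $m$-th modality. Then, the MTL model is to solve the following optimization problem:
(1) $\min_{\mathbf{W}} \; \frac{1}{2}\sum_{m=1}^{M} \left\|\mathbf{y} - \mathbf{X}^{(m)}\mathbf{w}^{(m)}\right\|_2^2 + \alpha \|\mathbf{W}\|_{2,1}$
where $\mathbf{W} = [\mathbf{w}^{(1)}, \mathbf{w}^{(2)}, \ldots, \mathbf{w}^{(M)}] \in \mathbb{R}^{d \times M}$ denotes the regression coefficient matrix and $\alpha > 0$ is a regularization parameter that balances the tradeoff between residual error and sparsity. The $\ell_{2,1}$ norm encourages these multiple predictors from different modalities to share similar parameter sparsity patterns, through which the MTL model can result in improved performance for the modality-specific models over training the models separately. It is readily seen that (1) is reduced to the least absolute shrinkage and selection operator (LASSO) problem [35] when the number of modalities $M$ equals one.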
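As an illustration of the group-sparsity objective described above, the following is a minimal NumPy sketch. It assumes a squared-error loss summed over modalities plus a weighted sum of row-wise Euclidean norms of the coefficient matrix; the function names `l21_norm` and `mtl_objective` and the argument `reg` are hypothetical and are not taken from the paper.

```python
import numpy as np


def l21_norm(W):
    """Group-sparsity norm: sum over rows (features) of the row-wise Euclidean norm."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def mtl_objective(X_list, y, W, reg):
    """Squared-error loss summed over modalities plus reg * ||W||_{2,1}.

    X_list[m] has shape (n_subjects, n_features); column m of W holds the
    coefficients of modality m. All names here are illustrative.
    """
    loss = sum(0.5 * np.sum((y - X @ W[:, m]) ** 2) for m, X in enumerate(X_list))
    return float(loss + reg * l21_norm(W))
```

With a single modality, `W` has one column and `l21_norm` reduces to the sum of absolute coefficient values, which matches the LASSO special case mentioned in the text.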
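II-B Manifold regularized multi-task learning (M2TL)
In the classical MTL model above, only the relation between data and the response values is considered, while the structure information of data is ignored, which most likely leads to large deviations. With the expectation that similar subjects should have similar response values, a manifold regularizer that takes into account the subject-subject relation within each single modality is therefore introduced as follows:
(2) $\sum_{m=1}^{M}\sum_{i,j=1}^{n} S_{ij}^{(m)} \left(\hat{y}_i^{(m)} - \hat{y}_j^{(m)}\right)^2$
where $\hat{\mathbf{y}}^{(m)} = \mathbf{X}^{(m)}\mathbf{w}^{(m)}$ is the estimated response vector and $\mathbf{S}^{(m)} = [S_{ij}^{(m)}] \in \mathbb{R}^{n \times n}$ is the similarity matrix that defines the similarity for each pair of subjects in the $m$-th modality. As for the similarity matrix $\mathbf{S}^{(m)}$, we construct an adjacency graph by regarding each subject as a node and using the $k$-nearest neighbor rule along with the Gaussian radial basis function to calculate the edge weights as the similarities. If $\mathbf{x}_i^{(m)}$ is among the $k$ nearest neighbors of $\mathbf{x}_j^{(m)}$ or $\mathbf{x}_j^{(m)}$ is among the $k$ nearest neighbors of $\mathbf{x}_i^{(m)}$, their similarity is defined as
(3) $S_{ij}^{(m)} = \exp\left(-\left\|\mathbf{x}_i^{(m)} - \mathbf{x}_j^{(m)}\right\|_2^2 / \sigma\right)$
where $\sigma$ is a free parameter to be fixed empirically as the mean of $\{\|\mathbf{x}_i^{(m)} - \mathbf{x}_j^{(m)}\|_2^2\}$; otherwise, the similarity is set to zero, i.e., $S_{ij}^{(m)} = 0$. Let $\mathbf{L}^{(m)} = \mathbf{D}^{(m)} - \mathbf{S}^{(m)}$ be the Laplacian matrix of the graph, where $\mathbf{D}^{(m)}$ is a diagonal matrix with the diagonal elements being $D_{ii}^{(m)} = \sum_{j=1}^{n} S_{ij}^{(m)}$ for $i = 1, 2, \ldots, n$. Then, (2) can be simplified as
(4) $2\sum_{m=1}^{M} \left(\mathbf{X}^{(m)}\mathbf{w}^{(m)}\right)^\top \mathbf{L}^{(m)} \left(\mathbf{X}^{(m)}\mathbf{w}^{(m)}\right)$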
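The graph construction described in this subsection can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the Gaussian bandwidth is taken as the mean squared pairwise distance, the number of neighbours `k` is a free argument, and the function names are hypothetical. The sketch also lets one check numerically that the pairwise form of the regularizer equals twice the Laplacian quadratic form when the similarity matrix is symmetric.

```python
import numpy as np


def knn_gaussian_similarity(X, k):
    """Symmetric k-nearest-neighbour graph with Gaussian RBF edge weights.

    X: (n_subjects, n_features). The bandwidth is taken as the mean squared
    pairwise distance, which is an assumption of this sketch.
    """
    n = X.shape[0]
    gram = X @ X.T
    sq = np.maximum(np.diag(gram)[:, None] + np.diag(gram)[None, :] - 2.0 * gram, 0.0)
    off_diag = ~np.eye(n, dtype=bool)
    sigma = sq[off_diag].mean()
    # k nearest neighbours of each subject, excluding the subject itself
    nearest = np.argsort(np.where(off_diag, sq, np.inf), axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nearest.ravel()] = True
    mask = mask | mask.T  # keep an edge if either subject is a neighbour of the other
    return np.where(mask, np.exp(-sq / sigma), 0.0)


def graph_laplacian(S):
    """L = D - S, with D the diagonal matrix of row sums of S."""
    return np.diag(S.sum(axis=1)) - S
```

For a symmetric similarity matrix `S` and predictions `yhat`, the double sum of `S[i, j] * (yhat[i] - yhat[j]) ** 2` equals `2 * yhat @ graph_laplacian(S) @ yhat`, which is the simplification used to pass from the pairwise form to the Laplacian form.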
II-C Proposed new M2TL (NM2TL)
Compared with the MTL model, one appealing property of the M2TL model is that the introduced manifold regularizer in (5) can preserve the structure information of data. However, it only considers the relation of subjects within each single modality separately, but the important mutual relation of subjects between modalities is ignored. Motivated by this, in this subsection we propose a new M2TL (NM2TL) model that effectively considers both the relation of subjects within the same modality and that between modalities.
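We first design the following novel manifold regularizer
(6) $\mathcal{R}(\mathbf{W}) = \sum_{p,q=1}^{M} c_{pq} \sum_{i,j=1}^{n} S_{ij}^{(p,q)} \left(\hat{y}_i^{(p)} - \hat{y}_j^{(q)}\right)^2$
where $c_{pq}$ is a constant such that $c_{pq} = \beta$ when $p = q$, and $c_{pq} = \gamma$ when $p \neq q$. Similarly, $\mathbf{S}^{(p,q)}$ is the similarity matrix for each pair of subjects between the $p$-th and $q$-th modalities, i.e., $S_{ij}^{(p,q)}$ denotes the similarity of the $i$-th subject in the $p$-th modality and the $j$-th subject in the $q$-th modality, with $\mathbf{S}^{(p,p)} = \mathbf{S}^{(p)}$. Note that $\mathcal{R}(\mathbf{W})$ in (6) is composed of two parts: the first part preserves the relation of subjects within each single modality; and the second part preserves the relation of subjects between modalities. The two free parameters $\beta$ and $\gamma$ respectively control the effects of the two corresponding parts. In Fig. 1, the difference between the manifold regularizers in the M2TL and NM2TL models can readily be recognized.
A natural question is how to define the similarity of subjects between two modalities (or nodes from two graphs). We expect that if $\mathbf{x}_i^{(p)}$ and $\mathbf{x}_k^{(p)}$ (i.e., two subjects in the same modality) are similar, the co-occurring subject $\mathbf{x}_k^{(q)}$ corresponding to $\mathbf{x}_k^{(p)}$ should also be similar to $\mathbf{x}_i^{(p)}$. As presented in [29, 30], the similarity of $\mathbf{x}_i^{(p)}$ and $\mathbf{x}_j^{(q)}$ was calculated in a smooth way by summing over all co-occurrences, $\mathbf{x}_k^{(p)}$ and $\mathbf{x}_k^{(q)}$ for $k = 1, 2, \ldots, n$, i.e.,
(7) $S_{ij}^{(p,q)} = \sum_{k=1}^{n} S_{ik}^{(p)} S_{kj}^{(q)}$
or in matrix form
(8) $\mathbf{S}^{(p,q)} = \mathbf{S}^{(p)}\mathbf{S}^{(q)}$
where $\mathbf{S}^{(p)}$ and $\mathbf{S}^{(q)}$ are the similarity matrices for the $p$-th and $q$-th modalities and calculated by (3), respectively. We then put them in a large matrix $\mathbf{S} \in \mathbb{R}^{Mn \times Mn}$ of the following blockwise form:
(9) $\mathbf{S} = \begin{bmatrix} \beta\mathbf{S}^{(1)} & \gamma\mathbf{S}^{(1,2)} & \cdots & \gamma\mathbf{S}^{(1,M)} \\ \gamma\mathbf{S}^{(2,1)} & \beta\mathbf{S}^{(2)} & \cdots & \gamma\mathbf{S}^{(2,M)} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma\mathbf{S}^{(M,1)} & \gamma\mathbf{S}^{(M,2)} & \cdots & \beta\mathbf{S}^{(M)} \end{bmatrix}$
such that along the diagonal, $\beta$ is used to tune the within-modality similarity, and off the diagonal, $\gamma$ is used to tune the between-modality similarity. It is obvious that $\mathbf{S}$ is still symmetrical. Accordingly, by calculating the diagonal matrix $\mathbf{D}$ where its diagonal elements are $D_{ii} = \sum_{j=1}^{Mn} S_{ij}$ for $i = 1, 2, \ldots, Mn$, we get
(10) $\mathbf{L} = \mathbf{D} - \mathbf{S}$
Therefore, it is not hard to verify that (6) can be equivalently expressed as
(11) $\mathcal{R}(\mathbf{W}) = 2\,\hat{\mathbf{y}}^\top \mathbf{L}\,\hat{\mathbf{y}}, \quad \hat{\mathbf{y}} = \left[(\mathbf{X}^{(1)}\mathbf{w}^{(1)})^\top, \ldots, (\mathbf{X}^{(M)}\mathbf{w}^{(M)})^\top\right]^\top \in \mathbb{R}^{Mn}$
Based on the new manifold regularizer in (11), the NM2TL model is proposed as follows:
(12) $\min_{\mathbf{W}} \; \frac{1}{2}\sum_{m=1}^{M} \left\|\mathbf{y} - \mathbf{X}^{(m)}\mathbf{w}^{(m)}\right\|_2^2 + \alpha\|\mathbf{W}\|_{2,1} + \mathcal{R}(\mathbf{W})$
where $\alpha$, $\beta$, and $\gamma$ denote control parameters of the respective regularizers. In our NM2TL model (12), the $\ell_{2,1}$ norm regularizer ensures a sparse set of common features to be jointly learned from multiple modalities, and the manifold regularizer attempts to preserve the structure information of the data both within each single modality and between modalities. Thus, it may extract more discriminative features.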
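The blockwise similarity matrix and its Laplacian described above can be sketched in a few lines of NumPy. The sketch assumes that the between-modality similarity is the matrix product of the two within-modality similarity matrices, that the diagonal and off-diagonal blocks are scaled by two separate weights, and that the within-modality matrices are symmetric. The names `joint_similarity`, `joint_laplacian`, `w_within`, and `w_between` are hypothetical.

```python
import numpy as np


def joint_similarity(S_list, w_within, w_between):
    """Stack within- and between-modality similarities into one block matrix.

    Diagonal block p is w_within * S_p; off-diagonal block (p, q) is
    w_between * (S_p @ S_q), i.e. similarity propagated through shared subjects.
    """
    M = len(S_list)
    blocks = [[w_within * S_list[p] if p == q else w_between * (S_list[p] @ S_list[q])
               for q in range(M)] for p in range(M)]
    return np.block(blocks)


def joint_laplacian(S_list, w_within, w_between):
    """Laplacian D - S of the block similarity matrix, with D its row sums."""
    S = joint_similarity(S_list, w_within, w_between)
    return np.diag(S.sum(axis=1)) - S
```

Because block (q, p) is the transpose of block (p, q) when each within-modality matrix is symmetric, the stacked matrix is symmetric, and the weighted pairwise sum over all modality pairs equals twice the quadratic form of the stacked predictions with this Laplacian.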
Remark 1
More recently, a similar model has been developed in [36] for identifying the associations between genetic risk factors and multiple neuroimaging modalities under the guidance of the a priori diagnosis information (i.e., AD status). Specifically, a diagnosis-aligned regularizer was introduced to fully explore the relation of subjects with the class-level diagnosis information in multimodal imaging such that subjects from the same class will be close to each other after being mapped into the label space, i.e.,
(13) 
where the similarity is defined as
(14) 
In this way, we can identify a set of common features that are associated with both risk genetic factors and disease status in order to have a better understanding of the biological pathway specific to AD. Our manifold regularizer in the NM2TL model can be clearly distinguished from the above diagnosis-aligned regularizer in a number of aspects: 1) Our proposed manifold regularizer is designed to preserve the geometric structure across modalities such that if the distance between subjects is small, their mapped response values in the label space will also be close, whereas the regularizer in (13) aims to preserve the class-level diagnosis information. 2) In our proposed manifold regularizer, we calculate the similarity of subjects using the Gaussian radial basis function, and in particular the similarity of subjects between different modalities is obtained by propagating the similarity information of subjects within each individual modality based on a weighted graph diffusion process. This similarity measure has been proven to be effective in preserving the structure information of the original data. 3) We use two different parameters $\beta$ and $\gamma$ in our proposed manifold regularizer to balance the relative contribution of the structure information of the data within a single modality and that between modalities, which can result in a better fit to realistic data analysis.
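II-D Optimization algorithm
Clearly, the objective function in (12) is convex but nondifferentiable with respect to $\mathbf{W}$. We write it as a summation of two functions:
(15) $f(\mathbf{W}) = \frac{1}{2}\sum_{m=1}^{M} \left\|\mathbf{y} - \mathbf{X}^{(m)}\mathbf{w}^{(m)}\right\|_2^2 + 2\,\hat{\mathbf{y}}^\top \mathbf{L}\,\hat{\mathbf{y}}$
(16) $g(\mathbf{W}) = \alpha\|\mathbf{W}\|_{2,1}$
where $\hat{\mathbf{y}}$ stacks the estimated responses $\mathbf{X}^{(m)}\mathbf{w}^{(m)}$ of all modalities, $f$ is convex and differentiable, while $g$ is convex but nondifferentiable. In this scenario, we optimize $\mathbf{W}$ in (12) by the commonly used accelerated proximal gradient method [36, 37, 38, 39].
We iteratively update $\mathbf{W}$ with the following procedure:
(17) $\mathbf{W}_{t+1} = \arg\min_{\mathbf{W}} \; \Omega_{\eta}(\mathbf{W}, \mathbf{W}_t)$
where
(18) $\Omega_{\eta}(\mathbf{W}, \mathbf{W}_t) = f(\mathbf{W}_t) + \langle \mathbf{W} - \mathbf{W}_t, \nabla f(\mathbf{W}_t) \rangle + \frac{1}{2\eta}\|\mathbf{W} - \mathbf{W}_t\|_F^2 + g(\mathbf{W})$
$\mathbf{W}_t$ stands for the value of $\mathbf{W}$ obtained at the $t$-th iteration, $\langle \cdot, \cdot \rangle$ denotes the Frobenius inner product of two matrices, $\nabla f(\mathbf{W}_t)$ is the gradient of $f$ at point $\mathbf{W}_t$, and $\eta$ is a step size. As a result of simple calculation, we get
(19) $\nabla_{\mathbf{w}^{(m)}} f(\mathbf{W}) = \mathbf{X}^{(m)\top}\left(\mathbf{X}^{(m)}\mathbf{w}^{(m)} - \mathbf{y}\right) + 4\,\mathbf{X}^{(m)\top}\sum_{q=1}^{M} \mathbf{L}_{mq}\mathbf{X}^{(q)}\mathbf{w}^{(q)}, \quad m = 1, 2, \ldots, M$
where $\mathbf{L}_{pq} \in \mathbb{R}^{n \times n}$ denotes the $(p,q)$-th block of $\mathbf{L}$ in (10), i.e., $\mathbf{L} = [\mathbf{L}_{pq}]_{p,q=1}^{M}$.
By ignoring the terms independent of $\mathbf{W}$ in (17), the update procedure can be written as
(20) $\mathbf{W}_{t+1} = \arg\min_{\mathbf{W}} \; \frac{1}{2}\|\mathbf{W} - \mathbf{V}_t\|_F^2 + \eta\, g(\mathbf{W})$
where $\mathbf{V}_t = \mathbf{W}_t - \eta\nabla f(\mathbf{W}_t)$. In fact, (20) is equivalently expressed as
(21) $\mathbf{W}_{t+1} = \mathrm{prox}_{\eta g}(\mathbf{V}_t)$
where $\mathrm{prox}_{\eta g}$ denotes the proximal operator [38] of the scaled function $\eta g$. Due to the separability of $g$ on each row, i.e., $g(\mathbf{W}) = \alpha\sum_{i=1}^{d}\|\mathbf{w}^i\|_2$, in (20), we can solve the optimization problem for each row individually:
(22) $\mathbf{w}_{t+1}^{i} = \arg\min_{\mathbf{w}^{i}} \; \frac{1}{2}\|\mathbf{w}^{i} - \mathbf{v}_t^{i}\|_2^2 + \eta\alpha\|\mathbf{w}^{i}\|_2$
In (22), the closed-form solution of $\mathbf{w}_{t+1}^{i}$ can be easily obtained [38]:
(23) $\mathbf{w}_{t+1}^{i} = \begin{cases} \left(1 - \dfrac{\eta\alpha}{\|\mathbf{v}_t^{i}\|_2}\right)\mathbf{v}_t^{i}, & \text{if } \|\mathbf{v}_t^{i}\|_2 > \eta\alpha \\ \mathbf{0}, & \text{otherwise} \end{cases}$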
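The row-wise closed-form solution referred to above is the standard group soft-thresholding (block shrinkage) operator. A minimal NumPy sketch follows, where `V` plays the role of the gradient-step point and `tau` is the product of the step size and the sparsity weight; both names are illustrative.

```python
import numpy as np


def prox_l21(V, tau):
    """Row-wise group soft-thresholding: the proximal operator of tau * ||.||_{2,1}.

    Each row is shrunk towards zero by tau in Euclidean norm; rows whose norm
    is at most tau are set exactly to zero.
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * V
```

Rows that are zeroed out correspond to features discarded jointly across all modalities, which is how the group-sparsity regularizer selects common features.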
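Furthermore, in order to accelerate the proximal gradient method, we introduce an auxiliary variable
(24) $\mathbf{Q}_t = \mathbf{W}_t + \theta_t\left(\mathbf{W}_t - \mathbf{W}_{t-1}\right)$
and compute the gradient descent based on $\mathbf{Q}_t$ instead of $\mathbf{W}_t$, where the coefficient $\theta_t$ is set as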
(25) 
The pseudocode of the proposed optimization algorithm is summarized in Algorithm 1.
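Since Algorithm 1 is not reproduced here, the following is only a rough NumPy sketch of an accelerated proximal gradient loop for an objective of this type, not the authors' exact procedure. It assumes the smooth part is the summed squared error plus `2 * yhat @ L @ yhat` for a symmetric positive semidefinite Laplacian `L` over the stacked predictions, uses a fixed step size from a Lipschitz bound, and uses the common momentum coefficient `(t - 1) / (t + 2)`. All of these choices, and all function names, are assumptions of the sketch.

```python
import numpy as np


def prox_l21(V, tau):
    """Row-wise group soft-thresholding (proximal operator of tau * ||.||_{2,1})."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0) * V


def grad_smooth(X_list, y, L, W):
    """Gradient of 0.5 * sum_m ||y - X_m w_m||^2 + 2 * yhat^T L yhat."""
    n = y.shape[0]
    yhat = np.concatenate([X @ W[:, m] for m, X in enumerate(X_list)])
    Ly = L @ yhat
    cols = [X.T @ (X @ W[:, m] - y) + 4.0 * X.T @ Ly[m * n:(m + 1) * n]
            for m, X in enumerate(X_list)]
    return np.stack(cols, axis=1)


def apg(X_list, y, L, alpha, n_iter=200):
    """Accelerated proximal gradient with a fixed step (assumed momentum rule)."""
    d, M = X_list[0].shape[1], len(X_list)
    # Upper bound on the Lipschitz constant of the gradient of the smooth part
    lip = max(np.linalg.norm(X, 2) ** 2 for X in X_list) * (1.0 + 4.0 * np.linalg.eigvalsh(L)[-1])
    step = 1.0 / lip
    W_prev = W = np.zeros((d, M))
    for t in range(1, n_iter + 1):
        Q = W + (t - 1.0) / (t + 2.0) * (W - W_prev)  # extrapolated search point
        W_prev, W = W, prox_l21(Q - step * grad_smooth(X_list, y, L, Q), step * alpha)
    return W
```

A backtracking line search could replace the fixed step when computing the largest eigenvalue of `L` is too costly; the fixed step is used here only to keep the sketch short.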
III Experimental Results
III-A Data preprocessing
In this study, we used the Philadelphia Neurodevelopmental Cohort (PNC) dataset [31, 32] for performance evaluation. The PNC is a large-scale collaborative research project between the Brain Behavior Laboratory at the University of Pennsylvania and the Center for Applied Genomics at the Children’s Hospital of Philadelphia. The primary objective of the PNC project was to characterize brain and behavior interaction with genetics that combines neuroimaging, diverse clinical and cognitive phenotypes, and genomics. Nearly adolescents aged – years underwent multimodal neuroimaging including resting-state fMRI, and fMRI of working memory and emotion identification tasks (called n-back fMRI and emotion fMRI, respectively) in this research. All data acquired as part of the PNC can be freely downloaded from the public dbGaP site (www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v1.p1).
In this paper, we investigated the relationship between individual differences in IQ and brain activity during the engagement of cognitive abilities, i.e., working memory and emotion identification. The IQ scores of subjects were assessed with the Wide Range Achievement Test (WRAT), which was one test from a hour computerized neurocognitive battery (CNB) administered in the PNC. The WRAT is a standardized achievement test to measure an individual’s learning ability, e.g., reading recognition, spelling, and math computation [40], and hence provides a reliable estimate of IQ. To mitigate the influence of age over the final results, we excluded subjects whose ages were below years [41]. As a consequence, we were left with subjects (age: – and years; WRAT score: – and ; female/male: ), providing both n-back fMRI and emotion fMRI. The distribution of IQ scores of these subjects is shown in Fig. 2.
All MRI scans were performed on a single T Siemens TIM Trio whole-body scanner. In the fractal n-back task to probe working memory, subjects were required to respond to a presented fractal only when it was the same as the one presented on a previous trial. In the emotion identification task, subjects were asked to identify faces displaying neutral, happy, sad, angry, or fearful expressions. All image data were acquired with a single-shot, interleaved multi-slice, gradient-echo, echo planar imaging sequence. We implemented image preprocessing separately for n-back fMRI and emotion fMRI of the selected subjects. The preprocessing procedures were similar to those used in [41, 42, 43, 44]. Specifically, standard preprocessing steps were applied using SPM12 (www.fil.ion.ucl.ac.uk/spm/), which primarily consisted of motion correction, spatial normalization to standard MNI space, and spatial smoothing with a mm FWHM Gaussian kernel. The functional time courses were subsequently bandpass filtered at –Hz. ROIs were defined to describe the whole brain as mm diameter spheres centered upon ROI coordinates introduced in [45]. We then calculated the Pearson correlation between the time courses of each pair of ROIs, resulting in a correlation matrix (FC matrix) for each subject in each single fMRI modality (here we regarded our modalities as fMRI data collected under the two paradigms). To avoid repeated information, only the lower triangular portion of the symmetrical correlation matrix was reshaped into a vector with correlation values. Fisher’s z-transform was applied to these correlations to ensure normality. The FCs (Fisher’s z-transformed values) were the features used in all subsequent analysis. As a result, we extracted features from n-back fMRI and features from emotion fMRI for each subject.
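The feature-extraction step described above (Pearson correlation between ROI time courses, lower triangle only, Fisher's z-transform) can be illustrated with a short NumPy sketch. The function name `fc_features` and the input layout (time points in rows, ROIs in columns) are assumptions made for illustration.

```python
import numpy as np


def fc_features(time_courses):
    """Vectorised functional connectivity features for one subject.

    time_courses: array of shape (n_timepoints, n_rois).
    Returns the Fisher z-transformed Pearson correlations of the strictly
    lower triangular part, a vector of length n_rois * (n_rois - 1) / 2.
    """
    r = np.corrcoef(time_courses, rowvar=False)
    rows, cols = np.tril_indices(r.shape[0], k=-1)  # exclude the diagonal
    return np.arctanh(r[rows, cols])
```

For a parcellation with `R` ROIs this yields `R * (R - 1) / 2` features per subject and per paradigm, since the correlation matrix is symmetric and its diagonal carries no information.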
Model  Modality  CC (mean ± std)  p-value  RMSE (mean ± std)  p-value
SM  n-back
SM  emotion
MTL  n-back
MTL  emotion
M2TL  n-back
M2TL  emotion
NM2TL  n-back  –  –
NM2TL  emotion  –  –
p-values were calculated by pairwise t-test comparisons between the regression accuracy of our NM2TL model and other competing models for each modality.
std denotes the standard deviation.
III-B Experimental settings
In our experiments, we compared the performance of the proposed NM2TL model and three other competing models: (1) SM (a single-modality based model with LASSO [35], which is used to detect a significant subset of FCs from n-back or emotion fMRI); (2) MTL [25]; and (3) M2TL [26]. We used a fold cross-validation (CV) technique to evaluate the IQ prediction performance of all these predictive models. That is, the whole set of subjects was first randomly partitioned into disjoint subsets of as nearly equal size as possible; then each subset was successively selected as the test set and the other subsets were used for training the predictive model; and finally the trained model was applied to predict IQ scores of the subjects in the test set. This process was repeated for times independently to reduce the effect of sampling bias in the CV. All regularization parameters in the models, including the group-sparsity level and the manifold regularization parameters and , were tuned by a fold inner CV on the training set through a grid search within their respective ranges, i.e., . The $k$ in the $k$-nearest neighbor rule for the graph similarity matrix calculation was empirically set as .
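The repeated cross-validation protocol described above can be sketched with a small generator. The numbers of folds and repetitions are left as arguments, and the function name `repeated_kfold` and the seeding scheme are hypothetical choices for illustration.

```python
import numpy as np


def repeated_kfold(n_subjects, n_folds, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated K-fold cross-validation.

    Subjects are reshuffled at every repetition and split into folds whose
    sizes differ by at most one.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(n_subjects), n_folds)
        for f in range(n_folds):
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            yield train, folds[f]
```

An inner loop of the same form, applied to each `train` index set, could serve for the grid search over regularization parameters, so that test subjects never influence parameter selection.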
One of the challenges encountered when using these predictive models is that whole-brain FC data consist of a large number of features (i.e., FCs) and a relatively small number of samples (i.e., subjects). This would give rise to various issues, such as proneness to overfitting, difficult interpretability, and computational burden. To this end, we used a simple univariate feature filtering technique to reduce the number of features prior to inputting them into the predictive models. Specifically, we discarded features for which the p-values of the correlation with IQ scores of subjects in the n-back and emotion fMRI training set were both greater than or equal to , and then trained the predictive models. All the remaining features of training subjects were normalized to have zero mean and unit norm, and the estimated mean and norm values of training subjects were used to normalize the corresponding features of testing subjects. Accordingly, we also conducted the mean-centering on IQ scores of training subjects and then used the mean IQ value of training subjects to normalize the IQ scores of testing subjects. The model performance on each modality was quantified as the root mean square error (RMSE) and the correlation coefficient (CC) between predicted and actual IQ scores of subjects in the test set. An overview of the proposed framework is outlined in Fig. 3.
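The filtering, normalization, and evaluation steps described above can be sketched as follows. One substitution is made and should be read as an assumption: instead of thresholding correlation p-values, the sketch thresholds the absolute Pearson correlation, which is equivalent for a fixed number of training subjects because the p-value decreases monotonically as the absolute correlation grows. The cutoff `r_min` and all function names are hypothetical.

```python
import numpy as np


def abs_corr(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(Xc.T @ yc) / np.maximum(denom, 1e-12)


def keep_mask(X_train_list, y_train, r_min):
    """Keep a feature unless it is weakly correlated with y in every modality."""
    return np.any([abs_corr(X, y_train) > r_min for X in X_train_list], axis=0)


def normalize(X_train, X_test):
    """Zero mean and unit norm per feature, using training statistics only."""
    mu = X_train.mean(axis=0)
    nrm = np.maximum(np.linalg.norm(X_train - mu, axis=0), 1e-12)
    return (X_train - mu) / nrm, (X_test - mu) / nrm


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cc(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

Computing the mask and the normalization statistics on training subjects only, inside each CV fold, avoids leaking information from the test subjects into feature selection.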
III-C Regression results
Table I summarizes the regression performance of all competing models for IQ prediction. As we can see from Table I, the proposed NM2TL model consistently outperformed the other predictive models in terms of both the RMSE and the CC. Specifically, our proposed NM2TL model achieved the best CCs of for n-back fMRI and for emotion fMRI, and the best RMSEs of for n-back fMRI and for emotion fMRI. The next best performance was obtained by the M2TL model, i.e., for n-back fMRI and for emotion fMRI in terms of the CC, and for n-back fMRI and for emotion fMRI in terms of the RMSE. As shown in Table I, the MTL model, which utilized multi-task learning for a joint analysis of two modalities (tasks), achieved mostly better regression performance than the single-task based model (i.e., the SM model). This suggests that it is beneficial to use multi-task learning for integrating complementary information from multiple modalities by jointly selecting a sparse set of common features. In addition, the manifold regularizers in the M2TL and the NM2TL models, which exploit the structure information of data, further increase the performance. Specifically, the proposed NM2TL model outperformed the MTL model, improving the performance by and in the CCs, and by and in the RMSEs, for n-back fMRI and emotion fMRI, respectively. Meanwhile, in Table I we reported the p-values of the pairwise t-test based on the results of the fold CV to show the statistically significant improvement of our proposed model. Given that the best performance over the IQ regressions was in all cases obtained by our proposed NM2TL model, the results indicate that the designed manifold regularizer in our proposed model was effective in identifying more discriminative features associated with IQ.
Therefore, from the machine learning point of view, properly using different regularizers in the least square regression model is a valid way to circumvent the overfitting problem and find a compact solution, especially in the high feature-dimension and low sample-size scenarios (e.g., in the field of neuroimaging analysis).
We next investigated the parameters’ sensitivity by changing the values of in (12). The results in Fig. 4 show that the three parameters interactively affected the final performance, and our model was sensitive to them within only a small range. For better understanding the effect of these parameters, we also presented the performance of the MTL model as baseline that does not include any manifold regularization term. It is worth noting that when , our proposed NM2TL model will be degraded to the MTL model. As we can observe from Fig. 5, our proposed NM2TL model and the M2TL model both consistently outperformed the MTL model (baseline) under all values of . It can further embody the advantage of adding the manifold regularization term on top of the classical MTL model. Moreover, Fig. 5 shows that for each selected value of and/or , the curve representing the performance with respect to different values of was very smooth as long as , which indicates that our proposed NM2TL model and the M2TL model were very robust to when lies in the range of small values.
III-D Discussion and future work
Human intelligence can be broadly defined as the ability of comprehending and successfully responding to a wide variety of factors in the external environment [46]. Also, IQ scores can be related to performance on cognitive tasks. Therefore, it is reasonable to examine the relationship between individual variations in IQ and brain activity during the engagement of the two cognitive tasks (i.e., working memory and emotion identification) in this paper. In the following, based on our proposed NM2TL model, we investigated the potential of both brain FCs and ROIs as biomarkers that are highly related to IQ, respectively.
To identify the most discriminative FCs, we averaged the obtained sparse regression coefficients by these fold CV trials. The coefficient vector measures the relative importance of the FC features in predicting IQ scores. For ease of visualization, we selected n-back FCs and emotion FCs with the largest averaged weights, respectively, and visualized them by using the BrainNet Viewer [47] in Fig. 6. It should be noted that these selected FCs were mainly located in frontal, parietal, temporal, and occipital lobes, which is in accordance with the previous studies in the literature. For instance, in [48, 49], temporal lobe dysfunction has been shown to be related to attention-deficit/hyperactivity disorder (ADHD), which is significantly correlated with IQ impairments. Several regions within frontal, parietal, temporal, and occipital lobes have been identified as significant predictors of IQ in [16, 17, 50]. Also, to extract the most discriminative ROIs, we computed the ROI weights by summing the weights across all FCs for each ROI. In Fig. 7, we visualized ROIs with the greatest relative prediction power on IQ for n-back and emotion modalities, respectively. The results show that the largest number of the selected ROIs were located in frontal lobes, and the second largest number of the selected ROIs were in occipital, parietal, or temporal lobes. Interestingly, we also found that there were as many as overlapping ROIs between these two sets of ROIs selected separately from the corresponding two modalities.
In this paper, we focused on only two functional imaging modalities (here our modalities refer to fMRI data collected under multiple paradigms), i.e., n-back fMRI and emotion fMRI collected under two paradigms. The PNC dataset also includes resting-state fMRI. An interesting direction for future work is to incorporate all three modalities (i.e., three types of fMRI data from different paradigms) together by means of the proposed NM2TL model, which may extract more discriminative information across modalities and further improve the IQ regression performance [23]. It should also be noted that the similarity measure of the data, whether within each single modality or between modalities, could largely affect the contribution of the manifold regularizer to the regression performance. Therefore, in order to reveal the intrinsic structure information inherent in multiple modalities, finding an effective and powerful strategy to learn the similarity of the data would be a high priority for improving our model.
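The ROI-level weights mentioned above can be obtained by mapping the FC weight vector back to a symmetric ROI-by-ROI matrix and summing over each ROI's connections. The sketch below assumes that the FC features were vectorised from the strictly lower triangle in NumPy's `tril_indices` order and that absolute coefficient values are summed; both are assumptions of this illustration, as is the name `roi_weights`.

```python
import numpy as np


def roi_weights(fc_weights, n_rois):
    """Sum the absolute FC weights over all connections attached to each ROI.

    fc_weights: vector of length n_rois * (n_rois - 1) / 2, ordered like the
    strictly lower triangular entries returned by np.tril_indices.
    """
    rows, cols = np.tril_indices(n_rois, k=-1)
    full = np.zeros((n_rois, n_rois))
    full[rows, cols] = np.abs(fc_weights)
    full = full + full.T  # each connection contributes to both of its ROIs
    return full.sum(axis=1)
```

Ranking the returned vector, for example with `np.argsort(-roi_weights(w, n_rois))`, gives the ROIs with the greatest summed weight for one modality.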
IV Conclusion
In this paper, based on the general linear regression model, we proposed a new manifold regularized multi-task learning model for the joint analysis of multiple modalities. Instead of using all high-dimensional features for prediction, the proposed model was designed to identify the most prominent features, i.e., those that contribute to improved prediction accuracy. Besides employing the group-sparsity regularizer to jointly select a small number of common features across multiple modalities (tasks), we designed a novel manifold regularizer to preserve the structure information both within and between modalities, which is likely to affect the final performance. Furthermore, we validated the effectiveness of the proposed model on the PNC dataset by using fMRI-based FC networks in two task conditions for IQ prediction. The experimental results showed that the proposed model achieved superior performance in IQ prediction compared with the other competing models. Moreover, we discovered IQ-relevant biomarkers in line with previous reports, which may account for a proportion of the variance in human intelligence.
References
 [1] O. Sporns, “The human connectome: a complex network,” Ann. N. Y. Acad. Sci., vol. 1224, pp. 109–125, 2011.
 [2] M. Cao et al., “Topological organization of the human brain functional connectome across the lifespan,” Dev. Cogn. Neurosci., vol. 7, pp. 76–93, 2014.
 [3] X.-N. Zuo et al., “Network centrality in the human functional connectome,” Cerebral Cortex, vol. 22, pp. 1862–1875, 2012.
 [4] E. A. Allen et al., “Tracking whole-brain connectivity dynamics in the resting state,” Cerebral Cortex, vol. 24, pp. 663–676, 2014.
 [5] V. D. Calhoun, R. Miller, G. Pearlson, and T. Adali, “The chronnectome: time-varying connectivity networks as the next frontier in fMRI data discovery,” Neuron, vol. 84, pp. 262–274, 2014.
 [6] Q. Yu et al., “Modular organization of functional network connectivity in healthy controls and patients with schizophrenia during the resting state,” Front. Syst. Neurosci., vol. 10, 2012.
 [7] T. B. Meier et al., “Support vector machine classification and characterization of age-related reorganization of functional brain networks,” NeuroImage, vol. 60, pp. 601–613, 2012.
 [8] A. Qiu, A. Lee, M. Tan, and M. K. Chung, “Manifold learning on brain functional networks in aging,” Med. Image Anal., vol. 20, no. 1, pp. 52–60, 2015.
 [9] L. Tian, J. Wang, C. Yan, and Y. He, “Hemisphere- and gender-related differences in small-world brain networks: A resting-state functional MRI study,” NeuroImage, vol. 54, pp. 191–202, 2011.
 [10] V. C. Pezoulas, M. Zervakis, S. Michelogiannis, and M. A. Klados, “Resting-state functional connectivity and network analysis of cerebellum with respect to IQ and gender,” Front. Hum. Neurosci., vol. 11, 2017.
 [11] E. S. Finn et al., “Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity,” Nat. Neurosci., vol. 18, pp. 1664–1671, 2015.
 [12] M. Song et al., “Brain spontaneous functional connectivity and intelligence,” NeuroImage, vol. 41, pp. 1168–1176, 2008.
 [13] D. M. Barch et al., “Function in the human connectome: Task-fMRI and individual differences in behavior,” NeuroImage, vol. 80, pp. 169–189, 2013.
 [14] V. D. Calhoun et al., “Exploring the psychosis functional connectome: aberrant intrinsic networks in schizophrenia and bipolar disorder,” Front. Psychiatry, vol. 2, 2012.
 [15] R. E. Beaty et al., “Robust prediction of individual creative ability from brain functional connectivity,” Proc. Natl. Acad. Sci., vol. 115, pp. 1087–1092, 2018.
 [16] A. S. Greene, S. Gao, D. Scheinost, and R. T. Constable, “Task-induced brain state manipulation improves prediction of individual traits,” Nat. Commun., vol. 9, 2018.
 [17] S. Gao, A. S. Greene, R. T. Constable, and D. Scheinost, “Task integration for connectome-based prediction via canonical correlation analysis,” in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, 2018.
 [18] R. L. Buckner, F. M. Krienen, and B. T. Thomas Yeo, “Opportunities and limitations of intrinsic functional connectivity MRI,” Nat. Neurosci., vol. 16, pp. 832–837, 2013.
 [19] E. S. Finn et al., “Can brain state be manipulated to emphasize individual differences in functional connectivity?” NeuroImage, vol. 160, pp. 140–151, 2017.
 [20] T. Kaufmann et al., “Delayed stabilization and individualization in connectome development are related to psychiatric disorders,” Nat. Neurosci., vol. 20, pp. 513–515, 2017.
 [21] V. D. Calhoun and J. Sui, “Multimodal fusion of brain imaging data: A key to finding the missing link(s) in complex mental illness,” Biol. Psychiatry Cogn. Neurosci. Neuroimaging, vol. 1, pp. 230–244, 2016.
 [22] V. D. Calhoun, K. A. Kiehl, and G. D. Pearlson, “Modulation of temporally coherent brain networks estimated using ICA at rest and during cognitive tasks,” Hum. Brain Mapp., vol. 7, pp. 828–838, 2008.
 [23] M. S. Çetin et al., “Thalamus and posterior temporal lobe show greater internetwork connectivity at rest and across sensory paradigms in schizophrenia,” NeuroImage, vol. 97, pp. 117–126, 2014.
 [24] A. M. Michael et al., “A method to fuse fMRI tasks through spatial correlations: Applied to schizophrenia,” Hum. Brain Mapp., vol. 30, pp. 2512–2529, 2009.
 [25] D. Zhang and D. Shen, “Multimodal multitask learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease,” NeuroImage, vol. 59, pp. 895–907, 2012.
 [26] B. Jie, D. Zhang, B. Cheng, and D. Shen, “Manifold regularized multitask feature learning for multimodality disease classification,” Hum. Brain Mapp., vol. 36, pp. 489–507, 2015.
 [27] B. Lei et al., “Neuroimaging retrieval via adaptive ensemble manifold learning for brain disease diagnosis,” IEEE J. Biomed. Health Inform., DOI: 10.1109/JBHI.2018.2872581, 2018.
 [28] X. Zhu, H.-I. Suk, L. Wang, S.-W. Lee, and D. Shen, “A novel relational regularization feature selection method for joint regression and classification in AD diagnosis,” Med. Image Anal., vol. 38, pp. 205–214, 2017.
 [29] V. R. De Sa, “Spectral clustering with two views,” in ICML workshop on learning with multiple views, pp. 20–27, 2005.
 [30] O. Lindenbaum, A. Yeredor, M. Salhov, and A. Averbuch, “Multiview diffusion maps,” arXiv preprint arXiv:1508.05550, 2015.
 [31] T. D. Satterthwaite et al., “Neuroimaging of the Philadelphia neurodevelopmental cohort,” NeuroImage, vol. 86, pp. 544–553, 2014.
 [32] T. D. Satterthwaite et al., “The Philadelphia neurodevelopmental cohort: A publicly available resource for the study of normal and abnormal brain development in youth,” NeuroImage, vol. 124, pp. 1115–1119, 2016.
 [33] R. Caruana, “Multitask learning,” Mach. Learning, vol. 28, pp. 41–75, 1997.
 [34] A. Argyriou and T. Evgeniou, “Multitask feature learning,” in Advances in neural information processing systems, pp. 41–48, 2007.
 [35] R. Tibshirani, “Regression shrinkage and selection via the lasso: A retrospective,” J. Roy. Statist. Soc., vol. 73, pp. 267–288, 2011.
 [36] M. Wang, X. Hao, J. Huang, W. Shao, and D. Zhang, “Discovering network phenotype between genetic risk factors and disease status via diagnosis-aligned multimodality regression method in Alzheimer’s disease,” Bioinformatics, https://doi.org/10.1093/bioinformatics/bty911, 2018.
 [37] Y. Nesterov, “A method of solving a convex programming problem with convergence rate O(1/k^2),” Soviet Mathematics Doklady, vol. 27, pp. 372–376, 1983.
 [38] N. Parikh and S. Boyd, “Proximal algorithms,” Foundations and Trends in Optimization, vol. 1, pp. 123–231, 2013.
 [39] X. Zhu, H.-I. Suk, S.-W. Lee, and D. Shen, “Subspace regularized sparse multitask learning for multiclass neurodegenerative disease identification,” IEEE Trans. Biomed. Eng., vol. 63, pp. 607–618, 2016.
 [40] G. S. Wilkinson and G. J. Robertson, Wide Range Achievement Test 4 (WRAT4), Lutz, FL, 2006.
 [41] P. Zille, V. D. Calhoun, and Y.-P. Wang, “Enforcing co-expression within a brain-imaging genomics regression framework,” IEEE Trans. Med. Imaging, DOI: 10.1109/TMI.2017.2721301.
 [42] J. Fang et al., “Fast and accurate detection of complex imaging genetics associations based on greedy projected distance correlation,” IEEE Trans. Med. Imaging, vol. 37, pp. 860–870, 2018.
 [43] W. Hu, B. Cai, V. D. Calhoun, and Y.-P. Wang, “Multimodal Brain Connectivity Study Using Deep Collaborative Learning,” in Graphs in Biomedical Image Analysis and Integrating Medical Imaging and Non-Imaging Modalities, vol. 37, pp. 66–73, Springer, Cham, 2018.
 [44] L. Xiao, J. M. Stephen, T. W. Wilson, V. D. Calhoun, and Y.-P. Wang, “Alternating diffusion map based fusion of multimodal brain connectivity networks for IQ prediction,” IEEE Trans. Biomed. Eng., DOI: 10.1109/TBME.2018.2884129.
 [45] J. D. Power et al., “Functional network organization of the human brain,” Neuron, vol. 72, pp. 665–678, 2011.
 [46] U. Neisser et al., “Intelligence: Knowns and unknowns,” Am. Psychol., vol. 51, pp. 77–101, 1996.
 [47] M. Xia, J. Wang, and Y. He, “BrainNet Viewer: A network visualization tool for human brain connectomics,” PLoS ONE, vol. 8, no. 7, e68910, 2013.
 [48] K. Rubia, A. B. Smith, M. J. Brammer, and E. Taylor, “Temporal lobe dysfunction in medication-naive boys with attention-deficit/hyperactivity disorder during attention allocation and its relation to response variability,” Biol. Psychiatry, vol. 62, pp. 999–1006, 2007.
 [49] R. J. Haier, R. E. Jung, R. A. Yeo, K. Head, and M. T. Alkire, “The neuroanatomy of general intelligence: sex matters,” NeuroImage, vol. 25, pp. 320–327, 2005.
 [50] L. J. Hearne, J. B. Mattingley, and L. Cocchi, “Functional brain networks related to individual differences in human intelligence at rest,” Sci. Rep., vol. 6, 32328, 2016.