# Multi-View Graph Convolutional Network and Its Applications on Neuroimage Analysis for Parkinson’s Disease

Xi Zhang, Lifang He, Kun Chen, Yuan Luo, Jiayu Zhou, Fei Wang

Department of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University, NY

Department of Statistics, University of Connecticut, CT

Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, IL

Department of Computer Science and Engineering, Michigan State University, MI

Equal Contribution. Corresponding author, email: few2001@med.cornell.edu

Abstract. Parkinson’s Disease (PD) is one of the most prevalent neurodegenerative diseases, affecting tens of millions of Americans. PD is highly progressive and heterogeneous. Quite a few studies in recent years have addressed predictive or disease progression modeling of PD using clinical and biomarker data. Neuroimaging, another important information source for neurodegenerative disease, has also attracted considerable interest from the PD community. In this paper, we propose a deep learning method based on Graph Convolutional Networks (GCN) for fusing multiple modalities of brain images to distinguish PD cases from controls. On the Parkinson’s Progression Markers Initiative (PPMI) cohort, our approach achieved an AUC of 95.37%, compared with an AUC of 64.43% achieved by traditional approaches such as PCA.

## 1 Introduction

Parkinson’s Disease (PD) [1] is one of the most prevalent neurodegenerative diseases, which occur when nerve cells in the brain or peripheral nervous system lose function over time and ultimately die. PD affects predominantly dopaminergic neurons in the substantia nigra, a specific area of the brain. PD is a highly progressive disease, with related symptoms progressing slowly over the years. Typical PD symptoms include bradykinesia, rigidity, and rest tremor, which affect speech, hand coordination, gait, and balance. According to statistics from the National Institute of Environmental Health Sciences (NIEHS), at least 500,000 Americans are living with PD (https://www.niehs.nih.gov/research/supported/health/neurodegenerative/index.cfm). The Centers for Disease Control and Prevention (CDC) rated complications from PD as the 14th leading cause of death in the United States [2].

The cause of PD remains largely unknown. There is no cure for PD and its treatments include mainly medications and surgery. The progression of PD is highly heterogeneous, which means that its clinical manifestations vary from patient to patient. In order to understand the underlying disease mechanism of PD and develop effective therapeutics, many large-scale cross-sectional cohort studies have been conducted. The Parkinson’s Progression Markers Initiative (PPMI) [3] is one such example including comprehensive evaluations of early stage (idiopathic) PD patients with imaging, biologic sampling, and clinical and behavioral assessments. The patient recruitment in PPMI is taking place at clinical sites in the United States, Europe, Israel, and Australia. This injects enough diversity into the PPMI cohort and makes the downstream analysis/discoveries representative and generalizable.

Quite a few computational studies have been conducted on PPMI data in recent years. For example, Dinov et al. [4] built a big data analytics pipeline on the clinical, biomarker and assessment data in PPMI to perform various prediction tasks. Schrag et al. [5] predicted the cognitive impairment of the patients in PPMI with clinical variables and biomarkers. Nalls et al. [6] developed a diagnostic model with clinical and genetic classifications with PPMI cohort. We also developed a sequential deep learning based approach to identify the subtypes of PD on the clinical variables, biomarkers and assessment data in PPMI, and our solution won the PPMI data challenge in 2016 [7]. These studies provided insights to PD researchers in addition to the clinical knowledge.

So far, research on PPMI has mostly used its clinical, biomarker and assessment information. Another important but under-utilized part of PPMI is its rich neuroimaging information, which includes Magnetic Resonance Imaging (MRI), functional MRI, Diffusion Tensor Imaging (DTI), CT scans, etc. During the last decade, neuroimaging studies including structural, functional and molecular modalities have also provided invaluable insights into the underlying PD mechanism [8]. Many imaging-based biomarkers have been demonstrated to be closely related to the progression of PD. For example, Chen et al. [9] identified significant volumetric loss in the olfactory bulbs and tracts of PD patients versus controls from MRI scans, and an inverse correlation between the global olfactory bulb volume and PD duration. Different observations have been made on the volumetric differences in the substantia nigra (SN) on MRI [10, 11, 12]. Decreased Fractional Anisotropy (FA) in the SN is commonly observed in PD patients using DTI [13]. With high-resolution DTI, greater FA reductions in caudal (than in middle or rostral) regions of the SN were identified, distinguishing PD from controls with 100% sensitivity and specificity [14]. One can refer to [15] for a comprehensive review of imaging biomarkers for PD. Many of these neuroradiology studies are strongly hypothesis driven, based on the existing knowledge of PD pathology.

In recent years, with the arrival of the big data era, many computational approaches have been developed for neuroimaging analysis [16, 17, 18]. Different from conventional hypothesis-driven radiology methods, these computational approaches are typically data driven and hypothesis free: they derive features and evidence directly from neuroimages and use them to derive clinical insights on multiple problems such as brain network discovery [19, 20, 21] and imaging genomics [22, 23, 24]. Most of these algorithms are linear [25, 26] or multilinear [27, 28], and they work on a single modality of brain images.

In this paper, we develop a computational framework for analyzing the neuroimages in PPMI data based on Graph Convolutional Networks (GCN) [29]. Our framework learns pairwise relationships with the following steps.

Graph Construction. We parcellate the structural MRI brain images of each acquisition into a set of Regions of Interest (ROIs). Each region is treated as a node on a Brain Geometry Graph (BGG), which is undirected and weighted. The weight associated with each pair of nodes is calculated according to the average distance between their geometric coordinates in each acquisition. All acquisitions share the same BGG.

Feature Construction. We use different brain tractography algorithms on the DTI parts of the acquisitions to obtain different Brain Connectivity Graphs (BCGs), which are used as the features for each acquisition. Each acquisition has a BCG for each type of tractography.

Relationship Prediction. For each acquisition, we learn a feature matrix from each of its BCGs through a GCN. Then all the feature matrices are aggregated through element-wise max pooling. Finally, the feature matrices from each acquisition pair are aggregated into a vector, which is fed into a softmax classifier for relationship prediction.

It is worthwhile to highlight the following aspects of the proposed framework.

Pairwise Learning. Instead of performing sample-level learning, we learn pairwise relationships, which is more flexible and weaker (sample level labels can always be transformed to pairwise labels but not vice versa). Importantly, such a pairwise learning strategy can square the training sample size (because each pair of training samples becomes an input), which is very important to learning algorithms that need large-scale training samples (e.g., deep learning).

Nonlinear Feature Learning. As we mentioned previously, most of the existing machine learning approaches for neuroimaging analysis are based on either linear or multilinear models, which have a limited capacity of exploring the information contained in neuroimages. We leverage GCN, which is a powerful tool that can explore graph characteristics at a spectrum of frequency bands. This brings our framework more potential to achieve good performance.

Multi-Graph Fusion. Different from conventional approaches that focus on a single graph (image modality), our framework fuses 1) information on the BGG obtained from the MRI part of each acquisition; 2) the features obtained from different BCGs obtained from the DTI part of each acquisition. This effectively leverages the complementary information scattered in different sources.

## 2 Methodology

In this section, we first describe the problem setting and then present the details of our proposed approach. To facilitate the description, we denote scalars by lowercase letters (e.g., $x$), vectors by boldfaced lowercase letters (e.g., $\mathbf{x}$), and matrices by boldface uppercase letters (e.g., $\mathbf{X}$). We also use lowercase letters $i, j$ as indices. We write $x_i$ to denote the $i$th entry of a vector $\mathbf{x}$, and $x_{i,j}$ the entry with row index $i$ and column index $j$ in a matrix $\mathbf{X}$. All vectors are column vectors unless otherwise specified.

### 2.1 Problem Setting

Suppose we have a population of $N$ acquisitions, where each acquisition is subject-specific and associated with $M$ BCGs obtained from different measurements or views. A BCG can be represented as an undirected weighted graph $G = (V, E)$. The vertex set $V = \{v_1, \dots, v_n\}$ consists of ROIs in the brain and each edge in $E$ is weighted by a connectivity strength, where $n$ is the number of ROIs. We represent edge weights by an $n \times n$ similarity matrix $\mathbf{X}$ with $x_{i,j}$ denoting the connectivity between ROI $i$ and ROI $j$. We assume that the vertices remain the same while the edges vary with views. Thus, for each subject, we have $M$ BCGs: $G^{(1)} = (V, E^{(1)}), \dots, G^{(M)} = (V, E^{(M)})$. A group of similarity matrices $\mathbf{X}^{(1)}, \dots, \mathbf{X}^{(M)}$ can be derived.

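We also define an undirected weighted BGG $\tilde{G} = (V, \tilde{E})$ based on the geometric information of the region coordinates, which is a $k$-Nearest Neighbor ($k$-NN) graph [30]. The graph has ROIs as vertices $V = \{v_1, \dots, v_n\}$, where each ROI is associated with the coordinates of its center. Edges are weighted by the Gaussian similarity function of Euclidean distances, i.e., $t(v_i, v_j) = \exp\left(-\|v_i - v_j\|^2 / (2\sigma^2)\right)$, where $\sigma$ is the bandwidth of the Gaussian. We identify the set $\mathcal{N}_i$ of vertices that are neighbors to the vertex $v_i$ using $k$-NN, and connect $v_i$ and $v_j$ if $v_i \in \mathcal{N}_j$ or if $v_j \in \mathcal{N}_i$. An adjacency matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ can then be associated with $\tilde{G}$, representing the similarity to the nearest similar ROIs for each ROI, with the elements:

$$a_{i,j} = \begin{cases} t(v_i, v_j), & \text{if } v_i \in \mathcal{N}_j \text{ or } v_j \in \mathcal{N}_i, \\ 0, & \text{otherwise.} \end{cases}$$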
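To make the BGG construction concrete, the following NumPy sketch builds a symmetric $k$-NN adjacency matrix with Gaussian edge weights from ROI center coordinates. The function name `knn_gaussian_graph`, the default bandwidth heuristic (mean distance to the $k$ nearest neighbors) and the example values are assumptions made for illustration; the text does not state how the bandwidth was chosen.

```python
import numpy as np

def knn_gaussian_graph(coords, k, sigma=None):
    """Symmetric k-NN adjacency matrix with Gaussian edge weights.

    coords: (n, 3) array of ROI center coordinates.
    k:      number of nearest neighbors per ROI.
    sigma:  Gaussian bandwidth; if None, the mean distance to the k nearest
            neighbors is used (a heuristic assumed for this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise Euclidean distances between ROI centers.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude self-matches before ranking neighbors.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    nbrs = np.argsort(masked, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    if sigma is None:
        sigma = masked[rows, nbrs].mean()
    # Connect i and j if either one is among the other's k nearest neighbors.
    mask = np.zeros((n, n), dtype=bool)
    mask[rows, nbrs] = True
    mask = mask | mask.T
    return np.where(mask, np.exp(-dist ** 2 / (2.0 * sigma ** 2)), 0.0)
```

Because of the "or" rule, a vertex can end up with more than $k$ neighbors, so the resulting matrix is symmetric even though $k$-NN relations are not.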

Our goal is to learn a feature representation for each subject by fusing its BCGs and the shared BGG, which captures both the local traits of each individual subject and the global traits of the population of subjects. Specifically, we develop a customized Multi-View Graph Convolutional Network (MVGCN) model to learn feature representations for diagnosing PD on PPMI data.

### 2.2 Our Approach

Overview. Fig. 1 provides an overview of the MVGCN framework we develop for relationship prediction on multi-view brain graphs. Our model is a deep neural network consisting of three main components: the first component is a multi-view GCN for extracting the feature matrices from each acquisition, the second component is a pairwise matching strategy for aggregating the feature matrices from each pair of acquisitions into feature vectors, and the third component is a softmax predictor for relationship prediction. All of these components are trained using back-propagation and stochastic optimization. Note that MVGCN is an end-to-end architecture, and that neither the view pooling in the multi-view GCN nor the pairwise matching involves extra parameters. Also, all branches of the used views share the same parameters in the multi-view GCN component. We next give details of each component.

C1: Multi-View GCN. Traditional convolutional neural networks (CNN) rely on the regular grid-like structure with a well-defined neighborhood at each position in the grid (e.g. 2D and 3D images). On a graph structure there is usually no natural choice for an ordering of the neighbors of a vertex, therefore it is not trivial to generalize the convolution operation to the graph setting. Shuman et al. showed that this generalization can be made feasible by defining graph convolution in the spectral domain and proposed a GCN [31]. Motivated by the fact that GCN can effectively model the nonlinearity of samples in a population and has superior capability to explore graph characteristics at a spectrum of frequency bands, we propose a multi-view GCN for an effective fusion of populations of graphs with different views. It consists of two fundamental steps: (i) the design of convolution operator on multiple graphs across views, (ii) a view pooling operation that groups together multi-view graphs.

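Graph Convolution. An essential point in GCN is to define graph convolution in the spectral domain based on the Laplacian matrix and the graph Fourier transform (GFT). We consider the normalized graph Laplacian $\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$, where $\mathbf{A}$ is the adjacency matrix associated with the graph, $\mathbf{D}$ is the diagonal degree matrix with $d_{i,i} = \sum_j a_{i,j}$, and $\mathbf{I}$ is the identity matrix. As $\mathbf{L}$ is a real symmetric positive semidefinite matrix, it can be decomposed as $\mathbf{L} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top$, where $\mathbf{U} = [\mathbf{u}_0, \dots, \mathbf{u}_{n-1}]$ is the matrix of eigenvectors with $\mathbf{U}\mathbf{U}^\top = \mathbf{I}$ (referred to as the Fourier basis) and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_0, \dots, \lambda_{n-1})$ is the diagonal matrix of eigenvalues. The eigenvalues represent the frequencies of their associated eigenvectors, i.e. eigenvectors associated with larger eigenvalues oscillate more rapidly between connected vertices [32]. Specifically, in order to obtain a unique frequency representation for the signals on the set of graphs, we define the Laplacian matrix on the BGG $\tilde{G}$, as all graphs share a common structure with adjacency matrix $\mathbf{A}$.

Let $\mathbf{x} \in \mathbb{R}^n$ be a signal defined on the vertices of a graph $G$, where $x_i$ denotes the value of the signal at the $i$-th vertex. The GFT is defined as $\hat{\mathbf{x}} = \mathbf{U}^\top \mathbf{x}$, which converts the signal $\mathbf{x}$ to the spectral domain spanned by the Fourier basis $\mathbf{U}$. Then the graph convolution can be defined as:

$$\mathbf{y} = g_{\theta}(\mathbf{L})\,\mathbf{x} = g_{\theta}(\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top)\,\mathbf{x} = \mathbf{U}\, g_{\theta}(\boldsymbol{\Lambda})\, \mathbf{U}^\top \mathbf{x}, \tag{1}$$

where $\boldsymbol{\theta} \in \mathbb{R}^n$ is a vector of Fourier coefficients to be learned, and $g_{\theta}$ is called the filter, which can be regarded as a function of $\boldsymbol{\Lambda}$. To render the filters $K$-localised in space and reduce the computational complexity, $g_{\theta}$ can be approximated by a truncated expansion in terms of Chebyshev polynomials of order $K$ [33]. That is,

$$g_{\theta}(\boldsymbol{\Lambda}) = \sum_{s=0}^{K-1} \theta_s\, T_s(\tilde{\boldsymbol{\Lambda}}), \tag{2}$$

where the parameter $\boldsymbol{\theta} \in \mathbb{R}^K$ is a vector of Chebyshev coefficients and $T_s(\tilde{\boldsymbol{\Lambda}}) \in \mathbb{R}^{n \times n}$ is the Chebyshev polynomial of order $s$ evaluated at $\tilde{\boldsymbol{\Lambda}} = 2\boldsymbol{\Lambda}/\lambda_{\max} - \mathbf{I}$, a diagonal matrix of scaled eigenvalues that lie in $[-1, 1]$.

Substituting Eq. (2) into Eq. (1) and using $\mathbf{U}\, T_s(\tilde{\boldsymbol{\Lambda}})\, \mathbf{U}^\top = T_s(\tilde{\mathbf{L}})$ yields $\mathbf{y} = \sum_{s=0}^{K-1} \theta_s\, T_s(\tilde{\mathbf{L}})\,\mathbf{x}$, where $\tilde{\mathbf{L}} = 2\mathbf{L}/\lambda_{\max} - \mathbf{I}$. Denoting $\bar{\mathbf{x}}_s = T_s(\tilde{\mathbf{L}})\,\mathbf{x}$, we can use the recurrence relation $\bar{\mathbf{x}}_s = 2\tilde{\mathbf{L}}\,\bar{\mathbf{x}}_{s-1} - \bar{\mathbf{x}}_{s-2}$ to compute $\bar{\mathbf{x}}_s$, with $\bar{\mathbf{x}}_0 = \mathbf{x}$ and $\bar{\mathbf{x}}_1 = \tilde{\mathbf{L}}\,\mathbf{x}$. Finally, the $j$th output feature map in a GCN is given by:

$$\mathbf{y}_j = \sum_{i=1}^{F_{in}} g_{\theta_{i,j}}(\mathbf{L})\, \mathbf{x}_i, \tag{3}$$

yielding $F_{in} \times F_{out}$ vectors of trainable Chebyshev coefficients $\boldsymbol{\theta}_{i,j} \in \mathbb{R}^K$, where $\mathbf{x}_i$ denotes the input feature maps from a graph. In our case, $\mathbf{x}_i$ corresponds to the $i$-th column of the similarity matrix $\mathbf{X}$ of the respective graph, and $F_{in} = n$ (the number of brain ROIs). The outputs are collected into a feature matrix $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_{F_{out}}] \in \mathbb{R}^{n \times F_{out}}$, where each row represents the extracted features of a ROI.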
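The Chebyshev filtering in Eqs. (1)-(3) can be sketched in NumPy as follows. This is a minimal forward pass only, under the assumption that the filter follows the standard formulation of [33]; the function names, the tensor layout of `theta` (order, input maps, output maps) and the absence of bias terms and nonlinearities are choices made for this example rather than details given in the text.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def chebyshev_basis(lap, x, order):
    """Stack T_0(L~) x, ..., T_{order-1}(L~) x via the three-term recurrence."""
    lam_max = np.linalg.eigvalsh(lap).max()
    lap_scaled = 2.0 * lap / lam_max - np.eye(lap.shape[0])
    terms = [x]
    if order > 1:
        terms.append(lap_scaled @ x)
    for _ in range(2, order):
        terms.append(2.0 * lap_scaled @ terms[-1] - terms[-2])
    return np.stack(terms)  # shape: (order, n, F_in)

def graph_conv(lap, x, theta):
    """Chebyshev graph convolution.

    lap:   (n, n) normalized Laplacian of the shared graph.
    x:     (n, F_in) input feature maps, one column per map.
    theta: (order, F_in, F_out) Chebyshev coefficients.
    Returns the (n, F_out) output feature matrix.
    """
    basis = chebyshev_basis(lap, x, theta.shape[0])
    # y_j = sum_i sum_s theta[s, i, j] * T_s(L~) x_i
    return np.einsum("snf,sfo->no", basis, theta)
```

The recurrence avoids the explicit eigendecomposition needed by Eq. (1); only the largest eigenvalue is required to rescale the Laplacian.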

View Pooling. For each subject, the outputs of the GCN are $M$ feature matrices $\mathbf{Y}^{(1)}, \dots, \mathbf{Y}^{(M)} \in \mathbb{R}^{n \times F_{out}}$, where each matrix corresponds to a view. Similar to the view-pooling layer in the multi-view CNN [34], we use an element-wise maximum operation across all $M$ feature matrices of each subject to aggregate multiple views together, producing a shared feature matrix $\mathbf{Z} \in \mathbb{R}^{n \times F_{out}}$. An alternative is an element-wise mean operation, but it is not as effective in our experiments (see Table 2). The reason may be that the maximum operation learns to combine the views instead of averaging, and thus can use the more informative views of each feature while ignoring others.

Fig. 2 gives the flowchart of our multi-view GCN. Based on this multi-view GCN, different views of BCGs can be progressively fused in accordance with their similarity matrices, which can capture both local and global structural information from BCGs and BGG.
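The view-pooling step is parameter free and amounts to a reduction over the view axis. A minimal NumPy sketch, assuming the per-view feature matrices have already been computed and share the same shape (the function name `view_pool` and the `mode` argument are hypothetical):

```python
import numpy as np

def view_pool(feature_mats, mode="max"):
    """Aggregate per-view feature matrices into one shared matrix.

    feature_mats: sequence of M arrays, each of shape (n, F_out).
    mode:         "max" for element-wise maximum, "mean" for element-wise mean.
    """
    stacked = np.stack(feature_mats)  # shape: (M, n, F_out)
    if mode == "max":
        return stacked.max(axis=0)
    if mode == "mean":
        return stacked.mean(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")
```

With max pooling, each entry of the shared matrix is taken from whichever view responds most strongly for that ROI and feature, which is one way to read the remark that the more informative views are used while others are ignored.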

C2: Pairwise Matching. Training a deep learning model requires a large amount of training data, but usually very little data is available from clinical practice. We take advantage of the pairwise relationships between subjects to guide the process of deep learning, which has been shown to be effective in previous studies [35, 36]. Similarity is an important type of pairwise relationship that measures the relatedness of two subjects. The basic assumption is that, if two subjects are similar, they should have a high probability of having the same class label.

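Let $\mathbf{Z}_p$ and $\mathbf{Z}_q$ be the feature matrices of any subject pair $(p, q)$ obtained from the multi-view GCN; we can use them to compute a ROI-ROI similarity score. To do so, we first normalize each matrix so that the sum of squares of each row is equal to 1, and then define the following pairwise similarity measure using the row-wise inner product operator:

$$\mathrm{sim}(\mathbf{z}^{p}_{i}, \mathbf{z}^{q}_{i}) = \langle \mathbf{z}^{p}_{i}, \mathbf{z}^{q}_{i} \rangle, \quad i = 1, \dots, n, \tag{4}$$

where $\mathbf{z}^{p}_{i}$ and $\mathbf{z}^{q}_{i}$ are the $i$-th row vectors of the normalized matrices $\mathbf{Z}_p$ and $\mathbf{Z}_q$, respectively.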
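Since each row is scaled to unit length before the inner product, Eq. (4) is the cosine similarity between corresponding ROI feature rows of the two subjects. A small NumPy sketch of this matching layer follows; the name `pairwise_match` and the `eps` guard against all-zero rows are assumptions added for the example.

```python
import numpy as np

def pairwise_match(z_p, z_q, eps=1e-12):
    """Row-wise similarity between two (n, F_out) feature matrices.

    Each row is normalized to unit Euclidean length, then the inner product
    of corresponding rows is taken, giving one score per ROI.
    """
    z_p = z_p / np.maximum(np.linalg.norm(z_p, axis=1, keepdims=True), eps)
    z_q = z_q / np.maximum(np.linalg.norm(z_q, axis=1, keepdims=True), eps)
    return np.sum(z_p * z_q, axis=1)  # shape: (n,)
```

Each score lies in $[-1, 1]$, and the resulting length-$n$ vector keeps one entry per ROI, which is what later allows the learned pair features to be inspected region by region.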

C3: Softmax. For each pair, the output of the pairwise matching layer is a feature vector $\mathbf{r} \in \mathbb{R}^n$, where each element $r_i$ is given by Eq. (4). Then, this representation is passed to a fully connected softmax layer for classification. It computes the probability distribution over the labels:

$$p(y = c \mid \mathbf{r}) = \frac{\exp(\mathbf{w}_c^\top \mathbf{r})}{\sum_{c'} \exp(\mathbf{w}_{c'}^\top \mathbf{r})}, \tag{5}$$

where $\mathbf{w}_c$ is the weight vector of the $c$-th class, and $\mathbf{r}$ is the final abstract representation of the input example obtained by a series of transformations from the input layer through a series of convolution and pooling operations.
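For completeness, the softmax layer in Eq. (5) can be sketched as below. The function name `softmax_predict`, the layout of `weights` (one row per class) and the omission of a bias term are assumptions of this example; subtracting the maximum logit is a standard numerical-stability step that does not change the result.

```python
import numpy as np

def softmax_predict(r, weights):
    """Class probabilities for one pair feature vector.

    r:       (n,) pair feature vector from the matching layer.
    weights: (C, n) array holding one weight vector per class.
    """
    logits = weights @ r
    logits = logits - logits.max()  # stabilize the exponentials
    exp = np.exp(logits)
    return exp / exp.sum()
```

In the pairwise setting there are two classes, matching and non-matching, so `weights` would have two rows.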

## 3 Experiments and Results

In order to evaluate the effectiveness of our proposed approach, we conduct extensive experiments on real-life Parkinson’s Progression Markers Initiative (PPMI) data for relationship prediction and compare with several state-of-the-art methods. In the following, we introduce the datasets used and describe details of the experiments. Then we present the results as well as the analysis.

Data Description. We consider 754 DTI acquisitions, each from either a Parkinson’s Disease (PD) patient or a Healthy Control (HC) subject. Each subject’s raw data were aligned to the b0 image using the FSL (http://www.fmrib.ox.ac.uk/fsl) eddy-correct tool to correct for head motion and eddy current distortions. The gradient table is also corrected accordingly. Non-brain tissue is removed from the diffusion MRI using the Brain Extraction Tool (BET) from FSL [37]. To correct for echo-planar imaging (EPI) induced susceptibility artifacts, which can cause distortions at tissue-fluid interfaces, skull-stripped b0 images are linearly aligned and then elastically registered to their respective preprocessed structural MRI using Advanced Normalization Tools (ANTs, http://stnava.github.io/ANTs/) with the SyN nonlinear registration algorithm [38]. The resulting 3D deformation fields are then applied to the remaining diffusion-weighted volumes to generate the full preprocessed diffusion MRI dataset for the brain network reconstruction. In the meantime, 84 ROIs are parcellated from T1-weighted structural MRI using FreeSurfer (https://surfer.nmr.mgh.harvard.edu), and each ROI’s coordinate is defined as the mean coordinate of all voxels in that ROI.

Based on these 84 ROIs, we reconstruct six types of BCGs for each subject using six whole brain tractography algorithms: four tensor-based deterministic approaches, namely Fiber Assignment by Continuous Tracking (FACT) [39], the 2nd-order Runge-Kutta (RK2) [40], interpolated streamline (SL) [41] and the tensorline (TL) [42]; one Orientation Distribution Function (ODF)-based deterministic approach, ODF-RK2 [43]; and one ODF-based probabilistic approach, Hough voting [44]. Please refer to [45] for the details of whole brain tractography computations. Each resulting network for each subject is an $84 \times 84$ matrix. To avoid computation bias in the later feature extraction and evaluation sections, we normalize each brain network by the maximum value in the matrix, as matrices derived from different tractography methods have different scales and ranges.

Experimental Settings. To learn similarities between graphs, brain networks in the same group (PD or HC) are labeled as matching pairs while brain networks from different groups are labeled as non-matching pairs. Hence, the 754 acquisitions yield $754 \times 753 / 2 = 283{,}881$ pairs in total, each of which is either a matching or a non-matching sample. 5-fold cross validation is adopted in all of our experiments by separating the sample pairs into stratified randomized sets. Using the coordinate information of ROIs in DTI, we construct a $k$-NN BGG in our method, whose 84 vertices are the ROIs. For graph convolutional layers, we set the order $K$ of the Chebyshev polynomials and an output feature dimension of 128 (cf. Table 2). For fully connected layers, the baseline with one fully connected layer uses a single feature dimension, and the baseline with two layers uses 1024 and 64 dimensions (cf. Table 2). The Adam optimizer is used for training. The above parameters, including the initial learning rate, are the optimal settings for all the methods as selected by cross-validation.
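The pair labeling rule can be illustrated with a few lines of standard-library Python. The helper `make_pairs` and the toy label list are hypothetical; the sketch assumes every unordered pair of acquisitions is used once, which is consistent with the growth of the training set described in the Introduction.

```python
from itertools import combinations
from math import comb

def make_pairs(labels):
    """Return (i, j, match) for every unordered pair of acquisitions.

    match is 1 when both acquisitions belong to the same group (a matching
    pair) and 0 otherwise (a non-matching pair).
    """
    return [(i, j, int(labels[i] == labels[j]))
            for i, j in combinations(range(len(labels)), 2)]

# Toy cohort: 3 PD and 2 HC acquisitions.
toy_pairs = make_pairs(["PD", "PD", "PD", "HC", "HC"])
n_matching = sum(match for _, _, match in toy_pairs)

# Number of unordered pairs among 754 acquisitions.
n_pairs_754 = comb(754, 2)
```

In the toy cohort, 10 pairs are formed, of which 4 are matching (3 PD-PD and 1 HC-HC) and 6 are non-matching. Because pairs that share an acquisition are not independent, how pairs are assigned to folds matters when interpreting cross-validated results.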

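Table 1: AUC of matching vs. non-matching prediction for each method on the individual views (columns are the six tractography views).

| Methods | FACT | RK2 | SL | TL | ODF-RK2 | Hough |
|---|---|---|---|---|---|---|
| Raw Edges | 58.47 ± 4.05 | 62.54 ± 6.88 | 59.39 ± 5.99 | 61.94 ± 5.00 | 60.93 ± 5.60 | 64.49 ± 3.56 |
| PCA | 64.10 ± 2.10 | 63.40 ± 2.72 | 64.43 ± 2.23 | 62.46 ± 1.46 | 60.93 ± 2.63 | 63.46 ± 3.52 |
| FCN | 66.17 ± 2.00 | 65.11 ± 2.63 | 65.00 ± 2.29 | 64.33 ± 3.34 | 68.80 ± 2.80 | 61.91 ± 3.42 |
| FCN (two-layer) | 82.36 ± 1.87 | 81.02 ± 4.28 | 81.68 ± 2.49 | 81.99 ± 3.44 | 82.53 ± 4.74 | 81.77 ± 3.74 |
| GCN | 92.67 ± 4.94 | 92.99 ± 4.95 | 92.68 ± 5.32 | 93.75 ± 5.39 | 93.04 ± 5.26 | 93.90 ± 5.48 |

Table 2: Classification (AUC) and acquisition clustering (NMI) results of different architectures.

| Architectures | AUC | NMI |
|---|---|---|
| PCA100-M-S | 64.43 ± 2.23 | 0.39 |
| FCN1024-M-FCN64-S | 82.53 ± 4.74 | 0.87 |
| GCN128-M-S | 93.75 ± 5.39 | 0.98 |
| MVGCN128-M-S (mean pooling) | 94.74 ± 5.62 | 1.00 |
| MVGCN128-M-S (max pooling) | 95.37 ± 5.87 | 1.00 |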

Results. Since our target is to predict relations (matching vs. non-matching) between pairwise BCGs, the performance of binary classification is evaluated using the Area Under the Curve (AUC) metric. Table 1 provides the results of individual views using the following methods: raw edge weights, PCA, feed-forward fully connected networks (a one-layer FCN and a two-layer FCN), and the graph convolutional network (GCN). Each of the compared methods learns a feature representation for each subject in a pair. For a fair comparison, the pairwise matching component and the softmax component are used for all the methods. The best GCN-based result is an AUC of 93.90 (Hough view). It is clear that GCN outperforms the raw edge weights, the conventional linear dimension reduction method PCA, and both nonlinear FCN baselines.

Table 2 reports the performance on classification and acquisition clustering of our proposed MVGCN together with three baselines. Each neural network architecture is described by the output dimensions of its hidden layers: M denotes the matching layer based on Eq. (4), S denotes the softmax operation in Eq. (5), and the numbers denote the dimensions of the extracted features at different layers. For our study, we evaluate both element-wise max pooling and mean pooling in the view pooling component. To test the effectiveness of the learned similarities, we also evaluate the clustering performance in terms of Normalized Mutual Information (NMI). The acquisition clustering algorithm we used is $k$-means ($k = 2$, PD and HC). The results show that our MVGCN outperforms all baselines on both classification and acquisition clustering tasks, with an AUC of 95.37 and an NMI of 1.00.
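As a reminder of what the NMI column measures, the sketch below computes NMI between ground-truth groups and cluster assignments using only the standard library. Several normalizations of mutual information are in use; this example assumes the geometric-mean variant, $\mathrm{NMI} = I(T; P) / \sqrt{H(T)\,H(P)}$, and the text does not state which variant was used for Table 2.

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_true, labels_pred):
    """Normalized mutual information with geometric-mean normalization."""
    n = len(labels_true)
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information between the two labelings.
    mi = sum(c / n * log(c * n / (count_t[a] * count_p[b]))
             for (a, b), c in joint.items())
    # Entropies of each labeling.
    h_t = -sum(c / n * log(c / n) for c in count_t.values())
    h_p = -sum(c / n * log(c / n) for c in count_p.values())
    if h_t == 0.0 or h_p == 0.0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if h_t == h_p else 0.0
    return mi / sqrt(h_t * h_p)
```

NMI is invariant to a relabeling of the clusters, so a $k$-means result that swaps the PD and HC cluster indices still scores 1 when the partition itself is perfect.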

In order to test whether the prediction results are meaningful for distinguishing brain networks as PD or HC, we visualize the Euclidean distance for the given 754 DTI acquisitions. Since the output values of all the matching models can indicate the pairwise similarities between acquisitions, we map it into a 2D space with t-SNE [46]. Fig. 3 compares the visualization results with different approaches. The feature extraction by PCA cannot separate the PD and HC perfectly. The result of FCN in the view ODF-RK2 that has the best AUC is much better, and two clusters can be observed with a few overlapped acquisitions. Compared with PCA and FCN, the visualization result of MVGCN with max view pooling clearly shows two well-separated and relatively compact groups.

Furthermore, we investigate the extracted pairwise feature vectors of the proposed MVGCN. After the ROI-ROI based pairwise matching, the output for each pair is a feature vector embedding the similarity of the given two acquisitions, with each element associated with a ROI. By visualizing the value distribution over ROIs, we can interpret the learned pairwise feature vector of our model. Fig. 4 reports the ten most similar or dissimilar ROIs for the PD or HC groups. For instance, the averaged distributions are computed given the pairwise PD samples, and the values of the top-10 ROIs are shown in Fig. 4(a). According to the results, the lateral orbitofrontal, middle temporal and amygdala areas are the three most similar ROIs for PD patients, while important ROIs such as the caudate and putamen areas are discriminative for distinguishing PD from HC (see Fig. 4(d)). The observations demonstrate that the learned pairwise feature vectors are consistent with some clinical discoveries and thus support the effectiveness of the MVGCN for neuroimage analysis.

## 4 Conclusions

We propose a multi-view graph convolutional network method called MVGCN in this paper, which can directly take brain graphs from multiple views as inputs and make predictions from them. We validate the effectiveness of MVGCN on real-world Parkinson’s Progression Markers Initiative (PPMI) data for predicting the pairwise matching relations. We demonstrate that our proposed MVGCN can not only achieve good performance, but also discover interesting predictive patterns.

## References

- [1] William Dauer and Serge Przedborski. Parkinson’s disease: mechanisms and models. Neuron, 39(6):889–909, 2003.
- [2] Kenneth D Kochanek, Sherry L Murphy, Jiaquan Xu, and Betzaida Tejada-Vera. Deaths: final data for 2014. National vital statistics reports: from the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System, 65(4):1–122, 2016.
- [3] M Frasier, S Chowdhury, T Sherer, J Eberling, B Ravina, A Siderowf, C Scherzer, D Jennings, C Tanner, K Kieburtz, et al. The parkinson’s progression markers initiative: a prospective biomarkers study. Movement Disorders, 25:S296, 2010.
- [4] Ivo D Dinov, Ben Heavner, Ming Tang, Gustavo Glusman, Kyle Chard, Mike Darcy, Ravi Madduri, Judy Pa, Cathie Spino, Carl Kesselman, et al. Predictive big data analytics: a study of parkinson’s disease using large, complex, heterogeneous, incongruent, multi-source and incomplete observations. PloS one, 11(8):e0157077, 2016.
- [5] Anette Schrag, Uzma Faisal Siddiqui, Zacharias Anastasiou, Daniel Weintraub, and Jonathan M Schott. Clinical variables and biomarkers in prediction of cognitive impairment in patients with newly diagnosed parkinson’s disease: a cohort study. The Lancet Neurology, 16(1):66–75, 2017.
- [6] Mike A Nalls, Cory Y McLean, Jacqueline Rick, Shirley Eberly, Samantha J Hutten, Katrina Gwinn, Margaret Sutherland, Maria Martinez, Peter Heutink, Nigel M Williams, et al. Diagnosis of parkinson’s disease on the basis of clinical and genetic classification: a population-based modelling study. The Lancet Neurology, 14(10):1002–1009, 2015.
- [7] University of california, san francisco and weill cornell medicine researchers named winners of 2016 parkinson’s progression markers initiative data challenge. https://www.michaeljfox.org/foundation/publication-detail.html?id=625&category=7.
- [8] Marios Politis. Neuroimaging in parkinson disease: from research setting to clinical practice. Nature Reviews Neurology, 10(12):708, 2014.
- [9] Shun Chen, Hong-yu Tan, Zhuo-hua Wu, Chong-peng Sun, Jian-xun He, Xin-chun Li, and Ming Shao. Imaging of olfactory bulb and gray matter volumes in brain areas associated with olfactory function in patients with parkinson’s disease and multiple system atrophy. European journal of radiology, 83(3):564–570, 2014.
- [10] Hirobumi Oikawa, Makoto Sasaki, Yoshiharu Tamakawa, Shigeru Ehara, and Koujiro Tohyama. The substantia nigra in parkinson disease: proton density-weighted spin-echo and fast short inversion time inversion-recovery mr findings. American Journal of Neuroradiology, 23(10):1747–1756, 2002.
- [11] Patrice Péran, Andrea Cherubini, Francesca Assogna, Fabrizio Piras, Carlo Quattrocchi, Antonella Peppe, Pierre Celsis, Olivier Rascol, Jean-François Demonet, Alessandro Stefani, et al. Magnetic resonance imaging markers of parkinson’s disease nigrostriatal signature. Brain, 133(11):3423–3433, 2010.
- [12] L Minati, M Grisoli, F Carella, T De Simone, MG Bruzzone, and M Savoiardo. Imaging degeneration of the substantia nigra in parkinson disease with inversion-recovery mr imaging. American journal of neuroradiology, 28(2):309–313, 2007.
- [13] Claire J Cochrane and Klaus P Ebmeier. Diffusion tensor imaging in parkinsonian syndromes a systematic review and meta-analysis. Neurology, 80(9):857–864, 2013.
- [14] DE Vaillancourt, MB Spraker, J Prodoehl, I Abraham, DM Corcos, XJ Zhou, CL Comella, and DM Little. High-resolution diffusion tensor imaging in the substantia nigra of de novo parkinson disease. Neurology, 72(16):1378–1384, 2009.
- [15] Usman Saeed, Jordana Compagnone, Richard I Aviv, Antonio P Strafella, Sandra E Black, Anthony E Lang, and Mario Masellis. Imaging biomarkers in parkinson’s disease and parkinsonian syndromes: current and emerging concepts. Translational neurodegeneration, 6(1):8, 2017.
- [16] Francisco Pereira, Tom Mitchell, and Matthew Botvinick. Machine learning classifiers and fmri: a tutorial overview. Neuroimage, 45(1):S199–S209, 2009.
- [17] Miles N Wernick, Yongyi Yang, Jovan G Brankov, Grigori Yourganov, and Stephen C Strother. Machine learning in medical imaging. IEEE signal processing magazine, 27(4):25–38, 2010.
- [18] Steven Lemm, Benjamin Blankertz, Thorsten Dickhaus, and Klaus-Robert Müller. Introduction to machine learning for brain imaging. Neuroimage, 56(2):387–399, 2011.
- [19] Zilong Bai, Peter Walker, Anna Tschiffely, Fei Wang, and Ian Davidson. Unsupervised network discovery for brain imaging data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 55–64. ACM, 2017.
- [20] Xinyue Liu, Xiangnan Kong, and Ann B Ragin. Unified and contrasting graphical lasso for brain network discovery. In Proceedings of the 2017 SIAM International Conference on Data Mining, pages 180–188. SIAM, 2017.
- [21] Michael Mannino and Steven L Bressler. Foundational perspectives on causality in large-scale brain networks. Physics of life reviews, 15:107–123, 2015.
- [22] Ahmad R Hariri and Daniel R Weinberger. Imaging genomics. British medical bulletin, 65(1):259–270, 2003.
- [23] Paul M Thompson, Nicholas G Martin, and Margaret J Wright. Imaging genomics. Current opinion in neurology, 23(4):368, 2010.
- [24] Harrison X Bai, Ashley M Lee, Li Yang, Paul Zhang, Christos Davatzikos, John M Maris, and Sharon J Diskin. Imaging genomics in cancer research: limitations and promises. The British journal of radiology, 89(1061):20151030, 2016.
- [25] Srikanth Ryali, Kaustubh Supekar, Daniel A Abrams, and Vinod Menon. Sparse logistic regression for whole-brain classification of fmri data. NeuroImage, 51(2):752–764, 2010.
- [26] Rémi Cuingnet, Marie Chupin, Habib Benali, and Olivier Colliot. Spatial and anatomical regularization of svm for brain image analysis. In Advances in Neural Information Processing Systems, pages 460–468, 2010.
- [27] Paul Sajda, Shuyan Du, Truman R Brown, Radka Stoyanova, Dikoma C Shungu, Xiangling Mao, and Lucas C Parra. Nonnegative matrix factorization for rapid recovery of constituent spectra in magnetic resonance chemical shift imaging of the brain. IEEE transactions on medical imaging, 23(12):1453–1465, 2004.
- [28] Alonso Ramirez-Manzanares and Mariano Rivera. Basis tensor decomposition for restoring intra-voxel structure and stochastic walks for inferring brain connectivity in dt-mri. International Journal of Computer Vision, 69(1):77–92, 2006.
- [29] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- [30] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
- [31] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.
- [32] Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, and Daniel Rueckert. Distance metric learning using graph convolutional networks: Application to functional brain networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 469–477. Springer, 2017.
- [33] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
- [34] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE international conference on computer vision, pages 945–953, 2015.
- [35] Jingyuan Zhang, Bokai Cao, Sihong Xie, Chun-Ta Lu, Philip S Yu, and Ann B Ragin. Identifying connectivity patterns for brain diseases via multi-side-view guided deep architectures. In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 36–44. SIAM, 2016.
- [36] Zihao Zhu, Changchang Yin, Buyue Qian, Yu Cheng, Jishang Wei, and Fei Wang. Measuring patient similarities via a deep architecture with medical concept embedding. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pages 749–758. IEEE, 2016.
- [37] Stephen M Smith. Fast robust automated brain extraction. Human brain mapping, 17(3):143–155, 2002.
- [38] Brian B Avants, Charles L Epstein, Murray Grossman, and James C Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis, 12(1):26–41, 2008.
- [39] Susumu Mori, Barbara J Crain, Vadappuram P Chacko, and Peter Van Zijl. Three-dimensional tracking of axonal projections in the brain by magnetic resonance imaging. Annals of neurology, 45(2):265–269, 1999.
- [40] Peter J Basser, Sinisa Pajevic, Carlo Pierpaoli, Jeffrey Duda, and Akram Aldroubi. In vivo fiber tractography using DT-MRI data. Magnetic resonance in medicine, 44(4):625–632, 2000.
- [41] Thomas E Conturo, Nicolas F Lori, Thomas S Cull, Erbil Akbudak, Abraham Z Snyder, Joshua S Shimony, Robert C McKinstry, Harold Burton, and Marcus E Raichle. Tracking neuronal fiber pathways in the living human brain. Proceedings of the National Academy of Sciences, 96(18):10422–10427, 1999.
- [42] Mariana Lazar, David M Weinstein, Jay S Tsuruda, Khader M Hasan, Konstantinos Arfanakis, M Elizabeth Meyerand, Benham Badie, Howard A Rowley, Victor Haughton, Aaron Field, et al. White matter tractography using diffusion tensor deflection. Human brain mapping, 18(4):306–321, 2003.
- [43] Iman Aganj, Christophe Lenglet, Guillermo Sapiro, Essa Yacoub, Kamil Ugurbil, and Noam Harel. Reconstruction of the orientation distribution function in single- and multiple-shell q-ball imaging within constant solid angle. Magnetic Resonance in Medicine, 64(2):554–566, 2010.
- [44] Iman Aganj, Christophe Lenglet, Neda Jahanshad, Essa Yacoub, Noam Harel, Paul M Thompson, and Guillermo Sapiro. A Hough transform global probabilistic approach to multiple-subject diffusion MRI tractography. Medical image analysis, 15(4):414–425, 2011.
- [45] Liang Zhan, Jiayu Zhou, Yalin Wang, Yan Jin, Neda Jahanshad, Gautam Prasad, Talia M Nir, Cassandra D Leonardo, Jieping Ye, Paul M Thompson, et al. Comparison of nine tractography algorithms for detecting abnormal structural brain networks in Alzheimer’s disease. Frontiers in aging neuroscience, 7:48, 2015.
- [46] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(Nov):2579–2605, 2008.