# A novel framework for automatic detection of Autism: A study on Corpus Callosum and Intracranial Brain Volume

Hamza Sharif and Rizwan Ahmed Khan. Faculty of IT, Barrett Hodgson University, Karachi, Pakistan; LIRIS, Université Claude Bernard Lyon 1, France.
###### Abstract

Computer vision and machine learning are the linchpin of the field of automation. The medical industry has adopted numerous methods to discover the root causes of many diseases in order to automate the detection process. However, the biomarkers of Autism Spectrum Disorder (ASD) are still unknown, let alone automating its detection, due to the intense connectivity of neurological patterns in the brain. Studies from the neuroscience domain have highlighted the fact that the corpus callosum and intracranial brain volume hold significant information for the detection of ASD. Such results and studies have not been tested and verified by scientists working in the domain of computer vision / machine learning. Thus, in this study we have applied machine learning algorithms on features extracted from corpus callosum and intracranial brain volume data. The corpus callosum and intracranial brain volume data are obtained from the s-MRI (structural Magnetic Resonance Imaging) data-set known as ABIDE (Autism Brain Imaging Data Exchange). Our proposed framework for automatic detection of ASD shows the potential of machine learning algorithms for the development of neuroimaging data understanding and detection of ASD. The proposed framework enhances the achieved accuracy by calculating the weights / importance of features extracted from corpus callosum and intracranial brain volume data.

###### keywords:
ASD, Machine learning, Corpus callosum, Intracranial brain volume, T1-weighted structural brain imaging data.

## 1 Introduction

The emerging fields of computer vision and artificial intelligence have dominated research and industry in various domains and are now aiming to outstrip human intellect Sebe et al. (2005). With computer vision and machine learning techniques, unceasing advancement has been made in different areas like imaging Kak and Slaney (1988), computational biology Zhang (2002), video processing Van den Branden Lambrecht (2013), affect analysis Khan et al. (2013, 2019, 2013), medical diagnostics Akram et al. (2013) and much more. However, despite all these advances, neuroscience is one of the areas in which machine learning is minimally applied due to the complex nature of its data. This article proposes a framework for automatic identification of Autism Spectrum Disorder (ASD) Jaliaawala and Khan (2019) by applying machine learning algorithms on the neuroimaging data-set known as ABIDE (Autism Brain Imaging Data Exchange) Di Martino et al. (2014).

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by a lack of social interaction and emotional intelligence, and by repetitive, abhorrent, stigmatized and fixated behavior Choi (2017); Jaliaawala and Khan (2019). This syndrome is not a rare condition, but a spectrum with numerous disabilities. ICD-10 (WHO, World Health Organization, 1992) Organization (1993) and DSM-IV (APA, American Psychiatric Association) Castillo et al. (2007) outlined criteria for defining ASD in terms of social and behavioral characteristics. According to their nomenclature, an individual facing ASD has an abnormal trend associated with social interaction, a lack of verbal and non-verbal communication skills, and a limited range of interests in specific tasks and activities Jaliaawala and Khan (2019). Based on these behavioral lineaments, Autism Spectrum Disorder (ASD) is further divided into groups, which are:

1. High Functioning Autism (HFA) Baron-Cohen et al. (2001): HFA is a term applied to people with autistic disorder, who are deemed to be cognitively “higher functioning” (with an IQ of 70 or greater) than other people with autism.

2. Asperger Syndrome (AS) Klin et al. (2000): individuals facing AS have qualitative impairment in social interaction, show restricted repetitive and stereotyped patterns of behavior, interests, and activities. Usually such individuals have no clinically significant general delay in language or cognitive development. Generally, individuals facing AS have higher IQ levels but lack in facial actions and social communication skills.

3. Attention Deficit Hyperactivity Disorder (ADHD) Barkley and Murphy (1998): individuals with ADHD show impairment in paying attention (inattention). They have overactive behavior (hyperactivity) and sometimes impulsive behavior (acting without thinking).

4. Psychiatric symptoms Simonoff et al. (2008), such as anxiety and depression.

Recent population-based statistics have shown that autism is the fastest-growing neurodevelopmental disability in the United States and the UK Rice (2009). More than 1% of children and adults are diagnosed with autism, and costs of $2.4 million and $2.2 million are associated with treatment in the United States and the United Kingdom respectively, as reported by the Center for Disease Control and Prevention (CDC), USA Rice (2009); Buescher et al. (2014). It is also known that delay in detection of ASD is associated with an increase in the cost of supporting an individual with ASD Horlin et al. (2014). Thus, it is of utmost importance for the research community to propose novel solutions for early detection of ASD, and our proposed framework can be used for this purpose.

Until now, the biomarkers of ASD are unknown Del Valle Rubido et al. (2018); Jaliaawala and Khan (2019). Physicians and clinicians practice standardized / conventional methods for ASD analysis and diagnosis. Intellectual abilities and behavioral characteristics are assessed for the diagnosis of ASD; however, the synaptic affiliations of ASD are still unknown and present a challenging task for cognitive neuroscience and psychological researchers Kushki et al. (2013). A recent hypothesis in neurology demonstrates that an abnormal trend is associated with different neural regions of the brain among individuals facing ASD Bourgeron (2009). This variational trend is due to irregularities in neural patterns, disassociation and anti-correlation of cognitive function between different regions, which affects the global brain network Schipul et al. (2011).

Magnetic Resonance Imaging (MRI), a non-invasive technique, has been widely used to study brain regional network(s). Thus, MRI data can be used to reveal subtle variations in neural patterns / networks, which can help in identifying biomarkers for ASD. MRI technology employs radio-frequency pulses to generate a pictorial representation of particular brain tissue. An example of an MRI scan in different cross-sectional views is shown in Figure 2. MRI scans are further divided into structural MRI (s-MRI) and functional MRI (f-MRI), depending on the type of scanning technique used Bullmore and Sporns (2009). The entire brain network using structural and functional MRI is shown in Figure 3.

Structural MRI (s-MRI) scans are used to examine the anatomy and neurology of the brain. s-MRI scans are also employed to measure the volume of the brain, i.e. regional grey matter (GM), white matter (WM) and cerebrospinal fluid (CSF) Giedd (2004), the volume of its sub-regions, and to identify localized lesions. s-MRI is classified into two sequences: T1-weighted MRI and T2-weighted MRI, where a sequence means the series of radio-frequency pulses and gradients that result in a set of images with a particular appearance Haacke et al. (2009). These sequences depend on the values of the scanning parameters: Repetition Time (TR) and Echo Time (TE). The TR and TE parameters are used to control the image contrast and weighting of an MRI image Rutherford and Bydder (2002). T1-weighted scans are produced with short TE and short TR. Conversely, T2-weighted scans have long TE and long TR parameter values. The bright and dark regions in scans are primarily determined by the T1 and T2 properties of cerebrospinal fluid (CSF). Cerebrospinal fluid is a clear, colorless body fluid present in the brain. Therefore, CSF is dark in T1-weighted scans and appears bright in T2-weighted scans Budman et al. (1992).

f-MRI scans are used to visualize the activated brain regions associated with brain function. f-MRI computes synchronized neural activity through the detection of blood flow variation across different cognitive regions. By using MRI scans, numerous researchers have reported that distinctive brain regions are associated with ASD Huettel et al. (2004).

In 2012, the Autism Brain Imaging Data Exchange (ABIDE) provided the scientific community with an "open source" repository to study ASD from brain imaging data, i.e. MRI data Di Martino et al. (2014). The ABIDE data-set consists of 1112 participants (autism and healthy control) with rs-fMRI (resting state functional magnetic resonance imaging) data. rs-fMRI is a type of f-MRI data captured in a resting or task-negative state Plitt et al. (2015); Smith et al. (2009). ABIDE also provides anatomical scans and phenotypic data (clinical information such as age, sex and ethnicity) Di Martino et al. (2014). All the details (data collection and pre-processing) related to the ABIDE data-set are presented in Section 3.

In this study we propose a machine learning based framework for automatic detection of ASD. Results from the proposed framework have been calculated using data from the ABIDE data-set. The achieved average recognition accuracy for detection of individuals facing ASD is further enhanced by applying feature selection methods, where features are measurable attributes of the data Bishop (2006). Feature selection methods find the weights / importance of different features by calculating their discriminative ability, thus improving the prediction performance, computational time and generalization capability of machine learning algorithms Chandrashekar and Sahin (2014). Details related to the feature selection methods tested are presented in Section 4. Section 5 presents the different classifiers evaluated during the course of this study, while the obtained results are discussed in Section 5.2. A survey of related literature is presented in the next section, i.e. Section 2.

In summary, our contributions in this study are three-fold:

1. We showed the potential of machine learning algorithms applied to brain anatomical scans for automatic detection of ASD.

2. This study demonstrated that feature selection / weighting methods help build a robust classifier for automatic detection of ASD.

3. We also highlighted future directions to improve the performance of such frameworks for automatic detection of ASD, so that they could perform well not only on published databases but also in real world applications, helping clinicians in early detection of ASD.

## 2 State of the Art

In this section, various methods that have been explored for classification of neurodevelopmental disorders are discussed. The fusion of artificial intelligence techniques (machine learning and deep learning) with brain imaging data has enabled the study of the representation of semantic categories Haxby et al. (2001), the meaning of nouns Buchweitz et al. (2012), learning Bauer and Just (2015) and emotions Kassam et al. (2013). But, in general, the use of machine learning algorithms to detect psychological and neurodevelopmental ailments, i.e. schizophrenia Bellak (1994), autism Just et al. (2014) and anxiety / depression Craddock et al. (2009), remains restricted due to the complex nature of the problem. This literature review section is focused on the state-of-the-art methods that operate on brain imaging data to discover neurodevelopmental disorders via machine learning approaches.

Just et al. Just et al. (2014) presented a Gaussian Naïve Bayes (GNB) classifier based approach to identify ASD and control participants using fMRI data. They achieved an accuracy of 97% while detecting autism from a population of 34 individuals (17 control and 17 autistic individuals). Craddock et al. Craddock et al. (2009) used a multi-voxel pattern analysis technique for detection of Major Depressive Disorder (MDD) Greicius et al. (2007). They showed results on MRI data gathered from forty subjects, i.e. twenty healthy controls and twenty individuals with MDD. Their proposed framework achieved an accuracy of 95%.

One promising study, by Sabuncu et al. Sabuncu et al. (2015), used the Multivariate Pattern Analysis (MVPA) algorithm and structural MRI (s-MRI) data to predict a range of neurodevelopmental disorders, i.e. Alzheimer's disease, autism, and schizophrenia. Sabuncu et al. analyzed structural neuroimaging data from six publicly available sources (https://www.nmr.mgh.harvard.edu/lab/mripredict), with 2800 subjects. The MVPA algorithm was constituted with three classes of classifiers: Support Vector Machine (SVM) Vapnik (2013), Neighborhood Approximation Forest (NAF) Konukoglu et al. (2012) and Relevance Vector Machine (RVM) Tipping (2001). Sabuncu et al. attained detection accuracies of 70%, 86% and 59% for schizophrenia, Alzheimer's disease and autism respectively, using a 5-fold validation scheme.

It is important to note that studies which combine machine learning with brain imaging data collected from multiple sites (like ABIDE Di Martino et al. (2014)) to identify autism demonstrated that classification accuracy tends to decrease Arbabshirani et al. (2017). In this study we also observed the same trend. Nielsen et al. Nielsen et al. (2013) discovered the same pattern / trend on the ABIDE data-set and also concluded that sites with longer BOLD imaging time have significantly higher classification accuracy. Blood Oxygen Level Dependent (BOLD) imaging is a method used in fMRI to observe active regions via blood flow variation: regions where blood concentration is higher appear more active than other regions Huettel et al. (2004).

A study conducted by Koyamada et al. Koyamada et al. (2015) employed neuroimaging data with a DNN (Deep Neural Network) LeCun et al. (2015). They showed that a DNN outperforms conventional supervised learning methods, i.e. Support Vector Machine (SVM) Vapnik (2013), in learning concepts from neuroimaging data. Koyamada et al. investigated brain states from brain activities using a DNN to classify task-based fMRI data spanning seven task categories: emotional response, wagering, language, motor, experiential, interpersonal and working memory. They trained a deep neural network with two hidden layers and achieved an average accuracy of 50.47%.

Deep learning models, i.e. DNNs, hold great potential in clinical / neuroscience / neuroimaging research applications. Plis et al. Plis et al. (2014) used a Deep Belief Network (DBN) for automatic detection of schizophrenia Bellak (1994). Plis et al. trained a model with three hidden layers (50-50-100 hidden neurons in the first, second and top layer respectively) using T1-weighted structural MRI (s-MRI) imaging data (refer to Section 1 for a discussion of s-MRI data). They analyzed data-sets from four different studies conducted by Johns Hopkins University (JHU), the Maryland Psychiatric Research Center (MPRC), the Institute of Psychiatry, London, UK (IOP), and the Western Psychiatric Institute and Clinic at the University of Pittsburgh (WPIC), with 198 schizophrenia patients and 191 controls, and achieved a classification accuracy of 90%.

In another study, Heinsfeld et al. Heinsfeld et al. (2018) trained a neural network by transfer learning from two auto-encoders Vincent et al. (2008). The transfer learning methodology allows the distributions used in training and testing to be different, and it also paves the way for a neural network to reuse learned neuron weights in different scenarios Khan et al. (2019). The aim of the study by Heinsfeld et al. was to distinguish ASD from healthy controls. The main objective of auto-encoders is to learn a representation of the data in an unsupervised way to improve the generalization of a model Vincent et al. (2010). For unsupervised pre-training of these two auto-encoders, Heinsfeld et al. utilized rs-fMRI (resting state fMRI) image data from the ABIDE data-set. The knowledge in the form of weights extracted from these two auto-encoders was mapped to a multilayer perceptron (MLP). Heinsfeld et al. achieved a classification accuracy of up to 70%.

The studies described above focused on analyzing neuroimaging data, i.e. MRI and fMRI scanning data, to detect different neurodevelopmental disorders. However, the specific brain regions used to predict psychological disorders were not given importance in these studies. It has been shown that different regions of the brain highlight subtle variations that differentiate healthy individuals from individuals facing a neurodevelopmental disorder. A quantitative survey using the ABIDE data-set reported that an increase in brain volume and a reduction in corpus callosum Zaidel and Iacoboni (2003) area were found in participants with autism spectrum disorder (ASD). The corpus callosum has a central function in integrating information and mediating behaviors Hinkley et al. (2012). It consists of approximately 200 million fibers of varying diameters and is the largest inter-hemispheric connection of the human brain Tomasch (1954).

Hiess et al. Hiess et al. (2015) concluded that although there was no significant difference in the corpus callosum sub-regions between ASD and control participants, the individuals facing ASD had increased intracranial volume. Intracranial volume (ICV) is used as an estimate of the size of the brain and brain regions for volumetric analysis Nordenskjöld et al. (2013). Waiter et al. Waiter et al. (2005) reported a reduction in the size of the splenium and isthmus, and Chung et al. Chung et al. (2004) also found a diminution in the area of the splenium, genu and rostrum of the corpus callosum in ASD. The splenium, isthmus, genu and rostrum are regional subdivisions of the corpus callosum based on the studies of Witelson Witelson (1989) and Venkatasubramanian et al. Venkatasubramanian et al. (2007). Refer to Figure 4 for a pictorial representation of the different segmented sub-regions of the corpus callosum.

In this article, we propose a framework for automatic identification of individuals facing ASD using T1-weighted MRI scans. For automatic detection of ASD we have utilized different classification techniques (refer to Section 5 for details of the machine learning algorithms used in this study). Machine learning is done on features extracted from the Autism Brain Imaging Data Exchange (ABIDE) data-set / database. We further improved classification results by calculating the importance of different features for the given task (Section 4 presents the feature selection methodology employed in this study). We used the same features as used in the study of Hiess et al. Hiess et al. (2015). By using the same features we can robustly verify the relative strength or weakness of the proposed machine learning based framework, as the study by Hiess et al. does not employ machine learning. The next section presents all the details related to the ABIDE database and also explains the preprocessing procedure.

## 3 Database

This study is performed using structural MRI (s-MRI) scans from the Autism Brain Imaging Data Exchange (ABIDE-I) data-set (http://fcon_1000.projects.nitrc.org/indi/abide/abide_I.html). ABIDE is an online sharing consortium that provides imaging data of ASD and control participants along with their phenotypic information Di Martino et al. (2014). The ABIDE-I data-set was collected at 17 international sites and comprises a total of 1112 subjects or samples (539 autism cases and 573 healthy control participants). In accordance with Health Insurance Portability and Accountability Act (HIPAA) Act (1996) guidelines, the identities of individuals who participated in the ABIDE database recording were not disclosed. Table 1 shows the image acquisition parameters for structural MRI (s-MRI) scans for each site in the ABIDE study.

As explained above, we used the same features as used in the study of Hiess et al. Hiess et al. (2015). Next, we explain the preprocessing done by Hiess et al. on T1-weighted MRI scans from the ABIDE database to calculate different parameters and regions of the corpus callosum and brain volume.

### 3.1 Preprocessing

The corpus callosum area, its sub-regions and the intracranial volume were calculated using the following software tools:

1. yuki Ardekani (2013)

2. itksnap Yushkevich et al. (2006)

The corpus callosum has a central function in integrating information and mediating behaviors Hinkley et al. (2012), and consists of approximately 200 million fibers of varying diameters, making it the largest inter-hemispheric connection of the human brain Tomasch (1954). Intracranial volume (ICV), in turn, is used as an estimate of the size of the brain and brain regions for volumetric analysis Nordenskjöld et al. (2013).

The corpus callosum area for each participant was segmented using the "yuki" software Ardekani (2013). The corpus callosum was automatically divided into its sub-regions using the Witelson scheme Witelson (1989). An example of corpus callosum segmentation is shown in Figure 4. Each segmentation was inspected visually and corrected where necessary using the "ITK-SNAP" Yushkevich et al. (2006) software package. The inspection and correction procedure was performed by two readers. Due to minor manual corrections in the corpus callosum segmentation for some MRI scans, statistical equivalence analysis and intra-class correlation were calculated for the corpus callosum area measured by both readers.

The total intracranial brain volume Malone et al. (2015) of each participant was measured using the software tool "brainwash". The "Automatic Registration Toolbox" (www.nitrc.org/projects/art), a feature in brainwash, was used to extract intracranial brain volume. The brainwash method uses a non-linear transformation to estimate intracranial regions by mapping co-registered labels (pre-labeled intracranial regions) to the participant's MRI scan. A voxel-voting scheme Manjón and Coupé (2016) is used to classify each voxel in the participant's MRI as intracranial or not. Each brain segmentation was visually inspected to ensure accurate segmentation. For cases where segmentation was not performed accurately, the following additional steps were taken:

1. In some cases where brain segmentation was not achieved correctly, the brainwash method was executed again using a preprocessed MRI scan from the same site that had error-free brain segmentation.

2. The brainwash software automatically identifies the coordinates of the anterior and posterior commissure. In some cases, these points were not correctly identified; in such cases, they were identified manually and entered into the software.

3. A âregion-based snakesâ feature implemented in âITK-SNAPâ Yushkevich et al. (2006) software package was used for minor correction of intracranial volume segmentation error manually.

Figure 1 shows how T1-weighted MRI scans are transformed into a feature vector of M × N dimensions, where M denotes the total number of samples and N denotes the total number of features in the feature vector; features are measurable attributes of the data Bishop (2006). The feature vector was calculated using the segmentation of the corpus callosum area and brain volume. The different classifiers used in the proposed framework learn a concept / pattern from the given / extracted features. Section 5 presents details of the machine learning algorithms / classifiers used in this study.

## 4 Feature Selection

In every machine learning problem, selection of a useful set of features or feature vector is an important task. Optimal features minimize within-class variations (ASD vs control individuals) while maximizing between-class variations Khan (2013). Feature selection techniques are utilized to find optimal features by removing redundant or irrelevant features for a given task.

As described above, we used the same features as used in the study of Hiess et al. Hiess et al. (2015). By using the same features, we can robustly verify the relative strength or weakness of the proposed machine learning based framework, as the study by Hiess et al. does not employ machine learning. Hiess et al. have made the preprocessed T1-weighted MRI scan data from ABIDE available for research (https://sites.google.com/site/hpardoe/cc_abide). The preprocessed data consist of parametric features of the corpus callosum, its sub-regions and intracranial brain volume, with labels. In total, the preprocessed data consist of 12 features from 1100 examples or samples each (12 x 1100). A statistical summary of the preprocessed data is outlined in Table 2.
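To make the data layout concrete, the sketch below builds a toy version of this feature table and splits it into a feature matrix and a label vector. The column names are hypothetical stand-ins, not the exact ones released by Hiess et al.; the real table has 12 features and 1100 samples.

```python
import pandas as pd

# Toy stand-in for the preprocessed table released by Hiess et al.
# (the real file has 12 anatomical features and 1100 samples).
# Column names here are hypothetical, for illustration only.
df = pd.DataFrame({
    "cc_total_area": [612.4, 580.1, 599.8],  # corpus callosum area (mm^2)
    "icv": [1.52e6, 1.47e6, 1.50e6],         # intracranial volume (mm^3)
    "label": [1, 0, 1],                      # 1 = ASD, 0 = control
})

X = df.drop(columns=["label"]).to_numpy()  # feature matrix, shape (M, N)
y = df["label"].to_numpy()                 # class labels, length M
```

This matrix / vector pair is the input consumed by the feature selection methods and classifiers described next.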

### 4.1 Studied feature selection methods

Selecting a useful subset of features to extract meaningful results by eliminating redundant features is a comprehensive and recursive task. To enhance computational simplicity, reduce complexity and improve the performance of machine learning algorithms, different feature selection techniques were applied on the preprocessed ABIDE database. In the literature, usually entropy or correlation based methods are used for feature selection. Thus, we have employed state-of-the-art methods based on entropy and correlation to select features that minimize within-class variations (ASD vs control individuals) while maximizing between-class variations. The methods used in this study are explained below:

#### 4.1.1 Information Gain

Information gain is a feature selection technique that measures how much information a feature provides about the corresponding class. It measures information in the form of entropy. Entropy is defined as a probabilistic measure of impurity, disorder or uncertainty in a feature Quinlan (1986). Therefore, a feature with a reduced entropy value tends to give more information and is considered more relevant. For a given set of training examples $S_N$, a feature $n_i$ in this set, and $S_{n_i=v}$ the subset of examples for which feature $n_i$ takes the value $v$, information gain is defined as:

 IG(S_N, n_i) = H(S_N) - \sum_{v \in values(n_i)} \frac{|S_{n_i=v}|}{|S_N|} H(S_{n_i=v}) (1)

with entropy:

 H(S) = -p_+(S)\log_2 p_+(S) - p_-(S)\log_2 p_-(S) (2)

where $p_+(S)$ and $p_-(S)$ are the probabilities of a training sample in the data-set belonging to the positive and negative class, respectively.
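A minimal sketch of Eqs. (1)-(2), assuming discrete feature values (in practice, continuous anatomical features such as areas and volumes would first be binned):

```python
import numpy as np

def entropy(labels):
    # H(S) as in Eq. (2): Shannon entropy of the class distribution, in bits.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    # IG(S_N, n_i) as in Eq. (1): entropy of the full set minus the
    # weighted entropy of each partition induced by a feature value.
    ig = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        ig -= mask.mean() * entropy(labels[mask])
    return ig

# A feature that perfectly separates the two classes recovers the
# full entropy of the labels (1 bit) as information gain.
f = np.array([0, 0, 1, 1])
y = np.array([0, 0, 1, 1])
ig = information_gain(f, y)
```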

#### 4.1.2 Information Gain Ratio

Information gain is biased towards selecting features with larger numbers of values Yu and Liu (2003). Information gain ratio is a modified version of information gain that reduces this bias. It is calculated as the ratio of information gain to intrinsic value Kononenko and Hong (1997), where intrinsic value is an additional entropy calculation. For a given feature $f$ and the set of all training examples $Ex$, with $values(x, f)$ defining the value of feature $f$ for a specific example $x$, and $values(f)$ denoting the set of all possible values of feature $f$, the information gain ratio for a feature is mathematically denoted as:

 IGR(Ex, f) = \frac{IG(Ex, f)}{IV(Ex, f)} (3)

with intrinsic value $IV(Ex, f)$:

 IV(Ex, f) = -\sum_{v \in values(f)} \frac{|\{x \in Ex \mid values(x, f) = v\}|}{|Ex|} \cdot \log_2\left(\frac{|\{x \in Ex \mid values(x, f) = v\}|}{|Ex|}\right) (4)
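The gain ratio of Eqs. (3)-(4) can be sketched in the same style; note that the intrinsic value reduces to the entropy of the feature's own value distribution. This is a self-contained toy example, not the exact implementation used in the study:

```python
import numpy as np

def _entropy(x):
    # Shannon entropy of a discrete array, in bits.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain_ratio(feature, labels):
    # IG(Ex, f): entropy of the labels minus weighted partition entropies.
    ig = _entropy(labels) - sum(
        (feature == v).mean() * _entropy(labels[feature == v])
        for v in np.unique(feature)
    )
    # IV(Ex, f) of Eq. (4) is the entropy of the feature's value distribution.
    iv = _entropy(feature)
    return ig / iv if iv > 0 else 0.0

f = np.array([0, 0, 1, 1])
y = np.array([0, 0, 1, 1])
r = gain_ratio(f, y)  # IG = 1 bit, IV = 1 bit
```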

#### 4.1.3 Chi-Square Method

The Chi-Square ($\chi^2$) method is a correlation based feature selection method (also known as the Pearson Chi-Square test) which calculates the dependency of two variables, where two variables $A$ and $B$ are defined as independent if $P(AB) = P(A)P(B)$ or, equivalently, $P(A|B) = P(A)$ and $P(B|A) = P(B)$. In terms of machine learning, the two variables are the occurrence of a feature and the class label Doshi (2014). The chi-square method calculates the correlation strength of each feature by computing the statistic given by the following expression:

 \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} (5)

where $\chi^2$ is the chi-square statistic, $O_i$ is the observed value of the feature, and $E_i$ is the expected value of the feature, respectively.
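In practice, the chi-square score can be computed per feature with scikit-learn. The toy data below is illustrative: `chi2` requires non-negative feature values, which holds for the area and volume measurements used in this study.

```python
import numpy as np
from sklearn.feature_selection import chi2

# Two toy features: the first tracks the class label, the second is noise.
X = np.array([[10.0, 1.0],
              [12.0, 0.0],
              [30.0, 1.0],
              [32.0, 0.0]])
y = np.array([0, 0, 1, 1])

scores, p_values = chi2(X, y)  # one chi-square statistic per feature
```

The class-informative first feature receives a much larger score than the noise feature, which is exactly the ranking behaviour the framework relies on.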

#### 4.1.4 Symmetrical Uncertainty

Symmetrical Uncertainty (SU) is referred to as a relevance indexing or scoring Brown et al. (2012) method which is used to find the relationship between a feature and the class label. It normalizes the value to the range [0, 1], where 1 indicates that the feature and the target class are strongly correlated and 0 indicates no relationship between them Peng et al. (2005). For a class label $Y$, the symmetrical uncertainty for a feature $X$ is mathematically denoted as:

 SU(X, Y) = \frac{2 \cdot IG(X, Y)}{H(X) + H(Y)} (6)

where $IG(X, Y)$ represents the information gain and $H(\cdot)$ represents entropy, respectively.
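A compact sketch of Eq. (6) for discrete variables (a toy example; a real pipeline would discretize the continuous anatomical features first):

```python
import numpy as np

def _entropy(x):
    # Shannon entropy of a discrete array, in bits.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def symmetrical_uncertainty(x, y):
    # SU(X, Y) = 2 * IG(X, Y) / (H(X) + H(Y)), bounded in [0, 1].
    ig = _entropy(y) - sum(
        (x == v).mean() * _entropy(y[x == v]) for v in np.unique(x)
    )
    denom = _entropy(x) + _entropy(y)
    return 2.0 * ig / denom if denom > 0 else 0.0

# A feature identical to the label scores 1 (perfect correlation).
x = np.array([0, 0, 1, 1])
y = np.array([0, 0, 1, 1])
su = symmetrical_uncertainty(x, y)
```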

### 4.2 Analysis

All four methods (information gain, information gain ratio, chi-square and symmetrical uncertainty) calculate the value / importance / weight of each feature for a given task. The weight of each feature is calculated with respect to the class label and the feature value computed by each method. The higher the weight of a feature, the more relevant it is considered. The weight of each feature is normalized to the range [0, 1]. The results of each feature selection method are shown in Figure 5.

Figure 5 presents the results of the feature selection study. The first two graphs show weights of different features calculated from the entropy based methods, i.e. information gain and information gain ratio. The last two graphs present feature weights obtained from the correlation based methods, i.e. chi-square and symmetrical uncertainty. The result of information gain ratio differs from that of information gain, but in both methods intracranial volume and corpus callosum features emerged as the most important. Results from the correlation based methods, i.e. chi-square and symmetrical uncertainty, are almost similar with little difference: intracranial volume and the corpus callosum sub-region features emerged as the most discriminant.

It is important to highlight that the features that give more discriminant information in our study are comparable with the features identified in the study by Hiess et al. Hiess et al. (2015). Hiess et al. concluded that intracranial volume and corpus callosum area are two important features to discriminate ASD and control in the ABIDE data-set. In our study we also concluded that intracranial volume and different sub-regions of the corpus callosum, i.e. genu, mid-body and splenium, are the most discriminant features. As a matter of fact, the results from the correlation based methods, i.e. chi-square and symmetrical uncertainty, are comparable with the results presented by Hiess et al. Hiess et al. (2015).

In our proposed framework, we applied a threshold on the results obtained from the feature selection methods to select a sub-set of features, reducing computational complexity and improving the performance of the machine learning algorithms. We performed experiments with different threshold values and empirically found that the average classification accuracy (detection of ASD) obtained on the sub-set of features selected by the chi-square method was highest. The final feature vector deduced in this study includes intracranial volume and the discriminant corpus callosum sub-region features. The average classification accuracy with and without feature selection is presented in Table 3. It can be observed from the table that training the classifier on the sub-set of discriminant features gives better results not only in terms of computational complexity but also in terms of average classification accuracy.
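The thresholding step can be sketched as follows, with synthetic data and a hypothetical cut-off standing in for the empirically chosen value:

```python
import numpy as np
from sklearn.feature_selection import chi2

# Synthetic stand-in for the 12-feature ABIDE table (values non-negative,
# as chi2 requires); the threshold value here is hypothetical.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 12))
y = rng.integers(0, 2, size=100)

scores, _ = chi2(X, y)
weights = scores / scores.max()            # normalize weights to [0, 1]
threshold = 0.5                            # hypothetical cut-off
selected = np.flatnonzero(weights >= threshold)
X_reduced = X[:, selected]                 # reduced feature matrix
```

The classifiers of Section 5 are then trained on `X_reduced` instead of the full matrix.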

## 5 Experiment and results

Classification is the process of searching for / learning patterns or concepts from a given data-set and predicting its label / class Bishop (2006). For automatic detection of ASD from the preprocessed ABIDE data (features selected by the feature selection algorithm, refer to Section 4) we have tested the below mentioned state-of-the-art classifiers:

1. Linear Discriminant Analysis (LDA)

2. Support Vector Machine (SVM) with radial basis function (rbf) kernel

3. Random Forest (RF) of 10 trees

4. Multi-Layer Perceptron (MLP)

5. K-Nearest Neighbor (KNN) with k = 3

### 5.1 Classifiers used in our study

We chose classifiers from diverse categories. For example, K-Nearest Neighbor (KNN) is a non-parametric instance based learner; Support Vector Machine (SVM) is a large margin classifier that maps data to a higher dimensional space for better classification; Random Forest (RF) is a tree based classifier which breaks the set of samples into a set of covering decision rules; while the Multi-Layer Perceptron (MLP) is motivated by human brain anatomy. The above mentioned classifiers are described below.
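With scikit-learn, the five classifiers can be instantiated and compared under cross-validation roughly as below. Synthetic data stands in for the selected anatomical features; hyper-parameters follow the list above, and the remaining settings are library defaults, not necessarily those used in the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "SVM (rbf)": SVC(kernel="rbf"),
    "RF (10 trees)": RandomForestClassifier(n_estimators=10, random_state=0),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
}

# Synthetic stand-in for the selected corpus callosum / ICV features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 7))
y = rng.integers(0, 2, size=120)

# Mean 5-fold cross-validated accuracy for each classifier.
accuracies = {
    name: cross_val_score(clf, X, y, cv=5).mean()
    for name, clf in classifiers.items()
}
```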

#### 5.1.1 Linear Discriminant Analysis (LDA)

LDA is a statistical method that finds the linear combination of features which best separates the data-set into its corresponding classes. The resulting combination is used as a linear classifier Jain and Huang (2004). LDA maximizes linear separability by maximizing the ratio of the between-class variance to the within-class variance for any particular dataset. Let $L$ be the number of classes, $P(\omega_i)$ the prior probability of class $\omega_i$, $M_i$ the mean of class $\omega_i$ and $M$ the grand mean. Then the within-class and between-class scatter matrices $S_w$ and $S_b$ are defined as:

$$S_w = \sum_{i=1}^{L} P(\omega_i)\,\Sigma_i \qquad (7)$$
$$S_b = \sum_{i=1}^{L} P(\omega_i)\,[M_i - M][M_i - M]^t \qquad (8)$$

where $\Sigma_i$ represents the covariance matrix of class $\omega_i$.

#### 5.1.2 Support Vector Machine (SVM)

The SVM classifier segregates samples into their corresponding classes by constructing decision boundaries known as hyperplanes Vapnik (2013). It implicitly maps the data-set into a higher dimensional feature space and constructs a separating hyperplane with maximal margin in that space. For a training set of examples $\{(x_i, y_i)\}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$, a new test example $x$ is classified by the following function:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b\right) \qquad (9)$$

where $\alpha_i$ are the Lagrange multipliers of the dual optimization problem, $K$ is a kernel function, and $b$ is the threshold parameter of the hyperplane, respectively.

#### 5.1.3 Random Forest (RF)

Random Forest belongs to the decision tree family and is capable of performing both classification and regression tasks. A classification tree is composed of nodes and branches which break the set of samples into a set of covering decision rules Mitchell (1997). RF is an ensemble classifier consisting of many decision trees, and its output is the mode of the classes output by the individual trees.

#### 5.1.4 Multi-Layer Perceptron (MLP)

MLP belongs to the family of neural networks, which consist of interconnected groups of artificial neurons called nodes and connections for processing information called edges Jain et al. (1996). A neural network consists of an input, a hidden and an output layer. The input layer transmits the input feature vector, with weighted values, to the hidden layer. The hidden layer, composed of activation units or transfer functions Gardner and Dorling (1998), receives the weighted feature vector from the first layer and performs calculations on it. The output layer is made up of a single activation unit, carrying the weighted output of the hidden layer, and predicts the corresponding class. An example of an MLP with 2 hidden layers is shown in Figure 6. A multilayer perceptron is fully connected: each node is connected to every node in the next and previous layer. MLP utilizes back-propagation Hecht-Nielsen (1992) during training to reduce the error function; the error is reduced by updating the weight values in each layer. For a training example with features $\{x_1, \dots, x_n\}$ and output $y \in \{0, 1\}$, classification is performed by the following function:

$$y(x) = f\left(\sum_{j=1}^{n} x_j w_j + b\right) \qquad (10)$$

where $f$ is a non-linear activation function, $w_j$ is the weight applied to input $x_j$, and $b$ is the bias term, respectively.
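As a concrete illustration of Eq. (10), the sketch below evaluates a single unit's forward pass; the logistic sigmoid is assumed as the non-linear activation $f$, and the input, weight and bias values are arbitrary:

```python
import numpy as np

def neuron_output(x, w, b):
    """Single-unit forward pass of Eq. (10): y = f(sum_j x_j * w_j + b),
    with f chosen here as the logistic sigmoid (an assumption)."""
    z = np.dot(x, w) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # input feature vector
w = np.array([0.8, 0.2, -0.5])   # weights on each input
b = 0.1                          # bias term
y = neuron_output(x, w, b)       # weighted sum z = -0.7, squashed into (0, 1)
```

During back-propagation, the error gradient with respect to `w` and `b` would be used to update these values layer by layer.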

#### 5.1.5 K-Nearest Neighbor (KNN)

KNN is an instance based non-parametric classifier which finds the $k$ training samples closest to a new example, based on a target function Khan et al. (2013); Acuna and Rodriguez (2004), and infers the output class from them. The probability of an unknown sample $q$ belonging to class $y$ can be calculated as follows:

$$p(y \mid q) = \frac{\sum_{k \in K} W_k \cdot 1(k_y = y)}{\sum_{k \in K} W_k} \qquad (11)$$
$$W_k = \frac{1}{d(k, q)} \qquad (12)$$

where $K$ is the set of nearest neighbors, $k_y$ the class of neighbor $k$, and $d(k, q)$ the Euclidean distance of $k$ from $q$, respectively.
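Eqs. (11) and (12) can be computed directly; the query point, neighbors and labels below are invented for illustration:

```python
import numpy as np

def knn_class_prob(q, neighbors, labels, target):
    """Distance-weighted KNN probability of Eqs. (11)-(12):
    each neighbor k gets weight W_k = 1 / d(k, q), and p(y|q) is the
    total weight of neighbors labeled y divided by the total weight."""
    weights = np.array([1.0 / np.linalg.norm(k - q) for k in neighbors])
    match = np.array([1.0 if lab == target else 0.0 for lab in labels])
    return float((weights * match).sum() / weights.sum())

q = np.array([0.0, 0.0])                       # query sample
neighbors = [np.array([1.0, 0.0]),             # distance 1 -> weight 1.0
             np.array([0.0, 2.0]),             # distance 2 -> weight 0.5
             np.array([2.0, 0.0])]             # distance 2 -> weight 0.5
labels = ["ASD", "control", "ASD"]
p = knn_class_prob(q, neighbors, labels, "ASD")  # (1.0 + 0.5) / 2.0 = 0.75
```

Closer neighbors dominate the vote, which is why the nearest "ASD" sample outweighs the more distant "control" one.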

### 5.2 Results and Evaluation

We chose to evaluate the performance of our framework using the same evaluation criteria proposed by Heinsfeld et al. Heinsfeld et al. (2018), who evaluated their framework on the basis of k-fold cross validation and leave-one-site-out classification schemes Bishop (2006). We have also evaluated and compared the results of the above mentioned classifiers based on these schemes.

#### 5.2.1 k-Fold cross validation scheme

Cross validation is a statistical technique for evaluating and comparing learning algorithms by dividing the data-set / sample space into two segments: one used to learn or train the model and the other used to validate it Kohavi et al. (1995). In the k-fold cross validation scheme, the data-set is segmented into k equally sized portions, segments or folds. Subsequently, k iterations of learning and validation are performed; within each iteration, k-1 folds are used for learning and the remaining fold is used for validation Bishop (2006). Upon completion of the k folds, the performance of an algorithm is calculated by averaging the values of the evaluation metric, i.e. the accuracy of each fold.

All the studied classifiers are evaluated with a k-fold cross validation scheme which mixes the data from all 17 sites of the ABIDE database. The data from all 17 sites is divided into 5 equal segments. In each fold, 4 segments of data are used for training and the remaining segment is used for testing. This process is explained in Figure 7.
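A minimal sketch of this 5-fold evaluation with scikit-learn, using random stand-in data in place of the ABIDE feature vectors and LDA as a representative classifier (both assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))        # stand-in for the selected feature vectors
y = rng.integers(0, 2, size=100)     # stand-in ASD / control labels

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # 5 equal folds
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=cv)
mean_acc = scores.mean()             # average accuracy over the 5 folds
```

`cross_val_score` handles the rotation of the held-out fold automatically, so each sample is used for testing exactly once.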

Figure 8 presents the average ASD recognition accuracy achieved by the studied classifiers using the k-fold cross validation scheme on the preprocessed ABIDE data (features selected by the feature selection algorithm, refer Section 4). The results show that the overall accuracy of all classifiers increases with the number of folds. Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Random Forest (RF), Multi-layer Perceptron (MLP) and K-nearest neighbor (KNN) achieved average accuracies of 55.93%, 52.20%, 54.79%, 54.98% and 51.00% respectively. These results are also reported in Table 3.

#### 5.2.2 Leave-one-site-out classification scheme

In this validation scheme, data from one site is used for testing to evaluate the performance of the model, while data from all remaining sites is used for training. This procedure is represented in Figure 9.
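Leave-one-site-out corresponds to grouped cross validation, where the site identifier is the group. A sketch with scikit-learn's `LeaveOneGroupOut`, again with random stand-in data and three hypothetical sites (the real database has 17):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 7))               # stand-in feature vectors
y = rng.integers(0, 2, size=60)            # stand-in ASD / control labels
sites = np.repeat(np.arange(3), 20)        # 3 hypothetical acquisition sites

logo = LeaveOneGroupOut()                  # one whole site held out per split
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y,
                         cv=logo, groups=sites)
```

Each score estimates how well a model trained on the other sites generalizes to a site it has never seen, which is why per-site results (Figure 10) can vary so much.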

The framework achieved average accuracies of 56.21%, 51.34%, 54.61%, 56.26% and 52.16% for Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Random Forest (RF), Multi-layer Perceptron (MLP) and K-nearest neighbor (KNN) respectively for ASD identification using the leave-one-site-out classification scheme. Results are tabulated in Table 3.

Figure 10 presents the recognition result for each site using the leave-one-site-out classification method. It is interesting to observe that, across all sites, the maximum ASD classification accuracy is achieved on USM site data, with an accuracy of 79.21% by the KNN classifier. The second highest accuracy, 76.32%, is achieved by LDA on CALTECH site data. This result is consistent with the result obtained by Heinsfeld et al. Heinsfeld et al. (2018).

The leave-one-site-out classification results of all classifiers show variation across different sites. This variation could be due to differences in the number of samples available for the training phase. Furthermore, there is variability in the data across different sites; refer to Table 1 for the structural MRI acquisition parameters used across sites in the ABIDE database Hiess et al. (2015).

## 6 Conclusion and Future Work

Our research study shows the potential of machine learning algorithms for the development of neuroimaging data understanding. We showed how machine learning algorithms can be applied to structural MRI data for automatic detection of individuals with Autism Spectrum Disorder (ASD). Although the achieved recognition rate is below the desired range, in the absence of established biomarkers such algorithms can still assist clinicians in early detection of ASD. The main conclusions drawn from this study are:

• Machine learning algorithms applied to brain anatomical scans can help in the automatic detection of ASD. Features extracted from the corpus callosum and intracranial brain regions present significant discriminative information for classifying individuals with ASD against the control sub-group.

• Feature selection / weighting methods help build robust classifiers for automatic detection of ASD. These methods benefit the framework not only by reducing computational complexity but also by improving average classification accuracy.

In future, we plan to improve the performance of our framework so that it performs well not only on published databases but also in real world applications. The proposed framework can be improved by applying recently popularized deep learning algorithms. Recent advances in machine learning have shown that deep learning algorithms are better at learning concepts and discovering intricate structure in a given set of data.

One of the bottlenecks that hinders the application of deep learning algorithms is the availability of large databases. Deep learning has shown its supremacy over conventional machine learning algorithms when training is done on a significantly large database. For clinical applications, where obtaining data, especially neuroimaging data, is difficult, training deep learning algorithms poses a challenge. When required, this challenge is usually tackled by a technique called transfer learning. Usually, machine learning algorithms make predictions on data that is similar to what the algorithm was trained on; training and test data are drawn from the same distribution. Transfer learning, on the contrary, allows the distributions used in training and testing to be different Weiss et al. (2016). Thus, different pretrained deep learning models that have shown state-of-the-art results in detecting patterns in image data, such as GoogleNet Szegedy et al. (2017), ResNet Han et al. (2015) or Inception Szegedy et al. (2015), can be used for understanding neuroimaging data.

Another proposition that could enhance the recognition results of the proposed framework is to use a multimodal system. In addition to neuroimaging data, other modalities, e.g. EEG, speech or kinesthetic data, could be analyzed to achieve better recognition of ASD.

In order to bridge the gap between neuroscience and computer science researchers, we emphasize and encourage the scientific community to share databases and results for automatic identification of psychological ailments.

## References

• Sebe et al. (2005) N. Sebe, I. Cohen, A. Garg, T. S. Huang, Machine learning in computer vision, volume 29, Springer Science & Business Media, 2005.
• Kak and Slaney (1988) A. C. Kak, M. Slaney, Principles of computerized tomographic imaging, IEEE press New York, 1988.
• Zhang (2002) T. J. M. Q. Zhang, Current topics in computational molecular biology, MIT Press, 2002.
• Van den Branden Lambrecht (2013) C. J. Van den Branden Lambrecht, Vision models and applications to image and video processing, Springer Science & Business Media, 2013.
• Khan et al. (2013) R. A. Khan, A. Meyer, H. Konik, S. Bouakaz, Framework for reliable, real-time facial expression recognition for low resolution images, Pattern Recognition Letters 34 (2013) 1159–1168.
• Khan et al. (2019) R. A. Khan, A. Meyer, H. Konik, S. Bouakaz, Saliency-based framework for facial expression recognition, Frontiers of Computer Science 13 (2019) 183–198. URL: https://doi.org/10.1007/s11704-017-6114-9. doi:10.1007/s11704-017-6114-9.
• Khan et al. (2013) R. A. Khan, A. Meyer, H. Konik, S. Bouakaz, Pain detection through shape and appearance features, in: 2013 IEEE International Conference on Multimedia and Expo (ICME), 2013, pp. 1–6. doi:10.1109/ICME.2013.6607608.
• Akram et al. (2013) M. U. Akram, S. Khalid, S. A. Khan, Identification and classification of microaneurysms for early detection of diabetic retinopathy, Pattern Recogn. 46 (2013) 107–116. URL: http://dx.doi.org/10.1016/j.patcog.2012.07.002. doi:10.1016/j.patcog.2012.07.002.
• Jaliaawala and Khan (2019) M. S. Jaliaawala, R. A. Khan, Can autism be catered with artificial intelligence-assisted intervention technology? a comprehensive survey, Artificial Intelligence Review (2019) 1–32.
• Di Martino et al. (2014) A. Di Martino, C.-G. Yan, Q. Li, E. Denio, F. X. Castellanos, K. Alaerts, J. S. Anderson, M. Assaf, S. Y. Bookheimer, M. Dapretto, et al., The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism, Molecular psychiatry 19 (2014) 659.
• Choi (2017) H. Choi, Functional connectivity patterns of autism spectrum disorder identified by deep feature learning, connections 4 (2017) 5.
• Organization (1993) W. H. Organization, The ICD-10 classification of mental and behavioural disorders: diagnostic criteria for research, volume 2, World Health Organization, 1993.
• Castillo et al. (2007) R. Castillo, D. Carlat, T. Millon, C. Millon, S. Meagher, S. Grossman, R. Rowena, J. Morrison, A. P. Association, et al., Diagnostic and statistical manual of mental disorders, Washington, DC: American Psychiatric Association Press, 2007.
• Baron-Cohen et al. (2001) S. Baron-Cohen, S. Wheelwright, R. Skinner, J. Martin, E. Clubley, The autism-spectrum quotient (aq): Evidence from asperger syndrome/high-functioning autism, malesand females, scientists and mathematicians, Journal of autism and developmental disorders 31 (2001) 5–17.
• Klin et al. (2000) A. Klin, F. R. Volkmar, S. S. Sparrow, Asperger syndrome, Guilford Press New York, 2000.
• Barkley and Murphy (1998) R. A. Barkley, K. R. Murphy, Attention-deficit hyperactivity disorder: A clinical workbook, Guilford Press, 1998.
• Simonoff et al. (2008) E. Simonoff, A. Pickles, T. Charman, S. Chandler, T. Loucas, G. Baird, Psychiatric disorders in children with autism spectrum disorders: prevalence, comorbidity, and associated factors in a population-derived sample, Journal of the American Academy of Child & Adolescent Psychiatry 47 (2008) 921–929.
• Schnitzlein and Murtagh (1985) H. N. Schnitzlein, F. R. Murtagh, Imaging anatomy of the head and spine. a photographic color atlas of mri, ct, gross and microscopic anatomy in axial, coronal, and sagittal planes, Journal of Neurology, Neurosurgery and Psychiatry. (1985).
• Rice (2009) C. Rice, Prevalence of autism spectrum disorders-autism and developmental disabilities monitoring network, united states, 2006, Morbidity and Mortality Weekly Report (MMWR) - Surveillance Summary (2009).
• Buescher et al. (2014) A. V. Buescher, Z. Cidav, M. Knapp, D. S. Mandell, Costs of autism spectrum disorders in the united kingdom and the united states, JAMA pediatrics 168 (2014) 721–728.
• Horlin et al. (2014) C. Horlin, M. Falkmer, R. Parsons, M. Albrecht, T. Falkmer, The cost of autism spectrum disorders, PLoS One 9 (2014).
• Del Valle Rubido et al. (2018) M. Del Valle Rubido, J. T. McCracken, E. Hollander, F. Shic, J. Noeldeke, L. Boak, O. Khwaja, S. Sadikhov, P. Fontoura, D. Umbricht, In search of biomarkers for autism spectrum disorder, Autism Research 11 (2018) 1567–1579.
• Kushki et al. (2013) A. Kushki, E. Drumm, M. P. Mobarak, N. Tanel, A. Dupuis, T. Chau, E. Anagnostou, Investigating the autonomic nervous system response to anxiety in children with autism spectrum disorders, PLoS one 8 (2013) e59730.
• Bourgeron (2009) T. Bourgeron, A synaptic trek to autism, Current opinion in neurobiology 19 (2009) 231–234.
• Schipul et al. (2011) S. E. Schipul, T. A. Keller, M. A. Just, Inter-regional brain communication and its disturbance in autism, Frontiers in systems neuroscience 5 (2011) 10.
• Bullmore and Sporns (2009) E. Bullmore, O. Sporns, Complex brain networks: graph theoretical analysis of structural and functional systems, Nature Reviews Neuroscience 10 (2009) 186.
• Giedd (2004) J. N. Giedd, Structural magnetic resonance imaging of the adolescent brain, Annals of the New York Academy of Sciences 1021 (2004) 77–85.
• Haacke et al. (2009) E. Haacke, S. Mittal, Z. Wu, J. Neelavalli, Y.-C. Cheng, Susceptibility-weighted imaging: Technical aspects and clinical applications, part 1, American Journal of Neuroradiology 30 (2009) 19–30.
• Rutherford and Bydder (2002) M. A. Rutherford, G. M. Bydder, MRI of the Neonatal Brain, WB Saunders London, 2002.
• Budman et al. (1992) S. H. Budman, M. F. Hoyt, S. Friedman, The first session in brief therapy, Guilford Press, 1992.
• Huettel et al. (2004) S. A. Huettel, A. W. Song, G. McCarthy, et al., Functional magnetic resonance imaging, volume 1, Sinauer Associates Sunderland, MA, 2004.
• Plitt et al. (2015) M. Plitt, K. A. Barnes, A. Martin, Functional connectivity classification of autism identifies highly predictive brain features but falls short of biomarker standards, NeuroImage: Clinical 7 (2015) 359–366.
• Smith et al. (2009) S. M. Smith, P. T. Fox, K. L. Miller, D. C. Glahn, P. M. Fox, C. E. Mackay, N. Filippini, K. E. Watkins, R. Toro, A. R. Laird, et al., Correspondence of the brain’s functional architecture during activation and rest, Proceedings of the National Academy of Sciences 106 (2009) 13040–13045.
• Bishop (2006) C. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag New York, 2006.
• Chandrashekar and Sahin (2014) G. Chandrashekar, F. Sahin, A survey on feature selection methods, Computers & Electrical Engineering 40 (2014) 16 – 28.
• Haxby et al. (2001) J. V. Haxby, M. I. Gobbini, M. L. Furey, A. Ishai, J. L. Schouten, P. Pietrini, Distributed and overlapping representations of faces and objects in ventral temporal cortex, Science 293 (2001) 2425–2430.
• Buchweitz et al. (2012) A. Buchweitz, S. V. Shinkareva, R. A. Mason, T. M. Mitchell, M. A. Just, Identifying bilingual semantic neural representations across languages, Brain and language 120 (2012) 282–289.
• Bauer and Just (2015) A. J. Bauer, M. A. Just, Monitoring the growth of the neural representations of new animal concepts, Human brain mapping 36 (2015) 3213–3226.
• Kassam et al. (2013) K. S. Kassam, A. R. Markey, V. L. Cherkassky, G. Loewenstein, M. A. Just, Identifying emotions on the basis of neural activation, PloS one 8 (2013) e66032.
• Bellak (1994) L. Bellak, The schizophrenic syndrome and attention deficit disorder. Thesis, antithesis, and synthesis?, The American psychologist 49 (1994) 25–29. URL: https://doi.org/10.1037//0003-066X.49.1.25. doi:10.1037//0003-066x.49.1.25.
• Just et al. (2014) M. A. Just, V. L. Cherkassky, A. Buchweitz, T. A. Keller, T. M. Mitchell, Identifying autism from neural representations of social interactions: neurocognitive markers of autism, PloS one 9 (2014) e113879.
• Craddock et al. (2009) R. C. Craddock, P. E. Holtzheimer III, X. P. Hu, H. S. Mayberg, Disease state prediction from resting state functional connectivity, Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 62 (2009) 1619–1628.
• Greicius et al. (2007) M. Greicius, B. Flores, V. Menon, G. Glover, H. Solvason, H. Kenna, A. Reiss, A. Schatzberg, Resting-state functional connectivity in major depression: abnormally increased contributions from subgenual cingulate cortex and thalamus, Biological Psychiatry 62 (2007).
• Sabuncu et al. (2015) M. R. Sabuncu, E. Konukoglu, A. D. N. Initiative, et al., Clinical prediction from structural brain mri scans: a large-scale empirical study, Neuroinformatics 13 (2015) 31–46.
• Vapnik (2013) V. Vapnik, The nature of statistical learning theory, Springer science & business media, 2013.
• Konukoglu et al. (2012) E. Konukoglu, B. Glocker, D. Zikic, A. Criminisi, Neighbourhood approximation forests, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2012, pp. 75–82.
• Tipping (2001) M. E. Tipping, Sparse bayesian learning and the relevance vector machine, Journal of machine learning research 1 (2001) 211–244.
• Arbabshirani et al. (2017) M. R. Arbabshirani, S. Plis, J. Sui, V. D. Calhoun, Single subject prediction of brain disorders in neuroimaging: promises and pitfalls, NeuroImage 145 (2017) 137–165.
• Nielsen et al. (2013) J. A. Nielsen, B. A. Zielinski, P. T. Fletcher, A. L. Alexander, N. Lange, E. D. Bigler, J. E. Lainhart, J. S. Anderson, Multisite functional connectivity mri classification of autism: Abide results, Frontiers in human neuroscience 7 (2013) 599.
• Huettel et al. (2004) S. A. Huettel, A. W. Song, G. McCarthy, et al., Functional magnetic resonance imaging, volume 1, Sinauer Associates Sunderland, MA, 2004.
• Koyamada et al. (2015) S. Koyamada, Y. Shikauchi, K. Nakae, M. Koyama, S. Ishii, Deep learning of fmri big data: a novel approach to subject-transfer decoding, arXiv preprint arXiv:1502.00093 (2015).
• LeCun et al. (2015) Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
• Plis et al. (2014) S. M. Plis, D. R. Hjelm, R. Salakhutdinov, E. A. Allen, H. J. Bockholt, J. D. Long, H. J. Johnson, J. S. Paulsen, J. A. Turner, V. D. Calhoun, Deep learning for neuroimaging: a validation study, Frontiers in neuroscience 8 (2014) 229.
• Heinsfeld et al. (2018) A. S. Heinsfeld, A. R. Franco, R. C. Craddock, A. Buchweitz, F. Meneguzzi, Identification of autism spectrum disorder using deep learning and the abide dataset, NeuroImage: Clinical 17 (2018) 16–23.
• Vincent et al. (2008) P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the 25th International Conference on Machine Learning, ICML ’08, ACM, 2008, pp. 1096–1103. URL: http://doi.acm.org/10.1145/1390156.1390294. doi:10.1145/1390156.1390294.
• Khan et al. (2019) R. A. Khan, A. Crenn, A. Meyer, S. Bouakaz, A novel database of children’s spontaneous facial expressions (LIRIS-CSE), Image Vision Comput. (2019).
• Vincent et al. (2010) P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of machine learning research 11 (2010) 3371–3408.
• Zaidel and Iacoboni (2003) E. Zaidel, M. Iacoboni, The parallel brain: the cognitive neuroscience of the corpus callosum, MIT press, 2003.
• Hinkley et al. (2012) L. B. N. Hinkley, E. J. Marco, A. M. Findlay, S. Honma, R. J. Jeremy, Z. Strominger, P. Bukshpun, M. Wakahiro, W. S. Brown, L. K. Paul, A. J. Barkovich, P. Mukherjee, S. S. Nagarajan, E. H. Sherr, The role of corpus callosum development in functional connectivity and cognitive processing, PLOS ONE 7 (2012) 1–17.
• Tomasch (1954) J. Tomasch, Size, distribution, and number of fibres in the human Corpus Callosum, The anatomical records (1954).
• Hiess et al. (2015) R. K. Hiess, R. Alter, S. Sojoudi, B. Ardekani, R. Kuzniecky, H. Pardoe, Corpus callosum area and brain volume in autism spectrum disorder: quantitative analysis of structural mri from the abide database, Journal of autism and developmental disorders 45 (2015) 3107–3114.
• Nordenskjöld et al. (2013) R. Nordenskjöld, F. Malmberg, E.-M. Larsson, A. Simmons, S. J. Brooks, L. Lind, H. Ahlström, L. Johansson, J. Kullberg, Intracranial volume estimated with commonly used methods could introduce bias in studies including brain volume measurements, NeuroImage 83 (2013) 355 – 360.
• Waiter et al. (2005) G. D. Waiter, J. H. Williams, A. D. Murray, A. Gilchrist, D. I. Perrett, A. Whiten, Structural white matter deficits in high-functioning individuals with autistic spectrum disorder: a voxel-based investigation, Neuroimage 24 (2005) 455–461.
• Chung et al. (2004) M. K. Chung, K. M. Dalton, A. L. Alexander, R. J. Davidson, Less white matter concentration in autism: 2d voxel-based morphometry, Neuroimage 23 (2004) 242–251.
• Witelson (1989) S. F. Witelson, Hand and sex differences in the isthmus and genu of the human corpus callosum: a postmortem morphological study, Brain 112 (1989) 799–835.
• Venkatasubramanian et al. (2007) G. Venkatasubramanian, G. Anthony, U. S. Reddy, V. V. Reddy, P. N. Jayakumar, V. Benegal, Corpus callosum abnormalities associated with greater externalizing behaviors in subjects at high risk for alcohol dependence, Psychiatry Research: Neuroimaging 156 (2007) 209 – 215.
• Act (1996) A. Act, Health insurance portability and accountability act of 1996, Public law 104 (1996) 191.
• Ardekani (2013) B. Ardekani, Yuki module of the automatic registration toolbox (art) for corpus callosum segmentation, Google Scholar (2013).
• Yushkevich et al. (2006) P. A. Yushkevich, J. Piven, H. C. Hazlett, R. G. Smith, S. Ho, J. C. Gee, G. Gerig, User-guided 3d active contour segmentation of anatomical structures: significantly improved efficiency and reliability, Neuroimage 31 (2006) 1116–1128.
• Malone et al. (2015) I. B. Malone, K. K. Leung, S. Clegg, J. Barnes, J. L. Whitwell, J. Ashburner, N. C. Fox, G. R. Ridgway, Accurate automatic estimation of total intracranial volume: a nuisance variable with less nuisance, Neuroimage 104 (2015) 366–372.
• Manjón and Coupé (2016) J. V. Manjón, P. Coupé, volbrain: an online mri brain volumetry system, Frontiers in neuroinformatics 10 (2016) 30.
• Khan (2013) R. A. Khan, Detection of emotions from video in non-controlled environment, Ph.D. thesis, LIRIS, Universite Claude Bernard Lyon1, France, 2013.
• Quinlan (1986) J. R. Quinlan, Induction of decision trees, Machine learning 1 (1986) 81–106.
• Yu and Liu (2003) L. Yu, H. Liu, Feature selection for high-dimensional data: A fast correlation-based filter solution, in: Proceedings of the 20th international conference on machine learning (ICML-03), 2003, pp. 856–863.
• Kononenko and Hong (1997) I. Kononenko, S. J. Hong, Attribute selection for modelling, Future Generation Computer Systems 13 (1997) 181–195.
• Doshi (2014) M. Doshi, Correlation based feature selection (cfs) technique to predict student perfromance, International Journal of Computer Networks & Communications 6 (2014) 197.
• Brown et al. (2012) G. Brown, A. Pocock, M.-J. Zhao, M. Luján, Conditional likelihood maximisation: a unifying framework for information theoretic feature selection, Journal of machine learning research 13 (2012) 27–66.
• Peng et al. (2005) H. Peng, F. Long, C. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis & Machine Intelligence (2005) 1226–1238.
• Jain and Huang (2004) A. Jain, J. Huang, Integrating independent components and linear discriminant analysis for gender classification, in: Automatic Face and Gesture Recognition, 2004. Proceedings. Sixth IEEE International Conference on, IEEE, 2004, pp. 159–163.
• Mitchell (1997) T. M. Mitchell, Machine Learning, McGraw-Hill Series in Computer Science, 1997.
• Jain et al. (1996) A. K. Jain, J. Mao, K. M. Mohiuddin, Artificial neural networks: A tutorial, Computer 29 (1996) 31–44.
• Gardner and Dorling (1998) M. W. Gardner, S. Dorling, Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric sciences, Atmospheric environment 32 (1998) 2627–2636.
• Hecht-Nielsen (1992) R. Hecht-Nielsen, Theory of the backpropagation neural network, in: Neural networks for perception, Elsevier, 1992, pp. 65–93.
• Acuna and Rodriguez (2004) E. Acuna, C. Rodriguez, The treatment of missing values and its effect on classifier accuracy, in: Classification, clustering, and data mining applications, Springer, 2004, pp. 639–647.
• Kohavi et al. (1995) R. Kohavi, et al., A study of cross-validation and bootstrap for accuracy estimation and model selection, in: International joint conference on Artificial intelligence, volume 14, Montreal, Canada, 1995, pp. 1137–1145.
• Weiss et al. (2016) K. Weiss, T. M. Khoshgoftaar, D. Wang, A survey of transfer learning, Journal of Big Data 3 (2016).
• Szegedy et al. (2017) C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 4278–4284.
• Han et al. (2015) S. Han, J. Pool, J. Tran, W. Dally, Learning both weights and connections for efficient neural network, in: Advances in neural information processing systems, 2015, pp. 1135–1143.
• Szegedy et al. (2015) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.