Anatomical Pattern Analysis for decoding visual stimuli in human brains

Abstract

Background: A fundamental open question in neuroscience and machine learning is whether computers can decode the patterns of the human brain. Multi-Voxel Pattern Analysis (MVPA) is a critical tool for addressing this question. However, previous MVPA methods face two challenges: reducing sparsity and noise in the extracted features, and improving the performance of prediction.

Methods: To overcome these challenges, this paper proposes Anatomical Pattern Analysis (APA) for decoding visual stimuli in the human brain. The framework develops a novel anatomical feature extraction method and a new imbalanced AdaBoost algorithm for binary classification. Further, it utilizes an Error-Correcting Output Codes (ECOC) method for multiclass prediction. APA can automatically detect active regions for each category of visual stimuli. Moreover, it enables us to combine homogeneous datasets for applying advanced classification.

Results and Conclusions: Experimental studies on 4 visual categories (words, consonants, objects and scrambled photos) demonstrate that the proposed approach achieves superior performance to state-of-the-art methods.

1 Introduction

In order to decode visual stimuli in the human brain, the Multi-Voxel Pattern Analysis (MVPA) technique [31] applies machine learning methods to task-based functional Magnetic Resonance Imaging (fMRI) datasets. Indeed, analyzing the patterns of visual objects is one of the most interesting topics in MVPA, which can enable us to understand how the brain stores and processes visual stimuli [13]. Technically, there are two challenges in the previous studies. As the first issue, trained features are sparse and noisy because most of the previous whole-brain studies directly utilized raw voxels for predicting the stimuli [13]. As the second challenge, improving the prediction performance is difficult because task-based fMRI datasets lead to imbalanced classification problems. For instance, consider a dataset with 10 equally sized categories. Since this dataset is imbalanced for (one-versus-all) binary classification, most classical algorithms cannot provide acceptable performance [13].

As the main contribution, this paper proposes Anatomical Pattern Analysis (APA) for decoding visual stimuli in the human brain. To generate a normalized view, APA automatically detects the active regions and then extracts features based on the anatomical structures of the brain. Indeed, the normalized view enables us to combine homogeneous datasets, and it can decrease noise, sparsity, and the error of learning. Further, this paper develops a modified version of the imbalanced Adaptive Boosting (AdaBoost) algorithm for binary classification. This algorithm uses supervised random sampling and penalty values, which are calculated from the correlation between different classes, for improving the prediction performance. This binary classifier is then used in a one-versus-all ECOC method as a multiclass approach for classifying the categories of the brain response.

The rest of this paper is organized as follows: related works are presented in Section 2. The proposed method is introduced in Section 3. Experimental results are reported in Section 4. Finally, Section 5 presents conclusions and points out some future work.

2 Related Works

There are three different types of studies for decoding stimuli in the human brain. Pioneer studies focused on recognizing specialized regions of the human brain, such as regions for inanimate objects [26], faces [20], visual word forms [6], body parts [24], and visual objects [14]. Although they proved that different stimuli can provide distinctive responses in brain regions, they could not find the deterministic locations (or patterns) related to each category of stimuli.

The next group of studies developed correlation techniques in order to understand the similarity (or difference) between distinctive stimuli. Haxby et al. employed brain patterns located in the Fusiform Face Area (FFA) and Parahippocampal Place Area (PPA) in order to analyze correlations between different categories of visual stimuli, i.e. gray-scale images of faces, houses, cats, bottles, scissors, shoes, chairs, and scrambled (nonsense) photos [14]. Kamitani and Tong studied the correlations of low-level visual features in the visual cortex (V1–V4) [19]. In similar studies, Haynes et al. analyzed distinctive mental states [16] and more abstract brain patterns such as intentions [17]. Kriegeskorte et al. proposed Representational Similarity Analysis (RSA) in order to evaluate the similarities (or differences) among distinctive brain states [22]. Connolly et al. utilized RSA in order to compare the correlations between human brains and monkey brains [7]. RSA demonstrates that the representations of each category of stimuli in distinctive brain regions have a different structure [22]. Rice et al. showed that brain responses not only differ across the categories of the stimuli but are also correlated with different properties of the stimuli. They extracted the properties of visual stimuli (photos of objects) and calculated the correlations between these properties and the brain responses. They separately reported the correlation matrices for different human faces and different objects (houses, chairs, etc.) [34].

The last group of studies proposed MVPA techniques for predicting the category of visual stimuli. Cox et al. utilized linear and non-linear versions of the Support Vector Machine (SVM) algorithm [9]. In order to decode the brain patterns, some studies [2] employed classical feature selection (ranking) techniques, such as Principal Component Analysis (PCA) [2], Linear Discriminant Analysis (LDA) [33], or Independent Component Analysis (ICA) [37]; these methods are mostly used for analyzing resting-state fMRI datasets. Recent studies showed that these techniques not only fail to provide stable performance on task-based fMRI datasets [4] but also suffer from spatial locality issues, especially when they are used for whole-brain functional analysis [5]. Norman et al. argued for using SVM and Gaussian Naive Bayes classifiers [31]. Kay et al. studied how orientation, position, and object category can be decoded from brain activity in the visual cortex [21]. Mitchell et al. introduced a new method in order to predict the brain activities associated with the meanings of nouns [28]. Miyawaki et al. utilized a combination of multiscale local image decoders in order to reconstruct visual images from brain activities [29]. In order to generalize the testing procedure for task-based fMRI datasets, Kriegeskorte et al. argued that the test data must play no role in the procedure of generating an MVPA model [23].

There are also some studies that focused on sparse learning techniques. Yamashita et al. developed Sparse Logistic Regression (SLR) in order to improve the performance of classification models [38]. Carroll et al. employed the Elastic Net for prediction and interpretation of distributed neural activity with sparse models [3]. Varoquaux et al. proposed small-sample brain mapping using sparse recovery on spatially correlated designs with randomization and clustering. Their method is applied to small sets of brain patterns for distinguishing different categories based on a one-versus-one strategy [35].

Among the first modern approaches for decoding visual stimuli, Anderson and Oates applied a non-linear Artificial Neural Network (ANN) to brain responses [1]. McMenamin et al. studied the subsystems underlying Abstract-Category (AC) recognition and priming of objects (e.g., cat, piano) and Specific-Exemplar (SE) recognition and priming of objects (e.g., a calico cat, a different calico cat, a grand piano, etc.). Technically, they applied SVM to manually selected ROIs in the human brain for generating the visual stimuli predictors [27]. Mohr et al. compared four different classification methods, i.e. L1/L2 regularized SVM, the Elastic Net, and the Graph Net, for predicting different responses in the human brain. They showed that L1-regularization can improve classification performance while simultaneously providing highly specific and interpretable discriminative activation patterns [30]. Osher et al. proposed a network (graph) based approach that uses anatomical regions of the human brain for representing and classifying the different visual stimuli responses (faces, objects, bodies, scenes) [32].

Anatomical Pattern Analysis (APA) framework

3 The Proposed Method

Blood Oxygen Level Dependent (BOLD) signals are used in fMRI techniques for representing neural activity. As the hyperalignment problem in brain decoding shows [13], the values of the BOLD signals for the same experiment are usually different across subjects. Therefore, MVPA techniques use the correlation between different voxels as the pattern of the brain response [32]. As depicted in Figure ?, each fMRI experiment includes a set of sessions (time series of images), which can be captured from different subjects or by repeating the imaging procedure with a single subject. Technically, each session can be partitioned into a set of visual stimuli categories. Indeed, an independent category denotes a set of homogeneous conditions, which are generated by using the same type of photos as the visual stimuli. For instance, if a subject watches 6 photos of cats and 5 photos of houses during a single session, this session includes 2 different categories and 11 conditions.

3.1 Feature Extraction

Preprocessed fMRI time series collected for S sessions can be represented, for each session, by a matrix of size T × V, where T is the number of time points and V denotes the number of voxels in the original space; each entry defines the functional activity of that session at a given time point and voxel. Indeed, the images (tensors) in the fMRI time series are vectorized for simplicity [19]. In addition, the onsets (time points) of each session are grouped by condition: for each category of visual stimuli and each of its conditions, the onset vector lists the time points at which that condition was presented, and the number of such time points is determined by the experimental design.

By considering these onsets, all time points belonging to one condition can be collected from the session's time series. This paper then employs the maximum functional activity of each voxel over the time points of a condition as the response of that condition.
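As a rough illustration of this step (not the authors' code), the following Python sketch collapses the time points of one condition into a single response vector by taking the per-voxel maximum; the array names, shapes, and toy onsets are assumptions for illustration only.

```python
import numpy as np

def condition_response(bold, onsets):
    """bold: (T, V) preprocessed time series; onsets: time-point indices of
    one condition. Returns the (V,) per-voxel maximum over those points."""
    return bold[onsets, :].max(axis=0)

# toy usage: 100 time points, 2000 voxels, one condition covering 4 scans
bold = np.random.randn(100, 2000)
response = condition_response(bold, onsets=[12, 13, 14, 15])
print(response.shape)  # (2000,)
```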

In order to extract active voxels and then automatically define Regions of Interest (ROIs), the session's responses can also be written as a general linear model, Y = Dβ + ε, where D denotes the design matrix, ε is the noise (error of estimation), and β denotes the set of voxel correlations for the session; each category of stimuli contributes one design vector (a column of D) and one vector of voxel correlations (a row of β). The design matrix is calculated by convolving the onsets of each category with the canonical Hemodynamic Response Function (HRF) signal [12]. In order to solve this model, this paper uses the Generalized Least Squares (GLS) approach [12], β̂ = (DᵀΣ⁻¹D)⁻¹DᵀΣ⁻¹Y, where Σ is the covariance matrix of the noise [13]. Further, activated voxels are defined by keeping only the positive values of the estimated correlation matrix β̂: the non-zero elements of this thresholded matrix are the activated voxels belonging to each category of stimuli. These activated voxel correlations are then applied to the condition responses through a Hadamard (element-wise) product, where each condition is weighted by the correlations of the category to which it belongs.
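A minimal sketch of this step, under simplifying assumptions (a toy design matrix, an identity noise covariance, and illustrative shapes), is shown below; it is not the paper's exact formulation, only the standard GLS estimate followed by positive thresholding and element-wise weighting.

```python
import numpy as np

def gls_beta(Y, D, sigma):
    """Y: (T, V) responses, D: (T, P) design matrix, sigma: (T, T) noise
    covariance. Returns the (P, V) GLS estimate of the voxel correlations."""
    si = np.linalg.inv(sigma)
    return np.linalg.solve(D.T @ si @ D, D.T @ si @ Y)

def active_voxel_weights(beta_row):
    """Keep only positive values; zeros mark inactive voxels."""
    return np.where(beta_row > 0, beta_row, 0.0)

# toy usage: weight one condition response by the active-voxel pattern of its
# category via an element-wise (Hadamard) product, as described in the text
T, V, P = 100, 2000, 3
Y, D = np.random.randn(T, V), np.random.randn(T, P)
beta = gls_beta(Y, D, sigma=np.eye(T))
weighted_condition = active_voxel_weights(beta[0]) * Y[12]
```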

Since mapping the whole fMRI time series to the standard space decreases the performance of the final results, most previous studies use the original images instead of the standard version. Because the proposed method only needs to map the weighted response of each condition, rather than the whole time series, it can register brain activities to a standard space. This mapping provides a normalized view for combining homogeneous datasets. For registering to the standard space, this paper utilizes the FMRIB Linear Image Registration Tool (FLIRT) algorithm [18], which seeks the transformation matrix that minimizes a cost function based on the Normalized Mutual Information (NMI) between the reference (standard) image and the transformed condition image. The choice of this registration objective will be analyzed in Section 4. The final mapping transforms each condition response from the original space to the standard space, whose number of voxels is determined by the reference image.

In order to reduce the sparsity of the mapped responses, this paper employs an anatomical atlas that partitions the standard space into a set of regions, where each region is defined by the set of indexes of its voxels. The extracted feature for a given session, condition, and anatomical region is then calculated from the mapped voxel values of that region. Finally, the extracted features for each condition form a vector with one entry per atlas region, and the whole dataset can be defined as a matrix in which each column denotes the extracted features for an individual stimulus.
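A compact sketch of this last step is given below, under the assumption that each region's feature is a simple average of its mapped voxel values (the exact pooling is defined by the paper's equations); the atlas labels and sizes are illustrative.

```python
import numpy as np

def anatomical_features(condition_std, atlas_labels):
    """condition_std: (V_std,) condition mapped to the standard space;
    atlas_labels: (V_std,) integer region index per voxel (0 = outside atlas).
    Returns one pooled value per atlas region."""
    regions = np.unique(atlas_labels[atlas_labels > 0])
    return np.array([condition_std[atlas_labels == r].mean() for r in regions])

# toy usage: 5000 standard-space voxels grouped into 90 hypothetical regions
condition_std = np.random.randn(5000)
atlas_labels = np.random.randint(0, 91, size=5000)
features = anatomical_features(condition_std, atlas_labels)
print(features.shape)  # one feature per region present in the atlas
```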

The proposed AdaBoost algorithm for applying a robust binary classification

3.2 Binary Classification Algorithm

In previous sections, we mentioned the imbalance issue in MVPA analysis. In practice, there are two approaches to deal with this issue: designing an imbalanced classifier, or converting the imbalanced problem into an ensemble of balanced classification models. Previous studies demonstrated that the performance of imbalanced classifiers may not be stable, especially when the datasets contain sparsity and noise [25]. Since fMRI datasets mostly include noise and sparsity, this paper chooses the ensemble approach. Technically, ensemble learning also contains two groups of solutions, i.e. bagging and boosting. While bagging generates all classifiers at the same time and then combines them into the final model, boosting gradually creates each classifier in order to improve the performance at each iteration by tracing the errors of previous iterations. We note that ensemble learning can be used for both balanced and imbalanced problems; the main difference lies in the sampling strategy. In balanced problems, sampling methods are applied to the whole dataset, whereas in imbalanced problems instances of the large class are sampled [25]. As depicted in Figure ?, this paper presents a new branch of the AdaBoost algorithm in order to significantly improve the performance of the final model in fMRI analysis. In a nutshell, this algorithm first converts an imbalanced MVPA problem into a set of balanced problems. Then, it iteratively applies a decision tree [25] to each of these balanced problems. Finally, AdaBoost is used to generate the final model. In the proposed method, the weight of each classifier (tree) in the final combination is generated based on the errors (failed predictions) of the previous iterations, gradually improving the performance of the final model.

In order to apply the binary classification, this paper randomly partitions the extracted features into a training set and a testing set. As a new branch of the AdaBoost algorithm, Algorithm 1 employs the training set for learning the binary classifier, and the testing set is then utilized for estimating the performance of the final model. As mentioned before, binary classification for fMRI analysis is mostly imbalanced, especially under a one-versus-all strategy. Consequently, the number of samples in one of the binary classes is smaller than in the other class. As previously mentioned, this paper exploits this property in order to solve the imbalance issue. Indeed, Algorithm 1 first partitions the training data into a small class and a large class based on the class labels, where all labels are negative except those of the instances belonging to the current category of visual stimuli. Then, it calculates the scale between the sizes of the two classes by using the floor function; this scale determines the number of balanced subsets generated from the imbalanced dataset and, consequently, the number of ensemble iterations. As the next step, the large class is randomly partitioned into that many parts. In each balanced subset, the training data is generated from all instances of the small class, one of the partitioned parts of the large class, and the instances of the previous iteration that could not be correctly trained (the failed predictions). After that, training weights for the final combination are calculated by using the Pearson correlation (corr) between training instances, where larger values increase the learning sensitivity. Indeed, these weights are always maximized for the instances of the small class and the failed instances of previous iterations, while the weights of the other instances are a scale of the correlation between the large class and the small class. Therefore, these weights are updated in each iteration based on the performance of previous iterations. As the last step of each iteration, the proposed method generates a classification model and its weight for the final combination. While any kind of weighted classification algorithm could be used here, this paper employs a simple weighted decision tree [25] as the classification model. At the end, the final model is created by applying the AdaBoost method to the generated balanced classifiers.
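The following Python sketch captures the core sampling-and-boosting loop described above. It is a simplification, not the exact Algorithm 1: the Pearson-correlation instance weights are omitted, the classifier weights use a standard AdaBoost-style formula, and the decision-tree settings are arbitrary assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def imbalance_ensemble(X, y, seed=0):
    """Split the large class into balanced parts, train one tree per part
    (re-using previously misclassified instances), and return the trees with
    AdaBoost-style combination weights."""
    rng = np.random.default_rng(seed)
    small, large = (1, 0) if (y == 1).sum() < (y == 0).sum() else (0, 1)
    idx_s, idx_l = np.where(y == small)[0], np.where(y == large)[0]
    rng.shuffle(idx_l)
    k = max(1, len(idx_l) // len(idx_s))          # number of balanced subsets
    models, weights, failed = [], [], np.array([], dtype=int)
    for part in np.array_split(idx_l, k):
        idx = np.concatenate([idx_s, part, failed])
        clf = DecisionTreeClassifier(max_depth=5).fit(X[idx], y[idx])
        pred = clf.predict(X[idx])
        failed = idx[pred != y[idx]]              # trace errors to the next round
        acc = (pred == y[idx]).mean()
        models.append(clf)
        weights.append(np.log(max(acc, 1e-6) / max(1.0 - acc, 1e-6)))
    return models, np.array(weights), small, large

def ensemble_predict(models, weights, small, large, X):
    """Weighted vote of the per-subset trees."""
    votes = sum(w * np.where(m.predict(X) == small, 1.0, -1.0)
                for m, w in zip(models, weights))
    return np.where(votes > 0, small, large)
```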

3.3 Multiclass Classification Algorithm

In this paper, a multiclass classifier is a prediction model that maps the extracted features to the category of visual stimuli. Generally, there are two techniques for applying multiclass classification. The first approach directly creates the classification model, such as a multiclass support vector machine [9] or a neural network [31]. In contrast, a decomposition (indirect) design uses an array of binary classifiers for solving the multiclass problem.

The proposed Error-Correcting Output Codes (ECOC) approach for multiclass classification

Table 1: Datasets
Title | ID | S | U | P | T | X | Y | Z | Scanner | TR | TE
Visual Object Recognition | DS105 | 6 | 71 | 8 | 121 | 79 | 79 | 75 | G 3T | 2500 | 30
Word and Object Processing | DS107 | 49 | 98 | 4 | 164 | 53 | 63 | 52 | S 3T | 2000 | 28
Multi-subject, multi-modal | DS117 | 20 | 171 | 2 | 210 | 64 | 61 | 33 | S 3T | 2000 | 30

S denotes the number of subjects; U is the number of sessions; P denotes the number of stimulus categories; T is the number of scans in units of TRs (Time of Repetition); X, Y, Z are the sizes of the 3D images in the original space; Scanner is Siemens (S) or General Electric (G) at 3 Tesla; TR is the Time of Repetition in milliseconds; TE denotes the Echo Time in milliseconds. Please see openfmri.org for more information.

Based on the previous discussion of the imbalance issue in fMRI datasets, this paper utilizes Error-Correcting Output Codes (ECOC) as an indirect multiclass approach in order to extend the proposed binary classifier to multiclass prediction. As depicted in Figure ?, ECOC includes three components, i.e. base algorithms, a coding matrix, and a decoding procedure [11]. Since this paper uses a one-versus-all encoding strategy, Algorithm 1 is employed as the base algorithm in the ECOC, where it generates a binary classifier for each category of visual stimuli. In other words, each independent category of visual stimuli is compared with the rest of the categories. Consequently, the coding matrix is square with one row and one column per category, where each diagonal cell of this matrix represents the positive predictions belonging to one category of visual stimuli and the rest of the cells in that row determine the other categories. Indeed, the number of classifiers in this strategy is exactly equal to the number of categories. In the decoding stage, the binary predictions, which are generated by applying the brain response to the base algorithms, are assigned to the category whose row in the coding matrix has the closest Hamming distance.

In order to present an example of the ECOC procedure, consider an fMRI dataset with 4 categories of visual stimuli, i.e. photos of shoes, houses, bottles, and human faces. In this problem, 4 different binary classifiers must be trained in order to distinguish each category of visual stimuli versus the rest of them (one-versus-all strategy). A 4 × 4 coding matrix is also generated, where each diagonal element represents the positive class of the corresponding classifier. By considering the order of the coding matrix, each prediction is assigned to the row with the closest Hamming distance. In other words, if only the first classifier generates a positive prediction for a testing instance, then this instance belongs to the first category of visual stimuli; similarly, a positive prediction from only the second classifier means the instance belongs to the second category, and so on.
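The one-versus-all coding and Hamming decoding can be sketched as follows; `train_binary` is a hypothetical placeholder standing in for the proposed binary classifier, and the identity coding matrix mirrors the description above.

```python
import numpy as np

def ecoc_fit(X, y, classes, train_binary):
    """Train one one-versus-all binary model per category."""
    return [train_binary(X, (y == c).astype(int)) for c in classes]

def ecoc_predict(models, classes, X):
    """Assign each instance to the code word with the smallest Hamming
    distance to its vector of binary predictions."""
    codes = np.eye(len(classes))                          # one-vs-all coding matrix
    preds = np.column_stack([m.predict(X) for m in models])
    dists = np.abs(preds[:, None, :] - codes[None, :, :]).sum(axis=2)
    return np.array(classes)[dists.argmin(axis=1)]
```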

4 Results

4.1 Datasets

As depicted in Table 1, this paper employs three datasets, shared by openfmri.org, for the empirical studies. As the first dataset, 'Visual Object Recognition' (DS105) includes 8 categories of visual stimuli, i.e. gray-scale images of faces, houses, cats, bottles, scissors, shoes, chairs, and scrambled (nonsense) photos. This dataset is analyzed at the level of high-level visual stimuli as a binary predictor, by considering all categories except scrambled photos as objects, and at the level of low-level visual stimuli in the multiclass prediction. Please see [13] for more information. As the second dataset, 'Word and Object Processing' (DS107) contains 4 categories of visual stimuli, i.e. words, objects, scrambles, and consonants. Please see [10] for more information. As the last dataset, 'Multi-subject, multi-modal human neuroimaging dataset' (DS117) includes MEG and fMRI images. This paper only uses the fMRI images of this dataset. It contains 2 categories of visual stimuli, i.e. human faces and scrambles. Please see [36] for more information.

These datasets are preprocessed by SPM 12 (www.fil.ion.ucl.ac.uk/spm/), i.e. slice timing, realignment, normalization, and smoothing. Further, whole-brain functional alignment is applied based on [5]. Then, the beta values are calculated for each session. This paper employs the MNI152 T1 1mm template (see Figure ?.d) as the reference image for registering the extracted conditions to the standard space. In addition, this paper uses the Talairach Atlas (which contains a set of anatomically defined regions) for extracting features.

Extracted features based on different stimuli, i.e. (A) word, (B) object, and (C) scramble. (D) The effect of different objective functions in (4) on the error of registration.

4.2 Parameter Analysis

The registration objective function is analyzed in this section by using different distance metrics, i.e. the Woods function (W), Correlation Ratio (CR), Joint Entropy (JE), Mutual Information (MI), and Normalized Mutual Information (NMI) [41]. Figures ?.a-c demonstrate examples of brain responses to different stimuli, i.e. (a) word, (b) object, and (c) scramble. Here, the gray parts show the anatomical atlas, the colored parts (red, yellow, and green) show the functional activities, and the red rectangles illustrate the error areas after registration. Indeed, these errors can be formulated as the non-zero areas of the brain image that are located in the zero area of the anatomical atlas (the area without a region number). The registration errors are mostly related to the distance metric. Previous studies illustrated that entropy-based metrics can provide better performance [41]. The performance of the registration objective on the DS105, DS107, and DS117 datasets is analyzed in Figure ?.d by using the mentioned distance metrics. As depicted in this figure, the entropy-based metrics (JE, MI, NMI) provide better performance in comparison with the other metrics. Since NMI uses normalization for removing the scaling effect [41], it generates the best results among the entropy-based metrics. Therefore, this paper employs NMI as the distance metric for mapping the brain activities from the original space to the standard space.
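For readers who want to reproduce the metric itself, the following sketch computes NMI between two intensity-discretized images with scikit-learn; it only illustrates the objective, not FLIRT's internal optimization, and the binning choice is an arbitrary assumption.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def image_nmi(img_a, img_b, bins=64):
    """NMI between two images after discretizing their intensities."""
    qa = np.digitize(img_a.ravel(), np.histogram_bin_edges(img_a, bins))
    qb = np.digitize(img_b.ravel(), np.histogram_bin_edges(img_b, bins))
    return normalized_mutual_info_score(qa, qb)

# toy check: a slightly noisy copy of an image scores a much higher NMI than
# an unrelated image, which is the behaviour the registration objective exploits
ref = np.random.rand(32, 32, 32)
print(image_nmi(ref, ref + 0.05 * np.random.rand(*ref.shape)))
print(image_nmi(ref, np.random.rand(32, 32, 32)))
```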

The correlation matrices: (A) raw voxels and (B) extracted features in the DS105 dataset, (C) raw voxels and (D) extracted features in the DS107 dataset, (E) raw voxels and (F) extracted features in the DS117 dataset.

4.3 Correlation Analysis

The correlations of the extracted features are compared with the correlations of the original voxels in this section. Previous studies illustrated that the patterns of different Abstract-Categories (ACs), when extracted from a suitable feature representation, must provide distinctive correlation values [27]. Therefore, the main assumption in this section is that a better feature representation (extraction) can improve the correlation analysis, where the correlation between different categories of visual stimuli must be significantly smaller than the correlation between stimuli belonging to the same category. In order to provide a better perspective, the extracted features are compared at two different levels. At the first level, the feature space is compared with the whole set of raw voxels in the original space, where this comparison analyzes the correlation between whole-brain data and the automatically detected ROIs. At the second level, the correlation values among different ACs are compared in the feature space, which shows how well the feature space is designed.

Figure ?.A, C, and E illustrate the correlation matrices of DS105, DS107, and DS117 in the raw voxel space, respectively. Furthermore, Figure ?.B, D, and F respectively show the correlation matrices of DS105, DS107, and DS117 in the feature space. As these figures depict, different ACs are highly correlated in the voxel space. Indeed, the average of the correlations is high in DS105 and DS107; DS117 behaves differently because this dataset only includes 2 classes (photos of scramble and human face). The main reason for these results is that brain responses in the voxel space are sparse, high-dimensional, and noisy. Therefore, it is very hard to discriminate between different categories (ACs) in the original space, especially when whole-brain data is analyzed. By contrast, the feature space (Figure ?.B, D, and F) provides a distinctive representation because the proposed method uses the correlated patterns in each anatomical region as the extracted features.
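The category-level correlation matrices in these figures can be reproduced, in outline, by averaging the patterns of each category and correlating the averages; the sketch below assumes illustrative feature matrices and label vectors rather than the actual datasets.

```python
import numpy as np

def category_correlation_matrix(X, labels):
    """X: (n_stimuli, n_features) patterns; labels: (n_stimuli,) categories.
    Returns the correlation matrix of the per-category mean patterns."""
    cats = np.unique(labels)
    means = np.vstack([X[labels == c].mean(axis=0) for c in cats])
    return np.corrcoef(means), cats

# toy usage with 3 hypothetical categories of 20 stimuli each
X = np.random.randn(60, 116)
labels = np.repeat([0, 1, 2], 20)
corr, cats = category_correlation_matrix(X, labels)
print(np.round(corr, 2))
```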

Table 2: Accuracy of binary predictors (mean±std)
Alg., Datasets | DS105 (Objects) | DS107 (Words) | DS107 (Consonants) | DS107 (Objects) | DS107 (Scramble) | DS117
SVM | 71.65±0.97 | 69.89±1.02 | 67.84±0.82 | 65.32±1.67 | 67.96±0.87 | 81.25±1.03
Elastic Net | 80.77±0.61 | 78.26±0.79 | 75.53±9.87 | 84.15±0.89 | 87.34±0.93 | 86.49±0.70
Graph Net | 79.23±0.74 | 79.91±0.91 | 74.01±0.84 | 85.96±0.76 | 86.21±0.51 | 85.49±0.88
PCA | 72.15±0.76 | 70.32±0.92 | 69.57±1.10 | 68.78±0.64 | 69.41±0.35 | 81.92±0.87
ICA | 73.25±0.81 | 70.82±0.67 | 71.87±0.94 | 67.99±0.75 | 72.48±0.89 | 80.71±1.16
Selected ROI | 83.06±0.36 | 89.62±0.52 | 87.82±0.37 | 84.22±0.44 | 86.19±0.26 | 85.19±0.56
L1 Reg. SVM | 85.29±0.49 | 81.14±0.91 | 79.69±0.69 | 75.32±0.41 | 78.45±0.62 | 85.46±0.29
Graph-based | 90.82±1.23 | 94.21±0.83 | 95.54±0.99 | 95.62±0.83 | 93.10±0.78 | 86.61±0.61
PCA + Algorithm 1 | 83.61±0.97 | 80.12±0.81 | 79.47±0.91 | 82.80±1.01 | 80.52±0.98 | 86.27±0.88
ICA + Algorithm 1 | 84.41±0.93 | 82.21±0.86 | 78.88±0.78 | 82.30±0.99 | 83.99±0.84 | 85.57±1.10
APA + SVM | 76.32±0.78 | 77.19±0.83 | 78.61±0.91 | 69.22±0.87 | 73.52±0.99 | 89.90±0.72
Binary APA | 98.97±0.12 | 98.17±0.36 | 98.72±0.16 | 95.26±0.92 | 97.23±0.76 | 96.81±0.79
Table 3: Area under the ROC Curve (AUC) of binary predictors (mean±std)
Alg., Datasets | DS105 (Objects) | DS107 (Words) | DS107 (Consonants) | DS107 (Objects) | DS107 (Scramble) | DS117
SVM | 68.37±1.01 | 67.76±0.91 | 63.84±1.45 | 63.17±0.59 | 66.73±0.92 | 79.36±0.33
Elastic Net | 78.23±0.82 | 77.94±0.76 | 74.11±0.82 | 81.06±0.98 | 85.54±0.81 | 83.42±0.68
Graph Net | 77.26±0.72 | 78.31±0.97 | 71.43±0.58 | 82.08±0.92 | 83.97±0.97 | 81.67±0.74
PCA | 70.69±0.84 | 69.37±0.77 | 65.12±0.93 | 67.56±0.59 | 68.89±0.90 | 79.61±0.72
ICA | 71.33±0.85 | 68.86±0.93 | 71.03±1.07 | 66.91±0.97 | 70.20±0.72 | 78.39±0.96
Selected ROI | 82.22±0.42 | 86.35±0.39 | 85.63±0.61 | 81.54±0.92 | 85.79±0.42 | 83.71±0.81
L1 Reg. SVM | 80.91±0.21 | 78.23±0.57 | 77.41±0.92 | 73.92±0.28 | 76.14±0.47 | 83.21±1.23
Graph-based | 88.54±0.71 | 93.61±0.62 | 94.54±0.31 | 94.23±0.94 | 92.23±0.38 | 82.29±0.91
PCA + Algorithm 1 | 81.76±0.90 | 78.91±0.88 | 77.44±0.93 | 81.76±0.12 | 77.64±0.84 | 84.32±0.72
ICA + Algorithm 1 | 81.11±0.72 | 80.92±0.58 | 75.76±0.98 | 81.04±0.81 | 83.02±0.92 | 82.37±0.88
APA + SVM | 72.27±0.86 | 73.59±1.04 | 76.95±0.94 | 68.14±1.02 | 71.07±0.79 | 85.10±0.93
Binary APA | 97.06±0.82 | 97.31±0.82 | 96.21±0.62 | 94.92±0.11 | 97.21±0.92 | 94.08±0.84
Table 4: Accuracy of multiclass predictors (mean±std)
Alg., Datasets | DS105 (# of classes = 8) | DS107 (# of classes = 4) | ABSTRACT (# of classes = 5) | ALL (# of classes = 10)
Multiclass SVM | 18.03±4.07 | 38.01±2.56 | 31.77±2.61 | 12.26±5.97
MLP | 38.34±3.21 | 71.55±2.79 | 67.24±3.72 | 32.94±4.89
Selected ROI | 28.72±2.37 | 68.51±1.07 | 54.19±2.80 | 35.03±2.66
Graph-based | 50.61±4.83 | 89.69±2.32 | 78.96±3.32 | 47.64±5.28
Multiclass APA | 59.21±2.05 | 95.61±1.83 | 95.85±1.05 | 62.93±2.69

The correlations between different ACs are also meaningful in the feature space. In DS105 and DS107, the scramble (nonsense) stimuli have a low correlation with the sensible categories. As another example in DS105, human faces are mostly correlated with the photos of cats and houses in comparison with the other objects. Another interesting example is the correlation between meaningful stimuli (words and objects) and nonsense stimuli (scrambles and consonants) in DS107, where the meaningful stimuli are highly correlated with each other while their correlations with the nonsense stimuli are negative. Since DS117 is a binary dataset, it is a good example for understanding the negative effect of noise and sparsity in fMRI analysis: the correlation between the face category and scramble is high in the raw voxel space, whereas it is much lower in the feature space. Indeed, the noisy and sparse raw voxels are not suitable for training a high-performance cognitive model.

Here, we note that a suitable feature representation can also improve the performance of the MVPA analysis (the final cognitive model). From the geometric perspective of a linear space, classification algorithms draw a hyperplane (known as the decision surface in neuroscience [13]) in a representational space in order to distinguish different classes (categories of visual stimuli). In a highly correlated space, the margin of this hyperplane is sensitive: small changes in the parameters of the hyperplane can rapidly reduce the performance of the classifier, as happens in the raw voxel space [4]. A suitable feature extraction can minimize the correlation between different categories of visual stimuli; as a result, the margin of error and the stability of the final model are increased when training a classifier.

4.4 Performance Analysis

In this section, the performance of different methods is evaluated for both binary and multiclass analyses. In the binary analysis, the performance of the classical binary Support Vector Machine (SVM) is reported; this method was used in [9] to distinguish different categories of visual stimuli. As regularized methods that were introduced in [30] for decoding brain patterns, the performances of the L1-regularized SVM, the Elastic Net, and the Graph Net are also reported. Further, the performances of component-based methods are evaluated, i.e. Principal Component Analysis (PCA), which was used in [33] for training a cognitive model, and Independent Component Analysis (ICA), which was employed in [37] for analyzing fMRI datasets. As another alternative for decoding visual stimuli, the Selected Region of Interest (ROI) method [27] is reported, where the ROIs for each dataset are manually selected as in the original paper [27] and the SVM classifier is then applied to the selected ROIs in order to train a cognitive model. As the method developed in [32], the performance of a graph-based approach is reported. In order to show the effect of the different parts of the proposed method, we also report three baselines. As the first two alternatives, we utilize two component-based baselines, i.e. 'PCA + Algorithm 1' and 'ICA + Algorithm 1', which apply Algorithm 1 to the features that are extracted by PCA and ICA, respectively. As the last baseline, 'APA + SVM' applies the SVM algorithm to the features that are extracted by APA. In the multiclass analysis, the performance of a multiclass SVM is presented as a baseline, where this algorithm was used in [9] to generate the cognitive model. Further, the performance of the proposed method is compared with the Multilayer Perceptron (MLP) that was introduced as a multiclass approach in [1] for decoding brain patterns. The Selected ROI method [27] and the graph-based approach [32] are also reported as other alternatives. We note that a multiclass SVM is used in the Selected ROI method in order to create a multiclass cognitive model (in Table 4). All of the mentioned algorithms were implemented in MATLAB R2016b by the authors in order to generate the experimental results. Further, all evaluations are applied by using leave-one-subject-out cross validation; e.g., in DS105 the brain patterns of 5 subjects are selected for training a classifier in each iteration, and the patterns of the remaining subject are then used to test the generated cognitive model (see the sketch below). The number of iterations is therefore equal to the number of subjects (6 in this example). Indeed, not only are the brain patterns in the training and testing sets independent across subjects, but the fMRI data of each subject was also preprocessed separately [23]. We note that in each iteration the same training and testing sets are applied to all of the evaluated methods.
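A minimal sketch of this leave-one-subject-out protocol with scikit-learn is shown below; the feature matrix, labels, subject identifiers, and the linear SVM placeholder are all illustrative assumptions rather than the evaluated methods themselves.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# illustrative data: extracted features, binary labels, and subject IDs
X = np.random.randn(120, 116)
y = np.random.randint(0, 2, size=120)
subjects = np.repeat(np.arange(6), 20)        # 6 subjects, 20 stimuli each

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])   # placeholder classifier
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(f"{np.mean(scores):.3f} +/- {np.std(scores):.3f}")         # one fold per subject
```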

Tables 2 and 3 respectively illustrate the classification accuracy and the Area Under the ROC Curve (AUC) of the binary predictors for each category of visual stimuli. All visual stimuli in the dataset DS105 except scrambled photos are considered as the object category for generating these experimental results. As these tables show, SVM cannot achieve an acceptable performance on the raw voxels because fMRI data in the original space includes noise and sparsity. Moreover, the performances of the component-based approaches (PCA and ICA) are significantly lower because they were applied to the whole brain. Indeed, these methods are suitable for ROI-based problems or resting-state fMRI data, where they can find the best projection among a limited set of relevant voxels. Another piece of evidence for this claim is the Selected ROI method, whose performance is significantly better than that of the component-based approaches. In fact, this is the main reason why this paper develops an automatically selected ROI method instead of just applying the component-based methods to whole-brain data for selecting (ranking) effective features. As also mentioned in the original paper [30], the L1-regularized SVM generated better results in comparison with the other regularized techniques, i.e. the Elastic Net and the Graph Net. As another alternative, the graph-based method developed by Osher et al. [32] generated acceptable performance because it also employed anatomical features in order to create a cognitive model. Further, 'PCA/ICA + Algorithm 1' generated better performances in comparison with the plain PCA/ICA methods because of the ensemble approach. Since 'APA + SVM' uses a better representational space than the raw fMRI data, it significantly improves the performance of the SVM method. Although these baselines show that each part of the proposed method generates better performance in comparison with the classical algorithms (PCA, ICA, and SVM), the best results are produced when all parts are used together. Last but not least, the proposed algorithm achieved the best performance in comparison with the other methods because it provides a better representation of neural activities by exploiting the anatomical structure of the human brain.

Table 4 illustrates the classification accuracy of the multiclass predictors. In this table, 'DS105' includes 8 different categories (classes) and 'DS107' contains 4 categories of visual stimuli. This paper also combined the three datasets in two distinctive forms. 'ABSTRACT' includes 5 different categories, i.e. words, objects, scrambles, consonants, and human faces, and is generated by considering all visual stimuli in the dataset DS105 except faces and scrambled photos as the object category and combining them with the datasets DS107 and DS117. Indeed, this combined dataset can be used for comparing the abstract features of visual stimuli in the human brain. As another alternative, 'ALL' in this table is generated by combining all of the visual stimuli in the three datasets into 10 categories, i.e. faces, houses, cats, bottles, scissors, shoes, chairs, words, consonants, and scrambled photos. As depicted in Table 4, the accuracy of the proposed method improves when the three datasets are combined, whereas the performances of the other methods decrease significantly. As mentioned before, this is caused by the problem of registering data to the standard space in fMRI analysis, which the proposed framework mitigates by registering only the condition-level responses. In addition, our framework employs the features extracted from the anatomical regions instead of using all or a subgroup of voxels, which can increase the performance of the predictive models by decreasing noise and sparsity.

Automatically detected active regions (with probability greater than 50%) across abstract categories of visual stimuli, generated by combining 3 datasets, i.e. DS105, DS107, and DS117.

5 Discussions and Conclusions

Anatomical Pattern Analysis (APA) can be used by neuroscientists in order to find the most effective active voxels (regions) across abstract categories of visual stimuli, both in a single dataset and in combined datasets. Figure ? illustrates an example of these active voxels by using the ABSTRACT dataset in Table 4, which was generated by combining the visual stimuli in the three datasets, i.e. DS105, DS107, and DS117. In this figure, the active regions for the abstract categories (faces, scrambles, and objects) are normalized in the standard space, and then the voxels that are active with probability greater than 50%, i.e. in more than half of all sessions in the combined dataset, are visualized as the automatically detected ROIs. As this figure depicts, APA can not only generate a cognitive model in order to predict visual stimuli in the human brain but also automatically demonstrate the activated loci in the human brain across categories of visual stimuli. These activated loci can be used to study Specific-Exemplar (SE) recognition [27] or to design an accurate brain mask (ROI) for ROI-based studies [41].
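A small sketch of this thresholding rule, assuming binary per-session active-voxel masks as input, is given below; the mask shapes and contents are illustrative.

```python
import numpy as np

def frequently_active_voxels(session_masks, threshold=0.5):
    """session_masks: (n_sessions, V_std) boolean active-voxel masks.
    Keeps voxels active in more than `threshold` of the sessions."""
    prob = session_masks.mean(axis=0)          # per-voxel activation probability
    return prob > threshold

# toy usage: 40 sessions with random illustrative masks
masks = np.random.rand(40, 5000) > 0.6
roi = frequently_active_voxels(masks)
print(roi.sum(), "voxels active in more than 50% of sessions")
```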

In summary, this paper proposes the APA framework for decoding visual stimuli in the human brain. This framework uses an anatomical feature extraction method, which provides a normalized representation for combining homogeneous datasets. Further, a new imbalanced AdaBoost algorithm for binary classification is introduced, which can increase the prediction performance by exploiting supervised random sampling and the correlation between classes. In addition, this algorithm is utilized in an Error-Correcting Output Codes (ECOC) method for multiclass prediction of the brain responses. Empirical studies on 4 visual categories clearly show the superiority of our proposed method in comparison with voxel-based approaches. In the future, we plan to apply the proposed method to different brain tasks such as low-level visual stimuli and emotion.

Compliance with Ethical Standards

Conflict of Interests

Muhammad Yousefnezhad and Daoqiang Zhang declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

References

  1. Anderson, M., Oates, T.: A critique of multi-voxel pattern analysis. In: Proceedings of the Cognitive Science Society, vol. 32 (2010)
  2. Carlson, T.A., Schrater, P., He, S.: Patterns of activity in the categorical representations of objects. Journal of Cognitive Neuroscience 15(5), 704–717 (2003)
  3. Carroll, M.K., Cecchi, G.A., Rish, I., Garg, R., Rao, A.R.: Prediction and interpretation of distributed neural activity with sparse models. NeuroImage 44(1), 112–122 (2009)
  4. Chen, P.H., Chen, J., Yeshurun, Y., Hasson, U., Haxby, J., Ramadge, P.J.: A reduced-dimension fMRI shared response model. In: 28th Advances in Neural Information Processing Systems (NIPS-15), pp. 460–468. Advances in Neural Information Processing Systems (NIPS), December 7–12, Montréal, Canada (2015)
  5. Chen, P.H., Zhu, X., Zhang, H., Turek, J.S., Chen, J., Willke, T.L., Hasson, U., Ramadge, P.J.: A convolutional autoencoder for multi-subject fMRI data aggregation. In: 29th Workshop of Representation Learning in Artificial and Biological Neural Networks. Advances in Neural Information Processing Systems (NIPS), December 5–10, Barcelona, Spain (2016)
  6. Cohen, L., Dehaene, S., Naccache, L., Lehéricy, S., Dehaene-Lambertz, G., Hénaff, M.A., Michel, F.: The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain 123(2), 291–307 (2000)
  7. Connolly, A., Gobbini, M., Haxby, J.: Three virtues of similarity-based multi-voxel pattern analysis (2012)
  8. Connolly, A.C., Guntupalli, J.S., Gors, J., Hanke, M., Halchenko, Y.O., Wu, Y.C., Abdi, H., Haxby, J.V.: The representation of biological classes in the human brain. Journal of Neuroscience 32(8), 2608–2618 (2012)
  9. Cox, D.D., Savoy, R.L.: Functional magnetic resonance imaging (fMRI) 'brain reading': detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage 19(2), 261–270 (2003)
  10. Duncan, K.J., Pattamadilok, C., Knierim, I., Devlin, J.T.: Consistency and variability in functional localisers. NeuroImage 46(4), 1018–1026 (2009)
  11. Escalera, S., Pujol, O., Radeva, P.: Error-correcting output codes library. Journal of Machine Learning Research 11(Feb), 661–664 (2010)
  12. Friston, K.J.: Statistical parametric mapping. In: Neuroscience Databases, pp. 237–250. Springer (2003)
  13. Haxby, J.V., Connolly, A.C., Guntupalli, J.S.: Decoding neural representational spaces using multivariate pattern analysis. Annual Review of Neuroscience 37, 435–456 (2014)
  14. Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L., Pietrini, P.: Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293(5539), 2425–2430 (2001)
  15. Haxby, J.V., Guntupalli, J.S., Connolly, A.C., Halchenko, Y.O., Conroy, B.R., Gobbini, M.I., Hanke, M., Ramadge, P.J.: A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron 72(2), 404–416 (2011)
  16. Haynes, J.D., Rees, G.: Decoding mental states from brain activity in humans. Nature Reviews Neuroscience 7(7), 523 (2006)
  17. Haynes, J.D., Sakai, K., Rees, G., Gilbert, S., Frith, C., Passingham, R.E.: Reading hidden intentions in the human brain. Current Biology 17(4), 323–328 (2007)
  18. Jenkinson, M., Bannister, P., Brady, M., Smith, S.: Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage 17(2), 825–841 (2002)
  19. Kamitani, Y., Tong, F.: Decoding the visual and subjective contents of the human brain. Nature Neuroscience 8(5), 679–685 (2005)
  20. Kanwisher, N., McDermott, J., Chun, M.M.: The fusiform face area: a module in human extrastriate cortex specialized for face perception. Journal of Neuroscience 17(11), 4302–4311 (1997)
  21. Kay, K.N., Naselaris, T., Prenger, R.J., Gallant, J.L.: Identifying natural images from human brain activity. Nature 452(7185), 352 (2008)
  22. Kriegeskorte, N., Mur, M., Bandettini, P.: Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience 2 (2008)
  23. Kriegeskorte, N., Simmons, W.K., Bellgowan, P.S., Baker, C.I.: Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience 12(5), 535–540 (2009)
  24. Liesegang, T.J.: A cortical area selective for visual processing of the human body (comment on Downing, P.E., Jiang, Y., Shuman, M., Kanwisher, N., Science 2001; 293: 2470–2473). American Journal of Ophthalmology 133(4), 598 (2002)
  25. Liu, X.Y., Wu, J., Zhou, Z.H.: Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39(2), 539–550 (2009)
  26. Malach, R., Reppas, J., Benson, R., Kwong, K., Jiang, H., Kennedy, W., Ledden, P., Brady, T., Rosen, B., Tootell, R.: Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences (PNAS) 92(18), 8135–8139 (1995)
  27. McMenamin, B.W., Deason, R.G., Steele, V.R., Koutstaal, W., Marsolek, C.J.: Separability of abstract-category and specific-exemplar visual object subsystems: Evidence from fMRI pattern analysis. Brain and Cognition 93, 54–63 (2015)
  28. Mitchell, T.M., Shinkareva, S.V., Carlson, A., Chang, K.M., Malave, V.L., Mason, R.A., Just, M.A.: Predicting human brain activity associated with the meanings of nouns. Science 320(5880), 1191–1195 (2008)
  29. Miyawaki, Y., Uchida, H., Yamashita, O., Sato, M.a., Morito, Y., Tanabe, H.C., Sadato, N., Kamitani, Y.: Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron 60(5), 915–929 (2008)
  30. Mohr, H., Wolfensteller, U., Frimmel, S., Ruge, H.: Sparse regularization techniques provide novel insights into outcome integration processes. NeuroImage 104, 163–176 (2015)
  31. Norman, K.A., Polyn, S.M., Detre, G.J., Haxby, J.V.: Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences 10(9), 424–430 (2006)
  32. Osher, D.E., Saxe, R.R., Koldewyn, K., Gabrieli, J.D., Kanwisher, N., Saygin, Z.M.: Structural connectivity fingerprints predict cortical selectivity for multiple visual categories across cortex. Cerebral Cortex 26(4), 1668–1683 (2015)
  33. O'Toole, A.J., Jiang, F., Abdi, H., Haxby, J.V.: Partially distributed representations of objects and faces in ventral temporal cortex. Journal of Cognitive Neuroscience 17(4), 580–590 (2005)
  34. Rice, G.E., Watson, D.M., Hartley, T., Andrews, T.J.: Low-level image properties of visual objects predict patterns of neural response across category-selective regions of the ventral visual pathway. Journal of Neuroscience 34(26), 8837–8844 (2014)
  35. Varoquaux, G., Gramfort, A., Thirion, B.: Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp. 1375–1382 (2012)
  36. Wakeman, D.G., Henson, R.N.: A multi-subject, multi-modal human neuroimaging dataset. Scientific Data 2 (2015)
  37. Xu, J., Potenza, M.N., Calhoun, V.D.: Spatial ICA reveals functional activity hidden from traditional fMRI GLM-based analyses. Frontiers in Neuroscience 7 (2013)
  38. Yamashita, O., Sato, M.a., Yoshioka, T., Tong, F., Kamitani, Y.: Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns. NeuroImage 42(4), 1414–1429 (2008)
  39. Yousefnezhad, M., Zhang, D.: Decoding visual stimuli in human brain by using anatomical pattern analysis on fMRI images. In: 8th International Conference on Brain Inspired Cognitive Systems (BICS'16), pp. 47–57. Springer, November 28–30, Beijing, China (2016)
  40. Yousefnezhad, M., Zhang, D.: Local discriminant hyperalignment for multi-subject fMRI data alignment. In: 31st AAAI Conference on Artificial Intelligence (AAAI-17), pp. 59–65. Association for the Advancement of Artificial Intelligence (AAAI), February 4–9, San Francisco, California, USA (2017)
  41. Yousefnezhad, M., Zhang, D.: Multi-region neural representation: A novel model for decoding visual stimuli in human brains. In: 17th SIAM International Conference on Data Mining (SDM-17), pp. 54–62. Society for Industrial and Applied Mathematics (SIAM), April 27–29, Houston, Texas, USA (2017)