Deep Semantic Architecture with discriminative feature visualization for neuroimage analysis
Neuroimaging data analysis often involves a-priori selection of data features to study the underlying neural activity. Since this could lead to sub-optimal feature selection and thereby prevent the detection of subtle patterns in neural activity, data-driven methods have recently gained popularity for optimizing neuroimaging data analysis pipelines and thereby improving our understanding of neural mechanisms. In this context, we developed a deep convolutional architecture that can identify discriminating patterns in neuroimaging data and applied it to electroencephalography (EEG) recordings collected from 25 subjects performing a hand motor task before and after a rest period or a bout of exercise. The deep network was trained to classify subjects into exercise and control groups based on differences in their EEG signals. Subsequently, we developed a novel method termed the cue-combination for Class Activation Map (ccCAM), which enabled us to identify discriminating spatio-temporal features within definite frequency bands (23–33 Hz) and assess the effects of exercise on the brain. Additionally, the proposed architecture allowed the visualization of the differences in the propagation of underlying neural activity across the cortex between the two groups, to our knowledge for the first time. Our results demonstrate the feasibility of using deep network architectures for neuroimaging analysis in different contexts, such as the identification of robust brain biomarkers to better characterize and potentially treat neurological disorders.
Skilled motor practice facilitates the formation of an internal model of movement, which may be later used to anticipate task-specific requirements. These internal models are more susceptible to alterations during and immediately following practice and become less susceptible to alterations over time, a process called consolidation brashers1996consolidation (); robertson2004current (). A single bout of cardiovascular exercise, performed in close temporal proximity to a session of skill practice, has been shown to facilitate motor memory consolidation roig2013effects (). Several potential mechanisms underlying the time-dependent effects induced by acute exercise on motor memory consolidation have been identified, such as increased availability of neurochemicals skriver2014acute () and increased cortico-spinal excitability ostadan2016changes (). However, the distinct contribution of specific brain areas and the precise neurophysiological mechanisms underlying the positive effects of acute cardiovascular exercise on motor memory consolidation remain largely unknown.
Electroencephalography (EEG) is a popular technique used to study the electrical activity from different brain areas. The EEG signal arises from synchronized postsynaptic potentials of neurons that generate electrophysiological oscillations in different frequency bands. During movement, the EEG signal power spectrum within the alpha (8–12 Hz) and beta (15–29 Hz) range decreases in amplitude and this is thought to reflect increased excitability of neurons in sensorimotor areas crone1998functional (); neuper2001event (); pfurtscheller2003spatiotemporal (); salmelin1995functional (). This phenomenon is termed Event-Related Desynchronization (ERD). Alpha- and beta-band ERD have been shown to be modulated during motor skill learning in various EEG studies boonstra2007multivariate (); houweling2008neural (); zhuang1997event (). There is converging evidence suggesting an association of cortical oscillations in the motor cortex with neuroplasticity events underlying motor memory consolidation boonstra2007multivariate (); pollok2014changes (). In this context, our aim was to study the add-on effects of exercise on motor learning in terms of modulation of EEG-based ERD.
Many neuroimaging studies, including EEG ones, rely on the a-priori selection of features from the recorded time-series. This could lead to sub-optimal feature selection and could eventually prevent the detection of subtle discriminative patterns in the data. Alternatively, data-driven approaches such as deep learning allow discovery of the optimal discriminative features in a given dataset. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been applied to computer vision and speech processing datasets krizhevsky2012imagenet (); graves2013speech (); zhang2015text (); karpathy2014large () with great success. They have also been used successfully in the neuroimaging domain to learn feature representations for Magnetic Resonance Image segmentation plis2014deep () and EEG data decoding bashivan2015learning (); schirrmeister2017deep (); thodoroff2016learning () among others. Most studies using CNNs for EEG have been restricted to the classification of EEG data segments into known categories. However, the usefulness of CNNs to improve our understanding of the underlying neural bases is less straightforward, primarily due to the difficulty in visualizing and interpreting the feature space learnt by the CNN.
Our work addresses existing caveats in applying deep learning architectures, such as CNNs, to the analysis of EEG data by means of three novel contributions:
- We used two parallel feature extraction streams to discover informative features from EEG data before and after an intervention and subsequently characterize the modulatory effect on these derived features rather than on the raw EEG data itself;
- We incorporated a subject prediction adversary component in the network architecture to learn subject-invariant, group-related features instead of subject-specific features;
- We developed a novel method, termed cue-combination for Class Activation Map (ccCAM), to visualize the features extracted by the CNN after training.
We used this CNN-based deep network architecture to identify exercise-induced changes in neural activity from EEG signals recorded during an isometric motor task. The training was carried out in a hierarchical structure – first for time-frequency and then for topographical data maps. Visualizing the features after each stage of the training allowed us to identify frequency bands and the corresponding brain regions that were modulated by the add-on effects of acute exercise on motor learning.
The majority of previous related studies have leveraged large-scale datasets consisting of hundreds of subjects for training purposes, which may be a limiting factor for applying powerful deep learning methodologies to data of smaller sample size. Therefore, one of our aims was to develop a method that can be used both for small-scale and large-scale studies. To this end, we added a regularizer that prevented the feature extraction part of the CNN from learning subject-specific features, thus promoting the identification of group-specific features only.
The dataset used in this work consisted of EEG recordings from 25 healthy subjects. The experiment and data collection are detailed elsewhere dal2018acute (). Briefly, 25 right-handed healthy subjects were recruited and assigned to the Control (CON, n=13 subjects) or Exercise (EXE, n=12 subjects) groups.
Each subject reported to the laboratory on four occasions as shown in Figure 1. Visit 1 required the participants to go through a Graded Exercise Test (GXT), which was used to determine their cardiorespiratory fitness. Visit 2 was conducted at least 48 h after the GXT to avoid potential long-term effects of exercise on memory berchtold2005exercise (); hopkins2012differential (). EEG recordings were collected at baseline while subjects performed 50 repetitions of visually cued isometric handgrips with their dominant right hand using a hand clench dynamometer (Biopac, Goleta, CA, USA). Each contraction was maintained for 3.5 sec at 15% of each participant’s maximum voluntary contraction (MVC) and followed by a 3 to 5 sec rest period. The baseline assessment was followed by the practice of a visuo-motor tracking task (skill acquisition), which was used for the calculation of the motor learning score. Participants were then randomly assigned to two groups. The exercise group (EXE) performed a bout of high-intensity interval cycling of 15 min, while the control group (CON) rested on the cycle ergometer for the same amount of time. The same EEG recordings collected at baseline were repeated 30, 60 and 90 min after the exercise or rest period.
EEG activity was recorded using a 64-channel ActiCap cap (BrainVision, Munich, Germany) with electrode locations arranged according to the 10–20 international system. Electrical conductive gel was inserted at each electrode site to keep impedances below 5 kΩ. EEG signals were referenced to the FCz electrode and sampled at 2500 Hz.
The analysis pipeline was first applied to the time and frequency domain data without incorporating spatial information. Subsequently, it was applied to the data obtained by creating topographical maps corresponding to the distribution of activity in specific frequency bands across the cortex. The entire pipeline consisted of three segments: preprocessing, CNN training and ccCAM generation.
3.1 Time-Frequency (TF) maps
EEG data preprocessing was similar to that performed in a previous study dal2018acute () and was performed using the Brainstorm Matlab toolbox tadel2011brainstorm (). EEG signals were bandpass-filtered between 0.5 Hz and 55 Hz and average-referenced. Continuous data were visually inspected and portions of signals with muscle or electrical transient artifacts were rejected. Independent component analysis (ICA) was subsequently applied on each dataset (total number of components: 20) and between one and two eye-blink related components were rejected based on their topography and time signatures delorme2004eeglab (). The resulting dataset was epoched with respect to the period of time (3.5 sec) corresponding to the appearance of the visual cue that triggered the initiation of the isometric handgrips (n = 50/subject). Finally, each trial was visually inspected and those containing artifacts were manually removed.
Morlet wavelet (wave number = 7) coefficients from 1 to 55 Hz with 1 Hz resolution were extracted to obtain time-frequency decompositions of the EEG data. The time-frequency data for each electrode were subsequently normalized with respect to their spectral power before the start of the grip event, as calculated from a window of 0.5 sec. Following this, an average over all trials was calculated in order to obtain a single time-frequency map for each electrode. The subsequent analysis was applied to the EEG recording segment corresponding to 0.5–3.5 sec after the appearance of the visual cue, i.e. during the handgrip task.
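The time-frequency step can be illustrated with a minimal sketch in plain Python. It is not the Brainstorm implementation used in the study: the function names, the truncation of the wavelet at three standard deviations, the amplitude normalization, the ratio form of the baseline normalization and the 250 Hz sampling rate of the toy signal are all assumptions made for illustration.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=7):
    """Power of `signal` at `freq` (Hz) over time, using a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * math.pi * freq)   # width of the Gaussian envelope (s)
    half = int(math.ceil(3.0 * sigma_t * fs))     # assumed truncation at +/- 3 sigma
    kernel = []
    for k in range(-half, half + 1):
        t = k / fs
        envelope = math.exp(-t * t / (2.0 * sigma_t ** 2))
        kernel.append(envelope * cmath.exp(2j * math.pi * freq * t))
    norm = sum(abs(w) for w in kernel)            # assumed amplitude normalization
    kernel = [w / norm for w in kernel]

    n = len(signal)
    power = []
    for i in range(n):
        acc = 0j
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:                      # zero-padding at the edges
                acc += signal[idx] * w.conjugate()
        power.append(abs(acc) ** 2)
    return power


def baseline_normalize(power, fs, baseline_sec=0.5):
    """Divide power by its mean over the first `baseline_sec` seconds (assumed ratio form)."""
    n_base = int(round(baseline_sec * fs))
    base = sum(power[:n_base]) / n_base
    return [p / base for p in power]


# Toy usage: a 2 s, 20 Hz sinusoid sampled at an assumed 250 Hz.
fs = 250.0
signal = [math.sin(2.0 * math.pi * 20.0 * (i / fs)) for i in range(500)]
tf_map = {f: morlet_power(signal, fs, f) for f in (10.0, 20.0, 30.0)}
peak_freq = max(tf_map, key=lambda f: tf_map[f][250])  # strongest band mid-signal
normalized = baseline_normalize(tf_map[20.0], fs)
```

In the toy usage the strongest response mid-signal is expected in the 20 Hz band, since that is the only frequency present in the input.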
The overall CNN architecture that we developed is shown in Figure 2. Following preprocessing of the data, time-frequency maps for each electrode and session – at baseline and 90 min after exercise or a rest period (post-intervention session) – were obtained. The data for each session was then rearranged to form 2D matrices comprising the frequency spectra for all electrodes at a given time instant t. Each matrix had a dimension of 64 × 55 (64 electrodes × 55 frequency bands). For training the network, a pair of matrices was used – the first corresponding to time point t from the baseline session and the second corresponding to the same time point t from the post-intervention session. Each pair was labeled as either exercise or control, depending on the group allocation. Structuring the data in this fashion allowed the network to take into account the inter-subject variability in baseline measures and therefore did not require the experimenter to adopt techniques for normalizing the EEG signal from the post-intervention session with respect to the baseline session. Thus, the network was expected to capture the EEG features that were modulated by the add-on effects of acute exercise.
Dataset Notation: $B$ and $A$ represent the entire data tensor at baseline and post-intervention, respectively. Each data tensor consists of data matrices from all 25 subjects and timepoints. For subject $s$, the goal was to classify whether the tuple containing the matrices $B_{s,t}$ and $A_{s,t}$ (where $t$ denotes the timepoint) belongs to the EXE or CON group.
To this end, we used a deep convolutional network that was optimized for the task. The network architecture is similar to the one used in agrawal2015learning (). Features from matrices $B_{s,t}$ and $A_{s,t}$ were extracted using a network termed the Base CNN. The difference between the obtained feature vectors was passed to a discriminator network, termed the Top NN, to predict the group to which each pair belongs. The schematic view of the architecture is shown in Figure 2 and details of each network’s architecture are provided in Tables S1 and S2 in supplementary material respectively. Since the sampling frequency was 2500 Hz and the time period of interest was 3 sec long, $t \in \{1, \dots, 7500\}$ for each subject $s$.
The convolutions performed in the Base CNN were with respect to the frequency domain and not the electrode (sensor) domain. This is because the former was laid out in a semantic order of increasing frequencies, as opposed to the latter, which was not arranged by the spatial locations of the electrodes. Consequently, we expected the features extracted by the Base CNN to be the frequency bands significantly affected by exercise. Therefore, all convolutional filters in the Base CNN were implemented as $1 \times k$ 2D filters, where $k$ is the extent of the filter in the frequency domain. The same holds for the Max-Pooling layers.
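A frequency-only convolution can be sketched in plain Python as follows. This is an illustrative toy, not the Base CNN itself: the function names, the averaging kernel of length 5, the pooling width of 3 and the synthetic input values are assumptions. The point it shows is that each electrode row is filtered independently, so the electrode ordering never influences the extracted features.

```python
def conv_along_frequency(matrix, kernel):
    """Valid cross-correlation of each electrode row with a 1 x k kernel."""
    k = len(kernel)
    out = []
    for row in matrix:  # one row per electrode; rows are never mixed
        out.append([
            sum(row[j + m] * kernel[m] for m in range(k))
            for j in range(len(row) - k + 1)
        ])
    return out


def max_pool_along_frequency(matrix, width):
    """Non-overlapping max-pooling over the frequency axis only."""
    return [
        [max(row[j:j + width]) for j in range(0, len(row) - width + 1, width)]
        for row in matrix
    ]


# Toy 64 x 55 input (64 electrodes x 55 frequency bands) with synthetic values.
tf_matrix = [[float((e + f) % 7) for f in range(55)] for e in range(64)]
kernel = [0.2] * 5                                 # hypothetical 1 x 5 averaging filter
features = conv_along_frequency(tf_matrix, kernel)  # 64 rows of 51 values
pooled = max_pool_along_frequency(features, 3)      # 64 rows of 17 values
```

Because rows are processed independently, reordering the electrodes simply reorders the output rows, which matches the motivation for not convolving across the sensor dimension.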
Initially, a network that did not include an adversary loss component (Figure S1.a from Supplementary material) was used; however, it was found that this network was able to learn subject-specific features as opposed to subject-invariant, exercise-related features. This is illustrated in Figure S2 (Supplementary material) and Table 1. In most neuroimaging studies, the number of participants scanned is limited, which typically restricts deep networks from learning subject-invariant features. To address this issue, we followed a domain adaptation approach. Specifically, each subject was considered as a separate domain comprising subject-specific features along with subject-invariant, exercise-related features. Since our goal was to learn features mainly related to the effect of exercise on the consolidation of motor memory, we incorporated the domain confusion strategy tzeng2015simultaneous () to train the network, thus adding the subject discriminator as an adversary (Figure 2 – bottom right). Specifically, we added this network in parallel to the Top NN with similar model capacity (see Table S3 in supplementary material for architecture details).
Network Architecture Notation: The feature extractor operation and parameters of the Base CNN are denoted as $F_b$ and $\theta_b$ respectively, the Top NN feature discrimination operator and its parameters are denoted by $F_t$ and $\theta_t$ respectively, while the subject discrimination operator and its parameters are denoted by $F_s$ and $\theta_s$ respectively. The input tuple is denoted by $x$ and its corresponding group and subject labels by $y_g$ and $y_s$ respectively. We used the Negative Log Likelihood (NLL) loss for each classifier with the Adam optimizer kingma2014adam () in Torch collobert2011torch7 () for training the network. The Subject Discriminator was trained to minimize the subject prediction loss given by

$$\mathcal{L}_s(\theta_s) = -\frac{1}{N} \sum_{i=1}^{N} \log \left[ F_s\big(F_b(x_i; \theta_b); \theta_s\big) \right]_{y_{s,i}}$$

The Top NN was trained to minimize the group prediction loss given by

$$\mathcal{L}_g(\theta_t) = -\frac{1}{N} \sum_{i=1}^{N} \log \left[ F_t\big(F_b(x_i; \theta_b); \theta_t\big) \right]_{y_{g,i}}$$

We trained the feature extractor, Base CNN, in a manner such that the features extracted would be agnostic to the originating subject; therefore, the target distribution for the subject prediction network was a uniform distribution. Hence, we used the domain confusion loss tzeng2015simultaneous () rather than the gradient reversal layer ganin2016domain () and used the Kullback-Leibler (KL) divergence from the uniform distribution over 25 classes (25 subjects) as our loss metric. Consequently, the Base CNN was trained to minimize the loss given by

$$\mathcal{L}_b(\theta_b) = -\frac{1}{N} \sum_{i=1}^{N} \log \left[ F_t\big(F_b(x_i; \theta_b); \theta_t\big) \right]_{y_{g,i}} + \frac{\lambda}{N} \sum_{i=1}^{N} KL\Big(U,\; F_s\big(F_b(x_i; \theta_b); \theta_s\big)\Big)$$

where $KL(P, Q)$ denotes the KL divergence between distributions $P$ and $Q$, $U$ denotes the uniform distribution, $N$ denotes the total number of training examples and $\lambda$ is a hyperparameter that determines the weight for the subject discrimination regularizer. Here, we used an 80-20 split of the data set, whereby 80% was used for training and 20% was used for validation.
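The interplay between the group prediction loss and the subject discrimination regularizer can be sketched for a single training example as follows. This is a hedged illustration rather than the training code: the function names, the argument name `lam` for the regularizer weight and the direction of the KL divergence (uniform distribution as first argument) are assumptions, and predicted probabilities are assumed to be strictly positive, as produced by a softmax.

```python
import math


def nll(probs, label):
    """Negative log likelihood of the true class under predicted probabilities."""
    return -math.log(probs[label])


def kl_from_uniform(probs):
    """KL(U || p) for a uniform U over len(probs) classes (assumed direction)."""
    u = 1.0 / len(probs)
    return sum(u * math.log(u / p) for p in probs)


def base_cnn_loss(group_probs, group_label, subject_probs, lam):
    """Group NLL plus a lam-weighted penalty for non-uniform subject predictions."""
    return nll(group_probs, group_label) + lam * kl_from_uniform(subject_probs)


# Subject predictions over 25 subjects: perfectly uniform versus peaked.
uniform_subjects = [1.0 / 25] * 25
peaked_subjects = [0.76] + [0.01] * 24

penalty_uniform = kl_from_uniform(uniform_subjects)   # no penalty when subject-agnostic
penalty_peaked = kl_from_uniform(peaked_subjects)     # positive when a subject is identifiable
loss = base_cnn_loss([0.9, 0.1], 0, uniform_subjects, lam=1.0)
```

With subject-agnostic features the penalty vanishes and the loss reduces to the group NLL, whereas identifiable subjects increase the loss in proportion to the regularizer weight.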
A major contribution of the present work is the development of a novel method for the visualization of the features that guide the proposed network’s decision. Although well-known techniques used in the computer vision literature include the use of Global Average Pooling (GAP) zhou2016learning () and grad-CAM selvaraju2016grad (), these methods are not suited for the neuroscience paradigm considered here. For instance, GAP requires averaging the activations of each filter map, i.e. each channel of the extracted feature tensor. This leads to loss of information related to electrode positions, as convolutions were performed only in the frequency domain. Specifically, we applied GAP and grad-CAM to our data and we were unable to obtain adequate classification accuracy (%) with a GAP layer in the network. Also, grad-CAM is sensitive to absolute scale of the features in the input data and hence yielded results that were biased towards frequency bands with higher power-values, namely the lower frequency bands (<10 Hz).
Given these limitations in existing analytic methods, we used the linear cue-combination theory used in human perception studies ernst2002humans () to develop a method that explains the network’s decisions. Let us consider, for example, a CNN with only 2 channels, i.e. filter maps, in the final feature tensor extracted after convolutions. Each of these filter maps preserves the spatial and/or semantic structure of the input data. Each of these filter maps acts as a “cue” to the network’s classifier layers, denoted as $c_1$ and $c_2$. If we denote the desired class label as $y$ and assume $c_1$ and $c_2$ to be independent of each other, we can use Bayes’ Theorem to write

$$P(y \mid c_1, c_2) \propto P(c_1 \mid y)\, P(c_2 \mid y)\, P(y)$$

If the likelihood for predicting $y$ due to cue $c_i$ is Gaussian with mean $\mu_i$ and variance $\sigma_i^2$, the maximum likelihood estimate (MLE) yields the combined cue, denoted by $c^*$, that summarizes the important features on which the network bases its decisions:

$$c^* = \sum_i w_i \mu_i, \qquad w_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2}$$

Therefore, the combined cue, $c^*$, is the desired Class Activation Map (CAM).
Since the network is trained, $\mu_i = c_i$. To calculate the values of $\sigma_i$, we used the NLL loss values. The NLL loss with a cue removed provided an estimate of the $\sigma_i$ associated with that cue.
$\sigma_i$ is estimated over the entire dataset as shown in Equation 7.
Using the estimated $\sigma_i$, the CAM corresponding to the correct class for each input was generated. Since in the present case $c_i$ corresponds to a 2D matrix, the denominator in Equation 7 was replaced by the mean-squared value of the corresponding matrix. The obtained CAMs were subsequently group-averaged to extract frequency bands that contain features characteristic to each group (CON and EXE).
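The inverse-variance weighting behind linear cue combination can be sketched as follows. The sketch assumes that a variance has already been estimated for each cue; the estimation from NLL loss values described above is not reproduced here, and the function name and the toy values are hypothetical.

```python
def combine_cues(cues, variances):
    """Inverse-variance weighted sum of equally shaped 2D cue maps."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]      # weights sum to one
    rows, cols = len(cues[0]), len(cues[0][0])
    combined = [
        [sum(w * cue[r][c] for w, cue in zip(weights, cues)) for c in range(cols)]
        for r in range(rows)
    ]
    return combined, weights


# Two toy 2 x 2 filter maps; the first cue is assumed three times more reliable.
cue_a = [[1.0, 0.0], [0.0, 0.0]]
cue_b = [[0.0, 0.0], [0.0, 1.0]]
cam, weights = combine_cues([cue_a, cue_b], variances=[1.0, 3.0])
```

Cues with a smaller variance receive a larger weight, so the more reliable filter map dominates the combined map.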
3.2 Topographical maps
Topographical maps were created using the frequency bands obtained from the ccCAM corresponding to the TF-maps. The average power within each frequency band for all electrodes at time point $t$ was used to construct a matrix by projecting the average power value of each electrode to a point corresponding to its spatial position. Since this procedure yielded a sparse matrix, cubic interpolation was used to obtain a continuous image depicting the distribution of activity within each frequency band over the entire head. A total of three such matrices were packed together to form a tensor corresponding to activity maps at three consecutive time points. The entire data tensor for a given subject was created by taking non-overlapping time windows. Hence, the total number of tensors for each subject was equal to 2500.
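The count of 2500 tensors per subject follows from grouping the 7500 time points (3 sec at 2500 Hz) into non-overlapping windows of three maps. A minimal sketch, with a hypothetical helper name and integer placeholders standing in for the interpolated maps:

```python
def stack_nonoverlapping(frames, window=3):
    """Group consecutive frames into non-overlapping stacks of `window` frames."""
    usable = len(frames) - len(frames) % window    # drop any incomplete trailing window
    return [frames[i:i + window] for i in range(0, usable, window)]


# Placeholders for the per-time-point topographical maps of one subject.
frames = list(range(7500))
tensors = stack_nonoverlapping(frames)
n_tensors = len(tensors)
```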
Similar to the analysis of TF-maps, we trained a CNN-based network to classify each data tensor into the CON and EXE groups. Since the inputs were 2D image tensors here, we used 2D convolutional filters in the Base CNN (see Tables S4, S5 and S6 from supplementary material for more details). Following training, ccCAM was applied to obtain CAMs for each subject at each time instant during the task execution.
4 Results and Discussion
The results presented here illustrate the differences between the Baseline and 90 min post exercise/rest datasets. The network architecture details for each type of data (TF and Topographical) map are presented in the supplementary material, along with details regarding the chosen hyperparameters.
4.1 Time-Frequency maps
We observed that the features extracted by Base CNN, without any subject prediction regularizer, could be used to identify the subject corresponding to any given data tensor. As the subject discriminator regularization was given more weight by increasing the regularization weight $\lambda$, the Base CNN learned to extract features that were agnostic to the originating subject. However, for very high values of $\lambda$, the extracted features could not be used to discriminate the EXE and CON groups, suggesting that the Base CNN was unable to learn discriminative features. The loss values obtained post-training for four different values of $\lambda$ are shown in Table 1. The choice of an optimal value for $\lambda$ depends on two factors – group prediction accuracy and subject prediction accuracy. To identify subject-invariant features, we aimed to obtain an optimal value of $\lambda$ that achieved good group prediction accuracy but poor subject prediction accuracy. Consequently, this required a good tradeoff between the two prediction accuracies.
| Group prediction loss (NLL) | Subject prediction loss (NLL) | KL divergence loss from uniform distribution |
According to this procedure, the model corresponding to the selected value of $\lambda$ was used for ccCAM generation. The average loss over a batch for subject prediction was around 2.6, which roughly corresponds to predicting the correct subject with a confidence of $e^{-2.6} \approx 7.4\%$. The group prediction accuracy was 99.984% (99.969% for CON and 100% for EXE). Hence the extracted features achieved excellent group prediction, while all subjects in the group were predicted with roughly equal probability (CON and EXE consisted of 13 and 12 subjects respectively). The ccCAM obtained is shown in Figure 3.
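The relationship between the reported subject prediction loss and the per-subject confidence can be checked numerically, since an NLL value corresponds to a probability of exp(-NLL) for the correct class. The variable names below are illustrative only.

```python
import math

subject_nll = 2.6
confidence = math.exp(-subject_nll)   # probability assigned to the correct subject

chance_all = 1.0 / 25                 # uniform over all 25 subjects
chance_con = 1.0 / 13                 # uniform over the 13 CON subjects
chance_exe = 1.0 / 12                 # uniform over the 12 EXE subjects

# NLL that a uniform guess within the CON group would produce.
nll_uniform_con = math.log(13)
```

The resulting confidence lies between chance over all 25 subjects and chance within a single group, which is consistent with subjects within a group being predicted with roughly equal probability.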
As one of the main goals of this study was to identify the frequency bands that contained significant information, we calculated the ccCAMs for all timepoints and then group-averaged the maps (averaging across all timepoints and subjects in a group) to obtain two 2D maps, one each for the CON and EXE groups. We plotted the average activation within each frequency band in each of these 2D maps to obtain the plots in Figure 3. The bold lines denote the group-mean and the shaded regions span ±1 standard error over all subjects in the group. The two plots are significantly different within the band 23–33 Hz. This band lies within the wider beta-band and agrees with findings in dal2018acute () where beta-band desynchronization was found to be significantly modulated by exercise. It is important to note that the ccCAM highlights the differences between the 90 min and baseline EEG recordings. The negative values in a frequency band indicate that the ERD was smaller after than before the exercise. This also agrees with findings in dal2018acute () and implies that decreased neural activity was required to perform the hand-grip task after exercise. The p-value calculated from the ccCAM outputs within this frequency band was equal to 0.021, while the corresponding p-value from the original time-frequency data tensor was equal to 0.0134. This suggests that had the band of interest in the previous study dal2018acute () been chosen to be 23–33 Hz, instead of the wider beta-band (15–29 Hz), similar, statistically significant inferences would have been drawn.
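The group mean and standard error curves described above can be computed per frequency band as in the following sketch. It assumes that each subject contributes one activation profile (one value per frequency band) and uses the sample standard deviation; the function name and the toy values are hypothetical, and the statistical test behind the reported p-values is not reproduced.

```python
import math
import statistics


def mean_and_sem(per_subject_profiles):
    """Per-frequency mean and standard error of the mean across subjects."""
    n_subjects = len(per_subject_profiles)
    means, sems = [], []
    for values in zip(*per_subject_profiles):      # iterate over frequency bands
        means.append(statistics.mean(values))
        sems.append(statistics.stdev(values) / math.sqrt(n_subjects))
    return means, sems


# Three toy subjects, two frequency bands each.
profiles = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
means, sems = mean_and_sem(profiles)
```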
4.2 Topographical maps
Topographical maps were created to understand the distribution of the activity within the 23–33 Hz frequency band over the cortex. After training a network to classify into the CON and EXE groups from topographical maps, a classification accuracy of 98.70% (98.94% for CON and 98.43% for EXE) was obtained for . Generating ccCAMs for the topographical maps revealed the propagation of the discriminative activity across the cortex. A video showing this traveling property of this activity is included in the supplementary material. Some snapshots from the video are shown in Figure 4.
To the best of our knowledge, this traveling pattern of activity across the cortex while performing an isometric handgrip has not been demonstrated before. These oscillations could allow us to visualize the neural mechanisms involved in maintaining a constant grip-force output. Further investigation into the correlation of these activities with the observed error signal while performing the task is required to understand these mechanisms more precisely. As expected, differences in the ccCAM of the EXE group before and after exercise were higher in magnitude than those in the CON group (see Figure S4 in supplementary material), thus indicating the modulatory effects of an acute bout of high-intensity exercise.
This work introduces a deep learning architecture for the analysis of EEG data and shows promising results in terms of discriminating between participants who underwent an acute bout of high-intensity exercise and those who rested in close temporal proximity to performing a motor learning task. Importantly, the proposed novel method enabled us to visualize the features learnt by deep networks such as CNNs, which may in turn yield better interpretation of their classification basis. The results are in general agreement with those reported in a previous study using more standard statistical analysis for a-priori selected features on the same dataset dal2018acute (), with our analysis revealing a narrower, more specific frequency band associated with exercise-induced changes. In addition, our method revealed, for the first time, the traveling pattern of cortical activity while subjects were performing isometric handgrips. Therefore, our approach demonstrates the scope for identifying discriminative features in a completely data-driven manner. The proposed method is not restricted to the EEG modality and dataset described here. Hence, it paves the way for applying equivalent deep learning methods to datasets obtained from neuroimaging studies of differing scales and varying modalities (e.g. magnetoencephalography – MEG). This, in turn, yields great potential to accelerate research oriented towards identification of neurophysiological changes associated with various neurological disorders and may ultimately lead to the design of optimized and individualized intervention strategies.
- P. Agrawal, J. Carreira, and J. Malik. Learning to see by moving. In Computer Vision (ICCV), 2015 IEEE International Conference on, pages 37–45. IEEE, 2015.
- P. Bashivan, I. Rish, M. Yeasin, and N. Codella. Learning representations from EEG with deep recurrent-convolutional neural networks. arXiv preprint arXiv:1511.06448, 2015.
- N. Berchtold, G. Chinn, M. Chou, J. Kesslak, and C. Cotman. Exercise primes a molecular memory for brain-derived neurotrophic factor protein induction in the rat hippocampus. Neuroscience, 133(3):853–861, 2005.
- T. W. Boonstra, A. Daffertshofer, M. Breakspear, and P. J. Beek. Multivariate time–frequency analysis of electromagnetic brain activity during bimanual motor learning. Neuroimage, 36(2):370–377, 2007.
- T. Brashers-Krug, R. Shadmehr, and E. Bizzi. Consolidation in human motor memory. Nature, 382(6588):252, 1996.
- R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS workshop, number EPFL-CONF-192376, 2011.
- N. E. Crone, D. L. Miglioretti, B. Gordon, J. M. Sieracki, M. T. Wilson, S. Uematsu, and R. P. Lesser. Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. i. alpha and beta event-related desynchronization. Brain: a journal of neurology, 121(12):2271–2299, 1998.
- F. Dal Maso, B. Desormeau, M.-H. Boudrias, and M. Roig. Acute cardiovascular exercise promotes functional changes in cortico-motor networks during the early stages of motor memory consolidation. NeuroImage, 174:380–392, 2018.
- A. Delorme and S. Makeig. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of neuroscience methods, 134(1):9–21, 2004.
- M. O. Ernst and M. S. Banks. Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870):429, 2002.
- Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
- A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, pages 6645–6649. IEEE, 2013.
- M. E. Hopkins, F. C. Davis, M. R. VanTieghem, P. J. Whalen, and D. J. Bucci. Differential effects of acute and regular physical exercise on cognition and affect. Neuroscience, 215:59–68, 2012.
- S. Houweling, A. Daffertshofer, B. W. van Dijk, and P. J. Beek. Neural changes induced by learning a challenging perceptual-motor task. Neuroimage, 41(4):1395–1407, 2008.
- A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014.
- D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
- C. Neuper and G. Pfurtscheller. Event-related dynamics of cortical rhythms: frequency-specific features and functional correlates. International journal of psychophysiology, 43(1):41–58, 2001.
- F. Ostadan, C. Centeno, J.-F. Daloze, M. Frenn, J. Lundbye-Jensen, and M. Roig. Changes in corticospinal excitability during consolidation predict acute exercise-induced off-line gains in procedural memory. Neurobiology of learning and memory, 136:196–203, 2016.
- G. Pfurtscheller, B. Graimann, J. E. Huggins, S. P. Levine, and L. A. Schuh. Spatiotemporal patterns of beta desynchronization and gamma synchronization in corticographic data during self-paced movement. Clinical neurophysiology, 114(7):1226–1236, 2003.
- S. M. Plis, D. R. Hjelm, R. Salakhutdinov, E. A. Allen, H. J. Bockholt, J. D. Long, H. J. Johnson, J. S. Paulsen, J. A. Turner, and V. D. Calhoun. Deep learning for neuroimaging: a validation study. Frontiers in neuroscience, 8:229, 2014.
- B. Pollok, D. Latz, V. Krause, M. Butz, and A. Schnitzler. Changes of motor-cortical oscillations associated with motor learning. Neuroscience, 275:47–53, 2014.
- E. M. Robertson, A. Pascual-Leone, and R. C. Miall. Current concepts in procedural consolidation. Nature Reviews Neuroscience, 5(7):576, 2004.
- M. Roig, S. Nordbrandt, S. S. Geertsen, and J. B. Nielsen. The effects of cardiovascular exercise on human memory: a review with meta-analysis. Neuroscience & Biobehavioral Reviews, 37(8):1645–1666, 2013.
- R. Salmelin, M. Hämäläinen, M. Kajola, and R. Hari. Functional segregation of movement-related rhythmic activity in the human brain. Neuroimage, 2(4):237–243, 1995.
- R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball. Deep learning with convolutional neural networks for EEG decoding and visualization. Human brain mapping, 38(11):5391–5420, 2017.
- R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. See https://arxiv.org/abs/1610.02391v3, 7(8), 2016.
- K. Skriver, M. Roig, J. Lundbye-Jensen, J. Pingel, J. W. Helge, B. Kiens, and J. B. Nielsen. Acute exercise improves motor memory: exploring potential biomarkers. Neurobiology of learning and memory, 116:46–58, 2014.
- F. Tadel, S. Baillet, J. C. Mosher, D. Pantazis, and R. M. Leahy. Brainstorm: a user-friendly application for MEG/EEG analysis. Computational intelligence and neuroscience, 2011:8, 2011.
- P. Thodoroff, J. Pineau, and A. Lim. Learning robust features using deep learning for automatic seizure detection. In Machine Learning for Healthcare Conference, pages 178–190, 2016.
- E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko. Simultaneous deep transfer across domains and tasks. In Computer Vision (ICCV), 2015 IEEE International Conference on, pages 4068–4076. IEEE, 2015.
- X. Zhang and Y. LeCun. Text understanding from scratch. arXiv preprint arXiv:1502.01710, 2015.
- B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pages 2921–2929. IEEE, 2016.
- P. Zhuang, C. Toro, J. Grafman, P. Manganotti, L. Leocani, and M. Hallett. Event-related desynchronization (ERD) in the alpha frequency during development of implicit and explicit learning. Electroencephalography and clinical neurophysiology, 102(4):374–381, 1997.