Capsule Attention for Multimodal EEG and EOG Spatiotemporal Representation Learning with Application to Driver Vigilance Estimation


Driver vigilance estimation is an important task for transportation safety. Wearable and portable brain-computer interface devices provide a powerful means for real-time monitoring of driver vigilance, thus helping to avoid distracted or impaired driving. However, most current works in the area lack an effective framework for learning the part-whole relationships within the data and for learning useful spatiotemporal representations. To tackle this problem and other issues associated with multimodal biological signal analysis, we propose a novel multimodal architecture for in-vehicle vigilance estimation from Electroencephalogram and Electrooculogram, composed of a capsule attention mechanism following a deep Long Short-Term Memory (LSTM) network. Our model learns both temporal and hierarchical/spatial dependencies in the data through the LSTM and capsule feature representation layers. To better explore the discriminative ability of the learned representations, we study the effect of the proposed capsule attention mechanism, including the number of dynamic routing iterations as well as other parameters. Experiments show the robustness of our method, which outperforms other solutions and baseline techniques, setting a new state-of-the-art.

1 Introduction

Recent advances in driver monitoring using modern sensing technologies have the potential to reduce the number of driving accidents, especially those occurring due to driver fatigue, distraction, and influence of illegal substances. Accordingly, recent studies have tackled the notion of measuring and monitoring driver awareness, also referred to as vigilance [27]. For example, in recent years, wireless and wearable devices have been used to collect signals such as Electroencephalogram (EEG) and Electrooculogram (EOG) for estimation of driver alertness [15, 13, 16].

In general, EEG, which captures brain activity recorded from the scalp, is influenced by factors such as fatigue and alertness during different activities such as driving [20]. Similarly, EOG, which measures the potential between the front (cornea) and back (retina) of the human eyes and is recorded from the forehead [16], contains information regarding vigilance and eye movements (e.g., blinking and saccades) [7]. The fusion of EEG and EOG (multimodal) has subsequently been utilized for the analysis of vigilance, showing clear advantages over EEG or EOG alone [27]. Due to the difficulty of multimodal spatiotemporal learning, many studies in the field formulate the problem as classification, sometimes even as a binary problem. Nonetheless, we believe the more challenging formulation of the problem as regression is better suited to continuous, higher-resolution tracking and to application in real systems [22].

Despite the viability of utilizing EEG and EOG for in-vehicle vigilance estimation, this task remains a challenging one due to a number of open problems:

  • Much like other biological signals, EEG and EOG are often contaminated by environmental artifacts and noise. Moreover, EEG and EOG are susceptible to artifacts caused by motion and muscle activity such as jaw motion, frowning, and others, making their interpretation particularly challenging [2].

  • EEG recordings suffer from the inherent issue of lack of control on subjects’ thoughts and mental activity, unlike videos and images where physical activity and protocols can be highly controlled [3].

  • Multimodal analysis of biological signals is very difficult since identifying the complementary and contradicting information in the available signals is a challenging task. Furthermore, the lack of ideal inter- and intra-modality synchronization is another challenge often associated with multimodal signal analysis [14].

Figure 1: The overview of the experiment work-flow is presented.

We believe the solution to the problems mentioned above lies in an architecture capable of learning the temporal relationships followed by the ability to focus on certain sections within the learned representations in order to selectively attend to different parts of the data given the redundant, complementary, uncertain, or noisy information. As a result, in this paper, in order to perform driver monitoring through vigilance estimation, we propose a novel solution that first encodes the temporal information from the multimodal EEG-EOG data through a deep LSTM network, and then learns the hierarchical dependencies and part-whole relationships in the learned representations through a capsule attention mechanism. We compare our proposed model to a number of other works including past published methods and our own baselines. We illustrate that our model significantly outperforms the state-of-the-art solutions in both intra-subject and cross-subject validation schemes, with lower Root Mean Square Error (RMSE) and higher Pearson Correlation Coefficient (PCC). An overview of the system is illustrated in Figure 1.

2 Related Work

EEG-EOG Vigilance Estimation

Several conventional machine learning solutions have been proposed for driving vigilance evaluation. For example, Support Vector Regression (SVR) has been employed on EEG, EOG, and multimodal EEG-EOG features, demonstrating that EEG and EOG carry complementary information for vigilance estimation [27]. Two probabilistic models, notably the Continuous Conditional Random Field (CCRF) and the Continuous Conditional Neural Field (CCNF), have also been employed for multimodal vigilance estimation [27]. The superiority of multimodal vigilance estimation is further confirmed using the Graph-regularized Extreme Learning Machine (GELM), which achieves better performance with multimodal EEG and EOG than with either modality alone [10].

Several deep learning networks have also been used in vigilance estimation. Du et al. [5] employ a multimodal deep autoencoder. Zhang et al. [25] use an LSTM network, reporting a considerable improvement using feature fusion over single-mode EEG and EOG. Wu et al. [22] utilize a Double-layered Neural Network with Subnetwork Nodes (DNNSN) along with multimodal feature selection using an autoencoder, and obtain impressive results. Li et al. [14] employ two domain adaptation networks, notably the Domain-Adversarial Neural Network (DANN) and Adversarial Discriminative Domain Adaptation (ADDA), with feature fusion.

Soft Attention

Architectures based on LSTM with Soft Attention (SoftAtt) mechanisms were recently proposed for Natural Language Processing (NLP) [21] and have since been used for other applications, including EEG analysis [24]. This mechanism results in better feature representation learning by assigning learned weights to LSTM cell outputs.

Capsule Attention

The capsule network was proposed by Sabour et al. [18] and has shown strong performance in learning hierarchical relationships in the input data, outperforming other deep learning architectures in a number of applications such as facial expression recognition [9] and infrared facial image recognition [19]. These networks were proposed to capture important high-level information by learning part-whole relationships using capsules (groups of neurons) with dynamic routing, overcoming a number of limitations of CNNs and RNNs [18]. While capsule networks can be used on their own for representation learning, in this paper we use them as a form of attention mechanism following a deep LSTM network. Capsule attention employs routing-by-agreement to enable the lower level capsules to learn what needs to be paid attention to given the feedback from higher level capsules. Lower level capsules then route to the higher level capsules by similarity agreement. This concept has recently been applied to state-of-the-art NLP relation extraction [26] and visual question answering [28].

Figure 2: The architecture of our proposed method is presented.

3 Proposed Architecture

3.1 Problem Setup

Suppose $X = \{x_i^s\}$ and $Y = \{y_i^s\}$ denote the set of input data and labels, where $i$ and $s$ denote the sample and subject indices respectively, $n$ is the number of samples belonging to each subject, and $S$ is the total number of subjects. Due to the biological differences among subjects, and even within the same subject at different times, biological signals, especially EEG and EOG, are very subject- and session-dependent [23]. This phenomenon has resulted in the adoption of distinct intra- and cross-subject validation schemes:

i) Intra-subject scheme: In this validation scheme, we equally split each subject's data into $k$ folds. For the $j$th iteration for the $s$th subject ($j \in \{1, \dots, k\}$), the $j$th fold is held out as the test set, while the remaining $k-1$ folds are used for training.

ii) Cross-subject scheme: In this validation scheme, for the $s$th experiment ($s \in \{1, \dots, S\}$), the data from subject $s$ form the test set, while the data from the remaining $S-1$ subjects are used for training.

3.2 Solution Overview

We design our model with the aim of learning spatiotemporal dependencies and discriminative information from the multimodal data. To achieve this, an LSTM is first used to learn the temporal dependencies in the data. Next, to deal with the inherent challenges in multimodal biological data described earlier in the Introduction (e.g., complementary or contradictory information, lack of control on subject mental activity, and others), we propose the use of capsule attention to learn the part-whole hierarchical relationships in the representations received from the LSTM outputs. This section describes our model, which consists of five layers, namely the input representation layer, LSTM layer, lower level capsule layer, higher level capsule layer, and regression layer, as illustrated in Figure 2. Our proposed architecture allows the temporal representations learned by the LSTM to then be further mined for part-whole hierarchical spatial relationships by the capsule attention through dynamic routing. Thus, capsule attention allows the model to learn which temporal representations to pay more attention to, given the uncertainties in the data described in the Introduction.

3.3 Input Representation Layer

This layer encodes the input bio-signals as extracted fused features with three steps, namely data pre-processing, feature extraction, and feature fusion.

Data Pre-processing

Both EEG and EOG are first downsampled, followed by a notch filter removing power line interference and a band-pass filter minimizing artifacts such as noise [27]. Min-max normalization of the signal amplitudes is employed to re-scale the biological time-series for each subject, thus minimizing the differences in signal amplitudes across different subjects and signals.
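The per-subject min-max step can be sketched as follows; the [0, 1] target range, the variable names, and the (channels, samples) layout are illustrative assumptions, not the paper's exact settings:

```python
# Hedged sketch: per-subject min-max scaling of a multichannel recording.
import numpy as np

def minmax_normalize(signal: np.ndarray) -> np.ndarray:
    """Re-scale each channel of a (channels, samples) array to [0, 1]."""
    lo = signal.min(axis=1, keepdims=True)
    hi = signal.max(axis=1, keepdims=True)
    return (signal - lo) / (hi - lo + 1e-12)  # epsilon guards flat channels

# Example: two channels with very different amplitude ranges.
eeg = np.array([[10.0, 20.0, 30.0], [-1.0, 0.0, 1.0]])
print(minmax_normalize(eeg))
```

Scaling each subject separately, as above, is what keeps inter-subject amplitude differences from dominating the learned features.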

Feature Extraction

EEG signals are divided into non-overlapping segments, where the Short-Time Fourier Transform (STFT) is used to calculate time-frequency features from overlapping windows using a Hanning window. The log of the Power Spectral Density (PSD) and Differential Entropy (DE) are calculated on the STFT outputs [27]. Since the band-filtered signals approximately follow a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, DE has the closed form $\mathrm{DE} = \frac{1}{2}\log(2\pi e \sigma^2)$.
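Under the Gaussian assumption, the DE feature reduces to a closed form computed from the per-window variance. A minimal sketch follows; the window length and the simple log-power surrogate for the PSD are assumptions for illustration, not the paper's exact pipeline:

```python
# Hedged sketch of the differential-entropy (DE) feature: for a band-limited
# signal assumed Gaussian, DE = 0.5 * log(2*pi*e*sigma^2).
import numpy as np

def differential_entropy(window: np.ndarray) -> float:
    """DE of one (filtered) signal window under a Gaussian assumption."""
    var = np.var(window)
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

def log_psd(window: np.ndarray) -> float:
    """Log average power of the window (a simple log-PSD surrogate)."""
    return np.log(np.mean(window ** 2))

rng = np.random.default_rng(0)
w = rng.normal(0.0, 2.0, size=8000)   # synthetic window, sigma = 2
print(differential_entropy(w))        # close to 0.5*log(2*pi*e*4)
```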

For EOG, we extract time-domain features, namely mean, variance, maximum, minimum, and power during blinking, saccade, and fixation from the EOG channels as described in [27].

Feature Fusion

EEG and EOG features are fused by concatenation: $f_i = [f_i^{\mathrm{EEG}}; f_i^{\mathrm{EOG}}]$, where $i$ denotes the $i$th feature sample and $N$ is the total number of feature samples for each subject. Accordingly, the dimension of the fused feature vector depends on $c_{\mathrm{EEG}}$ and $c_{\mathrm{EOG}}$, the number of channels of the EEG and EOG signals respectively.
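Feature-level fusion amounts to concatenating the per-sample EEG and EOG feature vectors; the array sizes below are placeholders for illustration, not the dataset's actual channel or sample counts:

```python
# Hedged sketch of feature-level fusion by concatenation.
import numpy as np

def fuse(eeg_feats: np.ndarray, eog_feats: np.ndarray) -> np.ndarray:
    """Concatenate per-sample EEG and EOG features along the feature axis."""
    return np.concatenate([eeg_feats, eog_feats], axis=-1)

eeg_feats = np.zeros((885, 25 * 5))   # samples x (channels * bands), illustrative
eog_feats = np.ones((885, 36))        # samples x EOG features, illustrative
fused = fuse(eeg_feats, eog_feats)
print(fused.shape)                    # (885, 161)
```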

3.4 Long Short-Term Memory Layer

Our LSTM network [8] employs a number of cells, the outputs of which are modified through the network by past information. Long-term dependencies are kept through the cells along the LSTM sequence using the common cell state. An input gate and a forget gate control the information flow and determine if the previous state needs to be forgotten or if the current state needs to be updated based on the latest inputs. An output gate computes the output based on updated information from the cell state.

3.5 Feature Representation Layer

This layer employs a lower level capsule layer and a higher level capsule layer to capture and cluster the representation of lower level features and higher level features with dynamic routing.

Lower Level Capsule Layer

The output from each of the $L$ LSTM cells with $M$ hidden units is first reshaped into a grid of capsules. Then we split the LSTM outputs into $C$ channels of $d$-dimensional capsules, and within each, a convolution operation with an $e \times e$ kernel and stride of $g$ is employed. Accordingly, we produce lower level capsules where each contains a $d$-dimensional vector. Thus, each lower level capsule is represented as a vector $u_i$.

Higher Level Capsule Layer

This layer consists of a $K \times H$ matrix, where $K$ is the number of higher level capsules and $H$ is the dimension of each higher level capsule $v_j$.

Dynamic Routing

The length of the higher level capsule output $v_j$ can be considered as the probability of existence of that higher level representation. Therefore, a non-linear squashing function is employed to normalize the length of $v_j$ into the range $[0, 1)$ while its direction remains unchanged. The squashing operation is performed as $v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \frac{s_j}{\|s_j\|}$, where $s_j = \sum_i c_{ij} \hat{u}_{j|i}$ is a weighted sum of the prediction vectors $\hat{u}_{j|i}$ from lower level capsule $i$ to higher level capsule $j$. Each $\hat{u}_{j|i}$ is calculated by the multiplication of a weight matrix $W_{ij} \in \mathbb{R}^{H \times d}$ and a lower level capsule output $u_i$, i.e., $\hat{u}_{j|i} = W_{ij} u_i$.

Coupling coefficients $c_{ij}$ between a lower level capsule $i$ and all the higher level capsules denote the probability of capsule $i$ being coupled to capsule $j$, where $c_{ij}$ is calculated using a softmax over the logits $b_{ij}$: $c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$. The $b_{ij}$ are the log prior probabilities, and the softmax ensures that $\sum_j c_{ij} = 1$. Dynamic routing performs routing-by-agreement between $\hat{u}_{j|i}$ and $v_j$. The feature representation layer employs a dynamic routing algorithm to update the zero-initialized $b_{ij}$ by evaluating the consistency between $\hat{u}_{j|i}$ and $v_j$ with the inner product $\hat{u}_{j|i} \cdot v_j$. $b_{ij}$ is updated to a higher value if $\hat{u}_{j|i}$ and $v_j$ have a strong agreement; otherwise, a lower value is assigned to $b_{ij}$. To learn the part-whole relationships, Algorithm 1 is used, where $\Omega_l$ denotes the set of capsules in layer $l$.

1: procedure Routing(û_{j|i}, r, l)
2:     Log prior probability initialization: b_{ij} ← 0 for all capsule i ∈ Ω_l, capsule j ∈ Ω_{l+1}
3:     for r iterations do
4:         for all capsule i ∈ Ω_l do
5:             c_i ← softmax(b_i)
6:         end for
7:         for all capsule j ∈ Ω_{l+1} do
8:             s_j ← Σ_i c_{ij} û_{j|i};   v_j ← squash(s_j)
9:         end for
10:        for all capsule i ∈ Ω_l, capsule j ∈ Ω_{l+1} do
11:            b_{ij} ← b_{ij} + û_{j|i} · v_j
12:        end for
13:    end for
14:    return v_j
15: end procedure
Algorithm 1 Dynamic Routing Algorithm
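The routing procedure above can be sketched in a few lines of NumPy. This follows the standard routing-by-agreement of Sabour et al.; the capsule counts and dimensions below are illustrative, not the tuned values of Table 1:

```python
# Hedged sketch of squashing and dynamic routing-by-agreement.
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vector length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, r=3):
    """u_hat: (num_lower, num_higher, H) prediction vectors -> v: (num_higher, H)."""
    n_low, n_high, _ = u_hat.shape
    b = np.zeros((n_low, n_high))                             # zero-initialized log priors
    for _ in range(r):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over higher capsules j
        s = np.einsum('ij,ijh->jh', c, u_hat)                 # weighted sum per higher capsule
        v = squash(s)                                         # squashed higher-level outputs
        b = b + np.einsum('ijh,jh->ij', u_hat, v)             # agreement update
    return v

rng = np.random.default_rng(1)
v = route(rng.normal(size=(8, 4, 16)), r=3)   # 8 lower caps, 4 higher caps, H=16
print(v.shape)   # (4, 16)
```

Note that the squashed output lengths stay strictly below 1, which is what lets them act as existence probabilities for the higher level representations.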

3.6 Regression Layer

This layer contains a fully connected layer with a Tanh activation to ensure that the network predictions cover the range of recorded vigilance scores.

4 Experiment Setup

In order to evaluate the performance of our proposed solution for multimodal vigilance estimation, we conduct the following experiments.

4.1 Dataset

SEED-VIG is a large dataset for vigilance estimation where the data were collected from multiple subjects [27]. Both EEG and EOG were collected using the ESI Neuroscan system. EEG channels were recorded from the temporal and posterior brain regions and EOG channels were collected from the forehead. Subjects were required to drive the simulated car in a virtual environment. Most of the subjects were asked to perform the simulation after lunch to increase the possibility of fatigue [27, 6]. SMI eye-tracking glasses were used to record several eye movements including blinks, eye closures (CLOS), saccades, and fixations. Accordingly, the vigilance score, PERCLOS [4], is calculated as the percentage of blinks plus CLOS over the total duration of these four activities: $\mathrm{PERCLOS} = \frac{t_{blink} + t_{CLOS}}{t_{blink} + t_{CLOS} + t_{fixation} + t_{saccade}}$.
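The PERCLOS label described above can be sketched directly from the four tracked activity durations (the duration values below are illustrative):

```python
# Hedged sketch of the PERCLOS vigilance label: fraction of time spent
# blinking or with eyes closed over the total of the four eye activities.
def perclos(blink: float, clos: float, fixation: float, saccade: float) -> float:
    """PERCLOS = (blink + CLOS) / (blink + CLOS + fixation + saccade)."""
    return (blink + clos) / (blink + clos + fixation + saccade)

print(perclos(blink=2.0, clos=3.0, fixation=10.0, saccade=5.0))  # 0.25
```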

4.2 Implementation Details

In our experiments, in order to solve the problem of different ranges and distribution of fused features, we employ a batch normalization layer [11] followed by a Leaky ReLu [17] activation layer before each LSTM layer and lower level capsule layer, thus normalizing, re-scaling, and shifting the fused features. Batch normalization is not employed after the lower level capsule layer due to its negative effect on the squashing function. We employ Mean Square Error (MSE) as the loss function and Adam optimizer [12] to help minimize the loss. We use the default values of Adam optimizer [12] and Batch normalization layers [11] to efficiently train our proposed model. We empirically tune the hyper-parameters of the network to achieve the best performance. The list of hyper-parameter settings is presented in Table 1. The pipeline is implemented using TensorFlow [1] on a pair of NVIDIA RTX 2080Ti GPUs.

Layers Parameters Value
Model Batch size
Training epochs
LSTM Recurrent depth
Hidden layer units M
No. of cells L
Leaky ReLu Slope
Lower Level Caps Kernel size e
Stride g
No. of channels C
Dimension size d
Caps channel grid
Higher Level Caps No. of representations K
Dimension size H
Dynamic Routing Routing iterations r
Regression Activation Tanh
Table 1: Training Hyper-Parameters

4.3 Evaluation Method

To evaluate the performance of our regression method, the following two metrics are utilized, similar to other works in the area: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$ and $\mathrm{PCC} = \frac{\sum_{i=1}^{N}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}\sqrt{\sum_{i=1}^{N}(\hat{y}_i - \bar{\hat{y}})^2}}$, where $y$ is the vector of ground-truth PERCLOS labels and $\hat{y}$ is the vector of predicted labels for all $N$ samples. $y_i$ and $\hat{y}_i$ are the ground truth and prediction for sample $i$, and $\bar{y}$ and $\bar{\hat{y}}$ are the mean ground truth and predicted ratings over all the samples.
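Both metrics can be sketched as follows (the sample vectors are illustrative):

```python
# Hedged sketch of the two regression metrics: RMSE and Pearson correlation.
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between ground truth and predictions."""
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))

def pcc(y, y_hat):
    """Pearson correlation coefficient between ground truth and predictions."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    yc, pc = y - y.mean(), y_hat - y_hat.mean()
    return float(np.sum(yc * pc) / np.sqrt(np.sum(yc ** 2) * np.sum(pc ** 2)))

y_true = [0.1, 0.4, 0.35, 0.8]
y_pred = [0.15, 0.38, 0.30, 0.72]
print(rmse(y_true, y_pred), pcc(y_true, y_pred))
```

Lower RMSE and higher PCC are better, which is why both are reported together throughout the comparison tables.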

We use both intra-subject and cross-subject validation schemes to evaluate the model performance in detail. We follow the same protocol as the works mentioned in the Related Work section. We employ a $k$-fold cross-validation method for the intra-subject scheme, where the data for each subject are randomly shuffled before being divided into $k$ folds. No overlap exists between the testing and training data. To perform cross-subject validation, we employ Leave-One-Subject-Out (LOSO) cross-validation, where the data from all but one subject are used for training, and the remaining subject is used for testing. LOSO validation is critical in examining the subject-dependency of our method.
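The two schemes can be sketched with simple index splitters; the fold count, sample counts, and subject labels below are illustrative:

```python
# Hedged sketch of the intra-subject (k-fold) and cross-subject (LOSO) splits.
import numpy as np

def intra_subject_folds(n_samples: int, k: int, seed: int = 0):
    """Shuffle one subject's sample indices and yield (train, test) per fold."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for j in range(k):
        test = folds[j]
        train = np.concatenate([folds[m] for m in range(k) if m != j])
        yield train, test

def loso_folds(subject_ids):
    """Leave-one-subject-out: hold out all samples of one subject per fold."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        yield np.where(subject_ids != s)[0], np.where(subject_ids == s)[0]

# Example: 10 samples over 5 folds, then 3 subjects with 4 samples each.
for train, test in intra_subject_folds(10, 5):
    assert len(set(train.tolist()) & set(test.tolist())) == 0  # no train/test overlap
print(sum(1 for _ in loso_folds([0] * 4 + [1] * 4 + [2] * 4)))  # 3
```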

4.4 Comparison

State-of-the-art methods

As described in the Related Work section, a number of solutions have been proposed for this dataset. Here, we further describe the state-of-the-art solutions in both intra-subject and cross-subject validation scenarios:

  • Huo et al. [10] employ GELM by integrating graph regularization into ELM, thus establishing an adjacency graph and constraining output weights by learning the similarity among sample outputs and their nearest neighbors. Two fusion methods are proposed in order to achieve better performance in intra-subject validation, with feature-level fusion helping the GELM model achieve the best performance.

  • Li et al. [14] propose two multimodal domain adaptation networks, notably DANN and ADDA, based on feature fusion of EEG and EOG, optimizing the transfer from data into the feature space. Both DANN and ADDA employ adversarial training to minimize prediction loss by eliminating domain shift between the source (training set) and target (testing set) domains. Feature-level fusion is applied to obtain the best results for cross-subject estimation of vigilance scores.

Baseline models

In addition to the methods published in the literature, we also implement four models for further benchmarking our proposed architecture. First, we utilize a 2D CNN with three convolutional layers, each with ReLu activations. Second, we utilize stacked LSTM layers. Third, we implement a cascade convolutional recurrent neural network (CNN-LSTM) by reproducing the same method used in [23]. And lastly, we implement a capsule attention network with the same parameters as our proposed model. For all these baseline methods, we implement a fully connected layer followed by a Tanh activation function in order to perform the regression task. The parameters of all the baseline methods are tuned empirically to achieve the best results. Our implementation details and hyper-parameters in the baseline LSTM architecture (e.g., output layer activation function, number of LSTM units, optimizers, and training epochs) are different from [25]. Moreover, instead of dropout [25], we employ batch normalization followed by a Leaky ReLu, which significantly improves the results.

[27] SVR
[27] CCRF
[27] CCNF
[5] DAE
[10] GELM
[25] LSTM
[22] DNNSN
Ours (baseline) CNN
Ours (baseline) LSTM
Ours (baseline) CNN-LSTM
Ours (baseline) CapsNet
Ours LSTM-CapsAtt
Table 2: The performance of our proposed model in comparison to different solutions using intra-subject validation.
[14] DANN
[14] ADDA
Ours (baseline) CNN
Ours (baseline) LSTM
Ours (baseline) CNN-LSTM
Ours (baseline) CapsNet
Ours LSTM-CapsAtt
Table 3: The performance of our proposed model in comparison to different solutions using cross-subject validation.

5 Results

In this section, we present the results of our proposed architecture and compare the performance to other published solutions, as well as the baseline methods, in both intra-subject and cross-subject schemes. Additionally, we investigate the effects of variations in the model architecture, routing iterations, and different attention mechanisms.


Tables 2 and 3 present the performance of our proposed architecture in comparison to the other aforementioned methods for both validation scenarios. The evaluation metrics RMSE and PCC listed in the tables are achieved using multimodal EEG and EOG. It is observed that the LSTM-CapsAtt model achieves state-of-the-art results by outperforming both previous solutions and baseline methods, based on both RMSE and PCC values. This confirms that the obtained embeddings in the high-level capsule layer (see Figure 2) are informative for multimodal vigilance estimation. Since the improvement in the cross-subject validation scheme is larger than in intra-subject validation, it can be concluded that the representations obtained through our capsule attention mechanism are more discriminative for learning high-level subject-independent attributes, contributing to the more difficult task of cross-subject validation.

Effect of LSTM Architecture

Here, we evaluate the effect of several important parameters, notably the number of stacked LSTM layers and the activation function used for the regression layer, on the results. The performances are outlined in Table 4 and Table 5 for intra-subject and cross-subject validations respectively. The results show that three stacked LSTM layers help our model achieve the best results in both validation scenarios. Activation functions also play a critical role in the regression model, where Tanh outperforms the Sigmoid and ReLu activation functions for the proposed model in all the scenarios with different stacked LSTM layers. ReLu performs poorly in the proposed model mainly due to the lack of constraint on the model output.

1-layer LSTM-CapsAtt (ReLu)
2-layer LSTM-CapsAtt (ReLu)
3-layer LSTM-CapsAtt (ReLu)
4-layer LSTM-CapsAtt (ReLu)
5-layer LSTM-CapsAtt (ReLu)
1-layer LSTM-CapsAtt (Sigmoid)
2-layer LSTM-CapsAtt (Sigmoid)
3-layer LSTM-CapsAtt (Sigmoid)
4-layer LSTM-CapsAtt (Sigmoid)
5-layer LSTM-CapsAtt (Sigmoid)
1-layer LSTM-CapsAtt (Tanh)
2-layer LSTM-CapsAtt (Tanh)
3-layer LSTM-CapsAtt (Tanh)
4-layer LSTM-CapsAtt (Tanh)
5-layer LSTM-CapsAtt (Tanh)
Table 4: Comparison of our proposed model with different variants using intra-subject validation.
1-layer LSTM-CapsAtt (ReLu)
2-layer LSTM-CapsAtt (ReLu)
3-layer LSTM-CapsAtt (ReLu)
4-layer LSTM-CapsAtt (ReLu)
5-layer LSTM-CapsAtt (ReLu)
1-layer LSTM-CapsAtt (Sigmoid)
2-layer LSTM-CapsAtt (Sigmoid)
3-layer LSTM-CapsAtt (Sigmoid)
4-layer LSTM-CapsAtt (Sigmoid)
5-layer LSTM-CapsAtt (Sigmoid)
1-layer LSTM-CapsAtt (Tanh)
2-layer LSTM-CapsAtt (Tanh)
3-layer LSTM-CapsAtt (Tanh)
4-layer LSTM-CapsAtt (Tanh)
5-layer LSTM-CapsAtt (Tanh)

Table 5: Comparison of our proposed model with different variants using cross-subject validation.

Effect of Routing Iterations

To investigate the effect of routing iterations on our proposed model, we conduct experiments with different numbers of routing iterations using both validation scenarios. Figure 3 shows the calculated MSE loss of the model for 30 training epochs. The model achieves the best results with 3 iterations, showing fast convergence of the dynamic routing algorithm in conformity with [18].

Figure 3: Effect of routing iterations.

Effect of Attention Mechanisms

We evaluate different attention mechanisms in comparison to our proposed model. To this end, we employ LSTM-CNN and LSTM-SoftAtt architectures using the same LSTM settings. The LSTM-CNN architecture employs CNN based attention with the same parameters as the baseline CNN model and the LSTM-SoftAtt model employs the same soft attention mechanism as described in [21]. All the above-mentioned models have the same fully-connected layer with Tanh activation. These settings were selected to maximize performance. As shown in Tables 6 and 7, our approach outperforms the other solutions by achieving the best RMSE and PCC values in both validation scenarios.

LSTM-CapsAtt (ours)
Table 6: Comparison of our model with other attention mechanisms using intra-subject validation.
LSTM-CapsAtt (ours)
Table 7: Comparison of our model with other attention mechanisms using cross-subject validation.

6 Conclusions

To the best of our knowledge, this is the first time that an LSTM-CapsAtt architecture is used for bio-signals. In this paper, we propose a novel multimodal approach based on this architecture for in-vehicle vigilance estimation using EEG and EOG. This model extracts lower level hierarchical information using a lower level capsule layer and further captures and clusters these representations with a higher level capsule layer, where part-whole relationships in the features are explored using dynamic routing. The experiments show the generalizability of our model by achieving state-of-the-art results in both intra-subject and cross-subject validation scenarios. The results confirm the impact of capsule attention on multimodal spatiotemporal representation learning, in this case, in the context of learning EEG and EOG for in-vehicle driver vigilance estimation. Our proposed architecture is capable of dealing with uncertainties such as lack of control over participants, biological differences, noise, and contradicting information between modalities by learning the hierarchical information in the learned temporal dependencies.




  1. M. Abadi (2016) Tensorflow: a system for large-scale machine learning. In 12th Symposium on Operating Systems Design and Implementation, pp. 265–283. Cited by: §4.2.
  2. M. Chaumon, D. V. Bishop and N. A. Busch (2015) A practical guide to the selection of independent components of the electroencephalogram for artifact correction. Journal of Neuroscience Methods 250, pp. 47–63. Cited by: 1st item.
  3. E. A. Curran and M. J. Stokes (2003) Learning to control brain activity: a review of the production and control of eeg components for driving brain-computer interface (bci) systems. Brain and Cognition 51 (3), pp. 326–336. Cited by: 2nd item.
  4. D. F. Dinges and R. Grace (1998) PERCLOS: a valid psychophysiological measure of alertness as assessed by psychomotor vigilance. US Department of Transportation, Federal Highway Administration, Publication Number FHWA-MCRT-98-006. Cited by: §4.1.
  5. L. Du, W. Liu, W. Zheng and B. Lu (2017) Detecting driving fatigue with multimodal deep learning. In 8th International IEEE/EMBS Conference on Neural Engineering, pp. 74–77. Cited by: Table 2.
  6. M. Ferrara and L. De Gennaro (2001) How much sleep do we need?. Sleep Medicine Reviews 5 (2), pp. 155–179. Cited by: §4.1.
  7. N. Galley (1993) The evaluation of the electrooculogram as a psychophysiological measuring instrument in the driver study of driver behaviour. Ergonomics 36 (9), pp. 1063–1070. Cited by: §1.
  8. K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink and J. Schmidhuber (2016) LSTM: a search space odyssey. IEEE Trans. on Neural Networks and Learning Systems 28 (10), pp. 2222–2232. Cited by: §3.4.
  9. S. Hosseini and N. I. Cho (2019) GF-capsnet: using gabor jet and capsule networks for facial age, gender, and expression recognition. In 14th IEEE International Conference on Automatic Face & Gesture Recognition, pp. 1–8. Cited by: §2.0.3.
  10. X. Huo, W. Zheng and B. Lu (2016) Driving fatigue detection with fusion of eeg and forehead eog. In IEEE International Joint Conference on Neural Networks, pp. 897–904. Cited by: §2.0.1, Table 2.
  11. S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §4.2.
  12. D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.2.
  13. G. S. Larue, A. Rakotonirainy and A. N. Pettitt (2011) Driving performance impairments due to hypovigilance on monotonous roads. Accident Analysis & Prevention 43 (6), pp. 2037–2046. Cited by: §1.
  14. H. Li, W. Zheng and B. Lu (2018) Multimodal vigilance estimation with adversarial domain adaptation networks. In IEEE International Joint Conference on Neural Networks, pp. 1–6. Cited by: 3rd item, Table 3.
  15. C. Lin, C. Chuang, C. Huang, S. Tsai, S. Lu, Y. Chen and L. Ko (2014) Wireless and wearable eeg system for evaluating driver vigilance. IEEE Trans. on Biomedical Circuits and Systems 8 (2), pp. 165–176. Cited by: §1.
  16. J. Ma, L. Shi and B. Lu (2014) An eog-based vigilance estimation method applied for driver fatigue detection. Neuroscience and Biomedical Engineering 2 (1), pp. 41–51. Cited by: §1, §1.
  17. A. L. Maas, A. Y. Hannun and A. Y. Ng (2013) Rectifier nonlinearities improve neural network acoustic models. In International Conference on Machine Learning, Vol. 30, pp. 3. Cited by: §4.2.
  18. S. Sabour, N. Frosst and G. E. Hinton (2017) Dynamic routing between capsules. In Advances in Neural Information Processing Systems, pp. 3856–3866. Cited by: §2.0.3, §5.0.3.
  19. A. Vinay, A. Gupta, A. Bharadwaj, A. Srinivasan, A. Murthy and S. Natarajan (2018) Optimal search space strategy for infrared facial image recognition using capsule networks. In International Conference on Recent Trends in Image Processing and Pattern Recognition, pp. 454–465. Cited by: §2.0.3.
  20. H. Wang, C. Zhang, T. Shi, F. Wang and S. Ma (2015) Real-time eeg-based detection of fatigue driving danger for accident prediction. International Journal of Neural Systems 25 (02), pp. 1550002. Cited by: §1.
  21. Y. Wang, M. Huang and L. Zhao (2016) Attention-based lstm for aspect-level sentiment classification. In Proceedings of The 2016 Conference on Empirical Methods In Natural Language Processing, pp. 606–615. Cited by: §2.0.2, §5.0.4.
  22. W. Wu, Q. J. Wu, W. Sun, Y. Yang, X. Yuan, W. Zheng and B. Lu (2018) A regression method with subnetwork neurons for vigilance estimation using eog and eeg. IEEE Trans. on Cognitive and Developmental Systems. Cited by: §1, Table 2.
  23. D. Zhang, L. Yao, X. Zhang, S. Wang, W. Chen, R. Boots and B. Benatallah (2018) Cascade and parallel convolutional recurrent neural networks on EEG-based intention recognition for brain computer interface. In AAAI Conference on Artificial Intelligence, pp. 1703–1710. Cited by: §3.1, §4.4.2.
  24. G. Zhang, V. Davoodnia, A. Sepas-Moghaddam, Y. Zhang and A. Etemad (2019) Classification of hand movements from eeg using a deep attention-based lstm network. arXiv preprint arXiv:1908.02252. Cited by: §2.0.2.
  25. N. Zhang, W. Zheng, W. Liu and B. Lu (2016) Continuous vigilance estimation using lstm neural networks. In International Conference on Neural Information Processing, pp. 530–537. Cited by: §4.4.2, Table 2.
  26. N. Zhang, S. Deng, Z. Sun, X. Chen, W. Zhang and H. Chen (2018) Attention-based capsule networks with dynamic routing for relation extraction. arXiv preprint arXiv:1812.11321. Cited by: §2.0.3.
  27. W. Zheng and B. Lu (2017) A multimodal approach to estimating vigilance using eeg and forehead eog. Journal of Neural Engineering 14 (2), pp. 026017. Cited by: §1, §1, §2.0.1, §3.3.1, §3.3.2, §3.3.2, §4.1, Table 2.
  28. Y. Zhou, R. Ji, J. Su, X. Sun and W. Chen (2019) Dynamic capsule attention for visual question answering. In AAAI Conference on Artificial Intelligence, pp. 9324–9331. Cited by: §2.0.3.