Spatial Filtering for EEG-Based Regression Problems in Brain-Computer Interface (BCI)
Abstract
Electroencephalogram (EEG) signals are frequently used in brain-computer interfaces (BCIs), but they are easily contaminated by artifacts and noise, so preprocessing must be done before they are fed into a machine learning algorithm for classification or regression. Spatial filters have been widely used to increase the signal-to-noise ratio of EEG for BCI classification problems, but their applications in BCI regression problems have been very limited. This paper proposes two common spatial pattern (CSP) filters for EEG-based regression problems in BCI, which are extended from the CSP filter for classification by making use of fuzzy sets. Experimental results on EEG-based response speed estimation from a large-scale study, which collected 143 sessions of sustained-attention psychomotor vigilance task data from 17 subjects during a 5-month period, demonstrate that the two proposed spatial filters can significantly increase the EEG signal quality. When used in LASSO and k-nearest neighbors regression for user response speed estimation, the spatial filters can reduce the root mean square estimation error by , and at the same time increase the correlation to the true response speed by .
I Introduction
Electroencephalogram (EEG) signals are the most widely used input for brain-computer interfaces (BCIs) [24, 53, 25, 29, 47, 34], mainly because they are more convenient to acquire than magnetoencephalography (MEG) [32], functional magnetic resonance imaging (fMRI) [44], functional near-infrared spectroscopy (fNIRS) [33], and invasive signals like electrocorticography (ECoG) [35] and intracortical neural recordings [30]. However, EEG signals are often contaminated by ocular, muscular, and cardiac artifacts and by various noise sources (power-line interference, changes in electrode impedance, etc.) [49, 4, 34]. Usually some preprocessing, either manual or automatic [4, 34], is needed to remove the artifacts, and then temporal and spatial filters are applied to further improve the EEG signal quality before feeding it into a classification or regression algorithm. The most commonly used temporal filters are bandpass filters and notch filters (at the 50 or 60 Hz power-line frequency).
This paper focuses on spatial filtering for improving the EEG signal quality. Many such approaches have been proposed in the literature [54, 17, 38, 40, 41, 15, 7, 37, 2]. However, almost all of them focus primarily on EEG classification problems in BCI, whereas EEG regression problems have been largely overlooked. Nevertheless, the latter are also very important in BCI. One example is driver drowsiness (or alertness) estimation from EEG signals, which has been extensively studied in our previous research [59, 27, 26, 62, 60, 28, 56]. This is a very important problem because drowsy driving is among the most important causes of road crashes, second only to alcohol, speeding, and inattention [43]. According to the National Highway Traffic Safety Administration [52], 2.5% of fatal motor vehicle crashes (on average 886/year in the U.S.) and 2.5% of fatalities (on average 1,004/year in the U.S.) between 2005 and 2009 involved drowsy driving.
This paper proposes two spatial filters for EEG-based regression problems in BCI. We also validate their performance in response speed (RS) estimation from EEG signals measured in a large-scale sustained-attention psychomotor vigilance task (PVT) [21], which collected 143 sessions of data from 17 subjects in a 5-month period.
The remainder of this paper is organized as follows: Section II reviews the state-of-the-art spatial filters for EEG-based classification problems in BCI. Section III introduces our proposed spatial filters for supervised BCI regression problems. Section IV describes the experimental setup, RS and EEG data preprocessing techniques, and the procedure to evaluate the performances of different spatial filters. Section V presents the results of the comparative studies and parameter sensitivity analysis for the proposed spatial filters. Section VI discusses the limitations of the proposed approaches and outlines several future research directions. Finally, Section VII draws conclusions.
II Spatial Filters for EEG Classification in BCI
Many spatial filters have been proposed for EEG classification in BCI. The most basic ones include common average reference (CAR) [48], Laplacian filters [23], and principal component analysis [19]. Some of the more recent and also more sophisticated ones are:

Independent Component Analysis (ICA) [9, 54, 17], which decomposes a multivariate signal into independent non-Gaussian signals. ICA has been widely used in the EEG research community to detect and remove stereotyped eye, muscle, and line noise artifacts [20, 49, 26].
Generally ICA works on an unepoched long block of EEG data, instead of epoched short EEG trials. Let the unepoched EEG data be , where is the number of EEG channels, and is the number of time samples. ICA assumes that is a linear combination of independent sources, i.e., , where is the mixing matrix, and the source signals, which are the rows of , are assumed to be stationary, independent, and non-Gaussian. ICA can use various principles [49, 17, 54, 9] to estimate both unknown and unknown simultaneously from . Once is obtained, cleaner and more representative features may be extracted from it than from the original [26].

xDAWN algorithm [38, 39, 40], which is often used to increase the signal-to-signal-plus-noise ratio in P300-based BCIs.
Like ICA, xDAWN also works on the unepoched long block of EEG data . It assumes that , where represents the P300 signal in an EEG epoch, and is a Toeplitz matrix whose first column is defined as:
(1)
and represents the ongoing background brain activity as well as the artifacts and noise. xDAWN then designs a spatial filtering matrix , where is the number of spatial filters, to maximize the signal-to-signal-plus-noise ratio, i.e.,
(2)
where is the trace of a matrix. Equation (2) is a generalized Rayleigh quotient [14], and its solution is the concatenation of the eigenvectors associated with the largest eigenvalues of the matrix .
The spatially filtered trial for is then computed as:
(3) 
Canonical Correlation Analysis (CCA) [41, 15], which finds linear transformations to maximize the correlations between two datasets. It has been used to improve BCI performance in code-modulated visual evoked potentials [5], steady-state visual evoked potentials [6], and event-related potentials like P300 and error-related potentials [45].
Unlike ICA and xDAWN, CCA works on epoched EEG trials. Consider a binary classification problem, with training examples in Class 1 and training examples in Class 2. Let be the th training example, where ( is the number of channels, and is the number of time samples in each trial), and . Let be the average of in Class . We then construct and , where is the concatenation of all in Class , and is the concatenation of . CCA first finds two vector filters and such that the correlation between and is maximized. and are called the first pair of canonical variables. CCA then finds the second pair of canonical variables in a similar way, subject to the constraint that they are uncorrelated with the first pair of canonical variables. This procedure can be continued up to times.
Finally, the spatial filtering matrix is the concatenation of all , which can be applied to each to increase its SNR.

Common Spatial Patterns (CSP) [7, 37], which is a supervised technique frequently used to enhance the binary classification performance of EEG data. The basic idea is to separate the EEG signal into additive subcomponents which have maximum differences in variance between the two classes. In the following we introduce the one-versus-the-rest (OVR) CSP [11], which extends the traditional CSP from binary classification to classes.
Like CCA, OVR CSP also works on epoched EEG trials. Let be the th training example, as defined above. Assume the mean of has been removed, e.g., by high-pass or bandpass filtering. Then, for Class , OVR CSP finds a spatial filter matrix , where is the number of spatial filters, to maximize the variance difference between Class and the rest:
(4)
where is the mean covariance matrix of trials in Class . Equation (4) is also a generalized Rayleigh quotient [14], and the solution is the concatenation of the eigenvectors associated with the largest eigenvalues of the matrix .
Finally, we concatenate the individual OVR CSP spatial filters to obtain the complete filter:
(5)
and compute the spatially filtered trial for by (3).
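The OVR CSP construction above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the objective for each class is taken to be the ratio of the class's mean covariance to the summed mean covariances of the remaining classes, each trial is a zero-mean array of shape (channels, samples), and the function names, the synthetic data, and the use of `numpy.linalg.eig` on the product of the inverse "rest" covariance and the class covariance are illustrative choices, not taken from the paper.

```python
import numpy as np


def mean_covariance(trials):
    """Mean spatial covariance of zero-mean trials, each shaped (channels, samples)."""
    return np.mean([x @ x.T / x.shape[1] for x in trials], axis=0)


def ovr_csp_filters(trials, labels, n_filters):
    """Stack n_filters one-versus-the-rest CSP filters per class, column-wise."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    covs = [mean_covariance([x for x, l in zip(trials, labels) if l == k])
            for k in classes]
    blocks = []
    for k in range(len(classes)):
        rest = sum(covs[j] for j in range(len(classes)) if j != k)
        # Eigenvectors of rest^{-1} cov_k; large eigenvalues mean high variance
        # for class k relative to the remaining classes (assumed objective).
        evals, evecs = np.linalg.eig(np.linalg.solve(rest, covs[k]))
        top = np.argsort(evals.real)[::-1][:n_filters]
        blocks.append(evecs[:, top].real)
    return np.hstack(blocks)


# Synthetic demo (hypothetical data): 3 classes, 4 channels, 200 samples per trial.
rng = np.random.default_rng(0)
labels = np.arange(30) % 3
trials = []
for label in labels:
    x = rng.standard_normal((4, 200))
    x[label] *= 3.0            # the class-specific channel carries extra variance
    trials.append(x)

W = ovr_csp_filters(trials, labels, n_filters=2)
filtered = W.T @ trials[0]      # spatially filtered trial, as in (3)
```

With three classes and two filters per class, `W` has six columns, so each filtered trial has six virtual channels.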
III Spatial Filters for Supervised BCI Regression Problems
In this section we propose two common spatial pattern for regression (CSPR) filters, which extend the multiclass CSP filters from classification to regression by making use of fuzzy sets [63], as we have done in [62].
First, a brief introduction of fuzzy sets is given below.
III-A Fuzzy Sets
A fuzzy set is comprised of a universe of discourse of real numbers together with a membership function , i.e.,
(6) 
Here denotes the collection of all points with associated membership degree . An example of a fuzzy set is shown in Fig. 1. The membership degrees are , , , , and . Observe that this is different from traditional (binary) sets, where each element either belongs to a set completely (i.e., with membership degree 1) or does not belong to it at all (i.e., with membership degree 0); there is nothing in between (e.g., membership degree 0.5). Fuzzy sets are frequently used in modeling concepts in natural language [22, 55, 36], which may not have clear boundaries.
III-B CSPR-OVR
Let () be the th EEG trial, where is the number of channels and is the number of time samples in each trial. We assume that the mean of each channel measurement has been removed, which is usually performed by bandpass filtering. Let be the RS of .
With the help of fuzzy sets, we can define “fuzzy” classes to connect regression problems and classification problems. Assume fuzzy classes are used. First, we partition the interval into equal intervals, and denote the partition points as . It is easy to obtain that
(7) 
For each , we then find the corresponding percentile value of all training and denote it as . Next we define fuzzy classes from them, as shown in Fig. 2. In this way, we can “classify” the training into fuzzy classes, corresponding to the crisp classes in the CSP for classification. However, note that in the CSP for classification a belongs to a crisp class either completely or not at all. For a fuzzy class here, a can belong to it at a membership degree in .
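A minimal sketch of this fuzzy-class construction follows. Several details are assumptions, because the partition formula and Fig. 2 are not reproduced here: the partitioned interval is taken to be the percentile range [0, 100], the K partition points are taken to split it into K + 1 equal intervals, the middle classes are taken to be triangles peaking at their percentile values, and the first and last classes are taken to be shoulders with full membership beyond their peaks. All function names are hypothetical.

```python
def partition_points(num_classes):
    """Percentile ranks that split [0, 100] into num_classes + 1 equal intervals."""
    return [100.0 * k / (num_classes + 1) for k in range(1, num_classes + 1)]


def percentile(values, rank):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * rank / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def memberships(y, peaks):
    """Membership degrees of y in each fuzzy class; peaks are the sorted percentile values."""
    num_classes = len(peaks)
    if y <= peaks[0]:                      # left shoulder (assumed shape)
        return [1.0] + [0.0] * (num_classes - 1)
    if y >= peaks[-1]:                     # right shoulder (assumed shape)
        return [0.0] * (num_classes - 1) + [1.0]
    mu = [0.0] * num_classes
    for k in range(num_classes - 1):
        left, right = peaks[k], peaks[k + 1]
        if left <= y <= right:
            frac = (y - left) / (right - left)
            mu[k], mu[k + 1] = 1.0 - frac, frac   # adjacent triangles sum to 1
            break
    return mu


# Hypothetical response speeds 0, 1, ..., 100 and three fuzzy classes.
speeds = [float(v) for v in range(101)]
peaks = [percentile(speeds, p) for p in partition_points(3)]
example = memberships(37.5, peaks)
```

Under these assumptions a value halfway between two peaks belongs to both neighboring classes with degree 0.5, which is exactly the partial membership that a crisp class cannot express.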
Next, for each fuzzy class, we compute its mean EEG trial as:
(8) 
where is the membership degree of in Fuzzy Class . Substituting (8) into (4), we can solve for the spatial filtering matrix for Fuzzy Class . Essentially, this makes those in Fuzzy Class different from those not in Fuzzy Class , which will help the regression performance, as we will demonstrate in Section V.
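The substitution described above can be illustrated as follows. This is a sketch under assumptions, not the authors' implementation: the quantity plugged into (4) is taken to be a membership-weighted average of the per-trial spatial covariance matrices, the OVR objective is taken to be the class covariance over the summed covariances of the other fuzzy classes, and the function names, array shapes, and synthetic memberships are hypothetical.

```python
import numpy as np


def fuzzy_mean_covariances(trials, mu):
    """Membership-weighted mean covariance for each fuzzy class.

    trials: array (N, channels, samples) of zero-mean EEG trials.
    mu:     array (N, K) of membership degrees of each trial's RS.
    """
    covs = np.einsum('nct,ndt->ncd', trials, trials) / trials.shape[2]
    weights = mu / mu.sum(axis=0, keepdims=True)     # normalize within each class
    return np.einsum('nk,ncd->kcd', weights, covs)   # shape (K, channels, channels)


def cspr_ovr_filters(class_covs, n_filters):
    """Concatenate n_filters filters per fuzzy class (assumed OVR objective)."""
    total = class_covs.sum(axis=0)
    blocks = []
    for cov_k in class_covs:
        rest = total - cov_k
        evals, evecs = np.linalg.eig(np.linalg.solve(rest, cov_k))
        top = np.argsort(evals.real)[::-1][:n_filters]
        blocks.append(evecs[:, top].real)
    return np.hstack(blocks)


# Synthetic demo (hypothetical data): 40 trials, 5 channels, 128 samples.
rng = np.random.default_rng(0)
trials = rng.standard_normal((40, 5, 128))
y = rng.random(40)                                   # stand-in response speeds
mu = np.column_stack([
    np.interp(y, [0.25, 0.50], [1.0, 0.0]),          # low-RS class (shoulder)
    np.interp(y, [0.25, 0.50, 0.75], [0.0, 1.0, 0.0]),
    np.interp(y, [0.50, 0.75], [0.0, 1.0]),          # high-RS class (shoulder)
])

class_covs = fuzzy_mean_covariances(trials, mu)
W = cspr_ovr_filters(class_covs, n_filters=2)
```

The only change relative to crisp multi-class CSP is the weighting step: every trial contributes to every fuzzy class in proportion to its membership degree, instead of contributing to exactly one class.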
III-C CSPR-OVA
In (4) we construct the multi-class CSP using an OVR approach, but it can also be constructed using the following one-versus-all (OVA) approach:
(9) 
The only difference between (9) and (4) is that the denominator of (9) also includes the contribution from Class itself. If we view Class as the signal of interest, and all other classes as noise, then (9) maximizes the signal-to-signal-plus-noise ratio, as does (2) in the xDAWN algorithm.
IV Experiments and Data
This section introduces a PVT experiment that was used to evaluate the performances of the proposed spatial filtering algorithms, the corresponding RS and EEG data preprocessing procedures, and the feature sets.
IV-A Experiment Setup
Seventeen university students (13 males; average age 22.4, standard deviation 1.6) from National Chiao Tung University (NCTU) in Taiwan volunteered to support the data-collection efforts over a 5-month period to study EEG correlates of attention and performance changes under specific conditions of real-world fatigue [21], as determined by the effectiveness score of Readiband [42]. The voluntary, fully informed consent of the persons used in this research was obtained as required by federal and Army regulations [51, 50]. The Institutional Review Board of NCTU approved the experimental protocol.
All participants registered their fatigue levels through a smartphone daily, and received notifications to report for experimental trials when the effectiveness score indicated that their condition fit the experimental requirement (low fatigue: ; normal: ; high fatigue: ). Upon completion of the related questionnaires [Karolinska Sleepiness Scale (KSS) [1], and electronically adapted visual analog scale for fatigue (VAS-F) and stress (VAS-S)] and the informed consent form, subjects performed a PVT, a dynamic attention-shifting task, a lane-keeping task, and selected surveys (KSS, VAS-F, VAS-S, state-trait anxiety inventory, and mind-wandering) preceding each condition. EEG data were recorded at 1000 Hz using a 64-channel NeuroScan system. Most participants performed the laboratory experiment thrice in each of the three fatigue states.
In this paper we focus on the PVT [10], which is a sustained-attention task that uses RS to measure the speed with which a subject responds to a visual stimulus. It is widely used, particularly by NASA, for its ease of scoring, simple metrics, convergent validity, and freedom from learning effects. In our experiment, the PVT was presented on a smartphone, with each trial initiated as an empty solid white circle centered on the touchscreen that began to fill in red in a clockwise sweeping motion, like the hand of a clock. The sweeping motion was programmed to turn the circle solid red in one second or to terminate upon a response by the participant, which required tapping the touchscreen with the thumb of the dominant hand. The RS was computed as the inverse of the elapsed time between the appearance of the empty solid white circle and the participant’s response. Following completion of each trial, the circle went back to solid white until the next trial. Inter-trial intervals were random, between 2 and 10 seconds.
143 sessions of PVT data were collected from the 17 subjects, and each session lasted 10 minutes. Our goal is to predict the RS using a 3-second EEG trial immediately before it.
IV-B Performance Evaluation Process
The following procedure was performed to evaluate the performances of different spatial filters:

1) EEG data preprocessing to suppress artifacts and noise.

2) RS data preprocessing to suppress outliers.

3) 5-fold cross-validation to compute the regression performance for each combination of spatial filter and regression method: first randomly partition the trials into five equal folds; then, use four folds for supervised spatial filtering and regression model training, and the remaining fold for testing; repeat this five times so that every fold is used in testing; finally compute the regression performances in terms of root mean square error (RMSE) and correlation coefficient (CC). Two regression methods were used: LASSO, whose adjustable parameter was optimized by an inner 5-fold cross-validation on the training dataset, and k-nearest neighbors (kNN) regression, where .

4) Repeat Step 3 ten times and compute the average regression performance.
More details about the first two steps are given in the next two subsections.
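The cross-validation step and the two performance measures can be sketched in plain Python. The sketch makes assumptions that the text does not settle: the metrics are computed once on the pooled out-of-fold predictions (per-fold averaging would be an equally plausible reading), the folds are only nearly equal when the trial count is not a multiple of five, and `fit_and_predict` is a hypothetical callback standing in for supervised spatial filtering plus LASSO or kNN training.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def correlation(y_true, y_pred):
    """Pearson correlation coefficient (CC)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def random_folds(n_trials, n_folds=5, seed=0):
    """Randomly partition trial indices into n_folds nearly equal folds."""
    idx = list(range(n_trials))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_validate(targets, fit_and_predict, seed=0):
    """Pool out-of-fold predictions, then return (RMSE, CC).

    fit_and_predict(train_idx, test_idx) must return one prediction per test index.
    """
    n_trials = len(targets)
    preds = [None] * n_trials
    for fold in random_folds(n_trials, seed=seed):
        held_out = set(fold)
        train = [i for i in range(n_trials) if i not in held_out]
        for i, p in zip(fold, fit_and_predict(train, fold)):
            preds[i] = p
    return rmse(targets, preds), correlation(targets, preds)


# Demo with a hypothetical "perfect" predictor that simply returns the targets.
targets = [0.1 * i for i in range(23)]
folds = random_folds(len(targets))
cv_rmse, cv_cc = cross_validate(targets, lambda train, test: [targets[i] for i in test])
```

Repeating the call with different `seed` values and averaging the two returned numbers corresponds to the final step of the procedure.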
IV-C EEG Data Preprocessing
We first downsampled the EEG data to 256 Hz, then epoched them into 3-second trials according to the onset of the PVTs. Let the onset time of the th PVT be . Then, the 62-channel EEG trial in seconds was used to predict the RS, i.e., . Each trial was then individually filtered by a Hz finite impulse response bandpass filter to make each channel zero-mean and to remove uninformative high-frequency components.
Because the inter-trial intervals were random, between 2 and 10 seconds, it is possible that a 3-second EEG trial covers part of the data from the previous trial. Additionally, a trial may also contain the EEG oscillations related to the motor reaction (tapping the touchscreen) in the previous trial. To remedy these problems, we removed overlapping trials: let the RS of the th trial be (the corresponding response time is ); then, the th trial is removed if , i.e., when the 3-second EEG data for Trial overlap with the data and response for the previous trial.
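The overlap rule can be sketched as below. The exact inequality is not reproduced in this text, so the sketch assumes that a trial is dropped when its 3-second window starts before the previous trial's response (previous onset plus previous response time); the function name and the example numbers are hypothetical.

```python
def keep_non_overlapping(onsets, response_times, window=3.0):
    """Indices of trials whose pre-stimulus window starts after the previous response.

    onsets:         PVT onset times in seconds, in chronological order.
    response_times: response time of each trial in seconds (the inverse of its RS).
    """
    kept = [0] if onsets else []          # the first trial has no predecessor
    for n in range(1, len(onsets)):
        previous_response = onsets[n - 1] + response_times[n - 1]
        if onsets[n] - window >= previous_response:
            kept.append(n)
    return kept


# Hypothetical example: trials 1 and 3 start too soon after their predecessors.
kept = keep_non_overlapping([0.0, 2.5, 8.0, 10.5], [0.4, 0.5, 0.3, 0.6])
```

Each trial is compared with its immediate predecessor in the raw sequence, whether or not that predecessor was itself removed.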
IV-D RS Data Preprocessing
The raw response times for two subjects are shown in Fig. 3. The top panel is from a typical subject, whose response times were mostly shorter than 1 second. The lower panel is from a subject with possible data recording issues, because many response times were longer than 5 seconds, which is highly unlikely in practice. So we excluded that subject from consideration in this paper, and only used the remaining 16 subjects.
As shown in Fig. 3, the response times were very noisy, and there were obvious outliers. It is very important to suppress the outliers and noise so that the performances of different algorithms can be more accurately compared. In addition to the step in the previous subsection to remove overlapping trials, we also employed the following two-step procedure for response time preprocessing:

1) Outlier thresholding, which aimed to suppress abnormally large response times. First, a threshold was computed for each subject, where is the mean response time from all sessions of that subject, and is the corresponding standard deviation. Then, all response times larger than were replaced by . Note that the threshold was different for different subjects.

2) Moving average smoothing, which replaced each response time by the average response time during a 60-second moving window centered at the onset of the corresponding PVT to suppress noise.
We then computed the RS as the inverse of the response time. The RSs for the 16 subjects are shown in Fig. 4. Observe that they are roughly in the same range, and many of them are approximately Gaussian.
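The two preprocessing steps and the final inversion can be sketched in plain Python. The threshold formula is not reproduced in this text, so the multiplier `n_std` on the standard deviation is a hypothetical parameter (its default of 3 is only a placeholder), and the function names are illustrative.

```python
import statistics


def threshold_outliers(response_times, n_std=3.0):
    """Clip response times above mean + n_std * std (n_std is an assumed value)."""
    limit = statistics.mean(response_times) + n_std * statistics.pstdev(response_times)
    return [min(rt, limit) for rt in response_times]


def smooth(onsets, response_times, window=60.0):
    """Replace each response time by the mean over a window centered on its onset."""
    half = window / 2.0
    smoothed = []
    for t in onsets:
        nearby = [rt for s, rt in zip(onsets, response_times) if abs(s - t) <= half]
        smoothed.append(sum(nearby) / len(nearby))
    return smoothed


def response_speed(response_times):
    """RS is the inverse of the response time."""
    return [1.0 / rt for rt in response_times]


# Hypothetical data: one abnormally long response among fast ones.
clipped = threshold_outliers([1.0, 1.0, 1.0, 1.0, 10.0], n_std=1.0)
averaged = smooth([0.0, 10.0, 100.0], [1.0, 3.0, 5.0])
speeds = response_speed([0.5, 0.25])
```

In practice the threshold would be computed from all sessions of one subject, while the smoothing window would be applied within a single session.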
IV-E Feature Extraction
We extracted the following four feature sets for each preprocessed EEG trial:

Raw: Theta and Alpha powerband features from the bandpass filtered EEG trials. We computed the average power spectral density (PSD) in the Theta band (4–8 Hz) and Alpha band (8–13 Hz) for each channel using Welch’s method [57], and converted these band powers to dB as our features.

CAR: Theta and Alpha powerband features from EEG trials filtered by CAR. This procedure was almost identical to Raw, except that the bandpass filtered EEG trials were also spatially filtered by CAR before the powerband features were computed. CAR is one of the most commonly used spatial filters for EEG, and [31] showed that it helped improve EEG classification performance. It simply removes the mean of all channels from each channel.

OVR: Theta and Alpha powerband features from EEG trials filtered by CSPR-OVR. This procedure was almost identical to CAR, except that the CAR filter was replaced by CSPR-OVR. We used 3 fuzzy classes for the RSs and 21 spatial filters for each fuzzy class, so that the spatially filtered signals had 3 × 21 = 63 channels, roughly the same as the dimensionality of the original signals, which ensured a fair performance comparison (Section V-C also gives a sensitivity analysis on the number of spatial filters). We then extracted 63 × 2 = 126 band power features for each trial.

OVA: Theta and Alpha powerband features from EEG trials filtered by CSPR-OVA. This procedure was also almost identical to CAR, except that the spatial filtering was performed by CSPR-OVA instead of CAR. There were also 126 band power features for each trial.
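The CAR filter and the band-power features can be sketched with NumPy as follows. Two simplifications are assumptions of this sketch: a plain periodogram is used as a stand-in for Welch's method, and the band edges are treated as inclusive on both sides. The function names and the synthetic two-channel trial are hypothetical.

```python
import numpy as np


def common_average_reference(trial):
    """Subtract the across-channel mean from every channel (trial: channels x samples)."""
    return trial - trial.mean(axis=0, keepdims=True)


def band_power_db(trial, fs, band):
    """Average PSD within band (Hz), in dB, for each channel.

    Uses a plain periodogram instead of Welch's method to keep the sketch short.
    """
    n = trial.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(trial, axis=1)) ** 2 / (fs * n)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return 10.0 * np.log10(psd[:, mask].mean(axis=1))


def theta_alpha_features(trial, fs=256):
    """Concatenate Theta (4-8 Hz) and Alpha (8-13 Hz) band powers of all channels."""
    theta = band_power_db(trial, fs, (4.0, 8.0))
    alpha = band_power_db(trial, fs, (8.0, 13.0))
    return np.concatenate([theta, alpha])


# Synthetic 3-second trial at 256 Hz: channel 0 is a 10 Hz tone, channel 1 a 6 Hz tone.
fs = 256
t = np.arange(3 * fs) / fs
rng = np.random.default_rng(0)
trial = np.vstack([np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 6 * t)])
trial = trial + 0.01 * rng.standard_normal(trial.shape)

features = theta_alpha_features(trial, fs)   # [theta ch0, theta ch1, alpha ch0, alpha ch1]
car_trial = common_average_reference(trial)
```

For the OVR and OVA feature sets, the same two feature functions would be applied to the spatially filtered trial instead of `trial` or `car_trial`.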
V Experimental Results
This section compares the informativeness of the features in Raw, CAR, OVR and OVA, presents the regression performances, and also performs parameter sensitivity analysis for Algorithm 1.
V-A Informativeness of the Features
Before studying the regression performances, it is important to check if the extracted features in Raw, CAR, OVR and OVA are indeed meaningful. We picked a typical subject, randomly partitioned his data into 50% training and 50% testing, and extracted Raw and CAR. We then designed the spatial filters using CSPR-OVR and CSPR-OVA on the training data, and extracted the corresponding OVR and OVA. For each feature set, we identified the top three channels that had the maximum correlation with the RS using the training data, and also computed the corresponding correlation coefficients for the testing data.
The results are shown in Fig. 5, where in each subfigure the data on the left of the black dotted line were used for training, and the right for testing. The top thick curve is the RS, and the bottom three curves are the maximally correlated features (note that good features are negatively correlated with the RS) identified from the training data. The training and testing correlation coefficients are shown on the left and right of the corresponding channel, respectively. Observe that the features from CAR had slightly better correlations with the RS in training than those from Raw, but not necessarily in testing. However, the features from OVR and OVA had much higher training and testing correlations to the RS than those from Raw and CAR, suggesting that CSPR-OVR and CSPR-OVA can indeed increase the signal quality. The reason is: if we view Class as the signal of interest, and all other classes as noise, then CSPR-OVR in (4) enhances the signal-to-noise ratio of the EEG signal, and CSPR-OVA in (9) enhances the signal-to-signal-plus-noise ratio.
V-B Regression Performance Comparison
The RMSEs and CCs of LASSO and kNN using the four feature sets are shown in Fig. 6 for the 16 subjects. Recall that for each subject the feature extraction methods were run 10 times, each with randomly partitioned training and testing data, and the average regression performances are shown here. The average RMSEs and CCs across all subjects are also shown in the last group of each panel. Observe that CAR had comparable or slightly better performance than Raw. Regardless of which regression algorithm was used, generally OVR and OVA had similar performance, and both of them achieved much smaller RMSEs and much larger CCs than Raw and CAR, suggesting that our extension of CSP from supervised classification to supervised regression can indeed improve the regression performance. Finally, LASSO had better performance than kNN on Raw and CAR, but kNN became better on OVR and OVA.
The corresponding percentage performance improvements of LASSO and kNN using the four feature sets are shown in Fig. 7, where the legend “LASSO,OVR/Raw” means the percentage performance improvement of LASSO on OVR over LASSO on Raw, and other legends should be interpreted in a similar manner. For both LASSO and kNN, OVR and OVA achieved similar performance improvements over Raw, and also over CAR. For LASSO, on average OVR had smaller RMSE than Raw, and larger CC. For kNN, on average OVR had smaller RMSE than Raw, and larger CC.
We also performed a two-way Analysis of Variance (ANOVA) for different regression algorithms to check if the RMSE and CC differences among the four feature sets were statistically significant, by setting the subjects as a random effect. The results are shown in Table I, which indicated that there were statistically significant differences in both RMSEs and CCs among different feature sets for both LASSO and kNN.
LASSO  kNN  
RMSE  CC  RMSE  CC  
Then, non-parametric multiple comparison tests based on Dunn’s procedure [12, 13] were used to determine if the difference between any pair of algorithms was statistically significant, with a p-value correction using the False Discovery Rate method [3]. The p-values are shown in Table II, where the statistically significant ones are marked in bold. Table II shows that, except for the CC of kNN, generally there was no statistically significant difference between Raw and CAR. However, for both LASSO and kNN, the RMSE and CC differences between the baseline feature sets (Raw, CAR) and the proposed ones (OVR, OVA) were always statistically significant. In all cases, there were no statistically significant differences between OVR and OVA.
Comparison  |  LASSO RMSE  |  LASSO CC  |  kNN RMSE  |  kNN CC
CAR vs. Raw  |  .5883  |  .3374  |  .1437  |  .0009
OVR vs. Raw  |  .0063  |  .0000  |  .0000  |  .0000
OVR vs. CAR  |  .0034  |  .0000  |  .0001  |  .0000
OVA vs. Raw  |  .0122  |  .0000  |  .0000  |  .0000
OVA vs. CAR  |  .0044  |  .0000  |  .0001  |  .0000
OVA vs. OVR  |  .4960  |  .4970  |  .4937  |  .4741
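The False Discovery Rate correction mentioned above is commonly implemented as the Benjamini-Hochberg step-up adjustment [3]; a minimal plain-Python sketch follows. It covers only the correction step, not Dunn's rank-sum tests, and the input p-values in the demo are hypothetical rather than taken from Table II.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A comparison is then declared significant when its adjusted p-value falls below the chosen level, for example 0.05.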
V-C Parameter Sensitivity Analysis
There are two adjustable parameters in CSPR-OVR: , the number of fuzzy classes for the RSs, and , the number of spatial filters for each fuzzy class. In this subsection we study the sensitivity of the regression performance to these two parameters.
The regression performances for ( was fixed to be 21) are shown in Fig. 8. Algorithm 1 was repeated five times, each with a random partition of training and testing data, and the average regression results are shown. For both LASSO and kNN, on average gave the worst performance, but resulted in roughly the same RMSE and CC. Hence, seems to be a good compromise between performance and computational cost.
The regression performances for ( was fixed to be 3) are shown in Fig. 9. Algorithm 1 was again repeated five times, and the average regression results are shown. For both LASSO and kNN, generally a larger resulted in a smaller RMSE and a larger CC, but the performance may reach a plateau at a certain . Also, a larger means heavier computational cost, which should be taken into consideration in choosing . For the PVT experiment, seemed to achieve a good compromise between performance and computational cost.
V-D Different Fuzzy Set Shapes
In Section III we used triangular fuzzy sets for simplicity, but other shapes can also be used. Fig. 10 illustrates how Gaussian fuzzy sets can be designed here: the center of the th Gaussian fuzzy class is at [computed from (7)], and the spread is specially designed so that two adjacent fuzzy sets intersect at the midpoint with membership grade 0.5. As a result, generally the Gaussian fuzzy classes are not symmetric.
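The asymmetric Gaussian design can be sketched as follows. For a Gaussian exp(-d^2 / (2 s^2)) to equal 0.5 at distance d from its center, the spread must be s = d / sqrt(2 ln 2), so each side of a class gets its own spread from half the gap to the neighboring center. How the outer sides of the first and last classes are shaped is not stated here; this sketch assumes they mirror the inner side. All names and the example centers are hypothetical.

```python
import math

# exp(-d**2 / (2 * s**2)) equals 0.5 when s = d / HALF_SCALE.
HALF_SCALE = math.sqrt(2.0 * math.log(2.0))


def side_spread(centers, k, left):
    """Spread of class k on its left or right side (needs at least two classes)."""
    last = len(centers) - 1
    if left:
        neighbor = k - 1 if k > 0 else k + 1      # outer side mirrors the inner gap
    else:
        neighbor = k + 1 if k < last else k - 1
    half_gap = abs(centers[k] - centers[neighbor]) / 2.0
    return half_gap / HALF_SCALE


def gaussian_memberships(y, centers):
    """Membership degree of y in each Gaussian fuzzy class."""
    mu = []
    for k, c in enumerate(centers):
        s = side_spread(centers, k, left=(y < c))
        mu.append(math.exp(-((y - c) ** 2) / (2.0 * s ** 2)))
    return mu


# Hypothetical centers with unequal gaps, so the middle class is asymmetric.
centers = [1.0, 2.0, 4.0]
at_left_midpoint = gaussian_memberships(1.5, centers)
at_right_midpoint = gaussian_memberships(3.0, centers)
```

Because the gaps on the two sides of the middle center differ (1 versus 2), its left spread is half its right spread, which is the asymmetry noted in the text.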
V-E Robustness to Noise
It is also important to study the robustness of different spatial filters to noises. According to [64], there are two types of noises: class noise, which is the noise on the model outputs, and attribute noise, which is the noise on the model inputs. In this subsection we focus on the attribute noise.
As in [64], for each model input, we randomly replaced () of all trials from a subject with uniform noise between its minimum and maximum values. After this was done for both the training and testing data, we extracted feature sets Raw, CAR, OVR and OVA, and trained LASSO and kNN on the corrupted training data. We then tested their performances on the corrupted testing data. The results are shown in Fig. 12. Generally, as the noise level increased, the performances decreased, which is intuitive. However, OVR and OVA achieved better RMSEs and CCs than Raw and CAR at almost all noise levels, suggesting that it is still beneficial to use CSPR-OVR and CSPR-OVA even under high attribute noise.
V-F Computational Cost
Observe from Algorithm 1 that in training CSPR-OVR needs to perform a matrix inversion and an eigen-decomposition to compute ; however, once the training is done, the filtering of new EEG trials can be conducted very efficiently by a simple matrix multiplication [see (3)]. Let be the number of training samples. Then, the actual training time of CSPR-OVR and CSPR-OVA increased linearly with , as shown in Fig. 13. The platform was a Dell XPS 15 laptop (Intel i7-6700HQ CPU @2.60GHz, 16 GB memory) running Windows 10 Pro 64-bit and Matlab 2016b. A least squares curve fit shows that the training time is seconds, which should not be a problem for a practical .
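The attribute-noise corruption can be sketched in plain Python. The sketch assumes that each model input is one column of a trials-by-inputs table and that the same fraction of rows is replaced independently for every column; whether the original procedure operated on raw EEG channels or on extracted features is not settled here, and the function name and demo data are hypothetical.

```python
import random


def add_attribute_noise(inputs, fraction, seed=0):
    """Replace a random fraction of rows in each column with uniform noise.

    inputs:   list of rows, one row per trial and one column per model input.
    fraction: share of trials to corrupt in each column, between 0 and 1.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(inputs), len(inputs[0])
    noisy = [row[:] for row in inputs]            # leave the caller's data intact
    n_replace = round(fraction * n_rows)
    for c in range(n_cols):
        column = [row[c] for row in inputs]
        low, high = min(column), max(column)
        for r in rng.sample(range(n_rows), n_replace):
            noisy[r][c] = rng.uniform(low, high)
    return noisy


# Hypothetical table of 10 trials with 2 inputs each.
data = [[float(i), float(10 - i)] for i in range(10)]
noisy = add_attribute_noise(data, 0.5, seed=1)
```

Applying the same function to the training and testing tables reproduces the setting in which both sets are corrupted before features are extracted and models are trained.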
VI Discussions and Future Research
Recall that 5-fold cross-validation was used in the performance evaluation in the previous section, i.e., we concatenated the nine-session data from the same subject, randomly partitioned them into five equal-length folds, and then used four folds for training and the remaining one for testing. So, the training and testing folds contained data from the same sessions. This is equivalent to the case that we label some session-specific data in offline regression. Our results showed that in this case CSPR-OVR and CSPR-OVA can significantly improve the regression performance.
To avoid the use of session-specific data, we also investigated a different validation method: leave-one-session-out validation, in which for each subject we trained the spatial filters using eight sessions and tested them on the remaining session. Interestingly, all four feature sets and both regression models achieved very poor performance here. The reasons are: 1) we need a proper way to normalize the RSs from different sessions, as done for the response times in [16]; and, 2) there is large intra-subject variation, meaning that the EEG responses for the same subject vary at different times (recall that these nine sessions were collected on different days); so, the patterns learned from previous sessions become obsolete for the new session, and hence spatial filtering alone does not help. However, our previous research [58, 62, 61] has shown that transfer learning can cope well with the inter-subject variation (individual differences) in both classification and regression problems, and we conjecture that it can also handle the intra-subject variation. One of our future research directions is to demonstrate the performance of CSPR-OVR and CSPR-OVA in a transfer learning framework to individualize a generalized model for regression problems, as done in [18, 46] for EEG-based cognitive performance classification.
Another direction of our future research is to apply CSPR-OVR and CSPR-OVA to other important EEG-based regression problems, e.g., drowsiness (or alertness) estimation during driving, and to integrate them with more sophisticated feature extraction approaches, e.g., Riemannian geometry [8], for better regression performance.
VII Conclusions
EEG signals are easily contaminated by artifacts and noise, so preprocessing is needed before they are fed into a machine learning algorithm in BCI. Spatial filters, e.g., ICA, xDAWN, CSP and CCA, have been widely used to increase the EEG signal quality for classification problems, but their applications in BCI regression problems have been very limited. In this paper, we have proposed two CSP filters for EEG-based regression problems in BCI, which were extended from the CSP filter for classification by making use of fuzzy sets. Extensive experimental results on EEG-based RS estimation from a large-scale study, which collected 143 sessions of PVT data from 17 subjects during a 5-month period, demonstrated that our proposed spatial filters can significantly increase the EEG signal quality. When used in LASSO and kNN, the spatial filters can reduce the estimation RMSE by , and at the same time increase the CC by .
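The leave-one-session-out split described above can be sketched in plain Python. The function name and the toy session labels are hypothetical; the sketch only produces index splits and leaves spatial filter design and regression to the caller.

```python
def leave_one_session_out(session_ids):
    """Yield (held_out_session, train_indices, test_indices) for every session."""
    for held_out in sorted(set(session_ids)):
        train = [i for i, s in enumerate(session_ids) if s != held_out]
        test = [i for i, s in enumerate(session_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical session label for each of five trials from one subject.
splits = list(leave_one_session_out([1, 1, 2, 2, 3]))
```

Unlike random 5-fold partitioning, no trial from the held-out session ever appears in the training indices, so the filters and the regression model never see session-specific data.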
Acknowledgement
Research was sponsored by the U.S. Army Research Laboratory and was accomplished under Cooperative Agreement Numbers W911NF-10-2-0022 and W911NF-10-D-0002/TO 0023. The views and the conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory or the U.S. Government. This work was also partially supported by the Australian Research Council (ARC) under discovery grant DP150101645.
References
 [1] T. Akerstedt and M. Gillberg, “Subjective and objective sleepiness in the active individual,” International Journal of Neuroscience, vol. 52, no. 1–2, pp. 29–37, 1990.
 [2] A. Barachant. (2014) MEG decoding using Riemannian geometry and unsupervised classification. Accessed: 8/17/2016. [Online]. Available: http://alexandre.barachant.org/wpcontent/uploads/2014/08/documentation.pdf.
 [3] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: A practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society, Series B (Methodological), vol. 57, pp. 289–300, 1995.
 [4] N. Bigdely-Shamlo, T. Mullen, C. Kothe, K.-M. Su, and K. A. Robbins, “The PREP pipeline: standardized preprocessing for large-scale EEG analysis,” Frontiers in Neuroinformatics, vol. 9, 2015.
 [5] G. Bin, X. Gao, Y. Wang, Y. Li, B. Hong, and S. Gao, “A high-speed BCI based on code modulation VEP,” Journal of Neural Engineering, vol. 8, no. 2, 2011.
 [6] G. Bin, X. Gao, Z. Yan, B. Hong, and S. Gao, “An online multi-channel SSVEP-based brain-computer interface using a canonical correlation analysis method,” Journal of Neural Engineering, vol. 6, no. 4, 2009.
 [7] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller, “Optimizing spatial filters for robust EEG single-trial analysis,” IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008.
 [8] M. Congedo, A. Barachant, and A. Andreev, “A new generation of brain-computer interface based on Riemannian geometry,” arXiv:1310.8115, 2013.
 [9] A. Delorme and S. Makeig, “EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis,” Journal of Neuroscience Methods, vol. 134, pp. 9–21, 2004.
 [10] D. F. Dinges and J. W. Powell, “Microcomputer analyses of performance on a portable, simple visual RT task during sustained operations,” Behavior research methods, instruments, & computers, vol. 17, no. 6, pp. 652–655, 1985.
 [11] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Müller, “Boosting bit rates in noninvasive EEG single-trial classifications by feature combination and multiclass paradigms,” IEEE Trans. on Biomedical Engineering, vol. 51, no. 6, pp. 993–1002, 2004.
 [12] O. Dunn, “Multiple comparisons among means,” Journal of the American Statistical Association, vol. 56, pp. 52–64, 1961.
 [13] ——, “Multiple comparisons using rank sums,” Technometrics, vol. 6, pp. 214–252, 1964.
 [14] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: The Johns Hopkins University Press, 1996.
 [15] H. Hotelling, “Relations between two sets of variates,” Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936.
 [16] Z. Hu, Y. Sun, J. Lim, N. Thakor, and A. Bezerianos, “Investigating the correlation between the neural activity and task performance in a psychomotor vigilance test,” in Proc. 37th Annual Int’l Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, August 2015, pp. 4725–4728.
 [17] A. Hyvärinen and E. Oja, “Independent component analysis: algorithms and applications,” Neural Networks, vol. 13, no. 4, pp. 411–430, 2000.
 [18] R. R. Johnson, D. P. Popovic, R. E. Olmstead, M. Stikic, D. J. Levendowski, and C. Berka, “Drowsiness/alertness algorithm development and validation using synchronized EEG and cognitive performance to individualize a generalized model,” Biological Psychology, vol. 87, pp. 241–250, 2011.
 [19] I. Jolliffe, Principal component analysis. Wiley Online Library, 2002.
 [20] T.-P. Jung, S. Makeig, C. Humphries, T.-W. Lee, M. J. McKeown, V. Iragui, and T. J. Sejnowski, “Removing electroencephalographic artifacts by blind source separation,” Psychophysiology, vol. 37, no. 2, pp. 163–178, 2000.
 [21] S. Kerick, C.-H. Chuang, J.-T. King, T.-P. Jung, J. Brooks, B. T. Files, K. McDowell, and C.-T. Lin, “Inter- and intra-individual variations in sleep, subjective fatigue, and vigilance task performance of students in their real-world environments over extended periods,” 2016, submitted.
 [22] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Upper Saddle River, NJ: Prentice-Hall, 1995.
 [23] T. D. Lagerlund, F. W. Sharbrough, and N. E. Busacker, “Spatial filtering of multichannel electroencephalographic recordings through principal component analysis by singular value decomposition,” Journal of Clinical Neurophysiology, vol. 14, no. 1, pp. 73–82, 1997.
 [24] B. J. Lance, S. E. Kerick, A. J. Ries, K. S. Oie, and K. McDowell, “Brain-computer interface technologies in the coming decades,” Proc. of the IEEE, vol. 100, no. 3, pp. 1585–1599, 2012.
 [25] L.-D. Liao, C.-T. Lin, K. McDowell, A. Wickenden, K. Gramann, T.-P. Jung, L.-W. Ko, and J.-Y. Chang, “Biosensor technologies for augmented brain-computer interfaces in the next decades,” Proc. of the IEEE, vol. 100, no. 2, pp. 1553–1566, 2012.
 [26] C. T. Lin, R. C. Wu, S. F. Liang, T. Y. Huang, W. H. Chao, Y. J. Chen, and T. P. Jung, “EEG-based drowsiness estimation for safety driving using independent component analysis,” IEEE Trans. on Circuits and Systems, vol. 52, pp. 2726–2738, 2005.
 [27] C.-T. Lin, Y.-C. Chen, T.-Y. Huang, T.-T. Chiu, L.-W. Ko, S.-F. Liang, H.-Y. Hsieh, S.-H. Hsu, and J.-R. Duann, “Development of wireless brain computer interface with embedded multi-task scheduling and its application on real-time driver’s drowsiness detection and warning,” IEEE Trans. on Biomedical Engineering, vol. 55, no. 5, pp. 1582–1591, 2008.
 [28] C.-T. Lin, L.-W. Ko, I.-F. Chung, T.-Y. Huang, Y.-C. Chen, T.-P. Jung, and S.-F. Liang, “Adaptive EEG-based alertness estimation system by using ICA-based fuzzy neural networks,” IEEE Trans. on Circuits and Systems-I, vol. 53, no. 11, pp. 2469–2476, 2006.
 [29] S. Makeig, C. Kothe, T. Mullen, N. Bigdely-Shamlo, Z. Zhang, and K. Kreutz-Delgado, “Evolving signal processing for brain-computer interfaces,” Proc. of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1567–1584, 2012.
 [30] E. M. Maynard, C. T. Nordhausen, and R. A. Normann, “The Utah intracortical electrode array: a recording structure for potential brain-computer interfaces,” Electroencephalography and Clinical Neurophysiology, vol. 102, no. 3, pp. 228–239, 1997.
 [31] D. J. McFarland, L. M. McCane, S. V. David, and J. R. Wolpaw, “Spatial filter selection for EEG-based communication,” Electroencephalography and Clinical Neurophysiology, vol. 103, pp. 386–394, 1997.
 [32] J. Mellinger, G. Schalk, C. Braun, H. Preissl, W. Rosenstiel, N. Birbaumer, and A. Kübler, “An MEG-based brain-computer interface (BCI),” NeuroImage, vol. 36, no. 3, pp. 581–593, 2007.
 [33] N. Naseer and K.-S. Hong, “fNIRS-based brain-computer interfaces: a review,” Frontiers in Human Neuroscience, vol. 9, p. 3, 2015.
 [34] L. F. Nicolas-Alonso and J. Gomez-Gil, “Brain computer interfaces, a review,” Sensors, vol. 12, no. 2, pp. 1211–1279, 2012.
 [35] X. Pei, D. L. Barbour, E. C. Leuthardt, and G. Schalk, “Decoding vowels and consonants in spoken and imagined words using electrocorticographic signals in humans,” Journal of Neural Engineering, vol. 8, no. 4, 2011.
 [36] C. C. Ragin, Fuzzy-Set Social Science. Chicago, IL: The University of Chicago Press, 2000.
 [37] H. Ramoser, J. Müller-Gerking, and G. Pfurtscheller, “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Trans. on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
 [38] B. Rivet, A. Souloumiac, V. Attina, and G. Gibert, “xDAWN algorithm to enhance evoked potentials: application to brain-computer interface,” IEEE Trans. on Biomedical Engineering, vol. 56, no. 8, pp. 2035–2043, 2009.
 [39] B. Rivet, H. Cecotti, A. Souloumiac, E. Maby, and J. Mattout, “Theoretical analysis of xDAWN algorithm: application to an efficient sensor selection in a P300 BCI,” in Proc. 19th European Signal Processing Conference, Barcelona, Spain, August 2011, pp. 1382–1386.
 [40] B. Rivet and A. Souloumiac, “Optimal linear spatial filters for event-related potentials based on a spatio-temporal model: Asymptotical performance analysis,” Signal Processing, vol. 93, no. 2, pp. 387–398, 2013.
 [41] R. N. Roy, S. Bonnet, S. Charbonnier, P. Jallon, and A. Campagne, “A comparison of ERP spatial filtering methods for optimal mental workload estimation,” in Proc. 37th Annual Int’l Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015, pp. 7254–7257.
 [42] C. Russell, J. Caldwell, D. Arand, L. Myers, P. Wubbels, and H. Downs. (2015) Validation of the fatigue science readiband actigraph and associated sleep/wake classification algorithms. Accessed: 08/11/2016. [Online]. Available: http://static1.squarespace.com/static/550af02ae4b0cf85628d981a/t/5526c99ee4b019412c323758/1428605342303/Readiband_Validation.pdf.
 [43] F. Sagberg, P. Jackson, H.-P. Krüger, A. Muzer, and A. Williams, “Fatigue, sleepiness and reduced alertness as risk factors in driving,” Institute of Transport Economics, Oslo, Tech. Rep. TOI Report 739/2004, 2004.
 [44] R. Sitaram, A. Caria, R. Veit, T. Gaber, G. Rota, A. Kuebler, and N. Birbaumer, “fMRI brain-computer interface: a tool for neuroscientific research and treatment,” Computational Intelligence and Neuroscience, 2007.
 [45] M. Spüler, A. Walter, W. Rosenstiel, and M. Bogdan, “Spatial filtering based on canonical correlation analysis for classification of evoked or event-related potentials in EEG data,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 22, no. 6, pp. 1097–1103, 2014.
 [46] M. Stikic, R. R. Johnson, D. J. Levendowski, D. P. Popovic, R. E. Olmstead, and C. Berka, “EEG-derived estimators of present and future cognitive performance,” Frontiers in Human Neuroscience, vol. 5, 2011.
 [47] D. S. Tan and A. Nijholt, Eds., Brain-Computer Interfaces: Applying our Minds to Human-Computer Interaction. London: Springer, 2010.
 [48] M. Teplan, “Fundamentals of EEG measurement,” Measurement Science Review, vol. 2, no. 2, pp. 1–11, 2002.
 [49] J. A. Urigüen and B. Garcia-Zapirain, “EEG artifact removal – state-of-the-art and guidelines,” Journal of Neural Engineering, vol. 12, no. 3, 2015.
 [50] US Department of Defense Office of the Secretary of Defense, “Code of federal regulations protection of human subjects,” Government Printing Office, no. 32 CFR 19, 1999.
 [51] US Department of the Army, “Use of volunteers as subjects of research,” Government Printing Office, no. AR 70-25, 1990.
 [52] (2011) Traffic safety facts crash stats: drowsy driving. US Department of Transportation, National Highway Traffic Safety Administration. Washington, DC. [Online]. Available: http://www-nrd.nhtsa.dot.gov/pubs/811449.pdf
 [53] J. van Erp, F. Lotte, and M. Tangermann, “Brain-computer interfaces: Beyond medical applications,” Computer, vol. 45, no. 4, pp. 26–34, 2012.
 [54] R. Vigário, J. Särelä, V. Jousmäki, M. Hämäläinen, and E. Oja, “Independent component approach to the analysis of EEG and MEG recordings,” IEEE Trans. on Biomedical Engineering, vol. 47, no. 5, pp. 589–593, 2000.
 [55] L.-X. Wang, A Course in Fuzzy Systems and Control. Upper Saddle River, NJ: Prentice Hall, 1997.
 [56] C.-S. Wei, Y.-P. Lin, Y.-T. Wang, T.-P. Jung, N. Bigdely-Shamlo, and C.-T. Lin, “Selective transfer learning for EEG-based drowsiness detection,” in Proc. IEEE Int’l Conf. on Systems, Man and Cybernetics, Hong Kong, October 2015.
 [57] P. Welch, “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Trans. on Audio Electroacoustics, vol. 15, pp. 70–73, 1967.
 [58] D. Wu, “Online and offline domain adaptation for reducing BCI calibration effort,” IEEE Trans. on Human-Machine Systems, 2016, in press.
 [59] D. Wu, C.-H. Chuang, and C.-T. Lin, “Online driver’s drowsiness estimation using domain adaptation with model fusion,” in Proc. Int’l Conf. on Affective Computing and Intelligent Interaction, Xi’an, China, September 2015, pp. 904–910.
 [60] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, “Offline EEG-based driver drowsiness estimation using enhanced batch-mode active learning (EBMAL) for regression,” in Proc. IEEE Int’l Conf. on Systems, Man and Cybernetics, Budapest, Hungary, October 2016.
 [61] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, “Spectral meta-learner for regression (SMLR) model aggregation: Towards calibrationless brain-computer interface (BCI),” in Proc. IEEE Int’l Conf. on Systems, Man and Cybernetics, Budapest, Hungary, October 2016.
 [62] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, “Driver drowsiness estimation from EEG signals using online weighted adaptation regularization for regression (OwARR),” IEEE Trans. on Fuzzy Systems, 2016, in press.
 [63] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, pp. 338–353, 1965.
 [64] X. Zhu and X. Wu, “Class noise vs. attribute noise: A quantitative study of their impacts,” Artificial Intelligence Review, vol. 22, pp. 177–210, 2004.