Spatial Filtering for EEG-Based Regression Problems in Brain-Computer Interface (BCI)


Dongrui Wu (1), Senior Member, IEEE, Jung-Tai King (2), Chun-Hsiang Chuang (3, 2),
Chin-Teng Lin (3, 2), Fellow, IEEE, Tzyy-Ping Jung (4, 5), Fellow, IEEE





E-mail: drwu09@gmail.com, jtchin2@gmail.com, cch.chuang@gmail.com,
Chin-Teng.Lin@uts.edu.au, jung@sccn.ucsd.edu
(1) DataNova, NY, USA
(2) Brain Research Center, National Chiao-Tung University, Hsinchu, Taiwan
(3) Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
(4) Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California San Diego, La Jolla, CA
(5) Center for Advanced Neurological Engineering, Institute of Engineering in Medicine, University of California San Diego, La Jolla, CA

Abstract

Electroencephalogram (EEG) signals are frequently used in brain-computer interfaces (BCIs), but they are easily contaminated by artifacts and noises, so preprocessing must be done before they are fed into a machine learning algorithm for classification or regression. Spatial filters have been widely used to increase the signal-to-noise ratio of EEG for BCI classification problems, but their applications in BCI regression problems have been very limited. This paper proposes two common spatial pattern (CSP) filters for EEG-based regression problems in BCI, which are extended from the CSP filter for classification, by making use of fuzzy sets. Experimental results on EEG-based response speed estimation from a large-scale study, which collected 143 sessions of sustained-attention psychomotor vigilance task data from 17 subjects during a 5-month period, demonstrate that the two proposed spatial filters can significantly increase the EEG signal quality. When used in LASSO and k-nearest neighbors regression for user response speed estimation, the spatial filters can reduce the root mean square estimation error by , and at the same time increase the correlation to the true response speed by .

Index Terms: Brain-computer interface, common spatial pattern, EEG, fuzzy sets, psychomotor vigilance task, response speed estimation, spatial filtering

I Introduction

Electroencephalogram (EEG) signals are the most widely used input for brain-computer interfaces (BCIs) [24, 53, 25, 29, 47, 34], mainly because they are more convenient to acquire than magnetoencephalography (MEG) [32], functional magnetic resonance imaging (fMRI) [44], functional near-infrared spectroscopy (fNIRS) [33], and invasive signals like electrocorticography (ECoG) [35] and intracortical neural recordings [30]. However, EEG signals are often contaminated by ocular, muscular, and cardiac artifacts and by various noise sources (power-line interference, changes in electrode impedance, etc.) [49, 4, 34]. Usually some preprocessing, either manual or automatic [4, 34], is needed to remove the artifacts, and then temporal and spatial filters are applied to further improve the EEG signal quality before it is fed into a classification or regression algorithm. The most commonly used temporal filters are band-pass filters and notch filters (at the 50 or 60 Hz power-line frequency).

This paper focuses on spatial filtering for improving the EEG signal quality. Many such approaches have been proposed in the literature [54, 17, 38, 40, 41, 15, 7, 37, 2]. However, almost all of them focus primarily on EEG classification problems in BCI, whereas EEG regression problems have been largely overlooked. Nevertheless, the latter are also very important in BCI. One example is driver drowsiness (or alertness) estimation from EEG signals, which has been extensively studied in our previous research [59, 27, 26, 62, 60, 28, 56]. This is a very important problem because drowsy driving is among the most important causes of road crashes, ranking behind only alcohol, speeding, and inattention [43]. According to the National Highway Traffic Safety Administration [52], 2.5% of fatal motor vehicle crashes (on average 886/year in the U.S.) and 2.5% of fatalities (on average 1,004/year in the U.S.) between 2005 and 2009 involved drowsy driving.

This paper proposes two spatial filters for EEG-based regression problems in BCI. We also validate their performance in response speed (RS) estimation from EEG signals measured in a large-scale sustained-attention psychomotor vigilance task (PVT) [21], which collected 143 sessions of data from 17 subjects in a 5-month period.

The remainder of this paper is organized as follows: Section II reviews the state-of-the-art spatial filters for EEG-based classification problems in BCI. Section III introduces our proposed spatial filters for supervised BCI regression problems. Section IV describes the experimental setup, RS and EEG data preprocessing techniques, and the procedure to evaluate the performances of different spatial filters. Section V presents the results of the comparative studies and parameter sensitivity analysis for the proposed spatial filter. Section VI discusses the limitations of the proposed approaches and outlines several future research directions. Finally, Section VII draws conclusions.

II Spatial Filters for EEG Classification in BCI

Many spatial filters have been proposed for EEG classification in BCI. The most basic ones include common average reference (CAR) [48], Laplacian filters [23], and principal component analysis [19]. Some of the more recent and also more sophisticated ones are:

  1. Independent Component Analysis (ICA) [9, 54, 17], which decomposes a multivariate signal into independent non-Gaussian signals. ICA has been widely used in the EEG research community to detect and remove stereotyped eye, muscle, and line noise artifacts [20, 49, 26].

    Generally ICA works on an unepoched long block of EEG data, instead of epoched short EEG trials. Let the unepoched EEG data be $X \in \mathbb{R}^{C \times T}$, where $C$ is the number of EEG channels, and $T$ is the number of time samples. ICA assumes that $X$ is the linear combination of independent sources, i.e., $X = AS$, where $A$ is the mixing matrix, and the source signals, which are the rows of $S$, are supposed to be stationary, independent, and non-Gaussian. ICA can use various principles [49, 17, 54, 9] to estimate both the unknown $A$ and the unknown $S$ simultaneously from $X$. Once $S$ is obtained, cleaner and more representative features may be extracted from it than from the original $X$ [26].

  2. xDAWN algorithm [38, 39, 40], which is often used to increase the signal to signal-plus-noise ratio in P300-based BCIs.

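    Like ICA, xDAWN also works on the unepoched long block of EEG data $X \in \mathbb{R}^{C \times T}$. It assumes that $X = P D^\top + N$, where $P \in \mathbb{R}^{C \times E}$ represents the P300 signal in an EEG epoch of $E$ time samples, and $D \in \mathbb{R}^{T \times E}$ is a Toeplitz matrix whose first column is defined as:

    $$D(t, 1) = \begin{cases} 1, & \text{if a target stimulus is presented at time sample } t \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

    and $N$ represents the ongoing background brain activity as well as the artifacts and noises. xDAWN then designs a spatial filtering matrix $W \in \mathbb{R}^{C \times F}$, where $F$ is the number of spatial filters, to maximize the signal to signal-plus-noise ratio, i.e.,

    $$W^* = \arg\max_{W} \frac{\mathrm{Tr}(W^\top P D^\top D P^\top W)}{\mathrm{Tr}(W^\top X X^\top W)} \tag{2}$$

    where $\mathrm{Tr}(\cdot)$ is the trace of a matrix. (2) is a generalized Rayleigh quotient [14], and its solution $W^*$ is the concatenation of the $F$ eigenvectors associated with the $F$ largest eigenvalues of the matrix $(X X^\top)^{-1} P D^\top D P^\top$.

    The spatially filtered trial for $X$ is then computed as:

    $$X' = W^{*\top} X \tag{3}$$
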
  3. Canonical Correlation Analysis (CCA) [41, 15], which finds linear transformations to maximize the correlations between two datasets. It has been used to improve BCI performance in code-modulated visual evoked potentials [5], steady-state visual evoked potentials [6], and event-related potentials like P300 and error-related potentials [45].

    Unlike ICA and xDAWN, CCA works on epoched EEG trials. Consider a binary classification problem, with $N_1$ training examples in Class 1 and $N_2$ training examples in Class 2. Let $X_n$ be the $n$th training example, where $X_n \in \mathbb{R}^{C \times T}$ ($C$ is the number of channels, and $T$ is the number of time samples in each trial), and $n = 1, \ldots, N_1 + N_2$. Let $\bar{X}_k$ be the average of $X_n$ in Class $k$. We then construct $X = [\tilde{X}_1, \tilde{X}_2]$ and $Y = [\tilde{Y}_1, \tilde{Y}_2]$, where $\tilde{X}_k$ is the concatenation of all $X_n$ in Class $k$, and $\tilde{Y}_k$ is the concatenation of $N_k$ copies of $\bar{X}_k$. CCA first finds two vector filters $w_x$ and $w_y$ such that the correlation between $w_x^\top X$ and $w_y^\top Y$ is maximized. $w_x^\top X$ and $w_y^\top Y$ are called the first pair of canonical variables. CCA then finds the second pair of canonical variables in a similar way, subject to the constraint that they are uncorrelated with the first pair of canonical variables. This procedure can be continued up to $C$ times.

    Finally, the spatial filtering matrix $W$ is the concatenation of all $w_x$, which can be applied to each $X_n$ to increase its SNR.

  4. Common Spatial Patterns (CSP) [7, 37], which is a supervised technique frequently used to enhance the binary classification performance of EEG data. The basic idea is to separate the EEG signal into additive subcomponents which have maximum differences in variance between the two classes. In the following we introduce the one-versus-the-rest (OVR) CSP [11], which extends the traditional CSP from binary classification to $K$ classes.

    Like CCA, OVR CSP also works on epoched EEG trials. Let $X_n \in \mathbb{R}^{C \times T}$ be the $n$th training example, as defined above. Assume the mean of each channel of $X_n$ has been removed, e.g., by high-pass or band-pass filtering. Then, for Class $k$, OVR CSP finds a spatial filter matrix $W_k^* \in \mathbb{R}^{C \times F}$, where $F$ is the number of spatial filters, to maximize the variance difference between Class $k$ and the rest:

    $$W_k^* = \arg\max_{W \in \mathbb{R}^{C \times F}} \frac{\mathrm{Tr}(W^\top \bar{\Sigma}_k W)}{\mathrm{Tr}\left[W^\top \left(\sum_{i \neq k} \bar{\Sigma}_i\right) W\right]} \tag{4}$$

    where $\bar{\Sigma}_k \in \mathbb{R}^{C \times C}$ is the mean covariance matrix of trials in Class $k$. (4) is also a generalized Rayleigh quotient [14], and the solution $W_k^*$ is the concatenation of the $F$ eigenvectors associated with the $F$ largest eigenvalues of the matrix $\left(\sum_{i \neq k} \bar{\Sigma}_i\right)^{-1} \bar{\Sigma}_k$.

    Finally, we concatenate the $K$ individual OVR CSP spatial filters to obtain the complete filter:

    $$W^* = [W_1^*, \ldots, W_K^*] \in \mathbb{R}^{C \times KF} \tag{5}$$

    and compute the spatially filtered trial for $X_n$ by (3), i.e., $X_n' = W^{*\top} X_n$.
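The OVR CSP computation in (4) and (5) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited works: the function name `ovr_csp`, the array layout (trials, channels, samples), and the unnormalized covariance estimate are choices made only for this sketch.

```python
import numpy as np

def ovr_csp(trials, labels, n_filters):
    """One-versus-the-rest CSP sketch.

    trials: array (N, C, T) of zero-mean EEG trials (assumed layout).
    labels: array (N,) of integer class labels.
    Returns W of shape (C, K * n_filters), the concatenation in (5).
    """
    classes = np.unique(labels)
    # Mean spatial covariance matrix of each class.
    covs = [np.mean([x @ x.T for x in trials[labels == c]], axis=0)
            for c in classes]
    filters = []
    for k in range(len(classes)):
        rest = sum(covs[i] for i in range(len(classes)) if i != k)
        # Eigenvectors of rest^{-1} covs[k], as stated for (4).
        evals, evecs = np.linalg.eig(np.linalg.solve(rest, covs[k]))
        order = np.argsort(evals.real)[::-1][:n_filters]
        filters.append(evecs[:, order].real)
    return np.hstack(filters)

# Synthetic data: three classes, class 0 has extra variance on channel 0.
rng = np.random.default_rng(0)
trials = rng.standard_normal((30, 4, 50))
trials[:10, 0] *= 3.0
labels = np.repeat([0, 1, 2], 10)
W = ovr_csp(trials, labels, n_filters=2)
filtered = W.T @ trials[0]       # spatial filtering of one trial, as in (3)
```

On this synthetic example the first filter of Class 0 should yield a larger variance for Class 0 trials than for the remaining trials, which is the property that (4) optimizes.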

III Spatial Filters for Supervised BCI Regression Problems

In this section we propose two common spatial pattern for regression (CSPR) filters, which extend the multi-class CSP filters from classification to regression by making use of fuzzy sets [63], as we have done in [62].

First, a brief introduction of fuzzy sets is given below.

III-A Fuzzy Sets

A fuzzy set $A$ is comprised of a universe of discourse $D_A$ of real numbers together with a membership function $\mu_A: D_A \rightarrow [0, 1]$, i.e.,

$$A = \int_{D_A} \mu_A(x)/x \tag{6}$$

Here $\int$ denotes the collection of all points $x \in D_A$ with associated membership degree $\mu_A(x)$. An example of a fuzzy set is shown in Fig. 1. The membership degrees are , , , , and . Observe that this is different from traditional (binary) sets, where each element either belongs to a set completely (i.e., with membership degree 1) or does not belong to it at all (i.e., with membership degree 0); there is nothing in between (e.g., membership degree 0.5). Fuzzy sets are frequently used in modeling concepts in natural language [22, 55, 36], which may not have clear boundaries.

Fig. 1: An example of a fuzzy set.

III-B CSPR-OVR

Let $X_n \in \mathbb{R}^{C \times T}$ ($n = 1, \ldots, N$) be the $n$th EEG trial, where $C$ is the number of channels and $T$ is the number of time samples in each trial. We assume that the mean of each channel measurement has been removed, which is usually performed by band-pass filtering. Let $y_n$ be the RS of $X_n$.

With the help of fuzzy sets, we can define “fuzzy” classes to connect regression problems and classification problems. Assume $K$ fuzzy classes are used. First, we partition the interval $[0, 100]$ into $K+1$ equal intervals, and denote the partition points as $\{p_k\}_{k=1,\ldots,K}$. It is easy to obtain that

$$p_k = \frac{100 \cdot k}{K+1}, \quad k = 1, \ldots, K \tag{7}$$

For each $p_k$, we then find the corresponding $p_k$ percentile value of all training $y_n$ and denote it as $P_k$. Next we define $K$ fuzzy classes from them, as shown in Fig. 2. In this way, we can “classify” the training $y_n$ into $K$ fuzzy classes, corresponding to the $K$ crisp classes in the CSP for classification. However, note that in the CSP for classification a training example belongs to a crisp class either completely or not at all. For a fuzzy class here, a $y_n$ can belong to it at a membership degree in $[0, 1]$.

Fig. 2: The $K$ fuzzy classes for $y$, when triangular fuzzy sets are used.
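The construction of the fuzzy classes from percentiles can be illustrated with a short NumPy sketch. The exact shapes in Fig. 2 are not recoverable from the text, so the following is an assumption: each class peaks at its percentile value, falls linearly to zero at the neighboring percentile values, and the first and last classes are held at membership 1 beyond the outermost percentile values. The function name `fuzzy_memberships` is hypothetical.

```python
import numpy as np

def fuzzy_memberships(y, K):
    """Membership degrees of each response speed in K fuzzy classes.

    Assumed shape: class k peaks at the percentile value P_k and falls
    linearly to 0 at P_{k-1} and P_{k+1}; the first and last classes are
    held at 1 beyond P_1 and P_K (shoulders). Requires K >= 2.
    """
    y = np.asarray(y, dtype=float)
    p = 100.0 * np.arange(1, K + 1) / (K + 1)   # partition points, as in (7)
    P = np.percentile(y, p)                      # percentile values P_1..P_K
    mu = np.zeros((len(y), K))
    for k in range(K):
        # np.interp clamps outside [P_1, P_K], which yields the shoulders.
        peak = np.zeros(K)
        peak[k] = 1.0
        mu[:, k] = np.interp(y, P, peak)
    return P, mu

y = np.linspace(0.0, 1.0, 101)     # toy response speeds
P, mu = fuzzy_memberships(y, K=3)  # P holds the 25th, 50th, 75th percentiles
```

With this assumed shape the membership degrees of every sample sum to one, so a sample halfway between two percentile values belongs to each neighboring class with degree 0.5.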

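Next, for each fuzzy class, we compute its membership-weighted mean covariance matrix as:

$$\bar{\Sigma}_k = \frac{\sum_{n=1}^{N} \mu_k(y_n) X_n X_n^\top}{\sum_{n=1}^{N} \mu_k(y_n)}, \quad k = 1, \ldots, K \tag{8}$$

where $\mu_k(y_n)$ is the membership degree of $y_n$ in Fuzzy Class $k$. Substituting (8) into (4), we can solve for the spatial filtering matrix $W_k^*$ for Fuzzy Class $k$. Essentially, this makes those $X_n$ in Fuzzy Class $k$ different from those not in Fuzzy Class $k$, which will help the regression performance, as we will demonstrate in Section V.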

Next, we construct the concatenated spatial filtering matrix by (5), and finally perform the spatial filtering for each EEG trial by (3). The complete CSPR-OVR spatial filter for supervised BCI regression problems is summarized in Algorithm 1.

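Input: $N$ EEG training examples $\{X_n, y_n\}_{n=1}^{N}$, where $X_n \in \mathbb{R}^{C \times T}$, $y_n \in \mathbb{R}$;
          $K$, the number of fuzzy classes for $y$;
          $F$, the number of spatial filters for each
            fuzzy class.
Output: Spatially filtered EEG trials $X_n' \in \mathbb{R}^{KF \times T}$.
Band-pass filter each $X_n$ to remove the mean of each channel;
Compute $\{p_k\}_{k=1,\ldots,K}$ in (7);
Compute the corresponding percentile values $\{P_k\}_{k=1,\ldots,K}$ for $\{y_n\}$;
Construct the $K$ fuzzy classes as shown in Fig. 2;
Compute $\bar{\Sigma}_k$ ($k = 1, \ldots, K$) by (8);
Compute $W_k^*$ ($k = 1, \ldots, K$) by (4);
Construct $W^*$ by (5);
Return $X_n' = W^{*\top} X_n$ by (3)
Algorithm 1 The CSPR-OVR spatial filter for supervised BCI regression problems.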
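The steps of Algorithm 1 can be sketched end to end in NumPy. This is a hedged illustration, not the authors' code: the function name `cspr_ovr`, the (trials, channels, samples) layout, the shoulder-shaped outer fuzzy classes, and the reading of (8) as a membership-weighted mean covariance are all assumptions of the sketch, and the band-pass filtering step is assumed to have been done already.

```python
import numpy as np

def cspr_ovr(trials, y, K=3, F=2):
    """Sketch of Algorithm 1 for zero-mean trials of shape (N, C, T)."""
    y = np.asarray(y, dtype=float)
    # Percentile values of the partition points in (7).
    P = np.percentile(y, 100.0 * np.arange(1, K + 1) / (K + 1))
    # Triangular memberships with shoulders (assumed shape of Fig. 2).
    mu = np.stack([np.interp(y, P, np.eye(K)[k]) for k in range(K)], axis=1)
    # Per-trial spatial covariance, shape (N, C, C).
    covs = np.einsum('nct,ndt->ncd', trials, trials)
    # Membership-weighted mean covariance of each fuzzy class, as in (8).
    class_covs = [np.tensordot(mu[:, k], covs, axes=1) / mu[:, k].sum()
                  for k in range(K)]
    filters = []
    for k in range(K):
        rest = sum(class_covs[i] for i in range(K) if i != k)
        # Eigenvectors of rest^{-1} class_covs[k], as stated for (4).
        evals, evecs = np.linalg.eig(np.linalg.solve(rest, class_covs[k]))
        top = np.argsort(evals.real)[::-1][:F]
        filters.append(evecs[:, top].real)
    W = np.hstack(filters)                           # (C, K*F), as in (5)
    return W, np.einsum('cf,nct->nft', W, trials)    # W^T X_n, as in (3)

# Synthetic example: the amplitude of channel 0 grows with y.
rng = np.random.default_rng(0)
N, C, T = 60, 5, 40
y = rng.uniform(0.5, 2.0, N)
trials = rng.standard_normal((N, C, T))
trials[:, 0] *= y[:, None]
W, filtered = cspr_ovr(trials, y, K=3, F=2)
```

In a cross-validation setting the matrix `W` would be estimated on the training trials only and then applied unchanged to the test trials.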

III-C CSPR-OVA

In (4) we construct the multi-class CSP using an OVR approach, but it can also be constructed using the following one-versus-all (OVA) approach:

$$W_k^* = \arg\max_{W \in \mathbb{R}^{C \times F}} \frac{\mathrm{Tr}(W^\top \bar{\Sigma}_k W)}{\mathrm{Tr}\left[W^\top \left(\sum_{i=1}^{K} \bar{\Sigma}_i\right) W\right]} \tag{9}$$

The only difference between (9) and (4) is that the denominator of (9) also includes the contribution from Class $k$ itself. If we view Class $k$ as the signal of interest, and all other classes as noises, then (9) maximizes the signal to signal-plus-noise ratio, as (2) in the xDAWN algorithm.

Equation (9) is also a generalized Rayleigh quotient [14], and the solution $W_k^*$ is the concatenation of the $F$ eigenvectors associated with the $F$ largest eigenvalues of the matrix $\left(\sum_{i=1}^{K} \bar{\Sigma}_i\right)^{-1} \bar{\Sigma}_k$. The OVA CSP for classification still uses (5) to construct the final spatial filter, and (3) to perform the filtering.

Using the technique introduced in the previous subsection, we can easily develop the CSPR-OVA spatial filter for BCI regression problems. Its procedure is almost identical to that in Algorithm 1. The only difference is that $W_k^*$ is computed by (9) instead of (4).
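The difference between the OVR and OVA variants can be made concrete with a small NumPy sketch that starts from per-class mean covariance matrices. The function name `csp_from_covs` and the `ova` flag are hypothetical, and the random covariance matrices are for illustration only.

```python
import numpy as np

def csp_from_covs(class_covs, F, ova=False):
    """Filters from per-class mean covariances; ova=True uses (9), else (4)."""
    total = sum(class_covs)
    filters = []
    for cov_k in class_covs:
        # OVA keeps Class k in the denominator; OVR leaves it out.
        denom = total if ova else total - cov_k
        evals, evecs = np.linalg.eig(np.linalg.solve(denom, cov_k))
        top = np.argsort(evals.real)[::-1][:F]
        filters.append(evecs[:, top].real)
    return np.hstack(filters)

rng = np.random.default_rng(1)
class_covs = []
for _ in range(3):
    a = rng.standard_normal((4, 20))
    class_covs.append(a @ a.T / 20)      # random positive-definite matrices
W_ovr = csp_from_covs(class_covs, F=2)
W_ova = csp_from_covs(class_covs, F=2, ova=True)

# For OVA the optimized ratio is a signal to signal-plus-noise ratio,
# so it lies strictly between 0 and 1 for positive-definite covariances.
w = W_ova[:, 0]
ratio = (w @ class_covs[0] @ w) / (w @ sum(class_covs) @ w)
```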

IV Experiments and Data

This section introduces a PVT experiment that was used to evaluate the performances of the proposed spatial filtering algorithms, the corresponding RS and EEG data preprocessing procedures, and the feature sets.

IV-A Experiment Setup

Seventeen university students (13 males; average age 22.4, standard deviation 1.6) from National Chiao Tung University (NCTU) in Taiwan volunteered to support the data-collection efforts over a 5-month period to study EEG correlates of attention and performance changes under specific conditions of real-world fatigue [21], as determined by the effectiveness score of Readiband [42]. The voluntary, fully informed consent of the persons used in this research was obtained as required by federal and Army regulations [51, 50]. The Institutional Review Board of NCTU approved the experimental protocol.

All participants registered their fatigue levels daily through a smartphone, and were notified to report for experimental trials when the effectiveness score indicated that their condition met the experimental requirement (low fatigue: ; normal: ; high fatigue: ). Upon completion of the related questionnaires [Karolinska Sleepiness Scale (KSS) [1], and electronically-adapted visual analog scale for fatigue (VAS-F) and stress (VAS-S)] and the informed consent form, subjects performed a PVT, a dynamic attention-shifting task, a lane-keeping task, and selected surveys (KSS, VAS-F, VAS-S, state-trait anxiety inventory, and mind-wandering) preceding each condition. EEG data were recorded at 1000 Hz using a 64-channel NeuroScan system. Most participants performed the laboratory experiment three times in each of the three fatigue states.

In this paper we focus on the PVT [10], which is a sustained-attention task that uses RS to measure the speed with which a subject responds to a visual stimulus. It is widely used, particularly by NASA, for its ease of scoring, simple metrics, convergent validity, and freedom from learning effects. In our experiment, the PVT was presented on a smartphone. Each trial started with an empty solid white circle centered on the touchscreen, which then began to fill in red in a clockwise sweeping motion, like the hand of a clock. The sweeping motion was programmed to turn the circle solid red in one second or to terminate upon a response by the participant, who had to tap the touchscreen with the thumb of the dominant hand. The RS was computed as the inverse of the elapsed time between the appearance of the empty solid white circle and the participant’s response. Following completion of each trial, the circle went back to solid white until the next trial. Inter-trial intervals were random, between 2 and 10 seconds.

143 sessions of PVT data were collected from the 17 subjects, and each session lasted 10 minutes. Our goal is to predict the RS using a 3-second EEG trial immediately before it.

IV-B Performance Evaluation Process

The following procedure was performed to evaluate the performances of different spatial filters:

  1. EEG data preprocessing to suppress artifacts and noises.

  2. RS data preprocessing to suppress outliers.

  3. 5-fold cross-validation to compute the regression performance for each combination of spatial filters and regression method: first randomly partition the trials into five equal folds; then, use four folds for supervised spatial filtering and regression model training, and the remaining fold for testing; repeat this five times so that every fold is used in testing; finally compute the regression performances in terms of root mean square error (RMSE) and correlation coefficient (CC). Two regression methods were used: LASSO, whose adjustable parameter was optimized by an inner 5-fold cross-validation on the training dataset, and k-nearest neighbors (kNN) regression, where .

  4. Repeat Step 3 ten times and compute the average regression performance.

More details about the first two steps are given in the next two subsections.
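The cross-validation loop in Step 3 can be sketched as follows for the kNN regressor. This is an illustration under assumptions: the number of neighbors `k=5` is a placeholder, the RMSE and CC are computed here over the pooled held-out predictions, and the helper names `knn_predict` and `cross_validate` are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Plain kNN regression: average the targets of the k nearest points."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def cross_validate(X, y, k=5, n_folds=5, seed=0):
    """Return (RMSE, CC) computed over all held-out predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Any supervised spatial filter must be fitted here, on the
        # training folds only, before features are extracted.
        pred[test_idx] = knn_predict(X[train_idx], y[train_idx],
                                     X[test_idx], k)
    rmse = np.sqrt(np.mean((pred - y) ** 2))
    cc = np.corrcoef(pred, y)[0, 1]
    return rmse, cc

# Toy data: one informative feature with a small amount of noise.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 1))
y = 2.0 * X[:, 0] + 0.05 * rng.standard_normal(100)
rmse, cc = cross_validate(X, y)
```

Step 4 would correspond to calling `cross_validate` with ten different seeds and averaging the returned values.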

IV-C EEG Data Preprocessing

We first downsampled the EEG data to 256 Hz, then epoched them to 3-second trials according to the onset of the PVTs. Let the onset time of the $n$th PVT be $t_n$. Then, the 62-channel EEG trial in $[t_n - 3, t_n]$ seconds was used to predict the RS, i.e., $X_n \in \mathbb{R}^{62 \times 768}$ (3 seconds at 256 Hz). Each trial was then individually filtered by a Hz finite impulse response band-pass filter to make each channel zero-mean and to remove unneeded high-frequency components.

Because the inter-trial intervals varied randomly between 2 and 10 seconds, it is possible that a 3-second EEG trial covers part of the data from the previous trial. Additionally, a trial may also contain the EEG oscillations related to motor reaction (tapping the touchscreen) in the previous trial. To remedy these problems, we removed overlapping trials: let the RS of the $n$th trial be $y_n$ (the corresponding response time is $1/y_n$); then, the $n$th trial is removed if $t_n - 3 < t_{n-1} + 1/y_{n-1}$, i.e., when the 3-second EEG data for Trial $n$ overlap with the data and response for the previous trial.
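The rule for removing overlapping trials can be sketched as a boolean mask. The function name `keep_non_overlapping` and the toy onset and response times are illustrative assumptions; times are in seconds.

```python
import numpy as np

def keep_non_overlapping(onsets, response_times, window=3.0):
    """Mask of trials whose EEG window starts after the previous response."""
    onsets = np.asarray(onsets, dtype=float)
    rts = np.asarray(response_times, dtype=float)
    keep = np.ones(len(onsets), dtype=bool)
    # Time at which the participant responded in the previous trial.
    prev_response = onsets[:-1] + rts[:-1]
    # Keep trial n only if its window start is not before that response.
    keep[1:] = onsets[1:] - window >= prev_response
    return keep

onsets = [0.0, 2.8, 8.0, 11.2]
rts = [0.4, 0.5, 0.3, 0.6]
mask = keep_non_overlapping(onsets, rts)
```

In this toy example the second and fourth trials are dropped because their 3-second windows begin before the preceding response was made.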

IV-D RS Data Preprocessing

The raw response times for two subjects are shown in Fig. 3. The top panel is from a typical subject, whose response times were mostly shorter than 1 second. The lower panel is from a subject with possible data recording issues, because many response times were longer than 5 seconds, which is highly unlikely in practice. So we excluded that subject from consideration in this paper, and only used the remaining 16 subjects.

Fig. 3: Response times for a typical subject (top panel) and a subject with possible data recording issues (bottom panel). The green line is the threshold, and the red stars are response times above the threshold, which will be brought to the threshold.

As shown in Fig. 3, the response times were very noisy, and there were obvious outliers. It is very important to suppress the outliers and noises so that the performances of different algorithms can be more accurately compared. In addition to the step in the previous subsection to remove overlapping trials, we also employed the following 2-step procedure for response time preprocessing:

  1. Outlier thresholding, which aimed to suppress abnormally large response times. First, a threshold $\theta$ was computed for each subject from $m$ and $\sigma$, where $m$ is the mean response time from all sessions of that subject, and $\sigma$ is the corresponding standard deviation. Then, all response times larger than $\theta$ were replaced by $\theta$. Note that the threshold was different for different subjects.

  2. Moving average smoothing, which replaced each response time by the average response time within a 60-second moving window centered at the onset of the corresponding PVT, to suppress noise.

We then computed the RS as the inverse of the response time. The RSs for the 16 subjects are shown in Fig. 4. Observe that they are roughly in the same range, and many of them are approximately Gaussian.

Fig. 4: Distributions of the preprocessed RSs for the 16 subjects.
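The two-step response time preprocessing followed by the inversion to RS can be sketched as below. The exact threshold formula is not stated here, so the sketch assumes mean plus `n_std` standard deviations with a hypothetical default of 3; the function name `preprocess_rs` is also hypothetical, and the data of a single session are used for both steps for brevity.

```python
import numpy as np

def preprocess_rs(onsets, rts, n_std=3.0, window=60.0):
    """Threshold, smooth, and invert response times (in seconds).

    n_std is a hypothetical multiplier: the threshold is taken here as
    mean + n_std * std, which is an assumption of this sketch.
    """
    onsets = np.asarray(onsets, dtype=float)
    rts = np.asarray(rts, dtype=float)
    theta = rts.mean() + n_std * rts.std()
    clipped = np.minimum(rts, theta)              # step 1: outlier thresholding
    smoothed = np.empty_like(clipped)
    for i, t in enumerate(onsets):                # step 2: moving average
        in_window = np.abs(onsets - t) <= window / 2
        smoothed[i] = clipped[in_window].mean()
    return 1.0 / smoothed                         # response speed

onsets = np.arange(0.0, 600.0, 6.0)   # one trial every 6 s for 10 minutes
rts = np.full(100, 0.4)
rts[50] = 5.0                          # a single abnormally slow response
rs = preprocess_rs(onsets, rts)
```

The outlier is first clipped to the threshold and then averaged with its neighbors, so it lowers the RS around trial 50 only moderately instead of producing an isolated extreme value.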

IV-E Feature Extraction

We extracted the following four feature sets for each preprocessed EEG trial:

  • Raw: Theta and Alpha powerband features from the band-pass filtered EEG trials. We computed the average power spectral density (PSD) in the Theta band (4-8 Hz) and Alpha band (8-13 Hz) for each channel using Welch’s method [57], and converted these band powers to dBs as our features.

  • CAR: Theta and Alpha powerband features from EEG trials filtered by CAR. This procedure was almost identical to Raw, except that the band-pass filtered EEG trials were also spatially filtered by CAR before the powerband features were computed. CAR is one of the most commonly used spatial filters for EEG, and [31] showed that it helped improve EEG classification performance. It simply removes the mean of all channels from each channel.

  • OVR: Theta and Alpha powerband features from EEG trials filtered by CSPR-OVR. This procedure was almost identical to CAR, except that the CAR filter was replaced by CSPR-OVR. We used 3 fuzzy classes for the RSs, and 21 spatial filters for each fuzzy class, so that the spatially filtered signals had $3 \times 21 = 63$ rows, roughly the same as the 62 channels of the original signals, which ensured a fair performance comparison (Section V-C presents a sensitivity analysis on the number of spatial filters). We then extracted $63 \times 2 = 126$ band power features for each trial.

  • OVA: Theta and Alpha powerband features from EEG trials filtered by CSPR-OVA. This procedure was also almost identical to CAR, except that the spatial filtering was performed by CSPR-OVA instead of CAR. There were also 126 band power features for each trial.
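The CAR filter and the band power features can be sketched in NumPy as follows. Note the substitution: to keep the example dependency-free, a plain periodogram is used in place of Welch's method, and the PSD scaling is only correct up to a constant factor. The function names are hypothetical.

```python
import numpy as np

def car(trial):
    """Common average reference: subtract the cross-channel mean per sample."""
    return trial - trial.mean(axis=0, keepdims=True)

def band_power_db(trial, fs, band):
    """Average PSD (dB) per channel within band, from a plain periodogram."""
    n = trial.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(trial, axis=1)) ** 2 / (fs * n)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return 10.0 * np.log10(psd[:, in_band].mean(axis=1))

def theta_alpha_features(trial, fs=256):
    """Theta (4-8 Hz) features for all channels, then Alpha (8-13 Hz)."""
    theta = band_power_db(trial, fs, (4, 8))
    alpha = band_power_db(trial, fs, (8, 13))
    return np.concatenate([theta, alpha])

fs = 256
t = np.arange(3 * fs) / fs
rng = np.random.default_rng(0)
trial = 0.1 * rng.standard_normal((62, t.size))
trial[0] += np.sin(2 * np.pi * 10 * t)      # 10 Hz (Alpha) rhythm on channel 0
features = theta_alpha_features(car(trial))
```

For a 62-channel trial this gives 124 features; applying the same function to a trial filtered by CSPR-OVR or CSPR-OVA with 63 rows would give 126.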

V Experimental Results

This section compares the informativeness of the features in Raw, CAR, OVR and OVA, presents the regression performances, and also performs parameter sensitivity analysis for Algorithm 1.

V-A Informativeness of the Features

Before studying the regression performances, it is important to check if the extracted features in Raw, CAR, OVR and OVA are indeed meaningful. We picked a typical subject, randomly partitioned his data into 50% training and 50% testing, and extracted Raw and CAR. We then designed the spatial filters using CSPR-OVR and CSPR-OVA on the training data, and extracted the corresponding OVR and OVA. For each feature set, we identified the top three channels that had the maximum correlation with the RS using the training data, and also computed the corresponding correlation coefficients for the testing data.

The results are shown in Fig. 5, where in each subfigure the data to the left of the black dotted line were used for training, and the data to the right for testing. The top thick curve is the RS, and the bottom three curves are the maximally correlated features (note that good features are negatively correlated with the RS) identified from the training data. The training and testing correlation coefficients are shown on the left and right of the corresponding channel, respectively. Observe that the features from CAR had slightly better correlations with the RS in training than those from Raw, but not necessarily in testing. However, the features from OVR and OVA had much higher training and testing correlations with the RS than those from Raw and CAR, suggesting that CSPR-OVR and CSPR-OVA can indeed increase the signal quality. The reason is that, if we view Class $k$ as the signal of interest and all other classes as noise, then CSPR-OVR in (4) enhances the signal-to-noise ratio of the EEG signal, and CSPR-OVA in (9) enhances the signal to signal-plus-noise ratio.

Fig. 5: Powerband features from different feature extraction methods, and the corresponding training and testing CCs with the RS.

V-B Regression Performance Comparison

The RMSEs and CCs of LASSO and kNN using the four feature sets are shown in Fig. 6 for the 16 subjects. Recall that for each subject the feature extraction methods were run 10 times, each with randomly partitioned training and testing data, and the average regression performances are shown here. The average RMSEs and CCs across all subjects are also shown in the last group of each panel. Observe that CAR had comparable or slightly better performance than Raw. Regardless of which regression algorithm was used, generally OVR and OVA had similar performance, and both of them achieved much smaller RMSEs and much larger CCs than Raw and CAR, suggesting that our extension of CSP from supervised classification to supervised regression can indeed improve the regression performance. Finally, LASSO had better performance than kNN on Raw and CAR, but kNN became better on OVR and OVA.

Fig. 6: RMSEs and CCs of the eight approaches on the 16 subjects.

The corresponding percentage performance improvements of LASSO and kNN using the four feature sets are shown in Fig. 7, where the legend “LASSO,OVR/Raw” means the percentage performance improvement of LASSO on OVR over LASSO on Raw, and other legends should be interpreted in a similar manner. For both LASSO and kNN, OVR and OVA achieved similar performance improvements over Raw, and also over CAR. For LASSO, on average OVR had smaller RMSE than Raw, and larger CC. For kNN, on average OVR had smaller RMSE than Raw, and larger CC.

Fig. 7: Pairwise percentage performance improvement of the algorithms on the 16 subjects.

We also performed a two-way Analysis of Variance (ANOVA) for different regression algorithms to check if the RMSE and CC differences among the four feature sets were statistically significant, by setting the subjects as a random effect. The results are shown in Table I, which indicated that there were statistically significant differences in both RMSEs and CCs among different feature sets for both LASSO and kNN.

LASSO kNN
RMSE CC RMSE CC
TABLE I: $p$-values of two-way ANOVA tests for .

Then, non-parametric multiple comparison tests based on Dunn’s procedure [12, 13] were used to determine if the difference between any pair of algorithms was statistically significant, with a $p$-value correction using the False Discovery Rate method [3]. The $p$-values are shown in Table II, where the statistically significant ones are marked in bold. Table II shows that, except for the CC of kNN, generally there was no statistically significant difference between Raw and CAR. However, for both LASSO and kNN, the RMSE and CC differences between {Raw, CAR} and {OVR, OVA} were always statistically significant. In all cases, there were no statistically significant differences between OVR and OVA.

       LASSO                                       kNN
       RMSE                  CC                    RMSE                  CC
       Raw    CAR    OVR     Raw    CAR    OVR     Raw    CAR    OVR     Raw    CAR    OVR
CAR    .5883                 .3374                 .1437                 .0009
OVR    .0063  .0034          .0000  .0000          .0000  .0001          .0000  .0000
OVA    .0122  .0044  .4960   .0000  .0000  .4970   .0000  .0001  .4937   .0000  .0000  .4741
TABLE II: $p$-values of non-parametric multiple comparison for .
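A False Discovery Rate correction of a set of raw $p$-values can be sketched as below, assuming that the method referred to is the Benjamini-Hochberg step-up procedure. The function name `bh_adjust` and the example $p$-values are illustrative and are not taken from Table II.

```python
import numpy as np

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

raw = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(raw)
```

A comparison is then declared significant when its adjusted $p$-value is below the chosen significance level.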

V-C Parameter Sensitivity Analysis

There are two adjustable parameters in CSPR-OVR: $K$, the number of fuzzy classes for the RSs, and $F$, the number of spatial filters for each fuzzy class. In this subsection we study the sensitivity of the regression performance to these two parameters.

The regression performances for different $K$ ($F$ was fixed to be 21) are shown in Fig. 8. Algorithm 1 was repeated five times, each with a random partition of training and testing data, and the average regression results are shown. For both LASSO and kNN, on average gave worst performance, but resulted in roughly the same RMSE and CC. Hence, seems to be a good compromise between performance and computational cost.

Fig. 8: (a) RMSEs and (b) CCs of LASSO and kNN with respect to $K$, the number of fuzzy classes in Algorithm 1.

The regression performances for different $F$ ($K$ was fixed to be 3) are shown in Fig. 9. Algorithm 1 was again repeated five times, and the average regression results are shown. For both LASSO and kNN, generally a larger $F$ resulted in a smaller RMSE and a larger CC, but the performance may reach a plateau at a certain $F$. Also, a larger $F$ means heavier computational cost, which should be taken into consideration in choosing $F$. For the PVT experiment, seemed to achieve a good compromise between performance and computational cost.

Fig. 9: (a) RMSEs and (b) CCs of LASSO and kNN with respect to $F$, the number of spatial filters for each fuzzy class in Algorithm 1.

V-D Different Fuzzy Set Shapes

In Section III we used triangular fuzzy sets for simplicity, but other shapes can also be used. Fig. 10 illustrates how Gaussian fuzzy sets can be designed here: the center of the $k$th Gaussian fuzzy class is at $P_k$ [computed from (7)], and the spread is specially designed so that two adjacent fuzzy sets intersect at the midpoint with membership grade 0.5. As a result, generally the Gaussian fuzzy classes are not symmetric.

When the Gaussian fuzzy classes in Fig. 10 are used in CSPR-OVR and CSPR-OVA, the results are shown in Fig. 11, which are almost identical to those obtained from triangular fuzzy sets (Fig. 6).

Fig. 10: The three fuzzy classes for $y$, when Gaussian fuzzy sets are used.
Fig. 11: RMSEs and CCs of the eight approaches on the 16 subjects, when the three Gaussian fuzzy sets in Fig. 10 are used in CSPR-OVR and CSPR-OVA.
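The design rule for the Gaussian fuzzy classes can be sketched as follows. For two centers a distance $2d$ apart, a membership of 0.5 at the midpoint requires a spread of $\sigma = d / \sqrt{2 \ln 2}$ on the facing sides. The treatment of the outer sides of the first and last classes is not specified here, so the sketch holds them at membership 1, which is an assumption; the function name `gaussian_memberships` is hypothetical.

```python
import numpy as np

def gaussian_memberships(y, P):
    """Asymmetric Gaussian memberships centered at the percentile values P."""
    y = np.asarray(y, dtype=float)
    P = np.asarray(P, dtype=float)
    K = len(P)
    mu = np.ones((len(y), K))
    for k in range(K):
        if k > 0:                       # left side: reaches 0.5 at the midpoint
            sigma = (P[k] - P[k - 1]) / 2 / np.sqrt(2 * np.log(2))
            left = y < P[k]
            mu[left, k] = np.exp(-(y[left] - P[k]) ** 2 / (2 * sigma ** 2))
        if k < K - 1:                   # right side: same rule, next center
            sigma = (P[k + 1] - P[k]) / 2 / np.sqrt(2 * np.log(2))
            right = y > P[k]
            mu[right, k] = np.exp(-(y[right] - P[k]) ** 2 / (2 * sigma ** 2))
    return mu

P = np.array([1.0, 2.0, 4.0])      # unevenly spaced centers
y = np.array([1.5, 2.0, 3.0])      # two midpoints and one center
mu = gaussian_memberships(y, P)
```

Because the centers are unevenly spaced, the middle class uses a narrower spread on its left side than on its right side, which is why the resulting classes are generally not symmetric.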

V-E Robustness to Noise

It is also important to study the robustness of different spatial filters to noises. According to [64], there are two types of noises: class noise, which is the noise on the model outputs, and attribute noise, which is the noise on the model inputs. In this subsection we focus on the attribute noise.

As in [64], for each model input, we randomly replaced () of all trials from a subject with a uniform noise between its minimum and maximum values. After this was done for both the training and testing data, we extracted feature sets Raw, CAR, OVR and OVA, and trained LASSO and kNN, on the corrupted training data. We then tested their performances on the corrupted testing data. The results are shown in Fig. 12. Generally, as the noise level increased, the performances decreased, which is intuitive. However, OVR and OVA achieved better RMSEs and CCs than Raw and CAR at almost all noise levels, suggesting that it is still beneficial to use CSPR-OVR and CSPR-OVA even under high attribute noise.

Fig. 12: Average RMSEs and CCs of the eight approaches with respect to different attribute noise levels.
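The attribute noise injection can be sketched on a generic two-dimensional array. The interpretation is an assumption: rows are trials, columns are model inputs, and for each column a random fraction of the rows is replaced by uniform noise drawn between the minimum and maximum of that column. The function name `add_attribute_noise` is hypothetical.

```python
import numpy as np

def add_attribute_noise(X, fraction, seed=0):
    """Replace a random fraction of the rows of each column with uniform noise."""
    rng = np.random.default_rng(seed)
    X_noisy = np.array(X, dtype=float)           # copy; the input is untouched
    n_rows, n_cols = X_noisy.shape
    n_replace = int(round(fraction * n_rows))
    for j in range(n_cols):
        lo, hi = X_noisy[:, j].min(), X_noisy[:, j].max()
        rows = rng.choice(n_rows, size=n_replace, replace=False)
        X_noisy[rows, j] = rng.uniform(lo, hi, size=n_replace)
    return X_noisy

X = np.arange(40.0).reshape(20, 2)
X_noisy = add_attribute_noise(X, fraction=0.5)
```

Applying the same function with increasing `fraction` to both the training and testing data reproduces the kind of noise sweep described above.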

V-F Computational Cost

Observe from Algorithm 1 that in training CSPR-OVR needs to perform a matrix inversion and an eigen-decomposition to compute each $W_k^*$; however, once the training is done, the filtering of new EEG trials can be conducted very efficiently by a simple matrix multiplication [see (3)]. Let $N$ be the number of training samples. Then, the actual training time of CSPR-OVR and CSPR-OVA increased linearly with $N$, as shown in Fig. 13. The platform was a Dell XPS15 laptop (Intel i7-6700HQ CPU @2.60GHz, 16 GB memory) running Windows 10 Pro 64-bit and Matlab 2016b. A least squares curve fit shows that the training time is seconds, which should not be a problem for a practical $N$.

Fig. 13: The training time of CSPR-OVR and CSPR-OVA wrt .
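A least squares line fit of training time against the number of training samples, as mentioned above, can be sketched with the closed-form formulas below. The helper fit_line is an assumption made for the example, and the timing values are synthetic placeholders generated from an arbitrary line. They are not the measurements behind Fig. 13.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholders only: times generated from an arbitrary line.
n_train = [100, 200, 400, 800]
seconds = [0.5 + 0.002 * n for n in n_train]
slope, intercept = fit_line(n_train, seconds)
```

With real timings, the slope estimates the extra training time per additional training sample and the intercept estimates the fixed overhead.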

VI Discussions and Future Research

Recall that 5-fold cross-validation was used in the performance evaluation in the previous section, i.e., we concatenated the nine-session data from the same subject, randomly partitioned them into five equal-length folds, and then used four folds for training and the remaining one for testing. So, the training and testing folds contained data from the same sessions. This is equivalent to labeling some session-specific data in offline regression. Our results showed that in this case CSPR-OVR and CSPR-OVA can significantly improve the regression performance.
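The pooled 5-fold partition described above can be sketched as follows, to make explicit why training and testing folds share sessions. This is a minimal sketch under stated assumptions: the helper k_fold_splits and the session and trial counts are hypothetical, and the folds here are only near-equal in length when the trial count is not divisible by the number of folds.

```python
import random


def k_fold_splits(n_trials, n_folds, rng):
    """Yield (train, test) index lists from a random partition into folds."""
    order = list(range(n_trials))
    rng.shuffle(order)
    # Striding over the shuffled order gives folds of (near-)equal length.
    folds = [order[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Hypothetical sizes: 9 sessions of 10 trials each, concatenated.
session_ids = [s for s in range(9) for _ in range(10)]
splits = list(k_fold_splits(len(session_ids), 5, random.Random(0)))

# Sessions that appear on both sides of each split.
shared = [
    {session_ids[i] for i in train} & {session_ids[i] for i in test}
    for train, test in splits
]
```

Every entry of shared is non-empty, so each test fold is evaluated with a model that has already seen trials from the same sessions.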

To avoid the use of session-specific data, we also investigated a different validation method: leave-one-session-out validation, in which for each subject we trained the spatial filters using eight sessions and tested them on the remaining session. Interestingly, all four feature sets and both regression models achieved very poor performance here. The reasons are: 1) we need a proper way to normalize the RSs from different sessions, as done for the response times in [16]; and 2) there is large intra-subject variation, meaning that the EEG responses of the same subject vary over time (recall that these nine sessions were collected on different days), so the patterns learned from previous sessions become obsolete for the new session, and hence spatial filtering alone does not help. However, our previous research [58, 62, 61] has shown that transfer learning can cope well with inter-subject variation (individual differences) in both classification and regression problems, and we conjecture that it can also handle intra-subject variation. One of our future research directions is to demonstrate the performance of CSPR-OVR and CSPR-OVA in a transfer learning framework to individualize a generalized model for regression problems, as done in [18, 46] for EEG-based cognitive performance classification.
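The leave-one-session-out split described above can be sketched as follows. The helper leave_one_session_out and the per-session trial count are assumptions made for illustration only.

```python
def leave_one_session_out(session_ids):
    """Yield (held_out, train, test) with one whole session held out each time."""
    for held_out in sorted(set(session_ids)):
        train = [i for i, s in enumerate(session_ids) if s != held_out]
        test = [i for i, s in enumerate(session_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical sizes: 9 sessions of 4 trials each.
session_ids = [s for s in range(9) for _ in range(4)]
splits = list(leave_one_session_out(session_ids))
```

Here the spatial filters and the regression model would be fitted on the train indices only, so no trial from the held-out session influences training.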

Another direction of our future research is to apply CSPR-OVR and CSPR-OVA to other important EEG-based regression problems, e.g., drowsiness (or alertness) estimation during driving, and to integrate them with more sophisticated feature extraction approaches, e.g., Riemannian geometry [8], for better regression performance.

VII Conclusions

EEG signals are easily contaminated by artifacts and noise, so preprocessing is needed before they are fed into a machine learning algorithm in BCI. Spatial filters, e.g., ICA, xDAWN, CSP and CCA, have been widely used to increase the EEG signal quality for classification problems, but their applications in BCI regression problems have been very limited. In this paper, we have proposed two CSP filters for EEG-based regression problems in BCI, which were extended from the CSP filter for classification by making use of fuzzy sets. Extensive experimental results on EEG-based RS estimation from a large-scale study, which collected 143 sessions of PVT data from 17 subjects during a 5-month period, demonstrated that our proposed spatial filters can significantly increase the EEG signal quality. When used in LASSO and kNN, the spatial filters can reduce the estimation RMSE by , and at the same time increase the CC by .

Acknowledgement

Research was sponsored by the U.S. Army Research Laboratory and was accomplished under Cooperative Agreement Numbers W911NF-10-2-0022 and W911NF-10-D-0002/TO 0023. The views and the conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory or the U.S. Government. This work was also partially supported by the Australian Research Council (ARC) under discovery grant DP150101645.

References

  • [1] T. Akerstedt and M. Gillberg, “Subjective and objective sleepiness in the active individual,” International Journal of Neuroscience, vol. 52, no. 1-2, pp. 29–37, 1990.
  • [2] A. Barachant. (2014) MEG decoding using Riemannian geometry and unsupervised classification. Accessed: 8/17/2016. [Online]. Available: http://alexandre.barachant.org/wp-content/uploads/2014/08/documentation.pdf.
  • [3] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: A practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society, Series B (Methodological), vol. 57, pp. 289–300, 1995.
  • [4] N. Bigdely-Shamlo, T. Mullen, C. Kothe, K.-M. Su, and K. A. Robbins, “The PREP pipeline: standardized preprocessing for large-scale EEG analysis,” Frontiers in Neuroinformatics, vol. 9, 2015.
  • [5] G. Bin, X. Gao, Y. Wang, Y. Li, B. Hong, and S. Gao, “A high-speed BCI based on code modulation VEP,” Journal of neural engineering, vol. 8, no. 2, 2011.
  • [6] G. Bin, X. Gao, Z. Yan, B. Hong, and S. Gao, “An online multi-channel SSVEP-based brain-computer interface using a canonical correlation analysis method,” Journal of neural engineering, vol. 6, no. 4, 2009.
  • [7] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K. R. Muller, “Optimizing spatial filters for robust EEG single-trial analysis,” IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008.
  • [8] M. Congedo, A. Barachant, and A. Andreev, “A new generation of brain-computer interface based on Riemannian geometry,” arXiv: 1310.8115, 2013.
  • [9] A. Delorme and S. Makeig, “EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis,” Journal of Neuroscience Methods, vol. 134, pp. 9–21, 2004.
  • [10] D. F. Dinges and J. W. Powell, “Microcomputer analyses of performance on a portable, simple visual RT task during sustained operations,” Behavior research methods, instruments, & computers, vol. 17, no. 6, pp. 652–655, 1985.
  • [11] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Muller, “Boosting bit rates in non-invasive EEG single-trial classifications by feature combination and multi-class paradigms,” IEEE Trans. on Biomedical Engineering, vol. 51, no. 6, pp. 993–1002, 2004.
  • [12] O. Dunn, “Multiple comparisons among means,” Journal of the American Statistical Association, vol. 56, pp. 62–64, 1961.
  • [13] ——, “Multiple comparisons using rank sums,” Technometrics, vol. 6, pp. 214–252, 1964.
  • [14] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed.   Baltimore, MD: The Johns Hopkins University Press, 1996.
  • [15] H. Hotelling, “Relations between two sets of variates,” Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936.
  • [16] Z. Hu, Y. Sun, J. Lim, N. Thakor, and A. Bezerianos, “Investigating the correlation between the neural activity and task performance in a psychomotor vigilance test,” in Proc. 37th Annual Int’l Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, August 2015, pp. 4725–4728.
  • [17] A. Hyvarinen and E. Oja, “Independent component analysis: algorithms and applications,” Neural networks, vol. 13, no. 4, pp. 411–430, 2000.
  • [18] R. R. Johnson, D. P. Popovic, R. E. Olmstead, M. Stikic, D. J. Levendowski, and C. Berka, “Drowsiness/alertness algorithm development and validation using synchronized EEG and cognitive performance to individualize a generalized model,” Biological Psychology, vol. 87, pp. 241–250, 2011.
  • [19] I. Jolliffe, Principal component analysis.   Wiley Online Library, 2002.
  • [20] T.-P. Jung, S. Makeig, C. Humphries, T.-W. Lee, M. J. Mckeown, V. Iragui, and T. J. Sejnowski, “Removing electroencephalographic artifacts by blind source separation,” Psychophysiology, vol. 37, no. 2, pp. 163–178, 2000.
  • [21] S. Kerick, C.-H. Chuang, J.-T. King, T.-P. Jung, J. Brooks, B. T. Files, K. McDowell, and C.-T. Lin, “Inter- and intra-individual variations in sleep, subjective fatigue, and vigilance task performance of students in their real-world environments over extended periods,” 2016, submitted.
  • [22] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications.   Upper Saddle River, NJ: Prentice-Hall, 1995.
  • [23] T. D. Lagerlund, F. W. Sharbrough, and N. E. Busacker, “Spatial filtering of multichannel electroencephalographic recordings through principal component analysis by singular value decomposition,” Journal of Clinical Neurophysiology, vol. 14, no. 1, pp. 73–82, 1997.
  • [24] B. J. Lance, S. E. Kerick, A. J. Ries, K. S. Oie, and K. McDowell, “Brain-computer interface technologies in the coming decades,” Proc. of the IEEE, vol. 100, no. 3, pp. 1585–1599, 2012.
  • [25] L.-D. Liao, C.-T. Lin, K. McDowell, A. Wickenden, K. Gramann, T.-P. Jung, L.-W. Ko, and J.-Y. Chang, “Biosensor technologies for augmented brain-computer interfaces in the next decades,” Proc. of the IEEE, vol. 100, no. 2, pp. 1553–1566, 2012.
  • [26] C. T. Lin, R. C. Wu, S. F. Liang, T. Y. Huang, W. H. Chao, Y. J. Chen, and T. P. Jung, “EEG-based drowsiness estimation for safety driving using independent component analysis,” IEEE Trans. on Circuits and Systems, vol. 52, pp. 2726–2738, 2005.
  • [27] C.-T. Lin, Y.-C. Chen, T.-Y. Huang, T.-T. Chiu, L.-W. Ko, S.-F. Liang, H.-Y. Hsieh, S.-H. Hsu, and J.-R. Duann, “Development of wireless brain computer interface with embedded multitask scheduling and its application on real-time driver’s drowsiness detection and warning,” IEEE Trans. on Biomedical Engineering, vol. 55, no. 5, pp. 1582–1591, 2008.
  • [28] C.-T. Lin, L.-W. Ko, I.-F. Chung, T.-Y. Huang, Y.-C. Chen, T.-P. Jung, and S.-F. Liang, “Adaptive EEG-based alertness estimation system by using ICA-based fuzzy neural networks,” IEEE Trans. on Circuits and Systems-I, vol. 53, no. 11, pp. 2469–2476, 2006.
  • [29] S. Makeig, C. Kothe, T. Mullen, N. Bigdely-Shamlo, Z. Zhang, and K. Kreutz-Delgado, “Evolving signal processing for brain-computer interfaces,” Proc. of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1567–1584, 2012.
  • [30] E. M. Maynard, C. T. Nordhausen, and R. A. Normann, “The Utah intracortical electrode array: a recording structure for potential brain-computer interfaces,” Electroencephalography and clinical neurophysiology, vol. 102, no. 3, pp. 228–239, 1997.
  • [31] D. J. McFarland, L. M. McCane, S. V. David, and J. R. Wolpaw, “Spatial filter selection for EEG-based communication,” Electroencephalography and clinical Neurophysiology, vol. 103, pp. 386–394, 1997.
  • [32] J. Mellinger, G. Schalk, C. Braun, H. Preissl, W. Rosenstiel, N. Birbaumer, and A. Kubler, “An MEG-based brain-computer interface (BCI),” Neuroimage, vol. 36, no. 3, pp. 581–593, 2007.
  • [33] N. Naseer and K.-S. Hong, “fNIRS-based brain-computer interfaces: a review,” Frontiers in human neuroscience, vol. 9, p. 3, 2015.
  • [34] L. F. Nicolas-Alonso and J. Gomez-Gil, “Brain computer interfaces, a review,” Sensors, vol. 12, no. 2, pp. 1211–1279, 2012.
  • [35] X. Pei, D. L. Barbour, E. C. Leuthardt, and G. Schalk, “Decoding vowels and consonants in spoken and imagined words using electrocorticographic signals in humans,” Journal of neural engineering, vol. 8, no. 4, 2011.
  • [36] C. C. Ragin, Fuzzy-set social science.   Chicago, IL: The University of Chicago Press, 2000.
  • [37] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Trans. on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
  • [38] B. Rivet, A. Souloumiac, V. Attina, and G. Gibert, “xDAWN algorithm to enhance evoked potentials: application to brain-computer interface,” IEEE Trans. on Biomedical Engineering, vol. 56, no. 8, pp. 2035–2043, 2009.
  • [39] B. Rivet, H. Cecotti, A. Souloumiac, E. Maby, and J. Mattout, “Theoretical analysis of xDAWN algorithm: application to an efficient sensor selection in a P300 BCI,” in Proc. 19th European Signal Processing Conference, Barcelona, Spain, August 2011, pp. 1382–1386.
  • [40] B. Rivet and A. Souloumiac, “Optimal linear spatial filters for event-related potentials based on a spatio-temporal model: Asymptotical performance analysis,” Signal Processing, vol. 93, no. 2, pp. 387–398, 2013.
  • [41] R. N. Roy, S. Bonnet, S. Charbonnier, P. Jallon, and A. Campagne, “A comparison of ERP spatial filtering methods for optimal mental workload estimation,” in Proc. 37th Annual Int’l Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015, pp. 7254–7257.
  • [42] C. Russell, J. Caldwell, D. Arand, L. Myers, P. Wubbels, and H. Downs. (2015) Validation of the fatigue science readiband actigraph and associated sleep/wake classification algorithms. Accessed: 08/11/2016. [Online]. Available: http://static1.squarespace.com/static/550af02ae4b0cf85628d981a/t/5526c99ee4b019412c323758/1428605342303/Readiband_Validation.pdf.
  • [43] F. Sagberg, P. Jackson, H.-P. Kruger, A. Muzer, and A. Williams, “Fatigue, sleepiness and reduced alertness as risk factors in driving,” Institute of Transport Economics, Oslo, Tech. Rep. TOI Report 739/2004, 2004.
  • [44] R. Sitaram, A. Caria, R. Veit, T. Gaber, G. Rota, A. Kuebler, and N. Birbaumer, “fMRI brain-computer interface: a tool for neuroscientific research and treatment,” Computational intelligence and neuroscience, 2007.
  • [45] M. Spuler, A. Walter, W. Rosenstiel, and M. Bogdan, “Spatial filtering based on canonical correlation analysis for classification of evoked or event-related potentials in EEG data,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 22, no. 6, pp. 1097–1103, 2014.
  • [46] M. Stikic, R. R. Johnson, D. J. Levendowski, D. P. Popovic, R. E. Olmstead, and C. Berka, “EEG-derived estimators of present and future cognitive performance,” Frontiers in Human Neuroscience, vol. 5, 2011.
  • [47] D. S. Tan and A. Nijholt, Eds., Brain-Computer Interfaces: Applying our Minds to Human-Computer Interaction.   London: Springer, 2010.
  • [48] M. Teplan, “Fundamentals of EEG measurement,” Measurement Science Review, vol. 2, no. 2, pp. 1–11, 2002.
  • [49] J. A. Uriguen and B. Garcia-Zapirain, “EEG artifact removal – state-of-the-art and guidelines,” Journal of Neural Engineering, vol. 12, no. 3, 2015.
  • [50] US Department of Defense Office of the Secretary of Defense, “Code of federal regulations protection of human subjects,” Government Printing Office, no. 32 CFR 19, 1999.
  • [51] US Department of the Army, “Use of volunteers as subjects of research,” Government Printing Office, no. AR 70-25, 1990.
  • [52] (2011) Traffic safety facts crash stats: drowsy driving. US Department of Transportation, National Highway Traffic Safety Administration. Washington, DC. [Online]. Available: http://www-nrd.nhtsa.dot.gov/pubs/811449.pdf
  • [53] J. van Erp, F. Lotte, and M. Tangermann, “Brain-computer interfaces: Beyond medical applications,” Computer, vol. 45, no. 4, pp. 26–34, 2012.
  • [54] R. Vigario, J. Sarela, V. Jousmaki, M. Hamalainen, and E. Oja, “Independent component approach to the analysis of EEG and MEG recordings,” IEEE Trans. on Biomedical Engineering, vol. 47, no. 5, pp. 589–593, 2000.
  • [55] L.-X. Wang, A Course in Fuzzy Systems and Control.   Upper Saddle River, NJ: Prentice Hall, 1997.
  • [56] C.-S. Wei, Y.-P. Lin, Y.-T. Wang, T.-P. Jung, N. Bigdely-Shamlo, and C.-T. Lin, “Selective transfer learning for EEG-based drowsiness detection,” in Proc. IEEE Int’l Conf. on Systems, Man and Cybernetics, Hong Kong, October 2015.
  • [57] P. Welch, “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Trans. on Audio Electroacoustics, vol. 15, pp. 70–73, 1967.
  • [58] D. Wu, “Online and offline domain adaptation for reducing BCI calibration effort,” IEEE Trans. on Human-Machine Systems, 2016, in press.
  • [59] D. Wu, C.-H. Chuang, and C.-T. Lin, “Online driver’s drowsiness estimation using domain adaptation with model fusion,” in Proc. Int’l Conf. on Affective Computing and Intelligent Interaction, Xi’an, China, September 2015, pp. 904–910.
  • [60] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, “Offline EEG-based driver drowsiness estimation using enhanced batch-mode active learning (EBMAL) for regression,” in Proc. IEEE Int’l Conf. on Systems, Man and Cybernetics, Budapest, Hungary, October 2016.
  • [61] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, “Spectral meta-learner for regression (SMLR) model aggregation: Towards calibrationless brain-computer interface (BCI),” in Proc. IEEE Int’l Conf. on Systems, Man and Cybernetics, Budapest, Hungary, October 2016.
  • [62] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, “Driver drowsiness estimation from EEG signals using online weighted adaptation regularization for regression (OwARR),” IEEE Trans. on Fuzzy Systems, 2016, in press.
  • [63] L. A. Zadeh, “Fuzzy sets,” Information and Control, vol. 8, pp. 338–353, 1965.
  • [64] X. Zhu and X. Wu, “Class noise vs. attribute noise: A quantitative study of their impacts,” Artificial Intelligence Review, vol. 22, pp. 177–210, 2004.