White-Box Target Attack for EEG-Based BCI Regression Problems
Machine learning has achieved great success in many applications, including electroencephalogram (EEG) based brain-computer interfaces (BCIs). Unfortunately, many machine learning models are vulnerable to adversarial examples, which are crafted by adding deliberately designed perturbations to the original inputs. Many adversarial attack approaches for classification problems have been proposed, but few have considered target adversarial attacks for regression problems. This paper proposes two such approaches. More specifically, we consider white-box target attacks for regression problems, where we know all information about the regression model to be attacked, and want to design small perturbations to change the regression output by a pre-determined amount. Experiments on two BCI regression problems verified that both approaches are effective. Moreover, adversarial examples generated from both approaches are also transferable, which means that we can use adversarial examples generated from one known regression model to attack an unknown regression model, i.e., to perform black-box attacks. To our knowledge, this is the first study on adversarial attacks for EEG-based BCI regression problems, which calls for more attention on the security of BCI systems.
Keywords: Adversarial attack, brain-computer interfaces, regression, target attack, white-box attack
1 Introduction
Machine learning has been widely used to solve many difficult tasks, one of which is building brain-computer interfaces (BCIs). BCIs enable a user to communicate directly with a computer via brain signals, and have attracted a lot of research interest recently [10, 11]. Electroencephalogram (EEG) is the most frequently used input signal in BCIs, because of its low cost and non-invasive nature. Three commonly used BCI paradigms are motor imagery (MI), event-related potentials (ERP) [16, 21], and steady-state visual evoked potentials (SSVEP). Machine learning can be used to extract more generalizable features and construct high-performance models, and hence makes BCIs more robust and user-friendly.
Recent research has shown that many machine learning models are vulnerable to adversarial examples. By adding deliberately designed perturbations to legitimate data, adversarial examples can cause large changes in the model outputs. The perturbations are usually so small that they are hardly noticeable by a human or a computer program, but they can dramatically degrade the model performance. For example, in image recognition, adversarial examples can easily mislead a classifier into giving a wrong output. In speech recognition, adversarial examples can produce audio that sounds meaningless to a human but is understood as a meaningful voice command by a smartphone. Our recent work also showed that adversarial examples can dramatically degrade the classification accuracy of EEG-based BCIs.
There are many different approaches for crafting adversarial examples. Szegedy et al. first discovered the existence of adversarial examples in 2014, and proposed an optimization-based approach, L-BFGS, to find them. Goodfellow et al. proposed the fast gradient sign method (FGSM) in 2014, which rapidly finds adversarial examples by searching for perturbations in the direction in which the loss changes fastest. Carlini and Wagner proposed the CW method in 2017, which can find adversarial examples with very small distortions.
All of the above approaches focus on classification problems, and find perturbations that push the original examples across the decision boundary. Jagielski et al. conducted the first non-target adversarial attacks for linear regression models. This paper considers target adversarial attacks for regression problems, which change the model output by a pre-determined amount. Our contributions are:
We propose two approaches, based on optimization and gradient, respectively, to perform white-box target attack for regression problems.
We validate the effectiveness of our proposed approaches in two EEG-based BCI regression problems (drowsiness estimation and reaction time estimation). They can craft adversarial EEG trials that a human cannot distinguish from the original EEG trials, but can dramatically change the outputs of the BCI regression model.
We show that adversarial examples crafted by our approaches are transferable: adversarial examples crafted from a ridge regression model can also successfully attack a neural network model, and vice versa. This makes black-box attacks possible.
The attacks proposed in this paper may pose serious security and safety problems in real-world BCI applications. For example, an EEG-based BCI system may be used to monitor the driver’s drowsiness level and urge him/her to take breaks accordingly. An attack that deliberately changes the estimated drowsiness level from a high value to a low value may overload the driver, and hence cause accidents.
The remainder of this paper is organized as follows: Section 2 introduces several typical adversarial attack approaches for classification problems. Section 3 proposes two white-box target attack approaches for regression problems. Section 4 evaluates the performance of our proposed approaches in two EEG-based BCI regression problems. Section 5 draws conclusions.
2 Adversarial Attacks for Classification Problems
This section introduces two typical adversarial attack approaches for classification problems, which are extended to regression problems in the next section.
2.1 Adversarial Attack Types
Assume a valid benign example $x \in \mathbb{R}^d$ ($d$ is the dimensionality of $x$) is classified into Class $y$ by a classifier $f$. It is possible to find an adversarial example $x'$, which is very similar to the original sample $x$ according to some distance metric $D(x, x')$, but is misclassified, i.e., $f(x') \neq y$. According to how $f(x')$ differs from $y$, there can be two types of attacks:
Target attack, in which all adversarial examples are classified into a pre-determined class $t$.
Non-target attack, whose goal is to construct adversarial examples that will be misclassified, but does not require them to be misclassified into a particular class.
According to how much knowledge the attacker can obtain about the target model (the model to be attacked), adversarial attacks can also be categorized into:
White-box attack, in which the attacker knows all information about the target model, such as its architecture and all parameter values.
Black-box attack, in which the attacker does not know the architecture and parameters of the target model; instead, he/she can feed some inputs to it and observe its outputs. In this way, he/she can obtain some training examples, and train a substitute model to craft adversarial examples to attack the target model. This approach makes use of the transferability of adversarial examples.
2.2 White-Box Target Attack Approaches
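This paper considers white-box target attacks only. Assume we know the architecture and all parameters of the classifier $f$. We want to craft an adversarial example $x'$ from an input $x$ so that $f(x') = t$, where $t$ is a fixed class for all $x$.
Two representative target attack approaches for classification problems are:
CW method. To keep every dimension of the adversarial example within $[0, 1]$, a change of variable is used:
$$x' = \frac{1}{2}\left(\tanh(w) + 1\right) \tag{1}$$
Because $\tanh(\cdot) \in [-1, 1]$, $x'$ automatically satisfies the constraint $x' \in [0, 1]^d$. $w$ in (1) is the variable to be optimized, which can assume any value in $\mathbb{R}^d$.
Given $x$, $w$ is found through:
$$\min_{w} \; \|x' - x\|_2^2 + c \cdot g(x') \tag{2}$$
where $c$ is a trade-off parameter, and
$$g(x') = \max\left\{ \max_{i \neq t} Z(x')_i - Z(x')_t, \; -\kappa \right\} \tag{3}$$
in which $Z(x')_i$ is the logit of the target model for Class $i$, and $\kappa$ controls the confidence of the adversarial example. A large $\kappa$ forces the adversarial example to be classified into the target class with high confidence.
FGSM and ITCM. FGSM crafts an adversarial example in a single step:
$$x' = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y)\right) \tag{4}$$
where $\epsilon$ controls the amplitude of the perturbation, $J$ is a loss function, and $y$ is the true label of $x$.
ITCM performs target attack by replacing $y$ in (4) by the target class $t$ and moving against the gradient, so that the loss with respect to $t$ decreases. It also improves the attack performance by taking multiple small steps of $\alpha$ along the gradient sign and clipping the maximum perturbation to $\epsilon$, instead of taking a single large step of $\epsilon$ as in (4):
$$x'_0 = x, \quad x'_{n+1} = \mathrm{Clip}_{x,\epsilon}\left\{ x'_n - \alpha \cdot \mathrm{sign}\left(\nabla_x J(x'_n, t)\right) \right\} \tag{5}$$
where $\mathrm{Clip}_{x,\epsilon}$ ensures the difference between each dimension of $x'_n$ and the corresponding dimension of $x$ does not exceed $\epsilon$.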
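As a rough illustration of the iterative, clipped, targeted update described above, the following NumPy sketch attacks a toy linear softmax classifier. The classifier, its weights `W` and `b`, the cross-entropy loss, and the values of `eps`, `alpha` and `n_steps` are hypothetical choices made for this example only; none of them are taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_iterative_sign_attack(x, target, W, b, eps=0.5, alpha=0.05, n_steps=20):
    """Move x towards class `target` with small signed-gradient steps, clipped to an eps-box."""
    x_adv = x.copy()
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(n_steps):
        p = softmax(W @ x_adv + b)
        # Gradient of the cross-entropy loss J(x_adv, target) w.r.t. the input
        grad = W.T @ (p - onehot)
        # Step against the gradient so that the loss for the target class decreases
        x_adv = x_adv - alpha * np.sign(grad)
        # Keep every dimension within eps of the original input
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy 3-class linear classifier on 2-D inputs (hypothetical values)
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([1.0, 0.2])  # originally assigned to class 0
x_adv = targeted_iterative_sign_attack(x, target=1, W=W, b=b)
```

In this toy setting the perturbed input ends up in the target class while no dimension moves by more than `eps`; for a real network the gradient would come from automatic differentiation instead of the closed form used here.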
3 White-Box Target Attack for Regression Problems
This section proposes two white-box target attack approaches for regression problems.
Let $x$ be an input, $y$ the groundtruth output, and $f$ the regression model. Target attack aims to generate a small perturbation $\delta$ such that the adversarial example $x' = x + \delta$ increases the regression output by at least a predefined target amount $t$. The regression output can also be decreased by $t$; without loss of generality, only the increase is considered in this paper.
3.1 CW for Regression (CW-R)
To extend the CW target attack approach from classification to regression, we optimize the following loss function:
The constructed adversarial example is then .
The pseudocode of the proposed CW method for regression (CW-R) is shown in Algorithm 1. It uses iterative binary search to find the optimal trade-off parameter $c$.
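Since Algorithm 1 is not reproduced in this text, the sketch below only illustrates the general idea of a CW-style target attack with a binary search over the trade-off parameter. It makes several assumptions that are not taken from the paper: the model is a linear regressor, the loss is $\|\delta\|_2^2 + c \cdot \max(0, f(x) + t - f(x + \delta))$, no change of variable is used, and plain gradient descent replaces whatever optimizer the authors used. The function name `cw_r_linear` and all default values are hypothetical.

```python
import numpy as np

def cw_r_linear(x, w, b, t, c_init=1.0, n_search=8, n_steps=500, lr=0.01):
    """CW-style target attack on f(x) = w @ x + b (illustrative sketch).

    Assumed loss: ||delta||^2 + c * max(0, f(x) + t - f(x + delta)).
    Returns the smallest successful perturbation found, or None.
    """
    f = lambda v: float(w @ v + b)
    goal = f(x) + t
    lo, hi, c = 0.0, np.inf, c_init
    best = None
    for _ in range(n_search):
        delta = np.zeros_like(x)
        success = False
        for _ in range(n_steps):
            # The hinge term contributes a gradient only while the goal is not reached
            active = f(x + delta) < goal
            grad = 2.0 * delta - (c * w if active else 0.0)
            delta = delta - lr * grad
            if f(x + delta) >= goal:
                success = True
                if best is None or np.linalg.norm(delta) < np.linalg.norm(best):
                    best = delta.copy()
        if success:
            # Attack worked: try a smaller c to favour smaller distortion
            hi = c
            c = 0.5 * (lo + hi)
        else:
            # Attack failed: increase c to put more weight on reaching the goal
            lo = c
            c = c * 10.0 if np.isinf(hi) else 0.5 * (lo + hi)
    return best

# Hypothetical linear model and input
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, 0.1, -0.3])
t = 0.5
delta = cw_r_linear(x, w, b, t)
```

For a linear model the smallest possible L2 perturbation has norm $t / \|w\|_2$, which gives a convenient sanity check: the perturbation returned here should be only slightly larger than that bound.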
3.2 Iterative Fast Gradient Sign Method for Regression (IFGSM-R)
Iterative fast gradient sign method for regression (IFGSM-R) extends ITCM from classification to regression.
Define the loss function
which is essentially the same as (9), except that a change of variable is not used here. Then, the adversarial example can be iteratively calculated as:
The pseudocode of the proposed IFGSM-R is shown in Algorithm 2.
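Algorithm 2 is likewise not reproduced in this text, so the following is only a sketch of an iterative signed-gradient target attack for regression under stated assumptions: a linear regressor, the loss $\|x' - x\|_2^2 + c \cdot \max(0, f(x) + t - f(x'))$, per-dimension clipping to `eps`, and early stopping once the target change is reached. The function name `ifgsm_r_linear` and the values of `eps`, `alpha` and `c` are hypothetical.

```python
import numpy as np

def ifgsm_r_linear(x, w, b, t, eps=0.2, alpha=0.01, c=10.0, n_steps=100):
    """Iterative signed-gradient target attack on f(x) = w @ x + b (illustrative sketch)."""
    f = lambda v: float(w @ v + b)
    goal = f(x) + t
    x_adv = x.copy()
    for _ in range(n_steps):
        if f(x_adv) >= goal:
            break  # stop as soon as the target change is reached
        # Gradient of ||x_adv - x||^2 + c * max(0, goal - f(x_adv)) while the hinge is active
        grad = 2.0 * (x_adv - x) - c * w
        # Small step against the gradient sign
        x_adv = x_adv - alpha * np.sign(grad)
        # Clip each dimension to stay within eps of the original input
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Hypothetical linear model and input
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, 0.1, -0.3])
t = 0.5
x_adv = ifgsm_r_linear(x, w, b, t)
```

If the target change cannot be reached inside the `eps` box, the function simply returns the last iterate, so a caller should check the model output before counting the attack as successful.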
4 Experiments and Results
This section evaluates the performances of the two proposed white-box target attack approaches in two BCI regression problems.
4.1 The Two BCI Regression Problems
We used the following two BCI regression datasets in our experiments:
Driving. The driving dataset was collected from 16 subjects (ten males, six females; age 24.2 ± 3.7), who participated in a sustained-attention driving experiment [4, 23]. Our task was to predict the drowsiness index from the EEG signals, which were recorded using 32 channels with a sampling rate of 500 Hz. Our preprocessing and feature extraction procedures were identical to those in . We applied a [1,50] Hz band-pass filter to remove artifacts and noise, and then downsampled the EEG signals from 500 Hz to 250 Hz. Next, we computed the average power spectral density in the theta band (4-7 Hz) and alpha band (7-13 Hz) for each channel, and used them as our features, after removing abnormal channels. Since data from one subject were not recorded correctly, we used only 15 subjects in this paper. Each subject had about 1000 samples. More details about this dataset can be found in [4, 19].
PVT. A psychomotor vigilance task (PVT) uses reaction time (RT) to measure a subject’s response speed to a visual stimulus. Our dataset consisted of 17 subjects (13 males, four females; age 22.4 ± 1.6), each with 465-843 trials. The 64-channel EEG signals were preprocessed using the standardized early-stage EEG processing pipeline (PREP). Then, they were downsampled from 1000 Hz to 256 Hz, and passed through a Hz band-pass filter. Similar to the driving dataset, we also computed the average power spectral density in the theta band (4-7 Hz) and alpha band (7-13 Hz) for each channel as our features. The goal was to predict a user’s RT from the EEG signals. More details about this dataset can be found in [22, 20].
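The band-power features used for both datasets can be sketched as follows. This is a minimal example that assumes a plain periodogram as the power spectral density estimate and half-open band edges; the text does not specify the estimator or the edge handling, and the function name `band_power_features` is hypothetical.

```python
import numpy as np

def band_power_features(eeg, fs, bands=((4.0, 7.0), (7.0, 13.0))):
    """Average PSD per channel in each band; eeg has shape (channels, samples)."""
    eeg = np.asarray(eeg, dtype=float)
    n = eeg.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Simple periodogram estimate of the PSD (an assumption of this sketch)
    psd = np.abs(np.fft.rfft(eeg, axis=1)) ** 2 / (fs * n)
    # Mean PSD within [lo, hi) for every channel, theta first and then alpha
    feats = [psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1) for lo, hi in bands]
    return np.concatenate(feats)

# Two synthetic channels: a 10 Hz (alpha) and a 5 Hz (theta) sinusoid
fs = 250.0
time = np.arange(500) / fs
eeg = np.vstack([np.sin(2 * np.pi * 10 * time), np.sin(2 * np.pi * 5 * time)])
feats = band_power_features(eeg, fs)  # [theta_ch0, theta_ch1, alpha_ch0, alpha_ch1]
```

With two bands, each trial yields twice as many features as it has channels, which is the input dimensionality the regression models see.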
4.2 Experimental Settings and Performance Measures
We performed white-box target attack on the two BCI regression datasets. Assume the attacker knows all information about the regression model, i.e., its architecture and parameters. We crafted adversarial examples that can change the regression model output by a pre-determined amount.
Two regression models were considered. The first was ridge regression (RR) with ridge parameter . The second was a multi-layer perceptron (MLP) neural network with two hidden layers and 50 nodes in each layer. We used the Adam optimizer  and the root mean squared error (RMSE) as the loss function. Early stopping was used to reduce over-fitting.
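For reference, a ridge regression model and the RMSE measure can be written in a few lines. The sketch below uses the standard closed-form solution with an unpenalized intercept; the ridge parameter `lam` and the synthetic data are placeholders, since the value used in the experiments is not given in this text.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def rmse(y_true, y_pred):
    """Root mean squared error between two vectors."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic, noise-free data (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.5
w, b = fit_ridge(X, y, lam=1e-6)
```

Because the fitted model is $f(x) = w^\top x + b$, its input gradient is simply $w$, which is what makes the white-box attack on ridge regression straightforward.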
Two attack scenarios were considered:
Within-subject attack. For each individual subject, we randomly chose data for training the RR model and the rest for testing. For the MLP, we further randomly set apart of the training set as the validation set in early stopping. We computed the test RMSE for each subject, and also their average across all subjects.
Cross-subject attack. Each time we picked one subject as the test subject, and concatenated data from all remaining subjects together to train the RR model. For the MLP, of these data were randomly selected for training, and the remaining for validation in early stopping. RMSEs were computed on the test subject.
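The cross-subject scenario amounts to leave-one-subject-out evaluation, which can be sketched as below. The validation fraction `val_fraction` is a placeholder, because the split proportions are not given in this text, and the function name `cross_subject_splits` is hypothetical.

```python
import numpy as np

def cross_subject_splits(subject_ids, val_fraction=0.25, seed=0):
    """Yield (test_subject, train_idx, val_idx, test_idx) for each held-out subject."""
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    for s in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == s)
        # Pool and shuffle the trials of all remaining subjects
        rest = rng.permutation(np.flatnonzero(subject_ids != s))
        n_val = int(round(val_fraction * rest.size))
        yield s, rest[n_val:], rest[:n_val], test_idx

# Three hypothetical subjects with four trials each
subject_ids = np.repeat([0, 1, 2], 4)
splits = list(cross_subject_splits(subject_ids))
```

For a model without early stopping, such as ridge regression, the training and validation indices can simply be concatenated and used together for training.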
Attack success rate (ASR) and distortion were used to evaluate the attack performance. The ASR was defined as the percentage of adversarial examples whose prediction satisfied , where was our targeted change. The distortion was computed as the distance between the adversarial example and the original example.
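The two measures can be computed as in the sketch below. It assumes that an attack counts as successful when the model output on the adversarial example exceeds the output on the original example by at least the target `t`, and that the distortion is an L2 norm averaged over trials; both are assumptions of this sketch, since the exact formulas are not legible in this text. The function names are hypothetical.

```python
import numpy as np

def attack_success_rate(pred_orig, pred_adv, t):
    """Percentage of trials whose output increased by at least t (assumed criterion)."""
    pred_orig, pred_adv = np.asarray(pred_orig), np.asarray(pred_adv)
    return 100.0 * float(np.mean(pred_adv - pred_orig >= t))

def mean_distortion(X_orig, X_adv, order=2):
    """Average norm of the perturbation over all trials (rows)."""
    diff = np.asarray(X_adv) - np.asarray(X_orig)
    return float(np.mean(np.linalg.norm(diff, ord=order, axis=1)))

# Hypothetical model outputs before and after an attack
asr = attack_success_rate([0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.6, 0.7], t=0.25)
# Hypothetical feature matrices: one trial perturbed by (3, 4), one unchanged
dist = mean_distortion(np.zeros((2, 2)), np.array([[3.0, 4.0], [0.0, 0.0]]))
```

In the toy data, three of the four outputs rise by at least `t`, and the two perturbations have norms 5 and 0.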
4.3 Experimental Results
The baseline regression performances on the original (unperturbed) EEG data are shown in the first panel of Table 1, where “mean output (MO)” is the mean of the regression outputs for all EEG trials. For each regression model on each dataset, the cross-subject RMSE was always larger than the corresponding within-subject RMSE, which is intuitive, because individual differences make it difficult to develop a model that generalizes well across subjects.
We set in both CW-R and IFGSM-R, and called the attack a success if . and were used in CW-R. , , and grid search for were used in IFGSM-R. We used distance to measure the distortion of the adversarial examples. The attack performances are shown in the second and third panels of Table 1:
The RMSEs after CW-R and IFGSM-R attacks were always much larger than those before the attacks, indicating that the attacks dramatically changed the characteristics of the model output.
For each regression model on each dataset, the mean output of the adversarial examples was always larger than that of the original examples by at least , which was our target. This suggests that both CW-R and IFGSM-R were effective.
The ASRs of both CW-R and IFGSM-R were always close to 100%, indicating that almost all attacks were successful. A closer look revealed that the ASR of CW-R was always slightly larger than the corresponding ASR of IFGSM-R, and the RMSE, mean output, and distortion of CW-R were always smaller than the corresponding quantities of IFGSM-R, i.e., CW-R was generally more effective than IFGSM-R. However, the computational cost of CW-R was much higher than that of IFGSM-R.
It is also interesting to check whether adding random noise can significantly degrade the regression performance; if so, then no deliberate adversarial example crafting is needed. To this end, we performed attacks by adding random Gaussian noise to the original examples, where the noise level was chosen so that the resulting distortion approximately equaled the maximum distortion introduced by CW-R and IFGSM-R. The corresponding attack performances are shown in the last panel of Table 1. Though the distortion was large, random Gaussian noise barely changed the regression RMSE and the mean output, and its ASR was always , suggesting that sophisticated attack approaches like CW-R and IFGSM-R are indeed needed.
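A random-noise baseline of this kind can be sketched as follows. The example assumes that the distortion is measured with the L2 norm and uses the approximation $E\|n\|_2 \approx \sigma\sqrt{d}$ for $n \sim \mathcal{N}(0, \sigma^2 I_d)$ to pick the noise level; the function name `gaussian_noise_attack` and the data are hypothetical.

```python
import numpy as np

def gaussian_noise_attack(X, target_distortion, seed=0):
    """Add i.i.d. Gaussian noise whose expected L2 norm roughly matches target_distortion."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # For n ~ N(0, sigma^2 I) in d dimensions, E||n||_2 is approximately sigma * sqrt(d)
    sigma = target_distortion / np.sqrt(d)
    return X + sigma * rng.normal(size=X.shape), sigma

# Hypothetical feature matrix: 1000 trials, 64 features
X = np.zeros((1000, 64))
X_noisy, sigma = gaussian_noise_attack(X, target_distortion=0.5)
mean_norm = float(np.mean(np.linalg.norm(X_noisy - X, axis=1)))
```

Because random noise has no preferred direction, its projection onto the model gradient averages out across trials, which is consistent with the observation that it barely changes the mean output despite a comparable distortion.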
Some examples of the original EEG trials and those after adding adversarial perturbations are shown in Fig. 1. The differences between the original and adversarial trials were too small to be distinguished by a human, and should also be very difficult for a computer algorithm to detect.
4.4 Spectrogram Analysis
This section uses spectrogram analysis to further understand the characteristics of the adversarial examples. We computed the mean spectrogram of all EEG trials, the mean spectrogram of all successful adversarial examples, and the mean spectrogram of the corresponding perturbations, using wavelet decomposition. Fig. 2 shows the results, where the adversarial examples were designed for MLP on the PVT dataset. There is no noticeable difference between the mean spectrograms of the original EEG trials and the adversarial examples crafted by our two approaches. This suggests that adversarial examples are difficult to detect through spectrogram analysis.
The third column of Fig. 2 shows the difference between the mean spectrograms in the first two columns. Note that the amplitudes were much smaller than those in the first two columns. The patterns of those two perturbations are similar. The energy of those perturbations was concentrated in [3,10] Hz, and was almost uniformly distributed in the entire time domain.
4.5 Transferability of Adversarial Examples between Different Regression Models
The transferability of adversarial examples means that adversarial examples designed to attack one model may also be used to attack a different model. This property makes black-box attacks possible, where we have no information about the target regression model at all [13, 14].
Fig. 3 shows the mean output when adversarial examples designed from MLP were used to attack the RR model [Fig. 3(a)], and vice versa [Fig. 3(b)], in within-subject attacks on the PVT dataset. In Fig. 3(a), although the attack performance on RR degraded compared with the attack performance on MLP, the adversarial examples still dramatically changed the outputs of RR. Fig. 3(b) is similar. These results demonstrate that adversarial examples generated by CW-R and IFGSM-R are also transferable, and hence may be used in black-box attacks.
5 Conclusion
This paper has proposed two white-box target attack approaches, CW-R and IFGSM-R, for regression problems, and applied them to two EEG-based BCI regression problems. Both approaches can successfully change the model output by a pre-determined amount. Generally, CW-R achieved better attack performance than IFGSM-R, in terms of a larger ASR and a smaller distortion; however, its computational cost is higher than that of IFGSM-R. We also verified that the adversarial examples crafted by both CW-R and IFGSM-R are transferable, and hence adversarial examples generated from a known regression model can also be used to attack an unknown regression model, i.e., to perform black-box attacks.
To our knowledge, this is the first study on adversarial attacks for EEG-based BCI regression problems, which calls for more attention to the security of BCI systems. Our future research will study how to defend against such attacks.
References
- [1] Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K.M., Robbins, K.A.: The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Frontiers Neuroinform 9, 16 (Jun 2015)
- [2] Carlini, N., Mishra, P., Vaidya, T., Zhang, Y., Sherr, M., Shields, C., Wagner, D., Zhou, W.: Hidden voice commands. In: Proc. 25th USENIX Security Symposium. Austin, TX (Aug 2016)
- [3] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: Proc. IEEE Symposium on Security and Privacy. San Jose, CA (May 2017)
- [4] Chuang, C.H., Ko, L.W., Jung, T.P., Lin, C.T.: Kinesthesia in a sustained-attention driving task. Neuroimage 91, 187–202 (2014)
- [5] Dinges, D.F., Powell, J.W.: Microcomputer analyses of performance on a portable, simple visual RT task during sustained operations. Behavior Res. Methods, Instrum., Comput 17(6), 652–655 (1985)
- [6] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and Harnessing Adversarial Examples. In: Proc. Int’l Conf. on Learning Representations. San Diego, CA (Dec 2014)
- [7] Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In: Proc. IEEE Symposium on Security and Privacy. San Francisco, CA (May 2018)
- [8] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Proc. Int’l Conf. on Learning Representations. Banff, Canada (Apr 2014)
- [9] Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: Proc. Int’l Conf. on Learning Representations. Toulon, France (Apr 2017)
- [10] Lance, B.J., Kerick, S.E., Ries, A.J., Oie, K.S., McDowell, K.: Brain-computer interface technologies in the coming decades. Proc. of the IEEE 100(Special Centennial Issue), 1585–1599 (May 2012)
- [11] Makeig, S., Kothe, C., Mullen, T., Bigdely-Shamlo, N., Zhang, Z., Kreutz-Delgado, K.: Evolving signal processing for brain-computer interfaces. Proc. of the IEEE 100(Special Centennial Issue), 1567–1584 (May 2012)
- [12] Middendorf, M., McMillan, G., Calhoun, G., Jones, K.: Brain-computer interfaces based on the steady-state visual-evoked response. IEEE Trans. on Rehabilitation Engineering 8(2), 211–214 (Jun 2000)
- [13] Papernot, N., McDaniel, P.D., Goodfellow, I.J.: Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR abs/1605.07277 (2016), http://arxiv.org/abs/1605.07277
- [14] Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proc. ACM Asia Conf. on Computer and Communications Security. Abu Dhabi, UAE (Apr 2017)
- [15] Pfurtscheller, G., Neuper, C.: Motor imagery and direct brain-computer communication. Proc. of the IEEE 89(7), 1123–1134 (Jul 2001)
- [16] Sutton, S., Braren, M., Zubin, J., John, E.R.: Evoked-potential correlates of stimulus uncertainty. Science 150(3700), 1187–1188 (1965)
- [17] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J., Fergus, R.: Intriguing properties of neural networks. In: Proc. Int’l Conf. on Learning Representations. Banff, Canada (Apr 2014)
- [18] Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Brain-computer interfaces for communication and control. Clinical Neurophysiology 113(6), 767–91 (Jun 2002)
- [19] Wu, D., Chuang, C.H., Lin, C.T.: Online driver’s drowsiness estimation using domain adaptation with model fusion. In: Proc. Int’l Conf. Affective Computing and Intelligent Interaction. Xi’an, China (Sep 2015)
- [20] Wu, D., King, J.T., Chuang, C.H., Lin, C.T., Jung, T.P.: Spatial filtering for EEG-based regression problems in brain-computer interface (BCI). IEEE Trans. on Fuzzy Systems 26(2), 771–781 (2018)
- [21] Wu, D., King, J.T., Chuang, C.H., Lin, C.T., Jung, T.P.: Spatial filtering for EEG-based regression problems in brain-computer interface (BCI). IEEE Trans. on Fuzzy Systems 26(2), 771–781 (Apr 2018)
- [22] Wu, D., Lance, B.J., Lawhern, V.J., Gordon, S., Jung, T.P., Lin, C.T.: EEG-based user reaction time estimation using Riemannian geometry features. IEEE Trans. on Neural Systems and Rehabilitation Engineering 25(11), 2157–2168 (Nov 2017)
- [23] Wu, D., Lawhern, V.J., Gordon, S., Lance, B.J., Lin, C.T.: Driver drowsiness estimation from EEG signals using online weighted adaptation regularization for regression (OwARR). IEEE Trans. on Fuzzy Systems 25(6), 1522–1535 (Dec 2017)
- [24] Zander, T.O., Kothe, C.: Towards passive brain-computer interfaces: applying brain-computer interface technology to human-machine systems in general. Journal of Neural Engineering 8(2), 025005 (2011)
- [25] Zhang, X., Wu, D.: On the vulnerability of CNN classifiers in EEG-based BCIs. IEEE Trans. on Neural Systems and Rehabilitation Engineering pp. 814–825 (2019)