White-Box Target Attack
for EEG-Based BCI Regression Problems
Abstract
Machine learning has achieved great success in many applications, including electroencephalogram (EEG) based brain-computer interfaces (BCIs). Unfortunately, many machine learning models are vulnerable to adversarial examples, which are crafted by adding deliberately designed perturbations to the original inputs. Many adversarial attack approaches have been proposed for classification problems, but few have considered target adversarial attacks for regression problems. This paper proposes two such approaches. More specifically, we consider white-box target attacks for regression problems, where we know all information about the regression model to be attacked, and want to design small perturbations to change the regression output by a predetermined amount. Experiments on two BCI regression problems verified that both approaches are effective. Moreover, adversarial examples generated by both approaches are transferable, which means that adversarial examples generated from one known regression model can be used to attack an unknown regression model, i.e., to perform black-box attacks. To our knowledge, this is the first study on adversarial attacks for EEG-based BCI regression problems, and it calls for more attention to the security of BCI systems.
Keywords:
Adversarial attack, brain-computer interfaces, regression, target attack, white-box attack

1 Introduction
Machine learning has been widely used to solve many difficult tasks. One of them is brain-computer interfaces (BCIs). BCIs enable a user to communicate directly with a computer via brain signals [18], and have attracted extensive research interest recently [10, 11]. Electroencephalogram (EEG) is the most frequently used input signal in BCIs, because of its low cost and non-invasive nature. Three commonly used BCI paradigms are motor imagery (MI) [15], event-related potentials (ERP) [16, 21], and steady-state visual evoked potentials (SSVEP) [12]. Machine learning can be used to extract more generalizable features [24] and to construct high-performance models [23], making BCIs more robust and user-friendly.
Recent research has shown that many machine learning models are vulnerable to adversarial examples. By adding deliberately designed perturbations to legitimate data, adversarial examples can cause large changes in the model outputs. The perturbations are usually so small that they are hardly noticeable by a human or a computer program, yet they can dramatically degrade the model performance. For example, in image recognition, adversarial examples can easily mislead a classifier into giving a wrong output [6]. In speech recognition, adversarial examples can generate audio that sounds meaningless to a human, but is understood as a meaningful voice command by a smartphone [2]. Our recent work [25] also showed that adversarial examples can dramatically degrade the classification accuracy of EEG-based BCIs.
There are many different approaches for crafting adversarial examples. Szegedy et al. [17] first discovered the existence of adversarial examples in 2014, and proposed an optimization-based approach, L-BFGS, to find them. Goodfellow et al. [6] proposed the fast gradient sign method (FGSM) in 2014, which can rapidly find adversarial examples by searching for perturbations in the direction in which the loss changes fastest. Carlini and Wagner [3] proposed the CW method in 2017, which can find adversarial examples with very small distortions.
All of the above approaches focus on classification problems, where they find perturbations that push the original examples across the decision boundary. Jagielski et al. [7] conducted the first non-target adversarial attacks for linear regression models. This paper considers target adversarial attacks for regression problems, which change the model output by a predetermined amount. Our contributions are:

We propose two approaches, one based on optimization and one on gradient information, to perform white-box target attacks for regression problems.

We validate the effectiveness of the proposed approaches in two EEG-based BCI regression problems (drowsiness estimation and reaction time estimation). They can craft adversarial EEG trials that a human cannot distinguish from the original EEG trials, but that dramatically change the outputs of the BCI regression model.

We show that adversarial examples crafted by our approaches are transferable: adversarial examples crafted from a ridge regression model can also successfully attack a neural network model, and vice versa. This makes black-box attacks possible.
The attacks proposed in this paper may pose serious security and safety problems in real-world BCI applications. For example, an EEG-based BCI system may be used to monitor a driver's drowsiness level and urge him/her to take breaks accordingly. An attack that deliberately changes the estimated drowsiness level from a high value to a low value may overload the driver, and hence cause accidents.
The remainder of this paper is organized as follows: Section 2 introduces several typical adversarial attack approaches for classification problems. Section 3 proposes two white-box target attack approaches for regression problems. Section 4 evaluates the performances of the proposed approaches in two EEG-based BCI regression problems. Section 5 draws conclusions.
2 Adversarial Attacks for Classification Problems
This section introduces two typical adversarial attack approaches for classification problems, which are extended to regression problems in the next section.
2.1 Adversarial Attack Types
Assume a valid benign example $x \in \mathbb{R}^d$ ($d$ is the dimensionality of $x$) is classified into Class $y$ by a classifier $C$. It is possible to find an adversarial example $x'$, which is very similar to $x$ according to some distance metric $D(x, x')$, but is misclassified, i.e., $C(x') \neq y$. According to how $C(x')$ differs from $y$, there can be two types of attacks:

Target attack, in which all adversarial examples are classified into a predetermined class $y_t$.

Non-target attack, whose goal is to construct adversarial examples that will be misclassified, without requiring them to be misclassified into a particular class.
According to how much knowledge the attacker can obtain about the target model (the model to be attacked), adversarial attacks can also be categorized into:

White-box attack, in which the attacker knows all information about the target model, such as its architecture and all parameter values.

Black-box attack, in which the attacker does not know the architecture or parameters of the target model; instead, he/she can feed inputs to it and observe its outputs. In this way, he/she can obtain some training examples, and train a substitute model to craft adversarial examples to attack the target model. This approach exploits the transferability of adversarial examples [13].
2.2 White-Box Target Attack Approaches
This paper considers white-box target attacks only. Assume we know the architecture and all parameters of the classifier $C$. We want to craft an adversarial example $x'$ from an input $x$ so that $C(x') = y_t$, where $y_t$ is a fixed target class for all $x$.
Two representative target attack approaches for classification problems are:

Carlini and Wagner (CW) [3], which improves L-BFGS [17]. It introduces a new variable $w$ so that

$x' = \frac{1}{2}\left(\tanh(w) + 1\right)$   (1)

automatically satisfies the constraint $x' \in [0,1]^d$. $w$ in (1) is the variable to be optimized, which can assume any value in $\mathbb{R}^d$.
Given $x$, $x'$ is found through:

$\min_w \; \|x' - x\|_2^2 + c \cdot f(x')$   (2)

where $c$ is a trade-off parameter, and

$f(x') = \max\left(\max_{i \neq t} Z(x')_i - Z(x')_t,\; -\kappa\right)$   (3)

in which $Z(x')_i$ is the logit of the target model for Class $i$, and $\kappa$ controls the confidence of the adversarial example. A large $\kappa$ forces the adversarial example to be classified into the target class with high confidence.

Iterative Target Class Method (ITCM) [9] (also called the iterative least-likely class method), which modifies FGSM [6], an efficient approach for non-target attacks:

$x' = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y)\right)$   (4)

where $\epsilon$ controls the amplitude of the perturbation, $J$ is a loss function, and $y$ is the true label of $x$.
ITCM performs a target attack by replacing $y$ in (4) with the target class $y_t$. It also improves the attack performance by taking multiple small steps of size $\alpha$ in the gradient direction and clipping the maximum perturbation to $\epsilon$, instead of taking a single large step of $\epsilon$ as in (4):

$x'_0 = x$   (5)

$x'_{n+1} = \mathrm{Clip}_{x,\epsilon}\left\{x'_n - \alpha \cdot \mathrm{sign}\left(\nabla_x J(x'_n, y_t)\right)\right\}$   (6)

where $\mathrm{Clip}_{x,\epsilon}$ ensures that the difference between each dimension of $x'_{n+1}$ and the corresponding dimension of $x$ does not exceed $\epsilon$.
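The iterative update in (5) and (6) can be sketched in a few lines of NumPy. The loss and its gradient below are toy stand-ins (a simple quadratic pulling the input toward a hypothetical point c0), not an actual classifier loss:

```python
import numpy as np

def itcm(x, grad_loss, alpha=0.02, eps=0.1, n_iter=20):
    """ITCM sketch, Eqs. (5)-(6): repeatedly take a small step of size
    alpha against the sign of the loss gradient (descending the loss of
    the target class), then clip every dimension to within eps of x."""
    x_adv = x.copy()
    for _ in range(n_iter):
        x_adv = x_adv - alpha * np.sign(grad_loss(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # Clip_{x, eps}
    return x_adv

# toy stand-in loss J(x') = ||x' - c0||^2, pulling x toward a target point c0
c0 = np.array([1.0, -1.0])
grad = lambda z: 2 * (z - c0)   # analytic gradient of the toy loss
x = np.zeros(2)
x_adv = itcm(x, grad)
```

With these toy settings the perturbation saturates at the clip bound $\pm\epsilon$ while the loss decreases monotonically.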
3 White-Box Target Attack for Regression Problems
This section proposes two white-box target attack approaches for regression problems.
Let $x \in \mathbb{R}^d$ be an input, $y$ the ground-truth output, and $g$ the regression model. A target attack aims to generate a small perturbation $\delta$ such that the adversarial example $x' = x + \delta$ changes the regression output by a predefined amount $t > 0$ (the output can also be decreased by $t$; without loss of generality, an increase is considered in this paper):

$g(x + \delta) = g(x) + t$   (7)
3.1 CW for Regression (CWR)
To extend the CW target attack approach from classification to regression, we optimize the following loss function:

$\min_w \; \|x' - x\|_2^2 + c \cdot \ell(x')$   (8)

where

$\ell(x') = \max\left(g(x) + t - g(x'),\; 0\right)$   (9)

$x' = \frac{1}{2}\left(\tanh(w) + 1\right)$   (10)

The constructed adversarial example is then $x'$.
The pseudocode of the proposed CW method for regression (CWR) is shown in Algorithm 1. It uses an iterative binary search to find the optimal trade-off parameter $c$.
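As a concrete illustration of (8)-(10), the sketch below runs plain gradient descent on the CWR loss for a toy linear model $g(x) = a^\top x$. The linear model, the hinge form of the loss, and all hyperparameter values are illustrative assumptions, and Algorithm 1's binary search over $c$ is omitted:

```python
import numpy as np

def cwr(x, a, t, c=1.0, lr=0.05, n_iter=500):
    """CWR sketch for a toy linear model g(x) = a @ x with x in [0, 1]^d.
    Minimizes ||x' - x||^2 + c * max(g(x) + t - g(x'), 0) over w, using the
    change of variable x' = (tanh(w) + 1) / 2 of Eq. (10)."""
    g = lambda z: a @ z
    target = g(x) + t
    # initialize w so that x' starts exactly at x
    w = np.arctanh(np.clip(2 * x - 1, -1 + 1e-6, 1 - 1e-6))
    for _ in range(n_iter):
        x_adv = (np.tanh(w) + 1) / 2
        hinge_on = float(target - g(x_adv) > 0)      # subgradient of the hinge
        # gradient of the loss w.r.t. x', chained through the tanh variable
        grad_x = 2 * (x_adv - x) - c * hinge_on * a
        w -= lr * grad_x * (1 - np.tanh(w) ** 2) / 2
    return (np.tanh(w) + 1) / 2

x = np.full(4, 0.5)      # a benign example, already scaled to [0, 1]
a = np.ones(4)           # hypothetical model weights
x_adv = cwr(x, a, t=0.2)
```

The distortion term keeps $x'$ close to $x$, while the hinge term pushes $g(x')$ up to the target; at convergence the output hovers just above $g(x) + t$.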
3.2 Iterative Fast Gradient Sign Method for Regression (IFGSMR)
Iterative fast gradient sign method for regression (IFGSMR) extends ITCM from classification to regression.
Define the loss function

$\ell(x') = \max\left(g(x) + t - g(x'),\; 0\right)$   (11)

which is essentially the same as (9), except that the change of variable in (10) is not used here. Then, the adversarial example can be computed iteratively as:

$x'_0 = x$   (12)

$x'_{n+1} = \mathrm{Clip}_{x,\epsilon}\left\{x'_n - \alpha \cdot \mathrm{sign}\left(\nabla_x \ell(x'_n)\right)\right\}$   (13)
The pseudocode of the proposed IFGSMR is shown in Algorithm 2.
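A minimal sketch of (11)-(13), again for a toy linear model (an illustrative assumption, chosen so the gradient of the loss is available in closed form):

```python
import numpy as np

def ifgsm_r(x, a, t, alpha=0.01, eps=0.1, n_iter=50):
    """IFGSMR sketch for a toy linear model g(x) = a @ x, per (11)-(13):
    minimize ell(x') = max(g(x) + t - g(x'), 0) with small signed-gradient
    steps, clipping the perturbation of each dimension to [-eps, eps]."""
    g = lambda z: a @ z
    target = g(x) + t
    x_adv = x.copy()
    for _ in range(n_iter):
        if g(x_adv) >= target:                      # loss is zero: done
            break
        grad = -a                                   # d ell / d x' (hinge active)
        x_adv = x_adv - alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)    # Clip_{x, eps}
    return x_adv

x = np.zeros(4)
a = np.ones(4)           # hypothetical model weights
x_adv = ifgsm_r(x, a, t=0.1)
```

Unlike CWR, no variable substitution or binary search is needed, which makes IFGSMR much cheaper, at the cost of a coarser (sign-quantized) perturbation.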
4 Experiments and Results
This section evaluates the performances of the two proposed white-box target attack approaches in two BCI regression problems.
4.1 The Two BCI Regression Problems
We used the following two BCI regression datasets in our experiments:

Driving. The driving dataset was collected from 16 subjects (ten males, six females; age 24.2 ± 3.7), who participated in a sustained-attention driving experiment [4, 23]. Our task was to predict the drowsiness index from the EEG signals, which were recorded using 32 channels at a sampling rate of 500 Hz. Our preprocessing and feature extraction procedures were identical to those in [19]. We applied a [1, 50] Hz band-pass filter to remove artifacts and noise, and then downsampled the EEG signals from 500 Hz to 250 Hz. Next, we computed the average power spectral density in the theta band (4-7 Hz) and alpha band (7-13 Hz) of each channel, and used them as our features, after removing abnormal channels. Since data from one subject were not recorded correctly, we used only 15 subjects. Each subject had about 1,000 samples. More details about this dataset can be found in [4, 19].

PVT. A psychomotor vigilance task (PVT) [5] uses reaction time (RT) to measure a subject's response speed to a visual stimulus. Our dataset [22] consisted of 17 subjects (13 males, four females; age 22.4 ± 1.6), each with 465-843 trials. The 64-channel EEG signals were preprocessed using the standardized early-stage EEG processing pipeline (PREP) [1]. Then, they were downsampled from 1000 Hz to 256 Hz and band-pass filtered. Similar to the driving dataset, we computed the average power spectral density in the theta band (4-7 Hz) and alpha band (7-13 Hz) of each channel as our features. The goal was to predict a user's RT from the EEG signals. More details about this dataset can be found in [22, 20].
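The band-power features shared by both datasets can be sketched as follows. Welch's method is used here as a generic PSD estimator, and the sampling rate, window length, filtering, and channel-rejection steps are simplifications of the actual pipelines:

```python
import numpy as np
from scipy.signal import welch

def band_powers(eeg, fs=256, bands=((4, 7), (7, 13))):
    """Average power spectral density in the theta (4-7 Hz) and alpha
    (7-13 Hz) bands of each channel.
    eeg: (n_channels, n_samples) array; returns 2 * n_channels features."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs)   # psd: (n_channels, n_freqs)
    feats = []
    for lo, hi in bands:
        idx = (freqs >= lo) & (freqs <= hi)
        feats.append(psd[:, idx].mean(axis=1))   # mean PSD per channel
    return np.concatenate(feats)

rng = np.random.default_rng(0)
feat = band_powers(rng.standard_normal((64, 3 * 256)))  # 3 s of 64-channel EEG
```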
4.2 Experimental Settings and Performance Measures
We performed white-box target attacks on the two BCI regression datasets. We assumed that the attacker knows all information about the regression model, i.e., its architecture and parameters, and crafted adversarial examples that change the regression model output by a predetermined amount.
Two regression models were considered. The first was ridge regression (RR) with a fixed ridge parameter. The second was a multi-layer perceptron (MLP) neural network with two hidden layers and 50 nodes in each layer. We used the Adam optimizer [8] and the root mean squared error (RMSE) as the loss function. Early stopping was used to reduce overfitting.
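A rough stand-in for the two models, using scikit-learn; the ridge parameter value and the synthetic data are assumptions, and scikit-learn's MLP minimizes a squared-error loss, a close proxy for RMSE:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 60))                  # stand-in band-power features
y = 0.5 * X[:, 0] + 0.1 * rng.standard_normal(200)  # synthetic regression target

# RR: the ridge parameter alpha=1.0 is an assumed value
rr = Ridge(alpha=1.0).fit(X, y)

# MLP: two hidden layers of 50 nodes, Adam, early stopping (as in the text)
mlp = MLPRegressor(hidden_layer_sizes=(50, 50), solver='adam',
                   early_stopping=True, max_iter=500,
                   random_state=0).fit(X, y)
```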
Two attack scenarios were considered:

Within-subject attack. For each individual subject, we randomly chose part of the data to train the RR model and used the rest for testing. For the MLP, we further randomly set aside part of the training set as the validation set for early stopping. We computed the test RMSE for each subject, and also the average across all subjects.

Cross-subject attack. Each time, we picked one subject as the test subject, and concatenated the data from all remaining subjects to train the RR model. For the MLP, part of these data were randomly selected for training, and the remainder for validation in early stopping. RMSEs were computed on the test subject.
Attack success rate (ASR) and distortion were used to evaluate the attack performance. The ASR was defined as the percentage of adversarial examples whose prediction satisfied $g(x') - g(x) \ge t$, where $t$ was the targeted change. The distortion was computed as the $\ell_2$ distance between the adversarial example and the original example.
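The two performance measures can be computed as below; the $\ell_2$ distortion and the linear toy model used in the check are assumptions for illustration:

```python
import numpy as np

def attack_metrics(g, X, X_adv, t):
    """ASR: fraction of adversarial examples whose output increased by at
    least the target change t; distortion: mean l2 distance to originals."""
    asr = float(np.mean(g(X_adv) - g(X) >= t))
    distortion = float(np.mean(np.linalg.norm(X_adv - X, axis=1)))
    return asr, distortion

# toy check with a hypothetical linear model
g = lambda Z: Z.sum(axis=1)
X = np.zeros((3, 2))
X_adv = X + np.array([[0.3, 0.3], [0.05, 0.0], [0.2, 0.2]])
asr, dist = attack_metrics(g, X, X_adv, t=0.3)   # 2 of 3 attacks succeed
```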
4.3 Experimental Results
The baseline regression performances on the original (unperturbed) EEG data are shown in the first panel of Table 1, where “mean output (MO)” is the mean of the regression outputs for all EEG trials. For each regression model on each dataset, the crosssubject RMSE was always larger than the corresponding withinsubject RMSE, which is intuitive, because individual differences make it difficult to develop a model that generalizes well across subjects.
Table 1: RMSE, mean output (MO), attack success rate (ASR), and distortion of the baseline (no attack), CWR, IFGSMR, and random noise, for the RR and MLP models on the Driving and PVT datasets, in the within-subject and cross-subject scenarios.
We set the same target change $t$ in both CWR and IFGSMR, and called an attack a success if $g(x') - g(x) \ge t$. An iterative binary search was used to find the trade-off parameter $c$ in CWR. A fixed step size $\alpha$, a fixed maximum number of iterations, and a grid search for $\epsilon$ were used in IFGSMR. We used the $\ell_2$ distance to measure the distortion of the adversarial examples. The attack performances are shown in the second and third panels of Table 1:

The RMSEs after CWR and IFGSMR attacks were always much larger than those before the attacks, indicating that the attacks dramatically changed the characteristics of the model output.

For each regression model on each dataset, the mean output on the adversarial examples was always larger than that on the original examples by at least $t$, which was our target. This suggests that both CWR and IFGSMR were effective.

The ASRs of both CWR and IFGSMR were always close to 100%, indicating that almost all attacks were successful. A closer look revealed that the ASR of CWR was always slightly larger than the corresponding ASR of IFGSMR, and the RMSE, mean output, and distortion of CWR were always smaller than the corresponding quantities of IFGSMR, i.e., CWR was generally more effective than IFGSMR. However, the computational cost of CWR was much higher than that of IFGSMR.
It is also interesting to check whether adding random noise can significantly degrade the regression performance; if so, no deliberate adversarial example crafting would be needed. To this end, we performed attacks by adding random Gaussian noise $\mathcal{N}(0, \sigma^2)$ to the original examples, where $\sigma$ was chosen so that the resulting distortion approximately equaled the maximum distortion introduced by CWR and IFGSMR. The corresponding attack performances are shown in the last panel of Table 1. Although the distortion was large, the random Gaussian noise barely changed the regression RMSE and the mean output, and its ASR was always close to zero, suggesting that sophisticated attack approaches like CWR and IFGSMR are indeed needed.
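A quick numerical sanity check of this observation, on a hypothetical linear model with assumed noise level and target: zero-mean Gaussian noise barely shifts the mean output and essentially never achieves a sizeable target change:

```python
import numpy as np

rng = np.random.default_rng(2)
g = lambda Z: Z @ np.ones(Z.shape[1])         # hypothetical linear model
X = rng.standard_normal((1000, 60))

sigma = 0.05                                  # stands in for a matched distortion budget
X_noisy = X + rng.normal(0.0, sigma, size=X.shape)

# zero-mean noise: the mean output stays almost unchanged, and the chance
# of accidentally exceeding a sizeable target change t is negligible
mean_shift = float(np.mean(g(X_noisy) - g(X)))
asr = float(np.mean(g(X_noisy) - g(X) >= 2.0))   # hypothetical target t = 2.0
```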
Some examples of the original EEG trials and the trials after adding adversarial perturbations are shown in Fig. 1. The differences between the original and adversarial trials were too small to be distinguished by a human, and should also be very difficult for a computer algorithm to detect.
4.4 Spectrogram Analysis
This section utilizes spectrogram analysis to further understand the characteristics of the adversarial examples. We computed the mean spectrogram of all EEG trials, the mean spectrogram of all successful adversarial examples, and the mean spectrogram of the corresponding perturbations, using wavelet decomposition. Fig. 2 shows the results, where the adversarial examples were designed for MLP on the PVT dataset. There is no noticeable difference between the mean spectrograms of the original EEG trials and the adversarial examples crafted by our two approaches. This suggests that adversarial examples are difficult to distinguish from spectrogram analysis.
The third column of Fig. 2 shows the difference between the mean spectrograms in the first two columns. Note that its amplitudes were much smaller than those in the first two columns. The patterns of the two perturbations (from CWR and IFGSMR) are similar: their energy was concentrated in [3, 10] Hz, and was almost uniformly distributed over the entire time domain.
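The mean-spectrogram computation can be sketched as below; an STFT spectrogram is used here as a simple stand-in for the paper's wavelet decomposition, with an assumed window length:

```python
import numpy as np
from scipy.signal import spectrogram

def mean_spectrogram(trials, fs=256):
    """Mean time-frequency energy over a set of EEG trials.
    trials: (n_trials, n_samples) array."""
    specs = []
    for trial in trials:
        f, t_bins, S = spectrogram(trial, fs=fs, nperseg=64)
        specs.append(S)
    return f, t_bins, np.mean(specs, axis=0)

rng = np.random.default_rng(4)
trials = rng.standard_normal((10, 2 * 256))   # ten stand-in 2 s trials
f, t_bins, S_mean = mean_spectrogram(trials)
```

The same routine, applied to the original trials, the adversarial examples, and their difference, yields the three columns of Fig. 2.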
4.5 Transferability of Adversarial Examples between Different Regression Models
The transferability of adversarial examples means that adversarial examples designed to attack one model may also be used to attack a different model. This property makes blackbox attacks possible, where we have no information about the target regression model at all [13, 14].
Fig. 3 shows the mean output when adversarial examples designed on the MLP were used to attack the RR model, and vice versa, in within-subject attacks on the PVT dataset. Although the attack performance degraded when the adversarial examples were transferred to the other model, they still dramatically changed its outputs. This demonstrates that adversarial examples generated by CWR and IFGSMR are transferable, and hence may be used in black-box attacks.
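Transferability can be illustrated with two toy linear models whose weights are similar (a stand-in for two regression models trained on the same data): a perturbation crafted against one also shifts the output of the other:

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal(20)              # weights of the known (substitute) model
b = a + 0.3 * rng.standard_normal(20)    # a similar, unknown (target) model
g_a = lambda Z: Z @ a
g_b = lambda Z: Z @ b

X = rng.standard_normal((500, 20))
# one signed-gradient step crafted against g_a only
X_adv = X + 0.05 * np.sign(a)

shift_a = float(np.mean(g_a(X_adv) - g_a(X)))   # intended effect
shift_b = float(np.mean(g_b(X_adv) - g_b(X)))   # transferred effect
```

Because the two weight vectors are correlated, the perturbation aligned with sign(a) also increases g_b's output, mirroring the degraded-but-still-effective transfer seen in Fig. 3.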
5 Conclusions
This paper has proposed two white-box target attack approaches, CWR and IFGSMR, for regression problems, and applied them to two EEG-based BCI regression problems. Both approaches can successfully change the model output by a predetermined amount. Generally, CWR achieved better attack performance than IFGSMR, in terms of a larger ASR and a smaller distortion; however, its computational cost was higher than that of IFGSMR. We also verified that the adversarial examples crafted by both CWR and IFGSMR are transferable, and hence adversarial examples generated from a known regression model can be used to attack an unknown regression model, i.e., to perform black-box attacks.
To our knowledge, this is the first study on adversarial attacks for EEG-based BCI regression problems, and it calls for more attention to the security of BCI systems. Our future research will study how to defend against such attacks.
References
 [1] Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K.M., Robbins, K.A.: The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Frontiers Neuroinform 9, 16 (Jun 2015)
 [2] Carlini, N., Mishra, P., Vaidya, T., Zhang, Y., Sherr, M., Shields, C., Wagner, D., Zhou, W.: Hidden voice commands. In: Proc. 25th USENIX Security Symposium. Austin, TX (Aug 2016)
 [3] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: Proc. IEEE Symposium on Security and Privacy. San Jose, CA (May 2017)
 [4] Chuang, C.H., Ko, L.W., Jung, T.P., Lin, C.T.: Kinesthesia in a sustained-attention driving task. Neuroimage 91, 187–202 (2014)
 [5] Dinges, D.F., Powell, J.W.: Microcomputer analyses of performance on a portable, simple visual RT task during sustained operations. Behavior Res. Methods, Instrum., Comput. 17(6), 652–655 (1985)
 [6] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Proc. Int’l Conf. on Learning Representations. San Diego, CA (Dec 2014)
 [7] Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In: Proc. IEEE Symposium on Security and Privacy. San Francisco, CA (May 2018)
 [8] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Proc. Int’l Conf. on Learning Representations. Banff, Canada (Apr 2014)
 [9] Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: Proc. Int’l Conf. on Learning Representations. Toulon, France (Apr 2017)
 [10] Lance, B.J., Kerick, S.E., Ries, A.J., Oie, K.S., McDowell, K.: Brain-computer interface technologies in the coming decades. Proc. of the IEEE 100(Special Centennial Issue), 1585–1599 (May 2012)
 [11] Makeig, S., Kothe, C., Mullen, T., Bigdely-Shamlo, N., Zhang, Z., Kreutz-Delgado, K.: Evolving signal processing for brain-computer interfaces. Proc. of the IEEE 100(Special Centennial Issue), 1567–1584 (May 2012)
 [12] Middendorf, M., McMillan, G., Calhoun, G., Jones, K.: Brain-computer interfaces based on the steady-state visual-evoked response. IEEE Trans. on Rehabilitation Engineering 8(2), 211–214 (Jun 2000)
 [13] Papernot, N., McDaniel, P.D., Goodfellow, I.J.: Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR abs/1605.07277 (2016), http://arxiv.org/abs/1605.07277
 [14] Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proc. ACM Asia Conf. on Computer and Communications Security. Abu Dhabi, UAE (Apr 2017)
 [15] Pfurtscheller, G., Neuper, C.: Motor imagery and direct brain-computer communication. Proc. of the IEEE 89(7), 1123–1134 (Jul 2001)
 [16] Sutton, S., Braren, M., Zubin, J., John, E.R.: Evokedpotential correlates of stimulus uncertainty. Science 150(3700), 1187–1188 (1965)
 [17] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J., Fergus, R.: Intriguing properties of neural networks. In: Proc. Int’l Conf. on Learning Representations. Banff, Canada (Apr 2014)
 [18] Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Brain-computer interfaces for communication and control. Clinical Neurophysiology 113(6), 767–791 (Jun 2002)
 [19] Wu, D., Chuang, C.H., Lin, C.T.: Online driver’s drowsiness estimation using domain adaptation with model fusion. In: Proc. Int’l Conf. Affective Computing and Intelligent Interaction. Xi’an, China (Sep 2015)
 [20] Wu, D., King, J.T., Chuang, C.H., Lin, C.T., Jung, T.P.: Spatial filtering for EEG-based regression problems in brain-computer interface (BCI). IEEE Trans. on Fuzzy Systems 26(2), 771–781 (2018)
 [21] Wu, D., King, J.T., Chuang, C.H., Lin, C.T., Jung, T.P.: Spatial filtering for EEG-based regression problems in brain-computer interface (BCI). IEEE Trans. on Fuzzy Systems 26(2), 771–781 (Apr 2018)
 [22] Wu, D., Lance, B.J., Lawhern, V.J., Gordon, S., Jung, T.P., Lin, C.T.: EEG-based user reaction time estimation using Riemannian geometry features. IEEE Trans. on Neural Systems and Rehabilitation Engineering 25(11), 2157–2168 (Nov 2017)
 [23] Wu, D., Lawhern, V.J., Gordon, S., Lance, B.J., Lin, C.T.: Driver drowsiness estimation from EEG signals using online weighted adaptation regularization for regression (OwARR). IEEE Trans. on Fuzzy Systems 25(6), 1522–1535 (Dec 2017)
 [24] Zander, T.O., Kothe, C.: Towards passive brain-computer interfaces: applying brain-computer interface technology to human-machine systems in general. Journal of Neural Engineering 8(2), 025005 (2011)
 [25] Zhang, X., Wu, D.: On the vulnerability of CNN classifiers in EEG-based BCIs. IEEE Trans. on Neural Systems and Rehabilitation Engineering pp. 814–825 (2019)