Surrogatebased artifact removal from singlechannel EEG
Abstract
Objective: The recent emergence and success of electroencephalography (EEG) in lowcost portable devices, has opened the door to a new generation of applications processing a small number of EEG channels for health monitoring and braincomputer interfacing. These recordings are, however, contaminated by many sources of noise degrading the signals of interest, thus compromising the interpretation of the underlying brain state. In this work, we propose a new datadriven algorithm to effectively remove ocular and muscular artifacts from singlechannel EEG: the surrogatebased artifact removal (SuBAR). Methods: By means of the timefrequency analysis of surrogate data, our approach is able to identify and filter automatically ocular and muscular artifacts embedded in singlechannel EEG. Results: In a comparative study using artificially contaminated EEG signals, the efficacy of the algorithm in terms of noise removal and signal distortion was superior to other traditionallyemployed singlechannel EEG denoising techniques: wavelet thresholding and the canonical correlation analysis combined with an advanced version of the empirical mode decomposition. Even in the presence of mild and severe artifacts, our artifact removal method provides a relative error 4 to 5 times lower than traditional techniques. Significance: In view of these results, the SuBAR method is a promising solution for mobile environments, such as ambulatory healthcare systems, sleep stage scoring or anesthesia monitoring, where very few EEG channels or even a single channel is available.
Keywords: Artifact removal, electroencephalography (EEG), singlechannel EEG, surrogate data, wavelet decomposition
1 Introduction
Electroencephalogram (EEG) s the standard recording of electrophysiological activity of the brain. Due to its temporal resolution (ms), technical simplicity (portable and noninvasive) and low cost, EEG is nowadays extensively used for studying different cognitive and pathological brain states. EEG recordings are, however, often contaminated by nonneural physiological activities, as well as other external or environmental noises, that seriously degrade the signals of interest.
Eye movementrelated artifacts have a strong detrimental effect on the quality of scalp EEG. During eye movements, abrupt changes on the retina’s resting potential are primarily observed in the frontal EEG electrodes before their widespread propagation over the scalp. The strength and spatial distribution of the artifact strongly depends on the position of EEG electrodes and on the direction of the eye movement [2, 3, 1]. Eye blinks also contaminate EEG signals but with an artifact whose amplitude is generally larger than that produced by eye movements. Muscular artifacts originate from the electrical activity elicited by contracting muscles. Although electromyographic (EMG) activity can be observed over the entire scalp, the amplitude and distribution of EMG artifacts depend on the type of muscle contracted (e.g. jaw, neck or face) and on the degree of tension [2, 3, 1]. Other possible perturbations include breathing artifacts, electrodermal interferences produced by sweating, motion artifacts (e.g. head or chest movements), as well as shifts of the electrical properties of electrodes.
In clinical settings, visual inspection and manual removal of contaminated EEG segments is a common practice prior to any offline signal analysis. Obviously, such manual methods are not suitable for online applications. There is a number of general techniques used for artifact removal from EEG recordings. When the frequency bands of the signal and interferences do not overlap, simple low pass, band pass or high pass filtering are effective techniques for removing artifacts. Nevertheless, some interferences (e.g. muscular activities) have a wide spectral distribution that overlaps with that of EEG, making difficult to remove them. In case of spectral overlap, more refined techniques such as adaptive filtering, Wiener filtering, as well as blind source separation (BSS) methods have been effectively used to cancel EEG interferences. Other methods like wavelet decompositions (WT) and empirical mode decomposition (EMD) have also been successfully applied to remove EEG artifacts [10, 12]. For a review and discussion of different approaches see [2, 3, 1] and references therein.
Linear regressions (in time or frequency domain) and adaptive filtering have been successfully applied in EOG and ECG correction procedures [1]. Their main disadvantage is, however, that they assume that one or more reference channels with the artifacts waveforms are available. Independent component analysis (ICA) is a blind signal separation (BSS) technique that has been largely used for EEG artifact removal [5, 4]. Briefly, ICA separates multichannel EEG signals into statistically independent components which are assumed to represent the underlying sources of the observed EEG signals. Clean EEG signals can be reconstructed by removing artifactrelated components from the original ICA decomposition. ICAbased methods can effectively remove interferences from a wide variety of artifactual sources in EEG recordings only when the number of channels and the amount of data are large enough [1]. Canonical correlation analysis (CCA) is a more efficient BSS method for muscular artifact removal that exploits the relative low autocorrelation of EMG artifacts in comparison with EEG activity [7, 11].
Artifact removal by waveletbased methods and other datadriven decompositions (e.g. the EMD and its variants [31, 34, 1]) relies on the assumption that EEG artifacts can be represented by one or more levels or modes, which are thresholded before reconstructing the signal from the filtered representation. The success of these techniques depends on the threshold selecting criteria. In recent studies, different combinations of algorithms to remove artifacts (e.g. EMDICA, EMDCCA) have provided significantly improved results [11, 13, 6].
In last years, simplified EEG systems with few channels have been developed with the aim to increase usability of ambulatory neuroimaging technologies in clinical environments (e.g. in epilepsy and sleep diagnosis) and in customdesigned settings for routine monitoring [14, 15]. For some applications like neurofeedback, mental state classification, emotion sensing, etc., artifact removal algorithms should perform reasonably well with short epochs of streaming EEG data. Further, conventional multichannel techniques can not be applied directly to isolate artifact sources in reduced channels configurations. Hence, there is a growing need to develop effective artifact removal techniques that can operate in short segments of single channel EEG, especially in singlechannel settings [2, 13, 6, 12].
In this work, we propose a datadriven approach for artifact removal from singlechannel EEG: the SurrogatesBased Artifact Removal (SuBAR) method. Our approach combines wavelet decomposition and a resampling method called surrogate data. Under the hypothesis of stationary EEG segments, muscular and ocular artifacts are considered as nonstationary events that can be identified across different scales. The obtained scales from the original recorded EEG are then compared with those obtained from resampled data, which are stationary by construction. Hence, instead of estimating the filtering threshold from uncontaminated segments or theoretical functions, we obtain it directly from the WT of surrogate data.
The proposed framework is validated on artifactfree EEG data contaminated with simulated artifacts of several types (EMG or EOG). The method is also illustrated on a real EEG contaminated data collected from a healthy subject. The reliability and performances of our method are also compared with those obtained by an unsupervised waveletbased artifact removal [11, 6, 1, 12] and by those resulting from the CCA in combination with an advanced version of the EMD [11, 34, 6]. The remainder of the paper is organized as follows: Section 2 describes the proposed framework, as well as the WT and CCAEMD approaches. Section 3 presents the database and Section 4 describes the procedure to simulate EMG and EOG artifacts [4]. Section 5 provides the experimental results and evaluation of the method. Finally, we conclude the paper with a discussion in Section 6.
2 Methods
In this section, we describe the various techniques employed in this work. Firstly, we describe our surrogatesbased EEG technique, and then we outline two alternative methods for artifact removal from EEG singlechannels (used here for comparison): a wavelet thresholding method [11, 6, 1, 12], and the artifact removal algorithm based on CCA combined with the EMD [11].
2.1 Resamplingbased artifact removal
In this work, we assume that a recorded singlechannel EEG signal
The aim of our work is to filter from the vector of observations , with minimal a priori knowledge on the artifact signal, the EEG signal and the observational noise. To this end, we have used wavelet transform as a tool to detect and remove artifacts from single EEG channels. Waveletbased artifact removal aims at separating the artifact components from the clean EEG components in the wavelet domain. Once the artifact components have been identified and the corresponding wavelet coefficients removed, the remaining components are kept to reconstruct the cleaned EEG signal. The thresholding criterion is traditionally based on statistical properties of the wavelet spectrum obtained from uncontaminated baselines or from theoretical thresholding functions. The originality of our approach is that we propose to estimate this threshold directly from the stationarized spectrum of the observed data.
Figure 1 shows the general pipeline of the artifact removal method proposed in this study. The key point of our method is that the hypothesis of EEG stationarity (which corresponds to timeinvariance in the timefrequency spectrum) is statistically characterized on the basis of a set of surrogates which all share the same average stationary spectrum as the desired EEG signal. In our approach, the recorded EEG signal is of the form , where the artifact is assumed to be a nonstationary event restricted to a finite time interval shorter than the observed signal . Since surrogates can be viewed as distinct, independent stationary realizations of the observed signal, the wavelet spectra obtained from surrogates can define the learning set for stationarity. Wavelet thresholding is therefore based on the statistical distribution of the spectrum of surrogates (for each timefrequency bin). All the components of the original wavelet decomposition higher than a given threshold are set to zero, and the desired EEG signal is reconstructed utilizing the remaining components only.
The main blocks of the proposed method are the following:
Preprocessing
In most EEG systems, some procedures are currently applied to prepare the EEG data for analysis. These procedures include rereferencing of recorded signals, downsampling of original data for saving transmission power and computational cost, or filtering for baseline and powerline (50/60 Hz) interference removal. In our preprocessing step, the sampled raw EEG signal is divided into epochs of size . Very short segments ( s) may not represent some slow artifacts properly (e.g. ocular or movement artifacts) whereas in very large windows (e.g. s), the stationary assumption of EEG may be no longer valid and there is a high chance that clean EEG segments will be distorted by the artifact removal procedure [30]. Here, we set s.
Resampling
Bootstrap and other resampling methods have been used extensively in the past to appropriately determine the properties of a time series before applying different analysis and modeling techniques [22]. The aim of these techniques is to capture a given structure of the original signal, and construct additional realizations that can then be used to test the original data for additional, unexplained structure. Surrogate data techniques were proposed as nonparametric methods for testing general hypotheses on data without making assumptions on the underlying generating process. Surrogates are time series created directly from the original dataset through replication of the linear autocorrelation (or equivalently, the power spectrum density) and amplitude distribution, with all other higherorder quantities randomized [24].
In this work, surrogate time series were obtained by destroying the organized phase structure that controls the nonstationarity of the original signal [23]. In the classical Fouriertransform surrogate algorithm (FT), the signal is first Fourier transformed, and the magnitude of the spectrum is then kept unchanged while its phases are replaced by a random sequence, uniformly distributed over [23, 24]. This modified spectrum is then inverse Fourier transformed, leading to a Gaussian stationary surrogate time series with the same spectrum as the original signal [23, 24]. To deal with nonGaussian data, the algorithm of amplitude adjusted Fourier transform (AAFT) first orders a Gaussian white noise time series to match the rank order of the original data and derives the FT surrogate of this time series [23]. The final surrogate is scaled to the distribution of the original data by sorting the original data according to the ranking of the FT surrogate. [23]. The AAFT algorithm guarantees that surrogate data possesses the original distribution exactly and the original power spectrum approximately [24]. In this work, we have used the socalled Iterative Amplitude Adjusted Fourier Transform (IAAFT), which is an iterative version of AAFT [26, 24]. The steps are repeated until the autocorrelation function is sufficiently similar to the original, or until there is no change in the amplitudes [26, 24].
In the context of nonlinear time series analysis, surrogate data have been widely used for testing linearity and, more recently, for testing stationarity [24, 25]. To this end, timefrequency decompositions are used for determining whether the spectral characteristics of a signal change significantly over time, which is an indication of nonstationarity [25]. The timefrequency localization properties of the wavelet transform allow a comparison of the original signal with the stationarized surrogates in the wavelet domain. We can therefore detect the position, on the timefrequency plane, where the spectrum of the observed EEG differs from a stationary process. In this study, values derived from surrogate realizations were used to represent the spectrum of stationary signals.
Wavelet decomposition
Wavelet transform is a method commonly used to remove artifacts from EEG signals [11, 10, 6, 1, 12]. The wavelet transform is an useful analysis tool for timefrequency representation of nonstationary signals, obtained by convolving the signal with a scaled and translated wavelet function,
(1) 
with and , positive real numbers and . The WT provides thus a decomposition of the signal in different scales, where the obtained wavelet coefficients represent a measure of similarity between the signal and the corresponding wavelet function .
In the discrete wavelet transform (DWT) both factors and are integers and are chosen in a dyadic grid ( and with integers and playing roles of the decomposition level and temporal localization at this level, respectively). At the th level of decomposition, a matrix of size can be appropriately built with the corresponding orthonormal wavelet basis. The wavelet coefficients can be thus obtained by . The DWT can be seen as a bandpass filter bank: the decomposition starts by passing the signal through a lowpass filter giving the approximation coefficients, and a highpass filter producing the detail coefficients. The resulting two orthogonal subbands are afterwards downsampled by two. Then, the lowpass result can be recursively filtered by the same pair of filters until the desired frequency range is obtained.
In this paper, we used the Maximal Overlap Wavelet Transform (MODWT) [28], instead of the orthogonal DWT, to estimate wavelet decomposition. As the DWT, the MODWT can also be utilized for multiresolution analysis, with the advantage that it can be applied to signals of any size, while the DWT requires the sample size to be an integer power of two. In contrast with the usual DWT, the MODWT is translation invariant, i.e. shifting circularly the signal by any amount results into shifting the outputs of the lowpass and highpass filters by the same amount. This property does not hold for the DWT because of the subsampling involved in the filtering process [28].
Let and a th level wavelet and scaling filter, respectively. The corresponding th level MODWT wavelet and scaling filters are defined, respectively, by and with the same common length [28]. The th level MODWT wavelet and scaling coefficients are dimensional vectors defined by
(2) 
and
(3) 
for , respectively. The “mod” operator denotes the modular arithmetic between two integers.
Inverse transforming the MODWT coefficients creates the socalled details and smooths that form a multiresolution analysis of signal [28]:
(4) 
with denoting the number of decomposition levels. An interesting property of MODWT is that and are associated with zero phase filters [28]. The frequency band of detail coefficients associated to coefficients is given by with a width [28]. In contrast with DWT, there are always coefficients at each level. The MODWT is an energy preserving transform and the total energy of can be partitioned by the MODWT scaling and wavelet coefficients: [28].
For all the waveletbased methods studied in this paper, we used the symlets wavelet and 5 levels of decomposition. The symlets are orthogonal functions, nearly symmetrical wavelets with an oscillatory waveform and good timefrequency localization properties [16]. This makes it suitable wavelet choice for filtering and reconstructing EEG signals [18, 17].
Signal reconstruction
Comparison of the original signal with the stationary surrogates in the wavelet domain can identify nonstationary events on the timefrequency plane. The wavelet coefficients corresponding to artifacts are expected to be of high amplitude and well localized in time and scales, while the clean EEG coefficients are expected to be small and homogeneously spread over the whole segment.
To detect artifact components, the wavelet transform of surrogates are firstly averaged to produce a mean reference spectrum of stationarity:
(5) 
where is the number of surrogates and denotes the wavelet coefficients for surrogate at the th level. At each timepoint of the th level of decomposition, the standard deviation for the ensemble of surrogates can also be determined as
(6) 
At each point of the wavelet domain, the significance of wavelet coefficients at a given level was assessed by quantifying its statistical deviation from values obtained for the ensemble of surrogates. Thus, nonstationary components in observed signals can be detected by comparing values to a given threshold.
Distribution of wavelet coefficients from surrogates can be fitted with an appropriate distribution, such as Gaussian or Gamma distribution, and then setting a onesided confidence interval, for rejection of nonstationary components. Here, the significance was obtained by the ratio whose pvalue is given by the Chebyshev’s inequality: for any statistical distribution of : where is the chosen statistical threshold [27].
The threshold values are compared with the wavelet coefficients of the original signal in the following manner:
(7) 
If the original wavelet coefficients are greater than the threshold, they are set to the average value of the reference spectrum (obtained from the surrogates). In this manner, only the stationary components of the original spectrum are retained. Here, the threshold was set as the values greater than % of the values in the surrogate distribution. After nonstationary components have been filtered, the cleaned signal can be recomposed from all levels using the inverse MODWT [28] of the cleaned coefficients .
The main steps of proposed SuBAR method for automatic artifact removal are summarized in Algorithm 1.
2.2 Methods for Comparison
The proposed technique is compared with two other commonly used methods for artifact removal from singlechannel EEG signals: i) a waveletbased artifact removal [11, 6, 1, 12] which is based on the classical waveletthresholding and ii) a singlechannel method based on the combination of CCA with the EMD [11, 34, 6].
Wavelet thresholding
The key point of the wavelet thresholding is to separate, in the wavelet domain, the artifact components from the uncontaminated EEG components. For thresholding the wavelet coefficients we have used a leveldependent threshold [19, 29, 30]: , where is the length of signal and is the estimated noise variance for the wavelet coefficients, , at the th level of decomposition, which is usually calculated by [29]: . Such leveldependent thresholding is more appropriate than a single universal threshold in case of correlated and nonGaussian data, as such that it characterizes the underlying EEG activities [29]. Artifact removal is finally obtained by removing wavelet coefficients whose absolute values exceed the threshold [20, 1, 12] as follows:
(8) 
Canonical Correlation Analysis (CCA) combined with Empirical Mode Decomposition for single channels
CCA is a BSS technique currently used for separating a number of mixed or contaminated signals [7]. The recorded multichannel EEG signals are considered as a mixture given by ,where is the unknown mixing matrix and the components in are the statistically independent and unknown source signals, which include the artifacts. As other BSS techniques, CCA estimates the mixing matrix and recovers the original sources as , where is the unmixing matrix, i.e. the estimate of the inverse of . Artifacts are removed by computing , where is the mixing matrix with the columns corresponding to artifact related sources, set to zero.
Due to their relatively low autocorrelation, muscle artifacts are generally well identified in the last components obtained by the CCA algorithm [7]. Previous studies have shown that CCA outperforms different ICA algorithms for artifact removal from multichannel EEG and fMRI signals [9, 7, 8]. Other advantages of the CCA include that, i) as CCA uses second order statistics, it is a more computationally efficient algorithm than ICA and, ii) contrary to ICA algorithms, the CCA method always provides the same result for a given input.
As ICA, the original CCA is a multivariate technique that requires multichannel recordings to perform the decomposition. In some recent works, multidimensional time series have been generated from a singlechannel recording, using popular datadriven methods such as the wavelet transform and the empirical mode decomposition [2, 3, 11, 1, 6]. Artifact removal is then performed by applying BSS techniques (e.g. ICA or CCA) to the generated multichannel signals. For our singlechannel artifact removal technique we have here combined the CCA, as the best choice form multichannel artifact removal, with an improved version of the EMD as the best choice for time series decomposition [11]. The combination of these methods has been shown to be more reliable and computationally more efficient than the EMD combined with ICA [11].
The orignal EMD is an adaptive datadriven method, proposed by [31], to decompose nonlinear and nonstationary signals into a number of subcomponents called intrinsic modal functions (IMFs), with well defined instantaneous frequencies. The original signal is thus decomposed as , where stands for a residual trend, and the intrinsic modes ’s are nearly orthogonal to each other [31]. By construction, the spectral supports are decreased when going from one residual to the next. Nevertheless their frequency discrimination applies only locally (in time) and they cannot correspond to a subband filtering [32]. Despite its several advantages to decompose mixed signals, the EMD of noisy data may result in a corruption of modes, i.e. very similar oscillations appear in different IMFs [31]. Recently, the socalled complete Ensemble EMD with adaptive noise (CEEMDAN) was proposed to ameliorate the spectral separation of modes and to reduce computational time [33, 34]. The key idea on this algorithm relies on averaging the modes obtained by EMD applied to several realizations of Gaussian white noise added to the original signal.
Here, we use this decomposition technique (CEEMDAN), to convert the singlechannel signal into a multichannel signal . By means of the CCA, the source signals associated to artifacts can be then removed as described before. The cleaned singlechannel signal without the artifacts can be finally reconstructed by adding the new IMFs components in [11]. Hereafter, for the sake of simplicity, we denote this technique CCAEMD.
Criteria for artifact removal with the CCAEMD method
Eye blinks artifacts display large slow waves and have large autocorrelation compared to EEG sources. Here, EOG artifacts were thus identified from the first canonical variates due to their large autocorrelations (larger than ) [11]. In contrast, due to the frequency spectrum of the EMG artifacts, they resemble high frequency activity. In this work, the CCA components with spectral bandwidth larger to Hz were associated to muscle artifacts and removed from the reconstruction [7, 4, 36]. Other filtering criteria can also be applied, possibly providing better tuning of the algorithm to the particularities of other EEG artifacts.
3 Database
To assess the performance of the proposed artifact removal technique, we employed two datasets recorded via surface electrodes (Acticap, BrainProducts GmbH, Germany) using 64 scalp positions according to the standard 1010 montage. The first dataset consisted on a collection of clean EEG signals from two subjects who were instructed to remain quietly, but alert, with their eyes closed during two minutes. The second dataset was composed by EEG signals (two minutes) from one subject instructed to deliberately produced artifacts by eye blinking and jaw clenching at short intervals. To verify the correct realization of artifacts and to detect their instances, we used four external EOG and EMG channels (right and left frontalis and anterior temporalis muscles). In all recordings, impedance between electrodes and skin were set below k. According to the declaration of Helsinki, written informed consent was obtained from subjects after explanation of the study, which was approved by the ethical committee CPPIDFVI of Paris (n^{o} 2016A0062645).
The EEG signals were amplified, digitized at a sampling frequency of 1024 Hz, then downsampled to 256 Hz and segmented in sec nonoverlapped windows to reduce computational cost in successive blocks. Clean signals from the first dataset were artificially contaminated with muscle artifacts and eye movements as in [4] and were used to compare quantitatively the removal of artifacts by the different methods. Trained experts at the EEG platform of the ICM visually inspected all trials and selected different artifactfree EEG segments from the first database. Finally, contaminated signals (with vertical ocular blinks and other pronounced muscular artifacts) were selected from the second dataset to qualitatively illustrate the efficacy of the denoising process.
4 Evaluation
Artificially contaminated EEG signals were simulated using clean segments from the first dataset (absence of ocular and muscular activity and other artifacts due to body movements or technical interferences). Artifacts, superimposed to clean data, were generated in three steps:
4.1 Characterization of spatial distribution
Since artifacts of different origins have a specific distribution in the scalp, we first computed a weight vector of dimensions to scale the artifact patterns according to the topographical information from the scalp electrodes. To obtain the topographical information for each type of artifact, we applied ICA decomposition to a selection of real EEG segments containing the artifacts. Vector was the rescaled column in the estimated mixing matrix associated to the artifactual component, found by inspecting some features such as the autocorrelation and spatial position in the scalp [4]. The components associated to eye movements and blinks () were detected as those yielding high amplitudes on the most frontal electrodes, and muscular components () as those having transient and fast activities localized most importantly on temporal electrodes.
4.2 Generation of artifact patterns
The two different artifact patterns consisted on vector signals of length obtained as follows:

Ocular artifacts, , were recorded from the original electrooculogram signal (EOG).

Muscular artifacts, , were generated using random noise bandpass filtered between and Hz with a random length between  s (equivalent to those observed in real EEG data) [4].
From the above patterns, the matrix of simulated artifacts, of dimension , was computed by performing the product for each pattern. Clean EEG signals (selected by visual inspection) and simulated artifacts came from different subjects to ensure that all segments (trials) of simulated and real EEG data were independent of each other.
4.3 Setting of signal to noise ratios
Synthetic artifacts were superimposed on the clean EEG segments as follows: , where represents the contribution of the artifact. For each trial and artifact type, the signal to noise ratio (SNR) was adjusted by changing the parameter as follows:
(9) 
where corresponds to the root mean squared value averaged over all channels, and denotes the root mean squared value of the artifact. Following [4], prior to computing SNR for the topographic artifacts, we scaled their amplitudes by the highest channel gain in the applied scalp map. Here, the artifact contribution was gradually decreased in a dB scale () from dB to dB.
The performances of the considered algorithms for artifact removal were evaluated both in terms of the amount of artifact reduction and the amount of distortion they bring into clean EEG signals. Performances were expressed in terms of the relative root mean square error (RRMSE) [37]:
(10) 
where is the signal after artifact removal. To assess whether our technique preserves the frequency spectrum of clean EEG, or it introduces any phase shift in denoised signals, we have also measured the spectral coherence and phase delay between the corrected data and the clean EEG segments.
5 Results and Discussion
The proposed algorithm is applied to EEG data contaminated by simulated muscular activities and ocular artifacts. Figure 2 shows some examples of added artifacts and their removal by the SuBAR algorithm. Results suggest a good removal of both types of artifacts without distorting the background EEG signals outside the artifactual regions. Low or bandpass filters were not capable of removing muscular artifacts without altering the underlying brain activity because the overlap of the frequency spectrum of artifacts and that of clean EEG signals.
We examined the reliability of the SuBAR method at different SNR values in terms of the RRMSE as mentioned in Section 4. We also applied two alternative methods – wavelet thresholding and CCA combined with the advanced version of the EMD – to the contaminated EEG channels, and performed a comparison through 20 independent simulations. In our simulations, the amplitude of artifacts were scaled by their spatial distribution and a prescribed SNR. Here, we present the performances of different methods for a reduced number of EEG channels which spatial positions are relevant for different practical applications of ambulatory clinical neuroimaging, in reduced settings for routine monitoring [14, 15] or for practical BCI systems based on motor imagery. EEG electrodes included are: FC1, FC2, FCz, CP1, CP2, CPz, C1,C2 and Cz.
Figure 3 shows a systematic comparison of different algorithms, expressed in terms of the RRMSE for different SNR and both muscular and ocular artifacts. Bar plots display averaged values estimated over all channels and segments. Results clearly indicate that, even with severe artifacts, our SuBAR method yields better performances than traditional artifact removal techniques.
Muscular artifacts
The performances of different methods to remove simulated muscular artifacts are shown in Figure 4. It is clearly seen that the SuBAR method outperformed its competitors for all SNRs. Although wavelet thresholding performances are stable for all SNRs, this technique is not able to recover the original EEG signals. On the other hand, as the contamination level increases in frontal and posterior regions, the performances of the datadriven CCAEMD method are considerably degraded. Filtering modes from the CCAEMD method is insufficient to remove large muscular artifacts without altering the underlying brain activity since the frequency spectrum of the muscle artifacts overlaps with that of the brain signals.
Results support the hypothesis that, thanks to the timefrequency localization properties of the wavelet transform, a comparison of the contaminated EEG signal with the surrogates in the wavelet domain can identify artifacts as nonstationary events embedded on a stationary signal. Long and persistant muscular artifacts could not be detected as the spectrum of the contaminated EEG will not differ from a stationary process.
Ocular artifacts
Figure 5 shows the RRMSE for the different ocular artifact removal algorithms as a function of SNR. It can be observed that both the surrogatebased algorithm and the CCAEMD methods are not able to remove large artifacts without distorting the true signals, whereas the wavelet thresholding is an algorithm that provides better performances. This indicates that surrogates of EEG signals with large ocular artifacts cannot be distinguished in the wavelet domain from the decomposition of contaminated signals. Nevertheless, for mild and weak artifacts, our surrogatebased technique outperforms other methods and can better remove the artifacts and recover the underlying EEG signals.
Distortion of clean EEG segments
Let us now quantify the distortions –in terms of RRMSE– produced by each method when applied to artifact free EEG epochs. Figure 6 shows that, although wavelet thresholding is able to remove large ocular artifacts, it also produces an important amount of distortion on clean EEG segments. Similarly, the automatic correction of artifacts with CCAEMD method altered clean EEG signals substantially, although the algorithm to detect ocular artifacts resulted in a larger distortion than the algorithm for muscular artifact removal. These results indicate that for singleEEG channels, the SuBAR method preserves better the EEG signals than the other considered artifact removal algorithms, which may remove true neural components from clean brain signals.
Spectral distortion produced by the SuBAR method
To identify if the proposed algorithm preserves the natural frequency spectrum in denoised signals, we computed the spectral coherence and phase delays between the original clean EEG signals and the corrected data. Figures 78 show that, even in presence of severe artifacts (SNR dB), the SuBAR method preserves the spectrum of EEG signals. A small amount of distortion on EEG spectrum is observed for frequencies larger than Hz. In this band of frequencies, the correction of large artifacts introduced small absolute values of phase delay of about radians. Low frequencies ( Hz) were practically undistorted by the filtering method. Results clearly indicate that, in general, the proposed method preserves well the spectral components from clean brain signals and no significant phase delays are introduced by the filtering.
Correction of real contaminated data
For illustration purposes, the SuBAR algorithm was applied on the EEG epochs from the second dataset, i. e. contaminated with real artifacts. Figure 9(A) shows the performance of the SuBAR method on an EEG epoch that contains both eye blinks and muscular artifacts. Notice that, although different artifacts were present in the same segment, they were relatively well removed. In Figures 9(BC), we observe that, although small amounts of muscular activities remain in the denoised signals, our nonparametric algorithm corrects well the eye blinks from the contaminated data. Error measures could not be used here as we did not have apriori information available of the clean brain signals.
Computational complexity
For a given level of decomposition, the Maximal Overlap Wavelet Transform is computationally equivalent to other shiftinvariant discrete wavelet transforms, and may be calculated with computational complexity [38]. Although the wavelet thresholding and the surrogatebased algorithm have similar algorithmic complexity, the former remains highly advantageous from the computational point of view as surrogates are not generated. For the EEG segments analyzed here
6 Conclusion
In this work, a new datadriven method for automatic artifact removal in singlechannel EEG was presented. The novelty of our proposal relies on the timefrequency analysis of surrogate data to identify and filter ocular and muscular artifacts embedded in single EEG channels. The efficacy of the algorithm was compared to wavelet thresholding, and the CCA combined with the EMD. Through artificially contaminated EEG signals, we demonstrate that the surrogatebased removal (SuBAR) algorithm outperforms the other techniques considered here for removing muscle and ocular artifacts from single EEG signals. Although large ocular artifacts are better removed by wavelet thresholding, the SuBAR method yields, in general, a relative error 4 to 5 times smaller than the other considered artifact removal methods. Results show that the proposed algorithm preserves well the frequency spectrum of EEG in denoised signal (without phase delay). Furthermore, our method yields the smallest distortion of signals when applied to artifactfree EEG segments. Though it is not the aim of this study, we envisage that possible further optimizations can be obtained with other families of wavelets.
Most artifact reduction techniques require multivariate EEG data, or auxiliary referential signals (e.g. EOG or EMG) [1]. Common artifact removal algorithms generally requires appropriate spectral and topographical parameters for the detection of EEG artifacts (eye blinks, saccades, muscle activity) [1, 6]. In contrast, results presented in this work suggest that our singlechannel technique can be a good nonparametric filter for artifact removal in offline environments with a reduced number of sensors. Computational profiling shows that the proposed method could be suitable for pseudo realtime environments, such as the mobile monitoring of cognitive or emotional states. The use of recent distributed signal processing algorithms might speedup the algorithm for realtime implementations, or for the analysis of multichannel EEG datasets [39]. The proposed datadriven SuBAR method could be used in combination with other temporal EEG features (as those used in artifact detection [40]) to facilitate further signal processing of singlechannel EEGs.
Although the generation of surrogate time series and the corresponding wavelet decomposition of large EEG segments might be a bottleneck for the processing speed in online applications, its technical simplicity (it is fully datadriven) makes the SuBAR method highly operable in mobile environments, such as ambulatory healthcare systems [41], where there are only a few EEG channels, or even a single channel is available (e.g. sleep stage scoring or anesthesia monitoring). The practicality, accuracy and reliability of an adaptive filter based on ou method remain, however, to be explored.
Acknowledgments
F. De Vico Fallani is supported by the French Agence Nationale de la Recherche (programme ANR15NEUC000602), and X. NavarroSune is financially supported by Air Liquide Medical Systems S.A.
Footnotes
 In this paper, scalars are denoted by italic letters, vectors by lowercase boldface letters and matrices by uppercase, boldface leterrs. Superscript refers to the transpose operation.
 Using Matlab on a 2.8GHz dualcore Intel Core i7 processor, 16GB of memory
References
 J. A. Urigüen and B. GarciaZapirain, “EEG artifact removal–Stateoftheart and guidelines”, Journal of Neural Engineering, vol. 12, no. 3 p. 031001, 2015.
 K. T. Sweeney, T. E. Ward and S. F. McLoone, “Artifact removal in physiological signals–practices and possibilities”, IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. (3), pp. 488–500, 2012.
 K. T. Sweeney, et al. “A methodology for validating artifact removal techniques for physiological signals”, IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 5, pp. 918–926, 2012.
 A. Delorme, T. Sejnowski and S. Makeig, “Enhanced detection of artifacts in EEG data using higherorder statistics and independent component analysis”, Neuroimage, vol. 34, no. 4, pp. 1443–1449, 2007.
 J. Iriarte, et al., “Independent component analysis as a tool to eliminate artifacts in EEG: a quantitative study”, Journal of Clinical Neurophysiology, vol. 20, no. 4, pp. 249–257, 2003.
 X. Chen, et al. (2016). “Removing Muscle Artifacts From EEG Data: Multichannel or SingleChannel Techniques?”, IEEE Sensors Journal, vol. 16, no. 7, pp. 1986–1997, 2016.
 W. De Clercq, et al., “Canonical correlation analysis applied to remove muscle artifacts from the electroencephalogram”, IEEE Transactions on Biomedical Engineering, vol. 53, no. 12, pp. 2583–2587, 2006.
 J. Gao, C. Zheng, and P. Wang, “Online removal of muscle artifact from electroencephalogram signals based on canonical correlation analysis”, Clinical EEG and Neuroscience, vol. 41, no. 1, pp. 53–59, 2010.
 M. Borga, et al., “A canonical correlation approach to exploratory data analysis in fMRI,” in Proceedings of the 10th Scientific Meeting of the International Society of Magnetic Resonance in Medicine (ISMRM’02), Honolulu, Hawaii, 2002.
 A. X. Patel, et al. “A wavelet method for modeling and despiking motion artifacts from restingstate fMRI time series”, Neuroimage, vol. 95, pp. 287–304, 2014.
 K. T. Sweeney, S. F. McLoone and T. E. Ward, “The use of ensemble empirical mode decomposition with canonical correlation analysis as a novel artifact removal technique”, IEEE Transactions on Biomedical Engineering, vol. 60, no. 1, pp. 97–105, 2013.
 S. Khatun, R. Mahajan and B. I. Morshed, “Comparative Study of WaveletBased Unsupervised Ocular Artifact Removal Techniques for SingleChannel EEG Data”, IEEE Journal of Translational Engineering in Health and Medicine, vol. 4, pp. 1–8, 2016.
 X. Chen, et al., “A preliminary study of muscular artifact cancellation in singlechannel EEG”, Sensors, vol. 14, no. 10, pp. 18370–18389, 2014.
 A. Casson, et al., “Wearable electroencephalography”, IEEE Engineering in Medicine and Biology Magazine, vol. 29, no. 3, pp. 44–56, 2010.
 V. Mihajlović, et al., “Wearable, wireless EEG solutions in daily life applications: what are we missing?”, IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 1, pp. 6–21, 2015.
 I. Daubechies, Ten lectures on wavelets of CBMSNSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), vol. 61, Philadelphia, PA, 1992.
 M. I. AlKadi, et al., “Compatibility of mother wavelet functions with the electroencephalographic signal,” in 2012 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES), pp. 113–117, 2012.
 A. K. Aniyan, et al., “A wavelet based algorithm for the identification of oscillatory eventrelated potential components”, Journal of Neuroscience Methods, vol. 233, pp. 63–72, 2014.
 D. Donoho and I. Johnstone, “Adapting to unknown smoothness via wavelet shrinkage”, Journal of the American Statistical Association, vol. 90, no. 432, pp. 1200–1224, 1995.
 D. Safieddine, et al., “Removal of muscle artifact from EEG data: comparison between stochastic (ICA and CCA) and deterministic (EMD and waveletbased) approaches”, EURASIP Journal on Advances in Signal Processing, vol. 1, pp. 1–15, 2012.
 Y. Chen, et al., “The removal of EMG in EEG by neural networks”, Physiological Measurement, vol. 31, no. 12, p. 1567, 2010.
 B. Efron, and R. J. Tibshirani, An Introduction to the Bootstrap. Chapman and Hall/CRC, New York, 1993.
 J. Theiler, and D. Prichard, “Constrainedrealization MonteCarlo method for hypothesis testing”, Physica D, vol. 94, no. 4, pp. 221–235, 1996.
 T. Schreiber, and A. Schmitz, “Surrogate time series”, Physica D, vol. 142, no. 3, pp. 346–382, 2000.
 P. Borgnat, et al., “Testing stationarity with surrogates: A timefrequency approach”, IEEE Transactions on Signal Processing, vol. 58, no. 7, pp. 3459–3470, 2010.
 T. Schreiber, and A. Schmitz, “Improved surrogate data for nonlinearity tests”, Physical Review Letters, vol. 77, no. 4, p. 635, 1996.
 A. Papoulis, and S. U. Pillai, Probability, random variables, and stochastic processes. Tata McGrawHill Education, 2002.
 D. B. Percival, and A. T. Walden, Wavelet methods for time series analysis. Cambridge University Press, 2006.
 I. M. Johnstone, and B. W. Silverman, “Wavelet threshold estimators for data with correlated noise”, Journal of the Royal Statistical Society, Series B, vol. 59, no. 2, pp. 319–351, 1997.
 M. K. Islam, A. Rastegarnia, and Z. Yang, “A WaveletBased Artifact Reduction from Scalp EEG for Epileptic Seizure Detection”, IEEE Journal of Biomedical and Health Informatics , vol. 20, no. 5, pp. 1321–1332, 2016.
 N. E. Huang, et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis”, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903995, 1998.
 P. Flandrin, G. Rilling and P. Goncalves, “Empirical mode decomposition as a filter bank”, IEEE Signal Processing Letters, vol. 11, no. 2, pp. 112114, 2004.
 M. E. Torres et al., “A Complete Ensemble Empirical Mode Decomposition with adaptive noise,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4144–4147, 2011.
 M. A. Colominas, G. Schlotthauer and M. E. Torres, “Improved complete ensemble EMD: A suitable tool for biomedical signal processing”, Biomedical Signal Processing and Control, vol. 14, pp: 19–29, 2014.
 T.W. Lee, et al., “Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources”, Neural computation, vol. 11, no. 2, pp: 417–441,1999.
 S. D. Muthukumaraswamy, “Highfrequency brain activity and muscle artifacts in MEG/EEG: a review and recommendations”, Front. Hum. Neurosci., vol. 7, pp: 138, 2013.
 Z. Wang, and A. Bovik, “Mean squared error: love it or leave it”, IEEE Signal Processing Magazine, vol. 26, no. 1, pp: 98–117, 2009.
 J. Enders, et al., “The shiftinvariant discrete wavelet transform and application to speech waveform analysis” The Journal of the Acoustical Society of America, vol. 117, no. 4, pp: 2122–2133, 2005.
 A. Bertrand, “Distributed signal processing for wireless EEG sensor networks” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 23, no. 6, pp: 923–935, 2015.
 A. Mognon, et al., “ADJUST: An automatic EEG artifact detector based on the joint use of spatial and temporal features”, Psychophysiology, vol. 48, no. 2, pp: 229–240, 2011.
 J.M. Ehrenfeld, and M. Cannesson Eds, Monitoring technologies in acute care environments. New York, NY, USA: Springer, 2014.