A Generalised Seizure Prediction with Convolutional Neural Networks for Intracranial and Scalp Electroencephalogram Data Analysis
Seizure prediction has attracted growing attention as one of the most challenging predictive data analysis efforts aimed at improving the lives of patients living with drug-resistant epilepsy and tonic seizures. Many works have reported strong results in providing sensible indirect (warning systems) or direct (interactive neural stimulation) control over refractory seizures, some of which achieved high performance. However, many of them rely heavily on handcrafted feature extraction and/or feature engineering carefully tailored to each patient to achieve very high sensitivity and a low false prediction rate on a particular dataset. This limits the benefit of their approaches when a different dataset is used. In this paper we apply Convolutional Neural Networks (CNNs) to different intracranial and scalp electroencephalogram (EEG) datasets and propose a generalized retrospective and patient-specific seizure prediction method. We use the Short-Time Fourier Transform (STFT) on -second EEG windows to extract information in both the frequency and time domains. A standardization step is then applied to the STFT components across the whole frequency range to prevent high-frequency features from being influenced by those at lower frequencies. A convolutional neural network model is used for both feature extraction and classification to separate preictal segments from interictal ones. The proposed approach achieves sensitivity of , , and false prediction rate (FPR) of /h, /h, /h on the Freiburg Hospital intracranial EEG (iEEG) dataset, the Children’s Hospital of Boston-MIT scalp EEG (sEEG) dataset, and the Kaggle American Epilepsy Society Seizure Prediction Challenge’s dataset, respectively. Our prediction method is also statistically better than an unspecific random predictor for most patients in all three datasets.
in data mining and machine learning in the past few decades has attracted significantly more attention to the application of these techniques in detective and predictive data analytics, especially in healthcare, medical practice and biomedical engineering . While the body of available proven knowledge lacks a convincing and comprehensive understanding of the sources of epileptic seizures, some early works showed the possibility of predicting seemingly unpredictable seizures . In ref. , dynamical similarity index, effective correlation dimension and increments of accumulated energy were used as features. The dynamical similarity index yielded the highest performance, with sensitivity and false prediction rate (FPR) less than /h. Mean phase coherence and lag synchronization index of -s sliding EEG windows were used as features for seizure prediction . Performance of this approach was still modest at sensitivity of and a comparable FPR. This approach was further improved by combining bi-variate empirical mode decomposition and Hilbert-based mean phase coherence for feature extraction . As a result, sensitivity was increased to beyond while FPR dropped below /h. Another method to exploit the synchronization information was proposed by the authors in . In that method, the phase-match error of two consecutive epochs is calculated first, and the discrete cosine transform (DCT) is then applied to the phase-match error in order to estimate the energy concentration ratio. The average energy concentration ratio across all channels was then used as a global feature. The authors extracted local features based on modified deviation and fluctuation functions, and LS-SVM was used for classification, which resulted in sensitivity and /h FPR.
A machine learning approach using a Support Vector Machine (SVM) with features from nine frequency bands of spectral power was introduced in . This method achieved decent performance on the Freiburg Hospital dataset  with sensitivity of and FPR of /h. A similar approach with additional features, power spectral density ratios, was proposed by  with very high sensitivity exceeding and FPR less than /h. However, this approach tailored feature selection heavily to each patient and hence lacked generalization. In contrast to the two approaches above,  performed a Bayesian inversion of power spectral density and then applied a rule-based decision to perform the seizure prediction task. This approach was tested on the same Freiburg dataset with sensitivity of and FPR of /h. In a recent work , the same authors extracted six uni-variate and bi-variate features including correlation dimension, correlation entropy, noise level, Lempel-Ziv complexity, largest Lyapunov exponent, and nonlinear interdependence and achieved a comparable sensitivity of and a lower FPR of /h.
Based on the assumption that future events depend on a number of previous events, multi-resolution N-gram on amplitude patterns was used for feature extraction in . After optimizing the feature set per patient, this method yielded a high sensitivity of and a low FPR of /h. Recently,  captured the dynamics of EEG by using fuzzy rules to estimate the trajectory of each sliding EEG window on the Poincaré plane. The features went through PCA to reduce interrelated features before being classified by an SVM. This work achieved decent performance with sensitivity of more than and FPR below /h.
Other seizure prediction techniques were proposed by . In , features estimated by wavelet energy and entropy were optimized for each patient, then a discriminant analysis was used to separate preictal segments from interictal ones. The results were promising, with sensitivity of and FPR of /h when tested on intracranial EEG data from six patients in the Montreal Neurological Institute dataset.  introduced a lightweight approach based on spike rate. This approach was able to achieve a sensitivity of with a false prediction rate of /h.
Some works have claimed to achieve sensitivity and very low false alarm rates, less than /h , or even zero false alarms . However, these works employed numerous feature engineering techniques, and seizure prediction for each patient performs well only with a certain technique. For example, in , the authors used different feature extraction methods and three machine learning algorithms. Similarly, in , there were features and a set of cost-sensitive linear SVM classifiers being used to search for the optimal single features or feature combinations that perform the best for each patient. These approaches have two main drawbacks: (1) we do not know which combination of features and classifier will work for a new patient, and (2) we cannot guarantee that the optimal combination will work well with future data of the same patient.
We seek an approach that can be applied to all patients with minimum feature engineering. Neural networks are known for their capability to extract features from raw input data to perform a classification task. In this work, we deploy a convolutional neural network for seizure prediction. The main contributions of this work are: (1) we propose a method to pre-process raw EEG data into a form suitable for a convolutional neural network, and (2) we propose a guideline to help the convolutional neural network perform well on the seizure prediction task with minimum feature engineering. To demonstrate the advantage of our approach, we use the same pre-processing technique and convolutional neural network configuration for all patients from two different datasets: the Freiburg Hospital intracranial EEG (iEEG) dataset and the Children’s Hospital of Boston-Massachusetts Institute of Technology (CHB-MIT) scalp EEG (sEEG) dataset.
Three datasets are used in this work: the Freiburg Hospital dataset , the CHB-MIT dataset  and the Kaggle American Epilepsy Society Seizure Prediction Challenge’s dataset . The Freiburg dataset consists of intracranial EEG (iEEG) recordings of patients with intractable epilepsy. Due to limited availability of the dataset, we are only able to use data from patients. A sampling rate of Hz was used to record iEEG signals from these patients. In this dataset, there are recording channels from selected contacts, three of which are from epileptogenic regions and the other three from remote regions. For each patient, there are at least min of preictal data and h of interictal data. More details about the Freiburg dataset can be found in .
The CHB-MIT dataset contains scalp EEG (sEEG) data of pediatric patients with h of continuous sEEG recording and seizures. Scalp EEG signals were captured using electrodes at a sampling rate of Hz . We define interictal periods as periods that are at least h before seizure onset and after seizure ending. In this dataset, there are cases where multiple seizures occur close to each other. For the seizure prediction task, we are interested in predicting the leading seizures. Therefore, seizures that are less than min away from the previous one are treated as a single seizure, and the onset of the leading seizure is used as the onset of the combined seizure. In addition, we only consider patients with fewer than seizures a day for the prediction task, because prediction is not very critical for patients having a seizure every h on average. With the above definitions and considerations, there are patients with sufficient data (at least leading seizures and interictal hours).
The Kaggle seizure prediction challenge’s dataset has iEEG data of canines and patients with seizures and hours of interictal recording . Intracranial EEG canine data were recorded from implanted electrodes at a sampling rate of Hz. Recorded iEEG data of the two patients were from depth electrodes (patient 1) and subdural electrodes (patient 2) at a sampling rate of kHz. Preictal and interictal -min segments were extracted by the challenge’s organizers. Specifically, for each lead seizure, six preictal segments were extracted from min to min prior to seizure onset with s apart in time. Interictal segments were randomly selected at least one week away from any seizure.
Since a -dimensional convolutional neural network is used in our work, it is necessary to convert raw EEG data into a matrix, i.e. an image-like format. The conversion must preserve the most important information of the EEG signals. Wavelet and Fourier transforms are commonly used to convert time-series EEG signals into image shape. They have also been used as an effective feature extraction method for seizure detection and prediction. In this paper, we use the Short-Time Fourier Transform to translate raw EEG signals into a two-dimensional matrix comprised of frequency and time axes. We use an EEG window length of s. Most EEG recordings were contaminated by power line noise at Hz (see Figure 2a) for the Freiburg dataset and Hz for the CHB-MIT dataset. In the frequency domain, it is convenient to remove the power line noise by excluding components in the frequency ranges of – Hz and – Hz if the power line frequency is Hz, and components in the frequency ranges of – Hz and – Hz if the power line frequency is Hz. The DC component (at Hz) was also removed. Figure 2b shows the STFT of a -s window after removing power line noise.
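The pre-processing step can be illustrated with a small dependency-free sketch. It is not the authors' implementation: the function names, the Hann window, the frame and hop lengths, and the notch half-width `half_width_hz` are all assumptions made for illustration, and excluding every harmonic below the Nyquist frequency is a simplification of the specific bands listed in the text. A practical pipeline would likely use an FFT routine from a numerical library instead of the naive DFT below.

```python
import cmath
import math


def stft_magnitude(signal, fs, frame_len, hop):
    """Return (freqs, frames): one-sided magnitude spectra of Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    freqs = [k * fs / frame_len for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT of one frame at bin k (O(frame_len) per bin).
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return freqs, frames


def drop_line_noise(freqs, frames, line_hz, half_width_hz=3.0):
    """Remove the DC bin and all bins near the power line frequency and its harmonics.

    half_width_hz is an assumed value, not the band width used in the paper.
    """
    nyquist = freqs[-1]
    harmonics = [line_hz * m for m in range(1, int(nyquist // line_hz) + 1)]
    keep = [i for i, f in enumerate(freqs)
            if f > 0 and all(abs(f - h) > half_width_hz for h in harmonics)]
    kept_freqs = [freqs[i] for i in keep]
    kept_frames = [[frame[i] for i in keep] for frame in frames]
    return kept_freqs, kept_frames
```

Each row of `kept_frames` is one time step and each column one retained frequency bin, so stacking these matrices over the EEG channels gives the image-like input the text describes.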
One challenge for the convolutional neural network is the imbalance of the dataset, i.e. there are many more interictal recordings than preictal ones. For example, in the Freiburg dataset, the interictal-to-preictal ratio per patient varies from to . To overcome this, we generate more preictal segments by using an overlapping technique during the training phase. In particular, we create extra preictal samples for training by sliding a -s window along the time axis at every step over the preictal time-series EEG signals (see Figure 3). The step is chosen per subject so that we have a similar number of samples per class (preictal or interictal) in the training set. Note that some extra preictal segments may be identical to the original ones, but this is not problematic.
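One possible way to pick the sliding step is sketched below. The function names are hypothetical and lengths are measured in samples; the rule of taking the largest step that yields at least as many preictal windows as interictal windows is an assumption about how "similar number of samples" might be achieved, not the authors' stated procedure.

```python
def window_starts(n_samples, win_len, step):
    """Start indices of all full windows of length win_len taken every `step` samples."""
    if n_samples < win_len:
        return []
    return list(range(0, n_samples - win_len + 1, step))


def balancing_step(n_preictal_samples, win_len, n_interictal_windows):
    """Largest step (in samples) giving at least n_interictal_windows preictal windows.

    The step is capped at win_len (no overlap) and floored at 1 sample, so the
    classes may stay imbalanced when the preictal recording is very short.
    """
    span = n_preictal_samples - win_len
    if span < 0:
        raise ValueError("preictal recording is shorter than one window")
    if n_interictal_windows <= 1:
        return win_len  # a single window already suffices, no overlap needed
    # windows = span // step + 1 >= target  <=>  step <= span // (target - 1)
    step = span // (n_interictal_windows - 1)
    return max(1, min(win_len, step))
```

For example, with 1000 preictal samples, a 100-sample window and 50 interictal windows, the step comes out as 18 samples, which produces 51 overlapping preictal windows.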
2.3 Convolutional neural network
Convolutional neural networks (CNNs) have been used extensively in recent years for computer vision and natural language processing. In this paper, we use a CNN with three convolution blocks as described in Figure 4. Each convolution block consists of a batch normalization, a convolution layer with a rectified linear unit (ReLU) activation function, and a max pooling layer. The batch normalization ensures that the inputs to the convolution layer have zero mean and unit variance. The first convolution layer has sixteen 3-dimensional kernels of size , where is the number of EEG channels, and is used with stride . The next two convolution blocks have and convolution kernels, respectively, and both have kernel size , stride and max pooling size . Following the three convolution blocks are two fully-connected layers with sigmoid activation and output sizes of and respectively. Drop-out layers are placed before each of the two fully-connected layers with a dropping rate of . Since the dataset for training the CNN is very limited, it is important to prevent the CNN from over-fitting. First, we keep the CNN architecture simple and shallow as described above. Second, we propose a guideline to prevent over-fitting while training the neural network. A common practice is to randomly split of the training set to use as a validation set. After each training epoch, a loss and/or accuracy is calculated with respect to the validation set to check whether the network starts to over-fit the training set. This approach works well with datasets where no time information is involved, e.g. images for a classification task. For seizure prediction, it is logical to monitor over-fitting using samples from a different time period than those used for training. In this paper, we carefully select later samples from the preictal and interictal sets for validation and the rest for training (Figure 5).
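The chronological validation split can be sketched as follows. This is an illustration only: the function names and the `val_fraction` parameter are hypothetical, the fraction used in the paper is not assumed here, and the samples of each class are assumed to be already sorted by recording time.

```python
def split_later_samples(samples, val_fraction):
    """Split time-ordered samples: earlier part for training, latest part for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    n_val = max(1, round(len(samples) * val_fraction))
    cut = len(samples) - n_val
    return samples[:cut], samples[cut:]


def train_val_split(preictal, interictal, val_fraction):
    """Apply the chronological split to each class separately, then pool with labels."""
    pre_train, pre_val = split_later_samples(preictal, val_fraction)
    inter_train, inter_val = split_later_samples(interictal, val_fraction)
    # Label 1 marks preictal samples and label 0 marks interictal samples.
    train = [(x, 1) for x in pre_train] + [(x, 0) for x in inter_train]
    val = [(x, 1) for x in pre_val] + [(x, 0) for x in inter_val]
    return train, val
```

Unlike a random split, no validation sample precedes a training sample of the same class, so the validation loss reflects how the model behaves on a later time period.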
It is common to have isolated false positives during interictal periods. These isolated false predictions can be effectively reduced by using a discrete-time Kalman filter . In this work, we employ a very simple method, k-of-n analysis, to be consistent with the paper’s theme of simplicity. In particular, for every n predictions, the alarm is raised only if at least k of them are positive. As we use -s windows, the CNN produces predictions every seconds. We choose and in this work.
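A minimal sketch of this post-processing step is given below. The function name and the arguments `k` and `n` are illustrative, and applying the rule over a sliding window of the most recent predictions is an assumption; the text does not say whether the windows of predictions overlap.

```python
from collections import deque


def k_of_n_alarms(predictions, k, n):
    """Return one boolean per prediction: True when at least k of the last n are positive."""
    recent = deque(maxlen=n)  # keeps only the n most recent predictions
    alarms = []
    for p in predictions:
        recent.append(1 if p else 0)
        # No alarm can be raised until n predictions have been observed.
        alarms.append(len(recent) == n and sum(recent) >= k)
    return alarms
```

With `k=3` and `n=4`, a single positive surrounded by negatives never raises an alarm, whereas three positives within four consecutive predictions do.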
It is worth recalling how a seizure prediction system should be evaluated. The seizure prediction horizon (SPH) and seizure occurrence period (SOP) need to be defined before estimating performance metrics such as sensitivity and false prediction rate. In this paper, we follow the definition of SOP and SPH that was proposed in  and is illustrated in Figure 6. The SOP is the interval in which the seizure is expected to occur. The time period between the alarm and the beginning of the SOP is called the SPH. For a correct prediction, a seizure onset must occur after the SPH and within the SOP. Likewise, a false alarm is raised when the prediction system returns a positive but no seizure occurs during the SOP. When an alarm is raised, it lasts until the end of the SOP. Regarding clinical use, the SPH must be long enough to allow sufficient intervention or precautions. The SPH is also called intervention time . In contrast, the SOP should not be too long, to reduce the patient’s anxiety. Some works failed to specify SPH and SOP properly. In , the authors reported using an SPH of min but, based on their explanation, what they were implicitly using is an SPH of min and an SOP of min, i.e. if an alarm occurs at any point within min before seizure onset, it is considered a successful prediction. Similarly, the authors in  provided a different definition of SPH, namely the interval between the alarm and seizure onset. Inconsistency in defining SPH and SOP makes benchmarking among methods difficult and confusing.
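The SPH/SOP rules can be made concrete with a short sketch. Everything here is illustrative rather than the authors' evaluation code: the function names are hypothetical, all times share one unit (for example minutes), the SOP boundaries are treated as inclusive, and the false prediction rate is computed per hour of interictal recording, which is an assumption about the denominator.

```python
def is_correct_prediction(alarm_time, seizure_onset, sph, sop):
    """True if the onset falls inside the SOP, which begins `sph` after the alarm."""
    sop_start = alarm_time + sph
    return sop_start <= seizure_onset <= sop_start + sop


def evaluate(alarm_times, seizure_onsets, sph, sop, interictal_hours):
    """Return (sensitivity, false predictions per hour).

    An alarm stays active until the end of its SOP, so alarms raised while a
    previous alarm is still active are ignored.
    """
    predicted = set()
    false_alarms = 0
    active_until = float("-inf")
    for t in sorted(alarm_times):
        if t < active_until:
            continue
        active_until = t + sph + sop
        hits = [s for s in seizure_onsets if is_correct_prediction(t, s, sph, sop)]
        if hits:
            predicted.update(hits)
        else:
            false_alarms += 1
    sensitivity = len(predicted) / len(seizure_onsets) if seizure_onsets else 0.0
    return sensitivity, false_alarms / interictal_hours
```

Note that an onset occurring during the SPH itself does not count as predicted, which is exactly the case that a zero-SPH evaluation would credit.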
The metrics used to test the proposed approach are sensitivity and false prediction rate under an SPH of min and an SOP of min. To have a robust evaluation, we follow a leave-one-out cross-validation approach for each subject. If a subject has seizures, seizures will be used for training and the withheld seizure for validation. This round is repeated times so that every seizure is used for validation exactly once. Interictal segments are randomly split into parts. parts are used for training and the rest for validation.
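The fold construction can be sketched as below. The function name and the `seed` argument are hypothetical, and splitting the interictal segments into as many parts as there are seizures (one part held out per fold) is an assumption made for this example.

```python
import random


def leave_one_seizure_out(n_seizures, interictal_segments, seed=0):
    """Yield (train_seizures, val_seizure, train_interictal, val_interictal) per fold."""
    shuffled = list(interictal_segments)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled interictal segments into n_seizures roughly equal parts.
    parts = [shuffled[i::n_seizures] for i in range(n_seizures)]
    for fold in range(n_seizures):
        train_seizures = [s for s in range(n_seizures) if s != fold]
        train_interictal = [seg for i, part in enumerate(parts) if i != fold for seg in part]
        yield train_seizures, fold, train_interictal, parts[fold]
```

Each seizure index and each interictal part appears in the validation role exactly once across the folds.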
In this section, we test our approach on three datasets: (1) the Freiburg iEEG dataset, (2) the CHB-MIT sEEG dataset, and (3) the Kaggle iEEG dataset. SOP = min and SPH = min were used in calculating all metrics in this paper. Our model is implemented in Python 2.7 using Keras 2.0 with a TensorFlow 1.4.0 backend. The model was run in parallel on NVIDIA K80 graphics cards. Each fold of the leave-one-out cross-validation was executed twice and average results were reported. Table ? summarizes seizure prediction results for the Freiburg iEEG dataset with an SOP of min and an SPH of min. By applying solely the power line noise removal, prediction sensitivity is , i.e. out of seizures are successfully predicted. The false prediction rate (FPR) is very low at /h. Our method achieves a similar sensitivity of on the CHB-MIT sEEG dataset but with a higher FPR of /h (see Table ?). This is reasonable since scalp EEG recordings tend to be noisier than intracranial ones. For the Kaggle dataset, overall sensitivity is and FPR is /h. It is important to note that our approach works comparably with both intracranial EEG and scalp EEG recordings without any denoising techniques except power line noise removal.
Table ? presents a benchmark of recent seizure prediction approaches and this work. It is difficult to tell which approach is the best because each approach is usually tested on one dataset that is limited in the amount of data. In other words, an approach can work well on one dataset but perform poorly on another. Therefore, we add an extra indicator of whether the same feature engineering or feature set is applied across all patients, to evaluate the generalization of each method. From a clinical perspective, it is desirable to have a long enough SPH to allow an effective therapeutic intervention and/or precautions. The SOP, on the other hand, should be short to minimize the patient’s anxiety . Some works that implicitly used zero SPH disregarded clinical considerations; hence, their results could be over-estimated. The approach proposed by  achieved a very high sensitivity of and an FPR of /h when tested with patients from the Freiburg dataset. Our method yields a lower sensitivity of but a better FPR of /h. It is worth noting that the SPH in that work was implicitly set to zero, which means that a prediction close to or even at seizure onset can be counted as a successful prediction. Likewise, the studies in  also implied the use of zero SPH and are therefore not compared directly with our results. Among the rest of the works listed in Table ?,  had a very good prediction sensitivity of and a low FPR of /h under SOP = min and SPH = min. The authors in  carefully fine-tuned feature extraction for each patient. This, however, requires adequate expertise and time to perform the feature engineering for a new dataset. The authors in applied the same feature extraction technique to all patients and performed classification using SVM. This approach achieved a high sensitivity of – and a low FPR of – when tested on the Freiburg intracranial EEG dataset. However, no works have been reported to successfully use a similar approach on scalp EEG signals.
Information extracted from EEG signals in the frequency and time (synchronization) domains has been widely used to predict seizures. As seen in Table ?, sensitivity and false prediction rate have improved over time. This paper proposes a novel way to exploit both frequency and time aspects of EEG signals without handcrafted feature engineering. The Short-Time Fourier Transform of an EEG window has two axes, frequency and time. A –dimensional convolution filter is slid over the STFT to collect the changes in both frequency and time of EEG signals. This is where the beauty of the convolutional neural network comes in. The filter weights are automatically adjusted during the training phase, so the CNN acts as an automatic feature extraction method.
We also compare the prediction performance of our approach with an unspecific random predictor. Given an FPR, the probability to raise an alarm in an SOP can be approximated by 

$$P \approx 1 - e^{-\mathrm{FPR} \cdot \mathrm{SOP}}.$$

Therefore, the probability of predicting at least $k$ of $K$ independent seizures by chance is given by

$$p = \sum_{j \geq k} \binom{K}{j} P^{j} (1 - P)^{K - j}.$$

We calculated the $p$ value for each patient by using the FPR of that patient and the number of seizures predicted by our method ($k$). If $p$ is less than , we can conclude that our prediction method is significantly better than a random predictor at a significance level of . Tables ? and ? show that our prediction method is significantly superior to an unspecific random predictor for all patients except Pat14 in the Freiburg dataset and Pat9 in the CHB-MIT dataset. It is worth recalling that the Freiburg dataset is intracranial EEG while the CHB-MIT dataset is scalp EEG. In other words, our method works well with both types of EEG signals. Regarding the Kaggle dataset, our method results in significantly better performance compared to a random predictor for out of canines (see Table ?) and for Pat1.
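The comparison with a random predictor can be computed with a few lines of code. The sketch below assumes the usual form of this significance test: the probability of a chance alarm within one SOP is approximated as 1 − exp(−FPR · SOP), and the probability of predicting at least k of K independent seizures by chance is the upper tail of a binomial distribution. The function and parameter names are illustrative, and FPR and SOP must use consistent time units (here, per hour and hours).

```python
import math


def alarm_probability(fpr_per_hour, sop_hours):
    """Approximate probability that a random predictor raises an alarm within one SOP."""
    return 1.0 - math.exp(-fpr_per_hour * sop_hours)


def chance_p_value(k, n_seizures, fpr_per_hour, sop_hours):
    """Probability of predicting at least k of n_seizures independent seizures by chance."""
    p_alarm = alarm_probability(fpr_per_hour, sop_hours)
    # Upper tail of the binomial distribution with success probability p_alarm.
    return sum(
        math.comb(n_seizures, j) * p_alarm ** j * (1.0 - p_alarm) ** (n_seizures - j)
        for j in range(k, n_seizures + 1)
    )
```

A patient's result would then be judged better than chance when `chance_p_value(k, K, fpr, sop)` falls below the chosen significance level; note that `math.comb` requires Python 3.8 or later.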
As seizure characteristics may change over time, calibration of the seizure prediction algorithm is necessary. Minimum feature engineering brings the great advantage that it does not require an expert to carefully extract and select optimum features for the prediction task. Hence, it allows faster and more frequent updates so that patients are able to benefit the most from the seizure prediction algorithm. Minimum feature engineering also makes seizure prediction available to more patients. Since the feature extraction task is taken over by the CNN, neuro-physiologists and clinical staff can spend more time monitoring and recording EEG signals for diagnostic purposes and/or training data collection.
Our method can be further improved by non-EEG data such as information on the time when seizures occur. Epileptic seizures have been shown to have biases in their distribution over time at various intervals that can be as long as year or as short as hour . Importantly, the authors in  have shown that there are more incidences of seizure around sunrise, noon and midnight in their dataset of patients with seizures. However, this pattern is patient-specific and can be very different from patient to patient. Adopting the same observation, the authors in  leverage this pattern to significantly improve their seizure forecasting system. Unfortunately, the three datasets investigated in this paper are not long enough to assess whether time-of-day information is useful, because the maximum recording period per patient is days. Nevertheless, it is still worth examining how incidences of seizure are distributed over time of day across patients in the CHB-MIT dataset, the only dataset for which we can access the time of seizure occurrence. Figure 7 shows the greatest incidence in the early morning, and two lower peaks around p.m. and a.m.
Seizure prediction capability has been studied and improved over the last four decades. A perfect prediction is not yet available, but with current prediction performance it is useful to provide patients with a warning message so they can take precautions for their safety. This paper proposed a novel approach using a convolutional neural network with minimum feature engineering. The proposed approach shows good generalization, working well with both intracranial EEG and scalp EEG. This brings the opportunity for more patients to possess a seizure prediction device that can help them have a more manageable life.
The authors appreciate the support of Dr Benjamin H. Brinkmann from the Mayo Systems Electrophysiology Lab in providing information on some unlabeled datasets. The authors also thank Dr Farzaneh Shayegh for sharing her thoughts on the Freiburg Hospital dataset. N. Truong acknowledges the partial financial support of The Commonwealth Scientific and Industrial Research Organisation (CSIRO) via a PhD Scholarship, PN 50041400. J. Yang acknowledges the National Natural Science Foundation of China for their financial support under Grant 61501332.
- S. Ramgopal, S. Thome-Souza, M. Jackson, N. E. Kadish, I. Sánchez Fernández, J. Klehm, W. Bosl, C. Reinsberger, S. Schachter, and T. Loddenkemper, “Seizure detection, seizure prediction, and closed-loop warning systems in epilepsy,” Epilepsy & Behavior, vol. 37, pp. 291–307, 2014.
- K. Gadhoumi, J.-M. Lina, F. Mormann, and J. Gotman, “Seizure prediction for therapeutic devices: A review,” Journal of Neuroscience Methods, vol. 260, pp. 270–282, 2016.
- E. Bou Assi, D. K. Nguyen, S. Rihana, and M. Sawan, “Towards accurate prediction of epileptic seizures: A review,” Biomedical Signal Processing and Control, vol. 34, pp. 144–157, 2017.
- Z. Rogowski, I. Gath, and E. Bental, “On the prediction of epileptic seizures,” Biological Cybernetics, vol. 42, no. 1, pp. 9–15, 1981.
- Y. Salant, I. Gath, and O. Henriksen, “Prediction of epileptic seizures from two-channel EEG,” Medical and Biological Engineering and Computing, vol. 36, no. 5, pp. 549–556, 1998.
- T. Maiwald, M. Winterhalder, R. Aschenbrenner-Scheibe, H. U. Voss, A. Schulze-Bonhage, and J. Timmer, “Comparison of three nonlinear seizure prediction methods by means of the seizure prediction characteristic,” Physica D: Nonlinear Phenomena, vol. 194, no. 3-4, pp. 357–368, 2004.
- M. Winterhalder, B. Schelter, T. Maiwald, A. Brandt, A. Schad, A. Schulze-Bonhage, and J. Timmer, “Spatio-temporal patient–individual assessment of synchronization changes for epileptic seizure prediction,” Clinical Neurophysiology, vol. 117, no. 11, pp. 2399–2413, 2006.
- Y. Zheng, G. Wang, K. Li, G. Bao, and J. Wang, “Epileptic seizure prediction using phase synchronization based on bivariate empirical mode decomposition,” Clinical Neurophysiology, vol. 125, no. 6, pp. 1104–1111, 2014.
- M. Z. Parvez and M. Paul, “Seizure prediction using undulated global and local features,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 1, pp. 208–217, 2017.
- Y. Park, L. Luo, K. K. Parhi, and T. Netoff, “Seizure prediction with spectral power of EEG using cost-sensitive support vector machines,” Epilepsia, vol. 52, no. 10, pp. 1761–1770, 2011.
- “EEG Database at the Epilepsy Center of the University Hospital of Freiburg, Germany.” [Online]. Available: https://epilepsy.uni-freiburg.de/freiburg-seizure-prediction-project/eeg-database/
- Z. Zhang and K. K. Parhi, “Low-complexity seizure prediction from iEEG/sEEG using spectral power and ratios of spectral power,” IEEE Transactions on Biomedical Circuits and Systems, vol. 10, no. 3, pp. 693–706, 2016.
- A. Aarabi and B. He, “Seizure prediction in hippocampal and neocortical epilepsy using a model-based approach,” Clinical Neurophysiology, vol. 125, no. 5, pp. 930–940, 2014.
- ——, “Seizure prediction in patients with focal hippocampal epilepsy,” Clinical Neurophysiology, 2017.
- A. Eftekhar, W. Juffali, J. El-Imad, T. G. Constandinou, and C. Toumazou, “Ngram-derived pattern recognition for the detection and prediction of epileptic seizures,” PloS one, vol. 9, no. 6, p. e96235, 2014.
- B. Sharif and A. H. Jafari, “Prediction of epileptic seizures from EEG using analysis of ictal rules on Poincaré plane,” Computer Methods and Programs in Biomedicine, vol. 145, pp. 11–22, 2017.
- K. Gadhoumi, J.-M. Lina, and J. Gotman, “Discriminating preictal and interictal states in patients with temporal lobe epilepsy using wavelet analysis of intracerebral EEG,” Clinical Neurophysiology, vol. 123, no. 10, pp. 1906–1916, 2012.
- S. Li, W. Zhou, Q. Yuan, and Y. Liu, “Seizure prediction using spike rate of intracranial EEG,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 21, no. 6, pp. 880–886, 2013.
- P. W. Mirowski, Y. LeCun, D. Madhavan, and R. Kuzniecky, “Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG,” 2008 IEEE Workshop on Machine Learning for Signal Processing, pp. 244–249, 2008.
- A. H. Shoeb, “Application of machine learning to epileptic seizure onset detection and treatment,” Ph.D. dissertation, Massachusetts Institute of Technology, 2009.
- B. H. Brinkmann, J. Wagenaar, D. Abbot, P. Adkins, S. C. Bosshard, M. Chen, Q. M. Tieng, J. He, F. J. Muñoz-Almaraz, P. Botella-Rocamora, J. Pardo, F. Zamora-Martinez, M. Hills, W. Wu, I. Korshunova, W. Cukierski, C. Vite, E. E. Patterson, B. Litt, and G. A. Worrell, “Crowdsourcing reproducible seizure forecasting in human and canine epilepsy,” Brain, vol. 139, no. 6, pp. 1713–1722, 2016.
- B. Schelter, M. Winterhalder, T. Maiwald, A. Brandt, A. Schad, A. Schulze-Bonhage, and J. Timmer, “Testing statistical significance of multivariate time series analysis techniques for epileptic seizure prediction,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 16, no. 1, p. 13108, 2006.
- G. M. Griffiths and J. T. Fox, “Rhythm in epilepsy,” The Lancet, vol. 232, no. 5999, pp. 409–416, 1938.
- P. J. Karoly, H. Ung, D. B. Grayden, L. Kuhlmann, K. Leyde, M. J. Cook, and D. R. Freestone, “The circadian profile of epilepsy improves seizure forecasting,” Brain, vol. 140, no. 8, pp. 2169–2182, 2017.