Real-Time Sleep Staging using Deep Learning on a Smartphone for a Wearable EEG
We present the first real-time sleep staging system that uses deep learning in a smartphone application for a wearable EEG, without the need for servers. We employ real-time adaptation of single-channel Electroencephalography (EEG) to infer sleep stages with a Time-Distributed 1-D Deep Convolutional Neural Network. Polysomnography (PSG), the gold standard for sleep staging, requires a human scorer and is both complex and resource-intensive. Our work demonstrates an end-to-end on-smartphone pipeline that infers sleep stages from single 30-second epochs, with an overall accuracy of 83.5% on 20-fold cross-validation for five-class sleep stage classification using the open Sleep-EDF dataset.
Abhay Koushik, Judith Amores, Pattie Maes
MIT Media Lab, Cambridge, MA, USA
Machine Learning for Health (ML4H) Workshop at NeurIPS 2018.
1 Introduction and Background
Having a proper night of sleep and a regular circadian rhythm is crucial for physical, mental and social well-being [24, 28]. Sleep facilitates learning, memory consolidation and emotion processing [5, 14]. Identification of sleep stages is important not only for diagnosing and treating sleep disorders but also for understanding the neuroscience of healthy sleep. Polysomnography (PSG) is used in hospitals to study sleep and diagnose sleep disorders. It is considered the gold standard for sleep staging and involves recording multiple electrophysiological signals from the body: brain activity through EEG, heart rhythm through Electrocardiography (ECG), muscle tone through Electromyography (EMG) and eye movement through Electrooculography (EOG). PSG is a tedious procedure that requires skilled sleep technologists in a laboratory setting. Since EEG is the most reliable signal for sleep staging, automation of PSG can be achieved through accurate classification of EEG. Research in Deep Learning [15, 13] has produced many efficient algorithms to classify different kinds of data, including bio-medical and physiological signals. In this paper, we focus on developing a real-time sleep staging application using Time-Distributed 1-D Deep Convolutional Neural Networks to classify the five sleep stages in a comfortable environment. As per the new AASM rules, these stages are Wake, Rapid-Eye-Movement (REM) and the Non-Rapid-Eye-Movement (N-REM) stages N1, N2 and N3. We use a single-channel EEG recorded through a modified research version of the Muse headband. The headband is flexible and can be worn comfortably while sleeping. It has 5 electrodes, namely AF7, AF8, TP9, TP10 and the reference Fpz, for brain-activity measurements.
Importance of mobile systems for sleep staging
PSG requires a minimum of 22 wires attached to the body in order to monitor sleep activity. The complexity of this setup requires sleeping in a hospital or laboratory with an expert monitoring and scoring signals in real-time. This results in an unnatural and disturbed night of sleep for the subject, which may affect the diagnosis, and it also consumes considerable time and energy for recording and scoring. There has been significant research progress on automating sleep staging with wireless signals and more compact, wearable devices [7, 21]. Nevertheless, none of these systems implements five-stage classification of sleep in real-time.
The goal of our research was to simplify and reliably automate PSG on a smartphone using single non-overlapping 30-second epochs, the minimum time-resolution of a single sleep score by an expert, in order to enable automatic real-time interventions during experiments on sleep stages and cognition. Automated classification is achieved through adaptation of a Time-Distributed Deep Convolutional Neural Network model. Simplification is achieved by developing a TensorFlow Lite Android application that uses only a single-channel recording from a wearable EEG. We have also designed a friendly user interface that visualizes sleep stages and raw EEG data with real-time statistics about accuracy. Our app connects via Bluetooth Low Energy (BLE) to the flexible EEG headband, making it portable and not restricted to laboratory or hospital use.
2 Related work
Automatic analysis and sleep scoring using multi-layer Neural Networks was done as early as 1996, using 3 channels of physiological data: EEG, EOG and EMG. This involved power spectral density calculations for feature extraction from raw EEG, and required a tedious laboratory setting to collect reliable data through these channels. More recent work has looked into creating portable sleep scoring systems, such as the work by Zhang et al., which uses pulse, blood oxygen and motion sensors to predict sleep stages. Their system does not detect sleep stages N1 and N2 separately, and N1 is usually the hardest one to predict. The authors themselves note that these results cannot match the accuracy achievable with the EEG and EOG signals of PSG. The same limitations apply to the work by Zhao et al. Our work achieves reliable accuracy using only one channel from a wearable EEG, and overcomes the complexity of recording multiple signals.
Our model is based on Time-Distributed Deep Convolutional Neural Networks and is inspired by DeepSleepNet from Supratak et al. DeepSleepNet makes use of representation learning with a Convolutional Neural Network (CNN) followed by sequence residual learning using Bidirectional Long Short-Term Memory cells (Bi-LSTM). The major drawback of this network is that it requires 25 epochs of raw EEG data to be fed in together to obtain 25 labels, mainly because the Bi-LSTM relies on long temporal sequences to achieve better accuracy.
The state-of-the-art network model SeqSleepNet processes multiple epochs and outputs the sleep labels all at once using end-to-end Hierarchical Recurrent Neural Networks. It uses all 3 channels, namely EEG, EMG and EOG, to give the best overall accuracy of 87.1% on the MASS dataset. The CNN models by Sors et al. and Tsinalis et al., as well as SeqSleepNet and DeepSleepNet, all use longer temporal sequences for inference: 4, 5, 10 and 25 raw 30-second EEG epochs respectively. We overcome this limitation by using a Time-Distributed Deep CNN to predict single 30-second epochs with real-time adaptation from a wearable EEG.
The flexibility of this wearable also makes it preferable to the bulky system used by Lucey et al. The smartphone-based nature of our sleep-staging application overcomes the need for a client-server architecture as used in the Dreem headband. Our TensorFlow Lite mobile application can also be adapted to other types of EEG devices for real-time settings.
3 Methods and Materials
3.1 Dataset description and pre-processing
We used the expanded Sleep-EDF database from the PhysioNet bank. Single-channel EEG recordings (Fpz-Cz at 100 Hz) of 20 subjects are divided into a training set of 33 nights and a validation set of 4 nights; together they contain the non-overlapping nights of 19 subjects for 20-fold cross-validation. The non-overlapping test set contains 2 nights (1 subject). We removed the extra wake states occurring more than half an hour before and after sleep, as described in DeepSleepNet. We excluded MOVEMENT and UNKNOWN stages, and merged N4 into N3 to follow the five-stage classification as per the new AASM rules.
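The pre-processing steps above (dropping MOVEMENT/UNKNOWN epochs, merging N4 into N3, and trimming wake beyond 30 minutes of sleep) can be sketched as follows. The label strings and the `clean_night` helper are illustrative assumptions, not the paper's actual code; the raw Sleep-EDF annotations are mapped here to five integer classes (W=0, N1=1, N2=2, N3=3, REM=4).

```python
import numpy as np

# Hypothetical mapping from R&K annotation symbols to five AASM stages;
# "4" is merged into N3, while MOVEMENT ("M") and UNKNOWN ("?") are dropped.
STAGE_MAP = {"W": 0, "1": 1, "2": 2, "3": 3, "4": 3, "R": 4}

def clean_night(epochs, labels, pad_epochs=60):
    """Drop unmapped epochs, then keep wake only within 30 minutes
    (60 thirty-second epochs) of the first and last sleep epoch."""
    keep = [i for i, label in enumerate(labels) if label in STAGE_MAP]
    epochs = [epochs[i] for i in keep]
    y = [STAGE_MAP[labels[i]] for i in keep]
    sleep_idx = [i for i, stage in enumerate(y) if stage != 0]
    lo = max(0, sleep_idx[0] - pad_epochs)
    hi = min(len(y), sleep_idx[-1] + 1 + pad_epochs)
    return np.asarray(epochs[lo:hi]), np.asarray(y[lo:hi])
```

Each returned row is one 30-second epoch (3000 samples at 100 Hz) paired with its integer stage label.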
3.2 Model architecture and training
Our model architecture is described in Figure 1. The Base-CNN has 3 repeated sets of two 1-D convolutional (Conv1D) layers, 1-D max-pooling and spatial dropout layers. This is followed by two Conv1D, 1-D global max-pooling, dropout and dense layers. A final dropout layer forms the output of the Base-CNN. 30-second epochs of normalized raw EEG at 100 Hz are fed into the Time-Distributed Base-CNN model as described in Figure 1. All Conv1D layers use Rectified Linear Unit (ReLU) activation. Training uses an Adam optimizer with an initial learning rate of 0.001, which is reduced each time the validation accuracy plateaus, using the ReduceLROnPlateau Keras callback.
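The layer stack described above can be sketched in Keras. The filter counts, kernel sizes and dense width below are illustrative assumptions (the paper gives exact layer parameters only in its Figure 5); the dropout rate of 0.01, pooling size of 2 and Adam learning rate of 0.001 follow the appendix.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def base_cnn(input_len=3000):
    """Base-CNN over one 30 s epoch (3000 samples at 100 Hz).
    Filter counts and kernel sizes here are assumptions."""
    inp = layers.Input(shape=(input_len, 1))
    x = inp
    # Three repeated sets: two Conv1D layers, max-pooling, spatial dropout.
    for filters in (8, 16, 32):
        x = layers.Conv1D(filters, 5, activation="relu", padding="same")(x)
        x = layers.Conv1D(filters, 5, activation="relu", padding="same")(x)
        x = layers.MaxPooling1D(2)(x)
        x = layers.SpatialDropout1D(0.01)(x)
    # Two Conv1D layers, global max-pooling, dropout, dense, dropout.
    x = layers.Conv1D(64, 5, activation="relu", padding="same")(x)
    x = layers.Conv1D(64, 5, activation="relu", padding="same")(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dropout(0.01)(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.01)(x)
    return models.Model(inp, x)

def staging_model(n_epochs=1, input_len=3000, n_classes=5):
    """Wrap the Base-CNN in TimeDistributed so each epoch in the input
    sequence receives its own five-class softmax prediction."""
    inp = layers.Input(shape=(n_epochs, input_len, 1))
    x = layers.TimeDistributed(base_cnn(input_len))(inp)
    out = layers.TimeDistributed(
        layers.Dense(n_classes, activation="softmax"))(x)
    model = models.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because the Base-CNN is shared across time steps, the same weights score a single 30-second epoch at inference time (`n_epochs=1`), which is what enables real-time single-epoch prediction.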
3.3 EEG adaptation and experiments
Pre-processing data from wearable EEG
Real-time brain activity from the flexible EEG headband is streamed via BLE to the smartphone application. Raw EEG from the AF7 channel is down-sampled to 100 Hz at the end of each 30-second epoch before being fed into the network.
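As a concrete illustration, the per-epoch down-sampling could be done with linear interpolation. The 256 Hz input rate assumed below is typical of the Muse headband but is an assumption; the paper does not state the device's raw sampling rate or the exact resampling method.

```python
import numpy as np

def resample_epoch(x, fs_in=256.0, fs_out=100.0, duration=30.0):
    """Resample one 30-second epoch to 100 Hz by linear interpolation.
    fs_in=256 Hz is an assumed headband rate, not from the paper."""
    n_out = int(duration * fs_out)          # 3000 samples at 100 Hz
    t_in = np.arange(len(x)) / fs_in        # timestamps of raw samples
    t_out = np.arange(n_out) / fs_out       # target 100 Hz timestamps
    return np.interp(t_out, t_in, x)
```

A production pipeline would typically low-pass filter before down-sampling to avoid aliasing; that step is omitted here for brevity.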
EEG real-time adaptation
Since the raw-EEG recording instrument used for training differs from the testing instrument, we adapt the EEG using Z-score scaling with the wake-stage standard deviation for calibration. The main reason for choosing the standard deviation of the EEG over other metrics is highlighted in Figure 2. We estimate the mutual information for a discrete sleep-stage variable, using as input the set of statistical features shown in the diagram, calculated every 30 seconds, and as output the corresponding sleep-stage labels of the dataset. The final feature shown is the Min-Max-Distance (MMD), as described by Aboalayon et al., calculated as a sum of Euclidean distances between the minimum and maximum EEG values within a 1-second sliding window.
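The per-epoch statistical features, including MMD as described above, can be computed as follows. The feature set here is illustrative (the paper's exact list appears in its Figure 2), and the MMD implementation interprets the description literally: for each 1-second window, the Euclidean distance between the (index, value) points of the window minimum and maximum.

```python
import numpy as np

def mmd(epoch, fs=100):
    """Min-Max-Distance: sum over 1-second windows of the Euclidean
    distance between each window's minimum and maximum sample points."""
    total = 0.0
    for start in range(0, len(epoch) - fs + 1, fs):
        w = epoch[start:start + fs]
        i_min, i_max = int(np.argmin(w)), int(np.argmax(w))
        total += np.hypot(i_max - i_min, w[i_max] - w[i_min])
    return total

def epoch_features(epoch, fs=100):
    """Illustrative per-epoch statistics for feature-importance analysis."""
    return np.array([epoch.mean(), epoch.std(),
                     epoch.min(), epoch.max(), mmd(epoch, fs)])
```

Given such feature vectors and their sleep-stage labels, mutual information against the discrete stage variable can be estimated with, e.g., scikit-learn's `mutual_info_classif`.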
We calculated the statistical feature importance of raw EEG for 3 randomly selected nights. The standard deviation clearly plays the major role, with an average relative importance of 53.33 percent over the other features for the classification of sleep stages. Z-score calibration of the wake stage employs this important feature; hence, using this method for 30-second epoch adaptation makes the raw EEG both instrument-independent and subject-independent, as long as the signal is noise-reduced.
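A minimal sketch of the wake-stage Z-score calibration follows. The class API is hypothetical; it assumes the mean and standard deviation are estimated from an initial wake-stage recording and then applied to every subsequent 30-second epoch.

```python
import numpy as np

class WakeCalibrator:
    """Z-score adaptation of incoming epochs using statistics from a
    wake-stage calibration period (hypothetical API, not the paper's)."""

    def __init__(self):
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, wake_signal):
        """Estimate mean and standard deviation from wake-stage EEG."""
        self.mu = float(np.mean(wake_signal))
        self.sigma = float(np.std(wake_signal)) or 1.0  # guard against 0
        return self

    def transform(self, epoch):
        """Scale an epoch so it matches the training distribution."""
        return (np.asarray(epoch) - self.mu) / self.sigma
```

Because the scaling is relative to each recording's own wake statistics, the same trained network can be applied to a different instrument or subject without retraining.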
4 Results

Our model has an overall accuracy of 83.5% for 20-fold cross-validation of five-stage classification. Accuracy on individual test nights is 72% in the worst case. This model achieves reliable accuracy given that the overall Inter-Rater Reliability (IRR) among human experts scoring sleep recordings was reported to be about 80% (Cohen's κ = 0.68 to 0.76).
Table 1 reports the precision, recall, F1-score and support of all five sleep stages for predictions on 5 test nights. The corresponding accuracy of 81.72% and F1-score of 76.23% were obtained. In addition, the micro, macro and weighted averages of these metrics are calculated in order to give a better statistical understanding. The corresponding confusion matrix is shown in the left part of Figure 3. The N1 stage shows the poorest agreement because of the absence of an occipital electrode.
Figure 4 shows the difference between the hypnogram of one full night predicted by the model and the ground-truth hypnogram as labeled in the dataset. The mobile application that deploys this deep learning model is built using TensorFlow Lite. Raw data from the wearable EEG is wirelessly streamed to the phone and updated every 30 seconds with the corresponding sleep stage and confidence value. We successfully validated real-time Rapid-Eye-Movement (REM) detection with our wearable headband by re-creating the same closed-eye lateral movements during wakefulness. We also simulated and successfully validated jaw clenching and blinks during wakefulness.
5 Conclusion and future scope
This work demonstrates an end-to-end mobile pipeline for real-time sleep staging through adaptation of a wearable EEG, operating at the fastest possible resolution of a single 30-second epoch. With the new mobile TensorFlow Lite application, we achieve automated sleep staging without the need for servers, in a portable way that can be used anywhere, including the home. The application is versatile, as it can be adapted to take single-channel (Fpz-Cz) recordings from any wearable EEG. We aim to use this work for real-time interventions using Brain-Computer Interfaces (BCI) for applications in Human-Computer Interaction (HCI), such as wearable olfactory interfaces [2, 3], real-time audio-neural feedback [23, 9] and sleep-based enhancement of learning and memory [20, 4].
-  Khald Ali I Aboalayon, Miad Faezipour, Wafaa S Almuhammadi, and Saeid Moslehpour. Sleep stage classification using eeg signal analysis: a comprehensive survey and new investigation. Entropy, 18(9):272, 2016.
-  Judith Amores, Javier Hernandez, Artem Dementyev, Xiqing Wang, and Pattie Maes. Bioessence: A wearable olfactory display that monitors cardio-respiratory information to support mental wellbeing. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5131–5134. IEEE, 2018.
-  Judith Amores and Pattie Maes. Essence: Olfactory interfaces for unconscious influence of mood and cognitive performance. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 28–34. ACM, 2017.
-  Thomas Andrillon, Daniel Pressnitzer, Damien Léger, and Sid Kouider. Formation and suppression of acoustic memories during human sleep. Nature communications, 8(1):179, 2017.
-  M Alizadeh Asfestani, Elena Braganza, Jan Schwidetzky, J Santiago, S Soekadar, Jan Born, and Gordon B Feld. Overnight memory consolidation facilitates rather than interferes with new learning of similar materials—a study probing nmda receptors. Neuropsychopharmacology, 43(11):2292, 2018.
-  Richard B Berry, Rita Brooks, Charlene E Gamaldo, Susan M Harding, CL Marcus, BV Vaughn, et al. The aasm manual for the scoring of sleep and associated events. Rules, Terminology and Technical Specifications, Darien, Illinois, American Academy of Sleep Medicine, 2012.
-  Alexander J Casson, David C Yates, Shelagh JM Smith, John S Duncan, and Esther Rodriguez-Villegas. Wearable electroencephalography. IEEE engineering in medicine and biology magazine, 29(3):44–56, 2010.
-  Youness Mansar. EEG_classification. https://github.com/CVxTz/EEG_classification, 2018.
-  Eliran Dafna, Ariel Tarasiuk, and Yaniv Zigel. Sleep staging using nocturnal sound analysis. Scientific reports, 8(1):13474, 2018.
-  Heidi Danker-Hopfe, Peter Anderer, Josef Zeitlhofer, Marion Boeck, Hans Dorn, Georg Gruber, Esther Heller, Erna Loretz, Doris Moser, Silvia Parapatics, et al. Interrater reliability for sleep scoring according to the rechtschaffen & kales and the new aasm standard. Journal of sleep research, 18(1):74–84, 2009.
-  Bob Kemp, Aeilko H Zwinderman, Bert Tuk, Hilbert AC Kamphuisen, and Josefien JL Oberye. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the eeg. IEEE Transactions on Biomedical Engineering, 47(9):1185–1194, 2000.
-  Olave E Krigolson, Chad C Williams, Angela Norton, Cameron D Hassall, and Francisco L Colino. Choosing muse: Validation of a low-cost, portable eeg system for erp research. Frontiers in neuroscience, 11:109, 2017.
-  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
-  Laura BF Kurdziel, Jessica Kent, and Rebecca MC Spencer. Sleep-dependent enhancement of emotional memory in early childhood. Scientific reports, 8, 2018.
-  Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436, 2015.
-  Brendan P Lucey, Jennifer S Mcleland, Cristina D Toedebusch, Jill Boyd, John C Morris, Eric C Landsness, Kelvin Yamada, and David M Holtzman. Comparison of a single-channel eeg sleep study to polysomnography. Journal of sleep research, 25(6):625–635, 2016.
-  Christian O'Reilly, Nadia Gosselin, Julie Carrier, and Tore Nielsen. Montreal archive of sleep studies: an open-access resource for instrument benchmarking and exploratory research. Journal of sleep research, 23(6):628–635, 2014.
-  Amiya Patanaik, Ju Lynn Ong, Joshua J Gooley, Sonia Ancoli-Israel, and Michael WL Chee. An end-to-end framework for real-time automatic sleep stage classification. Sleep, 41(5):zsy041, 2018.
-  Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y Chén, and Maarten De Vos. Seqsleepnet: End-to-end hierarchical recurrent neural network for sequence-to-sequence automatic sleep staging. arXiv preprint arXiv:1809.10932, 2018.
-  Björn Rasch, Christian Büchel, Steffen Gais, and Jan Born. Odor cues during slow-wave sleep prompt declarative memory consolidation. Science, 315(5817):1426–1429, 2007.
-  A Sano, AJ Phillips, AW McHill, S Taylor, LK Barger, CA Czeisler, and RW Picard. 0182 influence of weekly sleep regularity on self-reported wellbeing. Journal of Sleep and Sleep Disorders Research, 40(suppl_1):A67–A68, 2017.
-  Nicolas Schaltenbrand, Régis Lengelle, M Toussaint, R Luthringer, G Carelli, A Jacqmin, E Lainey, Alain Muzet, and Jean-Paul Macher. Sleep stage scoring using the neural network model: comparison between visual and automatic analysis in normal subjects and patients. Sleep, 19(1):26–35, 1996.
-  Maren D Schütze and Klaus Junghanns. The difficulty of staying awake during alpha/theta neurofeedback training. Applied psychophysiology and biofeedback, 40(2):85–94, 2015.
-  Eti Ben Simon and Matthew P Walker. Sleep loss causes social withdrawal and loneliness. Nature communications, 9, 2018.
-  Arnaud Sors, Stéphane Bonnet, Sébastien Mirek, Laurent Vercueil, and Jean-François Payen. A convolutional neural network for sleep stage scoring from raw single-channel eeg. Biomedical Signal Processing and Control, 42:107–114, 2018.
-  Akara Supratak, Hao Dong, Chao Wu, and Yike Guo. Deepsleepnet: a model for automatic sleep stage scoring based on raw single-channel eeg. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(11):1998–2008, 2017.
-  Orestis Tsinalis, Paul M Matthews, and Yike Guo. Automatic sleep stage scoring using time-frequency analysis and stacked sparse autoencoders. Annals of biomedical engineering, 44(5):1587–1597, 2016.
-  Katharina Wulff, Silvia Gatti, Joseph G Wettstein, and Russell G Foster. Sleep and circadian rhythm disruption in psychiatric and neurodegenerative disease. Nature Reviews Neuroscience, 11(8):589, 2010.
-  Jin Zhang, Dawei Chen, Jianhui Zhao, Mincong He, Yuanpeng Wang, and Qian Zhang. Rass: A portable real-time automatic sleep scoring system. In Real-Time Systems Symposium (RTSS), 2012 IEEE 33rd, pages 105–114. IEEE, 2012.
-  Mingmin Zhao, Shichao Yue, Dina Katabi, Tommi S Jaakkola, and Matt T Bianchi. Learning sleep stages from radio signals: a conditional adversarial architecture. In International Conference on Machine Learning, pages 4100–4109, 2017.
Appendix A
The Base-CNN model used in our work is described in Figure 5. All dropout layers have a rate of 0.01, the pooling layers have a size of 2, and the model is compiled with an Adam optimizer with a learning rate of 0.001. The corresponding parameters of each layer are given alongside for reference.