Real-Time Sleep Staging using Deep Learning on a Smartphone for a Wearable EEG

Abhay Koushik
MIT Media Lab
Cambridge, MA USA

Judith Amores
MIT Media Lab
Cambridge, MA USA

Pattie Maes
MIT Media Lab
Cambridge, MA USA

We present the first real-time sleep staging system that uses deep learning in a smartphone application for a wearable EEG, without the need for servers. We employ real-time adaptation of single-channel electroencephalography (EEG) for inference with a Time-Distributed 1-D Deep Convolutional Neural Network. Polysomnography (PSG), the gold standard for sleep staging, requires a human scorer and is both complex and resource-intensive. Our work demonstrates an end-to-end on-smartphone pipeline that can infer sleep stages from single 30-second epochs, with an overall accuracy of 83.5% on 20-fold cross-validation for five-class classification of sleep stages using the open Sleep-EDF dataset.



Machine Learning for Health (ML4H) Workshop at NeurIPS 2018.

1 Introduction and Background

Having a proper night of sleep and a regular circadian rhythm is crucial for physical, mental and social well-being[24, 28]. Sleep facilitates learning, memory consolidation and emotion processing[5, 14]. Identification of sleep stages is important not only in diagnosing and treating sleep disorders but also for understanding the neuroscience of healthy sleep. Polysomnography (PSG) is used in hospitals to study sleep and diagnose sleep disorders. It is considered the gold standard for sleep staging and involves recording multiple electrophysiological signals from the body, such as brain activity using EEG, heart rhythm through Electrocardiography (ECG), muscle tone through Electromyography (EMG) and eye movement through Electrooculography (EOG). PSG is a tedious procedure that requires skilled sleep technologists in a laboratory setting. Since EEG is the most reliable signal for sleep staging, automation of PSG can be achieved through accurate classification of EEG[16]. Research in Deep Learning[15, 13] has led to many efficient algorithms to classify different kinds of data, including biomedical and physiological signals. In this paper, we focus on developing a real-time sleep staging application using Time-Distributed 1-D Deep Convolutional Neural Networks to classify the five sleep stages in a comfortable environment. As per the new AASM rules[6], these stages are Wake, Rapid-Eye-Movement (REM) and the Non-Rapid-Eye-Movement (N-REM) stages N1, N2 and N3. We make use of a single-channel EEG recorded through a modified research version of the Muse headband[12]. The headband is flexible and can be worn comfortably while sleeping. It has 5 electrodes for brain activity measurements: AF7, AF8, TP9, TP10 and the reference Fpz.

Importance of mobile systems for sleep staging

PSG requires a minimum of 22 wires attached to the body in order to monitor sleep activity. The complexity of this setup requires sleeping in a hospital or laboratory with an expert monitoring and scoring signals in real-time. This results in an unnatural and disturbed night of sleep for the subject, which may affect the diagnosis. It also makes sub-optimal use of time and energy for recording and scoring, and is therefore highly undesirable. There has been significant progress in research on automating sleep staging with wireless signals[30] and with more compact, wearable devices[7, 21]. Nevertheless, none of these systems implements a five-stage classification of sleep in real-time.

The goal of our research was to simplify and reliably automate PSG on a smartphone, scoring single non-overlapping 30-second epochs so that real-time interventions can be triggered automatically during experiments on sleep stages and cognition. We chose 30 seconds because it is the minimum time resolution of a single sleep score by an expert. Automated classification is achieved through adaptation of a Time-Distributed Deep Convolutional Neural Network model. Simplification is achieved by developing a TensorFlow Lite Android application that uses only a single-channel recording from a wearable EEG. We have also designed a friendly user interface that visualizes sleep stages and raw EEG data with real-time statistics about accuracy. Our app connects via Bluetooth Low Energy (BLE) to the flexible EEG headband, making it portable and not restricted to laboratory and hospital use.

2 Related work

Automatic analysis and sleep scoring using multi-layer Neural Networks[22] was done as early as 1996 using 3 channels of physiological data, namely EEG, EOG and EMG. This involved power spectral density calculations for feature extraction from raw EEG, and required a tedious laboratory setting to collect reliable data through these channels. More recent work has looked into creating portable sleep scoring systems, such as the work by Zhang et al.[29], which uses pulse, blood oxygen and motion sensors to predict sleep stages. Their system does not detect sleep stages N1 and N2 separately, and N1 is usually the hardest stage to predict. The authors themselves note that their results cannot match the accuracy obtained from the EEG and EOG signals of PSG. The same limitations apply to the work by Zhao et al.[30]. Our work achieves reliable accuracy by using only one channel from a wearable EEG, and overcomes the complexity of recording multiple signals.

Our model is based on Time-Distributed Deep Convolutional Neural Networks[8] and is inspired by the DeepSleepNet from Supratak et al.[26]. DeepSleepNet makes use of representation learning with a Convolutional Neural Network (CNN) followed by sequence residual learning using Bidirectional-Long Short Term Memory cells (Bi-LSTM). The major drawback of this network is that it requires 25 epochs of raw EEG data to be fed in together to obtain 25 labels. This is mainly because of the Bi-LSTM which relies on large temporal sequences to achieve better accuracy.

The state-of-the-art network model, SeqSleepNet[19], processes multiple epochs and outputs the sleep labels all at once using end-to-end Hierarchical Recurrent Neural Networks. It uses all 3 channels, namely EEG, EMG and EOG, to give the best overall accuracy of 87.1% on the MASS dataset[17]. The CNN models by Sors et al.[25] and Tsinalis et al.[27], as well as SeqSleepNet and DeepSleepNet, all use longer temporal sequences for inference: 4, 5, 10 and 25 raw EEG epochs of 30 seconds, respectively. We overcome this limitation by using a Time-Distributed Deep CNN to predict single 30-second epochs with real-time adaptation from a wearable EEG.

The flexibility of this wearable also makes it preferable to the bulky system used by Lucey et al.[16]. The smartphone-based nature of our sleep-staging application removes the need for a client-server architecture such as the one used with the Dreem headband[18]. Our TensorFlow Lite mobile application can also be adapted to other types of EEG devices for real-time settings.

3 Methods and Materials

3.1 Dataset description and pre-processing

We used the expanded Sleep-EDF database from PhysioBank[11]. Single-channel EEG recordings (Fpz-Cz at 100 Hz) from 20 subjects are divided into a training set of 33 nights and a validation set of 4 nights. Together, these contain non-overlapping nights from 19 subjects for 20-fold cross-validation. The non-overlapping test set contains 2 nights (1 subject). As described for DeepSleepNet[26], we keep only half an hour of wake before and after the sleep period and remove the remaining wake epochs. We excluded MOVEMENT and UNKNOWN stages, and combined N4 and N3 to follow the five-stage classification as per the new AASM rules[6].
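The label clean-up described above can be sketched as follows. This is a minimal illustration and not the authors' code: the stage strings in `STAGE_MAP`, the function name `preprocess_labels`, and the choice to trim wake epochs before dropping MOVEMENT and UNKNOWN epochs are all assumptions made for the example.

```python
# Hypothetical stage names; the annotation strings in the dataset files may differ.
STAGE_MAP = {"W": 0, "N1": 1, "N2": 2, "N3": 3, "N4": 3, "REM": 4}  # N4 is merged into N3
EPOCHS_PER_HALF_HOUR = 60  # 30 minutes of 30-second epochs


def preprocess_labels(raw_labels, margin=EPOCHS_PER_HALF_HOUR):
    """Return (kept_indices, labels) for one night of per-epoch stage strings.

    Keeps at most `margin` epochs before the first and after the last
    non-wake epoch, then drops every epoch whose stage is not in STAGE_MAP
    (for example MOVEMENT and UNKNOWN).
    """
    sleep_idx = [
        i for i, stage in enumerate(raw_labels)
        if stage in STAGE_MAP and STAGE_MAP[stage] != 0
    ]
    if not sleep_idx:
        return [], []
    start = max(0, sleep_idx[0] - margin)
    end = min(len(raw_labels), sleep_idx[-1] + margin + 1)
    kept = [i for i in range(start, end) if raw_labels[i] in STAGE_MAP]
    return kept, [STAGE_MAP[raw_labels[i]] for i in kept]


if __name__ == "__main__":
    night = ["W"] * 100 + ["N1", "N2", "N3", "N4", "MOVEMENT", "REM"] + ["W"] * 100
    indices, labels = preprocess_labels(night)
    print(len(labels), labels[60:65])
```

The returned indices can be used to select the matching 30-second EEG segments so that signals and labels stay aligned.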

3.2 Model architecture and training

Figure 1: Model architecture used for training and inference

Our model architecture is described in Figure 1. The Base-CNN has 3 repeated sets of two 1-D convolutional (Conv1D) layers, a 1-D max-pooling layer and a spatial dropout layer. This is followed by two Conv1D layers and by 1-D global max-pooling, dropout and dense layers. A final dropout layer forms the output of the Base-CNN. 30-second epochs of normalized raw EEG at 100 Hz are fed into the Time-Distributed Base-CNN model[8], as shown in Figure 1. All Conv1D layers use Rectified Linear Unit (ReLU) activation. Training uses an Adam optimizer with an initial learning rate of 0.001, which is reduced each time the validation accuracy plateaus using the Keras ReduceLROnPlateau callback.
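To illustrate the idea behind reducing the learning rate when validation accuracy plateaus, here is a simplified pure-Python sketch. It is not the Keras implementation, and the `factor` and `patience` values are assumptions, since the paper states only the initial learning rate of 0.001.

```python
def plateau_schedule(val_accuracies, initial_lr=0.001, factor=0.5,
                     patience=2, min_lr=1e-6):
    """Return the learning rate used in each epoch.

    The rate is multiplied by `factor` once the validation accuracy has
    failed to improve on its best value for `patience` consecutive epochs.
    `factor`, `patience` and `min_lr` are illustrative assumptions.
    """
    lr = initial_lr
    best = float("-inf")
    wait = 0
    history = []
    for acc in val_accuracies:
        history.append(lr)  # rate in effect during this epoch
        if acc > best:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return history


if __name__ == "__main__":
    accuracies = [0.70, 0.75, 0.75, 0.74, 0.76, 0.76, 0.76]
    print(plateau_schedule(accuracies))
```

In this toy run the rate stays at its initial value while accuracy improves, and drops only after two epochs without a new best value.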

3.3 EEG adaptation and experiments

Pre-processing data from wearable EEG

Real-time brain activity from the flexible EEG headband is streamed via BLE to the smartphone application. Raw EEG from the AF7 channel is down-sampled to 100 Hz at the end of each 30-second epoch before being fed into the network.
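As a rough sketch of the per-epoch down-sampling step, the example below resamples one buffered 30-second epoch to 100 Hz by linear interpolation. The source sampling rate of 256 Hz and the function name `downsample_epoch` are assumptions for illustration. The paper does not state the headband's rate or the resampling method, and a production pipeline would normally apply a low-pass (anti-aliasing) filter first, which this sketch omits.

```python
def downsample_epoch(samples, src_rate_hz, dst_rate_hz=100, epoch_s=30):
    """Linearly interpolate one epoch of `samples` onto a dst_rate_hz grid.

    No anti-aliasing filter is applied; this only illustrates how one
    epoch becomes a fixed-length input of dst_rate_hz * epoch_s samples.
    """
    n_in = int(src_rate_hz * epoch_s)
    if len(samples) < n_in:
        raise ValueError("epoch is shorter than src_rate_hz * epoch_s samples")
    n_out = int(dst_rate_hz * epoch_s)
    ratio = src_rate_hz / dst_rate_hz
    out = []
    for k in range(n_out):
        pos = k * ratio                  # fractional index into the source signal
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


if __name__ == "__main__":
    # A ramp whose value equals its timestamp in seconds, sampled at an assumed 256 Hz.
    ramp = [i / 256 for i in range(256 * 30)]
    epoch = downsample_epoch(ramp, src_rate_hz=256)
    print(len(epoch))
```

The output always has 3000 samples per epoch, matching the 30-second, 100 Hz input described for the network.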

EEG real-time adaptation

Since the raw EEG recording instrument used for training is different from the testing instrument, we adapt the EEG using Z-score scaling with the wake-stage standard deviation for calibration. The main reason for choosing the standard deviation of EEG over other metrics is highlighted in Figure 2. We estimate the mutual information for a discrete sleep stage variable. As input, we use the set of statistical features shown in the figure, calculated every 30 seconds, and as output the corresponding sleep-stage labels of the dataset. The final feature shown is the Min-Max-Distance (MMD) as described by Aboalayon et al.[1], calculated as a sum over Euclidean distances between minimum and maximum EEG values with a 1-second sliding window.
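A minimal sketch of the MMD feature is given below. It is based on the description above, not taken from the authors' code. It assumes non-overlapping 1-second windows and measures the distance in the (sample index, amplitude) plane; both choices are assumptions.

```python
import math


def min_max_distance(epoch, fs=100, window_s=1):
    """Sum, over consecutive windows, of the Euclidean distance between the
    minimum and maximum points, each taken as (sample index, amplitude)."""
    win = int(fs * window_s)
    total = 0.0
    for start in range(0, len(epoch) - win + 1, win):
        seg = epoch[start:start + win]
        i_max = max(range(win), key=seg.__getitem__)  # index of first maximum
        i_min = min(range(win), key=seg.__getitem__)  # index of first minimum
        total += math.hypot(i_max - i_min, seg[i_max] - seg[i_min])
    return total


if __name__ == "__main__":
    # Two 4-sample windows, each with a 3-sample, 4-unit min-to-max gap.
    print(min_max_distance([0, 0, 0, 4] * 2, fs=4))  # 10.0
```

Because the horizontal axis is in samples and the vertical axis is in signal units, the value depends on how the amplitude is scaled.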

Figure 2: The heat-map shows the mutual importance of different statistical features of raw EEG. We use the default of 3 nearest neighbours as the parameter for the mutual_info_classif function, which performs automated feature selection in the scikit-learn machine learning library.

We calculated the statistical feature importance of raw EEG for 3 randomly selected nights. The standard deviation clearly plays the major role in the classification of sleep stages, with an average relative importance of 53.33 percent compared with the other features. Z-score calibration on the wake stage makes use of this important feature. Using this method for 30-second epoch adaptation therefore makes the raw EEG both instrument-independent and subject-independent, as long as the signal is noise-reduced.
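One possible reading of this adaptation step is sketched below. The paper does not give the exact formula, so two things here are assumptions: each epoch is centred on its own mean, and the scale is the population standard deviation of a wake-stage calibration recording. The function names are hypothetical.

```python
import statistics


def calibrate_wake_std(wake_samples):
    """Standard deviation of raw EEG recorded while the wearer is awake."""
    return statistics.pstdev(wake_samples)


def adapt_epoch(epoch, wake_std):
    """Centre an epoch on its own mean and scale it by the wake-stage
    standard deviation (assumed form of the Z-score adaptation)."""
    if wake_std <= 0:
        raise ValueError("wake_std must be positive")
    mean = statistics.fmean(epoch)
    return [(x - mean) / wake_std for x in epoch]


if __name__ == "__main__":
    wake = [10, 30] * 50            # toy calibration signal with std 10
    scale = calibrate_wake_std(wake)
    print(adapt_epoch([100, 120, 140], scale))  # [-2.0, 0.0, 2.0]
```

Dividing by a fixed wake-stage scale, instead of each epoch's own standard deviation, keeps amplitude differences between sleep stages visible to the network.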

4 Results

Our model has an overall accuracy of 83.5% for 20-fold cross-validation of five-stage classification. The accuracy for individual nights from the test data is 72% in the worst case. The model achieves reliable accuracy given that the overall IRR (Inter-Rater Reliability)[10] reported among human experts scoring sleep recordings was about 80% (Cohen's κ = 0.68 to 0.76).

| Label | Sleep Stage | Precision | Recall | F1-score | Support |
|---|---|---|---|---|---|
| 0 | Wake | 0.83 | 0.96 | 0.89 | 730 |
| 1 | N1 | 0.47 | 0.42 | 0.44 | 337 |
| 2 | N2 | 0.87 | 0.83 | 0.85 | 2248 |
| 3 | N3 | 0.92 | 0.81 | 0.86 | 931 |
| 4 | REM | 0.71 | 0.83 | 0.77 | 903 |
| | Micro average | 0.82 | 0.82 | 0.82 | 5149 |
| | Macro average | 0.76 | 0.77 | 0.76 | 5149 |
| | Weighted average | 0.82 | 0.82 | 0.82 | 5149 |

Table 1: Five-class classification report for sleep staging on 5 test nights

Table 1 reports the precision, recall, F1-score and support for all five sleep stages on predictions from 5 test nights. The corresponding accuracy is 81.72% and the F1-score is 76.23%. In addition, the micro, macro and weighted averages of these metrics are reported to give a better statistical understanding. The confusion matrix for the same night is shown in the left part of Figure 3. The N1 stage shows the poorest agreement because of the absence of an occipital electrode[16].
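The macro and weighted averages can be re-derived from the per-stage rows of Table 1, as the short check below shows. The macro average is the plain mean over the five stages, while the weighted average weights each stage by its support. For single-label multi-class problems the micro-averaged precision, recall and F1 all equal the overall accuracy. The variable and function names are chosen for this example only.

```python
# (precision, recall, f1, support) per stage, copied from Table 1.
REPORT = {
    "Wake": (0.83, 0.96, 0.89, 730),
    "N1": (0.47, 0.42, 0.44, 337),
    "N2": (0.87, 0.83, 0.85, 2248),
    "N3": (0.92, 0.81, 0.86, 931),
    "REM": (0.71, 0.83, 0.77, 903),
}
PRECISION, RECALL, F1, SUPPORT = range(4)


def macro_average(report, col):
    """Unweighted mean of one metric over all stages."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_average(report, col):
    """Mean of one metric with each stage weighted by its support."""
    total = sum(row[SUPPORT] for row in report.values())
    return sum(row[col] * row[SUPPORT] for row in report.values()) / total


if __name__ == "__main__":
    for name, col in (("precision", PRECISION), ("recall", RECALL), ("f1", F1)):
        print(name,
              round(macro_average(REPORT, col), 2),
              round(weighted_average(REPORT, col), 2))
```

The gap between the macro and weighted F1-scores comes mainly from N1, which has low scores but relatively few epochs.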

Figure 3: Confusion matrix (left). Snapshot of the headband and android application (right)

Figure 4 shows the difference between the hypnogram of one full night predicted by the model and the ground truth hypnogram as labeled in the dataset. The mobile application that deploys this deep learning model is built using TensorFlow Lite. The raw data from the wearable EEG is streamed wirelessly to the phone, and the display is updated every 30 seconds with the corresponding sleep stage and confidence value. We successfully validated real-time Rapid-Eye-Movement (REM) detection using our wearable headband by re-creating the same closed, lateral eye movement during wakefulness. We have also simulated and successfully validated jaw clenching and blinks during wakefulness.

Figure 4: Comparison of the ground truth and predicted hypnograms of one full night

5 Conclusion and future scope

This work demonstrates an end-to-end mobile pipeline for the fastest real-time sleep staging by adaptation of a wearable EEG. With the development of the new mobile TensorFlow Lite application, we achieve automated sleep staging without the need for servers, in a portable way that can be used anywhere, including the home. The application is versatile, as it can be adapted to take in single-channel (Fpz-Cz) recordings from any wearable EEG. We aim to use this work for real-time interventions using Brain Computer Interfaces (BCI) for applications in Human Computer Interaction (HCI), such as wearable olfactory interfaces[2, 3], real-time audio-neural feedback[23, 9] and sleep-based enhancement of learning and memory[20, 4].


  • [1] Khald Ali I Aboalayon, Miad Faezipour, Wafaa S Almuhammadi, and Saeid Moslehpour. Sleep stage classification using eeg signal analysis: a comprehensive survey and new investigation. Entropy, 18(9):272, 2016.
  • [2] Judith Amores, Javier Hernandez, Artem Dementyev, Xiqing Wang, and Pattie Maes. Bioessence: A wearable olfactory display that monitors cardio-respiratory information to support mental wellbeing. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5131–5134. IEEE, 2018.
  • [3] Judith Amores and Pattie Maes. Essence: Olfactory interfaces for unconscious influence of mood and cognitive performance. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 28–34. ACM, 2017.
  • [4] Thomas Andrillon, Daniel Pressnitzer, Damien Léger, and Sid Kouider. Formation and suppression of acoustic memories during human sleep. Nature communications, 8(1):179, 2017.
  • [5] M Alizadeh Asfestani, Elena Braganza, Jan Schwidetzky, J Santiago, S Soekadar, Jan Born, and Gordon B Feld. Overnight memory consolidation facilitates rather than interferes with new learning of similar materials—a study probing nmda receptors. Neuropsychopharmacology, 43(11):2292, 2018.
  • [6] Richard B Berry, Rita Brooks, Charlene E Gamaldo, Susan M Harding, CL Marcus, BV Vaughn, et al. The aasm manual for the scoring of sleep and associated events. Rules, Terminology and Technical Specifications, Darien, Illinois, American Academy of Sleep Medicine, 2012.
  • [7] Alexander J Casson, David C Yates, Shelagh JM Smith, John S Duncan, and Esther Rodriguez-Villegas. Wearable electroencephalography. IEEE engineering in medicine and biology magazine, 29(3):44–56, 2010.
  • [8] Youness Mansar. EEG_classification, 2018.
  • [9] Eliran Dafna, Ariel Tarasiuk, and Yaniv Zigel. Sleep staging using nocturnal sound analysis. Scientific reports, 8(1):13474, 2018.
  • [10] Heidi Danker-hopfe, Peter Anderer, Josef Zeitlhofer, Marion Boeck, Hans Dorn, Georg Gruber, Esther Heller, Erna Loretz, Doris Moser, Silvia Parapatics, et al. Interrater reliability for sleep scoring according to the rechtschaffen & kales and the new aasm standard. Journal of sleep research, 18(1):74–84, 2009.
  • [11] Bob Kemp, Aeilko H Zwinderman, Bert Tuk, Hilbert AC Kamphuisen, and Josefien JL Oberye. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the eeg. IEEE Transactions on Biomedical Engineering, 47(9):1185–1194, 2000.
  • [12] Olave E Krigolson, Chad C Williams, Angela Norton, Cameron D Hassall, and Francisco L Colino. Choosing muse: Validation of a low-cost, portable eeg system for erp research. Frontiers in neuroscience, 11:109, 2017.
  • [13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • [14] Laura BF Kurdziel, Jessica Kent, and Rebecca MC Spencer. Sleep-dependent enhancement of emotional memory in early childhood. Scientific reports, 8, 2018.
  • [15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436, 2015.
  • [16] Brendan P Lucey, Jennifer S Mcleland, Cristina D Toedebusch, Jill Boyd, John C Morris, Eric C Landsness, Kelvin Yamada, and David M Holtzman. Comparison of a single-channel eeg sleep study to polysomnography. Journal of sleep research, 25(6):625–635, 2016.
  • [17] Christian O’reilly, Nadia Gosselin, Julie Carrier, and Tore Nielsen. Montreal archive of sleep studies: an open-access resource for instrument benchmarking and exploratory research. Journal of sleep research, 23(6):628–635, 2014.
  • [18] Amiya Patanaik, Ju Lynn Ong, Joshua J Gooley, Sonia Ancoli-Israel, and Michael WL Chee. An end-to-end framework for real-time automatic sleep stage classification. Sleep, 41(5):zsy041, 2018.
  • [19] Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y Chén, and Maarten De Vos. Seqsleepnet: End-to-end hierarchical recurrent neural network for sequence-to-sequence automatic sleep staging. arXiv preprint arXiv:1809.10932, 2018.
  • [20] Björn Rasch, Christian Büchel, Steffen Gais, and Jan Born. Odor cues during slow-wave sleep prompt declarative memory consolidation. Science, 315(5817):1426–1429, 2007.
  • [21] A Sano, AJ Phillips, AW McHill, S Taylor, LK Barger, CA Czeisler, and RW Picard. 0182 influence of weekly sleep regularity on self-reported wellbeing. Journal of Sleep and Sleep Disorders Research, 40(suppl_1):A67–A68, 2017.
  • [22] Nicolas Schaltenbrand, Régis Lengelle, M Toussaint, R Luthringer, G Carelli, A Jacqmin, E Lainey, Alain Muzet, and Jean-Paul Macher. Sleep stage scoring using the neural network model: comparison between visual and automatic analysis in normal subjects and patients. Sleep, 19(1):26–35, 1996.
  • [23] Maren D Schütze and Klaus Junghanns. The difficulty of staying awake during alpha/theta neurofeedback training. Applied psychophysiology and biofeedback, 40(2):85–94, 2015.
  • [24] Eti Ben Simon and Matthew P Walker. Sleep loss causes social withdrawal and loneliness. Nature communications, 9, 2018.
  • [25] Arnaud Sors, Stéphane Bonnet, Sébastien Mirek, Laurent Vercueil, and Jean-François Payen. A convolutional neural network for sleep stage scoring from raw single-channel eeg. Biomedical Signal Processing and Control, 42:107–114, 2018.
  • [26] Akara Supratak, Hao Dong, Chao Wu, and Yike Guo. Deepsleepnet: a model for automatic sleep stage scoring based on raw single-channel eeg. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(11):1998–2008, 2017.
  • [27] Orestis Tsinalis, Paul M Matthews, and Yike Guo. Automatic sleep stage scoring using time-frequency analysis and stacked sparse autoencoders. Annals of biomedical engineering, 44(5):1587–1597, 2016.
  • [28] Katharina Wulff, Silvia Gatti, Joseph G Wettstein, and Russell G Foster. Sleep and circadian rhythm disruption in psychiatric and neurodegenerative disease. Nature Reviews Neuroscience, 11(8):589, 2010.
  • [29] Jin Zhang, Dawei Chen, Jianhui Zhao, Mincong He, Yuanpeng Wang, and Qian Zhang. Rass: A portable real-time automatic sleep scoring system. In Real-Time Systems Symposium (RTSS), 2012 IEEE 33rd, pages 105–114. IEEE, 2012.
  • [30] Mingmin Zhao, Shichao Yue, Dina Katabi, Tommi S Jaakkola, and Matt T Bianchi. Learning sleep stages from radio signals: a conditional adversarial architecture. In International Conference on Machine Learning, pages 4100–4109, 2017.

Appendix A

The Base-CNN model used in our work is shown in Figure 5. All dropout layers have a rate of 0.01, pooling layers have a size of 2, and the model is compiled with an Adam optimizer with a learning rate of 0.001. The corresponding parameters of each layer are given alongside for reference.

Figure 5: Architecture of the Base-CNN model