Deep Neural Networks with Weighted Averaged Overnight Airflow Features for Sleep Apnea-Hypopnea Severity Classification


Payongkit Lakhan¹, Apiwat Ditthapron², Nannapas Banluesombatkul¹ and Theerawit Wilaiprasitporn¹
¹Bio-inspired Robotics and Neural Engineering Lab, School of Information Science and Technology, Vidyasirimedhi Institute of Science & Technology, Thailand
²Computer Department, Worcester Polytechnic Institute, Worcester, MA, USA.

The dramatic rise of the Deep Learning (DL) approach and its capability in biomedical applications led us to explore the advantages of using DL for sleep Apnea-Hypopnea severity classification. To reduce the complexity of clinical diagnosis using Polysomnography (PSG), which is a multiple-sensor platform, we apply our proposed DL scheme to a single Airflow (AF) sensing signal (a subset of PSG). Seventeen features were extracted from AF and then fed into Deep Neural Networks (DNNs) for classification in two studies. First, we performed binary classifications using the cutoff indices AHI = 5, 15 and 30 events/hour. Second, we performed a multiclass Sleep Apnea-Hypopnea Syndrome (SAHS) severity classification that assigns patients to four groups: no SAHS, mild SAHS, moderate SAHS, and severe SAHS. For evaluation, we used a larger number of patients than related works to accommodate more diversity: 520 AF records obtained from the MrOS sleep study (Visit 2) database. We then applied 10-fold cross-validation to obtain the accuracy, sensitivity and specificity. Moreover, we compared the results of our main classifier with two other approaches used in previous research, the Support Vector Machine (SVM) and Adaboost-Classification and Regression Trees (AB-CART). In the binary classification, our proposed method achieves significantly higher accuracy than the other two approaches, with 83.46%, 85.39% and 92.69% at the three cutoffs, respectively. In the multiclass classification, it also attains the highest accuracy of all approaches, 63.70%.

Sleep apnea-hypopnea syndrome (SAHS) severity classification, deep neural networks, machine learning, single airflow sensing signal, feature extraction from airflow signals.

I Introduction

Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is characterized by repetitive episodes of airflow reduction (hypopnea) or cessation (apnea) caused by upper airway collapse during sleep [1]. The most common symptom of OSAHS is snoring, a sleep disturbance that results in daytime drowsiness [2]. OSAHS also affects health more broadly, increasing the risk of hypertension, diabetes, acute myocardial infarction, heart attack, stroke, depression, etc. [2]. Polysomnography (PSG) is the clinical measurement technique for sleep disorder diagnosis [3]. However, PSG involves multiple physiological signal recordings, such as electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), blood oxygen saturation (SpO2), leg movement, airflow, cannula flow, respiratory rate and body position [4]. In general, PSG is performed overnight in a sleep laboratory, either in a hospital or in a clinic [5]. Once the PSG is recorded, a medical doctor with OSAHS expertise needs to perform an offline analysis of all the physiological signals. Eventually, the clinic reports the Apnea-Hypopnea Index (AHI), which indicates the severity of OSAHS [6]. AHI is categorized into the following states: no SAHS (Sleep Apnea-Hypopnea Syndrome) (AHI < 5 events/hr), mild SAHS (5 ≤ AHI < 15 events/hr), moderate SAHS (15 ≤ AHI < 30 events/hr), and severe SAHS (AHI ≥ 30 events/hr) [6]. Due to the complexity and high cost of PSG [7, 8], one study reported that 90% of people who had OSAHS were undiagnosed [9]. Thus, simplifying OSAHS diagnosis remains a challenging issue.
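As a minimal sketch of the AHI categorization, the mapping from an AHI value to a severity class can be written as below. It assumes the conventional boundaries (no SAHS below 5, mild from 5 to below 15, moderate from 15 to below 30, severe at 30 events/hr or above). The function names are hypothetical, and treating AHI at or above a cutoff as the positive class in `binary_label` is likewise an assumption.

```python
def severity_from_ahi(ahi):
    """Map an AHI value (events/hr) to one of the four SAHS severity classes."""
    if ahi < 5:
        return "no SAHS"
    if ahi < 15:
        return "mild SAHS"
    if ahi < 30:
        return "moderate SAHS"
    return "severe SAHS"


def binary_label(ahi, cutoff):
    """Binary label for one cutoff (5, 15 or 30): 1 if AHI >= cutoff, else 0.

    Assumption: subjects at or above the cutoff form the positive class.
    """
    return int(ahi >= cutoff)


if __name__ == "__main__":
    for value in (3.2, 12.0, 22.5, 41.0):
        labels = [binary_label(value, c) for c in (5, 15, 30)]
        print(value, severity_from_ahi(value), labels)
```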

A common approach to this issue is to reduce the complexity (by using a single physiological signal), cost and analysis time typically required for clinical SAHS diagnosis using PSG [10]. In previous works, researchers used a single physiological recording from PSG to predict AHI with various computational methods. Single-lead ECG, SpO2 from a pulse oximeter and airflow (AF) from a thermistor have been proposed as candidate single recordings for SAHS diagnosis [11, 12, 13, 14]. The cited works share the same computational strategy: detecting abnormal periods in the signal and scoring them as apnea-hypopnea related events. The scores are eventually converted into an AHI. In this study, we aim to develop an automated algorithm to predict SAHS severity from a single time series, the AF sensed by a thermistor in front of the nose. A comparison of the physiological recordings in standard PSG indicated that AF is the most direct measure of breathing obstruction. Moreover, the amplitude of AF changes dramatically during apnea or hypopnea periods [8][15].

In contrast to the aforementioned computational strategy, we implemented our method using statistical features together with either classical machine learning approaches (support vector machine, SVM, and Adaboost-Classification and Regression Trees, AB-CART) or modern artificial neural networks (deep neural networks, DNNs). The proposed method begins by extracting statistical features of the Apnea and Hypopnea events, in the time domain, from overnight AF signals. We then fed the features into SVM, AB-CART and DNN classifiers for evaluation. The classification tasks started from the simpler scenario of binary classification (cutoff indices at AHI = 5, 15 and 30 events/hr). We then applied the same method to the multiclass problem (no-SAHS, mild-SAHS, moderate-SAHS, and severe-SAHS). The experimental studies and performance evaluations were designed according to previous research on SAHS severity classification using AF signals [16].

The merits of our work are a novel feature extraction from AF signals and a performance evaluation on a large population. In our experiments, the proposed DNNs with the proposed features outperformed classical machine learning approaches using the same features. Furthermore, the accuracy of the proposed DNNs exceeded that of AB-CART, which was reported to be the state of the art for SAHS detection using AF [16].

II Methods

In this section, we first introduce the OSAHS dataset from the Osteoporotic Fractures in Men Study (MrOS) [17, 18, 19, 20]. Then, we propose a statistical feature extraction from overnight AF signals. The extracted features were then fed into the SVM, AB-CART and DNN approaches for performance comparison. 10-fold cross-validation was used to evaluate the performance of each approach. Binary classifications (three different AHI cutoff indices) and multiclass classifications (four severity levels, including no-SAHS/control participants) were performed in this study.

II-A Datasets

MrOS sleep data (Visit 2) were used in this study. A total of 1,026 men aged 65 years or older participated in standard sleep examinations at six clinical centers. Raw polysomnography (PSG) data in European Data Format (EDF) files with XML annotation files were exported from the Compumedics Profusion software. AF signals in the PSG were acquired from ProTech thermistor sensors at a 32 Hz sampling rate with a high-pass filter at a 0.15 Hz cutoff. Each annotation file includes the start and end times of both Apnea and Hypopnea events. We labeled the severity of SAHS using the AHI variable provided in the dataset. Here, AHI is the number of Apnea events at all desaturation levels and Hypopnea events with 4% oxygen desaturation per hour [21].

II-B Subsampling and Data Preparation

We randomly selected 520 subjects from the whole dataset for our study; the number of subjects in each severity group is listed in Table I.

TABLE I: Number of subjects in each SAHS severity group.
            no    mild   mod   severe   All
Subjects    185   190    85    60       520

Using each participant's annotation file, we extracted the segments of the AF signal recorded during each Apnea and Hypopnea event. AF segments from Apnea and Hypopnea events were treated in the same way. Consequently, the number and duration of AF samples differed between participants. Because the low-frequency band usually dominates AF signals [22], a low-pass filter with a 3 Hz cutoff was applied to all AF samples prior to feature extraction.
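The segment-gathering step can be illustrated with the following minimal sketch. It assumes the annotated events have already been parsed into a list of (start, end) times in seconds; this layout and the function name are hypothetical and do not reflect the actual XML annotation schema. The 3 Hz low-pass filtering step is not shown.

```python
SAMPLING_RATE_HZ = 32  # AF sampling rate stated for the thermistor recordings


def extract_event_segments(signal, events, fs=SAMPLING_RATE_HZ):
    """Return one AF segment (a list of samples) per annotated event.

    `signal` is the overnight AF recording as a sequence of samples.
    `events` is a list of (start_seconds, end_seconds) tuples; this layout is
    an assumption made for illustration only.
    """
    segments = []
    for start_s, end_s in events:
        # Convert event times to sample indices and clamp to the recording.
        first = max(0, int(round(start_s * fs)))
        last = min(len(signal), int(round(end_s * fs)))
        if last > first:
            segments.append(list(signal[first:last]))
    return segments


if __name__ == "__main__":
    one_minute = [0.0] * (SAMPLING_RATE_HZ * 60)    # placeholder signal
    toy_events = [(10.0, 25.0), (40.0, 52.5)]       # hypothetical events
    print([len(s) for s in extract_event_segments(one_minute, toy_events)])
```

Because every event has its own duration, the resulting segments have different lengths, which is why the later features are averaged (and duration-weighted) across events.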

II-C Feature Extraction

After the subsampling and filtering processes, we extracted 17 features from the overnight AF samples, as follows:

  • Number of Apnea events.

  • Number of Hypopnea events.

  • Summation of Apnea and Hypopnea events.

  • Summation of periods (in seconds) from Apnea and Hypopnea events.

  • Average of maximum amplitudes from all AF samples.

  • Average of minimum amplitudes from all AF samples.

  • Average of mean amplitudes from all AF samples.

  • Average of standard deviation of amplitudes from all AF samples.

  • Maximum periods from all AF samples.

  • Minimum periods from all AF samples.

  • Mean of periods from all AF samples.

  • Standard deviation of Apnea and Hypopnea periods from all AF samples.

  • Variance of periods from all AF samples.

  • Weighted averaged maximum amplitude from all AF samples.

  • Weighted averaged minimum amplitude from all AF samples.

  • Weighted averaged mean amplitude from all AF samples.

  • Weighted averaged standard deviation of amplitudes from all AF samples.

For the last four features, the period (duration) of each AF sample is used as its weight in the weighted average.
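The seventeen features can be sketched as follows, assuming each event's AF sample is available as a list of amplitudes sampled at 32 Hz. The function and key names are hypothetical, and the use of population statistics (`pstdev`, `pvariance`) is an assumption, since the text does not say which estimator was used. The duration-weighted average of a per-event statistic s_i with event durations d_i is computed as sum(d_i * s_i) / sum(d_i).

```python
import statistics


def extract_features(apnea_segments, hypopnea_segments, fs=32):
    """Compute 17 statistical features from per-event AF segments.

    Assumes at least one non-empty segment overall. Population statistics
    are an assumption; the source does not specify the estimator.
    """
    segments = list(apnea_segments) + list(hypopnea_segments)
    durations = [len(seg) / fs for seg in segments]  # event periods in seconds
    per_event = {
        "max": [max(seg) for seg in segments],
        "min": [min(seg) for seg in segments],
        "mean": [statistics.fmean(seg) for seg in segments],
        "std": [statistics.pstdev(seg) for seg in segments],
    }
    total_duration = sum(durations)

    def weighted_average(values):
        # Each event's statistic is weighted by that event's duration.
        return sum(v * d for v, d in zip(values, durations)) / total_duration

    features = {
        "n_apnea": len(apnea_segments),
        "n_hypopnea": len(hypopnea_segments),
        "n_events": len(segments),
        "total_duration": total_duration,
        "max_duration": max(durations),
        "min_duration": min(durations),
        "mean_duration": statistics.fmean(durations),
        "std_duration": statistics.pstdev(durations),
        "var_duration": statistics.pvariance(durations),
    }
    for name, values in per_event.items():
        features["avg_" + name] = statistics.fmean(values)
        features["weighted_" + name] = weighted_average(values)
    return features
```

With a 1 s event whose maximum amplitude is 3 and a 2 s event whose maximum amplitude is 1, the plain average of the maxima is 2, whereas the weighted average is (1·3 + 2·1)/3 = 5/3, so longer events contribute more.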

Fig. 1: Illustration of proposed SAHS severity classifier using seventeen statistical-based features and Deep Neural Networks (DNNs).

II-D Classification of Sleep Apnea-Hypopnea Syndrome (SAHS) Severity

There were two main classification tasks in our experiments. First, we constructed three binary classifiers, one for each of the clinically standard AHI cutoffs (AHI = 5, 15 and 30 events/hr). The number of subjects in this task is shown in Table I. The second task was to classify subjects into the four standard SAHS classes (normal, mild, moderate and severe). To avoid an imbalanced population, we subsampled again for this task, from 520 to 270 subjects: 70 each from the normal, mild and moderate groups and 60 from the severe group.

Here, we fed the proposed features into the proposed DNNs and, as comparative classifiers, into two classical machine learning (ML) approaches: Support Vector Machine (SVM) and Adaboost-Classification and Regression Trees (AB-CART). To validate the DNNs, we split the dataset into three subsets: 80% for training, 10% for validation and 10% for testing. The comparative classifiers used exactly the same data as the DNNs except for the validation set, since the classical ML approaches do not require a separate validation set (equivalently, the testing set plays the role of the validation set). 10-fold cross-validation was applied to these datasets.
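One way to combine the 80/10/10 split with 10-fold cross-validation is sketched below. The text does not say how the validation fold was chosen in each round, so using the fold after the test fold is an assumption, as are the function name and the fixed seed. The sketch is not stratified by class.

```python
import random


def ten_fold_splits(n_subjects, n_folds=10, seed=0):
    """Yield (train, validation, test) index lists for each fold.

    Each round holds out one fold for testing and, as an assumption, the
    next fold for validation; the remaining eight folds form the training
    set, giving roughly an 80/10/10 split.
    """
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_k = (k + 1) % n_folds
        train = [i for j, fold in enumerate(folds)
                 if j not in (k, val_k) for i in fold]
        yield train, folds[val_k], folds[k]


if __name__ == "__main__":
    for train, val, test in ten_fold_splits(520):
        print(len(train), len(val), len(test))
```

For the classical classifiers, which need no validation set, only the training and test indices of each round would be used.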

II-D1 Deep Neural Networks (DNNs)

As shown in Fig. 1, the proposed features were fed into the DNNs. The DNNs were implemented using the Keras API [23] with the following configuration:

  • A stack of fully-connected layers with 1024, 512, 256, 128, 64, 32, 16, 8, and 4 hidden nodes.

  • Each DNN layer was followed by the Hyperbolic tangent (tanh) activation function.

  • The optimizer was RMSprop with the learning rate of 0.001.

  • The softmax function was applied for classification.
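To make the layer structure concrete, the following framework-free sketch runs a forward pass through a stack with the listed layer sizes, using tanh after each hidden layer and softmax at the output. It is not the Keras implementation used in the study. The weights are random and untrained, the RMSprop training step is omitted, and two details are assumptions: the input is the 17-feature vector, and a separate output layer with one unit per class follows the 4-node hidden layer.

```python
import math
import random

HIDDEN_SIZES = [1024, 512, 256, 128, 64, 32, 16, 8, 4]  # sizes listed above


def init_layers(n_inputs, n_classes, seed=0):
    """Random (untrained) weights and zero biases for each dense layer."""
    rng = random.Random(seed)
    sizes = [n_inputs] + HIDDEN_SIZES + [n_classes]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, features):
    """tanh after every hidden layer, softmax on the (assumed) output layer."""
    activations = list(features)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = softmax(pre) if is_output else [math.tanh(v) for v in pre]
    return activations


def count_parameters(layers):
    """Total number of weights and biases in the stack."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

Under these assumptions, `count_parameters(init_layers(17, 4))` gives the size of the four-class network, most of which sits in the 1024-to-512 layer.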

II-D2 Machine Learning (ML) Approaches

To assess the proposed DNNs against conventional approaches, two classical ML approaches were used as comparative (baseline) classifiers: a conventional SVM with a linear kernel and balanced class weights, and AB-CART, both from the scikit-learn API [24]. While SVM is a standard baseline classifier, AB-CART was proposed for Airflow (AF)-based sleep Apnea severity classification in a previous study [16].

In the binary classifications, a one-way repeated-measures analysis of variance (ANOVA) was performed to compare the performance of the three classifiers using three metrics: sensitivity, specificity and accuracy. For the multiclass task, confusion matrices were computed for performance comparison.
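For reference, the three binary metrics can be computed from true and predicted labels as in the minimal sketch below. The function name and the toy labels are hypothetical, and 1 is assumed to denote the positive class (AHI at or above the cutoff).

```python
def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels.

    Assumes both classes occur in `y_true`, so neither ratio divides by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]       # hypothetical labels
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]   # hypothetical predictions
    print(binary_metrics(truth, predicted))
```

In a 10-fold setting, these metrics would be computed once per fold, giving ten values per classifier and cutoff for the statistical comparison.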

TABLE II: Summary of 10-fold Sensitivity, Specificity and Accuracy of binary classification at each AHI severity cutoff using the SVM, AB-CART and DNN classifiers. Bold numbers in the table represent the significantly highest values at each cutoff.
[Table rows: Cutoff 5, Cutoff 15, Cutoff 30; columns: Sensitivity, Specificity, Accuracy]
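TABLE III: Cumulative confusion matrices over 10 folds for the four-class classification using the SVM, AB-CART and DNN classifiers.
            SVM (predicted)            AB-CART (predicted)        DNN (predicted)
Actual      no   mild  mod  severe     no   mild  mod  severe     no   mild  mod  severe
no          56   9     4    1          41   22    3    4          57   12    1    0
mild        20   26    17   7          18   26    20   6          11   44    12   3
mod         11   21    18   20         12   18    24   16         10   17    35   8
severe      4    2     14   40         4    6     17   33         3    9     12   36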

III Results and Discussion

Table II shows the results of binary classification at the three SAHS severity cutoffs (AHI = 5, 15 and 30) using the SVM classifiers, the AB-CART classifiers and our main classifier, the DNN approach.

For the cutoff at AHI = 5, the accuracy of SVM ranges from 76.4% to 79.38% (mean ± standard error, 77.89% ± 1.49%), AB-CART ranges from 75.78% to 78.54% (mean ± standard error, 77.12% ± 1.42%) and our main classifier ranges from 82.38% to 84.54% (mean ± standard error, 83.46% ± 1.08%).

For the cutoff at AHI = 15, the accuracy of SVM ranges from 77.64% to 80.82% (mean ± standard error, 79.23% ± 1.59%), AB-CART ranges from 77.7% to 80% (mean ± standard error, 78.85% ± 1.15%) and our main classifier ranges from 84.14% to 86.64% (mean ± standard error, 85.39% ± 1.25%).

For the cutoff at AHI = 30, the accuracy of SVM ranges from 76.53% to 79.61% (mean ± standard error, 78.07% ± 1.54%), AB-CART ranges from 89.82% to 91.7% (mean ± standard error, 90.76% ± 0.94%) and our main classifier ranges from 92.14% to 93.24% (mean ± standard error, 92.69% ± 0.55%).

While our DNN classifiers reached the highest accuracy of all classifiers at every cutoff, with accuracy also increasing from one cutoff to the next, the SVM reached the highest Specificity at AHI = 5 and Sensitivity at AHI = 15 and 30, and the AB-CART reached the highest Sensitivity at AHI = 5 and Sensitivity at AHI = 15 and 30. One-way repeated-measures ANOVA revealed significant differences in mean accuracy among the three approaches at every cutoff (AHI = 5: F(2) = 2313.822, p < 0.05; AHI = 15: F(2) = 9.850, p < 0.05; AHI = 30: F(2) = 50.771, p < 0.05). Pairwise comparisons showed that the accuracies of our DNN classifiers are significantly higher than those of the others (p < 0.05). Consequently, we conclude that our classifiers are able to sustain sensitivity and specificity while maintaining the highest accuracy at all cutoffs.

Additionally, after the data were balanced across the SAHS levels and classified non-linearly into 4 classes, the cumulative confusion matrices over the 10 folds were computed for every approach, as shown in Table III. The results from our DNN classifiers are promising and better than the others, with an overall accuracy of 63.70%, while the SVM reached only 51.85% and the AB-CART only 45.93%. This indicates that our DNN classifier provides a higher diagnostic performance than the other approaches.
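The reported overall accuracies can be reproduced from the cumulative confusion matrices by dividing the diagonal sum by the total count of 270 subjects. The sketch below assumes the three matrices appear in the order SVM, AB-CART, DNN, an order that matches the accuracies quoted above; the variable and function names are hypothetical.

```python
# Rows are actual classes, columns predicted classes (no, mild, mod, severe).
CONFUSION = {
    "SVM": [[56, 9, 4, 1], [20, 26, 17, 7], [11, 21, 18, 20], [4, 2, 14, 40]],
    "AB-CART": [[41, 22, 3, 4], [18, 26, 20, 6], [12, 18, 24, 16], [4, 6, 17, 33]],
    "DNN": [[57, 12, 1, 0], [11, 44, 12, 3], [10, 17, 35, 8], [3, 9, 12, 36]],
}


def overall_accuracy(matrix):
    """Fraction of subjects on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    for name, matrix in CONFUSION.items():
        print(f"{name}: {100 * overall_accuracy(matrix):.2f}%")
    # SVM: 51.85%, AB-CART: 45.93%, DNN: 63.70%
```

Each matrix has row sums of 70, 70, 70 and 60, consistent with the balanced 270-subject subsample.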

IV Conclusion

In summary, we proposed a statistical feature extraction from single-channel overnight airflow (AF) signals, comprising seventeen features in total. The feature sets were fed into the proposed DNNs and, for comparison, into two classical machine learning (ML) approaches: Support Vector Machine (SVM) and Adaboost-Classification and Regression Trees (AB-CART). Binary and multiclass sleep Apnea-Hypopnea severity classifications were conducted to demonstrate the performance of our proposed features with DNNs, which outperformed the classical machine learning techniques.


  • [1] Eric J. Olson, John G. Park, and Timothy I. Morgenthaler. Obstructive sleep apnea-hypopnea syndrome. Primary Care: Clinics in Office Practice, 32(2):329–359, 2005.
  • [2] Shahrokh Javaheri, Ferran Barbe, Francisco Campos-Rodriguez, Jerome A. Dempsey, et al. Sleep apnea: Types, mechanisms, and clinical cardiovascular consequences. Journal of the American College of Cardiology, 69(7):841–858, 2017.
  • [3] Susheel P. Patil, Hartmut Schneider, Alan R. Schwartz, and Philip L. Smith. Adult obstructive sleep apnea: Pathophysiology and diagnosis. Chest, 2007.
  • [4] Clete A. Kushida, Michael R. Littner, Timothy Morgenthaler, Cathy A. Alessi, et al. Practice parameters for the indications for polysomnography and related procedures: an update for 2005. Sleep, 28(4):499–521, 2005.
  • [5] Rahul K. Kakkar and Richard B. Berry. Positive airway pressure treatment for obstructive sleep apnea. Chest, 132(3):1057–1072, 2007.
  • [6] Asher Qureshi, Robert D Ballard, and Harold S Nelson. Obstructive sleep apnea. Journal of Allergy and Clinical Immunology, 112(4):643–651, 2003.
  • [7] J. A. Bennett and W. J. M. Kinnear. Sleep on the cheap: the role of overnight oximetry in the diagnosis of sleep apnoea hypopnoea syndrome. Thorax, 54:958–959, 1999.
  • [8] W. Ward Flemons, Michael R. Littner, James A. Rowley, Peter Gay, W. McDowell Anderson, David W. Hudgel, R. Douglas McEvoy, and Daniel I. Loube. Home diagnosis of sleep apnea: A systematic review of the literature: An evidence review cosponsored by the American Academy of Sleep Medicine, the American College of Chest Physicians, and the American Thoracic Society. Chest, 124(4):1543–1579, 2003.
  • [9] M. Singh, P. Liao, S. Kobah, D.N. Wijeysundera, C. Shapiro, and F. Chung. Proportion of surgical patients with undiagnosed obstructive sleep apnoea. British Journal of Anaesthesia, 110(4):629–636, 2013.
  • [10] Marcel Młyńczak, Ewa Migacz, Maciej Migacz, and Wojciech Kukwa. Detecting breathing and snoring episodes using a wireless tracheal sensor—a feasibility study. IEEE journal of biomedical and health informatics, 21(6):1504–1510, 2017.
  • [11] Fernanda Ribeiro de Almeida, Najib T Ayas, Ryo Otsuka, Hiroshi Ueda, Peter Hamilton, Frank C Ryan, and Alan A Lowe. Nasal pressure recordings to detect obstructive sleep apnea. Sleep and Breathing, 10(2):62–69, 2006.
  • [12] Ulysses J Magalang, Jacek Dmochowski, Sateesh Veeramachaneni, Azmi Draw, M Jeffery Mador, Ali El-Solh, and Brydon JB Grant. Prediction of the apnea-hypopnea index from overnight pulse oximetry. Chest, 124(5):1694–1701, 2003.
  • [13] Thomas Penzel, J McNames, P De Chazal, B Raymond, A Murray, and G Moody. Systematic comparison of different algorithms for apnoea detection based on electrocardiogram recordings. Medical and Biological Engineering and Computing, 40(4):402–407, 2002.
  • [14] Carlos Alberto Nigro, Eduardo Dibur, Silvia Aimaretti, Sergio González, and Edgardo Rhodius. Comparison of the automatic analysis versus the manual scoring from apnealink™ device for the diagnosis of obstructive sleep apnoea syndrome. Sleep and Breathing, 15(4):679–686, 2011.
  • [15] Richard B Berry, Rohit Budhiraja, Daniel J Gottlieb, David Gozal, Conrad Iber, Vishesh K Kapur, Carole L Marcus, Reena Mehra, Sairam Parthasarathy, Stuart F Quan, et al. Rules for scoring respiratory events in sleep: update of the 2007 aasm manual for the scoring of sleep and associated events: deliberations of the sleep apnea definitions task force of the american academy of sleep medicine. Journal of clinical sleep medicine: JCSM: official publication of the American Academy of Sleep Medicine, 8(5):597, 2012.
  • [16] Gonzalo C Gutiérrez-Tobal, Daniel Álvarez, Félix del Campo, and Roberto Hornero. Utility of adaboost to detect sleep apnea-hypopnea syndrome from single-channel airflow. IEEE Transactions on Biomedical Engineering, 63(3):636–646, 2016.
  • [17] Dennis A Dean, Ary L Goldberger, Remo Mueller, Matthew Kim, Michael Rueschman, Daniel Mobley, Satya S Sahoo, Catherine P Jayapandian, Licong Cui, Michael G Morrical, et al. Scaling up scientific discovery in sleep medicine: the national sleep research resource. Sleep, 39(5):1151–1164, 2016.
  • [18] Janet Babich Blank, Peggy Mannen Cawthon, Mary Lou Carrion-Petersen, Loretta Harper, J Phillip Johnson, Eileen Mitson, and Romelia Ramírez Delay. Overview of recruitment for the osteoporotic fractures in men study (mros). Contemporary clinical trials, 26(5):557–568, 2005.
  • [19] Eric Orwoll, Janet Babich Blank, Elizabeth Barrett-Connor, Jane Cauley, Steven Cummings, Kristine Ensrud, Cora Lewis, Peggy M Cawthon, Robert Marcus, Lynn M Marshall, et al. Design and baseline characteristics of the osteoporotic fractures in men (mros) study—a large observational study of the determinants of fracture in older men. Contemporary clinical trials, 26(5):569–585, 2005.
  • [20] Terri Blackwell, Kristine Yaffe, Sonia Ancoli-Israel, Susan Redline, Kristine E Ensrud, Marcia L Stefanick, Alison Laffan, Katie L Stone, and Osteoporotic Fractures in Men Study Group. Associations between sleep architecture and sleep-disordered breathing and cognition in older community-dwelling men: the osteoporotic fractures in men sleep study. Journal of the American Geriatrics Society, 59(12):2217–2225, 2011.
  • [21] Lili Chen, Xi Zhang, and Changyue Song. An automatic screening approach for obstructive sleep apnea diagnosis based on single-lead electrocardiogram. IEEE Transactions on Automation Science and Engineering, 12(1):106–115, 2015.
  • [22] Daniel Álvarez, GC Gutierrez, J Víctor Marcos, Félix del Campo, and Roberto Hornero. Spectral analysis of single-channel airflow and oxygen saturation recordings in obstructive sleep apnea detection. In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, pages 847–850. IEEE, 2010.
  • [23] François Chollet et al. Keras, 2015.
  • [24] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.