Predicting Extubation Readiness in Extreme Preterm Infants based on Patterns of Breathing
Abstract
Extremely preterm infants commonly require intubation and invasive mechanical ventilation after birth. While the duration of mechanical ventilation should be minimized in order to avoid complications, extubation failure is associated with increases in morbidities and mortality. As part of a prospective observational study aimed at developing an accurate predictor of extubation readiness, Markov and semi-Markov chain models were applied to gain insight into the respiratory patterns of these infants, with the semi-Markov models providing more robust time-series modeling. The semi-Markov model revealed interesting similarities and differences between newborns who succeeded extubation and those who failed. The parameters of the model were further applied to predict extubation readiness via generative (joint likelihood) and discriminative (support vector machine) approaches. Results showed that up to 84% of infants who failed extubation could have been accurately identified prior to extubation.
I Introduction
Due to immature lungs and respiratory control, extremely preterm infants (gestational age ≤ 28 weeks) are at high risk of respiratory failure after birth. For that reason, most require endotracheal intubation and Invasive Mechanical Ventilation (MV) within the first hours or days of life [1]. As the respiratory status improves, MV is continuously weaned until the medical team deems that removal of the tube, termed extubation, can be performed. This is a difficult decision since, if done too early, it could result in the need for reintubation, a technically challenging intervention that has been associated with increased morbidities and mortality [2, 3]. On the other hand, delayed extubation, and hence prolonged mechanical ventilation, could increase the risk of complications. The most common complication is bronchopulmonary dysplasia (BPD), a form of chronic lung disease associated with serious long-term sequelae [4].
This project is part of an ongoing multicenter, multidisciplinary collaborative study aiming to develop an automated prediction tool of extubation readiness using analysis of cardiorespiratory signals in extremely preterm infants (Clinicaltrials.gov identifier: NCT01909947) [5]. In the present work, we approach the task of predicting whether a patient would succeed or fail extubation based solely on respiratory patterns extracted from a 5-minute spontaneous breathing trial (SBT), during which the patient was still intubated but not receiving mechanical inflations from the ventilator.
Methods of generative modeling using Markov and semi-Markov chains were applied. The respiratory time series was processed as a sequence of five unique and mutually exclusive breathing patterns/states: Pause, Asynchronous breathing, Movement artifact, Synchronous breathing and Unknown, using an existing method for analysing respiratory signals called AUREA (Automated Unsupervised Respiratory Event Analysis) [6]. The models fitted to these state sequences revealed key similarities and differences between patients who succeeded and failed extubation. These characteristics were exploited in a support vector machine-based discriminative classifier, and results indicated that our model could identify infants who went on to fail extubation with a sensitivity of up to 84%.
The rest of this paper is organized as follows. Section II reviews related work. Section III describes the data acquisition and preprocessing steps. Section IV describes our methodology. Section V presents results and provides a discussion on their significance, and in Section VI, we discuss future research directions.
II Related Work
The use of Markov chain models for understanding sequence data and making predictions about outcomes is popular across many domains, especially when the values of the time-series are drawn from a discrete set of states. Ye et al [7] applied Markov chain modeling to detect anomalous activities in computer networks. In another study by Gabriel et al [8], this method was successfully used to track rainfall occurrence patterns. More recently, Alinovi et al [9] applied semi-Markov chains to model the respiratory rate of infant patients experiencing breathing-related disorders such as apneas. In that study, respiratory rate was transformed from a real-valued time-series into 5 levels by quantization. The authors demonstrated that semi-Markov chain models accurately described respiratory rate and simulated realistic sequences of respiratory rate in both normal and apneic infants.
Previous work on predicting extubation readiness of extremely preterm newborns has demonstrated that cardiorespiratory signals hold useful discriminating information between infants who succeed and those who fail extubation. In particular, 2 prospective studies by Kaczmarek et al [10] and Robles-Rubio et al [11] on a small cohort of extremely preterm infants extracted measures of heart rate variability (HRV) and respiratory variability (RV) from electrocardiogram (ECG) and respiratory inductive plethysmography (RIP) signals, respectively. Results revealed that measures of HRV and RV were excellent at detecting extubation successes (i.e. high specificity), but not as sensitive in detecting failures. In [12], using patients of the same cohort, a Support Vector Machine (SVM) was developed to predict extubation readiness directly from the combination of cardiorespiratory variability measures. This system achieved improved ability to detect extubation failures (sensitivity of 83%) while maintaining a fairly high specificity of 74%. It should however be noted that these works were carried out on fairly small datasets.
In the current work, we analyzed a larger dataset of 186 babies collected as part of a multi-institutional study. We explore the predictive potential not of HRV or RV, but of the respiratory pattern sequences of the infants. The current work: 1) empirically demonstrates that semi-Markov modeling is more robust for time-series data than Markov chain modeling; 2) uses semi-Markov chain models to understand the transition behavior of meaningful respiratory patterns representing actual respiratory states (not just quantized real values); 3) shows that, in addition to generative classification via maximizing joint likelihood, the parameters of semi-Markov chains can be exploited in discriminative classifiers to improve performance. To the best of our knowledge, this is the first work that applies a time-series-based machine learning method to predict extubation readiness in extremely preterm infants.
III Data
III-A Data Acquisition
Data from 186 infants was available for this study. Patients were enrolled and studied in five sites in Canada (Royal Victoria Hospital, Montreal Children’s Hospital, Jewish General Hospital) and the USA (Detroit Medical Center, MI, and Women and Infants Hospital of Rhode Island, RI). Ethical approval was obtained from each institutional board, and informed parental consent was obtained before recruitment.
Eligible infants had a birth weight ≤ 1250 g and were receiving MV at the time of recruitment. Infants were excluded if they had any major congenital anomalies such as heart disease, or were receiving any vasopressor or sedative drugs at the time of extubation.
Respiratory signals were measured using RIP bands placed around the infant’s ribcage and abdomen. Signals were acquired at a sampling frequency of 1000 Hz for a 5-minute period of spontaneous breathing without any mechanical inflation from the ventilator, prior to extubation. Only infants who were then extubated were included in the dataset. Failure after extubation was defined as reintubation within 7 days. More details on the data collection procedure for this study are available in [5].
III-B Data Preprocessing
RIP signals sampled at 50 Hz were analyzed using AUREA, which extracts sample-by-sample metrics of respiratory power, synchrony between the ribcage (RCG) and abdomen (ABD), and movement artifact [6]. AUREA uses k-means clustering with windowing and smoothing to assign each sample to one of the following respiratory patterns:

Pause (PAU): A cessation of breathing.

Synchronous Breathing (SYB): RCG and ABD are in phase.

Asynchronous Breathing (ASB): RCG and ABD are out of phase.

Movement Artifact (MVT): Associated with infant moving or nurse handling.

Unknown (UNK): Ambiguous patterns not belonging to any other pattern category.
IV Methods
We applied Markov chains and semi-Markov chains to model the time series of respiratory patterns, and then used an SVM with features extracted from these chains to predict extubation readiness. We now describe the analysis methods.
IV-A Discrete-time Markov Chain
A Markov chain is a probabilistic graphical model in which every node in the chain depends only on the one preceding it. Time-series data can be represented as a Markov chain whose nodes are the values observed at every time step. We use discrete-time finite Markov chains, in which at time $t$, the state $s_t$ takes a value from a finite set of states $\mathcal{S}$ (in our case, the 5 respiratory patterns provided by AUREA).
A Markov chain model has 2 sets of parameters: the probability distribution over initial states $\pi$ (a vector) and the transition probabilities between states $A$ (a matrix). Fitting or learning the model of a Markov chain involves estimating these parameters from data. Their maximum likelihood estimates are given by [13]:
(1) $\hat{\pi}_i = \frac{N_i}{\sum_{k \in \mathcal{S}} N_k}$
(2) $\hat{A}_{ij} = \frac{N_{ij}}{\sum_{k \in \mathcal{S}} N_{ik}}$
where $N_i$ is the number of sequences that start in state $i$ and $N_{ij}$ is the number of time steps in which a transition from state $i$ to $j$ occurred. Given a time-series of observations $s_1, \dots, s_T$, the joint likelihood of the sequence according to the Markov chain is given by:
(3) $P(s_1, \dots, s_T) = \pi_{s_1} \prod_{t=2}^{T} A_{s_{t-1} s_t}$
Note that in practice, the start state distribution was not included in our model, due to an adaptation or transition phase from MV with inflations to ETT-CPAP without inflations.
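As an illustration of the estimation step described above, the following is a minimal Python sketch (standard library only) that counts sample-by-sample transitions, normalizes each row to obtain the maximum likelihood transition matrix, and evaluates the log of the joint likelihood without the start-state term. The function names, the optional `pseudocount` argument and the toy sequence are assumptions made for this example; they are not taken from the study's implementation.

```python
import math
from collections import Counter

STATES = ["PAU", "ASB", "MVT", "SYB", "UNK"]

def fit_transition_matrix(sequences, states=STATES, pseudocount=0.0):
    """Maximum likelihood estimate of the transition matrix.

    Counts every i -> j step (self-transitions included) across all
    sequences, then normalizes each row so that it sums to 1.
    """
    counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[(prev, cur)] += 1
    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] + pseudocount for j in states)
        matrix[i] = {
            j: (counts[(i, j)] + pseudocount) / row_total if row_total else 0.0
            for j in states
        }
    return matrix

def log_likelihood(seq, matrix):
    """Log joint likelihood of a sequence, leaving out the start-state term."""
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = matrix[prev][cur]
        if p == 0.0:
            return float("-inf")  # transition never observed during fitting
        total += math.log(p)
    return total

# Toy sequence with one label per sample (hypothetical data).
toy = ["SYB", "SYB", "SYB", "PAU", "PAU", "SYB", "SYB", "ASB", "SYB"]
A = fit_transition_matrix([toy])
toy_ll = log_likelihood(toy, A)
```

Working in log space avoids numerical underflow, which matters for sequences with thousands of samples. A small positive `pseudocount` is one possible way to keep unseen transitions from producing a likelihood of zero; whether the original analysis used any smoothing is not stated.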
In order to apply Markov chain models for classification, separate transition models $A^{\mathrm{succ}}$ and $A^{\mathrm{fail}}$ were first fit to the data coming from the success and failure patients, respectively. A new sequence is classified by computing its likelihood with respect to both models (using Eq. 3) and selecting the class $c$ whose model gives the higher likelihood:
(4) $\hat{c} = \arg\max_{c \in \{\mathrm{succ}, \mathrm{fail}\}} \mathcal{L}_c$
where
(5) $\mathcal{L}_c = P(s_1, \dots, s_T \mid A^{c})$
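The decision rule above can be sketched in a few lines of Python. The two models below are hypothetical two-state transition matrices chosen only to make the example easy to follow; they are not the fitted matrices of the study, and the function names are likewise illustrative.

```python
import math

def log_likelihood(seq, matrix):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(matrix[a][b]) for a, b in zip(seq, seq[1:]))

def classify(seq, models):
    """Return the label whose model gives the highest log-likelihood, plus all scores."""
    scores = {label: log_likelihood(seq, m) for label, m in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-state models, for illustration only (not fitted values).
models = {
    "success": {"SYB": {"SYB": 0.9, "PAU": 0.1}, "PAU": {"SYB": 0.6, "PAU": 0.4}},
    "failure": {"SYB": {"SYB": 0.7, "PAU": 0.3}, "PAU": {"SYB": 0.3, "PAU": 0.7}},
}

label, scores = classify(["SYB", "PAU", "PAU", "PAU", "SYB"], models)
```

In this toy case the sequence dwells in PAU for several samples, which the hypothetical "failure" model makes more probable, so that label is returned.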
IV-B Discrete-time Semi-Markov Chain
A semi-Markov chain model, on the other hand, is characterised by 3 parameters: a start state distribution vector $\pi$; the transition matrix $A$, which stores only cross-state transition probabilities (diagonal elements are 0); and a set of dwell or sojourn time distributions $D = \{d_i\}$, which model the duration spent in each state $i$ until a transition out of that state occurs.
This framework was useful in our application for several reasons. First, Markov chains implicitly model dwell times as an exponential distribution (geometric, in discrete time) [14], which could introduce bias into the model if the underlying data are not actually exponential. Secondly, in data characterised by very long dwell times, the probabilities of cross-state transitions (off-diagonal elements) go to 0, making it very difficult to get any useful information from the model. Finally, a Markov chain is highly sensitive to changes in the sampling rate of the data. Semi-Markov chains address all of these issues.
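The sensitivity to sampling rate can be seen on a toy example. The sketch below (hypothetical labels; function names are assumptions for this illustration) estimates the Markov self-transition probability of SYB from a sequence and from the same sequence with every sample duplicated, which mimics doubling the sampling rate. It also counts the transitions between consecutive dwell segments, which is the quantity a semi-Markov transition matrix is built from.

```python
from collections import Counter
from itertools import groupby

def self_transition_prob(seq, state):
    """Markov MLE of P(state -> state) from sample-by-sample transitions."""
    pairs = Counter(zip(seq, seq[1:]))
    out_of_state = sum(n for (a, _), n in pairs.items() if a == state)
    return pairs[(state, state)] / out_of_state

def cross_state_transitions(seq):
    """Count transitions between consecutive runs (dwell segments)."""
    runs = [label for label, _ in groupby(seq)]
    return Counter(zip(runs, runs[1:]))

# Hypothetical label sequence, then the same signal "sampled" twice as fast.
seq = ["SYB"] * 4 + ["PAU"] * 2 + ["SYB"] * 4 + ["ASB"] * 2 + ["SYB"] * 2
seq_2x = [label for label in seq for _ in range(2)]

p_self_1x = self_transition_prob(seq, "SYB")
p_self_2x = self_transition_prob(seq_2x, "SYB")
same_cross_counts = cross_state_transitions(seq) == cross_state_transitions(seq_2x)
```

The self-transition probability grows with the sampling rate, squeezing the off-diagonal entries towards zero, whereas the cross-state transition counts are unchanged. Dwell times measured in samples do scale with the sampling rate, but they can be expressed in seconds to remove that dependence.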
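The joint likelihood of a sequence of observations under a semi-Markov chain is given by:
(6) $P(s_1, \dots, s_T) = \pi_{q_1}\, d_{q_1}(\tau_1) \prod_{k=2}^{K} A_{q_{k-1} q_k}\, d_{q_k}(\tau_k)$
where the sequence is written as $K$ consecutive segments, $q_k$ is the state of the $k$-th segment, $\tau_k$ is its duration, and $d_{q_k}(\tau_k)$ is the probability of sojourning in the state $q_k$ for the duration $\tau_k$ [15].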
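A minimal Python sketch of this likelihood is given below. It run-length encodes the label sequence into (state, dwell) segments and adds one log dwell term per segment and one log cross-state transition term per change of state. The geometric dwell model and all parameter values are placeholders chosen for the example, not the fitted distributions; the start-state term is omitted, and the final segment is treated as a complete dwell even though it is cut off by the end of the recording.

```python
import math
from itertools import groupby

def to_segments(seq):
    """Run-length encode a label sequence into (state, dwell_in_samples) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(seq)]

def semi_markov_log_likelihood(seq, transition, dwell_pmf):
    """Log-likelihood of a sequence: dwell terms plus cross-state transition terms.

    transition[i][j] is the probability of moving from state i to j (i != j);
    dwell_pmf[i](d) is the probability of staying d samples in state i.
    """
    segments = to_segments(seq)
    total = 0.0
    for k, (state, dwell) in enumerate(segments):
        total += math.log(dwell_pmf[state](dwell))
        if k > 0:
            prev_state = segments[k - 1][0]
            total += math.log(transition[prev_state][state])
    return total

def geometric_pmf(p):
    """Hypothetical dwell model: P(d) = (1 - p)^(d - 1) * p for d = 1, 2, ..."""
    return lambda d: (1.0 - p) ** (d - 1) * p

# Illustrative parameters only (not the fitted values reported in the tables).
transition = {"SYB": {"PAU": 1.0}, "PAU": {"SYB": 1.0}}
dwell_pmf = {"SYB": geometric_pmf(0.25), "PAU": geometric_pmf(0.5)}
seq = ["SYB", "SYB", "SYB", "PAU", "PAU", "SYB"]
ll = semi_markov_log_likelihood(seq, transition, dwell_pmf)
```

Any callable can be supplied per state in `dwell_pmf`, so a different distribution family can be used for each respiratory pattern.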
In modeling the infant respiratory pattern sequences as a semi-Markov chain, the transition matrix $A$ was learned as before, using only cross-state transitions. To fit the dwell time distributions, all dwell times in a state (e.g., PAU) for a particular population (success or failure) were obtained. Several known probability distributions were fit to these data using MATLAB [16]. The distribution which minimized the Bayesian Information Criterion (BIC) [17] was selected. These steps were repeated for all states in both success and failure groups to obtain 10 separate dwell time distributions. Classification of a new example sequence as success or failure was done as described in the previous section, by selecting the class with the larger likelihood.
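The BIC-based selection can be sketched as follows. This is not the MATLAB tool cited above: it is a small Python illustration restricted to two candidate families (exponential and inverse Gaussian) whose maximum likelihood estimates have closed forms, and the dwell times are made-up values. BIC is computed as $k \ln n - 2 \ln \hat{L}$, with $k$ the number of fitted parameters and $n$ the number of dwell times.

```python
import math

def exponential_loglik(data):
    """Maximized log-likelihood of an exponential model (MLE mean = sample mean)."""
    mu = sum(data) / len(data)
    return sum(-math.log(mu) - x / mu for x in data)

def inverse_gaussian_loglik(data):
    """Maximized log-likelihood of an inverse Gaussian model (closed-form MLEs)."""
    n = len(data)
    mu = sum(data) / n
    lam = n / sum(1.0 / x - 1.0 / mu for x in data)
    return sum(
        0.5 * math.log(lam / (2.0 * math.pi * x ** 3))
        - lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)
        for x in data
    )

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * loglik

def select_by_bic(data):
    """Return the candidate name with the lowest BIC, plus all BIC values."""
    candidates = {
        "Exponential": (exponential_loglik(data), 1),
        "InverseGaussian": (inverse_gaussian_loglik(data), 2),
    }
    scores = {name: bic(ll, k, len(data)) for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical dwell times in seconds, tightly clustered around 2 s.
dwell_times = [1.8, 2.1, 2.0, 1.9, 2.2, 2.0, 1.7, 2.3, 2.1, 1.9]
best, scores = select_by_bic(dwell_times)
```

Because these toy dwell times are concentrated around a single value, the exponential model fits poorly and the two-parameter inverse Gaussian is preferred despite its larger BIC penalty.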
IV-C Support Vector Machine (SVM)
Using the Markov model likelihood for classification can be suboptimal if the model structure or some of the model parameters are imprecise. Discriminative models do not make probabilistic assumptions about how the inputs were generated, but rather attempt to learn a (linear or nonlinear) boundary between the groups. Support vector machines (SVMs), in particular, learn a maximum margin decision boundary [18]. In order to compare our results to the Markov chain case, we derived summary statistics from the respiratory pattern sequence of each patient and used these as inputs to train an SVM. In particular, a radial basis function (RBF) SVM was used. The hyperparameters of the RBF SVM, namely the box constraint, C (which penalises the error function to manage overfitting), and the kernel scale (which controls the width of the Gaussian), were optimized using leave-one-out cross-validation and the balanced loss metric, as motivated in Section IV-E.
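IV-D Symmetric KL Divergence
The Kullback-Leibler (KL) divergence [19] is a measure of how well a distribution $Q$ approximates another distribution $P$. It is defined as:
(7) $D_{KL}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$
The KL divergence is non-symmetric: $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$, which is not desirable in our application. Hence, we use the symmetrized KL divergence to compare distributions over transitions between patterns:
(8) $D_{SKL}(P, Q) = D_{KL}(P \,\|\, Q) + D_{KL}(Q \,\|\, P)$
(9) $D_{SKL}(P, Q) = \sum_i \left(P(i) - Q(i)\right) \log \frac{P(i)}{Q(i)}$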
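For illustration, the sketch below computes a symmetrized KL divergence between the two semi-Markov transition matrices printed in Section V (Tables II and III), using the rounded two-decimal entries. How the row-wise divergences were aggregated into a single number is not spelled out in the text; summing over rows, and using the natural logarithm, are assumptions of this example. With rounded inputs the result should be close to, but not identical with, the value reported in Section V-C.

```python
import math

# Rounded cross-state transition probabilities as printed in Section V.
# Row and column order: PAU, ASB, MVT, SYB, UNK.
FIRST_MATRIX = [
    [0.00, 0.27, 0.09, 0.26, 0.38],
    [0.10, 0.00, 0.16, 0.29, 0.45],
    [0.12, 0.32, 0.00, 0.43, 0.14],
    [0.06, 0.25, 0.15, 0.00, 0.54],
    [0.13, 0.28, 0.04, 0.55, 0.00],
]
SECOND_MATRIX = [
    [0.00, 0.28, 0.06, 0.39, 0.28],
    [0.12, 0.00, 0.21, 0.28, 0.40],
    [0.17, 0.41, 0.00, 0.32, 0.09],
    [0.14, 0.21, 0.14, 0.00, 0.52],
    [0.15, 0.30, 0.03, 0.52, 0.00],
]

def kl(p, q):
    """KL divergence D(p || q); entries with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def symmetric_kl(p, q):
    """Symmetrized KL divergence between two discrete distributions."""
    return kl(p, q) + kl(q, p)

def matrix_symmetric_kl(a, b):
    """Assumed aggregation: sum of row-wise symmetrized KL divergences."""
    return sum(symmetric_kl(row_a, row_b) for row_a, row_b in zip(a, b))

d_skl = matrix_symmetric_kl(FIRST_MATRIX, SECOND_MATRIX)
```

The zero diagonal is shared by both matrices, so it never produces a division by zero here; for matrices with mismatched zeros a small smoothing constant would be needed.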
IV-E Model Evaluation
Due to the small size of our dataset and class imbalance, it was necessary to give special thought to the choice of evaluation method. First, due to the relatively small number of examples, we used leave-one-out cross-validation instead of k-fold cross-validation. Though computationally more expensive, leave-one-out is a better estimator for small datasets [20].
Secondly, the class imbalance meant that optimizing for classification accuracy could result in suboptimal (or even degenerate) models, because it implicitly assigns higher weights to examples of the majority class. To address this, the true positive rate (sensitivity) and true negative rate (specificity) were tracked separately. As a single evaluation metric for choosing one model over another, we use the balanced misclassification loss (see Appendix A).
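The evaluation protocol can be sketched as below. To keep the example self-contained, a nearest-class-mean rule on a single feature stands in for the SVM; it is only a placeholder, and the feature values and labels are invented. The balanced loss is taken here to be one minus the average of sensitivity and specificity, with failure treated as the positive class.

```python
def nearest_mean_fit(xs, ys):
    """Stand-in classifier (NOT the SVM): store the mean feature value per class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means

def nearest_mean_predict(means, x):
    """Predict the class whose stored mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

def leave_one_out(xs, ys, fit, predict):
    """Hold out each example once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(predict(model, xs[i]))
    return preds

def balanced_loss(y_true, y_pred, positive="failure"):
    """Return (loss, sensitivity, specificity) with loss = 1 - (sens + spec) / 2."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    return 1.0 - 0.5 * (sensitivity + specificity), sensitivity, specificity

# Hypothetical 1-D feature (e.g. fraction of time in PAU) with imbalanced classes.
xs = [0.02, 0.03, 0.04, 0.05, 0.06, 0.12, 0.10, 0.14]
ys = ["success"] * 6 + ["failure"] * 2

preds = leave_one_out(xs, ys, nearest_mean_fit, nearest_mean_predict)
loss, sens, spec = balanced_loss(ys, preds)
```

Because each class contributes equally to the loss, misclassifying one of the two failures would cost far more than misclassifying one of the six successes, which is the behaviour wanted under class imbalance.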
V Results
Of the 186 patients recruited, 136 succeeded extubation and 50 failed. We discuss the results of our experiments in the following subsections.
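Table I: Best-fit dwell time distributions and fitted parameters for each respiratory pattern
Pattern  |  Success  |  Failure
Pause  |  Exponential  |  Exponential
Asynchrony  |  GeneralizedExtremeValue (k=0.63, σ=1.30, μ=1.85)  |  GeneralizedExtremeValue (k=0.65, σ=1.36, μ=1.81)
Movement  |  GeneralizedPareto (k=0.22, σ=3.62)  |  GeneralizedPareto (k=0.11, σ=3.31)
Synchrony  |  InverseGaussian (μ=8.61, λ=3.61)  |  InverseGaussian (μ=7.83, λ=3.41)
Unknown  |  GeneralizedPareto (k=0.07, σ=2.07)  |  GeneralizedPareto (k=0.10, σ=2.05)

Tables II and III: Semi-Markov (cross-state) transition matrices for the two patient groups; rows give the current state and columns the next state
From \ To  |  PAU  |  ASB  |  MVT  |  SYB  |  UNK
PAU  |  0  |  0.27  |  0.09  |  0.26  |  0.38
ASB  |  0.10  |  0  |  0.16  |  0.29  |  0.45
MVT  |  0.12  |  0.32  |  0  |  0.43  |  0.14
SYB  |  0.06  |  0.25  |  0.15  |  0  |  0.54
UNK  |  0.13  |  0.28  |  0.04  |  0.55  |  0

From \ To  |  PAU  |  ASB  |  MVT  |  SYB  |  UNK
PAU  |  0  |  0.28  |  0.06  |  0.39  |  0.28
ASB  |  0.12  |  0  |  0.21  |  0.28  |  0.40
MVT  |  0.17  |  0.41  |  0  |  0.32  |  0.09
SYB  |  0.14  |  0.21  |  0.14  |  0  |  0.52
UNK  |  0.15  |  0.30  |  0.03  |  0.52  |  0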
V-A Analysis of Respiratory State Durations
As a starting point, the total time spent in each breathing pattern was obtained for the two groups of patients (Fig. 2). It was normalized by the total duration of spontaneous breathing to account for the difference in the number of examples in the two classes. In both groups, SYB was the dominant state, accounting for at least 50% of the time, with the success patients spending slightly more time in this state. The failure patients, on the other hand, spent a greater fraction of time in the PAU state. The time spent in the other states was essentially equivalent, as confirmed by the standard error estimates obtained by bootstrapping (Fig. 2).
V-B Modeling of Dwell Times
The optimal dwell time distributions in each state were estimated for both groups (as described in Section IV-B) and are summarized in Table I. The detailed plots of the probability density functions (PDF) are shown in Appendix C. It was interesting to observe that, for each pattern, the distribution type which best fit the dwell times was the same in both populations. We suspect that this is an indication of some underlying consistency in the breathing behaviour of premature infants regardless of extubation outcome. It should also be noted that the dwell time was distributed exponentially only in the PAU state. As discussed earlier, the use of a Markov chain model would have implicitly assumed an exponential distribution for all states. The use of the semi-Markov framework has thus allowed for a more expressive and accurate representation.
V-C Modeling Transitions
The respiratory pattern sequences were modeled as semi-Markov chains (Tables II and III). Each cell in the matrix represents the probability of transitioning from the state labeled on the row to that on the column. The most probable transition out of ASB and SYB (the breathing states) was the same in infants who succeeded and in those who failed extubation: in both groups it led to UNK. However, it differed for the PAU and MVT states: the most probable transition out of PAU was to UNK in one group and to SYB in the other, and the most probable transition out of MVT was to SYB in one group and to ASB in the other. This suggests that, in terms of transition behaviour, the 2 groups of infants differ more in the transitions emanating from non-breathing states than from breathing states.
Further, the symmetric KL divergence ($D_{SKL}$) between the 2 transition matrices for the semi-Markov model was estimated as 0.27. When modeled as a Markov chain (see details in Appendix B), the probabilities of cross-pattern transitions and the $D_{SKL}$ tended towards zero (0.0019), indicating that the learned models were almost identical due to relatively long dwell times. The semi-Markov transition matrix, on the other hand, being invariant to changes in sampling frequency and/or length of dwell time, resulted in better numerical resolution of cross-pattern transitions and the learning of more discriminating characteristics between both groups of infants.
V-D Generative Classification with Semi-Markov Chain Models
The classification performance of the learned semi-Markov transition models was evaluated using leave-one-out cross-validation. As described in Section IV-B, the likelihood of each test example was computed using its entire sequence of transitions (LkALL). Failure patients were identified at a rate (sensitivity) of 50%, while specificity was 73%.
The predictive value of the individual patterns/states was further evaluated. In particular, to compute the likelihood of a test sequence based on one state, the product of the cross-state transition probabilities emanating from only that state is taken. As before, this likelihood is computed with respect to the transition models for the 2 classes, and a prediction is made by selecting the class whose model gives the higher likelihood. Results are shown in Table IV, where LkSTATE represents the prediction made using the likelihood of the specified "STATE". The highest sensitivity of 68% was obtained with the PAU, MVT and SYB states. Overall, the PAU pattern gave the lowest loss of 0.37.
V-E Discriminative Classification with SVM
The following features, motivated by the semi-Markov chain model, were extracted from each subject and applied in an SVM classifier.

Total dwell time in each respiratory pattern as a fraction of the total sequence duration (DwALL): 5 features

Number of transitions from pattern $i$ to pattern $j$ (where $i \neq j$) as a fraction of the total dwell time in pattern $i$ (TrALL): 20 features

Number of occurrences of each respiratory pattern as a fraction of the number of occurrences of all patterns (OcALL): 5 features
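The three feature groups listed above can be computed from a single label sequence as in the following sketch. Dwell time is measured in samples here, and the feature keys (for example `TrSYB->PAU`) are naming choices made for this illustration rather than identifiers from the study.

```python
from collections import Counter
from itertools import groupby

STATES = ["PAU", "ASB", "MVT", "SYB", "UNK"]

def extract_features(seq, states=STATES):
    """Return dwell (Dw), occurrence (Oc) and transition (Tr) features for one sequence."""
    runs = [(label, sum(1 for _ in grp)) for label, grp in groupby(seq)]
    dwell = Counter()
    occurrences = Counter()
    for label, length in runs:
        dwell[label] += length       # total samples spent in this pattern
        occurrences[label] += 1      # number of separate visits to this pattern
    transitions = Counter((a, b) for (a, _), (b, _) in zip(runs, runs[1:]))

    total_samples = len(seq)
    total_runs = len(runs)
    features = {}
    for i in states:
        features[f"Dw{i}"] = dwell[i] / total_samples
        features[f"Oc{i}"] = occurrences[i] / total_runs
        for j in states:
            if i != j:
                # Normalized by the dwell time in the source pattern i.
                features[f"Tr{i}->{j}"] = (
                    transitions[(i, j)] / dwell[i] if dwell[i] else 0.0
                )
    return features

# Hypothetical sequence: SYB x4, PAU x2, SYB x3, ASB x1.
example = ["SYB"] * 4 + ["PAU"] * 2 + ["SYB"] * 3 + ["ASB"]
features = extract_features(example)
```

The result contains 5 dwell, 5 occurrence and 20 transition features, 30 in total. Selecting the entries for a single pattern (its Dw entry, its Oc entry and its 4 outgoing Tr entries) gives the 6 per-state features.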
Table IV: Classification performance of the generative and discriminative approaches
Approach  |  Sensitivity  |  Specificity  |  Loss
Generative (Semi-Markov)
LkALL  |  0.50  |  0.73  |  0.38
LkPAU  |  0.68  |  0.58  |  0.37
LkASB  |  0.63  |  0.48  |  0.45
LkMVT  |  0.68  |  0.50  |  0.41
LkSYB  |  0.68  |  0.53  |  0.40
LkUNK  |  0.61  |  0.44  |  0.48
Discriminative (SVM)
DwALL  |  0.18  |  0.93  |  0.45
OcALL  |  0.44  |  0.69  |  0.44
TrALL  |  0.82  |  0.41  |  0.39
DwOcTrALL  |  0.64  |  0.63  |  0.37
DwOcTrPAU  |  0.84  |  0.54  |  0.31
DwOcTrASB  |  0.38  |  0.75  |  0.44
DwOcTrMVT  |  0.60  |  0.58  |  0.41
DwOcTrSYB  |  0.26  |  0.81  |  0.46
DwOcTrUNK  |  0.62  |  0.43  |  0.48
The features were first used individually (DwALL, OcALL, TrALL) and then as an ensemble (DwOcTrALL) of 30 features. Similar to the generative case, the predictive value of each individual pattern/state was evaluated: for each state, the dwell time in that state, DwSTATE (1 feature), the cross-transitions, TrSTATE (4 features), and the occurrence count, OcSTATE (1 feature), were combined (DwOcTrSTATE) to train the classifier. We applied SVMs with RBF/Gaussian kernels. Leave-one-out cross-validation was used in a grid search to find the best pair of hyperparameter values (box constraint and kernel scale) based on the balanced misclassification loss. This grid search was repeated for each feature set, since each would have different optimal hyperparameter values.
When using features for all patterns, the sensitivity and specificity at the optimal hyperparameter setting were 64% and 63%, respectively. The highest sensitivity of 84% was obtained when using features of only the Pause pattern, which also gave the lowest loss of 0.31. It could also be observed that whereas the PAU and MVT patterns gave higher sensitivities, SYB and ASB gave higher specificities. This is likely an indication that the Pause and Movement patterns better characterise patients who may fail extubation, while the breathing patterns better characterise patients who succeed. All results are summarised in Table IV. In Fig. 3, we show the receiver-operating characteristic (ROC) curve for the 2 SVM models which gave the lowest loss. Each ROC curve was obtained by fixing one hyperparameter at its optimal value and varying the other.
VI Discussion
We demonstrated the practical application of semi-Markov chains for modeling and classification of the respiratory pattern behaviour of preterm infants in the immediate period prior to extubation. We showed that semi-Markov chain models provide more expressive and robust details about the underlying time series compared to Markov chain models. In terms of dwell time behaviour, the model revealed consistency between the success and failure groups in all respiratory states. Differences were highlighted primarily in transition behaviour arising from the Pause and Movement Artifact patterns. Prediction results confirmed that these 2 patterns provide more discriminating information (especially for patients who failed extubation) than any other pattern.
Indeed, using only features from the Pause pattern led to the greatest detection of failed extubations (with a sensitivity of 84%), but failed to recognize nearly half of the successful extubations (specificity of 54%). This means that at the time infants were deemed ready for extubation, 8 out of 10 failures would be prevented, but at the expense of unnecessarily prolonging the course of mechanical ventilation in 5 out of 10 successes. Previous work [12], which applied an SVM (on a much smaller cohort) using cardiorespiratory variables, gave comparable sensitivity but a higher specificity of 74%. It will be necessary to repeat those experiments with this larger dataset. Another viable path for future work is training a mixture of experts model (MEM) that classifies failure patients based on the Pause and Movement states, and success patients based on the breathing states. The use of automatically extracted respiratory patterns for prediction of extubation readiness provides an approach that offers intuition and yields interpretable models. We emphasize that all babies used in this study were deemed ready for extubation by an attending clinician, so these results constitute an improvement in detecting problem cases over current practice. The advantage of using an automated approach is that we can provide a quantified, repeatable and precise analysis of breathing patterns to support clinical decisions.
References
 [1] B. J. Stoll, N. I. Hansen, E. F. Bell, M. C. Walsh, W. A. Carlo, S. Shankaran, A. R. Laptook, P. J. Sánchez, K. P. Van Meurs, M. Wyckoff et al., “Trends in care practices, morbidity, and mortality of extremely preterm neonates, 1993–2012,” JAMA, vol. 314, no. 10, pp. 1039–1051, 2015.
 [2] B. J. Manley, L. W. Doyle, L. S. Owen, and P. G. Davis, “Extubating extremely preterm infants: Predictors of success and outcomes following failure,” J Pediatr, vol. 173, pp. 45–49, 2016.
 [3] S. Chawla, G. Natarajan, S. Shankaran, B. Carper, L. P. Brion, M. Keszler, W. A. Carlo, N. Ambalavanan, M. G. Gantz, A. Das et al., “Markers of successful extubation in extremely preterm infants, and morbidity after failed extubation,” The Journal of Pediatrics, 2017.
 [4] M. C. Walsh, B. H. Morris, L. A. Wrage, B. R. Vohr, W. K. Poole, J. E. Tyson, L. L. Wright, R. A. Ehrenkranz, B. J. Stoll, A. A. Fanaroff et al., “Extremely low birthweight neonates with protracted ventilation: mortality and 18-month neurodevelopmental outcomes,” The Journal of Pediatrics, vol. 146, no. 6, pp. 798–804, 2005.
 [5] W. Shalish, L. J. Kanbar, S. Rao, C. A. Robles-Rubio, L. Kovacs, S. Chawla, M. Keszler, D. Precup, K. Brown, R. E. Kearney, and G. M. Sant’Anna, “Prediction of extubation readiness in extremely preterm infants by the automated analysis of cardiorespiratory behavior: study protocol,” BMC Pediatrics, vol. 17, no. 1, p. 167, Jul 2017. [Online]. Available: https://doi.org/10.1186/s12887-017-0911-z
 [6] C. A. Robles-Rubio, K. A. Brown, and R. E. Kearney, “Automated unsupervised respiratory event analysis,” in 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Aug 2011, pp. 3201–3204.
 [7] N. Ye et al., “A markov chain model of temporal behavior for anomaly detection,” in Proceedings of the 2000 IEEE Systems, Man, and Cybernetics Information Assurance and Security Workshop, vol. 166. West Point, NY, 2000, p. 169.
 [8] K. Gabriel and J. Neumann, “A markov chain model for daily rainfall occurrence at tel aviv,” Quarterly Journal of the Royal Meteorological Society, vol. 88, no. 375, pp. 90–95, 1962.
 [9] D. Alinovi, G. Ferrari, F. Pisani, and R. Raheli, “Markov chain modeling and simulation of breathing patterns,” Biomedical Signal Processing and Control, vol. 33, pp. 245–254, 2017.
 [10] J. Kaczmarek, S. Chawla, C. Marchica, M. Dwaihy, L. Grundy, and G. M. Sant’Anna, “Heart rate variability and extubation readiness in extremely preterm infants,” Neonatology, vol. 104, no. 1, pp. 42–48, 2013.
 [11] C. A. Robles-Rubio, J. Kaczmarek, S. Chawla, L. Kovacs, K. A. Brown, R. E. Kearney, and G. M. Sant’Anna, “Automated analysis of respiratory behavior in extremely preterm infants and extubation readiness,” Pediatric Pulmonology, vol. 50, no. 5, pp. 479–486, 2015. [Online]. Available: http://dx.doi.org/10.1002/ppul.23151
 [12] D. Precup, C. A. Robles-Rubio, K. A. Brown, L. Kanbar, J. Kaczmarek, S. Chawla, G. M. Sant’Anna, and R. E. Kearney, “Prediction of extubation readiness in extreme preterm infants based on measures of cardiorespiratory variability,” in 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Aug 2012, pp. 5630–5633.
 [13] V. Barbu and N. Limnios, Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications: Their Use in Reliability and DNA Analysis, 1st ed. Springer Publishing Company, Incorporated, 2008.
 [14] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.
 [15] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 1994.
 [16] M. Sheppard, “Fit all valid parametric probability distributions to data (https://www.mathworks.com/matlabcentral/fileexchange/34943), retrieved january 18, 2017, matlab central file exchange.” 2012.
 [17] G. Schwarz et al., “Estimating the dimension of a model,” The annals of statistics, vol. 6, no. 2, pp. 461–464, 1978.
 [18] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
 [19] S. Kullback, Information theory and statistics. Courier Corporation, 1997.
 [20] T.-T. Wong, “Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation,” Pattern Recognition, vol. 48, no. 9, pp. 2839–2846, 2015.
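A Balanced Misclassification Loss
Accuracy is defined as:
(10) $Acc = \frac{TP + TN}{TP + TN + FP + FN}$
(11) $\quad\;\; = \frac{TP + TN}{P + N}$
(12) $\quad\;\; = \frac{TP}{P} \cdot \frac{P}{P + N} + \frac{TN}{N} \cdot \frac{N}{P + N}$
(13) $\quad\;\; = Sens \cdot \frac{P}{P + N} + Spec \cdot \frac{N}{P + N}$
where $TP$, $TN$, $FP$ and $FN$ are respectively the number of true positives, true negatives, false positives and false negatives; $P$ is the number of positive examples and $N$ the number of negative examples. $Sens$ is sensitivity and $Spec$ is specificity.
Accuracy is therefore a weighted average of sensitivity and specificity, with weights equal to the class proportions. This means that, if the class proportions are 75:25, a classifier which simply predicts the majority class would have a misleading accuracy of 75%. The balanced accuracy measure, $BAcc$, corresponds to the average of the sensitivity and specificity measures:
(14) $BAcc = \frac{Sens + Spec}{2}$
The balanced misclassification loss is then $1 - BAcc$, which agrees with the loss values reported in Table IV.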
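A short numerical check makes the difference between the two measures concrete. The sketch below uses the cohort sizes from Section V (50 failures as positives, 136 successes as negatives) and the sensitivity and specificity of the Pause-only SVM from Table IV; the loss is taken to be one minus the balanced accuracy, which is an assumption about the exact definition, and the function names are illustrative.

```python
def accuracy(sensitivity, specificity, n_pos, n_neg):
    """Accuracy as the class-proportion-weighted mix of sensitivity and specificity."""
    total = n_pos + n_neg
    return sensitivity * n_pos / total + specificity * n_neg / total

def balanced_accuracy(sensitivity, specificity):
    """Unweighted average of sensitivity and specificity."""
    return 0.5 * (sensitivity + specificity)

def balanced_loss(sensitivity, specificity):
    """Assumed definition: one minus the balanced accuracy."""
    return 1.0 - balanced_accuracy(sensitivity, specificity)

# Cohort sizes from Section V: 50 failures (positives), 136 successes (negatives).
N_POS, N_NEG = 50, 136

# A degenerate classifier that always predicts "success".
majority_acc = accuracy(0.0, 1.0, N_POS, N_NEG)
majority_loss = balanced_loss(0.0, 1.0)

# Pause-only SVM row of Table IV (sensitivity 0.84, specificity 0.54).
pause_acc = accuracy(0.84, 0.54, N_POS, N_NEG)
pause_loss = balanced_loss(0.84, 0.54)
```

On these numbers the always-"success" classifier has the higher plain accuracy of the two, yet its balanced loss is 0.5, no better than chance, while the Pause-only model reaches the 0.31 listed in Table IV. This is why model selection was driven by the balanced loss rather than accuracy.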
B Transition Matrices for Fitted Markov chain Model
The Markov chain transition matrices for the success and failure populations are shown in Tables V and VI, respectively. It can be seen that, because dwell times in each state are extremely long compared to the number of cross-state transitions, the diagonal elements (self-transitions) account for nearly all of the probability on each row.
Tables V and VI: Markov chain transition matrices; rows give the current state and columns the next state
From \ To  |  PAU  |  ASB  |  MVT  |  SYB  |  UNK
PAU  |  0.9936  |  0.0019  |  0.0004  |  0.0022  |  0.0018
ASB  |  0.0006  |  0.9953  |  0.0010  |  0.0013  |  0.0018
MVT  |  0.0012  |  0.0029  |  0.9931  |  0.0020  |  0.0007
SYB  |  0.0003  |  0.0005  |  0.0003  |  0.9977  |  0.0012
UNK  |  0.0015  |  0.0031  |  0.0003  |  0.0055  |  0.9895

From \ To  |  PAU  |  ASB  |  MVT  |  SYB  |  UNK
PAU  |  0.9920  |  0.0022  |  0.0007  |  0.0020  |  0.0032
ASB  |  0.0005  |  0.9955  |  0.0007  |  0.0013  |  0.0020
MVT  |  0.0007  |  0.0021  |  0.9934  |  0.0028  |  0.0010
SYB  |  0.0001  |  0.0005  |  0.0003  |  0.9978  |  0.0012
UNK  |  0.0013  |  0.0027  |  0.0004  |  0.0056  |  0.9899
C Dwell/Sojourn Time Distribution Fits
Plots of the probability density functions (PDF) of sojourn times in all states are shown in Fig. 4, together with the best-fitting distributions according to the Bayesian Information Criterion (BIC).
Fig 4: Probability Density Functions (PDF) of Dwell Time Distributions in all 5 respiratory patterns for success and failure patients