Unsupervised Prediction of Negative Health Events Ahead of Time
The emergence of continuous health monitoring and the availability of enormous amounts of time series data have provided a great opportunity for the advancement of personal health tracking. In recent years, unsupervised learning methods have drawn special attention from researchers as a way to tackle the sparse annotation of health data, and real-time detection of anomalies has been a central problem of interest. However, one problem that has not been well addressed is the early prediction of forthcoming negative health events. Early signs of an event can introduce subtle and gradual changes in the health signal prior to its onset, and detecting them can be invaluable for effective prevention. In this study, we first present our observations on the shortcomings of widely adopted anomaly detection methods in uncovering the changes prior to a negative health event. We then propose a framework that relies on online clustering of signal segment representations, which are automatically learned by a specially designed LSTM auto-encoder. We show the effectiveness of our approach by predicting Bradycardia events in infants using the MIT-PICS dataset 1.3 minutes ahead of time with a 68% AUC score on average, using no label supervision. The results of our study suggest the viability of our approach for the early detection of health events in other applications as well.
The increasing prevalence of continuous health monitoring through bedside care and wireless health sensors offers great potential for gaining insights into the health state of individuals. However, the majority of the rich collected data remains unlabeled, mainly due to the uncontrolled, real-time collection setting and the tediousness of offline labeling by domain experts. Therefore, conventional supervised data analysis models, which rely heavily on annotations, are at a disadvantage.
Unsupervised deep representation learning models, such as auto-encoders, have recently gained considerable attention for learning informative representations of data, including images and text. When it comes to time series analysis, one important application of these models has been anomaly detection, which primarily focuses on recognizing an abrupt change in the normal behavior of a time series. Such a change has been shown to increase the reconstruction error of a model that is trained to generate the normal signal, as the model cannot reconstruct anomalous data points accurately. The success of deep anomaly detection in time series has recently been extended to health care, especially in the analysis of ECG signals.
In spite of advances in real-time detection of anomalies, little attention has been paid to the early prediction of forthcoming negative health events, which may be possible through unsupervised analysis of health signals in the intervals before event onset. Physiological and environmental changes detected through health sensors can be an early sign of the onset of a negative health event in the near future (Fig. 1), and a number of studies have already validated this hypothesis in applications such as the prediction of asthma attacks in children and Bradycardia in infants.
In this study, we propose an unsupervised approach based on deep sequential auto-encoders and online clustering of the internal representations to address the aforementioned problem. We use the recently released PICS dataset to predict the onset of Bradycardia heart events in infants. We show that the widely adopted anomaly detection methods that rely on an increase in reconstruction error perform poorly at distinguishing the more subtle and complex changes of signal behavior in pre-event episodes. Instead, we analyze the clusters formed by the representations of signal segments from an auto-encoder using Denstream, an online and noise-tolerant clustering method. In the design of the auto-encoder, we employ an LSTM encoder-decoder based architecture alongside a wavelet transform of the signal to capture temporal features in the time and frequency domains. Furthermore, we use unit-ball regularization of the learned representations to improve the results of our clustering phase.
In short, to the best of our knowledge, our study is the first to address the problem of future negative event prediction using unsupervised models and to propose a framework whose prediction capability is validated on the early detection of Bradycardia events in infants.
II Related Work
With the rise of unsupervised deep learning models, especially auto-encoders, and their strong performance in other domains such as image recognition, their application has recently emerged in wireless health for the detection of anomalies in health signals such as ECG. Studies such as [5, 13] employed auto-encoders on ECG to distinguish anomalous parts from healthy ones. To this end, the reconstruction error of an auto-encoder trained on normal data is tracked to find sudden jumps, motivated by the idea that such a model cannot reconstruct anomalous intervals of data accurately. Auto-encoders have successfully replaced prior approaches such as classifiers, which require large annotated datasets, as well as statistical clustering models and future value predictor models, neither of which generalizes easily to other applications.
LSTM auto-encoders were later introduced for learning representations of videos and improved feature extraction by capturing temporal features of the signal. They were subsequently used in time series analysis as well. Moreover, two recent studies have shown improved performance of auto-encoders in more complex anomaly detection settings by using the encoded representations in offline clustering of anomalies or in detecting signal change points by comparing the representations of neighboring segments. Although these studies pursue different goals, we employ their findings in building our model.
Prediction of Bradycardia in infants using the PICS dataset was previously approached by the publishers of the dataset with statistical methods. Specifically, they used a point process analysis and tried to capture the differences in the variance and mean of signal segments before a Bradycardia event. Although this study demonstrates feasibility and achieves reasonable accuracy, the approach is supervised, hand-engineered, and relies heavily on observing multiple onsets of Bradycardia events in each infant, which is not always possible in a real-world setting. In contrast, our approach focuses on the straightforward collection of normal signals from individuals and the detection of changes in an unsupervised and automatic manner.
Learning an unsupervised representation of health signals can be used to distinguish intervals of data that may lead to a negative event. In this section, we review the representation learning and online clustering approaches used for this aim.
III-A Sequential Representation Learning
In time series data, temporal features carry important information. Therefore, we employ a variant of the LSTM encoder-decoder architecture, similar to that of prior work, to encode signal segments into informative representations.
Given a time series segmented into fixed or variable length windows denoted as $X = \{x_1, x_2, \dots, x_N\}$, where each $x_i$ is itself a list of readings of length $T$:
$$x_i = [x_i^{1}, x_i^{2}, \dots, x_i^{T}]$$
The model first embeds $x_i$ into a fixed length representation by feeding it into an encoder module, an LSTM based recurrent neural network (RNN) with $T$ cells. The hidden state of the last ($T$-th) cell can be considered a compact and informative representation of $x_i$, which we call $z_i$. To make these representations more suitable for comparison using distance-based metrics and to remove the impact of representation length, we apply $L_2$ normalization to $z_i$ in the training phase, before feeding it to the decoder, as suggested in prior work.
The decoder module, which tries to reconstruct window $x_i$ from $z_i$, is also an LSTM RNN with a linear layer on the output gate. It uses $z_i$ as the initial state of the first cell. The output of each cell $t$ in the decoder is used as the input to the next cell and also represents the prediction of one reading in $x_i$. Following the findings of prior studies on improved optimization of encoder-decoder architectures, we predict each window in reverse order. Fig. 1 depicts the design of the encoder and decoder modules.
Considering the decoder module as a function $g$, $\hat{x}_i = g(z_i)$ denotes the output of the model for window $x_i$. When the reconstruction error between $\hat{x}_i$ and the input $x_i$ is used as the objective function and both modules are jointly trained on normal intervals of data, the model learns to embed representative features of a normal input window into $z_i$. Therefore, the objective function can be written as:
$$\mathcal{L} = \sum_{i} \| x_i - \hat{x}_i \|_2^2 + \lambda \| W \|_2^2$$
where the last term denotes normalization of the linear layer weights $W$.
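The two ingredients described above, unit-norm representations and a reconstruction objective with a weight term, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes a squared-error reconstruction term and an L2 penalty on the linear layer weights, and the names `l2_normalize`, `reconstruction_objective`, and the coefficient `weight_penalty` are hypothetical. The LSTM encoder and decoder themselves are omitted.

```python
import math


def l2_normalize(z, eps=1e-12):
    """Scale a representation vector so that its L2 norm is 1 (unit ball)."""
    norm = math.sqrt(sum(v * v for v in z))
    return [v / max(norm, eps) for v in z]


def reconstruction_objective(window, decoded_reversed, linear_weights,
                             weight_penalty=1e-4):
    """Squared reconstruction error plus an L2 penalty on linear-layer weights.

    `decoded_reversed` is the decoder output. Because the decoder predicts the
    window in reverse order, it is compared against the reversed input.
    `weight_penalty` is an assumed coefficient, not a value from the paper.
    """
    target = list(reversed(window))
    if len(target) != len(decoded_reversed):
        raise ValueError("decoder output and window must have equal length")
    error = sum((t - d) ** 2 for t, d in zip(target, decoded_reversed))
    penalty = weight_penalty * sum(w * w for w in linear_weights)
    return error + penalty
```

Normalizing before decoding means that distances between representations depend only on their direction, which is what a distance-based clustering step consumes.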
III-B Online Clustering
As discussed, when the auto-encoder is trained to reconstruct normal windows of a time series, the encoder module, in turn, learns to extract representative features of a normal window. We hypothesize that deviations from the norm in the signal will be reflected in these features. It is important to note that the representations of normal windows can come from multiple clusters, and recognizing these clusters is important for the detection of an abnormal window. In addition, noisy abnormal deviations should be ignored. To reach these goals, we employ Denstream.
Denstream, an online and noise-tolerant clustering approach, groups nearby data points into core micro-clusters and tackles noise in the data by marking groups that do not reach a density threshold as outliers. The actual clusters of data points (which we use) are formed by connecting neighboring core micro-clusters at each point in time.
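The micro-cluster idea can be illustrated with a heavily simplified sketch. It is not the full Denstream algorithm: it keeps a single list of micro-clusters, applies no fading over time, and skips both pruning and the connection of neighboring micro-clusters into final clusters. The names `MicroCluster`, `assign`, and `is_core`, and the default values, are assumptions made for the example.

```python
import math


class MicroCluster:
    """Summary statistics of a group of nearby points (no fading applied)."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)
        self.squared_sum = [v * v for v in point]

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def radius_if_added(self, point):
        """Root-mean-square deviation from the center after adding `point`."""
        n = self.n + 1
        variance = 0.0
        for s, q, v in zip(self.linear_sum, self.squared_sum, point):
            mean = (s + v) / n
            variance += (q + v * v) / n - mean * mean
        return math.sqrt(max(variance, 0.0))

    def add(self, point):
        self.n += 1
        for d, v in enumerate(point):
            self.linear_sum[d] += v
            self.squared_sum[d] += v * v


def assign(point, micro_clusters, epsilon=0.01):
    """Absorb `point` into the nearest micro-cluster if its radius stays
    within `epsilon`; otherwise start a new micro-cluster. Returns the index."""
    best, best_dist = None, float("inf")
    for idx, mc in enumerate(micro_clusters):
        dist = math.dist(point, mc.center())
        if dist < best_dist:
            best, best_dist = idx, dist
    if best is not None and micro_clusters[best].radius_if_added(point) <= epsilon:
        micro_clusters[best].add(point)
        return best
    micro_clusters.append(MicroCluster(point))
    return len(micro_clusters) - 1


def is_core(micro_cluster, min_weight=2):
    """A micro-cluster below the density threshold is treated as an outlier."""
    return micro_cluster.n >= min_weight
```

A point that lands far from every existing group starts its own micro-cluster, which stays an outlier until enough neighbors arrive. This is how isolated noisy windows are kept from being reported as clusters.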
Having a trained encoder module, we feed the representations of the training windows to Denstream to extract the main clusters of normal windows, which we call normal clusters. At test time, as the representations of incoming signal windows are extracted and fed into Denstream in real time, an increased appearance of clusters other than the normal clusters (abnormal clusters) within a short time is considered an abnormality and a possible event onset. It is worth mentioning that, because Denstream removes sparse outlier windows, the abnormal clusters detected are dense enough to indicate a real change in the signal. In particular, we consider a fixed number of the most recently received windows (calling them "confidence windows") and keep track of the fraction of these windows that join an abnormal cluster. A threshold on this score, 50% in this study, is used to generate an alarm for an event onset.
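The alarm rule over confidence windows can be sketched as a small stateful helper. The class name `ConfidenceWindowAlarm` is hypothetical, and two details are assumptions rather than statements from the text: the score is compared with "greater than or equal to" the threshold, and no alarm is raised until the confidence window is full.

```python
from collections import deque


class ConfidenceWindowAlarm:
    """Raise an alarm when enough recent windows fall in abnormal clusters."""

    def __init__(self, normal_clusters, size=5, threshold=0.5):
        self.normal_clusters = set(normal_clusters)
        self.recent = deque(maxlen=size)  # True marks an abnormal window
        self.threshold = threshold

    def update(self, cluster_id):
        """Record the cluster of the newest window; return True to alarm."""
        self.recent.append(cluster_id not in self.normal_clusters)
        if len(self.recent) < self.recent.maxlen:
            return False  # assumption: wait until the window is full
        score = sum(self.recent) / len(self.recent)
        return score >= self.threshold
```

For example, with normal clusters `{0, 1}` and a window size of 4, a single stray abnormal window yields a score of 0.25 and no alarm, while two abnormal windows among the last four reach the 50% threshold.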
In this section, we review the dataset used, the details of pre-processing, the experiment setup, and finally our results.
IV-A Dataset and Data Pre-processing
The Preterm Infant Cardio-Respiratory Signals Database (PICS) contains 20 to 70 hours of ECG recordings of 10 infants with multiple onsets of Bradycardia episodes, in which the infant's heart rate stays below 100 bpm for at least two beats.
In this study, we employ the heart rate variability signal, generated by extracting the time differences between consecutive R-peaks in the ECG. Furthermore, we take the Morlet Continuous Wavelet Transform (CWT) of this signal in the low-frequency band (0.01-0.15 Hz) to train our models. Both techniques have been shown to be powerful tools in the detection of heart events. The pre-processed signal is then segmented into 64-heartbeat windows and fed to the model. The hyper-parameters of Denstream, $\epsilon$, min-neighbor, and decay, are set to 0.01, 2, and 0, respectively. The first third of the data for each infant is used for model training and the rest for evaluation. Furthermore, to select only normal intervals of data for the training phase, a margin of 3 minutes before and 6 minutes after each Bradycardia event is excluded.
We evaluate our model by scoring the rate of true-positive alarms (recall), the rate of true-negative decisions (specificity), AUC, and the earliest prediction time of true-positive alarms. An alarm is considered a true positive if a negative event happens within the following 3 minutes. Moreover, the 6 minutes after each onset are disregarded in the evaluation to ensure that the effects of the last event have passed. Therefore, a true negative occurs when no alarm is generated in the interval from 6 minutes after one event to 3 minutes before the next. These time ranges are borrowed from the initial study on this database.
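The segmentation and training-margin steps can be sketched as follows. The wavelet transform is left out, and the function names, the use of non-overlapping windows, and the representation of times in seconds are assumptions made for illustration.

```python
def rr_intervals(r_peak_times):
    """Time differences (in seconds) between consecutive R-peaks."""
    return [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]


def segment(signal, window_len=64):
    """Split a signal into non-overlapping windows of `window_len` beats,
    dropping an incomplete tail (assumption: windows do not overlap)."""
    count = len(signal) // window_len
    return [signal[i * window_len:(i + 1) * window_len] for i in range(count)]


def is_normal(start, end, event_onsets, before=180.0, after=360.0):
    """True if the window [start, end] lies outside the excluded margin of
    3 minutes before and 6 minutes after every event onset."""
    return all(end < onset - before or start > onset + after
               for onset in event_onsets)
```

Only windows for which `is_normal` holds would be passed to the auto-encoder during training, so that the model sees neither pre-event nor post-event segments.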
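A minimal sketch of these evaluation rules is given below, with all times in seconds. It assumes that each interval between two consecutive events (after removing the 6-minute and 3-minute margins) counts as one negative decision, which the text does not state explicitly. The function names are hypothetical.

```python
def is_true_positive(alarm_time, event_onsets, horizon=180.0):
    """An alarm is a true positive if an event starts within `horizon`."""
    return any(0.0 <= onset - alarm_time <= horizon for onset in event_onsets)


def recall_and_lead_time(alarm_times, event_onsets, horizon=180.0):
    """Fraction of events preceded by an alarm within `horizon`, and the mean
    of the earliest lead time over the events that were predicted."""
    leads = []
    for onset in event_onsets:
        early = [onset - a for a in alarm_times if 0.0 <= onset - a <= horizon]
        if early:
            leads.append(max(early))  # earliest alarm gives the longest lead
    recall = len(leads) / len(event_onsets) if event_onsets else 0.0
    mean_lead = sum(leads) / len(leads) if leads else 0.0
    return recall, mean_lead


def specificity(alarm_times, event_onsets, before=180.0, after=360.0):
    """Fraction of inter-event normal intervals that contain no alarm."""
    onsets = sorted(event_onsets)
    quiet, total = 0, 0
    for prev, nxt in zip(onsets, onsets[1:]):
        lo, hi = prev + after, nxt - before
        if lo >= hi:
            continue  # events too close: no normal interval between them
        total += 1
        if not any(lo <= a < hi for a in alarm_times):
            quiet += 1
    return quiet / total if total else 0.0
```

With events at 1000 s, 3000 s, and 5000 s and alarms at 900 s, 2000 s, and 4950 s, two of the three events are predicted and one of the two normal intervals contains a false alarm.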
Because tracking reconstruction error is a common approach in anomaly detection, we report the results of evaluating its performance in the prediction of future negative events. As can be seen in Fig. 3, reconstruction error performs well in capturing visible and sharp changes in the signal. However, as discussed in prior work and observable here, the variability in heart rate, which translates into similar variability in the reconstruction error, does not show a simple pattern before a Bradycardia onset. Therefore, although reconstruction error achieves good performance in the unsupervised detection of sudden changes, it can perform poorly in the prediction of forthcoming events, mainly due to the more complex nature of this task. This experiment supports our approach of employing deeper features of the signal to uncover the hidden changes before a negative event.
We next evaluate our proposed model qualitatively and quantitatively. Fig. 4 shows a qualitative view of the online clustering results before three event onsets of infant 7. Each cluster is depicted by a unique color and level on the y-axis, and those appearing in the 3-minute time span before a negative event are shown with a cross mark. We can observe the trend of change in clusters as we move in time (on the x-axis) and process new incoming windows. As the figure suggests, the two blue clusters that appear most of the time are related to the normal behavior of the data. More importantly, we can observe the sudden appearance of numerous abnormal clusters in the 3-minute time span before each event, which is a strong sign of an onset. Furthermore, abnormal clusters appear far more sparsely in normal intervals. The confidence windows introduced in Section III-B help in tuning the sensitivity of our model to these appearances.
The next experiment, depicted in Fig. 5, analyzes the impact of the confidence window size on the performance of our model. In general, as we increase the window size, the earliest time to prediction and the recall decrease while the specificity increases, meaning that false alarms decrease at the cost of missing some events. On closer inspection, performance is stable over an intermediate range of window sizes. This is because a sufficiently large confidence window ensures that a single appearance of an abnormal window does not generate an alarm, while still corresponding to around 2 minutes before an event, where the main changes happen. We can also observe that AUC is fairly stable, as this metric does not depend on our cut-off threshold (50% abnormal observations in a confidence window) and mainly measures how well our model can assign distinguishable scores to positive and negative labels.
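The threshold-independence of AUC can be seen from its rank-based definition: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function name `auc` and the convention of counting ties as one half are choices made for the example, and the scores stand in for the fraction of abnormal windows in a confidence window.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    `labels` holds 1 for positive (pre-event) and 0 for negative examples.
    Ties between a positive and a negative score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both positive and negative examples are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Because only the ordering of the scores matters, moving the 50% alarm threshold changes recall and specificity but leaves this value unchanged.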
Table 5 contains the AUC score and the earliest time of prediction of our model with a confidence window of 5 for all infants. The achieved results are competitive with those of the prior study (mean AUC of 0.79), considering the clear advantages of our unsupervised approach in healthcare settings, where labels are generally missing.
In this study, we approached the problem of early negative health event prediction. We first demonstrated the poor performance of common anomaly detection models in addressing this problem and then proposed an unsupervised framework using LSTM auto-encoders and Denstream online clustering. We evaluated the performance of our model qualitatively and quantitatively and validated its capability to predict Bradycardia events in infants using the MIT-PICS dataset, achieving an average AUC score of 68% and an early prediction time of 1.3 minutes.
[1] P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” in Proceedings of ICML workshop on unsupervised and transfer learning, 2012, pp. 37–49.
[2] L. Deng, “Three classes of deep learning architectures and their applications: a tutorial survey,” APSIPA transactions on signal and information processing, 2012.
[3] J. Li, X. Chen, E. Hovy, and D. Jurafsky, “Visualizing and understanding neural models in nlp,” arXiv preprint arXiv:1506.01066, 2015.
[4] P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, “Long short term memory networks for anomaly detection in time series,” in Proceedings. Presses universitaires de Louvain, 2015, p. 89.
[5] P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and G. Shroff, “Lstm-based encoder-decoder for multi-sensor anomaly detection,” arXiv preprint arXiv:1607.00148, 2016.
[6] A. Hosseini, C. M. Buonocore, S. Hashemzadeh, H. Hojaiji, H. Kalantarian, C. Sideris, A. A. Bui, C. E. King, and M. Sarrafzadeh, “Feasibility of a secure wireless sensing smartwatch application for the self-management of pediatric asthma,” Sensors, vol. 17, no. 8, p. 1780, 2017.
[7] A. H. Gee, R. Barbieri, D. Paydarfar, and P. Indic, “Predicting bradycardia in preterm infants using point process analysis of heart rate,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 9, 2017.
[8] A. L. Goldberger et al., “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
[9] F. Cao, M. Ester, W. Qian, and A. Zhou, “Density-based clustering over an evolving data stream with noise,” in Proceedings of the 2006 SIAM international conference on data mining. SIAM, 2006, pp. 328–339.
[10] N. Srivastava, E. Mansimov, and R. Salakhudinov, “Unsupervised learning of video representations using lstms,” in International conference on machine learning, 2015, pp. 843–852.
[11] C. Torrence and G. P. Compo, “A practical guide to wavelet analysis,” Bulletin of the American Meteorological society, vol. 79, no. 1, 1998.
[12] C. Aytekin, X. Ni, F. Cricri, and E. Aksu, “Clustering and unsupervised anomaly detection with l2 normalized deep auto-encoder representations,” arXiv preprint arXiv:1802.00187, 2018.
[13] P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, “Long short term memory networks for anomaly detection in time series,” in Proceedings. Presses universitaires de Louvain, 2015, p. 89.
[14] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM computing surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
[15] M. Xie, S. Han, B. Tian, and S. Parvin, “Anomaly detection in wireless sensor networks: A survey,” Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011.
[16] S. A. Haque, M. Rahman, and S. M. Aziz, “Sensor anomaly detection in wireless sensor networks for healthcare,” Sensors, vol. 15, no. 4, pp. 8764–8786, 2015.
[17] W.-H. Lee, J. Ortiz, B. Ko, and R. Lee, “Time series segmentation through automatic feature learning,” arXiv preprint arXiv:1801.05394, 2018.
[18] A. Fagundes, L. Fagundes, and E. Lo Schiavo, “Wavelet concepts for electrocardiographic signal analysis,” Clinical and Experimental Medical Letters, vol. 54, pp. 169–178, 2013.
[19] M. Sakurada and T. Yairi, “Anomaly detection using autoencoders with nonlinear dimensionality reduction,” in Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. ACM, 2014, p. 4.