Generalization Studies of Neural Network Models for Cardiac Disease Detection Using Limited Channel ECG

Deepta Rajan, David Beymer, Girish Narayan 
IBM Research, San Jose, CA, USA
Abstract

Acceleration of machine learning research in healthcare is challenged by the lack of large, annotated, and balanced datasets. Furthermore, dealing with measurement inaccuracies and exploiting unsupervised data are considered central to improving existing solutions. In particular, a primary objective in predictive modeling is to generalize well both to unseen variations within the observed classes and to unseen classes. In this work, we consider one such challenging problem in machine-learning-driven diagnosis: detecting a gamut of cardiovascular conditions (e.g., infarction and dysrhythmia) from limited channel ECG measurements. Though deep neural networks have achieved unprecedented success in predictive modeling, they rely solely on discriminative models that can generalize poorly to unseen classes. We argue that unsupervised learning can be utilized to construct effective latent spaces that facilitate better generalization. This work extensively compares the generalization of our proposed approach against a state-of-the-art deep learning solution. Our results show significant improvements in F1-scores.


1 Introduction

Longitudinal patient records are central to healthcare, comprising a historical aggregation of diverse information such as diagnostic codes, lab measurements, imaging exams, and text reports. Analyzing sequences of events and episodes is required for diagnosis and treatment planning. Harnessing artificial intelligence tools and technologies, e.g. deep learning, for such analysis could potentially improve the quality of patient care and reduce costs by digitizing healthcare [1]. However, the success of data-driven solutions primarily depends on two aspects: infrastructure to manage big data and deploy solutions at scale, and strong generalization abilities of computational models to produce reliable predictions in new environments. Moreover, leveraging the large volumes of patient data that are routinely collected is challenged both by the availability of expert annotations and by our ability to discern meaningful correlations among the vast number of predictor variables. Further, these datasets are plagued by discrepancies in measurements and imbalances in disease distributions. Consequently, despite community-wide efforts to curate representative datasets [2], we operate in small-data regimes that result in highly biased clinical models.

In this paper, we focus on detecting cardiac abnormalities, such as myocardial infarction, a disease causing over million deaths annually [3], using limited channel ECG. The standard 12-channel ECG is a prevalent diagnostic modality and the primary screening exam for heart ailments, with over million signals recorded every year [4]. However, in certain cases only a subset of these leads can be accessed; examples include inpatient telemetry [5] and ambulatory heart rhythm monitoring [6]. In such cases, the goal is to obtain meaningful clinical conclusions using only the limited measurements [7].

A popular solution for sequence modeling is Recurrent Neural Networks (RNNs) based on Long Short-Term Memory (LSTM) units, which have proven successful in clinical time-series classification [1]. In addition, a recent empirical study [8] found 1-D Convolutional Neural Networks (CNNs) to be a powerful and more effective alternative for modeling sequences. Specifically, 1-D Residual Networks (ResNets) have become a popular solution for rhythm classification [9]. More recently, new learning paradigms such as deep attention models [10] and temporal convolutional networks have been shown to improve performance. Broadly, these methods fall under the category of discriminative modeling, which can be ineffective when dealing with out-of-distribution samples. Recently, Rajan et al. [11] argued that a generative modeling approach can enable inference of latent features that describe complex distributions. In this paper, we build on this idea and propose a novel neural network architecture, referred to as ResNet++, for limited channel ECG classification. Our approach uses an unsupervised generative model to construct latent features, followed by a discriminative model (1-D ResNet [9]) to detect anomalies. The resulting latent feature representations implicitly exploit information from the missing channels and can predict the entire 12-channel ECG from a subset of channels. Results show that the proposed approach provides improved generalization to new diseases and is less sensitive to hyperparameter settings, even with small and imbalanced data.

2 Problem Formulation

In this section, we provide a formal definition of the problem and introduce the notation used in the rest of this paper. In limited channel problems, ECG signals from only to leads are assumed to be available, and the goal is to build predictive models for disease detection. In this paper, we use the inferior myocardial infarction (MI) leads (II, III, aVF) to build the model. From previous studies, it is known that these leads provide adequate information to localize inferior wall ischemia and infarction [12]. However, it is unknown whether models trained on datasets dominated by subjects with inferior MI generalize to other cardiac conditions. Such generalization studies emphasize the importance of choosing an optimal subset of leads that is more broadly applicable in diagnosis. Moreover, when the algorithm is restricted to the inferior leads alone, the disease generalization capacity of even sophisticated models suffers. To this end, we propose a novel approach that assumes access to full channel training data, builds meaningful latent spaces that compensate for the missing channel data, and finally constructs a deep neural network that performs predictions based on these latent features. Note that, in the rest of this paper, the terms leads and channels are used interchangeably. The multi-variate sequence dataset is represented as $\mathcal{X} = \{X_i\}_{i=1}^{N}$ with $X_i \in \mathbb{R}^{T \times C}$, where $N$ denotes the number of training samples, $T$ denotes the number of time-steps in each measurement, and $C$ indicates the total number of channels. Based on the limited channel configuration denoted by the set $\mathcal{L} = \{\text{II}, \text{III}, \text{aVF}\}$, whose cardinality is $|\mathcal{L}| = 3$, we extract the matrix $X_i^{\mathcal{L}} \in \mathbb{R}^{T \times 3}$. In order to perform implicit completion of the missing data, we propose to build a generative model that attempts to recover $X_i$ using $X_i^{\mathcal{L}}$. In this process, it infers a latent space that defines an effective metric for comparing different samples.
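To make the limited-channel setup concrete, the NumPy sketch below extracts the inferior-lead input from a full 12-channel array and keeps the full array as the reconstruction target. It assumes records are stored with shape (N, T, 12) in the conventional lead order I, II, III, aVR, aVL, aVF, V1 to V6; the ordering and the function name `limited_channel_view` are illustrative assumptions, not taken from the paper, and a real loader may order channels differently.

```python
import numpy as np

# Assumed conventional 12-lead ordering; check the actual dataset's channel order.
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
         "V1", "V2", "V3", "V4", "V5", "V6"]
LIMITED = ["II", "III", "aVF"]  # inferior leads used as the limited configuration


def limited_channel_view(X, leads=LEADS, subset=LIMITED):
    """Split a full-channel array into (limited input, full target).

    X: array of shape (N, T, C) with C == len(leads).
    Returns (X_sub, X) where X_sub has shape (N, T, len(subset)).
    """
    X = np.asarray(X)
    if X.shape[-1] != len(leads):
        raise ValueError(f"expected {len(leads)} channels, got {X.shape[-1]}")
    idx = [leads.index(name) for name in subset]  # column indices of the kept leads
    return X[..., idx], X


# Example with random data: 4 records, 1000 time-steps, 12 channels.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 1000, 12))
X_sub, X_full = limited_channel_view(X)
print(X_sub.shape)  # (4, 1000, 3)
```

The generative model in Stage 1 would then be trained to map `X_sub` to `X_full`, so both the observed and the missing channels appear in the target.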

3 Proposed Approach

Stage 1: As shown in Figure LABEL:fig:arch, the first stage of our algorithm employs unsupervised representation learning to predict the complete ECG using only the 3-channel measurements. More specifically, we build an encoder-decoder architecture, commonly referred to as Seq2Seq [13], with an optional attention mechanism. Though originally developed for machine translation, such architectures are applicable to more general sequence-to-sequence transformation tasks. The architecture comprises two RNNs based on Gated Recurrent Units (GRUs), one for the encoder and one for the decoder. The encoder transforms the 3-channel input sequence into a fixed-length vector, either by taking the hidden representation at the last time step or by concatenating the hidden representations from all time steps. The decoder then predicts the output sequence, in our case the complete 12-channel ECG, from the encoder output. Optionally, the decoder can also attend to specific parts of the encoder states through an attention mechanism, which typically uses both the content of the encoder states and the context of the sequence generated so far by the decoder. GRU cells are capable of learning long-term dependencies; each GRU cell comprises the following operations, implemented using fully connected networks:

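$$
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right)\\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right)\\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h\,(r_t \odot h_{t-1}) + b_h\right)\\
h_t &= (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}
\end{aligned}
\qquad (1)
$$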

A GRU has two gates, a reset gate $r_t$ and an update gate $z_t$. The reset gate determines how to combine the current input with the previous memory state, while the update gate defines how much of the previous memory to retain as we proceed to the next step. In the simplest case, setting the reset gate to all 1's and the update gate to all 0's reduces the GRU to a plain RNN. The underlying idea of using gating mechanisms to capture long-term dependencies is the same as in an LSTM; the two differ in the number of gates and in that a GRU has no internal memory cell separate from its hidden state. The generative model is trained with an loss at the decoder output. Note that our architecture attempts to reconstruct the observed channels as well as predict the missing channel measurements.
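To make Eq. (1) and the encoder options described above concrete, the following NumPy sketch runs a single-layer GRU encoder over a 3-channel input, forms both fixed-length summaries (last hidden state, and concatenation of all hidden states), and computes a context vector with dot-product attention. The update convention $h_t = (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}$, the dot-product scoring, the hidden size, and every function name here are assumptions for illustration; the paper does not specify them, and this is not the authors' implementation. The decoder and the training loop are omitted.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_cell(x, h_prev, p):
    """One GRU step (assumed convention); p holds W_* (d_h x d_in), U_* (d_h x d_h), b_* (d_h,)."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])              # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])              # reset gate
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])  # candidate state
    return (1.0 - z) * h_tilde + z * h_prev                               # new hidden state


def init_params(d_in, d_h, rng):
    """Small random weights and zero biases for the three gate/candidate transforms."""
    p = {}
    for g in ("z", "r", "h"):
        p[f"W_{g}"] = 0.1 * rng.standard_normal((d_h, d_in))
        p[f"U_{g}"] = 0.1 * rng.standard_normal((d_h, d_h))
        p[f"b_{g}"] = np.zeros(d_h)
    return p


def encode(X_sub, p, d_h):
    """Run the GRU over a (T, 3) limited-channel sequence; return all states, shape (T, d_h)."""
    h, states = np.zeros(d_h), []
    for x_t in X_sub:
        h = gru_cell(x_t, h, p)
        states.append(h)
    return np.stack(states)


def attend(H, s):
    """Hypothetical dot-product attention: softmax weights over encoder states H given decoder state s."""
    scores = H @ s
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ H, weights              # context vector (d_h,), weights (T,)


rng = np.random.default_rng(0)
d_h = 8
p = init_params(d_in=3, d_h=d_h, rng=rng)
X_sub = rng.standard_normal((50, 3))   # 50 time-steps of leads II, III, aVF
H = encode(X_sub, p, d_h)
last_state = H[-1]                     # fixed-length summary, option 1
all_states = H.reshape(-1)             # concatenated summary, option 2
context, weights = attend(H, rng.standard_normal(d_h))
print(last_state.shape, all_states.shape, context.shape)  # (8,) (400,) (8,)

# Sanity check: biases that force r -> 1 and z -> 0 reduce the GRU to a plain
# tanh RNN, h_t = tanh(W_h x_t + U_h h_{t-1} + b_h).
p_plain = dict(p, b_r=np.full(d_h, 50.0), b_z=np.full(d_h, -50.0))
x_t, h_prev = X_sub[0], H[0]
h_gru = gru_cell(x_t, h_prev, p_plain)
h_rnn = np.tanh(p["W_h"] @ x_t + p["U_h"] @ h_prev + p["b_h"])
print(np.allclose(h_gru, h_rnn))  # True
```

The final check mirrors the statement that a GRU with its reset gate saturated at 1 and its update gate at 0 behaves like a plain RNN. Note that the concatenated summary grows with sequence length, whereas the last-state summary has a fixed size regardless of $T$.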

Stage 2: We now design a classifier stage that exploits the latent space of the generative model trained for missing channel prediction. Compared to purely discriminative models, this approach utilizes additional channel information available during training and builds a more effective metric over the whole data space, instead of only discriminating between the normal and abnormal classes. Furthermore, since the first stage is unsupervised, even unlabeled data can be used to construct a more robust latent space. The classifier is a 1-D ResNet architecture with convolution, ReLU, batch normalization, and dropout layers, as illustrated in Figure LABEL:fig:arch.
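The classifier is described only in terms of its layer types. As an illustration, the NumPy sketch below implements one pre-activation 1-D residual block: two rounds of batch normalization, ReLU, dropout, and "same"-padded convolution, followed by an identity shortcut. The layer order, kernel size, dropout rate, input shape, and function names are assumptions, loosely in the spirit of the 1-D ResNet of [9]; a practical model would be built and trained in a deep-learning framework rather than in raw NumPy.

```python
import numpy as np


def conv1d_same(x, w):
    """'Same'-padded 1-D convolution (cross-correlation, as in deep-learning libraries).

    x: (B, C_in, T); w: (C_out, C_in, K) with odd K. Returns (B, C_out, T).
    """
    k = w.shape[-1]
    pad = k // 2
    T = x.shape[-1]
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)))
    # windows[b, i, t, j] = xp[b, i, t + j]
    windows = np.stack([xp[:, :, j:j + T] for j in range(k)], axis=-1)
    return np.einsum("oik,bitk->bot", w, windows)


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel with the current batch statistics (training-mode BN)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[None, :, None] * x_hat + beta[None, :, None]


def dropout(x, rate, rng, training):
    """Inverted dropout: zero units with probability `rate`, rescale the survivors."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def residual_block(x, params, rate=0.2, rng=None, training=False):
    """Two (BN -> ReLU -> Dropout -> Conv) layers plus an identity shortcut.

    params: list of (w, gamma, beta) tuples; the channel count stays constant
    so the shortcut is a plain addition.
    """
    out = x
    for w, gamma, beta in params:
        out = batch_norm(out, gamma, beta)
        out = np.maximum(out, 0.0)  # ReLU
        out = dropout(out, rate, rng, training)
        out = conv1d_same(out, w)
    return out + x


rng = np.random.default_rng(0)
B, C, T, K = 2, 8, 100, 15
x = rng.standard_normal((B, C, T))  # hypothetical Stage 1 latent sequences
params = [(0.05 * rng.standard_normal((C, C, K)), np.ones(C), np.zeros(C))
          for _ in range(2)]
y = residual_block(x, params, rng=rng, training=True)
print(y.shape)  # (2, 8, 100)
```

With all convolution weights set to zero the block returns its input unchanged, which illustrates why a stack of such blocks can start close to the identity mapping. A complete classifier would stack several blocks and end with pooling and a softmax layer over the diagnostic classes.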

4 Experiment Setup

5 Results

  • [1] Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Doctor AI: Predicting clinical events via recurrent neural networks. In Machine Learning for Healthcare Conference. 2016; 301–318.
  • [2] Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000 (June 13);101(23):e215–e220. Circulation Electronic Pages: http://circ.ahajournals.org/content/101/23/e215.full PMID:1085218; doi: 10.1161/01.CIR.101.23.e215.
  • [3] Ansari S, Farzaneh N, Duda M, Horan K, Andersson HB, Goldberger ZD, Nallamothu BK, Najarian K. A review of automated methods for detection of myocardial ischemia and infarction using electrocardiogram and electronic health records. IEEE Reviews in Biomedical Engineering 2017;10:264–298.
  • [4] Hedén B, Ohlin H, Rittner R, Edenbrandt L. Acute myocardial infarction detected in the 12-lead ECG by artificial neural networks. Circulation 1997;96(6):1798–1802.
  • [5] Sandau KE, Funk M, Auerbach A, Barsness GW, Blum K, Cvach M, Lampert R, May JL, McDaniel GM, Perez MV, et al. Update to practice standards for electrocardiographic monitoring in hospital settings: a scientific statement from the American Heart Association. Circulation 2017;CIR–0000000000000527.
  • [6] Kennedy HL. The evolution of ambulatory ECG monitoring. Progress in Cardiovascular Diseases 2013;56(2):127–132.
  • [7] Atoui H, Fayn J, Rubel P. A novel neural-network model for deriving standard 12-lead ECGs from serial three-lead ECGs: application to self-care. IEEE Transactions on Information Technology in Biomedicine 2010;14(3):883–890.
  • [8] Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 2018.
  • [9] Rajpurkar P, Hannun AY, Haghpanahi M, Bourn C, Ng AY. Cardiologist-level arrhythmia detection with convolutional neural networks. arXiv preprint arXiv:1707.01836 2017.
  • [10] Song H, Rajan D, Thiagarajan JJ, Spanias A. Attend and Diagnose: Clinical time series analysis using attention models. In Proceedings of AAAI 2018.
  • [11] Rajan D, Thiagarajan JJ. A generative modeling approach to limited channel ECG classification. arXiv preprint arXiv:1802.06458 2018.
  • [12] Dubin D. Rapid Interpretation of EKG's, volume 200. Cover Publishing Company, Tampa (FL), 1996.
  • [13] Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. 2014; 3104–3112.
  • [14] Bousseljot R, Kreiseler D, Schnabel A. Nutzung der EKG-Signaldatenbank CARDIODAT der PTB über das Internet [Use of the PTB ECG signal database CARDIODAT via the Internet]. Biomedizinische Technik/Biomedical Engineering 1995;40(s1):317–318.
  • [15] Reasat T, Shahnaz C. Detection of inferior myocardial infarction using shallow convolutional neural networks. arXiv preprint arXiv:1710.01115 2017.

Address for correspondence: Deepta Rajan - 650, Harry Road, San Jose, CA, USA - 95120