Heartbeat Anomaly Detection using Adversarial Oversampling


Jefferson L. P. Lima
Centro de Informática
Universidade Federal de Pernambuco
50.740-560, Recife, PE, Brazil

David Macêdo
Centro de Informática
Universidade Federal de Pernambuco
50.740-560, Recife, PE, Brazil

Cleber Zanchettin
Centro de Informática
Universidade Federal de Pernambuco
50.740-560, Recife, PE, Brazil

Cardiovascular diseases are among the most common causes of death in the world. Prevention, knowledge of previous cases in the family, and early detection are the best strategies to reduce their impact. Different machine learning approaches to automatic diagnosis have been proposed for this task. As in most health problems, class imbalance is predominant in this problem and affects the performance of automated solutions. In this paper, we address the classification of heartbeat images into different cardiovascular diseases. We propose a two-dimensional Convolutional Neural Network for classification after using an InfoGAN architecture to generate synthetic images for the underrepresented classes. We call this proposal Adversarial Oversampling and compare it with classical oversampling methods such as SMOTE, ADASYN, and RandomOversampler. The results show that the proposed approach improves the classifier performance for the minority classes without harming the performance in the balanced classes.

I Introduction

Cardiovascular diseases demand attention since they are among the leading causes of death in the world. Cardiac arrhythmia consists of disturbances that alter the heart rate and constitutes a severe public health problem. Electrocardiography (ECG) is the process of recording the electrical activity of the heart over a period of time using electrodes. ECG can assist in diagnosing cardiac arrhythmia, as it is a non-invasive method of detecting abnormal heartbeat rhythms.

Cardiologists often detect arrhythmias manually from the ECG because of the high error rates of computerized approaches. To detect arrhythmias from the ECG, a machine learning algorithm must recognize the distinct arrhythmia wave types and their distinct forms. Detecting arrhythmias is a difficult task due to the variability in wave morphology between patients as well as the presence of noise. The considerable intra-class variation makes the challenge even harder.

Most current approaches to arrhythmia classification rely on feature extraction procedures. However, these approaches require prior and specific knowledge about the characteristics that are essential to represent a specific type of heartbeat. A possible strategy is to take abnormal beats as crucial points and analyze the recurrence of the anomalies along the ECG segment. To overcome the previously mentioned limitations, and considering the difficulty and importance of capturing spatial information in ECG analysis, Convolutional Neural Networks (CNN) have been used to tackle this problem in several studies [1] [2] [3] [4].

However, one problem commonly encountered in health-related datasets is a high degree of imbalance. Often, the positive class (representing the disease) has far fewer samples. Many learning systems are not designed to deal with unbalanced data.

Using this type of dataset to train models such as a neural network usually yields high accuracy for data belonging to the majority class, but unacceptable accuracy for minority class data. This typically happens because the model ends up specializing in the classes with the largest number of samples, since the classes with few samples are presented fewer times during the weight updates.

Many techniques try to increase the number of minority class samples. The simplest way is to repeat samples randomly. Other techniques use some criterion to select the samples, such as how difficult they are for the classification model. Some more elaborate techniques generate extra samples that try to follow the original distribution of the data; however, much class noise is inserted into the dataset, since the samples may follow an incorrect assumption about that distribution.

To solve this problem, in this paper, we propose a generative adversarial model architecture using InfoGAN to learn the data distributions and generate synthetic samples of ECG beats. The idea is to balance the dataset with synthetic data generated by an InfoGAN. We call this method Adversarial Oversampling. Moreover, we compare our proposed method with three traditional oversampling methods: RandomOversampler, SMOTE, and ADASYN.

The MIT-BIH arrhythmia database was used to perform experiments in order to verify if a CNN training using the proposed method outperforms the ones trained using standard oversampling approaches.

Fig. 1: Pipeline of the proposed model: Pre-processing (converts signals into 2D images of single beats); InfoGAN (generates synthetic samples to balance the dataset); CNN (classifies the database balanced).

Section II describes the background and related work. Section III presents the proposed method. The experiments and results are reported in Section IV and Section V, respectively. Section VI presents final remarks and future work.

II Background

This section describes the subjects that served as a basis for developing this work. First, we present the standard oversampling methods used in this paper. After that, we describe Generative Adversarial Networks (GAN) in generic terms.

II-A Oversampling Methods

The present work uses synthetic samples generated by InfoGAN to balance a dataset. To investigate the effectiveness of this approach, we compare it with three classical methods for data balancing: RandomOversampler, SMOTE, and ADASYN. These methods are widely used in the literature to deal with unbalanced datasets [5] [6] [7] [8] [9] [10]. All methods are available in the Imblearn library [11].

II-A1 Random OverSampler

This is the most straightforward approach to oversampling. The method chooses samples from the minority class randomly and with replacement, and adds these copies to the dataset.
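As an illustration of random oversampling, the following is a minimal sketch and not the Imblearn implementation. The function name `random_oversample`, the `target_count` parameter, and the toy data are assumptions made only for this example.

```python
import random

def random_oversample(samples, labels, minority_label, target_count, seed=0):
    """Duplicate randomly chosen minority samples (with replacement) until the
    minority class reaches target_count samples. Hypothetical helper."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples with the requested label")
    n_extra = max(0, target_count - len(minority))
    # Every extra sample is an exact copy of an existing minority sample.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(samples) + extra, list(labels) + [minority_label] * n_extra

# Toy data: three majority samples and one minority sample.
X = [[0.1], [0.2], [0.3], [0.9]]
y = ["N", "N", "N", "VEB"]
X_bal, y_bal = random_oversample(X, y, "VEB", target_count=3)
```

Because the added samples are exact copies, the method balances the class counts without adding any new information about the minority class.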

II-A2 SMOTE: Synthetic Minority Over-sampling Technique

To oversample with SMOTE [12], a sample is taken from the minority class and its k nearest neighbors (in feature space) are considered. To create a synthetic data point, the vector between one of those k neighbors and the current data point is multiplied by a random number between 0 and 1 and added to the current data point. In other words, SMOTE synthesizes new instances between existing (real) minority instances: it draws lines between existing minority samples and then generates new, synthetic minority instances somewhere on these lines.
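The interpolation step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Imblearn implementation: the function name `smote_sample`, the brute-force neighbor search, and the toy points are hypothetical.

```python
import math
import random

def smote_sample(minority, k=2, n_new=5, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Brute-force k nearest neighbors of x among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sample(minority, k=2, n_new=5)
```

Every synthetic point is a convex combination of two real minority samples, so it stays inside the region spanned by the minority class.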

II-A3 ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning

The basic idea behind ADASYN [13] is to consider how difficult each minority class example is to learn and to use a weighted distribution over the minority examples accordingly. The method generates more synthetic data for minority class examples that are harder to learn than for those that are easier to learn, which forces the learning algorithm to focus on the regions that are difficult to learn. ADASYN uses a density distribution as the criterion for automatically deciding the number of synthetic samples to be generated for each example of the minority class. So, for each example $x_i$ of the minority class, $g_i$ synthetic examples are generated.
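A rough sketch of how ADASYN decides the number of synthetic samples per minority example, following the description in [13]: the difficulty of an example is the fraction of majority samples among its k nearest neighbors, and the normalized difficulties are scaled by the total number of samples to generate. The function name `adasyn_allocation`, the parameter names, and the toy points are assumptions for illustration only.

```python
import math

def adasyn_allocation(minority, majority, k=3, beta=1.0):
    """Return the number of synthetic samples to create for each minority point."""
    # Total number of synthetic samples; beta=1.0 targets a fully balanced set.
    total = (len(majority) - len(minority)) * beta
    points = [(p, False) for p in minority] + [(p, True) for p in majority]
    ratios = []
    for i, x in enumerate(minority):
        # minority[i] is points[i], so index i is the point itself.
        others = [q for j, q in enumerate(points) if j != i]
        others.sort(key=lambda q: math.dist(x, q[0]))
        # Difficulty: fraction of majority samples among the k nearest neighbors.
        ratios.append(sum(is_major for _, is_major in others[:k]) / k)
    norm = sum(ratios)
    if norm == 0:
        return [0] * len(minority)
    # Rounding means the counts may not sum exactly to total.
    return [round(r / norm * total) for r in ratios]

# Three minority points far from the majority, one surrounded by it.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
majority = [(5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9),
            (6.0, 6.0), (6.0, 5.0), (5.0, 6.0), (4.0, 4.0)]
counts = adasyn_allocation(minority, majority, k=3)
```

In this toy example the minority point surrounded by majority samples receives the largest share of synthetic samples.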

II-B Generative Adversarial Networks

GANs [14] have shown remarkable success as a framework for training models to produce realistic-looking data. GANs are composed of two networks that compete in order to improve their performance. Given a training set X, the Generator, G(x), takes as input a noise vector and tries to produce sample data similar to the real data in the training set.

A Discriminator network, D(x), is a binary classifier that tries to distinguish between the real data exhibited in the training set X and the fake data generated by the Generator. So, given a set of training data, GANs can learn to estimate the underlying probability distribution of the training data.

Based on traditional GAN models, Xi Chen et al. [15] (2016) designed InfoGAN, which applies concepts from Information Theory to transform some of the noise terms into latent codes that have systematic and predictable effects on the outcome. InfoGAN performs its task by splitting the Generator input into two parts: the traditional noise vector and a new “latent code” vector. The codes are then made meaningful by maximizing the Mutual Information between the code and the generator output. It allows convergence to happen much faster than in traditional GANs.
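The split Generator input can be pictured with a small sketch. It assumes a categorical latent code encoded as a one-hot vector; the function name `generator_input` and the dimensions used are hypothetical and are not taken from the paper.

```python
import random

def generator_input(noise_dim, n_classes, class_id, rng):
    """Build an InfoGAN-style Generator input: noise vector plus latent code."""
    # Unstructured part: standard normal noise.
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    # Structured part: one-hot latent code selecting the desired category.
    code = [1.0 if c == class_id else 0.0 for c in range(n_classes)]
    return noise + code

rng = random.Random(0)
z = generator_input(noise_dim=8, n_classes=2, class_id=1, rng=rng)
```

During training, an auxiliary network tries to recover the code part from the generated image, which is what ties each code value to a consistent effect on the output.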

Fig. 2: Samples of heartbeat after the pre-processing step with their respective classes.

Recently, the use of synthetic samples generated by GANs has been gaining strength with the results obtained in several works. Antreas Antoniou et al. (2018) [16] presented the Data Augmentation Generative Adversarial Network (DAGAN). They showed empirically that the samples generated by DAGAN generally yielded accuracy gains, reaching improvements of up to 13% in experiments. M. Frid-Adar et al. (2018) [17] used synthetic samples generated by a Deep Convolutional GAN (DCGAN) to perform data augmentation in order to improve liver lesion classification. They achieved an improvement of 7% using synthetic augmentation over classic augmentation methods. F. Bolelli et al. (2018) [18] used GANs for skin lesion segmentation. The GANs were used to augment data in the image segmentation field, and a Convolutional Deconvolutional Neural Network (CDNN) was used to generate lesion segmentation masks from dermoscopic images.

Among the works surveyed, we did not find any approach that uses GANs to deal with unbalanced datasets involving health problems. The closest to oversampling that we found is the use of GANs for data augmentation of a whole dataset.

III Heartbeat Anomaly Detection using Adversarial Oversampling

In this section, we describe the complete pipeline of the model proposed in this work. Figure 1 summarizes the flow of the solution. The method comprises three phases: Pre-processing, Adversarial Oversampling (InfoGAN), and CNN classification.

  1. Pre-processing: The complete long ECG records are preprocessed to extract all individual beats and produce heartbeat-centered binary images;

  2. Adversarial Oversampling: The InfoGAN is used to generate synthetic adversarial images for the minority heartbeat classes;

  3. Training and Classification: The new balanced dataset is used to train a 2D CNN for image classification.

III-A Pre-processing

After the acquisition of the signal, pre-processing was performed to collect a range of 25 time-slices of the signal. The resulting images have the peak of the heartbeat in their central region, so each peak-centered image is classified individually. Then, we produced binary images with resolution corresponding to the peaks. With this, the database that was previously composed of a long 1-dimensional ECG series is now composed of several 2-dimensional labeled images. Figure 2 shows the data after all pre-processing.
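A minimal sketch of this kind of pre-processing, assuming the peak positions are already known (for example from the database annotations). The helper names `extract_beat` and `rasterize`, the window half-width, and the image height are hypothetical choices for illustration and are not the values used in the paper.

```python
def extract_beat(signal, peak_index, half_window):
    """Cut a window of the 1D signal centered on the given peak index."""
    start, end = peak_index - half_window, peak_index + half_window + 1
    if start < 0 or end > len(signal):
        return None  # beat too close to the record boundary
    return signal[start:end]

def rasterize(window, height):
    """Draw the window as a binary image with one set pixel per column."""
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat window
    image = [[0] * len(window) for _ in range(height)]
    for col, value in enumerate(window):
        row = round((hi - value) / span * (height - 1))  # row 0 is the top
        image[row][col] = 1
    return image

signal = [0, 0, 1, 5, 1, 0, 0, 0]
window = extract_beat(signal, peak_index=3, half_window=2)
image = rasterize(window, height=6)
```

The peak ends up in the central column and the top row of the image, which matches the idea of a peak-centered binary image of a single beat.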

III-B Adversarial Oversampling

After surveying the existing GAN models, we carried out quick experiments to evaluate the speed of convergence, the robustness to mode collapse, and the visual quality of the generated synthetic images. We evaluated in detail the traditional GAN, the Wasserstein GAN [19], and InfoGAN. Finally, we chose InfoGAN as the adversarial model.

As we need to generate new images, the Generator (G) and Discriminator (D) are both composed of convolutional layers. The Generator also has upsampling and batch normalization layers. At the end of the model, we employ a Tanh activation function so that the outputs lie between $-1$ and $1$.

Therefore, G takes a standard normal random noise vector of size , which it turns into a synthetic image at the end of the model. D takes both fake and real images (not at the same time) with size . D is composed of 2-dimensional convolutional layers, dropout to minimize overfitting, batch normalization, Leaky ReLU [20] as the activation function, and a Sigmoid function as the output of the model. The auxiliary Q Net is composed of fully connected layers and a Softmax output that gives the label of the synthetic image.

Figure 3(a) shows the architecture of the InfoGAN Generator on the left, and a detailed architecture of the Discriminator (D) and the auxiliary Q Net on the right.

To train the InfoGAN, we use batches of images generated from noise as the fake class and batches of real images as the true class. The traditional steps to train a conventional GAN are then performed: we pass a batch of fake samples and a batch of real samples to the Discriminator so that it learns to differentiate fake data from real data. The InfoGAN training is shown in Figure 3(b).

After training the Discriminator and the Q Net for some epochs, we freeze the D weights and train only the Generator in the combined model. Consequently, the Generator iteratively learns how to generate synthetic images that follow the real distribution of the training images. The general InfoGAN objective function is given by the lower-bound approximation to the Mutual Information as follows:
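For reference, the objective as stated in the InfoGAN paper [15] is reproduced here in that paper's notation. The symbols $\lambda$ (weight of the information term), $z$ (noise), $c$ (latent code), and $Q$ (auxiliary distribution) come from [15] and may differ from the notation used in this work.

```latex
\min_{G,Q}\,\max_{D}\; V_{\mathrm{InfoGAN}}(D, G, Q) = V(D, G) - \lambda\, L_I(G, Q),
\qquad
L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c)
\;\le\; I\big(c;\, G(z, c)\big)
```

Here $V(D, G)$ is the standard GAN value function and $L_I(G, Q)$ is the variational lower bound on the Mutual Information $I(c; G(z, c))$ between the latent code and the generated sample.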

We adopted the following methodology to generate synthetic samples. Given training data samples from the minority classes VEB and APC, we train the InfoGAN for both classes. The training and generation of images was iterative. The maximum number of epochs was 100 thousand, and every 500 epochs we saved a snapshot of the generator network and some synthetic images. When the training was finished, we inspected all samples and chose the snapshots that generated images with a realistic appearance. Hence, we did not consider the loss, but only the quality of the generated images.

Fig. 3: (a) InfoGAN Architecture. (b) InfoGAN Training. (c) CNN Architecture.

III-C Training and Classification

A Convolutional Neural Network (CNN) was chosen as the classification model. Figure 3(c) presents the high-level architecture of the proposed CNN. The network takes as input a 2-dimensional array, which represents a sampled heartbeat image with dimensions. Two convolutional layers compose the model. The kernel initialization follows the He initialization approach [21]. We used the Rectified Linear Unit (ReLU) [22] as the activation function.


To prevent overfitting, we employ dropout [23] in some layers, and we also use batch normalization [24] to accelerate training. As the subsampling method, we use simple average pooling layers. At the end of the network, a Softmax layer provides the output as probabilities of each heartbeat class. We use the Adam optimizer [17] with the default parameters and interrupt the training when the loss on the validation set stops decreasing. Given the data set X, for a single sample $o$ from X, the network training consists of optimizing the following cross-entropy objective function:

$$-\sum_{c=1}^{M} y_{o,c}\,\log(p_{o,c})$$

In the previous equation, $M$ is the number of classes or rhythms that a signal can assume, $y_{o,c}$ is a binary indicator (0 or 1) of whether class label c is the correct classification for observation (heartbeat) o, and $p_{o,c}$ is the predicted probability that observation o is of class c.
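The cross-entropy for a single heartbeat can be computed as in the sketch below. The function name `cross_entropy` and the small `eps` clamp (added to avoid taking the logarithm of zero) are assumptions for illustration; the example probabilities are made up.

```python
import math

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy of one observation.

    y_true: one-hot list, 1 for the correct class and 0 elsewhere.
    p_pred: predicted class probabilities (for example, a Softmax output).
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))

# One observation, three classes, the second class is the correct one.
loss = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
```

Since the indicator is zero for every incorrect class, only the probability assigned to the correct class contributes, so the loss here equals the negative logarithm of 0.7.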

Class Original Adversarial Oversampling ADASYN SMOTE Random Oversampler
TABLE I: F1 score for each oversampling method

IV Experiments

This section describes the experiments that investigate if Adversarial Oversampling could be used to balance the training data set. That is, whether synthetic data generated by a GAN could improve the overall classification accuracy. We compare the performance of this Adversarial Oversampling with the RandomOversampler, SMOTE, and ADASYN.

In these experiments, we use the MIT-BIH Arrhythmia Database, which is a classical arrhythmia database. This dataset contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, sampled at 360 Hz, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979 [25]. From the MIT-BIH Arrhythmia dataset, we collected samples from seven different classes, as shown in Table II.

Heartbeat                            Type     # Samples
Atrial Premature Contraction         APC      243
Normal                               Normal   1079
Left Bundle Branch Block             LBBB     1051
Paced Beat                           PAB      895
Premature Ventricular Contraction    PVC      1012
Right Bundle Branch                  RBB      1006
Ventricular Escape Beat              VEB      106
TABLE II: MIT-BIH Arrhythmia Dataset Description

The CNN training consisted of 20 epochs, with early stopping if the loss on the validation set did not decrease with respect to the previous epoch. The experiments were repeated ten times, with 20% of the data used as the test set, 10% as the validation set, and 70% for training.
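A minimal sketch of such a 70/10/20 split over sample indices. It assumes a plain random shuffle; whether the original split was stratified by class is not stated. The function name `split_indices` and the seed are hypothetical, and 5392 is the sum of the sample counts listed in Table II.

```python
import random

def split_indices(n, test_frac=0.2, val_frac=0.1, seed=0):
    """Shuffle indices 0..n-1 and split them into train, validation, and test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]  # the remaining ~70%
    return train, val, test

train_idx, val_idx, test_idx = split_indices(5392)
```

Repeating the experiment ten times would correspond to calling the function with ten different seeds.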

Regarding the methodology used to generate the InfoGAN synthetic samples, we split the original dataset into a training set and a test set, and only the training set was used to train the InfoGAN. Considering only the training set, we generated as many samples as necessary for each minority class to reach a total of 1000 samples, since the other classes have around this amount. In this way, the new training set became relatively balanced.
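The number of synthetic samples required per class follows directly from the class counts in the training set. The sketch below uses made-up training counts (roughly 70% of the APC and VEB totals in Table II) purely for illustration; the function name `synthetic_needed` is hypothetical.

```python
def synthetic_needed(train_counts, target=1000):
    """Number of synthetic samples each class needs to reach the target size."""
    return {label: max(0, target - n) for label, n in train_counts.items()}

# Hypothetical training-set counts, not taken from the paper.
train_counts = {"APC": 170, "VEB": 74, "Normal": 1079}
needed = synthetic_needed(train_counts)
```

Classes that already meet or exceed the target receive no synthetic samples.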

After generating the synthetic samples, we performed the Adversarial Oversampling by iteratively adding the samples to the network training set and observing the F1 score for each class. Then we used the same training and test environment to compare the Adversarial Oversampling with the traditional oversampling methods. The classifier for each oversampling method was the proposed CNN.

V Results and Discussion

In this section, the results of the proposed Adversarial Oversampling approach are compared with the ones provided by traditional oversampling methods. As shown in the second column of Table I, the CNN performed poorly on the original (unbalanced) dataset, mainly because of the F1 score for the minority classes VEB and APC.

Consequently, this experiment aims to minimize this problem by generating high-quality synthetic samples for the minority classes to obtain a more balanced training set. Although it had a slightly smaller number of samples, the PAB class was not chosen to be oversampled since it did not influence the performance of the overall classification.

Figure 4 shows three adversarial samples for the classes APC and VEB. On the left side, we have two images representing the original samples of the mentioned classes.

Fig. 4: Adversarial samples from VEB and APC classes.

To observe whether Adversarial Oversampling could be used to perform the dataset balancing, the F1 score for each class was observed as synthetic samples were inserted. The objective was to observe if the APC and VEB minority classes would perform better in the classification, without decreasing the performance of the other classes.
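The per-class F1 score used in this analysis can be computed from true and predicted labels as in the sketch below. The function name `f1_per_class` and the toy labels are assumptions for illustration and are not results from the paper.

```python
def f1_per_class(y_true, y_pred):
    """F1 score of each class, computed one-vs-rest from label lists."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

y_true = ["N", "N", "VEB", "VEB", "APC"]
y_pred = ["N", "VEB", "VEB", "VEB", "N"]
scores = f1_per_class(y_true, y_pred)
```

Tracking this score separately for every class makes it possible to check that gains on APC and VEB do not come at the cost of the other classes.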

Figure 5(a) and Figure 5(b) show the CNN performance for each class when 150 and 100 samples are added until reaching a total of 1000. Both graphs show the mean F1 score over 10 executions.

Fig. 5: Mean of F1-score for each class when (a) VEB and (b) APC synthetic samples are used.

After observing the variation in CNN performance as synthetic samples are inserted into the training set, we tested the CNN using the best number of synthetic samples for each of the two classes. As shown in Figure 5(a) and Figure 5(b), the best performance of the CNN was observed using 600 synthetic samples of the APC class and 600 synthetic samples of the VEB class. Table I shows the F1 score for each class using Adversarial Oversampling compared to the RandomOversampler, ADASYN, and SMOTE oversampling methods.

As we can see, the results favor the hypothesis that Adversarial Oversampling can be used to balance the dataset: for the two minority classes, APC and VEB, the performance observed in Table I was higher than that obtained by oversampling with RandomOversampler, SMOTE, and ADASYN.

One possible explanation for this may be the fact that InfoGAN generates new samples, not just repeats them as in the case of RandomOversampler. Besides, it can generate samples that follow the original distribution of the classes, reducing the occurrence of class noise, which is a common risk in traditional Oversampling methods.

VI Conclusion

When using deep learning for signal analysis, such as ECG, through convolutional neural networks, convolutions with one dimension consider the signal as a time series. However, according to the obtained results, we can observe that the bi-dimensional approach used in our method obtained an excellent performance. Only the minority classes did not have a satisfactory classification.

Concerning the performance of the InfoGAN in generating synthetic samples, we show that these samples vary within the class and maintain the original characteristics of the signal, which is fundamental. So, we can see that Adversarial Oversampling can be used to balance a dataset by generating new synthetic samples of the minority classes, and that the generated and selected samples were able to significantly decrease the effects of the unbalanced data. The proposed approach improved the performance of the CNN in the classification of the VEB and APC types. Besides, for the dataset used, the proposed Adversarial Oversampling outperforms traditional methods such as RandomOversampler, SMOTE, and ADASYN.

As future work, we suggest developing a more automatic way of generating and selecting samples, possibly observing only the loss, without the need for human intervention to select good synthetic samples. Also, we suggest evaluating the performance of new approaches such as Unrolled GAN [26]. The base code for this work is publicly available online (https://github.com/JeffersonLPLima/adversarial_oversampling).


We would like to thank CAPES and CNPq (Brazilian research agencies) for the financial support. We also thank E.life Brazil for making the infrastructure available for conducting the experiments.


  • [1] S. Kiranyaz, T. Ince, and M. Gabbouj, “Real-time patient-specific ecg classification by 1-d convolutional neural networks,” IEEE Transactions on Biomedical Engineering, vol. 63, pp. 664–675, March 2016.
  • [2] P. Sodmann, M. Vollmer, N. Nath, and L. Kaderali, “A convolutional neural network for ecg annotation as the basis for classification of cardiac rhythms,” Physiological Measurement, vol. 39, no. 10, p. 104005, 2018.
  • [3] Ö. Yıldırım, P. Pławiak, R.-S. Tan, and U. R. Acharya, “Arrhythmia detection using deep convolutional neural network with long duration ecg signals,” Computers in Biology and Medicine, vol. 102, pp. 411–420, 2018.
  • [4] B. Pourbabaee, M. J. Roshtkhari, and K. Khorasani, “Deep convolutional neural networks and learning ecg features for screening paroxysmal atrial fibrillation patients,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 48, pp. 2095–2104, Dec 2018.
  • [5] J. L. Leevy, T. M. Khoshgoftaar, R. A. Bauder, and N. Seliya, “A survey on addressing high-class imbalance in big data,” Journal of Big Data, vol. 5, pp. 1–30, 2018.
  • [6] R. K. Shahzad, “Android malware detection using feature fusion and artificial data,” 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech), pp. 702–709, 2018.
  • [7] X. R. Palou, C. Turino, A. Steblin, M. S. de-la Torre, F. Barbé, and E. Vargiu, “Comparative analysis of predictive methods for early assessment of compliance with continuous positive airway pressure therapy,” in BMC Med. Inf. and Decision Making, 2018.
  • [8] S. Wang, Z. Li, W. Chao, and Q. Cao, “Applying adaptive over-sampling technique based on data density and cost-sensitive svm to imbalanced learning,” in The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, June 2012.
  • [9] T. Liu, Y. Liang, and W. Ni, “Minority identification for imbalanced dataset,” in Proceedings of the 31st Chinese Control Conference, pp. 3897–3902, July 2012.
  • [10] B. Zhou, C. Yang, H. Guo, and J. Hu, “A quasi-linear svm combined with assembled smote for imbalanced data classification,” in The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–7, Aug 2013.
  • [11] G. Lemaître, F. Nogueira, and C. K. Aridas, “Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning,” Journal of Machine Learning Research, vol. 18, no. 17, pp. 1–5, 2017.
  • [12] N. Chawla, K. Bowyer, L. O. Hall, and W. Philip Kegelmeyer, “Smote: Synthetic minority over-sampling technique,” J. Artif. Intell. Res. (JAIR), vol. 16, pp. 321–357, 01 2002.
  • [13] H. He, Y. Bai, E. A. Garcia, and S. Li, “Adasyn: Adaptive synthetic sampling approach for imbalanced learning,” in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1322–1328, June 2008.
  • [14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds.), pp. 2672–2680, Curran Associates, Inc., 2014.
  • [15] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: interpretable representation learning by information maximizing generative adversarial nets,” in Advances in Neural Information Processing Systems, 2016.
  • [16] A. Antoniou, A. Storkey, and H. Edwards, “Data augmentation generative adversarial networks,” 2018.
  • [17] M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, “Synthetic data augmentation using gan for improved liver lesion classification,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 289–293, April 2018.
  • [18] F. Bolelli, F. Pollastri, R. P. Palacios, and C. Grana, “Improving skin lesion segmentation with generative adversarial networks,” in 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), pp. 442–443, June 2018.
  • [19] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning (D. Precup and Y. W. Teh, eds.), vol. 70 of Proceedings of Machine Learning Research, (International Convention Centre, Sydney, Australia), pp. 214–223, PMLR, 06–11 Aug 2017.
  • [20] B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” 05 2015.
  • [21] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034, Dec 2015.
  • [22] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10, (USA), pp. 807–814, Omnipress, 2010.
  • [23] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, pp. 1929–1958, 2014.
  • [24] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, pp. 448–456, JMLR.org, 2015.
  • [25] G. B. Moody and R. G. Mark, “The impact of the mit-bih arrhythmia database,” IEEE Engineering in Medicine and Biology Magazine, vol. 20, pp. 45–50, May 2001.
  • [26] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein, “Unrolled generative adversarial networks,” CoRR, vol. abs/1611.02163, 2016.