Smartphone-based paroxysmal atrial fibrillation monitoring with robust generalization

Tamas Madl
Austrian Research Institute for Artificial Intelligence
HeartShield Ltd.
David Madl
HeartShield Ltd.

Atrial fibrillation is increasingly prevalent, especially in the elderly, and challenging to detect due to its paroxysmal nature. Here, we propose novel computational methods based on heart beat intervals to facilitate rapid and robust discrimination between atrial fibrillation and sinus rhythm. We used low-cost Android smartphones to record short, 30-second waveforms from 194 participants. In addition, we evaluated our approach on 8528 hand-held ECG recordings to show generalization.

Our approach achieves a sensitivity of 93% and specificity of 94% on 30 second waveforms, significantly outperforming previously proposed heart rate variability features and smartphone-based AFib detection methods, and substantiates the feasibility of real-world application on low-cost hardware.




31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

1 Introduction

Atrial fibrillation (AFib) can cause significant risks to health, such as thrombosis formation or cerebrovascular stroke. Estimated population prevalence ranges from 0.5% to 3% and can reach 10% to 30% in the elderly; nearly one in six stroke events is caused by AFib [reiffel2014atrial]. However, detecting paroxysmal atrial fibrillation remains challenging, and a large percentage of all cases go unnoticed.

Recently, smartphones have been suggested to facilitate home monitoring at any time, with simple handheld ECGs [galloway2013iphone], which the patient would need to purchase, or using the camera and flashlight as a substitute photoplethysmograph [krivoshei2016smart]. These latter approaches obtain an approximate photoplethysmography signal from the brightness values of the built-in phone camera, and subsequently calculate inter-beat intervals from the temporal differences of maxima or other fiducial points. It has been argued that variability measures of beat intervals obtained from smartphones can be in good agreement with the same measures calculated from ECG [peng2015extraction], at least for higher-end devices with high quality built in camera sensors and circuitry.

However, low sensor quality, especially in lower-priced devices, as well as frequent motion artifacts and the lack of willingness or patience to sit completely still for extended periods of time pose serious challenges for both of these home screening methods. Evidence of reliable detection of AFib has so far been mostly limited to high end phones such as iPhones, to minimum measurement durations of two to five minutes (requiring very patient participants), and to a few dozen patients [krivoshei2016smart].

Here, we describe a system with stronger generalization ability than previously proposed methods, using robust measures of irregularity and multivariate analysis, and show its applicability in more practical settings: up to 10x shorter measurements (30 s vs. 300 s), low-cost hardware, higher tolerance to movement artifacts, a low false positive rate, and generalization across hardware types.

2 Methods

We constructed the model based on inter-beat intervals (IBIs) from Holter ECGs of 102 participants (see section 'Training data'), and evaluated its performance on the following out-of-sample data:

  • IBIs from smartphone recordings of 100 patients, recorded in a clinical setting

  • IBIs from smartphone recordings of 200 participants, recorded in a home monitoring setting

  • IBIs from 8528 hand-held ECG recordings, recorded in a home monitoring setting

Signal processing and data analysis were performed using custom software written in Python. Each short signal was evenly resampled, detrended and bandpass filtered prior to beat detection. Signals were rescaled to zero mean and unit variance to mitigate the extreme amplitude differences between data sources (Holters, 12-lead ECGs and hand-held fingertip ECGs for home monitoring; smartphones of different models and makes). In the case of ECG signals, the QRS detector of [kim2016simple] was subsequently applied to extract inter-beat time intervals.
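The resampling, detrending and rescaling steps can be sketched in pure Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, linear interpolation and a least-squares linear detrend are assumed because the text does not name the methods used, and bandpass filtering and beat detection are omitted.

```python
import math


def resample_evenly(times, values, rate_hz):
    """Linearly interpolate irregularly timed samples onto an even grid."""
    step = 1.0 / rate_hz
    # Small epsilon guards against floating-point rounding of the duration.
    n = int((times[-1] - times[0]) * rate_hz + 1e-9) + 1
    out, j = [], 0
    for k in range(n):
        t = times[0] + k * step
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def detrend_linear(values):
    """Subtract the least-squares straight line (assumed detrending method)."""
    n = len(values)
    mean_i = (n - 1) / 2.0
    mean_v = sum(values) / n
    sxx = sum((i - mean_i) ** 2 for i in range(n))
    sxy = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(values))
    slope = sxy / sxx
    return [v - (mean_v + slope * (i - mean_i)) for i, v in enumerate(values)]


def zscore(values):
    """Rescale to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]
```

Chaining the three functions, e.g. `zscore(detrend_linear(resample_evenly(t, v, 30)))`, would give a signal on a common scale regardless of which device produced it.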

ECG sampling rates varied between 128 Hz and 300 Hz, depending on data source (see details below). All smartphone PPGs were recorded by a custom developed app on Android smartphones, which, at the time of this publication, are limited by the operating system to optical sensor input with a maximum of 30 Hz sampling rate.

2.1 Robust heart rate variability measures for AFib detection

A large number of measures of heart rate variability (HRV) have been suggested in previous literature (see e.g. [acharya2006heart] for a review), and the authors have developed an array of additional HRV measures, shown to perform well at discriminating other disease etiologies from healthy controls [madl2016cinc]. For the present AFib detection model, we extracted the five best performing measures of HRV from this multitude of possibilities, applying a wrapper method of feature selection from machine learning literature [guyon2003introduction], yielding the following best-performing features in order of inclusion:

$\sigma_{\Delta^n}$. Standard deviation of the $n$'th order derivative of inter-beat intervals, equivalent to the standard deviation of successive differences (of successive differences). In our experiments, $n=5$ led to the best results. The idea of taking the standard deviation of a derivative instead of the plain RR interval standard deviation (SDRR) has a long history in HRV literature - see e.g. [brennan2001existing].

Let $x = (x_1, \dots, x_N)$ denote the sequence of inter-beat intervals, such that each $x_i$ represents the temporal difference between two heart beats. Then, the first feature is defined as

$$\sigma_{\Delta^n} = \mathrm{SD}\left(\Delta^n x\right), \qquad \Delta^1 x_i = x_{i+1} - x_i, \qquad \Delta^n x_i = \Delta^{n-1} x_{i+1} - \Delta^{n-1} x_i$$
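As a minimal sketch of this first feature, the function below differences the IBI sequence n times and takes the standard deviation of the result. The function name is hypothetical, and the population (divide-by-count) form of the standard deviation is an assumption, since the text does not state which normalization was used.

```python
import math


def nth_order_diff_sd(ibis, n=5):
    """Standard deviation of the n-th order successive differences of IBIs."""
    d = list(ibis)
    for _ in range(n):
        # One pass of successive differences shortens the sequence by one.
        d = [b - a for a, b in zip(d, d[1:])]
    if len(d) < 2:
        raise ValueError("need at least n + 2 inter-beat intervals")
    mean = sum(d) / len(d)
    # Population standard deviation (assumed normalization).
    return math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
```

Because every differencing pass amplifies beat-to-beat alternation, an irregular sequence should score far higher than a steady one of the same length.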


$H_{\mathrm{hist}}$. Histogram entropy, a particular implementation of the popular entropy-based measures of heart rate variability (e.g. [lake2002sample]), defined as follows. Let $h$ be a histogram of the frequency distribution of the inter-beat interval sequence $x$, discretized into $K$ bins, such that $h_k$ is the normalized frequency of all $x_i$ that fall into the $k$th bin, and $w_k$ is the width of the $k$th bin. Then,

$$H_{\mathrm{hist}} = -\sum_{k=1}^{K} h_k \log \frac{h_k}{w_k}$$

Since a larger number of bins increases susceptibility to noise and sensitivity to the particular bin edges, we chose the smallest viable number of bins, which yields the most robust generalization (more bin edges could allow more noisy beats to land in the 'wrong' bin).
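A minimal sketch of a histogram entropy of this kind is given below. It assumes equal-width bins spanning the observed range and the common width-corrected estimator (minus the sum of p log(p / w) over non-empty bins); the function name and the default `n_bins=4` are placeholders, not values taken from the paper.

```python
import math


def histogram_entropy(ibis, n_bins=4):
    """Width-corrected entropy of an equal-width histogram of IBIs."""
    lo, hi = min(ibis), max(ibis)
    if hi == lo:
        raise ValueError("all intervals are identical; bin width would be zero")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in ibis:
        # The maximum value is assigned to the last bin.
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    total = len(ibis)
    entropy = 0.0
    for c in counts:
        if c:  # empty bins contribute nothing
            p = c / total
            entropy -= p * math.log(p / width)
    return entropy
```

With few bins, a single noisy beat can move at most one count between neighbouring bins, which is the robustness argument made in the text.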

$D_R$. Resemblance of the distribution of inter-beat intervals to a Rayleigh distribution. Testing against the Rayleigh distribution is frequently used in physics to test for periodicity in time series of possibly regular events [leahy1983searches]. After inclusion of the above two, more traditional HRV measures, the addition of this metric facilitated the highest increase in AFib predictive accuracy. We formalized the resemblance between the IBI distribution and a comparable (fitted) Rayleigh distribution as follows. Let $R$ denote the Rayleigh distribution of an inter-beat interval of length $t$, with a scale parameter $\sigma$: $R(t; \sigma) = \frac{t}{\sigma^2} e^{-t^2 / (2\sigma^2)}$. Let $\hat{\sigma}$ be the maximum likelihood estimated scale parameter of the best-fitting Rayleigh distribution, given the series $x$.

Furthermore, let $\hat{f}$ denote the kernel density estimate of sequence $x$, such that

$$\hat{f}(t) = \frac{1}{N} \sum_{i=1}^{N} \mathcal{N}(t;\, x_i, b^2)$$

where $\mathcal{N}$ is the Normal distribution, and the bandwidth parameter $b$ is calculated using Silverman's rule of thumb bandwidth estimator [silverman1986density]; that is, $b = \left(4\hat{s}^5 / (3N)\right)^{1/5}$, where $\hat{s}$ is the standard deviation of $x$.

Then, $D_R$, the resemblance of the kernel density estimated distribution of inter-beat intervals to the best-fitting Rayleigh distribution, can be calculated numerically by summing up the absolute differences between the two functions:

$$D_R = \sum_{m=0}^{M} \left| \hat{f}(m\delta) - R(m\delta; \hat{\sigma}) \right|$$

The step $\delta$ was chosen to be 1 ms (since none of the data sources had a higher resolution / accuracy), and $M$ was chosen to cover most of the Rayleigh distribution. Specifically, $M$ was chosen so that the range of summation extends at least to the point where the Rayleigh distribution takes on an amplitude smaller than a small fraction $\epsilon$ of its peak, i.e., such that $R(M\delta; \hat{\sigma}) < \epsilon \max_t R(t; \hat{\sigma})$.
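The computation can be sketched as below, with IBIs given in milliseconds and a 1 ms summation step. Several details are assumptions rather than the authors' choices: the function names, the Gaussian-kernel form of Silverman's rule, the sample standard deviation, and in particular `tail_fraction`, a placeholder for the cut-off fraction of the Rayleigh peak.

```python
import math


def rayleigh_pdf(t, sigma):
    """Rayleigh density with scale sigma (zero for negative t)."""
    if t < 0:
        return 0.0
    return (t / sigma ** 2) * math.exp(-t * t / (2 * sigma ** 2))


def rayleigh_resemblance(ibis_ms, delta=1.0, tail_fraction=1e-3):
    """Sum of |KDE - fitted Rayleigh| on a grid with spacing delta (ms)."""
    n = len(ibis_ms)
    # Maximum likelihood estimate of the Rayleigh scale parameter.
    sigma = math.sqrt(sum(x * x for x in ibis_ms) / (2 * n))
    mean = sum(ibis_ms) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in ibis_ms) / (n - 1))
    if sd == 0:
        raise ValueError("identical intervals give a zero KDE bandwidth")
    # Silverman's rule of thumb for a Gaussian kernel (assumed variant).
    b = (4 * sd ** 5 / (3 * n)) ** 0.2
    norm = n * b * math.sqrt(2 * math.pi)

    def kde(t):
        return sum(math.exp(-0.5 * ((t - x) / b) ** 2) for x in ibis_ms) / norm

    peak = rayleigh_pdf(sigma, sigma)  # the Rayleigh mode lies at t = sigma
    total, m = 0.0, 0
    while True:
        t = m * delta
        r = rayleigh_pdf(t, sigma)
        total += abs(kde(t) - r)
        # Stop once the Rayleigh tail has dropped below the cut-off.
        if t > sigma and r < tail_fraction * peak:
            return total
        m += 1
```

In this sketch a smaller value means the IBI density lies closer to the fitted Rayleigh curve; tightly clustered intervals produce a narrow peak that overlaps the broad Rayleigh density very little.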

$r_G$ and $H_{\mathrm{dis}}$. Measures of stochasticity based on a horizontal visibility graph of the IBI series; in particular, the graph radius $r_G$ and the disassortative entropy $H_{\mathrm{dis}}$, both shown before to be capable of outperforming previously proposed HRV measures in terms of predictive power for certain cardiovascular diseases [madl2016cinc]. Structural properties of the original time series, such as periodicity or fractality, are preserved in a horizontal visibility graph representation, and it has been argued that such graphs are well-suited for discriminating stochastic and chaotic processes [luque2009horizontal].

A horizontal visibility graph (HVG) can be constructed from the IBI series as follows. Let $t_i$ be the time that interval $x_i$ has 'occurred'; that is, the time of the fiducial point in beat $i$ in the ECG or PPG signal. Then, $G$ is a network constructed from the series $x$, such that each IBI $x_i$ has a corresponding vertex $v_i$, and each pair of vertices $v_a$, $v_b$ corresponding to a pair of IBIs $x_a$ and $x_b$ is connected by an edge if both $x_a > x_c$ and $x_b > x_c$ for all $c$ with $t_a < t_c < t_b$ [luque2009horizontal]. In other words, in a bar plot of all inter-beat intervals $x_i$, vertices corresponding to two particular IBIs are connected if an unbroken horizontal line can be drawn between them without intersecting any intermediate bars in the plot.
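The construction rule can be sketched directly from the visibility criterion: two intervals are linked when every interval between them is strictly shorter than both. The function name and the adjacency-dict representation are illustrative choices, not taken from the paper.

```python
def horizontal_visibility_graph(ibis):
    """Return the HVG of an IBI sequence as {vertex index: set of neighbours}."""
    n = len(ibis)
    adj = {i: set() for i in range(n)}
    for a in range(n - 1):
        highest_between = float("-inf")  # tallest bar strictly between a and b
        for b in range(a + 1, n):
            if ibis[a] > highest_between and ibis[b] > highest_between:
                adj[a].add(b)
                adj[b].add(a)
            highest_between = max(highest_between, ibis[b])
            # Once a bar at least as tall as ibis[a] appears, a sees no further.
            if highest_between >= ibis[a]:
                break
    return adj
```

Consecutive intervals are always connected, so the resulting graph is connected for any input. The early exit keeps the inner loop short whenever a long interval blocks the view.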

Apart from their usefulness in differentiating noise from chaos arising from non-linear dynamics, HVGs are useful because they facilitate the application of any complex networks analysis tool to IBIs. According to preliminary feature selection, two particular complex network descriptors increased predictive accuracy the most: radius and disassortative entropy.

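$r_G$, the HVG radius, is defined as the minimum of the vertex eccentricities, where the eccentricity $\epsilon(v)$ of a vertex $v$ is simply the graph distance $d(v, u)$ to the vertex $u$ farthest from it in the HVG $G$:

$$r_G = \min_{v \in G} \epsilon(v), \qquad \epsilon(v) = \max_{u \in G} d(v, u)$$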
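A minimal sketch of the radius computation follows, assuming the graph is connected and supplied as a dict that maps each vertex to the set of its neighbours. Both function names are hypothetical; eccentricities are obtained by breadth-first search from every vertex.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest shortest-path distance from source to any other vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return max(dist.values())


def graph_radius(adj):
    """Minimum eccentricity over all vertices of a connected graph."""
    return min(eccentricity(adj, v) for v in adj)
```

For a chain of five vertices the middle vertex reaches every other vertex within two steps, so the radius is 2; a single vertex connected to all others gives a radius of 1.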


$H_{\mathrm{dis}}$, the HVG disassortative entropy, is a connectivity measure proposed by the author [madl2016cinc]. It is defined as the entropy of the mixing matrix, measuring the information content of the tendency of vertices to connect to similar vertices. Whereas traditional assortativity [newman2003mixing] is maximized by a graph always connecting high-degree vertices to other high-degree vertices, disassortative entropy is maximized by a graph in which the connectivity is randomized. Thus, it is a second-degree measure of stochasticity.

The disassortative entropy-based feature is defined as

$$H_{\mathrm{dis}} = -\sum_{j,k} e_{jk} \log e_{jk}$$

where $e_{jk}$ is the joint probability of the degrees $j$ and $k$ of the two vertices at either end of an edge in the HVG $G$ constructed from $x$. Numerically, $e_{jk}$ is the fraction of edges in $G$ that connect vertices with degree $j$ to vertices with degree $k$ (where degree in the graph theory sense simply means the number of edges incident to a vertex). See [newman2003mixing].
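The entropy of the degree mixing matrix can be sketched as follows, taking an undirected edge list as input. The function name is hypothetical, and counting each edge in both directions (so that the mixing matrix is symmetric) is an assumption in line with the usual convention for undirected graphs.

```python
import math
from collections import Counter


def disassortative_entropy(edges):
    """Shannon entropy of the degree-degree mixing matrix of an undirected graph."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    mixing = Counter()
    for a, b in edges:
        # Count both orientations so the matrix is symmetric.
        mixing[(degree[a], degree[b])] += 1
        mixing[(degree[b], degree[a])] += 1
    total = 2 * len(edges)
    return -sum((c / total) * math.log(c / total) for c in mixing.values())
```

A graph in which every edge joins the same pair of degrees yields zero entropy, while a wider spread of degree pairs across edges raises the value.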

Table 1 shows cross-validated accuracies using logistic regression, chosen to avoid overfitting the limited training data. Regularization and feature selection were based on a hold-out set of the training Holter ECGs, and were not adapted to smartphone PPG.

3 Results

The multivariate logistic regression model described above was constructed using solely the training dataset described above (research ECG datasets), and evaluated on out of sample testing datasets (smartphone PPGs or hand-held ECGs) not seen at training time. Table 1 shows a comparison to alternative physiological markers or biomarkers recently suggested for paroxysmal AFib diagnosis. On data from cheap smartphone hardware, our method displays significantly higher accuracy (94% vs. 70%), area under the ROC curve (AUC, 0.97 vs. 0.81), and specificity (94% vs. 66%), but slightly lower sensitivity (93% vs. 95%) compared to the recently proposed method by [krivoshei2016smart]. A high specificity is important in a home monitoring setting for psychological, time, and financial reasons, which makes previously proposed methods difficult to apply to short-term, 30 second recordings obtained from low-cost smartphone hardware.
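For reference, the reported screening metrics relate to the confusion matrix as sketched below. The function name and the label convention (1 for AFib, 0 for sinus rhythm) are illustrative assumptions; no values from the study are reproduced here.

```python
def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = AFib)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # fraction of AFib cases detected
        "specificity": tn / (tn + fp),  # fraction of sinus rhythm cases cleared
        "accuracy": (tp + tn) / len(pairs),
    }
```

Specificity governs how many healthy users receive a false alarm, which is why it weighs heavily in a home monitoring setting.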

Table 1 also shows the performance of other physiological and biological markers recently proposed to diagnose paroxysmal atrial fibrillation in a clinical setting (see [howlett2015diagnosing] for a recent review). Unlike the comparison with prior methods of smartphone-based screening, these markers were evaluated on separate patient cohorts, limiting the usefulness of direct comparison. Nevertheless, the order of magnitude of these performance indicators strongly suggests smartphone-based home monitoring to be a viable solution to detect unnoticed or asymptomatic paroxysmal atrial fibrillation, at very low cost to patients and the healthcare system.

              Smartphone home monitoring        Resting ECG         Biomarker           Physiological
              This work     Krivoshei et al.,   P-wave dispersion   NT-proBNP peptide   Left atrium size
                            2016                (Dilaveris 1998)    (Fonseca 2014)      + pump function
                                                                                        (Toh 2010)
Sensitivity   93%           95%                 83%                 100%                82%
Specificity   94%           66%                 85%                 70.4%               91%
Sample size   194 PPG       194 PPG             100                 264                 280
              (5641 ECG)    (5641 ECG)
Table 1: Sensitivities and specificities of smartphone home monitoring on 30 second heart beat interval sequences (first two columns), as well as physiological and biological markers (last three columns - separate cohorts) in detecting paroxysmal AFib on novel cohorts without retraining. Brackets show hand-held ECG results, numbers above concern smartphone PPG results.

4 Conclusion

Our results provide evidence that waveforms as short as 30 seconds are sufficient for robust detection, increasing utility for end-users who might be hard pressed to sit still without the slightest movement for the 5 minute periods used in previous work [krivoshei2016smart]. We demonstrate high sensitivity and specificity on a much larger patient cohort than previous smartphone studies, with much lower hardware requirements. Finally, our approach seems to generalize across patient cohorts and even measurement devices. Although Table 1 shows differences between the smartphone PPG cohort and the hand-held ECG data, discrimination ability is high in both cases, suggestive of strong generalization.

