Cardiac Aging Detection Using Complexity Measures
Abstract
As we age, the heart undergoes changes that reduce the complexity of the physiological interactions between its different control mechanisms. This loss of complexity increases the risk of cardiovascular disease, the leading cause of death globally. Since cardiac signals are nonstationary and nonlinear in nature, complexity measures are well suited to analyze such data. In this study, noninvasive methods for the detection of cardiac aging using complexity measures are explored. Lempel-Ziv (LZ) complexity, Approximate Entropy (ApEn) and Effort-to-Compress (ETC) measures are used to differentiate between healthy young and old subjects using heartbeat interval data. We show that both the LZ and ETC complexity measures are able to differentiate between young and old subjects with only 10 data samples, while ApEn requires at least 15 data samples.
Keywords: Cardiac aging, Lempel-Ziv (LZ) complexity, Approximate Entropy (ApEn), Effort-to-Compress (ETC), chaos, time series analysis
1 Introduction
It is well known that the functions of all physiological systems are greatly altered during the process of aging. Among these, the cardiovascular system has received prominent attention due to the high death rate attributed to heart-related diseases. In fact, the World Health Organization (WHO) has labeled cardiovascular diseases the number one cause of death globally [deathCVD]. Hence, the study of cardiac aging has been an important area in clinical medicine, and understanding heart rate patterns is an essential step in this study. In the 1980s, chaos theory began to be used to investigate heart beat dynamics [physiologyChaos80s1, physiologyChaos80s2, physiologyChaos80s3]. Initially, researchers assumed that it was pathological systems that produce chaotic time series. They therefore applied concepts from nonlinear theory to model pathologies such as cardiac arrhythmias, which initiated the process of understanding the behaviour and dynamics of atrial and ventricular fibrillation [heartbeatChaotic]. But contrary to earlier theories, it was found that cardiac arrhythmia does not display chaotic dynamics [pathologyNotChaos]. It was indeed a surprise when further studies led to the discovery that cardiac chaos is found not in pathological systems but in the dynamics of normal sinus rhythm [normalHeartChaos1, normalHeartChaos2, normalHeartChaos3].
Goldberger, in his seminal work on heart beat dynamics [heartbeatNonlinear1], points out that even during bed rest, the heart rate in healthy individuals is not constant or periodic, but shows irregular patterns that are typically associated with a chaotic nonlinear system. A detailed analysis of the beat-to-beat heart rate variability time series shows that there is no particular time scale for this behaviour: it is seen across orders of magnitude (hours, minutes and seconds). This self-similar structure lends credence to the view that a fractal feature of chaotic dynamics exists in these time series. This has been further verified by Goldberger [heartbeatNonlinear1] by performing spectral analysis on normal heart time series, which revealed a broad spectrum with a continuum of frequency components, indicating a high possibility of the presence of chaos. In addition, the phase space mapping of the heart rate time series produced strange attractors, not limit cycles, which clearly indicates the presence of underlying chaotic dynamics in the heart rate variability data. It has been surmised that this chaotic behaviour arises because a healthy physiological condition is defined by complex interactions between multiple control mechanisms. These interactions are essential for the individual to adapt to an ever-changing external environment. These highly adaptive and complex mechanisms lead to a very irregular firing rate of the pacemaker cells in the heart's sinus node, resulting in chaotic heart rate variability [heartbeatChaotic].
Looking at this physiological response of healthy individuals, a natural question arises: how long can this complex interaction continue, and what happens as the person gets older? This was the focus of the study by Goldberger et al. [complexityAging]. They point out that, during the process of aging, two effects occur that reduce the complexity of the interactions among the control mechanisms.

Progressive impairment of functional components.

Altering of nonlinear coupling between these components.
Due to these effects, the physiological functions no longer show a multitude of variations but slowly settle into a more regular pattern. This results in a decrease in the complexity of heart rate variability in healthy older individuals. It also explains why researchers are motivated to use complexity measures (such as approximate entropy, Lempel-Ziv complexity and others) to study cardiac aging, as such measures are known to aid the detection of the regularity inherent in the cardiac time series. From a clinical application perspective, it is desirable to have a complexity measure that can aid this detection in real time (hence one that works with a small number of samples) and that is robust to noise and missing data.
2 Existing studies and their limitations
Statistical measures like mean and variance cannot adequately quantify complex chaotic-like behaviour, because systems with very different dynamics may have very similar means and variances. Hence Goldberger et al. [cardiacAgingApEn] used the following two measures to capture the complex nonlinear variability of the physiological processes.

Measurement of the dimension of the nonlinear system that describes the physiological data variations.

Calculation of ApEn (Approximate Entropy) of the physiological time series.
Using beat-to-beat heart rate signals collected from 16 healthy young subjects and 18 healthy elderly subjects, they demonstrate that the complexity of cardiovascular dynamics reduces with aging. Similar studies have been done using fractal measures like detrended fluctuation analysis to quantify long-range correlation properties [fractal1, fractal2], showing that the complexity of heart beat variability decreases with aging. The use of multifractality, sample entropy and wavelet analysis has been explored in [AgingMultifractality_sampleEntropy]. Nemati et al. [agingTransferEntropy] use band-limited transfer entropy to characterize cardiac aging, with the added purpose of discovering the effect of respiration and blood pressure on the complexity of heart rate variability due to aging.
All the aforementioned studies have used very long data sequences, running into thousands of data samples. Some of the methods used, namely detrended fluctuation analysis and approximate dimension, require such large data sets to effectively quantify complexity.
Given the fractal nature of the heart beat variability data, there is a high possibility that short data sequences would suffice to characterize the structure in the time series, provided the complexity measure can handle short sequences effectively. Takahashi et al. [AgingHRV_conditionalentropy] used conditional entropy and symbolic analysis with sequences of length 200 to differentiate between heart beat intervals of young and old subjects. We go a step further and use sequences shorter than that. Our goal is to compare the usefulness of the Effort-to-Compress (ETC) [ETC], Approximate Entropy (ApEn) [ApEn], and Lempel-Ziv (LZ) [LZComplexity] complexity measures in the automatic identification of cardiac aging using very short interbeat time series data. Long interbeat time series may not be practically available, or a long sequence may be unreliable due to contamination by noise. In such instances, it is beneficial to rely on short-length time series data, especially if it is to be useful in a practical setting.
3 Complexity Measures
In this section, we introduce the reader to LZ, ApEn and ETC complexity measures.
3.1 Lempel-Ziv complexity
Lempel-Ziv complexity is a popular measure in the field of biomedical data characterization [LZ_HRV, LZ_VentricularFibrillation]. To compute the Lempel-Ziv complexity, the given data (if numerical) must first be converted into a symbolic sequence. This symbolic sequence is then parsed from left to right to identify the number of distinct patterns present in it. This method of parsing was proposed in the seminal work on Lempel-Ziv complexity [LZComplexity]. The very succinct description in [LZfiniteDataSize] is reproduced here.
Let S denote a symbolic sequence; let S(i, j) denote the substring of S that starts at position i and ends at position j; and let V(S) denote the set of all substrings S(i, j), for i = 1, ..., n and j >= i. For example, let S = 'abc'; then V(S) = {a, b, c, ab, bc, abc}. The parsing mechanism involves a left-to-right scan of the symbolic sequence S. Start with i = 1 and j = 1. The substring S(i, j) is compared with all strings in V(S(1, j - 1)) (let V(S(1, 0)) = {}, the empty set). If S(i, j) is present in V(S(1, j - 1)), then increase j by 1 and repeat the process. If the substring is not present, then place a dot after S(i, j) to indicate the end of a new component, set i = j + 1, increase j by 1, and continue. This parsing procedure continues until j = n, where n is the length of the symbolic sequence. For example, the sequence 'aacgacga' is parsed as 'a.ac.g.acga.'. By convention, a dot is placed after the last element of the symbolic sequence, and the number of dots gives the number of distinct words, which is taken as the LZ complexity, denoted c(n). In this example, the number of distinct words (LZ complexity) is 4.
Since we may need to compare sequences of different lengths, a normalized measure, denoted C_LZ, is used:

C_LZ = c(n) (log_alpha n) / n,    (1)

where alpha denotes the number of unique symbols in the symbol set and n is the length of the sequence [LZ_interpretation].
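As an illustration, the parsing scheme and the normalized measure above can be sketched in a few lines of Python (a sketch following the description above; function and variable names are our own):

```python
import math

def lz_complexity(s):
    """Count the distinct words in the left-to-right Lempel-Ziv parsing:
    a word ends as soon as the current substring does not occur
    anywhere in the prefix preceding its last symbol."""
    n, words = len(s), 0
    i, j = 0, 1                      # current candidate word is s[i:j]
    while i < n:
        if s[i:j] in s[:j - 1]:      # already seen in V(S(1, j-1))
            if j == n:               # end of sequence: closing dot
                words += 1
                break
            j += 1                   # extend the candidate word
        else:
            words += 1               # place a dot: a new word is found
            i, j = j, j + 1
    return words

def lz_normalized(s, alpha):
    """Normalized LZ complexity of Eq. (1): c(n) * log_alpha(n) / n."""
    n = len(s)
    return lz_complexity(s) * math.log(n, alpha) / n
```

For instance, `lz_complexity('aacgacga')` returns 4, since the sequence parses into the four words a, ac, g and acga.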
3.2 Approximate Entropy (ApEn)
Approximate entropy is a complexity measure used to quantify the regularity of time series, especially short and noisy sequences [ApEn]. ApEn monitors how much a set of patterns that are close together for a few observations retains its closeness when the next few observations are compared. Essentially, it checks the convergence and divergence of patterns to assess the complexity of the given sequence. If neighbouring patterns retain the same closeness, then we infer the sequence to be more regular, with a concomitantly lower ApEn value. The measure was defined in [ApEn] and we reproduce the definition here. Two input parameters, m and r, must be chosen before computing the measure: m is the length of the patterns we compare for closeness each time, and r is a tolerance factor for the regularity of the two sets of patterns being compared.
Given a sequence u(1), u(2), ..., u(N) of length N, we now define the complexity ApEn(m, r, N) as follows.

Form vector sequences x(1) through x(N - m + 1) defined by x(i) = [u(i), u(i + 1), ..., u(i + m - 1)], representing m consecutive u values, starting from the i-th value.

Define the distance d[x(i), x(j)] between vectors x(i) and x(j) as the maximum difference in their respective scalar components: d[x(i), x(j)] = max_{k=1,...,m} |u(i + k - 1) - u(j + k - 1)|.

For each i <= N - m + 1, calculate the number of j <= N - m + 1 such that d[x(i), x(j)] <= r, and call this number B_i.

For each i, calculate the parameters C_i^m(r) = B_i / (N - m + 1). These parameters measure, within a tolerance r, the regularity or frequency of patterns similar to a given pattern of length m.

Define

Phi^m(r) = (N - m + 1)^{-1} * sum_{i=1}^{N-m+1} ln C_i^m(r).    (2)

Using this, the ApEn complexity measure is defined as

ApEn(m, r, N) = Phi^m(r) - Phi^{m+1}(r).    (3)
It has been shown in [ApEn] that for m = 1 and 2, values of r between 0.1 and 0.25 times the standard deviation of the sequence provide good statistical validity of ApEn. In our analysis, we use m = 1, with r set to a fraction (in this range) of the standard deviation of the sequence.
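The steps above translate directly into code. The following is a minimal sketch (our own naming, not a reference implementation):

```python
import math

def apen(u, m, r):
    """Approximate entropy ApEn(m, r, N) of the sequence u, computed
    as Phi^m(r) - Phi^{m+1}(r), per Eqs. (2) and (3)."""
    def phi(m):
        n = len(u) - m + 1
        x = [u[i:i + m] for i in range(n)]       # the vectors x(i)
        logsum = 0.0
        for xi in x:
            # B_i: number of vectors x(j) within tolerance r of x(i)
            b = sum(1 for xj in x
                    if max(abs(a - c) for a, c in zip(xi, xj)) <= r)
            logsum += math.log(b / n)            # ln C_i^m(r)
        return logsum / n
    return phi(m) - phi(m + 1)
```

A perfectly regular alternating sequence such as 1, 2, 1, 2, ... yields an ApEn close to zero, while a less regular sequence yields a clearly larger value.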
3.3 Effort-to-Compress (ETC) complexity
Effort-to-Compress (ETC) is a recently proposed complexity measure that is based on the effort required by a lossless compression algorithm to compress a given sequence [ETC]. The measure is defined using a lossless compression algorithm known as Non-sequential Recursive Pair Substitution (NSRPS). The algorithm for compressing a given sequence of symbols proceeds as follows. At the first iteration, the pair of symbols with the maximum number of occurrences is replaced by a new symbol. For example, the input sequence '11010010' is transformed into '12202', since the pair '10' has the maximum number of occurrences compared to the other pairs ('11', '01' and '00'). In the second iteration, '12202' is transformed to '3202', since '12' has maximum frequency (in fact, all pairs are equally frequent). The algorithm proceeds in this fashion until either the length of the string is 1 or the string becomes a constant sequence (at which stage the entropy is zero and the algorithm halts). In this example, the algorithm transforms the input sequence '11010010' → '12202' → '3202' → '402' → '52' → '6'.
The ETC measure is defined as N, the number of iterations required for the input sequence to be transformed into a constant sequence (or a sequence of length 1) through the NSRPS algorithm. N is an integer that varies between 0 and L - 1, where L stands for the length of the input symbolic sequence. The normalized version of the measure is given by ETC = N/(L - 1). (Note: 0 <= ETC <= 1.)
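A compact sketch of the NSRPS iteration follows. Integer symbols are assumed, a fresh symbol is introduced as max + 1, and ties between equally frequent pairs are broken by first occurrence; these are our own conventions for illustration, not prescribed by the measure:

```python
from collections import Counter

def etc(seq):
    """Effort-to-Compress: number of NSRPS iterations until the
    sequence has length 1 or becomes constant."""
    s = list(seq)
    steps = 0
    while len(s) > 1 and len(set(s)) > 1:
        # most frequent adjacent pair (first-seen pair wins ties)
        pair = Counter(zip(s, s[1:])).most_common(1)[0][0]
        new = max(s) + 1                     # fresh symbol, never seen before
        out, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and (s[i], s[i + 1]) == pair:
                out.append(new)              # replace one non-overlapping occurrence
                i += 2
            else:
                out.append(s[i])
                i += 1
        s = out
        steps += 1
    return steps
```

On the sequence 1,1,0,1,0,0,1,0 (i.e. '11010010') this returns 5, and the normalized value is 5/7. A constant sequence needs no iterations, so its ETC is 0.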
4 Experimental setup
Data used for our experiment was obtained from the 'Physionet: Fantasia database' [physionet] and has been described in [fractal2]. Beat-to-beat heart rate signals were obtained from two groups of healthy adults: twenty young (age 21-34) and twenty old (age 68-81), with 10 males and 10 females in each category. The subjects were studied while lying in a supine (lying on back) position and watching the Disney movie 'Fantasia' to maintain wakefulness. The ECG data was sampled at 250 Hz and the obtained samples were used for analysis. For further details of the experimental setup, please refer to [fractal2].
In our study, we take short-length samples from random time instances and analyze them using the three complexity measures (described in the previous section) to gauge their efficiency in distinguishing between the two groups of data.
5 Comparative complexity analysis
In this section, we take a closer look at the performances of each of the measures. We are interested in finding out the minimum number of samples required for correct classification.
5.1 Analysis procedure

From each of the twenty young and twenty old data sets, choose L consecutive samples from a random location.

Calculate the ApEn, LZ and ETC complexity measures for the chosen length data set.

For statistical accuracy, 50 such locations are randomly chosen, and 50 complexity values are calculated for each of the measures.

The complexity assigned by each measure to a sequence of length L is the average of the 50 values calculated.

Thus for each complexity measure, we obtain 20 complexity values for the young subjects and 20 complexity values for the old subjects.

Using these values as samples representing the young and old populations, a two-sample t-test is performed for each complexity measure.

The results of the t-test are analyzed to check whether the mean complexity value of the beat-to-beat intervals of the young subjects is significantly greater than that of the old subjects.

The entire process is repeated with different values of L, and the minimum length at which each complexity measure is able to successfully classify the two groups is determined.
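The quantization and testing steps of this procedure can be sketched as follows. We assume equal-width binning and Welch's (unequal-variance) two-sample t-test, which is consistent with the varying degrees of freedom in the tables; the original description does not spell out these details, so treat this as one plausible realization:

```python
import math

def quantize(x, bins):
    """Equal-width binning of a real-valued series into symbols
    0..bins-1 (assumed scheme; only the number of bins is stated)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0          # guard against constant series
    return [min(int((v - lo) / width), bins - 1) for v in x]

def welch_t(a, b):
    """Welch two-sample t statistic and degrees of freedom, for testing
    whether mean(a) exceeds mean(b)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)   # sample variances
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

Each group's 20 averaged complexity values are fed to `welch_t`, and the one-sided p-value is then read off the Student t distribution with df degrees of freedom.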
5.2 Results summary
As per the procedure outlined above, complexity measures were calculated for data of different lengths. The continuous-valued beat-to-beat interval data was quantized using 4 bins and again using 8 bins, and the complexity measures were calculated in each case. A two-sample t-test was performed on the complexity values for various lengths, and the results are shown in Tables 1 and 2 respectively.
Table 1: t-test results for complexity measures using 4 bins.

Length | Complexity | Mean (Old) | Mean (Young) | t-value | df | p
L = 20 | ApEn | 0.718 | 0.769 | 3.29 | 30 | 0.001
L = 20 | LZ   | 0.871 | 0.928 | 2.76 | 30 | 0.005
L = 20 | ETC  | 0.712 | 0.729 | 1.86 | 28 | 0.037
L = 15 | ApEn | 0.588 | 0.621 | 2.24 | 27 | 0.017
L = 15 | LZ   | 0.905 | 0.948 | 2.36 | 34 | 0.012
L = 15 | ETC  | 0.771 | 0.778 | 0.87 | 27 | 0.195
L = 10 | ApEn | 0.358 | 0.385 | 1.41 | 38 | 0.083
L = 10 | LZ   | 0.958 | 0.963 | 0.38 | 38 | 0.352
L = 10 | ETC  | 0.869 | 0.871 | 0.15 | 38 | 0.439
Table 2: t-test results for complexity measures using 8 bins.

Length | Complexity | Mean (Old) | Mean (Young) | t-value | df | p
L = 15 | ApEn | 0.588 | 0.621 | 2.24 | 27 | 0.017
L = 15 | LZ   | 0.795 | 0.832 | 3.09 | 32 | 0.004
L = 15 | ETC  | 0.883 | 0.909 | 3.01 | 29 | 0.005
L = 10 | ApEn | 0.358 | 0.385 | 1.41 | 38 | 0.083
L = 10 | LZ   | 0.788 | 0.809 | 2.10 | 38 | 0.043
L = 10 | ETC  | 0.933 | 0.950 | 2.30 | 38 | 0.027
L = 8  | ApEn | 0.305 | 0.309 | 0.26 | 38 | 0.797
L = 8  | LZ   | 0.773 | 0.790 | 1.89 | 38 | 0.067
L = 8  | ETC  | 0.956 | 0.962 | 1.25 | 38 | 0.217
The t-test results for the analysis using 4 bins may be summarized as follows:

The mean ApEn complexity of the beat-to-beat intervals of old subjects is significantly less than that of young subjects for data lengths of 20 (t_{30} = 3.29, p = 0.001) and 15 (t_{27} = 2.24, p = 0.017), while it is not significantly less (t_{38} = 1.41, p = 0.083) for a data length of 10.

The mean LZ complexity of the beat-to-beat intervals of old subjects is significantly less than that of young subjects for data lengths of 20 (t_{30} = 2.76, p = 0.005) and 15 (t_{34} = 2.36, p = 0.012), while it is not significantly less (t_{38} = 0.38, p = 0.352) for a data length of 10.

The mean ETC complexity of the beat-to-beat intervals of old subjects is significantly less than that of young subjects for a data length of 20 (t_{28} = 1.86, p = 0.037), while it is not significantly less for data lengths of 15 (t_{27} = 0.87, p = 0.195) and 10 (t_{38} = 0.15, p = 0.439).
The t-test results for the analysis using 8 bins may be summarized as follows:

The mean ApEn complexity of the beat-to-beat intervals of old subjects is significantly less (t_{27} = 2.24, p = 0.017) than that of young subjects for a data length of 15, while it is not significantly less for data lengths of 10 (t_{38} = 1.41, p = 0.083) and 8 (t_{38} = 0.26, p = 0.797).

The mean LZ complexity of the beat-to-beat intervals of old subjects is significantly less than that of young subjects for data lengths of 15 (t_{32} = 3.09, p = 0.004) and 10 (t_{38} = 2.10, p = 0.043), while it is not significantly less (t_{38} = 1.89, p = 0.067) for a data length of 8.

The mean ETC complexity of the beat-to-beat intervals of old subjects is significantly less than that of young subjects for data lengths of 15 (t_{29} = 3.01, p = 0.005) and 10 (t_{38} = 2.30, p = 0.027), while it is not significantly less (t_{38} = 1.25, p = 0.217) for a data length of 8.
Tables 3 and 4 summarize the ability of the complexity measures to classify beat-to-beat intervals of old and young subjects, using 4 bins and 8 bins respectively.
Table 3: Classification ability using 4 bins.

Complexity measure | L = 20 | L = 15 | L = 10
ApEn | Yes | Yes | No
LZ   | Yes | Yes | No
ETC  | Yes | No  | No
Table 4: Classification ability using 8 bins.

Complexity measure | L = 15 | L = 10 | L = 8
ApEn | Yes | No  | No
LZ   | Yes | Yes | No
ETC  | Yes | Yes | No
6 Conclusions and Future Research Work
Based on our study of the experimental data, at a 5% significance level (overall error rate) for the statistical test, there is sufficient evidence to conclude that:

For data analyzed using 4 bins, the ApEn and LZ complexity measures are able to distinguish between beat-to-beat intervals of young and old subjects for lengths of 15 or more, while the ETC complexity measure is able to do so only for lengths of 20 or more.

For data analyzed using 8 bins, both the LZ and ETC complexity measures are able to distinguish between beat-to-beat intervals of young and old subjects for lengths as short as 10 data samples, while ApEn is able to do so only for lengths of 15 or more.
For future research work, it is important to study the effect of noise (which is unavoidable in real-life applications) and missing data on the complexity measures, and how these might impact the discrimination between cardiac signals of young and old subjects. This could be done by adding different levels of noise to the existing datasets, and by removing parts of the data to simulate the missing-data problem. This is also important from a clinical application perspective.
7 Acknowledgment
The authors would like to acknowledge Gayathri R Prabhu (Indian Institute of Technology (IIT), Chennai), Sutirth Dey (Indian Institute of Science Education and Research (IISER), Pune) and Sriram Devanathan (Amrita University) for their contributions to this work. We also thank Del Marshall (Amrita University) for valuable suggestions to improve the manuscript.
8 References
 [1] Shanthi Mendis, Pekka Puska, Bo Norrving, et al. Global atlas on cardiovascular disease prevention and control. World Health Organization, 2011.
 [2] Ary L Goldberger, Valmik Bhargava, Bruce J West, and Arnold J Mandell. On a mechanism of cardiac electrical stability: the fractal hypothesis. Biophysical Journal, 48(3):525, 1985.
 [3] Ary L Goldberger and Bruce J West. Fractals in physiology and medicine. The Yale Journal of Biology and Medicine, 60(5):421, 1987.
 [4] AL Goldberger, DR Rigney, J Mietus, EM Antman, and S Greenwald. Nonlinear dynamics in sudden cardiac death syndrome: heart-rate oscillations and bifurcations. Cellular and Molecular Life Sciences, 44(11):983–987, 1988.
 [5] Ary L Goldberger. Is the normal heartbeat chaotic or homeostatic? Physiology, 6(2):87–91, 1991.
 [6] Ary L Goldberger and David R Rigney. Nonlinear dynamics at the bedside. In Theory of Heart, pages 583–605. Springer, 1991.
 [7] Ary L Goldberger and David R Rigney. Chaos and fractals in human physiology. Scientific American, 262:42–49, 1990.
 [8] R Rössler, F Gotz, and OE Rössler. Chaos in endocrinology. Biophys. J, 25(2):216a, 1979.
 [9] Niels Wessel, Maik Riedl, and Jürgen Kurths. Is the normal heart rate "chaotic" due to respiration? Chaos: An Interdisciplinary Journal of Nonlinear Science, 19(2):028508, 2009.
 [10] Ary L Goldberger. Is the normal heartbeat chaotic or homeostatic? News Physiol Sci, 6:87–91, 1991.
 [11] Lewis A Lipsitz and Ary L Goldberger. Loss of complexity and aging. JAMA, 267(13):1806–1809, 1992.
 [12] DT Kaplan, MI Furman, SM Pincus, SM Ryan, LA Lipsitz, and AL Goldberger. Aging and the complexity of cardiovascular dynamics. Biophysical Journal, 59(4):945–949, 1991.
 [13] Robb W Glenny, H Thomas Robertson, Stanley Yamashiro, and James B Bassingthwaighte. Applications of fractal analysis to physiology. Journal of Applied Physiology, 70(6):2351–2367, 1991.
 [14] Nikhil Iyengar, C K Peng, Raymond Morin, A L Goldberger, and Lewis A Lipsitz. Age-related alterations in the fractal scaling of cardiac interbeat interval dynamics. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology, 271(4):R1078–R1084, 1996.
 [15] Anne Humeau, François Chapeau-Blondeau, David Rousseau, Pascal Rousseau, Wojciech Trzepizur, and Pierre Abraham. Multifractality, sample entropy, and wavelet analyses for age-related changes in the peripheral cardiovascular system: preliminary results. Medical Physics, 35(2):717–723, 2008.
 [16] Shamim Nemati, Bradley A Edwards, Joon Lee, Benjamin Pittman-Polletta, James P Butler, and Atul Malhotra. Respiration and heart rate complexity: effects of age and gender assessed by band-limited transfer entropy. Respiratory Physiology & Neurobiology, 189(1):27–33, 2013.
 [17] Anielle CM Takahashi, Alberto Porta, Ruth C Melo, Robison J Quitério, Ester da Silva, Audrey Borghi-Silva, Eleonora Tobaldini, Nicola Montano, and Aparecida M Catai. Aging reduces complexity of heart rate variability assessed by conditional entropy and symbolic analysis. Internal and Emergency Medicine, 7(3):229–235, 2012.
 [18] Nithin Nagaraj, Karthi Balasubramanian, and Sutirth Dey. A new complexity measure for time series analysis and classification. The European Physical Journal Special Topics, 222(3-4):847–860, 2013.
 [19] Steve Pincus. Approximate entropy (ApEn) as a complexity measure. Chaos: An Interdisciplinary Journal of Nonlinear Science, 5(1):110–117, 1995.
 [20] Abraham Lempel and Jacob Ziv. On the complexity of finite sequences. Information Theory, IEEE Transactions on, 22(1):75–81, 1976.
 [21] J Gómez-Pilar, GC Gutiérrez-Tobal, D Alvarez, F del Campo, and R Hornero. Classification methods from heart rate variability to assist in SAHS diagnosis. XIII Mediterranean Conference on Medical and Biological Engineering and Computing 2013, pages 1825–1828. Springer, 2014.
 [22] Deling Xia, Qingfang Meng, Yuehui Chen, and Zaiguo Zhang. Classification of ventricular tachycardia and fibrillation based on the Lempel-Ziv complexity and EMD. Intelligent Computing in Bioinformatics, pages 322–329. Springer, 2014.
 [23] Jing Hu, Jianbo Gao, and Jose Carlos Principe. Analysis of biomedical signals by the Lempel-Ziv complexity: the effect of finite data size. Biomedical Engineering, IEEE Transactions on, 53(12):2606–2609, 2006.
 [24] Mateo Aboy, Roberto Hornero, Daniel Abásolo, and Daniel Álvarez. Interpretation of the Lempel-Ziv complexity measure in the context of biomedical signal analysis. Biomedical Engineering, IEEE Transactions on, 53(11):2282–2288, 2006.
 [25] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000.