# Maximum approximate entropy and r threshold: a new approach for regularity changes detection

###### Abstract

Approximate entropy (ApEn) has been widely used as an estimator of regularity in many scientific fields. It has proved to be a useful tool because of its ability to distinguish between different system dynamics when only short, noisy data are available. Incorrect parameter selection (embedding dimension $m$, threshold $r$ and data length $N$) and the presence of noise in the signal can undermine the ApEn discrimination capacity. In this work we show that $r_{max}$ (the threshold at which $ApEn$ reaches its maximum, $ApEn_{max}$) can also be used as a feature to discern between dynamics. Moreover, the combined use of $ApEn_{max}$ and $r_{max}$ allows a better discrimination capacity to be accomplished, even in the presence of noise. We conducted our studies using real physiological time series and simulated signals corresponding to both low- and high-dimensional systems. When $ApEn_{max}$ is incapable of discerning between different dynamics because of the presence of noise, our results suggest that $r_{max}$ provides additional information that can be useful for classification purposes. Based on cross-validation tests, we conclude that, for short noisy signals, the joint use of $ApEn_{max}$ and $r_{max}$ can significantly decrease the misclassification rate of a linear classifier in comparison with their isolated use.

###### keywords:

Non-linear dynamics, Approximate entropy, Chaotic time-series.

Journal: Physica A: Statistical Mechanics and its Applications.

## 1 Introduction

The concept of changing complexity has proved helpful for characterizing and assessing different phenomena in areas such as seismology, economics, mechanics and physiology potirakis2012 (); isik2010 (); He2012 (); Vaillancourt2002 (). In the last 30 years this challenging endeavor has led researchers and practitioners to develop different methods to estimate and understand such complexity changes and their relationship with the dynamics of physical and biological systems. In the early nineties, Lipsitz et al. reported that the process of natural aging is associated with a decrease of complexity in the dynamics of physiological functions Lipsitz1992a (). This results in a loss in the capacity of the organism to adapt to stress, making it more vulnerable to disease.

Approximate entropy finds its origins in Kolmogorov-Sinai Entropy (K-S Entropy), defined as the mean rate of information generated by a process Kol1958 (); Eckmann1985 (). This measure is recognized as a meaningful parameter to describe the behavior of dynamical systems. In Grassberger1983 () Grassberger and Procaccia provided an algorithm to calculate a lower bound for the K-S Entropy from a finite time series. Takens Takens1983 () and Eckmann and Ruelle Eckmann1985 () modified this approach to directly evaluate the K-S Entropy. Motivated by the Eckmann-Ruelle (E-R) Entropy, Pincus introduced the ApEn Pincus1991 (), providing a statistic to assess complexity from noisy short-length data. For a time series of length $N$, ApEn depends on two parameters: the Embedding Dimension ($m$) and the Threshold ($r$). $ApEn(m, r)$ and $ApEn(m, r, N)$ can be seen respectively as a family of parametric statistics and estimators, designed to measure the regularity of a system. ApEn has been widely used as a non-linear feature to classify different dynamics, for example epileptic seizures Acharya2012a (); Srinivasan2007 (); Shen2011 () and sleep apnea Acharya2011 ().

Because of the bias introduced by counting self-matches and the finite data length ($N$), in Pincus1994 (); Richman2000a () the authors assert that the $ApEn$ estimator lacks consistency. To overcome this limitation, Richman et al. proposed the Sample Entropy (SampEn) as a more consistent regularity measure Richman2000a (). However, both ApEn and SampEn are highly dependent on the set of chosen parameters ($m$, $r$). Chon et al. Chon2009 () assert that neither ApEn nor SampEn is accurate in measuring the signal's complexity when the calculations are made with the values of $m$ and $r$ recommended in the literature Pincus1991e (). Instead, the use of $ApEn_{max}$, i.e. the maximum value of $ApEn$ over $r$ with fixed $m$ and $N$, was proposed as a more consistent estimator of the system's complexity Chon2009 (); Chen2005 (); Lu2008 ().

The signal's noise level has an important influence on $ApEn$ estimation and therefore $ApEn_{max}$ is also affected. Pincus asserts that the reliability of the calculations could be seriously undermined when the Signal to Noise Ratio (SNR) is below 3 dB Pincus1991e (). To overcome this issue, some authors have proposed a pre-processing step in which techniques such as Empirical Mode Decomposition (EMD) Alam2011 (); Lv2010 () or Dyadic Wavelet Transform (DWT) Ocak2009 () are used.

In this paper, we will show that $r_{max}$ (the $r$ value at which $ApEn = ApEn_{max}$) carries useful information and can also be used as a feature for classification purposes. Furthermore, the use of $r_{max}$ combined with $ApEn_{max}$ can provide a more consistent method to discern between different dynamics, even in the presence of noise.

The remainder of this paper is organized as follows. In Section 2 we briefly recall the main approaches used for ApEn parameter selection and we present the methodology used in our simulations. In Section 3 the obtained results are summarized and discussed. Finally, in Section 4 the conclusions are presented.

## 2 Methods

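In order to estimate $ApEn$ for a time series $x$ of length $N$, given the parameters $m$, $r$ and $\tau$, the $m$-dimensional embedded vectors $x_i^m = [x(i), x(i+\tau), \dots, x(i+(m-1)\tau)]$ with $1 \le i \le N-(m-1)\tau$ have to be considered. Then, the ApEn is defined as Pincus1991 ():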
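In the usual notation of the ApEn literature, this definition can be written as follows. The symbols are assumptions introduced here for illustration: $x(i)$ is the $i$-th sample, $x_i^m$ is the $i$-th $m$-dimensional embedded vector built with lag $\tau$, $d$ is the maximum-norm distance, and $C_i^m(r)$ counts self-matches.

$$
\begin{aligned}
d(x_i^m, x_j^m) &= \max_{0 \le k \le m-1} \left| x(i+k\tau) - x(j+k\tau) \right|, \\
C_i^m(r) &= \frac{\#\{\, j : d(x_i^m, x_j^m) \le r \,\}}{N-(m-1)\tau}, \\
\Phi^m(r) &= \frac{1}{N-(m-1)\tau} \sum_{i=1}^{N-(m-1)\tau} \ln C_i^m(r), \\
ApEn(m, r, \tau, N) &= \Phi^m(r) - \Phi^{m+1}(r).
\end{aligned}
$$

With $\tau = 1$ this reduces to the familiar form $ApEn(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r)$.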

The ApEn measures the logarithmic likelihood that two points ($x_i^m$, $x_j^m$) that are close (within a distance $r$) in an $m$-dimensional space remain close in an $(m+1)$-dimensional space. Greater (lesser) likelihood of remaining close produces smaller (larger) ApEn values Pincus1991a (). It is important to recall that $ApEn$ was not conceived as an approximate value of E-R Entropy, therefore it cannot certify chaos. Its usefulness lies instead in its ability to compare different types of dynamics Pincus1991 (). Pincus asserts that, for a given system, ApEn values can vary significantly with $m$ and $r$ Pincus1991e (). For this reason, it cannot be seen as an absolute measure. Moreover, this situation emphasizes the importance of parameter selection when drawing conclusions from ApEn estimations. In order to make this paper self-contained, we will review some results for parameter selection.
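As an illustration of this definition, the following is a minimal pure-Python sketch of an ApEn estimator. It is not the authors' implementation: the function names (`embed`, `phi`, `apen`), the maximum-norm distance and the example signals are assumptions made here, and the quadratic-time loop is only practical for short series.

```python
import math
import random


def embed(x, m, tau=1):
    """Return the m-dimensional delay vectors of x built with lag tau."""
    n_vectors = len(x) - (m - 1) * tau
    return [[x[i + k * tau] for k in range(m)] for i in range(n_vectors)]


def phi(x, m, r, tau=1):
    """Mean log-fraction of vectors within distance r (self-matches counted)."""
    vectors = embed(x, m, tau)
    n = len(vectors)
    total = 0.0
    for vi in vectors:
        # Maximum-norm distance between delay vectors (assumed here).
        count = sum(
            1 for vj in vectors
            if max(abs(a - b) for a, b in zip(vi, vj)) <= r
        )
        total += math.log(count / n)
    return total / n


def apen(x, m, r, tau=1):
    """Approximate entropy: phi at dimension m minus phi at dimension m + 1."""
    return phi(x, m, r, tau) - phi(x, m + 1, r, tau)


# Hypothetical example signals: a strictly periodic one and a random one.
periodic = [0.0, 1.0] * 50
rng = random.Random(0)
noisy = [rng.random() for _ in range(200)]

apen_periodic = apen(periodic, m=2, r=0.2)
apen_noisy = apen(noisy, m=2, r=0.2)
print(apen_periodic, apen_noisy)
```

The periodic signal yields a value close to zero and the random one a clearly larger value, consistent with the interpretation that a greater likelihood of remaining close produces smaller ApEn.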

### 2.1 Embedding dimension (m)

The main purpose of embedding a time series is to unfold the projection to a state space that is representative of the original system's space, i.e. a reconstructed attractor must preserve the invariant characteristics of the original one Faust2012 (). Takens' embedding theorem gives sufficient conditions to accomplish this task using any $m$ bigger than twice the Hausdorff dimension of the chaotic attractor. The idea is to estimate the minimum embedding dimension, since a bigger $m$ will lead to excessive computational effort. Kennel et al. proposed a parametric algorithm to determine the minimum embedding dimension, named False Nearest Neighbors Kennel992 (). Its main disadvantage is that the results depend strongly on the choice of the algorithm parameters. A slightly different approach was proposed by Cao Cao1997 (). This method does not rely upon subjective parameters other than the embedding lag.

Pincus has suggested using low values of $m$ Pincus1994 (); Pincus1991e (). That advice arises from the fact that, once $N$ is set, high $m$ values lead to poor estimations. This is due to the bias introduced by self-counting and the decreased number of vectors available to estimate $ApEn$. The aforementioned approach may be convenient when low-dimensional systems are studied. However, when the system's dimension is high, this criterion will lead to a poor reconstruction of the process dynamics Wolf1985 (); Small2005 (), causing inconsistencies in the presence of noise.

It is worthwhile noting that typical applications with ApEn have been conducted using the previously mentioned values of . Aletti et al. set to assess congenital heart malformation in children using Heart Rate Variability (HRV) signals Aletti2012 (). Zarjam et al. use and to calculate from electroencephalogram (EEG) signals to investigate changes in working memory load during the performance of a cognitive task with varying difficulty levels Zarjam2012 ().

### 2.2 Embedding Lag (τ)

The objective of selecting $\tau$ is to maximally spread the data in the phase space, removing redundancies and making fine features more easily discernible Small2005 (). In most ApEn applications $\tau$ is set to one. Kaffashi et al. Kaffashi2008a () concluded that, for time series generated by non-linear dynamics and whose Autocorrelation Function decays rapidly, $\tau = 1$ is sufficient to provide a good estimation of signal complexity. However, for signals with long-range correlation, a $\tau$ equal to the time of the first local minimum of the Mutual Information Function, or to the time of the first zero crossing of the Autocorrelation Function, can provide additional information Small2005 (); Kaffashi2008a ().
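As a rough illustration of the second criterion, the sketch below returns the first lag at which the sample autocorrelation stops being positive. The function names and the biased autocorrelation estimate are assumptions made here; the Mutual Information criterion is not covered by this sketch.

```python
import math


def autocorrelation(x, lag):
    """Biased sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def first_zero_crossing_lag(x, max_lag=None):
    """Smallest lag at which the autocorrelation becomes non-positive."""
    if max_lag is None:
        max_lag = len(x) // 2
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) <= 0.0:
            return lag
    return None  # no crossing found up to max_lag


# Hypothetical test signal: a sine with a period of 40 samples,
# whose autocorrelation crosses zero near a quarter period.
sine = [math.sin(2 * math.pi * n / 40) for n in range(400)]
tau = first_zero_crossing_lag(sine)
print(tau)
```

For a finite record the crossing can land one sample away from the theoretical quarter period, because the sum at each lag covers a slightly different number of samples.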

### 2.3 Threshold (r)

As mentioned above, the $ApEn$ statistics can vary significantly with $r$. Pincus suggests choosing $r$ as a fraction of the standard deviation (SD) of the raw signal Pincus2001 (); Pincus1992 (). The $r$ value should be large enough, not only to avoid a significant contribution from noise, but also to admit a reasonable number of vectors within a distance $r$. This would ensure an acceptable estimation of the probability Pincus1991e (). However, with too large $r$ values, $ApEn$ is unable to perform fine process distinctions and, consequently, the selection of the $r$ value will greatly depend on the application Pincus1994 ().

Although the latter approach has been broadly applied Sapoznikov1995a (); Pincus1996a (); Pincus1996b (), some authors assert that this methodology sometimes leads to an incorrect assessment of complexity Chon2009 (); Chen2005 (); Lu2008 (). They proposed the use of $ApEn_{max}$ as a better complexity estimator. One main issue arises from the fact that the calculation of $ApEn_{max}$ requires high computational effort. To overcome this limitation, a set of equations was proposed to calculate a parameter as an approximation to $r_{max}$ Chon2009 (); Lu2008 (). Supported by experimental results with HRV signals, Castiglioni et al. concluded that the use of $ApEn_{max}$ seems to be a reasonable approach, because this choice would allow the time series complexity to be better quantified than any other choice of $r$ Castiglioni2008 (). On the other hand, Liu et al. observed that $ApEn_{max}$ was incapable of distinguishing between groups of healthy and heart failure subjects in experiments with HRV signals. Further, since they found that the approximation fails in estimating $r_{max}$ for the Logistic map, they asserted that care must be taken when using it Liu2011 (). In a recent study, Boskovic et al. Boskovic2012 () present some evidence of $ApEn_{max}$ instability. They observed that, for two time series, the estimated $ApEn_{max}$ value suggests opposite results when the data length decreases. There are other algorithms conceived to reduce the computational effort of calculating the whole profile of ApEn as a function of $m$ and $r$ Zurek2012 (); Pan2011 ().
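The exhaustive search whose cost is discussed above can be sketched as a scan of ApEn over a grid of thresholds, keeping the maximum and the threshold where it occurs. This is only an illustrative sketch: the grid (0.05 to 1.0 times the SD), the helper names and the Gaussian test signal are hypothetical choices, not values taken from the paper.

```python
import math
import random


def apen(x, m, r, tau=1):
    """Approximate entropy with maximum-norm distance and self-matches."""
    def phi(dim):
        n = len(x) - (dim - 1) * tau
        vecs = [[x[i + k * tau] for k in range(dim)] for i in range(n)]
        logs = 0.0
        for vi in vecs:
            c = sum(1 for vj in vecs
                    if max(abs(a - b) for a, b in zip(vi, vj)) <= r)
            logs += math.log(c / n)
        return logs / n
    return phi(m) - phi(m + 1)


def apen_max(x, m, r_grid, tau=1):
    """Scan r_grid and return (ApEn_max, r_max) for fixed m and tau."""
    profile = [(apen(x, m, r, tau), r) for r in r_grid]
    return max(profile)  # ties are resolved toward the larger r


rng = random.Random(1)
signal = [rng.gauss(0.0, 1.0) for _ in range(100)]
mean = sum(signal) / len(signal)
sd = math.sqrt(sum((v - mean) ** 2 for v in signal) / len(signal))

# Hypothetical threshold grid, expressed in multiples of the signal SD.
r_grid = [0.05 * k * sd for k in range(1, 21)]
value, r_at_max = apen_max(signal, 2, r_grid)
print(value, r_at_max / sd)
```

The cost grows with the number of grid points times the square of the data length, which is the computational burden that the approximations cited above try to avoid.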

| Model | Equation | Model Parameters | ApEn Parameters |
|---|---|---|---|
| Mackey-Glass | | | |
| Shilnikov's type | | | |

### 2.4 Simulations

In the presence of noise, the $ApEn_{max}$ estimator may be unable to discern between different dynamics. Here we address the hypothesis that $r_{max}$ provides additional information that is valuable for the discrimination process. In other words, the use of both $ApEn_{max}$ and $r_{max}$ would increase the ability to discern between different complexities in the case of noisy time series. To assess this hypothesis four simulations were conducted: three of them with synthetic signals and the last one with an EEG record.

As a first case, the Mackey-Glass delay-differential equation was used Cao1997 (). Our aim is to assess these estimators on time series from a high-dimensional system Guevara1983 (). This system has been used not only to study the behavior of complexity estimators in high dimensions Grassberger1983 () but also to model the dynamics of physiological control systems such as the neurological system Guevara1983 (), the respiratory system Mackey1977 () and the hematopoietic system Mackey1977 ().

Two sets of realizations were produced for each value of the parameter (see Table 1). Each realization with points has a different initial condition, randomly chosen from a distribution. In order to avoid the influence of transients, the first points of each realization were discarded. The resulting signals (with length ) were normalized to have unitary energy. For two randomly selected signals, one from each set, the Mutual Information Function was calculated. Then, the lag corresponding to each first local minimum was selected, and the parameter $\tau$ was fixed as the largest of these values. $ApEn$ was calculated for each signal, with $m$ and $r$ taking the values listed in Table 1, and $ApEn_{max}$ and $r_{max}$ were found from the resulting $ApEn$ functions. Additionally, an estimator of the minimum embedding dimension was calculated for all signals using Cao's algorithm Cao1997 ().

With the goal of analyzing synthetic data from a system that resembles a particular physiological dynamics, a Shilnikov’s type chaos model was considered as a second case. The same methodology as in the first case was adopted for Shilnikov’s type model using two values of the parameter (see Table 1), which allow simulating EEG signals recorded during a seizure of petit mal epilepsy Friedrich1992 (). The initial conditions for each realization were selected from a distribution and the variable was used for the calculations.

In order to evaluate our method in the presence of noise, white Gaussian noise was added to each signal (Mackey-Glass and Shilnikov) with SNR dB and SNR dB. Then, all realizations were normalized to have unitary energy and both $ApEn_{max}$ and $r_{max}$ were calculated as previously described. Table 1 summarizes the models and the parameter values used to obtain the time series, as well as the parameter values used to calculate the ApEn.
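The noise contamination and energy normalization steps can be sketched as follows. This is a minimal illustration under stated assumptions: signal power is taken as the mean squared sample value, the target SNR holds in expectation rather than exactly for each realization, and the function names, the sine test signal and the 10 dB level are hypothetical.

```python
import math
import random


def unit_energy(x):
    """Scale x so that the sum of its squared samples equals one."""
    energy = sum(v * v for v in x)
    scale = 1.0 / math.sqrt(energy)
    return [v * scale for v in x]


def add_white_noise(x, snr_db, rng=random):
    """Add zero-mean white Gaussian noise at the requested SNR (in dB)."""
    signal_power = sum(v * v for v in x) / len(x)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


# Hypothetical clean signal and noise level, for illustration only.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * n / 50) for n in range(5000)]
contaminated = add_white_noise(clean, snr_db=10.0, rng=rng)
noisy = unit_energy(contaminated)

# Realized SNR of this particular noise realization.
noise = [a - b for a, b in zip(contaminated, clean)]
realized_snr_db = 10.0 * math.log10(
    sum(v * v for v in clean) / sum(v * v for v in noise)
)
print(realized_snr_db)
```

Normalizing after the noise is added, as in the text, means the threshold grid used for the ApEn profile is expressed on the same scale for clean and noisy realizations.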

A real physiological signal recorded using stereo electroencephalography (EEG) with eight multilead electrodes (2 mm long and 1.5 mm apart) was studied. It was filtered and amplified using a 1-40 Hz band-pass filter. A four-pole Butterworth filter was used as an anti-aliasing low-pass filter. This signal was digitized at 256 Hz through a 10-bit A/D converter. A physician analyzed the pre-ictal and ictal data by visual inspection of the EEG record. According to the visual assessment of the EEG seizure recording, the patient presented an epileptogenic area in the hippocampus with immediate propagation to the cingulate gyrus and the supplementary motor area, on the left hemisphere. In Fig. 1, the EEG signal of two ictal and two pre-ictal episodes corresponding to a depth electrode in the hippocampus is presented. All these episodes contain 3000 data samples. The first pre-ictal and ictal episodes comprise the signal portions for and data points respectively. The second pre-ictal and ictal portions were selected for and respectively. Each of the data sets was normalized to have unitary energy and the parameter $\tau$ was selected as described above among the four signals. $ApEn_{max}$ and $r_{max}$ were then calculated for . Additionally, white Gaussian noise was added to the raw EEG signal with SNR dB (the actual SNR of the EEG signal is unknown) and $ApEn_{max}$ and $r_{max}$ were calculated again.

## 3 Results and Discussion

Fig. 2 summarizes the results obtained for the Mackey-Glass model simulations. The mean and confidence interval (CI) of $ApEn_{max}$ and $r_{max}$ are presented as functions of $m$ for two different values of the model parameter. The CIs were empirically obtained by sorting the $ApEn_{max}$ and $r_{max}$ values calculated from the realizations and taking the and the quantiles as the lower and upper bound respectively. In Fig. 2(a), it can be noticed that the curves of $ApEn_{max}$ become closer as $m$ increases, achieving the maximum distance at . On the contrary, in Fig. 2(b) it can be observed that the distance between the $r_{max}$ curves becomes larger as $m$ increases. Figs. 2(c) and 2(d) show the effect of noise on the $ApEn_{max}$ and $r_{max}$ estimations. First, notice that, compared to the noise-free figures, the mean values of $ApEn_{max}$ and $r_{max}$ are increased by the addition of noise. Additionally, in both cases, the CIs are reduced. Finally, the $ApEn_{max}$ and $r_{max}$ curves for different parameter values are closer to each other than in the case without noise. It is important to remark that, in the presence of noise, while $r_{max}$ is still able to discern between dynamics, the discrimination capacity of $ApEn_{max}$ is highly reduced. In conclusion, these results suggest that $r_{max}$ can bring useful information even in the presence of noise.
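The empirical confidence intervals described above amount to sorting the estimates and reading off two order statistics. A minimal sketch follows; the quantile levels used in the example (0.05 and 0.95) and the nearest-rank rule are assumptions made here for illustration, since the levels used in the paper are not reproduced in this text.

```python
def empirical_ci(values, lower_q, upper_q):
    """Return the (lower, upper) empirical quantiles by nearest rank."""
    ordered = sorted(values)
    n = len(ordered)

    def pick(q):
        # Nearest-rank index into the sorted sample, clamped to valid range.
        idx = min(n - 1, max(0, int(round(q * (n - 1)))))
        return ordered[idx]

    return pick(lower_q), pick(upper_q)


# Hypothetical sample standing in for the estimates of one configuration.
sample = list(range(101))
low, high = empirical_ci(sample, 0.05, 0.95)
print(low, high)
```

Because the bounds are order statistics, they make no distributional assumption about the estimates, which matters when the estimator distributions are skewed.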

The results for the Shilnikov's type chaos model are presented in Fig. 3. In Fig. 3(a), it can be noticed that it is impossible to distinguish the two dynamics using $ApEn_{max}$ calculated with . Nevertheless, embedding the system in a higher dimension such as (minimum embedding dimension), distinctions between dynamics can be made. However, in Fig. 3(b), it can be seen that for , $r_{max}$ indicates a difference between dynamics. The added noise has the same influence mentioned above on both $ApEn_{max}$ and $r_{max}$ (see Figs. 3(c) and 3(d)). However, in this case the curves are closer than those corresponding to the Mackey-Glass system. From this simulation we can conclude that using $ApEn_{max}$ or $r_{max}$ independently can be inconvenient for classification purposes. Instead, we propose to study the combined use of both estimators for this task.

In order to illustrate the advantages of this new approach, Fig. 4 shows scatter plots of vs for both models with noise (SNR dB), using and . In the presence of noise, it is enough to set and to use only to correctly differentiate the two dynamics from the Mackey-Glass model (see Fig. (c)c). However, in Fig. (a)a it can be noticed that provides additional information that can make easier the classification process. A slightly different situation can be appreciated for Shilnikov’s type dynamics. Fig. (c)c shows that it is not possible to discern between classes using calculated with . Nevertheless, with the information brought by , the two classes can be separated in a more suitable way. As presented before, when there is noise in the signal, the assessment of using an value equal or larger than the minimum embedding dimension could be more accurate and robust than just setting . Cao’s algorithm suggests that the minimum embedding dimension for both models should be . Such large value is the result of noise influence in the estimation of the systems’ minimum embedding dimension. The issue is that, for the Mackey-Glass model, losses its discrimination capacity for high values. Nonetheless, as can be appreciated in Fig. (b)b, the two classes can be still successfully separated using only . On the other hand, Fig. (d)d shows that for , the two different dynamics from the Shilnikov’s type model can be more conveniently clustered using than using . These results remark the importance of using both estimators together instead of each one individually.

In addition to the Mackey-Glass and Shilnikov's systems, we introduce calculations of $ApEn_{max}$ and $r_{max}$ using dynamics from the Logistic map, $x_{n+1} = R\,x_n(1 - x_n)$, in two chaotic regimes (, ) with and without white Gaussian noise. $ApEn_{max}$ and $r_{max}$ were evaluated using the aforementioned procedure for with and .
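For reference, a Logistic map realization can be generated with a few lines of code. The sketch below is illustrative only: the growth parameter value (3.9), the initial condition, the number of discarded transient points and the function name are hypothetical and are not the values used in the paper.

```python
def logistic_map(growth, x0, n, discard=0):
    """Iterate x -> growth * x * (1 - x) and return n points after a transient."""
    x = x0
    orbit = []
    for i in range(n + discard):
        x = growth * x * (1.0 - x)
        if i >= discard:
            orbit.append(x)
    return orbit


# Hypothetical chaotic regime and initial condition.
orbit = logistic_map(growth=3.9, x0=0.3, n=500, discard=100)
print(orbit[:5])
```

Discarding an initial transient mirrors the treatment of the other simulated systems, so that the analyzed points lie on the attractor.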

With the goal of quantitatively verifying the proficiency of $ApEn_{max}$ and $r_{max}$ as classification features, we performed a 10-fold cross-validation using linear support vector machines (SVMs). We chose a linear classifier because its simplicity exposes the real quality of the features. The basic idea behind SVMs is to separate the classes using the optimal hyperplane (the linear decision function that maximizes the distance between the hyperplane and the closest points of the different classes) Vap2000 (). In Fig. 5 the Misclassification Rates (MR) for three classifiers are presented as functions of $m$ for different noise levels (noiseless, SNR dB and SNR dB). The first classifier uses only $ApEn_{max}$ as input feature, the second one uses only $r_{max}$, and the third one uses both estimators jointly.

For the noiseless Mackey-Glass system, it can be seen in Fig. (a)a that the MR of the first classifier increases with , achieving its maximum () for . Further, the second classifier presents a non-zero MR only for . The MR for the third classifier is zero for with a maximum value of at . A similar behavior can be observed for the Mackey-Glass model immersed in noise (SNR dB). In contrast with the noiseless case, in Fig. (b)b it is shown that the MR of the classifier which uses only has been greatly increased. Additionally, using only , the classifier has non-zero MR for . Nevertheless, the MR of the classifier that uses both estimators still remains equal to zero for values. For the case in which the SNR dB (Fig. (c)c), it can be noticed that the MR of the third classifier is always below or equal to the lowest MR between the other two classifiers. The last results attest that, as an ensemble, and provide features that are robust against noise.

Regarding the results for the noiseless Shilnikov’s model, it is shown in Fig. (d)d that, for the MR of the first classifier is lower than the second classifier’s MR. For as well as for , the last statement is reversed. However, the MR for the third classifier remains below the MR of the other two ones, being zero for and for . Additionally, from Fig. (e)e it can be noticed that for all values the MR of the third classifier is always below or equal to the lowest MR value between the other two classifiers, being zero for and for . For the SNR dB case, it can be observed in Fig. (f)f that for very low MR values are achieved by the first and third classifiers, being zero for .

Very similar results were achieved with the logistic map (Figs. 5(g), 5(h), 5(i)). The MR of the classifier that uses $ApEn_{max}$ and $r_{max}$ jointly is zero for all values in the noiseless case and for in the SNR= dB case. From Fig. 5(i) it can be seen that the MR of the third classifier is below the MR of the other classifiers for all values, being zero for and and achieving a very low value for . These results point to the usefulness of these estimators for discriminating dynamics from discrete-time non-linear systems.

It is important to notice that when these three systems were immersed in high levels of noise (see Figs. 5(c), 5(f) and 5(i)) the worst results were achieved for low $m$ values (especially for ). This suggests that increasing the embedding dimension could be beneficial for the discrimination process. In conclusion, these results highlight the complementary relationship between both estimators and the benefits of using them together. It is also important to observe that the use of both estimators enlarges the range of $m$ values that can be selected to achieve a good classification performance in the presence of noise. Nonetheless, using an estimate of the minimum embedding dimension can be a wise choice (see Figs. 5(c) and 5(f) for ).

There is an interesting fact in these results concerning the presence of noise in the time series. As discussed before, the addition of noise not only decreases the distance between the $ApEn_{max}$ and $r_{max}$ curves of different dynamics but also reduces the CI of both estimators. The trade-off between these two phenomena is more evident as the SNR is reduced. In Figs. 5(b) and 5(c), it can be seen that for the MR of the first classifier is larger for SNR dB than for SNR dB. As a consequence of this trade-off, the distributions of $ApEn_{max}$ values for two different dynamics overlap more in the case with SNR dB than when SNR dB. The last statement can be verified by comparing the Bhattacharyya coefficient ($BC$) Kailath1967 () between the distributions of different dynamics for different SNRs. For two density functions $p$ and $q$ over the same domain $X$, this coefficient is defined as $BC(p, q) = \int_X \sqrt{p(x)\,q(x)}\,dx$, with $0 \le BC \le 1$, being zero if $p$ and $q$ do not overlap. The coefficients between distributions () for SNR dB and SNR= dB are and respectively. This fact explains why the MR of this classifier is lower for SNR dB than for SNR dB when high values are used with systems like Mackey-Glass and Shilnikov's.
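In practice the overlap between two sets of estimates can be approximated by applying the discrete form of the Bhattacharyya coefficient to normalized histograms. The sketch below is one hypothetical way to do this; the shared binning, the number of bins and the function name are assumptions made here, not the procedure used in the paper.

```python
import math


def bhattacharyya_coefficient(a, b, bins=20):
    """Histogram estimate of the Bhattacharyya coefficient of samples a and b."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # guard against identical constant samples

    def hist(values):
        counts = [0] * bins
        for v in values:
            # The largest value is clamped into the last bin.
            idx = min(bins - 1, int((v - lo) / width))
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p, q = hist(a), hist(b)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Hypothetical samples: identical sets overlap fully, distant sets not at all.
same = [0.1 * k for k in range(50)]
bc_same = bhattacharyya_coefficient(same, same)
bc_disjoint = bhattacharyya_coefficient([0.0, 0.1, 0.2], [10.0, 10.1, 10.2])
print(bc_same, bc_disjoint)
```

The estimate depends on the bin count, so comparisons between noise levels are only meaningful when the same binning rule is used for both.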

An important topic that must be considered in the calculation of and is the data length. When the time series is short, the choice of large and values can be harmful because the estimation of conditional probabilities becomes unreliable Pincus1991 (); Pincus1994 (). However, there is another issue that can alter their estimation and it is related to the use of small values.

It is known that poor state space reconstructions are obtained when the system is embedded with an $m$ value smaller than the system's minimum embedding dimension, and such a situation leads to the occurrence of false neighbors Wolf1985 (), i.e. points that are close due to a low embedding dimension rather than because of the system's dynamics. Given that the estimation of conditional probabilities is based on counting neighbors' occurrences, an appropriate selection of the $m$ value requires taking into account an estimate of the minimum embedding dimension Small2005 (). With the aim of assessing the behavior of our method as a function of the data length, the next simulation was conducted on the Logistic map and the Mackey-Glass system.

Two sets of realizations were built. Each set was obtained using a different value of the parameter for the Logistic map and, of the parameter for the Mackey-Glass system. Each signal of these sets was normalized to have unitary energy. For the Logistic map and were estimated with , and . For the Mackey-Glass system these estimators were evaluated with , and . Then, the misclassification rate of a linear SVM classifier that uses both estimators as features was computed using Leave one Out cross-validation. Additionally, the same procedure was used over the same signals contaminated with white Gaussian additive noise (SNR dB and SNR dB). It is important to mention that the values of and were suggested by the Cao’s algorithm Cao1997 () as minimum embedding dimensions for the noisy signals (SNR dB) from the Logistic map and, from the Mackey-Glass system respectively. For the Mackey-Glass system the calculations with were made for all values except .

In Fig. 6 it is shown the misclassification rate calculated for both systems as a function of and the noise level. In Fig. (a)a are shown the results for the noise free Mackey-Glass system. It can be observed that for small data length values the biggest errors are achieved using () and (); this is a consequence of the reduced amount of information available to estimate the conditional probabilities. Nevertheless, as is increased, the error for all values goes to zero. It must be noticed that for and the error is zero for all values.

On the other hand Fig. (b)b shows that, compared with the noiseless case, for (at ) and (at ) the error has increased its value from zero, whereas the error for and has decreased its values to zero for . Observe that the error is zero from regardless the value of . From Fig. (c)c it can be seen that, excluding the error (equal to ) for at , the biggest error is accomplished using followed by the one obtained with for . However, the error for and is always lesser or equal to the error achieved with or , moreover, it is zero starting from . Comparing Figs. (a)a and (c)c for m = 2 and m = 3 it is clear that, for small values, a poor state space reconstruction added to the presence of noise deteriorates the discrimination capacity of and .

For the noiseless Logistic map (Fig. (d)d) it can be observed that for the biggest error () belongs to the estimators calculated with followed by the error calculated with (). However, for and the error is zero. Moreover, as is increased, the error remains equal to zero for all values. It can be seen Fig. (e)e that the biggest error is achieved with for all values except . Instead, for the error is equal to zero for all values. It is worth mentioning that for all values the error obtained with and is always below or equal to the error attain with and . From Fig. (f)f it can be noticed that using produces the worst classification error regardless the value of and the best results are accomplished using and for almost all values.

Based on these results we can conclude that, when estimating $ApEn_{max}$ and $r_{max}$, it is highly recommended to keep in mind that there exists a trade-off between $m$ and $N$, and that special attention is needed in the presence of noise. When the data length is short and there is no noise in the signal, relatively small $m$ values provide the best performance. However, in the presence of noise, it would be wise either to use an estimate of the system's minimum embedding dimension whenever possible, or to use a value as close as possible to it when the data length is a limitation. It must also be considered that in real applications, such as the detection of epileptic seizures, the duration of some events is only of a few samples: for example absence seizures often last less than seconds Shorvon2004 (), which correspond to samples using a standard sampling frequency of Hz. Although for small $N$ values there is no guarantee of an accurate estimation of $ApEn_{max}$ or $r_{max}$ with relatively high $m$ values, the results presented here show that using $m$ values above or can increase the discrimination capacity of these estimators, especially in the presence of noise.

The studies conducted on the EEG recording provided results similar to those obtained with the previous simulations. Fig. 7 presents the $ApEn_{max}$ and $r_{max}$ curves as functions of $m$ for two ictal episodes and their respective pre-ictal segments. The distances (relative to the scale) between the $ApEn_{max}$ curves of each ictal episode and its corresponding pre-ictal episode are small for all the $m$ values, as can be observed in Fig. 7(a). On the contrary, Fig. 7(b) suggests that $r_{max}$ can be used to discriminate between dynamics. Decreasing the SNR tends to reduce the distance between the $ApEn_{max}$ and $r_{max}$ curves (see Figs. 7(c) and 7(d)). However, for high $m$ values, the information given by $r_{max}$ can be useful to distinguish between dynamics.

It is worth mentioning that, for this signal and with these estimators, it is difficult to separate the ictal and pre-ictal episodes as isolated groups. However, it is possible to state differences between an ictal episode and its corresponding pre-ictal one. This result leads us to think that a suitable approach to detect ictal episodes from EEG signals, using these estimators, should be one in which their temporal evolution could be evaluated.

In order to assess this idea, we corrupted the EEG signal with white Gaussian noise (SNR dB) and we considered sliding windows of length , shifted data points. Each window was normalized to have unitary energy. $ApEn_{max}$ and $r_{max}$ were estimated using for . With these results we proceeded as follows: first, we built the matrices and , where the entry was the value of $ApEn_{max}$ calculated with the i-th value of $m$ for the k-th window. The matrix was built in the same way with the $r_{max}$ values. Each matrix was statistically normalized (zero mean and unitary SD) by columns. Observe that the temporal evolution of $ApEn_{max}$ and $r_{max}$ calculated with the i-th $m$ value can be evaluated by looking at the i-th column of the and matrices respectively. A third matrix named was formed by the horizontal concatenation of the above-mentioned matrices: . Next, we performed a Principal Component Analysis (PCA) on each matrix. The first principal component (PC) can be thought of as a summary that best represents the information collected by these estimators through all $m$ values. Finally, an algorithm for the detection of abrupt mean changes (CUSUM) was applied to the PC of each matrix Montgomery2010 (). The target mean and the reference value were fixed as the average of the first twenty data points and two times their SD, respectively.
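The change-detection step can be illustrated with a tabular CUSUM in which, as described above, the target mean is the average of the first twenty points and the reference value is two times their SD. This is a sketch under assumptions: the two-sided tabular form, the use of the sample SD, the function name and the synthetic series are choices made here, and the decision threshold applied to the statistics is left to the caller because it is not stated in the text.

```python
import math


def cusum(series, n_baseline=20, k_factor=2.0):
    """Return the upper and lower tabular CUSUM statistics of a series."""
    base = series[:n_baseline]
    mu0 = sum(base) / len(base)                       # target mean
    sd = math.sqrt(sum((v - mu0) ** 2 for v in base) / (len(base) - 1))
    k = k_factor * sd                                 # reference value
    upper, lower = [], []
    c_plus = c_minus = 0.0
    for v in series:
        # Accumulate deviations beyond the reference value, reset at zero.
        c_plus = max(0.0, v - (mu0 + k) + c_plus)
        c_minus = max(0.0, (mu0 - k) - v + c_minus)
        upper.append(c_plus)
        lower.append(c_minus)
    return upper, lower


# Hypothetical series: a stable baseline followed by an abrupt upward shift.
series = [0.0, 0.1] * 10 + [5.0] * 10
upper, lower = cusum(series)
print(upper[-1], lower[-1])
```

The upper statistic stays at zero during the baseline and grows steadily after the shift, which is the behavior used to flag an abrupt change in the mean of the first principal component.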

Fig. 8 presents the EEG signal with the four ictal episodes marked between vertical dashed lines (Fig. 8(a)) and the results of the CUSUM algorithm applied to the PC of each matrix (Fig. 8(b)). It can be observed in Fig. 8(b) that all ictal episodes can be detected using the information contained in each matrix. Nevertheless, while some ictal episodes are better detected with one estimator, others are better detected with the other. On the other hand, a more consistent identification of ictal episodes can be achieved using both estimators jointly. These results suggest that, with the information provided by both $ApEn_{max}$ and $r_{max}$ for different $m$ values, the ability to discriminate between different dynamics can be increased (even in the presence of noise), since changes that cannot be identified in the temporal evolution of one estimator could be identified in the temporal evolution of the other one. It must be remarked that these findings only suggest the suitability of using both estimators jointly to detect ictal episodes from EEG signals. Further experiments with a large database will be conducted in future work to statistically assess the performance of the proposed method in detecting complexity changes in real signals.

## 4 Conclusions

Approximate entropy has been recognized for its ability to distinguish between the dynamics of different systems when short-length data with moderate noise are available. However, it is also known that high noise levels and incorrect parameter selection can undermine its discrimination capacity. In order to overcome these difficulties, in this paper we have proposed a method based on the use of along with to discern between different dynamics. Using real physiological signals and signals simulated from low- and high-dimensional systems, with and without noise, we have studied the behavior of and as functions of the embedding dimension, the data length and the noise level. The results indicate that, even in the presence of noise, provides valuable information that can be used for classification purposes. Furthermore, as these estimators vary with , there is a complementary relationship between them, which strengthens the idea of using combined with to distinguish between dynamics. Cross-validation simulations have demonstrated that the joint use of both estimators as input features significantly decreases the misclassification rate of a simple linear classifier. Moreover, the joint use of both estimators enlarges the range of values that can be chosen to achieve a good classification performance. Concerning the data length, we have shown that for short-length signals good discriminating features can be achieved using relatively small values if there is no noise. However, in the presence of noise the discrimination capacity of and can be increased using values above or . Our results encourage the use of an estimate of the system's minimum embedding dimension when possible, or the use of a close enough value when the data length is a limitation. We assert that as well as , the estimator can also be utilized to discern between dynamics even in the presence of noise. Moreover, the use of has proved helpful in cases where is unable to distinguish between processes that are immersed in noise. The link between and system complexity will be addressed in future studies, to reveal the nature of this relationship and its physical meaning.

## Acknowledgments

This work was supported by the National Agency for Scientific and Technological Promotion (ANPCyT), Universidad Nacional de Entre Ríos, and the National Scientific and Technical Research Council (CONICET) of Argentina.

## References

- (1) S. M. Potirakis, G. Minadakis, K. Eftaxias, Analysis of electromagnetic pre-seismic emissions using Fisher information and Tsallis entropy, Physica A: Statistical Mechanics and its Applications 391 (1) (2012) 300–306.
- (2) F. Isik, An entropy-based approach for measuring complexity in supply chains, International Journal of Production Research 48 (12) (2010) 3681–3696.
- (3) Y. He, J. Huang, B. Zhang, Approximate entropy as a nonlinear feature parameter for fault diagnosis in rotating machinery, Measurement Science and Technology 23 (4) (2012) 045603.
- (4) D. E. Vaillancourt, K. M. Newell, Changing complexity in human behavior and physiology through aging and disease, Neurobiology of Aging 23 (1) (2002) 1–11.
- (5) L. A. Lipsitz, A. L. Goldberger, Loss of complexity and aging, Journal of the American Medical Association 267 (13) (1992) 1806–1809.
- (6) A. Kolmogorov, New metric invariant of transitive dynamical systems and endomorphisms of Lebesgue spaces, Doklady of Russian Academy of Sciences 119 (N5) (1958) 861–864.
- (7) J. P. Eckmann, D. Ruelle, Ergodic theory of chaos and strange attractors, Reviews of Modern Physics 57 (3) (1985) 617–656.
- (8) P. Grassberger, I. Procaccia, Estimation of the Kolmogorov entropy from a chaotic signal, Physical Review A 28 (4) (1983) 2591–2593.
- (9) F. Takens, Mechanical and gradient systems: local perturbations and generic properties, Boletim da Sociedade Brasileira de Matemática 14 (2) (1983) 147–162.
- (10) S. M. Pincus, Approximate entropy as a measure of system complexity, Proceedings of the National Academy of Sciences 88 (6) (1991) 2297–2301.
- (11) U. R. Acharya, F. Molinari, S. V. Sree, S. Chattopadhyay, K.-H. Ng, J. S. Suri, Automated diagnosis of epileptic EEG using entropies, Biomedical Signal Processing and Control 7 (4) (2012) 401–408.
- (12) V. Srinivasan, C. Eswaran, N. Sriraam, Approximate entropy-based epileptic EEG detection using artificial neural networks, IEEE Transactions on Information Technology in Biomedicine 11 (3) (2007) 288–295.
- (13) C. Shen, C. Chan, F. Lin, M. Chiu, J. Lin, J. Kao, C. Chen, F. Lai, Epileptic seizure detection for multichannel EEG signals with support vector machines, in: IEEE 11th International Conference on Bioinformatics and Bioengineering, 2011, pp. 39–43.
- (14) U. R. Acharya, E. C.-P. Chua, O. Faust, T.-C. Lim, L. F. B. Lim, Automated detection of sleep apnea from electrocardiogram signals using nonlinear parameters, Physiological Measurement 32 (3) (2011) 287.
- (15) S. M. Pincus, A. L. Goldberger, Physiological time-series analysis: what does regularity quantify?, American Journal of Physiology-Heart and Circulatory Physiology 266 (4) (1994) H1643–H1656.
- (16) J. S. Richman, J. R. Moorman, Physiological time-series analysis using approximate entropy and sample entropy, American Journal of Physiology-Heart and Circulatory Physiology 278 (6) (2000) H2039–H2049.
- (17) K. Chon, C. Scully, S. Lu, Approximate entropy for all signals, IEEE Engineering in Medicine and Biology Magazine 28 (6) (2009) 18–23.
- (18) S. Pincus, I. Gladstone, R. Ehrenkranz, A regularity statistic for medical data analysis, Journal of Clinical Monitoring 7 (4) (1991) 335–345.
- (19) X. Chen, I. Solomon, K. Chon, Comparison of the use of approximate entropy and sample entropy: applications to neural respiratory signal, in: 27th Annual International Conference of the Engineering in Medicine and Biology Society, 2005, pp. 4212–4215.
- (20) S. Lu, X. Chen, J. K. Kanters, I. C. Solomon, K. H. Chon, Automatic selection of the threshold value r for approximate entropy, IEEE Transactions on Biomedical Engineering 55 (8) (2008) 1966–1972.
- (21) S. Alam, M. I. H. Bhuiyan, Detection of epileptic seizures using chaotic and statistical features in the EMD domain, in: Annual IEEE India Conference, 2011, pp. 1–4.
- (22) L. Jianxin, W. Husheng, T. Jie, Feature extraction and application of engineering non-stationary signals based on EMD-approximate entropy, in: International Conference on Computer, Mechatronics, Control and Electronic Engineering, Vol. 5, 2010, pp. 222–225.
- (23) H. Ocak, Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy, Expert Systems with Applications 36 (2) (2009) 2027–2036.
- (24) S. Pincus, Approximate entropy: a complexity measure for biological time series data, in: Proceedings of the IEEE Seventeenth Annual Northeast Bioengineering Conference, 1991, pp. 35–36.
- (25) O. Faust, M. G. Bairy, Nonlinear analysis of physiological signals: a review, Journal of Mechanics in Medicine and Biology 12 (4) (2012) 1240015.
- (26) M. B. Kennel, R. Brown, H. D. I. Abarbanel, Determining embedding dimension for phase-space reconstruction using a geometrical construction, Physical Review A 45 (6) (1992) 3403–3411.
- (27) L. Cao, Practical method for determining the minimum embedding dimension of a scalar time series, Physica D: Nonlinear Phenomena 110 (1–2) (1997) 43–50.
- (28) A. Wolf, J. B. Swift, H. L. Swinney, J. A. Vastano, Determining Lyapunov exponents from a time series, Physica D: Nonlinear Phenomena 16 (3) (1985) 285–317.
- (29) M. Small, Applied Nonlinear Time Series Analysis: Applications in Physics, Physiology and Finance, World Scientific, 2005.
- (30) F. Aletti, M. Ferrario, T. Almas de Jesus, R. Stirbulov, A. Borghi Silva, S. Cerutti, L. Malosa Sampaio, Heart rate variability in children with cyanotic and acyanotic congenital heart disease: analysis by spectral and non linear indices, in: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2012, pp. 4189–4192.
- (31) P. Zarjam, J. Epps, N. Lovell, F. Chen, Characterization of memory load in an arithmetic task using non-linear analysis of EEG signals, in: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2012, pp. 3519–3522.
- (32) F. Kaffashi, R. Foglyano, C. G. Wilson, K. A. Loparo, The effect of time delay on approximate and sample entropy calculations, Physica D: Nonlinear Phenomena 237 (23) (2008) 3069–3074.
- (33) S. M. Pincus, Assessing serial irregularity and its implications for health, Annals of the New York Academy of Sciences 954 (1) (2001) 245–267.
- (34) S. M. Pincus, D. L. Keefe, Quantification of hormone pulsatility via an approximate entropy algorithm, American Journal of Physiology - Endocrinology And Metabolism 262 (5) (1992) E741–E754.
- (35) D. Sapoznikov, M. H. Luria, M. S. Gotsman, Detection of regularities in heart rate variations by linear and non-linear analysis: power spectrum versus approximate entropy, Computer Methods and Programs in Biomedicine 48 (3) (1995) 201–209.
- (36) S. M. Pincus, E. F. Gevers, I. C. Robinson, G. v. d. Berg, F. Roelfsema, M. L. Hartman, J. D. Veldhuis, Females secrete growth hormone with more process irregularity than males in both humans and rats, American Journal of Physiology - Endocrinology And Metabolism 270 (1) (1996) E107–E115.
- (37) S. M. Pincus, T. Mulligan, A. Iranmanesh, S. Gheorghiu, M. Godschalk, J. D. Veldhuis, Older males secrete luteinizing hormone and testosterone more irregularly, and jointly more asynchronously, than younger males, Proceedings of the National Academy of Sciences 93 (24) (1996) 14100–14105.
- (38) P. Castiglioni, M. Di Rienzo, How the threshold “r” influences approximate entropy analysis of heart-rate variability, in: Computers in Cardiology, 2008, pp. 561–564.
- (39) C. Liu, C. Liu, P. Shao, L. Li, X. Sun, X. Wang, F. Liu, Comparison of different threshold values r for approximate entropy: application to investigate the heart rate variability between heart failure and healthy control groups, Physiological Measurement 32 (2) (2011) 167.
- (40) A. Boskovic, T. Loncar-Turukalo, O. Sarenac, N. Japundzic-Zigon, D. Bajic, Unbiased entropy estimates in stress: a parameter study, Computers in Biology and Medicine 42 (6) (2012) 667–679.
- (41) S. Zurek, P. Guzik, S. Pawlak, M. Kosmider, J. Piskorski, On the relation between correlation dimension, approximate entropy and sample entropy parameters, and a fast algorithm for their calculation, Physica A: Statistical Mechanics and its Applications 391 (24) (2012) 6601–6610.
- (42) Y.-H. Pan, Y.-H. Wang, S.-F. Liang, K.-T. Lee, Fast computation of sample entropy and approximate entropy in biomedicine, Computer Methods and Programs in Biomedicine 104 (3) (2011) 382–396.
- (43) M. R. Guevara, L. Glass, M. C. Mackey, A. Shrier, Chaos in neurobiology, IEEE Transactions on Systems, Man, and Cybernetics 13 (5) (1983) 790–798.
- (44) M. C. Mackey, L. Glass, Oscillation and chaos in physiological control systems, Science 197 (4300) (1977) 287–289.
- (45) R. Friedrich, C. Uhl, Evolution of Dynamical Structures in Complex Systems, Springer Berlin–Heidelberg, NY, 1992.
- (46) V. Vapnik, The Nature of Statistical Learning Theory, 2nd Edition, Springer, 2000.
- (47) T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Transactions on Communication Technology 15 (1) (1967) 52–60.
- (48) S. D. Shorvon, D. Fish, W. E. Dodson, The Treatment of Epilepsy, Wiley, 2004.
- (49) D. C. Montgomery, G. C. Runger, Applied Statistics and Probability for Engineers, John Wiley & Sons, 2010.