Bounded Influence Propagation \tau-Estimation: A New Robust Method for ARMA Model Estimation

Bounded Influence Propagation -Estimation:
A New Robust Method for ARMA Model Estimation

Michael Muma,  and Abdelhak M. Zoubir,  M. Muma and A.M. Zoubir are with the Signal Processing Group, Technische Universität Darmstadt, Darmstadt, Germany. This work was supported by the project HANDiCAMS which acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 323944.
Abstract

A new robust and statistically efficient estimator for ARMA models called the bounded influence propagation (BIP) -estimator is proposed. The estimator incorporates an auxiliary model, which prevents the propagation of outliers. Strong consistency and asymptotic normality of the estimator for ARMA models that are driven by independently and identically distributed (iid) innovations with symmetric distributions are established. To analyze the infinitesimal effect of outliers on the estimator, the influence function is derived and computed explicitly for an AR(1) model with additive outliers. To obtain estimates for the AR() model, a robust Durbin-Levinson type and a forward-backward algorithm are proposed. An iterative algorithm to robustly obtain ARMA(,) parameter estimates is also presented. The problem of finding a robust initialization is addressed, which for orders is a non-trivial matter. Numerical experiments are conducted to compare the finite sample performance of the proposed estimator to existing robust methodologies for different types of outliers both in terms of average and of worst-case performance, as measured by the maximum bias curve. To illustrate the practical applicability of the proposed estimator, a real-data example of outlier cleaning for R-R interval plots derived from electrocardiographic (ECG) data is considered. The proposed estimator is not limited to biomedical applications, but is also useful in any real-world problem whose observations can be modeled as an ARMA process disturbed by outliers or impulsive noise.

Robust Estimation, ARMA, Bounded Influence Propagation, Robustness, Dependent Data, Outliers, -estimator, Artifacts, Influence Function, ECG, HRV

I Introduction

Autoregressive moving-average (ARMA) models are amongst the most popular models for characterizing dependent data and they have a long tradition in numerous real-world applications, e.g. in speech processing [1], biomedicine [2, 3], radar [4], electricity consumption forecasting [5, 6, 7], system identification [8] and econometry [9]. Numerous extensions of the ARMA model, such as Seasonal Integrated ARMA (SARIMA) [7], Periodic ARMA (PARMA) [10], Controlled ARMA [11], and Time-Varying ARMA (TV-ARMA) models [12] have been proposed.

This paper focusses on robust parameter estimation for ARMA models associated with random processes for which the majority of samples are appropriately modeled by a stationary and invertible ARMA model and a minority consists of outliers with respect to the ARMA model. For such cases and, in general, classical estimators are unreliable and may break down completely [13, 14, 15, 16, 17, 18, 19, 20, 21, 5, 6, 7, 22, 23]. The nature of the outliers depends on the application. For example, motion artifacts are often evident in biomedical signals such as intracranial pressure (ICP), electrocardiographic (ECG) and photoplethysmographic (PPG) signals [24, 25, 26, 27, 28] while in electricity consumption forecasting outliers are associated with holidays, major sporting events and strikes [5, 21]. For a discussion on how outliers affect ARMA parameter estimation, the reader is referred, e.g. to [14, 16, 19, 29, 30] and there is a clear need for robust methods that can, to some extent, resist outliers. First contributions to robust estimation for dependent data were made in the 1980’s [31, 32, 33, 34], and in recent years, research in this area has increased significantly (e.g. [35, 18, 36, 37, 38, 19, 39, 40, 27, 22, 23, 41, 20, 5, 42, 43, 44, 45, 6, 7, 26, 24, 46, 47]).

Research on robust ARMA parameter estimation may be loosely grouped into two categories which are associated with the diagnostic approach (e.g. [13, 14, 15, 16, 17, 38, 39, 23, 47]) and the statistically robust approach (e.g. [18, 19, 20, 21, 5, 6, 7, 27, 22, 48, 49]). Diagnostic approaches enhance robustness via detection and hard rejection of outliers, followed by a classical parameter estimation method that handles missing values. Statistically robust methods utilize the entire data set and accommodate the outliers by bounding their influence on the parameter estimates. Robust statistical theory also provides measures, such as the influence function (IF), the breakdown point and the maximum bias curve [19, 50, 21], which characterize quantitative and qualitative robustness and allow for an analytical comparison of different estimators.

The main contributions of this paper is to propose and analyze a new estimator for ARMA model parameters called the bounded influence propagation (BIP) -estimator which is simultaneously robust and possesses a controllable statistical efficiency. Robustness and high efficiency are jointly achieved by incorporating an auxiliary model which prevents the propagation of outliers into the -estimator. The term ’propagation of outliers’ means that one outlier in the observations creates multiple outliers in the reconstructed innovation series. The BIP -estimate minimizes a robust and efficient scale of the reconstructed innovation series. In Theorem 1, strong consistency of the -estimator of the ARMA parameters is established. In Lemma 1, Fisher consistency of the -estimator of the ARMA parameters is shown, given all past observations. In Lemma 2, almost sure convergence of the -estimator of the innovations scale to the population value based on the expectation operator is proven. In Theorem 2, under an ARMA model, it is established that the BIP -estimator is asymptotically equivalent to a -estimator. Theorems 1 and 2 together prove the strong consistency of the proposed estimator under general conditions, which include the Gaussian ARMA model as a special case. In Theorem 3, asymptotic normality of the estimator for the ARMA model is proven by deriving the asymptotic equivalence to an M-estimator. To analyze the infinitesimal robustness of the BIP -estimator in the asymptotic case, its IF is derived. The IF is explicitly computed for an autoregressive process of order one, AR(1), in the case of additive outliers. To compute the estimates for the AR() model, a computationally efficient robust Durbin-Levinson type algorithm is proposed that incorporates the BIP model. Here the parameters are recursively found for increasing orders. In this way, searching for a robust starting point to minimize a non-convex cost function is avoided, which is a key-difficulty in robust estimation. A forward-backward algorithm to recursively compute the AR() parameters is also proposed. In the search for ARMA parameter estimates, a Marquard algorithm is used to find the parameters that minimize the -scale of the innovations. For this case, an algorithm to find a robust starting point is presented. The starting point algorithm uses a BIP-AR model based outlier cleaning operation. Numerical experiments to evaluate the estimator in terms of the maximum bias curve in order to assess its quantitative robustness and also to compare it to existing benchmark estimators are conducted. In particular, Monte Carlo experiments for ARMA models of orders are performed. This is unusual in robust ARMA parameter estimation, which usually is limited to ARMA models of lower orders. Patchy and independent replacement and additive outliers of different types are considered in the simulations. Finally, the proposed estimator is applied to a real-data example of artifact cleaning for R-R interval plots derived from electrocardiographic (ECG) data. R-R intervals denote the time intervals between consecutive heart beats and are used in heart rate and heart rate variability analysis.

Relation to existing work: In the analysis of our estimator, we build upon theoretical results that were established for the BIP MM-estimator [20]. As for the classical regression setting, the [51] and MM [52] are alternative estimators with similar statistical and robustness properties. In the context of AR parameter estimation, a key advantage of the -estimator is its definition via the -scale. Based on this definition, a robust Durbin-Levinson type procedure is proposed. Further, the starting point for the BIP MM, especially for is difficult to find and expressions for the IF are not available for the BIP MM-estimator. Our estimator is also conceptually related to the filtered -estimator [19], which uses a robust filter to prevent outlier propagation. A disadvantage of the filtered estimators is that they are intractable in terms of robustness and asymptotic statistical analysis.

The paper is organized as follows. Section II introduces the signal and outlier models and discusses the propagation of outliers. Section III introduces the BIP -estimator and details associated statistical and robustness analysis. Section IV presents an algorithm for computing the stationary and invertible BIP -estimates. Section V compares the performance of the proposed BIP -estimator with existing ARMA parameter estimators via Monte Carlo simulations. Section VI provides a real-data example of artifact cleaning for R-R interval plots derived from ECG data. Conclusions, and possible extensions of this research are presented in Section VII.

Notation. Vectors (matrices) are denoted by bold-faced lowercase (uppercase letters), e.g. (). The th column vector of a matrix is denoted by . is the transpose operator. Sets are denoted by calligraphic letters, e.g. . refers to the estimator (or estimate) of the parameter vector , , and are, respectively, the probability density function (pdf) and cumulative distribution function (cdf) of , and are, respectively, the joint pdf and joint cdf of the random variables and , is the pdf of conditioned on and given . is the probability that . is the expectation operator, while denotes convergence to the normal distribution with mean vector and covariance matrix . Given a function , is the -dimensional column vector whose th element is . Finally, denotes the grid of equidistant points in , ranging from to with a step size of .

Ii Signal and Outlier Models

The ARMA and Bounded Innovation Propagation (BIP)-ARMA signal models, as well as some important outlier models, are briefly revisited. Attention is drawn to the fact that estimators, which are computed based on the innovations, require a mechanism that prevents the propagation of outliers.

Ii-a Signal model

Let

(1)

denote a sequence of observations that was generated by a stationary and invertible ARMA() process up to time according to

(2)

where the true parameter vector , and .
(A1) Assume that are independent and identically distributed (iid) random variables with a symmetric distribution and further assume that .
To restrict the parameter space in a manner which is consistent with a stationary and invertible ARMA model, let be a parameter vector defined by the polynomials

(3)

and

(4)

which have all their roots outside the unit circle. Then, by defining

(5)

the following recursion follows

(6)

and .
(A2) Assume that and do not have common roots.

Ii-B Outlier models

In real-world applications, the observations may not exactly follow (2). There exist several statistical models for outliers in dependent data (see e.g. [13, 14, 15, 16, 17, 23, 19, 21]). The following provides a brief review of important models.
The additive outlier (AO) model defines contaminated observations according to

(7)

where follows an ARMA model, as given in (2), defines the contaminating process that is independent of and is a stationary random process for which

(8)

For the replacement outlier (RO) model

(9)

where is independent of and is defined by (8). As discussed, e.g. in [19, 21], innovation outliers, i.e., outliers in , can be dealt with by classical robust estimators.

Outliers may also differ in their temporal structure. For isolated outliers, takes the value 1, such that at least one non-outlying observation is between two outliers (e.g. follows an independent Bernoulli distribution). For patchy outliers, on the other hand, takes the value 1 for subsequent samples.

Ii-C Bounded innovation propagation (BIP)-ARMA model

ARMA parameter estimation, i.e., determining , is often based on minimizing some function of the reconstructed innovation sequence. However, as can be seen from (5), one AO or RO in can propagate onto multiple innovations . In the extreme case, all entries of the innovations sequence are disturbed by a single outlier. Thus, robust estimators are only applicable if they are combined with a mechanism to prevent outlier propagation. An auxiliary model to do this, is the BIP-ARMA model [20]:

Here, , where if , , while if , . ARMA models are included by setting . Thus, by choosing to be one of the well-known monotone or redescending nonlinearities (e.g., Huber’s or Tukey’s) [50], all innovations that lie within some region around are left untouched and, on the other hand, the effect of a single AO or RO is bounded to a single corrupted innovation. In (II-C), is a robust M-scale of [50, 21], i.e., it solves

(11)

where is defined as

(12)

To make the M-estimator consistent in scale with the standard deviation when the data is Gaussian, in (12), is the expectation operator with respect to the standard normal distribution.
(A3) Assume that is a real-valued function with the following properties: , and is continuous, non-constant and non-decreasing in . is bounded and continuous.
(A4) Assume that is an odd, bounded and continuous function.
From (II-C), the innovations sequence can be recursively obtained for according to

(13)

Fig. 1 illustrates the influence of for an ARMA(2,1) model with parameters , . The red crosses mark the AO positions in the observations. When reconstructing the innovations with an ARMA model (6) that uses , multiple innovation samples are contaminated. This effect is suppressed when applying the BIP-ARMA (13).

Fig. 1: (blue) True innovations sequence; (red) innovations derived from a Gaussian ARMA(2,1) observation with AOs whose positions are marked with red crosses; (black) innovations obtained when using a BIP-ARMA(2,1) model. In both cases, the true parameter vector , is used.

Iii Proposed Estimator

We next define an estimator that is based on the idea of minimizing a robust and efficient scale of the reconstructed innovations, the -scale. The estimator is defined for the case when the sample size exceeds the number of model parameters, i.e., . It computes the -scale both for innovations reconstructed from the ARMA in (6) and from the BIP-ARMA in (13), and chooses as a final estimate , which provides the smaller -scale. We show, for iid innovations with a symmetric pdf, that the proposed estimator is strongly consistent with the ARMA model (Theorem 1 and Theorem 2). Further, the estimator is asymptotically normal for the ARMA model case with a controllable efficiency with respect to the maximum-likelihood estimator (Theorem 3). Finally, an expression for the IF which measures robustness against infinitesimal contamination is provided.

Iii-a Definition of the -estimator under the ARMA model

Let be an M-estimate of the scale of based on which satisfies A3, i.e.,

(14)

(A5) Assume that .
The -estimate of under the ARMA model is defined according to

(15)

where is the -estimate [51] of the scale of and is defined as

(16)

Here where for some small .
(A6) Assume that satisfies A3, and additionally, , where .

Iii-B Definition of the -estimator under the BIP ARMA model

The -estimate of under the BIP-ARMA model is defined according to

(17)

where

and is recursively obtained from (13). To compute , the MA-infinity representation of the BIP-ARMA model is used

(19)

where are the coefficients of . It then follows that

(20)

where is the standard deviation of and

(21)

The estimate of in Eq. (20) can then be computed according to

(22)

with , and where is chosen sufficiently large to approximate the MA-infinity representation.

Iii-C Definition of the proposed -estimator

The final -estimate of the innovations scale is

(23)

and the final parameter estimate becomes

(24)

It is shown in Sec. III-D that when the data follows an ARMA model without outliers, the result that is asymptotically obtained for . This implies that the asymptotic efficiency of is independent of . However, this does not hold in the finite sample size case.

Iii-D Statistical analysis

Theorem 1. establishes strong consistency of the -estimator of the ARMA parameters.
Assume that follows from Eq. (2) with satisfying A1. Further, assume that satisfies A3 and A5 and that satisfies A6. Then, the -estimator defined in Eq. (15) is strongly consistent for .
Proving this theorem requires Lemmas 1-3.
Lemma 1. provides the Fisher consistency of the -estimator of the ARMA parameters given all past observations111For visual clarity, let , and .
Let be an observation from an ARMA(p,q), as in Eq. (2). Assume that is bounded and satisfies A3 and A5. It then holds, with denoting the true innovations scale, that if and . This implies that the estimate , as defined in Eq. (15), is Fisher consistent for .

Proof.

Consider the assumptions made in Theorem 1 which are the same assumptions made in Lemma 2 in [53]. This lemma states: if and it holds, for an M-estimate of scale defined by

(25)

that . Since for

(26)

where

(27)

and

(28)

it follows by defining

(29)

that

(30)

Using Lemma 3.1 (i) from [54] it then follows that

(31)

for all . Then, using Lemma 3.1 (ii) from [54], and assuming that is continuously differentiable, it is sufficient to show, for , that

(32)

is nondecreasing with respect to , since A6 implies that

(33)

Lemma 2. states the almost sure convergence of the -estimator of the innovations scale to the population value based on the expectation operator.
Under the assumptions of Theorem 1, for any , it follows that

(34)
Proof.

The continuity and positivity of the M-scale functional defined in Eq. (25) was shown in Lemma 5 of [53]. The continuity and positivity of follows from (30), as long as satisfies A3. Let

(35)

and

(36)

Then and . According to Lemma 5 of [53], it holds, for any , that

(37)

From Lemma 2 of [55], it holds, under the assumptions A3, A6 on , , that

Eq. (34) then follows from (37), (III-D) and (30). ∎

Lemma 3
Under the assumptions of Theorem 1, there exists , such that

(39)

Proof.

The proof is based on the one given in Lemma 6 of [53] which states that

(40)

where has been replaced by , , and is assumed to be consistent with A3 and A6. Then, using the continuity and positivity of and the definition of the -scale of (30), (39) follows from (40). ∎

Proof of Theorem 1.

Take arbitrarily small and let be as in Lemma 3. The continuity of the M-scale functional defined in (25) follows from Lebesgue’s dominated convergence theorem. The continuity of follows from (30) as long as satisfies A3. By Lemma 1 of this paper, there exists such that

(41)

By Lemma 2 of this paper, there exists , such that for

(42)

and

(43)

By Lemma 3, there exists , such that for

(44)

Therefore, for it holds that , which proves the theorem. ∎

Theorem 2. establishes, under an ARMA model, that the BIP - is asymptotically equivalent to a -estimator.
Assume that follows (2) with satisfying A1. Further, assume that and are bounded, that satisfies A3 and A5, that satisfies A6, that for any compact set , and, finally, that satisfies A4. Then, if is not white noise, with probability 1, there exists , such that for all and then a.s..

Proof.

Theorem 2 of [53] shows that

(45)

Starting from (45),

(46)

follows from Lemmas 9 and 10 of [53] together with Eq. (30), as long as satisfies A3. Furthermore, in Theorem 1, it is established that

(47)

and this proves the theorem. ∎

Theorem 3. establishes the asymptotic normality of the estimator for the ARMA model.
Let be as in (2), let A1, A2, A3 be fulfilled and let . Further, assume that and are continuous and bounded functions. Then, the -estimator is asymptotically normally distributed with

(48)

where

(49)

with ,

(50)

and being the matrix of dimensions with elements

(51)
(52)
(53)
(54)

Here and .

Proof.

According to Theorem 5 of [53], an M-estimator, under the same assumptions that are made in this theorem, is asymptotically normally distributed with

(55)

where

(56)

To prove Theorem 3, it must be shown that the -estimator of the ARMA parameters satisfies an -estimating equation. Differentiating (15) yields the following system of equations:

(57)

Here,

(58)

with , where

(59)
(60)

and

(61)

Replacing (58) in (57) and defining

(62)

if satisfies A6, the -estimate satisfies an M-estimating equation

(63)

with data adaptive given by

(64)

Special cases are (i) which results in and the -estimator being equivalent to an LS estimator, (ii) which results in the -estimator being equivalent to an S-estimator. The asymptotic value of the estimator is defined by

(65)

and under suitable regularity conditions, i.e., ergodicity, the interchange of limits is justified (e.g. by dominated convergence) to yield