Bounded Influence Propagation \tau-Estimation: A New Robust Method for ARMA Model Estimation

# Bounded Influence Propagation τ-Estimation: A New Robust Method for ARMA Model Estimation

Michael Muma,  and Abdelhak M. Zoubir,  M. Muma and A.M. Zoubir are with the Signal Processing Group, Technische Universität Darmstadt, Darmstadt, Germany. This work was supported by the project HANDiCAMS which acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 323944.
###### Abstract

A new robust and statistically efficient estimator for ARMA models called the bounded influence propagation (BIP) -estimator is proposed. The estimator incorporates an auxiliary model, which prevents the propagation of outliers. Strong consistency and asymptotic normality of the estimator for ARMA models that are driven by independently and identically distributed (iid) innovations with symmetric distributions are established. To analyze the infinitesimal effect of outliers on the estimator, the influence function is derived and computed explicitly for an AR(1) model with additive outliers. To obtain estimates for the AR() model, a robust Durbin-Levinson type and a forward-backward algorithm are proposed. An iterative algorithm to robustly obtain ARMA(,) parameter estimates is also presented. The problem of finding a robust initialization is addressed, which for orders is a non-trivial matter. Numerical experiments are conducted to compare the finite sample performance of the proposed estimator to existing robust methodologies for different types of outliers both in terms of average and of worst-case performance, as measured by the maximum bias curve. To illustrate the practical applicability of the proposed estimator, a real-data example of outlier cleaning for R-R interval plots derived from electrocardiographic (ECG) data is considered. The proposed estimator is not limited to biomedical applications, but is also useful in any real-world problem whose observations can be modeled as an ARMA process disturbed by outliers or impulsive noise.

Robust Estimation, ARMA, Bounded Influence Propagation, Robustness, Dependent Data, Outliers, -estimator, Artifacts, Influence Function, ECG, HRV

## I Introduction

Autoregressive moving-average (ARMA) models are amongst the most popular models for characterizing dependent data and they have a long tradition in numerous real-world applications, e.g. in speech processing [1], biomedicine [2, 3], radar [4], electricity consumption forecasting [5, 6, 7], system identification [8] and econometry [9]. Numerous extensions of the ARMA model, such as Seasonal Integrated ARMA (SARIMA) [7], Periodic ARMA (PARMA) [10], Controlled ARMA [11], and Time-Varying ARMA (TV-ARMA) models [12] have been proposed.

This paper focusses on robust parameter estimation for ARMA models associated with random processes for which the majority of samples are appropriately modeled by a stationary and invertible ARMA model and a minority consists of outliers with respect to the ARMA model. For such cases and, in general, classical estimators are unreliable and may break down completely [13, 14, 15, 16, 17, 18, 19, 20, 21, 5, 6, 7, 22, 23]. The nature of the outliers depends on the application. For example, motion artifacts are often evident in biomedical signals such as intracranial pressure (ICP), electrocardiographic (ECG) and photoplethysmographic (PPG) signals [24, 25, 26, 27, 28] while in electricity consumption forecasting outliers are associated with holidays, major sporting events and strikes [5, 21]. For a discussion on how outliers affect ARMA parameter estimation, the reader is referred, e.g. to [14, 16, 19, 29, 30] and there is a clear need for robust methods that can, to some extent, resist outliers. First contributions to robust estimation for dependent data were made in the 1980’s [31, 32, 33, 34], and in recent years, research in this area has increased significantly (e.g. [35, 18, 36, 37, 38, 19, 39, 40, 27, 22, 23, 41, 20, 5, 42, 43, 44, 45, 6, 7, 26, 24, 46, 47]).

Research on robust ARMA parameter estimation may be loosely grouped into two categories which are associated with the diagnostic approach (e.g. [13, 14, 15, 16, 17, 38, 39, 23, 47]) and the statistically robust approach (e.g. [18, 19, 20, 21, 5, 6, 7, 27, 22, 48, 49]). Diagnostic approaches enhance robustness via detection and hard rejection of outliers, followed by a classical parameter estimation method that handles missing values. Statistically robust methods utilize the entire data set and accommodate the outliers by bounding their influence on the parameter estimates. Robust statistical theory also provides measures, such as the influence function (IF), the breakdown point and the maximum bias curve [19, 50, 21], which characterize quantitative and qualitative robustness and allow for an analytical comparison of different estimators.

The main contributions of this paper is to propose and analyze a new estimator for ARMA model parameters called the bounded influence propagation (BIP) -estimator which is simultaneously robust and possesses a controllable statistical efficiency. Robustness and high efficiency are jointly achieved by incorporating an auxiliary model which prevents the propagation of outliers into the -estimator. The term ’propagation of outliers’ means that one outlier in the observations creates multiple outliers in the reconstructed innovation series. The BIP -estimate minimizes a robust and efficient scale of the reconstructed innovation series. In Theorem 1, strong consistency of the -estimator of the ARMA parameters is established. In Lemma 1, Fisher consistency of the -estimator of the ARMA parameters is shown, given all past observations. In Lemma 2, almost sure convergence of the -estimator of the innovations scale to the population value based on the expectation operator is proven. In Theorem 2, under an ARMA model, it is established that the BIP -estimator is asymptotically equivalent to a -estimator. Theorems 1 and 2 together prove the strong consistency of the proposed estimator under general conditions, which include the Gaussian ARMA model as a special case. In Theorem 3, asymptotic normality of the estimator for the ARMA model is proven by deriving the asymptotic equivalence to an M-estimator. To analyze the infinitesimal robustness of the BIP -estimator in the asymptotic case, its IF is derived. The IF is explicitly computed for an autoregressive process of order one, AR(1), in the case of additive outliers. To compute the estimates for the AR() model, a computationally efficient robust Durbin-Levinson type algorithm is proposed that incorporates the BIP model. Here the parameters are recursively found for increasing orders. In this way, searching for a robust starting point to minimize a non-convex cost function is avoided, which is a key-difficulty in robust estimation. A forward-backward algorithm to recursively compute the AR() parameters is also proposed. In the search for ARMA parameter estimates, a Marquard algorithm is used to find the parameters that minimize the -scale of the innovations. For this case, an algorithm to find a robust starting point is presented. The starting point algorithm uses a BIP-AR model based outlier cleaning operation. Numerical experiments to evaluate the estimator in terms of the maximum bias curve in order to assess its quantitative robustness and also to compare it to existing benchmark estimators are conducted. In particular, Monte Carlo experiments for ARMA models of orders are performed. This is unusual in robust ARMA parameter estimation, which usually is limited to ARMA models of lower orders. Patchy and independent replacement and additive outliers of different types are considered in the simulations. Finally, the proposed estimator is applied to a real-data example of artifact cleaning for R-R interval plots derived from electrocardiographic (ECG) data. R-R intervals denote the time intervals between consecutive heart beats and are used in heart rate and heart rate variability analysis.

Relation to existing work: In the analysis of our estimator, we build upon theoretical results that were established for the BIP MM-estimator [20]. As for the classical regression setting, the [51] and MM [52] are alternative estimators with similar statistical and robustness properties. In the context of AR parameter estimation, a key advantage of the -estimator is its definition via the -scale. Based on this definition, a robust Durbin-Levinson type procedure is proposed. Further, the starting point for the BIP MM, especially for is difficult to find and expressions for the IF are not available for the BIP MM-estimator. Our estimator is also conceptually related to the filtered -estimator [19], which uses a robust filter to prevent outlier propagation. A disadvantage of the filtered estimators is that they are intractable in terms of robustness and asymptotic statistical analysis.

The paper is organized as follows. Section II introduces the signal and outlier models and discusses the propagation of outliers. Section III introduces the BIP -estimator and details associated statistical and robustness analysis. Section IV presents an algorithm for computing the stationary and invertible BIP -estimates. Section V compares the performance of the proposed BIP -estimator with existing ARMA parameter estimators via Monte Carlo simulations. Section VI provides a real-data example of artifact cleaning for R-R interval plots derived from ECG data. Conclusions, and possible extensions of this research are presented in Section VII.

Notation. Vectors (matrices) are denoted by bold-faced lowercase (uppercase letters), e.g. (). The th column vector of a matrix is denoted by . is the transpose operator. Sets are denoted by calligraphic letters, e.g. . refers to the estimator (or estimate) of the parameter vector , , and are, respectively, the probability density function (pdf) and cumulative distribution function (cdf) of , and are, respectively, the joint pdf and joint cdf of the random variables and , is the pdf of conditioned on and given . is the probability that . is the expectation operator, while denotes convergence to the normal distribution with mean vector and covariance matrix . Given a function , is the -dimensional column vector whose th element is . Finally, denotes the grid of equidistant points in , ranging from to with a step size of .

## Ii Signal and Outlier Models

The ARMA and Bounded Innovation Propagation (BIP)-ARMA signal models, as well as some important outlier models, are briefly revisited. Attention is drawn to the fact that estimators, which are computed based on the innovations, require a mechanism that prevents the propagation of outliers.

### Ii-a Signal model

Let

 yt=(…,yt−k,…,yt−1,yt) (1)

denote a sequence of observations that was generated by a stationary and invertible ARMA() process up to time according to

 yt=μ0+p∑i=1ϕ0,i(yt−i−μ0)+at(β0)−q∑i=1θ0,iat−i(β0) (2)

where the true parameter vector , and .
(A1) Assume that are independent and identically distributed (iid) random variables with a symmetric distribution and further assume that .
To restrict the parameter space in a manner which is consistent with a stationary and invertible ARMA model, let be a parameter vector defined by the polynomials

 ϕ(B)=1−p∑i=1ϕiBi (3)

and

 θ(B)=1−q∑i=1θiBi (4)

which have all their roots outside the unit circle. Then, by defining

 aet(β)=θ−1(B)ϕ(B)(yt−μ), (5)

the following recursion follows

 aet(β)=yt−μ−p∑i=1ϕi(yt−i−μ)+q∑i=1θiaet−i(β) (6)

and .
(A2) Assume that and do not have common roots.

### Ii-B Outlier models

In real-world applications, the observations may not exactly follow (2). There exist several statistical models for outliers in dependent data (see e.g. [13, 14, 15, 16, 17, 23, 19, 21]). The following provides a brief review of important models.
The additive outlier (AO) model defines contaminated observations according to

 yεt=xt+ξεtwt, (7)

where follows an ARMA model, as given in (2), defines the contaminating process that is independent of and is a stationary random process for which

 ξεt={1with probability ε0with probability (1−ε). (8)

For the replacement outlier (RO) model

 yεt=(1−ξεt)xt+ξεtwt, (9)

where is independent of and is defined by (8). As discussed, e.g. in [19, 21], innovation outliers, i.e., outliers in , can be dealt with by classical robust estimators.

Outliers may also differ in their temporal structure. For isolated outliers, takes the value 1, such that at least one non-outlying observation is between two outliers (e.g. follows an independent Bernoulli distribution). For patchy outliers, on the other hand, takes the value 1 for subsequent samples.

### Ii-C Bounded innovation propagation (BIP)-ARMA model

ARMA parameter estimation, i.e., determining , is often based on minimizing some function of the reconstructed innovation sequence. However, as can be seen from (5), one AO or RO in can propagate onto multiple innovations . In the extreme case, all entries of the innovations sequence are disturbed by a single outlier. Thus, robust estimators are only applicable if they are combined with a mechanism to prevent outlier propagation. An auxiliary model to do this, is the BIP-ARMA model [20]:

Here, , where if , , while if , . ARMA models are included by setting . Thus, by choosing to be one of the well-known monotone or redescending nonlinearities (e.g., Huber’s or Tukey’s) [50], all innovations that lie within some region around are left untouched and, on the other hand, the effect of a single AO or RO is bounded to a single corrupted innovation. In (II-C), is a robust M-scale of [50, 21], i.e., it solves

 E[ρ(atσ)]=b, (11)

where is defined as

 b=E[ρ(x)]. (12)

To make the M-estimator consistent in scale with the standard deviation when the data is Gaussian, in (12), is the expectation operator with respect to the standard normal distribution.
(A3) Assume that is a real-valued function with the following properties: , and is continuous, non-constant and non-decreasing in . is bounded and continuous.
(A4) Assume that is an odd, bounded and continuous function.
From (II-C), the innovations sequence can be recursively obtained for according to

 abt(β,σ) = yt−μ−p∑i=1ϕi(yt−i−μ)+r∑i=1(ϕiabt−i(β,σ) (13) +(θi−ϕi)ση(abt−i(β,σ)σ)).

Fig. 1 illustrates the influence of for an ARMA(2,1) model with parameters , . The red crosses mark the AO positions in the observations. When reconstructing the innovations with an ARMA model (6) that uses , multiple innovation samples are contaminated. This effect is suppressed when applying the BIP-ARMA (13).

## Iii Proposed Estimator

We next define an estimator that is based on the idea of minimizing a robust and efficient scale of the reconstructed innovations, the -scale. The estimator is defined for the case when the sample size exceeds the number of model parameters, i.e., . It computes the -scale both for innovations reconstructed from the ARMA in (6) and from the BIP-ARMA in (13), and chooses as a final estimate , which provides the smaller -scale. We show, for iid innovations with a symmetric pdf, that the proposed estimator is strongly consistent with the ARMA model (Theorem 1 and Theorem 2). Further, the estimator is asymptotically normal for the ARMA model case with a controllable efficiency with respect to the maximum-likelihood estimator (Theorem 3). Finally, an expression for the IF which measures robustness against infinitesimal contamination is provided.

### Iii-a Definition of the τ-estimator under the ARMA model

Let be an M-estimate of the scale of based on which satisfies A3, i.e.,

 1n−pn∑t=p+1ρ1(at(β)^σMn(an(β)))=b. (14)

(A5) Assume that .
The -estimate of under the ARMA model is defined according to

 ^βτ=arg min β∈B^στn(an(β)), (15)

where is the -estimate [51] of the scale of and is defined as

 ^στn(an(β))=^σMn(an(β)) ⎷1n−pn∑t=p+1ρ2(at(β)^σMn(an(β))) (16)

Here where for some small .
(A6) Assume that satisfies A3, and additionally, , where .

### Iii-B Definition of the τ-estimator under the BIP ARMA model

The -estimate of under the BIP-ARMA model is defined according to

 ^βbτ=arg min β∈B^στn(abn(β,^σ(β))), (17)

where

 ^στn(abn(β,^σ(β))) = ^σMn(abn(β,^σ(β)))⋅  ⎷1n−pn∑t=p+1ρ2(abt(β,^σ(β))^σMn(abn(β,^σ(β))))

and is recursively obtained from (13). To compute , the MA-infinity representation of the BIP-ARMA model is used

 yt=μ−at+∞∑i=1λiση(at−iσ), (19)

where are the coefficients of . It then follows that

 σ2(β)=σ2y1+κ2∑∞i=1λ2i(β), (20)

where is the standard deviation of and

 κ2=Var[η(atσ)]=E[(η(atσ)−E[η(atσ)])2]. (21)

The estimate of in Eq. (20) can then be computed according to

 ^σ2(β)=^στn(yn)21+κ2∑qlongi=1λ2i(β), (22)

with , and where is chosen sufficiently large to approximate the MA-infinity representation.

### Iii-C Definition of the proposed τ-estimator

The final -estimate of the innovations scale is

 ^σ∗τ=min{^στn(an(^βτ)),^στn(abn(^βbτ,^σ(^βbτ)))} (23)

and the final parameter estimate becomes

 ^β∗τ=⎧⎨⎩^βτif^στn(an(^βτ))<^στn(abn(^βbτ,^σ(^βbτ)))^βbτif^στn(abn(^βbτ,^σ(^βbτ)))<^στn(an(^βτ)). (24)

It is shown in Sec. III-D that when the data follows an ARMA model without outliers, the result that is asymptotically obtained for . This implies that the asymptotic efficiency of is independent of . However, this does not hold in the finite sample size case.

### Iii-D Statistical analysis

Theorem 1. establishes strong consistency of the -estimator of the ARMA parameters.
Assume that follows from Eq. (2) with satisfying A1. Further, assume that satisfies A3 and A5 and that satisfies A6. Then, the -estimator defined in Eq. (15) is strongly consistent for .
Proving this theorem requires Lemmas 1-3.
Lemma 1. provides the Fisher consistency of the -estimator of the ARMA parameters given all past observations111For visual clarity, let , and .
Let be an observation from an ARMA(p,q), as in Eq. (2). Assume that is bounded and satisfies A3 and A5. It then holds, with denoting the true innovations scale, that if and . This implies that the estimate , as defined in Eq. (15), is Fisher consistent for .

###### Proof.

Consider the assumptions made in Theorem 1 which are the same assumptions made in Lemma 2 in [53]. This lemma states: if and it holds, for an M-estimate of scale defined by

 E[ρ1(aet(β)^σM(β))]=b, (25)

that . Since for

 aet(β)=ω(B)at+c(μ0−μ) (26)

where

 ω(B)=θ−1(B)θ0(B)ϕ−10(B)ϕ(B)=1+∞∑i=1ωiBi (27)

and

 c=1−∑pi=1ϕi1−∑qi=1θi≠0 (28)

it follows by defining

 Δt(β)=∞∑i=1ωiat−i+c(μ0−μ), (29)

that

 ^σ2τ(β) = ^σ2M(β)E[ρ2(aet(β)^σM(β))] (30) = ^σ2M(β)E[ρ2(at+Δt(β)^σM(β))]

Using Lemma 3.1 (i) from [54] it then follows that

 ^σ2τ(β)>^σ2M(β)E[ρ2(at^σM(β))] (31)

for all . Then, using Lemma 3.1 (ii) from [54], and assuming that is continuously differentiable, it is sufficient to show, for , that

 h(^σM)=^σ2ME[ρ2(at^σM)] (32)

is nondecreasing with respect to , since A6 implies that

 (33)

Lemma 2. states the almost sure convergence of the -estimator of the innovations scale to the population value based on the expectation operator.
Under the assumptions of Theorem 1, for any , it follows that

 limn→∞sup β∈B0×[−d,d]|^στn(an(β))−^στ(β)|=0a.s.. (34)
###### Proof.

The continuity and positivity of the M-scale functional defined in Eq. (25) was shown in Lemma 5 of [53]. The continuity and positivity of follows from (30), as long as satisfies A3. Let

 h1=infβ∈B0×[−d,d]^στ(β) (35)

and

 h2=sup β∈B0×[−d,d]^στ(β). (36)

Then and . According to Lemma 5 of [53], it holds, for any , that

 limn→∞sup β∈B0×[−d,d]|^σMn(an(β))−^σM(β)|=0a.s.. (37)

From Lemma 2 of [55], it holds, under the assumptions A3, A6 on , , that

 limn→∞sup β∈B0×[−d,d],c∈[h1/2,2h2] ∣∣ ∣∣1n−pn∑t=p+1ρj(aet(β)c) −E[ρj(aet(β)c)]∣∣ ∣∣=0a.s..

Eq. (34) then follows from (37), (III-D) and (30). ∎

Lemma 3
Under the assumptions of Theorem 1, there exists , such that

 limn→∞inf|μ|>d,(ϕ,θ)∈B0inf^στn(an(β))>σ0+1a.s.. (39)

###### Proof.

The proof is based on the one given in Lemma 6 of [53] which states that

 limn→∞inf|μ|>d,(ϕ,θ)∈B0inf^σMn(an(β))>σ0+1a.s., (40)

where has been replaced by , , and is assumed to be consistent with A3 and A6. Then, using the continuity and positivity of and the definition of the -scale of (30), (39) follows from (40). ∎

###### Proof of Theorem 1.

Take arbitrarily small and let be as in Lemma 3. The continuity of the M-scale functional defined in (25) follows from Lebesgue’s dominated convergence theorem. The continuity of follows from (30) as long as satisfies A3. By Lemma 1 of this paper, there exists such that

 minβ∈B0×[−d,d],||β−β0||≥ξ^στ(β)≥σ0+γ. (41)

By Lemma 2 of this paper, there exists , such that for

 minβ∈B0×[−d,d],||β−β0||≥ξ^στn(an(β))≥σ0+γ/2 (42)

and

 ^στn(an(β0))≤σ0+γ/4. (43)

By Lemma 3, there exists , such that for

 inf|μ|>d,(ϕ,θ)∈B0^στn(an(β))>σ0+γa.s.. (44)

Therefore, for it holds that , which proves the theorem. ∎

Theorem 2. establishes, under an ARMA model, that the BIP - is asymptotically equivalent to a -estimator.
Assume that follows (2) with satisfying A1. Further, assume that and are bounded, that satisfies A3 and A5, that satisfies A6, that for any compact set , and, finally, that satisfies A4. Then, if is not white noise, with probability 1, there exists , such that for all and then a.s..

###### Proof.

Theorem 2 of [53] shows that

 limn→∞β∈Binf ^σMn(abn(β,^σ(ϕ,θ)))>σ0+δa.s.. (45)

Starting from (45),

 limn→∞β∈Binf^στn(abn(β,^σ(ϕ,θ)))>σ0+δa.s. (46)

follows from Lemmas 9 and 10 of [53] together with Eq. (30), as long as satisfies A3. Furthermore, in Theorem 1, it is established that

 limn→∞^στn(an(^βτ))=σ0a.s.. (47)

and this proves the theorem. ∎

Theorem 3. establishes the asymptotic normality of the estimator for the ARMA model.
Let be as in (2), let A1, A2, A3 be fulfilled and let . Further, assume that and are continuous and bounded functions. Then, the -estimator is asymptotically normally distributed with

 (n−p)1/2(^βτ−β0)−→DN(0,Σ), (48)

where

 Σ=σ20E[ψ2τ(at/σ0)]E2[ψ′τ(at/σ0)](σ2C−100c−20) (49)

with ,

 c0=−1−∑pi=1ϕ0i1−∑qi=1θ0i (50)

and being the matrix of dimensions with elements

 ci,j = ∞∑k=0νkνk+j−iifi≤j≤p, (51) cp+i,p+j = ∞∑k=0ϖkϖk+j−iifi≤j≤q, (52) ci,p+j = −∞∑k=0ϖkνk+j−iifi≤p,j≤p,i≤j, (53) ci,p+j = −∞∑k=0vkϖk+i−jifi≤p,j≤q,j≤i. (54)

Here and .

###### Proof.

According to Theorem 5 of [53], an M-estimator, under the same assumptions that are made in this theorem, is asymptotically normally distributed with

 (n−p)1/2(^βM−β0)−→DN(0,Σ), (55)

where

 Σ=σ20E[ψ2(at/σ0)]E2[ψ′(at/σ0)](σ2C−100c−20.) (56)

To prove Theorem 3, it must be shown that the -estimator of the ARMA parameters satisfies an -estimating equation. Differentiating (15) yields the following system of equations:

 ∇^στn(an(β))2 = 2^σMn(an(β))∇^σMn(an(β))⋅ (57) 1n−pn∑t=p+1ρ2(at(β)^σMn(an(β))) +1n−pn∑t=p+1ψ2(at(β)^σMn(an(β)))⋅ (∇at(β)^σMn(an(β))−at(β)∇^σMn(an(β))) =0.

Here,

 ∇^σMn(an(β))=−^σMn(an(β))∑nt=p+1ψ1(at(β)^σMn(an(β)))∇at(β)∑nt=p+1ψ1(at(β)^σMn(an(β)))at(β) (58)

with , where

 ∂aet(β)∂ϕi=−θ−1(B)(yt−i−μ),1≤i≤p, (59)
 ∂aet(β)∂θj=−θ−2(B)ϕ(B)(yt−j−μ),1≤j≤q, (60)

and

 ∂aet(β)∂μ=−1−∑pi=1ϕi1−∑qj=1θj. (61)

Replacing (58) in (57) and defining

 Wn(β)=∑nt=p+12ρ2(at(β)^σMn(an(β)))−ψ2(at(β)^σMn(an(β)))at(β)^σMn(an(β))∑nt=p+1ψ1(at(β)^σMn(an(β)))at(β)^σMn(an(β)), (62)

if satisfies A6, the -estimate satisfies an M-estimating equation

 n∑t=p+1ψτ(at(β)^σMn(an(β)))∇at(β)=0 (63)

with data adaptive given by

 ψτ(x)=Wn(β)ψ1(at(β)^σMn(an(β)))+ψ2(at(β)^σMn(an(β))). (64)

Special cases are (i) which results in and the -estimator being equivalent to an LS estimator, (ii) which results in the -estimator being equivalent to an S-estimator. The asymptotic value of the estimator is defined by

 limn→∞n∑t=p+1ψτ(at(β)^σMn(an(β)))∇at(β)=0 (65)

and under suitable regularity conditions, i.e., ergodicity, the interchange of limits is justified (e.g. by dominated convergence) to yield

 limn→∞n∑t=p+1ψτ(at(β)^σMn(an(β)))∇an(β)=E[ψτ(at(β)^σM(β))