# Empirical Likelihood for Change Point Detection in Autoregressive Models

## Abstract

Change point analysis has become an important research topic in many fields of application. Considerable research has been carried out to detect changes and their locations in time series data. In this paper, a nonparametric method based on the empirical likelihood is proposed to detect structural changes in the parameters of autoregressive (AR) models. Under certain conditions, the asymptotic null distribution of the empirical likelihood ratio test statistic is proved to be the extreme value distribution. Further, the consistency of the test is proved. Simulations are carried out to examine the power of the proposed test. The proposed method is applied to a real-world data set to further illustrate the testing procedure.

Keywords: Autoregressive model; Change point analysis; Empirical likelihood; Extreme value distribution; Consistency.

## 1 Introduction

Change point analysis, introduced by Page (1954, 1955), has become popular because of its use in a wide variety of fields, such as stock market analysis, quality control, traffic mortality rates, geological data analysis, and genetics. It concerns both detecting whether one or more changes have occurred and identifying the locations of any such changes. Scholars have proposed several methods to identify and estimate change points. A Bayesian approach to detecting changes in the mean was discussed by Chernoff and Zacks (1964) and Sen and Srivastava (1975). Further, Csörgő and Horváth (1997) and Chen and Gupta (2000) established asymptotic results for parametric change point models. Hawkins (1977), Worsley (1986) and Gombay and Horváth (1994) are a few among the many researchers who discussed the change point problem in parametric settings. However, parametric methods are no longer applicable if the underlying distribution is completely unknown. In such a case, a nonparametric approach should be considered as an alternative. One popular nonparametric approach is the cumulative sum (CUSUM) method. Most authors have assumed that the observations are independent and studied the case where two distributions differ only in location. Combining nonparametric approaches with change point detection has been studied by many scholars over the past years. Aue and Horváth (2012) discussed how two methods, CUSUM and the likelihood ratio test (LRT), can be modified for data exhibiting serial dependence, and provided some insight into sequential procedures as well. Lee et al. (2003) also discussed the CUSUM test for parameter changes in time series models, considering changes in the parameters of a random coefficient autoregressive AR(1) model and in the autocovariances of a linear process.

The change point problem may be viewed as a two-sample test adjusted for the unknown break location, thus leading to max-type procedures. Correspondingly, asymptotic relationships are derived to obtain critical values for the tests. In general, the change point problem can be described as follows. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random vectors (variables) with probability distribution functions $F_1, F_2, \ldots, F_n$, respectively. More specifically, suppose that the distributions $F_1, F_2, \ldots, F_n$ belong to a common parametric family $F(\theta)$. Then the change point problem is to test the following hypotheses about the population parameters $\theta_1, \theta_2, \ldots, \theta_n$:

$$H_0: \theta_1 = \theta_2 = \cdots = \theta_n = \theta \ (\text{unknown}),$$

versus the alternative

$$H_1: \theta_1 = \cdots = \theta_{k_1} \neq \theta_{k_1+1} = \cdots = \theta_{k_2} \neq \cdots \neq \theta_{k_{q-1}+1} = \cdots = \theta_{k_q} \neq \theta_{k_q+1} = \cdots = \theta_n,$$

where the number of change points $q$ and their locations $k_1, k_2, \ldots, k_q$ are unknown and need to be estimated.

Empirical likelihood, introduced by Owen (1988, 1990), is one of the most popular and powerful nonparametric approaches. It has been widely used because it combines the robustness of a nonparametric method with the efficiency of a likelihood construction. Kolaczyk (1994) used empirical likelihood with generalized linear models. Further, Qin and Lawless (1994) linked empirical likelihood with estimating equations and derived asymptotic properties of the test statistic. Many scholars have discussed the empirical likelihood ratio test for a change point in linear models, such as Zou et al. (2007), Liu et al. (2008), and Ning (2012). Since the empirical likelihood was originally proposed for independent data, it is difficult to apply it to dependent data such as time series. Several approaches have been suggested to reduce the dependent data problem to an independent data problem. Owen (2001) suggested using the conditional likelihood to remove the dependence structure and generate the estimating equations. Kitamura (1997) used a block-wise empirical likelihood method that preserves the dependence of the data; the resulting likelihood ratios have been used to construct asymptotically valid confidence intervals. Ogata (2005) and Nordman and Lahiri (2006) independently formulated a frequency domain empirical likelihood (FDEL) using spectral estimating equations, which can be used for short- and long-range dependent data. Bai and Perron (1998) proposed CUSUM and F-based statistics for change point detection. Baragona et al. (2013) compared these statistics with their own test for change point detection based on the empirical likelihood approach.

To deal with multiple changes, the binary segmentation method proposed by Vostrikova (1981) is traditionally used. The advantage of this method is that it detects the number of change points and estimates their locations simultaneously; moreover, its consistency has been established. Hence, the general hypothesis of the change point problem can be simplified to the hypothesis of no change point versus a single change point, i.e., the alternative hypothesis is:

$$H_1: \theta_1 = \cdots = \theta_k \neq \theta_{k+1} = \cdots = \theta_n,$$

where $k$ is the location of the single change point at this stage. If $H_0$ is not rejected, then the process is stopped and we conclude that there is no change. If $H_0$ is rejected, then there is a change point, and the two subsequences before and after the detected change point are each tested for a change. This process is repeated until no subsequence contains a change point.
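The recursion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `detect` is a hypothetical user-supplied single-change test that returns an estimated change index when $H_0$ is rejected and `None` otherwise, and `mean_shift_detector` is a toy stand-in based on a difference of segment means, not the empirical likelihood test developed in this paper.

```python
from typing import Callable, List, Optional, Sequence

Detector = Callable[[Sequence[float]], Optional[int]]


def binary_segmentation(data: Sequence[float], detect: Detector,
                        min_size: int = 2) -> List[int]:
    """Return the sorted change locations found by recursive single-change tests.

    `detect(segment)` (hypothetical interface) returns an index k such that the
    change splits the segment into segment[:k] and segment[k:], or None when
    the null hypothesis of no change is not rejected.
    """
    found: List[int] = []

    def recurse(lo: int, hi: int) -> None:
        # Stop when the segment is too short to be split any further.
        if hi - lo < 2 * min_size:
            return
        k = detect(data[lo:hi])
        if k is None or k <= 0 or k >= hi - lo:
            return  # H0 not rejected on this segment
        found.append(lo + k)
        recurse(lo, lo + k)   # test the subsequence before the change
        recurse(lo + k, hi)   # test the subsequence after the change

    recurse(0, len(data))
    return sorted(found)


def mean_shift_detector(threshold: float) -> Detector:
    """Toy stand-in detector: largest gap between left and right sample means."""
    def detect(seg: Sequence[float]) -> Optional[int]:
        n = len(seg)
        best_k, best_gap = None, threshold
        for k in range(1, n):
            gap = abs(sum(seg[:k]) / k - sum(seg[k:]) / (n - k))
            if gap > best_gap:
                best_k, best_gap = k, gap
        return best_k
    return detect
```

In practice the toy detector would be replaced by whatever single-change test is in use; the recursion itself does not depend on the choice of test.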

In this paper, we propose a test statistic based on the empirical likelihood approach for detecting changes in a time series model. In Section 2, the change point problem in time series models is introduced for the AR($p$) model. The empirical likelihood procedure for change point detection is described in Section 3. The asymptotic null distribution of the test statistic and the consistency of the test are presented in Section 4. Simulations are carried out in Section 5, and a real data application is given in Section 6. Section 7 provides some discussion, and proofs of the results are given in the Appendix.

## 2 Changepoint Problem in AR(p) Model

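Consider the stationary AR($p$) model with mean 0:

$$X_t = \begin{cases} \sum_{i=1}^{p} \phi_i X_{t-i} + \epsilon_t, & 1 \le t \le k, \\ \sum_{i=1}^{p} \phi_i^{*} X_{t-i} + \epsilon_t, & k+1 \le t \le n, \end{cases}$$

where the $\epsilon_t$'s are independent random variables with mean zero and variance $\sigma^2$ (i.e., a white noise process), $\phi_i$, $\phi_i^{*}$ ($i = 1, \ldots, p$) and $\sigma^2$ are all unknown parameters, and $k$ is the unknown change location, which needs to be estimated. Denote $\delta = (\delta_1, \ldots, \delta_p)'$, where $\delta_i = \phi_i^{*} - \phi_i$ for $i = 1, \ldots, p$. Therefore, the change point problem is to test the null hypothesis of no change in the autoregressive parameters versus the alternative hypothesis of one unknown change, i.e.,

$$H_0: \delta = 0 \quad \text{vs.} \quad H_1: \delta \neq 0 \ (\text{at least one } \delta_i \neq 0).$$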

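Hence, under the alternative hypothesis, at least one of the parameters changes at an unknown location. We denote by $\beta_0$ and $\beta_1$ the parameter vectors under the null and the alternative hypothesis, respectively. Following Owen (1991), we take the estimating functions to be

$$g_1(X_i, \beta_0) = (X_i,\ X_{i-1}\epsilon_i,\ \ldots,\ X_{i-p}\epsilon_i,\ \epsilon_i^2 - \sigma^2), \tag{1}$$

where $\epsilon_i = X_i - \sum_{l=1}^{p} \phi_l X_{i-l}$ for $1 \le i \le k$, and

$$g_2(X_j, \beta_1) = (X_j,\ X_{j-1}\epsilon_j,\ \ldots,\ X_{j-p}\epsilon_j,\ \epsilon_j^2 - \sigma^2), \tag{2}$$

where $\epsilon_j = X_j - \sum_{l=1}^{p} \phi_l^{*} X_{j-l}$ for $k+1 \le j \le n$. It is easy to see that

$$E[g_1(X_i, \beta_0)] = 0, \qquad E[g_2(X_j, \beta_1)] = 0$$

for every $1 \le i \le k$ and $k+1 \le j \le n$.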
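As a concrete illustration of the estimating functions in (1) and (2), the following Python sketch evaluates them for a given series and a given coefficient vector. The function names, the 0-based indexing, and the requirement that `t` be at least the AR order are choices made for this example only; the same function serves for $g_1$ or $g_2$ depending on which coefficient vector is passed in.

```python
from typing import List, Sequence


def ar_residual(x: Sequence[float], t: int, phi: Sequence[float]) -> float:
    """epsilon_t = X_t - sum_i phi_i * X_{t-i}, with 0-based t >= len(phi)."""
    return x[t] - sum(phi[i] * x[t - 1 - i] for i in range(len(phi)))


def estimating_function(x: Sequence[float], t: int,
                        phi: Sequence[float], sigma2: float) -> List[float]:
    """(X_t, X_{t-1} eps_t, ..., X_{t-p} eps_t, eps_t^2 - sigma^2) as a list."""
    eps = ar_residual(x, t, phi)
    lagged = [x[t - 1 - i] * eps for i in range(len(phi))]
    return [x[t]] + lagged + [eps * eps - sigma2]


def mean_estimating_function(x: Sequence[float], phi: Sequence[float],
                             sigma2: float, start: int, stop: int) -> List[float]:
    """Componentwise sample average of the estimating function over start <= t < stop."""
    rows = [estimating_function(x, t, phi, sigma2) for t in range(start, stop)]
    return [sum(col) / len(rows) for col in zip(*rows)]
```

If the model is correctly specified, the sample averages returned by `mean_estimating_function` should be close to zero for long series, which is the sample analogue of the moment conditions above.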

## 3 Empirical Likelihood for AR(p) Changepoint Model

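Without loss of generality, we assume one change point at an unknown location $k$. Let

$$\Omega_{H_0} = \Big\{(p, q, \beta_0) \,\Big|\, \sum_i p_i\, g_1(X_i, \beta_0) = \sum_j q_j\, g_2(X_j, \beta_0) = 0\Big\}, \tag{3}$$

and

$$\Omega_{H_1} = \Big\{(p, q, \beta_1) \,\Big|\, \sum_i p_i\, g_1(X_i, \beta_0) = 0,\ \sum_j q_j\, g_2(X_j, \beta_1) = 0\Big\} \tag{4}$$

be the parameter spaces under $H_0$ and $H_1$, respectively, where $p = (p_1, \ldots, p_k)$ and $q = (q_{k+1}, \ldots, q_n)$ are probability vectors such that $p_i \ge 0$, $\sum_i p_i = 1$, and $q_j \ge 0$, $\sum_j q_j = 1$. If a change occurs at $k$, then the empirical likelihood ratio test statistic is defined as

$$-2\log\Lambda_k = -2\log\frac{\sup_{H_0}\big\{\prod_i p_i \prod_j q_j \,\big|\, (p, q, \beta_0) \in \Omega_{H_0}\big\}}{\sup_{H_1}\big\{\prod_i p_i \prod_j q_j \,\big|\, (p, q, \beta_1) \in \Omega_{H_1}\big\}} = Z_{H_0,k} - Z_{H_1,k},$$

where

$$\begin{aligned}
Z_{H_0,k} &= -2\sup_{H_0}\Big\{\sum_i \log k p_i + \sum_j \log (n-k) q_j \,\Big|\, (p, q, \beta_0) \in \Omega_{H_0}\Big\}, \\
Z_{H_1,k} &= -2\sup_{H_1}\Big\{\sum_i \log k p_i + \sum_j \log (n-k) q_j \,\Big|\, (p, q, \beta_1) \in \Omega_{H_1}\Big\}.
\end{aligned}$$

The null hypothesis is rejected for a sufficiently large value of $-2\log\Lambda_k$. Let $\theta_{nk} = k/n$. A Lagrangian argument gives

$$p_i = \frac{1}{n\theta_{nk}\big(1 + \theta_{nk}^{-1}\lambda_1' g_1(X_i, \cdot)\big)}$$

and

$$q_j = \frac{1}{n(1-\theta_{nk})\big(1 + (1-\theta_{nk})^{-1}\lambda_2' g_2(X_j, \cdot)\big)},$$

where $\lambda_1$ and $\lambda_2$ are chosen such that $\sum_i p_i\, g_1(X_i, \cdot) = 0$ and $\sum_j q_j\, g_2(X_j, \cdot) = 0$. Therefore, under $H_0$ we obtain

$$Z_{H_0,k} = 2\inf_{\beta_0}\sup_{\lambda_1,\lambda_2}\Big\{\sum_i \log\big(1 + \theta_{nk}^{-1}\lambda_1' g_1(X_i, \beta_0)\big) + \sum_j \log\big(1 + (1-\theta_{nk})^{-1}\lambda_2' g_2(X_j, \beta_0)\big)\Big\}.$$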
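To make the Lagrangian step more tangible, here is a minimal Python sketch of the simplest possible case: a single sample and a scalar estimating function with fixed parameter, so that the inner problem reduces to finding one multiplier $\lambda$ with $\sum_i z_i/(1+\lambda z_i)=0$. This is a simplification for illustration (no $\theta_{nk}$ weighting, no two-segment structure, no outer optimisation over $\beta_0$), and the bisection solver is a choice made here, not a method taken from the paper.

```python
import math
from typing import Sequence, Tuple


def el_log_ratio(z: Sequence[float], iterations: int = 200) -> Tuple[float, float]:
    """Return (lam, stat) for scalar estimating-function values z_i.

    lam solves sum_i z_i / (1 + lam * z_i) = 0 and
    stat = 2 * sum_i log(1 + lam * z_i), the scalar one-sample analogue of
    the profile statistic. Zero must lie strictly inside the range of z.
    """
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        raise ValueError("zero must lie strictly between min(z) and max(z)")

    # All weights 1 / (n * (1 + lam * z_i)) are positive exactly on this interval.
    a, b = -1.0 / z_max, -1.0 / z_min

    def score(lam: float) -> float:
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # The score is strictly decreasing on (a, b), so bisection finds its root.
    for _ in range(iterations):
        mid = 0.5 * (a + b)
        if score(mid) > 0.0:
            a = mid
        else:
            b = mid
    lam = 0.5 * (a + b)
    stat = 2.0 * sum(math.log1p(lam * zi) for zi in z)
    return lam, stat
```

For the two-point sample `z = [-1.0, 2.0]` the multiplier can be found by hand as $\lambda = 1/4$, which gives weights $2/3$ and $1/3$ and a statistic of $2\log(1.125)$; the sketch reproduces these values numerically.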

Let . The score functions are defined as:

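$$Q_{1n}(\beta_0, \lambda) = \frac{\partial Z_{H_0,k}}{\partial \lambda} = \frac{1}{n}\sum_m \frac{1}{1 + \theta_m^{-1}\lambda' g(X_m, \beta_0)}\,\theta_m^{-1} g(X_m, \beta_0),$$

and

$$Q_{2n}(\beta_0, \lambda) = \frac{\partial Z_{H_0,k}}{\partial \beta_0} = \frac{1}{n}\sum_m \frac{1}{1 + \theta_m^{-1}\lambda' g(X_m, \beta_0)}\,\theta_m^{-1}\Big(\frac{\partial g(X_m, \beta_0)}{\partial \beta_0}\Big)' \lambda,$$

where

$$\begin{aligned}
\theta_m^{-1} &= \theta_{nk}^{-1}\,\mathbf{1}\{1 \le m \le k\} + (1-\theta_{nk})^{-1}\,\mathbf{1}\{k+1 \le m \le n\}, \\
g(X_m, \beta_0) &= g_1(X_m, \beta_0)\,\mathbf{1}\{1 \le m \le k\} + g_2(X_m, \beta_0)\,\mathbf{1}\{k+1 \le m \le n\}.
\end{aligned}$$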

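Under certain regularity conditions, Qin and Lawless (1994) showed that there exists $(\tilde\beta_0, \tilde\lambda)$ such that

$$Q_{1n}(\tilde\beta_0, \tilde\lambda) = 0 \quad\text{and}\quad Q_{2n}(\tilde\beta_0, \tilde\lambda) = 0.$$

Hence, we obtain $Z_{H_0,k} = 2 l_E(\tilde\Phi^{\circ}, \tilde\mu^{\circ}, 0)$.

Similarly, under $H_1$ we have $Z_{H_1,k} = 2 l_E(\tilde\Phi, \tilde\mu, \delta)$. Then the empirical likelihood ratio statistic can be rewritten as

$$-2\log\Lambda_k = 2 l_E(\tilde\Phi^{\circ}, \tilde\mu^{\circ}, 0) - 2 l_E(\tilde\Phi, \tilde\mu, \delta). \tag{5}$$

Since $k$ is unknown, $H_0$ is rejected when the maximally selected log-likelihood ratio statistic,

$$Z_n = \max_{\theta_{nk} \in \Theta_n}\{-2\log\Lambda_k\},$$

where $\Theta_n$ denotes the set of candidate values of $\theta_{nk}$, is sufficiently large.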

When $k$ or $n-k$ is too small, the minimax estimators of the empirical likelihood may not exist. Hence we consider a trimmed likelihood ratio statistic, in which the range of $k$ is restricted as follows. The trimmed likelihood ratio statistic is defined as

$$Z_n^{*} = \max_{\theta_{nk} \in \Theta_n^{*}}\{-2\log\Lambda_k\}, \tag{6}$$

where $\Theta_n^{*}$ denotes the trimmed set of candidate values of $\theta_{nk}$. According to Perron and Vogelsang (1992), the selection of the lower and upper trimming bounds can be arbitrary. In our work, the bounds are defined through $[x]$, the largest integer not larger than $x$. If $H_0$ is true, then $Z_n^{*}$, suitably normalized, asymptotically follows an extreme value distribution. The convergence to the extreme value limit can be slow, and the asymptotic test often tends to be too conservative in finite samples.
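The maximisation over the trimmed range can be sketched as a simple scan. In the Python example below, `stat_at` is a hypothetical callable that returns $-2\log\Lambda_k$ for a given $k$, and `k_low` and `k_high` are placeholder trimming bounds introduced only for this illustration; they are not the specific bounds chosen in this paper.

```python
from typing import Callable, Tuple


def trimmed_max(stat_at: Callable[[int], float], n: int,
                k_low: int, k_high: int) -> Tuple[float, int]:
    """Return (max statistic, argmax k) over the trimmed range k_low..k_high.

    `stat_at(k)` is assumed to return the single-location statistic for a
    candidate change location k; the bounds must satisfy
    1 <= k_low <= k_high <= n - 1.
    """
    if not (1 <= k_low <= k_high <= n - 1):
        raise ValueError("need 1 <= k_low <= k_high <= n - 1")
    # Ties are resolved in favour of the smallest k.
    best_k = max(range(k_low, k_high + 1), key=stat_at)
    return stat_at(best_k), best_k
```

The location returned alongside the maximum is a natural estimate of the change point when the null hypothesis is rejected.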

## 4 Main Results

The results are similar to those of Csörgő and Horváth (1997). Under mild regularity conditions, the following theorem holds.

###### Theorem 1.

Let be the true parameter. Suppose that , , and is positive definite. If is true, then we have

$$\lim_{n\to\infty} \Pr\Big\{A(\log u(n))\,(Z_n^{*})^{1/2} \le t + D_r(\log u(n))\Big\} = \exp(-e^{-t})$$

for all t, where , , , and is the dimension of the parameter .
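Because the limit in Theorem 1 is the Gumbel distribution function $\exp(-e^{-t})$, an asymptotic level-$\alpha$ critical value solves $\exp(-e^{-t}) = 1-\alpha$, that is, $t_\alpha = -\log(-\log(1-\alpha))$. The short Python sketch below computes it; the function names are illustrative, and the argument `normalized_stat` is assumed to already be $A(\log u(n))(Z_n^{*})^{1/2} - D_r(\log u(n))$ with $A$, $D_r$ and $u(n)$ as defined in the theorem.

```python
import math


def gumbel_critical_value(alpha: float) -> float:
    """Solve exp(-exp(-t)) = 1 - alpha for t."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return -math.log(-math.log(1.0 - alpha))


def reject_null(normalized_stat: float, alpha: float) -> bool:
    """Reject H0 when the normalized statistic exceeds the Gumbel quantile."""
    return normalized_stat > gumbel_critical_value(alpha)
```

At $\alpha = 0.05$ this gives approximately 2.9702, which agrees with the theoretical critical value quoted in the simulation study.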

###### Theorem 2.

Under the conditions of Theorem 1 and the condition that for every fixed parameter , there exists a positive constant satisfy that , holds. If is true, assume that as , then ELR test statistic is consistent, i.e. there exists a constant such that

$$P(Z_n > c_n) \to 1.$$

###### Theorem 3.

Under the conditions of Theorem 1 and the condition that for every fixed parameter , there exists a positive constant satisfy that , holds. If is true, assume that as , we have in probability as .

Proofs are given in the Appendix.

## 5 Simulation Study

A Monte Carlo simulation has been conducted to illustrate the performance of the proposed method. Consider the following AR(1) model with mean zero:

$$X_t = \begin{cases} 0.1\,X_{t-1} + \epsilon_t, & 1 \le t \le k, \\ 0.5\,X_{t-1} + \epsilon_t, & k+1 \le t \le n, \end{cases}$$

where $\epsilon_t$ is white noise with mean zero and variance $\sigma^2$. Four different distributions, labeled (i) to (iv), are considered for $\epsilon_t$. The power of the proposed test in detecting changes in the parameters of the AR(1) model has been calculated for three different sample sizes: $n = 100$, $150$ and $250$. Different change locations have been considered for each sample size. Additional simulations have been carried out to compute the empirical critical values at different significance levels, which turned out to be close to the theoretical critical values for the corresponding significance levels. Hence, we use the theoretical critical value 2.9702 with $\alpha = 0.05$ for power calculations with 1000 simulations. The results are listed in Table 1. It can be seen that the power of the test for the AR(1) model increases with the sample size. The power values for a given change location are approximately the same across the four error distributions. This may be due to the fact that the error distributions are standardized. When the change location is farther from the start of the series, the power tends to decrease. Intuitively, this may be due to the dependence in the data.
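The data-generating step of such a simulation can be sketched as follows. This Python example assumes standard normal errors and a zero starting value, both chosen here for illustration; the `reject` argument of `estimate_power` is a hypothetical callable standing in for the test decision, which is not implemented in this sketch.

```python
import random
from typing import Callable, List, Optional, Sequence


def simulate_ar1_change(n: int, k: int, phi_before: float = 0.1,
                        phi_after: float = 0.5,
                        seed: Optional[int] = None) -> List[float]:
    """Simulate X_1..X_n from an AR(1) model whose coefficient changes after time k.

    Assumptions of this sketch: N(0, 1) errors and X_0 = 0.
    """
    rng = random.Random(seed)
    x_prev = 0.0
    series: List[float] = []
    for t in range(1, n + 1):
        phi = phi_before if t <= k else phi_after
        x_t = phi * x_prev + rng.gauss(0.0, 1.0)
        series.append(x_t)
        x_prev = x_t
    return series


def estimate_power(reject: Callable[[Sequence[float]], bool], n: int, k: int,
                   reps: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated series on which `reject` returns True."""
    hits = sum(1 for r in range(reps)
               if reject(simulate_ar1_change(n, k, seed=seed + r)))
    return hits / reps
```

Using a different seed for every replication keeps the runs reproducible while still giving independent series.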

## 6 Application

In this section, we study data consisting of the monthly average soybean prices received by farmers in Illinois from January 1960 to November 2008, a sample of size 587. The prices are given in dollars per bushel. These data were analyzed by Balcombe et al. (2007), who considered threshold AR(1) models for the prices of agricultural products. Berkes et al. (2011) studied the same data set, proposing a likelihood ratio test to detect a structural change from an AR model to a threshold AR model. We apply the proposed EL method for the AR(1) change point model to detect structural change in the same data set. Figure 1 shows the time series plot of the data.

To test whether there are significant changes, we use the statistic $Z_n^{*}$ from (6). The value of $Z_n^{*}$ is 16.07426. Using the critical values derived from Theorem 1, which are given in Table 2, we have sufficient evidence to reject the null hypothesis that there is no change.

## 7 Discussion

In this paper, we develop an EL-based procedure for detecting structural changes in time series data, i.e., for testing the null hypothesis of no change versus the alternative hypothesis of one change. A test statistic is derived for a fixed change location, and the max-type test statistic over all possible change locations is considered. The asymptotic null distribution of the test statistic has been established as an extreme value distribution. Simulations to compute the power in the AR(1) model have been carried out with different sample sizes and different error distributions to illustrate the performance of the proposed test statistic. The results indicate that the proposed method efficiently identifies changes in a given time series data set. We should point out that, because of the slow convergence of the proposed test statistic in Theorem 1, a moderate or large sample size is recommended to achieve a good approximation (see Csörgő and Horváth, 1997). If the sample size is small, the bootstrap is suggested for obtaining approximate p-values in practice.

As for future work, we plan to extend the proposed method to other stationary time series models, such as MA, ARMA and GARCH models, along with the corresponding analytic results and simulations. Comparisons with other existing methods will also be made. Further, sequential change point detection based on the EL method is to be studied, where the sample size is a random variable and the null hypothesis of structural stability is rejected as soon as a change is detected. The objective in sequential change point detection is to detect such a change with a minimum number of false alarms. A nonparametric testing procedure based on the EL method will be proposed, and the related asymptotic results will be studied.

## Appendix

In order to prove Theorem 1, we need the following lemmas.

###### Lemma 1.

Assume that for is positive definite, is continuous in a neighborhood of the true value , , , and are all bounded in the neighborhood of the true value . Then, as , , with probability 1 satisfying,

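$$Q_{1n}(\tilde\beta, \tilde\lambda) = 0, \quad Q_{2n}(\tilde\beta, \tilde\lambda) = 0 \quad\text{and}\quad \|\tilde\beta - \beta_0\| = O_p(m^{-1/2}),$$

where

$$\begin{aligned}
Q_{1n}(\beta, \lambda) &= \sum_l \frac{1}{1 + \lambda'(\beta)\theta_l^{-1} g(x_l, \beta)}\,\theta_l^{-1} g(x_l, \beta), \\
Q_{2n}(\beta, \lambda) &= \sum_l \frac{1}{1 + \lambda'(\beta)\theta_l^{-1} g(x_l, \beta)}\,\theta_l^{-1}\Big(\frac{\partial g(x_l, \beta)}{\partial \beta}\Big)' \lambda(\beta).
\end{aligned}$$
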
###### Proof.

First we will show

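$$\begin{aligned}
\lambda(\beta) &= \epsilon_k O_p(m^{-1/2}) \\
&= \Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\Big]^{-1}\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)\Big] + \epsilon_k o_p(m^{-1/2}),
\end{aligned}$$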

where and .
Let for where . Let  be the solution of the function  given by the first score function defined in Section 3.

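$$f(\lambda) = \frac{1}{n}\sum_{l=1}^{n}\frac{\theta_l^{-1}}{1 + \lambda'(\beta)\theta_l^{-1} g(x_l, \beta)}\, g(x_l, \beta) = 0. \tag{A.1}$$

Let $\lambda = \rho u$, where $\rho \ge 0$ and $\|u\| = 1$. Then

$$\begin{aligned}
0 = \|f(\rho u)\| &\ge |u' f(\rho u)| \\
&= \frac{1}{n}\Big|u'\Big(\sum_l \theta_l^{-1} g(x_l, \beta) - \rho\sum_l \frac{\theta_l^{-2} g(x_l, \beta)\, u' g(x_l, \beta)}{1 + \rho u'\theta_l^{-1} g(x_l, \beta)}\Big)\Big| \\
&\ge \frac{\rho}{n}\, u'\sum_l \frac{\theta_l^{-2} g(x_l, \beta)\, u' g(x_l, \beta)}{1 + \rho u'\theta_l^{-1} g(x_l, \beta)} - \frac{1}{n}\Big|\sum_{j=1}^{p} e_j'\sum_l \theta_l^{-1} g(x_l, \beta)\Big| \\
&\ge \frac{\rho\, u' S u}{1 + \rho\theta_l g^{*}} - O_p(m^{-1/2}),
\end{aligned}$$

where $e_j$ is the unit vector in the $j$th coordinate direction, $g^{*} = \max_l \|g(x_l, \beta)\|$ and $S = \frac{1}{n}\sum_l \theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)$.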

Since , where is the smallest eigen value of , then

$$\frac{\rho}{1 + \rho\theta_l g^{*}} = O_p(m^{-1/2}).$$

So, .
Let . Then, .
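Expanding (A.1),

$$\begin{aligned}
0 = f(\lambda) &= \frac{1}{n}\sum_l \theta_l^{-1} g(x_l, \beta)\Big[1 - \gamma_l + \frac{\gamma_l^2}{1 + \gamma_l}\Big] \\
&= \frac{1}{n}\sum_l \theta_l^{-1} g(x_l, \beta) - \frac{1}{n}\sum_l \theta_l^{-1} g(x_l, \beta)\,\gamma_l + \frac{1}{n}\sum_l \theta_l^{-1} g(x_l, \beta)\frac{\gamma_l^2}{1 + \gamma_l} \\
&= E\big(\theta_l^{-1} g(x_l, \beta)\big) - S\lambda + \frac{1}{n}\sum_l \theta_l^{-1} g(x_l, \beta)\frac{\gamma_l^2}{1 + \gamma_l}.
\end{aligned} \tag{A.2}$$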

The last equality is since
By substituting , we have the final term of ((A.2));

$$\frac{1}{n}\sum_l \|\theta_l^{-1} g(x_l, \beta)\|^3\, \|\lambda\|^2\, |1 + \gamma_l|^{-1} = o_p(m^{1/2})\, O_p(m^{-1})\, o_p(1) = o_p(m^{-1/2}).$$

Therefore,

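$$\begin{aligned}
0 &= E\big(\theta_l^{-1} g(x_l, \beta)\big) - S\lambda + o_p(m^{-1/2}) \\
\Rightarrow\ \lambda &= S^{-1} E\big(\theta_l^{-1} g(x_l, \beta)\big) + o_p(m^{-1/2}) \\
\Rightarrow\ \lambda &= \Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\Big]^{-1}\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)\Big] + o_p(m^{-1/2}).
\end{aligned} \tag{A.3}$$

Now, denote $V_n(\beta) = \frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)$, $\bar g(\beta) = \frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)$, and $\varepsilon = o_p(m^{-1/2})$. So (A.3) can be rewritten as

$$\lambda(\beta) = V_n(\beta)^{-1}\bar g(\beta) + \varepsilon.$$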

Since so .
Let be any constant sequence such that , and . Denote the ball and the surface of the ball . For any , we have

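$$\begin{aligned}
V_n(\beta) &= \frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta) \\
&= \frac{n}{k}\,\frac{1}{k}\sum_{l=1}^{k} g_1(x_l, \beta_0) g_1'(x_l, \beta_0) + \frac{n}{n-k}\,\frac{1}{n-k}\sum_{l=k+1}^{n} g_2(x_l, \beta_0) g_2'(x_l, \beta_0) + o_p(\epsilon_k^{-1}) \\
&= \frac{n}{k}\, E\, g_1(x_l, \beta_0) g_1'(x_l, \beta_0) + \frac{n}{n-k}\, E\, g_2(x_l, \beta_0) g_2'(x_l, \beta_0) + o_p(\epsilon_k^{-1})
\end{aligned}$$

and

$$\begin{aligned}
\bar g(\beta_0) &= \frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta) \\
&= \frac{1}{k}\sum_{l=1}^{k} g_1(x_l, \beta_0) + \frac{1}{n-k}\sum_{l=k+1}^{n} g_2(x_l, \beta_0) \\
&= \frac{1}{k}\, o_p(k^{1/2}) + \frac{1}{n-k}\, o_p\big((n-k)^{1/2}\big) \\
&= o_p(k^{-1/2}) + o_p\big((n-k)^{-1/2}\big) \\
&= o_p(m^{-1/2}).
\end{aligned}$$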

By the Taylor expansion, for any , we have

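$$l_E(\beta) = \sum_l \lambda'(\beta)\theta_l^{-1} g(x_l, \beta) - \frac{1}{2}\sum_l \big[\lambda'(\beta)\theta_l^{-1} g(x_l, \beta)\big]^2 + o_p(1). \tag{A.4}$$

The first term of (A.4) is:

$$\sum_l \lambda'(\beta)\theta_l^{-1} g(x_l, \beta) = n\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)\Big]'\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\Big]^{-1}\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)\Big] + o_p(1). \tag{A.4.1}$$

The second term of (A.4) is:

$$\begin{aligned}
\frac{1}{2}\sum_l \big[\lambda'(\beta)\theta_l^{-1} g(x_l, \beta)\big]^2 &= \frac{1}{2}\sum_l \lambda'(\beta)\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\lambda(\beta) \\
&= \frac{n}{2}\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)\Big]'\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\Big]^{-1} \\
&\quad\times \Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\Big]\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\Big]^{-1}\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)\Big] + o_p(1) \\
&= \frac{n}{2}\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)\Big]'\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\Big]^{-1}\Big[\frac{1}{n}\sum_{l=1}^{n}\theta_l^{-1} g(x_l, \beta)\Big] + o_p(1).
\end{aligned} \tag{A.4.2}$$

Now, subtracting (A.4.2) from (A.4.1),

$$\begin{aligned}
&\sum_l \lambda'(\beta)\theta_l^{-1} g(x_l, \beta) - \frac{1}{2}\sum_l \big[\lambda'(\beta)\theta_l^{-1} g(x_l, \beta)\big]^2 \\
&\quad= \frac{n}{2}\Big(\frac{1}{n}\sum_l \theta_l^{-1} g(x_l, \beta)\Big)'\Big(\frac{1}{n}\sum_l \theta_l^{-2} g(x_l, \beta) g'(x_l, \beta)\Big)^{-1}\Big(\frac{1}{n}\sum_l \theta_l^{-1} g(x_l, \beta)\Big) + o_p(1).
\end{aligned}$$

So we can rewrite (A.4) as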

 lE(β) =