# Change-point tests under local alternatives for long-range dependent processes

## Abstract

We consider the change-point problem for the marginal distribution of subordinated Gaussian processes that exhibit long-range dependence. The asymptotic distributions of Kolmogorov-Smirnov- and Cramér-von Mises type statistics are investigated under local alternatives. By doing so we are able to compute the asymptotic relative efficiency of the mentioned tests and the CUSUM test. In the special case of a mean-shift in Gaussian data it is always $1$. Moreover, our theory covers the scenario where the Hermite rank of the underlying process changes.

In a small simulation study we show that the theoretical findings carry over to the finite sample performance of the tests.

Keywords: asymptotic relative efficiency, change-point test, empirical process, local alternatives, long-range dependence.

## 1 Introduction

Over the last two decades various authors have studied the change-point problem under long-range dependence, and classical methods are often found to yield different results than under short-range dependence. The CUSUM test is studied in Csörgő and Horvath (1997) and compared to the Wilcoxon change-point test in Dehling et al. (2012). Ling (2007) investigates a Darling-Erdős-type result for a parametric change-point test, and estimators for the time of change are considered in Horvath and Kokoszka (1997) and Hariz et al. (2009). Moreover, the special features of long memory motivated new procedures. Beran and Terrin (1996) and Horvath and Shao (1999) test for a change in the linear dependence structure of the time series, and Berkes et al. (2006) and Baek and Pipiras (2011) construct tests to discriminate between stationary long memory observations and short memory sequences with a structural change. For a general overview of the change-point problem under long-range dependence see Kokoszka and Leipus (2001) and the associated chapter in Beran et al. (2013).

One of the classical change-point problems is the change of the marginal distributions of a time series. When testing for at most one change-point (AMOC) in the marginal distribution one often considers the empirical distribution function of the first $k$ observations and that of the remaining $n-k$ observations. Taking a distance between the empirical distributions and the maximum over all $k$ yields a natural statistic. Common distances are the supremum norm, which gives the Kolmogorov-Smirnov statistic

 Tn=max1≤k

or an $L^2$-distance, which gives the Cramér-von Mises statistic

 Sn=max1≤k

Both are widely used for goodness-of-fit tests and two-sample problems. In the change-point literature they are considered by Szyszkowicz (1994) for independent data, by Inoue (2001) for strongly mixing sequences and by Giraitis et al. (1996b) for linear long-memory processes. However, note that in the LRD setting only the Kolmogorov-Smirnov test has been investigated.

(1.1) and (1.2) are functionals of the sequential empirical process, that is for and . Thus the asymptotic distributions of and rely on that of the sequential empirical process. For weakly dependent sequences this would be a Gaussian process, in the special case of independent random variables it is called Kiefer-Müller process. For stationary sequences that exhibit long-range dependence, Dehling and Taqqu (1989a) proved that the limit process is of the form , where is a deterministic function and the process is therefore called semi-degenerate. They considered subordinated Gaussian processes, in detail for any measurable function and a Gaussian sequence with non-summable autocovariance function. A similar limit structure was later obtained independently by Ho and Hsing (1996) and Giraitis et al. (1996a) for long-range dependent moving-average sequences.

It is the main goal of this paper to derive the limit distribution of change-point statistics of the type (1.1) and (1.2) under local alternatives. We then apply these results to derive the asymptotic relative efficiency (ARE) of several change-point tests. To this end we investigate the sequence

$$G_1(X_1),\ldots,G_1(X_{k^*}),\,G_n(X_{k^*+1}),\ldots,G_n(X_n). \tag{1.3}$$

Here is a sequence of functions such that the distribution of converges to the distribution of in some suitable way.

Therefore, we are able to analyze various types of change-points, among them a mean-shift. Thus we may compute the ARE of the Kolmogorov-Smirnov, Cramér-von Mises, CUSUM and Wilcoxon tests and get the surprising result that in the case of Gaussian data it is always $1$.
The mathematically most challenging case is the situation when the Hermite rank changes. The Hermite rank of the class is defined as the smallest positive integer, such that for some , with being the -th Hermite polynomial. The structure of the limiting process , e.g. the marginal distribution and the covariance structure, mainly depends on . However, a special feature of distributional changes in subordinated Gaussian processes is the fact that the Hermite rank may change, too. Hence the question arises which Hermite process will determine the limit distribution. Under a mean-shift the Hermite rank remains unchanged, which can be seen easily by its definition.

Our results differ in various ways from those obtained in Giraitis et al. (1996b), where changes in the coefficients of an LRD linear process were investigated. While the empirical process of LRD moving average sequences converges to fractional Brownian motion, we may encounter higher order Hermite processes. The possible change in the Hermite rank is therefore a novel feature in our investigation.

The rest of the paper is organized as follows. In Section 2 we state a limit theorem for the sequential empirical process under change-point alternatives. Moreover, we give the asymptotic distribution of the test statistics under the hypothesis of no change as well as under local alternatives. Thus we are able to derive the asymptotic relative efficiency of several change-point tests. In Section 2.5 we consider the empirical process for long-range dependent arrays that are stationary within rows. The outcome mainly serves as a device for proving the main results, but is also of interest on its own. Section 3 contains the simulation study. To the best of our knowledge there are no results on the finite sample performance of the Cramér-von Mises change-point test under long memory. It is compared to other change-point tests and the effect of an estimated Hurst coefficient is discussed. We find that the theoretical results (e.g. the asymptotic relative efficiency between the Cramér-von Mises and CUSUM tests) carry over to the finite sample performance of the tests. Finally, proofs are provided in Section 4.

## 2 Main results

Let be a stationary Gaussian process, with

$$\mathrm{E}X_i=0,\quad \mathrm{E}X_i^2=1\quad\text{and}\quad \rho(k)=\mathrm{E}X_0X_k=k^{-D}L(k),$$

for and a slowly varying function . The non-summability of the covariance function is one possibility to define long-range dependence. We investigate our results for so called subordinated Gaussian processes , where and is a measurable function. The key tool in our analysis of possible changes in the marginal distribution of such a process is the sequential empirical process. To obtain weak convergence of this process the right normalization is given by , defined by

$$d_{n,m}^2=\operatorname{Var}\Big(\sum_{i=1}^{n}H_m(X_i)\Big)\sim n^{2H}L^m(n), \tag{2.1}$$

where the constant of proportionality is , see Theorem 3.1 in Taqqu (1975). is called Hurst coefficient and

$$m=\min\big\{q>0 \,\big|\, \mathrm{E}\big[1_{\{G(X_1)\le x\}}H_q(X_1)\big]\neq 0 \text{ for some } x\big\},$$

is the Hermite rank of . The mentioned result of Dehling and Taqqu (1989a) then reads as follows.

###### Theorem A (Dehling, Taqqu).

Let the class of functions have Hermite rank m and let . Then

$$\frac{1}{d_{n,m}}\sum_{i=1}^{\lfloor nt\rfloor}\big(1_{\{G(X_i)\le x\}}-F(x)\big)\xrightarrow{D}\frac{J_m(x)}{m!}Z_{m,H}(t) \tag{2.2}$$

where the convergence takes place in , equipped with the uniform topology. is defined by

$$J_m(x)=\mathrm{E}\big[1_{\{G(X_1)\le x\}}H_m(X_1)\big]$$

and is an -th order Hermite process, see Taqqu (1979) for a definition.
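The Hermite coefficient $J_q(x)=\mathrm{E}[1_{\{G(X_1)\le x\}}H_q(X_1)]$ and the Hermite rank can be approximated numerically. The following is a minimal sketch, not part of the paper: the function names, the Riemann-sum integration and the tolerance are illustrative assumptions, and the rank is only searched over a finite set of levels $x$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval  # probabilists' Hermite polynomials


def hermite_coefficient(G, q, x, half_width=10.0, num_points=400_001):
    """Approximate J_q(x) = E[1{G(X) <= x} H_q(X)], X ~ N(0, 1), by a Riemann sum."""
    t = np.linspace(-half_width, half_width, num_points)
    dt = t[1] - t[0]
    phi = np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)  # standard normal density
    coeffs = np.zeros(q + 1)
    coeffs[q] = 1.0                                   # selects the q-th Hermite polynomial
    indicator = (G(t) <= x).astype(float)
    return float(np.sum(indicator * hermeval(t, coeffs) * phi) * dt)


def hermite_rank(G, x_values, max_order=6, tol=1e-3):
    """Smallest q >= 1 with |J_q(x)| > tol for some x in x_values (None if not found)."""
    for q in range(1, max_order + 1):
        if any(abs(hermite_coefficient(G, q, x)) > tol for x in x_values):
            return q
    return None


levels = np.linspace(0.1, 3.0, 8)
rank_identity = hermite_rank(lambda t: t, levels)    # Gaussian marginals
rank_square = hermite_rank(lambda t: t**2, levels)   # chi-square marginals
print(rank_identity, rank_square)
```

For $G(x)=x$ the first coefficient is $-\phi(x)$, which is non-zero, while for $G(x)=x^2$ the first coefficient vanishes by symmetry and the second one does not.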

###### Remark 2.1.

In the case $m=1$, the Hermite process becomes the well-known fractional Brownian motion, which we denote by $B_H$.
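The growth rate in (2.1) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the covariance identity $\operatorname{Cov}(H_m(X_i),H_m(X_j))=m!\,\rho(i-j)^m$ for standard Gaussian pairs, the standard relation $2H=2-mD$ (valid for $mD<1$), and the hypothetical correlation function $\rho(k)=(1+|k|)^{-D}$, which is of the form $k^{-D}L(k)$ with a slowly varying $L$.

```python
import math
import numpy as np


def variance_of_hermite_sum(n, m, D):
    """Var(sum_{i<=n} H_m(X_i)) = m! * sum_{i,j} rho(i-j)^m with rho(k) = (1+|k|)^(-D)."""
    k = np.arange(1, n)
    rho_m = (1.0 + k) ** (-D * m)
    # n diagonal terms (rho(0) = 1) plus 2 * (n - k) pairs at each lag k
    return math.factorial(m) * (n + 2.0 * np.sum((n - k) * rho_m))


def growth_exponent(m, D, n_small=2**10, n_large=2**14):
    """Log-log slope of the variance between two sample sizes."""
    ratio = variance_of_hermite_sum(n_large, m, D) / variance_of_hermite_sum(n_small, m, D)
    return math.log(ratio) / math.log(n_large / n_small)


slope = growth_exponent(m=1, D=0.4)
print(slope, 2 - 1 * 0.4)  # empirical exponent versus 2H = 2 - mD
```

The empirical slope approaches $2-mD$ only slowly when $mD$ is close to $1$, which is one reason why finite sample normalizations can be delicate.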

### 2.1 The empirical process under change-point alternatives

Let us consider the following change-point model. Define the triangular array

$$Y_{n,i}=\begin{cases}G(X_i), & \text{if } i\le\lfloor n\tau\rfloor,\\ G_n(X_i), & \text{if } i\ge\lfloor n\tau\rfloor+1,\end{cases} \tag{2.3}$$

for measurable functions and and unknown . For one gets a row-wise stationary triangular array, as considered in section 2.5, and for a stationary sequence, as in Dehling and Taqqu (1989a). In what follows we will denote the distribution functions of and by and , respectively.
To obtain weak convergence of the empirical process of (2.3) we have to make some assumptions on the structure of the change and the Hermite rank.
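One row of the triangular array (2.3) can be built as follows. This is a minimal sketch: the function name is hypothetical, and the i.i.d. Gaussian input only serves to show the indexing, since it is not long-range dependent.

```python
import numpy as np


def change_point_array(x, G, G_n, tau):
    """Return Y_{n,i} = G(X_i) for i <= floor(n*tau) and G_n(X_i) afterwards."""
    x = np.asarray(x, dtype=float)
    k_star = int(np.floor(len(x) * tau))  # last index before the change
    y = np.empty(len(x), dtype=float)
    y[:k_star] = G(x[:k_star])
    y[k_star:] = G_n(x[k_star:])
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal(10)  # stand-in for the Gaussian sequence (illustration only)
y = change_point_array(x, G=lambda t: t, G_n=lambda t: t + 0.5, tau=0.4)
print(y)
```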

Assumption A:

1. The class of functions has Hermite rank with .

2. Let be the Hermite rank of and . Then we assume

$$n^{(m-m^*)D(1+\delta)/2}\sup_{x\in\mathbb{R}}\big(P(\min\{G(X_1),G_n(X_1)\}\le x)-P(\max\{G(X_1),G_n(X_1)\}\le x)\big)\to 0,$$

for some .

###### Theorem 1.

If Assumption A holds, then

$$\frac{1}{d_{n,m}}\sum_{i=1}^{\lfloor nt\rfloor}\big(1_{\{Y_{n,i}\le x\}}-P(Y_{n,i}\le x)\big)\xrightarrow{D}\frac{J_m(x)}{m!}Z_{m,H}(t),$$

where is the Hermite coefficient of . The convergence takes place in , equipped with the uniform topology.

###### Remark 2.2.

(i) For given functions $G$ and $G_n$, Assumption A2 can easily be checked, see the examples below. It serves to ensure convergence of the Hermite coefficients $J_{q,n}$. In detail,

$$\sup_{x\in\mathbb{R}}\big(P(\min\{G(X_1),G_n(X_1)\}\le x)-P(\max\{G(X_1),G_n(X_1)\}\le x)\big)\to 0$$

implies, see the proof of Lemma 4.5,

$$\sup_{x\in\mathbb{R}}|J_{q,n}(x)-J_q(x)|\to 0\qquad\forall q\in\mathbb{N}. \tag{2.4}$$

By Assumption A1, for all , yet for some . Together with (2.4) this implies .

(ii) Moreover, A2 implies convergence of the marginal distribution function. To see this, note

$$\begin{aligned}|F_{(n)}(x)-F(x)| &= \max\{F_{(n)}(x),F(x)\}-\min\{F_{(n)}(x),F(x)\}\\ &\le P(\min\{G(X),G_n(X)\}\le x)-P(\max\{G(X),G_n(X)\}\le x)\end{aligned}$$

and . However, the converse is not always true. Consider for instance the functions and or the situation in Example 2.8. Then again, there are many natural choices of $G$ and $G_n$ for which convergence of the marginal distribution functions (with a certain rate) implies Assumption A2. Among them (mean-shift), (change in variance) and

$$G_n(x)=F_{(n)}^{-1}\circ\Phi(x)\quad\text{and}\quad G(x)=F^{-1}\circ\Phi(x).$$

(iii) Our assumptions explicitly allow for the Hermite rank to change together with the marginal distribution. Then again, the limit behavior seems to be untouched by this change. Intuitively this corresponds to the idea that the change in distribution and the change in the Hermite coefficient, both caused by the difference of $G$ and $G_n$, are of the same order. For this enforces the function to converge rather fast to . Technically this can be explained through A2. If this assumption is dropped, we might actually encounter limits with multiple Hermite processes. Such cases will be considered in Example 2.8 and Corollary 2.13.

(iv) If A1 is violated, the sequence is actually short-range dependent. For stationary observations Csörgő and Mielniczuk (1996) showed convergence of the sequential empirical process to a two-parameter Gaussian process. Change-point alternatives have not yet been considered for such random variables; they would require fundamentally different proofs compared to our results.

### 2.2 Asymptotic behavior of the change-point statistics

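We now apply the results concerning empirical processes to determine the asymptotic distribution of the Kolmogorov-Smirnov statistic

$$T_n=d_{n,m}^{-1}\sup_{t\in[0,1]}\sup_{x\in\mathbb{R}}\bigg|\sum_{i=1}^{\lfloor nt\rfloor}1_{\{Y_{n,i}\le x\}}-\frac{\lfloor nt\rfloor}{n}\sum_{i=1}^{n}1_{\{Y_{n,i}\le x\}}\bigg| \tag{2.5}$$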

and that of the Cramér-von Mises change-point statistic

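$$S_n=d_{n,m}^{-2}\sup_{t\in[0,1]}\int_{\mathbb{R}}\bigg|\sum_{i=1}^{\lfloor nt\rfloor}1_{\{Y_{n,i}\le x\}}-\frac{\lfloor nt\rfloor}{n}\sum_{i=1}^{n}1_{\{Y_{n,i}\le x\}}\bigg|^2\,d\hat F_n(x). \tag{2.6}$$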
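Both statistics are functionals of the same empirical bridge, so they can be computed together. The sketch below is illustrative only: the function name is hypothetical, the normalization $d_{n,m}$ is passed in as a known number `d_n` (in practice it has to be estimated), and the Kolmogorov-Smirnov statistic is taken as the sup-norm analogue of (2.6). The $O(n^2)$ memory use is acceptable for moderate $n$.

```python
import numpy as np


def change_point_statistics(y, d_n=1.0):
    """Return (T_n, S_n): sup-norm and integrated squared empirical bridge statistics."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # indicators[i, j] = 1{y_i <= y_j}. Evaluating at the sample points suffices,
    # because the bridge is a step function in x that jumps only at the y_j.
    indicators = (y[:, None] <= y[None, :]).astype(float)
    partial = np.cumsum(indicators, axis=0)        # row k-1: sum_{i<=k} 1{y_i <= y_j}
    k = np.arange(1, n + 1)[:, None]
    bridge = partial - (k / n) * partial[-1]       # empirical bridge at time k/n
    t_n = np.max(np.abs(bridge)) / d_n
    # Integration against the empirical d.f. is an average over the sample points.
    s_n = np.max(np.mean(bridge**2, axis=1)) / d_n**2
    return float(t_n), float(s_n)


t_n, s_n = change_point_statistics([0.0, 0.0, 1.0, 1.0])
print(t_n, s_n)
```

In the toy sample the jump sits in the middle, and both statistics attain their maximum at $k=2$.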

To get a non degenerate limit under a sequence of local alternatives it is important to choose the right amount of change. For a mean-shift this is naturally the difference of the expectations before and after the change. For a general change we formulate the test problem as follows: We wish to test the hypothesis

$$H:\ \text{Assumption A1 holds and } G_n(x)=G(x) \text{ for all } x\in\mathbb{R} \text{ and } n\ge 1,$$

against the sequence of local alternatives

$A_n$: Assumption A holds and, for $n\to\infty$,

$$\frac{n}{d_{n,m}}\big(F(x)-F_{(n)}(x)\big)\to g(x), \tag{2.7}$$

uniformly in $x$, where $g(x)$ is a measurable function of bounded total variation, whose support has positive Lebesgue measure.

###### Remark 2.3.

Note that . Thus (2.7) implies

$$n^{(m-m^*)D(1+\delta)/2}\big(F(x)-F_{(n)}(x)\big)\to 0,$$

for or . This again implies Assumption A2 for certain choices of functions and , see Remark 2.2 (ii).

###### Theorem 2.

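(i) Under the hypothesis of no change we have, as $n\to\infty$,

$$T_n\xrightarrow{D}\sup_{x\in\mathbb{R}}|J_m(x)/(m!)|\,\sup_{t\in[0,1]}\big|\tilde Z_{m,H}(t)\big|\quad\text{and}\quad S_n\xrightarrow{D}\int_{\mathbb{R}}(J_m(x)/(m!))^2\,dF(x)\,\sup_{t\in[0,1]}\big|\tilde Z_{m,H}(t)\big|^2,$$

where $\tilde Z_{m,H}(t)=Z_{m,H}(t)-tZ_{m,H}(1)$.

(ii) Under the sequence of local alternatives we have, as $n\to\infty$,

$$T_n\xrightarrow{D}\sup_{x\in\mathbb{R}}\sup_{t\in[0,1]}\big|J_m(x)/(m!)\,\tilde Z_{m,H}(t)-g(x)\psi_\tau(t)\big|\quad\text{and}\quad S_n\xrightarrow{D}\sup_{t\in[0,1]}\int_{\mathbb{R}}\big(J_m(x)/(m!)\,\tilde Z_{m,H}(t)-g(x)\psi_\tau(t)\big)^2\,dF(x),$$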

where

$$\psi_\tau(t)=\begin{cases}t(1-\tau), & \text{if } t\le\tau,\\ \tau(1-t), & \text{if } t>\tau.\end{cases}$$
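The function $\psi_\tau$ is, up to sign, what the bridge transform $s_k-\frac{k}{n}s_n$ of partial sums does to a unit step located at the change point. A small numerical check of this identity, with hypothetical helper names and assuming $n\tau$ is an integer:

```python
import numpy as np


def psi(t, tau):
    """psi_tau(t) = t(1 - tau) for t <= tau and tau(1 - t) for t > tau."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= tau, t * (1.0 - tau), tau * (1.0 - t))


def bridge_of_step(n, tau):
    """Bridge transform of the step sequence 1{i > floor(n*tau)}, scaled by 1/n."""
    k_star = int(np.floor(n * tau))
    k = np.arange(1, n + 1)
    partial = np.cumsum((k > k_star).astype(float))
    return (partial - (k / n) * partial[-1]) / n


n, tau = 1000, 0.25
t_grid = np.arange(1, n + 1) / n
max_gap = float(np.max(np.abs(bridge_of_step(n, tau) + psi(t_grid, tau))))
print(max_gap)  # the bridge of the step equals -psi_tau on the grid k/n
```

The drift is largest at $t=\tau$, where $\psi_\tau(\tau)=\tau(1-\tau)$, so changes in the middle of the sample produce the strongest signal.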

Motivated by this theorem we consider change-point tests based on the statistics $T_n$ and $S_n$. Critical values might be chosen as

$$\sup_{x\in\mathbb{R}}|J_m(x)/(m!)|\,q_{1-\alpha,m,H}\quad\text{and}\quad\int_{\mathbb{R}}(J_m(x)/(m!))^2\,dF(x)\,q^2_{1-\alpha,m,H},$$

for the Kolmogorov-Smirnov test and the Cramér-von Mises test, respectively. Here $q_{1-\alpha,m,H}$ is the $(1-\alpha)$-quantile of $\sup_{t\in[0,1]}|\tilde Z_{m,H}(t)|$. Thereby the tests have asymptotically level $\alpha$ and nontrivial power against local alternatives.

The tests can be performed if the right normalization for the empirical process, the supremum of and the distribution of are known. In practical applications this might not be the case. Solutions are self-normalization (Shao (2011)), estimating the Hurst coefficient (see for example Künsch (1987)) and bootstrap estimators for (Tewes (2016)).

### 2.3 Examples

###### Example 2.4 (Mean-shift).

Let with , then we get the typical change in the mean problem. In the case of long-range dependent subordinated Gaussian processes this was considered in Dehling et al. (2012, 2013), Csörgő and Horvath (1997), Shao (2011) and Betken (2016). Let be the probability density of , and assume that it is continuous and of bounded variation. Then we obtain

$$\frac{n}{d_{n,m}}\big(F(x)-F_{(n)}(x)\big)=\frac{n}{d_{n,m}}\big(F(x)-F(x-\mu_n)\big)\to Cf_G(x),$$

where, due to continuity of , the convergence holds uniformly.
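The uniform convergence behind this example is that of the difference quotient $(F(x)-F(x-\mu))/\mu$ to the density as $\mu\to 0$. A minimal numerical sketch, assuming Gaussian marginals ($F=\Phi$, $f_G=\phi$) as an illustrative choice and approximating the supremum on a finite grid:

```python
import math
import numpy as np


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)


def sup_error(mu, grid):
    """Grid approximation of sup_x |(F(x) - F(x - mu)) / mu - f(x)|."""
    quotient = (std_normal_cdf(grid) - std_normal_cdf(grid - mu)) / mu
    return float(np.max(np.abs(quotient - std_normal_pdf(grid))))


grid = np.linspace(-6.0, 6.0, 2401)
errors = {mu: sup_error(mu, grid) for mu in (0.5, 0.1, 0.02)}
print(errors)
```

The error shrinks roughly linearly in $\mu$, in line with a first-order Taylor expansion of $F$.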

###### Example 2.5 (Change in the variance).

To describe the change-in-variance-problem define , with . For ease of notation let . Then we get

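$$\begin{aligned}
\sup_{x\in\mathbb{R}}\Big|\delta_n^{-1}\big(F(x)-F_{(n)}(x)\big)-xf_G(x)\Big|
&= \sup_{x\in\mathbb{R}}\Big|\delta_n^{-1}\big(F(x)-F(x-\delta_nx)\big)-xf_G(x)\Big|\\
&= \sup_{x\in\mathbb{R}}\bigg|\frac{xF(x)-(x-\delta_nx)F(x-\delta_nx)}{\delta_nx}-F(x-\delta_nx)-xf_G(x)\bigg|\\
&\le \sup_{x\in\mathbb{R}}\bigg|\frac{xF(x)-(x-\delta_nx)F(x-\delta_nx)}{\delta_nx}-\big(xf_G(x)+F(x)\big)\bigg| \qquad (2.8)\\
&\quad+\sup_{x\in\mathbb{R}}|F(x-\delta_nx)-F(x)|. \qquad (2.9)
\end{aligned}$$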

The derivative of $xF(x)$ is $xf_G(x)+F(x)$, hence (2.8) converges to $0$. The convergence is uniform, if and are continuous. (2.9) converges to $0$, because of continuity, monotonicity and boundedness of $F$. Thus (2.7) holds with function . Assume without loss of generality , then

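$$\begin{aligned}P(\max\{G(X_1),G_n(X_1)\}\le x) &= P(\sigma_nG(X_1)\le x,\ G(X_1)\ge 0)+P(G(X_1)\le x,\ G(X_1)\le 0)\\ &= \begin{cases}F(x/\sigma_n), & \text{if } x\ge 0,\\ F(x), & \text{if } x<0.\end{cases}\end{aligned}$$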

The minimum can be treated analogously, hence Assumption A2 follows from convergence of the marginals.

Additionally one might consider a combined change in mean and variance, given through . In this case (2.7) holds with .

###### Example 2.6 (Generalized inverse of a mixture distribution).

By using the generalized inverse of a distribution function one could generate subordinated Gaussian processes with any given marginals, see for example Dehling et al. (2013). We use this for the change-point problem by setting

$$G\equiv F^{-1}\circ\Phi\quad\text{and}\quad G_n\equiv F_{(n)}^{-1}\circ\Phi.$$

For a continuous distribution function define the mixture

$$F_{(n)}(x)=(1-\delta_n)F(x)+\delta_nF^*(x),$$

with . Then (2.7) holds with and moreover

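$$\begin{aligned}P(\max\{G(X_1),G_n(X_1)\}\le x) &= P\big(\max\{F^{-1}\circ\Phi(X_1),F_{(n)}^{-1}\circ\Phi(X_1)\}\le x\big)\\ &= P\big(\Phi(X_1)\le\min\{F(x),F_{(n)}(x)\}\big)\\ &= \min\{F(x),F_{(n)}(x)\}.\end{aligned}$$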

Analogously one has . Hence

$$P(\min\{G(X_1),G_n(X_1)\}\le x)-P(\max\{G(X_1),G_n(X_1)\}\le x)=|F_{(n)}(x)-F(x)|,$$

thus Assumption A2 is also satisfied. For strongly mixing data similar local alternatives were considered by Inoue (2001).
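The construction $G=F^{-1}\circ\Phi$, $G_n=F_{(n)}^{-1}\circ\Phi$ can be simulated directly. The sketch below rests on several illustrative assumptions that are not from the paper: fractional Gaussian noise generated by a Cholesky factorization serves as the long-range dependent Gaussian sequence, $F$ and $F^*$ are exponential distributions with scales 1 and 2, and the generalized inverse is computed by bisection.

```python
import math
import numpy as np


def fgn_sample(n, hurst, rng):
    """Fractional Gaussian noise (unit variance) via Cholesky factorization."""
    k = np.arange(n, dtype=float)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst)
                   + np.abs(k - 1) ** (2 * hurst))
    idx = np.arange(n)
    cov = gamma[np.abs(np.subtract.outer(idx, idx))]
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)


def std_normal_cdf(x):
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def mixture_cdf(x, delta):
    """F_(n) = (1 - delta) F + delta F*, with F = Exp(scale 1), F* = Exp(scale 2)."""
    return (1.0 - delta) * (-np.expm1(-x)) + delta * (-np.expm1(-x / 2.0))


def generalized_inverse(cdf, u, lower=0.0, upper=200.0, iterations=80):
    """Bisection for inf{x : cdf(x) >= u}, vectorized over u."""
    lo = np.full_like(u, lower)
    hi = np.full_like(u, upper)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        below = cdf(mid) < u
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return hi


rng = np.random.default_rng(1)
n, tau, delta = 400, 0.5, 0.2
u = std_normal_cdf(fgn_sample(n, hurst=0.8, rng=rng))  # Phi(X_i) is uniform on (0, 1)
k_star = int(np.floor(n * tau))
y = np.concatenate([
    generalized_inverse(lambda v: mixture_cdf(v, 0.0), u[:k_star]),    # G = F^{-1} o Phi
    generalized_inverse(lambda v: mixture_cdf(v, delta), u[k_star:]),  # G_n = F_(n)^{-1} o Phi
])
print(y[:3], y[-3:])
```

Because both transformations are applied to the same uniform variables $\Phi(X_i)$, the dependence structure of the Gaussian sequence is inherited by the transformed observations.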

###### Example 2.7 ($\chi^2$-distribution).

Consider a $\chi^2$-distribution given through $G(x)=x^2$ and note that the indicator functions have Hermite rank $2$, see also Dehling and Taqqu (1989a). Further let

$$G_n(x)=\begin{cases}a_nx^2, & \text{if } x\ge 0,\\ x^2, & \text{if } x<0,\end{cases}$$

with Hermite ranks for all . If , then one can show (similar to the case of a variance change in Example 2.5) that

$$\frac{n}{d_{n,2}}\big(P(G(X_1)\le x)-P(G_n(X_1)\le x)\big)\to C\sqrt{x}\,\phi(\sqrt{x})\,1_{[0,\infty)}(x),$$

uniformly in $x$. As Assumption A2 is satisfied, too, we may apply Theorem 2 (ii) with function and .

###### Example 2.8 (Multiple Hermite processes in the limit).

In the previous example, together with the marginal distribution, also the Hermite rank has changed. However, the limiting process seems to be untouched by this fact and one might ask whether this is intuitive or not.

It is caused by the fact that the change in the distribution and the change in the Hermite coefficients, both originating in the difference of the functions and , are of the same order.

To get an additional Hermite process in the limit, one would need , see Corollary 2.13 and its proof. But then

$$\frac{n}{d_{n,2}}\sup_x\big|F(x)-F_{(n)}(x)\big|=\frac{n}{d_{n,1}}\frac{d_{n,1}}{d_{n,2}}\sup_x\big|F(x)-F_{(n)}(x)\big|\to\infty,$$

and the test would have asymptotic power $1$.

To achieve nontrivial asymptotic power one has to consider structural breaks that consist of two aspects, only one of which is captured by the marginal distribution. To this end define the transformations

$$G(x)=\Phi^{-1}(F(|x|))=\Phi^{-1}(2\Phi(|x|)-1)$$

and

$$G_n(x)=\Phi^{-1}\big(F^*_{(n)}(G^*_n(x))\big)+\mu_n,$$

where and

$$G^*_n(x)=\begin{cases}a_nx^2, & \text{if } x\ge 0,\\ x^2, & \text{if } x<0,\end{cases}$$

for some sequence with and . On the one hand, has Hermite rank and . On the other hand, has Hermite rank for all and . Now let , then Example 2.4 applies and we obtain

$$\frac{n}{d_{n,2}}\big(F_{(n)}(x)-F(x)\big)=\frac{n}{d_{n,2}}\big(\Phi(x-\mu_n)-\Phi(x)\big)\to C\phi(x),$$

for any sequence . In contrast, the convergence of the Hermite coefficients is highly influenced by . If the sequence is chosen such that (therefore, it converges slower than ), then the sequential empirical process will converge towards

 K(x,t)={J2(x)/2Z2(t),if t≤τ,~J1(x)Z1(t)+J2(x)/2Z2(t)+,if t>τ.

Actually this can be proved similar to Corollary 2.13. Moreover, the Kolmogorov-Smirnov statistic converges weakly to

$$\sup_{t\in[0,1]}\sup_{x\in\mathbb{R}}|K(x,t)-tK(x,1)-\psi_\tau(t)C\phi(x)|.$$

We find this example rather pathological, therefore such situations are excluded from the main results via Assumption A2.

### 2.4 Asymptotic relative efficiency

By studying the asymptotic distributions under local alternatives one might compare different tests in terms of the asymptotic relative efficiency (ARE). Here we give a precise definition of the ARE in the very special context of our change-point setting. The general idea is due to Pitman (1948) (for a published article see for example Noether (1950)) and was formalized in Noether (1955). Of course it can be extended to all kinds of testing procedures.

###### Definition 2.9.

Let and represent two change-point test procedures. Consider the local alternatives

$$\begin{aligned}&(G,G_{n_k},\tau)\ \text{and a sample size } (n_k)_k,\\ &(G,\tilde G_{m_k},\tau)\ \text{and a sample size } (m_k)_k,\end{aligned}$$

such that for all and .

Let be the asymptotic power of the test against the local alternatives given by and be the asymptotic power of the test against the local alternatives given by . If equals , then the asymptotic relative efficiency (ARE) of the tests and is defined as

$$\mathrm{ARE}(T_1,T_2)=\lim_{k\to\infty}\frac{m_k}{n_k}.$$

###### Example 2.10 (Mean-shift in Gaussian data).

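Consider $G(x)=x$ and $G_n(x)=x+\mu_n$, in other words a mean-shift in Gaussian data. As for the Hermite coefficient function, we get $J_1(x)=-\phi(x)$, where $\phi$ is the standard normal probability density. Thus, according to Theorem 2, the test statistic $T_n$ converges towards

$$\sup_{x\in\mathbb{R}}\phi(x)\,\sup_{t\in[0,1]}\big|\tilde B_H(t)+C\psi_\tau(t)\big|,$$

whereas under the null hypothesis, that is, for a stationary standard Gaussian sequence, the limit distribution would be

$$\sup_{x\in\mathbb{R}}\phi(x)\,\sup_{t\in[0,1]}\big|\tilde B_H(t)\big|.$$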

For the Cramér-von Mises statistic we obtain analogously the limit distributions

$$\int\phi^3(x)\,dx\,\sup_{t\in[0,1]}\big|\tilde B_H(t)+C\psi_\tau(t)\big|^2\quad\text{and}\quad\int\phi^3(x)\,dx\,\sup_{t\in[0,1]}\big|\tilde B_H(t)\big|^2$$

under local alternative and hypothesis, respectively. Hence in this special case the CUSUM test, the Wilcoxon test (see Dehling et al. (2013) for each), the Kolmogorov-Smirnov test and the Cramér-von Mises test all have the same asymptotic power, namely

$$P\Big(\sup_{t\in[0,1]}|\tilde B_H(t)+C\psi_\tau(t)|>q_{1-\alpha,H}\Big), \tag{2.10}$$

where $q_{1-\alpha,H}$ is the $(1-\alpha)$-quantile of the maximum of a fractional Brownian bridge $\tilde B_H$.
As a direct consequence, one gets that the ARE of the four tests is $1$. This result is quite surprising, keeping in mind that CUSUM and Wilcoxon tests are designed to detect level-shifts, while our tests have power against all kinds of distributional changes.
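The asymptotic power (2.10) has no simple closed form, but it can be approximated by Monte Carlo. The sketch below is illustrative and makes several assumptions: the fractional Brownian bridge is simulated on a finite grid via a Cholesky factorization, the quantile is estimated from the same simulated paths, and the values of $H$, $\tau$ and $C$ are arbitrary choices.

```python
import numpy as np


def fractional_brownian_bridges(hurst, num_paths, grid_size, rng):
    """Simulate B_H(t) - t * B_H(1) on the grid t = 1/grid_size, ..., 1."""
    t = np.arange(1, grid_size + 1) / grid_size
    cov = 0.5 * (t[:, None] ** (2 * hurst) + t[None, :] ** (2 * hurst)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * hurst))
    chol = np.linalg.cholesky(cov)
    fbm = rng.standard_normal((num_paths, grid_size)) @ chol.T
    return t, fbm - t[None, :] * fbm[:, [-1]]


def psi(t, tau):
    return np.where(t <= tau, t * (1.0 - tau), tau * (1.0 - t))


def asymptotic_power(c, tau, t, bridges, alpha=0.05):
    """Monte Carlo estimate of P(sup |B~_H + c * psi_tau| > q_{1-alpha,H})."""
    q = np.quantile(np.max(np.abs(bridges), axis=1), 1.0 - alpha)
    sup_alt = np.max(np.abs(bridges + c * psi(t, tau)[None, :]), axis=1)
    return float(np.mean(sup_alt > q))


rng = np.random.default_rng(0)
t, bridges = fractional_brownian_bridges(hurst=0.7, num_paths=2000, grid_size=100, rng=rng)
powers = {c: asymptotic_power(c, 0.5, t, bridges) for c in (0.0, 2.0, 10.0)}
print(powers)
```

For $C=0$ the estimate reproduces the level $\alpha$ by construction, and the power tends to one as $|C|$ grows.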

For non-Gaussian data and change-points beyond a simple mean-shift, the investigation of the ARE is not that straightforward. In fact, little is known about the distribution of

$$\sup_t|\tilde B_H(t)+f(t)|,$$

and even less if higher order Hermite processes are considered. This seems to prevent a precise computation of the ARE in many cases. However, one might derive lower bounds for the efficiency, as we do in the next example for a combined change in mean and variance. Unlike in the previous example, we will make use of the subtle definition of the ARE.

###### Example 2.11 (Combined change in mean and variance).

Let and , that is a combined change of mean and variance in Gaussian data. If further and , then by example 2.5 the empirical bridge-type process converges to (for the sample size )

$$\phi(x)\tilde B_H(t)+\phi(x)(C_1+C_2x)\psi_\tau(t),\qquad x\in\mathbb{R},\ t\in[0,1].$$

We now consider slightly modified Cramér-von Mises and CUSUM tests, in detail, instead of the supremum is taken over for some and .

The asymptotic distribution of the CUSUM test has been derived in Dehling et al. (2013), but only in the case of a mean-shift with constant variance. However, for the CUSUM statistic is a continuous functional of the sequential empirical process. Thus, we might apply our Theorem 1 and conclude that the CUSUM statistic converges under this type of local alternatives to

$$\sup_{t\in[\kappa_1,\kappa_2]}\bigg|\int\phi(x)\big(\tilde B_H(t)+(C_1+C_2x)\psi_\tau(t)\big)dx\bigg| = \sup_{t\in[\kappa_1,\kappa_2]}\big|\tilde B_H(t)+C_1\psi_\tau(t)\big|.$$

Note that this is the same limit as under a mean-shift with constant variance, and thus the asymptotic power is also the same as in Example 2.10.

The limiting distribution of the Cramér-von Mises statistic is given by

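$$Z_2=\sup_{t\in[\kappa_1,\kappa_2]}\int\phi^3(x)\big(\tilde B_H(t)+(C_1+C_2x)\psi_\tau(t)\big)^2dx=\sup_{t\in[\kappa_1,\kappa_2]}\bigg\{\int\phi^3(x)\,dx\,\big(\tilde B_H(t)+C_1\psi_\tau(t)\big)^2+C_2^2\psi_\tau^2(t)\int x^2\phi^3(x)\,dx\bigg\},$$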

and for its asymptotic power we obtain

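$$\begin{aligned}P\Big(Z_2>q^2_{1-\alpha,H}\int\phi^3(x)\,dx\Big) &= P\Big(Z_2>q^2_{1-\alpha,H}\int\phi^3(x)\,dx\,,\ \sup_{t\in[\kappa_1,\kappa_2]}\{\tilde B_H(t)\}>q_{1-\alpha,H}\Big) \qquad (2.11)\\ &\quad+P\Big(Z_2>q^2_{1-\alpha,H}\int\phi^3(x)\,dx\,,\ \sup_{t\in[\kappa_1,\kappa_2]}\{\tilde B_H(t)\}\le q_{1-\alpha,H}\Big).\end{aligned}$$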

First assume and consider , given by

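$$C_1^*=f^*(C_1,C_2,q,\tau,\kappa_1,\kappa_2)=\min_{t\in[\kappa_1,\kappa_2]}\left\{\frac{\sqrt{q^2+2qC_1\psi_\tau(t)+\Big(C_1^2+C_2^2\big(\int\phi^3(x)x^2dx\big/\int\phi^3(x)dx\big)\Big)\psi_\tau^2(t)}-q}{\psi_\tau(t)}\right\}.$$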

Now $C_1^*$ is constructed in such a way that

$$C_1^*>C_1$$

and for all with

 Z2= >

If, on the other hand, , then (because ) automatically . Combining these two findings with (2.11) we can bound the asymptotic power from below by

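$$\begin{aligned}P\Big(Z_2>q^2_{1-\alpha,H}\int\phi^3(x)\,dx\Big) &\ge P\Big(\sup_{t\in[\kappa_1,\kappa_2]}\int\phi^3(x)\big(\tilde B_H(t)+C_1^*\psi_\tau(t)\big)^2dx>q^2_{1-\alpha,H}\int\phi^3(x)\,dx\Big)\\ &= P\Big(\sup_{t\in[\kappa_1,\kappa_2]}\big|\tilde B_H(t)+C_1^*\psi_\tau(t)\big|>q_{1-\alpha,H}\Big), \qquad (2.12)\end{aligned}$$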

for .
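The constant $C_1^*$ can be evaluated numerically. The sketch below reads the flattened display defining $C_1^*$ as a fraction with numerator $\sqrt{\cdots}-q$ and denominator $\psi_\tau(t)$, which is an assumption about the intended formula. It also uses $\int x^2\phi^3(x)\,dx\big/\int\phi^3(x)\,dx=1/3$, which holds because $\phi^3$ is proportional to the $N(0,1/3)$ density. The parameter values are arbitrary.

```python
import numpy as np


def psi(t, tau):
    return np.where(t <= tau, t * (1.0 - tau), tau * (1.0 - t))


def moment_ratio(half_width=10.0, num_points=200_001):
    """Riemann-sum check of (int x^2 phi^3 dx) / (int phi^3 dx), expected to be 1/3."""
    x = np.linspace(-half_width, half_width, num_points)
    phi_cubed = (np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)) ** 3
    return float(np.sum(x**2 * phi_cubed) / np.sum(phi_cubed))


def c1_star(c1, c2, q, tau, kappa1, kappa2, num_points=2001):
    """min over t in [kappa1, kappa2] of (sqrt(...) - q) / psi_tau(t)."""
    t = np.linspace(kappa1, kappa2, num_points)
    p = psi(t, tau)
    root = np.sqrt(q**2 + 2.0 * q * c1 * p + (c1**2 + c2**2 / 3.0) * p**2)
    return float(np.min((root - q) / p))


ratio = moment_ratio()
with_variance_change = c1_star(c1=2.0, c2=3.0, q=1.0, tau=0.5, kappa1=0.1, kappa2=0.9)
mean_shift_only = c1_star(c1=2.0, c2=0.0, q=1.0, tau=0.5, kappa1=0.1, kappa2=0.9)
print(ratio, with_variance_change, mean_shift_only)
```

With $C_2=0$ the expression under the root is $(q+C_1\psi_\tau(t))^2$, so the constant reduces to $C_1$. Any $C_2\neq 0$ makes it strictly larger, which is the source of the lower bound on the power.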

Now we are ready to compute the ARE. To this end we choose different sample sizes for both tests. In detail, $n_k$ for the Cramér-von Mises test and $m_k$ for the CUSUM test. Moreover, the local alternatives are such that for all , consequently

$$\mu^{(1)}_{n_k}=\mu^{(2)}_{m_k}=\mu_k\quad\text{and}\quad\sigma^{(1)}_{n_k}=\sigma^{(2)}_{m_k}=\sigma_k.$$

For the CUSUM test, in order to achieve at least asymptotic power , its limit distribution has to satisfy

$$P\Big(\sup_{t\in[0,1]}|\tilde B_H(t)+C_1^*\psi_\tau(t)|>q_{\alpha,H}\Big)\ge\beta.$$

In other words, , where

$$\pi(C^*)=P\Big(\sup_{t\in[0,1]}|\tilde B_H(t)+C_1^*\psi_\tau(t)|>q_{\alpha,H}\Big)$$