Bootstrapping confidence intervals for the change-point of time series


Marie Hušková (Charles University of Prague, Department of Statistics, Sokolovská 83, CZ – 186 75 Praha 8, Czech Republic)
Claudia Kirch (Technical University Kaiserslautern, Department of Mathematics, Erwin-Schrödinger-Straße, D–67 653 Kaiserslautern, Germany)

We study an AMOC time series model with an abrupt change in the mean and dependent errors that fulfill certain mixing conditions. We obtain confidence intervals for the unknown change-point via bootstrapping methods.
More precisely, we apply a block bootstrap to the estimated centered error sequence and then reconstruct a sequence with a change in the mean using the same estimators as before. The difference between the change-point estimator of the resampled sequence and the one for the original sequence can be used as an approximation of the difference between the real change-point and its estimator. This enables us to construct confidence intervals using the empirical distribution of the resampled time series.
A simulation study shows that the resampled confidence intervals are usually closer to their target levels and at the same time smaller than the asymptotic intervals.

Keywords: confidence intervals, block bootstrap, mixing, change in mean

AMS Subject Classification 2000: 62G09, 62G15, 60G10

Acknowledgement: The work of the first author was partly supported by the grants GAČR 201/06/0186 and MSM 02162839.

1 Introduction

Recently a number of papers have been published on possible applications of bootstrapping or permutation methods in change-point analysis; see Hušková [17] for a recent survey. Most of these papers are concerned with obtaining critical values for the corresponding change-point tests. Another important issue in change-point analysis, however, is how to obtain confidence intervals for the change-point. In this paper we construct bootstrap confidence intervals for the change-point in a model with dependent data.

We consider the following At-Most-One-Change (AMOC) location model


where , may depend on . The errors are stationary and strong-mixing with a rate specified below,




The purpose of this paper is to develop and study a bootstrap suitable for getting approximation of the distribution of the following class of change-point estimators



and .
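As a rough illustration of this class of estimators, the following sketch computes a weighted CUSUM-type change-point estimator of the kind studied in Antoch et al. [4]. The weight `(n / (k (n - k)))**gamma`, the parameter name `gamma` and the function name are assumptions made for this example and should be checked against the definition of the estimator class used in the paper.

```python
import numpy as np

def changepoint_estimator(x, gamma=0.0):
    """Argmax over k = 1..n-1 of a weighted CUSUM statistic.

    Assumed form (illustrative):
        (n / (k (n - k)))**gamma * |sum_{i <= k} (x_i - mean(x))|
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = np.arange(1, n)                     # candidate change-points 1..n-1
    cusum = np.cumsum(x - x.mean())[:-1]    # centered partial sums for k = 1..n-1
    weights = (n / (k * (n - k))) ** gamma
    return int(k[np.argmax(weights * np.abs(cusum))])

# Noise-free toy sequence: the mean jumps from 0 to 1 after observation 30.
x = np.concatenate([np.zeros(30), np.ones(70)])
m_hat = changepoint_estimator(x, gamma=0.5)
```

For a noise-free sequence the weighted statistic increases up to the true change-point and decreases afterwards, so the estimator recovers the jump location exactly; with noisy data it is only close to it.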

There is a quite extensive literature concerning asymptotic behavior of change-point estimators for independent observations. For a survey of various results, see e.g. Dümbgen [13], Csörgő and Horváth [10] and Antoch et al. [1]. One of the first papers to derive the limit distribution for and independent errors under local changes has been written by Bhattacharya and Brockwell [7]. Dümbgen [13] considered a change in a general model for AMOC with independent observations and developed a suitable bootstrap. Antoch et al. [4] studied the asymptotic behavior of , in the model (1.1) with independent identically distributed errors and developed and studied a bootstrap valid for local changes. They also obtained various related results, such as rates of consistency for the estimators and their limiting distribution. Ferger and Stute [16] and Ferger [14, 15] studied change-point estimators based on -statistics for i.i.d. errors.

Bai [5] and Antoch et al. [3] analyzed the limit behavior of various estimators when the error sequence forms a linear process. However, they have not discussed bootstrapping.

Most of the theoretical results concerning bootstrap methods in change-point analysis (testing and estimation) have been obtained for independent observations, see e.g. Hušková [17]. Antoch and Hušková [2] obtained critical values for the change-point test related to functionals of for ("no change") vs. ("there is a change in the mean") using permutation methods (or equivalently, bootstrap without replacement) for the independent case. Recently, Kirch [18, 19, 20] has developed various bootstrap approximations for critical values for the above tests of "no change" versus "there is a change in the mean" suitable for the case of dependent observations that form a linear process. The results in [19] can also be modified in a straightforward way for dependent observations as discussed here.

In this paper we develop and prove the validity of a circular overlapping block bootstrap for obtaining asymptotically correct confidence intervals in the case of dependent errors.

In order to prove the validity of the developed bootstrap scheme as well as to obtain the asymptotics under the null hypothesis for the change-point estimator, we have to use results such as laws of the (iterated) logarithm or laws of large numbers for triangular arrays. Therefore, in addition to assumptions (1.2), we need the following one for certain (in some cases ):

  1. Let be a strictly stationary sequence with . Assume there are with



    where is the corresponding strong mixing coefficient, i.e.

    where and vary over the -fields respectively

Under this assumption we get moment inequalities (cf. Yokoyama [26], Theorem 1, and Serfling [24], Lemma B, Theorem 3.1), which in turn yield laws of large numbers. Moreover the results remain true for triangular arrays that fulfill uniformly the assumptions above. For more details we refer to Kirch [18], Appendix B.2.

In fact we only need this assumption in order to obtain a Donsker type central limit theorem for the partial sums of the errors (to derive the asymptotics under the null hypothesis) as well as bounds on higher order moments of certain sums of the observed error sequence. This in turn yields laws of large numbers and laws of the (iterated) logarithm. The proofs can easily be adapted to allow for errors that do not fulfill condition but do satisfy the necessary moment conditions.

Example 1.1.

Suppose that the errors form a linear process

where the innovations are i.i.d. random variables with

We suppose that the weights satisfy

Corollary 4 in Withers [25] gives mild conditions under which linear sequences are strong mixing and even provides the mixing coefficients. This can be used to check condition (). Causal ARMA sequences with appropriate innovations, for example, fulfill it for any , if the -moment of the innovations exists.

For the sake of simplicity we will only consider the case in the following. The results for can be obtained in a similar way as outlined in Antoch et al. [4]. In the simulation study we will also consider other choices of , since the asymptotic method does not give such good approximations for . The reason is that the asymptotic distribution in this case depends on unknown parameters and thus in practice on estimators.

In the present paper we focus on local alternatives (i.e., , as ). Obtaining results for fixed alternatives is more complicated because the limit distribution of the estimator is determined by finite sums, which depend on the underlying distribution function. Some comments concerning the i.i.d. case can be found in the survey paper by Antoch and Hušková [1]. Furthermore, Dümbgen [13] considers both local and fixed changes for independent observations in a somewhat more general setup, i.e. the parameters that are subject to change need not be location parameters.

In the following , .

2 Limit Distribution and Rate of Consistency for the Estimators

In this section we summarize and generalize some previous results by Antoch et al. [3, 4] that we need in the sequel.

The next theorem gives the rate of consistency for the change-point estimator as well as its limit distribution for a local change. For the i.i.d. case these results have been obtained by Antoch et al. [4], Theorem 1 and 2. The second result has been generalized for errors that form a linear process under the additional assumption by Antoch et al. [3] (Theorem 2.2).

Theorem 2.1.

Assume that (1.1)-(1.3) with are satisfied and, as


Moreover let assumption be fulfilled for some . Then:

  1. The following rate of consistency holds, as ,

  2. The change-point estimator fulfills, as ,


    where is a two-sided Wiener process and as in (1.3).

The proof is postponed to Section 5.

Remark 2.1.

We would like to point out that the result in a) also remains true for a fixed change; precisely, it suffices to have , and we do not need . By contrast, we do need to obtain the limit distribution in b), but if this is not fulfilled the proof still shows .

Remark 2.2.

Also assertion (2.2) remains true, if one replaces the unknown quantity by a consistent estimator . It is also possible to replace by an estimator fulfilling , since then it follows under (2.1).

  1. The following Bartlett type estimator is a consistent estimator for under the conditions of Theorem 2.1 (cf. Theorem 1.1 in Berkes et al. [6], Lemma A.0.1 in Politis et al. [22], and Theorems 14.1 and 14.5 in Davidson [11]):



    and . should be chosen such that and .

  2. As an estimator for one can use


    Combining Theorem 2.1 a) with Theorem B.8 and Remark B.2 in Kirch [18] we obtain the rate

    under the assumptions of Theorem 2.1.
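The two estimators in this remark can be sketched as follows. The Bartlett weights `1 - k / (bandwidth + 1)`, the way the residuals are estimated (subtracting the sample means before and after the estimated change-point), the jump estimator as a difference of those means, and all function names are assumptions made for illustration; the paper's exact weight function and bandwidth conditions should be used in practice.

```python
import numpy as np

def estimated_residuals(x, m_hat):
    """Subtract the sample mean before and after the estimated change-point."""
    x = np.asarray(x, dtype=float)
    resid = x.copy()
    resid[:m_hat] -= x[:m_hat].mean()
    resid[m_hat:] -= x[m_hat:].mean()
    return resid

def jump_estimate(x, m_hat):
    """Difference of the sample means after and before m_hat (assumed form)."""
    x = np.asarray(x, dtype=float)
    return x[m_hat:].mean() - x[:m_hat].mean()

def bartlett_lrv(resid, bandwidth):
    """Bartlett-type estimate of the long-run variance from residuals.

    Assumed form: R(0) + 2 * sum_{k=1}^{bandwidth} w_k * R(k),
    with w_k = 1 - k / (bandwidth + 1) and R(k) the empirical autocovariance.
    """
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    est = np.dot(resid, resid) / n
    for k in range(1, bandwidth + 1):
        weight = 1.0 - k / (bandwidth + 1.0)
        autocov = np.dot(resid[:-k], resid[k:]) / n
        est += 2.0 * weight * autocov
    return est

# Toy data with a jump of size 1 after the fourth observation.
x = np.array([0.1, -0.1, 0.1, -0.1, 1.1, 0.9, 1.1, 0.9])
resid = estimated_residuals(x, m_hat=4)
d_hat = jump_estimate(x, m_hat=4)
tau2_hat = bartlett_lrv(resid, bandwidth=2)
```

The bandwidth plays the role of the tuning parameter in the remark: it has to grow with the sample size, but slowly enough that the individual autocovariance estimates remain reliable.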

Remark 2.3.

It can be shown that the above limit distribution is continuous and explicitly known (confer Remark 2.3 in Antoch and Hušková [1]). Thus the above theorem can be used to construct asymptotic confidence intervals, precisely , where , for as in Remark 2.4 below. Note that does not depend on the unknown parameter .

Remark 2.4.

For the limit distribution depends on the unknown parameter . Precisely it can be shown that the limit is

Remark 2.3 in Antoch and Hušková [1] gives a closed formula for the limit distribution. We would like to point out a small (but for simulations very important) misprint there: The integral is equal to and not to .

3 Bootstrap approximations

Antoch et al. [4] propose a bootstrap with replacement of the estimated error sequence to obtain confidence intervals for the change-point. Since in our case the error sequence is no longer independent, we have to use a slightly different approach here. We still bootstrap the estimated error sequence with replacement, but we now use a circular moving block bootstrap as suggested by Politis and Romano [23]. It has the advantage over the regular moving block bootstrap by Künsch [21] that the sample mean is unbiased. Another possibility is to use non-overlapping blocks as suggested by Carlstein [9], but this bootstrap behaves slightly worse in simulations.

Kirch [18, 19] used block bootstrapping procedures (more precisely a block permutation method as well as a circular and non-circular block bootstrap) to get approximations for the critical values of the change-point test corresponding to the above problem.

Block bootstrapping methods split the observation sequence of length  into sequences of length . Then we put of them together to a bootstrap sequence (i.e. ). We keep the order within the blocks. and depend on and converge to infinity with .

The idea is that, for properly chosen block-length , the block contains enough information about the dependency structure so that the estimate is close to the null hypothesis.

We assume in the following that


Let be an estimator for , for , and for , e.g.


where as in (1.4). Remark 2.2 yields that fulfills assumption (3.5).

Define the estimated residuals and the centered residuals by

respectively. Throughout the paper the following representation will turn out to be very useful


Let be i.i.d. with for independent of the observations . Take the i.i.d. bootstrap sample , where for (hence the name circular bootstrap).
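The circular resampling step just described can be sketched as follows: block starting points are drawn i.i.d. uniformly over all positions, and blocks that run past the end of the sequence wrap around to its beginning. The function name, the argument names and the truncation of the last block when the block length does not divide the sample size are assumptions made for this illustration.

```python
import numpy as np

def circular_block_bootstrap(resid, block_len, rng):
    """Draw one circular overlapping block bootstrap sample of len(resid)."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    n_blocks = -(-n // block_len)                  # ceil(n / block_len)
    starts = rng.integers(0, n, size=n_blocks)     # i.i.d. uniform block starts
    offsets = np.arange(block_len)
    # Indices of each block, wrapped around the end of the sequence.
    idx = (starts[:, None] + offsets[None, :]) % n
    return resid[idx.ravel()[:n]]                  # keep exactly n values

# Usage: center the estimated residuals first, then resample them in blocks.
rng = np.random.default_rng(0)
resid = np.arange(10.0)
centered = resid - resid.mean()
sample = circular_block_bootstrap(centered, block_len=5, rng=rng)
```

Within each block the original order is preserved, which is what carries the dependence structure of the errors over to the bootstrap sequence.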

Consider the bootstrap observations

We now deal with the following bootstrap estimator of the change-point



Now we are ready to present results on the asymptotic behavior of the bootstrap estimator defined in (3.4) of the change-point together with a short discussion, how to apply the result to obtain confidence intervals.

With , , we denote the conditional probability, expectation and variance given

Theorem 3.1.

Assume that (1.1)-(1.3) with and (3.1) hold. Moreover let


be fulfilled in addition to (2.1). Moreover let assumption be fulfilled for some


If , then

Since the limit distribution (for both the bootstrap and the null asymptotics) is continuous (as pointed out in Remark 2.3), the described sampling scheme provides bootstrap approximations to the -quantile for arbitrary . Thus bootstrap-based confidence intervals for the change-point can be constructed along the usual lines. Precisely, the -bootstrap confidence interval is given by



Usually one uses the empirical bootstrap distribution of for say random bootstrap samples. Further discussions on bootstrap approximations of confidence intervals (for the similar case of i.i.d. errors) can be found in Antoch and Hušková [1].
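Putting the pieces together, a minimal sketch of the whole procedure might look as follows. It assumes the unweighted CUSUM estimator, segment sample means as location estimators, and an equal-tailed interval obtained by treating the distribution of the bootstrap difference (resampled estimator minus original estimator) as an approximation of the distribution of the estimation error. These choices, the function names and the parameter values in the usage part are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def estimate_changepoint(x):
    """Unweighted CUSUM argmax over k = 1..n-1 (assumed estimator)."""
    s = np.cumsum(x - x.mean())[:-1]
    return int(np.argmax(np.abs(s)) + 1)

def bootstrap_ci(x, block_len, alpha=0.1, n_boot=500, seed=0):
    """Equal-tailed (1 - alpha) block bootstrap interval for the change-point."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    m_hat = estimate_changepoint(x)
    # Fitted piecewise-constant mean and centered estimated residuals.
    fitted = np.where(np.arange(n) < m_hat, x[:m_hat].mean(), x[m_hat:].mean())
    resid = x - fitted
    resid = resid - resid.mean()
    n_blocks = -(-n // block_len)                  # ceil(n / block_len)
    offsets = np.arange(block_len)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n, size=n_blocks)
        idx = ((starts[:, None] + offsets) % n).ravel()[:n]   # circular blocks
        x_star = fitted + resid[idx]               # rebuilt sequence with a change
        diffs[b] = estimate_changepoint(x_star) - m_hat
    q_lo, q_hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return m_hat, (m_hat - q_hi, m_hat - q_lo)

# Usage with hypothetical settings: AR(1) errors, change of size 3 after m = 100.
rng = np.random.default_rng(1)
n, m = 200, 100
eps = rng.standard_normal(n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.3 * e[t - 1] + eps[t]
x = e + 3.0 * (np.arange(1, n + 1) > m)
m_hat, ci = bootstrap_ci(x, block_len=5)
```

The interval bounds are not truncated to the admissible range of change-points here, so for small changes they may fall outside the observation window.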

Remark 3.1.

There are also several other possibilities of bootstrapping. For example we can use a non-circular approach and/or non-overlapping blocks. Simulations for the bootstrap where are i.i.d. uniformly distributed on and indicate that this bootstrap does not perform quite as well as the bootstrap proposed above.

4 Simulation Study

In the previous section we have established the asymptotic validity of the bootstrap confidence intervals. The question remains how well these confidence intervals behave for small samples and also how well they behave in comparison with the asymptotic intervals.

In this section we not only consider but also . The important difference is that the asymptotic confidence intervals depend on the unknown parameter for . Not surprisingly it turns out that the asymptotic intervals behave better for , whereas in all other cases it is better to use the bootstrap intervals.
Moreover we consider changes in the mean of . The latter can hardly be regarded as local changes; however, we are still interested in the behavior of the bootstrap intervals, since we conjecture that the bootstrap is also valid in those cases.

For the simulations we use an autoregressive sequence of order one as an error sequence with standard normally distributed innovations and different values of . We consider changes at . We use the estimator as in (2.4) respectively (3.2) and - for the asymptotic method - the Bartlett estimator given in (2.3) with , because in the simulation study conducted by Antoch et al. [3] this choice gave best results in the AR(1)-case.
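A data-generating step of this kind can be sketched as follows. The function name, the burn-in period used to approximate stationarity, and the parameter values in the usage line are assumptions for illustration; they are not the settings of the simulation study.

```python
import numpy as np

def simulate_amoc_ar1(n, m, d, rho, mu=0.0, burn_in=200, seed=None):
    """Simulate a sequence with one change in the mean and AR(1) errors.

    The errors follow e_t = rho * e_{t-1} + eps_t with standard normal eps_t;
    the mean is mu for observations 1..m and mu + d afterwards.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn_in)
    e = np.empty(n + burn_in)
    e[0] = eps[0]
    for t in range(1, n + burn_in):
        e[t] = rho * e[t - 1] + eps[t]
    e = e[burn_in:]                                  # drop the burn-in period
    mean = mu + d * (np.arange(1, n + 1) > m)
    return mean + e

# Hypothetical example: 80 observations, change of size 2 after observation 40.
x = simulate_amoc_ar1(n=80, m=40, d=2.0, rho=0.3, seed=1)
```

Fixing the seed makes individual sequences reproducible, which is convenient when the asymptotic and the bootstrap intervals are to be compared on the same data.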

The goodness of confidence intervals can essentially be determined by two criteria:

  1. The probability that the actual change-point is outside the (1-)-confidence interval should be close to (smaller than) .

  2. The confidence intervals should be short.

We visualize the first quantity by using CoLe-Plots (Confidence-Level-Plots) and the second one by using CoIL-Plots (Confidence-Interval-Length-Plots).

In fact we have done more simulations (such as QQ-plots or tables of the quantiles of ) for a large number of different combinations of parameters as well as different possible bootstrap procedures. The problem, however, is that for the bootstrap they only give results for one specific underlying sequence and are thus less informative. For this reason, and also due to the similarity of the results and limitations of space, we restrict ourselves to the following plots.
We explain how the plots are created using the example of asymptotic confidence intervals. The general version of Theorem 2.1 yields that the asymptotic confidence intervals are calculated using the distribution of , where is as in Remark 2.4 and . Note that

The CoLe-Plots now draw the empirical distribution function (based on observation sequences) of .
Thus for given on the -axis the plot shows the empirical probability that is outside the -confidence interval on the -axis, hence it visualizes . Optimally, the plot should be below or (even better) on the diagonal.

For the bootstrap confidence intervals the procedure works exactly the same, but now the intervals are calculated using the (empirical, based on resamples) distribution of .
We calculate for observation sequences the length of the confidence intervals for levels . The empirical bootstrap distribution is based on random samples as before. Then we plot the mean using a thick line (as well as the upper and lower quartiles with thin lines), linearly interpolated. So these plots visualize the length of the intervals and thus .
Note that the scale on the -axis is not the same for different pictures. This way we can better compare the asymptotic with the bootstrap method.
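The quantities shown in the two kinds of plots can be computed along the following lines, given the interval bounds for every simulated sequence and every level. The function names and the array layout (one row per sequence, one column per level) are assumptions made for this sketch; the plotting itself is omitted.

```python
import numpy as np

def cole_curve(lower, upper, true_m):
    """Empirical probability, per level, that true_m lies outside the interval."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    outside = (true_m < lower) | (true_m > upper)
    return outside.mean(axis=0)

def coil_curves(lower, upper):
    """Mean and quartiles of the interval length, per level."""
    length = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    q25, q75 = np.quantile(length, [0.25, 0.75], axis=0)
    return length.mean(axis=0), q25, q75

# Toy input: 2 simulated sequences (rows) and 2 levels (columns).
lower = np.array([[40.0, 45.0], [48.0, 55.0]])
upper = np.array([[60.0, 55.0], [70.0, 65.0]])
levels_curve = cole_curve(lower, upper, true_m=50)
mean_len, q25, q75 = coil_curves(lower, upper)
```

In the toy input the true change-point 50 is covered by both intervals at the first level and missed by one of the two intervals at the second level, so the first curve takes the values 0 and 0.5.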



Figure 4.1: CoLe- and CoIL-Plots for , , , and for the asymptotic method as well as bootstrap method with different block-lengths. Panels (1), (3), (5), (7): CoLe-Plots; panels (2), (4), (6), (8): CoIL-Plots.

Figure 4.2: CoLe- and CoIL-Plots for , , and for the asymptotic method as well as bootstrap method with different block-lengths. Panels (1), (3), (5), (7): CoLe-Plots; panels (2), (4), (6), (8): CoIL-Plots.

Figure 4.3: CoLe- and CoIL-Plots for , , , and for the asymptotic method as well as bootstrap method with different block-lengths. Panels (1), (3), (5), (7): CoLe-Plots; panels (2), (4), (6), (8): CoIL-Plots.

The plots are given in Figures 4.1-4.2 for and 4.3 for . Concerning the CoIL-Plots we only plot the means for better readability. In Figure 4.4 we give the CoIL-Plot corresponding to Figure 4.1 (2) including the quartiles to give a better idea of the distribution of the length of the confidence interval.

Figure 4.4: CoIL-Plot for , , , , and , which additionally to Figure 4.1 (2) includes quartiles of the interval length

Concerning we see that for small the actual coverage probability of the interval is too small for both methods, yet the asymptotic interval is somewhat better than the bootstrap intervals. At the same time the length of the asymptotic interval is very large, much larger than the length of the bootstrap interval. Frequently it is even longer than the observation sequence. We did not correct upper and lower bounds of the intervals by respectively , but bootstrap intervals can also be outside that possible range.

In fact it is somewhat surprising that even though the intervals are quite long the levels are not very good. The reason is that the change-point estimator for such a small change (and relatively few observation points) is frequently not very good. A typical example is an observation sequence with a change at , where the estimator suggests a change at . This results in intervals that do not contain the actual change-point. Also this leads to a wrong estimation of the parameters of the underlying asymptotic distribution, which is then highly skewed in the wrong direction. Thus the lower quantile of the interval is something around , whereas the upper quantile is far bigger than .

For more obvious changes the level of the intervals as well as their length improves. This is somewhat surprising in the case of the asymptotic intervals because for fixed changes the asymptotics are not valid. The reason is that we have an interval around the change-point estimator, which is quite good for more obvious changes.

If the changes are closer to the border of the interval, the levels for both methods deteriorate somewhat. The same holds true for stronger correlation of the underlying error sequence.

Overall the bootstrap intervals behave better than the asymptotic intervals.

However, in the case of the asymptotic distribution does not depend on unknown parameters anymore. In this case the asymptotic confidence intervals for local changes are in fact better than the bootstrap intervals. The levels of both methods are for small somewhat worse than for , but the lengths are much better, especially for the asymptotic intervals. However, for more obvious changes the bootstrap intervals are again better than the asymptotic ones. This is due to the fact that the asymptotics do not hold in this case.

It is worth noting that the performance of the bootstrap method does not seem to depend significantly on the choice of the block-length. This is in contrast to the situation where we bootstrap critical values for change-point tests (cf. Kirch [19]) where a larger block-length was needed when the data was more dependent.

In real-life situations we recommend using the bootstrap intervals, since in our simulations they performed well across the settings considered, for local as well as fixed changes.

5 Proofs

Throughout the proofs we use the notation for .

We start with the proof of Theorem 2.1 in Section 2.

Proof of Theorem 2.1.

We only sketch the proof, because it is very similar to the proofs of Theorems 1 and 2 in Antoch et al. [4]. First note that

Simple calculations yield for


First we show assertion a), i.e. the rate of consistency for the change-point estimator. Theorem B.8 b) and Remark B.2 in Kirch [18] give

Similarly we get for , where is an arbitrary fixed constant,

Note that is increasing in for , so that


A similar argument gives

Hence assertion a) is proven.

For assertion b) we first need somewhat stronger bounds for the above sums, but only in a $P$-stochastic sense. Theorem B.3 in Kirch [18] gives a Hájek-Rényi type inequality if certain moment conditions of the sums are fulfilled. This yields here ( arbitrary fixed constant)

where the last line follows because for all (in particular for ), and some

Analogously to above this yields for , where is an arbitrary fixed constant,

Similarly for

where the last rate is uniformly in . The proof can be finished analogously to the proof of Theorem 2 in Antoch et al. [4], where we now use Theorem 1 of Section 1.5 in Doukhan [12]. ∎

We will first formulate some auxiliary lemmas, which will enable us to prove the results in Section 3.

Lemma 5.1.

Let be a triangular array of row-wise i.i.d. random variables with and as , then


Proof. The proof is analogous to that of Theorem 16.1 in Billingsley [8], since the central limit theorem holds for triangular arrays and the proof of tightness also works analogously. ∎

Lemma 5.2.

Assume that (1.1)-(1.3) with hold and let (2.1) be fulfilled. Moreover let assumption be fulfilled for some , . If additionally (3.5) holds, then

and for .

Remark 5.1.

More careful considerations concerning below even yield an almost sure rate of .

Remark 5.2.

’Estimator’ is closely related to the Bartlett window estimator with parameter if for this estimator one also uses a circularly extended series, precisely


For (the other case can be dealt with in a similar way) (3.3) yields the following decomposition


Theorem B.8 b) in Kirch [18] yields

Concerning we have

now Theorem 2.1 a) and (3.5) yield