
# Convergence rates of empirical block length selectors for block bootstrap

## Abstract

We investigate the accuracy of two general non-parametric methods for estimating optimal block lengths for block bootstraps with time series – the first proposed in the seminal paper of Hall, Horowitz and Jing (Biometrika 82 (1995) 561–574) and the second from Lahiri et al. (Stat. Methodol. 4 (2007) 292–321). The relative performances of these general methods have been unknown and, to provide a comparison, we focus on rates of convergence for these block length selectors for the moving block bootstrap (MBB) with variance estimation problems under the smooth function model. It is shown that, with a suitable choice of tuning parameters, the optimal convergence rate of the first method is $O_p(n^{-1/6})$, where $n$ denotes the sample size. The optimal convergence rate of the second method, with the same number of tuning parameters, is shown to be faster, suggesting that the second method may generally have better large-sample properties for block selection in block bootstrap applications beyond variance estimation. We also compare the two general methods with other plug-in methods specifically designed for block selection in variance estimation, where the best possible convergence rate is shown to be $O_p(n^{-1/3})$ and is achieved by a method of Politis and White (Econometric Rev. 23 (2004) 53–70).

Bernoulli 20(2), 2014, 958–978. DOI: 10.3150/13-BEJ511.

Running title: Empirical block length selectors

Daniel J. Nordman (dnordman@iastate.edu) and Soumendra N. Lahiri (snlahiri@stat.tamu.edu)

Keywords: jackknife-after-bootstrap; moving block bootstrap; optimal block size; plug-in methods; subsampling

## 1 Introduction

Performance of block bootstrap methods critically depends on the choice of block lengths. A common approach to the problem is to choose a block length that minimizes the mean squared error (MSE) of a block bootstrap estimator as a function of the block length. For many important functionals, expansions for the MSE-optimal block lengths are known. If $\hat\theta_n$ denotes an estimator of a parameter of interest $\theta$ based on a stationary stretch $X_1,\ldots,X_n$, examples of relevant functionals $\varphi_n$ of the distribution of $\hat\theta_n$ include the bias, the variance, and the distribution function (the latter evaluated at a given point, with standardization by either the variance of $\hat\theta_n$ or an estimator of this; cf. [11]). If $\hat\varphi_n(\ell)$ denotes a block bootstrap estimator of $\varphi_n$ based on block length $\ell$, then as $n\to\infty$ the bias and variance of $\hat\varphi_n(\ell)$ often admit expansions of the form

$$n^{2a}\operatorname{Var}\bigl(\hat\varphi_n(\ell)\bigr)=V_0\frac{\ell^{r}}{n}\bigl(1+o(1)\bigr),\qquad n^{a}\operatorname{Bias}\bigl(\hat\varphi_n(\ell)\bigr)=-\frac{B_0}{\ell}\bigl(1+o(1)\bigr) \tag{1}$$

for some known constants $a$, $r>0$, $B_0$ and $V_0$ depending on $\varphi_n$ (e.g., $r=1$ for the bias and variance functionals, while $r=2$ for the distribution function in the one-sided case), and these lead to a large-sample approximation of the MSE-optimal block size given by

$$\ell^{0}_n\equiv\ell^{0}_n(\varphi)=C_0n^{1/(r+2)}\bigl(1+o(1)\bigr),\qquad C_0\equiv\biggl(\frac{2B_0^2}{rV_0}\biggr)^{1/(r+2)}, \tag{2}$$

involving population quantities $B_0$ and $V_0$ that depend on the functional $\varphi_n$, the bootstrap method, and various parameters of the underlying process. For smooth function model statistics (described below), the expansions (1) have been established for the moving block and non-overlapping block methods [5, 7, 8, 11] and, in particular, are also known for the variance functional with other block bootstraps, such as the circular block bootstrap [18] and stationary bootstrap [13, 19]; see [10] and references therein. However, as the theoretical approximations (2) for the optimal block lengths typically depend on different unknown population parameters of the underlying process in an intricate manner, these are not directly usable in practice.

Different data-based methods for the selection of optimal block lengths have been proposed in the literature. One of the most popular general methods was proposed by Hall, Horowitz and Jing [5] (hereafter referred to as HHJ); it employs a subsampling method (cf. [19]) to construct an empirical version of the MSE function and minimizes this to produce an estimator of the optimal block length. We will refer to this approach as the HHJ method. A second general method for selecting the optimal block length was put forward by Lahiri et al. [11]. This method is based on the jackknife-after-bootstrap method of Efron [3] and its extension to the block bootstrap by Lahiri [9]. For reasons explained in [11] (see also Section 2 below), we will refer to this method as the non-parametric plug-in method (or the NPPI method, in short). Both the HHJ and NPPI methods are called "general" because they can be used in the same manner across different functionals (e.g., bias, variance, distribution function, quantiles, etc.) to find the optimal block size for bootstrap estimation, without requiring exact analytical expressions for the corresponding optimal block length approximation (2) (i.e., without requiring explicit forms for the quantities $B_0$ and $V_0$). In particular, for a given functional, the HHJ method aims to directly estimate the constant $C_0$ in the optimal block approximation (2), while the NPPI method separately and non-parametrically estimates the bias and variance quantities $B_0$ and $V_0$ in (2) without structural knowledge of these. Our major objective here is to investigate the convergence rates of these two general methods. For instance, despite the popularity of the HHJ method, little is theoretically known about its properties for block selection or how it compares to the NPPI method. As a context for comparing the methods, we focus on their performance for block selection in variance estimation problems with the block bootstrap. A few other block length selection methods also exist in the literature.
These are primarily plug-in estimators which necessarily require an explicit expression for the optimal block approximation (2) for each specific functional and for each block bootstrap method (i.e., requiring exact forms for $B_0$ and $V_0$), and they are not the focus of this paper. However, two popular plug-in methods for the variance functional in this latter category are given by Bühlmann and Künsch [2] and Politis and White [21] (and its corrected version, Patton, Politis and White [16]). For completeness, we later compare the performance of the two general methods with these plug-in methods for block selection in variance estimation.

For concreteness, we shall restrict attention to the moving block bootstrap (MBB) method [7, 12], which was the original focus of the HHJ method [5] and the plug-in method of Bühlmann and Künsch [2] and shares close large-sample connections to other block bootstrap methods (e.g., circular block bootstrap, non-overlapping block bootstrap, untapered version of the tapered block bootstrap) [8, 13, 15, 21]. Further, we shall work under the smooth function model of Hall [4] (see Section 2.1 below), which provides a convenient theoretical framework but, at the same time, is general enough to cover many commonly used estimators in the time series context ([10], Chapter 4). Accordingly, let $\hat\theta_n$ be an estimator of a parameter of interest under the smooth function model and suppose that the MBB is used for estimating $n\operatorname{Var}(\hat\theta_n)$ or its limiting form

$$\sigma^2_\infty\equiv\lim_{n\to\infty}n\operatorname{Var}\bigl(\hat\theta_n\bigr). \tag{3}$$

Let

$$\mathrm{MSE}_n(\ell)\equiv E\bigl\{\hat\sigma^2_n(\ell)-\sigma^2_\infty\bigr\}^2 \tag{4}$$

denote the MSE of the MBB variance estimator $\hat\sigma^2_n(\ell)$ based on blocks of length $\ell$ and a sample of size $n$. (Defining the MSE with $\sigma^2_\infty$ or $n\operatorname{Var}(\hat\theta_n)$ makes no difference in what follows and, for clarity, it is helpful to fix one target in defining (4) throughout.)

The theoretical MSE-optimal block size is given by

$$\ell^{\mathrm{opt}}_n=\operatorname{argmin}\bigl\{\mathrm{MSE}_n(\ell)\colon \ell\in\mathcal{J}_n\bigr\}, \tag{5}$$

where $\mathcal{J}_n$ is a suitable set of block lengths including the optimal block length. As alluded to above (cf. (1)), under some standard regularity conditions, it can be shown that

$$\mathrm{MSE}_n(\ell)\approx f_n(\ell)\equiv B_0^2\ell^{-2}+V_0n^{-1}\ell,\qquad \ell\in\mathcal{J}_n,$$

where $B_0$ and $V_0$ are population parameters arising, respectively, from the bias and variance of the MBB variance estimator $\hat\sigma^2_n(\ell)$. Let $\ell^0_n=C_0n^{1/3}$ denote the minimizer of the asymptotic approximation $f_n$ to the MSE function, where $C_0=(2B_0^2/V_0)^{1/3}$ (cf. (2)). As a first step towards investigating the accuracy of different empirical block rule selection methods, we consider the relative error of this theoretical approximation and show that

$$\frac{\ell^{\mathrm{opt}}_n-\ell^0_n}{\ell^0_n}=O\bigl(n^{-1/3}\bigr)$$

as $n\to\infty$. Thus, the true optimal block size and the optimal block size determined by the asymptotic approximation to the MSE curve of the block bootstrap estimator differ by a margin of $O(n^{-1/3})$ on the relative scale. In general, this rate cannot be improved further. As a result, for empirical block length selection rules involving estimation steps that target $\ell^0_n$ (which all existing methods do), the upper bound on their accuracy for estimating the true optimal block length is $O(n^{-1/3})$.

Next, we consider the convergence rates of the two general methods. Let $\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}$ and $\hat\ell^{\mathrm{opt}}_{n,\mathrm{NPPI}}$, respectively, denote the estimators of the optimal block length based on the HHJ and NPPI methods. We show that, under some mild conditions and with a suitable choice of the tuning parameters,

$$\frac{\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}-\ell^{\mathrm{opt}}_n}{\ell^{\mathrm{opt}}_n}=O_p\bigl(n^{-1/6}\bigr)$$

as $n\to\infty$. Thus, the (relative) rate of convergence of the HHJ estimator of the optimal block length is $O_p(n^{-1/6})$. The block length in block bootstrap methodology plays a role similar to a smoothing parameter in non-parametric functional estimation. It is well known (cf. [6]) that non-parametric data-based rules for bandwidth estimation often have an "excruciatingly slow" (relative) rate of convergence (e.g., of the order $n^{-1/10}$). The convergence rate of the HHJ method turns out to be relatively better. It is worth noting that the HHJ block estimator, based on the overlapping version of the subsampling method, has the same rate of convergence irrespective of the dependence structure of the underlying time series $\{X_t\}$. Additionally, in the process of determining this convergence rate, we also provide theoretical guidance on optimally choosing the two tuning parameters required in implementing the HHJ method, which has been an unresolved aspect of the method.

Next, we consider the NPPI method and compare its relative performance with the HHJ method. The rate of convergence of the NPPI method is determined by two factors, which arise from estimating the variance and the bias of a block bootstrap estimator (i.e., the quantities $V_0$ and $B_0$ appearing in (6) and (8)). The factor due to the variance part is based on the (block) jackknife-after-bootstrap method [3, 9] and attains its optimal rate with a suitable choice of the tuning parameters. On the other hand, the second factor is determined by a non-standard bias estimator that turns out to be adaptive to the strength of dependence of $\{X_t\}$. Let $r(\cdot)$ denote the autocovariance function of (a suitable linear function of) the $X_t$'s. When $r(k)$ decays at a suitably fast polynomial rate as $k\to\infty$, the rate of convergence of the second term can be made correspondingly small with a suitable choice of the tuning parameters. Combining the two, the optimal rate of convergence of the NPPI method is faster than the optimal $O_p(n^{-1/6})$ rate for the HHJ method. For this to hold, the user needs to specify two tuning parameters, the same number as with the HHJ method. The resulting convergence rate is also interesting in the variance estimation problem because it matches the best rate obtained by the plug-in block selection method of Bühlmann and Künsch [2]. Their method is a four-step algorithm which uses lag weight estimators of the spectral density at zero and again requires explicit forms for the quantities appearing in the bias and variance (e.g., $B_0$ and $V_0$) of the MBB variance estimator. Hence, while the NPPI method for block selection applies more generally to other functionals, its convergence rate matches the optimal one for a plug-in method specifically tailored to the variance estimation problem. This provides some evidence supporting the use of the NPPI method in block selection with other functionals outside of variance estimation.

The rest of the paper is organized as follows. In Section 2, we briefly describe the smooth function model, the MBB and the empirical block length selectors proposed by HHJ [5] and Lahiri et al. [11]. In Section 3, we present the conditions and derive a general result on uniform approximation of the MSE of a block bootstrap estimator, which may be of independent interest. We describe main results on the HHJ and the NPPI methods in Sections 4 and 5, respectively. In Section 6, we compare the general HHJ/NPPI methods with other plug-in block selection approaches for the MBB in the variance estimation problem. In particular, a plug-in method of Politis and White [21] (see also [16]) is shown to achieve the best possible convergence rate for block selection with variance functionals. Section 7 sketches proofs of the main results, where full proofs are deferred to a supplementary material appendix [14].

## 2 Preliminaries

### 2.1 MBB variance estimator and optimal block length

Let $\{X_t\}$ be a stationary stretch of $\mathbb{R}^d$-valued random vectors with mean $\mu\equiv EX_1$. We shall consider the problem of estimating the variance of a statistic $\hat\theta_n$ framed in the "smooth function" model [4]. Using some function $H\colon\mathbb{R}^d\to\mathbb{R}$ and the sample mean $\bar X_n\equiv n^{-1}\sum_{t=1}^nX_t$, suppose that a statistic can be expressed as $\hat\theta_n=H(\bar X_n)$ for purposes of estimating a process parameter $\theta=H(\mu)$. The "smooth function" model covers a wide range of parameters and their estimators, including the sample mean, sample autocovariances, and Yule–Walker estimators, among others; see Chapter 4 of [10] for more examples. Recall the target variance of interest is $n\operatorname{Var}(\hat\theta_n)$ or its limit $\sigma^2_\infty$ from (3).

We next describe the MBB variance estimator. Let $\ell\in\mathbb{N}$ (the set of positive integers) denote the block length and create overlapping length-$\ell$ blocks from $X_1,\ldots,X_n$ as $\mathcal{B}_i=(X_i,\ldots,X_{i+\ell-1})$, $i=1,\ldots,N$, where $N\equiv n-\ell+1$. We independently resample $k\equiv\lfloor n/\ell\rfloor$ blocks by letting $i_1,\ldots,i_k$ denote i.i.d. random variables with a uniform distribution over the block indices $\{1,\ldots,N\}$ and then define a MBB sample of size $n_1\equiv k\ell$ as $(X^*_1,\ldots,X^*_{n_1})=(\mathcal{B}_{i_1},\ldots,\mathcal{B}_{i_k})$, where $\lfloor x\rfloor$ denotes the integer part of a real number $x$. The MBB analog of $\hat\theta_n$ is given by $\hat\theta^*_n=H(\bar X^*_n)$ using the MBB sample mean $\bar X^*_n=n_1^{-1}\sum_{t=1}^{n_1}X^*_t$, and the MBB variance estimator is then defined as

$$\hat\sigma^2_n(\ell)\equiv n_1\operatorname{Var}_*\bigl(\hat\theta^*_n\bigr),$$

where $\operatorname{Var}_*$ denotes variance with respect to the bootstrap distribution conditional on the data $X_1,\ldots,X_n$.
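As an illustration of $\hat\sigma^2_n(\ell)$, the following Python sketch approximates the bootstrap variance $\operatorname{Var}_*$ by Monte Carlo for the sample-mean case ($H$ the identity); function and parameter names here are our own illustrative choices, not from the paper.

```python
import numpy as np

def mbb_variance(x, ell, theta=np.mean, B=500, rng=None):
    """Monte Carlo sketch of the MBB variance estimator sigma^2_n(ell).

    x     : observed series X_1, ..., X_n (1-d array)
    ell   : MBB block length
    theta : statistic computed on each MBB resample (np.mean gives the
            sample-mean case, i.e., H equal to the identity)
    B     : number of bootstrap replicates approximating Var_*
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = n // ell                  # number of resampled blocks
    n1 = k * ell                  # MBB sample size
    # all N = n - ell + 1 overlapping blocks B_i = (X_i, ..., X_{i+ell-1})
    blocks = np.lib.stride_tricks.sliding_window_view(x, ell)
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, len(blocks), size=k)  # i.i.d. uniform indices
        stats[b] = theta(blocks[idx].ravel())       # statistic on MBB sample
    return n1 * stats.var()       # sigma^2_n(ell) = n1 * Var_*(theta*)
```

With $B\to\infty$ the Monte Carlo average converges to the exact bootstrap variance; a few hundred replicates suffice for illustration.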

For variance estimation, we briefly consolidate notation from Section 1 on optimal block lengths. The performance of the MBB again depends on the block choice $\ell$. Under certain dependence conditions and block assumptions ($\ell^{-1}+\ell/n\to0$ as $n\to\infty$), the asymptotic bias and variance of the MBB estimator $\hat\sigma^2_n(\ell)$ are

$$E\hat\sigma^2_n(\ell)-\sigma^2_\infty=-\frac{B_0}{\ell}\bigl(1+o(1)\bigr),\qquad \operatorname{Var}\bigl[\hat\sigma^2_n(\ell)\bigr]=V_0\frac{\ell}{n}\bigl(1+o(1)\bigr) \tag{6}$$

as $n\to\infty$, for some population parameters $B_0$ and $V_0$ depending on the covariance structure of the underlying process (cf. [5, 7] and Condition 3.3 of Section 3.1). Thus, the main component in the MSE (4) of the MBB follows as

$$\mathrm{MSE}_n(\ell)\approx f_n(\ell)\equiv\frac{B_0^2}{\ell^2}+\frac{V_0\ell}{n} \tag{7}$$

as $n\to\infty$. The minimizer $\ell^0_n$ of $f_n$ is given by

$$\ell^0_n\equiv C_0n^{1/3}, \tag{8}$$

where $C_0\equiv(2B_0^2/V_0)^{1/3}$. From (7) and (8), the optimal block minimizing $\mathrm{MSE}_n(\ell)$ behaves as $C_0n^{1/3}$ in large samples [5, 7, 10]. As a result, to examine properties of the block length selection methods, we shall create a collection $\mathcal{J}_n$ of block lengths of exact order $n^{1/3}$, determined by a suitably large constant so that $\ell^0_n\in\mathcal{J}_n$, and formally define the optimal block size $\ell^{\mathrm{opt}}_n$ as in (5).
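As a quick check on (8), minimizing $f_n$ in (7) by elementary calculus recovers the stated constant:

```latex
f_n'(\ell) = -\frac{2B_0^2}{\ell^3} + \frac{V_0}{n} = 0
\quad\Longleftrightarrow\quad
\ell^3 = \frac{2B_0^2\,n}{V_0}
\quad\Longleftrightarrow\quad
\ell = \left(\frac{2B_0^2}{V_0}\right)^{1/3} n^{1/3},
```

so that $C_0=(2B_0^2/V_0)^{1/3}$, agreeing with the general formula for $C_0$ in (2) when $r=1$.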

### 2.2 The Hall–Horowitz–Jing (HHJ) block estimation method

The HHJ [5] method seeks to estimate the optimal block size $\ell^{\mathrm{opt}}_n$ by minimizing an empirical version of the MSE (4) created by subsampling (data blocking). Let $m$ denote a sequence satisfying $m^{-1}+m/n\to0$ as $n\to\infty$, which serves to define the length of subsamples $(X_i,\ldots,X_{i+m-1})$, $i=1,\ldots,n-m+1$. For each subsample, let $\hat\sigma^2_{i,m}(b)$ denote the MBB variance estimator resulting from resampling length-$b$ blocks from the observations $(X_i,\ldots,X_{i+m-1})$. For clarity, note that MBB block lengths on size-$m$ subsamples are denoted by "$b$", while "$\ell$" denotes MBB block lengths applied to the original data $X_1,\ldots,X_n$. To approximate the error in MBB variance estimation incurred by using length-$b$ blocks in samples of size $m$, we form a subsampling estimator

$$\widehat{\mathrm{MSE}}_m(b)=\frac{1}{n-m+1}\sum_{i=1}^{n-m+1}\bigl[\hat\sigma^2_{i,m}(b)-\hat\sigma^2_n(\tilde\ell_n)\bigr]^2, \tag{9}$$

where the initializing MBB estimator $\hat\sigma^2_n(\tilde\ell_n)$ of $\sigma^2_\infty$ is based on the entire sample and on a plausible pilot block size $\tilde\ell_n$. By minimizing $\widehat{\mathrm{MSE}}_m(b)$ over $b\in\mathcal{J}_m$, we formulate

$$\hat b^{\mathrm{opt}}_{m,\mathrm{HHJ}}=\operatorname{argmin}\bigl\{\widehat{\mathrm{MSE}}_m(b)\colon b\in\mathcal{J}_m\bigr\} \tag{10}$$

as an estimator of the theoretically optimal MBB block length for a size-$m$ sample, given by

$$b^{\mathrm{opt}}_m=\operatorname{argmin}\bigl\{\mathrm{MSE}_m(b)\colon b\in\mathcal{J}_m\bigr\}. \tag{11}$$

Next comes a rescaling step that involves approximating the true optimal block length with the minimizer of the MSE-approximation (7). That is, as $b^{\mathrm{opt}}_m$ is the "size-$m$ sample version" of $\ell^{\mathrm{opt}}_n$ in (5), one uses the large-sample block approximations $b^{\mathrm{opt}}_m\approx C_0m^{1/3}$ and $\ell^{\mathrm{opt}}_n\approx C_0n^{1/3}$ from (8) to re-scale and subsequently define the HHJ estimator of $\ell^{\mathrm{opt}}_n$ as

$$\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}=(n/m)^{1/3}\hat b^{\mathrm{opt}}_{m,\mathrm{HHJ}}. \tag{12}$$

Hence, the HHJ method requires specifying both a subsample size $m$ and a pilot MBB block size $\tilde\ell_n$, and both tuning parameters impact the performance of the block estimator $\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}$.
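The steps (9)–(12) can be sketched in Python as follows; this is a minimal Monte Carlo sketch for the sample-mean case with our own illustrative function and parameter names, not an optimized or official implementation.

```python
import numpy as np

def _mbb_var(x, ell, B, rng):
    # minimal MBB variance estimator of n*Var(sample mean), cf. Section 2.1
    n = len(x)
    k = max(n // ell, 1)
    blocks = np.lib.stride_tricks.sliding_window_view(x, ell)
    idx = rng.integers(0, len(blocks), size=(B, k))
    means = blocks[idx].reshape(B, -1).mean(axis=1)
    return k * ell * means.var()

def hhj_block_length(x, m, pilot_ell, B=100, rng=None):
    """Sketch of the HHJ selector, eqs. (9)-(12); names are illustrative.

    m         : subsample length (Corollary 1 suggests the order n^(2/3))
    pilot_ell : pilot MBB block size (of the order n^(1/3))
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # full-sample pilot estimate sigma^2_n(pilot_ell), the target in (9)
    target = _mbb_var(x, pilot_ell, B, rng)
    # candidate subsample block lengths b in J_m (order m^(1/3))
    candidates = range(1, int(2 * m ** (1 / 3)) + 2)
    best_b, best_mse = 1, np.inf
    for b in candidates:
        # subsampling MSE (9): average over all length-m subsamples
        errs = [(_mbb_var(x[i:i + m], b, B, rng) - target) ** 2
                for i in range(n - m + 1)]
        mse = np.mean(errs)
        if mse < best_mse:
            best_b, best_mse = b, mse
    # rescaling step (12): carry the subsample-optimal block to size n
    return (n / m) ** (1 / 3) * best_b
```

The returned value is a real number of the order $n^{1/3}$; in practice it would be rounded to an integer block length.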

#### An oracle-like subsampling MSE

For purposes of comparison with the HHJ method, we also define a second subsampling MSE given as

$$\widehat{\mathrm{MSE}}^\infty_m(b)=\frac{1}{n-m+1}\sum_{i=1}^{n-m+1}\bigl[\hat\sigma^2_{i,m}(b)-\sigma^2_\infty\bigr]^2, \tag{13}$$

which resembles the empirical MSE (9) after replacing the variance estimator $\hat\sigma^2_n(\tilde\ell_n)$ with its target $\sigma^2_\infty$ from (3). This subsampling MSE serves to remove one tuning parameter in the original HHJ method by unrealistically assuming $\sigma^2_\infty$ is known. We may then parallel the performance of the HHJ block estimators $\hat b^{\mathrm{opt}}_{m,\mathrm{HHJ}}$ and $\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}$ with their oracle-like counterparts

$$\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}=\operatorname{argmin}\bigl\{\widehat{\mathrm{MSE}}^\infty_m(b)\colon b\in\mathcal{J}_m\bigr\} \tag{14}$$

based on (13), and the resulting estimator of the optimal block length $\ell^{\mathrm{opt}}_n$ given by

$$\hat\ell^{\mathrm{opt},\infty}_{n,\mathrm{HHJ}}=(n/m)^{1/3}\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}. \tag{15}$$

Both $\hat b^{\mathrm{opt}}_{m,\mathrm{HHJ}}$ and $\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}$ estimate the same optimal block size $b^{\mathrm{opt}}_m$, but the latter estimator is based on an unbiased subsampling criterion through knowledge of $\sigma^2_\infty$, that is, $E\widehat{\mathrm{MSE}}^\infty_m(b)=\mathrm{MSE}_m(b)$ for all $b\in\mathcal{J}_m$.

### 2.3 The non-parametric plug-in (NPPI) method

The NPPI method is based on the non-parametric plug-in principle [11], which yields estimators of MSE-optimal smoothing parameters in general non-parametric function estimation problems. Here we describe the method for estimating the optimal block length for the variance functional using the MBB. Like any plug-in method, the target quantity for the NPPI method is the minimizer $\ell^0_n$ of the MSE-approximation $f_n$ of (7), which again is of the form $C_0n^{1/3}$ from (8), with the constant $C_0$ determined by the population parameters $B_0$ and $V_0$ in the bias and variance expansions (6) of the MBB variance estimator. The NPPI method estimates the bias and the variance of the MBB estimator non-parametrically, and then estimates $B_0$ and $V_0$ by inverting (6). Specifically, the method constructs estimators $\widehat{\mathrm{VAR}}$ and $\widehat{\mathrm{BIAS}}$ satisfying

$$\frac{\widehat{\mathrm{VAR}}}{\operatorname{Var}(\hat\sigma^2_n(\ell_1))}\xrightarrow{\;p\;}1,\qquad \frac{\widehat{\mathrm{BIAS}}}{\operatorname{Bias}(\hat\sigma^2_n(\ell_2))}\xrightarrow{\;p\;}1\qquad\text{as }n\to\infty$$

for some block lengths $\ell_1$ and $\ell_2$, and defines $\hat V_0\equiv(n/\ell_1)\widehat{\mathrm{VAR}}$ and $\hat B_0\equiv-\ell_2\widehat{\mathrm{BIAS}}$ by inverting (6). Then, the estimator of the optimal block length is given by

$$\hat\ell^0_{\mathrm{NPPI}}=\bigl[2\hat B_0^2/\hat V_0\bigr]^{1/3}n^{1/3}. \tag{16}$$

The bias estimator for the NPPI method is

$$\widehat{\mathrm{BIAS}}=2\bigl[\hat\sigma^2_n(\ell_2)-\hat\sigma^2_n(2\ell_2)\bigr]$$

and the variance estimator $\widehat{\mathrm{VAR}}$ is constructed using the jackknife-after-bootstrap (JAB) method [3, 9], due to its computational advantages. For completeness, we next briefly describe the details of the JAB variance estimator.
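To make the plug-in steps concrete, here is a minimal Python sketch of the rule (16) (our own illustrative helper names); the variance input would in practice come from the JAB estimator described below.

```python
import numpy as np

def _mbb_var(x, ell, B, rng):
    # minimal MBB variance estimator sigma^2_n(ell), cf. Section 2.1
    n = len(x)
    k = max(n // ell, 1)
    blocks = np.lib.stride_tricks.sliding_window_view(x, ell)
    idx = rng.integers(0, len(blocks), size=(B, k))
    means = blocks[idx].reshape(B, -1).mean(axis=1)
    return k * ell * means.var()

def nppi_block_length(x, ell1, ell2, var_hat, B=300, rng=None):
    """Sketch of the NPPI plug-in rule (16); helper names are illustrative.

    ell1, ell2 : tuning block lengths for the variance and bias steps
    var_hat    : estimate of Var(sigma^2_n(ell1)), e.g., from the JAB
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # BIAS-hat = 2 * [sigma^2_n(ell2) - sigma^2_n(2*ell2)]
    bias_hat = 2 * (_mbb_var(x, ell2, B, rng) - _mbb_var(x, 2 * ell2, B, rng))
    B0_hat = ell2 * abs(bias_hat)   # invert Bias ~ -B0/ell2 (sign is moot,
                                    # since only B0^2 enters (16))
    V0_hat = (n / ell1) * var_hat   # invert Var  ~  V0 * ell1 / n
    return (2 * B0_hat ** 2 / V0_hat) ** (1 / 3) * n ** (1 / 3)
```

Note how the bias step uses only two MBB evaluations, at block lengths $\ell_2$ and $2\ell_2$, with no structural knowledge of $B_0$.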

{remark}

Politis and Romano [20] considered an estimator related to $\widehat{\mathrm{BIAS}}$ above for bias-correcting the Bartlett spectral estimator (e.g., at the zero frequency, the Bartlett estimator is asymptotically equivalent to $\hat\sigma^2_n(\ell)$ and their corrected estimator is equivalent to the bias-corrected version $\hat\sigma^2_n(\ell)-\widehat{\mathrm{BIAS}}$). It is also important to re-iterate that, while the NPPI block estimator is based on general forms (cf. (1), (6)) for the asymptotic bias and variance of a bootstrap estimator, the HHJ block estimator requires only the optimal block order (cf. (2), (8)) for minimizing the asymptotic MSE of a bootstrap estimator; in this sense, the HHJ method requires less large-sample information and could potentially be more general. At the same time, as the MSE-optimal block order is typically derived from asymptotic bias/variance quantities, both the NPPI and HHJ methods are generally intended to apply for block selection in the same problems, particularly under the smooth function model.

#### The jackknife-after-bootstrap variance estimator

The JAB method was initially proposed by [3] to assess accuracy of bootstrap estimators for independent data, and was extended to the dependent case by [9]. A key advantage of the JAB method is that it does not require a second level of resampling; the JAB method produces a variance estimate of a block bootstrap estimator by merely regrouping the resampled blocks used in computing the original block bootstrap estimator [9].

Suppose that the goal is to estimate the variance of an MBB estimator $\hat\varphi_n\equiv\hat\sigma^2_n(\ell)$ based on blocks of length $\ell$. (For notational simplicity here, write $N\equiv n-\ell+1$ and $k\equiv\lfloor n/\ell\rfloor$.) Let $m$ be an integer such that $m^{-1}+m/n\to0$ as $n\to\infty$. Here, $m$ denotes the number of bootstrap blocks to be deleted for the JAB. Set $M\equiv N-m+1$ and, for $i=1,\ldots,M$, let $I_i\equiv\{i,\ldots,i+m-1\}$ denote the indices of the $i$th group of deleted blocks. Also, let $\mathcal{B}_1,\ldots,\mathcal{B}_N$ be the MBB blocks of size $\ell$. The first step of the JAB is to define a jackknife version $\tilde\varphi^{(i)}_n$ of $\hat\varphi_n$ for each $i$. To this end, the $i$th block-deleted jackknife point value $\hat\varphi^{(i)}_n$ is obtained by resampling $k$ blocks randomly, with replacement, from the reduced collection $\{\mathcal{B}_j\colon j\notin I_i\}$ and then by computing the corresponding block bootstrap variance estimator using the resulting resample.

Then, the JAB estimator of the variance of is given by

$$\widehat{\mathrm{VAR}}_{\mathrm{JAB}}(\hat\varphi_n)=\frac{m}{N-m}\cdot\frac{1}{M}\sum_{i=1}^{M}\bigl(\tilde\varphi^{(i)}_n-\hat\varphi_n\bigr)^2, \tag{17}$$

where $\tilde\varphi^{(i)}_n=m^{-1}\bigl\{N\hat\varphi_n-(N-m)\hat\varphi^{(i)}_n\bigr\}$ denotes the $i$th block-deleted jackknife pseudo-value of $\hat\varphi_n$, $i=1,\ldots,M$.
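A minimal sketch of the JAB computation (17) for the MBB variance estimator follows; the pseudo-value transform used here is the standard jackknife form (an assumption on our part), and the structure and names are illustrative rather than the authors' code.

```python
import numpy as np

def jab_variance(x, ell, m, B=200, rng=None):
    """Sketch of the JAB variance estimate (17) for the MBB variance
    estimator.

    ell : MBB block length;  m : number of deleted blocks (1/m + m/n -> 0)
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = max(n // ell, 1)
    blocks = np.lib.stride_tricks.sliding_window_view(x, ell)
    N = len(blocks)                  # N = n - ell + 1 overlapping blocks
    M = N - m + 1                    # number of deletion groups I_i

    def boot_var(pool):
        # MBB variance estimator resampling k blocks from the given pool
        idx = rng.integers(0, len(pool), size=(B, k))
        means = pool[idx].reshape(B, -1).mean(axis=1)
        return k * ell * means.var()

    phi_hat = boot_var(blocks)       # full-sample MBB estimator phi_n
    terms = np.empty(M)
    for i in range(M):
        keep = np.r_[0:i, i + m:N]   # delete blocks B_i, ..., B_{i+m-1}
        phi_i = boot_var(blocks[keep])                   # point value
        phi_tilde = (N * phi_hat - (N - m) * phi_i) / m  # pseudo-value
        terms[i] = (phi_tilde - phi_hat) ** 2
    return m / (N - m) * terms.mean()                    # eq. (17)
```

No second level of resampling is needed: each point value reuses the same block-resampling machinery on a reduced pool of blocks.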

## 3 Results on uniform expansion of the MSE

### 3.1 Assumptions

To develop MSE and other probabilistic expansions, we require conditions on the dependence structure of the stationary $\mathbb{R}^d$-valued process $\{X_t\}$ and on the smooth function $H$, stated below and referred to as Conditions 3.1–3.3. Condition 3.1 prescribes differentiability assumptions on the smooth function $H$, Condition 3.2 describes mixing/moment assumptions as a function of a positive integer, and Condition 3.3 entails that certain covariance sums are non-zero. In particular, the sums in Condition 3.3 define the constant $C_0$ in the large-sample optimal block approximation $\ell^0_n$ from (8).

{condition}

The function $H\colon\mathbb{R}^d\to\mathbb{R}$ is 3-times continuously differentiable and its third-order partial derivatives satisfy $\max_{|\alpha|=3}|D^{\alpha}H(x)|\le C(1+\|x\|^{c})$, $x\in\mathbb{R}^d$, for some $C>0$ and integer $c\ge1$.

{condition}

For some $\delta>0$ and a positive integer $g$ (specified in the results below), $E\|X_1\|^{2g+\delta}<\infty$ and the strong mixing coefficient $\alpha(\cdot)$ of the process $\{X_t\}$ decays at a suitably fast polynomial rate.

{condition}

For $\sigma^2_\infty$ in (3), $\sigma^2_\infty>0$ and $\sum_{k\in\mathbb{Z}}|k|r(k)\neq0$, where $r(k)\equiv\operatorname{Cov}(a'X_1,a'X_{1+k})$, $k\in\mathbb{Z}$, and $a\equiv\nabla H(\mu)$ is the vector of first order partial derivatives of $H$ at $\mu$.

Mixing and moment assumptions as formulated in Condition 3.2 are standard in investigating block resampling methods (cf. [10], Chapter 5). Typical expansions of the MSE of the MBB variance estimator often require $H$ to be 2-times differentiable in the smooth function model, whereas Condition 3.1 requires slightly more in order to determine a finer expansion of this MSE. The assumptions on the process quantities in Condition 3.3 are mild and standard for the block bootstrap [5, 7, 8, 11, 13, 15, 19]; in particular, the assumption of a non-zero covariance sum is needed to rule out i.i.d. processes.

### 3.2 Main results

Recalling the MSE-approximation $f_n$ for the MBB variance estimator from (7) (with constants $B_0$ and $V_0$ as in Condition 3.3 above), Theorem 1 below provides a more refined expansion of this MSE over a collection $\mathcal{J}_n$ of block lengths of optimal order $n^{1/3}$, as in (5) (cf. Section 2.1).

###### Theorem 1

Suppose that Conditions 3.1–3.3 hold, with the moment and mixing orders in Condition 3.2 taken sufficiently large relative to the constants of Condition 3.1. Then, as $n\to\infty$:

(i) $\max_{\ell\in\mathcal{J}_n}\bigl|\mathrm{MSE}_n(\ell)-f_n(\ell)\bigr|=O(n^{-4/3})$, for $f_n$ defined in (7);

(ii) $\ell^{\mathrm{opt}}_n=\ell^0_n+O(1)$, for $\ell^0_n$ from (8).

Theorem 1(i) gives a close bound on how the approximation $f_n$ matches the curve $\mathrm{MSE}_n$ (not coinciding exactly, but with nearly the same minimizer), uniformly in $\ell\in\mathcal{J}_n$. For comparison, note that $f_n(\ell)$, $\ell\in\mathcal{J}_n$, has exact order $n^{-2/3}$, so Theorem 1(i) provides a general bound on the comparatively small difference between the two curves. One implication, stated in Theorem 1(ii), is that $O(1)$ becomes the general order of the discrepancy between the minimizer $\ell^{\mathrm{opt}}_n$ of $\mathrm{MSE}_n$ and the minimizer $\ell^0_n$ of $f_n$. The bounds of Theorem 1 cannot generally be improved by further expanding the MSE (i.e., under additional smoothness assumptions on $H$) and, in fact, in Theorem 1(ii), $\ell^{\mathrm{opt}}_n$ is necessarily an integer while $\ell^0_n$ need not be.

## 4 Results on the HHJ method

To state the main result, recall that $\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}$ denotes the HHJ block estimator (12), depending on a pilot block $\tilde\ell_n$ and subsample size $m$, and that $\hat\ell^{\mathrm{opt},\infty}_{n,\mathrm{HHJ}}$ from (15) denotes an oracle-like version of $\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}$ that requires $m$ but not $\tilde\ell_n$.

###### Theorem 2

Suppose that Conditions 3.1–3.3 hold, with the moment and mixing orders in Condition 3.2 taken sufficiently large. Assume that $m^{-1}+m/n\to0$ as $n\to\infty$ with $\tilde\ell_n\in\mathcal{J}_n$.

(i) Then, as $n\to\infty$, the relative error $\bigl|\hat\ell^{\mathrm{opt},\infty}_{n,\mathrm{HHJ}}-\ell^{\mathrm{opt}}_n\bigr|/\ell^{\mathrm{opt}}_n$ admits a probabilistic bound consisting of three terms in $m$ and $n$.

(ii) If additionally $\tilde\ell_n$ and $m$ satisfy mild growth conditions, then $\bigl|\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}-\ell^{\mathrm{opt}}_n\bigr|/\ell^{\mathrm{opt}}_n$ admits an analogous bound with six error terms, involving the pilot block $\tilde\ell_n$ as well.

{remark}

Theorem 2 also holds if, on the left-hand sides above, we replace $\hat\ell^{\mathrm{opt},\infty}_{n,\mathrm{HHJ}}$, $\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}$ and $\ell^{\mathrm{opt}}_n$ with their subsample counterparts $\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}$, $\hat b^{\mathrm{opt}}_{m,\mathrm{HHJ}}$ and $b^{\mathrm{opt}}_m$. This result helps to reinforce the notion that the quality of block estimation at the subsample level determines the performance of the HHJ method.

Theorem 2(i) indicates how the subsample size $m$ affects the convergence rate of the oracle-type block estimate. It follows from Theorem 2(i) that, with oracle knowledge of $\sigma^2_\infty$, the best possible (fastest) rate of convergence for $\hat\ell^{\mathrm{opt},\infty}_{n,\mathrm{HHJ}}$ is achieved with a subsample size of order $m\asymp n^{2/3}$. This choice balances the sizes of all three terms in the bound from Theorem 2(i). Remark 4 below provides some explanation of the probabilistic bounds in Theorem 2(i).

In Theorem 2(ii), we impose some additional block growth conditions on the pilot block and subsample size in the HHJ method, which are mild and help to concisely express the order of the main components contributing to the error rate. While the combined effects of the tuning parameters are complicated and difficult to characterize in Theorem 2(ii), a block $\tilde\ell_n$ of MSE-optimal order $n^{1/3}$ for the pilot MBB variance estimator in the HHJ method is an intuitive starting point. With this choice, it follows that $m\asymp n^{2/3}$ is then optimal for minimizing the convergence rate of the HHJ block estimator, which becomes $O_p(n^{-1/6})$. In fact, the selection $\tilde\ell_n\asymp n^{1/3}$, $m\asymp n^{2/3}$ is overall optimal and simultaneously balances the order of all six error terms in Theorem 2(ii). Perhaps surprisingly, the HHJ block estimator then achieves the best convergence rate that one could hope for, matching the optimal rate of the oracle block estimator $\hat\ell^{\mathrm{opt},\infty}_{n,\mathrm{HHJ}}$. We summarize our findings on tuning parameters in Corollary 1.

###### Corollary 1

Under the assumptions of Theorem 2, a subsample size $m\asymp n^{2/3}$ and pilot block $\tilde\ell_n\asymp n^{1/3}$ yield optimal convergence rates

$$\biggl|\frac{\hat\ell^{\mathrm{opt},\infty}_{n,\mathrm{HHJ}}-\ell^{\mathrm{opt}}_n}{\ell^{\mathrm{opt}}_n}\biggr|=O_p\bigl(n^{-1/6}\bigr),\qquad \biggl|\frac{\hat\ell^{\mathrm{opt}}_{n,\mathrm{HHJ}}-\ell^{\mathrm{opt}}_n}{\ell^{\mathrm{opt}}_n}\biggr|=O_p\bigl(n^{-1/6}\bigr)$$

as $n\to\infty$, for the HHJ block estimator and its oracle version.

An interpretation of Corollary 1 is that, at optimal tuning parameters, random fluctuations in the HHJ block estimator are of the relative order $n^{-1/6}$. This behavior interestingly resembles that of some other kernel bandwidth estimators based on empirical MSE criteria (cf. [22]), though the subsampling criterion here does not take its arguments from a continuum of real values.

{remark}

We provide a brief explanation of the probabilistic bounds in Theorem 2, focusing mainly on the behavior of the oracle block estimator $\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}$ from Section 2.2.1 at the subsample level; more rigorous details are given in Section 7 and the supplementary material [14]. Recall that the block estimator $\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}$ minimizes the criterion $\widehat{\mathrm{MSE}}^\infty_m(b)$ from (13), while $b^{\mathrm{opt}}_m$ from (11) minimizes $\mathrm{MSE}_m(b)$ (whose approximation $f_m$ is the subsample version of (7)). In part, the bound in Theorem 2(i) is due to smoothness issues with $\mathrm{MSE}_m$ and its discrepancy from $f_m$ (cf. Theorem 1). The other bounds in Theorem 2 arise from the size of

$$\Delta^\infty_m(b)\equiv\bigl[\widehat{\mathrm{MSE}}^\infty_m(b)-\widehat{\mathrm{MSE}}^\infty_m(b^{\mathrm{opt}}_m)\bigr]-\bigl[\mathrm{MSE}_m(b)-\mathrm{MSE}_m(b^{\mathrm{opt}}_m)\bigr] \tag{18}$$

for $b\in\mathcal{J}_m$; this quantity measures the discrepancy between two differenced curves (which should ideally match at $b=b^{\mathrm{opt}}_m$), where differences in $\widehat{\mathrm{MSE}}^\infty_m$ serve to identify $\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}$ and similar differences in $\mathrm{MSE}_m$ identify $b^{\mathrm{opt}}_m$. It can be shown that, for any positive sequence $a_n\to0$ (sufficiently slowly),

$$\max_{b\in\mathcal{J}_m\colon|b-b^{\mathrm{opt}}_m|\le a_nm^{1/3}}a_n^{-1/2}m^{2/3}(n/m)^{1/2}\bigl|\Delta^\infty_m(b)\bigr|$$

remains stochastically bounded on shrinking neighborhoods of block lengths around $b^{\mathrm{opt}}_m$, while at the same time $|\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}-b^{\mathrm{opt}}_m|/b^{\mathrm{opt}}_m\to0$ in probability (i.e., $\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}$ is consistent for $b^{\mathrm{opt}}_m$); see the auxiliary result, Theorem 6, of Section 7. This allows other order bounds on $\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}-b^{\mathrm{opt}}_m$ to be determined by recursively "caging" $\hat b^{\mathrm{opt},\infty}_{m,\mathrm{HHJ}}$ in decreasing neighborhoods around $b^{\mathrm{opt}}_m$ with high probability. The probabilistic bounds in Theorem 2(ii) are partly due to error contributions from the MBB variance estimator $\hat\sigma^2_n(\tilde\ell_n)$ used through (9) to estimate $\sigma^2_\infty$ at the subsample level.

## 5 Results on the NPPI method

Next, we consider the convergence rates of the optimal block length selector based on the NPPI method. Recall that $\hat V_0=(n/\ell_1)\widehat{\mathrm{VAR}}$ and $\hat B_0=-\ell_2\widehat{\mathrm{BIAS}}$, where $\ell_1$ and $\ell_2$ are the tuning block lengths from Section 2.3 and $m$ denotes the number of blocks deleted in the JAB step.

###### Theorem 3

Suppose that Conditions 3.1–3.3 hold, with the moment and mixing orders in Condition 3.2 taken sufficiently large. Assume that $\ell_1^{-1}+\ell_2^{-1}+m^{-1}+m/n\to0$ as $n\to\infty$. Then, as $n\to\infty$,

$$\bigl|\hat\ell^{\mathrm{opt}}_{n,\mathrm{NPPI}}-\ell^{\mathrm{opt}}_n\bigr|/\ell^{\mathrm{opt}}_n=O_p\bigl([m/n]^{1/2}+[\ell_1/m]+\ell_1^{-2}\bigr)+O_p\Biggl(\ell_2\sum_{k=\ell_2}^{2\ell_2-1}\bigl|r(k)\bigr|+n^{-1/2}\ell_2^{3/2}\Biggr). \tag{19}$$

As the NPPI method targets the block approximation