Convergence rates of empirical block length selectors for block bootstrap
We investigate the accuracy of two general non-parametric methods for estimating optimal block lengths for block bootstraps with time series – the first proposed in the seminal paper of Hall, Horowitz and Jing (Biometrika 82 (1995) 561–574) and the second from Lahiri et al. (Stat. Methodol. 4 (2007) 292–321). The relative performances of these general methods have been unknown and, to provide a comparison, we focus on rates of convergence for these block length selectors for the moving block bootstrap (MBB) with variance estimation problems under the smooth function model. It is shown that, with suitable choice of tuning parameters, the optimal convergence rate of the first method is $O_p(n^{-1/6})$ where $n$ denotes the sample size. The optimal convergence rate of the second method, with the same number of tuning parameters, is shown to be $O_p(n^{-2/7})$, suggesting that the second method may generally have better large-sample properties for block selection in block bootstrap applications beyond variance estimation. We also compare the two general methods with other plug-in methods specifically designed for block selection in variance estimation, where the best possible convergence rate is shown to be $O_p(n^{-1/3})$ and is achieved by a method from Politis and White (Econometric Rev. 23 (2004) 53–70).
Bernoulli 20(2), 2014, 958–978. doi:10.3150/13-BEJ511
Empirical block length selectors
Daniel J. Nordman and Soumendra N. Lahiri
Keywords: jackknife-after-bootstrap; moving block bootstrap; optimal block size; plug-in methods; subsampling
The performance of block bootstrap methods depends critically on the choice of block length. A common approach to the problem is to choose the block length that minimizes the mean squared error (MSE) of the block bootstrap estimator, viewed as a function of the block length. For many important functionals, expansions for the MSE-optimal block lengths are known. If denotes an estimator of a parameter of interest based on a stationary stretch , examples of relevant functionals of the distribution of include the bias , variance , and the distribution function (i.e., given and where represents either the variance of or an estimator of this, cf. L07 ()). If denotes a block bootstrap estimator of based on block length , then as the bias and variance of often admit expansions of the form
for some known constants depending on (e.g., for functionals , while for the distribution function when ) and lead to a large sample approximation of MSE-optimal block size given by
involving population quantities that depend on the functional , the bootstrap method, and various parameters of the underlying process. For smooth function model statistics (described below), these expansions (1) have been established for the moving block and non-overlapping block methods H95 (), K89 (), L99 (), L07 () and, in particular, are also known for the variance functional with other block bootstraps, such as the circular block bootstrap PR92 () and stationary bootstrap N09 (), PR94 (); see L03 () and references therein. However, as the theoretical approximations (2) for the optimal block lengths typically depend on different unknown population parameters of the underlying process in an intricate manner, these are not directly usable in practice.
Different data-based methods for the selection of optimal block lengths have been proposed in the literature. One of the most popular general methods was proposed by Hall, Horowitz and Jing H95 (); it employs a subsampling method (cf. PR94 ()) to construct an empirical version of the MSE function and minimizes this to produce an estimator of the optimal block length. We will refer to this approach as the HHJ method. A second general method for selecting the optimal block length was put forward by Lahiri et al. L07 (). This method is based on the jackknife-after-bootstrap method of Efron E92 () and its extension to the block bootstrap by Lahiri L02 (). For reasons explained in L07 () (see also Section 2 below), we will refer to this method as the non-parametric plug-in method (or the NPPI method, in short). Both the HHJ and NPPI methods are called “general” because they can be used in the same manner across different functionals (e.g., bias, variance, distribution function, quantiles, etc.) to find the optimal block size for bootstrap estimation, without requiring exact analytical expressions for the corresponding optimal block length approximation (2) (i.e., without requiring explicit forms for quantities ). In particular, for a given functional, the HHJ method aims to directly estimate the constant in the optimal block approximation (2), while the NPPI method separately and non-parametrically estimates the bias and variance quantities in (2) without structural knowledge of these. Our main objective here is to investigate the convergence rates of these two general methods. Despite the popularity of the HHJ method, for instance, little is known theoretically about its properties for block selection or about how it compares to the NPPI method. As a context for comparing the methods, we focus on their performance for block selection in variance estimation problems with the block bootstrap. 
The literature contains a few other block length selection methods. These are primarily plug-in estimators, which necessarily require an explicit expression for the optimal block approximation (2) for each specific functional and each block bootstrap method (i.e., exact forms for ); such methods are not the focus of this paper. Nevertheless, two popular plug-in methods of this type for the variance functional are given by Bühlmann and Künsch BK99 () and Politis and White PW04 () (with a corrected version in Patton, Politis and White P10 ()). For completeness, we later compare the performance of the two general methods with these plug-in methods for block selection in variance estimation.
For concreteness, we shall restrict attention to the moving block bootstrap (MBB) method K89 (), L92 (), which was the original focus of the HHJ method H95 () and the plug-in method of Bühlmann and Künsch BK99 () and shares close large-sample connections to other block bootstrap methods (e.g., circular block bootstrap, non-overlapping block bootstrap, untapered version of the tapered block bootstrap) L99 (), N09 (), PP01 (), PW04 (). Further, we shall work under the smooth function model of Hall H92 () (see Section 2.1 below) which provides a convenient theoretical framework but, at the same time, is general enough to cover many commonly used estimators in the time series context (L03 (); Chapter 4). Accordingly, let be an estimator of a parameter of interest under the smooth function model and suppose that the MBB is used for estimating or its limiting form
denote the MSE of the MBB variance estimator based on blocks of length and a sample of size . (Defining the MSE with or makes no difference in the following and, for clarity, it is helpful to fix a target in defining (4) throughout.)
The theoretical MSE-optimal block size is given by
where is a suitable set of block lengths including the optimal block length. As alluded to above (1), under some standard regularity conditions, it can be shown that
where and are population parameters arising, respectively, from the bias and variance of the MBB variance estimator . Let denote the minimizer of the asymptotic approximation to the MSE function, where (cf. (2)). As a first step towards investigating the accuracy of different empirical block rule selection methods, we consider the relative error of this theoretical approximation and show that
as $n \to \infty$. Thus, the true optimal block size and the optimal block size determined by the asymptotic approximation to the MSE curve of the block bootstrap estimator differ by a margin of $O(n^{-1/3})$ on the relative scale. In general, this rate cannot be improved further. As a result, for empirical block length selection rules involving estimation steps that target (which all existing methods do), the best attainable accuracy for estimating the true optimal block length is of order $n^{-1/3}$.
Next, we consider the convergence rates of the two general methods. Let and , respectively, denote the estimators of the optimal block length based on the HHJ and NPPI methods. We show that under some mild conditions and with a suitable choice of the tuning parameters,
as $n \to \infty$. Thus, the (relative) rate of convergence of the HHJ estimator of the optimal block length is $n^{-1/6}$. The block length in block bootstrap methodology plays a role similar to that of a smoothing parameter in non-parametric function estimation. It is well known (cf. H87 ()) that non-parametric data-based rules for bandwidth selection often have an “excruciatingly slow” (relative) rate of convergence (e.g., of the order $n^{-1/10}$). The convergence rate of the HHJ method turns out to be better. It is worth noting that the HHJ block estimator, based on the overlapping version of the subsampling method, has the same rate of convergence irrespective of the dependence structure of the underlying time series . Additionally, in determining this convergence rate, we also provide theoretical guidance on optimally choosing the two tuning parameters required to implement the HHJ method, which has been an unresolved aspect of the method.
Next, we consider the NPPI method and compare its relative performance with the HHJ method. The rate of convergence of the NPPI method is determined by two factors, which arise from estimating the variance and the bias of a block bootstrap estimator (i.e., quantities and appearing in , ). The factor due to the variance part is based on the (block) jackknife-after-bootstrap method E92 (), L02 (), and it attains an optimal rate of $n^{-2/7}$ with a suitable choice of the tuning parameters. On the other hand, the second factor is determined by a non-standard bias estimator that turns out to be adaptive to the strength of dependence of . Let denote the autocovariance function of (a suitable linear function of) the ’s. When as for a suitably large , the rate of convergence of the second term can be as small as , for a given , with a suitable choice of the tuning parameters. Thus, combining the two, the optimal rate of convergence of the NPPI method becomes $n^{-2/7}$, which is better than the optimal rate $n^{-1/6}$ for the HHJ method. For this to hold, the user needs to specify two tuning parameters, the same number as with the HHJ method. The rate $n^{-2/7}$ is also notable in the variance estimation problem because it matches the best rate obtained by the plug-in block selection method of Bühlmann and Künsch BK99 (). Their method is a four-step algorithm which uses lag weight estimators of the spectral density at zero and, again, requires explicit forms for quantities appearing in the bias and variance (e.g., ) of the MBB variance estimator. Hence, while the NPPI method for block selection applies more generally to other functionals, its convergence rate matches that of a plug-in method specifically tailored to the variance estimation problem. This provides some evidence supporting the use of the NPPI method for block selection with functionals other than the variance.
The rest of the paper is organized as follows. In Section 2, we briefly describe the smooth function model, the MBB and the empirical block length selectors proposed by HHJ H95 () and Lahiri et al. L07 (). In Section 3, we present the conditions and derive a general result on the uniform approximation of the MSE of a block bootstrap estimator, which may be of independent interest. We describe the main results on the HHJ and NPPI methods in Sections 4 and 5, respectively. In Section 6, we compare the general HHJ/NPPI methods with other plug-in block selection approaches for the MBB in the variance estimation problem. In particular, a plug-in method of Politis and White PW04 () (see also P10 ()) is shown to achieve the best possible convergence rate for block selection with variance functionals. Section 7 sketches proofs of the main results, with full proofs deferred to the supplementary material NL12 ().
2.1 MBB variance estimator and optimal block length
Let be a stationary stretch of -valued random vectors with mean . We shall consider the problem of estimating the variance of a statistic framed in the “smooth function” model H92 (). Using some function and the sample mean , suppose that a statistic can be expressed as for purposes of estimating a process parameter . The “smooth function” model covers a wide range of parameters and their estimators, including sample mean, sample autocovariances, Yule–Walker estimators, among others; see Chapter 4, L03 () for more examples. Recall the target variance of interest is or its limit (3).
We next describe the MBB variance estimator. Let (set of positive integers) denote the block length and create overlapping length blocks from as , where for any integer . We independently resample blocks by letting denote i.i.d. random variables with a uniform distribution over block indices and then define an MBB sample of size as , where denotes the integer part of a real number . The MBB analog of is given by using the MBB sample mean, and the MBB variance estimator is then defined as
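When the smooth function is the identity (so the statistic is the sample mean), the MBB variance estimator can be computed without Monte Carlo resampling. The Python sketch below assumes that case and a scaling by $n_1 = b\ell$ with $b = \lfloor n/\ell \rfloor$ resampled blocks (the normalization in the display above may differ); the function names `mbb_variance_mean` and `mbb_variance_mc` are hypothetical. The closed form relies on the MBB sample mean being an average of $b$ i.i.d. draws from the $N = n - \ell + 1$ overlapping block means. The Monte Carlo version handles a general statistic by drawing block start indices uniformly, as in the resampling step above.

```python
import random
import statistics


def mbb_variance_mean(x, l):
    """Closed-form MBB variance estimate for the sample mean.

    Assumes scaling by n1 = b * l with b = n // l blocks. The MBB sample
    mean averages b i.i.d. draws from the N = n - l + 1 overlapping block
    means, so n1 * Var*(mean*) equals l times the (divisor-N) variance of
    those block means.
    """
    n = len(x)
    if not 1 <= l <= n:
        raise ValueError("block length must satisfy 1 <= l <= n")
    N = n - l + 1
    window = sum(x[:l])
    means = [window / l]
    for i in range(1, N):
        # Slide the length-l window one position to the right.
        window += x[i + l - 1] - x[i - 1]
        means.append(window / l)
    centre = sum(means) / N
    return l * sum((m - centre) ** 2 for m in means) / N


def mbb_variance_mc(x, l, stat, reps=2000, seed=None):
    """Monte Carlo MBB variance estimate (scaled by b * l) for a statistic."""
    n = len(x)
    b, N = n // l, n - l + 1
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # Block start indices drawn i.i.d. uniformly over the N blocks.
        starts = [rng.randrange(N) for _ in range(b)]
        sample = [v for s in starts for v in x[s:s + l]]
        values.append(stat(sample))
    return b * l * statistics.pvariance(values)
```

For the sample mean, the two functions should agree up to Monte Carlo error, which gives a quick sanity check before using the Monte Carlo routine with a non-linear statistic.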
where denotes the variance with respect to the bootstrap distribution conditional on the data .
For variance estimation, we briefly consolidate notation from Section 1 on optimal block lengths. The performance of the MBB again depends on the block choice . Under certain dependence conditions and block assumptions (), the asymptotic bias and variance of the MBB estimator are
as , for some population parameters depending on the covariance structure of the underlying process (cf. H95 (), K89 () and Condition 3.1 of Section 3.1). Thus, the main component in MSE (4) of the MBB follows as
as . The minimizer of is given by
where . From (7) and (8), the optimal block minimizing behaves as in large samples H95 (), K89 (), L03 (). As a result, to examine properties of the block length selection methods, we shall create a collection of block lengths , for a suitably large constant such that , and formally define the optimal block size as in (5).
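To illustrate how a minimizer of the form (8) arises, suppose (as an assumption; the constants in (6) and (7) may be normalized differently) that the bias and variance of the MBB variance estimator behave like $B/\ell$ and $v\ell/n$, respectively, where $B$ and $v$ are hypothetical stand-ins for the population parameters in (6). Minimizing the resulting squared-bias-plus-variance curve over real $\ell > 0$ gives:

```latex
\mathrm{MSE}_0(\ell) = \frac{B^2}{\ell^2} + \frac{v\,\ell}{n},
\qquad
\frac{d}{d\ell}\,\mathrm{MSE}_0(\ell) = -\frac{2B^2}{\ell^3} + \frac{v}{n} = 0
\quad\Longrightarrow\quad
\ell^0 = \Bigl(\frac{2B^2}{v}\Bigr)^{1/3} n^{1/3}.
```

Since $\mathrm{MSE}_0$ is strictly convex in $\ell > 0$, this stationary point is its unique minimizer, and both terms of $\mathrm{MSE}_0(\ell^0)$ are then of order $n^{-2/3}$.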
2.2 The Hall–Horowitz–Jing (HHJ) block estimation method
The HHJ H95 () method seeks to estimate the optimal block size by minimizing an empirical version of the MSE (4) created by subsampling (data blocking). Let denote a sequence satisfying as , which serves to define the length of subsamples , . For each subsample, let denote the MBB variance estimator resulting from resampling length blocks from observations . For clarity, note that MBB block lengths on size subsamples are denoted by “,” while “” denotes MBB block lengths applied to the original data . To approximate the error in MBB variance estimation incurred by using length blocks in samples of size , we form a subsampling estimator
where the initializing MBB estimator of is based on the entire sample and on a plausible pilot block size . By minimizing over , we formulate
as an estimator of the theoretically optimal MBB block length for a size sample, with
Next, is a rescaling step that involves approximating true optimal block length with the minimizer of MSE-approximation (7). That is, as is the “size sample version” of in (5), one uses the large-sample block approximation and from (8) to re-scale and subsequently define the HHJ estimator of as
Hence, the HHJ method requires specifying both a subsample size and a pilot MBB block size , which impact the performance of the block estimator .
2.2.1 An oracle-like subsampling MSE
For purposes of comparison with the HHJ method, we also define a second subsampling MSE given as
which resembles the empirical MSE (9) after replacing the variance estimator with its target from (3). This subsampling MSE removes one tuning parameter of the original HHJ method by unrealistically assuming is known. However, we may compare the performance of the HHJ block estimators and with that of their oracle-like counterparts
based on (13) and the resulting estimator of the optimal block length given by
Both and estimate the same optimal block size , but the estimator is based on an unbiased subsampling criterion through knowledge of , that is, for all .
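The subsampling steps (9)–(12) and the oracle-like variant (13)–(15) can be sketched in Python for the sample-mean case. This is a minimal sketch under several assumptions: the statistic is the sample mean (so the MBB variance has a closed form), the subsample-level block grid is $\{1, \dots, k_{\max}\}$ with a user-supplied $k_{\max}$, and the rescaling step uses the $n^{1/3}$ order of the optimal block for variance estimation. The names `hhj_block_length`, `subsample_mse`, `pilot`, `kmax` and `target` are hypothetical. Passing `target` replaces the pilot estimate with a known variance, mimicking the oracle-like criterion (13).

```python
def mbb_variance_mean(x, l):
    """Closed-form MBB variance estimate (scaled by b*l) for the sample mean."""
    N = len(x) - l + 1
    window = sum(x[:l])
    means = [window / l]
    for i in range(1, N):
        # Slide the length-l window one position to the right.
        window += x[i + l - 1] - x[i - 1]
        means.append(window / l)
    centre = sum(means) / N
    return l * sum((m - centre) ** 2 for m in means) / N


def subsample_mse(x, m, k, target):
    """Average squared error, over all n - m + 1 overlapping subsamples of
    length m, of the MBB variance estimate with block length k."""
    n = len(x)
    errs = [(mbb_variance_mean(x[i:i + m], k) - target) ** 2
            for i in range(n - m + 1)]
    return sum(errs) / len(errs)


def hhj_block_length(x, m, pilot=None, kmax=None, target=None):
    """HHJ-style choice: minimise the subsampling MSE over k, then rescale
    the subsample-level minimiser from size m to size n."""
    n = len(x)
    if not 2 <= m <= n:
        raise ValueError("need 2 <= m <= n")
    if target is None:
        if pilot is None:
            raise ValueError("give a pilot block length or a known target")
        # Pilot MBB estimate of the target variance from the full sample.
        target = mbb_variance_mean(x, pilot)
    if kmax is None:
        kmax = max(1, m // 2)
    ks = range(1, min(kmax, m) + 1)
    k_m = min(ks, key=lambda k: subsample_mse(x, m, k, target))
    # Rescaling step, assuming the optimal block grows like n^(1/3).
    return k_m, k_m * (n / m) ** (1 / 3)
```

In this sketch the rescaled value is returned unrounded; rounding it to a positive integer is one natural choice but is not prescribed here.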
2.3 The non-parametric plug-in (NPPI) method
The NPPI method is based on the non-parametric plug-in principle L07 (), which yields estimators of MSE-optimal smoothing parameters in general non-parametric function estimation problems. Here we describe the method for estimating the optimal block length for the variance functional using the MBB. Like any plug-in method, the target quantity for the NPPI method is the minimizer of the MSE-approximation of (7), which again is of the form from (8) with population parameters and in determined by the bias and variance expansion (6) of the MBB variance estimator. The NPPI method estimates the bias and the variance of the MBB estimator non-parametrically, and then estimates and by inverting (6). Specifically, the method constructs estimators and satisfying
for some block lengths and and defines and . Then, the estimator of the optimal block length is given by
The bias estimator for the NPPI method is
and the variance estimator is constructed using the jackknife-after-bootstrap (JAB) method E92 (), L02 (), due to its computational advantages. For completeness, we next briefly describe the details of the JAB variance estimator.
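The inversion step can be illustrated with a short Python sketch. It assumes (hypothetically, since the normalization of (6) is not reproduced here) that the bias of the MBB estimator behaves like $B/\ell$ and its variance like $v\ell/n$. Under that assumption, comparing estimates at block lengths $\ell_1$ and $2\ell_1$ isolates $B$, because $B/\ell_1 - B/(2\ell_1) = B/(2\ell_1)$; this is one natural difference-based bias estimator, not necessarily the exact display above. A variance estimate at block length $\ell_2$ (for example, from the JAB) is rescaled to estimate $v$. All function names are hypothetical.

```python
def nppi_bias_constant(phi_l1, phi_2l1, l1):
    """Estimate B from estimates at block lengths l1 and 2*l1, assuming
    bias(l) ~ B / l, so that phi(l1) - phi(2*l1) ~ B / (2*l1)."""
    return 2 * l1 * (phi_l1 - phi_2l1)


def nppi_variance_constant(var_hat, l2, n):
    """Estimate v from a variance estimate at block length l2, assuming
    Var(phi(l)) ~ v * l / n."""
    return n * var_hat / l2


def nppi_block_length(B_hat, v_hat, n):
    """Plug the estimated constants into l0 = (2 B^2 / v)^(1/3) * n^(1/3)."""
    if v_hat <= 0:
        raise ValueError("variance constant must be positive")
    return (2 * B_hat ** 2 / v_hat) ** (1 / 3) * n ** (1 / 3)
```

If the estimates follow $4 + 3/\ell$ exactly, for instance, `nppi_bias_constant` recovers the bias constant 3 up to rounding, which makes the role of the two block lengths easy to check.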
Politis and Romano PR95 () considered an estimator related to above for bias-correcting the Bartlett spectral estimator (e.g., at the zero frequency, this Bartlett estimator is asymptotically equivalent to and their corrected estimator is equivalent to ). It is also important to reiterate that, while the NPPI block estimator is based on general forms (cf. (1), (6)) for the asymptotic bias and variance of a bootstrap estimator, the HHJ block estimator requires only the optimal block order (cf. (2), (8)) for minimizing the asymptotic MSE of a bootstrap estimator; in this sense, the HHJ method requires less large-sample information and could potentially be more general. At the same time, as the MSE-optimal block order is typically derived from asymptotic bias/variance quantities, both the NPPI and HHJ methods are generally intended for block selection in the same problems, particularly under the smooth function model.
2.3.1 The jackknife-after-bootstrap variance estimator
The JAB method was initially proposed by E92 () to assess the accuracy of bootstrap estimators for independent data, and was extended to the dependent case by L02 (). A key advantage of the JAB method is that it does not require a second level of resampling; the JAB method produces a variance estimate of a block bootstrap estimator by merely regrouping the resampled blocks used in computing the original block bootstrap estimator L02 ().
Suppose that the goal is to estimate the variance of an MBB estimator based on blocks of length . (For notational simplicity here, consider and .) Let be an integer such that as . Here, denotes the number of bootstrap blocks to be deleted for the JAB. Set , and for , let . Also, let , be the MBB blocks of size . The first step of the JAB is to define a jackknife version of for each . Then, the th block-deleted jackknife point value is obtained by resampling blocks randomly, with replacement from the reduced collection and then by computing the corresponding block bootstrap variance estimator using the resulting resample.
Then, the JAB estimator of the variance of is given by
where is the th block-deleted jackknife pseudo-value of , .
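For the sample-mean case, the JAB computation can be sketched directly in Python. Because the MBB variance over any collection of blocks has a closed form for the sample mean, this minimal sketch evaluates each block-deleted value exactly over the reduced collection, rather than by regrouping Monte Carlo resamples as in L02 (). It assumes $K = N - M + 1$ deletion sets of $M$ consecutive block indices, pseudo-values $[N\hat\varphi - (N - M)\hat\varphi^{(i)}]/M$ (with $\hat\varphi$ the full-collection value and $\hat\varphi^{(i)}$ the $i$th block-deleted value), and the factor $M/(N - M)$ in the variance formula. These follow the usual form of the block JAB but should be checked against the display above; the names `jab_variance` and `M` are hypothetical.

```python
def _block_means(x, l):
    """Means of the N = n - l + 1 overlapping blocks of length l."""
    return [sum(x[i:i + l]) / l for i in range(len(x) - l + 1)]


def _mbb_var(means, l):
    """MBB variance of the sample mean (scaled by b*l) when blocks are drawn
    uniformly from the given collection of block means."""
    c = sum(means) / len(means)
    return l * sum((m - c) ** 2 for m in means) / len(means)


def jab_variance(x, l, M):
    """Jackknife-after-bootstrap estimate of the variance of the MBB
    variance estimator, deleting M consecutive blocks at a time."""
    means = _block_means(x, l)
    N = len(means)
    if not 1 <= M < N:
        raise ValueError("need 1 <= M < N = n - l + 1")
    phi = _mbb_var(means, l)
    K = N - M + 1
    total = 0.0
    for i in range(K):
        # Block-deleted value: blocks i, ..., i + M - 1 are removed.
        phi_i = _mbb_var(means[:i] + means[i + M:], l)
        pseudo = (N * phi - (N - M) * phi_i) / M
        total += (pseudo - phi) ** 2
    return M / (N - M) * total / K
```

As a hand check, with data 1, 2, 3, 4, block length 1 and one deleted block, the four pseudo-values are 3, 1/3, 1/3 and 3, and the sketch returns 281/432.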
3 Results on uniform expansion of the MSE
To develop MSE and other probabilistic expansions, we require conditions on the dependence structure of the stationary -valued process and on the smooth function , described below. The first condition prescribes differentiability assumptions on the smooth function , the second describes mixing/moment assumptions as a function of a positive integer , and the third requires certain covariance sums to be non-zero. In particular, the sums in the third condition define the constant in the large-sample optimal block approximation from (8). For , write in the following.
The function is 3-times continuously differentiable and , for some and integer .
For some , and , where denotes the strong mixing coefficient of the process .
and for in (3), where , and is the vector of first order partial derivatives of at .
Mixing and moment assumptions as formulated in the second condition are standard in investigating block resampling methods (cf. L03 (), Chapter 5). Typical expansions of the MSE of the MBB variance estimator require to be twice differentiable in the smooth function model, whereas the first condition requires slightly more in order to determine a finer expansion of this MSE. The assumptions on the process quantities in the third condition are mild and standard for the block bootstrap H95 (), K89 (), L99 (), L07 (), N09 (), PP01 (), PR94 (); in particular, the assumption on is needed to rule out i.i.d. processes.
3.2 Main results
Recalling the MSE-approximation for the MBB variance estimator from (7) (with constants as in Condition 3.1 above), Theorem 1 below provides a more refined expansion of this MSE over a collection of block lengths, as in (5) (cf. Section 2.1), of optimal order.
for defined in (7),
, for from (8).
Theorem 1(i) gives a close bound on how well the MSE-approximation matches the curve (not exactly the MSE curve itself, but a curve with the same minimizer), uniformly in . For comparison, note , , has exact order . In resolving , we thus obtain a general bound on the differences between the two curves. One implication, stated in Theorem 1(ii), is that becomes the general order of the discrepancy between the minimizer of and the minimizer of . The bounds in Theorem 1 cannot generally be improved by further expanding (i.e., under additional smoothness assumptions on ); in fact, in Theorem 1(ii), is necessarily an integer while need not be.
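The integer constraint can be seen numerically. The Python sketch below is an illustration only: it uses the hypothetical normalization $B^2/\ell^2 + v\ell/n$ for the MSE approximation, with made-up constants $B = v = 1$, finds its real minimizer $\ell^0 = (2B^2/v)^{1/3} n^{1/3}$ and its best integer block length, and reports the relative gap. Because this approximation is strictly convex in $\ell$, the best integer is $\lfloor \ell^0 \rfloor$ or $\lfloor \ell^0 \rfloor + 1$, so the relative gap is at most $1/\ell^0$, which is of order $n^{-1/3}$. The function names are hypothetical.

```python
import math


def mse0(l, n, B=1.0, v=1.0):
    """Hypothetical MSE approximation: squared bias plus variance."""
    return B * B / (l * l) + v * l / n


def real_minimiser(n, B=1.0, v=1.0):
    """Minimiser of mse0 over real l > 0."""
    return (2 * B * B / v) ** (1 / 3) * n ** (1 / 3)


def integer_minimiser(n, B=1.0, v=1.0):
    """Best positive integer block length for mse0 (convex in l)."""
    l0 = real_minimiser(n, B, v)
    lo = max(1, math.floor(l0))
    return min((lo, lo + 1), key=lambda l: mse0(l, n, B, v))


for n in (10**3, 10**4, 10**5, 10**6):
    l0 = real_minimiser(n)
    gap = abs(integer_minimiser(n) - l0) / l0
    # gap * n^(1/3) stays bounded, consistent with an n^(-1/3) relative error.
    print(n, integer_minimiser(n), round(l0, 3), round(gap * n ** (1 / 3), 3))
```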
4 Results on the HHJ method
Then, as ,
If additionally and , then
Theorem 2 also holds if, on the left-hand sides above, we replace and with their subsample counterparts and . This result helps to reinforce the notion that the quality of block estimation at the subsample level determines the performance of HHJ method.
Theorem 2(i) indicates how the subsample size affects the convergence rate of the oracle-type block estimator. It follows from Theorem 2(i) that, with oracle knowledge of , the best possible (fastest) rate of convergence for is achieved when the subsample size . This choice balances the sizes of all three terms in the bound from Theorem 2(i). Remark 4 below provides some explanation of the probabilistic bounds in Theorem 2(i).
In Theorem 2(ii), we impose some additional growth conditions on the pilot block and subsample size in the HHJ method; these are mild and help to express concisely the order of the main components contributing to the error rate. While the combined effects of the tuning parameters are complicated and difficult to characterize in Theorem 2(ii), a block of MSE-optimal order for the pilot MBB variance estimator in the HHJ method is an intuitive starting point. With this choice, it follows that is then optimal for the convergence rate of the HHJ block estimator, which becomes $n^{-1/6}$. In fact, the selection is optimal overall and simultaneously balances the orders of all six error terms in Theorem 2(ii). Surprisingly, then, the HHJ block estimator achieves the best convergence rate that one could hope for, matching the optimal rate of the oracle block estimator . We summarize our findings on tuning parameters in Corollary 1.
Under the assumptions of Theorem 2, a subsample size and pilot block yield optimal convergence rates
as , for the HHJ block estimator and its oracle version.
An interpretation of Corollary 1 is that, at optimal tuning parameters, random fluctuations in the HHJ block estimator are of the order . Interestingly, this behavior resembles that of some kernel bandwidth estimators based on empirical MSE criteria (cf. R84 ()), although does not take its arguments from a continuum of real values.
We provide a brief explanation of the probabilistic bounds in Theorem 2, and focus mainly on the behavior of oracle block estimator from Section 2.2.1 at the subsample level; more rigorous details are given in Section 7 and the supplementary material NL12 (). Recall the block estimator minimizes the from (13), while from (11) minimizes (the subsample version of (7)). In part, the bound in Theorem 2(i) is due to smoothness issues with and its discrepancy from (cf. Theorem 1). The other bounds in Theorem 2 arise from the size of
, where ; this quantity measures the discrepancy between two differenced curves (which should ideally match at ), where differences in serve to identify and similar differences in identify . It can be shown that, for any ,
remains stochastically bounded on shrinking neighborhoods of block lengths around , while at the same time (i.e., is consistent for ); see the auxiliary result, Theorem 6, of Section 7. This allows other order bounds on to be determined by recursively “caging” in decreasing neighborhoods around with high probability. The probabilistic bounds in Theorem 2(ii) are partly due to error contributions from the MBB variance estimator used through in (9) to estimate at the subsample level.
5 Results on the NPPI method
Next, we consider the convergence rates of the optimal block length selector based on the NPPI method. Recall that , , where , .
As the NPPI method targets the block approximation (8), the first of the two terms on the right side of (3) is from the estimation of and the second is from the estimation of . For the first term, with any given choice of , the optimal choice of satisfies , that is, . For this choice of , the optimal choice of is determined by the relation , that is, . Thus, the optimal rate of the first term is with and .
To determine the optimal order of the second term, first note that the pilot block size is only required to satisfy the constraints stated in Theorem 3. In particular, is not required to go to with the sample size. From (3), it is also evident that the optimal choice of (to minimize the order of the second term alone) depends on the rate of decay of the autocovariance function . Since for some (implied by Condition 3.1 with ), the second term can always be made to match the optimal order of the first term, that is, , by choosing