A note on conditional versus joint unconditional weak convergence in bootstrap consistency results
The consistency of a bootstrap or resampling scheme is classically validated by weak convergence of conditional laws. However, when working with stochastic processes in the space of bounded functions and their weak convergence in the Hoffmann-Jørgensen sense, an obstacle occurs: due to possible non-measurability, neither laws nor conditional laws are well-defined. Starting from an equivalent formulation of weak convergence based on the bounded Lipschitz metric, a classical workaround is to formulate bootstrap consistency in terms of the latter distance between what might be called a conditional law of the (non-measurable) bootstrap process and the law of the limiting process. The main contribution of this note is to provide an equivalent formulation of bootstrap consistency in the space of bounded functions which is more intuitive and easier to work with. Essentially, the equivalent formulation consists of (unconditional) weak convergence of the original process jointly with two bootstrap replicates. As a by-product, we provide two equivalent formulations of bootstrap consistency for statistics taking values in separable metric spaces: the first in terms of (unconditional) weak convergence of the statistic jointly with its bootstrap replicates, the second in terms of convergence in probability of the empirical distribution function of the bootstrap replicates. Finally, the asymptotic validity of bootstrap-based confidence intervals and tests is briefly revisited, with particular emphasis on the Monte Carlo approximation of conditional quantiles, which is unavoidable in practice.
Keywords: Bootstrap; conditional weak convergence; confidence intervals; resampling; stochastic processes; weak convergence.
MSC 2010: 62E20; 62G09
It is not uncommon in statistical problems that the limiting distribution of a statistic of interest be intractable. To carry out inference on the underlying quantity, one possibility consists of using a bootstrap or resampling scheme. Ideally, prior to its use, its consistency or asymptotic validity should be mathematically demonstrated. For a real or vector-valued statistic (or, more generally, a statistic taking values in a separable metric space ), the latter classically consists of establishing weak convergence of certain conditional laws (assuming that these are well-defined; see, e.g., Fad85 or Section 6 in Kal02). Specifically, a resampling scheme can be considered asymptotically consistent if an appropriate distance between the conditional distribution of a bootstrap replicate of given the available observations and the distribution of is shown to converge to zero in probability; see, for instance, BicFre81, Van98, Hor01 and the references therein, or Assertions and in Lemma 2.2 below. A first contribution of this note is to show that, under minimal conditions, the aforementioned convergence of conditional laws is actually equivalent to the (unconditional) weak convergence of jointly with two bootstrap replicates to independent copies of the same limit. As pointed out by a referee, the proof of this result relies on a key idea dating back to Hoe52, which has been used to derive quite similar statements since then; see, for instance, Lemma 4.1 in DumZer13 and the additional references given in Section 2. Furthermore, we provide an interesting third equivalent formulation of the consistency of a bootstrap for . It roughly states that the distance between the empirical distribution of the bootstrap replicates and the unobservable distribution of converges in probability to zero as the number of replicates and the sample size increase (see also BerLecMil87, Section 4, for a similar result). 
The latter is particularly meaningful given that most applications of resampling involve at some point approximating the unobservable distribution of by the empirical distribution of a finite number of bootstrap replicates.
In many situations, the -valued statistic of interest is a “sufficiently smooth” functional of a certain stochastic process belonging to the space of bounded functions defined on some arbitrary set (think of the general empirical process, for instance, as defined in Chapter 2 of VanWel96). Note that , when equipped with the supremum distance, is in general neither separable nor complete, and that is usually allowed to be non-measurable as well. As a consequence, neither laws nor conditional laws are well-defined in general, which complicates the theoretical analysis of bootstraps for . Following GinZin90, the consistency of a resampling scheme is then commonly defined by the requirement that the bounded Lipschitz distance between the candidate limiting law and a suitable adaptation of what might be called a conditional law of the bootstrap replicate (even though the latter does not exist in the classical sense due to non-measurability) converges to zero in outer probability. For instance, for the general empirical process based on independent and identically distributed observations, such an investigation is carried out in PraWel93 (see also VanWel96, Section 3.6) for the so-called empirical bootstrap and various other exchangeable bootstraps. The appeal of working at the stochastic process level then arises from the fact that such bootstrap consistency results can be transferred to the -valued statistic level (often, ) by means of appropriate extensions of the continuous mapping theorem and the functional delta method.
It may however be argued that the aforementioned generalization of the classical conditional formulation of bootstrap consistency is unintuitive and complicated to use given the subtlety of the underlying mathematical concepts (in particular, relying on “conditional laws” of non-measurable maps). The latter seems all the more true for instance for empirical processes based on estimated or serially dependent observations (see, e.g., RemSca09; Seg12; BucKoj16). The main contribution of this note is to show that the -valued results from Section 2 continue to hold for stochastic processes with bounded sample paths: the “conditional” formulation is actually equivalent to the (unconditional) weak convergence of the initial stochastic process jointly with two bootstrap replicates. From a practical perspective, using the latter unconditional formulation may have two important advantages. First and most importantly, it may be easier to prove in certain situations than the conditional formulation. For this reason, it was for instance used, as explained above, for empirical processes based on estimated or serially dependent observations; see also Section 3 below for additional references. Second, the unconditional formulation may be transferable to the statistic level for a slightly larger class of functionals of the stochastic process under consideration. The latter follows for instance from the fact that continuous mapping theorems for the bootstrap, that is, adapted to the conditional formulation, require more than just continuity of the map that transforms the stochastic process into the statistic of interest (see, e.g., Kos08, Section 10.1.4). Furthermore, there does not seem hitherto to exist an extended continuous mapping theorem (see, e.g., VanWel96, Theorem 1.11.1) for the bootstrap. 
Once the unconditional formulation is transferred to a separable metric space (with being typically ), the classical conditional statement immediately follows by the equivalence at the -valued statistic level mentioned above. Finally, let us mention that the equivalence at the stochastic process level is well-known for the special case of multiplier central limit theorems (CLTs) for the general empirical process based on i.i.d. observations using results of VanWel96 (note that multiplier CLTs are sometimes also referred to as multiplier or weighted bootstraps; see, e.g., Kos08, CheHua10 and the references therein). As such, our proven equivalence at the stochastic process level can be seen as an extension of the latter work.
As an illustration of our results, we revisit the fact that bootstrap consistency implies that bootstrap-based confidence intervals are asymptotically valid in terms of coverage and that bootstrap-based tests hold their level asymptotically; see, for instance, Van98 for a related result and Hor01 for more specialized and deeper results. In particular, we provide results which explicitly take into account that (unobservable) conditional quantiles must be approximated by Monte Carlo in practice.
Finally, we would like to stress that the asymptotic results in this note are all of first order. Higher order correctness of a resampling scheme (usually considered for real-valued statistics) may still be important in small samples. The reader is referred to Hal92 for more details.
This note is organized as follows. The equivalence between the aforementioned formulations of asymptotic validity of bootstraps of statistics taking values in separable metric spaces is proved in Section 2. Section 3 states conditions under which the results of Section 2 extend to stochastic processes with bounded sample paths. In Section 4, it is formally verified that, as expected, bootstrap consistency implies asymptotic validity of bootstrap-based confidence intervals and tests. A summary of results and concluding remarks are given in the last section.
In the rest of the document, the arrow ‘⇝’ denotes weak convergence, while the arrows ‘→ a.s.’ and ‘→ P’ denote almost sure convergence and convergence in probability, respectively.
2 Equivalent statements of bootstrap consistency in separable metric spaces
The generic setup considered in this section is as follows. The available data will be denoted by . Apart from measurability, no assumptions are made on , but it is instructive to think of as an -tuple of multivariate observations which may possibly be serially dependent. Let denote a separable metric space. We are interested in approximating the law of some -valued statistic computed from , denoted by . -valued bootstrap replicates of , on which inference could be based, will be denoted by , , …, where , , …, typically -valued, are identically distributed and represent additional sources of randomness such that are independent conditionally on .
The previous setup is general enough to encompass most if not all types of resampling procedures. For instance, when , the classical empirical (multinomial) bootstrap of Efr79 based on resampling with replacement from some original i.i.d. data set can be obtained by letting the be i.i.d. multinomially distributed with parameter . Indeed, for fixed , the sample constructed by including the th original observation exactly times, , may be identified with a sample being drawn with replacement from the original observations. Many other resampling schemes are included as well: block bootstraps for time series such as the one of Kun89, (possibly dependent) multiplier (or weighted, wild) bootstraps (see, e.g., Sha10) or the parametric bootstrap (see, e.g., StuGonPre93; GenRem08). For all but the last mentioned resampling scheme, , , …, could be interpreted as i.i.d. vectors of bootstrap weights, independent of . Several examples of such weights when corresponds to i.i.d. observations are given for instance in VanWel96.
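The identification between multinomial weights and resampling with replacement can be checked numerically. The following is a minimal sketch, assuming the statistic is a sample mean; the function name `multinomial_bootstrap_mean` and the Gaussian toy data are illustrative choices that do not appear in this note.

```python
import numpy as np

def multinomial_bootstrap_mean(x, rng):
    """One empirical-bootstrap replicate of the sample mean (illustrative helper)."""
    n = len(x)
    # Draw weights W = (W_1, ..., W_n) ~ Multinomial(n; 1/n, ..., 1/n):
    # W_i counts how often the i-th observation enters the bootstrap sample.
    w = rng.multinomial(n, np.full(n, 1.0 / n))
    # Build the bootstrap sample explicitly: observation i is repeated W_i times.
    resample = np.repeat(x, w)
    # Weighted mean of the original data versus plain mean of the resample.
    return np.dot(w, x) / n, resample.mean(), w

rng = np.random.default_rng(0)
x = rng.normal(size=20)  # toy data, assumed i.i.d. for this sketch
weighted, resampled, w = multinomial_bootstrap_mean(x, rng)
print(w.sum(), np.isclose(weighted, resampled))
```

Up to the order of the observations, the explicit resample and the weighted representation carry the same information, which is why the weights alone can play the role of the additional source of randomness.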
The previous setup is formally summarized in the following assumption. Recall the notions of conditional independence and regular conditional distribution; see, e.g., Kal02, Section 6.
Condition 2.1 (-valued resampling mechanism).
Let denote a separable metric space equipped with the Borel sigma field , and let denote a probability space. For , let be a random variable in some measurable space . Furthermore, let , , denote identically distributed random variables in some measurable space and let , , be -valued statistics (to be considered as bootstrap replicates of some -valued statistic ) that are independent conditionally on . Finally, assume that has a regular version, denoted by and called the (regular) conditional distribution of given .
The last assumption in the previous condition concerning the existence of the conditional distribution of given is automatically satisfied if there exists a possibly different metric on which is equivalent to such that is complete. In that case, is a Borel space, see Theorem A1.2 in Kal02, and the assertion follows from Theorem 6.3 in that reference. The existence of the aforementioned conditional distribution can also be guaranteed if the underlying probability space has a product structure, that is, if with probability measure , where denotes the probability measure on , such that, for any , only depends on the first coordinate of and only depends on the -coordinate of , implying in particular that are independent. In that case, it can readily be checked by Fubini’s theorem that defines a regular version of the conditional distribution of given .
In a related way, for arbitrary real-valued functions such that , conditional expectations are always to be understood as integration of with respect to (Kal02, Theorem 6.4).
Lemma 2.2 below is one of the main results of this note and essentially shows that the unconditional weak convergence of a statistic jointly with two of its bootstrap replicates is equivalent to the convergence in probability of the conditional law of a bootstrap replicate. The latter (with convergence in probability possibly replaced by almost sure convergence) is the classical mathematical definition of the asymptotic validity of a resampling scheme. A further equivalent formulation, of interest for applications, is also provided. Parts of these equivalences can also be found in DumZer13, Lemma 4.1, relying on ideas put forward in Hoe52 and also exploited in Rom89 and ChuRom13.
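To build intuition for the joint unconditional formulation, one may simulate a statistic together with two of its bootstrap replicates over many independent data sets and inspect their joint behaviour. The sketch below is only an illustration under assumptions of our own choosing: the statistic is the centred and scaled sample mean of exponential data, the resampling scheme is the empirical bootstrap, and `simulate_triples` is a hypothetical helper name.

```python
import numpy as np

def simulate_triples(n, reps, rng):
    """Return a (reps, 3) array: statistic and two bootstrap replicates per data set."""
    out = np.empty((reps, 3))
    for r in range(reps):
        x = rng.exponential(size=n)          # toy data with mean 1 and variance 1
        xbar = x.mean()
        out[r, 0] = np.sqrt(n) * (xbar - 1.0)  # original statistic
        for j in (1, 2):
            idx = rng.integers(0, n, size=n)   # resample indices with replacement
            out[r, j] = np.sqrt(n) * (x[idx].mean() - xbar)  # bootstrap replicate
    return out

rng = np.random.default_rng(1)
triples = simulate_triples(n=200, reps=2000, rng=rng)
corr = np.corrcoef(triples, rowvar=False)  # 3 x 3 empirical correlation matrix
print(np.round(corr, 2))
print(np.round(triples.var(axis=0), 2))
```

If the three components behave approximately as independent copies of the same limit, the off-diagonal correlations should be close to zero and the three variances close to each other. Uncorrelatedness is of course weaker than independence, so this is a sanity check rather than a verification of the joint weak convergence.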
Recall that the bounded Lipschitz metric between probability measures on a separable metric space equipped with the Borel sigma field is defined by
where denotes the set of functions such that for all . Moreover, recall the Kolmogorov distance between probability measures on , defined by
Finally, denote the empirical distribution of the sample by
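For the reader's convenience, the standard forms of these three objects are recalled below. The symbols used (probability measures $P$ and $Q$, the metric space $(\mathbb{D}, d)$, the class $\mathrm{BL}_1(\mathbb{D})$, a generic sample $Y_1, \dots, Y_M$ and the Dirac measure $\delta_y$) are notational assumptions made for this sketch and may differ from those used elsewhere in this note.

```latex
\begin{align*}
  d_{\mathrm{BL}}(P, Q)
    &= \sup_{f \in \mathrm{BL}_1(\mathbb{D})}
       \Bigl| \int f \, \mathrm{d}P - \int f \, \mathrm{d}Q \Bigr|, \\
  \mathrm{BL}_1(\mathbb{D})
    &= \bigl\{ f : \mathbb{D} \to [-1, 1] \;:\;
       |f(x) - f(y)| \le d(x, y) \text{ for all } x, y \in \mathbb{D} \bigr\}, \\
  d_{K}(P, Q)
    &= \sup_{x \in \mathbb{R}^d}
       \bigl| P\bigl((-\infty, x]\bigr) - Q\bigl((-\infty, x]\bigr) \bigr|, \\
  \hat{P}_M
    &= \frac{1}{M} \sum_{i=1}^{M} \delta_{Y_i}.
\end{align*}
```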
Lemma 2.2 (Equivalence of unconditional and conditional formulations).
Suppose that Condition 2.1 is met. Assume further that converges weakly to some random variable in . Then, the following four assertions are equivalent:
If, additionally, and the (cumulative) distribution function (d.f.) of is continuous, then the preceding four assertions are also equivalent to
Before providing a proof of this lemma, let us give an interpretation of the assertions. The intuition behind Assertions and is that a resampling scheme should be considered consistent if the bootstrap replicates behave approximately as independent copies of , all the more so as is large. Assertions and formalize the idea that a resampling scheme should be considered valid if the distribution of a bootstrap replicate given the data is close to the distribution of the original statistic , all the more so as is large. Assertions and can be regarded as empirical analogues of Assertions and , respectively: the unobservable conditional law of a bootstrap replicate is replaced by the empirical law of a sample of bootstrap replicates, providing an approximation of the law of that improves as increase.
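The empirical analogues discussed above lend themselves to a simple numerical illustration. The following sketch, under assumptions of our own (exponential data, the scaled and centred sample mean as statistic, the empirical bootstrap, and a standard normal limit), computes the Kolmogorov distance between the empirical distribution of a finite number of bootstrap replicates and the limiting law; all function names are hypothetical.

```python
import math
import numpy as np

def normal_cdf(t):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def kolmogorov_distance_to_normal(sample):
    """Sup-distance between the empirical d.f. of `sample` and the N(0, 1) d.f."""
    s = np.sort(sample)
    m = len(s)
    cdf = np.array([normal_cdf(t) for t in s])
    upper = np.arange(1, m + 1) / m - cdf  # empirical d.f. at each jump minus limit
    lower = cdf - np.arange(0, m) / m      # limit minus empirical d.f. just before it
    return max(upper.max(), lower.max())

def bootstrap_replicates(x, m, rng):
    """m empirical-bootstrap replicates of sqrt(n) * (resample mean - sample mean)."""
    n = len(x)
    idx = rng.integers(0, n, size=(m, n))  # one row of resampled indices per replicate
    return np.sqrt(n) * (x[idx].mean(axis=1) - x.mean())

rng = np.random.default_rng(2)
distances = {}
for n, m in [(20, 50), (200, 400), (1000, 2000)]:
    x = rng.exponential(size=n)  # toy data with unit variance, so the limit is N(0, 1)
    distances[(n, m)] = kolmogorov_distance_to_normal(bootstrap_replicates(x, m, rng))
print(distances)
```

One would expect the distance to be small when both the sample size and the number of replicates are large, although a single run is random and need not decrease monotonically.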
Assertions and are known to hold for many statistics and resampling schemes, possibly as a consequence of general consistency results such as the one of BerDuc91 (see also Hor01, Section 2.1). Assertions and are substantially less frequently encountered in the literature and appear mostly as a consequence of similar assertions at a stochastic process level; see Lemma 3.1 in Section 3 and the references therein.
Let us finally turn to the proof of Lemma 2.2. The latter is in fact a corollary of the following, slightly more general lemma which does not rely on the additional assumption that converges weakly.
Lemma 2.3.
Suppose that Condition 2.1 is met and let be a fixed probability measure on . Then, the following four assertions are equivalent:
If, additionally, and the d.f. of is continuous, then the preceding four assertions are also equivalent to
The proof of this lemma will in turn be based on the following two possibly well-known lemmas about metrizing weak convergence in separable metric spaces. Note that the results are stated in terms of nets which generalize sequences (see, e.g., VanWel96, Section 1.1) in order to account for the net convergences in Assertions and of the two preceding lemmas.
For sequences, the forthcoming assertions regarding the Kolmogorov distance can for instance be found in Van98, see Lemma 2.11 and Problem 23.1, while the assertions regarding the bounded Lipschitz metric can be found in Dud02, Theorem 11.3.3, for the non-random version (see Lemma 2.4 below) and in DumZer13, Section 2, for the random one (see Lemma 2.5 below). Detailed proofs are provided in the supplementary material for the sake of completeness.
Lemma 2.4.
Suppose that is a separable metric space and let be a net of probability measures on , where denotes the Borel sigma field. Then if and only if . If and if the d.f. of is continuous, we also have equivalence to .
A random probability measure on a separable metric space is a mapping from some probability space into the set of Borel probability measures on such that considered as a function from to is measurable for any bounded and continuous function on (see, e.g., DumZer13, Section 2). Note that, under Condition 2.1, is a sequence of such random probability measures.
Lemma 2.5.
Suppose that is a separable metric space and let denote a net of random probability measures on defined on a probability space . Then,
for any bounded and Lipschitz continuous if and only if in probability. Further, , considered as a map from to , is measurable.
If and if the d.f. of is continuous, then (2.1) is also equivalent to in probability, and is measurable as well.
We can now prove Lemma 2.3.
Proof of Lemma 2.3.
We begin by showing the equivalence between , , and . Note that, even though the equivalence between and is almost identical to Lemma 4.1 of DumZer13, we provide a self-contained proof to ease readability.
: by Lemma 2.4, we only need to show that
for all bounded and Lipschitz continuous . As in the proof of Lemma 4.1 in DumZer13, we can even prove -convergence. Let denote a random variable with distribution . Then, by the law of iterated expectation,
Since and are identically distributed and conditionally independent given , the first term on the right-hand side can be written as
The function being bounded and continuous, the convergence in implies that, as ,
where and are independent copies of .
: by Lemma 2.4 and Corollary 1.4.5 in VanWel96, it suffices to show that, as ,
for any bounded and Lipschitz continuous. By independence of conditionally on , we can write the left-hand side as
and the assertion follows from , Lemma 2.5 and dominated convergence for convergence in probability.
: fix bounded and Lipschitz continuous and , and denote by a bound on . Then, for any ,
by Chebyshev’s inequality. As a consequence,
Finally, if and if the d.f. of is continuous, the equivalences and are immediate consequences of Lemma 2.5. ∎
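The Chebyshev step in the preceding proof rests on a standard variance bound for averages, which can be sketched as follows. The notation is an assumption made for this remark only: $f$ is bounded in absolute value by $K$, and $S^{(1)}, \dots, S^{(M)}$ are taken to be independent and identically distributed conditionally on the data $X$. The display is a generic instance of the argument and is not claimed to reproduce the exact inequality used above.

```latex
\begin{align*}
  \mathbb{P}\Bigl( \Bigl| \frac{1}{M} \sum_{i=1}^{M} f\bigl(S^{(i)}\bigr)
      - \mathbb{E}\bigl[ f\bigl(S^{(1)}\bigr) \mid X \bigr] \Bigr| > \varepsilon
      \Bigm| X \Bigr)
  &\le \frac{\operatorname{Var}\bigl( f\bigl(S^{(1)}\bigr) \mid X \bigr)}{M \varepsilon^{2}}
   \le \frac{K^{2}}{M \varepsilon^{2}}.
\end{align*}
```

The right-hand side does not depend on the data or on the sample size, so taking expectations gives a bound that vanishes as $M \to \infty$ uniformly in the sample size.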
Proof of Lemma 2.2.
Since , to show the equivalence between –, it remains to be shown that implies . By Corollary 1.4.5 in VanWel96, it suffices to show that
for any bounded and Lipschitz continuous. By independence of conditionally on , we obtain that
and the assertion follows from , Lemma 2.5 and dominated convergence for convergence in probability.
3 Extension to stochastic processes with bounded sample paths
As in the previous section, let be some data formally seen as a random variable in some measurable space . Furthermore, let denote an arbitrary non-empty set and let denote the set of real-valued bounded functions on equipped with the supremum distance. Since, as already mentioned in the introduction, the latter metric space is in general neither separable nor complete, one cannot typically set and apply the results of the previous section.
To remedy this shortcoming, we are hereafter specifically interested in the situation in which the -valued statistic of the previous section is a stochastic process on constructed from . It is assumed that every sample path is a bounded function so that may formally be regarded as a map from the underlying probability space into without however imposing any measurability conditions. We additionally suppose that, as , converges weakly in to some tight, Borel measurable stochastic process in the sense of Hoffmann-Jørgensen (see, e.g., VanWel96, Section 1.3) (which in fact implies that is asymptotically measurable). Extending the setting of Section 2, we further assume that are bootstrap replicates of , that is, stochastic processes on depending on additional identically distributed random variables in some measurable space that can, in many cases, be interpreted as bootstrap weights and should in general be seen as the additional sources of randomness introduced by the resampling scheme. As for , it is assumed that the sample paths of also belong to and, when seen as maps into , no measurability assumptions are made on these bootstrap replicates either. When represents i.i.d. observations and is the general empirical process constructed from , several examples of possible bootstrap replicates of can for instance be found in VanWel96. As in Section 3.6 of the latter reference, we assume throughout this section that the underlying probability space is independent of and has a product structure, that is, with probability measure , where denotes the probability measure on , such that, for any , only depends on the first coordinate of and only depends on the -coordinate of , implying in particular that are independent.
Some additional notation is needed before our main result can be stated. For any map , let be any minimal measurable majorant of with respect to , that is, is measurable, and almost surely for any measurable function with almost surely. A maximal measurable minorant of with respect to is denoted by and defined by (see VanWel96, Section 1.2). Furthermore, for any , we define the map such that, for any , the map is a minimal measurable majorant of with respect to . Finally, for a real-valued function on such that is measurable for all , we further use the notation
provided the integral exists. Note that if is jointly Borel measurable, the right-hand side of the last display defines a version of the conditional expectation of given , whence the notation.
Lemma 3.1.
With the previous notation and under the above assumptions, the following three assertions are equivalent:
where are i.i.d.
For any , as ,
where are i.i.d.
and is asymptotically measurable, where denotes convergence in outer probability.
Let us make a few comments on this result:
Assertion is the extension put forward by GinZin90 of the conditional formulation of bootstrap consistency in a separable metric space to the not necessarily separable space . Section 3.6 in VanWel96 and Chapter 10 in Kos08 in particular provide proofs of Assertion for various bootstraps of the general empirical process constructed from i.i.d. observations along with continuous mapping theorems for the bootstrap and a functional delta method for the bootstrap that can be used to transfer (3.3) to the statistic level in certain situations.
In VanWel96, Van98 and Kos08, the expression on the left-hand side of (3.3) appears without the minimal measurable majorant with respect to the “weights”. This is a consequence of the fact that, for all the resampling schemes considered in these monographs, the function is continuous for all , implying that is measurable for all and all . However, the minimal measurable majorant becomes necessary, for instance, if one wishes to apply Lemma 3.1 to certain stochastic processes appearing when using the parametric bootstrap (e.g., for goodness-of-fit testing; see StuGonPre93; GenRem08). To see this, suppose that is an i.i.d. sample of size from some d.f. on the real line, with from some parametric family . A natural stochastic process, from which one may for instance construct classical goodness-of-fit statistics, is then , , where is the empirical d.f. of . Bootstrap samples are generated by sampling from , where is an estimator of . Note in passing that the latter way of proceeding is compatible with the product-structure condition on the underlying probability space since bootstrap samples can equivalently be regarded as obtained by applying component-wise to independent random vectors independent of and whose components are i.i.d. standard uniform. Now, corresponding parametric bootstrap replicates of are given by , where is the empirical d.f. of the sample . The need for the minimal measurable majorant with respect to the “weights” in (3.3) is then a consequence of the fact that the function from to defined by
is not measurable for all and all , as can for instance be verified by adapting arguments from Bil99.
Bootstrap asymptotic validity in the form of Assertions or is less frequently encountered in the literature, although, as discussed in the introduction, it may be argued that this unconditional formulation is more intuitive and easier to work with. It is proved for example in GenRem08 (for ), RemSca09, Seg12, GenNes14, BerBuc17 and BucKoj16; BucKoj16b, among many others, for various stochastic processes arising in statistical tests on copulas or for assessing stationarity.
As mentioned in the introduction, note that Assertions and are known to be equivalent for the special case of the multiplier CLT for the general empirical process based on i.i.d. observations and, in this case, it is even sufficient to consider in : Corollary 2.9.3 in VanWel96 corresponds to Assertion , while Theorem 2.9.6 corresponds to Assertion . The equivalence between the two follows by combining Theorem 2.9.6 with Theorem 2.9.2.
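The parametric bootstrap construction described in the comments above, in which bootstrap samples are obtained by applying an estimated quantile function to independent standard uniform variables, can be sketched as follows. The exponential family, the rate estimator, the Kolmogorov-Smirnov-type functional and the Monte Carlo p-value are all assumptions chosen for illustration; none of the function names come from this note.

```python
import numpy as np

def exp_cdf(x, rate):
    """Distribution function of the exponential law with the given rate."""
    return 1.0 - np.exp(-rate * x)

def exp_quantile(u, rate):
    """Quantile function of the exponential law, applied component-wise."""
    return -np.log1p(-u) / rate

def ks_statistic(sample):
    """sqrt(n) * sup |F_n - F_{theta_hat}| with the rate estimated from `sample`."""
    n = len(sample)
    s = np.sort(sample)
    rate_hat = 1.0 / s.mean()              # estimator of the parameter
    cdf = exp_cdf(s, rate_hat)
    upper = np.arange(1, n + 1) / n - cdf
    lower = cdf - np.arange(0, n) / n
    return np.sqrt(n) * max(upper.max(), lower.max())

def parametric_bootstrap_ks(sample, m, rng):
    """m parametric-bootstrap replicates of the statistic via uniform 'weights'."""
    n = len(sample)
    rate_hat = 1.0 / sample.mean()
    u = rng.random(size=(m, n))            # i.i.d. standard uniforms, independent of data
    boot = exp_quantile(u, rate_hat)       # bootstrap samples drawn from F_{theta_hat}
    # The parameter is re-estimated inside ks_statistic for each bootstrap sample.
    return np.array([ks_statistic(row) for row in boot])

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=100)   # toy data from the assumed family
stat = ks_statistic(x)
reps = parametric_bootstrap_ks(x, m=500, rng=rng)
p_value = (1 + np.sum(reps >= stat)) / (len(reps) + 1)
print(stat, p_value)
```

Writing the bootstrap sample as a deterministic function of the data and of uniform variables makes the product structure of the underlying probability space explicit, which is the point made in the comment above.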
Before proving Lemma 3.1, we provide a useful corollary which is an immediate consequence of Lemma 3.1 and Lemma 2.2. It may be regarded as an analogue of Theorem 1.5.4 in VanWel96 in a conditional setting and, roughly speaking, states that conditional weak convergence of a sequence of stochastic processes is equivalent to the conditional weak convergence of finite-dimensional distributions and (unconditional) asymptotic tightness.
Suppose that the assumptions of Lemma 3.1 are met. Then, any of the equivalent assertions in that lemma is equivalent to the fact that the finite dimensional distributions of conditionally weakly converge to those of in probability, that is, for any and ,
as , and that is (unconditionally) asymptotically tight.
Proof of Lemma 3.1.
We closely follow the proof of Theorem 2.9.6 of VanWel96 and rely on Lemma 2.2 when necessary.
: Asymptotic measurability of is an immediate consequence of the weak convergence of to in (VanWel96, Lemma 1.3.8). Next, by Theorems 1.5.4 and 1.5.7 in VanWel96, the latter convergence implies that there exists a semimetric on such that is totally bounded and such that, for any ,
Fix . For any , let denote the ball of radius centered at . Since is totally bounded, there exists and , , such that is included in the union of all balls , . The latter allows us to define a mapping defined, for any , by where is the center of a ball containing . Now, to prove (3.3), we consider the decomposition
Some thought reveals that (3.3) is proved if, for any ,
and similarly for and .
Term : By Markov’s inequality for outer probabilities (Lemma 6.10 in Kos08), it suffices to show (3.6) with replaced by . For any , we have, by Lemma 1.2.2 (iii) in VanWel96,
where denotes the minimum operator and . It follows that . Note that, by Lemma 1.2.2 (viii) in VanWel96, we may choose in such a way that is nonincreasing almost surely. Then is nonincreasing as well, and from (3.5) and Problem 2.1.5 in VanWel96, we have that in probability as for any sequence , which, by dominated convergence for convergence in probability, implies that . Hence, by invoking Problem 2.1.5 in VanWel96 again.
Term : Fix and recall that the centers of the balls defining were denoted by . Since the weak convergence stated in (3.1) implies weak convergence of the respective finite dimensional distributions, we may invoke the equivalence between and in Lemma 2.2 to conclude (with the help of the triangle inequality and Lemma 2.4) that (3.4) holds, that is, that
Next, let be arbitrary. Define such that, for any and , if . Furthermore, let be defined as implying that . Some thought reveals that , whence in probability as for all , implying the analogue of (3.6) for .
Term : For any , we have
By tightness of , Addendum 1.5.8 in VanWel96 and dominated convergence, the expectation on the right converges to zero as , implying the analogue of (3.6) for .
: To prove (3.2), we need to show the weak convergence of the finite-dimensional distributions and marginal asymptotic tightness. We start with the former. Let and . It suffices to show that, as ,