Moving Block and Tapered Block Bootstrap for Functional Time Series with an Application to the K-Sample Mean Problem


Dimitrios PILAVAKIS, Efstathios PAPARODITIS¹ and Theofanis SAPATINAS
Department of Mathematics and Statistics, University of Cyprus,
P.O. Box 20537, CY 1678 Nicosia, CYPRUS.
¹ Corresponding author (email: stathisp@ucy.ac.cy)
Abstract

We consider infinite-dimensional Hilbert space-valued random variables that are assumed to be temporally dependent in a broad sense. We prove a central limit theorem for the moving block bootstrap and for the tapered block bootstrap, and show that these block bootstrap procedures also provide consistent estimators of the long run covariance operator. Furthermore, we consider block bootstrap-based procedures for fully functional testing of the equality of mean functions between several independent functional time series. We establish validity of the block bootstrap methods in approximating the distribution of the statistic of interest under the null and show consistency of the block bootstrap-based tests under the alternative. The finite sample behaviour of the procedures is investigated by means of simulations. An application to a real-life dataset is also discussed.

Some key words: Functional Time Series; Mean Function; Moving Block Bootstrap; Tapered Block Bootstrap; Spectral Density Operator; K-sample mean problem

1 Introduction

In statistical analysis, conclusions are commonly derived based on information obtained from a random sample of observations. In an increasing number of fields, these observations are curves or images which are viewed as functions in appropriate spaces, since an observed intensity is available at each point on a line segment, a portion of a plane or a volume. Such observed curves or images are called ‘functional data’; see, e.g., Ramsay and Dalzell (1991), who also introduced the term ‘functional data analysis’ (FDA) which refers to statistical methods used for analysing this kind of data.

In this paper we focus on functional time series, that is, we consider observations stemming from a stochastic process $\mathbb{X} = \{X_t, t \in \mathbb{Z}\}$ of Hilbert space-valued random variables which satisfies certain stationarity and weak dependence properties. Our goal is to infer properties of the stochastic process based on an observed stretch $X_1, X_2, \ldots, X_n$, i.e., on a functional time series. In this context, we commonly need to calculate the distribution, or parameters related to the distribution, of some statistics of interest based on $X_1, X_2, \ldots, X_n$. Since in a functional set-up such quantities typically depend in a complicated way on infinite-dimensional characteristics of the underlying stochastic process $\mathbb{X}$, their calculation is difficult in practice. As a result, resampling methods and, in particular, bootstrap methodologies are very useful.

For the case of independent and identically distributed (i.i.d.) Banach space-valued random variables, Giné and Zinn proved the consistency of the standard i.i.d. bootstrap for the sample mean. For functional time series, Politis and Romano established validity of the stationary bootstrap for the sample mean of (bounded) Hilbert space-valued random variables satisfying certain mixing conditions. A functional sieve bootstrap procedure for functional time series has been proposed by Paparoditis (2017). Consistency of the non-overlapping block bootstrap for the sample mean of near epoch dependent Hilbert space-valued random variables has been established by Dehling et al. (2015). However, to date, consistency results are not available for the moving block bootstrap (MBB) or its improved versions, like the tapered block bootstrap (TBB), for functional time series. Notice that the MBB for real-valued time series was introduced by Künsch and by Liu and Singh. The basic idea is to resample blocks of the time series and to join them together in the order selected to form a new set of pseudo observations. This resampling scheme retains the dependence structure of the time series within each block and can, therefore, be used to approximate the distribution of a wide range of statistics. The TBB for real-valued time series was introduced by Paparoditis and Politis (2001). It uses a taper window to downweight the observations at the beginning and at the end of each resampled block and improves the bias properties of the MBB.

The aim of this paper is twofold. First, we prove consistency of the MBB and of the TBB for the sample mean function in the case of weakly dependent Hilbert space-valued random variables. Furthermore, we show that these bootstrap methods provide consistent estimators of the covariance operator of the sample mean function estimator, that is, of the spectral density operator of the underlying stochastic process at frequency zero. We derive our theoretical results under quite general dependence assumptions on $\mathbb{X}$, i.e., under $L^p$-$m$-approximability assumptions, which are satisfied by a large class of commonly used functional time series models; see, e.g., Hörmann and Kokoszka (2010). Second, we apply the above mentioned bootstrap procedures to the problem of fully functional testing of the equality of the mean functions between a number of independent functional time series. Testing the equality of mean functions for i.i.d. functional data has been extensively discussed in the literature; see, e.g., Benko et al. (2009), Horváth and Kokoszka (2012, Chapter 5), Zhang (2013) and Staicu et al. (2015). Bootstrap alternatives to asymptotic approximations have been proposed in the same context by Benko et al. (2009), Zhang et al. (2010) and, more recently, by Paparoditis and Sapatinas (2016). Testing equality of mean functions for dependent functional data has also attracted some interest in the literature. Horváth et al. (2013) developed an asymptotic procedure for testing equality of two mean functions for functional time series. Since the limiting null distribution of a fully functional, $L^2$-type test statistic depends on difficult to estimate process characteristics, tests are considered which are based on a finite number of projections. A projection-based test has also been considered by Horváth and Rice (2015). Although such tests lead to manageable limiting distributions, they have non-trivial power only for deviations from the null which are not orthogonal to the subspace generated by the particular projections considered.

In this paper, we show that the MBB and TBB procedures can be successfully applied to approximate the distribution under the null of such fully functional test statistics. This is achieved by designing the suggested block bootstrap procedures in such a way that the generated pseudo-observations satisfy the null hypothesis of interest. Notice that such block bootstrap-based testing methodologies are applicable to a broad range of possible test statistics. As an example, we prove validity for the $L^2$-type test statistic recently proposed by Horváth et al. (2013).

The paper is organised as follows. In Section 2, the basic assumptions on the underlying stochastic process are stated and the MBB and TBB procedures for weakly dependent, Hilbert space-valued random variables are described. Asymptotic validity of the block bootstrap procedures for estimating the distribution of the sample mean function is established, and consistency of the block bootstrap estimators of the long run covariance operator, i.e., of the spectral density operator of the underlying stochastic process at frequency zero, is proven. Section 3 is devoted to the problem of testing equality of mean functions for several independent functional time series. Theoretical justification of an appropriately modified version of the MBB and of the TBB procedure for approximating the null distribution of a fully functional test statistic is given, and consistency under the alternative is shown. Numerical simulations and a real-life data example are presented and discussed in Section . Auxiliary results and proofs of the main results are deferred to Section  and to the supplementary material.

2 Block Bootstrap Procedures for Functional Time Series

2.1 Preliminaries and Assumptions

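We consider a strictly stationary stochastic process $\mathbb{X} = \{X_t, t \in \mathbb{Z}\}$, where the random variables $X_t$ are random functions $X_t(\omega, \tau)$, $\tau \in \mathcal{I}$, $\omega \in \Omega$, defined on a probability space $(\Omega, \mathcal{A}, P)$ and taking values in the separable Hilbert space of square-integrable $\mathbb{R}$-valued functions on $\mathcal{I}$, denoted by $L^2(\mathcal{I})$. The expectation function of $X_t$, $E X_t \in L^2(\mathcal{I})$, is independent of $t$ and is denoted by $\mu := E X_0$. Throughout Section 2, we assume for simplicity that $\mu = 0$. We define $\langle f, g \rangle = \int f(\tau) g(\tau)\, d\tau$, $\|f\|^2 = \langle f, f \rangle$ and the tensor product between $f$ and $g$ by $f \otimes g(\cdot) = \langle f, \cdot \rangle g$. For two Hilbert-Schmidt operators $\Psi_1$ and $\Psi_2$, we denote by $\langle \Psi_1, \Psi_2 \rangle_{HS} = \sum_{i=1}^{\infty} \langle \Psi_1(e_i), \Psi_2(e_i) \rangle$ the inner product which generates the Hilbert-Schmidt norm $\|\Psi_1\|_{HS}^2 = \sum_{i=1}^{\infty} \|\Psi_1(e_i)\|^2$, for $\{e_i, i = 1, 2, \ldots\}$ an orthonormal basis of $L^2(\mathcal{I})$. Without loss of generality, we assume that $\mathcal{I} = [0, 1]$ (the unit interval) and, for simplicity, integral signs without the limits of integration imply integration over the interval $\mathcal{I}$. We finally write $L^2$ instead of $L^2(\mathcal{I})$.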

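To describe the dependence structure of the stochastic process $\mathbb{X}$, we use the notion of $L^p$-$m$-approximability; see Hörmann and Kokoszka (2010). A stochastic process $\mathbb{X} = \{X_t, t \in \mathbb{Z}\}$ with $X_t$ taking values in $L^2$ is called $L^p$-$m$-approximable if the following conditions are satisfied:

  1. $X_t$ admits the representation

    $X_t = f(\delta_t, \delta_{t-1}, \delta_{t-2}, \ldots)$    (1)

    for some measurable function $f: S^{\infty} \to L^2$, where $\{\delta_t, t \in \mathbb{Z}\}$ is a sequence of i.i.d. elements in a measurable space $S$.

  2. $E\|X_0\|^p < \infty$ and

    $\sum_{m \geq 1} \big( E\|X_m - X_m^{(m)}\|^p \big)^{1/p} < \infty$,    (2)

    where $X_t^{(m)} = f(\delta_t, \delta_{t-1}, \ldots, \delta_{t-m+1}, \delta^{(m)}_{t,t-m}, \delta^{(m)}_{t,t-m-1}, \ldots)$ and, for each $t$ and $k$, $\delta^{(m)}_{t,k}$ is an independent copy of $\delta_t$.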

The intuition behind the above definition is that the function $f$ in (1) should be such that the effect of the innovations far back in the past becomes negligible, that is, these innovations can be replaced by other, independent, innovations. We slightly strengthen (2) to the following assumption.

Assumption 1.

is --approximable and satisfies

Notice that the above assumption is satisfied by many linear and non-linear functional time series models considered in the literature; see, e.g., Hörmann and Kokoszka (2010).

2.2 The Moving Block Bootstrap

The main idea of the MBB is to split the data into overlapping blocks of length and to obtain the bootstrapped pseudo-time series by joining together the independently and randomly selected blocks of observations in the order selected. Here, is a positive integer satisfying and For simplicity of notation, we assume throughout the paper that . Since the dependence of the original time series is maintained within each block, it is expected that for weakly dependent time series, this bootstrap procedure will, asymptotically, correctly imitate the entire dependence structure of the underlying stochastic process if the block length increases to infinity, at some appropriate rate, as the sample size increases to infinity. Adapting this resampling idea to a functional time series stemming from a strictly stationary stochastic process with taking values in and , leads to the following MBB algorithm.

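  • Let $b = b(n)$, $1 \leq b < n$, be an integer. Denote by $B_i = \{X_i, X_{i+1}, \ldots, X_{i+b-1}\}$ the block of length $b$ starting from observation $X_i$, $i = 1, 2, \ldots, N$, where $N = n - b + 1$ is the number of such blocks available.

  • Define i.i.d. integer-valued random variables $I_1, I_2, \ldots, I_k$, $k = n/b$, having a discrete uniform distribution assigning the probability $1/N$ to each element of the set $\{1, 2, \ldots, N\}$.

  • Let $B^*_i = B_{I_i}$, $i = 1, 2, \ldots, k$, and denote by $\{X^*_{(i-1)b+1}, X^*_{(i-1)b+2}, \ldots, X^*_{ib}\}$ the elements of $B^*_i$. Join the $k$ blocks in the order $B^*_1, B^*_2, \ldots, B^*_k$ together to obtain a new set of functional pseudo observations of length $n$ denoted by $X^*_1, X^*_2, \ldots, X^*_n$.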
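The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not part of the paper: each curve is stored as a list of function values on a common grid, the sample size is a multiple of the block length, and the function name `mbb_resample` is hypothetical.

```python
import random


def mbb_resample(curves, b, rng=None):
    """Draw one MBB pseudo-time series from curves observed on a common grid.

    curves: list of n curves, each a list of function values (assumed layout).
    b: block length; n is assumed to be a multiple of b.
    """
    rng = rng or random.Random()
    n = len(curves)
    if not 1 <= b <= n or n % b != 0:
        raise ValueError("need 1 <= b <= n with n divisible by b")
    n_blocks = n - b + 1                  # number of overlapping blocks
    pseudo = []
    for _ in range(n // b):               # draw n / b blocks
        start = rng.randrange(n_blocks)   # uniform start index (0-based)
        pseudo.extend(curves[start:start + b])
    return pseudo
```

Passing a seeded `random.Random` instance makes the draw reproducible; within every resampled block the original time order, and hence the short-range dependence, is kept intact.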

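The above bootstrap algorithm can be potentially applied to approximate the distribution of some statistic of interest. For instance, let $\bar{X}_n$ be the sample mean function of the observed stretch $X_1, X_2, \ldots, X_n$, i.e., $\bar{X}_n = n^{-1} \sum_{t=1}^{n} X_t$. We are interested in estimating the distribution of $\sqrt{n}\, \bar{X}_n$. For this, the bootstrap random variable $\sqrt{n}(\bar{X}^*_n - E^*(\bar{X}^*_n))$ is used, where $\bar{X}^*_n$ is the mean function of the functional pseudo observations $X^*_1, X^*_2, \ldots, X^*_n$, i.e., $\bar{X}^*_n = n^{-1} \sum_{t=1}^{n} X^*_t$, and $E^*(\bar{X}^*_n)$ is the (conditional on the observations $X_1, X_2, \ldots, X_n$) expected value of $\bar{X}^*_n$. Straightforward calculations yield

$E^*(\bar{X}^*_n) = \frac{1}{N} \Big[ \sum_{t=1}^{n} X_t - \sum_{j=1}^{b-1} \Big(1 - \frac{j}{b}\Big) \big(X_j + X_{n-j+1}\big) \Big].$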
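As a small numerical check of the conditional expectation of the MBB sample mean, the sketch below computes it in two ways: as the average of the means of all overlapping blocks, and through an edge-corrected sum in which the first and last b - 1 observations are down-weighted because they fall in fewer blocks. Both function names and the grid representation of the curves are assumptions made for illustration, and the second form is only claimed for n at least 2b.

```python
def mbb_conditional_mean(curves, b):
    """Average of the means of all n - b + 1 overlapping blocks."""
    n, grid = len(curves), len(curves[0])
    n_blocks = n - b + 1
    total = [0.0] * grid
    for i in range(n_blocks):
        for curve in curves[i:i + b]:
            for g in range(grid):
                total[g] += curve[g]
    return [v / (n_blocks * b) for v in total]


def mbb_conditional_mean_closed_form(curves, b):
    """Edge-corrected form of the same quantity; assumes n >= 2 * b."""
    n, grid = len(curves), len(curves[0])
    n_blocks = n - b + 1
    out = []
    for g in range(grid):
        s = sum(c[g] for c in curves)
        for j in range(1, b):
            # X_j and X_{n-j+1} (1-based) appear in only j of the blocks
            s -= (1 - j / b) * (curves[j - 1][g] + curves[n - j][g])
        out.append(s / n_blocks)
    return out
```

The two functions should agree up to rounding error, which shows that the centring term differs from the plain sample mean only through the edge observations.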

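It is known that, under a variety of dependence assumptions on the underlying mean zero stochastic process $\mathbb{X}$, it holds true that $\sqrt{n}\, \bar{X}_n \Rightarrow \Gamma$ as $n \to \infty$, where $\Gamma$ denotes a Gaussian process with mean zero and long run covariance operator $2\pi \mathcal{F}_0$. Furthermore, $n E(\bar{X}_n \otimes \bar{X}_n) \to 2\pi \mathcal{F}_0$ as $n \to \infty$. Here, $\mathcal{F}_{\omega} = (2\pi)^{-1} \sum_{h \in \mathbb{Z}} C_h e^{-\mathrm{i} h \omega}$, $\omega \in \mathbb{R}$, is the so-called spectral density operator of $\mathbb{X}$ and $C_h$ denotes the lag $h$ autocovariance operator of $\mathbb{X}$, defined by $C_h(\cdot) = E \langle X_t - \mu, \cdot \rangle (X_{t+h} - \mu)$ for any $h \in \mathbb{Z}$; see Panaretos and Tavakoli (2013a,b).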

The following theorem establishes validity of the MBB procedure for approximating the distribution of $\sqrt{n}\, \bar{X}_n$ and for providing a consistent estimator of the long run covariance operator $2\pi \mathcal{F}_0$.

Theorem 2.1.

Suppose that the mean zero stochastic process satisfies Assumption 1 and let be a stretch of pseudo observations generated by the MBB procedure. Assume that the block size satisfies as Then, as

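  1. $d\big( \mathcal{L}(\sqrt{n}(\bar{X}^*_n - E^*(\bar{X}^*_n)) \mid X_1, \ldots, X_n),\ \mathcal{L}(\sqrt{n}\, \bar{X}_n) \big) \to 0$, in probability,

where $d$ is any metric metrizing weak convergence on $L^2$, $\mathcal{L}(X)$ denotes the law of the random element $X$ and $E^*$ denotes expectation conditional on $X_1, \ldots, X_n$. Furthermore,

  2. $\big\| n E^*\big[ (\bar{X}^*_n - E^*(\bar{X}^*_n)) \otimes (\bar{X}^*_n - E^*(\bar{X}^*_n)) \big] - 2\pi \mathcal{F}_0 \big\|_{HS} \to 0$, in probability, where $2\pi \mathcal{F}_0$ is the long run covariance operator.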
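For curves observed on a grid, the MBB estimator of the long run covariance kernel does not require any resampling. If n is a multiple of b, the bootstrap sample mean is an average of n/b i.i.d. block means, so n times its conditional covariance equals b times the covariance of the block means under the uniform distribution on the overlapping blocks. The sketch below uses this identity; the function name and the list-of-lists representation are assumptions made for illustration.

```python
def mbb_long_run_cov(curves, b):
    """n * Cov*(bootstrap sample mean) on the grid, assuming n is a multiple of b.

    Returns a grid-by-grid matrix: b times the covariance of the block means,
    each overlapping block carrying probability 1 / (n - b + 1).
    """
    n, grid = len(curves), len(curves[0])
    n_blocks = n - b + 1
    block_means = [
        [sum(c[g] for c in curves[i:i + b]) / b for g in range(grid)]
        for i in range(n_blocks)
    ]
    center = [sum(m[g] for m in block_means) / n_blocks for g in range(grid)]
    dev = [[m[g] - center[g] for g in range(grid)] for m in block_means]
    return [
        [b * sum(d[u] * d[v] for d in dev) / n_blocks for v in range(grid)]
        for u in range(grid)
    ]
```

With b = 1 the function reduces to the usual sample covariance kernel with divisor n, which ignores all serial dependence; larger block lengths let the lagged covariances enter the estimate.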

2.3 The Tapered Block Bootstrap

The TBB procedure is a modification of the block bootstrap procedure considered in Section 2.2 which is obtained by introducing a tapering of the random elements $X_t$. The tapering function down-weights the endpoints of each block towards zero, i.e., towards the mean function of $\mathbb{X}$. The pseudo observations are then obtained by choosing, with replacement, appropriately scaled and tapered blocks of length $b$ of centered observations and joining them together.

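More precisely, the TBB procedure applied to the functional time series $X_1, X_2, \ldots, X_n$ stemming from a strictly stationary, $L^2$-valued, stochastic process $\mathbb{X}$ with $E X_t = 0$ can be described as follows. Let $Y_1, Y_2, \ldots, Y_n$ be the centered observations, i.e., $Y_t = X_t - \bar{X}_n$, where $\bar{X}_n = n^{-1} \sum_{t=1}^{n} X_t$. Furthermore, let $b = b(n)$, $1 \leq b < n$, be an integer and let $w_n(\cdot)$, $n = 1, 2, \ldots$, be a sequence of so-called data-tapering windows which satisfy the following assumption: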

Assumption 2.

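$w_n(t) \in [0, 1]$ and $w_n(t) = 0$ for $t \notin \{1, 2, \ldots, n\}$. Furthermore,

$w_n(t) = w\Big( \frac{t - 0.5}{n} \Big),$    (3)

where the function $w: \mathbb{R} \to [0, 1]$ fulfills the conditions: $w(t) \in [0, 1]$ for all $t \in \mathbb{R}$ with $w(t) = 0$ if $t \notin [0, 1]$; $w(t) > 0$ for all $t$ in a neighbourhood of $1/2$; $w$ is symmetric around $1/2$; and $w(t)$ is nondecreasing for all $t \in [0, 1/2]$.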

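Let

$B_i = \Big\{ w_b(1) \frac{b^{1/2}}{\|w_b\|_2} Y_i,\ w_b(2) \frac{b^{1/2}}{\|w_b\|_2} Y_{i+1},\ \ldots,\ w_b(b) \frac{b^{1/2}}{\|w_b\|_2} Y_{i+b-1} \Big\}$

be a block of length $b$ starting from $Y_i$, $i = 1, 2, \ldots, N$, $N = n - b + 1$, where each centered observation is multiplied by $w_b(\cdot)$ and scaled by $b^{1/2} / \|w_b\|_2$, with $\|w_b\|_2 = \big( \sum_{t=1}^{b} w_b^2(t) \big)^{1/2}$. Let $I_1, I_2, \ldots, I_k$ be i.i.d. integers selected from a discrete uniform distribution which assigns probability $1/N$ to each element of the set $\{1, 2, \ldots, N\}$. Let $B^*_i = B_{I_i}$, $i = 1, 2, \ldots, k$, and denote the $i$-th block selected by $\{X^*_{(i-1)b+1}, X^*_{(i-1)b+2}, \ldots, X^*_{ib}\}$. Join these blocks together in the order $B^*_1, B^*_2, \ldots, B^*_k$ to form the set of TBB pseudo observations $X^*_1, X^*_2, \ldots, X^*_n$.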
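The tapering and rescaling of blocks can be sketched as follows. Several ingredients are assumptions made for illustration rather than choices taken from this paper: the trapezoidal window with constant c = 0.43, the weights w_b(t) = w((t - 0.5)/b), the inflation factor sqrt(b) divided by the Euclidean norm of the weights, and all function names.

```python
import math
import random


def trapezoid(u, c=0.43):
    """Trapezoidal window on [0, 1]; c = 0.43 is only an illustrative choice."""
    if u < 0.0 or u > 1.0:
        return 0.0
    return min(u / c, 1.0, (1.0 - u) / c)


def taper_weights(b, window=trapezoid):
    """Scaled weights w_b(t) * sqrt(b) / ||w_b||_2 for t = 1, ..., b."""
    w = [window((t - 0.5) / b) for t in range(1, b + 1)]
    norm = math.sqrt(sum(x * x for x in w))
    return [math.sqrt(b) * x / norm for x in w]


def tbb_resample(curves, b, window=trapezoid, rng=None):
    """Draw one TBB pseudo-time series; n is assumed to be a multiple of b."""
    rng = rng or random.Random()
    n, grid = len(curves), len(curves[0])
    if not 1 <= b <= n or n % b != 0:
        raise ValueError("need 1 <= b <= n with n divisible by b")
    mean = [sum(c[g] for c in curves) / n for g in range(grid)]
    centered = [[c[g] - mean[g] for g in range(grid)] for c in curves]
    weights = taper_weights(b, window)
    pseudo = []
    for _ in range(n // b):
        start = rng.randrange(n - b + 1)      # uniform start index (0-based)
        for j in range(b):
            pseudo.append([weights[j] * v for v in centered[start + j]])
    return pseudo
```

The scaled weights have squared sum equal to b, so the tapering redistributes weight within a block without changing its overall scale.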

Notice that the “inflation” factor $b^{1/2} / \|w_b\|_2$ is necessary to compensate for the decrease of the variance of the $X^*_t$’s effected by the shrinking caused by the window $w_b$; see, also, Paparoditis and Politis (2001). Furthermore, the TBB procedure uses the centered time series $Y_1, Y_2, \ldots, Y_n$ instead of the original time series $X_1, X_2, \ldots, X_n$ in order to shrink the end points of the blocks towards zero.

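To estimate the distribution of $\sqrt{n}\, \bar{X}_n$ by means of the TBB procedure, the bootstrap random variable $\sqrt{n}(\bar{X}^*_n - E^*(\bar{X}^*_n))$ is used, where $\bar{X}^*_n = n^{-1} \sum_{t=1}^{n} X^*_t$ and $E^*(\bar{X}^*_n)$ is the (conditional on the observations $X_1, X_2, \ldots, X_n$) expected value of $\bar{X}^*_n$. Straightforward calculations yield

$E^*(\bar{X}^*_n) = \frac{1}{N\, b^{1/2}\, \|w_b\|_2} \sum_{i=1}^{N} \sum_{j=1}^{b} w_b(j)\, Y_{i+j-1}.$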

The following theorem establishes validity of the TBB procedure for approximating the distribution of $\sqrt{n}\, \bar{X}_n$ and for providing a consistent estimator of the long run covariance operator $2\pi \mathcal{F}_0$.

Theorem 2.2.

Suppose that the mean zero stochastic process satisfies Assumption 1 and let be a sequence of data-tapering windows satisfying Assumption 2. Furthermore, let be a stretch of pseudo observations generated by the TBB procedure. Assume that the block size satisfies as . Then, as ,

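  1. $d\big( \mathcal{L}(\sqrt{n}(\bar{X}^*_n - E^*(\bar{X}^*_n)) \mid X_1, \ldots, X_n),\ \mathcal{L}(\sqrt{n}\, \bar{X}_n) \big) \to 0$, in probability,

where $d$ is any metric metrizing weak convergence on $L^2$, and

  2. $\big\| n E^*\big[ (\bar{X}^*_n - E^*(\bar{X}^*_n)) \otimes (\bar{X}^*_n - E^*(\bar{X}^*_n)) \big] - 2\pi \mathcal{F}_0 \big\|_{HS} \to 0$, in probability, where $E^*$ denotes expectation conditional on $X_1, \ldots, X_n$ and $2\pi \mathcal{F}_0$ is the long run covariance operator.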

Remark 2.1.

The asymptotic validity of the MBB and TBB procedures established in Theorem 2.1 and Theorem 2.2, respectively, can be extended to cover also the case where maps of the sample means (in the MBB case) and (in the TBB case) are considered, when is a normed space. For instance, such a result follows as an application of a version of the delta-method appropriate for the bootstrap and for maps which are Hadamard differentiable at tangentially to a subspace of (see Theorem 3.9.11 of van der Vaart and Wellner (1996)). Extensions of such results to almost sure convergence and to other types of differentiable maps, like for instance Fréchet differentiable functionals (see Theorem 3.9.15 of van der Vaart and Wellner (1996)) or quasi-Hadamard differentiable functionals (see Theorem 3.1 of Beutner and Zähe (2016)), are more involved since they depend on the particular map and the verification of some technical conditions.

3 Bootstrap-Based Testing of the Equality of Mean Functions

Among different applications, the MBB and TBB procedures can also be used to test the equality of mean functions between several independent samples of functional time series. In this case, both block bootstrap procedures have to be implemented in such a way that the generated pseudo observations satisfy the null hypothesis of interest.

3.1 The set-up

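Consider $K$ independent functional time series $\{X_{i,t}, t = 1, 2, \ldots, n_i\}$, $i = 1, 2, \ldots, K$, each one of which satisfies

$X_{i,t} = \mu_i + \varepsilon_{i,t}, \quad t = 1, 2, \ldots, n_i,$    (4)

where, for each $i \in \{1, 2, \ldots, K\}$, $\{\varepsilon_{i,t}, t \in \mathbb{Z}\}$ is an $L^p$-$m$-approximable functional process and $n_i$ denotes the length of the $i$-th time series. Let $M = \sum_{i=1}^{K} n_i$ be the total number of observations and note that $\mu_i$ is the mean function of the $i$-th functional time series, $i = 1, 2, \ldots, K$. The null hypothesis of interest is then,

$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$

and the alternative hypothesis

$H_1: \mu_k \neq \mu_m$ for at least one pair $(k, m)$ with $k \neq m$.

Notice that the above equality is in $L^2$, i.e., $\mu_k = \mu_m$ means that $\|\mu_k - \mu_m\| = 0$ whereas $\mu_k \neq \mu_m$ that $\|\mu_k - \mu_m\| > 0$.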

3.2 Block Bootstrap-based testing

The aim is to generate a set of functional pseudo observations $X^+_{i,t}$, $t = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, K$, using either the MBB procedure or the TBB procedure in such a way that $H_0$ is satisfied. These bootstrap pseudo-time series can then be used to estimate the distribution of some test statistic $T_M$ of interest which is applied to test $H_0$. Toward this, the distribution of $T^+_M$ is used as an estimator of the distribution of $T_M$, where $T^+_M$ is the same statistical functional as $T_M$ but calculated using the bootstrap functional pseudo-time series $X^+_{i,t}$.

To implement the MBB procedure for testing the null hypothesis of interest, assume, without loss of generality, that the test statistic $T_M$ rejects the null hypothesis when $T_M > d_{\alpha}$ where, for $\alpha \in (0, 1)$, $d_{\alpha}$ denotes the upper $\alpha$-percentage point of the distribution of $T_M$ under $H_0$. The MBB-based testing procedure then goes as follows:

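  • Step 1: Calculate the sample mean functions in each population and the pooled mean function, i.e., calculate $\bar{X}_{i,n_i} = n_i^{-1} \sum_{t=1}^{n_i} X_{i,t}$, for $i = 1, 2, \ldots, K$, and $\bar{X}_M = M^{-1} \sum_{i=1}^{K} \sum_{t=1}^{n_i} X_{i,t}$, and obtain the residual functions in each population, i.e., calculate $\hat{\varepsilon}_{i,t} = X_{i,t} - \bar{X}_{i,n_i}$, for $t = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, K$.

  • Step 2: For $i = 1, 2, \ldots, K$, let $b_i = b_i(n_i)$ be the block size for the $i$-th functional time series and divide $\{\hat{\varepsilon}_{i,t}, t = 1, 2, \ldots, n_i\}$ into $N_i = n_i - b_i + 1$ overlapping blocks of length $b_i$, say, $B_{i,1}, B_{i,2}, \ldots, B_{i,N_i}$. Calculate the sample mean of the $\xi$-th observations of the blocks $B_{i,1}, B_{i,2}, \ldots, B_{i,N_i}$, i.e., $\bar{\varepsilon}_{i,\xi} = N_i^{-1} \sum_{t=1}^{N_i} \hat{\varepsilon}_{i,\xi+t-1}$, for $\xi = 1, 2, \ldots, b_i$.

  • Step 3: For simplicity assume that $n_i = k_i b_i$ and for $i = 1, 2, \ldots, K$, let $q^i_1, q^i_2, \ldots, q^i_{k_i}$ be i.i.d. integers selected from a discrete probability distribution which assigns the probability $1/N_i$ to each element of the set $\{1, 2, \ldots, N_i\}$. Generate bootstrap functional pseudo observations $X^+_{i,t}$, $t = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, K$, as $X^+_{i,\xi+(s-1)b_i} = \bar{X}_M + \varepsilon^+_{i,\xi+(s-1)b_i}$, $s = 1, 2, \ldots, k_i$, $\xi = 1, 2, \ldots, b_i$, where

    $\varepsilon^+_{i,\xi+(s-1)b_i} = \hat{\varepsilon}_{i,q^i_s+\xi-1} - \bar{\varepsilon}_{i,\xi}.$    (5)

  • Step 4: Let $T^+_M$ be the same statistic as $T_M$ but calculated using the bootstrap functional pseudo-time series $X^+_{i,t}$, $t = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, K$. Denote by $D^*_{M,T}$ the distribution of $T^+_M$ given the observations. For $\alpha \in (0, 1)$, reject the null hypothesis $H_0$ if $T_M > d^*_{\alpha}$, where $d^*_{\alpha}$ denotes the upper $\alpha$-percentage point of the distribution of $T^+_M$, i.e., $P(T^+_M > d^*_{\alpha} \mid \text{observations}) = \alpha$.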

Note that the distribution can be evaluated by Monte-Carlo.
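A Monte Carlo version of the MBB-based test can be sketched as below. The sketch rests on several assumptions that go beyond the text: curves are stored as lists of values on a common grid, each sample size is a multiple of its block length, the pooled mean weights all observations equally, the test statistic `between_group_stat` is a hypothetical L2-type choice, and the decision is reported through a bootstrap p-value rather than through the critical value.

```python
import random


def mean_curve(curves):
    """Pointwise mean of a list of curves on a common grid."""
    n = len(curves)
    return [sum(c[g] for c in curves) / n for g in range(len(curves[0]))]


def mbb_null_sample(samples, block_lengths, rng):
    """One set of K pseudo-time series, centred so that H0 holds conditionally."""
    pooled = mean_curve([c for s in samples for c in s])
    grid = len(pooled)
    out = []
    for curves, b in zip(samples, block_lengths):
        n = len(curves)
        n_blocks = n - b + 1
        m = mean_curve(curves)
        resid = [[c[g] - m[g] for g in range(grid)] for c in curves]
        # mean of the residuals in position s of the blocks, s = 0, ..., b - 1
        pos_mean = [mean_curve(resid[s:s + n_blocks]) for s in range(b)]
        pseudo = []
        for _ in range(n // b):
            start = rng.randrange(n_blocks)
            for s in range(b):
                pseudo.append([pooled[g] + resid[start + s][g] - pos_mean[s][g]
                               for g in range(grid)])
        out.append(pseudo)
    return out


def between_group_stat(samples):
    """Hypothetical statistic: sum_i n_i * ||mean_i - pooled mean||^2 (Riemann sum)."""
    pooled = mean_curve([c for s in samples for c in s])
    grid = len(pooled)
    return sum(
        len(s) * sum((a - p) ** 2 for a, p in zip(mean_curve(s), pooled)) / grid
        for s in samples
    )


def mbb_test(samples, block_lengths, stat=between_group_stat, n_boot=499, seed=0):
    """Return the observed statistic and a Monte Carlo bootstrap p-value."""
    rng = random.Random(seed)
    observed = stat(samples)
    exceed = sum(
        stat(mbb_null_sample(samples, block_lengths, rng)) >= observed
        for _ in range(n_boot)
    )
    return observed, (1 + exceed) / (1 + n_boot)
```

Rejecting when the returned p-value is below the nominal level plays the role of comparing the statistic with the bootstrap critical value; any other statistic that maps the list of samples to a number can be passed through `stat`.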

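To motivate the centering used in Step 3, denote, for $i = 1, 2, \ldots, K$, by $\varepsilon^*_{i,t}$, $t = 1, 2, \ldots, n_i$, the pseudo observations generated by applying the MBB procedure, described in Section 2.2, directly to the residual time series $\hat{\varepsilon}_{i,t}$, $t = 1, 2, \ldots, n_i$. Note that the $\varepsilon^*_{i,t}$’s differ from the $\varepsilon^+_{i,t}$’s used in (5) by the fact that the latter are obtained after centering. The sample mean $\bar{\varepsilon}_{i,\xi}$, $\xi = 1, 2, \ldots, b_i$, calculated in Step 2, is the (conditional on the observations) expected value of the pseudo observations $\varepsilon^*_{i,\xi+(s-1)b_i}$, where $s = 1, 2, \ldots, k_i$. Furthermore, for $i = 1, 2, \ldots, K$, we generate the $\varepsilon^+_{i,\xi+(s-1)b_i}$’s, $s = 1, 2, \ldots, k_i$, by subtracting $\bar{\varepsilon}_{i,\xi}$ from $\varepsilon^*_{i,\xi+(s-1)b_i}$. This is done in order for the (conditional on the observations) expected value of $\varepsilon^+_{i,t}$ to be zero. In this way, the generated set of pseudo time series satisfies the null hypothesis $H_0$. In particular, given the observations, we have

$E^*(X^+_{i,t}) = \bar{X}_M + E^*(\varepsilon^+_{i,t}) = \bar{X}_M$

for $t = 1, 2, \ldots, n_i$ and $i = 1, 2, \ldots, K$. That is, conditional on the observations, the mean function of each functional pseudo-time series is identical in each population and equal to the pooled sample mean function $\bar{X}_M$.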

An algorithm based on the TBB procedure for testing the same pair of hypotheses can also be implemented by appropriately modifying the MBB-based testing algorithm. In particular, we replace Step 2 and Step 3 of this algorithm by the following steps:

  • For , let be the block size for the -th functional time series and Let also be the centered values of i.e., , where Also, let be a sequence of data-tapering windows satisfying Assumption 2. Now, for , let

    where Here, denotes the tapered block of ’s of length starting from Furthermore, for , calculate the sample mean of the th observations of the blocks , i.e.,

  • For let be i.i.d. integers selected from a discrete probability distribution which assigns the probability to each Generate bootstrap functional pseudo-observations according to where

As in the case of the MBB-based testing, the generation of ensures that the functional pseudo-time series satisfy that is, given , we have that

3.3 Bootstrap Validity

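Notice that, since the proposed block bootstrap-based methodologies are not designed with any particular test statistic in mind, they can be potentially applied to a wide range of test statistics. To prove validity of the proposed block bootstrap-based testing procedures, however, a particular test statistic has to be considered. For instance, one such test statistic is the fully functional test statistic proposed by Horváth et al. (2013) for the case of $K = 2$ populations. Let $\{X_{1,t}, t = 1, 2, \ldots, n_1\}$ and $\{X_{2,t}, t = 1, 2, \ldots, n_2\}$ be two independent samples of curves, satisfying model (4). For $i \in \{1, 2\}$ and for $(u, v) \in [0, 1]^2$, denote by $c_i(u, v)$ the kernels of the long run covariance operators $2\pi \mathcal{F}^{(i)}_0$, given by $c_i(u, v) = E[\varepsilon_{i,0}(u) \varepsilon_{i,0}(v)] + \sum_{j \geq 1} E[\varepsilon_{i,0}(u) \varepsilon_{i,j}(v)] + \sum_{j \geq 1} E[\varepsilon_{i,0}(v) \varepsilon_{i,j}(u)]$. The test statistic considered in Horváth et al. (2013) then evaluates the $L^2$-distance of the two sample mean functions and it is given by

$U_M = \frac{n_1 n_2}{M} \int \big( \bar{X}_{1,n_1}(\tau) - \bar{X}_{2,n_2}(\tau) \big)^2 d\tau,$

where $M = n_1 + n_2$. Horváth et al. (2013) proved that if $\min\{n_1, n_2\} \to \infty$ and $n_1 / M \to \theta \in (0, 1)$ then, under $H_0$, $U_M$ converges weakly to $\int \Gamma^2(\tau)\, d\tau$, where $\{\Gamma(\tau), \tau \in [0, 1]\}$ is a Gaussian process satisfying $E \Gamma(\tau) = 0$ and $E[\Gamma(u) \Gamma(v)] = (1 - \theta) c_1(u, v) + \theta c_2(u, v)$. Notice that calculation of critical values of the above test requires estimation of the distribution of $\int \Gamma^2(\tau)\, d\tau$, which is a difficult task.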
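For curves recorded on an equispaced grid on the unit interval, the two-sample statistic can be approximated as in the following sketch. The Riemann-sum approximation of the integral, the grid layout and the function name are assumptions made for illustration.

```python
def two_sample_l2_stat(sample1, sample2):
    """n1 * n2 / (n1 + n2) times the integrated squared difference of the means.

    Curves are assumed to be sampled on a common equispaced grid on [0, 1],
    so the integral is replaced by an average over the grid points.
    """
    n1, n2, grid = len(sample1), len(sample2), len(sample1[0])
    mean1 = [sum(c[g] for c in sample1) / n1 for g in range(grid)]
    mean2 = [sum(c[g] for c in sample2) / n2 for g in range(grid)]
    integral = sum((a - b) ** 2 for a, b in zip(mean1, mean2)) / grid
    return n1 * n2 / (n1 + n2) * integral
```

The statistic is zero when the two sample mean curves coincide on the grid and grows with both the squared distance between them and the sample sizes.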

Although the test statistic is quite appealing because it is fully functional, its limiting distribution is difficult to use in practice, which demonstrates the importance of the bootstrap. To investigate the consistency properties of the bootstrap, we first establish a general result which allows for the consideration of different test statistics that can be expressed as functionals of the basic deviation process

(6)

Theorem 3.1.

Let Assumption 1 be satisfied. Assume that and that, for , the block size fulfills as . Then, conditional on , as ,

  1. [label=()]

  2. , in probability,

and, if additionally Assumption 2 is satisfied,

  1. [label=()]

  2. , in probability.

Here, denotes weak convergence in .

By Theorem 3.1 and the continuous mapping theorem, the suggested block bootstrap-based testing procedures can be successfully applied to consistently estimate the distribution of any test statistic of interest which is a continuous function of the basic deviation process (6). We elaborate on some examples. Below, denotes the distribution function of the random variable when is true.

Consider for instance the test statistic . Let

and

where and , . We then have the following result.

Corollary 3.1.

Let the assumptions of Theorem 3.1 be satisfied. Then,

  1. [label=()]