This research was partially supported by NSF grants DMS 1305858 and DMS 1407530, by the Collaborative Research Center "Statistical modeling of nonlinear dynamic processes" (Sonderforschungsbereich 823, Teilprojekt A1, C1) and by the Research Training Group "High-dimensional Phenomena in Probability - Fluctuations and Discontinuity" (RTG 2131) of the German Research Foundation. Part of the research was done while A. Aue was visiting Ruhr-Universität Bochum as a Simons Visiting Professor of the Mathematical Research Institute Oberwolfach.

# Functional data analysis in the Banach space of continuous functions

## Abstract

Functional data analysis is typically conducted within the $L^2$-Hilbert space framework. There is by now a fully developed statistical toolbox allowing for the principled application of the functional data machinery to real-world problems, often based on dimension reduction techniques such as functional principal component analysis. At the same time, there have recently been a number of publications that sidestep dimension reduction steps and focus on a fully functional $L^2$-methodology. This paper goes one step further and develops data analysis methodology for functional time series in the space of all continuous functions. The work is motivated by the fact that objects with rather different shapes may still have a small $L^2$-distance and are therefore identified as similar when using an $L^2$-metric. However, in applications it is often desirable to use metrics reflecting the visualization of the curves in the statistical analysis. The methodological contributions are focused on developing two-sample and change-point tests as well as confidence bands, as these procedures appear to be conducive to the proposed setting. Particular interest is put on relevant differences; that is, on not trying to test for exact equality, but rather for pre-specified deviations under the null hypothesis.

The procedures are justified through large-sample theory. To ensure practicability, non-standard bootstrap procedures are developed and investigated addressing particular features that arise in the problem of testing relevant hypotheses. The finite sample properties are explored through a simulation study and an application to annual temperature profiles.

Keywords: Banach spaces; Functional data analysis; Time series; Relevant hypotheses; Two-sample tests; Change-point tests; Bootstrap

MSC subject classifications: 62G10, 62G15, 62M10

## 1 Introduction

This paper proposes new methodology for the analysis of functional data, in particular for the two-sample and change-point settings. The basic set-up considers a sequence of Banach space-valued time series satisfying mixing conditions. The proposed methodology therefore advances functional data analysis beyond the predominant Hilbert space-based methodology. For the latter case, there exists by now a fully fledged theory. The interested reader is referred to the various monographs Ferraty and Vieu [21], Horváth and Kokoszka [26], and Ramsay and Silverman [34] for up-to-date accounts. Most of the available statistical procedures discussed in these monographs are based on dimension reduction techniques such as functional principal component analysis. However, the integral role of smoothness has been discussed at length in Ramsay and Silverman [34] and virtually all functions fit in practice are at least continuous. In such cases dimension reduction techniques can incur a loss of information and fully functional methods can prove advantageous. More recently, Aue et al. [6], Bucchia and Wendler [12] and Horváth et al. [28] discussed fully functional methodology in a Hilbert space framework.

Since all functions utilized for practical purposes are at least continuous, and often smoother than that, it might be more natural to develop methodology for functional data in the space of continuous functions. This is the approach pursued in the present paper. While it might thus be reasonable to build statistical analysis adopting this point of view, there are certain difficulties associated with it. Giving up on the theoretically convenient Hilbert space setting means that substantially more effort has to be put into the derivation of theoretical results, especially if one is interested in the incorporation of dependent functional observations. Section 2 of the main part of this paper gives an introduction to Banach space methodology and states some basic results, in particular an invariance principle for a sequential process in the space of continuous functions.

The theoretical contributions will be utilized for the development of relevant two-sample and change-point tests in Sections 3 and 4, respectively. Here the usefulness of the proposed approach becomes more apparent as differences between two smooth curves are hard to detect in practice. Additionally, small discrepancies might not even be of importance in many applied situations. Therefore the “relevant” setting is adopted, which does not test for exact equality under the null hypothesis but allows for pre-specified deviations from an assumed null function. For example, if $C([0,1])$, the space of continuous functions on the compact interval $[0,1]$, is equipped with the sup-norm $\|\cdot\|$, and $\mu_1$ and $\mu_2$ are the mean functions corresponding to two samples, interest is in hypotheses of the form

$$H_0\colon \|\mu_1-\mu_2\| \le \Delta \quad\text{versus}\quad H_1\colon \|\mu_1-\mu_2\| > \Delta,$$

where $\Delta \ge 0$ denotes a pre-specified constant. The classical case of testing perfect equality, obtained by the choice $\Delta = 0$, is therefore a special case of these relevant hypotheses. However, in applications it might be reasonable to think about this choice carefully and to define precisely the size of change which one is really interested in. In particular, testing relevant hypotheses avoids the consistency problem mentioned in Berkson [9], that is: any consistent test will detect any arbitrarily small change in the mean functions if the sample size is sufficiently large. One may also view this perspective as a particular form of a bias-variance trade-off. The problem of testing for a relevant difference between two (one-dimensional) means and other (finite-dimensional) parameters has been discussed by numerous authors in biostatistics (see Wellek [41] for a recent review), but to the best of our knowledge these testing problems have not been considered in the context of functional data. It turns out that from a mathematical point of view the problem of testing relevant (i.e., $\Delta > 0$) hypotheses is substantially more difficult than the classical problem (i.e., $\Delta = 0$). In particular, it is not possible to work with stationarity under the null hypothesis, making the derivation of a limit distribution of a corresponding test statistic or the construction of a bootstrap procedure substantially more difficult.

Section 3 develops corresponding two-sample tests for the Banach space $C([0,1])$. Section 4 extends these results to the change-point setting (see Aue and Horváth [5] for a recent review of change-point methodology for time series). Here, one has to deal with the additional complexity of locating the unknown time of change. Several new results for change-point analysis of functional data in $C([0,1])$ are put forward. A specific challenge here is the fact that the asymptotic null distribution of test statistics for hypotheses of the type $H_0\colon \|\mu_1-\mu_2\| \le \Delta$ depends on the set of extremal points of the unknown difference $\mu_1-\mu_2$ of the mean functions, and is therefore not distribution free. Most notable for both the two-sample and the change-point problem is the construction of non-standard bootstrap tests for relevant hypotheses to solve this problem. The bootstrap is theoretically validated and then used to determine cut-off values for the proposed procedures.

Another area of application that lends itself naturally to Banach space methodology is that of constructing confidence bands for the mean function of a collection of potentially temporally dependent, continuous functions. There has been recent work by Choi and Reimherr [17] on this topic in a Hilbert space framework for functional parameters of independent functions based on geometric considerations. Here, results for confidence bands for the mean difference in a two-sample framework are added in Section ?. Natural modifications allow for the inclusion of the one-sample case. One of the main differences between the two approaches is that the proposed bands hold pointwise, while those constructed from Hilbert space theory are valid only in an $L^2$-sense. This property is appealing for practitioners, because two mean curves can have a rather different shape, yet the $L^2$-norm of their difference might be very small.
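The contrast between the two norms can be checked numerically. The following is a minimal sketch, not part of the original analysis: the grid, the base curve and the narrow bump are hypothetical choices made purely for illustration, and the $L^2$-distance is approximated by the trapezoidal rule.

```python
import numpy as np

def sup_distance(f, g):
    """Sup-norm distance between two curves sampled on a common grid."""
    return float(np.max(np.abs(f - g)))

def l2_distance(f, g, grid):
    """L2([0,1]) distance, approximated by the trapezoidal rule."""
    sq = (f - g) ** 2
    return float(np.sqrt(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(grid))))

grid = np.linspace(0.0, 1.0, 2001)
smooth = np.sin(2 * np.pi * grid)
# Hypothetical perturbation: a bump of height 1 and width of order 0.01.
spiked = smooth + np.exp(-0.5 * ((grid - 0.5) / 0.01) ** 2)

print(sup_distance(smooth, spiked))       # close to 1
print(l2_distance(smooth, spiked, grid))  # much smaller than the sup distance
```

The two curves look clearly different near $t = 0.5$, which the sup-norm registers, while the $L^2$-distance stays small because the bump is supported on a very short interval.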

The finite-sample properties of the relevant two-sample and change-point tests and, in particular, the performance of the bootstrap procedures are evaluated with the help of a Monte Carlo simulation study in Section 5. A number of scenarios are investigated, with the outcomes showing that the proposed methodology performs reasonably well. Furthermore, an application to a prototypical data example is given, namely two-sample and change-point tests for annual temperature profiles recorded at measuring stations in Australia.

The outline of the rest of this paper is as follows. Section 2 introduces the basic notions of the proposed Banach space methodology and gives some preliminary results. Section 3 discusses the two-sample problem and Section 4 is concerned with change-point analysis. Empirical aspects are highlighted in Section 5. Proofs of the main results can be found in an online supplement to this paper.

## 2 $C(T)$-valued random variables

In this section some basic facts are provided about central limit theorems and invariance principles for $C(T)$-valued random variables, where $C(T)$ is the set of continuous functions from $T$ into the real line $\mathbb{R}$. In what follows, unless otherwise mentioned, $C(T)$ will be equipped with the sup norm $\|\cdot\|$, defined by $\|f\| = \sup_{t \in T} |f(t)|$, thus making $(C(T), \|\cdot\|)$ a Banach space. The natural Borel $\sigma$-field $\mathcal{B}(T)$ over $C(T)$ is then generated by the open sets relative to the sup norm $\|\cdot\|$. Measurability of random variables on $(\Omega, \mathcal{A}, \mathbb{P})$ taking values in $C(T)$ is understood to be with respect to $\mathcal{B}(T)$. The underlying probability space $(\Omega, \mathcal{A}, \mathbb{P})$ is assumed complete. It is further assumed that there is a metric $\rho$ on $T$ such that $(T, \rho)$ is totally bounded. The fact that $T$ is metrizable implies that $C(T)$ is separable and measurability issues are avoided (see Theorem 7.7 in Janson and Kaijser [30]). Moreover, any random variable in $C(T)$ is tight (see Theorem 1.3 in Billingsley [10]).

Let $X$ be a random variable on $(\Omega, \mathcal{A}, \mathbb{P})$ taking values in $C(T)$. There are different ways to formally introduce expectations and higher-order moments of Banach space-valued random variables (see Janson and Kaijser [30]). The expectation $\mathbb{E}[X]$ of a random variable $X$ in $C(T)$ exists as an element of $C(T)$ whenever $\mathbb{E}[\|X\|] < \infty$. The $k$th moment exists whenever $\mathbb{E}[\|X\|^k] < \infty$. As pointed out in Chapter 11 of Janson and Kaijser [30], $k$th order moments may be computed through pointwise evaluation as $\mathbb{E}[X(t_1) \cdots X(t_k)]$. The case $k = 2$ is important as it allows for the computation of covariance kernels in a pointwise fashion.
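The pointwise computation of first and second moments can be sketched as follows for curves observed on a common grid. This is a minimal illustration under stated assumptions: the function name `pointwise_moments` and the array layout are hypothetical, and the kernel computed here is the ordinary empirical covariance, not the long-run covariance needed for dependent data.

```python
import numpy as np

def pointwise_moments(curves):
    """Empirical mean function and covariance kernel on a grid.

    curves: array of shape (n, p); row j holds X_j(t_1), ..., X_j(t_p).
    Returns the mean vector (p,) and the kernel matrix (p, p) whose
    (i, k) entry estimates Cov(X(t_i), X(t_k)).
    """
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / curves.shape[0]
    return mean, cov

# Tiny example: two curves observed at two grid points.
mean, cov = pointwise_moments([[0.0, 0.0], [2.0, 4.0]])
print(mean)  # pointwise average of the two rows
print(cov)   # symmetric 2 x 2 kernel matrix
```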

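A sequence $(X_n \colon n \in \mathbb{N})$ of random variables converges in distribution or weakly to a random variable $X$ in $C(T)$, whenever it is asymptotically tight and its finite-dimensional distributions converge weakly to the finite-dimensional distributions of $X$, that is,

$$(X_n(t_1), \ldots, X_n(t_k)) \Rightarrow (X(t_1), \ldots, X(t_k))$$

for any $t_1, \ldots, t_k \in T$ and any $k \in \mathbb{N}$, where the symbol “$\Rightarrow$” indicates convergence in distribution in $\mathbb{R}^k$.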

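A centered random variable $X$ in $C(T)$ is said to be Gaussian if its finite-dimensional distributions are multivariate normal, that is, for any $t_1, \ldots, t_k$, $(X(t_1), \ldots, X(t_k)) \sim \mathcal{N}_k(0, \Sigma)$, where the $(i,j)$th entry of the covariance matrix $\Sigma$ is given by $\operatorname{Cov}(X(t_i), X(t_j))$, $i, j = 1, \ldots, k$. The distribution of $X$ is hence completely characterized by its covariance function $k(t, t') = \operatorname{Cov}(X(t), X(t'))$; see Chapter 2 of Billingsley [10].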

In general Banach spaces, deriving conditions under which the central limit theorem (CLT) holds is a difficult task, significantly more complex than the counterpart for real-valued random variables. In Banach spaces, finiteness of second moments of the underlying random variables does not provide a necessary and sufficient condition. Elaborate theory has been developed to resolve the issue, resulting in notions of Banach spaces of type 2 and cotype 2 (see the book Ledoux and Talagrand [31] for an overview). However, the Banach space of continuous functions on a compact interval does not possess the requisite type and cotype properties and further assumptions are needed in order to obtain the CLT, especially to incorporate time series of continuous functions into the framework. To model the dependence of the observations, the notion of $\varphi$-mixing triangular arrays $(X_{n,j} \colon n \in \mathbb{N},\ j = 1, \ldots, n)$ of $C(T)$-valued random variables is introduced; see Bradley [11] and Samur [38]. First, for any two $\sigma$-fields $\mathcal{F}$ and $\mathcal{G}$, define

$$\phi(\mathcal{F}, \mathcal{G}) = \sup\big\{ |\mathbb{P}(G \mid F) - \mathbb{P}(G)| \colon F \in \mathcal{F},\ G \in \mathcal{G},\ \mathbb{P}(F) > 0 \big\},$$

where $\mathbb{P}(G \mid F)$ denotes the conditional probability of $G$ given $F$. Next, denote by $\mathcal{F}^n_{k,k'}$ the $\sigma$-field generated by $(X_{n,j} \colon k \le j \le k')$. Then, define the $\varphi$-mixing coefficient as

$$\varphi(k) = \sup_{n \in \mathbb{N},\, n > k}\ \max_{k' = 1, \ldots, n-k} \phi\big(\mathcal{F}^n_{1,k'}, \mathcal{F}^n_{k'+k,n}\big)$$

and call the triangular array $\varphi$-mixing whenever $\lim_{k \to \infty} \varphi(k) = 0$. The $\varphi$-mixing property is defined in a similar fashion for a sequence of random variables.

In order to obtain a CLT as well as an invariance principle for triangular arrays of -mixing random elements in , the following conditions are imposed.

Note that these assumptions can be formulated for sequences of random variables in in a similar way. Condition (A5) is satisfied if the distribution of the sums is symmetric for any and (see the remark after Proposition 3.1 in Samur [37]). Assumptions (A1)–(A4) imply the following CLT which is proved in Section 6.2 of the online supplement. Throughout this paper the symbol denotes weak convergence in for some .

Assumption (A5) will be used to verify a weak invariance principle for the process given by

useful for the change-point analysis proposed in Section 4. Note that the process is an element of the Banach space , where the norm on this space is given by

(note that each element of can equivalently be regarded as an element of ). Here and throughout this paper the notation is used to denote any of the arising -norms as the corresponding space can be identified from the context. The proof of the following result is postponed to Section 6.2 of the online supplement.
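To make the sequential process concrete, here is a minimal sketch of how a linearly interpolated partial-sum process can be evaluated on a grid. The displayed definition is not available in this text, so the form used here, $n^{-1/2}\sum_{j \le \lfloor sn \rfloor}(X_j - \mu) + n^{1/2}(s - \lfloor sn \rfloor / n)(X_{\lfloor sn \rfloor + 1} - \mu)$, is an assumption, as are the function name and the array layout.

```python
import numpy as np

def sequential_process(curves, mean, s_grid):
    """Evaluate an interpolated partial-sum process on s_grid x (time grid).

    curves: array (n, p), row j = X_j on a grid of p points.
    mean:   array (p,) or scalar, the centering function (assumed known here).
    Returns an array of shape (len(s_grid), p).
    """
    X = np.asarray(curves, dtype=float) - mean
    n, p = X.shape
    # csum[k] = X_1 + ... + X_k, with csum[0] = 0.
    csum = np.vstack([np.zeros((1, p)), np.cumsum(X, axis=0)])
    out = np.empty((len(s_grid), p))
    for i, s in enumerate(s_grid):
        k = min(int(np.floor(s * n)), n)
        frac = s * n - k                      # position between two knots
        nxt = X[k] if k < n else np.zeros(p)  # X_{k+1}, absent at s = 1
        out[i] = (csum[k] + frac * nxt) / np.sqrt(n)
    return out

curves = np.array([[1.0], [2.0], [3.0], [4.0]])
print(sequential_process(curves, 0.0, [0.0, 0.375, 0.5, 1.0]))
```

The interpolation term makes each path continuous in $s$, which is what allows the process to be treated as a random element of a space of continuous functions on $[0,1] \times T$.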

## 3 The two-sample problem

From now on, consider the case $T = [0,1]$, as this is the canonical choice for functional data analysis. Two-sample problems have a long history in statistics and the corresponding tests are among the most applied statistical procedures. For the functional setting, there have been a number of contributions as well. Two are worth mentioning in the present context. Hall and Van Keilegom [24] studied the effect of smoothing when converting discrete observations into functional data. Horváth et al. [27] introduced two-sample tests for $L^p$-$m$-approximable functional time series based on Hilbert-space theory. In the following, a two-sample test is proposed in the Banach-space framework of Section 2. To this end, consider two independent samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ of $C([0,1])$-valued random variables. Under (A2) in Assumption ? expectation functions and covariance kernels exist and are denoted by $\mu_1$ and $\mu_2$, and $k_1$ and $k_2$, respectively. Interest is then in the size of the maximal deviation

$$d_\infty = \|\mu_1 - \mu_2\| = \sup_{t \in [0,1]} |\mu_1(t) - \mu_2(t)|$$

between the two mean curves, that is, in testing the hypotheses of a relevant difference

$$H_0\colon d_\infty \le \Delta \quad\text{versus}\quad H_1\colon d_\infty > \Delta,$$

where $\Delta \ge 0$ is a pre-specified constant determined by the user of the test. Note again that the “classical” two-sample problem $H_0\colon \mu_1 \equiv \mu_2$ versus $H_1\colon \mu_1 \not\equiv \mu_2$, which to the best of our knowledge has not been investigated for $C([0,1])$-valued data yet, is contained in this setup as the special case $\Delta = 0$. Observe also that tests for relevant differences between two finite-dimensional parameters corresponding to different populations have been considered mainly in the biostatistical literature, for example in Wellek [41]. It is assumed throughout this section that the samples are balanced in the sense that

$$\frac{m}{n+m} \longrightarrow \lambda \in (0,1)$$

as $m, n \to \infty$. Additionally, let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be sampled from independent time series $(X_j)_{j \in \mathbb{N}}$ and $(Y_j)_{j \in \mathbb{N}}$ that satisfy conditions (A1)–(A4) of Assumption ?. Under these conditions both functional time series satisfy the CLT and it then follows from Theorem ? that

where and are independent, centered Gaussian processes possessing covariance functions

respectively. Here and , correspond to the respective sequences and and are defined in Assumption ?. Now, the weak convergence in and the independence of the samples imply immediately that

in as , where is a centered Gaussian process with covariance function

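Under this weak convergence the statistic

$$\hat d_\infty = \|\bar X_m - \bar Y_n\| = \sup_{t \in [0,1]} |\bar X_m(t) - \bar Y_n(t)|,$$

where $\bar X_m$ and $\bar Y_n$ denote the sample mean functions of the two samples, is a reasonable estimator of the maximal deviation $d_\infty = \|\mu_1 - \mu_2\|$, and the null hypothesis of no relevant difference is rejected for large values of $\hat d_\infty$. In order to develop a test with a pre-specified asymptotic level, the limit distribution of $\hat d_\infty$ is determined in the following. For this purpose, let

$$\mathcal{E}^{\pm} = \big\{ t \in [0,1] \colon \mu_1(t) - \mu_2(t) = \pm d_\infty \big\}$$

if $d_\infty > 0$, and define $\mathcal{E}^+ = \mathcal{E}^- = [0,1]$ if $d_\infty = 0$. Finally, denote by $\mathcal{E} = \mathcal{E}^+ \cup \mathcal{E}^-$ the set of extremal points of the difference $\mu_1 - \mu_2$ of the two mean functions. The first main result establishes the asymptotic distribution of the statistic $\hat d_\infty$.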

It should be emphasized that the limit distribution depends in a complicated way on the set of extremal points of the difference and is therefore not distribution free, even in the case of i.i.d. data. In particular, there can be two sets of processes with corresponding mean functions and such that . However, the respective limit distributions in Theorem ? will be entirely different if the corresponding sets of extremal points and do not coincide. The proof of Theorem ? is given in Section 6.3 of the online supplement. In the case , and it follows for the random variable in Theorem ? that

Here the result is a simple consequence of the weak convergence of the process (see Theorem ?) and the continuous mapping theorem.

However, Theorem ? provides also the distributional properties of the statistic in the case . This is required for testing the hypotheses of a relevant difference between the two mean functions (that is, the hypotheses in with ) of primary interest here. In this case the weak convergence of an appropriately standardized version of does not follow from the weak convergence , as the process inside the supremum in is not centered. In fact, additional complexity enters in the proofs because even under the null hypothesis observations cannot be easily centered. For details, refer to Section 6.3 of the online supplement.
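The plug-in estimator of the maximal deviation is straightforward to compute once both samples are observed on a common grid. The following minimal sketch assumes exactly that; the function name and array layout are hypothetical, and the supremum over $[0,1]$ is approximated by a maximum over grid points.

```python
import numpy as np

def sup_norm_two_sample(x_curves, y_curves):
    """Plug-in estimate of the maximal deviation between two mean curves.

    x_curves: array (m, p), y_curves: array (n, p), same grid of p points.
    Returns (d_hat, diff), where diff is the difference of the sample mean
    functions and d_hat is its maximum absolute value over the grid.
    """
    diff = np.mean(x_curves, axis=0) - np.mean(y_curves, axis=0)
    return float(np.max(np.abs(diff))), diff

x = np.array([[0.0, 1.0], [2.0, 3.0]])
y = np.array([[1.0, 1.0], [1.0, 1.0]])
d_hat, diff = sup_norm_two_sample(x, y)
print(d_hat, diff)
```

Because the process inside the supremum is not centered when the true deviation is positive, the value of `d_hat` alone does not give a test; a critical value that accounts for the extremal set is still required.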

### 3.1 Asymptotic inference

#### Testing the classical hypothesis $H_0\colon \mu_1 \equiv \mu_2$

Theorem ? also provides the asymptotic distributions of the test statistic in the case of two identical mean functions, that is, if . This is the situation investigated in Hall and Van Keilegom [24] and Horváth et al. [27] in Hilbert-space settings. Here it corresponds to the special case and thus . Consequently,

where the random variable is defined in . An asymptotic level test for the classical hypotheses

may hence be obtained by rejecting whenever

where is the -quantile of the distribution of the random variable defined in . Using Theorem ? it is easy to see that the test defined by is consistent and has asymptotic level .

#### Confidence bands

The methodology developed so far can easily be applied to the construction of simultaneous asymptotic confidence bands for the difference of the mean functions. There is a rich literature on confidence bands for functional data in Hilbert spaces. The available work includes Degras [18], who dealt with confidence bands for nonparametric regression with functional data; Cao et al. [16], who studied simultaneous confidence bands for the mean of dense functional data based on polynomial spline estimators; Cao [15], who developed simultaneous confidence bands for derivatives of functional data when multiple realizations are at hand for each function, exploiting within-curve correlation; and Zheng et al. [42] who treated the sparse case. Most recently Choi and Reimherr [17] extracted geometric features akin to Mahalanobis distances to build confidence bands for functional parameters.

The results presented here are the first of their kind relating to Banach space-valued functional data. The first theorem uses the limit distribution obtained in Theorem ? to construct asymptotic simultaneous confidence bands for the two-sample case. A corresponding bootstrap analog will be developed in the next section. Confidence bands for the one-sample case can be constructed in a similar fashion using standard arguments and the corresponding results are consequently omitted.

Note that, unlike their Hilbert-space counterparts, the simultaneous confidence bands given in Theorem ? (and their bootstrap analogs in Section ?) hold for all $t \in [0,1]$ and not only almost everywhere, making the proposed bands more easily interpretable and perhaps more useful for applications.

#### Testing for a relevant difference

Recall the definition of the random variable in Theorem ?, then the null hypothesis of no relevant difference in is rejected at level , whenever the inequality

holds, where denotes the -quantile of the distribution of . A conservative test avoiding the use of quantiles depending on the set of extremal points can be obtained observing the inequality

where the random variable is defined in . If denotes the -quantile of the distribution of , then implies and a conservative asymptotic level test is given by rejecting the null hypothesis in , whenever the inequality

holds. The properties of the tests and depend on the size of the distance and will be explained below. In particular, observe the following properties for the test :

• If , Slutsky’s theorem yields that

• If , we have

• If , the same calculation as in (a) implies

proving that the test defined in is consistent.

• If the mean functions and define a boundary point of the hypotheses, that is, and either or , then or , and consequently

Using similar arguments it can be shown that the test satisfies

Summarizing, the tests for the hypothesis of no relevant difference between the two mean functions defined in and have asymptotic level and are consistent. However, the discussion given above also shows that the test is conservative, even when .

### 3.2 Bootstrap

In order to use the tests , and for classical and relevant hypotheses, the quantiles of the distribution of the random variables and defined in and need to be estimated, which depend on certain features of the data generating process. The law involves the unknown set of extremal points of the differences of the mean functions. Moreover, the distributions of and depend on the long-run covariance function . There are methods available in the literature to consistently estimate the covariance function (see, for example, Horváth et al. [27]). In practice, however, it is difficult to reliably approximate the infinite sums in and therefore an easily implementable bootstrap procedure is proposed in the following.

It turns out that a different and non-standard bootstrap procedure will be required for testing relevant hypotheses than for classical hypotheses (and the construction of confidence bands) as in this case the null distribution depends on the set of extremal points . The corresponding resampling procedure requires a substantially more sophisticated analysis. Therefore the analysis of bootstrap tests for the classical hypothesis and bootstrap confidence intervals is given first and discussion of bootstrap tests for relevant hypotheses is deferred to Section ?.

#### Bootstrap confidence intervals and tests for the classical hypothesis $H_0\colon \mu_1 = \mu_2$

Following Bücher and Kojadinovic [14], the use of a multiplier block bootstrap is proposed. To be precise, let and denote independent sequences of independent standard normally distributed random variables and define the $C([0,1])$-valued processes through

for , where denote window sizes such that and as . The following result is a fundamental tool for the theoretical investigations of all bootstrap procedures proposed in this paper and is proved in Section 6.3 of the online supplement.

Note that Theorem ? holds under both the null hypothesis and the alternative. It leads to the following results regarding confidence bands and tests for the classical hypothesis based on the multiplier bootstrap. To this end, note that for the statistics

the continuous mapping theorem yields

where the random variables are independent copies of the statistic defined in . Now, if is the empirical -quantile of the bootstrap sample , the following results are obtained.

This section is concluded with a corresponding statement regarding the bootstrap test for the classical hypotheses in , which rejects the null hypothesis whenever

where the statistic is defined in .
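A multiplier block bootstrap for the classical hypothesis and the confidence bands can be sketched as follows. The displayed definition of the bootstrap processes is not available in this text, so the normalisation used here (centred block sums of length $l$, scaled by $1/\sqrt{l}$, weighted by i.i.d. standard normal multipliers and averaged over the sample size) is an assumption, as are all function names, the default `n_boot` and the use of a grid maximum in place of the supremum.

```python
import numpy as np

def block_multiplier_draw(curves, block_len, rng):
    """One multiplier block bootstrap draw for a single sample, shape (p,)."""
    X = np.asarray(curves, dtype=float)
    n, p = X.shape
    n_blocks = n - block_len + 1
    csum = np.vstack([np.zeros((1, p)), np.cumsum(X, axis=0)])
    # Block sums X_k + ... + X_{k+l-1}, centred by l times the sample mean.
    block_sums = csum[block_len:] - csum[:n_blocks] - block_len * X.mean(axis=0)
    xi = rng.standard_normal(n_blocks)  # one Gaussian multiplier per block
    return xi @ block_sums / (n * np.sqrt(block_len))

def bootstrap_sup_quantile(x_curves, y_curves, l1, l2, alpha=0.05,
                           n_boot=200, seed=0):
    """Empirical (1 - alpha)-quantile of the sup-norm of the bootstrap process."""
    rng = np.random.default_rng(seed)
    m, n = len(x_curves), len(y_curves)
    sups = np.empty(n_boot)
    for r in range(n_boot):
        draw = (block_multiplier_draw(x_curves, l1, rng)
                + block_multiplier_draw(y_curves, l2, rng))
        sups[r] = np.max(np.abs(np.sqrt(m + n) * draw))
    return float(np.quantile(sups, 1.0 - alpha))

def classical_test_and_band(x_curves, y_curves, l1, l2, alpha=0.05,
                            n_boot=200, seed=0):
    """Reject equality of means if the sup statistic exceeds the bootstrap
    cut-off; also return a simultaneous band for the mean difference."""
    m, n = len(x_curves), len(y_curves)
    diff = np.mean(x_curves, axis=0) - np.mean(y_curves, axis=0)
    q = bootstrap_sup_quantile(x_curves, y_curves, l1, l2, alpha, n_boot, seed)
    half_width = q / np.sqrt(m + n)
    reject = bool(np.max(np.abs(diff)) > half_width)
    return reject, diff - half_width, diff + half_width
```

The same quantile serves both purposes: the test rejects when the band around the estimated mean difference excludes the zero function somewhere on the grid. The block lengths `l1` and `l2` play the role of the window sizes and should grow slowly with the sample sizes.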

#### Testing for relevant differences in the mean functions

The problem of constructing an appropriate bootstrap test for the hypotheses of no relevant difference in the mean functions is substantially more complicated. The reason for these difficulties consists in the fact that in the case of relevant hypotheses the limit distribution of the corresponding test statistic is complicated. In contrast to the problem of testing the classical hypotheses , where it is sufficient to mimic the distribution of the statistic in (corresponding to the case ) one requires the distribution of the statistic , which depends in a sophisticated way on the set of extreme points of the (unknown) difference . Under the null hypothesis these sets can be very different, ranging from a singleton to the full interval . As a consequence the construction of a valid bootstrap procedure requires appropriate consistent estimates of the sets and introduced in Theorem ?.

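For this purpose, recall the definition of the Hausdorff distance between two sets $A, B \subset \mathbb{R}$, given by

$$d_H(A, B) = \max\Big\{ \sup_{x \in A} \inf_{y \in B} |x - y|,\ \sup_{y \in B} \inf_{x \in A} |x - y| \Big\},$$

and denote by $\mathcal{K}([0,1])$ the set of all compact subsets of the interval $[0,1]$. First, define estimates of the extremal sets $\mathcal{E}^+$ and $\mathcal{E}^-$ by

$$\hat{\mathcal{E}}^{\pm}_{m,n} = \Big\{ t \in [0,1] \colon \pm\big(\bar X_m(t) - \bar Y_n(t)\big) \ge \hat d_\infty - \frac{c_{m,n}}{\sqrt{m+n}} \Big\},$$

where $\bar X_m$ and $\bar Y_n$ are the sample mean functions, $\hat d_\infty = \|\bar X_m - \bar Y_n\|$ and $c_{m,n} \sim \log(m+n)$. Our first result shows that the estimated sets $\hat{\mathcal{E}}^+_{m,n}$ and $\hat{\mathcal{E}}^-_{m,n}$ are consistent for $\mathcal{E}^+$ and $\mathcal{E}^-$, respectively.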
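On a finite grid, both the Hausdorff distance and the estimated extremal sets reduce to a few array operations. The sketch below is illustrative only: the thresholding rule (keep all grid points where the signed mean difference is within $c/\sqrt{m+n}$ of its maximum absolute value, with $c = \log(m+n)$ by default) is an assumption about the form of the estimator, and the function names are hypothetical.

```python
import numpy as np

def hausdorff(a, b):
    """Hausdorff distance between two non-empty finite subsets of the real line."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    dist = np.abs(a[:, None] - b[None, :])  # all pairwise distances
    return float(max(dist.min(axis=1).max(), dist.min(axis=0).max()))

def estimated_extremal_sets(diff, grid, m, n, c=None):
    """Grid points where +diff (resp. -diff) is close to its maximal size."""
    diff, grid = np.asarray(diff, dtype=float), np.asarray(grid, dtype=float)
    d_hat = np.max(np.abs(diff))
    c = np.log(m + n) if c is None else c  # assumed tuning sequence
    threshold = d_hat - c / np.sqrt(m + n)
    return grid[diff >= threshold], grid[-diff >= threshold]

grid = np.array([0.0, 0.25, 0.5, 0.75])
diff = np.array([0.0, 1.0, 0.95, -0.5])
plus, minus = estimated_extremal_sets(diff, grid, m=5000, n=5000)
print(plus, minus)
print(hausdorff(plus, [0.25]))
```

The slowly growing sequence `c` makes the estimated sets slightly larger than the true ones in finite samples, which is the price paid for consistency in the Hausdorff sense.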

The main implication of Theorem ? consists in the fact that the random variable

converges weakly to the random variable . Note that by Theorem ? and that in probability by the previous theorem, but the combination of both statements is more delicate and requires a continuity argument which is given in Section 6.3 of the online supplement, where the following result is proved.

Theorem ? leads to a simple bootstrap test for the hypothesis of no relevant change. To be precise, let denote the empirical -quantile of the bootstrap sample , then the null hypothesis of no relevant change is rejected at level , whenever

The final result of this section shows that the test is consistent and has asymptotic level . The proof is obtained by similar arguments as given in the proof of Theorem ?, which are omitted for the sake of brevity.
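The decision rule of the bootstrap test for a relevant difference can be sketched as follows, given bootstrap draws of the process on a grid. Everything here is a hedged illustration: the bootstrap statistic is taken to be the maximum of the draw over the estimated positive extremal set and of its negative over the estimated negative extremal set, the rejection rule is assumed to be $\hat d_\infty > \Delta + q^*_{1-\alpha}/\sqrt{m+n}$, and the names and the default tuning constant are hypothetical.

```python
import numpy as np

def relevant_bootstrap_test(diff, boot_draws, m, n, delta, alpha=0.05, c=None):
    """Bootstrap test of H0: sup|mu1 - mu2| <= delta on a grid.

    diff:       array (p,), difference of the two sample mean functions.
    boot_draws: array (R, p), R bootstrap realisations of the (already
                sqrt(m+n)-scaled) multiplier process on the same grid.
    Returns (reject, quantile).
    """
    diff = np.asarray(diff, dtype=float)
    boot = np.asarray(boot_draws, dtype=float)
    d_hat = np.max(np.abs(diff))
    c = np.log(m + n) if c is None else c  # assumed tuning sequence
    threshold = d_hat - c / np.sqrt(m + n)
    plus, minus = diff >= threshold, -diff >= threshold

    def masked_max(values, mask):
        # An empty estimated set contributes nothing to the maximum.
        if not mask.any():
            return np.full(values.shape[0], -np.inf)
        return values[:, mask].max(axis=1)

    t_star = np.maximum(masked_max(boot, plus), masked_max(-boot, minus))
    q = float(np.quantile(t_star, 1.0 - alpha))
    return bool(d_hat > delta + q / np.sqrt(m + n)), q
```

At least one of the two estimated sets is non-empty, because the grid point attaining the maximum absolute difference always passes the threshold, so the bootstrap statistic is finite for every draw.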

## 4 Change-point analysis

Change-point problems arise naturally in a number of applications (for example, in quality control, economics and finance; see Aue and Horváth [5] for a recent review). In the functional framework, applications have centered around environmental and climate observations (see Aue et al. [3]) and intra-day finance data (see Horváth et al. [26]). Among the first contributions in the area are Berkes et al. [8] and Aue et al. [4], who developed change-point analysis in a Hilbert space setting for independent data. Generalizations to time series of functional data in Hilbert spaces are due to Aston and Kirch [1]. For Banach spaces, to the best of our knowledge, the only contributions to change-point analysis available in the literature are due to Račkauskas and Suquet [35], who have provided theoretical work analyzing epidemic alternatives for independent functions based on Hölder norms and dyadic interval decompositions. This section details new results on change-point analysis for $C([0,1])$-valued functional data. The work is the first to systematically exploit a time series structure of the functions as laid out in Section 2.

### 4.1 Asymptotic inference

More specifically, the problem of testing for a (potentially relevant) change-point is considered for triangular arrays of -valued random variables satisfying Assumption ?. Denote by the expectation of and assume as in part (A2) of Assumption ? that is the covariance kernel common to all random functions in the sample. Parametrize with , where is a constant, the location of the change-point, so that the sequence of mean functions satisfies

Then, for any , both and consist of (asymptotically) identically distributed but potentially dependent random functions. Let again denote the maximal deviation between the mean functions before and after the change-point. Interest is then in testing the hypotheses of a relevant change, that is,

where is a pre-specified constant. The relevant change-point test setting may be viewed in the context of a bias-variance trade-off. In the time series setting, one is often interested in accurate predictions of future realizations. However, if the stretch of observed functions suffers from a structural break, then only those functions sampled after the change-point should be included in the prediction algorithm because these typically require stationarity. This reduction of observations, however, inevitably leads to an increased variability that may be partially offset with a bias incurred through the relevant approach: if the maximal discrepancy in the mean functions remains below a suitably chosen threshold , then the mean-squared prediction error obtained from predicting with the whole sample might be smaller than the one obtained from using only the non-contaminated post-change sample. In applications to financial data, the size of the allowable bias could also be dictated by regulations imposed on, say, investment strategies (Dette and Wied [20] specifically mention Value at Risk as one such example).

Recall the definition of the sequential empirical process in , where the argument of this process is used to search over all potential change locations. Note that can be regarded as an element of the Banach space (see the discussion before Theorem ?). Define the -valued process

then, under Assumption ?, Theorem ? and the continuous mapping theorem show that

in , where . In particular, is a centered Gaussian measure on defined by

In order to define a test for the hypothesis of a relevant change-point defined by consider the sequential empirical process on given by

Evaluating its expected value shows that, in contrast to , the process is typically not centered and the equality holds only in the case . A straightforward calculation shows that

uniformly in . As the function attains its maximum in the interval at the point , the statistic

is a reasonable estimate of . It is therefore proposed to reject the null hypothesis in for large values of the statistic . The following result specifies the asymptotic distribution of .
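A CUSUM-type process and the associated maximum statistic can be sketched as follows. The display defining the process is not available in this text, so the form assumed here is $\frac{1}{n}\big(\sum_{j \le k} X_j(t) - \frac{k}{n}\sum_{j \le n} X_j(t)\big)$ at the knots $s = k/n$; since a linearly interpolated process is piecewise linear in $s$, its maximum absolute value over $s$ is attained at a knot. Function names are hypothetical.

```python
import numpy as np

def cusum_process(curves):
    """CUSUM process at the knots s = k/n, k = 0, ..., n; shape (n + 1, p)."""
    X = np.asarray(curves, dtype=float)
    n, p = X.shape
    csum = np.vstack([np.zeros((1, p)), np.cumsum(X, axis=0)])
    frac = np.arange(n + 1)[:, None] / n
    return (csum - frac * csum[-1]) / n

def max_cusum(curves):
    """Maximum of |CUSUM| over all knots and all grid points."""
    return float(np.max(np.abs(cusum_process(curves))))

# Mean jumps from 0 to 1 halfway through a sample of four curves.
curves = np.array([[0.0], [0.0], [1.0], [1.0]])
print(max_cusum(curves))  # 0.25 = 0.5 * (1 - 0.5) * 1
```

In this toy example the statistic equals the product of the change location, one minus the change location and the size of the jump, which is why it has to be rescaled before it can serve as an estimate of the maximal deviation itself.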

The proof of Theorem ? is given in Section 6.4 of the online supplement. The limit distribution of is rather complicated and depends on the set which might be different for functions with the same sup-norm but different corresponding set . It is also worthwhile to mention that the condition is essential in Theorem ?. In the remaining case the weak convergence of simply follows from , and the continuous mapping theorem, that is,

whenever .

If , the true location of the change-point is unknown and therefore has to be estimated from the available data. The next theorem, which is proved in Section 6.4 of the online supplement, proposes one such estimator and specifies its large-sample behavior in form of a rate of convergence.

Recall that the possible range of change locations is restricted to the open interval and define the modified change-point estimator

where is given by . Since , it follows that if , and, if suppose that converges weakly to a -valued random variable .
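The change-point estimator and its truncated version can be sketched as follows. This is a minimal illustration under assumptions: the estimator is taken to be the knot $k/n$, $1 \le k < n$, maximising the sup-norm of the CUSUM process, the modified estimator clips it to $[\vartheta, 1-\vartheta]$ for a hypothetical user-chosen `theta`, and the maximal deviation is estimated by rescaling the maximum CUSUM value with $\tilde s(1-\tilde s)$.

```python
import numpy as np

def estimate_change_point(curves, theta=0.1):
    """Return (s_hat, s_tilde, d_hat) for a sample of curves on a grid.

    curves: array (n, p) with n >= 2.
    theta:  assumed truncation level in (0, 1/2).
    """
    X = np.asarray(curves, dtype=float)
    n, p = X.shape
    csum = np.vstack([np.zeros((1, p)), np.cumsum(X, axis=0)])
    frac = np.arange(n + 1)[:, None] / n
    cusum = (csum - frac * csum[-1]) / n
    sup_t = np.max(np.abs(cusum), axis=1)     # sup over t at each knot k/n
    k_hat = int(np.argmax(sup_t[1:n])) + 1    # search over 1 <= k < n
    s_hat = k_hat / n
    s_tilde = max(theta, min(s_hat, 1.0 - theta))
    d_hat = float(sup_t.max() / (s_tilde * (1.0 - s_tilde)))
    return s_hat, s_tilde, d_hat

curves = np.array([[0.0], [0.0], [1.0], [1.0]])
print(estimate_change_point(curves))
```

Clipping keeps the rescaling factor bounded away from zero, so the estimate of the maximal deviation remains stable even when no change is present and the raw estimator wanders towards the boundary.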

### 4.2 Bootstrap

In order to avoid the difficulties mentioned in the previous remark, a bootstrap procedure is developed and its consistency is shown. To be precise, denote by

estimators for the expectation before and after the change-point. Let denote independent sequences of independent standard normally distributed random variables and consider the -valued processes defined by

where is a bandwidth parameter satisfying as and

for (). Note that it is implicitly assumed that for any and any such that . Next, define

The proof of Theorem ? is provided in Section 6.4 of the online supplement. Note that condition is similar to Assumption (A5) and ensures that the weak invariance principle holds for the bootstrap processes.

We now consider a resampling procedure for the classical hypotheses, that is in . For that purpose, define, for ,

Then, by the continuous mapping theorem,

in , where are independent copies of the random variable defined in . If is the empirical -quantile of the bootstrap sample , the classical null hypothesis of no change point is rejected, whenever

It follows by similar arguments as given in Section 6.3 of the online supplement that this test is consistent and has asymptotic level in the sense of Theorem ?, that is

for any . The details are omitted for the sake of brevity.

We now continue developing bootstrap methodology for the problem of testing for a relevant change point, that is in . It turns out that the theoretical analysis is substantially more complicated as the null hypothesis defines a set in in . Similar as in the estimates of the extremal sets and are defined by

where and is given in . Consider bootstrap analogs

of the statistic in Corollary ?, where .

A test for the hypothesis of a relevant change-point in time series of continuous functions is now obtained by rejecting the null hypothesis in , whenever

where is the empirical -quantile of the bootstrap sample . It follows by similar arguments as given in Section 6.3 of the online supplement that this test is consistent and has asymptotic level in the sense of Theorem ?, that is