# Testing the predictor effect on a functional response

Valentin Patilea, CREST (Ensai) & IRMAR, France; patilea@ensai.fr. This author gratefully acknowledges financial support from the Romanian National Authority for Scientific Research, CNCS-UEFISCDI, project PN-II-ID-PCE-2011-3-0893.

César Sánchez-Sellero, Facultad de Matemáticas, Univ. de Santiago de Compostela, Spain; cesar.sanchez@usc.es. This author gratefully acknowledges support from the Spanish Ministry of Science, project MTM2008-03010, and from Ensai.

Matthieu Saumard, Pontificia Univ. Católica de Valparaíso, Chile; Matthieu.Saumard@gmail.com. This author gratefully acknowledges financial support from CONICYT/FONDECYT project number 3140602.

July 3, 2019
###### Abstract

This paper examines the problem of nonparametric testing for the no-effect of a random covariate (or predictor) on a functional response. This means testing whether the conditional expectation of the response given the covariate is almost surely zero or not, without imposing any model relating response and covariate. The covariate could be univariate, multivariate or functional. Our test statistic is a quadratic form involving univariate nearest neighbor smoothing and the asymptotic critical values are given by the standard normal law. When the covariate is multidimensional or functional, a preliminary dimension reduction device is used which allows the effect of the covariate to be summarized into a univariate random quantity. The test is able to detect not only linear but nonparametric alternatives. The responses could have conditional variance of unknown form and the law of the covariate does not need to be known. An empirical study with simulated and real data shows that the test performs well in applications.

Keywords: functional data regression, goodness-of-fit test, Karhunen-Loève decomposition, nearest neighbor smoothing, U-statistics

## 1 Introduction

The analysis of functional data has attracted considerable attention from the scientific community in recent years. One main reason is that functional data arise in many applications. Regression models for functional data are now common tools for practitioners. As an illustration, in this paper we reconsider two extensively studied examples that lead to functional regression analysis and we provide new insight into the validity of commonly recommended models. The first concerns the number of eggs laid daily by fruit flies, where the egg-laying trajectory of each fly is usually treated as a curve. Each fly is an individual, and its egg-laying trajectory is the response, modeled as a functional datum. See Chiou et al. (2003) for a detailed description of this dataset. The second is the Canadian Weather data, intensively studied in Ramsay and Silverman (2005), where the daily rainfall throughout the year at each of 35 weather stations is considered as a curve to be explained through a regression model. Many other compelling examples can be found in the monographs of Ramsay and Silverman (2005) and Horváth and Kokoszka (2012).

Several models have been designed for regression with functional response. A multiplicative effects model is proposed by Chiou et al. (2003) for the egg-laying curves. Concurrent and functional linear models have been analysed by Ramsay and Silverman (2005) for the Canadian Weather dataset. For general functional responses, the functional linear model is the benchmark approach, see Chiou et al. (2003), Yao, Müller and Wang (2005), Gabrys, Horváth and Kokoszka (2010) and the references therein. Recently, alternative nonparametric approaches have been considered; see Ferraty et al. (2011), Lian (2011), Ferraty, Van Keilegom and Vieu (2012).

Checking the goodness-of-fit of a model is an important step in regression analysis. In a functional regression setup, several aspects of goodness-of-fit seem to be still open. In this paper a new method is proposed to check for the effect of a covariate (or predictor) on a functional response. The statistical problem is to build a test of the null hypothesis

$$H_0:\ E(U\mid X)=0 \quad \text{almost surely (a.s.)}, \tag{1.1}$$

against the nonparametric alternative $P[E(U\mid X)=0]<1$, where $U$ is a functional response taking values in a separable Hilbert space and $X$ is the covariate. Our method easily extends to testing the goodness-of-fit of a regression model with functional response. To this end, it suffices to take $U$ to be the error term in the mean regression model. In applications, the sample of $U$ is replaced by the residuals obtained from the model fit. The predictor $X$ is allowed to be a univariate, a multivariate or a functional variable; in the functional case it takes values in a separable Hilbert space, possibly different from that of the response. In the case of egg-laying curves, the predictor will be the total number of eggs, which is a univariate random variable, while in the Canadian Weather dataset, the predictor will be the curve of the daily temperature throughout the year at each weather station. When little is known about the structure of the data, it is preferable to allow for general alternatives when testing the goodness-of-fit. Moreover, when the link between the response and the predictor is modeled through a nonparametric approach, one should first check whether the predictor has an effect on the response or not.

To the best of our knowledge, only the contributions of Chiou and Müller (2007) and Kokoszka et al. (2008) have investigated the problem of goodness-of-fit with functional responses. Chiou and Müller (2007) introduced diagnostics for the functional regression fit using plots of functional principal component (FPC) scores of the response and the covariate. They also used residual versus fitted value FPC scores plots. (The FPC scores are the random coefficients in the Karhunen-Loève expansions.) Such two-dimensional plots cannot capture all types of effects of the covariate on the response, for instance the effect of interactions between the covariate FPC scores. Kokoszka et al. (2008) used the response and covariate FPC scores to build a test statistic with a $\chi^2$ distribution under the null hypothesis of the lack of dependence in the functional linear model. Again, by construction, the test of Kokoszka et al. cannot detect nonlinear alternatives.

Testing goodness-of-fit or no-effect against nonparametric alternatives has been very little explored in the functional data context. In the case of scalar response, Delsol, Ferraty and Vieu (2011) proposed a testing procedure adapted from the approach of Härdle and Mammen (1993). Their procedure involves smoothing in the functional space and requires quite restrictive conditions. Patilea, Sánchez-Sellero and Saumard (2012) and García-Portugués, González-Manteiga and Febrero-Bande (2012) have proposed alternative nonparametric goodness-of-fit tests for scalar response and functional covariate using projections of the covariate. Such projection-based methods are less restrictive and perform well in applications. To the best of our knowledge, no nonparametric statistical test of no-effect or goodness-of-fit is available when the response is functional.

The paper is organized as follows. In section 2 the proposed test for the univariate predictor is presented. The test is based on nearest neighbor smoothing. In section 3, the test is extended to functional predictors, with a methodology based on projections, in the spirit of Patilea, Sánchez-Sellero and Saumard (2012). The functional predictor has to be decomposed in a Hilbert space basis, but we show that our results allow for a data-driven basis, for instance that given by the functional principal component analysis. A wild bootstrap procedure for computing the critical values with finite samples is also proposed. In section 4, the proposed test is evaluated on simulated data and used to test the goodness-of-fit of well-known models for the two real data examples mentioned above: the egg-laying trajectories of fruit flies and the Canadian Weather dataset. We conclude that it performs well in applications and, in practice, is useful for model selection. The proofs are postponed to the appendix.

## 2 The univariate predictor case

For a clearer presentation and since this case is important in its own right, we first consider the particular case of a univariate covariate (predictor) $X$. Our approach seems to be the first one able to check the goodness-of-fit against general alternatives even in this particular case. Given a sample $(U_1,X_1),\dots,(U_n,X_n)$ from $(U,X)$, let

$$Q_n=\frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}\langle U_i,U_j\rangle\,\frac{1}{h}K_h\big(F_n(X_i)-F_n(X_j)\big),$$

where $\langle U_i,U_j\rangle$ is the inner product of $U_i$ and $U_j$ in the Hilbert space where the responses take values, $K_h(\cdot)=K(\cdot/h)$, $K$ is a kernel, $h$ is the bandwidth, and $F_n$ is the empirical distribution function of the sample $X_1,\dots,X_n$.

The statistic $Q_n$ is related to statistics considered by Fan and Li (1996) and Zheng (1996) for checks of parametric regressions with finite dimensional data. The main idea here is to replace the usual products of responses (or residuals), used for a univariate response, by the inner products of the functional responses (or residuals). While Fan and Li (1996) and Zheng (1996) used Nadaraya-Watson weights, symmetrized nearest neighbor weights, introduced by Yang (1981) and Stute (1984), are employed here. It is well known that, in contrast to the Nadaraya-Watson estimator, the asymptotic variance of the nearest neighbor kernel estimator does not depend on the density of the covariate. This presents some advantages, especially in the case of a functional predictor considered below. Hence, our new statistic is more in the spirit of that introduced by Stute and González-Manteiga (1996) to test simple linear models with scalar outcome and covariate and homoscedastic error term. Herein we allow for heteroscedasticity of unknown form and hence, in the particular case where $U$ and $X$ are scalar, we extend the framework of Stute and González-Manteiga (1996).

Let $\|\cdot\|$ denote the norm associated with the inner product $\langle\cdot,\cdot\rangle$ and let us define

$$Q=E\big[\langle U,E\{U\mid F(X)\}\rangle\big]=E\big[\|E\{U\mid F(X)\}\|^2\big],$$

which by construction is nonnegative; here $F$ denotes the distribution function of $X$. The idea of the test comes from the fact that $H_0$ is true if and only if $Q=0$. Then, it will follow from the theoretical results proven for the general covariate case that under $H_0$ the quantity $nh^{1/2}Q_n$, suitably standardized, has an asymptotic standard normal distribution. Meanwhile, when $H_0$ is not true, $Q_n$ will converge to a strictly positive limit and this will guarantee consistency against any departure from $H_0$. To standardize $nh^{1/2}Q_n$, the following simple variance estimator could be used

$$\hat v_n^2=\frac{2}{n(n-1)h}\sum_{j\ne i}\langle U_i,U_j\rangle^2K_h^2\big(F_n(X_i)-F_n(X_j)\big).$$

Consequently, the test statistic we consider is

$$T_n=nh^{1/2}\,\frac{Q_n}{\hat v_n}$$

and the associated test of asymptotic level $\alpha$ rejects $H_0$ when $T_n\ge z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1-\alpha)$-th quantile of the standard normal distribution.
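As a rough numerical illustration of $Q_n$, $\hat v_n^2$ and $T_n$, the following sketch assumes that every curve $U_i$ is observed on a common grid of $[0,1]$, approximates the inner products by the trapezoidal rule, and uses the Epanechnikov kernel. The function names, the discretization and the kernel choice are assumptions made for illustration only.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def gram_matrix(U, t):
    """Inner products <U_i, U_j> on the grid t, by the trapezoidal rule."""
    w = np.zeros(len(t))
    dt = np.diff(t)
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    return (U * w) @ U.T

def empirical_df(x):
    """F_n(X_i) = rank(X_i) / n, with ties broken by index order."""
    n = len(x)
    ranks = np.empty(n)
    ranks[np.argsort(x, kind="stable")] = np.arange(1, n + 1)
    return ranks / n

def nn_test_statistic(U, x, t, h):
    """Return (T_n, Q_n, v_n^2) for curves U (n x m) and scalar predictor x."""
    n = len(x)
    G = gram_matrix(U, t)
    F = empirical_df(x)
    K = epanechnikov((F[:, None] - F[None, :]) / h)
    np.fill_diagonal(K, 0.0)  # the sums run over i != j
    Qn = (G * K).sum() / (n * (n - 1) * h)
    v2 = 2.0 * (G ** 2 * K ** 2).sum() / (n * (n - 1) * h)
    Tn = n * np.sqrt(h) * Qn / np.sqrt(v2)
    return Tn, Qn, v2
```

Because the predictor enters only through its ranks, the value of $T_n$ in this sketch is unchanged by any strictly increasing transformation of the $X_i$, and it is also unchanged when all curves are multiplied by the same nonzero constant.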

## 3 The functional predictor case

The approach introduced in the previous section is based on univariate smoothing and cannot be immediately extended to a multivariate or functional covariate $X$. To reduce the case of a high-dimensional predictor to that of a univariate predictor, we use a new dimension reduction idea inspired by Lavergne and Patilea (2008).

### 3.1 A dimension reduction lemma

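To simplify the presentation and without loss of generality, hereafter we focus on the case where the Hilbert spaces of the response and of the predictor are both equal to $L^2[0,1]$, the space of square-integrable functions defined on $[0,1]$. Let $\langle\cdot,\cdot\rangle$ denote the inner product in $L^2[0,1]$, that is

$$\langle W_1,W_2\rangle=\int_0^1W_1(t)W_2(t)\,dt,\qquad\forall\,W_1,W_2\in L^2[0,1].$$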

Let $\|\cdot\|$ be the associated norm. For the moment, let $\{\psi_1,\psi_2,\dots\}$ be an arbitrarily fixed orthonormal basis of the function space $L^2[0,1]$. The extension to a data-driven basis is considered in section 3.5. Also without loss of generality, hereafter we suppose that $E(X)=0$. Then the response and the predictor processes can be expanded to give

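$$U(t)=\sum_{j=1}^{\infty}\langle U,\psi_j\rangle\psi_j(t)\quad\text{and}\quad X(t)=\sum_{j=1}^{\infty}\langle X,\psi_j\rangle\psi_j(t),\qquad t\in[0,1]. \tag{3.2}$$

For any integer $p\ge1$ and any non random $\gamma=(\gamma_1,\dots,\gamma_p)\in\mathbb{R}^p$ let

$$\langle X,\gamma\rangle=\sum_{j=1}^{p}\langle X,\psi_j\rangle\gamma_j$$

and let $F_\gamma$ denote the distribution function (d.f.) of the real-valued variable $\langle X,\gamma\rangle$, i.e., $F_\gamma(t)=P(\langle X,\gamma\rangle\le t)$, $t\in\mathbb{R}$. Moreover, let $\mathcal{S}^p$ denote the unit hypersphere in $\mathbb{R}^p$. Our approach relies on the following extension of Lemma 2.1 of Lavergne and Patilea (2008) to Hilbert space-valued random variables.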

###### Lemma 3.1

Let be random functions. Assume that and

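(A) The following statements are equivalent:

1. $E(U\mid X)=0$ a.s.

2. $\forall p\ge1$, $\forall\gamma\in\mathcal{S}^p$, $E(U\mid\langle X,\gamma\rangle)=0$ a.s.

3. $\forall p\ge1$, $\forall\gamma\in\mathcal{S}^p$, $E(U\mid F_\gamma(\langle X,\gamma\rangle))=0$ a.s.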

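(B) Suppose, also, that there exists $s>0$ such that

$$E\big(\|U\|\exp\{s\|X\|\}\big)<\infty. \tag{3.3}$$

If $P[E(U\mid X)=0]<1$, then there exists an integer $p_0\ge1$ such that, for all $p\ge p_0$, the set

$$A_p=\{\gamma\in\mathcal{S}^p:E(U\mid\langle X,\gamma\rangle)=0\ \text{a.s.}\}=\{\gamma\in\mathcal{S}^p:E(U\mid F_\gamma(\langle X,\gamma\rangle))=0\ \text{a.s.}\}$$

has Lebesgue measure zero on the unit hypersphere $\mathcal{S}^p$ and is not dense.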

Point (A) is a cornerstone for proving the behavior of our test under the null and the alternative hypotheses. Point (B) shows that in applications it is rather easy to find directions able to reveal the failure of the null hypothesis (1.1) since, under mild conditions, such directions represent almost all the points on the unit hyperspheres , provided is sufficiently large. By the Cauchy-Schwarz inequality, condition (3.3) holds if for some and for The exponential moment condition on is satisfied in many situations, for instance when is a mean-zero Gaussian process with Moreover, in general, moment restrictions on the covariate are not restrictive for goodness-of-fit testing purposes. Indeed, if does not satisfy condition (3.3), it suffices to transform into some variable that generates the same field and satisfies (3.3).

Lemma 3.1 contains the case of a multivariate, finite dimension covariate as a particular case. It will be clear from the following how the testing procedure could be adapted to this situation and hence we focus on the more general case of a functional $X$.

Let

$$Q(\gamma)=E\big[\langle U,E\{U\mid F_\gamma(\langle X,\gamma\rangle)\}\rangle\big],\qquad\gamma\in\mathcal{S}^p.$$

The following new formulation of $H_0$ is a direct consequence of Lemma 3.1 above.

###### Corollary 3.2

Consider a valued random variable such that . The following statements are equivalent:

1. The null hypothesis in (1.1) holds true.

2. and any set with a strictly positive Lebesgue measure on

$$\max_{\gamma\in B_p}Q(\gamma)=0. \tag{3.4}$$

### 3.2 The test statistic with functional predictor

In view of equation (3.4), the goal is to estimate $Q(\gamma)$. Given a sample $(U_1,X_1),\dots,(U_n,X_n)$ of $(U,X)$, let

$$Q_n(\gamma)=\frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}\langle U_i,U_j\rangle\,\frac{1}{h}K_h\big(F_{\gamma,n}(\langle X_i,\gamma\rangle)-F_{\gamma,n}(\langle X_j,\gamma\rangle)\big),\qquad\gamma\in\mathcal{S}^p,$$

where $K_h(\cdot)=K(\cdot/h)$, $K$ is a kernel, $h$ the bandwidth, and $F_{\gamma,n}$ is the empirical d.f. of the sample $\langle X_1,\gamma\rangle,\dots,\langle X_n,\gamma\rangle$. Ties in the values $\langle X_i,\gamma\rangle$ could be broken by comparing indices, that is, if $\langle X_i,\gamma\rangle=\langle X_j,\gamma\rangle$ then we define $\langle X_i,\gamma\rangle<\langle X_j,\gamma\rangle$ if $i<j$. However, for simplicity, in our assumptions below we will assume that the $\langle X_i,\gamma\rangle$'s have a continuous distribution for all $\gamma$.

The statistic $Q_n(\gamma)$ is related to the statistic considered by Patilea, Sánchez-Sellero and Saumard (2012), who used a Nadaraya-Watson regression estimator instead of the nearest neighbor (NN) approach. Since the asymptotic variance of the NN kernel estimator does not depend on the density of the covariate, which in our case is $\langle X,\gamma\rangle$, one could more confidently use the same bandwidth for any $\gamma$ to define $Q_n(\gamma)$. The projections of the covariates were also considered by Lavergne and Patilea (2008); see also Cuesta-Albertos et al. (2007), Cuesta-Albertos, Fraiman and Ransford (2007). The extension of the scope to functional responses seems to be new. As in the univariate predictor case, we allow for heteroscedasticity of unknown form.

Under $H_0$, by the Central Limit Theorem (CLT) for degenerate $U$-statistics, for fixed $p$ and $\gamma$, $nh^{1/2}Q_n(\gamma)$ has an asymptotic centered normal distribution. Here we use the CLT in Theorem 5.1 in de Jong (1987). We will show that de Jong's CLT still applies and the asymptotic normal distribution is preserved even when $p$ grows at a suitable rate with the sample size. On the other hand, Lemma 3.1-(B) indicates that if $p$ is sufficiently large, the maximum of $Q(\gamma)$ over $\gamma$ stays away from zero under the alternative hypothesis and this will guarantee consistency against any departure from $H_0$.

The statistic $Q_n(\gamma)$ is expected to be close to $Q(\gamma)$ uniformly in $\gamma$, provided $p$ increases suitably. Then a natural idea would be to build a test statistic using the maximum of $Q_n(\gamma)$ with respect to $\gamma$. However, as in the finite dimension covariate case, under $H_0$ one expects $Q_n(\gamma)$ to converge to zero for any $p$ and $\gamma$, and thus the objective function of the maximization problem to be flat. Therefore we will choose a direction $\gamma$ as the least favorable direction for the null hypothesis, obtained from a penalized criterion based on a standardized version of $Q_n(\gamma)$; see also Lavergne and Patilea (2008) for related approaches. More precisely, let us fix some infinite-dimensional vector $b_0=(b_{01},b_{02},\dots)$ with Such a vector could be interpreted as an initial guess of an unfavorable direction for $H_0$. For any given $p$ such that $\|(b_{01},\dots,b_{0p})\|>0$, let

$$\gamma_0^{(p)}=\frac{(b_{01},\dots,b_{0p})}{\|(b_{01},\dots,b_{0p})\|}\in\mathcal{S}^p,$$

where $\|\cdot\|$ here denotes the Euclidean norm in $\mathbb{R}^p$.

Let

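$$\hat v_n^2(\gamma)=\frac{2}{n(n-1)h}\sum_{j\ne i}\langle U_i,U_j\rangle^2K_h^2\big(F_{\gamma,n}(\langle X_i,\gamma\rangle)-F_{\gamma,n}(\langle X_j,\gamma\rangle)\big), \tag{3.1}$$

be an estimate of the variance of $nh^{1/2}Q_n(\gamma)$. Given $B_p\subset\mathcal{S}^p$ with positive Lebesgue measure in $\mathcal{S}^p$ and which contains $\gamma_0^{(p)}$, the least favorable direction $\gamma$ for $H_0$ is defined by

$$\hat\gamma_n=\arg\max_{\gamma\in B_p}\Big[nh^{1/2}Q_n(\gamma)/\hat v_n(\gamma)-\alpha_nI\{\gamma\ne\gamma_0^{(p)}\}\Big], \tag{3.2}$$

where $I\{A\}$ is the indicator function of a set $A$, and $\alpha_n$, $n\ge1$, is a sequence of positive real numbers increasing to infinity at an appropriate rate, which depends on the rates of $h$ and $p$ and will be made explicit below. Using a standardized version of $Q_n(\gamma)$ avoids scaling $\alpha_n$ according to the variability of the observations. Let us note that the maximization used to define $\hat\gamma_n$ is a finite dimension optimization problem. The choice of $\gamma_0^{(p)}$ will be shown to be theoretically irrelevant. It will not affect the asymptotic critical values and the consistency results. Practical aspects related to the choice of the tuning quantities will be discussed in section 3.5.

We will prove that with suitable rates of increase for $\alpha_n$ and $p$ and decrease for $h$, the probability of the event $\{\hat\gamma_n=\gamma_0^{(p)}\}$ tends to 1 under $H_0$. Hence $Q_n(\hat\gamma_n)$ behaves asymptotically as $Q_n(\gamma_0^{(p)})$, even when $p$ grows with the sample size. Therefore the test statistic we consider is

$$T_n=nh^{1/2}\,\frac{Q_n(\hat\gamma_n)}{\hat v_n(\hat\gamma_n)}. \tag{3.3}$$

We will show that an asymptotic $\alpha$-level test is obtained by rejecting $H_0$ when $T_n\ge z_{1-\alpha}$.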
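The penalized selection of the direction and the resulting statistic can be sketched as follows. The sketch assumes that the inner products $\langle U_i,U_j\rangle$ and the basis coefficients $\langle X_i,\psi_j\rangle$ have already been computed, that the set $B_p$ is replaced by a finite list of candidate directions, and that the kernel is the Epanechnikov kernel. The function names and these simplifications are hypothetical and serve illustration only.

```python
import numpy as np

def standardized_statistic(G, scores, gamma, h):
    """n h^{1/2} Q_n(gamma) / v_n(gamma) for one direction gamma.

    G      : (n, n) matrix of inner products <U_i, U_j>
    scores : (n, p) matrix of basis coefficients <X_i, psi_j>
    """
    n = G.shape[0]
    proj = scores @ gamma                          # <X_i, gamma>
    ranks = np.empty(n)
    ranks[np.argsort(proj, kind="stable")] = np.arange(1, n + 1)
    F = ranks / n                                  # empirical d.f. at the sample points
    u = (F[:, None] - F[None, :]) / h
    K = 0.75 * np.clip(1.0 - u ** 2, 0.0, None)    # Epanechnikov kernel (assumed)
    np.fill_diagonal(K, 0.0)                       # sums over i != j
    Qn = (G * K).sum() / (n * (n - 1) * h)
    v2 = 2.0 * (G ** 2 * K ** 2).sum() / (n * (n - 1) * h)
    return n * np.sqrt(h) * Qn / np.sqrt(v2)

def penalized_test_statistic(G, scores, directions, gamma0, h, alpha_n):
    """Return (gamma_hat, T_n): penalized argmax over candidates, then T_n."""
    candidates = [np.asarray(gamma0, dtype=float)]
    candidates += [np.asarray(g, dtype=float) for g in directions]
    values = np.array([standardized_statistic(G, scores, g, h) for g in candidates])
    penalized = values.copy()
    penalized[1:] -= alpha_n                       # penalty for gamma != gamma0
    best = int(np.argmax(penalized))
    return candidates[best], values[best]          # T_n is the unpenalized value
```

With a very large `alpha_n` the selected direction is always `gamma0`, while with `alpha_n = 0` the statistic is the plain maximum over the candidates, which illustrates the trade-off governed by the penalty.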

### 3.3 Behavior under the null hypothesis

###### Assumption D
1. The random vectors are independent draws from the random vector that satisfies moreover, such that

2. For any and any , the d.f. is continuous.

3. and such that:

1. almost surely;

2. .

4. For any , are open subsets of and where denotes the null vector of dimension .

The continuity condition in Assumption D-(b) is a mild assumption that simplifies the NN smoothing. Assumption D-(c) serves to prove that the variance of $nh^{1/2}Q_n(\gamma)$ is uniformly bounded away from zero and infinity. The mild conditions on $B_p$ simplify the proofs for the consistency and are satisfied, for instance, when $B_p$ is a half unit hypersphere.

###### Assumption K
1. The kernel is a continuous density on the real line such that and is non increasing on . Moreover the Fourier Transform of is integrable.

2. and

3. increases to infinity with and there exists a constant such that is bounded.

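The first step for deriving a test statistic is the study of the behavior of the process $Q_n(\gamma)$, $\gamma\in B_p$, under $H_0$ when $p$ increases with the sample size. This study is greatly simplified by the fact that, for a fixed $n$ and up to permutations of lines and/or columns, the matrix with entries $K_h\big(F_{\gamma,n}(\langle X_i,\gamma\rangle)-F_{\gamma,n}(\langle X_j,\gamma\rangle)\big)$ is equal to the matrix with entries $K_h((i-j)/n)$, for any dimension $p$ and direction $\gamma$.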

###### Lemma 3.3

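Under Assumptions D and K and if $H_0$ holds true,

$$\sup_{\gamma\in B_p\subset\mathcal{S}^p}|Q_n(\gamma)|=O_P\big(n^{-1}h^{-1/2}p\ln n\big).$$

Moreover, if $\hat v_n^2(\gamma)$ is the estimate defined in equation (3.1),

$$\sup_{\gamma\in B_p\subset\mathcal{S}^p}\{1/\hat v_n^2(\gamma)\}=O_P(1).$$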

We next describe the behavior of $\hat\gamma_n$ under $H_0$. A suitable rate for $\alpha_n$ will make $\hat\gamma_n$ equal to $\gamma_0^{(p)}$ with high probability. Under the null, $\alpha_n$ has to grow to infinity sufficiently fast to render the probability of the event $\{\hat\gamma_n=\gamma_0^{(p)}\}$ close to 1. We will see below that, for better detection of the alternative hypothesis, $\alpha_n$ should grow as slowly as possible. Indeed, slower rates for $\alpha_n$ will allow the selection of directions $\gamma$ which could be better suited than $\gamma_0^{(p)}$ for revealing the departure from the null hypothesis. The rate of $p$ is also involved in the search of a trade-off for the rate of $\alpha_n$: a larger $p$ renders the rate of uniform convergence to zero of $Q_n(\gamma)$, $\gamma\in B_p$, slower, and hence requires a larger $\alpha_n$.

###### Lemma 3.4

Under Assumptions D, K, for a positive sequence $\alpha_n$, $n\ge1$, such that $\alpha_n/\{p\ln n\}\to\infty$,

$$P\big(\hat\gamma_n=\gamma_0^{(p)}\big)\to1,\qquad\text{under }H_0.$$

The following result shows that the asymptotic critical values of our test statistic are standard normal.

###### Theorem 3.5

Under the conditions of Lemma 3.4 and if the hypothesis $H_0$ in (1.1) holds true, the test statistic $T_n$ converges in law to a standard normal. Consequently, the test that rejects $H_0$ when $T_n\ge z_{1-\alpha}$, with $z_{1-\alpha}$ the $(1-\alpha)$-th quantile of the standard normal distribution, has asymptotic level $\alpha$.

Theorem 3.5 could be derived in the case of a finite dimension covariate under Assumption D-(a,b,c) and Assumption K-(a,b). Since no dimension reduction is required in the univariate case, no exponential moment condition is required when $X$ is univariate.

Under technical conditions ensuring that the sample of $U$ is estimated sufficiently accurately, the test statistic will still have standard normal critical values when the $U_i$'s are replaced by some estimates. Patilea, Sánchez-Sellero and Saumard (2012) provide complete arguments for their test in the case where the $U_i$'s are the residuals of the functional linear model with scalar responses. Similar arguments could be used with functional responses. To keep this paper to a reasonable length, the theoretical investigation of the extension to the case of estimated responses is omitted. However, some empirical evidence from extensive simulation experiments is reported in section 4.

### 3.4 The behavior under the alternatives

Our test is consistent against the general alternative

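$$H_1:\ P[E(U\mid X)=0]<1,$$

i.e., the probability that the test statistic $T_n$ is larger than any quantile tends to one under $H_1$. This could be rapidly understood from the following simple inequalities:

$$T_n\ge\max_{\gamma\in B_p}\frac{nh^{1/2}Q_n(\gamma)}{\hat v_n(\gamma)}-\alpha_n\ge\frac{nh^{1/2}Q_n(\tilde\gamma)}{\hat v_n(\tilde\gamma)}-\alpha_n,\qquad\forall\,\tilde\gamma\in B_p\subset\mathcal{S}^p, \tag{3.4}$$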

with defined in (3.1). Since , it is clear that for all . On the other hand, from Lemma 3.1, there exists a and a such that the expectation of does not approach zero as the sample size grows to infinity and decreases to zero. On the other hand, for any and any and , clearly , because . All these facts show why our test is an omnibus test, that is consistent against nonparametric alternatives, provided that

To formally state the consistency result, let be some -valued function such that and , and let be a sequence of real numbers that either decrease to zero or Consider the sequence of alternative hypotheses:

$$H_{1n}:\ U=U^0+r_n\delta(X),\quad n\ge1,\qquad\text{with }U^0\in L^2[0,1],\ E(U^0\mid X)=0.$$

We show below that such directional alternatives can be detected as long as This is exactly the condition one would obtain with scalar covariate; see Lavergne and Patilea (2008). However, in the functional data framework, to obtain the convenient standard normal critical values, we need . Hence, the rate at which the alternatives tend to the null hypothesis should satisfy .

###### Theorem 3.6

Suppose that

1. Assumption D holds true with replaced by ;

2. Assumption K is satisfied and in addition and there exists a constant such that

3. and , such that ;

4. and

5. there exists and (independent of ) such that and, , the Fourier Transform of is integrable.

Then the test based on is consistent against the sequence of alternatives

The additional Lipschitz condition on the kernel and the restriction on the bandwidth range in Theorem 3.6-(b) are reasonable technical conditions that simplify the proof of consistency. The zero mean condition for keeps the mean of equal to zero under the alternative hypotheses . The existence of vectors with is guaranteed by Lemma 3.1-(B). In Theorem 3.6-(e) we impose a convenient mild technical condition on one of such vectors. Finally, Theorem 3.6 could be easily adapted to the case of a finite dimension covariate. The details are omitted.

### 3.5 Practical aspects

In the case of a functional covariate, the goodness-of-fit procedure we propose in this paper requires the choice of several quantities: the orthonormal basis in the space of $X$, the order $p$, the penalty amplitude $\alpha_n$, the privileged direction $\gamma_0^{(p)}$, the set $B_p$ and the bandwidth $h$. In this section we provide some guidelines on how these quantities could be chosen by the practitioner; the choice of the remaining ones is discussed in the Supplementary Material. Before doing this, let us point out that the choice of the basis in the space of $U$ is not really an issue. In applications, the statistician only has to compute the products $\langle U_i,U_j\rangle$ and this could be easily done with high accuracy and low computational cost in any basis.

Our theoretical results above are derived for a fixed basis in the space of $X$. The assumptions used to derive these results impose only very mild conditions on the basis $\{\psi_1,\psi_2,\dots\}$; see Assumption D-(b) and condition (e) in Theorem 3.6. However, the choice of the basis could influence the finite sample performance of the test. Clearly, the practitioner would prefer a basis that allows for an accurate low-dimensional representation of the covariate and hence for a low $p$ in our testing procedure. A widely used basis is that given by the eigenfunctions of the covariance operator $\Gamma$ of $X$, defined by:

$$(\Gamma v)(t)=\int\sigma(t,s)v(s)\,ds,\qquad v\in L^2[0,1],$$

where is supposed to satisfy the condition and is supposed positive definite. Let denote the ordered eigenvalues of and let be the corresponding basis of eigenfunctions of that are usually called the functional principal components (FPC). The FPCs represent the orthonormal basis of the Karhunen-Loève decomposition of and provide optimal low-dimensional representations of with respect to the mean-squared error. See, for instance, Ramsay and Silverman (2005). In some cases where the law of is given, the FPCs are available. However, most of the time this is not the case and the FPCs have to be estimated from the empirical covariance operator

$$(\hat\Gamma v)(t)=\int\hat\sigma(t,s)v(s)\,ds,$$

where and Let denote the eigenvalues of and let be the corresponding basis of eigenfunctions, i.e., the estimated FPCs. We adopt the usual identification condition and we suppose that for any . For any let

$$\langle X_i,\gamma\rangle_n=\sum_{k=1}^{p}\gamma_k\int_{[0,1]}X_i(t)\hat\psi_k(t)\,dt.$$

Let be the test statistic obtained from equations (3.2) and (3.3) after replacing all the inner products by the estimated versions Below we show that the test behaves asymptotically like the test For the behavior under the null hypothesis, no additional assumption is required. For consistency, we impose mild conditions on and a slightly more restrictive bandwidth range.
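For curves recorded on an equally spaced grid, the estimated FPCs and the projections $\langle X_i,\gamma\rangle_n$ can be approximated as in the sketch below. It assumes a centered empirical covariance with normalization $1/n$, a Riemann-sum approximation of the integrals, and an arbitrary sign for each estimated eigenfunction; all names are hypothetical.

```python
import numpy as np

def estimated_fpc_scores(X, t):
    """Estimated FPCs for curves X (n x m) on an equally spaced grid t of [0, 1].

    Returns the eigenvalues in decreasing order, the eigenfunctions as the
    columns of an (m, m) array with unit L2 norm, and the (n, m) array of
    integrals of X_i(t) * psi_hat_k(t) over [0, 1].
    """
    n, m = X.shape
    dt = t[1] - t[0]
    Xc = X - X.mean(axis=0)
    sigma_hat = Xc.T @ Xc / n                    # empirical covariance on the grid
    evals, evecs = np.linalg.eigh(sigma_hat * dt)  # discretized covariance operator
    order = np.argsort(evals)[::-1]
    evals = np.clip(evals[order], 0.0, None)     # remove tiny negative round-off
    psi_hat = evecs[:, order] / np.sqrt(dt)      # so that sum(psi^2) * dt = 1
    scores = X @ psi_hat * dt                    # Riemann sums for the integrals
    return evals, psi_hat, scores

def projections(scores, gamma):
    """<X_i, gamma>_n using the first p = len(gamma) estimated FPCs."""
    gamma = np.asarray(gamma, dtype=float)
    return scores[:, :len(gamma)] @ gamma
```

Since the statistic depends on the projections only through their ranks, adding the same constant to all projections (for instance by centering the curves first) would not change the value of the test statistic in this sketch.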

###### Corollary 3.7

a) Under the same conditions, the conclusion of Theorem 3.5 remains true if is replaced by .

b) In addition to the conditions of Theorem 3.6 assume that

1. there exist such that , ;

2. the vector in condition (e) of Theorem 3.6 is such that the variable has a bounded density ;

The conclusion of Theorem 3.6, then remains true if is replaced by .

The condition on the spacings between the ordered eigenvalues of is a common condition in functional data modeling. In view of Lemma 3.1-(B), almost any unit norm vector of finite but sufficiently large dimension is a candidate to be . Hence the bounded density condition for some is also a mild restriction. For instance, it is satisfied for any unit norm vector if is a gaussian process.

The value of needs to grow to infinity to guarantee consistency against general alternatives. Meanwhile, large makes the optimization over more difficult. Using the FPC basis could be a good compromise to detect general alternatives with small If the ’s then decrease as fast as a power of an automatic choice for could be given by for some constant This would result in a logarithmic rate for . In practice should be replaced by the estimates , but the rate of will not change because is of order under mild conditions; see, for instance, Horváth and Kokoszka (2012), chapter 2. In practice simple empirical rules work as well. For instance could be the smallest value such that more than some fixed high percentage, say 95%, of the variance within the covariate sample is captured by the first principal components.
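The empirical rule based on the percentage of explained variance can be sketched as follows, assuming the estimated eigenvalues are supplied in decreasing order. The function name and the default threshold of 95% are illustrative choices.

```python
import numpy as np

def choose_p(eigenvalues, fraction=0.95):
    """Smallest p such that the first p eigenvalues explain more than
    `fraction` of the total variance (eigenvalues in decreasing order)."""
    lam = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)
    explained = np.cumsum(lam) / lam.sum()
    # first index whose cumulative share strictly exceeds the threshold
    return int(np.searchsorted(explained, fraction, side="right") + 1)
```

For example, with eigenvalues `[6, 3, 0.6, 0.3, 0.1]` the cumulative shares are 0.60, 0.90, 0.96, 0.99 and 1, so the rule selects three components.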

Under the null hypothesis, if and increases with at a suitable rate, the ratio behaves like a standard normal for any given sequence of . Meanwhile the supremum of this ratio with respect to diverges in probability with a rate smaller or equal to Hence has to grow to infinity faster than In practice, for sample sizes of hundreds, larger (like for instance ) will likely result in taking and in this case the standard normal critical values will be quite accurate. Having might be reasonable when the practitioner judges trustful for detecting alternatives. On the other hand, smaller (for instance or 2) will probably lead to a value of the test statistic equal to the maximum value of and hence in general, the test will overreject the null hypothesis. Meanwhile, smaller is preferable for detecting general alternatives. On the basis of our detailed simulation investigations, we recommend values for between 2 and 5 and a correction of the critical values through resampling, as explained below.

Now, let us propose a wild bootstrap procedure that could be used for correcting the finite sample critical values. In particular, such a correction is useful to take into account the effect of the penalty $\alpha_n$ with finite samples. The bootstrap sample, denoted by $U_i^b$, $1\le i\le n$, is obtained as follows: $U_i^b=\zeta_iU_i$, $1\le i\le n$, where $\zeta_i$, $1\le i\le n$, are independent random variables with expectation zero and variance one. In particular, for their common distribution we chose the two-point distribution proposed by Mammen (1993), that is, $\zeta_i=-(\sqrt5-1)/2$ with probability $(\sqrt5+1)/(2\sqrt5)$ and $\zeta_i=(\sqrt5+1)/2$ with probability $(\sqrt5-1)/(2\sqrt5)$. As with the original test statistic, a bootstrap test statistic $T_n^b$ is built from a bootstrap sample. Similarly, a bootstrap test statistic can be obtained from this procedure applied with the estimated FPC basis. As usual, for any $\alpha\in(0,1)$, the $(1-\alpha)$th conditional quantile of the bootstrap statistic given the original sample could be approximated using a Monte Carlo method. The asymptotic validity of this bootstrap procedure is guaranteed by Theorem 3.5 and the following result.

###### Theorem 3.8

Under the null hypothesis and if the conditions of Theorem 3.5 hold true,

$$\sup_{x\in\mathbb{R}}\Big|P\big(T_n^b\le x\mid U_1,X_1,\dots,U_n,X_n\big)-\Phi(x)\Big|\to0,\quad\text{in probability},$$

where is the standard normal distribution function. Under the sequence of alternative hypotheses and if the conditions of Theorem 3.6 hold true, for any where is the th conditional quantile of the statistic given The statements remain true with replaced by .
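A wild bootstrap of this kind can be sketched for the univariate predictor case as follows. The sketch assumes multipliers drawn from Mammen's two-point distribution, bootstrap responses obtained by multiplying each $U_i$ by its multiplier, the Epanechnikov kernel, and a p-value computed with the common $(1+\#)/(B+1)$ convention; these choices and all function names are illustrative assumptions. With a functional predictor, the direction selection would presumably be repeated on each bootstrap sample.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)

def mammen_weights(n, rng):
    """Two-point multipliers with mean 0 and variance 1 (Mammen, 1993)."""
    low, high = -(SQRT5 - 1.0) / 2.0, (SQRT5 + 1.0) / 2.0
    p_low = (SQRT5 + 1.0) / (2.0 * SQRT5)
    return np.where(rng.random(n) < p_low, low, high)

def statistic(G, F, h):
    """T_n from the Gram matrix G and the empirical d.f. values F."""
    n = len(F)
    u = (F[:, None] - F[None, :]) / h
    K = 0.75 * np.clip(1.0 - u ** 2, 0.0, None)   # Epanechnikov kernel (assumed)
    np.fill_diagonal(K, 0.0)
    Qn = (G * K).sum() / (n * (n - 1) * h)
    v2 = 2.0 * (G ** 2 * K ** 2).sum() / (n * (n - 1) * h)
    return n * np.sqrt(h) * Qn / np.sqrt(v2)

def wild_bootstrap_pvalue(G, x, h, n_boot=499, seed=0):
    """Bootstrap p-value of T_n given inner products G and scalar predictor x."""
    rng = np.random.default_rng(seed)
    n = len(x)
    ranks = np.empty(n)
    ranks[np.argsort(x, kind="stable")] = np.arange(1, n + 1)
    F = ranks / n
    t_obs = statistic(G, F, h)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        z = mammen_weights(n, rng)
        # <z_i U_i, z_j U_j> = z_i z_j <U_i, U_j>: no need to rebuild the curves
        t_boot[b] = statistic(G * np.outer(z, z), F, h)
    return (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)
```

Working with the matrix of inner products avoids regenerating the bootstrap curves: only the signs and scales of the entries of `G` change from one bootstrap sample to the next.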

Finally, the optimization problem can be solved with reasonable computational effort for small (up to ) by taking a grid of values in the hypersphere . For a larger , a grid of values is not feasible in terms of computation time. In this case, we propose a sequential algorithm based on successive one-dimensional optimizations. If one considers the directions on the hypersphere can be represented as with . As mentioned, one can make the restriction of , since half of the circle is sufficient to consider all directions in the plane. An equally-spaced grid of values of in provides an equally-spaced grid of directions in . Next, if then the first step would be to optimize with respect to the first two components as before. Let be such an optimal direction in two dimensions. The next step would be to optimize in the set of directions for . This is again a one-dimensional optimization that can be solved with a grid of values in the interval . This procedure can be applied to a possible fourth dimension, from the optimal direction obtained with the first three dimensions, and so on until the chosen number of components is reached. This method would require one-dimensional optimizations. Simulations given in section 4.2 were carried out with a sequential algorithm and a grid of 50 points in each one-dimensional optimization. In the Supplementary Material, some empirical results are presented to show that the statistical properties obtained with the sequential algorithm are very close to those obtained with a full-dimensional optimization in . We also show in Lemma B in the Supplementary Material that the sequential search will lead toward a direction able to reveal any departure from the null hypothesis. On the other hand, the asymptotic behavior of the test under the null hypothesis is not affected by the sequential search, since the dominant part of the test statistic will still be given by exactly as in the case of a full-dimensional search.
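The sequential search described above can be sketched as follows. The sketch assumes that the angle is taken on an equally spaced grid of the half circle $[-\pi/2,\pi/2)$, that the search starts from the first coordinate direction, and that `criterion` is a user-supplied function returning the value to maximize for a direction of any dimension up to $p$ (for instance the standardized statistic). The interval, the starting point and the names are assumptions made for illustration.

```python
import numpy as np

def sequential_direction_search(criterion, p, n_grid=50):
    """Greedy search on the unit hypersphere by successive 1-D optimizations.

    At each step the current unit vector gamma (dimension k) is extended to
    (gamma * cos(theta), sin(theta)), which is again a unit vector, and theta
    is chosen on a grid of the half circle.
    """
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_grid, endpoint=False)
    gamma = np.array([1.0])                      # single direction in dimension 1
    for _ in range(p - 1):                       # p - 1 one-dimensional searches
        candidates = [np.append(gamma * np.cos(th), np.sin(th)) for th in thetas]
        values = [criterion(g) for g in candidates]
        gamma = candidates[int(np.argmax(values))]
    return gamma
```

Each step evaluates the criterion `n_grid` times, so the total cost grows linearly in $p$, whereas a full grid on the hypersphere grows exponentially in $p$.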

## 4 Empirical study

The proposed methods were applied to simulated data as well as to real data. We first present the results obtained for a univariate predictor. Functional predictor models are considered later.

### 4.1 Univariate predictor

We consider a model where the response, $U$, is functional and the predictor, $X$, is univariate. Under the null hypothesis, $X$ has no effect on $U$ and a common curve $\mu$ represents the expectation of $U$, that is,

$$U_i(t)=\mu(t)+\epsilon_i(t),\quad 1\le i\le n,\qquad \mu(t)=0.01\cdot\exp\left(-4\cdot(t-0.3)^2\right),\quad t\in[0,1],$$

where are independent Brownian bridges, also independent of . The ’s have a log-normal distribution with mean and standard deviation . Under the alternative,

$$U_i(t)=\mu(t)\cdot X_i+\epsilon_i(t),\quad 1\le i\le n.$$

This is a multiplicative effects model, as proposed by Chiou, Müller and Wang (2004) for the medflies data. Figure 1 represents the curve which is the common curve shape for all individuals in the multiplicative effects model.

Insert Figure 1 here
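As an illustration of the simulation design, the following sketch generates discretized curves from the null and alternative models of this subsection. It is only a sketch under assumptions: the grid size, the seed handling and the function names are ours, and the log-normal parameters (`mean=0.0`, `sigma=1.0`, on the log scale) are hypothetical placeholders rather than the values used in the study.

```python
import numpy as np


def brownian_bridges(n, m, rng):
    """n independent Brownian bridges on an equally spaced grid of m points in [0, 1]."""
    t = np.linspace(0.0, 1.0, m)
    steps = rng.normal(scale=np.sqrt(1.0 / (m - 1)), size=(n, m - 1))
    W = np.hstack([np.zeros((n, 1)), np.cumsum(steps, axis=1)])  # Brownian motion
    return W - t * W[:, -1:]  # pin every path to zero at t = 1


def simulate_univariate(n, m=101, alternative=False, seed=0):
    """Return (t, X, U) with U[i, j] the i-th response curve at t[j]."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    mu = 0.01 * np.exp(-4.0 * (t - 0.3) ** 2)  # common curve mu(t)
    # ASSUMPTION: hypothetical log-normal parameters, not taken from the paper.
    X = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    eps = brownian_bridges(n, m, rng)
    # Null: U_i = mu + eps_i.  Alternative: U_i = mu * X_i + eps_i.
    signal = mu * X[:, None] if alternative else np.broadcast_to(mu, (n, m))
    return t, X, signal + eps
```

Because the bridge errors vanish at both endpoints, each simulated curve equals its mean at $t=0$ and $t=1$, which gives a quick sanity check of the generator.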

The statistic was computed with the Epanechnikov kernel, . Table 1 below shows percentages of rejections for several nominal levels and sample sizes , under the null hypothesis and under the alternative, and different values of the bandwidth, The coefficient is indicated in the table. For each original sample, we used 499 bootstrap samples to compute the critical value. One thousand original samples were generated to approximate the percentages of rejection. Each original sample was generated once for all the significance levels and bandwidths.
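The ingredients of this Monte Carlo design can be sketched in a few lines. The Epanechnikov kernel below is the standard one, $K(u)=0.75\,(1-u^2)$ on $[-1,1]$. The two helpers are assumptions made for illustration: `bootstrap_p_value` supposes a one-sided test that rejects for large values of the statistic and uses the common "plus one" convention, and the bootstrap resampling scheme itself is not shown because it is not described in this section.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) for |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def bootstrap_p_value(stat, boot_stats):
    """One-sided bootstrap p-value (ASSUMPTION: large statistics reject)."""
    boot_stats = np.asarray(boot_stats, dtype=float)
    return (1.0 + np.sum(boot_stats >= stat)) / (boot_stats.size + 1.0)


def rejection_percentage(p_values, alpha):
    """Percentage of Monte Carlo samples whose p-value is at most alpha."""
    return 100.0 * np.mean(np.asarray(p_values, dtype=float) <= alpha)
```

With 499 bootstrap statistics the p-value takes values on a grid of step $1/500$, so the usual nominal levels such as 0.01, 0.05 and 0.10 are attained exactly on that grid.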

The level is respected under the null hypothesis, and the approximation improves as the sample size grows. Under the alternative, the power increases with the sample size, and the bandwidth has little effect.

Insert Table 1 here

#### 4.1.1 Application to egg-laying curves of fruit flies

As briefly explained in the Introduction, Chiou et al. (2003) proposed a multiplicative effects model for the egg-laying curve of each fly, which can be expressed

$$U_i(t)=\mu(t)\,\phi(X_i)+\epsilon_i(t),\quad 1\le i\le n,$$

where $U_i(t)$ is the number of eggs laid by the $i$th fly on day $t$, $\mu(t)$ represents the common shape of the egg-laying curve for all flies, $\phi(X_i)$ is a multiplicative effect related to $X_i$, which denotes the total number of eggs laid by the $i$th fly, and $\epsilon_i(t)$ is an error term. The data under analysis consist of 936 flies that laid at least one egg in their lifetime. The complete data set is available on the web pages of the authors of Chiou et al. (2003).

We applied the new test to check the effect of the total number of eggs on the egg-laying curve. The null hypothesis of no effect, against a nonparametric alternative, was clearly rejected, with $p$-values extremely close to zero. The test was then applied to check the goodness-of-fit of the multiplicative effects model. Note that under the multiplicative effects model, both functions $\mu$ and $\phi$ are nonparametrically estimated. To obtain the goodness-of-fit test, the residuals from the fitted model were used in the expression of the test statistic. Our test clearly rejects the model ($p$-value less than 0.001). The rejection could be due to some discrepancies already found by Chiou et al. (2003) in the peak of the egg-laying curve. Some flies showed a peak quite far from the model; in particular, flies that produced fewer eggs (smaller value of $X_i$) typically had shorter lifetimes and an earlier peak. When of these flies with an anomalous peak were deleted from the data set, the remaining sub-sample of flies provided a better fit of the model, which was no longer rejected by our test (the $p$-value was 0.124). The new test was thus useful to confirm the anomalies found by Chiou, Müller and Wang (2004) in some individuals with respect to the multiplicative effects model. Once these individuals were removed, the model was no longer rejected by the test.

### 4.2 Functional predictor

We shall now assess the performance of the test in the case of a functional predictor. The sequential algorithm described in Section 3.5 will be used to compute the test statistic with a grid of 50 points in each one-dimensional optimization. In all our simulated models the empirical percentages of rejection will be provided on the basis of one thousand original samples. The critical values for each sample will be approximated by means of 499 bootstrap replicates.

Together with the assessment of the level under the null and the power under the alternative, we shall compare our test with the procedure proposed by Kokoszka et al. (2008), which is a parametric test of the functional linear effect.

The first situation we considered then was a functional linear model given by

$$U_i(t)=\int_0^1\zeta(s,t)\,X_i(s)\,ds+\epsilon_i(t),\quad 1\le i\le n, \tag{4.5}$$

where and are independent Brownian bridges and is square-integrable over . The kernel was chosen to be , with under the null and under the alternative.
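A discretized version of the functional linear model (4.5) can be simulated as sketched below. The integral is approximated by a Riemann sum on an equally spaced grid. The kernel used here, $\zeta(s,t)=c\,s\,t$, is a hypothetical stand-in chosen only so that $c=0$ corresponds to the null hypothesis; it is not claimed to be the kernel used in the study, and the function names are ours.

```python
import numpy as np


def brownian_bridges(n, m, rng):
    """n independent Brownian bridges on an equally spaced grid of m points in [0, 1]."""
    t = np.linspace(0.0, 1.0, m)
    steps = rng.normal(scale=np.sqrt(1.0 / (m - 1)), size=(n, m - 1))
    W = np.hstack([np.zeros((n, 1)), np.cumsum(steps, axis=1)])
    return W - t * W[:, -1:]


def simulate_functional_linear(n, c, m=101, seed=0):
    """Return (t, X, U, eps) for U_i(t) = int zeta(s, t) X_i(s) ds + eps_i(t)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    X = brownian_bridges(n, m, rng)    # functional covariate
    eps = brownian_bridges(n, m, rng)  # functional error, independent of X
    # ASSUMPTION: hypothetical kernel zeta(s, t) = c * s * t; c = 0 gives the null.
    zeta = c * np.outer(t, t)          # zeta[j, k] = zeta(s_j, t_k)
    ds = 1.0 / (m - 1)
    U = X @ zeta * ds + eps            # Riemann-sum approximation of the integral
    return t, X, U, eps
```

Returning the error curves alongside the responses makes it easy to verify that the signal part vanishes when `c = 0`.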

The estimated functional principal components of the covariate are used as the basis. Different possibilities for the privileged direction were considered. We present here the results for an uninformative direction, with the same coefficients in all basic elements. For the penalization we used the value , which provides a good trade-off between the privileged direction and the direction maximizing the standardized statistic. The Epanechnikov kernel was again used to compute the statistic in each direction. The bandwidth was chosen following the rule .

Table 2 shows the empirical powers obtained for different significance levels, and sample sizes, with representing the null hypothesis and , and under the alternative. For the number of basic components, , they were chosen for each simulated sample such that the percentage of explained variance was at least 95%. The most frequent values observed for were 9 and 10. This is close to the dimension with 95% of variance in the Brownian bridge, which is the distribution of the covariate.
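The rule of keeping enough principal components to explain at least 95% of the variance can be sketched as follows. This is a minimal sketch assuming the covariate curves are observed on a common, equally spaced grid and stored row-wise in an array `X`; the function name and the SVD-based computation are our choices for illustration.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.95):
    """Smallest number of empirical principal components whose cumulative
    share of variance is at least `threshold`.

    X has shape (n_curves, n_grid_points).
    """
    Xc = X - X.mean(axis=0)                    # center the curves pointwise
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, in decreasing order
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index at which the cumulative share reaches the threshold, as a count.
    return int(np.searchsorted(explained, threshold)) + 1
```

On an equally spaced grid the quadrature weight is a constant factor that cancels in the ratio, so the shares of variance do not depend on the grid step.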

The empirical powers of the Kokoszka et al. (2008)’s test are also shown titled . The same dimension with 95% of explained variance is used as the dimension of the covariate in their test. Their test also requires choosing a dimension of the response. The value of was taken in all cases.

From Table 2, one can conclude that the new test generally respects the nominal levels, while Kokoszka et al. (2008)’s test is somewhat conservative, especially for small samples.

Regarding the power under the alternative, one would expect Kokoszka et al. (2008)’s test to be more powerful, since their test is specifically designed to check this type of effect. We consider that this power comparison is affected by the conservative nature of their test for a high dimension such as 9 or 10, as estimated here. We will see this in more detail in the following experiments with several fixed values of the dimension .

Insert Table 2 here

Table 3 shows the empirical rejection percentages of the new test and Kokoszka et al. (2008)’s test for fixed values of the dimension under the null hypothesis. The option “random” for represents the random number of components required to obtain at least 95% of the explained variance. The new test respects the nominal levels for any dimension, while Kokoszka et al. (2008)’s test is conservative for high dimensions, especially with small sample size and small nominal level.

Insert Table 3 here

Table 4 is similar to Table 3, but under the alternative hypothesis coming from the functional linear effect with . As expected, Kokoszka et al. (2008)’s test is more powerful for low dimensions, since this is the ideal situation for their parametric test. An increasing dimension produces a power loss in both tests, as a consequence of more noise in the statistic, while low dimensions are sufficient to detect the alternative. The new test is less affected by the dimension than the parametric test. In particular, the new test becomes more powerful than its parametric competitor for high dimensions. Although part of the parametric test’s power loss can be attributed to the inaccuracy of the asymptotic distribution, it is also true that the new test is designed to overcome the problem of dimension, usually called the curse of dimensionality in the literature on lack-of-fit tests.

Insert Table 4 here

Other alternatives were considered to complete the comparison with Kokoszka et al.’s test. One of them is of the following type:

$$U_i(t)=\beta(t)\,X_i(t)+\epsilon_i(t),\quad 1\le i\le n,$$

where and are independent Brownian bridges (as in the previous situation) and is a square-integrable function on . This is the so-called concurrent model studied in detail in Ramsay and Silverman (2005), where the covariate at time only influences the response function at time . The function was , with under the null and under the alternative.

A completely nonlinear alternative was also considered. In this case a quadratic model of this type was generated:

$$U_i(t)=H(X_i(t))+\epsilon_i(t),\quad 1\le i\le n,$$

where and are independent Brownian motion and Brownian bridge, respectively, and . The null hypothesis is satisfied when , while the alternative is represented by .
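The concurrent and quadratic alternatives can be simulated along the same lines. In the sketch below the functions $\beta(t)=c\,t$ and $H(x)=c\,x^2$ are hypothetical choices made for illustration, with $c=0$ giving the null hypothesis in both cases; they are not claimed to be the functions used in the study, and all function names are ours.

```python
import numpy as np


def brownian_paths(n, m, rng, bridge=False):
    """Brownian motions (or bridges) on an equally spaced grid of m points in [0, 1]."""
    t = np.linspace(0.0, 1.0, m)
    steps = rng.normal(scale=np.sqrt(1.0 / (m - 1)), size=(n, m - 1))
    W = np.hstack([np.zeros((n, 1)), np.cumsum(steps, axis=1)])
    return W - t * W[:, -1:] if bridge else W


def simulate_concurrent(n, c, m=101, seed=0):
    """U_i(t) = beta(t) X_i(t) + eps_i(t), with X_i and eps_i Brownian bridges."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    X = brownian_paths(n, m, rng, bridge=True)
    eps = brownian_paths(n, m, rng, bridge=True)
    beta = c * t  # ASSUMPTION: hypothetical coefficient function
    return X, beta * X + eps


def simulate_quadratic(n, c, m=101, seed=0):
    """U_i(t) = H(X_i(t)) + eps_i(t), with X_i Brownian motion, eps_i Brownian bridge."""
    rng = np.random.default_rng(seed)
    X = brownian_paths(n, m, rng, bridge=False)
    eps = brownian_paths(n, m, rng, bridge=True)
    H = c * X ** 2  # ASSUMPTION: hypothetical quadratic link
    return X, H + eps
```

In the concurrent model the response at time $t$ depends on the covariate only through its value at the same time $t$, which is why the effect is computed pointwise rather than through an integral.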

Table 5 contains the percentages of rejection under the three alternative models for both tests with different significance levels and sample sizes. The bandwidth followed the rule , and the dimension was taken to explain 95% of the variance in the empirical PCA.

Kokoszka et al.’s test is more powerful than the new test under the linear alternative, and also under the concurrent alternative. This is not necessarily surprising since the concurrent model is, in a sense, a degenerate functional linear model. On the other hand, Kokoszka et al.’s test, which was designed to detect only linear effects, is not powerful under the quadratic alternative.

Insert Table 5 here

We shall now consider a functional linear model with heteroscedastic error, given by

$$U_i(t)=\int_0^1\zeta(s,t)\,X_i(s)\,ds+\sqrt{1/2+X_i(t)^2}\,\epsilon_i(t),\quad 1\le i\le n,$$

where and are independent Brownian bridges and the kernel was chosen to be