A Comparison of Testing Methods in Scalar-on-Function Regression

# A Comparison of Testing Methods in Scalar-on-Function Regression

Merve Yasemin Tekbudak Marcela Alfaro Córdoba Arnab Maity and Ana-Maria Staicu Department of Statistics, North Carolina State University

# Supplementary Material for “A Comparison of Testing Methods in Scalar-on-Function Regression”

Merve Yasemin Tekbudak Marcela Alfaro Córdoba Arnab Maity and Ana-Maria Staicu Department of Statistics, North Carolina State University
###### Abstract

A scalar-response functional model describes the association between a scalar response and a set of functional covariates. An important problem in the functional data literature is to test the nullity or linearity of the effect of the functional covariate in the context of scalar-on-function regression. This article provides an overview of the existing methods for testing both the null hypotheses that there is no relationship and that there is a linear relationship between the functional covariate and scalar response, and a comprehensive numerical comparison of their performance. The methods are compared for a variety of realistic scenarios: when the functional covariate is observed at dense or sparse grids and measurements include noise or not. Finally, the methods are illustrated on the Tecator data set.

###### keywords:
Functional regression, functional linear model, nonparametric regression, mixed-effects model, hypothesis testing
\biboptions

authoryear

## 1 Introduction

The scalar-on-function regression model refers to the situation where the response variable is a scalar, and the predictor variable is functional. Such models are generalizations of the usual regression models with a vector-valued covariate, both linear and nonlinear, to the case with functional covariates. The functional version of standard linear regression is the so-called functional linear model (FLM)(see, e.g., Ramsay and Dalzell (1991)); various extensions to nonparametric functional regression models have also been developed (see, e.g., Ferraty and Vieu (2006)). In this article, we are concerned with hypothesis testing procedures in such scalar-on-function regression models. As in standard regression models, one important problem is to test whether there is any association between the functional covariate and the response, that is, the test for nullity. Also, for nonparametric or nonlinear functional regression models, another equally important question is to test linearity of the relationship between the functional covariate and the scalar response; this is primarily because of the interpretability and ease of fit of the FLM. There is a plethora of literature that develops statistical methods for testing nullity and linearity in both linear and nonlinear scalar-on-function regression models, respectively. Despite the various available methods, there is no clear guideline as to which method provides the best performance in different situations. In this article, our goal is to provide an overview of the available testing methods, perform an extensive numerical study to compare their size and power performance in various data generation models and provide a guideline as to which method yields to the best performance. We will illustrate the discussed methods via the Tecator data set.

Much of the literature on testing nullity has been developed under the assumption that there is a linear relationship between the functional covariate and the scalar response, that is, the functional linear model (FLM). First introduced by Ramsay and Dalzell (1991), the FLM is one of the most commonly used functional regression models due to its interpretability and simplicity. It has received extensive attention in recent literature; see Ramsay and Silverman (1997, 2005), Cardot et al. (1999), Müller and Stadtmüller (2005), Ferraty and Vieu (2006), Cai et al. (2006), Hall et al. (2007), Crambes et al. (2009), and Goldsmith et al. (2011). The FLM quantifies the effect of the functional covariate as an integral of the functional covariate weighted by an unknown coefficient function. The test for nullity, in this case, involves testing whether the coefficient function is zero or not.

Cardot et al. (2003) proposed two testing methods that are based on the norm of the cross-covariance operator of the functional covariate and the scalar response for testing nullity in the FLM. They provided asymptotic normality and consistency results of the two proposed test statistics. Later, Cardot et al. (2004) considered an alternative approach based on a direct approximation of the distribution of the cross-covariance operator. Furthermore, they proposed a pseudo-likelihood test statistic for the situation when there are multiple functional predictors. Assuming an FLM, Swihart et al. (2014) proposed likelihood ratio-based test statistics, representing the model by using a mixed-effects modeling framework and rewriting the null hypothesis with zero-variance components. The major advantage of the mixed-effects model is that software is readily available for estimation and hypothesis testing. Recently, Kong et al. (2016) considered traditional testing methods—Wald, score, likelihood ratio, and F, to test for no effect in the FLM. They derived the theoretical properties of each testing method and compared their performances for both densely and sparsely sampled functions.

The main disadvantage of the FLM is the assumption of linearity of the relationship between the functional covariate and the scalar response. Such a linear relationship may not be practical in many situations, and as a result, there is a substantial amount of literature on the development of nonlinear/nonparametric functional regression models. Yao and Müller (2010) considered a quadratic regression model as an extension of the FLM by including quadratic effects of the functional covariate. García-Portugués et al. (2014) considered testing in a nonparametric functional regression model, where the effect of the functional covariate was modeled via an unknown functional. McLean et al. (2014) developed the so-called functional generalized additive model (FGAM), where the effect of the functional covariate is modeled using an integral of a bivariate smooth function involving the functional covariate at a specific time point and the time point itself. For testing nullity, García-Portugués et al. (2014) introduced the projected Cramér-von Mises (PcVM) test—a testing method which is derived by using random projection, and whose null distribution is approximated by bootstrap. McLean et al. (2015) introduced a restricted likelihood ratio test (RLRT) statistic for testing no effect under the assumption that the response and the predictor are related through an FGAM. The key idea is to use the mixed model formulation of the smooth effects and represent the null hypothesis as the test for a subset of variance components.

Other than testing for nullity, another important problem is to test for linearity of the regression function. Motivated by the idea of a polynomial functional relationship (e.g., quadratic functional regression by Yao and Müller (2010)), Horváth and Reeder (2013) developed a testing method by using functional principal component scores to test the null effect of the quadratic term and studied its asymptotic properties. The testing methods of García-Portugués et al. (2014) and McLean et al. (2015) can also be used to investigate the problem of testing for the linear effect.

In this article, our goal is to numerically compare the performance of all the existing methods for testing nullity as well as linearity of a functional covariate when the response is scalar in a variety of scenarios related to how the functional covariate is observed. The results are illustrated for varying sample sizes and situations of increasing complexity regarding the functional covariate. Additionally, we apply the methods to a commonly used data set, the Tecator data, to formally assess the relationship between the meat’s spectrum of absorbances and the fat content, using 215 finely chopped pure meat samples.

The article makes two key contributions to the literature. First, we study each of these methods under a wide variety of scenarios in which the functional predictor is observed either on a dense or sparse grid of points for each subject, with or without measurement error. Much of the previous work relies only on the assumption of densely observed functional predictors. Second, we provide a comprehensive comparison study of the existing approaches for testing nullity and linearity of scalar-on-function regression.

The remainder of the article is organized as follows. In Section 2, we introduce the data structure and model framework for scalar-on-function regression. In Section 3, we review each of the methods under comparison. Section 4 discusses the advantages and drawbacks of each method in greater detail. Simulation studies and the real data application follow in Section 5 and 6, respectively.

## 2 Model Framework

Suppose that for subject we observe data of the form , where is a scalar response variable and are discrete realizations of a real-valued, square-integrable smooth curve at observation points . For simplicity of exposition, we assume that the full predictor trajectory is observed; however, the methods are investigated for the case when the true predictor is observed on a finite grid of points and corrupted with measurement error. Without loss of generality, we assume that the functional covariate is a zero-mean process. A scalar-response functional model can be defined as

 Yi=α+m{Xi(⋅)}+εi, (1)

where is an unknown functional and are independent and identically distributed random errors with mean zero and variance . According to Ferraty and Vieu (2006), can be classified as parametric and nonparametric, depending on the specific mean model at hand. An example of a functional parametric mean model is the functional linear model (FLM) where for some unknown continuous function . In contrast, a functional nonparametric mean model assumes that the object is a continuous real-valued operator defined on a Hilbert space . In this article, we are interested in testing two important hypotheses about the mean structure: (i) , the relationship between the covariate and the response is linear, and (ii) , there is no relationship between and .

The main focus of this article is to numerically compare the performance of the existing methods for testing and in a variety of realistic scenarios. For testing , we study the nonparametric testing method of García-Portugués et al. (2014) (which we call GGF using first letters of the authors’ names), the semi-parametric method of McLean et al. (2015) (which we call MHR), and the parametric method of Horváth and Reeder (2013) (which we call HR). For testing , we study the GGF and MHR methods, and also the parametric method of Kong et al. (2016) (referred by KSM), which assumes a linear relationship between the response and the predictor. We assess the performance of the methods in the cases when the functional covariate is observed on a dense, moderately sparse, or sparse grid, with and without measurement error, using different sample sizes. This article offers a comprehensive comparison study of available approaches in the literature for testing nullity and linearity in scalar-on-function regression.

## 3 Hypothesis Testing

In this section, we review each of the methods under study. All the methods rely on the idea of using basis expansion to approximate the functional linear model by a simple mixed-effects model. The various methods use different test statistics and corresponding null distributions, and they have been developed to assess the null hypothesis in specific settings. First, we consider the problem of testing linearity. The GGF method considers this problem in the class of nonlinear models, which is the most general case considered in the literature. In contrast, MHR and HR consider this problem in a more restrictive class of models: MHR assumes a functional generalized additive model (FGAM), and HR assumes a functional quadratic model. For testing nullity, GGF assumes a general non-null relationship, MHR assumes an FGAM relationship, and KSM assumes a linear dependence. These assumptions are reflected in the form of the alternative hypothesis. The HR and KSM methods are parametric methods that require stronger assumptions in order to develop their corresponding null distributions.

### 3.1 Testing for the Linear Effect of the Functional Covariate

#### 3.1.1 GGF Method for Testing Linearity H01

The GGF method (García-Portugués et al., 2014) for testing linearity is essentially a generalization of a goodness-of-fit test in regression models for scalar responses and vector covariates to the case when the covariate is functional. The interest is to test the null hypothesis , which indicates that belongs to the family versus a general alternative of the form with positive probability. In other words, the alternative hypothesis can also be written as

 HA1:E(Y)=m{X(⋅)}, (2)

where is an unknown functional, while the null hypothesis is that .

The key idea is to characterize the linear relationship in an equivalent way that is based on random projection. Specifically, García-Portugués et al. (2014) show that for an element in is equivalent to

 E[(Y−⟨X(⋅),β⟩)\mathds1{⟨X(⋅),γ⟩≤u}]=0, (3)

almost everywhere for any and for all , where and such that their norm , and are orthogonal bases in . For more information, see Lemma 3.1 in García-Portugués et al. (2014). The latter formulation essentially implies that the mean of the departure from a linear relationship—that is, concentrated on arbitrarily small neighborhoods, is zero. Thus, one approach to testing for linearity is to quantify the mean on the left-hand side of (3) and assess how different it is from zero.

The GGF method proposes to do this by first estimating by the best linear estimator, , using known basis functions to expand both the functional covariate and the coefficients, and then rewriting the functional linear model as a standard linear model, as proposed by Cardot et al. (1999). The residuals under the null hypothesis are . Once the residuals are estimated, a projected Cramér-von Mises (PcVM) test statistic with a plug-in estimator is used. Specifically, for fixed and , a method of moment estimator of (3) is , where

 ^Rn(u,γ)=n−1/2n∑i=1^εi\mathds1∫Xi(t)γ(t)dt≤u. (4)

The PcVM test statistic is adapted to the projected space and defined as

 PCvMn,p=∫Π^Rn(u,γ)2Fn,γ(du)ω(dγ), (5)

where is the empirical cumulative distribution function of the data and is a measure on .

This expression is certainly complicated, and its derivation has numerous cumbersome steps. However, García-Portugués et al. (2014) show that in practice is approximated by , where is an matrix of the average over and of the three-dimensional array . The array represents the product surface area of a spherical wedge of angle times the determinant of the matrix (from the Cholesky decomposition of the basis functions). For details concerning the matrix and the derivation of this approximation, we refer the reader to García-Portugués et al. (2014). The null distribution of the test statistic is nonstandard and is approximated by a wild bootstrap on the residuals.

#### 3.1.2 MHR Method for Testing Linearity H01

The MHR method (McLean et al., 2015) considers testing in the class of models called functional generalized additive model (FGAM), for which the response and covariate relate according to the following relationship:

 Y=α+∫TF{X(t),t}dt+ε; (6)

we use the generic notation respectively for the response, functional covariate, and Gaussian random error with zero mean and variance , and represents an unknown bivariate function. It can be clearly seen that FGAM reduces to the FLM when ; thus FLM is a special class of FGAM. Testing the hypothesis of interest in this class is equivalent to representing the alternative hypothesis as . The key idea behind the test is to use the connection between the tensor product splines and mixed-effects modeling (Wood et al., 2013) and to formulate the FGAM as a mixed model representation with two main parts: a component represented by unpenalized, fixed effects and a component represented by random effects.

Using low-rank spline bases, denoted as and , the bivariate surface can be expanded as

 F(x,t)=KX∑j=1Kt∑k=1BXj(x)BTk(t)θjk, (7)

where the are unknown tensor product B-spline coefficients. Let denote the matrix of the -axis B-splines that are evaluated at , where is the matrix whose rows include the observed functional predictor values for each subject. Similarly, let denote the matrix of the -axis B-splines that are evaluated at , where is the matrix in which each row includes the observed time points for the functional predictor for each subject. The matrices , , , and are derived from the eigendecompositions of marginal penalty matrices and . After some mathematical manipulations, we can define the fixed-effects design matrix , and the random-effects design matrices , , and , where denotes the Kronecker product and represents the box product (also known as the row-wise Kronecker product). Then, FGAM can be expressed in the form of a mixed-effects model with three pairwise independent vectors of random effects, each with a diagonal covariance matrix independent of the other effects:

 Y≈LXβ+3∑j=1LZjbj+ε, (8)

where is an matrix of quadrature weights; with the dimensions , , and ; and . The matrix forms a basis for functions of the form without penalty, forms a basis for functions of the form and penalty , forms a basis for functions of the form and penalty , and forms a basis for functions of the form without the previous terms and with penalty . In addition, it can be shown that the FLM is nested within the FGAM in an explicit way, which allows the use of restricted likelihood ratio tests for zero-variance components to test the null hypothesis that the functional linear model holds, . The testing is done via the restricted likelihood ratio test (RLRT) under the assumption that :

 RLRT=2supH1ℓR(y)−2supH0ℓR(y),

where denotes the restricted log-likelihood function of the observed data vector for model (8). Crainiceanu and Ruppert (2004) derive the finite-sample null distribution of the RLRT statistic and show that the distribution is different from the mixture of distributions.

#### 3.1.3 HR Method for Testing Linearity H01

The HR method (Horváth and Reeder, 2013) considers the same problem and proposes a method based on projecting the predictor process onto a space of finite dimension by using the functional principal component analysis (FPCA). This approach assumes a functional quadratic regression model

 Y=α+∫X(t)β(t)dt+∬X(t)X(s)γ(s,t)dtds+ε, (9)

where and are unknown smooth univariate and bivariate functions, respectively. Notice that when , model (9) reduces to the FLM; equivalently the FLM is a subclass of model (9). Horváth and Reeder (2013) focus on testing the significance of the quadratic term in model (9); that is, they focus on the null hypothesis versus .

The regression coefficient functions are expanded using the eigenfunctions of the covariance function of the predictor to represent them as and , where denote the eigenfunctions of . By projecting the observations onto the space spanned by and using the expansions given above, we can rewrite model (9) as

 Yi=α+p∑j=1bj⟨Xi,vj⟩+p∑j=1p∑k=1{2−\mathds1(j=k)}ajk⟨Xi,vj⟩⟨Xi,vk⟩+ε∗i, (10)

where and are the coefficients, and

 ε∗i =εi+∞∑j=p+1bj⟨Xi,vj⟩+∞∑j=p+1∞∑k=j{2−\mathds1(j=k)}ajk⟨Xi,vj⟩⟨Xi,vk⟩ +p∑j=1∞∑k=p+12ajk⟨Xi,vj⟩⟨Xi,vk⟩.

Because the eigenfunctions and the mean process of the functional covariate are unknown, model (10) is not adequate to make statistical inference. Substituting the estimates into (10) results in

 Yi=α+p∑j=1bj⟨Xi−¯X,^cj^vj⟩+p∑j=1p∑k=1{2−\mathds1(j=k)}ajk⟨Xi−¯X,^cj^vj⟩⟨Xi−¯X,^ck^vk⟩+ε∗∗i, (11)

where , is the th estimated eigenfunction of , the are random signs, and

 ε∗∗i =ε∗i+p∑j=1bj⟨Xi,vj−^cj^vj⟩+p∑j=1bj⟨¯X−μX,^cj^vj⟩ −p∑j=1p∑k=j{2−\mathds1(j=k)}ajk(⟨Xi−¯X,^cj^vj⟩⟨Xi−¯X,^ck^vk⟩−⟨Xi,vj⟩⟨Xi,vk⟩).

The model can be rewritten as

 Y=^Z⎛⎜ ⎜⎝~A~Bμ⎞⎟ ⎟⎠+ε∗∗with^Z=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝^DT1^FT11^DT2^FT21⋮⋮⋮^DTn^FTn1⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠,

where ; , where vech() denotes the half-vectorization (vectorization of the lower triangular portion of the matrix); ; and . is the half vectorization of the matrix constructed as a cross-product of each of the eigenfunctions and the centered predictor . is a vector constructed as . , and are estimated using the least squares estimator .

Horváth and Reeder (2013) construct their test by using summary quantities of and the sum of squared :

 Un=nτ2^AT(^G−^M^MT)^A,

where , , and . They show the null distribution of is a with degrees of freedom. measures the distance between and , scaled using the sample size, the estimated coefficients and the residuals . The difference between and corresponds to the interaction between different elements of , which represents the quadratic term. If the difference is too big, then there is evidence of a quadratic relationship.

### 3.2 Testing for the Null Effect of the Functional Covariate

#### 3.2.1 GGF Method for Testing Nullity H02

Testing for the null effect of the functional covariate can be viewed as a special case of testing for the linear effect. García-Portugués et al. (2014) focus on testing for a specific functional linear model, , for a specified smooth function . When for , an equivalent to the null hypothesis is (versus ).

By making minor modifications according to the choice of the null hypothesis, the GGF method uses the same procedure that is described in Section 3.1.1 to compute the test statistic with the residuals under the null hypothesis, . The null distribution of the test statistic is again approximated by using a wild bootstrap sampling procedure on the residuals.

García-Portugués et al. (2014) compare the finite sample properties of the PcVM test statistic with two other competing methods proposed by Delsol et al. (2011) and González-Manteiga et al. (2012). Based on the numerical comparison, the PcVM test statistic is found to be the most powerful among these methods. Thus we focus on the PcVM (denoted by GGF) in this article.

#### 3.2.2 MHR Method for Testing Nullity H02

McLean et al. (2015) also consider testing whether the functional covariate has any effect on the scalar response (that is, for ), where the alternative model is specified as the FLM, . The MHR method tests for no effect by rewriting model (8) without the random effects and . Thus the null hypothesis is equivalent to testing against the alternative hypothesis . The likelihood ratio test (LRT) is more appropriate for this case, because the RLRT cannot be used for testing the fixed effects and . The LRT statistic is:

 LRT=2supH0∪H1ℓ(y)−2supH0ℓ(y),

where denotes the log-likelihood function of the observed data vector for the corresponding mixed-effects model. The exact null distribution for the LRT statistic is not a standard distribution, because the null value of the variance component is on the boundary of the parameter space. Crainiceanu and Ruppert (2004) derive the finite-sample null distribution of the LRT statistic in detail.

#### 3.2.3 KSM Method for Testing Nullity H02

The KSM method (Kong et al., 2016) is an extension of classical testing methods in linear regression to functional linear regression with a scalar response and a functional covariate. Kong et al. (2016) are interested in testing for the null hypothesis given in Section 3.2.2 against the alternative hypothesis , , which yields the alternative model of the form, .

This method uses a spectral decomposition of the covariance function to re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores

 Yi=α+sn∑j=1ξijβj+εi, (12)

where is the number of principal components, are the functional principal component scores uncorrelated over with mean zero and variance decreasing with , and denote the unknown basis coefficients in the expansion . The functions denote the eigenfunctions obtained from the spectral decomposition of the covariance operator of the functional predictor.

Testing the null hypothesis in Section 2 is equivalent to testing against the alternative hypothesis for at least one . The test is defined as

 TF=YT(P1−PB)Y/snYT(In×n−PB)Y/(n−sn−1), (13)

where and are the projection matrices under the alternative and the null models, respectively. Note that we need to fit both the alternative and null models in order to calculate the test statistic. Kong et al. (2016) show that the null distribution of behaves like , which enables us to compute -values by using quantiles.

Kong et al. (2016) theoretically and numerically investigate the finite sample performance of four tests—Wald, score, likelihood ratio, and . Their study in finite sampling shows that the test provides reasonable Type I error rates and power values compared the other valid testing methods, and thus indicates that it is a robust testing method. On the basis of their results, we use only the test for testing the null effect of the functional covariate.

## 4 General Discussion

The MHR method, proposed by McLean et al. (2015), considers an RLRT for testing the null hypothesis that the FLM is the true model versus the FGAM alternative. The main idea behind the MHR method is to represent the FGAM as a standard linear mixed model by taking advantage of the link between the mixed-effects model and penalized splines (Wood et al., 2013). This representation allows us to reduce the dimensionality of the testing problem and formulate the null hypothesis of an unknown function as a set of zero-variance components. The MHR method is computationally efficient because the finite sample null distribution of the RLRT statistic can be obtained very quickly by using a fast simulation algorithm. The MHR method assumes that the functional predictor is observed at a dense and regular grid of points, without measurement error. This method can be modified in a straightforward way when there is more than one functional predictor in addition to the response from any exponential family distribution (McLean et al., 2014).

The GGF method, introduced by García-Portugués et al. (2014), considers a projected Cramér-von Mises test statistic. The asymptotic null distribution of the test statistic is approximated by a wild bootstrap on the residuals. One advantage of the method is that it can be easily extended to any other scalar-on-function regression model, because the test statistic and its null distribution are obtained depending on the residuals under the null model. Unlike other methods in the study, the GGF method specifies a more general alternative model and provides a greater flexibility; as expected, this generality leads to a loss of power relative to competitors when simpler alternatives are true. Another drawback of the method is the fact that bootstrapping the null distribution of the test statistic is computationally intensive. We discuss these drawbacks in the context of our simulation study. The GGF method also makes the assumption that the functional predictor has a dense sampling design and is observed without measurement error.

The HR method (Horváth and Reeder, 2013) relaxes the restrictive assumption of a linear relationship between a scalar response and a functional predictor under the alternative by considering a functional quadratic regression model. The HR method is developed to test for linearity in a class of parametric scalar-on-function regression models. Horváth and Reeder (2013) showed that HR provides good Type I error rates and power results when the sample size is greater than and the functional predictor is densely observed without measurement error. However, the question of whether the HR method still performs well when the sample size is small and the functional predictor is observed on irregular and/or sparse grids was not addressed by the authors. As we will see in our simulation study, the Type I error rate of the HR method is considerably inflated for small and moderate sample sizes, as well as for a sparsely observed functional predictor.

The KSM method (Kong et al., 2016) extends the classical test from multiple linear regression to functional linear regression. This method uses the eigenbasis functions that are derived from the FPCA to reduce dimensionality and re-writes the FLM as a standard linear model. In contrast to the aforementioned methods, KSM is applicable to sparsely observed functional predictors that are corrupted with measurement error. Kong et al. (2016) indicate that the KSM method is a robust testing method that maintains the correct nominal level in various scenarios including small sample sizes and noisy and sparsely measured predictor trajectories. The power performance of the method relies mainly on the choice of the number of functional principal components (FPC). Choosing a large number of FPCs may cause a decrease in power (Kong et al., 2016). This problem has been considered recently by Su et al. (2017).

## 5 Numerical Investigation of Testing Methods

We conduct a simulation study to compare the finite sample performance of each testing method. In an effort to respect the simulation settings used by the original tests’ proponents, we carry out two sets of simulations: one for testing the linear effect of the functional covariate and the other for testing the null effect. Each data set is generated under dense, moderately sparse, and sparse designs, and the number of units per subject is defined respecting their data generation settings. To investigate how the methods perform in moderately sparse and sparse designs, we randomly sample observation points per curve without replacement from the discrete uniform distributions and for both M1 and Y1 settings, and for both G1 and G2 settings, and and for the H1 setting. After generating the designs, we carried out functional principal component analysis with 99% of the total variance explained to impute functional data that were sparsely observed.

Assuming that the settings of each method will highlight the characteristics of its respective test, we use all the testing methods with each of the data sets that are generated. We assess the size and power of the tests for sample sizes that vary from to ; the results are based on 5,000 simulations for size assessment and 1,000 simulations for power assessment.

### 5.1 Simulation Designs for Testing No Covariate Effect

For the no-effect null hypothesis, the model used to generate the data under the alternative is different in each scenario, but they have in common the use of to control the departure from the model without the covariate effect. For all settings, corresponds to the null hypothesis of no effect and corresponds to the alternative hypothesis of a non-null effect. The scenarios (G0, M0) are as follows:

• Setting 1 (G0). The functional process for the functional covariate in this case is a Brownian motion with functional mean and , with and . We use equidistant points in the interval . The model that generates the data is

 Yi=δ∫Xi(t)β(t)dt+εi, (14)

where , , and (García-Portugués et al., 2014).

• Setting 2 (M0). In this setting the functional covariate is generated as , with and . We use equidistant points in the interval . The model that generates the data uses a bivariate function linear in ,

 Yi=α+δ∫F{Xi(t),t}dt+εi, (15)

where , , , and (McLean et al., 2015).

### 5.2 Simulation Designs for Testing a Linear Covariate Effect

In this section, we consider five simulation scenarios: four of them are defined by the articles under study, and the last one is inspired from Yao and Müller (2010) and used as a baseline. As in Section 5.1, the index is used to control the departure from the null hypothesis. Specifically, corresponds to the null hypothesis of linear effect and corresponds to the alternative hypothesis of nonlinear effect. The scenarios (G1, G2, M1, H1, and Y1) are as follows:

• Setting 1 (G1). The functional process for the functional covariate in this case is a Brownian motion with functional mean , and , with and . We use equidistant points in the interval . The model that generates the data is

 Yi=∫Xi(t)β(t)dt+δ∫Xi(t)Xi(t)dt+εi. (16)

We consider , , and evaluate the model with (García-Portugués et al., 2014).

• Setting 2 (G2). This setting is like G1, except that is defined as (García-Portugués et al., 2014).

• Setting 3 (M1). In this setting, the functional covariate is given by , with and . We use equidistant points in the interval . The model that generates the data uses a convex combination of a bivariate function linear in and one nonlinear in , in the following form:

 (17)

where . The departure in this case has a factor that can control how much the generated data deviates from the linear function. We consider several values for and assume (McLean et al., 2015).

• Setting 4 (H1). The functional covariate is given by an independent standard Brownian motion. We use equidistant points in the interval . The model that generates the data is

 Yi=α+∫Xi(t)β(t)dt+δ∫∫Xi(t)Xi(s)dtds+εi, (18)

where in all cases. We consider , , and assume (Horváth and Reeder, 2013).

• Setting 5 (Y1). This last scenario is used as a baseline comparison, because there is no testing method associated with it. The functional process for the functional covariate is generated as , where , where , , , and , and is a measurement error for . We use equidistant points in the interval . The model that generates the data is

 Yi=2∑j=1ξijβ+δ{2∑j=1ξ2ij+ξi1ξi2}+εi, (19)

where and . We consider and assume (Yao and Müller, 2010).

### 5.3 Computational Implementation

The GGF method was implemented through the flm.test function in the R package fda.usc version 1.2.3. The software fits the FLM and estimates the coefficient function by using B-spline basis functions without penalization. The number of basis functions can be predetermined by the user or be chosen via the generalized cross-validation criterion (Ramsay and Silverman, 2005). However, it is worth mentioning that for the M1 setting, the flm.test function encounters singularity errors and fails when more than four basis functions are used. We therefore used basis functions to approximate the functional covariate. The number of bootstrap replicates was .

The HR testing method requires the number of functional principal components (FPCs) to be decided initially. Horváth and Reeder (2013) reported simulation results for several components. In our simulation study, we fixed the number of FPCs at 3.

We implemented the MHR method by using the pseudo.rlr.test function of the R package lmeVarComp version 1.0 and considered 10,000 runs for approximating the null distribution of the test statistic.

### 5.4 Results

We evaluate the size and power performance of the described testing methods under a wide variety of scenarios. The Type I error rates and power are estimated as the proportion of rejecting the null hypothesis in the 5,000 and 1,000 simulated samples, respectively.

For testing linearity, Table S1 in the Supplementary Material (Appendix A.1) shows the performance of the testing methods for dense sampling design by comparing their empirical Type I error rates for nominal levels of 1%, 5%, and 10% and also for varying sample sizes. The results indicate that all three methods behave satisfactorily in terms of empirical levels under all settings, when the sample size is large (). The HR method slightly overestimates the highest nominal level (10%), especially under the G1 and G2 settings. The GGF and MHR methods perform similarly. Power curves for dense sampling design are included in the Supplementary Material, Appendix A.1. Figure S2 shows that when , the MHR method appears to outperform GGF and HR under all data generation settings. For the G1 and G2 settings, there is a small difference in power between MHR and GGF. However, the difference between the two methods becomes more distinguishable under the H1, M1, and Y1 settings. The GGF and HR methods perform similarly in terms of power under the H1 and Y1 settings. For the other settings, GGF is more powerful than HR. We also investigate how the methods behave when the sample size changes. As the sample size decreases to , both GGF and MHR still provide good Type I error rates. The empirical levels of MHR are fairly close to the nominal levels regardless of the data generation setting. However, HR performs very poorly compared to the other two methods. The HR method tends to overestimate all nominal levels for the moderate sample size. When , all three methods in general overestimate the nominal levels. The empirical levels are only slightly higher than the nominal ones for the MHR method, but not for the GGF and HR methods. Particularly, the performance of HR deteriorates considerably as the sample size becomes smaller. Because the empirical levels for the small sample size are significantly inflated, power comparison for would not be appropriate. In Figure S1 (Appendix A.1), we observe that MHR is more powerful than GGF for the moderate sample size under all settings.

Table 1 summarizes the rejection rates for testing linearity under moderately sparse design. For a large sample size (), GGF and MHR maintain the correct nominal levels. The HR method still tends to overestimate the nominal ones, but provides close results to the desired levels. Figure 1 displays the simulated power curves for testing the null hypothesis of the linear covariate effect under the moderate sampling design with the large sample size. The power performance of the methods is a little affected by the change of the sampling design. The methods exhibit minor power loss compared to the results obtained for densely sampled design. The MHR method shows a general advantage over the GGF and HR methods as increases. It performs slightly better than the GGF method under the G1 and G2 settings. For the other settings, the difference between the two methods is more significant. The HR and GGF methods appear to perform very similarly under the H1 and Y1 settings. However, the power of HR is consistently lower than that of GGF for the other data generation settings. For a small sample size (), the GGF method tends to underestimate lower nominal levels (1% and 5%), while the MHR method produces significantly higher empirical rejection rates than the nominal levels of 5% and 10%. As in the dense case, there is an especially pronounced difference between the empirical levels for the HR method and the nominal levels. Both GGF and MHR result in more stable Type I errors for . We notice that the empirical levels decrease significantly as the sample size increases, but the HR method still produces inflated Type I error rates. Similar to the dense case with the moderate sample size, MHR performs better than GGF in terms of power under all settings. The power of the methods increases at a slower rate as the sample size decreases.

Tables S2 in the Supplementary Material (Appendix A.3) reports the probability of rejecting the null hypothesis of linear relationship for sparse sampling design. The Type I error rates are similar to those of the moderately sparse design except for the M1 data generation setting. For the M1 setting, we notice that all three methods have very inflated rejection probabilities. Moreover, the rejection rates for the GGF and MHR methods are not decreasing as the sample size increases. The problem here might be that there are very few observations per curve, so the estimation performance of the FPCA is affected by the sparsity level of the data. Hence, the methods fail to estimate the Type I error rates accurately. The Supplementary Material includes additional simulation results (Table S3), which indicate that adding few more observations per curve—that is, making the data less sparse, improves the performance of GGF and MHR considerably. As for power comparison, the ordering of the methods does not change except that the HR method produces sightly better results than GGF under the Y1 setting for the large sample size. In general, all three methods lose power as the functional data becomes more sparse, as expected.

For testing nullity, Table 2 shows the Type I error rates of the GGF, MHR, and KSM methods. Our results indicate that the rejection probabilities do not appear to change much as the grid of points for the functional covariate becomes more sparse. The rejection rates are mostly within two standard errors of the correct levels for all the designs and various sample sizes, which means that all three methods result in reasonable Type I errors. For all sampling designs, GGF provides more conservative results for the G0 setting than those for the M0 setting. Furthermore, when , GGF provides more conservative results than those for MHR and KSM under the G0 data generation setting. The methods still provide good rejection probabilities as the sample size decreases. Only MHR seems to provide relatively conservative results for under the G0 data generation setting. The methods have comparable power for all sample sizes. According to Figure S6 (Appendix B.1), for dense sampling design with a large sample size, the power functions for MHR and KSM are very close to each other such that they overlap. The GGF method is falling dramatically behind these two methods in terms of the power performance. When , the KSM outperforms MHR under the G0 setting. Figures 2 and S7 (Appendix B.2)show the power performance of the methods for moderately sampled data with large and moderate sample sizes, respectively. Similar to the previous results, the MHR and KSM methods have good power properties and that they outperform the GGF method substantially both under the G0 and M0 data generation settings when . For a moderate sample size (), in particular, KSM is more powerful than MHR under the G0 setting. We notice that the power curves for sparse design are very similar to those obtained for moderately sparse design.

We also compare the computational costs of the four methods for 10 simulation runs of the M0 and M1 data generation settings under dense design with the sample size . The simulations were run on a 2.3 GHz DELL Quad Processor AMD Opteron with 512 Gb of RAM. The KSM method simulated the data in approximately 3 seconds and the MHR method did so in 4 seconds, which indicates that MHR runs almost as fast as KSM. The GGF method took roughly 12 seconds. The HR method was by far the slowest method, with a computation time of 159 seconds.

To sum up, the HR method falls well behind the MHR and GGF methods, because it provides inaccurate size and power results and has computational complexity. Despite the fact that the GGF method produces results rather close to those for the MHR method for some cases, it still has the disadvantage of being computationally more expensive. Our extensive simulation studies indicate that, for testing linearity, the MHR method outperforms its competitors with regard to approximately close empirical levels, high power rates, and computational efficiency. For testing nullity, both MHR and KSM perform similarly in terms size and power performance for the large sample size. For a moderate sample size, there is no uniform best method; however, based on the results we recommend the KSM method.

## 6 Data Analysis

We consider the application of these methods to a food quality control problem. The Tecator data set has been commonly used to predict the fat content of meat samples and is found at http://lib.stat.cmu.edu/datasets/tecator.

This data set includes measurements of a 100-channel spectrum of absorbances, in addition to fat, protein, and moisture (water) content from finely chopped pure meat samples. For each sample of meat, a 100-channel near-infrared (NIR) spectrum of absorbances is calculated as a log transform of the transmittance obtained by the analyzer and recorded. The absorbances for a meat sample can be deemed to be discrete realizations of random smooth curves, . The absorbance trajectories versus wavelength are displayed in Figure 3.

The data were first analyzed by Borggaard and Thodberg (1992), who trained neural network models to predict the fat content. Yao and Müller (2010) proposed a functional quadratic regression model to predict the fat content depending on the absorbance trajectories. For the same purpose, Febrero-Bande and González-Manteiga (2013) developed an algorithm for functional regression models whose response variable comes from an exponential family. Rather than focusing on prediction, Horváth and Reeder (2013) and García-Portugués et al. (2014) used this data set to investigate whether a linear dependence existed between the functional covariate and the scalar response.

The goal of this section is to test whether the association between the spectra of absorbances (functional predictor) and each of the measures of fat, protein, or water content of the meat samples (scalar responses) is null or not. In this regard, we employ the three methods—GGF, MHR and KSM—and we discuss whether there is evidence against a null association. In addition, we investigate whether the existing association is linear by employing the three methods—GGF, MHR and HR. A significance level of is used. Because we use the same data set with three different methods for each response (fat, protein, and moisture), we apply a Bonferroni correction to account for multiple testing. The adjusted significance level is .

The results for the nullity test are shown in Table 3. These results are not very surprising, because previous analyses (Horváth and Reeder, 2013; García-Portugués et al., 2014) have determined that an association exists between each of the responses and the functional covariate. Our analysis confirms this, because all -values are less than . More interesting results are obtained by the linear tests (Table 3), because we can draw different conclusions depending on the test we use. The -values of the GGF method are greater than for both fat and water content, which means that this test produces no evidence to reject the null hypothesis of a linear relationship between percentage of fat and the absorbance trajectories, or between the moisture and the same functional covariate. A different conclusion can be drawn if we use the MHR or the HR method, because their -values are less than for all responses. These results are expected as in our simulation study; the GGF method showed consistently less power than the MHR test for various data structures.

## References

• Borggaard and Thodberg (1992) Borggaard C, Thodberg HH. Optimal minimal neural interpretation of spectra. Analytical chemistry 1992;64(5):545–51.
• Cai et al. (2006) Cai TT, Hall P, et al. Prediction in functional linear regression. The Annals of Statistics 2006;34(5):2159–79.
• Cardot et al. (2003) Cardot H, Ferraty F, Mas A, Sarda P. Testing hypotheses in the functional linear model. Scandinavian Journal of Statistics 2003;30(1):241–55.
• Cardot et al. (1999) Cardot H, Ferraty F, Sarda P. Functional linear model. Statistics & Probability Letters 1999;45(1):11–22.
• Cardot et al. (2004) Cardot H, Goia A, Sarda P. Testing for no effect in functional linear regression models, some computational approaches. Communications in Statistics-Simulation and Computation 2004;33(1):179–99.
• Crainiceanu and Ruppert (2004) Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2004;66(1):165–85.
• Crambes et al. (2009) Crambes C, Kneip A, Sarda P. Smoothing splines estimators for functional linear regression. The Annals of Statistics 2009;:35–72.
• Delsol et al. (2011) Delsol L, Ferraty F, Vieu P. Structural test in regression on functional variables. Journal of Multivariate Analysis 2011;102(3):422–47.
• Febrero-Bande and González-Manteiga (2013) Febrero-Bande M, González-Manteiga W. Generalized additive models for functional data. Test 2013;22(2):278–92.
• Ferraty and Vieu (2006) Ferraty F, Vieu P. Nonparametric functional data analysis: theory and practice. Springer Science & Business Media, 2006.
• García-Portugués et al. (2014) García-Portugués E, González-Manteiga W, Febrero-Bande M. A Goodness-of-Fit Test for the Functional Linear Model with Scalar Response. Journal of Computational and Graphical Statistics 2014;23(3):761–78.
• Goldsmith et al. (2011) Goldsmith J, Bobb J, Crainiceanu CM, Caffo B, Reich D. Penalized functional regression. Journal of Computational and Graphical Statistics 2011;.
• González-Manteiga et al. (2012) González-Manteiga W, González-Rodríguez G, Martínez-Calvo A, García-Portugués E. Bootstrap independence test for functional linear models. arXiv preprint arXiv:12101072 2012;.
• Hall et al. (2007) Hall P, Horowitz JL, et al. Methodology and convergence rates for functional linear regression. The Annals of Statistics 2007;35(1):70–91.
• Horváth and Reeder (2013) Horváth L, Reeder R. A test of significance in functional quadratic regression. Bernoulli 2013;19(5A):2120–51.
• Kong et al. (2016) Kong D, Staicu AM, Maity A. Classical testing in functional linear models. Journal of Nonparametric Statistics 2016;28(4):813–38.
• McLean et al. (2015) McLean MW, Hooker G, Ruppert D. Restricted likelihood ratio tests for linearity in scalar-on-function regression. Statistics and Computing 2015;25(5):997–1008.
• McLean et al. (2014) McLean MW, Hooker G, Staicu AM, Scheipl F, Ruppert D. Functional generalized additive models. Journal of Computational and Graphical Statistics 2014;23(1):249–69.
• Müller and Stadtmüller (2005) Müller HG, Stadtmüller U. Generalized functional linear models. Annals of Statistics 2005;:774–805.
• Ramsay and Dalzell (1991) Ramsay JO, Dalzell CJ. Some tools for functional data analysis. Journal of the Royal Statistical Society Series B (Methodological) 1991;:539–72.
• Ramsay and Silverman (1997) Ramsay JO, Silverman B. Functional data analysis. Springer, New York, 1997.
• Ramsay and Silverman (2005) Ramsay JO, Silverman B. Functional data analysis. Springer, New York, 2005.
• Su et al. (2017) Su YR, Di CZ, Hsu L. Hypothesis testing in functional linear models. Biometrics 2017;.
• Swihart et al. (2014) Swihart BJ, Goldsmith J, Crainiceanu CM. Restricted likelihood ratio tests for functional effects in the functional linear model. Technometrics 2014;56(4):483–93.
• Wood et al. (2013) Wood SN, Scheipl F, Faraway JJ. Straightforward intermediate rank tensor product smoothing in mixed models. Statistics and Computing 2013;:1–20.
• Yao and Müller (2010) Yao F, Müller HG. Functional quadratic regression. Biometrika 2010;97(1):49–64.

The Supplement Material contains two sections. Section A presents additional simulation results (empirical rejection rates and power curves) for testing linearity in the context of dense, moderately sparse, and sparse functional data. Section B includes additional power curves for testing nullity in the context of dense, moderately sparse, and sparse functional data.