Inference in Functional Linear Quantile Regression

# Inference in Functional Linear Quantile Regression

Meng Li
Department of Statistical Science, Duke University, Durham, NC
email: ml371@stat.duke.edu
Kehui Wang
PPD Inc, Morrisville, NC
email: kehui.wang@ppdi.com
Arnab Maity
Department of Statistics, North Carolina State University, Raleigh, NC
email: arnab_maity@ncsu.edu
Ana-Maria Staicu
Department of Statistics, North Carolina State University, Raleigh, NC
email: astaicu@ncsu.edu
###### Abstract

In this paper we study statistical inference in functional quantile regression for scalar response and a functional covariate. Specifically, we consider linear functional quantile model where the effect of the covariate on the quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the level of quantile. The objective is to test that the regression parameter is constant across several quantile levels of interest. The parameter function is estimated by combining ideas from functional principal component analysis and quantile regression. We establish asymptotic properties of the parameter function estimator, for a single quantile level as well as for a set of quantile levels. An adjusted Wald testing procedure is proposed for this hypothesis of interest and its chi-square asymptotic null distribution is derived. The testing procedure is investigated numerically in simulations involving sparsely and noisy functional covariates and in the capital bike share study application. The proposed approach is easy to implement and the R code is published online.

Keywords: Functional quantile regression; Functional principal component analysis; Measurement error; Wald test; Composite quantile regression.

## 1 Introduction

The advance in computation and technology generated an explosion of data that have functional characteristics. The need to analyze these type of data triggered a rapid growth of the functional data analysis (FDA) field; see Ramsay and Silverman (2005); Ferraty and Vieu (2006) for two comprehensive treatments. The current research in functional data analysis has been primarily focused on mean regression and only limited interest has been on quantile regression (Koenker, 2005). Quantile regression is appealing in many applications by allowing us to describe the entire conditional distribution of the response at various quantile levels. In our capital bike share data application, it is of interest to study the effect of the bikes rental behavior of casual users in the previous day on the total bike rentals in the current day quantile regression allows us to form a better description of this relationship. Previous papers examining this problem assumed a mean regression functional model, which implicitly assumes that the effect of the previous day casual bike rentals is the same for current busy days (hight total number of casual rentals) and for current non-busy days. This is a rather restrictive assumption for our data application.

Most of the functional regression research considers modeling the mean response (see for example,Yao et al. (2005); Jiang and Wang (2010); Gertheiss et al. (2013); Ivanescu et al. (2015); Usset et al. (2016)); only few works are accommodating higher order moment effects (Staicu et al., 2012; Li et al., 2015). Quantile regression models for scalar responses and functional covariates have been introduced in Cardot et al. (2005). Functional quantile regression (fQR) models essentially extend the standard quantile regression framework to account for functional covariates: the effect of the covariate on a particular quantile of the response is modeled through the inner product between the functional covariate and an unknown smooth regression parameter function that varies with the level of quantile. Cardot et al. (2005) considers a smoothing splines-based approach to represent the functional covariates and establishes a convergence rate result;  Kato (2012) studied principal component analysis (PCA)-based estimation and establishes a sharp convergence rate. Ferraty et al. (2005) and Chen and Müller (2012) estimated the conditional quantile function by inverting the corresponding conditional distribution function; they too studied consistency properties of the regression estimator. Nevertheless, hitherto there is no available work on statistical inference of the quantile regression estimator. Additionally all the functional quantile regression research assumes that the functional covariate is observed either completely on the domain or at very dense grids of points and typically with little or no error contamination.

In this paper we are interested in formally assessing whether the effect of the true smooth signal, as measured by the covariate, varies across few quantile levels of the response, when the smooth signal is observed at finite grids and possibly perturbed with error. For example, if the covariate has no effect on the response, then the above hypothesis holds true; however our hypothesis covers a variety of other situations. The functional linear regression model (Cardot et al., 2005; Chen and Müller, 2012; Kato, 2012), which is very popular in functional data analysis, essentially assumes that this null hypothesis is valid. The hypothesis of interest is important in its own right, yielding a more comprehensive description of the relationship between the covariate and the conditional distribution of the response. Additionally, formally assessing such hypothesis is critical when one wishes to improve the estimation accuracy of the conditional quantile at a single level; if several quantiles of the response are equal, then the estimation accuracy can be improved by borrowing information across these quantile levels. For example, in the case of standard quantile regression where the predictor is a scalar or a vector, there are several approaches to aggregate information across quantile levels in order to improve quantile estimation. This approach is known in the literature by the name composite quantile regression  (Koenker, 1984; Zou and Yuan, 2008; Zhao and Xiao, 2014; Jiang et al., 2014). Composite quantile regression improves the efficiency of the quantile estimators at single quantile level, which is important for extreme quantiles (Wang and Wang, 2016). We consider these ideas in our real application analysis, though a more formal investigation to explore this direction is left for future research.

To approach this problem, we assume a linear functional quantile regression (fQR) model that relates the th quantile level of the response to the covariate through the inner product between a bivariate regression coefficient function and the true covariate signal. In the case when the true signal is measured at same time points across the study, one naïve way to test the null hypothesis, that the effect of the true covariate signal is constant across several quantile level of interest, is to treat the discretely observed functional covariates as high dimensional covariate and apply standard testing procedures (Wald test) in linear quantile regression for vector covariates (Koenker, 2005). As expected, such approach results in inflated type I errors rates due to the high correlation between the repeated measurements corresponding to the same subject; the situation gets progressively worse when the covariate includes noise. Another alternative is to consider a single number summary of the covariate, such as average or median, and carry out this hypothesis testing by employing standard testing methods in quantile regression. Our numerical investigation of this direction shows that while the Type I error rates are preserved well, the power is substantially affected.

We propose to represent the smooth signal covariate and the quantile regression parameter function using the same orthogonal basis system; this reduces the inner product part of the linear fQR model to an infinite sum of products of basis coefficients of the smooth covariate and parameter function. There are many options in terms of orthogonal bases: we consider the data-driven basis that is formed by the leading eigenbasis functions of the covariance of the true covariate signal and use the percentage of variance explained (PVE) criterion to determine a finite truncation for this basis. While using a finite basis system reduces the dimensionality of the problem, an important challenge is handling the variability of the basis coefficients of the smooth covariate, called functional principal component (fPC) scores. We develop the asymptotic distributions of the quantile estimators based on the estimated fPC scores, when functional covariate is sampled at fine grid of points (dense design). Finally, we introduce an adjusted Wald test statistic and develop its asymptotic null distribution. The introduced testing procedure shows excellent numerical results even in the situations when the functional covariate is sampled at few and irregular time points across the study (sparse design) and the measurements are contaminated by error.

The development of asymptotic distributions of the quantile estimators based on the estimated fPC scores has important differences from the standard linear quantile regression with vector covariates for a couple of reasons. First, the predictors, fPC scores, are unknown and require estimation; this induces increased uncertainty in the model. We show that asymptotically the quantile estimators are still unbiased, but the variance is inflated. This implies that, in this reduced framework, a direct application of the Wald testing procedure for null hypotheses involving regression parameters is not appropriate. Second, dealing with estimated fPC scores in this situation is different from the measurement errors in predictors setting. For the latter, it is typically assumed that the measurement error and the true predictors are mutually independent or that the errors are independent across subjects (Wei and Carroll, 2009; Wang et al., 2012; Wu et al., 2015). However, in the functional data setting the resulting errors, due to the difference between the estimated fPC scores and the true scores, are dependent with the true predictors and are also dependent across subjects. As a result, the theoretical development requires more careful quantification in terms of the estimated scores and the usage of quantile loss.

This article makes three main contributions. First, we establish the asymptotic distribution of the coefficient estimator for both one single quantile level and multiple quantile levels for dense sampled functional covariates. The results show that, while the estimators are asymptotically unbiased, their variance is inflated. As far as we know, this is the first work to develop the theoretical properties of the quantile estimators allowing inference; previous works have mainly discussed consistency and minimax rates (see Chen and Müller (2012); Kato (2012)). Second, we propose an adjusted Wald test statistic for formally assessing that the quantile regression parameter is constant across specified quantile levels and show that its asymptotic null distribution is chi-square. Third, we consider cases where the functional covariate is observed sparsely and contaminated with noise and illustrate through detailed numerical investigation that the testing procedure continues to have excellent performance. Furthermore, we demonstrates the usage of the composite quantile regression and the corresponding advantage in terms of estimation and prediction accuracy, using a capital bike rental data set.

The rest of the paper is organized as follows. Section 2 introduces the statistical framework, describes the null and alternative hypotheses, discusses a simpler approximation of the testing procedure and presents the estimation approach. Section 3 develops the asymptotic normality of the proposed estimators, introduces the adjusted Wald test and derives its null asymptotic distribution. Section 4 presents extensive simulation studies confirming excellent performance of the proposed test procedure in various scenarios for both dense and sparse designs. Section 5 applies the proposed test to a bike rental data and illustrates the improvement of combined quantile regressions compared to a single level quantile regression after the proposed tests being used. Proofs of Theorem 3.1 and Thereon 3.2 are given in Section 6. Some additional useful results and technical proofs are provided in the Appendix.

## 2 Methodology

### 2.1 Statistical framework

Suppose we observe data , where is a scalar response variable, is the evaluation of a latent and smooth process at the finite grid of points which is measured with noise, and is a bounded closed interval. It is assumed that the observed functional covariate is perturbed by white noise, i.e. , where has mean 0 and variance . Furthermore, we assume that the true functional signal and are independent and identically distributed. Without loss of generality, we use throughout the paper. Our objective is to formally assess whether the smooth covariate signal has constant effect at specified quantile levels of the response.

Let be the conditional th quantile function of the response given the true covariate signal where . We assume the following linear functional quantile regression (fQR) model:

 QYi|Xi(τ)=β0(τ)+∫10β(t,τ)Xci(t)dt, (2)

where is the quantle-level varying intercept function, and is the bivariate regression coefficient function and the main object of interest; it is assumed that . Here is the de-meaned smooth covariate signal, defined as . Model (2) is an extension of the standard linear quantile regression model (Koenker, 2005) to functional covariates. It was first introduced by Cardot et al. (2005) and later considered by (Chen and Müller, 2012; Kato, 2012). For simplicity, in the following it is assumed that the smooth covariate signal has zero mean function, , for all .

Let be a set with quantile levels of interest where . Our goal is to test the null hypothesis:

 H0:β(⋅,τ1)=⋯=β(⋅,τL), (3)

against the alternative hypothesis . This null hypothesis involves infinite dimensional objects, which is very different from the common null hypotheses considered in quantile regression.

One approach to simplify the null hypothesis is by using basis functions expansion. Specifically, let be an orthogonal basis in such that if and if . We represent the unknown parameter function using this orthogobal basis where are unknown parameter loadings that are varying with the quantile level . It follows that the equality is equivalent to . Thus, the null hypothesis (3) can be written as for . Furthermore, we represent the smooth covariate using the same basis function as where are smooth covariate loadings. Then, the linear fQR model (2) can be equivalently represented as . In practice the infinite summation is typically truncated to some finite truncation . As a result the fQR model can be approximated by

 QKYi|Xi(τ)=β0(τ)+K∑k=1βk(τ)ξik, (4)

and the null hypothesis to be tested can be approximated by a reduced version

 HK0:⎛⎜ ⎜ ⎜ ⎜ ⎜⎝β1(τ1)β2(τ1)⋮βK(τ1)⎞⎟ ⎟ ⎟ ⎟ ⎟⎠=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝β1(τ2)β2(τ2)⋮βK(τ2)⎞⎟ ⎟ ⎟ ⎟ ⎟⎠=⋯=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝β1(τL)β2(τL)⋮βK(τL)⎞⎟ ⎟ ⎟ ⎟ ⎟⎠. (5)

If the value of and the loadings ’s were known then the above model is exactly the conventional quantile regression model  (Koenker, 2005). In such case, the standard approach is to represent the reduced null hypothesis (5) in a more convenient way and then use a Wald testing procedure. More specifically denote by the -dimensional parameter vector and define the full quantile regression parameter vector of dimension . The reduced null hypothesis can be equivalently re-written as , where and

 R1(L−1)×L=⎡⎢ ⎢ ⎢⎣1−10…0001−1…00………………000…1−1⎤⎥ ⎥ ⎥⎦R2K×(K+1)=[0K|IK]. (6)

Here denotes the -dimensional vector of zeros and is the dimensional identity matrix. Wald test is typically formulated as , where is the quantile regression estimator of and is a consistent estimator of - the covariance of - conditional on the true loadings ’s. However, the truncation is unknown and so are the loadings of the smooth covariate signal, , and they both represent important sources of uncertainty that a valid aproach has to account for. In particular the uncertainty in estimating and will affect the covariance and furthermore may afect the asymptotic null distribution of the standard Wald.

Depending on the choice of the orthogonal basis, the approaches used to select the finite truncation and to develop the theoretical properties for the quantile regression estimators differ. Several choices have been commonly used in functional data analysis literature: Fourier basis functions (Staicu et al., 2015), Wavelet basis (Morris and Carroll, 2006) or orthogonal B-spline (Zhou et al., 2008; Redd, 2012). One important aspect to keep in mind when selecting the basis functions is how to handle the finite truncation . In this paper we consider the orthogonal basis given by the eigenfunctions of the covariance of the smooth covariate signal . Let be the covariance of ; Mercer’s theorem gives the following spectral decomposition of the covariance where are the pairs of eigenfunctions and corresponding eigenvalues. The eigenvalues ’s are nondecreasing and nonnegative and the eigenfunctions ’s are mutually orthogonal functions in . Using the Karhunen-Loève expansion, the zero-mean smooth covariate can be represented as where are commonly known as functional principal component (fPC) scores of , satisfying that , and uncorrelated over . A popular way to select the finite truncation, or equivalently the number of leading eigenfunctions, is the percentage of variance explained; alternative options for selecting the finite truncation are considered in Li et al. (2013).

### 2.2 Estimation procedure

We discuss estimation for the case when the functional covariate is observed on a fine grid of points - a setting known in the literature by the name of dense sampling design. Nevertheless our procedure can be successfully applied to the case when the covariate is observed on an irregular sampling design with few points (sparse sampling design) and contaminated with noise, as illustrated later in the numerical investigation. When the sampling design is dense, and thus is very large for each , a common approach in functional data analysis is to reconstruct the underlying trajectories by smoothing the data using a working independence assumption; this approach has been theoretically studied by Zhang and Chen (2007). Let be the sample mean of these reconstructed trajectories and denote by the centered covariate. Furthermore, let be the sample covariance of ; the spectral decomposition of yields the pairs of estimated eigenfunctions and eigenvalues . The theoretical properties of the estimated eigenfunctions have been discussed in Hall and Hosseini-Nasab (2006); Hall et al. (2006); Zhang and Chen (2007). Finally the fPC scores are estimated as ; in practice numerical integration is used to approximate the integral; see also Li et al. (2010).

Using the estimated fPC scores ’s, the quantile regression parameter of the approximated linear fQR mdoel, , is estimated by

 ˆθτ=argmin(b0,b1,…,bK)T∈RK+1n∑i=1ρτ(yi−b0−K∑k=1ˆξikbk). (7)

where is the quantile loss function and is the indicator function that equals if and otherwise. In the next section we study the theoretical properties of the quantile regression estimator obtained in this way. In contrast to the quantile regression, the estimated fPC scores ’s are dependent across subjects and this makes the corresponding theoretical developments much more challenging.

## 3 Theoretical properties

### 3.1 Assumptions

Let , and be the corresponding density function. We make the following assumptions:

1. are independent and identically distributed (i.i.d.) as , and and are independent where and for any .

2. The conditional distribution is twice continuously differentiable and the corresponding density function is uniformly bounded away from 0 and at points .

1. The functional covariates satisfy that uniformly for .

2. There exists a finite number such that if ; where recall that ’s are the eigenvalues of the covariance of ’s.

The i.i.d. assumption in A1 and the assumption on the conditional distribution in A2 are standard in quantile regression when the covariates are vectors; see Koenker (2005), Ch. 4. A1 assumes that the functional covariates ’s are observed with independent white noise , making the model more realistic compared to error free assumptions made by Kato (2012). The assumption A3 holds for Gaussian processes and is common in FDA literature; for example, see Hall et al. (2006); Li et al. (2010). Finally A4 requires that the functional covariates have a finite number of fPC’s.

The following assumptions are commonly when describing a dense sampling design  (Zhang and Chen, 2007; Li et al., 2010). For convenient mathematical derivations, we assume that there are the same number of observations per subject, i.e. for all .

• The time points for where the density has bounded support [0,1] and is continuously differentiable.

• where and is some constant.

For our theoretical development, we require the following condition for the kernel bandwidth that is used in smoothing the functional covariates.

• .

### 3.2 Asymptotic distribution

The following theorem gives the asymptotic distribution of the quantile estimator. Kato (2012) gave the minimax rate of the coefficient function estimation when there is no measurement error on the discrete functional covariates. The author assumed that the number of eigenvalues is infinite instead of finite as in our assumption A4. Assuming finite number of non-zero eigenvalues is critical since otherwise the quantile regression problem is ill-posed and the estimation of the coefficient function cannot achieve the root- rate as argued by Kato (2012). We denote as the diagonal matrix whose diagonal entries are and which is positive definite, where . Similarly, we denote . For now, assume that is known; the discussion of estimating is deferred to Section 3.4.

###### Theorem 3.1

Denote by the quantile regression estimator defined by  (7) for , where . Under conditions A1–A4, B1–B2, C1, we have

 √n(ˆθτ−θτ)d→N{0,τ(1−τ)D−11(τ)D0D−11(τ)+ΘτΣ0Θτ} (8)

where and the matrix is defined in Section 6 which does not depend on .

If then the asymptotic covariance matrix for for is

 Acov{√n(ˆθτl−θτl),√n(ˆθτl′−θτl′)} (9) = {min(τl,τl′)−τlτl′}D−11(τl)D0D−11(τl′)+Σ(τl,τl′),

where .

Remark that the asymptotic covariances in both (8) and (9) contain two components: a Huber (Huber, 1967) sandwich term that typical in quantile regression theory and one that we call it “variance inflation”. Specifically, if the true scores ’s were observed, then the asymptotic variance of would be , and the asymptotic covariance matrix for would be ; see  Pollard (1991); Koenker (2005). The variance inflation terms, in (8) and in (9), quantify the effect of uncertainty in estimation the fPC scores on the quantile regression estimators. Thus, when the covariates are functional data, the asymptotic distribution of is unbiased but the variance is inflated and furthermore the variance inflation terms depend on the true parameter value .

The proof of Theorem 3.1 is detailed in Section 6. The reasoning follows two main steps: 1) approximate the estimated fPC scores ’s by linear combinations of random vectors of the true fPC scores ; and 2) show that the approximation error in the predictors is negligible to the quantile loss function.

As a direct application of the Theorem 3.1, we can obtain the asymptotic distribution of the coefficient function according to Slutsky’s theorem. Under the same conditions of Theorem 3.1, for and , we have

 √n{ˆβ(t,τ)−β(t,τ)}d→N{0,τ(1−τ)aT(t)D−11(τ)D0D−11(τ)a(t)+aT(t)Θ(τ)Σ0Θ(τ)a(t)}, (10)

where .

Using the asymptotic properties of the quantile regression estimators, we are now ready to develop a Wald type testing procedure for assessing the general null hypothesis (3) or its finite reduced version (5) represented in vector form by . Recall that denotes the full quantile regression parameter.

We define a modified version of Wald test, called the adjusted Wald test, by ignoring the variance inflation terms in the above asymptotic covariances. Specifically let be matrix that has as the th block matrix on the diagonal and as the off-diagonal block matrices. Furthermore denote by a consistent estimator of . Define the adjusted Wlad test by

 Tn=n(Rˆζ)T (RˆΣˆζRT)−1 Rˆζ, (11)

where . This test is not a proper Wald test as the covariance matrix used is not the valid covariance of . The following result studies the asymptotic null distribution of the proposed test.

###### Theorem 3.2

Assume the regularity conditions A1–A4, B1–B2 and C1 hold. If the null hypothesis is true, , then the asymptotic distribution of is .

The proof of this result relies on the observation that if is the proper covariance of as described by Theorem 3.1, then . Thus, although the estimation of the fPC scores yields inflated covariance of the regression estimator, its effect on testing the null hypothesis (3) is negligible. Nevertheless, if one is interested in testing a different type of null hypothesis for , then this variance inflated term has to be taken into account for a proper testing procedure.

A consistent estimator of is a plug-in estimator using as an estimator for and as an estimator for . Both estimators are consistent estimators of and respectively, using law of large numbers-based arguments. Lemma 6.1 discusses the closeness between and . It implies that, for testing the null hypothesis of equal functional covariate effect across various quantile levels, the common Wald test based on the estimated fPC scores provides a valid testing procedure. The adjusted Wald test, that disregards the variance component due to the estimation uncertainty of the fPC scores, has a chi-square asymptotic null distribution.

### 3.4 Estimation of the finite truncation

Our results rely on the assumption that there is a finite number of positive eigenvalues of the covariance function of ’s. The number of principal components is unknown and should be selected in practice. The selection of has been studied intensively in the literature such as the criterion of percentage of variance explained (PVE). Recently Li et al. (2013) proposed a Bayesian information criterion (BIC) based on marginal modeling that can consistently select the number of principal components for both sparse and dense functional data, i.e. the resulting estimator converges to in probability. Using this result, we can show that the test statistic with replaced by the consistent estimator has an approximately null distribution given by asymptotically under H: . In the remaining studies and data analysis section, we use the PVE criterion.

## 4 Simulation

### 4.1 Settings

The simulated data is of the form , where is the scalar response and is the functional covariate contaminated with measurement error , , and is the true functional covariate. We generate the data from the following heteroscedastic model:

 Yi=∫Xi(t)tdt+{1+γ∫Xi(t)t2dt}ϵ,ϵ∼N(0,σ2ϵ=1), (12)

which leads to a quantile regression model of the form (2) with , and . Here the scalar controls the heteroscedasticity and determines how the coefficient function varies across . Specifically, if then the effect of is constant across different quantile levels of , while if then the effect of varies across different quantile levels of .

The true functional covariate is generated from a Gaussian process with zero mean and covariance function , where for and for , and are the orthonormal Legendre polynomials on : . It is assumed that the measurement error . Figure 1 shows an example of simulated data when and : observed functional covariates in the left panel and the histogram of the responses in the right panel.

The objective is to test the null hypothesis for , that the effect of the true functional covariate on the conditional distribution of the response is the same for all the quantile levels in a given set . When , the coefficient function is independent of , which means that null hypothesis is true; when then is varies with and thus the null hypothesis is false.

We use the proposed adjusted Wald test using a number of fPC selected via the PVE criterion and using PVE=95%. The R package refund (Huang et al., 2015) is used to estimate the fPC scores. We investigate the performance of the test for low and high level of measurement error in the functional covariate ( and respectively), for two sets of quantile levels - , containing one-sided quantile levels, and with two-sided quantile levels - and for varying sample sizes from to . We compare the proposed testing procedure with existing alternatives in terms of the Type I error and power performance when the sampling design of the functional covariate is regular across ’s and dense, as well as when the sampling design irregular and sparse.

### 4.2 Dense design

We first consider a dense design for the functional covariates: the grid of points for each is an equispaced grid of timepoints in . We are not aware of any testing procedures for testing the null hypothesis of constant effect at various quantile levels, when the covariate is functional; however we can exploit this particular setting and pretend the covariates are vectors and thus use or directly extend existing testing procedures from quantile regression. In particular, consider the following three alternative approaches: (1) treat the observed functional covariate as vector and use the common Wald testing procedure for vector covariates in quantile regression (NaïveQR); (2) summarize observed functional covariate via a single number summary of the functional covariate in conjunction with Wald procedure (SSQR); and treat the observed functional covariate as a vector, reduce the dimensionality using principal component analysis and then apply Wald testing procedure using the vector of principal component scores (pcaQR). For the pcaQR approach, the number of principal components are selected via PVE and using a level PVE=95%. Wald-type test for vector covariates (Koenker, 2005, Chapter 3.2.3) is used to carry out both NaïveQR and pcaQR.

Table 1 summarizes the empirical Type I error rates of the adjusted Wald test when testing at one-sided quantile levels () as well as two-sided quantile levels (), and furthermore when the functional covariate is observed with low () and large () measurement error. The results are presented for three significance levels , and ; they indicate irrespective of quantile levels set or magnitude of the measurement error the Type I error rates are slightly inflated for moderate sample sizes. Nevertheless the empirical Type I error rates converge to the nominal level. The empirical Type I error rates for the alternative approaches are presented in Table 2. As expected the NaïveQR approach has very poor performance. The NaïveQR approach does hypothesis testing when the covariates are highly correlated; this leads to numerical instability due to singularity of the design matrix. Therefore the NaïveQR method produces many missing values (reported as “–”) in the table, and yields to very inflated empirical Type I error rates for any significance level.

The third alternative approach, pcaQR, performs relatively good when the magnitude of the error is small (): the empirical Type I error is close to the nominal level. Nevertheless, as the error variance increases (), the empirical rejection probabilities are excessively inflated; additionally the approach yields to many missing values (reported as “–”); see Table 1, the last three columns and the rows corresponding to and sample sizes and . The results are not surprising, because in the case of large error variance, a direct application of principal component analysis yields a large number of principal components. As a consequence, the application of the classical Wald test for vector covariate leads to numerical instability due to singularity of the design matrix, in a similar way to the NaïveQR approach. The performance of SSQR approach is very good for all the scenarios considered and across various sample sizes: the empirical Type I error rates are close to the nominal levels. This is expected, as in the case when , the functional covariate effect is through its mean, and this effect is invariant over quantile levels.

Next we evaluate the performance in terms of empirical rejection probabilities when the null hypothesis is not true. We only focus on the proposed adjusted Wald testing and SSQR procedures, as they have the correct size. Figure 2 illustrates the power curves of the two approaches based on 2000 simulation, when the magnitude of the noise is large, ; the results are similar in the case of low noise () and for brevity are not included in the manuscript. The adjusted Wald procedure is much more powerful than the SSQR approach irrespective of the departure from the null hypothesis as reflected by the coefficient : for example, when the probability to correctly reject using the adjusted Wald is about 100 % when the sample size is or more, where as the counterpart obtained with SSQR is less than 70 % for sample sizes smaller than . The results are not surprising, as SSQR summarizes the entire functional covariate through a single scalar, whereas the proposed adjusted Wald uses the full functional covariate.

### 4.3 Sparse design

Next, we study the performance of the adjusted Wald testing procedure when the functional covariate is observed sparsely and with measurement error. In this regard we set an overall grid of 101 equispaced points in and consider two settings. First, for each we randomly generate time points to form - we call this setting ‘moderately sparse’ sampling design. Second, for each we randomly generate time points to form - we call this setting ‘sparse’ design. The generation of the true functional covariate and the observed covariate, as well as of the observed response follows as described in the previous section. We use the adjusted Wald test which relies on sparse fPCA techniques, that estimate the fPC scores ’s using conditional expectation proposed by Yao et al. (2005). When the sampling design of the functional covariate is sparse, there are no obvious reasonable alternative approaches to compare. Thus in this section we only discuss the performance of the proposed Wald-type procedure.

Table 3 shows the empirical Type I error when the noise level equals to . They show excellent performance of the adjusted Wald test in maintaining the nominal levels when the sample size is moderately large ( or larger) for both moderately sparse and sparse sampling design of the functional covariate. Figure 3 shows the power of the adjusted Wald test for moderately sparse and highly sparse designs for . It indicates that the sparsity of the functional covariates slightly affects the proposed functional Wald-type procedure, as expected. Nevertheless the adjusted Wald test continues to display excellent performance. The results are similar for low level of measurement error and for brevity are not included here.

## 5 Application

In this section we consider the capital bike sharing study and discuss the application of the proposed testing procedure to formally assess whether the effect of the previous day casual bike rentals on the current day total bike rentals varies across several quantile levels. The bike data (Fanaee-T and Gama, 2014) is recorded by the Capital Bikeshare System (CBS), Washington .C., USA, which is available at http://capitalbikeshare.com/system-data. As the new generation of bike rentals, bike sharing systems possess membership, rental and return automatically. With currently over 500 bike-share programs around the world (Larsen, 2013) and the fast growing trend, data analysis on these systems regarding the effects to public traffic and the environment has become popular. The bike data includes hourly rented bikes for casual users that are collected during January 1st 2011 to December 31st 2012, for a total of 731 days.

Our objective is to formally assess how the previous day casual bike rentals, , affects the distribution of the current day total bike rentals counts, . A subsequent interest is to predict the quantile of the total casual bike rentals . Figure 4 plots the hourly profiles of casual bike rentals in the left panel and the histogram of the total casual bike rentals in the right panel.

We assume the functional quantile regression model (2), , where is the total bike casual bike rentals for the current day and is the true profile of the casual bike rentals recorded in the previous day. As described earlier is the quantile varying intercept function and is the slope parameter and quantifies the effect of the functional covariate at the quantile level of the distribution of the response.

To address the first objective we consider a set of quantile levels and use the proposed testing procedure to test the null hypothesis

 H0:β(⋅,0.20)=β(⋅,0.40)=β(⋅,0.60)=β(⋅,0.80).

The number of fPC is selected using ; this choice selects three fPC. We use the adjusted Wald test and its asymptotic null distribution; the resulting -value is close to zero indicating overwhelming evidence that low and large number of bike rentals are affected differently by the hourly rentals on the previous day.

Next we turn to the problem of predicting the quantile of the total bike rentals for the current day. When some quantile coefficients in a region of quantile levels are constant, we may improve the estimator’s efficiency by borrowing information from neighboring quantiles to estimate the common coefficients, especially when the quantile level of interest is high. Here consider the quantile level set around the th quantile. We apply the proposed method to estimate the coefficient functions at various quantile levels as shown in Figure 5. The corresponding adjusted Wald test leads to a -value = 0.466, which suggests that the quantile coefficients are not significantly different across the quantile levels. We consider combined quantile regression at by using the methods of quantile average estimator (QAE) and composite regression of quantiles (CRQ) with equal weights; see Koenker (1984); Wang and Wang (2016) for more technical details. We denote the single quantile regression estimation at the 90th quantile by RQ.

We use bootstrap samples to study the efficiency of the three estimators. Figure 6 plots the bootstrap means and standard errors of the estimates of when using QAE, CRQ and RQ. The QAE and CRQ estimators have smaller standard errors uniformly for all , indicating the efficiency improvement. This suggests the efficiency gain by combining information across quantile levels. We also observe that the number of fPC’s is either 3 or 4 in all bootstrap samples, suggesting that the assumption A4 makes sense to this data set.

Furthermore, we conduct a cross-validation by randomly selecting of the data as the training data set and using the other half as the testing data set. We use replications and calculate the prediction error for each replication and each as follows:

 PE=∑i∈test sampleρτ(yi−ˆξTiˆθτ), (13)

where the estimated coefficients are based on the training data and the summation is over the test data. We obtain the RQ estimators separately at each of while the QAE and CRQ estimators are shared across . The mean prediction errors based on replications are reported in Table 4. We can see that the application of QAE and CRQ improves the prediction significantly for the th and th quantiles; differences among the three methods are not significant at the lower quantiles. This makes sense since the data sparsity becomes more severe for more extreme quantile levels, therefore to incorporate lower quantile levels improves efficiency. In contrast, it may be not able to benefit the prediction performance at lower quantile levels by considering the higher levels.

## 6 Proofs

In this section, we sketch the proof for the main Theorem 3.1 in Section 6.1 and the proof of Theorem 3.2 in Section 6.2. Proofs for all lemmas used in this section are deferred to the Appendix. We use as the -norm for a function and as the Euclidean norm for a vector.

### 6.1 Proof of Theorem 3.1

The proof of Theorem 3.1 is composed of three steps. In step 1, we approximate the estimated scores ’s by linear combinations of ’s. In step 2, we obtain the asymptotic distribution of at a single quantile level. In step 3, we extend the results in step 2 to multiple quantile levels.

Step 1. Approximation of the estimated scores. Most of the existing literature has been focused on establishing error bounds for the estimation of eigenvalues and eigenfunctions; see for example Hall and Hosseini-Nasab (2006, 2009) and the discussion therein. In this paper we concentrate on the accuracy in predicting the fPC scores.

###### Lemma 6.1

Under assumptions B1, B2 and C1, we have

 E∥ˆξi−ξi∥2=o(n−1/2). (14)

 ˆξi−ξi=n−1/2Bξi+Op(n−1), (15)

where is a dimensional matrix with the bottom right block matrix equal to described next and the rest of the elements equal to zero. Here is a random matrix such that for and if .

The result given by (15) indicates that the leading term of is , which is a linear combination of and the random matrix does not depend on . The proof of this lemma is in Section 8.

Step 2. Quantile regression on estimated scores. We focus on a single quantile level in this step. For any , let

 Zn(δ)=n∑i=1{ρτ(ˆui−ˆξTiδ/√n)−ρτ(ˆui)}, (16)

where . Then is a convex function which is minimized at . Therefore, the asymptotic distribution of is determined by the limiting behavior of . Let , then according to the Knight’s identity (Knight, 1998), we can decompose into two parts: , where

 \ignorespaces\ignorespacesZ1n(δ) =−1√nn∑i=1ˆξTiδψτ(ˆui) \ignorespaces\ignorespacesZ2n(δ) =n∑i=1∫ˆξTiδ/√n0{I(ˆui≤s)−I(ˆui≤0)}ds=n∑i=1Z2ni(δ).

In order to show (8), it is sufficient to prove that

 Zn(δ)d→−δTW(τ)+12δTD1(τ)δ, (17)

where , since one can apply the convexity lemma (Pollard, 1991) to the quadratic form of in (17).

We shall derive the limiting distributions of and separately. Similarly to the definitions in (LABEL:eq:z1n), we define based on the true scores :

 Z∗1n(δ)=−1√n∑ξTiδψτ(ui), (18)

where . By a direct application of the central limit theorem (CLT), we obtain that the asymptotic distribution of is . However, when the predictors are estimated with errors, the difference is not negligible. Lemma 6.2 provides the following representation of .

###### Lemma 6.2

Under assumptions B1, B2 and C1,

 Z1n(δ)=δT[−1√nn∑i=1{ξiψτ(ui)−D1(τ)di}]+op(1), (19)

where and

Since are i.i.d., Lemma 6.2 allows us to directly apply Linderberg’s CLT to obtain the asymptotic distribution of . Note that