1 Introduction

Uncertainty Estimation in Functional Linear Models

July 6, 2019

Tapabrata Maiti111maiti@stt.msu.edu, Abolfazl Safikhani222as5012@columbia.edu, Ping-Shou Zhong333pszhong@stt.msu.edu

Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, U.S.A.

Department of Statistics, Columbia University, New York, New York, U.S.A

Abstract Functional data analysis is proved to be useful in many scientific applications. The physical process is observed as curves and often there are several curves observed due to multiple subjects, providing the replicates in statistical sense. The recent literature develops several techniques for registering the curves and associated model estimation. However, very little has been investigated for statistical inference, specifically uncertainty estimation. In this article, we consider functional linear mixed modeling approach to combine several curves. We concentrate measuring uncertainty when the functional linear mixed models are used for prediction. Although measuring the uncertainty is paramount interest in any statistical prediction, there is no closed form expression available for functional mixed effects models. In many real life applications only a finite number of curves can be observed. In such situations it is important to asses the error rate for any valid statistical statement. We derive theoretically valid approximation of uncertainty measurements that are suitable along with modified estimation techniques. We illustrate our methods by numerical examples and compared with other existing literature as appropriate. Our method is computationally simple and often outperforms the other methods.

keywords Basis functions; B-splines; Bias correction; Estimating equations; Functional mixed models; Karhunen-Love expansion; Prediction interval; Random effects;

1 Introduction

Functional data analysis received considerable attention in last couple of decades due to its applicability in various scientific studies (Ramsay and Silverman, 1997, 2010). A functional data consists of several functions or curves that are observed on a sufficiently large number of grid points. The dual characteristics of functional data, namely, maintaining the smoothness of individual curves through repeated observations and combining several curves poses the main challenge in functional data analysis (Morris and Carroll, 2006). In relatively simple situations where the curves could be represented by simple parametric models, this data characteristics can easily be handled by well established parametric mixed models (Laird and Ware, 1982, Verbeke and Molenbergs, 2000, Demidenko, 2013, McCulloch and Searle, 2001, Jiang 2007). In complex biological or physical processes, the parametric mixed models are not appropriate and thus modern functional mixed model techniques are developed. For a complete description and comparisons, instead of repeating the literature, we suggest Guo (2002a), Morris and Carroll (2006), Antoniadis and Sapatinas (2007), Chen and Wang (2011), Liu and Guo (2011), Maiti, Sinha and Zhong (2016) for recent developments in functional mixed linear models and their applications. Some of the related works, but not a complete list includes Brumback and Rice (1998), Goldsmith, Crainiceanu, Caffo and Reich (2011), Greven, Crainiceanu, Caffo and Reich (2011), Krafty, Hall and Guo (2011) and Staicu, Crainiceanu and Carroll (2010).

As noted by Antoniadis and Sapatinas (2007), much of the work done in this area is on estimation and only limited attention has been paid for inference. Morris and Carroll (2006) provided a comprehensive account of functional mixed models from Bayesian perspective. In Bayesian approach, the inference is automatic along with the estimation. However, this is not the case in frequentist perspective. Antoniadis and Sapatinas (2007) developed testing of random effects and fixed effects when a wavelet decomposition is used for smoothing the functional curves. They also observed the theoretical limitations of the procedures developed for functional mixed models, for example, Guo (2002a,b). Thus research developing theoretically valid inferential techniques for functional mixed models is important.

Prediction is one of the main objectives in functional data analysis in many applications. For example, one may like to predict the number of CD4 cells in HIV patients or thai circumference (related to body weight) for obese children in a future time point. The mixed effect, combination of fixed and random effects are typically used for such prediction. For statistical inference, standard error estimation of the mixed effects is an integral part of data analysis. However, this is a non trivial problem even in simple parametric random effect models. See, Kackar and Harville (1984), Rao and Molina (2015) and Jiang (2007) for various developments. We developed the approximation theory for standard error estimation as a measure of uncertainty for mixed effects in a functional linear mixed models. Our approach is frequentist. To our knowledge, the prediction error estimation in this sense for a functional mixed model is new. Guo (2002a) reported subject (curve) specific 95% confidence intervals in his applications. However, like Antoniadis and Sapatinas (2007) we were unable to verify the development with theoretical validation. We believe, Guo’s primary goal was in developing the functional mixed models and user friendly estimation rather than prediction interval estimation. This work, in this sense, is complementary to Guo (2002a) and an important contribution to the functional mixed model methodology.

In this paper we consider a functional mixed model similar to Chen and Wang (2011). We also follow the penalized spline smoothing technique similar to them. Of course there are other possible smoothing techniques, such as wavelet (Morris and Carroll, 2006, Antoniadis and Sapatinas (2007), functional principal components (Yao, Müller and Wang, 2005), Kauermann and Wegener, 2011 and Staicu, Crainiceanu and Carroll, 2010). For comparison, contrasts, computation and further literature review of various smoothing techniques in this context we refer to Chen and Wang (2011). The section 2 introduces the models and model estimation. The section also develops the approximation theory of prediction error. The simulation study and real data examples are given in Section 3. We conclude the development in Section 4. The proofs are deferred to Appendix.

2 Models and Estimation

We consider a functional mixed-effect model as

 Y(t)=X(t)Tβ(t)+Z(t)Tν(t)+e(t) (2.1)

where , is a -dimensional fixed coefficient function, is a -dimensional subject-specific random coefficient functions and is a Gaussian noise process with and . We also assume that and are independent.

Let the data are collected/designed at time points for -th subject, and -th time point, . Then the model (2.1) for this data is

 Y(tij)=X(tij)Tβ(tij)+Z(tij)Tνi(tij)+eij (2.2)

where is the covariates, known design for random effects . These are zero mean Gaussian process with , a positive definite matrix. We assume and are independent for or . The objective is to obtain the mean square error for predicting the mixed effect , where and are and dimensional known vectors.

Assume that belongs to the normed space of continuous functions with finite second derivatives and the covariance can be decomposed by

 γ(t,s)=Bνk(s)TΩkBνk(s).

Then we can approximate by

 βk(t)=Bk(t)Tθk

where is the vector of basis functions and is the corresponding coefficients. By Karhunen-Love expansion (Ash and Gardner, 1975), the random functions can be approximated by

 νik(t)=Bνk(t)Tαik

where , and are random coefficients with . In this paper, we consider B-spline basis functions for and . Let be a set of knot points and for any , where . Using these knots, we can define normalized B-spline basis function of order . The B-spline basis is defined by

 Bki(t)=(τi−τi−r)[τi−r,⋯,τi](τ−t)r−1+for i=1,⋯,L

where

 [τi−r,⋯,τi]ϕ(τ)=[τi−r+1,⋯,τi]ϕ(τ)−[τi−r,⋯,τi−1]ϕ(τ)τi−τi−r

denotes the -th order divided difference for distinct points of function and

 [τj,τj+1]ϕ(τ)=ϕ(τj+1)−ϕ(τj)τj+1−τj.

If , then for some integer . We also used B-spline for the .

Then the model (2.2) is represented as a linear mixed effect model

 Yi=Wiθ+Uiαi+ei (2.3)

where , , , and with and where and For convenience, denote as the variance of . Define . It follows that . Then, is the variance of . Hence, we can write the mixed effect into a function of and , which is

 A=lTθ+n∑i=1dTiαi=lTθ+dTα (2.4)

where and

 l=lT0⎛⎜ ⎜ ⎜⎝BT1(t)000⋱000BTp(t)⎞⎟ ⎟ ⎟⎠anddi=dTi0⎛⎜ ⎜ ⎜⎝BTν1(t)000⋱000BTνq(t)⎞⎟ ⎟ ⎟⎠.

2.1 Estimation of the fixed parameter and random effects

Following Chen and Wang (2007), treating the random effect as missing value, define the penalized joint log-likelihood for and as

 ℓ(θ,α) =n∑i=1{(Yi−Wiθ−Uiαi)TR−1i(Yi−Wiθ−Uiαi)+αTiΩ−1αi} (2.5) +p∑k=1λkθTkΔβkθk+q∑k=1ηkn∑i=1αTikΔνikαik

where , is the second derivative of with respect to and are defined similarly based on the B-spline basis functions .

Then (2.5) can be written as

 ℓ(θ,α) =n∑i=1{(Yi−Wiθ−Uiαi)TR−1i(Yi−Wiθ−Uiαi) (2.6) +αTiΩ−1αi}+θTΔβθ+n∑i=1αTiΔνiαi.

where and ,

Given , the minimization of provides

 ~θ =(n∑i=1WTiR−1iDiWi+Δβ)−1n∑i=1WTiR−1iDiYi ~αi =(UTiR−1iUi+Ω−1+Δνi)−1UTiR−1i(Yi−Wi~θ).

By some algebra, we then have

 ~θ = (n∑i=1WTiΣ∗iWi+Δβ)−1n∑i=1WTiΣ∗i−1Yi (2.7) ~α = Ω∗UTΣ∗−1(Y−W~θ) (2.8)

where and .

Because involves some unknown parameters, we estimate the variance component in by solving the following estimating equation

 QVC,k(σ):=−tr{PΣP∂Σ∗∂σk}+YTP∂Σ∗∂σkPY=0for k=1,⋯,g. (2.9)

where . Notice that contains estimating equations. Denote the estimate of by .

Note that the above estimating equation is bias-corrected score function from the score function of the restricted log-likelihood function. To see this point, we observe that restricted log-likelihood function is the following

 ℓV(σ) := −log|WΣ∗−1W|−log(|Σ∗|)−(Y−W~θ−U~α)TR−1(Y−W~θ−U~α) (2.10) −~αTΩ−1~α−~θTΔβ~θ−~αTΔν~α = −log|WΣ∗−1W|−log(|Σ∗|)−(Y−W~θ)TΣ∗−1(Y−W~θ)−~θTΔβ~θ = −log|WΣ∗−1W|−log(|Σ∗|)−YTPY.

Then the derivative of is

 ∂ℓV(σ)∂σ=−tr(P∂Σ∗∂σk)+YTP∂Σ∗∂σkPY (2.11)

But unless . To make the score equation to be unbiased, we modified the score function to be (2.9) such that it is unbiased.

2.2 Prediction and Prediction Mean Square Error

A naive prediction of given in (2.4) is where is an unknown vector of variance components in . The -th component of will be denoted as and . Unlike the simple linear mixed models, this prediction is biased as stated in the following theorem.

Theorem 1

The prediction is biased and the bias of the prediction of is

 Bias{~A(σ)}=−l′(WTΣ∗−1W+Δβ)−1Δβθ+d′Ω∗UTΣ∗−1W(WTΣ∗−1W+Δβ)−1Δβθ. (2.12)

To reduce the order of bias, we propose a bias-corrected prediction for as

 ~Ac(σ) =l′~θ+d′~α+l′(WTΣ∗−1W+Δβ)−1Δβ~θ (2.13) −d′Ω∗UTΣ∗−1W(WTΣ∗−1W+Δβ)−1Δβ~θ.

The proof is deferred to the Appendix.

Theorem 2

If is sparse, namely only a finite number of ’s are non-zeros (). Then the bias of is of order , which is negligible in comparing to .

The proof is deferred to the Appendix.

Now we will derive the prediction error formula. Let and where and . Define . Then it can be shown that

 E{(~Ac(σ)−A(σ))2} =(l−WTs)T(WTΣ−1W)−1(l−WTs)+d′(Ω−ΩU′Σ−1UΩ)d (2.14) +(l′−s′W)Δ1ΣΔT1(l′−s′W)T +2(l′−s′W)Δ1Σ(s′+(l′−s′W)(WTΣ−1W)−1WTΣ−1)T −2(l′−s′W)Δ1UΩd′+O(n−2)

where .

Let and where

 BTk = −(s′W−l′)(WTΣ∗−1W+Δβ)−1WTΣ∗−1∂Σ∗∂σkΣ∗−1W(WTΣ∗−1W+Δβ)−1WΣ∗ +(s′W−l′)(WTΣ∗−1W+Δβ)−1WTΣ∗−1∂Σ∗∂σkΣ∗−1 −s′∂Σ∗∂σkΣ∗−1(I−W(WTΣ∗−1W+Δβ)−1WΣ∗),

and Further, denote and . The following theorem states a practical formula for the .

Theorem 3

The MSE of prediction for is

 MSE{^Ac(^σ)} =(l−WTs)T(WTΣ−1W)−1(l−WTs)+d′(Ω−ΩU′Σ−1UΩ)d+(l′−s′W)Δ1ΣΔT1(l′−s′W)T +2(l′−s′W)Δ1Σ(s′+(l′−s′W)(WTΣ−1W)−1WTΣ−1)T−2(l′−s′W)Δ1UΩd′+2tr{(BD−1JΣ)2} +tr2(ΣBD−1J)+tr(D−1BTΣBD−1Σw)+4g∑j=1g∑l=1λTjΣ(GjΣGj+GlΣGj)Σλl+o(n−1),

where , and , and .

Therefore, an estimation of the can be obtained by plugging in the variance components estimate from (2.9) into the MSE expression.

2.3 Choice of smoothing parameters

As mentioned earlier, the basic modeling framework considered in this article is similar to Chen and Wang (2011), however, there is differences in smoothing procedure of fixed and random effects and in parameter estimation. We followed their smoothing parameter choices. For completeness we explained the procedure here briefly.

Chen and Wang (2011) proposed to estimate the variance component in by minimizing the following penalized log-likelihood, for given and ,

 ℓv(σ)=log|R|+(Y−Wθ−Uα)TR−1(Y−Wθ−Uα)+θTΔβθ+αTΔνα. (2.15)

Decomposing each and let be the first component of the and be the rest component of . Now collecting be a components vector and be the corresponding coefficients. The smoothing parameter are chosen by minimizing the following marginal REML

 ℓm(λ)=log|Σλ|+(Y−W(1)θ(1))TΣ−1λ(Y−W(1)θ(1))+log|WT(1)Σ−1λW(1)| (2.16)

where is the marginal covariance of , which is

 Σλ=R+UΩU+V(2)

where , and .

The smoothing parameters are chosen by minimizing the following marginal log-likelihood, for given

 ℓm(η)=log|R|+(Y−Wθ−Uα)TR−1(Y−Wθ−Uα)+αTΔνα+log|H|−log|Δν|.

where

 H=12∂2ℓv(σ)∂α∂αT=UTR−1U+Δν

for any symmetric matrix . Because , taking derivative of with respect to , gives

 n∑i=1αTikΔνikαik+tr(H−1~Δ(k)ν)−Lηk=0

where and for . Then

 ^ηk=L(n∑i=1αTikΔνikαik+tr(H−1~Δ(k)ν))−1.

3 Numerical Findings

We investigated the finite sample performance of the proposed method through simulation and real data examples. The simulation study is also designed to verify the asymptotic behavior of the approximated prediction error formula and and related coverage errors.

3.1 Simulation Study

We considered the following mixed model setup

 Y(tij)=X(tij)β(tij)+Z(tij)ν(tij)+ϵ(tij)i=1,⋯,n and j=1,⋯,mi (3.17)

where for all . Three different values of was chosen, 50, 100 and 200. We generated the time points independently from Uniform(0,1). Let . Set and . We simulated from a Gaussian process with mean 0 and covariance where , and from a Gaussian process with mean zero and , where if , and otherwise. We set . We call this set up as Case I. As a second case, we kept everything same except the covariance structure of the random effects. Specifically, we took with . We call this as Case II. We also examined the performance for more fluctuated mean function where . We call this as case III.

The main purpose of this simulation study is to evaluate the performance of mean square error of the predictor of mixed effects that measuring subject (curve) specific means. For example, we considered where is one of the time points (for example ), and and are means of and respectively. In particular, here and , and and rest of for . Similar to Chen and Wang (2007), we used the following algorithm to estimate and . We fixed the tuning parameters and and set an initial value of , , and . Then the initial estimates of and are

 ~α(0)=Ω∗(0)UTΣ∗−1(0)(Y−W~θ(0))and~θ(0)=(n∑i=1WTiΣ∗i(0)Wi+Δβ)−1n∑i=1WTiΣ∗i−1(0)Yi.

respectively. We then repeat the following steps until all the parameter estimates converge

• Step 1: estimate through the estimating equation given in (2.9).

• Step 2: estimate and through (2.7) and (2.8), and by

 ^Ω=1nn∑i=1{~αi~αTi+^Ω∗−^Ω∗UTiMiUi^Ω∗}

where .

Plugging in the estimates of and from the previous algorithm, obtain a biased corrected estimation of by given in (2.13).

We then compute the following three quantities:

• The true MSE: computed by where is the number of replicated data sets and is the prediction of in the -th replicate. In all cases we took replication.

• Estimate MSE with estimated variance components: computed by the formula given in (2.14) using estimated and .

Along with the mean square prediction errors, we calculated 95% prediction coverage error where the prediction interval was calculated as .

The reported values in the table 1 are the averages of the prediction coverage over all the individuals. The relative bias is the relative difference between the true MSE and the estimated MSE, averaged over replications and then averaged over the individuals. Numbers in the bracket are the standard errors averaged over all the subjects.