Uncertainty Estimation in Functional Linear Models
July 6, 2019
Tapabrata Maiti^{1}^{1}1maiti@stt.msu.edu, Abolfazl Safikhani^{2}^{2}2as5012@columbia.edu, PingShou Zhong^{3}^{3}3pszhong@stt.msu.edu
Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, U.S.A.
Department of Statistics, Columbia University, New York, New York, U.S.A
Abstract Functional data analysis is proved to be useful in many scientific applications. The physical process is observed as curves and often there are several curves observed due to multiple subjects, providing the replicates in statistical sense. The recent literature develops several techniques for registering the curves and associated model estimation. However, very little has been investigated for statistical inference, specifically uncertainty estimation. In this article, we consider functional linear mixed modeling approach to combine several curves. We concentrate measuring uncertainty when the functional linear mixed models are used for prediction. Although measuring the uncertainty is paramount interest in any statistical prediction, there is no closed form expression available for functional mixed effects models. In many real life applications only a finite number of curves can be observed. In such situations it is important to asses the error rate for any valid statistical statement. We derive theoretically valid approximation of uncertainty measurements that are suitable along with modified estimation techniques. We illustrate our methods by numerical examples and compared with other existing literature as appropriate. Our method is computationally simple and often outperforms the other methods.
keywords Basis functions; Bsplines; Bias correction; Estimating equations; Functional mixed models; KarhunenLove expansion; Prediction interval; Random effects;
1 Introduction
Functional data analysis received considerable attention in last couple of decades due to its applicability in various scientific studies (Ramsay and Silverman, 1997, 2010). A functional data consists of several functions or curves that are observed on a sufficiently large number of grid points. The dual characteristics of functional data, namely, maintaining the smoothness of individual curves through repeated observations and combining several curves poses the main challenge in functional data analysis (Morris and Carroll, 2006). In relatively simple situations where the curves could be represented by simple parametric models, this data characteristics can easily be handled by well established parametric mixed models (Laird and Ware, 1982, Verbeke and Molenbergs, 2000, Demidenko, 2013, McCulloch and Searle, 2001, Jiang 2007). In complex biological or physical processes, the parametric mixed models are not appropriate and thus modern functional mixed model techniques are developed. For a complete description and comparisons, instead of repeating the literature, we suggest Guo (2002a), Morris and Carroll (2006), Antoniadis and Sapatinas (2007), Chen and Wang (2011), Liu and Guo (2011), Maiti, Sinha and Zhong (2016) for recent developments in functional mixed linear models and their applications. Some of the related works, but not a complete list includes Brumback and Rice (1998), Goldsmith, Crainiceanu, Caffo and Reich (2011), Greven, Crainiceanu, Caffo and Reich (2011), Krafty, Hall and Guo (2011) and Staicu, Crainiceanu and Carroll (2010).
As noted by Antoniadis and Sapatinas (2007), much of the work done in this area is on estimation and only limited attention has been paid for inference. Morris and Carroll (2006) provided a comprehensive account of functional mixed models from Bayesian perspective. In Bayesian approach, the inference is automatic along with the estimation. However, this is not the case in frequentist perspective. Antoniadis and Sapatinas (2007) developed testing of random effects and fixed effects when a wavelet decomposition is used for smoothing the functional curves. They also observed the theoretical limitations of the procedures developed for functional mixed models, for example, Guo (2002a,b). Thus research developing theoretically valid inferential techniques for functional mixed models is important.
Prediction is one of the main objectives in functional data analysis in many applications. For example, one may like to predict the number of CD4 cells in HIV patients or thai circumference (related to body weight) for obese children in a future time point. The mixed effect, combination of fixed and random effects are typically used for such prediction. For statistical inference, standard error estimation of the mixed effects is an integral part of data analysis. However, this is a non trivial problem even in simple parametric random effect models. See, Kackar and Harville (1984), Rao and Molina (2015) and Jiang (2007) for various developments. We developed the approximation theory for standard error estimation as a measure of uncertainty for mixed effects in a functional linear mixed models. Our approach is frequentist. To our knowledge, the prediction error estimation in this sense for a functional mixed model is new. Guo (2002a) reported subject (curve) specific 95% confidence intervals in his applications. However, like Antoniadis and Sapatinas (2007) we were unable to verify the development with theoretical validation. We believe, Guo’s primary goal was in developing the functional mixed models and user friendly estimation rather than prediction interval estimation. This work, in this sense, is complementary to Guo (2002a) and an important contribution to the functional mixed model methodology.
In this paper we consider a functional mixed model similar to Chen and Wang (2011). We also follow the penalized spline smoothing technique similar to them. Of course there are other possible smoothing techniques, such as wavelet (Morris and Carroll, 2006, Antoniadis and Sapatinas (2007), functional principal components (Yao, Müller and Wang, 2005), Kauermann and Wegener, 2011 and Staicu, Crainiceanu and Carroll, 2010). For comparison, contrasts, computation and further literature review of various smoothing techniques in this context we refer to Chen and Wang (2011). The section 2 introduces the models and model estimation. The section also develops the approximation theory of prediction error. The simulation study and real data examples are given in Section 3. We conclude the development in Section 4. The proofs are deferred to Appendix.
2 Models and Estimation
We consider a functional mixedeffect model as
(2.1) 
where , is a dimensional fixed coefficient function, is a dimensional subjectspecific random coefficient functions and is a Gaussian noise process with and . We also assume that and are independent.
Let the data are collected/designed at time points for th subject, and th time point, . Then the model (2.1) for this data is
(2.2) 
where is the covariates, known design for random effects . These are zero mean Gaussian process with , a positive definite matrix. We assume and are independent for or . The objective is to obtain the mean square error for predicting the mixed effect , where and are and dimensional known vectors.
Assume that belongs to the normed space of continuous functions with finite second derivatives and the covariance can be decomposed by
Then we can approximate by
where is the vector of basis functions and is the corresponding coefficients. By KarhunenLove expansion (Ash and Gardner, 1975), the random functions can be approximated by
where , and are random coefficients with . In this paper, we consider Bspline basis functions for and . Let be a set of knot points and for any , where . Using these knots, we can define normalized Bspline basis function of order . The Bspline basis is defined by
where
denotes the th order divided difference for distinct points of function and
If , then for some integer . We also used Bspline for the .
Then the model (2.2) is represented as a linear mixed effect model
(2.3) 
where , , , and with and where and For convenience, denote as the variance of . Define . It follows that . Then, is the variance of . Hence, we can write the mixed effect into a function of and , which is
(2.4) 
where and
2.1 Estimation of the fixed parameter and random effects
Following Chen and Wang (2007), treating the random effect as missing value, define the penalized joint loglikelihood for and as
(2.5)  
where , is the second derivative of with respect to and are defined similarly based on the Bspline basis functions .
Given , the minimization of provides
By some algebra, we then have
(2.7)  
(2.8) 
where and .
Because involves some unknown parameters, we estimate the variance component in by solving the following estimating equation
(2.9) 
where . Notice that contains estimating equations. Denote the estimate of by .
Note that the above estimating equation is biascorrected score function from the score function of the restricted loglikelihood function. To see this point, we observe that restricted loglikelihood function is the following
(2.10)  
Then the derivative of is
(2.11) 
But unless . To make the score equation to be unbiased, we modified the score function to be (2.9) such that it is unbiased.
2.2 Prediction and Prediction Mean Square Error
A naive prediction of given in (2.4) is where is an unknown vector of variance components in . The th component of will be denoted as and . Unlike the simple linear mixed models, this prediction is biased as stated in the following theorem.
Theorem 1
The prediction is biased and the bias of the prediction of is
(2.12) 
To reduce the order of bias, we propose a biascorrected prediction for as
(2.13)  
The proof is deferred to the Appendix.
Theorem 2
If is sparse, namely only a finite number of ’s are nonzeros (). Then the bias of is of order , which is negligible in comparing to .
The proof is deferred to the Appendix.
Now we will derive the prediction error formula. Let and where and . Define . Then it can be shown that
(2.14)  
where .
Let and where
and Further, denote and . The following theorem states a practical formula for the .
Theorem 3
The MSE of prediction for is
where , and , and .
Therefore, an estimation of the can be obtained by plugging in the variance components estimate from (2.9) into the MSE expression.
2.3 Choice of smoothing parameters
As mentioned earlier, the basic modeling framework considered in this article is similar to Chen and Wang (2011), however, there is differences in smoothing procedure of fixed and random effects and in parameter estimation. We followed their smoothing parameter choices. For completeness we explained the procedure here briefly.
Chen and Wang (2011) proposed to estimate the variance component in by minimizing the following penalized loglikelihood, for given and ,
(2.15) 
Decomposing each and let be the first component of the and be the rest component of . Now collecting be a components vector and be the corresponding coefficients. The smoothing parameter are chosen by minimizing the following marginal REML
(2.16) 
where is the marginal covariance of , which is
where , and .
The smoothing parameters are chosen by minimizing the following marginal loglikelihood, for given
where
for any symmetric matrix . Because , taking derivative of with respect to , gives
where and for . Then
3 Numerical Findings
We investigated the finite sample performance of the proposed method through simulation and real data examples. The simulation study is also designed to verify the asymptotic behavior of the approximated prediction error formula and and related coverage errors.
3.1 Simulation Study
We considered the following mixed model setup
(3.17) 
where for all . Three different values of was chosen, 50, 100 and 200. We generated the time points independently from Uniform(0,1). Let . Set and . We simulated from a Gaussian process with mean 0 and covariance where , and from a Gaussian process with mean zero and , where if , and otherwise. We set . We call this set up as Case I. As a second case, we kept everything same except the covariance structure of the random effects. Specifically, we took with . We call this as Case II. We also examined the performance for more fluctuated mean function where . We call this as case III.
The main purpose of this simulation study is to evaluate the performance of mean square error of the predictor of mixed effects that measuring subject (curve) specific means. For example, we considered where is one of the time points (for example ), and and are means of and respectively. In particular, here and , and and rest of for . Similar to Chen and Wang (2007), we used the following algorithm to estimate and . We fixed the tuning parameters and and set an initial value of , , and . Then the initial estimates of and are
respectively. We then repeat the following steps until all the parameter estimates converge

Step 1: estimate through the estimating equation given in (2.9).
Plugging in the estimates of and from the previous algorithm, obtain a biased corrected estimation of by given in (2.13).
We then compute the following three quantities:

The true MSE: computed by where is the number of replicated data sets and is the prediction of in the th replicate. In all cases we took replication.

Estimate MSE with estimated variance components: computed by the formula given in (2.14) using estimated and .
Along with the mean square prediction errors, we calculated 95% prediction coverage error where the prediction interval was calculated as .
The reported values in the table 1 are the averages of the prediction coverage over all the individuals. The relative bias is the relative difference between the true MSE and the estimated MSE, averaged over replications and then averaged over the individuals. Numbers in the bracket are the standard errors averaged over all the subjects.
n  Prediction Coverage  Relative Bias  True MSE  

Case I:  
50  0.9915(0.1034)  0.9454(0.0044)  0.0509(0.0325)  0.1015(0.0215) 
100  0.9936(0.0732)  0.9488(0.0051)  0.0749(0.0365)  0.0599(0.0126) 
200  0.9968(0.0521)  0.9558(0.0062) 