# Estimating Functional Linear Mixed-Effects Regression Models

###### Abstract

The functional linear model is a popular tool to investigate the relationship between a scalar/functional response variable and a scalar/functional covariate. We generalize this model to a functional linear mixed-effects model when repeated measurements are available on multiple subjects. Each subject has an individual intercept and slope function, while shares common population intercept and slope function. This model is flexible in the sense of allowing the slope random effects to change with the time. We propose a penalized spline smoothing method to estimate the population and random slope functions. A REML-based EM algorithm is developed to estimate the variance parameters for the random effects and the data noise. Simulation studies show that our estimation method provides an accurate estimate for the functional linear mixed-effects model with the finite samples. The functional linear mixed-effects model is demonstrated by investigating the effect of the 24-hour nitrogen dioxide on the daily maximum ozone concentrations and also studying the effect of the daily temperature on the annual precipitation.

###### keywords:

EM algorithm, Functional Linear Regression, Penalized Splines, Random Effects Model^{†}

^{†}journal: Computational Statistics & Data Analysis\biboptions

authoryear

## 1 Introduction

When a random variable is measured or observed at multiple time points or spatial locations, the data can be viewed as a function of time or spatial locations. This type of data is generally called as functional data (Ramsay and Silverman, 2005). In the current big data era, functional data analysis (FDA) has become very popular in statistical methodology and applied data analysis. Functional linear models (FLMs) is one of the most popular models in FDA. It models the relationship between functional variables and/or predicts the scalar response from the functional input. FLMs have been studied extensively since Ramsay and Dalzell (1991) introduced them. With the developments of modern technology, FLMs have been popularly applied to model functional data in many fields such as economics, medicine, environment, climate [see for instance, Ramsay and Silverman (2002), Ramsay and Silverman (2005), and Ferraty and Vieu (2006), for several case studies].

There is extensive literature studying estimations and properties of FLMs. For example, Chiou et al. (2003) applied a quasi-likelihood approach to study a FLM with a functional response and a finite-dimensional vector of scalar predictors. Yao et al. (2005) studied FLMs for sparse longitudinal data and suggested a nonparametric estimation method based on the functional principal components analysis (FPCA). Their proposed functional regression approach is flexible to allow for different measurement time points of functional predictors and the functional response. Cai and Hall (2006) discussed the prediction problem in FLMs based on the FPCA technique. Crambes et al. (2009) proposed a smoothing spline estimator for the functional slope parameter, and extended it to covariates with measurement-errors. Yuan and Cai (2010) suggested a smoothness regularization method for estimating FLMs based on the reproducing kernel Hilbert space (RKHS) approach. They provided a unified treatment for both the prediction and estimation problems by developing a tool on simultaneous diagonalization of two positive-definite kernels. Wu et al. (2010) proposed a varying-coefficient FLM which allows for the slope function depending on some additional scalar covariates. A systematic review on FLMs can be found in Morris (2015).

One popular FLM is to link a scalar response variable with a functional predictor through the following model

(1) |

where is the intercept, is a smooth slope function, ’s are independent and identically distributed (i.i.d.) random variables with mean 0 and variance , and S is often assumed to be a compact subset of an Euclidean space such as . The slope function, , represents the accumulative effect of the functional covariate on the scalar response .

For purposes of illustration, we take the air pollution data as an example. This data is from the R package NMMAPSdata (Peng and Welty, 2004). The data have hourly measurements of ozone and nitrogen dioxide concentrations for some U.S. cities. Our aim is to study the relationship of the daily maximum ozone concentration and the functional predictors nitrogen dioxide measured during 24 hours (from 0 am to 11 pm) of that day. For the -th city, the scalar response is the maximum ozone concentration during 24 hours (from 0 am to 11 pm) in the -th day, and the functional covariate is the hourly concentration measured during 24 hours. In a preliminary analysis, we performed a functional linear regression model (Cardot et al., 2007) on each individual city and found that there was a dramatic variation of the estimated . This indicates that each city has different effects of the hourly concentration on the daily maximum ozone. Therefore, it may not be appropriate to pool all the data of U.S. cities together and provide only one average effect of hourly on the daily maximum ozone. On the other hand, we may not use all of the data information available if we fit the functional linear model for each individual city separately.

To address this dilemma, we generalize the FLM (1) to incorporate random effects into the slope function, and call it the functional linear mixed-effects model (FLMM). Assume that we repeatedly observed a distinct functional predictor and scalar outcome for each subject over several visits. Then the observed data has the structure for and , where is the -th repeated measurement of the scalar response for the -th subject, and is the corresponding functional predictor. The functional linear mixed-effects model can be expressed as following:

(2) |

where is the population intercept, is the intercept random effect, represents the population effect of on , stands for the random effect of on for the -th subject, and is the i.i.d. random variable with mean 0 and variance . In this article, we assume that , , and follows a Gaussian stochastic process with mean 0 and covariance function , that is, . We also assume that , , , and are mutually independent. The above functional linear mixed-effects model is very attractive, because it can estimate the population effect and random effect of the functional predictor (e.g. the hourly in the air pollution study) on the scalar response (e.g. daily maximum ozone concentrations) as well as the population intercept and intercept random effect simultaneously. The application of the proposed functional linear mixed-effects model on the air pollution problem is not unique, and many similar applications can be found in environmental or biological problems.

The proposed functional linear mixed-effect model (2) is different from the following functional mixed model (Goldsmith et al., 2011, 2012):

(3) |

where accounts for correlations in the repeated outcomes for the -th subject. The highlight of the distinction between (2) and (3) is: the subject-specific random effect of (3) remains the same across visits, while the random effect of (2) allows for varying with time. The including of the random effect in (2) can characterize the different trend effect of functional predictor on scalar outcomes for different subjects.

Many nonparametric smoothers used for the FLMs can be applied to fit the model (2). In this article, we use the idea of penalized splines smoothers of Ramsay and Silverman (2005) to estimate and in (2). Then, the model (2) is transformed by a linear mixed-effects model (LMM). Then a REML-based EM algorithm is proposed to fit the LMMs, and its efficiency is illustrated by examples.

The remainder of this article is organized as follows. Section 2 introduces a smoothing spline method to estimate the above functional linear mixed-effects model. Section 3 implemented some simulations to evaluate the finite sample performance of the smoothing spline method. Then the functional linear mixed-effects model is demonstrated by two real applications in Section 4. Conclusions are given in Section 5.

## 2 Method

Without giving any parametric assumption on the slope functions, and , we estimate them as linear combinations of splines basis functions

where and are two vectors of basis functions with dimensions and , respectively, and and are the corresponding vectors of basis coefficients to estimate. Let denote the variance-covariance matrix of random-effects, then , and the covariance function for the random effect can be expressed as We first consider the scenario of the functional predictors observed without measurement errors. When is observed with measurement errors, many nonparametric smoothing approaches can be applied to reconstruct the underlying functional predictors , such as the functional principal component analysis method (Yao et al., 2005), which is introduced in Subsection 2.5.

### 2.1 Estimating Fixed and Random Effects

Let and with . The fixed effects , and random effects are estimated by minimizing

(4) | |||||

Define three vectors , and where and . Define a matrix and a matrix . Then can be expressed in a matrix form

Then the estimates for and are obtained by minimizing :

(5) |

where , , with , , , , , , and denotes the kronecker product.

Once obtaining the estimates and , the estimates of and , can be given by

(6) |

### 2.2 The REML-based EM algorithm

To estimate the fixed-effects, , the random-effects, , and the variance parameters, , and , we recommend an EM algorithm procedure called the REML-based EM-algorithm. It was proposed by Wu and Zhang (2006) for estimating nonparametric mixed-effects regression models with longitudinal data. The REML-based EM-algorithm has three steps, which are outlined as follow.

Initializing. Initializing the starting values for , and , denoted by , and , respectively. For example, we can choose and as an identity matrix .

Step 1. Set . Compute

Denote . Then estimate and by

Step 2. Compute the residuals and the matrix . Then the updates of and are given by

Step 3. Repeat Steps 2 and 3, until some convergence conditions are satisfied.

### 2.3 Smoothing Parameter Selection

The smoothness of and , are controlled by the smoothing parameter and , respectively. Define , then the optimal value for is chosen by minimizing the generalized cross-validation (GCV) criterion defined as follows

where , , and is the effective degrees of freedom, which is calculated as , where is given by

Define the matrix . The predictor can then be expressed as with

Note that the smooth matrix .

### 2.4 Constructing the Confidence Intervals

To construct the confidence intervals of and the point-wise confidence bands of , we need to calculate the covariance matrix of :

(7) | |||||

In (7), can be replaced by to account for our roughness penalty on . For simplicity, instead of using , we use

(8) |

Let be the estimator of the covariance matrix (8) and partition it as . Then the 95% confidence intervals of is approximately as

and the 95% pointwise bands of can be approximately given by

where . Moreover, the estimate of can be given as

where we use instead of in order to account for our roughness penalty on in our method.

### 2.5 Reconstructing the predictors

When covariates in the functional linear mixed-effects model (2) are not be exactly observable but measured with errors, the estimators and inference may be biased if one ignores these measurement errors. Hence, we need to adjust the resulting bias. In this article, we suggest to reconstruct the functional predictors by using a large number of functional principal components obtained from a smooth estimator of the covariance matrix estimator (Goldsmith et al., 2012) firstly. Then, we treat the estimated as the true predictors and applying the REML-based EM algorithm.

Define the covariance function of as

Mercer’s theorem (Ash and Gardner, 1975) states that has the eigen-decomposition

where satisfying , and ’s form a complete orthonormal basis in . Then, allows the Karhunen-Loeve decomposition (Rice and Silverman, 1991)

where is the orthonormal eigenfunction, which is also called the functional principal component (FPC). The coefficients is called the FPC score of , which satisfies , , and for .

Suppose we have the following additive measurement error model,

where is the observed value, is the underlying true value for the th subject at the time point , and represents the measurement error at the time point . We assume that is a mean zero process, and are mutually independent. We estimate by using a method-of-moments approach, and then smooth the off-diagonal elements of this observed covariance matrix to remove the ‘nugget effect’ that is caused by measurement error (Staniswalis and Lee, 1998; Yao et al., 2005; Goldsmith et al., 2012).

We use the principal analysis by conditional estimation (PACE) algorithm proposed by Yao et al. (2005) to estimate the mean curve , the FPCs and the FPC scores from the observations . Let , , and be the corresponding estimators of , and , respectively. Then an estimate of is obtained as

where the number of FPCs, , can be chosen by AIC, BIC, the cross-validation method, or the empirical experience based on the percentage of explained variance (such as 90% or 95%).

## 3 Simulation studies

In this section, we perform some numerical experiments to assess the efficiency of our proposed estimating procedure for the functional linear mixed-effects model (2). The performance of our estimation method is evaluated by the following relative mean integrated square error (RMISE) for the estimated population slope function and the individual slope function ,

and

where , and .

We assume that the functional predictor are observed at equally-spaced time points of with the additive normal measurement errors:

where or , and the true underlying predictors are given by

where with independent random variables , , , and , , and . We choose two types of functions for the individual slope functions : (1) with the population slope function given by ; (2) with the population slope function given by . The random coefficients are generated as , for both of cases. The scalar response is generated from the following model:

where is generated from . The number of repeated measurements for each individual is varied as . The number of individuals is set as and , and the standard deviation of the data noises is varied as and .

The functional linear mixed-effects model (2) is estimated using the method introduced in Section 2. In case 1, we choose 35 cubic B-splines basis functions for and ; while in case 2, we choose 35 Fourier basis functions for and . Figure 1 and 2 display the pointwise mean, bias, standard deviation and root mean squared error of the estimated population slope function in 1,000 simulation replicates when and . It shows that the pointwise mean of the estimated population slope function is very close to the true function in both of cases.

The estimation results for all simulation setups are summarized in Table 1-2. As expected, there is a substantial decrease in RMISE when more visits are observed per subject. When the functional covariate is observed directly without measurement errors (i.e. ), the mean and median of RMISE for the estimated population slope function and individual slope function is close, which indicates that the estimation is stable. In this case, when the number of replicated measurements for each individual increases from to , the mean of RMISE of the estimated population slope function decreases 27%, and the mean of RMISE of the estimated individual slope function decreases 8%. When the functional covariate is observed with measurement errors with the standard deviation , the mean of RMISE of the estimated population slope function increases 36%, and the mean of RMISE of the estimated individual slope function increases 25%, in comparison with the case when the functional covariate is observed directly without measurement errors.

Intercept | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Bias | STD | RMSE | ||||||||||||

0.0 | 0.5 | 50 | 5 | -0.0017 | 0.0986 | 0.0986 | 0.0047 | 0.0210 | ||||||

10 | 0.0019 | 0.0926 | 0.0926 | 0.0032 | 0.0187 | |||||||||

20 | 0.0005 | 0.0862 | 0.0862 | 0.0025 | 0.0166 | |||||||||

100 | 5 | -0.0034 | 0.0638 | 0.0639 | 0.0027 | 0.0180 | ||||||||

10 | -0.0012 | 0.0611 | 0.0611 | 0.0019 | 0.0161 | |||||||||

20 | -0.0021 | 0.0609 | 0.0610 | 0.0014 | 0.0145 | |||||||||

0.0 | 1.0 | 50 | 5 | -0.0010 | 0.1079 | 0.1079 | 0.0080 | 0.0274 | ||||||

10 | 0.0000 | 0.1023 | 0.1023 | 0.0060 | 0.0237 | |||||||||

20 | -0.0018 | 0.0923 | 0.0923 | 0.0043 | 0.0204 | |||||||||

100 | 5 | -0.0006 | 0.0760 | 0.0760 | 0.0054 | 0.0230 | ||||||||

10 | -0.0016 | 0.0663 | 0.0663 | 0.0036 | 0.0198 | |||||||||

20 | -0.0015 | 0.0630 | 0.0630 | 0.0024 | 0.0174 | |||||||||

0.5 | 0.5 | 50 | 5 | -0.0024 | 0.1001 | 0.1002 | 0.0049 | 0.0212 | ||||||

10 | 0.0003 | 0.0952 | 0.0952 | 0.0035 | 0.0189 | |||||||||

20 | 0.0005 | 0.0890 | 0.0890 | 0.0033 | 0.0177 | |||||||||

100 | 5 | -0.0023 | 0.0636 | 0.0637 | 0.0029 | 0.0182 | ||||||||

10 | 0.0011 | 0.0620 | 0.0620 | 0.0020 | 0.0162 | |||||||||

20 | -0.0016 | 0.0623 | 0.0624 | 0.0022 | 0.0155 | |||||||||

0.5 | 1.0 | 50 | 5 | -0.0036 | 0.1130 | 0.1131 | 0.0082 | 0.0277 | ||||||

10 | 0.0026 | 0.1039 | 0.1039 | 0.0057 | 0.0235 | |||||||||

20 | -0.0001 | 0.0935 | 0.0935 | 0.0043 | 0.0204 | |||||||||

100 | 5 | -0.0023 | 0.0764 | 0.0764 | 0.0053 | 0.0228 | ||||||||

10 | 0.0009 | 0.0698 | 0.0698 | 0.0037 | 0.0199 | |||||||||

20 | -0.0011 | 0.0633 | 0.0633 | 0.0026 | 0.0176 |

Intercept | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Bias | STD | RMSE | ||||||||||||

0.0 | 0.5 | 50 | 5 | 0.0039 | 0.0980 | 0.0981 | 0.0049 | 0.0415 | ||||||

10 | 0.0032 | 0.0912 | 0.0912 | 0.0038 | 0.0409 | |||||||||

20 | -0.0012 | 0.0890 | 0.0890 | 0.0033 | 0.0402 | |||||||||

100 | 5 | -0.0014 | 0.0643 | 0.0644 | 0.0026 | 0.0405 | ||||||||

10 | -0.0014 | 0.0630 | 0.0631 | 0.0023 | 0.0396 | |||||||||

20 | -0.0008 | 0.0629 | 0.0629 | 0.0019 | 0.0395 | |||||||||

0.0 | 1.0 | 50 | 5 | 0.0059 | 0.1106 | 0.1107 | 0.0082 | 0.0447 | ||||||

10 | 0.0050 | 0.1005 | 0.1006 | 0.0053 | 0.0423 | |||||||||

20 | 0.0006 | 0.0948 | 0.0948 | 0.0038 | 0.0407 | |||||||||

100 | 5 | -0.0014 | 0.0747 | 0.0747 | 0.0039 | 0.0418 | ||||||||

10 | -0.0006 | 0.0687 | 0.0687 | 0.0028 | 0.0401 | |||||||||

20 | -0.0003 | 0.0653 | 0.0653 | 0.0022 | 0.0397 | |||||||||

0.5 | 0.5 | 50 | 5 | 0.0021 | 0.0981 | 0.0981 | 0.0052 | 0.0418 | ||||||

10 | 0.0042 | 0.0929 | 0.0930 | 0.0039 | 0.0409 | |||||||||

20 | -0.0012 | 0.0895 | 0.0895 | 0.0033 | 0.0402 | |||||||||

100 | 5 | -0.0030 | 0.0643 | 0.0644 | 0.0027 | 0.0406 | ||||||||

10 | -0.0016 | 0.0620 | 0.0620 | 0.0023 | 0.0396 | |||||||||

20 | -0.0013 | 0.0625 | 0.0625 | 0.0019 | 0.0395 | |||||||||

0.5 | 1.0 | 50 | 5 | 0.0026 | 0.1142 | 0.1142 | 0.0082 | 0.0447 | ||||||

10 | 0.0026 | 0.1008 | 0.1009 | 0.0052 | 0.0422 | |||||||||

20 | -0.0020 | 0.0927 | 0.0927 | 0.0039 | 0.0408 | |||||||||

100 | 5 | -0.0034 | 0.0754 | 0.0755 | 0.0040 | 0.0419 | ||||||||

10 | -0.0015 | 0.0678 | 0.0678 | 0.0029 | 0.0402 | |||||||||

20 | 0.0004 | 0.0652 | 0.0652 | 0.0022 | 0.0397 |

## 4 Applications

In this section, we perform the afore-proposed FLMMs via the EM algorithm to analyze two applications.

### 4.1 Ozone Pollution Analysis

The first study is to re-visit the air pollution study introduced in Section 1. The functional linear mixed-effects model (2) is used to study the effect of the 24-hour nitrogen dioxide (t) on the daily maximum ozone concentration. We fit the following mixed-effect model

where is the daily maximum ozone within 24 hours for the -th day in the -th city, is the 24-hour nitrogen dioxide (t) observations. The data are collected for cities from April 13 to September 30, 1996.

For computational facilities, we have considered the penalized spline estimator and expand the functional coefficients in cubic B-splines basis functions with equispaced interior knots for the population slope function and the random slope function .

The fixed effects and random effects are estimated by minimizing

The smoothing parameters are chosen as and by GCV criterion. We implement the REML-based EM algorithm proposed in Section 2.4. The estimate for the intercept is with the estimated standard error 0.0094, and the 95% confidence interval of is .

Figure 3 displays the estimate population slope function and its approximate 95% pointwise confidence interval. We can see a positive correlation between the maximum ozone concentration and the nitrogen dioxide before 11 am and after 8 pm but negative correlation between 11 am and 8 pm. Due to the stopping of the sun lighting in the night, a lot of nitrogen dioxide is accumulated; with the sunrise at about 6 am, more and more nitrogen dioxide is reacted with the sun light to generate ozone, so more ozone is generated with the decreasing of nitrogen dioxide. This process will last until the sunset at about 7-8 pm, then the nitrogen dioxide is accumulated again.

Figure 4 displays the estimated individual slope function for four cities: Baton Rouge, Buffalo, Johnstown, and Tampa. The individual slope function for Buffalo is lower than the population slope function in the whole day, which indicates that the hourly nitrogen dioxide has a lower effect on maximum ozone concentration in the whole day. On the other hand, the individual slope function for Tampa is higher than the population slope function in the whole day. This interesting phenomenon cannot be found from the regular functional linear regression model.