Boosting Functional Regression Models with FDboost
Abstract
The R add-on package FDboost is a flexible toolbox for the estimation of functional regression models by model-based boosting. It provides the possibility to fit regression models for scalar and functional response with effects of scalar as well as functional covariates, i.e., scalar-on-function, function-on-scalar and function-on-function regression models. In addition to mean regression, quantile regression models as well as generalized additive models for location, scale and shape can be fitted with FDboost. Furthermore, boosting can be used in high-dimensional data settings with more covariates than observations. We provide a hands-on tutorial on model fitting and tuning, including the visualization of results. The methods for scalar-on-function regression are illustrated with spectrometric data of fossil fuels and those for functional response regression with a data set including bioelectrical signals for emotional episodes.
Sarah Brockhaus (LMU Munich), David Rügamer (LMU Munich), Sonja Greven (LMU Munich)
Keywords: functional data analysis, function-on-function regression, function-on-scalar regression, gradient boosting, model-based boosting, scalar-on-function regression.
1 Introduction
With the progress of technology, we are today able to observe more and more data of a functional nature, such as curves, trajectories or images (Ramsay and Silverman 2005). Functional data can be found in many scientific fields like demography, biology, medicine, meteorology and economics (see, e.g., Ullah and Finch 2013). In practice, the functions are observed on finite grids. In this paper, we deal with one-dimensional functional data that are observed over a real-valued interval. Examples of such data are growth curves over time, acoustic signals, temperature curves and spectrometric measurements in a certain range of wavelengths. Regression models are a versatile tool for data analysis and various models have been proposed for regression with functional variables; see Morris (2015) and Greven and Scheipl (2017) for recent reviews of functional regression models. One can distinguish between three types of functional regression models: scalar-on-function regression, i.e., regression with scalar response and functional covariates; function-on-scalar regression, referring to models with functional response and scalar covariates; and function-on-function regression, which is used when both response and covariates are functional. Models for scalar-on-function regression are sometimes also called signal regression models.
Greven and Scheipl (2017) lay out a generic framework for functional regression models including the three mentioned model types. Many types of covariate effects are discussed, including linear and nonlinear effects of scalar covariates as well as linear effects of functional covariates and interaction terms. They describe that estimation can be based on a mixed models framework (Scheipl et al. 2015, 2016) or on component-wise gradient boosting (Brockhaus et al. 2015, 2017). In this paper, we describe the latter approach and provide a hands-on tutorial for its implementation in R (R Core Team 2017) in the comprehensive R package FDboost (Brockhaus and Rügamer 2017).
Boosting estimates the model by iteratively combining simple models and can be seen as a method that conducts gradient descent (Bühlmann and Hothorn 2007). Boosting is capable of estimating models in high-dimensional data settings and implicitly does variable selection. The modeled features of the conditional response distribution can be chosen quite flexibly by minimizing different loss functions. The framework includes linear models (LMs) and generalized linear models (GLMs) as well as quantile and expectile regression. Furthermore, generalized additive models for location, scale and shape (GAMLSS, Rigby and Stasinopoulos 2005) can be fitted (Mayr et al. 2012). GAMLSS model all distribution parameters of the conditional response distribution simultaneously, depending on potentially different covariates. Brockhaus et al. (2016) discuss GAMLSS with scalar response and functional covariates. Stöcker et al. (2017) introduce GAMLSS for functional response. Due to variable selection and shrinkage of the coefficient estimates, no classical inference concepts are available for the boosted models. However, it is possible to quantify uncertainty by the bootstrap (Efron 1979) and stability selection (Meinshausen and Bühlmann 2010). The main advantages of the boosting approach are the possibility to fit models in high-dimensional data settings with variable selection and to estimate not only mean regression models but also GAMLSS and quantile regression models. The main disadvantage is the lack of formal inference.
Other frameworks for flexible regression models with functional response exist. Morris and Carroll (2006) and Meyer et al. (2015) use a basis transformation approach and Bayesian inference to model functional variables. Usually, lossless transformations like the wavelet transformation are used. See Morris (2017) for a detailed comparison of the two frameworks.
In this tutorial, we present the R package FDboost (Brockhaus and Rügamer 2017), which is designed to fit a great variety of functional regression models by boosting. FDboost builds on the R package mboost (Hothorn et al. 2016) for statistical model-based boosting. Thus, in the backend we rely on a well-tested implementation. FDboost provides a comprehensive implementation of the most important methods for boosting functional regression models. In particular, the package can be used to conveniently fit models with functional response. For effects of scalar covariates on functional responses, we provide base-learners with suitable identifiability constraints. In addition, base-learners that model effects of functional covariates are implemented. The package also contains functions for model tuning and for visualizing results.
As a case study for scalar-on-function regression, we use a dataset on fossil fuels, which was analyzed in Fuchs et al. (2015) and Brockhaus et al. (2015) and is part of the FDboost package. In this application, the heat value of fossil fuels is to be predicted based on spectral data. As a case study for function-on-scalar and function-on-function regression, we use the emotion components data set, which is analyzed in Rügamer et al. (2016) in the context of factor-specific historical effect estimation and which is provided in an aggregated version in FDboost.
The remainder of the paper is structured as follows. We briefly review the generic functional regression model (Section 2) for scalar and for functional response. Then the boosting algorithm used for model fitting is introduced in Section 3. In Section 4, we give details on the infrastructure of the package FDboost. Scalar-on-function regression with FDboost is described in Subsection 4.1. Regression models for functional response with scalar and/or functional covariates are described in Subsection 4.2. We present possible covariate effects, discuss model tuning and show how to extract and display results. In Subsection 4.3, we discuss regression models that model characteristics of the response distribution other than the mean, in particular median regression and GAMLSS. In Subsection 4.4, we briefly comment on stability selection in combination with boosting. We conclude with a discussion in Section 5. The paper is structured such that the subsections on functional response can be skipped if one is only interested in scalar-on-function regression.
2 Functional regression models
In Subsection 2.1 we first introduce a generic model for scalar response with functional and scalar covariates. Afterwards, we deal with models with functional response in Subsection 2.2.
2.1 Scalar response and functional covariates
Let the random variable Y be the scalar response with realization y. The covariate set X can include both scalar and functional variables. We denote a generic scalar covariate by z and a generic functional covariate by x(s), with s ∈ S, where S is a real-valued interval. We assume that we observe data pairs (y_i, x_i), i = 1, …, N, where x_i comprises the realizations of scalar covariates as well as the realizations x_i(s) of functional covariates. In practice, x(s) is observed on a grid of evaluation points s_1 < … < s_R, such that each curve is observed as a vector (x_i(s_1), …, x_i(s_R))^⊤. While different functional covariates may be observed on different grid points over different intervals, we do not introduce additional indices here for ease of notation.
We model the expectation of the response by an additive regression model

E(Y | X = x) = h(x) = Σ_{j=1}^{J} h_j(x),   (1)
where h(x) is the additive predictor containing the additive effects h_j(x). Each effect can depend on one or more covariates in x. Possible effects include linear, nonlinear and interaction effects of scalar covariates as well as linear effects of functional covariates. Moreover, group-specific effects and interaction effects between scalar and functional variables are possible. To give an idea of possible effects h_j(x), Table 1 lists effects of functional covariates that are currently implemented in FDboost.
covariate(s) | type of effect | h_j(x)
functional covariate x(s) | linear functional effect | ∫_S x(s) β(s) ds
scalar z and functional x(s) | linear interaction | z ∫_S x(s) β(s) ds
scalar z and functional x(s) | smooth interaction | ∫_S x(s) β(z, s) ds
The effects are linearized using a basis representation:

h_j(x) = b_j(x)^⊤ θ_j,   (2)
with basis vector b_j(x) ∈ R^{K_j} and coefficient vector θ_j ∈ R^{K_j} that has to be estimated. The design matrix B_j for the jth effect consists of rows b_j(x_i)^⊤ for all observations i = 1, …, N. A ridge-type penalty term λ_j θ_j^⊤ P_j θ_j is used for regularization, where P_j is a suitable penalty matrix for b_j(x) and λ_j ≥ 0 is a non-negative smoothing parameter. The smoothing parameter controls the effective degrees of freedom of the effect.
Consider, for example, a linear effect ∫_S x(s) β(s) ds of a functional covariate x(s). Using β(s) ≈ Σ_{k=1}^{K_j} Φ_k(s) θ_{jk}, this effect is computed as

∫_S x(s) β(s) ds ≈ Σ_{k=1}^{K_j} θ_{jk} ∫_S x(s) Φ_k(s) ds ≈ Σ_{k=1}^{K_j} θ_{jk} Σ_{r=1}^{R} Δ(s_r) x(s_r) Φ_k(s_r) = b_j(x)^⊤ θ_j,

where first, the smooth effect β(s) is expanded in basis functions, second, the integration is approximated by a weighted sum and, third, the terms are rearranged such that they fit into the scheme b_j(x)^⊤ θ_j.
The basis is thus computed as

b_j(x) = ( Σ_{r=1}^{R} Δ(s_r) x(s_r) Φ_1(s_r), …, Σ_{r=1}^{R} Δ(s_r) x(s_r) Φ_{K_j}(s_r) )^⊤,   (3)

with spline functions Φ_k(s), k = 1, …, K_j, for the expansion of the smooth effect in s direction and integration weights Δ(s_r) for numerical computation of the integral. The penalty matrix P_j is chosen such that it is suitable to regularize the splines Φ_k(s). To set up a P-spline basis (Eilers and Marx 1996) for the smooth effect, the Φ_k(s) in (3) are B-splines and the penalty P_j is a squared difference matrix.
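The computation in (3) can be sketched in a few lines of base R; this is an illustrative toy example with simulated curves, and the variable names are not taken from FDboost:

```r
library(splines)
set.seed(1)

n_obs  <- 5                                 # number of curves
n_grid <- 50                                # grid length R
s      <- seq(0, 1, length.out = n_grid)    # evaluation points s_r
X      <- matrix(rnorm(n_obs * n_grid), n_obs, n_grid)  # curves x_i(s_r)

Phi   <- bs(s, df = 8)                      # B-spline basis Phi_k(s_r), n_grid x 8
delta <- rep(1 / n_grid, n_grid)            # simple integration weights Delta(s_r)

## row i of B holds b_j(x_i)^T: sum_r Delta(s_r) x_i(s_r) Phi_k(s_r)
B <- (X * matrix(delta, n_obs, n_grid, byrow = TRUE)) %*% Phi
dim(B)  # one row per curve, one column per spline coefficient
```

The resulting matrix B plays the role of the design matrix of the base-learner; in FDboost this computation happens internally.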
2.1.1 Case study: Heat value of fossil fuels
The aim of this application is to predict the heat value of fossil fuels using spectral data (Fuchs et al. 2015; Siemens AG). For N = 129 samples, the dataset contains the heat value, the percentage of humidity and two spectral measurements, which can be thought of as functional variables: an ultraviolet-visible spectrum (UVVIS) observed at 134 wavelengths and a near-infrared spectrum (NIR) observed at 231 wavelengths. For both spectra, the observation points are not equidistant. The dataset is contained in the R package FDboost.
# load the data
data("fuelSubset", package = "FDboost")
# look at the structure of the data
str(fuelSubset)
List of 7
 $ heatan      : num [1:129] 26.8 27.5 23.8 18.2 17.5 ...
 $ h2o         : num [1:129] 2.3 3 2 1.85 2.39 ...
 $ nir.lambda  : num [1:231] 800 803 805 808 810 ...
 $ NIR         : num [1:129, 1:231] 0.2818 0.2916 0.0042 0.034 0.1804 ...
 $ uvvis.lambda: num [1:134] 250 256 261 267 273 ...
 $ UVVIS       : num [1:129, 1:134] 0.145 1.584 0.814 1.311 1.373 ...
 $ h2o.fit     : num [1:129] 2.58 3.43 1.83 2.03 3.07 ...
Figure 1 shows the two spectral measurements colored according to the heat value. Predictive models for the heat values, discussed in the next sections, will include scalaronfunction terms to accommodate the spectral covariates.
2.2 Functional response
We denote the functional response by Y(t), where t is the evaluation point at which the function is observed. We assume that t ∈ T, where T is a real-valued interval [T1, T2], for example a time interval. All response curves can be observed on one common grid or on curve-specific grids. For responses observed on one common grid, we write y_i(t_g) for the observations, with (t_1, …, t_G) denoting the grid of evaluation points. For curve-specific evaluation points, the observations are denoted by y_i(t_{ig}), with g = 1, …, G_i. As above, the covariate set x can contain both scalar and functional variables.
As in model (1), we model the conditional expectation of the response. In this case, the expectation is modeled for each point t:

E(Y(t) | X = x) = h(x, t) = Σ_{j=1}^{J} h_j(x, t).   (4)
As the response is a function of t, the linear predictor h(x, t) as well as the additive effects h_j(x, t) are functions of t. Each effect can depend on one or more covariates in x as well as on t. To give an idea of possible effects h_j(x, t), Table 2 lists some effects that are currently implemented.
covariate(s) | type of effect | h_j(x, t)
(none) | smooth intercept | β_0(t)
scalar covariate z | linear effect | z β(t)
scalar covariate z | smooth effect | f(z, t)
two scalars z1, z2 | linear interaction | z1 z2 β(t)
two scalars z1, z2 | functional varying coefficient | z1 f(z2, t)
two scalars z1, z2 | smooth interaction | f(z1, z2, t)
functional covariate x(s) | linear functional effect | ∫_S x(s) β(s, t) ds
scalar z and functional x(s) | linear interaction | z ∫_S x(s) β(s, t) ds
scalar z and functional x(s) | smooth interaction | ∫_S x(s) β(z, s, t) ds
functional covariate x(s), with S = T | concurrent effect | x(t) β(t)
functional covariate x(s), with S = T | historical effect | ∫_{T1}^{t} x(s) β(s, t) ds
functional covariate x(s), with S = T | lag effect, with lag δ > 0 | ∫_{t−δ}^{t} x(s) β(s, t) ds
functional covariate x(s), with S = T | lead effect, with lead δ > 0 | ∫_{T1}^{t−δ} x(s) β(s, t) ds
functional covariate x(s), with S = T | effect with t-specific integration limits [l(t), u(t)] | ∫_{l(t)}^{u(t)} x(s) β(s, t) ds
grouping variable g | group-specific smooth intercepts | β_g(t)
grouping variable g and scalar z | group-specific linear effects | z β_g(t)
curve indicator i | curve-specific smooth residuals | e_i(t)
All effects mentioned in Table 2 vary over t but can also be modeled as constant in t. The upper part of the table contains linear, smooth and interaction effects of scalar covariates. The middle part of the table gives possible effects of functional covariates and interaction effects between scalar and functional covariates. The lower part of the table additionally shows some group-specific effects.
In practice, all effects are linearized using a basis representation (Brockhaus et al. 2017):

h_j(x, t) = b_j(x, t)^⊤ θ_j,   (5)

where the basis vector b_j(x, t) depends on covariates x and on the observation point t of the response. The corresponding coefficient vector θ_j has to be estimated. The design matrix B_j for the jth effect consists of rows b_j(x_i, t_g)^⊤ for all observations i = 1, …, N and all time points t_g, g = 1, …, G.
In the following, we will use a modularization of the basis into a first part depending on covariates and a second part that depends only on t. This modular structure reduces the problem of specifying the basis to that of creating two suitable marginal bases. For many effects, the marginal bases are easy to define as they are known from regression with scalar response.
First, we focus on responses observed on one common grid, which does not depend on i. In this case, we represent the effects using the Kronecker product of two marginal bases (Brockhaus et al. 2015)

b_j(x, t) = b_j(x) ⊗ b_Y(t),   (6)

where the marginal basis vector b_j(x) ∈ R^{K_j} depends on covariates in x and the marginal basis vector b_Y(t) ∈ R^{K_Y} depends on the grid point t. The design matrix is computed as the Kronecker product of the two marginal design matrices, which have dimensions N × K_j and G × K_Y. If the effect can be represented as in (6), it fits into the framework of linear array models (Currie et al. 2006). The representation as an array model has computational advantages, saving time and memory. Brockhaus et al. (2015) discuss array models in the context of functional regression.
Note that representation (6) is only possible for responses observed on one common grid, as otherwise b_Y(t) depends on the curve-specific grid points t_{ig}. In this case, the marginal bases are combined by the row-wise tensor product (Scheipl et al. 2015; Brockhaus et al. 2017). This is a rather technical detail and is thoroughly explained in Brockhaus et al. (2017), also for the case where the basis for the covariates depends on t, as for historical effects.
We regularize the effects by a ridge-type penalty term θ_j^⊤ P_{jY} θ_j. The penalty matrix for the composed basis can be constructed as (Wood 2006, Sec. 4.1.8)

P_{jY}(λ_j, λ_Y) = λ_j (P_j ⊗ I_{K_Y}) + λ_Y (I_{K_j} ⊗ P_Y),   (7)

where P_j is a suitable penalty for b_j(x) and P_Y is a suitable penalty for b_Y(t). The non-negative smoothing parameters λ_j and λ_Y determine the degree of smoothing in each direction. The anisotropic penalty in (7) can be simplified to an isotropic penalty depending on only one smoothing parameter λ:

P_{jY}(λ) = λ (P_j ⊗ I_{K_Y} + I_{K_j} ⊗ P_Y).   (8)

In this simplified case, only one instead of two smoothing parameters has to be estimated. Setting P_j = 0 in (8) results in a penalty that only penalizes the marginal basis in t direction:

P_{jY}(λ) = λ (I_{K_j} ⊗ P_Y).   (9)
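The construction of the penalties (7) to (9) can be illustrated in base R with small first-order difference penalties for the two marginal bases; all names below are illustrative and not FDboost internals:

```r
K_j <- 4; K_Y <- 3   # marginal basis dimensions

## first-order difference penalties for the two marginal bases
P_j <- crossprod(diff(diag(K_j), differences = 1))
P_Y <- crossprod(diff(diag(K_Y), differences = 1))

lambda_j <- 2; lambda_Y <- 0.5; lambda <- 1

## (7) anisotropic penalty with two smoothing parameters
P_aniso <- lambda_j * kronecker(P_j, diag(K_Y)) +
           lambda_Y * kronecker(diag(K_j), P_Y)

## (8) isotropic penalty with a single smoothing parameter
P_iso <- lambda * (kronecker(P_j, diag(K_Y)) + kronecker(diag(K_j), P_Y))

## (9) penalization in t direction only (P_j set to zero)
P_t <- lambda * kronecker(diag(K_j), P_Y)

dim(P_aniso)  # (K_j * K_Y) x (K_j * K_Y)
```

All three penalty matrices act on the stacked coefficient vector θ_j of length K_j · K_Y.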
Consider, for example, a linear effect ∫_S x(s) β(s, t) ds of a functional covariate x(s). The basis vector b_j(x) and the penalty P_j are the same as in (3). For the basis in t direction, we use a spline representation

b_Y(t) = ( Φ_1(t), …, Φ_{K_Y}(t) )^⊤,   (10)

with spline functions Φ_k(t), k = 1, …, K_Y, where the penalty matrix P_Y has to be chosen such that it is suitable for the chosen spline basis. Using P-splines again, the Φ_k(t) are B-splines and P_Y is a squared difference matrix (Eilers and Marx 1996). The complete basis is

b_j(x, t) = b_j(x) ⊗ b_Y(t).

This choice expands β(s, t) in a tensor-product spline basis and approximates the integral using numerical integration.
2.2.1 Case study: Emotion components data with EEG and EMG
The emotion components data set is based on a study of Gentsch et al. (2014), in which brain activity (EEG) as well as facial muscle activity (EMG) was simultaneously recorded during a computerised game. As the facial muscle activity should be traceable to the brain activity for a certain game situation, Rügamer et al. (2016) analyzed the synchronization of the EEG and EMG signals using function-on-function regression models with factor-specific historical effects. During the gambling rounds, three binary game conditions were varied, resulting in a total of 2 × 2 × 2 = 8 different study settings:

- the goal conduciveness (game_outcome), corresponding to the monetary outcome (gain or loss) at the end of each game round,
- the power setting, which determined whether the player was or was not able to change the final outcome in her favor (high or low, respectively), and
- the control setting, which was manipulated to change the participant's subjective feeling about her ability to cope with the game outcome. The player was told that she would frequently have high power in rounds with high control and frequently low power in low-control situations.
We focus on the EMG of the frontalis muscle, which is used to raise the eyebrow. The EMG signal is a functional response observed at 384 equidistant time points given by the vector t. The experimental conditions are scalar covariates. The EEG signal is observed over the same time interval as the EMG signal. We use the EEG signal from the Fz electrode, which is located at the center front of the head.
In the following, we consider an aggregated version of the data, in which the EEG and EMG signals are aggregated per subject and game condition. One participant is excluded, yielding 23 subjects.
# load the data
data("emotion", package = "FDboost")
# look at the structure of the data
str(emotion)
List of 8
 $ power       : Factor w/ 2 levels "high","low": 1 1 2 2 1 1 2 2 1 1 ...
 $ game_outcome: Factor w/ 2 levels "gain","loss": 1 2 1 2 1 2 1 2 1 2 ...
 $ control     : Factor w/ 2 levels "high","low": 1 1 1 1 2 2 2 2 1 1 ...
 $ subject     : Factor w/ 23 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
 $ EEG         : num [1:184, 1:384] 0.14 0.318 0.497 0.601 0.61 ...
 $ EMG         : num [1:184, 1:384] 2.5581 0.0671 2.9337 1.4511 0.881 ...
 $ s           : int [1:384] 1 2 3 4 5 6 7 8 9 10 ...
 $ t           : int [1:384] 1 2 3 4 5 6 7 8 9 10 ...
In order to fit simple and meaningful models for functiononfunction regression, we define a subset of the data that contains only the observations for a certain game condition. We use the game condition with high control, gain and low power:
# define subset for a certain game condition
subset <- emotion$control == "high" & emotion$game_outcome == "gain" &
  emotion$power == "low"
emotionHGL <- list()
# subset scalar variables
emotionHGL$subject <- emotion$subject[subset]
# subset functional variables
emotionHGL$EMG <- emotion$EMG[subset, ]
emotionHGL$EEG <- emotion$EEG[subset, ]
# keep the evaluation points
emotionHGL$s <- emotionHGL$t <- emotion$t
In Figure 2, the EEG and EMG signals are depicted for each of the 23 participants at the 384 observation points.
3 Estimation by gradient boosting
Boosting originates from the field of machine learning and aims at combining many weak learners to form a single strong learner (e.g., Friedman et al. 2000; Schapire and Freund 2012). In the boosting context, the weak learners are called base-learners. In our context, each effect h_j corresponds to one base-learner, leading to J base-learners in total. Boosting was originally designed for binary classification problems and has been extended in various directions (see, e.g., Mayr et al. 2014b). Nowadays it is also used to estimate statistical models (Mayr et al. 2014a).
Component-wise gradient boosting minimizes the expected loss (risk) via gradient descent in a stepwise procedure. In each boosting step, each base-learner is fitted separately to the negative gradient and only the best-fitting base-learner is selected for the model update; hence the term 'component-wise'. To fit a model for the expectation, like models (1) and (4), the squared error loss (L2 loss) is minimized. In this case, the negative gradient corresponds to the residuals. In each step, the current additive predictor is updated by adding the estimated effect of the selected base-learner multiplied by a step-length ν; a typical choice is ν = 0.1 (see, e.g., Bühlmann and Hothorn 2007). Usually the algorithm is stopped before convergence, which is called early stopping. This leads to regularized effect estimates and therefore yields more stable predictions. Since some of the base-learners are never selected in the course of all iterations, boosting also performs variable selection. The optimal stopping iteration can be determined by resampling methods like cross-validation, subsampling or the bootstrap. For each fold, the empirical out-of-bag risk is computed and the stopping iteration that yields the lowest empirical risk is chosen. As resampling must be conducted on the level of independent observations, this is done on the level of curves for functional response.
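A sketch of early stopping by resampling, anticipating the FDboost models of Section 4: we fit a small scalar-on-function model for the fuel data and select the stopping iteration by a curve-wise bootstrap via applyFolds(); the base-learner settings and the number of folds are illustrative:

```r
library(FDboost)
data("fuelSubset", package = "FDboost")

## small scalar-on-function model (details in Section 4.1)
mod <- FDboost(heatan ~ bsignal(UVVIS, s = uvvis.lambda, df = 4),
               timeformula = NULL, data = fuelSubset)

## choose the stopping iteration by bootstrap on the level of observations
set.seed(123)
cvr <- applyFolds(mod,
                  folds = cv(rep(1, length(unique(mod$id))),
                             type = "bootstrap", B = 5),
                  grid = 1:100)
mod <- mod[mstop(cvr)]  # set the model to the optimal stopping iteration
```

The out-of-bag risks per fold are stored in cvr; plot(cvr) visualizes them over the grid of iterations.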
By representing all base-learners as linear effects of covariates (if necessary, by using a basis representation for nonlinear effects), base-learners also define the covariate effects in the sense of additive regression models and can be associated with a specific hat matrix as well as a certain number of degrees of freedom. To ensure a fair selection of base-learners over the course of the iterations, it is important to specify equal degrees of freedom for each base-learner. If base-learners have different degrees of freedom, selection is biased towards the more flexible base-learners with higher degrees of freedom, as they are more likely to yield larger improvements of the fit in each iteration (see Hofner et al. 2011, for details). It is recommended to use a rather small number of degrees of freedom for all base-learners in order to work with weak learners (Kneib et al. 2009; Hofner et al. 2011). While the predefined degrees of freedom of each base-learner are fixed and relatively small, the additive model components in the final model fit adapt to the necessary complexity and can have varying effective degrees of freedom depending on how often they are selected.
3.0.1 Functional Response
To adapt boosting for a functional response, the loss is computed at each point t and integrated over the domain of the response (Brockhaus et al. 2015).
For the L2 loss, the optimization problem for functional response aims at minimizing

Σ_{i=1}^{N} ∫_T ( y_i(t) − h(x_i, t) )² dt,   (11)
which is approximated by numerical integration. To obtain identifiable models, suitable identifiability constraints for the base-learners are necessary and implemented. FDboost also contains base-learners that model the effects of functional covariates. For a discussion of both points, please see Brockhaus et al. (2015).
4 The package FDboost
Fitting functional regression models via boosting is implemented in the R package FDboost. The package uses the fitting algorithm and other infrastructure from the R package mboost (Hothorn et al. 2016). All base-learners and distribution families that are implemented in mboost can be used within FDboost. Many naming conventions and methods in FDboost are implemented in analogy to mboost. Thus, we recommend that users of FDboost first familiarize themselves with mboost. A tutorial for mboost can be found in Hofner et al. (2014).
The main fitting function to estimate functional regression models, like models (1) and (4), is called FDboost(). The interface of FDboost() is as follows (note that for the presentation of functions we restrict ourselves to the most important function arguments; for the full list of arguments, we refer to the corresponding manuals):
FDboost(formula, timeformula, id = NULL, numInt = "equal", data, offset = NULL, ...)
First, we focus on the arguments that are necessary for regression models both with scalar and with functional response. formula specifies the base-learners for the covariate effects and timeformula specifies the basis b_Y(t) along t. Per default, this basis is the same for all effects. The data are provided in the data argument as a data.frame or a named list. The data object has to contain the response, all covariates and the evaluation points of functional variables. Prior to the model fit, an offset is subtracted from the response to center it. This corresponds to initializing the fit with this offset, e.g., an overall average, and leads to faster convergence and better stability of the boosting algorithm. For mean regression, by default the offset is the smoothed point-wise mean of the response over time without taking covariates into account. This offset is part of the intercept and corresponds to an initial estimate that is then updated. In the dots argument, '...', further arguments passed to mboost() and mboost_fit() can be specified; see the manual of mboost() for details. An important argument is family, which determines the loss and link function of the model to be fitted. The default is family = Gaussian(), which minimizes the squared error loss and uses the identity as link function. Thus, per default a mean regression model for continuous response is fitted. For the duality of loss function and the family argument, we refer to Section 4.3. Another important argument is control, which determines the number of boosting iterations and the step-length of the boosting algorithm.
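To make the roles of family and control concrete, the following sketch fits a median regression model for the heat value of the fuel data by swapping the default Gaussian() family for QuantReg() from mboost; the base-learner settings and number of iterations are illustrative choices:

```r
library(FDboost)
data("fuelSubset", package = "FDboost")

## median regression: QuantReg family with tau = 0.5,
## 500 boosting iterations and step-length 0.1
mod_med <- FDboost(heatan ~ bsignal(NIR, s = nir.lambda, df = 4),
                   timeformula = NULL, data = fuelSubset,
                   family = QuantReg(tau = 0.5),
                   control = boost_control(mstop = 500, nu = 0.1))
```

The same pattern applies to other families; only the family and control arguments change, while formula and timeformula stay as before.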
4.0.1 Specification for scalar response
For scalar response, we set timeformula = NULL, as no expansion of the effects in t direction is necessary. formula specifies the base-learners for the covariate effects as in (2). The arguments id and numInt are only needed for functional responses. For scalar response, offset = NULL results in a default offset, as, for example, the overall mean for mean regression.
4.0.2 Arguments needed for functional response
For functional response, the setup of the covariate effects generally follows (6) by separating the effects into two marginal parts. The marginal effects b_j(x), j = 1, …, J, are represented in the formula, which has the form y ~ b_1 + b_2 + … + b_J. The marginal effect b_Y(t) is represented in the timeformula, which has the form ~ b_Y. The base-learners for the marginal effects also contain suitable penalty matrices. Internally, the base-learners specified in formula are combined with the base-learner specified in timeformula as in (6) and a suitable penalty matrix is constructed according to (8). Per default, the response is expected to be a matrix. In this case id = NULL. The matrix representation is not possible for a response that is observed on curve-specific grids. In this case the response is provided as a vector in long format and id specifies which position in the vector is attributed to which curve; see Section 4.2 for details. The argument numInt provides the numerical integration scheme for computing the integral of the loss over t in (11). Per default, numInt = "equal", and thus all integration weights are set to one; for numInt = "Riemann", Riemann sums are used. For functional response, offset = NULL induces a smooth offset varying over t. For offset = "scalar", a scalar offset is computed. This corresponds to an offset that is constant along t. For more details and the full list of arguments, see the manual of FDboost().
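As a minimal function-on-scalar sketch with the emotion data, the following model explains the EMG curves by the three binary game conditions; the response is a matrix, so id = NULL (the default), and the degrees of freedom and number of iterations are illustrative choices:

```r
library(FDboost)
data("emotion", package = "FDboost")

## function-on-scalar model: EMG(t) ~ smooth intercept + effects of the
## game conditions, each varying over t via the timeformula basis
mod_fos <- FDboost(EMG ~ 1 + bolsc(power, df = 2)
                       + bolsc(game_outcome, df = 2)
                       + bolsc(control, df = 2),
                   timeformula = ~ bbs(t, df = 2),
                   control = boost_control(mstop = 50),  # few iterations for illustration
                   data = emotion)
```

In a real analysis the stopping iteration would of course be tuned by resampling rather than fixed at a small value.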
4.1 Scalar response and functional covariates
In this subsection, we give details on models with scalar response and functional covariates like model (1). Such models are called scalar-on-function regression models. As a case study, the data on fossil fuels are used.
4.1.1 Potential covariate effects: base-learners
In order to fit a scalar-on-function model (1), the timeformula is set to NULL and potential covariate effects are specified in the formula argument. The effects of scalar covariates can be linear or nonlinear. A linear effect is obtained using the base-learner bols(), which is also suitable for factor variables, in which case dummy variables are constructed for each factor level (Hofner et al. 2014). Per default, bols() contains an intercept. If the specified degrees of freedom are less than the number of columns in the design matrix, bols() penalizes the linear effect by a ridge penalty with the identity matrix as penalty matrix. The base-learner brandom() for factor variables sets up an effect that is centered around zero and penalized by a ridge penalty, thus having properties similar to a random effect. See the web appendix of Kneib et al. (2009) for a discussion of brandom(). A nonlinear effect expanded by P-splines is obtained by the base-learner bbs(). For details on base-learners with scalar covariates, we refer to Hofner et al. (2014).
Potential base-learners for functional covariates are listed in Table 3. In this table, exemplary linear predictors are given in the left column and the corresponding call to formula in the right column. For simplicity, only one possible parameterization that leads to simple interpretations and one corresponding model call are shown, although FDboost allows one to specify several parameterizations.
additive predictor | call in formula
β_0 + ∫_S x(s) β(s) ds | y ~ 1 + bsignal(x, s = s)
β_0 + ∫_S x(s) β(s) ds (FPC basis) | y ~ 1 + bfpc(x, s = s)
β_0 + z β + ∫_S x(s) β(s) ds | y ~ 1 + bolsc(z) + bsignal(x, s = s)
β_0 + z β + ∫_S x(s) β(s) ds + z ∫_S x(s) β̃(s) ds | y ~ 1 + bolsc(z) + bsignal(x, s = s) + bsignal(x, s = s) %X% bolsc(z)
For a linear effect ∫_S x(s) β(s) ds of a functional covariate x(s), two base-learners exist that use different basis expansions. Assuming β(s) to be smooth, bsignal() uses a P-spline representation for its expansion. Assuming that the main modes of variation in the functional covariate are the important directions for the coefficient function β(s), a representation in functional principal components is suitable (Ramsay and Silverman 2005). In the base-learner bfpc(), the coefficient function β(s) and the functional covariate x(s) are both represented by an expansion in the estimated functional principal components of x(s). As penalty matrix, the identity matrix is used. In Appendix B, technical details on the representation of functional effects are given.
The specification of a model with an interaction term between a scalar and a functional covariate is given at the end of Table 3. The interaction term is centered around the main effect of the functional covariate using bolsc() for the scalar covariate (as the linear effect of the scalar covariate is centered around the intercept). Thus, the main effect of the functional covariate has to be included in the model. For more details on interaction effects, we refer to Brockhaus et al. (2015) and Rügamer et al. (2016). The interaction is formed using the operator %X%, which builds the row-wise tensor product of the two marginal bases; see Appendix C.
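An illustrative sketch of such an interaction model with the fuel data, using the humidity h2o as scalar covariate and the NIR spectrum as functional covariate (the choice of covariates and degrees of freedom is for illustration only):

```r
library(FDboost)
data("fuelSubset", package = "FDboost")

## main effect of h2o, main effect of NIR, and their interaction;
## the interaction is centered around the main effect of NIR
mod_int <- FDboost(heatan ~ 1 + bolsc(h2o, df = 4)
                   + bsignal(NIR, s = nir.lambda, df = 4)
                   + bsignal(NIR, s = nir.lambda, df = 4) %X% bolsc(h2o, df = 1),
                   timeformula = NULL, data = fuelSubset)
```

Note that the degrees of freedom of a row-wise tensor product base-learner are the product of the marginal degrees of freedom, which is why the marginal df values are chosen smaller here.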
As explained in Section 3, all base-learners in a model should have equal and rather low degrees of freedom. The number of degrees of freedom that can be assigned to a base-learner is restricted. On the one hand, the maximum number is bounded by the number of columns of the design matrix (more precisely, by the rank of the design matrix). On the other hand, for rank-deficient penalties, the minimum number of degrees of freedom is given by the rank of the null space of the penalty matrix.
The interface of bsignal() is as follows:
bsignal(x, s, knots = 10, degree = 3, differences = 1, df = 4, lambda = NULL, check.ident = FALSE)
The arguments x and s specify the name of the functional covariate and the name of its argument. knots gives the number of inner knots for the P-spline basis, degree the degree of the B-splines and differences the order of the differences used for the penalty. Thus, per default, 14 cubic P-splines with first-order difference penalty are used. The argument df specifies the number of degrees of freedom for the effect and lambda the smoothing parameter. Only one of those two arguments can be supplied. If check.ident = TRUE, the identifiability checks proposed by Scheipl and Greven (2016) for functional linear effects are performed additionally.
The interface of bfpc() is:
bfpc(x, s, df = 4, lambda = NULL, pve = 0.99, npc = NULL)
The arguments x, s, df and lambda have the same meaning as in bsignal(). The two other arguments control how many functional principal components are used as the basis. Per default, the number of functional principal components is chosen such that the proportion of explained variance is 99%. This proportion can be changed via the argument pve (proportion of variance explained). Alternatively, the number of components can be set to a specific value using npc (number of principal components).
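For illustration, a base-learner with a fixed number of five principal components could be specified as follows. This is a sketch in the setting of the fossil fuel case study below; the variables NIR, nir.lambda and heatan from the fuelSubset data are assumed:

```r
# hedged sketch: use a fixed number of five principal components (npc = 5)
# instead of choosing the number via the proportion of explained variance (pve)
sof_fpc <- FDboost(heatan ~ bfpc(NIR, s = nir.lambda, df = 4, npc = 5),
                   timeformula = NULL, data = fuelSubset)
```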
The interface of bolsc() is very similar to that of bols(), which is laid out in detail in Hofner et al. (2014). In contrast to bols(), bolsc() centers the design matrix such that the resulting linear effect is centered around zero. More details on bolsc() are given in Section 4.2.
bolsc(..., df = NULL, lambda = 0, K = NULL)
In the dots argument, ..., one or more covariates can be specified. For factor variables, bolsc() sets up a design matrix in dummy coding. The arguments df and lambda have the same meaning as above. If lambda > 0, or if df is smaller than the number of columns of the design matrix, a ridge penalty is applied. Per default, K = NULL, and the penalty matrix is the identity matrix. Setting the argument K to another matrix allows for customized penalty matrices.
Case study (ctd.): Fossil fuel data
For the heat values y_i, i = 1, ..., N, we fit the model

E(y_i) = β0 + f(h2o_i) + ∫ x_i^UVVIS(s1) β_UVVIS(s1) ds1 + ∫ x_i^NIR(s2) β_NIR(s2) ds2,    (12)

with water content h2o_i and centered spectral curves x_i^UVVIS(s1) and x_i^NIR(s2), which are observed over the wavelengths s1 and s2. We center the NIR and the UVVIS measurements per wavelength such that Σ_i x_i^NIR(s2) = 0 for all s2, and analogously for UVVIS. Thus, the functional effects have mean zero, Σ_i ∫ x_i^NIR(s2) β_NIR(s2) ds2 = 0, and analogously for UVVIS. This does not affect the interpretation of β_UVVIS and β_NIR; it only changes the interpretation of the intercept of the regression model. If all effects are centered, the intercept can be interpreted as the overall mean and the other effects as deviations from the overall mean.
Note that the functional covariates have to be supplied as number of curves by number of evaluation points matrices. The non-linear effect of the scalar variable H2O is specified using the bbs() base-learner. For the linear functional effects of NIR and UVVIS, we use the base-learner bsignal(). The degrees of freedom are set to 4 for each base-learner. For the functional effects, we use a P-spline basis with 20 inner knots. Because the response is scalar, we set timeformula = NULL.
# center the functional covariates per observed wavelength to center their effects
fuelSubset$UVVIS <- scale(fuelSubset$UVVIS, scale = FALSE)
fuelSubset$NIR <- scale(fuelSubset$NIR, scale = FALSE)
# fit the scalar-on-function regression model
sof <- FDboost(heatan ~ bbs(h2o, df = 4)
               + bsignal(UVVIS, s = uvvis.lambda, knots = 20, df = 4)
               + bsignal(NIR, s = nir.lambda, knots = 20, df = 4),
               timeformula = NULL, data = fuelSubset)
4.1.2 Model tuning and early stopping
Boosting iteratively selects base-learners to update the additive predictor. Fixing the base-learners and the step-length, the model complexity is controlled by the number of boosting iterations. With more boosting iterations the model becomes more complex (Bühlmann and Yu 2003). The step-length ν is chosen sufficiently small in the interval (0, 1], usually as ν = 0.1, which is also the default. For a smaller step-length, more boosting iterations are required and vice versa (Friedman 2001). Note that the default number of boosting iterations is 100. This is arbitrary and in most cases not adequate. The number of boosting iterations and the step-length of the algorithm can be specified in the argument control. This argument must be supplied as a call to boost_control(). For example, control = boost_control(mstop = 50, nu = 0.2) implies 50 boosting iterations and step-length ν = 0.2.
The most important tuning parameter is the number of boosting iterations. For regression with scalar response, the function cvrisk.FDboost() can be used to determine the optimal stopping iteration. This function directly calls cvrisk.mboost() from the mboost package, which performs an empirical risk estimation using a specified resampling method. The interface of cvrisk.FDboost() is:
cvrisk.FDboost(object, folds = cvLong(id = object$id, weights = model.weights(object)), grid = 1:mstop(object))
In the argument object, the fitted model object is specified. grid defines the grid on which the optimal stopping iteration is searched. Per default, the grid from 1 to the current stopping iteration of the model object is used as search grid. But it is also possible to specify a larger grid, e.g., 1:5000. The argument folds expects an integer weight matrix with dimension (<number of observations> times <number of folds>). Depending on the range of values in the weight matrix, different types of resampling are performed. For example, if the weights in each column sum up to the number of observations and single weights can also take values larger than one, the resampling scheme corresponds to bootstrap, while a k-fold cross-validation is obtained by using an incidence matrix whose rows sum up to k − 1. If not specified manually, mboost and FDboost provide convenience functions, cv() and cvLong(), that construct such matrices on the basis of the given model object. The function cvLong() is suited for functional response and treats scalar response as the special case with one observation per curve. For scalar response, the function cv() from package mboost can be used, which has a simpler interface.
cv(weights, type = c("bootstrap", "kfold", "subsampling"), B = ifelse(type == "kfold", 10, 25))
The argument weights is used to specify the weights of the original model, which can be extracted using model.weights(object). Usually all model weights are one. Via the argument type, the resampling scheme is defined: "bootstrap" for non-parametric bootstrap, "kfold" for cross-validation and "subsampling" for resampling half of all observations in each fold. The number of folds is defined by B. Per default, 10 folds are used for cross-validation and 25 folds for bootstrap as well as for subsampling.
The function cvLong() is especially suited for functional response and has the additional argument id, which specifies which observations belong to the same response curve. For scalar response, id = 1:N can be used.
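To illustrate the structure of such weight matrices, consider a small toy example using cv() (a purely illustrative sketch with six hypothetical observations):

```r
library(mboost)
set.seed(1)
# 3-fold cross-validation weights for 6 observations:
# each column is one fold; a 0 means the observation is left out in that fold,
# so every row sums to B - 1 = 2
folds_cv <- cv(weights = rep(1, 6), type = "kfold", B = 3)
dim(folds_cv)
# bootstrap weights: each column sums to the number of observations,
# and single weights can be larger than one
folds_bs <- cv(weights = rep(1, 6), type = "bootstrap", B = 3)
colSums(folds_bs)
```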
Case study (ctd.): Fossil fuel data
To tune the scalar-on-function regression model (12), we search for the optimal stopping iteration by 10-fold bootstrap. First, the bootstrap folds are created using the function cv(). Second, for each bootstrap fold, the out-of-bag risk is computed for models with 1 to 1000 boosting iterations. The choice of the grid is independent of the number of boosting iterations of the fitted model object.
# create folds for 10-fold bootstrap for the model object "sof"
set.seed(123)
folds_sof <- cv(weights = model.weights(sof), type = "bootstrap", B = 10)
# compute out-of-bag risk on each fold for 1 to 1000 boosting iterations
cvm_sof <- cvrisk(sof, folds = folds_sof, grid = 1:1000)
The object cvm_sof contains the out-of-bag risk of each fold for all iterations.
4.1.3 Methods to extract and visualize results from the resampling object
For a cvrisk object as created by cvrisk(), the method mstop() extracts the estimated optimal number of boosting iterations, which corresponds to the number of boosting iterations yielding the minimal mean out-of-bag risk. plot() generates a plot of the estimated out-of-bag risk per stopping iteration in each fold. In addition, the mean out-of-bag risk per stopping iteration is displayed. The estimated optimal stopping iteration is marked by a dashed vertical line. In such a plot, the convergence behavior can be examined graphically.
Case study (ctd.): Fossil fuel data
We generate a plot that displays for each fold the estimated out-of-bag risk per stopping iteration; see Figure 3.
# plot the out-of-bag risk for all bootstrap folds
plot(cvm_sof, ylim = c(2, 15))
For small numbers of boosting iterations, the out-of-bag risk declines sharply with a growing number of boosting iterations. With more and more iterations, the model becomes more complex and the out-of-bag risk starts to increase slowly. The dashed vertical line marks the estimated optimal stopping iteration of 511, which can be accessed using the function mstop():
# get the estimated optimal stopping iteration
mstop(cvm_sof)
[1] 511
4.1.4 Methods to extract and display results from the model object
Fitted FDboost objects inherit methods from class mboost. Thus, all methods available for mboost objects can also be applied to models fitted by FDboost(). The design and penalty matrices that are constructed by the base-learners can be extracted using the extract() function. For example, extract(object, which = 1) returns the design matrix of the first base-learner and extract(object, which = 1, what = "penalty") the corresponding penalty matrix. The number of boosting iterations of an FDboost object can be changed afterwards using the subset operator; e.g., object[50] sets the number of boosting iterations for object to 50. Note that the subset operator directly changes object, and hence no assignment is necessary. One can access the estimated coefficients by the coef() function. For smooth effects, coef() returns the estimated smooth effects evaluated on a regular grid. The spline coefficients of smooth effects can be obtained by object$coef(). The estimated effects can be displayed graphically by the plot() function. The coefficient plots can be customized by various arguments. For example, coefficient surfaces can be displayed as image plots, setting pers = FALSE, or as perspective plots, setting pers = TRUE. To plot only some of the base-learners, the argument which can be used. For instance, plot(object, which = c(1, 3)) plots the estimated effects of the first and the third base-learner. The fitted values and predictions for new data can be obtained by the methods fitted() and predict(), respectively.
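These methods can, for instance, be combined as in the following sketch, which assumes the scalar-on-function model sof and the data set fuelSubset from the case study:

```r
# extract the design matrix and penalty matrix of the first base-learner
X1 <- extract(sof, which = 1)
P1 <- extract(sof, which = 1, what = "penalty")
# set the number of boosting iterations to 200 (changes sof directly)
sof[200]
# estimated coefficients, evaluated on a regular grid for smooth effects
coefs <- coef(sof)
# fitted values and predictions (here, for the training data itself)
fit <- fitted(sof)
pred <- predict(sof, newdata = fuelSubset)
```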
Case study (ctd.): Fossil fuel data
In order to continue working with the optimal model, we set the number of boosting iterations to the estimated optimal value.
# number of boosting iterations is set to the estimated optimal mstop
sof <- sof[mstop(cvm_sof)]
# note that the following call also directly changes sof
sof[mstop(cvm_sof)]
Then, we use plot() to display the estimated effects. Per default, plot() only displays effects of base-learners that were selected at least once. See Figure 4 for the resulting plots.
par(mfrow = c(1, 3))
# plot the effects of all selected base-learners
plot(sof, ask = FALSE, ylab = "")
The mean heat value is estimated to increase with the water content (see Figure 4, left). High values of the UVVIS spectrum at wavelengths around 500 and 850 nm are associated with higher heat values, whereas high values of the UVVIS spectrum at wavelengths around 300 and 750 nm are associated with lower heat values (see Figure 4, middle). The effect of the NIR spectrum can be interpreted analogously.
4.1.5 Bootstrapped coefficient estimates
In order to obtain a measure of the uncertainty of the estimated coefficient functions, one can employ a nested bootstrap, in which the optimal number of boosting iterations in each bootstrap fold is estimated by an inner resampling procedure. As boosting shrinks coefficients towards zero due to early stopping, the bootstrapped coefficients are shrunken towards zero as well. Thus, the resulting bootstrap "confidence" intervals are biased towards zero but still capture the variability of the coefficient estimates. In FDboost, the function bootstrapCI() can be used to conveniently compute bootstrapped coefficients:
bootstrapCI(object, B_outer = 100, B_inner = 25, ...)
The argument object is the fitted model object. The maximal number of boosting iterations considered in each bootstrap fold is the number of boosting iterations of the model object. Per default, a bootstrap with B_outer = 100 outer folds and B_inner = 25 inner folds is used. The dots argument, ..., can be used to pass further arguments to applyFolds(), which is used for the outer bootstrap. In particular, setting the argument mc.cores to an integer greater than 1 runs the outer bootstrap in parallel on the specified number of cores (this does not work under Windows, as the parallelization is based on the function mclapply()).
Case study (ctd.): Fossil fuel data
We recompute the model on 100 bootstrap samples to obtain bootstrapped coefficient estimates. In each bootstrap fold, the optimal number of boosting iterations is estimated by an inner bootstrap with 10 folds. In contrast to analytic inference concepts, employing the bootstrap to quantify coefficient uncertainty is much more time-consuming, but it can easily be parallelized. See the help page of bootstrapCI() for example code. The resulting coefficient estimates can be seen in Figure 5.
# compute bootstrapped coefficients with 100 folds for the outer bootstrap,
# parallelized on 10 cores (use mc.cores = 1 on Windows)
# this takes some time!
set.seed(123)
sof_bootstrapCI <- bootstrapCI(sof[1000], B_outer = 100, B_inner = 10,
                               mc.cores = 10)
# plot the bootstrapped coefficient estimates
par(mfrow = c(1, 3))
plot(sof_bootstrapCI, ask = FALSE, commonRange = FALSE, ylab = "")
4.2 Functional response
In this subsection, we explain how to fit models with functional response like model (4). Models with scalar and functional covariates are treated, thus covering function-on-scalar and function-on-function regression models.
4.2.1 Specification of functional response
If a functional variable is observed on one common grid, its observations can be represented by a matrix. In FDboost, such functional variables have to be supplied as number of curves by number of evaluation points matrices. That is, a functional response Y_i(t_g), with i = 1, ..., N curves and g = 1, ..., G evaluation points, is stored in an N × G matrix with cases in rows and evaluation points in columns. This corresponds to a data representation in wide format. The time variable t must be given as the vector (t_1, ..., t_G).
For the functional response, curve-specific observation grids are possible, i.e., the i-th response curve is observed at evaluation points t_{ig}, g = 1, ..., G_i, specific to each curve i. In this case, three pieces of information must be supplied: the values of the response, the evaluation points and the curve to which each of the observations belongs. The response is supplied as a vector containing all observed values Y_i(t_{ig}); this vector has length Σ_i G_i. The time variable contains all evaluation points t_{ig}. The argument id contains the information on which observation corresponds to which response curve and must be supplied as a right-sided formula, id = ~ idvariable.
Case study (ctd.): Emotion components data
In the following, we give an example of a model fit with a functional response. In the first model fit, the response is stored in the matrix EMG, in the second in the vector EMG_long. We fit an intercept model by defining the formula as y ~ 1 and the timeformula as ~ bbs(t).
# fit intercept model with response matrix
fos_intercept <- FDboost(EMG ~ 1, timeformula = ~ bbs(t, df = 3),
                         data = emotionHGL)
The corresponding mathematical model is

E(Y_i(t)) = β0(t),

i.e., we simply estimate the mean curve of the functional EMG signal.
To fit a model with the response in long format, we first have to convert the data into the corresponding format. We therefore construct a data set emotion_long that contains the response in long format. Usually, the long format specification is only necessary for responses that are observed on curve-specific grids. We provide this version here for illustrative purposes; in this example, the following model specification is equivalent to the previous model fit fos_intercept.
# create data set with response in long format
emotion_long <- emotionHGL
# save the curve information in a long vector
emotion_long$EMG_long <- as.vector(emotion_long$EMG)
# compute t in long format
emotion_long$time_long <- rep(emotionHGL$t, each = nrow(emotionHGL$EMG))
# compute the id-variable for the curves
emotion_long$curveid <- rep(1:nrow(emotionHGL$EMG), ncol(emotionHGL$EMG))
# fit intercept model for response in long format
fos_intercept_long <- FDboost(EMG_long ~ 1,
                              timeformula = ~ bbs(time_long, df = 3),
                              id = ~ curveid, data = emotion_long)
Effects in the formula that are combined with the timeformula
Many covariate effects can be represented by the Kronecker product of two marginal bases as in (6). The response and the bases in covariate direction are specified in formula as y ~ b_1 + ... + b_J. The base-learner for the expansion along t is specified in timeformula as ~ b_Y. Each base-learner in formula is combined with the base-learner in timeformula using the operator %O%. This operator implements the Kronecker product of two basis vectors as in (6) with isotropic penalty matrix as in (8). Consider, for example, formula = y ~ b_1 + b_2 and timeformula = ~ b_Y. This yields y ~ b_1 %O% b_Y + b_2 %O% b_Y. If the effect should only be penalized in t direction, the operator %A0% can be used, as it sets up the penalty as in (9). If formula contains base-learners that are composed of two base-learners by %O% or %A0%, those effects are not expanded with timeformula, allowing for model specifications with different effects in t direction. This can be used, for example, to model some effects linearly and others non-linearly in t, or to construct effects using %A0%. For further details on these operators and their use, we refer to Appendix C.
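As a sketch of this expansion mechanism, the following two specifications should yield the same additive predictor. The example uses the variables EMG, power and t from the emotion data set introduced below:

```r
# specification via formula plus timeformula: bolsc(power) is
# automatically expanded by bbs(t) from the timeformula
mod1 <- FDboost(EMG ~ 1 + bolsc(power, df = 2),
                timeformula = ~ bbs(t, df = 3), data = emotion)
# equivalent specification with an explicit Kronecker product;
# base-learners combined by %O% are not expanded by timeformula again
mod2 <- FDboost(EMG ~ 1 + bolsc(power, df = 2) %O% bbs(t, df = 3),
                timeformula = ~ bbs(t, df = 3), data = emotion)
```

In both cases, the combined base-learner has 2 · 3 = 6 degrees of freedom.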
We start with base-learners for the timeformula. Theoretically, it is possible to use any base-learner that models the effect of a continuous variable. For a linear effect in t, the base-learner bols() can be used. Usually, the effects are assumed to be smooth along t. In this case, the base-learner bbs() can be used, which represents the smooth effect by P-splines (Schmid and Hothorn 2008a). Thus, bbs() uses a B-spline representation for the design matrix and a squared difference matrix as penalty matrix. Using the bbs() base-learner in the timeformula corresponds to using a marginal basis as described in equation (10).
Base-learners that can be used in formula are listed in Table 4. The left column of this table lists a selection of additive predictors that can be represented within the array framework; the right column gives the corresponding formula. The timeformula is set to ~ bbs(t) to model all effects as smooth in t.
additive predictor | call in formula
β0(t) | y ~ 1
β0(t) + z_{i1} β1(t) | y ~ 1 + bolsc(z1)
β0(t) + f1(z_{i1}, t) | y ~ 1 + bbsc(z1)
β0(t) + z_{i1} β1(t) + z_{i2} β2(t) + z_{i1} z_{i2} β3(t) | y ~ 1 + bolsc(z1) + bolsc(z2) + bols(z1) %Xc% bols(z2)
β0(t) + z_{i1} β1(t) + f2(z_{i2}, t) + z_{i1} f3(z_{i2}, t) | y ~ 1 + bolsc(z1) + bbsc(z2) + bols(z1) %Xc% bbs(z2)
β0(t) + f1(z_{i1}, t) + f2(z_{i2}, t) + f3(z_{i1}, z_{i2}, t) | y ~ 1 + bbsc(z1) + bbsc(z2) + bbs(z1) %Xc% bbs(z2)
β0(t) + ∫ x_i(s) β(s, t) ds | y ~ 1 + bsignal(x, s = s)
β0(t) + ∫ x_i(s) β(s, t) ds (FPC representation) | y ~ 1 + bfpc(x, s = s)
β0(t) + z_i β1(t) + ∫ x_i(s) β2(s, t) ds + z_i ∫ x_i(s) β3(s, t) ds | y ~ 1 + bolsc(z) + bsignal(x, s = s) + bsignal(x, s = s) %X% bolsc(z)
For offset = NULL, the model contains a smooth offset. The smooth offset is computed prior to the model fit as a smoothed population minimizer of the loss. For mean regression, the smooth offset is the smoothed mean over t. The specification offset = "scalar" yields a constant offset instead. The resulting intercept in the final model is the sum of the offset and the smooth intercept β0(t) specified in the formula as 1.
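For instance, the intercept model fos_intercept from above could be refitted with a constant instead of the default smooth offset (a sketch using the emotionHGL data):

```r
# hedged sketch: constant offset instead of the default smooth offset;
# the smooth intercept from the formula then captures the mean curve
fos_scalar_offset <- FDboost(EMG ~ 1, timeformula = ~ bbs(t, df = 3),
                             offset = "scalar", data = emotionHGL)
```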
The upper part of Table 4 gives examples of linear predictors with scalar covariates. A linear effect of a scalar covariate is specified using the base-learner bolsc(). This base-learner works for continuous as well as factor variables. A smooth effect of a continuous covariate is obtained by using the base-learner bbsc(). The base-learners bolsc() and bbsc() are similar to the base-learners bols() and bbs() from the mboost package, but enforce point-wise sum-to-zero constraints to ensure identifiability for models with functional response (the suffix 'c' refers to 'constrained'). Since, for example, an effect varying over t contains a smooth intercept as a special case, the model would not be identifiable without constraints; see Appendix A for more details. We use the constraint Σ_i h_j(x_i, t) = 0 for all t, which centers each effect at each point t (Scheipl et al. 2015). This implies that effects varying over t can be interpreted as deviations from the smooth intercept, and that the intercept can be interpreted as the global mean if all effects are centered in this way. It is possible to check whether all covariate effects sum to zero for all time points by setting check0 = TRUE in the FDboost() call. To specify interaction effects of two scalar covariates, the base-learners for each of the covariates are combined using the operator %Xc%, which applies the sum-to-zero constraint to the interaction effect.
Case study (ctd.): Emotion components data
For the emotion components data with the EMG signal as functional response Y_i(t), i = 1, ..., N, we fit models with scalar and functional covariate effects in the following.
Functiononscalar regression
We specify a model for the conditional expectation of the EMG signal using a random intercept curve for each subject and a linear effect of the study setting power:

E(Y_i(t)) = β0(t) + β_subject(i)(t) + β_power(i)(t),    (13)

with subject taking values 1 to 23 for the participants of the study, and power indicating the low- and high-power setting. Both covariate effects in the model are specified using a centered base-learner: the effect of the factor variable subject and the effect of power are both specified with the bolsc() base-learner. Therefore, the effects sum up to zero for each time point over all observations, i.e., Σ_i β_subject(i)(t) = 0 and Σ_i β_power(i)(t) = 0 for all t.
# fit function-on-scalar model with centered linear effects
fos_random_power <- FDboost(EMG ~ 1 + bolsc(subject, df = 2)
                            + bolsc(power, df = 1) %A0% bbs(t, df = 6),
                            timeformula = ~ bbs(t, df = 3), data = emotion)
As described in Section 3, it is important that all base-learners have the same number of degrees of freedom. In this model, the degrees of freedom for each base-learner are 6. By specifying the bolsc base-learner with df = 2 for subject, the subject effect is estimated with a ridge penalty similar to a random effect, whereas the power effect is estimated unpenalized due to the use of the %A0% operator.
Analogously, a model with the response in long format could be specified as for fos_intercept_long, replacing its formula with that of fos_random_power.
Functiononfunction regression
For the data subset for one specific game condition, we use the EEG signal to model the EMG signal:

E(Y_i(t)) = β0(t) + ∫ x_i(s) β(s, t) ds.    (14)

In this model, each time point s of the covariate x_i(s) potentially influences each time point t of the response Y_i(t). We center the EEG signal per time point such that Σ_i x_i(s) = 0 for each s, to center its effect per time point.
# center the functional covariate per time point to center its effect
emotionHGL$EEG <- scale(emotionHGL$EEG, scale = FALSE)
# fit function-on-function model with intercept and functional effect of EEG
fof_signal <- FDboost(EMG ~ 1 + bsignal(EEG, s = s, df = 2),
                      timeformula = ~ bbs(t, df = 3), data = emotionHGL)
We will show and interpret plots of the estimated coefficients later on. Assuming that the brain activity (measured via the EEG) triggers the muscle activity (measured via the EMG), it is reasonable to assume that EMG signals are only influenced by past EEG signals. Such a relationship can be represented using a historical effect, in which the integration limits of the functional effect depend on t; this will be discussed in the following paragraph.
Effects in the formula comprising both the effect in covariate and in t direction
If the covariate effect varies with t, the effect cannot be separated into a marginal basis depending on the covariate and a marginal basis depending only on t. In this case, the effects are represented as in equation (5). Examples of such effects are historical and concurrent functional effects, as discussed in Brockhaus et al. (2017). In Table 5, we give an overview of possible additive predictors containing such effects.
additive predictor | call in formula
β0(t) + x_i(t) β(t) | y ~ 1 + bconcurrent(x, s = s, time = t)
β0(t) + ∫_{s ≤ t} x_i(s) β(s, t) ds | y ~ 1 + bhist(x, s = s, time = t)
β0(t) + ∫_{t − δ ≤ s ≤ t} x_i(s) β(s, t) ds | y ~ 1 + bhist(x, s = s, time = t, limits = limitsLag)
β0(t) + ∫_{s ≤ t − δ} x_i(s) β(s, t) ds | y ~ 1 + bhist(x, s = s, time = t, limits = limitsLead)
β0(t) + ∫ over general limits of x_i(s) β(s, t) ds | y ~ 1 + bhist(x, s = s, time = t, limits = mylimits)
β0(t) + z_i β1(t) + ∫_{s ≤ t} x_i(s) β2(s, t) ds + z_i ∫_{s ≤ t} x_i(s) β3(s, t) ds | y ~ 1 + bolsc(z) + bhist(x, s = s, time = t) + bhistx(x) %X% bolsc(z)

These general limit functions are not defined in FDboost. We give examples for such functions in this paragraph.

In bhistx(), the variable x has to be of class hmatrix; please see the manual of bhistx() for details.
The concurrent effect is only meaningful if the functional response and the functional covariate are observed over the same domain. Models with concurrent effects can be seen as varying-coefficient models (Hastie and Tibshirani 1993), where the effect varies over t. The base-learner bconcurrent() expands the smooth concurrent effect in P-splines. The historical effect uses only covariate information up to the current observation point of the response. The base-learner bhist() expands the coefficient surface in s and in t direction using P-splines to fit the historical effect. In Appendix B, details on the representation of functional effects are given.
The interface of bhist() is:
bhist(x, s, time, limits = "s<=t", knots = 10, degree = 3, differences = 1, df = 4, lambda = NULL, check.ident = FALSE)
Most arguments of bhist() are analogous to those of bsignal(). bhist() has the additional argument time to specify the observation points of the response. Via the argument limits, the user can specify integration limits depending on t. Per default, a historical effect with limits s ≤ t is used. Other integration limits can be specified by a function with arguments s and t that returns TRUE for combinations of s and t that lie within the integration interval and FALSE otherwise. In the following, we give examples of functions that can be used as limits:
# historical effect; corresponds to default limits = "s<=t"
limitsHist <- function(s, t) {
  s <= t
}
# lag effect with lag delta = 5
limitsLag <- function(s, t, delta = 5) {
  s >= t - delta & s <= t
}
# lead effect with lead delta = 5
limitsLead <- function(s, t, delta = 5) {
  s <= t - delta
}
The base-learner bhistx() is especially suited to form interaction effects such as factor-specific historical effects (Rügamer et al. 2016), as bhist() cannot be used in combination with the row-wise tensor product operator %X% to form interaction effects. bhistx() requires the data to be supplied as an object of type hmatrix; see the manual of bhistx() for its setup.
Case study (ctd.): Emotion components data
Again, we use the subset of the data for one specific game condition. We start with a simple function-on-function regression model by specifying a concurrent effect of the EEG signal on the EMG signal:

E(Y_i(t)) = β0(t) + x_i(t) β(t).

A concurrent effect is obtained by the base-learner bconcurrent(), which is not expanded by the base-learner in timeformula. In this model, timeformula is only used to expand the smooth intercept.
# fit function-on-function model with intercept and concurrent EEG effect
fof_concurrent <- FDboost(EMG ~ 1 + bconcurrent(EEG, s = s, time = t, df = 6),
                          timeformula = ~ bbs(t, df = 3), data = emotionHGL,
                          control = boost_control(mstop = 200))
Assuming that the activity in the muscle can be completely traced back to previous activity in the brain, a more appropriate model seems to be a historical model including a historical effect:

E(Y_i(t)) = β0(t) + ∫_{T1(t)}^{T2(t)} x_i(s) β(s, t) ds.    (15)

From a neuroanatomical perspective, the signal from the brain requires time to reach the muscle. We therefore restrict the integration interval to time points s ≤ t − 3, which is in line with Rügamer et al. (2016).
# fit function-on-function model with intercept and historical EEG effect,
# where limits specifies the lag between the EMG and EEG signals
fof_historical <- FDboost(EMG ~ 1 + bhist(EEG, s = s, time = t,
                                          limits = function(s, t) s <= t - 3,
                                          df = 6),
                          timeformula = ~ bbs(t, df = 3), data = emotionHGL,
                          control = boost_control(mstop = 200))
More complex historical models are discussed in Rügamer et al. (2016), in particular a model containing random effects for the participants, effects for the game conditions, and game-condition- as well as subject-specific historical effects of the EEG signal.
It is also possible to combine effects listed in Table 4 and Table 5 to form more complex models. In particular, base-learners with and without array structure can be combined within one model. As each base-learner is evaluated separately in the component-wise boosting procedure, the array structure of the Kronecker product base-learners can still be exploited in such hybrid models.
4.2.2 Model tuning and early stopping
For a fair selection of base-learners, additional care is needed for functional response, as only some of the base-learners in the formula are expanded by the base-learner in timeformula. In particular, all base-learners listed in Table 4 are expanded by the timeformula, whereas the base-learners given in Table 5 are not. For the row-wise tensor product and the Kronecker product of two base-learners, the degrees of freedom of the combined base-learner are computed as the product of the two marginally specified degrees of freedom. For instance, formula = y ~ bbsc(z, df = 3) + bhist(x, s = s, df = 12) and timeformula = ~ bbs(t, df = 4) implies 3 · 4 = 12 degrees of freedom for the first combined base-learner and 12 degrees of freedom for the second base-learner. The call extract(object, "df") displays the degrees of freedom for each base-learner in an FDboost object. For other tuning options, such as the number of iterations and the specification of the step-length, see Section 4.1.
To find the optimal number of boosting iterations for a model fit with functional response, FDboost provides two resampling functions. Depending on the specified model, some parameters are computed from the data prior to the model fit: per default, a smooth functional offset is computed (offset = NULL in FDboost()), and for linear and smooth effects of scalar variables, defined by bolsc() and bbsc(), transformation matrices for the sum-to-zero constraints are computed. The function cvrisk.FDboost() uses the smooth functional offset and the transformation matrices from the original model fit in all folds. Thus, these parameters are treated as fixed and the uncertainty induced by their estimation is not considered in the resampling. On the other hand, applyFolds() recomputes the whole model in each fold. The two resampling methods are equal if no smooth offset is used and if the model does not contain any base-learner with a sum-to-zero constraint (i.e., neither bolsc() nor bbsc()). In general, we recommend using the function applyFolds() to determine the optimal number of boosting iterations for a model with functional response. The interface of applyFolds() is:
applyFolds(object, folds = cv(rep(1, length(unique(object$id))), type = "bootstrap"), grid = 1:mstop(object))
The interface is analogous to that of cvrisk(). In the argument object, the fitted model object is specified. grid defines the grid on which the optimal stopping iteration is searched. Via the argument folds, the resampling folds are defined by suitable weights. The function applyFolds() expects resampling weights that are defined on the level of curves, i = 1, ..., N. That means that the folds must contain one weight per response curve, which can easily be set up using the function cv().
4.2.3 Methods to extract and display results
Methods to extract and visualize results are the same irrespective of scalar or functional response. Thus, we refer to the corresponding paragraphs at the end of Section 4.1.
Case study (ctd.): Emotion components data
As for scalar response, the plot() function can be used to access the estimated effects in a function-on-function regression. In the following, we compare the three basic types of functional covariate effects that can be used in conjunction with a functional response. We first determine the optimal number of stopping iterations for all three presented models.
# create folds for 5-fold cross-validation: one weight per curve
set.seed(123)
folds_bs <- cv(weights = rep(1, fof_signal$ydim[1]), type = "kfold", B = 5)
# compute out-of-bag risk on the 5 folds for 1 to 200 boosting iterations
cvm_concurrent <- applyFolds(fof_concurrent, folds = folds_bs, grid = 1:200)
ms_conc <- mstop(cvm_concurrent)  # ms_conc = 50
fof_concurrent <- fof_concurrent[ms_conc]
cvm_signal <- applyFolds(fof_signal, folds = folds_bs, grid = 1:200)
ms_signal <- mstop(cvm_signal)  # ms_signal = 15
fof_signal <- fof_signal[ms_signal]
cvm_historical <- applyFolds(fof_historical, folds = folds_bs, grid = 1:200)
ms_hist <- mstop(cvm_historical)  # ms_hist = 14
fof_historical <- fof_historical[ms_hist]
Then, we plot the estimated effects into one figure:
# plot the effect of the functional covariate EEG in the three different models
par(mfrow = c(1, 3))
plot(fof_concurrent, which = 2, main = "Concurrent EEG effect")
plot(fof_signal, which = 2, main = "Signal EEG effect",
     n1 = 80, n2 = 80, zlim = c(-0.02, 0.025), col = terrain.colors(20))
plot(fof_historical, which = 2, main = "Historical EEG effect",
     n1 = 80, n2 = 80, zlim = c(-0.02, 0.025), col = terrain.colors(20))
The concurrent effect corresponds to the diagonal of the other two surfaces in Figure 6 and assumes that off-diagonal time points have no association. Due to the temporal lag between EEG and EMG discussed for model (15), there is no meaningful interpretation for this model and the effect is only shown for demonstrative purposes. The historical effect corresponds to the assumption that the upper triangle of the signal EEG effect should be zero, as future brain activity should not influence present muscle activity. The results in Figure 6 (right panel) can be interpreted in the same manner as results of a scalar-on-function regression when keeping a certain time point fixed. Fixing a time point of the EMG signal, for example, some time points of the EEG signal show no effect, while for others the estimated effect on the expected EMG signal is positive. For a detailed description of the interpretation of historical effect surfaces as shown in Figure 6, we refer to the online appendix of Rügamer et al. (2016).
Careful interpretation has to take into account that this data set has a rather small signal-to-noise ratio due to the oscillating nature of both signals. In such cases, it is recommended to check the uncertainty of the estimated effects via bootstrap, e.g., by using the bootstrapCI() function, as shown exemplarily in Figure 7.
# calculate bootstrap interval based on 100 bootstrap folds (default)
# and use 10-fold cross-validation to determine mstop on the inner level
fof_historical_bci <- bootstrapCI(fof_historical, mc.cores = 2,
                                  B_inner = 10, type_inner = "kfold")
# plot results
par(mfrow = c(1, 3))
plot(fof_historical_bci, which = 2, ask = FALSE, pers = FALSE,
     col = terrain.colors(20), probs = c(0.05, 0.5, 0.95))
4.3 Functional regression models beyond the mean
Using boosting for model estimation, it is possible to optimize loss functions other than the squared error loss. This allows one to fit, e.g., generalized linear models (GLMs) and quantile regression models (Koenker 2005; Fenske et al. 2011). It is also possible to fit models for several parameters of the conditional response distribution in the framework of generalized additive models for location, scale and shape (GAMLSS, Rigby and Stasinopoulos 2005; Mayr et al. 2012).
For the estimation of these more general models, a suitable loss function in accordance with the modeled characteristic of the response distribution is defined and optimized. The absolute error loss (L1 loss), for instance, implies median regression, and minimizing the L2 loss yields mean regression.
In FDboost(), the regression type is specified by the family argument. The family argument expects an object of class Family, which implements the respective loss function with its corresponding negative gradient and link function. The default is family = Gaussian(), which yields L2 boosting (Bühlmann and Yu 2003). This means that the mean squared error loss is minimized, which is equivalent to maximizing the log-likelihood of the normal distribution. Table 6 lists some loss functions currently implemented in mboost, which can be used directly in FDboost (see Hofner et al. 2014, for more families). Hofner et al. (2014) also give an example of how to implement new families via the function Family(). See also the help page ?Family for more details on all families.
| response type | regression type | loss | call |
|---|---|---|---|
| continuous response | mean regression | L2 loss | Gaussian() |
| | median regression | L1 loss | Laplace() |
| | quantile regression | check function | QuantReg() |
| | expectile regression | asymmetric L2 loss | ExpectReg() |
| | robust regression | Huber loss | Huber() |
| non-negative response | gamma regression | | GammaReg() |
| binary response | logistic regression | | Binomial() |
| | AdaBoost classification | exponential loss | AdaExp() |
| count response | Poisson model | | Poisson() |
| | neg. binomial model | | NBinomial() |
| scalar ordinal response | proportional odds model | | PropOdds() |
| scalar categorical response | multinomial model | | Multinomial() |
| scalar survival time | Cox model | | CoxPH() |
See Bühlmann and Hothorn (2007) for details on boosting binary classification and regression models.
See Fenske et al. (2011) for details on boosting quantile regression.
See Sobotka and Kneib (2012) for details on boosting expectile regression.
See Schmid et al. (2010) for details on boosting with multidimensional linear predictors, including negative binomial models.
See Schmid et al. (2011) for details on boosting proportional odds models.
See Schmid and Hothorn (2008b) for details on boosting survival models.
For a continuous response, several model types are available (Bühlmann and Hothorn 2007): L2 boosting yields mean regression; a more robust alternative is median regression, which optimizes the absolute error loss; the Huber loss is a combination of L1 and L2 loss (Huber 1964); quantile regression (Koenker 2005) can be used to model a certain quantile of the conditional response distribution; and expectile regression (Newey and Powell 1987) models an expectile. For a non-negative continuous response, models assuming the gamma distribution can be useful. A binary response can be modeled in a GLM framework with a logit model or by minimizing the exponential loss, which corresponds to the first boosting algorithm, 'AdaBoost' (Friedman 2001). Count data can be modeled assuming a Poisson or negative binomial distribution (Schmid et al. 2010).
For functional response, we compute the loss pointwise and integrate over the domain of the response.
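This pointwise-then-integrated loss can be sketched in base R as follows; the trapezoidal integration weights and the squared error loss are illustrative assumptions (FDboost computes its numerical integration weights internally):

```r
# integrated squared error loss for curves observed on a common grid:
# y and yhat are n x G matrices of observed/predicted curves,
# t_grid holds the G grid points of the response domain
integrated_loss <- function(y, yhat, t_grid) {
  G <- length(t_grid)
  w <- numeric(G)  # trapezoidal integration weights
  w[1] <- (t_grid[2] - t_grid[1]) / 2
  w[G] <- (t_grid[G] - t_grid[G - 1]) / 2
  if (G > 2) w[2:(G - 1)] <- (t_grid[3:G] - t_grid[1:(G - 2)]) / 2
  as.numeric((y - yhat)^2 %*% w)  # one integrated loss value per curve
}
```

For curves differing from their prediction by a constant 2 on the unit interval, each integrated loss equals 2^2 = 4.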
The following models can only be applied to scalar and not to functional response. For ordinal response, a proportional odds model can be used (Schmid et al. 2011). For categorical response, the multinomial logit model is available. For survival times, boosting of Cox proportional hazards models and accelerated failure time models has been introduced by Schmid and Hothorn (2008b).
Case study (ctd.): Emotion components data
So far, we have fitted a model for the conditional mean of the response. As a more robust alternative, we consider median regression by setting family = QuantReg(tau = 0.5). We use the update() function to update the functional model with the new family.
fof_signal_med <- update(fof_signal, family = QuantReg(tau = 0.5))
For median regression, the smooth intercept is the estimated median at each time point and the effects are deviations from the median.
The combination of GAMLSS with functional variables is discussed in Brockhaus et al. (2016) and Stöcker et al. (2017).
For GAMLSS models, FDboost builds on the package gamboostLSS (Hofner et al. 2017), in which families are implemented to fit GAMLSS. For details on the boosting algorithm to fit GAMLSS, see Mayr et al. (2012) and Thomas et al. (2017).
The families in gamboostLSS need to model at least two distribution parameters.
For an overview of currently implemented response distributions for GAMLSS, we refer to Hofner et al. (2016).
In FDboost, the function FDboostLSS() implements GAMLSS with functional data.
The interface of FDboostLSS() is:
FDboostLSS(formula, timeformula, data = list(), families = GaussianLSS(), ...)
In formula, a named list of formulas is supplied. Each list entry specifies the potential covariate effects for one of the distribution parameters, and the names of the list are the names of the distribution parameters. The argument families specifies the assumed response distribution with its modeled distribution parameters. The default families = GaussianLSS() yields a Gaussian location scale model. In the dots argument, further arguments passed to FDboost() can be supplied. The model object fitted by FDboostLSS() is a list of FDboost model objects. It is not possible to automatically fit a smooth offset within FDboostLSS(); by default, a scalar offset value is used for each distribution parameter. For functional response, it can thus be useful to center the response prior to the model fit. All integration weights for the loss function are set to one, corresponding to the negative log-likelihood at the observation points.
For model objects fitted by FDboostLSS(), methods to estimate the optimal stopping iterations, as well as methods for plotting and prediction exist. For more details on boosting GAMLSS models, we refer to Hofner et al. (2016), which is a tutorial for the package gamboostLSS.
Case study (ctd.): Fossil fuel data
We fit a Gaussian location scale model for the heat value. Such a model is obtained by setting families = GaussianLSS(), where the expectation is modeled using the identity link and the standard deviation using a log-link. Mean and standard deviation of the heat value are modeled by different covariates: the mean depends on the water content as well as on the NIR and the UVVIS spectrum, whereas the linear predictor of the standard deviation is based on the water content only. The formula has to be specified as a list of two formulas, named mu and sigma, for the mean and the standard deviation of the normal distribution. We use the non-cyclic fitting method introduced by Thomas et al. (2017).
# center the variable water content
# such that the linear effect in the variance is centered around zero
fuelSubset$h2o_center <- fuelSubset$h2o - mean(fuelSubset$h2o)
library(gamboostLSS)
# fit Gaussian location scale model
sof_ls <- FDboostLSS(list(mu = heatan ~ bbs(h2o, df = 4) +
                            bsignal(UVVIS, uvvis.lambda, knots = 40, df = 4) +
                            bsignal(NIR, nir.lambda, knots = 40, df = 4),
                          sigma = heatan ~ 1 + bols(h2o_center, df = 2)),
                     timeformula = NULL, data = fuelSubset,
                     families = GaussianLSS(), method = "noncyclic")
# model object is a list containing the models for mu and sigma
names(sof_ls)
[1] "mu" "sigma"
The optimal number of boosting iterations is searched on a grid of 1 to 2000 boosting iterations. In each boosting iteration, the algorithm updates the base-learner that best fits the negative gradient. Thus, in each iteration the additive predictor of only one of the distribution parameters is updated.
# find optimal stopping iterations by 5-fold bootstrap
# takes some time, easy to parallelize on Linux
set.seed(123)
cvm_sof_ls <- cvrisk(sof_ls, folds = cv(model.weights(sof_ls[[1]]), B = 5),
                     grid = 1:2000, trace = FALSE)
The estimated coefficients for the expectation are similar to the effects resulting from the pure mean model. The water content has a negative effect on the standard deviation, with higher water content being associated with lower variability.
4.4 Variable selection by stability selection
Variable selection can be refined using stability selection (Meinshausen and Bühlmann 2010; Shah and Samworth 2013). Stability selection is a procedure to select influential variables while controlling false discovery rates and maximal model complexity. For componentwise gradient boosting, it is implemented in mboost in the function stabsel() (Hofner et al. 2015), which can also be used for model objects fitted by FDboost(). Brockhaus et al. (2017) compute functiononfunction regression models with more functional covariates than observations and perform variable selection by stability selection. Thomas et al. (2017) discuss stability selection for GAMLSS estimated by boosting.
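A hedged sketch of how stability selection could be invoked for a fitted model (the model object m and the values of q and PFER are illustrative assumptions, not taken from the case studies):

```r
library(mboost)
# m is a previously fitted FDboost model (hypothetical here);
# q: number of base-learners selected per subsample,
# PFER: upper bound for the per-family error rate
sel <- stabsel(m, q = 5, PFER = 1)
plot(sel)  # selection frequencies per base-learner
```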
5 Discussion
The R add-on package FDboost provides a comprehensive implementation for fitting functional regression models by gradient boosting. The implementation allows one to fit regression models with scalar or functional response depending on many covariate effects. The framework includes mean regression (with or without link function), median and quantile regression models as well as GAMLSS. Various covariate effects are implemented, including linear and smooth effects of scalar covariates, linear effects of functional covariates and interaction effects, also between scalar and functional covariates (Rügamer et al. 2016). The linear functional effects can have flexible integration limits, for example, to form historical or lag effects (Brockhaus et al. 2017). Whenever possible, the effects are represented in the structure of linear array models (Currie et al. 2006) to increase computational efficiency (Brockhaus et al. 2015). Componentwise gradient boosting makes it possible to fit models in high-dimensional data situations and performs data-driven variable selection. FDboost builds on the well-tested and modular implementation of mboost (Hothorn et al. 2016). This facilitates the implementation of further base-learners to fit new covariate effects, as well as of families modeling other characteristics of the conditional response distribution.
Appendix A Constraints for effects of scalar covariates
Consider a model for functional response with smooth intercept β₀(t) and an effect f(z, t) that contains a smooth intercept as a special case, E(Y(t) | z) = β₀(t) + f(z, t), and define the mean effect at each point t as f̄(t) = E_z[f(z, t)]. This model can be parametrized in different ways, e.g., as

E(Y(t) | z) = [β₀(t) + f̄(t)] + [f(z, t) − f̄(t)].
The problem arises as f̄(t) (or any other smooth function in t) can be shifted between the intercept and the covariate effect. At the level of the design matrices of these effects, this can be explained by the fact that the columns of the design matrix of the covariate effect and the columns of the design matrix of the functional intercept are linearly dependent. To obtain identifiable effects, Scheipl et al. (2015) propose to center such effects at each point t. The centering is achieved by setting the pointwise expectation of the covariate effect to zero, i.e., E_z[f(z, t)] = 0 for all t, approximated by the sum-to-zero constraint Σᵢ f(zᵢ, t) = 0 for all t. How to enforce such constraints is described in Appendix A of Brockhaus et al. (2015). Other constraints to obtain identifiable models are possible. However, this sum-to-zero constraint for each point t yields an intuitive interpretation: the intercept can be interpreted as the global mean and the covariate effects can be interpreted as deviations from the smooth intercept.
The constraint is enforced by a basis transformation of the design and penalty matrix. As shown in Brockhaus et al. (2015), it is sufficient to apply the constraint to the covariate part of the design and the penalty matrix. Thus, it is not necessary to transform the basis in t direction.
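A base R sketch of such a basis transformation, using a QR-based construction of the constraint null space; this is one standard way to enforce a linear constraint and is given here under the assumption that only the covariate-part design matrix X and penalty matrix K need transforming:

```r
# transform the design matrix X (and penalty K) so that the columns of
# the transformed design sum to zero, i.e., the fitted effect obeys the
# sum-to-zero constraint over the observed covariate values
sum_to_zero_transform <- function(X, K = diag(ncol(X))) {
  C <- matrix(colSums(X), ncol = 1)  # constraint t(C) %*% theta = 0
  # columns 2..K of the complete Q factor span the null space of t(C)
  Z <- qr.Q(qr(C), complete = TRUE)[, -1, drop = FALSE]
  list(X = X %*% Z,           # constrained design matrix (one column fewer)
       K = t(Z) %*% K %*% Z)  # correspondingly transformed penalty matrix
}
```

After the transformation, colSums() of the returned design matrix is numerically zero, so the fitted effect averages to zero over the observations.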