RKHSMetaMod : An R package to estimate the Hoeffding decomposition of an unknown function by solving RKHS Ridge Group Sparse optimization problem^{†}^{†}thanks: Halaleh Kamari, Université Evry Val d’Essonne, INRA, Université ParisSaclay, France, @, Sylvie Huet, INRA, France, @, MarieLuce Taupin, Université Evry Val d’Essonne, France, @.
Abstract
In the context of the Gaussian regression model, RKHSMetaMod estimates a meta model by solving the Ridge Group Sparse Optimization Problem based on the Reproducing Kernel Hilbert Spaces (RKHS). The estimated meta model is an additive model that satisfies the properties of the Hoeffding decomposition, and its terms estimate the terms in the Hoeffding decomposition of the unkwown regression function. This package provides an interface from R statistical computing environment to the C++ libraries Eigen and GSL. It uses the efficient C++ functions through RcppEigen and RcppGSL packages to speed up the execution time and the R environment in order to propose an user friendly package.
Keywords: Meta model, Hoeffding decomposition, Ridge Group Sparse penalty, Reproducing Kernel Hilbert Spaces.
Introduction
Consider the Gaussian regression model with and . The variables are mutually independent and uniformly distributed on and they are independent of ’s. The function is unknown and is square integrable, i.e. , where and for .
Since the inputs are independent, the function can be written according to its Hoeffding decomposition (Sobolá (?)). More precisely, if is the set of parts of with dimension to , then the Hoeffding decomposition of is written as :
(1) 
where for all , denotes the vector with components , , is a constant, and is a function of . All terms of this decomposition are orthogonal with respect to .
Thanks to the independency between the variables , , the variance of can be decomposed as follows :
In this context, for any group of variables , , sensitivity indices (introduced by Sobolá (?)) are defined by :
Since the function is unknown, the functions are also unknown. The idea is to approximate the function by its orthogonal projection denoted on the RKHS , constructed as in Durrande et al. (?). Thanks to the properties of these spaces, any function , satisfies :
(2) 
where is a constant, denotes the scalar product in , is the reproducing kernel associated with the RKHS , and , for being the reproducing kernel associated with the RKHS . Moreover, for all , are centered and for all , the functions and are uncorrelated. So Equation (2) is the Hoeffding decomposition of .
The function is the solution of the minimization of over functions . Since , so and each function approximates the function in (1).
The number of functions to be estimated is equal to the cardinality of that may be huge. The idea is to estimate by , called RKHS meta model, which is the solution of the residual sum of squares minimization, penalized by a Ridge Group Sparse penalty function. This method estimates the groups that are suitable for predicting and the relationship between and for each of these groups.
RKHSMetaMod is an R package, that implements the RKHS Ridge Group Sparse optimization algorithm in order to estimate the terms in the Hoeffding decomposition of and we therefore get an approximation of the true function . More precisely, RKHSMetaMod provides functions such as RKHSgrplasso and RKHSMetMod to fit a solution of two following convex optimization problems :

RKHS Group Lasso,

RKHS Ridge Group Sparse,
where RKHS Group Lasso is a special case of the RKHS Ridge Group Sparse algorithm. These algorithms are described in the next section. In section , we give an overview of the RKHSMetaMod functions and in section , we illustrate the performance of these functions through different examples.
Description of the method
RKHS Ridge Group Sparse Optimization Problem
Let denote by , the number of observations. The dataset consists of a vector of observations , and a matrix of features with components . For some tuning parameters and , we consider the RKHS Ridge Group Sparse criteria defined by,
(3) 
where is the Euclidean norm in , and the matrix represents the predictors corresponding to the th group. The penalty function in the criteria above is a combination of Hilbert norm, ridge regularization, and empirical norm, group sparse regularization, which allows both select few terms in the additive decomposition of over sets , and to favour smoothness of the estimated . The minimization of (3) is carried out over a proper subset of .
According to the Representer Theorem (Kimeldorf and Wahba (?)), for all , we have for some matrix . Therefore, the minimization of (3) over a set of functions of comes down to the minimization of (4) over , and for :
(4) 
where is the Gram marix associated with the kernel . In the minimization of the problem above there is a huge number of coefficients to be estimated, , and it explains the importance of imposing the two penalty parts in the criteria (4) in order to get a sparse solution which approximates the best the unknown model. The method is fully described in Huet and Taupin (?).
RKHS Group Lasso Optimization Problem
The minimization of (4) could be seen as a Group Lasso optimization problem by considering only the second penalty function, i.e. . The RKHS Group Lasso criteria is then defined as :
(5) 
Note that the ordinary Group Lasso algorithm has been detailed by Meier et al. (?). From now on we denote by , the penalty parameter in the RKHS Group Lasso algorithm.
Choice of the tuning parameters
We propose to use a sequence of tuning parameters to create a series of estimators. These estimators are evaluated using a testing dataset . For each value of in the sequence, let be the estimation of , obtained by the learning dataset. Then, the prediction error is calculated by,
where . We then choose the pair with the smallest prediction error, and the model associated with these chosen tuning parameters is the “best” estimator of the true model . This estimation is denoted by .
In order to set up the grid of values of , one can set and find , the smallest value of such that the solution to the minimization of the RKHS Group Lasso problem is . Then could be a grid of values for .
Sensitivity indices (SI)
Once we obtain the estimator , we calculate it’s SI by,
Since , we have . We then use an estimator based on the empirical variances of functions (Huet and Taupin (?)) :
where is the mean of for . The are the approximations of the SI of the function ( and ).
Overview of the RKHSMetaMod functions
Main RKHSMetaMod functions
RKHSMetMod function
This function calculates a sequence of meta models which are the solutions of the RKHS Ridge Group Sparse or RKHS Group Lasso optimization problems.
Table 1 displays a summary of all input parameters of the RKHSMetMod function and default values for non mandatory parameters.
Input parameter  Description 

Y  Vector of response observations of size . 
X  Matrix of input observations with rows and columns. Rows correspond to observations and columns correspond to variables. 
kernel  Character, indicates the type of reproducing kernel choosed to construct the RKHS . All kernels available in this package are presented in Table 2. 
Dmax  Integer, between and , indicates the maximum order of interactions considered in the RKHS meta model: Dmax is used to consider only the main effects, Dmax to include the main effects and the interactions of order ,…. 
gamma  Vector of non negative scalars, values of the penalty parameter in decreasing order. If the function solves an RKHS Group Lasso problem and for it solves an RKHS Ridge Group Sparse problem. 
frc  Vector of positive scalars. Each element of the vector sets a value to the penalty parameter , . The value is calculated inside the program. 
verbose  Logical. Set as TRUE to print: the group for which the correction of the Gram matrix is done (see function calc_Kv), and for each pair of the penalty parameters : the number of current iteration, active groups and convergence criterias. It is set as FALSE by default. 
RKHSMetMod returns an instance of the “RKHSMetMod” class. Its three attributes will contain all outputs :

mu: value of the penalty parameter or , depending on the value of the penalty parameter .

gamma: value of the penalty parameter .

MetaModel: an RKHS Ridge Group Sparse or RKHS Group Lasso object associated with the penalty parameters mu and gamma.
Illustration of use of this function is given in Example .
RKHSMetMod_qmax function
determines , denoted , for which the number of active groups in the Group Lasso solution is equal to qmax, and it returns an RKHS meta model with at most qmax active groups for each pair of the penalty parameters . It is useful when we have some information about the data. That is, one could be interested in a meta model with the active groups not greater than some qmax. It is possible to fix the maximum number of active groups in the final estimator by setting this value to the input “qmax” in the function RKHSMetMod_qmax.
It has following arguments:

, , kernel, Dmax, gamma, verbose (see Table 1).

qmax: integer, shows the maximum number of active groups in the obtained solution.

rat: positive scalar, to restrict the minimum value of considered in the algorithm, . The value is calculated inside the program.

Num: integer, it is used to restrict the number of different values of the penalty parameter to be evaluated in the RKHS Group Lasso algorithm until it achieves : for Num the program is done for different values of , , or depending on the number of active groups in the meta model associated with , .
The RKHSMetMod_qmax function returns an instance of the “RKHSMetMod_qmax” class. Its three attributes contain the followings as outputs :

mus: vector of values of the evaluated penalty parameters in the RKHS Group Lasso algorithm until it achieves .

qs: vector of number of active groups associated with each element in mus.

MetaModel: an instance of the "RKHSMetMod" class for the obtained and a grid of values of .
Illustration of use of this function is given in Example .
Kernel type  Mathematic formula for  RKHSMetaMod name 

Linear  linear  
Quadratic  quad  
Brownian  brownian  
Matern  matern  
Gaussian  gaussian 
Companion functions
calc_Kv function
calculates the Gram matrices , for a chosen reproducing kernel (see Table 2), and returns their associated eigenvalues and eigenvectors, for vMax, with vMax. The output of this function is used as the input of the RKHS Group Lasso and RKHS Ridge Group Sparse algorithms. These algorithms rely on the definite positiveness of the Gram matrices , so it is mandatory to have that are definite positive. The options, “correction” and “tol”, are provided by this function in order to insure the positive definiteness of the matrices :
Let be the eigenvalues associated with the matrix . Set and . For each matrix "if tol", then the correction to is done : "The eigenvalues of are replaced by epsilon, with espilontol". This function has

four mandatory arguments :

, , kernel, Dmax (see Table 1).


three facultative arguments :

correction: logical, set as TRUE to make correction on matrices . It is set as TRUE by default.

verbose: logical set as TRUE to print the group for which the correction is done. It is set as TRUE by default.

tol: scalar to be chosen small, set as by default.

It returns a list of two components “kv” and “names.Grp” :

kv : list of vMax components, each component is a list of,

Evalues : vector of eigenvalues.

Q : matrix of eigenvectors.


names.Grp : vector of group names of size vMax.
Note that when working with variables that are not distributed uniformly on it sufficies to modify the construction of kernels in the function calc_Kv.
Illustration of use of this function is given in Example .
RKHSgrplasso function
fits the solution of an RKHS group lasso problem for a chosen . It has

three mandatory arguments :

(see Table 1).

Kv: list of eigenvalues and eigenvectors of the positive definite Gram matrices for vMax and their associated group names.

mu: positive scalar indicates the value of the penalty parameter .


two facultative arguments :

maxIter: integer, to set the maximum number of loops through all groups.

verbose: logical, set as TRUE to print: the number of current iteration, active groups and convergence criterias. It is set as FALSE by default.

This function returns an RKHS Group Lasso object associated with the penalty parameter . Illustration of use of this function is given in Example .
mu_max function
calculates the value of the penalty parameter when the first penalized parameter group enters the model. It has two mandatory arguments: the response vector , and the list matZ of eigenvalues and eigenvectors of the positive definite Gram matrices for vMax and returns the value. Illustration of use of this function is given in Example .
pen_MetMod function
fits the solution of the RKHS Ridge Group Sparse optimization problem for each pair of values of the penalty parameters . It provides two steps :

Initializes the parameters by the solutions of the RKHS Group Lasso algorithm for each value of the penalty parameter , and runs the algorithm through the active support of the RKHS Group Lasso solution until it achieves convergence.

Initializes the parameters with the obtained solutions of the First Step and runs the algorithm through all groups until it achieves convergence. This second step makes possible to verify that no groups are missing in the output of the step .
This function has

five mandatory arguments :

, gamma (see Table 1).

Kv: list of eigenvalues and eigenvectors of the positive definite Gram matrices for vMax and their associated group names.

mu: vector of positive scalars. Values of the penalty parameter in decreasing order.

resg: list of the RKHSgrplasso objects associated with each value of the penalty parameter , used as initial parameters at Step .


five facultative arguments :

gama_v and mu_v: vector of vMax positive scalars. These two inputs are optional, they are provided to associate the weights to the Ridge and Sparse Group penalties, respectively. They set to scalar , to consider no weights, i.e. all weights equal to .

maxIter: integer, to set the maximum number of loops through initial active groups at Step and maximum number of loops through all groups at Step .

verbose: logical, set as TRUE to print for each pair of penalty parameters : the number of current iteration, active groups and convergence criterias. It is set as FALSE by default.

calcStwo: logical, set as TRUE to execute Step . It is set as FALSE by default.

pen_MetMod() returns an RKHS Ridge Group Sparse object associated with each pair of penalty parameters . Illustration of use of this function is given in Example .
PredErr function
calculates the prediction errors. It has eight mandatory arguments :

, gamma, kernel, Dmax (see Table 1).

: matrix of observations of the testing dataset with rows and columns.

: vector of response observations of testing dataset of size .

mu: vector of positive scalars. Values of the Group Sparse penalty parameter in decreasing order.

res: list of estimated RKHS meta models for the learning dataset associated with the penalty parameters (output of one of the functions RKHSMetMod, RKHSMetMod_qmax or pen_MetMod).
Note that the same kernel and Dmax should be chosen as the ones used for the learning dataset. The function PredErr returns a matrix of the prediction errors, when each matrix element is the obtained prediction error associated with the corresponding RKHS meta model in “res”.
SI_emp function
calculates the empirical SI for an input or a group of inputs. It has two arguments :

res: list of the estimated meta models using RKHS Ridge Group Sparse or RKHS Group Lasso algorithms.

ErrPred: matrix or NULL. If matrix, each element of the matrix is the obtained prediction error associated with the corresponding RKHS meta model in res. Set as NULL by default
The empirical SI is then calculated for each RKHS meta model in “res”, and returns a list of vectors of SI. Note that if the argument ErrPred is the matrix of the prediction errors, the vector of empirical SI is returned for the “best” RKHS meta model in the “res”.
RKHSMetaMod through examples
Recall our model, . We set , , and we consider the gfunction of Sobol (Saltelli et al. (?)), defined over by,
The true SI of the gfunction could be expressed analytically (see Durrande et al. (?)). Here, we will present examples with different sizes of experimental design. The kernel used in all examples is the “matern” kernel. For each example, we compare the estimated empirical SI with the true SI.
Example 1
Simulate the experiment as proposed by Durrande et al. (?) :
Set , , and . With these values of coefficients , the variables and explain almost all of the variance of the function . We consider a grid of values of the penalty parameters and and we calculate the sequence of RKHS meta models using RKHSMetMod() function. We choose the “best” RKHS meta model and calculate its SI :
The minimum value of prediction error is obtained for , and the “best” RKHS meta model is then . The obtained SI are presented in the Table 3. In the first row, the reader finds the true SI, in the second row, the results obtained by Durrande et al. (?), in the third row, the empirical SI for , and the last row is the mean of the empirical SI of the “best” RKHS meta models for generated experimental designs.
sum  

SI  0.43  0.24  0.19  0.06  0.04  0.03  0.01  1 
SId ()  0.44  0.24  0.19  0.01  0.01  0.01  0.00  0.9 
SI.minErr ()  0.44  0.27  0.25  0.02  0.01  0.01  0.00  1 
mean.SI.minErr ()  0.46  0.25  0.18  0.04  0.03  0.01  0.00  0.97 
Example 2
Estimate the meta models with at most “qmax” active groups :
Take , , , . According to the true SI presented in Table 3, we can notice that the main factors and explain percent of the variance. So, one may be interested in a meta model with at most active groups. We set and we aim to find in order to obtain the meta models with at most active groups. Using the function RKHSMetMod_qmax :
The obtained value of the penalty parameter is and for each value of the penalty parameter , a RKHS meta model is calculated. The groups associated with are “v1.” “v2.” “v3.”. We can see that the groups with the most variabilities are active in the obtained meta models.
Example 3
A time saving trick to obtain the “optimal” tuning parameters when dealing with larger datasets :
Set , , , . We calculate the positive definite matrices using the function calc_Kv, and we consider two following steps :

Set and . Calculate an RKHS meta model for each value of using the function RKHSgrplasso. Gather all the obtained meta models in a list, res_g, (while this job is done with the function RKHSMetMod by setting , but here it is preferable to use the function RKHSgrplasso in order to avoid the recalculation of ’s at each Step). Thereafter, the prediction error for each estimator in the res_g is calculated by the function PredErr. We denote by the value of with the smallest error of prediction.

Choose a smaller grid of values of , , and set a grid of values of , . Use the function pen_MetMod to calculate the RKHS meta models associated with each pair of penalty parameters . Calculate the prediction error for the new sequence of meta models using the function PredErr. The best estimator is used to compute the empirical SI :
Note that the “mu” given in the output of Err_g is the RKHS Group Lasso penalty parameter , so and the grid of values of in the second step is .
Example 4
Dealing with larger dataset :
Take , , , . We calculate one single RKHS meta model associated with . The prediction error and the SI are calculated for the obtained meta model. The result of the prediction quality and the SI are displayed in Figure 1.
In Table 4 the reader finds the execution time for different functions used throughout the Examples 1 to 4. The execution time for the functions RKHSgrplasso and pen_MetMod is displayed for one single value of penalty parameters .
n  calc_Kv  mu_max()  RKHSgrplasso  pen_MetMod  sum 

100  0.72s  0.13s  11.66s  12.41s  24.92s 
500  53.71s  12.87s  311.86s  578.21s  956.65s ( 16mins) 
1000  257.27s  64.78s  1297.05s  1933.66s  3552.76s ( 1h) 
2000  1760.31s  442.72s  4552.85s  5812.37s  12568.25 ( 3h:30mins) 