Meta-Model Framework for Surrogate-Based Parameter Estimation in Dynamical Systems
Abstract
The central task in modeling complex dynamical systems is parameter estimation, an optimization task that involves numerous evaluations of a computationally expensive objective function. Surrogate-based optimization introduces a computationally efficient predictive model that approximates the value of the objective function. The standard approach involves learning a surrogate from training examples that correspond to past evaluations of the objective function. Current surrogate-based optimization methods use static, predefined substitution strategies to decide when to use the surrogate and when the true objective. We introduce a meta-model framework where the substitution strategy is dynamically adapted to the solution space of the given optimization problem. The meta model encapsulates the objective function, the surrogate model and the model of the substitution strategy, as well as components for learning them. The framework can be seamlessly coupled with an arbitrary optimization algorithm without any modification: it replaces the objective function and autonomously decides how to evaluate a given candidate solution. We test the utility of the framework on three tasks of estimating parameters of real-world models of dynamical systems. The results show that the meta model significantly improves the efficiency of optimization, reducing the total number of evaluations of the objective function by up to 77% on average.
Corresponding author: jovan.tanevski@ijs.si
Index Terms: Differential equations, meta models, numerical optimization, parameter estimation, surrogate models
1 Introduction
Estimating the values of parameters of mathematical models of dynamical systems is often formulated as an optimization task with a computationally expensive objective function [1]. Given measurements of the behavior of a dynamical system, the task is to find values of the model parameters that lead to a model simulation that closely fits the measurements. A computationally expensive simulation of the model is needed to assess the discrepancy between the simulated and measured behavior of the observed system. Therefore, optimization approaches to parameter estimation can benefit greatly from surrogate-based optimization, which uses an efficient approximation of the objective function. Such use of surrogates can substantially improve the efficiency of mathematical modeling.
Surrogate-based optimization solves optimization problems in situations where the resources for evaluating the objective function are limited. In computational domains, the limiting resource is most commonly computation time, which becomes critical when dealing with computationally expensive objective functions. The fundamental idea of surrogate-based optimization is to replace the computationally expensive objective function with a surrogate, i.e., a computationally efficient model that approximates the value of the true objective function. Surrogate-based methods employ machine learning algorithms for learning the surrogate model from training instances derived from the available evaluations of the true objective function.
Surrogate-based optimization can be deployed in two different application contexts. The first assumes a very limited number of evaluations of the true objective function. The aim of the surrogate model is to guide the selection of the most promising candidate solutions for evaluation. The Bayesian optimization approach [2] uses the surrogate model predictions and the corresponding confidences to select the next candidate solution to be evaluated with the true objective function. The computational complexity of the selection process grows with the cube of the number of previously evaluated candidate solutions.
The use of Bayesian optimization is thus prohibitive in the second application context, which assumes a large number of evaluations of the true objective function. The aim of the surrogate model, in this context, is to improve the efficiency of the optimization by replacing a large portion of the evaluations of the true objective function with evaluations of its surrogate. The parameter estimation task, addressed in this paper, fits this application context. The key component of the approaches applicable in this context is the substitution strategy that, for a given candidate solution, decides whether to evaluate it with the surrogate function or the true objective function [3]. Current approaches focus on maximizing the predictive performance of the surrogate model and use fixed, hard-coded substitution strategies [4, 5, 6].
In this paper, we design a meta-model framework for surrogate-based optimization with a substitution strategy that dynamically adapts to the space of evaluated candidate solutions. It includes two learning components: a component for learning a surrogate model of the true objective function and a component for learning a model of the substitution strategy. Additionally, the meta-model framework encapsulates the objective function, the surrogate model, the model of the substitution strategy, the history of evaluations and the learning components in a single, yet modular entity. An important consequence of the encapsulation is that the meta model can be seamlessly coupled with an arbitrary optimization algorithm without any modification of the algorithm or the meta model. The latter replaces the true objective function and autonomously decides what function or model to use for the evaluation of a given candidate solution.
In our previous study [7], we showed that learning the substitution strategy improves the overall performance of surrogate-based optimization. By learning the substitution strategy, instead of using a predefined one, the meta model is capable of solving complex numerical optimization problems while significantly reducing the number of evaluations of the true objective function. In this paper, we focus on the configuration of the learning components of the meta model. In particular, we conjecture that the selection of appropriate learning algorithms for the surrogate and substitution-strategy models significantly impacts the overall performance of surrogate-based optimization.
To test the validity of this conjecture, we perform an extensive empirical evaluation of different instances of the meta-model framework. Each instance corresponds to a pair of learning algorithms, one for training the surrogate and one for training the substitution-strategy model. We select each algorithm in the pair among six alternative algorithms for learning predictive models previously used in the literature on surrogate-based optimization (linear regression, decision trees, nearest neighbors, support vector machines, Gaussian processes and random forests), leading to 36 meta-model instances. In the first series of experiments, performed on synthetic benchmarks [8], we tune the parameters of each meta-model instance. In turn, we select the most successful instances that significantly outperform the others. The selected meta-model variants are evaluated in a second series of experiments on three real-world tasks of estimating the parameters of models of dynamical systems described by systems of coupled ordinary differential equations [1, 9].
We first describe in more detail the task of numerical optimization and parameter estimation. Next, we introduce and formally define the meta model, its components and parameters. We then lay out the setup for the empirical evaluation and report the results of our analyses. Finally, we provide a summary of our conclusions and outline directions for further research.
2 Numerical optimization and parameter estimation
We consider the task of numerical optimization involving a single, nonlinear objective function in an unconstrained, continuous space $\mathbb{R}^n$. The task is to find a solution $x^* \in \mathbb{R}^n$ that leads to the extremum of the objective function $f: \mathbb{R}^n \to \mathbb{R}$. The objective function can be either minimized or maximized: in the former case, the result of optimization is $x^* = \arg\min_{x \in \mathbb{R}^n} f(x)$.
If an analytical solution for the minimum of $f$ is intractable, numerical methods are applied. These methods fall into two groups: local and global optimization methods. While local methods [10] are efficient, they suffer from myopia, i.e., the tendency to end up in a local extreme point in the neighborhood of the initial point. Global methods [11], on the other hand, are concerned with finding the global optimum and use different strategies for sampling the solution space. To improve their efficiency, they are often coupled with surrogates.
Parameter estimation aims at finding values of the parameters of a given model of a dynamical system that result in a model simulation that closely fits a given set of measurements of the observed system behavior. Models of dynamical systems are usually formalized as systems of coupled ordinary differential equations $\dot{\mathbf{x}} = g(\mathbf{x}, \boldsymbol{\theta})$ [12], where $\mathbf{x}$ denotes the vector of the observed state variables of the dynamical system, $\dot{\mathbf{x}}$ is the vector of the time derivatives of the state variables, $g$ is the function representing the model structure, and $\boldsymbol{\theta}$ denotes the vector of the real-valued constant parameters of the model. Given an initial condition $\mathbf{x}(t_0) = \mathbf{x}_0$, i.e., the value of $\mathbf{x}$ at the initial time point $t_0$, the model simulation leads to a set of trajectories of the dynamical change of the state variables through time. Analytical solutions of systems of coupled ordinary differential equations describing complex real-world models are rarely an option, so computationally expensive numerical approximation methods for integration (simulation) are typically applied.
The task of parameter estimation (Figure 1(A)) can be formalized as follows. Given the measured behavior $\mathbf{y}(t_1), \ldots, \mathbf{y}(t_m)$ of the system variables at time points $t_1, \ldots, t_m$, the task is to maximize the likelihood of the observed behavior given a particular value of $\boldsymbol{\theta}$, i.e., $L(\mathbf{y} \mid \boldsymbol{\theta})$, where $L$ is a likelihood function. In practice, due to the complexity of the models considered, the likelihood-based function is approximated by a least-squares function $C(\boldsymbol{\theta}) = \sum_{i=1}^{m} \lVert \hat{\mathbf{y}}(t_i) - \mathbf{y}(t_i) \rVert^2$, where $\hat{\mathbf{y}}(t_i)$ denotes the simulated behavior of the system variables at time point $t_i$. Recall, however, that $\hat{\mathbf{y}}$ is obtained by using a computationally intensive method for integrating differential equations, often leading to inefficient optimization and poor optima.
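The structure of such a least-squares objective can be sketched in a few lines of Python. This is a minimal illustration only: it uses a one-parameter exponential-decay model and forward-Euler integration, neither of which appears in the paper, in place of the expensive numerical integrators used in practice.

```python
def simulate(theta, x0, t_grid):
    # Forward-Euler integration of the illustrative model dx/dt = -theta * x;
    # real applications would use a proper ODE solver on the actual model.
    xs = [x0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        xs.append(xs[-1] + (t1 - t0) * (-theta * xs[-1]))
    return xs

def objective(theta, t_grid, y_obs):
    # Least-squares discrepancy C(theta) = sum_i (y_hat(t_i) - y(t_i))^2
    # between the simulated and the measured trajectory.
    y_hat = simulate(theta, y_obs[0], t_grid)
    return sum((a - b) ** 2 for a, b in zip(y_hat, y_obs))
```

Every call to `objective` triggers a full simulation, which is exactly the cost that surrogate-based optimization tries to avoid.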
This is especially true in the discovery of knowledge about the complex behavior and function of biological systems, which often involves mathematical modeling of dynamical systems from observational data [9], with parameter estimation as a key aspect [1]. Regarding the choice of a parameter estimation method for problems from the domain of systems biology, global stochastic and hybrid methods based on metaheuristics are considered the most promising in the literature [13, 14]. These methods require a large number of objective function evaluations, which makes them ideal candidates for applying surrogate-based optimization.
3 Meta-model framework for surrogate-based optimization
We first introduce the mathematical meta-model framework for surrogate-based optimization in its abstract form, accompanied by a graphical overview of the framework depicted in Figure 1(B). We then gradually proceed by specifying the machine learning components of the framework. Finally, we introduce a relevance-based surrogate management strategy that allows the meta model to make autonomous decisions about which function or model to use for evaluating a given candidate solution.
3.1 Meta-model framework
Our meta-model framework consists of four components. The first is the objective function $f$ that is the subject of optimization. The second component is the surrogate model that, for a given candidate solution $x$ (or a query point) and based on the history $H$ of previous meta-model evaluations, computes $\widehat{f}(x, H)$, the surrogate approximation of $f(x)$. The third component is the decision function that implements the dynamic substitution strategy: for a given $x$ and based on $H$, it decides whether to use the objective function (output 1) or its surrogate (output 0). Finally, the fourth important component of the meta-model framework is the evaluation history, where the data on past meta-model evaluations is kept. The evaluation history is a finite sequence of vectors in $\mathbb{R}^{n+k}$, where $n$ is the dimension of the optimization problem and $k$ the dimension of the additional data being kept in the history.
More formally, the meta-model function $\widetilde{f}$ is defined by three functions and a history of evaluations, corresponding to the components of the meta model:

objective function $f : \mathbb{R}^n \to \mathbb{R}$,

surrogate function $\widehat{f} : \mathbb{R}^n \times \mathcal{H} \to \mathbb{R}$,

decision function $d : \mathbb{R}^n \times \mathcal{H} \to \{0, 1\}$,

evaluation history $H \in \mathcal{H} = (\mathbb{R}^{n+k})^*$.

Given these components of the meta-model framework, the function $\widetilde{f}$ is defined as

$\widetilde{f}(x) = d(x, H) \cdot f(x) + (1 - d(x, H)) \cdot \widehat{f}(x, H)$.  (1)
While the functions can be arbitrary black boxes, we assume that the evaluation history of the meta model is updated after every evaluation. For our current needs, the evaluation history records the query point $x$, the result of the meta model $\widetilde{f}(x)$ and the value of the decision function $d(x, H)$. If we denote the history after the $i$-th evaluation of the meta model at the point $x$ with $H_i$ and the next one with $H_{i+1}$, then $H_{i+1}$ is the extension of $H_i$ with the vector $(x, \widetilde{f}(x), d(x, H_i))$. The last two values are used to define the values of the targets in the data sets for learning the predictive models in the components $\widehat{f}$ and $d$.
Algorithm 1 presents the meta model for evaluating a given candidate solution $x$. First, the meta model checks whether there are enough evaluations of the true objective function (variable nEvalsF) in the evaluation history for training (or retraining) the surrogate and relevator models (see subsection 3.3). To this end, the tuning parameters introduced in subsection 3.2 are used. Next, if the predictive models are not available yet, the meta model opts for evaluating the true objective function (the variable decision is set to 1). Otherwise, based on the evaluation history and the desired substitution rate, it updates the decision threshold (see subsection 3.3) and uses it to decide whether to evaluate $x$ with the surrogate model or the true objective function. It stores the result of the evaluation in the variable value, which is the result of the meta-model function. Finally, the meta model updates the counter nEvalsF that keeps track of the number of evaluations of the true objective function.
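The evaluation logic of Algorithm 1 can be sketched as follows. This is a simplified reading, not the paper's implementation: the parameter names (`n_init`, `rebuild_every`) are ours, the threshold is kept fixed here (its dynamic update is discussed in subsection 3.3), and the learning components are passed in as plain callables.

```python
class MetaModel:
    """Minimal sketch of Algorithm 1 (hypothetical parameter names)."""

    def __init__(self, f, learn_surrogate, learn_relevator,
                 n_init=20, rebuild_every=10, threshold=0.5):
        self.f = f                        # true (expensive) objective
        self.learn_surrogate = learn_surrogate
        self.learn_relevator = learn_relevator
        self.n_init = n_init              # true evaluations before first training
        self.rebuild_every = rebuild_every
        self.threshold = threshold        # fixed here; dynamic in subsection 3.3
        self.history = []                 # (x, value, decision) records
        self.n_evals_f = 0
        self.surrogate = None
        self.relevator = None

    def __call__(self, x):
        # (Re)train the models once enough true evaluations have accumulated.
        if self.n_evals_f >= self.n_init and \
                self.n_evals_f % self.rebuild_every == 0:
            true_evals = [(h[0], h[1]) for h in self.history if h[2] == 1]
            self.surrogate = self.learn_surrogate(true_evals)
            self.relevator = self.learn_relevator(true_evals)
        if self.surrogate is None:
            decision = 1                  # no models yet: use the true objective
        else:
            decision = 1 if self.relevator(x) >= self.threshold else 0
        value = self.f(x) if decision == 1 else self.surrogate(x)
        self.n_evals_f += decision
        self.history.append((x, value, decision))
        return value
```

Because the meta model is itself a callable, an optimizer that expects a plain objective function can use it unchanged, which is the encapsulation property stressed in the text.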
3.2 Surrogate
The surrogate function takes care of learning, updating and evaluating the surrogate predictive model. In the rest of the paper, we distinguish, where necessary, between the surrogate function as a whole and its predictive-model component. There are three important aspects to be considered when constructing a good surrogate function: the type of the predictive model, the size of the training set used for its initialization and the frequency of the model updates.
We aim at selecting a surrogate predictive model that closely approximates the objective function and can be evaluated efficiently. Because we do not wish to sample additional data, we rely only upon the data stored in the evaluation history. This is particularly beneficial when working with population-based methods, as the samples are concentrated in the region currently explored by the population, where we want better accuracy of the prediction model. Moreover, the efficiency of the surrogate function depends upon the trade-off between the frequency of surrogate learning and the size of the training set. A high update frequency is desirable, since the surrogate then always takes into account the most recent history of evaluations. On the other hand, frequent surrogate updates are unprofitable unless the learning time is low compared to the evaluation time of the true objective function. To this end, we introduce user-defined parameters that determine the number of true objective function evaluations between consecutive surrogate updates (see Algorithm 1).
A larger training set substantially slows down the algorithm by increasing the time needed to learn the surrogate model, which in turn negatively impacts the time performance of the meta model. We address the problem by filtering the history of evaluations. In order to minimize the error of the prediction model, only evaluations of the true objective function are used; these are easily extracted from the evaluation history, since we additionally keep the values of the decision function (only records with a decision value of 1 are extracted). Furthermore, when using the meta model in conjunction with a population-based method, the population slowly converges towards the minimum of the true objective function. We want the surrogate to be more precise in the current area of the search, which coincides with the most recent evaluation points. By focusing only on the most recent evaluations, we additionally emphasize points with a lower value of the objective, meaning less noise from high-value outliers, which are less relevant for learning the surrogate, since they correspond to non-optimal points. Therefore, in our implementation, the training set includes a user-defined number of the most recent points from the history of evaluations (see Algorithm 1). Additional filtering schemes are possible, for instance considering only the points with the lowest values of the objective function, or a combination of the two. While all such schemes can be subsumed by adding a weight function, that would introduce a large number of additional parameters.
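The filtering step described above amounts to two passes over the history. A minimal sketch, assuming the `(x, value, decision)` record layout used in our earlier pseudocode:

```python
def training_set(history, max_size):
    # Keep only true-objective evaluations (decision == 1) and, of those,
    # only the most recent max_size points.
    true_evals = [(x, v) for (x, v, d) in history if d == 1]
    return true_evals[-max_size:]
```

Restricting the set to recent true evaluations both bounds the learning time and biases the surrogate towards the region the population currently explores.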
3.3 Relevator
Selecting a suitable decision function is of vital importance. As shown in our previous work [7], a simple uninformed decision function, which uses the step number as its only argument, performs poorly on harder optimization problems. One possible way of constructing a more successful decision function is to predict how relevant a point will be for the optimization algorithm. This approach is based on the idea that points of high relevance for the optimization algorithm need to be predicted more accurately in order not to slow down progress. Thus, the evaluation of the most relevant points should be performed using the true objective function, while less relevant points can be evaluated using the surrogate model.
Making a decision based on the relevance of a point brings up two issues: first, how do we formally define point relevance and, second, how can the relevance of a point be estimated before evaluating the objective function? The point relevance can be calculated with a function passed to the meta model as an argument. However, in our current implementation, we define the relevance of a point relative to the lowest value of the objective function seen so far: the closer the value to the lowest value, the higher its relevance. In particular, if we use $\mathbf{v}$ to denote the vector of values of $f$ in the evaluation history, we define the point relevance as
$r(x) = 2^{-\left(f(x) - \min(\mathbf{v})\right) / \left(\overline{\mathbf{v}} - \min(\mathbf{v})\right)}$, where $\overline{\mathbf{v}}$ denotes the mean of $\mathbf{v}$.  (2)
The relevance of a point is by definition bound to the interval $[0, 1]$, where a value of $0$ corresponds to a point of low relevance and $1$ to a point of high relevance, and points close to the current average value get mapped close to $1/2$.
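A small sketch of such a relevance function follows. The exponential form used here is one concrete choice satisfying the stated properties (the historical minimum maps to 1, the mean maps to 1/2, values are bounded in [0, 1]); the clipping of values below the historical minimum is our own addition.

```python
def relevance(fx, v):
    """Relevance of a point with objective value fx, given the values v
    recorded in the (filtered) evaluation history. An illustrative form:
    min -> 1, mean -> 1/2, large values -> 0, clipped to [0, 1]."""
    lo, avg = min(v), sum(v) / len(v)
    if avg == lo:          # degenerate history: all recorded values equal
        return 1.0
    return min(1.0, 2.0 ** (-(fx - lo) / (avg - lo)))
```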
Having defined point relevance, we have to resolve the remaining issue of its estimation without evaluating the objective function. We could use the same prediction model as the surrogate to approximate $f(x)$ and then calculate the relevance by simply replacing the true value with its approximation in the formula above. Instead, we have decided to learn a separate model for predicting the point relevance. This not only gives us a much wider array of possible meta models but also allows us to dynamically adapt our relevance prediction model to the optimization task at hand. We refer to this model as the relevator.
To reduce the number of parameters in the framework, when constructing the training set for the relevator, we reuse the same filtering scheme as for the surrogate. Thus, to learn the relevance function, we construct the vector $\mathbf{v}$ from the evaluation history, but only include the values of $f$ present in the filtered training set. Using the whole history of evaluations tends to increase the average of $\mathbf{v}$, making it difficult to distinguish relevant points, as their relevance moves closer towards $1$. This problem is reduced by only using the recent values present in our filtered training set.
As with the surrogate function, a call of the relevator not only predicts the relevance of a given point based on the filtered evaluation history, but also learns and updates the relevance prediction model whenever needed. In order to reduce the number of parameters, we reuse the surrogate parameters for the update frequency.
In addition to the relevator, the decision function includes a decision threshold that distinguishes the points with high relevance, which should be evaluated using the true objective, from points with low relevance, which should be evaluated with the surrogate. To allow for dynamic change of the threshold value, we define it as a function $\tau$ of the evaluation history $H$.
Thus, the decision function of a relevator meta model is the indicator function

$d(x, H) = \mathbb{1}\left[\widehat{r}(x, H) \geq \tau(H)\right]$,  (3)

where $\widehat{r}(x, H)$ is the relevance estimate of point $x$ given the history of evaluations and $\tau(H)$ is a dynamical relevance threshold function.
We implement the dynamical relevance threshold using an iterative update procedure with the goal of controlling and locally adjusting the frequency of surrogate evaluations. By considering a user-defined number of the most recent evaluations, we can raise or lower the threshold after every meta-model evaluation in order to increase or decrease the frequency of surrogate evaluations and thus (locally) achieve the desired substitution rate.
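One simple realization of this iterative update is a small feedback controller; the fixed step size used here is a hypothetical choice, not the paper's.

```python
def update_threshold(threshold, recent_decisions, target_rate, step=0.01):
    """Nudge the relevance threshold so that the local share of surrogate
    evaluations tracks target_rate. recent_decisions holds 1 (true
    objective) or 0 (surrogate) for the last few meta-model evaluations;
    step is a hypothetical adjustment size."""
    if not recent_decisions:
        return threshold
    surrogate_rate = recent_decisions.count(0) / len(recent_decisions)
    if surrogate_rate < target_rate:
        threshold += step   # higher threshold -> fewer true evaluations
    else:
        threshold -= step
    return min(max(threshold, 0.0), 1.0)
```

Raising the threshold classifies fewer points as highly relevant, so more of them are routed to the surrogate, and vice versa.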
4 Results
In this section, we present the setup and the results of two series of experiments with the proposed meta-model framework. In the first series, we use 45 standard benchmark problems for numerical optimization to tune the parameters and evaluate the performance of the framework with different meta-model instances. Based on the comparison of their performance, we identify the machine learning methods that lead to suitable surrogate and substitution-strategy (relevator) models. The most successful among them are evaluated in a second series of experiments on three tasks of estimating the parameters of real-world models of dynamical systems from the domain of systems biology.
4.1 Meta-model tuning and selection
The construction of both the surrogate and the relevator functions for the meta model can be readily framed as a regression problem. We considered combinations of six different learning methods: linear regression (LINEAR), regression trees with variance reduction and reduced-error pruning (TREE), k-nearest neighbors with k=5 (KNN), Gaussian processes with a squared exponential covariance function (GP), support vector machines with an RBF kernel (SVM), and random forests with 100 trees (RF). We used the default parameters of the Weka implementation of each method [15]. The selection of the machine learning methods is based on state-of-the-art studies of surrogate-based optimization approaches [4, 5, 6], which mostly use Gaussian processes, followed by random forests and support vector machines. To check the utility of other, more efficient machine learning methods, we also included linear regression, nearest neighbors and regression trees.
Table 1. Candidate parameter values used in the grid search.

Parameter (role)                        Candidate values
training-set size, multiplier of D      0, 10, 25, 50
training-set size, constant term        100, 200, 500
rebuild interval, multiplier of D       0, 10, 25, 50
rebuild interval, constant term         50, 100, 200
desired substitution rate               0.3, 0.4, 0.5, 0.6, 0.7, 0.8
For each of the 36 surrogate-relevator combinations, we tuned the five parameters of the meta model by using COCO, the platform for comparing numerical optimization methods in a black-box setting [8]. The parameters were tuned by using grid search over the candidate values shown in Table 1. The first two parameters are used to calculate the training set size and the next two the interval for rebuilding the surrogate; both quantities are linear in the dimension of the problem $D$. The last parameter is the desired rate of substitution of the objective function with the surrogate model.
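The grid itself is small enough to enumerate exhaustively. The sketch below uses our own placeholder names for the five parameters (the original symbols from Table 1 are not reproduced here) and assumes the linear forms training-set size = a·D + b and rebuild interval = c·D + e.

```python
from itertools import product

# Hypothetical parameter names; "eta" is the desired substitution rate.
GRID = {
    "a": [0, 10, 25, 50],
    "b": [100, 200, 500],
    "c": [0, 10, 25, 50],
    "e": [50, 100, 200],
    "eta": [0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
}

def configurations():
    """Enumerate all candidate configurations of the grid."""
    keys = list(GRID)
    for values in product(*(GRID[k] for k in keys)):
        yield dict(zip(keys, values))

def train_size(cfg, dim):
    return cfg["a"] * dim + cfg["b"]

def rebuild_interval(cfg, dim):
    return cfg["c"] * dim + cfg["e"]
```

The full grid contains 4 × 3 × 4 × 3 × 6 = 864 configurations per surrogate-relevator pair.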
The widely-used COCO platform contains a set of black-box optimization functions that are used as benchmark problems for numerical optimization. From this set, we selected 15 functions from three different classes of problems that resemble real-world parameter estimation tasks: unimodal functions with high conditioning, multimodal functions with adequate global structure and multimodal functions with weak global structure. Within each class, the functions differ by levels of deceptiveness, ill-conditioning, regularity, separability and symmetry. Each optimization function can be generalized to a different number of dimensions; we consider instances of each function in 5, 10 and 20 dimensions, for a total of 45 benchmark optimization functions. For each surrogate-relevator pair, the set of parameters is selected that maximizes the improvement in performance relative to a baseline optimization method without surrogates.
Regarding the choice of a parameter estimation method, global stochastic and hybrid methods based on metaheuristics are considered the most promising in the literature [13, 14]. Out of the many different metaheuristic methods, evolutionary strategies and differential evolution have been identified as the most successful [16, 17] in the intended domain of application of this work, systems biology. We use Differential Evolution (DE) [18] as the baseline optimization method. It is a staple evolutionary, population-based method that has consistently shown robust and reliable performance on various problems in many domains [19].
For each of the 45 functions, we establish the baseline performance of DE without the meta model by running it with a fixed budget of evaluations. The performance improvement of a meta model on a single benchmark function is then calculated as $(B - N) / B$, where $B$ represents the number of evaluations needed to reach the minimum value without surrogates, while $N$ is the number of true function evaluations needed to reach the same minimum using the meta model. The overall performance improvement of a meta model is then calculated as $\overline{I} \cdot p$, where $\overline{I}$ is the average performance improvement over all 45 functions and $p$ is the proportion of functions on which an improvement was achieved.
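A minimal sketch of this scoring, assuming the relative-saving form (B − N)/B for the per-function improvement and counting an improvement as achieved when it is positive:

```python
def improvement(b_evals, n_evals):
    """Relative saving of true-objective evaluations on one benchmark:
    b_evals for the baseline, n_evals for the meta model."""
    return (b_evals - n_evals) / b_evals

def overall_improvement(b_list, n_list):
    """Average per-function improvement, weighted by the proportion of
    functions on which an improvement was achieved."""
    per_fn = [improvement(b, n) for b, n in zip(b_list, n_list)]
    avg = sum(per_fn) / len(per_fn)
    p = sum(1 for i in per_fn if i > 0) / len(per_fn)
    return avg * p
```

The weighting by the proportion of improved functions penalizes configurations that save many evaluations on a few problems while failing on the rest.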
For each surrogate-relevator pair, we selected the parameters that resulted in the best performance improvement. While we cannot draw any firm conclusions about the best size of the training set and the rebuild interval, the results show the potential of the meta model to achieve a very high rate of replacement of evaluations of the true objective function with evaluations of the surrogate function. For two of the four training-set and rebuild-interval parameters, all possible values were selected an equal number of times in the best configurations, while for the other two, the lowest values were most frequently selected (17/36 and 14/36). Most importantly, the highest possible value of 0.8 for the desired substitution rate was selected in 24 out of 36 best configurations.
To analyze the impact of the selection of the algorithms for learning the surrogate and the relevator model, we performed Friedman's rank sum test and a Nemenyi post-hoc analysis [20]. To perform the tests, we grouped the methods in two ways. First, we grouped the meta-model instances into 6 groups according to the algorithm used for learning the surrogate. The second grouping is based on the algorithm used for learning the substitution-strategy model. For both groupings, the Friedman test checks the validity of the null hypothesis that all the groups of meta-model instances perform equally well. In both cases, the null hypothesis is rejected with a very low p-value, leading to the conclusion that the selection of the algorithm for learning the surrogate or the relevator has a significant impact on the performance of the meta model. Furthermore, the post-hoc analysis uses the Nemenyi test to investigate the significance of the existing differences by calculating the critical distance between the average ranks of the groups. The performances of two groups differ significantly if the difference of their average ranks is larger than or equal to the critical distance. In both groupings, the critical distance at the significance level of 0.05 equals 0.4594.
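For reference, the critical distance used above can be computed from the standard Nemenyi formula; the critical value q_alpha is taken from the studentized range statistic table for the chosen significance level and number of groups.

```python
import math

def nemenyi_cd(q_alpha, k, n):
    """Critical distance between average ranks for the Nemenyi test:
    CD = q_alpha * sqrt(k * (k + 1) / (6 * n)), where k is the number of
    compared groups, n the number of problems they are ranked on, and
    q_alpha the tabulated critical value."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))
```

The critical distance shrinks as the number of problems grows, so more benchmark functions make smaller rank differences detectable.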
Figure 2 employs average-rank diagrams to summarize the results of the comparative analysis. The horizontal axis of an average-rank diagram corresponds to the rank of the group of meta-model instances: the top-ranked group is at the leftmost position on the axis. The line above the axis, labeled CD, depicts the value of the critical distance. The groups of meta-model instances with utilities that are not significantly different are connected with thick lines below the axis.
The average-rank diagram in Figure 2(A) groups the meta-model instances according to the algorithm used to learn the surrogate. Considering the significance of the pairwise differences between the groups, we can exclude linear regression as an algorithm leading to meta-model instances with significantly inferior performance. The average-rank diagram in Figure 2(B) groups the meta-model instances with respect to the algorithm for learning the substitution-strategy model. Meta-model instances using random forests, nearest neighbors and support vector machines for learning the relevator significantly outperform the instances using the other three algorithms. The comparison of the two graphs reveals that the meta model is more sensitive to the selection of the algorithm for learning the relevator than to the selection of the algorithm for learning the surrogate. This is an important insight, showing that the choice of the model for the dynamical substitution strategy, adapted to the problem at hand, has a significant impact on the utility of surrogate-based optimization.
4.2 Estimating the parameters of dynamical biological systems
We are interested in the performance of the meta model on the real-world problem of estimating the parameters of models of dynamical biological systems. We selected three dynamical biological systems with varying degrees of complexity (shown in Figure 3). The three systems have been well studied in terms of their dynamical properties and identifiability [21, 22].
The first system is a synthetic oscillatory network of three protein-coding genes interacting in an inhibitory loop, known as the Repressilator, modeled by Elowitz and Leibler [23]. The system is modeled by six variables and four constant parameters. The time-series data for this problem is obtained by numerical integration of the system of ordinary differential equations with the parameter values reported in [23] at 30 integer time points. The objective function used is the sum of squared errors between the simulated trajectories of the model and the available data.
The second system is a metabolic pathway representing a biological NAND gate [24]. The model is represented by a set of five ODEs with 15 constant parameters. Observation data for the metabolic pathway model was obtained from [22]. It consists of 12 sets of observations obtained by simulating the model using 12 different pairs of input step functions, sampled uniformly at 7 time points. The objective function used is the negative log-likelihood of the observations given the simulated trajectories of the model, summed across all datasets.
The third system is a genetic network [25], represented as a five-variable S-system model with 23 constant parameters. The observation data for the S-system model was obtained from [22]. It consists of 10 sets of observations obtained by simulating the model using 10 different sets of initial conditions for all variables, sampled uniformly at 11 time points. The objective function is an order-preserving log-transformation of the negative log-likelihood function. As with the previous system, the objective function was summed across all datasets.
To establish a baseline, we ran Differential Evolution (DE) without the meta model on each problem with a budget of evaluations proportional to $D$, the dimensionality of the problem, i.e., the number of constant parameters in the system. For the meta model, we considered the 5 methods for learning a surrogate and the 3 methods for learning a relevator function that were shown to have statistically significant performance improvements over the other methods. In all cases, the parameter estimation was repeated 10 times with different random seeds.
Figure 4 shows the transposed convergence curves for the optimization runs (minimum of 10 restarts) without a meta-model (DE) and with meta-models using different surrogate-relevator pairs. Recall that the transposed convergence curve depicts the number of evaluations (y axis) necessary to achieve a certain value of the objective function (x axis, logarithmic scale); a lower curve thus indicates faster convergence. In all the graphs, the red curve corresponding to the optimization without the meta-model lies above the others, indicating the superior convergence of the methods using the meta-model. The exceptions to this rule are the meta-models with the SVM relevator (depicted in the top-left graph of Figure 4), which fail to outperform the baseline DE method on the Repressilator task. The graphs in the second column show the consistent superiority of the meta-models using the RF method as a relevator: for all the tasks, these methods achieve the lowest values of the objective function. For the Repressilator task, the meta-models using surrogates based on decision trees lead to the fastest convergence; for the other tasks, the meta-models with tree-based and RF surrogates have superior convergence. Note finally that the meta-models using the SVM relevator also consistently outperform the baseline on the remaining tasks, but their curves are shorter than those of the meta-models using the RF relevator, indicating inferior utility with respect to the obtained value of the objective function.
More importantly, the meta-model achieves a significant improvement in the speed of convergence. To compare the convergence behavior of the meta-model to the baseline (DE) across all problems, we performed Page's trend test for ordered alternatives, as proposed by Derrac et al. [26], on 20 uniformly distributed cut points along the log-transformed convergence curves. We test the null hypothesis that the difference between two curves (minimum across 10 restarts) does not increase with the number of evaluations, i.e., that there is no difference in the speed of convergence. Table 2 shows the p-values obtained for the different surrogate-relevator pairs.
Table 2. p-values of Page's trend test for meta-models with different methods for learning the surrogate (columns) and the relevator (rows).

Relevator \ Surrogate   TREE      KNN        GP        SVM       RF
KNN                     3.69e-3   3.04e-5    0.504     0.372     5.87e-6
SVM                     0.399     0.437      0.644     0.528     0.704
RF                      5.82e-6   4.09e-13   4.98e-3   1.26e-8   4.26e-9
Page's trend test indicates that, in terms of improvement in the speed of convergence, Random Forest is the superior choice for the relevator of a meta-model when combined with any choice of method for learning the surrogate function. The use of the k-nearest neighbors method for the relevator also results in more efficient convergence, however only when combined with certain methods (i.e., TREE, KNN and RF) for learning the surrogate function.
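The test itself is available as scipy.stats.page_trend_test. The following sketch applies it to synthetic stand-in convergence differences arranged over 20 cut points; the real analysis uses the measured curves:

```python
# Sketch: Page's trend test on differences between two convergence curves,
# sampled at 20 ordered cut points. The curves are synthetic stand-ins.
import numpy as np
from scipy.stats import page_trend_test

n_problems, n_cuts = 3, 20
evals = np.linspace(0.05, 1.0, n_cuts)   # fraction of the evaluation budget

# Stand-in differences (baseline minus meta-model best-so-far value): one row
# per problem, growing along the curve because the meta-model converges faster.
differences = np.outer([0.3, 0.4, 0.5], evals)   # shape (problems, cut points)

# Columns are predicted to be ordered with increasing mean difference;
# the null hypothesis is that there is no such trend.
res = page_trend_test(differences, predicted_ranks=np.arange(1, n_cuts + 1))
```

A small p-value rejects the null hypothesis of no trend, i.e., it supports the claim that the gap between the two methods widens with the number of evaluations.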
Both Random Forest and k-nearest neighbors are conservative estimators in that they are limited in their ability to extrapolate predictions for candidate solutions with feature values outside the space covered by the training set. We conjecture that the optimization method exploits this property of the surrogates and the relevator to efficiently explore non-optimal parts of the objective space, which improves convergence. This property also reduces the possibility of error when evaluating the surrogate function in parts of the solution space that have high potential for optimality.
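The limited extrapolation of such estimators is easy to demonstrate. The following sketch, using an illustrative linear target, shows a Random Forest saturating outside its training range instead of following the trend:

```python
# Sketch: Random Forest as a conservative estimator. Trained on y = x over
# [0, 1], its prediction far outside the training range stays near the
# largest target value seen in training instead of extrapolating linearly.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(200, 1))
y_train = X_train.ravel()                    # simple linear target y = x

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

inside = rf.predict([[0.5]])[0]              # within the training range
outside = rf.predict([[10.0]])[0]            # far outside the training range
# 'outside' stays near max(y_train) = 1.0 rather than following y = x to 10.
```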
We further analyze the performance improvement of the surrogate-relevator pairs. Table 3 shows the reduction of the number of true objective function evaluations achieved by the meta-model with different surrogate-relevator pairs. The meta-model achieves a remarkable performance improvement, with an average reduction of up to 77% of the number of true function evaluations (TREE-RF). On individual problems, the meta-model achieves up to 94% performance improvement (SVM-RF).
As was the case for the improvement in the speed of convergence, the best performing relevator function is Random Forest, closely followed by k-nearest neighbors. Interestingly, the best performing surrogate function on average is a simple regression tree, closely followed by SVM.
Regarding the performance achieved on individual problems, it is worth noting that the SVM relevator is unable to improve the optimization performance on the Repressilator. The performance improvement achieved on the other problems when using SVM as a relevator is on par with the others. The Gaussian process surrogate exhibits the same issue.
Table 3. Performance improvement, i.e., reduction of the number of true objective function evaluations, per problem (Repressilator/NAND gate/S-system) and on average, for meta-models with different methods for learning the surrogate (columns) and the relevator (rows).

Relevator \ Surrogate   TREE            KNN             GP              SVM             RF
KNN                     0.41/0.83/0.87  0.37/0.50/0.78  0.00/0.47/0.81  0.81/0.50/0.86  0.33/0.24/0.89
  average               0.70            0.56            0.42            0.72            0.49
SVM                     0.00/0.72/0.84  0.00/0.59/0.70  0.00/0.43/0.82  0.00/0.56/0.87  0.00/0.62/0.85
  average               0.52            0.43            0.42            0.48            0.49
RF                      0.73/0.72/0.86  0.26/0.65/0.74  0.00/0.61/0.87  0.31/0.82/0.94  0.36/0.77/0.89
  average               0.77            0.55            0.49            0.69            0.67
The best performing surrogate-relevator pairs are heterogeneous. Learning the surrogate function and learning the relevator function are clearly independent tasks that require different learning methods. Overall, given the results of the empirical evaluation and taking into account the computational time required by the different learning methods, we recommend choosing a strong and robust learner for the relevator function, such as Random Forest. The computational cost can be reduced by choosing a simpler learner for the surrogate function, without compromising performance.
5 Related work
In the literature on surrogate-based optimization, the substitution strategy is referred to as a surrogate management strategy [3]. Figure 5 depicts the categorization of the state-of-the-art surrogate-based optimization methods into the two classes of wrapper (B) and embedded (C) methods. To better understand the figure, consider first the simple situation of a numerical optimization algorithm that does not use surrogates (A). In such a setting, the optimization method interacts only with the true objective function by requesting numerous evaluations of different candidate solutions. At the end, the method reports the optimal solution, i.e., the one that minimizes (or maximizes) the value of the objective function.
Wrapper methods place the surrogate management strategy outside the optimization method. Following this approach, the wrapper first initializes the surrogate using a sample of candidate solutions and their respective objective function values. In consecutive iterations, the wrapper first runs the optimization method using the surrogate model, obtaining a candidate solution. Next, it evaluates this solution using the true objective function. Finally, the solution and its evaluation are added to the training set, and the surrogate model is updated (relearned) before the next iteration. Note that wrapper methods use a fixed surrogate management strategy that is encoded in the wrapper. Recent developments of the wrapper-approach methods include the methods for constrained numerical optimization COBRA [27] and SACOBRA [4].
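The wrapper loop described above can be sketched as follows; the optimizer, the surrogate learner and the objective are illustrative stand-ins:

```python
# Sketch: wrapper-style surrogate management. The core optimizer only ever
# sees the surrogate; the true objective is evaluated outside, once per
# wrapper iteration, and the surrogate is relearned on the growing history.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.neighbors import KNeighborsRegressor

def true_objective(x):                       # expensive in practice
    return float(np.sum(np.asarray(x) ** 2))

bounds = [(-5.0, 5.0)] * 2
rng = np.random.default_rng(0)

# Initialize the surrogate from a sample of evaluated candidate solutions.
X = rng.uniform(-5.0, 5.0, size=(30, 2))
y = np.array([true_objective(x) for x in X])

for _ in range(10):                          # fixed number of wrapper iterations
    surrogate = KNeighborsRegressor(n_neighbors=3).fit(X, y)
    # Run the optimizer against the surrogate only.
    res = differential_evolution(lambda x: float(surrogate.predict([x])[0]),
                                 bounds, maxiter=20, seed=0, polish=False)
    # Evaluate the proposed solution with the true objective and update.
    X = np.vstack([X, res.x])
    y = np.append(y, true_objective(res.x))

best = float(np.min(y))
```

Note how the substitution strategy is fixed by construction: the true objective is evaluated exactly once per wrapper iteration, regardless of the problem at hand.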
Embedded methods encode the substitution strategy within the optimization method. Following this approach, the decision on whether to use the surrogate or the true objective function is based on various artifacts of the algorithm [3]. In particular, population-based evolutionary optimization methods use the surrogate model to evaluate the offspring candidates for the next generation of individuals, while the selection of the top candidates to be actually included in the next generation is based on evaluations of the true objective function. A simpler, generation-based management strategy evaluates the surrogate function in some generations and the true objective function in others. Following the embedded approach, numerous new variants of classical optimization methods in general [5], and of Differential Evolution [28, 6] in particular, have been developed.
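A generation-based strategy of this kind can be sketched as follows; the evolutionary loop is a bare-bones stand-in rather than a faithful Differential Evolution implementation:

```python
# Sketch: generation-based embedded strategy. The true objective is used
# every k-th generation; a surrogate learned from the evaluation history is
# used in between. The evolutionary loop itself is deliberately minimal.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def true_objective(x):
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
dim, pop_size, generations, k = 2, 20, 30, 5
pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))

X_hist, y_hist = [], []
for gen in range(generations):
    if gen % k == 0 or len(X_hist) < pop_size:
        # True-objective generation: evaluate and store in the history.
        fitness = np.array([true_objective(x) for x in pop])
        X_hist.extend(pop)
        y_hist.extend(fitness)
    else:
        # Surrogate generation: rank candidates with the learned model.
        surrogate = KNeighborsRegressor(n_neighbors=3).fit(X_hist, y_hist)
        fitness = surrogate.predict(pop)
    # Crude selection + mutation: keep the better half, perturb to refill.
    order = np.argsort(fitness)
    parents = pop[order[: pop_size // 2]]
    children = parents + rng.normal(scale=0.3, size=parents.shape)
    pop = np.vstack([parents, children])

best_true = min(true_objective(x) for x in pop)
```

The decision of when to use the surrogate is hard-wired into the loop (gen % k), which is precisely the rigidity that the meta-model framework removes.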
In sum, the comparison of the wrapper and embedded classes of methods, depicted in Figure 5, shows the following. Wrapper approaches are inflexible when it comes to the substitution strategy, since they force the evaluation of the surrogate function within the wrapped optimization method, while the true objective function can only be evaluated from outside the method. Embedded approaches are more flexible, but their decision function relies directly on the current state of the core optimization algorithm. They also require reimplementation or modification of an existing implementation of the base optimization method.
The proposed meta-model framework for surrogate-based optimization combines the simplicity of the wrapper approaches with the flexibility of the embedded approaches. On one hand, as in wrapper methods, the meta-model can be coupled with any core optimization method, since it is used as a black box (see Figure 5(D)). Unlike in other wrapper methods, however, the surrogate model and the substitution strategy are coupled with the true objective function in a manner independent of the optimization algorithm. On the other hand, as in embedded methods, the substitution strategy of the meta-model is more flexible. While embedded approaches base the substitution decision on the artifacts of the optimization algorithm, the meta-model substitution strategy dynamically adapts to the solution space of the optimization problem at hand.
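Conceptually, the meta-model simply replaces the objective function handed to the optimizer. The following sketch illustrates this coupling; the class name and the distance-based relevance rule are illustrative stand-ins for the learned relevator of the framework:

```python
# Conceptual sketch: the meta-model as a drop-in objective. It encapsulates
# the true objective, a surrogate, and a relevance rule deciding per candidate
# whether a true evaluation is needed. The relevance rule here is a simple
# distance heuristic, standing in for the learned relevator of the framework.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

class MetaModel:
    def __init__(self, objective, n_init=20, threshold=0.5):
        self.objective = objective
        self.threshold = threshold      # relevance level forcing a true evaluation
        self.n_init = n_init            # initial true evaluations before substituting
        self.X, self.y = [], []
        self.true_evals = 0

    def _relevance(self, x):
        # Illustrative relevator: high relevance for candidates far from
        # previously evaluated solutions, where the surrogate is unreliable.
        d = np.min(np.linalg.norm(np.asarray(self.X) - x, axis=1))
        return 1.0 / (1.0 + np.exp(-5.0 * (d - 1.0)))   # squashed to (0, 1)

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        if len(self.X) < self.n_init or self._relevance(x) > self.threshold:
            value = self.objective(x)   # true, expensive evaluation
            self.true_evals += 1
            self.X.append(x)
            self.y.append(value)
            return value
        surrogate = KNeighborsRegressor(n_neighbors=3).fit(self.X, self.y)
        return float(surrogate.predict([x])[0])

def sphere(x):
    return float(np.sum(x ** 2))

meta = MetaModel(sphere)
rng = np.random.default_rng(1)
values = [meta(rng.uniform(-2.0, 2.0, size=3)) for _ in range(100)]
```

Because the meta-model exposes the same call interface as the objective, any optimizer can use it unchanged, which is what makes the coupling black-box.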
6 Conclusions
The main contribution of this paper is the novel meta-model framework for surrogate-based optimization. In contrast with the prevailing focus of existing surrogate-based optimization methods on learning accurate surrogate models, the proposed framework involves two learning components: one learns the surrogate model, while the other learns the decision function that determines when to substitute the true objective function with the surrogate model.
The results of the empirical evaluation of the meta-level framework confirm our initial hypothesis that the selection of appropriate surrogate and decision-function models can have a significant influence on the overall performance of surrogate-based optimization. Moreover, the results show that the meta-model performance is more sensitive to the selection of the decision-function model: while almost all learning algorithms (except linear regression) lead to useful surrogate models, only random forests, nearest neighbors and support vector machines are appropriate in the role of decision-function models.
A more specific contribution of the paper is a novel surrogate-based approach to estimating the parameters of ordinary differential equations. We considered three parameter estimation tasks of different complexity and showed that, for these tasks, the use of a meta-model improves the efficiency of optimization. In particular, the use of the relevator meta-model for surrogate-based optimization significantly improves both the convergence rate and the final result of the optimization, given a limited number of evaluations of the true objective function.
The presented meta-model framework also contributes to the current machine learning literature by establishing a new paradigm for coupling optimization and machine learning methods. While most of the studies in the machine learning literature are currently based on the sequential model-based optimization (SMBO) paradigm [2], the proposed framework opens a new avenue of research, rich with opportunities for coupling surrogate models with other state-of-the-art optimization methods. The proposed framework is ready to be applied in the context of the currently very active machine learning research on hyperparameter optimization [29] and algorithm configuration [30], as well as to other meta-learning tasks [31].
The current conceptualization of the meta-level framework is limited to single-objective, unconstrained optimization. Its generalization towards handling multiple objective functions and/or constraints represents two possible directions for further research. Although we have applied the framework to numerical optimization only, it is general enough to also address combinatorial or mixed optimization problems. The evaluation of the framework presented in this paper is also limited to its coupling with the Differential Evolution method. While this is a typical representative of the more general class of population-based optimization methods, further experimental evaluation is necessary to establish the framework's generality with respect to the selection of the base optimization algorithm. Finally, further evaluation can include a comparative analysis of the performance of the meta-model framework relative to that of wrapper and embedded surrogate approaches.
Data availability
The source code of the implementation of the meta-model framework for surrogate-based parameter estimation, as well as the models and the data used in the experiments, are freely available at http://source.ijs.si/zluksic/metamodel.
Acknowledgements
The authors acknowledge the financial support of the Slovenian Research Agency (research core funding No. P2-0103 and No. P5-0093, and project No. N2-0128) and the Slovenian Ministry of Education, Science and Sport (agreement No. C3330-17-529021).
Model of the repressilator
Model of a metabolic NAND gate
Ssystem model of a genetic network
Footnotes
 Note that the alternative notation will be used whenever we want to emphasize the modification of the evaluation history.
References
 Jaqaman, K. & Danuser, G. Linking data to models: data regression. Nature Reviews Molecular Cell Biology 7, 813–819 (2006).
 Jones, D. R., Schonlau, M. & Welch, W. J. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 455–492 (1998).
 Jin, Y. Surrogateassisted evolutionary computation: Recent advances and future challenges. Swarm and Evolutionary Computation 1, 61–70 (2011).
 Bagheri, S., Konen, W., Emmerich, M. & Bäck, T. Solving the G-problems in less than 500 iterations: Improved efficient constrained optimization by surrogate modeling and adaptive parameter control (2015). URL http://arxiv.org/abs/1512.09251.
 Das, S., Mullick, S. S. & Suganthan, P. N. Recent advances in Differential Evolution – an updated survey. Swarm and Evolutionary Computation 27, 1–30 (2016).
 Rammohan, M. & Minho, L. An evolving surrogate modelbased differential evolution algorithm. Applied Soft Computing 34, 770–787 (2015).
 Lukšič, Ž., Tanevski, J., Džeroski, S. & Todorovski, L. General metamodel framework for surrogatebased numerical optimization. In Yamamoto, A., Kida, T., Uno, T. & Kuboyama, T. (eds.) Discovery Science, 51–66 (Springer International Publishing, Cham, 2017).
 Hansen, N., Auger, A., Mersmann, O., Tušar, T. & Brockhoff, D. COCO: A platform for comparing continuous optimizers in a black-box setting. arXiv 1603.08785 (2016).
 Kirk, P., Silk, D. & Stumpf, M. P. H. Uncertainty in Biology: A Computational Modeling Approach, chap. Reverse Engineering Under Uncertainty, 15–32 (Springer, 2016).
 Nocedal, J. & Wright, S. J. Numerical Optimization (Springer, 2006), 2nd edn.
 Pintér, J. D. Global Optimization in Action: Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications (Springer, 1995).
 Murray, J. D. Mathematical Biology (Springer, 1993), 2nd edn.
 Ashyraliyev, M., FomekongNanfack, Y., Kaandorp, J. A. & Blom, J. G. Systems biology: parameter estimation for biochemical models. FEBS Journal 276, 886–902 (2009).
 Chou, I. & Voit, E. O. Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Mathematical Biosciences 219, 57–83 (2009).
 Witten, I. H., Frank, E., Hall, M. A. & Pal, C. J. Appendix B – The WEKA workbench. In Data Mining, 553–571 (Morgan Kaufmann, 2016), 4th edn.
 Sun, J., Garibaldi, J. M. & Hodgman, C. Parameter estimation using metaheuristics in systems biology: A comprehensive review. IEEE/ACM Transactions on Computational Biology and Bioinformatics 9, 185–202 (2012).
 Tashkova, K., Korošec, P., Šilc, J., Todorovski, L. & Džeroski, S. Parameter estimation with bioinspired metaheuristic optimization: Modeling the dynamics of endocytosis. BMC Systems Biology 5, 1–26 (2011).
 Storn, R. & Price, K. Differential Evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, 341–359 (1997).
 Chakraborty, U. K. Advances in Differential Evolution (Springer Publishing Company, Berlin, Heidelberg, 2008), 1 edn.
 Demšar, J. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30 (2006).
 Buse, O., Pérez, R. & Kuznetsov, A. Dynamical properties of the repressilator model. Physical Review E 81, 066206 (2010).
 Gennemark, P. & Wedelin, D. Efficient algorithms for ordinary differential equation model identification of biological systems. IET Systems Biology 1, 120–129 (2007).
 Elowitz, M. B. & Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338 (2000).
 Arkin, A. & Ross, J. Statistical Construction of Chemical Reaction Mechanisms from Measured TimeSeries. The Journal of Physical Chemistry 99, 970–979 (1995).
 Kikuchi, S., Tominaga, D., Arita, M., Takahashi, K. & Tomita, M. Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics 19, 643–650 (2003).
 Derrac, J., García, S., Hui, S., Suganthan, P. N. & Herrera, F. Analyzing convergence performance of evolutionary algorithms: A statistical approach. Information Sciences 289, 41–58 (2014).
 Regis, R. G. Constrained optimization by radial basis function interpolation for highdimensional expensive blackbox problems with infeasible initial points. Engineering Optimization 46, 218–243 (2013).
 Su, G. Gaussian process assisted differential evolution algorithm for computationally expensive optimization problems. In Proceedings of the IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, 272–276 (2008).
 Wistuba, M., Schilling, N. & Schmidt-Thieme, L. Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning 107, 43–78 (2018).
 Eggensperger, K., Lindauer, M., Hoos, H., Hutter, F. & LeytonBrown, K. Efficient benchmarking of algorithm configurators via modelbased surrogates. Machine Learning 107, 15–41 (2018).
 Brazdil, P. & GiraudCarrier, C. Metalearning and algorithm selection: progress, state of the art and introduction to the 2018 special issue. Machine Learning 107, 1–14 (2018).