A Simple Heuristic for Bayesian Optimization with A Low Budget


Masahiro Nomura
nomura_masahiro@cyberagent.co.jp
Kenshi Abe
abe_kenshi@cyberagent.co.jp
Abstract

The aim of black-box optimization is to optimize an objective function within the constraints of a given evaluation budget. In this problem, it is generally assumed that the computational cost of evaluating a point is large; thus, it is important to search efficiently with as low a budget as possible. Bayesian optimization is an efficient method for black-box optimization and provides an exploration-exploitation trade-off by constructing a surrogate model that considers uncertainty of the objective function. However, because Bayesian optimization must construct the surrogate model for the entire search space, it does not exhibit good performance when points cannot be sampled sufficiently. In this study, we develop a heuristic method that refines the search space for Bayesian optimization when the available evaluation budget is low. The proposed method identifies a promising region by dividing the original region so that Bayesian optimization can be executed with the promising region as the initial search space. Through experiments on benchmark functions and the hyperparameter optimization of machine learning algorithms, we confirm that Bayesian optimization with the proposed method outperforms Bayesian optimization alone and performs as well as or better than two search-space division algorithms.

Keywords: Bayesian optimization · Hyperparameter optimization

1 Introduction

Black-box optimization is the problem of optimizing an objective function within the constraints of a given evaluation budget. In other words, the objective is to obtain a point with the lowest possible evaluation value within the allotted number of function evaluations. In black-box optimization, no algebraic representation of the objective function is given, and no gradient information is available. Black-box optimization includes problems such as hyperparameter optimization of machine learning algorithms [1, 8, 13, 11], parameter tuning of agent-based simulations [32], and aircraft design [4].

In black-box optimization, it is generally assumed that the computational cost of evaluating a point is large; thus it is important to search efficiently with as low a budget as possible. For example, it is reported that a hyperparameter optimization experiment for Online LDA takes about 12 days for 50 evaluations [25]. The performance of deep neural networks (DNNs) is known to be very sensitive to hyperparameters, and their hyperparameter optimization has been actively studied in recent years [2, 7, 13, 20, 19, 9]. Because training a DNN for even a single hyperparameter configuration takes a long time, only a low evaluation budget can be used for hyperparameter optimization.

Bayesian optimization is an efficient method for black-box optimization. It is executed by repeating the following steps: (1) Based on the data observed thus far, it constructs a surrogate model that considers the uncertainty of the objective function. (2) Using the surrogate model constructed in step (1), it calculates the acquisition function over candidate points. (3) By maximizing the acquisition function, it determines the point to be evaluated next. (4) It evaluates the selected point, adds the newly obtained data, and returns to step (1).

However, Bayesian optimization, which constructs a surrogate model for the entire search space, can perform poorly in the low budget setting because the method cannot sample points sufficiently. In the low budget setting, we believe that the search should be performed locally; in Bayesian optimization, however, points are sampled globally across the search space so that the surrogate model can estimate its uncertainty. This lack of local search degrades the performance of Bayesian optimization. Moreover, if there is no prior knowledge of the problem, the search space tends to be defined widely, which makes the search even more global and further degrades performance.

In this study, we develop a heuristic method that refines the search space for Bayesian optimization when the evaluation budget is low. The proposed method performs division to reduce the volume of the search space, which makes it possible to run Bayesian optimization within a local search space determined to be promising. We confirm that Bayesian optimization with the proposed method outperforms Bayesian optimization alone (that is, Bayesian optimization without the proposed method) through experiments on six benchmark functions and the hyperparameter optimization of three machine learning algorithms (multi-layer perceptron (MLP), convolutional neural network (CNN), and LightGBM). We also experiment with Simultaneous Optimistic Optimization (SOO) [21] and BaMSOO [30], which are search-space division algorithms, in order to confirm the validity of the refinement of the search space by the proposed method.

2 Background

2.1 Bayesian Optimization

Algorithm 1 shows the algorithm of Bayesian optimization, which samples and evaluates the initial points (line 1), constructs the surrogate model (line 3), finds the next point to evaluate by optimizing the acquisition function (line 4), evaluates the point selected and receives the evaluation value (line 5), and updates the data (line 6).

The main components of Bayesian optimization are the surrogate model and the acquisition function. In this section, we describe Bayesian optimization using a Gaussian process as the surrogate model and the expected improvement (EI) as the acquisition function.

0:  Input: objective function, search space, initial sample size, surrogate model, acquisition function
1:  sample and evaluate the initial points
2:  for each remaining evaluation do
3:     construct the surrogate model from the observed data
4:     find the next point to evaluate by optimizing the acquisition function
5:     evaluate the selected point and receive its evaluation value
6:     update the observed data
7:  end for
Algorithm 1 Bayesian Optimization
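As an illustration, the loop of Algorithm 1 can be sketched in Python with a Gaussian-process surrogate and the EI acquisition described below. The kernel, its length scale, the grid-based acquisition maximization, and the test function are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np
from math import erf, sqrt, pi

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def se_kernel(A, B, length=0.2, sigma_f=1.0):
    # squared exponential kernel between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-d2 / (2 * length ** 2))

def gp_posterior(X, y, Xs, noise=1e-6):
    # GP predictive mean and variance at the query points Xs
    K_inv = np.linalg.inv(se_kernel(X, X) + noise * np.eye(len(X)))
    Ks = se_kernel(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = np.diag(se_kernel(Xs, Xs)) - np.einsum('ij,jk,ki->i', Ks.T, K_inv, Ks)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    # closed-form EI for minimization
    sigma = np.sqrt(var)
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z * z) / sqrt(2.0 * pi)
    return (best - mu) * np.vectorize(norm_cdf)(z) + sigma * pdf

def bayes_opt(f, lo, hi, n_init=3, budget=15, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_init, 1))       # line 1: initial points
    y = np.array([f(x[0]) for x in X])
    grid = np.linspace(lo, hi, 201).reshape(-1, 1)  # candidate set for the acquisition
    for _ in range(budget - n_init):
        mu, var = gp_posterior(X, y, grid)          # line 3: surrogate model
        ei = expected_improvement(mu, var, y.min()) # line 4: acquisition values
        x_next = grid[int(np.argmax(ei))]           # line 4: maximize the acquisition
        X = np.vstack([X, x_next])                  # lines 5-6: evaluate and update
        y = np.append(y, f(x_next[0]))
    return y.min()

best = bayes_opt(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
```

With 15 evaluations on a one-dimensional quadratic, the loop quickly concentrates its samples near the minimizer.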

2.1.1 Gaussian Process

A Gaussian process [24] is a probability distribution over a function space characterized by a mean function and a covariance function. We assume that a data set of evaluated points x₁, …, xₙ and their observations y have been obtained. The mean and variance of the predictive distribution of a Gaussian process at a new point x can be calculated using the kernel function k as follows:

μ(x) = k(x)ᵀ(K + σ²I)⁻¹y    (1)
σ²(x) = k(x, x) − k(x)ᵀ(K + σ²I)⁻¹k(x)    (2)

Here,

k(x) = (k(x, x₁), …, k(x, xₙ))ᵀ    (3)
(K)ᵢⱼ = k(xᵢ, xⱼ),  i, j = 1, …, n    (4)

The squared exponential kernel (Equation (5)) is one of the most common kernel functions:

k(x, x′) = σ_f² exp(−r² / (2ℓ²))    (5)
r = ‖x − x′‖    (6)

Here, σ_f is a parameter that adjusts the scale of the whole kernel function, and ℓ is a parameter that controls the sensitivity to the difference between the two inputs x and x′.
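A minimal sketch of the squared exponential kernel, with illustrative values for the scale and length-scale parameters:

```python
import numpy as np

def se_kernel(x, xp, sigma_f=1.0, length=0.5):
    # squared exponential kernel: sigma_f adjusts the overall scale,
    # length controls the sensitivity to the distance between inputs
    return sigma_f ** 2 * np.exp(-np.sum((x - xp) ** 2) / (2 * length ** 2))

x = np.zeros(2)
near = se_kernel(x, x + 0.1)  # similar inputs -> value close to sigma_f**2
far = se_kernel(x, x + 1.0)   # distant inputs -> value close to 0
```

The kernel equals σ_f² at zero distance and decays smoothly as the inputs move apart.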

2.1.2 Expected Improvement

The EI [16] is a typical acquisition function in Bayesian optimization; it represents the expected improvement of a candidate point over the best evaluation value observed so far. Let the best evaluation value be f_best; the EI for a point x is calculated as follows:

EI(x) = E[max(f_best − f(x), 0)]    (7)

When we assume that the objective function follows a Gaussian process, Equation (7) can be calculated analytically as follows:

EI(x) = (f_best − μ(x)) Φ(z) + σ(x) φ(z),  where z = (f_best − μ(x)) / σ(x)    (8)

Here, Φ and φ are the cumulative distribution function and the probability density function of the standard normal distribution, respectively.
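The closed form above can be sketched as a small function; this is a standard implementation of EI for minimization, not code from the paper:

```python
import math

def norm_cdf(z):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    # standard normal PDF
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_improvement(mu, sigma, best):
    # EI(x) = (f_best - mu) * Phi(z) + sigma * phi(z), z = (f_best - mu) / sigma
    if sigma <= 0.0:
        return max(best - mu, 0.0)  # no uncertainty left at this point
    z = (best - mu) / sigma
    return (best - mu) * norm_cdf(z) + sigma * norm_pdf(z)
```

At a point whose predictive mean equals the incumbent, EI reduces to sigma * phi(0), so larger predictive uncertainty yields larger EI.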

2.2 Related Work

2.2.1 Bayesian optimization

In Bayesian optimization, the design of surrogate models and acquisition functions is actively studied. The tree-structured Parzen estimator (TPE) algorithm [1, 3], Sequential Model-based Algorithm Configuration (SMAC) [12], and Spearmint [25] are known as powerful Bayesian optimization methods; they use a tree-structured Parzen estimator, a random forest, and a Gaussian process as the surrogate model, respectively. Popular acquisition functions in Bayesian optimization include the EI [16], probability of improvement [14], upper confidence bound (UCB) [26], mutual information (MI) [5], and knowledge gradient (KG) [10].

However, there are few studies focusing on search spaces in Bayesian optimization. A prominent problem in Bayesian optimization is the boundary problem [28], in which sampled points concentrate near the boundary of the search space. Oh et al. addressed this boundary problem by transforming the ball geometry of the search space using a cylindrical transformation [23]. Wistuba et al. proposed using previous experimental results to prune regions of the hyperparameter search space that seem to contain no good points [31]; in contrast, we propose a method that refines the search space without prior knowledge. Nguyen et al. dynamically expanded the search space to cope with cases where the search space specified in advance does not contain a good point [22]; in contrast, we focus on refining the search space rather than expanding it.

2.2.2 Search-Space Division Algorithm

The proposed method is similar to methods such as Simultaneous Optimistic Optimization (SOO) [21] and BaMSOO [30] in that it focuses on division of the search space. SOO is an algorithm that generalizes the DIRECT algorithm [15], a Lipschitz optimization method; the search space is expressed as a tree structure, and the search is performed through hierarchical division. BaMSOO combines SOO with a Gaussian process, making auxiliary optimization of the acquisition function unnecessary. Wang et al. reported that BaMSOO shows better performance than SOO in experiments on some benchmark functions [30]. The motivation for division differs between the proposed method and the search-space division algorithms: the proposed method divides the search space to identify a promising initial region for Bayesian optimization, whereas SOO and BaMSOO divide the search space to identify a good solution directly.

3 Proposed Method

In Bayesian optimization, there are many tasks with a low available evaluation budget. For example, in hyperparameter optimization of machine learning algorithms, the budget is often limited in terms of computing resources and time. In this study, we focus on Bayesian optimization when the available evaluation budget is not sufficient.

Nguyen et al. state that Bayesian optimization using a Gaussian process as the surrogate model and UCB as the acquisition function has the following relationships between the volume of the search space and the cumulative regret (the sum of the differences between the optimum value and the evaluation value at each time) [22]: (i) a larger space has a larger (worse) regret bound; (ii) a low evaluation budget makes the difference in the regrets more significant. Nguyen et al. give the above description for cumulative regret [22], but converting it to simple regret is straightforward [17]. We therefore believe that, in the low budget setting, making the search space smaller is also important in terms of regret for Bayesian optimization in general.

In this study, we try to improve the performance of Bayesian optimization in the low budget setting by introducing a heuristic method that refines a given search space. We assume that the search space is an arbitrary hypercube, where D denotes the number of dimensions. Our method refines the search space by division and outputs a region considered to be promising. As a result, Bayesian optimization can be executed with the refined search space as the initial search space instead of the original one.

3.1 Integrating with Bayesian Optimization

Algorithm 2 shows Bayesian optimization with the proposed method. The method calculates the budget for refining the search space from the whole budget (line 1), refines the search space to a promising region (line 2), and performs Bayesian optimization with the search space refined in line 2 as the initial search space (line 3). We describe the refinement procedure (line 2) in Section 3.2.2.

0:  Input: budget, search space, number of dimensions, ratio for refining
1:  calculate the budget for refining from the whole budget
2:  refine the search space to a promising region
3:  run Bayesian optimization with the refined search space until the budget is reached
Algorithm 2 Bayesian optimization with the proposed method

Figure 1 shows a conceptual design of the proposed method. In the right panel of Figure 1, Bayesian optimization is executed on the search space refined by the proposed method.

Figure 1: Conceptual design of Bayesian optimization with the proposed method. The figure on the left shows Bayesian optimization without the proposed method, and the figure on the right shows Bayesian optimization with the proposed method. The blue balls are the points sampled by Bayesian optimization, and the red balls are the points sampled by the proposed method. The gray region on the right shows the region discarded by the proposed method.

3.2 Refining the Search Space

3.2.1 Calculation of the Budget

Corresponding to the whole budget, we set the budget used by the proposed method for refining (Algorithm 2, line 1), calculated with respect to the number of dimensions. If the evaluation budget increases to infinity, there is no need for refining the search space. We note that this is the maximum budget for the proposed method and is not necessarily used in full; it is used for determining the division number. We show the details of how it is used in Section 3.2.3.

3.2.2 Algorithm

The proposed method refines the search space to a promising region by dividing the region at equal intervals for each dimension. Figure 2 illustrates how the proposed method refines the search space. The proposed method randomly selects a dimension without replacement, divides the region into equal pieces along that dimension, and keeps only the piece whose center point has the best evaluation value. It repeats this operation until the regions corresponding to all dimensions have been divided.

Figure 2: Refining the search space by the proposed method. The proposed method divides the region along a selected dimension into equal pieces and evaluates the center points. It keeps the piece whose center point is best, divides the kept region along another dimension, and repeats this process.

Algorithm 3 shows the algorithm for refining the search space. Throughout the paper, we denote the set of integers between two bounds (inclusive) by interval notation. We describe how to set the division number (Algorithm 3, line 1) in the next section.

0:  Input: number of dimensions, search space, budget for refining
1:  set the division number
2:  if the division number is 1 then
3:     return the search space {no division is needed in this case}
4:  end if
5:  initialize the retained region to the whole search space
6:  for each dimension do
7:     randomly select a dimension index without replacement from the index set of dimensions
8:     divide the retained region into equal pieces along the selected dimension
9:     evaluate the center point of each piece
10:     update the retained region to the piece whose center point is best
11:  end for
12:  return the retained region
Algorithm 3 Refining the search space
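A minimal sketch of the refinement procedure in Algorithm 3, assuming a minimization objective and a fixed division number k; the center-point reuse for odd k (Section 3.2.3) is omitted here for simplicity:

```python
import numpy as np

def refine(f, lows, highs, k=3, seed=0):
    # Refine the hypercube [lows, highs]: for each dimension in random
    # order, divide the region into k equal pieces along that dimension
    # and keep the piece whose center point evaluates best (lowest).
    lows = np.asarray(lows, float).copy()
    highs = np.asarray(highs, float).copy()
    rng = np.random.default_rng(seed)
    for d in rng.permutation(len(lows)):       # dimensions without replacement
        edges = np.linspace(lows[d], highs[d], k + 1)
        centers = (edges[:-1] + edges[1:]) / 2
        values = []
        for c in centers:                      # evaluate each piece's center
            x = (lows + highs) / 2
            x[d] = c
            values.append(f(x))
        best = int(np.argmin(values))          # keep the best piece
        lows[d], highs[d] = edges[best], edges[best + 1]
    return lows, highs

lo, hi = refine(lambda x: float(np.sum((x - 0.7) ** 2)), [0, 0], [1, 1], k=3)
```

On this two-dimensional quadratic, each dimension's interval shrinks to one third of its original length, and the returned box still contains the optimum at (0.7, 0.7).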

3.2.3 Division Number

We need to set the division number K to adjust how much the search space is refined. If K is even, the evaluation budget used for refining is DK, since each of the D dimensions requires K center-point evaluations. However, when K is odd, the budget is D(K − 1) + 1, because the center point of the previously refined region can be reused as the center of the middle piece in the next division. We therefore set the division number to the odd value whose refining budget most closely approaches the allotted refining budget t, according to Equation (9):

K = argmin over odd K ≥ 3 of |D(K − 1) + 1 − t|    (9)
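A small helper can pick the odd division number from the refining budget. The cost model D * (K - 1) + 1 used here is our reading of the center-reuse argument in the text and should be treated as an assumption:

```python
def division_number(D, refine_budget, max_k=21):
    # Choose an odd division number K >= 3 whose refining cost,
    # D * (K - 1) + 1 evaluations (one center reused per dimension),
    # is closest to the given refining budget.
    candidates = range(3, max_k + 1, 2)
    return min(candidates, key=lambda K: abs(D * (K - 1) + 1 - refine_budget))

k = division_number(D=2, refine_budget=9)
```

For D = 2 and a refining budget of 9, K = 5 costs exactly 2 * 4 + 1 = 9 evaluations, so it is selected.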

4 Experiments

In this section, we assess the performance of the proposed method on benchmark functions and on the hyperparameter optimization of machine learning algorithms, to confirm its effectiveness in the low budget setting.

4.1 Baseline Methods

We use GP-EI (Bayesian optimization using a Gaussian process as the surrogate model and the EI as the acquisition function), TPE [1], and SMAC [12] as the baseline Bayesian optimization methods. In this experiment, we refer to GP-EI with the proposed method as Ref+GP-EI; likewise, we refer to TPE and SMAC with the proposed method as Ref+TPE and Ref+SMAC, respectively. We use the GPyOpt (https://github.com/SheffieldML/GPyOpt), Hyperopt (https://github.com/hyperopt/hyperopt), and SMAC3 (https://github.com/automl/SMAC3) libraries to obtain the results for GP-EI, TPE, and SMAC, respectively. We set the parameters of GP-EI, TPE, and SMAC to the default values of each library and use the center point of the search space as the initial starting point for SMAC. We also experiment with SOO [21] and BaMSOO [30], which are search-space division algorithms, in order to confirm the validity of the refinement of the search space by the proposed method.

4.2 Benchmark Functions

In the first experiment, we assess the performance of the proposed method on the benchmark functions that are often used in black-box optimization. Table 1 shows the six benchmark functions used in this experiment.

Name Definition Dim Search Space
Sphere
-tablet
RosenbrockChain
Branin
Shekel
Hartmann
Table 1: Name, definition, number of dimensions, and search space of the benchmark functions. The coefficients appearing in the Branin, Shekel, and Hartmann functions are shown in [27].

4.2.1 Experimental Setting

We run 50 trials for each experiment with a fixed evaluation budget per trial. We assess the performance of each method using the mean and standard error of the best evaluation values over the 50 trials.

For SOO and BaMSOO, we set the division number to the same setting as in [29]. For BaMSOO, we use the Matérn kernel, one of the common kernel functions. We set the initial hyperparameters and then update them by maximizing the data likelihood after each iteration.
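For reference, a standard Matérn 5/2 kernel in one dimension might look as follows; the 5/2 smoothness and the default hyperparameter values here are assumptions, since the paper's exact settings are not recoverable from this text:

```python
import math

def matern52(x, xp, length=1.0, sigma=1.0):
    # Matern 5/2 kernel:
    # k(x, x') = sigma^2 * (1 + sqrt(5) r + 5 r^2 / 3) * exp(-sqrt(5) r),
    # with r = |x - x'| / length
    r = abs(x - xp) / length
    s = math.sqrt(5.0) * r
    return sigma ** 2 * (1.0 + s + 5.0 * r * r / 3.0) * math.exp(-s)
```

Like the squared exponential kernel, it equals sigma squared at zero distance and decays monotonically with distance, but it imposes weaker smoothness assumptions on the modeled function.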

4.2.2 Results

Figure 3 and Table 2 show the mean and standard error of the best evaluation values over 50 trials on the six benchmark functions. Ref+GP-EI and BaMSOO show competitive performance on the RosenbrockChain and Branin functions, but Ref+GP-EI outperforms all the other methods on the remaining benchmark functions. Furthermore, Ref+GP-EI, Ref+TPE, and Ref+SMAC outperform GP-EI, TPE, and SMAC on all the benchmark functions, respectively.

Figure 4 shows the typical behavior of each method on the Hartmann function. Ref+GP-EI, Ref+TPE, and Ref+SMAC sample many points with good evaluation values after refining the search space, whereas the other methods fail to sample points with good evaluation values sufficiently, even at the end of the search.

(a) Sphere
(b) -tablet
(c) RosenbrockChain
(d) Branin
(e) Shekel
(f) Hartmann
Figure 3: Sequences of the mean and standard error of the best evaluation values on the benchmark functions. The x-axis denotes the number of evaluations and the y-axis denotes the mean and standard error of the best evaluation values (averaged over 50 trials).
Figure 4: Typical behavior of each method on the Hartmann function. The x-axis denotes the number of evaluations and the y-axis denotes the evaluation value. The black dotted line represents the moment that the proposed method refines the search space.
Problem GP-EI Ref+GP-EI TPE Ref+TPE SMAC Ref+SMAC SOO BaMSOO
Sphere
-tablet
Rosen
Branin
Shekel
Hartmann
Table 2: Mean and standard error of the best evaluation values on the benchmark functions. Bold values show the best mean among all the methods. The values for the -tablet and RosenbrockChain functions are scaled from the original values.

4.3 Hyperparameter Optimization

In the second experiment, we assess the performance of the proposed method on the hyperparameter optimization of machine learning algorithms. We experiment with the following three machine learning algorithms, which are often run in the low budget setting:

  • MLP

  • CNN

  • LightGBM [18]

Table 3 shows the four hyperparameters of the MLP and their respective search spaces. The MLP consists of two fully-connected layers with SoftMax at the end. We set the maximum number of epochs during training to 20 and the mini-batch size to 128. We use the MNIST dataset, which contains grey-scale images of digits, each belonging to one of ten classes. The MNIST dataset consists of training images and testing images; in this experiment, we split the training images into a training dataset and a validation dataset.

The CNN consists of two convolutional layers with batch normalization and SoftMax at the end. Each convolutional layer is followed by a max-pooling layer, and the two convolutional layers are followed by two fully-connected layers with ReLU activation. We use the same hyperparameters and search spaces as in the MLP problem above (Table 3). We set the maximum number of epochs during training to 10 and the mini-batch size to 128. We use the MNIST dataset and split it as in the MLP problem.


Hyperparameters of MLP and CNN: learning rate of SGD, momentum of SGD, number of hidden nodes, dropout rate.
Hyperparameters of LightGBM: learning rate, colsample bytree, reg lambda, max depth.
Table 3: Details of four hyperparameters of MLP and CNN optimized on MNIST dataset.
Table 4: Details of four hyperparameters of LightGBM optimized on Breast Cancer Wisconsin dataset.

Table 4 shows the four hyperparameters of LightGBM and their respective search spaces. We use the Breast Cancer Wisconsin dataset [6]. In this experiment, part of the data instances is used as the training dataset, and the evaluation value is calculated using cross validation.

4.3.1 Experimental Setting

We run 50 trials for each experiment with a fixed evaluation budget per trial. For all experiments, we use the misclassification rate on the validation dataset as the evaluation value. For all problems, we treat the integer-valued hyperparameters as continuous variables, rounding to integer values when evaluating.

4.3.2 Results

Figure 5 and Table 5 show the mean and standard error of the best evaluation values over 50 trials on the hyperparameter optimization of the three machine learning algorithms. As in the benchmark-function experiment, Ref+GP-EI, Ref+TPE, and Ref+SMAC outperform GP-EI, TPE, and SMAC on all the hyperparameter optimization problems, respectively. Likewise, Ref+GP-EI, Ref+TPE, and Ref+SMAC perform as well as or better than SOO and BaMSOO on all problems.

(a) MLP with MNIST dataset
(b) CNN with MNIST dataset
(c) LightGBM with Breast Cancer Wisconsin dataset
Figure 5: The sequences of the mean and standard error of the best evaluation values on the hyperparameter optimizations. The x-axis denotes the number of evaluations and the y-axis denotes the mean and standard error of the best evaluation values (averaged over 50 trials).
Problem GP-EI Ref+GP-EI TPE Ref+TPE SMAC Ref+SMAC SOO BaMSOO
MLP
CNN
LightGBM
Table 5: Mean and standard error of the best evaluation values on the hyperparameter optimization of the machine learning algorithms. Bold values show the best mean among all the methods. The values in each problem are scaled from the original values.

5 Conclusion

In this study, we developed a simple heuristic method for Bayesian optimization in the low budget setting. The proposed method refines the search space to a promising region by dividing the region at equal intervals for each dimension. By refining the search space, Bayesian optimization can be executed with the promising region as the initial search space.

We experimented with six benchmark functions and the hyperparameter optimization of three machine learning algorithms (MLP, CNN, LightGBM). We confirmed that Bayesian optimization with the proposed method outperforms Bayesian optimization alone on all problems, including both the benchmark functions and the hyperparameter optimization tasks. Likewise, Bayesian optimization with the proposed method performs as well as or better than two search-space division algorithms.

In future work, we plan to adapt the proposed method for noisy environments. Real-world problems such as hyperparameter optimization are often noisy; thus, making the optimization method robust is important. Furthermore, because we do not consider the variable dependency at present, we are planning to refine the search space taking the variable dependency into consideration.

References

  • [1] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl (2011) Algorithms for hyper-parameter optimization. In NIPS, Cited by: §1, §2.2.1, §4.1.
  • [2] J. Bergstra and Y. Bengio (2012) Random search for hyper-parameter optimization. Journal of Machine Learning Research 13, pp. 281–305. Cited by: §1.
  • [3] J. Bergstra, D. Yamins, and D. D. Cox (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In ICML, Cited by: §2.2.1.
  • [4] K. Chiba, S. Obayashi, K. Nakahashi, and H. Morino (2005) High-fidelity multidisciplinary design optimization of wing shape for regional jet aircraft. In Evolutionary Multi-Criterion Optimization, pp. 621–635. Cited by: §1.
  • [5] E. Contal and N. Vayatis (2014) Gaussian process optimization with mutual information. In ICML, Cited by: §2.2.1.
  • [6] D. Dheeru and E. Karra Taniskidou (2017) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. External Links: Link Cited by: §4.3.
  • [7] T. Domhan, J. T. Springenberg, and F. Hutter (2015) Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In IJCAI, Cited by: §1.
  • [8] K. Eggensperger, M. Feurer, F. Hutter, J. Bergstra, J. Snoek, H. Hoos, and K. Leyton-Brown (2013) Towards an empirical foundation for assessing bayesian optimization of hyperparameters. In NeurIPS workshop on Bayesian Optimization in Theory and Practice, Cited by: §1.
  • [9] S. Falkner, A. Klein, and F. Hutter (2018) BOHB: robust and efficient hyperparameter optimization at scale. In ICML, Cited by: §1.
  • [10] P. Frazier, W. Powell, and S. Dayanik (2009) The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing 21, pp. 599–613. Cited by: §2.2.1.
  • [11] D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, and D. Sculley (2017) Google vizier: a service for black-box optimization. In KDD, Cited by: §1.
  • [12] F. Hutter, H. H. Hoos, and K. Leyton-Brown (2011) Sequential model-based optimization for general algorithm configuration. In LION, Cited by: §2.2.1, §4.1.
  • [13] I. Ilievski, T. Akhtar, J. Feng, and C. A. Shoemaker (2017) Efficient hyperparameter optimization for deep learning algorithms using deterministic rbf surrogates.. In AAAI, Cited by: §1, §1.
  • [14] H. J. Kushner (1964) A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering 86. Cited by: §2.2.1.
  • [15] D. R. Jones, C. D. Perttunen, and B. E. Stuckman (1993) Lipschitzian optimization without the lipschitz constant. Journal of Optimization Theory and Applications 79 (1), pp. 157–181. Cited by: §2.2.2.
  • [16] D. R. Jones, M. Schonlau, and W. J. Welch (1998) Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global optimization 13, pp. 455–492. Cited by: §2.1.2, §2.2.1.
  • [17] K. Kandasamy, G. Dasarathy, J. B. Oliva, J. Schneider, and B. Poczos (2016) Gaussian process bandit optimisation with multi-fidelity evaluations. In ICML, Cited by: §3.
  • [18] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T. Liu (2017) Lightgbm: a highly efficient gradient boosting decision tree. In NIPS, Cited by: 3rd item.
  • [19] A. Klein, S. Falkner, S. Bartels, P. Hennig, and F. Hutter (2017) Fast bayesian optimization of machine learning hyperparameters on large datasets. In AISTATS, Cited by: §1.
  • [20] L. Li, K. G. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar (2017) Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research 18, pp. 185:1–185:52. Cited by: §1.
  • [21] R. Munos (2011) Optimistic optimization of a deterministic function without the knowledge of its smoothness. In NIPS, Cited by: §1, §2.2.2, §4.1.
  • [22] V. Nguyen, S. Gupta, S. Rana, C. Li, and S. Venkatesh (2017) Bayesian optimization in weakly specified search space. In ICDM, Cited by: §2.2.1, §3.
  • [23] C. Oh, E. Gavves, and M. Welling (2018) BOCK : Bayesian optimization with cylindrical kernels. In ICML, Cited by: §2.2.1.
  • [24] C. E. Rasmussen and C. K. I. Williams (2005) Gaussian processes for machine learning. The MIT Press. Cited by: §2.1.1.
  • [25] J. Snoek, H. Larochelle, and R. P. Adams (2012) Practical bayesian optimization of machine learning algorithms. In NIPS, Cited by: §1, §2.2.1.
  • [26] N. Srinivas, A. Krause, S. Kakade, and M. Seeger (2010) Gaussian process optimization in the bandit setting: no regret and experimental design. In ICML, Cited by: §2.2.1.
  • [27] S. Surjanovic and D. Bingham (2019) Virtual library of simulation experiments: test functions and datasets. Note: Retrieved April 15, 2019, from http://www.sfu.ca/~ssurjano Cited by: Table 1.
  • [28] K. J. Swersky (2017) Improving bayesian optimization for machine learning using expert priors. In PhD thesis, Cited by: §2.2.1.
  • [29] M. Valko, A. Carpentier, and R. Munos (2013) Stochastic simultaneous optimistic optimization. In ICML, Cited by: §4.2.1.
  • [30] Z. Wang, B. Shakibi, L. Jin, and N. Freitas (2014) Bayesian Multi-Scale Optimistic Optimization. In AISTATS, Cited by: §1, §2.2.2, §4.1.
  • [31] M. Wistuba, N. Schilling, and L. Schmidt-Thieme (2015) Hyperparameter search space pruning – a new component for sequential model-based hyperparameter optimization. In ECML, Cited by: §2.2.1.
  • [32] C. Yang, S. Kurahashi, K. Kurahashi, I. Ono, and T. Terano (2009) Agent-based simulation on women’s role in a family line on civil service examination in chinese history. J. Artificial Societies and Social Simulation 12. Cited by: §1.