Cascaded Algorithm-Selection and Hyper-Parameter Optimization with Extreme-Region Upper Confidence Bound Bandit

This work is supported by the National Key R&D Program of China (2017YFB1001903), NSFC (61876077), Jiangsu SF (BK20170013), and Collaborative Innovation Center of Novel Software Technology and Industrialization. Yang Yu is the corresponding author.


Yi-Qi Hu, Yang Yu, Jun-Da Liao

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

{huyq, yuy},

An automatic machine learning (AutoML) task is to select the best algorithm and its hyper-parameters simultaneously. Previously, the hyper-parameters of all algorithms were joined into a single search space, which is not only huge but also redundant, because many hyper-parameter dimensions are irrelevant to the selected algorithm. In this paper, we propose a cascaded approach for algorithm selection and hyper-parameter optimization. While a search procedure is employed at the level of hyper-parameter optimization, a bandit strategy runs at the level of algorithm selection to allocate the budget based on the search feedback. Since the bandit is required to select the algorithm with the maximum performance rather than the best average performance, we propose the extreme-region upper confidence bound (ER-UCB) strategy, which focuses on the extreme region of the underlying feedback distribution. We show theoretically that, with independent feedback, ER-UCB has a regret upper bound of the same order as the classical UCB bandit. We also conduct experiments on a synthetic problem as well as a set of AutoML tasks. The results verify the effectiveness of the proposed method.

1 Introduction

Algorithm selection and hyper-parameter optimization are core parts of automatic machine learning (AutoML). Previous AutoML approaches often define the search space as the algorithm selection space [???], the hyper-parameter space [??], or the joint of both spaces (the CASH problem) [??]. While the joint space allows a more thorough search that could cover potentially better configurations, its huge size is a barrier to effective search in limited time. Moreover, when only one of the algorithms is considered, the hyper-parameters of the other algorithms are irrelevant, so the joint space is redundant and can even be misleading.

The cascaded algorithm selection can have two levels [?]. The first level is the hyper-parameter optimization. It only needs to focus on the selected algorithm, not on the hyper-parameters of all algorithms. The second level is the algorithm selection. However, previous methods of this kind commonly carry out a full hyper-parameter optimization on each candidate algorithm, which makes algorithm evaluation slow and expensive.

In this paper, we propose a cascaded algorithm selection approach that avoids a full-space hyper-parameter optimization. The hyper-parameter optimization usually employs a stepping search method, which can be paused after every search step and resumed later. The selection level receives feedback and allocates the next search step to one of the algorithms. Thus, the cascaded algorithm selection is naturally modeled as a multi-armed bandit problem [?]. However, most classical bandits maximize the average feedback, whereas in AutoML only the best feedback matters. A variant of the bandit, the extreme bandit [?], can model this situation: it tries to identify the arm with the maximum (or, equivalently, minimum) feedback value. However, because the extreme bandit follows the extreme distribution, it is not only unstable but also often requires knowing the distribution type, which makes the extreme bandit approach impractical.

In this paper, we propose the extreme-region UCB bandit (ER-UCB), which focuses on the extreme region of the feedback distributions. Unlike the extreme bandit, ER-UCB considers a region instead of the extreme point, which leads to better mathematical conditions. Moreover, in machine learning, where the test data commonly differs from the training data, the extreme region can be more robust for generalization. Our analysis proves that ER-UCB has a regret upper bound of the same order as that of the classical UCB strategy. The experiments on synthetic and real AutoML tasks reveal that the ER-UCB can find the best algorithm precisely and exploit it with the majority of the trial budget.

The remaining sections present the background and related work, the extreme-region UCB bandit, the experiments, and the conclusion.

2 Background & Related Works

We consider the algorithm selection and hyper-parameter optimization on classification tasks. Let and denote the training and testing datasets. Let denote the algorithm set with candidates. For , denotes a hyper-parameter setting, where is the hyper-parameter space of . Let denote a performance criterion for a configuration , e.g., accuracy, AUC score, etc. The AutoML problem can be formulated as follows:


where and . It also includes the CASH problem formulation [?].

Because the objective is non-convex, non-continuous and non-differentiable, derivative-free optimization [??] is usually applied to solve it. For example, a tree-structure based Bayesian optimization method (SMAC) [?] is employed in AutoWEKA [?] and AutoSKLEARN [?], two popular open-source AutoML tools. Derivative-free optimization explores the search space by sampling and evaluating, but the high time cost of each evaluation restricts the total number of evaluations in AutoML. With limited trials, the performance of derivative-free optimization is very sensitive to the search space. However, in the above formulation, the search space . Obviously, is redundant, because the best configuration is only relevant to the hyper-parameter space of the best algorithm.
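To make the redundancy argument concrete, the following sketch counts search dimensions for a toy setup. The three algorithms and their hyper-parameter lists are hypothetical and chosen only for illustration; they are not the spaces used in the experiments.

```python
# Hypothetical hyper-parameter spaces (illustrative only, not from the paper).
SPACES = {
    "DecisionTree": ["max_depth", "min_samples_split"],
    "AdaBoost": ["n_estimators", "learning_rate"],
    "KNeighbors": ["n_neighbors", "weights", "p"],
}


def joint_dimension(spaces):
    """Joint space: one categorical algorithm choice plus every
    hyper-parameter of every algorithm."""
    return 1 + sum(len(params) for params in spaces.values())


def relevant_dimension(spaces, algorithm):
    """Dimensions that still matter once `algorithm` has been chosen."""
    return len(spaces[algorithm])


def redundant_fraction(spaces, algorithm):
    """Share of the joint hyper-parameter dimensions irrelevant to `algorithm`."""
    total = sum(len(params) for params in spaces.values())
    return 1 - relevant_dimension(spaces, algorithm) / total


joint = joint_dimension(SPACES)
relevant = relevant_dimension(SPACES, "AdaBoost")
wasted = redundant_fraction(SPACES, "AdaBoost")
```

In this toy case the joint search has 8 dimensions while only 2 of the 7 hyper-parameter dimensions influence an AdaBoost configuration; the gap widens as more candidate algorithms are added.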

Hence, we consider an easier formulation, i.e., optimizing hyper-parameters of algorithms separately:
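For reference, the contrast between the joint and the separated formulations can be written as below. The notation is an assumption made for this sketch (algorithm set $\mathcal{A} = \{A_1, \dots, A_K\}$, hyper-parameter space $\Delta_i$ of $A_i$, performance criterion $f$) and may differ from the symbols used elsewhere in the paper.

```latex
% Joint (CASH-style) search: algorithm and hyper-parameters chosen together
(A^{*}, \delta^{*}) \in \operatorname*{arg\,max}_{A_i \in \mathcal{A},\ \delta \in \Delta_i} f(A_i, \delta)

% Separated view: one inner hyper-parameter optimization per algorithm,
% followed by an outer selection over the K inner optima
i^{*} \in \operatorname*{arg\,max}_{i \in \{1, \dots, K\}} \ \max_{\delta \in \Delta_i} f(A_i, \delta)
```

Both expressions attain the same optimal value; the difference is that each inner problem in the second form only searches the hyper-parameter space of a single algorithm.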


The hyper-parameter optimization processes can be seen as arms, so the algorithm selection level is a multi-armed bandit problem. The bandit is a classical formulation of the resource allocation problem. In [?], the authors formulated cold-start user recommendation as a multi-armed bandit problem, in which user information is unavailable at the beginning and user feedback has to be obtained by trials. In this situation, the bandit is concerned with the average feedback of the arms. In [?], the authors proposed the max k-armed bandit, which focuses on the maximum feedback of the trials. However, it assumes that the reward distribution is Gaussian, and it was designed for heuristic search, in which more than one arm can be selected at a trial step.

In this paper, we customize the extreme-region UCB (ER-UCB) bandit for AutoML problems.

3 Extreme-Region UCB Bandit

In this section, we present details of the ER-UCB: the bandit formulation for AutoML, the deduction of the ER-UCB strategy and the theoretical analysis on the ER-UCB strategy.

3.1 Bandit formulation for AutoML

In the classical multi-armed bandit, the feedbacks of an arm obey an underlying distribution. In this paper, we employ random search for the hyper-parameter optimization. A trial on a model uniformly samples hyper-parameters from , and the resulting performance is the feedback of this trial. Thus, , where denotes a feedback of a trial on , and is the underlying performance distribution of . Because random search always samples from the same distribution, is fixed. With algorithm candidates, let denote the performance distribution set. The -armed bandit formulation for AutoML is: at the -th trial, the is selected from the algorithm candidates, and a feedback is drawn independently from .
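A minimal sketch of this formulation treats each algorithm as an arm whose "pull" is one random-search trial. The class name `RandomSearchArm`, the continuous `bounds` dictionary and the `toy_score` objective are hypothetical stand-ins; in a real task the evaluation would train the algorithm and return its validation performance.

```python
import random


class RandomSearchArm:
    """One bandit arm: random search over the hyper-parameter box of one algorithm."""

    def __init__(self, bounds, evaluate, seed=None):
        self.bounds = bounds        # {name: (low, high)}, assumed continuous ranges
        self.evaluate = evaluate    # callable: hyper-parameter dict -> performance
        self.rng = random.Random(seed)
        self.best_value = float("-inf")
        self.best_config = None

    def pull(self):
        """Run one trial: sample uniformly, evaluate, return the feedback."""
        config = {name: self.rng.uniform(low, high)
                  for name, (low, high) in self.bounds.items()}
        value = self.evaluate(config)
        if value > self.best_value:
            self.best_value, self.best_config = value, config
        return value


def toy_score(config):
    # Stand-in for a validation accuracy; it peaks at x = 0.3.
    return 1.0 - (config["x"] - 0.3) ** 2


arm = RandomSearchArm({"x": (0.0, 1.0)}, toy_score, seed=0)
feedbacks = [arm.pull() for _ in range(50)]
```

Because the sampling distribution never changes, the feedbacks of one arm are independent draws from a fixed distribution, which is the property the bandit formulation relies on.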

3.2 Deduction

In AutoML tasks, the selected algorithm is required to achieve the maximum performance. For this requirement, we present the extreme-region target for the proposed bandit. Then, we show the deduction details of the extreme-region UCB strategy.

Extreme-region target

The target of the hyper-parameter optimization is to find the hyper-parameters which have the maximum performance. In the bandit, with a fixed , we want the probability as large as possible. With the Chebyshev inequality: , let ,


In other words, with the same fixed probability upper bound , the best arm selection is:


With the given and , the ground-truth selection strategy is (4). But, when facing the unknown distributions, we have to estimate the expectation and variance based on the observations. With the Markov inequality, it is easy to relate the expectation with its estimation. But for variance, it is hard to find the relationship. With the variance definition:


Because is the expectation of the random variable , the Markov inequality can be applied to it easily. And can partly represent according to (5). Thus, we try to replace with :


Compared with (4), (6) magnifies the effect of the expectation item on the selection strategy. To tackle this issue, we introduce a hyper-parameter , and construct a new random variable . Furthermore, let , and . Thus, the extreme-region target is:


We prove that it can reduce the effect of expectation on algorithm selection by introducing into :


According to definitions of , and ,


Compared with (4), because of , the expectation item is reduced, while the variance item stays the same. This concludes the proof. ∎
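The chain of steps in this subsection can be spelled out as follows. The symbols are assumptions made for this sketch: $X$ is the feedback of one arm with mean $\mu$ and variance $\sigma^2$, $\beta \in (0,1)$ is the fixed probability bound, $\theta$ is the shift hyper-parameter, and the shifted variables are taken to be $Y = X - \theta$ and $Z = Y^2$. The paper's own notation may differ.

```latex
% Chebyshev's inequality and its one-sided consequence
\Pr\big[\,|X - \mu| \ge \epsilon\,\big] \le \frac{\sigma^{2}}{\epsilon^{2}}
\quad\Longrightarrow\quad
\Pr\big[\,X \ge \mu + \epsilon\,\big] \le \frac{\sigma^{2}}{\epsilon^{2}}

% Fixing the bound to \beta, i.e. \epsilon = \sigma / \sqrt{\beta}
\Pr\Big[\,X \ge \mu + \frac{\sigma}{\sqrt{\beta}}\,\Big] \le \beta

% Variance identity used to replace \sigma by a plain expectation
\sigma^{2} = \mathbb{E}[X^{2}] - \mu^{2}

% Moments of the shifted variables Y = X - \theta and Z = Y^{2}
\mathbb{E}[Y] = \mu - \theta, \qquad
\mathbb{E}[Z] = \mathbb{E}\big[(X - \theta)^{2}\big] = \sigma^{2} + (\mu - \theta)^{2}
```

Under these assumptions, an arm with a larger $\mu + \sigma/\sqrt{\beta}$ has its $\beta$-region located further out. The last line shows why the shift helps: the mean contributes to $\mathbb{E}[Z]$ only through $(\mu - \theta)^2$, which shrinks as $\theta$ approaches $\mu$, while the $\sigma^2$ term is untouched.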

Extreme-region UCB strategy

We apply the upper confidence bound (UCB) strategy to the extreme-region target. In this paper, we assume that the random variables satisfy the following moment condition: there exists a convex function on the reals such that, for all ,


If we let and , (9) is known as Hoeffding’s lemma. We apply this assumption to construct an upper bound for the estimated expectations at some fixed confidence level. Let denote the Legendre-Fenchel transform of . With observations of , let and denote the estimated expectations of and . Only for with a fixed , using the Markov inequality:


The same deduction for , and is a monotonically increasing function:


Because , and let . With the union bound, we combine and as follows:


Let . With the probability at least ,


Within total trials, let denote the number that the -th arm is selected, and . -ER-UCB strategy is:


and are the exploitation and exploration items. With Hoeffding’s lemma, taking , then, . And let . The exploration can be re-written as:
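For the Hoeffding case, the Legendre-Fenchel transform can be computed in closed form. The sketch below assumes feedback in $[0, 1]$ and the standard Hoeffding choice $\psi(\lambda) = \lambda^2 / 8$; the symbol names are assumptions of this note.

```latex
% Legendre-Fenchel transform of \psi(\lambda) = \lambda^{2} / 8
\psi^{*}(\epsilon) = \sup_{\lambda \in \mathbb{R}} \Big( \lambda \epsilon - \frac{\lambda^{2}}{8} \Big)

% The supremum is attained at \lambda = 4\epsilon
\psi^{*}(\epsilon) = 4\epsilon^{2} - 2\epsilon^{2} = 2\epsilon^{2}

% Inverse on the non-negative reals, which turns a confidence level into a radius
(\psi^{*})^{-1}(x) = \sqrt{x / 2}, \qquad x \ge 0
```

The square-root form of the inverse is what produces the familiar $\sqrt{\ln t / T_i}$-type exploration radius in Hoeffding-based UCB strategies.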


Thus, the Hoeffding’s ER-UCB strategy is:


On AutoML tasks, the exploitation item is often much smaller than the exploration item. To further balance the trade-off between exploration and exploitation, we introduce a hyper-parameter . The practical Hoeffding’s ER-UCB strategy is:


Input:
  model candidates;
  hyper-parameter spaces of models;
  hyper-parameters;
  trial budget;
  train dataset of task;
  uniform sample sub-procedure;
  Evaluate: evaluation sub-procedure.

1:  for  to  do
4:  end for
5:  for  to  do
6:     get index according to (17)
11:  end for
12:  return  the hyper-parameters with the best .
Algorithm 1 Extreme-region UCB Bandit

The cascaded algorithm selection and hyper-parameter optimization with the ER-UCB bandit is presented in Algorithm 1. Lines 2 and 7 uniformly sample hyper-parameters for the selected algorithm and obtain the feedback. Lines 1 to 4 are the initialization steps. In the main loop (lines 5 to 10), the algorithm is selected by the ER-UCB strategy (line 6). Lines 7 to 9 update the exploitation item for the selected algorithm.
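The loop described above can be sketched as follows. This is not the authors' implementation: the index form (a `gamma`-weighted exploitation item `mean_y + sqrt(mean_z / beta)` plus a Hoeffding-style exploration item) and the parameter names `theta`, `beta`, `gamma` are assumptions based on the description in this section, and the Gaussian toy arms are hypothetical. Each arm is a zero-argument callable that returns one feedback, standing in for one random-search trial.

```python
import math
import random


def er_ucb_index(mean_y, mean_z, pulls, t, beta, gamma):
    """Assumed index: gamma * exploitation + Hoeffding-style exploration."""
    exploitation = mean_y + math.sqrt(mean_z / beta)
    eps = math.sqrt(math.log(t) / pulls)           # confidence radius
    exploration = eps + math.sqrt(eps / beta)
    return gamma * exploitation + exploration


def er_ucb(arms, budget, theta, beta=0.1, gamma=20.0):
    """Allocate `budget` trials over `arms` and track the best feedback."""
    k = len(arms)
    sum_y, sum_z, pulls = [0.0] * k, [0.0] * k, [0] * k
    best_value, best_arm = float("-inf"), None

    def pull(i):
        nonlocal best_value, best_arm
        x = arms[i]()               # one search step on arm i
        y = x - theta               # shifted feedback
        sum_y[i] += y
        sum_z[i] += y * y           # assumed second variable: squared shifted feedback
        pulls[i] += 1
        if x > best_value:
            best_value, best_arm = x, i

    for i in range(k):              # initialization: one trial per arm
        pull(i)
    for t in range(k + 1, budget + 1):
        scores = [
            er_ucb_index(sum_y[i] / pulls[i], sum_z[i] / pulls[i],
                         pulls[i], t, beta, gamma)
            for i in range(k)
        ]
        pull(max(range(k), key=scores.__getitem__))
    return pulls, best_value, best_arm


# Hypothetical toy arms given as (mean, std) of a Gaussian feedback.
rng = random.Random(0)
toy_params = [(0.50, 0.05), (0.40, 0.20), (0.45, 0.10)]
arms = [lambda m=m, s=s: rng.gauss(m, s) for m, s in toy_params]
pulls, best_value, best_arm = er_ucb(arms, budget=1000, theta=0.45)
```

In this toy setting the second arm has the lowest mean but the widest spread, so an index of this form tends to concentrate the budget on it, whereas a mean-based index would favor the first arm.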

We have to discuss the hyper-parameters, i.e., , and for the ER-UCB bandit. is employed to control the space size of the extreme region. It is usually a small real number, e.g., 0.1 or 0.01. is the exploration-and-exploitation trade-off hyper-parameter. In AutoML tasks, is used to magnify the exploitation item. Thus, it is usually a big number such as 10 or 20. is applied to reduce the impact of expectation item in the selection strategy. It should be tuned according to tasks. In experiments, we will investigate them empirically.

3.3 Theoretical Analysis

We present the analysis of the upper bound for -ER-UCB strategy (3.2) and the Hoeffding’s ER-UCB strategy (15) on the extreme-region regret. For an arbitrary arm and a fixed , we define . Thus, . According to (7), let , thus , and . We assume by choosing an appropriate . The extreme-region regret is given in Definition 1.

Definition 1 (Extreme-region regret).

At -th trial, event A is the number of times that occurs, and event B is the number of times that occurs with a given strategy. The extreme-region regret is:

Introducing and , the extreme-region regret can be re-written as:


We can prove the following simple upper regret bound for -ER-UCB strategy:

Theorem 1 (Regret of -ER-UCB).

Assume the feedback distribution of an arbitrary arm satisfies (9). With , -ER-UCB satisfies:

Due to the limitation of paper length, we present the proof details in our supplementary material. Based on Theorem 1, we can easily prove the extreme-region regret of the Hoeffding’s ER-UCB strategy:

Corollary 1 (Regret of Hoeffding’s ER-UCB).

Assume the feedback distribution of an arbitrary arm satisfies (9). With , Hoeffding’s ER-UCB satisfies:

According to the theoretical analysis, the ER-UCB bandit has an upper bound on the extreme-region regret.

Figure 1: Illustrations of the results for the synthetic experiment. Panels: (a.1) HP study for , (a.2) HP study for , (a.3) HP study for , (b) regret study. Panels a.1, a.2 and a.3 show the ER-UCB hyper-parameter studies, i.e., the exploitation rates under different hyper-parameter settings. In those three panels, the green line is the result of arm , which is the best arm, and the red line is the result of arm , which has the largest feedback expectation. Panel b shows how the number of times event occurs changes as the number of trials increases. The red dashed line is the expectation of the ground-truth strategy, and the green line is the proposed ER-UCB strategy.

4 Experiments

In the experiment section, we empirically investigate the effectiveness of the ER-UCB bandit on synthetic and real-world AutoML tasks. Several state-of-the-art bandit strategies are selected as the compared methods, including the classical UCB (C-UCB) [?], ε-greedy [?], the softmax strategy [?] and a random strategy, which allocates the budget by selecting arms randomly. In addition, we apply random search on the joint hyper-parameter space of all algorithms (Joint) to compare with the cascaded hyper-parameter optimization.
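For orientation, the selection rules of the compared strategies can be sketched in their textbook forms. These are generic versions (UCB1 for C-UCB, standard ε-greedy and Boltzmann softmax), not necessarily the exact variants or settings used in the experiments; the function names and the `epsilon` and `temperature` arguments are choices of this sketch.

```python
import math
import random


def ucb1_select(means, pulls, t):
    """Classical UCB1: empirical mean plus sqrt(2 ln t / T_i)."""
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)


def epsilon_greedy_select(means, epsilon, rng):
    """With probability epsilon explore uniformly, otherwise pick the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=means.__getitem__)


def softmax_probabilities(means, temperature):
    """Boltzmann distribution over arms, computed in a numerically stable way."""
    peak = max(means)
    weights = [math.exp((m - peak) / temperature) for m in means]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(means, temperature, rng):
    """Sample an arm according to the Boltzmann distribution."""
    probs = softmax_probabilities(means, temperature)
    return rng.choices(range(len(means)), weights=probs, k=1)[0]


def random_select(n_arms, rng):
    """Uniformly random allocation."""
    return rng.randrange(n_arms)
```

All three learned rules are driven by the empirical mean of each arm, which is the property the experiments contrast with the extreme-region target.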

4.1 Synthetic problem

We construct a 7-armed bandit problem in this section. The feedbacks obey Gaussian distributions with different expectations and variances: , , , , , , . The best arm is related not only to the expectation, but also to the variance. Obviously, it is more likely to obtain the best feedback by exploiting , in other words, . We first study the three hyper-parameters of ER-UCB, and then compare the ER-UCB with other methods.
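The point that the best arm depends on variance as well as expectation can be checked numerically. The two Gaussian arms below are hypothetical, not the seven arms of the synthetic problem, and the score `mean + std / sqrt(beta)` is the Chebyshev-style quantity assumed in this note.

```python
import math


def upper_tail(mean, std, threshold):
    """P[X >= threshold] for X ~ N(mean, std^2)."""
    return 0.5 * math.erfc((threshold - mean) / (std * math.sqrt(2.0)))


def chebyshev_score(mean, std, beta):
    """Location of the beta-region boundary under the Chebyshev bound."""
    return mean + std / math.sqrt(beta)


# Hypothetical arms: (mean, std).
arms = {"high_mean": (0.50, 0.05), "high_variance": (0.40, 0.20)}
threshold = 0.7

tails = {name: upper_tail(m, s, threshold) for name, (m, s) in arms.items()}
scores = {name: chebyshev_score(m, s, beta=0.1) for name, (m, s) in arms.items()}
```

Here the arm with the lower mean is far more likely to produce a feedback above 0.7, and it also receives the larger Chebyshev-style score, so a strategy ranking arms by mean alone would pick the wrong one.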

Hyper-parameter study

We investigate the , and for the ER-UCB. Fixing two of them, we study the third: with fixed , , we study ; with fixed , , we study ; with fixed , , we study . For every hyper-parameter, we evenly sample 1000 settings from the setting region. The core question is how the methods allocate the budget to the arms. Let define the exploitation rate for arm . A large means that the arm is selected in a large number of trials. The trial budget is set to 1000. The experiment for every hyper-parameter setting is repeated 3 times independently, and the average results are presented.
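A small helper makes the exploitation rate concrete. It assumes the rate of an arm is the fraction of trials in which that arm was selected, which matches the description above; the function name and the example selection sequence are hypothetical.

```python
from collections import Counter


def exploitation_rates(selections, n_arms):
    """Fraction of trials allocated to each arm (assumed definition)."""
    if not selections:
        raise ValueError("at least one trial is required")
    counts = Counter(selections)
    total = len(selections)
    return [counts.get(i, 0) / total for i in range(n_arms)]


# Eight trials over three arms; arm 0 is selected six times.
rates = exploitation_rates([0, 0, 1, 0, 2, 0, 0, 0], n_arms=3)
```

The rates sum to one, so a rate close to 1 for the best arm means that almost the whole budget was spent exploiting it.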

Figures 1:a.1, a.2 and a.3 show the study results of , and . The arm is the best selection. Thus, the larger the better. For (Figure 1:a.1), the green line () approaches 1 as approaches 0. In practice, should be set to a small value. For (Figure 1:a.2), when is small, the exploitation rates of the arms are similar, and the green line rises as increases. This means that a small encourages exploration and a large encourages exploitation according to the observations. For (Figure 1:a.3), the exploitation rates are sensitive to when is around the expectations of the reward distributions. Thus, should be carefully tuned according to the task.

Strategy    Avg. best feedback   Best-feedback arm   Most-budget arm   Exploitation rate (best arm)
ER-UCB           ± 0.02          1,1,1               1,1,1                  ± 0.01
C-UCB       0.94 ± 0.01          1,7,6               7,7,7             0.01 ± 0.01
ε-Greedy    0.98 ± 0.04          6,1,1               6,1,6             0.31 ± 0.42
Softmax     1.01 ± 0.01          1,1,1               7,7,1             0.18 ± 0.01
Random      1.00 ± 0.05          1,1,1               4,1,6             0.15 ± 0.01
Table 1: The performance summary of the compared bandit strategies on the synthetic problem. is the average best feedback over three independent runs. is the index of the arm that produced the best feedback in each run. is the index of the arm to which the strategy allocates the most budget in each run. The number in bold means the best performance.

Investigation with compared methods

According to the hyper-parameter study results of the ER-UCB, we set , , , and compare it with the C-UCB, ε-greedy (), Softmax strategy () and random selection strategy. The trial budget is 1000. Every experiment is repeated 3 times independently. The average performances are presented in Table 1.

Table 1 shows that the ER-UCB outperforms the compared methods. Furthermore, the ER-UCB can find the best arm (arm ) and allocate most of the budget to it ( and average is 0.9). Because the C-UCB depends only on mean observations to make decisions, it wrongly allocates the budget to arm (). The of ε-greedy is very unstable, which means that ε-greedy cannot find the best arm effectively. In general, the ER-UCB can effectively discover the best arm and reasonably allocate the budget between exploration and exploitation in this synthetic problem.

4.2 Real-world AutoML tasks

We apply the ER-UCB to real-world classification tasks. We select 10 frequently used algorithms from SKLEARN [?] as the candidates: DecisionTree (DT), AdaBoost (Ada), QuadraticDiscriminantAnalysis (QDA), GaussianNB (GNB), BernoulliNB (BNB), K-Neighbors (KN), ExtraTree (ET), PassiveAggressive (PA), RandomForest (RF) and SGD. Twelve classification datasets from UCI are selected as AutoML tasks. The evaluation criterion of each configuration is the accuracy score. The compared methods are C-UCB, ε-greedy (), Softmax strategy (), the random strategy and Joint. The trial budget is 1000. We set , for the ER-UCB on all datasets. The is set according to the task and shown in Table 2. For each method and each dataset, we run every experiment 3 times independently, and the average performances are presented. In addition, we apply random search with 1000 trials on every algorithm candidate. According to (2), this identifies the ground-truth best algorithm for each dataset.

Dataset  Methods   V-Eval  Expl. rate  B. Alg.  T-Eval | Dataset  Methods   V-Eval  Expl. rate  B. Alg.  T-Eval
         C-UCB     .8924   .1067       PA       .8339  |          C-UCB     .8690   .1173       RF       .6416
         ε-Greedy  .8931   .0693       SGD      .8227  |          ε-Greedy  .8630   .0277       RF       .6657
         Softmax   .9004   .1287       SGD      .8809  |          Softmax   .8620   .1163       DT       .6551
         Random    .8978   .1097       SGD      .8597  |          Random    .8628   .1053       RF       .6839
         Joint     .8978   -           SGD      .8994  |          Joint     .8619   -           RF       .8604
         ER-UCB                        Ada      .8515
         C-UCB     .9414   .1693       Ada             |          C-UCB     .7172   .1117       ET       .5534
         ε-Greedy  .9492   .3593       Ada      .7510  |          ε-Greedy  .6528   .0443       ET       .5006
         Softmax   .9414   .1290       PA       .8229  |          Softmax   .6866   .0977       ET       .5749
         Random    .9464   .1070       PA       .6999  |          Random    .6977   .0990       ET       .5534
         Joint     .9457   -           SGD      .8479  |          Joint     .6531   -           QDA      .5687
         C-UCB     .8745   .2440       RF       .8333  |          C-UCB     .7265   .1520       RF       .6370
         ε-Greedy  .8129   .3023       RF       .8809  |          ε-Greedy  .7163   .0017       RF       .6148
         Softmax   .8728   .1630       RF       .8809  |          Softmax   .7197   .1180       RF       .6148
         Random    .8695   .1037       RF       .8762  |          Random    .7247   .1027       RF       .6666
         Joint     .8549   -           RF       .8714  |          Joint     .7087   -           RF       .6444
         ER-UCB                        SGD      .7330
         C-UCB     .7297   .1133       SGD      .7272  |          C-UCB     .7871   .1370       RF       .6688
         ε-Greedy  .7362   .0157       SGD      .7272  |          ε-Greedy  .7201   .0607       Ada      .6457
         Softmax   .7402   .1163       SGD             |          Softmax   .8010   .1150       Ada      .6238
         Random    .7406   .1027       SGD      .7330  |          Random    .8039   .1100       Ada      .6631
         Joint     .7399   -           SGD      .7316  |          Joint     .7884   -           Ada      .6304
         ER-UCB                        Ada      .9449
         C-UCB     .9298   .1757       Ada      .9457  |          C-UCB     .9779   .1950       RF       .9696
         ε-Greedy  .9311   .7333       Ada             |          ε-Greedy  .9790   .8047       RF       .9703
         Softmax   .9298   .1253       Ada      .9471  |          Softmax   .9768   .1397       RF       .9660
         Random    .9306   .1057       Ada      .9500  |          Random    .9776   .1203       RF       .9696
         Joint     .9290   -           RF       .9428  |          Joint     .9793   -           RF       .9464
         ER-UCB                        Ada      .9681
         C-UCB     .9808   .1397       Ada      .9710  |          C-UCB     .9820   .1200       RF       .9320
         ε-Greedy  .9816   .8757       Ada      .9681  |          ε-Greedy  .9813   .3567       Ada      .9427
         Softmax   .9794   .1207       Ada             |          Softmax   .9819   .1097       RF       .9267
         Random    .9794   .1060       Ada             |          Random    .9821   .1103       RF       .9347
         Joint     .9794   -           Ada      .9594  |          Joint     .9821   -           DT       .9320
Table 2: The average performances on AutoML tasks, including the best validation accuracy (V-Eval), the exploitation rate on the ground-truth best algorithm (), the best-selected algorithm (B. Alg.) and the test accuracy (T-Eval). The items under the dataset name are the ground-truth algorithm and the setting for the ER-UCB. The number in bold means the best performance among the compared methods.

The average performances of the compared methods on all 12 datasets are presented in Table 2. From Table 2, we can get the following empirical conclusions:

  • “No free lunch” is observed again in these experiments: the best-performing algorithms differ across datasets. In particular, tree-based ensemble algorithms, e.g., AdaBoost and RandomForest, show outstanding performance on most of the datasets. This indicates that algorithm selection is necessary to make the hyper-parameter search easier.

  • The cascaded algorithm selection and hyper-parameter optimization makes the search problem easier to solve. Comparing the random strategy with Joint, the random strategy beats Joint on most of the datasets (8/12). This indicates that the larger search space makes optimization more difficult.

  • Selecting algorithms only according to average performance can mislead the strategy into choosing wrong algorithms. In Table 2, the random strategy is not always bad. Strategies that focus on the average performance, such as C-UCB, ε-greedy and Softmax, tend to select wrong algorithms whose average performances are good.

  • The proposed ER-UCB bandit strategy can effectively find the best-performing algorithm (B. Alg. is the ground-truth algorithm on 9/12 datasets) and reasonably allocate the trial budget to the best algorithm (ER-UCB gets the highest on 12/12 datasets).

5 Conclusion

This paper proposes the extreme-region upper confidence bound (ER-UCB) bandit for cascaded algorithm selection and hyper-parameter optimization. We employ random search at the hyper-parameter optimization level. The algorithm selection level is formulated as a multi-armed bandit problem, in which a bandit strategy allocates the limited search budget to the hyper-parameter optimization processes of the algorithm candidates. However, algorithm selection is concerned with the algorithm that achieves the maximum performance, not the best average performance. To tackle this, we propose the extreme-region UCB (ER-UCB) strategy, which selects the arm with the largest extreme region of the underlying distribution. The theoretical study shows that the ER-UCB has an extreme-region regret upper bound of the same order as that of the classical UCB strategy. The experiments on synthetic and real-world AutoML problems empirically verify that the ER-UCB can precisely discover the algorithm with the best performance and reasonably allocate the trial budget to the algorithm candidates.


  • [Adankon and Cheriet, 2009] Mathias M Adankon and Mohamed Cheriet. Model selection for the LS-SVM. Application to handwriting recognition. Pattern Recognition, 42(12):3264–3270, 2009.
  • [Auer et al., 2002] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
  • [Biem, 2003] Alain Biem. A model selection criterion for classification: Application to HMM topology optimization. In Proceedings of the 7th International Conference on Document Analysis and Recognition, pages 104–108, 2003.
  • [Brazdil et al., 2003] Pavel B Brazdil, Carlos Soares, and Joaquim Pinto Da Costa. Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning, 50(3):251–277, 2003.
  • [Bubeck et al., 2012] Sébastien Bubeck, Nicolo Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
  • [Carpentier and Valko, 2014] Alexandra Carpentier and Michal Valko. Extreme bandits. In Advances in Neural Information Processing Systems, pages 1089–1097, 2014.
  • [Cicirello and Smith, 2005] Vincent A Cicirello and Stephen F Smith. The max k-armed bandit: A new model of exploration applied to search heuristic selection. In Proceedings of the 20th AAAI Conference on Artificial Intelligence, pages 1355–1361, 2005.
  • [Felício et al., 2017] Crícia Z Felício, Klérisson VR Paixão, Celia AZ Barcelos, and Philippe Preux. A multi-armed bandit model selection for cold-start user recommendation. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pages 32–40. ACM, 2017.
  • [Feurer et al., 2015] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost  , Manuel Blum, and Frank Hutter. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems, pages 2962–2970, 2015.
  • [Hu et al., 2017] Yi-Qi Hu, Hong Qian, and Yang Yu. Sequential classification-based optimization for direct policy search. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 2029–2035, 2017.
  • [Hu et al., 2018] Yi-Qi Hu, Yang Yu, and Zhi-Hua Zhou. Experienced optimization with reusable directional model for hyper-parameter search. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 2276–2282, 2018.
  • [Hu et al., 2019] Yi-Qi Hu, Yang Yu, Wei-Wei Tu, Qiang Yang, Yuqiang Chen, and Wenyuan Dai. Multi-fidelity automatic hyper-parameter tuning via transfer series expansion. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019.
  • [Hutter et al., 2011] Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. LION, 5:507–523, 2011.
  • [Jamieson and Talwalkar, 2016] Kevin G Jamieson and Ameet Talwalkar. Non-stochastic best arm identification and hyperparameter optimization. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 240–248, 2016.
  • [Pedregosa et al., 2011] Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Edouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • [Sutton and Barto, 2018] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
  • [Thornton et al., 2013] Chris Thornton, Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Auto-weka: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 847–855, 2013.
  • [Tokic and Palm, 2011] Michel Tokic and Günther Palm. Value-difference based exploration: adaptive control between epsilon-greedy and softmax. In Annual Conference on Artificial Intelligence, pages 335–346. Springer, 2011.
  • [Yu et al., 2016] Yang Yu, Hong Qian, and Yi-Qi Hu. Derivative-free optimization via classification. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, pages 2286–2292, 2016.