A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption
Abstract
We study the problem of optimizing a function under a budgeted number of evaluations. We only assume that the function is locally smooth around one of its global optima. The difficulty of optimization is measured in terms of 1) the amount of noise $b$ of the function evaluation and 2) the local smoothness, $d$, of the function. A smaller $d$ results in a smaller optimization error. We come with a new, simple, and parameter-free approach. First, for all values of $b$ and $d$, this approach recovers at least the state-of-the-art regret guarantees. Second, our approach additionally obtains these results while being agnostic to the values of both $b$ and $d$. This leads to the first algorithm that naturally adapts to an unknown range of noise $b$ and leads to significant improvements in a moderate- and low-noise regime. Third, our approach also obtains a remarkable improvement over the state-of-the-art SOO algorithm when the noise is very low, which includes the case of optimization under deterministic feedback ($b = 0$). There, under our minimal local smoothness assumption, this improvement is of exponential magnitude and holds for a class of functions that covers the vast majority of functions that practitioners optimize ($d = 0$). We show that our algorithmic improvement is also borne out in the numerical experiments, where we empirically show faster convergence on common benchmark functions.
Bartlett, Gabillon, Valko
Editors: Satyen Kale and Aurélien Garivier
Keywords: optimization, tree search, deterministic feedback, stochastic feedback
1 Introduction
In budgeted function optimization, a learner optimizes a function $f : \mathcal{X} \to \mathbb{R}$ having access to a number of evaluations limited by $n$. For each of the $n$ evaluations (or rounds), at round $t$, the learner picks an element $x_t \in \mathcal{X}$ and observes a real number $y_t = f(x_t) + \varepsilon_t$, where $\varepsilon_t$ is the noise. Based on the nature of $\varepsilon_t$, we distinguish two feedback cases:
 Deterministic feedback

The evaluations are noiseless, that is, $\varepsilon_t = 0$ and $y_t = f(x_t)$. Please refer to the work by de Freitas et al. (2012) for a motivation, many applications, and references on the importance of this setting.
 Stochastic feedback

The evaluations are perturbed by a noise of range $b$:^{1} at any round, $\varepsilon_t$ is a random variable, assumed to be independent of the noise at previous rounds, such that

$$\mathbb{E}[y_t \mid x_t] = f(x_t) \quad \text{and} \quad |\varepsilon_t| \le b. \qquad (1)$$

^{1} Alternatively, we can turn the boundedness assumption into a sub-Gaussianity assumption equipped with a variance parameter equivalent to our range $b$.
The objective of the learner is to return an element $x(n)$ with the largest possible value after the $n$ evaluations. Note that $x(n)$ can be different from the last evaluated element $x_n$. More precisely, the performance of the algorithm is measured by the loss (or simple regret)

$$r_n = \sup_{x \in \mathcal{X}} f(x) - f(x(n)).$$
We consider the case where evaluations are costly. Therefore, we minimize $r_n$ as a function of $n$. We assume that there exists at least one point $x^\star \in \mathcal{X}$ such that $f(x^\star) = \sup_{x \in \mathcal{X}} f(x)$.
Prior work
Among the large body of work on optimization, we focus on algorithms that perform well under minimal assumptions as well as minimal knowledge about the function. Relying on minimal assumptions means that we target functions that are particularly hard to optimize. For instance, we may not have access to the gradients of the function, gradients might not be well defined, or the function may not be continuous. While some prior works assume a global smoothness of the function (Pintér, 2013; Strongin and Sergeyev, 2013; Hansen and Walster, 2003; Kearfott, 2013), another line of research assumes only a weak/local smoothness around one global maximum (Kleinberg et al., 2008; Bubeck et al., 2011a). However, within this latter group, some algorithms require the knowledge of the local smoothness, such as HOO (Bubeck et al., 2011a), Zooming (Kleinberg et al., 2008), or DOO (Munos, 2011). Among the works relying on an unknown local smoothness, SOO (Munos, 2011; Kawaguchi et al., 2016) represents the state of the art for the deterministic feedback. For the stochastic feedback, StoSOO (Valko et al., 2013) extends SOO for a limited class of functions. POO (Grill et al., 2015) provides more general results. We classify the most related algorithms in the following table.
smoothness | deterministic feedback | stochastic feedback
-----------+------------------------+------------------------
known      | DOO                    | Zooming, HOO
unknown    | DiRect, SOO, SequOOL   | StoSOO, POO, StroquOOL
Note that, for more specific assumptions on the smoothness, some works study optimization without the knowledge of smoothness: DiRect (Jones et al., 1993) and others (Slivkins, 2011; Bubeck et al., 2011b; Malherbe and Vayatis, 2017) tackle Lipschitz optimization.
Finally, there are algorithms that, instead of the simple regret, optimize the cumulative regret, for example, HOO (Bubeck et al., 2011a) or HCT (Azar et al., 2014). However, none of them adapts to the unknown smoothness and, compared to them, the algorithms for simple regret that are able to do that, such as POO or our StroquOOL, need to explore significantly more, which negatively impacts their cumulative regret (Grill et al., 2015; Locatelli and Carpentier, 2018).
Existing tools
Partitioning and near-optimality dimension: As in most of the previously mentioned work, the search domain $\mathcal{X}$ is partitioned into cells at different scales (depths): at a deeper depth, the cells are smaller but still cover all of $\mathcal{X}$. The objective of many algorithms is to explore the value of $f$ in the cells of the partition and determine, at the deepest depth possible, in which cell a global maximum of the function lies. The notion of near-optimality dimension $d$ characterizes the complexity of the optimization task. We adopt the definition of near-optimality dimension given recently by Grill et al. (2015) that, unlike Bubeck et al. (2011a), Valko et al. (2013), Munos (2011), and Azar et al. (2014), avoids topological notions and does not artificially attempt to separate the difficulty of the optimization from the partitioning. For each depth $h$, it simply counts the number of near-optimal cells $\mathcal{N}_h$, cells whose value is close to $f(x^\star)$, and determines how this number evolves with the depth $h$. The smaller $d$, the more accurate the optimization should be.
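As an illustration of this counting, the following sketch (ours, not from the paper; a toy objective and a binary partition of $[0,1]$ are assumptions made for the example) counts the near-optimal cells per depth for a function whose count stays bounded by a constant, i.e., a function with near-optimality dimension $d = 0$:

```python
import math

def f(x):
    # Toy objective on [0, 1] with a single global maximum at x* = 0.5.
    return 1.0 - abs(x - 0.5)

def near_optimal_cells(f, depth, eps, samples_per_cell=64):
    """Count cells at the given depth of a binary partition of [0, 1]
    whose (approximate) maximum is within eps of the global maximum."""
    f_star = 1.0  # known optimum of the toy function
    n_cells = 2 ** depth
    count = 0
    for i in range(n_cells):
        lo, hi = i / n_cells, (i + 1) / n_cells
        cell_max = max(f(lo + (hi - lo) * j / (samples_per_cell - 1))
                       for j in range(samples_per_cell))
        if cell_max >= f_star - eps:
            count += 1
    return count

# With nu = 1 and rho = 1/2, the cells counted at depth h are those within
# eps = 3 * nu * rho^h of the optimum; here the count stays bounded by a
# constant for every depth, which is exactly the d = 0 regime.
for h in range(1, 11):
    print(h, near_optimal_cells(f, h, 3.0 * 0.5 ** h))
```

For this toy function, the interval of near-optimal points has width $6\nu\rho^h$ while cells have width $\rho^h$, so at most a constant number of cells ever qualifies, independently of $h$.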
New challenges
Adaptation to different data complexities: As did Bubeck and Slivkins (2012), Seldin and Slivkins (2014), and De Rooij et al. (2014) in other contexts, we design algorithms that demonstrate near-optimal behavior under data-generating processes of different natures, obtaining the best of all these possible worlds. In this paper, we consider the two following data complexities for which we bring new, improved adaptation.


near-optimality dimension $d = 0$: In this case, the number of near-optimal cells is simply bounded by a constant that does not depend on the depth $h$. As shown by Valko et al. (2013), if the function is lower- and upper-bounded by two polynomial envelopes of the same order around a global optimum, then $d = 0$. As discussed in the book by Munos (2014, Section 4.2.2), $d = 0$ covers the vast majority of functions that practitioners optimize, and the functions with $d > 0$ given as examples in prior work (Bubeck et al., 2011b; Grill et al., 2015; Valko et al., 2013; Munos, 2011) are carefully engineered. Therefore, the case of $d = 0$ is of practical importance. However, even with deterministic feedback, the case $d = 0$ with unknown smoothness has not been known to have a learner with a near-optimal guarantee. In this paper, we also provide that. Our approach not only adapts very well to the case $d = 0$, it also provides an exponential improvement over the state of the art for the simple regret rate.

low or moderate noise regime: When facing noisy feedback, most algorithms assume that the noise is of a known predefined range, often hard-coded in their use of upper confidence bounds. Therefore, they cannot take advantage of low-noise scenarios. Our algorithms have a regret that scales with the range of the noise $b$, without a prior knowledge of $b$. Furthermore, our algorithms ultimately recover the new improved rate of the deterministic feedback suggested in the preceding case ($d = 0$).
Main results
Improved theoretical results and empirical performance: We consider optimization under an unknown local smoothness. We design two algorithms: SequOOL for the deterministic case in Section 3 and StroquOOL for the stochastic one in Section 4.


SequOOL is the first algorithm to obtain an exponentially decreasing loss under such a minimal assumption with deterministic feedback. The previously known SOO (Munos, 2011) is only proved to achieve a loss of order $e^{-\sqrt{n}}$. Therefore, SequOOL achieves, up to log factors, the result of DOO, which knows the smoothness. Note that Kawaguchi et al. (2016) designed a new version of SOO, called LOGO, that gives more flexibility in exploring more local scales, but it was still only shown to achieve a loss of order $e^{-\sqrt{n}}$ despite the introduction of a new parameter. An exponentially decreasing regret had previously only been achieved in settings with more assumptions (de Freitas et al., 2012; Malherbe and Vayatis, 2017; Kawaguchi et al., 2015). For example, de Freitas et al. (2012) achieve an exponentially decreasing regret under several assumptions, for example, that the function is sampled from a Gaussian process with a four-times-differentiable kernel along the diagonal. The consequence of our results is that to achieve an exponentially decreasing rate, none of these strong assumptions is necessary.

StroquOOL recovers, in the stochastic feedback case, up to log factors, the results of POO under the same assumption. However, as discussed later, StroquOOL is a simpler approach than POO, with an associated simpler analysis.

StroquOOL adapts naturally to different noise ranges, i.e., the various values of $b$.

StroquOOL obtains the best of both worlds in the sense that StroquOOL also obtains, up to log factors, the new optimal rates reached by SequOOL in the deterministic case. StroquOOL obtains this result without being aware a priori of the nature of the data, at the cost of only an additional log factor. Therefore, if we neglect the additional log factor, we can just have a single algorithm, StroquOOL, that performs well in both the deterministic and stochastic cases, without the knowledge of the smoothness in either one of them.

In the numerical experiments, StroquOOL naturally adapts to lower noise. SequOOL obtains an exponential regret decay when $d = 0$ on common benchmark functions.
Algorithmic contributions and originality of the proofs
Why does it work? Both SequOOL and StroquOOL are simple and parameter-free algorithms. The analysis is also simple and self-contained and does not need to rely on results of other algorithms that know the smoothness. We now explain the reason behind this combined simplicity and efficiency.
Both SequOOL and StroquOOL are based on a new core idea: the search for the optimum should progress strictly sequentially, from an exploration of shallow depths (with large cells) to deeper depths (small and localized cells). This is different from the standard approach in SOO, StoSOO, and the numerous extensions that SOO has inspired (Busoniu et al., 2013; Wang et al., 2014; Al-Dujaili and Suresh, 2018; Qian and Yu, 2016; Kasim and Norreys, 2016; Derbel and Preux, 2015; Preux et al., 2014; Buşoniu and Morărescu, 2014; Kawaguchi et al., 2016). We have identified a bottleneck in SOO (Munos, 2011) and its extensions, which open all depths simultaneously (their key lemma). In general, we show that the improved exploration of the shallow depths is beneficial for the deeper depths and therefore, we always complete the exploration of depth $h$ before going to depth $h + 1$. As a result, we design a more sequential approach that simplifies our key lemma to the point of being natural and straightforward.
This desired simplicity is also achieved by being the first to adequately leverage the reduced and natural set of assumptions introduced in the POO paper (Grill et al., 2015). This adequate and simple leveraging should not conceal the fact that our local smoothness assumption is minimal and already far weaker than global Lipschitzness. Moreover, this leveraging was absent in the analysis of POO, which additionally relies on the 40-page proof of HOO (see Shang et al., 2018, for a detailed discussion). Our proofs are succinct^{2} while obtaining a performance improvement (for $d = 0$) and a new adaptation (to the noise range $b$). To obtain these, in an original way, our theorems are now based on solving a transcendental equation with the Lambert $W$ function. For StroquOOL, a careful discrimination of the parameters of the equation leads to optimal rates both in the deterministic and the stochastic case.

^{2} The proof is even redundantly written twice, for StroquOOL and SequOOL, for completeness.
Intriguingly, the number of evaluations allocated to each depth follows a Zipf law (Powers, 1998), that is, each depth level is simply pulled inversely proportionally to its depth index $h$. This is a simple but not straightforward idea. It provides a parameter-free method to explore the depths without knowing the bound on the number of near-optimal cells per depth (the constant $C$ when $d = 0$) and obtains a maximal optimal depth of order $n/\log n$. A Zipf law has been used by Audibert et al. (2010) and Abbasi-Yadkori et al. (2018) in pure-exploration bandit problems, but without any notion of depth in the search. In this paper, we introduce the Zipf law to tree-search algorithms.
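The Zipf-law allocation can be sketched as follows (a hypothetical helper written for illustration; it uses $h_{\max} = \lfloor n/\overline{\log}\, n \rfloor$ openings at depth 1 and $\lfloor h_{\max}/h \rfloor$ at depth $h$, while the exact constants in the paper may differ):

```python
def harmonic(n):
    # n-th harmonic number, written as a log-bar in the text,
    # with log(n) <= harmonic(n) <= log(n) + 1.
    return sum(1.0 / t for t in range(1, n + 1))

def zipf_allocation(n):
    """Openings per depth under a Zipf law: depth h receives
    floor(h_max / h) openings, with h_max = floor(n / harmonic(n))."""
    h_max = int(n // harmonic(n))
    return h_max, [h_max // h for h in range(1, h_max + 1)]

h_max, openings = zipf_allocation(1000)
# The total stays within the budget n, since
# sum_h floor(h_max / h) <= h_max * harmonic(h_max) <= n.
print(h_max, sum(openings))
```

Note how the deepest explored depth $h_{\max}$ is of order $n/\log n$, far beyond the $\sqrt{n}$ depth used by SOO, while the budget constraint is still respected.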
Another novelty is not using upper confidence bounds in StroquOOL (unlike StoSOO, HCT, HOO, and POO), which results in the contribution of removing the need to know the noise amplitude.
2 Partition, tree, assumption, and near-optimality dimension
Partitioning
The hierarchical partitioning we consider is similar to the ones introduced in prior work (Munos, 2011; Valko et al., 2013): For any depth $h \ge 0$ in the tree representation, the set of cells (or nodes) $\{\mathcal{P}_{h,i}\}_{1 \le i \le I_h}$ forms a partition of $\mathcal{X}$, where $I_h$ is the number of cells at depth $h$. At depth $0$, the root of the tree, there is a single cell $\mathcal{P}_{0,1} = \mathcal{X}$. A cell of depth $h$ is split into children subcells of depth $h + 1$. As Grill et al. (2015), our work focuses on a notion of near-optimality dimension that does not directly relate the smoothness property of $f$ to a specific metric but directly to the hierarchical partitioning $\mathcal{P} = \{\mathcal{P}_{h,i}\}$. Indeed, an interesting fundamental question is to determine a good characterization of the difficulty of the optimization for an algorithm that uses a given hierarchical partitioning of the space as its input (see Grill et al., 2015, for a detailed discussion). Given a global maximum $x^\star$ of $f$, $i^\star_h$ denotes the index of the unique cell of depth $h$ containing $x^\star$, i.e., such that $x^\star \in \mathcal{P}_{h, i^\star_h}$. We follow the work by Grill et al. (2015) and state a single assumption on both the partitioning $\mathcal{P}$ and the function $f$.
Assumption 1
For any global optimum $x^\star$, there exists $\nu > 0$ and $\rho \in (0, 1)$ such that $\forall h \ge 0$, $\forall x \in \mathcal{P}_{h, i^\star_h}$,

$$f(x) \ge f(x^\star) - \nu \rho^h.$$
For any $\nu > 0$ and $C > 1$, the near-optimality dimension^{3} $d(\nu, C, \rho)$ of $f$ with respect to the partitioning $\mathcal{P}$, and with associated constant $C$, is

$$d(\nu, C, \rho) \triangleq \inf\left\{ d' \ge 0 : \forall h \ge 0,\ \mathcal{N}_h(3\nu\rho^h) \le C \rho^{-d'h} \right\},$$

where $\mathcal{N}_h(\varepsilon)$ is the number of cells $\mathcal{P}_{h,i}$ of depth $h$ such that $\sup_{x \in \mathcal{P}_{h,i}} f(x) \ge f(x^\star) - \varepsilon$.

^{3} Grill et al. (2015) define $d$ with the constant 2 instead of 3. Using 3 eases the exposition of our results.
Tree-based learner
Tree-based exploration, or tree search, is a classical approach that has been widely applied to optimization as well as bandits or planning (Kocsis and Szepesvári, 2006; Coquelin and Munos, 2007; Hren and Munos, 2008); see Munos (2014) for a survey. At each round, the learner selects a cell $\mathcal{P}_{h,i}$ containing a predefined representative element $x_{h,i}$ and asks for its evaluation. We denote its value $f_{h,i} = f(x_{h,i})$. $T_{h,i}$ denotes the total number of evaluations allocated by the learner to the cell $\mathcal{P}_{h,i}$. Our learners collect the evaluations of $f$ and organize them in a tree structure $\mathcal{T}$ that is simply a subset of $\mathcal{P}$. We define, specially for the noisy case, the estimated value $\hat{f}_{h,i}$ of the cell $\mathcal{P}_{h,i}$ as the empirical average of the rewards obtained at this cell. We say that the learner opens a cell $\mathcal{P}_{h,i}$ with $m$ evaluations if it asks for $m$ evaluations from each of the children cells of cell $\mathcal{P}_{h,i}$. In the deterministic feedback, $\hat{f}_{h,i} = f_{h,i}$. For the sake of simplicity, the bounds reported in this paper are in terms of the total number of openings $n$, instead of evaluations. The number of function evaluations is upper bounded by $Kn$, where $K$ is the maximum number of children cells of any cell in $\mathcal{P}$.
The Lambert $W$ function. Our results use the Lambert $W$ function. Solving, for the variable $w$, the equation $w e^w = y$ gives $w = W(y)$. $W$ is multivalued for $y \in (-1/e, 0)$. However, in this paper, we consider $y \ge 0$, where the branch $W$, referred to as the standard one, is uniquely defined; it cannot be expressed in terms of elementary functions. Yet, for $y \ge e$, we have $\log y - \log \log y \le W(y) \le \log y - \frac{1}{2}\log \log y$ (Hoorfar and Hassani, 2008). $W$ has applications in physics and applied mathematics (Corless et al., 1996).
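Although the standard branch has no closed form, it is easy to evaluate numerically. The sketch below (ours, for illustration only) solves $w e^w = y$ by Newton's method and can be checked against the Hoorfar–Hassani bounds:

```python
import math

def lambert_w(y, iters=50):
    """Standard branch W(y) for y >= 0, solving w * exp(w) = y
    by Newton's method."""
    w = math.log(1.0 + y)  # reasonable starting point for y >= 0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - y) / (ew * (1.0 + w))
    return w

# Sanity check: W(y) * exp(W(y)) = y, and for y >= e the bounds of
# Hoorfar and Hassani (2008):
#     log y - log log y <= W(y) <= log y - (1/2) log log y.
y = 50.0
w = lambert_w(y)
print(w, w * math.exp(w))
```

For $y = 50$, the bounds give roughly $2.55 \le W(50) \le 3.23$, consistent with the computed value near $2.86$.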
Finally, $\log_K$ denotes the logarithm in base $K$ and, without a subscript, $\log$ denotes the natural logarithm in base $e$.
3 Adaptive deterministic optimization and improved rate
3.1 The SequOOL algorithm
The Sequential Optimistic Optimization aLgorithm, SequOOL, is described in Figure 1. SequOOL explores the depths sequentially, one by one, going deeper and deeper with a decreasing number of cells opened per depth: $\lfloor h_{\max}/h \rfloor$ openings at depth $h$, where $h_{\max}$ is the maximal depth that is opened. The analysis of SequOOL shows that it is relevant to set $h_{\max} = \lfloor n / \overline{\log}\, n \rfloor$, where $\overline{\log}\, n$ is the $n$-th harmonic number, $\overline{\log}\, n \triangleq \sum_{t=1}^{n} 1/t$, with $\log n \le \overline{\log}\, n \le \log n + 1$ for any positive integer $n$. SequOOL returns the element of the evaluated cell with the highest value, $x(n) = \arg\max_{x_{h,i} \in \mathcal{T}} f_{h,i}$. The per-depth budget is set as above to preserve the simplicity of the bounds. SequOOL uses no more than $n$ openings, as $\sum_{h=1}^{h_{\max}} \lfloor h_{\max}/h \rfloor \le h_{\max}\, \overline{\log}\, h_{\max} \le n$.
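The loop structure described above can be sketched as follows (a simplified toy implementation of the sequential scheme on $[0, 1]$ with a ternary partition; cell-selection details may differ from the exact specification in Figure 1):

```python
def sequool(f, n, K=3):
    """A minimal sketch of SequOOL on [0, 1] with a K-ary partition and
    deterministic feedback; each cell is evaluated once at its midpoint."""
    log_bar = sum(1.0 / t for t in range(1, n + 1))   # harmonic number
    h_max = int(n // log_bar)
    depth_cells = [(f(0.5), 0.0, 1.0)]                # (value, lo, hi) at current depth
    best_x, best_val = 0.5, f(0.5)
    for h in range(1, h_max + 1):
        budget = h_max // h                           # Zipf law: openings at depth h
        depth_cells.sort(key=lambda c: -c[0])         # best cells first
        children = []
        for _, lo, hi in depth_cells[:budget]:        # open the `budget` best cells
            w = (hi - lo) / K
            for i in range(K):
                clo, chi = lo + i * w, lo + (i + 1) * w
                x = (clo + chi) / 2.0
                v = f(x)
                children.append((v, clo, chi))
                if v > best_val:
                    best_val, best_x = v, x
        depth_cells = children                        # never revisit shallower depths
    return best_x, best_val

x_n, val = sequool(lambda x: -abs(x - 1.0 / 3.0), 500)
print(x_n, val)
```

The key property of the scheme is visible in the loop: depth $h$ is fully processed before depth $h + 1$, and the openings at depth $h + 1$ use only values already observed at depth $h$.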
3.2 Analysis of SequOOL
For any global optimum $x^\star$, let $h^\star_h$ be the depth of the deepest opened cell containing $x^\star$ at the end of the opening of depth $h$ by SequOOL (an iteration of the for cycle). Note that $h \mapsto h^\star_h$ is increasing. The proofs of the following statements are given in Appendix A. {lemma}[] For any global optimum $x^\star$ with associated $(\nu, \rho)$ as defined in Assumption 1, for any depth $h \ge 1$, if $\lfloor h_{\max}/h \rfloor \ge \mathcal{N}_h(3\nu\rho^h)$, we have $h^\star_h = h$. Lemma 3.2 states that as long as SequOOL opens more cells at depth $h$ than the number of near-optimal cells at depth $h$, the cell containing $x^\star$ is opened at depth $h$. {theorem}[] Let $W$ be the standard Lambert function (see Section 2). For any function $f$ and one of its global optima $x^\star$ with associated $(\nu, \rho)$, and near-optimality dimension $d$, we have, after $n$ rounds, the simple regret of SequOOL bounded by
For better readability, Corollary 3.2 uses a lower bound on $W$ (Hoorfar and Hassani, 2008). {corollary} If $d = 0$, the assumptions of Theorem 3.2 hold, and $n$ is large enough,
3.3 Discussion for the deterministic feedback
Comparison with SOO
SOO and SequOOL both address deterministic optimization without knowledge of the smoothness. The regret guarantees of SequOOL are an improvement over those of SOO. While for $d > 0$ both algorithms achieve a regret of order $n^{-1/d}$ up to log factors, when $d = 0$, the regret of SOO is of order $e^{-\sqrt{n}}$ while the regret of SequOOL is of order $e^{-n/\log n}$, which is a significant improvement. As discussed in the introduction and by Valko et al. (2013, Section 5), the case $d = 0$ is very common. As pointed out by Munos (2011, Corollary 2), SOO has to actually know whether $d = 0$ or not to set the maximum depth of the tree as a parameter for SOO. SequOOL is fully adaptive, does not need to know any of this, and actually gets a better rate.^{4} The conceptual difference with SOO is that SequOOL is sequential: for a given depth $h$, SequOOL first opens cells at depth $h$ and then at depth $h + 1$ and so on, without coming back to lower depths. Indeed, an opening at depth $h + 1$ is based on the values observed while opening at depth $h$. Therefore, it is natural and less wasteful to do the opening in a sequential order. Moreover, SequOOL is more conservative, as it opens the lower depths more, while SOO opens every depth equally. However, from the depth perspective, SequOOL is more aggressive, as it opens depths as high as order $n/\log n$, while SOO stops at order $\sqrt{n}$.

^{4} A similar behavior is also achieved by combining two SOO algorithms, running half of the samples for $d = 0$ and half for $d > 0$. However, SequOOL does this naturally and gets a better rate when $d = 0$.
Comparison with DOO
Contrary to SequOOL, DOO knows the smoothness of the function. However, this knowledge only improves the logarithmic factor in the current upper bound: when $d = 0$, DOO achieves an exponentially decreasing regret without the logarithmic factor in the exponent, and when $d > 0$, its loss is of order $n^{-1/d}$.
Lower bounds
As discussed by Munos (2014), for $d = 0$, DOO matches the lower bound and it is even comparable to the lower bound for concave functions. While SOO was not matching the bound of DOO, with our result, we now know that, up to a log factor, it is possible to achieve the same performance as DOO without the knowledge of the smoothness.
4 Noisy optimization with adaptation to low noise
4.1 The StroquOOL algorithm
In the presence of noise, it is natural to evaluate the cells multiple times, not just once as in the deterministic case. The number of times a cell should be evaluated in order to differentiate its value from the optimal value of the function depends on the gap between these two values as well as on the range of the noise. As we do not want to make any assumption on knowing these quantities, our algorithm tries to be robust to any potential values by not making a fixed choice on the number of evaluations. Intuitively, StroquOOL implicitly uses modified versions of SequOOL, denoted SequOOL($p$),^{5} where each cell is evaluated $2^p$ times, while in SequOOL each cell is evaluated once. On one side, given one instance of SequOOL($p$), evaluating each cell more ($p$ large) leads to a better quality of the mean estimate in each cell. On the other side, as a tradeoff, it implies that SequOOL($p$) uses more evaluations per depth and therefore is not able to explore deep depths of the partition; the largest depth explored shrinks accordingly. StroquOOL then implicitly performs the same amount of evaluations as would be performed by a collection of instances of SequOOL($p$) for the different values of $p$.

^{5} Again, this is only for the intuition; the algorithm is not a meta-algorithm over SequOOL instances.
The St(r)ochastic sequential Optimization aLgorithm, StroquOOL, is described in Figure 2. Remember that 'opening' a cell means 'evaluating' its children. The algorithm opens cells by sequentially diving deeper and deeper from the root node to a maximal depth of $h_{\max}$. At depth $h$, we allocate, in a decreasing fashion, different numbers of evaluations to the cells with the highest values of that depth, with the number of evaluations starting at $2^{p_{\max}}$ and going down to $1$. The best cell that has been evaluated at least $2^{p_{\max}}$ times is opened with $2^{p_{\max}}$ evaluations, the two next best cells that have been evaluated at least $2^{p_{\max}-1}$ times are opened with $2^{p_{\max}-1}$ evaluations, the four next best cells that have been evaluated at least $2^{p_{\max}-2}$ times are opened with $2^{p_{\max}-2}$ evaluations, and so on, until some next best cells that have been evaluated at least once are opened with one evaluation. More precisely, given $h$ and $p$, we open, with $2^p$ evaluations, the $2^{p_{\max}-p}$ non-previously-opened cells with the highest values, given that they have been evaluated at least $2^p$ times. The maximum number of evaluations of any cell is $2^{p_{\max}}$. For each $p$, the candidate output is the cell with the highest estimated value that has been evaluated at least $2^p$ times. We set $h_{\max}$ and $p_{\max}$ as specified in Figure 2. In Appendix B, we prove that StroquOOL uses less than $n$ openings.
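The per-depth schedule can be sketched as follows (a sketch of the doubling pattern described above, written for illustration; the hypothetical `p_max` parameter caps the number of evaluations at $2^{p_{\max}}$ per cell):

```python
def stroquool_allocation(p_max):
    """Evaluation schedule at one depth of StroquOOL (a sketch):
    for p = p_max, p_max - 1, ..., 0, the next-best 2^(p_max - p)
    candidate cells are opened with 2^p evaluations each."""
    schedule = []  # list of (number of cells, evaluations per cell)
    for p in range(p_max, -1, -1):
        schedule.append((2 ** (p_max - p), 2 ** p))
    return schedule

sched = stroquool_allocation(4)
# Each rung of the schedule costs 2^(p_max - p) * 2^p = 2^p_max evaluations,
# so one depth costs (p_max + 1) * 2^p_max evaluations in total.
print(sched, sum(m * e for m, e in sched))
```

This halving-style pattern is what lets StroquOOL hedge over the unknown noise level: heavily evaluated cells have accurate estimates, while lightly evaluated cells keep the exploration wide.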
4.2 Analysis of StroquOOL
The proofs in this section follow a structure similar to the ones for the deterministic feedback. Additionally, they take into account the uncertainty created by the noise. The proofs of the following statements are given in Appendices D and E. For any $p$, we consider the depth of the deepest opened node with at least $2^p$ evaluations containing $x^\star$ at the end of the opening of depth $h$ by StroquOOL.
[] For any global optimum $x^\star$ with associated $(\nu, \rho)$ (see Assumption 1), with probability at least $1 - \delta$, for all depths $h$ and all $p$, if StroquOOL opens, with $2^p$ evaluations, at least as many cells at depth $h$ as there are near-optimal cells at depth $h$, and if the $2^p$ evaluations suffice to discriminate near-optimal cells from suboptimal ones, then the cell containing $x^\star$ is opened at depth $h$ with at least $2^p$ evaluations. Lemma 4.2 thus gives two conditions under which the cell containing $x^\star$ is opened at depth $h$: (1) StroquOOL opens, with $2^p$ evaluations, more cells at depth $h$ than the number of near-optimal cells at depth $h$, and (2) the evaluations are sufficient to discriminate the empirical average of near-optimal cells from the empirical average of suboptimal cells.
To state the next theorems, we introduce a positive real number, defined via a transcendental equation solved by the Lambert $W$ function, that gives the depth of the deepest cell opened by StroquOOL that contains $x^\star$ with high probability. Consequently, this quantity also lets us characterize for which regime of the noise range $b$ we recover results similar to the loss of the deterministic case. Discriminating on the noise regime, we now state our results: Theorem 4.2 for a high noise and Theorem 4.2 for a low one. {theorem}[] High-noise regime. After $n$ rounds, for any function $f$ and one of its global optima $x^\star$ with associated $(\nu, \rho)$, and near-optimality dimension denoted for simplicity $d$, if the noise range $b$ is large enough, the simple regret of StroquOOL obeys
where is the standard Lambert function and . {corollary} With the assumptions of Theorem 4.2 and ,
[] Low-noise regime. After $n$ rounds, for any function $f$ and one of its global optima $x^\star$ with associated $(\nu, \rho)$, and near-optimality dimension denoted for simplicity $d$, if the noise range $b$ is small enough, the simple regret of StroquOOL is bounded as follows
With the assumptions of Theorem 4.2, if , then
4.3 Discussion for the stochastic feedback
Worst-case comparison with POO and StoSOO
When $b$ is large and known: StroquOOL is an algorithm designed for the noisy feedback while adapting to the smoothness of the function. Therefore, it can be directly compared to POO and StoSOO, which both tackle the same problem. The results for StroquOOL, like the ones for POO, hold for any $d \ge 0$, while the theoretical guarantees of StoSOO are only for the case $d = 0$. The general rate of StroquOOL in Corollary 4.2^{6} is similar to the ones of POO (for $d \ge 0$) and StoSOO (for $d = 0$), as their loss is of order $((\log^2 n)/n)^{1/(d+2)}$. More precisely, looking at the log factors, we can first notice an improvement over StoSOO when $d = 0$. Comparing with POO, we obtain a worse logarithmic factor. Despite this (theoretically) slightly worse logarithmic factor compared to POO, StroquOOL has two nice new features. First, our algorithm is conceptually simple, parameter-free, and does not need to call a sub-algorithm: POO repetitively calls different instances of HOO, which makes it a heavy meta-algorithm. Second, our algorithm, as we detail in the next paragraphs, naturally adapts to low noise and, even more, recovers the rates of SequOOL in the deterministic case, leading to an exponentially decreasing loss when $d = 0$. We do not know if this deterioration of the logarithmic factor from POO to StroquOOL is the unavoidable price to pay to obtain an adaptation to the deterministic feedback case.

^{6} Note that the second term in our bound has at most the same rate as the first one.
Comparison with oracle HOO
HOO is also designed for the noisy optimization setting. However, HOO knows the smoothness of $f$, i.e., $\nu$ and $\rho$ are input parameters of HOO. Using this extra knowledge, HOO is only able to improve the logarithmic factor of the regret.
Adaptation to the range of the noise without prior knowledge
A favorable feature of our bound in Corollary 4.2 is that it characterizes how the range of the noise affects the rate of the regret for all $b \ge 0$. Considering the common case of $d = 0$, the regret in Corollary 4.2 scales linearly with the range of the noise $b$, leading to a potentially large improvement for small $b$. Note that $b$ is any nonnegative real number and it is unknown to StroquOOL. HOO, POO, and StoSOO, on the other hand, would only obtain a regret scaling with $b$ when $b$ is known to them, as they directly encode a confidence bound, which must include $b$, in the definition of their code. To achieve this result, and contrary to HOO, StoSOO, and POO, we designed StroquOOL without using upper confidence bounds (UCBs). Indeed, UCB approaches are overly conservative, as they use a hard-coded (and often overestimated) upper bound on $b$. Finally, note that using UCB approaches with an empirical estimate of the variance would not achieve the best of both worlds, a result that is discussed in the next paragraph. Indeed, an assumption on the noise is still used in these approaches. This prevents having an exponentially decreasing regret when $b = 0$ and $d = 0$.
Adaptation to the deterministic case and $d = 0$
When the noise is very low, i.e., when $b$ is small, which includes the deterministic feedback ($b = 0$), in Theorem 4.2 and Corollary 4.2, StroquOOL recovers the same rate as DOO and SequOOL up to logarithmic factors. Remarkably, StroquOOL obtains an exponentially decreasing regret when $d = 0$, while POO, StoSOO, or HOO only guarantee a polynomially decreasing regret when unaware of the range $b$. Therefore, up to log factors, StroquOOL naturally achieves the best of both worlds without being aware of the nature of the feedback (either stochastic or deterministic). Again, this is a behavior that one cannot expect from HOO, POO, and StoSOO, as they explicitly use confidence intervals in their algorithms assuming a fixed range of noise, which limits the maximum depth that can be explored.
5 Experiments
We empirically demonstrate how SequOOL and StroquOOL adapt to the complexity of the data and compare them to SOO, POO, and HOO. We use two functions from prior work as testbeds for the optimization of difficult functions without the knowledge of smoothness. The first one is the wrapped-sine function (Grill et al., 2015, Figure 3, bottom right). This function has $d > 0$ for the standard partitioning (Grill et al., 2015). The second is the garland function (Valko et al., 2013, Figure 4, bottom right). This function has $d = 0$ for the standard partitioning (Valko et al., 2013). Both functions are in one dimension, $\mathcal{X} = [0, 1]$. We remark that our algorithms work in any dimension, but with the current computational power, they would not scale beyond a thousand dimensions.
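For reference, the garland function can be written down directly. The formula below is reproduced from our reading of Valko et al. (2013); treat the exact constants as an assumption of this sketch:

```python
import math

def garland(x):
    # Garland function on [0, 1], as we recall it from Valko et al. (2013);
    # its near-optimality dimension is d = 0 for the standard partitioning.
    return x * (1.0 - x) * (4.0 - math.sqrt(abs(math.sin(60.0 * x))))

# A coarse grid search to locate the global maximum (for reference only);
# the maximizer sits near a zero of sin(60x) close to x = 1/2.
xs = [i / 100000 for i in range(100001)]
best = max(xs, key=garland)
print(best, garland(best))
```

The many narrow spikes created by the $\sqrt{|\sin(60x)|}$ term are what make this function a standard testbed: it is far from globally smooth, yet locally smooth around its global optimum.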
StroquOOL outperforms POO and HOO and adapts to lower noise.
In Figure 3, we report the results of StroquOOL, POO, and HOO for different values of the noise range. As detailed in the caption, we vary both the range of the real noise and the range of the noise assumed by HOO and POO. In all our experiments, StroquOOL outperforms POO and HOO. StroquOOL adapts to low noise: its performance improves when the real noise range diminishes. To see that, compare the top-left, top-middle, and top-right subfigures. On the other hand, POO and HOO do not naturally adapt to the range of the noise: for a given assumed range, the performance is unchanged when the range of the real noise varies, as seen by comparing again the top-left, top-middle, and top-right subfigures. However, note that POO and HOO can adapt to noise and perform empirically well if they have a good estimate of the range, as in the bottom-left subfigure, or if they underestimate the range of the noise, as in the bottom-middle one. In Appendix F, we report similar results on the garland function. Finally, StroquOOL demonstrates its adaptation to both worlds in Figure 4 (left), where it achieves an exponentially decreasing loss in the case of $d = 0$ and deterministic feedback.
Regrets of SequOOL and StroquOOL have exponential decay when $d = 0$.
In Figure 4, we test the deterministic feedback case with SequOOL, StroquOOL, SOO, and the uniform strategy on the garland function (left) and the wrapped-sine function (middle). Interestingly, for the garland function, where $d = 0$, SequOOL outperforms SOO and displays a truly exponential regret decay (the y-axis is in log scale). SOO appears to have a regret of order $e^{-\sqrt{n}}$. StroquOOL, which is also expected to have an exponentially decreasing regret, lags behind SOO; it only overtakes SOO for budgets for which the result is beyond the numerical precision. In Figure 4 (middle), we use the wrapped-sine function. While all algorithms have similar theoretical guarantees, since here $d > 0$, SOO outperforms the other algorithms.
Acknowledgements
We would like to thank Jean-Bastien Grill for sharing his code. We gratefully acknowledge the support of the NSF through grant IIS-1619362 and of the Australian Research Council through an Australian Laureate Fellowship (FL110100281) and through the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS). The research presented was also supported by the European CHIST-ERA project DELTA, the French Ministry of Higher Education and Research, the Nord-Pas-de-Calais Regional Council, the Inria and Otto-von-Guericke-Universität Magdeburg associated-team north-European project Allocate, and French National Research Agency projects ExTraLearn (n.ANR-14-CE24-0010-01) and BoB (n.ANR-16-CE23-0003).
References
 Abbasi-Yadkori et al. [2018] Yasin Abbasi-Yadkori, Peter Bartlett, Victor Gabillon, Alan Malek, and Michal Valko. Best of both worlds: Stochastic & adversarial best-arm identification. In Conference on Learning Theory, 2018.
 Al-Dujaili and Suresh [2018] Abdullah Al-Dujaili and S Suresh. Multi-objective simultaneous optimistic optimization. Information Sciences, 424:159–174, 2018.
 Audibert et al. [2010] Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos. Best arm identification in multi-armed bandits. In Conference on Learning Theory, pages 41–53, 2010.
 Azar et al. [2014] Mohammad Gheshlaghi Azar, Alessandro Lazaric, and Emma Brunskill. Online stochastic optimization under correlated bandit feedback. In International Conference on Machine Learning, 2014.
 Bubeck and Slivkins [2012] Sébastien Bubeck and Aleksandrs Slivkins. The best of both worlds: stochastic and adversarial bandits. In Conference on Learning Theory, pages 42–1, 2012.
 Bubeck et al. [2011a] Sébastien Bubeck, Rémi Munos, Gilles Stoltz, and Csaba Szepesvári. X-armed bandits. Journal of Machine Learning Research, 12:1587–1627, 2011a.
 Bubeck et al. [2011b] Sébastien Bubeck, Gilles Stoltz, and Jia Yuan Yu. Lipschitz Bandits without the Lipschitz Constant. In Algorithmic Learning Theory, 2011b.
 Buşoniu and Morărescu [2014] Lucian Buşoniu and Irinel-Constantin Morărescu. Consensus for black-box nonlinear agents using optimistic optimization. Automatica, 50(4):1201–1208, 2014.
 Busoniu et al. [2013] Lucian Busoniu, Alexander Daniels, Rémi Munos, and Robert Babuska. Optimistic planning for continuous-action deterministic systems. In Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2013 IEEE Symposium on, pages 69–76. IEEE, 2013.
 Coquelin and Munos [2007] Pierre-Arnaud Coquelin and Rémi Munos. Bandit algorithms for tree search. In Uncertainty in Artificial Intelligence, 2007.
 Corless et al. [1996] Robert M Corless, Gaston H Gonnet, David EG Hare, David J Jeffrey, and Donald E Knuth. On the Lambert W function. Advances in Computational Mathematics, 5(1):329–359, 1996.
 de Freitas et al. [2012] Nando de Freitas, Alex Smola, and Masrour Zoghi. Exponential regret bounds for Gaussian process bandits with deterministic observations. In International Conference on Machine Learning, 2012.
 De Rooij et al. [2014] Steven De Rooij, Tim Van Erven, Peter D Grünwald, and Wouter M Koolen. Follow the leader if you can, hedge if you must. The Journal of Machine Learning Research, 15(1):1281–1316, 2014.
 Derbel and Preux [2015] Bilel Derbel and Philippe Preux. Simultaneous optimistic optimization on the noiseless BBOB testbed. In IEEE Congress on Evolutionary Computation, CEC 2015, Sendai, Japan, May 25–28, 2015, pages 2010–2017, 2015.
 Grill et al. [2015] Jean-Bastien Grill, Michal Valko, and Rémi Munos. Black-box optimization of noisy functions with unknown smoothness. In Advances in Neural Information Processing Systems, pages 667–675, 2015.
 Hansen and Walster [2003] Eldon Hansen and G William Walster. Global optimization using interval analysis: revised and expanded, volume 264. CRC Press, 2003.
 Hoorfar and Hassani [2008] Abdolhossein Hoorfar and Mehdi Hassani. Inequalities on the Lambert W function and hyperpower function. Journal of Inequalities in Pure and Applied Mathematics (JIPAM), 9(2):5–9, 2008.
 Hren and Munos [2008] Jean-Francois Hren and Rémi Munos. Optimistic planning of deterministic systems. In European Workshop on Reinforcement Learning, 2008.
 Jones et al. [1993] David Jones, Cary Perttunen, and Bruce Stuckman. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79(1):157–181, 1993.
 Kasim and Norreys [2016] Muhammad F Kasim and Peter A Norreys. Infinite dimensional optimistic optimisation with applications on physical systems. arXiv preprint arXiv:1611.05845, 2016.
 Kawaguchi et al. [2015] Kenji Kawaguchi, Leslie Pack Kaelbling, and Tomás LozanoPérez. Bayesian optimization with exponential convergence. In Advances in neural information processing systems, pages 2809–2817, 2015.
 Kawaguchi et al. [2016] Kenji Kawaguchi, Yu Maruyama, and Xiaoyu Zheng. Global continuous optimization with error bound and fast convergence. Journal of Artificial Intelligence Research, 56:153–195, 2016.
 Kearfott [2013] R Baker Kearfott. Rigorous global search: continuous problems, volume 13. Springer Science & Business Media, 2013.
 Kleinberg et al. [2008] Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In ACM Symposium on Theory of Computing (STOC), pages 681–690. ACM, 2008.
 Kocsis and Szepesvári [2006] Levente Kocsis and Csaba Szepesvári. Bandit-based Monte-Carlo planning. In European Conference on Machine Learning, 2006.
 Locatelli and Carpentier [2018] Andrea Locatelli and Alexandra Carpentier. Adaptivity to smoothness in X-armed bandits. In Conference on Learning Theory, 2018.
 Malherbe and Vayatis [2017] Cédric Malherbe and Nicolas Vayatis. Global optimization of Lipschitz functions. In Proceedings of the 34th International Conference on Machine Learning, pages 2314–2323, 2017.
 Munos [2011] Rémi Munos. Optimistic optimization of a deterministic function without the knowledge of its smoothness. In Advances in Neural Information Processing Systems, pages 783–791, 2011.
 Munos [2014] Rémi Munos. From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning. Foundations and Trends in Machine Learning, 7(1):1–130, 2014.
 Pintér [2013] János D Pintér. Global optimization in action: Continuous and Lipschitz optimization: Algorithms, implementations and applications. Nonconvex Optimization and Its Applications. Springer US, 2013.
 Powers [1998] David Powers. Applications and explanations of Zipf’s law. In New methods in language processing and computational natural language learning. Association for Computational Linguistics, 1998.
 Preux et al. [2014] Philippe Preux, Rémi Munos, and Michal Valko. Bandits attack function optimization. In Evolutionary Computation (CEC), 2014 IEEE Congress on, pages 2245–2252. IEEE, 2014.
 Qian and Yu [2016] Hong Qian and Yang Yu. Scaling simultaneous optimistic optimization for high-dimensional non-convex functions with low effective dimensions. In AAAI, pages 2000–2006, 2016.
 Seldin and Slivkins [2014] Yevgeny Seldin and Aleksandrs Slivkins. One practical algorithm for both stochastic and adversarial bandits. In International Conference on Machine Learning, pages 1287–1295, 2014.
 Shang et al. [2018] Xuedong Shang, Emilie Kaufmann, and Michal Valko. Adaptive black-box optimization got easier: HCT needs only local smoothness. In European Workshop on Reinforcement Learning, 2018.
 Slivkins [2011] Aleksandrs Slivkins. Multi-armed bandits on implicit metric spaces. In Neural Information Processing Systems, 2011.
 Strongin and Sergeyev [2013] Roman G Strongin and Yaroslav D Sergeyev. Global optimization with non-convex constraints: Sequential and parallel algorithms, volume 45. Springer Science & Business Media, 2013.
 Valko et al. [2013] Michal Valko, Alexandra Carpentier, and Rémi Munos. Stochastic simultaneous optimistic optimization. In International Conference on Machine Learning, pages 19–27, 2013.
 Wang et al. [2014] Ziyu Wang, Babak Shakibi, Lin Jin, and Nando de Freitas. Bayesian Multi-Scale Optimistic Optimization. In International Conference on Artificial Intelligence and Statistics, 2014.
Appendix A Regret analysis of SequOOL for deterministic feedback
See 3.2
Proof.  We prove Lemma 3.2 by induction over the depth .
For , we trivially have .
Now consider and assume that . We want to show that . Since we already know and since ,
we have, for all ,
which means, assuming that the proposition of the lemma is true for , that . Therefore, at the end of the processing of depth , during which we were opening the cells of depth , we managed to open the cell containing the optimal node of depth (i.e., such that ). During phase , the cells from with the highest values are opened. For the purpose of contradiction, let us assume that is not one of them. This would mean that there exist at least cells from , distinct from , satisfying . As by Assumption 1, this means we have . However, by the assumption of the lemma, we have . It follows that . This contradicts being of near-optimality dimension with associated constant as defined in Definition 2. Indeed, the condition in Definition 2 is equivalent to the condition , as is an integer.
See 3.2
Let be a global optimum with associated . For simplicity, let . We have
where (a) holds because and . Note that the tree has depth in the end. From the previous inequality, we have . For the rest of the proof, we want to lower bound . Lemma 3.2 provides a sufficient condition on for such lower bounds. This condition is an inequality which, as grows (more depth), is less and less likely to hold. For our bound on the regret of SequOOL to be small, we want a quantity for which the inequality holds while being as large as possible. It therefore makes sense to determine when the inequality flips sign, that is, when it turns into an equality. This is what we solve next. We solve Equation 2 and then verify that it gives a valid indication of the behavior of our algorithm in terms of its optimal . We denote by the positive real number satisfying
(2) 
where . As , and we have . This gives . Finally as , we have .
If , we have . If , we have , where is the standard Lambert W function. Using standard properties of the function, we have
(3) 
We always have . If , then, as discussed above, , and therefore , as is increasing. Moreover, this follows from Lemma 3.2, whose assumptions are verified because of Equation 3 and . So in general, we have . If , we have
If satisfies for , then [Hoorfar and Hassani, 2008]. Therefore, if , we have, denoting ,
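The Lambert W bounds invoked above can be checked numerically. The sketch below implements the principal branch of W by Newton iteration on w·e^w = x (the iteration code is an illustrative assumption, not part of the paper) and verifies the standard sandwich ln x − ln ln x ≤ W(x) ≤ ln x for x > e, which is the flavor of inequality due to Hoorfar and Hassani (2008):

```python
import math

def lambert_w(x, iters=50):
    """Principal branch W(x) for x > 0, via Newton's method on w * e^w = x."""
    w = math.log(1.0 + x)  # starting point; safe for x > 0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1.0))
    return w

# Check ln(x) - ln(ln(x)) <= W(x) <= ln(x) for a few x > e.
checks = []
for x in (10.0, 1e3, 1e6):
    w = lambert_w(x)
    checks.append(math.log(x) - math.log(math.log(x)) <= w <= math.log(x))
print(checks)
```

Such bounds are what turn the implicit equation for the optimal depth into the explicit rates stated in the theorem.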
Appendix B StroquOOL does not use a budget larger than
Notice that, for any given depth , StroquOOL never uses more openings than , as
Summing over the depths, StroquOOL never uses more openings than the budget during its depth exploration, as
We need to add the additional openings for the evaluations at the end,
Therefore, in total, the budget is not more than . Again, notice that we use the budget of only for notational convenience; we could also use for the evaluations at the end to fit under (what matters is that the number of openings is linear in ).
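The accounting above rests on a harmonic-sum bound: opening on the order of ⌊h_max/h⌋ cells at each depth h costs at most h_max(1 + ln h_max) openings in total. The sketch below checks this numerically; it uses the SequOOL-style per-depth count ⌊h_max/h⌋ as an assumption (StroquOOL's accounting adds a similar geometric sum over evaluation levels):

```python
import math

def total_openings(h_max):
    # floor(h_max / h) cells opened at each depth h = 1, ..., h_max.
    return sum(h_max // h for h in range(1, h_max + 1))

# Harmonic-sum bound: total openings <= h_max * (1 + ln h_max).
bounds_hold = all(
    total_openings(h) <= h * (1.0 + math.log(h)) for h in range(1, 2001)
)
print(bounds_hold)
```

This is why choosing h_max of order n divided by a logarithmic factor keeps the total number of openings linear in the budget n.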
Appendix C Lower bound on the probability of event
In this section, we define and consider the event and prove that it holds with high probability.
Lemma.  Let be the set of cells evaluated by StroquOOL during one of its runs; is a random quantity. Let be the event under which all average estimates in the cells receiving at least one evaluation from StroquOOL are within their classical confidence intervals. Then , where
The idea of the proof of this lemma follows similar lines as the proof of the equivalent statement given for StoSOO [Valko et al., 2013]. The crucial point is that while there are potentially exponentially many combinations of cells that can be evaluated, given any particular execution we only need to consider a polynomial number of estimators, for which we can use the Chernoff-Hoeffding concentration inequality.
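The Chernoff-Hoeffding step bounds the deviation of each cell's average estimate; its coverage is easy to verify empirically. The sketch below is a generic illustration (not the paper's exact event): for [0,1]-bounded draws, it checks that the empirical mean leaves the Hoeffding interval of radius √(ln(2/δ)/(2n)) with frequency at most about δ:

```python
import math
import random

random.seed(0)

def hoeffding_violations(n=50, delta=0.05, trials=4000):
    """Fraction of trials where the empirical mean of n uniform[0, 1]
    draws leaves the Hoeffding confidence interval around mu = 0.5."""
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    bad = 0
    for _ in range(trials):
        mean = sum(random.random() for _ in range(n)) / n
        if abs(mean - 0.5) > radius:
            bad += 1
    return bad / trials

rate = hoeffding_violations()
print(rate)
```

A union bound over the polynomially many estimators actually built in a run then yields the high-probability event of the lemma.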
The identity of the set of cells evaluated by StroquOOL, , is random and can change at every run of StroquOOL. However, no cells with depth larger than are evaluated. Therefore, given , the number of possible sets of cells associated with any run of StroquOOL is finite. Let us denote the set of all such possible sets of cells by . Given any set of cells that StroquOOL could open, we denote by the event on which StroquOOL opens exactly the cells in , and define the related event