Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Optimization

Gautam Goel, Yiheng Lin, Haoyuan Sun, Adam Wierman
California Institute of Technology

Abstract

We study online convex optimization in a setting where the learner seeks to minimize the sum of a per-round hitting cost and a movement cost which is incurred when changing decisions between rounds. We prove a new lower bound on the competitive ratio of any online algorithm in the setting where the costs are $m$-strongly convex and the movement costs are the squared $\ell_2$ norm. This lower bound shows that no algorithm can achieve a competitive ratio that is $o(m^{-1/2})$ as $m$ tends to zero. No existing algorithms have competitive ratios matching this bound, and we show that the state-of-the-art algorithm, Online Balanced Descent (OBD), has a competitive ratio that is $\Omega(m^{-2/3})$. We additionally propose two new algorithms, Greedy OBD (G-OBD) and Regularized OBD (R-OBD), and prove that both algorithms have an $O(m^{-1/2})$ competitive ratio. The result for G-OBD holds when the hitting costs are quasiconvex and the movement costs are the squared $\ell_2$ norm, while the result for R-OBD holds when the hitting costs are $m$-strongly convex and the movement costs are Bregman divergences. Further, we show that R-OBD simultaneously achieves a constant, dimension-free competitive ratio and sublinear regret when hitting costs are strongly convex.

Gautam Goel, Yiheng Lin, and Haoyuan Sun contributed equally to this work. This work was supported by NSF grants AitF-1637598 and CNS-1518941, with additional support for Gautam Goel provided by an Amazon AWS AI Fellowship.

1 Introduction

We consider the problem of Smoothed Online Convex Optimization (SOCO), a variant of online convex optimization (OCO) where the online learner pays a movement cost for changing actions between rounds. More precisely, we consider a game where an online learner plays a series of rounds against an adaptive adversary. In each round $t$, the adversary picks a convex cost function $h_t$ and shows it to the learner. After observing the cost function, the learner chooses an action $x_t$ and pays a hitting cost $h_t(x_t)$, as well as a movement cost $c(x_t, x_{t-1})$, which penalizes the online learner for switching points between rounds.

SOCO was originally proposed in the context of dynamic power management in data centers lin2012online. Since then it has seen a wealth of applications, from speech animation to management of electric vehicle charging kim2015decision; joseph2012jointly; kim2014real, and more recently applications in control goel2017thinking; goel2018smoothed and power systems li2018using; badieionline. SOCO has been widely studied in the machine learning community with the special cases of online logistic regression and smoothed online maximum likelihood estimation receiving recent attention goel2018smoothed.

Additionally, SOCO has connections to a number of other important problems in online algorithms and learning. Convex Body Chasing (CBC), introduced in friedman1993convex, is a special case of SOCO bubeck2018competitively. The problem of designing competitive algorithms for Convex Body Chasing has attracted much recent attention, e.g., bubeck2018competitively; bansa2018nested; argue2019nearly. SOCO can also be viewed as a continuous version of the Metrical Task System (MTS) problem (see borodin1992optimal; bartal1997polylog; blum2000line). A special case of MTS is the celebrated $k$-server problem, first proposed in manasse1990competitive, which has received significant attention in recent years (see bubeck2018k; buchbinder2019k).

Given these connections, the design and analysis of algorithms for SOCO and related problems has received considerable attention in the last decade. SOCO was first studied in the scalar setting in lin2013dynamic, which used SOCO to model dynamic “right-sizing” in data centers and gave a 3-competitive algorithm. A 2-competitive algorithm was shown in bansal20152, also in the scalar setting, which matches the lower bound for online algorithms in this setting antoniadis2017tight. Another rich line of work studies how to design competitive algorithms for SOCO when the online algorithm has access to predictions of future cost functions (see lin2012online; li2018using; chen2015online; chen2016using).

Despite a large and growing literature on SOCO and related problems, for nearly a decade the only known constant-competitive algorithms that did not use predictions of future costs were for one-dimensional action spaces. In fact, the connections between SOCO and Convex Body Chasing highlight that, in general, one cannot expect dimension-free constant competitive algorithms due to an $\Omega(\sqrt{d})$ lower bound, where $d$ is the dimension of the action space (see friedman1993convex; chen2018smoothed). However, recently there has been considerable progress moving beyond the one-dimensional setting for large, important classes of hitting and movement costs.

A breakthrough came in 2017 when chen2018smoothed proposed a new algorithm, Online Balanced Descent (OBD), and showed that it is constant competitive in all dimensions in the setting where the hitting costs are locally polyhedral and movement costs are the $\ell_2$ norm. The following year, goel2018smoothed showed that OBD is also constant competitive, specifically $3 + O(1/m)$-competitive, in the setting where the hitting costs are $m$-strongly convex and the movement costs are the squared $\ell_2$ norm. Note that this setting is of particular interest because of its importance for online regression and LQR control (see goel2018smoothed).

While OBD has proven to be a promising new algorithm, at this point it is not known whether OBD is optimal for the competitive ratio, or if there is more room for improvement. This is because there are no non-trivial lower bounds known for important classes of hitting costs, the most prominent of which is the class of strongly convex functions.

Contributions of this paper. In this paper we prove the first non-trivial lower bounds on SOCO with strongly convex hitting costs, both for general algorithms and for OBD specifically. These lower bounds show that OBD is not optimal and there is an order-of-magnitude gap between its performance and the general lower bound. Motivated by this gap and the construction of the lower bounds we present two new algorithms, both variations of OBD, which have competitive ratios that match the lower bound. More specifically, we make four main contributions in this paper.

First, we prove a new lower bound on the performance achievable by any online algorithm in the setting where the hitting costs are $m$-strongly convex and the movement costs are the squared $\ell_2$ norm. In particular, in Theorem 1, we show that as $m$ tends to zero, any online algorithm must have competitive ratio at least $\Omega(m^{-1/2})$.

Second, we show that the state-of-the-art algorithm, OBD, cannot match this lower bound. More precisely, in Theorem 2 we show that, as $m$ tends to zero, the competitive ratio of OBD is $\Omega(m^{-2/3})$, an order-of-magnitude higher than the lower bound of $\Omega(m^{-1/2})$. This immediately raises the question: can any online algorithm close the gap and match the lower bound?

Our third contribution answers this question in the affirmative. In Section 4, we propose two novel algorithms, Greedy Online Balanced Descent (G-OBD) and Regularized Online Balanced Descent (R-OBD), which are able to close the gap left open by OBD and match the lower bound. Both algorithms can be viewed as “aggressive” variants of OBD, in the sense that they chase the minimizers of the hitting costs more aggressively than OBD. In Theorem 3 we show that G-OBD matches the lower bound up to constant factors for quasiconvex hitting costs (a more general class than $m$-strongly convex). In Theorem 4 we show that R-OBD has a competitive ratio that precisely matches the lower bound, including the constant factors, and hence can be viewed as an optimal algorithm for SOCO in the setting where the costs are $m$-strongly convex and the movement cost is the squared $\ell_2$ norm. Further, our results for R-OBD hold not only for squared $\ell_2$ movement costs; they also hold for movement costs that are Bregman divergences, which commonly appear throughout information geometry, probability, and optimization.

Finally, in our last section we move beyond competitive ratio and additionally consider regret. We prove in Theorem 6 that R-OBD can simultaneously achieve a bounded, dimension-free competitive ratio and sublinear regret in the case of $m$-strongly convex hitting costs and squared $\ell_2$ movement costs. This result helps close a crucial gap in the literature. Previous work has shown that it is not possible for any algorithm to simultaneously achieve both a constant competitive ratio and sublinear regret in general SOCO problems daniely2019competitive. However, this was shown through the use of linear hitting and movement costs. Thus, the question of whether it is possible to simultaneously achieve a dimension-free, constant competitive ratio and sublinear regret when hitting costs are strongly convex has remained open. The closest previous result is from chen2018smoothed, which showed that OBD can achieve either a constant competitive ratio or sublinear regret with locally polyhedral cost functions depending on the “balance condition” used; however, both cannot be achieved simultaneously. Our result (Theorem 6) shows that R-OBD can simultaneously provide a constant competitive ratio and sublinear regret for strongly convex cost functions when the movement costs are the squared $\ell_2$ norm.

2 Model & Preliminaries

An instance of Smoothed Online Convex Optimization (SOCO) consists of a convex action set $\mathcal{X} \subset \mathbb{R}^d$, an initial point $x_0 \in \mathcal{X}$, a sequence of non-negative convex cost functions $h_1, \ldots, h_T : \mathbb{R}^d \to \mathbb{R}_{\geq 0}$, and a movement cost $c : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_{\geq 0}$. In every round $t$, the environment picks a cost function $h_t$ (potentially adversarially) for an online learner. After observing the cost function, the learner chooses an action $x_t \in \mathcal{X}$ and pays a cost that is the sum of the hitting cost, $h_t(x_t)$, and the movement cost, a.k.a. switching cost, $c(x_t, x_{t-1})$. The goal of the online learner is to minimize its total cost over $T$ rounds:

$$\mathrm{cost}(ALG) = \sum_{t=1}^{T} \big( h_t(x_t) + c(x_t, x_{t-1}) \big).$$

We emphasize that it is the movement costs that make this problem interesting and challenging; if there were no movement costs, i.e., $c(x_t, x_{t-1}) = 0$, the problem would be trivial, since the learner could always pay the optimal cost simply by picking the action that minimizes the hitting cost in each round, i.e., by setting $x_t = \arg\min_x h_t(x)$. The movement cost couples the cost the learner pays across rounds, which means that the optimal action of the learner depends on unknown future costs.
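As a minimal sketch of this objective, the following computes the total cost of a given action sequence. It assumes the movement cost is half the squared Euclidean distance and that actions are stored as lists of floats; the names `total_cost`, `squared_l2_movement`, `hitting_costs`, and `actions` are illustrative and do not come from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def squared_l2_movement(x: Vector, y: Vector) -> float:
    """Movement cost 0.5 * ||x - y||_2^2 (an assumed form for this sketch)."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))


def total_cost(
    hitting_costs: List[Callable[[Vector], float]],
    actions: List[Vector],
    x0: Vector,
) -> float:
    """Sum of per-round hitting costs plus movement costs, starting from x0."""
    cost = 0.0
    prev = x0
    for h, x in zip(hitting_costs, actions):
        cost += h(x) + squared_l2_movement(x, prev)
        prev = x
    return cost


# Two rounds on the real line with hitting costs (x - 1)^2 and (x - 2)^2.
costs = [lambda x: (x[0] - 1.0) ** 2, lambda x: (x[0] - 2.0) ** 2]
greedy = [[1.0], [2.0]]  # always jump to the current hitting-cost minimizer
print(total_cost(costs, greedy, x0=[0.0]))
```

The greedy sequence pays no hitting cost but pays a movement cost in every round, which is exactly the coupling across rounds described above.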

There is a long literature on SOCO, both focusing on algorithmic questions, e.g., goel2018smoothed; lin2013dynamic; bansal20152; chen2018smoothed, and applications, e.g., kim2015decision; joseph2012jointly; kim2014real; lin2012online. The variety of applications studied means that a variety of assumptions about the movement costs have been considered. Motivated by applications to data center capacity management, movement costs have often been taken as the $\ell_1$ norm, i.e., $c(x_2, x_1) = \|x_2 - x_1\|_1$, e.g., lin2013dynamic; bansal20152. However, recently, more general norms have been considered and the setting of squared $\ell_2$ movement costs has gained attention due to its use in online regression problems and connections to LQR control, among other applications (see goel2017thinking; goel2018smoothed; astrom2010feedback).

In this paper, we focus on the setting of the squared $\ell_2$ norm, i.e., $c(x_2, x_1) = \frac{1}{2}\|x_2 - x_1\|_2^2$; however, we also consider a generalization of the $\ell_2$ norm in Section 4.2 where $c$ is the Bregman divergence. Specifically, we consider $c(x_t, x_{t-1}) = D_h(x_t \| x_{t-1}) = h(x_t) - h(x_{t-1}) - \langle \nabla h(x_{t-1}), x_t - x_{t-1} \rangle$, where both the potential $h$ and its Fenchel Conjugate $h^*$ are differentiable. Further, we assume that $h$ is $\alpha$-strongly convex and $\beta$-strongly smooth with respect to an underlying norm $\|\cdot\|$. Definitions of each of these properties can be found in the appendix.

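Note that the squared $\ell_2$ norm is itself a Bregman divergence, with $h(x) = \frac{1}{2}\|x\|_2^2$ and $\|\cdot\| = \|\cdot\|_2$, $\alpha = \beta = 1$. However, more generally, when $h(x) = \sum_i x_i \ln x_i$ with the probability simplex as its domain, $D_h(x_t \| x_{t-1})$ is the Kullback-Leibler divergence (see bansal2017potential). Further, $h$ is strongly convex and strongly smooth on a subset of the simplex whose points have coordinates bounded away from zero, with the constants given in chen2018smoothed. This extension is important given the role Bregman divergence plays across optimization and information theory, e.g., see azizan2018stochastic; murata2004information.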
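The two special cases above can be checked numerically. The sketch below evaluates the standard Bregman divergence $D_h(x \| y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle$ for two potentials: half the squared Euclidean norm, and the negative entropy with natural logarithms on the probability simplex. All function names are illustrative assumptions rather than notation from the paper.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman(
    h: Callable[[Vector], float],
    grad_h: Callable[[Vector], List[float]],
    x: Vector,
    y: Vector,
) -> float:
    """D_h(x || y) = h(x) - h(y) - <grad h(y), x - y>."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_h(y), x, y))
    return h(x) - h(y) - inner


# Potential 1: h(x) = 0.5 * ||x||_2^2, whose divergence is 0.5 * ||x - y||_2^2.
def sq_norm(x: Vector) -> float:
    return 0.5 * sum(a * a for a in x)


def sq_norm_grad(x: Vector) -> List[float]:
    return list(x)


# Potential 2: negative entropy, whose divergence on the simplex is the KL divergence.
def neg_entropy(x: Vector) -> float:
    return sum(a * math.log(a) for a in x)


def neg_entropy_grad(x: Vector) -> List[float]:
    return [math.log(a) + 1.0 for a in x]


def kl(p: Vector, q: Vector) -> float:
    """Kullback-Leibler divergence between two strictly positive distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


p, q = [0.2, 0.8], [0.5, 0.5]
print(bregman(sq_norm, sq_norm_grad, [1.0, 2.0], [0.0, 0.0]))
print(bregman(neg_entropy, neg_entropy_grad, p, q), kl(p, q))
```

The agreement with the KL divergence relies on both arguments summing to one; off the simplex the same potential yields the generalized KL divergence instead.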

As with movement costs, a variety of assumptions have been made about hitting costs. In particular, because of the emergence of pessimistic lower bounds when general convex hitting costs are considered, papers typically have considered restricted classes of functions, e.g., locally polyhedral chen2018smoothed and strongly convex goel2018smoothed. In this paper, we focus on hitting costs that are $m$-strongly convex; however, our results in Section 4.1 generalize to the case of quasiconvex functions.

Competitive Ratio and Regret. The primary goal of the SOCO literature is to design online algorithms that (nearly) match the performance of the offline optimal algorithm. The performance metric used to evaluate an algorithm is typically the competitive ratio because the goal is to learn in an environment that is changing dynamically and is potentially adversarial. The competitive ratio is the worst-case ratio between the total cost incurred by the online learner and the cost of the offline optimal. The cost of the offline optimal is defined as the minimal cost of an algorithm if it has full knowledge of the sequence of costs $\{h_t\}$, i.e., $\mathrm{cost}(OPT) = \min_{x_1, \ldots, x_T} \sum_{t=1}^{T} h_t(x_t) + c(x_t, x_{t-1})$. Using this, the competitive ratio is defined as

$$\sup_{h_1, \ldots, h_T} \frac{\mathrm{cost}(ALG)}{\mathrm{cost}(OPT)}.$$
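For intuition, the offline optimal can be computed exactly in a simple special case. The sketch below assumes scalar actions, quadratic hitting costs $\frac{m_t}{2}(x - v_t)^2$, movement cost $\frac{1}{2}(x_t - x_{t-1})^2$, and at least one round; setting the gradient of the total cost to zero gives a tridiagonal linear system. The names `offline_optimal` and `instance_cost` are hypothetical. Note that the result is the cost ratio on one instance, whereas the competitive ratio is the supremum over all instances.

```python
from typing import List


def offline_optimal(m: List[float], v: List[float], x0: float) -> List[float]:
    """Minimize sum_t [ m[t]/2 * (x_t - v[t])^2 + 1/2 * (x_t - x_{t-1})^2 ].

    The stationarity conditions form a tridiagonal system with off-diagonal
    entries -1, solved here with the Thomas algorithm. Requires len(m) >= 1.
    """
    T = len(m)
    # Diagonal: m_t + 2 for interior rounds, m_T + 1 for the final round.
    diag = [m[t] + (2.0 if t < T - 1 else 1.0) for t in range(T)]
    rhs = [m[t] * v[t] for t in range(T)]
    rhs[0] += x0  # the fixed starting point moves to the right-hand side
    # Forward elimination.
    for t in range(1, T):
        w = -1.0 / diag[t - 1]
        diag[t] += w
        rhs[t] -= w * rhs[t - 1]
    # Back substitution.
    x = [0.0] * T
    x[T - 1] = rhs[T - 1] / diag[T - 1]
    for t in range(T - 2, -1, -1):
        x[t] = (rhs[t] + x[t + 1]) / diag[t]
    return x


def instance_cost(m: List[float], v: List[float], x0: float, xs: List[float]) -> float:
    """Total hitting plus movement cost of the action sequence xs."""
    total, prev = 0.0, x0
    for mt, vt, xt in zip(m, v, xs):
        total += 0.5 * mt * (xt - vt) ** 2 + 0.5 * (xt - prev) ** 2
        prev = xt
    return total


m, v, x0 = [1.0, 1.0], [1.0, 1.0], 0.0
opt = offline_optimal(m, v, x0)
greedy = list(v)  # an online learner that always jumps to the minimizer
print(opt, instance_cost(m, v, x0, greedy) / instance_cost(m, v, x0, opt))
```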

Note that another important performance measure of interest is the regret. In this paper, we study a generalization of the classical regret called the $L$-constrained regret, which is defined as follows. The $L$-(constrained) dynamic regret of an online algorithm $ALG$ is $\rho_L(T)$ if for all sequences of cost functions $h_1, \ldots, h_T$, we have $\mathrm{cost}(ALG) - \mathrm{cost}(OPT(L)) \leq \rho_L(T)$, where $\mathrm{cost}(OPT(L))$ is the cost of an $L$-constrained offline optimal solution, i.e., one with movement cost upper bounded by $L$:

$$OPT(L) = \arg\min_{x \in \mathcal{X}^T} \sum_{t=1}^{T} h_t(x_t) + c(x_t, x_{t-1}) \quad \text{subject to} \quad \sum_{t=1}^{T} c(x_t, x_{t-1}) \leq L.$$

As the definitions above highlight, the regret and competitive ratio both compare with the cost of an offline optimal solution; however, regret constrains the movement allowed by the offline optimal. The classical notion of regret focuses on the static optimal ($L = 0$), but relaxing that to allow limited movement bridges regret and the competitive ratio since, as $L$ grows, the $L$-constrained offline optimal approaches the offline (dynamic) optimal. Intuitively, one can think of regret as being suited for evaluating learning algorithms in (nearly) static settings and the competitive ratio as being suited for evaluating learning algorithms in dynamic settings.

Online Balanced Descent. The state-of-the-art algorithm for SOCO is Online Balanced Descent (OBD). OBD, which is formally defined in Algorithm 1, uses the operator $\Pi_K(x)$ to denote the $\ell_2$ projection of $x$ onto a convex set $K$; this operator is defined as $\Pi_K(x) = \arg\min_{y \in K} \|y - x\|_2$. Intuitively, OBD works as follows. In every round, OBD projects the previously chosen point $x_{t-1}$ onto a carefully chosen level set of the current cost function $h_t$. The level set is chosen so that the hitting costs and movement costs are “balanced”: in every round, the movement cost is at most a constant times the hitting cost. The balance helps ensure that the online learner is matching the offline costs. Since neither cost is too high, OBD ensures that both are comparable to the offline optimal. The parameter $\gamma$ can be tuned to give the optimal competitive ratio and the appropriate level set can be efficiently selected via binary search.

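procedure OBD($h_t$, $x_{t-1}$)  ▷ Procedure to select $x_t$
     $v_t \leftarrow \arg\min_x h_t(x)$
     Let $x(l) = \Pi_{K_t^l}(x_{t-1})$. Initialize $l = h_t(v_t)$. Here $K_t^l = \{x \mid h_t(x) \leq l\}$.
     Increase $l$. Stop when $\frac{1}{2}\|x(l) - x_{t-1}\|_2^2 = \gamma (l - h_t(v_t))$.
     $x_t \leftarrow x(l)$.
     return $x_t$
Algorithm 1 Online Balanced Descent (OBD)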
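The binary search over level sets can be sketched in a special case where the projection has a closed form. The code below assumes quadratic hitting costs $h_t(x) = \frac{m}{2}\|x - v\|_2^2$ (so the minimum value is zero and every sublevel set is a ball around $v$), movement cost $\frac{1}{2}\|x - x_{t-1}\|_2^2$, and a balance condition of the form "movement cost equals $\gamma$ times hitting cost". The function name `obd_step` and its signature are illustrative; a general implementation would need a projection oracle for arbitrary level sets.

```python
import math
from typing import List


def obd_step(x_prev: List[float], v: List[float], m: float, gamma: float) -> List[float]:
    """One balanced-descent round for h(x) = m/2 * ||x - v||_2^2.

    The sublevel set {h <= l} is a ball of radius sqrt(2 l / m) around v, so
    projecting x_prev onto it moves x_prev along the segment towards v.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_prev, v)))
    if dist == 0.0:
        return list(x_prev)  # already at the minimizer, nothing to balance

    def imbalance(level: float) -> float:
        radius = math.sqrt(2.0 * level / m)
        move = max(dist - radius, 0.0)          # distance covered by the projection
        return 0.5 * move ** 2 - gamma * level  # movement cost minus gamma * hitting cost

    # imbalance is positive at level 0 and negative at level h(x_prev),
    # and it is decreasing in between, so bisection finds the balance point.
    lo, hi = 0.0, 0.5 * m * dist ** 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    radius = math.sqrt(2.0 * hi / m)
    # Point on the segment from v to x_prev at distance `radius` from v.
    return [b + (a - b) * radius / dist for a, b in zip(x_prev, v)]


print(obd_step([0.0], [1.0], m=1.0, gamma=1.0))
```

In this quadratic case the balance point also has the closed form $\|x_t - v\| = \|x_{t-1} - v\| / (1 + \sqrt{\gamma m})$, which gives a convenient check on the search.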

Implicitly, OBD can be viewed as a proximal algorithm with a dynamic step size Boyd14proximal, in the sense that, like proximal algorithms, OBD iteratively projects the previously chosen point onto a level set of the cost function. Unlike traditional proximal algorithms, OBD considers several different level sets, and carefully selects the level set in every round so as to balance the hitting and movement costs. We exploit this connection heavily when designing Regularized OBD (R-OBD), which is a proximal algorithm with a special regularization term added to the objective to help steer the online learner towards the hitting cost minimizer in each round.

OBD was proposed in chen2018smoothed, where the authors show that it has a constant, dimension-free competitive ratio in the setting where the movement costs are the $\ell_2$ norm and the hitting costs are locally polyhedral, i.e., grow at least linearly away from the minimizer. This was the first time an algorithm had been shown to be constant competitive beyond one-dimensional action spaces. In the same paper, a variation of OBD that uses a different balance condition was proven to have $O(\sqrt{TL})$ $L$-constrained regret for locally polyhedral hitting costs. OBD has since been shown to also have a constant, dimension-free competitive ratio when movement costs are the squared $\ell_2$ norm and hitting costs are strongly convex, which is the setting we consider in this paper. However, up until this paper, lower bounds for the strongly convex setting did not exist and it was not known whether the performance of OBD in this setting is optimal or if OBD can simultaneously achieve sublinear regret and a constant, dimension-free competitive ratio.

3 Lower Bounds

Our first set of results focuses on lower bounding the competitive ratio achievable by online algorithms for SOCO. While chen2018smoothed proves a general lower bound for SOCO showing that the competitive ratio of any online algorithm is $\Omega(\sqrt{d})$, where $d$ is the dimension of the action space, there are large classes of important problems where better performance is possible. In particular, when the hitting costs are $m$-strongly convex, goel2018smoothed has shown that OBD provides a dimension-free competitive ratio of $3 + O(1/m)$. However, no non-trivial lower bounds are known for the strongly convex setting.

Our first result in this section shows a general lower bound on the competitive ratio of SOCO algorithms when the hitting costs are strongly convex and the movement costs are quadratic. Importantly, there is a gap between this bound and the competitive ratio for OBD proven in goel2018smoothed. Our second result further explores this gap. We show a lower bound on the competitive ratio of OBD which highlights that OBD cannot achieve a competitive ratio that matches the general lower bound. This gap, and the construction used to show it, motivate us to propose new variations of OBD in the next section. We then prove that these new algorithms have competitive ratios that match the lower bound.

We begin by stating the first lower bound for strongly convex hitting costs in SOCO.

Theorem 1.

Consider hitting cost functions that are $m$-strongly convex with respect to the $\ell_2$ norm and movement costs given by $\frac{1}{2}\|x_t - x_{t-1}\|_2^2$. Any online algorithm must have a competitive ratio at least $\frac{1}{2}\left(1 + \sqrt{1 + \frac{4}{m}}\right)$.

Theorem 1 is proven in the appendix using an argument that leverages the fact that, when the movement cost is quadratic, reaching a target point via one large step is more costly than reaching it by taking many small steps. More concretely, to prove the lower bound we consider a scenario on the real line where the online algorithm encounters a sequence of cost functions whose minimizers are at zero followed by a very steep cost function whose minimizer is at $x = 1$. Without knowledge of the future, the algorithm has no incentive to move away from zero until the last step, when it is forced to incur a large cost; however, the offline adversary, with full knowledge of the cost sequence, can divide the journey into multiple small steps.
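The advantage of many small steps can be seen in a short worked calculation. It compares movement costs only, assuming the movement cost is $\frac{1}{2}\|x_t - x_{t-1}\|_2^2$ and the target lies at distance 1; the number of steps $n$ is an illustrative symbol, and the hitting costs paid along the way are ignored here.

```latex
\begin{align*}
\text{one step of length } 1: &\quad \tfrac{1}{2} \cdot 1^2 = \tfrac{1}{2}, \\
n \text{ steps of length } \tfrac{1}{n}: &\quad n \cdot \tfrac{1}{2} \left( \tfrac{1}{n} \right)^2 = \tfrac{1}{2n}.
\end{align*}
```

The movement cost of the gradual path shrinks like $1/n$. In the actual construction the offline optimal must trade this saving against the hitting costs it pays while away from zero, which is where the dependence on the convexity parameter enters.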

Importantly, the lower bound in Theorem 1 highlights the dependence of the competitive ratio on $m$, the convexity parameter. It shows that the case where online algorithms do the worst is when $m$ is small, and that algorithms that match the lower bound up to a constant are those for which the competitive ratio is $O(m^{-1/2})$ as $m \to 0^+$. Note that our results in Section 4 show that there exist online algorithms that precisely achieve the competitive ratio in Theorem 1. In contrast, the following shows that OBD cannot match the lower bound in Theorem 1.

Theorem 2.

Consider hitting cost functions that are $m$-strongly convex with respect to the $\ell_2$ norm and movement costs given by $\frac{1}{2}\|x_t - x_{t-1}\|_2^2$. The competitive ratio of OBD is $\Omega(m^{-2/3})$ as $m \to 0^+$, for any fixed balance parameter $\gamma$.

As we have discussed, OBD is the state-of-the-art algorithm for SOCO, and has been shown to provide a competitive ratio of $3 + O(1/m)$ goel2018smoothed. However, Theorem 2 highlights a gap between OBD and the general lower bound. If the lower bound is achievable (which we prove it is in the next section), this implies that OBD is a sub-optimal algorithm.

The proof of Theorem 2 gives important intuition about what goes wrong with OBD and how the algorithm can be improved. Specifically, our proof of Theorem 2 considers a scenario where the cost functions have minimizers very near each other, but OBD takes a series of steps without approaching the minimizing points. The offline optimal is able to pay little cost and stay near the minimizers, but OBD never moves enough to be close to the minimizers. Figure 1 illustrates the construction, showing OBD moving along the circumference of a circle, while the offline optimal stays near the origin.

Figure 1: Counterexample used to prove Theorem 2. In the figure, $\{x_t\}$ are the choices of OBD and $\{x_t^*\}$ are the choices of the offline optimal.

4 Algorithms

The lower bounds in Theorem 1 and Theorem 2 suggest a gap between the competitive ratio of OBD and what is achievable via an online algorithm. Further, the construction used in the proof of Theorem 2 highlights the core issue that leads to inefficiency in OBD. In the construction, OBD takes a large step from $x_{t-1}$ to $x_t$, but its distance to the offline optimal point, $x_t^*$, decreases by only a very small amount. This means that OBD is continually chasing the offline optimal but never closing the gap. In this section, we take inspiration from this example and develop two new algorithms that build on OBD but ensure that the gap to the offline optimal shrinks.

How to ensure that the gap to the offline optimal shrinks is not obvious since, without knowledge about the future, it is impossible to determine how $x_t^*$ will evolve. A natural idea is to determine an online estimate of $x_t^*$ and then move towards that estimate. Motivated by the construction in the proof of Theorem 2, we use the minimizer of the hitting cost at round $t$, $v_t$, as a rough estimate of the offline optimal and ensure that we close the gap to $v_t$ in each round.

There are a number of ways of implementing the goal of ensuring that OBD more aggressively moves toward the minimizer of the hitting cost each round. In this section, we consider two concrete approaches, each of which (nearly) matches the lower bound in Theorem 1.

The first approach, which we term Greedy OBD (Algorithm 2), is a two-stage algorithm, where the first stage applies OBD and the second stage explicitly takes a step (of carefully chosen size) directly towards the minimizer. We introduce the algorithm and analyze its performance in Section 4.1. Greedy OBD is order-optimal, i.e., matches the lower bound up to constant factors, in the setting of squared $\ell_2$ norm movement costs and quasiconvex hitting costs.

The second approach for ensuring that OBD moves aggressively toward the minimizer uses a different view of OBD. In particular, Greedy OBD uses a geometric view of OBD, which is the way OBD has been presented previously in the literature. Our second approach uses a “local view” of OBD that parallels the local view of gradient descent and mirror descent, e.g., see bansal2017potential; hazan2016introduction. In particular, the choice of an action in OBD can be viewed as the solution to a per-round local optimization. Given this view, we ensure that OBD more aggressively tracks the minimizer by adding a regularization term to this local optimization which penalizes points which are far from the minimizer. We term this approach Regularized OBD (Algorithm 3), and study it in Section 4.2. Note that Regularized OBD has a competitive ratio that precisely matches the lower bound, including the constant factors, when movement costs are Bregman divergences and hitting costs are $m$-strongly convex. Thus, it applies for more general movement costs than Greedy OBD but less general hitting costs.

4.1 Greedy OBD

The formal description of Greedy Online Balanced Descent (G-OBD) is given in Algorithm 2. G-OBD has two steps each round. First, the algorithm takes a standard OBD step from the previous point $x_{t-1}$ to a new point $x_t'$, which is the projection of $x_{t-1}$ onto a level set of the current hitting cost $h_t$, where the level set is chosen to balance hitting and movement costs. G-OBD then takes an additional step directly towards the minimizer of the hitting cost, $v_t$, with the size of the step chosen based on the convexity parameter $m$. G-OBD can be implemented efficiently using the same approach as described for OBD chen2018smoothed. G-OBD has two parameters $\gamma$ and $\mu$. The first, $\gamma$, is the balance parameter in OBD and the second, $\mu$, is a parameter controlling the size of the step towards the minimizer $v_t$. Note that the two-step approach of G-OBD is reminiscent of the two-stage algorithm used in bienkowski2018better; however, the resulting algorithms are quite distinct.

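procedure G-OBD($h_t$, $x_{t-1}$)  ▷ Procedure to select $x_t$
     $v_t \leftarrow \arg\min_x h_t(x)$
     $x_t' \leftarrow \mathrm{OBD}(h_t, x_{t-1})$
     if $\mu\sqrt{m} \geq 1$ then
         $x_t \leftarrow v_t$
     else
         $x_t \leftarrow \mu\sqrt{m}\, v_t + (1 - \mu\sqrt{m})\, x_t'$
     return $x_t$
Algorithm 2 Greedy Online Balanced Descent (G-OBD)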
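The second stage of G-OBD is simple enough to sketch on its own. The code below assumes the step towards the minimizer is a convex combination with weight $\mu\sqrt{m}$ on $v_t$, capped at 1; that step size is an assumption of this sketch, and the first-stage point is taken as an input rather than recomputed. The function name `gobd_second_stage` is hypothetical.

```python
import math
from typing import List


def gobd_second_stage(x_obd: List[float], v: List[float], m: float, mu: float) -> List[float]:
    """Move the first-stage OBD point x_obd towards the hitting-cost minimizer v.

    The fraction of the remaining distance covered is mu * sqrt(m) (assumed
    step rule); once that fraction reaches 1 the update lands exactly on v.
    """
    step = mu * math.sqrt(m)
    if step >= 1.0:
        return list(v)
    return [step * b + (1.0 - step) * a for a, b in zip(x_obd, v)]


# With m = 0.25 and mu = 1 the point moves halfway from the OBD point to v.
print(gobd_second_stage([0.0], [1.0], m=0.25, mu=1.0))
```

When $m$ is small the extra step is short, so the update stays close to plain OBD; the correction becomes more aggressive as the hitting costs become more strongly curved.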

While the addition of a second step in G-OBD may seem like a small change, it improves performance by an order-of-magnitude. We prove that G-OBD asymptotically matches the lower bound proven in Theorem 1, not just for $m$-strongly convex hitting costs but more broadly for quasiconvex costs.

Theorem 3.

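Consider quasiconvex hitting costs such that $h_t(x) - h_t(v_t) \geq \frac{m}{2}\|x - v_t\|_2^2$ and movement costs $c(x_t, x_{t-1}) = \frac{1}{2}\|x_t - x_{t-1}\|_2^2$. G-OBD with $\gamma = 1, \mu = 1$ is an $O(1/\sqrt{m})$-competitive algorithm as $m \to 0^+$.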

4.2 Regularized OBD

The G-OBD framework is based on the geometric view of OBD used previously in the literature. There are, however, two limitations to this approach. First, the competitive ratio obtained, while having optimal asymptotic dependence on $m$, does not match the constants in the lower bound of Theorem 1. Second, G-OBD requires repeated projections, which makes efficient implementation challenging when the functions have complex geometry.

Here, we present a variation of OBD based on a local view that overcomes these limitations. Regularized OBD (R-OBD) is computationally simpler and provides a competitive ratio that matches the constant factors in the lower bound in Theorem 1. However, unlike G-OBD, our analysis of R-OBD does not apply to quasiconvex hitting costs. R-OBD is described formally in Algorithm 3. In each round, R-OBD picks a point that minimizes a weighted sum of the hitting and movement costs, as well as a regularization term which encourages the algorithm to pick points close to the minimizer of the current hitting cost function, $v_t$. Thus, R-OBD can be implemented efficiently using two invocations of a convex solver. Note that R-OBD has two parameters $\lambda_1$ and $\lambda_2$ which adjust the weights of the movement cost and regularizer respectively.

While it may not be immediately clear how R-OBD connects to OBD, it is straightforward to illustrate the connection in the squared $\ell_2$ setting. In this case, computing $\arg\min_x h_t(x) + \frac{\lambda_1}{2}\|x - x_{t-1}\|_2^2$ is equivalent to doing a projection onto a level set of $h_t$, since the selection of the minimizer can be restated as the solution to $\nabla h_t(x_t) = \lambda_1 (x_{t-1} - x_t)$. Thus, without the regularizer, the optimization in R-OBD gives a local view of OBD and then the regularizer provides more aggressive movement toward the minimizer of the hitting cost.

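procedure R-OBD($h_t$, $x_{t-1}$)  ▷ Procedure to select $x_t$
     $v_t \leftarrow \arg\min_x h_t(x)$
     $x_t \leftarrow \arg\min_x h_t(x) + \lambda_1 c(x, x_{t-1}) + \lambda_2 c(x, v_t)$
     return $x_t$
Algorithm 3 Regularized OBD (R-OBD)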
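In the simplest setting the R-OBD update has a closed form. The sketch below assumes a quadratic hitting cost $h_t(x) = \frac{m}{2}\|x - v\|_2^2$, movement cost $c(x, y) = \frac{1}{2}\|x - y\|_2^2$, and a regularizer of the form $\lambda_2 c(x, v)$; under these assumptions the per-round objective is a strongly convex quadratic and its minimizer is a weighted average of $v$ and the previous point. The function name `robd_step_quadratic` is hypothetical.

```python
from typing import List


def robd_step_quadratic(
    x_prev: List[float], v: List[float], m: float, lam1: float, lam2: float
) -> List[float]:
    """Minimize m/2*||x - v||^2 + lam1/2*||x - x_prev||^2 + lam2/2*||x - v||^2.

    Setting the gradient (m + lam2)(x - v) + lam1 (x - x_prev) to zero gives
    x = w * v + (1 - w) * x_prev with w = (m + lam2) / (m + lam1 + lam2).
    """
    w = (m + lam2) / (m + lam1 + lam2)
    return [w * b + (1.0 - w) * a for a, b in zip(x_prev, v)]


# A larger lam2 pulls the chosen point closer to the minimizer v = 1.
print(robd_step_quadratic([0.0], [1.0], m=1.0, lam1=1.0, lam2=0.0))
print(robd_step_quadratic([0.0], [1.0], m=1.0, lam1=1.0, lam2=2.0))
```

The closed form makes the roles of the two weights visible: increasing $\lambda_1$ keeps the learner near its previous point, while increasing $\lambda_2$ moves it towards the minimizer.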

Not only does the local view lead to a computationally simpler algorithm, but we prove that R-OBD matches the constant factors in Theorem 1 precisely, not just asymptotically. Further, it does this not just in the setting where movement costs are the squared $\ell_2$ norm, but also in the case where movement costs are Bregman divergences.

Theorem 4.

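Consider hitting costs that are $m$-strongly convex with respect to a norm $\|\cdot\|$ and movement costs defined as $c(x_t, x_{t-1}) = D_h(x_t \| x_{t-1})$, where $h$ is $\alpha$-strongly convex and $\beta$-strongly smooth with respect to the same norm. Additionally, assume $\{h_t\}$, $h$ and its Fenchel Conjugate $h^*$ are differentiable. Then, R-OBD with parameters $\lambda_1 > 0$ and $\lambda_2 \geq 0$ has a competitive ratio of

$$\max\left( \frac{m + \lambda_2 \beta}{\lambda_1} \cdot \frac{1}{m},\; 1 + \frac{\beta^2}{\alpha} \cdot \frac{\lambda_1}{\lambda_2 \beta + m} \right).$$

If $\lambda_1$ and $\lambda_2$ satisfy $m + \lambda_2 \beta = \frac{\lambda_1}{2}\left( m + \sqrt{m^2 + \frac{4\beta^2}{\alpha} m} \right)$, then the competitive ratio is

$$\frac{1}{2}\left( 1 + \sqrt{1 + \frac{4\beta^2}{\alpha m}} \right).$$

Theorem 4 focuses on movement costs that are Bregman divergences, which generalizes the case of squared $\ell_2$ movement costs. To recover the squared $\ell_2$ case, we use $\|\cdot\| = \|\cdot\|_2$ and $\alpha = \beta = 1$, which results in a competitive ratio of $\frac{1}{2}\left(1 + \sqrt{1 + \frac{4}{m}}\right)$. This competitive ratio matches exactly with the lower bound claimed in Theorem 1. Further, in this case the assumption in Theorem 4 that the hitting cost functions are differentiable is not required (see Theorem 7 in the appendix).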
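The balancing of the two terms in the competitive ratio can be checked numerically. The sketch below treats the bound as $\max\big(\frac{m + \lambda_2\beta}{\lambda_1 m},\, 1 + \frac{\beta^2}{\alpha}\frac{\lambda_1}{\lambda_2\beta + m}\big)$ and the balanced value as $\frac{1}{2}\big(1 + \sqrt{1 + 4\beta^2/(\alpha m)}\big)$; these closed forms are assumptions of the sketch and should be read against the theorem statement. The function names are hypothetical.

```python
import math


def robd_ratio(m: float, lam1: float, lam2: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed form of the R-OBD competitive ratio bound for given weights."""
    return max(
        (m + lam2 * beta) / (lam1 * m),
        1.0 + (beta ** 2 / alpha) * lam1 / (lam2 * beta + m),
    )


def optimal_ratio(m: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed balanced value: 0.5 * (1 + sqrt(1 + 4 beta^2 / (alpha m)))."""
    return 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * beta ** 2 / (alpha * m)))


def balanced_lam1(m: float, lam2: float = 0.0, alpha: float = 1.0, beta: float = 1.0) -> float:
    """lam1 solving m + lam2*beta = lam1/2 * (m + sqrt(m^2 + 4 beta^2 m / alpha))."""
    return 2.0 * (m + lam2 * beta) / (m + math.sqrt(m * m + 4.0 * beta ** 2 * m / alpha))


for m in (1.0, 0.1, 0.01):
    lam1 = balanced_lam1(m)
    print(m, lam1, robd_ratio(m, lam1, 0.0), optimal_ratio(m))
```

With the balanced weight the two terms inside the maximum coincide, and the printed values grow roughly like $m^{-1/2}$ as $m$ shrinks.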

It is also interesting to investigate the settings of $\lambda_1$ and $\lambda_2$ that yield the optimal competitive ratio. Setting $\lambda_2 = 0$ achieves the optimal competitive ratio as long as $\lambda_1 = \frac{2}{1 + \sqrt{1 + \frac{4\beta^2}{\alpha m}}}$. By restating the update rule in R-OBD as $\nabla h(x_t) = \nabla h(x_{t-1}) - \frac{1}{\lambda_1} \nabla h_t(x_t)$, we see that R-OBD with $\lambda_2 = 0$ can be interpreted as “one step lookahead mirror descent”. Further, R-OBD with $\lambda_2 = 0$ can be implemented even when we do not know the location of the minimizer $v_t$. For example, when $h(x) = \frac{1}{2}\|x\|_2^2$, we can run gradient descent starting at $x_{t-1}$ to minimize the strongly convex function $h_t(x) + \frac{\lambda_1}{2}\|x - x_{t-1}\|_2^2$. Only local gradients will be queried in this process. However, the following lower bound highlights that this simple form comes at some cost in terms of generality when compared with our results for G-OBD.
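The gradient-descent implementation mentioned above can be sketched as follows. The code assumes the squared Euclidean movement cost, so the per-round objective is $h_t(x) + \frac{\lambda_1}{2}\|x - x_{t-1}\|_2^2$, and it only queries gradients of the hitting cost. The function name, the fixed step size, and the iteration count are illustrative choices, not prescriptions from the paper.

```python
from typing import Callable, List


def robd_via_gradient_descent(
    grad_h: Callable[[List[float]], List[float]],
    x_prev: List[float],
    lam1: float,
    step: float,
    iters: int = 500,
) -> List[float]:
    """Approximately minimize h(x) + lam1/2 * ||x - x_prev||_2^2 by gradient descent.

    Only gradients of the hitting cost h are used; its minimizer is never needed.
    The step size must be small enough for the objective's smoothness.
    """
    x = list(x_prev)
    for _ in range(iters):
        g = grad_h(x)
        x = [xi - step * (gi + lam1 * (xi - pi)) for xi, gi, pi in zip(x, g, x_prev)]
    return x


# Hitting cost h(x) = (x - 3)^2 on the real line, starting from x_prev = 0.
print(robd_via_gradient_descent(lambda x: [2.0 * (x[0] - 3.0)], [0.0], lam1=1.0, step=0.1))
```

In this example the stationarity condition $2(x - 3) + x = 0$ gives $x = 2$, which the iteration approaches geometrically.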

Theorem 5.

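Consider quasiconvex hitting costs such that $h_t(x) - h_t(v_t) \geq \frac{m}{2}\|x - v_t\|_2^2$ and movement costs given by $c(x_t, x_{t-1}) = \frac{1}{2}\|x_t - x_{t-1}\|_2^2$. Regularized OBD has a competitive ratio of $\Omega(1/m)$ when $\lambda_2 = 0$.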

5 Balancing Regret and Competitive Ratio

In the previous sections we have focused on the competitive ratio; however, another important performance measure is regret. In this section, we consider the $L$-constrained dynamic regret. The motivation for our study is daniely2019competitive, which provides an impossibility result showing that no algorithm can simultaneously maintain a constant competitive ratio and sublinear regret in the general setting of SOCO. However, daniely2019competitive utilizes linear hitting costs in its construction and thus it is an open question as to whether this impossibility result holds for strongly convex hitting costs. In this section, we show that the impossibility result does not hold for strongly convex hitting costs. To show this, we first characterize the parameters for which R-OBD gives sublinear regret.

Theorem 6.

Consider hitting costs that are strongly convex with respect to a norm and movement costs defined as , where is -strongly convex and -strongly smooth with respect to the same norm. Additionally, assume and its Fenchel Conjugate are differentiable. Further, suppose that is bounded above by , the diameter of the feasible set is bounded above by , and . Then, for such that and , where is such that , the -constrained regret of R-OBD is .

Theorem 6 highlights that regret can be achieved when and for some constant . This suggests that the tendency to aggressively move towards the minimizer should shrink over time in order to achieve a small regret. It is not possible to use Theorem 6 to simultaneously achieve the optimal competitive ratio and regret for all strongly convex hitting costs (). However, the corollary below shows that it is possible to simultaneously achieve a dimension-free, constant competitive ratio and an regret for all . An interesting open question that remains is whether it is possible to develop an algorithm that has sublinear regret and matches the optimal order for competitive ratio.

Corollary 1.

Consider the same conditions as in Theorem 6 and fix . R-OBD with parameters has an regret and is -competitive.

References

The appendices that follow provide the proofs of the results in the body of the paper. Throughout the proofs in the appendix we use the following notation to denote the hitting and movement costs of the online learner: and , where is the point chosen by the online algorithm at time . Similarly, we denote the hitting and movement costs of the offline optimal (adversary) as and , where is the point chosen by the offline optimal at time .

Before moving to the proofs, we summarize a few standard definitions that are used throughout the paper.

Definition 1.

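A function $f$ is $\alpha$-strongly convex with respect to a norm $\|\cdot\|$ if for all $x, y$ in the relative interior of the domain of $f$ and $\lambda \in (0, 1)$, we have

$$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) - \frac{\alpha}{2}\lambda(1-\lambda)\|x - y\|^2.$$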

Definition 2.

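A function $f$ is $\beta$-strongly smooth with respect to a norm $\|\cdot\|$ if $f$ is everywhere differentiable and if for all $x, y$ we have

$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{\beta}{2}\|y - x\|^2.$$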

Definition 3.

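A function $f: \mathcal{X} \to \mathbb{R}$ is quasiconvex if its domain $\mathcal{X}$ and all its sublevel sets

$$S_\alpha = \{x \in \mathcal{X} : f(x) \le \alpha\},$$

for $\alpha \in \mathbb{R}$, are convex.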

Definition 4.

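For a norm $\|\cdot\|$ in $\mathbb{R}^d$, its dual norm $\|\cdot\|_*$ (on $\mathbb{R}^d$) is defined to be

$$\|z\|_* = \sup\{\langle z, x \rangle : \|x\| \le 1\}.$$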

Definition 5.

For a convex function $f$, its Fenchel Conjugate $f^*$ is defined to be

$$f^*(u) = \sup_{x} \big( \langle u, x \rangle - f(x) \big).$$
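As a small numeric illustration of Definition 5, the sketch below approximates the Fenchel Conjugate of a one-dimensional quadratic by taking the supremum over a finite grid. The function `conjugate_on_grid`, the test function f(x) = (a/2)·x², and the grid are assumptions chosen for this example; for this particular f the conjugate is known in closed form as u²/(2a).

```python
from typing import Callable, Sequence


def conjugate_on_grid(f: Callable[[float], float], u: float, xs: Sequence[float]) -> float:
    """Approximate f*(u) = sup_x (u * x - f(x)) by a maximum over the grid xs."""
    return max(u * x - f(x) for x in xs)


a = 2.0
f = lambda x: 0.5 * a * x * x  # illustrative strongly convex function

# Grid on [-5, 5] with spacing 0.001 (an arbitrary choice for this sketch).
xs = [i / 1000.0 for i in range(-5000, 5001)]

u = 3.0
approx = conjugate_on_grid(f, u, xs)
exact = u * u / (2.0 * a)  # closed form of the conjugate for this f

# The grid point attaining the supremum; its derivative a * x_star equals u.
x_star = max(xs, key=lambda x: u * x - f(x))
```

In this example the supremum is attained where the derivative of f equals u, which is the one-dimensional form of the usual gradient relationship between a differentiable strictly convex function and its conjugate.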

Next, we introduce a few technical lemmas that are important throughout our analysis.

The first technical lemma is a characterization of strongly convex functions.

Lemma 1.

Suppose f is strongly convex for some with respect to some norm and both and are differentiable, then , the first condition implies the second condition and the third condition:

  1. ;

  2. ;

  3. .

To prove Lemma 1, we use Lemma 2, Lemma 3, and Lemma 4 below.

The following lemma is Theorem 6 in [kakade2009duality].

Lemma 2.

If is convex and closed, the following two conditions are equivalent:

  1. ;

i.e. f is strongly convex w.r.t some norm if and only if is -strongly smooth w.r.t the dual norm .

The next lemma is a special case of Lemma 17 in [shalev2010equivalence].

Lemma 3.

Let f be a closed, convex, and differentiable function. Then we have

The following technical result describes a well-known property of the gradient of the Fenchel Conjugate.

Lemma 4.

Suppose f is strongly convex for some with respect to some norm and both and are differentiable. Then we have

Proof.

For convenience, we define and . It suffices to prove that .

By Lemma 3, we obtain

(1)

Again by Lemma 3, we have

(2)

where we use the fact that .

Combining equations (1) and (2), we obtain

where in the last inequality we use the definition of strong convexity. Therefore we have proved that . ∎

Using the three lemmas above, we now prove Lemma 1.

Proof of Lemma 1.

By the first condition and Lemma 2, we know is strongly convex with respect to . Therefore we see

Using Lemma 3 and Lemma 4, we obtain

Rearranging the terms, we get

which is the second condition.

The third condition follows from subtracting the second condition from the first condition. ∎

Finally, before moving to the proofs of our main results, we prove two properties of the Bregman Divergence that play an important role in the analysis.
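For concreteness, the Bregman Divergence of a differentiable potential h is commonly written as D_h(x‖y) = h(x) − h(y) − ⟨∇h(y), x − y⟩. The sketch below computes it for two illustrative potentials; the helper name `bregman` and both example potentials are assumptions made for this illustration, not notation taken from the lemmas that follow.

```python
import math
from typing import Callable, List

Vector = List[float]


def bregman(
    h: Callable[[Vector], float],
    grad_h: Callable[[Vector], Vector],
    x: Vector,
    y: Vector,
) -> float:
    """Bregman divergence D_h(x || y) = h(x) - h(y) - <grad h(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_h(y), x, y))
    return h(x) - h(y) - inner


# Potential h(x) = (1/2) * ||x||_2^2 recovers half the squared Euclidean distance.
half_sq = lambda x: 0.5 * sum(xi * xi for xi in x)
half_sq_grad = lambda x: list(x)

# Negative entropy recovers the KL divergence between probability vectors.
neg_entropy = lambda p: sum(pi * math.log(pi) for pi in p)
neg_entropy_grad = lambda p: [math.log(pi) + 1.0 for pi in p]

x, y = [1.0, 2.0], [0.0, -1.0]
d_sq = bregman(half_sq, half_sq_grad, x, y)

p, q = [0.5, 0.5], [0.9, 0.1]
d_kl = bregman(neg_entropy, neg_entropy_grad, p, q)
kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

With the squared-norm potential the divergence is symmetric, while the negative-entropy example shows that in general D_h(x‖y) and D_h(y‖x) differ.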

Lemma 5.

and potential h, we have

Proof.

By the definition of Bregman Divergence, we obtain

Lemma 6.

For all , we have

Proof.

Using the definition of Bregman divergence, we obtain

Appendix A Proof of Theorem 1

We consider a sequence of hitting cost functions on the real line such that the algorithm stays at the starting point through time steps and is forced to incur a huge movement cost at time step , whereas the offline adversary can pay relatively little cost by dividing the long trek between and into multiple small steps through time steps .

Specifically, suppose the starting point of the algorithm and the offline adversary is , and the hitting cost functions are

for some large parameter that we choose later.

Suppose the algorithm first moves at time step . If , we stop the game at time step and compare the algorithm with an offline adversary which always stays at . The total cost of the offline adversary is 0, but the total cost of the algorithm is non-zero. So, the competitive ratio is unbounded.

Next we consider the case where . This implies that and is some non-zero point, say . We see that the cost incurred by the online algorithm is

Notice that the right hand side tends to as tends to infinity; specifically, we have

(3)

Now let us consider the offline optimal. Notice that, in the limit as tends to infinity, the offline optimal must satisfy and ; otherwise it would incur unbounded cost. Our lower bound is derived by considering the case when and so we constrain the adversary to satisfy the above, knowing that the adversary is not optimal for finite , i.e., with as .

Let the sequence of points the adversary chooses be . We compute the cost incurred by the adversary as follows where, to simplify presentation, we define to be the set .

In words, is twice the minimal offline cost subject to the constraints . We derive the limiting behavior of the offline costs as in the following lemma.

Lemma 7.

For , define

Then we have .

Given the lemma, the total cost of the offline adversary will be . Finally, applying (3), we know and ,

By taking the limit and and using Lemma 7, we obtain

All that remains is to prove Lemma 7, which describes the cost of the offline adversary in the limit as tends to infinity.
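The flavor of this limiting argument can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the construction of Theorem 1 itself: it assumes one-dimensional hitting costs (m/2)·x² at every intermediate step, movement costs (1/2)·(x_t − x_{t−1})², a start at 0, and a required end point v. Under these assumptions the minimal cost of reaching v in k steps is (a_k/2)·v², with a_1 = 1 and a_{k+1} = (a_k + m)/(a_k + m + 1). The names `offline_coefficients` and `fixed_point` are hypothetical.

```python
import math
from typing import List


def offline_coefficients(m: float, n: int) -> List[float]:
    """Return [a_1, ..., a_n] for the ASSUMED toy model described above.

    a_1 = 1 corresponds to jumping from 0 to v in a single step, at cost
    (1/2) * v^2. Each further step minimizes, over the intermediate point y,
    (a_k / 2) * y^2 + (m / 2) * y^2 + (1/2) * (v - y)^2.
    """
    a = [1.0]
    for _ in range(n - 1):
        a.append((a[-1] + m) / (a[-1] + m + 1.0))
    return a


def fixed_point(m: float) -> float:
    """Positive root of a^2 + m * a - m = 0, the attracting fixed point."""
    return (-m + math.sqrt(m * m + 4.0 * m)) / 2.0


m = 0.1
coeffs = offline_coefficients(m, 200)
limit = fixed_point(m)

# Ratio of the single-jump cost (a_1 / 2) to the limiting multi-step cost.
ratio = coeffs[0] / limit
```

In this toy model the coefficients decrease monotonically toward the positive fixed point, and the ratio of the single-jump cost to the limiting multi-step cost grows as m shrinks, which mirrors the qualitative behavior described in the text.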

Proof of Lemma 7.

Using the fact that the costs are all homogeneous of degree 2, we see that for all , we have

(4)

The sequence has a recursive relationship as follows:

(5)

Solving the equation , we find the two fixed points of the recursive relationship are

and

Notice that for , we have

Using this property, we obtain

(6)

and

(7)

Notice that . By dividing equations (6) and (7), we obtain

Remember that . Therefore we have