Tight Lower Bounds for Multiplicative Weights Algorithmic Families


Nick Gravin Massachusetts Institute of Technology. 32 Vassar St, Cambridge, MA 02139. ngravin@mit.edu.    Yuval Peres Microsoft Research. One Microsoft Way, Redmond, WA 98052. peres@microsoft.com.    Balasubramanian Sivan Google Research. 111 8th Avenue, New York, NY 10011. balusivan@google.com.
Abstract

We study the fundamental problem of prediction with expert advice and develop regret lower bounds for a large family of algorithms for this problem. We develop simple adversarial primitives that lend themselves to various combinations leading to sharp lower bounds for many algorithmic families. We use these primitives to show that the classic Multiplicative Weights Algorithm (MWA) has a regret of $\sqrt{\frac{T\ln k}{2}}$, thereby completely closing the gap between upper and lower bounds. We further show a regret lower bound of $\frac{2}{3}\sqrt{\frac{T\ln k}{2}}$ for a much more general family of algorithms than MWA, where the learning rate can be arbitrarily varied over time, or even picked from arbitrary distributions over time. We also use our primitives to construct adversaries in the geometric horizon setting for MWA: we precisely characterize the regret for the case of $k=2$ experts, and prove a lower bound within a constant factor of the upper bound for an arbitrary number of experts $k$.

1 Introduction

In this paper we develop tight lower bounds on the regret obtainable by a broad family of algorithms for the fundamental problem of prediction with expert advice. Predicting future events based on past observations, a.k.a. prediction with expert advice, is a classic problem in learning. The experts framework was the first framework proposed for online learning and encompasses several applications as special cases. The underlying problem is an online optimization problem: a player has to make a decision at each time step, namely, decide which of the experts' advice to follow. At every time $t$, an adversary sets gains for each expert: a gain of $g_{i,t}$ for expert $i$ at time $t$. Simultaneously, the player, seeing the gains from all previous steps but not those of step $t$, has to choose an action, i.e., decide which expert to follow. If the player follows expert $j(t)$ at time $t$, he gains $g_{j(t),t}$. At the end of each step $t$, the gains associated with all experts are revealed to the player, and the player's choice is revealed to the adversary. In the finite horizon model, this process is repeated for $T$ steps, and the player's goal is to perform (achieve a cumulative gain) as close as possible to the best single action (best expert) in hindsight, i.e., to minimize his regret $R_{T,k}$:

 $R_{T,k}=\max_{1\le i\le k}\sum_{t=1}^{T}g_{i,t}-\sum_{t=1}^{T}g_{j(t),t}.$

Apart from assuming that the $g_{i,t}$'s are bounded in $[0,1]$, we don't assume anything else about the gains. (As one might expect, it turns out that restricting the adversary to set gains in $\{0,1\}$ instead of $[0,1]$ is without loss of generality; see [10] or [18]. Henceforth, we restrict ourselves to the binary adversary, which just sets gains of $0$ or $1$.) Just as natural as the finite horizon model is the model with a geometric horizon: the stopping time is a geometric random variable with expectation $\frac{1}{\delta}$. In other words, the process ends at any given step with probability $\delta$, independently of the past. Equivalently, both the player and the adversary discount the future with a factor of $1-\delta$. In this paper, we study both the finite horizon model and the geometric horizon model. We begin with the discussion of the finite horizon model below.
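To make the setup concrete, here is a small sketch (our own illustrative code, not from the paper) computing the regret $R_{T,k}$ of a fixed sequence of choices against a binary gain matrix:

```python
import numpy as np

def regret(gains, choices):
    """Regret R_{T,k}: gains[t][i] is expert i's 0/1 gain at step t,
    and choices[t] is the expert the player followed at step t."""
    gains = np.asarray(gains)
    best_expert = gains.sum(axis=0).max()   # best single expert in hindsight
    player = sum(gains[t][j] for t, j in enumerate(choices))
    return best_expert - player

# Two experts, T = 4: expert 0 always gains; a player that alternates
# misses expert 0's gain on two of the four steps.
g = [[1, 0], [1, 0], [1, 0], [1, 0]]
print(regret(g, [0, 1, 0, 1]))  # -> 2
```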

Main contribution.

In this paper we develop simple adversarial primitives and demonstrate that, when applied in various combinations, they result in remarkably sharp lower bounds for a broad family of algorithms. We first describe the family of algorithms we study, and then discuss our main results.

Multiplicative Weights Algorithm.

We begin with the Multiplicative Weights Algorithm, which is a simple, powerful and widely used algorithm for a variety of learning problems. In the experts problem, at each time $t$, MWA computes the cumulative gain $G_{i,t-1}=\sum_{s<t}g_{i,s}$ of each expert $i$ accumulated over the past $t-1$ steps, and follows expert $i$'s advice with probability proportional to $e^{\eta G_{i,t-1}}$, namely, with probability $\frac{e^{\eta G_{i,t-1}}}{\sum_{r=1}^{k}e^{\eta G_{r,t-1}}}$, where $\eta$ is a learning-rate parameter that can be tuned. The per-step computation of the algorithm is extremely simple and straightforward. The intuition behind the algorithm is to increase the weight of any expert that performs well by a multiplicative factor. Despite the simplicity and the heuristic origins of the algorithm, it is surprisingly powerful: the pioneering work of Cesa-Bianchi et al. [7] showed that MWA obtains a sublinear regret of $\sqrt{\frac{T\ln k}{2}}(1+o(1))$, and that this is asymptotically optimal as the number of experts $k$ and the number of time steps $T$ both tend to $\infty$.
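A minimal sketch of MWA as just described (the function names are ours; the expectation over MWA's random choices is computed in closed form rather than sampled):

```python
import numpy as np

def mwa_probabilities(G, eta):
    """Follow expert i with probability proportional to exp(eta * G_i),
    where G_i is expert i's cumulative gain so far."""
    w = np.exp(eta * np.asarray(G, dtype=float))
    return w / w.sum()

def expected_regret_mwa(gains, eta):
    """Expected regret of MWA with a fixed rate eta against a fixed 0/1
    gain sequence (the expectation over MWA's coin flips is exact)."""
    gains = np.asarray(gains, dtype=float)
    G = np.zeros(gains.shape[1])   # cumulative gains of the experts
    expected_gain = 0.0
    for g_t in gains:
        expected_gain += mwa_probabilities(G, eta) @ g_t
        G += g_t                   # gains revealed at the end of the step
    return G.max() - expected_gain
```

For instance, `expected_regret_mwa([[1, 1], [1, 1]], 0.5)` is `0`, since the two experts are indistinguishable and any choice matches the benchmark.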

Families of algorithms.

The MWA is a single-parameter family of algorithms, i.e., the learning rate $\eta$ is the only parameter available to the player. In general one could think of $\eta$ being an arbitrary function of time $t$, i.e., at step $t$, the algorithm follows expert $i$ with probability proportional to $e^{\eta(t)G_{i,t-1}}$. Note that this is a $T$-parameter family of algorithms and is quite general. To see why, note that no matter how $\eta(1),\dots,\eta(t-1)$ were fixed, essentially any desired probability of following expert $i$ at time $t$ can be expressed as $\frac{e^{\eta(t)G_{i,t-1}}}{\sum_{r=1}^{k}e^{\eta(t)G_{r,t-1}}}$ for an appropriate choice of $\eta(t)$, irrespective of what the history was, something that is certainly not possible when $\eta$ is independent of $t$. The most general family of algorithms we study is one where at each time $t$, the quantity $\eta(t)$ is drawn from an arbitrary distribution $D_t$ over the reals. Since the $D_t$'s can be arbitrary, this is an infinite-parameter family of algorithms. We denote the

1. single-parameter MWA family by $\mathcal{A}_1$;

2. family where $\eta(t)$ decreases with $t$ by $\mathcal{A}_2$;

3. family where $\eta(t)$ is an arbitrary function of $t$ by $\mathcal{A}_3$;

4. family where $\eta(t)$ is drawn from a distribution $D_t$ for each $t$ by $\mathcal{A}_4$.

It is straightforward to see that $\mathcal{A}_1\subseteq\mathcal{A}_2\subseteq\mathcal{A}_3\subseteq\mathcal{A}_4$. The reason we start with $\mathcal{A}_1$ is that it is the classic MWA, and precisely characterizing its regret is still open. We study $\mathcal{A}_2$ because often, when MWA algorithms work with an unknown horizon $T$, they employ a strategy where $\eta(t)$ decreases with time. We then significantly generalize further by studying $\mathcal{A}_3$ and $\mathcal{A}_4$.

Minimax regret, and Notation.

We study the standard notion of minimax regret for each of the above families of algorithms. Formally, let $R_{T,k}(\mathrm{ALG},\mathrm{ADV})$ denote the regret achieved by algorithm $\mathrm{ALG}$ when faced with adversary $\mathrm{ADV}$ in the prediction with expert advice game with $T$ steps and $k$ experts. We use $R_k(\mathrm{ALG},\mathrm{ADV})$ to denote the asymptotic, in $T$, value of $R_{T,k}(\mathrm{ALG},\mathrm{ADV})$, i.e., its leading-order term as $T\to\infty$. (Although $R_k$ does not have $T$ in the subscript, it still depends on $T$; we suppress $T$ merely to indicate asymptotics in $T$.) The minimax regret of a family of algorithms $\mathcal{A}$ against a family of adversaries $\mathcal{D}$ is given by $R_{T,k}(\mathcal{A},\mathcal{D})=\inf_{\mathrm{ALG}\in\mathcal{A}}\sup_{\mathrm{ADV}\in\mathcal{D}}R_{T,k}(\mathrm{ALG},\mathrm{ADV})$. Let $\mathcal{D}_u$ denote the universe of all adversaries. We use the shorthand $R_{T,k}(\mathcal{A})$ for $R_{T,k}(\mathcal{A},\mathcal{D}_u)$, and we use $R_k(\mathcal{A})$ to denote the asymptotic, in $T$, value of $R_{T,k}(\mathcal{A})$.

Goal.

One of our goals in this paper is to compute the precise values of $R_k(\mathcal{A}_1)$, $R_k(\mathcal{A}_2)$, $R_k(\mathcal{A}_3)$, and $R_k(\mathcal{A}_4)$ for each value of $k$, and to describe and compute the adversarial sequences that realize these regrets. For clarity, we compute the precise value of $R_k(\mathcal{A})$ by:

1. computing the best-response adversary in $\mathcal{D}_u$ for every algorithm in $\mathcal{A}$;

2. computing the regret of the optimal algorithm in $\mathcal{A}$ (i.e., the algorithm that gets the smallest regret w.r.t. its best-response adversary).

In many cases, the first step, namely computing the best-response adversary, is challenging. We find the best-response adversaries for the families $\mathcal{A}_1$ and $\mathcal{A}_2$. For the families $\mathcal{A}_3$ and $\mathcal{A}_4$, we perform the first step approximately, i.e., we compute a nearly best-response adversary, and thus we obtain lower bounds on $R_k(\mathcal{A}_3)$ and $R_k(\mathcal{A}_4)$.

What is known, and what to expect?

It is well known that $R_{T,k}(\mathcal{A}_1)\le\sqrt{\frac{T\ln k}{2}}(1+o(1))$ for all $k$, and in the doubly asymptotic limit, as both $T$ and $k$ go to $\infty$, the optimal regret over all algorithms is $\sqrt{\frac{T\ln k}{2}}$ (see [7, 5]). While there are useful applications for large $k$, there are also several interesting use-cases of the experts problem with just a few experts: rain-or-shine ($k=2$), buy-or-sell-or-hold ($k=3$). It seems plausible that for small $k$, such as $k=2$ or $k=3$, $R_k(\mathcal{A}_1)$ could be a significant constant factor smaller than $\sqrt{\frac{T\ln k}{2}}$. And given that families like $\mathcal{A}_2$ and $\mathcal{A}_3$ are supersets of $\mathcal{A}_1$, it seems even more likely that $R_k(\mathcal{A}_2)$, $R_k(\mathcal{A}_3)$, etc. are a constant factor smaller than $\sqrt{\frac{T\ln k}{2}}$. Surprisingly, we show that this is not the case: the regret of $\sqrt{\frac{T\ln k}{2}}$ that arises as $k\to\infty$ is already incurred at small $k$. Thus our work completely closes the gap between upper and lower bounds for all $k$.

1.1 Main Results

Finite horizon model.

1. $R_k(\mathcal{A}_1)=R_k(\mathcal{A}_2)=\sqrt{\frac{T\ln k}{2}}$ for even $k$; an almost matching lower bound for odd $k$ (Theorem 1).

2. $R_k(\mathcal{A}_3)\ge R_k(\mathcal{A}_4)\ge\frac{2}{3}\sqrt{\frac{T\ln k}{2}}$ for even $k$; an almost matching lower bound for odd $k$ (Theorem 2).

Geometric horizon model.

In the geometric horizon model, the current time is not relevant, since the expected remaining time for which the game lasts is the same irrespective of how many steps have passed in the past. Thus $\eta$ is, without loss of generality, independent of $t$. Nevertheless, $\eta$ could still depend on other aspects of the history of the game, like the cumulative gains of all the experts, etc. We establish some quick notation before discussing results. Let $\delta$ denote the probability that the game stops at any given step, independently of the past (and therefore the expected length of the game is $\frac{1}{\delta}$). Let $R_{\delta,k}(\mathrm{ALG},\mathrm{ADV})$ denote the regret achieved by algorithm $\mathrm{ALG}$ when faced with adversary $\mathrm{ADV}$ in the prediction with expert advice game with stopping probability $\delta$ and $k$ experts. The minimax regret for a family of algorithms $\mathcal{A}$ is given by $R_{\delta,k}(\mathcal{A})=\inf_{\mathrm{ALG}\in\mathcal{A}}\sup_{\mathrm{ADV}}R_{\delta,k}(\mathrm{ALG},\mathrm{ADV})$. Let $R_k(\mathcal{A})$ denote the asymptotic, in $\frac{1}{\delta}$, value of $R_{\delta,k}(\mathcal{A})$. (Note that the notation $R_k(\mathcal{A})$ is overloaded: it could refer to the finite or the geometric horizon setting; since the setting is always clear from the context, we drop the $T$ vs. $\delta$ distinction.)
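The stopping rule can be sanity-checked with a short simulation (illustrative code; the seed and sample size are arbitrary):

```python
import numpy as np

# The game ends at each step with probability delta, independently of the
# past, so the horizon is geometric with mean 1/delta.
rng = np.random.default_rng(0)
delta = 0.01
stops = rng.geometric(delta, size=200_000)
print(abs(stops.mean() - 1 / delta))  # small relative to 1/delta = 100
```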

We show the following:   (1) an exact characterization of $R_2(\mathcal{A}_1)$ for $k=2$ experts;   (2) a regret lower bound of $\Omega\big(\sqrt{\ln k/\delta}\big)$ for all $k$.

The regret lower bound of $\Omega\big(\sqrt{\ln k/\delta}\big)$ we obtain is at most a constant factor away from the regret upper bound of $\sqrt{\frac{\ln k}{2\delta}}(1+o(1))$. Further, we show that the adversarial family that we use with the family of algorithms $\mathcal{A}_1$ to obtain the precise regret for $k=2$ experts also obtains the optimal regret for the universe of all algorithms. See Remark 4 for more on this result.

1.2 Simple adversarial primitives and families

While the optimal regret is defined by optimizing over the most general family of all adversaries (i.e., $\mathcal{D}_u$), one of our primary contributions in this work is to develop simple and analytically easy-to-work-with adversarial primitives that we use to construct adversarial families (call a typical such family $\mathcal{D}$) such that:

• $\mathcal{D}$ is simple to describe and to optimize over, i.e., computing $R_k(\mathcal{A},\mathcal{D})$ is much simpler than computing $R_k(\mathcal{A},\mathcal{D}_u)$.

• Optimizing over $\mathcal{D}$ is guaranteed to be as good (or approximately as good) as optimizing over $\mathcal{D}_u$ for many algorithmic families $\mathcal{A}$, i.e., $R_k(\mathcal{A},\mathcal{D})=R_k(\mathcal{A},\mathcal{D}_u)$ (at least approximately) for many $\mathcal{A}$. Since $\mathcal{D}\subseteq\mathcal{D}_u$ implies $R_k(\mathcal{A},\mathcal{D})\le R_k(\mathcal{A},\mathcal{D}_u)$, the non-trivial part is to prove the (approximate) reverse inequality.

We demonstrate the versatility of our primitives by using simple combinations of them to develop sharp lower bounds for the algorithmic families $\mathcal{A}_1$, $\mathcal{A}_2$, $\mathcal{A}_3$, and $\mathcal{A}_4$. There is a lot of room for further combinations of primitives that might be useful to construct adversarial families tailored to other algorithmic families.

The “looping” and “straight-line” primitives.

The looping primitive partitions the set of experts into two teams, say $P_0$ and $P_1$, and then alternates: it advances (gives a gain of $1$ to) all experts in team $P_0$ in one step and all experts in team $P_1$ in the next, and so on. The straight-line primitive picks an arbitrary expert and keeps advancing that expert by $1$ in each step. For $k=2$ experts the two teams are singletons; the generalization of these primitives to arbitrary $k$ is straightforward.
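The two primitives can be sketched as gain-matrix generators (the function names, and the convention that experts $0,\dots,k/2-1$ form team $P_0$, are our own):

```python
def looping(T, k):
    """Looping primitive: alternately advance team P0 (experts 0..k/2-1)
    and team P1 (the rest); returns a T x k 0/1 gain matrix."""
    half = k // 2
    return [[1 if (i < half) == (t % 2 == 0) else 0 for i in range(k)]
            for t in range(T)]

def straight_line(T, k, i=0):
    """Straight-line primitive: advance only expert i at every step."""
    return [[1 if j == i else 0 for j in range(k)] for t in range(T)]

def loop_then_straight(T, k, ell):
    """Loop for T - ell steps, then go straight for the last ell steps."""
    return looping(T - ell, k) + straight_line(ell, k)
```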

Combining the primitives.

Here's how we create effective adversarial families from these primitives. In fact the families are often trivial, i.e., they have only one member, and therefore there is nothing to optimize. We ignore the odd/even distinctions here for ease of description and just focus on the even case. Please see the technical sections for precise descriptions, which are only slightly different from what is presented here.

1. Perform $\frac{T-\ell}{2}$ loops and then $\ell$ straight-line steps, for a suitable $\ell$ with $\sqrt{T}\ll\ell\ll T$. Call this adversary LSD (which stands for loop-straight-deterministic). Clearly, this adversarial family is simple to describe, and there is nothing to optimize here as there is only one member in the family. Most importantly, it gives the precisely optimal regret for the algorithmic families $\mathcal{A}_1$ and $\mathcal{A}_2$ as $T\to\infty$, i.e., $R_k(\mathcal{A}_1,\{\mathrm{LSD}\})=R_k(\mathcal{A}_1)$ and $R_k(\mathcal{A}_2,\{\mathrm{LSD}\})=R_k(\mathcal{A}_2)$.

The best known regret lower bound for $\mathcal{A}_1$ was the one of [11], which leaves a constant factor gap between upper and lower bounds that our work closes. We are not aware of prior lower bounds for $\mathcal{A}_2$.

2. Perform $j$ loops and then $\ell$ straight-line steps, where $j$ is chosen uniformly at random from $\big\{0,1,\dots,\frac{T-\ell}{2}\big\}$. This family is simple, and there is nothing to optimize here as well. Call this adversary LSU (denoting loop, straight, uniformly random). We show that for $\mathcal{A}=\mathcal{A}_3$ or $\mathcal{A}=\mathcal{A}_4$: $R_k(\mathcal{A},\{\mathrm{LSU}\})\ge\frac{2}{3}\sqrt{\frac{T\ln k}{2}}$.

Note that while this lower bound doesn't precisely match the upper bound, the upper bound is $\sqrt{\frac{T\ln k}{2}}$ and is likely even smaller for small $k$ (particularly for a large family of algorithms like $\mathcal{A}_3$ or $\mathcal{A}_4$); thus our result shows that the ratio between upper and lower bounds is at most $\frac{3}{2}$, and likely even smaller. To the best of our knowledge, our lower bound is the first for the classes $\mathcal{A}_3$ and $\mathcal{A}_4$.

3. In the geometric horizon setting, even for the family $\mathcal{A}_1$ and at $k=2$ experts, instead of a single adversary working for all members of $\mathcal{A}_1$, we have a single-parameter family of adversaries to optimize over. Namely, follow the straight-line primitive for $s$ steps and then the looping primitive for the remainder of the game. Call this single-parameter family (parameterized by $s$) $\mathrm{SL}$. The exact number $s$ is determined by optimizing it as a function of the parameter $\eta$ used by the algorithm in $\mathcal{A}_1$. Specifically, for the case of $k=2$ experts we show that optimizing over $\mathrm{SL}$ realizes the minimax regret of $\mathcal{A}_1$ exactly. Note that $\mathrm{SL}$ is again simple to describe and straightforward to optimize over. Further, it is the precisely optimal adversary family not just for $\mathcal{A}_1$ but also for the universe of all algorithms (see Remark 4).

4. But in the geometric horizon setting, if we don't shoot for the precisely optimal adversary family and aim for just approximately optimal, then we don't need a single-parameter family: just following one of the two looping/straight-line primitives gives a lower bound of $\Omega\big(\sqrt{\ln k/\delta}\big)$. Let $\mathrm{L}$ and $\mathrm{S}$ be the looping and straight-line primitives. Then $R_k(\mathcal{A}_1,\{\mathrm{L},\mathrm{S}\})=\Omega\big(\sqrt{\ln k/\delta}\big)$. Note that while this lower bound doesn't precisely match the upper bound, the latter is at most $\sqrt{\frac{\ln k}{2\delta}}(1+o(1))$ (a simple extension of the standard proof that MWA has a regret upper bound of $\sqrt{\frac{T\ln k}{2}}$ in the finite horizon setting with $T$ steps and $k$ experts, together with the fact that in the geometric horizon setting the expected stopping time is $\frac{1}{\delta}$), which is at most a constant factor larger than the lower bound. The only known regret lower bound in the geometric horizon setting was what one could infer from the finite horizon lower bound of [11], and it is not even clear what that exactly translates to in the geometric horizon setting.

Remark 1

To give a sense that the primitives offer enough variety in combination, here is a simple modification of the adversary LSU, which we call LSU$'$: use LSU with probability $1-p$, and with probability $p$ play the looping primitive for all $T$ steps. This increases the lower bound of Theorem 2 (see Theorem 3). We believe that it can be increased further by picking the stopping time for looping from a non-uniform distribution, etc.

1.3 Motivation and discussion

In this work we seek to understand the structure of worst-case input sequences for a broad family of algorithms and crisply expose their vulnerabilities. By identifying such structures, we also obtain the precise regret suffered by these algorithms. Our motivation in exploring this question includes the following.

1. Even 25 years after MWA was introduced [17, 25], we do not have a sharp regret bound for it. MWA is known to suffer a regret of at most $\sqrt{\frac{T\ln k}{2}}(1+o(1))$, but the best known lower bound on regret is the one of [11], with a constant factor gap between these two bounds. For larger families like $\mathcal{A}_2$, $\mathcal{A}_3$, and $\mathcal{A}_4$, no lower bounds were known. For an algorithm as widely used as MWA, it is fruitful to have a sharp regret characterization.

2. The patterns in the worst-case adversarial sequences that we characterize are simple to spot if they exist in the input (or even if anything close to them exists), and spotting them enables simple amendments to the algorithm that result in significant gains.

3. The problem is theoretically clean and challenging: how powerful are simple input patterns, beyond the typically used purely random sequences, in inflicting regret?

Related Work.

Classic works: The book by Cesa-Bianchi and Lugosi [6] is an excellent source for both applications and references for prediction with expert advice. The prediction with expert advice paradigm was introduced by Littlestone and Warmuth [17] and Vovk [25]. The famous multiplicative weights update algorithm was introduced independently by these two works: as the weighted majority algorithm by Littlestone and Warmuth, and as the aggregating algorithm by Vovk. The pioneering work of Cesa-Bianchi et al. [7] considered a binary outcome space for nature and showed that for the absolute loss function, the asymptotically optimal regret is $\sqrt{\frac{T\ln k}{2}}(1+o(1))$. This was later extended to continuous outcomes for nature by Haussler et al. [13]. The asymptotic optimality of this bound for arbitrary loss (gain) functions follows from the analysis of Cesa-Bianchi [5]. When it is known beforehand that the cumulative loss of the optimal expert is going to be small, the optimal regret can be considerably improved; such results were obtained by Littlestone and Warmuth [17] and Freund and Schapire [9]. With certain assumptions on the loss function, the simplest possible algorithm of following the best expert already guarantees sub-linear regret (Hannan [12]). Even when the loss functions are unbounded, if the loss functions are exponentially concave, sub-linear regret can still be achieved (Blum and Kalai [4]).

Recent works: Gravin et al. [10] give the minimax optimal algorithm and the optimal regret for the prediction with expert advice problem for the cases of $k=2$ and $k=3$ experts. The focus of [10] was providing a regret upper bound for the family of all algorithms, while the focus of this paper is to provide regret lower bounds for large families of algorithms. Luo and Schapire [18] consider a setting where the adversary is restricted to pick gain vectors from the basis vector space. Abernethy et al. [2] consider a different variant of the experts problem where the game stops when the cumulative loss of any expert exceeds a given threshold. Abernethy et al. [1] consider general convex games and compute the minimax regret exactly when the input space is a ball, and show that the algorithms of Zinkevich [26] and Hazan et al. [14] are optimal w.r.t. minimax regret. Abernethy et al. [3] provide upper and lower bounds on the regret of an optimal strategy for several online learning problems without providing algorithms, by relating the optimal regret to the behavior of a certain stochastic process. Mukherjee and Schapire [21] consider a continuous experts setting where the algorithm knows beforehand the maximum number of mistakes of the best expert. Rakhlin et al. [22] introduce the notion of sequential Rademacher complexity and use it to analyze the learnability of several problems in online learning w.r.t. minimax regret. Rakhlin et al. [23] use the sequential Rademacher complexity introduced in [22] to analyze learnability w.r.t. general notions of regret (and not just minimax regret). Rakhlin et al. [24] use the notion of conditional sequential Rademacher complexity to find relaxations of problems like prediction with static experts that immediately lead to algorithms and associated regret guarantees; they show that the random playout strategy has a sound basis and propose a general method to design algorithms as a random playout. Koolen [15] studies the regret w.r.t. every expert, rather than just the best expert in hindsight, and considers tradeoffs in the Pareto frontier. McMahan and Abernethy [19] characterize the minimax optimal regret for online linear optimization games as the supremum over the expected value of a function of a martingale difference sequence, with similar characterizations for the minimax optimal algorithm and the adversary. McMahan and Orabona [20] study online linear optimization in Hilbert spaces and characterize minimax optimal algorithms. Chaudhuri et al. [8] describe a parameter-free learning algorithm motivated by the case of a large number of experts. Koolen and van Erven [16] develop a prediction strategy called Squint, and prove bounds that incorporate both quantile and variance guarantees.

2 Finite horizon

We begin our analysis of MWA by focusing on the simple case of $k=2$ experts. We first identify the structure of the optimal adversary, and through it we obtain the tight regret bound as $T\to\infty$. Before proceeding further, it is useful to recall that when the leading expert is ahead of the lagging expert by $d$, the MWA algorithm follows these experts with probabilities $\frac{e^{\eta d}}{e^{\eta d}+1}$ and $\frac{1}{e^{\eta d}+1}$ respectively. Thus, when the adversary increases $d$ by $1$, i.e., increases the gain of the leading expert by $1$, the regret benchmark (namely, the gain of the leading expert) increases by $1$, whereas MWA is correct only with probability $\frac{e^{\eta d}}{e^{\eta d}+1}$, and this therefore inflicts a regret of $\frac{1}{e^{\eta d}+1}$ on MWA. On the other hand, if the adversary decreases $d$ by $1$, then the benchmark doesn't change, whereas MWA succeeds with probability $\frac{1}{e^{\eta d}+1}$, and this therefore inflicts a regret of $-\frac{1}{e^{\eta d}+1}$. When the adversary doesn't change $d$, the regret inflicted is $0$.
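These per-step regret values can be checked directly (a small sketch with illustrative values of $\eta$ and $d$):

```python
import math

def p_leader(eta, d):
    """Probability that MWA follows the leader when it leads by d (k = 2)."""
    return math.exp(eta * d) / (math.exp(eta * d) + 1)

eta, d = 0.3, 2
up = 1 - p_leader(eta, d)        # regret of advancing the leader at gap d
down = -(1 - p_leader(eta, d))   # regret of advancing the laggard at gap d
assert abs(up - 1 / (math.exp(eta * d) + 1)) < 1e-12
assert abs(down + 1 / (math.exp(eta * d) + 1)) < 1e-12

# One full loop at gap d (up at d, then down from d + 1) inflicts
# 1/(e^{eta d}+1) - 1/(e^{eta (d+1)}+1) > 0, largest at d = 0.
loop = lambda d: (1 - p_leader(eta, d)) - (1 - p_leader(eta, d + 1))
assert loop(0) > loop(1) > 0
```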

Let $\eta$ be the fixed update rate of the optimal MWA (the parameter in the exponent, as explained in Section 1). (In fact, we can identify the optimal adversary for a much broader family of algorithms; see Appendix A.3 for more details.) Against a specific algorithm, an optimal adversary can always be found in the class of deterministic adversaries. The actions of the optimal adversary (against a specific MWA algorithm) depend only on the distance $d$ between the leading and lagging experts and the time step $t$.

1. Loop aggregation: At each time step, the adversary may either increase or decrease the gap $d$ by $1$, or leave it unchanged. We denote these actions of the adversary by $+1$, $-1$, and $0$. The respective regret values inflicted on the algorithm are $\frac{1}{e^{\eta d}+1}$, $-\frac{1}{e^{\eta d}+1}$, and $0$, which are all independent of the time when an action was taken. This means that if the adversary loops between $d$ and $d+1$ at several disconnected points of time, it may as well aggregate all of them and complete all of them in consecutive time steps. I.e., the optimal adversary starts at $d=0$ and then weakly monotonically increases $d$, stopping at various points, looping there for an arbitrary length of time between $d$ and $d+1$, and then proceeding forward.

2. Staying at the same $d$ is dominated: It is not hard to see that the action $0$ is dominated for the adversary, as it wastes a time step and inflicts zero regret on the algorithm. Thus the "weakly monotonically increases" in the previous paragraph can be replaced by "strictly monotonically increases" (except, of course, at the stopping points for looping).

3. Loop(0) domination: Define $\mathrm{Loop}(d)$ to be one loop between gaps $d$ and $d+1$. It is easy to see that the regret inflicted by $\mathrm{Loop}(d)$ is exactly $\frac{1}{e^{\eta d}+1}-\frac{1}{e^{\eta(d+1)}+1}$, and this quantity is maximized at $d=0$. Thus, the optimal adversary should replace all loops by loops at $d=0$. This gives us the structure claimed in Figure 1 for the optimal adversary.

Given the optimal adversary's structure (as described in Figure 1), w.l.o.g. we can assume it to be looping for $T-\ell$ steps at $d=0$ and then monotonically increasing $d$ for $\ell$ steps, at which point the game ends. In the following we analyze the regret inflicted by this adversary (which we showed is optimal for the class of algorithms $\mathcal{A}_1$) against the broader class $\mathcal{A}_2$ of MWA with a decreasing learning rate $\eta(t)$. The regret inflicted by the adversary is:

 $\sum_{t=1}^{(T-\ell)/2}\left[\frac{1}{2}-\frac{1}{e^{\eta(2t)}+1}\right]+\sum_{d=0}^{\ell-1}\frac{1}{e^{\eta(T-\ell+d+1)\cdot d}+1}. \qquad (1)$

Asymptotic regret of the optimal adversary.

We first notice that for a fixed adversary with a given $\ell$, the regret of MWA with decreasing $\eta(t)$ in (1) is greater than or equal to the regret of MWA with the constant rate $\eta'=\eta(T-\ell)$, i.e., at least

 $\frac{T-\ell}{2}\left[\frac{1}{2}-\frac{1}{e^{\eta'}+1}\right]+\sum_{d=0}^{\ell-1}\frac{1}{e^{d\eta'}+1}. \qquad (2)$

This is true as each individual term in (2) is equal to or smaller than the corresponding term in (1). In the following we use an adversary with $\sqrt{T}\ll\ell\ll T$, and for convenience we write $\tau=e^{\eta'}$. The two terms in (2) together place strong bounds on what $\eta'$ should be: they imply that $\eta'=\Theta\big(1/\sqrt{T}\big)$. We show this in two steps: first we show that $\eta'=O\big(1/\sqrt{T}\big)$, and then that $\eta'=\Omega\big(1/\sqrt{T}\big)$.

1. The first term in (2) forces $\eta'$ to be $O\big(1/\sqrt{T}\big)$. Write $\eta'=\frac{\alpha}{\sqrt{T}}$. Using $e^{\eta'}-1\ge\eta'$, the regret of MWA from the first term is at least

 $\frac{T-\ell}{2}\left[\frac{1}{2}-\frac{1}{e^{\eta'}+1}\right]\simeq\frac{T}{2}\left[\frac{1}{2}-\frac{1}{e^{\eta'}+1}\right]\ge\frac{\alpha\sqrt{T}}{4(1+e^{\eta'})}\simeq\frac{\sqrt{T}}{4}\cdot\frac{1}{\frac{2}{\alpha}+\frac{1}{\sqrt{T}}}.$

Since MWA's regret upper bound in the finite horizon model is $O(\sqrt{T})$, $\alpha$ must be $O(1)$, i.e., $\eta'=O\big(1/\sqrt{T}\big)$.

2. To show that $\eta'=\Omega\big(1/\sqrt{T}\big)$, we argue that the regret from the second term of (2) is $\omega(\sqrt{T})$ when $\eta'=o\big(1/\sqrt{T}\big)$. Indeed, suppose $\eta'\le\frac{1}{m}$ for some $m=\omega(\sqrt{T})$ with $m\le\ell$. For all $d\le m-1$, we have $\tau^{d}=e^{\eta'd}\le e$. Thus MWA's regret from the second term of (2) is at least

 $\sum_{d=0}^{m-1}\frac{1}{\tau^{d}+1}\ge\sum_{d=0}^{m-1}\frac{1}{e+1}=\Omega(m)=\omega(\sqrt{T}).$

Since MWA's regret upper bound in the finite horizon model is $O(\sqrt{T})$, we get $\eta'=\Omega\big(1/\sqrt{T}\big)$.

Now, we obtain the following asymptotic estimate for the second part of (2), where $\eta'=\frac{\alpha}{\sqrt{T}}$ and $\tau=e^{\eta'}$.

 $\sum_{d=0}^{\ell-1}\frac{1}{\tau^{d}+1}\sim\int_{0}^{\ell}\frac{dx}{e^{\eta'x}+1}=\frac{1}{\eta'}\ln\left(\frac{2e^{\ell\eta'}}{e^{\ell\eta'}+1}\right)\sim\frac{\sqrt{T}}{\alpha}\left(\ln 2-\ln\big(1+e^{-\ell\eta'}\big)\right). \qquad (3)$

The first part of (2) can be estimated as follows:

 $\frac{T-\ell}{2}\left[\frac{1}{2}-\frac{1}{e^{\eta'}+1}\right]\sim\frac{T}{2}\left[\frac{e^{\eta'}-1}{2(e^{\eta'}+1)}\right]\sim\frac{T}{2}\cdot\frac{\eta'}{4}=\frac{\alpha\sqrt{T}}{8}. \qquad (4)$

As $\ell\eta'\to\infty$, (3) simplifies to $\frac{\sqrt{T}\ln 2}{\alpha}$, while the estimate in (4) is $\frac{\alpha\sqrt{T}}{8}$. Now the estimate for the regret in (2) is minimized by the choice of parameter $\alpha=\sqrt{8\ln 2}$, so the regret of the optimal MWA is at least $\frac{\alpha\sqrt{T}}{8}+\frac{\sqrt{T}\ln 2}{\alpha}\ge 2\sqrt{\frac{T\ln 2}{8}}=\sqrt{\frac{T\ln 2}{2}}$. It is known that there is an MWA for $k=2$ experts with regret at most $\sqrt{\frac{T\ln 2}{2}}$ (asymptotically in $T$). Thus, we obtain the following Claim 1 (in the claim below, by "optimal MWA" we mean the MWA with the optimally tuned $\eta$).
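The optimization over $\alpha$ can be verified numerically (a sketch; the search grid is our own choice):

```python
import math

# Coefficient of sqrt(T) in the regret estimate: alpha/8 + ln(2)/alpha.
ln2 = math.log(2)
f = lambda a: a / 8 + ln2 / a
best_a = min((a / 1000 for a in range(1, 10000)), key=f)
assert abs(best_a - math.sqrt(8 * ln2)) < 1e-2     # optimizer alpha* = sqrt(8 ln 2)
assert abs(f(best_a) - math.sqrt(ln2 / 2)) < 1e-4  # optimum value sqrt(ln(2)/2)
```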

Claim 1

For $k=2$: $R_2(\mathcal{A}_1)=\sqrt{\frac{T\ln 2}{2}}$.

We generalize the adversary to arbitrary $k$ and obtain a tight lower bound for $\mathcal{A}_2$ matching the known upper bound for an arbitrary even number of experts, and an almost matching bound for an odd number of experts. Since $\mathcal{A}_1\subseteq\mathcal{A}_2$, the lower bound in Theorem 1 below applies to $\mathcal{A}_1$ as well.

Theorem 1

For $\mathcal{A}_2$: i) For even $k$: $R_k(\mathcal{A}_2)=\sqrt{\frac{T\ln k}{2}}$. ii) For odd $k$: an almost matching lower bound holds (see Appendix A.1 for the exact expression).

Proof: Let $\eta(t)$ be the update rate of the optimal MWA; we define $\eta'=\eta(T-\ell)$ and choose $\ell$ with $\sqrt{T}\ll\ell\ll T$. We employ the following adversary for an even number of experts $k$:

1. Divide all experts into two equal parties, numbered $P_0$ and $P_1$. For the first $T-\ell$ rounds, advance all the experts in party $P_0$ in even-numbered rounds, and all experts in party $P_1$ in odd-numbered rounds.

2. For the remaining $\ell$ steps, pick an arbitrary expert and keep advancing just that expert.

Similar to (1), this adversary inflicts a regret of at least $\sum_{t=1}^{(T-\ell)/2}\left[\frac{1}{2}-\frac{1}{e^{\eta(2t)}+1}\right]+\sum_{d=0}^{\ell-1}\frac{k-1}{e^{\eta(T-\ell+d+1)\cdot d}+k-1}$. We further notice that, similar to (2), the regret of MWA with decreasing $\eta(t)$ in the above expression is greater than or equal to the regret of MWA with the constant rate $\eta'$, i.e., the previous expression is at least

 $\frac{T-\ell}{2}\left[\frac{1}{2}-\frac{1}{e^{\eta'}+1}\right]+\sum_{d=0}^{\ell-1}\frac{k-1}{e^{d\eta'}+k-1}. \qquad (5)$

We use (4) to estimate the first term of (5). We estimate the second term of (5), similarly to (3), as follows (with $\eta'=\frac{\alpha}{\sqrt{T}}$):

 $\sum_{d=0}^{\ell-1}\frac{k-1}{e^{d\eta'}+k-1}\sim\int_{0}^{\ell}\frac{(k-1)\,dx}{e^{x\eta'}+k-1}=\frac{1}{\eta'}\ln\left(\frac{k\cdot e^{\ell\eta'}}{e^{\ell\eta'}+k-1}\right)\sim\frac{\sqrt{T}\ln k}{\alpha}. \qquad (6)$

Now, combining these two estimates, the regret from (5) is at least

 $\frac{\alpha\sqrt{T}}{8}+\frac{\sqrt{T}\ln k}{\alpha}\ge 2\sqrt{\frac{\alpha\sqrt{T}}{8}\cdot\frac{\sqrt{T}\ln k}{\alpha}}=\sqrt{\frac{T\ln k}{2}},$

which precisely matches the upper bound on the regret of MWA [7].

For an odd number of experts we employ almost the same adversary as for even $k$, although, since $k$ is now odd, we split the experts into two parties of almost equal sizes (see Appendix A.1 for full details).
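As a sanity check on Theorem 1, one can evaluate expression (5) at the asymptotically optimal parameters $\eta'=\sqrt{8\ln k/T}$ and, say, $\ell=T^{3/4}$ (one valid choice with $\sqrt{T}\ll\ell\ll T$), and compare it with $\sqrt{T\ln k/2}$; this is our own illustrative code:

```python
import math

def lower_bound(T, k):
    """Evaluate expression (5) with eta' = sqrt(8 ln(k) / T), ell = T^{3/4}."""
    eta = math.sqrt(8 * math.log(k) / T)
    ell = int(T ** 0.75)
    loop = (T - ell) / 2 * (0.5 - 1 / (math.exp(eta) + 1))
    straight = sum((k - 1) / (math.exp(d * eta) + k - 1) for d in range(ell))
    return loop + straight

T, k = 10**6, 4
ratio = lower_bound(T, k) / math.sqrt(T * math.log(k) / 2)
print(ratio)  # approaches 1 as T grows
```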

2.1 General variations of MWA

We have seen that the best MWA with a flat learning rate achieves optimal (or almost optimal, in the case of an odd number of experts) regret among all MWAs with monotone decreasing learning rates $\eta(t)$. However, one might suspect that in the finite horizon model a better strategy for tuning the parameters of MWA would be to use higher rates towards the end of the game. In the following we study the broader family $\mathcal{A}_3$ of MW algorithms where the learning parameter $\eta(t)$ can vary in an arbitrary way. In the following theorem we show that such adaptivity cannot decrease the regret of the algorithm below $\frac{2}{3}$ of the bound in Theorem 1.

Remark 2

In fact, our analysis extends to the family $\mathcal{A}_4$, where each $\eta(t)$ can be a random variable drawn from a distribution $D_t$. Effectively, with a random $\eta(t)$ the algorithm can realize any convex combination of the vectors of probabilities for following each expert at time $t$ that are achievable within $\mathcal{A}_3$. This constitutes a much richer family of algorithms compared to the standard single-parameter MWA family.

Theorem 2

For $\mathcal{A}_4$ (and hence $\mathcal{A}_3$): i) For even $k$: $R_k(\mathcal{A}_4)\ge\frac{2}{3}\sqrt{\frac{T\ln k}{2}}$. ii) For odd $k$: an almost matching lower bound holds.

Proof: Choose $\ell$ with $\sqrt{T}\ll\ell\ll T$ and define $R=\big\lfloor\frac{T-\ell}{2}\big\rfloor$. We use the following adversary for an even number of experts $k$:

1. Choose $j\in\{0,1,\dots,R-1\}$ uniformly at random. With probability $\frac{1}{2}$, don't advance any expert in the first step.

2. Divide all experts into two equal parties, numbered $P_0$ and $P_1$. For the next $2j$ rounds, advance all the experts in party $P_0$ in even-numbered rounds, and all experts in party $P_1$ in odd-numbered rounds.

3. For the next $\ell$ steps, pick any expert $i$ and keep advancing just expert $i$. Do nothing in the remaining steps.

The expected regret of the algorithm is at least

 $\frac{1}{R}\sum_{j=0}^{R-1}\Bigg[\frac{1}{2}\Bigg(\sum_{t=1}^{j}\left[\frac{1}{2}-\frac{1}{e^{\eta(2t)}+1}\right]+\sum_{d=0}^{\ell-1}\frac{k-1}{e^{d\cdot\eta(2j+d+1)}+k-1}\Bigg)+\frac{1}{2}\Bigg(\sum_{t=1}^{j}\left[\frac{1}{2}-\frac{1}{e^{\eta(2t+1)}+1}\right]+\sum_{d=0}^{\ell-1}\frac{k-1}{e^{d\cdot\eta(2j+d+2)}+k-1}\Bigg)\Bigg]. \qquad (7)$

Since $\eta(t)$ can be an arbitrary nonnegative number, we break (7) into terms with the same $\eta(t)$ (we also drop a few terms to simplify the expression). In the following, we also assume that $\eta(t)=\frac{\alpha(t)}{\sqrt{T}}$, where $\alpha(t)=\Theta(1)$ for every $t$. Later we explain why this assumption is without loss of generality.

 $(7)\;\ge\;\frac{1}{2R}\sum_{t=\ell}^{T-\ell-1}\left(\big[R-\lceil t/2\rceil\big]\cdot\left[\frac{1}{2}-\frac{1}{e^{\eta(t)}+1}\right]+\sum_{d=0}^{\ell-1}\frac{k-1}{e^{\eta(t)d}+k-1}\right)\simeq\sum_{t=\ell}^{T-\ell-1}\left(\frac{R-\lceil t/2\rceil}{R}\cdot\frac{\alpha(t)}{8\sqrt{T}}+\frac{\sqrt{T}\ln k}{2R\cdot\alpha(t)}\right)\ge\sum_{t=\ell}^{T-\ell-1}2\sqrt{\frac{R-\lceil t/2\rceil}{R}\cdot\frac{\alpha(t)}{8\sqrt{T}}\cdot\frac{\sqrt{T}\ln k}{2R\cdot\alpha(t)}}=\frac{\sqrt{\ln k}}{2\sqrt{R}}\sum_{t=\ell}^{T-\ell-1}\sqrt{\frac{R-\lceil t/2\rceil}{R}}\simeq\frac{\sqrt{\ln k}}{2\sqrt{R}}\cdot 2R\int_{0}^{1}\sqrt{1-x}\,dx=\sqrt{\frac{T\ln k}{2}}\cdot\frac{2}{3}. \qquad (8)$

In the above derivation we obtain the first approximation by using the estimates from (4) and (6).

We now argue that the assumption $\alpha(t)=\Theta(1)$ is without loss of generality for every $t$. We apply a similar argument as in Theorem 1, but now for each individual term with a particular $\eta(t)$. The looping term in (8) is already large enough for the estimate when $\alpha(t)=\omega(1)$, and it also places a strong bound on $\alpha(t)$ when $R-\lceil t/2\rceil$ is a constant fraction of $R$. To argue about $t$ close to the threshold $2R$, we can slightly modify the adversary by playing, with a small constant probability $\epsilon$, the entirely "looping" strategy (without the "straight-line" part). This makes the coefficient in front of the looping term sufficiently large, and at the same time decreases the lower bound by at most a $(1-\epsilon)$ factor. Taking arbitrarily small $\epsilon$, we obtain the bound in (8). This concludes the proof for an even number of experts.

For an odd number of experts we slightly modify the adversary, analogously to the case of an odd number of experts in Theorem 1. This introduces an additional factor (approaching $1$ as $k$ grows) in each of the looping terms.

Remark 3

One can slightly improve the lower bound in Theorem 2 and get a better factor than $\frac{2}{3}$. To this end we employ a more complicated adversary, playing with some probability the same strategy as in Theorem 2 and with the remaining probability a purely looping strategy (see Appendix A.2).

3 Geometric horizon

We prove two main results in this section. We derive the structure of the optimal adversary for $k=2$ experts and determine the exact asymptotic value of the optimal regret for $k=2$ experts as $\delta\to 0$ (see Appendix B.1). For an arbitrary number of experts $k$, we derive a regret lower bound of $\Omega\big(\sqrt{\ln k/\delta}\big)$ (see Appendix B.2).

References

• Abernethy et al. [2008a] Jacob Abernethy, Peter L. Bartlett, Alexander Rakhlin, and Ambuj Tewari. Optimal strategies and minimax lower bounds for online convex games. In 21st Annual Conference on Learning Theory - COLT 2008, Helsinki, Finland, July 9-12, 2008, pages 415–424, 2008a.
• Abernethy et al. [2008b] Jacob Abernethy, Manfred K. Warmuth, and Joel Yellin. When random play is optimal against an adversary. In COLT, pages 437–446, 2008b.
• Abernethy et al. [2009] Jacob Abernethy, Alekh Agarwal, Peter L. Bartlett, and Alexander Rakhlin. A stochastic view of optimal regret through minimax duality. In COLT 2009 - The 22nd Conference on Learning Theory, Montreal, Quebec, Canada, June 18-21, 2009, 2009.
• Blum and Kalai [1999] Avrim Blum and Adam Kalai. Universal portfolios with and without transaction costs. Machine Learning, 35(3):193–205, June 1999. ISSN 0885-6125.
• Cesa-Bianchi [1997] Nicolò Cesa-Bianchi. Analysis of two gradient-based algorithms for on-line regression. In Proceedings of the Tenth Annual Conference on Computational Learning Theory, COLT ’97, pages 163–170, New York, NY, USA, 1997. ACM.
• Cesa-Bianchi and Lugosi [2006] Nicolò Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006. ISBN 0521841089.
• Cesa-Bianchi et al. [1997] Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. How to use expert advice. J. ACM, 44(3):427–485, May 1997. ISSN 0004-5411.
• Chaudhuri et al. [2009] Kamalika Chaudhuri, Yoav Freund, and Daniel J. Hsu. A parameter-free hedging algorithm. In Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, Vancouver, British Columbia, Canada., pages 297–305, 2009.
• Freund and Schapire [1997] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, August 1997. ISSN 0022-0000.
• Gravin et al. [2016] Nick Gravin, Yuval Peres, and Balasubramanian Sivan. Towards optimal algorithms for prediction with expert advice. In To appear in the Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2016.
• Gyorgy et al. [2013] András Gyorgy, Dávid Pál, and Csaba Szepesvári. Online Learning: Algorithms for Big Data. Manuscript, 2013.
• Hannan [1957] James Hannan. Approximation to bayes risk in repeated play. Contributions to the Theory of Games, 3:97–139, 1957.
• Haussler et al. [1995] David Haussler, Jyrki Kivinen, and Manfred K. Warmuth. Tight worst-case loss bounds for predicting with expert advice. In EuroCOLT, pages 69–83, 1995.
• Hazan et al. [2006] Elad Hazan, Adam Kalai, Satyen Kale, and Amit Agarwal. Logarithmic regret algorithms for online convex optimization. In Learning Theory, 19th Annual Conference on Learning Theory, COLT 2006, Pittsburgh, PA, USA, June 22-25, 2006, Proceedings, pages 499–513, 2006.
• Koolen [2013] Wouter M. Koolen. The pareto regret frontier. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., pages 863–871, 2013.
• Koolen and van Erven [2015] Wouter M. Koolen and Tim van Erven. Second-order quantile methods for experts and combinatorial games. In Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015, pages 1155–1175, 2015.
• Littlestone and Warmuth [1994] Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, February 1994. ISSN 0890-5401.
• Luo and Schapire [2014] Haipeng Luo and Robert E. Schapire. Towards minimax online learning with unknown time horizon. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 226–234, 2014.
• McMahan and Abernethy [2013] H. Brendan McMahan and Jacob Abernethy. Minimax optimal algorithms for unconstrained linear optimization. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., pages 2724–2732, 2013.
• McMahan and Orabona [2014] H. Brendan McMahan and Francesco Orabona. Unconstrained online linear learning in hilbert spaces: Minimax algorithms and normal approximations. In Proceedings of The 27th Conference on Learning Theory, COLT 2014, Barcelona, Spain, June 13-15, 2014, pages 1020–1039, 2014.
• Mukherjee and Schapire [2010] Indraneel Mukherjee and Robert E. Schapire. Learning with continuous experts using drifting games. Theor. Comput. Sci., 411(29-30):2670–2683, 2010.
• Rakhlin et al. [2010] Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Online learning: Random averages, combinatorial parameters, and learnability. In Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada., pages 1984–1992, 2010.
• Rakhlin et al. [2011] Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Online learning: Beyond regret. In COLT 2011 - The 24th Annual Conference on Learning Theory, June 9-11, 2011, Budapest, Hungary, pages 559–594, 2011.
• Rakhlin et al. [2012] Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. Relax and randomize : From value to algorithms. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States., pages 2150–2158, 2012.
• Vovk [1990] Volodimir G. Vovk. Aggregating strategies. In Proceedings of the Third Annual Workshop on Computational Learning Theory, COLT ’90, pages 371–386, 1990. ISBN 1-55860-146-5.
• Zinkevich [2003] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages 928–936, 2003.

Appendix A Finite horizon

A.1 Theorem 1 for an odd number of experts k

Theorem 1. For : i) For even : . ii) For odd : .

Proof: We have already proven this theorem for even , so let  and let  be, as before, the update rate (non-increasing in ) of the optimal MWA. We employ almost the same adversary as for even ; however, since  is now odd, we split the experts into two parties of almost equal sizes. As in the even case, we choose  and let .

1. Divide all experts into two parties of sizes  and , respectively. Advance all experts within the party of size  in one step, then advance all experts within the other party of size  in the next step; repeat this two-step cycle  times.

2. Fix one expert and keep advancing only that expert for the remaining  steps.
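As a sanity check, the two-phase adversary above can be simulated against MWA with a fixed learning rate. The following is a minimal sketch (the function name, the parameter choices, and the decision to always advance expert 0 in the straight-line phase are our own illustrative assumptions, not taken from the proof):

```python
import math

def mwa_regret(k, T, eta, loop_steps):
    """Regret of fixed-rate MWA against the loop-then-straight-line
    adversary sketched above (k odd: parties of sizes m+1 and m,
    with m = (k - 1) // 2)."""
    m = (k - 1) // 2
    gains = [0.0] * k          # cumulative gain of each expert
    player = 0.0               # expected cumulative gain of the player
    for t in range(T):
        wsum = sum(math.exp(eta * g) for g in gains)
        probs = [math.exp(eta * g) / wsum for g in gains]
        if t < loop_steps:
            # loop phase: alternate between the two parties
            party = range(0, m + 1) if t % 2 == 0 else range(m + 1, k)
        else:
            # straight-line phase: keep advancing one fixed expert
            party = range(0, 1)
        for i in party:
            player += probs[i]  # expected gain collected on advanced experts
            gains[i] += 1.0
    return max(gains) - player
```

For instance, `mwa_regret(3, 100, 0.2, 80)` yields a strictly positive regret, consistent with the lower bound; the proof's optimal parameters correspond to particular choices of the learning rate and of the loop length.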

This adversary obtains a regret of at least

\frac{T-\ell}{2}\left[\frac{m+1}{2m+1}-\frac{m+1}{m\cdot e^{\eta'}+m+1}\right]+\sum_{d=0}^{\ell-1}\frac{k-1}{e^{d\eta'}+k-1}. \qquad (9)

We use the estimate (6) for the second part of (9). For the first part of (9) we closely follow the derivation in (4) and obtain the following estimate.

\frac{T-\ell}{2}\left[\frac{m+1}{2m+1}-\frac{m+1}{m\cdot e^{\eta'}+m+1}\right]\sim\frac{T}{2}\cdot\frac{(m+1)\cdot m\cdot\alpha/\sqrt{T}}{(2m+1)^{2}}=\frac{\alpha\sqrt{T}\,(k-1)(k+1)}{8k^{2}}. \qquad (10)

Therefore, the regret from (9) is at least

\frac{\alpha\sqrt{T}}{8}\left(1-\frac{1}{k^{2}}\right)+\frac{\sqrt{T}\ln(k)}{\alpha}\ \geq\ 2\sqrt{\frac{\alpha\sqrt{T}}{8}\left(1-\frac{1}{k^{2}}\right)\cdot\frac{\sqrt{T}\ln(k)}{\alpha}}\ =\ \sqrt{\frac{T\ln(k)}{2}\left(1-\frac{1}{k^{2}}\right)},

which concludes the proof of the theorem.
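A remark on the AM–GM step above: it is valid for every choice of the parameter α > 0, since the product of the two terms does not depend on α, and it is tight exactly when the two terms are equal. Concretely,

```latex
\frac{\alpha\sqrt{T}}{8}\Bigl(1-\frac{1}{k^{2}}\Bigr)\cdot\frac{\sqrt{T}\ln(k)}{\alpha}
 \;=\; \frac{T\ln(k)}{8}\Bigl(1-\frac{1}{k^{2}}\Bigr),
\qquad\text{with equality in AM--GM iff}\quad
\alpha \;=\; \sqrt{\frac{8\ln(k)}{1-1/k^{2}}}.
```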

A.2 Improved lower bound for Arand

We use a slightly more complicated adversary than the one used in Theorem 2 to obtain a better regret bound. Call this adversary (which we describe in the proof of the theorem below) . We show:

Theorem 3

For : i) For even : . ii) For odd : .

Proof: We closely follow the proof of Theorem 2, but now employ a more complicated adversary: with probability  it plays the same strategy as in Theorem 2, and with the remaining probability it plays a purely looping strategy (without the “straight line” part), as follows.

1. With probability , do not advance any expert in the first step.

2. Divide all experts into two equal (almost equal, when  is odd) parties, numbered  and . For the next  rounds, advance all experts in party  in even-numbered rounds, and all experts in party  in odd-numbered rounds.
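The purely looping component can likewise be sketched in code. In this illustration (entirely our own: the function names are hypothetical, and the probability 1/2 for the initial idle step is an assumption standing in for the elided value), we average MWA's regret over the two parity offsets:

```python
import math

def mwa_gain(k, eta, gain_seq):
    """Expected gain of fixed-rate MWA on a sequence of 0/1 gain vectors,
    together with the gain of the best expert in hindsight."""
    gains = [0.0] * k
    total = 0.0
    for g in gain_seq:
        wsum = sum(math.exp(eta * x) for x in gains)
        total += sum(math.exp(eta * gains[i]) / wsum * g[i] for i in range(k))
        for i in range(k):
            gains[i] += g[i]
    return total, max(gains)

def looping_regret(k, T, eta):
    """Average regret of MWA against the purely looping adversary that,
    with (assumed) probability 1/2, idles in the first round."""
    half = k // 2
    A = tuple(1 if i < half else 0 for i in range(k))  # first party advances
    B = tuple(0 if i < half else 1 for i in range(k))  # second party advances
    regrets = []
    for skip in (0, 1):
        seq = [tuple([0] * k)] * skip                     # optional idle round
        seq += [A if t % 2 == 0 else B for t in range(T - skip)]
        gain, best = mwa_gain(k, eta, seq)
        regrets.append(best - gain)
    return 0.5 * (regrets[0] + regrets[1])
```

The randomized idle step prevents the algorithm from exploiting a known parity; for any fixed learning rate the averaged regret stays strictly positive.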

The regret of the MWA for even with respect to this adversary is

p\cdot(???)+(1-p)\cdot\left(\frac{1}{2}\sum_{t=1}^{\lfloor T/2\rfloor}\Big[1