Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice
We prove non-asymptotic lower bounds on the expectation of the maximum of independent Gaussian variables and the expectation of the maximum of independent symmetric random walks. Both lower bounds recover the optimal leading constant in the limit. A simple application of the lower bound for random walks is an (asymptotically optimal) non-asymptotic lower bound on the minimax regret of online learning with expert advice.
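Let $X_1, X_2, \dots, X_n$ be i.i.d. Gaussian random variables $N(0, \sigma^2)$. It is easy to prove that (see Appendix A)
$$\mathbf{E}\left[\max_{1 \le i \le n} X_i\right] \le \sigma \sqrt{2 \ln n} \qquad \text{for any } n \ge 1. \tag{1}$$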
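It is also well known that
$$\lim_{n \to \infty} \frac{\mathbf{E}\left[\max_{1 \le i \le n} X_i\right]}{\sigma \sqrt{2 \ln n}} = 1. \tag{2}$$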
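The discrete analog of a Gaussian random variable is the symmetric random walk. Recall that a random walk $Z^{(t)}$ of length $t$ is a sum $Z^{(t)} = Y_1 + Y_2 + \dots + Y_t$ of $t$ i.i.d. Rademacher variables, which have probability distribution $\Pr[Y_i = +1] = \Pr[Y_i = -1] = 1/2$. We consider $n$ independent symmetric random walks $Z_1^{(t)}, Z_2^{(t)}, \dots, Z_n^{(t)}$ of length $t$. Analogously to (1), it is easy to prove that (see Appendix A)
$$\mathbf{E}\left[\max_{1 \le i \le n} Z_i^{(t)}\right] \le \sqrt{2 t \ln n} \qquad \text{for any } n \ge 1 \text{ and any } t \ge 1. \tag{3}$$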
We prove a non-asymptotic lower bound on the expected maximum of the random walks. As in the Gaussian case, the leading term of the lower bound asymptotically matches (4).
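As a quick sanity check on the scale $\sqrt{2 t \ln n}$, the expected maximum of `n` independent symmetric random walks of length `t` can be estimated by simulation. The following is a minimal Python sketch, not part of the original analysis; the function names, the parameters `n`, `t`, `trials`, and the seed are illustrative assumptions. It relies on the fact that the endpoint of a symmetric random walk of length `t` equals twice the number of `+1` steps minus `t`.

```python
import math
import random


def max_of_random_walks(n, t, rng):
    """Return the largest endpoint among n independent symmetric random walks of length t."""
    best = -t
    for _ in range(n):
        # Draw t fair bits at once; the number of ones is the number of +1 steps.
        plus_steps = bin(rng.getrandbits(t)).count("1")
        best = max(best, 2 * plus_steps - t)
    return best


def estimate_expected_max(n, t, trials=2000, seed=0):
    """Monte Carlo estimate of the expected maximum endpoint (requires t >= 1)."""
    rng = random.Random(seed)
    total = sum(max_of_random_walks(n, t, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n, t = 50, 400
    estimate = estimate_expected_max(n, t)
    upper = math.sqrt(2 * t * math.log(n))
    print(f"estimate = {estimate:.2f}, sqrt(2 t ln n) = {upper:.2f}")
```

For moderate `n` the estimate typically sits noticeably below the upper bound; the two agree only to leading order.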
In Section 4, we show a simple application of the lower bound for random walks to the problem of learning with expert advice. This problem has been studied extensively in the online learning literature; see (Cesa-BianchiL06). Our bound is optimal in the sense that, for a large number of experts and a large number of rounds, it recovers the right leading constant.
2 Maximum of Gaussians
A crucial step towards lower bounding the expected maximum is a good lower bound on the tail of a single Gaussian. The standard way of deriving such bounds is via bounds on the so-called Mills ratio. The Mills ratio of a random variable with density function $f$ and cumulative distribution function $F$ is the ratio $\frac{1 - F(x)}{f(x)}$. It is clear that a lower bound on the Mills ratio yields a lower bound on the tail $1 - F(x)$. (The Mills ratio also has applications in economics. A simple problem where it shows up is that of setting the optimal price for a product: given a distribution of prices that customers are willing to pay, the goal is to choose the price that brings the most revenue.)
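Without loss of generality it suffices to lower bound the Mills ratio of $N(0,1)$, since the Mills ratio of $N(0,\sigma^2)$ can be obtained by rescaling. Recall that the probability density of $N(0,1)$ is $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ and its cumulative distribution function is $\Phi(x) = \int_{-\infty}^{x} \phi(t) \, dt$. The Mills ratio for $N(0,1)$ can be expressed as $\frac{1 - \Phi(x)}{\phi(x)} = e^{x^2/2} \int_x^\infty e^{-t^2/2} \, dt$. A lower bound on the Mills ratio of $N(0,1)$ was proved by Boyd-1959.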
Lemma 1 (Mills ratio for standard Gaussian (Boyd-1959)).
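For any $x \ge 0$,
$$\frac{1 - \Phi(x)}{\phi(x)} = e^{x^2/2} \int_x^\infty e^{-t^2/2} \, dt \ge \frac{\pi}{(\pi - 1) x + \sqrt{x^2 + 2\pi}} \ge \frac{\pi}{\pi x + \sqrt{2\pi}}.$$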
The second inequality in Lemma 1 is our simplification of Boyd's bound. It follows by setting $a = x$ and $b = \sqrt{2\pi}$. By simple algebra it is equivalent to the inequality $\sqrt{a^2 + b^2} \le a + b$, which holds for any $a, b \ge 0$.
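The bounds on the Mills ratio are easy to check numerically. The sketch below assumes that Boyd's bound is taken in the form $\frac{\pi}{(\pi-1)x + \sqrt{x^2 + 2\pi}}$ and that the simplified bound is $\frac{\pi}{\pi x + \sqrt{2\pi}}$; both forms, along with all function names, are assumptions of this illustration rather than quotations from the source. Multiplying the simplified bound by the density $\phi(x/\sigma)$ gives the tail bound $\Pr[X \ge x] \ge e^{-x^2/(2\sigma^2)} / (\sqrt{2\pi}\, x/\sigma + 2)$ for $X \sim N(0,\sigma^2)$, which the last function evaluates.

```python
import math


def mills_ratio(x):
    """Mills ratio (1 - Phi(x)) / phi(x) of the standard Gaussian."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))                      # 1 - Phi(x)
    density = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)     # phi(x)
    return tail / density


def boyd_lower_bound(x):
    """Assumed form of Boyd's lower bound on the Mills ratio."""
    return math.pi / ((math.pi - 1.0) * x + math.sqrt(x * x + 2.0 * math.pi))


def simplified_lower_bound(x):
    """Weaker bound obtained from sqrt(x^2 + 2*pi) <= x + sqrt(2*pi)."""
    return math.pi / (math.pi * x + math.sqrt(2.0 * math.pi))


def gaussian_tail_lower_bound(x, sigma=1.0):
    """Tail bound for N(0, sigma^2) implied by the simplified Mills ratio bound."""
    z = x / sigma
    return math.exp(-z * z / 2.0) / (math.sqrt(2.0 * math.pi) * z + 2.0)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(x, mills_ratio(x), boyd_lower_bound(x), simplified_lower_bound(x))
```

All three quantities coincide at $x = 0$, so any numerical comparison there should allow for rounding error.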
Corollary 2 (Lower Bound on Gaussian Tail).
Let and . Then,
Equipped with the lower bound on the tail, we prove a lower bound on the maximum of Gaussians.
Theorem 3 (Lower Bound on Maximum of Independent Gaussians).
Let be independent Gaussian random variables . For any ,
Let be the event that at least one of the is greater than where . We denote by the complement of this event. We have
where we used that .
It remains to lower bound , which we do as follows
where in the first inequality we used the elementary inequality valid for all .
Since we have . Substituting this into (8), we get
The function is decreasing on the interval , increasing on , and . From these properties we can deduce that for any . Therefore, and hence
where we used that for any . Since has minimum at , it follows that for any . ∎
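The behaviour of the expected maximum of Gaussians can also be observed empirically. The following Monte Carlo sketch is an illustration only, not part of the proof; the function name, the default parameters, and the seed are assumptions. It estimates the expected maximum of `n` i.i.d. $N(0,\sigma^2)$ samples and prints it next to the leading term $\sigma\sqrt{2 \ln n}$.

```python
import math
import random


def estimate_expected_max_gaussian(n, sigma=1.0, trials=5000, seed=0):
    """Monte Carlo estimate of E[max of n i.i.d. N(0, sigma^2) variables]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, sigma) for _ in range(n))
    return total / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        estimate = estimate_expected_max_gaussian(n, trials=2000)
        leading = math.sqrt(2.0 * math.log(n))
        print(f"n={n:5d}  estimate={estimate:.3f}  sqrt(2 ln n)={leading:.3f}")
```

In experiments of this kind the ratio of the two columns tends to approach 1 only slowly as `n` grows, which is why an explicit non-asymptotic lower bound carries information that the limit statement alone does not.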
3 Maximum of Random Walks
The general strategy for proving a lower bound on the expected maximum of random walks is the same as in the previous section. The main task is to lower bound the tail of a symmetric random walk $Z^{(t)}$ of length $t$. Note that
$$B_t = \frac{Z^{(t)} + t}{2}$$
is a Binomial random variable $B(t, \frac{1}{2})$. We follow the same approach used in nOrabona13. First we lower bound the tail using a bound from McKay1989.
Lemma 4 (Bound on Binomial Tail).
Let be integers satisfying and . Define . Then, satisfies
We lower bound the binomial coefficient using Stirling's approximation of the factorial. The lower bound on the binomial coefficient will be expressed in terms of the Kullback-Leibler divergence between two Bernoulli distributions, $\mathrm{Bernoulli}(p)$ and $\mathrm{Bernoulli}(q)$. Abusing notation somewhat, we write the divergence as
$$D(p \,\|\, q) = p \ln \frac{p}{q} + (1 - p) \ln \frac{1 - p}{1 - q}.$$
The result is the following lower bound on the tail of Binomial.
Theorem 5 (Bound on Binomial Tail).
Let be integers satisfying and . Define . Then, satisfies
Lemma 4 implies that
Since , we can write the binomial coefficient as
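We bound the binomial coefficient by using Stirling's formula for the factorial. We use explicit upper and lower bounds due to Robbins-1955, valid for any integer $m \ge 1$,
$$\sqrt{2\pi m} \left(\frac{m}{e}\right)^m e^{\frac{1}{12m+1}} < m! < \sqrt{2\pi m} \left(\frac{m}{e}\right)^m e^{\frac{1}{12m}}.$$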
Using Stirling's approximation, for any ,
where in the equality we used the definition of . Combining all the inequalities gives
for . For , we verify the statement of the theorem by direct substitution. The left-hand side is . Since and , it is easy to see that the right-hand side is smaller than . ∎
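To see why the Kullback-Leibler divergence is the natural exponent for the binomial tail, one can compare the exact tail with the classical Chernoff upper bound $\exp(-t \, D(k/t \,\|\, 1/2))$. The sketch below is an illustration with hypothetical function names; the Chernoff bound is a standard upper bound and is not the lower bound of Theorem 5. A lower bound of the kind proved above can only differ from it by a sub-exponential factor.

```python
import math


def binomial_tail(t, k):
    """Exact Pr[B >= k] for B ~ Binomial(t, 1/2), computed with integer arithmetic."""
    return sum(math.comb(t, j) for j in range(k, t + 1)) / 2 ** t


def kl_to_half(p):
    """D(p || 1/2) in nats, with the convention 0 * ln 0 = 0."""
    total = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            total += q * math.log(2.0 * q)
    return total


def chernoff_upper_bound(t, k):
    """Classical Chernoff bound exp(-t * D(k/t || 1/2)), valid for k >= t/2."""
    return math.exp(-t * kl_to_half(k / t))


if __name__ == "__main__":
    t = 200
    for k in (100, 110, 130, 160, 200):
        exact = binomial_tail(t, k)
        upper = chernoff_upper_bound(t, k)
        print(f"k={k:3d}  exact={exact:.3e}  chernoff={upper:.3e}  ratio={exact / upper:.3e}")
```

The printed ratio shows how much of the tail is captured by the exponential term alone.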
For , the divergence can be approximated by . We define the function as
It is the ratio of the divergence and the approximation. The function satisfies the following properties:
is decreasing on and increasing on
minimum value is
maximum value is
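The shape of such a ratio can be read off from a power series. The derivation below is a sketch under the assumption that the approximation in question is the quadratic one, $D(\tfrac12 + \delta \,\|\, \tfrac12) \approx 2\delta^2$; the parametrization by $\delta \in [-\tfrac12, \tfrac12]$ is chosen here for illustration and may differ from the one used for the function defined above.

```latex
\begin{align*}
D\left(\tfrac12 + \delta \,\middle\|\, \tfrac12\right)
  &= \left(\tfrac12 + \delta\right) \ln(1 + 2\delta)
   + \left(\tfrac12 - \delta\right) \ln(1 - 2\delta) \\
  &= \sum_{j \ge 1} \frac{(2\delta)^{2j}}{2j(2j-1)}
   = 2\delta^2 + \tfrac{4}{3}\delta^4 + \dots \\[4pt]
\frac{D\left(\tfrac12 + \delta \,\middle\|\, \tfrac12\right)}{2\delta^2}
  &= \sum_{j \ge 1} \frac{(2\delta)^{2j-2}}{j(2j-1)} .
\end{align*}
```

Every term of the last series is even in $\delta$ and non-decreasing in $|\delta|$, so under this parametrization the ratio is decreasing for $\delta \le 0$ and increasing for $\delta \ge 0$. Its minimum is $1$ at $\delta = 0$, and its maximum is $\sum_{j \ge 1} \frac{1}{j(2j-1)} = 2 \ln 2$ at $|\delta| = \tfrac12$.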
Using the definition of and Theorem 5, we have the following corollary.
Corollary 6.
Let be a positive integer and let be a real number. Then satisfies
Theorem 7 (Lower Bound on Maximum of Independent Symmetric Random Walks).
Let be independent symmetric random walks of length . If and ,
Let be the event that at least one of the is greater than or equal to , where .
We upper and lower bound . Denote by and notice that . It suffices to bound and . We already know that for all and for . The function is decreasing on , increasing on , and . It has a unique minimum at . Therefore, for all . Similarly, from the unimodality of we have that for all . From this we can conclude that if ,
If and this implies that
Recalling the definition of event , we have
We lower bound . Using the fact that the distribution of is symmetric and has zero mean,
(by symmetry of )
(again, by symmetry of )
(by concavity of )
Now let us focus on . Note that is a binomial random variable with distribution . Similarly to the proof of Theorem 3, we can lower bound as
(by Corollary 6 and (12))
We now use the fact that implies that . Hence, we obtain
where in the last equality we used the fact that for . Putting everything together, we obtain the stated bound. ∎
4 Learning with Expert Advice
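Learning with Expert Advice is an online problem where in each round $t$ an algorithm chooses (possibly randomly) an action $I_t \in \{1, 2, \dots, n\}$ and then it receives the losses of the actions $\ell_{t,1}, \ell_{t,2}, \dots, \ell_{t,n} \in [0,1]$. This repeats for $T$ rounds. The goal of the algorithm is to have a small cumulative loss $\sum_{t=1}^{T} \ell_{t,I_t}$ of the actions it has chosen. The difference between the algorithm's loss and the loss of the best fixed action in hindsight is called regret. Formally,
$$\mathrm{Regret}_T = \sum_{t=1}^{T} \ell_{t,I_t} - \min_{1 \le i \le n} \sum_{t=1}^{T} \ell_{t,i}.$$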
There are algorithms that, given the number of rounds $T$ as an input, achieve (expected) regret no more than $\sqrt{\frac{T}{2} \ln n}$, where $n$ is the number of actions, for any sequence of losses in $[0,1]$.
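One standard algorithm with a guarantee of this type is the exponentially weighted average forecaster (Hedge) with learning rate $\eta = \sqrt{8 \ln n / T}$, whose expected regret is at most $\sqrt{\frac{T}{2} \ln n}$ for losses in $[0,1]$. The text above does not name a particular algorithm, so the choice of Hedge here is an assumption, and the following Python sketch (hypothetical function name, losses passed as a list of per-round lists) is meant only to make the setting concrete.

```python
import math
import random


def hedge_expected_regret(losses, eta):
    """Run the exponentially weighted forecaster and return its expected regret.

    losses[t][i] is the loss of action i in round t, assumed to lie in [0, 1].
    """
    n = len(losses[0])
    cumulative = [0.0] * n   # cumulative loss of each action so far
    expected_loss = 0.0      # expected cumulative loss of the algorithm
    for round_losses in losses:
        # Weights proportional to exp(-eta * cumulative loss); subtracting the
        # minimum keeps the exponentials in a numerically safe range.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        expected_loss += sum(w * l for w, l in zip(weights, round_losses)) / total
        cumulative = [c + l for c, l in zip(cumulative, round_losses)]
    return expected_loss - min(cumulative)


if __name__ == "__main__":
    rng = random.Random(0)
    n, T = 10, 1000
    # Fair coin-flip losses, the kind of sequence used in lower-bound arguments.
    losses = [[float(rng.getrandbits(1)) for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(8.0 * math.log(n) / T)
    print("expected regret:", hedge_expected_regret(losses, eta))
    print("upper bound    :", math.sqrt(T / 2.0 * math.log(n)))
```

The function tracks the expected loss under the forecaster's distribution rather than sampling actions, so the returned value is deterministic given the loss sequence.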
Theorem 8.
Let and . For any algorithm for learning with expert advice there exists a sequence of losses , , , such that
Proceeding as in the proof of Theorem 3.7 in (Cesa-BianchiL06) we only need to show that
where are independent symmetric random walks of length . The theorem follows from Theorem 7. ∎
The theorem proves a non-asymptotic lower bound, while at the same time recovering the optimal constant of the asymptotic one in Cesa-BianchiL06.
Appendix A Upper Bounds
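We say that a random variable $X$ is $\sigma$-sub-Gaussian (for some $\sigma \ge 0$) if
$$\mathbf{E}\left[e^{sX}\right] \le e^{\sigma^2 s^2 / 2} \qquad \text{for all } s \in \mathbb{R}.$$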
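It is straightforward to verify that $X \sim N(0, \sigma^2)$ is $\sigma$-sub-Gaussian. Indeed, for any $s \in \mathbb{R}$,
$$\mathbf{E}\left[e^{sX}\right] = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{sx} e^{-x^2/(2\sigma^2)} \, dx = e^{\sigma^2 s^2 / 2}.$$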
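We now show that a Rademacher random variable $Y$ (with distribution $\Pr[Y = +1] = \Pr[Y = -1] = 1/2$) is $1$-sub-Gaussian. Indeed, for any $s \in \mathbb{R}$,
$$\mathbf{E}\left[e^{sY}\right] = \frac{e^{s} + e^{-s}}{2} = \sum_{k \ge 0} \frac{s^{2k}}{(2k)!} \le \sum_{k \ge 0} \frac{s^{2k}}{2^k \, k!} = e^{s^2/2}.$$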
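If $Y_1, Y_2, \dots, Y_t$ are independent $\sigma$-sub-Gaussian random variables, then $Y_1 + Y_2 + \dots + Y_t$ is $\sigma\sqrt{t}$-sub-Gaussian. This follows from
$$\mathbf{E}\left[e^{s (Y_1 + \dots + Y_t)}\right] = \prod_{i=1}^{t} \mathbf{E}\left[e^{s Y_i}\right] \le e^{t \sigma^2 s^2 / 2}.$$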
This property proves that the symmetric random walk of length