A Non-asymptotic, Sharp, and User-friendly Reverse Chernoff-Cramér Bound
Abstract
The Chernoff-Cramér bound is a widely used technique for analyzing the upper tail of a random variable based on its moment generating function. By elementary proofs, we develop a user-friendly reverse Chernoff-Cramér bound that yields non-asymptotic lower tail bounds for generic random variables. The new reverse Chernoff-Cramér bound is used to derive a series of results, including sharp lower tail bounds for the sum of independent sub-Gaussian and sub-exponential random variables, which match the classic Hoeffding-type and Bernstein-type concentration inequalities, respectively. We also provide non-asymptotic matching upper and lower tail bounds for a suite of distributions, including the gamma, beta, (regular, weighted, and noncentral) chi-squared, binomial, Poisson, and Irwin-Hall distributions. We apply these results to develop matching upper and lower bounds for the extreme value expectation of the sum of independent sub-Gaussian and sub-exponential random variables. A statistical application to sparse signal identification is finally studied.
1 Introduction
The Chernoff-Cramér bound, with a generic statement given below, is ubiquitous in probability and its applications.
Proposition 1 (Chernoff-Cramér Bound).
Suppose $X$ is a real-valued random variable with moment generating function $\mathbb{E}e^{\lambda X}$ defined for $\lambda > 0$. Then for any $a \in \mathbb{R}$,
$$\mathbb{P}(X \geq a) \leq \inf_{\lambda > 0} e^{-\lambda a}\,\mathbb{E}e^{\lambda X}.$$
The proof, given in Section 6.1, follows directly from Markov's inequality. The Chernoff-Cramér bound is the starting point for deriving many probability inequalities, among which Hoeffding's inequality [18], Bernstein's inequality [5], Bennett's inequality [4], Azuma's inequality [2], and McDiarmid's inequality [24] are well regarded and widely used. The Chernoff-Cramér bound has also been extensively studied in a variety of problems in random matrix theory, high-dimensional statistics, and machine learning [8, 35, 36]. This powerful tool and its applications have also been collected in various recent textbooks and lecture notes (see, e.g., [8, 22, 26, 27, 30, 31, 35, 36]).
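As a quick numerical sanity check (illustrative, not part of the original development): for a standard normal $Z$, $\mathbb{E}e^{\lambda Z} = e^{\lambda^2/2}$, so the Chernoff-Cramér bound $\inf_{\lambda > 0} e^{-\lambda a}\,\mathbb{E}e^{\lambda Z} = e^{-a^2/2}$ (the infimum is attained at $\lambda = a$) must dominate the exact tail $1 - \Phi(a)$. The helper names below are ours, chosen for the sketch:

```python
import math

def normal_tail(a):
    """Exact upper tail P(Z >= a) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

def chernoff_bound_normal(a):
    """inf_{lam > 0} e^{-lam*a} * E[e^{lam*Z}] with M(lam) = e^{lam^2/2};
    for a > 0 the infimum e^{-a^2/2} is attained at lam = a."""
    return math.exp(-a * a / 2)

# The Chernoff-Cramér bound dominates the exact Gaussian tail at every level.
for a in [0.5, 1.0, 2.0, 3.0]:
    assert normal_tail(a) <= chernoff_bound_normal(a)
```

The bound is loose by a polynomial factor (the exact tail behaves like $e^{-a^2/2}/(a\sqrt{2\pi})$ for large $a$), which is the usual price of the exponential-moment method.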
Despite enormous developments on upper tail bounds via the Chernoff-Cramér bound in the literature, there are far fewer results on the corresponding lower bounds: how can one find a sharp and appropriate $g(t)$, such that $\mathbb{P}(X \geq t) \geq g(t)$ (or $\mathbb{P}(X \leq -t) \geq g(t)$) holds? Among the existing literature, the Cramér-Chernoff theorem characterizes the asymptotic tail probability for the sum of i.i.d. random variables ([11, Theorem 1]; also see [33, Proposition 14.23]): suppose $X_1, \ldots, X_n$ are i.i.d. copies of $X$ and $X$ has a finite moment generating function $M(\lambda) = \mathbb{E}e^{\lambda X}$. Then for all $a > \mathbb{E}X$,
$$\lim_{n \to \infty} \frac{1}{n}\log \mathbb{P}\Big(\sum_{i=1}^n X_i \geq na\Big) = -\sup_{\lambda > 0}\big(\lambda a - \log M(\lambda)\big).$$
In addition, the Berry-Esseen central limit theorem [6, 16] provides a non-asymptotic quantification of the normal approximation error for the sum of independent random variables: let $X_1, \ldots, X_n$ be i.i.d. copies of $X$, where $\mathbb{E}X = 0$, $\mathbb{E}X^2 = \sigma^2$, and $\rho = \mathbb{E}|X|^3 < \infty$; then for all $t \geq 0$,

(1) $\qquad 1 - \Phi(t) - \frac{C\rho}{\sigma^3\sqrt{n}} \leq \mathbb{P}\Big(\sum_{i=1}^n X_i \geq \sigma\sqrt{n}\,t\Big),$

where $\Phi$ is the cumulative distribution function of the standard normal distribution. This lower bound need not be universally sharp, as the left-hand side of (1) becomes negative once $1 - \Phi(t) < C\rho/(\sigma^3\sqrt{n})$. Slud [28], Csiszár [13], and Cover and Thomas [12] established upper and lower tail bounds for the binomial distribution based on its probability mass function. Pitman and Rácz [25] considered the tail asymptotics of the beta-gamma distribution. There is still a lack of easy-to-use lower tail bounds for generic random variables in the finite-sample setting.
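To see how quickly a Berry-Esseen lower tail bound of the form $1 - \Phi(t) - C\rho/(\sigma^3\sqrt{n})$ degenerates, take Rademacher summands ($\sigma = 1$, $\rho = 1$) and a published admissible i.i.d. Berry-Esseen constant $C \approx 0.4748$ (an assumption used only for this illustration, not a constant from this paper):

```python
import math

def phi_tail(t):
    """Exact standard normal upper tail 1 - Phi(t)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

C = 0.4748   # an admissible Berry-Esseen constant for i.i.d. sums (assumption)
n = 100      # Rademacher summands: sigma = 1, rho = E|X|^3 = 1

def be_lower_bound(t):
    """Berry-Esseen lower tail bound 1 - Phi(t) - C*rho/(sigma^3*sqrt(n))."""
    return phi_tail(t) - C / math.sqrt(n)

assert be_lower_bound(1.0) > 0   # still informative at moderate t
assert be_lower_bound(3.0) < 0   # vacuous: the bound is negative although the
                                 # true Gaussian tail at t = 3 is about 1.3e-3
```

This is exactly the regime (tail level below $n^{-1/2}$) where the reverse Chernoff-Cramér bounds developed in this paper remain useful.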
In this article, we answer this question by developing a novel and user-friendly reverse Chernoff-Cramér bound, which provides non-asymptotic lower tail bounds for generic random variables based on their moment generating functions. The proofs are all elementary and can be conveniently adapted to various settings. We discuss the following implications in detail.

Concentration inequalities for the sum of independent random variables have been extensively used in non-asymptotic random matrix theory [35], machine learning, and high-dimensional statistics [36]. We utilize the reverse Chernoff-Cramér bound to establish lower tail bounds for the weighted sum of independent sub-Gaussian and sub-exponential random variables, which match the classic Hoeffding-type and Bernstein-type inequalities in the literature.

We also study matching upper and lower tail bounds for a suite of distributions that commonly appear in practice. The list includes the gamma, beta, (regular, weighted, and noncentral) chi-squared, binomial, Poisson, and Irwin-Hall distributions, among others.

In addition, we consider two applications of the established results. We derive matching upper and lower bounds for extreme values of the sums of independent sub-Gaussian and sub-exponential random variables. A statistical problem of sparse signal identification is finally studied.
Moreover, the proposed reverse Chernoff-Cramér bound relates to a number of previous works. Bagdasarov [3] studied a reversion of Chebyshev's inequality based on the moment generating function and the properties of the convex conjugate. In addition, Theodosopoulos [32] proved a tighter and broader result using sharper techniques. By the saddle point method, Iyengar and Mazumdar [19] considered the approximation of multivariate tail probabilities, and Daniels [14] studied an accurate approximation to the density of the mean of i.i.d. observations. Compared to these previous works, we are, to the best of our knowledge, among the first to provide a non-asymptotic, sharp, and user-friendly reverse Chernoff-Cramér bound and to apply the result to a number of settings common in practice.
The rest of this article is organized as follows. We first introduce the generic statements of the user-friendly reverse Chernoff-Cramér bound in Section 2. The developed tool is applied in Section 3 to derive lower tail bounds for the sum of independent sub-Gaussian and sub-exponential random variables that match the classic upper bound results. We further study sharp tail bounds for a number of specific distributions in Section 4. In Section 5, we discuss the extreme value expectation for independent random variables and a statistical application in sparse signal estimation to further illustrate the merit of the newly established results. Additional proofs are collected in Section 6.
2 A Generic User-friendly Reverse Chernoff-Cramér Bound
We first introduce the notation that will be used throughout the paper. We use uppercase letters, e.g., $X$, to denote random variables and lowercase letters, e.g., $x$, to denote deterministic scalars or vectors. $a \vee b$ and $a \wedge b$ respectively represent the maximum and minimum of $a$ and $b$. We say a random variable $X$ is "centered" if $\mathbb{E}X = 0$. Let $\mathbb{E}$, $\mathrm{Var}$, and $\mathbb{P}$ be the expectation, variance, and probability, respectively. For any random variable $X$, let $M_X(\lambda) = \mathbb{E}e^{\lambda X}$ be the moment generating function. For any vector $x = (x_1, \ldots, x_n)$ and $q \geq 1$, let $\|x\|_q = (\sum_{i=1}^n |x_i|^q)^{1/q}$ be the $\ell_q$ norm. In particular, $\|x\|_\infty = \max_i |x_i|$. We use $C$ and $c$ to respectively represent generic large and small constants, whose actual values may differ from time to time.
Next, we introduce the user-friendly reverse Chernoff-Cramér bound (Theorem 1) and apply the result to obtain lower tail bounds in a number of general settings.
Theorem 1 (A user-friendly reverse Chernoff-Cramér bound).
Suppose $X$ is a random variable with moment generating function defined for a range of $\lambda$. Then for any $\lambda$ in this range, we have
Proof.
First, we claim that if and , then . Indeed, by Jensen's inequality,
Thus .
For any , , we have
Here, for and ,
The second line is because if .
Therefore, for any , , and , we have
By taking supremum, we have
In particular, setting , we have
(2) 
By a symmetric argument, we can also show
∎
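The flavor of such reverse bounds can be illustrated with a classical device related to, but distinct from, Theorem 1: applying the Paley-Zygmund inequality to $e^{\lambda X}$ gives, for any $\lambda > 0$ and $\theta \in (0,1)$,
$$\mathbb{P}\Big(X > \frac{\log\theta + \log M(\lambda)}{\lambda}\Big) \geq (1-\theta)^2\,\frac{M(\lambda)^2}{M(2\lambda)}.$$
For a standard normal, $M(\lambda) = e^{\lambda^2/2}$, so the right-hand side equals $(1-\theta)^2 e^{-\lambda^2}$, and the inequality is easy to check numerically (an illustrative sketch, not the proof above):

```python
import math

def normal_tail(a):
    """Exact upper tail P(X >= a) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(a / math.sqrt(2))

def mgf_normal(lam):
    """Moment generating function of N(0, 1): M(lam) = e^{lam^2/2}."""
    return math.exp(lam * lam / 2)

# Paley-Zygmund applied to Z = e^{lam*X}:
#   P(Z > theta * E[Z]) >= (1 - theta)^2 * (E[Z])^2 / E[Z^2],
# i.e. P(X > (log(theta) + log M(lam)) / lam) >= (1-theta)^2 M(lam)^2 / M(2 lam).
for lam in [0.5, 1.0, 2.0]:
    for theta in [0.1, 0.5, 0.9]:
        threshold = (math.log(theta) + math.log(mgf_normal(lam))) / lam
        lower = (1 - theta) ** 2 * mgf_normal(lam) ** 2 / mgf_normal(2 * lam)
        assert normal_tail(threshold) >= lower
```

Note that this classical lower bound decays like $e^{-\lambda^2}$ while the true Gaussian tail at the corresponding threshold is much larger, which is precisely the gap that sharper reverse bounds aim to close.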
For any centered random variable $X$, if $M_X(\lambda)$ is finite for a range of $\lambda$, then by Taylor expansion, $\log M_X(\lambda) = \mathrm{Var}(X)\lambda^2/2 + o(\lambda^2)$ for $\lambda$ in a neighborhood of 0. Thus, there exist constants $c_1, c_2 > 0$, such that

(3) $\qquad e^{c_1\lambda^2} \leq M_X(\lambda) \leq e^{c_2\lambda^2}$

holds for $\lambda$ in a neighborhood of 0. The following Theorem 2 provides matching upper and lower bounds on the tail probability of a random variable satisfying Condition (3) in a certain regime. The key step in the proof relies on the user-friendly reverse Chernoff-Cramér bound (Theorem 1).
Theorem 2 (Tail Probability Bound: Small $t$).
Suppose $X$ is a centered random variable satisfying

(4) $\qquad e^{c_1\lambda^2} \leq M_X(\lambda) \leq e^{c_2\lambda^2}$ for all $\lambda$ in a neighborhood of 0,

where $c_1, c_2 > 0$ are constants. Then for any constant $T > 0$, there exist constants $C, c > 0$, such that whenever $0 \leq t \leq T$, we have

(5) $\qquad c\,e^{-Ct^2} \leq \mathbb{P}(X \geq t) \leq C\,e^{-ct^2}.$
Moreover, if , there exist constants , such that
(6) 
The proof of Theorem 2 is deferred to Section 6.2. Noting that Theorem 2 mainly addresses the tail probability for bounded $t$, we further introduce tail probability bounds for large $t$ in the following Theorem 3.
Theorem 3 (Tail Probability Bound: Large $t$).
Suppose $X$ and $Y$ are independent random variables. $M_Y(\lambda)$ is the moment generating function of $Y$ and is finite for all $\lambda$ in a given range. $X$ satisfies the tail inequality
Then $X + Y$ satisfies
(7) 
Proof.
The preceding Theorem 3 characterizes the tail probability for large $t$ when the variable of interest can be decomposed as $X + Y$, where $X$ has a lower tail bound, $Y$ has a Chernoff-Cramér upper tail bound, and $X$ and $Y$ are independent. Based on Theorem 3, we can immediately derive a tail probability lower bound for the sum of independent random variables.
Corollary 1.
Suppose are centered and independent random variables. Assume for , where is the moment generating function of , . satisfies the tail inequality
Then, satisfies
(10) 
Proof.
Since
By setting , Theorem 2 implies our assertion. ∎
3 Tail Probability for the Sum of Independent Random Variables
With the user-friendly reverse Chernoff-Cramér bounds developed in the previous section, we are now in a position to study tail probability bounds for the sum of independent random variables.
Sub-Gaussian random variables, whose tails are dominated by that of a Gaussian, form a class that covers many important instances (e.g., Gaussian, Rayleigh, and bounded distributions). We consider tail probability bounds for the weighted sum of independent sub-Gaussian random variables. The upper tail bound, referred to as the Hoeffding-type concentration inequality, has been introduced and widely used in the high-dimensional statistics and machine learning literature.
Proposition 2 (Hoeffding-type Inequality for the Sum of Sub-Gaussians).
Suppose $X_1, \ldots, X_n$ are centered and independent sub-Gaussian random variables, in the sense that either of the following holds for a constant $K > 0$: (1) $\mathbb{P}(|X_i| \geq t) \leq C\exp(-t^2/K^2)$ for all $t \geq 0$; (2) $\mathbb{E}e^{\lambda X_i} \leq e^{K^2\lambda^2/2}$ for all $\lambda \in \mathbb{R}$. Then for any fixed $a = (a_1, \ldots, a_n)$ and every $t \geq 0$, $\sum_{i=1}^n a_i X_i$ satisfies
$$\mathbb{P}\Big(\Big|\sum_{i=1}^n a_i X_i\Big| \geq t\Big) \leq e \cdot \exp\Big(-\frac{ct^2}{K^2\|a\|_2^2}\Big).$$
The proof of Proposition 2 can be found in [35, Proposition 5.10]. With an additional condition on the lower tail of each summand, we can show the following lower tail bound for the sum of independent sub-Gaussians, which matches the classic Hoeffding-type inequality in Proposition 2.
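For Rademacher summands the sub-Gaussian MGF condition holds with $K = 1$, and Hoeffding's classical inequality gives the explicit-constant form $\mathbb{P}(|\sum_i a_i\varepsilon_i| \geq t) \leq 2\exp(-t^2/(2\|a\|_2^2))$ (this explicit version is Hoeffding's inequality, not the generic-constant bound above). A minimal Monte Carlo sketch with illustrative weights:

```python
import math
import random

random.seed(0)
a = [0.5, 1.0, 1.5, 2.0, 0.3]        # illustrative fixed weights
norm2_sq = sum(w * w for w in a)      # ||a||_2^2
N = 50_000                            # Monte Carlo sample size

def empirical_tail(t):
    """Monte Carlo estimate of P(|sum_i a_i * eps_i| >= t), eps_i i.i.d. Rademacher."""
    hits = 0
    for _ in range(N):
        s = sum(w * random.choice((-1, 1)) for w in a)
        if abs(s) >= t:
            hits += 1
    return hits / N

# Hoeffding's two-sided bound dominates the empirical tail at every level tried.
for t in [1.0, 2.0, 4.0]:
    hoeffding = 2 * math.exp(-t * t / (2 * norm2_sq))
    assert empirical_tail(t) <= hoeffding
```

At $t = 4$ the exact two-sided tail is $6/32 = 0.1875$ (only three of the 32 sign patterns reach $4$ on each side), while the Hoeffding bound evaluates to about $0.59$; the gap of a constant in the exponent is exactly what the matching lower bounds in this section quantify.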
Theorem 4 (Hoeffding-type Inequality for Sub-Gaussian Variables: Matching Upper and Lower Bounds).
Suppose $X_1, \ldots, X_n$ are independent and centered, and $M_i(\lambda)$ is the moment generating function of $X_i$. Suppose either of the following statements holds:

there exist constants , such that
(11) 
there exist constants , such that
(12)
Then there exist constants , such that for any fixed values , satisfies
(13) 
Proof.
The proof of this theorem relies on the following fact.
Although Theorem 3 focuses on the right tail, i.e., $\mathbb{P}(X \geq t)$, similar results hold for the left tail, i.e., $\mathbb{P}(X \leq -t)$, by symmetry. We can further prove the following two-sided lower tail bound for the sum of random variables, provided both tails of these random variables are sub-Gaussian.
Corollary 2.
Suppose $X_1, \ldots, X_n$ are independent and centered, and $M_i(\lambda)$ is the moment generating function of $X_i$. Suppose either of the following statements holds:

for constants , , .

for constants and all , .
Then there exist constants , such that for any fixed real values , satisfies
On the other hand, the class of sub-Gaussian random variables considered in Theorem 4 may be too restrictive and fails to cover many useful random variables with tails heavier than Gaussian. For example, to study the concentration of a Wishart matrix in random matrix theory [38], the summands may be squares of sub-Gaussians rather than sub-Gaussians; the tails of many commonly used distributions, such as the Poisson, exponential, and gamma distributions, are heavier than Gaussian. To cover these cases, sub-exponential distributions, whose tails can be dominated by that of the exponential distribution, were naturally introduced and are widely used in the literature (see the forthcoming Proposition 3 for the definition of a sub-exponential distribution). Next, we consider tail bounds for the sum of sub-exponential random variables. The following Bernstein-type inequality is a classic result on the upper tail bound.
Proposition 3 (Bernstein-type Inequality for the Sum of Independent Sub-Exponentials ([35], Proposition 5.16)).
Let $X_1, \ldots, X_n$ be independent centered sub-exponential random variables, in the sense that $\mathbb{P}(|X_i| \geq t) \leq 2e^{-t/K}$ for all $t \geq 0$. Then for every $a = (a_1, \ldots, a_n)$ and $t \geq 0$, $\sum_{i=1}^n a_i X_i$ satisfies
$$\mathbb{P}\Big(\Big|\sum_{i=1}^n a_i X_i\Big| \geq t\Big) \leq 2\exp\Big(-c\min\Big(\frac{t^2}{K^2\|a\|_2^2}, \frac{t}{K\|a\|_\infty}\Big)\Big).$$
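For a concrete instance, take $X_i = E_i - 1$ with $E_i$ i.i.d. $\mathrm{Exp}(1)$, so $\sum_i E_i \sim \mathrm{Gamma}(n, 1)$ and the exact tail is available through the Poisson sum $\mathbb{P}(\mathrm{Gamma}(n) \geq y) = \mathbb{P}(\mathrm{Poisson}(y) \leq n-1)$. The sketch below verifies a Bernstein-type bound with the illustrative constant $1/4$ (a value one can derive for this particular example from standard gamma tail bounds, not the generic $c$ of Proposition 3):

```python
import math

def gamma_upper_tail(n, y):
    """Exact P(G >= y) for G ~ Gamma(n, 1) with integer shape n,
    via P(Gamma(n) >= y) = P(Poisson(y) <= n - 1)."""
    if y <= 0:
        return 1.0
    return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(n))

n = 20  # S = sum of 20 i.i.d. Exp(1) variables, so S ~ Gamma(20, 1)
for t in [2.0, 5.0, 10.0, 30.0]:
    # two-sided tail of the centered sum S - n
    two_sided = gamma_upper_tail(n, n + t) + (1.0 - gamma_upper_tail(n, n - t))
    bernstein = 2 * math.exp(-min(t * t / n, t) / 4)
    assert two_sided <= bernstein
```

The $\min(t^2/n, t)$ shape is visible here: for $t \lesssim n$ the exponent is Gaussian-like ($t^2/n$), while for $t \gtrsim n$ it switches to the linear, exponential-like regime ($t$).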
With an additional lower bound on the moment generating function of each summand, we obtain the following matching upper and lower tail bounds for the sum of independent sub-exponential random variables.
Theorem 5 (Bernstein-type Inequality for the Sum of Independent Sub-Exponentials: Matching Upper and Lower Bounds).
Suppose $X_1, \ldots, X_n$ are centered independent random variables and $M_i(\lambda)$ is the moment generating function of $X_i$. Suppose the $X_i$'s are sub-exponential in the sense that:
Suppose , where is a constant. If are nonnegative values, then satisfies
Here are constants. In addition, if there exists one $i$ ($1 \leq i \leq n$) satisfying
and , where is a constant, then we further have
(14) 
where are constants.
Proof.
We first consider the lower bound. The moment generating function of satisfies
(15) 
Since , . By Theorem 2, there exist constants such that
Note that , if then
(16) 
Moreover, the moment generating function of satisfies
(17) 
Set in Theorem 3, for all ,
(18) 
Here is a constant, . In summary, we have proved the lower bound.
4 Sharp Tail Bounds of Specific Distributions
In this section, we aim to establish matching upper and lower tail bounds for a number of commonly used distributions, including the gamma, beta, (regular, weighted, and noncentral) chi-squared, binomial, Poisson, and Irwin-Hall distributions.
4.1 Gamma Distribution
We first focus on the gamma distribution. Suppose $X$ is gamma distributed with shape parameter $\alpha > 0$ and unit scale, i.e., its density is

(20) $\qquad p_\alpha(x) = \frac{x^{\alpha - 1}e^{-x}}{\Gamma(\alpha)}, \quad x > 0,$

where $\Gamma(\cdot)$ is the Gamma function. It is well known that the mean and mode of $X$ are $\alpha$ and $\alpha - 1$ (only if $\alpha \geq 1$), respectively. Although the density of the gamma distribution is available, it is highly nontrivial to develop a sharp tail probability bound for it in closed form. Previously, Boucheron [8] proved the following result based on the Chernoff-Cramér upper bound.
Proposition 4 (Gamma tail upper bound ([8], Pages 27-29)).
Suppose $X$ is gamma distributed with shape parameter $\alpha$ and unit scale. Then for every $t > 0$,
$$\mathbb{P}\big(X \geq \alpha + \sqrt{2\alpha t} + t\big) \leq e^{-t}, \qquad \mathbb{P}\big(X \leq \alpha - \sqrt{2\alpha t}\big) \leq e^{-t}.$$
Or equivalently, for every $x > 0$,
$$\mathbb{P}(X \geq \alpha + x) \leq \exp\Big(-\frac{x^2}{2(\alpha + x)}\Big), \qquad \mathbb{P}(X \leq \alpha - x) \leq \exp\Big(-\frac{x^2}{4\alpha}\Big).$$
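For integer shape $\alpha$, the gamma upper tail has the closed form $\mathbb{P}(X \geq y) = \mathbb{P}(\mathrm{Poisson}(y) \leq \alpha - 1)$, so the classical bounds $\mathbb{P}(X \geq \alpha + \sqrt{2\alpha t} + t) \leq e^{-t}$ and $\mathbb{P}(X \leq \alpha - \sqrt{2\alpha t}) \leq e^{-t}$ from [8] can be checked exactly with no special-function library. An illustrative sketch:

```python
import math

def gamma_upper_tail(alpha, y):
    """Exact P(X >= y) for X ~ Gamma(alpha, 1) with integer shape alpha,
    via the Poisson-sum identity P(Gamma(alpha) >= y) = P(Poisson(y) <= alpha - 1)."""
    if y <= 0:
        return 1.0
    return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(alpha))

# Both one-sided gamma tail bounds hold at level e^{-t} for every (alpha, t) tried.
for alpha in [1, 5, 50]:
    for t in [0.5, 1.0, 5.0]:
        up = gamma_upper_tail(alpha, alpha + math.sqrt(2 * alpha * t) + t)
        low = 1.0 - gamma_upper_tail(alpha, alpha - math.sqrt(2 * alpha * t))
        assert up <= math.exp(-t)
        assert low <= math.exp(-t)
```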
Based on the reverse Chernoff-Cramér bound, we can develop the following lower tail bounds for the gamma distribution that match the upper bound. Since the density of $X$ has distinct shapes for $\alpha < 1$ and $\alpha \geq 1$ (monotonically decreasing if $\alpha < 1$; unimodal if $\alpha \geq 1$), the tail bound behaves differently in these two cases, and we discuss them separately in the next Theorem 6.
Theorem 6 (Gamma tail lower bound).
Suppose and .

There exist two uniform constants , such that for all and ,
for any , there exists a constant relying only on , such that for all and ,
for any , .

For all ,
(21) (22)
4.2 Chisquared Distribution
The chi-squared distributions form a special class of gamma distributions and are widely used in practice: if $Z_1, \ldots, Z_k \overset{\text{i.i.d.}}{\sim} N(0,1)$, then $\sum_{i=1}^k Z_i^2$ follows the chi-squared distribution with $k$ degrees of freedom, denoted $\chi^2_k$.
Suppose $X \sim \chi^2_k$. Laurent and Massart [21, Lemma 1] developed the following upper tail bounds for the chi-squared distribution based on the Chernoff-Cramér upper bound: for all $x > 0$,
$$\mathbb{P}\big(X \geq k + 2\sqrt{kx} + 2x\big) \leq e^{-x}, \qquad \mathbb{P}\big(X \leq k - 2\sqrt{kx}\big) \leq e^{-x}.$$
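For even $k$, $\chi^2_k$ is a $\mathrm{Gamma}(k/2)$ variable scaled by 2, so the Laurent-Massart bounds $\mathbb{P}(X \geq k + 2\sqrt{kx} + 2x) \leq e^{-x}$ and $\mathbb{P}(X \leq k - 2\sqrt{kx}) \leq e^{-x}$ can again be verified exactly via the Poisson-sum formula (an illustrative check, not part of the original proof):

```python
import math

def chi2_upper_tail(k, y):
    """Exact P(X >= y) for X ~ chi^2_k with even k: X/2 ~ Gamma(k/2, 1),
    so P(X >= y) = P(Poisson(y/2) <= k/2 - 1) by the Poisson-sum identity."""
    m, z = k // 2, y / 2
    if z <= 0:
        return 1.0
    return math.exp(-z) * sum(z ** j / math.factorial(j) for j in range(m))

# Both Laurent-Massart tail bounds hold at level e^{-x} for every (k, x) tried.
for k in [2, 10, 100]:
    for x in [0.5, 2.0, 10.0]:
        up = chi2_upper_tail(k, k + 2 * math.sqrt(k * x) + 2 * x)
        low = 1.0 - chi2_upper_tail(k, k - 2 * math.sqrt(k * x))
        assert up <= math.exp(-x)
        assert low <= math.exp(-x)
```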
Theorem 6 implies the following lower tail bounds that match the upper bounds for the chi-squared distribution.
Corollary 3 ($\chi^2$ distribution tail probability bound).
Suppose $X \sim \chi^2_k$ for an integer $k$. There exist uniform constants, not relying on $k$, and a constant relying only on , such that
(23) 
(24) 
Proof.
See Section 6.4. ∎
In addition to the regular chi-squared distributions, the weighted and noncentral chi-squared distributions (definitions are given in the forthcoming Theorems 7 and 8) are two important extensions that commonly appear in probabilistic and statistical applications. By the regular and reverse Chernoff-Cramér bounds, one can show the following matching upper and lower tail bounds for weighted chi-squared distributions in Theorem 7 and for noncentral chi-squared distributions in Theorem 8, respectively.
Theorem 7 (Tail Bounds of the Weighted $\chi^2$ Distribution).
Suppose $X$ is weighted chi-squared distributed, in the sense that $X = \sum_{i=1}^n a_i Z_i^2$, where $a_1, \ldots, a_n$ are fixed nonnegative values and $Z_1, \ldots, Z_n \overset{\text{i.i.d.}}{\sim} N(0,1)$. Then the centralized random variable $X - \mathbb{E}X$ satisfies
Theorem 8 (Tail Bounds of the Noncentral $\chi^2$ Distribution).
Let $X$ be noncentral chi-squared distributed with $k$ degrees of freedom and noncentrality parameter $\mu$, in the sense that $X = \sum_{i=1}^k (Z_i + \mu_i)^2$, where $Z_1, \ldots, Z_k \overset{\text{i.i.d.}}{\sim} N(0,1)$ and $\sum_{i=1}^k \mu_i^2 = \mu$. Then the centralized random variable $X - \mathbb{E}X$ satisfies
(25) 