A Non-asymptotic, Sharp, and User-friendly Reverse Chernoff-Cramèr Bound
The Chernoff-Cramèr bound is a widely used technique for bounding the upper tail of a random variable based on its moment generating function. By elementary proofs, we develop a user-friendly reverse Chernoff-Cramèr bound that yields non-asymptotic lower tail bounds for generic random variables. The new reverse Chernoff-Cramèr bound is used to derive a series of results, including sharp lower tail bounds for the sum of independent sub-Gaussian and sub-exponential random variables, which match the classic Hoeffding-type and Bernstein-type concentration inequalities, respectively. We also provide non-asymptotic matching upper and lower tail bounds for a suite of distributions, including gamma, beta, (regular, weighted, and noncentral) chi-squared, binomial, Poisson, Irwin-Hall, etc. We apply the result to develop matching upper and lower bounds for the extreme value expectation of the sum of independent sub-Gaussian and sub-exponential random variables. A statistical application of sparse signal identification is finally studied.
The Chernoff-Cramèr bound, with a generic statement given below, is ubiquitous in probability and its applications.
Proposition 1 (Chernoff-Cramèr Bound).
If is a real-valued random variable with moment generating function defined for . Then for any ,
The proof, given in Section 6.1, follows directly from Markov’s inequality. The Chernoff-Cramèr bound is the starting point for deriving many probability inequalities, among which Hoeffding’s inequality, Bernstein’s inequality, Bennett’s inequality, Azuma’s inequality, and McDiarmid’s inequality are well-regarded and widely used. The Chernoff-Cramèr bound has also been extensively applied to a variety of problems in random matrix theory, high-dimensional statistics, and machine learning [8, 35, 36]. This powerful tool and its applications have also been collected in various recent textbooks and class notes (see, e.g., [8, 22, 26, 27, 30, 31, 35, 36]).
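To make the upper bound concrete, the following minimal sketch evaluates it numerically for a standard normal variable, an example chosen here for illustration and not taken from the text. It assumes the bound has the usual form, the infimum over t ≥ 0 of φ(t) exp(-tx), with φ(t) = exp(t²/2) the normal moment generating function. The function names and the grid search are hypothetical choices.

```python
import math


def normal_tail(x):
    """Exact upper tail P(Z >= x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def chernoff_bound_normal(x, grid_size=2001, t_max=10.0):
    """Minimize phi(t) * exp(-t * x) over a grid of t in [0, t_max].

    phi(t) = exp(t^2 / 2) is the moment generating function of N(0, 1).
    The grid search is only an illustration; here the minimizer is t = x.
    """
    best = float("inf")
    for i in range(grid_size):
        t = t_max * i / (grid_size - 1)
        best = min(best, math.exp(0.5 * t * t - t * x))
    return best


for x in (0.5, 1.0, 2.0, 3.0):
    # Columns: x, exact tail, numerical Chernoff bound, closed form exp(-x^2 / 2).
    print(x, normal_tail(x), chernoff_bound_normal(x), math.exp(-0.5 * x * x))
```

In this example the bound has the right Gaussian exponent but misses the polynomial prefactor of the exact tail, which is the kind of gap a matching lower bound is meant to quantify.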
Despite extensive developments of upper tail bounds via the Chernoff-Cramèr bound in the literature, there are far fewer results on the corresponding lower bounds: how to find a sharp and appropriate , such that (or ) holds? Among the existing literature, the Cramèr-Chernoff theorem characterizes the asymptotic tail probability for the sum of i.i.d. random variables ([11, Theorem 1]; also see [33, Proposition 14.23]): suppose are i.i.d. copies of and has a finite moment generating function . Then for all ,
In addition, the Berry-Esseen central limit theorem [6, 16] provides a non-asymptotic quantification of the normal approximation error for the sum of independent random variables: let be i.i.d. copies of , where and ; then for all ,
where is the cumulative distribution function of the standard normal distribution. This lower bound need not be universally sharp, as the left-hand side of (1) becomes negative if . Slud, Csiszár, and Cover and Thomas established upper and lower tail bounds for the binomial distribution based on its probability mass function. Pitman and Rácz considered the tail asymptotics for the beta-gamma distribution. There is still a lack of easy-to-use lower tail bounds for generic random variables in the finite-sample setting.
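The loss of sharpness of the Berry-Esseen lower bound can be seen numerically. The sketch below assumes the usual form of the bound, namely the standard normal tail minus C ρ / (σ³ √k), where ρ is the third absolute moment and σ² the variance of each summand. The numerical constant 0.4748 is an assumed value for the i.i.d. case; any valid absolute constant gives the same qualitative picture. The defaults ρ = σ = 1 correspond to Rademacher summands, a choice made here for illustration.

```python
import math

# Assumed absolute constant for the i.i.d. Berry-Esseen bound (illustrative).
BERRY_ESSEEN_C = 0.4748


def normal_tail(x):
    """Exact upper tail 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def berry_esseen_lower_bound(x, k, rho=1.0, sigma=1.0):
    """Lower bound on the tail of the normalized sum of k i.i.d. summands.

    rho is the third absolute moment and sigma the standard deviation of
    one summand; the defaults match Rademacher variables.
    """
    return normal_tail(x) - BERRY_ESSEEN_C * rho / (sigma ** 3 * math.sqrt(k))


k = 10_000
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    # The bound turns negative, hence vacuous, once x is moderately large.
    print(x, berry_esseen_lower_bound(x, k))
```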
In this article, we try to answer this question by developing a novel and user-friendly reverse Chernoff-Cramèr bound, which provides a non-asymptotic lower tail bound for generic random variables based on their moment generating functions. The proofs are all elementary, and the bound can be conveniently applied to various settings. We discuss in detail the following implications.
Concentration inequalities for sums of independent random variables have been extensively used in non-asymptotic random matrix theory, machine learning, and high-dimensional statistics. We utilize the reverse Chernoff-Cramèr bound to establish lower tail bounds for weighted sums of independent sub-Gaussian and sub-exponential random variables, which match the classic Hoeffding-type and Bernstein-type inequalities in the literature.
We also study the matching upper and lower tail bounds for a suite of distributions that commonly appear in practice. The list includes gamma, beta, (regular, weighted, and noncentral) chi-squared, binomial, Poisson, Irwin-Hall distributions, etc.
In addition, we consider two applications of the established results. We derive the matching upper and lower bounds for extreme values of the sums of independent sub-Gaussian and sub-exponential random variables. A statistical problem of sparse signal identification is finally studied.
Moreover, the proposed reverse Chernoff-Cramèr bound relates to a number of previous works. Bagdasarov studied a reversion of Chebyshev’s inequality based on the moment generating function and the properties of the convex conjugate. In addition, Theodosopoulos proved a tighter and broader result by using sharper techniques. By the saddle point method, Iyengar and Mazumdar considered the approximation of multivariate tail probabilities, and Daniels studied an accurate approximation to the density of the mean of i.i.d. observations. Compared with these previous works, we are, to the best of our knowledge, among the first to provide a non-asymptotic, sharp, and user-friendly reverse Chernoff-Cramèr bound and to apply the result to a number of common settings in practice.
The rest of this article is organized as follows. We first introduce the generic statements of the user-friendly reverse Chernoff-Cramèr bound in Section 2. The developed tool is applied in Section 3 to derive lower tail bounds for the sum of independent sub-Gaussian and sub-exponential random variables, which match the classic upper bound results. We further study the sharp tail bounds for a number of specific distributions in Section 4. In Section 5, we discuss the extreme value expectation for independent random variables and a statistical application in sparse signal identification to further illustrate the merit of the newly established results. The additional proofs are collected in Section 6.
2 A Generic User-friendly Reverse Chernoff-Cramèr Bound
We first introduce the notation that will be used throughout the paper. We use uppercase letters, e.g., , to denote random variables and lowercase letters, e.g., , to denote deterministic scalars or vectors. and respectively represent the maximum and minimum of and . We say a random variable is “centered” if . Let , and be the expectation, variance, and probability, respectively. For any random variable , let be the moment generating function. For any vector and , let be the norm. In particular, . We use to respectively represent generic large and small constants, whose actual values may differ from one occurrence to another.
Next, we introduce the user-friendly reverse Chernoff-Cramèr bound (Theorem 1) and apply the result to obtain lower tail bounds in a number of general settings.
Theorem 1 (A user-friendly reverse Chernoff-Cramèr bound).
Suppose is a random variable with moment generating function defined for . Then for any , we have
First, we claim that if and , then . Actually, by Jensen’s inequality,
For any , , we have
Here, for and ,
The second line is because if .
Therefore, for any , , and , we have
By taking supremum, we have
Particularly, set , we have
By symmetric argument, we can also show
For any centered random variable , if is finite for a range of , by Taylor’s expansion, we have for in a neighborhood of 0. Thus, there exist constants , such that
holds for in a neighborhood of 0. The following Theorem 2 provides matching upper and lower bounds on the tail probability for random variables satisfying Condition (3) in a certain regime. The key step of the proof relies on the user-friendly reverse Chernoff-Cramèr bound (Theorem 1).
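A two-sided quadratic-exponential control of the moment generating function near 0 can be checked on a simple case. The sketch below uses a Rademacher variable, an example chosen here and not taken from the text, whose moment generating function is cosh(t). It assumes the sandwich takes the form exp(c t²) ≤ φ(t) ≤ exp(C t²) on |t| ≤ 1; the constants C = 1/2 and c = log cosh(1) are illustrative choices for this example and are not the constants of Condition (3).

```python
import math


def rademacher_mgf(t):
    """MGF of a Rademacher variable (+1 or -1 with probability 1/2 each)."""
    return math.cosh(t)


# Illustrative constants for the sandwich exp(c t^2) <= phi(t) <= exp(C t^2).
C_UPPER = 0.5
C_LOWER = math.log(math.cosh(1.0))  # valid on the neighborhood |t| <= 1


def sandwich_holds(t, tol=1e-12):
    """Check both sides of the sandwich at t, up to a small float tolerance."""
    phi = rademacher_mgf(t)
    lower = math.exp(C_LOWER * t * t)
    upper = math.exp(C_UPPER * t * t)
    return lower <= phi + tol and phi <= upper + tol


grid = [i / 100.0 for i in range(-100, 101)]
print(all(sandwich_holds(t) for t in grid))
```

The lower constant works on |t| ≤ 1 because log cosh(t) / t² is decreasing in |t|, so its minimum over the neighborhood is attained at the endpoints.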
Theorem 2 (Tail Probability Bound: Small ).
Suppose is a centered random variable satisfying
where are constants. Then for any constant , there exist constants , such that whenever , we have
Moreover, if , there exist constants , such that
The proof of Theorem 2 is deferred to Section 6.2. Noting that Theorem 2 mainly discusses the tail probability for bounded , we further introduce the tail probability bounds for large in the following Theorem 3.
Theorem 3 (Tail Probability Bound: Large ).
Suppose and are independent random variables. is the moment generating function of and for all . satisfies the tail inequality
The previous Theorem 3 characterizes the tail probability for large , if can be decomposed as , has lower tail bound, has Chernoff-Cramèr upper tail bound, and are independent. Based on Theorem 3, we can immediately derive a tail probability lower bound for the sum of independent random variables.
Suppose are centered and independent random variables. Assume for , where is the moment generating function of , . satisfies the tail inequality
By setting , Theorem 3 implies our assertion. ∎
3 Tail Probability For Sum of Independent Random Variables
With the user-friendly reverse Chernoff-Cramèr bounds developed in the previous section, we are now in a position to study the tail probability bounds for the sum of independent random variables.
The sub-Gaussian random variables, whose tails are dominated by those of a Gaussian, form a class that covers many important instances (e.g., Gaussian, Rayleigh, bounded distributions, etc.). We consider the tail probability bounds for weighted sums of independent sub-Gaussian random variables. The upper tail bound, which is referred to as the Hoeffding-type concentration inequality, has been widely used in the high-dimensional statistics and machine learning literature.
Proposition 2 (Hoeffding-type Inequality for Sum of Sub-Gaussians).
Suppose are centered and independent sub-Gaussian random variables, in the sense that either of the following holds: (1) ; (2) . Then satisfies
The proof of Proposition 2 can be found in [35, Proposition 5.10]. With an additional condition on the lower tail bound of each summand, we can show the following lower tail bound for the sum of independent sub-Gaussians, which matches the classic Hoeffding-type inequality in Proposition 2.
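A Hoeffding-type upper bound can be compared with an exact tail in a small case. The sketch below takes Rademacher summands and computes the tail of their weighted sum by enumerating all sign patterns. It assumes the explicit form exp(-x² / (2‖u‖₂²)), which is the classical Hoeffding bound for Rademacher sums; Proposition 2 is stated with unspecified constants, so this is one instance and not the proposition itself. The weights and function names are hypothetical.

```python
import itertools
import math


def exact_tail(weights, x):
    """Exact P(sum_i u_i X_i >= x) for i.i.d. Rademacher X_i, by enumeration."""
    k = len(weights)
    hits = 0
    for signs in itertools.product((-1, 1), repeat=k):
        if sum(s * u for s, u in zip(signs, weights)) >= x:
            hits += 1
    return hits / 2 ** k


def hoeffding_bound(weights, x):
    """Classical Hoeffding bound exp(-x^2 / (2 ||u||_2^2)) for x >= 0."""
    norm_sq = sum(u * u for u in weights)
    return math.exp(-x * x / (2.0 * norm_sq))


# Hypothetical weights; ten summands give 2^10 = 1024 sign patterns.
weights = [1.0, 0.5, 2.0, 1.5, 0.25, 1.0, 0.75, 1.25, 0.5, 1.0]
for x in (1.0, 2.0, 4.0, 6.0):
    # Columns: x, exact tail, Hoeffding upper bound.
    print(x, exact_tail(weights, x), hoeffding_bound(weights, x))
```

Enumeration is exponential in the number of summands, so it only serves as a sanity check for small k.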
Theorem 4 (Hoeffding-type Inequality for Sub-Gaussian Variables: Matching Upper and Lower Bounds).
Suppose are independent, , and is the moment generating function of . Suppose either of the following statements holds:
there exist constants , such that
there exist constants , such that
Then there exist constants , such that for any fixed values , satisfies
The proof of this theorem relies on the following fact.
Although Theorem 3 focuses on the right tail bound, i.e., , similar results hold for the left tail, i.e., , by symmetry. We can further prove the following two-sided lower tail bound for the sum of random variables, if both sides of these random variables have sub-Gaussian tails.
Suppose are independent, , and is the moment generating function of . Suppose either of the following statements holds:
for constants , , .
for constants and all , .
Then there exist constants , such that for any fixed real values , satisfies
On the other hand, the class of sub-Gaussian random variables considered in Theorem 4 may be too restrictive and fail to cover many useful random variables with heavier tails than Gaussians. For example, to study the concentration of Wishart matrices in random matrix theory, the summands may be squares of sub-Gaussians rather than sub-Gaussians; the tails of many commonly used distributions, such as the Poisson, exponential, and gamma distributions, are heavier than Gaussian tails. To cover these cases, sub-exponential distributions, whose tails are dominated by those of the exponential distribution, were naturally introduced and are widely used in the literature (see the forthcoming Proposition 3 for the definition of a sub-exponential distribution). Next, we consider the tail bounds for sums of sub-exponential random variables. The following Bernstein-type inequality is a classic result on the upper tail bound.
Proposition 3 (Bernstein-type Inequality for Sum of Independent Sub-Exponentials (, Proposition 5.16)).
Let be independent centered sub-exponential random variables in the sense that for all . Then for every , satisfies
With the additional lower bound on the moment generating function of each summand, we have the following matching upper and lower tail bounds for the sum of independent sub-exponential random variables.
Theorem 5 (Bernstein-type Inequality for Sum of Independent Sub-Exponentials: Matching Upper and Lower Bounds).
Suppose are centered independent random variables and is the moment generating function of . Suppose ’s are sub-exponential in the sense that:
Suppose , where is a constant. If are non-negative values, then satisfies
Here are constants. In addition, if there exists one () satisfying
and , where is a constant, then we further have
where are constants.
We first consider the lower bound. The moment generating function of satisfies
Since , . By Theorem 2, there exist constants such that
Note that , if then
Moreover, the moment generating function of satisfies
Set in Theorem 3, for all ,
Here is a constant, . In summary, we have proved the lower bound.
For the upper bound, notice that
Similarly to (9), we have
4 Sharp Tail Bounds of Specific Distributions
In this section, we aim to establish the matching upper and lower tail bounds for a number of commonly used distributions, including gamma, (regular, weighted, noncentral) Chi-squared, beta, binomial, Poisson, and Irwin-Hall distributions.
4.1 Gamma Distribution
We first focus on the gamma distribution. Suppose is gamma distributed with shape parameter , i.e.,
where is the Gamma function. It is well known that the mean and mode of are and (only if ), respectively. Although the density of the gamma distribution is available, it is highly non-trivial to develop a sharp tail probability bound in closed form. Previously, Boucheron proved the following result based on the Chernoff-Cramèr upper bound.
Proposition 4 (Gamma tail upper bound (, Pages 27-29)).
Suppose is gamma distributed with shape parameter and . Then for every ,
Or equivalently, for every ,
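The gamma upper tail bound can be checked against exact tail probabilities when the shape parameter is an integer. The sketch below assumes unit scale and the sub-gamma form of the bound, P(Y - α ≥ √(2αt) + t) ≤ exp(-t) for Y with shape α and t ≥ 0; this form and the symbols Y, α, t are assumptions made for the illustration. The exact tail uses the Erlang closed form, which is only valid for integer shape.

```python
import math


def gamma_tail_integer_shape(alpha, y):
    """Exact P(Y >= y) for Y ~ Gamma(alpha) with unit scale and integer alpha.

    Uses the Erlang identity P(Y >= y) = exp(-y) * sum_{j < alpha} y^j / j!.
    """
    if y <= 0.0:
        return 1.0
    term, total = 1.0, 1.0
    for j in range(1, alpha):
        term *= y / j  # term now equals y^j / j!
        total += term
    return math.exp(-y) * total


def threshold(alpha, t):
    """Level alpha + sqrt(2 * alpha * t) + t at which the bound exp(-t) applies."""
    return alpha + math.sqrt(2.0 * alpha * t) + t


for alpha in (1, 2, 5, 20):
    for t in (0.1, 1.0, 5.0, 10.0):
        tail = gamma_tail_integer_shape(alpha, threshold(alpha, t))
        # Columns: shape, t, exact tail at the threshold, upper bound exp(-t).
        print(alpha, t, tail, math.exp(-t))
```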
Based on the reverse Chernoff-Cramèr bound, we can develop the following lower tail bounds for the gamma distribution that match the upper bound. Since the density of has distinct shapes for and : if ; if , the tail bound behaves differently in these two cases and we discuss them separately in the next Theorem 6.
Theorem 6 (Gamma tail lower bound).
Suppose and .
There exist two uniform constants , such that for all and ,
for any , there exists only relying on , such that for all and ,
for any , .
For all ,
4.2 Chi-squared Distribution
The Chi-squared distributions form a special class of Gamma distributions and are widely used in practice:
Suppose . Laurent and Massart [21, Lemma 1] developed the following upper tail bound for Chi-squared distribution based on the Chernoff-Cramèr upper bound,
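Since a Chi-squared variable with k degrees of freedom is twice a gamma variable with shape k/2 and unit scale, the standard form of the Laurent-Massart upper tail bound can be read off from the sub-gamma upper tail bound for the gamma distribution. The short derivation below is a sketch under assumed notation (Y, X, k, t are introduced here for illustration), and the gamma bound in its first line is taken in its standard form.

```latex
% Assumed notation: Y ~ Gamma(k/2) with unit scale, X = 2Y ~ chi^2_k, t >= 0.
\begin{align*}
\mathbb{P}\Bigl(Y - \tfrac{k}{2} \ge \sqrt{2 \cdot \tfrac{k}{2} \cdot t} + t\Bigr) &\le e^{-t}
  && \text{(gamma upper tail with shape } \alpha = k/2\text{)} \\
\mathbb{P}\bigl(2Y - k \ge 2\sqrt{kt} + 2t\bigr) &\le e^{-t}
  && \text{(multiply both sides of the event by 2)} \\
\mathbb{P}\bigl(X - k \ge 2\sqrt{kt} + 2t\bigr) &\le e^{-t}
  && \text{(since } X = 2Y \sim \chi^2_k\text{)}
\end{align*}
```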
Theorem 6 implies the following lower tail bound result that matches the upper bound for Chi-squared distribution.
Corollary 3 (Chi-squared distribution tail probability bound).
Suppose and for integer . There exist uniform constants not relying on and a constant relying only on , such that
See Section 6.4. ∎
In addition to the regular Chi-squared distributions, the weighted and noncentral Chi-squared distributions (definitions are given in the forthcoming Theorems 7 and 8) are two important extensions that commonly appear in probabilistic and statistical applications. By the regular and reverse Chernoff-Cramèr bounds, one can show the following matching upper and lower tail bounds for weighted Chi-squared distributions in Theorem 7 and noncentral Chi-squared distributions in Theorem 8, respectively.
Theorem 7 (Tail Bounds of Weighted Chi-squared distribution).
Suppose is weighted Chi-squared distributed, in the sense that , where are fixed non-negative values and . Then the centralized random variable satisfies
Theorem 8 (Tail Bounds of Noncentral Chi-squared distribution).
Let be noncentral distributed with degrees of freedom and noncentrality parameter , in the sense that
Then the centralized random variable satisfies