
BAYESIAN MULTIPLE TESTING UNDER SPARSITY FOR POLYNOMIAL-TAILED DISTRIBUTIONS

Xueying Tang, Ke Li and Malay Ghosh

University of Florida, Southwestern University of Finance and Economics, and University of Florida

Abstract: This paper considers Bayesian multiple testing under sparsity for polynomial-tailed distributions satisfying a monotone likelihood ratio property. Included in this class of distributions are the Student’s t, the Pareto, and many other distributions. We prove some general asymptotic optimality results under fixed and random thresholding. As examples of these general results, we establish the Bayesian asymptotic optimality of several multiple testing procedures in the literature for appropriately chosen false discovery rate levels. We also show by simulation that the Benjamini-Hochberg procedure with a false discovery rate level different from the asymptotically optimal one can lead to high Bayes risk.

Key words and phrases: Asymptotic optimality, Benjamini-Hochberg procedure, false discovery rate, Pareto distribution, Student’s t distribution.

1. Introduction

Multiple testing has become a topic of growing importance in recent years. Its importance is particularly felt in the event of inference under sparsity, detecting a few signals in the midst of multiple noises. Applications abound, for example, in genetics, engineering, biology, and finance, just to name a few. A specific example is when one needs to identify a handful of genes attributable to a certain disease in the midst of thousands of others.

Currently, the most widely used approach to multiple testing is that of Benjamini and Hochberg (1995), which controls the false discovery rate (FDR). Since then, there have been several noteworthy contributions to the area. Among others, we refer to Benjamini and Yekutieli (2001), Efron and Tibshirani (2002), Abramovich, Benjamini, Donoho, and Johnstone (2006), Donoho and Jin (2006), Gavrilov, Benjamini, and Sarkar (2009), Genovese and Wasserman (2002, 2004), Sarkar (2002), and Storey (2002).

Recently, Bogdan, Ghosh, and Tokdar (2008) conducted an extensive simulation study to assess the closeness of the Benjamini-Hochberg procedure to an optimal Bayes procedure for multiple hypothesis testing under normality of the data. Later, in Bogdan, Chakrabarti, Frommlet, and Ghosh (2011) and Frommlet and Bogdan (2013), it was shown that several multiple testing procedures, including the Benjamini-Hochberg procedure, asymptotically attain the Bayes oracle property under sparsity, once again under normality. As an extension of Bogdan, Chakrabarti, Frommlet, and Ghosh (2011), Neuvial and Roquain (2012) studied properties of FDR thresholding with observations coming from the Subbotin family, which includes the Laplace and normal distributions as special cases.

Here we consider distributions with polynomial tails. As we show later, the Bayes rules of the multiple testing problem of normal distributions and polynomial-tailed distributions produce quite different Bayes risks. Both type I and type II errors play a role in the limiting (as the number of tests goes to infinity) Bayes risk of the oracle multiple testing procedure for polynomial-tailed distributions while, as shown in Bogdan, Chakrabarti, Frommlet, and Ghosh (2011), the Bayes risk is asymptotically determined solely by type II errors for normal distributions. As indicated by Chi (2007), in multiple testing problems for polynomial-tailed distributions, controlling FDR under a certain threshold level leads to asymptotically zero power. As a result, a vanishing FDR level is very unlikely to define an asymptotically optimal procedure in terms of Bayes risk. As this is not the case for normal distributions, we were motivated to study the performance of the Benjamini-Hochberg procedure and some other multiple testing procedures for polynomial-tailed distributions. We study the asymptotic optimality of multiple testing procedures for a general class of such distributions, including Student’s t, Pareto, and many others.

Our framework follows that of Bogdan, Chakrabarti, Frommlet, and Ghosh (2011) and Neuvial and Roquain (2012), where the multiple testing problem is addressed in a decision theoretic framework. Suppose there are $m$ independent observations, each of which comes from a mixture of two distributions in the scale family of a polynomial-tailed distribution. We are interested in testing simultaneously which distribution each observation comes from. We assume the loss of wrong decisions for the tests is the sum of the losses of wrong decisions for individual tests (Lehmann (1957a, b)). For each test, nonzero loss occurs if and only if a type I or a type II error is made. All our results are obtained in an asymptotic framework that ensures that the limiting power of an individual test based on the Bayes oracle threshold converges to a constant between zero and one.

After finding the oracle Bayes risk, we define asymptotic Bayesian optimality under sparsity (ABOS) analogous to Bogdan, Chakrabarti, Frommlet, and Ghosh (2011). Under the asymptotic framework, a necessary and sufficient condition is provided for a fixed thresholding procedure to be ABOS. A single constraint guarantees that the risks from the two error types converge to the optimal risk for our proposed class of distributions.

A more practically meaningful result is that we obtain a general sufficient condition for a random thresholding procedure to obtain ABOS. The condition requires comparison of a random threshold with a fixed ABOS threshold that is sometimes easier to work with than bounding the Bayes risk directly, as in Bogdan, Chakrabarti, Frommlet, and Ghosh (2011). Our general results show that the procedures controlling Bayesian false discovery rate (BFDR), the procedure of Genovese and Wasserman (2002), and the Benjamini-Hochberg procedure are all ABOS if the FDR level is chosen properly. On the other hand, it is shown via simulation that the Benjamini-Hochberg procedure with an FDR level different from the optimal one can lead to high Bayes risk.

The remaining sections of this article are organized as follows. In Section 2, we describe our asymptotic framework. The Bayes oracle rule and its Bayes risk are derived in this section, and we list a few important distributions with polynomial tails. Two general results about the conditions for fixed or random thresholding procedures to be ABOS are presented in Section 3. In Section 4, we provide conditions under which several procedures, including Benjamini-Hochberg, are ABOS by applying the results in Section 3. Section 5 contains numerical results suggesting the non-optimality of the Benjamini-Hochberg procedure. Some final remarks are made in Section 6. Proofs of theoretical results are provided in the supplementary material.

2. Oracle Bayes Risk and Asymptotic Framework

Suppose we have $m$ independent observations $X_1, \ldots, X_m$ from the same distribution. Let $D$ and $d$ be the cumulative distribution function (cdf) and the probability density function (pdf), with respect to Lebesgue measure, of a distribution from the monotone polynomial tail (MPT) distribution family defined as follows.

###### Definition 1.

A distribution with cdf and pdf is said to be an MPT distribution if is either an even function or a function whose support is the nonnegative real line, as for some constant and , and for any , is a strictly increasing function in for .

The study of decision procedures for distributions with general monotone likelihood ratio (MLR) properties dates back to Karlin and Rubin (1956). The MLR property here ensures a simple form for the Bayes rule. These distributions have polynomial tails with , the polynomial tail heaviness index, specifying the heaviness of the tail. By L’Hospital’s rule, one has . We focus on symmetric MPT distributions to make comparisons with normal distributions. For symmetric MPT, the scale family with pdf has MLR property in . The MPT family includes many important distributions. Some examples are given in Table 2.1.

###### Proposition 1.

A distribution in the MPT family satisfies

1. for , is a strictly increasing function;

2. for , is a strictly increasing function;

3. if , is a strictly increasing function for ;

4. for , is a strictly increasing function.

We assume that the common distribution of the $X_i$’s has cdf

$$pD(x/\sigma_1) + (1-p)D(x/\sigma_0) \qquad (0 < \sigma_0 < \sigma_1). \tag{2.1}$$

Under this mixture, the distribution of $X_i$ is determined by a latent Bernoulli random variable $s_i$ with success probability $p$, and the latent variables are mutually independent. The cdf of $X_i$ is $D(x/\sigma_0)$ if $s_i = 0$ and $D(x/\sigma_1)$ if $s_i = 1$. We are interested in simultaneously testing

$$H_{0i}: s_i = 0 \quad \text{vs.} \quad H_{Ai}: s_i = 1, \qquad i = 1, \ldots, m.$$

In a Bayesian hypothesis testing framework, the marginal distribution of the observations is often of the form (2.1). A specific example of our model is in stock selection, where are the log returns of stocks, often modeled by Student’s t distributions. We can equivalently assume and . To test whether some stocks have extreme returns, if we assume

$$\mu_i \mid \tau_i^2 \sim (1-p)\,N(0, \eta_0^2\tau_i^2) + p\,N(0, \eta_1^2\tau_i^2), \qquad i = 1, \ldots, m,$$

then the marginal distribution of is a mixture of two Student’s t distributions with degrees of freedom and .

Let be the decision rule used for the $i$th test. If , the null hypothesis is rejected; otherwise the null is not rejected. For each test, the loss is non-zero only if , that is, only when a type I or type II error is made. Let $\delta_0$ and $\delta_A$ denote the respective losses of making a type I and a type II error. We assume that the overall loss of the $m$ tests is the sum of losses for individual tests. This additive loss structure is similar to the one in Lehmann (1957a, b) and in Bogdan, Chakrabarti, Frommlet, and Ghosh (2011). To simplify matters, we take

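$$1 + u = \frac{\sigma_1^2}{\sigma_0^2}, \qquad \delta = \frac{\delta_0}{\delta_A}, \qquad f = \frac{1-p}{p}, \qquad v = \delta f.$$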

These parameters can vary with the number of tests $m$.

The Bayes risk of a multiple testing procedure is

$$R = p\,\delta_A \sum_{i=1}^{m} (v\,t_{1i} + t_{2i}), \tag{2.2}$$

where $t_{1i}$ and $t_{2i}$ denote the probabilities of type I and type II errors for the $i$th test; hereafter we call and the type I and type II risk components, respectively. The Bayes rule minimizing the Bayes risk can be shown to reject $H_{0i}$ if

$$\frac{(1+u)^{-1/2}\, d\big((1+u)^{-1/2} X_i/\sigma_0\big)}{d(X_i/\sigma_0)} > v, \qquad i = 1, \ldots, m.$$

Due to the MLR property, the Bayes rule rejects if , where is the positive solution of the equation

$$\frac{(1+u)^{-1/2}\, d\big(\omega_{\mathrm{opt}}(1+u)^{-1/2}\big)}{d(\omega_{\mathrm{opt}})} = v. \tag{2.3}$$

As is unknown in practice, we call the oracle threshold.
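As a concrete illustration of (2.3), the following Python sketch solves for the oracle threshold numerically. It assumes, purely as an example, that $d$ is the standard Cauchy density (Student's t with one degree of freedom, so $\gamma = 1$). The function names and the bisection routine are illustrative choices, not part of the paper. For this particular density, (2.3) can also be solved algebraically, which gives a closed form to check the numerical root against.

```python
import math


def cauchy_pdf(x):
    """Standard Cauchy density d(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))


def scaled_lr(omega, u):
    """Left-hand side of (2.3) evaluated at a candidate threshold omega."""
    s = (1.0 + u) ** -0.5
    return s * cauchy_pdf(omega * s) / cauchy_pdf(omega)


def oracle_threshold(u, v, upper=1e12, tol=1e-12):
    """Solve scaled_lr(omega, u) = v for omega > 0 by bisection.

    For the Cauchy density the left-hand side increases in omega from
    (1+u)^(-1/2) to (1+u)^(1/2), so a positive root exists only when v
    lies strictly between these two values.
    """
    s = (1.0 + u) ** -0.5
    if not (s < v < 1.0 / s):
        raise ValueError("need (1+u)^(-1/2) < v < (1+u)^(1/2)")
    lo, hi = 0.0, upper
    while hi - lo > tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if scaled_lr(mid, u) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def oracle_threshold_closed_form(u, v):
    """Algebraic solution of (2.3) in the Cauchy case (assumption of this sketch)."""
    s = (1.0 + u) ** -0.5
    return math.sqrt((v - s) / (s * (1.0 - v * s)))
```

For other MPT densities no closed form is generally available, and only `cauchy_pdf` would need to be swapped out in the bisection version, provided the bracketing interval is adjusted to the new density.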

We seek conditions for a multiple testing procedure to attain the Bayes oracle property under sparsity as . To impose sparsity, we assume as , and let to ensure that the signals are strong enough to be discovered as . We assume that to avoid having the power of an individual test going to zero or one. By (2.3) and the definition of MPT family, this is equivalent to assuming , where

$$C_0 = C^{(\gamma+1)/2}\, d(\sqrt{C})/C_d \tag{2.4}$$

is a strictly increasing function in , according to Proposition 1. If , this can be simplified to . Intuitively, it is more difficult to distinguish between signals and noises if the data have fewer signals (smaller ) and smaller signal-to-noise ratio (smaller ). The assumption that converges to a constant guarantees the signals are identifiable, while the magnitude of the constant indicates the intrinsic difficulty in identifying those signals. A larger reflects more difficulties and we call the difficulty index.

To summarize, we study the properties of multiple testing procedures under the asymptotic framework (as )

$$p \to 0, \qquad u \to \infty, \qquad v\,u^{-\gamma/2} \to C_0, \tag{2.5}$$

whereas the asymptotic framework in Bogdan, Chakrabarti, Frommlet, and Ghosh (2011) has . Noticing that the second and the third assumptions in (2.5) imply , the only difference between the two frameworks is the relation between and . For normal distributions, the rate at which the signal strength increases to infinity is the logarithm of , while for polynomial-tailed distributions, it is a polynomial in .

###### Proposition 2.

Let and . Under (2.5),

$$\omega_{\mathrm{opt}}^2 \sim C\,(v/C_0)^{2/\gamma}, \qquad t_{1i} = t_1 \sim C_1 v^{-1}, \qquad t_{2i} = t_2 \sim C_2, \tag{2.6}$$

with as in (2.4). The corresponding Bayes risk is

$$R_{\mathrm{opt}} \sim m\,p\,\delta_A\,(C_1 + C_2). \tag{2.7}$$

By Proposition 1, is a strictly increasing function in , which agrees with the interpretation of the difficulty index ; a more difficult multiple testing task leads to a higher Bayes risk.

Since $C_1$ and $C_2$ are the limits of type I and type II risk components of the oracle procedure, we call them asymptotically optimal type I and type II risk components. In Bogdan, Chakrabarti, Frommlet, and Ghosh (2011) and Neuvial and Roquain (2012), the limiting Bayes risks of the oracle threshold are shown to depend solely on the type II risk component. For polynomial-tailed distributions, neither risk component of the oracle procedure is negligible as the number of tests goes to infinity. In both models, the oracle probability of type I errors goes to zero, but the probability decays at the rate of $v^{-1}$ for polynomial-tailed models, while for the normal model, the rate is faster. Besides the need for stronger signals to ensure detectability, this is yet another effect of heavy-tailed signals and noises.
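The behaviour described by (2.6) can be checked numerically. The sketch below again assumes the standard Cauchy distribution ($\gamma = 1$), takes the two-sided rule that rejects when $|X_i|/\sigma_0$ exceeds the threshold, and sets $u$ so that $v u^{-1/2} = C_0$ exactly. All function names are hypothetical. For this example a direct calculation gives the limits $(2/\pi)\sqrt{C_0(1-C_0)}$ for $v t_1$ and $(2/\pi)\arctan\sqrt{C_0/(1-C_0)}$ for $t_2$; these are derived for the Cauchy case only and are not quoted from the paper.

```python
import math


def cauchy_sf(x):
    """Survival function 1 - D(x) of the standard Cauchy, stable for large x."""
    return math.atan2(1.0, x) / math.pi


def oracle_components(v, c0):
    """Return (v * t1, t2) at the oracle threshold in the Cauchy case.

    u is chosen so that v * u**(-1/2) = c0, mimicking framework (2.5)
    with gamma = 1.
    """
    u = (v / c0) ** 2
    s = (1.0 + u) ** -0.5
    # Closed-form root of (2.3) for the Cauchy density (assumption of this sketch).
    omega = math.sqrt((v - s) / (s * (1.0 - v * s)))
    t1 = 2.0 * cauchy_sf(omega)            # type I error probability
    t2 = 1.0 - 2.0 * cauchy_sf(omega * s)  # type II error probability
    return v * t1, t2


# Both components settle down to positive constants as v grows.
for v in (1e1, 1e2, 1e3, 1e4):
    print(v, oracle_components(v, c0=0.5))
```

The type I error probability itself tends to zero at rate $v^{-1}$, but after multiplication by $v$ it contributes a non-negligible constant, which is the point made in the paragraph above.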

###### Definition 2.

A multiple testing rule is asymptotically Bayes optimal under sparsity (ABOS) under (2.5) if its Bayes risk $R$ satisfies $R/R_{\mathrm{opt}} \to 1$ as $m \to \infty$.

3. Fixed and Random Thresholding Procedures

In this section, we consider multiple testing procedures that reject the $i$th null hypothesis if is greater than or equal to a threshold, which can be either non-data dependent (fixed) or data dependent (random). To distinguish them, we let $\omega$ denote the fixed threshold and $\hat{\omega}$ denote the random threshold. For a fixed thresholding procedure, the events and are based only on the $i$th observation with respective probabilities the same for each $i$. Therefore, the Bayes risk of a fixed thresholding procedure can be expressed as where , and . In contrast, the events and for a random thresholding procedure are potentially based on all observations and the probabilities of type I and type II errors are not necessarily the same across different tests.

###### Theorem 1.

A fixed thresholding multiple testing procedure that rejects when is ABOS if and only if the threshold satisfies

$$\omega^2/\omega_{\mathrm{opt}}^2 \to 1, \tag{3.1}$$

or, equivalently, with as in (2.4),

$$\omega^2 = C\,(v/C_0)^{2/\gamma}\,(1 + o(1)). \tag{3.2}$$

It may appear that even if the type I and type II risk components do not tend to the corresponding asymptotically optimal risk components, there is still a chance that $R/R_{\mathrm{opt}} \to 1$. However, the proof shows that the two components have to converge to the corresponding optimal risk components individually in order to achieve ABOS. This observation is also true for the normal distribution, but as shown by Theorem 3.2 in Bogdan, Chakrabarti, Frommlet, and Ghosh (2011), two conditions, one for the type II risk component, the other for the type I risk component, are needed to guarantee this, while in our case, only one condition is required. In Remark 3.1 of Bogdan, Chakrabarti, Frommlet, and Ghosh (2011), the authors argued that the reason for the extra condition for type I error is that, for normal models, type I errors are more sensitive to changes in the critical value than type II errors. In their language, our Theorem 1 shows that for polynomial-tailed models, type I and type II errors are equally sensitive to changes in the critical value.
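To see numerically what condition (3.1) rules out, the sketch below compares the risk of a fixed threshold equal to a constant multiple of $\omega_{\mathrm{opt}}$ with the oracle risk. It assumes the standard Cauchy distribution ($\gamma = 1$), the two-sided rejection rule on $|X_i|/\sigma_0$, and $u = (v/C_0)^2$; the function names are hypothetical. Since the factor $m p \delta_A$ in (2.2) cancels in the ratio, only $v t_1 + t_2$ is needed.

```python
import math


def cauchy_sf(x):
    """Survival function 1 - D(x) of the standard Cauchy."""
    return math.atan2(1.0, x) / math.pi


def per_test_risk(omega, u, v):
    """v * t1 + t2 for a fixed threshold omega (risk up to the factor m p delta_A)."""
    s = (1.0 + u) ** -0.5
    t1 = 2.0 * cauchy_sf(omega)
    t2 = 1.0 - 2.0 * cauchy_sf(omega * s)
    return v * t1 + t2


def risk_ratio(scale, v, c0):
    """Risk of the threshold scale * omega_opt relative to the oracle risk."""
    u = (v / c0) ** 2
    s = (1.0 + u) ** -0.5
    # Closed-form oracle threshold for the Cauchy case (assumption of this sketch).
    omega_opt = math.sqrt((v - s) / (s * (1.0 - v * s)))
    return per_test_risk(scale * omega_opt, u, v) / per_test_risk(omega_opt, u, v)


for scale in (0.5, 1.0, 1.001, 2.0):
    print(scale, risk_ratio(scale, v=1e4, c0=0.5))
```

In this example a threshold that is off by a fixed factor keeps the ratio bounded away from one however large $v$ is taken, whereas a factor tending to one does not, which is consistent with Theorem 1.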

###### Theorem 2.

Under (2.5), a random thresholding multiple testing procedure that rejects if is ABOS if for all ,

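$$\frac{1}{m}\sum_{i=1}^{m} P\big(|\hat{\omega} - \omega_{\mathrm{opt}}| > \epsilon v^{1/\gamma} \mid s_i = 0\big) = o(v^{-1}), \tag{3.3}$$
$$\frac{1}{m}\sum_{i=1}^{m} P\big(|\hat{\omega} - \omega_{\mathrm{opt}}| > \epsilon v^{1/\gamma} \mid s_i = 1\big) = o(1). \tag{3.4}$$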

If does not converge to zero as , then a random thresholding procedure is ABOS if for all ,

$$P\big(|\hat{\omega}/\omega_{\mathrm{opt}} - 1| > \epsilon\big) = o(v^{-1}). \tag{3.5}$$

An equivalent condition to (3.5) is that . Theorem 2 continues to hold if the oracle threshold is replaced by the threshold of a fixed thresholding procedure that is ABOS. The left-hand sides of (3.3) and (3.4) can be interpreted as the average departures of the probabilities of type I and type II errors of a random thresholding procedure from the corresponding errors of the Bayes oracle. Although less general than (3.3) and (3.4), condition (3.5) could be easier to verify in practice because of the symmetry of the distribution of observations.

As implied by Theorem 1, to obtain an ABOS fixed thresholding procedure, the fixed threshold itself is very likely to contain unknown parameters. In contrast, a random threshold consists of observed data only. For example, it could appear as an estimator of an ABOS fixed threshold. Therefore, a random thresholding procedure is naturally an implementable procedure, and, in this sense, Theorem 2 provides a more practical result.

4. ABOS of Several Special Procedures

4.1 Procedures controlling BFDR

Benjamini and Hochberg (1995) introduced FDR as a less stringent error measure than the familywise error rate, where is the number of total rejections and is the number of false rejections. Storey (2003) argued that the positive false discovery rate (pFDR), can overcome some of the concerns in Benjamini and Hochberg (1995) and, under certain conditions, it coincides with the Bayesian false discovery rate (BFDR) of Efron and Tibshirani (2002),

$$\mathrm{BFDR} = P(H_{0i} \text{ is true} \mid H_{0i} \text{ was rejected}) = \frac{(1-p)\,t_1}{(1-p)\,t_1 + p\,(1 - t_2)}.$$

For a fixed thresholding procedure, the threshold and the BFDR level are linked by

$$\frac{(1-p)\{1 - D(\omega)\}}{(1-p)\{1 - D(\omega)\} + p\{1 - D(\omega(1+u)^{-1/2})\}} = \alpha, \tag{4.1}$$

or equivalently,

$$\frac{1 - D(\omega)}{1 - D(\omega(1+u)^{-1/2})} = \frac{r_\alpha}{f}, \tag{4.2}$$

where $r_\alpha = \alpha/(1-\alpha)$. Since we have already found a necessary and sufficient condition for the fixed thresholding procedure to be ABOS, by using (4.2), we are able to find conditions on $\alpha$ (depending on $m$) such that the BFDR controlling procedure is ABOS.

An alternative expression of the BFDR level in (4.1) is that where . Property 3 of Proposition 1 shows that is a strictly increasing function in , and, since as , as . As if , the BFDR of a finite fixed threshold procedure for a given can only be controlled within the interval , where

$$\beta^{*} = \{1 + (1+u)^{\gamma/2}/f\}^{-1}. \tag{4.3}$$

Thus with a BFDR level less than , the fixed threshold has to be infinite. In this case, none of the null hypotheses is rejected and the power of an individual test is zero. Since our asymptotic framework requires the power of an individual test to go to a nonzero constant, we confine to the interval .
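The link (4.1) between a fixed threshold and its BFDR level, and the lower bound (4.3), can be explored with a short Python sketch. It assumes the standard Cauchy distribution ($\gamma = 1$) as an example, and the function names are hypothetical. In this example the BFDR equals $1 - p$ at $\omega = 0$ and approaches $\beta^{*}$ as $\omega$ grows, so the bisection only accepts levels strictly between these two values.

```python
import math


def cauchy_sf(x):
    """Survival function 1 - D(x) of the standard Cauchy."""
    return math.atan2(1.0, x) / math.pi


def bfdr(omega, p, u):
    """Left-hand side of (4.1) with D the standard Cauchy cdf."""
    s = (1.0 + u) ** -0.5
    null_part = (1.0 - p) * cauchy_sf(omega)
    alt_part = p * cauchy_sf(omega * s)
    return null_part / (null_part + alt_part)


def beta_star(p, u, gamma=1.0):
    """Lower end of the attainable BFDR range, equation (4.3)."""
    f = (1.0 - p) / p
    return 1.0 / (1.0 + (1.0 + u) ** (gamma / 2.0) / f)


def bfdr_threshold(alpha, p, u, upper=1e12):
    """Find a fixed threshold whose BFDR equals alpha by bisection."""
    if not (beta_star(p, u) < alpha < 1.0 - p):
        raise ValueError("alpha must lie strictly between beta_star and 1 - p")
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bfdr(mid, p, u) > alpha:
            lo = mid  # BFDR still too high: move the threshold up
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the illustrative values $p = 0.01$ and $u = 9999$, `beta_star` is already close to one half, so in this configuration no finite fixed threshold can hold the BFDR at a conventional level such as 0.05.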

To distinguish from general fixed thresholds, is used to denote the fixed threshold controlling BFDR under , and the subscript is omitted if there is no ambiguity.

###### Proposition 3.

A fixed thresholding procedure controlling BFDR under is ABOS if and only if

$$\delta\, r_\alpha \to C_1/(1 - C_2). \tag{4.4}$$

The threshold is of the form

$$\omega_B^2 = C_B\,(f/r_\alpha)^{2/\gamma}\,(1 + o(1)), \tag{4.5}$$

where and as in Proposition 2.

Condition (4.4) implies that if either one of $\delta$ and $r_\alpha$ goes to a positive constant, the other is forced to converge to a positive constant as well. For example, if $\delta$ converges to a positive constant $\delta_\infty$, then $\alpha \to \alpha_\infty$, where $\alpha_\infty$ is defined by

$$\alpha_\infty = \frac{1}{1 + \delta_\infty (1 - C_2)/C_1}. \tag{4.6}$$

Also, if increasingly more penalty is imposed on type II errors ($\delta \to 0$) as $m \to \infty$, then no control is exerted on BFDR, since $\alpha \to 1$ as $m \to \infty$.
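The arithmetic behind (4.4) and (4.6) is simple enough to check directly. In the sketch below the function names and the numerical inputs are hypothetical, chosen only to show that the level returned by (4.6) satisfies the limiting form of (4.4), and that the level tends to one as the loss ratio tends to zero.

```python
def alpha_infinity(delta_inf, c1, c2):
    """Asymptotically optimal level from (4.6)."""
    return 1.0 / (1.0 + delta_inf * (1.0 - c2) / c1)


def r_alpha(alpha):
    """Odds transform r_alpha = alpha / (1 - alpha) appearing in (4.2) and (4.4)."""
    return alpha / (1.0 - alpha)


# Hypothetical values of the loss ratio and of the optimal risk components.
delta_inf, c1, c2 = 1.0, 0.3, 0.5
a = alpha_infinity(delta_inf, c1, c2)

# Limiting form of (4.4): delta * r_alpha equals C1 / (1 - C2).
print(a, delta_inf * r_alpha(a), c1 / (1.0 - c2))

# As the loss ratio shrinks, the level moves towards one (no BFDR control).
print([alpha_infinity(d, c1, c2) for d in (1.0, 0.1, 0.01, 0.0)])
```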

4.2 Genovese-Wasserman and Benjamini-Hochberg Procedures

Let and denote the p-values for the tests, ordered as . The Benjamini-Hochberg procedure at FDR level then looks for the largest , denoted by , that satisfies and rejects all the tests whose p-values are less than or equal to . This is equivalent to rejecting the null hypothesis if , where

$$\omega_{BH} = \inf\left\{ y : \frac{2\{1 - D(y)\}}{1 - \hat{F}(y)} \le \alpha \right\}, \tag{4.7}$$

being the common cdf of ’s and . Thus the Benjamini-Hochberg procedure is a random thresholding procedure. To study the ABOS of the Benjamini-Hochberg procedure via Theorem 2, we need to compare (4.7) with a fixed ABOS threshold. Genovese and Wasserman (2002) showed that the Benjamini-Hochberg procedure can be approximated by a fixed thresholding procedure whose threshold is the solution of

$$\frac{1 - D(\omega_{GW})}{(1-p)\{1 - D(\omega_{GW})\} + p\{1 - D(\omega_{GW}(1+u)^{-1/2})\}} = \alpha. \tag{4.8}$$

###### Proposition 4.

If , the rule that rejects the null hypothesis when is ABOS if and only if (4.4) holds. In this case, with as in Proposition 3,

$$\omega_{GW}^2 = C_B\,(f/r_\alpha)^{2/\gamma}\,(1 + o(1)).$$

###### Theorem 3.

If

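$$p \propto 1/\log(m) \quad \text{or} \quad p \propto m^{-\kappa} \ \text{ for some } 0 < \kappa < 1, \tag{4.9}$$
$$\delta \to \delta_\infty > 0, \tag{4.10}$$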

the Benjamini-Hochberg procedure at FDR level $\alpha$ is ABOS if $\alpha \to \alpha_\infty$, where $\alpha_\infty$ is given in (4.6).

The oracle Bayes rule balances type I and type II errors with the consideration of loss for each type of error. The optimal FDR level given in (4.6) is indeed the result of balancing since it is determined by the limiting loss ratio $\delta_\infty$ and the asymptotically optimal risk components $C_1$ and $C_2$.

The asymptotically optimal FDR level depends on the difficulty index , which is usually an unknown parameter. Although not having a conclusive answer, we discuss how to find a practically usable FDR level in Section 6.
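The Benjamini-Hochberg step-up rule discussed in this subsection can be sketched as follows. The p-values are the two-sided ones, $2\{1 - D(|x|)\}$, computed here under a standard Cauchy null with $\sigma_0 = 1$; that choice of $D$ and the function names are assumptions of the example. The rule finds the largest rank $k$ whose ordered p-value is at most $k\alpha/m$ and rejects the $k$ hypotheses with the smallest p-values.

```python
import math


def two_sided_pvalue(x):
    """p-value 2{1 - D(|x|)} under a standard Cauchy null (sigma0 = 1)."""
    return 2.0 * math.atan2(1.0, abs(x)) / math.pi


def benjamini_hochberg(pvalues, alpha):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose ordered p-value meets the step-up bound.
        if pvalues[idx] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


observations = [0.5, 100.0, -1.0, 2000.0, 0.1]
pvals = [two_sided_pvalue(x) for x in observations]
print(benjamini_hochberg(pvals, alpha=0.05))
```

Because the number of rejections depends on all the p-values, the implied cut-off on $|x|$ is data dependent, which is why the procedure is treated as a random thresholding rule in Theorem 2.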

5. Simulation Results

We compared the performances of the Bayes oracle, the Benjamini-Hochberg procedure with the optimal FDR level, and the Benjamini-Hochberg procedure with FDR level , through simulation studies. The FDR level was proved to be ABOS for the normal distributions by Bogdan, Chakrabarti, Frommlet, and Ghosh (2011). The simulation study in Ghosh, Tang, Ghosh, and Chakrabarti (2016) demonstrated its effectiveness in producing a misclassification probability curve similar to the one obtained from the oracle procedure. We considered this FDR level to illustrate the consequence of applying a multiple testing procedure regardless of the underlying distribution. We write the -BH procedure for the Benjamini-Hochberg procedure with FDR level .

The comparison of performances was done under two scenarios. In the first, we took the sparsity parameter to vary with number of tests , . We recorded the risks of the Bayes oracle and Benjamini-Hochberg procedures with different FDR levels. In the second scenario, with , we examined the behavior of the multiple testing procedures with changing values of . In both scenarios, we fixed the parameters of the loss function, and , to be 1. We considered combinations of polynomial tail heaviness index and the difficulty index , choosing from and , respectively. For each combination, 1000 data sets of were generated from the mixture distribution (2.1) with , , where . Pareto and Student’s t distributions in the MPT distribution family were considered in the simulation.
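A minimal Monte Carlo sketch of this kind of experiment is given below. It is not the authors' simulation code: it assumes the standard Cauchy distribution with $\sigma_0 = 1$ and $\sigma_1 = \sqrt{1+u}$, sets both losses to 1, and uses hypothetical values of $m$, $p$ and $C_0$. It draws data from the mixture (2.1), applies the oracle threshold, and compares the average additive loss with the risk computed from (2.2).

```python
import math
import random


def oracle_threshold(u, v):
    """Closed-form root of (2.3) for the standard Cauchy density."""
    s = (1.0 + u) ** -0.5
    return math.sqrt((v - s) / (s * (1.0 - v * s)))


def simulate(m, p, u, rng):
    """Draw (x_i, s_i), i = 1..m, from mixture (2.1) with Cauchy components."""
    sigma1 = math.sqrt(1.0 + u)
    data = []
    for _ in range(m):
        s = 1 if rng.random() < p else 0
        z = math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy draw
        data.append(((sigma1 if s else 1.0) * z, s))
    return data


def additive_loss(data, omega, delta0=1.0, delta_a=1.0):
    """Sum of losses of the rule |x_i| > omega over all tests."""
    total = 0.0
    for x, s in data:
        reject = abs(x) > omega
        if reject and s == 0:
            total += delta0   # type I error
        elif not reject and s == 1:
            total += delta_a  # type II error
    return total


def monte_carlo_risk(m, p, u, omega, reps, seed=0):
    """Average additive loss over independently simulated data sets."""
    rng = random.Random(seed)
    return sum(additive_loss(simulate(m, p, u, rng), omega) for _ in range(reps)) / reps


def exact_risk(m, p, u, omega, delta0=1.0, delta_a=1.0):
    """Bayes risk (2.2) of a fixed threshold, written as m[(1-p) d0 t1 + p dA t2]."""
    s = (1.0 + u) ** -0.5
    t1 = 2.0 * math.atan2(1.0, omega) / math.pi
    t2 = 1.0 - 2.0 * math.atan2(1.0, omega * s) / math.pi
    return m * ((1.0 - p) * delta0 * t1 + p * delta_a * t2)


m, p, c0 = 2000, 0.1, 0.5   # hypothetical settings
v = (1.0 - p) / p           # delta = 1, so v = f
u = (v / c0) ** 2           # gamma = 1
omega = oracle_threshold(u, v)
print(monte_carlo_risk(m, p, u, omega, reps=50), exact_risk(m, p, u, omega))
```

Replacing the fixed threshold by a data-dependent rule such as Benjamini-Hochberg, and dividing its average loss by that of the oracle, would give the kind of risk ratio reported in the figures.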

5.1 Results from scenario 1

The average risks based on 1000 replicates were used to estimate the Bayes risks of the two procedures and to find the Bayes risk ratio of the BH procedure to the oracle. Panels (a) and (b) of Figures 5.1 and 5.2 show the plots of Bayes risk ratios against FDR level . In the plots, the dashed vertical lines denote the asymptotically optimal FDR level as at (4.6). When , the risk ratios at are close to 1 and almost reach the lowest point of the curve. When , the risk ratios at are not as close to the minimum as in the case , but the deviations are still moderate. This observation does not conflict with the asymptotic results we have established, but it suggests that the study of non-asymptotic results or the convergence rate of asymptotic results may be helpful to find a better for smaller . In the figure, the dotted vertical lines in the plots are . In some situations, this choice of FDR level does a better job than , but it can also lead to a risk ratio away from 1 in other situations.

In the plots, the range of the Bayes risk ratios is narrower for a larger . This is probably because the denominator of the ratio, the oracle Bayes risk, is an increasing function in .

The optimal FDR level (dashed vertical lines in Figures 5.1 and 5.2) increases as the difficulty index increases. With a larger , which signifies more difficulties in identifying signals from noises, the FDR can only be controlled at a higher level to achieve asymptotic Bayesian optimality. For both Student’s t and Pareto distributions, when , is close to 0.5, which could hardly provide satisfactory control of false discoveries in practice.

5.2 Results from scenario 2

The average numbers of misclassified observations, type I errors, and type II errors based on 1000 replicates were used to estimate the misclassification probability (MP), probability of type I errors (P1), and probability of type II errors (P2), respectively. Figures 5.3 and 5.4 display the plots of the three error measurements against for Student’s t and Pareto distributions, respectively. The solid, dashed, and dotted lines, respectively, represent the Bayes oracle, the -BH procedure, and the -BH procedure. Here, the -BH procedure and the Bayes oracle behave similarly if is small. The solid lines and the dashed lines are almost identical in most situations when . When is large, the -BH procedure has lower P1 and higher P2 than the Bayes oracle, which suggests the former is conservative in identifying signals when they are abundant. Second, in terms of MP, for Student’s t distributions, the -BH procedure works better when is larger since Student’s t distributions with bigger degrees of freedom are closer to the normal distributions, for which the level is designed. In general, when applied to polynomial-tailed distributions, the -BH procedure is more conservative in identifying signals than the -BH procedure. In the plots for , and , its P1 is in close vicinity of 0 and P2 is close to 1, which indicates the procedure identifies almost all observations as noises. In the MP panels of both figures, as grows, the line corresponding to the Bayes oracle lies closer to the line indicating increasing difficulty in the multiple testing problem. This agrees with our findings in the first scenario.

6. Discussion

This paper establishes some asymptotic optimality properties of several multiple testing procedures in a Bayesian framework where the data are generated from distributions with polynomial tails. In particular, it is shown that some of the classical multiple testing procedures attain asymptotically the Bayes oracle property under sparsity. To the authors’ knowledge, Theorem 2 is the first result clearly providing an approach to simplify the problem of finding an implementable random ABOS thresholding procedure to the construction of an appropriate estimator of a fixed ABOS threshold. Future work might extend results beyond polynomial-tailed distributions.

In Section 4.1, we show that, for a fixed , the lower bound of the BFDR level that can be controlled is , see (4.3), with

$$\beta^{*}_{\infty} = \lim_{m \to \infty} \beta^{*} = (1 + \delta_\infty/C_0)^{-1}. \tag{6.1}$$

In Section 3 of Chi (2007), it is shown that if both and do not vary with and the cdf of the p-value is strictly concave, then as grows to infinity, the BFDR is always bounded below by

$$\beta^{*} = \frac{1-p}{1 - p + p \lim_{x \to \infty} \rho(x)}, \tag{6.2}$$

where in our notation is with limit . Thus in (4.3) and in (6.2) have the same expression, although they are derived in different contexts. Chi (2007) also proved that, under his setting, there is a critical value for the target FDR control level . If , the power of a multiple testing procedure decays to 0 as and the BFDR converges to . Under our setting, we believe that the criticality phenomenon still exists with both and replaced by defined in (6.1). In panel (c) of Figures 5.1 and 5.2, we plot the probability of type II error (P2) of the Benjamini-Hochberg procedure against FDR level when . The solid vertical lines represent . In the plots for , and , P2 is very close to 1, which suggests that the power is close to 0.

The asymptotic optimal FDR level of the Benjamini-Hochberg procedure, , depends on the difficulty index that is usually unknown. For practical use, according to our simulation results, when is large, is a good surrogate for in terms of risk ratios and misclassification probabilities. Although in some situations is not close to , the risk ratio does not considerably deviate from 1. It is shown in Figures 5.1 and 5.2 that the risk ratios are less sensitive to the choice of for a larger . Therefore, a smaller value, say 0.1, is a safe guess for when is small. There could be more delicate methods to estimate . To illustrate an example, let and denote the number of observations with absolute values in intervals and , respectively. If is large enough and are relatively small, by an idea similar to the one used to estimate in Chapter 4.5 of Efron (2012), we could assume that almost all the observations are signals and almost all the observations are noises. Then and Taking the ratio of these two and using the polynomial tail equivalence of the MPT distribution, we have

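$$\frac{m_0}{m_1} \approx \frac{1-p}{p}\left(\frac{\sigma_0}{\sigma_1}\right)^{\gamma}\left[\left(\frac{b}{a_1}\right)^{\gamma} - \left(\frac{b}{a_2}\right)^{\gamma}\right] \approx \delta^{-1} v\, u^{-\gamma/2}\left[(b/a_1)^{\gamma} - (b/a_2)^{\gamma}\right].$$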

As $v u^{-\gamma/2} \to C_0$, $C_0$ could be estimated by

$$\hat{C}_0 = (\delta_\infty m_0/m_1)\left[(b/a_1)^{\gamma} - (b/a_2)^{\gamma}\right]^{-1}. \tag{6.3}$$

Since is an increasing function in , the estimate of can be solved analytically or numerically depending on the form of . A problem of this method is how to choose , . We want and to be small enough so that nearly all the observations in intervals and are noises. At the same time, and should be large enough so that the polynomial approximation of is accurate. In some simulations not shown here, there is no simple solution to this problem.
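The estimator (6.3) amounts to counting observations in two ranges. The sketch below makes one reading of the construction explicit, and this reading is an assumption: $m_0$ counts observations with $a_1 < |x| < a_2$ (treated as noise) and $m_1$ counts observations with $|x| > b$ (treated as signal), which is the choice of intervals that reproduces the ratio displayed above under the polynomial tail approximation. The function name, the arguments and the toy data are hypothetical, and $\gamma$ and $\delta_\infty$ are taken as known.

```python
def estimate_c0(xs, a1, a2, b, gamma, delta_inf):
    """Plug-in estimate of C0 following (6.3).

    Assumed reading of the intervals (not confirmed by the text):
    m0 = #{a1 < |x| < a2}, m1 = #{|x| > b}, with a1 < a2 < b.
    """
    if not (0.0 < a1 < a2 < b):
        raise ValueError("need 0 < a1 < a2 < b")
    m0 = sum(1 for x in xs if a1 < abs(x) < a2)
    m1 = sum(1 for x in xs if abs(x) > b)
    if m1 == 0:
        raise ValueError("no observations beyond b; choose a smaller b")
    bracket = (b / a1) ** gamma - (b / a2) ** gamma
    return (delta_inf * m0 / m1) / bracket


# Toy data: 6 moderate values, 3 extreme values, 10 small values.
xs = [5.0] * 6 + [-200.0] * 3 + [0.1] * 10
print(estimate_c0(xs, a1=4.0, a2=8.0, b=100.0, gamma=1.0, delta_inf=1.0))
```

As the paragraph above notes, the result can be sensitive to the cut points, so in practice it would be sensible to recompute the estimate over a small grid of values and check its stability.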

As far as we know, theories of multiple testing problems for polynomial-tailed distributions have not been well developed in the literature. It is unclear whether some of the interesting results for normal distributions still hold for polynomial-tailed distributions. For example, Bogdan, Chakrabarti, Frommlet, and Ghosh (2011) mention that their assumption (obtained from when ) can be related to asymptotically least-favorable configurations for balls, discussed in Abramovich, Benjamini, Donoho, and Johnstone (2006). To examine whether our assumption has a similar connection with minimax estimation, a vital question to be answered is what the configurations for polynomial-tailed distributions look like.

Global-local shrinkage priors have received much attention recently in Bayesian analysis. Ghosh, Tang, Ghosh, and Chakrabarti (2016) showed that a multiple testing procedure based on a group of global-local shrinkage priors can asymptotically achieve the oracle Bayes risk up to a multiplicative constant. In the same vein, we would like to examine, in future work, whether and how global-local shrinkage priors can be used for polynomial-tailed distributions.

7. Supplementary Material

The online supplementary material includes proofs of our results.

Acknowledgment

Ghosh’s research was supported in part by the NSF Grant SES-1327359. Li’s research is supported by the Fundamental Research Funds for the Central Universities JBK120509 and JBK140507.

References

Abramovich, F., Benjamini, Y., Donoho, D. L., and Johnstone, I. M. (2006). Special invited lecture: Adapting to unknown sparsity by controlling the false discovery rate. The Annals of Statistics 34, 584–653.

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B 57, 289–300.

Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29, 1165–1188.

Bogdan, M., Chakrabarti, A., Frommlet, F., and Ghosh, J. K. (2011). Asymptotic Bayes-optimality under sparsity of some multiple testing procedures. The Annals of Statistics 39, 1551–1579.

Bogdan, M., Ghosh, J. K., and Tokdar, S. T. (2008). A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing. In Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen, 211–230. Institute of Mathematical Statistics.

Chi, Z. (2007). On the performance of FDR control: constraints and a partial solution. The Annals of Statistics 35, 1409–1431.

Donoho, D. and Jin, J. (2006). Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data. The Annals of Statistics 34, 2980–3018.

Efron, B. (2012). Large-scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Volume 1. Cambridge University Press.

Efron, B. and Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23, 70–86.

Frommlet, F. and Bogdan, M. (2013). Some optimality properties of FDR controlling rules under sparsity. Electronic Journal of Statistics 7, 1328–1368.

Gavrilov, Y., Benjamini, Y., and Sarkar, S. K. (2009). An adaptive step-down procedure with proven FDR control under independence. The Annals of Statistics 37, 619–629.

Genovese, C. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society B 64, 499–517.

Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. The Annals of Statistics 32, 1035–1061.

Ghosh, P., Tang, X., Ghosh, M., and Chakrabarti, A. (2016). Asymptotic properties of Bayes risk of a general class of shrinkage priors in multiple hypothesis testing under sparsity. Bayesian Analysis 11, 753–796.

Karlin, S. and Rubin, H. (1956). The theory of decision procedures for distributions with monotone likelihood ratio. The Annals of Mathematical Statistics 27, 272–299.

Lehmann, E. L. (1957a). A theory of some multiple decision problems, I. The Annals of Mathematical Statistics 28, 1–25.

Lehmann, E. L. (1957b). A theory of some multiple decision problems. II. The Annals of Mathematical Statistics 28, 547–572.

Neuvial, P. and Roquain, E. (2012). On false discovery rate thresholding for classification under sparsity. The Annals of Statistics 40, 2572–2600.

Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. The Annals of Statistics 30, 239–257.

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society B 64, 479–498.

Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. The Annals of Statistics 31, 2013–2035.

Department of Statistics, University of Florida, Gainesville, Florida 32611-8545, U.S.A.
E-mail: xytang@stat.ufl.edu

School of Statistics & Center of Statistical Research, Southwestern University of Finance and Economics, Chengdu, Sichuan, China
E-mail: likec@swufe.edu.cn

Department of Statistics, University of Florida, Gainesville, Florida 32611-8545, U.S.A.
E-mail: ghoshm@stat.ufl.edu
