
# Optimal Grouping for Group Minimax Hypothesis Testing

Kush R. Varshney and Lav R. Varshney

Portions of the material in this paper were first presented in [1]. K. R. Varshney is with the Business Analytics and Mathematical Sciences Department, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, 10598 USA (e-mail: krvarshn@us.ibm.com). L. R. Varshney is with the Services Research Department, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, 10598 USA (e-mail: varshney@alum.mit.edu).
###### Abstract

Bayesian hypothesis testing and minimax hypothesis testing represent extreme instances of detection in which the prior probabilities of the hypotheses are either completely and precisely known, or are completely unknown. Group minimax, also known as $\Gamma$-minimax, is a robust intermediary between Bayesian and minimax hypothesis testing that allows for coarse or partial advance knowledge of the hypothesis priors by using information on sets in which the prior lies. Existing work on group minimax, however, does not consider the question of how to define the sets or groups of priors; it is assumed that the groups are given. In this work, we propose a novel intermediate detection scheme formulated through the quantization of the space of prior probabilities that optimally determines groups and also representative priors within the groups. We show that when viewed from a quantization perspective, group minimax amounts to determining centroids with a minimax Bayes risk error divergence distortion criterion: the appropriate Bregman divergence for this task. Moreover, the optimal partitioning of the space of prior probabilities is a Bregman Voronoi diagram. Together, the optimal grouping and representation points are an $\epsilon$-net with respect to Bayes risk error divergence, and permit a rate–distortion type asymptotic analysis of detection performance with the number of groups. Examples of detecting signals corrupted by additive white Gaussian noise and of distinguishing exponentially-distributed signals are presented.

Bayesian hypothesis testing, Bregman divergence, detection theory, minimax hypothesis testing, quantization, Stolarsky mean

## I Introduction

Bayesian hypothesis testing and minimax hypothesis testing are signal detection formulations for when the prior probabilities of the hypotheses are perfectly and precisely known and for when the prior probabilities of the hypotheses are completely unknown, respectively [2]. Optimal performance in both settings is achieved by likelihood ratio tests with appropriately chosen thresholds. Between these two edge cases, there is an entire set of likelihood ratio tests corresponding to a coarse knowledge of the prior probabilities; these intermediate formulations are explored in this work.

Formulations that lie between Bayesian and minimax hypothesis testing are known as group minimax or $\Gamma$-minimax and are of interest because it is difficult to obtain complete information about priors in many decision-making scenarios, but information about priors is also not completely lacking [3, 4, 5, 6, 7, 8, 9]. Group minimax detection formulations take partial information about priors as input and provide robustness against that partial information, in contrast to minimax hypothesis testing which provides robustness against complete lack of information. Throughout the long history of group minimax statistical inference, the sets, groups, or $\Gamma$s in which the true priors lie are treated as inputs and are not optimized for detection or estimation performance. In contrast, our work herein investigates the joint problem of optimizing the groupings within which to find a minimax-optimal representative prior as well as finding those priors for all groupings.

We view the minimax test as one in which knowledge of prior probabilities has been quantized to a single cell encompassing the entire probability simplex, and the Bayesian test as one in which knowledge of prior probabilities has been quantized to an infinite number of cells that finely partition the probability simplex. In the group minimax test, the prior probabilities are quantized to a finite number of cells. The appropriate quantization distortion measure for prior probabilities of hypotheses is Bayes risk error [10], which is a Bregman divergence [11]. However, unlike standard quantization, we are interested in minimizing the maximum distortion rather than minimizing the average distortion [12, 13, 14]. Thus we pursue minimax Bayes risk error quantization of prior probabilities [1].

Group minimax, which provides a means to consider intervals of prior belief rather than exact prior belief, is similar in spirit to, but differs in detail from, decision making based on interval-valued probability as described in [15]. There are also connections to representative prior distributions [16], the robust Bayesian viewpoint [17, 18], and other areas of decision making in which robustness is desired [19].

To the best of our knowledge, there has been no previous work on the quantization of prior probabilities for hypothesis testing besides our own [10, 20]. Many interesting findings on average distortion clustering with Bregman divergences as the distortion criteria are reported in [21, 22], but we believe this is the first use of Bregman divergences in studying group minimax hypothesis testing. Although studies and results in quantization theory typically focus on average distortion, maximum distortion also appears occasionally, e.g. [23, 14, 24, 25, 26]. Such a minimax partitioning of a space is known as an $\epsilon$-net or $\epsilon$-covering [27].

In investigating quantization for group minimax hypothesis testing, we derive centroid and nearest neighbor conditions for minimax Bayes risk error distortion and discuss how alternating application of these conditions leads to a locally optimal quantizer. We provide direct derivations in the binary detection case and specialize elegant results from the Bregman divergence literature in the general case. Minimax centroid conditions for Bregman divergences are derived in [28]. The problem of finding the optimal nearest neighbor cell boundaries for a given set of samples, also known as a Voronoi diagram, is addressed for Bregman divergences in [29, 30]. Advantages of the direct derivations for the binary setting include direct geometric insights, as well as closed-form expressions.

As a further contribution similar in style to rate–distortion theory [31], we present asymptotic results on detection error as the partiality of information about the prior goes from the minimax hypothesis testing case to the Bayesian hypothesis testing case. We also present a few examples of group minimax detection with different likelihood models.

The rest of the paper is organized in the following manner. First in Section II, we set forth notation and briefly provide background on Bayesian, minimax and group minimax detection, along with Bayes risk error divergence. We formulate a quantization problem to find optimal groupings for group minimax detection in Section III. Section IV derives the nearest neighbor and centroid optimality conditions for the proposed quantization problem in both the binary and $M$-ary cases. We analyze the rate–distortion behavior of the groups in Section V. Two examples are presented in Section VI to provide intuition. Section VII provides a summary of the contributions and concludes.

## II Preliminaries

The detection or hypothesis testing problem is the task of accurately determining which of $M$ classes a noisy signal instance belongs to. In the binary ($M = 2$) case, this task is often determining the presence or absence of a target based on a measurement observed through noise. In this section we first discuss binary hypothesis testing and then we consider $M$-ary hypothesis testing for $M > 2$. Finally we present the definition of the Bayes risk error divergence, a quantification of detection performance degradation.

### II-A Binary Decisions

Consider the binary hypothesis testing problem. There are two hypotheses $h_0$ and $h_1$ with prior probabilities $p_0 = \Pr[H = h_0]$ and $p_1 = \Pr[H = h_1] = 1 - p_0$, and a noisy observation $Y$ governed by likelihood functions $f_{Y|H}(y|H=h_0)$ and $f_{Y|H}(y|H=h_1)$. A decision rule $\hat{h}(y)$ that uniquely maps every possible $y$ to either $h_0$ or $h_1$ is to be determined. There are two types of error probabilities:

$$p_E^{\mathrm{I}} = \Pr[\hat{h}(Y) = h_1 \mid H = h_0], \quad\text{and}\quad p_E^{\mathrm{II}} = \Pr[\hat{h}(Y) = h_0 \mid H = h_1].$$

The optimization criterion for the decision rule is the Bayes risk, a cost-weighted combination of the two error probabilities:

$$J = c_{10}\,p_0\,p_E^{\mathrm{I}} + c_{01}(1 - p_0)\,p_E^{\mathrm{II}}, \tag{1}$$

where $c_{10}$ is the cost of the first type of error and $c_{01}$ is the cost of the second type of error. The decision rule that optimizes (1) is the following likelihood ratio test [2]:

$$\frac{f_{Y|H}(y|H=h_1)}{f_{Y|H}(y|H=h_0)} \underset{\hat{h}(y)=h_0}{\overset{\hat{h}(y)=h_1}{\gtrless}} \frac{p_0\,c_{10}}{(1-p_0)\,c_{01}}. \tag{2}$$

The prior probability $p_0$ appears on the right side of the rule in the threshold. Since the prior probability is part of the specification of the Bayes-optimal decision rule, the error probabilities $p_E^{\mathrm{I}}$ and $p_E^{\mathrm{II}}$ are functions of the prior probability. Thus we may write the Bayes risk as a function of $p_0$:

$$J(p_0) = c_{10}\,p_0\,p_E^{\mathrm{I}}(p_0) + c_{01}(1 - p_0)\,p_E^{\mathrm{II}}(p_0). \tag{3}$$

The function $J(p_0)$ is zero at the points $p_0 = 0$ and $p_0 = 1$ and is positive-valued, strictly concave, and continuous in the interval $(0, 1)$ [32]. Under deterministic decision rules, $J(p_0)$ is differentiable everywhere.

The Bayesian hypothesis testing threshold on the right side of (2) relies on the true prior probability $p_0$, but as discussed in Section I, this value may not be known precisely. When the true prior probability is $p_0$ but the threshold in (2) uses some other decision weight $a$, there is mismatch. The Bayes risk of the decision rule with threshold

$$\frac{a\,c_{10}}{(1-a)\,c_{01}}$$

is:

$$J(p_0, a) = c_{10}\,p_0\,p_E^{\mathrm{I}}(a) + c_{01}(1 - p_0)\,p_E^{\mathrm{II}}(a). \tag{4}$$

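The function $J(p_0, a)$ is a linear function of $p_0$ with slope $c_{10}\,p_E^{\mathrm{I}}(a) - c_{01}\,p_E^{\mathrm{II}}(a)$ and intercept $c_{01}\,p_E^{\mathrm{II}}(a)$. The function $J(p_0, a)$ is tangent to $J(p_0)$ at $p_0 = a$, where $J(a, a) = J(a)$. By the point-slope formula of lines, the mismatched Bayes risk is also: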

$$J(p_0, a) = J(a) + (p_0 - a)\,J'(a) \tag{5}$$

when $J(p_0)$ is differentiable. An example of how $J(p_0)$ and $J(p_0, a)$ are related is shown in Fig. 1.

The minimax hypothesis testing threshold is determined by finding the decision weight $a$ that minimizes the worst-case $J(p_0, a)$, that is:

$$a^*_{\text{minimax}} = \arg\min_{a}\max_{p_0} J(p_0, a). \tag{6}$$

Under equivalent notation, the optimal Bayesian decision weight is $a^*_{\text{Bayes}}(p_0) = p_0$. In Bayesian hypothesis testing, the decision weight changes continuously with $p_0$, whereas in minimax hypothesis testing, there is a single decision weight for all $p_0$.
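As a minimal numerical sketch of (6), assume the Gaussian likelihoods of Section VI-A with illustrative values $\mu = \sigma = 1$ and $c_{10} = c_{01} = 1$ (these values and all function names are assumptions for illustration). Because $J(p_0, a)$ is linear in $p_0$, the inner maximum in (6) is attained at $p_0 = 0$ or $p_0 = 1$, and the worst case is unimodal in $a$, so a ternary search suffices.

```python
import math

# Illustrative (assumed) parameters for the Gaussian model of Section VI-A.
MU, SIGMA, C10, C01 = 1.0, 1.0, 1.0, 1.0


def q_func(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a):
    """(p_E^I(a), p_E^II(a)) for the threshold a*C10 / ((1 - a)*C01), with 0 < a < 1."""
    shift = (SIGMA / MU) * math.log(C10 * a / (C01 * (1.0 - a)))
    return q_func(MU / (2 * SIGMA) + shift), q_func(MU / (2 * SIGMA) - shift)


def bayes_risk(p0):
    """J(p0) from (3); zero at the endpoints of the simplex."""
    if p0 <= 0.0 or p0 >= 1.0:
        return 0.0
    p_I, p_II = error_probs(p0)
    return C10 * p0 * p_I + C01 * (1 - p0) * p_II


def worst_case_risk(a):
    """max over p0 of J(p0, a); J(., a) is linear, so only p0 = 0 and p0 = 1 matter."""
    p_I, p_II = error_probs(a)
    return max(C01 * p_II, C10 * p_I)  # J(0, a) and J(1, a)


def minimax_weight(lo=1e-9, hi=1.0 - 1e-9, iters=200):
    """Ternary search for (6): worst_case_risk first decreases, then increases, in a."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if worst_case_risk(m1) < worst_case_risk(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


a_star = minimax_weight()
print(a_star, worst_case_risk(a_star), bayes_risk(a_star))
```

With equal costs and this symmetric model, the search returns $a^* \approx 0.5$, where the worst-case mismatched risk coincides with $J(a^*)$, the peak of the Bayes risk.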

### II-B M-ary Decisions

The basics from the binary case carry over to the $M$-ary case. With $M$ hypotheses $h_0, \ldots, h_{M-1}$, there are $M$ prior probabilities $p_m = \Pr[H = h_m]$ such that $\sum_{m=0}^{M-1} p_m = 1$. The collection of priors is denoted by the vector $\mathbf{p}$, which is an element of the $M$-ary probability simplex. There is also an $M \times M$ matrix of costs $c_{ij}$, where $c_{ij}$ is the cost of deciding $h_i$ when $h_j$ is true. The detection rule in the $M$-ary case uses ratios of priors and costs in an analogous manner to the likelihood ratio test (2). The Bayes risk function is now

$$J(\mathbf{p}) = \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} c_{ij}\,p_j \Pr[\hat{h}(Y, \mathbf{p}) = h_i \mid H = h_j]. \tag{7}$$

With a vector-valued decision weight $\mathbf{a}$, the mismatched Bayes risk function is

$$J(\mathbf{p}, \mathbf{a}) = J(\mathbf{a}) + (\mathbf{p} - \mathbf{a})^T \nabla J(\mathbf{a}) \tag{8}$$

when $J(\mathbf{p})$ is differentiable. In the $M$-ary case, as in the binary case, $\mathbf{a}^*_{\text{Bayes}}(\mathbf{p}) = \mathbf{p}$ and

$$\mathbf{a}^*_{\text{minimax}} = \arg\min_{\mathbf{a}}\max_{\mathbf{p}} J(\mathbf{p}, \mathbf{a}). \tag{9}$$

### II-C Bayes Risk Error Divergence

The Bayes risk $J(\mathbf{p})$ represents the performance of the best possible decision making under uncertainty, whereas the mismatched Bayes risk $J(\mathbf{p}, \mathbf{a})$ represents the degraded decision-making performance due to the decision weight $\mathbf{a}$. Thus, we may quantify the degradation or distortion in detection performance using the difference:

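$$d(\mathbf{p}\,\|\,\mathbf{a}) = J(\mathbf{p}, \mathbf{a}) - J(\mathbf{p}) \tag{10}$$
$$= -J(\mathbf{p}) + J(\mathbf{a}) + (\mathbf{p} - \mathbf{a})^T \nabla J(\mathbf{a}). \tag{11}$$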

This difference is a Bregman divergence, termed Bayes risk error divergence, generated by the convex function $-J(\cdot)$ over a convex domain (the $M$-ary probability simplex) [10, 11].
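The identities (10) and (11) can be checked numerically. The sketch below again assumes the Gaussian model of Section VI-A with illustrative parameters; it implements a generic scalar Bregman divergence, instantiates it with the convex generator $\phi = -J$, and compares the result with the excess risk $J(p_0, a) - J(p_0)$. The helper names are hypothetical, and $J'(a)$ is read off as the slope of the tangent line (5).

```python
import math

# Illustrative (assumed) parameters for the Gaussian model of Section VI-A.
MU, SIGMA, C10, C01 = 1.0, 1.0, 1.0, 1.0


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a):
    """(p_E^I(a), p_E^II(a)) for decision weight 0 < a < 1."""
    shift = (SIGMA / MU) * math.log(C10 * a / (C01 * (1.0 - a)))
    return q_func(MU / (2 * SIGMA) + shift), q_func(MU / (2 * SIGMA) - shift)


def mismatched_risk(p0, a):
    """J(p0, a) from (4)."""
    p_I, p_II = error_probs(a)
    return C10 * p0 * p_I + C01 * (1 - p0) * p_II


def bayes_risk(p0):
    """J(p0) = J(p0, p0), extended by zero to the endpoints."""
    return 0.0 if p0 <= 0.0 or p0 >= 1.0 else mismatched_risk(p0, p0)


def bayes_risk_slope(a):
    """J'(a), the slope of the tangent line J(., a) in (5)."""
    p_I, p_II = error_probs(a)
    return C10 * p_I - C01 * p_II


def bregman(phi, grad_phi, x, y):
    """Generic scalar Bregman divergence D_phi(x || y) for a convex phi."""
    return phi(x) - phi(y) - (x - y) * grad_phi(y)


def bayes_risk_error(p0, a):
    """d(p0 || a) as the Bregman divergence generated by phi = -J, cf. (11)."""
    return bregman(lambda t: -bayes_risk(t), lambda t: -bayes_risk_slope(t), p0, a)


a = 0.3
for p0 in (0.0, 0.2, 0.5, 0.9):
    # (10): excess risk of using weight a when the true prior is p0.
    print(p0, bayes_risk_error(p0, a), mismatched_risk(p0, a) - bayes_risk(p0))
```

The two printed columns agree, and both are zero only when $p_0 = a$.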

## III Minimax Bayes Risk Error Quantization

Having described a divergence that quantifies loss in detection performance due to a mismatched decision weight, in this section we describe how that divergence can be utilized within a scalar or vector quantization framework to yield not only the optimal minimax representation point for a given set of priors (the typical group minimax problem), but also the optimal groupings for the group minimax scenario.

### III-A Quantization for Group Minimax Grouping

The space of all possible decision weights and the space of all true prior probability vectors are both the $M$-ary probability simplex. As discussed in Section II, in the Bayesian case, the decision weight changes continuously with the true prior probability vector of the detection problem, so that the Bayes risk error divergence $d(\mathbf{p}\,\|\,\mathbf{a}) = 0$ for all detection problems. Denoting the mapping from true prior probability to decision weight as $q(\mathbf{p})$, this function is the identity function in the Bayesian case and has the entire $M$-ary probability simplex as its range.

On the other hand, in the minimax case there is a single decision weight $\mathbf{a}^*_{\text{minimax}}$ for all detection problems, and $d(\mathbf{p}\,\|\,\mathbf{a}^*_{\text{minimax}}) > 0$ for all problems except the one problem in which, by chance, the minimax decision weight is the true prior probability. Here, the mapping from true prior probability to decision weight is a function whose range contains a single point: $q(\mathbf{p}) = \mathbf{a}^*_{\text{minimax}}$.

As discussed in Section I, it may be that the true prior probability, and thus the Bayesian decision weight, is not exactly known. It may also be, however, that there is some partial information, and thus we need not restrict ourselves to just one decision weight for all detection problems but may have $K$ different decision weights. Therefore, we would like to consider functions $q_K(\mathbf{p})$ whose range is a finite set of $K$ decision weights $\{\mathbf{a}_1, \ldots, \mathbf{a}_K\}$. With such a range, there are $K$ true prior probabilities for which there is no degradation in detection performance, i.e., $d(\mathbf{a}_k\,\|\,\mathbf{a}_k) = 0$. The function $q_K(\mathbf{p})$ depends discontinuously on $\mathbf{p}$ such that for all $\mathbf{p} \in \mathcal{Q}_k$, $q_K(\mathbf{p}) = \mathbf{a}_k$. Such a function is a quantizer. The remaining question for the proposed optimal grouping for group minimax hypothesis testing is determining the decision weights $\mathbf{a}_k$ and the quantization cells $\mathcal{Q}_k$.

### III-B Minimax Bayes Risk Error Quantization Criterion

Robustness is the motivation for both minimax hypothesis testing and group minimax hypothesis testing. We take maximum Bayes risk error divergence as the objective for finding the decision weights and the quantization cells, resulting in the following minimax quantizer design problem:

$$q_K^* = \arg\min_{q_K}\max_{\mathbf{p}} d(\mathbf{p}\,\|\,q_K(\mathbf{p})), \tag{12}$$

where $q_K$ is a quantizer function with $K$ cells and $K$ decision weights, and $K$ is a fixed parameter. Operationally, knowing in advance that the true prior probability falls in cell $\mathcal{Q}_k$ indicates that the decision weight $\mathbf{a}_k$ should be used in setting the threshold.
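To make (12) concrete in the binary case, the following sketch (hypothetical helper names; Gaussian model of Section VI-A with illustrative parameters) represents $q_K$ by its interior cell boundaries and decision weights and evaluates the worst-case Bayes risk error on a fine grid of true priors $p_0$. A grid maximum is only an approximation of the maximum in (12), and the $K = 2$ quantizer shown is chosen ad hoc rather than optimized.

```python
import bisect
import math

# Illustrative (assumed) parameters for the Gaussian model of Section VI-A.
MU, SIGMA, C10, C01 = 1.0, 1.0, 1.0, 1.0


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a):
    """(p_E^I(a), p_E^II(a)) for decision weight 0 < a < 1."""
    shift = (SIGMA / MU) * math.log(C10 * a / (C01 * (1.0 - a)))
    return q_func(MU / (2 * SIGMA) + shift), q_func(MU / (2 * SIGMA) - shift)


def mismatched_risk(p0, a):
    """J(p0, a) from (4)."""
    p_I, p_II = error_probs(a)
    return C10 * p0 * p_I + C01 * (1 - p0) * p_II


def bayes_risk(p0):
    """J(p0), zero at the endpoints."""
    return 0.0 if p0 <= 0.0 or p0 >= 1.0 else mismatched_risk(p0, p0)


def quantize(p0, boundaries, weights):
    """q_K(p0) for cells [0, b_1], (b_1, b_2], ..., (b_{K-1}, 1]."""
    return weights[bisect.bisect_left(boundaries, p0)]


def max_distortion(boundaries, weights, n=2001):
    """Grid approximation of max over p0 of d(p0 || q_K(p0)) for a fixed quantizer."""
    grid = (i / (n - 1) for i in range(n))
    return max(mismatched_risk(p, quantize(p, boundaries, weights)) - bayes_risk(p)
               for p in grid)


print(max_distortion([], [0.5]))            # K = 1 (minimax test): Q(1/2), about 0.3085
print(max_distortion([0.5], [0.25, 0.75]))  # K = 2: an ad hoc quantizer
```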

In the $K = 1$ case, it is straightforward to show that the decision weight of $q_1^*$ equals the minimax hypothesis testing value $\mathbf{a}^*_{\text{minimax}}$, which occurs at the peak of $J(\mathbf{p})$. However, for $K > 1$, the decision weight within a cell is not the point that minimizes the maximum mismatched Bayes risk $J(\mathbf{p}, \mathbf{a})$; rather, it is the point that minimizes the maximum Bayes risk error divergence $d(\mathbf{p}\,\|\,\mathbf{a})$. An example of the decision weight as a function of prior probability is shown in Fig. 2.

In this section, we have defined an approach for optimal grouping for group minimax hypothesis testing. This formulation reduces to the two extreme hypothesis testing methodologies: minimax at $K = 1$ and Bayesian as $K \to \infty$.

## IV Optimality Conditions

This section develops necessary conditions for optimality of a quantizer for the probability simplex under the minimax criterion (12) defined in Section III, first in the scalar quantization (binary hypothesis testing) setting and then in the vector quantization ($M$-ary hypothesis testing) setting. We find a centroid condition to locally optimize decision weights when the quantization cells are fixed. Then we find a nearest neighbor condition to locally optimize the quantization cells with decision weights fixed. Locally optimal quantizers can be found by alternately applying the nearest neighbor and centroid conditions through a version of the iterative Lloyd–Max algorithm [12, 14, 29]. We provide direct derivations for the binary case and specialize more general Bregman divergence results for the $M$-ary case.

### IV-A Binary Hypothesis Testing Centroid Condition

The $K$-cell scalar quantizer function in the binary hypothesis testing problem uses the following cell notation. The probability simplex $[0, 1]$ is partitioned into $K$ intervals $\mathcal{Q}_1 = [b_0, b_1]$, $\mathcal{Q}_2 = (b_1, b_2]$, $\mathcal{Q}_3 = (b_2, b_3]$, …, $\mathcal{Q}_K = (b_{K-1}, b_K]$, with $b_0 = 0$ and $b_K = 1$. Since $d(p_0\,\|\,a)$ increases monotonically with the absolute error $|p_0 - a|$, the nearest neighbor cells are convex; consequently each cell must consist of a single interval, cf. [12, Lemma 6.2.1]. Within a fixed scalar quantization cell $\mathcal{Q}_k$ with boundaries $b_{k-1}$ and $b_k$, we want an expression for the optimal decision weight:

$$a_k = \arg\min_{a \in \mathcal{Q}_k}\max_{p_0 \in \mathcal{Q}_k} d(p_0\,\|\,a). \tag{13}$$
###### Theorem 1

In the binary hypothesis testing problem with deterministic likelihood ratio test decision rules, the minimax Bayes risk error divergence optimal decision weight $a_k$ satisfies:

$$J'(a_k) = \frac{J(b_k) - J(b_{k-1})}{b_k - b_{k-1}}. \tag{14}$$
###### Proof:

Let us first focus on the inner maximization in (13). In the binary hypothesis testing case,

$$d(p_0\,\|\,a) = -J(p_0) + J(a) + (p_0 - a)\,J'(a), \tag{15}$$

from which we see that the second derivative of $d(p_0\,\|\,a)$ with respect to $p_0$ is $-J''(p_0)$, which is greater than zero due to the strict concavity of $J$. Thus, $d(p_0\,\|\,a)$ has no local maxima in the interior of $\mathcal{Q}_k$; the maximum occurs at an endpoint: $p_0 = b_{k-1}$ or $p_0 = b_k$. Consequently,

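$$\max_{p_0 \in \mathcal{Q}_k} d(p_0\,\|\,a) = \max\{d(b_k\,\|\,a),\, d(b_{k-1}\,\|\,a)\} \tag{16}$$
$$= \frac{d(b_{k-1}\,\|\,a) + d(b_k\,\|\,a) + |d(b_{k-1}\,\|\,a) - d(b_k\,\|\,a)|}{2}. \tag{17}$$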

Substituting (15) into (17) and simplifying, we find that (16) equals

$$\frac{(b_{k-1} + b_k - 2a)J'(a) - J(b_{k-1}) - J(b_k) + 2J(a)}{2} + \frac{|(b_{k-1} - b_k)J'(a) - J(b_{k-1}) + J(b_k)|}{2}, \tag{18}$$

which is to be minimized with respect to .

Due to the absolute value function, there are two cases to consider:

1. $(b_{k-1} - b_k)J'(a) - J(b_{k-1}) + J(b_k) < 0$ and

2. $(b_{k-1} - b_k)J'(a) - J(b_{k-1}) + J(b_k) \ge 0$.

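Due to the concavity of the Bayes risk function, $J'(a)$ is monotonically decreasing. Therefore, since $b_{k-1} - b_k$ is negative, $(b_{k-1} - b_k)J'(a) - J(b_{k-1}) + J(b_k)$ is a monotonically increasing function of $a$. Consequently the two cases of the absolute value correspond to the intervals $b_{k-1} \le a < a^\dagger$ for case 1 and $a^\dagger \le a \le b_k$ for case 2, where $a^\dagger$ satisfies: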

$$(b_{k-1} - b_k)J'(a^\dagger) - J(b_{k-1}) + J(b_k) = 0. \tag{19}$$

In the first case, (18) simplifies to:

$$(b_k - a)J'(a) + J(a) - J(b_k)$$

with derivative with respect to $a$:

$$(b_k - a)J''(a),$$

which is less than zero because $b_k > a$ and $J''(a) < 0$ due to Bayes risk concavity. Thus the minimization objective is monotonically decreasing in the first case.

In the second case, (18) simplifies to:

$$(b_{k-1} - a)J'(a) + J(a) - J(b_{k-1}),$$

which has derivative with respect to $a$:

$$(b_{k-1} - a)J''(a),$$

which is greater than zero because $b_{k-1} < a$ and $J''(a) < 0$. In the second case, the minimization objective is monotonically increasing.

Since (18) is decreasing over $[b_{k-1}, a^\dagger)$ and increasing over $[a^\dagger, b_k]$, it is minimized at $a = a^\dagger$. Therefore $a_k = a^\dagger$, so the decision weight $a_k$ satisfies (19). This is equivalently the slope matching condition (14) given in the statement of the theorem. ∎

This minimax centroid is a Stolarsky mean [33]; the Stolarsky mean of $u$ and $v$ is, in general,

$$F'^{-1}\!\left(\frac{F(u) - F(v)}{u - v}\right)$$

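for any differentiable function $F$ whose derivative $F'$ is invertible. Taking $F = J$, whose derivative is strictly decreasing, (14) gives $a_k = J'^{-1}\big((J(b_k) - J(b_{k-1}))/(b_k - b_{k-1})\big)$.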
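Because $J'$ is strictly decreasing, the centroid condition (14) can be solved by bisection on $J'$. The sketch below does this for the Gaussian model of Section VI-A with illustrative parameters; the function names are hypothetical, and bisection is simply our choice of root finder for inverting $J'$, not necessarily the authors'. It also confirms the property used in the proof: at the centroid, the divergences at the two cell endpoints are equal.

```python
import math

# Illustrative (assumed) parameters for the Gaussian model of Section VI-A.
MU, SIGMA, C10, C01 = 1.0, 1.0, 1.0, 1.0


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a):
    """(p_E^I(a), p_E^II(a)) for decision weight 0 < a < 1."""
    shift = (SIGMA / MU) * math.log(C10 * a / (C01 * (1.0 - a)))
    return q_func(MU / (2 * SIGMA) + shift), q_func(MU / (2 * SIGMA) - shift)


def bayes_risk(p0):
    """J(p0), zero at the endpoints."""
    if p0 <= 0.0 or p0 >= 1.0:
        return 0.0
    p_I, p_II = error_probs(p0)
    return C10 * p0 * p_I + C01 * (1 - p0) * p_II


def bayes_risk_slope(a):
    """J'(a) = C10 p_E^I(a) - C01 p_E^II(a), strictly decreasing in a."""
    p_I, p_II = error_probs(a)
    return C10 * p_I - C01 * p_II


def divergence(p0, a):
    """d(p0 || a) from (15)."""
    return -bayes_risk(p0) + bayes_risk(a) + (p0 - a) * bayes_risk_slope(a)


def minimax_centroid(b_lo, b_hi, iters=100):
    """Solve J'(a) = (J(b_hi) - J(b_lo)) / (b_hi - b_lo) for a in (b_lo, b_hi), cf. (14)."""
    secant = (bayes_risk(b_hi) - bayes_risk(b_lo)) / (b_hi - b_lo)
    lo, hi = b_lo, b_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bayes_risk_slope(mid) > secant:  # J' still too large: the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a_k = minimax_centroid(0.0, 0.5)
print(a_k, divergence(0.0, a_k), divergence(0.5, a_k))  # endpoint divergences coincide
```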

### IV-B Binary Hypothesis Testing Nearest Neighbor Condition

In the binary hypothesis testing nearest neighbor condition, we are to find the cell boundary $b_k$ given the decision weights $a_k$ and $a_{k+1}$.

###### Theorem 2

In the binary hypothesis testing problem with deterministic likelihood ratio test decision rules, the minimax Bayes risk error divergence optimal cell boundary is:

$$b_k = \frac{a_{k+1}J'(a_{k+1}) - a_k J'(a_k) - \big(J(a_{k+1}) - J(a_k)\big)}{J'(a_{k+1}) - J'(a_k)}. \tag{20}$$
###### Proof:

As discussed in Section IV-A, the maximum Bayes risk error divergence within a cell occurs at the cell boundary. Therefore, we would like to minimize the Bayes risk error divergence at the cell boundary.

Specifically, $b_k$ should be chosen to minimize the maximum of $d(b_k\,\|\,a_k)$ and $d(b_k\,\|\,a_{k+1})$. At a given potential boundary point $b$, the term $-J(b)$ is the same in both $d(b\,\|\,a_k)$ and $d(b\,\|\,a_{k+1})$, so only $J(b, a_k)$ and $J(b, a_{k+1})$ need be considered. Due to the geometry of the problem, $b_k$ should be the abscissa of the point at which the lines $J(p_0, a_k)$ and $J(p_0, a_{k+1})$ intersect. Equating the two lines using (5), we find the point of intersection to be (20). ∎

The cell boundary is the tangent line mean of the decision weights [34]. The nearest neighbor condition for minimax Bayes risk error quantization is the same as that for minimum mean Bayes risk error quantization [10].
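A minimal sketch of the tangent line mean (20), again assuming the Gaussian model of Section VI-A with illustrative parameters and hypothetical helper names. It checks that the boundary lies between the two decision weights and that the Bayes risk error divergences to the two neighboring weights are equal there.

```python
import math

# Illustrative (assumed) parameters for the Gaussian model of Section VI-A.
MU, SIGMA, C10, C01 = 1.0, 1.0, 1.0, 1.0


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a):
    """(p_E^I(a), p_E^II(a)) for decision weight 0 < a < 1."""
    shift = (SIGMA / MU) * math.log(C10 * a / (C01 * (1.0 - a)))
    return q_func(MU / (2 * SIGMA) + shift), q_func(MU / (2 * SIGMA) - shift)


def bayes_risk(p0):
    """J(p0), zero at the endpoints."""
    if p0 <= 0.0 or p0 >= 1.0:
        return 0.0
    p_I, p_II = error_probs(p0)
    return C10 * p0 * p_I + C01 * (1 - p0) * p_II


def bayes_risk_slope(a):
    """J'(a), the slope of the tangent line at a."""
    p_I, p_II = error_probs(a)
    return C10 * p_I - C01 * p_II


def divergence(p0, a):
    """d(p0 || a) from (15)."""
    return -bayes_risk(p0) + bayes_risk(a) + (p0 - a) * bayes_risk_slope(a)


def tangent_line_mean(a_lo, a_hi):
    """Cell boundary (20): abscissa where the tangent lines of J at a_lo and a_hi meet."""
    s_lo, s_hi = bayes_risk_slope(a_lo), bayes_risk_slope(a_hi)
    num = a_hi * s_hi - a_lo * s_lo - (bayes_risk(a_hi) - bayes_risk(a_lo))
    return num / (s_hi - s_lo)


b = tangent_line_mean(0.2, 0.6)
print(b, divergence(b, 0.2), divergence(b, 0.6))  # equal divergences at the boundary
```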

### IV-C M-ary Hypothesis Testing Nearest Neighbor Condition

We found the nearest neighbor condition over the binary simplex, i.e., the line segment between zero and one, in Section IV-B. In that case, the cell boundaries were simply two points on the line. The situation is slightly more complicated notationally in the $M$-ary detection task because of the increased dimensionality. Let us define the $M$-ary probability simplex as follows:

$$\mathcal{P}_M = \left\{\boldsymbol{\pi} \in \mathbb{R}_+^{M-1} \,\middle|\, \sum_{i=1}^{M-1} \pi_i \le 1\right\}. \tag{21}$$

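Now in specifying the nearest neighbor condition, we assume that the decision weights $\mathbf{a}_1, \ldots, \mathbf{a}_K$ are fixed. We denote the set of points in $\mathcal{P}_M$ that are equidistant according to Bayes risk error divergence from $\mathbf{a}_k$ and $\mathbf{a}_{k+1}$ as $\mathcal{B}_{k,k+1}$, such that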

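$$\mathcal{B}_{k,k+1} = \{\boldsymbol{\pi} \in \mathcal{P}_M \mid d(\boldsymbol{\pi}\,\|\,\mathbf{a}_k) = d(\boldsymbol{\pi}\,\|\,\mathbf{a}_{k+1})\}. \tag{22}$$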

We show that this bisector between the two decision weights $\mathbf{a}_k$ and $\mathbf{a}_{k+1}$ is a hyperplane in $\mathcal{P}_M$.

###### Theorem 3

In the $M$-ary hypothesis testing problem with deterministic likelihood ratio test decision rules, the Bayes risk error divergence bisector satisfies the hyperplane equation:

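$$\mathcal{B}_{k,k+1} = \left\{\boldsymbol{\pi} \in \mathcal{P}_M \,\middle|\, \boldsymbol{\pi}^T\big(\nabla J(\mathbf{a}_{k+1}) - \nabla J(\mathbf{a}_k)\big) = \mathbf{a}_{k+1}^T \nabla J(\mathbf{a}_{k+1}) - \mathbf{a}_k^T \nabla J(\mathbf{a}_k) - \big(J(\mathbf{a}_{k+1}) - J(\mathbf{a}_k)\big)\right\}. \tag{23}$$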
###### Proof:

The result follows by specializing [29, Lemma 4], which applies to all Bregman divergences, to Bayes risk error divergence. ∎

It is easy to see that we recover the binary boundary expression (20) for $b_k$ when we set $M = 2$ in (23).
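Theorem 3 uses only the Bregman structure of $d$, so the hyperplane (23) can be checked with any strictly concave stand-in for $J$. Purely as an assumption for illustration, the sketch below uses the Shannon entropy on the ternary simplex ($M = 3$, coordinates as in (21)) in place of a true Bayes risk; the function names and the two decision weights are hypothetical. It computes the normal vector and offset of (23), picks a point on the hyperplane, and confirms that the point is equidistant from both weights.

```python
import math


def stand_in_risk(pi):
    """Strictly concave stand-in for J on P_3 (Shannon entropy; NOT a Bayes risk)."""
    full = [pi[0], pi[1], 1.0 - pi[0] - pi[1]]
    return -sum(x * math.log(x) for x in full if x > 0)


def stand_in_grad(pi):
    """Gradient w.r.t. the free coordinates (pi_1, pi_2): ln(pi_3 / pi_i)."""
    pi3 = 1.0 - pi[0] - pi[1]
    return [math.log(pi3 / pi[0]), math.log(pi3 / pi[1])]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def divergence(pi, a):
    """d(pi || a) = -J(pi) + J(a) + (pi - a)^T grad J(a), cf. (11)."""
    diff = [x - y for x, y in zip(pi, a)]
    return -stand_in_risk(pi) + stand_in_risk(a) + dot(diff, stand_in_grad(a))


def bisector(a_k, a_next):
    """Normal vector and offset of the hyperplane (23): normal^T pi = offset."""
    g_k, g_next = stand_in_grad(a_k), stand_in_grad(a_next)
    normal = [x - y for x, y in zip(g_next, g_k)]
    offset = dot(a_next, g_next) - dot(a_k, g_k) - (stand_in_risk(a_next) - stand_in_risk(a_k))
    return normal, offset


a_k, a_next = [0.2, 0.3], [0.4, 0.4]
normal, offset = bisector(a_k, a_next)
pi1 = 0.3  # choose pi_1 and solve the hyperplane equation for pi_2
pi = [pi1, (offset - normal[0] * pi1) / normal[1]]
print(pi, divergence(pi, a_k), divergence(pi, a_next))  # equal divergences
```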

In the binary case, the boundary point bisectors fully specify the quantization cells $\mathcal{Q}_k$, but in the $M$-ary case, we must go one step further. In particular, the quantization cells are defined as follows:

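$$\mathcal{Q}_k = \{\boldsymbol{\pi} \in \mathcal{P}_M \mid d(\boldsymbol{\pi}\,\|\,\mathbf{a}_k) \le d(\boldsymbol{\pi}\,\|\,\mathbf{a}_{k'}),\ k' \ne k\}. \tag{24}$$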

Moreover, as discussed in [29], the cell $\mathcal{Q}_k$ is a convex polyhedron, which is delineated by the intersection of the bisectors between its decision weight and all other cell decision weights. The set of all minimax quantization cells is the Bregman Voronoi diagram of the simplex with the set of fixed decision weights as seeds. If we write the half space induced by $\mathcal{B}_{k,k'}$ that contains $\mathbf{a}_k$, restricted to $\mathcal{P}_M$, as $\mathcal{H}_{k,k'}$, then

$$\mathcal{Q}_k = \bigcap_{k' \ne k} \mathcal{H}_{k,k'}. \tag{25}$$

Let $v_k$ be the number of vertices of cell $\mathcal{Q}_k$. Each $\mathcal{Q}_k$ has at most $K - 1 + M$ faces (at most $K - 1$ from bisectors with the other cells, and at most $M$ from intersections with the boundary of the simplex), and hence finitely many vertices, i.e., $v_k < \infty$. Moreover, in the same way that the maximum Bayes risk error divergence within a cell occurs at the cell boundary in the binary case, the maximum Bayes risk error divergence occurs at one of the finitely many vertices in the $M$-ary case [29, Lemma 12].
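A minimal sketch of the nearest neighbor rule (24): each prior is assigned to the cell whose decision weight is closest in Bayes risk error divergence. As an assumption for illustration, it again uses the Shannon entropy on the ternary simplex as a stand-in for $J$ (not a Bayes risk), with three hypothetical decision weights; the function names are hypothetical as well.

```python
import math


def stand_in_risk(pi):
    """Strictly concave stand-in for J on P_3 (Shannon entropy; NOT a Bayes risk)."""
    full = [pi[0], pi[1], 1.0 - pi[0] - pi[1]]
    return -sum(x * math.log(x) for x in full if x > 0)


def stand_in_grad(pi):
    """Gradient w.r.t. (pi_1, pi_2); requires pi in the interior of P_3."""
    pi3 = 1.0 - pi[0] - pi[1]
    return [math.log(pi3 / pi[0]), math.log(pi3 / pi[1])]


def divergence(pi, a):
    """d(pi || a) = -J(pi) + J(a) + (pi - a)^T grad J(a); a must be interior."""
    g = stand_in_grad(a)
    return (-stand_in_risk(pi) + stand_in_risk(a)
            + sum((x - y) * gi for x, y, gi in zip(pi, a, g)))


def assign_cell(pi, weights):
    """Nearest neighbor condition (24): index k minimizing d(pi || a_k)."""
    return min(range(len(weights)), key=lambda k: divergence(pi, weights[k]))


weights = [[0.6, 0.2], [0.2, 0.6], [0.2, 0.2]]  # hypothetical decision weights a_1, a_2, a_3
for pi in ([0.9, 0.05], [0.05, 0.9], [0.0, 0.0], [0.25, 0.25]):
    print(pi, assign_cell(pi, weights))
```

Each seed lies in its own cell, and priors near a corner of the simplex are assigned to the decision weight that puts the most mass on that corner.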

### IV-D M-ary Hypothesis Testing Centroid Condition

In Section IV-A, we found the minimax Bayes risk error centroid condition in the binary detection case through an explicit calculation that made use of convexity properties of the Bayes risk function. Here we find the centroid condition in general for $M$-ary detection by adapting the minimax centroid results for general Bregman divergences found in [28].

In deriving the centroid condition, the cell $\mathcal{Q}_k$ and its vertices are fixed. Since we know the maximum divergence occurs at a vertex, we only examine the vertices of $\mathcal{Q}_k$ in order to find the minimax-optimal decision weight within the cell. Let the vertices of $\mathcal{Q}_k$ be denoted $\mathbf{b}_{k,1}, \ldots, \mathbf{b}_{k,v_k}$. The optimal decision weight is a functional mean of the vertices.

###### Theorem 4

In the $M$-ary hypothesis testing problem with deterministic likelihood ratio test decision rules, the minimax Bayes risk error divergence optimal decision weight $\mathbf{a}_k$ satisfies:

$$\nabla J(\mathbf{a}_k) = \sum_{i=1}^{v_k} w_i \nabla J(\mathbf{b}_{k,i}), \tag{26}$$

where the weights $w_i$ satisfy $w_i \ge 0$ and $\sum_{i=1}^{v_k} w_i = 1$.

Putting all of the $w_i$ into a vector $\mathbf{w}$, the optimal weight vector is the solution to the following optimization problem:

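$$\max_{\mathbf{w}} \sum_{i=1}^{v_k} w_i\, d\!\left((\nabla J)^{-1}\!\left(\sum_{j=1}^{v_k} w_j \nabla J(\mathbf{b}_{k,j})\right) \,\middle\|\, \mathbf{b}_{k,i}\right) \tag{27}$$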

subject to the same constraints $w_i \ge 0$ and $\sum_{i=1}^{v_k} w_i = 1$.

###### Proof:

The result follows by specializing [28, Section 3], which applies to all Bregman divergences, to Bayes risk error divergence. ∎

The optimization problem (27) is similar to that solved in learning support vector machines [28]. The vertices $\mathbf{b}_{k,i}$ found to have nonzero weights $w_i$ are ‘support’ vertices that contribute to the location of the decision weight [28].

We note that the centroid condition in the binary case (14) can be expressed as $J'(a_k) = w_1 J'(b_{k-1}) + w_2 J'(b_k)$, with $w_1, w_2 \ge 0$ and $w_1 + w_2 = 1$, due to the concavity of the Bayes risk and the intermediate value theorem of calculus. In this form, we see the correspondence to (26). In contrast to the $M$-ary case, the binary case admits a closed-form expression for the decision weight that does not require solving an optimization program.
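Writing the binary centroid condition as $J'(a_k) = w_1 J'(b_{k-1}) + w_2 J'(b_k)$ with $w_1 + w_2 = 1$, the two weights can be solved for directly. This short derivation is ours and uses only (14) and the monotonicity of $J'$:

$$w_1 = \frac{J'(a_k) - J'(b_k)}{J'(b_{k-1}) - J'(b_k)}, \qquad w_2 = 1 - w_1 = \frac{J'(b_{k-1}) - J'(a_k)}{J'(b_{k-1}) - J'(b_k)}.$$

Since $J'$ is strictly decreasing and $b_{k-1} < a_k < b_k$, we have $J'(b_k) < J'(a_k) < J'(b_{k-1})$, so both weights lie in $(0, 1)$.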

## V Rate–Distortion Analysis

To understand how quickly or slowly group minimax hypothesis testing approaches the performance of Bayesian hypothesis testing, in this section we examine the maximum achieved distortion as a function of the number of groups $K$. Let us denote the overall minimax distortion as:

$$D = \min_{q_K}\max_{\mathbf{p}} d(\mathbf{p}\,\|\,q_K(\mathbf{p})). \tag{28}$$
###### Theorem 5

In the $M$-ary hypothesis testing problem with deterministic likelihood ratio test decision rules, the maximum Bayes risk error $D$ of the minimax-optimal quantization with $K$ groups satisfies the rate–distortion expression:

$$K = O\!\left(\frac{1}{(M-1)!\, D^{(M-1)/2}}\right) \tag{29}$$
###### Proof:

The result follows from the fact that the volume of the probability simplex in the $M$-ary detection problem is $1/(M-1)!$ and by specializing the results on $\epsilon$-nets for general Bregman divergences given in [29, Lemma 14] to Bayes risk error divergence. ∎

The convergence from the edge case of minimax hypothesis testing to the other edge case of Bayesian hypothesis testing is in proportion to $K^{-2}$ in the binary hypothesis testing case, which is the same scaling seen in the mean Bayes risk error case presented in [10]. A similar scaling is also noted for detectors based on estimated prior probabilities [35]. The minimax error scaling can be viewed as the asymptotic behavior of the minimum covering radius with respect to Bayes risk error divergence. Note that all Bregman divergences, including squared error, yield the same scaling behavior for $D$ as a function of $K$. This implies that grouping by an incorrect Bregman fidelity criterion incurs a constant asymptotic rate loss.
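To read off the slopes expected on log-log plots of $D$ versus $K$, suppose for illustration that (29) holds with equality up to an unspecified constant $C > 0$, i.e., $K \approx C / \big((M-1)!\, D^{(M-1)/2}\big)$. Solving for $D$ gives

$$D \approx \left(\frac{C}{(M-1)!\,K}\right)^{2/(M-1)}, \qquad \log D \approx \frac{2}{M-1}\log\frac{C}{(M-1)!} - \frac{2}{M-1}\log K,$$

so $\log D$ decreases linearly in $\log K$ with slope $-2$ when $M = 2$ and slope $-1$ when $M = 3$.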

## VI Examples

We present two signal detection problem examples approached through group minimax hypothesis testing with optimal grouping. The first example is the typical example of detecting a signal through Gaussian noise. The second example is a ternary hypothesis testing problem with three different exponential likelihoods.

### VI-A Detecting Signals in Gaussian Noise

Let us consider the following signal and measurement model:

$$Y = s_m + W, \quad m \in \{0, 1\}, \tag{30}$$

where $s_0 = 0$ and $s_1 = \mu$, and $W$ is a zero-mean Gaussian random variable with variance $\sigma^2$. The parameters $\mu$ and $\sigma$ are known, deterministic quantities. The error probabilities for this signal model are:

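$$p_E^{\mathrm{I}}(a) = Q\!\left(\frac{\mu}{2\sigma} + \frac{\sigma}{\mu}\ln\frac{c_{10}\,a}{c_{01}(1-a)}\right), \quad\text{and}\quad p_E^{\mathrm{II}}(a) = Q\!\left(\frac{\mu}{2\sigma} - \frac{\sigma}{\mu}\ln\frac{c_{10}\,a}{c_{01}(1-a)}\right),$$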

where

$$Q(\alpha) = \frac{1}{\sqrt{2\pi}}\int_{\alpha}^{\infty} e^{-x^2/2}\,dx.$$

These error probabilities can be put together to obtain the mismatched Bayes risk for this detection task:

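$$J(p_0, a) = c_{10}\,p_0\,Q\!\left(\frac{\mu}{2\sigma} + \frac{\sigma}{\mu}\ln\frac{c_{10}\,a}{c_{01}(1-a)}\right) + c_{01}(1-p_0)\,Q\!\left(\frac{\mu}{2\sigma} - \frac{\sigma}{\mu}\ln\frac{c_{10}\,a}{c_{01}(1-a)}\right). \tag{31}$$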

We use the Lloyd–Max algorithm to design quantizers for the proposed group minimax criterion using the centroid and nearest neighbor conditions derived in Section IV: equations (14) and (20). We show such quantizers for a fixed number of groups $K$ and different ratios of the Bayes costs $c_{10}$ and $c_{01}$, along with different ratios of $\mu$ and $\sigma$. As a point of comparison, we also show the optimal quantizers designed to minimize mean Bayes risk error divergence [10], rather than minimize maximum Bayes risk error divergence.
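The alternating design can be sketched as follows for the Gaussian model (31). The parameter values, the uniform initialization, the fixed iteration counts, and all function names are assumptions for illustration; bisection inverts $J'$ in the centroid step (14), and (20) gives each boundary in closed form. Like any Lloyd–Max iteration, this can only be expected to find a locally optimal quantizer.

```python
import math

# Illustrative (assumed) parameters for the Gaussian model (31).
MU, SIGMA, C10, C01 = 1.0, 1.0, 1.0, 1.0


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a):
    """(p_E^I(a), p_E^II(a)) from Section VI-A, for 0 < a < 1."""
    shift = (SIGMA / MU) * math.log(C10 * a / (C01 * (1.0 - a)))
    return q_func(MU / (2 * SIGMA) + shift), q_func(MU / (2 * SIGMA) - shift)


def J(p0):
    """Bayes risk J(p0) = J(p0, p0) from (31); zero at the endpoints."""
    if p0 <= 0.0 or p0 >= 1.0:
        return 0.0
    p_I, p_II = error_probs(p0)
    return C10 * p0 * p_I + C01 * (1 - p0) * p_II


def dJ(a):
    """J'(a), the slope of the tangent line J(., a)."""
    p_I, p_II = error_probs(a)
    return C10 * p_I - C01 * p_II


def d(p0, a):
    """Bayes risk error divergence (15)."""
    return -J(p0) + J(a) + (p0 - a) * dJ(a)


def centroid(lo, hi, iters=100):
    """Minimax centroid (14) of the cell [lo, hi], by bisection on the decreasing J'."""
    target = (J(hi) - J(lo)) / (hi - lo)
    left, right = lo, hi
    for _ in range(iters):
        mid = 0.5 * (left + right)
        left, right = (mid, right) if dJ(mid) > target else (left, mid)
    return 0.5 * (left + right)


def boundary(a_lo, a_hi):
    """Nearest neighbor boundary (20): intersection of the tangent lines at a_lo and a_hi."""
    num = a_hi * dJ(a_hi) - a_lo * dJ(a_lo) - (J(a_hi) - J(a_lo))
    return num / (dJ(a_hi) - dJ(a_lo))


def lloyd_max(K, iters=200):
    """Alternate (14) and (20); return boundaries b_0..b_K, weights a_1..a_K, max distortion."""
    b = [k / K for k in range(K + 1)]  # uniform initial cells
    for _ in range(iters):
        a = [centroid(b[k], b[k + 1]) for k in range(K)]
        b = [0.0] + [boundary(a[k], a[k + 1]) for k in range(K - 1)] + [1.0]
    a = [centroid(b[k], b[k + 1]) for k in range(K)]  # weights consistent with final cells
    # Within each cell, the maximum of d is attained at an endpoint (Section IV-A).
    D = max(max(d(b[k], a[k]), d(b[k + 1], a[k])) for k in range(K))
    return b, a, D


for K in (1, 2, 4, 8):
    print(K, lloyd_max(K)[2])
```

For $K = 1$ the design reduces to the minimax test with $a_1 = 1/2$; for larger $K$, plotting the printed distortions against $K$ on logarithmic axes is one way to compare with the slope predicted in Section V.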

Fig. 3 shows quantizers for equal Bayes costs and equal mean and standard deviation.

In the plots, the black curve is $J(p_0)$ and the dashed line is $J(p_0, q_K(p_0))$, with their difference being $d(p_0\,\|\,q_K(p_0))$. The circle markers are the representation points and the vertical lines indicate the interval boundaries of the groups. The divergence value is shown in Fig. 4.

The minimax groups and representation points are more clustered in the middle of the probability simplex and around the peak of $J(p_0)$ than the minimum mean groups and representation points. This is more apparent in the quantizers for the noisier measurement model shown in Fig. 5 and in the quantizers for unequal Bayes costs $c_{10} \ne c_{01}$ shown in Fig. 7. The divergence values for these other two cases are shown in Fig. 6 and Fig. 8.

The minimax Bayes risk error as a function of $K$ for this example is shown in Fig. 9 on both linear and logarithmic scales.

The curves seen in Fig. 9(b) closely reflect the behavior expected according to the rate–distortion analysis of Section V. They are almost perfectly linear beyond the first few small values of $K$. The slopes of the lines are $-2$, which is the rate predicted for $M = 2$.

### VI-B Distinguishing Exponential Likelihoods

In this example, we consider objects in a queuing system that are served at varying rates. Objects are served at rate $\lambda_m$ when in state $m$ for $m \in \{0, 1, 2\}$, with $\lambda_0 > \lambda_1 > \lambda_2$. The ternary hypothesis testing task is to determine which state the object is in based on an observation of the time at which it is served. The likelihood functions take the form:

$$f_{Y|H}(y|H=h_m) = \lambda_m e^{-\lambda_m y}. \tag{32}$$

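For simplicity, we only consider the case in which correct decisions incur zero cost and every error incurs unit cost, i.e., $c_{ii} = 0$ and $c_{ij} = 1$ for $i \ne j$.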

Recall that we denote the prior probabilities $p_0$ and $p_1$ through the vector $\mathbf{p}$ and the decision weights $a_0$ and $a_1$ through the vector $\mathbf{a}$ (where $p_2 = 1 - p_0 - p_1$ and $a_2 = 1 - a_0 - a_1$). In this example, if we define the following two functions of the decision weights:

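$$\gamma_{01}(\mathbf{a}) = \max\left\{0,\ \frac{1}{\lambda_0 - \lambda_1}\ln\left(\frac{a_0\lambda_0}{a_1\lambda_1}\right)\right\}, \tag{33}$$
$$\gamma_{12}(\mathbf{a}) = \max\left\{0,\ \frac{1}{\lambda_1 - \lambda_2}\ln\left(\frac{a_1\lambda_1}{(1 - a_0 - a_1)\lambda_2}\right)\right\}, \tag{34}$$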

then the mismatched Bayes risk function is:

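$$J(\mathbf{p}, \mathbf{a}) = p_0 e^{-\lambda_0\gamma_{01}(\mathbf{a})} + p_1\left(1 - e^{-\lambda_1\gamma_{01}(\mathbf{a})} + e^{-\lambda_1\gamma_{12}(\mathbf{a})}\right) + (1 - p_0 - p_1)\left(1 - e^{-\lambda_2\gamma_{12}(\mathbf{a})}\right). \tag{35}$$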

We calculate the gradient in closed form, but omit it here because of its unwieldy nature.
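The mismatched risk (35) is straightforward to evaluate numerically. The sketch below assumes hypothetical rates $\lambda_0 = 3$, $\lambda_1 = 2$, $\lambda_2 = 1$, chosen only for illustration, and hypothetical function names. Our reading of (35) is that it presumes ordered decision regions ($h_0$ for $y < \gamma_{01}$, $h_1$ for $\gamma_{01} \le y < \gamma_{12}$, $h_2$ otherwise), so it is evaluated only at weights with $\gamma_{01}(\mathbf{a}) \le \gamma_{12}(\mathbf{a})$.

```python
import math

LAM = (3.0, 2.0, 1.0)  # hypothetical service rates lambda_0 > lambda_1 > lambda_2


def thresholds(a0, a1):
    """gamma_01(a) and gamma_12(a) from (33) and (34), with a2 = 1 - a0 - a1."""
    l0, l1, l2 = LAM
    a2 = 1.0 - a0 - a1
    g01 = max(0.0, math.log(a0 * l0 / (a1 * l1)) / (l0 - l1))
    g12 = max(0.0, math.log(a1 * l1 / (a2 * l2)) / (l1 - l2))
    return g01, g12


def mismatched_risk(p0, p1, a0, a1):
    """J(p, a) from (35): decide h0 if y < gamma_01, h1 if gamma_01 <= y < gamma_12, else h2."""
    l0, l1, l2 = LAM
    g01, g12 = thresholds(a0, a1)
    return (p0 * math.exp(-l0 * g01)
            + p1 * (1.0 - math.exp(-l1 * g01) + math.exp(-l1 * g12))
            + (1.0 - p0 - p1) * (1.0 - math.exp(-l2 * g12)))


def bayes_risk(p0, p1):
    """J(p) = J(p, p): the decision weights match the true priors."""
    return mismatched_risk(p0, p1, p0, p1)


p = (1 / 3, 1 / 3)
print(bayes_risk(*p), mismatched_risk(*p, 0.5, 0.3))  # mismatch cannot lower the risk
```

With uniform priors, the matched risk is $(8/27 + 5/9 + 1/4 + 1/2)/3 \approx 0.534$, and using the mismatched weights $(0.5, 0.3, 0.2)$ increases it.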

We now examine optimal groupings for group minimax hypothesis testing in the ternary exponential service time example for a particular choice of $\lambda_0$, $\lambda_1$, and $\lambda_2$. The concave Bayes risk function defined over the probability simplex is shown via shading in Fig. 10.

The Bayes risk function is zero at all three corners and along the axis. We apply the alternating nearest neighbor condition and centroid condition of the Lloyd–Max algorithm to find the optimal groupings in the case. Fig. 11 is a plot of the groups and representation points that are found.

In Fig. 12, we show the minimax error for this example as a function of $K$ in the logarithm-transformed domain. As expected for $M = 3$, the function is approximately linear with slope $-1$.

## VII Conclusion

The group minimax test, an intermediate formulation between the Bayesian and minimax tests that takes advantage of set-structured, incomplete advance knowledge of priors, was proposed long ago by the early decision theorists. However, results in the literature were obtained under special circumstances and when the sets were predetermined. In this work, we approach group minimax through the emergent theory of quantizing with Bregman divergences and make statements about optimal representative priors that do not rely on any special likelihood functions. By optimizing the minimax Bayes risk error divergence, we obtain a closed-form Stolarsky mean expression for the optimal representative prior within a group in the binary case. In the $M$-ary case, we present a support vector machine-like program to be solved.

In descriptions of group minimax or $\Gamma$-minimax in the literature, no heed is given to determining the best groups to maximize detection performance. We solve this problem jointly with finding representative priors within groups through an alternating minimization involving Bregman centroids and Bregman bisectors. The optimal groupings are delineated by a Voronoi diagram or $\epsilon$-net of the space of prior probabilities. We give closed-form expressions for the polyhedral group boundaries. Moreover, in a rate–distortion format, we characterize the rate at which detection performance of group minimax approaches Bayesian detection as the number of optimal groups increases.

The research described in this paper is for single decision makers. Distributed detection with multiple agents working as a team [36, 37, 38] or with conflicts [39, 40] can also be considered. Additionally, regret theory is closely connected with minimax hypothesis testing [41, 42]; extensions of this paper within the confines of regret theory may be explored.

## Acknowledgment

The authors thank Joong Bum Rhim for discussions.

## References

• [1] K. R. Varshney and L. R. Varshney, “Multilevel minimax hypothesis testing,” in Proc. IEEE Int. Workshop Statist. Signal Process., Nice, France, Jun. 2011, pp. 109–112.
• [2] H. L. Van Trees, Detection, Estimation, and Modulation Theory.   New York, NY: Wiley, 1968.
• [3] H. Robbins, “Asymptotically subminimax solutions of compound statistical decision problems,” in Proc. Second Berkeley Symp. Math. Stat. Prob., Berkeley, CA, Jul.–Aug. 1950, pp. 131–148.
• [4] I. J. Good, “Rational decisions,” J. Roy. Stat. Soc. B Met., vol. 14, no. 1, pp. 107–114, 1952.
• [5] L. J. Savage, The Foundations of Statistics.   New York, NY: Wiley, 1954.
• [6] H. Robbins, “The empirical Bayes approach to statistical decision problems,” Ann. Math. Stat., vol. 35, no. 1, pp. 1–20, Mar. 1964.
• [7] J. R. Blum and J. Rosenblatt, “On partial a priori information in statistical inference,” Ann. Math. Stat., vol. 38, no. 6, pp. 1671–1678, Dec. 1967.
• [8] B. Vidakovic, “Γ-minimax: A paradigm for conservative robust Bayesians,” in Robust Bayesian Analysis, D. Ríos Insua and F. Ruggeri, Eds.   New York, NY: Springer, 2000, pp. 241–259.
• [9] F. Ruggeri, “Gamma-minimax inference,” in Encyclopedia of Statistical Sciences, S. Kotz, C. B. Read, N. Balakrishnan, and B. Vidakovic, Eds.   Hoboken, NJ: Wiley, 2006.
• [10] K. R. Varshney and L. R. Varshney, “Quantization of prior probabilities for hypothesis testing,” IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4553–4562, Oct. 2008.
• [11] K. R. Varshney, “Bayes risk error is a Bregman divergence,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4470–4472, Sep. 2011.
• [12] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression.   Boston, MA: Kluwer Academic Publishers, 1992.
• [13] R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2325–2383, Oct. 1998.
• [14] S. Graf and H. Luschgy, Foundations of Quantization for Probability Distributions.   Berlin: Springer-Verlag, 2000.
• [15] M. Wolfenson and T. L. Fine, “Bayes-like decision making with upper and lower probabilities,” J. Am. Stat. Assoc., vol. 77, no. 377, pp. 80–88, Mar. 1982.
• [16] C. Hildreth, “Bayesian statisticians and remote clients,” Econometrica, vol. 31, no. 3, pp. 422–438, Jul. 1963.
• [17] J. O. Berger, “The robust Bayesian viewpoint,” Purdue Univ., Tech. Rep. 82-9, Apr. 1982.
• [18] L. R. Pericchi and P. Walley, “Robust Bayesian credible intervals and prior ignorance,” Int. Stat. Rev., vol. 59, no. 1, pp. 1–23, Apr. 1991.
• [19] D. Bertsimas, D. B. Brown, and C. Caramanis, “Theory and applications of robust optimization,” SIAM Rev., vol. 53, no. 3, pp. 464–501, Jul. 2011.
• [20] L. R. Varshney, J. B. Rhim, K. R. Varshney, and V. K. Goyal, “Categorical decision making by people, committees, and crowds,” in Proc. Inf. Theory Appl. Workshop, La Jolla, CA, Feb. 2011.
• [21] A. Banerjee, X. Guo, and H. Wang, “On the optimality of conditional expectation as a Bregman predictor,” IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2664–2669, Jul. 2005.
• [22] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences,” J. Mach. Learn. Res., vol. 6, pp. 1705–1749, Oct. 2005.
• [23] C. Zhu and Y. Hua, “Image vector quantization with minimax distortion,” IEEE Signal Process. Lett., vol. 6, no. 2, pp. 25–27, Feb. 1999.
• [24] N. Sarshar and X. Wu, “Minimax multiresolution scalar quantization,” in Proc. Data Compression Conf., Snowbird, UT, Mar. 2004, pp. 52–61.
• [25] P. Venkitasubramaniam, L. Tong, and A. Swami, “Minimax quantization for distributed maximum likelihood estimation,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Toulouse, France, May 2006, pp. III–652–III–655.
• [26] Y. A. Reznik, “An algorithm for quantization of discrete probability distributions,” in Proc. Data Compression Conf., Snowbird, UT, Mar. 2011, pp. 333–342.
• [27] A. N. Kolmogorov and V. M. Tihomirov, “ε-entropy and ε-capacity of sets in functional spaces,” Am. Math. Soc. Translations Series 2, vol. 17, pp. 277–364, 1961.
• [28] R. Nock and F. Nielsen, “Fitting the smallest enclosing Bregman balls,” in Proc. Eur. Conf. Mach. Learn., Porto, Portugal, Oct. 2005, pp. 649–656.
• [29] F. Nielsen, J.-D. Boissonnat, and R. Nock, “Bregman Voronoi diagrams: Properties, algorithms and applications,” INRIA, Sophia-Antipolis, France, Tech. Rep. 6154, Mar. 2007.
• [30] J.-D. Boissonnat, F. Nielsen, and R. Nock, “Bregman Voronoi diagrams,” Discrete Comput. Geom., vol. 44, no. 2, pp. 281–307, Sep. 2010.
• [31] D. L. Donoho, M. Vetterli, R. A. DeVore, and I. Daubechies, “Data compression and harmonic analysis,” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2435–2476, Oct. 1998.
• [32] R. A. Wijsman, “Continuity of the Bayes risk,” Ann. Math. Stat., vol. 41, no. 3, pp. 1083–1085, Jun. 1970.
• [33] K. B. Stolarsky, “Generalizations of the logarithmic mean,” Math. Mag., vol. 48, no. 2, pp. 87–92, Mar. 1975.
• [34] B. C. Dietel and R. A. Gordon, “Using tangent lines to define means,” Math. Mag., vol. 76, no. 1, pp. 52–61, Feb. 2003.
• [35] J. Jiao, L. Zhang, and R. D. Nowak, “Minimax-optimal bounds for detectors based on estimated prior probabilities,” IEEE Trans. Inf. Theory, vol. 58, no. 9, pp. 6101–6109, Sep. 2012.
• [36] J. B. Rhim, L. R. Varshney, and V. K. Goyal, “Quantization of prior probabilities for collaborative distributed hypothesis testing,” IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4537–4550, Sep. 2012.
• [37] ——, “Distributed decision making by categorically-thinking agents,” in Decision Making and Imperfection, T. V. Guy, M. Kárný, and D. H. Wolpert, Eds.   Heidelberg, Germany: Springer, 2013, pp. 37–63.
• [38] G. Gül and A. M. Zoubir, “Robust distributed detection,” Available at http://arxiv.org/pdf/1306.3618, Jun. 2013.
• [39] P. D. Grünwald and A. P. Dawid, “Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory,” Ann. Stat., vol. 32, no. 4, pp. 1367–1433, Aug. 2004.
• [40] J. B. Rhim, L. R. Varshney, and V. K. Goyal, “Conflict in distributed hypothesis testing with quantized prior probabilities,” in Proc. Data Compression Conf., Snowbird, UT, Mar. 2011, pp. 313–322.
• [41] G. Loomes and R. Sugden, “Regret theory: An alternative theory of rational choice under uncertainty,” Econ. J., vol. 92, no. 368, pp. 805–824, Dec. 1982.
• [42] Y. C. Eldar, A. Ben-Tal, and A. Nemirovski, “Linear minimax regret estimation of deterministic parameters with bounded data uncertainties,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2177–2188, Aug. 2004.