Efficient and Near-Optimal Noisy Group Testing: An Information-Theoretic Framework


Jonathan Scarlett and Volkan Cevher
Abstract

The group testing problem consists of determining a small set of defective items from a larger set of items based on a number of tests, and is relevant in applications such as medical testing, communication protocols, pattern matching, and more. In this paper, we revisit an efficient algorithm for noisy group testing in which each item is decoded separately (Malyutov and Mateev, 1980), and develop novel performance guarantees via an information-theoretic framework for general noise models. For the special cases of no noise and symmetric noise, we find that the asymptotic number of tests required for vanishing error probability is within a factor of the information-theoretic optimum at low sparsity levels, and that with a small number of allowed errors, this guarantee extends to all sublinear sparsity levels. In addition, we provide a converse bound showing that if one tries to move slightly beyond our low-sparsity achievability threshold using separate decoding of items and i.i.d. randomized testing, the average number of items decoded incorrectly approaches that of a trivial decoder.


I Introduction

The group testing problem consists of determining a small subset of “defective” items within a larger set of items based on a number of tests. This problem has a history in medical testing [1], and has regained significant attention with applications in areas such as communication protocols [2], pattern matching [3], and database systems [4], as well as new connections with compressive sensing [5, 6]. In the noiseless setting, each test takes the form

(1)

where the test vector indicates which items are included in the test, and is the resulting observation. That is, the output indicates whether at least one defective item was included in the test. One wishes to minimize the total number of tests while still ensuring the reliable recovery of . We focus on the non-adaptive setting, in which all tests must be designed in advance. The corresponding test vectors are represented by the matrix .

Following both classical works [7, 8, 9, 10] and recent advances [11, 12, 13], the information-theoretic performance limits of group testing have become increasingly well-understood, and several practical near-optimal algorithms for the noiseless setting have been developed. In contrast, practical algorithms for noisy settings have generally remained less well-understood, with the best known theoretical guarantees usually being far from the information-theoretic limits (though sometimes matching in scaling laws) [14, 15, 16].

A notable exception to these limitations is the technique of Malyutov and Mateev [8] based on separate decoding of items (referred to as separate testing of inputs in more recent works [17, 18], though there the word “testing” refers to a hypothesis test performed at the decoder, as opposed to testing in the sense of designing the test matrix), in which each given item is decoded based only on the -th column of , along with (see Section I-B). This approach is computationally efficient, and was also proved to come with strong theoretical guarantees in the case that [8, 17] (see Section I-C).

In this paper, we develop a theoretical framework for understanding separate decoding of items, and move beyond the work of [8] in several important directions: (i) We consider the general case of , thus handling much more general scenarios corresponding to “denser” settings, and leading to non-trivial challenges in the theoretical analysis; (ii) We consider not only exact recovery, but also partial recovery, often leading to much milder requirements on the number of tests; (iii) We provide a novel converse bound revealing that under separate decoding of items, our achievability bounds cannot be improved in several cases of interest.

Before discussing the previous work and our contributions in more detail, we formally state the setup.

I-A Problem Setup

We let the defective set be uniform on the subsets of of cardinality . For convenience, we will sometimes equivalently refer to a vector whose -th entry indicates whether or not item is defective:

(2)

We consider i.i.d. Bernoulli testing, where each item is placed in a given test independently with probability for some constant . The vector of observations is denoted by , and the corresponding measurement matrix (each row of which contains a single measurement vector ) is denoted by . Denoting the -th entry of by and the -th row of by , the measurement model is given by

(3)

where denotes the number of defective items in the test. That is, we consider arbitrary noise distributions for which depends on only through , with conditional independence among the tests . For each item , the -th column of is written as .

While most of our results will be written in terms of general noise models of the form (3), we also pay particular attention to two specific models: The noiseless model in (1), and the symmetric noise model with parameter :

(4)

where , and denotes modulo-2 addition.
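To make the measurement models concrete, the following is a minimal simulation sketch of the Bernoulli test design together with the noiseless model (1) and the symmetric noise model (4). The symbol names (p items, k defectives, participation probability nu/k, flip probability rho) and the parameter values are our own illustrative choices, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_tests(p, k, n, nu=np.log(2), rho=0.0):
    """Simulate i.i.d. Bernoulli(nu/k) testing.

    Returns the n x p test matrix X, the defective set S, and the observations Y.
    rho = 0 gives the noiseless OR model (1); rho > 0 flips each outcome
    independently with probability rho, i.e. the symmetric noise model (4).
    """
    S = rng.choice(p, size=k, replace=False)        # uniformly random defective set
    X = rng.random((n, p)) < nu / k                 # each item enters each test w.p. nu/k
    Y_clean = X[:, S].any(axis=1)                   # positive iff >= 1 defective item included
    Y = Y_clean ^ (rng.random(n) < rho)             # independent flips (symmetric noise)
    return X, set(S.tolist()), Y.astype(int)

# Example: p = 1000 items, k = 10 defectives, n = 200 tests, 11% flip probability.
X, S, Y = generate_tests(p=1000, k=10, n=200, rho=0.11)
print(len(S), Y.mean())   # with nu = ln 2, roughly half of the tests are positive
```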

In the general case (i.e., not necessarily using separate decoding of items), given and , a decoder forms an estimate of , or equivalently, an estimate of . We consider two related performance measures. In the case of exact recovery, the error probability is given by

(5)

and is taken over the realizations of , , and (the decoder is assumed to be deterministic). In addition, we consider a less stringent performance criterion in which we allow for up to false positives and false negatives, yielding an error probability of

(6)

In some cases (particularly for the converse) it will be convenient to consider yet another criterion in which we only seek to bound the average number of incorrectly-decoded items :

(7)

I-B Separate Decoding of Items

We use the terminology separate decoding of items to mean any decoding scheme in which is only a function of and , i.e.,

(8)

for some functions . All of our achievability results will choose not depending on ; more specifically, following [8], each decoder is of the following form for some :

(9)

where is the unconditional distribution of a given observation, and is the conditional distribution given and the value of . This can be interpreted as the Neyman-Pearson test for binary hypothesis testing with hypotheses and .

The computational complexity of (9) is for each , for a total of . This matches the runtime of typical group testing algorithms [12], though it is slower than recent sublinear-time algorithms [15, 16]. Moreover, considerable speedups are possible via distributed implementations, as shown in [45, 44].
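As a concrete illustration of the rule (9), the sketch below scores each item by a sum of per-test log-likelihood ratios and declares it defective when the score exceeds a threshold gamma. The conditional and marginal output distributions are our own reconstruction for Bernoulli(nu/k) testing under the noiseless/symmetric-noise models, and the default threshold is a crude placeholder rather than the choice analyzed in the paper; the vectorized scoring touches each (test, item) pair once, consistent with the per-item cost discussed above.

```python
import numpy as np

def separate_decode(X, Y, k, nu=np.log(2), rho=0.0, gamma=None):
    """Separate decoding of items in the spirit of (9): score each item j by
    sum_i log P(Y_i | X_{ij}) / P(Y_i), treating item j as defective, and declare
    it defective when the score exceeds gamma.  Written for Bernoulli(nu/k) testing
    with the noiseless (rho = 0) or symmetric noise (rho > 0) model."""
    n, p = X.shape
    q = nu / k
    # P(Y = 1 | X_j = x) when item j is defective, averaging over the other items
    pY1_x1 = 1.0 - rho
    pY1_x0 = (1 - rho) * (1 - (1 - q) ** (k - 1)) + rho * (1 - q) ** (k - 1)
    pY1 = q * pY1_x1 + (1 - q) * pY1_x0                  # unconditional P(Y = 1)

    Xb, Yb = X.astype(bool), Y.astype(bool)
    cond = np.where(Xb, np.where(Yb[:, None], pY1_x1, 1 - pY1_x1),
                        np.where(Yb[:, None], pY1_x0, 1 - pY1_x0))
    marg = np.where(Yb[:, None], pY1, 1 - pY1)
    scores = np.log(np.maximum(cond, 1e-300) / marg).sum(axis=0)   # one score per item

    if gamma is None:
        gamma = 0.5 * scores.max()      # crude placeholder threshold, not the paper's choice
    return set(np.flatnonzero(scores > gamma).tolist()), scores
```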

I-C Related Work

Figure 1 summarizes the main results known for the noiseless and symmetric noise models (both information-theoretic and practical), along with our novel contributions. We proceed by outlining the relevant related work, and then describe our contributions in more detail and further discuss Figure 1.

Figure 1: Asymptotic thresholds on the number of tests required for vanishing error probability in (Left) the noiseless setting, and (Right) the symmetric noise setting with . The number of defective items is for some . The vertical axis represents the constant such that the number of tests is . The blue curves correspond to separate decoding of items, with exact recovery (Exact), false positives only (False pos), false negatives only (False neg), or both false positives and false negatives (False pos/neg). The “practical joint exact” curve is DD [12] or LP [18] in the noiseless case, and the “best previous practical” curve is NCOMP [14] in the noisy case.

Information-theoretic limits. The information-theoretic limits of group testing have long been well-understood for in the Russian literature [7, 8], and have recently become increasingly well-understood for more general [11, 19, 13, 20]. Here we highlight two results from [21, 13, 20] that are particularly relevant to our work:

  • For the noiseless model (1) with Bernoulli testing and for some , the minimal number of tests ensuring high-probability exact recovery satisfies

    (10)

    where is the binary entropy function. (Here and subsequently, all logarithms have base e, and all information measures are in units of nats.) In particular, for , we have , which is optimal even beyond Bernoulli testing.

  • For the symmetric noise model (4) with , we have for sufficiently small that

    (11)

    Moreover, if we move to partial recovery with for arbitrarily small , then the right-hand side of (11) is achievable for all (including in the noiseless case ).

Practical algorithms. In the noiseless setting, numerous practical group testing algorithms have been proposed with various theoretical guarantees. The best known bounds under Bernoulli testing were given for the definite defectives (DD) algorithm in [20], in particular matching (10) for . Moreover, in [22], it was shown that the same bound is attained by the linear programming (LP) relaxation techniques of [18]. The DD algorithm was also shown to yield improved bounds under a random non-Bernoulli design in [23], but in this paper we focus on Bernoulli testing.

In noisy settings, less is known. For the symmetric noise model, an algorithm called noisy combinatorial orthogonal matching pursuit (NCOMP) [14] (also referred to as noisy column matching in [24]) was shown to achieve optimal scaling, but the constant factors in this result are quite suboptimal. NCOMP is in fact also an algorithm with separate decoding of items, albeit different than that of [8].

Some heuristic algorithms have been proposed for noisy settings without theoretical guarantees, including belief propagation [25] and a noisy LP relaxation [18]. A different LP relaxation was also given in [24] that only makes use of the negative tests; as a result, we found that it does not perform as well in practice. On the other hand, it was shown to yield optimal scaling laws in the symmetric noise model, with the constant factors again left loose to simplify the analysis.

Another related algorithm is the column-based algorithm of [26], which separately computes the number of agreements and disagreements with for each column (though a final sorting step is also performed, so this is not a “separate decoding” algorithm according to our definition). The main focus in [26] is on the regime that and (i.e., a very large number of false positives), which we do not consider in this paper.

Recently, algorithms based on sparse graph codes (with non-Bernoulli testing) have been proposed with various guarantees [15, 16]. For the noiseless non-adaptive case with exact recovery, however, the scaling laws are , thus failing to match the information-theoretic scaling . Nevertheless, for partial recovery, a guarantee was proved in [16] (with loose constant factors). While these algorithms are suboptimal, they have the notable advantage of running in sublinear time.

We briefly mention that a rather different line of works has considered group testing with adversarial errors [27, 28, 29]. This is a fundamentally different setting to that of random errors, and the corresponding test designs and algorithms are less relevant to the present paper.

Separate decoding of items. The idea of separate decoding of items for sparse recovery problems (including group testing) was initiated by Malyutov and Mateev [8], who showed that when and the decoder (9) is used with suitably-chosen , one can achieve exact recovery with vanishing error probability provided that

(12)

where the single-item mutual information is defined as follows, with implicit conditioning on item 1 being defective:

(13)

As noted in [17], in the noiseless setting with we have as , and in this case, (12) matches (10) up to a factor of . In fact, the same turns out to be true for the symmetric noise model, matching (11) up to a factor of , or even better if is further optimized (see Appendix -A). In more recent works [30, 17], similar results were shown when the rule (9) is replaced by a universal rule (i.e., not depending on the noise distribution) based on the empirical mutual information. However, we stick to (9) in this paper, as we found it to be more amenable to general scalings of the form .

Characterizations of the mutual information for several specific noise models were given in [31].
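As a rough numerical companion to (12)–(13), the following sketch evaluates the single-item mutual information for the noiseless and symmetric noise models under Bernoulli(nu/k) testing, and reports the implied first-order test count (log p)/I_1. The distributions are our own reconstruction for illustration; the printed constants should not be read as the paper's exact thresholds.

```python
import numpy as np

def single_item_MI(k, nu=np.log(2), rho=0.0):
    """Single-item mutual information (in nats) between X_1 and Y, conditioned on
    item 1 being defective, under Bernoulli(nu/k) testing; rho = 0 is the noiseless
    model and rho > 0 the symmetric noise model."""
    q = nu / k
    pY1 = {1: 1 - rho,
           0: (1 - rho) * (1 - (1 - q) ** (k - 1)) + rho * (1 - q) ** (k - 1)}
    pY1_marg = q * pY1[1] + (1 - q) * pY1[0]
    I = 0.0
    for x, px in ((1, q), (0, 1 - q)):
        for y in (1, 0):
            pyx = pY1[x] if y == 1 else 1 - pY1[x]
            py = pY1_marg if y == 1 else 1 - pY1_marg
            if pyx > 0:
                I += px * pyx * np.log(pyx / py)
    return I

# Crude first-order reading of (12): on the order of (log p) / I_1 tests
p, k = 10_000, 10
for rho in (0.0, 0.11):
    I1 = single_item_MI(k, rho=rho)
    print(f"rho = {rho:.2f}:  I_1 = {I1:.4f} nats,  (log p)/I_1 ~ {np.log(p) / I1:.0f} tests")
```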

I-D Contributions

In this paper, we provide an information-theoretic framework for studying separate decoding of items with general scalings of the form , as opposed to the case considered in [8, 30]. As with joint decoding [21, 13], the regime comes with significant challenges, with additional requirements of arising from concentration inequalities and often dominating (12). In addition, we consider the novel aspect of partial recovery, as well as presenting a converse bound that is specific to separate decoding.

As mentioned above, our contributions for the noiseless and symmetric noise models are summarized in Figure 1, where we plot the asymptotic number of tests for achieving or under Bernoulli testing with . Note that in this figure, the number of allowed false positives and/or false negatives (if any) is always assumed to be , though we allow for an arbitrarily small implied constant. Moreover, the horizontal line at the top of each plot also represents a converse for joint decoding with an arbitrarily small fraction of false positives and false negatives.

We make the following observations:

  • In the noiseless case, our asymptotic bounds are within a factor of the optimal threshold for joint decoding as , are reasonable for all with improvements when false positives or false negatives are allowed, and are within a factor of the optimal joint decoding threshold for all when both are allowed. Moreover, with exact recovery and , we strictly improve on the best known bound for any efficient algorithm under Bernoulli testing.

  • For the symmetric noise model, the general behavior is similar, but we significantly outperform the best known previous bound (NCOMP [14]) for all . Once again, when both false positives and false negatives are allowed, we are within a factor of the optimal threshold for joint decoding. (The final bound stated in [14, Thm. 6] appears to have a term omitted due to an incorrect claim at the end of the proof stating that tends to zero (which is only true if for all ). Upon correcting this, the final bound increases from to . This correction was also made in the follow-up paper [24], but that paper only considered the suboptimal choice , yielding a bound with worse constants. We also note that the bounds in [24] for an LP relaxation (using negative tests only) are strictly worse than those stated above for NCOMP.)

  • Although it is not shown in Figure 1, we provide a converse bound showing that if one tries to move beyond our achievability threshold for (or equivalently, the threshold obtained at all with both false positive and false negatives), then any separate decoding of items scheme with Bernoulli testing must have , i.e., the average number of errors is close to the trivial value of that would be obtained by declaring every item as non-defective. In contrast, below the same threshold, we show that is achievable.

The exact recovery results are given in Section II, the partial recovery results are given in Section III, and the converse bounds are given in Section IV. In addition to these theoretical developments, we evaluate the performance of separate decoding via numerical experiments in Section V, showing it to perform well despite being outperformed by the LP relaxation technique of [18].

II Achievability Results with Exact Recovery

In this section, we develop the theoretical results leading to the asymptotic bounds for the noiseless and noisy settings in Figure 1. To do this, we first establish non-asymptotic bounds on the error probability, then present the tools for performing an asymptotic analysis, and finally give the details of the applications to specific models.

II-A Additional Notation

We define some further notation in addition to that in Section I-A. Our analysis will apply for any given choice of the defective set , due to the symmetry of the observation model (3) and the i.i.d. test matrix . Hence, throughout this section we will focus on the specific set . In particular, we assume that item 1 is defective, and we define accordingly:

(14)

Hence, the summation in (9) can be written as

(15)

where

(16)

Following the terminology of the channel coding literature [32, 33, 34], we refer to this quantity as the information density. Denoting the distribution of a single entry of by , we find that the average of (16) with respect to is the mutual information in (13). With the above distributions in place, we define , , and .

When we specialize our results to the noiseless and symmetric noise models, we will choose

(17)
(18)

For (as we consider), there is essentially no difference between setting or , but we found the former to be slightly more convenient mathematically. Either choice is known to be asymptotically optimal for maximizing in the noiseless model [17, 31]. Perhaps surprisingly, this is no longer true in general for the symmetric noise model (see Appendix -A); however, it simplifies some of the analysis and does not impact the bounds significantly.
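The following small sweep gives a numerical sense of the remark above: it maximizes the single-item mutual information over the design parameter nu, so that one can compare the maximizer against ln 2 in the noiseless and symmetric-noise cases. The distributions are again our own reconstruction under Bernoulli(nu/k) testing, and the finite k used here only approximates the asymptotic claim.

```python
import numpy as np

def I1(nu, k, rho):
    """Single-item mutual information (nats) under Bernoulli(nu/k) testing and
    symmetric noise level rho (rho = 0: noiseless), with item 1 defective."""
    q = nu / k
    pY1_x = {1: 1 - rho,
             0: (1 - rho) * (1 - (1 - q) ** (k - 1)) + rho * (1 - q) ** (k - 1)}
    pY1 = q * pY1_x[1] + (1 - q) * pY1_x[0]
    total = 0.0
    for x, px in ((1, q), (0, 1 - q)):
        for pyx, py in ((pY1_x[x], pY1), (1 - pY1_x[x], 1 - pY1)):
            if pyx > 0:
                total += px * pyx * np.log(pyx / py)
    return total

# Sweep the design parameter nu and report the maximizer of I_1
k = 100
nus = np.linspace(0.05, 2.0, 400)
for rho in (0.0, 0.11):
    best = nus[int(np.argmax([I1(nu, k, rho) for nu in nus]))]
    print(f"rho = {rho:.2f}: maximizing nu ~ {best:.3f}   (ln 2 = {np.log(2):.3f})")
```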

II-B Initial Non-Asymptotic Bound

The following theorem provides an initial non-asymptotic upper bound on the error probability for general models. The result is proved using simple thresholding techniques that appeared in early studies of channel coding [35, 36], and have also been applied previously in the context of group testing [8, 21, 13].

Theorem 1.

(Non-asymptotic, exact recovery) For a general group testing model with Bernoulli testing and separate decoding of items according to (9), we have

(19)

where , and is given in (9).

Proof.

For the exact recovery criterion, correct decoding requires the defective items to pass the threshold test, and the non-defective items to fail the threshold test. Hence, by the union bound, we have

(20)

where . We bound the second term by writing

(21)
(22)
(23)

where (21) follows by bounding according to the event in the indicator function, and (23) follows by upper bounding the indicator function by one. Combining (20) and (23) completes the proof. ∎

II-C Asymptotic Analysis

In order to apply Theorem 1, we need to characterize the probability appearing in the first term. The idea is to exploit the fact that is an i.i.d. sum, and hence concentrates around its mean. While the following result is essentially a simple rewriting of Theorem 1, it makes the application of such concentration bounds more transparent. Here and subsequently, asymptotic notation such as is with respect to , and we assume that with .

Theorem 2.

(Asymptotic bound, exact recovery) Under the setup of Theorem 1, suppose that the information density satisfies a concentration inequality of the following form:

(24)

for some function . Moreover, suppose that the following conditions hold for some and :

(25)
(26)

Then under the decoder in (9) with .

Proof.

Setting in Theorem 1, we obtain

(27)

By the condition in (25), the probability in (27) is upper bounded by , which in turn is upper bounded by by (24). We therefore have from (27) that , and hence the theorem follows from the assumption along with (26). ∎

II-D Concentration Bounds

In order to apply Theorem 2 to specific models, we need to characterize the concentration of and attain an explicit expression for therein. The following lemma brings us one step closer to attaining explicit expressions, giving a general concentration result based on Bernstein’s inequality [37, Ch. 2].

Lemma 1.

(Concentration via Bernstein’s inequality) Defining

(28)
(29)
(30)

we have for any that

(31)
Proof.

Since is a sum of i.i.d. random variables with mean , this lemma is a direct application of Bernstein’s inequality for bounded random variables [37, Ch. 2], with the values , and conveniently defined so that they behave as in typical examples (see Section II-E). ∎
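For reference, the generic form of Bernstein's inequality underlying Lemma 1 can be evaluated as follows; the per-sample variance and range used in the example are illustrative stand-ins rather than the paper's quantities (28)–(30), whose exact normalization we do not reproduce here.

```python
import numpy as np

def bernstein_tail(n, t, var, M):
    """Generic Bernstein bound for a sum of n i.i.d. variables with per-sample
    variance `var` and |Z_i - E Z_i| <= M:
        P( |sum_i (Z_i - E Z_i)| >= t ) <= 2 exp( -t^2 / (2 (n var + M t / 3)) ).
    Lemma 1 is a bound of this type, up to its exact normalization."""
    return 2.0 * np.exp(-t ** 2 / (2.0 * (n * var + M * t / 3.0)))

# Illustrative values only (not the quantities from (28)-(30))
n, delta = 500, 0.02
var, M = 0.05, 3.0
print(bernstein_tail(n, n * delta, var, M))
```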

We will use Lemma 1 to establish the results shown for the symmetric noise model in Figure 1 (Right). While we could also use Lemma 1 for the noiseless model, it turns out that we can in fact do better via the following.

Lemma 2.

(Concentration for noiseless model) Under the noiseless model with (cf., (17)), we have for any that

(32)

as and simultaneously.

Proof.

We begin by characterizing the various possible outcomes of and their probabilities, as well as the resulting values of . Since by the definition of , the information density simplifies to , and we have the following:

  • with probability , and in this case we deterministically have , yielding .

  • with probability , and conditioned on this event we have the following:

    • with probability , where the first equality follows from the definition of . Hence, in this case we have .

    • with probability , and in this case we have .

From these calculations and (18), we immediately obtain

(33)

Moreover, we see that we can write (with the arguments implicit) as , where the individual distributions of each and are as follows:

(34)
(35)

We proceed by fixing (later to be equated with ) and (later to be taken to zero), and writing

(36)
(37)
(38)

where (36) follows since if both of the events on the right-hand side are violated then so is the event on the left-hand side, and (37) follows from the union bound.

To bound the term , we simply apply the multiplicative form of the Chernoff bound for Binomial random variables [38, Sec. 4.1] to obtain

(39)

As for , a direct calculation yields and . Hence, by Hoeffding’s inequality [37, Ch. 2], we have for any fixed (not depending on ) that

(40)

for some constant and sufficiently large . In particular, since , we have .

Substituting the preceding bounds into (38), we obtain (32) with in place of , and in place of . The lemma is concluded by noting that may be arbitrarily small, and noting from (18) and (33) that . ∎
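A Monte Carlo check of the event bounded in Lemma 2 can be coded directly from the outcome decomposition in the proof: sample the per-test information densities of a defective item under the noiseless model and estimate the probability that their sum falls a given amount below n times the mean. The parameter values below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def noiseless_lower_tail(n, k, delta, nu=np.log(2), trials=20_000):
    """Monte Carlo estimate of P( sum_i i(X_i; Y_i) <= n (I_1 - delta) ) for a
    defective item under the noiseless model with Bernoulli(nu/k) testing,
    i.e. the kind of lower-tail event bounded in Lemma 2."""
    q = nu / k
    pY1_x0 = 1 - (1 - q) ** (k - 1)              # P(Y = 1 | X_1 = 0), item 1 defective
    pY1 = q + (1 - q) * pY1_x0                   # marginal P(Y = 1)
    # the three possible per-test information density values (X_1 = 1 forces Y = 1)
    i_11 = np.log(1.0 / pY1)
    i_01 = np.log(pY1_x0 / pY1)
    i_00 = np.log((1 - pY1_x0) / (1 - pY1))
    I1 = q * i_11 + (1 - q) * (pY1_x0 * i_01 + (1 - pY1_x0) * i_00)

    X1 = rng.random((trials, n)) < q             # item 1's test indicators
    other = rng.random((trials, n)) < pY1_x0     # {some other defective in the test}
    dens = np.where(X1, i_11, np.where(other, i_01, i_00))
    return float(np.mean(dens.sum(axis=1) <= n * (I1 - delta))), I1

prob, I1 = noiseless_lower_tail(n=400, k=20, delta=0.01)
print(f"I_1 = {I1:.4f} nats, empirical lower-tail probability = {prob:.4f}")
```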

II-E Applications to Specific Models

Noiseless model: For the noiseless group testing model (cf., (1)), we immediately obtain the following from Theorem 2 and Lemma 2.

Corollary 1.

(Noiseless, exact recovery) For the noiseless group testing problem with (cf., (17)) and for some , we can achieve with separate decoding of items provided that

(41)

for some .

Proof.

We showed in the proof of Lemma 2 that (cf., (33)), and hence the first term in (41) follows from (25) with sufficiently slowly. Moreover, by equating with the right-hand side of (32) and performing simple rearranging, we find that the second term in (41) follows from (26). ∎

Symmetric noise model: For the symmetric noise model (cf., (4)), we make use of Lemma 1, with the constants , and therein characterized in the following. Recall that is the binary entropy function in nats.

Lemma 3.

(Bernstein parameters for symmetric noise) Under the symmetric noise model with a fixed parameter (not depending on ) and (cf., (17)), we have

(42)
(43)
(44)

as and simultaneously.

Proof.

We begin by looking at the various possible outcomes of and their probabilities, as well as the resulting values of . Since by the definition of , the information density simplifies to , and we have the following:

  • with probability , and conditioned on this event we have the following:

    • with probability , and in this case we have .

    • with probability , and in this case we have .

  • with probability , and conditioned on this event we have the following:

    • with probability , where as derived following (32). Hence, in this case, we have .

    • with probability , and in this case, we have .

With these computations in place, the lemma follows easily by evaluating the expectation , variance , and maximum directly, and substituting . We briefly outline the details:

  • For the mean, the contributions corresponding to already give the right-hand side of (42), whereas for , the terms corresponding to and effectively cancel, i.e., their sum is .

  • For the variance, the contributions corresponding to already give the right-hand side of (43), whereas the contributions from the sub-cases of are .

  • For the maximum, we use the fact that for all .

From this lemma, we immediately obtain the following.

Corollary 2.

(Symmetric noise, exact recovery) For noisy group testing with (not depending on ), , and for some , we can achieve with separate decoding of items provided that

(45)

for some , where , , and are respectively given by the right-hand sides of (42)–(44).

We have focused on the case to simplify the analysis and establish an explicit gap to joint decoding as (in which case the first term of (45) dominates for arbitrarily small ). However, we show in Appendix -A that this choice can be suboptimal even as , and that more generally we obtain the sufficient condition in this limit, where , and is the binary KL divergence function. This turns out to marginally improve on the condition obtained via the specific choice .
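For the symmetric noise model, the per-test mean, variance, and range of the information density can be computed exactly by enumerating the four outcomes used in the proof of Lemma 3. The sketch below does this for a few values of k; these are the kinds of quantities that feed into Bernstein's inequality, though the exact normalizations in (28)–(30) may differ from what is printed here, and the distributions are our own reconstruction.

```python
import numpy as np

def info_density_moments(k, rho, nu=np.log(2)):
    """Exact per-test mean, variance, and maximum deviation of the information
    density under the symmetric noise model with Bernoulli(nu/k) testing
    (item 1 defective), by enumerating the four (X_1, Y) outcomes."""
    q = nu / k
    pY1_x = {1: 1 - rho,
             0: (1 - rho) * (1 - (1 - q) ** (k - 1)) + rho * (1 - q) ** (k - 1)}
    pY1 = q * pY1_x[1] + (1 - q) * pY1_x[0]
    probs, vals = [], []
    for x, px in ((1, q), (0, 1 - q)):
        for y in (1, 0):
            pyx = pY1_x[x] if y == 1 else 1 - pY1_x[x]
            py = pY1 if y == 1 else 1 - pY1
            probs.append(px * pyx)
            vals.append(np.log(pyx / py))
    probs, vals = np.array(probs), np.array(vals)
    mean = float(probs @ vals)
    var = float(probs @ (vals - mean) ** 2)
    return mean, var, float(np.abs(vals - mean).max())

for k in (10, 100, 1000):
    m, v, M = info_density_moments(k, rho=0.11)
    print(f"k = {k:5d}:  k*mean = {k * m:.3f},  k*var = {k * v:.3f},  max deviation = {M:.3f}")
```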

Other noise models: We showed above how to apply Lemma 1 to the symmetric noise model. However, it can also be applied more generally, yielding an analogous result for any model in which the quantities , , and in (28)–(30) behave as . In particular, for any such model and any fixed , in the limit as , it suffices to have

(46)

for arbitrarily small . In contrast, for strictly greater than zero, the conditions on resulting from Bernstein’s inequality may dominate (46), similarly to Corollary 2.

In the following section, we show that when we move to partial recovery, it is possible to circumvent the difficulties of the concentration bounds for , and to derive sufficient conditions of the form (46) valid for all , even when the term is improved to .

III Achievability Results with Partial Recovery

In this section, we show that the analysis of the previous section can easily be adapted to provide achievability results when a certain number of false positives and/or false negatives are allowed. We make use of the notation from Sections I-A and II-A.

The main tool we need is the following, whose proof is in fact implicit in our analysis for the exact recovery criterion.

Lemma 4.

(Auxiliary result for partial recovery) For any group testing model of the form (3), under the decoder in (9) with threshold , we have the following:

(i) For any , the probability of passing the threshold test is upper bounded by .

(ii) Suppose that the information density satisfies a concentration inequality of the form (24) for some function , and that the number of tests satisfies . Then for any , the probability of failing the threshold test is upper bounded by .

Proof.

The first part was shown in (23), and the second part is contained in the proof of Theorem 2. ∎

III-A General Partial Recovery Achievability Results

We proceed by giving three variations of Theorem 2 for the cases in which we may tolerate false positives, false negatives, or both. We focus on the case that the number of false negatives and/or false positives is , but we note that the implied constant in the notation may be arbitrarily small.

We begin with the case that only false positives are allowed. This setting is closely related to that of list decoding, which was studied in [9, 39, 40, 26]. Recall that asymptotic notation such as is with respect to , and we assume that with .

Theorem 3.

(Asymptotic bound, false positives only) Consider the group testing problem with for some (not depending on ) and , and suppose that the information density satisfies a concentration inequality of the form (24) for some . Moreover, suppose that the following conditions hold for some and :

(47)
(48)

Then under the decoder in (9) with , we have .

Proof.

The fact that the probability of one or more false negatives tends to zero follows by applying the second part of Lemma 4, and following the proof of (26) in Theorem 2.

Let denote the (random) number of false positives. Setting in the first part of Lemma 4, we obtain . By the assumption , it follows that . Hence, by Markov’s inequality, the probability of must vanish for any . ∎

The main difference in Theorem 3 compared to Theorem 2 is that is replaced by in the numerator. While this may not appear to be a drastic change, it can lead to visible improvements (cf., Figure 1), particularly for moderate to large values of .

Next, we consider the case that only false negatives are allowed, i.e., and .

Theorem 4.

(Asymptotic bound, false negatives only) Consider the group testing problem with for some (not depending on ) and , and suppose that the information density satisfies a concentration inequality of the form (24) for some . Moreover, suppose that the following conditions hold for some and :

(49)
(50)

Then under the decoder in (9) with , we have .

Proof.

Applying the first part of Lemma 4 along with the union bound and the choice , we find that the probability of one or more false negatives is upper bounded by , which vanishes by assumption.

Let denote the (random) number of false negatives. Setting in the second part of Lemma 4, we obtain . By the assumption , it follows that . Hence, by Markov’s inequality, must vanish for any . ∎

By allowing false negatives, we obtain a significantly milder condition on in (50), corresponding to the concentration of . Specifically, all we need is concentration about the mean at an arbitrarily slow rate, whereas in Theorem 2 we needed a rate of . This turns out to significantly reduce the required number of tests for moderate to large values of ; see Corollary 3 below, as well as Figure 1.

Finally, we consider the case that both false positives and false negatives are allowed.

Theorem 5.

(Asymptotic bound, false positives and false negatives) Consider the group testing problem with and for some and (not depending on ) and , and suppose that the information density satisfies a concentration inequality of the form (24) for some function . Moreover, suppose that the following conditions hold for some and :

(51)
(52)

Then under the decoder in (9) with , we have .

Proof.

This result is directly deduced from the proofs of Theorems 3 and 4. ∎

As we show in Corollary 5 below, this result leads to a broad class of noise models where the threshold can be achieved for all with partial recovery.
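To illustrate the false-positive/false-negative trade-off exploited in Theorems 3–5, the following self-contained simulation runs the separate decoder at a range of thresholds and reports the average number of errors of each type. All parameter values, and the reconstruction of the score distributions, are our own illustrative choices rather than the settings analyzed above.

```python
import numpy as np

rng = np.random.default_rng(2)

def error_counts(p, k, n, rho, gammas, nu=np.log(2), trials=50):
    """Average numbers of false positives and false negatives for the separate
    decoder of (9) as the threshold gamma varies, under Bernoulli(nu/k) testing
    and symmetric noise with flip probability rho."""
    q = nu / k
    pY1_x1 = 1 - rho
    pY1_x0 = (1 - rho) * (1 - (1 - q) ** (k - 1)) + rho * (1 - q) ** (k - 1)
    pY1 = q * pY1_x1 + (1 - q) * pY1_x0
    fp = np.zeros(len(gammas))
    fn = np.zeros(len(gammas))
    for _ in range(trials):
        S = rng.choice(p, size=k, replace=False)
        X = rng.random((n, p)) < q
        Y = X[:, S].any(axis=1) ^ (rng.random(n) < rho)
        cond = np.where(X, np.where(Y[:, None], pY1_x1, 1 - pY1_x1),
                           np.where(Y[:, None], pY1_x0, 1 - pY1_x0))
        scores = np.log(np.maximum(cond, 1e-300)
                        / np.where(Y[:, None], pY1, 1 - pY1)).sum(axis=0)
        defective = np.zeros(p, dtype=bool)
        defective[S] = True
        for i, g in enumerate(gammas):
            declared = scores > g
            fp[i] += np.sum(declared & ~defective)
            fn[i] += np.sum(~declared & defective)
    return fp / trials, fn / trials

gammas = np.linspace(0.0, 6.0, 7)
fp, fn = error_counts(p=2000, k=10, n=400, rho=0.11, gammas=gammas)
for g, a, b in zip(gammas, fp, fn):
    print(f"gamma = {g:4.1f}:  avg false positives = {a:7.2f},  avg false negatives = {b:5.2f}")
```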

III-B Applications to Specific Models

Noiseless model: The following corollary gives three variations of the result in Corollary 1 corresponding to the three partial recovery settings considered in the previous subsection.

Corollary 3.

(Noiseless model, partial recovery) For the noiseless group testing problem with (cf., (17)) and for some , we can achieve with separate decoding of items under any of the following conditions:

(i)