Graph-Constrained Group Testing
Abstract
Non-adaptive group testing involves grouping arbitrary subsets of $n$ items into different pools. Each pool is then tested and defective items are identified. A fundamental question involves minimizing the number of pools required to identify at most $d$ defective items. Motivated by applications in network tomography, sensor networks and infection propagation, a variation of group testing problems on graphs is formulated. Unlike conventional group testing problems, each group here must conform to the constraints imposed by a graph. For instance, items can be associated with vertices and each pool is any set of nodes that must be path connected. In this paper, a test is associated with a random walk. In this context, conventional group testing corresponds to the special case of a complete graph on $n$ vertices.
For interesting classes of graphs a rather surprising result is obtained, namely, that the number of tests required to identify $d$ defective items is substantially similar to what is required in conventional group testing problems, where no such constraints on pooling are imposed. Specifically, if $T(n)$ corresponds to the mixing time of the graph $G$, it is shown that with $m = O(d^2 T^2(n) \log(n/d))$ non-adaptive tests, one can identify the defective items. Consequently, for the Erdős-Rényi random graph $G(n,p)$, as well as expander graphs with constant spectral gap, it follows that $m = O(d^2 \log^3 n)$ non-adaptive tests are sufficient to identify $d$ defective items. Next, a specific scenario is considered that arises in network tomography, for which it is shown that $m = O(d^3 \log^3 n)$ non-adaptive tests are sufficient to identify $d$ defective items. Noisy counterparts of the graph constrained group testing problem are considered, for which parallel results are developed. We also briefly discuss extensions to compressive sensing on graphs.
I Introduction
In this paper we introduce the graph constrained group testing problem motivated by applications in network tomography, sensor networks and infection propagation. While group testing theory (see [1, 2] and more recently [3]), and its numerous applications, such as industrial quality assurance [4], DNA library screening [5], software testing [6], and multi-access communications [7], have been systematically explored, the graph constrained group testing problem is new to the best of our knowledge.
Group testing involves identifying at most $d$ defective items out of a set of $n$ items. In non-adaptive group testing, which is the subject of this paper, we are given an $m \times n$ binary matrix, $M$, usually referred to as a test or measurement matrix. Ones on the $j$th row of $M$ indicate which subset of the $n$ items belongs to the $j$th pool. A test is conducted on each pool: a positive outcome indicates that at least one defective item is part of the pool, and a negative outcome indicates that no defective items are part of the pool. The conventional group testing problem is to design a matrix $M$ with minimum number of rows $m$ that guarantees error free identification of the defective items. While the best known (probabilistic) pooling design requires a test matrix with $m = O(d^2 \log(n/d))$ rows, and an almost-matching lower bound of $m = \Omega(d^2 (\log n)/\log d)$ is known on the number of pools (cf. [2, Chapter 7]), the size of the optimal test still remains open.
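To make the measurement model concrete, the following minimal sketch computes the outcome of each pool as the logical "or" of the defective items it contains. The helper name `pool_outcomes` and the small $3 \times 4$ matrix are hypothetical and serve only as an illustration; items are indexed from 0.

```python
def pool_outcomes(matrix, defectives):
    """Return one boolean outcome per pool (row of the test matrix).

    A pool is positive iff it contains at least one defective item.
    """
    return [any(row[i] for i in defectives) for row in matrix]


# Hypothetical 3 x 4 test matrix: rows are pools, columns are items.
M = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

# Item 2 is the only defective; it belongs to the second and third pools.
outcomes = pool_outcomes(M, defectives={2})
print(outcomes)
```

The design question studied in the paper is how few rows such a matrix may have while still allowing the defective set to be recovered from the outcomes alone.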
Note that in the standard group testing problem the test matrix $M$ can be designed arbitrarily. In this paper we consider a generalization of the group testing problem to the case where the matrix $M$ must conform to constraints imposed by a graph $G = (V, E)$. In general, as we will describe shortly, such problems naturally arise in several applications such as network tomography [8, 9], sensor networks [10], and infection propagation [11]. While the graph constrained group testing problem has been alluded to in these applications, the problem of test design or the characterization of the minimum number of tests, to the best of our knowledge, has not been addressed before. In this light our paper is the first to formalize the graph constrained group testing problem. In our graph group testing problem the $n$ items are either vertices or links (edges) of the graph; at most $d$ of them are defective. The task is to identify the defective vertices or edges. The test matrix $M$ is constrained as follows: for items associated with vertices each row must correspond to a subset of vertices that are connected by a path on the graph; similarly, for items associated with links each row must correspond to links that form a path on $G$. The task is to design an $m \times n$ binary test matrix with minimum number of rows $m$ that guarantees error free identification of the defective items.
We will next describe several applications, which illustrate the graph constrained group testing problem.
I-A Network Tomography & Compressed Sensing over Graphs
For a given network, identification of congested links from end-to-end path measurements is one of the key problems in network tomography [9], [8]. In many settings of today’s IP networks, one or a few links along a path cause the packet losses on that path. Finding the locations of such congested links is sufficient for most practical applications.
This problem can be understood as a graph-constrained group testing problem as follows. We model the network as a graph $G = (V, E)$ where the set $V$ denotes the network routers/hosts and the set $E$ denotes the communication links (see Fig. 1). Suppose that we have a monitoring system that consists of one or more end hosts (so called vantage points) that can send and receive packets. Each vantage point sends packets through the network by assigning the routes and the end hosts.
All measurement results (i.e., whether each packet has reached its destination) will be reported to a central server whose responsibility is to identify the congested links. Since the network is given, not every route is valid. A vantage point can only assign those routes which form a path in the graph $G$. The question of interest is to determine the number of measurements needed in order to identify the congested links in a given network.
We primarily deal with Boolean operations on binary valued variables in this paper, namely, link states are binary valued and the measurements are boolean operations on the link states. Nevertheless, the techniques described here can be extended to include non-boolean operations and non-binary variables as well. Specifically, suppose there is a sparse set of links that take on non-zero values. These non-zero values could correspond to packet delays or packet loss probabilities along each link. Measurements along each path provide the aggregate delay or aggregate loss along the path. The set of paths generated by $m$ random walks forms a routing matrix $M$. For an appropriate choice of $m$ and the graphs studied in this paper, it turns out (see [12]) that such routing matrices belong to the class of so called expander matrices. These expander type properties in turn imply a suitable type of restricted-isometry-property (1-RIP) [13]. Such properties in turn are sufficient for recovering sparse vectors using $\ell_1$ optimization techniques. Consequently, the results of this paper have implications for compressed sensing on graphs.
I-B Sensor Networks
The network tomography problem is further compounded in wireless sensor networks (WSN). As described in [10] the routing topology in WSN is constantly changing due to the inherent ad hoc nature of the communication protocols. The sensor network is static with a given graph topology such as a geometric random graph. Sensor networks can be monitored passively or actively. In passive monitoring, at any instant, sensor nodes form a tree to route packets to the sink. The routing tree constantly changes unpredictably but must be consistent with the underlying network connectivity. A test is considered positive if the arrival time is significantly large, which indicates that there is at least one defective sensor node or a congested link. The goal is to identify defective links or sensor nodes based on packet arrival times at the sink. In active monitoring, network nodes continuously calculate some high level, summarized information such as the average or maximum energy level among all nodes in the network. When the high level information indicates congested links, a low level and more energy consuming procedure is used to accurately locate the trouble spots.
I-C Infection Propagation
Suppose that we have a large population where only a small number of people are infected by a certain viral sickness (e.g., a flu epidemic). The task is to identify the set of infected individuals by sending agents among them. Each agent contacts a predetermined or randomly chosen set of people. Once an agent has made contact with an infected person, there is a chance that he gets infected, too. By the end of the testing procedure, all agents are gathered and tested for the disease. While this problem has been described in [11], the analysis ignores the inherent graph constraints that need to be further imposed. It is realistic to assume that, once an agent has contacted a person, the next contact will be with someone in close proximity of that person. Therefore, in this model we are given a random geometric graph that indicates which set of contacts can be made by an agent (see Fig. 2). Now, the question is to determine the number of agents that is needed in order to identify the set of infected people.
These applications present different cases where graph constrained group testing can arise. However, there are important distinctions. In the wired network tomography scenario the links are the items and each row of the matrix is associated with a route between any two vantage points. A test is positive if a path is congested, namely, if it contains at least one congested link. Note that in this case since the routing table is assumed to be static, the route between any two vantage points is fixed. Consequently, the matrix is deterministic and the problem reduces to determining whether or not the matrix satisfies identifiability.
Our problem is closer in spirit to the wireless sensor network scenario. In the passive case the links are the items and each row of the matrix is associated with a route between a sensor node and the sink. A test is positive if a path is congested, namely, if it contains at least one congested link. Note that in this case since the routing table is constantly changing, the route between a sensor node and the sink is constantly changing as well. Nevertheless the set of possible routes must be drawn from the underlying connectivity graph. Consequently, the matrix can be assumed to be random and the problem is to determine how many different tests are required to identify the congested links. Note that, in contrast to the wired scenario, tests conducted between the same sensor node and sink yield new information here. A similar situation arises in the active monitoring case as well. Here one could randomly query along different routes to determine whether or not a path is congested. These tests can be collated to identify congested links. Note that in the active case the test matrix is amenable to design in that one could selectively choose certain paths over others by considering weighted graphs.
Motivated by the WSN scenario we describe pool designs based on random walks on graphs. As is well known, a random walk is the state evolution of a finite reversible Markov chain. Each row of the binary test matrix is derived from the evolution of the random walk, namely, the ones on the $j$th row of $M$ correspond to the vertices visited by the $j$th walk. This is close to the WSN scenario because, as in that scenario, the path between two given nodes changes randomly. We develop several results in this context.
First, we consider random walks that start either at a random node or an arbitrary node but terminate after some appropriately chosen number of steps $t$. By optimizing the length of the walk we arrive at an interesting result for important classes of graphs. Specifically we show that the number of tests required to identify $d$ defective items is substantially similar to that required in conventional group testing problems, except that an extra term appears which captures the topology of the underlying graph. The best known result for the number of tests required when no graphical constraints are imposed scales as $O(d^2 \log(n/d))$. For the graph constrained case we show that with $m = O(d^2 T^2(n) \log(n/d))$ non-adaptive tests one can identify the defective items, where $T(n)$ corresponds to the mixing time of the underlying graph $G$. Consequently, for the Erdős-Rényi random graph $G(n,p)$ with sufficiently large $p$, as well as expander graphs with constant spectral gap, it follows that $m = O(d^2 \log^3 n)$ non-adaptive tests are sufficient to identify $d$ defective items. In particular, for a complete graph where no pooling constraint is imposed, we have $T(n) = 1$, and therefore, our result subsumes the well-known result for the conventional group testing problem.
Next we consider unbounded-length random walks that originate at a source node and terminate at a sink node. Both the source node and the sink node can either be arbitrary or be chosen uniformly at random. This directly corresponds to the network tomography problem that arises in the WSN context. This is because the source nodes can be viewed as sensor nodes, while the sink node may be viewed as the fusion center, where data is aggregated. At any instant, we can assume that a random tree originating at the sensor nodes and terminating at the sink is realized. While this random tree does not have cycles, there exist close connections between random walks and randomly generated trees. Indeed, it is well known that the so called loop-erased random walk, obtained by systematically erasing loops in random walks to obtain spanning trees, is a method for sampling spanning trees from a uniform distribution [14]. In this scenario, we show that $m = O(d^3 \log^3 n)$ non-adaptive tests are sufficient to identify $d$ defective items. By considering complete graphs we also establish that the cubic dependence on $d$ in this result cannot be improved.
We will also consider noisy counterparts of the graph constrained group testing problem, where the outcome of each measurement may be independently corrupted (flipped) with probability $q < 1/2$ (see Footnote 1). We develop parallel results for these cases. In addition to a setting with noisy measurement outcomes, these results can be used in a so called dilution model (as observed in [3, 11]). In this model, each item can be diluted in each test with some a priori known probability. In a network setting, this would correspond to the case where a test on a path with a congested link can turn out to be negative with some probability. We show that similar scaling results hold for this case as well.
Footnote 1: It is clear that if $q > 1/2$, one can first flip all the outcomes, and then reduce the problem to the $q < 1/2$ regime. For $q = 1/2$, since we only observe purely random noise, there is no hope to recover from the errors.
Other group testing problems on graphs: Several variations of classical group testing have been studied in the literature that possess a graph theoretic nature. A notable example is the problem of learning hidden sparse subgraphs (or more generally, hypergraphs), defined as follows (cf. [15]): Assume that, for a given graph, a small number of the edges are marked as defective. The problem is to use a small number of measurements of the following type to identify the set of defective edges: Each measurement specifies a subset of vertices, and the outcome would be positive iff the graph induced on the subset contains a defective edge. Another variation concerns group testing with constraints defined by a rooted tree. Namely, the set of items corresponds to the leaves of a given rooted tree, and each test is restricted to pool all the leaves that descend from a specified node in the tree (see [2, Chapter 12]). To the best of our knowledge, our work is the first variation to consider the natural restriction of the pools with respect to the paths on a given graph.
The rest of this paper is organized as follows. In Section II, we introduce our notation and mention some basic facts related to group testing and random walks on graphs. Section III formally describes the problem that we consider and states our main results. In Section IV we prove the main results, and finally, in Section V we show instantiations of the result to the important cases of graph-constrained group testing on regular expander graphs and random graphs in the Erdős-Rényi model.
II Definitions and Notation
In this section we introduce some tools, definitions and notation which are used throughout the paper.
Definition 1.
For two given boolean vectors $S$ and $T$ of the same length we denote their element-wise logical or by $S \vee T$. More generally, we will use $\bigvee_{i=1}^{d} S_i$ to denote the element-wise or of $d$ boolean vectors $S_1, \ldots, S_d$. The logical subtraction of two boolean vectors $S$ and $T$, denoted by $S \setminus T$, is defined as a boolean vector which has a $1$ at position $i$ if and only if $S(i) = 1$ and $T(i) = 0$. We also use $|S|$ to denote the number of $1$’s in (i.e., the Hamming weight of) a vector $S$.
We often find it convenient to think of boolean vectors as characteristic vectors of sets. That is, $x \in \{0,1\}^n$ would correspond to a set $X \subseteq [n]$ (where $[n] := \{1, \ldots, n\}$) such that $i \in X$ iff the entry at the $i$th position of $x$ is $1$. In this sense, the above definition extends the set-theoretic notions of union, subtraction, and cardinality to boolean vectors.
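As a small illustration of Definition 1 and of the set interpretation, the sketch below implements the element-wise or, the logical subtraction and the Hamming weight on Python lists of 0/1 entries. The function names are hypothetical, and positions are indexed from 0 rather than from 1 as in the text.

```python
def vec_or(*vectors):
    """Element-wise logical OR of boolean vectors of equal length."""
    return [int(any(bits)) for bits in zip(*vectors)]


def vec_minus(s, t):
    """Logical subtraction: 1 at position i iff s[i] == 1 and t[i] == 0."""
    return [int(a == 1 and b == 0) for a, b in zip(s, t)]


def weight(s):
    """Hamming weight: the number of 1's in the vector."""
    return sum(s)


def support(s):
    """The set whose characteristic vector is s (0-indexed positions)."""
    return {i for i, bit in enumerate(s) if bit}


S = [1, 1, 0, 0]
T = [0, 1, 1, 0]
print(vec_or(S, T), vec_minus(S, T), weight(vec_or(S, T)))
```

Under `support`, `vec_or` corresponds to set union, `vec_minus` to set difference, and `weight` to cardinality.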
Matrices that are suitable for the purpose of group testing are known as disjunct matrices. The formal definition is as follows.
Definition 2.
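An $m \times n$ boolean matrix $M$ is called $d$-disjunct, if, for every column $M_i$ and every choice of $d$ columns $M_{i_1}, \ldots, M_{i_d}$ of $M$ (different from $M_i$), there is at least one row at which the entry corresponding to $M_i$ is $1$ and those corresponding to $M_{i_1}, \ldots, M_{i_d}$ are all zeros. More generally, for an integer $e \geq 0$, the matrix is called $(d, e)$-disjunct if for every choice of the columns $M_i, M_{i_1}, \ldots, M_{i_d}$ as above, they satisfy
$$\Big| M_i \setminus \bigvee_{j=1}^{d} M_{i_j} \Big| > e.$$
A $(d, 0)$-disjunct matrix is said to be $d$-disjunct.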
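The definition can be checked by brute force on small matrices. The sketch below assumes the reading of the definition in which a column must have more than $e$ ones outside the union of any $d$ other columns; the function name `is_disjunct` is hypothetical, and the running time is exponential in $d$, so this is only meant for toy examples.

```python
from itertools import combinations


def is_disjunct(matrix, d, e=0):
    """Brute-force test of (d, e)-disjunctness for a 0/1 matrix (list of rows)."""
    n = len(matrix[0])
    # Represent each column by the set of rows in which it has a 1.
    columns = [{r for r, row in enumerate(matrix) if row[c]} for c in range(n)]
    for i in range(n):
        others = [c for c in range(n) if c != i]
        for group in combinations(others, d):
            covered = set().union(*(columns[j] for j in group))
            # Column i needs more than e rows not covered by the group.
            if len(columns[i] - covered) <= e:
                return False
    return True


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_disjunct(identity, d=2))       # each item is tested alone
print(is_disjunct(identity, d=2, e=1))  # but only one row separates it
```

The identity matrix is disjunct for any $d$ but uses as many rows as items; the interest lies in matrices with far fewer rows than columns.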
A classical observation in group testing theory states that disjunct matrices can be used in non-adaptive group testing schemes to distinguish sparse boolean vectors (cf. [2]). More precisely, suppose that a $d$-disjunct matrix $M$ with $n$ columns is used as the measurement matrix; i.e., we assume that the rows of $M$ are the characteristic vectors of the pools defined by the scheme. Then, the test outcomes obtained by applying the scheme on two distinct $d$-sparse vectors of length $n$ must differ in at least one position. More generally, if $M$ is taken to be $(d, e)$-disjunct, the test outcomes must differ in at least $e + 1$ positions. Thus, the more general notion of $(d, e)$-disjunct matrices is useful for various “noisy” settings, where we are allowed to have a few false outcomes (in particular, up to $\lfloor e/2 \rfloor$ incorrect measurement outcomes can be tolerated by $(d, e)$-disjunct matrices without causing any confusion).
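One standard way to exploit disjunctness in the noiseless case is the following simple decoder, which is not described in the text above and is included only as an illustration: every item that appears in some negative pool is cleared, and the remaining items are reported as defective. For a $d$-disjunct matrix and at most $d$ defectives this recovers the defective set exactly. The function names and the $4 \times 6$ example matrix (whose columns are the characteristic vectors of all 2-element subsets of the 4 rows, hence $1$-disjunct) are assumptions made for this sketch.

```python
from itertools import combinations


def pool_outcomes(matrix, defectives):
    """A pool is positive iff it contains at least one defective item."""
    return [any(row[i] for i in defectives) for row in matrix]


def decode(matrix, outcomes):
    """Clear every item seen in a negative pool; return the items left over."""
    n = len(matrix[0])
    cleared = set()
    for row, positive in zip(matrix, outcomes):
        if not positive:
            cleared.update(i for i in range(n) if row[i])
    return set(range(n)) - cleared


# Columns are all 2-subsets of the 4 rows, so no column contains another.
pairs = list(combinations(range(4), 2))
M = [[int(r in pair) for pair in pairs] for r in range(4)]

recovered = decode(M, pool_outcomes(M, {3}))
print(recovered)
```

Here 4 pools suffice to locate a single defective among 6 items, whereas testing items individually would need 6 tests.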
For our application, sparse vectors (that are to be distinguished) correspond to boolean vectors encoding the set of defective vertices (or edges) in a given undirected graph. The encoding is such that the coordinate positions are indexed by the set of vertices (edges) of the graph and a position contains $1$ iff it corresponds to a defective vertex (edge). Moreover, we aim to construct disjunct matrices that are also constrained to be consistent with the underlying graph.
Definition 3.
Let $G = (V, E)$ be an undirected graph, and $A$ and $B$ be boolean matrices with $|V|$ and $|E|$ columns, respectively. The columns of $A$ are indexed by the elements of $V$ and the columns of $B$ are indexed by the elements of $E$. Then,

The matrix $A$ is said to be vertex-consistent with $G$ if each row of $A$, seen as the characteristic vector of a subset of $V$, exactly represents the set of vertices visited by some walk on $G$.

The matrix $B$ is said to be edge-consistent with $G$ if each row of $B$, seen as the characteristic vector of a subset of $E$, exactly corresponds to the set of edges traversed by a walk on $G$.
Note that the choice of the walk corresponding to each row of or need not be unique. Moreover, a walk may visit a vertex (or edge) more than once.
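Since a walk on an undirected graph may revisit vertices, a non-empty set of vertices is exactly the set visited by some walk if and only if it induces a connected subgraph. The sketch below uses this observation to test a single row for vertex-consistency via breadth-first search. The adjacency-list representation and the function name `row_is_walk` are assumptions made for illustration.

```python
from collections import deque


def row_is_walk(adj, row, vertices):
    """Check whether the 1-entries of `row` form the vertex set of some walk.

    adj: dict mapping each vertex to a list of its neighbors (undirected).
    vertices: column order, i.e. row[k] refers to vertices[k].
    """
    chosen = {v for v, bit in zip(vertices, row) if bit}
    if not chosen:
        return False  # a walk visits at least its starting vertex
    start = next(iter(chosen))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in chosen and w not in seen:
                seen.add(w)
                queue.append(w)
    # Consistent iff every chosen vertex is reachable inside the chosen set.
    return seen == chosen


# Path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(row_is_walk(path, [1, 1, 0, 0], [0, 1, 2, 3]))
print(row_is_walk(path, [1, 0, 1, 0], [0, 1, 2, 3]))
```

A matrix is vertex-consistent when every one of its rows passes this check.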
Definition 4.
An undirected graph $G = (V, E)$ is called $(D, c)$-uniform, for some $c \geq 1$, if the degree of each vertex $v \in V$ (denoted by $\deg(v)$) is between $D$ and $cD$.
Definition 5.
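The pointwise distance of two probability distributions $\mu, \mu'$ on a finite space $\Omega$ is defined as
$$\|\mu - \mu'\|_\infty := \max_{i \in \Omega} |\mu(i) - \mu'(i)|,$$
where $\mu(i)$ (resp., $\mu'(i)$) denotes the probability assigned by $\mu$ (resp., $\mu'$) to the outcome $i$. We say that the two distributions are $\delta$-close if their pointwise distance is at most $\delta$.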
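A minimal sketch of this distance follows, assuming (as the name suggests and as the later reference to a norm indicates) that the pointwise distance is the largest absolute difference between the probabilities assigned to any single outcome. Distributions are represented as dictionaries from outcomes to probabilities; the function names are hypothetical.

```python
def pointwise_distance(mu, nu):
    """Largest absolute difference in probability over all outcomes."""
    outcomes = set(mu) | set(nu)
    return max(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in outcomes)


def is_close(mu, nu, delta):
    """Two distributions are delta-close if their distance is at most delta."""
    return pointwise_distance(mu, nu) <= delta


mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.25, "b": 0.25, "c": 0.5}
print(pointwise_distance(mu, nu))
```

Outcomes missing from a dictionary are treated as having probability zero.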
For notions such as random walks, stationary distribution and mixing time we refer the reader to standard textbooks on probability theory, Markov chains, and randomized algorithms. In particular for an accessible treatment of the basic notions, see [16, Chapter 6] or [17, Chapter 7]. The particular variation of the mixing time that we will use in this work is defined with respect to the pointwise distance as follows.
Definition 6.
Let $G = (V, E)$ with $|V| = n$ be a $(D, c)$-uniform graph and denote by $\mu$ its stationary distribution. For $v \in V$ and an integer $t$, denote by $\mu_v^t$ the distribution that a random walk of length $t$ starting at $v$ ends up at. Then, the $\delta$-mixing time of $G$ (with respect to the $\ell_\infty$ norm; see Footnote 2) is the smallest integer $t$ such that $\|\mu_v^{t'} - \mu\|_\infty \leq \delta$, for all $t' \geq t$ and $v \in V$. For concreteness, we define the quantity $T(n)$ as the $\delta$-mixing time of $G$ for $\delta := (1/2cn)^2$.
Footnote 2: Note that the mixing time highly depends on the underlying distance by which the distance between two distributions is quantified. In particular, we are slightly deviating from the more standard definition which is with respect to the variation distance (see, e.g., [17, Definition 11.2]).
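For small graphs the mixing time in the pointwise sense can be computed directly by evolving the walk distribution from every starting vertex. The sketch below assumes the simple random walk (each neighbor chosen with equal probability), the stationary distribution proportional to vertex degrees, and the maximum-absolute-difference reading of the pointwise distance. Because each distribution at step $t+1$ is a convex combination of the distributions at step $t$, the worst-case distance never increases, so the first step at which the bound holds is the mixing time. The function names and the step limit are hypothetical.

```python
def step(dist, adj):
    """Advance a distribution over vertices by one step of the simple walk."""
    new = {v: 0.0 for v in adj}
    for v, mass in dist.items():
        share = mass / len(adj[v])
        for w in adj[v]:
            new[w] += share
    return new


def mixing_time(adj, delta, max_steps=10_000):
    """Smallest t with max-abs distance to stationarity <= delta from all starts.

    Returns None if the bound is not reached within max_steps
    (e.g. for periodic walks on bipartite graphs).
    """
    total = sum(len(nbrs) for nbrs in adj.values())
    stationary = {v: len(adj[v]) / total for v in adj}
    dists = {s: {v: float(v == s) for v in adj} for s in adj}
    for t in range(max_steps + 1):
        worst = max(abs(d[v] - stationary[v]) for d in dists.values() for v in adj)
        if worst <= delta:
            return t
        dists = {s: step(d, adj) for s, d in dists.items()}
    return None


# Complete graph on 4 vertices, with and without self-loops.
k4_loops = {v: [0, 1, 2, 3] for v in range(4)}
k4 = {v: [w for w in range(4) if w != v] for v in range(4)}
print(mixing_time(k4_loops, 0.01), mixing_time(k4, 0.01))
```

With self-loops the walk on the complete graph is exactly uniform after one step, whereas without self-loops the distance shrinks by a factor of 3 per step.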
Throughout this work, the constraint graphs are considered to be $(D, c)$-uniform, for an appropriate choice of $D$ and some (typically constant) parameter $c$. When $c = 1$, the graph is $D$-regular.
For a graph to have a small mixing time, a random walk starting from any vertex must quickly induce a uniform distribution on the vertex set of the graph. Intuitively this happens if the graph has no “bottlenecks” at which the walk can be “trapped”, or in other words, if the graph is “highly connected”. The standard notion of conductance, as defined below, quantifies the connectivity of a graph.
Definition 7.
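Let $G = (V, E)$ be a graph on $n$ vertices. For every $S \subseteq V$, define $\Delta(S) := \sum_{v \in S} \deg(v)$, $\bar{S} := V \setminus S$, and denote by $E(S, \bar{S})$ the number of edges crossing the cut defined by $S$ and its complement. Then the conductance of $G$ is defined by the quantity
$$\Phi(G) := \min_{S \subseteq V :\, \Delta(S) \leq |E|} \frac{E(S, \bar{S})}{\Delta(S)}.$$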
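For intuition, conductance can be evaluated exhaustively on very small graphs. The sketch below assumes the standard form of the definition, namely the minimum, over vertex subsets whose total degree is at most the number of edges, of the ratio between the number of cut edges and the total degree of the subset. The function name and the adjacency-list representation are hypothetical, and the enumeration is exponential in the number of vertices.

```python
from itertools import combinations


def conductance(adj):
    """Brute-force conductance of a small undirected graph (adjacency lists)."""
    vertices = list(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best = None
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            s = set(subset)
            volume = sum(len(adj[v]) for v in s)  # total degree of the subset
            if volume == 0 or volume > num_edges:
                continue
            cut = sum(1 for v in s for w in adj[v] if w not in s)
            ratio = cut / volume
            if best is None or ratio < best:
                best = ratio
    return best


cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
k4 = {v: [w for w in range(4) if w != v] for v in range(4)}
print(conductance(cycle4), conductance(k4))
```

The complete graph has the larger value, in line with the intuition that it has no bottlenecks.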
We also formally define two important classes of graphs, for which we will specialize our results.
Definition 8.
Take a complete graph on $n$ vertices, and remove edges independently with probability $1 - p$. The resulting graph is called the Erdős-Rényi random graph, and denoted by $G(n, p)$.
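A direct way to sample such a graph is to keep each of the possible edges independently with probability $p$, which is the same as removing it with probability $1 - p$. The sketch below does this with the standard library; the function name and the adjacency-set representation are assumptions made for illustration.

```python
import random


def erdos_renyi(n, p, rng=None):
    """Sample an undirected graph on vertices 0..n-1, keeping each edge w.p. p."""
    rng = rng or random.Random()
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # keep the edge {u, v}
                adj[u].add(v)
                adj[v].add(u)
    return adj


graph = erdos_renyi(8, 0.5, random.Random(7))
print({v: sorted(nbrs) for v, nbrs in graph.items()})
```

Passing a seeded `random.Random` instance makes the sample reproducible.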
Definition 9.
For a graph $G = (V, E)$ with $|V| = n$, the (edge) expansion of $G$ is defined as
$$h(G) := \min_{S \subseteq V :\, 0 < |S| \leq n/2} \frac{E(S, \bar{S})}{|S|}.$$
A family $\mathcal{G}$ of $D$-regular graphs is called an (edge) expander family if there exists a constant $\sigma > 0$ such that $h(G) \geq \sigma$ for each $G \in \mathcal{G}$. In particular each $G \in \mathcal{G}$ is called an expander graph.
For a general study of Erdős-Rényi random graphs and their properties we refer to the fascinating book of Bollobás [18]. For the terminology on expander graphs, we refer the reader to the excellent survey by Hoory, Linial and Wigderson [19].
Definition 10.
Consider a particular random walk $W := (v_0, v_1, \ldots, v_t)$ of length $t$ on a graph $G = (V, E)$, where the random variables $v_i \in V$ denote the vertices visited by the walk, and form a Markov chain. We distinguish the following quantities related to the walk $W$:

For a vertex $v \in V$ (resp., edge $e \in E$), denote by $\pi_v$ (resp., $\pi_e$) the probability that $W$ passes $v$ (resp., $e$).

For a vertex $v \in V$ (resp., edge $e \in E$) and subset $A \subseteq V$, $v \notin A$ (resp., $B \subseteq E$, $e \notin B$), denote by $\pi_{v,A}$ (resp., $\pi_{e,B}$) the probability that $W$ passes $v$ but none of the vertices in $A$ (resp., passes $e$ but none of the edges in $B$).
Note that these quantities are determined not only by $v, e, A, B$ (indicated as subscripts) but also by the choice of the underlying graph, the distribution of the initial vertex $v_0$ and the length of the walk $t$. However, we find it convenient to keep the latter parameters implicit when their choice is clear from the context.
In the previous definition, the length of the random walk was taken as a fixed parameter $t$. Another type of random walk that we consider in this work has its end point as a parameter and does not have an a priori fixed length. In the following, we define similar probabilities related to the latter type of random walks.
Definition 11.
Consider a particular random walk $W := (v_0, v_1, \ldots)$ on a graph $G = (V, E)$ that continues until it reaches a fixed vertex $u \in V$. We distinguish the following quantities related to $W$: For a vertex $v \in V$ (resp., edge $e \in E$) and subset $A \subseteq V$, $v \notin A$ (resp., $B \subseteq E$, $e \notin B$), denote by $\pi^{(u)}_{v,A}$ (resp., $\pi^{(u)}_{e,B}$) the probability that $W$ passes $v$ but none of the vertices in $A$ (resp., passes $e$ but none of the edges in $B$).
Again these quantities depend on the choice of $G$ and the distribution of $v_0$ that we will keep implicit.
III Problem Setting and Main Results
Problem Statement. Consider a given graph $G = (V, E)$ in which at most $d$ vertices (resp., edges) are defective. The goal is to characterize the set of defective items using a number of measurements that is as small as possible, where each measurement determines whether the set of vertices (resp., edges) observed along a path on the graph has a non-empty intersection with the defective set. We call the problem of finding defective vertices vertex group testing and that of finding defective edges edge group testing.
As mentioned earlier, not all sets of vertices can be grouped together, and only those that share a path on the underlying graph can participate in a pool (see Fig. 3).
In the following, we introduce four random constructions (designs) for both problems. The proposed designs follow the natural idea of determining pools by taking random walks on the graph.
Design 1. Given: a constraint graph $G = (V, E)$ with $r \geq 0$ designated vertices $s_1, \ldots, s_r \in V$, and integer parameters $m$ and $t$. Output: an $m \times |V|$ boolean matrix $M$. Construction: Construct each row of $M$ independently as follows: Let $v \in V$ be any of the designated vertices $s_i$, or otherwise a vertex chosen uniformly at random from $V$. Perform a random walk of length $t$ starting from $v$, and let the corresponding row of $M$ be the characteristic vector of the set of vertices visited by the walk.
Design 2. Given: a constraint graph $G = (V, E)$ and integer parameters $m$ and $t$. Output: an $m \times |E|$ boolean matrix $M$. Construction: Construct each row of $M$ independently as follows: Let $v \in V$ be any arbitrary vertex of $G$. Perform a random walk of length $t$ starting from $v$, and let the corresponding row of $M$ be the characteristic vector of the set of edges visited by the walk.
Design 3. Given: a constraint graph $G = (V, E)$ with $r \geq 0$ designated vertices $s_1, \ldots, s_r \in V$, a sink node $u \in V$, and integer parameter $m$. Output: an $m \times |V|$ boolean matrix $M$. Construction: Construct each row of $M$ independently as follows: Let $v \in V$ be any of the designated vertices $s_i$, or otherwise a vertex chosen uniformly at random from $V$. Perform a random walk starting from $v$ until we reach $u$, and let the corresponding row of $M$ be the characteristic vector of the set of vertices visited by the walk.
Design 4. Given: a constraint graph $G = (V, E)$, a sink node $u \in V$, and integer parameter $m$. Output: an $m \times |E|$ boolean matrix $M$. Construction: Construct each row of $M$ independently as follows: Let $v \in V$ be any arbitrary vertex of $G$. Perform a random walk, starting from $v$ until we reach $u$, and let the corresponding row of $M$ be the characteristic vector of the set of edges visited by the walk.
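The four designs differ only in how a walk is terminated (after a fixed number of steps, or upon reaching the sink) and in whether a row records visited vertices or traversed edges. The following sketch illustrates Design 1 and Design 4 under stated assumptions: the graph is connected and given as adjacency lists, the start vertex in Design 1 is picked at random among the designated vertices when any are supplied, and the start vertex in Design 4 (which the text allows to be arbitrary) is picked uniformly at random. All function names are hypothetical.

```python
import random


def fixed_length_walk(adj, start, t, rng):
    """Simple random walk taking exactly t steps from `start`."""
    walk = [start]
    for _ in range(t):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def walk_to_sink(adj, start, sink, rng):
    """Simple random walk that stops the first time it is at `sink`."""
    walk = [start]
    while walk[-1] != sink:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def vertex_row(walk, vertices):
    """Characteristic vector of the set of vertices visited by the walk."""
    visited = set(walk)
    return [int(v in visited) for v in vertices]


def edge_row(walk, edges):
    """Characteristic vector of the set of edges traversed by the walk."""
    used = {frozenset(pair) for pair in zip(walk, walk[1:])}
    return [int(frozenset(e) in used) for e in edges]


def design1(adj, m, t, designated=(), rng=None):
    """m rows, each from a length-t walk (vertex version, fixed length)."""
    rng = rng or random.Random()
    vertices = sorted(adj)
    rows = []
    for _ in range(m):
        start = rng.choice(designated) if designated else rng.choice(vertices)
        rows.append(vertex_row(fixed_length_walk(adj, start, t, rng), vertices))
    return rows


def design4(adj, m, sink, rng=None):
    """m rows, each from a walk run until the sink (edge version)."""
    rng = rng or random.Random()
    vertices = sorted(adj)
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    rows = [edge_row(walk_to_sink(adj, rng.choice(vertices), sink, rng), edges)
            for _ in range(m)]
    return edges, rows


# Cycle on 5 vertices as a toy constraint graph.
cycle5 = {v: [(v - 1) % 5, (v + 1) % 5] for v in range(5)}
M1 = design1(cycle5, m=6, t=3, rng=random.Random(0))
edges, M4 = design4(cycle5, m=6, sink=0, rng=random.Random(0))
```

Designs 2 and 3 follow by combining `fixed_length_walk` with `edge_row`, and `walk_to_sink` with `vertex_row`, respectively.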
By construction, Designs 1 and 3 (resp., Designs 2 and 4) output boolean matrices that are vertex- (resp., edge-) consistent with the graph $G$. Our main goal is to show that, when the number of rows $m$ is sufficiently large, the output matrices become $d$-disjunct (for a given parameter $d$) with overwhelming probability.
Remark 12.
Designs 1 and 3 in particular provide two choices for constructing the measurement matrix $M$. Namely, the start vertices can be chosen within a fixed set of designated vertices, or chosen randomly among all vertices of the graph. As we will see later, in theory there is no significant difference between the two schemes. However, for some applications it might be the case that only a small subset of vertices are accessible as the starting points (e.g., in network tomography such a subset can be determined by the vantage points), and this can be modeled by an appropriate choice of the designated vertices in Designs 1 and 3.
Parameter  Value 

The following theorem states the main result of this work, showing that our proposed designs indeed produce disjunct matrices that can be used for the purpose of graph-constrained group testing. We will state both noiseless results (corresponding to $d$-disjunct matrices), and noisy ones (corresponding to $(d, e)$-disjunct ones, where the noise tolerance $e$ depends on a fixed “noise parameter” $\eta \in [0, 1)$). The proof of the following theorem is given in Section IV.
Theorem 13.
Let $\eta \geq 0$ be a fixed parameter, and suppose that $G = (V, E)$ is a $(D, c)$-uniform graph on $n$ vertices with $\delta$-mixing time $T(n)$ (where $\delta := (1/2cn)^2$). Then there exist parameters $D_0, t_1, t_2, m_i, m'_i, e_i$ ($i = 1, \ldots, 4$) with asymptotic values given in Table I such that, provided that $D \geq D_0$,

Design 1 with the path length $t := t_1$ and the number of measurements $m := m_1$ outputs a matrix $M$ that is vertex-consistent with $G$. Moreover, once the columns of $M$ corresponding to the designated vertices $s_1, \ldots, s_r$ are removed, the matrix becomes $d$-disjunct with probability $1 - o(1)$. More generally, for $m := m'_1$ the matrix becomes $(d, e_1)$-disjunct with probability $1 - o(1)$.

Design 2 with path length $t := t_2$ and $m := m_2$ measurements outputs a matrix $M$ that is edge-consistent with $G$ and is $d$-disjunct with probability $1 - o(1)$. More generally, for $m := m'_2$ the matrix becomes $(d, e_2)$-disjunct with probability $1 - o(1)$.

Design 3 with the number of measurements $m := m_3$ outputs a matrix $M$ that is vertex-consistent with $G$. Moreover, once the columns of $M$ corresponding to the designated vertices $s_1, \ldots, s_r$ and the sink node $u$ are removed, the matrix becomes $d$-disjunct with probability $1 - o(1)$. More generally, for $m := m'_3$ the matrix becomes $(d, e_3)$-disjunct with probability $1 - o(1)$.

Design 4 with the number of measurements $m := m_4$ outputs a matrix $M$ that is edge-consistent with $G$ and is $d$-disjunct with probability $1 - o(1)$. More generally, for $m := m'_4$ the matrix becomes $(d, e_4)$-disjunct with probability $1 - o(1)$.
Remark 14.
In Designs 1 and 3, we need to assume that the designated vertices (if any) are not defective, and hence, their corresponding columns can be removed from the matrix $M$. By doing so, we will be able to ensure that the resulting matrix is disjunct. Obviously, such a restriction cannot be avoided since, for example, $M$ might be forced to contain an all-ones column corresponding to one of the designated vertices and thus fail to be even $1$-disjunct.
Remark 15.
By applying Theorem 13 on the complete graph (using Design 1), we get $O(d^2 \log(n/d))$ measurements, since in this case, the mixing time is $T(n) = 1$ and also $c = 1$. Thereby, we recover the trade-off obtained by the probabilistic construction in classical group testing (note that classical group testing corresponds to graph-constrained group testing on the vertices of the complete graph).
We will show in Section V that, for our specific choice of $\delta := (1/2cn)^2$, the $\delta$-mixing time of an Erdős-Rényi random graph $G(n, p)$ is (with overwhelming probability) $T(n) = O(\log n)$. This bound more generally holds for any graph with conductance $\Omega(1)$, and in particular, expander graphs with constant spectral gap. Thus we have the following result (with a summary of the achieved parameters given in Table II).
Theorem 16.
There is an integer $n_0$ such that for every $n \geq n_0$ the following holds: Suppose that the graph $G$ is either

A $D$-regular expander graph with normalized second largest eigenvalue (in absolute value) $\lambda$ that is bounded away from $1$; i.e., $\lambda = 1 - \Omega(1)$, or,

An Erdős-Rényi random graph $G(n, p)$.
Then for every $\eta \in [0, 1)$, with probability $1 - o(1)$ Designs 1, 2, 3, and 4 output $(d, e)$-disjunct matrices (not considering the columns corresponding to the designated vertices and the sink in Designs 1 and 3), for some noise tolerance $e$, using respectively $m_1, m_2, m_3, m_4$ measurements, with the achieved values of these parameters summarized in Table II.
Parameter  Value 

The fixed-input case. Recall that, as Theorem 13 shows, our proposed designs almost surely produce disjunct matrices using a number of measurements summarized in Table I. Thus, with overwhelming probability, once we fix the resulting matrix, it has the combinatorial property of distinguishing between any two $d$-sparse boolean vectors (each corresponding to a set of up to $d$ defective vertices, not including designated ones, for Designs 1 and 3, or up to $d$ defective edges for Designs 2 and 4) in the worst case. However, the randomized nature of our designs can be used to our benefit to show that, practically, one can get similar results with a number of measurements that is smaller by almost a factor $d$ than what is required by Theorem 13. Of course, assuming a substantially lower number of measurements, we should not expect to obtain disjunct matrices, or equivalently, to be able to distinguish between any two sparse vectors in the worst case. However, it can be shown that, for every fixed $d$-sparse vector $x$, the resulting matrix with overwhelming probability will be able to distinguish between $x$ and any other $d$-sparse vector using a lower number of measurements. In particular, with overwhelming probability (over the choice of the measurements), from the measurement outcomes obtained from $x$, it will be possible to uniquely reconstruct $x$. More precisely, it is possible to show the following theorem, as proved in Section IV.
Theorem 17.
Consider the assumptions of Theorem 13, and let . Consider any fixed set of up to vertices such that and and any fixed set of up to edges , . Then with probability over the randomness of the designs the following holds.
Let respectively denote the measurement matrices produced by Designs with the number of rows set to . Then for every and every such that , and , , , we have that

The measurement outcomes of on and (resp., on and ) differ at more than (resp., ) positions.

The measurement outcomes of on and (resp., on and ) differ at more than (resp., ) positions.
A direct implication of this result is that (with overwhelming probability), once we fix the matrices obtained from our randomized designs with the lowered number of measurements (namely, having rows), the fixed matrices will be able to distinguish between almost all pairs of sparse vectors (and in particular, uniquely identify randomly drawn sparse vectors, with probability over their distribution).
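To make the notion of "measurement outcomes that differ at many positions" concrete, the following Python sketch computes the outcome vector of a boolean measurement matrix on a defective set (the OR of the selected columns) and, by brute force, the smallest Hamming distance between the outcomes of one fixed set and those of every other set of bounded size. The function names and the list-of-rows matrix representation are assumptions made for this illustration; the sketch is only feasible for very small instances and is not the analysis used in the proof.

```python
from itertools import combinations


def outcomes(matrix, defectives):
    """One bit per test (row): 1 if the pool contains at least one defective item."""
    return [int(any(row[j] for j in defectives)) for row in matrix]


def hamming(a, b):
    """Number of positions at which two outcome vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_separation(matrix, fixed_set, d):
    """Smallest distance between the outcomes of `fixed_set` and of any other
    set of at most `d` items (exhaustive search, exponential in `d`)."""
    n = len(matrix[0])
    base = outcomes(matrix, fixed_set)
    best = None
    for size in range(d + 1):
        for other in combinations(range(n), size):
            if set(other) == set(fixed_set):
                continue  # skip the fixed set itself
            dist = hamming(base, outcomes(matrix, other))
            best = dist if best is None else min(best, dist)
    return best
```

A separation of at least one means the fixed set is uniquely identifiable among the candidate sets; a larger separation leaves room for erroneous outcomes.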
Example in Network Tomography. Here we illustrate a simple concrete example that demonstrates how our constructions can be used for network tomography in a simplified model. Suppose that a network (with known topology) is modeled by a graph with nodes representing routers and edges representing links that connect them, and it is suspected that at most links in the network are congested (and thus, packets routed through them are dropped). Assume that, at a particular “source node” , we wish to identify the set of congested links by distributing packets that originate from in the network.
First, generates a packet containing a time stamp and sends it to a randomly chosen neighbor, which in turn decrements the time stamp and forwards the packet to a randomly chosen neighbor, and so on. The process continues until the time stamp reaches zero, at which point the packet is sent back to along the same path it has traversed. This can be achieved by storing the route to be followed (which is randomly chosen at ) in the packet. Alternatively, for practical purposes, instead of storing the whole route in the packet, can generate and store a random seed for a pseudorandom generator as a header in the packet. Then each intermediate router can use the specified seed to determine the neighbor to which the packet has to be forwarded.
Using the procedure sketched above, the source node generates a number of independent packets, which are distributed in the network. Each packet is either returned to in a timely manner or never reaches because of a congested link on its route. By choosing an appropriate timeout, can determine the packets that are routed through the congested links.
The particular scheme sketched above implements our Design 2, and thus Theorem 13 implies that, by choosing the number of hops appropriately, after generating a sufficient number of packets (that can be substantially smaller than the size of the network), can determine the exact set of congested links. This result holds even if a number of the measurements produce false outcomes (e.g., a congested link may nevertheless manage to forward a packet, or a packet may be dropped for reasons other than congestion), in which case by estimating an appropriate value for the noise parameter in Theorem 13 and increasing the number of measurements accordingly, the source can still correctly distinguish the congested links. Of course one can consider different schemes for routing the test packets. For example, it may be more desirable to forward the packets until they reach a predetermined “sink node”, an approach that is modeled by our Designs 3 and 4 above.
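The probing procedure described in this example can be simulated with a short Python sketch. It is an illustration under several assumptions that are not part of the original text: the network is given as an adjacency dictionary, measurements are noiseless, and decoding uses the simple elimination rule (any link on a returned packet's route is declared uncongested) rather than a decoder tuned to the noise parameter of Theorem 13. All function and parameter names are hypothetical.

```python
import random


def random_walk_edges(adj, source, hops, rng):
    """Set of links (as frozensets) used by a `hops`-step random walk from `source`."""
    edges, current = set(), source
    for _ in range(hops):
        nxt = rng.choice(adj[current])
        edges.add(frozenset((current, nxt)))
        current = nxt
    return edges


def probe(adj, source, hops, num_packets, congested, seed=0):
    """Send `num_packets` independent packets; each record is (route, lost)."""
    rng = random.Random(seed)
    tests = []
    for _ in range(num_packets):
        route = random_walk_edges(adj, source, hops, rng)
        lost = any(edge in congested for edge in route)  # noiseless assumption
        tests.append((route, lost))
    return tests


def decode(adj, tests):
    """Elimination decoder: drop every link seen on a packet that came back."""
    candidates = {frozenset((u, v)) for u in adj for v in adj[u]}
    for route, lost in tests:
        if not lost:
            candidates -= route
    return candidates
```

In the noiseless setting the returned candidate set always contains the truly congested links; with enough packets and a suitable hop count it is expected to shrink to exactly that set.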
IV Proof of Theorems 13 and 17
Before discussing Theorem 13 and its proof, we introduce some basic propositions that are later used in the proof. The omitted proofs will be presented in the appendix. Throughout this section, we consider an underlying graph that is uniform, with mixing time as in Definition 6.
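As a rough illustration of what a mixing time measures, the following Python sketch computes the exact distribution of a simple random walk after a given number of steps and its largest pointwise deviation from the degree-proportional stationary distribution. The function names and the adjacency-dictionary representation are assumptions made for this example, and the precise threshold used in Definition 6 is not reproduced here.

```python
def walk_distribution(adj, start, steps):
    """Distribution of a simple random walk on `adj` after `steps` steps from `start`."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        new = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            share = dist[u] / len(nbrs)  # mass sent to each neighbor of u
            for v in nbrs:
                new[v] += share
        dist = new
    return dist


def pointwise_gap(adj, start, steps):
    """Largest deviation, over all vertices, from the stationary probability deg(v) / (2|E|)."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    dist = walk_distribution(adj, start, steps)
    return max(abs(dist[v] - len(adj[v]) / total_degree) for v in adj)
```

On a connected, non-bipartite graph the gap tends to zero as the number of steps grows, and the number of steps needed to push it below a chosen threshold is one way to quantify mixing.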
Proposition 18.
Let be events on a finite probability space, define , and suppose that:

For every , .

For every set with , .
Then, .
The proof of this proposition may be found in Section A-A. The following proposition is a corollary of a well-known result for the stationary distribution of irregular graphs [17, Theorem 7.13]. A formal proof of this proposition is given in Section A-B.
Proposition 19.
Let be a uniform graph, and denote by the stationary distribution of (assuming that is not bipartite). Then for each , .
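The well-known result underlying this proposition is that, on a connected undirected graph, the stationary distribution of the simple random walk assigns each vertex a probability proportional to its degree. The Python sketch below checks this fact exactly using rational arithmetic; the helper names and the adjacency-dictionary input are assumptions made for illustration, and the specific bound stated in Proposition 19 is not reproduced here.

```python
from fractions import Fraction


def stationary_distribution(adj):
    """Degree-proportional distribution: pi(v) = deg(v) / (2|E|)."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())  # equals 2|E|
    return {v: Fraction(len(nbrs), total_degree) for v, nbrs in adj.items()}


def step(adj, dist):
    """Apply one step of the simple random walk to a distribution over vertices."""
    new = {v: Fraction(0) for v in adj}
    for u, nbrs in adj.items():
        share = dist[u] / len(nbrs)  # u splits its mass evenly among its neighbors
        for v in nbrs:
            new[v] += share
    return new
```

When all degrees lie within a constant factor of one another, every stationary probability is within the same constant factor of the uniform value, which is the kind of control the proof relies on.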
Proposition 20.
For the quantities and in Definition 10, we have
The proof of Proposition 20 is presented in Section A-C. In fact, a stronger statement than this proposition can be obtained: with noticeable probability, every fixed vertex (or edge) is hit by the walk at least once, but not too many times and not too “early”. This is made more precise in the following two propositions, which are proved in Sections A-D and A-E, respectively.
Proposition 21.
Consider any walk in Design 1 (resp., Design 2). There is a such that, for every and every , the probability that passes (resp., ) more than times is at most (resp., ).
Proposition 22.
For any random walk in Design 1, let be any vertex that is not among the designated vertices . Then the probability that visits within the first steps is at most .
The following proposition shows that the distributions of two vertices on a random walk that are separated by a sufficiently large number of steps are almost independent. The proof of this proposition may be found in Section A-F.
Proposition 23.
Consider a random walk on starting from an arbitrary vertex, and suppose that . Let denote any event that only depends on the first vertices visited by the walk. Then for every ,
The following lemmas, which form the technical core of this work, lower bound the quantities , , , as defined by Definitions 10 and 11.
Lemma 24.
There is a and such that whenever , by setting the path lengths in Design 1 the following holds. Let , and be a set of at most vertices in such that and does not include any of the designated vertices . Then
(1) 
Proof:
Denote by the stationary distribution of . We know from Proposition 19 that for each , .
Let be the quantity given by Proposition 21, and let denote the bad event that hits some vertex in . Moreover, let denote the good event that hits no more than times in total and never within the first steps. The probability of is, by Propositions 21 and 22, at least
which can be made arbitrarily close to (say larger than ) by choosing sufficiently large and sufficiently small (as required by the statement). Now,
(2) 
By taking large enough, and in particular, , we can ensure that
Combined with Proposition 21, we have , since
Thus, (2) gives
(3) 
Now we need to upper bound . Before doing so, fix some , and assume that . Moreover, fix some vertex and assume that . We first try to upper bound .
Let and , and for the moment, assume that (a “degenerate” situation occurs when this is not the case). Partition into four parts:
For , define
Now we upper bound each of the . In a degenerate situation, some of the may be empty, and the corresponding will be zero.
Each of the subwalks and are “oblivious” of the conditioning on and (because they are sufficiently far from both and Proposition 23 applies). In particular, the distribution of each vertex on is pointwise close to . Therefore, under our conditioning the probability that each such vertex belongs to is at most . The argument on is similar, but more care is needed. Without the conditioning on , each vertex on has an almost-stationary distribution. Moreover, by Proposition 23, the conditioning on changes this distribution by up to at each point. Altogether, for each , we have
Using a union bound on the number of steps, we conclude that .
In order to bound , we observe that of all or more neighbors of , at most can lie on . Therefore,
Similarly,
regardless of which means
and in general,
(4) 
Similarly we have,
and by Proposition 23 (and time-reversibility), conditioning on changes this probability by at most . Therefore,
and in general,
(5) 
Altogether, using a union bound and by combining (4) and (5), we get that
Using the same reasoning, can be bounded as
Finally, we obtain
(6) 
Our next step is to relax the conditioning on the starting point of the walk. The probability that the initial vertex is in is at most (as this happens only when the initial vertex is taken randomly), and by Proposition 23, conditioning on changes this probability by at most . Now we write
where we have used the chain rule in the first inequality, and Proposition 18 with for the second one. Now, since is very close to , conditioning on this event does not increase probabilities by much (say no more than a factor ). Therefore,
Now in the probability space conditioned on , define events , , where is the event that . Note that the intersection of more than of the is empty (as conditioning on implies that the walk never passes more than times), and moreover, the union of these is the event that the walk passes . Now we apply Proposition 18 to conclude that
By taking and we can make the right hand side arbitrarily small (say at most ). Now we get back to (3) to conclude, using Proposition 20, that
∎
Similarly, we can bound the edge-related probability as in the following lemma. The proof of the lemma is very similar to that of Lemma 24, and is therefore skipped for brevity.
Lemma 25.
There is a and such that whenever , by setting the path lengths in Design 2 the following holds. Let be a set of at most edges in , and , . Then
(7) 
In Designs 3 and 4, the quantities and defined in Definition 11 play a similar role as and . In order to prove disjunctness of the matrices obtained in Designs 3 and 4, we will need lower bounds on and as well. In the following we show the desired lower bounds.
Lemma 26.
There is a such that whenever , in Design 3 the following holds. Let , and be a set of at most vertices in such that and is disjoint from . Then
(8) 
Proof:
Let and be quantities given by Lemma 24. (In fact, as will be clear by the end of the proof, Lemma 24 should be applied with the sparsity parameter instead of . However, this will only affect constant factors that we ignore.) Let denote the start vertex of a walk performed in Design 3, and consider an infinite walk that starts from a vertex identically distributed with . Let the random variables respectively denote the times that visits , and any of the vertices in for the first time. Therefore, , , and , for every and so on. Then the quantity that we wish to bound corresponds to the probability that , that is, the probability of the event that in , the first visit of occurs before the walk reaches the sink node for the first time, and moreover, the walk never hits before reaching . Observe that this event in particular contains the subevent that , , and , where is picked as in Lemma 24. Denote by the set of all sequences of vertices of (i.e., walks of length ) that include but not any of the vertices in . Now, we can write
(10)  
The probability is exactly with respect to the start vertex . Therefore, Lemma 24 gives the lower bound
Furthermore observe that, regardless of the outcome , we have
where is taken with respect to the start vertex . Therefore, since , again we can use Lemma 24 to conclude that
By plugging the bounds in (10) the claim follows. ∎
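The event bounded in Lemma 26 (the walk visits a given vertex before first reaching the sink, and avoids a given set of vertices until then) can also be estimated empirically. The Python sketch below is a Monte Carlo illustration only: the uniformly random start vertex, the step cap, and all names are assumptions made for this example rather than details taken from Design 3.

```python
import random


def hits_target_before_sink(adj, start, sink, target, avoid, rng, max_steps=10_000):
    """True if the walk visits `target` before first reaching `sink` and never
    enters `avoid` on the way. Walks exceeding `max_steps` count as failures."""
    current, seen_target = start, False
    for _ in range(max_steps):
        if current in avoid:
            return False
        if current == target:
            seen_target = True
        if current == sink:
            return seen_target
        current = rng.choice(adj[current])
    return False


def estimate_probability(adj, sink, target, avoid, trials=1000, seed=0):
    """Fraction of walks, from uniformly random start vertices, for which the event occurs."""
    rng = random.Random(seed)
    vertices = list(adj)
    hits = sum(
        hits_target_before_sink(adj, rng.choice(vertices), sink, target, avoid, rng)
        for _ in range(trials)
    )
    return hits / trials
```

Such an estimate is only a sanity check on small graphs; the lemma itself gives a lower bound that holds under the stated assumptions on the graph.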
A similar result can be obtained for Design 4 on the edges. Since the arguments are very similar, we only sketch a proof.
Lemma 27.
There is a such that whenever , in Design 4 the following holds. Let be a set of at most edges in , and , . Then
(11) 
Proof:
(sketch) Similar to the proof of Lemma 26, we consider an infinite continuation of a walk performed in Design 4 and focus on its first steps, where and are respectively the time parameters given by Lemmas 24 and 25. Let
Again following the argument of Lemma 26, we lower bound by the probability of a subevent consisting of the intersection of the following two events:

The event that visits but neither the sink node nor any of the edges in , and

The event that visits the sink node but none of the edges in .
Consider the set consisting of the endpoints of the edges in and denote by any of the endpoints of . Let (with respect to the start vertex ). Now, since upon visiting , there is a chance that the next edge taken by the walk turns out to be . The quantity in turn, can be lower bounded using Lemma 24. Moreover, regardless of the outcome of , the probability that visits but not (and subsequently, the conditional probability ) is at least the probability (with respect to the start vertex ), where