# Distributed sensor failure detection in sensor networks

###### Abstract

We investigate the problem of distributed sensor failure detection in networks with a small number of defective sensors, whose measurements differ significantly from neighboring sensor measurements. Defective sensors are represented by non-zero values in binary sparse signals. We build on the sparse nature of the binary sensor failure signals and propose a new distributed detection algorithm based on Group Testing (GT). The distributed GT algorithm estimates the set of defective sensors from a small number of linearly independent binary messages exchanged by the sensors. The distributed GT algorithm uses a low-complexity distance decoder that is robust to noisy messages. We first consider networks with only one defective sensor and determine the minimal number of linearly independent messages needed for detection of the defective sensor with high probability. We then extend our study to the detection of multiple defective sensors by appropriately modifying the message exchange protocol and the decoding procedure. We show through experimentation that, for small and medium-sized networks, the number of messages required for successful detection is actually smaller than the minimal number computed in the analysis. Simulations demonstrate that the proposed method outperforms methods based on random walk measurement collection in terms of detection performance and convergence rate. Finally, the proposed method is resilient to network dynamics due to the effective gossip-based message dissemination protocol.

## I Introduction

Over the past years we have witnessed the emergence of simple and low cost sensors. This has led to wide deployment of sensor networks for monitoring signals in numerous applications, for example in medical applications or natural hazard detection. However, sensor networks often have a dynamic architecture with loose coordination due to the cost of communications. This raises new demands for collaborative data processing algorithms that are effective under network topology and communication constraints. In general, a sensor network is represented as a connected graph , where vertices stand for the sensors and edges determine sensors’ connectivity. For instance, if two sensors and lie within each other’s communication range, the edge has a nonzero value. Fig. 1 illustrates a setup where sensors capture a smooth physical phenomenon (e.g., spatial temperature evolution) and generate messages that are eventually gathered for analysis.

When a sensor is defective, its measurements are inaccurate and can contaminate the signal analysis. It thus becomes important to detect the defective sensors in the network, so that their erroneous values do not impact the accuracy of the underlying data processing applications.

The detection literature can mostly be classified into centralized and distributed methods. Most works on detection methods for binary sparse signals deal with centralized systems. The pioneering work in [1] targets medical applications. It proposes the simple idea of pooling blood samples and observing the viral presence in a set, instead of performing tests on every single blood sample separately. Typically, the main target is to minimize the number of tests required to identify all the infected samples, while keeping the detection procedure as simple as possible. This paradigm is known as Group Testing (GT) and was proposed half a century ago. GT has been studied more recently in the context of sensor networks for detection of malicious events [2]. The detection approaches differ in scenarios with errors, inhibitors or combinations of them and the detection algorithms are rather naive [3]. Defective sensors are detected by iterative elimination of identified non-defective sensors from the test outcomes. The detection time is typically of order , where is the number of tests and is the total number of sensors. Particular test design methods improve the effective time for detection in centralized systems. For example, a useful test matrix property called the -disjunctness property (i.e., the Boolean sum of every columns does not result in any other column) speeds up the decoding process. This property is used in code designs, e.g., for superimposed codes [4], [5]. Further, a random efficient detection method is proposed in [6] with a decoding time of , where is the number of tests and denotes a polynomial. To our knowledge, this represents the state-of-the-art decoding performance.

In sensor networks, test design is contingent on the communication limitations. Works that consider constraints imposed by the sensor network topology in the GT framework are not numerous. The authors in [7] propose to form tests by a random walk process on well-connected graphs. The minimal number of tests required for detection in this case depends on the random walk mixing time. A bipartite graph structure is considered in [8] with a two-stage hybrid detection method. Here, a subset of defective items in the first stage is determined by pre-designed tests, while the remaining items are tested individually in the next step. Data retrieval for topology-adaptive GT is studied in [9] where a binary tree splitting algorithm is proposed. The above methods use centralized decision algorithms which are not appropriate for large-scale sensor networks or networks with a dynamic topology because of the high communication costs. In those scenarios one rather needs to use distributed detection methods. To the best of our knowledge, however, no analysis of distributed detection methods that consider sparse and binary test signals is available. The distributed methods are rather employed for non-sparse signal detection with explicit network and message constraints. Such methods generally employ statistical decoders [10]. For example, a Bayesian approach in [11] proposes to compute a detection score for a priori defined sets of hypotheses, which depends on the received messages. The hypothesis with the highest score drives the decision. The binary event detection problem for hierarchically clustered networks is addressed in [12] where the cluster decisions are fused to make a final decision. Surveys on similar methods can be found in [13, 14].

In this paper, we propose a novel distributed sensor failure detection method that employs a simple distance decoder for sparse and binary signals. We assume that at most sensors are defective out of sensors in the network, where . Therefore, the defective sensor identification problem boils down to a sparse binary signal recovery, where nonzero signal values correspond to defective sensors. Our approach is based on GT methods that are commonly applied for centralized systems. The core idea is to perform low-cost experiments in the network, called tests, in order to detect the defective sensors. The tests are performed on pools of sensors by a set of sensors called master sensors. The master sensors request sensor measurements from their neighbors. Each sensor responds to this request with probability . Due to the smoothness of the measured function, non-erroneous neighbor sensors typically have similar measurements. Each master sensor compares the sensor measurements based on a similarity measure (e.g., thresholding) to detect the presence of defective sensors in its vicinity. The result of this test takes a binary value, which may be altered by noise. The tests and their outputs together form the network messages that are communicated to the neighborhood of the master nodes. The messages in the sensors are then disseminated in the network with a gossip algorithm (rumor mongering) [15] that follows a pull protocol [16], [17], [18]. Each time a new message reaches the sensor, its value is linearly combined with the message available at the current sensor in order to increase the diversity of information in the network. The message design and dissemination phases are repeated for several rounds. Due to the probabilistic test design and message dissemination, we employ a simple distance decoder (e.g., Hamming decoder) that is able to detect defective sensors, as long as the number of messages is sufficient.
We analyze the detection failure bounds and analytically derive the conditions needed for successful failure detection in the case of a single defective sensor. Then, we provide the error bounds for detection of multiple defective sensors. We show that the number of linearly independent messages required for detection is smaller in practice than the theoretical bounds obtained in our worst case analysis. We finally provide simulation results in regular and irregular networks. The experiments outline the advantages of the proposed detection method compared to other binary signal detection algorithms based on the random walk measurements gathering. Our algorithm outperforms random walk detection methods both in terms of the detection accuracy and convergence rate.

This paper is organized as follows. Section II reviews the centralized Group Testing framework. Section III proposes a novel distributed detection method. It describes the message formation and dissemination processes in sensor networks and discusses the detection problem for single and multiple defective sensors. Section IV presents the simulation results.

## II Centralized detection with probabilistic Group Testing

We first review the centralized detection of sensor failures with methods based on GT. This framework is the ground for the novel distributed GT algorithm discussed in the next section. Detection is the identification of a subset of defective sensors whose measurements deviate significantly from those of the sensors in their vicinity. Based on the test construction, the methods for detection are categorized into deterministic and probabilistic algorithms. General centralized deterministic GT methods assign each sensor to the set of tests prior to performing them, where the tests are designed to assure detection. This approach, however, is not feasible for networks with a large number of sensors. To alleviate this, probabilistic GT has been proposed in [19]. We focus on test design methods that do not use the knowledge of realized test outcomes for novel test designs, since they are more appropriate in realistic settings.

Hereafter, we adopt the following notation: matrices and vectors are represented with boldface capital letters (M, m) and their elements are given with lowercase letters (). Calligraphic letters are used to denote sets (), while represents the number of elements in a set. The -th column and the -th row of are represented with and , respectively.

GT aims at detecting defective items in the set based on the outcome of binary tests. Nonzero entries of a -dimensional binary vector indicate the defective sensors. is a finite field of size two and is a -sparse signal, where . The tests performed on sensor measurements are represented with a dimensional matrix . The nonzero entries of refer to the sensors that participate in the -th test. The Boolean matrix multiplication operator is denoted with . Then, the binary test results are denoted with the test outcome vector :

(1) |

The design of the matrix is crucial for reducing the number of required tests for the detection of defective sensors. This design resembles the design of generator matrices of LDPC codes [20]. In the Tanner graph representation of LDPC codes, the LDPC encoded symbols are partitioned in check and variable nodes, where the check nodes are used to detect errors introduced during transmission of LDPC encoded symbols. Motivated by this similarity, the test matrix is constructed as [19]:

(2) |

The sensor participation probability is denoted with . Such a design for the test matrix assures that with high probability, any test matrix column is not a subset of any union of up to columns (disjunctness property). In other words, a matrix is called -disjunct if no column of lies in the sub-space formed by any set of columns with . This property enables fast decoding with a distance decoder (i.e., Hamming distance). The distance decoder exploits the knowledge of the test outcome vector and the test matrix or the seed of the pseudorandom generator that has been used for generating the random test matrix. Next, we discuss in more detail the disjunctness property and the detection probability in centralized GT, since they represent the starting point of the decentralized detection method proposed in the next section.
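The probabilistic test design and the Boolean outcome computation described above can be illustrated with a minimal sketch. It assumes a noiseless setting, independent entries in the test matrix, and an arbitrary participation probability of 0.3; the function names `make_test_matrix` and `boolean_outcomes`, the network size and the index of the defective sensor are hypothetical choices made only for this illustration.

```python
import random

def make_test_matrix(num_tests, num_sensors, participation_prob, rng):
    """Random test matrix: each sensor joins each test independently
    with probability participation_prob (assumed value, not from the paper)."""
    return [[1 if rng.random() < participation_prob else 0
             for _ in range(num_sensors)]
            for _ in range(num_tests)]

def boolean_outcomes(test_matrix, defect_vector):
    """Boolean matrix-vector product: a test is positive iff it pools
    at least one defective sensor (OR of element-wise ANDs)."""
    return [int(any(w and f for w, f in zip(row, defect_vector)))
            for row in test_matrix]

rng = random.Random(0)
num_sensors, num_tests = 20, 12
defects = [0] * num_sensors
defects[7] = 1  # hypothetical single defective sensor
tests = make_test_matrix(num_tests, num_sensors, 0.3, rng)
outcomes = boolean_outcomes(tests, defects)
```

With a single defective sensor and no noise, the outcome vector coincides with the column of the test matrix that belongs to the defective sensor, which is the property the distance decoder relies on.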

We first formally define the disjunctness property [19] of test matrices that results in low-cost detection. This property assures that the union of any set of at most different columns of differs in at least positions from any other column of .

###### Definition 1

Disjunctness property: A boolean matrix with columns is called -disjunct if, for every subset of its columns, with :

(3) |

where denotes the nonzero elements (support) of the column and is the set difference operator.
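For small matrices, the disjunctness property can be verified by exhaustive search. The sketch below assumes the usual reading of Definition 1: for every set of at most `max_defects` columns, every other column must keep strictly more than `margin` support positions outside the union of the set. The strict inequality and the names `is_disjunct`, `max_defects` and `margin` are assumptions of this illustration, and the brute-force search is only practical for toy sizes.

```python
from itertools import combinations

def column_support(matrix, col):
    """Row indices where the given column is nonzero."""
    return {r for r, row in enumerate(matrix) if row[col]}

def is_disjunct(matrix, max_defects, margin):
    """Brute-force disjunctness check (exponential cost, toy sizes only)."""
    num_cols = len(matrix[0])
    supports = [column_support(matrix, c) for c in range(num_cols)]
    for size in range(1, max_defects + 1):
        for subset in combinations(range(num_cols), size):
            covered = set().union(*(supports[j] for j in subset))
            for i in range(num_cols):
                if i in subset:
                    continue
                # Column i must stick out of the union in > margin rows.
                if len(supports[i] - covered) <= margin:
                    return False
    return True

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
nested = [[1, 1], [0, 1]]  # column 0 is contained in column 1
```

In this toy setting the identity matrix passes the check with a margin of zero, while the `nested` matrix fails because one column is covered by another.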

Disjunctness is an important property since it allows the detection probability to be analyzed. The connection between the structure of disjunct matrices and detection of defective items is given by the following proposition [19].

###### Proposition 1

If the test matrix fulfills a -disjunct property, the detection problem is resolved in the -sparse vector with error parameter .

The disjunct matrix parameter represents the distance decoder threshold for detection. The decoder accumulates the number of entries in a column of the -disjunct test matrix that are different from the outcome vector . The columns of that achieve the lowest Hamming distance correspond to defective sensors. For any column of the test matrix that is -disjunct, the decoder verifies if:

(4) |

where is the vector of test outcomes. In other words, the decoder counts the number of positions in the column for which the union of distinct columns differs from the set in order to detect defective items. The columns of the vector are inferred as nonzero iff the inequality (4) holds.
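The decoding rule can be sketched as follows. The sketch assumes that the distance of a column to the outcome vector is the number of tests in which the sensor participates while the outcome is zero, and that a sensor is declared defective when this count does not exceed the threshold. The name `distance_decode` and the toy matrix are hypothetical.

```python
def distance_decode(test_matrix, outcomes, threshold):
    """Flag every sensor whose column disagrees with the outcomes
    (column entry 1, outcome 0) in at most `threshold` tests."""
    num_sensors = len(test_matrix[0])
    flagged = []
    for col in range(num_sensors):
        mismatches = sum(1 for row, y in zip(test_matrix, outcomes)
                         if row[col] == 1 and y == 0)
        if mismatches <= threshold:
            flagged.append(col)
    return flagged

# Toy example: 6 tests on 4 sensors, sensor 1 is defective.
tests = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
]
clean = [row[1] for row in tests]   # noiseless outcomes
noisy = list(clean)
noisy[1] = 0                        # one positive outcome flipped by noise
```

In the noiseless case a threshold of zero isolates sensor 1. After a single flip, a threshold of one still isolates it, because every other column disagrees with the outcomes in two tests.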

Finally, the detection performance can also be analyzed in noisy settings, when the test matrix satisfies the disjunctness property. The noisy setting results from the alteration of the nonzero entries in with probability , as represented in Fig. 2.

The following proposition provides the required number of measurements in centralized detection for successful decoding with the distance decoder in noisy settings [19].

###### Proposition 2

Let the test matrix be -disjunct. The distance decoder successfully detects the correct support with overwhelming probability for a -sparse vector in a noisy environment when the number of tests is equal to

The insights provided by the above results are used in the analysis of the novel distributed GT algorithm proposed in the next section.

## III Distributed detection method

### III-A Sensor network message design and dissemination

In this section, we propose a novel distributed failure detection algorithm and analyze its performance. The algorithm is based on a novel test design and message dissemination strategy in a distributed GT framework. The sensors iteratively create and disseminate messages in two phases, denoted by and . During the first phase , the sensors obtain messages that estimate the presence of defective sensors in their neighborhood. In the second phase , the sensors linearly combine messages and exchange them employing a gossip mechanism. One round of our iterative distributed detection algorithm consists of these two phases. They are illustrated in Fig. 3 and described below in more detail.

Fig. 3: (a) Phase : Message design. (b) Phase : Message dissemination. (c) Communication phases.

The first phase in round represents the message construction process illustrated in Fig. 3(a). master sensors cluster the network into disjoint subsets , . Clustering is used to bound the search space of the decoder, as explained in the following subsections. Measurements of neighbor sensors do not vary significantly when the sensors are not defective and the signal under observation is smooth over the sensor field. The master sensors locally gather the readings or measurements from sensors that participate in their test. Each sensor randomly participates in the test with probability , as given in Eq. (2). The master sensor estimates the presence of defective sensors within its neighborhood and then attributes a binary value to each sensor in the neighborhood. The value denotes that the sensor is defective. Noise flips non-zero bits with the probability , as shown in Fig. 2. The test outcome at master node is finally computed as:

(5) |

where the binary matrix operator is composed by and and stand respectively for the bitwise OR and the bitwise addition operators, where is the set of defective sensors. The message formed by a master sensor during the phase consists of the outcome and the test participation identifier . The message is sent to the neighbor sensors, which concludes the phase .

During the phase , the messages created in the phase are disseminated within the network. The phase is illustrated in Fig. 3(b). Every sensor requests the message formed at the previous round from its neighbor , chosen uniformly at random, following a gossip mechanism with pull protocol. Next, each sensor responds to the message request that it has received from sensor by sending its message from the previous round. This process is performed only once per round. The sensor further combines these messages as follows:

(6) |

where denotes the sensor outcome value of the neighbor at the previous round . The vector represents the test indicator vector at the sensor in round . Since the messages are created probabilistically, the message combination in the different rounds assures that an innovative message reaches sensors at every round with high probability. A toy example of the dissemination phases is illustrated in Fig. 4. In this example the sensor at round pulls the message from the sensor and constructs a new message according to Eq. (6).
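One dissemination round can be sketched as below. The sketch assumes that the combination is a bitwise XOR (addition over the binary field) applied to both the outcome bit and the test indicator vector, and that every sensor already holds one message. The ring topology, the function name `gossip_round` and the random initial messages are hypothetical simplifications that leave out master nodes and noise.

```python
import random

def gossip_round(messages, neighbors, rng):
    """One pull-gossip round. messages[s] = (outcome_bit, indicator_vector).
    Each sensor pulls the previous-round message of one neighbor chosen
    uniformly at random and XORs it into its own message."""
    previous = list(messages)  # snapshot: all pulls read round t-1 values
    updated = []
    for sensor, (own_bit, own_vec) in enumerate(previous):
        peer = rng.choice(neighbors[sensor])
        peer_bit, peer_vec = previous[peer]
        new_vec = [a ^ b for a, b in zip(own_vec, peer_vec)]
        updated.append((own_bit ^ peer_bit, new_vec))
    return updated

rng = random.Random(3)
num_sensors, defective = 6, 2  # hypothetical ring with one defective sensor
ring = {s: [(s - 1) % num_sensors, (s + 1) % num_sensors]
        for s in range(num_sensors)}
messages = []
for _ in range(num_sensors):
    vec = [rng.randint(0, 1) for _ in range(num_sensors)]
    messages.append((vec[defective], vec))  # noiseless test outcome
for _ in range(5):
    messages = gossip_round(messages, ring, rng)
```

In this noiseless single-defect setting, the outcome bit of every message still equals the entry of its indicator vector at the defective sensor after any number of rounds, because XOR acts identically on both.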

In a matrix form, the process of message formation and transmission in rounds of our algorithm at any sensor in the network is represented as:

(7) |

where the sensor identifier matrix is of size . The latter equation resembles the outcome computation in the centralized GT case. However, in the distributed GT the tests represent linear combinations of test vectors that build a disjunct matrix with high probability, as given in Eq. (2). To make a clear distinction between test matrices in the proposed and centralized setups, we assume that an oracle has direct access to the master nodes. Let denote the concatenation vector of test realizations at master nodes collected by an oracle in the phase of the round . The matrix then represents the test matrix over collection rounds. Observe that the matrix is by construction disjunct, while is built on the Boolean addition of rows of as in Eq. (6). The values in thus depend on the random message propagation path, which is not the case in the centralized GT algorithm. Note that, for an arbitrary network, the number of network rounds required for collecting a particular number of linearly independent tests varies and depends on the network topology, the number of master nodes and the test participation probability .

Once every sensor has gathered enough test messages, it independently solves the failure detection problem by finding the binary vector that satisfies the tests in Eq. (7). This solution indicates the defective sensors. This process is analyzed in more detail below.

### III-B Detection of one defective sensor in the network

We first analyze the case of a single defective sensor (case ) in the network and study the detection probability of our distributed algorithm. To recall, the distance decoder used for detection computes the Hamming distance between two vectors and . The element-wise distance is given by:

(8) |

To avoid false alarms, the decoder threshold is set to a value that is higher than the expected number of noise-induced bit flips per column in the disjunct matrix [19]:

(9) |

where is a small constant and is the number of rows in . Columns of have on average non-zero elements. Every non-zero matrix element is flipped with probability and the expected number of flips per column is:

(10) |
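The relation between the threshold and the expected number of flips can be checked numerically. The sketch below assumes that the expected number of flips per column is the product of the number of rows, the participation probability and the flip probability, as described in the text, and that the threshold exceeds this expectation by a factor of one plus a small constant. This multiplicative form, the function names and all numerical values are assumptions of the illustration.

```python
import random

def expected_flips_per_column(num_tests, participation_prob, flip_prob):
    """Average nonzeros per column times the per-entry flip probability."""
    return num_tests * participation_prob * flip_prob

def decoder_threshold(num_tests, participation_prob, flip_prob, delta):
    """Assumed threshold form: (1 + delta) times the expected flips."""
    return (1.0 + delta) * expected_flips_per_column(
        num_tests, participation_prob, flip_prob)

def simulate_mean_flips(num_tests, participation_prob, flip_prob,
                        num_columns, rng):
    """Monte Carlo estimate of the number of flipped entries per column."""
    total = 0
    for _ in range(num_columns):
        for _ in range(num_tests):
            participates = rng.random() < participation_prob
            if participates and rng.random() < flip_prob:
                total += 1
    return total / num_columns

rng = random.Random(1)
mu = expected_flips_per_column(200, 0.2, 0.1)
eps = decoder_threshold(200, 0.2, 0.1, delta=0.5)
empirical = simulate_mean_flips(200, 0.2, 0.1, 2000, rng)
```

The empirical average should stay close to `mu`, and the threshold `eps` sits above it, which is what limits false alarms caused by noise.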

Recall that the matrix is by construction a disjunct matrix. Proposition 1 states that the detection problem is resolved for tests that form a disjunct test matrix. However, the messages available at sensors in the network form a test matrix that is obtained by linear combinations of disjunct matrix rows and not by the disjunct matrix rows themselves. Nevertheless, we show below that the distance decoder detects the defective sensor with high probability under certain conditions.

The formal propositions for detection with high probability are given below. First we show that the proposed algorithm in the network with a single master node designs a -disjunct matrix during the phase . Next we show that in a single cluster network linear combinations of rows in preserve distances between the test outcome and the column of the defective sensor in the test matrix. We then build on these two propositions to analyze the number of messages needed for the distributed detection of a single defective sensor, which is given in Proposition 7.

We first show that for a network with a single master node () and probabilistic message design in the phase , a -disjunct matrix is built with high probability. This case boils down to the centralized collection of data described in [19] and the defective sensor can be detected by a distance decoder as shown in Proposition 1.

###### Proposition 3

For a single-cluster network, the message design over the phase of our proposed method builds a -disjunct matrix with high probability for an arbitrary and defined as in Eq. (9).

We show that the probability that the number of rows with a good disjunctness property of is smaller than is small, and we follow the development proposed in [19]. The sensor participation probability in a test is defined as in Eq. (2). A row of the matrix is considered to have a good disjunctness property if a single symbol “” occurs, while the remaining values are equal to zero. The probability of such an event is equal to . The random variable that marks the total number of rows with such a property is denoted with . The distribution of is binomial with a mean value . We show that the probability of having fewer than rows with a good disjunctness property is small under the assumption that . We limit this probability by a Chernoff bound as:

(11) |

Knowing that and that constant , we get . Since holds, is bounded. For the parameter choice in [19] , the value . Therefore this probability can be designed to be arbitrarily small:

(12) |

Then we show that linear combinations of rows of -disjunct matrices in a network with a single master node preserve the Hamming distance only between the column of matrix that corresponds to the defective sensor and the outcome vector .

###### Proposition 4

Let be the -disjunct matrix created over consecutive rounds in a single-cluster network during the phase . Linear combinations of messages generated during the phase , performed as in Eq. (6), preserve the Hamming distance between the column of the obtained matrix that corresponds to the defective sensor and the outcome vector .

We first analyze the case that leads to a decoding failure for -disjunct matrices following a development similar to [19]. We prove further that linear combinations of rows in such matrices preserve vector distances between the outcome vector and the column of that corresponds to the defective sensor.

A decoding failure with a distance decoder occurs in a -disjunct matrix when the number of flips of column elements of is higher than . The probability of occurrence of a single flip is equal to . Let denote the number of flips in the columns of the matrix. Hence, the expected number of flips per column is given in Eq. (10). We want to compute the bounds for the event that more than flips occur in the column of the matrix, where . Applying the Markov inequality:

(13) |

and plugging the probability of the single flip event:

(14) |

to the expectation term of the previous equation leads to:

(15) |

If we set and plug the inequality , we obtain:

(16) |

For the constant , we finally obtain:

(17) |

Observing that , Eq. (17) becomes:

(18) |

The outcome value depends on the presence of a defective sensor in the test. We prove here that the distance between and the -th column does not increase more than during , while this is not true for the rest of the columns. When sensor sends its message to sensor during the round , we have:

(19) |

where the first equality results from Eq. (6). The second equality follows directly from the fact that the values of and the columns are identical for the defective sensor due to Eq. (7). Since these two columns can initially differ at positions due to noise flips, the overall distance between the vectors and is at maximum given in Eq. (9).

We now consider networks with master sensors and a hypothetical centralized data collection. We assume that master nodes cluster the sensor network in disjoint subsets, where every sensor belongs to exactly one cluster. The master nodes perform message design over the rounds as proposed by our algorithm. We now show that the tests gathered from the different clusters build a disjunct matrix, where each cluster relates to a -disjunct matrix.

###### Proposition 5

The diagonal matrix obtained from -disjunct matrices is at least -disjunct.

The proof follows directly from Definition 1 and the disjunctness property of the matrices in .

We now consider the gathering of messages that are linearly combined over successive rounds of our detection algorithm. Uniform gathering of linearly combined messages at clusters by a hypothetical centralized decoder results in detection of the defective sensor with high probability when the number of received messages is sufficient.

###### Proposition 6

When the -disjunct matrices are linearly combined as in Eq. (6), where and , the resulting test matrix permits detection by a distance decoder with high probability as long as it contains in total messages collected from clusters chosen at random.

We first show that a diagonal matrix constructed from -disjunct matrices of the set is -disjunct. Next, we recall Proposition 4 and finally, we show that the measurements assure a good disjunct property of cluster matrices. Let the number of rows for all matrices be . The parameters and are defined in Eq. (9) and , the diagonal matrix of matrices is disjunct. The next part of the proof follows from Proposition 4, which states that a matrix whose rows are formed by linear combinations of rows of a -disjunct matrix permits detection with a distance decoder. Finally, we need to prove that for a given the disjunct property holds given that at least messages are available. For this purpose, we follow a development similar to [19] and assume that the maximum number of sensors in clusters is . The probability bound given in Proposition 3 should hold for all possible choices of a fixed set of out of columns: . This probability can be arbitrarily small, e.g., in case . Furthermore, the condition in Eq. (18), which gives the probability bound that the number of flips in any out of columns exceeds a threshold value, is also bounded. It reads

where the last equality is obtained by using Eq. (10). This probability is small for the sufficiently large value of .

We now analyze the proposed distributed algorithm and consider the detection requirements for every sensor in the network. We show that the test messages collected by the sensors during the transmission rounds enable failure detection by the distance decoder with high probability if the number of messages is sufficient, where the decoder operations are performed locally at sensors.

###### Proposition 7

We assume that master sensors partition the sensor network in disjunct parts. Test realizations within a cluster form test vectors. Over the rounds, these vectors create -disjunct matrices :

(20) |

where . Messages arrive at all the sensors in the network in our proposed algorithm, as described in the previous section. If the above assumptions hold and if the number of linearly independent messages received per cluster at every sensor in the network is at least , where , the probability that sensors fail to detect the defective sensor by the distance decoder tends to zero as .

The message collection method does not influence the decoder performance, since the number of per-cluster measurements is sufficient for decoding with high probability. Therefore, the proof follows from the proof of Proposition 6.

### III-C Detection of multiple defective sensors in the network

We now analyze the distributed detection of multiple defective sensors, where the number of defective sensors is much smaller than the total number of sensors. We propose here to slightly modify our distributed algorithm and to limit the decoder search space to be able to apply the Hamming distance decoder. The protocol modification and the adaptation of the distance decoder are described below. We assume that sensors completely differentiate between sensors in the network that belong to particular clusters and that at most one defective sensor is located in a given cluster. This knowledge limits the size of the decoder search space.

The proposed protocol is first modified as follows to deal with multiple defective sensors. A decoder error occurs when two or more messages with positive test outcomes are combined together during the phase , since the distance preserving property defined in Eq. (19) is not guaranteed in this case. Since the number of defective sensors is very small compared to the total number of sensors, this event occurs rarely. We explain the protocol modification with a simple example. Let the sensor pull the message from the sensor , where both sensor test outcomes have nonzero values. Instead of combining the messages as in Eq. (6), we simply buffer the new message of sensor and consider the message from sensor at the previous round as the final outcome of the phase :

(21) |

At the first subsequent round of our distributed algorithm where both messages and have non-zero values as test outcomes, is replaced by the message buffered in node . The rest of the protocol remains unchanged.

Then the decoding proceeds in two main steps. First, the appropriate unions of test matrix columns are created to form a search space and second, the Hamming distances between the test outcome vector and the vectors of the search set are computed. The minimum Hamming distance indicates the solution of the detection problem. The outcomes collected at some sensor are divided into two sets, i.e., the negative and positive outcome vectors and , respectively. Subsequently, the rows of the test matrix form two sub-matrices and and Eq. (7) is rewritten as:

(22) |

We eliminate non-defective sensors from using the knowledge from and obtain . The columns of interest are those columns of which contain at least one non-zero value. These columns are classified in sets , whose size depends on the complete or partial sensor knowledge about cluster affiliation of other sensors in the network. Columns belonging to the same cluster are grouped together in a set , where and is the number of clusters. The search space consists of vectors that are obtained from unions of up to columns, where each column is picked from a different set . We choose up to columns, since the number of defective elements can be smaller than by the problem definition, while the selection of at most one column from a particular comes from the assumption that at most one defective sensor exists in each cluster. For instance, let the number of defective sensors and clusters be . Let contain and contain columns. Then the search space size has in total elements, where denotes the number of unions of columns and single column subsets are chosen in ways. Distance decoding is performed between and elements of the set , starting from the vectors that are created as unions of columns towards the smaller number of column unions. If no solution exists for a particular value of , we perform the decoding for vectors built from column unions of . If no unique solution is found, we encounter a decoding failure.
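The construction of the search space and the minimum-distance decision can be sketched as follows. The sketch assumes a plain Boolean, noiseless test matrix and simplifies the procedure to a single global search for a unique minimum Hamming distance, instead of stepping from the largest unions down to the smaller ones. The names `build_search_space` and `decode_multiple`, the cluster layout and the toy matrix are hypothetical.

```python
from itertools import combinations, product

def build_search_space(cluster_columns, max_defects):
    """Candidate column sets: at most one column per cluster and at most
    max_defects columns in total, largest unions first."""
    space = []
    for size in range(min(max_defects, len(cluster_columns)), 0, -1):
        for clusters in combinations(cluster_columns, size):
            space.extend(product(*clusters))
    return space

def union_of_columns(matrix, cols):
    """Boolean OR of the selected columns, row by row."""
    return [int(any(row[c] for c in cols)) for row in matrix]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode_multiple(matrix, outcomes, cluster_columns, max_defects):
    """Return the unique minimum-distance candidate, or None on a tie."""
    best, best_dist, unique = None, None, True
    for cols in build_search_space(cluster_columns, max_defects):
        dist = hamming(union_of_columns(matrix, cols), outcomes)
        if best_dist is None or dist < best_dist:
            best, best_dist, unique = cols, dist, True
        elif dist == best_dist:
            unique = False
    return best if unique else None

# Toy example: columns 0-2 form one cluster, columns 3-4 another.
clusters = [[0, 1, 2], [3, 4]]
matrix = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]
outcomes = union_of_columns(matrix, (1, 4))  # sensors 1 and 4 defective
space = build_search_space(clusters, 2)
decoded = decode_multiple(matrix, outcomes, clusters, 2)
```

With two clusters holding three and two candidate columns, the search space in this toy case has 3·2 + 3 + 2 = 11 elements, and the decoder recovers the pair of defective sensors.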

Now that the decoder has been described, we analyze in detail the number of messages required for the detection of multiple defective sensors with high probability.

###### Proposition 8

Under the assumption that at most one defective sensor is present in the cluster, that the number of available linearly independent messages at all sensors is at least per cluster, where and that sensors know membership identifiers of all the clusters in the network, the distance decoder detects defective sensors at all sensors in the network with high probability.

Recall that the transmission protocol ensures that the assumptions imposed by Proposition 7 hold for one defective sensor. Then, due to the assumption that at most one defective sensor is present in one cluster and that there is at most one defective sensor active in the test, we can form the set of solutions for the multiple defective case, which has a unique solution. Distance decoding between the outcome vector and the limited set of vectors that form the full search space can therefore find the appropriate solution. In other words, this procedure is identical to per-cluster decoding, where each cluster has at most one defective element, so Proposition 7 can be applied.

###### Proposition 9

Under the assumption that one defective sensor at most is present in the cluster, that the number of available linearly independent messages at all sensors in the network is at least per cluster, where and sensors know the partial set of identifiers of the clusters in the network, the distance decoder detects defective sensors at all sensors in the network with high probability.

The search space created in this case is larger but it contains the solution. Now the proof is identical to that in the previous proposition.

Finally, we show that the assumption of at most one defective sensor occurrence per cluster is reasonable. We here bound the probability that at least two defective sensors occur within any cluster. An erroneous message is generated in a cluster that contains more than one defective sensor when only a fraction of defective sensors participate in the test actively and we denote the probability of such an event with . If defective sensors participate in the test, the distance within the column that signifies these vectors and the outcome result does not change. The same occurs if none of the defective sensors participate in a test. Due to the protocol modification, only one cluster may generate the erroneous message per round. In total we assume there are , defective sensors and that clusters contain sensors. Then, the probability of decoding error in one cluster is equal to:

(23) |

due to independence of parameters and . represents the probability that some cluster contains defective sensors, is a probability of choosing defective sensors within a cluster with sensors and denotes the conditional probability of the error occurrence in a cluster with defective sensors and test participation probability . We assume that takes a value from the set with uniform distribution, so . Next, (Appendix A-A). Total error probability for clusters is bounded by , so:

(24) |

We use the well known binomial coefficient inequality that holds for where and , to bound the value:

(25) |

We rewrite by using a well known inequality as . Plugging these expressions to the previous expression and performing simple calculations we finally obtain:

(26) |

For the network values this probability is bounded with .

The distance decoder error probability due to our assumption that only one defective sensor is present in the network is small. In addition, the decoder threshold value can be updated to increase the robustness. We increase the value of threshold parameter as , where and is the expected number of non-zero test outcomes. It is set to the total number of observed positive test outcomes.

## IV Performance evaluation

### IV-A Setup

In this section, we investigate the performance of our distributed detection method, denoted as GP, in various scenarios. We first examine the influence of the different network parameters on the rate of dissemination of messages. Next, we examine the decoding probability for both single and multiple defective sensor(s) detection. The number of system rounds required to collect the necessary number of messages for accurate decoding varies with the topology. The simulations are performed for fully connected, -connected and irregular graphs. Finally, we discuss the number of linearly independent measurements needed for successful detection and compare it with the theoretical one.

We also analyze the performance of several alternative schemes, namely a Random Walk method that employs a Gossip mechanism with pull protocol (RWGP) and a classical Random Walk (RW) detection. A random walk determines the path of successive random dissemination message exchanges between neighbor sensors. In the RWGP method, the random walk is initiated at sensors (equivalent to the master sensors in the GP method) and terminates after a pre-determined number of rounds. The sensors create messages from the sensor measurements collected along the random walk path. These messages are transmitted with the gossip algorithm that uses a pull protocol. Note that, for identical choice of the sensors over rounds, RWGP and GP are identical. The RW method initiates the raw (uncompressed) measurements collection in random sensors and completes it in a given number of rounds. Every sensor that lies along the random walk path stores the values of all sensors along the transmission path. When all the sensors receive all the data, the process terminates.
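As an illustration of the random walk used by the RWGP and RW baselines, the following sketch draws a path of fixed length over a sensor graph. It is only a sketch under assumptions: the graph is given as an adjacency list, each hop picks a neighbor uniformly at random, and the names `random_walk`, `adjacency` and `rounds` are hypothetical rather than taken from the paper.

```python
import random


def random_walk(adjacency, start, rounds, rng=None):
    """Return the list of sensors visited by a random walk of `rounds` hops.

    adjacency: dict mapping each sensor to a list of its neighbor sensors
    start:     sensor at which the walk is initiated
    rounds:    pre-determined number of hops after which the walk terminates
    """
    rng = rng or random.Random()
    path = [start]
    for _ in range(rounds):
        neighbors = adjacency[path[-1]]
        if not neighbors:  # isolated sensor: the walk cannot continue
            break
        # Assumption: the next hop is chosen uniformly among the neighbors.
        path.append(rng.choice(neighbors))
    return path
```

A walk may revisit sensors, so the number of distinct measurements collected along `path` can be smaller than `rounds + 1`, which is consistent with the observation that repeated visits hurt RWGP in sparsely connected networks.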

The GT algorithm is also compared with a Store-and-Forward (SF) and a Greedy Store-and-Forward (GSF) method that employs pull protocol. Both algorithms disseminate raw sensor measurements. For the SF method, upon receiving a message request, a node responds by forwarding randomly chosen messages from the available set of messages. In GSF, each sensor randomly requests the innovative measurements in a greedy manner from its randomly chosen neighbor sensor. This procedure involves additional message exchange among sensors in every round.

We analyze the performance of these algorithms in fully connected, k-regular and irregular networks. To construct irregular sensor networks, we place sensors randomly in a unit square area. Sensors that lie within a certain radius can communicate and exchange messages directly. In each case, we build different network realizations and for each such realization we perform independent simulations. The results are averaged over all simulations.
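The construction of irregular networks described above corresponds to a random geometric graph. The sketch below is one possible way to generate such a topology; the uniform placement, the function names and the connectivity check are assumptions made for illustration, since the text does not specify how disconnected realizations are handled.

```python
import math
import random
from collections import deque


def random_geometric_graph(n_sensors, radius, rng=None):
    """Place sensors uniformly in the unit square and link those within `radius`."""
    rng = rng or random.Random()
    positions = [(rng.random(), rng.random()) for _ in range(n_sensors)]
    adjacency = {i: [] for i in range(n_sensors)}
    for i in range(n_sensors):
        for j in range(i + 1, n_sensors):
            if math.dist(positions[i], positions[j]) <= radius:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return positions, adjacency


def is_connected(adjacency):
    """Breadth-first search check that every sensor is reachable from the first one."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(adjacency)
```

A realization can be redrawn until `is_connected` returns `True`, which matches the modeling of the network as a connected graph.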

### IV-B Influence of the master node selection process

First, we study the influence of networks’ capability to generate innovative messages on the decoder performance. We consider two different methods for selecting master sensors: random master sensor selection (RM) and deterministic master sensor (DM) selection. Fig. 5 illustrates the detection probability and the achieved average rank with respect to the number of message dissemination rounds, for fully connected graphs with sensors and one () defective sensor. We observe that the performance depends on and for both RM and DM. These values should be selected properly in order to maximize the information diversity in the network. Specifically, we observe that RM achieves the maximum message diversity for (maximum value) since the diversity of messages in this case is maximized by construction in Fig. 5. We can also note that the number of clusters does not affect significantly the detection performance of RM. On the contrary, for DM both parameters and are important. Small values of guarantee high message diversity. This is due to the fact that DM requires more rounds to receive enough messages for detection. In the following, we focus on RM selection where possible (that is, for ), as it provides higher probability of creating innovative messages.


### IV-C Detection performance

We first consider the case of a single defective sensor (). The detection probability and the average rank evolution over rounds are examined for fully connected (FG) and -connected regular networks (RG) with sensors degree . For all cases, the network consists of sensors. From Fig. 6 we see that networks with higher number of connections achieve faster dissemination of innovative messages. We also note that high connectivity value is beneficial, but it cannot drive by itself the performance of our detection scheme. It should be combined with appropriate choice of network parameters, as discussed earlier. For example, RM master sensor selection for achieves better detection performance, compared to that of fully connected graphs.


In Fig. 7, we illustrate the detection probability for random graphs ( simulations per different graph) with , defective sensor, random clusters and minimum sensors’ degree . We observe that random graphs require more rounds on average for successful detection, as expected. We also observe that the detection performance decreases because of the limited message diversity (smaller probability of receiving innovative messages) and the low connectivity. Similarly, Fig. 8 presents results for larger networks, which are in accordance with the above.


We then consider the case of multiple defective sensors. In Figs. 9 and 10 we present results for the cases with two defective sensors () in networks of sensors. The results are given in terms of the average detection probability over dissemination rounds, for both fully and irregularly connected graphs. The master sensors are selected deterministically (DM) due to the decoder design for the identification of multiple defective sensors. Note that this example violates the condition and the performance of the detection algorithm is poor. In addition, results for and are depicted in Figs. 11 and 12. We focus on the evolution of the decoding probability and the average number of messages collected over rounds. The evaluation shows that the detection performance is reasonable when the selected parameter values favor diverse message generation.


In [19], a centralized system has been proposed, which can be considered as dual to fully connected networks with centralized tests (single master sensor that covers all the network). For comparison, we compute the required number of measurements for networks with: and . The results are reported in Table I. We observe that the worst case analysis leads to a higher number of dissemination rounds than observed in practice. However, these values decrease relative to the growth in the number of sensors in the network. Simulations show that in practice the required measurements are significantly fewer.

S=20 | S=70 | |||

130 | (115-244) | (174-217) | (125-284) |

The detection probability of the proposed method is compared with that of several other detection methods in Figs. 13 and 14, for and sensors respectively. The proposed scheme outperforms all other methods. Note that the number of necessary rounds in the RWGP scheme is large compared to the other schemes, while RW needs higher communication overhead for dissemination due to the transmission of raw sensor measurements. Average rank values over the network rounds are illustrated in Fig. 15. We observe that for the fixed detection probability for the network with sensors the average number of system rounds required for the proposed method is approximately and , respectively. The number of system rounds required by the other algorithms to reach the same detection probability is higher, especially for the network with sensors.


### IV-D Communication overhead

For the sake of completeness, we analyze the communication cost of the proposed gossiping protocol and compare it with those of the other schemes. Let and denote the number of bits needed for raw measurements transmission and sensor identifier, respectively. Recall that the tuple stands for the number of sensors in the network, the number of master sensors (clusters), the number of neighbors that each master is connected with, the average number of sensors per cluster () and the total number of transmission rounds.

During the first phase of GP, the master sensors receive raw measurements from their neighbors. Thus, bits are used for communicating these values. Further, the master sensors create binary messages and send them to their neighbors. Every neighbor requires knowledge about the identifier of sensors that participate in a test, thus the cost is bits, plus an additional bit in each message for sending the outcome result. Hence, the overall bit consumption is . In the message exchange phase bits are required, from which bits are reserved for the test outcome and the test matrix row . Note that this analysis includes the full vector size and it can be further compressed. The overall number of transmitted bits over rounds is given by:

(27) |

We compare the communication cost of GP with that of RWGP, which also takes place in two phases. The first phase represents the random walk message collection, while the second is equivalent to the GP algorithm. Note that in the special case when RWGP and GP collect exactly the same data, they have identical decoding performance. However, if RWGP visits some sensors several times (more probable in irregular networks with a smaller connectivity degree), it performs worse than GP. In typical simulations, a random walk of RWGP terminates after transmission round, where is the number of elements per cluster in GP. RWGP transmits raw measurements, which results in bits. Therefore, the communication cost for RWGP is given by:

(28) |

The bit transmission requirement for the algorithm is equivalent to that of the first step of RWGP, since it also transmits raw data. The detection is performed at nodes by comparison of known sensor values at that moment, without a message design step. The number of transmitted bits is equal to: . Recall that for transmission of a message to all the nodes in a fully connected graph, one requires transmissions. Therefore, the SF algorithm requires in total bits.

The comparison between the proposed method and all other schemes regarding the bits spent for communication is illustrated in Fig. 16 for a fully connected graph. Note that the proposed algorithm in this setup requires only rounds for efficient detection (Fig. 5), but it consumes approximately three times more communication overhead than the RWGP algorithm. However, due to the specific collection approach (hops), one transmission round of RWGP lasts ten times longer than that of the proposed algorithm. From the figure we can observe that the RW algorithm has very small communication overhead. However, it requires a significantly higher number of rounds ( rounds) compared to the detection time of the proposed GP algorithm. Overall, the proposed GP scheme is able to compete with the other schemes in terms of bits used until detection.


## V Conclusion

In this work, we have addressed the problem of distributed failure detection in sensor networks. We have proposed a novel distributed algorithm that is able to detect a small number of defective sensors in a network. We have designed a probabilistic message propagation algorithm that allows the use of a simple and efficient distance decoder at sensors. The transmitted messages are formed from local sensor observations and they are communicated using a gossip algorithm. We have derived, for the worst case scenario, the lower bound on the required number of linearly independent messages that sensors need to collect per cluster to ensure detection of one defective sensor with high probability. We have shown experimentally that this number is considerably smaller in practice, even for small networks, which confirms the validity of the theoretical bound. The experimental results have shown that the proposed method outperforms other detection schemes in terms of successful detection probability. The convergence rate is very fast, which largely compensates for the higher communication overhead.

## References

- [1] R. Dorfman, “The detection of defective members of large populations,” Annals of Mathematical Statistics, vol. 14, pp. 436–440, 1943.
- [2] M. Young and R. Boutaba, “Overcoming adversaries in sensor networks: A survey of theoretical models and algorithmic approaches for tolerating malicious interference,” IEEE Communications Surveys and Tutorials, vol. 13, pp. 617–641, April 2011.
- [3] H-B. Chen and F. K. Hwang, “A survey on nonadaptive group testing algorithms through the angle of decoding,” J. Comb. Optim., vol. 15, pp. 49–59, 2008.
- [4] W. Dai and O. Milenkovic, “Weighted superimposed codes and constrained integer compressed sensing,” IEEE Trans. Inform. Theory, vol. 55, pp. 2215–2229, May 2009.
- [5] A. De Bonis and U. Vaccaro, “Constructions of generalized superimposed codes with applications to group testing and conflict resolution in multiple access channels,” Theor. Comput. Sci., vol. 306, no. 1-3, pp. 223–243, 2003.
- [6] P. Indyk, H. Q. Ngo, and A. Rudra, “Efficiently decodable non-adaptive group testing,” in Proc. of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, 2010, pp. 1126–1142.
- [7] M. Cheraghchi, A. Karbasi, S. Mohajer, and V. Saligrama, “Graph-constrained group testing,” Proc. of Int. Symp. on Inform. Theory (ISIT), pp. 1913–1917, 2010.
- [8] M. Mézard and C. Toninelli, “Group testing with random pools: Optimal two-stage algorithms,” IEEE Trans. Inform. Theory, vol. 57, no. 3, pp. 1736–1745, March 2011.
- [9] Y.-W. Hong and A. Scaglione, “Group testing for sensor networks: The value of asking the right question,” 38th Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1297–1301, 2004.
- [10] P. K. Varshney, Distributed Detection and Data Fusion, Springer-Verlag New York, Inc., 1st edition, 1996.
- [11] J. N. Tsitsiklis, “Decentralized detection,” Proc. of Advanced Statistical Signal Processing, vol. 2-Signal Detection, pp. 297–344, 1993.
- [12] Q. Tian and E. J. Coyle, “Optimal distributed detection in clustered wireless sensor networks,” IEEE Trans. on Signal Proc., vol. 55, no. 7, pp. 3892–3904, 2007.
- [13] R. Viswanathan and P. K. Varshney, “Distributed detection with multiple sensors: Part I-Fundamentals,” Proc. IEEE, vol. 85, no. 1, pp. 54–63, Jan. 1997.
- [14] R. S. Blum, S. A. Kassam, and H. V. Poor, “Distributed detection with multiple sensors: Part II-Advanced topics,” Proc. IEEE, vol. 85, no. 1, pp. 64–79, Jan. 1997.
- [15] A. Dimakis, S. Kar, J. M. F. Moura, M. G. Rabbat, and A. Scaglione, “Gossip algorithms for distributed signal processing,” Proc. IEEE, vol. 98, pp. 1847–1864, Nov. 2010.
- [16] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry, “Epidemic algorithms for replicated database maintenance,” pp. 1–12, 1987.
- [17] R. Karp, C. Schindelhauer, S. Shenker, and B. Vöcking, “Randomized rumor spreading,” pp. 565–574, 2000.
- [18] S. Deb, M. Medard, and C. Choute, “Algebraic gossip: A network coding approach to optimal multiple rumor mongering,” IEEE Trans. Inform. Theory, vol. 52, no. 6, pp. 2486–2507, 2006.
- [19] M. Cheraghchi, A. Hormati, A. Karbasi, and M. Vetterli, “Group testing with probabilistic tests: Theory, design and application,” IEEE Trans. Inform. Theory, vol. 57, no. 10, pp. 7057–7067, Oct. 2011.
- [20] R. Gallager, “Low-density parity-check codes,” Monograph, M.I.T. Press, 1963.

## Appendix A Appendix

### A-A Model for probability

models the probability of the event that multiple defective sensors are present in the same cluster but only a subset of the defective sensors participates in the test. This event introduces errors during the detection of defective sensors. Recall that sensors participate in the test with the probability . For defective sensors possible message realizations are given with elements of the polynomial . This polynomial represents the binomial expansion of the form , with and . Polynomial expansion is equal to and the coefficients represent the numbers of -th row of Pascal’s triangle. Messages that do not cause decoding error are the messages of all zeros and of all ones. These messages occur with probabilities and , respectively and they have coefficients equal to . Note that and that probability of error event is therefore equal to:

(29) |
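The error event described in this appendix can be checked numerically. The sketch below assumes hypothetical symbol names, `q` for the test participation probability and `n_defective` for the number of defective sensors in the cluster, and assumes that sensors participate independently. Under these assumptions, removing the all-zeros and all-ones participation patterns from the binomial expansion leaves a probability of `1 - q**n - (1 - q)**n`, which the brute-force enumeration confirms.

```python
from itertools import product


def partial_participation_probability(n_defective, q):
    """Closed form: probability that some, but not all, defective sensors participate."""
    return 1.0 - q ** n_defective - (1.0 - q) ** n_defective


def partial_participation_brute_force(n_defective, q):
    """Same probability, obtained by summing over all participation patterns."""
    total = 0.0
    for pattern in product((0, 1), repeat=n_defective):
        active = sum(pattern)
        # Only mixed patterns (neither all zeros nor all ones) cause an error.
        if 0 < active < n_defective:
            total += q ** active * (1.0 - q) ** (n_defective - active)
    return total
```

For a single defective sensor the mixed event is impossible, so both functions return zero, in line with the single-defective analysis.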