Distributed Detection over Time Varying Networks: Large Deviations Analysis

Dragana Bajović, Dušan Jakovetić, João Xavier, Bruno Sinopoli and José M. F. Moura. This work is partially supported by: the Carnegie Mellon-Portugal Program under a grant from the Fundação para a Ciência e Tecnologia (FCT) from Portugal; by FCT grant SIPM PTDC/EEA-ACR/73749/2006; and by ISR/IST plurianual funding (POSC program, FEDER). The work of José M. F. Moura is partially supported by NSF under grants CCF-1011903 and CCF-1018509, and by AFOSR grant FA95501010291. Dragana Bajović and Dušan Jakovetić hold fellowships from FCT. D. Bajović and D. Jakovetić are with the Institute for Systems and Robotics (ISR), Instituto Superior Técnico (IST), Lisbon, Portugal, and with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA (dbajovic@andrew.cmu.edu, djakovet@andrew.cmu.edu). J. Xavier is with the Institute for Systems and Robotics (ISR), Instituto Superior Técnico (IST), Lisbon, Portugal (jxavier@isr.ist.utl.pt). B. Sinopoli and J. M. F. Moura are with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA (brunos@ece.cmu.edu, moura@ece.cmu.edu).
Abstract

We apply large deviations theory to study the asymptotic performance of running consensus distributed detection in sensor networks. Running consensus is a recently proposed stochastic approximation type algorithm. At each time step $k$, the state at each sensor is updated by a local averaging of the sensor's own state and the states of its neighbors (consensus) and by accounting for the new observations (innovation). We assume Gaussian, spatially correlated observations. We allow the underlying network to be time varying, provided that the graph that collects the union of links that are online at least once over a finite time window is connected. This paper shows through large deviations that, under the stated assumptions on the network connectivity and the sensors' observations, running consensus detection asymptotically approaches in performance the optimal centralized detection. That is, the Bayes probability of detection error (with the running consensus detector) decays exponentially to zero as $k \to \infty$ at the Chernoff information rate, the best achievable rate of the asymptotically optimal centralized detector.


I Introduction

We apply large deviations to study the asymptotic performance of distributed detection in sensor networks. Each node in the network senses the environment and cooperates locally with its neighbors to decide between the two hypotheses, $H_0$ and $H_1$. The nodes are connected by a generic, time varying network, and there is no fusion center. Specifically, we consider distributed detection via running consensus, which has been recently proposed in [2]. (The running consensus algorithm is a type of recursive stochastic approximation algorithm; see, e.g., [1]. Reference [1] studies more general stochastic approximation type algorithms in the context of distributed estimation. We use the algorithm in the form given in [2] and will refer to it as running consensus.) With running consensus, at each time $k$, nodes update their decision variables by: 1) incorporating the new observations (innovation step); and 2) mixing their decision variables locally with the neighbors (consensus step).

We allow the underlying communication graph to be (deterministically) time varying, but we assume that the graph that collects all communication links that are online (at least once) within a finite time window is connected. We assume Gaussian, spatially correlated, temporally uncorrelated sensor observations. Under the stated assumptions on the network connectivity and the sensors' observations, we show that the running consensus distributed detector is asymptotically optimal, as the number of observations goes to infinity. That is, the running consensus distributed detector asymptotically approaches the performance of the optimal centralized detector. We apply large deviations to study the asymptotic performance of both the (asymptotically) optimal centralized detector, which collects the observations from all nodes at each time $k$, and the running consensus detector. For both detectors, the Bayes probability of error decays exponentially in the number of observations $k$, with the exponent given by the Chernoff distance between the distributions of the observation vectors under the two hypotheses, i.e., the Chernoff information.

We now briefly review the existing work on distributed detection. Distributed detection has been extensively studied. Prior work studies parallel fusion architectures (see, e.g., [3, 4, 5, 6, 7, 8]), where all nodes communicate with a fusion node. Consensus-based detection schemes (with no fusion node) have also been studied in, for example, [9, 10, 11], where the nodes in the network: 1) collect measurements; and 2) subsequently run the consensus algorithm to fuse their detection rules. Distributed detection via running consensus has been proposed in [12]. Running consensus differs from classical consensus detection in that it incorporates new observations at each time step $k$, in real time; thus, unlike classical consensus, no delay is introduced between collecting observations and reaching consensus.

We now comment on the differences between this paper and reference [12], which also studies the asymptotic optimality of distributed detection via running consensus. Reference [12] considers the Neyman-Pearson framework, while we adopt the Bayesian framework. Reference [12] considers that, as the number of observations grows, the distribution means under the two hypotheses become closer and closer, at the rate of $1/\sqrt{k}$; consequently, as $k \to \infty$, there is an asymptotic, nonzero probability of miss and an asymptotic, nonzero probability of false alarm. In contrast, we assume that the distributions do not change with $k$ (do not approach each other), and the Bayes probability of error decays to zero; we then examine the rate of decay of the Bayes error probability. Further, reference [12] assumes that the observations at different sensors are independent identically distributed with a generic distribution, while we assume Gaussian observations; however, we allow for spatial correlation among the observations, a well-suited assumption, e.g., for densely deployed wireless sensor networks (WSNs). Finally, reference [12] studies the case where the underlying network is randomly varying, while we consider a deterministically time varying network.

Paper organization. Section II reviews the large deviations results and the Chernoff lemma in hypothesis testing. Section III explains the data and network models that we assume. Section IV introduces the (asymptotically) optimal centralized detector, as if there were a fusion node, and studies its detection performance. Section V shows that the distributed running consensus detector asymptotically approaches in performance the optimal centralized detector. Finally, Section VI summarizes the paper.

II Background

In this section, we briefly review the large deviations analysis for binary hypothesis testing and the standard asymptotic results, in particular, the Chernoff lemma. We will use these results throughout the paper.

II-A Binary hypothesis testing problem: Log-likelihood ratio test

Consider a sequence of independent identically distributed (i.i.d.) $N$-dimensional random vectors (observations) $y(k)$, $k = 1, 2, \ldots$, and the binary hypothesis testing problem of deciding whether the probability measure (law) generating $y(k)$ is $\nu_0$ (under hypothesis $H_0$) or $\nu_1$ (under $H_1$). Assume that $\nu_0$ and $\nu_1$ are mutually absolutely continuous, distinguishable measures. Formally, a decision test based on the observations $y(1), \ldots, y(k)$ is a sequence of maps $T_k : \mathbb{R}^{N k} \to \{0, 1\}$, $k = 1, 2, \ldots$, with the interpretation that $T_k = l$ means that $H_l$ is decided, $l = 0, 1$. Specifically, consider the log-likelihood ratio (LLR) test to decide between $H_0$ and $H_1$, where the decision variable $D(k)$ is given as follows:

$$D(k) = \frac{1}{k} \sum_{j=1}^{k} L(j), \qquad L(j) = \log \frac{d\nu_1}{d\nu_0}\left( y(j) \right) \qquad (1)$$
$$T_k = \mathbb{1}\left\{ D(k) \geq \gamma \right\} \qquad (2)$$

Here $L(j)$ is the LLR (given by the Radon-Nikodym derivative of $\nu_1$ with respect to $\nu_0$, evaluated at $y(j)$), $\gamma$ is a chosen threshold, and $\mathbb{1}\{A\}$ is the indicator of the event $A$. The LLR test with zero threshold, $T_k = \mathbb{1}\{D(k) \geq 0\}$, is asymptotically optimal in the sense of the Bayes probability of error decay rate, as will be explained in the next subsection (II-B).
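For concreteness, here is a minimal Python sketch of the LLR test in eqns. (1)-(2). The particular laws ($\nu_0 = \mathcal{N}(0,1)$, $\nu_1 = \mathcal{N}(1,1)$) and all numerical values are illustrative assumptions, not part of the development above.

```python
# Sketch: the LLR test (1)-(2) for an assumed scalar example,
# nu_0 = N(0,1) versus nu_1 = N(1,1).
import numpy as np

rng = np.random.default_rng(0)

def llr(y):
    # log(dnu_1/dnu_0)(y) for N(1,1) vs N(0,1) equals y - 1/2
    return y - 0.5

k = 2000
y = rng.normal(loc=1.0, size=k)                 # observations drawn under H_1
D = np.cumsum(llr(y)) / np.arange(1, k + 1)     # decision variable D(k), eqn. (1)
T = (D >= 0).astype(int)                        # zero-threshold LLR test, eqn. (2)
print("decisions over time (1 = accept H_1):", T[-5:])
```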

II-B Log-likelihood ratio test: Large deviations

This subsection studies large deviations for the LLR decision test with the decision variable $D(k)$ given in eqn. (1). The large deviations analysis will be very useful in estimating the exponential rate at which the Bayes probability of error decays and in showing the asymptotic optimality of the distributed running consensus detector. We first give the definition of the large deviations principle [13].

Definition 1 (Large deviations principle (LDP))

Consider a sequence of real valued random variables $\{Z(k)\}$ and denote by $\mu_k$ the probability measure of $Z(k)$. We say that the sequence of measures $\{\mu_k\}$ satisfies the LDP with a rate function $I : \mathbb{R} \to [0, +\infty]$ if the following holds:

  1. For any closed, measurable set $F$: $\limsup_{k \to \infty} \frac{1}{k} \log \mu_k(F) \leq - \inf_{z \in F} I(z)$.

  2. For any open, measurable set $E$: $\liminf_{k \to \infty} \frac{1}{k} \log \mu_k(E) \geq - \inf_{z \in E} I(z)$.

It can be shown that the sequence of LLRs $\{L(j)\}$, conditioned on $H_l$, $l = 0, 1$, is i.i.d. Denote by $\mu_{k,l}$ the probability measure of $D(k)$ under hypothesis $H_l$. Using Cramér's theorem ([13]), it can be shown that the sequence of measures $\{\mu_{k,l}\}$, $k = 1, 2, \ldots$, satisfies the LDP with the good (a rate function is good if its sublevel sets are compact) rate function:

$$I_l(z) = \sup_{\lambda \in \mathbb{R}} \left( \lambda z - \Lambda_l(\lambda) \right) \qquad (3)$$

where $\Lambda_l$ is the log-moment generating function of $L(1)$ under hypothesis $H_l$:

$$\Lambda_l(\lambda) = \log \mathbb{E}\left[ e^{\lambda L(1)} \mid H_l \right] \qquad (4)$$

That is, the rate function $I_l$ is the Fenchel-Legendre (F-L) transform ([13]) of the log-moment generating function of $L(1)$ under $H_l$. It can be shown that $\Lambda_1(\lambda) = \Lambda_0(\lambda + 1)$, and hence $I_1(z) = I_0(z) - z$. We summarize this result in the following theorem, e.g., [13]:

Theorem 2

The sequence of measures $\{\mu_{k,l}\}$ of $D(k)$ under $H_l$ satisfies the LDP with the good rate function $I_l$ given by eqn. (3).
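As a numerical illustration of eqns. (3)-(4), the following sketch evaluates the F-L transform on a grid of $\lambda$ values and compares it with the closed form that holds when $L(1)$ is Gaussian, as in the setting of Section IV; the value $d2 = \sigma_L^2$ below is an assumed placeholder.

```python
# Sketch: the Fenchel-Legendre transform (3) of the log-moment generating
# function (4), computed on a grid and compared against the Gaussian
# closed form. The parameter d2 is an illustrative assumption.
import numpy as np

d2 = 2.0                                             # assumed variance of L(1) under H_0
Lambda0 = lambda lam: -lam * d2 / 2 + lam**2 * d2 / 2  # log-MGF of L(1) ~ N(-d2/2, d2)

lams = np.linspace(-10, 10, 20001)
def I0_numeric(z):
    return np.max(lams * z - Lambda0(lams))          # sup over the grid, eqn. (3)

for z in [-1.0, 0.0, 1.0]:
    closed = (z + d2 / 2) ** 2 / (2 * d2)            # F-L transform in closed form
    print(z, I0_numeric(z), closed)
```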

II-C Asymptotic Bayes detection performance: Chernoff lemma

We adopt the Bayes minimum probability of error detection. Denote by $P_e(k)$ the Bayes probability of error after $k$ samples are processed:

$$P_e(k) = \pi_0\, \alpha(k) + \pi_1\, \beta(k), \qquad \alpha(k) = \mathbb{P}\left( D(k) \geq \gamma \mid H_0 \right), \quad \beta(k) = \mathbb{P}\left( D(k) < \gamma \mid H_1 \right) \qquad (5)$$

where $\pi_0$ and $\pi_1$ are the prior probabilities, $\alpha(k)$ and $\beta(k)$ are, respectively, the probability of false alarm and the probability of miss, and $\gamma$ is the test threshold.

We will be interested in the rate at which the Bayes probability of error decays to zero as the number of observations $k$ goes to infinity. Also, as auxiliary results, we will need the rates at which $\alpha(k)$ and $\beta(k)$ go to zero as $k \to \infty$. That is, we will be interested in the following quantities:

$$\lim_{k \to \infty} \frac{1}{k} \log P_e(k) \qquad (6)$$
$$\lim_{k \to \infty} \frac{1}{k} \log \alpha(k) \qquad (7)$$
$$\lim_{k \to \infty} \frac{1}{k} \log \beta(k) \qquad (8)$$

Theorem 4 ([13]) states that, among all possible decision tests, the LLR test with zero threshold minimizes (6). This result is a corollary of Theorem 3 ([13]), which asserts that, for an LLR test with a fixed threshold $\gamma$, $\alpha(k)$ and $\beta(k)$ indeed (simultaneously) decay to zero exponentially; Theorem 3 also expresses the exponential rates of decay in (7) and (8) in terms of the rate functions $I_0$ and $I_1$. Before stating the theorem, define $\overline{L}_l = \mathbb{E}\left[ L(1) \mid H_l \right]$, $l = 0, 1$.

Theorem 3

The LLR test with a constant threshold $\gamma$, $\overline{L}_0 < \gamma < \overline{L}_1$, satisfies:

$$\lim_{k \to \infty} \frac{1}{k} \log \alpha(k) = - I_0(\gamma) \qquad (9)$$
$$\lim_{k \to \infty} \frac{1}{k} \log \beta(k) = - I_1(\gamma) \qquad (10)$$
Theorem 4 (Chernoff lemma)

If $\pi_0, \pi_1 \in (0, 1)$, then:

$$\lim_{k \to \infty} \frac{1}{k} \log \left( \inf_{T_k} P_e(k) \right) = - C, \qquad C := I_0(0) \qquad (11)$$

where the infimum over all possible tests $T_k$ is attained for the LLR test with zero threshold, $\gamma = 0$, i.e., $T_k = \mathbb{1}\{D(k) \geq 0\}$.

The quantity $C$ is called the Chernoff distance between the distributions of $y(k)$ under $H_0$ and under $H_1$, or the Chernoff information [13].
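Since $\Lambda_0$ is convex with $\Lambda_0(0) = \Lambda_0(1) = 0$, the Chernoff information can also be computed as $C = I_0(0) = -\min_{\lambda \in [0,1]} \Lambda_0(\lambda)$. The sketch below does this for the Gaussian problem treated later in Section IV, where $C$ has the closed form of eqn. (19); the concrete $m_0$, $m_1$, $S$ are assumptions for illustration.

```python
# Sketch: Chernoff information C for Gaussian hypotheses with a common
# covariance, via the closed form (1/8)(m1-m0)^T S^{-1} (m1-m0) and via
# -min_{lambda in [0,1]} Lambda_0(lambda). Parameter values are assumed.
import numpy as np

m0 = np.array([0.0, 0.0])
m1 = np.array([1.0, 0.5])
S = np.array([[1.0, 0.3],
              [0.3, 1.0]])                     # spatially correlated noise

d2 = (m1 - m0) @ np.linalg.solve(S, m1 - m0)   # (m1-m0)^T S^{-1} (m1-m0)
C_closed = d2 / 8

lam = np.linspace(0, 1, 10001)
Lambda0 = -lam * d2 / 2 + lam**2 * d2 / 2      # log-MGF of the LLR under H_0
C_numeric = -np.min(Lambda0)                   # minimum attained at lambda = 1/2

print(C_closed, C_numeric)                     # the two values coincide
```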

Asymptotically optimal test. We introduce the following definition of the asymptotically optimal test.

Definition 5

A decision test is asymptotically optimal if it attains the infimum in eqn. (11).

We will show that, for distributed Gaussian hypothesis testing over time varying networks, running consensus is asymptotically optimal in the sense of Definition 5.

III Distributed detection model: Data and Network models

This section describes: 1) the data model (subsection III-A), i.e., the observation model at each sensor in the network; and 2) the model of the network through which the sensors cooperate in the running consensus distributed detection algorithm (subsection III-B). The distributed detection algorithm itself is detailed in Section V.

III-A Data model

We consider Gaussian binary hypothesis testing in spatially correlated noise. The sensors operate (in terms of sensing and communication) synchronously, at discrete time steps $k = 1, 2, \ldots$. At time $k$, sensor $i$ measures the scalar $y_i(k)$. Collect the sensor measurements in the vector $y(k) = \left( y_1(k), \ldots, y_N(k) \right)^\top$, where $N$ is the total number of sensors. Nature can be in one of two possible states: the event occurring (e.g., target present), hypothesis $H_1$; and the event not occurring (e.g., target absent), hypothesis $H_0$. We assume the following distribution model for the vector $y(k)$:

$$y(k) = m_l + \zeta(k), \quad \text{under } H_l, \; l = 0, 1 \qquad (12)$$

where $m_l$ is the (constant) signal under hypothesis $H_l$, and $\zeta(k)$ is zero mean, additive Gaussian noise. We assume that $\{\zeta(k)\}$ is an independent identically distributed (i.i.d.) sequence of random vectors with distribution $\mathcal{N}(0, S)$, where $S$ is a (positive definite) covariance matrix. Thus, with our model, the noise is temporally independent, but it can be spatially correlated. Spatial correlation should be taken into account due to, for example, the dense deployment of wireless sensor networks, while it is still reasonable to assume that the observations are independent along time. (Conditioned on $H_l$, the observations $y(k)$ are i.i.d. with distribution $\mathcal{N}(m_l, S)$.)
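A short sketch of sampling from the model (12) follows; $m_0$, $m_1$, and $S$ are the same assumed illustrative values as in the earlier snippets. Note that the noise is i.i.d. across time but correlated across sensors.

```python
# Sketch: drawing observations from the data model (12) with spatially
# correlated, temporally independent Gaussian noise (assumed parameters).
import numpy as np

rng = np.random.default_rng(1)
N, k_max = 2, 1000
m0, m1 = np.zeros(N), np.array([1.0, 0.5])
S = np.array([[1.0, 0.3], [0.3, 1.0]])

Lc = np.linalg.cholesky(S)
zeta = rng.standard_normal((k_max, N)) @ Lc.T   # zeta(k) ~ N(0, S), i.i.d. in k
y_H1 = m1 + zeta                                # observations under H_1
y_H0 = m0 + zeta                                # observations under H_0
print(np.cov(y_H1.T))                           # sample covariance approximates S
```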

III-B Network model and data mixing model

We consider distributed detection via running consensus, where each node $i$, at time $k$: 1) measures $y_i(k)$; 2) exchanges its current decision variable (denote it by $x_i(k)$) with its neighbors; and 3) performs a weighted average of its own decision variable and the neighbors' decision variables. The network connectivity is assumed time varying. The weighted averaging at each time $k$, as with the standard consensus algorithm, is described by the weight matrix $W(k)$. We assume that $W(k)$ is a symmetric, stochastic matrix (it has nonnegative entries and its rows sum to 1). The weight matrix respects the sparsity pattern of the network, i.e., $W_{ij}(k) = 0$ if the link $\{i, j\}$ is down at time $k$. We also define the undirected graph $G(k) = (V, E(k))$, where $V$ is the set of nodes, with cardinality $|V| = N$, and $E(k)$ is the set of undirected edges that are online at time $k$. Formally, $E(k) = \left\{ \{i, j\} : \; W_{ij}(k) > 0, \; i \neq j \right\}$. Define also $J := \frac{1}{N} \mathbf{1} \mathbf{1}^\top$, where $\mathbf{1}$ is the $N \times 1$ vector with unit entries. We now summarize the assumptions on the matrices $W(k)$ and the graphs $G(k)$:

Assumption 6

For the sequence of matrices $\{W(k)\}$, we assume the following:

  1. $W(k)$ is symmetric and stochastic, for all $k$.

  2. There exists a scalar $\alpha$, $0 < \alpha < 1$, such that: i) $W_{ii}(k) \geq \alpha$, for all $i$, for all $k$; and ii) $W_{ij}(k) \geq \alpha$, for all $k$, if $i \neq j$ and $\{i, j\} \in E(k)$.

  3. There exists an integer $B \geq 1$ such that, for all $k = 0, 1, 2, \ldots$, the graph $\left( V, \; E(kB + 1) \cup E(kB + 2) \cup \cdots \cup E\left( (k+1)B \right) \right)$ is connected.

Assumption 6-3) says that the nodes should communicate sufficiently often (within finite time windows), so that the network provides a sufficiently fast information flow.
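The following sketch (an illustrative construction, not prescribed by the paper) builds a weight sequence $\{W(k)\}$ that satisfies Assumption 6 with $B = 2$: a 4-node path graph is split into two edge subsets that alternate over time, so that neither instantaneous graph is connected, but their union over any window of two steps is.

```python
# Sketch: an assumed time varying weight sequence satisfying Assumption 6
# with B = 2 and alpha = 0.25, on N = 4 nodes.
import numpy as np

N, alpha = 4, 0.25
E_even = [(0, 1), (2, 3)]     # edges online at even times
E_odd = [(1, 2)]              # edges online at odd times

def weight_matrix(edges):
    W = np.zeros((N, N))
    for i, j in edges:
        W[i, j] = W[j, i] = alpha          # symmetric off-diagonal weights
    np.fill_diagonal(W, 1 - W.sum(axis=1))  # rows sum to 1; diagonal >= alpha
    return W

W_seq = [weight_matrix(E_even if k % 2 == 0 else E_odd) for k in range(10)]
print(W_seq[0])   # stochastic, symmetric, entries respecting the sparsity pattern
```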

IV Centralized detection: Bayes optimal test

We first consider the centralized detection scenario, as if there were a fusion node that collects and processes the observations of all sensors. The decision variable $D(k)$ and the LLR decision test are given by eqns. (1) and (2), where now, under the data assumptions in subsection III-A:

$$L(k) = \left( m_1 - m_0 \right)^\top S^{-1} \left( y(k) - \frac{1}{2} \left( m_1 + m_0 \right) \right) \qquad (13)$$

Conditioned on either hypothesis, $H_0$ or $H_1$, $L(k)$ is Gaussian: $L(k) \sim \mathcal{N}\left( \overline{L}_l, \sigma_L^2 \right)$ under $H_l$, where

$$\overline{L}_1 = \frac{1}{2} \left( m_1 - m_0 \right)^\top S^{-1} \left( m_1 - m_0 \right) = - \overline{L}_0 \qquad (14)$$
$$\sigma_L^2 = \left( m_1 - m_0 \right)^\top S^{-1} \left( m_1 - m_0 \right) \qquad (15)$$

Define the vector $\eta \in \mathbb{R}^N$ as

$$\eta = S^{-1} \left( m_1 - m_0 \right) \qquad (16)$$

Then, the LLR can be written as follows:

$$L(k) = \sum_{i=1}^{N} \eta_i \left( y_i(k) - \frac{m_{1,i} + m_{0,i}}{2} \right) \qquad (17)$$

where $\eta_i$ denotes the $i$-th entry of the vector $\eta$, and $m_{l,i}$ the $i$-th entry of $m_l$, $l = 0, 1$. Thus, the LLR at time $k$ is separable, i.e., the LLR is a sum of terms each of which depends affinely on an individual observation $y_i(k)$. We will exploit this fact in subsection V-A to derive the distributed, running consensus, detection algorithm.
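The separability can be checked numerically; the sketch below verifies that the per-sensor form (17) coincides with the vector form (13), with $\eta$ from eqn. (16). Parameter values are assumed as before.

```python
# Sketch: checking that the separable LLR (17) matches the vector form (13),
# with eta = S^{-1}(m1 - m0) from (16). Assumed illustrative parameters.
import numpy as np

rng = np.random.default_rng(2)
m0, m1 = np.zeros(2), np.array([1.0, 0.5])
S = np.array([[1.0, 0.3], [0.3, 1.0]])
eta = np.linalg.solve(S, m1 - m0)                            # eqn. (16)

y = m1 + np.linalg.cholesky(S) @ rng.standard_normal(2)      # one observation under H_1
L_vec = (m1 - m0) @ np.linalg.solve(S, y - (m1 + m0) / 2)    # eqn. (13)
L_sep = np.sum(eta * (y - (m1 + m0) / 2))                    # eqn. (17)
print(np.isclose(L_vec, L_sep))                              # True
```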

Applying Theorem 2 to the sequence $\{D(k)\}$ (under hypothesis $H_l$, $l = 0, 1$), we have that the sequence of measures of $D(k)$ satisfies the LDP with a good rate function which, by evaluating the log-moment generating function of $L(k)$ in (13) and its F-L transform, can be shown to be:

$$I_l(z) = \frac{\left( z - \overline{L}_l \right)^2}{2\, \sigma_L^2}, \quad l = 0, 1 \qquad (18)$$

We state this result as Corollary 7.

Corollary 7

The sequence of measures of $D(k)$, under $H_l$, $l = 0, 1$, satisfies the LDP with the good rate function $I_l$ given by eqn. (18).

We remark that Theorem 4 also applies to the detection problem explained in subsection III-A. Denote by $P_e^{\mathrm{cen}}(k)$ the Bayes probability of error for the centralized detector (defined in Section IV) after $k$ samples are processed. Due to the continuity of the rate functions in (18), it can be shown that $C = I_0(0) = I_1(0) = \frac{1}{8} \left( m_1 - m_0 \right)^\top S^{-1} \left( m_1 - m_0 \right)$. Thus, Theorem 4 in this case simplifies to the following corollary:

Corollary 8

(Chernoff lemma for the optimal centralized detector) The LLR test with zero threshold, $T_k = \mathbb{1}\{D(k) \geq 0\}$, is asymptotically optimal in the sense of Definition 5. Moreover, for the LLR test with zero threshold, we have:

$$\lim_{k \to \infty} \frac{1}{k} \log P_e^{\mathrm{cen}}(k) = - C = - \frac{1}{8} \left( m_1 - m_0 \right)^\top S^{-1} \left( m_1 - m_0 \right) \qquad (19)$$

Remark. The LLR test with zero threshold is optimal also in the finite sample regime, for every $k$, in the sense that it minimizes the Bayes probability of error when the prior probabilities are equal, $\pi_0 = \pi_1 = 1/2$. When the prior probabilities are not equal, the LLR test is still optimal, but the optimal threshold is different from zero (namely, $\gamma_k = \frac{1}{k} \log \frac{\pi_0}{\pi_1}$).
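A Monte Carlo sanity check of Corollary 8 (with assumed parameters) is sketched below: the empirical exponent $\frac{1}{k} \log P_e^{\mathrm{cen}}(k)$ should be near $-C$, although sub-exponential prefactors make the finite-$k$ agreement only approximate.

```python
# Sketch: Monte Carlo check that the centralized Bayes error decays at
# roughly the Chernoff rate (19). Finite-k prefactors make the estimate
# only indicative. All parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(3)
m0, m1 = np.zeros(2), np.array([1.0, 0.5])
S = np.array([[1.0, 0.3], [0.3, 1.0]])
eta = np.linalg.solve(S, m1 - m0)
d2 = (m1 - m0) @ eta
shift = (m1 + m0) / 2

k, trials = 40, 50000
Lc = np.linalg.cholesky(S)
noise = rng.standard_normal((trials, k, 2)) @ Lc.T
D0 = ((m0 + noise - shift) @ eta).mean(axis=1)   # D(k) under H_0, per trial
D1 = ((m1 + noise - shift) @ eta).mean(axis=1)   # D(k) under H_1, per trial
Pe = 0.5 * np.mean(D0 >= 0) + 0.5 * np.mean(D1 < 0)   # equal priors
print(np.log(Pe) / k, -d2 / 8)                   # empirical exponent vs. -C
```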

V Distributed detection algorithm

V-A Distributed detection via running consensus

We now present the distributed detection algorithm via running consensus. With this detection algorithm, no fusion node is required, and the underlying network is generic and time varying. Running consensus was proposed in [2], and it is a stochastic approximation type algorithm (see [1]). Reference [2] studies the case when the observations of the different sensors at a fixed time $k$ are i.i.d. We extend the running consensus detection algorithm to the case of spatially correlated Gaussian observations.

With the running consensus distributed detector, each node $i$ makes local decisions based on its local decision variable $x_i(k)$: if $x_i(k) \geq 0$, then $H_1$ is accepted; if $x_i(k) < 0$, then $H_0$ is accepted. At each time step $k$, the local decision variable at node $i$ is improved two-fold: 1) by exchanging information with the immediate neighbors in the network; and 2) by incorporating the new local observation $y_i(k+1)$ into the decision process. Recall the form of the per-sensor LLR terms in eqn. (17). Specifically, the update of the local decision variable at node $i$ is given by the following equation:

$$x_i(k+1) = \frac{k}{k+1} \sum_{j \in O_i(k) \cup \{i\}} W_{ij}(k)\, x_j(k) + \frac{1}{k+1}\, \chi_i(k+1), \qquad \chi_i(k) := N\, \eta_i \left( y_i(k) - \frac{m_{1,i} + m_{0,i}}{2} \right) \qquad (20)$$

Here $O_i(k)$ is the (time varying) neighborhood of node $i$ at time $k$, and the $W_{ij}(k)$ are the (time varying) averaging weights, defined together with the (time varying) matrices $W(k)$ in subsection III-B. Let $x(k) = \left( x_1(k), \ldots, x_N(k) \right)^\top$ and $\chi(k) = \left( \chi_1(k), \ldots, \chi_N(k) \right)^\top$. The algorithm in matrix form is given by:

$$x(k+1) = \frac{k}{k+1}\, W(k)\, x(k) + \frac{1}{k+1}\, \chi(k+1), \qquad x(0) = 0 \qquad (21)$$
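A minimal simulation of the update (21) is sketched below, over the alternating-link network from the Assumption 6 sketch; the $N$-scaling of the innovation $\chi_i(k)$ follows the form assumed in (20). Each local variable $x_i(k)$ tracks the centralized decision variable $D(k)$.

```python
# Sketch: running consensus (21) on an assumed 4-node time varying network;
# the innovation chi_i(k) = N * eta_i * (y_i(k) - shift_i) is the scaling
# assumed in (20). The x_i(k) approach the centralized D(k).
import numpy as np

rng = np.random.default_rng(4)
N, k_max, alpha = 4, 3000, 0.25
m0, m1 = np.zeros(N), np.array([1.0, 0.5, -0.5, 0.8])
S = 0.5 * np.eye(N) + 0.5 * np.ones((N, N)) / N   # positive definite, correlated
eta = np.linalg.solve(S, m1 - m0)
shift = (m1 + m0) / 2
Lc = np.linalg.cholesky(S)

def W_of(k):   # alternating edge sets, as in the Assumption 6 sketch
    W = np.zeros((N, N))
    for i, j in ([(0, 1), (2, 3)] if k % 2 == 0 else [(1, 2)]):
        W[i, j] = W[j, i] = alpha
    np.fill_diagonal(W, 1 - W.sum(axis=1))
    return W

x = np.zeros(N)
D = 0.0
for k in range(1, k_max + 1):
    y = m1 + Lc @ rng.standard_normal(N)          # observations under H_1
    chi = N * eta * (y - shift)                   # innovation, eqn. (20)
    x = (k - 1) / k * W_of(k) @ x + chi / k       # eqn. (21)
    D += (eta @ (y - shift) - D) / k              # centralized D(k), eqn. (1)
print(x, D)   # all x_i(k) are close to D(k)
```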

Recall the definition of the vector $\eta$ in (16). The sequence of random vectors $\{\chi(k)\}$, conditioned on $H_l$, is i.i.d. The vector $\chi(k)$ (under hypothesis $H_l$, $l = 0, 1$) is Gaussian with mean $\overline{m}_l$ and covariance $\overline{S}$:

$$\overline{m}_l = N\, D_\eta \left( m_l - \frac{1}{2} \left( m_1 + m_0 \right) \right), \quad l = 0, 1 \qquad (22)$$
$$\overline{S} = N^2\, D_\eta\, S\, D_\eta \qquad (23)$$

Here $D_\eta$ is the diagonal matrix with the diagonal entries equal to the entries of $\eta$, i.e., $D_\eta = \operatorname{diag}\left( \eta_1, \ldots, \eta_N \right)$.

V-B Asymptotic optimality of the distributed detection algorithm

In this subsection, we present our main result, which states that the distributed detection via running consensus asymptotically achieves the performance of the optimal centralized detector, in the sense that it approaches the exponential error decay rate of the (asymptotically) optimal centralized detector.

Denote by $\mu_{k,l}^{i}$ the probability measure of $x_i(k)$ under hypothesis $H_l$. First, we show that the sequence of measures $\{\mu_{k,l}^{i}\}$, for every node $i$, satisfies the LDP with a good rate function; the rate function is the same for all nodes, and it is the same as the rate function of the optimal centralized detector in eqn. (18).

We prove that the sequence of measures of $x_i(k)$, $i = 1, \ldots, N$ (under $H_l$, $l = 0, 1$), satisfies the LDP using the Gärtner-Ellis theorem from large deviations theory; see [13]. We now state Theorem 9.

Theorem 9

Let Assumption 6 hold. Then the sequence of measures $\{\mu_{k,l}^{i}\}$, for every node $i$, satisfies the large deviations principle with a good rate function. The rate function is the same as for the optimal centralized detector and is given by $I_l$ in eqn. (18).

Before proving Theorem 9, define the matrices $\Phi(k, j)$, for $k \geq j$, as follows:

$$\Phi(k, j) := W(k)\, W(k-1) \cdots W(j), \quad k \geq j, \qquad \Phi(j-1, j) := I \qquad (24)$$

and remark that the algorithm in eqn. (21) can be written as:

$$x(k) = \frac{1}{k} \sum_{j=1}^{k} \Phi(k-1, j)\, \chi(j) \qquad (25)$$

Next, recall that $J = \frac{1}{N} \mathbf{1} \mathbf{1}^\top$, and introduce the notation:

$$\widetilde{W}(k, j) := \Phi(k, j) - J \qquad (26)$$

and remark that $\Phi(k, j) = J + \widetilde{W}(k, j)$.

To prove Theorem 9, we borrow the following result (Lemma 10) on the matrices $\widetilde{W}(k, j)$ from reference [14] (Lemma 3.2). First, denote by $\left[ \widetilde{W}(k, j) \right]_{st}$ the entry in the $s$-th row and $t$-th column of the matrix $\widetilde{W}(k, j)$.

Lemma 10

Let Assumption 6 hold. Then, for the matrices $\widetilde{W}(k, j)$ defined by eqn. (26), there holds:

$$\left| \left[ \widetilde{W}(k, j) \right]_{st} \right| \leq \theta\, \beta^{\,k - j + 1}, \quad \text{for all } s, t = 1, \ldots, N \qquad (27)$$

where $\theta > 0$ and $\beta \in (0, 1)$ are constants that depend only on $N$, $\alpha$, and $B$.

Lemma 10 says that, under Assumption 6, the entries of the matrix $\widetilde{W}(k, j)$ decay geometrically (in $k - j$) to zero. This fact will be important in proving Theorem 9.
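A quick numerical check of Lemma 10 (under the illustrative weight sequence used earlier): the maximal entry of $\Phi(k, 1) - J$ decays geometrically in $k$.

```python
# Sketch: the entries of Phi(k,1) - J decay geometrically, per Lemma 10,
# for the assumed alternating-link weight sequence.
import numpy as np

N, alpha = 4, 0.25
J = np.ones((N, N)) / N

def W_of(k):
    W = np.zeros((N, N))
    for i, j in ([(0, 1), (2, 3)] if k % 2 == 0 else [(1, 2)]):
        W[i, j] = W[j, i] = alpha
    np.fill_diagonal(W, 1 - W.sum(axis=1))
    return W

Phi = np.eye(N)
for k in range(1, 41):
    Phi = W_of(k) @ Phi                      # Phi(k,1) = W(k) ... W(1)
    if k % 10 == 0:
        print(k, np.abs(Phi - J).max())      # roughly beta^k for some beta < 1
```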

Proof of Theorem 9.

Define, for $\lambda \in \mathbb{R}$, the quantity:

$$\Lambda_k(\lambda) := \frac{1}{k} \log \mathbb{E}\left[ e^{\lambda\, k\, x_i(k)} \right] \qquad (28)$$
$$= \frac{1}{k} \log \mathbb{E}\left[ e^{\lambda \sum_{j=1}^{k} e_i^\top \Phi(k-1, j)\, \chi(j)} \right] \qquad (29)$$

where the expectation is taken under hypothesis $H_l$, $l = 0, 1$, and $e_i$ denotes the $i$-th column of the $N \times N$ identity matrix. We drop the dependence on $i$ and $l$ in the definition of $\Lambda_k$ for notational simplicity. Recall the expressions for $\overline{L}_l$ and $\sigma_L^2$ in eqns. (14) and (15). We will show, for all $\lambda \in \mathbb{R}$, the following equality:

$$\lim_{k \to \infty} \Lambda_k(\lambda) = \Lambda_l(\lambda) = \overline{L}_l\, \lambda + \frac{1}{2}\, \sigma_L^2\, \lambda^2 \qquad (30)$$

Consider the function $\lambda \mapsto \Lambda_l(\lambda)$ in (30); this function is essentially smooth and continuous, and its domain is the whole of $\mathbb{R}$; hence, by the Gärtner-Ellis theorem ([13], Theorem 2.3.6), $\{\mu_{k,l}^{i}\}$ (the sequence of measures of $x_i(k)$ under $H_l$) satisfies the LDP. The corresponding rate function equals the F-L transform of the function $\Lambda_l$; and it is easy to show that the F-L transform of $\Lambda_l$ equals the rate function $I_l$ given by eqn. (18). Thus, proving Theorem 9 reduces to showing (30). We thus proceed with showing (30). Namely, we have:

$$\Lambda_k(\lambda) = \frac{1}{k} \log \mathbb{E}\left[ \prod_{j=1}^{k} e^{\lambda\, e_i^\top \Phi(k-1, j)\, \chi(j)} \right] = \frac{1}{k} \sum_{j=1}^{k} \log \mathbb{E}\left[ e^{\lambda\, e_i^\top \Phi(k-1, j)\, \chi(j)} \right],$$

where the last equality holds because $\chi(j)$ is independent from $\chi(s)$, for $j \neq s$. We will be interested in computing the limit $\lim_{k \to \infty} \Lambda_k(\lambda)$, for all $\lambda \in \mathbb{R}$; in this respect, remark that

$$\log \mathbb{E}\left[ e^{\lambda\, e_i^\top \Phi(k-1, j)\, \chi(j)} \right] < +\infty$$

for all $\lambda \in \mathbb{R}$, because $\chi(j)$ is a Gaussian random vector and hence it has a finite log-moment generating function at any point.

Thus, $\Lambda_k(\lambda)$ is finite for all $k$ and all $\lambda$, and we proceed with its computation. The random variables $e_i^\top \Phi(k-1, j)\, \chi(j)$, $j = 1, \ldots, k$, are independent; moreover, they are Gaussian random variables, as linear transformations of the Gaussian vectors $\chi(j)$. Recall that $\overline{m}_l$ and $\overline{S}$ denote the mean and the covariance of $\chi(k)$ under hypothesis $H_l$. Using the independence of $\chi(j)$ and $\chi(s)$, for $j \neq s$, and using the expression for the moment generating function of a Gaussian random vector, we obtain successively:

$$\Lambda_k(\lambda) = \frac{1}{k} \sum_{j=1}^{k} \left( \lambda\, e_i^\top \Phi(k-1, j)\, \overline{m}_l + \frac{\lambda^2}{2}\, e_i^\top \Phi(k-1, j)\, \overline{S}\, \Phi(k-1, j)^\top e_i \right) \qquad (31)$$

Denote further:

$$\phi(j) := \Phi(k-1, j)^\top e_i = \frac{1}{N}\, \mathbf{1} + \widetilde{W}(k-1, j)^\top e_i, \qquad (32)$$

where the dependence on $k$ and $i$ is dropped in the definition of $\phi(j)$ for notational simplicity, and the second equality follows from eqn. (26). Then, since $\Phi(k-1, j)$ is stochastic, it is easy to see that $\mathbf{1}^\top \phi(j) = 1$. Also, we have:

$$\Lambda_k(\lambda) = \frac{1}{k} \sum_{j=1}^{k} \left( \lambda\, \phi(j)^\top \overline{m}_l + \frac{\lambda^2}{2}\, \phi(j)^\top \overline{S}\, \phi(j) \right).$$

Recall the expressions for $\eta$, $\overline{L}_l$, $\sigma_L^2$, $\overline{m}_l$, and $\overline{S}$ in eqns. (16), (14), (15), (22), (23). We proceed with the computation of $\Lambda_k(\lambda)$: writing $\phi(j) = \frac{1}{N}\, \mathbf{1} + \tilde{w}(j)$, with $\tilde{w}(j) := \widetilde{W}(k-1, j)^\top e_i$, and using $\frac{1}{N}\, \mathbf{1}^\top \overline{m}_l = \overline{L}_l$ and $\frac{1}{N^2}\, \mathbf{1}^\top \overline{S}\, \mathbf{1} = \sigma_L^2$, we obtain

$$\Lambda_k(\lambda) = \overline{L}_l\, \lambda + \frac{1}{2}\, \sigma_L^2\, \lambda^2 + \varepsilon_k(\lambda),$$

where $\varepsilon_k(\lambda)$ collects all the terms that involve $\tilde{w}(j)$.

We proceed by showing that $\varepsilon_k(\lambda) \to 0$ as $k \to \infty$, which implies the equality in eqn. (30). Define the quantities $a_j$, $b_j$, and $c_j$ by:

$$a_j := \lambda\, \tilde{w}(j)^\top \overline{m}_l \qquad (33)$$
$$b_j := \lambda^2\, \frac{1}{N}\, \mathbf{1}^\top \overline{S}\, \tilde{w}(j) \qquad (34)$$
$$c_j := \frac{\lambda^2}{2}\, \tilde{w}(j)^\top \overline{S}\, \tilde{w}(j) \qquad (35)$$

Then, it can be shown that $\varepsilon_k(\lambda)$ is bounded as follows:

$$\left| \varepsilon_k(\lambda) \right| \leq \frac{1}{k} \sum_{j=1}^{k} \left( |a_j| + |b_j| + |c_j| \right). \qquad (36)$$

Applying Lemma 10 to (36), and using the fact that the entries of $\overline{m}_l$ and $\overline{S}$ are bounded, we obtain successively, for some constants $\kappa_1, \kappa_2 > 0$ that depend on $\lambda$ but not on $k$:

$$\left| \varepsilon_k(\lambda) \right| \leq \frac{1}{k} \sum_{j=1}^{k} \left( \kappa_1\, \beta^{\,k-j} + \kappa_2\, \beta^{\,2(k-j)} \right) \qquad (37)$$
$$\leq \frac{\kappa_1 + \kappa_2}{k} \sum_{s=0}^{k-1} \beta^{\,s} \qquad (38)$$
$$\leq \frac{\kappa_1 + \kappa_2}{k} \cdot \frac{1}{1 - \beta} \qquad (39)$$
$$\longrightarrow 0, \quad \text{as } k \to \infty \qquad (40)$$

Letting $k \to \infty$, we get that $\varepsilon_k(\lambda) \to 0$, and hence $\lim_{k \to \infty} \Lambda_k(\lambda) = \overline{L}_l\, \lambda + \frac{1}{2}\, \sigma_L^2\, \lambda^2 = \Lambda_l(\lambda)$, which establishes eqn. (30). ∎

We are now ready to state the main result on the asymptotic optimality of the distributed detector (in the sense of Definition 5).

Corollary 11

(Chernoff lemma for the distributed detector: Asymptotic optimality) The local decision test $T_k^{i} = \mathbb{1}\left\{ x_i(k) \geq 0 \right\}$, at each node $i$, is asymptotically optimal in the sense of Definition 5. The corresponding exponential decay rate of the Bayes probability of error $P_e^{i}(k)$ at each node $i$ is given by:

$$\lim_{k \to \infty} \frac{1}{k} \log P_e^{i}(k) = - C = - \frac{1}{8} \left( m_1 - m_0 \right)^\top S^{-1} \left( m_1 - m_0 \right) \qquad (41)$$
Proof.

Denote by $\alpha_i(k)$ and $\beta_i(k)$, respectively, the probability of false alarm and the probability of miss for the distributed detector at sensor $i$, i.e.,

$$\alpha_i(k) = \mathbb{P}\left( x_i(k) \geq 0 \mid H_0 \right), \qquad \beta_i(k) = \mathbb{P}\left( x_i(k) < 0 \mid H_1 \right).$$

Consider now only $\alpha_i(k)$; the same argument applies to $\beta_i(k)$. By Theorem 9, the sequence of measures of $x_i(k)$ under $H_0$ satisfies the LDP with the good rate function $I_0$ given in eqn. (18). Thus, we have the following bounds:

$$\limsup_{k \to \infty} \frac{1}{k} \log \alpha_i(k) \leq - \inf_{z \in [0, +\infty)} I_0(z) \qquad (42)$$
$$\liminf_{k \to \infty} \frac{1}{k} \log \alpha_i(k) \geq - \inf_{z \in (0, +\infty)} I_0(z) \qquad (43)$$

Due to the continuity of the function $I_0$ (see eqn. (18)), the infima on the right-hand sides in eqns. (42) and (43) are equal; it is easy to see that they are both equal to $I_0(0) = C$. Thus, we have:

$$- C \leq \liminf_{k \to \infty} \frac{1}{k} \log \alpha_i(k) \leq \limsup_{k \to \infty} \frac{1}{k} \log \alpha_i(k) \leq - C.$$

From the last set of inequalities we conclude that:

$$\lim_{k \to \infty} \frac{1}{k} \log \alpha_i(k) = - C \qquad (44)$$

Similarly, it can be shown that:

$$\lim_{k \to \infty} \frac{1}{k} \log \beta_i(k) = - I_1(0) \qquad (45)$$
$$I_1(0) = I_0(0) = C \qquad (46)$$

Now, consider the Bayes probability of error at node $i$:

$$P_e^{i}(k) = \pi_0\, \alpha_i(k) + \pi_1\, \beta_i(k), \qquad (47)$$

for which the following inequalities hold:

$$\min\left\{ \pi_0, \pi_1 \right\} \left( \alpha_i(k) + \beta_i(k) \right) \leq P_e^{i}(k) \leq \max\left\{ \pi_0, \pi_1 \right\} \left( \alpha_i(k) + \beta_i(k) \right) \qquad (48)$$

By eqns. (48), we obtain:

$$\lim_{k \to \infty} \frac{1}{k} \log P_e^{i}(k) = \lim_{k \to \infty} \frac{1}{k} \log \left( \alpha_i(k) + \beta_i(k) \right) = - C,$$

where the last equality follows from eqns. (44)-(46). This establishes eqn. (41) and completes the proof. ∎