Randomized Sensor Selection in Sequential Hypothesis Testing

Abstract

We consider the problem of sensor selection for time-optimal detection of a hypothesis. A group of sensors transmit their observations to a fusion center, which considers the output of only one randomly chosen sensor at a time and performs a sequential hypothesis test. We restrict our attention to the class of sequential tests that are easy to implement, asymptotically optimal, and computationally amenable. For three distinct performance metrics, we show that, for a generic set of sensors and binary hypotheses, the fusion center needs to consider at most two sensors. We also show that, in the case of multiple hypotheses, the optimal policy requires at most as many sensors to be observed as there are underlying hypotheses.

Index Terms: Sensor selection, decision making, SPRT, MSPRT, sequential hypothesis testing, linear-fractional programming.

I Introduction

In today’s information-rich world, different sources are best informed about different topics. If the topic under consideration is well known beforehand, then one chooses the best source. Otherwise, it is not obvious which source, or how many sources, one should observe. This need to identify the sensors (information sources) to be observed in decision making problems arises in many common situations, e.g., when deciding which news channel to follow. When a person decides which information sources to follow, she relies in general upon her experience, i.e., she knows through experience what combination of news channels to follow.

In engineering applications, a reliable decision on the underlying hypothesis is made through repeated measurements. Given infinitely many observations, decision making can be performed accurately. Given a cost associated with each observation, a well-known tradeoff arises between accuracy and the number of iterations. Various sequential hypothesis tests have been proposed to detect the underlying hypothesis within a given degree of accuracy. There exist two different classes of sequential tests. The first class includes sequential tests developed from the dynamic programming point of view. These tests are optimal but, in general, difficult to implement [5]. The second class consists of easily-implementable and asymptotically-optimal sequential tests; a widely-studied example is the Sequential Probability Ratio Test (SPRT) for binary hypothesis testing and its extension, the Multi-hypothesis Sequential Probability Ratio Test (MSPRT).

In this paper, we consider the problem of quickest decision making using sequential probability ratio tests. Recent advances in cognitive psychology [7] show that the performance of a human performing decision making tasks, such as “two-alternative forced choice tasks,” is well modeled by a drift-diffusion process, i.e., the continuous-time version of the SPRT. Roughly speaking, modeling decision making as an SPRT process is therefore appropriate even for situations in which a human is making the decision.

Sequential hypothesis testing and quickest detection problems have been studied extensively [17, 4]. The SPRT for binary decision making was introduced by Wald in [21], and was extended by Armitage to multiple hypothesis testing in [1]. The Armitage test, unlike the SPRT, is not necessarily optimal [24]. Various other tests for multiple hypothesis testing have been developed over the years; a survey is presented in [18]. Designing hypothesis tests, i.e., choosing thresholds to decide within a given expected number of iterations, through any of the procedures in [18] is infeasible, as none of them provides results on the expected sample size. A sequential test for multiple hypothesis testing was developed in [5], [10], and [11], which provides an asymptotic expression for the expected sample size. This sequential test is called the MSPRT and reduces to the SPRT in the case of binary hypotheses.

Recent years have witnessed significant interest in the problem of sensor selection for optimal detection and estimation. Tay et al. [20] discuss the problem of censoring sensors for decentralized binary detection. They assess the quality of sensor data by the Neyman-Pearson and a Bayesian binary hypothesis test and decide which sensors should transmit their observations at each time instant. Gupta et al. [13] focus on stochastic sensor selection and minimize the error covariance of a process estimation problem. Isler et al. [14] propose geometric sensor selection schemes for error minimization in target detection. Debouk et al. [9] formulate a Markovian decision problem to ascertain some property in a dynamical system, and choose sensors to minimize the associated cost. Wang et al. [22] design entropy-based sensor selection algorithms for target localization. Joshi et al. [15] present a convex optimization-based heuristic to select multiple sensors for optimal parameter estimation. Bajović et al. [3] discuss sensor selection problems for Neyman-Pearson binary hypothesis testing in wireless sensor networks.

A third and final set of references related to this paper concerns linear-fractional programming. Various iterative and cumbersome algorithms have been proposed to optimize linear-fractional functions [8], [2]. In particular, for the problems of minimizing the sum and the maximum of linear-fractional functionals, efficient iterative algorithms have been proposed, including the algorithms by Falk et al. [12] and by Benson [6].

In this paper, we analyze the problem of time-optimal sequential decision making in the presence of multiple switching sensors, and determine a sensor selection strategy that achieves it. We consider a sensor network where all sensors are connected to a fusion center. The fusion center, at each instant, receives information from only one sensor. Such a situation arises when we have interfering sensors (e.g., sonar sensors), a fusion center with limited attention or information-processing capabilities, or sensors with shared communication resources. The fusion center implements a sequential hypothesis test with the gathered information. We consider two such tests, namely, the SPRT and the MSPRT for binary and multiple hypotheses, respectively. First, we develop a version of the SPRT and the MSPRT where the sensor is randomly switched at each iteration, and determine the expected time that these tests require to obtain a decision within a given degree of accuracy. Second, we identify the set of sensors that minimizes the expected decision time. We consider three different cost functions, namely, the conditioned decision time, the worst case decision time, and the average decision time. We show that the expected decision time, conditioned on a given hypothesis, using these sequential tests is a linear-fractional function defined on the probability simplex. We exploit the special structure of our domain (the probability simplex) and the fact that our data is positive to tackle the problems of the sum and the maximum of linear-fractional functionals analytically. Our approach provides insights into the behavior of these functions. The major contributions of this paper are:

  1. We develop a version of the SPRT and the MSPRT where the sensor is selected randomly at each observation.

  2. We determine the asymptotic expressions for the thresholds and the expected sample size for these sequential tests.

  3. We incorporate the processing time of the sensors into these models to determine the expected decision time.

  4. We show that, to minimize the conditioned expected decision time, the optimal policy requires only one sensor to be observed.

  5. We show that, for a generic set of sensors and underlying hypotheses, the optimal average decision time policy requires the fusion center to consider at most as many sensors as the number of underlying hypotheses.

  6. For the binary hypothesis case, we identify the optimal set of sensors in the worst case and the average decision time minimization problems. Moreover, we determine an optimal probability distribution for the sensor selection.

  7. In the worst case and the average decision time minimization problems, we encounter the problem of minimization of sum and maximum of linear-fractional functionals. We treat these problems analytically, and provide insight into their optimal solutions.

The remainder of the paper is organized as follows. In Section II, we present the problem setup. Some preliminaries are presented in Section III. We develop the switching-sensor versions of the SPRT and MSPRT procedures in Section IV. In Section V, we formulate the optimization problems for time-optimal sensor selection, and determine their solutions. We elucidate the results obtained through numerical examples in Section VI. Our concluding remarks are in Section VII.

II Problem Setup

We consider a group of agents (e.g., robots, sensors, or cameras), which take measurements and transmit them to a fusion center. We generically call these agents “sensors.” We identify the fusion center with a person supervising the agents, and call it “the supervisor.”

Fig. 1: The agents transmit their observations to the supervisor, one at a time. The supervisor performs a sequential hypothesis test to decide on the underlying hypothesis.

The goal of the supervisor is to decide, based on the measurements it receives, which of the $M$ alternative hypotheses, or “states of nature,” $H_k$, $k \in \{0, \dots, M-1\}$, is correct. For doing so, the supervisor uses sequential hypothesis tests, which we briefly review in the next section.

We assume that only one sensor can transmit to the supervisor at each (discrete) time instant. Equivalently, the supervisor can process data from only one of the agents at each time. Thus, at each time, the supervisor must decide which sensor should transmit its measurement. We are interested in finding the optimal sensor(s) to observe in order to minimize the decision time.

We model the setup in the following way (a minimal formalization sketch follows the list):

  1. Let $s_t \in \{1, \dots, n\}$ indicate which of the $n$ sensors transmits its measurement at time instant $t \in \mathbb{N}$.

  2. Conditioned on the hypothesis $H_k$, $k \in \{0, \dots, M-1\}$, the probability that the measurement at sensor $s$ is $y$ is denoted by $f_s^k(y)$.

  3. The prior probability of the hypothesis $H_k$, $k \in \{0, \dots, M-1\}$, being correct is $\pi_k$.

  4. The measurement of sensor $s$ at time $t$ is $y(t)$. We assume that, conditioned on hypothesis $H_k$, $y(t)$ is independent of $y(\bar{t})$, for $t \neq \bar{t}$.

  5. The time it takes for sensor $s$ to transmit its measurement (or for the supervisor to process it) is $T_s > 0$.

  6. The supervisor chooses a sensor randomly at each time instant; the probability of choosing sensor $s$ is stationary and given by $q_s$.

  7. The supervisor uses the data collected to execute a sequential hypothesis test with the desired probability of incorrect decision, conditioned on hypothesis $H_k$, given by $\alpha_k$.

  8. We assume that there are no two sensors with identical conditioned probability distributions $f_s^k$ and processing times $T_s$. If there are such sensors, we club them together in a single node, and distribute the probability assigned to that node equally among them.
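The following minimal sketch illustrates one way to encode items 1-8 in code; the names (`Sensor`, `pick_sensor`) are ours and purely illustrative, not from the paper.

```python
# A minimal, illustrative encoding of the model above.
from dataclasses import dataclass
from typing import Callable, Sequence
import random

@dataclass
class Sensor:
    pdfs: Sequence[Callable[[float], float]]  # pdfs[k](y) = f_s^k(y), one pdf per hypothesis H_k
    processing_time: float                    # T_s: time to transmit/process a measurement

def pick_sensor(q: Sequence[float], rng: random.Random = random.Random(0)) -> int:
    """Item 6: sample which sensor transmits, with stationary probabilities q."""
    return rng.choices(range(len(q)), weights=q, k=1)[0]
```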

III Preliminaries

III-A Linear-fractional function

Given parameters $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m}$, $c \in \mathbb{R}^{n}$, and $d \in \mathbb{R}$, the function $g : \{x \in \mathbb{R}^n \mid c^T x + d > 0\} \to \mathbb{R}^m$, defined by

$$g(x) = \frac{Ax + B}{c^T x + d},$$

is called a linear-fractional function [8]. A linear-fractional function is quasi-convex as well as quasi-concave. In particular, if $m = 1$, then any scalar linear-fractional function $g$ satisfies

$$\min\{g(x_1), g(x_2)\} \;\le\; g(\nu x_1 + (1 - \nu) x_2) \;\le\; \max\{g(x_1), g(x_2)\}, \qquad (1)$$

for all $\nu \in [0, 1]$ and all $x_1, x_2$ in the domain of $g$.
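As a concrete check of inequality (1), consider the scalar example (ours, not from [8]) $g(x) = (2x+1)/(x+1)$ on $x \ge 0$: here $g(0) = 1$ and $g(1) = 3/2$, while at the midpoint $g(1/2) = 4/3$, which indeed lies between the two endpoint values.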

III-B Kullback-Leibler distance

Given two probability distribution functions $f_1$ and $f_2$, the Kullback-Leibler distance $\mathcal{D}(f_1, f_2)$ is defined by

$$\mathcal{D}(f_1, f_2) = \mathbb{E}_{f_1}\left[\log \frac{f_1(X)}{f_2(X)}\right] = \int f_1(x) \log \frac{f_1(x)}{f_2(x)} \, dx.$$

Further, $\mathcal{D}(f_1, f_2) \ge 0$, and the equality holds if and only if $f_1 = f_2$ almost everywhere.
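For instance, for two Gaussian densities $f_1$ and $f_2$ with means $\mu_1, \mu_2$ and common variance $\sigma^2$, a standard computation gives $\mathcal{D}(f_1, f_2) = (\mu_1 - \mu_2)^2 / (2\sigma^2)$: better-separated means, relative to the noise level, yield a larger distance and, through the sample-size expressions below, faster decisions.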

III-C Sequential Probability Ratio Test

The SPRT is a sequential binary hypothesis test that provides us with two thresholds to decide on a hypothesis, as opposed to classical hypothesis tests, where there is a single threshold. Consider two hypotheses $H_0$ and $H_1$, with prior probabilities $\pi_0$ and $\pi_1$, respectively. Given their conditional probability distribution functions $f^0$ and $f^1$, and repeated measurements $y(1), y(2), \dots$, with $\lambda_t$ defined by

$$\lambda_t = \log \frac{f^1(y(t))}{f^0(y(t))}, \qquad (2)$$

the SPRT provides us with two constants $\eta_0 < 0 < \eta_1$ to decide on a hypothesis at each time instant $t$, in the following way:

  1. Compute the log likelihood ratio: $\lambda_t := \log \dfrac{f^1(y(t))}{f^0(y(t))}$,

  2. Integrate evidence up to time $t$, i.e., $\Lambda_t := \Lambda_{t-1} + \lambda_t$, with $\Lambda_0 = 0$,

  3. Decide only if a threshold is crossed, i.e., accept $H_1$ if $\Lambda_t > \eta_1$, accept $H_0$ if $\Lambda_t < \eta_0$, and continue sampling otherwise.

Given the probability of false alarm $\alpha_0 = P(\text{accept } H_1 \mid H_0)$ and the probability of missed detection $\alpha_1 = P(\text{accept } H_0 \mid H_1)$, Wald's thresholds $\eta_0$ and $\eta_1$ are defined by

$$\eta_0 = \log \frac{\alpha_1}{1 - \alpha_0}, \qquad \eta_1 = \log \frac{1 - \alpha_1}{\alpha_0}. \qquad (3)$$

The expected sample size $N$ for decision using the SPRT is asymptotically given by

$$\mathbb{E}[N \mid H_0] \to \frac{-\eta_0}{\mathcal{D}(f^0, f^1)}, \qquad \mathbb{E}[N \mid H_1] \to \frac{\eta_1}{\mathcal{D}(f^1, f^0)}, \qquad (4)$$

as $\alpha_0, \alpha_1 \to 0$. The asymptotic expected sample size expressions in equation (4) are valid for large thresholds. The use of these asymptotic expressions as approximate expected sample sizes is a standard approximation in the information theory literature, and is known as Wald's approximation [4], [17], [19].

For given error probabilities, the SPRT is the optimal sequential binary hypothesis test, if the sample size is considered as the cost function [19].
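The test above translates directly into a few lines of code. The sketch below assumes the log-likelihood setting of equations (2) and (3); the function name and the Gaussian usage example are ours, for illustration only.

```python
import math
import random

def sprt(sample, f0, f1, alpha0, alpha1):
    """Minimal SPRT sketch: `sample()` draws one observation, f0/f1 are the
    conditional pdfs, alpha0/alpha1 the desired error probabilities.
    Returns (accepted hypothesis, number of samples used)."""
    eta0 = math.log(alpha1 / (1 - alpha0))  # lower threshold, eta0 < 0 (equation (3))
    eta1 = math.log((1 - alpha1) / alpha0)  # upper threshold, eta1 > 0 (equation (3))
    Lambda, t = 0.0, 0
    while True:
        y = sample()
        t += 1
        Lambda += math.log(f1(y) / f0(y))   # equation (2): log likelihood ratio
        if Lambda > eta1:
            return 1, t                     # accept H1
        if Lambda < eta0:
            return 0, t                     # accept H0

# Usage: H0 ~ N(0,1) vs H1 ~ N(1,1), with data generated under H1.
rng = random.Random(0)
gauss = lambda m: (lambda y: math.exp(-(y - m) ** 2 / 2) / math.sqrt(2 * math.pi))
decision, n = sprt(lambda: rng.gauss(1, 1), gauss(0), gauss(1), 0.01, 0.01)
```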

III-D Multi-hypothesis Sequential Probability Ratio Test

The MSPRT for multiple hypothesis testing was introduced in [5], and was further generalized in [10] and [11]. It is described as follows. Given $M$ hypotheses with prior probabilities $\pi_k$, $k \in \{0, \dots, M-1\}$, the posterior probability of hypothesis $H_k$ after $t$ observations is given by

$$p_t^k = \frac{\pi_k \prod_{\tau=1}^{t} f^k(y(\tau))}{\sum_{j=0}^{M-1} \pi_j \prod_{\tau=1}^{t} f^j(y(\tau))},$$

where $f^k$ is the probability density function of the observation of the sensor, conditioned on hypothesis $H_k$.

Before we state the MSPRT, for a given time $t$, we define $\bar{k}_t$ by

$$\bar{k}_t = \operatorname*{argmax}_{k \in \{0, \dots, M-1\}} p_t^k.$$

The MSPRT at each sampling iteration $t$ is defined as

$$\text{accept } H_{\bar{k}_t} \text{ if } p_t^{\bar{k}_t} > \frac{1}{1 + \eta_{\bar{k}_t}}, \text{ and continue sampling otherwise,}$$

where the thresholds $\eta_k$, for given frequentist error probabilities (the probabilities of wrongly accepting a given hypothesis) $\alpha_k$, $k \in \{0, \dots, M-1\}$, are given by

$$\eta_k = \frac{\alpha_k}{\gamma_k}, \qquad (5)$$

where $\gamma_k$ is a constant function of $f^k$ (see [5]).

It can be shown [5] that the expected sample size $N$ of the MSPRT, conditioned on a hypothesis, satisfies

$$\frac{\mathbb{E}[N \mid H_k]}{-\log \eta_k} \to \frac{1}{\mathcal{D}^*(k)}, \quad \text{as } \eta_k \to 0^+,$$

where $\mathcal{D}^*(k) = \min_{j \neq k} \mathcal{D}(f^k, f^j)$.

The MSPRT is an easily-implementable hypothesis test and is shown to be asymptotically optimal in [5, 10].
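The stopping rule above translates into the following minimal sketch (our function name and argument layout); the acceptance condition is $p_t^k > 1/(1+\eta_k)$.

```python
def msprt(sample, pdfs, priors, etas):
    """Minimal MSPRT sketch: maintain the posterior over the M hypotheses and
    accept H_k the first time p_t^k exceeds 1/(1 + eta_k)."""
    p = list(priors)
    t = 0
    while True:
        y = sample()
        t += 1
        w = [p_k * f(y) for p_k, f in zip(p, pdfs)]  # unnormalized Bayes update
        total = sum(w)
        p = [w_k / total for w_k in w]               # posterior after t observations
        for k, p_k in enumerate(p):
            if p_k > 1.0 / (1.0 + etas[k]):
                return k, t                          # accept H_k
```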

IV Sequential hypothesis tests with switching sensors

IV-A SPRT with switching sensors

Consider the case when the fusion center collects data from $n$ sensors. At each iteration, the fusion center looks at one sensor, chosen randomly with probability $q_s$, $s \in \{1, \dots, n\}$, and performs the SPRT with the collected data. We define this procedure as the SPRT with switching sensors. If we assume that sensor $s_t$ is observed at iteration $t$, and the observed value is $y(t)$, then the SPRT with switching sensors is described as follows (a simulation sketch is given after the steps), with the thresholds $\eta_0$ and $\eta_1$ defined in equation (3), and $\lambda_t$ defined as in equation (2) with the conditional pdfs of the active sensor:

  1. Compute the log likelihood ratio: $\lambda_t := \log \dfrac{f^1_{s_t}(y(t))}{f^0_{s_t}(y(t))}$,

  2. Integrate evidence up to time $t$, i.e., $\Lambda_t := \Lambda_{t-1} + \lambda_t$, with $\Lambda_0 = 0$,

  3. Decide only if a threshold is crossed, i.e., accept $H_1$ if $\Lambda_t > \eta_1$, accept $H_0$ if $\Lambda_t < \eta_0$, and continue sampling otherwise.
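A minimal simulation sketch of these steps, assuming `sensors[s]` bundles the two conditional pdfs of sensor $s$ and `draw(s)` returns its observation (our layout, for illustration only):

```python
import math
import random

def switching_sprt(draw, sensors, q, alpha0, alpha1, rng=random.Random(0)):
    """SPRT with switching sensors (sketch): at each iteration, sensor s is
    drawn with probability q[s]; `sensors[s] = (f0_s, f1_s)` are its
    conditional pdfs."""
    eta0 = math.log(alpha1 / (1 - alpha0))
    eta1 = math.log((1 - alpha1) / alpha0)
    Lambda = 0.0
    while True:
        s = rng.choices(range(len(q)), weights=q, k=1)[0]
        f0_s, f1_s = sensors[s]
        y = draw(s)
        Lambda += math.log(f1_s(y) / f0_s(y))  # steps 1 and 2 combined
        if Lambda > eta1:
            return 1                           # accept H1
        if Lambda < eta0:
            return 0                           # accept H0
```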

Lemma 1 (Expected sample size)

For the SPRT with switching sensors described above, the expected sample size $N$ conditioned on a hypothesis is asymptotically given by

$$\mathbb{E}[N \mid H_0] \to \frac{-\eta_0}{\sum_{s=1}^{n} q_s \mathcal{D}(f_s^0, f_s^1)}, \qquad \mathbb{E}[N \mid H_1] \to \frac{\eta_1}{\sum_{s=1}^{n} q_s \mathcal{D}(f_s^1, f_s^0)}, \qquad (6)$$

as $\alpha_0, \alpha_1 \to 0$.

Proof:

Similar to the proof of Theorem 3.2 in  [23]. \qed

The expected sample size converges to the values in equation (6) for large thresholds. From equation (3), it follows that large thresholds correspond to small error probabilities. In the remainder of the paper, we assume that the error probabilities are chosen small enough, so that the above asymptotic expression for sample size is close to the actual expected sample size.

Lemma 2 (Expected decision time)

Given the processing times $T_s$ of the sensors, collected in the vector $T = (T_1, \dots, T_n)$, the expected decision time $T_d$ of the SPRT with switching sensors, conditioned on the hypothesis $H_k$, $k \in \{0, 1\}$, is

$$\mathbb{E}[T_d \mid H_k] = \frac{q^T a_k}{q^T b_k}, \qquad (7)$$

where $a_k, b_k \in \mathbb{R}^n_{>0}$ are constant vectors for each $k \in \{0, 1\}$.

Proof:

The decision time using the SPRT with switching sensors is the sum of the sensors' processing times over the iterations. We observe that the number of iterations in the SPRT and the processing times of the sensors are independent. Hence, the expected value of the decision time is

$$\mathbb{E}[T_d \mid H_k] = \mathbb{E}[N \mid H_k] \, \mathbb{E}[T_{s_t}]. \qquad (8)$$

By the definition of expected value,

$$\mathbb{E}[T_{s_t}] = \sum_{s=1}^{n} q_s T_s = q^T T. \qquad (9)$$

From equations (6), (8), and (9) it follows that

$$\mathbb{E}[T_d \mid H_k] = \frac{c_k \, (q^T T)}{\sum_{s=1}^{n} q_s \mathcal{D}(f_s^k, f_s^{1-k})} = \frac{q^T a_k}{q^T b_k},$$

where $c_k$ is a constant, for each $k \in \{0, 1\}$, and $a_k = c_k T$ and $b_k = \big(\mathcal{D}(f_1^k, f_1^{1-k}), \dots, \mathcal{D}(f_n^k, f_n^{1-k})\big)$. \qed
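As an illustration with made-up numbers: suppose $n = 2$, $a_k = (2, 3)$, and $b_k = (1, 4)$. Then $\mathbb{E}[T_d \mid H_k] = (2q_1 + 3q_2)/(q_1 + 4q_2)$, which equals $2$ at the vertex $e_1$, $3/4$ at the vertex $e_2$, and $1$ at $q = (1/2, 1/2)$; consistent with the quasi-linearity property (1), every value on the simplex lies between the two vertex values.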

IV-B MSPRT with switching sensors

We call the MSPRT performed on the data collected from $n$ sensors, while observing only one sensor at a time, the MSPRT with switching sensors. The sensor to be observed at each time is determined through a randomized policy, and the probability of choosing sensor $s$ is stationary and given by $q_s$. Assume that sensor $s_\tau$ is chosen at time instant $\tau$, and that the prior probabilities of the hypotheses are given by $\pi_k$; then the posterior probability of hypothesis $H_k$ after $t$ observations is given by

$$p_t^k = \frac{\pi_k \prod_{\tau=1}^{t} f^k_{s_\tau}(y(\tau))}{\sum_{j=0}^{M-1} \pi_j \prod_{\tau=1}^{t} f^j_{s_\tau}(y(\tau))}.$$

Before we state the MSPRT with switching sensors, for a given time $t$, we define $\bar{k}_t$ by

$$\bar{k}_t = \operatorname*{argmax}_{k \in \{0, \dots, M-1\}} p_t^k.$$

For the thresholds $\eta_k$, $k \in \{0, \dots, M-1\}$, defined in equation (5), the MSPRT with switching sensors at each sampling iteration $t$ is defined by

$$\text{accept } H_{\bar{k}_t} \text{ if } p_t^{\bar{k}_t} > \frac{1}{1 + \eta_{\bar{k}_t}}, \text{ and continue sampling otherwise.}$$

Before we state the results on asymptotic sample size and expected decision time, we introduce the following notation. For a given hypothesis $H_k$ and a sensor $s$, we consider the pairwise Kullback-Leibler distances $\mathcal{D}(f_s^k, f_s^j)$, $j \neq k$, between the conditional distributions at sensor $s$. We also define $\mathcal{I}_k : \Delta_{n-1} \to \mathbb{R}_{>0}$ by

$$\mathcal{I}_k(q) = \min_{j \neq k} \sum_{s=1}^{n} q_s \, \mathcal{D}(f_s^k, f_s^j),$$

where $\Delta_{n-1}$ represents the probability simplex in $\mathbb{R}^n$.
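Under the definitions above, $\mathcal{I}_k(q)$ is straightforward to evaluate numerically; in the sketch below (our notation), `D[k][j][s]` stands for $\mathcal{D}(f_s^k, f_s^j)$:

```python
def information_rate(k, q, D):
    """Evaluate I_k(q) = min over j != k of sum_s q[s] * D[k][j][s]."""
    M, n = len(D), len(q)
    return min(
        sum(q[s] * D[k][j][s] for s in range(n))
        for j in range(M) if j != k
    )
```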

Lemma 3 (Expected sample size)

Given thresholds $\eta_k$, $k \in \{0, \dots, M-1\}$, the sample size $N$ required for decision satisfies

$$\frac{\mathbb{E}[N \mid H_k]}{-\log \eta_k} \to \frac{1}{\mathcal{I}_k(q)},$$

as $\eta_k \to 0^+$.

Proof:

The proof follows from Theorem 5.1 of [5] and the observation that the expected increment, per iteration, of the log likelihood ratio between hypotheses $H_k$ and $H_j$ is $\sum_{s=1}^{n} q_s \mathcal{D}(f_s^k, f_s^j)$.

\qed
Lemma 4 (Expected decision time)

Given the processing times $T_s$ of the sensors, the expected decision time conditioned on the hypothesis $H_k$, for each $k \in \{0, \dots, M-1\}$, is given by

$$\mathbb{E}[T_d \mid H_k] = \frac{c_k \, (q^T T)}{\mathcal{I}_k(q)}, \qquad (10)$$

where $c_k$, $k \in \{0, \dots, M-1\}$, are constants.

Proof:

Similar to the proof of Lemma 2. \qed

V Optimal Sensor Selection

In this section we consider sensor selection problems with the aim of minimizing the expected decision time of a sequential hypothesis test with switching sensors. As exemplified in Lemma 4, the problem features multiple conditioned decision times and, therefore, multiple distinct cost functions are of interest. In Scenario I below, we aim to minimize the decision time conditioned upon one specific hypothesis being true; in Scenarios II and III we consider worst-case and average decision times. In all three scenarios the decision variables take values in the probability simplex.

Minimizing decision time conditioned upon a specific hypothesis may be of interest when fast reaction is required in response to the specific hypothesis being indeed true. For example, in change detection problems one aims to quickly detect a change in a stochastic process; the CUSUM algorithm (also referred to as Page’s test) [16] is widely used in such problems. It is known [4] that, with fixed threshold, the CUSUM algorithm for quickest change detection is equivalent to an SPRT on the observations taken after the change has occurred. We consider the minimization problem for a single conditioned decision time in Scenario I below and we show that, in this case, observing the best sensor each time is the optimal strategy.

In general, no specific hypothesis may play a special role in the problem and, therefore, it is of interest to simultaneously minimize multiple decision times over the probability simplex. This is a multi-objective optimization problem, and may have Pareto-optimal solutions. We tackle this problem by constructing a single aggregate objective function. In the binary hypothesis case, we construct two single aggregate objective functions as the maximum and the average of the two conditioned decision times. These two functions are discussed in Scenario II and Scenario III, respectively. In the multiple hypothesis setting, we consider the single aggregate objective function constructed as the average of the conditioned decision times. An analytical treatment of this function for $M > 2$ is difficult. We determine the optimal number of sensors to be observed, and direct the interested reader to some iterative algorithms to solve such optimization problems. This case is also considered under Scenario III.

Before we pose the problem of optimal sensor selection, we introduce the following notation. We denote the probability simplex in $\mathbb{R}^n$ by $\Delta_{n-1}$, and the vertices of the probability simplex by $e_s$, $s \in \{1, \dots, n\}$. We refer to the line joining any two vertices of the simplex as an edge. Finally, we define $g_k : \Delta_{n-1} \to \mathbb{R}_{>0}$, $k \in \{0, \dots, M-1\}$, by $g_k(q) = \mathbb{E}[T_d \mid H_k]$, i.e., the conditioned decision time of equations (7) and (10) as a function of the sensor selection probabilities $q$.

V-A Scenario I (Optimization of conditioned decision time):

We consider the case when the supervisor is trying to detect a particular hypothesis, irrespective of the hypothesis that is actually present. The corresponding optimization problem, for a fixed $k \in \{0, \dots, M-1\}$, is posed in the following way:

$$\underset{q \in \Delta_{n-1}}{\text{minimize}} \;\; g_k(q). \qquad (11)$$

The solution to this minimization problem is given in the following theorem.

Theorem 1 (Optimization of conditioned decision time)

The solution to the minimization problem (11) is $q^* = e_{s^*}$, where $s^*$ is given by

$$s^* = \operatorname*{argmin}_{s \in \{1, \dots, n\}} g_k(e_s),$$

and the minimum value of the objective function is

$$g_k(e_{s^*}) = \min_{s \in \{1, \dots, n\}} g_k(e_s). \qquad (12)$$
Proof:

We notice that the objective function is a linear-fractional function. In the following argument, we show that the minimum occurs at one of the vertices of the simplex.

We first notice that the probability simplex is the convex hull of its vertices, i.e., any point $q$ in the probability simplex $\Delta_{n-1}$ can be written as

$$q = \sum_{s=1}^{n} \nu_s e_s, \qquad \nu_s \ge 0, \quad \sum_{s=1}^{n} \nu_s = 1.$$

We invoke equation (1), and observe that for some $\nu \in [0, 1]$ and for any $x_1, x_2 \in \Delta_{n-1}$,

$$g_k(\nu x_1 + (1 - \nu) x_2) \ge \min\{g_k(x_1), g_k(x_2)\}, \qquad (13)$$

which can be easily generalized to

$$g_k(q) \ge \min_{s \in \{1, \dots, n\}} g_k(e_s), \qquad (14)$$

for any point $q$ in the probability simplex $\Delta_{n-1}$. Hence, the minimum will occur at one of the vertices $e_{s^*}$, where $s^*$ is given by

$$s^* = \operatorname*{argmin}_{s \in \{1, \dots, n\}} g_k(e_s).$$
\qed
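In words, Theorem 1 states that the supervisor should always observe the single best sensor for the hypothesis of interest. A sketch of the resulting selection rule, with `a` and `b` the constant vectors of equation (7):

```python
def best_single_sensor(a, b):
    """Theorem 1 (sketch): the conditioned decision time (q^T a)/(q^T b) is
    minimized at a vertex of the simplex, i.e., by always observing the
    sensor with the smallest ratio a[s]/b[s]."""
    s_star = min(range(len(a)), key=lambda s: a[s] / b[s])
    return s_star, a[s_star] / b[s_star]

# With the illustrative numbers used earlier, a = [2.0, 3.0] and b = [1.0, 4.0],
# the optimum is the second sensor (Python index 1), with decision time 3/4.
```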

V-B Scenario II (Optimization of the worst case decision time):

For binary hypothesis testing, we consider the multi-objective optimization problem of minimizing both decision times simultaneously. We construct a single aggregate objective function by considering the maximum of the two objective functions. This turns out to be a worst case analysis, and the optimization problem for this case is posed in the following way:

$$\underset{q \in \Delta_{n-1}}{\text{minimize}} \;\; \max\{g_0(q), g_1(q)\}. \qquad (15)$$

Before we move on to the solution of the above minimization problem, we state the following results.

Lemma 5 (Monotonicity of conditioned decision times)

The functions $g_0$ and $g_1$ are monotone on the probability simplex $\Delta_{n-1}$, in the sense that, given two points $q_1, q_2 \in \Delta_{n-1}$, each function is monotonically non-increasing or monotonically non-decreasing along the line joining $q_1$ and $q_2$.

Proof:

Consider probability vectors $q_1, q_2 \in \Delta_{n-1}$. Any point on the line joining $q_1$ and $q_2$ can be written as $q(\nu) = \nu q_1 + (1 - \nu) q_2$, $\nu \in [0, 1]$. We note that $g_k(q(\nu))$ is given by:

$$g_k(q(\nu)) = \frac{\nu \, q_1^T a_k + (1 - \nu) \, q_2^T a_k}{\nu \, q_1^T b_k + (1 - \nu) \, q_2^T b_k}.$$

The derivative of $g_k$ along the line joining $q_1$ and $q_2$ is given by

$$\frac{d}{d\nu} g_k(q(\nu)) = \frac{(q_1^T a_k)(q_2^T b_k) - (q_2^T a_k)(q_1^T b_k)}{\big(\nu \, q_1^T b_k + (1 - \nu) \, q_2^T b_k\big)^2}.$$

We note that the sign of the derivative of $g_k$ along the line joining two points is fixed by the choice of $q_1$ and $q_2$. Hence, the function $g_k$ is monotone over the line joining $q_1$ and $q_2$. Moreover, note that if $(q_1^T a_k)(q_2^T b_k) \neq (q_2^T a_k)(q_1^T b_k)$, then $g_k$ is strictly monotone. Otherwise, $g_k$ is constant over the line joining $q_1$ and $q_2$. \qed

Lemma 6 (Location of min-max)

Define $g : \Delta_{n-1} \to \mathbb{R}_{>0}$ by $g = \max\{g_0, g_1\}$. A minimum of $g$ lies at the intersection of the graphs of $g_0$ and $g_1$, or at some vertex of the probability simplex $\Delta_{n-1}$.

Proof:

Case 1: The graphs of $g_0$ and $g_1$ do not intersect at any point in the simplex $\Delta_{n-1}$.

In this case, one of the functions $g_0$ and $g_1$ is an upper bound to the other function at every point in the probability simplex $\Delta_{n-1}$. Hence, $g = g_k$, for some fixed $k \in \{0, 1\}$, at every point in the probability simplex $\Delta_{n-1}$. From Theorem 1, we know that the minima of $g_k$ on the probability simplex lie at some vertex of the probability simplex $\Delta_{n-1}$.

Case 2: The graphs of $g_0$ and $g_1$ intersect at a set $Q$ in the probability simplex $\Delta_{n-1}$; let $\bar{q}$ be some point in the set $Q$.

Suppose a minimum of $g$ occurs at some point $q^* \in \operatorname{relint}(\Delta_{n-1})$, with $q^* \notin Q$, where relint denotes the relative interior. Without loss of generality, we can assume that $g_0(q^*) > g_1(q^*)$. Also, $g_0(\bar{q}) = g_1(\bar{q})$, and $g_0(\bar{q}) \ge g_0(q^*)$ by assumption.

We invoke Lemma 5, and notice that $g_0$ and $g_1$ can intersect at most once on a line. Moreover, we note that $g_0(\bar{q}) \ge g_0(q^*)$; hence, by Lemma 5, $g_0$ is non-increasing along the half-line from $\bar{q}$ through $q^*$. Since $g_0 > g_1$ on this half-line beyond $q^*$, $g = g_0$ is non-increasing along this half-line. Hence, $g$ should achieve its minimum at the boundary of the simplex $\Delta_{n-1}$, which contradicts the assumption that $q^*$ is in the relative interior of the simplex $\Delta_{n-1}$. In summary, if a minimum of $g$ lies in the relative interior of the probability simplex $\Delta_{n-1}$, then it lies at the intersection of the graphs of $g_0$ and $g_1$.

The same argument can be applied recursively to show that if a minimum lies at some point on the boundary, then either the point lies in $Q$ or the minimum lies at a vertex. \qed

In the following arguments, let $Q$ be the set of points in the simplex $\Delta_{n-1}$ where $g_0 = g_1$, that is,

$$Q = \{q \in \Delta_{n-1} \mid g_0(q) = g_1(q)\} = \{q \in \Delta_{n-1} \mid q^T \bar{g} = 0\}, \qquad (16)$$

where $\bar{g} = c_0 b_1 - c_1 b_0$; the second equality holds because $g_0$ and $g_1$ have numerators proportional to the common factor $q^T T > 0$. Also notice that the set $Q$ is non-empty if and only if $\bar{g}$ has at least one non-negative and one non-positive entry. If the set $Q$ is empty, then it follows from Lemma 6 that the solution of the optimization problem in equation (15) lies at some vertex of the probability simplex $\Delta_{n-1}$. Now we consider the case when $Q$ is non-empty. We assume that the sensors have been re-ordered such that the entries of $\bar{g}$ are in ascending order. We further assume that, for some $m \in \{1, \dots, n-1\}$, the first $m$ entries $\bar{g}_1, \dots, \bar{g}_m$ are non-positive, and the remaining entries are positive.

Lemma 7 (Intersection polytope)

If the set $Q$ defined in equation (16) is non-empty, then the polytope generated by the points in the set $Q$ has vertices given by:

$$\tilde{Q} = \big\{ \tilde{q}^{(i,j)} \mid i \in \{1, \dots, m\}, \; j \in \{m+1, \dots, n\} \big\},$$

where, for each such $i$ and $j$,

$$\tilde{q}^{(i,j)}_s = \begin{cases} \dfrac{\bar{g}_j}{\bar{g}_j - \bar{g}_i}, & \text{if } s = i,\\[1ex] \dfrac{-\bar{g}_i}{\bar{g}_j - \bar{g}_i}, & \text{if } s = j,\\[1ex] 0, & \text{otherwise}. \end{cases} \qquad (17)$$
Proof:

Any $q \in Q$ satisfies the following constraints

$$\sum_{s=1}^{n} q_s = 1, \qquad q_s \ge 0, \quad s \in \{1, \dots, n\}, \qquad (18)$$

$$\sum_{s=1}^{n} q_s \bar{g}_s = 0. \qquad (19)$$

Eliminating $q_n$, using equation (18) and equation (19), we get:

$$\sum_{s=1}^{n-1} q_s (\bar{g}_s - \bar{g}_n) = -\bar{g}_n. \qquad (20)$$

Equation (20) defines a hyperplane, whose extreme points in the non-negative orthant are given by

$$\hat{q}^{(i)} = \frac{-\bar{g}_n}{\bar{g}_i - \bar{g}_n} \, e_i, \qquad i \in \{1, \dots, n-1\}.$$

Note that, for $i \in \{1, \dots, m\}$, $\hat{q}^{(i)}$ has non-negative entries and corresponds, after reinstating $q_n = 1 - \sum_{s=1}^{n-1} q_s$, to the point $\tilde{q}^{(i,n)}$. Hence, these points define some vertices of the polytope generated by points in the set $Q$. Also note that the other vertices of the polytope can be determined by the intersection of the hyperplane (20) with each edge of the simplex joining $e_i$ and $e_j$, for $i \in \{1, \dots, m\}$ and $j \in \{m+1, \dots, n-1\}$. In particular, these vertices are given by the points $\tilde{q}^{(i,j)}$ defined in equation (17).

Hence, all the vertices of the polytope are of the form $\tilde{q}^{(i,j)}$, $i \in \{1, \dots, m\}$, $j \in \{m+1, \dots, n\}$. Therefore, the set of vertices of the polytope generated by the points in the set $Q$ is $\tilde{Q}$. \qed

Before we state the solution to the optimization problem (15), we define the following:

$$q^{\tilde{Q}} = \operatorname*{argmin}_{q \in \tilde{Q}} g_0(q), \qquad T^{\tilde{Q}} = \min_{q \in \tilde{Q}} g_0(q).$$

We also define

$$s^{V} = \operatorname*{argmin}_{s \in \{1, \dots, n\}} \max\{g_0(e_s), g_1(e_s)\}, \qquad q^{V} = e_{s^V}, \qquad T^{V} = \max\{g_0(q^V), g_1(q^V)\}.$$

Theorem 2 (Worst case optimization)

For the optimization problem (15), an optimal probability vector is given by:

$$q^* = \begin{cases} q^{V}, & \text{if } T^{V} \le T^{\tilde{Q}},\\ q^{\tilde{Q}}, & \text{otherwise}, \end{cases}$$

and the minimum value of the function is given by:

$$\min\big\{T^{V}, T^{\tilde{Q}}\big\}.$$

Proof:

We invoke Lemma 6, and note that a minimum lies either at some vertex of the simplex $\Delta_{n-1}$ or at some point in the set $Q$. Note that $g_0 = g_1$ on the set $Q$; hence, on $Q$, the problem of minimizing $\max\{g_0, g_1\}$ reduces to minimizing $g_0$. From Theorem 1, we know that $g_0$ achieves its minimum at some extreme point of the feasible region. From Lemma 7, we know that the vertices of the polytope generated by points in the set $Q$ are given by the set $\tilde{Q}$. We further note that $T^{\tilde{Q}}$ and $T^{V}$ are the minimum values of the objective function over the points in the set $\tilde{Q}$ and over the vertices of the probability simplex $\Delta_{n-1}$, respectively, which completes the proof. \qed
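Theorem 2 reduces the min-max problem to a comparison over finitely many candidate points: the vertices of the simplex and the vertices of the intersection polytope from Lemma 7. A sketch of this final comparison (function names are ours):

```python
def solve_worst_case(g0, g1, simplex_vertices, intersection_vertices):
    """Theorem 2 (sketch): evaluate max(g0, g1) at the simplex vertices and at
    the vertices of the intersection polytope Q, and return the best point."""
    candidates = list(simplex_vertices) + list(intersection_vertices)
    q_star = min(candidates, key=lambda q: max(g0(q), g1(q)))
    return q_star, max(g0(q_star), g1(q_star))
```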

V-C Scenario III (Optimization of the average decision time):

For the multi-objective optimization problem of minimizing all the decision times simultaneously on the simplex, we formulate the single aggregate objective function as the average of these decision times. The resulting optimization problem, for $M \ge 2$ hypotheses, is posed in the following way:

$$\underset{q \in \Delta_{n-1}}{\text{minimize}} \;\; \frac{1}{M} \sum_{k=0}^{M-1} g_k(q). \qquad (21)$$

In the following discussion we assume $n > M$, i.e., more sensors than hypotheses, unless otherwise stated. We analyze the optimization problem in equation (21) as follows:

Lemma 8 (Non-vanishing Jacobian)

The objective function in the optimization problem in equation (21) has no critical point on $\Delta_{n-1}$ if the vectors $T, b_0, \dots, b_{M-1}$ are linearly independent.

Proof:

The Jacobian of the objective function in the optimization problem in equation (21) is

$$\frac{\partial}{\partial q} \, \frac{1}{M} \sum_{k=0}^{M-1} g_k(q) = \frac{1}{M} \sum_{k=0}^{M-1} \frac{c_k}{q^T b_k} \left( T - \frac{q^T T}{q^T b_k} \, b_k \right) = \Gamma \, w(q),$$

where $\Gamma = [\, T \;\; b_0 \;\; \dots \;\; b_{M-1} \,]$ and

$$w(q) = \frac{1}{M} \left( \sum_{k=0}^{M-1} \frac{c_k}{q^T b_k}, \; \frac{-c_0 \, q^T T}{(q^T b_0)^2}, \; \dots, \; \frac{-c_{M-1} \, q^T T}{(q^T b_{M-1})^2} \right)^T.$$

For $q \in \Delta_{n-1}$, if the vectors $T, b_0, \dots, b_{M-1}$ are linearly independent, then $\Gamma$ is full rank. Further, the entries of $w(q)$ are non-zero on the probability simplex $\Delta_{n-1}$. Hence, the Jacobian does not vanish anywhere on the probability simplex $\Delta_{n-1}$. \qed

Lemma 9 (Case of Independent Information)

For $M = 2$, if $b_0$ and $b_1$ are linearly independent, and $T = \beta_0 b_0 + \beta_1 b_1$, for some $\beta_0, \beta_1 \in \mathbb{R}$, then the following statements hold:

  1. if $\beta_0$ and $\beta_1$ have opposite signs, then the objective function in equation (21) has no critical point on the simplex $\Delta_{n-1}$, and

  2. for $\beta_0, \beta_1 > 0$, the objective function in equation (21) has a critical point on the simplex $\Delta_{n-1}$ if and only if there exists $q \in \Delta_{n-1}$ perpendicular to the vector $\sqrt{c_0 \beta_1} \, b_1 - \sqrt{c_1 \beta_0} \, b_0$.

Proof:

We notice that the Jacobian of $g_0 + g_1$ (twice the objective function in equation (21) for $M = 2$) satisfies

$$\frac{\partial}{\partial q} \big( g_0(q) + g_1(q) \big) = \frac{c_0 \big( (q^T b_0) T - (q^T T) b_0 \big)}{(q^T b_0)^2} + \frac{c_1 \big( (q^T b_1) T - (q^T T) b_1 \big)}{(q^T b_1)^2}. \qquad (22)$$

Substituting $T = \beta_0 b_0 + \beta_1 b_1$, equation (22) becomes

$$\frac{\partial}{\partial q} \big( g_0(q) + g_1(q) \big) = \left( \frac{c_0 \beta_1}{(q^T b_0)^2} - \frac{c_1 \beta_0}{(q^T b_1)^2} \right) \big( (q^T b_0) \, b_1 - (q^T b_1) \, b_0 \big).$$

Since $q^T b_0, q^T b_1 > 0$, and $b_0$ and $b_1$ are linearly independent, we have

$$(q^T b_0) \, b_1 - (q^T b_1) \, b_0 \neq 0.$$

Hence, the objective function has a critical point on the simplex $\Delta_{n-1}$ if and only if

$$c_0 \beta_1 \, (q^T b_1)^2 = c_1 \beta_0 \, (q^T b_0)^2. \qquad (23)$$

Notice that, if $\beta_0$ and $\beta_1$ have opposite signs, then equation (23) cannot be satisfied for any $q \in \Delta_{n-1}$, and hence, the objective function has no critical point on the simplex $\Delta_{n-1}$.

If $\beta_0, \beta_1 > 0$, then equation (23) leads to

$$q^T \big( \sqrt{c_0 \beta_1} \, b_1 - \sqrt{c_1 \beta_0} \, b_0 \big) = 0.$$

Therefore, the objective function has a critical point on the simplex $\Delta_{n-1}$ if and only if there exists $q \in \Delta_{n-1}$ perpendicular to the vector $\sqrt{c_0 \beta_1} \, b_1 - \sqrt{c_1 \beta_0} \, b_0$.