On the Performance Analysis of Binary Hypothesis Testing with Byzantine Sensors
We investigate the impact of Byzantine attacks on distributed detection under binary hypothesis testing. It is assumed that a fraction of the transmitted sensor measurements are compromised by data injected by a Byzantine attacker, whose purpose is to confuse the decision maker at the fusion center. From the perspective of the Byzantine attacker, an optimization problem is formulated under an injection energy constraint to maximize the asymptotic missed detection error probability, which is characterized by the Kullback-Leibler divergence. The properties of the optimal attack strategy are analyzed via convex optimization and parametric optimization methods. Based on the derived theoretic results, a coordinate descent algorithm is proposed to search for the optimal attack solution. Simulation examples are provided to illustrate the effectiveness of the obtained attack strategy.
Yuqing Ni, Kemi Ding, Yong Yang, Ling Shi
1. Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong
E-mail: firstname.lastname@example.org, email@example.com
2. School of Electrical, Computer and Energy Engineering, Arizona State University, United States of America
3. School of Mechatronic Engineering, Guangdong Polytechnic Normal University, China
Wireless sensor networks (WSNs) deploy a large number of sensors to monitor their environment and transmit their measurements to a remote fusion center over wireless communication links. They have been extensively applied in health care monitoring, environmental sensing and industrial monitoring. Based on the received measurements, the fusion center makes a decision about the presence or absence of the phenomenon of interest. Distributed detection at the fusion center has been well studied in the detection theory literature [1, 2].
However, these sensors are vulnerable to malicious attacks due to their limited capabilities and the distributed nature of WSNs. One typical attack type is the Byzantine attack. According to [3], a Byzantine attack refers to the tampering or falsification of the transmitted data by an internal adversary who has knowledge about the WSN. The purpose of the Byzantine attacker is to confuse the fusion center and make it reach an incorrect decision about the state of nature. Distributed detection in the presence of Byzantine attacks has been widely studied in state-of-the-art works. Marano et al. [4] considered distributed detection under the Neyman-Pearson setup, where a fraction of the sensors were compromised by a Byzantine attacker. An optimal attack strategy minimizing the detection error exponent, which is based on the Kullback-Leibler divergence, was obtained via a “water-filling” procedure. Rawat et al. [5] analyzed the performance limits of collaborative spectrum sensing in the presence of Byzantine attackers who did not know the true state of nature. Optimal strategies for the Byzantine attackers and the fusion center were derived under a minimax game framework. Kailkhura et al. [6] adopted Chernoff information as the performance metric and obtained closed-form expressions for the optimal attack strategies that degrade the detection performance most in the asymptotic regime.
All the works discussed so far on distributed detection under Byzantine attacks consider scenarios where the values of the transmitted measurements can only be chosen from a discrete finite alphabet. We consider a more general case where the measurement can be any real number. Furthermore, a constraint on the attack power is taken into consideration in our work. We are interested in analytically characterizing the impact of the malicious data injected by a Byzantine attacker. Specifically, from the Byzantine attacker’s perspective, what is the most effective attack strategy under limited injection power?
In this work, we adopt a standard model of distributed detection under two binary hypotheses with known Gaussian distributions. Measurements are independently and identically distributed conditioned on the unknown hypothesis. We assume that the Byzantine attacker knows the true state of nature and injects independent Gaussian noises into a fraction of the measurements based on this knowledge. The fusion center makes the detection under the Neyman-Pearson setup.
The remainder of this paper is organized as follows: Section 2 introduces the Byzantine attack model and the problem of interest. Section 3 provides some preliminaries about the approximation methods of the KL divergence between Gaussian mixture models. Section 4 presents the main theoretic results regarding the optimal attack strategy and proposes an algorithm to search the optimal solution. Section 5 shows simulation examples and gives interpretations. Section 6 draws conclusions.
Notations: $\mathbb{R}$ denotes the set of real numbers. $\mathbb{R}^n$ is the $n$-dimensional Euclidean space. $\mathbb{S}^n_+$ ($\mathbb{S}^n_{++}$) is the set of positive semi-definite (definite) matrices. When $X \in \mathbb{S}^n_+$ ($X \in \mathbb{S}^n_{++}$), we simply write $X \geq 0$ ($X > 0$). $\mathcal{N}(\mu, \sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. The notation $\sim$ is read as “is distributed according to”. $\mathrm{Tr}(\cdot)$ stands for the trace of a matrix. $\|\cdot\|$ and the superscript $\top$ denote the Euclidean norm and the transpose of a vector, respectively.
2 Problem Formulation
Consider a binary state detection problem, where , using sensors’ measurements. Define the measurement from sensor as . Given the state , we assume that all measurements are independently and identically distributed (i.i.d.). When the state , the probability measure generated by is and when , it is denoted as . We assume that the probability measures and are Gaussian distributions under two hypotheses and :
2.1 Byzantine attack model
Denote the manipulated measurements at sensor as
where is the bias vector injected by the attacker obeying Gaussian distributions under two hypotheses:
Assume that the injected bias is independent of the original measurement . Furthermore, and . Correspondingly, the manipulated measurement is also Gaussian distributed. Its probability measures under two hypotheses and are given by
The following assumption is made on the attacker.
(Model Knowledge): The attacker knows the probability measures and and the true state .
Generally, this is a common assumption for worst-case attacks, and it is also adopted in [4, 7, 8, 9]. Moreover, it is in accordance with Shannon’s maxim, that is, defensive systems should be designed under the assumption that the enemy will immediately gain full knowledge of them. Therefore, the probability measures and can be learned by the attacker, and the true state can be obtained by deploying the attacker’s own sensor network. Based on this model knowledge, the attacker is capable of carefully designing the injected bias vectors to confuse the fusion center. Let the parameter represent the attacking power of the adversary. We assume that each measurement received at the fusion center is manipulated by the attacker with this probability. Therefore, the -th sample at the fusion center is distributed as follows:
Note that all of these measurements are conditionally i.i.d.
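The resulting post-attack measurements follow a two-component Gaussian mixture. The snippet below is a minimal numerical sketch of this model; all parameter values (clean mean/variance, bias mean/variance, and the attacking power) are hypothetical illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar parameters (illustrative only): under one hypothesis
# the clean measurement is N(mu1, sigma1^2); the attacker adds an independent
# bias N(nu1, tau1^2) to each sample with probability alpha.
mu1, sigma1 = 1.0, 1.0   # clean measurement distribution
nu1, tau1 = -2.0, 0.5    # injected bias distribution
alpha = 0.3              # attacking power (fraction of corrupted samples)
N = 10_000               # number of sensor measurements

x = rng.normal(mu1, sigma1, N)        # clean measurements
attacked = rng.random(N) < alpha      # which samples are hit by the attacker
b = rng.normal(nu1, tau1, N)          # independent Gaussian bias terms
y = x + attacked * b                  # manipulated measurements

# Each y_k follows the mixture (1-alpha)*N(mu1, sigma1^2)
# + alpha*N(mu1+nu1, sigma1^2+tau1^2); check the mixture mean empirically.
mixture_mean = (1 - alpha) * mu1 + alpha * (mu1 + nu1)
print(mixture_mean, y.mean())
```

With these illustrative parameters the empirical mean of the manipulated samples concentrates around the mixture mean, confirming the conditional-i.i.d. mixture structure.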
2.2 Problem of interest
The attacker aims at devastating the detection performance at the fusion center. As in related works, we quantify the impact of Byzantine attacks by the Kullback-Leibler (KL) divergence, which measures the “distance” between the hypotheses under test. The KL divergence determines the asymptotic missed detection error probability under the Neyman-Pearson setup by Stein’s lemma [11]. A smaller KL divergence implies a larger missed detection error probability at the fusion center. The attacker should therefore choose and wisely to minimize the KL divergence under an injection energy constraint. We consider the following optimization problem from the perspective of the Byzantine attacker:
where is a given positive constant denoting the degree of difficulty of the Byzantine attack. A larger constant allows more injection energy, which gives the attacker more leeway to launch the Byzantine attack.
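For reference, the Chernoff-Stein lemma underlying this choice of objective can be stated as follows, where $\beta_N(\varepsilon)$ denotes the minimum missed detection probability over $N$ samples at any fixed false alarm level $\varepsilon \in (0,1)$, and $\nu_0, \nu_1$ denote the post-attack measures under the two hypotheses (our notation for this sketch):

```latex
\lim_{N \to \infty} \frac{1}{N} \log \beta_N(\varepsilon)
  \;=\; -\, D(\nu_0 \,\|\, \nu_1),
\qquad \text{i.e.,} \qquad
\beta_N(\varepsilon) \approx e^{-N\, D(\nu_0 \| \nu_1)}.
```

Hence minimizing the KL divergence between the post-attack measures directly maximizes the asymptotic missed detection error probability, which is exactly the attacker's goal.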
3 Preliminary: KL Divergence Approximation between Gaussian Mixture Models
In this section, we introduce several methods to approximate the KL divergence between two Gaussian mixtures, which is a key supporting technique for handling the objective in Problem 2.2, since that objective admits no exact closed-form expression.
3.1 Monte Carlo sampling
For large dimension , Monte Carlo simulation is the only method that can estimate the KL divergence with arbitrary accuracy. We draw i.i.d. samples from the probability density function and average the log-likelihood ratio over these samples:
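A minimal sketch of this estimator for scalar mixtures, assuming the standard form $D(f\|g) \approx \frac{1}{n}\sum_i \log\frac{f(x_i)}{g(x_i)}$ with $x_i \sim f$ (pure NumPy; the mixture parameters in the sanity check are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Density of a scalar Gaussian mixture."""
    return sum(w * gauss_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def sample_mixture(n, weights, mus, sigmas):
    """Draw n i.i.d. samples from a scalar Gaussian mixture."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.take(mus, comps), np.take(sigmas, comps))

def kl_monte_carlo(n, f_params, g_params):
    """Estimate D(f || g) = E_f[log f(X)/g(X)] by sampling from f."""
    x = sample_mixture(n, *f_params)
    return np.mean(np.log(mixture_pdf(x, *f_params) / mixture_pdf(x, *g_params)))

# Sanity check with single-component "mixtures": D(N(0,1) || N(1,1)) = 0.5.
f = ([1.0], [0.0], [1.0])
g = ([1.0], [1.0], [1.0])
print(kl_monte_carlo(200_000, f, g))  # ≈ 0.5
```

The estimator is unbiased and its error shrinks as $1/\sqrt{n}$, which is why it serves as the accuracy benchmark in the numerical section.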
3.2 Upper bound approximation
By the chain rule for relative entropy [11], an upper bound on the KL divergence is given by:
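Concretely, writing the two mixtures as $f = \sum_a \omega_a f_a$ and $g = \sum_a \pi_a g_a$ with matched components (our notation for this sketch), the chain rule yields:

```latex
D(f \,\|\, g) \;\le\; D(\omega \,\|\, \pi) \;+\; \sum_{a} \omega_a \, D(f_a \,\|\, g_a),
\qquad
D(\omega \,\|\, \pi) = \sum_{a} \omega_a \log \frac{\omega_a}{\pi_a},
```

where each component-wise term $D(f_a \,\|\, g_a)$ is a KL divergence between single Gaussians and is thus available in closed form; the bound is generally not tight, which is the looseness noted below.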
3.3 Gaussian approximation
A common method is to replace the Gaussian mixtures with single Gaussian distributions that match their first two moments. Denote the Gaussian approximations as and :
Based on this Gaussian approximation method, the KL divergence between the two Gaussian mixture models can then be expressed in closed form [12].
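As a sketch of this moment-matching step (the mixture parameters below are illustrative choices, not values from the paper), one can collapse each scalar mixture to a single Gaussian and then apply the standard closed-form KL divergence between two Gaussians:

```python
import numpy as np

def moment_match(weights, mus, sigmas):
    """Collapse a scalar Gaussian mixture to a single Gaussian with the
    same mean and variance (moment matching)."""
    w, m, s = map(np.asarray, (weights, mus, sigmas))
    mean = np.sum(w * m)
    var = np.sum(w * (s**2 + m**2)) - mean**2
    return mean, np.sqrt(var)

def kl_gauss(mu0, sig0, mu1, sig1):
    """Closed-form D(N(mu0, sig0^2) || N(mu1, sig1^2))."""
    return (np.log(sig1 / sig0)
            + (sig0**2 + (mu0 - mu1) ** 2) / (2 * sig1**2) - 0.5)

# Hypothetical mixtures (illustrative values only), e.g. post-attack
# measures under the two hypotheses:
f = ([0.7, 0.3], [0.0, -2.0], [1.0, 1.2])
g = ([0.7, 0.3], [1.0, -1.0], [1.0, 1.2])
kl_approx = kl_gauss(*moment_match(*f), *moment_match(*g))
print(kl_approx)
```

Because the result is an explicit smooth function of the mixture parameters, this is the approximation most amenable to the convexity and continuity analysis that follows.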
Each of the three approximations has its own merits. Monte Carlo sampling is the most accurate, especially in high-dimensional cases. The upper bound approximation is more concise, but somewhat loose. The Gaussian approximation yields a closed-form expression and is therefore the most amenable to further theoretic analysis. In the following sections, we mainly focus on the Gaussian approximation and derive theoretic results from it.
4 Main Results
Due to the complexity of Problem 2.2, in this paper we only consider the scalar case , aiming to obtain some instructive insights. By the Gaussian approximation, the KL divergence objective is then transformed into:
The problem is complex, with all of , , , , and as decision variables. To deal with this challenge, we simplify it in stages. First, fixing the variables , , and , we show that the problem can be transformed into a convex optimization by a change of variables with respect to the Gaussian variances and . Second, we reduce the solution space to a search space depending only on the Gaussian means and and the attacking power . By proving that the new objective is continuous in these three variables, we reveal the special characteristics of the optimal attack solution. Finally, a coordinate descent algorithm is proposed to search for the optimal Byzantine attack policy.
4.1 Results regarding and
In this subsection, we fix the Gaussian means and , and the attacking power . For notational convenience, we define the following constants:
The Byzantine attack optimization problem is then transformed into
To make the problem feasible, we further assume that the given variables satisfy
4.2 Results regarding , , and
For the remaining three variables , and , we will show that the optimal solutions possess some useful properties. We treat this as a parametric optimization problem. Before that, some preliminaries are presented. We give the following terms, definitions and Lemma 4.2, mainly based on [14] and [15].
Let and be subsets of and , respectively. A correspondence from to is a map that associates each element with a nonempty subset . We denote such a correspondence as .
A correspondence is upper-semicontinuous at if and only if for any open set such that , there exists an open set containing , such that for any , holds. It is said to be upper-semicontinuous on if and only if it is upper-semicontinuous at each .
A correspondence is lower-semicontinuous at if and only if for any open set such that , there exists an open set containing such that for any , holds. It is said to be lower-semicontinuous on if and only if it is lower-semicontinuous at each .
A correspondence is continuous on if and only if is both upper-semicontinuous and lower-semicontinuous on .
A correspondence is said to be
compact-valued at if is a compact set;
convex-valued at if is a convex set.
A correspondence is said to be compact-valued (convex-valued) if it is compact-valued (convex-valued) at each .
(Berge’s Maximum Theorem under Convexity) Let be a continuous function, and is convex in for each given . Let be a continuous, compact-valued, and convex-valued correspondence. Let and be defined as:
Then is a continuous function on , and is an upper-semicontinuous, compact-valued, and convex-valued correspondence on .
Based on the above preliminaries, we denote two variables as and . A subset of is described as , and a subset of is described as . A continuous function is defined as:
For notational convenience, we define the following two functions as:
A correspondence is defined as:
Consider the optimization problem:
In Problem 4.2, is a continuous function on , and is an upper-semicontinuous, compact-valued, and convex-valued correspondence on .
The proof is mainly based on Lemma 4.2. It is obvious that , which is the objective in Problem 4.1, is convex in for each given . For the remaining part, we need to check the properties of the correspondence .
Compact-valuedness of is obvious, since for each , is closed and bounded. Convex-valuedness is also obvious. In the following, we will show that the correspondence is both upper-semicontinuous and lower-semicontinuous.
(Upper-semicontinuous) Let be an open set such that . Define an -neighborhood of in by
We will prove the upper-semicontinuity by contradiction. Suppose that is not upper-semicontinuous at . Then , such that and . Choose a sequence , and let , with but . We will first show that the sequence has a convergent subsequence, since it lies in a compact set; this is guaranteed by the Bolzano-Weierstrass theorem [16]. Since , we have , and . Therefore, there is such that for all , we have
for some small enough positive . By some tedious but basic calculations, it follows that for , we have , where is the compact set defined by:
For brevity, and are denoted as:
Therefore, there is a subsequence of , which we continue to denote by for notational convenience, converging to a limit . Moreover, since and , , we also have . Because , is directly obtained. However, for any , and is an open set. Therefore, we also have , which is a contradiction. This validates the upper-semicontinuity of the correspondence .
(Lower-semicontinuous) Let be an open set such that . Let be a point in this intersection, and therefore . We denote an internal point of the triangle area characterized by as , i.e.,
Since is open, for , close to . Let , and then . We will show the lower-semicontinuity by contradiction. Suppose that holds for all in any neighborhood of . Take a sequence , and pick such that . Since , for sufficiently large, . This implies , which is a contradiction.
Having proved that the correspondence is continuous, compact-valued and convex-valued, we conclude that is continuous and is upper-semicontinuous, compact-valued and convex-valued according to Lemma 4.2.
Theorem 4.2 states that is continuous at each . Fig. 1 illustrates the case where and are scalars. is represented by the pink curve, which resembles “a winding stream running through high mountains”. That is, for each fixed , is the minimum that can be found with respect to . Moreover, the global minimum of lies on this pink curve. Hence we only need to search along this continuous curve to find the optimal attack strategy for the Byzantine attack optimization problem.
4.3 Coordinate descent algorithm
In the last subsection, we proved that is continuous at , where . With the Gaussian approximation method, the minimum of Problem 2.2 can then be searched along by numerical algorithms. Since we have only proved continuity of , stronger properties such as differentiability and twice differentiability are not guaranteed. Based only on the continuity, we propose Algorithm 1 to search for the optimal Byzantine attack strategy for Problem 4.2. The cvx toolbox mentioned there is a MATLAB-based modeling system for convex optimization.
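Since Algorithm 1 itself relies on a cvx step that is not reproduced here, the following Python sketch shows only the generic coordinate descent template it follows: each block of variables is minimized in turn over a box while the others are held fixed. The toy objective and SciPy's bounded scalar minimizer are stand-ins for the paper's KL objective and its cvx subproblem.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_descent(f, x0, bounds, iters=50):
    """Minimize f over a box by cyclically optimizing one coordinate at a
    time while holding the others fixed (the paper's convex cvx step is
    replaced here by a generic bounded scalar minimization)."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        for i, (lo, hi) in enumerate(bounds):
            def along(t, i=i):
                y = x.copy()
                y[i] = t
                return f(y)
            x[i] = minimize_scalar(along, bounds=(lo, hi), method="bounded").x
    return x, f(x)

# Toy surrogate objective (not the paper's KL expression): a smooth convex
# quadratic whose minimum over the box is near (1.026, -0.526).
g = lambda z: (z[0] - 1.0) ** 2 + 2.0 * (z[1] + 0.5) ** 2 + 0.1 * z[0] * z[1]
x_opt, g_opt = coordinate_descent(g, x0=[0.0, 0.0], bounds=[(-2, 2), (-2, 2)])
print(x_opt, g_opt)
```

For a smooth convex objective like this, cyclic coordinate descent converges to the global minimizer; since only continuity of the true objective is guaranteed, the paper's algorithm similarly avoids any reliance on gradients.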
5 Numerical Results
In this section, we provide some numerical examples to illustrate the main results. We consider a scenario where the original probability measures and are distributed as:
As shown in the first sub-figure of Fig. 2, with the Gaussian approximation method, the KL divergence can be minimized by the proposed coordinate descent algorithm when the power constraint is . After iterations, a feasible attack solution is obtained as , , , , , with a resulting KL divergence very close to . This attack strategy is derived with the Gaussian approximation of the KL divergence objective. The true probability measures and the KL divergence between the two Gaussian mixture models are portrayed in Fig. 3. It can be seen that the original KL divergence without the Byzantine attack is . By Monte Carlo sampling, introduced in Section 3.1, with sample size , the KL divergence under the Byzantine attack is computed to be . This decrease of the KL divergence implies a tremendous increase of the missed detection error probability in the hypothesis testing, as follows. Without the Byzantine attack, the false alarm probability and the missed detection error probability under the Neyman-Pearson setup can both be made almost zero based on i.i.d. measurements from the sensors. In contrast, the designed Byzantine attack increases the missed detection error probability to while keeping the false alarm probability under .
The second sub-figure in Fig. 2 shows the approximated KL divergence curve with respect to the attacking power when the constraint level is . For each fixed , we compute the KL divergence using the coordinate descent algorithm. We find that a larger attacking power leads to a smaller KL divergence, which means a larger missed detection error probability. Notice that the KL divergence is still greater than even when . This is because the Byzantine attack is launched by injecting noises instead of directly tampering with the measurements, and it is conducted under an energy constraint.
In this paper, a binary hypothesis testing problem is considered based on measurements from a number of identical sensors, some of which may be compromised by a Byzantine attacker with probability . The attacker manipulates the measurements by injecting independent noises under a power constraint. We first formulated this attack optimization problem using the KL divergence to evaluate the attack impact. We then investigated the optimization problem with the Gaussian approximation method and derived theoretic results regarding the optimal attack strategy. In addition, a coordinate descent algorithm based on these theoretic results was proposed to search for the optimal solution. Numerical examples verified the main results and showed the attack impact on the original problem, which is difficult to solve directly. Investigating this problem in the vector case and with other approximation methods is a future direction.
[1] R. Viswanathan and P. K. Varshney, Distributed detection with multiple sensors Part I: Fundamentals, Proceedings of the IEEE, 85(1): 54-63, 1997.
[2] P. K. Varshney, Distributed Detection and Data Fusion. Springer Science & Business Media, 2012.
[3] A. Vempaty, L. Tong and P. K. Varshney, Distributed inference with Byzantine data: State-of-the-art review on data falsification attacks, IEEE Signal Processing Magazine, 30(5): 65-75, 2013.
[4] S. Marano, V. Matta and L. Tong, Distributed detection in the presence of Byzantine attacks, IEEE Transactions on Signal Processing, 57(1): 16-29, 2009.
[5] A. S. Rawat, P. Anand, H. Chen and P. K. Varshney, Collaborative spectrum sensing in the presence of Byzantine attacks in cognitive radio networks, IEEE Transactions on Signal Processing, 59(2): 774-786, 2011.
[6] B. Kailkhura, Y. S. Han, S. Brahma and P. K. Varshney, Asymptotic analysis of distributed Bayesian detection with Byzantine data, IEEE Signal Processing Letters, 22(5): 608-612, 2015.
[7] X. Ren, J. Yan and Y. Mo, Binary hypothesis testing with Byzantine sensors: Fundamental tradeoff between security and efficiency, IEEE Transactions on Signal Processing, 66(6): 1454-1468, 2018.
[8] X. Ren and Y. Mo, Secure detection: Performance metric and sensor deployment strategy, IEEE Transactions on Signal Processing, 66(17): 4450-4460, 2018.
[9] G. Fellouris, E. Bayraktar and L. Lai, Efficient Byzantine sequential change detection, IEEE Transactions on Information Theory, 64(5): 3346-3360, 2018.
[10] M. Coutino, S. P. Chepuri and G. Leus, Submodular sparse sensing for Gaussian detection with correlated observations, IEEE Transactions on Signal Processing, 66(15): 4025-4039, 2018.
[11] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[12] J. R. Hershey and P. A. Olsen, Approximating the Kullback-Leibler divergence between Gaussian mixture models, in IEEE International Conference on Acoustics, Speech and Signal Processing, 4: IV-317-IV-320, 2007.
[13] J. Duchi, Derivations for linear algebra and optimization, Berkeley, California, 3, 2007.
[14] R. K. Sundaram, A First Course in Optimization Theory. Cambridge University Press, 1996.
[15] J. P. Aubin and H. Frankowska, Set-Valued Analysis. Springer Science & Business Media, 2009.
[16] R. G. Bartle and D. R. Sherbert, Introduction to Real Analysis. Wiley, New York, 2000.