
# Distributed Inference with M-ary Quantized Data in the Presence of Byzantine Attacks

## Abstract

The problem of distributed inference with M-ary quantized data at the sensors is investigated in the presence of Byzantine attacks. We assume that the attacker does not have knowledge about either the true state of the phenomenon of interest, or the quantization thresholds used at the sensors. Therefore, the Byzantine nodes attack the inference network by modifying the symbol corresponding to the quantized data to one of the other symbols in the quantization alphabet-set and transmitting the false symbol to the fusion center (FC). In this paper, we find the optimal Byzantine attack that blinds any distributed inference network. As the quantization alphabet size increases, a tremendous improvement in the security performance of the distributed inference network is observed.

We also investigate the problem of distributed inference in the presence of resource-constrained Byzantine attacks. In particular, we focus our attention on two problems: distributed detection and distributed estimation, when the Byzantine attacker employs a highly-symmetric attack. For both problems, we find the optimal attack strategies employed by the attacker to maximally degrade the performance of the inference network. A reputation-based scheme for identifying malicious nodes is also presented as the network’s strategy to mitigate the impact of Byzantine threats on the inference performance of the distributed sensor network.

Index Terms: Distributed Inference, Network-Security, Sensor Networks, Byzantine Attacks, Kullback-Leibler Divergence, Fisher Information.


## 1 Introduction

Distributed inference in sensor networks has been widely studied over the past three decades (see [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] and references therein). The distributed inference framework comprises a group of spatially distributed sensors which acquire observations about a phenomenon of interest (POI) and send processed data to a fusion center (FC) where a global inference is made. Due to resource constraints in sensor networks, this data is processed at the sensors in such a way that the observations are mapped to symbols from an alphabet set of size $M$, prior to transmission to the FC. When $M = 2$, we employ binary quantization to generate processed data. When $M > 2$, we send an M-ary symbol that is assumed to be generated via fine quantization. A sensor decision rule is assumed to be characterized by a set of quantization thresholds. In this paper, we use the phrases “mapped to one of the M-ary symbols” and “quantized to an M-ary symbol” interchangeably. A lot of work in the past has focused on the binary quantization case, i.e., $M = 2$. In this paper, we consider the more general case of $M \geq 2$, with $M = 2$ being a special case. The framework of distributed inference networks has been extensively studied for different types of inference problems such as detection (e.g., [1, 12, 5, 6, 7, 8, 3]), estimation (e.g., [9, 10, 3]), and tracking (e.g., [11, 3]) in the presence of both ideal and non-ideal channels. In this paper, we focus our attention on two distributed inference problems, namely detection and estimation, where sensors quantize their data to M-ary symbols.

Although the area of sensor networks has been a very active field of research in the past, security problems in sensor networks have gained attention only in the last decade [13, 14, 15]. As security threats have evolved to target inference networks more specifically, attempts have been made at the system level to either prevent these threats or mitigate their impact on the network performance. While there are many types of security threats, in this paper, we address one such attack, called the Byzantine attack, in the context of distributed inference networks (see a recent survey [16] by Vempaty et al.). Byzantine attacks (proposed by Lamport et al. in [17]) are, in general, arbitrary and may refer to many types of malicious behavior. In this paper, we focus only on the data-falsification aspect of the Byzantine attack, wherein one or more compromised nodes of the network send false information to the FC in order to deteriorate the inference performance of the network. A well-known example of this attack is the man-in-the-middle attack [18] where, on one hand, the attacker collects data from the sensors whose authentication process is compromised by emulating the FC, while, on the other hand, the attacker sends false information to the FC using the compromised sensors’ identity. In summary, if the sensor’s authentication is compromised, the attacker remains invisible to the network, accepts the true decision from the sensor and sends a falsified version to the FC in order to deteriorate the inference performance.

Marano et al., in [19], analyzed the Byzantine attack on a network of sensors carrying out the task of distributed detection, where the attacker is assumed to have complete knowledge about the hypotheses. This represents the extreme case of Byzantine nodes having the extra power of knowing the true hypothesis. In their model, they assumed that the sensors quantized their respective observations into M-ary symbols, which are later fused at the FC. The Byzantine nodes pick symbols using an optimal probability distribution that is conditioned on the true hypothesis, and transmit them to the FC in order to maximally degrade the detection performance. Rawat et al., in [20], also considered the problem of distributed detection in the presence of Byzantine attacks, with binary quantizers at the sensors. Unlike the authors in [19], Rawat et al. did not assume complete knowledge of the true hypotheses at the Byzantine attacker. Instead, they assumed that the Byzantine nodes derive the knowledge about the true hypotheses from their own sensing observations. In other words, a Byzantine node potentially flips the local decision made at the node. It does not modify the thresholds at the sensor quantizers. Rawat et al. also analyzed the performance of the network in the presence of independent and collaborative Byzantine attacks and modeled the problem as a zero-sum game between the sensor network and the Byzantine attacker. In addition to the analysis of distributed detection in the presence of Byzantine attacks, a reputation-based scheme was proposed by Rawat et al. in [20] for identifying the Byzantine nodes by accumulating the deviations between each sensor’s decision and the FC’s decision over a time window of duration $T$. If the accumulated number of deviations is greater than a prescribed threshold for a given node, then the FC tags it as a Byzantine node. In order to mitigate the attack, the FC removes the nodes tagged as Byzantine from the fusion rule. Another mitigation scheme was proposed by Vempaty et al. [21], where each sensor’s behavior is learnt over time and compared to the known behavior of the honest nodes. Any node whose learnt behavior deviates significantly from the expected honest behavior is labelled as a Byzantine node. Having learnt the parameters of the Byzantine nodes, the authors also proposed the use of this information to adapt the fusion rule so as to maximize the performance of the FC. In contrast to the parallel topology in sensor networks, Kailkhura et al. in [22] investigated the problem of Byzantine attacks on distributed detection in a hierarchical sensor network. They presented the optimal Byzantine strategy when the sensors communicate their decisions to the FC in multiple hops of a balanced tree. They assumed that the cost of compromising sensors at different levels of the tree varies, and found the optimal Byzantine strategy that minimizes the cost of attacking a given hierarchical network.

Soltanmohammadi et al. in [23] investigated the problem of distributed detection in the presence of different types of Byzantine nodes. Each Byzantine node type corresponds to a different operating point, and, therefore, the authors considered the problem of identifying different Byzantine nodes, along with their operating points. The problem of maximum-likelihood (ML) estimation of the operating points was formulated and solved using the expectation-maximization (EM) algorithm. Once the Byzantine node operating points were estimated, this information was utilized at the FC to mitigate the malicious activity in the network, and also to improve global detection performance.

Distributed target localization in the presence of Byzantine attacks was addressed by Vempaty et al. in [24], where the sensors quantize their observations into binary decisions, which are transmitted to the FC. Similar to Rawat et al.’s approach in [20], the authors in [24] investigated the problem of distributed target localization from both the network’s and Byzantine attacker’s perspectives, first by identifying the optimal Byzantine attack and second, mitigating the impact of the attack with the use of non-identical quantizers at the sensors.

In this paper, we extend the framework of Byzantine attacks to the case where Byzantine nodes do not have complete knowledge about the true state of the phenomenon of interest (POI), and where the sensors generate M-ary symbols instead of binary symbols. We also assume that the Byzantine attacker is ignorant about the quantization thresholds used at the sensors to generate the M-ary symbols. Under these assumptions, we address two inference problems: binary hypothesis testing and parameter estimation.

The main contributions of the paper are three-fold. First, we define a Byzantine attack model for a sensor network with individual sensors quantizing their observations into one of the M-ary symbols, when the attacker does not have complete knowledge about the true state of the POI and the thresholds employed by the sensors. We model the attack strategy as a flipping probability matrix, where entry $p_{lm}$ represents the probability with which the $l$-th symbol is flipped into the $m$-th symbol. Second, we show that quantization into M-ary symbols at the sensors, as opposed to binary quantization, improves both inference and security performance simultaneously. As a function of the number of Byzantine nodes in the network, we derive the optimal flipping matrix. Finally, we extend the mitigation scheme presented by Rawat et al. in [20] to the more general case where sensors generate M-ary symbols. We present simulation results to illustrate the performance of the reputation-based scheme for the identification of Byzantine nodes in the network.

The remainder of the paper is organized as follows. In Section 2, we describe our system model and present the Byzantine attack model for the case where sensors generate M-ary symbols and the attacker has no knowledge about the true state of the phenomenon of interest or the quantization thresholds employed by the sensors. Next, in Section 3, we determine the most powerful attack strategy that the Byzantine nodes can adopt. In the case of resource-constrained Byzantine attacks, where the attacker cannot compromise enough nodes in the network to blind it (to be defined in Section 2), we find the optimal Byzantine attack for a fixed fraction of Byzantine nodes in the network in the context of distributed detection and estimation in Sections 4 and 5 respectively. From the network’s perspective, we present a mitigation scheme in Section 6 that identifies the Byzantine nodes using reputation-tags. Finally, we present our concluding remarks in Section 7.

## 2 System Model

Consider an inference (sensor) network with $N$ sensors, where a fraction $\alpha$ of the nodes in the network is assumed to be compromised (refer to Figure a). These compromised sensors transmit false data to the fusion center (FC) in order to deteriorate the inference performance of the network. We assume that the network is designed to infer about a particular phenomenon, regarding which the sensors acquire conditionally-independent observations. We denote the observation of the $i$-th sensor as $r_i$. This observation is mapped to one of the $M$ symbols, $u_i \in \{1, \dots, M\}$. In a compromised inference network, since the Byzantine sensors do not transmit their true quantized data, we denote the transmitted symbol as $v_i$ at the $i$-th sensor. If the node is honest, then $v_i = u_i$. Otherwise, we assume that the Byzantine sensor modifies $u_i = l$ to $v_i = m$ with a probability $p_{lm}$, as shown in Figure b. For the sake of compactness, we denote the transition probabilities depicted in the graph in Figure b using a row-stochastic matrix $P$, as follows:

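$$
P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1M} \\ p_{21} & p_{22} & \cdots & p_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ p_{M1} & p_{M2} & \cdots & p_{MM} \end{bmatrix}. \tag{1}
$$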

Since the attacker has no knowledge of the quantization thresholds employed at each sensor, we assume that $P$ is independent of the sensor observations. The messages $\mathbf{v} = \{v_1, v_2, \dots, v_N\}$ are transmitted to the FC, where a global inference is made about the phenomenon of interest based on $\mathbf{v}$.

In order to consider the general inference problem, we assume that $\theta$ is the parameter that denotes the phenomenon of interest in the received signal $r_i$ at the $i$-th sensor. If we are considering a detection/classification problem, the set of values that $\theta$ can take is discrete (finite or countably infinite). In the case of parameter estimation, this set is continuous. Without any loss of generality, we assume if the problem of interest is classification. Hence, detection is a special case of classification with . In the case of estimation, we assume that .

Note that the performance of the FC is determined by the probability distribution (mass function) $P(\mathbf{v} \mid \theta)$. Therefore, in Section 3, we analyze the behavior of $P(\mathbf{v} \mid \theta)$ in the presence of different attacks and identify the one with the greatest impact on the network.

## 3 Optimal Byzantine Attacks

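Given the conditional distribution of $r_i$, $p(r_i \mid \theta)$, and the sensor quantization thresholds $\lambda_m$, for $m = 0, 1, \dots, M$, the conditional distribution of $u_i$ can be found as

$$
P(u_i = m \mid \theta) = \int_{\lambda_{m-1}}^{\lambda_m} p(r_i \mid \theta)\, dr_i \tag{2}
$$

for all $m = 1, 2, \dots, M$.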

If the true quantized symbol at the $i$-th node is $u_i = l$, a compromised node will modify it into $v_i = m$ as depicted in Figure b, and transmit it to the FC. Since the FC is not aware of the type of the node (honest or Byzantine), it is natural to assume that node $i$ is compromised with probability $\alpha$, where $\alpha$ is the fraction of nodes in the network that are compromised. Therefore, we find the conditional distribution of $v_i$ at the FC as follows.

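$$
\begin{aligned}
P(v_i = m \mid \theta) &= \alpha P(v_i = m \mid i = \text{Byzantine}, \theta) + (1-\alpha) P(v_i = m \mid i = \text{Honest}, \theta) \\
&= \alpha \sum_{l=1}^{M} P(u_i = l \mid \theta) \cdot P(v_i = m \mid u_i = l, \theta) + (1-\alpha) P(u_i = m \mid \theta) \\
&= \alpha \sum_{l=1}^{M} p_{lm} P(u_i = l \mid \theta) + (1-\alpha) P(u_i = m \mid \theta) \\
&= \alpha \sum_{l \neq m} p_{lm} P(u_i = l \mid \theta) + \left[(1-\alpha) + \alpha p_{mm}\right] P(u_i = m \mid \theta) \\
&= \left[(1-\alpha) + \alpha p_{mm}\right] + \sum_{l \neq m} \left\{ \alpha p_{lm} - \left[(1-\alpha) + \alpha p_{mm}\right] \right\} P(u_i = l \mid \theta).
\end{aligned} \tag{3}
$$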

The goal of a Byzantine attack is to blind the FC with the least amount of effort (minimum $\alpha$). To totally blind the FC is equivalent to making $P(v_i = m \mid \theta) = 1/M$ for all $1 \leq m \leq M$. In Equation (3), the RHS consists of two terms. The first one is based on prior knowledge and the second term conveys information based on the observations. In order to blind the FC, the attacker should make the second term equal to zero. Since the attacker does not have any knowledge regarding $P(u_i = l \mid \theta)$, it can make the second term of Equation (3) equal to zero by setting

$$
\alpha p_{lm} = (1-\alpha) + \alpha p_{mm}, \quad \forall\ l \neq m. \tag{4}
$$

Then the conditional probability $P(v_i = m \mid \theta) = (1-\alpha) + \alpha p_{mm}$ becomes independent of the observations $r_i$ (or its quantized version $u_i$), resulting in equiprobable symbols at the FC. In other words, the received vector $\mathbf{v}$ does not carry any information about $\theta$ and, therefore, results in the most degraded performance at the FC. So, the FC now has to depend solely on its prior information about $\theta$ in making an inference.

Having identified the condition in Equation (4) under which the Byzantine attack makes the greatest impact on the performance of the network, we identify the strategy that the attacker should employ in order to achieve this condition as follows. Since we need

$$
P(v_i = m \mid \theta) = (1-\alpha) + \alpha p_{mm} = \frac{1}{M},
$$

we have $\alpha = \dfrac{M-1}{M(1 - p_{mm})}$. To minimize $\alpha$, one needs to make $p_{mm} = 0$. In this paper, we denote the $\alpha$ corresponding to this optimal strategy, which minimizes the Byzantine attacker’s resources required to blind the FC, as $\alpha_{\text{blind}}$. Hence,

$$
\alpha_{\text{blind}} = \frac{M-1}{M}.
$$

Rearranging Equation (4) and using $p_{mm} = 0$, we have

$$
\frac{1}{\alpha} = 1 + (p_{lm} - p_{mm}) = 1 + p_{lm}, \quad \forall\ l \neq m. \tag{5}
$$

By setting $\alpha$ to $\alpha_{\text{blind}}$, we have $p_{lm} = \dfrac{1}{M-1}$, $\forall\ l \neq m$. That is, the transition probability matrix $P$ is highly symmetric. We summarize the result as a theorem as follows.

###### Theorem 1.

If the Byzantine attacker has no knowledge of the quantization thresholds employed at each sensor, then the optimal Byzantine attack is given as

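$$
p_{lm} = \begin{cases} \dfrac{1}{M-1}, & \text{if } l \neq m \\ 0, & \text{otherwise} \end{cases} \qquad \alpha_{\text{blind}} = \frac{M-1}{M}. \tag{6}
$$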

We term Equation (6) the optimal Byzantine attack, since the FC does not get any information from the data it receives from the sensors to perform an inference task. Therefore, the FC has to rely on prior information about the parameter $\theta$, if available.
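As a quick numerical sanity check of Theorem 1, the following minimal sketch applies Equation (3) to an arbitrary pmf of the true symbol and confirms that the received symbols become equiprobable under the attack in Equation (6). The function names, the choice $M = 4$ and the randomly drawn pmf are illustrative assumptions, not part of the original analysis.

```python
import random

def received_pmf(pu, P, alpha):
    """Eq. (3): pmf of the received symbol v, given the pmf pu of the true symbol u."""
    M = len(pu)
    return [
        alpha * sum(P[l][m] * pu[l] for l in range(M)) + (1 - alpha) * pu[m]
        for m in range(M)
    ]

def optimal_attack(M):
    """Eq. (6): p_lm = 1/(M-1) off the diagonal and 0 on the diagonal."""
    return [[0.0 if l == m else 1.0 / (M - 1) for m in range(M)] for l in range(M)]

M = 4                          # illustrative alphabet size
alpha_blind = (M - 1) / M      # fraction of Byzantine nodes needed to blind the FC

# Arbitrary (hypothetical) pmf of the true quantized symbol.
random.seed(0)
weights = [random.random() for _ in range(M)]
pu = [w / sum(weights) for w in weights]

pv = received_pmf(pu, optimal_attack(M), alpha_blind)
print(pv)  # every entry should be (numerically) 1/M
```

Because the result does not depend on `pu`, the same check passes for any conditional distribution of the sensor observations, which is the sense in which the attacker needs no knowledge of the quantization thresholds.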

Theorem 1 can be extended to the case where the channels between the sensors (attackers) and the FC are not perfect. The result is given in Appendix 8.

In Figure 2, we show how $\alpha_{\text{blind}}$ scales with increasing quantization alphabet size, $M$. Since the quantized symbols are encoded into bits, we also show how $\alpha_{\text{blind}}$ approaches 1 exponentially as the number of bits needed to encode the symbols, i.e., $\log_2 M$, increases. This is also shown in Table 1. Note that, if the sensors use one additional quantization bit (2-bit quantization) instead of 1-bit (binary) quantization, then $\alpha_{\text{blind}}$ increases from 0.5 to 0.75. This trend continues with an increasing number of quantization bits: when the sensors employ an 8-bit quantizer, the attacker needs to compromise at least 99.6% of the sensors in the network to blind the FC. Obviously, the improvement in security performance is not free, as the sensors incur a communication cost in terms of energy and bandwidth as the number of quantization bits increases. Therefore, in practice, the network designer faces a trade-off between the communication cost and the security guarantees.
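The scaling described above follows directly from $\alpha_{\text{blind}} = (M-1)/M$ with $M = 2^b$ for a $b$-bit quantizer. A minimal sketch that tabulates it (the helper name `alpha_blind` is an illustrative choice):

```python
def alpha_blind(M):
    """Minimum fraction of Byzantine nodes needed to blind the FC (Theorem 1)."""
    return (M - 1) / M

for bits in range(1, 9):
    M = 2 ** bits  # alphabet size of a `bits`-bit quantizer
    print(f"{bits}-bit quantizer: M = {M:3d}, alpha_blind = {alpha_blind(M):.4f}")
```

The gap $1 - \alpha_{\text{blind}} = 2^{-b}$ halves with every additional quantization bit.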

Also, note that, when $M = 2$ (1-bit quantization), our results coincide with those of Rawat et al. in [20], where the focus was on the problem of binary hypothesis testing in a distributed sensor network. On the other hand, our results are more general, as they address any inference problem (detection, estimation or classification) in a distributed sensor network. Another extreme case to note is when $M \to \infty$, in which case $\alpha_{\text{blind}} \to 1$. This means that the Byzantine attacker cannot blind the FC unless all the sensors are compromised.

In the following sections, we consider distributed detection and estimation problems in sensor networks and analyze the impact of the optimal Byzantine attack on these systems. For the sake of tractability, we consider a noiseless channel () at the FC in the framework of resource-constrained Byzantine attack. Therefore, according to Theorem 1, we restrict our attention to the set of highly-symmetric matrices $P$. In other words, we assume that

$$
p_{lm} = \begin{cases} p, & \text{if } l \neq m \\ 1 - (M-1)p, & \text{otherwise.} \end{cases} \tag{7}
$$
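As a worked step (not part of the original derivation, but obtained by direct substitution), inserting Equation (7) into the third line of Equation (3) and using $\sum_{l \neq m} P(u_i = l \mid \theta) = 1 - P(u_i = m \mid \theta)$ gives a compact form of the received-symbol distribution:

$$
\begin{aligned}
P(v_i = m \mid \theta) &= \alpha \left[ \big(1 - (M-1)p\big) P(u_i = m \mid \theta) + p \big(1 - P(u_i = m \mid \theta)\big) \right] + (1-\alpha) P(u_i = m \mid \theta) \\
&= \alpha p + (1 - \alpha M p)\, P(u_i = m \mid \theta).
\end{aligned}
$$

In this form, the received pmf is a mixture of the uniform pmf (with weight $\alpha M p$) and the true-symbol pmf (with weight $1 - \alpha M p$). Setting $p = \frac{1}{M-1}$ and $\alpha = \frac{M-1}{M}$ makes the second weight vanish and leaves $P(v_i = m \mid \theta) = \frac{1}{M}$, in agreement with Theorem 1.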

## 4 Distributed Detection in the Presence of Resource-Constrained Byzantine Attacks

In this section, we consider a resource-constrained Byzantine attack on binary hypothesis testing in a distributed sensor network, where the phenomenon of interest is denoted as $\theta$ and is modeled as follows:

$$
\theta = \begin{cases} 0, & \text{if } H_0 \\ 1, & \text{if } H_1. \end{cases} \tag{8}
$$

In order to characterize the performance of the FC, we consider the Kullback-Leibler divergence (KLD) as the performance metric. Note that the KLD can be interpreted as the error exponent in the Neyman-Pearson detection framework [25], which means that the probability of missed detection goes to zero exponentially with the number of sensors at a rate equal to the KLD computed at the FC. We denote the KLD at the FC by $\mathbb{D}_{FC}$ and define it as follows:

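$$
\mathbb{D}_{FC} = E_{H_0}\left[ \log \frac{P(\mathbf{v} \mid H_0)}{P(\mathbf{v} \mid H_1)} \right] = \sum_{\mathbf{m} \in \{1, \cdots, M\}^N} P(\mathbf{v} = \mathbf{m} \mid H_0) \cdot \log \frac{P(\mathbf{v} = \mathbf{m} \mid H_0)}{P(\mathbf{v} = \mathbf{m} \mid H_1)}. \tag{9}
$$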

Since we have assumed that the sensor observations are conditionally independent, the KLD can be expressed as

$$
\mathbb{D}_{FC} = N D_{FC}, \tag{10}
$$

where

$$
D_{FC} = \sum_{m=1}^{M} P(v = m \mid H_0) \cdot \log \frac{P(v = m \mid H_0)}{P(v = m \mid H_1)}.
$$

Note that the optimal Byzantine attack, as given in Equation (6), results in equiprobable symbols at the FC irrespective of the hypotheses. Therefore, under the optimal Byzantine attack, $D_{FC} = 0$, resulting in the blinding of the FC.

On the other hand, if the attacker does not have enough resources to compromise an $\alpha_{\text{blind}}$ fraction of the sensors in the network (i.e., $\alpha < \alpha_{\text{blind}}$), an optimal strategy for the Byzantine node is to use an appropriate matrix $P$ that deteriorates the performance of the sensor network to the maximal extent. As mentioned earlier in Section 3, we restrict our search to finding the optimal $P$ within the space of highly symmetric row-stochastic matrices, as given in Equation (7). Thus, we formulate the problem as follows.

###### Problem 1.

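Given the value of $\alpha$, find the optimal $P$ within the space of highly symmetric row-stochastic matrices, as given in Equation (7), such that

$$
\begin{aligned}
& \underset{p}{\text{minimize}} & & D_{FC} \\
& \text{subject to} & & 0 \leq p \leq \frac{1}{M-1}
\end{aligned}
$$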

Theorem 2 presents the optimal flipping probability that provides the solution to Problem 1. Note that this result is independent of the design of the sensor network and, therefore, can be employed when the Byzantine attacker has no knowledge about the network.

###### Theorem 2.

Given a fixed $\alpha$, the flipping probability $p$ that optimizes $P$ within the space of highly symmetric row-stochastic matrices, as given in Equation (7), such that $D_{FC}$ is minimized, is given by

$$
p^* = \frac{1}{M-1}. \tag{11}
$$

###### Proof.

See Appendix 9. ∎

Note that this solution is of particular interest to the Byzantine attacker since the solution does not require any knowledge about the sensor network design. Also, the attacker’s strategy is very simple to implement.
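Theorem 2 can be illustrated numerically with a minimal sketch that sweeps $p$ over the feasible interval of Problem 1 and evaluates the per-sensor KLD. The two true-symbol pmfs, the value $\alpha = 0.3$ and all function names below are hypothetical choices made for illustration only; the sketch is a spot check on one example, not a proof.

```python
import math

def symmetric_matrix(M, p):
    """Highly symmetric flipping matrix of Eq. (7)."""
    return [[1 - (M - 1) * p if l == m else p for m in range(M)] for l in range(M)]

def received_pmf(pu, P, alpha):
    """Eq. (3): pmf of the received symbol v, given the pmf pu of the true symbol u."""
    M = len(pu)
    return [
        alpha * sum(P[l][m] * pu[l] for l in range(M)) + (1 - alpha) * pu[m]
        for m in range(M)
    ]

def kld(a, b):
    """Per-sensor KLD D(a || b) between two pmfs on the same alphabet."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

# Hypothetical pmfs of the true symbol under H0 and H1 (M = 4).
pu_h0 = [0.50, 0.30, 0.15, 0.05]
pu_h1 = [0.05, 0.15, 0.30, 0.50]
M, alpha = 4, 0.3  # alpha < alpha_blind = 0.75

grid = [k / 100 / (M - 1) for k in range(101)]  # 0 <= p <= 1/(M-1)
klds = []
for p in grid:
    P = symmetric_matrix(M, p)
    klds.append(kld(received_pmf(pu_h0, P, alpha), received_pmf(pu_h1, P, alpha)))

best_p = grid[klds.index(min(klds))]
print(best_p, 1 / (M - 1))
```

On this example the KLD decreases as $p$ grows, so the minimizer sits at the upper end of the feasible interval, $p = 1/(M-1)$.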

### Numerical Results

For illustration purposes, let us consider the following example, where the inference network is deployed to aid the opportunistic spectrum access for a cognitive radio network (CRN). In other words, the CRs are sensing a licensed spectrum band to find the vacant band for the operation of the CRN.

Let the observation model at the $i$-th sensor be defined as follows:

$$
r_i = s(\theta) + n_i, \tag{12}
$$

where $s(\theta)$ is a BPSK-modulated symbol transmitted by the licensed (or primary) user transmitter, and the noise $n_i$ is the AWGN at the $i$-th sensor with probability distribution $\mathcal{N}(0, \sigma^2)$.

Therefore, the conditional distribution of $r_i$ under $H_0$ and $H_1$ can be given as $\mathcal{N}(s(0), \sigma^2)$ and $\mathcal{N}(s(1), \sigma^2)$ respectively. The range of $r_i$ spans the entire real line, $(-\infty, \infty)$. However, we assume that the quantizer restricts the support by limiting the range of output values to a smaller range, say $[-A, A]$. The parameter $A$ is called the overloading parameter [26], because the choice of $A$ dictates the amount of overloading distortion caused by the quantizer. Within this restricted range of observations, we assume a uniform quantizer with a step size (called the granularity parameter) given by $\Delta = \frac{2A}{M-2}$, which dictates the granularity distortion of the quantizer. In other words, the observation $r_i$ is quantized using the following quantizer:

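$$
u_i = \begin{cases} 0, & \text{if } -\infty < r_i \leq \lambda_1 \\ 1, & \text{if } \lambda_1 < r_i \leq \lambda_2 \\ \ \vdots & \\ M-1, & \text{if } \lambda_{M-1} < r_i < \infty, \end{cases} \tag{13}
$$

where

$$
\lambda_i = A \cdot \left[ \frac{2(i-1)}{M-2} - 1 \right].
$$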

Note that $\lambda_1 = -A$ and $\lambda_{M-1} = A$ represent the restricted range of the quantizer, as discussed earlier. The $i$-th sensor transmits a symbol $v_i$ to the FC, where $v_i = u_i$ if it is honest. In the case of the sensor being a Byzantine node, the decision $u_i$ is modified into $v_i$ using the flipping probability matrix $P$ as given in Equation (6).
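The quantizer described above can be sketched in a few lines. This is a hedged illustration under stated assumptions: symbols are indexed $0, \dots, M-1$, the threshold formula is only used for $M > 2$ (it divides by $M-2$), and the values $A = 2$, mean $1$ and $\sigma = 1$ are hypothetical, since the parameter values used in the original example are not recoverable here. The function names are illustrative.

```python
import math

def thresholds(M, A):
    """lambda_i = A * (2(i-1)/(M-2) - 1) for i = 1, ..., M-1 (requires M > 2)."""
    if M <= 2:
        raise ValueError("the threshold formula divides by M - 2, so M must exceed 2")
    return [A * (2 * (i - 1) / (M - 2) - 1) for i in range(1, M)]

def quantize(r, lams):
    """Index (0, ..., M-1) of the quantizer cell containing r: counts thresholds below r."""
    return sum(r > lam for lam in lams)

def symbol_pmf(mean, sigma, lams):
    """Eq. (2) for a Gaussian observation: probability mass of each quantizer cell."""
    def cdf(x):
        return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))
    edges = [0.0] + [cdf(lam) for lam in lams] + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

lams = thresholds(4, 2.0)          # hypothetical M = 4, A = 2
print(lams)                        # [-2.0, 0.0, 2.0]
print(quantize(0.5, lams))         # 2
print(symbol_pmf(1.0, 1.0, lams))  # pmf of u_i for a hypothetical mean of 1 and sigma of 1
```

The outermost cells absorb all observations beyond $\pm A$, which is where the overloading distortion mentioned above arises.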

Although the performance of a given sensor network is quantified by the probability of error at the FC, we use a surrogate metric, as described earlier, called the KLD at the FC (refer to Equation (9)) for the sake of tractability. In an asymptotic sense, Stein’s Lemma [25] states that the KLD is the rate at which the probability of missed detection converges to zero under a constrained probability of false alarm. Therefore, in our numerical results, we present how the KLD at the FC varies with the fraction of Byzantine nodes, $\alpha$, in the network.

For the above sensor network, we assume that , and . In Figure 3, we plot the contribution of each sensor in terms of KLD at the FC as a function of $\alpha$, for 1-bit, 2-bit, 3-bit and 4-bit quantizations, i.e., $M = 2$, 4, 8 and 16 respectively, at the sensors. As per our intuition, we observe an improvement in both the detection performance (KLD) and the security performance ($\alpha_{\text{blind}}$) with finer quantization. Therefore, for a given $\alpha$, the Byzantine attack can be mitigated by employing finer quantization at the sensors. Of course, the best that the designer can do is to let the sensors transmit unquantized data to the FC, whether in the form of observation samples or their sufficient statistic (likelihood ratio). In this case, we can see that $\alpha_{\text{blind}} \to 1$, since $M \to \infty$.

## 5 Distributed Estimation in the Presence of Resource-Constrained Byzantine Attacks

In this section, we consider the problem of estimating a scalar parameter of interest, denoted by $\theta$, in a distributed sensor network. As described in the system model, we assume that the $i$-th sensor quantizes its observation $r_i$ into an M-ary symbol $u_i$, and transmits $v_i$ to the FC. If the node is honest, then $v_i = u_i$. Otherwise, we assume that the sensor is compromised and flips $u_i$ into $v_i$ using a flipping probability matrix $P$. Under the assumption that the FC receives the symbols $\mathbf{v}$ over an ideal channel, the estimation performance at the FC depends on the probability mass function $P(\mathbf{v} \mid \theta)$.

The performance of a distributed estimation network can be expressed in terms of the mean-squared error, defined as $E\left[ (\hat{\theta}(\mathbf{v}) - \theta)^2 \right]$. In the case of unbiased estimators, this mean-squared error is lower bounded by the Cramer-Rao lower bound (CRLB) [27], which provides a benchmark for the design of an estimator at the FC. We present this result in Equation (14):

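$$
E\left[ (\hat{\theta}(\mathbf{v}) - \theta)^2 \right] \geq \frac{1}{I_{FC}}, \tag{14}
$$

where

$$
I_{FC} = E\left[ \left( \frac{\partial \log P(\mathbf{v}, \theta)}{\partial \theta} \right)^2 \right]. \tag{15}
$$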

The term $I_{FC}$ is well known as the Fisher information (FI), and is, therefore, a performance metric that captures the performance of the optimal estimator at the FC. Note that, as shown in Equation (16), $I_{FC}$ can be further decomposed into two parts, one corresponding to the prior knowledge about $\theta$ at the FC, and the other (denoted as $\mathbb{J}_{FC}$) representing the information about $\theta$ in the sensor transmissions $\mathbf{v}$:

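$$
I_{FC} = \mathbb{J}_{FC} + E\left[ \left( \frac{\partial \log p(\theta)}{\partial \theta} \right)^2 \right], \tag{16}
$$

where

$$
\mathbb{J}_{FC} = E\left[ \left( \frac{\partial \log P(\mathbf{v} \mid \theta)}{\partial \theta} \right)^2 \right]. \tag{17}
$$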

In most cases, a closed-form expression for the mean-squared error is intractable and, therefore, the conditional Fisher information (FI) is used as a surrogate metric to quantify the performance of a distributed estimation network. In this paper, we also use the conditional FI of the received data $\mathbf{v}$ as the performance metric. Since the sensor observations are conditionally independent, resulting in independent $v_i$, the conditional FI at the FC can be expressed in terms of the conditional FI of a single sensor as follows:

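$$
\mathbb{J}_{FC} = N J_{FC}, \tag{18}
$$

where

$$
J_{FC} = E\left[ \left( \frac{\partial}{\partial \theta} \log P(v \mid \theta) \right)^2 \right] = -E\left[ \frac{\partial^2}{\partial \theta^2} \log P(v \mid \theta) \right]. \tag{19}
$$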

Following the same approach as in Section 4, we consider the problem of finding an optimal resource-constrained Byzantine attack when $\alpha < \alpha_{\text{blind}}$, by finding the symmetric transition matrix $P$ that minimizes the conditional FI at the FC. This can be formulated as follows.

###### Problem 2.

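Given the value of $\alpha$, determine the optimal $P$ within the space of highly symmetric row-stochastic matrices, as given in Equation (7), such that

$$
\begin{aligned}
& \underset{p}{\text{minimize}} & & J_{FC} \\
& \text{subject to} & & 0 \leq p \leq \frac{1}{M-1}.
\end{aligned}
$$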

Theorem 3 presents the optimal flipping probability that provides a solution to Problem 2. Note that this result is independent of the design of the sensor network and, therefore, can be employed when the Byzantine attacker has no knowledge about the network.

###### Theorem 3.

Given a fixed $\alpha$, the flipping probability $p$ that optimizes $P$ over the space of highly symmetric row-stochastic matrices, as given in Equation (7), by minimizing $J_{FC}$ is given by

$$
p^* = \frac{1}{M-1}.
$$

###### Proof.

See Appendix 10. ∎
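As with detection, Theorem 3 can be spot-checked numerically. The sketch below evaluates the per-sensor conditional FI of a discrete received symbol, $\sum_m \left(\partial P(v = m \mid \theta) / \partial\theta\right)^2 / P(v = m \mid \theta)$, for a Gaussian location model and sweeps $p$ over the feasible interval. The thresholds, $\theta = 0$, $\sigma = 1$, $\alpha = 0.3$ and all function names are hypothetical choices for illustration.

```python
import math

def gaussian_cells(theta, sigma, lams):
    """P(u = m | theta) and its derivative w.r.t. theta, for r = theta + n, n ~ N(0, sigma^2)."""
    def cdf(x):
        return 0.5 * (1 + math.erf((x - theta) / (sigma * math.sqrt(2))))
    def pdf(x):
        return math.exp(-0.5 * ((x - theta) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    F = [0.0] + [cdf(lam) for lam in lams] + [1.0]
    f = [0.0] + [pdf(lam) for lam in lams] + [0.0]
    probs = [F[k + 1] - F[k] for k in range(len(lams) + 1)]
    # d/dtheta of cdf(lam) equals -pdf(lam), hence the sign pattern below.
    derivs = [f[k] - f[k + 1] for k in range(len(lams) + 1)]
    return probs, derivs

def fisher_information(theta, sigma, lams, alpha, p):
    """Per-sensor conditional FI of v under the symmetric attack of Eq. (7)."""
    M = len(lams) + 1
    probs, derivs = gaussian_cells(theta, sigma, lams)
    P = [[1 - (M - 1) * p if l == m else p for m in range(M)] for l in range(M)]
    def mix(x):
        # Eq. (3) is linear in the true-symbol pmf, so it applies to derivatives as well.
        return [alpha * sum(P[l][m] * x[l] for l in range(M)) + (1 - alpha) * x[m]
                for m in range(M)]
    pv, dpv = mix(probs), mix(derivs)
    return sum(d * d / q for d, q in zip(dpv, pv) if q > 0)

theta, sigma, lams, alpha = 0.0, 1.0, [-2.0, 0.0, 2.0], 0.3  # hypothetical values
M = len(lams) + 1
grid = [k / 100 / (M - 1) for k in range(101)]  # 0 <= p <= 1/(M-1)
fi = [fisher_information(theta, sigma, lams, alpha, p) for p in grid]
best_p = grid[fi.index(min(fi))]
print(best_p, 1 / (M - 1))
```

On this example the FI is smallest at $p = 1/(M-1)$, and it drops to (numerically) zero when $\alpha$ is additionally raised to $\alpha_{\text{blind}} = (M-1)/M$.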

### Numerical Results

As an illustrative example, we consider the problem of estimating $\theta$ at the FC based on all the sensors’ transmitted messages. Let the observation model at the $i$-th sensor be defined as follows:

$$
r_i = \theta + n_i, \tag{20}
$$

where the noise $n_i$ is the AWGN at the $i$-th sensor with probability distribution $\mathcal{N}(0, \sigma^2)$. The sensors employ the same quantizer as the one presented in Equation (13). The quantized symbol, denoted as $u_i$ at the $i$-th sensor, is then modified into $v_i$ using the flipping probability matrix $P$, as given in Equation (6).

Figure 4 plots the conditional FI corresponding to one sensor, for different values of and , when the uniform quantizer is centered around the true value of $\theta$. Note that as SNR increases (), it is better for the network to quantize as finely as possible in order to mitigate the Byzantine attackers. On the other hand, if SNR is low, coarse quantization performs better for lower values of $\alpha$. This phenomenon of coarse quantization performing better under low-SNR scenarios can be attributed to the fact that, as the quantization gets coarser (decreasing $M$), more of the noise than of the signal gets filtered out. In the case of high SNR, since the signal level is high, coarse quantization cancels out the signal component significantly, thereby resulting in a degradation in performance.

## 6 Mitigation of Byzantine Attacks in a Bandwidth-Constrained Inference Network

Given that the distributed inference network is under Byzantine attack, we showed that the performance of the network can be improved by increasing the quantization alphabet size of the sensors. Obviously, in a bandwidth-constrained distributed inference network, the sensors can only transmit with the maximum possible $M$, which is finite. In this section, we assume that the network cannot further increase the quantization alphabet size due to this bandwidth constraint. Therefore, we present a reputation-based Byzantine identification/mitigation scheme, which is an extension of the one proposed by Rawat et al. in [20], in order to improve the inference performance of the network.

### 6.1 Reputation-Tagging at the Sensors

As proposed by Rawat et al. in [20], the FC identifies the Byzantine nodes by iteratively updating a reputation-tag for each node as time progresses. We extend the scheme to include fine quantization scenarios, i.e., $M > 2$, and analyze its performance through simulation results.

As mentioned earlier in the paper, the FC receives a vector $\mathbf{v}$ of received symbols from the sensors and fuses them to yield a global decision, denoted as $\hat{\theta}$. We assume that the observation model is known to the network designer, and is given as follows:

$$
r_i = f_i(\theta) + n_i, \tag{21}
$$

where $f_i(\cdot)$ denotes the known observation model. We denote the quantization rule employed at the sensor as $\gamma$. Therefore, the quantized message at the $i$-th sensor is given by $u_i = \gamma(r_i)$. As discussed earlier, a compromised sensor flips $u_i$ into $v_i$ using a flipping probability matrix $P$. Since the FC makes a global inference $\hat{\theta}$, it can calculate the squared deviation of each sensor's message from the message that the sensor is nominally expected to transmit, as follows:

$$
d_i = \left( \gamma^{-1}(v_i) - f_i(\hat{\theta}) \right)^2, \tag{22}
$$

where $\gamma^{-1}(\cdot)$ is the inverse of the sensor quantizer, and it is assumed to return the centroid of the corresponding decision region of the quantizer $\gamma$.

Note that $v_i$ is the received symbol which characterizes the behavior (honest or Byzantine) of the $i$-th sensor, while $f_i(\hat{\theta})$ is the signal that the FC expects the sensor to observe. If the sensor is honest, we expect the mean of $d_i$ to be small. On the other hand, if the sensor is a compromised node, then the mean of $d_i$ is expected to be large. Therefore, we accumulate the squared deviations over $T$ time intervals and compute a reputation tag $\Lambda_i$ as a time-average for the $i$-th node, as follows:

$$
\Lambda_i = \frac{1}{T} \sum_{t=1}^{T} d_i(t). \tag{23}
$$

The sensor is declared honest/Byzantine using the following threshold-based tagging rule:

$$
\Lambda_i \underset{\text{Honest}}{\overset{\text{Byzantine}}{\gtrless}} \eta. \tag{24}
$$

The performance of the above tagging rule depends strongly on the choice of $\eta$. Note that the threshold should be chosen based on two factors. Firstly, $\eta$ should be chosen in such a way that the probability with which a malicious node is tagged Byzantine is high. The higher the value of $\eta$, the lower the chance of tagging a node as Byzantine, and vice versa. This results in a tradeoff between the probability of detecting a Byzantine node and the probability of falsely tagging an honest node as Byzantine. Secondly, the value of $T$ also plays a role in the choice of $\eta$, and therefore in the performance of the tagging rule. We illustrate this phenomenon in our simulation results.
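The tagging rule in Equations (22)-(24) can be sketched as follows. This is a minimal illustration, not the simulation setup of the paper: the centroid table standing in for $\gamma^{-1}$, the expected signal $f_i(\hat{\theta}) = 1$, the received symbol sequences and the threshold $\eta = 4$ are all hypothetical values.

```python
def squared_deviation(reconstructed, expected):
    """Eq. (22): squared deviation between the de-quantized symbol and the expected signal."""
    return (reconstructed - expected) ** 2

def reputation_tag(deviations):
    """Eq. (23): time-average of the squared deviations over the window."""
    return sum(deviations) / len(deviations)

def is_byzantine(tag, eta):
    """Eq. (24): tag the node as Byzantine when its reputation tag exceeds eta."""
    return tag > eta

# Hypothetical stand-in for gamma^{-1}: centroid of each of M = 4 decision regions.
centroids = {0: -3.0, 1: -1.0, 2: 1.0, 3: 3.0}
expected = 1.0  # hypothetical f_i(theta_hat), taken as constant over the window
received = {
    "honest": [2, 2, 3, 2],      # symbols mostly agree with the expected signal
    "byzantine": [0, 1, 3, 0],   # symbols frequently flipped
}
eta = 4.0  # hypothetical tagging threshold

tags = {
    name: reputation_tag([squared_deviation(centroids[v], expected) for v in symbols])
    for name, symbols in received.items()
}
decisions = {name: is_byzantine(tag, eta) for name, tag in tags.items()}
print(tags)       # {'honest': 1.0, 'byzantine': 10.0}
print(decisions)  # {'honest': False, 'byzantine': True}
```

Lowering `eta` towards 1 in this toy example would start to tag the honest node as well, which is the tradeoff described above.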

### 6.2 Optimal Choice of the Tagging Threshold as T→∞

In this paper, we denote the true type of the $i^{th}$ node as $T_i$, where $T_i = H$ corresponds to honest behavior, while $T_i = B$ corresponds to Byzantine behavior, for all $i = 1, \ldots, N$. Earlier in this section, we presented Equation (24), which allows us to make inferences about the true type. The performance of the Byzantine tagging scheme corresponding to the $i^{th}$ sensor is quantified by the conditional probabilities $P(\Lambda_i \geq \eta \mid T_i = T)$, for both $T = H, B$. In order to find the optimal choice of $\eta$ in Equation (24), we continue with the Neyman-Pearson framework even in the context of Byzantine identification, where the goal is to maximize $P(\Lambda_i \geq \eta \mid T_i = B)$, subject to the condition that $P(\Lambda_i \geq \eta \mid T_i = H) \leq \xi$.

To find the two conditional probabilities $P(\Lambda_i \geq \eta \mid T_i = H)$ and $P(\Lambda_i \geq \eta \mid T_i = B)$, we need closed-form expressions of the conditional distributions $P(\Lambda_i \mid T_i = H)$ and $P(\Lambda_i \mid T_i = B)$, respectively. In practice, where $T$ is finite, it is intractable to determine the conditional distribution of $\Lambda_i$, which is necessary to come up with the optimal choice of $\eta$. Therefore, in this paper, we assume that $T \to \infty$ and present an asymptotic choice of the tagging threshold $\eta$ used in Equation (24).

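As $T \to \infty$, since $d_i(t)$ is independent across $t$, by the central limit theorem, $\Lambda_i \mid T_i = T \sim \mathcal{N}\left(\mu_{i,T}, \sigma_{i,T}^2\right)$, where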

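$$ \mu_{i,T} = E\left( \Lambda_i \mid T_i = T \right) = E\left[ \left( \gamma^{-1}(v_i(t)) - \hat{\theta}(t) \right)^2 \,\Big|\, T_i = T \right] \tag{25} $$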

and

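$$ \sigma_{i,T}^2 = \mathrm{Var}\left( \Lambda_i \mid T_i = T \right) = \frac{1}{T} \mathrm{Var}\left[ \left( \gamma^{-1}(v_i(t)) - \hat{\theta}(t) \right)^2 \,\Big|\, T_i = T \right]. \tag{26} $$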

In this paper, we do not present the final form of $\mu_{i,T}$ and $\sigma_{i,T}^2$ in order to preserve generality. Assuming that $v_i(t)$ is independent across sensors as well as time, the moments of $d_i(t)$ can be computed for any given FC's inference $\hat{\theta}(t)$ at time $t$ about a given phenomenon. Although the final form of $\mu_{i,T}$ and $\sigma_{i,T}^2$ is not presented, since $d_i$ is a function of $v_1, \ldots, v_N$, we present the conditional probability of $v_j$ in Equation (27), which is necessary for the computation of $\mu_{i,T}$ and $\sigma_{i,T}^2$.

$$ P(v_j \mid T_i = T) = \int P(v_j \mid \theta, T_i = T)\, p(\theta)\, d\theta, \tag{27} $$

where $P(v_j = m \mid \theta, T_i = T)$ can be calculated as follows:

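$$ P(v_j = m \mid \theta, T_i = H) = \begin{cases} P(u_j = m \mid \theta), & \text{if } j = i \\[4pt] (1 - \pi_{BH})\, P(u_j = m \mid \theta) + \pi_{BH} \displaystyle\sum_{k=1}^{M} p_{km}\, P(u_j = k \mid \theta), & \text{if } j \neq i \end{cases} \tag{28} $$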

and

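$$ P(v_j = m \mid \theta, T_i = B) = \begin{cases} \displaystyle\sum_{k=1}^{M} p_{km}\, P(u_j = k \mid \theta), & \text{if } j = i \\[4pt] (1 - \pi_{BB})\, P(u_j = m \mid \theta) + \pi_{BB} \displaystyle\sum_{k=1}^{M} p_{km}\, P(u_j = k \mid \theta), & \text{if } j \neq i, \end{cases} \tag{29} $$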

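where $\pi_{BH}$ and $\pi_{BB}$ are the conditional probabilities of the $j^{th}$ node's type, given the type of the $i^{th}$ node. Since a fraction $\alpha$ of the $N$ nodes in the network is Byzantine, given that the FC knows the type of the $i^{th}$ node as $T_i$, the conditional probability of the $j^{th}$ node being Byzantine is given by $\pi_{BH} = \frac{N\alpha}{N-1}$ and $\pi_{BB} = \frac{N\alpha - 1}{N-1}$.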
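To make the structure of Equations (28) and (29) concrete, the following Python sketch evaluates the distribution of a received symbol for a fixed $\theta$. It is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, `p[k][m]` stands for the flipping probability $p_{km}$, the probabilities $\pi_{BH}$ and $\pi_{BB}$ are passed in as plain numbers, and symbols are indexed from 0 rather than 1.

```python
from typing import List, Sequence


def flipped_pmf(u_pmf: Sequence[float], p: Sequence[Sequence[float]]) -> List[float]:
    """sum_k p[k][m] * P(u = k | theta), i.e. the symbol distribution after a flip."""
    size = len(u_pmf)
    return [sum(p[k][m] * u_pmf[k] for k in range(size)) for m in range(size)]


def received_pmf(
    u_pmf: Sequence[float],
    p: Sequence[Sequence[float]],
    same_node: bool,
    type_i: str,
    pi_bh: float,
    pi_bb: float,
) -> List[float]:
    """P(v_j = m | theta, T_i) following Equations (28) (T_i = 'H') and (29) (T_i = 'B').

    same_node is True for j = i and False for j != i.
    """
    if type_i not in ("H", "B"):
        raise ValueError("type_i must be 'H' or 'B'")
    flipped = flipped_pmf(u_pmf, p)
    if same_node:
        # Node i itself: its type is known, so there is no mixing.
        return list(u_pmf) if type_i == "H" else flipped
    # Any other node is Byzantine with probability pi, honest otherwise.
    pi = pi_bh if type_i == "H" else pi_bb
    return [(1.0 - pi) * u + pi * f for u, f in zip(u_pmf, flipped)]


# Hypothetical binary example: honest symbol distribution and a full-flip attack.
example = received_pmf([0.7, 0.3], [[0.0, 1.0], [1.0, 0.0]], False, "H", 0.25, 0.2)
```

For the node under test ($j = i$) the known type selects either the honest or the flipped distribution, while every other node contributes a mixture of the two.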

Given the conditional distributions of $\Lambda_i$ under $T_i = H$ and $T_i = B$, we find the performance of the Byzantine identification scheme as follows:

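$$ \begin{aligned} P(\Lambda_i \geq \eta \mid T_i = H) &= Q\left( \frac{\eta - \mu_{i,H}}{\sigma_{i,H}} \right) \\ P(\Lambda_i \geq \eta \mid T_i = B) &= Q\left( \frac{\eta - \mu_{i,B}}{\sigma_{i,B}} \right). \end{aligned} \tag{30} $$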

Under the NP framework, the optimal $\eta$ can be chosen by letting $P(\Lambda_i \geq \eta \mid T_i = H) = \xi$, when $\Lambda_i$ is normally distributed conditioned on the true type of a given node. In other words,

$$ Q\left( \frac{\eta - \mu_{i,H}}{\sigma_{i,H}} \right) = \xi \tag{31} $$

or equivalently,

$$ \eta_{\text{optimal}} = \mu_{i,H} + \sigma_{i,H}\, Q^{-1}(\xi). \tag{32} $$
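As a numerical illustration of Equations (30)-(32), the threshold can be evaluated with the Python standard library, using $Q^{-1}(\xi) = \Phi^{-1}(1 - \xi)$, where $\Phi$ is the standard normal CDF. The function names and the numerical values below are hypothetical; the conditional means and standard deviations are assumed to have been computed beforehand.

```python
from statistics import NormalDist


def q_inverse(xi: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x), for 0 < xi < 1."""
    return NormalDist().inv_cdf(1.0 - xi)


def optimal_threshold(mu_h: float, sigma_h: float, xi: float) -> float:
    """Equation (32): threshold meeting the false-tagging constraint with equality."""
    return mu_h + sigma_h * q_inverse(xi)


def tagging_probability(eta: float, mu: float, sigma: float) -> float:
    """Equation (30): P(Lambda_i >= eta) under the Gaussian approximation (sigma > 0)."""
    return 1.0 - NormalDist(mu, sigma).cdf(eta)


# Hypothetical moments for the honest and Byzantine hypotheses.
eta = optimal_threshold(mu_h=0.02, sigma_h=0.01, xi=0.05)
false_tag = tagging_probability(eta, mu=0.02, sigma=0.01)
detection = tagging_probability(eta, mu=0.15, sigma=0.03)
```

By construction, the false-tagging probability evaluated at the returned threshold equals $\xi$, and the detection probability then follows from the Byzantine moments.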

Note that, since $P(v_j \mid T_i)$ is a function of $\alpha$, it follows that both $\mu_{i,H}$ and $\sigma_{i,H}$ are functions of $\alpha$. Although we do not provide a closed-form expression for $\eta_{\text{optimal}}$ as a function of $\alpha$, we provide the following example to portray how $\eta$ varies with different values of $\alpha$.

#### Example: Variation of η as a function of α

Consider a distributed estimation network with identical nodes. Let the prior distribution of the true phenomenon $\theta$ be the uniform distribution $\mathcal{U}(0,1)$. We assume that the sensing channel is an AWGN channel where the sensor observation is given by $r_i = \theta + n_i$. Therefore, the sensor observations $r_i$ are conditionally Gaussian with mean $\theta$, when conditioned on $\theta$. We assume that the sensors employ the quantizer rule shown in Equation (13) on their observations $r_i$. At the FC, we let $\gamma^{-1}(\cdot)$ be defined as the centroid function that returns the centroid of the decision region corresponding to the received symbol. Let $\hat{\theta} = \frac{1}{M} \sum_{i=1}^{5} \gamma^{-1}(v_i)$ be the fusion rule employed at the FC to estimate $\theta$.

Since the network comprises identical nodes, without any loss of generality, we henceforth focus our attention on the reputation-based identification rule at sensor-1. Substituting the above-mentioned fusion rule in the squared-deviation corresponding to sensor-1 in Equation (22), we have

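$$ d_1 = \left( \gamma^{-1}(v_1) - \frac{1}{M} \sum_{i=1}^{5} \gamma^{-1}(v_i(t)) \right)^2 = \left( \frac{M-1}{M} \gamma^{-1}(v_1) - \frac{1}{M} \sum_{i=2}^{5} \gamma^{-1}(v_i(t)) \right)^2. \tag{33} $$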

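Let us denote $\phi_{ik} = E\left[ \left( \gamma^{-1}(v_i) \right)^k \mid T_1 = H \right]$, for all $i$ and $k$. Here, $P(v_i = m \mid T_1 = H)$ can be computed using Equation (28) as follows: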

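$$ \begin{aligned} P(v_i = m \mid T_1 = H) &= \int_{-\infty}^{\infty} P(v_i = m \mid \theta, T_1 = H)\, p(\theta)\, d\theta = \int_{0}^{1} P(v_i = m \mid \theta, T_1 = H)\, d\theta \\ &= \begin{cases} a_{1,m}, & \text{if } i = 1 \\[4pt] \dfrac{N\alpha}{(N-1)(M-1)} + \left( 1 - \dfrac{MN\alpha}{(N-1)(M-1)} \right) a_{i,m}, & \text{otherwise}, \end{cases} \end{aligned} \tag{34} $$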

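where $a_{i,m} = \int_{0}^{1} P(u_i = m \mid \theta)\, d\theta$, for all $m$. Note that, since all the nodes in the network are identical, $a_{i,m}$ is independent of the node-index $i$, and therefore, $\phi_{ik} = \phi_{2k}$, for all $i \geq 2$.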

Thus, the conditional mean and variance, $\mu_{1H}$ and $\sigma_{1H}^2$, are given as follows for the special case of $N = 5$:

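$$ \begin{aligned} \mu_{1H} &= E\left[ \left( \frac{M-1}{M} \gamma^{-1}(v_1) - \frac{1}{M} \sum_{i=2}^{5} \gamma^{-1}(v_i(t)) \right)^2 \,\Bigg|\, T_i = H \right] \\ &= \frac{1}{M^2} E\left[ \left( (M-1) \gamma^{-1}(v_1) - \sum_{i=2}^{5} \gamma^{-1}(v_i(t)) \right)^2 \,\Bigg|\, T_i = H \right] \\ &= \frac{1}{M^2} \left[ (M-1)^2 \phi_{12} + 4\phi_{22} + 12\phi_{21}^2 - 8(M-1)\phi_{11}\phi_{21} \right] \end{aligned} \tag{35} $$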

and

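$$ \sigma_{1H}^2 = \frac{1}{T} \mathrm{Var}\left[ \left( \gamma^{-1}(v_i(t)) - \hat{\theta}(t) \right)^2 \,\Big|\, T_i = H \right] = \frac{1}{T} \left\{ \Delta - \mu_{1H}^2 \right\}, \tag{36} $$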

where

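$$ \begin{aligned} \Delta &= E\left[ \left( \frac{M-1}{M} \gamma^{-1}(v_1) - \frac{1}{M} \sum_{i=2}^{5} \gamma^{-1}(v_i(t)) \right)^4 \,\Bigg|\, T_i = H \right] \\ &= \frac{1}{M^4} \Big[ (M-1)^4 \phi_{14} - 16 (M-1)^3 \phi_{13}\phi_{21} + 6 (M-1)^2 \phi_{12} \left\{ 4\phi_{22} + 12\phi_{21}^2 \right\} \\ &\qquad - 4 (M-1) \phi_{11} \left( 4\phi_{23} + 36\phi_{22}\phi_{21} + 24\phi_{21}^3 \right) + 4\phi_{24} + 12\phi_{23}\phi_{21} \\ &\qquad + 36 \left( \phi_{23}\phi_{21} + \phi_{22}^2 + 2\phi_{22}\phi_{21}^2 \right) + 24 \left( \phi_{21}^4 + 3\phi_{22}\phi_{21}^2 \right) \Big]. \end{aligned} \tag{37} $$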
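The moment expansions in Equations (35) and (37) can be cross-checked numerically. The Python sketch below compares the closed forms against a brute-force enumeration over all symbol combinations of five nodes. It assumes that the dequantized symbols of the five nodes are mutually independent, that nodes 2 to 5 are identically distributed, and it keeps the scaling by $M$ and the five-node sum exactly as printed in Equations (33), (35) and (37). The centroid values and the two symbol distributions are hypothetical.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def moment(values: Sequence[float], pmf: Sequence[float], k: int) -> float:
    """k-th moment of a dequantized symbol gamma^{-1}(v)."""
    return sum(p * x ** k for x, p in zip(values, pmf))


def closed_form(M: int, phi1: Dict[int, float], phi2: Dict[int, float]) -> Tuple[float, float]:
    """Equations (35) and (37); phi1[k], phi2[k] are the k-th moments (k = 1..4)
    for node 1 and for any one of nodes 2..5, respectively."""
    a = M - 1
    s2 = 4 * phi2[2] + 12 * phi2[1] ** 2
    mu = (a ** 2 * phi1[2] + s2 - 8 * a * phi1[1] * phi2[1]) / M ** 2
    delta = (
        a ** 4 * phi1[4]
        - 16 * a ** 3 * phi1[3] * phi2[1]
        + 6 * a ** 2 * phi1[2] * s2
        - 4 * a * phi1[1] * (4 * phi2[3] + 36 * phi2[2] * phi2[1] + 24 * phi2[1] ** 3)
        + 4 * phi2[4]
        + 12 * phi2[3] * phi2[1]
        + 36 * (phi2[3] * phi2[1] + phi2[2] ** 2 + 2 * phi2[2] * phi2[1] ** 2)
        + 24 * (phi2[1] ** 4 + 3 * phi2[2] * phi2[1] ** 2)
    ) / M ** 4
    return mu, delta


def brute_force(M: int, values: Sequence[float], pmf1: Sequence[float],
                pmf2: Sequence[float]) -> Tuple[float, float]:
    """E[d_1^2-type] and E[d_1^4-type] terms by enumerating all 5-node outcomes."""
    mu = delta = 0.0
    for combo in product(range(len(values)), repeat=5):
        prob = pmf1[combo[0]]
        for c in combo[1:]:
            prob *= pmf2[c]
        dev = ((M - 1) * values[combo[0]] - sum(values[c] for c in combo[1:])) / M
        mu += prob * dev ** 2
        delta += prob * dev ** 4
    return mu, delta


# Hypothetical ternary alphabet with centroids of equal-width cells on [0, 1].
values = [1 / 6, 1 / 2, 5 / 6]
pmf1, pmf2 = [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]
phi1 = {k: moment(values, pmf1, k) for k in range(1, 5)}
phi2 = {k: moment(values, pmf2, k) for k in range(1, 5)}
mu_closed, delta_closed = closed_form(3, phi1, phi2)
mu_brute, delta_brute = brute_force(3, values, pmf1, pmf2)
```

Under these assumptions the closed forms and the enumeration agree up to floating-point error, and the variance in Equation (36) then follows as $(\Delta - \mu_{1H}^2)/T$.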

Thus, for , we compute the tagging threshold numerically as shown in Equation (32), and plot the variation of $\eta$ as a function of $\alpha$ in Figure 5. Note that, in our numerical results, we observe that the optimal choice of $\eta$ is a convex function of $\alpha$, where the curvature of the convexity decreases with increasing . This can be clearly seen from Figure b, where we only plot the case of . We observe a similar behavior for all the other values of , and therefore, present the case of to illustrate the convex behavior of $\eta$. In other words, for very large values of , the choice of $\eta$ becomes independent of $\alpha$, for any fixed .

### 6.3 Simulation Results

In order to illustrate the performance of the proposed reputation-based scheme, we consider a sensor network with 100 sensors, of which 20 are Byzantine. Let the sensor quantizers be given by Equation (13) and the fusion rule at the FC be the MAP rule, given as follows:

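$$ \sum_{i=1}^{N} \log\left( \frac{P(v_i \mid H_1)}{P(v_i \mid H_0)} \right) \underset{\hat{\theta} = 0}{\overset{\hat{\theta} = 1}{\gtrless}} \log\frac{p_0}{p_1}. \tag{38} $$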
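The MAP fusion rule in Equation (38) can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, all sensors are assumed to share the same symbol distributions under each hypothesis, symbols are indexed from 0, and a tie is resolved in favor of $\hat{\theta} = 1$.

```python
import math
from typing import Sequence


def map_decision(
    symbols: Sequence[int],
    pmf_h1: Sequence[float],
    pmf_h0: Sequence[float],
    p0: float,
    p1: float,
) -> int:
    """Equation (38): compare the summed log-likelihood ratio of the received
    symbols with log(p0 / p1). All probabilities are assumed to be positive."""
    llr = sum(math.log(pmf_h1[v] / pmf_h0[v]) for v in symbols)
    return 1 if llr >= math.log(p0 / p1) else 0


# Hypothetical binary example with equal priors.
decision_a = map_decision([1, 1, 0], [0.3, 0.7], [0.7, 0.3], 0.5, 0.5)
decision_b = map_decision([0, 0, 1], [0.3, 0.7], [0.7, 0.3], 0.5, 0.5)
```

With equal priors the threshold is zero, so the decision follows the sign of the summed log-likelihood ratio, which here reduces to a majority vote over the received symbols.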

Figure 6 plots the rate at which the proposed reputation-based scheme identifies the Byzantine nodes in the network, for different sizes of the quantization alphabet set. Note that the convergence rate deteriorates as $M$ increases. This is because the Byzantine nodes have an increasing number of symbol options to flip to, so a greater number of time-samples is needed to identify the malicious behavior. In addition, we simulate the evolution in time of the probability of mislabelling an honest node as a Byzantine node, and plot it in Figure 7. Just as the convergence deteriorates with increasing $M$, we observe a similar behavior in the evolution of the probability of mislabelling honest nodes. Another important observation in Figure 7 is that the probability of mislabelling a node always converges to zero in time. Similarly, Figure 8 shows the evolution in time of the probability of mislabelling a Byzantine node as an honest one. This probability also converges to zero, at a rate that decreases with an increasing number of quantization levels $M$. Therefore, Figures 6, 7 and 8 demonstrate that, after a sufficient amount of time, the reputation-based scheme always identifies the true behavior of a node within the network, with a negligible number of mislabels.

## 7 Concluding Remarks

In summary, we modelled the problem of distributed inference with M-ary quantized data in the presence of Byzantine attacks, under the assumption that the attacker does not have knowledge about either the true hypotheses or the quantization thresholds at the sensors. We found the optimal Byzantine attack that blinds the FC in the case of any inference task for both noiseless and noisy FC channels. We also considered the problem of resource-constrained Byzantine attacks ($\alpha < \alpha_{\text{blind}}$) for distributed detection and estimation, for the special case of highly symmetric attack strategies in the presence of noiseless channels at the FC. From the inference network's perspective, we presented a mitigation scheme that identifies the Byzantine nodes through reputation-tagging. We also showed how the optimal tagging threshold can be found when the time-window $T \to \infty$. Finally, we investigated the performance of our reputation-based scheme in our simulation results and showed that our scheme always converges to finding all the compromised nodes, given a sufficient amount of time. In our future work, we will investigate the optimal resource-constrained Byzantine attack in the space of all row-stochastic flipping probability matrices and, if possible, find schemes that mitigate the Byzantine attack more effectively.


## 8 Optimal Byzantine Attack in the Presence of a Discrete Noisy Channel at the FC

Given that the messages $v_i$ are transmitted to the fusion center (FC), we assume a discrete noisy channel between the sensors and the FC, where $q_{mn}$ is the probability with which $v_i = m$ is transformed to the symbol $z_i = n$ for the $i^{th}$ sensor. Based on the symbols $z_i$ received at the FC, a global inference is made about the phenomenon of interest. In this paper, we assume that the row-stochastic channel matrix $[q_{mn}]$ is invertible for the sake of tractability.

Given the transition probability matrix for the channel between the sensors and the FC, we assume that the FC receives $z_i = n$ when the $i^{th}$ sensor transmits $v_i = m$, with probability $q_{mn}$. The conditional distribution of $z_i$ under a given phenomenon $\theta$ is given as

$$ P(z_i = n \mid \theta) = \sum_{m=1}^{M} q_{mn}\, P(v_i = m \mid \theta). \tag{39} $$

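Note that if the channel matrix $[q_{mn}]$ is doubly stochastic, since $\sum_{m=1}^{M} q_{mn} = 1$ for all $n$, it is sufficient for the Byzantine attacker to ensure $P(v_i = m \mid \theta) = \frac{1}{M}$, because Equation (39) then gives $P(z_i = n \mid \theta) = \frac{1}{M}$. Thus, by Theorem 1, we have the following theorem when the channel matrix is doubly stochastic.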

###### Theorem 4.

If the channel matrix is doubly-stochastic, and if the Byzantine attacker has no knowledge about the sensors’ quantization thresholds, then the optimal Byzantine attack is given as

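$$ p_{lm} = \begin{cases} \dfrac{1}{M-1}, & \text{if } l \neq m \\[4pt] 0, & \text{otherwise} \end{cases} \qquad \alpha_{\text{blind}} = \frac{M-1}{M}. \tag{40} $$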
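The attack in Equation (40) can be checked numerically with a short Python sketch. The sketch assumes that a node flips its symbol with matrix `p` with probability `alpha` and reports it unchanged otherwise, which is the mixture implied by the expansion in Equation (41). The function names, the quantized-symbol distribution and the doubly stochastic channel matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def blinding_attack(size: int) -> Tuple[List[List[float]], float]:
    """Equation (40): uniform flipping to the other M - 1 symbols, with alpha_blind."""
    p = [[0.0 if l == m else 1.0 / (size - 1) for m in range(size)] for l in range(size)]
    return p, (size - 1) / size


def attacked_pmf(u_pmf: Sequence[float], p: Sequence[Sequence[float]],
                 alpha: float) -> List[float]:
    """P(v = m | theta) = (1 - alpha) P(u = m | theta) + alpha sum_l p[l][m] P(u = l | theta)."""
    size = len(u_pmf)
    return [
        (1.0 - alpha) * u_pmf[m] + alpha * sum(p[l][m] * u_pmf[l] for l in range(size))
        for m in range(size)
    ]


def through_channel(v_pmf: Sequence[float], q: Sequence[Sequence[float]]) -> List[float]:
    """Equation (39): P(z = n | theta) = sum_m q[m][n] P(v = m | theta)."""
    size = len(v_pmf)
    return [sum(q[m][n] * v_pmf[m] for m in range(size)) for n in range(size)]


# Hypothetical ternary example with a symmetric (hence doubly stochastic) channel.
p_blind, alpha_blind = blinding_attack(3)
v_pmf = attacked_pmf([0.6, 0.3, 0.1], p_blind, alpha_blind)
z_pmf = through_channel(v_pmf, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
```

Both the transmitted and the received symbols come out equiprobable regardless of the distribution of the quantized symbols, which is the blinding condition that the theorem describes.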

We therefore focus our attention on any general row-stochastic channel matrix, where the column sums $\sum_{m=1}^{M} q_{mn}$ need not be equal to unity for all $n$. In other words, the Byzantine attacker has to find an alternative strategy to blind the FC. Substituting Equation (3) in Equation (39) and rearranging the terms, we have the following.

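$$ \begin{aligned} P(z_i = n \mid \theta) &= \sum_{m=1}^{M} q_{mn}\, P(v_i = m \mid \theta) \\ &= \sum_{m=1}^{M} q_{mn} \left[ (1 - \alpha) + \alpha p_{mm} \right] + \sum_{m=1}^{M} q_{mn} \left\{ \sum_{l \neq m} \left\{ \alpha p_{lm} - \left[ (1 - \alpha) + \alpha p_{mm} \right] \right\} P(u_i = l \mid \theta) \right\} \\ &= \sum_{m=1}^{M} q_{mn} \left[ (1 - \alpha) + \alpha p_{mm} \right] + \sum_{l=1}^{M} \left[ \sum_{m \neq l} q_{mn} \left\{ \alpha p_{lm} - \left[ (1 - \alpha) + \alpha p_{mm} \right] \right\} \right] P(u_i = l \mid \theta). \end{aligned} \tag{41} $$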

The goal of a Byzantine attack is to blind the FC with the least amount of effort (minimum $\alpha$). To totally blind the FC is equivalent to making $P(z_i = n \mid \theta) = \frac{1}{M}$ for all $1 \leq n \leq M$. In Equation (41), the RHS consists of two terms. The first one is based on prior knowledge and the second term conveys information based on the observations. In order to blind the FC, the attacker should make the second term equal to zero. Since the attacker does not have any knowledge regarding $P(u_i = l \mid \theta)$, it can make the second term of Equation (41) equal to zero by setting

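$$ \sum_{m \neq l} q_{mn} \left\{ \alpha p_{lm} - \left[ (1 - \alpha) + \alpha p_{mm} \right] \right\} = 0 \quad \text{for all } 1 \leq n, l \leq M. \tag{42} $$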

Then the conditional probability $P(z_i = n \mid \theta)$ becomes independent of the observations $r_i$ (or their quantized version $u_i$), resulting in equiprobable symbols at the FC. In other words, the received vector does not carry any information about $\theta$, thus making the FC solely dependent on its prior information about $\theta$ in making an inference.
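The blinding condition in Equation (42) can be evaluated numerically for a candidate attack. The Python sketch below computes the left-hand side of Equation (42) for every pair $(n, l)$, so that an attack satisfies the condition when all entries vanish. The function name and the example matrices are hypothetical, and symbols are indexed from 0.

```python
from typing import List, Sequence


def blinding_residuals(
    q: Sequence[Sequence[float]],
    p: Sequence[Sequence[float]],
    alpha: float,
) -> List[List[float]]:
    """Left-hand side of Equation (42), returned as residuals[n][l].

    q[m][n] is the channel transition probability q_mn and p[l][m] is the
    flipping probability p_lm.
    """
    size = len(q)

    def coefficient(l: int, m: int) -> float:
        # alpha * p_lm - [(1 - alpha) + alpha * p_mm]
        return alpha * p[l][m] - ((1.0 - alpha) + alpha * p[m][m])

    return [
        [sum(q[m][n] * coefficient(l, m) for m in range(size) if m != l)
         for l in range(size)]
        for n in range(size)
    ]


# Hypothetical ternary example: the attack of Equation (40) over a noisy channel.
size = 3
p_uniform = [[0.0 if l == m else 1.0 / (size - 1) for m in range(size)] for l in range(size)]
channel = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
residuals_attack = blinding_residuals(channel, p_uniform, (size - 1) / size)

# Without any attack (alpha = 0) over an ideal channel, the residuals do not vanish.
identity = [[1.0 if m == n else 0.0 for n in range(size)] for m in range(size)]
residuals_no_attack = blinding_residuals(identity, p_uniform, 0.0)
```

A residual matrix of zeros indicates that the second term of Equation (41) is removed for every distribution of the quantized symbols, while nonzero entries indicate that the received symbols still carry information about the observations.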

In order to identify the strategy that the attacker should employ to achieve the condition in Equation (