Social Learning Against Data Falsification in Sensor Networks
Abstract
Although surveillance and sensor networks play a key role in the Internet of Things, sensor nodes are usually vulnerable to tampering due to their widespread locations. In this letter we consider data falsification attacks in which a smart attacker takes control of critical nodes within the network, including nodes serving as fusion centers. To face this critical security threat, we propose a data aggregation scheme based on social learning, resembling the way in which agents make decisions in social networks. Our results suggest that social learning enables network resilience, even when a significant portion of the nodes have been compromised by the attacker. Finally, we show the suitability of our scheme to sensor networks by developing a low-complexity algorithm that implements the social learning data fusion rule in devices with restricted computational power.
I Introduction
Large distributed sensor networks typically provide surveillance services over extensive areas, such as activity monitoring in military or secure zones, protection of drinking water tanks from chemical attacks, or intrusion detection [1, 2]. However, the reliability of these networks is in many cases limited by the high vulnerability of the sensor nodes [3]. In practice, nodes are frequently placed in unprotected locations and are susceptible to physical or cyber capture. Moreover, nodes are generally not tamper-proof due to cost concerns, and their limited computing power, memory, and energy rule out sophisticated cryptographic techniques.
One serious threat to the reliability of distributed surveillance is the data falsification or “Byzantine” attack, where an adversary takes control over a number of authenticated nodes [4]. Following the classic Byzantine Generals Problem [5], Byzantine nodes can generate false sensing data, exhibit arbitrary behaviour or collude in order to create a network-wide malfunction. The effect of data falsification attacks on distributed detection has been intensely studied, characterizing the impact on the detection performance and also proposing various defense mechanisms (cf. [6] for an overview, and also [7, 8, 9]). These works focus on networks with star or tree topology, where the data is gathered in a special node called the “fusion center” (FC), which is responsible for the final decision.
A key assumption in the literature is that the adversary can compromise regular sensor nodes but not the FC itself. However, in many scenarios the short range of the nodes’ transmissions forces the FC to be installed in unsafe locations, where it is vulnerable to tampering as well. A tampered FC completely disables the detection capabilities of the network, creating a single point of failure and hence becoming the weakest point of the system [10]. To address this serious security threat, this letter is novel in considering powerful topology-aware data falsification attacks, where the adversary knows the network topology and leverages this knowledge to take control of the most critical nodes of the network, either regular nodes or FCs. This represents a worst-case scenario, in which the network structure has been disclosed, e.g. by network tomography via traffic analysis [11].
The design of reliable distributed detection schemes is a challenging task. In effect, even though the distributed sensing literature is vast (see e.g. [1, 2] and references therein), the construction of optimal schemes is in general NP-hard [12]. Moreover, although in many cases optimal schemes can be characterized as a set of thresholds for likelihood functions, the determination of these thresholds is usually an intractable problem [13]. For example, symmetric thresholds can be suboptimal even for networks with similar sensors arranged in a star topology [14], being only asymptotically optimal as the network size increases [13, 15]. Furthermore, symmetric strategies are not suitable for more elaborate network topologies, and hence heuristic methods are usually necessary.
To deal with this dilemma, in this letter we propose a low-complexity data aggregation scheme based on social learning principles, which resembles social decision-making processes while avoiding fusion center functions [16, 17, 18]. The scheme is a threshold-based data fusion strategy related to the ones considered in [13]. However, its connection with social decision-making enables an intuitive understanding of its inner mechanisms, and also allows an efficient implementation that is suitable for the limited computational capabilities of a sensor node. To avoid the security threats introduced by fusion centers, our scheme uses a tandem or serial topology [19, 20, 21, 22, 23]. In contrast with the literature, our analysis does not focus on the optimality of the data fusion but aims to illustrate how this scheme can enable network resilience against a powerful topology-aware data falsification attacker, even when a significant number of nodes have been compromised.
II System model and problem statement
II-A System model
We consider a network of $N$ sensor nodes deployed over an area where surveillance is needed. The output of the sensor of the $n$-th node is denoted by $S_n$, taking values over a set $\mathcal{S}$ that can be discrete or continuous. Based on these signals, the network needs to infer the value of the binary variable $W \in \{w_0, w_1\}$, with the events $\{W = w_1\}$ and $\{W = w_0\}$ corresponding to the presence or absence of an attack, respectively. No knowledge about the prior distribution of $W$ is assumed, as attacks are rare and might follow unpredictable patterns.
We consider nodes with equal sensing capabilities, and hence assume that the signals $S_1, \dots, S_N$ are identically distributed. For the sake of tractability, it is assumed that the variables $S_n$ are conditionally independent* given the event $\{W = w\}$, following a probability distribution denoted by $\mu_w$. It is assumed that $\mu_{w_0}$ and $\mu_{w_1}$ are absolutely continuous with respect to each other [25], i.e. no particular signal determines $W$ unequivocally. The log-likelihood ratio of these two distributions is therefore given by the logarithm of the corresponding Radon-Nikodym derivative,† $\Lambda_S(s) = \log \frac{d\mu_{w_1}}{d\mu_{w_0}}(s)$.
*The conditional independence of sensor signals holds when the sensor noise is due to local causes (e.g. thermal noise), but does not hold when there exist common noise sources (e.g. in the case of distributed acoustic sensors [24]).
†When $S_n$ takes a finite number of values then $\frac{d\mu_{w_1}}{d\mu_{w_0}}(s) = \frac{P\{S_n = s \mid W = w_1\}}{P\{S_n = s \mid W = w_0\}}$, while if $S_n$ is a continuous random variable with conditional p.d.f. $p(s \mid w)$ then $\frac{d\mu_{w_1}}{d\mu_{w_0}}(s) = \frac{p(s \mid w_1)}{p(s \mid w_0)}$.
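To make the definition of $\Lambda_S$ concrete, the following Python sketch evaluates it in the two cases distinguished above. The p.m.f. values and the Gaussian mean-shift model are hypothetical examples chosen only for illustration; they are not part of the system model.

```python
import math

def llr_discrete(s, pmf_w1, pmf_w0):
    """Lambda_S(s) for a finite-valued signal, given P{S = s | W = w1} and P{S = s | W = w0}.

    Mutual absolute continuity means both probabilities are positive on the
    same values, so the ratio is finite wherever it is evaluated.
    """
    return math.log(pmf_w1[s] / pmf_w0[s])

def llr_continuous(s, pdf_w1, pdf_w0):
    """Lambda_S(s) = log p(s | w1) / p(s | w0) for conditional p.d.f.s."""
    return math.log(pdf_w1(s) / pdf_w0(s))

def gaussian_pdf(mean, sigma):
    """Return the N(mean, sigma^2) density as a function of s."""
    return lambda s: math.exp(-0.5 * ((s - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical discrete sensor that reports 1 more often under an attack (W = w1).
pmf_w1 = {0: 0.3, 1: 0.7}
pmf_w0 = {0: 0.9, 1: 0.1}
print(llr_discrete(1, pmf_w1, pmf_w0))   # log 7, about 1.946

# Hypothetical Gaussian mean shift: N(1, 1) under w1 and N(0, 1) under w0,
# for which the LLR reduces to s - 1/2.
print(llr_continuous(0.8, gaussian_pdf(1.0, 1.0), gaussian_pdf(0.0, 1.0)))   # about 0.3
```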
In addition to sensing hardware, each node is equipped with computing capability and a low-power transceiver to transmit and receive data. However, battery limitations impose severe restrictions on the communication bandwidth, and thus it is assumed that each node forwards its data to others by broadcasting a binary variable $X_n \in \{0, 1\}$. Note that these signals could be appended to wireless control packets and vice versa.
The nodes transmit their signals sequentially according to their indices. Due to the broadcast nature of wireless transmissions, which might be overlooked in some of the security literature, nearby transmissions can be overheard. Therefore, it is assumed that the $n$-th node can generate $X_n$ based on the information provided by $S_n$ and $X^{n-1} = (X_1, \dots, X_{n-1})$. A strategy is a collection of functions $\pi = \{\pi_n\}_{n=1}^{N}$ such that $X_n = \pi_n(S_n, X^{n-1})$. Although the burden of overhearing all the previously broadcast signals can be reduced by designing smart network topologies and routing strategies, these networking functions are left for future studies.
The network operator collects the transmitted packets from a specific node labeled $n^\ast$, possibly employing unmanned ground or aerial vehicles that access the shared signal at a specific network location, or by using a shared communication channel. The network performance is quantified by the corresponding miss-detection and false alarm rates, given by $P_{\mathrm{MD}} = P\{X_{n^\ast} = 0 \mid W = w_1\}$ and $P_{\mathrm{FA}} = P\{X_{n^\ast} = 1 \mid W = w_0\}$, respectively.
Finally, it is assumed that Byzantine nodes are controlled by an adversary without being noticed by the network operator. The adversary can freely define the values of the binary signals transmitted by Byzantine nodes in order to degrade the network performance. It is further assumed that the adversary is “topology-aware”, knowing the node sequence and the strategy $\pi$ that is in use. Therefore, the adversary could well control the most critical nodes in terms of network performance. However, the adversary has no knowledge about $n^\ast$, as it can be chosen at runtime and changed regularly.
II-B Problem statement
Our goal is to develop a network-resilient strategy that mitigates the effect of a powerful topology-aware adversary when the network operator (i.e. the defender) has no knowledge of the number of Byzantine nodes or other attack statistics. Note that in most surveillance applications miss-detections are more important than false alarms, while the cost of the worst-case scenario is difficult to estimate. Therefore, the system performance is evaluated following the Neyman-Pearson criterion, by setting an allowable false alarm rate and focusing on the achievable miss-detection rate.
Most signal processing techniques for distributed detection rely on one or more FCs that gather data and generate estimators, and on sensor nodes that provide informative signals to them [26]. Intuitively, if $X_n$ is influenced by $X_m$ with $m < n$, this would “double-count” the information provided by $S_m$. Therefore, in order to guarantee diversity, traditional distributed detection schemes choose to ignore previously broadcast signals. However, as the nodes do not perform any data aggregation, none of their shared signals is, by itself, a good estimator of the target variable. This generates a single point of failure in the network: if the adversary compromises the FC(s), then the only accurate estimator that exists within the network is lost and the inference process fails.
III Social learning as a data aggregation scheme
III-A Data fusion rule
Social learning models offer new directions for analyzing sequential decision processes in which agents combine personal information with their peers’ opinions [18]. Applied to a sensor network, each node can be considered as an agent that decides on the presence of an attack based on its own measurement and the signals overheard from other nodes. In this letter we consider rational agents that follow a Bayesian strategy, denoted as $\pi^s = \{\pi^s_n\}_{n=1}^{N}$, which can be described by
$$\pi^s_n(S_n, X^{n-1}) = \arg\min_{x \in \{0, 1\}} \mathbb{E}\left\{ u(x, W) \mid S_n, X^{n-1} \right\}. \qquad (1)$$
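Above, $u(x, w)$ is the cost assigned to the decision $X_n = x$ when $W = w$, which can be engineered to match the relevance of miss-detections and false alarms [27]. Moreover, since $X^{n-1}$ is determined only by $S^{n-1} = (S_1, \dots, S_{n-1})$, the conditional independence of the signals implies that $S_n$ and $X^{n-1}$ are also conditionally independent given $W$. Therefore, using Bayes’ rule, and assuming that wrong decisions are costlier than correct ones, a direct calculation shows that (1) can be rewritten as
$$\pi^s_n(S_n, X^{n-1}) = \begin{cases} 1 & \text{if } \Lambda_S(S_n) + \Lambda_{X^{n-1}}(X^{n-1}) \geq \tau, \\ 0 & \text{otherwise,} \end{cases} \qquad (2)$$
where $\tau = \log \frac{\left[u(1, w_0) - u(0, w_0)\right] P\{W = w_0\}}{\left[u(0, w_1) - u(1, w_1)\right] P\{W = w_1\}}$ and $\Lambda_{X^{n-1}}(x^{n-1}) = \log \frac{P\{X^{n-1} = x^{n-1} \mid W = w_1\}}{P\{X^{n-1} = x^{n-1} \mid W = w_0\}}$ is the log-likelihood ratio of $X^{n-1}$. As the prior distribution of $W$ is usually unknown, the network operator needs to select the lowest value of $\tau$ that satisfies the required false alarm rate given by the Neyman-Pearson criterion (cf. Section II-B).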
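The equivalence between the Bayesian rule (1) and the threshold rule (2) can be checked numerically. The following Python sketch assumes hypothetical costs (zero cost for correct decisions, $u(1, w_0) = 1$ for a false alarm, $u(0, w_1) = 10$ for a miss) and a hypothetical prior $P\{W = w_1\} = 0.01$; it compares the minimum-expected-cost decision with the threshold test for a few values of $\Lambda_S(S_n)$ and $\Lambda_{X^{n-1}}(X^{n-1})$ away from the threshold.

```python
import math

def posterior_w1(log_odds):
    """Logistic function, written to avoid overflow for large |log_odds|."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

def bayes_decision(llr_s, llr_x, prior_w1, u):
    """Rule (1): the decision with the smallest posterior expected cost."""
    # Conditional independence of S_n and X^{n-1} given W lets the two LLRs add up.
    p1 = posterior_w1(math.log(prior_w1 / (1.0 - prior_w1)) + llr_s + llr_x)
    cost = {x: (1.0 - p1) * u[(x, "w0")] + p1 * u[(x, "w1")] for x in (0, 1)}
    return 1 if cost[1] <= cost[0] else 0   # ties go to 1, matching ">=" in (2)

def threshold(prior_w1, u):
    """tau of rule (2); assumes u(1,w0) > u(0,w0) and u(0,w1) > u(1,w1)."""
    num = (u[(1, "w0")] - u[(0, "w0")]) * (1.0 - prior_w1)
    den = (u[(0, "w1")] - u[(1, "w1")]) * prior_w1
    return math.log(num / den)

def threshold_decision(llr_s, llr_x, tau):
    """Rule (2): X_n = 1 iff Lambda_S(S_n) + Lambda_{X^{n-1}}(X^{n-1}) >= tau."""
    return 1 if llr_s + llr_x >= tau else 0

# Hypothetical costs and prior: missing an attack is ten times worse than a false alarm.
u = {(0, "w0"): 0.0, (1, "w0"): 1.0, (0, "w1"): 10.0, (1, "w1"): 0.0}
prior_w1 = 0.01
tau = threshold(prior_w1, u)   # log(9.9), roughly 2.29
for llr_s in (-1.0, 0.5, 2.0):
    for llr_x in (-2.0, 0.0, 1.5):
        assert bayes_decision(llr_s, llr_x, prior_w1, u) == threshold_decision(llr_s, llr_x, tau)
```

Because the prior enters only through $\tau$, the same node logic can be reused when the operator tunes $\tau$ for a target false alarm rate instead of specifying costs and priors.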
As in a realistic scenario the statistical properties of potential topology-aware data falsification attacks are not available to the defender, our approach is to have each node follow the Bayesian strategy while ignoring potential attacks. This approach has three attractive features:

Provides a computation rule that does not need to adapt to different attacker profiles.

Minimizes the average cost when no attacks take place [27].

Enables network resilience (cf. Sections III-C and IV).
Clearly, Byzantine nodes do not follow (2), as their interest is to degrade the network performance. Let us denote by $\mathcal{B} \subseteq \{1, \dots, N\}$ the set of indices of the Byzantine nodes and by $b = |\mathcal{B}|$ its cardinality. As the events $\{W = w_0\}$ are much more frequent than $\{W = w_1\}$, any abnormal increase of the false alarm rate would be easily noticed and hence provides no benefit to the adversary. Therefore, a rational strategy for the adversary is to increase the miss-detection rate by forcing $X_n = 0$ for all $n \in \mathcal{B}$.
III-B An algorithm for computing the social log-likelihood
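The only challenge for implementing (2) in a sensor node as a data fusion rule is to have an efficient algorithm for computing $\Lambda_{X^{n-1}}(x^{n-1})$. To find such an algorithm, a direct application of the chain rule of probabilities shows that
$$\Lambda_{X^{n-1}}(x^{n-1}) = \sum_{k=1}^{n-1} \log \frac{P\{X_k = x_k \mid X^{k-1} = x^{k-1}, W = w_1\}}{P\{X_k = x_k \mid X^{k-1} = x^{k-1}, W = w_0\}},$$
with the understanding that $X^0$ is null. Then, following the discussion presented in Section III-A, we compute $P\{X_k = x_k \mid X^{k-1} = x^{k-1}, W = w\}$ ignoring potential attacks. Assuming that the $k$-th node is not a Byzantine node, one obtains
$$P\{X_k = 1 \mid X^{k-1} = x^{k-1}, W = w\} = P\left\{\Lambda_S(S_k) \geq \tau - \Lambda_{X^{k-1}}(x^{k-1}) \mid W = w\right\} = 1 - F_w\left(\tau - \Lambda_{X^{k-1}}(x^{k-1})\right), \qquad (3)$$
where $F_w(z) = P\{\Lambda_S(S_k) < z \mid W = w\}$ is the c.d.f. of the variable $\Lambda_S(S_k)$ conditioned on $W = w$ (taken left-continuous, to match the non-strict inequality in (2)). Using the above results, it can be shown that
$$\Lambda_{X^{n}}(x^{n}) = \Lambda_{X^{n-1}}(x^{n-1}) + \phi\left(x_n, \Lambda_{X^{n-1}}(x^{n-1})\right),$$
where $\phi$ is defined as
$$\phi(x, z) = \begin{cases} \log \dfrac{1 - F_{w_1}(\tau - z)}{1 - F_{w_0}(\tau - z)} & \text{if } x = 1, \\[2mm] \log \dfrac{F_{w_1}(\tau - z)}{F_{w_0}(\tau - z)} & \text{if } x = 0. \end{cases}$$
Leveraging the above derivations, we develop Algorithm 1 as a simple iterative procedure for computing $\Lambda_{X^{n-1}}(x^{n-1})$. Note that the complexity of the algorithm scales gracefully, as it grows linearly with the length of $x^{n-1}$. Moreover, the algorithm does not need any information about potential attacks, only requiring knowledge of the signal statistics as given by $F_{w_0}$ and $F_{w_1}$.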
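The recursion above can be sketched in Python as follows. This is an illustrative sketch, not the authors’ Algorithm 1, and it adds one assumption: the sensor takes finitely many values, so $F_w$ is evaluated by summing the p.m.f. of $S_k$, and both p.m.f.s are positive on the same values. When a broadcast bit has zero probability under both hypotheses (e.g. a Byzantine zero during a cascade), the sketch treats it as uninformative; that convention is also an assumption. The function names are hypothetical.

```python
import math

def decision_prob(x, t, llr_values, pmf):
    """P{X_k = x | X^{k-1} = x^{k-1}, W = w} for an honest node, where
    t = tau - Lambda_{X^{k-1}}(x^{k-1}) is its shifted threshold and pmf is the
    p.m.f. of S_k given W = w. Equals 1 - F_w(t) for x = 1 and F_w(t) for x = 0."""
    if x == 1:
        return sum(p for s, p in pmf.items() if llr_values[s] >= t)
    return sum(p for s, p in pmf.items() if llr_values[s] < t)

def phi(x, z, tau, llr_values, pmf_w0, pmf_w1):
    """Increment of the social LLR when a node that saw social LLR z broadcasts x."""
    num = decision_prob(x, tau - z, llr_values, pmf_w1)
    den = decision_prob(x, tau - z, llr_values, pmf_w0)
    if num == 0 and den == 0:
        return 0.0   # bit impossible for an honest node: treated as uninformative
    return math.log(num / den)

def social_llr(history, tau, llr_values, pmf_w0, pmf_w1):
    """Lambda_{X^{n-1}}(x^{n-1}) for the overheard bits history = [x_1, ..., x_{n-1}].
    One pass over the history, so the cost grows linearly with its length."""
    z = 0.0   # X^0 is null: the empty history carries no information
    for x in history:
        z += phi(x, z, tau, llr_values, pmf_w0, pmf_w1)
    return z

pmf_w0 = {0: 0.9, 1: 0.1}   # hypothetical sensor statistics
pmf_w1 = {0: 0.3, 1: 0.7}
llr_values = {s: math.log(pmf_w1[s] / pmf_w0[s]) for s in (0, 1)}
print(social_llr([0, 1, 0], 0.0, llr_values, pmf_w0, pmf_w1))   # log(7/9), about -0.251
```

A node only needs the p.m.f.s (equivalently $F_{w_0}$ and $F_{w_1}$) and $\tau$; nothing in the loop depends on the attacker.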
III-C Information cascades as strength or weakness
The term “social learning” refers to the fact that the accuracy of $X_n$ as a predictor of $W$ grows with $n$, and hence $n^\ast$ is usually chosen as one of the last nodes in the decision sequence. However, as the number of shared signals grows, the increasing “social pressure” can make nodes ignore their individual measurements and blindly follow the dominant choice [16]. This phenomenon, known as an information cascade, introduces severe limitations on the achievable asymptotic performance of social learning [17].
A positive effect of information cascades, which has been overlooked before, is that they make a large number of agents/nodes hold equally qualified estimators, generating many locations where the network operator can collect and aggregate the data. This property avoids the existence of a single point of failure, which helps to robustly face topology-aware attacks. An attempt to blindly guess $n^\ast$ in order to tamper with that node would be inefficient due to the large number of potential candidates.
However, an attacker can also leverage the information cascade phenomenon. A rational attacking strategy is to tamper with the first $b$ nodes of the decision sequence, setting their signals to $X_n = 0$ in order to push the networked decisions towards a misleading cascade.‡ If $b$ is large enough, an information cascade can be triggered almost surely, causing the learning process to fail. However, if $b$ is not large enough, then the network may undo the initial pool of wrong opinions and end up triggering a correct cascade anyway. This capability of “resilience” is explored in the next section.
‡Intuitively, it is more likely for a node to follow a misleading cascade if all the previous nodes have been tampered with and act homogeneously than for a node of higher index whose previous decisions are non-homogeneous.
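For a sensor with finitely many outputs, the cascade condition can be made explicit: an honest node ignores its own measurement whenever the shifted threshold $\tau - \Lambda_{X^{n-1}}(x^{n-1})$ lies below the smallest value, or above the largest value, of $\Lambda_S$. The following Python sketch (hypothetical helper names, toy p.m.f.s) checks this condition and counts how many leading forced zeros lock honest nodes into a misleading $X_n = 0$ cascade, assuming every node interprets those zeros as honest decisions.

```python
import math

def cascade_state(z, tau, llr_values):
    """Return 1 or 0 if an honest node facing social LLR z takes that decision for
    every possible measurement (an information cascade), else None."""
    t = tau - z                      # shifted threshold of rule (2)
    if min(llr_values) >= t:
        return 1                     # even the least alarming reading gives X_n = 1
    if max(llr_values) < t:
        return 0                     # even the most alarming reading gives X_n = 0
    return None

def zeros_to_trigger_cascade(pmf_w0, pmf_w1, tau, max_b=1000):
    """Smallest number of leading forced zeros that locks honest nodes into a
    misleading X_n = 0 cascade (hypothetical helper for a finite-valued sensor).
    Returns None if no such number is found up to max_b."""
    llr = {s: math.log(pmf_w1[s] / pmf_w0[s]) for s in pmf_w0}
    z = 0.0                          # social LLR after the forced zeros so far
    for b in range(max_b + 1):
        state = cascade_state(z, tau, list(llr.values()))
        if state == 0:
            return b
        if state == 1:
            return None              # an honest 0 is impossible here; zeros carry no update
        t = tau - z
        # Honest nodes read each forced zero as honest: add log F_{w1}(t) / F_{w0}(t).
        f1 = sum(p for s, p in pmf_w1.items() if llr[s] < t)
        f0 = sum(p for s, p in pmf_w0.items() if llr[s] < t)
        z += math.log(f1 / f0)
    return None

# Toy sensor (hypothetical numbers): two forced zeros suffice when tau = 0.
print(zeros_to_trigger_cascade({0: 0.9, 1: 0.1}, {0: 0.3, 1: 0.7}, tau=0.0))   # 2
```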
IV Proof of concept
To illustrate the application of social learning against topology-aware data falsification attacks, we consider a network of $N$ sensors randomly distributed over a sensitive area following a Poisson point process (PPP). The ratio of the area that is within the range of each sensor is denoted by $r$. If attacks occur uniformly over the surveilled area, then $r$ is also the probability of an attack taking place within the coverage area of a particular sensor. It is further assumed that each node is equipped with a binary sensor (i.e. $S_n \in \{0, 1\}$), whose probability of generating a wrong measurement due to electronic and other imperfections is denoted by $\varepsilon$.
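To find the conditional distributions of $S_n$, first note that $P\{S_n = 1 \mid W = w_0\} = \varepsilon$, as a sensor false alarm can only be due to noise. Under an attack, the event falls within the range of the sensor with probability $r$, in which case it is reported unless the sensor errs; otherwise it can only be reported by mistake. Hence, the probability of detecting an event is given by
$$P\{S_n = 1 \mid W = w_1\} = r(1 - \varepsilon) + (1 - r)\varepsilon.$$
Therefore, the sensor miss-detection rate is $P\{S_n = 0 \mid W = w_1\} = r\varepsilon + (1 - r)(1 - \varepsilon)$. The signal log-likelihood is hence given by
$$\Lambda_S(s) = \begin{cases} \log \dfrac{r(1 - \varepsilon) + (1 - r)\varepsilon}{\varepsilon} & \text{if } s = 1, \\[2mm] \log \dfrac{r\varepsilon + (1 - r)(1 - \varepsilon)}{1 - \varepsilon} & \text{if } s = 0. \end{cases}$$
Note that $\Lambda_S(0) < 0 < \Lambda_S(1)$, which is a consequence of $r > 0$ and $\varepsilon < 1/2$, since $r(1 - \varepsilon) + (1 - r)\varepsilon = \varepsilon + r(1 - 2\varepsilon)$. Correspondingly, the c.d.f. $F_w(z) = P\{\Lambda_S(S_n) < z \mid W = w\}$ is
$$F_w(z) = \begin{cases} 0 & \text{if } z \leq \Lambda_S(0), \\ P\{S_n = 0 \mid W = w\} & \text{if } \Lambda_S(0) < z \leq \Lambda_S(1), \\ 1 & \text{if } z > \Lambda_S(1). \end{cases}$$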
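The sensor statistics above translate directly into code. The following Python sketch (the function name is hypothetical, and the values $r = 0.1$, $\varepsilon = 0.05$ are chosen only for illustration) returns the conditional p.m.f.s, the two values of $\Lambda_S$ and the step function $F_w$, and checks the sign property $\Lambda_S(0) < 0 < \Lambda_S(1)$.

```python
import math

def binary_sensor_model(r, eps):
    """Statistics of the binary sensor S_n in {0, 1} of the proof of concept.

    r   -- fraction of the area covered by one sensor, i.e. P{attack in range}
    eps -- probability that imperfections flip the sensor output (assumed < 1/2)
    Returns the conditional p.m.f.s, the LLR values Lambda_S(s) and F_w(z).
    """
    p_detect = r * (1 - eps) + (1 - r) * eps   # P{S_n = 1 | W = w1}
    pmf = {
        "w0": {0: 1 - eps, 1: eps},            # false alarms come from noise only
        "w1": {0: 1 - p_detect, 1: p_detect},  # 1 - p_detect is the sensor miss-detection rate
    }
    llr = {s: math.log(pmf["w1"][s] / pmf["w0"][s]) for s in (0, 1)}

    def cdf(w, z):
        # F_w(z) = P{Lambda_S(S_n) < z | W = w}: a step function with jumps at llr[0], llr[1]
        return sum(p for s, p in pmf[w].items() if llr[s] < z)

    return pmf, llr, cdf

pmf, llr, cdf = binary_sensor_model(r=0.1, eps=0.05)
assert llr[0] < 0 < llr[1]     # holds whenever r > 0 and eps < 1/2
```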
We studied a network composed of $N = 200$ sensor nodes, generating $X_n$ sequentially following (2) and using Algorithm 1 to compute $\Lambda_{X^{n-1}}(x^{n-1})$. Following Section III-C, it is assumed that a topology-aware attacker has tampered with the first $b$ nodes of the decision sequence and uses them to increase the miss-detection rate by setting $X_n = 0$ for $n \in \{1, \dots, b\}$. Finally, in order to favour the reduction of miss-detections over false alarms, $\tau$ is set to the lowest value that still allows a nontrivial inference process. For each set of parameter values, multiple simulation runs are performed.
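A Monte Carlo sketch of this experiment is given below. It is a simplified reimplementation under stated assumptions, not the code behind the reported figures: only $W = w_1$ is simulated (which is all the miss-detection rate requires), the collection node $n^\ast$ is the last simulated node, and the values of $r$, $\varepsilon$ and $\tau = 0$ in the usage lines are hypothetical placeholders. Every honest node applies the threshold test (2) and updates the social log-likelihood with the recursion of Section III-B, treating all overheard bits, including the forced zeros of the Byzantine nodes, as honest.

```python
import math
import random

def simulate_md_rate(b, r, eps, tau, n_star, runs, seed=0):
    """Monte Carlo estimate of P{X_{n*} = 0 | W = w1} when nodes 1..b are
    Byzantine and always broadcast 0 (hypothetical helper, illustrative only)."""
    rng = random.Random(seed)
    p_one = {"w0": eps, "w1": r * (1 - eps) + (1 - r) * eps}   # P{S_n = 1 | W = w}
    pmf = {w: {0: 1 - p, 1: p} for w, p in p_one.items()}
    llr = {s: math.log(pmf["w1"][s] / pmf["w0"][s]) for s in (0, 1)}

    def decision_prob(x, w, t):
        # P{X = x | history, W = w} for an honest node with shifted threshold t
        if x == 1:
            return sum(p for s, p in pmf[w].items() if llr[s] >= t)
        return sum(p for s, p in pmf[w].items() if llr[s] < t)

    misses = 0
    for _ in range(runs):
        z = 0.0          # social LLR of the bits broadcast so far
        x = 0
        for n in range(1, n_star + 1):
            if n <= b:
                x = 0    # Byzantine node: forced zero regardless of its sensor
            else:
                s = 1 if rng.random() < p_one["w1"] else 0   # honest reading under W = w1
                x = 1 if llr[s] >= tau - z else 0            # threshold rule (2)
            # All nodes update the social LLR as if every broadcast were honest.
            num = decision_prob(x, "w1", tau - z)
            den = decision_prob(x, "w0", tau - z)
            if num > 0 and den > 0:
                z += math.log(num / den)
        misses += int(x == 0)
    return misses / runs

for b in (0, 5, 20):
    rate = simulate_md_rate(b=b, r=0.1, eps=0.05, tau=0.0, n_star=200, runs=500)
    print(f"b = {b:2d}: estimated miss-detection rate {rate:.3f}")
```

In the toy case $r = 0.9$, $\varepsilon = 0.01$, $\tau = 0$, three forced zeros already drive the social log-likelihood below $\tau - \Lambda_S(1)$, so every later honest node outputs 0 and the estimate equals 1; other parameter choices change how many zeros are needed.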
Simulations show that the proposed scheme enables strong network resilience in this scenario, allowing the sensor network to maintain a low miss-detection rate even in the presence of a significant number of Byzantine nodes (see Figure 1). In contrast, if a traditional distributed detection scheme is used, a topology-aware attacker can cause a miss-detection rate of 100% by just compromising the few nodes that perform data aggregation, i.e. the FC(s). Figure 1 shows that nodes aggregating data by social learning can achieve an average asymptotic miss-detection rate of less than … even when … of the most critical nodes are under the control of the attacker, which bears some resemblance to the well-known 1/3 threshold of the Byzantine generals problem [5]. Moreover, Figure 1 also suggests that our scheme can still provide network resilience in the most unfavorable cases.
Interestingly, the data aggregation is performed node by node, independently of the network size. Hence, in a very large network the first 200 nodes would exhibit the same performance as that shown in Figure 1. Adding more nodes may not significantly improve the asymptotic performance, as the asymptotic estimator is copied by later nodes following an information cascade. Nevertheless, in a large network, information cascades provide the fundamental benefit of creating a large number of nodes from which the network operator can access aggregated data.
The network resilience provided by our scheme is influenced by the sensor statistics, which are determined by $r$ and $\varepsilon$ (see Figure 2). Intuitively, the achievable miss-detection rate under a low number of Byzantine nodes is reduced by a smaller $\varepsilon$ or a larger $r$. Furthermore, our numerical results suggest that the miss-detection rate grows exponentially with the number of Byzantine nodes, with a rate of growth inversely proportional to …, as nodes with smaller … trust each other’s decisions less and hence are less affected by “social pressure”. Consequently, it is preferable to deploy sensors with a smaller probability of malfunction ($\varepsilon$) rather than a larger coverage ($r$), as a larger coverage makes the network more vulnerable to Byzantine nodes and subsequent misleading information cascades.
Our scheme does not require knowledge about attack statistics, which makes it well suited to practical deployments, as operation in large-scale or mobile scenarios implies a dynamically changing topology. Moreover, simulations show that if the adversary tampers with a different set of nodes of the same cardinality instead of the initial ones, the attack has less impact on the system performance. This suggests that our scheme can provide further resilience against attackers that are not topology-aware.
References
 [1] V. V. Veeravalli and P. K. Varshney, “Distributed inference in wireless sensor networks,” Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 370, no. 1958, pp. 100–117, 2012.
 [2] S. Barbarossa, S. Sardellitti, and P. Di Lorenzo, “Distributed detection and estimation in wireless sensor networks,” in Academic Press Library in Signal Processing, vol. 2, Communications and Radar Signal Processing. Academic Press, Oct. 2013, pp. 329–408.
 [3] E. Shi and A. Perrig, “Designing secure sensor networks,” IEEE Wireless Communications, vol. 11, no. 6, pp. 38–43, 2004.
 [4] S. Marano, V. Matta, and L. Tong, “Distributed detection in the presence of Byzantine attacks,” IEEE Transactions on Signal Processing, vol. 57, no. 1, pp. 16–29, 2009.
 [5] L. Lamport, R. Shostak, and M. Pease, “The Byzantine generals problem,” ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 4, no. 3, pp. 382–401, 1982.
 [6] A. Vempaty, L. Tong, and P. K. Varshney, “Distributed inference with Byzantine data: State-of-the-art review on data falsification attacks,” IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 65–75, 2013.
 [7] V. S. S. Nadendla, Y. S. Han, and P. K. Varshney, “Distributed inference with M-ary quantized data in the presence of Byzantine attacks,” IEEE Transactions on Signal Processing, vol. 62, no. 10, pp. 2681–2695, May 2014.
 [8] J. Zhang, R. S. Blum, X. Lu, and D. Conus, “Asymptotically optimum distributed estimation in the presence of attacks,” IEEE Transactions on Signal Processing, vol. 63, no. 5, pp. 1086–1101, March 2015.
 [9] B. Kailkhura, Y. S. Han, S. Brahma, and P. K. Varshney, “Distributed Bayesian detection in the presence of Byzantine data,” IEEE Transactions on Signal Processing, vol. 63, no. 19, pp. 5250–5263, Oct. 2015.
 [10] B. Parno, A. Perrig, and V. Gligor, “Distributed detection of node replication attacks in sensor networks,” in 2005 IEEE Symposium on Security and Privacy (S&P’05). IEEE, 2005, pp. 49–63.
 [11] R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu, “Network tomography: recent developments,” Statistical Science, pp. 499–517, 2004.
 [12] J. Tsitsiklis and M. Athans, “On the complexity of decentralized decision making and detection problems,” IEEE Transactions on Automatic Control, vol. 30, no. 5, pp. 440–446, 1985.
 [13] J. N. Tsitsiklis et al., “Decentralized detection,” Advances in Statistical Signal Processing, vol. 2, no. 2, pp. 297–344, 1993.
 [14] D. Warren and P. Willett, “Optimum quantization for detector fusion: some proofs, examples, and pathology,” Journal of the Franklin Institute, vol. 336, no. 2, pp. 323–359, 1999.
 [15] J.-F. Chamberland and V. V. Veeravalli, “Asymptotic results for decentralized detection in power constrained wireless sensor networks,” IEEE Journal on Selected Areas in Communications, vol. 22, no. 6, pp. 1007–1015, 2004.
 [16] S. Bikhchandani, D. Hirshleifer, and I. Welch, “A theory of fads, fashion, custom, and cultural change as informational cascades,” Journal of Political Economy, pp. 992–1026, 1992.
 [17] D. Acemoglu, M. A. Dahleh, I. Lobel, and A. Ozdaglar, “Bayesian learning in social networks,” The Review of Economic Studies, vol. 78, no. 4, pp. 1201–1236, 2011.
 [18] V. Krishnamurthy and H. V. Poor, “Social learning and Bayesian games in multiagent signal processing: How do local and global decision makers interact?” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 43–57, 2013.
 [19] R. Viswanathan, S. C. Thomopoulos, and R. Tumuluri, “Optimal serial distributed decision fusion,” IEEE Transactions on Aerospace and Electronic Systems, vol. 24, no. 4, pp. 366–376, 1988.
 [20] J. D. Papastavrou and M. Athans, “Distributed detection by a large team of sensors in tandem,” IEEE Transactions on Aerospace and Electronic Systems, vol. 28, no. 3, pp. 639–653, 1992.
 [21] P. F. Swaszek, “On the performance of serial networks in distributed detection,” IEEE Transactions on Aerospace and Electronic Systems, vol. 29, no. 1, pp. 254–260, 1993.
 [22] R. Viswanathan and P. K. Varshney, “Distributed detection with multiple sensors Part I. Fundamentals,” Proceedings of the IEEE, vol. 85, no. 1, pp. 54–63, 1997.
 [23] I. Bahceci, G. AlRegib, and Y. Altunbasak, “Serial distributed detection for wireless sensor networks,” in Information Theory, 2005. ISIT 2005. Proceedings. International Symposium on. IEEE, 2005, pp. 830–834.
 [24] A. Bertrand, “Applications and trends in wireless acoustic sensor networks: A signal processing perspective,” in 2011 18th IEEE Symposium on Communications and Vehicular Technology in the Benelux (SCVT), Nov 2011, pp. 1–6.
 [25] M. Loeve, Probability Theory I. Springer, 1978.
 [26] R. Rajagopalan and P. K. Varshney, “Data-aggregation techniques in sensor networks: A survey,” IEEE Communications Surveys & Tutorials, vol. 8, no. 4, pp. 48–63, Fourth Quarter 2006.
 [27] H. V. Poor, An introduction to signal detection and estimation. Springer Science & Business Media, 2013.