An Information Theoretic Location Verification System for Wireless Networks
As location-based applications become ubiquitous in emerging wireless networks, Location Verification Systems (LVS) are of growing importance. In this paper we propose, for the first time, a rigorous information-theoretic framework for an LVS. The theoretical framework we develop illustrates how the threshold used in the detection of a spoofed location can be optimized in terms of the mutual information between the input and output data of the LVS. In order to verify the legitimacy of our analytical framework we have carried out detailed numerical simulations. Our simulations mimic the practical scenario where a system deployed using our framework must make a binary Yes/No “malicious decision” to each snapshot of the signal strength values obtained by base stations. The comparison between simulation and analysis shows excellent agreement. Our optimized LVS framework provides a defence against location spoofing attacks in emerging wireless networks such as those envisioned for Intelligent Transport Systems, where verification of location information is of paramount importance.
As Location-Based Services become widely deployed, the importance of verifying the location information being fed into the location service is becoming a critical security issue. The main difference between a Location Verification System (LVS) and a localization system is that we are confronted by some a priori information, such as a claimed position in the LVS [1, 4, 6, 2, 3, 5, 7]. In the context of a main target application of our system, namely Intelligent Transport Systems (ITS), the issue of location verification has attracted a considerable amount of recent attention [8, 9, 10, 11, 12, 13]. Normally, in order to infer whether a network user or node is malicious (attempting to spoof location) or legitimate (actually at the claimed location), we have to set a threshold for the LVS. This threshold is set so as to obtain low false positive rates for legitimate users and high detection rates for malicious users. As such, the specific value of the threshold will directly affect the performance of an LVS.
One traditional approach to setting the threshold of an LVS is to search for a tradeoff between the false positive rate and the detection rate using the receiver operating characteristic (ROC) curve. Another technique is to obtain the false positive and detection rates from empirical training data and to minimize specific functions of the two rates in order to set the threshold. For example, the sum of the false positive and false negative rates has been minimized in previous work. However, although successful in many scenarios, the approaches mentioned above do not specify in any formal sense what the ‘optimal’ threshold value of an LVS should be. In addition, in the key target application of our LVS, namely ITS, it is not practical to collect the required training data due to the variable circumstances.
The main point of this paper is to develop, for the first time, an information theoretic framework that will allow us to formally set the optimal threshold of an LVS. In order to do this, we first define a threshold based on the squared Mahalanobis distance, which utilizes the Fisher Information Matrix (FIM) associated with the location information metrics utilized by the LVS. To optimize the threshold, the Intrusion Detection Capability (IDC) proposed by Gu et al. for an Intrusion Detection System (IDS) will be utilized. The IDC is the ratio of the mutual information between the IDS input and output to the entropy of the input, i.e. the fraction of the uncertainty about the input that is removed by observing the output. As such, the IDC measures the capability of an IDS to classify the input events correctly. A larger IDC means that the LVS has an improved capability of classifying users as malicious or legitimate accurately. From an information theoretic point of view the optimal threshold is the value that maximizes the IDC.
The rest of this paper is organized as follows. Section 2 presents the system model, which details the observation model and the threat model we utilize. In Section 3, the threshold is defined in terms of the FIM associated with the location metrics. Section 3 also provides the techniques used to determine the false positive and detection rates, which are utilized to derive the IDC. Section 4 provides the details of how the IDC is used to optimize the threshold. Simulation results which validate our new analytical LVS framework are presented in Section 5. Section 6 concludes and discusses some future directions.
2 System model
2.1 A Priori Information: Claimed Position
Let us assume a user can obtain its true position, $\theta_t = [x_t, y_t]^T$, from its localization equipment (e.g., GPS), and that the localization error is zero. Thus, a legitimate user’s claimed (reported) position, $\theta_c = [x_c, y_c]^T$, is exactly the same as its true position $\theta_t$. However, a malicious user will falsify (spoof) its claimed position in an attempt to fool the LVS. We denote the legitimate and malicious hypotheses as $H_0$ and $H_1$, respectively, and the a priori information can be summarized as
$$ H_0: \theta_c = \theta_t, \qquad H_1: \theta_c \neq \theta_t. \tag{1} $$
2.2 Observation Model based on $H_0$
Although the framework we develop can be built on any location information metric, for purposes of illustration in this work we will solely investigate the case where the location information metric is the Received Signal Strength (RSS) obtained by a Base Station (BS) from a user. The RSS of the $i$-th BS from a legitimate user, $y_i$, is assumed to be given by
$$ y_i = p_0 - 10\gamma \log_{10}\left(\frac{d_i^t}{d_0}\right) + \omega_i, \quad i = 1, 2, \dots, N, \tag{2} $$
where $p_0$ is a reference received power, $d_0$ is the reference distance, $\gamma$ is the path loss exponent, $\omega_i$ is a zero-mean normal random variable with variance $\sigma^2$, and the Euclidean distance of the $i$-th BS to the user’s true position is
$$ d_i^t = \sqrt{(x_t - x_i)^2 + (y_t - y_i)^2}, $$
where $\theta_i = [x_i, y_i]^T$ is the location of the $i$-th BS, and $N$ is the number of BSs. For $H_0$ in eq. (1), $d_i^t$ in eq. (2) can be replaced by $d_i^c$, where $d_i^c$ is the Euclidean distance of the $i$-th BS to the user’s claimed position and can be expressed as
$$ d_i^c = \sqrt{(x_c - x_i)^2 + (y_c - y_i)^2}. $$
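The observation model above is straightforward to simulate. The following is a minimal sketch, assuming illustrative values for the reference power, reference distance, path loss exponent and shadowing standard deviation (`P0`, `D0`, `GAMMA`, `SIGMA` are hypothetical names and numbers, not taken from the paper). The base station layout follows the four-corner 200 m x 200 m arrangement mentioned in the simulation section.

```python
import math
import random

# Illustrative values only; none of these numbers are taken from the paper.
P0 = -30.0     # reference received power at distance D0 (dB scale)
D0 = 1.0       # reference distance (m)
GAMMA = 3.0    # path loss exponent
SIGMA = 4.0    # standard deviation of the zero-mean Gaussian noise (dB)


def mean_rss(pos, bs):
    """Noise-free RSS at base station `bs` for a transmitter at `pos`."""
    d = math.dist(pos, bs)
    return P0 - 10.0 * GAMMA * math.log10(d / D0)


def observe(pos, base_stations, rng):
    """One snapshot of noisy RSS values, one value per base station."""
    return [mean_rss(pos, bs) + rng.gauss(0.0, SIGMA) for bs in base_stations]


# Four base stations at the corners of a 200 m x 200 m square.
base_stations = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
rng = random.Random(0)
snapshot = observe((60.0, 90.0), base_stations, rng)
print([round(v, 1) for v in snapshot])
```

For a legitimate user the position passed to `observe` is both the true and the claimed position, which is exactly the situation described by the legitimate hypothesis.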
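2.3 Threat Model (Observation Model based on $H_1$)
Let us assume a malicious user knows the positions of all BSs and is able to boost its transmit power according to its claimed position. The RSS of the $i$-th BS from a malicious user, $v_i$, can be written as
$$ v_i = p_x + p_0 - 10\gamma \log_{10}\left(\frac{r_i}{d_0}\right) + \omega_i, \tag{3} $$
where $p_x$ is the boost power and $r_i$ is the Euclidean distance of the $i$-th BS to the malicious user’s true position. We assume the malicious user is equipped with only one omni-directional antenna, and thus $p_x$ is constant for all the BSs.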
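In the following, one strategy for setting the boost value $p_x$ of the malicious user is provided. A malicious user’s claimed position is determined by its purpose and the LVS parameters. Constrained by the positions of all BSs, the spoofed observations $v_i$ are not exactly the same as the ideal observations $u_i$ calculated according to its claimed position as follows
$$ u_i = p_0 - 10\gamma \log_{10}\left(\frac{d_i^c}{d_0}\right) + \omega_i. $$
However, the malicious user would like to make the spoofed observations $v_i$ as similar as possible to the ideal observations $u_i$. Thus, it will set the value of $p_x$ to minimize the divergence between $v_i$ and $u_i$. This divergence can be defined by the Mean Square Error (MSE) as follows
$$ \mathrm{MSE}(p_x) = E\left[\frac{1}{N}\sum_{i=1}^{N}(v_i - u_i)^2\right], $$
where $E[\cdot]$ is the expectation with respect to all the observations. Then, the optimal value of $p_x$ can be expressed as $p_x^* = \arg\min_{p_x} \mathrm{MSE}(p_x)$. Taking the first derivative of $\mathrm{MSE}(p_x)$ with respect to $p_x$ and setting it to zero, we obtain
$$ p_x^* = \frac{10\gamma}{N}\sum_{i=1}^{N}\log_{10}\left(\frac{r_i}{d_i^c}\right). $$
In the above we use $r_i$ instead of $d_i^t$ in the equations related to $H_1$ to avoid confusing them with the observation model. Substituting $p_x^*$ into eq. (3), the threat model (observation model based on $H_1$) can be rewritten as
$$ v_i = p_0 + \frac{10\gamma}{N}\sum_{j=1}^{N}\log_{10}\left(\frac{r_j}{d_j^c}\right) - 10\gamma\log_{10}\left(\frac{r_i}{d_0}\right) + \omega_i. \tag{4} $$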
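Eq. (4) is the general threat model based on RSS, but it is not practical since a malicious user’s true position is unknown. We can approximate the threat model by assuming that the true position follows a distribution. Here, due to limited space, let us assume that a malicious user is at an approximately infinite distance from all BSs in order to facilitate the analysis (the more general case is discussed later). Given this assumption, the distances from all the BSs to the user converge to one value. That is, the distance from a malicious user’s true position to every BS is nearly a constant $r$, i.e. $r_i \approx r$ for $i = 1, 2, \dots, N$. Therefore, the position-dependent term in eq. (4) can be rewritten as
$$ \frac{10\gamma}{N}\sum_{j=1}^{N}\log_{10}\left(\frac{r_j}{d_j^c}\right) - 10\gamma\log_{10}\left(\frac{r_i}{d_0}\right) \approx -\frac{10\gamma}{N}\sum_{j=1}^{N}\log_{10}\left(\frac{d_j^c}{d_0}\right). $$
Based on the above analysis, the threat model can be expressed as
$$ v_i = p_0 - \frac{10\gamma}{N}\sum_{j=1}^{N}\log_{10}\left(\frac{d_j^c}{d_0}\right) + \omega_i. \tag{5} $$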
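The boost strategy and its far-field limit can be checked numerically. The sketch below assumes the log-normal path loss model of Section 2.2 with illustrative parameter values (`P0`, `D0`, `GAMMA` are hypothetical), and computes the boost as the value that minimizes the mean squared gap between the spoofed and ideal mean RSS values. The function names are introduced here for illustration only.

```python
import math

P0, D0, GAMMA = -30.0, 1.0, 3.0   # illustrative values, not from the paper


def path_loss_mean(pos, bs):
    """Noise-free RSS at `bs` for a transmitter at `pos` (no boost)."""
    return P0 - 10.0 * GAMMA * math.log10(math.dist(pos, bs) / D0)


def optimal_boost(true_pos, claimed_pos, base_stations):
    """Boost minimizing the mean squared gap between spoofed and ideal mean RSS.

    Setting the derivative of the MSE to zero gives the average, over base
    stations, of (ideal mean RSS) - (unboosted mean RSS from the true position).
    """
    gaps = [path_loss_mean(claimed_pos, bs) - path_loss_mean(true_pos, bs)
            for bs in base_stations]
    return sum(gaps) / len(gaps)


def spoofed_means(true_pos, claimed_pos, base_stations):
    """Mean RSS seen by each base station once the optimal boost is applied."""
    px = optimal_boost(true_pos, claimed_pos, base_stations)
    return [px + path_loss_mean(true_pos, bs) for bs in base_stations]


def far_field_mean(claimed_pos, base_stations):
    """Common mean RSS at every base station when the attacker is very far away."""
    values = [path_loss_mean(claimed_pos, bs) for bs in base_stations]
    return sum(values) / len(values)


base_stations = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
claimed = (60.0, 90.0)
near = spoofed_means((5000.0, 7000.0), claimed, base_stations)
flat = far_field_mean(claimed, base_stations)
print([round(v - flat, 3) for v in near])   # deviations from the far-field value
```

As the attacker's true position moves further from the base stations, the printed deviations shrink towards zero, which is the approximation used to arrive at the simplified threat model.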
3 Threshold and Two Rates
In this section, we first present our threshold based on the squared Mahalanobis distance, which utilizes the inverse FIM. Then, we provide techniques used to determine the false positive rate and the detection rate of our LVS.
The threshold is defined in terms of the squared Mahalanobis distance of an estimated position vector $\hat{\theta}$,
$$ D^2 = (\hat{\theta} - \mu)^T \Sigma^{-1} (\hat{\theta} - \mu), $$
where $\mu$ is the mean of $\hat{\theta}$ and $\Sigma$ is the covariance matrix of $\hat{\theta}$. According to its definition, $D^2$ is a dimensionless scalar and involves not only the Euclidean distance but also the geometric information. In an LVS, we are interested in the ‘distance’ between a user’s estimated position $\hat{\theta}$ and its claimed position $\theta_c$. Thus, we will use $\theta_c$ instead of $\mu$ to calculate $D^2$. In addition, without any a priori results from a localization algorithm, we cannot obtain any estimate of the covariance matrix $\Sigma$. Therefore, we will utilize the inverse FIM, $J^{-1}$, to approximate $\Sigma$. With this, the squared Mahalanobis distance in our LVS can be written as
$$ D^2 = (\hat{\theta} - \theta_c)^T J\, (\hat{\theta} - \theta_c), $$
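where $\hat{\theta} = [\hat{x}, \hat{y}]^T$ and $J$ is the FIM, calculated as given below. In practice, the LVS works on the observation model based on $H_0$, and the likelihood function of the received powers can be obtained using eq. (2). Let us assume the observations received by different BSs are independent; then the log-likelihood function can be expressed as
$$ \ln f(\mathbf{y}\,|\,\theta) = C - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - p_0 + 10\gamma\log_{10}\left(\frac{d_i}{d_0}\right)\right)^2, $$
where $\mathbf{y} = [y_1, \dots, y_N]^T$ is the $N$-dimensional observation vector, $d_i$ is the distance of the $i$-th BS to the position $\theta = [x, y]^T$, and the constant is
$$ C = -\frac{N}{2}\ln\left(2\pi\sigma^2\right). $$
Then, we can calculate the terms of the FIM through
$$ J_{rs} = -E\left[\frac{\partial^2 \ln f(\mathbf{y}\,|\,\theta)}{\partial\theta_r\,\partial\theta_s}\right], \quad r, s \in \{1, 2\}, $$
where $E[\cdot]$ represents the expectation operation with respect to all observations. After some algebra, and using $\theta = \theta_c$ under $H_0$, the FIM can be written as
$$ J = b\sum_{i=1}^{N}\frac{1}{(d_i^c)^4}\begin{bmatrix}(x_c - x_i)^2 & (x_c - x_i)(y_c - y_i)\\ (x_c - x_i)(y_c - y_i) & (y_c - y_i)^2\end{bmatrix}, \qquad b = \left(\frac{10\gamma}{\sigma\ln 10}\right)^2. $$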
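The FIM for a log-normal RSS model has a simple closed form, so it can be computed directly. The following sketch assumes independent Gaussian noise with a common standard deviation at every base station. The function name `rss_fim` and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def rss_fim(pos, base_stations, gamma, sigma):
    """2x2 Fisher Information Matrix for position under a log-normal RSS model.

    Each base station contributes b * g g^T / d^4, where g is the offset from
    the base station to `pos`, d its length, and b = (10*gamma/(sigma*ln 10))^2.
    `pos` must not coincide with a base station.
    """
    b = (10.0 * gamma / (sigma * math.log(10.0))) ** 2
    jxx = jxy = jyy = 0.0
    for xi, yi in base_stations:
        dx, dy = pos[0] - xi, pos[1] - yi
        d4 = (dx * dx + dy * dy) ** 2
        jxx += b * dx * dx / d4
        jxy += b * dx * dy / d4
        jyy += b * dy * dy / d4
    return [[jxx, jxy], [jxy, jyy]]


base_stations = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
J = rss_fim((100.0, 100.0), base_stations, gamma=3.0, sigma=4.0)
print(J)
```

At the centre of a square of base stations the off-diagonal term vanishes by symmetry, which gives a quick sanity check on any implementation.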
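After setting a threshold parameter $T$ for the squared Mahalanobis distance, the decision rule of an LVS (i.e. whether a user is malicious or not) can be expressed as follows
$$ D^2 \le T \;\Rightarrow\; H_0 \text{ (legitimate)}, \qquad D^2 > T \;\Rightarrow\; H_1 \text{ (malicious)}. \tag{6} $$
Note that we are able to transform any covariance matrix into a diagonal matrix by rotating the position vector. Thus, in the rotated coordinates, the general form of $J^{-1}$ can be expressed as
$$ J^{-1} = \begin{bmatrix}\sigma_x^2 & 0\\ 0 & \sigma_y^2\end{bmatrix}. $$
Then, the threshold can be encapsulated within the equation for an ellipse as follows
$$ \frac{(\hat{x} - x_c)^2}{T\sigma_x^2} + \frac{(\hat{y} - y_c)^2}{T\sigma_y^2} = 1. $$
Therefore, the threshold can also be understood as an ellipse, denoted as $E_T$, which is determined by extending the error ellipse provided by the FIM with the threshold parameter $T$.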
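The geometric reading of the threshold can be made concrete by extracting the ellipse from a 2x2 FIM. The sketch below is an illustration under the assumption that the inverse FIM is used as the covariance: the semi-axes are the square roots of the threshold times the eigenvalues of the inverse FIM, and the orientation comes from its eigenvectors. The function name `threshold_ellipse` is hypothetical.

```python
import math


def threshold_ellipse(fim, threshold):
    """Semi-axes and orientation of the ellipse D^2 = threshold for a 2x2 FIM.

    Returns (semi_major, semi_minor, angle), where angle is the direction of
    the major axis in radians, measured from the x axis.
    """
    a, b, c = fim[0][0], fim[0][1], fim[1][1]
    det = a * c - b * b
    # Inverse of the FIM, used as the approximate position covariance.
    sxx, sxy, syy = c / det, -b / det, a / det
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = 0.5 * (sxx + syy)
    diff = math.hypot(0.5 * (sxx - syy), sxy)
    lam_major, lam_minor = mean + diff, mean - diff
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.sqrt(threshold * lam_major),
            math.sqrt(threshold * lam_minor),
            angle)


# A diagonal FIM: more information along x, so the ellipse is elongated along y.
major, minor, angle = threshold_ellipse([[4.0, 0.0], [0.0, 1.0]], threshold=4.0)
print(major, minor, angle)
```

Doubling the threshold multiplies both semi-axes by the square root of two, so the accepted region grows in area linearly with the threshold.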
Based on the above analysis, the overall process of an LVS includes four steps:
Collect observations of the RSS received from a user by each BS;
Apply a localization algorithm to obtain an estimated position $\hat{\theta}$;
Calculate the squared Mahalanobis distance $D^2$ of $\hat{\theta}$ to the user’s claimed position $\theta_c$;
Infer whether the user is legitimate or malicious according to the decision rule in eq. (6).
In practice, the above are all the steps of our LVS. However, to evaluate an LVS, the false positive and detection rates, which are functions of the threshold parameter $T$ and other LVS parameters, are always investigated in theory. In the following subsections, we provide techniques used to determine the false positive and detection rates in order to optimize the threshold parameter $T$.
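The four steps can be sketched end to end as follows. This is a minimal illustration, not the authors' implementation: the localization step is a coarse grid-search maximum likelihood estimate, the parameter values are hypothetical, and the FIM is evaluated at the claimed position (which must not coincide with a base station).

```python
import math
import random

P0, D0, GAMMA, SIGMA = -30.0, 1.0, 3.0, 4.0   # illustrative values only
BASE_STATIONS = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]


def mean_rss(pos, bs):
    d = max(math.dist(pos, bs), 1e-3)          # guard against log10(0)
    return P0 - 10.0 * GAMMA * math.log10(d / D0)


def grid_mle(obs, step=2.0, lo=-100.0, hi=300.0):
    """Step 2: grid-search ML estimate (least squares under Gaussian noise)."""
    best, best_cost = None, float("inf")
    n = int((hi - lo) / step) + 1
    for ix in range(n):
        for iy in range(n):
            pos = (lo + ix * step, lo + iy * step)
            cost = sum((y - mean_rss(pos, bs)) ** 2
                       for y, bs in zip(obs, BASE_STATIONS))
            if cost < best_cost:
                best, best_cost = pos, cost
    return best


def fim(pos):
    """FIM entries (Jxx, Jxy, Jyy) of the RSS model evaluated at `pos`."""
    b = (10.0 * GAMMA / (SIGMA * math.log(10.0))) ** 2
    jxx = jxy = jyy = 0.0
    for xi, yi in BASE_STATIONS:
        dx, dy = pos[0] - xi, pos[1] - yi
        d4 = (dx * dx + dy * dy) ** 2
        jxx += b * dx * dx / d4
        jxy += b * dx * dy / d4
        jyy += b * dy * dy / d4
    return jxx, jxy, jyy


def mahalanobis_sq(est, claimed):
    """Step 3: squared Mahalanobis distance using the FIM at the claimed position."""
    jxx, jxy, jyy = fim(claimed)
    dx, dy = est[0] - claimed[0], est[1] - claimed[1]
    return jxx * dx * dx + 2.0 * jxy * dx * dy + jyy * dy * dy


def verify(obs, claimed, threshold):
    """Steps 2 to 4: estimate, measure the distance, apply the decision rule."""
    est = grid_mle(obs)
    return "malicious" if mahalanobis_sq(est, claimed) > threshold else "legitimate"


# Step 1: one snapshot of RSS values from a legitimate user.
rng = random.Random(1)
claimed = (60.0, 90.0)
snapshot = [mean_rss(claimed, bs) + rng.gauss(0.0, SIGMA) for bs in BASE_STATIONS]
print(verify(snapshot, claimed, threshold=2.0))
```

A grid search is used only because it needs no external optimizer; any unbiased localization algorithm could be substituted for `grid_mle` without changing the remaining steps.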
3.2 False Positive Rate
The false positive rate $\alpha$ is the probability that a legitimate user is judged to be malicious. For a legitimate user, $\theta_c = \theta_t$. Then, in the 2-D physical space, the false positive rate can be expressed as $\alpha = e^{-T/2}$, since $D^2$ follows a chi-square distribution with two degrees of freedom when the position estimate is Gaussian with mean $\theta_c$ and covariance $J^{-1}$.
In fact, the true negative rate ($1 - \alpha$) is a well known metric that underlies the performance of unbiased localization algorithms. For example, in the 2-D physical space, it states that the probability that an estimated position lies within the ellipse with threshold $T$ is no more than $1 - e^{-T/2}$.
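The closed form for the false positive rate is easy to check by simulation. The sketch below rests on a stated assumption: the estimate of a legitimate user is unbiased and Gaussian with covariance equal to the inverse FIM, so that the squared Mahalanobis distance is the squared norm of two independent standard normal variables (a chi-square variable with two degrees of freedom). The function names are hypothetical.

```python
import math
import random


def false_positive_rate(threshold):
    """Tail probability of a chi-square variable with two degrees of freedom."""
    return math.exp(-threshold / 2.0)


def simulated_false_positive_rate(threshold, trials, rng):
    """Fraction of whitened 2-D Gaussian errors whose squared norm exceeds the threshold."""
    hits = 0
    for _ in range(trials):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if z1 * z1 + z2 * z2 > threshold:
            hits += 1
    return hits / trials


rng = random.Random(1)
for t in (1.0, 2.0, 4.0):
    print(t, round(false_positive_rate(t), 4),
          round(simulated_false_positive_rate(t, 20000, rng), 4))
```

With a practical estimator the covariance is at least as large as the inverse FIM, so the simulated rate from a real localization algorithm would be expected to sit at or above this closed-form value.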
3.3 Detection Rate
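The detection rate $\beta$ is the probability that malicious users are recognized as malicious ones. In order to calculate $\beta$, we have to obtain the posterior probability density function (pdf) for a location given some RSS observation vector, which can be expressed as
$$ f(\theta\,|\,\mathbf{y}) = \frac{f(\mathbf{y}\,|\,\theta)\, f(\theta)}{\int f(\mathbf{y}\,|\,\theta)\, f(\theta)\, d\theta}, $$
where $\theta$ is a general location, and $\mathbf{y}$ is the observation vector. Of course, if the user is malicious the observed signal vector will be one that has undergone a boost as described by eq. (5). Let us denote the average value of this spoofed observation vector as $\bar{\mathbf{v}}$. Given this, the likelihood function $f(\bar{\mathbf{v}}\,|\,\theta)$ can be derived from eq. (2). If we take $\theta$ to be a uniformly distributed random vector, then the detection rate can be calculated as
$$ \beta = 1 - \frac{1}{C_\beta}\int_{\theta \in E_T} f(\bar{\mathbf{v}}\,|\,\theta)\, d\theta, $$
where $C_\beta$ is a normalizing constant that can be written as
$$ C_\beta = \int f(\bar{\mathbf{v}}\,|\,\theta)\, d\theta. $$
Numerical methods are utilized to solve the above integral equation for $\beta$ since there is no closed form solution. Based on the above analysis, $\beta$ is also a function of $T$.
As an aside it is worth mentioning that the false positive rate can also be written in a similar form as follows
$$ \alpha = 1 - \frac{1}{C_\alpha}\int_{\theta \in E_T} f(\bar{\mathbf{y}}\,|\,\theta)\, d\theta, $$
where $\bar{\mathbf{y}}$ is the average non-spoofed observation vector and
$$ C_\alpha = \int f(\bar{\mathbf{y}}\,|\,\theta)\, d\theta. $$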
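One simple numerical method for these integrals is summation over a regular grid. The sketch below is an illustration under several assumptions that are not taken from the paper: the uniform prior is truncated to a finite square, the grid spacing is coarse, the parameter values are hypothetical, and the spoofed average observation is the flat far-field value seen by every base station.

```python
import math

P0, D0, GAMMA, SIGMA = -30.0, 1.0, 3.0, 4.0   # illustrative values only
BASE_STATIONS = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]


def mean_rss(pos, bs):
    d = max(math.dist(pos, bs), 1e-3)          # guard against log10(0)
    return P0 - 10.0 * GAMMA * math.log10(d / D0)


def fim(pos):
    b = (10.0 * GAMMA / (SIGMA * math.log(10.0))) ** 2
    jxx = jxy = jyy = 0.0
    for xi, yi in BASE_STATIONS:
        dx, dy = pos[0] - xi, pos[1] - yi
        d4 = (dx * dx + dy * dy) ** 2
        jxx += b * dx * dx / d4
        jxy += b * dx * dy / d4
        jyy += b * dy * dy / d4
    return jxx, jxy, jyy


def likelihood(avg_obs, pos):
    """Unnormalized Gaussian likelihood of the average observation vector at `pos`."""
    cost = sum((y - mean_rss(pos, bs)) ** 2 for y, bs in zip(avg_obs, BASE_STATIONS))
    return math.exp(-cost / (2.0 * SIGMA ** 2))


def mass_outside_ellipse(avg_obs, claimed, threshold, step=10.0, lo=-400.0, hi=600.0):
    """Posterior mass outside the threshold ellipse, uniform prior on a finite grid."""
    jxx, jxy, jyy = fim(claimed)
    inside = total = 0.0
    n = int((hi - lo) / step) + 1
    for ix in range(n):
        for iy in range(n):
            pos = (lo + ix * step, lo + iy * step)
            w = likelihood(avg_obs, pos)
            total += w
            dx, dy = pos[0] - claimed[0], pos[1] - claimed[1]
            if jxx * dx * dx + 2.0 * jxy * dx * dy + jyy * dy * dy <= threshold:
                inside += w
    return 1.0 - inside / total


claimed = (60.0, 90.0)
legit_avg = [mean_rss(claimed, bs) for bs in BASE_STATIONS]
flat = sum(legit_avg) / len(legit_avg)
spoofed_avg = [flat] * len(BASE_STATIONS)
alpha = mass_outside_ellipse(legit_avg, claimed, threshold=2.0)
beta = mass_outside_ellipse(spoofed_avg, claimed, threshold=2.0)
print(round(alpha, 3), round(beta, 3))
```

The same routine yields both rates: passing the average non-spoofed observation vector gives the false positive rate, and passing the average spoofed vector gives the detection rate. A finer grid or an adaptive quadrature rule would be needed for accurate values.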
4 Optimization of the Threshold
In this section we will optimize the value of the threshold by maximizing the IDC, which is a function of the false positive rate $\alpha$, the detection rate $\beta$ and the base rate $B$ (the a priori probability of intrusion in the input event data). That is, our optimization procedure is to find the value of $T$ that maximizes the IDC. From an information theoretic point of view, the IDC is a metric that measures the capability of an IDS to classify the input events correctly and is defined as
$$ C_{ID} = \frac{I(X;Y)}{H(X)} = \frac{H(X) - H(X|Y)}{H(X)}, \tag{7} $$
where $H(X)$ is the entropy of the input data $X$, $I(X;Y)$ is the mutual information of input data $X$ and output data $Y$, and $H(X|Y)$ is the conditional entropy. Mutual information measures the reduction of uncertainty of the input $X$ given the output $Y$. Thus, $C_{ID}$ is the ratio of the reduction of uncertainty of the input given the output to the original uncertainty of the input. Its value range is [0, 1]. A larger $C_{ID}$ value means that the IDS has an improved capability of classifying input events accurately.
Our LVS can be modeled as an IDS whose input data are the claimed positions, and whose output data are the binary decisions. Then, $X = 0$ represents an actual claimed position from a legitimate user, $X = 1$ represents a spoofed claimed position from a malicious user, $Y = 0$ infers the user is legitimate, and $Y = 1$ indicates the user is malicious. Accordingly, the false positive rate is the probability $\alpha = P(Y = 1 \,|\, X = 0)$, and the detection rate is the probability $\beta = P(Y = 1 \,|\, X = 1)$. Therefore, the optimal value of $T$ is the one that maximizes the $C_{ID}$ of the LVS.
The realizations of input and output data are denoted as $x$ and $y$, respectively. Given the base rate $B = P(X = 1)$, the entropy of the input data can be written as
$$ H(X) = -B\log B - (1 - B)\log(1 - B). $$
The conditional entropy can be expressed as
$$ H(X|Y) = -B\beta\log\frac{B\beta}{B\beta + (1-B)\alpha} - B(1-\beta)\log\frac{B(1-\beta)}{B(1-\beta) + (1-B)(1-\alpha)} - (1-B)(1-\alpha)\log\frac{(1-B)(1-\alpha)}{(1-B)(1-\alpha) + B(1-\beta)} - (1-B)\alpha\log\frac{(1-B)\alpha}{(1-B)\alpha + B\beta}. $$
Numerical methods are applied in order to search for the optimal value of $T$ since there is no closed form solution. In the following we refer to this optimal value as $T^*$.
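The IDC and the threshold search can be written in a few lines. The sketch below computes the conditional entropy from the joint distribution of the true class and the decision, then picks the best threshold from a list of candidates. The function names and the toy rate curves are hypothetical and serve only to exercise the search; in practice the two rates would come from the analysis of Section 3 or from simulation.

```python
import math


def entropy_input(base_rate):
    """Entropy (bits) of the binary input with P(malicious) = base_rate."""
    return -sum(p * math.log2(p) for p in (base_rate, 1.0 - base_rate) if p > 0.0)


def conditional_entropy(alpha, beta, base_rate):
    """H(X|Y) in bits, with X the true class and Y the LVS decision."""
    joint = {                                   # P(X = x, Y = y)
        (0, 0): (1.0 - base_rate) * (1.0 - alpha),
        (0, 1): (1.0 - base_rate) * alpha,
        (1, 0): base_rate * (1.0 - beta),
        (1, 1): base_rate * beta,
    }
    h = 0.0
    for y in (0, 1):
        p_y = joint[(0, y)] + joint[(1, y)]
        for x in (0, 1):
            p_xy = joint[(x, y)]
            if p_xy > 0.0:                      # 0 log 0 is taken as 0
                h -= p_xy * math.log2(p_xy / p_y)
    return h


def idc(alpha, beta, base_rate):
    """Intrusion Detection Capability: mutual information over input entropy."""
    hx = entropy_input(base_rate)
    return (hx - conditional_entropy(alpha, beta, base_rate)) / hx


def best_threshold(rates, base_rate, candidates):
    """Candidate threshold with the largest IDC; rates(t) returns (alpha, beta)."""
    return max(candidates, key=lambda t: idc(*rates(t), base_rate))


def toy_rates(t):
    # Hypothetical rate curves, used only to exercise the search.
    return math.exp(-t / 2.0), math.exp(-t / 20.0)


candidates = [0.1 * k for k in range(1, 101)]
t_star = best_threshold(toy_rates, 0.1, candidates)
print(round(t_star, 1), round(idc(*toy_rates(t_star), 0.1), 3))
```

Because the IDC is a ratio of two entropies, the base of the logarithm does not affect its value. An exhaustive search over candidates is used here for transparency; a one-dimensional optimizer could replace it when the rate functions are expensive to evaluate.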
5 Simulation Results
Adopting a Maximum Likelihood Estimator (MLE) in our location estimation algorithm, we now verify, via detailed simulations, our previous analysis. The theoretical and simulated $\alpha$, $\beta$ and $C_{ID}$, all of which are dependent on $T$, are utilized in order to find the value $T^*$ that maximizes $C_{ID}$.
5.1 Simulation Set-up
The simulation settings are as follows:
$N$ BSs are deployed in a square field and the legitimate and malicious users can communicate with all BSs;
The claimed positions of legitimate and malicious users are the same, denoted $\theta_c$;
Observations are collected from each base station;
The BSs are set at fixed positions (we investigate a range of fixed locations);
The results shown are averaged over 1,000 Monte Carlo realizations of the estimated position, with a single fixed base rate $B$ used for all the simulations.
5.2 $C_{ID}$ with Different Values of $T$
As shown in Fig. 1, the solid lines are the theoretical $\alpha$, $\beta$ and $C_{ID}$, while the symbols are the simulated $\alpha$, $\beta$ and $C_{ID}$. The simulated values of $\alpha$ and $\beta$ are calculated directly from the realizations of the estimated positions, and the simulated $C_{ID}$ is then obtained from eq. (7). The simulation parameters are shown in the figure caption, and the theoretical optimal value $T^*$ is the one at which the theoretical $C_{ID}$ curve peaks (note that in all the figures explicitly shown in this paper the four BSs are fixed at the corners of a 200m x 200m grid). The comparison between simulation and analysis shows excellent agreement. Beyond the simulations explicitly shown in Fig. 1, we have investigated a range of other fixed BS positions (up to 10 BSs whose positions are randomly selected), and these simulations also show excellent agreement with the analysis. Collectively, these simulation results verify the analysis we have provided earlier.
The simulation results for a malicious user at a finite distance from all BSs are shown in Fig. 2. The true position of the malicious user in the simulations is set 10km away from the claimed position. Although the simulated and theoretical values of $\alpha$, $\beta$ and $C_{ID}$ do not match each other exactly (the theoretical analysis approximates the user as being at infinity), the simulated and theoretical optimal values $T^*$ are effectively the same. We find this result holds down to distances where the malicious user is a few km away from the claimed position. This shows that our framework remains tenable when the assumption that the malicious user is infinitely far away is relaxed down to the few km range.
In order to verify that the optimal value $T^*$ indeed maximizes $C_{ID}$, we also ran simulations with the threshold set to values other than $T^*$. Fig. 3 shows such results for the case where the malicious user is effectively at infinity. Here the optimal value $T^*$ is derived from the proposed theoretical analysis, but in the simulations the threshold is also set to the other values of $T$ shown in the figure. From the results shown we can see that these other values do provide simulated false positive and detection rates which result in lower values of $C_{ID}$ (and therefore sub-optimal performance), which once again verifies the robustness of our analytical framework. Fig. 4 shows the same results except that the malicious user is again set 10km away from the claimed position. Again we see a validation of our analysis.
6 Conclusion and Future Work
In this paper, we have proposed a novel and rigorous information theoretic framework for an LVS. The theoretical framework we have developed shows how the value of the threshold used in the detection of a spoofed location can be optimized in terms of the mutual information between the input and output data. In order to verify the legitimacy of our framework we have carried out detailed numerical simulations of our framework under the assumption of an idealized threat model in which the malicious user is far enough from the claimed location such that his boosted signal strength results in all BSs receiving the same RSS (modulo noise). Our numerical simulations mimic the practical scenario where a system deployed using our framework must make a binary Yes/No “malicious decision” to each snapshot of RSS values obtained by the BSs. The comparison between simulation and analysis shows excellent agreement. Other simulations where we modify the approximation of constant RSS at BSs also showed very good agreement with analysis.
The work described in this paper formalises the performance of an optimal LVS under the simplest (and perhaps most likely) scenario, where a single malicious user attempts to spoof his location to a wider wireless network. The practical scenario we had in mind whilst carrying out our simulations was an ITS in which another vehicle is attempting to provide falsified location information to the wider vehicular network. Future work related to our new framework will include the formal inclusion of more sophisticated threat models, where the malicious user is both closer to the claimed location and has the use of colluding adversaries. It is well known that no LVS can be made foolproof under the colluding adversary scenario.
This work has been supported by the University of New South Wales, and the Australian Research Council (ARC).
- Note that an equivalent description of our LVS, which does not introduce the Mahalanobis distance, can be given in terms of the Cramer-Rao Lower Bound. In this alternative description, an error ellipse is derived directly from the FIM, with the scale of the ellipse being set by the eigenvalues of the inverse FIM and the orientation being set by the eigenvectors of the inverse FIM. For different values of the threshold $T$ the ellipse axes scale as $\sqrt{T}$, and the detection algorithm decides the user is malicious if the estimated position returned by the location MLE lies outside of the ellipse.
- Note that location verification in the context of quantum communications systems has previously been considered, and it has been argued that such systems are able to securely verify a location under all known threat models, although others argue otherwise. It is undisputed that classical communications alone cannot achieve secure location verification under all known threat models.
- A. Vora and M. Nesterenko, “Secure location verification using radio broadcast,” IEEE Trans. on Dependable and Secure Computing, vol. 3, no. 4, pp. 377-385, 2006.
- Y. Sheng, K. Tan, G. Chen, D. Kotz, and A. Campbell, “Detecting 802.11 MAC-layer spoofing using received signal strength,” in Proc. IEEE INFOCOM, Apr. 2008, pp. 1768-1776.
- R. A. Malaney, “Wireless intrusion detection using tracking verification,” in Proc. IEEE ICC, Glasgow, June 2007, pp. 1558-1563.
- Y. Chen, J. Yang, W. Trappe, and R. P. Martin, “Detecting and localizing identity-based attacks in wireless and sensor networks,” IEEE Trans. Veh. Technol., vol. 59, no. 5, pp. 2418-2434, Jun. 2010.
- S. Čapkun, K. B. Rasmussen, M. Čagalj, and M. Srivastava, “Secure location verification with hidden and mobile base stations,” IEEE Trans. Mobile Comput., vol. 7, no. 4, pp. 470-483, Apr. 2008.
- Z. Yu, L. Zang, and W. Trappe, “Evaluation of localization attacks on power-modulated challenge-response systems,” IEEE Trans. Inf. Forensics Security, vol. 3, no. 2, pp. 259-272, Jun. 2008.
- L. Dawei, L. Moon-Chuen, and W. Dan, “A node-to-node location verification method,” IEEE Trans. Ind. Electron., vol. 57, pp. 1526-1537, May 2010.
- O. Abumansoor, A. Boukerche, “A secure cooperative approach for nonline-of-sight location verification in VANET,” IEEE Trans. Veh. Technol., vol. 61, pp. 275-285, Jan. 2012.
- T. Leinmüller, E. Schoch, and F. Kargl, “Position verification approaches for vehicular ad hoc networks,” IEEE Wireless Commun., vol. 13, no. 5, pp. 16-21, Oct. 2006.
- G. Yan, S. Olariu, and M. C. Weigle, “Providing VANET security through active position detection,” Comput. Commun., vol. 31, no. 12, pp. 2883-2897, Jul. 2008.
- N. Sastry, U. Shankar, and D. Wagner, “Secure verification of location claims,” in Proc. ACM Workshop Wireless Security (WiSe ’03), Sept. 2003, pp. 1-10.
- J.-H. Song, V. W. S. Wong, and V. C. M. Leung, “Secure location verification for vehicular ad-hoc networks,” in Proc. IEEE GLOBECOM, Dec. 2008, pp. 1-5.
- B. Xiao, B. Yu, and C. Gao, “Detection and localization of Sybil nodes in VANETs,” in Proc. Workshop DIWANS, Sep. 2006, pp. 1-8.
- G. Gu, P. Fogla, D. Dagon, W. Lee, and B. Skoric, “Measuring intrusion detection capability: An information-theoretic approach,” in Proc. ASIACCS ’06, Taipei, Taiwan, March 2006.
- D. Ververidis and C. Kotropoulos, “Gaussian mixture modeling by exploiting the Mahalanobis distance,” IEEE Trans. Signal Process., vol. 56, no. 7, pp. 2797-2811, Jul. 2008.
- R. A. Malaney, “Nuisance parameters and location accuracy in log-normal fading model,” IEEE Trans. Wireless Commun., vol. 6, no. 3, March 2007.
- M. I. Ribeiro, “Gaussian probability density functions: Properties and error characterization,” Instituto Superior Técnico, Lisboa, Portugal, Tech. Rep. 1049-001, Feb. 2004.
- G. Gu, P. Fogla, D. Dagon, W. Lee, and B. Skoric, “An information-theoretic measure of intrusion detection capability,” College of Computing, Georgia Tech, Tech. Rep. GIT-CC-05-10, 2005.
- A. Kent, W. Munro, T. Spiller and R. Beausoleil, “Tagging Systems,” US Patent, Pub. No. US2006/0022832, 2006.
- R. A. Malaney, “Location-dependent communications using quantum entanglement,” Phys. Rev. A 81, 042319, 2010.
- R. A. Malaney, “Quantum location verification in noisy channels,” in Proc. IEEE GLOBECOM, Dec. 2010, pp. 1-6.
- R. A. Malaney, “Location verification in quantum communications,” WIPO Patent, Pub. No. WO/2011/044629, 2011.
- H. Buhrman, N. Chandran, S. Fehr, R. Gelles, V. Goyal, R. Ostrovsky and C. Schaffner, “Position-based quantum cryptography: impossibility and constructions,” In Advances in Cryptology, vol. 6841 of Lecture Notes in Computer Science, pp. 429-446, Springer-Verlag, 2011.