Privacy Preservation in Location-Based Services: A Novel Metric and Attack Model
Abstract
Recent years have seen a rising need for location-based services in everyday life. Alongside their many advantages, these services have raised serious concerns about the location privacy of users. An adversary such as an untrusted location-based server can monitor the locations queried by a user to infer critical information such as the user’s home address, health conditions, and shopping habits. To address this issue, dummy-based algorithms have been developed to increase the anonymity of users and thus protect their privacy. Unfortunately, the existing algorithms consider only a limited amount of side information known to the adversary, and may therefore face more serious challenges in practice. In this paper, we incorporate a new type of side information based on consecutive location changes of users and propose a new metric called transition-entropy to investigate location privacy preservation, followed by two algorithms that improve the transition-entropy for a given dummy generation algorithm. Then, we develop an attack model based on the Viterbi algorithm which can significantly threaten the location privacy of users. Next, to protect users from the Viterbi attack, we propose an algorithm called robust dummy generation (RDG) which resists the Viterbi attack while maintaining high performance in terms of the privacy metrics introduced in the paper. All the algorithms are applied to and analyzed on a real-life dataset.
I Introduction
With the ubiquitous use of smartphones and social networks, location-based services (LBSs) have become an essential part of contemporary society. Users of smart devices can simply download location-based applications and query information from the LBS provider. For example, LBSs offered by companies like Alibaba, Apple, and Google can be used to find nearby restaurants, track parcels, and provide personalized weather notifications. The annual market for LBSs is expected to reach USD Billion by , with an annual growth rate of [1].
Despite the many advantages of LBSs, the privacy issues associated with user locations have raised many concerns. An untrusted server can collect the location data of users and analyze it to learn sensitive information such as the type of queries submitted, the shopping habits of users, and the addresses of users’ properties or workplaces. Such information can easily be abused by the server or disclosed to other parties. Therefore, it is of great importance to devise new ways to preserve the location privacy of users, defined as “the ability to prevent other parties from learning one’s current or past locations” [2].
The techniques to address the threats to the location privacy of users have attracted much attention among researchers [3, 4, 5, 6, 2]. Most of the literature is based on an approach called $k$-anonymity [7]. Under this criterion, the release of a location is said to provide $k$-anonymity if the real location of any user is not distinguishable from at least $k-1$ other locations. Initially, the location of the user was hidden using a trusted anonymization server [8]; later, owing to shortcomings of this approach such as the anonymizer itself becoming a bottleneck, the use of dummy locations to achieve $k$-anonymity was proposed in [9]. Since then, researchers have strived to develop dummy generation algorithms that preserve $k$-anonymity for users.
The principal idea behind dummy generation algorithms is to generate dummy locations alongside the real location of the user and submit them all together to the LBS server when requesting a service from the LBS provider. This makes it difficult for an untrusted LBS provider, the so-called adversary, to identify the real location of the user. The groundwork in this field was laid by the authors in [9], who generated the dummies randomly throughout the map and evolved them as users moved. Following this work, the authors in [10] and [11] proposed choosing the candidate dummies from a virtual circle or grid constructed around the current location of the user. Unfortunately, all of these works overlook the fact that the adversary might have side information that can rule out the dummies or reveal the real location of the user.
One important piece of side information which can be exploited by the adversary is the query probability of the locations across the map. The adversary can use the recorded data to infer the number of times that users have queried various locations on the map. Using this information, the adversary can calculate the query probability of each location and then identify the dummy locations according to the history of interest in each location. For instance, if a dummy has been chosen on a lake, where the query probability is close to zero, the adversary will know with high likelihood that the queried location is a dummy. Such naive selection of dummy locations therefore compromises the location privacy of the user. To solve this issue, an enhanced algorithm, referred to as the dummy-location selection (DLS) algorithm, was proposed in [12]. In essence, the authors used an entropy metric [13] to evaluate the queries submitted in different locations and generated the dummies so as to maximize the entropy.
Although the DLS algorithm is promising for a stationary set of queried locations, comprising the real location and its associated dummies, it fails to address the privacy issues caused by consecutive queries made to the LBS provider. In more detail, the authors limited the side information to the queries submitted in different locations but overlooked the fact that the adversary also has access to the trajectories and, consequently, to the number of times the paths between locations have been traveled. With access to such extra side information, the adversary can expose the dummies and compromise the anonymity of the users. For further explanation, a toy example is provided in Fig. 1,
where we show a user moving from location to location with set to two. The associated dummies of the real locations and are denoted by and , respectively. The dummies in each location set are generated using the DLS algorithm, hence, they have a similar probability of being selected. The numbers on the directed edges indicate the number of times that users have queried the end location of the edge right after asking about the starting point of the edge. For instance, the users have queried location for times immediately after location . According to the DLS algorithm, the anonymity requirement has been preserved for each location. However, let us look at the four paths connecting the two sets of locations together and consider the number of times that each path has been inquired. It can be seen from Fig. 1 that location has been inquired for times after locations and whereas location has only received times of inquiries. Therefore, the adversary can infer with a high likelihood that the real location is possibly location , and thus, compromise the location privacy of the user.
The main contributions of this paper are as follows.

We quantify the existing metric and name it cell-entropy, and we propose a new metric called transition-entropy for two consecutive queries, which accounts for the introduced side information based on the transitions of users.

We extend the transition-entropy metric to trajectories and develop two algorithms which can be applied to any existing dummy generation method to improve the transition-entropy.

We propose an attack model based on the Viterbi algorithm and develop an algorithm that improves resilience against the attack while maintaining high performance in terms of cell-entropy and transition-entropy.

We analyze the performance of the proposed metrics and algorithms on a real-life dataset.
The rest of the paper is organized as follows. We start by reviewing the existing works in the literature in Section II. Section III describes the system model used throughout the paper, including the system architecture, the adversary model, and the side information used by the adversary. In Section IV, we introduce our proposed metrics, followed by the proposed attack model in Section V. Next, the proposed algorithms are presented in Section VI. Finally, the analysis of the proposed metrics and algorithms is provided in Section VII, and we conclude our work in Section VIII.
II Related Works
Anonymity is defined as “the state of being not identifiable within a set of subjects, the anonymity set” [14]. Likewise, the location of a user is said to be $k$-anonymous if it is not distinguishable from at least $k-1$ other user locations [15]. Several approaches have been proposed to obtain anonymity for users, among which we identify four broad categories: location cloaking, mix zones, pseudonyms, and dummy-aided algorithms.
The research on location cloaking was initiated by Gruteser and Grunwald [16]. The key idea is to employ a trusted server to help users preserve their anonymity. Upon receiving a query from a user, the location anonymizer server computes a cloaking box that includes the location of the user and other user locations, and queries the requested service from the LBS provider for all the locations, making it difficult for the LBS provider to identify the user [17, 18]. Several algorithms have been proposed to implement the location cloaking scheme, such as ICliqueCloak [19] and MaxAccuCloak [20]. The main drawback of location cloaking is the need for a location anonymizer, which adds cost overhead to the system. The location anonymizer can itself become a bottleneck from both the privacy and the computational complexity perspectives.
The authors in [21] proposed the idea of mix zones. A mix zone is defined as a spatial zone in which the identity of users is not identifiable. All users entering a mix zone change their pseudonym to a new, unused pseudonym, making it difficult for the adversary to identify them. The anonymization process is performed by a middleware mechanism before the data is transferred to third-party applications. The authors further extended their work in [22] by considering irregular shapes for mix zones. Moreover, the use of mix zones has attracted particular attention in vehicular communications. Applying the mix zones method to road networks is considered in [23, 24], where a mix zone construction method called MobiMix is proposed. Lu et al. [25] exploited pseudonym changes to form mix zones at social spots, and Gao et al. [26] applied the mix zones approach to trajectories for mobile crowd sensing applications. Furthermore, the use of cryptography for the generation of mix zones in vehicular communications is considered in [27]. As with the location cloaking approach, the main drawback of mix zones is the need for a middleware mechanism or a trusted party before the data is transferred to an untrusted LBS provider.
Another technique to increase the location privacy of users is based on the assignment of pseudonyms to hide the identity of the users. The identity of a user can be the name of the person, a unique identifier such as an IP address, or any property that can be related to the user. The authors in [28] proposed a so-called intermediary scenario, in which a trusted intermediary collects the location information of the users, such as GPS data, and assigns a pseudonym before sending it to a third-party LBS provider which is considered untrusted. The paper claims that the use of pseudonyms prevents the third-party LBS provider from identifying and tracking the users. The work in [29] suggests that, instead of delegating the generation of pseudonyms to the location intermediary, users generate the pseudonyms themselves. The use of pseudonyms for preserving location privacy has also been considered in vehicular communication systems, such as the work in [30]. There are several drawbacks associated with this approach. First, many location-based applications require users to subscribe in order to use the offered services. Second, like the previous two categories, this approach requires a trusted intermediary. Finally, by analyzing the patterns in location data, an adversary can compromise the identity of the users [31].
The last category, the use of dummy locations, is considered more promising since it does not need a trusted anonymizer, unlike location cloaking, mix zones, and pseudonyms. This approach was initially proposed in [9]. The principal idea is to achieve anonymity by sending dummy locations alongside the real location of the user when requesting an LBS from an untrusted LBS provider. All the locations use the same identifier corresponding to the user. With dummy locations, it becomes difficult for the adversary to identify the real location of the users. Several algorithms have been proposed to help users generate the dummies. The authors in [10] proposed using a virtual circle or a virtual grid based on the real location of users to generate the dummies. The idea was further developed in [11]. More recently, an algorithm called dummy-location selection (DLS) was proposed in [12]. The algorithm takes into consideration the number of queries made across the map and shows via simulations that previous algorithms are vulnerable if the adversary exploits this side information. Although the algorithm provides a sound framework for the generation of dummies, it does not take into account that users are in danger of losing their location privacy if the adversary tracks them and gains access to other side information, such as the number of transitions made on the map. Do et al. [32] utilized conditional probabilities to generate realistic false locations, and Hara et al. [33] proposed a method based on physical constraints of the real environment.
III System Model
III-A System Architecture
In this paper, we adopt a non-cooperative system architecture [34], as shown in Fig. 2. In this architecture, the LBS users are directly in contact with the LBS provider, with no middleman or third-party service provider.
Assume that the location map is divided into an grid and a user communicates with an LBS server for a service. At time , the user intends to make his/her th query from the service provider, preserving anonymity. Here, quantifies the privacy protection requirement of the user. This metric implies that the adversary is not able to identify the real location of the user with a probability higher than . Hence, such user needs to transmit dummy location to hide its true location from the observer. Note that by the term location we refer to the cell in which the user is located. We denote the set of locations transmitted to the LBS provider at th query by
(1) 
Also, the real location is shown by where . The probability of location being the real location is shown by
(2) 
In the next query, the user requires anonymity and queries the location set from the LBS provider. The probability of being queried consecutively after is denoted by
(3) 
III-B Adversary Model
Two types of adversary models are considered in our work: an active adversary, and a passive adversary. The passive adversary can listen to the communication between the users and the LBS provider. Analyzing the collected information, the passive adversary can compromise the location privacy of the users by performing an eavesdropping attack. An active adversary, on the other hand, compromises the LBS provider and has access to the information stored on the server. In our work, the active adversary is assumed to be the LBS provider itself.
III-C Side Information
The adversary is assumed to possess the location map of the area where the users are distributed. The adversary has access to the queries made by the users and can record them over time to obtain the history of the locations the users have queried. Moreover, the adversary can calculate the query probability of different locations on the map, which is defined as the number of times a particular location has been queried divided by the total number of queries on the map. The adversary can exploit the query probability to infer the probability of a location being genuine or fake in future queries. For instance, if a user queries two locations, one with a comparably higher query probability, it is more likely that the real location is the one with the higher probability.
Apart from this traditional side information, we assume that the adversary has access to the number of times each path has been traveled on the map. The authorities do not impose any time limit on storing the location information of users, as is the case in the US [35]. This lack of legislation enables the adversary to monitor the users and access the trajectories traveled by them. Therefore, the adversary not only has data on the number of queries made at each location, but is also well aware of the number of times that a location has been queried immediately after each other location.
IV Performance Metrics of Privacy
In this section, we briefly explain a metric which was partially developed in [12]. Then, we propose a metric called transition-entropy to analyze privacy preservation in LBSs for two consecutive queries, and subsequently extend the metric to trajectories.
IV-A Cell-entropy Metric
Although not mentioned as a metric in [12], cell-entropy was implicitly proposed as part of the DLS algorithm. We have named this metric cell-entropy to distinguish it from the transition-entropy metric proposed in this paper. For a given location set which includes the real location of a user and dummies chosen to preserve anonymity, the set of query probabilities are shown by where is the query probability of location (cell) for . The query probability of cell is calculated by
(4) 
The cell-entropy borrows the concept of entropy from information theory to quantify the uncertainty in the query probabilities of the locations in . The cell-entropy metric for location set can be defined as [12]
(5) 
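As a minimal sketch of the cell-entropy computation, the following Python snippet normalizes the raw query counts of the locations in a set and evaluates their Shannon entropy. The function name `cell_entropy`, the use of raw counts as input, and the base-2 logarithm are assumptions made for illustration and are not taken from [12].

```python
import math

def cell_entropy(query_counts):
    """Entropy (in bits) of the normalized query probabilities of a location set.

    query_counts: hypothetical list with the number of recorded queries for
    each location (real location and dummies) in the submitted set.
    """
    total = sum(query_counts)
    if total == 0:
        raise ValueError("at least one location must have a nonzero query count")
    probs = [c / total for c in query_counts]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four locations with equal query counts are indistinguishable to the adversary.
uniform = cell_entropy([50, 50, 50, 50])
# A set in which one location dominates the query history is easy to attack.
skewed = cell_entropy([97, 1, 1, 1])
```

Under these assumptions, a set of four equally popular locations attains the maximum of two bits, whereas a set dominated by a single location scores much lower, which is the situation the DLS algorithm tries to avoid.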
IV-B Transition-entropy
The main purpose of the metric we propose here is to provide a benchmark for comparing dummy-based algorithms that takes into account the comprehensive side information considered in this paper. The metric indicates the susceptibility of the existing algorithms to attacks on the location privacy of users, as the anonymity requirement of the users can easily be compromised in trajectories. This necessitates the development of new algorithms for preserving the location privacy of users. We start by illustrating the metric for two consecutive queries and then generalize it to trajectories.
IV-B1 Transition-entropy metric for two consecutive queries
Assume that at time a user makes its th query and has an anonymity constraint of , and requests the service for the location set of . The set includes dummies and the real location of user. Then, at time the user moves to a new location with the anonymity constraint of and makes his th query providing the server with the location set of consisting of the real location of the user and the associated dummies. The dummies can be generated using any of the existing algorithms.
Using the sets and , we generate a bipartite graph shown in Fig. 3, where each set forms the vertices at a side of the graph.
We denote the number of times the location follows the location by and assign it to the directed edge connecting to . Also, for every location , we denote query probability of the location by . The query probability of a cell is calculated by dividing the number of times that cell has been called over the whole number of queries of the map. This data is calculated from the history of data LBS provider holds.
We would like to find out how probable it is for each member of the location set to be the real location of the user () given the location set in the previous query from the LBS provider. In other words, the aim is to calculate the posterior probability of the members in with respect to . This probability for each member of can be calculated based on the as
(6)  
(7)  
(8) 
where the equation (7) is the joint probability of being the real location of and moving to the location after . The former probability in equation (8) can be calculated as
(9) 
and the latter probability which indicates the normalized query probability as
(10) 
Note that equation (10) indicates that the posterior probability of the cells in are set to the normalized query probability of the locations. Calculating equation (8) for every member of the location set , the posterior probabilities of the locations in are derived based on the . Having these probabilities, we exploit the concept of entropy to infer the uncertainty in identifying the dummies or the real location of the users calculated by
(11) 
We call , the transition-entropy of the location set with respect to . The transition-entropy metric represents the adversary’s uncertainty in identifying the real location in consecutive queries to the LBS provider. A larger value of the transition-entropy indicates that for each member of , the probability of the paths originating from the to the destination of that member is similar to that of the other members of . Hence, it is more difficult for the adversary to compromise the anonymity of the users based on the transitions made from their previous query. The formal algorithm for calculating the transition-entropy of the location set with respect to is presented in Algorithm 1. The main advantages of the metric are that it (i) considers the performance of dummy-based algorithms on trajectories and not just on a stationary set of locations; (ii) makes it possible to investigate the performance of dummy-based algorithms for users with varying anonymity requirements along their trajectory; and (iii) subsumes many other factors, such as time reachability or direction similarity, considered in other works.
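The two-query computation can be sketched in Python as follows. This is an illustration under stated assumptions rather than a transcription of Algorithm 1: the names `posterior_next` and `transition_entropy` are hypothetical, the transition counts leaving each previous location are normalized over the locations of the current set, and a uniform fallback is used when no history is available.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalize(values):
    """Scale non-negative values to sum to one (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)  # assumption: no history
    return [v / total for v in values]

def posterior_next(prior, transition_counts):
    """Posterior of each location in the current set being the real one.

    prior[i]: probability that location i of the previous set was real.
    transition_counts[i][j]: times location j of the current set was
    queried right after location i of the previous set.
    """
    post = [0.0] * len(transition_counts[0])
    for p_i, row in zip(prior, transition_counts):
        for j, t_ij in enumerate(normalize(row)):
            post[j] += p_i * t_ij  # joint probability, summed over origins
    return post

def transition_entropy(prev_query_counts, transition_counts):
    prior = normalize(prev_query_counts)  # normalized query probabilities
    return entropy(posterior_next(prior, transition_counts))

# Balanced transitions: both current locations look equally plausible.
balanced = transition_entropy([10, 10], [[10, 10], [10, 10]])
# Skewed transitions: almost every path leads to the first location.
skewed = transition_entropy([10, 10], [[95, 5], [95, 5]])
```

With two locations per set, balanced transition counts give the maximum of one bit, while the skewed history leaves the adversary with little uncertainty, mirroring the toy example of Fig. 1.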
IV-B2 Transition-entropy metric for trajectories
In this subsection, we generalize the transition-entropy metric to trajectories of different lengths. Assume that at time the user makes its ()th query providing the LBS provider with the location set with privacy requirement of . The previous queried location sets are shown by for each with the privacy requirement of and being queried at time . Initially, our aim is to calculate the posterior probability of each location in . The posterior probabilities indicate the likelihood of any location in being the real location of the user based on the previous queries that the user has made. Posterior probability for each location in can be written as
(12)  
(13)  
(14) 
(15) 
Following the same process of moving from equation (12) to equation (14), the probability of can be solved recursively to reach equation (15), where the transition probabilities can be calculated similarly to equation (9). Therefore, evaluating this equation for each node in we can obtain the likelihood of a location being the real location of the queried set . Finally, we borrow the concept of entropy to quantify the uncertainty in the data, calculated as
(16) 
We call , the transition-entropy of the set with respect to the previous queried location sets . As will be demonstrated in the simulation results, the proposed transition-entropy metric indicates the susceptibility of the locations in to be identified as dummies or as the real location of the user based on the previously queried location sets of the trajectory. The algorithm to calculate the transition-entropy metric is formally presented in Algo. 2.
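The recursion over a trajectory can be sketched as a forward pass that starts from the normalized query probabilities of the first queried set and repeatedly applies the transition probabilities. The function names and the row-wise normalization of transition counts are assumptions made for illustration; the paper’s own algorithm may differ in detail.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalize(values):
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)  # assumption: no history
    return [v / total for v in values]

def propagate(belief, transition_counts):
    """One recursion step: push the belief over the previous set forward."""
    post = [0.0] * len(transition_counts[0])
    for b_i, row in zip(belief, transition_counts):
        for j, t_ij in enumerate(normalize(row)):
            post[j] += b_i * t_ij
    return post

def trajectory_transition_entropy(initial_query_counts, counts_per_step):
    """Return (entropy, posterior) for the last queried set of a trajectory.

    initial_query_counts: query counts of the first queried location set.
    counts_per_step: one transition-count matrix per consecutive pair of sets.
    """
    belief = normalize(initial_query_counts)
    for counts in counts_per_step:
        belief = propagate(belief, counts)
    return entropy(belief), belief

# Three queried sets of two locations each (hypothetical counts).
h, belief = trajectory_transition_entropy(
    [1, 1],
    [[[9, 1], [9, 1]],   # first set -> second set
     [[1, 1], [3, 1]]],  # second set -> third set
)
```

Only the first set contributes its query probabilities in this sketch; every later set enters through the transition counts alone.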
Calculation of the transition-entropy is based only on the query probability of the initial location set and the transition probabilities throughout the trajectory. It is important to understand why the query probabilities of the other locations on the trajectory are not considered in the calculation of the transition-entropy metric. This can best be understood through an example. Fig. 4 shows a user requesting an LBS in two consecutive queries. The numbers written on the nodes indicate the normalized query probability of the locations, and the numbers written on the edges indicate the normalized probability of that transition. Assume we want to calculate the transition-entropy metric for based on the previous queried location set . The purpose of the example is to illustrate why the posterior probabilities calculated from the previous queries for are more reliable than the query probabilities of the locations in . First, let us calculate the posterior probabilities of and its entropy. The posterior probabilities according to equation (15) can be written as
(17)  
(18)  
(19)  
(20)  
(21)  
(22) 
According to the query probabilities of the location is more likely to be the real location as it has a significantly higher query probability, but looking at the posterior probabilities calculated for the location set we can see that based on , location is more probable to be the real location of the user. This discrepancy can be explained by considering what the query probability actually means. The query probability indicates the number of times a location has been queried but does not specify whether it was queried after any particular location. Therefore, although location has been queried more times than the other locations in , most of these queries may have been made immediately after locations and which are not members of the location set . Hence, the posterior probabilities are more credible, as they account for the number of queries made after the prior location set .
V Viterbi Attack
The Viterbi algorithm is a well-known dynamic programming algorithm proposed by the authors in [36]. Initially, it was used for decoding convolutional codes, but it has since found numerous applications, such as finding the most likely sequence of hidden states in Hidden Markov Models (HMMs). For a given graph, the aim of the algorithm is to find the shortest path, or the so-called most likely path, usually referred to as the Viterbi path. Several features distinguish the Viterbi algorithm from other existing algorithms for this purpose, the most important being its low computational complexity. Here, we design an attack inspired by the Viterbi algorithm and name it the Viterbi attack. The proposed Viterbi attack can significantly endanger the location privacy of users if it is not considered in the design of dummy generation algorithms. The adversary can exploit the available side information, such as transition probabilities, to compromise the location privacy of users by conducting the Viterbi attack. As will be demonstrated in the simulations, even for a user traveling a short trajectory, the Viterbi attack can successfully identify many of the real locations. In the following, we explain how the Viterbi algorithm can be adapted into a threat to the location privacy of users.
Given the queried location sets , ,…, corresponding to a trajectory of length of a user, an attacker seeks to find the most probable state sequence to compromise the location privacy of the user. Here is referred to as a state of the location set . The desired state sequence of the adversary would be where for , refers to the true location of the queried set . We define to be the maximum probability of a state sequence with the length of given as where and . This function can be expressed mathematically as
(23) 
where for each the initial value of the function is set to
(24) 
in which as the most credible information for the first queried location set is the query probability, is calculated via equation (10). Starting from the second queried location set the most probable path can be calculated recursively as
(25) 
The formal presentation of the Viterbi attack is given in Algo. 3. The algorithm starts by setting the initial values of the array to their normalized query probability in lines . An array called is used to keep track of the most likely state of the previous queried location set as the most probable path is calculated in lines . Finally, the most probable path is chosen and the corresponding states are returned as output.
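A minimal Python sketch of such an attack is given below. It follows the standard Viterbi recursion: initialization with the normalized query probabilities, a maximization over predecessors at each step, and backtracking. The function name `viterbi_attack`, the input layout, and the row-wise normalization of transition counts are illustrative assumptions and not the exact listing of Algo. 3.

```python
def normalize(values):
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)  # assumption: no history
    return [v / total for v in values]

def viterbi_attack(initial_query_counts, counts_per_step):
    """Guess the index of the real location in every queried set.

    initial_query_counts: query counts of the first queried location set.
    counts_per_step[m][i][j]: times location j of set m+1 was queried
    right after location i of set m.
    """
    delta = normalize(initial_query_counts)  # best path probability per state
    backpointers = []
    for counts in counts_per_step:
        rows = [normalize(row) for row in counts]
        new_delta, psi = [], []
        for j in range(len(rows[0])):
            # Most likely predecessor of state j in the previous set.
            best_i = max(range(len(delta)), key=lambda i: delta[i] * rows[i][j])
            psi.append(best_i)
            new_delta.append(delta[best_i] * rows[best_i][j])
        backpointers.append(psi)
        delta = new_delta
    # Backtrack from the most probable final state.
    state = max(range(len(delta)), key=lambda j: delta[j])
    path = [state]
    for psi in reversed(backpointers):
        state = psi[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical trajectory of three queried sets with two locations each.
path = viterbi_attack([5, 5], [[[8, 2], [1, 9]], [[9, 1], [5, 5]]])
```

For long trajectories, the products of probabilities can underflow; summing logarithms instead of multiplying probabilities is the usual remedy.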
VI The Proposed Algorithms to Improve Location Privacy of Users
In this section, we start by proposing two algorithms for improving the transition-entropy metric. The algorithms are independent of the method used for generating the dummies. For the purpose of explanation, the underlying dummy generation algorithm is set to DLS in our work. The first proposed algorithm exhaustively searches for the desired dummy set, and the second follows a greedy approach to selecting the dummies. We then propose an algorithm called robust dummy generation (RDG), which can significantly increase the privacy of users against the Viterbi attack while maintaining high performance in terms of transition-entropy and cell-entropy.
VI-A Exhaustive Search Algorithm
Suppose that at time the user has made its th query for the location set of which includes the real location and its associated dummies. As the user changes its location and makes its th query at time , assuming anonymity for the user, we wish to generate the location set to maximize the transition-entropy metric. The idea is to generate a pool of dummies instead of only fake locations which have similar cell-entropy to the real location of the user, and to choose subsets of the dummy location pool for evaluation of their transition-entropy performance. The formal description of the proposed method for generating the dummies of the set is given in Algorithm 4. The procedure starts by generating a pool of dummies using the DLS algorithm and assigning them to set . Then, distinct subsets of are chosen each with members, which will form a complete set of locations by addition of the real location (). Finally, the transition-entropy of each set is calculated with respect to , and the set with the maximum transition-entropy is returned as the th query set.
The proposed exhaustive search algorithm accounts for the extra side information incorporated in this paper. As will be demonstrated in the simulation results, the algorithm provides significantly better transition-entropy performance than the existing algorithms while keeping the traditional cell-entropy metric near optimal.
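The exhaustive search can be sketched with `itertools.combinations`, as below. The dummy pool is assumed to be given (for example, produced by DLS, which is not reproduced here). The dictionary-based inputs, the function names, and the uniform fallback for unseen transitions are hypothetical choices made for this illustration.

```python
import itertools
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def transition_entropy(prior, counts, candidate_set):
    """Transition-entropy of candidate_set given the previous set.

    prior: dict mapping each previous location to its posterior probability.
    counts: dict mapping (previous, next) pairs to transition counts.
    """
    post = {loc: 0.0 for loc in candidate_set}
    for prev, p_prev in prior.items():
        row = [counts.get((prev, loc), 0) for loc in candidate_set]
        total = sum(row)
        for loc, n in zip(candidate_set, row):
            # Assumption: uniform fallback when no transition was recorded.
            post[loc] += p_prev * (n / total if total else 1.0 / len(row))
    return entropy(post.values())

def exhaustive_search(real, pool, k, prior, counts):
    """Try every (k-1)-subset of the pool and keep the best location set."""
    best_set, best_h = None, -1.0
    for dummies in itertools.combinations(pool, k - 1):
        candidate = (real,) + dummies
        h = transition_entropy(prior, counts, candidate)
        if h > best_h:
            best_set, best_h = candidate, h
    return best_set, best_h

# Hypothetical history: previous set {a, b}, real location r, three dummies.
prior = {"a": 0.5, "b": 0.5}
counts = {("a", "r"): 10, ("b", "r"): 10, ("a", "d1"): 100, ("b", "d1"): 100,
          ("a", "d2"): 10, ("b", "d2"): 10, ("a", "d3"): 1, ("b", "d3"): 1}
best_set, best_h = exhaustive_search("r", ["d1", "d2", "d3"], 2, prior, counts)
```

In this toy history the dummy `d2` is reached from the previous set as often as the real location, so pairing it with `r` leaves the adversary maximally uncertain.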
VI-B Greedy Algorithm
Although the exhaustive search algorithm can significantly improve the location privacy of users, its high computational overhead is a major drawback. The computational cost of the exhaustive search algorithm is in the order of where if no bound is selected for , its value is . Implementing such an algorithm can therefore be time-consuming. To decrease the computational complexity, we propose a greedy approach which can achieve an order of .
Following the same setup as in the exhaustive search algorithm, we aim to generate the set in a way that maximizes the transition-entropy with respect to the previous location set . The principal idea behind the greedy algorithm is to choose the members which maximize the transition-entropy one by one and add them to the instead of looking at all the possible combinations and the transition-entropy they achieve. The algorithm starts by generating a pool of dummies using the DLS algorithm. The DLS algorithm has been chosen for its robust performance in terms of cell-entropy, but the approach is applicable to other dummy generation methods as well. Initially, the location set only includes the real location of the user at th query. The next member is added by trying out all the members in , calculating the transition-entropy of including that member, and choosing the one which maximizes the transition-entropy. Then, we move to the third member, and the same procedure is repeated until all the dummies are chosen. The greedy algorithm is formally presented in Algorithm 5.
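The greedy selection can be sketched as follows. As before, the dummy pool is taken as given, and the dictionary-based inputs, function names, and uniform fallback for unseen transitions are hypothetical choices for illustration rather than the listing of Algorithm 5.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def transition_entropy(prior, counts, candidate_set):
    """prior: dict previous location -> probability;
    counts: dict (previous, next) -> transition count."""
    post = {loc: 0.0 for loc in candidate_set}
    for prev, p_prev in prior.items():
        row = [counts.get((prev, loc), 0) for loc in candidate_set]
        total = sum(row)
        for loc, n in zip(candidate_set, row):
            # Assumption: uniform fallback when no transition was recorded.
            post[loc] += p_prev * (n / total if total else 1.0 / len(row))
    return entropy(post.values())

def greedy_selection(real, pool, k, prior, counts):
    """Grow the location set one dummy at a time."""
    chosen = [real]            # the set initially holds only the real location
    remaining = list(pool)
    while len(chosen) < k and remaining:
        # Pick the dummy that maximizes the transition-entropy of the
        # partially built set.
        best = max(remaining,
                   key=lambda d: transition_entropy(prior, counts, chosen + [d]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical history: previous set {a, b}, real location r, three dummies.
prior = {"a": 0.5, "b": 0.5}
counts = {("a", "r"): 10, ("b", "r"): 10, ("a", "d1"): 100, ("b", "d1"): 100,
          ("a", "d2"): 10, ("b", "d2"): 10, ("a", "d3"): 1, ("b", "d3"): 1}
selected = greedy_selection("r", ["d1", "d2", "d3"], 3, prior, counts)
```

Each step evaluates at most one candidate set per remaining pool member, so the number of entropy evaluations grows with the product of the pool size and the number of dummies rather than with the number of subsets.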
VI-C RDG Algorithm
In this subsection, we propose the RDG algorithm, which aims to increase resilience against the Viterbi attack while keeping the cell-entropy and transition-entropy as high as those of the existing algorithms.
The algorithm is based on the idea of posterior probabilities introduced as part of the derivation of the transition-entropy. We explain the algorithm for generating from the queried location set . If is the initial query of the user to the LBS provider, the initial posterior probabilities are set to the normalized query probabilities of the locations in ; otherwise, the posterior probabilities are calculated from equation (12). In the algorithm, the posterior probabilities are stored in an array called .
The algorithm starts by generating a pool of dummies using the DLS algorithm based on the real location of . Using the DLS algorithm to generate the pool ensures the high performance of the algorithm in terms of cell-entropy. In our experiments, setting the pool size to four times keeps the cell-entropy quite high while yielding robust performance in terms of transition-entropy and Viterbi attack resilience. Next, the algorithm employs a greedy approach to add the most suitable dummies to the location set . To choose the th member of the set , each of the remaining dummies in the pool is checked one by one. The criterion chosen here is to maximize the entropy of the array . For each member , the array is calculated as
(26) 
The weight array is chosen to be a two-dimensional array to distinguish between the weights for different location sets. For each member of the dummy pool, its weight is calculated, followed by the entropy of the weight array. After the entropy has been calculated for all the possible members, the member which results in the maximum entropy is chosen as the next member of . The process continues until all the dummies of are chosen. Note that before the entropy is calculated, the weights are normalized so that the probabilities sum to one. The algorithm has been designed to provide high cell-entropy and transition-entropy privacy for the users while protecting them from the Viterbi attack on trajectories.
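One way to picture the flow of posteriors from one query to the next is the sketch below. It is not the paper's RDG algorithm: the weight used here (previous posterior times a transition probability, summed over the previous set) is an assumption standing in for equation (26), and `initial_posteriors`, `candidate_weights`, `rdg_step`, and `trans` are hypothetical names. The sketch only shows the three ingredients named in the text: initialization from normalized query probabilities, normalization before the entropy is taken, and one-by-one selection by maximum entropy.

```python
import math


def normalize(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    if total <= 0:
        raise ValueError("cannot normalize an all-zero array")
    return [v / total for v in values]


def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def initial_posteriors(location_set, query_prob):
    """First query: posteriors are the normalized query probabilities."""
    return normalize([query_prob[loc] for loc in location_set])


def candidate_weights(prev_set, prev_post, cur_set, trans):
    """Assumed weight of each location in cur_set (stand-in for eq. (26))."""
    return [
        sum(p * trans.get((a, b), 0.0) for a, p in zip(prev_set, prev_post))
        for b in cur_set
    ]


def rdg_step(real, pool, k, prev_set, prev_post, trans):
    """Pick dummies greedily by entropy of normalized weights; return new posteriors."""
    chosen, remaining = [real], list(pool)

    def score(dummy):
        weights = candidate_weights(prev_set, prev_post, chosen + [dummy], trans)
        return entropy(normalize(weights)) if sum(weights) > 0 else 0.0

    while len(chosen) < k and remaining:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    weights = candidate_weights(prev_set, prev_post, chosen, trans)
    posteriors = normalize(weights) if sum(weights) > 0 else [1 / len(chosen)] * len(chosen)
    return chosen, posteriors
```

The posteriors returned by one call would be passed as `prev_post` to the next call, so that each new location set is chosen against everything the adversary could have inferred so far.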
VII Performance Evaluation
VII-A Experiment Setup
In our experiment, we use the data collected by the Geolife project [37, 38, 39], which includes the GPS trajectories of users from April 2007 to August 2012 in Beijing, China. The dataset contains the GPS logs of the users, including trajectories with a total distance of . Two main advantages distinguish the Geolife dataset for our work. First, in addition to capturing the daily routines of the users, such as going to work or home, the recorded data includes trajectories from sports activities like hiking and cycling. Second, many of the recorded trajectories are tagged with a transportation mode, covering various means of travel from bus and car to airplane and train.
We have conducted our experiments on the central part of the Beijing map with a resolution of for each grid cell. The location privacy requirement () of the users is investigated for the values to . For each value of , the algorithms are repeated times to ensure the reliability of the results. Although the proposed algorithms and metric can be used for users whose location privacy requirements vary across consecutive LBS queries, for the sake of comparison we have assumed that the value stays the same in consecutive queries. Additionally, the experiments are performed on a PC with a GHz Intel Core i7 processor, a bit Windows operating system, and GB of RAM. The algorithms are implemented in Python.
VII-B Performance Analysis
In this section, we evaluate the performance of the proposed algorithms and metrics through an extensive set of experiments. The desired outcome is to show that the proposed RDG algorithm can maintain the established cell-entropy metric [12] while improving the proposed transition-entropy metric and providing the users with strong privacy preservation against the developed Viterbi attack. We start by analyzing the performance of the algorithms in terms of cell-entropy, followed by the transition-entropy analysis and an investigation of their performance against the Viterbi attack.
VII-B1 Cell-entropy performance evaluation
To calculate the cell-entropy metric, the adversary records the number of times each cell has been queried over time and uses this information to calculate the query probability of each cell. Once the location set, including the real location and the dummies, is submitted to the server, the adversary can calculate the cell-entropy of the user. A higher cell-entropy indicates more uncertainty in finding the real location or recognizing the dummies. Therefore, maximum cell-entropy is desirable to maintain the anonymity of the users.
Fig. 5 compares the different algorithms in terms of cell-entropy. The optimal value is achieved when the locations queried from the LBS provider all have the same probability of , or equivalently, when the location set has a cell-entropy of . The optimal value is the target for all the algorithms, since it is the maximum entropy that a location set can achieve. In the random scheme [9], the dummies are generated randomly, which, as expected, results in a lower cell-entropy than the other algorithms. As can be seen in the figure, the DLS algorithm achieves near-optimal performance in terms of cell-entropy. Therefore, the adversary is unable to compromise the anonymity of the user from a stationary set of locations submitted to the server using the available query probabilities. The exhaustive, greedy, and RDG algorithms also achieve near-optimal performance, which indicates that for all of these algorithms the adversary is unable to identify the dummy locations by exploiting the cell-entropy. Note that the proposed algorithms are adaptable to any dummy generation algorithm; their high cell-entropy performance here stems from our choice of DLS as the base. Hence, if another base algorithm is chosen, its cell-entropy performance must be evaluated as well to ensure robust performance in terms of cell-entropy.
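The cell-entropy computation described in this subsection can be illustrated with a few lines of Python. The sketch assumes that the adversary's side information is a table of per-cell query counts, that the probabilities are normalized over the submitted location set only, and that entropy is measured in bits; `cell_entropy`, `optimal_cell_entropy`, and `query_counts` are hypothetical names.

```python
import math


def cell_entropy(location_set, query_counts):
    """Entropy (bits) of the normalized query probabilities of the submitted cells."""
    counts = [query_counts[cell] for cell in location_set]
    total = sum(counts)
    probs = [c / total for c in counts]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def optimal_cell_entropy(k):
    """Upper bound, reached when all k submitted cells are equally likely."""
    return math.log2(k)


# Toy query history: grid cell -> number of times it has been queried.
query_counts = {(0, 0): 10, (0, 1): 10, (1, 1): 10, (2, 0): 10, (5, 5): 1}

balanced = [(0, 0), (0, 1), (1, 1), (2, 0)]
skewed = [(0, 0), (0, 1), (1, 1), (5, 5)]
print(cell_entropy(balanced, query_counts), optimal_cell_entropy(4))
print(cell_entropy(skewed, query_counts))
```

The balanced set reaches the bound because its cells are queried equally often, while the skewed set falls below it: the rarely queried cell stands out as an unlikely real location.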
VII-B2 Transition-entropy performance evaluation
The currently established cell-entropy metric only considers location privacy for a stationary set of queried locations submitted to the LBS server, and overlooks the fact that the adversary also has access to the trajectories traveled by the users. The adversary can use the likelihood of traveling different paths between consecutive location sets to infer with high probability that many of the submitted locations are dummies, which leads to a failure to meet the location privacy requirements of the users. Fig. 6 compares the performance of different algorithms in terms of transition-entropy for a path length of two. For all the algorithms, based on the value of , two consecutive location sets are generated, each including the real location and its associated dummies. To make the experiments as realistic as possible, the real location movements are chosen randomly from the recorded trajectories in the dataset.
The optimal value in Fig. 6 corresponds to a scenario in which all the members of the second location set are equally likely to be queried after the members of the first location set. The optimal values can be calculated in the same way as the optimal cell-entropy for different values of . As can be seen from the figure, the random scheme performs very poorly, which means that the adversary can easily recognize most of the dummies from the transition-entropy even with only two consecutive location sets queried by the user. The first point to notice is that although the DLS algorithm achieves near-optimal performance in terms of cell-entropy, its transition-entropy performance indicates that the adversary can compromise the location privacy of the users by calculating the posterior probabilities. The algorithms proposed in this paper significantly improve the transition-entropy, approximately doubling it relative to the DLS algorithm. In other words, the proposed algorithms reduce the likelihood of compromising the anonymity requirement, which leads to higher location privacy for the users of LBSs. The exhaustive algorithm performs slightly worse than RDG and greedy. This lower performance is due to the upper bound on the number of sets evaluated for transition-entropy, which is imposed because evaluating all of them becomes computationally expensive when the dummy pool is large. RDG achieves a significantly higher transition-entropy than the other algorithms.
Fig. 7 extends our analysis of transition-entropy to longer trajectories. The crucial inference from the graph is that as more locations are queried from the LBS provider, the transition-entropy decreases. This simulation result agrees with the theoretical analysis: with more information, the adversary can calculate the posterior probabilities more accurately, which leaves the adversary with less uncertainty in identifying the real location of the users. The existing DLS algorithm has a very low transition-entropy compared to the proposed greedy and RDG algorithms. Therefore, our proposed algorithms are effective in increasing the transition-entropy while keeping the cell-entropy near optimal. Note that the greedy and RDG algorithms increase the transition-entropy regardless of which underlying algorithm generates the pool of dummies. Therefore, a better base algorithm than DLS could improve performance further, just as the greedy and RDG algorithms increased the transition-entropy of DLS.
VII-C Performance of Algorithms Against Viterbi Attack
In this subsection, we analyze the performance of our proposed algorithms against the designed Viterbi attack. The performance analysis is given in Fig. 8. Considering the extensive side information incorporated in this paper, the Viterbi attack poses a significant privacy threat to the users of LBSs. Looking at the percentage of real locations protected under the Viterbi attack on the DLS algorithm, it can be seen that, for instance, in a trajectory of length the adversary is able to identify almost all the real locations of the users. This shows that although the locations in a single LBS request are protected by existing dummy generation algorithms, over trajectories the side information available to the adversary allows a compromised LBS provider to identify almost all the real locations. The apparent trend for all path lengths is that increasing the number of dummies improves the preservation of location privacy, but this increase is not sufficient even for trajectories of length two.
The second algorithm considered in Fig. 8 is the greedy algorithm proposed in our work to increase the transition-entropy of dummy generation algorithms. Although the algorithm prevents the inference of real locations based on transition-entropy, it is not capable of providing location privacy against the Viterbi attack conducted by the adversary. The performance of the greedy algorithm against the Viterbi attack worsens as more queries are made to the LBS, since the adversary obtains more accurate information from the query history. On the other hand, RDG tends to confuse the adversary more with each query sent to the LBS provider, so for longer trajectories the difference between the real path and the path estimated by the Viterbi attack becomes larger. The RDG algorithm is able to protect at least percent of the user’s queried locations if the anonymity criterion is set to or larger.
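For readers who want a concrete picture of the attack evaluated here, the following sketch runs a textbook Viterbi decoder over a sequence of submitted location sets and scores how many real locations escape identification. It is an illustration only, not the paper's attack model: `viterbi_path`, `protected_fraction`, `init_prob`, `trans`, and the probability floor are hypothetical, and the sketch assumes the locations within each set are distinct.

```python
import math


def viterbi_path(location_sets, init_prob, trans, floor=1e-12):
    """Most likely path taking one location from each consecutive location set."""
    def log_p(value):
        # Floor avoids log(0) for transitions the adversary has never observed.
        return math.log(max(value, floor))

    scores = {loc: log_p(init_prob.get(loc, 0.0)) for loc in location_sets[0]}
    back_pointers = []
    for cur_set in location_sets[1:]:
        new_scores, pointers = {}, {}
        for b in cur_set:
            best_a = max(scores, key=lambda a: scores[a] + log_p(trans.get((a, b), 0.0)))
            new_scores[b] = scores[best_a] + log_p(trans.get((best_a, b), 0.0))
            pointers[b] = best_a
        back_pointers.append(pointers)
        scores = new_scores

    # Backtrack from the best final location to recover the whole path.
    path = [max(scores, key=scores.get)]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def protected_fraction(real_path, estimated_path):
    """Share of queries whose real location the attack failed to identify."""
    misses = sum(r != e for r, e in zip(real_path, estimated_path))
    return misses / len(real_path)
```

A protected fraction of zero means the decoder recovered every real location, which corresponds to the worst case reported for trajectories protected only by a stationary dummy generation scheme.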
VIII Conclusions
In this work, we incorporated new side information which can be exploited by the adversary to compromise the location privacy of the users. We proposed a metric called transition-entropy to evaluate the performance of dummy-based algorithms and quantified the existing cell-entropy metric. The new metric is based on the transitions between locations on the map and captures the detrimental effect of the new side information on the location privacy of the users. To improve the transition-entropy, two general approaches were proposed that increase it for a given dummy generation algorithm. Furthermore, we developed an attack model on the location privacy of the users based on the Viterbi algorithm, and then proposed an algorithm called RDG that protects the users against the Viterbi attack while performing well in terms of cell-entropy and transition-entropy. Finally, numerous experiments were performed on real-world data to analyze the performance of the algorithms.
References
 [1] “Location-based services (lbs) and real time location systems (rtls) market by location (indoor and outdoor), technology (context aware, uwb, bt/ble, beacons, agps), software, hardware, service and application area - global forecast to 2021.” [Online]. Available: https://www.marketsandmarkets.com/MarketReports/locationbasedservicemarket96994431.html
 [2] A. R. Beresford and F. Stajano, “Location privacy in pervasive computing,” IEEE Pervasive computing, vol. 2, no. 1, pp. 46–55, 2003.
 [3] T. Jiang, H. J. Wang, and Y.-C. Hu, “Preserving location privacy in wireless lans,” in Proceedings of the 5th international conference on Mobile systems, applications and services. ACM, 2007, pp. 246–257.
 [4] C.-Y. Chow, M. F. Mokbel, and X. Liu, “A peer-to-peer spatial cloaking algorithm for anonymous location-based service,” in Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems. ACM, 2006, pp. 171–178.
 [5] M. L. Yiu, C. S. Jensen, J. Møller, and H. Lu, “Design and analysis of a ranking approach to private location-based services,” ACM Transactions on Database Systems (TODS), vol. 36, no. 2, p. 10, 2011.
 [6] R. Schlegel, C.-Y. Chow, Q. Huang, and D. S. Wong, “User-defined privacy grid system for continuous location-based services,” IEEE Transactions on Mobile Computing, vol. 14, no. 10, pp. 2158–2172, 2015.
 [7] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557–570, 2002.
 [8] M. Gruteser and D. Grunwald, “Anonymous usage of location-based services through spatial and temporal cloaking,” in Proceedings of the 1st international conference on Mobile systems, applications and services. ACM, 2003, pp. 31–42.
 [9] H. Kido, Y. Yanagisawa, and T. Satoh, “An anonymous communication technique using dummies for location-based services,” in Pervasive Services, 2005. ICPS’05. Proceedings. International Conference on. IEEE, 2005, pp. 88–97.
 [10] B. Niu, Z. Zhang, X. Li, and H. Li, “Privacy-area aware dummy generation algorithms for location-based services,” in Communications (ICC), 2014 IEEE International Conference on. IEEE, 2014, pp. 957–962.
 [11] H. Lu, C. S. Jensen, and M. L. Yiu, “Pad: privacy-area aware, dummy-based location privacy in mobile services,” in Proceedings of the Seventh ACM International Workshop on Data Engineering for Wireless and Mobile Access. ACM, 2008, pp. 16–23.
 [12] B. Niu, Q. Li, X. Zhu, G. Cao, and H. Li, “Achieving k-anonymity in privacy-aware location-based services,” in INFOCOM, 2014 Proceedings IEEE. IEEE, 2014, pp. 754–762.
 [13] A. Serjantov and G. Danezis, “Towards an information theoretic metric for anonymity,” in International Workshop on Privacy Enhancing Technologies. Springer, 2002, pp. 41–53.
 [14] A. Pfitzmann and M. Köhntopp, “Anonymity, unobservability, and pseudonymity - a proposal for terminology,” in Designing privacy enhancing technologies. Springer, 2001, pp. 1–9.
 [15] P. Samarati and L. Sweeney, “Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression,” Technical report, SRI International, Tech. Rep., 1998.
 [16] M. Gruteser and D. Grunwald, “Anonymous usage of location-based services through spatial and temporal cloaking,” in Proceedings of the 1st international conference on Mobile systems, applications and services. ACM, 2003, pp. 31–42.
 [17] C.-Y. Chow and M. F. Mokbel, “Enabling private continuous queries for revealed user locations,” in International Symposium on Spatial and Temporal Databases. Springer, 2007, pp. 258–275.
 [18] T. Xu and Y. Cai, “Exploring historical location data for anonymity preservation in location-based services,” in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE. IEEE, 2008, pp. 547–555.
 [19] X. Pan, J. Xu, and X. Meng, “Protecting location privacy against location-dependent attacks in mobile services,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 8, pp. 1506–1519, 2012.
 [20] J. Xu, X. Tang, H. Hu, and J. Du, “Privacy-conscious location-based queries in mobile environments,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 3, pp. 313–326, 2010.
 [21] A. R. Beresford and F. Stajano, “Location privacy in pervasive computing,” IEEE Pervasive Computing, vol. 2, no. 1, pp. 46–55, Jan 2003.
 [22] A. R. Beresford and F. Stajano, “Mix zones: User privacy in location-aware services,” in Pervasive Computing and Communications Workshops, 2004. Proceedings of the Second IEEE Annual Conference on. IEEE, 2004, pp. 127–131.
 [23] B. Palanisamy and L. Liu, “Mobimix: Protecting location privacy with mix-zones over road networks,” in Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE, 2011, pp. 494–505.
 [24] B. Palanisamy and L. Liu, “Attack-resilient mix-zones over road networks: architecture and algorithms,” IEEE Transactions on Mobile Computing, vol. 14, no. 3, pp. 495–508, 2015.
 [25] R. Lu, X. Lin, T. H. Luan, X. Liang, and X. Shen, “Pseudonym changing at social spots: An effective strategy for location privacy in vanets,” IEEE Transactions on Vehicular Technology, vol. 61, no. 1, pp. 86–96, 2012.
 [26] S. Gao, J. Ma, W. Shi, G. Zhan, and C. Sun, “Trpf: A trajectory privacy-preserving framework for participatory sensing,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 6, pp. 874–887, 2013.
 [27] J. Freudiger, M. Raya, M. Félegyházi, P. Papadimitratos, and J.-P. Hubaux, “Mix-zones for location privacy in vehicular networks,” in ACM Workshop on Wireless Networking for Intelligent Transportation Systems (WiNITS), no. LCACONF2007016, 2007.
 [28] T. Kölsch, L. Fritsch, M. Kohlweiss, and D. Kesdogan, “Privacy for profitable location based services,” in International Conference on Security in Pervasive Computing. Springer, 2005, pp. 164–178.
 [29] T. Rodden, A. Friday, H. Muller, A. Dix et al., “A lightweight approach to managing privacy in location-based services,” 2002.
 [30] R. Lu, X. Lin, T. H. Luan, X. Liang, and X. Shen, “Pseudonym changing at social spots: An effective strategy for location privacy in vanets,” IEEE Transactions on Vehicular Technology, vol. 61, no. 1, pp. 86–96, 2012.
 [31] M. Wernke, P. Skvortsov, F. Dürr, and K. Rothermel, “A classification of location privacy attacks and approaches,” Personal and ubiquitous computing, vol. 18, no. 1, pp. 163–175, 2014.
 [32] H. J. Do, Y.-S. Jeong, H.-J. Choi, and K. Kim, “Another dummy generation technique in location-based services,” in Big Data and Smart Computing (BigComp), 2016 International Conference on. IEEE, 2016, pp. 532–538.
 [33] T. Hara, A. Suzuki, M. Iwata, Y. Arase, and X. Xie, “Dummy-based user location anonymization under real-world constraints,” IEEE Access, vol. 4, pp. 673–687, 2016.
 [34] R. Cheng, Y. Zhang, E. Bertino, and S. Prabhakar, “Preserving user location privacy in mobile data management infrastructures,” in International Workshop on Privacy Enhancing Technologies. Springer, 2006, pp. 393–412.
 [35] A. Franken, “Text - s.1223 - 112th congress (2011-2012): Location privacy protection act of 2012,” Dec 2012. [Online]. Available: https://www.congress.gov/bill/112thcongress/senatebill/1223/text
 [36] G. D. Forney, “The viterbi algorithm,” Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
 [37] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma, “Mining interesting locations and travel sequences from gps trajectories,” in Proceedings of the 18th international conference on World wide web. ACM, 2009, pp. 791–800.
 [38] Y. Zheng, Q. Li, Y. Chen, X. Xie, and W.-Y. Ma, “Understanding mobility based on gps data,” in Proceedings of the 10th international conference on Ubiquitous computing. ACM, 2008, pp. 312–321.
 [39] Y. Zheng, X. Xie, and W.-Y. Ma, “Geolife: A collaborative social networking service among user, location and trajectory.” IEEE Data Eng. Bull., vol. 33, no. 2, pp. 32–39, 2010.