Privacy Preservation in Location-Based Services: A Novel Metric and Attack Model

Sina Shaham, Ming Ding, Bo Liu, Zihuai Lin, Jun Li
School of Electrical and Information Engineering, The University of Sydney, Australia
Department of Engineering, La Trobe University, Australia
Email: {sina.shaham, zihuai.lin}@sydney.edu.au, ming.ding@data61.csiro.au, b.liu2@latrobe.edu.au, jun.li@njust.edu.cn
Abstract

Recent years have seen a rising need for location-based services in everyday life. Despite their many advantages, these services have raised serious concerns about the location privacy of users. An adversary such as an untrusted location-based server can monitor the locations queried by a user to infer critical information such as the user’s home address, health conditions, and shopping habits. To address this issue, dummy-based algorithms have been developed to increase the anonymity of users and thus protect their privacy. Unfortunately, the existing algorithms consider only a limited amount of side information known to the adversary, and may therefore face more serious challenges in practice. In this paper, we incorporate a new type of side information based on consecutive location changes of users and propose a new metric called transition-entropy to investigate location privacy preservation, followed by two algorithms that improve the transition-entropy of a given dummy generation algorithm. Then, we develop an attack model based on the Viterbi algorithm which can significantly threaten the location privacy of users. Next, to protect users from the Viterbi attack, we propose an algorithm called robust dummy generation (RDG), which resists the Viterbi attack while maintaining high performance in terms of the privacy metrics introduced in the paper. All the algorithms are applied and analyzed on a real-life dataset.

I Introduction

With the ubiquitous use of smartphones and social networks, location-based services (LBSs) have become an essential part of contemporary society. Users of smart devices can simply download location-based applications and query information from the LBS provider. For example, LBSs offered by companies like Alibaba, Apple, and Google can be used to find nearby restaurants, track parcels, and provide personalized weather notifications. The annual market for LBSs is expected to reach USD Billion by , with an annual growth rate of [1].

Despite the countless advantages of LBSs, the privacy issues associated with user locations have raised many concerns. An untrusted server can collect the location data of users and analyze it to learn sensitive information such as the types of queries submitted, the shopping habits of users, and the addresses of users’ properties or workplaces. Such information can easily be abused by the server or disclosed to other parties. Therefore, it is of great importance to devise new ways to preserve the location privacy of users, defined as “the ability to prevent other parties from learning one’s current or past locations” [2].

The techniques to address the threats to the location privacy of users have attracted much attention among researchers [3, 4, 5, 6, 2]. Most of the literature is based on an approach called $k$-anonymity [7]. Under this criterion, the release of a location is said to provide $k$-anonymity if the real location of any user is not distinguishable from at least $k-1$ other locations. Initially, the location of the user was hidden with the help of a trusted anonymization server [8]; later, owing to shortcomings such as the anonymizer itself becoming a bottleneck, the use of dummy locations to achieve $k$-anonymity was proposed in [9]. Since then, researchers have strived to develop dummy generation algorithms that preserve $k$-anonymity for users.

The principal idea behind dummy generation algorithms is to generate dummy locations alongside the real location of the user and submit them all together to the LBS server when requesting a service. This makes it difficult for an untrusted LBS provider, the so-called adversary, to identify the real location of the user. The groundwork in this field was laid by the authors of [9], who generated the dummies randomly throughout the map and updated them as users move. Following this work, the authors of [10] and [11] proposed choosing the candidate dummies from a virtual circle or grid constructed around the current location of the user. Unfortunately, all of these works overlooked the fact that the adversary might have side information which can rule out the dummies or reveal the real location of the user.

One important piece of side information that the adversary can exploit is the query probability of the locations across the map. The adversary can use recorded data to infer the number of times that users have queried each location on the map. Using this information, the adversary can calculate the query probability of each location and then identify dummy locations according to the history of interest in those locations. For instance, if a dummy is placed on a lake, where the query probability is close to zero, the adversary will know with high likelihood that the queried location is a dummy. Such naive selection of dummy locations therefore compromises the location privacy of the user. To solve this issue, an enhanced algorithm, referred to as the dummy-location selection (DLS) algorithm, was proposed in [12]. In essence, the authors used an entropy metric [13] to evaluate the queries submitted in different locations and generated the dummies so as to maximize the entropy.

Although the DLS algorithm is promising for a stationary set of queried locations comprising the real location and its associated dummies, it fails to address the privacy issues caused by consecutive queries made to the LBS provider. In more detail, the authors limited the side information to the queries submitted in different locations but overlooked the fact that the adversary also has access to trajectories and, consequently, to the number of times the paths between locations have been traveled. With such extra side information, the adversary can expose the dummies and compromise the $k$-anonymity of the users. A toy example is provided in Fig. 1,

Fig. 1: An example of location privacy of the user being compromised by considering the introduced side information.

where we show a user moving from location to location with $k$ set to two. The associated dummies of the real locations and are denoted by and , respectively. The dummies in each location set are generated using the DLS algorithm; hence, they have a similar probability of being selected. The numbers on the directed edges indicate the number of times that users have queried the end location of the edge right after querying the starting point of the edge. For instance, the users have queried location for times immediately after location . According to the DLS algorithm, the $k$-anonymity requirement has been preserved for each location set. However, let us look at the four paths connecting the two sets of locations and consider the number of times that each path has been traveled. It can be seen from Fig. 1 that location has been queried for times after locations and whereas location has only received times of inquiries. Therefore, the adversary can infer with high likelihood that the real location is location , and thus compromise the location privacy of the user.

The main contributions of this paper follow.

  • We formalize the existing metric, which we name cell-entropy, and propose a new metric called transition-entropy for two consecutive queries, which accounts for the introduced side information based on the transitions of users.

  • We extend the transition-entropy metric to trajectories and develop two algorithms which can be applied to any existing dummy generation method to improve the transition-entropy.

  • We propose an attack model based on the Viterbi algorithm and develop an algorithm that improves resilience against the attack while maintaining high performance in terms of cell-entropy and transition-entropy.

  • We analyze the performance of the proposed metrics and algorithms on a real-life dataset.

The rest of the paper is organized as follows. We review the existing literature in Section II. Section III describes the system model used throughout the paper, including the system architecture, the adversary model, and the side information used by the adversary. In Section IV, we introduce our proposed metrics, followed by the proposed attack model in Section V. Next, the proposed algorithms are presented in Section VI. Finally, the analysis of the proposed metrics and algorithms is provided in Section VII, and we conclude our work in Section VIII.

II Related Works

Anonymity is defined as “the state of being not identifiable within a set of subjects, the anonymity set” [14]. Also, the location of a user is said to be $k$-anonymous if it is not distinguishable from at least $k-1$ other user locations [15]. Several approaches have been proposed to obtain $k$-anonymity for users, among which we identify four broad categories: location cloaking, mix zones, pseudonyms, and dummy-aided algorithms.

The research on location cloaking was initiated by Gruteser and Grunwald [16]. The key idea is to employ a trusted server to help users preserve their $k$-anonymity. Upon receiving a query from a user, the location anonymizer server computes a cloaking box containing the location of the user and $k-1$ other user locations, and queries the requested service from the LBS provider for all the locations, making it difficult for the LBS provider to identify the user [17, 18]. Several algorithms have been proposed to implement the location cloaking scheme, such as ICliqueCloak [19] and MaxAccuCloak [20]. The main drawback of location cloaking is the need for a location anonymizer, which adds cost overhead to the system. The location anonymizer can itself become a bottleneck in terms of both privacy and computational complexity.

The authors in [21] proposed the idea of mix zones. A mix zone is defined as a spatial zone in which the identity of users is not identifiable. All users entering a mix zone change their pseudonym to a new, unused pseudonym, making it difficult for the adversary to identify them. The anonymization process is performed by a middleware mechanism before the data is transferred to third-party applications. The authors further extended their work in [22] by considering irregular shapes for mix zones. Moreover, the use of mix zones has attracted particular attention in vehicular communications. Applying the mix zone method to road networks is considered in [23, 24], where a mix zone construction method called MobiMix is proposed. Lu et al. [25] exploited pseudonym changes to form mix zones at social spots, and Gao et al. [26] applied the mix zone approach to trajectories for mobile crowd sensing applications. Furthermore, the use of cryptography for the generation of mix zones in vehicular communications is considered in [27]. As with location cloaking, the main drawback of mix zones is the need for a middleware mechanism or a trusted party before the data is transferred to an untrusted LBS provider.

Another technique to increase the location privacy of users is based on the assignment of pseudonyms to hide their identity. The identity of a user can be the name of the person, a unique identifier such as an IP address, or any property that can be related to the user. The authors in [28] proposed a so-called intermediary scenario in which a trusted intermediary collects the location information of the users, such as GPS data, and assigns a pseudonym before sending it to a third-party LBS provider which is considered to be untrusted. The paper claims that the use of pseudonyms prevents the third-party LBS provider from identifying and tracking the users. The work in [29] suggests that, instead of delegating the generation of pseudonyms to the location intermediary, users generate the pseudonyms themselves. The use of pseudonyms for preserving location privacy has also been considered in vehicular communication systems, such as the work in [30]. There are several drawbacks associated with this approach. First, many location-based applications require users to subscribe in order to use the offered services. Second, similar to the previous two categories, this approach also requires a trusted intermediary. Finally, by analyzing the patterns in location data, an adversary can compromise the identity of the users [31].

The last category is the use of dummy locations, which is considered more promising because, unlike location cloaking, mix zones, and pseudonyms, it needs no trusted anonymizer. This approach was initially proposed in [9]. The principal idea is to achieve $k$-anonymity by sending $k-1$ dummy locations alongside the real location of the user when requesting an LBS from an untrusted LBS provider. All the locations use the same identifier corresponding to the user. With dummy locations present, it becomes difficult for the adversary to identify the real location of the user. Several algorithms have been proposed to help users generate the dummies. The authors in [10] proposed using a virtual circle or a virtual grid based on the real location of the user to generate the dummies. The idea was further developed in [11]. More recently, an algorithm called dummy-location selection (DLS) was proposed in [12]. The algorithm takes the number of queries made across the map into consideration and shows via simulations that previous algorithms are vulnerable if the adversary exploits this side information. Although the algorithm provides a useful framework for the generation of dummies, it does not take into account that users are in danger of losing their location privacy if the adversary tracks them and gains access to other side information, such as the number of transitions made on the map. Do et al. [32] utilized conditional probabilities to generate realistic false locations, and Hara et al. [33] proposed a method based on the physical constraints of the real environment.

III System Model

III-A System Architecture

In this paper, we adopt a non-cooperative system architecture [34], as shown in Fig. 2. In this architecture, the LBS users are directly in contact with the LBS provider, with no middleman or third-party service provider.

Fig. 2: Non-cooperative system architecture for LBSs.

Assume that the location map is divided into a grid of cells and that a user communicates with an LBS server for a service. At time $t_i$, the user intends to make his/her $i$-th query from the service provider while preserving $k_i$-anonymity. Here, $k_i$ quantifies the privacy protection requirement of the user: the adversary should not be able to identify the real location of the user with a probability higher than $1/k_i$. Hence, the user needs to transmit $k_i - 1$ dummy locations to hide the true location from the observer. Note that by the term location we refer to the cell in which the user is located. We denote the set of locations transmitted to the LBS provider at the $i$-th query by

$$L_i = \{\, l_i^1, l_i^2, \dots, l_i^{k_i} \,\}. \tag{1}$$

Also, the real location is shown by $l_i^r$, where $r \in \{1, \dots, k_i\}$. The probability of location $l_i^j$ being the real location is shown by

$$\Pr(l_i^j), \quad j = 1, \dots, k_i. \tag{2}$$

In the next query, the user requires $k_{i+1}$-anonymity and queries the location set $L_{i+1}$ from the LBS provider. The probability of $l_{i+1}^m$ being queried consecutively after $l_i^j$ is denoted by

$$\Pr(l_{i+1}^m \mid l_i^j). \tag{3}$$

III-B Adversary Model

Two types of adversary are considered in our work: an active adversary and a passive adversary. The passive adversary can listen to the communication between the users and the LBS provider. By analyzing the collected information, the passive adversary can compromise the location privacy of the users through an eavesdropping attack. An active adversary, on the other hand, compromises the LBS provider and has access to the information stored on the server. In our work, the active adversary is assumed to be the LBS provider itself.

III-C Side Information

The adversary is assumed to possess the location map of the area where the users are distributed. The adversary has access to the queries made by the users and can record them over time to obtain the history of the locations from which users have queried. Moreover, the adversary can calculate the query probability of each location on the map, which is defined as the number of times a particular location has been queried relative to the total number of queries on the map. The adversary can exploit the query probability to infer the probability of a location being genuine or fake in future queries. For instance, if a user queries two locations, one with a comparatively higher query probability, it is more likely that the real location is the one with the higher probability.

Apart from this traditional side information, we assume that the adversary has access to the number of times each path has been traveled on the map. In some jurisdictions, such as the US, the authorities impose no time limit on storing the location information of users [35]. This lack of legislation enables the adversary to monitor the users and gain access to the trajectories they travel. Therefore, the adversary not only has data on the number of queries made at each location, but also knows the number of times that a location has been queried consecutively after each other location.
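Both kinds of side information described in this subsection can be gathered by simple counting over logged trajectories. The following is a minimal sketch, assuming each trajectory is an ordered list of queried cell identifiers; the function name `collect_side_information` and the cell labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def collect_side_information(trajectories):
    """Count per-cell queries and consecutive-query transitions.

    `trajectories` is assumed to be an iterable of ordered lists of
    cell identifiers, one list per observed user trajectory.
    """
    query_counts = Counter()
    transition_counts = Counter()
    for trajectory in trajectories:
        query_counts.update(trajectory)
        # Pair each query with the one that immediately follows it.
        transition_counts.update(zip(trajectory, trajectory[1:]))
    total = sum(query_counts.values())
    # Query probability: share of all recorded queries that fall in a cell.
    query_probs = {cell: n / total for cell, n in query_counts.items()}
    return query_probs, transition_counts


# Hypothetical log of three short trajectories over cells "a", "b", "c".
logs = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
probs, transitions = collect_side_information(logs)
print(probs["b"], transitions[("a", "b")])
```

The transition counter is keyed by ordered pairs, so a pair that was never observed simply reads as zero.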

IV Performance Metrics of Privacy

In this section, we briefly explain a metric which was partially developed in [12]. Then, we propose a metric called transition-entropy to analyze privacy preservation in LBSs for two consecutive queries, and extend the metric to trajectories.

IV-A Cell-entropy Metric

Although not presented as a metric in [12], cell-entropy was implicitly proposed as part of the DLS algorithm. We name this metric cell-entropy to distinguish it from the transition-entropy metric proposed in this paper. For a given location set which includes the real location of a user and the dummies chosen to preserve $k$-anonymity, the set of query probabilities is shown by where is the query probability of location (cell) for . The query probability of cell is calculated by

(4)

The cell-entropy borrows the concept of entropy from information theory to quantify the uncertainty in query probability of the locations in . The cell-entropy metric for location set can be defined as [12]

(5)
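As an illustration of the cell-entropy idea, the sketch below computes the Shannon entropy of the query probabilities of one location set. It assumes that the probabilities are renormalized within the set before the entropy is taken, as is common for the DLS-style metric, and that the logarithm is base 2; the function name `cell_entropy` and the sample values are hypothetical.

```python
import math


def cell_entropy(query_probs):
    """Shannon entropy (in bits) of the query probabilities of one location set.

    Assumption: the probabilities are renormalized within the set, so the
    result lies between 0 and log2(k) for a set of k locations.
    """
    total = sum(query_probs)
    if total <= 0:
        raise ValueError("at least one location needs a positive query probability")
    normalized = [q / total for q in query_probs]
    return -sum(p * math.log2(p) for p in normalized if p > 0)


# Four equally popular cells: the adversary learns nothing from popularity.
print(cell_entropy([0.02, 0.02, 0.02, 0.02]))
# One dummy placed in a rarely queried cell stands out and lowers the entropy.
print(cell_entropy([0.02, 0.02, 0.02, 0.0001]))
```

Under these assumptions, a set whose members have equal query probability attains the maximum value, which is why entropy-maximizing dummy selection favors cells as popular as the real one.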

IV-B Transition-entropy

The main purpose of the metric we propose here is to provide a benchmark for comparing dummy-based algorithms under the more comprehensive side information considered in this paper. The metric indicates the susceptibility of the existing algorithms to attacks on the location privacy of users, as the $k$-anonymity requirement of the users can easily be compromised along trajectories. This motivates the development of new algorithms for preserving the location privacy of users. We start by illustrating the metric for two consecutive queries and then generalize it to trajectories.

IV-B1 Transition-entropy metric for two consecutive queries

Assume that at time $t_i$ a user makes its $i$-th query with an anonymity constraint of $k_i$ and requests the service for the location set $L_i$. The set includes $k_i - 1$ dummies and the real location of the user. Then, at time $t_{i+1}$, the user moves to a new location with an anonymity constraint of $k_{i+1}$ and makes its $(i+1)$-th query, providing the server with the location set $L_{i+1}$, which consists of the real location of the user and the associated dummies. The dummies can be generated using any of the existing algorithms.

Using the two consecutive location sets $L_i$ and $L_{i+1}$, we generate the bipartite graph shown in Fig. 3, where each set forms the vertices on one side of the graph.

Fig. 3: The bipartite graph generated by the consecutive queries of a user.

We denote the number of times the location follows the location by and assign it to the directed edge connecting to . Also, for every location , we denote query probability of the location by . The query probability of a cell is calculated by dividing the number of times that cell has been called over the whole number of queries of the map. This data is calculated from the history of data LBS provider holds.

We would like to find out how probable it is for each member of the location set to be the real location of the user () given the location set in the previous query from the LBS provider. In other words, the aim is to calculate the posterior probability of the members in with respect to . This probability for each member of can be calculated based on the as

(6)
(7)
(8)

where the equation (7) is the joint probability of being the real location of and moving to the location after . The former probability in equation (8) can be calculated as

(9)

and the latter probability which indicates the normalized query probability as

(10)

Note that equation (10) indicates that the posterior probability of the cells in are set to the normalized query probability of the locations. Calculating equation (8) for every member of the location set , the posterior probabilities of the locations in are derived based on the . Having these probabilities, we exploit the concept of entropy to infer the uncertainty in identifying the dummies or the real location of the users calculated by

(11)
1 Input: The location sets and .
2 Output: The transition-entropy of with respect to .
3 Initialization: , .
4 for  do
5      
6       for  do
7            
8       end for
9      for  do
10            
11       end for
12      
13 end for
14for  do
15      
16 end for
17for  do
18      
19 end for
20for  do
21      
22       for  do
23            
24            
25       end for
26      
27      
28 end for
return
Algorithm 1 Calculation of transition-entropy for the location set with respect to .
1 Input: The location sets .
2 Output: The TransitionEntropy of with respect to .
3 Initialization: TransitionEntropy .
4 Run Algo. 1 for and
5 for  do
6       Normalize posterior probabilities of
7       Query probabilities of posterior probabilities of
8       Run Algo. 1 for and
9      
10 end for
11Normalize posterior probabilities of
12 TransitionEntropy calculate Entropy
return TransitionEntropy
Algorithm 2 Calculation of transition-entropy for trajectories of length .

We call , the transition-entropy of the location set with respect to . The transition-entropy metric represents the uncertainty of the adversary in identifying the real location across consecutive queries to the LBS provider. A larger transition-entropy indicates that, for each member of , the probability of the paths originating from the to that member is similar to that of the other members of . Hence, it is more difficult for the adversary to compromise the $k$-anonymity of the users based on the transitions made from their previous query. The formal algorithm for calculating the transition-entropy of the location set with respect to is presented in Algorithm 1. The main advantages of the metric are: (i) it considers the performance of dummy-based algorithms along trajectories and not just for a stationary set of locations; (ii) it can evaluate dummy-based algorithms for users whose $k$-anonymity requirements vary along their trajectory; (iii) it implicitly captures many other factors considered in other works, such as time reachability and direction similarity.
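The two-query computation described in this subsection can be sketched as follows. This is an illustration under stated assumptions rather than the paper's Algorithm 1: the prior over the previous set is its normalized query probability, each row of transition counts is normalized over the members of the next set only, and the resulting posterior is renormalized before the entropy is taken. The function name `transition_entropy` and the sample numbers are hypothetical.

```python
import math


def transition_entropy(prev_query_probs, transition_counts):
    """Entropy (bits) of the posterior over the next location set.

    prev_query_probs[j]     : query probability of the j-th previous location
    transition_counts[j][m] : times location m of the next set was queried
                              right after location j of the previous set
    Assumption: each row of counts is normalized over the next set only.
    """
    prior_total = sum(prev_query_probs)
    prior = [q / prior_total for q in prev_query_probs]
    posterior = [0.0] * len(transition_counts[0])
    for p_j, row in zip(prior, transition_counts):
        row_total = sum(row)
        if row_total == 0:
            continue  # no recorded transition out of this location
        for m, count in enumerate(row):
            # Joint probability of "was at j" and "moved from j to m".
            posterior[m] += p_j * count / row_total
    norm = sum(posterior)
    if norm == 0:
        raise ValueError("no transitions recorded between the two sets")
    posterior = [p / norm for p in posterior]
    entropy = -sum(p * math.log2(p) for p in posterior if p > 0)
    return entropy, posterior


# Balanced transitions keep both next locations equally plausible.
print(transition_entropy([0.5, 0.5], [[10, 10], [10, 10]]))
# Skewed transitions expose the first next location as the likely real one.
print(transition_entropy([0.5, 0.5], [[100, 1], [100, 1]]))
```

The second call mirrors the situation of the toy example in the introduction: both sets look anonymous on their own, yet the transition counts make one destination far more plausible than the other.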

IV-B2 Transition-entropy metric for trajectories

In this subsection, we generalize the transition metric for trajectories with different lengths. Assume that at time the user makes its ()-th query providing the LBS provider with the location set with privacy requirement of . The previous queried location sets are shown by for each with the privacy requirement of and being queried at time . Initially, our aim is to calculate the posterior probability of each location in . The posterior probabilities indicate the likelihood of any location in being the real location of the user based on the previous queries that the user has made. Posterior probability for each location in can be written as

(12)
(13)
(14)
(15)

Following the same process of moving from equation (12) to equation (14), the probability of can be solved recursively to reach the equation (15) where the transition probabilities can be calculated similar to the equation (9). Therefore, evaluating this equation for each node in we can realize the likelihood of a location being the real location of the queried set . Finally, we borrow the concept of entropy to understand the uncertainty in the data calculated as

(16)

We call , the transition-entropy of the set with respect to the previous queried location sets . As will be demonstrated in the simulation results, the proposed transition-entropy metric indicates how susceptible the locations in are to being identified as dummies or as the real location of the user based on the previously queried location sets of the trajectory. The algorithm to calculate the transition-entropy metric for trajectories is formally presented in Algo. 2.
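The recursive structure of the trajectory metric can be sketched as a forward pass: the posterior of one location set becomes the prior of the next. The sketch below assumes the transition probabilities between consecutive sets are already available as row-normalized matrices and that the posterior is renormalized at every step; the function name `trajectory_transition_entropy` and the numbers are hypothetical.

```python
import math


def trajectory_transition_entropy(initial_query_probs, transitions):
    """Transition-entropy (bits) of the last location set of a trajectory.

    initial_query_probs  : query probabilities of the first location set
    transitions[t][j][m] : transition probability from location j of set t
                           to location m of set t + 1 (assumed given)
    """
    total = sum(initial_query_probs)
    belief = [q / total for q in initial_query_probs]
    for matrix in transitions:
        width = len(matrix[0])
        # Sum over every possible previous location, then renormalize so
        # the result can serve as the prior for the next step.
        propagated = [
            sum(belief[j] * matrix[j][m] for j in range(len(belief)))
            for m in range(width)
        ]
        norm = sum(propagated)
        if norm == 0:
            raise ValueError("a location set is unreachable from its predecessor")
        belief = [p / norm for p in propagated]
    entropy = -sum(p * math.log2(p) for p in belief if p > 0)
    return entropy, belief


# Hypothetical trajectory of three queries with two locations per set.
steps = [
    [[0.5, 0.5], [0.5, 0.5]],  # first -> second set: uninformative
    [[0.9, 0.1], [0.9, 0.1]],  # second -> third set: strongly skewed
]
print(trajectory_transition_entropy([0.3, 0.7], steps))
```

Only the query probabilities of the first set enter the computation; every later set is weighted purely through the transitions, which matches the observation made in the following paragraph.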

The calculation of transition-entropy is based only on the query probability of the initial location set and the transition probabilities throughout the trajectory. It is important to understand why the query probabilities of the other locations on the trajectory are not considered in the calculation of the transition-entropy metric. This can best be understood through an example. Fig. 4 shows a user requesting an LBS in two consecutive queries. The numbers written on the nodes indicate the normalized query probability of the locations, and the numbers written on the edges indicate the normalized probability of that transition. Assume we want to calculate the transition-entropy metric for based on the previous queried location set . The purpose of the example is to illustrate why the posterior probabilities calculated from the previous queries for are more reliable than the query probabilities of the locations in . First, let us calculate the posterior probabilities of and its entropy. The posterior probabilities according to equation (15) can be written as

(17)
(18)
(19)
(20)
(21)
(22)

According to the query probabilities of the location is more likely to be the real location as it has a significantly higher query probability. However, the posterior probabilities calculated for the location set show that, based on , location is more probable to be the real location of the user. This discrepancy can be explained by considering what the query probability actually means. The query probability reflects the number of times a location has been queried but does not specify whether it was queried after any particular location. Therefore, although location has been queried more often than the other locations in , most of these queries may have been made consecutively after locations and which are not members of the location set . Hence, the posterior probabilities are more credible, as they account for the number of queries made after the prior location set .

Fig. 4: An example of two consecutive queried location sets.

V Viterbi Attack

The Viterbi algorithm is a well-known dynamic programming algorithm proposed in [36]. It was initially used for convolutional codes, but has since found numerous applications, such as finding the most likely sequence of hidden states in hidden Markov models (HMMs). For a given graph, the aim of the algorithm is to find the shortest path, also called the most likely path or the Viterbi path. Several features distinguish the Viterbi algorithm from other algorithms for this purpose, the most important being its low computational complexity. Here, we design an attack inspired by the Viterbi algorithm and name it the Viterbi attack. The proposed Viterbi attack can significantly endanger the location privacy of users if it is not considered in the design of dummy generation algorithms. The adversary can exploit side information such as transition probabilities to compromise the location privacy of the users by conducting the Viterbi attack. As will be demonstrated in the simulations, the Viterbi attack can successfully identify many of the real locations of a user traveling even a short trajectory. In the following, we adopt the Viterbi algorithm and explain how it can threaten the location privacy of users.

Given the queried location sets , ,…, corresponding to a trajectory of length of a user, an attacker seeks to find the most probable state sequence to compromise the location privacy of the user. Here is referred to as a state of the location set . The desired state sequence of the adversary would be where for , refers to the true location of the queried set . We define to be the maximum probability of a state sequence with the length of given as where and . This function can be expressed mathematically as

(23)

where for each the initial value of the function is set to

(24)

in which as the most credible information for the first queried location set is the query probability, is calculated via equation (10). Starting from the second queried location set the most probable path can be calculated recursively as

(25)
1 Input: queried location sets , ,…, and the normalized query probability for the location set
2 Initialization: .
3 for  do
4      
5      
6 end for
7for  do
8       for  do
9            
10            
11       end for
12      
13 end for
14
15 for  do
16      
17 end for
18Output: .
Algorithm 3 The algorithm of the proposed Viterbi attack.

The formal presentation of the Viterbi attack is given in Algo. 3. The algorithm starts by setting the initial values of the array to their normalized query probability in lines . An array called is used to keep track of the most likely state of the previous queried location set as the most probable path is calculated in lines . Finally, the most probable path is chosen and the corresponding states are returned as output.
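The attack follows the standard Viterbi recursion, which can be sketched as below. This is a generic illustration rather than a transcription of Algo. 3: it assumes the adversary already holds the normalized query probabilities of the first set and a row-normalized transition matrix between each pair of consecutive sets. The function name `viterbi_attack` and the sample numbers are hypothetical.

```python
def viterbi_attack(initial_probs, transitions):
    """Return the most probable index sequence and its probability.

    initial_probs[j]     : normalized query probability of location j in
                           the first queried set
    transitions[t][j][m] : transition probability from location j of set t
                           to location m of set t + 1 (assumed given)
    """
    delta = list(initial_probs)  # best path probability ending in each state
    backpointers = []            # backpointers[t][m] = best predecessor of m
    for matrix in transitions:
        new_delta, pointers = [], []
        for m in range(len(matrix[0])):
            best_j = max(range(len(delta)), key=lambda j: delta[j] * matrix[j][m])
            pointers.append(best_j)
            new_delta.append(delta[best_j] * matrix[best_j][m])
        delta = new_delta
        backpointers.append(pointers)
    # Backtrack from the most probable final state.
    state = max(range(len(delta)), key=delta.__getitem__)
    best_prob = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Hypothetical trajectory of three queries with two candidates per set.
steps = [
    [[0.1, 0.9], [0.5, 0.5]],
    [[0.7, 0.3], [0.2, 0.8]],
]
path, prob = viterbi_attack([0.6, 0.4], steps)
print(path, prob)
```

For long trajectories the product of probabilities can underflow, so a practical variant would add log-probabilities instead of multiplying; the plain product is kept here to stay close to the description in the text.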

VI The Proposed Algorithms to Improve Location Privacy of Users

In this section, we start by proposing two algorithms for improving the transition-entropy metric. The algorithms are independent of the method used for generating the dummies. For the purpose of explanation, the underlying dummy generation algorithm is set to DLS in our work. The first proposed algorithm is based on exhaustively searching for the desired dummy set and the second algorithm follows a greedy approach for selection of the dummies. We continue by proposing an algorithm called robust dummy generation (RDG) which can significantly increase the privacy of the users against the Viterbi attack while maintaining the high performance in terms of transition-entropy and cell-entropy.

VI-A Exhaustive Search Algorithm

1 Input: , the location set in -th query, and the location set which only includes the real location of the user at -th query.
2 Output: the location set which includes the real location and dummies.
3 Initialization: .
4 generate a pool of dummies using the DLS algorithm
5 choose distinct -subsets of
6 for  do
7      
8       calculate transition-entropy of
9      
10      
11 end for
12 for  do
13       if  is the maximum number in  then
14            return
15             exit;
16       end if
17      
18 end for
Algorithm 4 The proposed exhaustive search algorithm for location privacy preservation of the users.

Suppose that at time the user has made its -th query for the location set of , which includes the real location and its associated dummies. As the user changes its location and makes its -th query at time , assuming -anonymity for the user, we wish to generate the location set to maximize the transition-entropy metric. The idea is to generate a pool of dummies, instead of only fake locations, which have similar cell-entropy to the real location of the user, and to choose subsets of the dummy location pool for evaluation of their transition-entropy performance. The formal description of the proposed method for generating the dummies of the set is given in Algorithm 4. The procedure starts by generating a pool of dummies using the DLS algorithm and assigning them to set . Then, distinct subsets of are chosen, each with members, which form a complete set of locations by addition of the real location (). Finally, the transition-entropy of each set is calculated with respect to , and the set with the maximum transition-entropy is returned as the -th query set.

The proposed exhaustive search algorithm considers the extra side information incorporated in this paper. As will be demonstrated in the simulation results, the algorithm provides significantly better transition-entropy performance than the existing algorithms while keeping the traditional cell-entropy metric near optimal.
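
The search described above can be sketched as follows. This is a hedged illustration, not the authors' code: `exhaustive_search`, the `transition_entropy` callable, and the `max_subsets` bound are hypothetical names, and the scoring function is supplied by the caller because the transition-entropy equations are not reproduced in this section. Taking the first `max_subsets` combinations is a simplification of choosing a bounded number of distinct subsets.

```python
import itertools


def exhaustive_search(prev_set, real_loc, pool, k, transition_entropy,
                      max_subsets=None):
    """Pick the (k - 1)-subset of the dummy pool with the highest score.

    prev_set:           location set of the previous query.
    real_loc:           real location of the user in the current query.
    pool:               candidate dummies (e.g. produced by DLS).
    transition_entropy: callable(prev_set, candidate_set) -> float (assumed).
    max_subsets:        optional bound on the number of subsets evaluated.
    """
    subsets = itertools.combinations(pool, k - 1)
    if max_subsets is not None:
        subsets = itertools.islice(subsets, max_subsets)

    best_set, best_score = None, float("-inf")
    for dummies in subsets:
        candidate = [real_loc, *dummies]
        score = transition_entropy(prev_set, candidate)
        if score > best_score:
            best_set, best_score = candidate, score
    return best_set
```

Passing the metric as a callable keeps the search independent of how the dummy pool and the transition-entropy are computed, which mirrors the claim that the approach does not depend on the underlying dummy generation method.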

VI-B Greedy Algorithm

1 Input: , the location set in -th query, and the location set which only includes the real location of the user at -th query.
2 Output: the location set which includes the real location and dummies.
3 Initialization: .
4 generate a pool of dummies using the DLS algorithm
5 for  do
6      
7       for  do
8            
9            
10            
11            
12       end for
13      
14      
15      
16 end for
return
Algorithm 5 The proposed greedy algorithm for location privacy preservation of the users.

Although the exhaustive search algorithm can significantly improve the location privacy of the users, its high computational overhead is a major drawback. The computational cost of the exhaustive search algorithm is in the order of where, if no bound is selected for , its value is . Implementing such an algorithm can therefore be time-consuming. To decrease the computational complexity, we propose a greedy approach which can achieve an order of .
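
To give a feel for the difference in cost, the sketch below counts how many candidate sets each approach would have to score. The counts are illustrative assumptions, not the paper's complexity expressions: the pool size `m`, the anonymity level `k`, and the helper names are hypothetical, and the unbounded exhaustive search is assumed to evaluate every way of choosing `k - 1` dummies from the pool.

```python
import math


def exhaustive_evaluations(m, k):
    # Every (k - 1)-subset of an m-element pool is scored once.
    return math.comb(m, k - 1)


def greedy_evaluations(m, k):
    # Each of the k - 1 rounds scores every dummy still left in the pool.
    return sum(m - j for j in range(k - 1))


if __name__ == "__main__":
    k = 5
    m = 4 * k  # assumed pool size of four times k
    print(exhaustive_evaluations(m, k), greedy_evaluations(m, k))
```

With these assumed values, the exhaustive search scores 4845 candidate sets while the greedy selection scores 74.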

Following the same setup explained for the exhaustive search algorithm, we aim to generate the set in a way that maximizes the transition-entropy with respect to the previous location set . The principal idea behind the greedy algorithm is to choose the members which maximize the transition-entropy one by one and add them to the instead of looking at all the possible combinations and the transition-entropy they achieve. The algorithm starts by generating a pool of dummies using the DLS algorithm. The DLS algorithm has been chosen due to its robust performance in terms of cell-entropy, but the greedy algorithm is applicable to other dummy generation methods as well. Initially, the location set only includes the real location of the user at -th query. The next member is added by trying out all the members in , calculating the transition-entropy of including that member, and choosing the one which maximizes the transition-entropy. Then, we move to the third member, and the same procedure is repeated until all the dummies are chosen. The greedy algorithm is formally presented in Algorithm 5.
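
A minimal sketch of the greedy selection is given below. As before, this is an illustration under stated assumptions: `greedy_select` and the `transition_entropy` callable are hypothetical names, and the scoring function must be supplied by the caller.

```python
def greedy_select(prev_set, real_loc, pool, k, transition_entropy):
    """Build a location set of size k by adding one dummy at a time.

    transition_entropy: callable(prev_set, candidate_set) -> float (assumed).
    """
    chosen = [real_loc]      # the set initially holds only the real location
    remaining = list(pool)   # dummies not yet selected

    while len(chosen) < k and remaining:
        # Score the set extended by each remaining dummy; keep the best one.
        best = max(remaining,
                   key=lambda d: transition_entropy(prev_set, chosen + [d]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Each round scores only the dummies still in the pool, so the number of evaluations grows with the pool size times the number of dummies rather than with the number of subsets.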

VI-C RDG Algorithm

In this subsection, we propose the RDG algorithm, which aims to increase the resilience against the Viterbi attack while keeping the cell-entropy and transition-entropy as high as those of the existing algorithms.

The algorithm is based on the idea of posterior probabilities introduced as part of the derivation of the transition-entropy. We explain the algorithm for generation of from the queried location set . If is the initial query of the user from the LBS provider, then, the initial posterior probabilities are set to the normalized query probability of the locations in ; otherwise, the posterior probabilities are calculated from equation (12). In the algorithm, posterior probabilities are assigned to an array called .

The algorithm starts by generating a pool of dummies using the DLS algorithm based on the real location of . Using the DLS algorithm to generate the pool of dummies ensures the high performance of the algorithm in terms of the cell-entropy. In our experiments, setting the pool size to four times the still keeps the cell-entropy quite high while resulting in robust performance in terms of the transition-entropy and Viterbi attack resilience. Next, the algorithm employs a greedy approach to add the most suitable dummies to the location set . For choosing the -th member of the set , each of the remaining dummies in the pool is checked one by one. The criterion chosen here is based on maximizing the entropy of the array . For each member , the array is calculated as

(26)

The weight array is chosen to be a two-dimensional array to distinguish between the weights for different location sets. For each member of the dummy pool, its weight is calculated, followed by the entropy of the weight array. After the entropy has been calculated for all the possible members, the member which results in the maximum entropy is chosen as the next member of . The process continues until all the dummies of are chosen. Note that before the entropy is calculated, the weights are normalized so that they sum to one. The algorithm has been designed to provide high cell-entropy and transition-entropy privacy for the users while protecting them from the Viterbi attack on trajectories.
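
The selection loop can be sketched as follows. Because equation (26) is not reproduced here, the weight used in this example is an assumption: each candidate cell is weighted by the sum, over the previous location set, of the posterior probability times the transition probability into that cell. The names `entropy`, `rdg_select`, `posterior`, and `trans_prob` are likewise hypothetical.

```python
import math


def entropy(weights):
    """Shannon entropy (base 2) of the weights after normalization."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def rdg_select(prev_set, posterior, real_loc, pool, k, trans_prob):
    """Greedily add the dummy that maximizes the entropy of the weights.

    posterior:  posterior probability of each cell in prev_set (assumed input).
    trans_prob: dict (cell_a, cell_b) -> transition probability (assumed).
    """
    def weight(cell):
        # ASSUMED form of the weight; the paper's equation (26) may differ.
        return sum(p * trans_prob.get((a, cell), 0.0)
                   for a, p in zip(prev_set, posterior))

    chosen = [real_loc]
    weights = [weight(real_loc)]
    remaining = list(pool)
    while len(chosen) < k and remaining:
        best = max(remaining, key=lambda d: entropy(weights + [weight(d)]))
        chosen.append(best)
        weights.append(weight(best))
        remaining.remove(best)
    return chosen
```

Under this assumed weight, dummies whose weights are close to that of the real location are preferred, since a flatter weight array has higher entropy.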

1 Input: , the location set in -th query, and the location set which only includes the real location of the user at -th query.
2 Output: the location set which includes the real location and dummies.
3 Initialization: .
4 for  do
5      
6      
7 end for
8 generate a pool of dummies using the DLS algorithm
9 for  do
10      
11       for  do
12            
13             for  do
14                  
15                  
16             end for
17            normalize
18            
19            
20            
21       end for
22      
23      
24      
25 end for
return
Algorithm 6 The proposed RDG algorithm for location privacy preservation of the users.

VII Performance Evaluation

VII-A Experiment Setup

In our experiment, we use the data collected by the Geolife project [37, 38, 39], which includes the GPS trajectories of users from April 2007 to August 2012 in Beijing, China. The dataset contains the GPS logs of the users including trajectories with a total distance of . Two main advantages distinguish the Geolife dataset for our work. Firstly, aside from monitoring the daily routines of the users, such as going to work or home, the recorded data includes trajectories involving sports activities like hiking and cycling. Secondly, many of the recorded trajectories are tagged with a transportation mode, which indicates the use of various means of traveling, from bus and car to airplane and train.

We have conducted our experiments on the central part of the Beijing map with a resolution of for each grid cell. The location privacy requirement () of the users is investigated for the values to . For each value of , the algorithms are repeated times to ensure the reliability of the results. Although the proposed algorithms and metric can be used for users who have varying location privacy requirements in consecutive queries of the LBS, for the sake of comparison, we have assumed that the value stays the same in consecutive calls to the LBS. The experiments are performed on a PC with a GHz Intel Core i7 processor, a -bit Windows operating system, and GB of RAM. The algorithms are implemented in Python.

Fig. 5: Comparison of algorithms in terms of cell-entropy for different values of .

VII-B Performance Analysis

In this section, we evaluate the performance of the proposed algorithms and metrics through an extensive number of experiments. The desired outcome of the experiments is to show that the proposed RDG algorithm can maintain the currently established metric of cell-entropy [12] while increasing the performance in terms of the proposed transition-entropy metric and providing the users with high privacy preservation against the developed Viterbi attack. We start by analyzing the performance of the algorithms in terms of cell-entropy, followed by the transition-entropy analysis and an investigation of their performance against the Viterbi attack.

VII-B1 Cell-entropy performance evaluation

In order to calculate the cell-entropy metric, the adversary records the number of times each cell has been queried over time and uses this information to calculate the query probability of each cell. Once the location set including the real location and dummies is submitted to the server, the adversary can calculate the cell-entropy of the user. A higher value for the cell-entropy indicates more uncertainty in finding the real location or recognizing the dummies. Therefore, maximum cell-entropy is desirable to maintain the -anonymity of the users.

Fig. 5 presents the comparison of different algorithms in terms of cell-entropy. The optimal value is achieved when the locations queried from the LBS provider all have the same probability of , or equivalently, the location set has the cell-entropy of . The optimal value is the target for all the algorithms since it is the maximum entropy that a location set can achieve. In the random scheme [9], the dummies are generated randomly, which expectedly results in a lower cell-entropy compared to the other algorithms. As can be seen in the figure, the DLS algorithm achieves near-optimal performance in terms of the cell-entropy. Therefore, the adversary is unable to compromise the -anonymity of the user from the stationary set of locations submitted to the server using the available query probabilities. The exhaustive, greedy, and RDG algorithms also achieve near-optimal performance, which indicates that in all these algorithms the adversary is unable to identify the dummy locations by exploiting the cell-entropy. It must be noted that the proposed algorithms are adaptable to any dummy generation algorithm; the reason for their high cell-entropy performance is that we have chosen DLS as our base. Hence, if other algorithms are chosen, the cell-entropy performance must be evaluated for them as well to ensure robust performance in terms of the cell-entropy.
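
As a small worked example of the metric, the sketch below computes the cell-entropy of a location set from the recorded query counts of its cells. The function name `cell_entropy` and the use of a base-2 logarithm are assumptions made for this illustration; with a base-2 logarithm, a set of k equally likely cells reaches the maximum of log2(k).

```python
import math


def cell_entropy(query_counts):
    """Shannon entropy (base 2) of the normalized query counts of a set."""
    total = sum(query_counts)
    probs = [c / total for c in query_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


if __name__ == "__main__":
    # Four equally popular cells reach the maximum for k = 4.
    print(cell_entropy([5, 5, 5, 5]))
    # A set dominated by one popular cell scores lower.
    print(cell_entropy([97, 1, 1, 1]))
```

The first call yields 2.0, the maximum for four cells, while the second set has a much lower entropy because one cell is far more likely to be the real location than the others.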

Fig. 6: Comparison of algorithms in terms of transition-entropy for different values of in trajectory of length .

VII-B2 Transition-entropy performance evaluation

The currently established cell-entropy metric only considers the location privacy for the stationary set of queried locations submitted to the LBS server, but overlooks the fact that the adversary has access to the trajectories traveled by the users as well. The adversary can use the likelihood of traveling different paths between the consecutive location sets, and infer with a high probability that many of the submitted locations are dummies, which leads to failure in preserving location privacy requirements of the users. Fig. 6 compares the performance of different algorithms in terms of the transition-entropy for a path length of two. For all the algorithms, based on the value of , two consecutive location sets are generated, each including the real location and its associated dummies. To make the experiments as realistic as possible, the real location movements are chosen randomly from the recorded trajectories in the dataset.

Fig. 7: Comparison of algorithms in terms of transition-entropy for different values of in trajectories of length and .
Fig. 8: The performance evaluation of DLS, greedy, and RDG algorithms against Viterbi attack considering various path lengths and privacy requirement

The optimal value in Fig. 6 corresponds to a scenario in which all the members of the second location set are equally likely to be called consecutively after the members of the first location set. The optimal values can be calculated in a similar way to the optimal cell-entropy for different values of . As can be seen from the figure, the random scheme performs very poorly, which means that the adversary can easily recognize most of the dummies from the transition-entropy even for two consecutive location sets queried by the user. The first point to notice in the figure is that although the DLS algorithm achieved near-optimal performance in terms of cell-entropy, its transition-entropy performance indicates that the adversary can compromise the location privacy of the users by calculating the posterior probabilities. The algorithms proposed in this paper significantly improve the transition-entropy, roughly doubling it relative to the DLS algorithm. In other words, the likelihood of compromising the -anonymity requirement is decreased by the proposed algorithms, which leads to higher location privacy for the users of LBSs. The exhaustive algorithm performs slightly worse than the RDG and greedy algorithms. This lower performance is due to setting an upper bound on the number of sets chosen for calculation of the transition-entropy instead of going through them all, which becomes computationally expensive when the pool of dummies is large. The performance of RDG is significantly higher than that of the other algorithms.

Fig. 7 extends our analysis of transition-entropy to longer trajectories. The crucial inference from the graph is that as more locations are queried from the LBS provider, the transition-entropy decreases. This simulation result agrees with the theoretical analysis: with more information, the adversary is able to calculate the posterior probabilities more accurately, which leaves the adversary with less uncertainty in identifying the real location of the users. The existing DLS algorithm has a very low transition-entropy compared to the proposed greedy and RDG algorithms. Therefore, our proposed algorithms are viable in increasing the transition-entropy of the users while keeping the cell-entropy near optimal. It must be noted that the greedy and RDG algorithms are able to increase the transition-entropy regardless of the underlying algorithm used to generate the pool of dummies. Therefore, a better base algorithm than DLS can improve the performance further, just as the greedy and RDG algorithms increased the transition-entropy of the DLS algorithm.

VII-C Performance of Algorithms Against Viterbi Attack

In this subsection, we analyze the performance of our proposed algorithms against the designed Viterbi attack. The performance analysis is given in Fig. 8. Considering the extensive side information we incorporated in this paper, the Viterbi attack poses a significant threat to the privacy of the users of LBSs. Looking at the percentage of real locations protected in the Viterbi attack on the DLS algorithm, it can be seen that, for instance, in a trajectory of length the adversary is able to identify almost all the real locations of the users. This shows that although the locations in a single LBS request are protected by existing dummy generation algorithms, in trajectories the side information available to the adversary can allow a compromised LBS provider to identify almost all the real locations. The apparent trend for all the path lengths is that increasing the number of dummies can improve the preservation of location privacy, but this increase is not sufficient even for trajectories of length two.

The second algorithm considered in Fig. 8 is the greedy algorithm proposed in our work to increase the transition-entropy of the dummy generation algorithms. Although the algorithm prevents the inference of real locations based on transition-entropy, it is not capable of providing location privacy against the Viterbi attack conducted by the adversary. The performance of the greedy algorithm against the Viterbi attack gets worse as more queries are made to the LBS, since the adversary has more accurate information from the history of data. On the other hand, the performance analysis of RDG shows that, because the algorithm confuses the adversary further with each query requested from the LBS provider, the difference between the real path and the path estimated by the Viterbi attack becomes larger for longer trajectories. The RDG algorithm is able to protect at least percent of the user's queried locations if the -anonymity criterion is set to or larger.
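
The protection measure discussed above can be illustrated with a short helper. This is a sketch under an assumption about how the percentage is computed: a queried location is counted as protected when the adversary's estimate for that query differs from the real location. The function name `protected_fraction` is hypothetical.

```python
def protected_fraction(real_path, estimated_path):
    """Fraction of queries whose real location the attack failed to identify."""
    if not real_path or len(real_path) != len(estimated_path):
        raise ValueError("paths must be non-empty and of equal length")
    missed = sum(1 for real, est in zip(real_path, estimated_path)
                 if real != est)
    return missed / len(real_path)
```

For instance, if the attack recovers two of the four real locations on a trajectory, the helper returns 0.5, meaning half of the queried locations remain protected.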

VIII Conclusions

In this work, we incorporated new side information which can be exploited by the adversary to compromise the location privacy of the users. We proposed a metric called transition-entropy to evaluate the performance of dummy-based algorithms and quantified the existing cell-entropy metric. The new metric is based on the transitions between the locations in the map and considers the detrimental effect of the new side information on the location privacy of the users. To improve the transition-entropy metric, two general approaches were proposed to increase the transition-entropy for a given dummy generation algorithm. Furthermore, we developed an attack model on the location privacy of the users based on the Viterbi algorithm, followed by an algorithm called RDG that maintains high performance in terms of the cell-entropy and transition-entropy while protecting the users against the Viterbi attack. Finally, numerous experiments were performed on real-world data to analyze the performance of the algorithms.

References

  • [1] “Location-based services (lbs) and real time location systems (rtls) market by location (indoor and outdoor), technology (context aware, uwb, bt/ble, beacons, a-gps), software, hardware, service and application area - global forecast to 2021.” [Online]. Available: https://www.marketsandmarkets.com/Market-Reports/location-based-service-market-96994431.html
  • [2] A. R. Beresford and F. Stajano, “Location privacy in pervasive computing,” IEEE Pervasive computing, vol. 2, no. 1, pp. 46–55, 2003.
  • [3] T. Jiang, H. J. Wang, and Y.-C. Hu, “Preserving location privacy in wireless lans,” in Proceedings of the 5th international conference on Mobile systems, applications and services.   ACM, 2007, pp. 246–257.
  • [4] C.-Y. Chow, M. F. Mokbel, and X. Liu, “A peer-to-peer spatial cloaking algorithm for anonymous location-based service,” in Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems.   ACM, 2006, pp. 171–178.
  • [5] M. L. Yiu, C. S. Jensen, J. Møller, and H. Lu, “Design and analysis of a ranking approach to private location-based services,” ACM Transactions on Database Systems (TODS), vol. 36, no. 2, p. 10, 2011.
  • [6] R. Schlegel, C.-Y. Chow, Q. Huang, and D. S. Wong, “User-defined privacy grid system for continuous location-based services,” IEEE Transactions on Mobile Computing, vol. 14, no. 10, pp. 2158–2172, 2015.
  • [7] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557–570, 2002.
  • [8] M. Gruteser and D. Grunwald, “Anonymous usage of location-based services through spatial and temporal cloaking,” in Proceedings of the 1st international conference on Mobile systems, applications and services.   ACM, 2003, pp. 31–42.
  • [9] H. Kido, Y. Yanagisawa, and T. Satoh, “An anonymous communication technique using dummies for location-based services,” in Pervasive Services, 2005. ICPS’05. Proceedings. International Conference on.   IEEE, 2005, pp. 88–97.
  • [10] B. Niu, Z. Zhang, X. Li, and H. Li, “Privacy-area aware dummy generation algorithms for location-based services,” in Communications (ICC), 2014 IEEE International Conference on.   IEEE, 2014, pp. 957–962.
  • [11] H. Lu, C. S. Jensen, and M. L. Yiu, “Pad: privacy-area aware, dummy-based location privacy in mobile services,” in Proceedings of the Seventh ACM International Workshop on Data Engineering for Wireless and Mobile Access.   ACM, 2008, pp. 16–23.
  • [12] B. Niu, Q. Li, X. Zhu, G. Cao, and H. Li, “Achieving k-anonymity in privacy-aware location-based services,” in INFOCOM, 2014 Proceedings IEEE.   IEEE, 2014, pp. 754–762.
  • [13] A. Serjantov and G. Danezis, “Towards an information theoretic metric for anonymity,” in International Workshop on Privacy Enhancing Technologies.   Springer, 2002, pp. 41–53.
  • [14] A. Pfitzmann and M. Köhntopp, “Anonymity, unobservability, and pseudonymity—a proposal for terminology,” in Designing privacy enhancing technologies.   Springer, 2001, pp. 1–9.
  • [15] P. Samarati and L. Sweeney, “Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression,” Technical report, SRI International, Tech. Rep., 1998.
  • [16] M. Gruteser and D. Grunwald, “Anonymous usage of location-based services through spatial and temporal cloaking,” in Proceedings of the 1st international conference on Mobile systems, applications and services.   ACM, 2003, pp. 31–42.
  • [17] C.-Y. Chow and M. F. Mokbel, “Enabling private continuous queries for revealed user locations,” in International Symposium on Spatial and Temporal Databases.   Springer, 2007, pp. 258–275.
  • [18] T. Xu and Y. Cai, “Exploring historical location data for anonymity preservation in location-based services,” in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE.   IEEE, 2008, pp. 547–555.
  • [19] X. Pan, J. Xu, and X. Meng, “Protecting location privacy against location-dependent attacks in mobile services,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 8, pp. 1506–1519, 2012.
  • [20] J. Xu, X. Tang, H. Hu, and J. Du, “Privacy-conscious location-based queries in mobile environments,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 3, pp. 313–326, 2010.
  • [21] A. R. Beresford and F. Stajano, “Location privacy in pervasive computing,” IEEE Pervasive Computing, vol. 2, no. 1, pp. 46–55, Jan 2003.
  • [22] ——, “Mix zones: User privacy in location-aware services,” in Pervasive Computing and Communications Workshops, 2004. Proceedings of the Second IEEE Annual Conference on.   IEEE, 2004, pp. 127–131.
  • [23] B. Palanisamy and L. Liu, “Mobimix: Protecting location privacy with mix-zones over road networks,” in Data Engineering (ICDE), 2011 IEEE 27th International Conference on.   IEEE, 2011, pp. 494–505.
  • [24] ——, “Attack-resilient mix-zones over road networks: architecture and algorithms,” IEEE Transactions on Mobile Computing, vol. 14, no. 3, pp. 495–508, 2015.
  • [25] R. Lu, X. Lin, T. H. Luan, X. Liang, and X. Shen, “Pseudonym changing at social spots: An effective strategy for location privacy in vanets,” IEEE Transactions on Vehicular Technology, vol. 61, no. 1, pp. 86–96, 2012.
  • [26] S. Gao, J. Ma, W. Shi, G. Zhan, and C. Sun, “Trpf: A trajectory privacy-preserving framework for participatory sensing,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 6, pp. 874–887, 2013.
  • [27] J. Freudiger, M. Raya, M. Félegyházi, P. Papadimitratos, and J.-P. Hubaux, “Mix-zones for location privacy in vehicular networks,” in ACM Workshop on Wireless Networking for Intelligent Transportation Systems (WiN-ITS), no. LCA-CONF-2007-016, 2007.
  • [28] T. Kölsch, L. Fritsch, M. Kohlweiss, and D. Kesdogan, “Privacy for profitable location based services,” in International Conference on Security in Pervasive Computing.   Springer, 2005, pp. 164–178.
  • [29] T. Rodden, A. Friday, H. Muller, A. Dix et al., “A lightweight approach to managing privacy in location-based services,” 2002.
  • [30] R. Lu, X. Lin, T. H. Luan, X. Liang, and X. Shen, “Pseudonym changing at social spots: An effective strategy for location privacy in vanets,” IEEE Transactions on Vehicular Technology, vol. 61, no. 1, pp. 86–96, 2012.
  • [31] M. Wernke, P. Skvortsov, F. Dürr, and K. Rothermel, “A classification of location privacy attacks and approaches,” Personal and ubiquitous computing, vol. 18, no. 1, pp. 163–175, 2014.
  • [32] H. J. Do, Y.-S. Jeong, H.-J. Choi, and K. Kim, “Another dummy generation technique in location-based services,” in Big Data and Smart Computing (BigComp), 2016 International Conference on.   IEEE, 2016, pp. 532–538.
  • [33] T. Hara, A. Suzuki, M. Iwata, Y. Arase, and X. Xie, “Dummy-based user location anonymization under real-world constraints,” IEEE Access, vol. 4, pp. 673–687, 2016.
  • [34] R. Cheng, Y. Zhang, E. Bertino, and S. Prabhakar, “Preserving user location privacy in mobile data management infrastructures,” in International Workshop on Privacy Enhancing Technologies.   Springer, 2006, pp. 393–412.
  • [35] A. Franken, “Text - s.1223 - 112th congress (2011-2012): Location privacy protection act of 2012,” Dec 2012. [Online]. Available: https://www.congress.gov/bill/112th-congress/senate-bill/1223/text
  • [36] G. D. Forney, “The viterbi algorithm,” Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
  • [37] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma, “Mining interesting locations and travel sequences from gps trajectories,” in Proceedings of the 18th international conference on World wide web.   ACM, 2009, pp. 791–800.
  • [38] Y. Zheng, Q. Li, Y. Chen, X. Xie, and W.-Y. Ma, “Understanding mobility based on gps data,” in Proceedings of the 10th international conference on Ubiquitous computing.   ACM, 2008, pp. 312–321.
  • [39] Y. Zheng, X. Xie, and W.-Y. Ma, “Geolife: A collaborative social networking service among user, location and trajectory.” IEEE Data Eng. Bull., vol. 33, no. 2, pp. 32–39, 2010.