Cluster-Aided Mobility Predictions
Abstract
Predicting the future location of users in wireless networks has numerous applications, and can help service providers improve the quality of service perceived by their clients. The location predictors proposed so far estimate the next location of a specific user by inspecting the past individual trajectories of this user. As a consequence, when the training data collected for a given user is limited, the resulting prediction is inaccurate. In this paper, we develop cluster-aided predictors that exploit past trajectories collected from all users to predict the next location of a given user. These predictors rely on clustering techniques and extract from the training data similarities among the mobility patterns of the various users to improve the prediction accuracy. Specifically, we present CAMP (Cluster-Aided Mobility Predictor), a cluster-aided predictor whose design is based on recent nonparametric Bayesian statistical tools. CAMP is robust and adaptive in the sense that it exploits similarities in users’ mobility only if such similarities are really present in the training data. We analytically prove the consistency of the predictions provided by CAMP, and investigate its performance using two large-scale datasets. CAMP significantly outperforms existing predictors, and in particular those that only exploit individual past trajectories.
I Introduction
Predicting users’ mobility in wireless networks has received a great deal of attention recently, strongly motivated by a wide range of applications. Examples of such applications include: location-based services provided to users by anticipating their movements (e.g., mobile advertisement, recommendation systems, risk alarm); urban traffic engineering and forecasting; the design of more efficient radio resource allocation protocols (e.g., scheduling and handover management [1], data prefetching [2] and energy-efficient location sensing [3]). However, for these applications to significantly benefit from users’ mobility predictions, the latter should be made with a sufficiently high degree of accuracy.
Many mobility prediction methods and algorithms have been devised over the last decade, see e.g. [4, 5, 3, 6]. The algorithms proposed so far estimate the next location of a specific user by inspecting the data available about her past mobility, i.e., her past trajectory, and exploit the inherent repeated patterns present in this data. These patterns correspond to the regular behavior of the user, e.g. commuting from home to work or visiting favourite restaurants, and need to be extracted from the data to provide accurate predictions. To this aim, one has to observe the behavior of the user over long periods of time. Unfortunately, gathering data about users’ mobility can be quite challenging. For instance, detecting the current location of a user with sensors (e.g., GPS, WiFi and cell tower) consumes non-negligible energy. Users may also hesitate to log their trajectories to preserve their privacy. In any case, when the data about the mobility of a given user is limited, it is hard to identify her typical mobility patterns, and in turn difficult to provide accurate predictions on her next move or location.
In this paper, we aim at devising mobility predictors that perform well even if the past trajectories gathered for the various users are short. Our main idea is to develop cluster-aided predictors that exploit the data (i.e., past trajectories) collected from all users to predict the next location of a given user. These predictors rely on clustering techniques and extract from the training data similarities among the mobility patterns of the various users to improve the prediction accuracy. More precisely, we make the following contributions:

We present CAMP (Cluster-Aided Mobility Predictor), a cluster-aided predictor whose design is based on recent nonparametric Bayesian statistical tools [7, 8]. CAMP extracts, from the data, clusters of users with similar mobility processes, and exploits this clustered structure to provide accurate mobility predictions. The use of nonparametric statistical tools allows us to adapt the number of extracted clusters to the training data (this number can actually grow with the data, i.e., with the number of users). This confers to our algorithm a strong robustness, i.e., CAMP exploits similarities in users’ mobility only if such similarities are really present in the training data.

We derive theoretical performance guarantees for the predictions made under the CAMP algorithm. In particular, we show that CAMP can achieve the performance of an optimal predictor (among the set of all predictors) when the number of users grows large, and for a large class of mobility models.

Finally, we compare the performance of our predictor to that of other existing predictors using two large-scale mobility datasets (corresponding to a WiFi and a cellular network, respectively). CAMP significantly outperforms existing predictors, and in particular those that only exploit individual past trajectories to estimate users’ next location.
II Related Work
Most existing mobility prediction methods estimate the next location of a specific user by inspecting the past individual trajectories of this user. One of the most popular mobility predictors consists in modelling the user trajectory as an order Markov chain. Predictors based on the order Markov model are asymptotically optimal [9, 6] for a large class of mobility models. This optimality, however, only holds asymptotically, when the length of the observed user past trajectory tends to infinity. When the observed past trajectory of the user is rather short, these predictors perform poorly, a phenomenon often referred to as the “cold-start problem”. To improve the performance of these predictors for short histories, a fallback mechanism can be added [4] to reduce the order of the Markov model when the current sequence of previous locations has not been encountered before. Alternatively, one may adapt the order of the Markov model used for prediction, as in the Sampled Pattern Matching (SPM) algorithm [6], which sets the order of the Markov model to a fraction of the longest suffix match in the history. SPM is asymptotically optimal, with provable bounds on its rate of convergence, when the trajectory is generated by a stationary mixing source. Another type of mobility predictor, NextPlace [5], attempts to leverage the timestamps that may be associated with the successive locations visited by the user. Empirical evaluations [4, 3] show that complex mobility models do not perform well: the order Markov predictor with fallback gives performance comparable to that of SPM [6], NextPlace [5] and higher-order Markov predictors. In addition, [3] reports that the order Markov predictor can actually provide better predictions than higher-order Markov predictors, as the latter suffer more from the lack of training data.
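To make the fallback mechanism concrete, here is a minimal sketch (not the paper’s implementation) of an order-k Markov predictor that falls back to lower orders when the current context has never been seen; the `order` parameter, the location alphabet, and the majority-vote prediction rule are illustrative assumptions:

```python
from collections import Counter

def markov_fallback_predict(trajectory, order=2):
    """Predict the next location with an order-k Markov model, falling back
    to lower orders when the current context (the last k locations) was
    never observed earlier in the trajectory."""
    for k in range(order, 0, -1):
        context = tuple(trajectory[-k:])
        # Count which location followed this context in the past.
        counts = Counter(
            trajectory[i + k]
            for i in range(len(trajectory) - k)
            if tuple(trajectory[i:i + k]) == context
        )
        if counts:
            return counts.most_common(1)[0][0]
    return None  # empty or uninformative trajectory: no prediction
```

With a short history, the order-2 context is often unseen and the predictor silently degrades to order 1, which is exactly the cold-start behavior discussed above.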
There have been a few papers aiming at clustering trajectories or, more generally, stochastic processes. For example, [10] proposes algorithms that find clusters of trajectories by likelihood maximization for an underlying hidden Markov model. For the same problem, [11] uses spectral clustering in a semi-parametric manner, based on the Bhattacharyya affinity metric between pairs of trajectories. Those methods would not work well in our setting, because they require that (i) users belonging to the same cluster have trajectories generated by identical parameters, and (ii) the number of clusters be known beforehand, or estimated in a reliable way. The nonparametric Bayesian approach developed in this paper addresses both issues. [12] also introduced a Bayesian approach, but one focused on the similarity between users’ temporal patterns; it considers neither the similarity between spatial trajectories nor the correlation with recent locations, both of which are crucial for accurate predictions in our setting.
III Models and Objectives
In this section, we first describe the data on past user trajectories available at a given time to build predictors. We then provide a model for user mobility, used to define our nonparametric inference approach, as well as its objectives.
III-A Collected Data
We consider the problem of predicting, at a given time, the mobility, i.e., the next position, of users based on observations about past users’ trajectories. These observations are collected and stored on a server. The set of users is denoted by , and users are all moving within a common finite set of locations. The trajectory collected for user is denoted by , where corresponds to the th location visited by user , and where refers to the length of the trajectory. denotes the current location of user . By definition, we impose , i.e., two consecutive locations on a trajectory must be different. Let denote the set of user trajectories. Observe that the lengths of the trajectories may vary across users. If the location of a user is sensed periodically, we can collect the time a given user has stayed at each location. These staying times for user are denoted by , where is the staying time at the th visited location. To simplify the presentation, we present our prediction methods ignoring the staying times; we explain how to extend our approach to include staying times in §IV-B4.
Next we introduce additional notation. We denote by the number of observed transitions for user from location to (i.e., ). Similarly, is the number of times user has been observed at location . Let denote the set of all possible trajectories of a given user, and let be the set of all possible sets of trajectories of users in .
III-B Mobility Models
The design of our predictors is based on a simple mobility model. We assume that user trajectories are order-1 Markov chains, with arbitrary initial state or location. More precisely, a user’s trajectory is generated by the transition kernel , where denotes the probability that user moves from location to along her trajectory. Hence, given her initial position , the probability of observing trajectory is . Our mobility model can be readily extended to order Markov chains. However, as observed in [3], the order-1 Markov chain model already provides reasonably accurate predictions in practice, and higher-order models would require a fallback mechanism [4]: to accurately predict the next position of a user given the sequence of her past positions, her trajectory should contain numerous instances of this sequence, which typically does not occur if the observed trajectory is short – and this is precisely the case we are interested in. Throughout the paper, we use uppercase letters to represent random variables and the corresponding lowercase letters for their realizations, e.g., (resp. ) denotes the random trajectory (resp. a realization of the trajectory) of user .
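As a concrete illustration of the order-1 model, a user’s transition kernel can be estimated by maximum likelihood from her observed trajectory; the dictionary layout below is an illustrative assumption, not the paper’s code:

```python
from collections import defaultdict

def empirical_kernel(trajectory):
    """Maximum-likelihood estimate of an order-1 transition kernel:
    p_hat[x][y] = (# observed transitions x -> y) / (# transitions out of x)."""
    counts = defaultdict(lambda: defaultdict(int))
    for x, y in zip(trajectory, trajectory[1:]):
        counts[x][y] += 1
    return {
        x: {y: n / sum(nexts.values()) for y, n in nexts.items()}
        for x, nexts in counts.items()
    }
```

When the trajectory is short, many rows of this kernel are missing or based on a single observation, which is precisely why pooling data across similar users helps.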
III-C Bayesian Framework, Clusters, and Objectives
We adopt a Bayesian framework, and assume that the transition kernels of the various users are drawn independently from the same distribution , referred to as the prior distribution over the set of all possible transition kernels . (Here denotes the set of distributions over the set , and .) This assumption is justified by De Finetti’s theorem (see [13], Theorem 11.10) if are exchangeable (which is typically the case if users are a priori indistinguishable). In the following, the expectation and probability under are denoted by and , respectively. To summarize, the trajectories of users are generated using the following hierarchical model: for all , , , and are arbitrarily fixed.
To provide accurate predictions even if observed trajectories are rather short, we leverage similarities among user mobility patterns. It seems reasonable to think that the trajectories of some users are generated through similar transition kernels. In other words, the distribution might exhibit a clustered structure, putting mass around a few typical transition kernels. Our predictors will identify these clusters and exploit this structure: to predict the next location of a given user, we shall leverage the observed trajectories of all users who belong to her cluster.
For any user , we aim at proposing an accurate predictor of her next location, given the observed trajectories of all users. The (Bayesian) accuracy of a predictor for user , denoted by , is defined as (where for conciseness, we write ). Clearly, given , the best possible predictor would be:
(1) 
Computing this optimal predictor, referred to as the Bayesian predictor with prior , requires the knowledge of . Indeed:
(2) 
Since the prior distribution is unknown here, we first estimate it from the data, and then construct our predictor according to (1)-(2).
IV Bayesian Nonparametric Inference
In view of the model described in the previous section, we can devise an accurate mobility predictor if we are able to provide a good approximation of the prior distribution on the transition kernels dictating the mobility of the various users. If concentrates its mass around a few typical kernels, which would in turn define clusters of users (i.e., users with similar mobility patterns), we would like to devise an inference method identifying these clusters. On the other hand, our inference method should not discover clusters if there are none, nor specify the number of clusters in advance (as in the traditional mixture-modelling approach). Towards these objectives, we apply a Bayesian nonparametric approach that estimates how many clusters are needed to model the observed data, and allows the number of clusters to grow with the size of the data. In Bayesian nonparametric approaches, the complexity of the model (here the number of clusters) is part of the posterior distribution and is allowed to grow with the data, which confers flexibility and robustness to these approaches. In the remainder of this section, we first present an overview of the Dirichlet Process mixture model, a particular Bayesian nonparametric model, and then apply this model to the design of CAMP (Cluster-Aided Mobility Predictor), a robust and flexible prediction algorithm that efficiently exploits similarities in users’ mobility, if any exist.
IV-A Dirichlet Process Mixture Model
When applying Bayesian nonparametric inference techniques [7] to our prediction problem, we add one level of randomness. More precisely, we approximate the prior distribution on the transition kernels by a random variable with distribution . This additional level of randomness allows us to introduce some flexibility in the number of clusters present in . We shall compute the posterior distribution given the observations , and hope that this posterior distribution, denoted by , concentrates its mass around the true prior distribution . To evaluate , we use Gibbs sampling techniques (see Section IV-B1), and from these samples we estimate the true prior and derive our predictor by replacing by its estimate in (1)-(2).
For the higher-level distribution , we use the Dirichlet Process (DP) mixture model, a standard choice of prior over infinite-dimensional spaces, such as . The DP mixture model has a possibly infinite number of mixture components or clusters, and is defined by a concentration parameter , which impacts the number of clusters, and a base distribution , from which new clusters are drawn. The DP mixture model with parameters and is denoted by and defined as follows. If is a random measure drawn from (i.e., ), and is a (measurable) partition of , then follows a Dirichlet distribution with parameters . (The Dirichlet distribution with parameters has density, with respect to the Lebesgue measure, proportional to .) It is well known [14] that a sample from has the form , where is the Dirac measure at point , the ’s are i.i.d. with distribution and represent the centres of the clusters (indexed by ), and the weights ’s are generated from a Beta distribution according to the following stick-breaking construction:
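The stick-breaking construction just described can be sketched as follows; the helper name, the truncation to a finite number of atoms, and the fixed random seed are illustrative assumptions:

```python
import random

def stick_breaking_weights(alpha, num_atoms, rng=random.Random(0)):
    """Draw the first `num_atoms` mixture weights of a DP(alpha, G0) sample
    via stick-breaking: beta_k ~ Beta(1, alpha), and
    w_k = beta_k * prod_{j<k} (1 - beta_j)."""
    weights, remaining = [], 1.0
    for _ in range(num_atoms):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    return weights  # sums to < 1; the leftover mass belongs to further atoms
```

Smaller values of `alpha` put most of the mass on the first few atoms (few clusters); larger values spread the mass over many atoms (many clusters), which is the role the concentration parameter plays in the text.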
When is generated under the above DP mixture model, we can compute the distribution of given . When is fixed, then users in are clustered and the set of corresponding clusters is denoted by . Users in cluster share the same transition kernel , and the number of users assigned to cluster is denoted by . The distribution of given is then:
(3) 
(3) makes the cluster structure of the DP mixture model explicit. Indeed, when considering a new user , a new cluster containing this user only is created with probability , and the user is associated with an existing cluster with probability proportional to the number of users already assigned to this cluster. Refer to [15] for a more detailed description of DP mixture models.
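This assignment rule is the familiar Chinese-restaurant-process view of the DP mixture; a minimal sketch of the resulting probabilities (illustrative, not the paper’s pseudo-code):

```python
def crp_assignment_probs(cluster_sizes, alpha):
    """Probability that a new user joins each existing cluster or opens a
    new one: proportional to the cluster size for existing clusters, and
    to the concentration parameter alpha for a new cluster."""
    total = sum(cluster_sizes) + alpha
    probs = [n / total for n in cluster_sizes]
    probs.append(alpha / total)  # last entry: probability of a new cluster
    return probs
```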
Our prediction method simply consists in approximating by the expectation w.r.t. the posterior distribution . In other words, for user , the estimated next position will be:
(4) 
where denotes the expectation w.r.t. the probability measure induced by . To compute , we rely on Gibbs sampling techniques to generate samples with distribution . The way concentrates its mass around the true prior will depend on the choice of parameters and , and to improve the accuracy of our predictor, these parameters will be constantly updated when successive samples are produced.
IV-B CAMP: Cluster-Aided Mobility Predictor
Next we present CAMP, our mobility prediction algorithm. The objective of this algorithm is to estimate from which we derive the predictions according to (4). CAMP consists in generating independent samples of the assignment of users to clusters induced by the posterior distribution , and then in providing an estimate of from these samples. As mentioned above, the accuracy of this estimate strongly depends on the choice of parameters and in the DP mixture model, and these parameters will be updated as new samples are generated.
More precisely, the CAMP algorithm consists of two steps. (i) In the first step, we use a Gibbs sampler to generate samples of the assignment of users to clusters under the probability measure induced by , and update the parameters and of the DP mixture model using these samples (hence we update the prior distribution ). We repeat this procedure times. In the th iteration, we construct samples of users’ assignment. The th assignment sample is referred to as in the CAMP pseudo-code, where is the cluster of user in that sample. The subroutines providing the assignment samples and updating the parameters of the prior distribution are described in detail in §IV-B1 and §IV-B2, respectively. At the end of the first step, we have constructed a prior distribution, parametrized by and , which is adapted to the data, i.e., a distribution that concentrates its mass on the true prior . (ii) In the second step, we use the updated prior to generate samples of users’ assignment one last time. Using these samples, we compute an estimate of for each user , and finally derive the prediction of the next position of user . The way we compute is detailed in §IV-B3.
The CAMP algorithm takes as inputs the data , the number of updates of the prior distribution , the number of samples generated by the Gibbs sampler in each iteration, and the number of times the users’ assignment is updated when producing a single assignment sample using the Gibbs sampler (under the Gibbs sampler, the assignment is a Markov chain, which we simulate long enough so that it has the desired distribution). , , and should be chosen as large as possible. Of course, increasing these parameters also increases the complexity of the algorithm, and we may wish to select them so as to achieve an appropriate trade-off between accuracy and complexity.
IV-B1 Sampling from the DP mixture posterior
We use a Gibbs sampler [16] to generate independent samples of the assignment of users to clusters under the probability measure induced by the posterior , i.e., samples of the assignment with distribution , where denotes the probability measure induced by . Gibbs sampling is a classical MCMC method to generate samples from a given distribution. It consists in constructing and simulating a Markov chain whose stationary distribution is the desired one. In our case, the state of the Markov chain is the assignment , and its stationary distribution is . The Markov chain should be simulated long enough (here the number of steps is denoted by ) so that, at the end of the simulation, the state of the Markov chain has converged to the steady state. The pseudo-code of the proposed Gibbs sampler is provided in Algorithm 2, and follows easily from the description of the DP mixture model provided in (3).
To produce a sample of the assignment of users to clusters, we proceed as follows. Initially, we group all users in the same cluster , the number of clusters is set to 1, and the number of users (except for user ) assigned to cluster is (see Algorithm 2). Then the assignment is revised times. In each iteration, each user is considered and assigned either to an existing cluster, or to a newly created cluster (the latter is denoted by if in the previous iteration there were clusters). This assignment is made randomly according to the model described in (3). Note that in the definition of , we have , where corresponds to the data of users in cluster , i.e., .
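A single Gibbs sweep of this kind might be sketched as below. The `log_marginal` function (the marginal likelihood of a user’s trajectory given the other trajectories already in a cluster) is left abstract, and the data structures and names are illustrative assumptions rather than the paper’s Algorithm 2:

```python
import math
import random

def gibbs_sweep(assignments, log_marginal, alpha, rng=random.Random(0)):
    """One Gibbs sweep over users: remove each user from its cluster, then
    reassign it to an existing cluster with weight proportional to
    (cluster size) * exp(log_marginal), or to a new cluster with weight
    proportional to alpha * exp(log_marginal against an empty cluster)."""
    for user in list(assignments):
        assignments.pop(user)
        clusters = {}
        for u, c in assignments.items():
            clusters.setdefault(c, []).append(u)
        # Candidate labels: all existing clusters, plus one fresh label.
        labels = list(clusters) + [max(clusters, default=-1) + 1]
        logw = [
            math.log(len(clusters[c])) + log_marginal(user, clusters[c])
            for c in labels[:-1]
        ] + [math.log(alpha) + log_marginal(user, [])]
        m = max(logw)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in logw]
        assignments[user] = rng.choices(labels, weights=weights)[0]
    return assignments
```

Emptied clusters disappear automatically, since cluster labels are recomputed from the current assignment at each step; this is what lets the number of clusters grow or shrink with the data.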
IV-B2 Updates of and
As in any Bayesian inference method, our prediction method could suffer from a bad choice of the parameters and defining the prior . For example, by choosing a small value for , we tend to get a very small number of clusters, and possibly only one cluster. On the contrary, selecting too large a value would result in too many clusters, and in turn would make our algorithm unable to capture similarities in the mobility patterns of the various users. To circumvent this issue, we update and fit the parameters to the data, as suggested in [8]. In the CAMP algorithm, the initial base distribution is uniform over all transition kernels (over ) and is initially set to 1. Then, after each iteration, we exploit the samples of assignments of users to clusters to update these initial parameters, by refining our estimates of and .
(5)  
(6) 
Note that (5) simply corresponds to a kernel density estimator based on the cluster samples obtained with the prior distribution parametrized by and , whereas (6) corresponds to a maximum-likelihood estimate (see [17]), which sets to the value most likely to have produced the average number of clusters obtained when sampling from the model with parameters and .
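One way to realize this update of the concentration parameter is to match the DP’s expected number of clusters among n users, E[K] = Σ_{i=1}^{n} α/(α+i−1), to the average number of clusters observed in the Gibbs samples. The bisection sketch below is a hypothetical illustration of this idea, not the exact estimator of [17]:

```python
def expected_num_clusters(alpha, n):
    """Expected number of clusters among n users under a DP with
    concentration alpha: E[K] = sum_{i=1..n} alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i) for i in range(n))

def fit_alpha(avg_clusters, n, lo=1e-6, hi=1e6):
    """Bisection for the alpha whose expected cluster count matches the
    average number of clusters seen in the samples. E[K] is strictly
    increasing in alpha, so bisection converges."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_num_clusters(mid, n) < avg_clusters:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```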
IV-B3 Computation of
As mentioned earlier, is an estimator of , where is parametrized by and , and is used for our prediction of the user’s mobility. is simply the empirical average of over the clusters to which user is associated in the last samples generated by CAMP, i.e.,
(7)  
(8) 
Note that, in view of the law of large numbers, when grows large, converges to . The predictions for user are made by first computing an estimated transition kernel according to (8). We derive an explicit expression of that does not depend on , but only on the data and the samples generated by the CAMP algorithm. This expression, given in the following lemma, is useful to understand to what extent the prediction of a user’s mobility under CAMP leverages the observed trajectories of other users.
Lemma 1
For any , is computed as a weighted sum of all users’ empirical transition kernels (), i.e.,
(9)  
(10)  
The sum stands for , and is the set of all clusters sampled at the th iteration (i.e., ). and are given by:
where , , and .
Proof. Refer to Appendix.
When the current location is fixed, the first term on the r.h.s. of (9) is constant over all users. The second term can be interpreted as a weighted sum of the empirical transition kernels of all users (i.e., ). The weight of user ( in (9)) quantifies how much we account for that user’s trajectory in the prediction for user at the current location , and can be seen as a notion of similarity between and . Indeed, as the number of sampled clusters in which both and are involved increases, in (9) increases accordingly. Also, if has a relatively high compared to other users (i.e., has accumulated more observations at location than other users), a higher weight is assigned to .
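In the spirit of Lemma 1, a weighted combination of empirical kernels can be sketched as follows; the weight and kernel data structures are illustrative assumptions, and the actual weights in (9) involve the sampled clusters:

```python
def cluster_aided_prediction(current_loc, weights, empirical_kernels):
    """Predict the next location as the argmax of a weighted sum of all
    users' empirical transition kernels out of the current location.
    `weights[v]` is the (similarity-like) weight of user v, and
    `empirical_kernels[v]` maps location -> {next_location: probability}."""
    scores = {}
    for v, w in weights.items():
        for nxt, p in empirical_kernels[v].get(current_loc, {}).items():
            scores[nxt] = scores.get(nxt, 0.0) + w * p
    return max(scores, key=scores.get) if scores else None
```

A user with a large weight and many observations out of the current location dominates the sum, matching the interpretation of the weights given above.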
IV-B4 Estimating the Staying Times
Next we provide a way of estimating how long user will stay at her current location . We may perform such an estimation when the available data include the times users stay at the various locations. Typically, existing spatio-temporal predictors predict the staying time at the current location by computing the average [5] or a quantile [3] of user ’s staying times observed at her previous visits to . CAMP, on the other hand, additionally exploits other users’ staying-time observations using the weight . More precisely, the staying time of user at location (denoted by ) is estimated by
(11) 
in (11) is a normalization constant that makes the weights sum to 1 over all users. The estimate (11) is a heuristic, since is actually obtained by clustering users based on their location trajectories rather than their staying times. This heuristic estimate nevertheless performs well empirically, as shown in Section VI-B4.
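A hypothetical sketch of this weighted staying-time estimate (the data layout and function name are assumed for illustration):

```python
def estimate_staying_time(location, weights, staying_times):
    """Heuristic staying-time estimate: a weighted average of all users'
    mean observed staying times at `location`, reusing the cross-user
    weights from the location predictor. `staying_times[v][location]` is
    a list of user v's observed staying times there."""
    num = den = 0.0
    for v, w in weights.items():
        obs = staying_times.get(v, {}).get(location, [])
        if obs:
            num += w * sum(obs) / len(obs)
            den += w
    return num / den if den > 0 else None  # None: no observation anywhere
```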
V Consistency of the CAMP Predictor
In this section, we analyze to what extent (which, when is large, is well approximated by derived in the CAMP algorithm) is close to , the expectation under the true prior . We are mainly interested in the regime where the user population becomes large, while the number of observations for each user remains bounded. This regime is motivated by the fact that it is often impractical to gather long trajectories for a given user, while the available user population may, on the contrary, be very large. For the sake of the analysis, we assume that the length of a user’s observed trajectory is a random variable with distribution , that the lengths of trajectories are independent across users, and that the length is upper-bounded by , e.g., .
Since the length of trajectories is bounded, we cannot ensure that is arbitrarily small. For example, if users’ trajectories are of length 2 only, we cannot group users into clusters, and in turn we can only get a precise estimate of the transition kernels averaged over all users. In particular, we cannot hope to estimate for each user . Next we formalize this observation. We denote by the set of possible trajectories of length less than . With finite-length observed trajectories, there are distributions that cannot be distinguished from the true prior by just observing users’ trajectories, i.e., these distributions induce the same law on the observed trajectories as : on (here denotes the probability measure induced under , and recall that is the probability measure induced by ).
We prove that, when the number of observed users grows large, is upper-bounded by the performance provided by a distribution indistinguishable from , which expresses the consistency of our inference framework.
Before we state our result, we introduce the following two notions:
KL neighborhood: the Kullback-Leibler neighborhood of a distribution with respect to is defined as the following set of distributions:
where .
KL support: The distribution is in the Kullback-Leibler support of a distribution with respect to if for all .
Theorem 2
If is in the KL-support of with respect to , then we have, almost surely, for any ,
(12) 
Proof. Refer to Appendix.
The r.h.s. of (12) captures the performance of an algorithm that would perfectly estimate for the worst distribution which agrees with the true prior on . Note that in our framework, the prior we use is a DP mixture with a base measure having full support . Therefore, the KL-support of is here the whole space ; it thus contains .
As far as we are aware, Theorem 2 is the first performance result on inference algorithms using DP mixture models with indirect observations. By indirect observations, we mean that the kernels cannot be observed directly, but are revealed only through the trajectories . Most existing analyses [18, 19, 20] do not apply in our setting, as these papers aim at identifying conditions on the Bayesian prior and on the true distribution under which the Bayesian posterior converges (either weakly or in norm) to in the limit of a large population size. Hence, existing analyses are concerned with direct observations of the kernels .
VI Empirical Evaluation of CAMP
VI-A Mobility Traces
We evaluate the performance of the CAMP predictor using two sets of mobility traces, collected on a WiFi and a cellular network, respectively.
WiFi traces [21]. We use the dataset of [21], where the mobility of 62 users was collected over three months in WiFi networks, mainly around a campus in South Korea. The smartphone of each user periodically scans its radio environment and gets a list of MAC addresses of available access points (APs). To map these lists of APs collected over time to a set of locations, we compute the Jaccard index between pairs of AP lists scanned at different times (the Jaccard index between two lists and is defined as ). If two lists of APs have a Jaccard index higher than 0.5, they are considered to correspond to the same geographical location [21]. From the constructed set of locations, we then construct the trajectories of the various users.
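The AP-list-to-location mapping can be sketched as follows; the 0.5 threshold comes from the text, while the function names and the greedy first-match order are illustrative assumptions:

```python
def jaccard(a, b):
    """Jaccard index of two AP lists: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def scan_to_location(scan, known_locations, threshold=0.5):
    """Map a scanned AP list to an existing location if some known
    location's AP set has Jaccard index above the threshold; otherwise
    register the scan as a new location."""
    for loc_id, aps in enumerate(known_locations):
        if jaccard(scan, aps) > threshold:
            return loc_id
    known_locations.append(set(scan))
    return len(known_locations) - 1
```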
ISP traces [22]. We also use the call detail record (CDR) dataset provided by Orange, where the mobility of 50000 subscribers in Senegal was measured over two weeks. We use the SET2 data [22], where the mobility of a given user is reported as a sequence of base station (BS) ids and time stamps. A record is obtained only when the user communicates through a base station (e.g., phone call, text message).
In each dataset, we first restrict our attention to a subset of frequently visited locations. We select the 116 and 80 most visited locations in the WiFi and ISP datasets, respectively. We then reconstruct users’ trajectories by removing locations not in . For the ISP dataset, we extract 200 users (randomly chosen among users who visited at least 10 of the locations in ). From the reconstructed trajectories, we observe a total number of transitions from one location to another equal to 8194 and 13453 for the WiFi and ISP datasets, respectively.
Users’ similarity. Before evaluating the performance of the various prediction algorithms, we wish to assess whether users exhibit similar mobility patterns that could in turn be exploited in our predictions. Here, we test the similarity of pairs of users only. More precisely, we wish to know whether the observed trajectory of one user could be aggregated with that of another user to improve the prediction of the latter’s mobility. To this aim, we use the concept of mutual prediction [23] as follows.
We first define the empirical accuracy of an estimator of user’s transition kernel:
(13) 
Let be the maximum-likelihood estimator of given (i.e., ). Intuitively, one user’s trajectory is useful to predict the mobility of another if has a high empirical accuracy for that user, i.e., if is high. We hence define the similarity of users and as . Note that this notion of similarity is not symmetric (in general ), and it always takes values between 0 and 1.
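A sketch of this empirical-accuracy computation; the kernel layout and the argmax prediction rule are illustrative assumptions:

```python
def empirical_accuracy(kernel, trajectory):
    """Fraction of transitions in `trajectory` whose next location is the
    most likely one under `kernel` (a dict: location -> {next: prob}).
    Applying this with one user's maximum-likelihood kernel and another
    user's trajectory yields the (asymmetric) similarity described above."""
    hits = total = 0
    for x, y in zip(trajectory, trajectory[1:]):
        nexts = kernel.get(x)
        if nexts:
            hits += max(nexts, key=nexts.get) == y
        total += 1
    return hits / total if total else 0.0
```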
Fig. 1 (a) and (b) present the similarity between the 62 users in the WiFi traces and 100 users in the ISP subscriber dataset. To provide meaningful plots, we have ordered users so that pairs of users with high similarity are neighbours (to this aim, we have run the spectral clustering algorithm [11] and regrouped users into the identified clusters). From these plots, the similarity of users is apparent; however, we also clearly observe that perfect clusters (in which users’ patterns are exactly the same) do not really exist. From the datasets, we observe that 1.65% and 5% of all possible user pairs have similarity higher than 0.5 for the WiFi and ISP traces, respectively. We also computed the number of users having at least one other user with whom the similarity is higher than 0.5. In the WiFi traces, we found 19 (out of 62) such users, whereas in the ISP traces there are 173 (out of 200). These numbers are high, and justify the design of cluster-aided predictors.
ViB Prediction Accuracy
ViB1 Tested Predictors
We assess the performance of six types of predictors: the order1 Markov predictor (Markov [4]), the order2 Markov predictor with fallback (MarkovO(2) [4]), AGG, CAMP and CAMP, AGG. Before describing each predictor, we briefly introduce some notations regarding the training data available at a given time. The time stamp of the arrival at th location on user’s trajectory is denoted by , and is the length of user’s trajectory collected before time (i.e., ). The collection of users’ trajectories available for a prediction at time is denoted by (i.e., where ). The prediction for is denoted by
In order to derive an estimate of the th location of user , the Markov predictors first estimate based on user trajectory only, i.e., based on . In contrast, AGG and CAMP algorithms exploit the data available on all users to estimate . The AGG algorithm tries in a very naive way to exploit users’ similarities. It considers that all users have the same transition kernel (as if there were a single cluster only), and thus uses all trajectories (in the same way) to estimate . CAMP (resp. AGG) differs from CAMP (resp. AGG) in that its prediction at time under for user uses other users’ complete trajectories (i.e., ). This corresponds to a case where user starts moving along her trajectory after other users have gathered sufficiently long trajectories. Under all algorithms, the estimated is denoted by ). Finally, MarkovO(2) assumes that users’ trajectories are order2 Markov chains, and for the locations where the corresponding order2 transitions are not observed, MarkovO(2) falls back to the Markov predictor. The description of the various predictors is summarized in Table 1.
Markov [4]  
AGG  
CAMP  
AGG  
CAMP 
The parameters , and for CAMP and CAMP are set to 8, 3 and 30.
ViB2 Results
We assess the performance of the various algorithms using two main types of metrics. The first metric, referred to as the Cumulative Accurate Prediction Ratio (CAPR), is defined as the fraction of accurate predictions for all users up to time :
We also introduce a similar metric that captures the cumulative accuracy of predictions after observing different locations on users’ trajectories:
The second type of metrics concerns the instantaneous accuracy of the predictions. The Instantaneous Accurate Prediction Ratio (IAPR) after observing different locations on users’ trajectories is defined as follows.
Fig.2(a)(b) present as a function of time for various algorithms and for the two mobility traces. CAMP outperforms all other algorithms at any time. The improvement over Markov and MarkovO(2) can be as high as 65%. This illustrates the performance gain that can be achieved when exploiting users’ similarities. Note MarkovO(2) does not outperform Markov, which was also observed in [3]. In the following, we only evaluate the performance of the Markov predictor, and do not report that of its order2 equivalent.
In Fig.2 (c)(f), we plot the CAPR and IAPR as a function of the length of the observed trajectory. In Fig.2(c) and (d), when the collected trajectory is not sufficient (i.e., ), CAMP and CAMP outperforms Markov by 64% and 40%, respectively. Regarding the IAPR in WiFi traces, Fig 2(e) shows that CAMP and CAMP provide much better predictions than Markov, when the length of trajectory is less than 140. After a sufficient training data is collected, they yield comparable IAPR. In Fig 2 (f), for the ISP traces, the IAPR under CAMP and Markov are similar sooner, for trajectories of length greater than 20 only.
In Fig.2 (g) and (h), we evaluate the CAPR and IAPR averaged only over users having at least one user with whom the similarity is higher than 0.5 (see §VIA). These users are referred to as Mobility Friendly (MF) users. In Fig.2(g), we observe that for MF users, the gain of CAMP and CAMP becomes really significant, i.e., when =10, the CAPR of CAMP and CAMP outperform that of Markov by 102% and 65%, respectively. Also note that CAMP becomes significantly better than CAMP for MF users. This is explained by the fact that we can predict the mobility of MF users much more accurately if we have a long history of the mobility of users they are similar to. The performance for MF users in the ISP traces is not presented, because there, most of users (i.e., 86%) are already MF users.
ViB3 Exploiting Similarities in CAMP
Recall that, by the weight of the empirical transition kernel of user (i.e., ) in computing in (9), we can quantify to what extent the observed trajectory of user is taken into account in the estimate . When summing over all locations , we get an aggregate indicator capturing how impacts the prediction for user’s mobility. To understand how many users actually impact the prediction for user in the CAMP, we may look at the cardinality of the set of users whose aggregate indicator exceeds a given threshold: where is a normalization constant to make the sum of aggregate indicators over all users equal to 1. The above set is called the set of similar users.
In Fig.3, we plot the number of similar users, averaged over all users , and as a function of the length of trajectories (in days ). In case of CAMP, the first day, the average numbers are 7 and 110 in WiFi traces and ISP traces, which means that CAMP aggressively uses the trajectories of all users for its prediction. When the length of the trajectories increase, the average size decreases to 1.5 after one month in WiFi traces and 2.2 after two weeks in ISP traces. In other words, as data is accumulated, CAMP does not use the trajectories of a lot of users for its prediction. This illustrates the adaptive nature of CAMP, which only exploits similarities among users if this is needed. In the case of CAMP, we observe a faster decrease with time of the average number of similar users, which means that CAMP tends to utilize other users’ data more selectively, even at the beginning. This explains why CAMP performs better than CAMP in Fig.2.
ViB4 Error of Staying Time Estimation
In our scenario, where each user arrives at th location , a predictor estimates the staying time with the available data. Markov predictor [3, 5](resp. AGG) computes the average of staying times of user (resp. all users) which have been measured at the location until . CAMP predicts by computing the equation (11) with the observed data of all users. The performance metric for each user measured at th location is the difference between the estimated and acutual staying time ( i.e., ). We call it as estimation error. We test the estimation error only with WiFi trace, because we cannot precisely observe staying time in ISP trace in which a location is recorded not periodically, but only when users randomly communicate with base stations.
Fig.4 (a) plots CDFs of estimation errors of every user and obtained by tested predictors. CAMP provides lower estimation error than that of Markov and AGG. The median of CAMP is less than those of Markov and AGG by 35% and 28%, respectively. For 18% of all instances (marked as “Estimation failure”), Markov couldn’t provide estimations, because the individual users haven’t collected their staying times at the current location before. However in those cases AGG and CAMP are still able to estimate the staying time by using other users’ observations. In Fig.4 (b), we further test the estimation quality of AGG and CAMP, when Markov is unavailable due to lack of the individual training data. In that case, 43% of estimations provided by CAMP give less than 30 minutes errors. Median of estimation errors of CAMP is 13.4% less than that of AGG, because CAMP selectively utilizes other users’ data.
Vii Concluding Remarks
In this paper, we have presented a clusteraided inference method to predict the mobility of users in wireless networks. This method significantly departs from existing prediction techniques, as it aims at exploiting similarities in the mobility patterns of the various users to improve the prediction accuracy. The proposed algorithm, CAMP, relies on Bayesian nonparametric estimation tools, and is robust and adaptive in the sense that it exploits users’ mobility similarities only if the latter really exist. We have shown that our Bayesian prediction framework can asymptotically achieve the performance of an optimal predictor when the user population grows large, and have presented extensive experiments indicating that CAMP outperforms any other existing prediction algorithms. Note also that CAMP can be implemented without damaging users’ privacy (the data can be anonymized).
Many interesting questions remain about the design of CAMP. In particular, we plan to investigate how to set its parameters (, , and ) to achieve an appropriate tradeoff between accuracy and complexity. These parameters could also be modified in an online manner while the algorithm is running to adapt to the nature of the data. We further plan to apply the techniques developed in this paper to various kind of mobility, e.g., we could investigate how users dynamically browse the web, and use our framework to predict the next visited webpage.
Appendix
Viia Proof of Lemma 1
Observe that in view of (5), we have:
(14) 
where the sum is over all possible partitions of the set of users in clusters and the weight is
(15) 
with . Recursively replacing in (14) with and putting = Uniform(), we obtain another expression of as
(16) 
where the sum is where is a set of every cluster sampled at th iterations, i.e., . We can further obtain the recursive expression of the weights by plugging (16) in (15):