Learning Spatiotemporal-Aware Representation for POI Recommendation
Abstract
The wide spread of location-based social networks brings about a huge volume of user check-in data, which facilitates the recommendation of points of interest (POIs). Recent advances in distributed representation shed light on learning low-dimensional dense vectors to alleviate the data sparsity problem. Current studies on representation learning for POI recommendation embed both users and POIs in a common latent space, and users’ preference is inferred based on the distance/similarity between a user and a POI. Such an approach is not in accordance with the semantics of users and POIs, as they are inherently different objects. In this paper, we present a novel spatiotemporal-aware (STA) representation, which models the spatial and temporal information as a relationship connecting users and POIs. Our model generalizes the recent advances in knowledge graph embedding. The basic idea is that the embedding of a ⟨time, location⟩ pair corresponds to a translation from embeddings of users to POIs. Since the POI embedding should be close to the user embedding plus the relationship vector, the recommendation can be performed by selecting the top-k POIs similar to the translated POI, which are all of the same type of objects. We conduct extensive experiments on two real-world datasets. The results demonstrate that our STA model achieves state-of-the-art performance in terms of high recommendation accuracy, robustness to data sparsity, and effectiveness in handling the cold start problem.
Bei Liu, Tieyun Qian, Bing Liu, Liang Hong, Zhenni You, Yuxiang Li
State Key Laboratory of Software Engineering, Wuhan University, China
Department of Computer Science, University of Illinois at Chicago, USA
School of Information Management, Wuhan University, China
{qty, beiliu}@whu.edu.cn, liub@cs.uic.edu, {hong, znyou, liyux}@whu.edu.cn
1 Introduction
Location-based social networks (LBSNs), such as Foursquare, Yelp, and Facebook Places, are becoming pervasive in our daily lives. Users on LBSNs like to share their experiences at points of interest (POIs), e.g., restaurants and museums, with their friends. The providers of location-based services have collected a huge amount of users’ check-in data, which facilitates the recommendation of POIs to users who have not yet visited them. POI recommendation is of high value to both users and companies, and thus has attracted much attention from researchers in recent years [?; ?; ?; ?].
Most existing studies focus on leveraging spatial information due to the well-known strong correlation between users’ activities and geographical distance [?; ?; ?]. For example, Ye et al. [?] proposed a Bayesian collaborative filtering (CF) algorithm to explore the geographical influence. Cheng et al. [?] captured the geographical influence by modeling the probability of a user’s check-in at a location as a multi-center Gaussian model and then combined it into a generalized matrix factorization model. Lian et al. [?] adopted a weighted matrix factorization framework to incorporate the spatial clustering phenomenon.
Like geospatial information, time is another important factor in POI recommendation. Ye et al. [?] found the periodic temporal property that people usually go to restaurants around noon and visit clubs at night. Yuan et al. [?] developed a CF-based model to integrate temporal cyclic patterns. Cheng et al. [?] explored temporal sequential patterns for personalized POI recommendation by using the transition probability between two successive check-ins of a user.
Existing studies have exploited spatial or temporal influences mainly using CF [?; ?] and Markov transition approaches [?]. Due to the sparsity of users’ check-in records, it is hard to find similar users or calculate transition probabilities. Although matrix factorization (MF) methods are effective in dealing with the sparsity of the user-POI matrix [?; ?], they do not consider the current location of the user. More importantly, while time and location together play a critical role in determining users’ activities in LBSNs, little work has modeled their joint effects. Considering only one factor deteriorates the predictive accuracy. For instance, at lunch time a student may go to a school cafeteria or to a food court in a mall, depending on whether he/she is on campus or outside. A system should not recommend the same restaurant to a user at the same time but at different locations. This example shows the ineffectiveness of using one type of information while ignoring the other. However, taking both time and location into consideration exacerbates the data sparsity.
In this paper, we propose a novel spatiotemporal aware (STA) model, which captures the joint effects of spatial and temporal information. Our model has the following distinct characteristics.

STA takes location and time as a whole to determine the users’ choice of POIs.

STA embeds a spatiotemporal pair ⟨time, location⟩ as a relationship connecting users and POIs.
By considering the time and location at the same time, our model can be successfully applied to real-time POI recommendation. Furthermore, distributed representations of STA are very effective in solving the problem of data sparsity.
Two recent works [?; ?] also exploited the power of distributed representation for alleviating data sparsity. The personalized ranking metric embedding (PRME) by Feng et al. [?] projected each POI and each user into a latent space, and then recommended a POI $v$ to a user $u$ at location $l$ based on the Euclidean distance between the POI and the user and that between the POI and the location. Xie et al. [?] proposed a graph-based embedding model (GE) by embedding graphs into a shared low-dimensional space, and then computed the similarity between a user $u_q$’s query at current time $t_q$ and location $l_q$ and a POI $v$ using an inner product. While PRME and especially GE show significant improvements over many other baselines, both methods have the drawback that they embed users and POIs in a common latent space and infer users’ preference from the distance/similarity between a user and a POI. Such an approach is unnatural since users and POIs are inherently different objects. In contrast, our STA model generalizes recent advances in knowledge graph embedding [?]. A user $u_q$ reaches an interested POI $v_q$ via an edge $tl_q$ denoting the ⟨time, location⟩ pair, i.e., $\mathbf{u}_q + \mathbf{tl}_q \approx \mathbf{v}_q$. With this transformation, we can do recommendation for $u_q$ by selecting the top-k POIs similar to POI $v_q$, which are all of the same type of objects with similar semantics.
2 Problem Definition and Preliminary
Definition 1. (POI) A POI $v$ is defined as a unique identifier representing one specific position (e.g., a cafe or a hotel), and $V$ is the set of all POIs.
Definition 2. (Check-in Activity) A check-in activity is a quadruple $(u, t, l, v)$, which means that a user $u$ visits a POI $v$ in location $l$ at time $t$.
Definition 3. (Spatiotemporal pattern) A spatiotemporal pattern, denoted as $tl$, is a combination of a time slot $t$ and a location $l$, such as ⟨11 a.m., Los Angeles⟩.
For ease of presentation, we summarize the notations in Table 1. The POI recommendation problem investigated in this paper has the same settings as that in [?]. The formal problem definition is as follows.
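Table 1: Notations.
| Variable | Interpretation |
|---|---|
| $u$, $v$ | the user $u$ and POI $v$ |
| $t$ | the time slot discretized from the timestamp |
| $l$ | the location mapped from (longitude, latitude) |
| $tl$ | the spatiotemporal pattern $\langle t, l \rangle$ |
| $\mathbf{u}$, $\mathbf{tl}$, $\mathbf{v}$ | embeddings of $u$, $\langle t, l \rangle$, and $v$ |
| $u_q$, $t_q$, $l_q$ | the query user $u_q$, his/her current time and location |
| $v_q$ | the potential POI that the query user $u_q$ is interested in |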
Problem Definition (Location-based Recommendation) Given a dataset recording a set of users’ activities and a query $q = (u_q, t_q, l_q)$, we aim to recommend the top-k POIs in $V$ that the query user $u_q$ would be interested in.
Preliminary  KG Embedding A knowledge graph (KG) is a directed graph whose nodes and edges describe entities and their relations in the form (head, relation, tail), denoted as $(h, r, t)$. The goal of knowledge graph embedding is to learn a continuous vector space where the embeddings of entities and relations preserve certain information of the graph. Bordes et al. [?] presented a simple yet effective approach, TransE, to learn vector embeddings for both entities and relations in a KG. The basic idea is that the relationship between entities corresponds to a translation between the embeddings of the entities, namely, $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ when $(h, r, t)$ exists in the graph. Later, a model named TransH [?] was proposed to enable an entity to have distinct representations when it is involved in different relations.
Both TransE and TransH project all entities and relations into the same space. However, some entities may have multiple aspects, and relations may focus on different aspects of the entities. Such entities are close in the entity space when they are similar, but they should be far away from each other in the relation space if they are strongly different in some specific aspects. To address this issue, Lin et al. [?] presented the TransR model, which projects the two entities $h$ and $t$ of $(h, r, t)$ into the $r$-relation space as $\mathbf{h}_r$ and $\mathbf{t}_r$ with a projection matrix $\mathbf{M}_r$, such that $\mathbf{h}_r + \mathbf{r} \approx \mathbf{t}_r$ holds in the relation space.
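To make the translation idea concrete, the following minimal Python sketch computes a TransE-style score and a TransR-style score on toy vectors. The function names, the squared L2 norm, and the toy values are illustrative assumptions, not the implementations used in the cited works.

```python
# Illustrative sketch only: plain-Python TransE and TransR scores on toy data.

def matvec_row(x, m):
    """Multiply a row vector x (length d) by a matrix m (d rows, k columns)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def transe_score(h, r, t):
    """Squared L2 norm of h + r - t; small when the triple (h, r, t) holds."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))

def transr_score(h, r, t, m_r):
    """Project h and t into the relation space with m_r, then translate by r."""
    return transe_score(matvec_row(h, m_r), r, matvec_row(t, m_r))

# Toy embeddings (hypothetical values).
h = [1.0, 0.0]
r = [0.5, 0.5]
t_good = [1.5, 0.5]   # equals h + r, so the score is zero
t_bad = [0.0, 2.0]    # far from h + r
identity = [[1.0, 0.0], [0.0, 1.0]]
scale = [[2.0, 0.0], [0.0, 2.0]]

print(transe_score(h, r, t_good), transe_score(h, r, t_bad))
print(transr_score(h, r, t_good, identity), transr_score(h, r, t_good, scale))
```

With the identity matrix the TransR-style score reduces to the TransE-style one; a relation-specific matrix changes which pairs count as close.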
3 Our Proposed Framework
We seek to learn the representations with the following characteristics.

Spatiotemporal awareness  Location and time together play a crucial role when a user selects a POI; they should not be separated into individual ones.

Semantics consistency  All the POIs, either the query user’s interested POI $v_q$ or the existing POIs $v \in V$, should come from a consistent semantic space.
In order to satisfy the first requirement, we combine each time slot and location into a spatiotemporal pattern $\langle t, l \rangle$, and convert the quadruples $(u, t, l, v)$ in the dataset into triples $(u, \langle t, l \rangle, v)$. We then learn representations for users, spatiotemporal patterns, and POIs from the converted set to meet the second condition, using the translation technique that originated in knowledge graph embedding.
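The conversion from quadruples to triples can be sketched in a few lines of Python. The record layout and the toy check-ins below are assumptions made for illustration only.

```python
# Illustrative sketch: turn (u, t, l, v) check-ins into (u, (t, l), v) triples.

def to_triples(checkins):
    """Merge the time slot and location of each check-in into one pattern."""
    return [(u, (t, l), v) for (u, t, l, v) in checkins]

# Hypothetical toy check-ins: (user, hour slot, location, POI).
checkins = [
    ("u1", 11, "Los Angeles", "cafe_a"),
    ("u2", 11, "Los Angeles", "cafe_b"),
    ("u1", 20, "Pasadena", "club_c"),
]

triples = to_triples(checkins)
# Distinct spatiotemporal patterns act as the relation vocabulary.
patterns = sorted({tl for _, tl, _ in triples})
print(triples[0], len(patterns))
```

Each distinct pattern then plays the role of a relation type in the knowledge graph view.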
3.1 STA model
For the location-based recommendation problem, we focus on the connections between users and POIs corresponding to the spatiotemporal relations. Intuitively, if a POI $v$ is often visited by similar users in location $l$ at time $t$, the probability of a query user visiting $v$ with the same spatiotemporal relation will be high. On the other hand, users similar in the entity space may visit different POIs under distinct temporal and geographic conditions. In order to capture the strong correlations of users and POIs to the spatiotemporal patterns, we generalize the TransR technique [?] to fit the POI recommendation task. The basic idea is that a user $u$ will reach an interested POI $v$ via a translation edge $tl$, i.e., $\mathbf{u} + \mathbf{tl} \approx \mathbf{v}$. Fig. 1 illustrates the impact of $tl$ patterns.
In Fig. 1, suppose $u_1$, $u_2$, and $u_q$ are three university students, with $u_q$ and $u_1$ taking the same courses, and $u_q$ and $u_2$ sharing a dormitory. Given two patterns $tl_1$ and $tl_2$, the query user $u_q$ will be translated into two POIs $v_{q1}$ and $v_{q2}$; hence we should recommend for $u_q$ the POI $v_1$ in the lower left subfigure and $v_2$ in the lower right subfigure, which are the close neighbors of $v_{q1}$ and $v_{q2}$, respectively. The different recommendation results $v_1$ and $v_2$ are caused by the effects of the different spatiotemporal relations $tl_1$ and $tl_2$.
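We now introduce the details of STA. For each triple $(u, \langle t, l \rangle, v)$ in the dataset, the user $u$, the spatiotemporal pair $\langle t, l \rangle$ ($tl$ in short), and the POI $v$ correspond to the head entity $h$, the relationship edge $r$, and the tail entity $t$ in TransR, respectively. Their embeddings are denoted as $\mathbf{u}$, $\mathbf{tl}$, and $\mathbf{v}$. For each spatiotemporal pair $tl$, we set a projection matrix $\mathbf{M}_{tl}$ to project a user embedding $\mathbf{u}$ and a POI embedding $\mathbf{v}$ in the original entity space to $\mathbf{u}_{tl} = \mathbf{u}\mathbf{M}_{tl}$ and $\mathbf{v}_{tl} = \mathbf{v}\mathbf{M}_{tl}$ in the relation space, such that $\mathbf{u}_{tl} + \mathbf{tl} \approx \mathbf{v}_{tl}$. This indicates that a POI embedding $\mathbf{v}_{tl}$ should be the nearest neighbor of $\mathbf{u}_{tl} + \mathbf{tl}$. Hence the score function can be defined as:
$$ s_{tl}(u, v) = \| \mathbf{u}_{tl} + \mathbf{tl} - \mathbf{v}_{tl} \|_2^2 $$ (1)
Given the score function defined in Eq. 1 for a triple $(u, tl, v)$, the entire objective function for training is as follows.
$$ L = \sum_{(u, tl, v) \in T} \sum_{(u', tl, v') \in T'} \max\bigl(0,\; s_{tl}(u, v) + \gamma - s_{tl}(u', v')\bigr) $$ (2)
where $\max(a, b)$ returns the maximum of $a$ and $b$, $\gamma$ is the margin, and $T$ and $T'$ are the sets of correct and corrupted triples, respectively. The corrupted triples are generated by replacing the head and tail entities in correct triples using the same sampling method as that in [?].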
We adopt stochastic gradient descent (SGD) in mini-batch mode to minimize the objective function in Eq. 2. A small set of triplets is sampled from the training data. For each such triplet, we sample its corresponding incorrect triplets. All the correct and incorrect triplets are put into a mini-batch. We compute the gradient and update the parameters after each mini-batch. When the number of iterations reaches a predefined value, we obtain the embeddings for all users, POIs, and spatiotemporal patterns.
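As a rough illustration of how a mini-batch contributes to the margin-based objective, the sketch below corrupts each correct triple and accumulates the hinge term. It is a simplification under stated assumptions: the projection matrix is omitted (a TransE-style score), corruption replaces the user or the POI uniformly at random, and all names and toy embeddings are hypothetical. It computes the loss only and does not perform the gradient update.

```python
import random

def score(u, tl, v):
    """Squared L2 norm of u + tl - v (projection omitted for brevity)."""
    return sum((a + b - c) ** 2 for a, b, c in zip(u, tl, v))

def corrupt(triple, users, pois, rng):
    """Replace either the user (head) or the POI (tail) by a different one."""
    u, tl, v = triple
    if rng.random() < 0.5:
        return (rng.choice([x for x in users if x != u]), tl, v)
    return (u, tl, rng.choice([x for x in pois if x != v]))

def margin_loss(batch, emb, users, pois, margin, rng):
    """Sum of max(0, s(correct) + margin - s(corrupted)) over a mini-batch."""
    total = 0.0
    for triple in batch:
        bad = corrupt(triple, users, pois, rng)
        s_pos = score(*(emb[x] for x in triple))
        s_neg = score(*(emb[x] for x in bad))
        total += max(0.0, s_pos + margin - s_neg)
    return total

# Hypothetical toy embeddings keyed by entity or pattern identifier.
tl = ("noon", "campus")
emb = {
    "u1": [0.0, 0.0], "u2": [5.0, 5.0],
    tl: [1.0, 0.0],
    "cafeteria": [1.0, 0.0], "mall": [9.0, 9.0],
}
users, pois = ["u1", "u2"], ["cafeteria", "mall"]
rng = random.Random(0)

good_loss = margin_loss([("u1", tl, "cafeteria")], emb, users, pois, 1.0, rng)
bad_loss = margin_loss([("u1", tl, "mall")], emb, users, pois, 1.0, rng)
print(good_loss, bad_loss)
```

A triple that already satisfies the translation with a margin contributes nothing, while a violated triple yields a positive loss that gradient descent would reduce.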
3.2 Recommendation Using STA
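Once we have learned the embeddings, given a query user $u_q$ with the query time $t_q$ and location $l_q$, i.e., $q = (u_q, t_q, l_q)$, we first combine $t_q$ and $l_q$ as a spatiotemporal pattern $tl_q$, and then we can get the potential POI $v_q$ using Eq. 3.
$$ \mathbf{v}_q = \mathbf{u}_q \mathbf{M}_{tl_q} + \mathbf{tl}_q $$ (3)
where $\mathbf{M}_{tl_q}$ is the projection matrix of $tl_q$. The learned POI embedding $\mathbf{v}_q$ naturally reflects the user’s preference, because it encodes the user’s past activities in $\mathbf{u}_q$. It also captures the geographic and temporal influence in $\mathbf{tl}_q$.
For each POI $v \in V$, we compute its distance to the POI $v_q$ in the normed linear space as defined in Eq. 4, and then select the k POIs with the smallest ranking scores as recommendations.
$$ d(v) = \| \mathbf{v} \mathbf{M}_{tl_q} - \mathbf{v}_q \| $$ (4)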
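The recommendation step can be sketched as a translate-then-rank procedure. The sketch below assumes that both the user and the candidate POIs are projected with the matrix of the query pattern and that the Euclidean norm is used; the function names and toy embeddings are hypothetical.

```python
def project(x, m):
    """Multiply a row vector x by a projection matrix m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def recommend(u_q, tl_q, m_tl, poi_emb, k):
    """Return the k POI ids closest to the translated query point."""
    u_proj = project(u_q, m_tl)
    v_q = [a + b for a, b in zip(u_proj, tl_q)]  # translated user, cf. Eq. 3

    def dist(v):
        v_proj = project(v, m_tl)
        return sum((a - b) ** 2 for a, b in zip(v_proj, v_q)) ** 0.5  # cf. Eq. 4

    return sorted(poi_emb, key=lambda pid: dist(poi_emb[pid]))[:k]

# Hypothetical toy data: identity projection, three candidate POIs.
identity = [[1.0, 0.0], [0.0, 1.0]]
poi_emb = {"a": [1.0, 1.0], "b": [2.0, 2.0], "c": [5.0, 5.0]}
top2 = recommend([0.0, 0.0], [1.0, 1.0], identity, poi_emb, k=2)
print(top2)
```

A full sort is used here for clarity; for a large POI set a partial selection such as `heapq.nsmallest` would avoid sorting every candidate.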
We would like to emphasize how our way of computing $v_q$ and recommending POIs differs from those in [?; ?]. First, we can find an explicit POI $v_q$ directly from the latent space through the translation of the embedding of the spatiotemporal pattern on the user’s embedding, while others compute an implicit $v_q$ by its distance/similarity to the user $u_q$. Second, since the embeddings for POIs in $V$ are also from the same space, we can choose the ones which are the closest neighbors of $v_q$ in this space. This indicates that our recommended POIs are semantically consistent with the query user’s interested POI $v_q$.
3.3 Dealing with Cold Start POIs
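Considering the cold start POIs, which contain geographic and content information like tags but do not have any check-ins [?], we can simply extend our model to include the POI-POI relationship through the translation of content patterns. We call this model STA-C. The rationale is that, if two POIs share a common tag or location, there will be a high degree of similarity between them, and their vector representations should be close to each other. Based on this observation, we define the score function as follows:
$$ s_{wl}(v, s) = \| \mathbf{v}_{wl} + \mathbf{wl} - \mathbf{s}_{wl} \|_2^2 $$ (5)
where $s$ is a POI sharing at least one ⟨word, location⟩ pair $wl$ with POI $v$, and $\mathbf{v}_{wl}$ and $\mathbf{s}_{wl}$ are the projections of the two POI embeddings into the relation space of $wl$. The objective function for cold start POIs is defined as:
$$ L_C = \sum_{(v, wl, s) \in T_C} \sum_{(v', wl, s') \in T'_C} \max\bigl(0,\; s_{wl}(v, s) + \gamma - s_{wl}(v', s')\bigr) $$ (6)
where $\gamma$ is the margin, and $T_C$ and $T'_C$ are the sets of correct and corrupted $(v, wl, s)$ triples, respectively.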
We once again use stochastic gradient descent to minimize the objective function $L_C$ in Eq. 6. The only difference is the sampling procedure. For STA-C, since we have two types of edges, we sample the triplets $(u, tl, v)$ and $(v, wl, s)$ and their corresponding incorrect triplets alternately to update the model.
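One simple way to realize the alternating sampling is to switch the source of the mini-batch at every step. The sketch below is an assumption about how such a schedule could look; the generator name, batch size, and toy triples are hypothetical, and the generation of corrupted triples is left out.

```python
import itertools
import random

def alternating_batches(user_triples, content_triples, batch_size, steps, seed=0):
    """Yield mini-batches drawn in turn from the two kinds of triples."""
    rng = random.Random(seed)
    sources = itertools.cycle([("tl", user_triples), ("wl", content_triples)])
    for _ in range(steps):
        kind, pool = next(sources)
        yield kind, rng.sample(pool, min(batch_size, len(pool)))

# Hypothetical toy triples of both edge types.
user_triples = [("u1", ("noon", "r1"), "p1"), ("u2", ("night", "r2"), "p2")]
content_triples = [("p1", ("coffee", "r1"), "p3"), ("p2", ("bar", "r2"), "p4")]

kinds = [kind for kind, _ in alternating_batches(user_triples, content_triples, 1, 4)]
print(kinds)
```

Each yielded batch would then be extended with its corrupted counterparts before the parameter update.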
Our STA-C model, proposed for dealing with cold start POIs, can also be applied to the normal POI recommendation problem. However, it requires that the POIs contain content information. For datasets without such information, like Gowalla, STA-C is not applicable. Hence we only treat it as an extended model. Please also note that it is STA-C that uses the same information as GE does. Our standard STA model, on the other hand, uses less information than GE because it does not include the contents of POIs.
4 Experimental Evaluation
In this section, we first introduce the experimental setup and then compare our experimental results with those of baselines. Finally we show the performance of our method for addressing the data sparsity and cold start problem.
4.1 Experimental Setup
Datasets We evaluate our methods on two real-life LBSN datasets: Foursquare and Gowalla. A number of researchers have conducted experiments on data collected from these two social networks [?; ?; ?; ?; ?]. However, many of these datasets are collected from various regions or in different time spans. For a fair comparison with GE, we use the publicly available version (https://sites.google.com/site/dbhongzhi) provided by the authors of [?].
The two datasets have different scales such as geographic ranges, the number of users, POIs, and checkins. Hence they are good for examining the performance of algorithms on various data types. Their statistics are listed in Table 2.
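Table 2: Statistics of the two datasets.
| | Foursquare | Gowalla |
|---|---|---|
| # of users | 114,508 | 107,092 |
| # of POIs | 62,462 | 1,280,969 |
| # of check-ins | 1,434,668 | 6,442,892 |
| # of std time slots | 24 | 24 |
| # of locations | 5,846 | 200 |
| # of ⟨t, l⟩ patterns | 28,868 | 3,636 |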
Each check-in is stored as user-ID, POI-ID, POI-location in the form of latitude and longitude, check-in timestamp, and POI-content (only for Foursquare). In order to get the spatiotemporal patterns ⟨t, l⟩ in Table 2, we use the same discretization method as that in [?], i.e., dividing time into 24 time slots which correspond to 24 hours, and the whole geographical space into a set of regions according to 5,846 administrative divisions (for Foursquare) and 200 regions clustered by a standard k-means method (for Gowalla). We finally get 28,868 and 3,636 ⟨t, l⟩ pairs on Foursquare and Gowalla, respectively.
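The discretization can be sketched as follows. This is a minimal illustration that assumes Unix timestamps interpreted in UTC and region centres that have already been computed (for example by k-means); the function names and the two centroids are hypothetical, and distances are taken naively in degrees.

```python
from datetime import datetime, timezone

def hour_slot(unix_ts):
    """Map a Unix timestamp to one of 24 hourly slots (UTC assumed)."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).hour

def nearest_region(lat, lon, centroids):
    """Index of the closest region centre (squared distance in degrees)."""
    return min(
        range(len(centroids)),
        key=lambda i: (lat - centroids[i][0]) ** 2 + (lon - centroids[i][1]) ** 2,
    )

def to_pattern(unix_ts, lat, lon, centroids):
    """Build the (time slot, region) pattern of a check-in."""
    return (hour_slot(unix_ts), nearest_region(lat, lon, centroids))

# Hypothetical region centres (latitude, longitude).
centroids = [(34.05, -118.24), (40.71, -74.01)]
p = to_pattern(11 * 3600, 34.0, -118.0, centroids)
print(p)
```

For administrative divisions, the region lookup would instead be a table lookup from coordinates to a division identifier.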
Baselines  {GE, STA-E, STA-H} We use GE, the state-of-the-art location-based recommendation approach in [?], as our baseline. GE adopts a graph-based embedding framework. It learns the embeddings based on the POI-POI, POI-Time, POI-Location, and POI-Words graphs. By integrating the sequential, geographical, temporal cyclic, and semantic effects into a shared space, GE effectively overcomes the data sparsity problem and reaches the best performance so far.
We do not compare our method with other existing approaches because GE has already significantly outperformed a number of baselines including JIM [?], PRME [?], and Geo-SAGE [?]. We thus only show our improvements over GE.
Also note that although we choose the TransR technique in knowledge graph embedding to materialize our STA model, the essence of our proposed framework is the translation of ⟨time, location⟩ pairs in the embedding space. This indicates that we do not rely on a specific translation model. Hence we can also use TransE [?] and TransH [?] to realize STA. We denote the resulting methods as the STA-E and STA-H baselines, respectively.
Settings We first organize the quadruples $(u, t, l, v)$ in each dataset by users to get each user’s profile. We then rank the records in each profile according to the check-in timestamps, and finally divide these ordered records into two parts: the first 80% as the training data, and the remaining 20% as the test data. Moreover, the last 10% of the check-in records in the training data are used as a validation set for tuning the hyperparameters. We use accuracy@k ($k \in \{1, 5, 10, 15, 20\}$) as our evaluation metric. All these settings, as well as the way accuracy@k is computed, are the same as those in [?].
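The per-user chronological split and the metric can be sketched as below. The split follows the percentages stated above; the accuracy@k function uses the common hit-ratio definition (the fraction of test cases whose ground-truth POI appears in the top k), which is an assumption, since the exact protocol of [?] is not reproduced here. All names and toy records are hypothetical.

```python
def split_profile(records):
    """Split one user's (timestamp, poi) records into train, validation, test."""
    ordered = sorted(records, key=lambda r: r[0])
    n_train = len(ordered) * 8 // 10          # first 80% for training
    train_all, test = ordered[:n_train], ordered[n_train:]
    n_val = n_train // 10                     # last 10% of training for validation
    cut = n_train - n_val
    return train_all[:cut], train_all[cut:], test

def accuracy_at_k(ranked_lists, truths, k):
    """Fraction of test cases whose true POI is among the top-k recommendations."""
    hits = sum(1 for ranked, truth in zip(ranked_lists, truths) if truth in ranked[:k])
    return hits / len(truths)

# Hypothetical profile with 100 check-ins given in reverse time order.
records = [(ts, f"poi{ts}") for ts in range(99, -1, -1)]
train, val, test = split_profile(records)
print(len(train), len(val), len(test))

ranked_lists = [["a", "b", "c"], ["b", "c", "a"]]
truths = ["b", "a"]
print([accuracy_at_k(ranked_lists, truths, k) for k in (1, 2, 3)])
```

Sorting by timestamp before cutting ensures that the model is always evaluated on check-ins that happen after the ones it was trained on.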
We use the default settings in the original TransR [?] as the parameter settings for our STA model. Specifically, the learning rate, the margin, the mini-batch size, and the embedding dimension follow these defaults, and we traverse all the training data for 1000 rounds.
4.2 Comparison with baselines
For a fair comparison, we implement GE using the same LINE software provided by the authors of [?] on our data divisions. All the parameters for GE are the same as those in [?]. We find a slight difference (less than 1% in accuracy) between the original results and those of our implementation of GE. This is understandable and acceptable considering the randomness in sampling negative edges in LINE and in initializing the cluster centers of regions. All parameters for STA-E and STA-H use the default settings in [?] and [?]. We present the comparison results on Foursquare and Gowalla in Fig. 2 (a) and (b), respectively.
From Fig. 2 (a), it is clear that all our proposed STA-style models significantly outperform GE. For instance, the accuracy@1 for STA, STA-H, and STA-E is 0.307, 0.280, and 0.255, respectively, much better than 0.225 for GE. Similar results can be observed in Fig. 2 (b) on the Gowalla dataset. This clearly demonstrates the effectiveness of our translation-based framework.
While STA shows a drastic improvement over GE for all values of k on Foursquare, the trend is less obvious on Gowalla when k = 15, 20. This is because there is a much smaller number of relations in Gowalla than in Foursquare. As shown in Table 2, Gowalla only has 3,636 relation patterns (⟨t, l⟩ pairs) while Foursquare has 28,868 pairs. Hence the learnt embeddings for entities and relations are worse than those on Foursquare, which leads to less accurate results when k is large.
Besides the improvement over GE, STA outperforms STA-H and STA-E as well. The reason is that TransR can differentiate the entities in the transformed relation space. Nevertheless, we see a less significant improvement of STA over STA-H on Gowalla. This also conforms to the characteristics of the data: the graph of Gowalla is much larger but has fewer $tl$ relation edges than that of Foursquare, and the advantage of TransR over TransE is not obvious on such a dataset.
4.3 Effects of Model Parameters
The effects of embedding dimension d on Foursquare and Gowalla are shown in Table 3 and Table 4, respectively.
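Table 3: Accuracy@k on Foursquare for different embedding dimensions d.
| d | k = 1 | k = 5 | k = 10 | k = 15 | k = 20 |
|---|---|---|---|---|---|
| 70 | 0.281 | 0.376 | 0.409 | 0.433 | 0.451 |
| 80 | 0.294 | 0.384 | 0.417 | 0.445 | 0.462 |
| 90 | 0.300 | 0.390 | 0.425 | 0.459 | 0.476 |
| 100 | 0.307 | 0.393 | 0.434 | 0.461 | 0.483 |
| 110 | 0.311 | 0.407 | 0.439 | 0.463 | 0.486 |
| 120 | 0.312 | 0.407 | 0.439 | 0.464 | 0.486 |

Table 4: Accuracy@k on Gowalla for different embedding dimensions d.
| d | k = 1 | k = 5 | k = 10 | k = 15 | k = 20 |
|---|---|---|---|---|---|
| 70 | 0.355 | 0.432 | 0.474 | 0.503 | 0.527 |
| 80 | 0.358 | 0.436 | 0.478 | 0.508 | 0.530 |
| 90 | 0.359 | 0.439 | 0.482 | 0.509 | 0.535 |
| 100 | 0.361 | 0.445 | 0.486 | 0.511 | 0.539 |
| 110 | 0.361 | 0.445 | 0.488 | 0.513 | 0.540 |
| 120 | 0.361 | 0.445 | 0.488 | 0.513 | 0.540 |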
We can see that the experimental results are not very sensitive to the dimension d. As the dimension increases, the accuracy on Gowalla is almost unchanged, i.e., the improvement is less than 1% in nearly all cases. The accuracy on Foursquare improves slightly with a larger dimension d and eventually becomes stable.
To investigate the effects of the time interval, we divide timestamps in three ways, i.e., splitting time into 24, 7, and 2 time slots, corresponding to the daily, weekly, and weekday/weekend patterns, respectively. Figure 3 shows the effects of the various time intervals.
We observe that the impact of the daily patterns is the most significant on both datasets. In addition, the results for different patterns vary widely, suggesting that the strategy for dividing time into slots is important.
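The three granularities can be expressed as one small slotting function. This sketch assumes Unix timestamps interpreted in UTC, the day of the week for the 7-slot scheme, and Saturday/Sunday as the weekend; the scheme names are hypothetical labels.

```python
from datetime import datetime, timezone

def slot(unix_ts, scheme):
    """Discretize a timestamp under one of three illustrative schemes."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    if scheme == "daily":      # 24 slots: hour of day
        return dt.hour
    if scheme == "weekly":     # 7 slots: day of week, Monday = 0
        return dt.weekday()
    if scheme == "weekend":    # 2 slots: 0 = weekday, 1 = weekend
        return 1 if dt.weekday() >= 5 else 0
    raise ValueError(f"unknown scheme: {scheme}")

thursday = 0                   # 1970-01-01 00:00 UTC was a Thursday
saturday = 2 * 86400           # 1970-01-03 00:00 UTC
print([slot(thursday, s) for s in ("daily", "weekly", "weekend")])
print([slot(saturday, s) for s in ("daily", "weekly", "weekend")])
```

Switching the scheme changes the number of distinct spatiotemporal patterns and therefore the number of relation embeddings to learn.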
4.4 Sensitivity to Data Sparsity
To investigate the sensitivity of STA and GE to data sparsity, we conduct extensive experiments on the two datasets with reduced training data. More precisely, we keep the test data unchanged and randomly remove 5% to 20% of the training data, in steps of 5%. Due to space limitations, we only present the results for a 20% reduction of the training data in Table 5. The trends with other ratios are all alike.
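The reduction protocol amounts to random subsampling of the training records while the test set stays fixed. A minimal sketch, with a hypothetical function name and a fixed seed for repeatability, is:

```python
import random

def reduce_training(train, percent, seed=0):
    """Randomly drop `percent` percent of the training records."""
    rng = random.Random(seed)
    n_drop = len(train) * percent // 100
    return rng.sample(train, len(train) - n_drop)

# Hypothetical training set of 100 records.
train = list(range(100))
sizes = [len(reduce_training(train, p)) for p in (5, 10, 15, 20)]
print(sizes)
```

Using the same seed for every compared method keeps the reduced training sets identical across methods.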


We make the following observations about Table 5.

With the reduction of training data, the accuracy values for STA and GE both decrease. However, STA always achieves the best results at different k values on both datasets.

The reduction of accuracy of our STA model is much smaller than that of GE. For instance, the accuracy@1 of GE decreases from 0.225 to 0.154, showing a 31.69% drop. In contrast, our STA model only has a 20.00% change. This strongly suggests that our model is more robust to data sparsity.

The decline of accuracy on Foursquare is more obvious than on Gowalla. The reason may be that Foursquare is much sparser in users’ check-ins than Gowalla, hence reducing the training data has a greater impact on Foursquare.
4.5 Test for Cold Start Problem
In this experiment, we further compare the effectiveness of our extended STA-C model with GE when addressing the cold-start problem. The cold start POIs are defined as those visited by fewer than 5 users [?]. To test the performance of cold start POI recommendations, we select users who have at least one cold-start check-in as test users. For each test user, we choose his/her check-in records associated with cold-start POIs as test data and the rest as training data. Since there is no content information for POIs in Gowalla, we conduct experiments, just as GE did, only on Foursquare. The results are shown in Fig. 4.
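The construction of the cold-start test set can be sketched as follows. The sketch reads the description literally: only test users contribute records, and whether check-ins of other users are also used for training is not specified, so leaving them out is an assumption. The function name and toy check-ins are hypothetical.

```python
from collections import defaultdict

def cold_start_split(checkins, min_users=5):
    """Split (u, t, l, v) check-ins of test users into train and test parts."""
    visitors = defaultdict(set)
    for u, _, _, v in checkins:
        visitors[v].add(u)
    # Cold-start POIs: visited by fewer than `min_users` distinct users.
    cold = {v for v, us in visitors.items() if len(us) < min_users}
    test_users = {u for u, _, _, v in checkins if v in cold}
    test = [c for c in checkins if c[3] in cold]
    train = [c for c in checkins if c[0] in test_users and c[3] not in cold]
    return cold, train, test

# Hypothetical data: p_hot has five distinct visitors, p_cold has two.
checkins = [(f"u{i}", 12, "r1", "p_hot") for i in range(5)]
checkins += [("u0", 13, "r1", "p_cold"), ("u1", 13, "r2", "p_cold")]
cold, train, test = cold_start_split(checkins)
print(cold, len(train), len(test))
```

Counting distinct users rather than raw check-ins matches the definition of a cold-start POI given above.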
From Fig. 4, it is clear that our proposed STA-C model consistently beats GE when recommending cold start POIs. The superior performance of the STA-C model is due to the translation of the content and geography information $wl$ from an ordinary POI $v$ to a cold start POI $s$. As long as there is an existing $v$ sharing one ⟨word, location⟩ pair with $s$, our STA-C model can get a translation for $s$. In contrast, GE utilizes the bipartite graphs of POI-Word and POI-Location. The weight of an edge in the graph is calculated from the TF-IDF value of the word or the frequency of the location, and the probability of sampling an edge is proportional to its weight. Since there are few check-in records for cold start POIs, a word or location edge has an extremely rare chance to be selected and updated. Consequently, the learnt embedding for $s$ will be poor, which further deteriorates the recommendation accuracy.
5 Conclusion
We present a novel spatiotemporal-aware model, STA, for learning representations of users, spatiotemporal patterns, and POIs. The basic idea is to capture the geographic and temporal effects using a ⟨time, location⟩ pair, and then model it as a translation connecting users and POIs. We realize STA using the knowledge graph embedding technique. Our method has two distinguishing advantages. 1) We learn a joint representation for spatiotemporal patterns whose components contribute together to a user’s choice of POIs. 2) The translation mechanism enables the learnt POI embeddings to be in the same semantic space as that of the query POI.
We conduct extensive experiments on two real-life datasets. Our results show that STA achieves state-of-the-art performance in recommendation accuracy. It also significantly outperforms the baselines in terms of effectiveness in addressing both the data sparsity and cold start problems.
Acknowledgments
The work described in this paper has been supported in part by the NSFC project (61572376).
References
 [Bordes et al., 2014] Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. A semantic matching energy function for learning with multi-relational data. Machine Learning, 94(2):233–259, 2014.
 [Chen et al., 2015] Xuefeng Chen, Yifeng Zeng, Gao Cong, Shengchao Qin, Yanping Xiang, and Yuanshun Dai. On information coverage for location category based point-of-interest recommendation. In Proc. of AAAI, pages 37–43, 2015.
 [Cheng et al., 2012] Chen Cheng, Haiqin Yang, Irwin King, and Michael R Lyu. Fused matrix factorization with geographical and social influence in location-based social networks. In Proc. of AAAI, pages 17–23, 2012.
 [Cheng et al., 2013] Chen Cheng, Haiqin Yang, Michael R. Lyu, and Irwin King. Where you like to go next: Successive point-of-interest recommendation. In Proceedings of IJCAI, pages 2605–2611, 2013.
 [Cheng et al., 2016] Chen Cheng, Haiqin Yang, Irwin King, and Michael R Lyu. A unified point-of-interest recommendation framework in location-based social networks. ACM TIST, 8(1):1–21, 2016.
 [Cho et al., 2011] Eunjoon Cho, Seth A. Myers, and Jure Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of SIGKDD, pages 1082–1090, 2011.
 [Feng et al., 2015] Shanshan Feng, Xutao Li, Yifeng Zeng, Gao Cong, Yeow Meng Chee, and Quan Yuan. Personalized ranking metric embedding for next new poi recommendation. In Proc. of 24th IJCAI, pages 2069–2075, 2015.
 [Gao et al., 2015] Huiji Gao, Jiliang Tang, Xia Hu, and Huan Liu. Content-aware point of interest recommendation on location-based social networks. In Proceedings of 29th AAAI, pages 1721–1727, 2015.
 [Lian et al., 2014] Defu Lian, Cong Zhao, Xing Xie, Guangzhong Sun, Enhong Chen, and Yong Rui. Geomf: joint geographical modeling and matrix factorization for point-of-interest recommendation. In Proceedings of SIGKDD, pages 831–840, 2014.
 [Lin et al., 2015] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Proc. of 29th AAAI, pages 2181–2187, 2015.
 [Tang et al., 2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In Proceedings of WWW, pages 1067–1077, 2015.
 [Wang et al., 2014] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In Proc. of 28th AAAI, pages 1112–1119, 2014.
 [Wang et al., 2015] Weiqing Wang, Hongzhi Yin, Ling Chen, Yizhou Sun, Shazia Sadiq, and Xiaofang Zhou. Geosage: A geographical sparse additive generative model for spatial item recommendation. In Proceedings of SIGKDD, pages 1255–1264, 2015.
 [Xie et al., 2016] Min Xie, Hongzhi Yin, Hao Wang, Fanjiang Xu, Weitong Chen, and Sen Wang. Learning graph-based poi embedding for location-based recommendation. In Proc. of CIKM, pages 15–24, 2016.
 [Ye et al., 2010] Mao Ye, Peifeng Yin, and Wang-Chien Lee. Location recommendation for location-based social networks. In Proc. of ACM SIGSPATIAL, pages 458–461, 2010.
 [Ye et al., 2011a] Mao Ye, Krzysztof Janowicz, and Wang-Chien Lee. What you are is when you are: the temporal dimension of feature types in location-based social networks. In Proc. of ACM SIGSPATIAL, pages 102–111, 2011.
 [Ye et al., 2011b] Mao Ye, Peifeng Yin, Wang-Chien Lee, and Dik-Lun Lee. Exploiting geographical influence for collaborative point-of-interest recommendation. In Proceedings of SIGIR, pages 325–334, 2011.
 [Yin et al., 2015] Hongzhi Yin, Xiaofang Zhou, Yingxia Shao, Hao Wang, and Shazia Sadiq. Joint modeling of user check-in behaviors for point-of-interest recommendation. In Proceedings of CIKM, pages 1631–1640, 2015.
 [Yin et al., 2016] Hongzhi Yin, Xiaofang Zhou, Bin Cui, Hao Wang, Kai Zheng, and Nguyen Quoc Viet Hung. Adapting to user interest drift for poi recommendation. TKDE, 28(10):2566–2581, 2016.
 [Yuan et al., 2013] Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thalmann. Time-aware point-of-interest recommendation. In Proc. of SIGIR, pages 363–372, 2013.
 [Zheng et al., 2009] Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. Mining interesting locations and travel sequences from gps trajectories. In Proc. of WWW, pages 791–800, 2009.
 [Zhu et al., 2015] Wen-Yuan Zhu, Wen-Chih Peng, Ling-Jyh Chen, Kai Zheng, and Xiaofang Zhou. Modeling user mobility for location promotion in location-based social networks. In Proceedings of SIGKDD, pages 1573–1582, 2015.