# Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data

Susan Athey, David Blei, Robert Donnelly, Francisco Ruiz and Tobias Schmidt Athey: Stanford University, 655 Knight Way, Stanford, CA 94305, athey@stanford.edu. Blei: Columbia University, Department of Computer Science, New York, NY, 10027, david.blei@columbia.edu. Donnelly: Stanford University, 655 Knight Way, Stanford, CA 94305, rodonn@stanford.edu. Ruiz: Columbia University, Department of Computer Science, New York, NY, 10027, fr2392@columbia.edu, and University of Cambridge, Department of Engineering, Cambridge CB2 1PZ, UK. Schmidt: Stanford University, 655 Knight Way, Stanford, CA 94305, tobiass@stanford.edu. The authors are listed in alphabetical order. We are grateful to SafeGraph and Yelp for providing the data, and to Paula Gablenz, Renee Reynolds, Tony Fan, and Arjun Parthipan for exceptional research assistance. We acknowledge generous financial support from Microsoft Corporation, the Sloan Foundation, the Cyber Initiative at Stanford, and the Office of Naval Research. Ruiz is supported by the EU H2020 programme (Marie Skłodowska-Curie grant agreement 706760).
July 17, 2019
###### Abstract

This paper analyzes consumer choices over lunchtime restaurants using data from a sample of several thousand anonymous mobile phone users in the San Francisco Bay Area. The data is used to identify users' approximate typical morning location, as well as their choices of lunchtime restaurants. We build a model where restaurants have latent characteristics (whose distribution may depend on restaurant observables, such as star ratings, food category, and price range), each user has preferences for these latent characteristics, and these preferences are heterogeneous across users. Similarly, each item has latent characteristics that describe users' willingness to travel to the restaurant, and each user has individual-specific preferences for those latent characteristics. Thus, both users' willingness to travel and their base utility for each restaurant vary across user-restaurant pairs. We use a Bayesian approach to estimation. To make the estimation computationally feasible, we rely on variational inference to approximate the posterior distribution, as well as stochastic gradient descent as a computational approach. Our model performs better than more standard competing models such as multinomial logit and nested logit models, in part due to the personalization of the estimates. We analyze how consumers re-allocate their demand after a restaurant closes to nearby restaurants versus more distant restaurants with similar characteristics, and we compare our predictions to actual outcomes. Finally, we show how the model can be used to analyze counterfactual questions such as what type of restaurant would attract the most consumers in a given location.


Where should a new restaurant be located? What type of restaurant would be best in a given location? How close does a competitor need to be to matter? These are examples of questions about product design and product choice. While there is extensive literature on consumer response to prices, there is relatively little attention to firm choices about physical location and product characteristics. Recent trends in digitization have led to the creation of many large panel datasets of consumers, which in turn motivates the development of models that exploit the rich information in the data and provide precise answers to these questions.

Answering many of these questions requires a model that incorporates individual-level heterogeneity in preferences for product attributes and travel time, as these characteristics might vary substantially even within a city. More broadly, understanding individual heterogeneity in travel preferences is a key input for urban planning. To this end, we develop an empirical model of consumer choices over lunchtime restaurants, the Travel-Time Factorization Model (TTFM). TTFM incorporates rich heterogeneity in user preferences for both observed and unobserved restaurant characteristics as well as for travel time. We apply the model to a dataset derived from mobile phone locations for several thousand anonymized mobile phone users in the San Francisco Bay Area; this is the first structural model of individual travel choice based on mobile location data.

TTFM can answer counterfactual questions. For example, what would happen if a restaurant with a given set of characteristics opened or closed in a particular location? Using data about several hundred openings and closings of restaurants, we compare TTFM’s predictions to the real outcomes. TTFM can also make personalized predictions for individuals and restaurants. Its personalized predictions are more accurate than existing methods, especially for high-activity individuals and popular restaurants.

TTFM incorporates recently developed approaches from machine learning for estimating models with a large number of latent variables. It uses a standard discrete choice framework to model each user’s choice over restaurants, inferring the parameters of the users’ utility functions from their choice behavior. TTFM differs from more traditional models in the number of latent variables; it incorporates a vector of latent characteristics for each restaurant as well as latent user preferences for these characteristics. In addition, it incorporates heterogeneous user preferences for travel distance, which vary by restaurant. These distance preferences are represented as the inner product of restaurant-specific factors and user willingness to travel to restaurants with those factors. Finally, TTFM is a hierarchical model, where observable restaurant characteristics affect the distribution of latent restaurant characteristics. We use a Bayesian approach to inference, where we estimate posterior distributions over each user’s preferences and each restaurant’s characteristics. The posterior is complex and the dataset is large. Thus, to make the estimation computationally feasible, we rely on stochastic variational inference to approximate the posterior distribution with a stochastic gradient optimization algorithm.
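The estimation machinery described above, a variational approximation optimized with stochastic gradients, can be illustrated on a toy problem. The sketch below is a minimal illustration, not the paper's estimator: it uses an invented conjugate-Gaussian example (so the exact posterior is known) and fits a Gaussian variational family by stochastic gradient ascent on the ELBO via the reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate problem so the exact answer is known:
# prior z ~ N(0, 1), likelihood x_n ~ N(z, 1).
# Exact posterior: N(sum(x) / (N + 1), 1 / (N + 1)).
x = rng.normal(2.0, 1.0, size=100)
N = len(x)

# Variational family q(z) = N(m, exp(log_s)^2); maximize the ELBO by
# stochastic gradient ascent with the reparameterization z = m + s * eps.
m, log_s = 0.0, 0.0
lr = 0.01
for _ in range(20000):
    s = np.exp(log_s)
    eps = rng.normal()
    z = m + s * eps
    dlogp_dz = np.sum(x - z) - z            # d/dz [log lik + log prior]
    grad_m = dlogp_dz                       # chain rule: dz/dm = 1
    grad_log_s = dlogp_dz * s * eps + 1.0   # + d(Gaussian entropy)/d(log_s)
    m += lr * grad_m / N
    log_s += lr * grad_log_s / N

post_mean = x.sum() / (N + 1)
post_sd = (1.0 / (N + 1)) ** 0.5
```

After optimization, the variational mean and standard deviation approach the exact posterior values; the same one-sample gradient estimator, applied to minibatches of users, is what makes posterior inference scale to the full model.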

Our approach builds on a large literature in economics and marketing on estimating discrete choice models of consumer behavior; see Keane (2015) for a survey. It also relates to a decades-old literature in marketing on inferring “product maps” from panel data (Elrod, 1988). Our estimation strategy is drawn from approaches developed in Athey et al. (2017) and Ruiz, Athey and Blei (2017), both of which considered the problem of choosing items from a supermarket, and it also relates to Wan et al. (2017), who take a matrix factorization approach to consumer choice. Though less well-studied, there has also been some work on estimating consumer preferences for travel time, e.g., Neilson (2013)’s study of school choice.

## I Empirical Model and Estimation

We model the consumer’s choice of restaurant conditional on deciding to go out to lunch. We assume that the consumer selects the restaurant that maximizes utility, where the utility of user for restaurant on her -th visit is

$$U_{uit} = \lambda_i + \theta_u^\top \alpha_i + \mu_i^\top \delta_{w_{ut}} - \gamma_u^\top \beta_i \cdot \log(d_{ui}) + \epsilon_{uit},$$

where $w_{ut}$ denotes the week in which trip $t$ happens, and $d_{ui}$ is the distance from user $u$ to restaurant $i$. This gives a parameterized expression for the utility: $\lambda_i$ is an intercept term that captures a restaurant's popularity; $\theta_u$ and $\alpha_i$ are latent vectors that model a user's latent preferences and a restaurant's latent attributes; $\beta_i$ is a vector that captures a restaurant's latent factors for travel distance, and $\gamma_u$ captures a user's willingness to travel to restaurants with those factors; $\mu_i$ and $\delta_w$ are latent vectors of week/restaurant time effects (this allows us to capture varying effects for different parts of the year); and $\epsilon_{uit}$ are error terms, which we assume to be independent and identically Gumbel distributed. We specify a hierarchical model where observable characteristics of restaurants affect the mean of the distribution of the latent restaurant characteristics $\alpha_i$ and $\beta_i$. This hierarchy allows restaurants to share statistical strength, which helps to infer the latent variables of low-frequency restaurants. We estimate the posterior over the latent model parameters using variational inference. Our approach is similar to Ruiz, Athey and Blei (2017), but differs in a few respects. First, we assume that each consumer chooses only one restaurant on a purchase occasion, so interactions among products are not important. Second, TTFM is hierarchical, allowing observed restaurant characteristics to affect the prior distribution of latent variables. (See Appendix A.A3 for details.)
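With Gumbel errors, choice probabilities are a softmax over the deterministic part of the utilities. A minimal sketch (hypothetical dimensions, simulated latent vectors, and week effects omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_rest, K, Kd = 5, 8, 3, 2            # hypothetical sizes

lam = rng.normal(size=n_rest)                  # restaurant intercepts
theta = rng.normal(size=(n_users, K))          # user preference factors
alpha = rng.normal(size=(n_rest, K))           # restaurant latent attributes
gamma = rng.normal(size=(n_users, Kd))         # user travel-tolerance factors
beta = rng.normal(size=(n_rest, Kd))           # restaurant distance factors
dist = rng.uniform(0.2, 5.0, size=(n_users, n_rest))  # miles

# Deterministic utility:
# lambda_i + theta_u' alpha_i - (gamma_u' beta_i) * log(d_ui)
V = lam + theta @ alpha.T - (gamma @ beta.T) * np.log(dist)

# Gumbel errors imply multinomial-logit probabilities (row-wise softmax).
P = np.exp(V - V.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)
```

Each row of `P` gives one user's choice probabilities over all restaurants; the inner product `gamma @ beta.T` is what lets distance sensitivity vary by user-restaurant pair.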

For comparison, we also consider a simpler model, a standard multinomial logit model (MNL), which is a restricted version of our proposed model: the intercept $\lambda_i$ is constant across restaurants, $\alpha_i$ is set to be equal to the observable characteristics of items, $\theta_u$ is constant across users, the time-effect term $\mu_i^\top \delta_{w_{ut}}$ is omitted (including it created problems with convergence of the estimation), and the distance coefficient $\gamma_u^\top \beta_i$ is restricted to be constant across users and restaurants.

## II The Data and Summary Statistics

The dataset is from SafeGraph, a company that aggregates anonymized location information from consumers who have opted into sharing their location through mobile applications. The data consists of “pings” from consumer phones; each observation includes a unique device identifier that we associate with a single anonymous consumer, the time and date of the ping, and the latitude, longitude, and accuracy of the ping, over a sample period from January through October 2017.

From this data, we construct the key variables for our analysis. First, we construct the approximate “typical” morning location of the consumer, defined as the most common place the consumer is found from 9:00 to 11:15 a.m. on weekdays. We restrict attention to consumers whose morning locations are consistent over the sample period, and for which these locations are in the Peninsula of the San Francisco Bay Area (roughly, South San Francisco to San José, excluding the mountains and coast). We determine that the consumer visited a restaurant for lunch if we observed at least two pings more than 3 minutes apart during the hours of 11:30 a.m. to 1:30 p.m. in a location that we identify as a restaurant. Restaurants are identified using data from Yelp that includes geo-coordinates, star ratings, price range, and restaurant categories (e.g., Pizza or Chinese); we also use Yelp to infer approximate dates of restaurant openings and closings. Last, we narrow the dataset to consumer choices over a subset of restaurants that appear sufficiently often in the data, and to consumers who visit a sufficient number of restaurants. This process results in a final dataset of 106,889 lunch visits by 9,188 users to 4,924 locations. Table 1 provides summary statistics on the users and restaurants included in the dataset. (Appendix A.A2 gives all details about the dataset processing pipeline.)
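The “typical morning location” construction can be sketched as follows; the ping records and the helper below are purely illustrative, not SafeGraph's schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical ping records: (device_id, timestamp, geohash7 of the ping).
pings = [
    ("u1", datetime(2017, 3, 6, 9, 30), "9q9jh0m"),
    ("u1", datetime(2017, 3, 6, 10, 5), "9q9jh0m"),
    ("u1", datetime(2017, 3, 7, 9, 45), "9q9jh0m"),
    ("u1", datetime(2017, 3, 7, 10, 50), "9q9hvgk"),
    ("u1", datetime(2017, 3, 11, 10, 0), "9q9hvgk"),  # Saturday: ignored
]

def morning_location(pings, device):
    """Most common geohash7 among weekday pings between 9:00 and 11:15 a.m."""
    counts = Counter(
        gh for d, t, gh in pings
        if d == device
        and t.weekday() < 5                        # Monday through Friday
        and (9, 0) <= (t.hour, t.minute) <= (11, 15)
    )
    return counts.most_common(1)[0][0] if counts else None
```

For user `u1` above, the modal weekday-morning geohash7 is `9q9jh0m`, so that cell would be taken as the typical morning location.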

## III Estimation and Model Fit

We divide the dataset into three parts: 70.6 percent training, 5.0 percent validation, and 24.4 percent testing. We use the validation dataset to select parameters such as the lengths of the latent vectors $\theta_u$ and $\gamma_u$, choosing the dimensions that perform best on validation data, while we compare models and evaluate performance on the test dataset. (See Section A.A4 for details.) In the hierarchical prior, the distribution of a restaurant's components depends on price range, star ratings, and restaurant category.

Across several measures evaluated on the test set, TTFM is a better model than MNL. For example, precision@5 is the percentage of times that a user's chosen restaurant is in the set of the top five predicted restaurants. It is 35% for TTFM and 11% for MNL. Further, as shown in Figures A6 and A6, TTFM predictions improve significantly for high-frequency users and restaurants, while MNL does not exhibit that improvement. This highlights the benefits of personalization: When given enough data, TTFM learns user-specific preferences.
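The precision@5 metric generalizes to precision@k and can be computed as below; the scores and choices here are invented for illustration.

```python
def precision_at_k(predicted_scores, chosen, k=5):
    """Fraction of choice occasions on which the chosen item is among
    the k highest-scoring items under the model."""
    hits = 0
    for scores, choice in zip(predicted_scores, chosen):
        top_k = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
        hits += choice in top_k
    return hits / len(chosen)

# Hypothetical example: 3 choice occasions, 6 restaurants each.
scores = [
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.0],
    [0.7, 0.1, 0.6, 0.2, 0.8, 0.4],
    [0.2, 0.3, 0.1, 0.4, 0.5, 0.6],
]
# precision_at_k(scores, [3, 1, 2], k=2) -> 1/3: only the first occasion
# has its chosen restaurant (item 3) among the top two predictions.
```

In the paper's evaluation each row of scores would be a user's predicted utilities over the restaurant set on one test-set lunch occasion.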

Figure 2 illustrates that both TTFM and MNL fit well the empirical probability of visiting restaurants at varying distances from the consumer’s morning location. But Figure 2 shows that TTFM outperforms MNL at fitting the actual visit rates of different restaurants; here restaurants are grouped by their visit-frequency deciles. The rich heterogeneity of TTFM allows personalized predictions for restaurants.

## IV Parameter Estimates

The distributions of estimated elasticities from TTFM are summarized in Table A2 and Figure A7. Note that the elasticities in the MNL vary only because the baseline visit probabilities vary across consumers and restaurants. TTFM elasticities are more dispersed, reflecting the personalization capabilities of the TTFM model. The average elasticity across consumers and restaurants (weighted by trip frequency) is . Thus, distance matters substantially for lunch, which is consistent with the fact that roughly 60 percent of visits are within two miles of the consumer’s morning location. Furthermore, there is substantial heterogeneity in that willingness to travel. Across users and restaurants, the standard deviation of elasticities in the TTFM model is 0.68, while the average within-user standard deviation of elasticities is 0.30 and the average within-restaurant standard deviation of elasticities is 0.60. Elasticities are substantially less dispersed in the MNL model.
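The dispersion summaries used here (overall, average within-user, and average within-restaurant standard deviations of elasticities) can be computed as below. The elasticity matrix is simulated with hypothetical values; in the paper these would come from the fitted TTFM posterior.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical user-by-restaurant matrix of distance elasticities.
E = rng.normal(-1.6, 0.7, size=(200, 50))

overall_sd = E.std()                         # dispersion across all pairs
within_user_sd = E.std(axis=1).mean()        # avg sd across restaurants, per user
within_restaurant_sd = E.std(axis=0).mean()  # avg sd across users, per restaurant
```

Comparing the within-user and within-restaurant numbers to the overall dispersion is what separates heterogeneity in a user's tastes across restaurants from heterogeneity across the user population for a given restaurant.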

Tables 3 and 4 and Figure 3 illustrate how elasticities vary across restaurant types and cities. Willingness to travel is lower for low-priced restaurants (price range $, under $10) than for mid-priced restaurants (price range $$, $11–$30), and lower for Mexican restaurants and Pizza places than for Chinese and Japanese restaurants. Cities with many work locations near retail districts, including San José, Sunnyvale, and Mountain View, have a lower willingness to travel than cities that are more spread out, like Daly City, Burlingame, San Bruno, and San Mateo. Appendix Section A.A5 provides further descriptive statistics about latent factors and model results, illustrating for example how the model can be used to find restaurants that are intrinsically similar (without regard to location) as well as which restaurants are similar in terms of user utilities.

## V Analyzing Restaurant Opening and Closing

The TTFM model can make predictions about how market share will be redistributed among restaurants when restaurants open or close, and these predictions can be compared to the actual changes that occur in practice. For this exercise, we focus on 221 openings and 190 closings where, both before and after the change, there were at least 500 restaurant visits by users with morning locations within a 3 mile radius of the relevant restaurant. Figure A3 illustrates that restaurant openings and closings are fairly evenly distributed over the time period.

One challenge of analyzing market share redistribution is that, for any given target restaurant that opens or closes, we would expect some baseline level of market share change among competing restaurants due to changes in the open status of neighboring restaurants. We address this in an initial exercise where we hold the environment fixed in the following way. For each target restaurant that changed status, we first construct the predicted difference in market shares for each other restaurant between the “closed” and “open” regime (irrespective of which came first in time), and then subtract out the predicted change in market share that would have occurred for each restaurant if the target restaurant had been closed in both periods. We then sum the changes across restaurants in different groups defined by their distance from the target restaurant. Table 5 shows TTFM model predictions for how the opening/closing restaurant's market share is redistributed over other restaurants within certain distances after the restaurant becomes unavailable (i.e., before the opening or after the closing). The TTFM model estimates imply that just over 50 percent of the market share impact of a closure accrues to restaurants within 2 miles of the target restaurant.
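The redistribution-by-distance bookkeeping can be sketched as follows. This simplified version compares predicted competitor shares under the “open” and “closed” regimes and sums the differences within distance bands; the baseline correction described above is omitted, and all shares and distances are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40                                   # competing restaurants (hypothetical)
dist = rng.uniform(0.1, 6.0, size=n)     # miles from the target restaurant

# Model-predicted market shares of competitors in each regime (simulated):
# with the target open it absorbs 3 percent of the market here.
share_open = rng.dirichlet(np.ones(n)) * 0.97
share_closed = rng.dirichlet(np.ones(n))

# Net redistribution: closed-minus-open share difference per competitor,
# summed within distance bands from the target.
delta = share_closed - share_open
bands = [(0, 1), (1, 2), (2, 3), (3, np.inf)]
redistribution = {
    f"{lo}-{hi} mi": delta[(dist >= lo) & (dist < hi)].sum()
    for lo, hi in bands
}
```

By construction the band totals sum to the target's lost share (3 percent here), so each entry can be read as the fraction of the target's demand absorbed at that distance.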

Figure 4 compares the actual changes in market share that occurred against the predictions of the TTFM model. Baseline changes unrelated to the opening and closing of the target restaurants seem to dominate both the actual and predicted market share changes in the figure. The figure shows that our model's predictions match the actual changes well, but there is substantial variation in the changes observed in the actual data, making it difficult to evaluate model performance using this exercise.

Our final exercise considers the best choice of restaurant type for a location. For the set of restaurants that open or close, we look at how the demand for the restaurant that changed status (the “target restaurant”) compares to the counterfactual demand the model predicts in the scenario where a different restaurant in our sample (as described by its mean latent characteristics) is placed in the location of the target restaurant. For each target, we consider a set of 200 alternative restaurants, 100 from the same category as the target restaurant and 100 from a different category. (These alternatives are sampled with equal probabilities from the set of restaurants in our sample.) We then compare the target restaurant's estimated market share to the mean demand across the set of alternatives. In Table 6, we see that both the restaurants that opened and those that closed on average have higher predicted demand than either group of alternatives. However, the restaurants that opened appear to be in more valuable locations, since for the 200 alternative restaurants, we predict higher average demand if they were (counterfactually) placed at the opening locations than at the locations of closing restaurants. As a further comparison, we split the set of alternatives into groups based on whether or not they are in the same broad category as the restaurant that opened or closed. We find that alternative restaurants from the same category as the target would perform better on average than alternatives from a different category.
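The counterfactual placement exercise can be sketched as follows: hold a site's user distances and all competitors' utilities fixed, and swap in a different restaurant's latent attributes. All quantities below are simulated, and distance sensitivity is a user-level scalar here rather than the factorized user-restaurant distance term.

```python
import numpy as np

rng = np.random.default_rng(4)
n_users, K = 100, 3

theta = rng.normal(size=(n_users, K))        # user preference factors
gamma = rng.normal(1.0, 0.2, size=n_users)   # user distance sensitivities

def predicted_share(alpha, lam, log_dist, other_V):
    """Mean choice probability of a restaurant with latent attributes
    (alpha, lam) placed at a site with user log-distances log_dist,
    holding competitor utilities other_V fixed."""
    V = lam + theta @ alpha - gamma * log_dist
    all_V = np.column_stack([V, other_V])
    P = np.exp(all_V - all_V.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P[:, 0].mean()

# Counterfactual: same site (same distances, same 30 incumbents), two
# different candidate restaurants described by their latent attributes.
log_dist = np.log(rng.uniform(0.2, 5.0, size=n_users))
other_V = rng.normal(size=(n_users, 30))
target = predicted_share(rng.normal(size=K), 0.5, log_dist, other_V)
alt = predicted_share(rng.normal(size=K), 0.0, log_dist, other_V)
```

Repeating the `alt` calculation over a sampled set of alternatives and averaging is the analogue of the 200-alternative comparison in Table 6.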

## VI Ideal Locations and Ideal Restaurant Types

In this section, we consider the match between restaurant characteristics and locations. In each geohash6, we select one restaurant location at random and use the TTFM model to predict what the total demand would have been if a different restaurant had been located in its place. The set of alternative restaurants was chosen to include one restaurant from each of the major categories in the sample. (From each category, we randomly selected one restaurant whose market share is within standard deviation of the mean market share in the full sample.)

In Figure A9, we examine which locations are predicted to provide the largest demand in the lunch market for each restaurant category. We can see for example that Vietnamese restaurants are predicted to have the highest demand in a dense region in the southeastern portion of the map. The demand for Filipino restaurants is relatively diffuse, whereas the demand for sandwiches is characterized by small but dense pockets of relatively high demand.

In Figure A10, we group the restaurant categories into coarse groups based on the price range and the type of cuisine. We examine within each group which category would have the highest total demand in each location. There is considerable spatial heterogeneity in which restaurant category is predicted to perform best in each location.

## VII Conclusions

This paper makes use of a novel dataset to analyze consumer choice: mobile location data. We propose the TTFM model, a rich model that allows heterogeneity in user preferences for restaurant characteristics as well as for travel time, where preferences for travel time vary across restaurants as well. We show that this model fits the data substantially better than traditional alternatives, and by incorporating recent advances in Bayesian inference, the estimation becomes tractable. We use the model to conduct counterfactual analysis about the impact of restaurants opening and closing, as well as to evaluate how the choice of restaurant characteristics affects market share. More broadly, we believe that with the advent of digitization, panel datasets about consumer location can be combined with rich structural models to answer questions about firm strategy as well as urban policy, and models such as TTFM can be used to accomplish these goals.

## References

• Athey et al. (2017) Athey, Susan, David M. Blei, Robert Donnelly, and Francisco J. R. Ruiz. 2017. “Counterfactual Inference for Consumer Choice Across Many Product Categories.” Unpublished.
• Blei, Kucukelbir and McAuliffe (2017) Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association, 112(518): 859–877.
• Blum (1954) Blum, Julius R. 1954. “Approximation methods which converge with probability one.” The Annals of Mathematical Statistics, 25(2): 382–386.
• Bottou, Curtis and Nocedal (2016) Bottou, L., F. E. Curtis, and J. Nocedal. 2016. “Optimization Methods for Large-Scale Machine Learning.” arXiv:1606.04838.
• Elrod (1988) Elrod, Terry. 1988. “Choice map: Inferring a product-market map from panel data.” Marketing Science, 7(1): 21–40.
• Hoffman et al. (2013) Hoffman, M. D., David M. Blei, C. Wang, and J. Paisley. 2013. “Stochastic Variational Inference.” Journal of Machine Learning Research, 14: 1303–1347.
• Jordan (1999) Jordan, Michael I., ed. 1999. Learning in Graphical Models. Cambridge, MA, USA:The MIT Press.
• Keane (2015) Keane, Michael P. 2015. “Panel Data Discrete Choice Models of Consumer Demand.” In The Oxford Handbook of Panel Data, ed. B. H. Baltagi, Chapter 18, 549–583. Oxford University Press.
• Kingma and Welling (2014) Kingma, Diederik P., and Max Welling. 2014. “Auto-Encoding Variational Bayes.” arXiv:1312.6114.
• Neilson (2013) Neilson, C. 2013. “Targeted vouchers, competition among schools, and the academic achievement of poor students.” Yale University Working Paper.
• Rezende, Mohamed and Wierstra (2014) Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. 2014. “Stochastic backpropagation and approximate inference in deep generative models.” Vol. 32 of Proceedings of Machine Learning Research, 1278–1286. PMLR.
• Robbins and Monro (1951) Robbins, H., and S. Monro. 1951. “A stochastic approximation method.” The Annals of Mathematical Statistics, 22(3): 400–407.
• Ruiz, Athey and Blei (2017) Ruiz, Francisco J. R., Susan Athey, and David M. Blei. 2017. “SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements.” arXiv:1711.03560.
• Titsias and Lázaro-Gredilla (2014) Titsias, M. K., and M. Lázaro-Gredilla. 2014. “Doubly stochastic variational Bayes for non-conjugate inference.” Vol. 32 of Proceedings of Machine Learning Research, 1971–1979. PMLR.
• Wainwright and Jordan (2008) Wainwright, M. J., and M. I. Jordan. 2008. “Graphical Models, Exponential Families, and Variational Inference.” Foundations and Trends in Machine Learning, 1(1–2): 1–305.
• Wan et al. (2017) Wan, Mengting, Di Wang, Matt Goldman, Matt Taddy, Justin Rao, Jie Liu, Dimitrios Lymberopoulos, and Julian McAuley. 2017. “Modeling Consumer Preferences and Price Sensitivities from Large-Scale Grocery Shopping Transaction Logs.” 1103–1112, International World Wide Web Conferences Steering Committee.
• Zhao, Du and Buntine (2017) Zhao, He, Lan Du, and Wray Buntine. 2017. “Leveraging Node Attributes for Incomplete Relational Data.” Vol. 70 of Proceedings of Machine Learning Research, 4072–4081. PMLR.

## A Appendix

This Appendix begins by providing details of the data and dataset creation. Next we provide estimation details. Then, we provide a variety of results about goodness of fit and our model estimates, including summaries of estimated sensitivity to distance broken out by restaurant category and other characteristics. Next, we provide details of our analyses of restaurant openings and closings, as well as counterfactual analyses about the ideal locations of restaurants of different categories.

### A1 Data Description

Our dataset is constructed using data from SafeGraph, a company which aggregates locational information from anonymous consumers who have opted in to sharing their location through mobile applications. The data consists of “pings” from consumer phones; each observation includes a unique device id that we associate with a single consumer; the time and date of the ping; and the latitude and longitude and horizontal accuracy of the ping, all for smartphones in use during the sample period from January through October 2017.

Our second data source is Yelp. From Yelp, we obtained a list of restaurants, locations, ratings, price ranges, and categories, and we infer dates of openings and closings from the dates on which consumers created a listing on Yelp or marked a location as closed, respectively.

### A2 Dataset Creation and Sample Selection

Our area of interest is the corridor from South San Francisco to South San José around I-101 and I-280. We start with a rough bounding box around the area, find all incorporated cities whose area intersects the bounding box and then remove Fremont, Milpitas, Hayward, Pescadero, Loma Mar, La Honda, Pacifica, Montara, Moss Beach, El Granada, Half Moon Bay, Lexington Hills and Colma from the set because they are too far from the corridor.

This leaves us with the following 41 cities: Los Gatos, Saratoga, Campbell, Cupertino, Los Altos Hills, Monte Sereno, Palo Alto, San José, San Bruno, Atherton, Brisbane, East Palo Alto, Foster City, Hillsborough, Millbrae, Menlo Park, San Mateo, Portola Valley, Sunnyvale, Mountain View, Los Altos, Santa Clara, Belmont, Burlingame, Daly City, San Carlos, South San Francisco, Woodside, Redwood City, Alum Rock, Burbank, Cambrian Park, East Foothills, Emerald Lake Hills, Fruitdale, Highlands-Baywood Park, Ladera, Loyola, North Fair Oaks, Stanford and West Menlo Park.

We then take the shapefiles for these cities as provided by the Census Bureau and find the set of rectangular regions known as geohash5s that cover their union. (Geohashes are a system in which the earth is gridded into successively finer sets of rectangles, which are labelled with alphanumeric strings. These strings can then be used to describe geographic information in databases in a form that is easier to work with than latitudes and longitudes. At its coarsest, the geohash1 level, the earth is divided into 32 rectangles whose edges are roughly 3,000 miles long. Each geohash1 is in turn divided into 32 rectangles that are about 800 miles across. The finest geohash resolution used in this paper, geohash8, corresponds to rectangles of size roughly 125 × 60 feet. See http://www.geohash.org/ for further details.) This is our area of interest and is shown in Figure A1.
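The geohash construction can be made concrete with a minimal encoder. This is a sketch of the standard algorithm (interleave longitude and latitude bits, longitude first, and map each 5-bit group to the standard base-32 alphabet); it also demonstrates the nesting property used throughout the paper, namely that a finer geohash extends the coarser one as a string prefix.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision):
    """Minimal geohash encoder: bisect the longitude and latitude ranges
    alternately (longitude first), emitting one bit per bisection and one
    base-32 character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, bit_count, even = 0, 0, True
    result = []
    while len(result) < precision:
        if even:                              # longitude bit
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:                                 # latitude bit
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            result.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(result)
```

Because each character refines the same bisection, truncating a geohash8 to its first seven characters yields the enclosing geohash7, which is how the paper's different geohash resolutions relate.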

To construct our user base we only consider movement pings emitted on weekdays. We define an active week to be one during which a user emits at least one such ping. The user base includes users who meet the following criteria during our sample period, January to October 2017:

• Have an approximate inferred home location as provided by SafeGraph

• Are “active” (defined as having at least 12 — not necessarily consecutive — active weeks)

• Have at least 10 pings in the area of interest on average in active weeks

• At least 80 percent of pings during the hours of 9:00–11:15 a.m. are in the area of interest

• At least 60 percent of pings during the hours of 9:00–11:15 a.m. are in their “broad morning location,” where the “broad morning location” is at the geohash6 level (a rectangle of roughly 0.75 miles × 0.4 miles).

• At least 40 percent of pings during the hours of 9:00–11:15 a.m. are in their “narrow morning location,” where the “narrow morning location” is at the geohash7 level (a square with edge length of roughly 500 feet).

• Have their “broad morning location” in the area of interest

These restrictions give us 32,581 users, which we refer to as our “user base.” We then consider the set of restaurants. We begin with the set of restaurants known to Yelp in the San Francisco Bay Area, which we reduce through the following restrictions:

• Locations are in the area of interest

• Locations belong not just to the category “food” but also belong to certain sub-categories (manually) selected from Yelp’s list (https://www.yelp.com/developers/documentation/v2/category_list): thai, soup, sandwiches, juicebars, chinese, tradamerican, newamerican, bars, breweries, korean, mexican, pizza, coffee, asianfusion, indpak, delis, japanese, pubs, italian, greek, sportsbars, hotdog, burgers, donuts, bagels, spanish, basque, chicken_wings, seafood, mediterranean, portuguese, breakfast_brunch, sushi, taiwanese, hotdogs, mideastern, moroccan, pakistani, vegetarian, vietnamese, kosher, diners, cheese, cuban, latin, french, irish, steak, bbq, vegan, caribbean, brazilian, dimsum, soulfood, cheesesteaks, tapas, german, buffets, fishnchips, delicatessen, tex-mex, wine_bars, african, gastropubs, ethiopian, peruvian, singaporean, malaysian, cajun, cambodian, cafes, halal, raw_food, foodstands, filipino, british, southern, turkish, hungarian, creperies, tapasmallplates, russian, polish, afghani, argentine, belgian, fondue, brasseries, himalayan, persian, indonesian, modern_european, kebab, irish_pubs, mongolian, burmese, hawaiian, cocktailbars, bistros, scandinavian, ukrainian, lebanese, canteen, austrian, scottish, beergarden, arabian, sicilian, comfortfood, beergardens, poutineries, wraps, salad, cantonese, chickenshop, szechuan, puertorican, teppanyaki, dancerestaurants, tuscan, senegalese, rotisserie_chicken, salvadoran, izakaya, czechslovakian, colombian, laos, coffeeshops, beerbar, arroceria_paella, hotpot, catalan, laotian, food_court, trinidadian, sardinian, cafeteria, bangladeshi, venezuelan, haitian, dominican, streetvendors, shanghainese, iberian, gelato, ramen, meatballs, armenian, slovakian, czech, falafel, japacurry, tacos, donburi, easternmexican, pueblan, uzbek, sakebars, srilankan, empanadas, syrian, cideries, waffles, nicaraguan, poke, noodles, newmexican, panasian, acaibowls, honduran, guamanian, brewpubs. (Locations can belong to several categories; a location is included if any category matches.)

This still yields far too broad a list of locations. We thus refine the resulting set of locations by removing:

• The coffee and tea chains Starbucks, Peet’s and Philz Coffee

• All locations whose name matches the regular expression (coffee|tea) but whose name does not start with “coffee”

• All locations whose name matches the regular expression (donut|doughnut) but does not contain “bagel”

• All locations whose name matches the regular expression food court

• All locations whose name matches the regular expression mall

• All locations whose name matches the regular expression market

• All locations whose name matches the regular expression supermarket

• All locations whose name matches the regular expression shopping center

• All locations whose name matches the regular expression (yogurt|ice cream|dessert)

• All locations whose name matches the regular expression cater but does not match the regular expression (and|&) (this is to keep places like “Catering and Cafe” in the sample)

• All locations whose name matches the regular expression truck and who do not have a street address (these are likely to be food trucks that move around)

• A number of “false positives” manually by name (commonly these are grocery stores, festivals or farmers’ markets)

• A number of cafeterias at prominent Bay Area tech companies like Google, VMWare and Oracle

Finally, we review the list of locations that would be removed under these rules and manually save a few handfuls of locations from removal.

Applying these restrictions leaves us with 6,819 locations. As a last step we de-duplicate on geohash8. Some locations are so close together that given our matching method we cannot tell them apart and need to decide which of potentially several locations in a geohash8 we want to assign a visit to. In 4,577 cases there is a unique restaurant in the geohash8, while 687 have two, with the remainder having three or more. We de-duplicate using the first restaurant in alphabetical order, leaving us with 5,555 locations. (One reason to remove San Francisco from the sample is that higher density areas have more duplication.) The resulting restaurants are visualized in Figure A2.
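The de-duplication rule above can be sketched as follows. This is a minimal illustration of the tie-breaking logic only, not the paper’s code; the field names (`geohash8`, `name`) are hypothetical.

```python
# Sketch of the geohash8 de-duplication rule: when several restaurants
# share a geohash8 cell, keep only the first in alphabetical order.
from collections import defaultdict

def deduplicate(restaurants):
    """restaurants: list of dicts with (illustrative) 'geohash8' and 'name' keys."""
    by_cell = defaultdict(list)
    for r in restaurants:
        by_cell[r["geohash8"]].append(r)
    kept = []
    for group in by_cell.values():
        # Keep the alphabetically first restaurant in each geohash8 cell.
        kept.append(min(group, key=lambda r: r["name"]))
    return kept

sample = [
    {"geohash8": "9q9p1xyz", "name": "Blue Cafe"},
    {"geohash8": "9q9p1xyz", "name": "Ace Diner"},
    {"geohash8": "9q9p1abc", "name": "Taqueria"},
]
print(sorted(r["name"] for r in deduplicate(sample)))  # ['Ace Diner', 'Taqueria']
```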

Next, we define a “visit” to a restaurant. For each user, each restaurant, and each day, we count the number of pings in the restaurant’s geohash8 and its immediately adjacent geohash8s, and we compute the dwelltime, defined as the difference between the earliest and the latest ping seen at the location during lunch hour. Call any such match a “visit candidate”. To get from visit candidates to visits, we impose the requirement that there be at least 2 pings in one of the location’s geohash8s and that the dwelltime be at least 3 minutes. We also require that the visit be to a location that has no overlap with either the person’s home geohash7 or the geohash7 we have identified as the person’s narrow morning location, so as to reduce the possibility of mis-identifying people who live near a location, or work at it, as visiting it. In cases where a sequence of pings satisfying these criteria falls into the geohash8s of multiple locations, we attribute the visit to the location for which the dwelltime is longest.
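The filter from visit candidate to visit can be written as a small predicate. This is a sketch under stated assumptions: we read “at least 2 pings in one of the location’s geohash8s” as a per-cell count, and the argument names are illustrative, not from the paper’s pipeline.

```python
# Sketch of the visit rule: a candidate becomes a visit only if some
# geohash8 cell of the location has >= 2 pings and the dwelltime
# (last ping minus first ping, in minutes) is >= 3.
def is_visit(pings_per_geohash8, first_ping_min, last_ping_min):
    """pings_per_geohash8: dict mapping each of the location's geohash8
    cells to its ping count; times are minutes since midnight."""
    dwelltime = last_ping_min - first_ping_min
    return max(pings_per_geohash8.values(), default=0) >= 2 and dwelltime >= 3

# A 2-ping, 4-minute stay in one cell qualifies; scattered single
# pings or a sub-3-minute stay do not.
```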

To put together our estimation dataset, we restrict the above visits to a set of users and restaurants we see sufficiently often. We first require that each user have at least 3 visits during the sample period. We then require that each location have, on average, at least one visit per week from someone in our user base, or at least five visits overall (from all users, not just those in our user base). This leaves us with 106,889 lunch visits by 9,188 users to 4,924 locations.

We also use data from Yelp to infer the dates of restaurant openings and closings. We use the following heuristic: the opening is the date on which a listing was added to the Yelp database, while the closing date is the date on which a restaurant is marked by a member as closed. Figure A3 shows the openings and closings throughout the sample period. We focus on openings and closings of restaurants that are considered by users whose morning location is within 3 miles of the opening/closing restaurant and who collectively take at least 500 lunch visits both before and after the change in status.

#### Distance

As our measure of distance between a user’s narrow morning location and each item in her choice set, we use the simple straight-line (great-circle) distance, taking into account the earth’s curvature. After calculating these distances, we remove from the choice set all alternatives that are farther than 20 miles away.
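The straight-line distance accounting for the earth’s curvature is the standard haversine formula; a minimal sketch (function names and the 20-mile cutoff helper are illustrative):

```python
import math

def straight_line_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points."""
    r = 3958.8  # mean earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def in_choice_set(user_loc, item_loc, cutoff=20.0):
    """Keep only alternatives within the 20-mile radius of the morning location."""
    return straight_line_miles(*user_loc, *item_loc) <= cutoff
```

For example, downtown San Francisco to Palo Alto is roughly 27–28 miles by this measure, so such an alternative would be culled at the 20-mile cutoff.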

#### Item covariates

The following restaurant covariates (or subsets thereof) are used in the estimation of both the MNL and the TTFM:

• rating_in_sample: the average rating awarded during the sample period Jan – Oct 2017. If missing, the value is replaced by the average of rating_in_sample, and another variable, rating_in_sample_missing, indicates that this replacement has been made

• N_ratings_in_sample: the number of ratings that entered the computation of rating_in_sample

• rating_overall: the average all-time rating. If missing, the value is replaced by the average of rating_overall, and another variable, rating_overall_missing, indicates that this replacement has been made

• N_ratings_overall: the number of ratings that entered the computation of rating_overall

• category_mexican – category_dancerestaurants: a number of 0/1 indicator variables for whether an item has the corresponding category associated with it on Yelp

• pricerange: categorical variable indicating the restaurant’s price category, from \$ to \$\$\$\$

### A3 Estimation Details

To estimate the TTFM model, we build on the approach outlined in the appendix of Ruiz, Athey and Blei (2017), and indeed we use the same code base: when we ignore the observable attributes of items, our model is a special case of theirs. Ruiz, Athey and Blei (2017) consider a more complex setting in which shoppers choose bundles of items. When restricted to the choice of a single item, their model is identical to TTFM, with distance in TTFM playing the role of price. However, we treat observable characteristics differently than Ruiz, Athey and Blei (2017). In the latter, observables enter the consumer’s mean utility directly, while in TTFM we incorporate observables by allowing them to shift the mean of the prior distribution of latent restaurant characteristics in a hierarchical model.

We assume that one quarter of latent variables are affected by restaurant price range, one quarter are affected by restaurant categories, one quarter are affected by star ratings, and for one quarter of the latent variables there are no observables shifting the prior.

The TTFM model defines a parameterized utility for each customer and restaurant,

 Uuit=λipopularity+θ⊤uαicustomer preferences−γ⊤uβi⋅log(duit)distance effect+μ⊤iδwuttime-varying effect+ϵuitnoise,

where $U_{uit}$ denotes the utility for the $t$-th visit of customer $u$ to restaurant $i$. This expression defines the utility as a function of latent variables that capture restaurant popularity, customer preferences, distance sensitivity, and time-varying effects (e.g., for holidays). All these factors are important because they shape the probabilities of each choice. Below we describe the latent variables in detail.

Restaurant popularity. The term $\lambda_i$ is an intercept that captures overall (time-invariant) popularity for each restaurant $i$. Popular restaurants will have higher values of $\lambda_i$, which increases their choice probabilities.

Customer preferences. Each customer has her own preferences, which we wish to infer from the data. We represent the customer preferences with a $K_1$-vector $\theta_u$ for each customer $u$. Similarly, we represent the restaurant latent attributes with a vector $\alpha_i$ of the same length. For each choice, the inner product $\theta_u^\top \alpha_i$ represents how aligned the preferences of customer $u$ and the attributes of restaurant $i$ are. This term increases the utility (and consequently, the probability) of the types of restaurants that the customer tends to prefer.

Distance effects. We next describe how we model the effect of the distance from the customer’s morning location to each restaurant. We posit that each customer $u$ has an individualized distance sensitivity for each restaurant $i$, which is factorized as $\gamma_u^\top \beta_i$, where the latent vectors $\gamma_u$ and $\beta_i$ have length $K_2$. Using a matrix factorization approach allows us to decompose the customer/restaurant distance sensitivity matrix into per-customer latent vectors $\gamma_u$ and per-restaurant latent vectors $\beta_i$, both of length $K_2$, therefore reducing the number of latent variables in the model. Thus, the inner product $\gamma_u^\top \beta_i$ indicates the distance sensitivity, which affects the utility through the term $-\gamma_u^\top \beta_i \cdot \log(d_{uit})$. We place a minus sign in front of the distance effect term to indicate that the utility decreases with distance.

Time-varying effects. Taking into account time-varying effects allows us to explicitly model how the utilities of restaurants vary with the seasons or as a consequence of holidays. Towards that end we introduce the latent vectors $\mu_i$ and $\delta_w$, of equal length. For each restaurant $i$ and calendar week $w$, the inner product $\mu_i^\top \delta_w$ captures the variation of the utility of that restaurant in that specific week. Note that each trip $t$ of customer $u$ is associated with its corresponding calendar week, $w_{ut}$.

Noise terms. We place a Gumbel prior over the error (or noise) terms $\epsilon_{uit}$, which leads to a softmax model. That is, the probability that customer $u$ chooses restaurant $i$ in the $t$-th visit is

p(y_{ut} = i) \propto \exp\{\lambda_i + \theta_u^\top \alpha_i - \gamma_u^\top \beta_i \cdot \log(d_{uit}) + \mu_i^\top \delta_{w_{ut}}\},

where $y_{ut}$ denotes the choice.
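The utility and softmax choice probability above can be computed directly. The sketch below uses only the standard library, with toy one-dimensional latent vectors as illustrative inputs (none of the numbers come from the paper’s estimates):

```python
import math

def choice_probs(lam, theta_u, alpha, gamma_u, beta, mu, delta_w, dist_u):
    """Softmax choice probabilities over restaurants for one user-visit.
    lam: list of intercepts lambda_i; alpha, beta, mu: per-restaurant latent
    vectors; theta_u, gamma_u: user vectors; delta_w: week vector;
    dist_u: distance from the user's morning location to each restaurant."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    utils = [
        lam[i] + dot(theta_u, alpha[i])
        - dot(gamma_u, beta[i]) * math.log(dist_u[i])
        + dot(mu[i], delta_w)
        for i in range(len(lam))
    ]
    m = max(utils)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utils]
    z = sum(exps)
    return [e / z for e in exps]
```

With two otherwise identical restaurants at 1 and 2 miles and a positive distance sensitivity, the nearer one receives the higher probability, as the model intends.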

Hierarchical prior. The resulting TTFM model is similar to the Shopper model (Ruiz, Athey and Blei, 2017), which is a model of market basket data. The TTFM is simpler because it does not consider bundles of products, i.e., we restrict the choices to one restaurant at a time, and thus we do not need to include additional restaurant interaction effects.

A key difference between Shopper and the TTFM is how we deal with low-frequency restaurants. To better capture the latent properties of low-frequency restaurants, we make use of observed restaurant attributes. In particular, we develop a hierarchical model to share statistical strength among the latent attribute vectors $\alpha_i$ and $\beta_i$. (We could also consider a hierarchical model over the time effect vectors $\mu_i$, but these are low-dimensional and factorize a smaller restaurant/week matrix, so for simplicity we assume independent priors over them.) Inspired by Zhao, Du and Buntine (2017), we place a prior that relates the latent attributes to the observed ones. In more detail, let $x_i$ be the vector of observed attributes for restaurant $i$, of length $L$. We consider a hierarchical Gaussian prior over the latent attributes $\alpha_i$ and distance coefficients $\beta_i$,

p(\alpha_i \mid H_\alpha, x_i) = \frac{1}{(2\pi\sigma_\alpha^2)^{K_1/2}} \exp\left\{-\frac{1}{2\sigma_\alpha^2}\,\|\alpha_i - H_\alpha x_i\|_2^2\right\}, \qquad p(\beta_i \mid H_\beta, x_i) = \frac{1}{(2\pi\sigma_\beta^2)^{K_2/2}} \exp\left\{-\frac{1}{2\sigma_\beta^2}\,\|\beta_i - H_\beta x_i\|_2^2\right\}.

Here, we have introduced the latent matrices $H_\alpha$ and $H_\beta$, of sizes $K_1 \times L$ and $K_2 \times L$ respectively, which weigh the contribution of each observed attribute to the latent attributes. In this way, the (weighted) observed attributes of restaurant $i$ can shift the prior mean of the latent attributes. By learning the weighting matrices from the data, we can leverage the information from the observed attributes of high-frequency restaurants to estimate the latent attributes of low-frequency restaurants.
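The hierarchical Gaussian prior is a spherical normal centered at the linear projection of the observables. A minimal sketch of its log-density (function and argument names are illustrative, and the matrix is stored as a list of rows):

```python
import math

def log_prior_alpha(alpha_i, H_alpha, x_i, sigma2=1.0):
    """Log of p(alpha_i | H_alpha, x_i): a spherical Gaussian with
    variance sigma2 whose mean is the projection H_alpha @ x_i."""
    k = len(alpha_i)
    # Prior mean: weighted observed attributes shift the latent attributes.
    mean = [sum(H_alpha[d][l] * x_i[l] for l in range(len(x_i))) for d in range(k)]
    sq = sum((a - m) ** 2 for a, m in zip(alpha_i, mean))
    return -0.5 * k * math.log(2 * math.pi * sigma2) - sq / (2 * sigma2)
```

At the prior mean the density reduces to the normalizing constant; moving one coordinate a unit away lowers the log-density by exactly one half, as the quadratic form implies.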

To reduce the number of entries of the weighting matrices, we set some blocks of these matrices to zero. In particular, we assume that one quarter of the latent variables is affected by restaurant price range only, one quarter is affected by restaurant categories, one quarter is affected by star ratings, and for the remaining quarter we assume that there are no observables shifting the prior (which is equivalent to independent priors). We found that this combination of independent and hierarchical priors over the latent variables works well in practice.

To complete the model specification, we place an independent Gaussian prior with zero mean over each latent variable in the model, including the weighting matrices $H_\alpha$ and $H_\beta$. We set the prior variance to one for most variables, with smaller prior variances for a subset of the latent variables, and we fix the variance hyperparameters $\sigma_\alpha^2$ and $\sigma_\beta^2$ in advance.

Inference. As in most Bayesian models the exact posterior over the latent variables is not available in closed form. Thus, we must use approximate Bayesian inference. In this work, we approximate the posterior over the latent variables using variational inference.

Variational inference approximates the posterior with a simpler and tractable distribution (Jordan, 1999; Wainwright and Jordan, 2008). Let $H$ be the vector of all hidden variables in the model, and $q(H)$ the variational distribution that approximates the posterior over $H$. In variational inference, we specify a parameterized family of distributions $q(H)$, and then we choose the member of this family that is closest to the exact posterior, where closeness is measured in terms of the Kullback-Leibler (KL) divergence. Thus, variational inference casts inference as an optimization problem. Minimizing the KL divergence is equivalent to maximizing the evidence lower bound (ELBO),

\mathcal{L} = \mathbb{E}_{q(H)}\left[\log p(y, H) - \log q(H)\right],

where $y$ denotes the observed data and $H$ the latent variables. Thus, in variational inference we first find the parameters of the approximating distribution that bring it closest to the exact posterior, and then we use the resulting distribution as a proxy for the exact posterior, e.g., to approximate the posterior predictive distribution. For a review of variational inference, see Blei, Kucukelbir and McAuliffe (2017).

Following other successful applications of variational inference, we consider mean-field variational inference, in which the variational distribution factorizes across all latent variables. We use Gaussian variational factors for all the latent variables in the TTFM model, and therefore, we need to maximize the ELBO with respect to the mean and variance parameters of these Gaussian distributions. We use gradient-based stochastic optimization (Robbins and Monro, 1951; Blum, 1954; Bottou, Curtis and Nocedal, 2016) to find these parameters. The stochasticity allows us to overcome two issues: the intractability of the expectations and the large size of the dataset.

The first issue is that the expectations that define the ELBO are intractable. To address that, we take advantage of the fact that the gradient itself can be expressed as an expectation, and we form and follow Monte Carlo estimators of the gradient in the optimization procedure. In particular, we use the reparameterization gradient (Kingma and Welling, 2014; Titsias and Lázaro-Gredilla, 2014; Rezende, Mohamed and Wierstra, 2014). The second issue is that the dataset is large. For that, we introduce a second layer of stochasticity in the optimization procedure by subsampling datapoints at each iteration and scaling the gradient estimate accordingly (Hoffman et al., 2013). Both approaches maintain the unbiasedness of the gradient estimator.
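Both sources of stochasticity can be illustrated on a deliberately tiny example. The sketch below is not the paper's implementation: it fits a one-dimensional Gaussian $q(z) = N(m, s^2)$ to the posterior of a single latent $z$ with prior $N(0,1)$ and likelihood $y_t \sim N(z, 1)$, combining the reparameterization trick ($z = m + s\,\epsilon$) with minibatch subsampling of the data and the corresponding rescaling of the likelihood gradient. All names and hyperparameters are illustrative.

```python
import math
import random

def fit(y, iters=3000, batch=10, lr=0.01, seed=0):
    """Stochastic maximization of the ELBO for q(z) = N(m, s^2),
    model: z ~ N(0,1), y_t ~ N(z,1). Returns (m, s)."""
    rng = random.Random(seed)
    m, log_s = 0.0, 0.0
    n = len(y)
    for _ in range(iters):
        eps = rng.gauss(0.0, 1.0)
        s = math.exp(log_s)
        z = m + s * eps                       # reparameterization: z = m + s*eps
        yb = [y[rng.randrange(n)] for _ in range(batch)]
        # d/dz log p(y, z), with the minibatch likelihood rescaled by n/batch
        # to keep the gradient estimator unbiased.
        dlogp_dz = -z + (n / batch) * sum(yi - z for yi in yb)
        grad_m = dlogp_dz                     # chain rule: dz/dm = 1
        grad_log_s = dlogp_dz * s * eps + 1.0 # dz/dlog_s = s*eps; +1 from entropy
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, math.exp(log_s)
```

With $n$ observations, the exact posterior here is $N\!\big(\sum_t y_t/(n+1),\, 1/(n+1)\big)$, so the fitted $(m, s)$ should land near that mean with a small standard deviation.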

### A4 Model Tuning and Goodness of Fit

Figure 2 shows how well the model matches the actual purchase probabilities by distance. Figures A4, A5 and A6 show goodness of fit broken out by distance from the user, by user frequency decile, and by restaurant visit decile for the TTFM and MNL models.

Table A1 illustrates how much of the variation in mean item utility (excluding distance) is explained by observable characteristics. All observables combined explain 14 percent of the variation. City and categories each explain 6 – 7 percent and lose only a little explanatory power once other variables are accounted for. Star ratings and price range account for 2.8 and 2.3 percent of the variation respectively when considered alone, but only 0.6 percent and 0.4 percent once the other variables are taken into account.

Table A2 gives the means and standard deviations of elasticities in the MNL and TTFM models. Figure A7 plots the distribution of elasticities where the unit of analysis is the restaurant-user pair.

Tables A3, A4 and A5 illustrate how the model can be used to discover restaurants that are similar in terms of latent characteristics to a target restaurant. Distance between two restaurants, $i$ and $j$, is calculated as the Euclidean distance between the vectors of latent factors affecting mean utility, $\alpha_i$ and $\alpha_j$. Note that because distance is explicitly accounted for at the user level, we do not expect restaurants with similar latent characteristics to be near one another; rather, this similarity measure will uncover restaurants that would tend to be visited by the same consumers if they were (counterfactually) in the same location. We see that, indeed, the most similar restaurants to our target restaurants are in quite different geographic locations. Perhaps surprisingly, the category of the similar restaurants is generally different from the target restaurant, suggesting that other factors are important to individuals selecting lunch restaurants.

Tables A6, A7 and A8 examine restaurants that are similar accounting for all components of utility. Let $\bar{U}_{ui}$ be the average of $U_{uit}$ over the dates $t$ on which user $u$ visited restaurants. Distance between two restaurants, $i$ and $j$, is calculated as the Euclidean distance between the mean utility vectors, $(\bar{U}_{1i}, \ldots, \bar{U}_{Ni})$ and $(\bar{U}_{1j}, \ldots, \bar{U}_{Nj})$, where $N$ is the number of users. Relative to the previous exercise, we see that similar locations are very close geographically, but still similar in other respects as well. There are many restaurants in close proximity to the selected restaurants, so the list displayed is not simply the set of closest restaurants.
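Both similarity exercises reduce to ranking restaurants by Euclidean distance between vectors. A minimal sketch (names are illustrative; the vectors could be the $\alpha_i$ factors or the mean utility vectors):

```python
import math

def most_similar(target, latents, top=5):
    """Rank restaurants by Euclidean distance between their latent
    vectors; smaller distance means more similar. latents maps a
    restaurant id to its vector."""
    t = latents[target]
    dists = {j: math.dist(t, v) for j, v in latents.items() if j != target}
    return sorted(dists, key=dists.get)[:top]
```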

### A6 Counterfactual Calculations

Figure A8 illustrates the model’s predicted impact of restaurant openings and closings on different groups of neighboring restaurants.

Section VI and the counterfactual exercise in Section V rely on a similar form of calculation: how many visits would we predict restaurant $i'$ would receive if it were located in the location currently occupied by restaurant $i$? When we do this, we assume that all characteristics of $i'$, both observed and latent, stay the same, except that when we calculate each consumer’s utility for $i'$, we use the location of $i$ when calculating distances. In principle, we could predict the demand $i'$ would receive at any location in the region; however, it is easier to have $i'$ replace an existing location $i$, since this ensures that the chosen location is reasonable (e.g., not in the middle of a forest or a highway).

To calculate demand for $i'$ replacing restaurant $i$, we calculate new values of the utilities $U_{uti';i}$ for each user $u$ and session $t$, which change only because the new distances $d_{ui}$ are used instead of the real distances $d_{ui'}$:

U_{uti';i} = U_{uti'} - \gamma_u^\top \beta_{i'} \left(\log(d_{ui}) - \log(d_{ui'})\right)

Then we recalculate each user’s new choice probabilities in each session, and take the sum across all users and sessions in order to get the new predicted total demand for each restaurant under the counterfactual that $i'$ is located in the location of restaurant $i$:

P(y_{uti';i} = 1) = \frac{\exp(U_{uti';i})}{\exp(U_{uti';i}) + \sum_{l \notin \{i, i'\}} \exp(U_{utl})}
\text{Demand}_{i';i} = \sum_u \sum_t P(y_{uti';i} = 1)
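The relocation adjustment and demand aggregation can be sketched in a few lines. This is an illustration of the calculation, not the paper's code; argument names are hypothetical, and each session is summarized by precomputed utilities and log distances.

```python
import math

def relocated_prob(u_iprime, gamma_dot_beta, log_d_i, log_d_iprime, other_utils):
    """Choice probability for restaurant i' when moved to i's location,
    for a single user-session. other_utils holds the utilities U_utl of
    all alternatives l not in {i, i'}."""
    # Only the distance term changes: shift i''s utility accordingly.
    u_new = u_iprime - gamma_dot_beta * (log_d_i - log_d_iprime)
    m = max(other_utils + [u_new])   # subtract the max for numerical stability
    num = math.exp(u_new - m)
    return num / (num + sum(math.exp(u - m) for u in other_utils))

def total_demand(sessions):
    """Predicted demand: sum of relocated choice probabilities over all
    user-sessions, where each session is an argument tuple for relocated_prob."""
    return sum(relocated_prob(*s) for s in sessions)
```

Moving a restaurant closer to the user (a smaller new log distance) raises its adjusted utility when the distance sensitivity is positive, and therefore its choice probability.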

In Section V, we repeat this calculation for each restaurant $i$ that either opens or closes. We draw $i'$ from two distinct sets: $I_{\text{same}}$ is 100 restaurants chosen at random from the same category as $i$, and $I_{\text{diff}}$ is 100 restaurants chosen at random from restaurants that are not in the same category as $i$. In Table 6 we compare the predicted demand for the place that opens or closes, $i$, to the mean counterfactual predictions for $i'$ in $I_{\text{same}}$ and $I_{\text{diff}}$, i.e.,

\frac{1}{|I_{\text{same}}|} \sum_{i' \in I_{\text{same}}} \text{Demand}_{i';i}
\frac{1}{|I_{\text{diff}}|} \sum_{i' \in I_{\text{diff}}} \text{Demand}_{i';i}

In Section VI, the set of target locations includes one location selected at random from each geohash6. The set of candidate restaurants contains one restaurant from each major category (the variable category_most_common), with the constraint that each restaurant chosen is within one standard deviation of the population mean for total demand. This constraint makes the set of comparison restaurants relatively similar in popularity. In the “best location for each category” exercise in Figure A9, we plot, for a single category, the predicted demand at each location in the set of target locations. In Figure A10, we select subsets of 4 or 5 categories of restaurants from this set that share the same price range and illustrate, for each target location, the category of restaurant that is predicted to attract the most demand there.
