A Nonparametric Latent Factor Model For Location-Aware Video Recommendations


We are interested in learning customers’ video preferences from their historical viewing patterns and geographical location. We take a Bayesian latent factor modeling approach to this task. To tune the complexity of the model to best represent the data, we use Bayesian nonparametric techniques. We describe an inference technique that scales to large real-world data sets. Finally, we show results from applying the model to a large internal Netflix data set; these illustrate that the model captures interesting relationships between viewing patterns and geographical location.

1 Introduction

A web application provides a rich view of each user. For example, in a video streaming application like Netflix, we can observe not only a customer’s preference for different types of content but also how those preferences change with respect to context, such as time of day, day of week, device, and so on. An important contextual variable that influences a customer’s preferences is their geographical location. It is reasonable to assume that customers who live in close proximity may have similar viewing preferences. Hence, we need a model that captures not only a customer’s latent viewing preferences, but also the relationship between those preferences and the customer’s location. To capture both aspects, we model them jointly so that each modality (location and viewing behavior) can take advantage of information in the other. For this task, we employ a nonparametric latent factor model of a customer’s viewing history and their geographical location.

Nonparametric mixed-membership techniques have shown great promise in modeling large collections of documents [1]. Given that more information is available for a document (author, date of publishing, metadata, etc.) than just its content, it seems natural to extend these approaches to model all these modalities in a unified way. Hence there have been many attempts to apply nonparametric latent factor modeling to such data sets [2]. Our approach uses a model structure similar to that of [2], which models document-level features along with the content of documents. For the problem under consideration, we view a customer’s viewing history as an unordered collection of discrete view events from Netflix’s video catalog. Geographical locations of customers are expressed as longitudes and latitudes, and can be viewed as points on a 2-sphere. Therefore we follow an approach similar to [3] and use the von Mises-Fisher distribution to describe geographical data. The full model combines these sub-components (viewing history and geographical location) and is able to learn embeddings for customers’ viewing history data, geographical location data, and the interactions between the two.

The following sections detail how we model these components, how we perform inference in that model in a way that scales to large data sets, and finally results from an internal Netflix data set that illustrate that the model is indeed able to capture interactions between geographical information and viewing preferences.

2 Model Details

The component of our model that describes customers’ streaming history data is a nonparametric mixed membership model that uses a hierarchical Dirichlet process to learn latent video factors, each of which is a multinomial distribution over the content catalog. Similarly, the component of our model that describes geographical locations uses a hierarchical Dirichlet process to learn latent location factors, each of which is a von Mises-Fisher distribution over a 2-sphere. Finally, the relationship between the two latent spaces is expressed through a Dirichlet process over the interaction of the video and location latent factor spaces. We summarize our modeling assumptions as follows and then comment on different components of the model.

for each customer in data set do
   for each view event in video history do
      …
   end for
end for

2.1 Modeling Location Data

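For the geographical location data of customers, we need a distribution that can express the spherical nature of the data. We use the von Mises-Fisher distribution to model locations, with the following parameterization:

$$f(x \mid \mu, \kappa) = C_d(\kappa) \exp(\kappa \mu^\top x), \qquad C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2} \, I_{d/2 - 1}(\kappa)}$$

where $\|\mu\| = 1$ and $\kappa > 0$; $\mu$ (the mean direction) and $\kappa$ (the concentration) are the parameters of the distribution; $I_{d/2 - 1}(\kappa)$ is the modified Bessel function of the first kind with order $d/2 - 1$ computed at $\kappa$. This parameterization requires locations to be expressed in Euclidean coordinates. Hence, we convert geo-spherical coordinates (latitude and longitude) to unit vectors in the Euclidean system, so that $d = 3$. The prior distributions for $\mu$ and $\kappa$ are:

$$\mu \sim \text{vMF}(\mu_0, C_0), \qquad \kappa \sim \text{logNormal}(m, \sigma^2)$$

The prior distribution of $\mu$ is chosen to be a von Mises-Fisher distribution itself, which is conjugate to the von Mises-Fisher likelihood. The concentration parameter $\kappa$ does not have a conjugate prior. We use a log-normal prior for $\kappa$, similar to [3].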
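As a minimal sketch of the two steps described above, the following Python code converts a latitude/longitude pair to a unit vector and evaluates the von Mises-Fisher log density on the 2-sphere. It uses the closed form of the normalizer for three dimensions, $\kappa / (4\pi \sinh\kappa)$, which follows from the general Bessel-function expression. The function names and the example coordinates are illustrative assumptions, not taken from the original work.

```python
import math


def latlon_to_unit_vector(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a unit vector (x, y, z) in R^3."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def log_vmf_normalizer_3d(kappa):
    """log of kappa / (4 * pi * sinh(kappa)), valid for kappa > 0.

    Uses log sinh(k) = k + log(1 - exp(-2k)) - log(2) to avoid overflow
    when kappa is large.
    """
    log_sinh = kappa + math.log(-math.expm1(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh


def vmf_logpdf_3d(x, mu, kappa):
    """Log density of a unit vector x under vMF(mu, kappa) on the 2-sphere."""
    dot = sum(a * b for a, b in zip(x, mu))
    return log_vmf_normalizer_3d(kappa) + kappa * dot


# Illustrative coordinates only (roughly New York and Los Angeles).
mu = latlon_to_unit_vector(40.7, -74.0)
x = latlon_to_unit_vector(34.1, -118.2)
log_density = vmf_logpdf_3d(x, mu, kappa=10.0)
```

Working in log space keeps the density usable for the highly concentrated factors that tight geographical clusters would produce.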

2.2 Modeling Video History Data

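As mentioned above, we view a customer’s video streaming history as an unordered collection of videos watched from Netflix’s catalog. We use a Dirichlet-multinomial conjugate model to represent the video streaming history of customers:

$$\phi \sim \text{Dirichlet}(\eta), \qquad w \mid \phi \sim \text{Multinomial}(1, \phi)$$

where $\phi$ is the distribution of a latent video factor over the catalog, $\eta$ is the Dirichlet hyperparameter, and $w$ represents a single draw from the multinomial distribution on a video catalog of size $V$.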
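Because the Dirichlet prior is conjugate to the multinomial, the factor's distribution over the catalog can be integrated out, leaving a predictive probability that depends only on counts. The sketch below shows that standard predictive rule, assuming a symmetric Dirichlet prior. The hyperparameter name `eta`, the count layout, and the function name are assumptions made for illustration.

```python
def collapsed_video_probability(video_id, factor_counts, eta, catalog_size):
    """Predictive probability of one video under a latent video factor.

    factor_counts maps video id -> number of view events currently assigned
    to this factor. With a symmetric Dirichlet(eta) prior integrated out:
        p(video) = (count[video] + eta) / (total + catalog_size * eta)
    """
    total = sum(factor_counts.values())
    return (factor_counts.get(video_id, 0) + eta) / (total + catalog_size * eta)


# Toy catalog of 4 videos; this factor has seen video 0 three times
# and video 2 once.
counts = {0: 3, 2: 1}
probs = [collapsed_video_probability(v, counts, eta=0.5, catalog_size=4)
         for v in range(4)]
```

Videos never assigned to the factor still receive a small nonzero probability through `eta`, which is what lets a sampler assign a new video to an existing factor.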

2.3 Modeling Interaction of Video and Location Latent Factors

The interaction of the video and geographical latent spaces is modeled by a Dirichlet process with a product base measure, i.e., the base measure is on atoms that are pairs of Dirichlet processes, drawn from the Dirichlet processes on the location and video latent factors respectively. This construction allows the model to flexibly learn as many interactions between video preferences and geo-locations as needed to best express the data.

2.4 Inference

We use a sampling-based approach for posterior inference. Due to Dirichlet-multinomial conjugacy in the video component of the model, we collapse out the multinomial parameters of each latent video factor. For the location component of the model, the prior distribution of the mean direction $\mu$ (von Mises-Fisher) is conjugate to the von Mises-Fisher likelihood, hence we collapse out $\mu$ as well for each latent location factor. The prior distribution of the concentration $\kappa$ (log-normal) is not conjugate to the von Mises-Fisher likelihood, hence we use the Metropolis-Hastings algorithm to sample $\kappa$ for each latent location factor. For the nonparametric components, we use the direct assignment scheme described in [1]. Hence, instead of sampling atoms, we sample indicators to those atoms. Specifically, we sample three sets of indicators, taking values indexed by t, s, and z respectively, each of which points to its corresponding atom. Additionally, we sample the two global Dirichlet processes according to the direct assignment scheme in [1]. The sampling distributions for these latent variables are as follows:


Above, conditioning on all remaining variables denotes the complete conditional distribution of a variable. Conditional counts are counts of a pair of variables that ignore the current customer. Marginal counts are counts of one variable obtained by marginalizing over another, taken over all customers except the current one; distinct subscripts differentiate the two marginals that involve the same variable.
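The Metropolis-Hastings update for the concentration of one latent location factor can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation. It assumes a random-walk proposal on $\log\kappa$, a von Mises-Fisher prior on the mean direction with hypothetical hyperparameters `mu0` and `c0`, and a log-normal prior on $\kappa$ with hypothetical hyperparameters `m` and `sigma`. With the mean direction integrated out, the marginal likelihood of $n$ unit vectors is $C_3(\kappa)^n C_3(c_0) / C_3(\|\kappa \sum_i x_i + c_0 \mu_0\|)$.

```python
import math
import random


def log_c3(kappa):
    """log of the vMF normalizer on the 2-sphere: kappa / (4*pi*sinh(kappa))."""
    log_sinh = kappa + math.log(-math.expm1(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh


def log_marginal_likelihood(points, kappa, mu0, c0):
    """log p(points | kappa) with the mean direction integrated out
    under a vMF(mu0, c0) prior (hyperparameter names are assumptions)."""
    resultant = [kappa * sum(p[d] for p in points) + c0 * mu0[d]
                 for d in range(3)]
    r = math.sqrt(sum(v * v for v in resultant))
    return len(points) * log_c3(kappa) + log_c3(c0) - log_c3(r)


def mh_step_kappa(kappa, points, mu0, c0, m, sigma, step, rng):
    """One random-walk Metropolis-Hastings step on u = log(kappa).

    A log-normal prior on kappa is a Normal(m, sigma^2) prior on u, and the
    Gaussian proposal on u is symmetric, so the acceptance ratio is the
    ratio of the target density evaluated in u.
    """
    def log_target(u):
        log_prior = -0.5 * ((u - m) / sigma) ** 2
        return log_marginal_likelihood(points, math.exp(u), mu0, c0) + log_prior

    u = math.log(kappa)
    u_new = u + rng.gauss(0.0, step)
    log_accept = log_target(u_new) - log_target(u)
    if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
        return math.exp(u_new)
    return kappa


# Toy data: four unit vectors clustered around the north pole.
z = math.sqrt(0.99)
points = [(0.0, 0.0, 1.0), (0.1, 0.0, z), (0.0, 0.1, z), (-0.1, 0.0, z)]
rng = random.Random(0)
kappa = 1.0
samples = []
for _ in range(200):
    kappa = mh_step_kappa(kappa, points, (0.0, 0.0, 1.0), 1.0,
                          m=0.0, sigma=1.0, step=0.3, rng=rng)
    samples.append(kappa)
```

Proposing in log space keeps every proposed concentration positive without rejecting moves at a boundary.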

3 Experiments

To scale our sampling-based posterior inference, we use the approximate parallel Gibbs sampling approach described in [4]. For our experiment we use an internal data set that contains the video viewing history of one million Netflix customers along with their geographical locations. We include examples of the latent video and geographical factors learned by our model, as well as the top three video topics for two geographical latent factors found in the United States of America.

(a) Romantic Shows Topic
(b) Documentaries Topic
Figure 1: Video Latent Factors capturing Romantic Shows and Documentaries
Figure 2: Two Example Geographical Latent Factors Found in the United States of America.
Figure 3: Top Video Topics for the geographical latent factor
Figure 4: Top Video Topics for the geographical latent factor
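The approximate parallel Gibbs sampling scheme of [4] partitions the data across workers, lets each worker run a Gibbs sweep against its own copy of the shared counts, and then reconciles the copies by adding every worker's changes back to the global counts. The sketch below shows only that reconciliation step. Keying the counts by string and the function name are assumptions for illustration; how the counts are partitioned in the experiments is not specified here.

```python
def merge_worker_counts(global_counts, worker_counts):
    """Combine per-worker count tables after one parallel Gibbs sweep.

    Each worker starts the sweep from a copy of global_counts and updates it
    locally. The merged table is
        global + sum over workers of (local - global),
    so every worker's increments and decrements are applied exactly once.
    """
    merged = dict(global_counts)
    for local in worker_counts:
        for key in set(local) | set(global_counts):
            delta = local.get(key, 0) - global_counts.get(key, 0)
            if delta:
                merged[key] = merged.get(key, 0) + delta
    return merged


# Two workers start from the same global table and diverge during the sweep.
global_counts = {"a": 5, "b": 2}
worker_1 = {"a": 6, "b": 2}          # moved one more item into "a"
worker_2 = {"a": 5, "b": 1, "c": 3}  # removed one from "b", created "c"
merged = merge_worker_counts(global_counts, [worker_1, worker_2])
```

The scheme is approximate because each worker samples against counts that are stale with respect to the other workers' updates during the sweep.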

4 Conclusion

We use Bayesian nonparametric machinery to combine the geographical and viewing behavior information of Netflix customers for location-aware video recommendations. The approach presented can also be helpful when viewing history data is sparse, or in cold-start scenarios.


[1] Teh, Y.W., Jordan, M.I., Beal, M.J. & Blei, D.M. (2006) Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101(476):1566-1581.

[2] Nguyen, V., Phung, D., Nguyen, X., Venkatesh, S. & Bui, H.H. (2014) Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts. Proceedings of the ICML.

[3] Gopal, S. & Yang, Y. (2014) Von Mises-Fisher Clustering Models. Proceedings of the ICML.

[4] Newman, D., Asuncion, A., Smyth, P. & Welling, M. (2009) Distributed Algorithms for Topic Models. Journal of Machine Learning Research, 10(Aug):1801-1828.
