Recommendation via matrix completion using Kolmogorov complexity
A common way to model a recommendation system is as a matrix completion problem. There are several matrix completion methods, typically using optimization approaches or collaborative filtering. Most approaches assume that the matrix is either low rank, or that there are a small number of latent variables that encode the full problem. Here, we propose a novel matrix completion algorithm for recommendation systems that makes no assumptions on the rank and is model free, i.e., the entries are not assumed to be a function of some latent variables. Instead, we use a technique akin to information theory. Our method performs hybrid neighborhood-based collaborative filtering using Kolmogorov complexity. It decouples the matrix completion into a vector completion problem for each user. The recommendation for one user is thus independent of the recommendation for other users. This makes the algorithm scalable because the computations are highly parallelizable. Our results are competitive with state-of-the-art approaches on both synthetic and real-world dataset benchmarks.
The work was partially supported through the Carnegie Mellon/Portugal Program managed by ICTI from FCT and by FCT grant SFRH/BD/52162/2013.
The continuing growth of online services, such as e-commerce, audio/video streaming, online news, and review and opinion providers, increases the demand for recommendation of online services and products. The huge number of services and products available makes choosing difficult. Users rely not only on reviews and ratings but also on automatic suggestions from providers. Automatic recommendation systems have therefore become essential and are widely used by providers and consumers.
Previous work. Several approaches to the matrix completion problem reformulate it as an optimization problem, assuming that the matrix to recover has low rank and that the positions of the observed entries are sampled according to a uniform distribution, see [Candès and Tao, 2010]. Although the rank minimization problem is NP-hard, approaches following the ideas in [Candès and Tao, 2010] are used with relative success. They relax the problem so that it becomes convex and then minimize the nuclear norm of the matrix. These methods are widely used in practice. In other approaches, it is assumed that the matrix to complete is high rank. This also entails dealing with an NP-hard problem. Nonetheless, under certain assumptions, some incomplete high-rank or even full-rank matrices can be completed, as in [Balzano et al., 2012]. In their work, the authors assume that the columns of the matrix to complete belong to a union of multiple low-rank subspaces. This way, the problem can be viewed as a missing-data version of the subspace clustering problem.
Collaborative filtering approaches are mainly divided into two research lines: model-based and neighborhood-based [Ricci et al., 2011]. The first line tries to model latent factors of both users and items and is widely used due to its demonstrated success for movie recommendation in the Netflix prize [Bennett et al., 2007]. The second line of research makes recommendations based on users with similar tastes/preferences or on items that are similar to the user's preferences. This last line further divides into three main approaches: user-based, item-based and hybrid. In user-based methods, we select a set of similar users, based on the similarity among them, to recommend items as, for example, in [Zhao and Shang, 2010]. Item-based methods are analogous, but use similarities among the items as, for instance, in [Sarwar et al., 2001]. The hybrid approaches combine the previous two, see [Wang et al., 2006]. In this work we use hybrid collaborative filtering to address the matrix completion problem.
In recent work by [Ganti et al., 2015], the authors addressed the matrix completion problem without assuming that the matrix is low rank, as is most common. They consider the case where the entries of a low-rank matrix are observed through a Lipschitz monotonic function, which transforms the matrix into a high-rank one, and the aim is to recover the unobserved entries. For this task, they propose an iterative method that alternates between estimating a low-rank matrix and estimating the monotonic function, in order to recover the missing elements of the high-rank matrix. Further, they provide mean squared error (MSE) bounds for the recovery error, based on the rank of the matrix, its size, and properties of the nonlinear transformation. The algorithm only applies to functions that are nonlinear monotonic transformations of the inner product of latent features.
In [Song et al., 2016], the authors address the matrix completion problem using a novel framework for nonparametric regression over latent variable models. They propose to model the unknown matrix entries as a Lipschitz function of two latent variables, one for users and another for items. Using the Taylor expansion of the unknown function around different points, they can define the value of the missing entry as a weighted convex combination of the known entries. As a measure of similarity, they use the sample variance between rows and columns. Then, they use kernel regression to perform local smoothing.
In [Wang et al., 2006], the authors present a generative probabilistic framework that considers similarity between users and between items. The prediction of each unknown matrix entry is made by averaging the individual ratings weighted by the users' confidence. This allows the authors to take advantage of both user correlations and item correlations to better estimate the missing entries of the rating matrix. The authors consider three similarity matrices in their work.
Main contributions. We present a simple approach to build a recommendation system based on matrix completion by performing hybrid (user and item) neighborhood-based collaborative filtering, summarized in Algorithm 1 from Section 2.2. Our method exploits Kolmogorov complexity to construct a similarity measure from information theory [Cover and Thomas, 2012], and to propose new similarity measures. The algorithm that we propose is modular, and the recommendation for each user can be computed independently. Further, our algorithm works with a small number of data points and for both low-rank and high-rank matrix completion, without the need for any initialization. Last, the computations of the algorithm can be done in a distributed fashion, making it scalable.
Paper structure. The remainder of the paper is organized as follows. In Section 2, we introduce some notation and present our setup specification. In Section 3, we evaluate the performance of our matrix completion algorithm, Algorithm 1, on both synthetic data and real-world datasets. Section 4 concludes the paper and outlines avenues for further research.
We first introduce some notation to make the paper self-contained, and then we present our matrix completion algorithm and its computational complexity analysis.
We denote the set of users by , the set of items by , and the matrix of ratings by , where denotes the rating that user gave to item . The entries take values on the allowed ratings together with a special number denoting the absence of rating (in this work this value is ). We adopt standard notation to denote matrices and vectors. For a matrix , we denote the th row of by , the th column of by , and the th column of the th row by . Given a set of objects , a similarity is a function such that whenever , . For a square matrix representing similarities we use the letter indexed by or , if the similarity matrix represents similarities between users or items, respectively. Further, given two vectors with dimension , and , we denote by the vector whose entries are the product of the entries of and , i.e., . Finally, we use the semi-norm . Given a vector , is the number of non zero entries of .
2.2. Setup specification
We propose a recommendation system that performs matrix completion as in hybrid neighborhood-based collaborative filtering approaches. Our approach computes two matrices of similarities, one between users, , and another between items, . Afterwards, we complete each entry of user and item by assigning a convex combination of two quantities, by a parameter . The first quantity is an average of the ratings user gave to other items, weighted by the similarities between the other items and item . The second quantity is a weighted average of the ratings of item given by other users similar to user . Figure 1 depicts the users and items , connected by an edge with weight whenever user rated item . The blue and green edges depict the similarities between users and between items, with the weights from each similarity matrix and , respectively.
To build the matrices and , we propose two compression similarities based on Kolmogorov complexity, see [Cover and Thomas, 2012]. Given the description of a string, , its Kolmogorov complexity, , is the length of the smallest computer program that outputs . In other words, is the length of the smallest compressor for . Although Kolmogorov complexity is non-computable, there are efficient and computable approximations based on compressors. Let be a compressor and denote the length of the output string resulting from the compression of using . The first similarity measure we propose is the following.
Compression similarity. Using the normalized compression distance, see [Li et al., 2004], we define the compression similarity as:
where string is the concatenation of and . We implement the description of users/items as the string composed by the index of rated items/rating users and respective rating. For instance, if user rated the items , , we write the description of user as the string “”.
Inspired by CS, in order to reduce the computational complexity, we propose another similarity measure.
Kolmogorov similarity. We define the Kolmogorov similarity as:
To compress the description strings, we use the standard compression tools from the zlib library (https://tools.ietf.org/html/rfc1950). Intuitively, both similarities measure how close the most compact descriptions of a pair of users, or of a pair of items, are to each other.
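As a hedged illustration of how a compressor can stand in for Kolmogorov complexity, the sketch below computes the normalized compression distance of [Li et al., 2004] with zlib and turns it into a similarity. The `index:rating` description format, the helper names, and the choice of `1 - NCD` as the similarity are assumptions made for this example, not necessarily the exact definitions used above.

```python
import zlib


def describe(ratings):
    """Hypothetical description string: 'index:rating' pairs sorted by index."""
    pairs = sorted(ratings.items())
    return " ".join(f"{idx}:{r}" for idx, r in pairs).encode("ascii")


def clen(data):
    """Length of the zlib-compressed data, a computable proxy for K(x)."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy = clen(x), clen(y)
    return (clen(x + y) - min(cx, cy)) / max(cx, cy)


def compression_similarity(x, y):
    """Assumption for this sketch: similarity taken as 1 - NCD."""
    return 1.0 - ncd(x, y)


# Two toy users who rated disjoint sets of items.
alice = describe({i: (7 * i) % 5 + 1 for i in range(1, 60)})
bob = describe({i: (3 * i) % 5 + 1 for i in range(200, 260)})

print(compression_similarity(alice, alice))  # close to 1
print(compression_similarity(alice, bob))    # noticeably smaller
```

Because a real compressor has fixed overhead, the distance of a string to itself is small but not exactly zero, so the similarity of identical descriptions is slightly below 1.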
The compression similarity measures are used to compute the two similarity matrices, and .
To complete the rating matrix , we set each unfilled entry in the completed matrix as a convex combination by parameter of two quantities. The first is the weighted average of the sum of the ratings of each user , weighted by the square of the number of commonly rated items together with , . The second is the sum of the ratings of each item , weighted by the square of the number of users rating the item together with , . Recalling the definitions of and from Section 2.1, the first quantity is given by
Similarly, the second quantity is given by
Lastly, for a fixed parameter , we estimate each unfilled matrix entry as
Observe that if , it corresponds to user-based collaborative filtering, and if , it corresponds to item-based collaborative filtering. The previous steps are summarized in Algorithm 1.
Our approach decouples the problem into a set of independent user-by-user subproblems. Hence, to generate a set of recommendations for a user, we do not need to complete the entire rating matrix; we only need to complete the corresponding matrix row.
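The following minimal sketch illustrates the row-wise completion described in this section under simplifying assumptions: missing ratings are encoded as 0, the weights are the plain similarity values (the additional weighting by the number of co-rated entries is omitted), and `alpha` is taken to weight the user-side quantity. All names are hypothetical and the code is not meant to reproduce Algorithm 1 exactly.

```python
MISSING = 0  # assumption: 0 marks an unrated entry


def weighted_average(values, weights):
    """Weighted mean, or None when there is nothing to average."""
    total = sum(weights)
    if total == 0:
        return None
    return sum(v * w for v, w in zip(values, weights)) / total


def predict_entry(R, S_users, S_items, u, i, alpha=0.5):
    """Convex combination of a user-side and an item-side estimate."""
    # Item side: ratings user u gave to other items, weighted by the
    # similarity between each of those items and item i.
    items = [j for j in range(len(R[u])) if j != i and R[u][j] != MISSING]
    item_side = weighted_average([R[u][j] for j in items],
                                 [S_items[i][j] for j in items])
    # User side: ratings other users gave to item i, weighted by the
    # similarity between each of those users and user u.
    users = [v for v in range(len(R)) if v != u and R[v][i] != MISSING]
    user_side = weighted_average([R[v][i] for v in users],
                                 [S_users[u][v] for v in users])
    if item_side is None:
        return user_side  # may itself be None if no information exists
    if user_side is None:
        return item_side
    return alpha * user_side + (1 - alpha) * item_side


def complete_row(R, S_users, S_items, u, alpha=0.5):
    """Fill only the row of user u; other rows are never modified."""
    return [R[u][i] if R[u][i] != MISSING
            else predict_entry(R, S_users, S_items, u, i, alpha)
            for i in range(len(R[u]))]


R = [[5, 3, 0],
     [4, 0, 2],
     [0, 1, 4]]
ones = [[1.0] * 3 for _ in range(3)]  # toy similarities, all equal
print(complete_row(R, ones, ones, 0))  # [5, 3, 3.5]
```

Since `complete_row` reads the rating matrix but writes only its own output, rows for different users can be computed in separate processes or machines without coordination.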
2.3. Complexity analysis
To build the user similarity matrix , we first precompute the quantity for each user . Afterwards, we build an matrix where each entry for each , using the precomputed values from the first step. Hence, both the time and space complexity of this step are . Mutatis mutandis, both the time and space complexity of building the item-item similarity matrix are .
For the similarity measure CS, we perform the same precomputations, but to build matrices and , we further need to compute the compression of the concatenation of pairs of users and pairs of items, respectively. Hence, the time complexity is and , whilst the space complexity is and , respectively for and .
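To make the cost structure concrete, the sketch below builds a compression-based similarity matrix in two phases: one compression per description, followed by one compression per pair of concatenated descriptions. It uses `1 - NCD` as the similarity and mirrors the upper triangle to keep the matrix symmetric; both choices, and all names, are assumptions for this illustration.

```python
import zlib


def clen(data):
    """Compressed length under zlib."""
    return len(zlib.compress(data, 9))


def similarity_matrix(descriptions):
    """Pairwise 1 - NCD similarities for a list of byte strings."""
    n = len(descriptions)
    # Phase 1: n single compressions, computed once and reused below.
    lengths = [clen(d) for d in descriptions]
    S = [[0.0] * n for _ in range(n)]
    # Phase 2: one extra compression per unordered pair (a, b).
    for a in range(n):
        for b in range(a, n):
            cxy = clen(descriptions[a] + descriptions[b])
            lo, hi = sorted((lengths[a], lengths[b]))
            S[a][b] = S[b][a] = 1.0 - (cxy - lo) / hi
    return S


# Toy descriptions in a hypothetical 'index:rating' format.
users = [
    " ".join(f"{i}:{(7 * i) % 5 + 1}" for i in range(1, 40)).encode(),
    " ".join(f"{i}:{(7 * i) % 5 + 1}" for i in range(1, 40, 2)).encode(),
    " ".join(f"{i}:{(3 * i) % 5 + 1}" for i in range(100, 140)).encode(),
]
S = similarity_matrix(users)
for row in S:
    print([round(s, 2) for s in row])
```

The second phase dominates: the number of concatenation compressions grows quadratically with the number of descriptions, whereas the precomputed lengths grow only linearly.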
For the matrix completion problem, steps 4-9 of Algorithm 1, the time complexity is (to compute the weighted averages in step 7) times the number of elements of the matrix . This yields a time complexity of . The space complexity of those steps is .
3. Experimental setup
Next, we describe our experimental settings and analyze the experimental results.
We test Algorithm 1 on synthetic and real-world datasets. All experiments were run on a 2.8 GHz Intel Core 2 Duo with 4 GB of 800 MHz RAM, using Matlab 2016 and Python 3. For the synthetic data, we randomly generate four full-rank matrices, with dimension , and with entries in .
For the real-world datasets we use the MovieLens 100k (ML–100k) and the MovieLens 1M (ML–1M), available at http://movielens.umn.edu; both datasets have ratings in . Table 1 contains a more detailed description of these datasets.
| |ML–100k|ML–1M|
|number of users|1000|6000|
|number of items|1700|4000|
|number of ratings|100,000|1,000,000|
3.2. Evaluation metric
To evaluate and compare the performance of the proposed algorithm, Algorithm 1, we use 5-fold cross-validation on both synthetic and real data. For the ML–100k, the dataset already provides a set of train and test files. For the ML–1M, we randomly split the original dataset into a set of train/test files. For the synthetic data, the four randomly generated full-rank matrices, with dimension , were split as in the ML–1M case.
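A random split of this kind can be sketched as follows. The `(user, item, rating)` triple format, the fixed seed, and the function name are assumptions for the illustration; the actual split files used in the experiments are not reproduced here.

```python
import random


def k_fold_splits(ratings, k=5, seed=0):
    """Yield (train, test) pairs; each rating lands in exactly one test fold."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled ratings into k folds in round-robin order.
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [t for g in range(k) if g != f for t in folds[g]]
        yield train, test


# Toy data: 4 users x 5 items, every entry observed.
ratings = [(u, i, (u + i) % 5 + 1) for u in range(4) for i in range(5)]
splits = list(k_fold_splits(ratings))
for train, test in splits:
    print(len(train), len(test))  # 16 4 on each of the 5 lines
```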
We use the root-mean-square error (RMSE) [Koren, 2008] to evaluate the performance of the proposed algorithm by measuring the difference between the estimated missing values and the original values. Let be the original matrix, equal to except on the missing entries of the test set , and let be the estimation of by a matrix completion method when applied to . The RMSE is given by
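As a small worked example of this metric, the sketch below computes the RMSE restricted to the test positions. The nested-list matrix representation and the names `original`, `estimated`, and `test_set` are assumptions chosen for the illustration.

```python
import math


def rmse(original, estimated, test_set):
    """Root-mean-square error over the (user, item) positions in test_set."""
    squared = [(original[u][i] - estimated[u][i]) ** 2 for u, i in test_set]
    return math.sqrt(sum(squared) / len(squared))


original = [[5, 3],
            [4, 1]]
estimated = [[5, 1],
             [4, 3]]
test_set = [(0, 1), (1, 1)]  # held-out positions only

print(rmse(original, estimated, test_set))  # 2.0
```

Both held-out entries are off by 2, so the mean squared error is 4 and its square root is 2.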
3.3. Experimental results
We use the datasets described above to test our algorithm, using both similarity measures KS and CS, against the following algorithms: NormalPredictor, BaselineOnly [Koren, 2010], KNNBasic [Altman, 1992], KNNWithMeans [Altman, 1992], KNNBaseline [Koren, 2010], SVD [Salakhutdinov and Mnih, 2007], SVD++ [Koren, 2008], NMF [Lee and Seung, 2001], Slope One [Lemire and Maclachlan, 2005] and Co-clustering [George and Merugu, 2005]. This set of algorithms is implemented in the Python toolkit Surprise (http://surpriselib.com/). The results of the experiments are summarized in Table 2, for the synthetic data, and in Table 3, for the real datasets. For the synthetic data, the best result corresponds to using Algorithm 1 with the similarity CS. When using similarity KS, the result is the third best in the set of tested methods. This happens because the majority of the compared methods assume that the matrix they are completing is low rank, which might be the case in these datasets, but might not be the case in general.
With real data, using either the KS or the CS similarity measure, our algorithm does not achieve the lowest RMSE, which may be because most of the compared methods assume that the completed matrix is low rank. However, the results are comparable to, and of the same order as, the best reported ones. The advantages of our algorithm are that it can be computed in a distributed fashion, needs no assumptions on the matrix rank, needs neither knowledge of the dimensions of the subspaces nor initialization, does not estimate latent variables, and is model free. Finally, on the real data, it scales better than the methods that achieve a lower RMSE.
We present a novel hybrid neighborhood-based collaborative filtering recommendation system that uses Kolmogorov complexity and performs independent user-by-user matrix completion. Our method does not require assumptions about the rank of the matrix, does not need the dimensions of subspaces to be specified, and is model free; it is therefore more general. We present experimental results on both synthetic and real datasets, which show that our approach is comparable with state-of-the-art approaches. Avenues for further research include exploring matrix completion in the presence of noise and extending this work with an initial step that clusters users and items using the similarities between users and between items, respectively.
- [Altman, 1992] Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3):175–185.
- [Balzano et al., 2012] Balzano, L., Eriksson, B., and Nowak, R. (2012). High rank matrix completion and subspace clustering with missing data. In Proceedings of the conference on Artificial Intelligence and Statistics (AIStats).
- [Bennett et al., 2007] Bennett, J., Lanning, S., et al. (2007). The netflix prize. In Proceedings of KDD cup and workshop, volume 2007, page 35. New York, NY, USA.
- [Candès and Tao, 2010] Candès, E. J. and Tao, T. (2010). The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080.
- [Cover and Thomas, 2012] Cover, T. M. and Thomas, J. A. (2012). Elements of information theory. John Wiley & Sons.
- [Ganti et al., 2015] Ganti, R. S., Balzano, L., and Willett, R. (2015). Matrix completion under monotonic single index models. In Advances in Neural Information Processing Systems, pages 1873–1881.
- [George and Merugu, 2005] George, T. and Merugu, S. (2005). A scalable collaborative filtering framework based on co-clustering. In Data Mining, Fifth IEEE international conference on, pages 4–pp. IEEE.
- [Koren, 2008] Koren, Y. (2008). Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426–434. ACM.
- [Koren, 2010] Koren, Y. (2010). Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Transactions on Knowledge Discovery from Data (TKDD), 4(1):1.
- [Lee and Seung, 2001] Lee, D. D. and Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Advances in neural information processing systems, pages 556–562.
- [Lemire and Maclachlan, 2005] Lemire, D. and Maclachlan, A. (2005). Slope one predictors for online rating-based collaborative filtering. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 471–475. SIAM.
- [Li et al., 2004] Li, M., Chen, X., Li, X., Ma, B., and Vitányi, P. M. (2004). The similarity metric. IEEE transactions on Information Theory, 50(12):3250–3264.
- [Ricci et al., 2011] Ricci, F., Rokach, L., and Shapira, B. (2011). Introduction to recommender systems handbook. Springer.
- [Salakhutdinov and Mnih, 2007] Salakhutdinov, R. and Mnih, A. (2007). Probabilistic matrix factorization. In Nips, volume 1, pages 2–1.
- [Sarwar et al., 2001] Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pages 285–295. ACM.
- [Song et al., 2016] Song, D., Lee, C. E., Li, Y., and Shah, D. (2016). Blind regression: Nonparametric regression for latent variable models via collaborative filtering. In Advances in Neural Information Processing Systems, pages 2155–2163.
- [Wang et al., 2006] Wang, J., De Vries, A. P., and Reinders, M. J. (2006). Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 501–508. ACM.
- [Zhao and Shang, 2010] Zhao, Z.-D. and Shang, M.-S. (2010). User-based collaborative-filtering recommendation algorithms on hadoop. In Knowledge Discovery and Data Mining, 2010. WKDD’10. Third International Conference on, pages 478–481. IEEE.