Attributes Coupling based Item Enhanced Matrix Factorization Technique for Recommender Systems
YU : Attributes Coupling based Item Enhanced Matrix Factorization Technique for Recommender Systems
1 Introduction
Recommender systems [1] are intelligent software tools that provide web users with decision-making support, such as what movie to watch, what book to read, and what product to buy. Recommender systems have become indispensable because they mitigate the information overload problem by providing web users with personalized information, products, or services that match their tastes and preferences. In this paper, we refer to such information, products, and services collectively as ‘items’. To retain customer loyalty and increase sales revenue, more and more e-commerce sites deploy recommender systems to meet users’ information demands. Some typical web applications equipped with recommender systems include product recommendation in Amazon
Collaborative filtering (CF) [2] is one of the most widely used techniques for building recommender systems and has achieved great success in e-commerce owing to its domain independence (i.e., collaborative filtering requires only the past activity history of users to make recommendations and does not depend on the types of items). However, collaborative filtering suffers from the following limitations [1].
Data Sparsity. A modern e-commerce recommender system may include millions of users and millions of items. Even a very active user, however, rates only a small proportion of the items available in the system, and even very popular items are rated by only a tiny fraction of the users. Given the sparsity of the available user activity records, it is difficult for collaborative filtering based recommender systems to discover similar users or similar items from their rating behaviors, and they therefore struggle to generate personalized recommendations. This problem, generally referred to as the data sparsity problem, is the major issue that degrades the recommendation quality of collaborative filtering based recommender systems.
Cold Start Problem. The cold start problem can be divided into the cold start user problem and the cold start item problem. Cold start users are users who have just joined the e-commerce system and have expressed very few ratings. Collaborative filtering based recommender systems cannot provide accurate recommendations for cold start users because there is not enough rating information to find their neighbors or learn their latent preferences. Similarly, cold start items are new items or items that have received only a small number of ratings. Cold start items cannot be accurately recommended until they have been rated by a sufficient number of users.
Scalability. In order to make recommendations, recommender systems equipped with traditional collaborative filtering algorithms need to compute the pairwise similarities among users or among items, a cost that grows quadratically with the number of users or items. As the number of users and items in e-commerce systems grows rapidly, traditional collaborative filtering algorithms suffer seriously from scalability problems.
Much work has been proposed to overcome the issues mentioned above. For instance, to deal with data sparsity, Sarwar et al. [5] and Yongli Ren [6] adopted imputation techniques to fill in the missing ratings and make the user-item rating matrix dense. However, data imputation is still in its infancy, and several issues remain unexplored, such as how to select the most important missing data to fill in. On the other hand, several clustering based recommendation algorithms have been proposed to cope with the scalability issue. Rashid et al. proposed CLUSTKNN [7], which uses a variant of the basic k-means algorithm to partition users into clusters and then applies a CF algorithm to produce recommendations. Xue et al. proposed CBSMOOTH [12], which uses clusters as the computed groups and smooths the unrated data for individual users. Although clustering based recommendation algorithms can improve the scalability of recommender systems, they often provide less personalized recommendations and lower accuracy. To overcome cold start problems, earlier work combined traditional collaborative filtering with user demographics or product descriptions [8]; more recent research concentrates on extending the matrix factorization method [11], which is the line of work to which this paper belongs.
In recent years, matrix factorization [14] methods have drawn a lot of attention due to their good scalability and predictive accuracy. In addition, matrix factorization offers a flexible framework for incorporating additional sources of information to improve recommendation quality. Moreover, Koren [14] and Adomavicius [1] argued that additional information, such as social network information, user demographics, and item descriptions, may help matrix factorization improve recommendation performance. Following these hints, and as richer additional sources of information have become available, several recent approaches extend matrix factorization with additional information. For example, Zhen Yi et al. [11] proposed TagiCoFi to seamlessly integrate tagging history into the matrix factorization framework. Hao Ma et al. [12] and Jamali et al. [13] presented social recommendation algorithms based on matrix factorization that employ both users’ social network information and rating records. Their experimental results demonstrate that such additional information can be leveraged to improve recommendation quality.
Various kinds of additional information have been exploited to improve recommendation quality under the matrix factorization framework. However, most of these methods focus on the cold start user problem and ignore the cold start item problem, which our work tries to tackle by combining item attribute information with the matrix factorization framework.
Item attribute information is an important supplement to user interaction records and has been exploited to improve the performance of recommendation algorithms. For instance, Kim et al. [15] incorporated item attributes into an item-based probabilistic model to solve the cold start item problem. Hence, by combining the matrix factorization approach with item attribute information, we can inherit the advantages of matrix factorization while coping with the cold start item problem.
To the best of our knowledge, there exists only one recommendation algorithm [16] that attempts to combine the matrix factorization approach with item attribute information to improve recommendation quality. In this method, the similarity between items is measured by the simple matching similarity (SMS) [17], which is too coarse to capture the closeness of two items.
In this paper, we propose an attributes coupling based item enhanced matrix factorization method that incorporates item attribute information to overcome the cold start item problem and consequently improve the quality of recommendation. Specifically, item attribute information is exploited to regularize the matrix factorization by adding an item relationship regularization term to the objective function. This term makes two item-specific latent feature vectors as similar as possible if the two items have similar attribute contents. Furthermore, in order to capture the relationship between items more deeply, the Coupled Object Similarity (COS) [18] is adopted to measure the interactions, or couplings, between items. The effectiveness of COS in capturing genuine relationships between items described by categorical attributes has been validated in [18]. Experimental results on two real-life data sets show that our proposed method outperforms state-of-the-art recommendation methods and can effectively cope with the cold start item problem when more item attribute information is available.
The key contributions of our work are summarized as follows:
We propose an attributes coupling based item enhanced matrix factorization method. By combining item attribute information with the matrix factorization framework, we can cope with the cold start item problem in matrix factorization and, at the same time, inherit the advantages of the matrix factorization approach.
We capture the relationships among items with COS, which has been shown to outperform other similarity measures for categorical data (e.g., SMS [17], ADD [21]). In this way, we overcome the similarity measure problem in the matrix factorization framework.
We perform extensive experiments on two real data sets to evaluate our proposed method in terms of recommendation quality and effectiveness in tackling the cold start item problem.
The rest of this paper is organized as follows. Section 2 briefly reviews related work in recommender systems. Section 3 introduces the preliminary knowledge used in this paper. Section 4 describes the details of our proposed item recommendation algorithm, which combines the matrix factorization framework with item attribute information whose relationships are measured by the coupled object similarity metric. Section 5 presents the experimental evaluation. Finally, Section 6 concludes the paper and presents some directions for future work.
2 Related Work
Collaborative filtering (CF) [1] approaches have achieved great success in recommender systems research because they are domain independent and require only the past activity history of users, i.e., the user-item rating matrix, to make recommendations. According to how they use the user-item rating matrix, collaborative filtering approaches can be divided into two main categories [2]: memory-based algorithms and model-based algorithms.
Memory-based filtering algorithms, also known as neighbor-based methods, use the entire user-item rating matrix to generate recommendations. They first employ a similarity measure to find the user neighborhood of the active user or the item neighborhood of the target item. Once the neighborhoods are formed, memory-based filtering algorithms usually take a weighted sum of the ratings given by the neighbors (the active user’s neighbors or the target item’s neighbors) as the prediction for the target item. Typical memory-based algorithms include user-based methods [3] and item-based methods [22]. User-based approaches predict ratings from the opinions of the active user’s neighbors, who have preferences similar to those of the active user. Item-based approaches, on the other hand, make predictions from the ratings that the active user has given to items that are similar to the target item in terms of rating patterns.
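The user-based prediction step described above can be sketched as follows. This is a minimal illustration, not the method of [3]: the cosine similarity over co-rated items, the neighborhood size `k`, and the toy rating dictionary are all assumptions made for the example.

```python
from math import sqrt

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_user_based(ratings, active_user, target_item, k=2):
    """Similarity-weighted average of the ratings of the k nearest neighbours."""
    candidates = [
        (cosine_similarity(ratings[active_user], ratings[v]), v)
        for v in ratings
        if v != active_user and target_item in ratings[v]
    ]
    neighbours = sorted(candidates, reverse=True)[:k]
    weight_sum = sum(sim for sim, _ in neighbours)
    if weight_sum == 0:
        return None  # no usable neighbour has rated the target item
    return sum(sim * ratings[v][target_item] for sim, v in neighbours) / weight_sum

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
prediction = predict_user_based(ratings, "alice", "m3")
print(prediction)
```

Because every other user must be compared with the active user, the cost of this step grows with the size of the rating matrix, which is the scalability concern raised for memory-based methods.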
In contrast to memory-based filtering approaches, which use the entire user-item matrix to provide recommendations for active users, model-based filtering approaches first use statistical and machine learning techniques to learn a predictive model from training data. The predictive model characterizes the rating behaviors of active users. Model-based filtering approaches then use the trained model to make predictions, rather than using the entire user-item matrix directly. Typical examples of model-based filtering approaches include Bayesian networks [2], clustering models [24], latent semantic analysis [26], restricted Boltzmann machines [28], and association rules [29]. Breese et al. [2] presented a collaborative filtering algorithm based on Bayesian networks learned from training data. Hofmann et al. [26] introduced latent class variables to discover user communities and prototypical interest profiles. Ungar et al. [24] grouped similar users into the same class and made predictions according to the active user’s neighbors within that class. Sarwar et al. [29] applied association rule discovery algorithms to find associations between co-purchased items and then provided recommendations based on the strength of the association between items.
Generally, memory-based algorithms tend to be easy to implement and produce reasonably high prediction quality. However, they suffer from serious scalability problems: as the numbers of users and items grow, their poor online performance makes them unsuitable for modern e-commerce sites. Model-based algorithms tend to be faster than memory-based algorithms in terms of response time. Their disadvantages are that many theoretical models are complex and do not fit real data well, and that building or updating the models takes a long time.
Since the great success of the Netflix Prize competition, matrix factorization [14] based recommendation algorithms have gained great popularity due to their effectiveness and efficiency in dealing with very large user-item rating matrices. Based on the assumption that only a few factors contribute to a user’s preferences and an item’s characteristics, matrix factorization approaches simultaneously embed user and item feature vectors into a low-dimensional latent factor space, where the correlation between a user’s preferences and an item’s characteristics can be computed directly, and then use these low-dimensional representations to make recommendations. Examples of matrix factorization based recommendation algorithms include Singular Value Decomposition (SVD) [31], Nonnegative Matrix Factorization (NMF) [32], Maximum-Margin Matrix Factorization (MMMF) [35], Probabilistic Matrix Factorization (PMF) [37], and nonparametric matrix factorization (NPCA) [38].
The above-mentioned matrix factorization methods use only user-item rating information to learn latent user and item feature vectors and ignore additional information such as social networks, tagging information, and item attribute information. Although these methods can deal with large user-item rating data effectively and efficiently, they remain vulnerable to the cold start problem because of the sparsity of the user-item rating information.
Recently, based on the intuition that additional information may be useful for improving the performance of recommender systems, especially for overcoming the cold start user problem, several matrix factorization algorithms have been proposed. For example, Zhen Yi et al. [11] proposed TagiCoFi to seamlessly integrate tagging history into the matrix factorization framework. Le Wu [39] proposed a two-stage recommendation framework, named Neighborhood-aware Probabilistic Matrix Factorization (NHPMF), which extends the probabilistic matrix factorization method by leveraging tagging data to improve recommendation accuracy. Hao Ma et al. [12] and Jamali et al. [13] proposed social recommendation algorithms based on matrix factorization that employ both users’ social network information and rating information. These extensions of matrix factorization leverage additional information, such as tagging data and social relations, to infer the similarity among users. The precomputed similarity information is then incorporated into a basic matrix factorization method to ensure that the learned latent feature vector of each user is as close as possible to those of the user’s neighbors. These approaches are especially effective for tackling the cold start user problem: they force the latent feature vectors of new users with no or very few ratings to depend on the latent feature vectors of their most similar neighbors, which can be accurately learned from the user-item matrix.
However, there are several problems with these methods. First, tagging data, expressed as words, are assigned by users arbitrarily. Second, taking social relations as the similarity between users is too coarse-grained to distinguish degrees of similarity, since the similarity value is 1 only if two users have a trust relationship and 0 otherwise. Moreover, these methods consider only the cold start user problem and ignore the cold start item problem.
In contrast, item attribute information generated by domain experts (for example, director, actor, and genre for a movie item) can represent the characteristics of an item more accurately. Hence, item attribute information can be exploited to deal with the cold start item problem and improve the quality of recommendation. However, little work has focused on exploiting item attribute information for this purpose. To the best of our knowledge, only Nguyen et al. [16] proposed a content-boosted matrix factorization method for recommender systems that uses item attribute content to improve recommendation quality. In the content-boosted matrix factorization method, the similarity between two items is measured by the simple matching similarity, which is too coarse to capture the genuine relationships among items.
3 Preliminary Knowledge
In this section, we introduce the preliminary knowledge related to our proposed attributes coupling based item enhanced matrix factorization algorithm. We first introduce the notation used in this paper in Section 3.1. Then, in Section 3.2, we briefly describe the matrix factorization based recommendation algorithm. Finally, in Section 3.3, we present the Coupled Object Similarity (COS) [18], which is used to measure the relationships among items based on item attribute information.
3.1 Notations
In a typical scenario, a recommender system consists of a set of users and a set of items. User preferences on items are usually converted into a user-item rating matrix with one row per user and one column per item. Each entry of the matrix represents the rating given by a user on an item. In principle, a rating can be any real number, but ratings are usually integers in [0,5], where 0 indicates that the user has not yet rated the item. A higher rating corresponds to greater satisfaction. The set of items rated by a given user is a subset of the item set.
In practice, the user-item rating matrix is generally very sparse, with many unknown entries, since a typical user may have rated only a tiny percentage of the items. For example, in the MovieLens100K and Netflix data sets, 93% and 99% of the possible ratings are missing, respectively. Consequently, the sparse nature of the user-item rating matrix leads to poor recommendation quality.
Moreover, each item is represented as an attribute vector whose length is the number of attributes. These attribute vectors are extracted from the content information of items and are categorical in nature. For example, if the item set is a collection of movies, attributes such as director, actor, and genre are extracted to describe a movie item, and each attribute takes categorical values, such as “Drama”, “War”, and “Comedy” for the attribute genre. All item attribute vectors together form the item-attribute information matrix, each entry of which is the value of one attribute for one item.
In essence, the objective of a recommender system is to predict the rating that an active user would give to a specified item by leveraging all available sources of information with machine learning techniques.
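For concreteness, the quantities of this subsection can be written out as below. The specific letters are assumptions chosen to follow common usage in the matrix factorization literature, so this is a sketch of one consistent notation rather than a transcription of the original symbols.

```latex
\begin{aligned}
U &= \{u_1,\dots,u_m\} && \text{set of } m \text{ users}\\
I &= \{i_1,\dots,i_n\} && \text{set of } n \text{ items}\\
R &= [r_{ui}]_{m\times n} && \text{user-item rating matrix, } r_{ui}=0 \text{ if unrated}\\
I_u &\subseteq I && \text{set of items rated by user } u\\
a_i &= (a_{i1},\dots,a_{il}) && \text{attribute vector of item } i \text{ with } l \text{ attributes}\\
A &= [a_{ik}]_{n\times l} && \text{item-attribute information matrix}\\
\hat r_{ui} & && \text{predicted rating of user } u \text{ on item } i
\end{aligned}
```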
3.2 Matrix Factorization for Recommender Systems
Matrix factorization is widely employed in recommender systems research. Its goal is to learn the latent preferences of users and the latent characteristics of items from all known ratings, and then to predict unknown ratings through the inner products of user latent feature vectors and item latent feature vectors. Formally, matrix factorization based methods decompose the user-item rating matrix into two low-rank latent feature matrices, one for users and one for items, and then use the product of these two matrices to approximate the rating matrix.
The column vectors of the two matrices are the low-dimensional user-specific and item-specific latent feature vectors, respectively. Once the low-rank latent feature matrices have been learned, the rating given by an active user to a target item is estimated as the inner product of the corresponding user and item latent feature vectors.
In order to learn the latent feature vectors of users and items, the approximation problem described above is traditionally solved with the Singular Value Decomposition (SVD) [40], which minimizes the squared Frobenius norm [41] of the difference between the rating matrix and the product of the two latent feature matrices (Equation 3). Although SVD is a powerful technique for identifying latent semantic factors in information retrieval, it is not well defined when the user-item rating matrix is highly sparse. Hence, it is common to factorize only the observed ratings directly, which turns the objective function of Equation 3 into a sum of squared errors over the set of user-item pairs with known ratings (Equation 4).
To avoid over-fitting, two regularization terms on the norms of the user and item latent feature matrices are added to Equation 4, which yields the regularized objective function of Equation 5. The regularization parameters control the influence of these terms on the learnt latent feature vectors.
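Written out under assumed notation, the chain from the factorization to the regularized objective takes the following standard form. Here $R$ is the rating matrix, $P \in \mathbb{R}^{d\times m}$ and $Q \in \mathbb{R}^{d\times n}$ are the user and item latent feature matrices with columns $p_u$ and $q_i$, $\Omega$ is the set of user-item pairs with known ratings, and $\lambda_1, \lambda_2$ are the regularization parameters. Apart from $Q$, which is named later in the text, these symbol names are assumptions, as are the conventional factors of $\tfrac12$; the numbering follows the equation references in the text.

```latex
\begin{align}
R &\approx P^{\top} Q, \tag{1}\\
\hat r_{ui} &= p_u^{\top} q_i, \tag{2}\\
\min_{P,Q}\ & \tfrac12 \lVert R - P^{\top}Q \rVert_F^2, \tag{3}\\
\min_{P,Q}\ & \tfrac12 \sum_{(u,i)\in\Omega} \bigl(r_{ui} - p_u^{\top} q_i\bigr)^2, \tag{4}\\
\min_{P,Q}\ \mathcal{L} &= \tfrac12 \sum_{(u,i)\in\Omega} \bigl(r_{ui} - p_u^{\top} q_i\bigr)^2
  + \tfrac{\lambda_1}{2}\lVert P\rVert_F^2 + \tfrac{\lambda_2}{2}\lVert Q\rVert_F^2. \tag{5}
\end{align}
```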
Because both latent feature matrices are unknown, the optimization problem in Equation 5 is biconvex. Usually, an efficient and easy-to-implement algorithm, stochastic gradient descent (SGD) [42], is applied to seek a local minimum of the objective function given by Equation 5. SGD keeps iterating over the training set until the objective function in Equation 5 converges or the maximum number of iterations is reached.
To learn the user latent feature matrix, we fix the item latent feature matrix and take the derivative of the objective function with respect to each user latent feature vector (Equation 6). Similarly, we learn the item latent feature matrix by keeping the user latent feature matrix fixed and taking the derivative of the objective function with respect to each item latent feature vector. Accordingly, the stochastic gradient descent algorithm updates each latent feature vector by moving it against its gradient, with the step size given by the learning rate.
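The update procedure just described can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: a single regularization parameter `lam` is used for both users and items, updates are applied per observed rating, and the initialization range, learning rate `eta`, number of epochs, and toy ratings are arbitrary choices.

```python
import random

def train_rsvd(ratings, n_users, n_items, d=2, lam=0.05, eta=0.02, epochs=500, seed=0):
    """SGD for regularized matrix factorization on (user, item, rating) triples."""
    rng = random.Random(seed)
    # Small positive initial values (an arbitrary choice for this sketch).
    P = [[rng.uniform(0.1, 0.5) for _ in range(d)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(d)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            # Prediction error of the inner product on this observed rating.
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(d):
                pu, qi = P[u][f], Q[i][f]
                # Step against the gradient of the regularized squared error.
                P[u][f] += eta * (err * qi - lam * pu)
                Q[i][f] += eta * (err * pu - lam * qi)
    return P, Q

def squared_error(P, Q, ratings):
    """Sum of squared prediction errors over the given ratings."""
    return sum((r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))) ** 2
               for u, i, r in ratings)

# Hypothetical toy data: (user index, item index, rating)
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = train_rsvd(ratings, 3, 3, epochs=0)   # initial factors only
P, Q = train_rsvd(ratings, 3, 3)
print(squared_error(P0, Q0, ratings), squared_error(P, Q, ratings))
```

Only observed ratings are visited, so the cost of one pass is proportional to the number of known ratings rather than to the full size of the rating matrix.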
The matrix factorization algorithm described above is the so-called Regularized Singular Value Decomposition (RSVD) [40], which is widely employed due to its good scalability and high recommendation quality. From a Bayesian perspective, RSVD is equivalent to Probabilistic Matrix Factorization [37], which has been demonstrated to be one of the state-of-the-art collaborative filtering methods.
In this paper, we take the RSVD method as the baseline approach and enhance it by incorporating item attribute information to improve the recommendation quality and make recommendations more interpretable.
3.3 Item Relationship Measure COS
In recommender systems, items are usually described by categorical attributes. For example, a movie item can be represented by a collection of categorical features (e.g., director, actor, genre, and country). There are few suitable similarity measures for items described by categorical attributes. For instance, in [16], which is one of our main comparison algorithms, Nguyen et al. use the simple matching similarity (SMS) to measure the closeness between two items: the per-attribute similarities of the two items are summed and divided by the number of attributes, where the per-attribute similarity is 1 if the two items take identical values on that attribute and 0 otherwise.
In essence, for categorical data, the SMS uses only 0 and 1 to distinguish between distinct and identical categorical values. Hence, it is relatively coarse and fails to capture the genuine relationship between categorical data. For example, with the simple matching similarity, the similarity between two items described as [‘A1’,‘B1’,‘C1’] and [‘A1’,‘B2’,‘C2’] is 0.33, whereas the coupled object similarity computed from Table 2 in [18] is 0.75, which more accurately reflects the relationship between categorical data.
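The SMS value in the example above can be reproduced with a short sketch. The function name is hypothetical, and the normalization by the number of attributes is inferred from the value 0.33 quoted in the text.

```python
def simple_matching_similarity(item_a, item_b):
    """Fraction of attributes on which two categorical items take identical values."""
    if len(item_a) != len(item_b):
        raise ValueError("items must have the same number of attributes")
    matches = sum(1 for a, b in zip(item_a, item_b) if a == b)
    return matches / len(item_a)

sms = simple_matching_similarity(["A1", "B1", "C1"], ["A1", "B2", "C2"])
print(round(sms, 2))
```

Only the first attribute matches, so one out of three attributes contributes; any two distinct values, however closely related, contribute nothing.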
Therefore, we adopt the Coupled Object Similarity (COS) [0,1] proposed in [18] to measure the similarity between items based on the item-attribute information matrix . The COS considers both the intra-coupled similarity within an attribute and the inter-coupled similarity between different attributes, where the effectiveness of COS in capturing genuine relationship between items described by categorical data has been validated in [18].
Formally, the Coupled Object Similarity (COS) between two categorical items is obtained by aggregating, over all attributes, the Coupled Attribute Value Similarity (CAVS) between the values that the two items take on each attribute.
The CAVS consists of the Intra-coupled Attribute Value Similarity (IaAVS) measure and the Inter-coupled Attribute Value Similarity (IeAVS) measure for a given attribute. The IaAVS takes the occurrence frequency of values within an attribute into account and reflects value similarity in terms of frequency distribution, while the IeAVS considers the aggregated dependency among attributes and reflects value similarity in terms of value co-occurrence across items. The CAVS between two values of an attribute is defined by considering IaAVS and IeAVS simultaneously.
In detail, the IaAVS is based on the intuition that the more similar the occurrence frequencies of two attribute values are, the more similar the values are, and that a higher occurrence frequency of an attribute value means greater importance [17]. It is defined through the set information function, which maps a value of an attribute to the set of items that take this value on the attribute.
On the other hand, the IeAVS between two values of an attribute is computed as a weighted sum, over the other attributes, of the similarity of the two values with respect to each of those attributes, where each attribute has a weight and all weights sum to 1. The similarity with respect to another attribute is computed over the intersection of two sets: the values of the other attribute taken by the items that have the first value, and those taken by the items that have the second value. It relies on the information conditional probability of a value of the other attribute given a value of the attribute under consideration.
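The construction described above can be sketched in Python. The formulas in the comments are assumptions about the form the definitions of [18] take: IaAVS as a ratio of value frequencies, IeAVS as an equally weighted sum over the other attributes of minimum conditional probabilities on shared co-occurring values, CAVS as their product, and COS as the unnormalized sum of CAVS over attributes. The function names and the toy item table are hypothetical.

```python
def build_index(items):
    """g[j][v] = set of item indices whose attribute j takes value v."""
    g = [dict() for _ in items[0]]
    for idx, item in enumerate(items):
        for j, value in enumerate(item):
            g[j].setdefault(value, set()).add(idx)
    return g

def ia_avs(g, j, x, y):
    """Assumed IaAVS: |g(x)||g(y)| / (|g(x)| + |g(y)| + |g(x)||g(y)|)."""
    fx, fy = len(g[j][x]), len(g[j][y])
    return (fx * fy) / (fx + fy + fx * fy)

def ie_avs(g, j, x, y):
    """Assumed IeAVS with equal weights over the other attributes."""
    others = [k for k in range(len(g)) if k != j]
    if not others:
        raise ValueError("inter-coupled similarity needs at least two attributes")
    total = 0.0
    for k in others:
        for w, holders in g[k].items():
            with_x, with_y = holders & g[j][x], holders & g[j][y]
            if with_x and with_y:  # value w co-occurs with both x and y
                p_x = len(with_x) / len(g[j][x])  # conditional probability of w given x
                p_y = len(with_y) / len(g[j][y])  # conditional probability of w given y
                total += min(p_x, p_y)
    return total / len(others)

def coupled_object_similarity(items, a, b):
    """Assumed COS: sum over attributes of IaAVS times IeAVS."""
    g = build_index(items)
    return sum(
        ia_avs(g, j, items[a][j], items[b][j]) * ie_avs(g, j, items[a][j], items[b][j])
        for j in range(len(g))
    )

# Hypothetical toy table of six items with three categorical attributes.
items = [
    ("A1", "B1", "C1"), ("A2", "B1", "C1"), ("A2", "B2", "C2"),
    ("A3", "B3", "C2"), ("A4", "B3", "C3"), ("A4", "B2", "C3"),
]
print(coupled_object_similarity(items, 1, 2))
```

In this toy table, items 1 and 2 agree on one attribute out of three, so their SMS is 0.33, while the sketch gives 0.75: the values B1 and B2, and C1 and C2, receive partial credit because they co-occur with the same value A2 of the first attribute.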
Overall, by adopting the coupled object similarity to measure the similarity among categorical items, we can capture the genuine relationships among items accurately and better characterize the item latent feature vectors in the process of matrix factorization, and hence produce more accurate recommendations than conventional approaches.
4 Attributes Coupling Based Item Enhanced Matrix Factorization Method
In this section, we propose our attributes coupling based item enhanced matrix factorization method for recommender systems, in which item attribute information is used to regularize the matrix factorization procedure.
4.1 Framework of Attributes Coupling Based Item Enhanced Matrix Factorization Method
The key idea of our proposed recommendation algorithm is to use item attribute information to regularize the matrix factorization. The item attribute information is turned into an item relationship regularization term, under the assumption that the latent feature vectors of two items are similar if the two items have similar characteristics in terms of item attribute information.
In order to make the latent feature vectors of two items as similar as possible when the items are close according to their attribute contents, we add an item relationship regularization term based on item attribute information to constrain the baseline matrix factorization framework, i.e., RSVD. The item relationship regularization term (Equation 17) penalizes the squared distance between the latent feature vectors of each pair of items, weighted by the similarity between the two items.
This term is scaled by another regularization parameter, which controls the impact of the item attribute information. The similarity between two items is computed from their item attribute information and forms the corresponding entry of the item similarity matrix. In our proposed approach, this similarity is measured with the coupled object similarity [18], which has been described in Section 3.3. When the similarity between two items is small, the distance between their latent feature vectors is allowed to be large, whereas a large similarity forces the distance to be small. Hence, this item relationship regularization term brings two item latent feature vectors closer together if the items share common characteristics according to their item attribute information.
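The item relationship regularization term can be rewritten in matrix form in terms of the item latent feature matrix Q and the Laplacian matrix of the item similarity matrix (Equation 18). The Laplacian matrix is the difference between a diagonal matrix, whose diagonal elements are the row sums of the similarity matrix, and the similarity matrix itself.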
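The matrix form follows from expanding the squared distances. A short derivation is given below, assuming a symmetric similarity matrix $S$ with entries $S_{ij}$, item latent feature vectors $q_i$ as the columns of $Q$, the diagonal matrix $D$ with $D_{ii} = \sum_j S_{ij}$, and the Laplacian $L = D - S$. The symbol names other than $Q$ are assumptions, and the constant in front may differ from the one used in Equation 18.

```latex
\begin{aligned}
\sum_{i,j} S_{ij}\,\lVert q_i - q_j\rVert^2
 &= \sum_{i,j} S_{ij}\left(q_i^{\top}q_i - 2\,q_i^{\top}q_j + q_j^{\top}q_j\right)\\
 &= 2\sum_i D_{ii}\, q_i^{\top}q_i - 2\sum_{i,j} S_{ij}\, q_i^{\top}q_j\\
 &= 2\,\mathrm{tr}\!\left(Q D Q^{\top}\right) - 2\,\mathrm{tr}\!\left(Q S Q^{\top}\right)
  = 2\,\mathrm{tr}\!\left(Q L Q^{\top}\right).
\end{aligned}
```

The second step uses the symmetry of $S$, so that the first and third terms of the expansion contribute equally.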
By adding the item relationship regularization term to Equation 5, our proposed attributes coupling based item enhanced matrix factorization method is formulated as the objective function of Equation 19.
Replacing Equation 17 with Equation 18 changes the objective function of Equation 19 into Equation 20.
Similar to the RSVD approach, we seek a local minimum of the objective function in Equation 20 by applying the stochastic gradient descent algorithm. To learn the latent feature vectors, we update the user and item latent feature vectors with gradient steps (Equations 21 and 22).
From the updating rules in Equations 21 and 22, it is easy to see that the gradient with respect to the user latent feature vector is identical to Equation 6, while the gradient with respect to the item latent feature vector gains an additional term contributed by the item relationship regularization.
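The effect of the additional term on the item gradient can be sketched as follows. This is an illustration under assumptions, not the paper's exact formulation: the regularizer is taken to be `beta / 2` times the sum over all ordered item pairs of the similarity times the squared distance, a single `lam` regularizes both factor matrices, and the names `objective`, `grad_item`, `lam`, `beta`, and the toy values are hypothetical. Zero entries of `S` can be used to restrict the term to each item's most similar neighbors.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(P, Q, ratings, S, lam, beta):
    """Assumed objective: squared error + norm penalties + item relationship term."""
    fit = 0.5 * sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    reg = 0.5 * lam * (sum(dot(p, p) for p in P) + sum(dot(q, q) for q in Q))
    coupling = 0.0
    for i in range(len(Q)):
        for j in range(len(Q)):
            diff = [a - b for a, b in zip(Q[i], Q[j])]
            coupling += S[i][j] * dot(diff, diff)
    return fit + reg + 0.5 * beta * coupling

def grad_item(P, Q, ratings, S, lam, beta, i):
    """Gradient of the assumed objective with respect to item vector Q[i]."""
    d = len(Q[i])
    grad = [lam * q for q in Q[i]]            # norm penalty
    for u, item, r in ratings:                # rating-fit part, as in RSVD
        if item == i:
            err = r - dot(P[u], Q[i])
            for f in range(d):
                grad[f] -= err * P[u][f]
    for j in range(len(Q)):                   # additional item relationship part
        w = S[i][j] + S[j][i]
        for f in range(d):
            grad[f] += beta * w * (Q[i][f] - Q[j][f])
    return grad

# Hypothetical toy values; item 2 has no ratings (a cold start item).
P = [[0.5, 1.0], [1.5, -0.5]]
Q = [[1.0, 0.2], [0.3, 0.8], [-0.4, 0.6]]
ratings = [(0, 0, 4.0), (1, 0, 3.0), (0, 1, 2.0)]
S = [[0.0, 0.2, 0.1], [0.2, 0.0, 0.9], [0.1, 0.9, 0.0]]
lam, beta = 0.1, 0.5
cold_grad = grad_item(P, Q, ratings, S, lam, beta, 2)
print(cold_grad)
```

For the unrated item, the rating-fit part of the gradient is empty, so without the item relationship term its vector would only shrink toward zero. With the term, each gradient step also pulls the vector toward the vectors of items with similar attributes, which is how attribute information reaches cold start items.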
To summarize, our proposed attributes coupling based item enhanced matrix factorization approach for recommender system is described in Algorithm ?.
4.2 Complexity Analysis
In our proposed recommendation algorithm, the main computation cost involves two parts: learning the latent feature vectors and computing the similarity among items with the coupled object measure.
The main computation cost of learning parameters is to evaluate the objective function and its gradients against latent user and item feature vectors. The computational complexity of evaluating objective function is , where is the average number of ratings per item and is the average number of most similar neighbors per item. Since the user-item rating matrix is extremely sparse, the value of is relatively small. On the other hand, in our proposed matrix factorization model, we always choose items that are most similar to target item as the neighbors of target item, which indicates that generally takes relatively small value. Hence, the computation of is fast and linear with respect to the number of items in the user-item rating matrix . Assuming the average number of ratings per user is , the time complexities of evaluating and are and , respectively. Hence, the total time complexity of computing the gradients in each iteration is .
The overhead of computing the similarity among items is , where is the maximal number of attribute values for all the attributes in item-attribute information matrix . Generally, the value of is small. For instance, the value of is 1 in MovieLens data sets
5 Experiments and Evaluation
In this section, we conduct several experiments on real data sets to compare the performance of our proposed attributes coupling based item enhanced matrix factorization method, referred to as IEMF, with other state-of-the-art methods. We address the following questions.
How does our proposed IEMF compare with other state-of-the-art collaborative filtering approaches, especially with matrix factorization technique based recommendation algorithms?
How do the control parameters impact the quality of recommendation?
Can IEMF effectively tackle the cold start item problem?
How does the size of item’s neighborhood affect the recommendation results?
5.1Data Sets Description
Several data sets have been widely used to evaluate the performance of recommendation algorithms, such as Movielens, EachMovie
MovieLens100K contains 100,000 ratings from 943 users on 1,682 movies. Users with fewer than 20 ratings have been removed. The sparsity level of MovieLens100K is $1 - \frac{100{,}000}{943 \times 1{,}682}$, which is equal to 93.69%. HetRec2011 is an extension of MovieLens10M, published by the GroupLens research group. HetRec2011 contains 855,598 ratings given by 2,113 users on 10,197 movies. The sparsity level of HetRec2011 is 96.03%, so it is sparser than MovieLens100K. Moreover, MovieLens100K contains only genre information and lacks attributes such as director, actor and country, while the HetRec2011 data set includes relatively complete item attribute information, covering director, actor, country and genre attributes. In our experiments, we extract attributes such as director, country and genre from the HetRec2011 data set to represent the item attribute vectors, and extract only the genre attribute to describe the item attribute vectors in MovieLens100K.
Note that the original HetRec2011 data set is incomplete. For instance, some movie items do not have a country attribute, others do not have a director attribute, and the genres of several movie items are even incorrectly labeled. Moreover, the numbers of movie items are not consistent among the item records that include the country attribute, those that include the director attribute, and those that contain genre information. So, we preprocess the HetRec2011 data set before our experiments by leveraging the IMDB
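The sparsity figures quoted above follow directly from the rating, user and item counts. The short sketch below recomputes them; the function name `sparsity` is introduced here only for illustration.

```python
def sparsity(num_ratings: int, num_users: int, num_items: int) -> float:
    """Fraction of cells in the user-item rating matrix that carry no rating."""
    return 1.0 - num_ratings / (num_users * num_items)


# Counts quoted in the text for the raw data sets.
movielens = sparsity(100_000, 943, 1_682)
hetrec = sparsity(855_598, 2_113, 10_197)

print(f"MovieLens100K sparsity: {movielens:.4f}")
print(f"HetRec2011 sparsity:    {hetrec:.4f}")
```

Both values agree with the reported 93.69% and 96.03% up to rounding in the last digit.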
General statistics about these two data sets are summarized in Table ?.
Statistics | MovieLens100K | HetRec2011 |
---|---|---|
Num. of Ratings | 100,000 | 851,871 |
Num. of Users | 943 | 2,113 |
Num. of Items | 1,682 | 10,046 |
Sparsity | 0.9369 | 0.9599 |
Avg. Ratings per User | 106.04 | 403.157 |
Avg. Ratings per Item | 59.45 | 84.80 |
In Figure 1, we also plot the power distributions of the two data sets. From Figure 1, we can observe that the number of ratings per item shows a more serious long-tail effect than the number of ratings per user for both data sets. In other words, the negative effect of cold start items on recommendation quality is larger than that of cold start users. This difference suggests that we should pay more attention to the cold start item problem than to the cold start user problem, which is the motivation of our proposed method.
5.2Evaluation Metrics
We choose two popular metrics, Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), to measure the recommendation quality of our proposed method compared with other recommendation algorithms. Formally,
$$\mathrm{MAE} = \frac{1}{T}\sum_{i=1}^{T} \left| r_i - \hat{r}_i \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{i=1}^{T} \left( r_i - \hat{r}_i \right)^2},$$
where $r_i$ and $\hat{r}_i$ are the real rating and the corresponding prediction, respectively, and $T$ denotes the total number of predictions generated for all active users.
From the above equations, we can see that the lower the MAE or RMSE, the better the recommendation algorithm.
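As a concrete illustration of the two metrics, the following minimal sketch computes MAE and RMSE from paired lists of real ratings and predictions. The toy ratings are made up for the example and are not taken from either data set.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error over paired ratings and predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error over paired ratings and predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Toy example (values are illustrative only).
ratings = [4.0, 3.0, 5.0, 2.0]
predictions = [3.5, 3.0, 4.0, 3.0]
print(mae(ratings, predictions))   # 0.625
print(rmse(ratings, predictions))  # 0.75
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is why the two metrics are usually reported together.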
5.3Compared Approaches and Experimental Settings
In order to evaluate the performance of our proposed method, we choose the following state-of-the-art approaches for comparison.
RSVD: RSVD is proposed by Arkadiusz Paterek [40]. This method learns latent feature vectors by minimizing the sum-of-squared error between real ratings and estimations for the available ratings in the training set. It has been demonstrated to be one of the state-of-the-art collaborative filtering methods and only utilizes the user-item rating matrix to generate recommendations.
NMF: This method is proposed by Lee et al. [43]. Different from other matrix factorization techniques, it adds one more constraint on the matrix factor model: both low-rank latent feature matrices have only non-negative entries. This method also utilizes only the user-item rating matrix to produce recommendations.
PMF: This method is presented by Salakhutdinov et al. [37] and can be viewed as a probabilistic extension of the SVD model. PMF represents the latent user and item feature vectors by means of a probabilistic graphical model with Gaussian observation noise. Similar to RSVD and NMF, PMF learns the latent user and item feature vectors based only on rating information.
CBMF: This method is proposed by Nguyen et al. [16]. To facilitate comparison, we refer to this method as CBMF. CBMF incorporates content information directly into the matrix factorization approach to improve the quality of recommendation. More specifically, the simple matching similarity (SMS) metric is used to measure the relationship between two categorical items.
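To make the simple matching similarity concrete, the sketch below scores two categorical items by the fraction of attributes on which they agree. It assumes each item is represented as a fixed-length tuple of categorical values; the function name and the attribute values are hypothetical and only serve the illustration.

```python
from typing import Hashable, Sequence


def simple_matching_similarity(item_a: Sequence[Hashable],
                               item_b: Sequence[Hashable]) -> float:
    """Fraction of attribute positions where the two items hold the same value."""
    if len(item_a) != len(item_b) or not item_a:
        raise ValueError("items must have the same non-zero number of attributes")
    matches = sum(1 for a, b in zip(item_a, item_b) if a == b)
    return matches / len(item_a)


# Hypothetical items described by (director, country, genre).
movie_x = ("director_1", "USA", "Drama")
movie_y = ("director_2", "USA", "Drama")
movie_z = ("director_3", "France", "Comedy")

print(simple_matching_similarity(movie_x, movie_y))  # two of three attributes match
print(simple_matching_similarity(movie_x, movie_z))  # no attribute matches
```

Note that this measure treats every mismatch identically: two different directors always contribute zero, regardless of how their values co-occur with other attributes. That limitation is what a coupled similarity measure is intended to address.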
In order to make a fair comparison, we use identical values for the common parameters in all methods. For all involved recommendation algorithms, we set . Meanwhile, the learning rate in all methods is set to 0.005. Specifically, the control parameters in CBMF and IEMF are set to 0.1. Finally, we use and the number of iterations to control the loop conditions of the matrix factorization procedures.
We conduct a five-fold cross validation over the MovieLens100K and HetRec2011 data sets, randomly extracting different training and test sets each time, which account for 80% and 20% of the ratings, respectively. Finally, we report the average results on the test sets.
We use a PC with an Intel Xeon CPU@3.2GHz processor, 8GB memory, the Windows Server 2003 operating system and J2SE 1.7 to conduct all our experiments.
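One common way to realize such a protocol is a standard k-fold partition of the rating triples, sketched below. This is an assumption about the procedure rather than a description of the actual experimental code; the function name `k_fold_splits`, the fixed seed and the toy ratings are all hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Rating = Tuple[int, int, float]  # (user id, item id, rating)


def k_fold_splits(ratings: Sequence[Rating], num_folds: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[Rating], List[Rating]]]:
    """Yield (train, test) pairs; each fold serves as the test set exactly once."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled ratings into num_folds nearly equal parts.
    folds = [shuffled[i::num_folds] for i in range(num_folds)]
    for held_out in range(num_folds):
        test = folds[held_out]
        train = [r for i, fold in enumerate(folds) if i != held_out for r in fold]
        yield train, test


# Toy data: 10 users x 10 items, all rated 3.0 (illustrative only).
toy_ratings = [(user, item, 3.0) for user in range(10) for item in range(10)]
splits = list(k_fold_splits(toy_ratings))
for train, test in splits:
    print(len(train), len(test))
```

With five folds, each split uses 80% of the ratings for training and 20% for testing, and the reported score would be the mean of the per-fold test errors.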
5.4 Recommendation Quality Comparisons
Dataset | Dimension | Metric | RSVD | NMF | PMF | CBMF | IEMF |
---|---|---|---|---|---|---|---|
MovieLens100K | 10 | MAE | 0.7468 | 0.7919 | 0.7519 | 0.7308 | 0.7282 |
MovieLens100K | 10 | RMSE | 0.9576 | 1.0027 | 0.9663 | 0.9213 | 0.9186 |
MovieLens100K | 50 | MAE | 0.7437 | 0.7774 | 0.7674 | 0.7298 | 0.7277 |
MovieLens100K | 50 | RMSE | 0.9594 | 0.9840 | 0.9757 | 0.9198 | 0.9182 |
HetRec2011 | 10 | MAE | 0.6091 | 0.6287 | 0.6082 | 0.6026 | 0.5802 |
HetRec2011 | 10 | RMSE | 0.7910 | 0.8317 | 0.8000 | 0.7845 | 0.7667 |
HetRec2011 | 50 | MAE | 0.6178 | 0.6355 | 0.6159 | 0.6097 | 0.5920 |
HetRec2011 | 50 | RMSE | 0.8234 | 0.8308 | 0.8219 | 0.7922 | 0.7816 |
Table ? reports the recommendation quality of the above selected recommendation algorithms, with the dimension of the latent feature vectors set to 10 and 50.
From Table ?, we can observe that CBMF and IEMF outperform the other methods, which only utilize the user-item rating matrix to learn latent feature vectors. With 10-dimensional latent feature vectors, CBMF improves the MAE of PMF by 2.8% and 1% on MovieLens100K and HetRec2011, respectively. With the same parameter settings, our proposed method IEMF improves the MAE of PMF by 3.2% and 4.6% on MovieLens100K and HetRec2011, respectively.
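The quoted percentages are relative reductions in MAE with respect to PMF. The sketch below recomputes them from the dimension-10 MAE rows of the results table; the helper name `relative_improvement` is introduced here for illustration.

```python
def relative_improvement(baseline_error: float, method_error: float) -> float:
    """Percentage by which method_error is lower than baseline_error."""
    return 100.0 * (baseline_error - method_error) / baseline_error


# MAE values at dimension 10, copied from the results table.
pmf_mae = {"MovieLens100K": 0.7519, "HetRec2011": 0.6082}
cbmf_mae = {"MovieLens100K": 0.7308, "HetRec2011": 0.6026}
iemf_mae = {"MovieLens100K": 0.7282, "HetRec2011": 0.5802}

for dataset in pmf_mae:
    cbmf_gain = relative_improvement(pmf_mae[dataset], cbmf_mae[dataset])
    iemf_gain = relative_improvement(pmf_mae[dataset], iemf_mae[dataset])
    print(f"{dataset}: CBMF {cbmf_gain:.1f}%, IEMF {iemf_gain:.1f}%")
```

The CBMF gain on HetRec2011 evaluates to roughly 0.9%, which the text rounds to 1%.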
This observation confirms the assumption that using item content information can improve the recommendation quality. Moreover, of CBMF and IEMF, which both integrate item content information into matrix factorization, IEMF generally achieves better results than CBMF on both data sets. This observation indicates that our coupled object similarity (COS) measure is more accurate than the simple matching similarity (SMS) in capturing the genuine relationship between two categorical items. Hence, the COS measure is more helpful for generating better recommendations.
It should be noted that on the HetRec2011 data set, the MAE and RMSE values generated by the above selected methods with 50-dimensional latent feature vectors are generally higher than the corresponding values with 10-dimensional vectors, which means that a high dimension of latent feature vectors may degrade the performance of recommendation algorithms based on matrix factorization. A possible reason is that continuing to increase the dimension may introduce noise into the matrix factorization model once the dimension has reached the value that characterizes the user and item features adequately.
In addition, all our selected methods perform better on HetRec2011 than on MovieLens100K. This is due to the fact that the average number of ratings per user in HetRec2011 is much larger, nearly 4 times the corresponding number in MovieLens100K. Moreover, the gain of our proposed method IEMF over NMF on HetRec2011 in terms of MAE is greater than the gain on MovieLens100K. This phenomenon indicates that our proposed method works better when more attribute information is available, since HetRec2011 includes country, director, actor and genre attributes, while MovieLens100K only contains the genre attribute.
5.5Impact of Control Parameter
In our proposed method, the control parameter plays an important role: it controls the influence of item attribute information on learning the item latent feature vectors. A larger value of this parameter indicates that we put more weight on item attribute information to predict items’ characteristics. In the extreme case, item attribute information would dominate the learning process and make each item latent feature vector close to those of its direct neighbors. A small value makes our method degrade to the baseline RSVD method. Hence, both very large and very small values of the parameter hurt the recommendation quality. In this section, we perform a group of experiments to evaluate the impact of the control parameter on the performance of our proposed method by changing its value from 0.01 to 1. Another parameter is set as .
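The trade-off described above can be illustrated with a small sketch of a regularized factorization loss. The exact objective is defined earlier in the paper; the form below is only an assumed, commonly used graph-regularized variant, and the names `alpha` (control parameter), `lam` (norm regularization weight) and `neighbors` (item to list of neighbor and similarity pairs) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def regularized_loss(ratings: List[Tuple[int, int, float]],
                     user_vecs: Dict[int, Vector],
                     item_vecs: Dict[int, Vector],
                     neighbors: Dict[int, List[Tuple[int, float]]],
                     alpha: float, lam: float = 0.1) -> float:
    """Squared rating error + norm penalty + similarity-weighted neighbor penalty."""
    fit = sum((r - dot(user_vecs[u], item_vecs[i])) ** 2 for u, i, r in ratings)
    norm = (sum(dot(v, v) for v in user_vecs.values())
            + sum(dot(v, v) for v in item_vecs.values()))
    # Assumed coupling term: pulls each item vector towards its attribute neighbors.
    coupling = sum(sim * squared_distance(item_vecs[i], item_vecs[k])
                   for i, nbrs in neighbors.items() for k, sim in nbrs)
    return 0.5 * fit + 0.5 * lam * norm + 0.5 * alpha * coupling


# Toy setup: item 1 has no ratings and relies on its neighbor, item 0.
user_vecs = {0: [1.0, 0.0]}
item_vecs = {0: [1.0, 0.0], 1: [0.0, 1.0]}
ratings = [(0, 0, 1.0)]
neighbors = {1: [(0, 0.5)]}

loss_without_coupling = regularized_loss(ratings, user_vecs, item_vecs, neighbors, alpha=0.0)
loss_with_coupling = regularized_loss(ratings, user_vecs, item_vecs, neighbors, alpha=1.0)
print(loss_without_coupling, loss_with_coupling)
```

Under this assumed form, setting `alpha` to zero removes the neighbor term and leaves a plain regularized factorization, while a very large `alpha` lets the neighbor term dominate the rating error, mirroring the two extremes discussed in the text.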
Figure 2 reports the impact of the control parameter on MAE and RMSE for both data sets. From Figure 2, we have the following observations: (1) the value of the parameter has a significant impact on the recommendation quality, which indicates that combining user-item rating information and item attribute information can greatly improve the recommendation quality; (2) the curves of MAE and RMSE on the two data sets show similar trends. As the parameter increases, the values of MAE and RMSE first drop, i.e., the recommendation quality improves; after the parameter reaches a certain threshold, the MAE and RMSE begin to increase as the parameter increases, which means that the performance degrades when the parameter is too large. These observations indicate that neither using only the user-item rating matrix while abandoning item attribute information, nor relying excessively on item attribute information, can generate reliable recommendations.
Moreover, our recommendation approach achieves its best performance, MAE=0.7197, when the control parameter is around 0.2 on MovieLens100K, while on HetRec2011 we get MAE=0.5765 at a larger value of the parameter. This phenomenon indicates that our recommendation approach depends more on item attribute information on HetRec2011 than on MovieLens100K, which confirms that more available item attribute information is helpful for alleviating the cold start item problem, since HetRec2011 contains a larger portion of cold start items than MovieLens100K.
5.6Impact of Dimension of Latent Feature
The dimension of latent feature vectors is another important parameter in our proposed method. We conduct another group of experiments to assess the impact of the dimension on the recommendation quality of our proposed method by changing it from 5 to 50 with a step of 5. Another parameter is set as and 0.4 in MovieLens100K and HetRec2011, respectively. The experimental results are plotted in Figure 3.
From Figure 3, we can clearly see that as the dimension increases, the values of MAE decrease at first, and then begin to increase. Intuitively, the greater the dimension, the more preferences can be represented by the latent feature vectors, and hence the better the recommendations. However, Figure 3 shows that continually increasing the dimension does not improve the performance after the dimension of the latent feature vectors surpasses a certain threshold, such as 10 dimensions on MovieLens100K and 20 dimensions on HetRec2011. A possible reason is that once the dimension reaches a specific threshold, the latent user and item feature vectors are sufficient to characterize the preferences of users or items, and continually increasing the dimension introduces much noise into the objective function, which degrades the recommendation quality.
Our recommendation approach achieves the best recommendation quality when the dimension is 10 and 20 on MovieLens100K and HetRec2011, respectively.
5.7Performance on Cold Start Items
The principal purpose of our proposed approach is to deal with the cold start item issue in recommender systems. Although much research has explored the cold start problem, most of it focuses on the cold start user problem and ignores the cold start item problem. For example, social network based recommendation approaches incorporate social relations between users to solve the cold start user problem. Moreover, as mentioned in Section 5.1, the cold start item problem is more serious than the cold start user problem in MovieLens100K and HetRec2011. For instance, if we take users who have rated fewer than 20 items as cold start users, there are no cold start users in MovieLens100K and HetRec2011. In contrast, if we consider items that are rated fewer than 20 times as cold start items, 48.37% and 44.17% of the items are cold start items in MovieLens100K and HetRec2011, respectively.
To evaluate the effectiveness of our recommendation approach in coping with the cold start item problem, we first group items according to the number of observed ratings on items in the training set, and then compare the MAE and RMSE of different item groups with those of the baseline approaches.
The distributions of items in each training set for both data sets are depicted in Fig. ?, in which the X-axis shows the item groups “1-10”, “11-20”, “21-40”, “41-80”, “81-160”, “161-320”, “321-640” and “>640”, and the Y-axis displays the number of items that are rated the corresponding number of times. For example, for the MovieLens100K data set, there are around 600 items for which the number of observed ratings in each training set is in the range 1-10. Meanwhile, there are around 3800 such items in the training set of HetRec2011. In this group of experiments, we set and for both MovieLens100K and HetRec2011 data sets.
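The grouping step can be sketched as follows: count how often each item appears in the training set, map the count to one of the groups listed above, and average the absolute test errors per group. This is a minimal illustration, not the actual evaluation code; the function names, the tuple layouts and the toy data are hypothetical, and the label of the last group is assumed to be ">640".

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

GROUP_UPPER_BOUNDS = [10, 20, 40, 80, 160, 320, 640]
GROUP_LABELS = ["1-10", "11-20", "21-40", "41-80",
                "81-160", "161-320", "321-640", ">640"]


def group_label(num_ratings: int) -> str:
    """Map a positive training-rating count to its item group label."""
    for bound, label in zip(GROUP_UPPER_BOUNDS, GROUP_LABELS):
        if num_ratings <= bound:
            return label
    return GROUP_LABELS[-1]


def mae_by_item_group(train: List[Tuple[str, str, float]],
                      test_predictions: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """train holds (user, item, rating); test_predictions holds (item, actual, predicted)."""
    counts = Counter(item for _, item, _ in train)
    errors = defaultdict(list)
    for item, actual, predicted in test_predictions:
        n = counts.get(item, 0)
        if n == 0:
            continue  # items unseen in training fall outside the listed groups
        errors[group_label(n)].append(abs(actual - predicted))
    return {label: sum(v) / len(v) for label, v in errors.items()}


# Toy data: item "a" has 3 training ratings, item "b" has 12.
train = ([(f"u{n}", "a", 4.0) for n in range(3)]
         + [(f"u{n}", "b", 3.0) for n in range(12)])
test_predictions = [("a", 4.0, 3.0), ("a", 2.0, 2.5), ("b", 5.0, 4.5)]
group_mae = mae_by_item_group(train, test_predictions)
print(group_mae)
```

Comparing such per-group errors across methods shows whether an improvement is concentrated on sparsely rated items or spread evenly over all items.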
The experimental results are shown in Fig. ?. Fig. ? shows that our proposed IEMF is able to generate better recommendations than other algorithms, especially for the items with few observed ratings. In terms of MAE, the improvement of our approach for the second category items, i.e., items that are rated from 1 to 10 times, is 5.5% over RSVD on MovieLens100K and 6.3% on HetRec2011. As more observed ratings are given, the improvement of our proposed approach gradually reduces, and all compared methods achieve similar performance. These observations indicate that our proposed recommendation algorithm can cope with the cold start item problem more effectively than other state-of-the-art techniques. We argue that the main reason for the improvement is the consideration of the item attribute information as well as the adoption of the coupled object similarity measure to capture the relationships among items in our proposed recommendation algorithm.
5.8Impact of Size of Neighborhood of Item
In this paper, the size of the neighborhood of each item is the final control parameter that affects the performance of our proposed approach, since two latent item feature vectors are assumed to be close if the two items are similar according to their attribute contents. In other words, the latent feature vectors of items depend on the feature vectors of their neighbors; for cold start items in particular, the degree of dependency is greater than for items that have many observed ratings. To explore the impact of this control parameter on our recommendation algorithm, we vary the number of similar neighbors and observe the corresponding changes in recommendation quality. We set K = 10 and for both MovieLens100K and HetRec2011 data sets.
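Selecting a neighborhood of a given size amounts to keeping, for each item, the items with the highest attribute-based similarity. The sketch below assumes that pairwise similarities have already been computed (for instance with a coupled similarity measure) and stored in a nested dictionary; the function name and the similarity values are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_neighbors(similarity: Dict[str, Dict[str, float]],
                  item: str, size: int) -> List[Tuple[str, float]]:
    """Return the `size` most similar other items, sorted by decreasing similarity."""
    others = [(other, sim) for other, sim in similarity[item].items() if other != item]
    return heapq.nlargest(size, others, key=lambda pair: pair[1])


# Hypothetical precomputed similarities for item "a".
similarity = {"a": {"a": 1.0, "b": 0.9, "c": 0.2, "d": 0.6}}

print(top_neighbors(similarity, "a", 2))  # [('b', 0.9), ('d', 0.6)]
```

A small neighborhood keeps only highly similar items but gives sparsely rated items little support, whereas a large one adds weaker neighbors, which is the trade-off explored in this experiment.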
The experimental results are shown in Figure 4. We can observe that the size of the neighborhood does have significant effects on the recommendation quality of our proposed approach for both data sets. Our recommendation approach achieves the best performance when the size of the neighborhood is around 40 on MovieLens100K, while the optimal size of the neighborhood on HetRec2011 is approximately 200, which is larger than the corresponding value on MovieLens100K. This is primarily because HetRec2011 contains more items than MovieLens100K. Hence, an item of HetRec2011 generally has more neighbors than an item of MovieLens100K. Secondly, HetRec2011 includes more attribute information than MovieLens100K, which can be used to generate more accurate similarities with the coupled object similarity measure. As a result, for HetRec2011, our proposed approach depends more on item neighbors to learn the latent item feature vectors.
We draw the following conclusions from the above experimental evaluation. First, the incorporation of item attribute information is effective in improving the traditional matrix factorization methods, which completely discard additional item content information and only utilize the user-item rating matrix to learn the preferences of users and items. Second, compared with the SMS used in CBMF, which is relatively rough and fails to capture the relationship between items, the COS is more accurate in capturing the genuine relationship between two categorical items and hence helps recommender systems generate better recommendations for users. Third, our proposed approach can effectively cope with the cold start item problem by keeping the latent feature vectors of cold start items as close as possible to the latent feature vectors of their neighbors. Finally, when more item attribute information is available, our proposed method can find more reliable neighbors for target items by leveraging the coupled object similarity, resulting in better recommendations for users. Hence, a lack of item attribute information would limit the accuracy of our proposed recommendation algorithm.
6Conclusion and Future Work
Recommender systems play an important role in e-commerce for both users and businesses due to the huge volumes of information on the Web. They provide personalized services for users and generate more revenue for businesses. In this paper, we propose an attributes coupling based item enhanced matrix factorization method that incorporates item attribute information into the matrix factorization technique and adapts the coupled object similarity to capture the relationship between items. Item attribute information is formulated as an item relationship regularization term that regularizes the matrix factorization process and makes two item-specific latent feature vectors as similar as possible if the two items have similar attribute content. More specifically, we adapt the coupled object similarity to capture the genuine relationship between two categorical items, and hence these reliable item neighbors can be leveraged to better characterize the preferences of items. Experimental results on two real data sets show that our proposed method outperforms state-of-the-art recommendation algorithms, such as RSVD, NMF, PMF and CBMF.
At present, the available public data sets only contain a small portion of item attribute information, and some popular data sets do not have any information about items’ attributes. For instance, the Netflix data set contains no item attribute information and MovieLens100K only contains genre information. In the future, we plan to extract more item attribute information to improve our proposed method. For example, movies’ production companies may contribute to higher ratings from users who tend to prefer movies produced by famous production companies, such as Twentieth Century Fox, Columbia Pictures Corp. and Warner Bros. More available item attribute information will help increase the recommendation quality of our proposed method.
Moreover, we only constrain item latent feature vectors by using item attribute information, without considering the social networking relations of users. In the future, we plan to investigate whether social networking relations are useful for our proposed method to improve the recommendation quality.
Furthermore, although the process of computing similarities among items is offline, the cost of computing similarities measured by the coupled object similarity is expensive, whose time complexity is . In the future, we plan to investigate how to reduce the time complexity of the coupled object similarity measure while keeping its advantages, and how to use parallel computing methods, e.g., MapReduce, to speed up the process of computing the coupled object similarities among items.
Acknowledgments
The authors would like to thank the anonymous referees and the editor for their helpful comments and suggestions.
References
- G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” Knowledge and Data Engineering, IEEE Transactions on, vol. 17, no. 6, pp. 734–749, 2005.
- J. S. Breese, D. Heckerman, and C. Kadie, “Empirical analysis of predictive algorithms for collaborative filtering,” in Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 1998, pp. 43–52.
- P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “Grouplens: an open architecture for collaborative filtering of netnews,” in Proceedings of the 1994 ACM conference on Computer supported cooperative work. ACM, 1994, pp. 175–186.
- X. Su and T. M. Khoshgoftaar, “A survey of collaborative filtering techniques,” Advances in artificial intelligence, vol. 2009, p. 4, 2009.
- B. M. Sarwar, J. A. Konstan, A. Borchers, J. Herlocker, B. Miller, and J. Riedl, “Using filtering agents to improve prediction quality in the grouplens research collaborative filtering system,” in Proceedings of the 1998 ACM conference on Computer supported cooperative work. ACM, 1998, pp. 345–354.
- Y. Ren, G. Li, J. Zhang, and W. Zhou, “Lazy collaborative filtering for data sets with missing values,” 2013.
- A. M. Rashid, S. K. Lam, G. Karypis, and J. Riedl, “Clustknn: a highly scalable hybrid model-& memory-based cf algorithm,” Proceeding of WebKDD, 2006.
- P. Melville, R. J. Mooney, and R. Nagarajan, “Content-boosted collaborative filtering for improved recommendations,” in AAAI/IAAI, 2002, pp. 187–192.
- K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel, “Probabilistic memory-based collaborative filtering,” Knowledge and Data Engineering, IEEE Transactions on, vol. 16, no. 1, pp. 56–69, 2004.
- C.-N. Ziegler, G. Lausen, and L. Schmidt-Thieme, “Taxonomy-driven computation of product recommendations,” in Proceedings of the thirteenth ACM international conference on Information and knowledge management. ACM, 2004, pp. 406–415.
- Y. Zhen, W.-J. Li, and D.-Y. Yeung, “Tagicofi: tag informed collaborative filtering,” in Proceedings of the third ACM conference on Recommender systems. ACM, 2009, pp. 69–76.
- H. Ma, H. Yang, M. R. Lyu, and I. King, “Sorec: social recommendation using probabilistic matrix factorization,” in Proceedings of the 17th ACM conference on Information and knowledge management. ACM, 2008, pp. 931–940.
- M. Jamali and M. Ester, “A matrix factorization technique with trust propagation for recommendation in social networks,” in Proceedings of the fourth ACM conference on Recommender systems. ACM, 2010, pp. 135–142.
- Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.
- B. M. Kim and Q. Li, “Probabilistic model estimation for collaborative filtering based on items attributes,” in Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society, 2004, pp. 185–191.
- J. Nguyen and M. Zhu, “Content-boosted matrix factorization techniques for recommender systems,” Statistical Analysis and Data Mining, 2013.
- G. Gan, C. Ma, and J. Wu, Data clustering: theory, algorithms, and applications. Siam, 2007, vol. 20.
- C. Wang, L. Cao, M. Wang, J. Li, W. Wei, and Y. Ou, “Coupled nominal similarity in unsupervised learning,” in CIKM. ACM, 2011, pp. 973–978.
- L. Cao, Y. Ou, and P. S. Yu, “Coupled behavior analysis with applications,” Knowledge and Data Engineering, IEEE Transactions on, vol. 24, no. 8, pp. 1378–1392, 2012.
- Y. Yu, C. Wang, Y. Gao, L. Cao, and X. Chen, “A coupled clustering approach for items recommendation,” in Advances in Knowledge Discovery and Data Mining. Springer, 2013, pp. 365–376.
- A. Ahmad and L. Dey, “A k-mean clustering algorithm for mixed numeric and categorical data,” Data & Knowledge Engineering, vol. 63, no. 2, pp. 503–527, 2007.
- B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th international conference on World Wide Web. ACM, 2001, pp. 285–295.
- G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Item-to-item collaborative filtering,” Internet Computing, IEEE, vol. 7, no. 1, pp. 76–80, 2003.
- L. H. Ungar and D. P. Foster, “Clustering methods for collaborative filtering,” in AAAI Workshop on Recommendation Systems, no. 1, 1998.
- G.-R. Xue, C. Lin, Q. Yang, W. Xi, H.-J. Zeng, Y. Yu, and Z. Chen, “Scalable collaborative filtering using cluster-based smoothing,” in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2005, pp. 114–121.
- T. Hofmann, “Latent semantic models for collaborative filtering,” ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 89–115, 2004.
- T. Hofmann, “Collaborative filtering via gaussian probabilistic latent semantic analysis,” in Proceedings of the 26th ACM SIGIR, 2003.
- R. Salakhutdinov, A. Mnih, and G. Hinton, “Restricted boltzmann machines for collaborative filtering,” in Proceedings of the 24th international conference on Machine learning. ACM, 2007, pp. 791–798.
- B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Analysis of recommendation algorithms for e-commerce,” in Proceedings of the 2nd ACM conference on Electronic commerce. ACM, 2000, pp. 158–167.
- W. Lin, S. A. Alvarez, and C. Ruiz, “Efficient adaptive-support association rule mining for recommender systems,” Data mining and knowledge discovery, vol. 6, no. 1, pp. 83–105, 2002.
- B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Application of dimensionality reduction in recommender system-a case study,” DTIC Document, Tech. Rep., 2000.
- D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” Advances in neural information processing systems, vol. 13, pp. 556–562, 2001.
- D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 8, pp. 1548–1560, 2011.
- S. Zhang, W. Wang, J. Ford, and F. Makedon, “Learning from incomplete ratings using non-negative matrix factorization.” in SDM, 2006.
- N. Srebro, J. Rennie, and T. S. Jaakkola, “Maximum-margin matrix factorization,” in Advances in neural information processing systems, 2004, pp. 1329–1336.
- J. D. Rennie and N. Srebro, “Fast maximum margin matrix factorization for collaborative prediction,” in Proceedings of the 22nd international conference on Machine learning. ACM, 2005, pp. 713–719.
- A. Mnih and R. Salakhutdinov, “Probabilistic matrix factorization,” in Advances in neural information processing systems, 2007, pp. 1257–1264.
- K. Yu, S. Zhu, J. Lafferty, and Y. Gong, “Fast nonparametric matrix factorization for large-scale collaborative filtering,” in Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2009, pp. 211–218.
- L. Wu, E. Chen, Q. Liu, L. Xu, T. Bao, and L. Zhang, “Leveraging tagging for neighborhood-aware probabilistic matrix factorization,” in Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012, pp. 1854–1858.
- A. Paterek, “Improving regularized singular value decomposition for collaborative filtering,” in Proceedings of KDD cup and workshop, vol. 2007, 2007, pp. 5–8.
- G. H. Golub and C. F. Van Loan, Matrix computations. JHU Press, 2012, vol. 3.
- A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, “Robust stochastic approximation approach to stochastic programming,” SIAM Journal on Optimization, vol. 19, no. 4, pp. 1574–1609, 2009.
- D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, 1999.