Learning over Knowledge-Base Embeddings for Recommendation


State-of-the-art recommendation algorithms – especially the collaborative filtering (CF) based approaches with shallow or deep models – usually work with various unstructured information sources, such as textual reviews, visual images, and various kinds of implicit or explicit feedback. Though structured knowledge bases were considered in content-based approaches, they have been largely neglected recently due to the availability of vast amounts of data and the learning power of many complex models.

However, structured knowledge bases exhibit unique advantages in personalized recommendation systems. When explicit knowledge about users and items is considered for recommendation, the system can provide highly customized recommendations based on users’ historical behaviors. A great challenge in using knowledge bases for recommendation is how to integrate large-scale structured and unstructured data, while taking advantage of collaborative filtering for highly accurate performance. Recent achievements in knowledge base embedding shed light on this problem, making it possible to learn user and item representations while preserving the structure of their relationships with external knowledge. In this work, we propose to reason over knowledge base embeddings for personalized recommendation. Specifically, we propose a knowledge base representation learning approach that embeds heterogeneous entities for recommendation. Experimental results on real-world datasets verify the superior performance of our approach compared with state-of-the-art baselines.

Recommender Systems; Knowledge-base Embedding; Collaborative Filtering; Personalization

SIGIR ’18, July 8–12, 2018, Ann Arbor, Michigan, USA

1 Introduction

Most existing collaborative filtering (CF) based recommendation systems work with various unstructured data, such as ratings, reviews, or images, to profile users for personalized recommendation. Though effective, it is difficult for existing approaches to model the explicit relationships among the different pieces of information that we know about users and items. In this paper, we ask a key question: “can we extend the power of collaborative filtering to large-scale structured user behavior data?”. The main challenge in answering this question is how to effectively integrate different types of user behaviors and item properties, while preserving the internal relationships between them, to enhance the final performance of personalized recommendation.

Fortunately, the emerging success on knowledge base embeddings may shed some light on this problem, where heterogeneous knowledge entities and relations can be projected into a unified embedding space. By encoding the rich information from multi-type user behaviors and item properties into the final user/item embeddings, we can enhance the recommendation performance while preserving the internal structure of the knowledge.

Inspired by the above motivation, in this paper, we design a novel collaborative filtering framework over knowledge graph. The main building block is an integration of traditional CF and knowledge-base embedding technology. More specifically, we first define the concept of user-item knowledge graph, which encodes our knowledge about the users and items as a relational graph structure. The user-item knowledge graph focuses on how to depict different types of user behaviors and item properties over heterogenous entities and relations in a unified framework. Then, we extend the design philosophy of collaborative filtering (CF) to learn over the knowledge graph for personalized recommendation.

Figure 1: A toy example of the user-item knowledge graph. On the left is a set of triplets of user behaviors and item properties, and on the right is the corresponding graph structure.

Contributions. The main contributions of this paper can be summarized as follows:

We propose to leverage entity-level knowledge-base embeddings for recommendation, which, to the best of our knowledge, is the first such attempt in the recommendation community.

We extend traditional collaborative filtering to learn over heterogeneous knowledge-base embeddings, which makes it possible to capture user preferences more comprehensively.

Extensive experiments verify that our model can consistently outperform many state-of-the-art baselines on real-world e-commerce datasets.

In the rest of the paper, we first describe our model in Section 2 and verify its effectiveness with experimental results in Section 3. Finally, related work and conclusions are presented in Sections 4 and 5, respectively.

2 Collaborative Filtering with Knowledge-Graph

In this section, we illustrate our model to make recommendations by learning knowledge-graph embeddings with collaborative filtering.

2.1 Model structure

Incorporating additional information is a widely used approach to enhancing recommendation performance. However, the objects in a practical recommender system are usually very heterogeneous (e.g., product brands, categories, and user reviews), and the relations between them can also be quite different (e.g., belong to, buy, view). To handle such heterogeneous data in a unified framework, we first define a user-item knowledge-graph structure specialized for recommender systems, and then conduct collaborative filtering on this graph to provide personalized recommendations.

User-Item Knowledge Graph in Recommender Systems

In the context of recommender systems, the user-item knowledge graph is built from a set of triplets. Each triplet (h, r, t) is composed of a head entity h, a tail entity t, and the relation r from h to t. The semantics of a triplet is that h has a (directed) relation r with t. For example, (user, buy, item) means that the user has bought the item before, and (item, belong_to_category, category) means that the item belongs to a particular category.

Specifically, we define five types of entities and six types of relations in our system, where the entities include user, item, word, brand, and category, while the relations include:
buy: the relation from a user to an item, meaning that the user has bought the item.
belong_to_category: the relation from an item to a category, meaning that the item belongs to the category.
belong_to_brand: the relation from an item to a brand, meaning that the item belongs to the brand.
mention: the relation from a user or an item to a word, meaning that the word is mentioned in the reviews of the user or item.
also_bought: the relation from an item to another item, meaning that users who bought the first item also bought the second item.
also_view: the relation from an item to another item, meaning that users who bought the first item also viewed the second item.

An example of the constructed user-item knowledge graph can be seen in Figure 1.
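As a concrete illustration, such a graph can be stored as a plain list of (head, relation, tail) triplets and indexed for neighbor lookup. The entity names below are hypothetical stand-ins; only the relation names follow the schema described above:

```python
from collections import defaultdict

# Each triplet is (head_entity, relation, tail_entity); entity types are
# user, item, word, brand, and category.
triplets = [
    ("user_1", "buy", "item_a"),
    ("user_1", "mention", "word_great"),
    ("item_a", "belong_to_category", "cat_pop"),
    ("item_a", "belong_to_brand", "brand_x"),
    ("item_a", "also_bought", "item_b"),
    ("item_a", "also_view", "item_c"),
]

# Index the graph by (head, relation) for fast neighbor lookup.
graph = defaultdict(list)
for head, rel, tail in triplets:
    graph[(head, rel)].append(tail)

print(graph[("user_1", "buy")])  # → ['item_a']
```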

Algorithm 1: Collaborative Filtering based on Knowledge Graph
Input: entity set E, relation set R, triplet set S, embedding dimension D, number of negative samples k
Output: embeddings of the entities and relations
  Randomly initialize the embeddings for E and R
  for each triplet (h, r, t) in S do
     Sample k negative triplets by corrupting the head h or the tail t
     Update the embeddings by SGD according to the margin-based loss
  end for
Dataset CDs Clothing Cell Phones Beauty
Measures(%) NDCG Recall HT Prec NDCG Recall HT Prec NDCG Recall HT Prec NDCG Recall HT Prec
BPR 2.009 2.679 8.554 1.085 0.601 1.046 1.767 0.185 1.998 3.258 5.273 0.595 2.753 4.241 8.241 1.143
BPR_HFT 2.661 3.570 9.926 1.268 1.067 1.819 2.872 0.297 3.151 5.307 8.125 0.860 2.934 4.459 8.268 1.132
VBPR 0.631 0.845 2.930 0.328 0.560 0.968 1.557 0.166 1.797 3.489 5.002 0.507 1.901 2.786 5.961 0.902
DeepCoNN 4.218 6.001 13.857 1.681 1.310 2.332 3.286 0.229 3.636 6.353 9.913 0.999 3.359 5.429 9.807 1.200
CKE 4.620 6.483 14.541 1.779 1.502 2.509 4.275 0.388 3.995 7.005 10.809 1.070 3.717 5.938 11.043 1.371
JRL 5.378 7.545 16.774 2.085 1.735 2.989 4.634 0.442 4.364 7.510 10.940 1.096 4.396 6.949 12.776 1.546
CFKG 5.563 7.949 17.556 2.192 3.091 5.466 7.972 0.763 5.370 9.498 13.455 1.325 6.370 10.341 17.131 1.959
Improvement 3.44 5.35 4.66 5.13 78.16 82.87 72.03 72.62 23.05 26.47 22.99 20.89 44.90 48.81 34.09 26.71
Table 1: Performance on top-10 recommendation between the baselines and our model (all values in the table are percentages with ‘%’ omitted), where bolded numbers indicate the best performance in each column. The first block shows the results of the baselines, where starred numbers indicate the best baseline performance; the second block presents the results of our model. The last line shows the percentage improvement of our results over the best baseline (i.e., JRL), which is significant at p = 0.001.

Collaborative Filtering based on User-Item Knowledge Graph

The user-item knowledge graph provides us with the ability to access different information sources and multi-type behaviors in a unified manner. In this section, we conduct collaborative filtering on this graph for accurate user profiling and personalized recommendation. Inspired by [Bordes et al. (2013)], we project each entity and relation into a unified low-dimensional embedding space. Intuitively, the embedding of a tail entity should be close to the translated embedding of its head entity. Formally, for a triplet (h, r, t), let h, r, and t denote the embeddings of h, r, and t, respectively; then we want trans(h, r) ≈ t. Considering all the observed triplets S, we minimize a margin-based loss L to learn the embeddings as follows:

    L = Σ_{(h,r,t)∈S} ( Σ_{(h,r,t′)∈S′_t} [γ + d(trans(h, r), t) − d(trans(h, r), t′)]_+
                      + Σ_{(h′,r,t)∈S′_h} [γ + d(trans(h, r), t) − d(trans(h′, r), t)]_+ )

where [x]_+ = max(0, x), γ is the margin, S′_t is the set of negative triplets that replace the tail t by a random entity, and S′_h is another set of negative triplets that replace the head h by a random entity. d(·, ·) is a metric function that measures the distance between two embeddings, for which we select the L2-norm. trans(·, ·) is an arbitrary translation function, or even a neural network; here we adopt the addition function trans(h, r) = h + r as in the transE model [Bordes et al. (2013)], because it gives us better efficiency and effectiveness on our dataset. However, the model is not restricted to this choice, and many other functions can be used in practice.

In the loss function L, we essentially try to discriminate the observed triplets from the corrupted ones with a hinge loss, which forces the embeddings to recover the ground truth. The model can be learned by stochastic gradient descent (SGD); the learning procedure is shown in Algorithm 1.
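To make the update concrete, the following numpy sketch performs a single SGD step on one observed triplet with one corrupted tail. The dimension, margin, and learning rate are hypothetical values for illustration, not the tuned settings used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
d, margin, lr = 8, 1.0, 0.01   # hypothetical dimension, margin, learning rate

# Embeddings for one observed triplet (h, r, t) and one corrupted tail t'.
h, r, t, t_neg = (rng.normal(size=d) for _ in range(4))

def dist(x, y):
    """Squared L2 distance between two embeddings."""
    return np.sum((x - y) ** 2)

def hinge(h, r, t, t_neg):
    """Margin loss for one positive triplet and one corrupted-tail negative."""
    return max(0.0, margin + dist(h + r, t) - dist(h + r, t_neg))

loss_before = hinge(h, r, t, t_neg)
if loss_before > 0:
    # Gradients of the squared-L2 terms with respect to the translated head.
    g_pos = 2 * (h + r - t)       # grad of dist(h + r, t)
    g_neg = 2 * (h + r - t_neg)   # grad of dist(h + r, t')
    h -= lr * (g_pos - g_neg)
    r -= lr * (g_pos - g_neg)
    t -= lr * (-g_pos)
    t_neg -= lr * g_neg

print(loss_before, hinge(h, r, t, t_neg))  # the loss should not increase
```

Sampling a corrupted head is symmetric: replace h instead of t when forming the negative distance.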

Personalized Recommendation

Once the model is optimized, we obtain the embeddings of all entities and relations in the graph. To generate personalized recommendations for a particular user, we take advantage of the buy relation. Specifically, suppose the embedding of the buy relation is r_buy and the embedding of a target user is u; then we can generate recommendations for the user by ranking the candidate items i in ascending order of the distance d(u + r_buy, i).
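Once the embeddings are learned, generating a recommendation list reduces to a nearest-neighbor ranking. A small numpy sketch with random stand-in embeddings (the dimension and candidate count are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_items = 8, 5  # hypothetical embedding dimension and candidate count

u = rng.normal(size=d)                 # user embedding
r_buy = rng.normal(size=d)             # embedding of the buy relation
items = rng.normal(size=(n_items, d))  # candidate item embeddings

# Rank candidates by the distance between the translated user (u + r_buy)
# and each item embedding; a smaller distance means a stronger recommendation.
dists = np.linalg.norm(u + r_buy - items, axis=1)
top_k = np.argsort(dists)[:3]  # indices of the top-3 recommended items
print(top_k)
```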

3 Experiments

In this section, we evaluate our proposed models by comparing with many state-of-the-art methods. We begin by introducing the experimental setup, and then analyze the experimental results.

3.1 Experimental Setup

Datasets. Experiments are conducted on the Amazon e-commerce dataset [He and McAuley (2016a)]. We adopt four sub-datasets that are representative in terms of size and sparsity, namely CDs, Clothing, Cell Phones, and Beauty. Statistics of the four datasets are summarized in Table 2.

Datasets #Users #Items #Interactions Density
CDs 75258 64421 1097592 0.0226%
Clothing 39387 23033 278677 0.0307%
Cell Phones 27879 10429 194493 0.0669%
Beauty 22363 12101 198502 0.0734%
Table 2: Statistics of the datasets.

Evaluation methods. In our experiments, we adopt the widely used top-N recommendation measures Precision, Recall, Hit-Ratio, and NDCG to evaluate our model as well as the baselines. The first three measures evaluate recommendation quality without considering ranking positions, while NDCG accounts for both the accuracy and the ranking positions of the correct items in the final list.
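For reference, the four measures for a single user's top-N list can be computed as in the following sketch (a hypothetical helper assuming binary relevance, not the authors' evaluation code):

```python
import math

def topn_metrics(ranked, relevant, n=10):
    """Precision, Recall, Hit-Ratio, and NDCG for one user's top-n list."""
    hits = [1 if item in relevant else 0 for item in ranked[:n]]
    precision = sum(hits) / n
    recall = sum(hits) / max(len(relevant), 1)
    hit_ratio = 1.0 if any(hits) else 0.0
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    # Ideal DCG: all relevant items ranked at the very top of the list.
    idcg = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), n)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, hit_ratio, ndcg

prec, rec, hr, ndcg = topn_metrics(["a", "b", "c"], {"a", "x"}, n=3)
print(prec, rec, hr, ndcg)
```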

Baselines. We adopt the following representative and state-of-the-art methods as baselines for performance comparison:

BPR: The Bayesian personalized ranking [Rendle et al. (2009)] model is a popular method for top-N recommendation. We adopt matrix factorization as the prediction component for BPR.

BPR_HFT: The hidden factors and topics model [McAuley and Leskovec (2013)] is a recommendation method leveraging textual reviews; however, the original model was designed for rating prediction rather than top-N recommendation. To improve its performance, we learn HFT under the BPR pair-wise ranking framework for fair comparison.

VBPR: The visual Bayesian personalized ranking [He and McAuley (2016b)] model is a state-of-the-art method for recommendation with images.

DeepCoNN: A review-based deep recommender [Zheng et al. (2017)], which leverages a convolutional neural network (CNN) to jointly model users and items.

CKE: A state-of-the-art neural recommender [Zhang et al. (2016)] that integrates textual and visual information as well as knowledge bases for modeling, but does not consider the heterogeneous connections across different types of entities.

JRL: The joint representation learning model [Zhang et al. (2017)] is a state-of-the-art neural recommender that can leverage multi-modal information for top-N recommendation.

Parameter settings. All embedding parameters are randomly initialized and then updated by stochastic gradient descent (SGD). The learning rate and the embedding dimension are tuned by grid search, giving a final learning rate of 0.01 and dimension of 300. For the baselines, we also determine the final settings by grid search, and for fair comparison, the models designed for rating prediction (i.e., HFT and DeepCoNN) are learned by optimizing a pair-wise ranking loss similar to BPR. In our experiments, 70% of each user's items are used for training, while the remaining are used for testing. We generate a top-10 recommendation list for each user in the test set.
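The per-user 70%/30% split described above can be sketched as follows (a hypothetical helper; the actual preprocessing may differ, e.g., in how items are ordered before splitting):

```python
def split_per_user(user_items, train_ratio=0.7):
    """Hold out the last (1 - train_ratio) of each user's items for testing."""
    train, test = {}, {}
    for user, items in user_items.items():
        cut = max(1, int(round(len(items) * train_ratio)))
        train[user], test[user] = items[:cut], items[cut:]
    return train, test

train, test = split_per_user({"u1": list("abcdefghij")})
print(len(train["u1"]), len(test["u1"]))  # → 7 3
```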

Relations CDs Clothing Cell Phones Beauty
Measures(%) NDCG Recall HT Prec NDCG Recall HT Prec NDCG Recall HT Prec NDCG Recall HT Prec
buy 3.822 5.185 12.828 1.628 1.019 1.754 2.780 0.265 3.387 5.806 8.548 0.848 3.658 5.727 10.549 1.305
buy+category 4.287 5.990 14.388 1.790 1.705 3.021 4.639 0.442 3.372 5.918 8.842 0.869 3.933 6.253 11.515 1.370
buy+brand 3.541 4.821 12.239 1.563 1.101 1.906 2.981 0.284 3.679 6.211 9.118 0.898 4.832 7.695 13.406 1.621
buy+mention 4.265 5.858 13.874 1.731 1.347 2.305 3.585 0.344 4.065 7.065 10.316 1.026 4.364 6.942 12.476 1.492
buy+also_view 3.724 5.070 12.633 1.604 2.276 3.931 5.827 0.561 3.305 5.705 8.458 0.840 5.295 8.723 14.891 1.728
buy+also_bought 5.055 7.094 16.216 2.032 1.799 3.078 4.634 0.446 5.018 8.707 12.375 1.220 5.058 8.118 13.907 1.643
all (CFKG) 5.563 7.949 17.556 2.192 3.091 5.466 7.972 0.763 5.370 9.498 13.455 1.325 6.370 10.341 17.131 1.959
Table 3: Performance on top-10 recommendation when incorporating different types of relation in the structured knowledge graph (all the values in the table are percentage numbers with ‘%’ omitted). The final result (using all relations) is significantly better than all other models (using part of the relations) at p=0.001 level.

3.2 Performance Comparison

Performance of our Collaborative Filtering with Knowledge Graph (CFKG) model as well as the baseline methods is shown in Table 1. The baseline methods can be classified according to the information source(s) they use: rating-based (BPR), review-based (HFT and DeepCoNN), image-based (VBPR), and heterogeneous information modeling (CKE and JRL). The information sources used by our model include ratings (through the buy relation), reviews (through the mention relation), and our knowledge about the items (through the belong_to_category, belong_to_brand, also_view, and also_bought relations).

From the experimental results we can see that both review-based models improve personalized recommendation over the rating-based method, and by considering multiple heterogeneous information sources, CKE and JRL outperform the other baselines, with JRL achieving the best baseline performance. It is encouraging to see that our collaborative filtering with knowledge graph (CFKG) method consistently outperforms the best baseline (JRL) over all four datasets and on all evaluation measures, which verifies the effectiveness of our approach for personalized recommendation.

However, the performance improvement of our approach may come from two sources: that we use more information, and that we use a better structure (i.e., a structured knowledge graph) to model heterogeneous information. For better understanding, we analyze the contribution of each relation type to our model in the following subsection.

3.3 Further Analysis on Different Relations

We examine the performance of our model when using different subsets of relations. Because we eventually need to recommend items to users, the CFKG approach at least needs the buy relation to model user purchase histories. We therefore test our model using only the buy relation (which reduces to the translation-based model of [He et al. (2017)]), as well as the buy relation plus each of the other relations, as shown in Table 3.

We see that when using only the buy relation, our CFKG_buy model significantly outperforms the BPR approach on all measures and datasets. On the NDCG measure, the improvements are 90% (CDs), 70% (Clothing), 69% (Cell Phones), and 33% (Beauty). Because both BPR and CFKG_buy use only the user purchase information, this observation verifies the effectiveness of using structured knowledge graph embeddings for recommendation. Similarly, CFKG_buy+mention significantly outperforms BPR_HFT, and also outperforms DeepCoNN except for recall on the CDs dataset. Considering that CFKG_buy+mention, BPR_HFT, and DeepCoNN all work with user purchase histories plus textual reviews, this observation further verifies the advantage of using a structured knowledge graph for user modeling and recommendation.

Furthermore, adding any one extra relation to the basic buy relation improves performance over CFKG_buy. Finally, by modeling all of the heterogeneous relation types, the full CFKG model outperforms both the baselines and the simplified versions of our model with one or two relation types, which implies that our knowledge base embedding approach is scalable to new relation types and can leverage very heterogeneous information sources in a unified manner.

4 Related Work

Using knowledge bases to enhance recommender systems is an intuitive idea that has attracted research attention since the early stages of the recommendation community [Trewin (2000), Ghani and Fano (2002)]. However, the difficulty of reasoning over paths in heterogeneous knowledge graphs prevents existing approaches from applying collaborative filtering across very different entity and relation types [Zhang et al. (2016), Catherine et al. (2017)], which further makes it difficult to take advantage of the wisdom of the crowd.

Fortunately, recent years have witnessed the success of heterogeneous knowledge base embedding techniques [Bordes et al. (2013), Wang et al. (2014), Lin et al. (2015)], which can help to learn the embeddings of very different entities to support various application scenarios such as question answering [Bordes et al. (2014)] and relation extraction from text [Lin et al. (2015)]. To the best of our knowledge, this is the first work to learn entity-level knowledge base embeddings for personalized recommendation.

5 Conclusions and Future Work

In this paper, we propose to learn over heterogeneous knowledge base embeddings for personalized recommendation. To do so, we construct a user-item knowledge graph that incorporates both user behaviors and our knowledge about the items. We then learn the knowledge base embeddings over the heterogeneous relations collectively, and leverage the resulting user and item embeddings to generate personalized recommendations. Experimental results on real-world datasets verify the superior performance of our approach, as well as its flexibility in incorporating multiple relation types.


  1. A. Bordes, S. Chopra, and J. Weston. 2014. Question Answering with Subgraph Embeddings. In ACL.
  2. A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS. 2787–2795.
  3. R. Catherine, K. Mazaitis, M. Eskenazi, and W. Cohen. 2017. Explainable Entity-based Recommendations with Knowledge Graphs. RecSys (2017).
  4. Rayid Ghani and Andrew Fano. 2002. Building recommender systems using a knowledge base of product semantics. In Workshop on Recommendation and Personalization in E-Commerce. 27–29.
  5. Ruining He, Wang-Cheng Kang, and Julian McAuley. 2017. Translation-based Recommendation. In RecSys. ACM.
  6. Ruining He and Julian McAuley. 2016a. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In WWW.
  7. Ruining He and Julian McAuley. 2016b. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In AAAI.
  8. Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning Entity and Relation Embeddings for Knowledge Graph Completion.. In AAAI.
  9. Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In RecSys. 165–172.
  10. Steffen Rendle, C. Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI.
  11. Shari Trewin. 2000. Knowledge-based recommender systems. Encyclopedia of library and information science 69, Supplement 32 (2000), 180.
  12. Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In AAAI, Vol. 14. 1112–1119.
  13. Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative Knowledge Base Embedding for Recommender Systems. In KDD.
  14. Y. Zhang, Q. Ai, X. Chen, and W. B. Croft. 2017. Joint representation learning for top-n recommendation with heterogeneous information sources. In CIKM.
  15. Lei Zheng, Vahid Noroozi, and Philip S Yu. 2017. Joint deep modeling of users and items using reviews for recommendation. In WSDM.