NISER: Normalized Item and Session Representations with Graph Neural Networks


Priyanka Gupta, Diksha Garg, Pankaj Malhotra, Lovekesh Vig, Gautam Shroff priyanka.g35@tcs.com, diksha.7@tcs.com, malhotra.pankaj@tcs.com, lovekesh.vig@tcs.com, gautam.shroff@tcs.com TCS Research, New Delhi, India
Abstract.

The goal of session-based recommendation (SR) models is to utilize the information from past actions (e.g. item/product clicks) in a session to recommend items that a user is likely to click next. Recently, it has been shown that the sequence of item interactions in a session can be modeled as graph-structured data to better account for complex item transitions. Graph neural networks (GNNs) can learn useful representations for such session-graphs, and have been shown to improve over sequential models such as recurrent neural networks (Wu et al., 2019). However, we note that these GNN-based recommendation models suffer from popularity bias: the models are biased towards recommending popular items, and fail to recommend relevant long-tail items (less popular or less frequent items). Therefore, these models perform poorly for the less popular new items arriving daily in a practical online setting. We demonstrate that this issue is, in part, related to the magnitude or norm of the learned item and session-graph representations (embedding vectors). We propose a training procedure that mitigates this issue by using normalized representations. The models using normalized item and session-graph representations perform significantly better: i. for the less popular long-tail items in the offline setting, and ii. for the less popular newly introduced items in the online setting. Furthermore, our approach significantly improves upon the existing state-of-the-art on three benchmark datasets.

Session-based Recommendation; Graph Neural Networks; Item and Session Representations
1st International Workshop on Graph Representation Learning and its Applications, November 3rd–7th, 2019, Beijing, China. Copyright 2019, rights retained by the authors.

1. Introduction

Figure 1. Typical popularity distribution of items depicting the long tail.

The goal of session-based recommendation (SR) models is to recommend top-K items to a user based on the sequence of items clicked so far. Recently, several effective models for SR based on deep neural network architectures have been proposed (Wu et al., 2019; Liu et al., 2018; Li et al., 2017; Wang et al., 2019). These approaches consider SR as a multi-class classification problem where the input is the sequence of items clicked in the past in a session and the target classes correspond to the available items in the catalog. Many of these approaches use sequential models like recurrent neural networks, treating a session as a sequence of item-click events (Hidasi et al., 2016; Jannach and Ludewig, 2017; Wang et al., 2019). On the other hand, approaches like STAMP (Liu et al., 2018) consider a session to be a set of items, and use attention models while learning to weigh (attend to) items as per their relevance for predicting the next item. Approaches like NARM (Li et al., 2017) and CSRM (Wang et al., 2019) use a combination of sequential and attention models.

An important building block in most of these deep learning approaches is their ability to learn representations or embeddings for items and sessions. Recently, an alternative approach, namely SR-GNN (Wu et al., 2019), has been proposed to model sessions as graph-structured data using GNNs (Li et al., 2015) rather than as sequences or sets, noting that users tend to make complex to-and-fro transitions across items within a session. For example, consider a session $s = (i_1, i_2, i_1, i_3)$ of item clicks by a user: the user clicks on item $i_1$, then clicks on item $i_2$, then re-clicks on item $i_1$, and finally clicks on item $i_3$. This sequence of clicks induces a graph where nodes and edges correspond to items and transitions across items, respectively. For session $s$ in the above example, the fact that $i_2$ and $i_3$ are neighbors of $i_1$ in the induced session-graph means that the representation of $i_1$ can be updated using the representations of its neighbors, i.e. $i_2$ and $i_3$, thus yielding more context-aware and informative representations. It is worth noting that this way of capturing neighborhood information has also been found to be effective in neighborhood-based SR methods such as SKNN (Jannach and Ludewig, 2017) and STAN (Garg et al., 2019).

It is well-known that more popular items are presented and interacted-with more often on online platforms. This results in a skewed distribution of items clicked by users (Steck, 2011; Abdollahpouri et al., 2017; Yang et al., 2018), as illustrated in Fig. 1. The models trained using the resulting data tend to have popularity bias, i.e. they tend to recommend more popular items over rarely clicked items.

We note that SR-GNN (referred to as GNN hereafter) also suffers from popularity bias. This problem is even more severe in a practical online setting where new items are frequently added to the catalog, and are inherently less popular during initial days. To mitigate this problem, we study GNN through the lens of an item and session-graph representation learning mechanism, where the goal is to obtain a session-graph representation that is similar to the representation of the item likely to be clicked next. We motivate the advantage of restricting the item and session-graph representations to lie on a unit hypersphere both during training and inference, and propose NISER: Normalized Item and Session Representations model for SR. We demonstrate the enhanced ability of NISER to deal with popularity bias in comparison to a vanilla GNN model in the offline as well as online settings. We also extend NISER to incorporate the sequential nature of a session via position embeddings (Vaswani et al., 2017), thereby leveraging the benefits of both sequence-aware models (like RNNs) and graph-aware models.

2. Related Work

Recent results in the computer vision literature, e.g. (Wang et al., 2017; Zheng et al., 2018), indicate the effectiveness of normalizing the final image features during training, and argue in favor of cosine similarity over inner product for learning and comparing feature vectors. (Zheng et al., 2018) introduces the ring loss for soft feature normalization, which eventually learns to constrain the feature vectors to a unit hypersphere. Normalizing word embeddings is also popular in NLP applications, e.g. (Peng et al., 2015) proposes penalizing the L2 norm of word embeddings for regularization. However, to the best of our knowledge, the idea of normalizing item and session-graph embeddings or representations has not been explored.

In this work, we study the effect of normalizing the embeddings on popularity bias, which has not been established and studied so far. Several approaches to deal with popularity bias exist in the collaborative filtering setting, e.g. (Abdollahpouri et al., 2017; Steck, 2011; Yang et al., 2018). To deal with popularity bias, (Abdollahpouri et al., 2017) introduces the notion of flexible regularization in a learning-to-rank algorithm. Similarly, (Steck, 2011; Yang et al., 2018) use the power-law of popularity, where the probability of recommending an item is a smooth function of the item's popularity, controlled by an exponent factor. However, to the best of our knowledge, this is the first attempt to study and address popularity bias in DNN-based SR using SR-GNN (Wu et al., 2019) as a working example. Furthermore, SR-GNN does not incorporate the sequential information explicitly to obtain the session-graph representation. We study the effect of incorporating position embeddings (Vaswani et al., 2017) and show that it leads to minor but consistent improvement in recommendation performance.

3. Problem Definition

Let $\mathcal{S}$ denote the set of all past sessions, and $\mathcal{I}$ denote the set of $m$ items observed in $\mathcal{S}$. Any session $s \in \mathcal{S}$ is a chronologically ordered tuple of item-click events: $s = (i_{s,1}, i_{s,2}, \ldots, i_{s,l})$, where each of the $l$ item-click events $i_{s,j}$ ($j = 1, \ldots, l$) corresponds to an item in $\mathcal{I}$, and $j$ denotes the position of the item in the session $s$. A session $s$ can be modeled as a graph $\mathcal{G}_s = (\mathcal{V}_s, \mathcal{E}_s)$, where each $i_{s,j} \in \mathcal{V}_s$ is a node in the graph. Further, $(i_{s,j}, i_{s,j+1}) \in \mathcal{E}_s$ is a directed edge from $i_{s,j}$ to $i_{s,j+1}$. Given $s$, the goal of SR is to predict the next item $i_{s,l+1}$ by estimating the $m$-dimensional probability vector $\hat{\mathbf{y}}_{s,l+1}$ corresponding to the relevance scores for the $m$ items. The $K$ items with the highest scores constitute the top-K recommendation list.

4. Learning Item and Session Representations

Each item $i_k \in \mathcal{I}$ ($k = 1, \ldots, m$) is mapped to a $d$-dimensional vector from the trainable embedding look-up table or matrix $\mathbf{E} \in \mathbb{R}^{m \times d}$, whose $k$-th row $\mathbf{i}_k \in \mathbb{R}^d$ is the representation or embedding (we use the terms representation and embedding interchangeably) vector corresponding to item $i_k$. Consider any function $f$ (e.g. a neural network as in (Liu et al., 2018; Wu et al., 2019)), parameterized by $\theta_f$, that maps the items in a session to a session embedding $\mathbf{s} = f(\mathbf{i}_{s,1}, \ldots, \mathbf{i}_{s,l}; \theta_f)$, where $\mathbf{s} \in \mathbb{R}^d$ (to ensure the same input dimensions across sessions, we can pad $s$ with a dummy vector $l' - l$ times, where $l'$ is the maximum session length). Along with this view of $s$ as a sequence, we also introduce an adjacency matrix to incorporate the graph structure. We discuss this in more detail in Section 5.

The goal is to obtain a session embedding $\mathbf{s}$ that is close to the embedding $\mathbf{i}_{k^+}$ of the target item $i_{k^+}$, such that the estimated index/class for the target item is $\hat{k}$ with $\hat{k} = \operatorname{argmax}_{k}\, \mathbf{i}_k^\top \mathbf{s}$. In a DNN-based model $f$, this is approximated via a differentiable softmax function such that the probability of the next item being $i_k$ is given by:

$$\hat{y}_k = \frac{\exp(\mathbf{i}_k^\top \mathbf{s})}{\sum_{j=1}^{m} \exp(\mathbf{i}_j^\top \mathbf{s})} \qquad (1)$$

For this $m$-way classification task, the softmax (cross-entropy) loss $\mathcal{L}(\hat{\mathbf{y}}) = -\sum_{k=1}^{m} y_k \log(\hat{y}_k)$ is used during training to estimate $\theta_f$ and $\mathbf{E}$ by minimizing the sum of $\mathcal{L}(\hat{\mathbf{y}})$ over all training samples, where $\mathbf{y} \in \{0,1\}^m$ is a 1-hot vector with $y_{k^+} = 1$ for the correct (target) class $k^+$.
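To make the classification step concrete, the following is a minimal sketch (ours, not the authors' released code) of scoring all $m$ items against a batch of session embeddings and computing the softmax loss; all names and sizes are illustrative.

```python
import torch
import torch.nn.functional as F

m, d = 1000, 100                       # catalog size and embedding dimension (illustrative)
E = torch.nn.Embedding(m, d)           # trainable item embedding look-up table
s = torch.randn(32, d)                 # session embeddings for a batch of 32 sessions
target = torch.randint(0, m, (32,))    # index k+ of the true next item per session

logits = s @ E.weight.t()              # inner products i_k^T s for all m items
loss = F.cross_entropy(logits, target) # softmax (cross-entropy) loss of Eq. (1)
```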

We next introduce the radial property (Wang et al., 2017; Zheng et al., 2018) of the softmax loss, and then use it to motivate the need for normalizing the item and session representations in order to reduce popularity bias.

4.1. Radial Property of Softmax Loss

Figure 2. Illustrative flow diagram for NISER.

It is well-known that optimizing the softmax loss leads to a radial distribution of features for a target class (Wang et al., 2017; Zheng et al., 2018): if $\hat{y}_k = \max_j \hat{y}_j$, then it is easy to show that

$$\frac{\exp(\sigma\, \mathbf{i}_k^\top \mathbf{s})}{\sum_{j=1}^{m} \exp(\sigma\, \mathbf{i}_j^\top \mathbf{s})} \;\geq\; \frac{\exp(\mathbf{i}_k^\top \mathbf{s})}{\sum_{j=1}^{m} \exp(\mathbf{i}_j^\top \mathbf{s})} \qquad (2)$$

for any scalar $\sigma \geq 1$. This, in turn, implies that the softmax loss favors large norms of features for easily classifiable instances. We omit details for brevity and refer the reader to (Wang et al., 2017; Zheng et al., 2018) for a thorough analysis and proof. In effect, a high value of $\hat{y}_k$ can be attained by multiplying the logit $\mathbf{i}_k^\top \mathbf{s}$ by a scalar $\sigma \geq 1$, or simply by ensuring a large norm for the item embedding vector.
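A quick numeric check of this property (our illustration, not from the paper): scaling the logits of the highest-scoring class by $\sigma \geq 1$ can only increase its softmax probability.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5])               # class 0 already has the largest logit
for sigma in [1.0, 2.0, 5.0]:
    p = torch.softmax(sigma * logits, dim=0)
    print(f"sigma={sigma}: p[0]={p[0].item():.3f}")   # p[0] grows monotonically with sigma
```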

Figure 3. Recall@20 and L2 norm of learned item embeddings decrease with decreasing popularity in GNN (Wu et al., 2019).

4.2. Normalizing the Representations

We note that the radial property has an interesting implication in the context of popularity bias: target items that are easier to predict are likely to have a higher L2 norm. We illustrate this with an example. Items that are popular are likely to be clicked more often, and hence the trained parameters $\theta_f$ and $\mathbf{E}$ should have values that ensure these items get recommended more often. The radial property implies that, for a given input $s$ and a popular target item $i_k$, a correct classification decision can be obtained as follows: learn an embedding vector $\mathbf{i}_k$ with a high norm $\|\mathbf{i}_k\|_2$, such that the inner product $\mathbf{i}_k^\top \mathbf{s} = \|\mathbf{i}_k\|_2 \|\mathbf{s}\|_2 \cos\theta$ (where $\theta$ is the angle between the item and session embeddings) is likely to be high and ensures a large value of $\hat{y}_k$ (even when $\theta$ is not small enough and $\cos\theta$ is not large enough). When analyzing the item embeddings from a GNN model (Wu et al., 2019), we observe that this is indeed the case: as shown in Fig. 3, items with high popularity have a high L2 norm while less popular items have a significantly lower L2 norm. Further, the performance of GNN (depicted in terms of Recall@20) degrades as the popularity of the target item decreases.
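The analysis behind Fig. 3 can be reproduced with a simple diagnostic like the one below (ours, not from the paper's code): bin items by popularity and compare the mean L2 norm of their learned embeddings per bin.

```python
import torch

def norm_by_popularity_bins(E_weight, popularity, bins=10):
    """Mean L2 norm of item embeddings per popularity bin, most popular bin first.

    E_weight: (m, d) learned item embedding matrix; popularity: (m,) click counts.
    """
    norms = E_weight.detach().norm(dim=1)                # ||i_k||_2 for each item
    order = torch.argsort(popularity, descending=True)   # items sorted by popularity
    return [norms[chunk].mean().item() for chunk in order.chunk(bins)]
```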

4.3. NISER

Based on the above observations, we seek to minimize the influence of embedding norms on the final classification and recommendation decision. We propose optimizing for cosine similarity as the measure of similarity between item and session embeddings instead of the above-stated inner product. Therefore, during training as well as inference, we normalize the item embeddings as $\tilde{\mathbf{i}}_k = \mathbf{i}_k / \|\mathbf{i}_k\|_2$, and use them to obtain the session embedding $\mathbf{s} = f(\tilde{\mathbf{i}}_{s,1}, \ldots, \tilde{\mathbf{i}}_{s,l}; \theta_f)$. The session embedding is similarly normalized to $\tilde{\mathbf{s}} = \mathbf{s} / \|\mathbf{s}\|_2$ to enforce a unit norm. The normalized item and session embeddings are then used to compute the relevance score for the next clicked item as:

$$\hat{y}_k = \frac{\exp(\sigma\, \tilde{\mathbf{i}}_k^\top \tilde{\mathbf{s}})}{\sum_{j=1}^{m} \exp(\sigma\, \tilde{\mathbf{i}}_j^\top \tilde{\mathbf{s}})} \qquad (3)$$

Note that the cosine similarity $\tilde{\mathbf{i}}_k^\top \tilde{\mathbf{s}}$ is restricted to $[-1, 1]$. As shown in (Wang et al., 2017), this implies that the softmax loss is likely to saturate at high values on the training set; a scaling factor $\sigma > 1$ (as in Eq. 3) is useful in practice to allow for better convergence.
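Putting Eq. (3) together, here is a minimal sketch of NISER's scoring and loss (our sketch; names, shapes, and the value of $\sigma$ are assumptions, with $\sigma$ tuned on validation data in practice):

```python
import torch
import torch.nn.functional as F

m, d, sigma = 1000, 100, 16.0          # sigma is an assumed value, tuned on validation data
E = torch.nn.Embedding(m, d)           # trainable item embedding look-up table
s = torch.randn(32, d)                 # session embeddings from f(.)
target = torch.randint(0, m, (32,))    # true next-item indices

E_tilde = F.normalize(E.weight, p=2, dim=1)   # item embeddings on the unit hypersphere
s_tilde = F.normalize(s, p=2, dim=1)          # normalized session embeddings
logits = sigma * (s_tilde @ E_tilde.t())      # scaled cosine similarities, Eq. (3)
loss = F.cross_entropy(logits, target)        # softmax loss on normalized representations
```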

5. Leveraging NISER with GNN

We consider learning the representations of items and session-graphs with GNNs, where the session-graph is represented by $\mathcal{G}_s = (\mathcal{V}_s, \mathcal{E}_s)$ as introduced in Section 3. Consider two adjacency matrices $\mathbf{A}^{in}$ and $\mathbf{A}^{out}$ corresponding to the incoming and outgoing edges in graph $\mathcal{G}_s$, as illustrated in Fig. 2. Each edge has a normalized weight calculated as the number of occurrences of the edge divided by the outdegree of the start node of that edge.
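As a sketch of this construction (ours, not the authors' released code), the normalized adjacency matrices for a single session can be built as follows; normalizing the incoming matrix by indegree is our assumption here.

```python
import numpy as np

def session_adjacency(session):
    """Build normalized outgoing/incoming adjacency matrices for one session."""
    nodes = list(dict.fromkeys(session))           # unique items, first-occurrence order
    idx = {item: j for j, item in enumerate(nodes)}
    n = len(nodes)
    counts = np.zeros((n, n))
    for a, b in zip(session, session[1:]):
        counts[idx[a], idx[b]] += 1.0              # occurrences of edge a -> b
    out_deg = counts.sum(axis=1, keepdims=True)    # outdegree of each start node
    A_out = np.divide(counts, out_deg, out=np.zeros_like(counts), where=out_deg > 0)
    in_deg = counts.sum(axis=0, keepdims=True)     # indegree of each end node
    A_in = np.divide(counts, in_deg, out=np.zeros_like(counts), where=in_deg > 0).T
    return nodes, A_in, A_out

nodes, A_in, A_out = session_adjacency(["x", "y", "x", "z"])  # a to-and-fro session
```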

The GNN takes the adjacency matrices $\mathbf{A}^{in}$ and $\mathbf{A}^{out}$ and the normalized item embeddings $\tilde{\mathbf{i}}_{s,1}, \ldots, \tilde{\mathbf{i}}_{s,l}$ as input, and returns an updated set of embeddings after $\tau$ iterations of message propagation across vertices in the graph using gated recurrent units (Li et al., 2015): $\tilde{\mathbf{i}}_{s,1}^{\tau}, \ldots, \tilde{\mathbf{i}}_{s,l}^{\tau} = G(\mathbf{A}^{in}, \mathbf{A}^{out}, \tilde{\mathbf{i}}_{s,1}, \ldots, \tilde{\mathbf{i}}_{s,l}; \theta_G)$, where $\theta_G$ represents the parameters of the GNN function $G$. For any node in the graph, the current representation of the node and the representations of its neighboring nodes are used to iteratively update the representation of the node $\tau$ times. More specifically, the representation of node $i_{s,j}$ in the $t$-th message propagation step is updated as follows:

$$\mathbf{a}_{s,j}^{t} = \left[\mathbf{A}^{in}_{j,:}\, \mathbf{V}^{t-1} \mathbf{H}^{in};\; \mathbf{A}^{out}_{j,:}\, \mathbf{V}^{t-1} \mathbf{H}^{out}\right] + \mathbf{b} \qquad (4)$$
$$\mathbf{z}_{s,j}^{t} = \sigma(\mathbf{W}_z \mathbf{a}_{s,j}^{t} + \mathbf{U}_z \tilde{\mathbf{i}}_{s,j}^{t-1}) \qquad (5)$$
$$\mathbf{r}_{s,j}^{t} = \sigma(\mathbf{W}_r \mathbf{a}_{s,j}^{t} + \mathbf{U}_r \tilde{\mathbf{i}}_{s,j}^{t-1}) \qquad (6)$$
$$\widetilde{\mathbf{h}}_{s,j}^{t} = \tanh(\mathbf{W}_h \mathbf{a}_{s,j}^{t} + \mathbf{U}_h (\mathbf{r}_{s,j}^{t} \odot \tilde{\mathbf{i}}_{s,j}^{t-1})) \qquad (7)$$
$$\tilde{\mathbf{i}}_{s,j}^{t} = (1 - \mathbf{z}_{s,j}^{t}) \odot \tilde{\mathbf{i}}_{s,j}^{t-1} + \mathbf{z}_{s,j}^{t} \odot \widetilde{\mathbf{h}}_{s,j}^{t} \qquad (8)$$

where $\mathbf{V}^{t-1} = [\tilde{\mathbf{i}}_{s,1}^{t-1}, \ldots, \tilde{\mathbf{i}}_{s,l}^{t-1}]^\top \in \mathbb{R}^{l \times d}$ stacks the node embeddings, $\mathbf{A}^{in}_{j,:}$ and $\mathbf{A}^{out}_{j,:}$ depict the $j$-th rows of $\mathbf{A}^{in}$ and $\mathbf{A}^{out}$ respectively, $[\cdot;\cdot]$ denotes concatenation, $\mathbf{H}^{in}, \mathbf{H}^{out} \in \mathbb{R}^{d \times d}$, $\mathbf{b} \in \mathbb{R}^{2d}$, $\mathbf{W}_z, \mathbf{W}_r, \mathbf{W}_h \in \mathbb{R}^{d \times 2d}$, and $\mathbf{U}_z, \mathbf{U}_r, \mathbf{U}_h \in \mathbb{R}^{d \times d}$ are trainable parameters, $\sigma(\cdot)$ is the sigmoid function, and $\odot$ is the element-wise multiplication operator.
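The following is a compact sketch of one propagation step (Eqs. 4–8), vectorized over the $n$ nodes of a session-graph; the parameter names mirror the equations, and the shapes (stored transposed relative to the equations, for right-multiplication) are our assumptions rather than the released implementation.

```python
import torch

def ggnn_step(V, A_in, A_out, H_in, H_out, b, Wz, Uz, Wr, Ur, Wh, Uh):
    # V: (n, d) current node embeddings; A_in, A_out: (n, n) normalized adjacency.
    # H_in, H_out, Uz, Ur, Uh: (d, d); Wz, Wr, Wh: (2d, d); b: (2d,).
    a = torch.cat([A_in @ V @ H_in, A_out @ V @ H_out], dim=1) + b  # messages, Eq. (4)
    z = torch.sigmoid(a @ Wz + V @ Uz)                              # update gate, Eq. (5)
    r = torch.sigmoid(a @ Wr + V @ Ur)                              # reset gate, Eq. (6)
    h = torch.tanh(a @ Wh + (r * V) @ Uh)                           # candidate state, Eq. (7)
    return (1 - z) * V + z * h                                      # GRU-style update, Eq. (8)
```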

To incorporate the sequential information of item interactions, we optionally learn position embeddings and add them to the item embeddings to effectively obtain position-aware item (and subsequently session) embeddings. The final embeddings for the items in a session are computed as $\tilde{\mathbf{i}}_{s,j}^{\tau} + \mathbf{p}_j$, where $\mathbf{p}_j \in \mathbb{R}^d$ is the embedding vector for position $j$ obtained via a lookup over the position embeddings matrix $\mathbf{P} \in \mathbb{R}^{l' \times d}$, where $l'$ denotes the maximum length of any input session such that $j \leq l'$.

The soft-attention weight of the $j$-th item in session $s$ is computed as $\alpha_j = \mathbf{q}^\top \sigma(\mathbf{W}_1 \tilde{\mathbf{i}}_{s,l}^{\tau} + \mathbf{W}_2 \tilde{\mathbf{i}}_{s,j}^{\tau} + \mathbf{c})$, where $\mathbf{q}, \mathbf{c} \in \mathbb{R}^d$ and $\mathbf{W}_1, \mathbf{W}_2 \in \mathbb{R}^{d \times d}$ are trainable parameters. The $\alpha_j$s are further normalized using softmax. An intermediate session embedding is computed as the weighted sum $\mathbf{s}' = \sum_{j=1}^{l} \alpha_j \tilde{\mathbf{i}}_{s,j}^{\tau}$. The session embedding $\mathbf{s}$ is a linear transformation over the concatenation of the intermediate session embedding and the embedding of the most recent item $\tilde{\mathbf{i}}_{s,l}^{\tau}$, s.t. $\mathbf{s} = \mathbf{W}_3\,[\mathbf{s}'; \tilde{\mathbf{i}}_{s,l}^{\tau}]$, where $\mathbf{W}_3 \in \mathbb{R}^{d \times 2d}$.
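A sketch of this attention readout (our sketch with assumed shapes; V holds the final position-aware item embeddings of one session, with the most recent item last, and W3 is stored transposed for right-multiplication):

```python
import torch

def session_embedding(V, q, c, W1, W2, W3):
    # V: (l, d); q, c: (d,); W1, W2: (d, d); W3: (2d, d).
    e = torch.sigmoid(V[-1] @ W1 + V @ W2 + c)     # (l, d), broadcast over the l items
    alpha = torch.softmax(e @ q, dim=0)            # normalized attention weights, (l,)
    s_prime = (alpha.unsqueeze(1) * V).sum(dim=0)  # intermediate session embedding s'
    return torch.cat([s_prime, V[-1]]) @ W3        # s = W3 [s'; i_l], normalized downstream
```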

The final recommendation scores for the items are computed as per Eq. 3. Note that while the session-graph embedding is obtained using item embeddings that are aware of the session-graph and sequence, the normalized item embeddings $\tilde{\mathbf{i}}_k$, independent of a particular session, are used to compute the recommendation scores.

6. Experimental Evaluation

Dataset Details: We evaluate NISER on publicly available benchmark datasets: i) Yoochoose (YC), ii) Diginetica (DN), and iii) RetailRocket (RR). The YC dataset (http://2015.recsyschallenge.com/challege.html) is from RecSys Challenge 2015 and contains a stream of user clicks on an e-commerce website over six months. Given the large number of sessions in YC, the most recent 1/4 and 1/64 fractions of the training set are used to form two datasets, YC-1/4 and YC-1/64 respectively, as done in (Wu et al., 2019). The DN dataset (http://cikm2016.cs.iupui.edu/cikm-cup) is transactional data from the CIKM Cup 2016 challenge. The RR dataset (https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbw-igKjcPTBI6ZPAa?dl=0) is from the e-commerce personalization company Retail Rocket, which published a dataset with six months of user browsing activities.
Offline and Online Settings: We consider two evaluation settings: i. offline and ii. online. For evaluation in the offline setting, we consider the static train/test splits used in (Wu et al., 2019) for YC and DN. For RR, we use the sessions from the last 14 days for testing and the remaining 166 days for training. The statistics of the datasets are summarized in Table 1. For evaluation in the online setting, we re-train the models every day for 2 weeks (the number of sessions per day for YC is much larger, so we evaluate for 1 week due to computational constraints) by appending the sessions from that day to the previous train set, and report the test results of the trained model on sessions from the subsequent day.
NISER and its variants: We apply our approach on top of GNN and adapt the code from (Wu et al., 2019) (https://github.com/CRIPAC-DIG/SR-GNN) with suitable modifications described later. We found that for sessions of length greater than 10, considering only the 10 most recently clicked items to make recommendations worked consistently better across datasets. We refer to this variant as GNN+ and use this additional pre-processing step in all our experiments (a minimal sketch of the step follows).
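The pre-processing step amounts to a one-line truncation, sketched below.

```python
def truncate_session(session, max_items=10):
    """Keep only the most recently clicked items of a long session (GNN+ pre-processing)."""
    return session[-max_items:]
```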

Statistics | DN | RR | YC-1/64 | YC-1/4
#train sessions | 0.7 M | 0.7 M | 0.4 M | 0.6 M
#test sessions | 60,858 | 60,594 | 55,898 | 55,898
#items | 43,097 | 48,759 | 16,766 | 29,618
Average session length | 5.12 | 3.55 | 6.16 | 5.17
Table 1. Statistics of the datasets used for offline experiments.

We consider enhancements over GNN+, and propose the following variants of the embedding normalization approach:

  • Normalized Item Representations (NIR): only item embeddings are normalized and the scale factor $\sigma$ is not used (since the session embedding is not normalized here, the score $\tilde{\mathbf{i}}_k^\top \mathbf{s}$ is not restricted to $[-1, 1]$ in general),

  • Normalized Item and Session Representations (NISER): both item and session embeddings are normalized,

  • NISER+: NISER with position embeddings and dropout applied to input item embeddings.

Hyperparameter Setup: Following (Wu et al., 2019), we use the same embedding dimension $d$ and a learning rate of 0.001 with the Adam optimizer. We use 10% of the train set, the part closest to the test set in time, as a hold-out validation set for hyperparameter tuning, including the scale factor $\sigma$. We found a single value of $\sigma$, chosen from a small grid on the respective hold-out validation sets, to work best across most models, and hence we use the same value across datasets for consistency. We use a dropout probability of 0.1 on the dimensions of the item embeddings in NISER+ across all models. Since the validation set is closest to the test set in time, it is desirable to use it for training as well: after finding the best epoch via early stopping based on validation performance, we re-train the model for the same number of epochs on the combined train and validation set. We train five models with random initializations for the best hyperparameters, and report the average and standard deviation of the metrics for all datasets except YC-1/4, for which we train three models (as it is a large dataset and training one model takes around 15 hours).
Evaluation Metrics: We use the same evaluation metrics, Recall@K and Mean Reciprocal Rank (MRR@K), as in (Wu et al., 2019), with K = 20. Recall@K is the proportion of test instances that have the desired item among the top-K recommended items. MRR@K is the average of the reciprocal ranks of the desired items in the recommendation lists; a large MRR indicates that desired items tend to appear near the top of the recommendation lists. For evaluating popularity bias, we consider the following metrics, as used in (Abdollahpouri et al., 2019):

Figure 4. Offline setting evaluation: Recall@20 and MRR@20 with varying popularity threshold $\phi$, indicating larger gains from NISER+ over GNN+ for less popular items. Panels: (a) DN, (b) RR, (c) YC-1/64, (d) YC-1/4.
Figure 5. Online setting evaluation: Recall@20 and MRR@20 for sessions where the target item is one of the less popular newly introduced items from the previous day, for (a) DN, (b) RR, (c) YC. The fraction of such sessions in the training set for each day is also indicated. Standard deviations over five models are shown as a lighter-shaded region around the solid lines.

Average Recommendation Popularity (ARP): This measure calculates the average popularity of the recommended items in each list, given by:

$$\mathrm{ARP} = \frac{1}{|S|} \sum_{s \in S} \frac{\sum_{i \in L_s} pop_i}{K} \qquad (9)$$

where $pop_i$ is the popularity of item $i$, i.e. the number of times item $i$ appears in the training set, $L_s$ is the recommended list of top-K items for session $s$, and $|S|$ is the number of sessions in the test set. An item $i$ belongs to the set $\mathcal{I}_L$ of long-tail or less popular items if $pop_i \leq \phi$, where $\phi$ is a popularity threshold. We evaluate the performance in terms of Recall@20 and MRR@20 for the sessions with target item in $\mathcal{I}_L$, varying $\phi$.
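A minimal sketch of the ARP computation in Eq. (9); `recommend` is a hypothetical function returning the top-K list for a session.

```python
from collections import Counter

def average_recommendation_popularity(test_sessions, train_clicks, recommend, K=20):
    pop = Counter(train_clicks)        # pop_i: times item i appears in the training set
    total = 0.0
    for s in test_sessions:
        rec = recommend(s, K)          # L_s: recommended top-K items for session s
        total += sum(pop[i] for i in rec) / K
    return total / len(test_sessions)  # average over the |S| test sessions
```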

6.1. Results and Observations

Method | DN | RR | YC-1/64 | YC-1/4
GNN+ | 495.25±2.52 | 453.39±8.97 | 4128.54±27.80 | 17898.10±126.93
NISER+ | 487.31±0.30 | 398.53±3.09 | 3972.40±41.04 | 16683.52±120.74
Table 2. Offline setting evaluation: NISER+ versus GNN+ in terms of Average Recommendation Popularity (ARP). Lower values of ARP indicate lower popularity bias.
Method | DN | RR | YC-1/64 | YC-1/4
Recall@20
SKNN (Jannach and Ludewig, 2017) | 48.06 | 56.42 | 63.77 | 62.13
STAN (Garg et al., 2019) | 50.97 | 59.80 | 69.45 | 70.07
GRU4REC (Hidasi et al., 2016) | 29.45 | - | 60.64 | 59.53
NARM (Li et al., 2017) | 49.70 | - | 68.32 | 69.73
STAMP (Liu et al., 2018) | 45.64 | 53.94 | 68.74 | 70.44
GNN (Wu et al., 2019) | 51.39±0.38 | 57.63±0.15 | 70.54±0.14 | 70.95±0.04
GNN+ | 51.81±0.11 | 58.59±0.10 | 70.85±0.08 | 71.10±0.07
NIR | 52.40±0.06 | 60.67±0.08 | 71.12±0.05 | 71.32±0.11
NISER | 52.63±0.09 | 60.85±0.06 | 70.86±0.15 | 71.69±0.03
NISER+ | 53.39±0.06 | 61.41±0.09 | 71.27±0.05 | 71.80±0.09
MRR@20
SKNN (Jannach and Ludewig, 2017) | 16.95 | 33.16 | 25.22 | 24.82
STAN (Garg et al., 2019) | 18.48 | 35.32 | 28.74 | 28.89
GRU4REC (Hidasi et al., 2016) | 8.33 | - | 22.89 | 22.60
NARM (Li et al., 2017) | 16.17 | - | 28.63 | 29.23
STAMP (Liu et al., 2018) | 14.32 | 28.49 | 29.67 | 30.00
GNN (Wu et al., 2019) | 17.79±0.16 | 32.74±0.09 | 30.80±0.09 | 31.37±0.13
GNN+ | 18.03±0.05 | 33.29±0.03 | 30.84±0.10 | 31.51±0.05
NIR | 18.52±0.06 | 35.57±0.05 | 30.99±0.10 | 31.73±0.11
NISER | 18.27±0.10 | 36.09±0.03 | 31.50±0.11 | 31.80±0.12
NISER+ | 18.72±0.06 | 36.50±0.05 | 31.61±0.02 | 31.77±0.10
Table 3. NISER+ versus other benchmark methods in the offline setting. Numbers after ± are standard deviations over five models.
Method | DN | RR | YC-1/64 | YC-1/4
Recall@20
NISER+ | 53.39±0.06 | 61.41±0.09 | 71.27±0.05 | 71.80±0.09
-L2 norm | 52.23±0.10 | 59.16±0.10 | 71.10±0.09 | 71.46±0.19
-Dropout | 52.81±0.12 | 60.99±0.09 | 71.07±0.13 | 71.90±0.03
-PE | 53.11 | 61.22±0.03 | 71.13±0.04 | 71.70±0.11
MRR@20
NISER+ | 18.72±0.06 | 36.50±0.05 | 31.61±0.02 | 31.77±0.10
-L2 norm | 18.11±0.05 | 33.78±0.04 | 30.90±0.07 | 31.49±0.07
-Dropout | 18.43±0.11 | 35.99±0.02 | 31.56±0.06 | 31.93±0.17
-PE | 18.60±0.09 | 36.32±0.03 | 31.68±0.05 | 31.71±0.06
Table 4. Ablation results for NISER+ indicating that normalization of embeddings (L2 norm) contributes the most to the performance improvement. Here PE: Position Embeddings.

(1) NISER+ reduces popularity bias in GNN+ in the offline setting: Table 2 shows that ARP for NISER+ is significantly lower than for GNN+, indicating that NISER+ recommends less popular items more often than GNN+, thus reducing popularity bias. Furthermore, from Fig. 4, we observe that NISER+ outperforms GNN+ for sessions with less popular items as targets (i.e. when $\phi$ is small), with gains of 13%, 8%, 5%, and 2% for DN, RR, YC-1/64, and YC-1/4 respectively in terms of Recall@20 at the smallest threshold considered; the corresponding gains in terms of MRR@20 are 28%, 18%, 6%, and 2%. Gains for DN and RR are high compared to YC; this is because even the long-tail items in YC are relatively popular at this threshold. If we instead consider a lower threshold for YC, gains are as high as 26% and 9% for YC-1/64 and YC-1/4 respectively in terms of Recall@20, and as high as 34% and 19% in terms of MRR@20. We also note that NISER+ is at least as good as GNN+ even for sessions with more popular items as targets (i.e. when $\phi$ is large).

(2) NISER+ improves upon GNN+ in the online setting for newly introduced items in the set of long-tail items: these items have only a small number of training sessions available at the end of the day they are launched. From Fig. 5, we observe that for the less popular newly introduced items, NISER+ outperforms GNN+ on the subsequent day for sessions where these items are the targets. This demonstrates the ability of NISER+ to recommend new items the very next day, owing to its ability to reduce popularity bias. Furthermore, for DN and RR, we observe that during the initial days, when training data is scarce, GNN+ performs poorly while the performance of NISER+ is relatively consistent across days, indicating a potential regularization effect of NISER on GNN models in low-data scenarios. As days pass and more data becomes available for training, the performance of GNN+ improves with time but still stays significantly below that of NISER+. For YC, as days pass, the popularity bias becomes more and more severe (as depicted by the very small fraction of sessions with less popular newly introduced items), such that the performance of both GNN+ and NISER+ degrades with time. Importantly, however, NISER+ still performs consistently better than GNN+ on any given day as it better handles the increasing popularity bias.

(3) NISER and NISER+ outperform GNN and GNN+ in the offline setting: From Table 3, we observe that NISER+ shows consistent and significant improvements over GNN in terms of Recall and MRR, establishing a new state-of-the-art in SR.

We also conduct an ablation study (removing one feature of NISER+ at a time) to understand the effect of each of the following features of NISER+: i. L2 normalization of embeddings, ii. position embeddings, and iii. dropout on input item embeddings. As shown in Table 4, we observe that L2 normalization is the most important factor across datasets, while dropout and position embeddings contribute in varying degrees to the overall performance of NISER+.

7. Discussion

In this work, we highlighted that the typical long-tailed item-frequency distribution leads to popularity bias in state-of-the-art deep learning models such as GNNs (Wu et al., 2019) for session-based recommendation. We then argued that this is partly related to the 'radial' property of the softmax loss which, in our setting, implies that the norms of popular items' embeddings will likely be larger than those of less popular items. We showed that learning the representations for items and session-graphs by optimizing for cosine similarity instead of inner product can mitigate this issue to a large extent. Importantly, this ability to reduce popularity bias is found to be useful in the online setting, where newly introduced items tend to be less popular and are poorly modeled by existing approaches. By normalizing the item and session-graph representations, we observed significant improvements in overall recommendation performance and improved upon the existing state-of-the-art results. In future work, it would be worth exploring NISER to improve other algorithms, such as STAMP (Liu et al., 2018), that rely on similarity between embeddings for items, sessions, users, etc.

References

  • Abdollahpouri et al. (2017) Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2017. Controlling popularity bias in learning-to-rank recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 42–46.
  • Abdollahpouri et al. (2019) Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2019. Managing Popularity Bias in Recommender Systems with Personalized Re-ranking. arXiv preprint arXiv:1901.07555 (2019).
  • Garg et al. (2019) Diksha Garg, Priyanka Gupta, Pankaj Malhotra, Lovekesh Vig, and Gautam Shroff. 2019. Sequence and Time Aware Neighborhood for Session-based Recommendations: STAN. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’19). ACM, 1069–1072.
  • Hidasi et al. (2016) Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In Proceedings of the 4th International Conference on Learning Representations (ICLR-2016).
  • Jannach and Ludewig (2017) Dietmar Jannach and Malte Ludewig. 2017. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 306–310.
  • Li et al. (2017) Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1419–1428.
  • Li et al. (2015) Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
  • Liu et al. (2018) Qiao Liu, Yifu Zeng, Refuoe Mokhosi, and Haibin Zhang. 2018. STAMP: short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1831–1839.
  • Peng et al. (2015) Hao Peng, Lili Mou, Ge Li, Yunchuan Chen, Yangyang Lu, and Zhi Jin. 2015. A comparative study on regularization strategies for embedding-based neural networks. arXiv preprint arXiv:1508.03721 (2015).
  • Steck (2011) Harald Steck. 2011. Item popularity and recommendation accuracy. In Proceedings of the fifth ACM conference on Recommender systems. ACM, 125–132.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998–6008.
  • Wang et al. (2017) Feng Wang, Xiang Xiang, Jian Cheng, and Alan Loddon Yuille. 2017. Normface: L2 hypersphere embedding for face verification. In Proceedings of the 25th ACM international conference on Multimedia. ACM, 1041–1049.
  • Wang et al. (2019) Meirui Wang, Pengjie Ren, Lei Mei, Zhumin Chen, Jun Ma, and Maarten de Rijke. 2019. A Collaborative Session-based Recommendation Approach with Parallel Memory Modules. In Proceedings of the 42Nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’19). ACM, 345–354.
  • Wu et al. (2019) Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based Recommendation with Graph Neural Networks. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
  • Yang et al. (2018) Longqi Yang, Yin Cui, Yuan Xuan, Chenyang Wang, Serge Belongie, and Deborah Estrin. 2018. Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. In Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 279–287.
  • Zheng et al. (2018) Yutong Zheng, Dipan K Pal, and Marios Savvides. 2018. Ring loss: Convex feature normalization for face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5089–5097.