Multi-Linear Interactive Matrix Factorization
Abstract
Recommender systems, which can significantly help users find items of interest in the information era, have attracted increasing attention from both the scientific and the application communities. One of the most widely applied recommendation methods is Matrix Factorization (MF). However, most MF-based approaches focus on the user-item rating matrix while ignoring ingredients that may significantly influence users' preferences on items. In this paper, we propose a multi-linear interactive MF algorithm (MLIMF) to model the interactions between users and each event associated with their final decisions. Our model considers not only the user-item rating information but also pairwise interactions based on several empirically supported factors. In addition, we compare the proposed model with three other typical methods: user-based collaborative filtering (UCF), item-based collaborative filtering (ICF) and regularized MF (RMF). Experimental results on two real-world datasets, MovieLens 1M and MovieLens 100k, show that our method outperforms the other three methods in recommendation accuracy. This work may shed some light on the in-depth understanding of modeling user online behaviors and the consequent decisions.
keywords:
Recommender Systems, Collaborative Filtering, Matrix Factorization, Latent Factor Model, Time-aware Recommendation
1 Introduction
In recent years, the unprecedented proliferation of information has dramatically changed our lifestyles. People all around the world are closely connected through the millions of microblog posts, tweets and status updates generated daily on social networks. Online consumption is becoming an essential part of people's daily lives, with millions of e-commerce orders generated every day. However, people face a serious and widely known problem: how can they obtain quality recommendations from the numerous web service providers? Since the early work resnick1994grouplens () was published in the 1990s, personalized recommender systems (RS) lu2012recommender (); bobadilla2013recommender () have been a thriving subfield of data mining addressing this concern.
In general, an RS, serving as a special category of knowledge-based systems, attempts to automatically measure the relevance of user-user or item-item pairs and then delivers items that fit the user's tastes via two basic strategies: content-based filtering (CB) balabanovic1997fab () and collaborative filtering (CF) herlocker1999algorithmic (). CB profiles items and users by extracting characteristic units from their content (e.g. demographic data, product information/description) and then determines the degree of matching by comparing the corresponding profiles. However, due to the high cost of collecting the necessary information about items and the lack of users motivated to share their personal data, CB has not become the most popular recommendation approach. In contrast, CF generates recommendations according to the structure of a virtual community hill1995recommending (). The virtual community rests on the underlying assumption that a group of people who shared similar characteristics in the past will also agree in their tastes in the future. In addition, CF requires no domain knowledge and offers an alternative way to reveal latent patterns that are difficult for CB methods to capture.
According to pioneering research, CF mainly comprises two families: neighborhood-based models (NBMs) sarwar2001item (); linden2003amazon () and latent factor models (LFMs) hofmann2004latent (); koren2009matrix (). As the name suggests, NBMs work by collaborating with neighbors. Here the term “neighbor” refers not only to users but also to items that share many characteristics. Notably, user-based and item-based CF sarwar2001item (); linden2003amazon () are two typical strategies for implementing NBMs, measuring how close users or items are with a predefined similarity function. NBMs make predictions based on the known ratings of the active user's or item's neighbors. By comparison, LFMs represent both entities of a pair (e.g. a user and an item) by feature vectors of the same dimension inferred from the existing ratings, and directly express the preference as the dot product of the corresponding feature vectors. Building on previous works, LFMs offer another way to express various aspects or patterns of data, usually with high accuracy and scalability.
As the most representative LFM technique, Matrix Factorization (MF) has given rise to numerous variants validated on real data sets because of its high accuracy, scalability and ability to capture various context factors (e.g. emotion, location, time). The earliest work employing MF to implement CF was proposed by Sarwar et al., who conducted a case study on applying dimensionality reduction to CF with Singular Value Decomposition (SVD) sarwar2000application (). Hofmann hofmann2004latent () later applied latent semantic models to implement LFMs. At the beginning of the Netflix Prize competition bennett2007netflix () in 2006, Brandyn Webb, under the pseudonym Simon Funk, detailed how Regularized Matrix Factorization (RMF) webb2006rmf () helped his team reach third place. Subsequently, several works koren2009matrix (); koren2008factorization (); koren2010collaborative (); takacs2008investigation (); takacs2008matrix () showed that RMF played a significant role in the solution that won the Netflix Prize (NP). The attractive characteristics of RMF (e.g. methodological simplicity, easy incorporation of additional information, high accuracy) have inspired many researchers to explore its potential from different aspects, e.g. koren2009matrix (); koren2008factorization (); koren2010collaborative (); paterek2007improving (); takacs2009scalable (); luo2012incremental (); luo2013applying (); zhang2014information ().
Following the principles of LFMs described above, the standard RMF can easily discover the latent relationship hidden in the interactions between two entities. In real life, however, people may take a number of factors into account before making a decision, and it is difficult for RMF to integrate the interactions between users and factors beyond the items themselves. Although this challenge can be addressed by Tensor Factorization (TF) karatzoglou2010multiverse (), the model complexity grows exponentially with the number of contextual factors. Koren koren2008factorization () proposed a methodology that combines RMF with neighborhood information. In addition, Koren koren2010collaborative () addressed temporal changes in user behavior with matrix factorization models. Baltrunas et al. baltrunas2011matrix () presented context-aware matrix factorization, which models the interaction of contextual factors with items. Ma et al. ma2011recommender () extended RMF with social regularization terms, under the assumption that two users tend to have similar feature vectors if they are closely connected in a social network.
In this paper, we present a novel approach, Multi-Linear Interactive Matrix Factorization (MLIMF), to model the interactions between users and factors (e.g. emotions, locations, the time when the rating is given, movie genres, movie directors) that may significantly influence the user's decision process. Web systems generally log multiple pieces of information correlated with a customer's rating of a specific item. In our model, besides the interaction of each user-item pair, we represent the relationship of each specific user-factor pair in a shared latent space. Then, by extending the standard RMF, we linearly combine all pairwise interactions as components of the customer's final rating decision to construct MLIMF. To clarify the principles and application scenarios of MLIMF, we present two examples on two real MovieLens datasets of different sizes. Experimental results show that, compared with the standard RMF and other baseline algorithms, MLIMF obtains better accuracy with linear complexity. The main contributions of this work include:

In addition to the rating matrix, online users' rating decisions may be influenced by other factors. We propose that each user has a distinct interaction with each factor, and that such a pairwise relationship can be represented in a shared latent space.

MLIMF, which maintains the principles and expressive scalability of MF, presents an alternative approach to taking extra information into account on top of RMF. In fact, the key idea of MLIMF can be incorporated into other variants of RMF.

Two application scenarios of MLIMF are given. First, we show how different features extracted from the training sample can serve as auxiliary information that may significantly influence users' rating actions. Then, we describe how to model users' temporally dynamic preferences by integrating the time factor into MLIMF.
The remainder of this paper is organized as follows. Section 2 describes the preliminaries. In Section 3 we detail our proposed recommendation model. Section 4 gives two application scenarios. Experimental results are given in Section 5. Finally, Section 6 summarises this work and outlines future work.
2 Preliminaries
The CF problem can be simply defined as generating personalized recommendations for a given user by seeking a group of people or items with similar features from a finite data sample. In CF, user preferences over the involved items are quantized into the user-item rating matrix $R \in \mathbb{R}^{|U| \times |I|}$, where $|U|$ and $|I|$ respectively denote the sizes of the given user set $U$ and item set $I$. Each entry at position $(u, i)$ of $R$, denoted by $r_{u,i}$, represents user $u$'s preference on item $i$, with a high value usually indicating a strong relationship between the user-item pair. Depending on the specific feedback the system receives, $r_{u,i}$ can be binary ($r_{u,i} \in \{0, 1\}$), an integer from a given range (e.g. $\{1, 2, 3, 4, 5\}$), or a value in a continuous numerical interval (e.g. $[r_{\min}, r_{\max}]$). In practice, $R$ is usually very sparse and we can only observe a limited set $K = \{(u, i, r_{u,i}) \mid r_{u,i} \text{ is known}\}$, normally with $|K| \ll |U| \times |I|$. Thus, CF-based recommendation can be regarded as estimating missing data from the known user-item ratings.
2.1 Regularized Matrix Factorization
Among the many solutions to the CF problem, RMF has been shown to be superior to classic NBMs in the NP competition. Furthermore, numerous RMF variants have been proposed to explore possible applications of MF, showing high efficiency and accuracy on several real rating datasets as well. Unlike traditional NBMs, RMF aims to approximate $R$ by the product of two low-rank matrices. Its basic principle is to map a pair of entities into the same low-dimensional feature space, so that each entity is represented by a low-dimensional feature vector. Taking the rating prediction problem as an example, let $f$ denote the dimension of the feature space. $P \in \mathbb{R}^{|U| \times f}$ denotes the user feature matrix, where each row $p_u$ corresponds to a particular user, and $Q \in \mathbb{R}^{|I| \times f}$ represents the item feature matrix, where each row $q_i$ corresponds to a particular item (usually $f \ll \min(|U|, |I|)$). The rating approximation of user $u$ on item $i$ is then the dot product of the corresponding user and item feature vectors,
$$\hat{r}_{u,i} = p_u q_i^{T} = \sum_{k=1}^{f} p_{u,k}\, q_{i,k}, \tag{1}$$
where $\hat{r}_{u,i}$ is the estimate of $r_{u,i}$. The parameters in $P$ and $Q$ are usually learned from the training samples by applying stochastic gradient descent (SGD) to optimize the objective function $\varepsilon$:
$$\varepsilon = \frac{1}{2}\sum_{u=1}^{|U|}\sum_{i=1}^{|I|} I_{u,i}\left(r_{u,i} - p_u q_i^{T}\right)^2 + \frac{\lambda}{2}\left(\|P\|_F^2 + \|Q\|_F^2\right), \tag{2}$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $I_{u,i}$ is an indicator function with $I_{u,i} = 1$ if user $u$ rated item $i$ and $I_{u,i} = 0$ otherwise. The second term of Eq. (2) is a regularization term that prevents overfitting, i.e. a model that fits the training data well but generalizes poorly to new cases; following takacs2009scalable (), $\lambda$ is its weight. As Eq. (2) shows, $\varepsilon$ is quadratic in $P$ for fixed $Q$ (and vice versa) but not jointly convex, so SGD generally converges to a local minimum. Under the SGD solver, the feature matrices $P$ and $Q$ are updated by moving in the direction opposite to the gradient for each training example. A good solution is usually found after looping through all training samples a limited number of times; each pass is called a training epoch. According to takacs2009scalable (), initializing each entry of $P$ and $Q$ with random values drawn from a predefined range can speed up convergence. For each training case, the algorithm computes the estimate $\hat{r}_{u,i}$ and the associated prediction error
$$e_{u,i} = r_{u,i} - \hat{r}_{u,i}. \tag{3}$$
The corresponding feature vectors are then updated by the following rules:
$$p_u \leftarrow p_u + \eta\left(e_{u,i}\, q_i - \lambda\, p_u\right), \qquad q_i \leftarrow q_i + \eta\left(e_{u,i}\, p_u - \lambda\, q_i\right), \tag{4}$$
where $\eta$ denotes the learning rate. Since the size of each update grows in proportion to $\eta$, the learning rate is a key ingredient influencing not only the search for the optimal parameters but also the convergence rate of $\varepsilon$. However, choosing a suitable $\eta$ is hard in real applications of SGD. Although Luo et al. luo2013applying () recently tackled the learning-rate tuning dilemma through learning rate adaptation, there is still no unified policy for setting $\eta$. Like most MF-based works, we treat $\eta$ as an empirical parameter adapted to each data set. Besides choosing suitable values of $\eta$ and $\lambda$, Takács et al. takacs2009scalable () suggested that an early stopping criterion is necessary to avoid overfitting. Usually, we stop training when the evaluation metric on the testing set no longer improves or settles at a converged value.
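The training procedure of Eqs. (1)-(4) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation used in the experiments: the function name `train_rmf`, the 0-based integer ids, the Gaussian initialization with standard deviation 0.02, the per-epoch shuffling and the simple early-stopping rule (stop at the first epoch whose validation RMSE does not improve) are all illustrative choices.

```python
import random


def train_rmf(ratings, n_users, n_items, f=10, eta=0.01, lam=0.01,
              max_epochs=100, val=None, seed=0):
    """Sketch of RMF trained by SGD (Eqs. (1)-(4)).

    ratings: list of (u, i, r) training triples with 0-based integer ids.
    val:     optional list of (u, i, r) triples used for early stopping.
    Returns the feature matrices P, Q and a predict(u, i) function.
    """
    rng = random.Random(seed)
    # Assumed initialization: small Gaussian values (mean 0, std 0.02).
    P = [[rng.gauss(0.0, 0.02) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.02) for _ in range(f)] for _ in range(n_items)]

    def predict(u, i):
        # Eq. (1): dot product of the user and item feature vectors.
        return sum(p * q for p, q in zip(P[u], Q[i]))

    def rmse(data):
        return (sum((r - predict(u, i)) ** 2 for u, i, r in data) / len(data)) ** 0.5

    best = float("inf")
    order = list(ratings)
    for _ in range(max_epochs):          # one pass over the data = one epoch
        rng.shuffle(order)
        for u, i, r in order:
            e = r - predict(u, i)        # Eq. (3): prediction error
            pu, qi = P[u], Q[i]
            for k in range(f):           # Eq. (4), using pre-update values
                p_old, q_old = pu[k], qi[k]
                pu[k] += eta * (e * q_old - lam * p_old)
                qi[k] += eta * (e * p_old - lam * q_old)
        if val is not None:
            score = rmse(val)
            if score >= best:            # early stopping: validation RMSE stopped improving
                break
            best = score
    return P, Q, predict
```

Both vectors are updated from their pre-update values, so the loop applies Eq. (4) simultaneously; a production version would typically use NumPy arrays and keep the best parameters seen before early stopping rather than those of the last epoch.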
The basic RMF has been shown to be highly accurate and scalable. However, many users provide very few ratings, which makes it difficult to identify their tastes. Fortunately, MF can flexibly deal with this problem by incorporating additional sources of information beyond the user-item rating matrix. In real applications, besides users' explicit ratings, an RS can easily capture implicit feedback (e.g. browsing or purchase history, time effects) and social relationships to analyze user preferences in depth. To utilize implicit feedback, Koren koren2008factorization () presented SVD++, which incorporates implicit information and user attributes into the MF model. Jamali et al. jamali2010matrix () reported the effect of trust propagation on recommendation in social networks. These works regard extra sources as significant elements that broadly influence the interactions between users and items. In contrast, in this paper we suppose that people weight each extra factor into their final rating decision. This weighting process resembles a sports competition, in which judges first assess an athlete's performance in various aspects and then give the final score by combining the weights of all involved elements. Accordingly, we model a user's weighting of each factor as a unique interaction on top of the RMF model. The final estimate of a user's preference on an item is a simple linear combination of the interaction weights associated with each factor.
2.2 Temporal Dynamic Matrix Factorization
The MF models discussed above cannot adapt to dynamic customer preferences. Concepts involved with the data (e.g. customer preferences, item popularity, social structure) usually change over time, and models should distinguish short-term effects from the longer-term trends that reflect the intrinsic patterns of the data. Nonetheless, temporal changes in data bring unique challenges. Through detailed analysis, recent works have demonstrated that modeling time effects can benefit the performance of CF. Lathia et al. lathia2008knn () analysed the evolution of retrieved characteristics over time and gave insightful explanations of why certain CF similarity measures outperform others. In Ref. koren2010collaborative (), Koren suggested that temporal modeling should be a predominant factor in building an RS, and proposed timeSVD++ to model temporally drifting concepts. Thus, according to previous research koren2010collaborative (); xiang2009time (), incorporating time effects into MF models has become a comparatively mature topic.
Note that those models include day-specific parameters for each user, which limits their ability to predict future ratings. In this paper, building on the pioneering discussions of possible types of time effects, we model time as a decision factor for users to illustrate the idea of our proposed MLIMF in the following section.
3 Recommendation With MLIMF
In this section, we define the problem we focus on and extend RMF to model the interactions between users and decision factors. To aid understanding, we present two applications of the proposed model.
3.1 Problem Definition
To both improve user experience and enhance their competitive power, electronic retailers and content providers offer ample information about a vast selection of products, which increases the opportunities to meet customers' diverse personalized needs and tastes. Customers certainly profit from this abundant data, which provides evidence of the quality of the products involved. As Figure 1 shows, the final decision to purchase a product may be influenced by many factors, such as emotion, seasonal discounts, comments on the product and so on, and each factor affects users to a different degree. For example, a user $u$ who is a big fan of a particular director may prefer to pay for another of his movies: a fan of George Lucas, the American director famous for the Star Wars series, may choose THX 1138 for that reason. However, such fine-grained information is not always available. Instead, the extra information associated with an active customer's ratings can usually be captured by an RS. In this paper, we model such effects under the assumption that user $u$'s preference for item $i$ can be decomposed into a limited number of weighted components, each of which denotes the significance of the corresponding factor in $u$'s final decision.
3.2 Principles of MLIMF
Based on the RMF framework, the interaction between user $u$ and a specific decision factor can also be represented as the dot product of the corresponding low-rank feature vectors. Thus, $\hat{r}_{u,i}$ can be modified as follows:
$$\hat{r}_{u,i} = p_u q_i^{T} + \sum_{c \in C} \sum_{v \in V_c} I^{c}_{u,i,v}\, x^{c}_{u} \left(y^{c}_{v}\right)^{T}, \tag{5}$$
where $C$ denotes the set of decision factors. In Eq. (5), the first term represents user $u$'s preference on item $i$, and the summation term represents the interactions between user $u$ and each possible decision factor $c \in C$. Each decision factor is a categorical attribute whose values form a finite set $V_c$. The indicator $I^{c}_{u,i,v}$ is set to 1 if user $u$ focuses on the specific value $v \in V_c$ of factor $c$ when rating item $i$, and 0 otherwise; in other words, $I^{c}_{u,i,v}$ encodes the specific contextual information in effect when user $u$ rates item $i$. For example, a user who has rated a movie by a particular actor or director with 5 stars might give higher weight to another movie starring or directed by the same person. To model the relationship between users and this extra information, we introduce a new set of decision-factor feature vectors $Y^{c} = \{y^{c}_{v}\}_{v \in V_c}$, where each value $v$ is associated with a feature vector $y^{c}_{v} \in \mathbb{R}^{f_c}$ and $f_c$ denotes the feature dimension of decision factor $c$. Correspondingly, we define a new set of user feature vectors $X^{c} = \{x^{c}_{u}\}_{u \in U}$, where user $u$ is associated with $x^{c}_{u} \in \mathbb{R}^{f_c}$ for factor $c$. The objective function is then modified as follows:
$$\varepsilon = \frac{1}{2}\sum_{u=1}^{|U|}\sum_{i=1}^{|I|} I_{u,i}\Big(r_{u,i} - p_u q_i^{T} - \sum_{c \in C}\sum_{v \in V_c} I^{c}_{u,i,v}\, x^{c}_{u}\left(y^{c}_{v}\right)^{T}\Big)^2 + \frac{\lambda}{2}\Big(\|P\|_F^2 + \|Q\|_F^2 + \sum_{c \in C}\big(\|X^{c}\|_F^2 + \|Y^{c}\|_F^2\big)\Big), \tag{6}$$
where the relationship between $I_{u,i}$ and $I^{c}_{u,i,v}$ is illustrated in Figure 1b: the blue rectangle represents the fact that user $u$ has rated item $i$, and the gray rectangle further indicates the underlying decision factors in effect when this rating record was generated. To understand their relationship intuitively, consider a user's rating procedure. We first check whether user $u$ has expressed a preference for item $i$, which is why $I_{u,i}$ appears outside the bracket in Eq. (6). Then the interactions between the user and the underlying factors that he/she might take into consideration when rating item $i$ are weighted into the final rating decision. This is captured by $I^{c}_{u,i,v}$, which selects only the value of each decision factor associated with the current rating record.
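A minimal sketch of Eqs. (5) and (6), assuming feature vectors are stored as Python lists in dictionaries (`P[u]`, `Q[i]`, `X[c][u]`, `Y[c][v]`) and each rating record carries a dictionary `values` mapping every decision factor $c$ to the value $v$ recorded with that rating. Because only that value has $I^{c}_{u,i,v} = 1$, the inner sum over $V_c$ collapses to a single dot product per factor. The function names and storage layout are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mlimf_predict(u, i, values, P, Q, X, Y):
    """Eq. (5): user-item term plus one user-factor term per decision factor.

    values maps each factor c to the value v observed with this rating, so the
    indicator I^c_{u,i,v} is 1 for that v only and the sum over V_c is one term.
    """
    score = dot(P[u], Q[i])
    for c, v in values.items():
        score += dot(X[c][u], Y[c][v])
    return score


def mlimf_objective(records, P, Q, X, Y, lam):
    """Eq. (6): half squared error over observed ratings plus Frobenius penalties.

    records: list of (u, i, r, values); only observed ratings appear (I_{u,i} = 1).
    """
    sq = sum((r - mlimf_predict(u, i, vals, P, Q, X, Y)) ** 2
             for u, i, r, vals in records)
    reg = sum(dot(p, p) for p in P.values()) + sum(dot(q, q) for q in Q.values())
    for c in X:
        reg += sum(dot(x, x) for x in X[c].values())
        reg += sum(dot(y, y) for y in Y[c].values())
    return 0.5 * sq + 0.5 * lam * reg
```

Keeping one value per factor in each record mirrors the observed-data format used in the applications, where every rating comes with exactly one release date, genre group or day.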
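Eq. (6) is more complicated than Eq. (2) because it includes the regularization terms for the feature vectors of the extra sources. However, under the SGD framework, each training epoch still runs in time linear in the number of training records. Analogous to Eq. (4), the gradients of the per-example loss $\varepsilon_{u,i}$ (the squared-error term of Eq. (6) for one rating plus the regularization of the vectors it involves) are:
$$\begin{aligned}
\frac{\partial \varepsilon_{u,i}}{\partial p_u} &= -e_{u,i}\, q_i + \lambda\, p_u, &
\frac{\partial \varepsilon_{u,i}}{\partial q_i} &= -e_{u,i}\, p_u + \lambda\, q_i,\\
\frac{\partial \varepsilon_{u,i}}{\partial x^{c}_{u}} &= -e_{u,i}\, y^{c}_{v} + \lambda\, x^{c}_{u}, &
\frac{\partial \varepsilon_{u,i}}{\partial y^{c}_{v}} &= -e_{u,i}\, x^{c}_{u} + \lambda\, y^{c}_{v},
\end{aligned} \tag{7}$$
where $e_{u,i} = r_{u,i} - \hat{r}_{u,i}$ with $\hat{r}_{u,i}$ given by Eq. (5), and $v$ is the value of factor $c$ recorded with this rating (i.e. $I^{c}_{u,i,v} = 1$).
Then, for each training example of the form $\langle u, i, r_{u,i}, v_1, \ldots, v_{|C|} \rangle$, the model parameters are updated by:
$$p_u \leftarrow p_u - \eta\,\frac{\partial \varepsilon_{u,i}}{\partial p_u}, \quad q_i \leftarrow q_i - \eta\,\frac{\partial \varepsilon_{u,i}}{\partial q_i}, \quad x^{c}_{u} \leftarrow x^{c}_{u} - \eta\,\frac{\partial \varepsilon_{u,i}}{\partial x^{c}_{u}}, \quad y^{c}_{v} \leftarrow y^{c}_{v} - \eta\,\frac{\partial \varepsilon_{u,i}}{\partial y^{c}_{v}} \quad (c \in C). \tag{8}$$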
By combining Eq. (7) and Eq. (8), the model parameters can be learned efficiently within several epochs. In real applications, however, it is not easy to incorporate additional sources into MLIMF, because few users are motivated to share their personal tastes on each event along the path to their final rating decision.
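One SGD step of Eqs. (7)-(8) can be sketched as follows. The sketch assumes feature vectors are stored as lists in dictionaries `P[u]`, `Q[i]`, `X[c][u]`, `Y[c][v]`, and that each record carries a mapping `values` from factor $c$ to its observed value $v$; the function name `mlimf_sgd_step` and the default hyperparameters are illustrative, not taken from the experiments. Each call touches only $p_u$, $q_i$ and one pair $(x^{c}_{u}, y^{c}_{v})$ per factor, which is why an epoch costs time linear in the number of training records.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mlimf_sgd_step(u, i, r, values, P, Q, X, Y, eta=0.01, lam=0.01):
    """One update for a record <u, i, r, v_1, ..., v_|C|> (Eqs. (7)-(8)).

    Returns the prediction error e_{u,i} computed before the update.
    """
    # Eq. (5) prediction and the error used in every gradient of Eq. (7).
    pred = dot(P[u], Q[i]) + sum(dot(X[c][u], Y[c][v]) for c, v in values.items())
    e = r - pred
    # Copy pre-update vectors so each pair is updated simultaneously.
    pu, qi = P[u][:], Q[i][:]
    P[u] = [p + eta * (e * q - lam * p) for p, q in zip(pu, qi)]
    Q[i] = [q + eta * (e * p - lam * q) for p, q in zip(pu, qi)]
    for c, v in values.items():
        xu, yv = X[c][u][:], Y[c][v][:]
        X[c][u] = [x + eta * (e * y - lam * x) for x, y in zip(xu, yv)]
        Y[c][v] = [y + eta * (e * x - lam * y) for x, y in zip(xu, yv)]
    return e
```

Looping this step over all training records once gives one epoch; the learning-rate schedule is left to the caller.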
Nevertheless, online service providers keep refining their software systems to capture more details for a better understanding of user behavior, which plays an essential role in offering personalized and novel services to users and in enhancing company reputation and competitive strength. In fact, the data logged in the database offers a promising way to model users' rating actions. The problem of modeling how users weight decision factors thus becomes one of weighting features, transformed and extracted from the logged data, that might influence users' rating decisions for a particular item. To clarify the principles of MLIMF, we present two possible applications on two real data sets.
3.2.1 Recommendation with Extracted Features
Before applying data mining methods, preprocessing the raw data sources can give insight into hidden and interesting patterns. This subsection presents the first application of MLIMF on two MovieLens data sets (MovieLens is a website whose ultimate goal is to gather research data on personalized recommendation systems, http://movielens.umn.edu/; the data sets are available at http://grouplens.org/datasets/movielens/):

MovieLens 100k (ML100k) was collected by the GroupLens Research Project at the University of Minnesota via the MovieLens web site. ML100k contains 100,000 ratings (on a 1-5 scale) from 943 users on 1,682 movies over the seven-month period from September 19th, 1997 to April 22nd, 1998. Each user has rated at least 20 movies. The density of the rating matrix in ML100k is 6.30%. In addition to movie ratings, ML100k also provides various information on individual films, such as their genres and release date, which can be used to improve the accuracy of a film recommendation system.

MovieLens 1M (ML1m) is another data set collected on the MovieLens web site. It contains 1,000,209 anonymous ratings (on a 1-5 scale) of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000. The density of the rating matrix in ML1m is 4.25%. Like ML100k, ML1m provides the same information on individual films.
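The densities quoted above follow from $|K| / (|U| \times |I|)$. A quick check in Python (the helper name `density` is ours, and 3,900 is the approximate movie count given above):

```python
def density(n_ratings, n_users, n_items):
    """Observed fraction |K| / (|U| * |I|) of the rating matrix."""
    return n_ratings / (n_users * n_items)


print(f"ML100k: {density(100_000, 943, 1682):.2%}")   # ML100k: 6.30%
print(f"ML1m:   {density(1_000_209, 6040, 3900):.2%}")  # ML1m:   4.25%
```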
The published MovieLens data sets collect only one type of explicit feedback (users' ratings on movies), which reduces a user's decision procedure to a single rating. In fact, the rating on a specific movie reflects a user's personal attitude toward the corresponding film information. Although users do not explicitly express their views on each piece of movie information, their accumulated rating behavior may reveal interesting patterns. The core idea of CF is to use accumulated data to estimate user preferences on items, under the assumption that a group of close neighbors with similar tastes can help each other rate objects. Based on the information available in the data, we can distill several feasible ingredients that might affect a user's rating decision, and then incorporate them into the proposed MLIMF to model users' rating behavior. Using Eqs. (7)-(8), we can automatically learn from the given data the strength of the interactions between users and those ingredients.
In this example, note that besides the rating $r_{u,i}$ given by user $u$ to item $i$ at time $t$, ML100k and ML1m also offer information on each film $i$, denoted as the set $M_i = \{d_i, G_i\}$, where $d_i$ is the film's release date and $G_i$ is its set of genres.
Both ML100k and ML1m provide 19 genre types and a release date for each film, and every movie can have multiple genres, namely $G_i \subseteq \{g_1, g_2, \ldots, g_{19}\}$; the genre group can strongly reflect users' tastes. The size of the genre group, $|G_i|$, describes individual bias toward multi-genre movies. Through feasible transformations of the observed data, we extract three additional features: release date ($d$), genre group ($g$) and the size of the corresponding genre group ($s$), so the set of decision factors is $C = \{d, g, s\}$. Consequently, each piece of observed data can be denoted as $\langle u, i, r_{u,i}, d_i, G_i, |G_i| \rangle$, and these records form the modified observed data.
We usually denote the user-item pairwise relationship by the rating matrix $R$. Analogously, the interactions between users and a specific factor $c$ can be denoted by a user-factor matrix $R^{c} \in \{0, 1\}^{|U| \times |V_c|}$, each entry of which is a binary indicator set to 1 if user $u$ is associated with value $v \in V_c$ of factor $c$, and 0 otherwise.
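A sketch of how the three decision factors and the sparse user-factor matrix $R^{c}$ might be derived from per-movie information. Assumptions: the movie table maps a movie id to a (release year, list of genres) pair, the release-date factor is discretized to the release year, and the genre group is represented as a sorted tuple so that it can serve as one categorical value. The function and factor names are hypothetical.

```python
from collections import defaultdict


def extract_factor_records(ratings, movies):
    """Attach three decision-factor values to each (u, i, r) rating.

    movies maps a movie id to (release_year, genres). Using the release year as
    the value of the release-date factor is an assumption made for this sketch.
    """
    records = []
    for u, i, r in ratings:
        year, genres = movies[i]
        group = tuple(sorted(genres))        # genre group as one categorical value
        values = {"release": year, "genre_group": group, "group_size": len(group)}
        records.append((u, i, r, values))
    return records


def user_factor_matrix(records, factor):
    """Sparse binary user-factor matrix R^c as {user: set of factor values}."""
    rc = defaultdict(set)
    for u, _, _, values in records:
        rc[u].add(values[factor])
    return dict(rc)
```

Storing $R^{c}$ as sets of values per user keeps only the nonzero entries, which suits the sparsity of the user-factor relationships.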
In real applications, users directly rate only movies. However, in different contextual environments, users might exhibit a specific rating pattern for each factor. Figures 2 and 3 show how the rating distribution evolves for factors extracted from the data. The distribution for the release-date factor, shown in Figure 2a, indicates that people more often rate recently released movies. Figure 3a provides evidence that people tend to give stricter ratings to movies with later release dates. Interestingly, Figure 3b shows that movies with 5 genres receive higher ratings on average. Figure 3c shows that ratings vary with movie genre.
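[Figure 2 (a), (b): rating distributions of extracted factors]
[Figure 3 (a), (b), (c): ratings versus release date, genre-group size and genre]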
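For the prediction task of MLIMF on the modified data above, the estimated rating of user $u$ for an unrated item $i$ can be formulated as:
$$\hat{r}_{u,i} = p_u q_i^{T} + x^{d}_{u}\left(y^{d}_{d_i}\right)^{T} + x^{g}_{u}\left(y^{g}_{G_i}\right)^{T} + x^{s}_{u}\left(y^{s}_{|G_i|}\right)^{T}. \tag{9}$$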
3.2.2 Temporal Recommendation
In his pioneering research koren2010collaborative (), Koren suggests that modeling time effects is essential for building an RS. Customer preferences for items constantly change over time, and product popularity also evolves as new selections emerge. In complex systems involving many customers and items, various characteristics drift over time, and many of them are too subtle to be captured from a few data instances. Refs. koren2010collaborative (); xiang2009time () model temporal changes at the level of each individual, introducing day-specific variables. However, day-specific parameters are tied to particular past time points, so they cannot predict changes in the future. In e-commerce systems, user feedback is constantly generated and varies across time points. Analyzing such data brings the unique challenge of finding the right balance between discounting temporary effects that barely affect future behavior and capturing the long-term trends that reflect users' regular patterns.
In this example, we investigate the day-shifting patterns of customers' preferences on movies. Figure 4 shows two interesting temporal effects in the data. A significant effect in ML1m is that the mean rating stays around 3.5 during the first 300 days but fluctuates with a large amplitude later on. Each observed example contains time information corresponding to a certain day of a year. Note that users usually do not log in to the system every day. To predict future changes from the limited, accumulated daily behavior of users, we should rely not only on users' historical behavior at past time points but also on collaborative information from close neighbors. Time information therefore needs to be transformed into a common format. Since we intend to explore users' day-shifting patterns, we define a time mapping function $\phi(\cdot)$ whose output is the day of the year (January 1 counting as day 1). For instance, if the input is January 2, 2010, the output of $\phi$ is 2, meaning that the input is associated with the 2nd day of 2010. The output of $\phi$ does not depend on the specific year; only the day counts. To model day-drifting changes, we apply $\phi$ to the time point $t$ of each observed example, so that each modified observed record becomes $\langle u, i, r_{u,i}, \phi(t) \rangle$.
[Figure 4 (a), (b): two temporal effects in the rating data]
An RS can usually log data generated every day, which ensures that the time factor covers all possible values of $\phi$. A future date can therefore find its corresponding feature parameters after being transformed with $\phi$. Using the modified data, user $u$'s preference for an unrated item $i$ at time $t$ can then be estimated as:
$$\hat{r}_{u,i}(t) = p_u q_i^{T} + x^{\tau}_{u}\left(y^{\tau}_{\phi(t)}\right)^{T}, \tag{10}$$
where $\tau$ denotes the time decision factor, whose values are the days of the year.
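A minimal sketch of the time mapping function (called `phi` here) and of the prediction in Eq. (10), using Python's standard `datetime` module, where `timetuple().tm_yday` returns the 1-based day of the year. The dictionary layout for the user and day feature vectors and the function names are assumptions for illustration.

```python
from datetime import date


def phi(d):
    """Time mapping: day of the year, with January 1 as day 1; the year is dropped."""
    return d.timetuple().tm_yday


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_temporal(u, i, d, P, Q, X_day, Y_day):
    """Eq. (10): user-item term plus the user/day-of-year interaction.

    X_day[u] and Y_day[phi(d)] are the user and day feature vectors (hypothetical layout).
    """
    return dot(P[u], Q[i]) + dot(X_day[u], Y_day[phi(d)])


print(phi(date(2010, 1, 2)))                            # 2
print(phi(date(1998, 3, 1)) == phi(date(2010, 3, 1)))   # True: same day index in different years
```

Because a future date maps onto an index already seen in training, its day vector is available at prediction time. Note that with day-of-year indexing, dates after February 28 shift by one in leap years; the text does not say how this is handled, so the sketch leaves it as is.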
4 Empirical Analysis
The experiments were conducted on the two MovieLens data sets described in Section 3.2. For the first application of MLIMF (extracted features), we applied 5-fold cross-validation on both data sets. For the second application (modeling time effects), to simulate a real recommendation scenario in which future changes must be predicted, we used the all-but-two setting, in which only the last two ratings of each user are placed in the validation set. For temporal recommendation, each reported value of the evaluation metric is the average over 5 runs with random initialization.
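The all-but-two protocol can be sketched as follows, assuming each rating is a `(user, item, rating, timestamp)` tuple; the function name is hypothetical. Users with fewer than three ratings end up with all their ratings in the validation set, an edge case the text does not address.

```python
from collections import defaultdict


def all_but_two_split(ratings):
    """Put each user's two most recent ratings in the validation set.

    ratings: list of (user, item, rating, timestamp) tuples.
    Returns (train, val) lists.
    """
    by_user = defaultdict(list)
    for rec in ratings:
        by_user[rec[0]].append(rec)
    train, val = [], []
    for recs in by_user.values():
        recs.sort(key=lambda rec: rec[3])   # oldest first
        train.extend(recs[:-2])
        val.extend(recs[-2:])
    return train, val
```

Splitting by time rather than at random keeps every validation rating later than that user's training ratings, which is what makes the setting a test of future prediction.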
The performance of the recommendation algorithms is measured by the root mean squared error (RMSE), a widely used metric for evaluating the rating prediction accuracy of recommenders:
$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i) \in T}\left(r_{u,i} - \hat{r}_{u,i}\right)^2}, \tag{11}$$
where $T$ denotes the validation set. RMSE measures the error between the true ratings and the predictions; a lower RMSE means higher prediction accuracy.
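Eq. (11) translates directly into code; a small sketch over (true, predicted) rating pairs from the validation set (the function name is ours):

```python
import math


def rmse(pairs):
    """Eq. (11) over (true_rating, predicted_rating) pairs from the validation set T."""
    if not pairs:
        raise ValueError("validation set is empty")
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))


print(round(rmse([(4, 3.5), (2, 2.5), (5, 5.0)]), 4))   # 0.4082
```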
4.1 Baseline Methods
In this section, to demonstrate the effectiveness of our proposed recommendation method, we compare its results with those of three baseline algorithms.
User-based CF (UCF) lu2012recommender (). UCF is a typical implementation of CF in which the prediction for an active user depends on a group of neighbors with similar interests. UCF generates recommendations in two steps: (1) calculate the similarity $s_{u,w}$, which denotes the correlation or distance between users $u$ and $w$; (2) generate predictions for the active user by taking a weighted average of the ratings of his/her $k$ nearest neighbors (kNN). In this paper, the similarity between users $u$ and $w$ is calculated with a cosine-based metric. Let $\mathcal{I}_{u,w}$ denote the set of items rated by both users $u$ and $w$. The similarity is then formulated as:
$$s_{u,w} = \frac{\sum_{i \in \mathcal{I}_{u,w}} r_{u,i}\, r_{w,i}}{\sqrt{\sum_{i \in \mathcal{I}_{u,w}} r_{u,i}^2}\,\sqrt{\sum_{i \in \mathcal{I}_{u,w}} r_{w,i}^2}}. \tag{12}$$
To predict an active user $u$'s rating on an unrated item $i$, we take a weighted average of the neighbors' ratings on that item according to the following formula resnick1994grouplens ():
$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{w \in N_k(u,i)} s_{u,w}\left(r_{w,i} - \bar{r}_w\right)}{\sum_{w \in N_k(u,i)} \left|s_{u,w}\right|}, \tag{13}$$
where $\bar{r}_u$ and $\bar{r}_w$ denote the mean ratings of users $u$ and $w$, respectively, and $N_k(u,i)$ denotes the set of user $u$'s $k$ nearest neighbors who have rated item $i$.
Item-based CF (ICF) sarwar2001item (). Rather than computing the similarity between user pairs, ICF matches the user's rated items with similar items and then combines the most similar ones into a recommendation list. We employ cosine-based correlation to measure the similarity between item pairs. Let $\mathcal{U}_{i,j}$ denote the set of users who rated both items $i$ and $j$. The similarity is then calculated as:
$$s_{i,j} = \frac{\sum_{u \in \mathcal{U}_{i,j}} r_{u,i}\, r_{u,j}}{\sqrt{\sum_{u \in \mathcal{U}_{i,j}} r_{u,i}^2}\,\sqrt{\sum_{u \in \mathcal{U}_{i,j}} r_{u,j}^2}}. \tag{14}$$
The prediction step is essential for producing the recommendation list. In ICF, recommendations for an active user are based on his/her historically rated items. In this work, the estimate $\hat{r}_{u,i}$ for active user $u$ is computed as:
$$\hat{r}_{u,i} = \bar{r}_i + \frac{\sum_{j \in I_u} s_{i,j}\left(r_{u,j} - \bar{r}_j\right)}{\sum_{j \in I_u} \left|s_{i,j}\right|}, \tag{15}$$
where $I_u$ denotes the set of items rated by user $u$, and $\bar{r}_i$ and $\bar{r}_j$ respectively denote the mean ratings of items $i$ and $j$.
Regularized MF (RMF) webb2006rmf (); koren2009matrix (): this method was described in Section 2. It uses only the user-item rating matrix to generate recommendations.
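The two neighborhood baselines, Eqs. (12)-(15), can be sketched with ratings stored as nested dictionaries `R[user][item]`. This is an illustrative implementation under our own assumptions: similarities are cosine over co-rated entries, UCF keeps the `k` most similar users who rated the target item, ICF uses all items the user has rated, and a prediction falls back to the user's (UCF) or item's (ICF) mean when the similarity weights sum to zero. ICF assumes the target item has at least one rating. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity over the keys two rating dicts share (Eqs. (12) and (14))."""
    common = a.keys() & b.keys()
    num = sum(a[k] * b[k] for k in common)
    den = math.sqrt(sum(a[k] ** 2 for k in common)) * math.sqrt(sum(b[k] ** 2 for k in common))
    return num / den if den else 0.0


def mean(ratings):
    return sum(ratings.values()) / len(ratings)


def ucf_predict(R, u, i, k):
    """Eq. (13). R maps user -> {item: rating}; uses the k most similar raters of i."""
    cands = [(cosine(R[u], R[w]), w) for w in R if w != u and i in R[w]]
    neigh = sorted(cands, reverse=True)[:k]
    den = sum(abs(s) for s, _ in neigh)
    if den == 0:
        return mean(R[u])                      # fallback: no usable neighbors
    return mean(R[u]) + sum(s * (R[w][i] - mean(R[w])) for s, w in neigh) / den


def icf_predict(R, u, i):
    """Eq. (15). Item rating vectors come from transposing R to item -> {user: rating}."""
    cols = {}
    for w, row in R.items():
        for j, r in row.items():
            cols.setdefault(j, {})[w] = r
    terms = [(cosine(cols[i], cols[j]), j) for j in R[u] if j != i]
    den = sum(abs(s) for s, _ in terms)
    if den == 0:
        return mean(cols[i])                   # fallback: item mean
    return mean(cols[i]) + sum(s * (R[u][j] - mean(cols[j])) for s, j in terms) / den
```

Recomputing similarities on every call keeps the sketch short; a practical version would precompute and cache the similarity matrices.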
4.2 Performance validation for MLIMF
In this part, we validate the performance of our proposed MLIMF against the three baseline algorithms. To help the MF models converge to a good solution, the initial values of the feature vectors for both RMF and MLIMF are randomly drawn from a normal distribution $\mathcal{N}(0, 0.02)$, following the suggestion of takacs2009scalable (). Note that incorporating extra factors makes it harder to set the dimension parameters for a fair comparison between MLIMF and RMF. MF-based approaches generally produce more precise rating estimates as their feature dimension grows. Unlike RMF, MLIMF has additional feature dimension parameters, i.e. $f_c$ for each $c \in C$, which makes a direct comparison difficult for the reason just mentioned: we cannot set each $f_c$ to the same value as $f$, since MLIMF would then have a much larger total feature dimension. As the basic MF method, RMF models only the interactions between user-item pairs, which simplifies predefining the dimension of the user and item feature vectors: according to the principles of RMF, users and items are mapped into the same latent feature space of dimension $f$, and $f$ is the key parameter governing how well user preferences on items are modeled. We therefore need to redefine the dimension parameter for MLIMF, because MLIMF models the interactions between users and extra factors in addition to user-item pairs. Since our objective is to validate the performance of the proposed MLIMF, we design the experimental process as follows:

Firstly, we modify the settings of the dimension parameters for RMF and MLIMF so that the given dimension of RMF equals the sum of the dimension parameters of MLIMF:

$$K = K_0 + \sum_{l=1}^{L} K_l \tag{16}$$

where $K$ is the feature dimension of RMF, $L$ is the number of decision factors, and $K_0$ and $K_l$ are respectively the predefined dimensions of the user-item feature space and of the user-factor feature space of the $l$-th factor. When MLIMF is applied to the extracted features, the dimension of each decision factor is 20% of $K$, i.e., $K_0 = 0.4K$ and $K_l = 0.2K$. When MLIMF is applied to modeling time effects, the dimension of the only decision factor is 60% of $K$. We set the total feature dimension of MLIMF equal to that of RMF purely for a fair comparison: fixing the dimension of RMF and using Eq. (16) to initialize the feature dimensions of MLIMF lets us compare the two MF-based approaches on an equal footing. The fractions 0.4 and 0.2 are chosen empirically to express the importance of the corresponding attribute for users' final rating decisions. In practice, one can allocate a dimension to each decision factor independently, according to one's own prior knowledge.

Secondly, after a detailed analysis of the experimental datasets, we present two application cases, extracted features and temporal dynamics, to clarify the idea of MLIMF. For both cases, we compare MLIMF with the three baseline approaches UCF, ICF and RMF. The experimental results are reported in Tab. 1.

Thirdly, since the dimension of the feature space can greatly affect the accuracy of MF-based approaches takacs2009scalable (), we explore how different values of the dimension parameter affect the accuracy of RMF and MLIMF. The experimental results are shown in Tab. 2 and Tab. 3.
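The dimension split of Eq. (16) can be checked with a small helper. This is a sketch under stated assumptions: `allocate_dims` is a hypothetical name, the rule of giving the rounding remainder to the user-item space is our own choice, and the extracted-features call assumes three decision factors, the number for which $0.4K$ plus $0.2K$ per factor adds up to $K$.

```python
def allocate_dims(total_dim, factor_fractions):
    """Split RMF's feature dimension across MLIMF's feature spaces, Eq. (16) style.

    factor_fractions: one fraction of total_dim per decision factor (hypothetical input).
    The user-item space receives the remainder, so all dimensions sum to total_dim.
    """
    if sum(factor_fractions) >= 1:
        raise ValueError("factor fractions must leave room for the user-item space")
    factor_dims = [round(f * total_dim) for f in factor_fractions]
    user_item_dim = total_dim - sum(factor_dims)
    return user_item_dim, factor_dims
```

With $K = 50$, `allocate_dims(50, [0.2, 0.2, 0.2])` returns `(20, [10, 10, 10])` for the extracted-features case and `allocate_dims(50, [0.6])` returns `(20, [30])` for the time case; in both, MLIMF and RMF share the same total dimension.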
In these experiments, all the aforementioned methods have predefined parameters that strongly influence their accuracy. For UCF, the number of nearest neighbors is crucial for high accuracy; after several exploratory experiments, we use the top 25% of neighbors of each user to predict the score of an unrated item. The initial values of all feature vectors are drawn at random from a normal distribution $\mathcal{N}(0, 0.02)$. A regularization parameter that is too small or too large usually leads to poor generalization on the test set bishop2006pattern (). To our knowledge, most MF-based works takacs2008investigation (); koren2008factorization (); luo2012incremental () set it within a comparatively small interval, [0.001, 0.1]. To better show how the choice of the regularization parameter influences accuracy, we conduct further experiments on the two MovieLens datasets; the results are illustrated in Figure 5. For simplicity, these experiments use the dataset from the first application case (extracted features). In this work, the regularization parameter of RMF and MLIMF is set to 0.01 for both ML1m and ML100k. The learning rates of RMF and MLIMF are all initially set to 0.01 to ensure a comparatively fast convergence on ML100k and ML1m. According to takacs2009scalable (), besides the learning rate, the optimized result of the objective function also depends on the density of the user-item rating matrix. Notably, in MLIMF the user-factor matrix is always denser than the user-item rating matrix, because in real-world applications the number of distinct factor values is much smaller than the number of items. Based on these considerations, we decrease the learning rate after several epochs to reduce the update amplitude.
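The RMF training setup described above (features initialised from $\mathcal{N}(0, 0.02)$, regularization parameter 0.01, initial learning rate 0.01 that is decreased after several epochs) can be sketched as plain stochastic gradient descent. Assumptions: 0.02 is treated as the standard deviation, the prediction is the bare inner product $p_u^\top q_i$ without bias terms, and the decay schedule (`decay_start`, `decay`) is hypothetical, since the text does not specify it.

```python
import random


def train_rmf(ratings, n_users, n_items, dim=20, lr=0.01, reg=0.01,
              epochs=30, decay_start=10, decay=0.95, seed=0):
    """SGD for regularized MF with r_hat(u, i) = p_u . q_i.

    ratings: list of (user_index, item_index, rating) triples.
    From epoch decay_start on, the learning rate is multiplied by decay after
    every epoch to shrink the update amplitude (hypothetical schedule).
    """
    rng = random.Random(seed)
    # Small random initial features: mean 0, standard deviation 0.02 (assumed).
    P = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_items)]
    data = list(ratings)
    for epoch in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for k in range(dim):
                p_old = pu[k]  # q_i's update must use the pre-update p_u
                pu[k] += lr * (err * qi[k] - reg * pu[k])
                qi[k] += lr * (err * p_old - reg * qi[k])
        if epoch + 1 >= decay_start:
            lr *= decay
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of p_u . q_i over (user, item, rating) triples."""
    errs = [(r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings]
    return (sum(errs) / len(errs)) ** 0.5
```

The multiplicative decay is only one way to "slow down the updating amplitude"; a step schedule or the adaptive rates of luo2013 style work would fit the same description.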
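Tab. 1: RMSE of UCF, ICF, RMF and MLIMF; the RMF and MLIMF columns correspond to feature dimensions 20 and 50.
Cases               Data     UCF     ICF     RMF             MLIMF
                                             20      50      20      50
Extracted Features  ML100k   0.953   0.940   0.918   0.913   0.906   0.904
                    ML1m     0.933   0.909   0.863   0.860   0.855   0.853
Temporal Dynamics   ML100k   1.057   1.034   1.015   1.013   1.006   1.004
                    ML1m     0.978   0.958   0.907   0.905   0.903   0.902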
Tab. 1 summarizes the RMSE of all the methods. Both MF methods outperform UCF and ICF when the feature dimension is set to either 20 or 50. Tab. 1 also shows that, in both applications of MLIMF, incorporating extra information lets MLIMF achieve a lower RMSE than RMF with the same feature dimension.
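To put the differences in Tab. 1 on a common scale, the relative RMSE reduction of MLIMF over each baseline can be computed directly from the table. The sketch below uses the dimension-50 values (the second RMF and MLIMF entries in each row, which agree with the dimension-50 rows of Tab. 2 and Tab. 3); the dictionary and function names are illustrative only.

```python
# RMSE values taken from Tab. 1, using feature dimension 50 for RMF and MLIMF.
TAB1_DIM50 = {
    ("Extracted Features", "ML100k"): {"UCF": 0.953, "ICF": 0.940, "RMF": 0.913, "MLIMF": 0.904},
    ("Extracted Features", "ML1m"): {"UCF": 0.933, "ICF": 0.909, "RMF": 0.860, "MLIMF": 0.853},
    ("Temporal Dynamics", "ML100k"): {"UCF": 1.057, "ICF": 1.034, "RMF": 1.013, "MLIMF": 1.004},
    ("Temporal Dynamics", "ML1m"): {"UCF": 0.978, "ICF": 0.958, "RMF": 0.905, "MLIMF": 0.902},
}


def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0


for (case, data), row in TAB1_DIM50.items():
    for baseline in ("UCF", "ICF", "RMF"):
        gain = relative_reduction(row[baseline], row["MLIMF"])
        print(f"{case:18} {data:6} vs {baseline}: {gain:.2f}%")
```

At this dimension the reduction relative to RMF stays below 1% in every case (for example about 0.99% on ML100k with extracted features), while the reduction relative to UCF ranges from about 5% to 8.6%.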
Tab. 2: RMSE of RMF and MLIMF with different feature dimensions (extracted features case).
Dimension   ML100k             ML1m
            RMF      MLIMF     RMF      MLIMF
50          0.9133   0.9035    0.8593   0.8530
100         0.9108   0.9012    0.8564   0.8513
200         0.9091   0.8992    0.8545   0.8500
300         0.9081   0.8984    0.8534   0.8497
400         0.9076   0.8979    0.8527   0.8492
500         0.9070   0.8972    0.8522   0.8491
4.2.1 Impact of the Feature Dimension
Our model is based on RMF, and the predefined feature dimension plays a key role in the accuracy reached after optimization. This effect is analyzed in detail in Tab. 2 and Tab. 3, which show a clear benefit from incorporating the extracted features into MLIMF: for every feature dimension on both ML100k and ML1m, MLIMF yields a lower RMSE than RMF. Moreover, on both datasets MLIMF with a small feature dimension already matches the best RMSE that RMF reaches with a much larger one. In particular, the results in Tab. 3 also indicate the effectiveness of MLIMF in modeling time effects.
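The claim that a small MLIMF dimension matches the best RMF result can be checked against Tab. 2 with a few lines of code. The helper below is a hypothetical illustration: it returns the smallest MLIMF dimension whose RMSE is at most the lowest RMSE RMF attains at any listed dimension.

```python
# RMSE columns of Tab. 2 (extracted features), keyed by feature dimension.
RMF_100K = {50: 0.9133, 100: 0.9108, 200: 0.9091, 300: 0.9081, 400: 0.9076, 500: 0.9070}
MLIMF_100K = {50: 0.9035, 100: 0.9012, 200: 0.8992, 300: 0.8984, 400: 0.8979, 500: 0.8972}
RMF_1M = {50: 0.8593, 100: 0.8564, 200: 0.8545, 300: 0.8534, 400: 0.8527, 500: 0.8522}
MLIMF_1M = {50: 0.8530, 100: 0.8513, 200: 0.8500, 300: 0.8497, 400: 0.8492, 500: 0.8491}


def smallest_dim_matching_best(reference, candidate):
    """Smallest candidate dimension whose RMSE is at most the best reference RMSE, else None."""
    best = min(reference.values())
    for dim in sorted(candidate):
        if candidate[dim] <= best:
            return dim
    return None


print(smallest_dim_matching_best(RMF_100K, MLIMF_100K))  # → 50
print(smallest_dim_matching_best(RMF_1M, MLIMF_1M))      # → 100
```

On ML100k, MLIMF at dimension 50 (0.9035) already beats RMF at dimension 500 (0.9070); on ML1m, dimension 100 (0.8513) is needed to beat RMF's best (0.8522).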
Tab. 3: RMSE of RMF and MLIMF with different feature dimensions (temporal dynamics case).
Dimension   ML100k             ML1m
            RMF      MLIMF     RMF      MLIMF
50          1.0131   1.0042    0.9047   0.9021
100         1.0113   1.0022    0.9038   0.9008
200         1.0105   1.0012    0.9024   0.8999
300         1.0098   1.0007    0.9020   0.8993
400         1.0096   0.9996    0.9017   0.8990
500         1.0093   0.9992    0.9016   0.8986
4.3 Complexity Analysis
The above RMSE comparisons show that MLIMF is capable of extending RMF by incorporating additional information. As for the computational complexity of training, in each training epoch every training sample (a user, an item, the rating and the associated factor values) involves the following operations:

updating all the latent user features under the update rules of Eq. (8), whose computational complexity grows with the size of the training sample;

updating the latent features of the items in user $u$'s rating set;

updating the latent features of the factors associated with user $u$'s rating records.
Since the above updates can be performed within the same iteration, the worst-case computational complexity of MLIMF per epoch is obtained by combining these three costs. Given the number of iterations needed for convergence, the overall computational complexity of training MLIMF can be formulated as:
(17) 
This shows that, although MLIMF takes extra factors into account, its computational complexity grows only linearly compared with RMF. The space complexity increases with the number of factors incorporated into the MLIMF model.
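To make the per-sample operations concrete, here is a minimal sketch of one SGD step. It assumes, rather than reproduces, the MLIMF model: the predicted rating is taken to be the user-item inner product plus one user-factor inner product per decision factor, with a separate user vector for each feature space, as suggested by the user-item and user-factor dimensions in Eq. (16). The function names, the sample layout `(u, i, r, factor_values)` and the absence of bias terms are all assumptions.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def init_matrix(n_rows, dim, rng):
    """Rows of small random features (normal, std 0.02 assumed, as for RMF)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_rows)]


def predict(P, Q, X, Y, u, i, factor_values):
    """Assumed MLIMF score: user-item term plus one user-factor term per factor f."""
    score = dot(P[u], Q[i])
    for f, v in enumerate(factor_values):
        score += dot(X[f][u], Y[f][v])
    return score


def sgd_step(P, Q, X, Y, sample, lr=0.01, reg=0.01):
    """One SGD update for a hypothetical sample (u, i, r, factor_values).

    Each feature pair touched has length K_0 or K_l, so one step costs
    O(K_0 + sum_l K_l), i.e. linear in the total dimension of Eq. (16).
    """
    u, i, r, factor_values = sample
    err = r - predict(P, Q, X, Y, u, i, factor_values)
    pairs = [(P[u], Q[i])] + [(X[f][u], Y[f][v]) for f, v in enumerate(factor_values)]
    for a, b in pairs:
        for k in range(len(a)):
            a_old = a[k]  # b's update must use the pre-update value of a
            a[k] += lr * (err * b[k] - reg * a[k])
            b[k] += lr * (err * a_old - reg * b[k])
    return err
```

Because each step touches vectors whose lengths add up to the total feature dimension, one epoch costs time proportional to the number of training samples times that dimension, which is consistent with the linear growth relative to RMF stated above.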
5 Conclusions and Discussion
Many pioneering studies have shown that Matrix Factorization (MF) based approaches are effective and flexible in dealing with various aspects of user-item rating data. In general, the final rating decisions of online users are affected by various underlying factors, such as emotions, time and genres. In this paper, building on the classical MF method, we propose a multilinear interactive MF (MLIMF) approach to gain insight into user preferences. Firstly, we assume that users implicitly or explicitly weigh the impact of each factor when they rate items. Secondly, we extract possible factors correlated with users' decisions through empirical analyses. Thirdly, to model the multiple pairwise relationships, we linearly integrate all the pairwise interactions to predict ratings. Finally, experimental results show that the proposed MLIMF method outperforms the three baseline algorithms (UCF, ICF and RMF) in terms of RMSE.
Overall, MLIMF is a simple yet general approach, since it mainly focuses on modeling the interactions between users and information beyond ratings. The same idea can easily be applied to other MF-based models by adding a block of terms that represents the user-factor interactions. In this paper, we simply extend the basic RMF to explore the impact of categorical attributes on users' rating patterns. However, many data mining tasks need to deal with continuous-valued attributes. Therefore, to address more general data mining challenges, an effective framework that extends the proposed MLIMF needs to be designed. In future work, we plan to study applications of MLIMF to challenging recommendation tasks, such as estimating the click-through rate (CTR) in computational advertising and building effective binary classifiers to predict the potential tastes of online users.
6 Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 11305043 and 11301490), the EU FP7 Grant 611272 (project GROWTHCOM), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY14A050001), the Zhejiang Provincial Qianjiang Talents Project (Grant No. QJC1302001), and the start-up foundations of Hangzhou Normal University.
References
 (1) P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl, GroupLens: an open architecture for collaborative filtering of netnews, in: Proceedings of the 1994 ACM conference on Computer supported cooperative work, ACM, 1994, pp. 175–186.
 (2) L. Lü, M. Medo, C. H. Yeung, Y.-C. Zhang, Z.-K. Zhang, T. Zhou, Recommender systems, Physics Reports 519 (1) (2012) 1–49.
 (3) J. Bobadilla, F. Ortega, A. Hernando, A. Gutiérrez, Recommender systems survey, Knowledge-Based Systems 46 (2013) 109–132.
 (4) M. Balabanović, Y. Shoham, Fab: content-based, collaborative recommendation, Communications of the ACM 40 (3) (1997) 66–72.
 (5) J. L. Herlocker, J. A. Konstan, A. Borchers, J. Riedl, An algorithmic framework for performing collaborative filtering, in: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 1999, pp. 230–237.
 (6) W. Hill, L. Stead, M. Rosenstein, G. Furnas, Recommending and evaluating choices in a virtual community of use, in: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM Press/Addison-Wesley Publishing Co., 1995, pp. 194–201.
 (7) B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the 10th international conference on World Wide Web, ACM, 2001, pp. 285–295.
 (8) G. Linden, B. Smith, J. York, Amazon.com recommendations: Item-to-item collaborative filtering, Internet Computing, IEEE 7 (1) (2003) 76–80.
 (9) T. Hofmann, Latent semantic models for collaborative filtering, ACM Transactions on Information Systems (TOIS) 22 (1) (2004) 89–115.
 (10) Y. Koren, R. Bell, C. Volinsky, Matrix factorization techniques for recommender systems, Computer 42 (8) (2009) 30–37.
 (11) B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Application of dimensionality reduction in recommender system – a case study, Tech. rep., DTIC Document (2000).
 (12) J. Bennett, S. Lanning, The netflix prize, in: Proceedings of KDD cup and workshop, 2007, p. 35.
 (13) B. Webb, Netflix prize: Try this at home (December 11, 2006). URL http://sifter.org/~simon/journal/20061211.html
 (14) Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2008, pp. 426–434.
 (15) Y. Koren, Collaborative filtering with temporal dynamics, Communications of the ACM 53 (4) (2010) 89–97.
 (16) G. Takács, I. Pilászy, B. Németh, D. Tikk, Investigation of various matrix factorization methods for large recommender systems, in: Data Mining Workshops, 2008. ICDMW’08. IEEE International Conference on, IEEE, 2008, pp. 553–562.
 (17) G. Takács, I. Pilászy, B. Németh, D. Tikk, Matrix factorization and neighbor based algorithms for the netflix prize problem, in: Proceedings of the 2008 ACM conference on Recommender systems, ACM, 2008, pp. 267–274.
 (18) A. Paterek, Improving regularized singular value decomposition for collaborative filtering, in: Proceedings of KDD cup and workshop, 2007.
 (19) G. Takács, I. Pilászy, B. Németh, D. Tikk, Scalable collaborative filtering approaches for large recommender systems, The Journal of Machine Learning Research 10 (2009) 623–656.
 (20) X. Luo, Y. Xia, Q. Zhu, Incremental collaborative filtering recommender based on regularized matrix factorization, Knowledge-Based Systems 27 (2012) 271–280.
 (21) X. Luo, Y. Xia, Q. Zhu, Applying the learning rate adaptation to the matrix factorization based collaborative filtering, Knowledge-Based Systems 37 (2013) 154–164.
 (22) C.-X. Zhang, Z.-K. Zhang, L. Yu, C. Liu, H. Liu, X.-Y. Yan, Information filtering via collaborative user clustering modeling, Physica A 396 (2014) 195–203.
 (23) A. Karatzoglou, X. Amatriain, L. Baltrunas, N. Oliver, Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering, in: Proceedings of the fourth ACM conference on Recommender systems, ACM, 2010, pp. 79–86.
 (24) L. Baltrunas, B. Ludwig, F. Ricci, Matrix factorization techniques for context aware recommendation, in: Proceedings of the fifth ACM conference on Recommender systems, ACM, 2011, pp. 301–304.
 (25) H. Ma, D. Zhou, C. Liu, M. R. Lyu, I. King, Recommender systems with social regularization, in: Proceedings of the fourth ACM international conference on Web search and data mining, ACM, 2011, pp. 287–296.
 (26) M. Jamali, M. Ester, A matrix factorization technique with trust propagation for recommendation in social networks, in: Proceedings of the fourth ACM conference on Recommender systems, ACM, 2010, pp. 135–142.
 (27) N. Lathia, S. Hailes, L. Capra, kNN CF: a temporal social network, in: Proceedings of the 2008 ACM conference on Recommender systems, ACM, 2008, pp. 227–234.
 (28) L. Xiang, Q. Yang, Time-dependent models in collaborative filtering based recommender system, in: IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, Vol. 1, 2009, pp. 450–457.
 (29) C. M. Bishop, et al., Pattern recognition and machine learning, Vol. 4, Springer New York, 2006.