A Broad Learning Approach for Context-Aware Mobile Application Recommendation
Abstract
With the rapid growth of mobile apps, the sheer number of apps available in application stores makes it challenging for users to locate appropriate ones. Providing accurate mobile app recommendations has therefore become an imperative task. Conventional approaches mainly focus on learning users’ preferences and app features to predict user-app ratings. However, most of them do not consider the interactions among the context information of apps. To address this issue, we propose a broad learning approach for Context-Aware app recommendation with Tensor Analysis (CATA). Specifically, we utilize a tensor-based framework to effectively integrate users’ preferences, app category information, and multi-view features to improve app rating prediction. The multidimensional structure is employed to capture the hidden relationships between multiple app categories and multi-view features. We develop an efficient factorization method that applies Tucker decomposition to learn the full-order interactions within multiple categories and features. Furthermore, we employ a group norm regularization to learn the group-wise feature importance of each view with respect to each app category. Experiments on two real-world mobile app datasets demonstrate the effectiveness of the proposed method.
1 Introduction
The rapid adoption of mobile devices accelerates the proliferation of mobile apps. The number of available apps in the Google Play
There are some recent studies on mobile app recommendation, most of which leverage features of apps or users [1] [2] [3]. Karatzoglou et al. [1] proposed a collaborative filtering method for app recommendation that incorporates contextual features such as location and time of day. Liu et al. [2] proposed to incorporate both app functionality and user privacy preference as features and to capture the trade-off between them for app recommendation. Most of the previous works used only one kind of feature or a simple combination of multiple features and did not consider the complex interactions between those features. Many existing works exploit multiple views of features in tasks such as recommendation and clustering [4, 5, 6, 7]. In the scenario of app recommendation, the interactions between different views of features are quite important, as different views can provide complementary information. For example, assume we have obtained the latent representations for each app from three aspects, i.e., categories, permissions, and description text, as shown in Fig. 1. BackCountry Navigator is an app categorized as Maps & Navigation, and it is mainly used for outdoor navigation, which can be inferred from the description text. The permission to get users’ precise location is acceptable (i.e., a positive value), while the permission to read SMS is abnormal (i.e., a negative value). Only the third-order interaction provides a negative result, which reflects that the permission is unreasonable for the app’s function. The category of Instagram merely shows its function for social interaction (i.e., a positive value) and neglects the function of sharing photos (i.e., a negative value). Through the interactions between multiple views, complementary information is provided that gives a more complete understanding of the app.
Considering the features from multiple views together is therefore more insightful for understanding app information and user preference.
To figure out the latent correlations among the context information of apps, we conduct an empirical analysis on the dataset collected from Google Play and discover some important characteristics of mobile apps. First, users’ download behaviors differ across app categories. Within some categories, such as Maps & Navigation and Weather, users might download only one or two apps for long-term use. For other categories, such as Entertainment, users are more likely to download many apps in the same category. Generally, these different download behaviors arise because users consider different reasons (e.g., functions, interface, permissions) when deciding whether to download apps from different categories. It can be inferred that users focus on multiple views of features with different importance for apps in different categories. Second, the analysis of the feature-level correlation between categories shows that the similarities between different categories are lower than those within the same category, which implies that the significance of features within a specific view differs across categories. Third, the category diversities of apps rated by different users are distinct. Some users prefer to download apps from various categories even though the number of downloaded apps is small. Based on this analysis, we fuse the user preference, the category information, the features of multiple views, and the complex interactions among them to build a context-aware, category-specific app rating prediction model.
In this paper, we propose a broad learning approach for Context-Aware app recommendation with Tensor Analysis (CATA). Specifically, we integrate the interactions among the multiple categories and multiple views of features into a tensor structure through the tensor product of the corresponding feature spaces. The interactions of different orders can fully reflect the complementary relationships, and we use them to predict the user ratings on apps. To effectively learn the full-order interactions
The main contributions of this paper are as follows:

We propose a context-aware recommendation approach for mobile apps called CATA that models the interactions of different orders among the multiple categories and multiple views of features as a tensor structure.

To effectively learn the hidden relationships among the different views of the context information of apps, Tucker decomposition is adopted to factorize the interaction parameters such that the principal components of the latent representations can be retained.

Empirical studies based on two real-world datasets demonstrate the effectiveness of the proposed context-aware recommendation approach.
2 Data Analysis
In this section, we first describe the datasets used for the analysis and experiments. We then provide the statistical characteristics of the employed datasets.
2.1 Data Description

Google Play: We crawled apps’ metadata (e.g., name, category, permissions, description) and user review ratings from their description pages in Google Play. We filter out users and apps with fewer than 5 ratings. Each rating record in this dataset is represented in three views, i.e., users, permissions, and text. The user view consists of binary feature vectors for user ids, which means there is only one non-zero feature in the user view for each rating record. The TF-IDF vector representations of the app permissions and description texts are used as the permission and text views, respectively.

Apple’s App Store: The dataset is offered by [8][9] and consists of the apps in the “Top Free 300” and “Top Paid 300” leaderboards from Feb. 2010 to Sep. 2012, together with the related user ratings and review information. As the dataset lacks classification information, we use Free and Paid as two categories, and we remove users and apps with fewer than 10 ratings. Each rating record in this dataset has two views, i.e., users and text. The user view is constructed in the same way as in the Google Play dataset. The TF-IDF vector representations of the review texts of apps are used as the text view.
Table 1 shows the basic statistics of the two employed datasets.
Dataset  #App  #User  #Feature  #Category  #Rating

Google Play  5460  7165  Text (2574), Permissions (84)  45  67504
Apple App Store  2643  4010  Text (1592)  2  74764
2.2 Characteristics of Google Play dataset
As the Google Play dataset has richer category and feature information, we focus on the analysis for it.
We calculate, for each category, the proportion of users who downloaded more than 2 apps. Due to space limitations, Fig. 2(a) reports the results for 20 randomly selected app categories. The lower the proportion for a category, the more users download only one or two apps in that category. Generally, apps in categories with a low proportion can be used for a long time. Take the category Maps & Navigation, in which only about 1.07% of users downloaded more than 2 apps, as an example: users usually download only one or two apps (e.g., Google Map, Baidu Map) in this category, as these apps are sufficient. Other categories with low proportions in Fig. 2(a) include Weather and Travel & Local. The categories with high proportions, such as Tools, Puzzle, and Arcade, are more general and contain a greater variety of apps. For apps in these categories, users might consider different reasons when deciding whether to download them. For example, users mainly consider whether the functions of the apps in the category Tools will meet their demands. For apps in the category Maps & Navigation, however, users already know the functions well and therefore pay more attention to features such as interface or permissions. We can conclude that users focus on multiple views of features with different significance for apps in different categories.
Having investigated the relationship between multiple views and categories, we now explore the features within a specific view. To investigate the feature-level correlation between categories with respect to a certain view, we calculate the feature-based similarities between each pair of apps from any two categories. Figure 2(b) shows the similarities between any two categories based on the app permission features. The category indexes follow the order in Fig. 2(a) (i.e., #1 is Shopping). It can be observed that the similarities between two different categories are generally lower than those within the same category. That is, for apps of different categories, the significance of features within a specific view differs.
To investigate the relationship between users and categories, we apply the diversity metric widely used for the evaluation of recommender systems [10] to evaluate the category diversity. The category diversity of user $u$ is calculated by $Div(u) = \frac{1}{|\mathcal{A}_u|(|\mathcal{A}_u|-1)}\sum_{i\in\mathcal{A}_u}\sum_{j\in\mathcal{A}_u, j\neq i}\big(1 - s(i,j)\big)$, where $\mathcal{A}_u$ is the set of apps rated by user $u$. $s(i,j)=1$ if apps $i$ and $j$ belong to the same category; otherwise, $s(i,j)=0$. Figure 2(c) shows the category diversity and the number of apps for each user. The green curve presents the number of apps rated by each user, and the blue curve is the category diversity of the apps rated by each user. The user indexes on the $x$-axis are sorted by category diversity in ascending order. The left axis shows the value of category diversity and the right axis represents the number of apps. Some users interact with many types of apps even though they rate very few apps, while other users rate many apps from only a few categories. Different users thus interact with categories at different levels of diversity. As discussed above, the importance of features from multiple views and of features within a specific view is distinct for different categories. Therefore, for each user, it is critical to model the preference for an app by considering the category information and the corresponding relationships with features of multiple views.
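The category diversity described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function name `category_diversity` is hypothetical, and the normalization (averaging the dissimilarity over all pairs of distinct rated apps) is an assumption, since only the same-category indicator is stated in the text.

```python
from itertools import combinations


def category_diversity(app_categories):
    """Average pairwise category dissimilarity of one user's rated apps.

    app_categories: list with one category label per rated app.
    Assumption: similarity is 1 for two apps of the same category and 0
    otherwise, and (1 - similarity) is averaged over all unordered pairs.
    """
    pairs = list(combinations(app_categories, 2))
    if not pairs:
        return 0.0  # fewer than two apps: no pair to compare, report 0
    return sum(1.0 for a, b in pairs if a != b) / len(pairs)


# A user with few apps from many categories versus many apps from few categories.
few_but_diverse = ["Tools", "Weather", "Puzzle"]
many_but_narrow = ["Puzzle"] * 5 + ["Arcade"]
diverse_score = category_diversity(few_but_diverse)
narrow_score = category_diversity(many_but_narrow)
```

The two toy users mirror the observation in Fig. 2(c): the first rates only three apps but reaches the maximum diversity, while the second rates six apps with a much lower diversity.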
Based on the analysis of the relationships among app categories, app features, and users, a recommendation model is required that can integrate the interactions among multiple categories, multiple views of features, and users.
3 Preliminaries
In this work, we intend to predict ratings for mobile applications with a tensor-based approach. Before that, we introduce some related concepts and notation in tensor algebra that will be used throughout the paper, and then provide the problem formulation of app rating prediction.
3.1 Tensor Concepts and Notation
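A tensor is a multidimensional array that generalizes the matrix representation. Each dimension of a tensor is called a mode or way. Following prevailing convention, tensors are represented by calligraphic letters, matrices by boldface uppercase letters, vectors by boldface lowercase letters, and scalars by lowercase letters. An element of a vector $\mathbf{x}$, a matrix $\mathbf{X}$, or a tensor $\mathcal{X}$ is represented by $x_i$, $x_{i,j}$, $x_{i,j,k}$, etc., depending on the number of modes. All vectors are column vectors unless otherwise specified. For an arbitrary matrix $\mathbf{X}\in\mathbb{R}^{I\times J}$, its $i$th row and $j$th column vector are represented by $\mathbf{x}^{i}$ and $\mathbf{x}_{j}$, respectively. The outer product of $N$ vectors $\mathbf{x}^{(n)}\in\mathbb{R}^{I_n}$ for all $n\in[1:N]$ is an $N$th-order tensor and defined elementwise as $\big(\mathbf{x}^{(1)}\circ\cdots\circ\mathbf{x}^{(N)}\big)_{i_1,\ldots,i_N} = x^{(1)}_{i_1}\cdots x^{(N)}_{i_N}$ for $i_n\in[1:I_n]$. The inner product of two tensors $\mathcal{X},\mathcal{Y}\in\mathbb{R}^{I_1\times\cdots\times I_N}$ is defined as $\langle\mathcal{X},\mathcal{Y}\rangle = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N} x_{i_1,\ldots,i_N}\,y_{i_1,\ldots,i_N}$. In particular, for $\mathcal{X} = \mathbf{x}^{(1)}\circ\cdots\circ\mathbf{x}^{(N)}$ and $\mathcal{Y} = \mathbf{y}^{(1)}\circ\cdots\circ\mathbf{y}^{(N)}$, it holds that
$$\langle\mathcal{X},\mathcal{Y}\rangle = \prod_{n=1}^{N}\big\langle\mathbf{x}^{(n)},\mathbf{y}^{(n)}\big\rangle = \prod_{n=1}^{N}\mathbf{x}^{(n)\mathrm{T}}\mathbf{y}^{(n)}. \qquad (1)$$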
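The identity in Eq. (1) can be checked numerically. The following plain-Python sketch stores a tensor as a dictionary from index tuples to values, which is a convenience chosen for this illustration rather than anything prescribed by the paper; the helper names `outer` and `inner` are hypothetical.

```python
from itertools import product
from math import isclose, prod


def outer(vectors):
    """Rank-one tensor of the given vectors, stored as {index tuple: value}."""
    ranges = [range(len(v)) for v in vectors]
    return {idx: prod(v[i] for v, i in zip(vectors, idx))
            for idx in product(*ranges)}


def inner(x, y):
    """Inner product of two tensors with identical index sets."""
    return sum(x[idx] * y[idx] for idx in x)


xs = [[1.0, 2.0], [0.5, -1.0, 3.0], [2.0, 1.0]]
ys = [[3.0, 1.0], [1.0, 2.0, 0.0], [-1.0, 4.0]]

# Left-hand side: inner product of the two full third-order tensors.
lhs = inner(outer(xs), outer(ys))
# Right-hand side: product of the three vector inner products.
rhs = prod(sum(a * b for a, b in zip(x, y)) for x, y in zip(xs, ys))
```

The right-hand side touches only 2 + 3 + 2 numbers per tensor, whereas the left-hand side enumerates all 12 entries; this gap is what makes factorized models tractable when the views are high-dimensional.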
Definitions of the Kronecker product, Khatri-Rao product, mode-$n$ product, and Tucker decomposition are given below; they will be used to build the proposed model.
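Definition 3.1 (Kronecker Product)
The Kronecker product of matrices $\mathbf{A}\in\mathbb{R}^{I\times J}$ and $\mathbf{B}\in\mathbb{R}^{K\times L}$ is denoted by $\mathbf{A}\otimes\mathbf{B}$. The result is a matrix of size $IK\times JL$ and defined by
$$\mathbf{A}\otimes\mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1J}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2J}\mathbf{B} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1}\mathbf{B} & a_{I2}\mathbf{B} & \cdots & a_{IJ}\mathbf{B} \end{bmatrix}. \qquad (2)$$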
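Definition 3.2 (Khatri-Rao Product)
The Khatri-Rao product of matrices $\mathbf{A}\in\mathbb{R}^{I\times K}$ and $\mathbf{B}\in\mathbb{R}^{J\times K}$ is denoted by $\mathbf{A}\odot\mathbf{B}$. The result is a matrix of size $IJ\times K$ and defined by
$$\mathbf{A}\odot\mathbf{B} = \big[\mathbf{a}_1\otimes\mathbf{b}_1,\ \mathbf{a}_2\otimes\mathbf{b}_2,\ \ldots,\ \mathbf{a}_K\otimes\mathbf{b}_K\big]. \qquad (3)$$
The Khatri-Rao product is the “matching column-wise” Kronecker product.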
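To make the two products concrete, here is a small plain-Python sketch with matrices stored as lists of rows. The function names `kron` and `khatri_rao` are chosen for this illustration only.

```python
def kron(a, b):
    """Kronecker product: row (i, k) and column (j, l) hold a[i][j] * b[k][l]."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


def khatri_rao(a, b):
    """Column-wise Kronecker product; a and b need the same number of columns."""
    if len(a[0]) != len(b[0]):
        raise ValueError("both matrices must have the same number of columns")
    return [[a_row[k] * b_row[k] for k in range(len(a[0]))]
            for a_row in a for b_row in b]


A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
K = kron(A, B)         # 4 x 4
KR = khatri_rao(A, B)  # 4 x 2
```

For these square inputs, the two columns of `KR` are exactly columns 0 and 3 of `K`, i.e., the columns where the column index of `A` matches the column index of `B`.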
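Definition 3.3 (mode-$n$ Product)
The mode-$n$ product of a tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_N}$ with a matrix $\mathbf{U}\in\mathbb{R}^{J\times I_n}$, denoted by $\mathcal{X}\times_n\mathbf{U}$, is defined as
$$\big(\mathcal{X}\times_n\mathbf{U}\big)_{i_1,\ldots,i_{n-1},j,i_{n+1},\ldots,i_N} = \sum_{i_n=1}^{I_n} x_{i_1,\ldots,i_N}\,u_{j,i_n}. \qquad (4)$$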
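A minimal plain-Python sketch of the mode-$n$ product follows. It assumes the common convention that the matrix has one column per index of the contracted mode, stores the tensor as a dictionary from index tuples to values, and uses 0-based mode numbers; the function name `mode_n_product` is hypothetical.

```python
from itertools import product


def mode_n_product(tensor, shape, matrix, n):
    """Multiply `tensor` by `matrix` along mode n (0-based).

    tensor: dict {index tuple: value}; shape: its dimensions.
    matrix: J x shape[n] list of rows. Mode n of the result has size J.
    """
    shape = tuple(shape)
    new_shape = shape[:n] + (len(matrix),) + shape[n + 1:]
    result = {}
    for idx in product(*(range(s) for s in new_shape)):
        j = idx[n]
        # Contract the n-th index of the tensor against row j of the matrix.
        result[idx] = sum(
            tensor[idx[:n] + (i,) + idx[n + 1:]] * matrix[j][i]
            for i in range(shape[n])
        )
    return result, new_shape


# For a second-order tensor X, mode 0 gives U X and mode 1 gives X U^T.
X = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
U = [[1, 0], [1, 1], [0, 2]]
Y0, shape0 = mode_n_product(X, (2, 2), U, 0)
Y1, shape1 = mode_n_product(X, (2, 2), U, 1)
```

The matrix case is a convenient sanity check because both results can be verified by hand with ordinary matrix multiplication.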
Figure 3 visualizes the Tucker decomposition of a third-order tensor, and Table 2 summarizes the main notations for easy reference.
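Definition 3.4 (Tucker Decomposition)
For a general tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\cdots\times I_N}$, its Tucker decomposition is defined as
$$\mathcal{X} = \mathcal{G}\times_1\mathbf{U}^{(1)}\times_2\mathbf{U}^{(2)}\cdots\times_N\mathbf{U}^{(N)} = [\![\mathcal{G};\ \mathbf{U}^{(1)},\ldots,\mathbf{U}^{(N)}]\!], \qquad (5)$$
where $\mathbf{U}^{(n)}\in\mathbb{R}^{I_n\times R_n}$ are the factor matrices and can be thought of as the principal components in each mode. $\mathcal{G}\in\mathbb{R}^{R_1\times\cdots\times R_N}$ is called the core tensor. $[\![\mathcal{G};\ \mathbf{U}^{(1)},\ldots,\mathbf{U}^{(N)}]\!]$ is used as shorthand notation.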
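The following plain-Python sketch expands a Tucker representation back into the full tensor, entry by entry. It is meant only to illustrate the definition; the function name `tucker_to_tensor` and the dictionary storage of the core tensor are assumptions of this example.

```python
from itertools import product


def tucker_to_tensor(core, factors):
    """Expand a Tucker representation into the full tensor.

    core: dict {(r_1, ..., r_N): value}; absent entries are treated as zero.
    factors: list of N matrices, factors[n] being an I_n x R_n list of rows.
    """
    shape = [len(u) for u in factors]
    tensor = {}
    for idx in product(*(range(s) for s in shape)):
        total = 0.0
        for r, g in core.items():
            term = g
            for u, i, rn in zip(factors, idx, r):
                term *= u[i][rn]
            total += term
        tensor[idx] = total
    return tensor


# A 1 x 1 core yields a scaled rank-one matrix.
rank_one = tucker_to_tensor({(0, 0): 2.0}, [[[1.0], [2.0]], [[3.0], [4.0], [5.0]]])

# A core with non-zeros only on its diagonal has no interaction between
# different components, which is the special case covered by CP.
diag_core = {(0, 0): 1.0, (1, 1): 1.0}
cp_like = tucker_to_tensor(diag_core, [[[1.0, 0.0], [0.0, 1.0]],
                                       [[1.0, 2.0], [3.0, 4.0]]])
```

Off-diagonal core entries are what let component $r_1$ of one mode interact with component $r_2 \neq r_1$ of another, which is the extra flexibility Tucker has over CP.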
Symbol  Definition and description

$x$  each lowercase letter represents a scalar
$\mathbf{x}$  each boldface lowercase letter represents a vector
$\mathbf{X}$  each boldface capital letter represents a matrix
$\mathcal{X}$  each calligraphic letter represents a tensor, set or space
$[1:N]$  a set of integers in the range of 1 to $N$ inclusively
$\langle\cdot,\cdot\rangle$  denotes inner product
$\circ$  denotes outer product
$\otimes$  denotes Kronecker product
$\odot$  denotes Khatri-Rao product
$\times_n$  denotes mode-$n$ product
$\|\cdot\|_F$  denotes Frobenius norm of vector, matrix or tensor
3.2 Problem Formulation
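Suppose that the app rating prediction scenario includes a user set $\mathcal{U}$ and a mobile app set $\mathcal{A}$. The numbers of app categories and feature views are $T$ and $V$, respectively. Let $N_t$ be the number of rating records in the $t$th category; the total number of rating records is then $N=\sum_{t=1}^{T}N_t$. Let $I_v$ be the dimensionality of the $v$th feature view and $I=\sum_{v=1}^{V}I_v$.
In this paper, we construct a multidimensional tensor to discover the latent interactions among the category information and multi-view features. Each rating record in category $t$ can be represented in $V$ different views, i.e., $\{\mathbf{x}^{(1)},\ldots,\mathbf{x}^{(V)}\}$, where $\mathbf{x}^{(v)}\in\mathbb{R}^{I_v}$ and $v\in[1:V]$. Generally, a rating record involves a user, an app, and different types of characteristics of the app. We are given a training set of rating records $\mathcal{D}=\{(\{\mathbf{X}_t^{(v)}\}_{v=1}^{V},\mathbf{y}_t)\}_{t=1}^{T}$, where $\mathbf{X}_t^{(v)}$ is the feature matrix in the $t$th category for the $v$th view and $\mathbf{y}_t\in\mathbb{R}^{N_t}$ is the vector of the rating values of the apps in the $t$th category. Our goal is to find a predictive function $f_t$ for each category that minimizes the expected loss and provides accurate predicted ratings. The regularized objective function to be minimized can be formulated as:
$$\min_{f}\ \sum_{t=1}^{T}\mathcal{L}_t\big(f_t(\{\mathbf{X}_t^{(v)}\}),\mathbf{y}_t\big)+\lambda\,\Omega(f), \qquad (6)$$
where $\mathcal{L}_t$ is the empirical loss in the $t$th category, $\Omega(f)$ is the regularization term, and $\lambda$ is the regularization parameter. $\mathcal{L}_t$ can be rewritten as the average squared error over the instances:
$$\mathcal{L}_t=\frac{1}{N_t}\sum_{n=1}^{N_t}\big(f_t(\{\mathbf{x}_{t,n}^{(v)}\})-y_{t,n}\big)^2. \qquad (7)$$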
4 Proposed Method
In this section, we first introduce the context-aware recommendation approach based on tensor analysis (CATA). Then we discuss how to employ Tucker decomposition to learn the proposed model without physically building the tensor.
4.1 Model for App Rating Prediction
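We derive the proposed model from the basic framework of linear analysis. Given the feature vector $\mathbf{x}\in\mathbb{R}^{I}$ of an app rating record, the basic linear model for the $t$th category is written as
$$f_t(\mathbf{x})=\mathbf{x}^{\mathrm{T}}\mathbf{w}_t+b_t, \qquad (8)$$
where $\mathbf{w}_t\in\mathbb{R}^{I}$ is the weight vector for the $t$th category, and $b_t$ is the bias factor for adjusting the threshold of the $t$th category label assignment.
Let $\mathbf{z}=[1;\mathbf{x}]\in\mathbb{R}^{I+1}$ and $\tilde{\mathbf{w}}_t=[b_t;\mathbf{w}_t]$; then the bias factor can be absorbed into $\tilde{\mathbf{w}}_t$ (see [11]). Eq. (8) can thus be rewritten as follows:
$$f_t(\mathbf{x})=\mathbf{z}^{\mathrm{T}}\tilde{\mathbf{w}}_t. \qquad (9)$$
Let $\mathbf{W}\in\mathbb{R}^{(I+1)\times T}$ denote the weight matrix to be learned, whose columns are the vectors $\tilde{\mathbf{w}}_t$. In order to jointly learn multiple linear models for the $T$ categories, we introduce a category indicator vector denoted by $\mathbf{e}_t\in\{0,1\}^{T}$ to model the second-order interactions between input features and categories. The indicator vector is defined as
$$e_{t,s}=\begin{cases}1, & s=t,\\ 0, & \text{otherwise}.\end{cases}$$
Then Eq. (9) can be rewritten as
$$f_t(\mathbf{x})=\mathbf{z}^{\mathrm{T}}\mathbf{W}\mathbf{e}_t=\langle\mathbf{W},\ \mathbf{z}\circ\mathbf{e}_t\rangle. \qquad (10)$$
Note that the outer product is used to compute the interactions between input features and categories, which consist of the products of all combinations of the variables that define each domain. This data fusion technique provides a good framework for introducing multiple features. When each object is associated with multi-view features, the outer product lets us easily extend Eq. (10) to the multi-view case and provide a consensus formulation.
Suppose that the given rating records are composed of features from $V$ views (denoted as $\{\mathbf{x}^{(v)}\}_{v=1}^{V}$). We can extend Eq. (10) to model the full-order interactions between multi-view features and categories as:
$$f_t(\{\mathbf{x}^{(v)}\})=\big\langle\mathcal{W},\ \mathbf{z}^{(1)}\circ\cdots\circ\mathbf{z}^{(V)}\circ\mathbf{e}_t\big\rangle, \qquad (11)$$
or elementwise as
$$f_t(\{\mathbf{x}^{(v)}\})=\sum_{s=1}^{T}\sum_{i_1=1}^{I_1+1}\cdots\sum_{i_V=1}^{I_V+1} w_{i_1,\ldots,i_V,s}\Big(\prod_{v=1}^{V}z^{(v)}_{i_v}\Big)e_{t,s}, \qquad (12)$$
where $\mathbf{z}^{(v)}=[1;\mathbf{x}^{(v)}]\in\mathbb{R}^{I_v+1}$ is the input data vector, and $\mathcal{W}\in\mathbb{R}^{(I_1+1)\times\cdots\times(I_V+1)\times T}$ is the weight tensor to be learned, wherein $w_{1,\ldots,1,t}$ is the global bias, and $w_{i_1,\ldots,i_V,t}$ with some indexes satisfying $i_v=1$ encodes lower-order interactions between the views $v'$ whose $i_{v'}>1$.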
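The way a single weight tensor holds the bias, the first-order weights, and the higher-order interactions can be seen in a toy example. The sketch below is illustrative only and assumes that each view vector is extended with a leading constant 1, so that (0-based) index 0 in a view selects the constant; the function name `predict_full_order` and the sparse dictionary storage are hypothetical.

```python
from itertools import product


def predict_full_order(weights, views, t):
    """Inner product of the weight tensor with z^(1) o ... o z^(V) o e_t.

    weights: dict {(i_1, ..., i_V, category): w}; absent entries are zero.
    views: list of V feature vectors; a constant 1 is prepended to each.
    t: category of the rating record (selects the last tensor mode).
    """
    zs = [[1.0] + list(x) for x in views]
    total = 0.0
    for idx in product(*(range(len(z)) for z in zs)):
        term = weights.get(idx + (t,), 0.0)
        for z, i in zip(zs, idx):
            term *= z[i]
        total += term
    return total


weights = {
    (0, 0, 0): 3.0,   # global bias of category 0
    (1, 0, 0): 0.5,   # first-order weight: feature 1 of view 1
    (0, 2, 0): -1.0,  # first-order weight: feature 2 of view 2
    (1, 2, 0): 2.0,   # second-order interaction between the two views
}
x1, x2 = [2.0], [4.0, 10.0]
rating_cat0 = predict_full_order(weights, [x1, x2], 0)
rating_cat1 = predict_full_order(weights, [x1, x2], 1)
```

Here the prediction for category 0 is 3 + 0.5·2 − 1·10 + 2·2·10 = 34, and category 1 has no weights at all, so its prediction is 0.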
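In this manner, the full-order interactions between multiple views and categories are embedded within the tensor structure, as shown in Fig. 4. However, one drawback of this model is that not all categories fit the constructed feature tensor, in which case those interactions become redundant information. Thus, we build the rating predictive function on both the full-order feature interaction space and the original feature spaces.
Let $\mathcal{Z}=\mathbf{z}^{(1)}\circ\cdots\circ\mathbf{z}^{(V)}\circ\mathbf{e}_t$ be the full-order tensor, where $\mathbf{z}^{(v)}=[1;\mathbf{x}^{(v)}]$ and $\mathbf{e}_t$ is the category indicator vector, and let $\mathbf{x}=[\mathbf{x}^{(1)};\ldots;\mathbf{x}^{(V)}]\in\mathbb{R}^{I}$ be the feature vector concatenated from the multiple views. We formulate our CATA model as follows:
$$f_t(\{\mathbf{x}^{(v)}\})=\langle\mathcal{W},\mathcal{Z}\rangle+\mathbf{x}^{\mathrm{T}}\mathbf{u}_t, \qquad (13)$$
where $\mathcal{W}$ is the weight tensor and $\mathbf{u}_t\in\mathbb{R}^{I}$ is the category-specific weight vector. For convenience in the following discussion, we denote $\mathbf{U}=[\mathbf{u}_1,\ldots,\mathbf{u}_T]$.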
4.2 Model Inference
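The number of parameters to be estimated in Eq. (13) is $T\prod_{v=1}^{V}(I_v+1)+TI$, which makes it infeasible to learn the model directly. Therefore, we assume that the weight tensor $\mathcal{W}$ can be factorized by Tucker decomposition as
$$\mathcal{W}=\mathcal{G}\times_1\boldsymbol{\Theta}^{(1)}\times_2\cdots\times_V\boldsymbol{\Theta}^{(V)}\times_{V+1}\boldsymbol{\Phi}, \qquad (14)$$
where $\mathcal{G}\in\mathbb{R}^{R_1\times\cdots\times R_{V+1}}$ is called the core tensor and its entries show the level of interaction between the different components, $\boldsymbol{\Theta}^{(v)}\in\mathbb{R}^{(I_v+1)\times R_v}$ is the shared structure matrix for the $v$th view, and $\boldsymbol{\Phi}\in\mathbb{R}^{T\times R_{V+1}}$ is the category-specific weight matrix.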
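Then we can transform Eq. (12) into
$$f_t(\{\mathbf{x}^{(v)}\})=\sum_{s=1}^{T}\sum_{i_1=1}^{I_1+1}\cdots\sum_{i_V=1}^{I_V+1}\Big(\sum_{r_1=1}^{R_1}\cdots\sum_{r_{V+1}=1}^{R_{V+1}}g_{r_1,\ldots,r_{V+1}}\prod_{v=1}^{V}\theta^{(v)}_{i_v,r_v}\,\phi_{s,r_{V+1}}\Big)\Big(\prod_{v=1}^{V}z^{(v)}_{i_v}\Big)e_{t,s}. \qquad (15)$$
Because $e_{t,s}=1$ only when $s=t$, and according to Eq. (1), we can further rewrite the equation above into
$$f_t(\{\mathbf{x}^{(v)}\})=\mathcal{G}\times_1\big(\mathbf{z}^{(1)\mathrm{T}}\boldsymbol{\Theta}^{(1)}\big)\times_2\cdots\times_V\big(\mathbf{z}^{(V)\mathrm{T}}\boldsymbol{\Theta}^{(V)}\big)\times_{V+1}\boldsymbol{\phi}^{t}, \qquad (16)$$
where $\times_n$ is the mode-$n$ product and $\mathcal{G}\times_{V+1}\boldsymbol{\phi}^{t}$ means multiplying the core tensor by the category-specific vector $\boldsymbol{\phi}^{t}$, i.e., the $t$th row of the category-specific weight matrix $\boldsymbol{\Phi}$. It is worth noting that the first row within the shared structure matrix $\boldsymbol{\Theta}^{(v)}$ is associated with the constant value $z^{(v)}_1=1$ and represents the bias factors of the $v$th view. The bias factors make the lower-order interactions active in the rating predictive function.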
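The benefit of the factorized form is that the prediction can be computed without ever materializing the weight tensor. The plain-Python sketch below checks this numerically for two views with random toy parameters. All names, sizes, and the 0-based indexing are assumptions of this illustration, not the authors' implementation.

```python
import random
from itertools import product

random.seed(0)
I1, I2, T = 3, 4, 2   # view sizes (constant feature included) and categories
R1, R2, R3 = 2, 2, 2  # dimensions of the core tensor


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


theta1, theta2 = rand_matrix(I1, R1), rand_matrix(I2, R2)  # shared structure
phi = rand_matrix(T, R3)                                   # category-specific
core = {r: random.uniform(-1, 1)
        for r in product(range(R1), range(R2), range(R3))}
z1 = [1.0] + [random.random() for _ in range(I1 - 1)]
z2 = [1.0] + [random.random() for _ in range(I2 - 1)]


def predict_explicit(t):
    """Forms every weight-tensor entry for category t, then contracts with z."""
    total = 0.0
    for i1, i2 in product(range(I1), range(I2)):
        w = sum(g * theta1[i1][r1] * theta2[i2][r2] * phi[t][r3]
                for (r1, r2, r3), g in core.items())
        total += w * z1[i1] * z2[i2]
    return total


def predict_factorized(t):
    """Projects each view into its latent space first; no weight tensor is built."""
    p1 = [sum(z1[i] * theta1[i][r] for i in range(I1)) for r in range(R1)]
    p2 = [sum(z2[i] * theta2[i][r] for i in range(I2)) for r in range(R2)]
    return sum(g * p1[r1] * p2[r2] * phi[t][r3]
               for (r1, r2, r3), g in core.items())


gaps = [abs(predict_explicit(t) - predict_factorized(t)) for t in range(T)]
```

Both functions return the same value up to floating-point rounding, but the factorized one costs on the order of the sum of the view sizes plus the core size, rather than the product of the view sizes.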
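Using Eq. (16) to replace the first term in Eq. (13), the rating predictive function can be represented by
$$f_t(\{\mathbf{x}^{(v)}\})=\mathcal{G}\times_1\big(\mathbf{z}^{(1)\mathrm{T}}\boldsymbol{\Theta}^{(1)}\big)\times_2\cdots\times_V\big(\mathbf{z}^{(V)\mathrm{T}}\boldsymbol{\Theta}^{(V)}\big)\times_{V+1}\boldsymbol{\phi}^{t}+\mathbf{x}^{\mathrm{T}}\mathbf{u}_t, \qquad (17)$$
where the second term is the category-specific linear part of Eq. (13), with $\mathbf{x}$ the concatenated feature vector and $\mathbf{u}_t$ the category-specific weight vector.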
The whole framework of the proposed CATA method is illustrated in Fig. 5.
4.3 Model Estimation
We propose to learn the app rating prediction model CATA by minimizing the following regularized empirical risk:
(18) 
The regularization term and can be set as Frobenius norm, norm, or other structural regularization. In this paper, we adopt the alternating block coordinate descent approach for the optimization of the given objective function. The whole learning procedure is summarized in Algorithm 1.
With all other parameters fixed, the minimization over consists of learning the parameters by a regularization method, and the partial derivative of w.r.t. is given by
(19) 
where and for .
For convenience, we let denote the Kronecker product in a reverse order from to and denote . Let and . Then we have that
(20) 
where the mode matricization of tensor .
With all other parameters fixed, the minimization over consists of learning each parameter component independently. The partial derivative of w.r.t. is given by
(21) 
Following the derivation in Eq.(20), we have that
(22) 
By keeping all other parameters fixed, we can get the partial derivative of w.r.t. the core tensor as follows,
(23) 
Following the derivation in Eq.(20), we have that
(24) 
By keeping all other parameters fixed, we can get the partial derivative of w.r.t. the category-specific weight matrix as follows,
(25) 
where is the concatenated feature matrix for the th category.
Algorithm 1
Input: Training data , number of factors , regularization parameter , and learning rate
Output: Model parameters , , ,
Initialize , , , .
repeat
    Fixing , , and , update
    for each feature view do
        Fixing , , , and , update
    end for
    Fixing , , and , update
    Fixing , , and , update
until convergence
4.4 Group Norm
As mentioned in Section 4.3, the regularization terms can be Frobenius norm, norm, or other structural regularization. Here we present a suitable regularization term for the category-specific weight matrix to further improve the performance of rating prediction.
For the original feature spaces, i.e., the second term in Eq. (13), the features of a specific view might be more or less discriminative for different app categories. For instance, the description information is more useful for distinguishing apps in the Lifestyle category than apps in the Maps & Navigation category. This is mainly because Lifestyle is a broad cluster, and the functionalities that can be extracted from the description texts of its apps differ greatly from each other. Considering this, we introduce the group norm ($\ell_{2,1}$-norm, for short) for regularization, which is defined as $\sum_{t=1}^{T}\sum_{v=1}^{V}\|\mathbf{u}_t^{(v)}\|_2$ [12], where $\mathbf{u}_t^{(v)}$ denotes the weights of the $t$th category that belong to the $v$th view. The $\ell_{2,1}$-norm applies the $\ell_2$-norm within each view and the $\ell_1$-norm between views, so it can enforce sparsity between different views. This means that if the features of a specific view are not significant for the apps in a certain category, very small weights will be assigned to them for that category. The $\ell_{2,1}$-norm can further improve the performance of app rating prediction, as it captures the global relationships between views. The right part of Fig. 5 shows the category-specific weight matrix as an illustration. The elements in gray have large values. It can be seen that the $\ell_{2,1}$-norm effectively emphasizes view-wise weight learning for each category.
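As a small illustration of the group norm, the plain-Python sketch below sums, over views, the Euclidean norm of each view's block of weights, and then sums over categories. The function names and the representation of the weight matrix as a list of per-category weight vectors are assumptions made for this example.

```python
from math import sqrt


def group_norm(weights, view_sizes):
    """Sum over views of the Euclidean norm of that view's block of weights.

    weights: weight vector of one category, views concatenated in order.
    view_sizes: number of features in each view.
    """
    total, start = 0.0, 0
    for size in view_sizes:
        block = weights[start:start + size]
        total += sqrt(sum(w * w for w in block))
        start += size
    return total


def group_norm_matrix(weight_columns, view_sizes):
    """Group norm of the whole matrix: sum of the per-category group norms."""
    return sum(group_norm(col, view_sizes) for col in weight_columns)


view_sizes = [2, 3]
# Category 0 relies only on view 1, category 1 only on view 2.
columns = [[3.0, 4.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 2.0, 2.0]]
penalty = group_norm_matrix(columns, view_sizes)
```

Because a whole block contributes nothing once all of its weights are zero, minimizing this penalty tends to switch off entire views for a category instead of shrinking individual weights uniformly.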
5 Experiments
In this section, we verify the effectiveness of the proposed method by conducting a series of experiments against five well-known baselines.
Table 3: Performance comparison on the Google Play dataset (mean ± standard deviation)
Training  Metrics  PMF  MTFL  FM  MVM  MFM  CATA  CATA-G

60%  MAE  0.9597±0.0157  0.9272±0.0247  0.8964±0.0131  0.8735±0.0299  0.8761±0.0177  0.7650±0.0571  0.7679±0.0152
60%  RMSE  1.3015±0.0234  1.4616±0.0305  1.2133±0.0206  1.2122±0.0353  1.2021±0.0171  1.1720±0.0402  1.1586±0.0209
70%  MAE  0.9463±0.0083  0.8889±0.0053  0.8786±0.0065  0.8496±0.0070  0.8660±0.0350  0.7911±0.0442  0.7864±0.0141
70%  RMSE  1.2834±0.0090  1.4257±0.0117  1.1981±0.0096  1.1836±0.0089  1.1959±0.0122  1.1656±0.0151  1.1626±0.0115
80%  MAE  0.9369±0.0108  0.8575±0.0201  0.8568±0.0112  0.8384±0.0134  0.8439±0.0266  0.7826±0.0274  0.7765±0.0206
80%  RMSE  1.2745±0.0157  1.3904±0.0260  1.1785±0.0131  1.1741±0.0113  1.1815±0.0147  1.1502±0.0211  1.1419±0.0154
Table 4: Performance comparison on the Apple App Store dataset (mean ± standard deviation)
Training  Metrics  PMF  MTFL  FM  MVM  MFM  CATA  CATA-G

60%  MAE  1.0609±0.0062  0.9856±0.0221  0.9426±0.0038  0.9463±0.0098  0.9311±0.0113  0.9342±0.0087  0.9271±0.0052
60%  RMSE  1.3180±0.0064  1.3064±0.0231  1.2890±0.0069  1.2717±0.0191  1.2422±0.0070  1.2396±0.0059  1.2377±0.0060
70%  MAE  1.0551±0.0056  0.9856±0.0188  0.9345±0.0091  0.9409±0.0135  0.9198±0.0164  0.9246±0.0081  0.9247±0.0077
70%  RMSE  1.3046±0.0091  1.3062±0.0219  1.2745±0.0123  1.2526±0.0157  1.2312±0.0135  1.2257±0.0126  1.2262±0.0112
80%  MAE  1.0541±0.0052  0.9842±0.0130  0.9404±0.0057  0.9427±0.0080  0.9526±0.0206  0.9325±0.0019  0.9288±0.0057
80%  RMSE  1.3065±0.0033  1.2955±0.0153  1.2800±0.0090  1.2493±0.0141  1.2372±0.0083  1.2317±0.0066  1.2286±0.0079
5.1 Experimental Setup
After the initial filtering of the Google Play dataset, we first select the top 20 categories with the most apps and then filter out users and apps with fewer than 5 ratings. We obtain 3065 apps and 3895 users with 36791 rating records. The numbers of permissions and text tokens are 83 and 1762, respectively.
We randomly select $p$% ($p\in\{60,70,80\}$), 10%, and 10% of the rating records in each category as the training set, validation set, and testing set, respectively. The parameters of all the baselines are set to the optimal values. For the proposed methods, all the dimensions of the core tensor are set to 5, and the learning rate is set . The maximum number of iterations is set to 400. Grid search is employed to select the optimal regularization parameters for all the compared methods. Each experiment is repeated 5 times, and the mean and standard deviation of each metric on both datasets are reported in the next subsection.
We use the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) [13] to evaluate the performance of the proposed approach and the other compared methods. A smaller MAE or RMSE means better performance.
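For reference, the two metrics can be computed as in the following plain-Python sketch. The function names and the toy ratings are chosen for illustration only.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more strongly than MAE."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


ratings = [5, 3, 4, 1]
predictions = [4, 3, 2, 2]
toy_mae = mae(ratings, predictions)
toy_rmse = rmse(ratings, predictions)
```

On these toy values the absolute errors are 1, 0, 2, and 1, so the MAE is 1.0, while the RMSE is the square root of 1.5 because the single error of 2 dominates the squared sum.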
5.2 Compared Methods
In order to demonstrate the effectiveness of the proposed CATA approach, we compare the following methods.

PMF. It is the Probabilistic Matrix Factorization proposed by Salakhutdinov and Mnih [14], which is widely used for rating prediction tasks.

MTFL. It is the Multi-Task Feature Learning algorithm [12], which is a multivariate regression model with group norm.

FM. It is the Factorization Machine [15] that explores pairwise interactions between all features without view segmentation. We implement the FM by concatenating the category indicator vector and all the feature vectors as the input feature vector.

MVM. It is the Multi-View Machine [16] that models the features from multiple views as a tensor structure to explore the full-order interactions between them.

MFM. It is the Multilinear Factorization Machine [17] that learns task-specific feature maps and the task-view shared multilinear structures from full-order interactions by applying a joint factorization.

CATA. It is the rating prediction model proposed in this paper, which integrates users’ preferences, app categories, and features of multiple views, and applies Tucker decomposition to learn the full-order interactions.

CATA-G. It is the variant of the proposed CATA that uses the group norm for the category-specific weight matrix.
5.3 Performance Comparison
In this subsection, we present the performance comparisons between the proposed CATA methods and the baselines with respect to two metrics, i.e., MAE and RMSE.
Table 3 and Table 4 show the performance of all the prediction methods on the Google Play and Apple App Store datasets, respectively. The proposed approach outperforms the baselines on both datasets in almost all cases. This demonstrates the superiority of the context-aware prediction approach, which utilizes a higher-order decomposition to learn the full-order interactions. The CATA-G method performs better than CATA overall, which indicates that the employed group norm can effectively improve the rating prediction accuracy by enforcing sparsity between different views of features.
It is not surprising that PMF performs poorly on both datasets, since it does not employ any other features of apps. MTFL also performs badly, mainly because it ignores the segmentation of feature views and the interactions between the multiple views of features. Compared to MTFL, the improvement achieved by FM illustrates the necessity of feature interactions. Both MVM and MFM outperform FM, especially on the Google Play dataset, and their results are competitive with each other. This is mainly because MVM and MFM consider full-order interactions, including the higher-order feature interactions and the global bias. However, distinguishing different categories is crucial for predicting app ratings. The proposed CATA methods achieve the best performance because they consider category-specific multi-view feature interactions. The application of Tucker decomposition further improves the performance, as it permits the interactions within each modality [18], while the CP decomposition used in MFM does not.
Comparing the two datasets, the superiority of the proposed approach is more significant on the Google Play dataset. This is mainly caused by the fewer categories and feature views in the Apple App Store dataset. Fewer categories might not sufficiently discriminate the important features in a specific category, while fewer views of features lead to a lack of interaction information between features from different views. Moreover, the two categories in the Apple App Store dataset, i.e., “Free” and “Paid”, do not have their own unique characteristics and are not easy to differentiate from each other. Nonetheless, even with very limited context information, the CATA methods still outperform the baselines in almost all cases.
5.4 Impact of Feature Views
In order to explore the impact of the feature views on the proposed CATA-G method, we conduct experiments with different numbers of views. As each rating record in the Apple App Store dataset only has two views, i.e., user and review text, we use the Google Play dataset for these experiments. Figure 6 shows the prediction performance of the CATA-G method with two and three feature views. Note that U, D, and P denote user, description text, and permission, respectively. The CATA-G method consistently performs best when incorporating three views of features, which benefits from the complementary information generated by the interactions among the various views of features. This indicates that incorporating multiple views of features can effectively improve the accuracy of rating prediction for apps. Considering the results produced by two views, we find that adopting permissions brings better results than adopting description text. This is probably because the features extracted from description text are more sophisticated and higher-dimensional, and may therefore provide redundant information.
5.5 CategorySpecific Performance
In this section, we further analyze the performance of the proposed method for each category based on the Google Play dataset. Figure 7 shows the MAE and RMSE values of the proposed method and the top 2 baseline methods in each category. The category indexes on the $x$-axis are sorted by the numbers of rating records within the categories in ascending order. MFM performs better than MVM in the categories with few training instances. This indicates that when few instances are available, a method that incorporates the category information can improve the performance, as it exploits complementary information from other categories. The performance of the proposed CATA-G method is the worst with few instances, as CATA-G has more model parameters to learn and requires more instances. For the categories with more instances, CATA-G makes significant improvements and outperforms the other two methods. Compared with MFM, the advantage of CATA-G is the application of Tucker decomposition, which can effectively retain the principal components of the weight tensor. Another interesting observation in Fig. 7 is that CATA-G achieves its top 5 improvements for the categories Simulation, Action, Casual, Arcade, and Puzzle (i.e., #11, #12, #15, #16, and #19). The apps in these five categories are game apps, whose features are more complicated because each game app has its specific theme setting. This suggests that the proposed method has a greater ability to discriminate the importance of each feature in a complicated feature set.
5.6 Sensitivity Analysis
There are two hyperparameters (i.e., and ) in the proposed CATA approach. They are used to control the trade-off between the empirical loss and the prior knowledge encoded by the regularizations. To learn the impacts of the two hyperparameters on the performance of app rating prediction, we run the proposed approach with different values for and on the two datasets. From Fig. 8, we can observe that the performance is stable for most pairs of the two hyperparameters. For each dataset, the effects of the two hyperparameters on MAE and RMSE are different. For the Google Play dataset, Figs. 8(a) and (b) show that the unstable and worse MAE and RMSE are produced when given a larger (i.e., ) or a smaller (i.e., in the range from to ). The best performance is achieved by the relatively large value of (i.e., in the range from to ) with . Figures 8(c) and (d) report the results of the Apple App Store dataset, from which we can find that the performance is more stable than that of Google Play. Similarly, when the value of is larger or the value of is smaller, the MAE and RMSE are relatively higher. The best performance of the MAE is achieved by while the RMSE is much lower when the value of is set . The value of is in the range from to . For both datasets, the best performance is generated by a larger and the larger means the model hyperparameters for the category-specific weight matrix can be small. It indicates that the part of full-order interactions among multiple categories and multiple views of features is much more important.
6 Related Work
To the best of our knowledge, this is the first work to mine the full-order interactions among app context information with tensor analysis to facilitate mobile app recommendation. Conceptually, two topics are closely related to this work: mobile app recommendation, and tensor factorization and its applications. We give a short overview of these areas and distinguish our work from existing methods.
Mobile App Recommendation has drawn increasing attention as an effective way to alleviate information overload in app markets. Most existing works leverage one or more kinds of features to improve recommendation performance. Yan et al. [19] developed the AppJoy system, which recommends mobile apps based on an analysis of users’ usage records. Yin et al. [20] applied users’ view/download sequences to mine the actual value and tempting value of apps, which are used to build a recommendation model that considers the contest between apps. Some works incorporate features from other sources. For instance, to address the cold-start problem, Lin et al. [21] proposed to apply app followers’ features collected from Twitter to model the app and estimate which users may like it. Zhu et al. [3] presented a method to evaluate the security risks of apps and proposed a flexible app recommendation approach that combines apps’ popularity and users’ security preferences through modern portfolio theory. A recommendation model that captures the trade-off between app functionality and user privacy preference was proposed in [2]. However, most of these works did not consider the complex interactions among the features of different views. In this work, we propose to model the interactions as a tensor structure and leverage tensor factorization to learn the latent relationships.
Tensor Factorization and Applications. Tensor factorization decomposes a multidimensional tensor into a set of smaller factors. A comprehensive survey on tensor factorization can be found in [22]. Two well-known methods in this area are CANDECOMP/PARAFAC (CP) factorization and Tucker factorization. Both can be considered higher-order generalizations of singular value decomposition (SVD) and principal component analysis (PCA). These methods decompose tensor data into a simpler form that exposes better features and the intrinsic multi-way structure. CP factorization has been frequently investigated in the multi-view learning literature because of its simplicity. Specifically, Cao et al. [23] first used the outer product operator to fuse multi-view features into a tensor structure and proposed a CP factorization based multi-view feature selection method. Later, Cao et al. [16] extended this approach to consider the full-order interactions between features, and proposed a CP factorization based multi-view machine (MVM) for multi-view prediction problems. Recently, Lu et al. [17] extended the MVM method to multi-task multi-view prediction problems and proposed a CP factorization based multilinear factorization machine. However, to the best of our knowledge, none of these studies explored Tucker decomposition in the multi-view learning scenario. Tucker decomposition is more general than CP decomposition, and permits interactions within each mode while CP decomposition does not [18]. This paper applies Tucker decomposition to a multi-view learning task.
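The relationship between the two factorizations can be checked numerically: a CP model is a Tucker model whose core tensor is superdiagonal, whereas a dense core couples every latent component of one mode with every component of the others. The following is a minimal NumPy sketch for a third-order tensor; the sizes, the rank `R`, and the random factors are illustrative assumptions and are unrelated to the weight tensor learned by CATA.

```python
import numpy as np


def tucker_reconstruct(core, factors):
    """Tucker model: X[i,j,k] = sum_{p,q,r} G[p,q,r] * A[i,p] * B[j,q] * C[k,r]."""
    A, B, C = factors
    return np.einsum("pqr,ip,jq,kr->ijk", core, A, B, C)


def cp_reconstruct(factors):
    """CP model: X[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


rng = np.random.default_rng(0)
R = 3  # assumed number of latent components per mode
factors = (
    rng.standard_normal((4, R)),
    rng.standard_normal((5, R)),
    rng.standard_normal((6, R)),
)

# A superdiagonal core (ones at positions [r, r, r]) reduces Tucker to CP.
core_diag = np.zeros((R, R, R))
idx = np.arange(R)
core_diag[idx, idx, idx] = 1.0

cp_tensor = cp_reconstruct(factors)
tucker_diag = tucker_reconstruct(core_diag, factors)

# A dense core adds cross-component terms that CP cannot express
# with the same factor matrices.
core_dense = rng.standard_normal((R, R, R))
tucker_dense = tucker_reconstruct(core_dense, factors)

cp_params = sum(f.size for f in factors)
tucker_params = cp_params + core_dense.size
print(np.allclose(cp_tensor, tucker_diag), cp_params, tucker_params)
```

The extra `R**3` core entries are the price of the added generality, which is consistent with the observation that a Tucker-based model has more parameters to learn.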
7 Conclusions
In this paper, we propose a context-aware recommendation approach based on tensor analysis (CATA) for mobile apps. The proposed CATA approach models the interactions among the multiple categories and multiple views of features of apps as a tensor structure. CATA applies the Tucker decomposition to collectively learn the category-specific features and the latent relationships integrated in the full-order interactions without physically building the tensor. To further improve the performance of app recommendation, we present a group norm regularization for the global category-specific weight matrix. Extensive experiments based on two real-world app datasets demonstrate the effectiveness of the proposed CATA approach.
Footnotes
 Google Play: https://play.google.com/store/apps
 Apple App Store: https://itunes.apple.com/us/genre/ios/id36?mt=8
 Full-order interactions range from the first-order interactions (i.e., single-view features in each category) to the highest-order interactions (i.e., all combinations of features from multiple views and from different categories).
References
[1] A. Karatzoglou, L. Baltrunas, K. Church, and M. Böhmer, “Climbing the app wall: enabling mobile app discovery through context-aware recommendations,” in CIKM. ACM, 2012, pp. 2527–2530.
[2] B. Liu, D. Kong, L. Cen, N. Z. Gong, H. Jin, and H. Xiong, “Personalized mobile app recommendation: Reconciling app functionality and user privacy preference,” in WSDM. ACM, 2015, pp. 315–324.
[3] H. Zhu, H. Xiong, Y. Ge, and E. Chen, “Mobile app recommendations with security and privacy awareness,” in SIGKDD. ACM, 2014, pp. 951–960.
[4] C.-T. Lu, S. Xie, W. Shao, L. He, and P. S. Yu, “Item recommendation for emerging online businesses,” in IJCAI, 2016, pp. 3797–3803.
[5] W. Shao, L. He, and P. S. Yu, “Clustering on multi-source incomplete data via tensor modeling and factorization,” in PAKDD. Springer, 2015, pp. 485–497.
[6] W. Shao, L. He, C.-T. Lu, X. Wei, and P. S. Yu, “Online unsupervised multi-view feature selection,” in ICDM. IEEE, 2016, pp. 1203–1208.
[7] W. Shao, L. He, and P. S. Yu, “Multiple incomplete views clustering via weighted nonnegative matrix factorization with $\ell_{2,1}$ regularization,” in ECML/PKDD. Springer, 2015, pp. 318–334.
[8] H. Zhu, H. Xiong, Y. Ge, and E. Chen, “Discovery of ranking fraud for mobile apps,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 1, pp. 74–87, 2015.
[9] H. Zhu, C. Liu, Y. Ge, H. Xiong, and E. Chen, “Popularity modeling for mobile apps: A sequential approach,” IEEE Transactions on Cybernetics, vol. 45, no. 7, pp. 1303–1314, 2015.
[10] F. Ricci, L. Rokach, and B. Shapira, Introduction to recommender systems handbook. Springer, 2011.
[11] S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
[12] H. Wang, F. Nie, H. Huang, J. Yan, S. Kim, S. Risacher, A. Saykin, and L. Shen, “High-order multi-task feature learning to identify longitudinal phenotypic markers for Alzheimer’s disease progression prediction,” in NIPS, 2012, pp. 1277–1285.
[13] A. Gunawardana and G. Shani, “A survey of accuracy evaluation metrics of recommendation tasks,” Journal of Machine Learning Research, vol. 10, no. Dec, pp. 2935–2962, 2009.
[14] R. Salakhutdinov and A. Mnih, “Probabilistic matrix factorization,” in NIPS, vol. 1, no. 1, 2007, pp. 2–1.
[15] S. Rendle, “Factorization machines,” in ICDM. IEEE, 2010, pp. 995–1000.
[16] B. Cao, H. Zhou, G. Li, and P. S. Yu, “Multi-view machines,” in WSDM. ACM, 2016, pp. 427–436.
[17] C.-T. Lu, L. He, W. Shao, B. Cao, and P. S. Yu, “Multilinear factorization machines for multi-task multi-view learning,” in WSDM. ACM, 2017, pp. 701–709.
[18] A. Cichocki, M. Mørup, P. Smaragdis, W. Wang, and R. Zdunek, “Advances in nonnegative matrix and tensor factorization,” Computational Intelligence and Neuroscience: CIN, 2008.
[19] B. Yan and G. Chen, “AppJoy: personalized mobile application discovery,” in MobiSys. ACM, 2011, pp. 113–126.
[20] P. Yin, P. Luo, W.-C. Lee, and M. Wang, “App recommendation: a contest between satisfaction and temptation,” in WSDM. ACM, 2013, pp. 395–404.
[21] J. Lin, K. Sugiyama, M.-Y. Kan, and T.-S. Chua, “Addressing cold-start in app recommendation: latent user models constructed from Twitter followers,” in SIGIR. ACM, 2013, pp. 283–292.
[22] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[23] B. Cao, L. He, X. Kong, P. S. Yu, Z. Hao, and A. B. Ragin, “Tensor-based multi-view feature selection with applications to brain diseases,” in ICDM. IEEE, 2014, pp. 40–49.