Inductive Framework for Multi-Aspect Streaming Tensor Completion with Side Information

Madhav Nimishakavi, Bamdev Mishra, Manish Gupta, Partha Talukdar
Indian Institute of Science, India
Microsoft, India
madhav@iisc.ac.in, bamdevm@microsoft.com, manishg@microsoft.com, ppt@iisc.ac.in
Abstract

Low-rank tensor completion is a well-studied problem and has applications in various fields. However, in many real-world applications the data is dynamic, i.e., the tensor grows as new data arrives. Moreover, in many real-world scenarios, side information is also available in the form of matrices that grow over time as well. Existing work on dynamic tensor completion does not incorporate side information, and most prior work assumes that the tensor grows in only one mode. We bridge this gap in this paper by proposing a dynamic tensor completion framework called Side Information infused Incremental Tensor Analysis (SIITA), which incorporates side information and works for general incremental tensors. We carry out extensive experiments on multiple real-world datasets to demonstrate the effectiveness of SIITA in various settings.

1 Introduction

Low-rank tensor completion is a well-studied problem with applications in recommendation systems [32], link prediction [4], and compressed sensing [3], to name a few. The majority of previous works focus on solving the problem in a static setting [7, 10, 15]. However, most real-world data is dynamic; for example, in an online movie recommendation system the numbers of users and movies increase with time. It is prohibitively expensive to use static algorithms on dynamic data. Therefore, there has been increasing interest in developing algorithms for dynamic low-rank tensor completion [16, 20, 29].

In many real-world scenarios, besides the tensor data, additional side information is also available, e.g., in the form of matrices, such as movie-genre information in movie recommendation. In dynamic scenarios, the side information grows with time as well. There has been a considerable amount of work on incorporating side information into tensor completion [23, 8]; however, previous works only deal with the static setting. In this paper, we propose a dynamic low-rank tensor completion model that incorporates side information which grows with time.

Most current dynamic tensor completion algorithms work in the streaming scenario, i.e., the case where the tensor grows in only one mode, which is usually the time mode. In this case, the side information is a static matrix. The multi-aspect streaming scenario [6, 29], on the other hand, is a more general framework where the tensor grows in all modes. In this setting, the side information matrices also grow. Figure 1 illustrates the difference between the streaming and multi-aspect streaming scenarios with side information.

Besides side information, incorporating nonnegative constraints into tensor decomposition is desirable in an unsupervised setting. Nonnegativity is essential for discovering interpretable clusters [12, 22]. Nonnegative tensor learning has been explored for applications in computer vision [27, 17] and unsupervised induction of relation schemas [25], to name a few. Several algorithms for online Nonnegative Matrix Factorization (NMF) exist in the literature [19, 9, 36], but to the best of our knowledge, algorithms for nonnegative online tensor decomposition with side information have not been explored. We fill this gap as well by showing how nonnegative constraints can be enforced on the decomposition learned by our proposed framework SIITA.

(a) Streaming tensor sequence with side information. (b) Multi-aspect streaming tensor sequence with side information.
Figure 1: Illustration of streaming and multi-aspect streaming sequences with side information. The blue block represents the tensor at time step $t$ and the green block represents the side information. The blocks in grey represent the data at previous time steps. For ease of understanding, we show side information along only one mode.

In this paper, we work with the more general multi-aspect streaming scenario and make the following contributions:

  • Formally define the problem of multi-aspect streaming tensor completion with side information.

  • Propose a Tucker-based framework, Side Information infused Incremental Tensor Analysis (SIITA), for the problem of multi-aspect streaming tensor completion with side information. We employ a stochastic gradient descent (SGD) based algorithm for solving the optimization problem.

  • Incorporate nonnegative constraints into SIITA for discovering the underlying clusters in the unsupervised setting.

  • Demonstrate the effectiveness of SIITA using extensive experimental analysis on multiple real-world datasets in all the settings.

The organization of the paper is as follows. We discuss related work in Section 2, introduce the definition of a multi-aspect streaming tensor sequence with side information in Section 3, and present our proposed framework SIITA in Section 4. We also discuss how nonnegative constraints can be incorporated into SIITA in Section 4. The experiments are presented in Section 5, where SIITA performs effectively in various settings. All our code is implemented in MATLAB and can be found at https://madhavcsa.github.io/.

2 Related Work

Property                 TeCPSGD [20]   OLSTEC [16]   MAST [29]   AirCP [8]   SIITA (this paper)
Streaming                ✓              ✓             ✓                       ✓
Multi-Aspect Streaming                                ✓                       ✓
Side Information                                                  ✓           ✓
Sparse Solution                                                               ✓
Table 1: Summary of different tensor streaming algorithms.

Dynamic Tensor Completion: [30, 31] introduce the concept of dynamic tensor analysis by proposing multiple higher-order SVD based algorithms, namely Dynamic Tensor Analysis (DTA), Streaming Tensor Analysis (STA), and Window-based Tensor Analysis (WTA), for the streaming scenario. [26] propose two adaptive online algorithms for the CP decomposition of third-order tensors. [35] propose an accelerated online algorithm for Tucker factorization in the streaming scenario, while an accelerated online algorithm for CP decomposition is developed in [37].

A significant amount of research has been carried out on dynamic tensor decomposition, but the problem of dynamic tensor completion is relatively less explored. The work of [20] can be considered pioneering in dynamic tensor completion; they propose a streaming tensor completion algorithm based on CP decomposition. Recent work by [16] is an accelerated second-order Stochastic Gradient Descent (SGD) algorithm for streaming tensor completion based on CP decomposition. [6] introduce the problem of multi-aspect streaming tensor analysis by proposing a histogram-based algorithm. Recent work by [29] is a more general framework for multi-aspect streaming tensor completion.

Tensor Completion with Auxiliary Information: [1] propose a Coupled Matrix Tensor Factorization (CMTF) approach for incorporating additional side information; similar ideas are also explored in [2] for factorization on Hadoop and in [5] for link prediction in heterogeneous data. [23] propose within-mode and cross-mode regularization methods for incorporating similarity side information matrices into factorization. Based on similar ideas, [8] propose AirCP, a CP-based tensor completion algorithm.

[33] propose nonnegative tensor decomposition by incorporating nonnegative constraints into CP decomposition. Nonnegative CP decomposition is explored for applications in computer vision in [27]. Algorithms for nonnegative Tucker decomposition are proposed in [17], and for sparse nonnegative Tucker decomposition in [21]. However, to the best of our knowledge, nonnegative tensor decomposition algorithms do not exist for dynamic settings, a gap we fill in this paper.

An inductive framework for matrix completion with side information is proposed in [13, 24, 28], but to the best of our knowledge it has not been explored for tensor completion. In this paper, we propose an online inductive framework for multi-aspect streaming tensor completion.

Table 1 provides details about the differences between our proposed SIITA and various baseline tensor completion algorithms.

3 Preliminaries

An $N$-order or $N$-mode tensor is an $N$-way array. We use boldface calligraphic letters to represent tensors (e.g., $\boldsymbol{\mathcal{X}}$), boldface uppercase letters to represent matrices (e.g., $\mathbf{U}$), and boldface lowercase letters to represent vectors (e.g., $\mathbf{v}$). $\mathcal{X}_{i_1 i_2 \ldots i_N}$ represents the entry of $\boldsymbol{\mathcal{X}}$ indexed by $(i_1, i_2, \ldots, i_N)$.
Definition 1 (Coupled Tensor and Matrix) [29]: A matrix and a tensor are called coupled if they share a mode. For example, a user × movie × time tensor and a movie × genre matrix are coupled along the movie mode.
Definition 2 (Tensor Sequence) [29]: A sequence of $N$-order tensors $\boldsymbol{\mathcal{X}}^{(1)}, \boldsymbol{\mathcal{X}}^{(2)}, \ldots, \boldsymbol{\mathcal{X}}^{(t)}, \ldots$ is called a tensor sequence, denoted $\{\boldsymbol{\mathcal{X}}^{(t)}\}$, where each $\boldsymbol{\mathcal{X}}^{(t)} \in \mathbb{R}^{I_1^t \times I_2^t \times \cdots \times I_N^t}$ at time instance $t$.
Definition 3 (Multi-aspect streaming Tensor Sequence) [29]: A tensor sequence $\{\boldsymbol{\mathcal{X}}^{(t)}\}$ of $N$-order tensors is called a multi-aspect streaming tensor sequence if for any $t \in \mathbb{Z}^+$, $\boldsymbol{\mathcal{X}}^{(t-1)}$ is a sub-tensor of $\boldsymbol{\mathcal{X}}^{(t)}$, i.e.,

$$\boldsymbol{\mathcal{X}}^{(t-1)} \subseteq \boldsymbol{\mathcal{X}}^{(t)}.$$

Here, $I_n^t$ increases with time, and $\boldsymbol{\mathcal{X}}^{(t)}$ is the snapshot tensor of this sequence at time $t$.
Definition 4 (Multi-aspect streaming Tensor Sequence with Side Information): Given a time instance $t$, let $\mathbf{A}_n^t \in \mathbb{R}^{I_n^t \times M_n}$ be a side information (SI) matrix corresponding to the $n$-th mode of $\boldsymbol{\mathcal{X}}^{(t)}$ (i.e., rows of $\mathbf{A}_n^t$ are coupled along mode $n$ of $\boldsymbol{\mathcal{X}}^{(t)}$). While the number of rows in the SI matrices along a particular mode may increase over time, the number of columns remains the same, i.e., $M_n$ is not dependent on time. In particular, we have

$$\mathbf{A}_n^{t-1} \in \mathbb{R}^{I_n^{t-1} \times M_n}, \quad \mathbf{A}_n^{t} \in \mathbb{R}^{I_n^{t} \times M_n}, \quad I_n^{t-1} \leq I_n^{t}, \quad \forall n, t.$$

Putting the side information matrices of all the modes together, we get the side information set $\mathcal{A}^t$,

$$\mathcal{A}^t = \{\mathbf{A}_1^t, \mathbf{A}_2^t, \ldots, \mathbf{A}_N^t\}.$$

Given an $N$-order multi-aspect streaming tensor sequence $\{\boldsymbol{\mathcal{X}}^{(t)}\}$, we define a multi-aspect streaming tensor sequence with side information as $\{(\boldsymbol{\mathcal{X}}^{(t)}, \mathcal{A}^t)\}$.

We note that side information may not be available for all modes. In such cases, an identity matrix of appropriate size may be used as the SI matrix, i.e., $\mathbf{A}_n^t = \mathbb{I}_{I_n^t \times I_n^t}$, where $M_n = I_n^t$.
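To make this concrete, here is a minimal sketch of assembling the side information set for a 3-mode tensor where only one mode has features. The sketches in this paper are in NumPy (the released implementation is in MATLAB); the dimensions follow the MovieLens 100K setup of Table 2, and the random feature matrix is purely illustrative.

```python
import numpy as np

# Side information set A^t for a 3-mode tensor where only mode 2 (movie)
# has features; modes without side information get identity matrices.
I_t = (943, 1682, 31)               # current tensor dimensions (MovieLens 100K)
A_movie = np.random.rand(1682, 19)  # stand-in for the movie x genre matrix
A_t = [np.eye(I_t[0]), A_movie, np.eye(I_t[2])]
```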

The problem of multi-aspect streaming tensor completion with side information is formally defined as follows:

Problem Definition: Given a multi-aspect streaming tensor sequence with side information $\{(\boldsymbol{\mathcal{X}}^{(t)}, \mathcal{A}^t)\}$, the goal at every time step $t$ is to predict the missing values in $\boldsymbol{\mathcal{X}}^{(t)}$ by utilizing only the entries in the relative complement $\boldsymbol{\mathcal{X}}^{(t)} \setminus \boldsymbol{\mathcal{X}}^{(t-1)}$ and the available side information $\mathcal{A}^t$.

4 Proposed Framework SIITA

In this section, we discuss the proposed framework SIITA for the problem of multi-aspect streaming tensor completion with side information. Let $\{(\boldsymbol{\mathcal{X}}^{(t)}, \mathcal{A}^t)\}$ be an $N$-order multi-aspect streaming tensor sequence with side information. We assume that, at every time step $t$, the entries of $\boldsymbol{\mathcal{X}}^{(t)}$ are observed only for the indices in a set $\Omega_t$, a subset of the complete set of indices. Let the sparsity operator $P_{\Omega}$ be defined as:

$$P_{\Omega}(\boldsymbol{\mathcal{X}})_{i_1 i_2 \ldots i_N} = \begin{cases} \mathcal{X}_{i_1 i_2 \ldots i_N}, & \text{if } (i_1, i_2, \ldots, i_N) \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$$
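As a quick illustration, the sparsity operator is just an element-wise mask; a one-line NumPy sketch (our own, not from the released code):

```python
import numpy as np

def P_Omega(X, mask):
    """Sparsity operator P_Omega: keep observed entries, zero out the rest."""
    return np.where(mask, X, 0.0)

X = np.arange(6.0).reshape(2, 3)
mask = np.array([[True, False, True], [False, True, False]])
print(P_Omega(X, mask))  # unobserved entries become 0
```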
Tucker tensor decomposition [18] is a form of higher-order PCA for tensors. It decomposes an $N$-order tensor $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ into a core tensor multiplied by a matrix along each mode, as follows:

$$\boldsymbol{\mathcal{X}} = \boldsymbol{\mathcal{G}} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N =: [\![\boldsymbol{\mathcal{G}}; \mathbf{U}_1, \ldots, \mathbf{U}_N]\!],$$

where $\mathbf{U}_k \in \mathbb{R}^{I_k \times r_k}$ ($k = 1, \ldots, N$) are the factor matrices and can be thought of as principal components in each mode. The tensor $\boldsymbol{\mathcal{G}} \in \mathbb{R}^{r_1 \times \cdots \times r_N}$ is called the core tensor, which shows the interaction between different components, and $(r_1, \ldots, r_N)$ is the (multilinear) rank of the tensor. The $k$-mode matrix product of a tensor $\boldsymbol{\mathcal{X}}$ with a matrix $\mathbf{U}$ is denoted by $\boldsymbol{\mathcal{X}} \times_k \mathbf{U}$; more details can be found in [18]. The standard approach of incorporating side information while learning the factor matrices in Tucker decomposition is through an additive regularization term [23]. However, in an online setting the additive side information term poses challenges, as the side information matrices are also dynamic. Therefore, we propose the following fixed-rank inductive framework for recovering the missing values in $\boldsymbol{\mathcal{X}}^{(t)}$ at every time step $t$:

$$\min_{\substack{\mathbf{U}_k \in \mathbb{R}^{M_k \times r_k},\, k = 1, \ldots, N \\ \boldsymbol{\mathcal{G}} \in \mathbb{R}^{r_1 \times \cdots \times r_N}}} \frac{1}{2} \left\| P_{\Omega_t}\!\left(\boldsymbol{\mathcal{X}}^{(t)} - [\![\boldsymbol{\mathcal{G}}; \mathbf{A}_1^t \mathbf{U}_1, \ldots, \mathbf{A}_N^t \mathbf{U}_N]\!]\right) \right\|_F^2 + \frac{\lambda}{2} \sum_{k=1}^{N} \|\mathbf{U}_k\|_F^2 + \frac{\lambda_g}{2} \|\boldsymbol{\mathcal{G}}\|_F^2, \tag{1}$$

where

$$[\![\boldsymbol{\mathcal{G}}; \mathbf{A}_1^t \mathbf{U}_1, \ldots, \mathbf{A}_N^t \mathbf{U}_N]\!] = \boldsymbol{\mathcal{G}} \times_1 (\mathbf{A}_1^t \mathbf{U}_1) \times_2 (\mathbf{A}_2^t \mathbf{U}_2) \cdots \times_N (\mathbf{A}_N^t \mathbf{U}_N), \tag{2}$$

$\|\cdot\|_F$ is the Frobenius norm, and $\lambda, \lambda_g$ are the regularization weights. Conceptually, the inductive framework models the ratings of the tensor as a weighted scalar product of the side information matrices. Note that (1) is a generalization of the inductive matrix completion framework [13, 24, 28], which has been effective in many applications.
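The following sketch shows how the reconstruction in (2) can be computed with mode products. It is a minimal NumPy illustration of the inductive Tucker model; the helper names mode_product and inductive_tucker are ours, and the dimensions are toy values.

```python
import numpy as np

def mode_product(T, M, n):
    """n-mode product T x_n M: multiply matrix M along mode n of tensor T."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

def inductive_tucker(G, A_list, U_list):
    """Reconstruction (2): core G multiplied by A_k U_k along each mode k."""
    X_hat = G
    for n, (A, U) in enumerate(zip(A_list, U_list)):
        X_hat = mode_product(X_hat, A @ U, n)
    return X_hat

# Toy sizes: I = (4, 5, 3) tensor, M = (2, 3, 3) features, rank r = (2, 2, 2)
rng = np.random.default_rng(0)
I, M, r = (4, 5, 3), (2, 3, 3), (2, 2, 2)
A_list = [rng.standard_normal((I[n], M[n])) for n in range(3)]
U_list = [rng.standard_normal((M[n], r[n])) for n in range(3)]
G = rng.standard_normal(r)
print(inductive_tucker(G, A_list, U_list).shape)  # (4, 5, 3)
```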

The inductive tensor framework has two-fold benefits over the typical approach of incorporating side information via an additive term. The use of the $\mathbf{A}_k^t \mathbf{U}_k$ terms in the factorization reduces the dimensionality of the variables from $\sum_k I_k^t \times r_k$ to $\sum_k M_k \times r_k$, and typically $M_k \ll I_k^t$. As a result, the computational time required for computing the gradients and updating the variables decreases remarkably. Similar to [17], we define

$$\mathbf{V}_{\neq k} = (\mathbf{A}_{k-1}^t \mathbf{U}_{k-1}) \otimes \cdots \otimes (\mathbf{A}_1^t \mathbf{U}_1) \otimes (\mathbf{A}_N^t \mathbf{U}_N) \otimes \cdots \otimes (\mathbf{A}_{k+1}^t \mathbf{U}_{k+1}),$$

which collects the Kronecker products of the mode matrices except for the $k$-th, in a backward cyclic manner.

The gradients of (1) with respect to $\mathbf{U}_k$, for $k = 1, \ldots, N$, and $\boldsymbol{\mathcal{G}}$ can be computed as follows:

$$\frac{\partial F_t}{\partial \mathbf{U}_k} = (\mathbf{A}_k^t)^{\top} \mathbf{R}_{(k)} \mathbf{V}_{\neq k} \mathbf{G}_{(k)}^{\top} + \lambda \mathbf{U}_k, \qquad \frac{\partial F_t}{\partial \boldsymbol{\mathcal{G}}} = \boldsymbol{\mathcal{R}} \times_1 (\mathbf{A}_1^t \mathbf{U}_1)^{\top} \cdots \times_N (\mathbf{A}_N^t \mathbf{U}_N)^{\top} + \lambda_g \boldsymbol{\mathcal{G}}, \tag{3}$$

where $F_t$ denotes the objective in (1),

$$\boldsymbol{\mathcal{R}} = P_{\Omega_t}\!\left(\hat{\boldsymbol{\mathcal{X}}}^{(t)} - \boldsymbol{\mathcal{X}}^{(t)}\right)$$

is the masked residual tensor, $\hat{\boldsymbol{\mathcal{X}}}^{(t)}$ is the current reconstruction (2), and $\mathbf{R}_{(k)}$ and $\mathbf{G}_{(k)}$ denote the mode-$k$ matricizations of $\boldsymbol{\mathcal{R}}$ and $\boldsymbol{\mathcal{G}}$, respectively.
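A NumPy sketch of these gradients is below, reusing mode_product and inductive_tucker from the previous snippet. Note that the Kronecker ordering here matches NumPy's C-order unfolding rather than the paper's backward-cyclic convention; the two are equivalent up to a consistent permutation of columns. This is our own illustrative derivation, not the authors' code.

```python
from functools import reduce
import numpy as np

def unfold(T, k):
    """Mode-k unfolding with C-order columns."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def siita_grads(X, mask, G, A_list, U_list, lam, lam_g):
    """Gradients of (1) wrt each U_k and the core G (see eq. (3))."""
    AU = [A @ U for A, U in zip(A_list, U_list)]
    R = np.where(mask, inductive_tucker(G, A_list, U_list) - X, 0.0)  # residual
    grads_U = []
    for k in range(len(U_list)):
        # Kronecker product of A_j U_j over all modes j != k
        V = reduce(np.kron, [AU[j] for j in range(len(AU)) if j != k])
        grads_U.append(A_list[k].T @ unfold(R, k) @ V @ unfold(G, k).T
                       + lam * U_list[k])
    grad_G = R
    for n, M in enumerate(AU):      # R x_1 (A_1 U_1)^T ... x_N (A_N U_N)^T
        grad_G = mode_product(grad_G, M.T, n)
    return grads_U, grad_G + lam_g * G
```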
By updating the variables using the gradients given in (3), we can recover the missing entries in $\boldsymbol{\mathcal{X}}^{(t)}$ at every time step $t$; however, that is equivalent to performing a static tensor completion at every time step. Therefore, we need an incremental scheme for updating the variables. Let $\mathbf{U}_k^{t}$ and $\boldsymbol{\mathcal{G}}^{t}$ represent the variables at time step $t$. By Definition 3,

$$\boldsymbol{\mathcal{X}}^{(t-1)} \subseteq \boldsymbol{\mathcal{X}}^{(t)} \quad \text{and hence} \quad \Omega_{t-1} \subseteq \Omega_t; \tag{4}$$

since $\boldsymbol{\mathcal{X}}^{(t-1)}$ is recovered at time step $t-1$, the problem is equivalent to using only

$$P_{\Omega_t \setminus \Omega_{t-1}}\!\left(\boldsymbol{\mathcal{X}}^{(t)}\right)$$

for updating the variables at time step $t$.

We propose the following approach to update the variables at every time step $t$:

$$\mathbf{U}_k^{t} = \mathbf{U}_k^{t-1} - \eta \, \frac{\partial F_t}{\partial \mathbf{U}_k}, \quad k = 1, \ldots, N, \qquad \boldsymbol{\mathcal{G}}^{t} = \boldsymbol{\mathcal{G}}^{t-1} - \eta \, \frac{\partial F_t}{\partial \boldsymbol{\mathcal{G}}}, \tag{5}$$

where $\eta$ is the step size for the gradients. $\hat{\boldsymbol{\mathcal{X}}}^{(t)}$, needed for computing the gradients in (3), is given by

$$\hat{\boldsymbol{\mathcal{X}}}^{(t)} = [\![\boldsymbol{\mathcal{G}}^{t-1}; \mathbf{A}_1^t \mathbf{U}_1^{t-1}, \ldots, \mathbf{A}_N^t \mathbf{U}_N^{t-1}]\!]. \tag{6}$$
Input: $\{(\boldsymbol{\mathcal{X}}^{(t)}, \mathcal{A}^t)\}$, rank $(r_1, \ldots, r_N)$, step size $\eta$, inner iterations $K$
Randomly initialize $\mathbf{U}_k^0$ for $k = 1, \ldots, N$ and $\boldsymbol{\mathcal{G}}^0$;
for t = 1, 2, … do
        $\mathbf{U}_k \leftarrow \mathbf{U}_k^{t-1}$ for $k = 1, \ldots, N$;
        $\boldsymbol{\mathcal{G}} \leftarrow \boldsymbol{\mathcal{G}}^{t-1}$;
        for i = 1:K do
                Compute $\hat{\boldsymbol{\mathcal{X}}}^{(t)}$ from (6) using $\mathbf{U}_k$ and $\boldsymbol{\mathcal{G}}$;
                Compute $\partial F_t / \partial \mathbf{U}_k$ for $k = 1, \ldots, N$ from (3);
                Update $\mathbf{U}_k$ using $\partial F_t / \partial \mathbf{U}_k$ and $\eta$ in (5);
                Compute $\partial F_t / \partial \boldsymbol{\mathcal{G}}$ from (3);
                Update $\boldsymbol{\mathcal{G}}$ using $\partial F_t / \partial \boldsymbol{\mathcal{G}}$ and $\eta$ in (5);
        end for
        $\mathbf{U}_k^{t} \leftarrow \mathbf{U}_k$ for $k = 1, \ldots, N$;
        $\boldsymbol{\mathcal{G}}^{t} \leftarrow \boldsymbol{\mathcal{G}}$;
end for
Return: $\mathbf{U}_k^{t}$ for $k = 1, \ldots, N$, and $\boldsymbol{\mathcal{G}}^{t}$.
Algorithm 1: Proposed SIITA Algorithm

Algorithm 1 summarizes the procedure described above. The computational cost of implementing Algorithm 1 depends on the update of the variables in (5) and the computations in (6). Both scale with the number of observed entries $|\Omega_t|$ and the reduced dimensions $M_k$ and $r_k$, rather than with the full tensor dimensions $I_k^t$, which is what keeps the per-time-step cost of Algorithm 1 modest on sparse data.
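Putting the pieces together, one time step of Algorithm 1 might look like the sketch below (our own illustration, building on siita_grads above; the hyperparameter defaults are placeholders, not the paper's tuned values). In a stream, X_t and mask_t would carry only the newly observed entries $P_{\Omega_t \setminus \Omega_{t-1}}(\boldsymbol{\mathcal{X}}^{(t)})$, while the factors are warm-started from the previous step.

```python
def siita_step(X_t, mask_t, A_t, U_list, G, eta=0.01, lam=1e-3, lam_g=1e-3, K=5):
    """One outer iteration of Algorithm 1: K inner SGD updates at time step t."""
    for _ in range(K):
        grads_U, grad_G = siita_grads(X_t, mask_t, G, A_t, U_list, lam, lam_g)
        U_list = [U - eta * gU for U, gU in zip(U_list, grads_U)]  # update (5)
        G = G - eta * grad_G                                       # update (5)
    return U_list, G
```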

Extension to the nonnegative case: NN-SIITA

We now discuss how nonnegative constraints can be incorporated into the decomposition learned by SIITA. Nonnegative constraints allow the factors of the tensor to be interpretable.

We denote SIITA with nonnegative constraints by NN-SIITA. At every time step in the multi-aspect streaming setting, we seek to learn the following decomposition:

$$\min_{\substack{\mathbf{U}_k \geq 0,\, k = 1, \ldots, N \\ \boldsymbol{\mathcal{G}} \geq 0}} \frac{1}{2} \left\| P_{\Omega_t}\!\left(\boldsymbol{\mathcal{X}}^{(t)} - [\![\boldsymbol{\mathcal{G}}; \mathbf{A}_1^t \mathbf{U}_1, \ldots, \mathbf{A}_N^t \mathbf{U}_N]\!]\right) \right\|_F^2 + \frac{\lambda}{2} \sum_{k=1}^{N} \|\mathbf{U}_k\|_F^2 + \frac{\lambda_g}{2} \|\boldsymbol{\mathcal{G}}\|_F^2, \tag{7}$$

where $[\![\boldsymbol{\mathcal{G}}; \mathbf{A}_1^t \mathbf{U}_1, \ldots, \mathbf{A}_N^t \mathbf{U}_N]\!]$ is as given in (2).

We employ a projected gradient descent based algorithm for solving the optimization problem in (7). We follow the same incremental update scheme discussed in Algorithm 1; however, we use a projection operator, defined below, for updating the variables. For NN-SIITA, (5) is replaced with

$$\mathbf{U}_k^{t} = \Pi_{+}\!\left(\mathbf{U}_k^{t-1} - \eta \, \frac{\partial F_t}{\partial \mathbf{U}_k}\right), \qquad \boldsymbol{\mathcal{G}}^{t} = \Pi_{+}\!\left(\boldsymbol{\mathcal{G}}^{t-1} - \eta \, \frac{\partial F_t}{\partial \boldsymbol{\mathcal{G}}}\right),$$

where $\Pi_{+}$ is the element-wise projection operator defined as

$$\Pi_{+}(x) = \max(x, 0).$$

The projection operator maps a point back to the feasible region, ensuring that the factor matrices and the core tensor remain nonnegative across iterations.
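In code, the only change from the SIITA step sketched earlier is the element-wise projection after each update; a minimal NumPy version (again ours, not the released MATLAB code):

```python
import numpy as np

def nn_siita_step(X_t, mask_t, A_t, U_list, G, eta=0.01, lam=1e-3, lam_g=1e-3, K=5):
    """NN-SIITA step: projected gradient descent onto the nonnegative orthant."""
    proj = lambda M: np.maximum(M, 0.0)   # element-wise projection operator
    for _ in range(K):
        grads_U, grad_G = siita_grads(X_t, mask_t, G, A_t, U_list, lam, lam_g)
        U_list = [proj(U - eta * gU) for U, gU in zip(U_list, grads_U)]
        G = proj(G - eta * grad_G)
    return U_list, G
```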

5 Experiments

We evaluate SIITA against other state-of-the-art baselines in two dynamic settings, viz., (1) the multi-aspect streaming setting (Section 5.1), and (2) the traditional streaming setting (Section 5.2). We then evaluate the effectiveness of SIITA in the non-streaming batch setting (Section 5.3). We analyze the effect of different types of side information in Section 5.4. Finally, we evaluate the performance of NN-SIITA in the unsupervised setting in Section 5.5.

Datasets: The datasets used in the experiments are summarized in Table 2. MovieLens 100K [11] is a standard movie recommendation dataset. YELP is a downsampled version of the YELP(Full) dataset [14]. The YELP(Full) review dataset consists of a 70K (user) × 15K (business) × 108 (year-month) tensor and a side information matrix of size 15K (business) × 68 (city). We select a subset of this dataset because the various baseline algorithms compared are unable to handle datasets of this size. We note that SIITA, our proposed method, does not have such scalability concerns; in fact, as we show later in Section 5.4, SIITA is able to process datasets of much larger sizes. To create YELP out of YELP(Full), we select the 1000 most frequent users and the 1000 most frequent businesses and create the corresponding tensor and side information matrix. After the sampling, we obtain a tensor of dimensions 1000 (user) × 992 (business) × 93 (year-month) and a side information matrix of dimensions 992 (business) × 56 (city).

                     MovieLens 100K               YELP
Modes                user × movie × week          user × business × year-month
Tensor size          943 × 1682 × 31              1000 × 992 × 93
Starting size        19 × 34 × 2                  20 × 20 × 2
Increment step       19, 34, 1                    20, 20, 2
Side info matrix     1682 (movie) × 19 (genre)    992 (business) × 56 (city)
Table 2: Summary of datasets used in the paper. The starting size and increment step given in the table are for the multi-aspect streaming setting. For the streaming setting, the tensor grows in the third dimension, one slice at every time step.

5.1 Multi-Aspect Streaming Setting

Dataset          Missing%   Rank   MAST   SIITA
MovieLens 100K   20%        3      1.60   1.23
                            5      1.53   1.29
                            10     1.48   2.49
                 50%        3      1.74   1.28
                            5      1.75   1.29
                            10     1.64   2.55
                 80%        3      2.03   1.59
                            5      1.98   1.61
                            10     2.02   2.96
YELP             20%        3      1.90   1.43
                            5      1.92   1.54
                            10     1.93   4.03
                 50%        3      1.94   1.51
                            5      1.94   1.67
                            10     1.96   4.04
                 80%        3      1.97   1.71
                            5      1.97   1.61
                            10     1.97   3.49
Table 3: Test RMSE (lower is better) averaged across all the time steps in the multi-aspect streaming tensor sequence setting (Definition 4) for MAST and SIITA. SIITA, the proposed method, outperforms MAST for all the datasets. Section 5.1 provides more details.
(a) MovieLens 100K (20% Missing) (b) YELP (20% Missing)
Figure 2: Evolution of test RMSE of MAST and SIITA with each time step. For both the datasets, SIITA attains a stable performance after a few time steps, while the performance of MAST degrades with every time step. Refer to Section 5.1 for more details.
(a) MovieLens 100K (20% Missing) (b) YELP (20% Missing)
Figure 3: Runtime comparison between MAST and SIITA at every time step. SIITA is significantly faster than MAST. Refer to Section 5.1 for more details.

We start with an experimental analysis of the model in the multi-aspect streaming setting, for which we consider MAST [29], the state-of-the-art baseline.
MAST [29]: MAST is a dynamic low-rank tensor completion algorithm that enforces nuclear norm regularization on the decomposition matrices of CP. A tensor-based Alternating Direction Method of Multipliers (ADMM) is used for solving the optimization problem.

We experiment with the MovieLens 100K and YELP datasets. Since the third mode is time in both datasets, i.e., (week) in MovieLens 100K and (year-month) in YELP, one way to simulate the multi-aspect streaming sequence (Definition 3) is to consider every slice in the third mode as one time step in the sequence and let the tensor grow along the other two modes with every time step, similar to the ladder structure given in [29, Section 3.3]. Note that this is different from the traditional streaming setting, where the tensor only grows in the time mode while the other two modes remain fixed. In the multi-aspect setting, by contrast, there can be new users joining the system within the same month but on different days, or different movies getting released on different days in the same week, etc. Therefore, in our simulations we treat the third mode as any normal mode and generate a more general multi-aspect streaming tensor sequence; the sizes of the starting tensor and the increase in size at every time step are given in Table 2, and a sketch of this simulation follows below. Parameters for MAST are set based on the guidelines provided in [29, Section 4.3].
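For concreteness, a multi-aspect streaming sequence can be simulated from a fully loaded tensor by yielding a growing sub-tensor. The sketch below is our own, using the MovieLens 100K starting sizes and increments from Table 2, and grows the movie-mode side information in lockstep.

```python
import numpy as np

def multi_aspect_stream(X, A_movie, start=(19, 34, 2), step=(19, 34, 1)):
    """Yield (sub-tensor, side-info) snapshots of a growing multi-aspect stream."""
    sizes = list(start)
    while True:
        sizes = [min(s, d) for s, d in zip(sizes, X.shape)]   # clip at full size
        yield X[:sizes[0], :sizes[1], :sizes[2]], A_movie[:sizes[1], :]
        if sizes == list(X.shape):                            # whole tensor seen
            break
        sizes = [s + d for s, d in zip(sizes, step)]

# e.g. for a 943 x 1682 x 31 tensor with a 1682 x 19 genre matrix:
# for X_t, A_t in multi_aspect_stream(X, A_genre): ...
```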

We compute the root mean square error on test data (test RMSE; lower is better) at every time step and report the test RMSE averaged across all time steps in Table 3. We perform experiments on multiple train-test splits for each dataset. We vary the test percentage, denoted by Missing% in Table 3, and the rank of decomposition, denoted by Rank, for both datasets. For every (Missing%, Rank) combination, we run both models on ten random train-test splits and report the average. For SIITA, Rank = $r$ in Table 3 represents the Tucker rank $(r, r, r)$.

As can be seen from Table 3, the proposed SIITA achieves better results than MAST. Figure 2 shows the plots of test RMSE at every time step. Since SIITA handles the sparsity in the data effectively, it is significantly faster than MAST, as can be seen from Figure 3. Overall, we find that SIITA, the proposed method, is more effective and faster than MAST in the multi-aspect streaming setting.

5.2 Streaming Setting

(a) YELP (20% Missing) (b) MovieLens 100K (20% Missing)
Figure 4: Evolution of Test RMSE of TeCPSGD, OLSTEC and SIITA with each time step. In both datasets, SIITA performs significantly better than the baseline algorithms in the pure streaming setting. Refer to Section 5.2 for more details.
(a) YELP (20% Missing) (b) MovieLens 100K (20% Missing)
Figure 5: Runtime comparison between TeCPSGD, OLSTEC and SIITA. SIITA is able to exploit sparsity in the data and is much faster. Refer to Section 5.2 for more details.
Dataset          Missing%   Rank   TeCPSGD   OLSTEC   SIITA
MovieLens 100K   20%        3      3.39      5.46     1.53
                            5      3.35      4.65     1.54
                            10     3.19      4.96     1.71
                 50%        3      3.55      8.39     1.63
                            5      3.40      6.73     1.64
                            10     3.23      3.66     1.73
                 80%        3      3.78      3.82     1.79
                            5      3.77      3.80     1.75
                            10     3.84      4.34     2.47
YELP             20%        3      4.55      4.04     1.45
                            5      4.79      4.04     1.59
                            10     5.17      4.03     2.85
                 50%        3      4.67      4.03     1.55
                            5      5.03      4.03     1.67
                            10     5.25      4.03     2.69
                 80%        3      4.99      4.02     1.73
                            5      5.17      4.02     1.78
                            10     5.31      4.01     2.62
Table 4: Test RMSE averaged across all the time steps in the streaming setting for TeCPSGD, OLSTEC, a state-of-the-art streaming tensor completion algorithm, and SIITA. SIITA outperforms the baseline algorithms significantly. See Section 5.2 for more details.

In this section, we simulate the pure streaming setting by letting the tensor grow only in the third mode at every time step. The number of time steps for each dataset in this setting is the dimension of the third mode, i.e., 31 for MovieLens 100K and 93 for YELP. We compare the performance of SIITA with TeCPSGD and OLSTEC algorithms in the streaming setting.
 
TeCPSGD [20]: TeCPSGD is an online Stochastic Gradient Descent based algorithm for recovering missing data in streaming tensors. This algorithm is based on PARAFAC decomposition. TeCPSGD is the first proper tensor completion algorithm in the dynamic setting.
 
OLSTEC [16]: OLSTEC is an online tensor tracking algorithm for partially observed data streams corrupted by noise. OLSTEC is a second order stochastic gradient descent algorithm based on CP decomposition exploiting recursive least squares. OLSTEC is the state-of-the-art for streaming tensor completion.

We report the test RMSE, averaged across all time steps, for both the MovieLens 100K and YELP datasets. As in the multi-aspect streaming setting, we run all the algorithms on multiple train-test splits, and for each split we run all the algorithms with different ranks. For every (Missing%, Rank) combination, we run all the algorithms on ten random train-test splits and report the average. SIITA significantly outperforms all the baselines in this setting, as can be seen from Table 4. Figure 4 shows the average test RMSE of every algorithm at every time step. From Figure 5, it can be seen that SIITA takes much less time than the other algorithms. The spikes in the plots correspond to slices that are relatively denser.

5.3 Batch Setting

Dataset          Missing%   Rank   AirCP   SIITA
MovieLens 100K   20%        3      3.351   1.534
                            5      3.687   1.678
                            10     3.797   2.791
                 50%        3      3.303   1.580
                            5      3.711   1.585
                            10     3.894   2.449
                 80%        3      3.883   1.554
                            5      3.997   1.654
                            10     3.791   3.979
YELP             20%        3      1.094   1.052
                            5      1.086   1.056
                            10     1.077   1.181
                 50%        3      1.096   1.097
                            5      1.095   1.059
                            10     1.719   1.599
                 80%        3      1.219   1.199
                            5      1.118   1.156
                            10     2.210   2.153
Table 5: Mean Test RMSE across multiple train-test splits in the Batch setting. SIITA achieves lower test RMSE on both the datasets compared to AirCP, a state-of-the-art algorithm for this setting. Refer to Section 5.3 for details.

Even though our primary focus is on proposing an algorithm for the multi-aspect streaming setting, SIITA can also be run as a tensor completion algorithm with side information in the batch (i.e., non-streaming) setting. To run in batch mode, we treat the entire tensor as a single time step in Algorithm 1 and run multiple passes over the data. In this setting, AirCP [8] is the current state-of-the-art algorithm that is also capable of handling side information, and we consider it as the baseline in this section. The main focus of this setting is to demonstrate that SIITA does a good job of incorporating the side information.
AirCP [8]: AirCP is a CP based tensor completion algorithm proposed for recovering the spatio-temporal dynamics of online memes. This algorithm incorporates auxiliary information from memes, locations and times. An alternative direction method of multipliers (ADMM) based algorithm is employed for solving the optimization.

AirCP expects the side information matrices to be similarity matrices and takes as input the Laplacians of the similarity matrices. However, in the datasets we experiment with, the side information is available as feature matrices. Therefore, we use the covariance matrices of the features as similarity matrices.
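For example, given a feature matrix, the Laplacian handed to AirCP can be formed from a feature-based similarity; a small sketch under that assumption (here the uncentered product $\mathbf{F}\mathbf{F}^{\top}$ stands in for the covariance-style similarity):

```python
import numpy as np

F = np.random.rand(992, 56)      # e.g. business x city feature matrix
S = F @ F.T                      # covariance-style similarity between rows
L = np.diag(S.sum(axis=1)) - S   # graph Laplacian expected by AirCP
```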

We run both algorithms till convergence and report the test RMSE. For each dataset, we experiment with different test set sizes, and for each size we run our experiments on 10 random splits, reporting the mean test RMSE per train-test split. We run our experiments with multiple ranks of factorization. Results are summarized in Table 5, from which we observe that SIITA achieves better results. Note that the rank reported for SIITA is the Tucker rank, i.e., rank = 3 implies a factorization rank of (3, 3, 3) for SIITA.

Remark: Since all the baselines considered for the various settings are CP-based, we only compare for CP tensor ranks. From Tables 3, 4 and 5, it can be seen that the performance suffers for rank = 10. However, when we run SIITA with rank = (10, 10, 2), we achieve lower test RMSE.

5.4 Analyzing Merits of Side Information

Dataset          Missing%   Rank   SIITA (w/o SI)   SIITA
MovieLens 100K   20%        3      1.19             1.23
                            5      1.19             1.29
                            10     2.69             2.49
                 50%        3      1.25             1.28
                            5      1.25             1.29
                            10     3.28             2.55
                 80%        3      1.45             1.59
                            5      1.42             1.61
                            10     2.11             2.96
YELP             20%        3      1.44             1.43
                            5      1.48             1.54
                            10     3.90             4.03
                 50%        3      1.57             1.51
                            5      1.62             1.67
                            10     5.48             4.04
                 80%        3      1.75             1.71
                            5      1.67             1.61
                            10     5.28             3.49
Table 6: Test RMSE averaged across multiple train-test splits in the Multi-Aspect Streaming setting, analyzing the merits of side information. See Section 5.4 for more details.
Dataset          Missing%   Rank   SIITA (w/o SI)   SIITA
MovieLens 100K   20%        3      1.46             1.53
                            5      1.53             1.54
                            10     1.55             1.71
                 50%        3      1.58             1.63
                            5      1.67             1.64
                            10     1.56             1.73
                 80%        3      1.76             1.79
                            5      1.74             1.75
                            10     2.31             2.47
YELP             20%        3      1.46             1.45
                            5      1.62             1.59
                            10     2.82             2.85
                 50%        3      1.57             1.55
                            5      1.69             1.67
                            10     2.54             2.67
                 80%        3      1.76             1.73
                            5      1.80             1.78
                            10     2.25             2.62
Table 7: Test RMSE averaged across multiple train-test splits in the streaming setting, analyzing the merits of side information. See Section 5.4 for more details.
(a) YELP (80% Missing) (b) MovieLens 100K (80% Missing)
Figure 6: Evolution of test RMSE with every time step in the multi-aspect streaming setting for SIITA and SIITA (w/o SI). See Section 5.4 for more details.
(a) YELP (80% Missing) (b) MovieLens 100K (80% Missing)
Figure 7: Runtime comparison between SIITA and SIITA (w/o SI) in the multi-aspect streaming setting. See Section 5.4 for more details.
(a) YELP (80% Missing) (b) MovieLens 100K (80% Missing)
Figure 8: Evolution of test RMSE with every time step in the streaming setting for SIITA and SIITA (w/o SI). See Section 5.4 for more details.
(a) YELP (80% Missing) (b) MovieLens 100K (80% Missing)
Figure 9: Runtime comparison between SIITA and SIITA (w/o SI) in the streaming setting. See Section 5.4 for more details.
(a) Test RMSE at every time step (b) Run Time at every time step
Figure 10: Investigating the merits of side information for MovieLens 1M dataset in the multi-aspect streaming setting. Side information along the user mode is the most useful for tensor completion. See Section 5.4 for more details.
(a) Evolution of Test RMSE against epochs. (b) Time elapsed with every epoch.
Figure 11: Investigating the merits of side information for MovieLens 1M dataset in the batch setting. Side information along the user mode is the most useful for tensor completion. See Section 5.4 for more details.

Our goal in this paper is to propose a flexible framework through which side information can be easily incorporated during incremental tensor completion, especially in the multi-aspect streaming setting; our proposed method, SIITA, is motivated by this need. In order to evaluate the merits of different types of side information on SIITA, in this section we report several experiments comparing the performance of SIITA with and without various types of side information.

Single Side Information: In the first experiment, we compare SIITA with and without side information (by setting the side information to identity; see the last paragraph of Section 3). We run the experiments in both the multi-aspect streaming and streaming settings. Table 6 reports the mean test RMSE of SIITA and SIITA (w/o SI), which stands for running SIITA without side information, for both datasets in the multi-aspect streaming setting. For MovieLens 100K, SIITA achieves better performance without side information, whereas for YELP, SIITA performs better with side information. Figure 6 shows the evolution of test RMSE at every time step for both datasets. Figure 7 shows the runtime of SIITA when run with and without side information; SIITA runs faster in the presence of side information. Table 7 reports the mean test RMSE for both datasets in the streaming setting. Similar to the multi-aspect streaming setting, SIITA achieves better performance without side information for the MovieLens 100K dataset and with side information for the YELP dataset. Figure 8 shows the test RMSE of SIITA at every time step when run with and without side information, and Figure 9 shows the runtime at every time step.

Multi Side Information: In all the datasets and experiments considered so far, side information along only one mode is available to SIITA. In this next experiment, we consider the setting where side information along multiple modes is available. For this experiment, we use the MovieLens 1M [11] dataset, a standard dataset of 1 million movie ratings. This dataset consists of a 6040 (user) × 3952 (movie) × 149 (week) tensor, along with two side information matrices: a 6040 (user) × 21 (occupation) matrix and a 3952 (movie) × 18 (genre) matrix. As this dataset contains side information along multiple modes, it gives us an opportunity to perform this study conclusively.

Note that among all the methods considered in the paper, SIITA is the only one that scales to the size of the MovieLens 1M dataset.

We create four variants of the dataset: the first with the tensor and all the side information matrices, denoted MovieLens 1M; the second with the tensor and only the side information along the movie mode, denoted MovieLens 1M (movie mode); similarly, MovieLens 1M (user mode) with only the user-mode side information; and finally MovieLens 1M (no si) with only the tensor and no side information.

We run SIITA in the multi-aspect streaming and batch modes for all four variants. Test RMSE at every time step in the multi-aspect streaming setting is shown in Figure 10(a), and the evolution of test RMSE (lower is better) against epochs in batch mode is shown in Figure 11(a). From Figures 10(a) and 11(a), it is evident that the variant MovieLens 1M (user mode) achieves the best overall performance, implying that side information along the user mode is the most useful for tensor completion in this dataset. However, MovieLens 1M (movie mode) achieves poorer performance than the other variants, implying that the movie-mode side information is not useful for tensor completion in this case. This is also the only side information mode available to SIITA in the MovieLens 100K experiments of Tables 6 and 7, and this sub-optimal side information may be a reason for SIITA's diminished performance when using side information on the MovieLens 100K dataset. From the runtime comparisons in Figures 10(b) and 11(b), we observe that MovieLens 1M (where both types of side information are available) takes the least time, while the variant MovieLens 1M (no si) takes the most. This is a benefit we derive from the inductive framework: in the presence of useful side information, SIITA not only achieves better performance but also runs faster.

5.5 Unsupervised Setting

(a) MovieLens 100K (b) YELP
Figure 12: Average Purity of clusters learned by NN-SIITA and NN-SIITA (w/o SI) at every time step in the unsupervised setting. For both datasets, side information helps in learning purer clusters. See Section 5.5 for more details.
(a) MovieLens 100K (b) YELP
Figure 13: Evolution of mean average Purity with $p$ for NN-SIITA and NN-SIITA (w/o SI) for both the MovieLens 100K and YELP datasets. See Section 5.5 for more details.
MovieLens 100K, Cluster (Action, Adventure, Sci-Fi):
    The Empire Strikes Back (1980)            Action, Adventure, Sci-Fi, Drama, Romance
    Heavy Metal (1981)                        Action, Adventure, Sci-Fi, Animation, Horror
    Star Wars (1977)                          Action, Adventure, Sci-Fi, Romance, War
    Return of the Jedi (1983)                 Action, Adventure, Sci-Fi, Romance, War
    Men in Black (1997)                       Action, Adventure, Sci-Fi, Comedy
MovieLens 100K, Cluster (Noisy):
    Toy Story (1995)                          Animation, Children's, Comedy
    From Dusk Till Dawn (1996)                Action, Comedy, Crime, Horror, Thriller
    Mighty Aphrodite (1995)                   Comedy
    Apollo 13 (1995)                          Action, Drama, Thriller
    Crimson Tide (1995)                       Drama, Thriller, War
YELP, Cluster (Phoenix):
    Hana Japanese Eatery                      Phoenix
    Herberger Theater Center                  Phoenix
    Scramble A Breakfast Joint                Phoenix
    The Arrogant Butcher                      Phoenix
    FEZ                                       Phoenix
YELP, Cluster (Noisy):
    The Wigman                                Litchfield Park
    Hitching Post 2                           Gold Canyon
    Freddys Frozen Custard & Steakburgers     Glendale
    Costco                                    Avondale
    Hana Japanese Eatery                      Phoenix
Table 8: Example clusters learned by NN-SIITA for the MovieLens 100K and YELP datasets. For each dataset, the first cluster is an example of a pure cluster and the second is an example of a noisy cluster. See Section 5.5 for more details.

In this section, we consider an unsupervised setting with the aim of discovering underlying clusters of items, such as movies in the MovieLens 100K dataset and businesses in the YELP dataset, from a sequence of sparse tensors. It is desirable to mine clusters such that similar items are grouped together. Nonnegative constraints are essential for mining interpretable clusters, as noted by [12, 22]. Therefore, for this set of experiments we consider the nonnegative version of SIITA, denoted NN-SIITA. We investigate whether side information helps in discovering more coherent clusters of items in both datasets.

We run our experiments in the multi-aspect streaming setting. At every time step, we compute the Purity of each cluster and report the average Purity. Purity of a cluster is defined as the percentage of the cluster that is coherent. For example, in MovieLens 100K, a cluster of movies is 100% pure if all the movies belong to the same genre and 50% pure if only half of the cluster belongs to the same genre. Formally, suppose clusters of items along mode-$k$ are desired, and let $r_k$ be the rank of the factorization along mode-$k$. Every column of the matrix $\mathbf{A}_k^t \mathbf{U}_k$ is considered a distribution over the items, and the top-$p$ items of the distribution represent a cluster. For the $i$-th cluster, i.e., the cluster represented by the $i$-th column of $\mathbf{A}_k^t \mathbf{U}_k$, let $c_i$ of the top-$p$ items belong to the same category. Purity and average Purity are then defined as

$$\text{Purity}_i = \frac{c_i}{p}, \qquad \text{average Purity} = \frac{1}{r_k} \sum_{i=1}^{r_k} \text{Purity}_i.$$

Note that Purity is computed per cluster, while average Purity is computed for a set of clusters. Higher average Purity indicates a better clustering.
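A small NumPy sketch of this computation (our own; labels holds one category per item, e.g. a movie's most common genre or a business's city):

```python
import numpy as np

def average_purity(factor, labels, p=10):
    """Average Purity over the clusters defined by the columns of factor.

    factor : (num_items x r_k) nonnegative matrix, e.g. A_k^t @ U_k
    labels : length-num_items array of item categories
    p      : number of top items that represent each cluster
    """
    purities = []
    for i in range(factor.shape[1]):
        top = np.argsort(factor[:, i])[::-1][:p]      # top-p items of cluster i
        _, counts = np.unique(labels[top], return_counts=True)
        purities.append(counts.max() / p)             # c_i / p
    return float(np.mean(purities))
```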

We report average Purity at every time step for both datasets. We run NN-SIITA with and without side information. Figure 12 shows the average Purity at every time step for the MovieLens 100K and YELP datasets. It is clear from Figure 12 that side information helps in discovering better clusters for both datasets. We compute Purity for the MovieLens 100K dataset based on the genre information of the movies, and for the YELP dataset based on the geographic locations of the businesses. Table 8 shows some example clusters learned by NN-SIITA. In MovieLens 100K, each movie can belong to multiple genres; for computing Purity, we consider the most common genre among the movies in a cluster. Results shown in Figure 12 are for a fixed value of $p$; we also vary $p$ between 5 and 25 and report the mean average Purity, obtained by averaging across all time steps in the multi-aspect streaming setting. As can be seen from Figure 13, having side information helps in learning better clusters for all values of $p$. The factorization rank is fixed per dataset for these experiments. Since this is an unsupervised setting, note that we use the entire data for factorization, i.e., there is no train-test split.

6 Conclusion

We propose an inductive framework for incorporating side information for tensor completion in the multi-aspect streaming and streaming settings. The proposed framework can also be used for tensor completion with side information in the batch setting. Given a completely new dataset with side information along multiple modes, SIITA can be used to analyze the merits of the different side information for tensor completion. Besides performing better, SIITA is also significantly faster than state-of-the-art algorithms. We also propose NN-SIITA for incorporating nonnegative constraints and demonstrate how it can be used for mining interpretable clusters.

In many instances, the side information matrices are themselves incomplete [34]. In the future, we plan to extend our proposed framework to recover missing data in the side information matrices in addition to completing the tensor.

References

  • [1] Evrim Acar, Tamara G. Kolda, and Daniel M. Dunlavy. All-at-once optimization for coupled matrix and tensor factorizations. In MLG, 2011.
  • [2] Alex Beutel, Partha Pratim Talukdar, Abhimanu Kumar, Christos Faloutsos, Evangelos E Papalexakis, and Eric P Xing. Flexifact: Scalable flexible factorization of coupled tensors on hadoop. In SDM, 2014.
  • [3] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, and H. A. Phan. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine, 32(2):145–163, 2015.
  • [4] B. Ermiş, E. Acar, and A. T. Cemgil. Link prediction in heterogeneous data via generalized coupled tensor factorization. In KDD, 2015.
  • [5] Beyza Ermiş, Evrim Acar, and A Taylan Cemgil. Link prediction in heterogeneous data via generalized coupled tensor factorization. KDD, 2015.
  • [6] Hadi Fanaee-T and João Gama. Multi-aspect-streaming tensor analysis. Knowledge-Based Systems, 89:332–345, 2015.
  • [7] M. Filipović and A. Jukić. Tucker factorization with missing data with application to low-n-rank tensor completion. Multidimens Syst Signal Process, 2015.
  • [8] Hancheng Ge, James Caverlee, Nan Zhang, and Anna Squicciarini. Uncovering the spatio-temporal dynamics of memes in the presence of incomplete information. CIKM, 2016.
  • [9] Naiyang Guan, Dacheng Tao, Zhigang Luo, and Bo Yuan. Online nonnegative matrix factorization with robust stochastic approximation. IEEE Transactions on Neural Networks and Learning Systems, 23(7):1087–1099, 2012.
  • [10] X. Guo, Q. Yao, and J. T. Kwok. Efficient sparse low-rank tensor completion using the frank-wolfe algorithm. In AAAI, 2017.
  • [11] F. Maxwell Harper and Joseph A. Konstan. The movielens datasets: History and context. ACM Trans. Interact. Intell. Syst., pages 19:1–19:19, December 2015.
  • [12] Saara Hyvönen, Pauli Miettinen, and Evimaria Terzi. Interpretable nonnegative matrix decompositions. In KDD, pages 345–353. ACM, 2008.
  • [13] Prateek Jain and Inderjit S Dhillon. Provable inductive matrix completion. arXiv preprint arXiv:1306.0626, 2013.
  • [14] ByungSoo Jeon, Inah Jeon, Lee Sael, and U Kang. Scout: Scalable coupled matrix-tensor factorization-algorithm and discoveries. In ICDE, 2016.
  • [15] H. Kasai and B. Mishra. Low-rank tensor completion: a riemannian manifold preconditioning approach. In ICML, 2016.
  • [16] Hiroyuki Kasai. Online low-rank tensor subspace tracking from incomplete data by cp decomposition using recursive least squares. In ICASSP, 2016.
  • [17] Yong-Deok Kim and Seungjin Choi. Nonnegative tucker decomposition. In CVPR, 2007.
  • [18] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009.
  • [19] Augustin Lefevre, Francis Bach, and Cédric Févotte. Online algorithms for nonnegative matrix factorization with the itakura-saito divergence. In Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on, pages 313–316. IEEE, 2011.
  • [20] Morteza Mardani, Gonzalo Mateos, and Georgios B Giannakis. Subspace learning and imputation for streaming big data matrices and tensors. IEEE Transactions on Signal Processing, 2015.
  • [21] Morten Mørup, Lars Kai Hansen, and Sidse M Arnfred. Algorithms for sparse nonnegative tucker decompositions. Neural computation, 20(8):2112–2131, 2008.
  • [22] Brian Murphy, Partha Pratim Talukdar, and Tom M. Mitchell. Learning effective and interpretable semantic models using non-negative sparse embedding. In COLING, 2012.
  • [23] Atsuhiro Narita, Kohei Hayashi, Ryota Tomioka, and Hisashi Kashima. Tensor factorization using auxiliary information. In Machine Learning and Knowledge Discovery in Databases, pages 501–516, 2011.
  • [24] Nagarajan Natarajan and Inderjit S Dhillon. Inductive matrix completion for predicting gene–disease associations. Bioinformatics, 30(12):i60–i68, 2014.
  • [25] Madhav Nimishakavi, Uday Singh Saini, and Partha Talukdar. Relation schema induction using tensor factorization with side information. In EMNLP, pages 414–423, 2016.
  • [26] Dimitri Nion and Nicholas D. Sidiropoulos. Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor. IEEE Transactions on Signal Processing, 2009.
  • [27] Amnon Shashua and Tamir Hazan. Non-negative tensor factorization with applications to statistics and computer vision. In ICML, ICML ’05, pages 792–799, New York, NY, USA, 2005. ACM.
  • [28] Si Si, Kai-Yang Chiang, Cho-Jui Hsieh, Nikhil Rao, and Inderjit S Dhillon. Goal-directed inductive matrix completion. In KDD, 2016.
  • [29] Qingquan Song, Xiao Huang, Hancheng Ge, James Caverlee, and Xia Hu. Multi-aspect streaming tensor completion. In KDD, 2017.
  • [30] Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond streams and graphs: dynamic tensor analysis. In KDD, 2006.
  • [31] Jimeng Sun, Dacheng Tao, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. Incremental tensor analysis: Theory and applications. ACM Trans. Knowl. Discov. Data, 2(3), 2008.
  • [32] Panagiotis Symeonidis, Alexandros Nanopoulos, and Yannis Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In RecSys, 2008.
  • [33] Max Welling and Markus Weber. Positive tensor factorization. Pattern Recognition Letters, 22(12):1255–1261, 2001.
  • [34] Kishan Wimalawarne, Makoto Yamada, and Hiroshi Mamitsuka. Convex coupled matrix and tensor completion. arXiv preprint arXiv:1705.05197, 2017.
  • [35] Rose Yu, Dehua Cheng, and Yan Liu. Accelerated online low-rank tensor learning for multivariate spatio-temporal streams. In ICML, 2015.
  • [36] Renbo Zhao, Vincent Tan, and Huan Xu. Online nonnegative matrix factorization with general divergences. In AISTATS, pages 37–45, 2017.
  • [37] Shuo Zhou, Nguyen Xuan Vinh, James Bailey, Yunzhe Jia, and Ian Davidson. Accelerating online cp decompositions for higher order tensors. In KDD. ACM, 2016.