Inductive Framework for Multi-Aspect Streaming
Tensor Completion with Side Information
Abstract
Low-rank tensor completion is a well-studied problem with applications in various fields. However, in many real-world applications the data is dynamic, i.e., the tensor grows as new data arrives. Moreover, in many real-world scenarios, side information is also available in the form of matrices, which grow as well. Existing work on dynamic tensor completion does not incorporate side information, and most of it assumes that the tensor grows only in one mode. We bridge this gap by proposing a dynamic tensor completion framework called Side Information infused Incremental Tensor Analysis (SIITA), which incorporates side information and works for general incremental tensors. We carry out extensive experiments on multiple real-world datasets to demonstrate the effectiveness of SIITA in various settings.
1 Introduction
Low-rank tensor completion is a well-studied problem with applications in recommendation systems [32], link prediction [4], and compressed sensing [3], to name a few. The majority of previous work focuses on solving the problem in a static setting [7, 10, 15]. However, most real-world data is dynamic; for example, in an online movie recommendation system the numbers of users and movies increase with time. Re-running static algorithms from scratch on dynamic data is prohibitively expensive. Therefore, there has been increasing interest in developing algorithms for dynamic low-rank tensor completion [16, 20, 29].
In many real-world scenarios, besides the tensor data, additional side information is also available, e.g., in the form of matrices, such as the movie-genre matrix in movie recommendation. In dynamic scenarios, the side information grows with time as well. There has been a considerable amount of work on incorporating side information into tensor completion [23, 8]; however, previous work deals only with the static setting. In this paper, we propose a dynamic low-rank tensor completion model that incorporates side information growing with time.
Most current dynamic tensor completion algorithms work in the streaming scenario, i.e., the case where the tensor grows only in one mode, which is usually the time mode. In this case, the side information is a static matrix. The multi-aspect streaming scenario [6, 29], on the other hand, is a more general framework in which the tensor grows in all of its modes; in this setting, the side information matrices also grow. Figure 1 illustrates the difference between the streaming and multi-aspect streaming scenarios with side information.
Besides side information, incorporating nonnegative constraints into tensor decomposition is desirable in unsupervised settings, as nonnegativity is essential for discovering interpretable clusters [12, 22]. Nonnegative tensor learning has been explored for applications in computer vision [27, 17] and unsupervised induction of relation schemas [25], to name a few. Several algorithms for online Nonnegative Matrix Factorization (NMF) exist in the literature [19, 9, 36], but, to the best of our knowledge, algorithms for nonnegative online tensor decomposition with side information have not been explored. We fill this gap as well by showing how nonnegative constraints can be enforced on the decomposition learned by our proposed framework SIITA.
Figure 1: (a) Streaming tensor sequence with side information. (b) Multi-aspect streaming tensor sequence with side information.
In this paper, we work with the more general multi-aspect streaming scenario and make the following contributions:

- Formally define the problem of multi-aspect streaming tensor completion with side information.
- Propose a Tucker-based framework, Side Information infused Incremental Tensor Analysis (SIITA), for the problem of multi-aspect streaming tensor completion with side information. We employ a stochastic gradient descent (SGD) based algorithm for solving the optimization problem.
- Incorporate nonnegative constraints with SIITA for discovering the underlying clusters in the unsupervised setting.
- Demonstrate the effectiveness of SIITA through extensive experimental analysis on multiple real-world datasets in all the settings.
The organization of the paper is as follows. In Section 3, we define the multi-aspect streaming tensor sequence with side information. We present our proposed framework SIITA in Section 4, where we also discuss how nonnegative constraints can be incorporated into SIITA. Experiments are presented in Section 5, where SIITA performs effectively in various settings. Our code is implemented in Matlab and is available at https://madhavcsa.github.io/.
2 Related Work
Table 1: Comparison of properties of SIITA and baseline tensor completion algorithms.

Property               | TeCPSGD [20] | OLSTEC [16] | MAST [29] | AirCP [8] | SIITA (this paper)
Streaming              | ✓            | ✓           | ✓         |           | ✓
Multi-Aspect Streaming |              |             | ✓         |           | ✓
Side Information       |              |             |           | ✓         | ✓
Sparse Solution        |              |             |           |           | ✓
Dynamic Tensor Completion: [30, 31] introduce the concept of dynamic tensor analysis by proposing multiple higher-order SVD based algorithms, namely Dynamic Tensor Analysis (DTA), Streaming Tensor Analysis (STA), and Window-based Tensor Analysis (WTA), for the streaming scenario. [26] propose two adaptive online algorithms for the CP decomposition of higher-order tensors. [35] propose an accelerated online algorithm for Tucker factorization in the streaming scenario, while an accelerated online algorithm for CP decomposition is developed in [37].
A significant amount of research has been carried out on dynamic tensor decomposition, but the problem of dynamic tensor completion is relatively less explored. The work of [20] can be considered pioneering in dynamic tensor completion; they propose a streaming tensor completion algorithm based on CP decomposition. Recent work by [16] is an accelerated second-order Stochastic Gradient Descent (SGD) algorithm for streaming tensor completion based on CP decomposition. [6] introduce the problem of multi-aspect streaming tensor analysis with a histogram-based algorithm, and recent work by [29] provides a more general framework for multi-aspect streaming tensor completion.
Tensor Completion with Auxiliary Information: [1] propose a Coupled Matrix Tensor Factorization (CMTF) approach for incorporating additional side information; similar ideas are explored in [2] for factorization on Hadoop and in [5] for link prediction in heterogeneous data. [23] propose within-mode and cross-mode regularization methods for incorporating similarity side information matrices into factorization. Based on similar ideas, [8] propose AirCP, a CP-based tensor completion algorithm.
[33] propose nonnegative tensor decomposition by incorporating nonnegative constraints into CP decomposition. Nonnegative CP decomposition is explored for computer vision applications in [27]. Algorithms for nonnegative Tucker decomposition are proposed in [17], and for sparse nonnegative Tucker decomposition in [21]. However, to the best of our knowledge, nonnegative tensor decomposition algorithms do not exist for dynamic settings, a gap we fill in this paper.
An inductive framework for matrix completion with side information is proposed in [13, 24, 28]; to the best of our knowledge, it has not been explored for tensor completion. In this paper, we propose an online inductive framework for multi-aspect streaming tensor completion.
Table 1 provides details about the differences between our proposed SIITA and various baseline tensor completion algorithms.
3 Preliminaries
An $N$th-order or $N$-mode tensor is an $N$-way array. We use boldface calligraphic letters to represent tensors (e.g., $\mathcal{X}$), boldface uppercase letters to represent matrices (e.g., $\mathbf{U}$), and boldface lowercase letters to represent vectors (e.g., $\mathbf{v}$). $x_{i_1 i_2 \ldots i_N}$ represents the entry of $\mathcal{X}$ indexed by $(i_1, i_2, \ldots, i_N)$.
Definition 1 (Coupled Tensor and Matrix) [29]: A matrix and a tensor are called coupled if they share a mode. For example, a user $\times$ movie $\times$ time ratings tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and a movie $\times$ genre matrix $\mathbf{A} \in \mathbb{R}^{I_2 \times M}$ are coupled along the movie mode.
Definition 2 (Tensor Sequence) [29]: A sequence of $N$th-order tensors $\mathcal{X}^{(1)}, \mathcal{X}^{(2)}, \ldots, \mathcal{X}^{(t)}, \ldots$ is called a tensor sequence, denoted $\{\mathcal{X}^{(t)}\}$, where each $\mathcal{X}^{(t)} \in \mathbb{R}^{I_1^{(t)} \times I_2^{(t)} \times \cdots \times I_N^{(t)}}$ is the tensor at time instance $t$.
Definition 3 (Multi-aspect streaming Tensor Sequence) [29]:
A tensor sequence $\{\mathcal{X}^{(t)}\}$ of $N$th-order tensors is called a multi-aspect streaming tensor sequence if, for any $t' \leq t$, $\mathcal{X}^{(t')}$ is a subtensor of $\mathcal{X}^{(t)}$, i.e.,

$$I_n^{(t')} \leq I_n^{(t)}, \quad \forall\, 1 \leq n \leq N.$$

Here, $I_n^{(t)}$ increases with time, and $\mathcal{X}^{(t)}$ is the snapshot tensor of this sequence at time $t$.
Definition 4 (Multi-aspect streaming Tensor Sequence with Side Information): Given a time instance $t$, let $\mathbf{A}_n^{(t)} \in \mathbb{R}^{I_n^{(t)} \times M_n}$ be a side information (SI) matrix corresponding to the $n$th mode of $\mathcal{X}^{(t)}$ (i.e., rows of $\mathbf{A}_n^{(t)}$ are coupled along mode $n$ of $\mathcal{X}^{(t)}$). While the number of rows in the SI matrices along a particular mode may increase over time, the number of columns remains the same, i.e., $M_n$ does not depend on time. In particular, we have

$$\mathbf{A}_n^{(t')} \in \mathbb{R}^{I_n^{(t')} \times M_n}, \quad I_n^{(t')} \leq I_n^{(t)}, \quad \forall\, t' \leq t.$$

Putting the side information matrices of all the modes together, we get the side information set

$$\mathcal{A}^{(t)} = \{\mathbf{A}_1^{(t)}, \mathbf{A}_2^{(t)}, \ldots, \mathbf{A}_N^{(t)}\}.$$

Given an $N$th-order multi-aspect streaming tensor sequence $\{\mathcal{X}^{(t)}\}$, we define a multi-aspect streaming tensor sequence with side information as $\{(\mathcal{X}^{(t)}, \mathcal{A}^{(t)})\}$.
We note that side information may not be available for all modes. In such cases, an identity matrix of appropriate size may be used as $\mathbf{A}_n^{(t)}$, i.e., $\mathbf{A}_n^{(t)} = \mathbf{I}_{I_n^{(t)}}$, where $M_n = I_n^{(t)}$.
The problem of multiaspect streaming tensor completion with side information is formally defined as follows:
Problem Definition: Given a multi-aspect streaming tensor sequence with side information $\{(\mathcal{X}^{(t)}, \mathcal{A}^{(t)})\}$, the goal is to predict the missing values in $\mathcal{X}^{(t)}$ at every time step $t$ by utilizing only the entries in the relative complement $\mathcal{X}^{(t)} \setminus \mathcal{X}^{(t-1)}$ and the available side information $\mathcal{A}^{(t)}$.
4 Proposed Framework SIITA
In this section, we discuss the proposed framework SIITA for the problem of multi-aspect streaming tensor completion with side information. Let $\{(\mathcal{X}^{(t)}, \mathcal{A}^{(t)})\}$ be an $N$th-order multi-aspect streaming tensor sequence with side information. Assume that, at every time step $t$, the entries of $\mathcal{X}^{(t)}$ are observed only for the indices in a set $\Omega^{(t)}$, a subset of the complete set of indices. Let the sparsity operator $P_{\Omega}$ be defined as:

$$[P_{\Omega}(\mathcal{X})]_{i_1 i_2 \ldots i_N} = \begin{cases} x_{i_1 i_2 \ldots i_N}, & (i_1, i_2, \ldots, i_N) \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$$
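The paper's released code is in Matlab; purely as an illustrative sketch, the sparsity operator $P_{\Omega}$ can be written in a few lines of NumPy, with the observed-index set represented as a boolean mask (all values here are toy data, not from the paper):

```python
import numpy as np

def sparsity_operator(X, omega):
    """Apply P_Omega: keep the entries of X whose indices are in omega, zero out the rest.

    X     : dense N-way array.
    omega : boolean mask of the same shape as X marking observed entries.
    """
    return np.where(omega, X, 0.0)

# Toy third-order tensor with two observed entries.
X = np.arange(8, dtype=float).reshape(2, 2, 2)
omega = np.zeros((2, 2, 2), dtype=bool)
omega[0, 1, 0] = True
omega[1, 0, 1] = True

P_X = sparsity_operator(X, omega)
```

Only the observed entries survive; everything outside `omega` becomes zero.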
Tucker tensor decomposition [18] is a form of higher-order PCA for tensors. It decomposes an $N$th-order tensor $\mathcal{X}$ into a core tensor multiplied by a matrix along each mode:

$$\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N,$$

where $\mathbf{U}_n \in \mathbb{R}^{I_n \times r_n}$ ($n = 1, \ldots, N$) are the factor matrices, which can be thought of as principal components in each mode. The tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times \cdots \times r_N}$ is called the core tensor and captures the interaction between different components; $(r_1, \ldots, r_N)$ is the (multilinear) rank of the tensor. The mode-$n$ matrix product of a tensor with a matrix $\mathbf{U}$ is denoted by $\times_n \mathbf{U}$; more details can be found in [18]. The standard approach for incorporating side information while learning the factor matrices in Tucker decomposition is to use an additive regularization term [23]. However, in an online setting the additive side information term poses challenges, as the side information matrices are also dynamic. Therefore, we propose the following fixed-rank inductive framework for recovering the missing values in $\mathcal{X}^{(t)}$ at every time step $t$:
$$\min_{\mathbf{U}_1, \ldots, \mathbf{U}_N, \, \mathcal{G}} \; f(\mathbf{U}_1, \ldots, \mathbf{U}_N, \mathcal{G}), \tag{1}$$

where

$$f = \frac{1}{2} \left\| P_{\Omega^{(t)}}\!\left( \mathcal{X}^{(t)} - \mathcal{G} \times_1 \mathbf{A}_1^{(t)} \mathbf{U}_1 \times_2 \cdots \times_N \mathbf{A}_N^{(t)} \mathbf{U}_N \right) \right\|_F^2 + \frac{\lambda}{2} \sum_{n=1}^{N} \|\mathbf{U}_n\|_F^2 + \frac{\lambda_g}{2} \|\mathcal{G}\|_F^2, \tag{2}$$

with $\mathbf{U}_n \in \mathbb{R}^{M_n \times r_n}$ and $\mathcal{G} \in \mathbb{R}^{r_1 \times \cdots \times r_N}$. Here $\|\cdot\|_F$ is the Frobenius norm, and $\lambda$, $\lambda_g$ are the regularization weights. Conceptually, the inductive framework models the entries of the tensor as a weighted scalar product of the side information features. Note that (1) is a generalization of the inductive matrix completion framework [13, 24, 28], which has been effective in many applications.
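To make the model concrete, the following NumPy sketch (illustrative only; the paper's code is in Matlab, and all shapes here are toy values) computes the inductive Tucker estimate $\mathcal{G} \times_1 \mathbf{A}_1\mathbf{U}_1 \times_2 \mathbf{A}_2\mathbf{U}_2 \times_3 \mathbf{A}_3\mathbf{U}_3$ for a third-order tensor:

```python
import numpy as np

def mode_n_product(T, M, n):
    """(T x_n M): contract mode n of T with the columns of M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

def inductive_reconstruct(G, factors, side_info):
    """Low-rank estimate G x_1 (A_1 U_1) x_2 ... x_N (A_N U_N)."""
    X_hat = G
    for n, (A_n, U_n) in enumerate(zip(side_info, factors)):
        X_hat = mode_n_product(X_hat, A_n @ U_n, n)
    return X_hat

rng = np.random.default_rng(0)
dims, feat, rank = (6, 5, 4), (3, 2, 2), (2, 2, 2)  # tensor dims I_n, feature dims M_n, Tucker rank r_n
G = rng.standard_normal(rank)
A = [rng.standard_normal((dims[n], feat[n])) for n in range(3)]
U = [rng.standard_normal((feat[n], rank[n])) for n in range(3)]
X_hat = inductive_reconstruct(G, U, A)
```

Note that the learned variables are the small matrices `U[n]` of shape $M_n \times r_n$; the side information matrices `A[n]` lift them to the full tensor dimensions.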
The inductive tensor framework has two-fold benefits over the typical approach of incorporating side information as an additive term. The use of the $\mathbf{A}_n^{(t)} \mathbf{U}_n$ terms in the factorization reduces the dimensionality of the variables from $I_n^{(t)} \times r_n$ to $M_n \times r_n$, and typically $M_n \ll I_n^{(t)}$. As a result, the computational time required for computing the gradients and updating the variables decreases remarkably. Similar to [17], we define

$$\mathbf{U}_{\neq n} := \mathbf{U}_{n-1} \otimes \cdots \otimes \mathbf{U}_1 \otimes \mathbf{U}_N \otimes \cdots \otimes \mathbf{U}_{n+1},$$

which collects the Kronecker products of the mode matrices, except $\mathbf{U}_n$, in a backward cyclic manner.
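The interplay between mode-$n$ unfoldings and Kronecker products of the remaining factors can be checked numerically. A caveat on conventions: with NumPy's C-order unfolding, the Kronecker factors appear in original mode order, whereas the backward-cyclic ordering in the text corresponds to a different (Fortran-order) unfolding layout. An illustrative self-check with toy shapes (not from the paper):

```python
import numpy as np

def mode_n_product(T, M, n):
    """(T x_n M): contract mode n of T with the columns of M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

def unfold(T, n):
    """Mode-n unfolding (C-order): mode n becomes the rows."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))
B = [rng.standard_normal((5, 2)), rng.standard_normal((4, 3)), rng.standard_normal((6, 2))]

# Full Tucker product X = G x_1 B_1 x_2 B_2 x_3 B_3.
X = G
for n in range(3):
    X = mode_n_product(X, B[n], n)

# Unfolding identity for n = 1: X_(1) = B_1 G_(1) (B_0 kron B_2)^T,
# with the Kronecker factors taken in original mode order for C-order unfolding.
n = 1
others = np.kron(B[0], B[2])
lhs = unfold(X, n)
rhs = B[1] @ unfold(G, n) @ others.T
```

Keeping the unfolding layout and the Kronecker ordering consistent is exactly what makes the matricized gradient expressions correct.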
By updating the variables using the gradients of (2), we can recover the missing entries in $\mathcal{X}^{(t)}$ at every time step $t$; however, that is equivalent to performing a static tensor completion at every time step. Therefore, we need an incremental scheme for updating the variables. Let $\mathbf{U}_n^{(t-1)}$ ($n = 1, \ldots, N$) and $\mathcal{G}^{(t-1)}$ represent the variables at time step $t-1$. Then, writing

$$P_{\Omega^{(t)}}(\mathcal{X}^{(t)}) = P_{\Omega^{(t-1)}}(\mathcal{X}^{(t-1)}) + P_{\Omega^{(t)} \setminus \Omega^{(t-1)}}(\mathcal{X}^{(t)}), \tag{4}$$

and since $\mathcal{X}^{(t-1)}$ is already recovered at time step $t-1$, the problem is equivalent to using only

$$P_{\Omega_{\text{new}}^{(t)}}(\mathcal{X}^{(t)}), \quad \text{where } \Omega_{\text{new}}^{(t)} = \Omega^{(t)} \setminus \Omega^{(t-1)},$$

for updating the variables at time step $t$.
We propose the following update of the variables at every time step $t$:

$$\mathbf{U}_n^{(t)} = \mathbf{U}_n^{(t-1)} - \eta \, \nabla_{\mathbf{U}_n} f, \quad n = 1, \ldots, N, \qquad \mathcal{G}^{(t)} = \mathcal{G}^{(t-1)} - \eta \, \nabla_{\mathcal{G}} f, \tag{5}$$

where $\eta$ is the step size for the gradients. The residual $\mathcal{R}^{(t)}$, needed for computing the gradients of $f$, is given by

$$\mathcal{R}^{(t)} = P_{\Omega_{\text{new}}^{(t)}}\!\left( \mathcal{X}^{(t)} - \mathcal{G}^{(t-1)} \times_1 \mathbf{A}_1^{(t)} \mathbf{U}_1^{(t-1)} \times_2 \cdots \times_N \mathbf{A}_N^{(t)} \mathbf{U}_N^{(t-1)} \right). \tag{6}$$
Algorithm 1 summarizes the procedure described above. The computational cost per time step of Algorithm 1 is governed by the updates of the variables in (5) and the computations in (6); since the factorization works with the reduced variables $\mathbf{U}_n \in \mathbb{R}^{M_n \times r_n}$ and only the newly observed entries $\Omega_{\text{new}}^{(t)}$, the cost scales with $M_n$, $r_n$, and $|\Omega_{\text{new}}^{(t)}|$ rather than with the full tensor dimensions.
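For intuition, here is a simplified NumPy sketch of one incremental step that fits only the newly observed entries. It is not the authors' Matlab implementation of Algorithm 1: it updates only the factor matrices with exact gradients of the masked data-fit plus regularization term, holding the core tensor fixed for brevity, and all shapes are toy values:

```python
import numpy as np

def mode_n_product(T, M, n):
    """(T x_n M): contract mode n of T with the columns of M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

def reconstruct(G, U, A):
    """Inductive Tucker estimate: G x_1 (A_1 U_1) ... x_N (A_N U_N)."""
    X_hat = G
    for n in range(len(U)):
        X_hat = mode_n_product(X_hat, A[n] @ U[n], n)
    return X_hat

def unfold(T, n):
    """Mode-n unfolding (C-order)."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def sgd_step(X, omega_new, G, U, A, lam, eta):
    """One incremental step: gradient update of the factor matrices U_n
    using only the newly observed entries omega_new (core G held fixed)."""
    R = np.where(omega_new, X - reconstruct(G, U, A), 0.0)   # masked residual, eq. (6)
    U_out = []
    for n in range(len(U)):
        # Pull the residual back through every other mode's (A_m U_m) factor,
        # then contract with the unfolded core to form the exact gradient.
        T = R
        for m in range(len(U)):
            if m != n:
                T = mode_n_product(T, (A[m] @ U[m]).T, m)
        grad = -A[n].T @ unfold(T, n) @ unfold(G, n).T + lam * U[n]
        U_out.append(U[n] - eta * grad)
    return U_out

# Toy problem: third-order tensor, roughly 30% of entries newly observed.
rng = np.random.default_rng(1)
dims, feat, rank = (5, 4, 3), (3, 2, 2), (2, 2, 2)
G = rng.standard_normal(rank)
A = [rng.standard_normal((dims[n], feat[n])) for n in range(3)]
U = [rng.standard_normal((feat[n], rank[n])) for n in range(3)]
X = rng.standard_normal(dims)
omega_new = rng.random(dims) < 0.3
lam, eta = 0.1, 1e-6
U_next = sgd_step(X, omega_new, G, U, A, lam, eta)
```

With a small enough step size, one such step decreases the masked objective, which is the behavior the incremental scheme relies on.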
Extension to the nonnegative case: NN-SIITA
We now discuss how nonnegative constraints can be incorporated into the decomposition learned by SIITA. Nonnegative constraints make the learned factors of the tensor interpretable. We denote SIITA with nonnegative constraints by NN-SIITA. At every time step $t$ in the multi-aspect streaming setting, we seek to learn the following decomposition:
$$\min_{\mathbf{U}_n \geq 0, \; \mathcal{G} \geq 0} \; f(\mathbf{U}_1, \ldots, \mathbf{U}_N, \mathcal{G}), \tag{7}$$

where $f$ is as given in (2).
We employ a projected gradient descent based algorithm for solving the optimization problem in (7). We follow the same incremental update scheme discussed in Algorithm 1, but use a projection operator, defined below, for updating the variables. For NN-SIITA, (5) is replaced with

$$\mathbf{U}_n^{(t)} = \Pi_+\!\left( \mathbf{U}_n^{(t-1)} - \eta \, \nabla_{\mathbf{U}_n} f \right), \qquad \mathcal{G}^{(t)} = \Pi_+\!\left( \mathcal{G}^{(t-1)} - \eta \, \nabla_{\mathcal{G}} f \right),$$

where $\Pi_+$ is the element-wise projection operator defined as

$$\Pi_+(x) = \max(x, 0).$$

The projection operator maps a point back to the feasible region, ensuring that the factor matrices and the core tensor remain nonnegative across iterations.
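A minimal sketch of the projected update, assuming toy values for a factor matrix and its gradient (illustrative NumPy; the paper's code is in Matlab):

```python
import numpy as np

def project_nonneg(M):
    """Element-wise projection onto the nonnegative orthant: max(x, 0)."""
    return np.maximum(M, 0.0)

# A projected gradient step: take an ordinary gradient step, then project,
# so the iterate stays in the feasible (nonnegative) region.
U = np.array([[0.5, -0.2],
              [1.0,  0.3]])
grad = np.array([[1.0, -2.0],
                 [0.5,  0.1]])
eta = 0.1
U_next = project_nonneg(U - eta * grad)
```

Entries that the gradient step would push negative are clipped to zero, which is exactly what keeps the learned factors interpretable as (nonnegative) cluster memberships.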
5 Experiments
We evaluate SIITA against other state-of-the-art baselines in two dynamic settings: (1) the multi-aspect streaming setting (Section 5.1), and (2) the traditional streaming setting (Section 5.2). We then evaluate the effectiveness of SIITA in the non-streaming batch setting (Section 5.3). We analyze the effect of different types of side information in Section 5.4. Finally, we evaluate the performance of NN-SIITA in the unsupervised setting in Section 5.5.
Datasets: The datasets used in the experiments are summarized in Table 2. MovieLens 100K [11] is a standard movie recommendation dataset. YELP is a down-sampled version of the YELP (Full) dataset [14]. The YELP (Full) review dataset consists of a 70K (user) × 15K (business) × 108 (year-month) tensor and a side information matrix of size 15K (business) × 68 (city). We select a subset of this dataset because the various baseline algorithms compared cannot handle datasets of this size. We note that SIITA, our proposed method, does not have such scalability concerns; in fact, as we show in Section 5.4, SIITA is able to process datasets of much larger sizes. In order to create YELP out of YELP (Full), we select the 1000 most frequent users and the 1000 most frequent businesses and create the corresponding tensor and side information matrix. After sampling, we obtain a tensor of dimensions 1000 (user) × 992 (business) × 93 (year-month) and a side information matrix of dimensions 992 (business) × 56 (city).
Table 2: Summary of the datasets used in the experiments.

                 | MovieLens 100K            | YELP
Modes            | user × movie × week       | user × business × year-month
Tensor Size      | 943 × 1682 × 31           | 1000 × 992 × 93
Starting size    | 19 × 34 × 2               | 20 × 20 × 2
Increment step   | 19, 34, 1                 | 20, 20, 2
Side-info matrix | 1682 (movie) × 19 (genre) | 992 (business) × 56 (city)
5.1 MultiAspect Streaming Setting
Table 3: Test RMSE (averaged across time steps) in the multi-aspect streaming setting.

Dataset        | Missing% | Rank | MAST | SIITA
MovieLens 100K | 20%      | 3    | 1.60 | 1.23
               |          | 5    | 1.53 | 1.29
               |          | 10   | 1.48 | 2.49
               | 50%      | 3    | 1.74 | 1.28
               |          | 5    | 1.75 | 1.29
               |          | 10   | 1.64 | 2.55
               | 80%      | 3    | 2.03 | 1.59
               |          | 5    | 1.98 | 1.61
               |          | 10   | 2.02 | 2.96
YELP           | 20%      | 3    | 1.90 | 1.43
               |          | 5    | 1.92 | 1.54
               |          | 10   | 1.93 | 4.03
               | 50%      | 3    | 1.94 | 1.51
               |          | 5    | 1.94 | 1.67
               |          | 10   | 1.96 | 4.04
               | 80%      | 3    | 1.97 | 1.71
               |          | 5    | 1.97 | 1.61
               |          | 10   | 1.97 | 3.49
Figure 2: Test RMSE at every time step in the multi-aspect streaming setting. (a) MovieLens 100K (20% Missing); (b) YELP (20% Missing).
Figure 3: Runtime comparison in the multi-aspect streaming setting. (a) MovieLens 100K (20% Missing); (b) YELP (20% Missing).
We start with an experimental analysis of the model in the multi-aspect streaming setting, for which we consider MAST [29], the state-of-the-art baseline.
MAST [29]: MAST is a dynamic low-rank tensor completion algorithm that enforces nuclear-norm regularization on the decomposition matrices of CP. A tensor-based Alternating Direction Method of Multipliers (ADMM) algorithm is used for solving the optimization problem.
We experiment with the MovieLens 100K and YELP datasets. Since the third mode is time in both datasets, i.e., (week) in MovieLens 100K and (year-month) in YELP, one way to simulate a multi-aspect streaming sequence (Definition 3) is to consider every slice in the third mode as one time step in the sequence and let the tensor grow along the other two modes with every time step, similar to the ladder structure given in [29, Section 3.3]. Note that this is different from the traditional streaming setting, where the tensor grows only in the time mode while the other two modes remain fixed. In contrast, in the multi-aspect setting there can be new users joining the system within the same month but on different days, or movies released on different days of the same week, etc. Therefore, in our simulations we treat the third mode as a normal mode and generate a more general multi-aspect streaming tensor sequence; the sizes of the starting tensor and the increments at every time step are given in Table 2. Parameters for MAST are set based on the guidelines provided in [29, Section 4.3].
We compute the root mean square error on test data (test RMSE; lower is better) at every time step and report the test RMSE averaged across all time steps in Table 3. We perform experiments on multiple train-test splits for each dataset, varying the test percentage (denoted Missing% in Table 3) and the rank of the decomposition (denoted Rank). For every (Missing%, Rank) combination, we run both models on ten random train-test splits and report the average. For SIITA, Rank = r in Table 3 represents the Tucker rank (r, r, r).
As can be seen from Table 3, the proposed SIITA achieves better results than MAST. Figure 2 shows the test RMSE at every time step. Since SIITA handles the sparsity in the data effectively, it is significantly faster than MAST, as can be seen from Figure 3. Overall, we find that SIITA is both more effective and faster than MAST in the multi-aspect streaming setting.
5.2 Streaming Setting
Figure 4: Test RMSE at every time step in the streaming setting. (a) YELP (20% Missing); (b) MovieLens 100K (20% Missing).
Figure 5: Runtime comparison in the streaming setting. (a) YELP (20% Missing); (b) MovieLens 100K (20% Missing).
Table 4: Test RMSE (averaged across time steps) in the streaming setting.

Dataset        | Missing% | Rank | TeCPSGD | OLSTEC | SIITA
MovieLens 100K | 20%      | 3    | 3.39    | 5.46   | 1.53
               |          | 5    | 3.35    | 4.65   | 1.54
               |          | 10   | 3.19    | 4.96   | 1.71
               | 50%      | 3    | 3.55    | 8.39   | 1.63
               |          | 5    | 3.40    | 6.73   | 1.64
               |          | 10   | 3.23    | 3.66   | 1.73
               | 80%      | 3    | 3.78    | 3.82   | 1.79
               |          | 5    | 3.77    | 3.80   | 1.75
               |          | 10   | 3.84    | 4.34   | 2.47
YELP           | 20%      | 3    | 4.55    | 4.04   | 1.45
               |          | 5    | 4.79    | 4.04   | 1.59
               |          | 10   | 5.17    | 4.03   | 2.85
               | 50%      | 3    | 4.67    | 4.03   | 1.55
               |          | 5    | 5.03    | 4.03   | 1.67
               |          | 10   | 5.25    | 4.03   | 2.69
               | 80%      | 3    | 4.99    | 4.02   | 1.73
               |          | 5    | 5.17    | 4.02   | 1.78
               |          | 10   | 5.31    | 4.01   | 2.62
In this section, we simulate the pure streaming setting by letting the tensor grow only in the third mode at every time step. The number of time steps for each dataset in this setting is the dimension of the third mode, i.e., 31 for MovieLens 100K and 93 for YELP.
We compare the performance of SIITA with TeCPSGD and OLSTEC algorithms in the streaming setting.
TeCPSGD [20]: TeCPSGD is an online Stochastic Gradient Descent based algorithm for recovering missing data in streaming tensors. The algorithm is based on the PARAFAC (CP) decomposition and is among the first tensor completion algorithms for the dynamic setting.
OLSTEC [16]: OLSTEC is an online tensor tracking algorithm for partially observed data streams corrupted by noise. OLSTEC is a second order stochastic gradient descent algorithm based on CP decomposition exploiting recursive least squares. OLSTEC is the stateoftheart for streaming tensor completion.
We report test RMSE, averaged across all time steps, for both the MovieLens 100K and YELP datasets. As in the multi-aspect streaming setting, we run all algorithms on multiple train-test splits and, for each split, with different ranks. For every (Missing%, Rank) combination, we run all the algorithms on ten random train-test splits and report the average. SIITA significantly outperforms all the baselines in this setting, as can be seen from Table 4. Figure 4 shows the average test RMSE of every algorithm at every time step. From Figure 5, it can be seen that SIITA takes much less time than the other algorithms. The spikes in the plots correspond to slices that are relatively denser.
5.3 Batch Setting
Table 5: Test RMSE in the batch setting.

Dataset        | Missing% | Rank | AirCP | SIITA
MovieLens 100K | 20%      | 3    | 3.351 | 1.534
               |          | 5    | 3.687 | 1.678
               |          | 10   | 3.797 | 2.791
               | 50%      | 3    | 3.303 | 1.580
               |          | 5    | 3.711 | 1.585
               |          | 10   | 3.894 | 2.449
               | 80%      | 3    | 3.883 | 1.554
               |          | 5    | 3.997 | 1.654
               |          | 10   | 3.791 | 3.979
YELP           | 20%      | 3    | 1.094 | 1.052
               |          | 5    | 1.086 | 1.056
               |          | 10   | 1.077 | 1.181
               | 50%      | 3    | 1.096 | 1.097
               |          | 5    | 1.095 | 1.059
               |          | 10   | 1.719 | 1.599
               | 80%      | 3    | 1.219 | 1.199
               |          | 5    | 1.118 | 1.156
               |          | 10   | 2.210 | 2.153
Even though our primary focus is the multi-aspect streaming setting, SIITA can also be run as a tensor completion algorithm with side information in the batch (i.e., non-streaming) setting. To run in batch mode, we treat the full tensor as a single time step in Algorithm 1 and run multiple passes over the data. In this setting, AirCP [8] is the current state-of-the-art algorithm that is also capable of handling side information, and we consider it as the baseline in this section.
The main goal of this setting is to demonstrate that SIITA incorporates the side information effectively.
AirCP [8]: AirCP is a CP-based tensor completion algorithm proposed for recovering the spatiotemporal dynamics of online memes. The algorithm incorporates auxiliary information from memes, locations, and times. An alternating direction method of multipliers (ADMM) based algorithm is employed for solving the optimization problem.
AirCP expects the side information matrices to be similarity matrices and takes as input the Laplacians of the similarity matrices. However, in the datasets we experiment with, the side information is available as feature matrices. Therefore, we use the covariance matrices of the features as similarity matrices.
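As an illustrative sketch of this preprocessing (toy data; reading "covariance" as the simple uncentered similarity $\mathbf{F}\mathbf{F}^\top$ of a feature matrix $\mathbf{F}$):

```python
import numpy as np

def laplacian_from_features(F):
    """Build an item-item similarity S = F F^T from a feature matrix F,
    then return the unnormalized graph Laplacian L = D - S."""
    S = F @ F.T                  # item-item similarity
    D = np.diag(S.sum(axis=1))   # degree matrix
    return D - S

# Toy business x city one-hot side-information matrix (4 items, 3 features).
F = np.array([[1., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])
L = laplacian_from_features(F)
```

Items sharing features (here, the first two businesses in the same city) get a nonzero similarity and hence an off-diagonal Laplacian entry, which is the coupling the similarity-based regularizer exploits.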
We run both algorithms until convergence and report test RMSE. For each dataset, we experiment with different test-set sizes, and for each such level we run our experiments on 10 random splits, reporting the mean test RMSE per train-test split level. We run our experiments with multiple factorization ranks. Results are summarized in Table 5, from which we observe that SIITA achieves better results. Note that the rank reported for SIITA is the Tucker rank, i.e., Rank = 3 implies a factorization rank of (3, 3, 3).
5.4 Analyzing Merits of Side Information
Table 6: Test RMSE of SIITA with and without side information in the multi-aspect streaming setting.

Dataset        | Missing% | Rank | SIITA (w/o SI) | SIITA
MovieLens 100K | 20%      | 3    | 1.19           | 1.23
               |          | 5    | 1.19           | 1.29
               |          | 10   | 2.69           | 2.49
               | 50%      | 3    | 1.25           | 1.28
               |          | 5    | 1.25           | 1.29
               |          | 10   | 3.28           | 2.55
               | 80%      | 3    | 1.45           | 1.59
               |          | 5    | 1.42           | 1.61
               |          | 10   | 2.11           | 2.96
YELP           | 20%      | 3    | 1.44           | 1.43
               |          | 5    | 1.48           | 1.54
               |          | 10   | 3.90           | 4.03
               | 50%      | 3    | 1.57           | 1.51
               |          | 5    | 1.62           | 1.67
               |          | 10   | 5.48           | 4.04
               | 80%      | 3    | 1.75           | 1.71
               |          | 5    | 1.67           | 1.61
               |          | 10   | 5.28           | 3.49
Table 7: Test RMSE of SIITA with and without side information in the streaming setting.

Dataset        | Missing% | Rank | SIITA (w/o SI) | SIITA
MovieLens 100K | 20%      | 3    | 1.46           | 1.53
               |          | 5    | 1.53           | 1.54
               |          | 10   | 1.55           | 1.71
               | 50%      | 3    | 1.58           | 1.63
               |          | 5    | 1.67           | 1.64
               |          | 10   | 1.56           | 1.73
               | 80%      | 3    | 1.76           | 1.79
               |          | 5    | 1.74           | 1.75
               |          | 10   | 2.31           | 2.47
YELP           | 20%      | 3    | 1.46           | 1.45
               |          | 5    | 1.62           | 1.59
               |          | 10   | 2.82           | 2.85
               | 50%      | 3    | 1.57           | 1.55
               |          | 5    | 1.69           | 1.67
               |          | 10   | 2.54           | 2.67
               | 80%      | 3    | 1.76           | 1.73
               |          | 5    | 1.80           | 1.78
               |          | 10   | 2.25           | 2.62
Figure 6: Test RMSE of SIITA with and without side information at every time step in the multi-aspect streaming setting. (a) YELP (80% Missing); (b) MovieLens 100K (80% Missing).
Figure 7: Runtime of SIITA with and without side information in the multi-aspect streaming setting. (a) YELP (80% Missing); (b) MovieLens 100K (80% Missing).
Figure 8: Test RMSE of SIITA with and without side information at every time step in the streaming setting. (a) YELP (80% Missing); (b) MovieLens 100K (80% Missing).
Figure 9: Runtime of SIITA with and without side information in the streaming setting. (a) YELP (80% Missing); (b) MovieLens 100K (80% Missing).
Figure 10: MovieLens 1M variants in the multi-aspect streaming setting. (a) Test RMSE at every time step; (b) runtime at every time step.
Figure 11: MovieLens 1M variants in the batch setting. (a) Evolution of test RMSE against epochs; (b) time elapsed with every epoch.
Our goal in this paper is to propose a flexible framework through which side information can be easily incorporated during incremental tensor completion, especially in the multi-aspect streaming setting. Our proposed method, SIITA, is motivated by this need. In order to evaluate the merits of different types of side information, in this section we report several experiments comparing the performance of SIITA with and without various types of side information.
Single Side Information: In the first experiment, we compare SIITA with and without side information (by setting the side information to the identity matrix; see the last paragraph of Section 3). We run the experiments in both the multi-aspect streaming and streaming settings. Table 6 reports the mean test RMSE of SIITA and SIITA (w/o SI), i.e., SIITA run without side information, for both datasets in the multi-aspect streaming setting. For MovieLens 100K, SIITA achieves better performance without side information, whereas for YELP, SIITA performs better with side information. Figure 6 shows the evolution of test RMSE at every time step for both datasets. Figure 7 shows the runtime of SIITA when run with and without side information; SIITA runs faster in the presence of side information. Table 7 reports the mean test RMSE for both datasets in the streaming setting. As in the multi-aspect streaming setting, SIITA achieves better performance without side information on MovieLens 100K and with side information on YELP. Figure 8 shows the test RMSE of SIITA at every time step with and without side information, and Figure 9 shows the runtime at every time step.
Multi Side Information: In all the datasets and experiments considered so far, side information along only one mode is available to SIITA. In this experiment, we consider the setting where side information along multiple modes is available. For this, we use the MovieLens 1M [11] dataset, a standard dataset of 1 million movie ratings, which consists of a 6040 (user) × 3952 (movie) × 149 (week) tensor along with two side information matrices: a 6040 (user) × 21 (occupation) matrix and a 3952 (movie) × 18 (genre) matrix. As this dataset provides side information along multiple modes, it allows us to perform this study conclusively.
Note that among all the methods considered in this paper, SIITA is the only one that scales to the size of the MovieLens 1M dataset.
We create four variants of the dataset: MovieLens 1M, with the tensor and all the side information matrices; MovieLens 1M (movie mode), with the tensor and only the side information along the movie mode; MovieLens 1M (user mode), with only the user-mode side information; and MovieLens 1M (no si), with only the tensor and no side information.
We run SIITA in multi-aspect streaming and batch modes for all four variants. Test RMSE at every time step in the multi-aspect streaming setting is shown in Figure 10(a), and the evolution of test RMSE (lower is better) against epochs in batch mode is shown in Figure 11(a). From Figures 10(a) and 11(a), it is evident that the MovieLens 1M (user mode) variant achieves the best overall performance, implying that the side information along the user mode is more useful for tensor completion in this dataset. In contrast, MovieLens 1M (movie mode) achieves poorer performance than the other variants, implying that the movie-mode side information is not useful for tensor completion in this case. This is also the only side information available to SIITA in the MovieLens 100K experiments of Tables 6 and 7; this suboptimal side information may explain SIITA's diminished performance when using side information on MovieLens 100K. From the runtime comparisons in Figures 10(b) and 11(b), we observe that MovieLens 1M (where both types of side information are available) takes the least time, while MovieLens 1M (no si) takes the most. This is a benefit of the inductive framework: in the presence of useful side information, SIITA not only achieves better performance but also runs faster.
5.5 Unsupervised Setting
Figure 12: Average Purity of NN-SIITA with and without side information at every time step. (a) MovieLens 100K; (b) YELP.
Figure 13: Mean average Purity of NN-SIITA with and without side information for varying p. (a) MovieLens 100K; (b) YELP.
Table 8: Example clusters learned by NN-SIITA.

MovieLens 100K
  Cluster (Action, Adventure, Sci-Fi):
    The Empire Strikes Back (1980): Action, Adventure, Sci-Fi, Drama, Romance
    Heavy Metal (1981): Action, Adventure, Sci-Fi, Animation, Horror
    Star Wars (1977): Action, Adventure, Sci-Fi, Romance, War
    Return of the Jedi (1983): Action, Adventure, Sci-Fi, Romance, War
    Men in Black (1997): Action, Adventure, Sci-Fi, Comedy
  Cluster (Noisy):
    Toy Story (1995): Animation, Children's, Comedy
    From Dusk Till Dawn (1996): Action, Comedy, Crime, Horror, Thriller
    Mighty Aphrodite (1995): Comedy
    Apollo 13 (1995): Action, Drama, Thriller
    Crimson Tide (1995): Drama, Thriller, War

YELP
  Cluster (Phoenix):
    Hana Japanese Eatery: Phoenix
    Herberger Theater Center: Phoenix
    Scramble A Breakfast Joint: Phoenix
    The Arrogant Butcher: Phoenix
    FEZ: Phoenix
  Cluster (Noisy):
    The Wigman: Litchfield Park
    Hitching Post 2: Gold Canyon
    Freddys Frozen Custard & Steakburgers: Glendale
    Costco: Avondale
    Hana Japanese Eatery: Phoenix
In this section, we consider an unsupervised setting with the aim of discovering underlying clusters of items, such as movies in the MovieLens 100K dataset and businesses in the YELP dataset, from a sequence of sparse tensors. It is desirable to mine clusters such that similar items are grouped together. Nonnegative constraints are essential for mining interpretable clusters, as noted by [12, 22]. Therefore, for this set of experiments we use the nonnegative version of SIITA, denoted NN-SIITA. We investigate whether side information helps in discovering more coherent clusters of items in both datasets.
We run our experiments in the multi-aspect streaming setting. At every time step, we compute the Purity of the learned clusters and report the average Purity. The Purity of a cluster is the fraction of the cluster that is coherent: for example, in MovieLens 100K, a cluster of movies is 100% pure if all the movies belong to the same genre, and 50% pure if only half of the cluster belongs to the same genre. Formally, suppose $k$ clusters of items are desired along mode $n$, and let $r_n$ be the rank of factorization along that mode. Every column of the matrix $\mathbf{A}_n \mathbf{U}_n$ is treated as a distribution over the items, and the top $p$ items of the distribution represent a cluster. For the $i$th cluster, i.e., the cluster represented by the $i$th column of $\mathbf{A}_n \mathbf{U}_n$, let $c_i$ of the top $p$ items belong to the same category. Purity and average Purity are then defined as

$$\text{Purity}_i = \frac{c_i}{p}, \qquad \text{average Purity} = \frac{1}{k} \sum_{i=1}^{k} \frac{c_i}{p}.$$

Note that Purity is computed per cluster, while average Purity is computed for a set of clusters. Higher average Purity indicates a better clustering.
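The Purity computation can be sketched as follows (illustrative NumPy with a toy factor matrix and labels; the per-cluster top-p selection and majority-category counting follow the definition above):

```python
from collections import Counter
import numpy as np

def average_purity(factor, labels, p=3):
    """Average Purity of the clusters read off a nonnegative factor matrix.

    factor : (num_items x k) nonnegative matrix; each column defines one cluster.
    labels : category label of each item (e.g., its dominant genre).
    p      : number of top-weighted items that represent a cluster.
    """
    purities = []
    for j in range(factor.shape[1]):
        top = np.argsort(-factor[:, j])[:p]        # indices of the p largest entries
        counts = Counter(labels[i] for i in top)
        purities.append(max(counts.values()) / p)  # fraction in the majority category
    return float(np.mean(purities))

# Toy example: 6 items, 2 clusters.
W = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.7, 0.0],
              [0.0, 0.9],
              [0.1, 0.8],
              [0.0, 0.2]])
labels = ["sci-fi", "sci-fi", "comedy", "drama", "drama", "drama"]
avg_purity = average_purity(W, labels, p=3)
```

Here cluster 0's top three items split two-to-one across genres (Purity 2/3) while cluster 1 is fully coherent (Purity 1), giving an average Purity of 5/6.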
We report the average Purity at every time step for both datasets, running NN-SIITA with and without side information. Figure 12 shows the average Purity at every time step for the MovieLens 100K and YELP datasets; it is clear that, for both datasets, side information helps in discovering better clusters. We compute Purity for the MovieLens 100K dataset based on the genre information of the movies, and for the YELP dataset based on the geographic locations of the businesses. Table 8 shows some example clusters learned by NN-SIITA. In MovieLens 100K, each movie can belong to multiple genres; for computing Purity, we use the most common genre among the movies in a cluster. Results in Figure 12 are for a fixed value of $p$. We also vary $p$ between 5 and 25 and report the mean average Purity, obtained by averaging across all time steps in the multi-aspect streaming setting. As can be seen from Figure 13, side information helps in learning better clusters for all values of $p$. The results reported use a fixed factorization rank for each dataset. Since this is an unsupervised setting, we use the entire data for factorization, i.e., there is no train-test split.
6 Conclusion
We propose an inductive framework for incorporating side information into tensor completion in the multi-aspect streaming and streaming settings. The proposed framework can also be used for tensor completion with side information in the batch setting. Given a completely new dataset with side information along multiple modes, SIITA can be used to analyze the merits of different side information for tensor completion. Besides performing better, SIITA is also significantly faster than state-of-the-art algorithms. We also propose NN-SIITA for incorporating nonnegative constraints and demonstrate how it can be used for mining interpretable clusters.
In many instances, the side information matrices are themselves incomplete [34]. In future work, we plan to extend our proposed framework to recover missing entries in the side information matrices in addition to completing the tensor.
References
 [1] Evrim Acar, Tamara G. Kolda, and Daniel M. Dunlavy. All-at-once optimization for coupled matrix and tensor factorizations. In MLG, 2011.
 [2] Alex Beutel, Partha Pratim Talukdar, Abhimanu Kumar, Christos Faloutsos, Evangelos E. Papalexakis, and Eric P. Xing. FlexiFaCT: Scalable flexible factorization of coupled tensors on Hadoop. In SDM, 2014.
 [3] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, and H. A. Phan. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine, 32(2):145–163, 2015.
 [4] B. Ermiş, E. Acar, and A. T. Cemgil. Link prediction in heterogeneous data via generalized coupled tensor factorization. In KDD, 2015.
 [5] Beyza Ermiş, Evrim Acar, and A Taylan Cemgil. Link prediction in heterogeneous data via generalized coupled tensor factorization. KDD, 2015.
 [6] Hadi Fanaee-T and João Gama. Multi-aspect-streaming tensor analysis. Knowl.-Based Syst., 89:332–345, 2015.
 [7] M. Filipović and A. Jukić. Tucker factorization with missing data with application to low-n-rank tensor completion. Multidimens. Syst. Signal Process., 2015.
 [8] Hancheng Ge, James Caverlee, Nan Zhang, and Anna Squicciarini. Uncovering the spatio-temporal dynamics of memes in the presence of incomplete information. In CIKM, 2016.
 [9] Naiyang Guan, Dacheng Tao, Zhigang Luo, and Bo Yuan. Online nonnegative matrix factorization with robust stochastic approximation. IEEE Transactions on Neural Networks and Learning Systems, 23(7):1087–1099, 2012.
 [10] X. Guo, Q. Yao, and J. T. Kwok. Efficient sparse low-rank tensor completion using the Frank-Wolfe algorithm. In AAAI, 2017.
 [11] F. Maxwell Harper and Joseph A. Konstan. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., pages 19:1–19:19, December 2015.
 [12] Saara Hyvönen, Pauli Miettinen, and Evimaria Terzi. Interpretable nonnegative matrix decompositions. In KDD, pages 345–353. ACM, 2008.
 [13] Prateek Jain and Inderjit S Dhillon. Provable inductive matrix completion. arXiv preprint arXiv:1306.0626, 2013.
 [14] Byung-Soo Jeon, Inah Jeon, Lee Sael, and U Kang. SCouT: Scalable coupled matrix-tensor factorization - algorithm and discoveries. In ICDE, 2016.
 [15] H. Kasai and B. Mishra. Low-rank tensor completion: A Riemannian manifold preconditioning approach. In ICML, 2016.
 [16] Hiroyuki Kasai. Online low-rank tensor subspace tracking from incomplete data by CP decomposition using recursive least squares. In ICASSP, 2016.
 [17] Yong-Deok Kim and Seungjin Choi. Nonnegative Tucker decomposition. In CVPR, 2007.
 [18] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009.
 [19] Augustin Lefevre, Francis Bach, and Cédric Févotte. Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 313–316. IEEE, 2011.
 [20] Morteza Mardani, Gonzalo Mateos, and Georgios B Giannakis. Subspace learning and imputation for streaming big data matrices and tensors. IEEE Transactions on Signal Processing, 2015.
 [21] Morten Mørup, Lars Kai Hansen, and Sidse M. Arnfred. Algorithms for sparse nonnegative Tucker decompositions. Neural Computation, 20(8):2112–2131, 2008.
 [22] Brian Murphy, Partha Pratim Talukdar, and Tom M. Mitchell. Learning effective and interpretable semantic models using nonnegative sparse embedding. In COLING, 2012.
 [23] Atsuhiro Narita, Kohei Hayashi, Ryota Tomioka, and Hisashi Kashima. Tensor factorization using auxiliary information. In Machine Learning and Knowledge Discovery in Databases, pages 501–516, 2011.
 [24] Nagarajan Natarajan and Inderjit S Dhillon. Inductive matrix completion for predicting gene–disease associations. Bioinformatics, 30(12):i60–i68, 2014.
 [25] Madhav Nimishakavi, Uday Singh Saini, and Partha Talukdar. Relation schema induction using tensor factorization with side information. In EMNLP, pages 414–423, 2016.
 [26] Dimitri Nion and Nicholas D. Sidiropoulos. Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor. IEEE Transactions on Signal Processing, 2009.
 [27] Amnon Shashua and Tamir Hazan. Nonnegative tensor factorization with applications to statistics and computer vision. In ICML, ICML ’05, pages 792–799, New York, NY, USA, 2005. ACM.
 [28] Si Si, Kai-Yang Chiang, Cho-Jui Hsieh, Nikhil Rao, and Inderjit S. Dhillon. Goal-directed inductive matrix completion. In KDD, 2016.
 [29] Qingquan Song, Xiao Huang, Hancheng Ge, James Caverlee, and Xia Hu. Multi-aspect streaming tensor completion. In KDD, 2017.
 [30] Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond streams and graphs: dynamic tensor analysis. In KDD, 2006.
 [31] Jimeng Sun, Dacheng Tao, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. Incremental tensor analysis: Theory and applications. ACM Trans. Knowl. Discov. Data, 2(3), 2008.
 [32] Panagiotis Symeonidis, Alexandros Nanopoulos, and Yannis Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In RecSys, 2008.
 [33] Max Welling and Markus Weber. Positive tensor factorization. Pattern Recognition Letters, 22(12):1255–1261, 2001.
 [34] Kishan Wimalawarne, Makoto Yamada, and Hiroshi Mamitsuka. Convex coupled matrix and tensor completion. arXiv preprint arXiv:1705.05197, 2017.
 [35] Rose Yu, Dehua Cheng, and Yan Liu. Accelerated online lowrank tensor learning for multivariate spatiotemporal streams. In ICML, 2015.
 [36] Renbo Zhao, Vincent Tan, and Huan Xu. Online nonnegative matrix factorization with general divergences. In AISTATS, pages 37–45, 2017.
 [37] Shuo Zhou, Nguyen Xuan Vinh, James Bailey, Yunzhe Jia, and Ian Davidson. Accelerating online CP decompositions for higher order tensors. In KDD. ACM, 2016.