OCTen: Online Compressionbased Tensor Decomposition
Abstract
Tensor decompositions are powerful tools for large data analytics as they jointly model multiple aspects of data into one framework and enable the discovery of the latent structures and higherorder correlations within the data. One of the most widely studied and used decompositions, especially in data mining and machine learning, is the Canonical Polyadic or CP decomposition. However, today’s datasets are not static and these datasets often dynamically growing and changing with time. To operate on such large data, we present OCTen the first ever compressionbased online parallel implementation for the CP decomposition.We conduct an extensive empirical analysis of the algorithms in terms of fitness, memory used and CPU time, and in order to demonstrate the compression and scalability of the method, we apply OCTen to big tensor data. Indicatively, OCTen performs onpar or better than stateoftheart online and offline methods in terms of decomposition accuracy and efficiency, while saving up to 40200 % memory space.
copyrightbox
1 Introduction
A Tensor is a multiway array of elements that represents higherorder or multiaspect data. In recent years, tensor decompositions have gained increasing popularity in big data analytics [18]. In higherorder structure, tensor decomposition are capable of finding complex patterns and higherorder correlations within the data. Corresponding to matrix factorization tools like SVD (Singular Value Decomposition), there exist generalizations for the tensor domain, with the most widely used being CANDECOMP/PARAFAC or CP [8] which extracts interpretable factors from tensor data, and Tucker decomposition [24], which is known for estimating the joint subspaces of tensor data. In this work we focus only on the CP decomposition, which is extremely effective in exploratory knowledge discovery on multiaspect data. In the era of information explosion, data is generated or modified in large volume. In such environments, data may be added or removed from any of the dimensions with high velocity. When using tensors to represent this dynamically changing data, an instance of the problem is that of a “streaming”, “incremental”, or “online” tensors^{1}^{1}1Notice that the literature (and thereby this paper) uses the above terms as well as “dynamic” interchangeably.. Considering an example of time evolving social network interactions, where a large number of users interact with each other every second (Facebook users update information and Twitter users send tweets every single minute^{2}^{2}2http://mashable.com/2012/06/22/datacreatedeveryminute/); every such snapshot of interactions is a new incoming slice(s) to the tensor on its “time” mode, which is seen as a streaming update. Additionally, the tensor may be growing in all of its nmodes, especially in complex and evolving environments such as online social networks. In this paper, our goal is, given an already computed CP decomposition, to track the CP decomposition of an online tensor, as it receives streaming updates, 1) efficiently, being much faster than recomputing the entire decomposition from scratch after every update, and utilizing small amount of memory, and 2) accurately, incurring an approximation error that is as close as possible to the decomposition of the full tensor. For exposition purposes, we focus on the timeevolving/streaming scenario, where a threemode tensor grows on the third (“time”) mode, however, our work extends to cases where more than one modes is online.
As the volume and velocity of data grow, the need for time and spaceefficient online tensor decomposition is imperative. There already exists a modest amount of prior work in online tensor decomposition both for Tucker [2, 23] and CP [15, 27].However, most of the existing online methods [2, 27, 15] , model the data in the full space, which can become very memory taxing as the size of the data grows. There exist memory efficient tensor decompositions, indicatively MET for Tucker [12] and PARACOMP [21] for CP, neither of which are able to handle online tensors. In this paper, we fill that gap.
Online tensor decomposition is a challenging task due to the following reasons. First, maintaining highaccuracy (competitive to decomposing the full tensor) using significantly fewer computations and memory than the full decomposition calls for innovative and, ideally, sublinear approaches. Second, operating on the full ambient space of data, as the tensor is being updated online, leads to superlinear increase in time and space complexity, rendering such approaches hard to scale, and calling for efficient methods that work on memory spaces which are significantly smaller than the original ambient data dimensions. Third, in many real settings, more than one modes of the tensor may receive streaming updates at different points in time, and devising a flexible algorithm that can handle such updates seamlessly is a challenge. To handle the above challenges, in this paper, we propose to explore how to decompose online or incremental tensors based on CP decomposition. We specifically study: (1) How to make parallel update method based on CP decomposition for online tensors? (2) How to identify latent component effectively and accurately after decomposition? Answering the above questions, we propose OCTen (Online Compressionbased Tensor Decomposition) framework. Our contributions are summarized as follows:

Novel Parallel Algorithm We introduce OCTen, a novel compressionbased algorithm for online tensor decomposotion that admits an efficient parallel implementation. We do not limit to 3mode tensors, our algorithm can easily handle higherorder tensor decompositions.

Correctness guarantees By virtue of using random compression, OCTen can guarantee the identifiability of the underlying CP decomposition in the presence of streaming updates.

Extensive Evaluation Through experimental evaluation on various datasets, we show that OCTen provides stable decompositions (with quality on par with stateoftheart), while offering up to 40250 % memory space savings.
Reproducibility: We make our Matlab implementation publicly available at link ^{3}^{3}3http://www.cs.ucr.edu/~egujr001/ucr/madlab/src/OCTen.zip. Furthermore, all the small size datasets we use for evaluation are publicly available on same link.
2 Preliminaries
Tensor : A tensor is a higher order generalization of a matrix. An mode tensor is essentially indexed by variables. In particular, regular matrix with two variables i.e. and is 2mode tensor, whereas data cube ( , and ) is viewed as a 3mode tensor . The number of modes is also called ”order”. Table 1 contains the symbols used throughout the paper. For the purposes of background explanation, we refer the reader to [18] for the definitions of KhatriRao and Kronecker products which are not necessary for understanding the basic derivation of our framework.
Symbols  Definition 

Tensor, Matrix, Column vector, Scalar  
Set of Real Numbers  
Outer product  
Frobenius norm, norm  
Summation  
Kronecker product[18]  
KhatriRao product[18] 
Canonical Polyadic Decomposition: One of the most popular and extensively used tensor decompositions is the Canonical Polyadic (CP) or CANDECOMP/ PARAFAC decomposition [5, 8, 3] referred as CP decomposition. Given a Nmode tensor of dimension , its CP decomposition can be written as , where and R represents the number of latent factors or upper bound rank on tensor . Henceforth, the 3mode tensor of size can be represented as a sum of rankone tensors: where . The unfold nmode tensor can be written as kharirao product of its modes as . We refer the interested reader to several wellknown surveys that provide more details on tensor decompositions and its applications [11, 18].
3 Problem Formulation
In many realworld applications, data grow dynamically and may do so in many modes. For example, given a dynamic tensor in a moviebased recommendation system, organized as users movie rating hours, the number of registered users, movies watched or rated, and hours may all increase over time. Another example is network monitoring sensor data where tons of information like source and target IP address, users, ports etc., is collected every second. This nature of data gives rise to update existing decompositions on the fly or online and we call it incremental decomposition. In such conditions, the update needs to process the new data very quickly, which makes nonincremental methods to fall short because they need to recompute the decomposition for the full dataset. The problem that we solve is the following:
[linecolor=red!60!black,backgroundcolor=gray!20,linewidth=2pt, topline=false,rightline=false, leftline=false]
Given (a) an existing set of summaries {}, which approximate tensor of size { } at time t , (b) new incoming batch of slice(s) in form of tensor of size {}, find updates of ( , …, , ) incrementally to approximate tensor of dimension {} and rank R, where after appending new slice or tensor to mode.
4 OCTen Framework
As we mention in the previous section, to the best of our knowledge, there is no algorithm in the literature that is able to efficiently compress and incrementally update the CP decomposition in the presence of incoming tensor slices. However, there exists a method for static data [21]. Since this method considers the tensor in its entirety, it cannot handle streaming data and as the data size grows its efficiency may degrade. In this section, we introduce OCTen, a new method for parallel incremental decomposition designed with two main goals in mind: G1: Compression, speed , simplicity, and parallelization; and G2: correctness in recovering compressed partial results for incoming data, under suitable conditions. The algorithmic framework we propose is shown in Figure 1 and is described below:
We assume that we have a preexisting set of summaries of the before the update. Summaries are in the form of compressed tensors of dimension [].
These are generated by multiplying random compression matrices {} that are independently obtain from an absolutely continuous uniform distribution with respect to the Lebesgue measure, with tensor’s corresponding mode i.e. is multiplied with mode and so on; see Figure 1 and Section 4.2 for its role in correct identification of factors. The compression matrices are generated for each incoming batch or slice. For simplicity of description, we assume that we are receiving updated slice(s) on the third mode. We, further, assume that the updates come in batches of new slices, which, in turn, ensures that we see a matureenough update to the tensor, which contains useful structure. Trivially, however, OCTen can operate on singleton batches and for modes also.
In the following, is the tensor prior to the update and is the batch of incoming slice(s). Considering and , we can write space and time complexity in terms of and . Given an incoming batch, OCTen performs the following steps:
4.1 Parallel Compression and Decomposition
When handling large tensors that are unable to fit in main memory, we may compress the tensor to a smaller tensor that somehow apprehends most of the systematic variation in . Keeping this in mind, for incoming slice(s) , during the parallel compression step, we first need to create ’’ parallel triplets of random compression matrices (uniformly distributed) {} of . Thus, each worker (i.e. Matlab parpool) is responsible for creating and storing these triplets of size and . These matrices share at least ’shared’ amount of column(s) among each other. Mathematically, we can describe it as follows:
(1) 
where u, v and w are shared and have dimensions of and .
For compression matrices, we choose to assign each worker create a single row of each of the matrices to reduce the burden of creating an entire batch of {} of . We see that each worker is sufficient to hold these matrices in main memory. Now, we created compressed tensor replica or summaries by multiplying each triplets of compression matrices and ;see Figure 1. is 3mode tensor of size . Since is considerably smaller than [I ,J, K], we use of memory on each worker.
For , we already have replicas obtained from each triplets of compression matrices {}and ;see Figure 1. In general, the compression comprises Nmode products which leads to overall complexity of for dense tensor , if the first mode is compressed first, followed by the second, and then the third mode and so on. We choose to keep of same order as well nontemporal dimensions are of same order in our algorithm, so time complexity of parallel compression step for Nmode data is for each worker. The summaries are always dense, because first mode product with tensor is dense, hence remaining mode products are unable to exploit sparsity.
After appropriately computing summaries for incoming slices, we need to update the old summaries which were generated from previous data. We don’t save entire , and instead we save the compressed summaries i.e. only. Each worker reads its segment and process update in parallel as given below.
(2) 
where is the number of slices of incoming tensor and is the slice number for the compressed tensor. Further, for the decomposition step, we processed ’’ summaries on different workers, each one fitting the decomposition to the respective compressed tensor created by the compression step. We assume that the updated compressed tensor fits in the main memory, and performs inmemory computation. We denote compressed tensor decompositions as as discussed above. The data for each parallel worker can be uniquely decomposed, i.e. is unique up to scaling and column permutation. Furthermore, parallel compression and decomposition is able to achieve Goal G1.
4.2 Factor match for identifiability
According to Kruskal [9], the CP decomposition is unique (under mild conditions) up to permutation and scaling of the components i.e. and factor matrices. Consider an 3mode tensor of dimension , and of rank . If rank
(3) 
then rank 1 factors of tensor can be uniquely computable[9, 10]. Kronecker product[4] property is described as . Now combining Kruskal’s uniqueness and Kronecker product property, we can obtain correct identifiable factors from summaries if
(4) 
where Kruskalrank of , denoted as , is the maximum such that any columns of are linearly independent;see [21]. Hence, upon factorization of 3mode into components, we obtain where is shared among summaries decompositions , is a permutation matrix, and is a diagonal scaling matrix obtained from CP decomposition. To match factor matrices after decomposition step, we first normalize the shared columns of factor matrices and to unit norm . Next, for each column of , we find the most similar column of , and store the correspondence. Finally, we can describe factor matrices as :
(5) 
where and are matrices of dimension and respectively and N is number of dimensions of tensor. For 3mode tensor, and for 4mode tensor, and so on. Even though for 3mode tensor, and do not increase their number of rows over time, the incoming slices may contribute valuable new estimates to the already estimated factors.Thus, we update all factor matrices in the same way. This is able to partially achieve Goal G2.
4.3 Update results
Final step is to remove all the singleton dimensions from the sets of compression matrices {} and stack them together. A singleton dimension of tensor or matrix is any dimension for which size of matrix or tensor with given dimensions becomes one. Consider the 5by1by5 array A. After removing its singleton dimension, the array A become 5by5. Now, we can write compression matrices as:
(6) 
where and are matrices of dimension and respectively. The updated factor matrices (,, and ) for 3mode tensor (i.e. ) can be obtained by :
(7) 
where and are matrices of dimension and respectively. Hence, we achieve Goal G2.
Finally, by putting everything together, we obtain the general version of our OCTen for 3mode tensor, as presented in Algorithm 1 in supplementary material. The matlab implementation for method is available at . The higher order version of OCTen is also given in supplementary materials. We refer the interested reader to supplementary material that provide more details on OCTen and its applications.
Complexity Analysis: As discussed previously, compression step’s time and space complexity is and respectively. Identifiability and update can be calculated in . Hence, time complexity is considered as . Overall, as S is larger than any other factors, the time complexity of OCTen can be written as . In terms of space consumption, OCTen is quite efficient since only the compressed matrices, previous factor matrices and summaries need to be stored. Hence, the total cost of space is .
5 Empirical Analysis
We design experiments to answer the following questions: (Q1) How much memory OCTen required for updating incoming data? (Q2) How fast and accurately are updates in OCTen compared to incremental algorithms? (Q3) How does the running time of OCTen increase as tensor data grow (in time mode)? (Q4) What is the influence of parameters on OCTen? (Q5) How OCTen used in realworld scenarios?
For our all experiments, we used Intel(R) Xeon(R), CPU E52680 v3 @ 2.50GHz machine with 48 CPU cores and 378GB RAM.
5.1 Evaluation Measures
We evaluate OCTen and the baselines using three criteria: fitness, processor memory used, and wallclock time. These measures provide a quantitative way to compare the performance of our method. More specifically, Fitness measures the effectiveness of approximated tensor and defined as :
higher the value, better the approximation. Here is original tensor and is drawn tensor from OCTen.
CPU time (sec): indicates the average running time for processing all slices for given tensor, measured in seconds, is used to validate the time efficiency of an algorithm.
Process Memory used (MBytes): indicates the average memory required to process each slices for given tensor, is used to validate the space efficiency of an algorithm.
5.2 Baselines
In this experiment, four baselines have been selected as the competitors to evaluate the performance. OnlineCP[27]: It is online CP decomposition method, where the latent factors are updated when there are new data. SambaTen[7]: Samplingbased Batch Incremental Tensor Decomposition algorithm is the most recent and stateoftheart method in online computation of canonical parafac and perform all computations in the reduced summary space. RLST[15]: Recursive Least Squares Tracking (RLST) is another online approach in which recursive updates are computed to minimize the Mean Squared Error (MSE) on incoming slice. ParaComp[21]: an implementation of nonincremental parallel compression based tensor decomposition method. The model is based on parallel processing of randomly compressed and reduced size replicas of the data. Here, we simply recompute decomposition after every update.
5.3 Experimental Setup
The specifications of each synthetic dataset are given in Table 2. We generate tensors of dimension I = J = K with increasing I and other modes, and added gaussian distributed noise. Those tensors are created from a known set of randomly (uniformly distributed) generated factors with known rank , so that we have full control over the ground truth of the full decomposition. For real datasets , we use AUTOTEN [17] to find rank of tensor. We dynamically calculate the size of incoming batch or incoming slice(s) for our all experiments to fit the data into 10% of memory of machine. Remaining machine CPU memory is used for computations for algorithm. We use 20 parallel workers for every experiment.
I=J=K  NNZ  Batch size  p  Q  shared  Noise () 

50  5  20  30  5  (0.1, 0.2)  
100  10  30  35  10  (0.2, 0.2)  
500  50  40  30  6  (0.5, 0.6)  
1000  20  50  40  10  (0.4, 0.2)  
5000  10  90  70  25  (0.5, 0.6)  
10000  10  110  100  20  (0.2, 0.7)  
50000  4  140  150  30  (0.6, 0.9) 
Note that all comparisons were carried out over 10 iterations each, and each number reported is an average with a standard deviation attached to it. Here, we only care about the relative comparison among baseline algorithms and it is not mandatory to have the best rank decomposition for every dataset. In case of methodspecific parameters, for ParaComp algorithm, the settings are chosen to give best performance. For OnlineCP, we choose the batchsize which gives best performance in terms of approximation fitness. For fairness, we also compare against the parameter configuration for OCTen that yielded the best performance. Also, during processing, for all methods we remove unnecessary variable from baselines to fairly compare with our methods.
5.4 Results
5.4.1 [Q1 & Q2] Memory efficient, Fast and Accurate
For all datasets we compute Fitness(%),CPU time (seconds) and Memory(MB) space required. For OCTen, OnlineCP, ParaComp,Sambaten and RLST we use 10% of the timestamp data in each dataset as existing old tensor data. The results for qualitative measure for data is shown in Figure 2. For each of tensor data ,the best performance is shown in bold. All stateofart methods address the issue very well. Compared with OnlineCP, ParaComp,Sambaten and RLST, OCTen give comparable fitness and reduce the mean CPU running time by up to 2x times for big tensor data. For all datasets, PARACOMP’s accuracy (fitness) is better than all methods. But it is able to handle upto size only. For small size datasets, OnlineCP’s efficiency is better than all methods. For large size dataset, OCTen outperforms the baseline methods w.r.t fitness, average running time (improved 2x4x) and memory required to process the updates. It significantly saved 40200% of memory as compared to Online CP, Sambaten and RLST as shown in Figure 2. It saved 4080% memory space compared to Paracomp. Hence, OCTen is comparable to stateofart methods for small dataset and outperformed them for large dataset. These results answer Q1 & Q2 as the OCTen have comparable qualitative measures to other methods.
5.4.2 [Q3] Scalability Evaluation
To evaluate the scalability of our method, firstly, a tensor of small slice size () but longer time dimension () is created. Its first 10% timestamps of data is used for and each method’s running time for processing batch of 10 data slices at each time stamp measured.
As can be seen from Figure 3, increasing length of temporal mode increases time consumption quasilinearly. However the slope is different for various nontemporal mode data sizes. In terms of memory consumption, OCTen also behaves linearly. This is expected behaviour because with increase in temporal mode, the parameters i.e. and Q also grows. Once again, our proposed method illustrates that it can efficiently process large sized temporal data. This answers our Q3.
5.4.3 [Q4] Parameter Sensitivity
We extensively evaluate sensitivity of number of compressed cubes required ’’ , size of compressed cubes and number of shared columns required for OCTen. we fixed batch size to for all datasets ,where is time dimension of tensor . As discussed in section 4, it is possible to identify unique decomposition . In addition, if we have
(8) 
for parallel workers, decomposition is almost definitely identifiable with column permutation and scaling.
We keep this in mind and evaluate the OCTen as follows.
(a) Sensitivity of p :The number of compressed cubes play an important role in OCTen. We performed experiments to evaluate the impact of changing the number of cubes i.e. with fixed values of other parameters for different size of tensors. We see in figure 4 that increasing number of cubes result in increase of Fitness of approximated tensor and CPU Time and Memory (MB) is superlinearly increased. Consider the case of , from above condition, we need . We can see from Figure 4, the condition holds true.
(b) Sensitivity of Q: To evaluate the impact of Q , we fixed other parameters i.e. ’’ and ’’. We can see that with higher values of the ’Q’, Fitness is improved as shown in Figure 5. Also It is observed that when equation 8 satisfy, fitness become saturated. Higher the size of compressed cubes, more memory is required to store them.
(c) Sensitivity of shared : To evaluate the impact of ’shared’ , we fixed other parameters i.e. ’’ and ’Q’.We observed that this parameter does not have impact on CPU Time (sec) and Memory space(MB). The best fitness is found when as shown in figure 5.4.3. Fitness decreases when because the new compressed cubes completely loses its own structure when joined to old compressed cubes. To retain both old and new structure we choose to keep parameter for all experiments.
In sum, these observations demonstrate that: 1) a suitable number of cubes and its size i.e. on compressed tensor could improve the fitness and result in better tensor decomposition, and 2) For identifiability ’p’ must satisfy the condition, , to achieve better fitness, lower CPU Time (seconds) and low memory space (MB). This result answers Q4.
5.4.4 [Q5] Effectiveness on real world dataset
To evaluate effectiveness of our method on real world networks, we use the FoursquareNYC sequential temporal dataset [26] and American College Football Network (ACFN) [20] (analysis provided in supplementry material). FoursquareNYC dataset includes longterm ( 10 months) checkin data in New York city collected from Foursquare from 12 April 2012 to 16 February 2013. The tensor data is structured as [user (1k), Point of Interest (40k), time (310 days)] and each element in the tensor represents the total time spent by user for that visit. Our aim is to find next top@5 places to visit in NYC per user. We decompose the tensor data into batches of 7 days and using rank = estimated by AutoTen [17]. For evaluation, we reconstruct the complete tensor from factor matrices and mask the known POIs in the tensor and then rank the locations for each user. Since we do not have human supplied relevance rankings for this dataset, we choose to visualize the most significant factor (locations) using maps provided by Google. If the top ranked places are within 5 miles radius of user’s previous places visited, then we consider the decomposition is effective. In the Figure 6(a), the five red markers corresponds to the five highest values of the factor. These locations correspond to wellknown area in NYC : Brooklyn Bridge , Square garden , Theater District and Museum of Art. The high density of activities (green points) verifies their popularity. Figure 6(b,c) shows the top@5 results for users #192 and user #902, the red marker shows the next locations to visit and yellow marker shows the previous visited locations. More interestingly, we can see that user #192 visited coffee shops and restaurants most of the time food, top@5 ranked locations are also either restaurants or food & drink shops. Similarity, user #902, most visited places are Fitness center, top@5 ranked locations are park, playground and Stadium. Both case studies shows the effectiveness of the decomposition and confirms that the OCTen can be used for various types of data analysis and this answers Q5.
6 Related Work
In this section, we provide review of the work related to our algorithm. At large, incremental tensor methods in the literature can be categorized into two main categories as described below:
Tensor Decomposition: Phan el at. [19] had purposed a theoretic method namely GridTF to largescale tensors decomposition based on CP’s basic mathematical theory to get subtensors and join the output of all decompositions to achieve final factor matrices. Sidiropoulos el at.[15], proposed algorithm that focus on CP decomposition namely RLST (Recursive Least Squares Tracking), which recursively update the factors by minimizing the mean squared error. In 2014, Sidiropoulos el at. [21] , proposed a parallel algorithm for lowrank tensor decomposition that is suitable for large tensors. The Zhou, el at. [27] describes an online CP decomposition method, where the latent components are updated for incoming data. The most related work to ours was proposed by [7] which is samplingbased batch incremental tensor decomposition algorithm. These stateoftheart techniques focus on only fast computation but not effective memory usage. Besides CP decomposition, tucker decomposition methods[23, 16] were also introduced. Some of these methods were not only able to handle data increasing in onemode, but also have solution for multiplemode updates using methods such as incremental SVD [6]. Latest line of work is introduced in [2] i.e TuckerMPI to find inherent lowdimensional multilinear structure, achieving high compression ratios. Tucker is mostly focused on recovering subspaces of the tensor, rather than latent factors, whereas our focus is on the CP decomposition which is more suitable for exploratory analysis.
Tensor Completion: Another field of study is tensor completion, where realworld largescale datasets are considered incomplete. In literature, wide range of methods have been proposed based on online tensor imputation[13] and tensor completion with auxiliary information[14, 1]. The most recent method in this line of work is by Qingquan el at.[22], who proposed a lowrank tensor completion with general multiaspect streaming patterns, based on block partitioning of the tensor. However, these approaches cannot be directly applied when new batches of data arrived. This provides us a good starting reference for further research.
7 Conclusions
In this work, we focus on online tensor decomposition problem and proposed a novel compression based OCTen framework. The proposed framework effectively identify the low rank latent factors of compressed replicas of incoming slice(s) to achieve online tensor decompositions. To further enhance the capability, we also tailor our general framework towards higherorder online tensors. Through experiments, we empirically validate its effectiveness and accuracy and we demonstrate its memory efficiency and scalability by outperforming stateoftheart approaches (40200 % better). Regardless, future work will focus on investigating different tensor decomposition methods and incorporating various tensor mining methods into our framework.
References
 [1] E. Acar, T. G. Kolda, and D. M. Dunlavy. Allatonce optimization for coupled matrix and tensor factorizations. arXiv preprint arXiv:1105.3422, 2011.
 [2] W. Austin, G. Ballard, and T. G. Kolda. Parallel tensor compression for largescale scientific data. In PDPS, 2016 IEEE Int., pages 912–922. IEEE, 2016.
 [3] B. W. Bader, T. G. Kolda, et al. Matlab tensor toolbox version 2.6, available online, february 2015, 2015.
 [4] J. Brewer. Kronecker products and matrix calculus in system theory. IEEE Transactions on circuits and systems, 25(9):772–781, 1978.
 [5] J. D. Carroll and J.J. Chang. Analysis of individual differences in multidimensional scaling via an nway generalization of âeckartyoungâ decomposition. Psychometrika, 35(3):283–319, 1970.
 [6] H. FanaeeT and J. Gama. Multiaspectstreaming tensor analysis. KnowledgeBased Systems, 89:332–345, 2015.
 [7] E. Gujral, R. Pasricha, and E. E. Papalexakis. Sambaten: Samplingbased batch incremental tensor decomposition. arXiv preprint arXiv:1709.00668, 2017.
 [8] R. Harshman. Foundations of the parafac procedure: Models and conditions for an” explanatory” multimodal factor analysis. 1970.
 [9] R. A. Harshman. Determination and proof of minimum uniqueness conditions for parafac1. UCLA Working Papers in phonetics, 22(111117):3, 1972.
 [10] T. Jiang and N. D. Sidiropoulos. Kruskal’s permutation lemma and the identification of candecomp/parafac and bilinear models with constant modulus constraints. IEEE Transactions on Signal Processing, 52(9):2625–2636, 2004.
 [11] T. Kolda and B. Bader. Tensor decompositions and applications. SIAM review, 51(3), 2009.
 [12] T. G. Kolda and J. Sun. Scalable tensor decompositions for multiaspect data mining. In Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, pages 363–372. IEEE, 2008.
 [13] G. B. Mardani, Gonzalo. Subspace learning and imputation for streaming big data matrices and tensors. IEEE Signal Processing, 2015.
 [14] A. Narita, K. Hayashi, R. Tomioka, and H. Kashima. Tensor factorization using auxiliary information. In European Conf. on MLKDD. Springer, 2011.
 [15] D. Nion and N. D. Sidiropoulos. Adaptive algorithms to track the parafac decomposition of a thirdorder tensor. IEEE Transactions on Signal Processing, 57(6):2299–2310, 2009.
 [16] S. Papadimitriou, J. Sun, and C. Faloutsos. Streaming pattern discovery in multiple timeseries. In Proc.of 31st Int. Conf. on VLDB. VLDB Endowment, 2005.
 [17] E. E. Papalexakis. Automatic unsupervised tensor mining with quality assessment. In SDM. SIAM, 2016.
 [18] E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos. Tensors for data mining and data fusion: Models, applications, and scalable algorithms. TIST, 2016.
 [19] A. H. Phan and A. Cichocki. Parafac algorithms for largescale problems. Neurocomputing, 74(11):1970–1984, 2011.
 [20] F. Sheikholeslami, B. Baingana, G. B. Giannakis, and N. D. Sidiropoulos. Egonet tensor decomposition for community identification. In GlobalSIP, 2016.
 [21] N. Sidiropoulos, Evangelos E. Papalexakis, and C. Faloutsos. Parallel randomly compressed cubes. IEEE Signal Processing Magazine, 2014.
 [22] Q. Song, H. G. Xiao Huang, J. Caverlee, and X. Hu. Multiaspect streaming tensor completion. In KDD. ACM, 2017.
 [23] J. Sun, D. Tao, S. Papadimitriou, P. S. Yu, and C. Faloutsos. Incremental tensor analysis:theory and applications. ACM Trans. Knowl. Discov. Data, Oct. 2008.
 [24] L. Tucker. Some mathematical notes on threemode factor analysis. Psychometrika, 31(3):279–311, 1966.
 [25] X. Wang and C. Navasca. Lowrank approximation of tensors via sparse optimization. Numerical Linear Algebra with Applications, 25(2), 2018.
 [26] D. Yang, D. Zhang, V. W. Zheng, and Z. Yu. Modeling user activity preference by leveraging user spatial temporal characteristics in lbsns. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(1):129–142, 2015.
 [27] S. Zhou, N. Vinh, J. Bailey, Y. Jia, and I. Davidson. Accelerating online cp decompositions for higher order tensors. In 22nd ACM SIGKDD. ACM, 2016.
Appendix A Supplementary Materials
a.1 Extending to HigherOrder Tensors
We now show how our approach is extended to higherorder cases. Consider Nmode tensor . The factor matrices are for CP decomposition with mode as new incoming data. A new tensor is added to to form new tensor of where . In addition, sets of compression matrices for old data are {} and for new data it is {} for p number of summaries.
Each compression matrices are converted into compressed cubes i.e. for compressed cube is of dimension and same follows for . The updated summaries are computed using s.t. . After CP decomposition of , factor matrices and random compressed matrices are stacked as :
(9) 
Finally, the update rule of each nontemporal mode and temporal mode is :
(10) 
a.2 Necessary characteristics for uniqueness
As we mention above, OCTen is able to identify the solution of the online CP decomposition, as long as the parallel CP decompositions on the compressed tensors are also identifiable. Empirically, we observe that if the decomposition of a given data that has exact or neartrilinear structure (or multilinear in the general case), i.e. obeying the lowrank CP model with some additive noise, OCTen is able to successfully, accurately, and using much fewer resources than stateoftheart, track the online decomposition. On the other hand, when given data that do not have a low trilinear rank, the results were of lower quality. This observation is of potential interest in exploratory analysis, where we do not know 1) the (low) rank of the data, and 2) whether the data have low rank to begin with (we note that this is an extremely hard problem, out of the scope of this paper, but we refer the interested reader to previous studies [25, 17] for an overview). If OCTen provides a good approximation, this indirectly signifies that the data have appropriate trilinear structure, thus CP is a fitting tool for analysis. If, however, the results are poor, this may indicate that we need to reconsider the particular rank used, or even analyzing the data using CP in the first place. We reserve further investigations of what this observation implies for future work.
a.3 OCTen at work
We construct the ACFN tensor data with the playerplayer interaction to a 115 x 115 grid, and considering the time as the third dimension of the tensor. Therefore, each element in the tensor is an integer value that represents the number of interactions between players at a specific moment of time. Our aim is to find the players communities (ground truth communities = 12) changed over time in football dataset. In order to evaluate the effectiveness of our method on football dataset, we compare the groundtruth communities against the communities found by the our method. Figure 7 shows a visualization of the football network over time, with nodes colored according to the observed communities. American football leagues are tightlyknit communities because of very limited matches played across communities. Since these communities generally do not overlap, we perform hard clustering. We find that communities are volatile and players belongs to community #12 (from subfigure (a)) are highly dynamic in forming groups. We observe that OCTen is able to find relevant communities and also shows the ability to capture the changes in forming those communities in temporal networks.