Rank Minimization on Tensor Ring: A New Paradigm in Scalable Tensor Decomposition and Completion
Abstract
In low-rank tensor completion tasks, traditional methods rely on multiple large-scale singular value decomposition (SVD) operations and on rank selection, and therefore suffer from high computational cost and high sensitivity to model complexity. In this paper, taking advantage of the high compressibility of the recently proposed tensor ring (TR) decomposition, we propose a new model for the tensor completion problem. This is achieved by introducing convex surrogates of the tensor low-rank assumption on the latent tensor ring factors, which makes it possible to solve Schatten norm regularization based models at a much smaller scale. We propose two algorithms which apply different structured Schatten norms to the tensor ring factors. With the alternating direction method of multipliers (ADMM) scheme, the tensor ring factors and the predicted tensor can be optimized simultaneously. Experiments on synthetic data and real-world data show the high performance and efficiency of the proposed approach.
1 Introduction
Tensor decomposition aims to find the latent factors of tensor-valued data (i.e., the generalization of multi-dimensional arrays), thereby casting large-scale tensors into a multilinear tensor space of low dimensionality (very few degrees of freedom, designated by the rank). Tensor factors can then be considered as latent features of the data, and in this way can represent the data economically and predict missing entries when the data is incomplete. The specific form of, and the operations among, the latent factors define the type of tensor decomposition. A variety of tensor decomposition models have been applied in diverse fields such as machine learning [20, 2, 16] and signal processing [30, 7]. Tucker decomposition and CANDECOMP/PARAFAC (CP) decomposition are classical tensor decomposition models, which have been studied for nearly half a century [18, 24, 13].
In recent years, the concept of tensor networks has been proposed and has become a powerful and promising aspect of tensor methodology [5, 6]. One of the most recent and popular tensor networks, named the matrix product state/tensor-train (MPS/TT), is studied across disciplines owing to its super-compression and computational efficiency properties [21, 20]. For a tensor of $N$ dimensions, the most significant property of TT decomposition is that the space complexity does not grow exponentially with $N$, thus providing a natural remedy for the ‘curse of dimensionality’, while the number of parameters of Tucker decomposition is exponential in $N$. Although the CP decomposition is a highly compact representation which has the desirable property of being linear in $N$, it has difficulties in finding the optimal latent tensor factors. To address these issues, recent studies propose a generalization of TT decomposition, termed the tensor ring (TR) decomposition, which relaxes the rank constraint of TT and thus offers an enhanced representation ability, latent factor permutation flexibility (i.e., a permutation of the tensor is directly related to a permutation of the tensor factors) and structure information interpretability (i.e., each tensor factor can represent a specific feature of the original tensor) [29, 27].
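The scaling argument above can be made concrete by counting stored parameters. The sketch below assumes, purely for illustration, that every mode has the same size `I` and every rank equals `R`; the function names are hypothetical, and the counts follow the usual storage layout of each format.

```python
def tucker_params(N, I, R):
    # One R x ... x R core (R**N entries) plus N factor matrices of size I x R.
    return R ** N + N * I * R


def cp_params(N, I, R):
    # N factor matrices of size I x R.
    return N * I * R


def tt_params(N, I, R):
    # Boundary cores are 1 x I x R and R x I x 1; the N - 2 inner cores are R x I x R.
    if N == 1:
        return I
    return 2 * I * R + (N - 2) * I * R * R


def tr_params(N, I, R):
    # N cores of size R x I x R (no rank-one boundary constraint).
    return N * I * R * R


if __name__ == "__main__":
    I, R = 10, 3
    for N in (4, 8, 16):
        print(N, tucker_params(N, I, R), cp_params(N, I, R),
              tt_params(N, I, R), tr_params(N, I, R))
```

Under these assumptions only the Tucker count contains a term that grows exponentially with `N`; the CP, TT and TR counts grow linearly.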
Tensor completion aims to recover an incomplete tensor from partially observed entries. The theoretical lynchpin in matrix or tensor completion problems is the low-rank assumption, and tensor completion has been applied in various applications such as image/video completion [19, 28], recommendation systems [17], link prediction [8] and compressed sensing [10], to name but a few. Since the determination of tensor rank is an NP-hard problem [14, 18], many low-rank surrogates have been proposed for tensor completion. One such surrogate is the Schatten norm (a.k.a. nuclear norm, or trace norm), which is defined as the sum of the singular values of a matrix and is the most popular convex surrogate for rank regularization. Unlike in matrix completion problems, the Schatten norm model of a tensor is hard to formulate. Recent studies mainly focus on two convex relaxation models of the tensor Schatten norm, the ‘overlapped’ model [19, 23, 4, 22, 15] and the ‘latent’ model [23, 12].
The work in [23] first proposes the ‘latent’ norm model and shows that the mean square error of a ‘latent’ norm method scales no greater than that of the ‘overlapped’ norm method. Under the low-rank regularization of the latent model, the tensor does not need to be low-rank in every mode, which is considered a more flexible constraint. Neither model needs the rank of the decomposition to be specified, and the rank of the tensor is optimized to be minimal subject to agreement on the observed elements. However, both methods need to perform multiple SVD operations on the matricizations of the tensor, and the computational complexity grows exponentially with the tensor dimension. Other tensor completion algorithms, such as alternating least squares (ALS) [11, 25] and gradient-based algorithms [26, 1], need the rank of the decomposition to be specified beforehand, which leads to tedious parameter tuning. In addition, the completion performance of tensor completion algorithms is mainly affected by rank selection, the number of observed entries and the tensor dimensions.
In this paper, in order to tackle the high computational cost and the sensitivity to rank selection problems that most proposed algorithms experience, we propose a new tensor completion model based on the tensor ring decomposition. Our main contributions are listed below:

The relation between the low-rank assumption on a tensor and on its latent factors is explained theoretically, and the low-rank surrogate on the latent factors of the tensor ring decomposition is introduced.

We formulate the TR overlapped low-rank factor (TROLRF) model and the TR latent low-rank factor (TRLLRF) model, and solve both models efficiently with the ADMM algorithm.

We conduct several experiments in which our algorithms achieve high performance and high efficiency. In addition, the experimental results show that our algorithms are robust to rank selection and data dimensionality.
2 Preliminaries
2.1 Tensor ring decomposition
Tensor ring (TR) decomposition is a more general decomposition than tensor-train (TT) decomposition, and it represents a tensor of large dimension by circular multilinear products over a sequence of low-dimensional cores. All of the cores of a TR decomposition are order-three tensors, denoted by $\mathcal{G}^{(n)} \in \mathbb{R}^{R_n \times I_n \times R_{n+1}}$, $n = 1, \ldots, N$. The decomposition diagram is shown in Fig. 1. In the same way as TT, the TR decomposition scales linearly with the dimension of the tensor, and thus it can overcome the ‘curse of dimensionality’. For simplicity, we define $\{\mathcal{G}\} := \{\mathcal{G}^{(1)}, \ldots, \mathcal{G}^{(N)}\}$ to represent a set of tensor cores. The syntax $\{R_1, R_2, \ldots, R_N\}$ denotes the TR-rank, which controls the model complexity of the TR decomposition. The TR decomposition relaxes the rank constraint on the first and last core of TT to $R_1 = R_{N+1}$, while the original constraint on TT is rather stringent, i.e., $R_1 = R_{N+1} = 1$. TR applies the trace operation, and all the core tensors are equivalently constrained to be third-order. In this case, TR can be considered as a linear combination of TTs and thus offers a more powerful and generalized representation ability than TT. The element-wise relation and the global relation between the TR decomposition and the original tensor are given by equations (1) and (2):
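(1) $\mathcal{X}(i_1, i_2, \ldots, i_N) = \mathrm{Trace}\left\{ \mathbf{G}^{(1)}_{i_1} \mathbf{G}^{(2)}_{i_2} \cdots \mathbf{G}^{(N)}_{i_N} \right\},$
(2) $\mathbf{X}_{\langle n \rangle} = \mathbf{G}^{(n)}_{(2)} \left( \mathbf{G}^{(\neq n)}_{\langle 2 \rangle} \right)^{T},$
where $\mathrm{Trace}\{\cdot\}$ is the matrix trace operator, $\mathbf{G}^{(n)}_{i_n} \in \mathbb{R}^{R_n \times R_{n+1}}$ is the $i_n$th mode-2 slice matrix of $\mathcal{G}^{(n)}$, which can also be denoted by $\mathcal{G}^{(n)}(:, i_n, :)$. $\mathcal{G}^{(\neq n)} \in \mathbb{R}^{R_{n+1} \times \prod_{i \neq n} I_i \times R_n}$ is a subchain tensor obtained by merging all cores except the $n$th core tensor, i.e., $\mathcal{G}^{(\neq n)} := \{\mathcal{G}^{(n+1)}, \ldots, \mathcal{G}^{(N)}, \mathcal{G}^{(1)}, \ldots, \mathcal{G}^{(n-1)}\}$. $\mathbf{X}_{(n)}$ is the mode-$n$ matricization operator of a tensor, i.e., if $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, then $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times I_1 \cdots I_{n-1} I_{n+1} \cdots I_N}$. $\mathbf{X}_{\langle n \rangle}$ is another type of mode-$n$ matricization operator of a tensor, e.g., if $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, then $\mathbf{X}_{\langle n \rangle} \in \mathbb{R}^{I_n \times I_{n+1} \cdots I_N I_1 \cdots I_{n-1}}$.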
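The element-wise trace relation of the TR decomposition can be checked numerically. The following is a minimal sketch, assuming NumPy and cores stored as arrays of shape `(R_n, I_n, R_{n+1})`; the helper name `tr_to_tensor` and the chosen sizes are hypothetical.

```python
import numpy as np


def tr_to_tensor(cores):
    """Contract TR cores of shape (R_n, I_n, R_{n+1}) into the full tensor."""
    chain = cores[0]
    for core in cores[1:]:
        # Contract the shared rank index, then merge the two physical indices.
        chain = np.einsum("aib,bjc->aijc", chain, core)
        chain = chain.reshape(chain.shape[0], -1, chain.shape[-1])
    # Close the ring: trace over the first and last rank index (needs R_1 == R_{N+1}).
    full = np.trace(chain, axis1=0, axis2=2)
    return full.reshape([core.shape[1] for core in cores])


rng = np.random.default_rng(0)
dims = [3, 4, 5, 2]    # I_1, ..., I_4 (illustrative values)
ranks = [2, 3, 2, 4]   # R_1, ..., R_4, with R_5 = R_1
N = len(dims)
cores = [rng.standard_normal((ranks[n], dims[n], ranks[(n + 1) % N]))
         for n in range(N)]
X = tr_to_tensor(cores)

# One entry computed directly as the trace of a product of slice matrices.
idx = (1, 2, 3, 0)
prod = np.eye(ranks[0])
for core, i in zip(cores, idx):
    prod = prod @ core[:, i, :]
entry = np.trace(prod)
```

Here `X[idx]` and `entry` agree up to floating-point error, which is the element-wise relation evaluated at a single index.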
2.2 Tensor completion by Schatten norm regularization
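The low-rank tensor completion problem can be formulated as:
(3) $\min_{\mathcal{X}} \ \mathrm{Rank}(\mathcal{X}), \quad \text{s.t.} \ P_{\Omega}(\mathcal{X}) = P_{\Omega}(\mathcal{T}),$
and the model can be written in an unconstrained form as:
(4) $\min_{\mathcal{X}} \ \mathrm{Rank}(\mathcal{X}) + \frac{\lambda}{2} \left\| P_{\Omega}(\mathcal{T} - \mathcal{X}) \right\|_F^2,$
where $\mathcal{X}$ is the low-rank approximation tensor, $\mathrm{Rank}(\cdot)$ is a rank regularizer, $P_{\Omega}(\mathcal{T})$ denotes all the observed entries of the incomplete tensor $\mathcal{T}$ w.r.t. the set of indices of observed entries represented by $\Omega$, $\lambda > 0$ weights the data-fitting term, and $\| \cdot \|_F$ is the Frobenius norm. For the low-rank tensor completion problem, determining the rank of a tensor is an NP-hard problem. The work in [19] and [22] extends the concept of low-rank matrix completion and defines the tensor rank as the sum of the ranks of the mode-$n$ matricizations of the tensor. This surrogate is named the ‘overlapped’ model, and it simultaneously regularizes all the mode-$n$ matricizations of a tensor towards low-rankness by the Schatten norm. In this way, we can define the rank of a tensor as:
(5) $\mathrm{Rank}(\mathcal{X}) := \sum_{n=1}^{N} \left\| \mathbf{X}_{(n)} \right\|_*,$
where $\| \cdot \|_*$ denotes the Schatten norm.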
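The ‘overlapped’ surrogate can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions: the helper names are hypothetical, and the column ordering of `unfold` is whatever `reshape` produces, which does not affect singular values.

```python
import numpy as np


def unfold(X, n):
    """Mode-n matricization: mode n indexes the rows, all other modes the columns."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)


def overlapped_schatten_norm(X):
    """Sum of the nuclear norms of all mode-n matricizations of X."""
    return sum(np.linalg.norm(unfold(X, n), ord="nuc") for n in range(X.ndim))


# A rank-one tensor: every unfolding has a single non-zero singular value,
# equal to the product of the vector norms (5 * 3 * 2 = 30).
a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0, 2.0])
c = np.array([2.0])
X = np.einsum("i,j,k->ijk", a, b, c)
value = overlapped_schatten_norm(X)
```

For this rank-one example the surrogate equals three times the single singular value. Note that each evaluation requires one SVD per mode of the full tensor, which is the cost the paper aims to avoid.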
Another surrogate of tensor rank, named the ‘latent’ low-rank model, has been proposed and studied recently. In [23], the ‘latent’ model considers the original tensor as a summation of several latent tensors and assumes that each latent tensor is low-rank in a specific mode:
(6) $\mathrm{Rank}(\mathcal{X}) := \min_{\mathcal{Z}^{(1)}, \ldots, \mathcal{Z}^{(N)}} \sum_{n=1}^{N} \left\| \mathbf{Z}^{(n)}_{(n)} \right\|_*, \quad \text{s.t.} \ \mathcal{X} = \sum_{n=1}^{N} \mathcal{Z}^{(n)},$
where $\mathbf{Z}^{(n)}_{(n)}$ is the mode-$n$ matricization of the $n$th latent tensor $\mathcal{Z}^{(n)}$. This convex surrogate is more flexible, as it can fit the tensor well even if the tensor is not low-rank in all modes. The completion algorithms based on these two models are shown to have fast convergence and good performance when the data size is small. However, for large-scale data, the multiple SVD operations become intractable due to their high computational cost.
2.3 Tensor completion by tensor decomposition
Some other existing tensor completion algorithms do not impose a low-rank constraint on the tensor, and thus do not find the low-rank tensor directly. Instead, they find a low-rank representation (i.e., tensor factors) of the incomplete data from the observed entries, and the obtained latent factors are then used to predict the missing entries. The completion problem is posed as a weighted least squares model; e.g., the tensor completion model based on TR decomposition is formulated as:
(7) $\min_{\{\mathcal{G}\}} \ \left\| \mathcal{W} * \left( \mathcal{T} - \Psi(\{\mathcal{G}\}) \right) \right\|_F^2,$
where $*$ is the Hadamard product of two tensors of the same size and $\Psi(\{\mathcal{G}\})$ is the tensor generated by the tensor factors. $\mathcal{W}$ is a weight tensor of the same size as $\mathcal{T}$ which records the indices of the observed entries of $\mathcal{T}$: every entry of $\mathcal{W}$ equals 1 if the corresponding entry of $\mathcal{T}$ is observed and 0 otherwise.
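A small numerical example may help to see what the weight tensor does in the weighted least squares objective. The sketch assumes NumPy; `weighted_residual` is a hypothetical helper, and `X_hat` stands in for the tensor generated by the factors.

```python
import numpy as np


def weighted_residual(T, X_hat, W):
    """Squared Frobenius norm of the residual restricted to observed entries."""
    return np.linalg.norm(W * (T - X_hat)) ** 2


T = np.arange(8, dtype=float).reshape(2, 2, 2)
W = np.zeros_like(T)
W[0, 0, 0] = W[1, 1, 1] = 1.0   # only two entries are observed
X_hat = np.zeros_like(T)
X_hat[0, 1, 0] = 100.0          # a large error on an unobserved entry
loss = weighted_residual(T, X_hat, W)
```

Only the two observed entries contribute (values 0 and 7 against a zero estimate), so the loss is 49 regardless of what `X_hat` contains at unobserved positions.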
Based on solving for the tensor factors of different tensor decompositions, many tensor completion algorithms have been proposed, e.g., weighted CP [1], weighted Tucker [9], weighted TT [26] and TR-ALS [25]. However, these algorithms are usually solved by gradient-based or alternating least squares methods, which are shown to suffer from slow convergence and high computational cost. In addition, the performance of these methods is sensitive to rank selection.
In this paper, we take advantage of both the ‘overlapped’ and the ‘latent’ approach of structured Schatten norms, and aim to formulate a new tensor completion model. The main idea is to impose a low-rank constraint on the latent factors of a tensor. In this way, we only need to compute SVDs of the tensor factors instead of the full-scale data. At the same time, the low-rankness constraint regularizes the tensor factors towards low rank, and in doing so addresses the problem of rank selection. The next section presents our proposed method based on both the ‘overlapped’ and the ‘latent’ tensor low-rank models.
3 Low-rankness on tensor factors
We propose a new definition of a low-rank tensor, which places the low-rankness on the decomposition factors of the tensor. For TR decomposition, the low-rank model is formulated as:
(8) 
where $\Psi(\{\mathcal{G}\})$ denotes the tensor approximated by the core tensors $\{\mathcal{G}\}$. We formulate the low-rank assumption on the core tensors by equations (5) and (6).
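We first need to deduce the relation between the tensor rank and the tensor factor rank, which is explained by the theorem below:
Theorem 1: For $n = 1, \ldots, N$, $\mathrm{Rank}(\mathbf{G}^{(n)}_{(2)}) \geq \mathrm{Rank}(\mathbf{X}_{(n)})$.
Proof: For the $n$th core tensor $\mathcal{G}^{(n)}$, from equation (2), we can infer $\mathrm{Rank}(\mathbf{X}_{\langle n \rangle}) \leq \min\{\mathrm{Rank}(\mathbf{G}^{(n)}_{(2)}), \mathrm{Rank}(\mathbf{G}^{(\neq n)}_{\langle 2 \rangle})\} \leq \mathrm{Rank}(\mathbf{G}^{(n)}_{(2)})$. Since $\mathbf{X}_{(n)}$ and $\mathbf{X}_{\langle n \rangle}$ differ only by a permutation of columns, they have the same rank.
The above theorem proves the relation between the ranks of the tensor $\mathcal{X}$ and of the core tensors $\{\mathcal{G}\}$. Since $\mathrm{Rank}(\mathbf{G}^{(n)}_{(2)})$ is an upper bound on the rank of the mode-$n$ matricization of the tensor $\mathcal{X}$, we can assume that $\mathbf{G}^{(n)}_{(2)}$ has a low-rank structure. This can greatly decrease the computational complexity compared to other algorithms which place the low-rank assumption on overlapped tensors or latent tensors. In a similar way, we can deduce that the sum of the latent ranks of the tensor factors is an upper bound on the latent rank of the original tensor. More specifically, our tensor ring overlapped low-rank factor (TROLRF) model is formulated as follows: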
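The rank bound can be illustrated on a small three-core ring. This is a sketch assuming NumPy, with hypothetical sizes; the first core is built to have mode-2 rank 2. The explicit factorization through the merged subchain of the remaining cores is written out so that the bound can be read off directly.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 3
I1, I2, I3 = 6, 4, 5

# First core with mode-2 rank 2: G1[a, i, b] = sum_k A[i, k] * B[k, a, b].
A = rng.standard_normal((I1, 2))
B = rng.standard_normal((2, R, R))
G1 = np.einsum("ik,kab->aib", A, B)
G2 = rng.standard_normal((R, I2, R))
G3 = rng.standard_normal((R, I3, R))

# Full tensor from the ring contraction (the repeated index a closes the ring).
X = np.einsum("aib,bjc,cka->ijk", G1, G2, G3)

# Mode-2 unfolding of the first core: rows indexed by i, columns by (a, b).
G1_mat = G1.transpose(1, 0, 2).reshape(I1, R * R)
# Subchain merging cores 2 and 3, unfolded with rows (j, k) and columns (a, b).
sub = np.einsum("bjc,cka->bjka", G2, G3).reshape(R, I2 * I3, R)
sub_mat = sub.transpose(1, 2, 0).reshape(I2 * I3, R * R)

X_1 = X.reshape(I1, I2 * I3)   # matricization with the first mode as rows
rank_core = np.linalg.matrix_rank(G1_mat, tol=1e-8)
rank_tensor = np.linalg.matrix_rank(X_1, tol=1e-8)
```

Because `X_1` equals `G1_mat @ sub_mat.T`, its rank cannot exceed that of `G1_mat`. A low-rank core unfolding of size 6 x 9 therefore forces low rank on the 6 x 20 tensor unfolding, and the SVD only needs to be computed on the smaller matrix.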
(9) 
The TR latent lowrank factor (TRLLRF) model is outlined below:
(10)  
The two models have two distinctive advantages. Firstly, the low-rank assumption is placed on the tensor factors instead of on the original tensor, which greatly reduces the computational complexity of the SVD operations. Secondly, low-rankness on the tensor factors enhances the robustness to rank selection.
3.1 Solving scheme
3.1.1 TROLRF
To solve models (9) and (10), we apply the alternating direction method of multipliers (ADMM), which is efficient and widely used. Because the variables of the TROLRF model are interdependent, we adopt alternative variables, and the augmented Lagrangian function of the TROLRF model is:
(11)  
where are the alternative variables of , are Lagrangian multipliers, denotes the inner product, and is a penalty parameter.
To update and , , the augmented Lagrangian function are formulated by:
(12) 
(13) 
For , the th iteration update scheme of alternating direction method of multipliers (ADMM) of TROLRF model is listed below:
(14) 
where , is the reverse operator of that transforms mode matricization of a tensor to the original tensor, is the singular value thresholding (SVT) operator, i.e., if is the singular value decomposition of matrix , then , and is the set of indices of missing entries.
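The singular value thresholding operator used in the update scheme can be sketched as follows. The code assumes the usual definition of SVT, in which each singular value is reduced by the threshold and negative results are set to zero; the function name `svt` and the parameter name `beta` are hypothetical.

```python
import numpy as np


def svt(A, beta):
    """Singular value thresholding: shrink each singular value of A by beta, floor at 0."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scaling the columns of U by the shrunken singular values, then recombining.
    return (U * np.maximum(s - beta, 0.0)) @ Vt


A = np.diag([3.0, 1.0])
shrunk = svt(A, 2.0)
```

With a threshold of 2, the singular values 3 and 1 become 1 and 0, so `shrunk` has rank one. In the proposed models the operator is applied to matricizations of the core tensors, which are much smaller than matricizations of the data tensor.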
3.1.2 TRLLRF
Similarly, the augmented Lagrangian function of TRLLRF model can be written as:
(15)  
To update and , the augmented Lagrangian function is formulated by:
(16) 
(17) 
The corresponding update scheme of TRLLRF model is listed below:
(18) 
Both models are solved by iterating the above ADMM update schemes. The implementation process and hyperparameter selection of the two algorithms are summarized in Alg. 1 and Alg. 2.
Alg. 1 TR overlapped lowrank factors (TROLRF)  Alg. 2 TR latent lowrank factors (TRLLRF) 
1: Input: , initial TRrank .  1: Input: , initial TRrank ,. 
2: Initialization: , , , , , , element of s.t. , , , , .  2: Initialization: , , , , , , element of s.t. , , , , . 
3: While the stopping condition is not satisfied do  3: While the stopping condition is not satisfied do 
4: k=k+1;  4: k=k+1; 
5: Update variables by equation (14).  5: Update variables by equation (18). 
6: If , break  6: If , break 
7: End while  7: End while 
8: Output: and , .  8: Output: and , . 
3.2 Computational complexity
Algorithm  Computational Complexity 
TROLRF  
TRLLRF  
TR-ALS  
TT-SiLRTC  
SiLRTC  
BCPF 
We next compare the computational complexity of our TROLRF and TRLLRF with that of the state-of-the-art algorithms TR-ALS [25], SiLRTC-TT [3], SiLRTC [19] and FBCP [28] (listed as TT-SiLRTC and BCPF, respectively, in Tab. 1), which are closely related to our algorithms. The complexities are summarized in Tab. 1, where we denote the dimension of tensor by , , and all the TT-ranks, TR-ranks and CP-ranks are set to $R$. From Tab. 1 we can see that, compared to the Schatten norm based algorithms, the computational complexity of our algorithms is linear in the tensor dimension. Compared to TR-ALS and BCPF, the complexity of our algorithms is independent of the number of observed entries. The computational complexity of our algorithms increases quickly as $R$ increases; however, due to the linear scalability of TR decomposition, $R$ is often small in the model selection of the proposed algorithms. In addition, most of the stated algorithms are rank adaptive, i.e., robust to rank selection.
4 Experimental results
4.1 Synthetic data
To verify the performance of our two proposed algorithms, we test two tensors of size and . The tensors were generated by TR factors of TR-ranks and respectively. The values of the TR factors were drawn from a normal distribution . We define SSR as the sum of square root of TR-rank (i.e. ) to be the index of model complexity. Entries of the tensors were removed at random. We verified the performance of the proposed two algorithms in several scenarios, taking the mean RSE values over 10 independent runs as the final results. All the hyperparameters of the two algorithms were set according to Alg. 1 and Alg. 2.
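The evaluation protocol can be sketched as below. The text does not define RSE, so the sketch assumes the common definition as the Frobenius norm of the error divided by the Frobenius norm of the ground truth. The helper names, the tensor size and the use of a standard normal distribution are likewise illustrative assumptions.

```python
import numpy as np


def rse(X_hat, T):
    """Relative square error, assumed here to be ||X_hat - T||_F / ||T||_F."""
    return np.linalg.norm(X_hat - T) / np.linalg.norm(T)


def random_mask(shape, missing_rate, rng):
    """Boolean mask: True for observed entries, False for randomly removed ones."""
    return rng.random(shape) >= missing_rate


rng = np.random.default_rng(0)
T = rng.standard_normal((20, 20, 20, 20))   # stand-in for a synthetic tensor
mask = random_mask(T.shape, 0.9, rng)       # missing rate 0.9
observed_only = np.where(mask, T, 0.0)      # zero-filled baseline, no completion
baseline = rse(observed_only, T)
```

With 90% of the entries removed, the zero-filled baseline has an RSE close to the square root of 0.9. A completion algorithm is useful to the extent that it reduces the RSE below this value.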
For the first experiment, we tested the completion performance of our two algorithms and four other state-of-the-art algorithms under different missing rates, from 0.1 to 0.99. For our algorithms, we set the TR-rank to be the same as the real rank of the synthetic data, and the other hyperparameters were set to their defaults. For the compared algorithms, we tuned the hyperparameters individually to obtain the best results of each algorithm. Fig. 2 shows the experimental results for the order-four tensor and the order-six tensor respectively.
For the second experiment, we tested the completion performance of our two algorithms under various SSR values; the missing rate was set to , and the same two tensors were used. The results in the first picture of Fig. 3 show that our two algorithms obtained the lowest RSE values when the SSR was near the real SSR, and that the RSE value remained stable as the SSR value increased. This indicates that our algorithms are robust to rank selection.
For the third and fourth experiments, we tested the performance of our algorithms over different values of , with the missing rate set to and the TR-rank chosen as the real rank of the two tensors. Fig. 3 shows the robustness for the three different values of and verifies that our two algorithms are robust to the selection of .
4.2 Hyperspectral image
A hyperspectral image of size was considered next. This is an image of an urban landscape collected by a satellite. We compare our TROLRF and TRLLRF with TR-ALS, TT-SiLRTC, SiLRTC and BCPF. We examined order-three, order-five, order-seven and order-eight tensors respectively. The missing rate is set to 0.9 and the hyperparameters are set to their defaults. For each tensor, we set all the TR-ranks to the same value, i.e., . The tensor size and TR-ranks are recorded in the first column of Tab. 2, and the RSE values of each algorithm on each tensor are listed in Tab. 2.
From the results we can see that our algorithms significantly outperform TT-SiLRTC, SiLRTC and BCPF. Though the results of TR-ALS are comparable to those of our algorithms, it should be noted that the computational time of TR-ALS is more than double that of TROLRF and TRLLRF (1891 seconds vs. 756 seconds and 988 seconds) for similar results.
TROLRF  TRLLRF  TR-ALS  TT-SiLRTC  SiLRTC  BCPF  
0.0710  0.0677  0.0681  0.4572  0.3835  0.3750  
0.1062  0.1072  0.1122  0.4895  0.4307  0.3742  
0.1436  0.1483  0.1497  0.5051  0.4408  0.3680  
0.1520  0.1524  0.1430  0.4957  0.4526  0.3981 
5 Conclusion
To address the large-scale SVD calculation and the rank selection problem that most tensor completion methods face, we proposed two algorithms which impose the low-rank assumption on tensor factors. Based on tensor ring decomposition, we formulated two optimization models, named TROLRF and TRLLRF, which can be solved efficiently by the ADMM algorithm. We tested the algorithms in various situations on synthetic data and real-world data. The experimental results demonstrate the high performance and high efficiency of our algorithms. In addition, the results show that the proposed algorithms are robust to tensor rank and other model parameters. The proposed method is instructive for all model-based low-rank tensor completion and decomposition, and it can be applied to various tensor decompositions to create more efficient and robust algorithms.
References
 [1] Evrim Acar, Daniel M Dunlavy, Tamara G Kolda, and Morten Mørup. Scalable tensor factorizations for incomplete data. Chemometrics and Intelligent Laboratory Systems, 106(1):41–56, 2011.
 [2] Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. The Journal of Machine Learning Research, 15(1):2773–2832, 2014.
 [3] Johann A Bengua, Ho N Phien, Hoang Duong Tuan, and Minh N Do. Efficient tensor completion for color image and video recovery: Low-rank tensor train. IEEE Transactions on Image Processing, 26(5):2466–2479, 2017.
 [4] Hao Cheng, Yaoliang Yu, Xinhua Zhang, Eric Xing, and Dale Schuurmans. Scalable and sound low-rank tensor learning. In Artificial Intelligence and Statistics, pages 1114–1123, 2016.
 [5] Andrzej Cichocki, Namgil Lee, Ivan Oseledets, Anh-Huy Phan, Qibin Zhao, and Danilo P Mandic. Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions. Foundations and Trends® in Machine Learning, 9(4-5):249–429, 2016.
 [6] Andrzej Cichocki, Anh-Huy Phan, Qibin Zhao, Namgil Lee, Ivan Oseledets, Masashi Sugiyama, and Danilo P Mandic. Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives. Foundations and Trends® in Machine Learning, 9(6):431–673, 2017.
 [7] Fengyu Cong, Qiu-Hua Lin, Li-Dan Kuang, Xiao-Feng Gong, Piia Astikainen, and Tapani Ristaniemi. Tensor decomposition of EEG signals: a brief review. Journal of Neuroscience Methods, 248:59–69, 2015.
 [8] Beyza Ermiş, Evrim Acar, and A Taylan Cemgil. Link prediction in heterogeneous data via generalized coupled tensor factorization. Data Mining and Knowledge Discovery, 29(1):203–236, 2015.
 [9] Marko Filipović and Ante Jukić. Tucker factorization with missing data with application to low-n-rank tensor completion. Multidimensional systems and signal processing, 26(3):677–692, 2015.
 [10] Silvia Gandy, Benjamin Recht, and Isao Yamada. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Problems, 27(2):025010, 2011.
 [11] Lars Grasedyck, Melanie Kluge, and Sebastian Kramer. Variants of alternating least squares tensor completion in the tensor train format. SIAM Journal on Scientific Computing, 37(5):A2424–A2450, 2015.
 [12] Xiawei Guo, Quanming Yao, and James Tin-Yau Kwok. Efficient sparse low-rank tensor completion using the Frank-Wolfe algorithm. In AAAI, pages 1948–1954, 2017.
 [13] RA Harshman. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-mode factor analysis. UCLA Working Papers in Phonetics, 16:1–84, 1970.
 [14] Christopher J Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. Journal of the ACM (JACM), 60(6):45, 2013.
 [15] Masaaki Imaizumi, Takanori Maehara, and Kohei Hayashi. On tensor train rank minimization: Statistical efficiency and scalable algorithm. In Advances in Neural Information Processing Systems, pages 3933–3942, 2017.
 [16] Heishiro Kanagawa, Taiji Suzuki, Hayato Kobayashi, Nobuyuki Shimizu, and Yukihiro Tagami. Gaussian process nonparametric tensor estimator and its minimax optimality. In International Conference on Machine Learning, pages 1632–1641, 2016.
 [17] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the fourth ACM conference on Recommender systems, pages 79–86. ACM, 2010.
 [18] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009.
 [19] Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. Tensor completion for estimating missing values in visual data. IEEE transactions on pattern analysis and machine intelligence, 35(1):208–220, 2013.
 [20] Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, and Dmitry P Vetrov. Tensorizing neural networks. In Advances in Neural Information Processing Systems, pages 442–450, 2015.
 [21] Ivan V Oseledets. Tensor-train decomposition. SIAM Journal on Scientific Computing, 33(5):2295–2317, 2011.
 [22] Marco Signoretto, Quoc Tran Dinh, Lieven De Lathauwer, and Johan AK Suykens. Learning with tensors: a framework based on convex optimization and spectral regularization. Machine Learning, 94(3):303–351, 2014.
 [23] Ryota Tomioka and Taiji Suzuki. Convex tensor decomposition via structured Schatten norm regularization. In Advances in neural information processing systems, pages 1331–1339, 2013.
 [24] Ledyard R Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966.
 [25] Wenqi Wang, Vaneet Aggarwal, and Shuchin Aeron. Efficient low rank tensor ring completion, 2017.
 [26] Longhao Yuan, Qibin Zhao, and Jianting Cao. Completion of high order tensor data with missing entries via tensor-train decomposition. In International Conference on Neural Information Processing, pages 222–229. Springer, 2017.
 [27] Qibin Zhao, Masashi Sugiyama, Longhao Yuan, and Andrzej Cichocki. Learning efficient tensor representations with ring structure networks, 2018.
 [28] Qibin Zhao, Liqing Zhang, and Andrzej Cichocki. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE transactions on pattern analysis and machine intelligence, 37(9):1751–1763, 2015.
 [29] Qibin Zhao, Guoxu Zhou, Shengli Xie, Liqing Zhang, and Andrzej Cichocki. Tensor ring decomposition. arXiv preprint arXiv:1606.05535, 2016.
 [30] Guoxu Zhou, Qibin Zhao, Yu Zhang, Tülay Adalı, Shengli Xie, and Andrzej Cichocki. Linked component analysis from matrices to high-order tensors: Applications to biomedical data. Proceedings of the IEEE, 104(2):310–331, 2016.