RESPCA: A Scalable Approach to Recovering Low-Rank Matrices
Abstract
Robust principal component analysis (RPCA) has drawn significant attention due to its powerful capability in recovering low-rank matrices and its successful applications to various real-world problems. The current state-of-the-art algorithms usually need to compute singular value decompositions (SVDs) of large matrices, which generally has at least quadratic or even cubic complexity. This drawback has limited the application of RPCA to real-world problems. To combat this drawback, in this paper we propose a new type of RPCA method, RESPCA, which is linearly efficient and scalable in both data size and dimension. For comparison, AltProj, an existing scalable approach to RPCA, requires precise knowledge of the true rank; otherwise, it may fail to recover low-rank matrices. By contrast, our method works with or without knowing the true rank; even when both methods work, our method is faster. Extensive experiments testify to the effectiveness of the proposed method both quantitatively and in visual quality, which suggests that our method is suitable to be employed as a lightweight, scalable component for RPCA in any application pipeline.
1 Introduction
Principal component analysis (PCA) has been one of the most widely used techniques for unsupervised learning in various applications. The classic PCA seeks a low-rank approximation of a given data matrix. Mathematically, it uses the ℓ₂ norm to fit the reconstruction error, which is known to be sensitive to noise and outliers. The harder problem of seeking a PCA effective for outlier-corrupted data is called robust PCA (RPCA). There has been no mathematically precise meaning for the term “outlier” [24]. Thus multiple methods have been attempted to define or quantify this term, such as alternating minimization [14], random sampling techniques [17, 9], multivariate trimming [11], and so on [27, 7].
Among these methods, a recently emerged one treats an outlier as an additive sparse corruption [25], which leads to decomposing the data into a low-rank part and a sparse part. Given a data matrix X ∈ ℝ^{m×n}, based on such a decomposition assumption, the corresponding RPCA method aims to solve the following problem [25, 6]:
(1)  min_{L,S} rank(L) + λ‖S‖₀,  s.t.  X = L + S,
where λ > 0 is a balancing parameter, and ‖·‖₀ is the ℓ₀ (pseudo) norm that counts the number of nonzero elements of a matrix. It is generally NP-hard to solve rank- and ℓ₀-norm-based optimization problems. Hence, in practice (1) is often relaxed to the following convex problem [6]:
(2)  min_{L,S} ‖L‖_* + λ‖S‖₁,  s.t.  X = L + S,
where ‖L‖_* is the nuclear norm, which sums the singular values of the input matrix, and ‖S‖₁ is the ℓ₁ norm of a matrix. A number of algorithms have been developed to solve (2), such as singular value thresholding (SVT) [5], accelerated proximal gradient (APG) [23], and inexact augmented Lagrange multipliers (IALM) [16]. These algorithms, however, need to compute SVDs of m×n matrices at each iteration, which generally has at least quadratic or even cubic complexity [12]. Thus, due to the use of SVDs, the high complexity of these algorithms renders them less applicable to large-scale data. To improve efficiency, an augmented Lagrange multipliers (ALM)-based algorithm adopts the PROPACK package [10] to compute partial, instead of full, SVDs. Even with partial SVDs, it is still computationally costly when m and n are both large.
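To make the SVD bottleneck concrete, the following is a minimal numpy sketch of an IALM-style scheme for (2). The variable names and parameter defaults (λ = 1/√max(m, n), the μ growth schedule) are our own illustrative choices, not the reference implementation; the point is that every iteration computes a full SVD.

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: proximal operator of tau * nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # full SVD each call
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    # Elementwise soft-thresholding: proximal operator of tau * l1 norm.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def ialm_rpca(X, lam=None, rho=1.2, tol=1e-7, max_iter=500):
    # Sketch of inexact ALM for min ||L||_* + lam*||S||_1, s.t. X = L + S.
    m, n = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))   # common default from the RPCA literature
    mu = 1.25 / np.linalg.norm(X, 2)     # initial penalty; illustrative choice
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                 # Lagrange multiplier
    for _ in range(max_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)       # SVD at every iteration
        S = shrink(X - L + Y / mu, lam / mu)
        R = X - L - S
        Y += mu * R
        mu *= rho
        if np.linalg.norm(R) / np.linalg.norm(X) < tol:
            break
    return L, S
```

On a well-conditioned low-rank-plus-sparse instance this recovers both components, but the per-iteration SVD dominates the cost for large m and n.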
The convex RPCA in (2) has two known limitations: 1) without the incoherence guarantee of the underlying matrix, or when the data is grossly corrupted, the results can deviate substantially from the truth [6]; 2) when the matrix has large singular values, its nuclear norm may lead to an estimate far from the true rank [13]. To combat these drawbacks, several approaches to better rank approximation have been proposed. For example, the rank of L is fixed and used as a hard constraint in [15], and a nonconvex rank approximation is adopted to more accurately approximate the rank function in [13]. However, these nonconvex approaches also need to compute full SVDs. The two methods in [15, 19] need only partial SVDs, which significantly reduces the complexity compared to full SVDs; for example, AltProj has a complexity of O(r²mn) [19], with r being the ground truth rank of L. However, if r is not known a priori, [19] usually fails to recover L.
As large-scale data become increasingly ubiquitous, it is crucial to handle them with more efficient and scalable RPCA methods, which, nonetheless, are still largely missing. To address this need, in this paper we propose a new RPCA method, called RESPCA. This model does not depend on rank approximation to recover the low-rank component; rather, it effectively exploits the underlying group structure of the low-rank component for the recovery. Consequently, the new method does not need to compute any SVDs as current state-of-the-art methods typically do, which avoids any quadratic or higher complexity; more specifically, the proposed method has a linear complexity in both m and n, rendering it lightweight, scalable, and thus suitable for large-scale data applications. We summarize the contributions of this paper as follows:

We propose a new type of RPCA model exploiting the underlying group structures of the low-rank component.

We develop an ALM-based algorithm for optimization, which uses no matrix decomposition and has linearly efficient computation at each iteration. The new method is scalable in data dimension and sample size, and thus suitable for large-scale data.

Extensive experiments have demonstrated the effectiveness of the proposed method quantitatively and qualitatively.
The rest of this paper is organized as follows. We first briefly review some related work. Then we introduce the new method and its optimization. Next, we conduct experiments to evaluate the new method. Finally, we conclude the paper.
2 Related Work
The convex RPCA in (2) considers the sparsity of the sparse component in an elementwise manner [5]. To exploit example-wise sparsity, the ℓ_{2,1} norm has been adopted to replace the ℓ₁ norm in (2) [26, 18]:
(3)  min_{L,S} ‖L‖_* + λ‖S‖_{2,1},  s.t.  X = L + S,
where ‖S‖_{2,1} is the sum of the ℓ₂ norms of the columns of S. The difference between (3) and (2) is that the former incorporates spatial connections of the sparse component.
It has been pointed out that the nuclear norm may be far from accurate in approximating the rank function [22]. To alleviate this deficiency, some new rank approximations have been used to replace the nuclear norm in (3) and (2), such as the γ norm [13]. The γ-norm-based RPCA solves the following optimization problem:
(4)  min_{L,S} ‖L‖_γ + λ‖S‖₁,  s.t.  X = L + S,
where ‖L‖_γ = Σ_i (1 + γ)σ_i(L)/(γ + σ_i(L)) with γ > 0, and σ_i(L) is the i-th largest singular value of L. With different values of γ, the γ norm approximates the rank function with different accuracy.
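To make the behavior of this surrogate concrete, here is a small numpy sketch; the formula is our reading of the γ norm in [13], and the function name is ours. For small γ each nonzero singular value contributes ≈ 1, so the value approaches the rank; for large γ it approaches the nuclear norm.

```python
import numpy as np

def gamma_norm(M, gamma):
    # Nonconvex rank surrogate (assumed form from [13]):
    #   sum_i (1 + gamma) * sigma_i / (gamma + sigma_i)
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum((1.0 + gamma) * s / (gamma + s)))
```

For a matrix with singular values (3, 1, 0), `gamma_norm` with a tiny γ is close to the rank 2, while with a huge γ it is close to the nuclear norm 4.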
Another recent nonconvex approach to RPCA, AltProj, combines the simplicity of PCA and the elegant theory of convex RPCA [19]. It alternately projects the fitting residuals onto the low-rank and sparse sets. Given that the desired rank of L is r, AltProj proceeds in r stages, computing a rank-k projection at stage k, for k = 1, …, r. During this process, matrix elements with large fitting errors are discarded so that sparse errors are suppressed. This method enjoys several nice properties; however, it needs precise knowledge of the ground truth rank of L, which is not always available. Without such knowledge, AltProj may fail to recover the low-rank component.
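A schematic numpy sketch of this alternating-projection idea follows; the staged structure mirrors [19], but the hard-threshold schedule below is a simplification of our own, not the schedule from the paper.

```python
import numpy as np

def hard_threshold(M, zeta):
    # Sparse projection: keep only entries with magnitude above zeta.
    return M * (np.abs(M) > zeta)

def altproj_sketch(X, r, n_iter=20):
    # Illustrative alternating projections onto rank-k and sparse sets.
    # The threshold zeta below is a simplified heuristic, not from [19].
    m, n = X.shape
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for k in range(1, r + 1):                 # stage k targets rank k
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
            L = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k projection
            zeta = s[k] / np.sqrt(n) if k < min(m, n) else 0.0
            S = hard_threshold(X - L, zeta)   # discard large residuals
    return L, S
```

On clean, exactly rank-r input the sketch returns L = X and S = 0; on corrupted data its behavior depends on the threshold schedule, which is where the real algorithm's analysis lies.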
3 New Robust PCA Method
The classic RPCA and its variants usually require solving SVDs, which have a high complexity. To overcome this drawback, in this paper we consider a new type of RPCA model that has a linear complexity. Motivated by the convex RPCA approach, we assume that the data can be decomposed as X = L + S. Here, L is the low-rank component of X and its columns are linearly dependent; hence, many columns of L share high similarities and are geometrically close in Euclidean space. In the case of a single rank-1 subspace, this assumption naturally leads to minimizing the sum of squared mutual distances, or equivalently the variance (scaled by n), of the column vectors of L:
(5)  min_{L,S} (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} ‖l_i − l_j‖₂² + λ‖S‖₁,  s.t.  X = L + S,
where λ > 0 is a balancing parameter, l_i is the i-th column of L, and ‖·‖₂ is the ℓ₂ norm of a vector. It is noted that, though not necessary, it is sufficient that the minimization of the first term in (5) leads to a low-rank structure for L. To see this, we reformulate it as n Σ_{i=1}^{n} ‖l_i − (1/n) Σ_{j=1}^{n} l_j‖₂², which is the sum of squared residuals (SSR) from each data point to the average of all data points. Thus, by minimizing it, all columns are pulled toward their average, and the average is the minimizer of the SSR, which ideally leads to a rank-1 solution for L. Under some mild conditions, we have the following theorem.
Theorem 3.1.
Given a matrix , with , and , , we have that is sufficient and necessary for
(6) 
s.t.
(7) 
where 1 is the all-ones vector of dimension n.
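The reformulation of the halved pairwise term as n times the SSR, and its trace form Tr(L(nI − 11ᵀ)Lᵀ), can both be checked numerically; a small sketch:

```python
import numpy as np

# Numerical check of the identity behind (5): half the double sum of
# squared pairwise column distances equals n times the sum of squared
# residuals (SSR) from each column to the column average, which in
# turn equals Tr(L (nI - 11^T) L^T).
rng = np.random.default_rng(0)
L = rng.standard_normal((5, 8))           # m = 5, n = 8
n = L.shape[1]

pairwise = 0.5 * sum(np.sum((L[:, i] - L[:, j]) ** 2)
                     for i in range(n) for j in range(n))

mean = L.mean(axis=1, keepdims=True)      # column average of L
ssr = n * np.sum((L - mean) ** 2)

ones = np.ones((n, 1))
trace_form = np.trace(L @ (n * np.eye(n) - ones @ ones.T) @ L.T)

assert np.isclose(pairwise, ssr)
assert np.isclose(pairwise, trace_form)
```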
It is noted that the double summation in the first term of (5) can be written as Tr(L(nI − 11ᵀ)Lᵀ), by minimizing which we obtain the desired low-rank structure. It is natural to generalize this idea. To this end, we consider the case of multiple rank-1 subspaces with the following model, which we refer to as Robust, linearly Efficient, Scalable PCA (RESPCA):
(8)  
where I is the n×n identity matrix, 1 is an n-dimensional column vector of 1’s, diag(·) is an operator that returns a diagonal matrix from an input vector, and each group indicator is a binary vector whose 1’s mark the column vectors belonging to the corresponding subspace. It is evident that by automatically learning these indicator vectors we obtain the structural information about the low-rank subspaces. It is noted that different norms can be used for S, such as the ℓ₁ and ℓ_{2,1} norms; in this paper, without loss of generality, we adopt the ℓ₁ norm to capture the sparse structure of S. In the next section, we develop an efficient algorithm to optimize (8).
Remark. In the case that the data have nonlinear relationships, i.e., columns from the same subspace are close on a manifold rather than in Euclidean space, a direct extension of our method can be made, which is presented in Section 4.2. Since the linear model conveys the key ideas and contributions of this paper, and the experiments have confirmed its effectiveness in several real-world applications, we focus on the linear model. Due to space limitations, we do not fully expand the nonlinear model and will consider it in future research and more applications.
4 Optimization
In this section, we present an efficient ALM-based algorithm to solve (8). First, we define the augmented Lagrange function of (8):
(9)  
Then we adopt the alternating descent approach to optimization, where at each step we optimize a subproblem with respect to one variable while keeping the others fixed. The detailed optimization strategy for each variable is described in the following.
4.1 L minimization
The subproblem for L is the following:
(10)  
Omitting the factor 1/2, it is seen that the first term above can be derived as
(11)  
where the operator returns the submatrix of L containing the columns that correspond to the nonzeros of the group indicator. Correspondingly, it is straightforward to see that the second term of (10) can be decomposed in a similar way:
(12)  
Hence, L can be obtained by individually solving the following subproblem for each group:
(13)  
The above subproblems are convex, and according to the first-order optimality condition we have
(14) 
where, for ease of presentation, we introduce shorthand for the involved quantities. Hence, (14) leads to the solution:
(15) 
It is seen that (15) requires a matrix inversion, which, unfortunately, has cubic time complexity in general. To avoid the matrix inversion, we rewrite this matrix to simplify (15):
(16) 
It is notable that, due to the special structure of (16), its inverse has a simple analytic expression by the Sherman-Morrison-Woodbury formula:
(17)  
Hence, it is apparent that (15) can be written as follows:
(18)  
which has a linear complexity in both m and n by exploiting matrix-vector multiplications. L is then assembled from the solutions of all the group subproblems.
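The inversion-free trick can be sketched directly. Systems with an "identity plus rank-one" matrix of the form (aI + b·11ᵀ)x = y admit a closed-form O(n) solution by the Sherman-Morrison formula; the coefficients a and b below are illustrative placeholders, not the paper's exact constants.

```python
import numpy as np

def smw_solve(a, b, y):
    # Solve (a*I + b*1 1^T) x = y in O(n) via Sherman-Morrison:
    #   (aI + b 11^T)^{-1} = I/a - (b / (a*(a + b*n))) * 11^T
    n = y.shape[0]
    return y / a - (b / (a * (a + b * n))) * np.sum(y)

# Check against the dense O(n^3) solve (a, b are illustrative).
rng = np.random.default_rng(1)
n = 200
a, b = 2.0, 0.3
y = rng.standard_normal(n)
A = a * np.eye(n) + b * np.ones((n, n))
assert np.allclose(A @ smw_solve(a, b, y), y)
```

The same formula applied column-wise is what reduces each group's update to matrix-vector products.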
4.2 Group indicator minimization
The subproblem associated with the group indicators is given as follows:
(19)  
It is seen that
(20)  
where the subscript denotes the corresponding element of the indicator vector. Hence, the subproblems can be converted to
(21)  
which is simply the standard K-means problem. This is surprising in that we only need to perform K-means on the columns of L, and then the optimal solution simply corresponds to the group indicator matrix:
(22) 
It should be noted that, in its current form, (21) is solved by K-means [20]. However, more general clustering methods are also applicable if we treat this step as a clustering rather than an optimization problem. For example, if we adopt nonlinear clustering algorithms, such as spectral clustering, the recovered L and S actually reflect nonlinear structures of the data, which can be treated as a direct nonlinear extension of our method to account for nonlinear relationships in the data.
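A minimal sketch of this grouping step follows (naive Lloyd iterations with first-K-columns initialization; `kmeans_groups` is our name, not from the paper). Any clustering routine could be substituted here, as discussed above.

```python
import numpy as np

def kmeans_groups(L, K, n_iter=50):
    # Minimal Lloyd's K-means on the columns of L; returns a K x n binary
    # group indicator matrix with exactly one 1 per column. Initialization
    # is the naive "first K columns" choice, for determinism in this sketch.
    m, n = L.shape
    centers = L[:, :K].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # d[k, j] = squared distance from column j to center k
        d = ((L[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
        labels = d.argmin(axis=0)
        for k in range(K):
            if np.any(labels == k):
                centers[:, k] = L[:, labels == k].mean(axis=1)
    G = np.zeros((K, n))
    G[labels, np.arange(n)] = 1.0
    return G
```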
4.3 S minimization
With the ℓ₁ norm adopted for S, the corresponding subproblem has the standard closed-form solution given by elementwise soft-thresholding.
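With the ℓ₁ norm on S, the S-subproblem in the ALM scheme reduces, as is standard, to elementwise soft-thresholding; a minimal sketch:

```python
import numpy as np

def soft_threshold(M, tau):
    # Closed-form minimizer of  tau*||S||_1 + 0.5*||S - M||_F^2,
    # applied elementwise: shrink each entry toward zero by tau.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)
```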
4.4 Updating the multiplier and penalty
For the updating of the Lagrange multiplier Λ and the penalty parameter μ, we follow the standard approach in the ALM framework:
(25)  Λ ← Λ + μ(X − L − S),  μ ← ρμ,
where ρ > 1 is a parameter that controls the increasing speed of μ.
Regarding the complexity of the above optimization procedure, each step requires O(mn) operations, and ALM typically converges in a finite number of iterations [4]; thus the overall complexity of our method is O(mn).
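Putting the pieces together, the overall procedure can be sketched end-to-end. This is our schematic reading of the method: the per-group closed form below is derived from a squared-distance grouping term plus the quadratic ALM penalty (via Sherman-Morrison, each group's columns are pulled toward the group mean), and all coefficients, names, and defaults are illustrative rather than the paper's exact constants.

```python
import numpy as np

def respca_sketch(X, K, lam, mu=1e-3, rho=1.1, n_iter=100):
    # Schematic ALM loop: (i) group columns of L by naive K-means,
    # (ii) pull each group's columns toward the group mean via a
    # closed form (no SVD anywhere), (iii) soft-threshold S,
    # (iv) update multiplier Y and penalty mu.
    m, n = X.shape
    L, S, Y = X.copy(), np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        B = X - S + Y / mu
        # (i) grouping: Lloyd's K-means on columns of L, first-K init
        centers = L[:, :K].copy()
        labels = np.zeros(n, dtype=int)
        for _ in range(20):
            d = ((L[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
            labels = d.argmin(axis=0)
            for k in range(K):
                if np.any(labels == k):
                    centers[:, k] = L[:, labels == k].mean(axis=1)
        # (ii) L step: per-group closed form (illustrative coefficients)
        for k in range(K):
            idx = np.where(labels == k)[0]
            nk = len(idx)
            if nk == 0:
                continue
            Bk = B[:, idx]
            colsum = Bk.sum(axis=1, keepdims=True)
            L[:, idx] = (mu * Bk + 2.0 * colsum) / (mu + 2.0 * nk)
        # (iii) S step: elementwise soft-thresholding
        T = X - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # (iv) dual and penalty updates
        Y += mu * (X - L - S)
        mu *= rho
    return L, S
```

Every step above costs O(mn) matrix-vector and elementwise work, which is the linear complexity claimed for the method.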
5 Experiments
In this section, we evaluate the proposed method in comparison with several current state-of-the-art algorithms, including variational Bayesian RPCA (VBRPCA) [10], IALM for convex RPCA [6], AltProj [19], NSA [1], and PCP [28]. In particular, we follow [21, 13] and evaluate RESPCA in three applications: foreground-background separation from video sequences, shadow removal from face images, and anomaly detection from handwritten digits. All experiments are conducted on an Ubuntu system with a 12-core Intel(R) Xeon(R) W-2133 CPU @ 3.60GHz. All algorithms are terminated when a maximum of 500 iterations is reached or the stopping tolerance is satisfied.
5.1 Foreground-Background Separation
Foreground-background separation aims to detect moving objects or interesting activities in a scene and remove the background(s) from a video sequence. The background(s) and moving objects correspond to the low-rank and sparse parts, respectively. For this task, we use 9 data sets, whose characteristics are summarized in Table 1. Among these video data sets, the first 5 contain a single background while the remaining sequences have 2 backgrounds.
Data Set  Frame Size × # of Frames  # of Backgrounds
Escalator Airport  130×160, 3,417  1
Hall Airport  144×176, 3,584  1
Bootstrap  120×160, 2,055  1
Shopping Mall  256×320, 1,286  1
Highway  240×320, 1,700  1
Lobby  128×160, 1,546  2
Camera Parameter  240×320, 5,001  2
Light Switch-1  120×160, 2,800  2
Light Switch-2  120×160, 2,715  2
Data  Method  Rank(L)  Sparsity  Error  # of Iter.  Time
Bootstrap  AltProj  1  0.9397  4.22e-4  36  68.61
NSA  843  0.7944  5.87e-4  12  1343.22
VBRPCA  1  1.0000  9.90e-4  175  186.90
IALM  782  0.8003  6.11e-4  15  1356.04
PCP  1174  0.7859  3.45e-4  94  571.75
RESPCA  1  0.9379  7.81e-4  23  16.73
Escalator Airport  AltProj  1  0.8987  3.86e-4  33  69.34
NSA  1016  0.6390  8.09e-4  12  1793.35
VBRPCA  1  0.9839  9.76e-4  134  168.01
IALM  1065  0.6482  6.95e-4  15  1325.40
PCP  1232  0.6670  3.59e-4  93  727.65
RESPCA  1  0.8898  5.77e-4  23  20.47
Hall Airport  AltProj  1  0.9573  1.69e-5  37  93.62
NSA  948  0.7489  4.89e-4  13  2189.99
VBRPCA  1  1.0000  9.90e-4  152  240.17
IALM  974  0.6917  7.37e-4  14  2024.10
PCP  1292  0.7055  4.27e-4  77  744.28
RESPCA  1  0.9302  5.82e-4  23  26.38
Highway  AltProj  1  0.8846  4.63e-4  27  119.17
NSA  166  0.9732  0.87e-4  15  1238.95
VBRPCA  1  1.0000  9.87e-4  126  287.27
IALM  357  0.7980  6.25e-4  15  1409.10
PCP  531  0.8440  2.27e-4  152  1013.00
RESPCA  1  0.9340  7.20e-4  23  35.32
Shopping Mall  AltProj  1  0.8907  8.12e-4  30  85.92
NSA  174  0.9372  1.57e-4  14  1027.45
VBRPCA  1  1.0000  9.92e-4  157  295.00
IALM  151  0.8457  6.25e-4  14  498.65
PCP  290  0.8898  2.85e-4  165  790.30
RESPCA  1  0.9208  7.94e-4  23  28.44
Lobby  AltProj  2  0.8897  3.77e-4  26  21.58
NSA  161  0.8073  6.13e-4  13  182.50
VBRPCA  2  1.0000  9.92e-4  111  69.47
IALM  104  0.8229  5.66e-4  15  168.22
PCP  502  0.8500  2.59e-4  92  166.79
RESPCA  2  0.8963  1.83e-4  25  20.11
Camera Parameter  AltProj  ——  ——  ——  ——  ——
NSA  ——  ——  ——  ——  ——
VBRPCA  1  1.0000  9.95e-4  171  1108.20
IALM  1123  0.7020  7.81e-4  16  9297.40
PCP  ——  ——  ——  ——  ——
RESPCA  2  0.8305  2.48e-4  25  303.57
Light Switch-1  AltProj  2  0.9084  4.21e-4  48  73.54
NSA  541  0.6559  5.87e-4  13  687.19
VBRPCA  1  1.0000  9.83e-4  165  151.05
IALM  415  0.6298  9.21e-4  14  496.92
PCP  848  0.6776  5.91e-4  85  410.39
RESPCA  2  0.9708  4.15e-4  23  31.68
Light Switch-2  AltProj  2  0.8078  9.01e-4  37  44.34
NSA  486  0.8041  4.90e-4  14  846.81
VBRPCA  1  1.0000  9.93e-4  150  141.21
IALM  333  0.7815  7.79e-4  15  616.28
PCP  985  0.8337  2.68e-4  154  756.34
RESPCA  2  0.8608  2.82e-4  25  33.71
We set the rank to be the minimal number of singular values that contribute more than 99.5% of the total information, to avoid the noise effect of small singular values. “——” indicates an “out of memory” failure.
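For reference, one plausible reading of this rank rule can be sketched in numpy, treating "information" as the cumulative sum of singular values (the function name and this interpretation are ours):

```python
import numpy as np

def effective_rank(M, energy=0.995):
    # Minimal number of leading singular values whose cumulative sum
    # reaches the given fraction of the total (here 99.5%).
    s = np.linalg.svd(M, compute_uv=False)
    c = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(c, energy) + 1)
```

A matrix with one dominant singular value thus reports rank 1 even if tiny trailing singular values are numerically nonzero.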
For the parameters, we set them as follows. For IALM, we use the theoretically optimal balancing parameter λ = 1/√max(m, n) [6]. The same balancing parameter is used for PCP and NSA, as suggested in the original papers. For fair comparison, we use the same λ for the proposed method. For AltProj, we specify the ground truth rank; for VBRPCA, we use the ground truth rank as its initial rank parameter; likewise, we set the number of subspaces to the ground truth rank for RESPCA. For all methods that rely on ALM optimization, we use the same settings of the penalty parameter and its growth rate. These settings remain the same throughout this paper unless specified otherwise.
We show the results in Table 2. It is observed that AltProj, VBRPCA, and RESPCA are able to recover the backgrounds from the videos with low rank, while IALM, NSA, and PCP recover with much higher ranks. However, VBRPCA may recover with ranks lower than the ground truth. For example, on the Light Switch-1, Light Switch-2, and Camera Parameter data sets, the ground truth rank of the background is 2, whereas VBRPCA recovers the low-rank parts with rank 1. This may be a potential problem, as will become clear in the visual illustrations. Although IALM, NSA, and PCP do not recover with the desired low ranks, their recovered sparse components are sparser than those of AltProj, VBRPCA, and RESPCA. Besides, we observe that the speed of the proposed method is superior to that of the other methods. From Table 2, the proposed method is about 3 times faster than AltProj, the second fastest, and more than 10 (even about 60 on some data sets) times faster than IALM. Although the proposed method does not obtain the smallest errors at convergence on some data, the levels of the errors are well comparable to those of the other methods.
It should be noted that for methods such as IALM, PCP, and NSA, though they do not recover with the desired low ranks, they may work well after tuning their balancing parameters. However, parameter tuning for unsupervised learning methods is usually time-consuming. The proposed method has one balancing parameter, for which we have empirically verified that the theoretical value provided in [6] works well. A possible explanation is that RESPCA has a close connection with the convex RPCA and thus enjoys the same optimal parameter. More theoretical validation is to be explored in future work.
Moreover, to visually compare the algorithms and illustrate the effectiveness of the proposed method, we show some decomposition results in Figs. 2 and 1. Since IALM, NSA, and PCP cannot recover with the desired low ranks, they cannot recover the backgrounds well. For example, we can observe shadows of cars on the highway in Fig. 1. VBRPCA recovers with ranks lower than the ground truth on some data sets; consequently, on data such as Light Switch-2 in Fig. 2, we can see that VBRPCA does not work well on data with multiple backgrounds. AltProj and RESPCA separate the backgrounds and foregrounds well.
To further assess the performance of the proposed method, we conduct the following experiments to compare the two methods with the top performance: AltProj and RESPCA. In this test we assume that the ground truth rank of L is unknown, and we set it to 5 for AltProj and for the proposed method. Some results are given in Figs. 4 and 3. It is seen that RESPCA can still separate the background and foreground well while AltProj fails. The success of RESPCA in this scenario can be explained as follows: with the assumed rank greater than the ground truth rank of L, a large group of backgrounds is usually divided into smaller groups such that the backgrounds within each group still share the same structure; as a consequence, RESPCA can still recover the low-rank matrices correctly. This observation reveals that RESPCA has superior performance to AltProj when precise knowledge of the ground truth rank is unavailable a priori.
5.2 Shadow removal from face images
Face recognition is an important topic; however, it is often plagued by heavy noise and shadows on face images [2]. Therefore, there is a need to handle shadows. In this test, low-rank methods are used because the (unknown) clean images reside in a low-rank subspace, corresponding to L, while the shadows correspond to S. We use the Extended Yale B (EYaleB) data set for the comparative study. EYaleB contains face images from 38 persons, among which we select the images of the first 2 persons, namely, subject 1 and subject 2. For each person there are 64 images of 192×168 pixels. Following the common approach in [6, 13], we construct a data matrix for each person by vectorizing the images and run the RPCA algorithms on the matrix. We show some results in Fig. 5 for visual inspection. It is observed that all methods can successfully remove shadows for subject 2, but some fail on subject 1. The proposed method removes shadows from the face images of both subject 1 and subject 2, which confirms its effectiveness.
5.3 Anomaly Detection
Given a number of images from a subject, they form a low-dimensional subspace. Images with stark differences from the majority can be regarded as outliers; besides, a few images from another subject are also treated as outliers. Anomaly detection is to identify such outliers from the dominant images. It is modeled that L comprises the dominant images while S captures the outliers. For this test, we use the USPS data set, which consists of 9,298 handwritten digits of size 16×16. We follow [13] and vectorize the first 190 images of ‘1’s and the last 10 of ‘7’s to construct a data matrix. Since the data set contains far more ‘1’s than ‘7’s, we regard the former as the dominant digit and the latter as outliers. For visual illustration, we show examples of these digit images in Fig. 6.
It is observed that all the ‘7’s are outliers. Besides, some ‘1’s are quite different from the majority, such as the one with an underline. We apply RESPCA to this data set and obtain the separated L and S. In S, the columns corresponding to outliers have relatively larger values. Following [13], we measure the columns of S by their norms and show the values in Fig. 7, where values smaller than 5 are zeroed out for clearer visualization. We then show the corresponding digits, which are the detected outliers, in Fig. 8. RESPCA has detected all the ‘7’s as well as some ‘1’s, such as the one with an underline. This verifies the effectiveness of RESPCA in anomaly detection.
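The post-processing just described can be sketched in a few lines; we assume the column scores are ℓ₂ norms, and the function name and threshold default are ours (5 matches the visualization cutoff above):

```python
import numpy as np

def detect_outliers(S, threshold=5.0):
    # Score each column of the recovered sparse component S by its l2
    # norm; columns whose score exceeds the threshold are flagged as
    # outliers (e.g., the '7's among the dominant '1's).
    scores = np.linalg.norm(S, axis=0)
    return np.where(scores > threshold)[0], scores
```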
5.4 Scalability
We have analyzed the scalability of the proposed method in previous sections. In this test, we empirically verify the claimed linearity in m and n using the data sets in Table 1. For each data set, we use different sampling ratios in sample size and data dimension, respectively, to collect subsets of different sizes. On each subset, we run RESPCA 10 times. From Table 2, all experiments terminate within about 23-25 iterations; hence, in this test we temporarily ignore the stopping tolerance and terminate each run after a reasonable number of iterations, set to 30. We then report the average time cost in Fig. 9. It is observed that the time cost of RESPCA increases linearly in both m and n, which confirms the scalability of the proposed method.
6 Conclusion
Existing RPCA methods typically need to solve SVDs of large matrices, which generally has at least quadratic or even cubic complexity. To combat this drawback, in this paper we propose a new type of RPCA method. The new method recovers the low-rank component by exploiting geometrical similarities of the data, without performing any SVD, which current state-of-the-art RPCA methods usually have to do. We develop an ALM-based optimization algorithm that is linearly efficient and scalable in both data dimension and sample size. Extensive experiments in different applications testify to the effectiveness of the proposed method, in which we observe superior performance in speed and visual quality over several current state-of-the-art methods. These observations suggest that the proposed method is suitable for large-scale data applications in real-world problems.
Acknowledgement
This work is supported by National Natural Science Foundation of China under grants 61806106, 61802215, 61806045, 61502261, 61572457, and 61379132. C. Chen and Q. Cheng are corresponding authors.
References
 [1] Necdet Serhat Aybat, Donald Goldfarb, and Garud Iyengar. Fast first-order methods for stable principal component pursuit. arXiv preprint arXiv:1105.2126, 2011.
 [2] Ronen Basri and David W Jacobs. Lambertian reflectance and linear subspaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(2):218–233, 2003.
 [3] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
 [4] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning, 3(1):1–122, 2011.
 [5] JianFeng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
 [6] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
 [7] Christophe Croux and Gentiane Haesbroeck. Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika, 87(3):603–618, 2000.
 [8] Ingrid Daubechies, Michel Defrise, and Christine De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on pure and applied mathematics, 57(11):1413–1457, 2004.
 [9] Fernando De La Torre and Michael J Black. A framework for robust subspace learning. International Journal of Computer Vision, 54(1-3):117–142, 2003.
 [10] Xinghao Ding, Lihan He, and Lawrence Carin. Bayesian robust principal component analysis. IEEE Transactions on Image Processing, 20(12):3419–3430, 2011.
 [11] Ramanathan Gnanadesikan and John R Kettenring. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, pages 81–124, 1972.
 [12] Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU Press, 2012.
 [13] Zhao Kang, Chong Peng, and Qiang Cheng. Robust pca via nonconvex rank approximation. In Data Mining (ICDM), 2015 IEEE International Conference on, pages 211–220. IEEE, 2015.
 [14] Qifa Ke and Takeo Kanade. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 739–746. IEEE, 2005.
 [15] Wee Kheng Leow, Yuan Cheng, Li Zhang, Terence Sim, and Lewis Foo. Background recovery by fixed-rank robust principal component analysis. In Computer Analysis of Images and Patterns, pages 54–61. Springer, 2013.
 [16] Zhouchen Lin, Minming Chen, and Yi Ma. The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055, 2010.
 [17] Ricardo A Maronna, R. Douglas Martin, and Victor J Yohai. Robust statistics. John Wiley & Sons Ltd Chichester, 2006.
 [18] Michael McCoy, Joel A Tropp, et al. Two proposals for robust pca using semidefinite programming. Electronic Journal of Statistics, 5:1123–1160, 2011.
 [19] Praneeth Netrapalli, UN Niranjan, Sujay Sanghavi, Animashree Anandkumar, and Prateek Jain. Nonconvex robust pca. In Advances in Neural Information Processing Systems, pages 1107–1115, 2014.
 [20] Chong Peng, Zhao Kang, Shuting Cai, and Qiang Cheng. Integrate and conquer: Double-sided two-dimensional k-means via integrating of projection and manifold construction. ACM Trans. Intell. Syst. Technol., 9(5):57:1–57:25, June 2018.
 [21] Chong Peng, Zhao Kang, and Qiang Cheng. A fast factorization-based approach to robust pca. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1137–1142. IEEE, 2016.
 [22] Chong Peng, Zhao Kang, Huiqing Li, and Qiang Cheng. Subspace clustering using log-determinant rank approximation. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 925–934. ACM, 2015.
 [23] KimChuan Toh and Sangwoon Yun. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization, 6(615640):15, 2010.
 [24] N. Vaswani, T. Bouwmans, S. Javed, and P. Narayanamurthy. Robust subspace learning: Robust pca, robust subspace tracking, and robust subspace recovery. IEEE Signal Processing Magazine, 35(4):32–55, July 2018.
 [25] John Wright, Arvind Ganesh, Shankar Rao, Yigang Peng, and Yi Ma. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Advances in Neural Information Processing Systems, pages 2080–2088, 2009.
 [26] Huan Xu, Constantine Caramanis, and Sujay Sanghavi. Robust pca via outlier pursuit. In Advances in Neural Information Processing Systems, pages 2496–2504, 2010.
 [27] Lei Xu and Alan L Yuille. Robust principal component analysis by selforganizing rules based on statistical physics approach. Neural Networks, IEEE Transactions on, 6(1):131–143, 1995.
 [28] Zihan Zhou, Xiaodong Li, John Wright, Emmanuel Candes, and Yi Ma. Stable principal component pursuit. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, pages 1518–1522. IEEE, 2010.