Smoothed Low Rank and Sparse Matrix Recovery by
Iteratively Reweighted Least Squares Minimization
Abstract
This work presents a general framework for solving low rank and/or sparse matrix minimization problems, which may involve multiple nonsmooth terms. The Iteratively Reweighted Least Squares (IRLS) method is a fast solver that smooths the objective function and minimizes it by alternately updating the variables and their weights. However, traditional IRLS can only solve a sparse-only or low-rank-only minimization problem with squared loss or an affine constraint. This work generalizes IRLS to solve joint/mixed low rank and sparse minimization problems, which are essential formulations for many tasks. As a concrete example, we solve the Schatten-$p$ norm and $\ell_{2,q}$-norm regularized Low-Rank Representation (LRR) problem by IRLS, and theoretically prove that the derived solution is a stationary point (globally optimal if $p, q \geq 1$). Our convergence proof of IRLS is more general than previous ones, which depend on special properties of the Schatten-$p$ norm and the $\ell_{2,q}$-norm. Extensive experiments on both synthetic and real data sets demonstrate that our IRLS is much more efficient than state-of-the-art ADM-type solvers.
I Introduction
In recent years, low rank and sparse matrix learning problems have been hot research topics and have led to broad applications in computer vision and machine learning, such as face recognition [1], collaborative filtering [2], background modeling [3], and subspace segmentation [4, 5]. The $\ell_1$-norm and the nuclear norm are popular choices for sparse and low rank matrix minimization, with theoretical guarantees and competitive performance in practice. The models can be formulated as a joint low rank and sparse matrix minimization problem as follows:
(1) $\min_{x} \sum_{i=1}^{T} F_i\left(\mathcal{A}_i(x) + b_i\right),$
where $x$ and $b_i$ can be either vectors or matrices, $F_i$ is a convex function (the squared Frobenius norm $\|M\|_F^2 = \sum_{ij} M_{ij}^2$; the nuclear norm $\|M\|_* = \sum_i \sigma_i(M)$, the sum of all singular values of a matrix; the $\ell_1$-norm $\|M\|_1 = \sum_{ij} |M_{ij}|$; and the $\ell_{2,1}$-norm $\|M\|_{2,1} = \sum_j \|M_j\|_2$, the sum of the $\ell_2$-norm of each column of a matrix) and $\mathcal{A}_i$ is a linear mapping. In this work, we further consider the nonconvex Schatten-$p$ norm, $\ell_p$-norm and $\ell_{2,p}$-norm with $0 < p < 1$ for pursuing lower rank or sparser solutions.
Problem (1) is general and covers a wide range of problems, such as Lasso [6], group Lasso [7], trace Lasso [4], matrix completion [8], Robust Principal Component Analysis (RPCA) [3] and Low-Rank Representation (LRR) [5]. In this work, we aim to propose a general solver for (1). For ease of discussion, we focus on the following two representative problems,
(2) RPCA: $\min_{Z,E} \|Z\|_* + \lambda\|E\|_1, \quad \text{s.t. } X = Z + E,$
(3) LRR: $\min_{Z,E} \|Z\|_* + \lambda\|E\|_{2,1}, \quad \text{s.t. } X = XZ + E,$
where $X \in \mathbb{R}^{d \times n}$ is a given data matrix, $Z$ and $E$ have compatible dimensions and $\lambda > 0$ is the model parameter. Notice that these problems can be reformulated as unconstrained problems of the form (1) by expressing $E$ in terms of $Z$.
I-A Related Works
The sparse and low rank minimization problems can be solved by various methods, such as Semi-Definite Programming (SDP) [9], Accelerated Proximal Gradient (APG) [10], and the Alternating Direction Method (ADM) [11]. However, SDP has a complexity of $O(n^6)$ for an $n \times n$ matrix, which is prohibitive for large scale applications. APG requires that at least one term of the objective function has a Lipschitz continuous gradient. Such an assumption is violated in many problems, e.g., problems (2) and (3). Compared with SDP and APG, ADM is the most widely used. But it usually requires introducing several auxiliary variables corresponding to the nonsmooth terms. The auxiliary variables may slow down the convergence, or even lead to divergence when there are too many variables. Linearized ADM (LADM) [12] may reduce the number of auxiliary variables, but suffers from the same convergence issue. The work [12] proposes an accelerated LADM with Adaptive Penalty (LADMAP) with lower per-iteration cost. However, the acceleration trick is specific to the LRR problem and thus does not generalize to other problems. Another drawback of many low rank minimization solvers is that they have to perform the soft singular value thresholding:
(4) $\min_{Z} \varepsilon\|Z\|_* + \frac{1}{2}\|Z - Y\|_F^2,$
as a subproblem. Solving (4) requires computing the partial SVD of $Y$. If the rank of the solution is not sufficiently low, computing the partial SVD of $Y$ is not faster than computing the full SVD of $Y$ [11].
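To make the cost of this subproblem concrete, the thresholding step can be sketched in NumPy as follows. This is only an illustration, assuming (4) has the standard form $\min_Z \varepsilon\|Z\|_* + \frac{1}{2}\|Z - Y\|_F^2$; the function name `svt` and the variables `Y` and `eps` are hypothetical, and a full SVD is used here rather than the partial SVD employed by the ADM-type solvers.

```python
import numpy as np

def svt(Y, eps):
    """Soft singular value thresholding.

    Returns argmin_Z eps * ||Z||_* + 0.5 * ||Z - Y||_F^2 by shrinking
    every singular value of Y by eps and clipping at zero.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - eps, 0.0)
    return (U * s_shrunk) @ Vt  # scales column k of U by s_shrunk[k]

# Illustrative call on random data (not from the paper).
rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 4))
Z = svt(Y, eps=1.0)
```

Every call requires an SVD of `Y`, which is exactly the per-iteration cost that the method proposed in this work avoids.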
In this work, we aim to solve the general problem (1) without introducing auxiliary variables and without computing any SVD. The key idea is to smooth the objective function by introducing regularization terms. We then propose the Iteratively Reweighted Least Squares (IRLS) method for solving the relaxed smooth problem by alternately updating a variable and its weight. Reweighting methods have been studied for the $\ell_p$ ($0 < p \leq 1$) minimization problem [13, 14, 15], and several variants have been proposed with much theoretical analysis [16, 17]. Usually, IRLS converges exponentially fast (linear convergence) [18], and numerical results indicate that it leads to a sparse solution with better recovery performance. The reweighting method has also been applied to low rank minimization recently [19, 20, 21]. However, the problems that can be solved by iteratively reweighted algorithms are still very limited. Previous works can only minimize a single $\ell_p$-norm or a single nuclear norm with squared loss or an affine constraint. Thus they cannot solve (1) when its objective function contains two or more nonsmooth terms, as in robust matrix completion [22] and RPCA [3]. Also, previous convergence proofs, which rely on special properties of the $\ell_p$-norm and the Schatten-$p$ norm, are not general and thus limit the application of IRLS. Many other nonconvex surrogate functions of the $\ell_0$-norm have been proposed, e.g., the logarithm function [15]. We will generalize IRLS to solve problem (1) with more general objective functions.
I-B Contributions
In summary, the contributions of this paper are as follows.

For solving problem (1) with a low rank plus sparse objective function, we first introduce regularization terms to smooth the objective function, and solve the relaxed problem by the Iteratively Reweighted Least Squares (IRLS) method. This is actually one of the future works mentioned in [21].

We take the Schatten-$p$ norm and $\ell_{2,q}$-norm regularized LRR problem as a concrete example to introduce the IRLS algorithm, and theoretically prove that the solution obtained by IRLS is a stationary point. It is globally optimal when $p, q \geq 1$. Based on our general proof, we further show some other problems which can also be solved by IRLS.

Numerical experiments demonstrate the effectiveness of the proposed IRLS algorithm by comparing it with the state-of-the-art ADM family algorithms. IRLS is much more efficient since it avoids SVD completely.
II Smoothed Low Rank Representation
In this section, to illustrate the smoothed low rank and sparse matrix recovery by Iteratively Reweighted Least Squares (IRLS), we take the LRR problem as a concrete example. The reason of choosing this model as an application is twofold. First, LRR is a low rank and (column) sparse minimization problem, so solving LRR is more difficult than solving RPCA by the ADM family algorithms. It is easy to extend IRLS for other low rank plus sparse matrix recovery problems based on this example. Second, LRR has become an important model with various applications in machine learning and computer vision. A fast solver is important for real applications.
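The LRR problem (3) can be reformulated as follows without the auxiliary variable $E$:
(5) $\min_{Z} \|Z\|_{S_p}^p + \lambda\|XZ - X\|_{2,q}^q,$
where $\|Z\|_{S_p} = \left(\sum_i \sigma_i^p(Z)\right)^{1/p}$ denotes the Schatten-$p$ norm of $Z$ and $\|E\|_{2,q} = \left(\sum_i \|E_i\|_2^q\right)^{1/q}$ denotes the $\ell_{2,q}$-norm of $E$. Our solver can handle the case $0 < p, q < 2$. Problem (3) is a special case of (5) when $p = q = 1$. The major challenge for solving (5) is that both terms of the objective function are nonsmooth. A simple way is to smooth both terms by introducing regularization terms (footnote 1: one may use two independent regularization parameters $\mu_1$ and $\mu_2$ for the Schatten-$p$ norm and the $\ell_{2,q}$-norm, respectively):
(6) $\min_{Z} J(Z,\mu) = \left\|\begin{bmatrix} Z \\ \mu I \end{bmatrix}\right\|_{S_p}^p + \lambda\left\|\begin{bmatrix} XZ - X \\ \mu\mathbf{1}^T \end{bmatrix}\right\|_{2,q}^q,$
where $\mu > 0$, $I \in \mathbb{R}^{n \times n}$ is the identity matrix and $\mathbf{1} \in \mathbb{R}^{n}$ is the all ones vector. The terms $\mu I$ and $\mu\mathbf{1}^T$ make the objective function smooth (see (10)). The above model is called Smoothed LRR in this work. Solving the Smoothed LRR problem instead of LRR brings several advantages.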
First, the objective function $J(Z,\mu)$ of (6) is smooth when $\mu > 0$. This is the major difference between LRR and Smoothed LRR. Usually, a smooth objective function makes the optimization problem easier to solve.
Second, if $p, q \geq 1$, $J(Z,\mu)$ is convex (Theorem 1). This guarantees a globally optimal solution to (6).
Theorem 1
If $p, q \geq 1$, $J(Z,\mu)$ is convex w.r.t. $Z$ and $\mu$. Also, for a given $\mu$, $J(Z,\mu)$ is convex w.r.t. $Z$.
The above theorem can be easily proved by using the convexity of the Schatten-$p$ norm and the $\ell_{2,q}$-norm when $p, q \geq 1$.
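Third, $J(Z,\mu) \geq J(Z,0) = \|Z\|_{S_p}^p + \lambda\|XZ - X\|_{2,q}^q$, where the equality holds if and only if $\mu = 0$. Indeed,
$\left\|\begin{bmatrix} Z \\ \mu I \end{bmatrix}\right\|_{S_p}^p = \sum_{i=1}^{n}\left(\lambda_i(Z^TZ) + \mu^2\right)^{p/2} \geq \sum_{i=1}^{n}\left(\lambda_i(Z^TZ)\right)^{p/2} = \|Z\|_{S_p}^p,$
where $\lambda_i(M)$ denotes the $i$-th (ordered) eigenvalue of a matrix $M$; the $\ell_{2,q}$ term is bounded in the same way. That is to say, $J(Z,0)$ is majorized by $J(Z,\mu)$ with a given $\mu$. Decreasing $J(Z,\mu)$ tends to decrease $J(Z,0)$.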
Furthermore, for any given , there exists , such that . Suppose and are the optimal solutions to (5) and (6), respectively. Then we have
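Algorithm 1: IRLS for solving the Smoothed LRR problem (6)
Input: Data matrix $X$, $\lambda > 0$, $\mu > 0$.
Initialize: $t = 0$, $M^0 = I$, and $N^0 = I$.
while not converged do

Update $Z^{t+1}$ by solving the following problem
(7) $Z^{t+1} = \arg\min_{Z} \frac{p}{2}\mathrm{tr}\left(Z M^t Z^T\right) + \frac{\lambda q}{2}\mathrm{tr}\left((XZ - X) N^t (XZ - X)^T\right).$
Update the weight matrices $M$ and $N$ separately by
(8) $M^{t+1} = \left((Z^{t+1})^T Z^{t+1} + \mu^2 I\right)^{p/2 - 1},$
(9) $N^{t+1}_{ii} = \left(\|(XZ^{t+1} - X)_i\|_2^2 + \mu^2\right)^{q/2 - 1}, \quad i = 1, \dots, n.$
$t = t + 1$.

If the stopping criterion is satisfied, break.
end while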
III IRLS Algorithm
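In this section, we show how to solve (6) by IRLS. By the fact that $\|Z\|_{S_p}^p = \mathrm{tr}\left((Z^TZ)^{p/2}\right)$, (6) can be reformulated as follows:
(10) $\min_{Z} J(Z,\mu) = \mathrm{tr}\left((Z^TZ + \mu^2 I)^{p/2}\right) + \lambda\sum_{i=1}^{n}\left(\|(XZ - X)_i\|_2^2 + \mu^2\right)^{q/2},$
where $(A)_i$ or $A_i$ denotes the $i$-th column of a matrix $A$. Let $L(Z) = \mathrm{tr}\left((Z^TZ + \mu^2 I)^{p/2}\right)$ and $S(Z) = \sum_{i=1}^{n}\left(\|(XZ - X)_i\|_2^2 + \mu^2\right)^{q/2}$. Then $J(Z,\mu) = L(Z) + \lambda S(Z)$.
The derivative of $L(Z)$ is
$\frac{\partial L}{\partial Z} = pZ\left(Z^TZ + \mu^2 I\right)^{p/2 - 1} \triangleq pZM,$
where $M = \left(Z^TZ + \mu^2 I\right)^{p/2 - 1}$ is the weight matrix corresponding to $L(Z)$. Note that $M$ can be computed without SVD [23].
For the derivative of $S(Z)$, consider the columnwise differentiation for each $i = 1, \dots, n$,
$\frac{\partial S}{\partial Z_i} = \frac{\partial\left(\|XZ_i - X_i\|_2^2 + \mu^2\right)^{q/2}}{\partial Z_i} = qX^T(XZ_i - X_i)\left(\|XZ_i - X_i\|_2^2 + \mu^2\right)^{q/2 - 1}.$
That is to say, $\frac{\partial S}{\partial Z} = qX^T(XZ - X)N$, where $N$ is the weight matrix corresponding to $S(Z)$. It is a diagonal matrix with the $i$-th diagonal entry being $N_{ii} = \left(\|(XZ - X)_i\|_2^2 + \mu^2\right)^{q/2 - 1}$.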
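The two derivative expressions can be checked numerically. The sketch below assumes the smoothed terms are $L(Z) = \mathrm{tr}((Z^TZ + \mu^2 I)^{p/2})$ and $S(Z) = \sum_i (\|(XZ - X)_i\|_2^2 + \mu^2)^{q/2}$, with gradients $pZM$ and $qX^T(XZ - X)N$; all function names are illustrative. The matrix power is computed with a symmetric eigendecomposition, which is one convenient choice and not necessarily the SVD-free procedure of [23].

```python
import numpy as np

def lowrank_weight(Z, mu, p):
    """M = (Z^T Z + mu^2 I)^(p/2 - 1) via a symmetric eigendecomposition."""
    n = Z.shape[1]
    evals, V = np.linalg.eigh(Z.T @ Z + mu**2 * np.eye(n))
    return (V * evals ** (p / 2.0 - 1.0)) @ V.T

def column_weight(R, mu, q):
    """Diagonal of N for a residual R: N_ii = (||R_i||_2^2 + mu^2)^(q/2 - 1)."""
    return (np.sum(R**2, axis=0) + mu**2) ** (q / 2.0 - 1.0)

def smoothed_lowrank_term(Z, mu, p):
    """L(Z) = tr((Z^T Z + mu^2 I)^(p/2))."""
    n = Z.shape[1]
    evals = np.linalg.eigvalsh(Z.T @ Z + mu**2 * np.eye(n))
    return np.sum(evals ** (p / 2.0))

def smoothed_column_term(X, Z, mu, q):
    """S(Z) = sum_i (||(XZ - X)_i||_2^2 + mu^2)^(q/2)."""
    R = X @ Z - X
    return np.sum((np.sum(R**2, axis=0) + mu**2) ** (q / 2.0))

# Illustrative random data (not from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5))
Z = rng.standard_normal((5, 5))
mu, p, q = 0.5, 1.0, 1.0

R = X @ Z - X
grad_L = p * Z @ lowrank_weight(Z, mu, p)            # p Z M
grad_S = q * X.T @ (R * column_weight(R, mu, q))     # q X^T (XZ - X) N
```

Comparing single entries of `grad_L` and `grad_S` against central finite differences of the two smoothed terms is a quick way to catch sign or exponent mistakes before running the full algorithm.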
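By setting the derivative of $J(Z,\mu)$ with respect to $Z$ to zero, we have
$pZM + \lambda qX^T(XZ - X)N = 0,$
or equivalently,
(11) $\lambda qX^TXZ + pZ\left(MN^{-1}\right) = \lambda qX^TX.$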
Eqn. (11) is the well-known Sylvester equation, which costs $O(n^3)$ for a general solver. If the coefficient matrices have certain structure, the cost can be lower [24]. We use the Matlab command lyap to solve (11) in this work.
Notice that both $M$ and $N$ depend only on $Z$. They can be computed if $Z$ is fixed. Conversely, if the weight matrices $M$ and $N$ are fixed, $Z$ can be obtained by solving (11). This fact motivates us to solve (10) by iteratively updating $Z$ and $\{M, N\}$. This optimization method is called Iteratively Reweighted Least Squares (IRLS), which is shown in Algorithm 1. IRLS treats separately the weight matrices $M$ and $N$, which correspond to the low rank and sparse terms, respectively.
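The alternation just described can be sketched in a few lines of NumPy. This is a minimal illustration rather than the released implementation: it assumes the update equation $\lambda q X^TXZ + pZ(MN^{-1}) = \lambda q X^TX$ with weights $M = (Z^TZ + \mu^2 I)^{p/2-1}$ and $N_{ii} = (\|(XZ - X)_i\|_2^2 + \mu^2)^{q/2-1}$, keeps $\mu$ fixed, and solves the Sylvester equation through its Kronecker form, which is only practical for small $n$. The names `irls_lrr`, `solve_sylvester_kron` and `smoothed_objective` are hypothetical.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A Z + Z B = C through the Kronecker-product form (small n only)."""
    n, m = C.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    z = np.linalg.solve(K, C.flatten(order="F"))  # column-major vec(C)
    return z.reshape((n, m), order="F")

def smoothed_objective(X, Z, lam, mu, p, q):
    """J(Z, mu) = tr((Z^T Z + mu^2 I)^(p/2)) + lam * sum_i (||(XZ-X)_i||^2 + mu^2)^(q/2)."""
    n = Z.shape[1]
    evals = np.linalg.eigvalsh(Z.T @ Z + mu**2 * np.eye(n))
    col_sq = np.sum((X @ Z - X) ** 2, axis=0)
    return np.sum(evals ** (p / 2.0)) + lam * np.sum((col_sq + mu**2) ** (q / 2.0))

def irls_lrr(X, lam, p=1.0, q=1.0, mu=1.0, n_iter=30):
    """IRLS sketch for the smoothed LRR objective with a fixed smoothing parameter mu."""
    n = X.shape[1]
    XtX = X.T @ X
    M = np.eye(n)         # weight matrix of the low rank term
    N_diag = np.ones(n)   # diagonal of the weight matrix of the column-sparse term
    history = []
    for _ in range(n_iter):
        # Z-update: lam*q*X^T X Z + p*Z*(M N^{-1}) = lam*q*X^T X
        A = lam * q * XtX
        B = p * M / N_diag            # divides column j of M by N_jj, i.e. M N^{-1}
        Z = solve_sylvester_kron(A, B, A)
        # Weight updates from the new Z
        evals, V = np.linalg.eigh(Z.T @ Z + mu**2 * np.eye(n))
        M = (V * evals ** (p / 2.0 - 1.0)) @ V.T
        N_diag = (np.sum((X @ Z - X) ** 2, axis=0) + mu**2) ** (q / 2.0 - 1.0)
        history.append(smoothed_objective(X, Z, lam, mu, p, q))
    return Z, history

# Illustrative run on random data (not from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 12))
Z, history = irls_lrr(X, lam=0.5)
```

For realistic problem sizes the Kronecker solve should be replaced by a dedicated Sylvester solver, for instance `scipy.linalg.solve_sylvester`, which solves $AX + XB = Q$. With $\mu$ fixed, the recorded objective values in `history` should not increase from one iteration to the next.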
It is easy to see that the per-iteration complexity of IRLS for the Smoothed LRR problem (6) is $O(n^3)$. This cost is the same as that of APG, ADM, LADM, and LADMAP. APG solves an approximate unconstrained version of LRR, so its solution is not optimal for (5) or (6) [12]. The traditional ADM is not guaranteed to converge for LRR with three variables. Both LADM and LADMAP lead to the optimal solution of LRR, but their convergence rates are sublinear, i.e., $O(1/K)$, where $K$ is the number of iterations. Usually, IRLS converges much faster than the ADM type methods and it avoids computing an SVD in each iteration. Though the convergence rate of IRLS is not established, our experiments show that it tends to converge linearly. The state-of-the-art method, accelerated LADMAP [12], costs only $O(rn^2)$ per iteration, where $r$ is the predicted rank of $Z$. It may be faster than our IRLS when the rank of $Z$ is sufficiently low. However, the rank of $Z$ depends on the choice of the parameter $\lambda$, which is usually tuned to achieve good performance in the application. As observed in the experiments shown later, IRLS outperforms the accelerated LADMAP on several real applications.
It is worth mentioning that though we present IRLS for LRR, it can also be used for many other problems, including the structured Lassos (e.g., group Lasso [7], overlapping/nonoverlapping group Lasso [25], and tree structured group Lasso [26]), robust matrix completion [22] and RPCA [3]. Though it is difficult to give a general IRLS algorithm for all these problems, the main idea is quite similar. The first step is to smooth the objective function as in (6). Table I shows the smoothed versions of some popular norms. Other related norms, e.g., overlapping group Lasso, can be smoothed in a similar way. Then we are able to compute the derivatives of the smooth functions. The derivatives can be rewritten as a simple function of the main variable by introducing an auxiliary variable, i.e., the weight matrix, as shown in Table I. This makes the update of the main variable much easier. Iteratively updating the main variable and the weight matrix leads to the IRLS algorithm, which is guaranteed to converge. More generally, one may use other concave functions, e.g., the logarithm function [15], to replace the norms in Table I. The induced problems can also be solved by IRLS.
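Table I: Smoothed versions of some popular norms ($\mu > 0$).
Norm | Definition | Smoothed | Derivative | Weight matrix
$\ell_1$-norm | $\|x\|_1 = \sum_i |x_i|$ | $\sum_i \sqrt{x_i^2 + \mu^2}$ | $Wx$ | $W$ is a diagonal matrix, with $W_{ii} = (x_i^2 + \mu^2)^{-1/2}$
Nuclear norm | $\|Z\|_* = \sum_i \sigma_i(Z)$ | $\mathrm{tr}\left((Z^TZ + \mu^2 I)^{1/2}\right)$ | $ZW$ | $W = (Z^TZ + \mu^2 I)^{-1/2}$
nonoverlapping group Lasso | $\sum_g \|x_{G_g}\|_2$, $G_g$ is the index of the $g$-th group | $\sum_g \sqrt{\|x_{G_g}\|_2^2 + \mu^2}$ | $Wx$ | $W$ is a diagonal matrix, with each $W_{ii}$, $i \in G_g$, as $(\|x_{G_g}\|_2^2 + \mu^2)^{-1/2}$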
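The smooth, differentiate, reweight recipe can be illustrated on the simplest case, a smoothed $\ell_1$-norm combined with a squared loss, which is the classical setting handled by earlier IRLS methods. The sketch assumes the smoothing $\sum_i \sqrt{x_i^2 + \mu^2}$ with weight $W_{ii} = (x_i^2 + \mu^2)^{-1/2}$; the objective, the function name `irls_smoothed_l1` and the data are illustrative and do not come from the paper.

```python
import numpy as np

def irls_smoothed_l1(A, b, lam, mu=1e-2, n_iter=50):
    """Minimize sum_i sqrt(x_i^2 + mu^2) + (lam / 2) * ||A x - b||_2^2 by IRLS."""
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    w = np.ones(n)  # diagonal of the weight matrix W
    history = []
    for _ in range(n_iter):
        # x-update with fixed weights: W x + lam * A^T (A x - b) = 0
        x = np.linalg.solve(np.diag(w) + lam * AtA, lam * Atb)
        # weight update: W_ii = (x_i^2 + mu^2)^(-1/2)
        w = 1.0 / np.sqrt(x**2 + mu**2)
        obj = np.sum(np.sqrt(x**2 + mu**2)) + 0.5 * lam * np.sum((A @ x - b) ** 2)
        history.append(obj)
    return x, history

# Illustrative run on random data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
x, history = irls_smoothed_l1(A, b, lam=10.0)
```

Each iteration solves a single linear system, and the smoothed objective recorded in `history` does not increase, which mirrors the monotonicity property established for the LRR case in the next section.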
IV Algorithmic Analysis
Previous iteratively reweighted algorithms minimize the sum of a nonsmooth term and a squared loss, while we minimize the sum of two (or more) nonsmooth terms. In this section, we provide a new convergence analysis of IRLS for nonsmooth optimization. Though based on Algorithm 1 for solving the LRR problem, our proofs are general. We first present some lemmas and then prove the convergence of IRLS.
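Our proofs are based on the key fact that $x^{p/2}$ is concave on $(0, \infty)$ when $0 < p < 2$. By the definition of a concave function, for any $x, y > 0$ we have
(12) $y^{p/2} - x^{p/2} \geq \frac{p}{2}\, y^{p/2 - 1}(y - x).$
The following proofs are also applicable to other concave functions, e.g., the logarithm function [15], which gives an approximation of the $\ell_0$-norm.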
Lemma 1
Assume each column of and is nonzero. Let , , be concave and differentiable functions. We have
(13) 
where is a diagonal matrix, with its th diagonal element being .
Lemma 2
is concave on (the set of symmetric positive definite matrices) when .
Assume that is concave and differentiable on . For any , we have
(15) 
By letting with in (15), we get
(16) 
Based on the above results, we have the following convergence results of the IRLS algorithm.
Theorem 2
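The sequence $\{Z^t\}$ generated in Algorithm 1 satisfies the following properties:

$J(Z^t,\mu)$ is nonincreasing, i.e., $J(Z^{t+1},\mu) \leq J(Z^t,\mu)$;

The sequence $\{Z^t\}$ is bounded;

$\lim_{t \to \infty}\|Z^t - Z^{t+1}\|_F = 0$.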
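Theorem 3
Any limit point of the sequence $\{Z^t\}$ generated in Algorithm 1 is a stationary point of problem (6). If $p, q \geq 1$, the stationary point is globally optimal.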
For convenience of description, we fixed $\mu$ in Algorithm 1 and in the convergence analysis. In the implementation, we decrease the value of $\mu$ in each iteration, e.g., $\mu^{t+1} = \mu^t/\rho$ with $\rho > 1$. The intuition is that this makes the Smoothed LRR problem (6) close to the LRR problem (5). It is easy to check that our proofs also hold when $\mu^t \to \mu^* > 0$.
It is worth mentioning that our IRLS algorithm and convergence proofs are much more general than those in [18, 21, 27], and such extensions are nontrivial. The problems in [18] and [21] are sparse or low rank minimization problems with affine constraints. The work in [27] considers unconstrained sparse or low rank minimization problems with squared loss. Our work considers an unconstrained joint low rank and sparse minimization problem. We need to update a variable and two (or more) weight variables, while previous IRLS methods update only one variable and one weight. Note that it is usually easy to prove convergence with two updating variables, but difficult with more than two. Also, the proofs are totally different. In [18, 21], due to the affine constraints, the optimal solution can be written as the sum of a feasible solution and an element of the kernel of the constraint operator. This property is critical for their proofs, but it is not available in our setting and we do not rely on it. The least squares loss function plays an important role in the convergence proof in [27] (see equations (2.12) and (2.13) in [27]). Our proof has to handle at least two nonsmooth terms simultaneously, without a smooth squared loss function. Also, previous IRLS methods use a special property of the $\ell_p$-norm based on Young's inequality, while we use the concavity of $x^{p/2}$ (see (13) and Lemmas 1 and 2), which applies to more general functions. Thus, IRLS can also be used if $x^{p/2}$ is replaced with other concave functions, e.g., the logarithm function [15].
V Experiments
In this section, we conduct numerical experiments on both synthetic and real data to demonstrate the efficiency of the proposed IRLS algorithm (footnote 2: the codes can be found at https://sites.google.com/site/canyilu/). We use IRLS to solve the LRR and Inductive Robust Principal Component Analysis (IRPCA) [28] problems. To compare with previous convex solvers for LRR, we set $p = q = 1$ in (5). We first examine the behaviour of IRLS and its sensitivity to the regularization parameter $\mu$, and then compare the performance of IRLS with state-of-the-art methods.
V-A Selection of Regularization Parameter
IRLS converges fast and leads to an accurate solution when the regularization parameter $\mu$ is chosen appropriately. We decrease $\mu$ by $\mu^{t+1} = \mu^t/\rho$ with $\rho > 1$. $\mu^0$ is initialized as $\mu^0 = \mu_c\|X\|_2$, where $\|X\|_2$ is the spectral norm of $X$. Thus the choice of $\mu$ depends on $\rho$ and $\mu_c$. We conduct two experiments to examine the sensitivity of IRLS to $\rho$ and $\mu_c$, respectively. The first one fixes $\rho$ and examines different values of $\mu_c$. The second one fixes $\mu_c$ and examines different values of $\rho$. The experiments are performed on a synthetic data set.
The synthetic data is generated by the same procedure as that in [5, 12]. We generate $k$ independent subspaces $\{\mathcal{S}_i\}_{i=1}^{k}$ whose bases $\{U_i\}_{i=1}^{k}$ are computed by $U_{i+1} = TU_i$, $1 \leq i \leq k - 1$, where $T$ is a random rotation matrix and $U_1 \in \mathbb{R}^{d \times r}$ is a random orthogonal matrix. So each subspace has a rank of $r$ and the data dimension is $d$. We sample $n_i$ data vectors from each subspace by $X_i = U_iQ_i$, $1 \leq i \leq k$, with $Q_i$ being an $r \times n_i$ i.i.d. Gaussian matrix. We randomly choose a fraction of the samples to be corrupted by adding zero-mean Gaussian noise.
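The generation procedure can be sketched as follows. All numeric defaults (number of subspaces, dimensions, corruption fraction, noise level) are placeholder assumptions chosen for illustration, not the settings used in the experiments; a random orthogonal matrix obtained from a QR factorization stands in for the random rotation, and the noise is assumed to scale with the norm of each corrupted sample. The function name `make_subspace_data` is hypothetical.

```python
import numpy as np

def make_subspace_data(k=5, d=50, r=3, n_per=10, corrupt_frac=0.2,
                       noise_scale=0.1, seed=0):
    """Sample points from k subspaces with bases U_{i+1} = T U_i, then corrupt some."""
    rng = np.random.default_rng(seed)
    T, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # orthonormal basis U_1
    blocks, labels = [], []
    for i in range(k):
        Q = rng.standard_normal((r, n_per))           # i.i.d. Gaussian coefficients
        blocks.append(U @ Q)                          # X_i = U_i Q_i
        labels += [i] * n_per
        U = T @ U                                     # basis of the next subspace
    X = np.hstack(blocks)
    n = X.shape[1]
    idx = rng.choice(n, size=int(round(corrupt_frac * n)), replace=False)
    # Assumed noise model: standard deviation proportional to each column norm.
    col_norms = np.linalg.norm(X[:, idx], axis=0)
    X[:, idx] += noise_scale * col_norms * rng.standard_normal((d, idx.size))
    return X, np.array(labels), idx

X, labels, idx = make_subspace_data()
```

The returned `labels` give the ground-truth subspace of each column, and `idx` lists the corrupted columns, which is useful when inspecting which residual columns the column-sparse term picks up.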
Figures 1 (a) and (b) show the convergence curves of IRLS with different values of $\mu_c$ and $\rho$. It is observed that a small value of $\mu_c$ leads to an inaccurate solution in a few iterations, while a large value of $\mu_c$ delays the convergence. A similar phenomenon is found in the choice of $\rho$. A large value of $\rho$ leads to fast convergence, while a small value of $\rho$ leads to a more accurate solution. For an accurate solution, $\mu$ should not converge to 0 too fast. Thus $\mu_c$ cannot be too small and $\rho$ should not be too large. We observe that and work well.
V-B LRR for Subspace Segmentation
In this section, we present numerical results of IRLS and other state-of-the-art algorithms, including APG, ADM, LADM [29], LADMAP and accelerated LADMAP [12] (denoted as LADMAP(A)), for solving the LRR problem for subspace segmentation. All the ADM type methods use PROPACK [30] for fast SVD computation. We implement the IRLS algorithm in Matlab without using any third party package. For LADMAP(A), we set the maximum iteration number to 10000 (the default value is 1000), because LADMAP(A) is usually fast but does not converge within 1000 iterations in some cases. Apart from this, we use the default parameters of all the competing methods in the released codes from Lin's homepage (footnote 3: http://www.cis.pku.edu.cn/faculty/vision/zlin/zlin.htm). For IRLS, we set , and . All experiments are run on a PC with an Intel Core 2 Quad CPU Q9550 at 2.83 GHz and 8 GB memory, running Windows 7 and Matlab version 8.0.
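Table II: Results on the synthetic data (achieved minimum at the last iteration, computing time, and number of iterations). The three blocks correspond to the three tested values of $\lambda$.
Method  Minimum  Time  Iter. 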
APG  111.481  129.6  312 
ADM  37.572  77.2  187 
LADM  37.571  130.3  298 
LADMAP  37.571  16.8  38 
LADMAP(A)  37.571  2.4  38 
IRLS  37.571  26.5  105 
Method  Minimum  Time  Iter. 
APG  129.022  56.2  160 
ADM  111.463  76.6  199 
LADM  111.797  418.2  1000 
LADMAP  111.463  175.2  457 
LADMAP(A)  111.463  123.6  391 
IRLS  111.463  26.4  105 
Method  Minimum  Time  Iter. 
APG  147.171  44.0  109 
ADM  124.586  105.7  257 
LADM  136.819  578.9  1000 
LADMAP  124.967  556.3  1000 
LADMAP(A)  123.933  1081.4  1973 
IRLS  123.933  24.9  105 
V-B1 Synthetic Data Example
We use the same synthetic data as in Section V-A and focus on the performance under different values of the LRR model parameter $\lambda$. Usually a smaller $\lambda$ leads to a lower rank solution. This experiment tests the sensitivity of the competing methods to different ranks of the solution. Figure 1 shows the convergence curves corresponding to the three tested values of $\lambda$ (only the results within 1000 iterations are plotted). Table II shows the detailed results, including the achieved minimum at the last iteration, the computing time and the number of iterations. It can be seen that IRLS is always faster than APG, ADM and LADM. IRLS also outperforms LADMAP and LADMAP(A) except in the first setting of Table II. We find that the linearized ADM methods need more iterations to converge when $\lambda$ increases. That is because when $\lambda$ is not small enough, the rank of the solution will not be small. In this case, partial SVD may not be faster than the full SVD [11]. Hence using PROPACK may be unstable. Compared with LADMAP(A), IRLS is a better choice for small-sized or high-rank problems because it completely avoids SVD.
V-B2 Face Clustering
We test the performance of all the competing methods for face clustering on the Extended Yale B database [31]. Some example face images are shown in Figure 3. There are 38 subjects in this database. We conduct two experiments by using the face images of the first 5 and 10 subjects to form the data $X$ [32]. Each subject has 64 face images. These images are resized and projected onto a 30-dimensional subspace by PCA for the 5 subjects clustering problem and a 60-dimensional subspace for the 10 subjects clustering problem. The affinity matrix is defined as $(|Z^*| + |(Z^*)^T|)/2$, where $Z^*$ is the solution to the LRR problem obtained by different solvers. Then the Normalized Cut [33] is used to produce the clustering results based on the affinity matrix. The LRR model parameter $\lambda$ is set to the value which leads to the best clustering accuracy.
V-B3 Motion Segmentation
We also test all the competing methods for motion segmentation on the Hopkins 155 database (footnote 4: http://www.vision.jhu.edu/data/hopkins155/). This database has 156 sequences, each of which has 39 to 550 data points drawn from two or three motions. In each sequence, the data are first projected onto a 12-dimensional subspace by PCA. LRR is then performed on the projected data, with the LRR model parameter $\lambda$ set to its best value. Table IV tabulates the comparison of all these methods. It can be seen that IRLS is the fastest method. LADMAP(A) is competitive with IRLS but it requires many more iterations.
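Table III: Face clustering results on the Extended Yale B database (achieved minimum, computing time, number of iterations, and clustering accuracy).
5 subjects
Method  Minimum  Time  Iter.  Acc. 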
APG  74.603  117.9  288  61.88 
ADM  29.993  107.5  262  84.69 
LADM  56.266  411.3  1000  84.69 
LADMAP  48.178  409.0  1000  82.81 
LADMAP(A)  30.028  494.9  8418  84.14 
IRLS  29.991  33.1  113  84.69 
10 subjects
Method  Minimum  Time  Iter.  Acc. 
APG  305.692  2962.9  1000  32.52 
ADM  60.001  705.4  262  68.53 
LADM  162.488  2692.8  1000  47.34 
LADMAP  134.898  2681.1  1000  57.40 
LADMAP(A)  61.230  2212.3  10000  68.44 
IRLS  59.999  222.9  117  69.17 
V-C Inductive Robust Principal Component Analysis
Inductive Robust Principal Component Analysis (IRPCA) [28] aims at finding a robust projection $P$ to remove the possible corruptions in data. It is done by solving the following nuclear norm regularized minimization problem
(17) $\min_{P} \|P\|_* + \lambda\|(PX - X)^T\|_{2,1}.$
Here we use the $\ell_{2,1}$-norm $\|(PX - X)^T\|_{2,1}$, the sum of the $\ell_2$-norm of each row of $PX - X$, instead of the $\ell_1$-norm in [28] to handle data with row corruptions (caused by continuous shadow, e.g., a face with glasses or a scarf).
The norm $\|E^T\|_{2,1}$ can be smoothed as $\sum_i \sqrt{\|E^i\|_2^2 + \mu^2}$, where $E^i$ denotes the $i$-th row of $E$. Thus IRLS solves (17) by iteratively solving
$PM + \lambda N(PX - X)X^T = 0,$
where $M = (P^TP + \mu^2 I)^{-1/2}$ and $N$ is a diagonal matrix with $N_{ii} = \left(\|(PX - X)^i\|_2^2 + \mu^2\right)^{-1/2}$. We test our IRLS by comparing it with ADM in [28] and LADMAP(A) [12] for face recognition. After the projection is learned by solving (17) on the training data, we can use it to remove corruption from a new test data point. We perform experiments on two face data sets. The first one is the Extended Yale B, which consists of 38 subjects with 64 images per subject. We randomly select 30 images per subject for training and use the rest for testing. The other one is the CMU PIE face dataset [34], which contains more than 40,000 facial images of 68 people. The images were acquired across different poses. We use the near frontal pose C07, which includes 1629 images. All the images are resized to a common size. For each subject, we randomly select 10 images for training and use the rest for testing. The support vector machine (SVM) is used to perform classification. The recognition results are shown in Figure 5. It can be seen that the recognition accuracies obtained by the different solvers are almost the same, but the running times of ADM and LADMAP(A) are much larger than that of our IRLS algorithm. Figure 6 plots some test images recovered by IRPCA obtained by our IRLS algorithm. It can be seen that IRPCA by IRLS successfully removes the shadow and corruptions from faces.
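Table IV: Motion segmentation results on the Hopkins 155 database (computing time, number of iterations, and segmentation error).
Two Motions
Method  Time  Iter.  Err. 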
APG  165.7  388  3.62 
ADM  100.8  223  2.48 
LADM  415.0  1000  6.30 
LADMAP  368.5  1000  4.50 
LADMAP(A)  57.6  4668  2.40 
IRLS  35.5  131  2.71 
Three Motions  
Method  Time  Iter.  Err. 
APG  456.6  476  12.67 
ADM  222.0  224  5.45 
LADM  942.8  1000  14.59 
LADMAP  883.7  1000  10.12 
LADMAP(A)  89.9  5768  5.19 
IRLS  84.7  133  4.14 
All  
Method  Time  Iter.  Err. 
APG  230.8  408  5.84 
ADM  127.9  223  3.25 
LADM  532.6  1000  8.33 
LADMAP  483.3  1000  5.91 
LADMAP(A)  65.7  4949  3.19 
IRLS  46.4  131  3.20 
VI Conclusions and Future Work
Different from previous Iteratively Reweighted Least Squares (IRLS) algorithms, which solve only a single sparse or low rank minimization problem, we proposed a more general IRLS to solve joint low rank and sparse matrix minimization problems. The objective function is first smoothed by introducing regularization terms. Then IRLS is applied to solve the relaxed problem. We provide a general proof showing that the solution obtained by IRLS is a stationary point (globally optimal if the problem is convex). IRLS can also be applied to various other optimization problems with the same convergence guarantee. An interesting direction for future work is to use IRLS for solving nonconvex structured Lasso problems (e.g., $\ell_{2,p}$-norm regularized group Lasso, overlapping/nonoverlapping group Lasso [25], and tree structured group Lasso [26]).
Appendix
VI-A Proof of Lemma 1
Proof. By the definition of concave function, we have
Lemma 3
[27] Given . Let and be ordered eigenvalues of and , respectively. Then .
VI-B Proof of Lemma 2
VI-C Proof of Theorem 2
Proof. We denote . Since solves (7), we have
(19) 
A dot product with on both side of (19) gives
This together with (16) gives
(20) 
By using (14), we have
(21) 
Now, combining (20) and (21) gives
(22) 
The above equation implies that is nonincreasing. Then we have
(23) 
Thus the sequence is bounded. Furthermore, (23) implies that the minimum eigenvalues of and satisfy
By using Lemma 3, (22) implies that
Summing all the above inequalities for all , we get
(24) 
In particular, (24) implies that . The proof is completed.
VI-D Proof of Theorem 3
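Proof. If $p, q \geq 1$, problem (6) is convex, so any stationary point is globally optimal. Thus we only need to prove that $\{Z^t\}$ converges to a stationary point of problem (6).
The sequence $\{Z^t\}$ is bounded by Theorem 2, hence there exist a matrix $Z^*$ and a subsequence $\{Z^{t_j}\}$ such that $\lim_{j \to \infty} Z^{t_j} = Z^*$. Note that $Z^{t_j+1}$ solves (7), i.e.,
(25) $pZ^{t_j+1}M^{t_j} + \lambda qX^T(XZ^{t_j+1} - X)N^{t_j} = 0.$
Letting $j \to \infty$, (25) implies that $Z^{t_j+1}$ also converges to some $\hat{Z}$. From the fact that $\lim_{t \to \infty}\|Z^t - Z^{t+1}\|_F = 0$ in Theorem 2, we have
$\|Z^* - \hat{Z}\|_F = \lim_{j \to \infty}\|Z^{t_j} - Z^{t_j+1}\|_F = 0.$
That is to say, $\hat{Z} = Z^*$. Denote the limits of $M^{t_j}$ and $N^{t_j}$ as $M^*$ and $N^*$, and let $j \to \infty$; then (25) can be rewritten as
$pZ^*M^* + \lambda qX^T(XZ^* - X)N^* = 0,$
where $M^*$ and $N^*$ are defined in (8)(9) with $Z^*$ in place of $Z^{t+1}$. Therefore, $Z^*$ satisfies the first-order optimality condition of problem (6).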
References
 [1] John Wright, Allen Y Yang, Arvind Ganesh, Shankar S Sastry, and Yi Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
 [2] Markus Weimer, Alexandros Karatzoglou, Quoc Viet Le, and Alex Smola, “CofiRank - maximum margin matrix factorization for collaborative ranking,” in Advances in Neural Information Processing Systems, 2007, pp. 222–230.
 [3] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright, “Robust principal component analysis?,” Journal of the ACM, vol. 58, no. 3, 2011.
 [4] Canyi Lu, Jiashi Feng, Zhouchen Lin, and Shuicheng Yan, “Correlation adaptive subspace segmentation by trace Lasso,” in International Conference on Computer Vision, 2013, pp. 1345–1352.
 [5] Guangcan Liu, Zhouchen Lin, and Yong Yu, “Robust subspace segmentation by low-rank representation,” in International Conference on Machine Learning, 2010.
 [6] Robert Tibshirani, “Regression shrinkage and selection via the Lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996.
 [7] Ming Yuan and Yi Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.
 [8] Emmanuel J Candès and Yaniv Plan, “Matrix completion with noise,” Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, 2010.
 [9] Martin Jaggi and Marek Sulovskỳ, “A simple algorithm for nuclear norm regularized problems,” in International Conference on Machine Learning, 2010, pp. 471–478.
 [10] Kim-Chuan Toh and Sangwoon Yun, “An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems,” Pacific Journal of Optimization, vol. 6, pp. 615–640, 2010.
 [11] Zhouchen Lin, Minming Chen, and Yi Ma, “The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices,” UIUC Technical Report UILU-ENG-09-2215, Tech. Rep., 2009.
 [12] Zhouchen Lin, Risheng Liu, and Zhixun Su, “Linearized alternating direction method with adaptive penalty for low-rank representation,” in Advances in Neural Information Processing Systems, 2011.
 [13] Canyi Lu, Yunchao Wei, Zhouchen Lin, and Shuicheng Yan, “Proximal iteratively reweighted algorithm with multiple splitting for nonconvex sparsity optimization,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2014.
 [14] Rick Chartrand and Wotao Yin, “Iteratively reweighted algorithms for compressive sensing,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 3869–3872.
 [15] Emmanuel J Candès, Michael B Wakin, and Stephen P Boyd, “Enhancing sparsity by reweighted $\ell_1$ minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, 2008.
 [16] Simon Foucart and Ming-Jun Lai, “Sparsest solutions of underdetermined linear systems via $\ell_q$-minimization for $0 < q \leq 1$,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 395–407, 2009.
 [17] Yun-Bin Zhao and Duan Li, “Reweighted $\ell_1$-minimization for sparse solutions to underdetermined linear systems,” SIAM Journal on Optimization, vol. 22, no. 3, pp. 1065–1088, 2012.
 [18] Ingrid Daubechies, Ronald DeVore, Massimo Fornasier, and C. Sinan Gunturk, “Iteratively reweighted least squares minimization for sparse recovery,” Communications on Pure and Applied Mathematics, vol. 63, pp. 1–38, 2010.
 [19] Canyi Lu, Jinhui Tang, Shuicheng Yan, and Zhouchen Lin, “Generalized nonconvex nonsmooth low-rank minimization,” in IEEE International Conference on Computer Vision and Pattern Recognition, 2014.
 [20] Canyi Lu, Changbo Zhu, Chunyan Xu, Shuicheng Yan, and Zhouchen Lin, “Generalized singular value thresholding,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2015.
 [21] Karthik Mohan and Maryam Fazel, “Iterative reweighted algorithms for matrix rank minimization,” in Journal of Machine Learning Research, 2012, vol. 13, pp. 3441–3473.
 [22] Daniel Hsu, Sham M Kakade, and Tong Zhang, “Robust matrix decomposition with sparse corruptions,” IEEE Transactions on Information Theory, vol. 57, no. 11, pp. 7221–7234, 2011.
 [23]