Nonconvex Nonsmooth Low-Rank Minimization via Iteratively Reweighted Nuclear Norm


Canyi Lu, Jinhui Tang, Shuicheng Yan, and Zhouchen Lin
C. Lu and S. Yan are with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore. J. Tang is with the School of Computer Science, Nanjing University of Science and Technology, China. Z. Lin is with the Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, China. This paper is an extended version of [1] published in CVPR 2014.

The nuclear norm is widely used as a convex surrogate of the rank function in compressive sensing for low rank matrix recovery, with applications in image recovery and signal processing. However, solving the nuclear norm based relaxed convex problem usually leads to a suboptimal solution of the original rank minimization problem. In this paper, we propose to apply a family of nonconvex surrogates of the $\ell_0$-norm to the singular values of a matrix to approximate the rank function. This leads to a nonconvex nonsmooth minimization problem. We then propose to solve the problem by the Iteratively Reweighted Nuclear Norm (IRNN) algorithm. IRNN iteratively solves a Weighted Singular Value Thresholding (WSVT) problem, which has a closed form solution due to the special properties of the nonconvex surrogate functions. We also extend IRNN to solve the nonconvex problem with two or more blocks of variables. In theory, we prove that IRNN decreases the objective function value monotonically, and any limit point is a stationary point. Extensive experiments on both synthesized data and real images demonstrate that IRNN enhances the low-rank matrix recovery compared with state-of-the-art convex algorithms.

Nonconvex low rank minimization, Iteratively reweighted nuclear norm algorithm

I Introduction

Benefiting from the success of Compressive Sensing (CS) [2], sparse and low rank matrix structures have attracted considerable research interest from the computer vision and machine learning communities. Many applications exploit these two structures. For instance, sparse coding has been widely used for face recognition [3], image classification [4] and super-resolution [5], while low rank models are applied to background modeling [6], motion segmentation [7, 8] and collaborative filtering [9].

Conventional CS recovery uses the $\ell_1$-norm, i.e., $\|x\|_1=\sum_i|x_i|$, as the surrogate of the $\ell_0$-norm, i.e., $\|x\|_0=\#\{x_i\neq 0\}$, and the resulting convex problem can be solved by fast first-order solvers [10, 11]. Though the $\ell_1$-minimization is equivalent to the $\ell_0$-minimization under certain incoherence conditions [12], the solution obtained by $\ell_1$-minimization is usually suboptimal to the original $\ell_0$-minimization since the $\ell_1$-norm is a loose approximation of the $\ell_0$-norm. This motivates approximating the $\ell_0$-norm by nonconvex continuous surrogate functions. Many nonconvex surrogates of the $\ell_0$-norm have been proposed, including the $\ell_p$-norm ($0<p<1$) [13], Smoothly Clipped Absolute Deviation (SCAD) [14], Logarithm [15], Minimax Concave Penalty (MCP) [16], Capped $\ell_1$ [17], Exponential-Type Penalty (ETP) [18], Geman [19] and Laplace [20]. We summarize their definitions in Table I and visualize them in Figure 1. Numerical studies [21, 22] have shown that nonconvex sparse optimization usually outperforms convex models in the areas of signal recovery, error correction and image processing.

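Penalty | Formula $g_\lambda(\theta)$, $\theta\ge 0$, $\lambda>0$ | Supergradient $\partial g_\lambda(\theta)$
$\ell_p$ [13] | $\lambda\theta^p$ | $+\infty$ if $\theta=0$; $\lambda p\theta^{p-1}$ if $\theta>0$
SCAD [14] | $\lambda\theta$ if $\theta\le\lambda$; $\frac{-\theta^2+2\gamma\lambda\theta-\lambda^2}{2(\gamma-1)}$ if $\lambda<\theta\le\gamma\lambda$; $\frac{\lambda^2(\gamma+1)}{2}$ if $\theta>\gamma\lambda$ | $\lambda$ if $\theta\le\lambda$; $\frac{\gamma\lambda-\theta}{\gamma-1}$ if $\lambda<\theta\le\gamma\lambda$; $0$ if $\theta>\gamma\lambda$
Logarithm [15] | $\frac{\lambda}{\log(\gamma+1)}\log(\gamma\theta+1)$ | $\frac{\gamma\lambda}{(\gamma\theta+1)\log(\gamma+1)}$
MCP [16] | $\lambda\theta-\frac{\theta^2}{2\gamma}$ if $\theta<\gamma\lambda$; $\frac{1}{2}\gamma\lambda^2$ if $\theta\ge\gamma\lambda$ | $\lambda-\frac{\theta}{\gamma}$ if $\theta<\gamma\lambda$; $0$ if $\theta\ge\gamma\lambda$
Capped $\ell_1$ [17] | $\lambda\theta$ if $\theta<\gamma$; $\lambda\gamma$ if $\theta\ge\gamma$ | $\lambda$ if $\theta<\gamma$; $[0,\lambda]$ if $\theta=\gamma$; $0$ if $\theta>\gamma$
ETP [18] | $\frac{\lambda}{1-\exp(-\gamma)}\,(1-\exp(-\gamma\theta))$ | $\frac{\lambda\gamma}{1-\exp(-\gamma)}\exp(-\gamma\theta)$
Geman [19] | $\frac{\lambda\theta}{\theta+\gamma}$ | $\frac{\lambda\gamma}{(\theta+\gamma)^2}$
Laplace [20] | $\lambda\,(1-\exp(-\frac{\theta}{\gamma}))$ | $\frac{\lambda}{\gamma}\exp(-\frac{\theta}{\gamma})$
TABLE I: Popular nonconvex surrogate functions of $\|\theta\|_0$ and their supergradients.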
Fig. 1: Illustration of the popular nonconvex surrogate functions of $\|\theta\|_0$ (left) and their supergradients (right). Panels: (a) $\ell_p$ penalty [13]; (b) SCAD penalty [14]; (c) Logarithm penalty [15]; (d) MCP penalty [16]; (e) Capped $\ell_1$ penalty [17]; (f) ETP penalty [18]; (g) Geman penalty [19]; (h) Laplace penalty [20]. Each panel uses fixed values of the penalty parameters ($p$ for the $\ell_p$ penalty; $\lambda$ and $\gamma$ for all penalties).
Fig. 2: Manifold of constant penalty for a symmetric matrix for the (a) rank penalty, (b) nuclear norm, and (c)-(j) $\sum_i g_\lambda(\sigma_i(X))$, where the choices of the nonconvex $g_\lambda$ are listed in Table I: (c) $\ell_p$; (d) SCAD; (e) Logarithm; (f) MCP; (g) Capped $\ell_1$; (h) ETP; (i) Geman; (j) Laplace. The parameter $\lambda$ and the remaining parameter of each penalty in (c)-(j) are fixed for the plots. Note that the manifold will be different for $g_\lambda$ with different parameters.

The low rank structure of a matrix is the sparsity defined on its singular values. A particularly interesting model is the low rank matrix recovery problem

$$\min_{X\in\mathbb{R}^{m\times n}} \lambda\,\mathrm{rank}(X)+\frac{1}{2}\|\mathcal{A}(X)-b\|_F^2, \tag{1}$$

where $\mathcal{A}$ is a linear mapping and $\lambda>0$. The above low rank minimization problem arises in many computer vision tasks such as multiple category classification [23], matrix completion [24], multi-task learning [25] and low-rank representation with squared loss for subspace segmentation [7]. Similar to the $\ell_0$-minimization, the rank minimization problem (1) is also challenging to solve. Thus, the rank function is usually replaced by the convex nuclear norm, $\|X\|_*=\sum_i\sigma_i(X)$, where the $\sigma_i(X)$'s denote the singular values of $X$. This leads to a relaxed convex formulation of (1):

$$\min_{X\in\mathbb{R}^{m\times n}} \lambda\|X\|_*+\frac{1}{2}\|\mathcal{A}(X)-b\|_F^2. \tag{2}$$

The above convex problem can be efficiently solved by many known solvers [26, 27]. However, the solution obtained by solving (2) is usually suboptimal to (1) since the nuclear norm is also a loose approximation of the rank function. Such a phenomenon is similar to the difference between the $\ell_1$-norm and the $\ell_0$-norm for sparse vector recovery. However, unlike the nonconvex surrogates of the $\ell_0$-norm, nonconvex rank surrogates and their optimization solvers have not been well studied before.

In this paper, to achieve a better approximation of the rank function, we extend the nonconvex surrogates of the $\ell_0$-norm shown in Table I onto the singular values of the matrix, and show how to solve the following general nonconvex nonsmooth low rank minimization problem [1]

$$\min_{X\in\mathbb{R}^{m\times n}} F(X)=\sum_{i=1}^{m} g_\lambda(\sigma_i(X))+f(X), \tag{3}$$

where $\sigma_i(X)$ denotes the $i$-th singular value of $X\in\mathbb{R}^{m\times n}$ (we assume that $m\le n$ in this work). The penalty function $g_\lambda$ and loss function $f$ satisfy the following assumptions:

  • A1: $g_\lambda:\mathbb{R}^+\to\mathbb{R}^+$ is continuous, concave and monotonically increasing on $[0,\infty)$. It is possibly nonsmooth.

  • A2: $f:\mathbb{R}^{m\times n}\to\mathbb{R}^+$ is a smooth function of type $C^{1,1}$, i.e., the gradient is Lipschitz continuous,

$$\|\nabla f(X)-\nabla f(Y)\|_F\le L(f)\,\|X-Y\|_F, \tag{4}$$

    for any $X,Y\in\mathbb{R}^{m\times n}$; $L(f)>0$ is called the Lipschitz constant of $\nabla f$. $f(X)$ is possibly nonconvex.

Note that problem (3) is very general. All the nonconvex surrogates of the $\ell_0$-norm in Table I satisfy the assumption A1, so $\sum_{i=1}^{m}g_\lambda(\sigma_i(X))$ is a nonconvex surrogate of the rank function. (Note that the singular values of a matrix are always nonnegative, so we only consider the nonconvex $g_\lambda$ defined on $\mathbb{R}^+$.) It is expected to approximate the rank function better than the convex nuclear norm. To see this more intuitively, we show the balls of constant penalties for a symmetric matrix in Figure 2. For the loss function $f$ in assumption A2, the most widely used one is the squared loss $\frac{1}{2}\|\mathcal{A}(X)-b\|_F^2$.

There is some related work which considers nonconvex rank surrogates, but it differs from this work. The works [28, 29] extend the $\ell_p$-norm of a vector to the Schatten-$p$ norm ($0<p<1$) and use the iteratively reweighted least squares (IRLS) algorithm to solve the nonconvex rank minimization problem with affine constraint. IRLS is also applied to the unconstrained problem with the smoothed Schatten-$p$ norm regularizer [30]. However, the solution obtained by IRLS may not be naturally of low rank, or it may require many iterations to get a low rank solution. One may perform singular value thresholding to achieve a low rank solution, but there is no theoretically sound rule to suggest a correct threshold. Another nonconvex rank surrogate is the truncated nuclear norm [31]. The alternating updating optimization algorithm proposed there may not be efficient due to double loops of iterations and cannot be applied to solve (3). The nonconvex low rank matrix completion problem considered in [32] is a special case of our problem (3), and our solver shown later for (3) is also much more general. The work [33] uses the nonconvex log-det heuristic in [34] for image recovery, but its augmented Lagrangian multiplier based solver lacks a convergence guarantee. A possible method to solve (3) is the proximal gradient algorithm [35], which requires computing the proximal mapping of the nonconvex function $g_\lambda$. However, computing the proximal mapping requires solving a nonconvex problem exactly. To the best of our knowledge, without additional assumptions on $g_\lambda$ (e.g., the convexity of $\nabla g_\lambda$ [35]), there does not exist a general solver for computing the proximal mapping of the general nonconvex $g_\lambda$ in assumption A1.

In this work, we observe that all the existing nonconvex surrogates in Table I are concave and monotonically increasing on $[0,\infty)$. Thus their gradients (or supergradients at the nonsmooth points) are nonnegative and monotonically decreasing. Based on this key fact, we propose an Iteratively Reweighted Nuclear Norm (IRNN) algorithm to solve (3). It computes the proximal operator of the weighted nuclear norm, which has a closed form solution due to the nonnegative and monotonically decreasing supergradients. The cost is the same as that of singular value thresholding, which is widely used in convex nuclear norm minimization. In theory, we prove that IRNN monotonically decreases the objective function value and any limit point is a stationary point.

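Furthermore, note that problem (3) contains only one block of variables. However, there is also some work which aims at finding several low rank matrices simultaneously, e.g., [36]. So we further extend IRNN to solve the following problem with $p\ge 2$ blocks of variables

$$\min_{X} F(X)=\sum_{j=1}^{p}\sum_{i=1}^{m_j} g_j(\sigma_i(X_j))+f(X), \tag{5}$$

where $X=\{X_1,\dots,X_p\}$, $X_j\in\mathbb{R}^{m_j\times n_j}$ (assume $m_j\le n_j$), the $g_j$'s satisfy the assumption A1, and $\nabla f$ is Lipschitz continuous as defined below.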

Definition 1

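Let $f:\mathbb{R}^{n_1}\times\cdots\times\mathbb{R}^{n_p}\to\mathbb{R}$ be differentiable. Then $\nabla f$ is called Lipschitz continuous if there exist $L_i(f)>0$, $i=1,\dots,p$, such that

$$\left|f(x)-f(y)-\langle\nabla f(y),x-y\rangle\right|\le\sum_{i=1}^{p}\frac{L_i(f)}{2}\|x_i-y_i\|_2^2, \tag{6}$$

for any $x=[x_1;\dots;x_p]$ and $y=[y_1;\dots;y_p]$ with $x_i,y_i\in\mathbb{R}^{n_i}$. We call the $L_i(f)$'s the Lipschitz constants of $\nabla f$.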

Note that the Lipschitz continuity of the multivariable function is crucial for the extension of IRNN to (5). This definition is new and differs from the one-block case defined in (4). For $p=1$, (6) holds if (4) holds (Lemma 1.2.3 in [37]), which motivates the above definition. However, (6) does not guarantee that (4) holds, so the Lipschitz continuity of the multivariable function is different from (4). This makes the extension of IRNN to problem (5) nontrivial. A widely used function which satisfies (6) is $f(x)=\frac{1}{2}\|\sum_{i=1}^{p}A_ix_i-b\|_2^2$. Its Lipschitz constants are $L_i(f)\ge p\|A_i\|_2^2$, $i=1,\dots,p$, where $\|A\|_2$ denotes the spectral norm of matrix $A$. This is easy to verify by using the property $\|\sum_{i=1}^{p}A_i(x_i-y_i)\|_2^2\le p\sum_{i=1}^{p}\|A_i\|_2^2\|x_i-y_i\|_2^2$, where the $x_i$'s and $y_i$'s are of compatible size.
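As a numerical illustration of the block-wise Lipschitz condition (6), the following sketch checks the inequality on random points for a least-squares loss of the form $\frac{1}{2}\|\sum_i A_ix_i-b\|_2^2$ with block constants $p\|A_i\|_2^2$. The sizes, the random data and all function names are assumptions chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 3, 8                      # number of blocks and rows of each A_i (illustrative sizes)
dims = [4, 5, 6]                 # block dimensions n_i (illustrative)
A = [rng.standard_normal((m, n)) for n in dims]
b = rng.standard_normal(m)

def f(xs):
    """Least-squares loss 0.5 * ||sum_i A_i x_i - b||_2^2."""
    r = sum(Ai @ xi for Ai, xi in zip(A, xs)) - b
    return 0.5 * float(r @ r)

def grad_f(xs):
    """Block gradients A_i^T (sum_i A_i x_i - b)."""
    r = sum(Ai @ xi for Ai, xi in zip(A, xs)) - b
    return [Ai.T @ r for Ai in A]

# Block constants L_i = p * ||A_i||_2^2 (squared spectral norm).
L = [p * np.linalg.norm(Ai, 2) ** 2 for Ai in A]

def both_sides(xs, ys):
    """Return the left and right hand sides of inequality (6)."""
    g = grad_f(ys)
    inner = sum(float(gi @ (xi - yi)) for gi, xi, yi in zip(g, xs, ys))
    lhs = abs(f(xs) - f(ys) - inner)
    rhs = sum(0.5 * Li * float(np.sum((xi - yi) ** 2)) for Li, xi, yi in zip(L, xs, ys))
    return lhs, rhs

checks = []
for _ in range(100):
    xs = [rng.standard_normal(n) for n in dims]
    ys = [rng.standard_normal(n) for n in dims]
    lhs, rhs = both_sides(xs, ys)
    checks.append(lhs <= rhs + 1e-9)

print(all(checks))
```

A random check of this kind does not prove (6); it only confirms that the stated constants are consistent with the inequality on the sampled points.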

In theory, we prove that IRNN for (5) also has the convergence guarantee. In practice, we propose a new nonconvex low rank tensor representation problem which is a special case of (5) for subspace clustering. The results demonstrate the effectiveness of nonconvex models over the convex counterpart.

In summary, the contributions of this paper are as follows.

  • Motivated by the nonconvex surrogates of the $\ell_0$-norm in Table I, we propose to use a new family of nonconvex surrogates to approximate the rank function. Then we propose the Iteratively Reweighted Nuclear Norm (IRNN) method to solve the nonconvex nonsmooth low rank minimization problem (3).

  • We further extend IRNN to solve the nonconvex nonsmooth low rank minimization problem (5) with $p\ge 2$ blocks of variables. Note that such an extension is nontrivial based on our new definition of Lipschitz continuity of the multivariable function in (6). In theory, we prove that IRNN converges with decreasing objective function values and any limit point is a stationary point.

  • For applications, we apply the nonconvex low rank models on image recovery and subspace clustering. Extensive experiments on both synthesized and real-world data well demonstrate the effectiveness of the nonconvex models.

The remainder of this paper is organized as follows: Section II presents the IRNN method for solving problem (3). Section III extends IRNN for solving problem (5) and provides the convergence analysis. The experimental results are presented in Section IV. Finally, we conclude this paper in Section V.

II Nonconvex Nonsmooth Low-Rank Minimization

In this section, we show how to solve the general problem (3). Note that $g_\lambda$ in (3) is not necessarily smooth. A known example is the Capped $\ell_1$ norm, see Figure 1. To handle the nonsmooth penalty $g_\lambda$, we first introduce the concept of the supergradient defined on a concave function.

II-A Supergradient of a Concave Function

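If $g$ is convex but nonsmooth, its subgradient $u$ at $x$ is defined by

$$g(x)+\langle u,y-x\rangle\le g(y). \tag{7}$$

If $g$ is concave and differentiable at $x$, it is known that

$$g(x)+\langle\nabla g(x),y-x\rangle\ge g(y). \tag{8}$$

Inspired by (8), we can define the supergradient of a concave $g$ at a nonsmooth point $x$ [38].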

Definition 2

Let $g:\mathbb{R}^n\to\mathbb{R}$ be concave. A vector $v$ is a supergradient of $g$ at the point $x\in\mathbb{R}^n$ if for every $y\in\mathbb{R}^n$, the following inequality holds:

$$g(x)+\langle v,y-x\rangle\ge g(y). \tag{9}$$

The supergradient at a nonsmooth point may not be unique. All supergradients of $g$ at $x$ are called the superdifferential of $g$ at $x$. We denote the set of all the supergradients at $x$ as $\partial g(x)$. If $g$ is differentiable at $x$, then $\nabla g(x)$ is the unique supergradient, i.e., $\partial g(x)=\{\nabla g(x)\}$. Figure 3 illustrates the supergradients of a concave function at both differentiable and nondifferentiable points.

For concave $g$, $-g$ is convex, and vice versa. From this fact, we have the following relationship between the supergradient of $g$ and the subgradient of $-g$.

Lemma 1

Let $g(x)$ be concave and $h(x)=-g(x)$. For any $v\in\partial g(x)$, $u=-v\in\partial h(x)$, and vice versa.

Fig. 3: Supergradients of a concave function. $v_1$ is a supergradient at $x_1$, and $v_2$ and $v_3$ are supergradients at $x_2$.

It is straightforward to prove the above fact by using (7) and (9). The relationship between the supergradient and the subgradient shown in Lemma 1 is useful for exploring some properties of the supergradient. It is known that the subdifferential of a convex function $h$ is a monotone operator, i.e.,

$$\langle u-v,x-y\rangle\ge 0, \tag{10}$$

for any $u\in\partial h(x)$, $v\in\partial h(y)$. Now we show that the superdifferential of a concave function is an antimonotone operator.

Lemma 2

The superdifferential of a concave function $g$ is an antimonotone operator, i.e.,

$$\langle u-v,x-y\rangle\le 0, \tag{11}$$

for any $u\in\partial g(x)$ and $v\in\partial g(y)$.

The above result can be easily proved by Lemma 1 and (10).

The antimonotone property of the supergradient of a concave function in Lemma 2 is important in this work. Suppose that $g:\mathbb{R}^+\to\mathbb{R}^+$ satisfies the assumption A1; then (11) implies that

$$u\ge v,\quad\text{for any } u\in\partial g(x) \text{ and } v\in\partial g(y), \tag{12}$$

when $x\le y$. That is to say, the supergradient of $g$ is monotonically decreasing on $[0,\infty)$. The supergradients of some usual concave functions are shown in Table I. We also visualize them in Figure 1. Note that for the $\ell_p$ penalty, we further define that $\partial g(0)=\{+\infty\}$. This will not affect our algorithm and convergence analysis as shown later. The Capped $\ell_1$ penalty is nonsmooth at $\theta=\gamma$ with its superdifferential $\partial g_\lambda(\gamma)=[0,\lambda]$.
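To make the supergradient properties concrete, the sketch below implements three of the penalties (Logarithm, MCP and Capped $\ell_1$) together with one choice of supergradient each, and numerically checks the defining inequality (9), nonnegativity, and the monotone decrease on a grid. The formulas are the standard forms of these penalties from the cited literature and are assumed here to match Table I; the parameter values and function names are illustrative only.

```python
import numpy as np

def log_penalty(theta, lam=1.0, gamma=1.5):
    """Logarithm penalty: lam / log(gamma + 1) * log(gamma * theta + 1)."""
    return lam / np.log(gamma + 1.0) * np.log(gamma * theta + 1.0)

def log_supergrad(theta, lam=1.0, gamma=1.5):
    return gamma * lam / ((gamma * theta + 1.0) * np.log(gamma + 1.0))

def mcp_penalty(theta, lam=1.0, gamma=1.5):
    """MCP: quadratic up to gamma * lam, constant afterwards."""
    return np.where(theta < gamma * lam,
                    lam * theta - theta ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def mcp_supergrad(theta, lam=1.0, gamma=1.5):
    return np.where(theta < gamma * lam, lam - theta / gamma, 0.0)

def capped_l1_penalty(theta, lam=1.0, gamma=1.5):
    """Capped l1: lam * min(theta, gamma)."""
    return lam * np.minimum(theta, gamma)

def capped_l1_supergrad(theta, lam=1.0, gamma=1.5):
    # At theta == gamma every value in [0, lam] is a supergradient; 0 is chosen here.
    return np.where(theta < gamma, lam, 0.0)

def is_valid_supergradient(penalty, supergrad, grid):
    """Check inequality (9), nonnegativity and monotone decrease on a grid."""
    g, v = penalty(grid), supergrad(grid)
    # Entry (i, j) is g(x_i) + v_i * (y_j - x_i) - g(y_j), which (9) requires to be >= 0.
    gap = g[:, None] + v[:, None] * (grid[None, :] - grid[:, None]) - g[None, :]
    return bool(np.all(gap >= -1e-12) and np.all(v >= 0) and np.all(np.diff(v) <= 1e-12))

grid = np.linspace(0.0, 5.0, 201)
results = {
    "logarithm": is_valid_supergradient(log_penalty, log_supergrad, grid),
    "mcp": is_valid_supergradient(mcp_penalty, mcp_supergrad, grid),
    "capped_l1": is_valid_supergradient(capped_l1_penalty, capped_l1_supergrad, grid),
}
print(results)
```

The grid check covers only sampled points, so it is a sanity test of the formulas rather than a proof of concavity.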

II-B Iteratively Reweighted Nuclear Norm Algorithm

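In this subsection, based on the above concept of the supergradient of a concave function, we show how to solve the general nonconvex and possibly nonsmooth problem (3). For simplicity of notation, we denote $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_m\ge 0$ as the singular values of $X$. The variable in the $k$-th iteration is denoted as $X^k$, and $\sigma_i^k=\sigma_i(X^k)$ is the $i$-th singular value of $X^k$.

In assumption A1, $g_\lambda$ is concave on $[0,\infty)$. So, by the definition (9) of the supergradient, we have

$$g_\lambda(\sigma_i)\le g_\lambda(\sigma_i^k)+w_i^k(\sigma_i-\sigma_i^k), \tag{13}$$

where

$$w_i^k\in\partial g_\lambda(\sigma_i^k). \tag{14}$$

Since $\sigma_1^k\ge\sigma_2^k\ge\cdots\ge\sigma_m^k\ge 0$, by the antimonotone property of the supergradient (12), we have

$$0\le w_1^k\le w_2^k\le\cdots\le w_m^k. \tag{15}$$

In (15), the nonnegativity of the $w_i^k$'s is due to the monotonically increasing property of $g_\lambda$ in assumption A1. As we will see later, property (15) plays an important role in solving the subproblem of our proposed IRNN.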

Motivated by (13), we may use its right hand side as a surrogate of $g_\lambda(\sigma_i)$ in (3). Thus we may solve the following relaxed problem to update $X^{k+1}$:

$$X^{k+1}=\arg\min_X\sum_{i=1}^{m}\left[g_\lambda(\sigma_i^k)+w_i^k(\sigma_i-\sigma_i^k)\right]+f(X)=\arg\min_X\sum_{i=1}^{m}w_i^k\sigma_i+f(X). \tag{16}$$

Problem (16) is a weighted nuclear norm regularized problem. The updating rule (16) can be regarded as an extension of the Iteratively Reweighted $\ell_1$ (IRL1) algorithm [21] for the weighted $\ell_1$-norm problem

$$\min_x\sum_i w_i^k|x_i|+l(x). \tag{17}$$

However, the weighted nuclear norm in (16) is nonconvex (it is convex if and only if $w_1^k\ge w_2^k\ge\cdots\ge w_m^k\ge 0$ [39]), while the weighted $\ell_1$-norm in (17) is convex. For convex $f$ in (16) and $l$ in (17), solving the nonconvex problem (16) is much more challenging than the convex weighted $\ell_1$-norm problem. In fact, it is not easier than solving the original problem (3).

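Input: $\mu>L(f)$, where $L(f)$ is a Lipschitz constant of $\nabla f$.
Initialize: $k=0$, $X^k$, and $w_i^k$, $i=1,\dots,m$.
Output: $X^*$.
while not converged do

  1. Update $X^{k+1}$ by solving problem (20).

  2. Update the weights $w_i^{k+1}$, $i=1,\dots,m$, by

$$w_i^{k+1}\in\partial g_\lambda\left(\sigma_i(X^{k+1})\right). \tag{18}$$

end while

Algorithm 1 Solving problem (3) by IRNN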

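Instead of updating $X^{k+1}$ by solving (16), we linearize $f(X)$ at $X^k$ and add a proximal term:

$$f(X)\approx f(X^k)+\langle\nabla f(X^k),X-X^k\rangle+\frac{\mu}{2}\|X-X^k\|_F^2, \tag{19}$$

where $\mu>L(f)$. Such a choice of $\mu$ guarantees the convergence of our algorithm as shown later. Then we use the right hand sides of (13) and (19) as surrogates of $g_\lambda$ and $f$ in (3), and update $X^{k+1}$ by solving

$$X^{k+1}=\arg\min_X\sum_{i=1}^{m}w_i^k\sigma_i+\langle\nabla f(X^k),X-X^k\rangle+\frac{\mu}{2}\|X-X^k\|_F^2=\arg\min_X\sum_{i=1}^{m}w_i^k\sigma_i+\frac{\mu}{2}\left\|X-\left(X^k-\frac{1}{\mu}\nabla f(X^k)\right)\right\|_F^2. \tag{20}$$

Solving (20) is equivalent to computing the proximity operator of the weighted nuclear norm. Due to (15), the solution to (20) has a closed form even though the problem is nonconvex.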

Lemma 3

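[39, Theorem 2.3] For any $\lambda>0$, $Y\in\mathbb{R}^{m\times n}$ and $0\le w_1\le w_2\le\cdots\le w_s$ ($s=\min(m,n)$), a globally optimal solution to the following problem

$$\min_X\lambda\sum_{i=1}^{s}w_i\sigma_i(X)+\frac{1}{2}\|X-Y\|_F^2 \tag{21}$$

is given by the Weighted Singular Value Thresholding (WSVT)

$$X^*=U\mathcal{S}_{\lambda w}(\Sigma)V^T, \tag{22}$$

where $Y=U\Sigma V^T$ is the SVD of $Y$, and $\mathcal{S}_{\lambda w}(\Sigma)=\mathrm{Diag}\{(\Sigma_{ii}-\lambda w_i)_+\}$.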
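The thresholding rule of Lemma 3 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the weights are supplied in nondecreasing order to match singular values sorted in decreasing order; the names `wsvt` and `weighted_objective` and the random test data are hypothetical.

```python
import numpy as np

def wsvt(Y, weights, lam=1.0):
    """Weighted singular value thresholding: shrink the i-th singular value by lam * w_i."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or np.any(np.diff(weights) < 0):
        raise ValueError("weights must be nonnegative and nondecreasing")
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # s is sorted in decreasing order
    s_shrunk = np.maximum(s - lam * weights, 0.0)
    return (U * s_shrunk) @ Vt

def weighted_objective(X, Y, weights, lam=1.0):
    """Objective of the weighted nuclear norm proximal problem."""
    s = np.linalg.svd(X, compute_uv=False)
    return lam * float(np.dot(weights, s)) + 0.5 * np.linalg.norm(X - Y, "fro") ** 2

rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 8))
w = np.linspace(0.2, 2.0, 6)           # nondecreasing weights, one per singular value
X_star = wsvt(Y, w, lam=0.5)

# Compare the closed-form solution against random perturbations of it.
best = weighted_objective(X_star, Y, w, lam=0.5)
trial_values = [
    weighted_objective(X_star + 0.1 * rng.standard_normal(Y.shape), Y, w, lam=0.5)
    for _ in range(200)
]
print(best <= min(trial_values))
```

With all weights equal, the same function reduces to ordinary singular value thresholding, which is the convex special case discussed in the text.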

From Lemma 3, it can be seen that property (15) plays an important role in solving (20) by using (22), and it holds for all $g_\lambda$ satisfying the assumption A1. If $g_\lambda(x)=\lambda x$, then $\sum_{i=1}^{m}g_\lambda(\sigma_i)$ reduces to the convex nuclear norm $\lambda\|X\|_*$. In this case, $w_i^k=\lambda$ for all $i=1,\dots,m$. Then WSVT reduces to the conventional Singular Value Thresholding (SVT) [40], which is an important subroutine in convex low rank optimization. The updating rule (20) then reduces to the known proximal gradient method [10].

After updating $X^{k+1}$ by solving (20), we then update the weights $w_i^{k+1}\in\partial g_\lambda(\sigma_i(X^{k+1}))$, $i=1,\dots,m$. Iteratively updating $X^{k+1}$ and the weights corresponding to its singular values leads to the proposed Iteratively Reweighted Nuclear Norm (IRNN) algorithm. The whole procedure of IRNN is shown in Algorithm 1. If the Lipschitz constant $L(f)$ is not known or computable, the backtracking rule can be used to estimate $\mu$ in each iteration [10].

It is worth mentioning that for the $\ell_p$ penalty, if $\sigma_i^k=0$, then $w_i^k\in\partial g_\lambda(\sigma_i^k)=\{+\infty\}$. By the updating rule of $X^{k+1}$ in (20), we have $\sigma_i^{k+1}=0$. This guarantees that the rank of the sequence $\{X^k\}$ is nonincreasing.
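The two alternating steps of IRNN (a weighted thresholding update followed by reweighting) can be sketched as follows for a matrix completion loss with the Logarithm penalty. This is an illustrative sketch, not the authors' implementation: the loss $\frac{1}{2}\|\mathcal{P}_\Omega(X-M)\|_F^2$, the penalty formula, the parameter values `lam`, `gamma`, `mu`, and the function names are all assumptions made for the example. The step size uses $\mu=1.1$, which exceeds the Lipschitz constant 1 of this particular loss.

```python
import numpy as np

def log_penalty(s, lam, gamma):
    """Logarithm penalty applied elementwise to singular values."""
    return lam / np.log(gamma + 1.0) * np.log(gamma * s + 1.0)

def log_supergrad(s, lam, gamma):
    """Gradient of the Logarithm penalty (nonnegative and decreasing in s)."""
    return gamma * lam / ((gamma * s + 1.0) * np.log(gamma + 1.0))

def irnn_complete(M, mask, lam=0.5, gamma=5.0, mu=1.1, iters=100):
    """IRNN sketch for min sum_i g(sigma_i(X)) + 0.5 * ||P_Omega(X - M)||_F^2."""
    X = np.zeros_like(M, dtype=float)
    w = log_supergrad(np.zeros(min(M.shape)), lam, gamma)    # weights at X^0 = 0
    history = []
    for _ in range(iters):
        grad = mask * (X - M)                                # gradient of the loss at X^k
        U, s, Vt = np.linalg.svd(X - grad / mu, full_matrices=False)
        s_new = np.maximum(s - w / mu, 0.0)                  # weighted thresholding step
        X = (U * s_new) @ Vt
        w = log_supergrad(s_new, lam, gamma)                 # reweighting step
        loss = 0.5 * float(np.sum((mask * (X - M)) ** 2))
        history.append(float(log_penalty(s_new, lam, gamma).sum()) + loss)
    return X, history

rng = np.random.default_rng(0)
M = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 30))   # rank-3 test matrix
mask = rng.random(M.shape) < 0.6                                   # observed entries
X_hat, history = irnn_complete(M, mask)

# The objective values should be nonincreasing across iterations.
print(all(b <= a + 1e-9 for a, b in zip(history, history[1:])))
```

Because the thresholded singular values remain sorted in decreasing order, they can be reused directly as the singular values of the new iterate when computing the next weights, which avoids a second SVD per iteration.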

In theory, we can prove that IRNN converges. Since IRNN is a special case of IRNN with Parallel Splitting (IRNN-PS) in Section III, we only give the convergence results of IRNN-PS later.

At the end of this section, we would like to remark some more differences between previous work and ours.

  • Our IRNN and IRNN-PS for nonconvex low rank minimization are different from previous iteratively reweighted solvers for nonconvex sparse minimization, e.g., [21, 30]. The key difference is that the weighted nuclear norm regularized problem is nonconvex while the weighted $\ell_1$-norm regularized problem is convex. This makes the convergence analysis different.

  • Our IRNN and IRNN-PS utilize the common properties, instead of specific ones, of the nonconvex surrogates of the $\ell_0$-norm. This makes them much more general than many previous nonconvex low rank solvers, e.g., [22, 31, 33], which target some special nonconvex problems.

III Extensions of IRNN and the Convergence Analysis

In this section, we extend IRNN to solve two types of problems which are more general than (3). The first type consists of problems similar to (3) but with more general nonconvex penalties. The second is problem (5), which has $p\ge 2$ blocks of variables.

III-A IRNN for the Problems with More General Nonconvex Penalties

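IRNN can be extended to solve the following problem

$$\min_X\sum_{i=1}^{m}g_i(\sigma_i(X))+f(X), \tag{23}$$

where the $g_i$'s are concave and their supergradients satisfy $0\le v_1\le v_2\le\cdots\le v_m$ for any $v_i\in\partial g_i(\sigma_i(X))$, $i=1,\dots,m$. The truncated nuclear norm $\|X\|_r=\sum_{i=r+1}^{m}\sigma_i(X)$ [31] is an interesting example. Indeed, let

$$g_i(x)=\begin{cases}0, & i=1,\dots,r,\\ x, & i=r+1,\dots,m.\end{cases} \tag{24}$$

Then $\|X\|_r=\sum_{i=1}^{m}g_i(\sigma_i(X))$ and its supergradient is

$$\partial g_i(x)=\begin{cases}0, & i=1,\dots,r,\\ 1, & i=r+1,\dots,m.\end{cases} \tag{25}$$

Compared with the alternating updating algorithm in [31], which requires double loops, our IRNN will be more efficient and has a stronger convergence guarantee.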
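As a small illustration of how the truncated nuclear norm fits this framework, the sketch below builds the weight vector with zeros for the $r$ largest singular values and ones for the rest, and checks that the weighted sum of singular values equals the sum of the smallest $m-r$ singular values. The function names and the random test matrix are hypothetical.

```python
import numpy as np

def truncated_weights(m, r):
    """Weights for the truncated nuclear norm: 0 for the top r singular values, 1 otherwise."""
    w = np.ones(m)
    w[:r] = 0.0
    return w

def truncated_nuclear_norm(X, r):
    """Sum of all singular values of X except the r largest ones."""
    s = np.linalg.svd(X, compute_uv=False)   # sorted in decreasing order
    return float(s[r:].sum())

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 7))
r = 2
w = truncated_weights(min(X.shape), r)
s = np.linalg.svd(X, compute_uv=False)

# The weights are nonnegative and nondecreasing, as the closed-form thresholding requires.
weights_ok = bool(np.all(w >= 0) and np.all(np.diff(w) >= 0))
print(weights_ok, np.isclose(w @ s, truncated_nuclear_norm(X, r)))
```

Since these weights do not depend on the current iterate, the reweighting step is constant in this special case and only the thresholding update changes from one iteration to the next.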

III-B IRNN for the Multi-Blocks Problem (5)

The multi-blocks problem (5) also has some applications in computer vision. An example is the Latent Low Rank Representation (LatLRR) problem [36]


Here we propose a more general Tensor Low Rank Representation (TLRR) as follows


where $\mathcal{X}$ is a multi-way tensor and $\times_j$ denotes the $j$-mode product [41]. TLRR is an extension of LRR [7] and LatLRR. It can also be applied to subspace clustering; see Section IV. If we replace each nuclear norm in (26) by $\sum_i g_j(\sigma_i(\cdot))$ with the $g_j$'s satisfying the assumption A1, then we have the Nonconvex TLRR (NTLRR) model, which is a special case of (5).

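Now we show how to solve (5). Similar to (20), we update $X_j$, $j=1,\dots,p$, by

$$X_j^{k+1}=\arg\min_{X_j}\sum_{i=1}^{m_j}w_{ji}^k\sigma_i(X_j)+\langle\nabla_jf(X^k),X_j-X_j^k\rangle+\frac{\mu_j}{2}\|X_j-X_j^k\|_F^2, \tag{28}$$

where $\mu_j>L_j(f)$, the notation $\nabla_jf$ denotes the gradient of $f$ w.r.t. $X_j$, and

$$w_{ji}^k\in\partial g_j\left(\sigma_i(X_j^k)\right). \tag{29}$$

Note that (28) and (29) can be computed in parallel for $j=1,\dots,p$. So we call such a method IRNN with Parallel Splitting (IRNN-PS).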
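The parallel block updates can be sketched for a two-block least-squares loss. This is a hedged example under stated assumptions: the loss $\frac{1}{2}\|\sum_j A_jX_j-B\|_F^2$, the Logarithm penalty, the step sizes $\mu_j=1.01\,p\,\|A_j\|_2^2$, the problem sizes and all function names are chosen only for illustration.

```python
import numpy as np

def log_penalty(s, lam, gamma):
    return lam / np.log(gamma + 1.0) * np.log(gamma * s + 1.0)

def log_supergrad(s, lam, gamma):
    return gamma * lam / ((gamma * s + 1.0) * np.log(gamma + 1.0))

def objective(A, B, X, lam, gamma):
    """Sum of penalties on all blocks plus the shared least-squares loss."""
    R = sum(Aj @ Xj for Aj, Xj in zip(A, X)) - B
    reg = sum(float(log_penalty(np.linalg.svd(Xj, compute_uv=False), lam, gamma).sum())
              for Xj in X)
    return reg + 0.5 * float(np.sum(R ** 2))

def irnn_ps(A, B, lam=0.5, gamma=5.0, iters=50):
    """IRNN with parallel splitting for f(X) = 0.5 * ||sum_j A_j X_j - B||_F^2."""
    p = len(A)
    # Step sizes slightly above the block constants p * ||A_j||_2^2.
    mu = [1.01 * p * np.linalg.norm(Aj, 2) ** 2 for Aj in A]
    X = [np.zeros((Aj.shape[1], B.shape[1])) for Aj in A]
    history = [objective(A, B, X, lam, gamma)]
    for _ in range(iters):
        R = sum(Aj @ Xj for Aj, Xj in zip(A, X)) - B      # residual at X^k, shared by all blocks
        X_next = []
        for Aj, Xj, mu_j in zip(A, X, mu):                # blocks are updated independently
            w = log_supergrad(np.linalg.svd(Xj, compute_uv=False), lam, gamma)
            U, s, Vt = np.linalg.svd(Xj - Aj.T @ R / mu_j, full_matrices=False)
            X_next.append((U * np.maximum(s - w / mu_j, 0.0)) @ Vt)
        X = X_next
        history.append(objective(A, B, X, lam, gamma))
    return X, history

rng = np.random.default_rng(0)
A = [rng.standard_normal((20, 10)), rng.standard_normal((20, 12))]
B = rng.standard_normal((20, 15))
X, history = irnn_ps(A, B)
print(all(b <= a + 1e-9 for a, b in zip(history, history[1:])))
```

Each block update reads only the previous iterate, so the inner loop could be distributed across workers without changing the result.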

III-C Convergence Analysis

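In this section, we give the convergence analysis of IRNN-PS for (5). For simplicity of notation, we denote $\sigma_{ji}^k=\sigma_i(X_j^k)$ as the $i$-th singular value of $X_j$ in the $k$-th iteration.

Theorem 1

In problem (5), assume that the $g_j$'s satisfy the assumption A1 and $\nabla f$ is Lipschitz continuous. Then the sequence $\{X^k\}$ generated by IRNN-PS satisfies the following properties:

  1. $F(X^k)$ is monotonically decreasing. Indeed,
$$F(X^k)-F(X^{k+1})\ge\sum_{j=1}^{p}\frac{\mu_j-L_j(f)}{2}\|X_j^k-X_j^{k+1}\|_F^2\ge 0;$$

  2. $\lim_{k\to\infty}(X^k-X^{k+1})=0$.

Proof. First, since $X_j^{k+1}$ is optimal to (28), comparing its objective value with the value at $X_j^k$, we have
$$\sum_{i=1}^{m_j}w_{ji}^k\sigma_{ji}^{k+1}+\langle\nabla_jf(X^k),X_j^{k+1}-X_j^k\rangle+\frac{\mu_j}{2}\|X_j^{k+1}-X_j^k\|_F^2\le\sum_{i=1}^{m_j}w_{ji}^k\sigma_{ji}^k.$$

It can be rewritten as
$$\langle\nabla_jf(X^k),X_j^k-X_j^{k+1}\rangle\ge-\sum_{i=1}^{m_j}w_{ji}^k(\sigma_{ji}^k-\sigma_{ji}^{k+1})+\frac{\mu_j}{2}\|X_j^k-X_j^{k+1}\|_F^2.$$

Second, since $\nabla f$ is Lipschitz continuous, by (6), we have
$$f(X^k)-f(X^{k+1})\ge\sum_{j=1}^{p}\langle\nabla_jf(X^k),X_j^k-X_j^{k+1}\rangle-\sum_{j=1}^{p}\frac{L_j(f)}{2}\|X_j^k-X_j^{k+1}\|_F^2.$$

Third, by (29) and (9), we have
$$g_j(\sigma_{ji}^k)-g_j(\sigma_{ji}^{k+1})\ge w_{ji}^k(\sigma_{ji}^k-\sigma_{ji}^{k+1}).$$

Summing the above three relations over all $j$ and $i$ leads to
$$F(X^k)-F(X^{k+1})\ge\sum_{j=1}^{p}\frac{\mu_j-L_j(f)}{2}\|X_j^k-X_j^{k+1}\|_F^2\ge 0.$$

Thus $F(X^k)$ is monotonically decreasing. Summing the above inequality over $k\ge 1$, and using $F\ge 0$, we get
$$F(X^1)\ge\sum_{k=1}^{\infty}\sum_{j=1}^{p}\frac{\mu_j-L_j(f)}{2}\|X_j^k-X_j^{k+1}\|_F^2.$$

This implies that $\lim_{k\to\infty}(X^k-X^{k+1})=0$.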

Fig. 4: Low-rank matrix recovery comparison of (a) frequency of successful recovery and (b) running time on random data without noise; (c) relative error and (d) convergence curves on random data with noise.

Theorem 2

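In problem (5), assume $F(X)\to\infty$ iff $\|X\|_F\to\infty$. Then any accumulation point $X^*$ of $\{X^k\}$ generated by IRNN-PS is a stationary point of (5).

Proof. Due to the above assumption and the monotone decrease of $F(X^k)$, $\{X^k\}$ is bounded. Thus there exist a matrix $X^*$ and a subsequence $\{X^{k_t}\}$ such that $X^{k_t}\to X^*$. Note that in Theorem 1, we have $X^k-X^{k+1}\to 0$. Thus $\sigma_i(X_j^{k_t+1})\to\sigma_i(X_j^*)$ for $j=1,\dots,p$ and $i=1,\dots,m_j$. By Lemma 1, $w_{ji}^{k_t}\in\partial g_j(\sigma_i(X_j^{k_t}))$ implies that $-w_{ji}^{k_t}\in\partial\left(-g_j(\sigma_i(X_j^{k_t}))\right)$. From the upper semi-continuous property of the subdifferential [42, Proposition 2.1.5], there exists $-w_{ji}^*\in\partial\left(-g_j(\sigma_i(X_j^*))\right)$ such that $-w_{ji}^{k_t}\to-w_{ji}^*$. Again by Lemma 1, $w_{ji}^*\in\partial g_j(\sigma_i(X_j^*))$ and $w_{ji}^{k_t}\to w_{ji}^*$.

Denote $h(X_j,w_j)=\sum_{i=1}^{m_j}w_{ji}\sigma_i(X_j)$. Since $X_j^{k_t+1}$ is optimal to (28), there exists $G_j^{k_t+1}\in\partial h(X_j^{k_t+1},w_j^{k_t})$ such that

$$G_j^{k_t+1}+\nabla_jf(X^{k_t})+\mu_j(X_j^{k_t+1}-X_j^{k_t})=0. \tag{30}$$

Let $t\to\infty$ in (30). Then there exists $G_j^*\in\partial h(X_j^*,w_j^*)$ such that

$$G_j^*+\nabla_jf(X^*)=0. \tag{31}$$

Thus $X^*$ is a stationary point of (5).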

Fig. 5: Image recovery comparison by using different matrix completion algorithms. (a) Original image; (b) Image with Gaussian noise and text; (c)-(g) Recovered images by APGL, LMaFit, TNNR-ADMM, IRNN-$\ell_p$, and IRNN-SCAD, respectively. Best viewed in an enlarged color PDF file.

IV Experiments

In this section, we present several experiments to demonstrate that the models with nonconvex rank surrogates outperform the ones with convex nuclear norm. We conduct three experiments. The first two aim to examine the convergence behavior of IRNN for the matrix completion problem [43] on both synthetic data and real images. The last experiment is tested on the tensor low rank representation problem (27) solved by IRNN-PS for face clustering.

For the first two experiments, we consider the nonconvex low rank matrix completion problem

$$\min_X\sum_{i=1}^{m}g_\lambda(\sigma_i(X))+\frac{1}{2}\|\mathcal{P}_\Omega(X-M)\|_F^2, \tag{32}$$

where $\Omega$ is the set of indices of samples, and $\mathcal{P}_\Omega$ is a linear operator that keeps the entries in $\Omega$ unchanged and sets those outside to zero. The gradient of the squared loss function in (32) is Lipschitz continuous, with a Lipschitz constant $L(f)=1$. We set $\mu$ to a constant larger than $L(f)$ in IRNN. For the choice of $g_\lambda$, we use five nonconvex surrogates in Table I, including the $\ell_p$-norm, SCAD, Logarithm, MCP and ETP. The other three nonconvex surrogates, Capped $\ell_1$, Geman and Laplace, are not used since we find that their recovery performances are very sensitive to the choices of $\lambda$ and $\gamma$ in different cases. For the choice of $\lambda$ in $g_\lambda$, we use a continuation technique to enhance the low rank matrix recovery. The initial value of $\lambda$ is set to a larger value $\lambda_0$, and dynamically decreased by $\lambda=\eta^k\lambda_0$ with $\eta<1$. The decrease is stopped when $\lambda$ reaches a predefined target $\lambda_t$. $X$ is initialized as a zero matrix. For the choice of parameters (e.g., $p$ and $\gamma$) in $g_\lambda$, we search them from a candidate set and use the one which obtains good performance in most cases.
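The ingredients of this setup can be sketched as follows: the sampling operator, the gradient of the squared loss (whose Lipschitz constant is at most 1 because the operator only zeroes entries), and a geometric continuation schedule for $\lambda$. The function names and the numeric values used in the demonstration are hypothetical and are not the settings used in the experiments.

```python
import numpy as np

def project_omega(X, mask):
    """Keep entries where mask is True and set the others to zero."""
    return np.where(mask, X, 0.0)

def grad_completion_loss(X, M, mask):
    """Gradient of 0.5 * ||P_Omega(X - M)||_F^2 with respect to X."""
    return project_omega(X - M, mask)

def continuation_schedule(lam0, lam_target, eta):
    """Geometric decrease lam0, eta*lam0, eta^2*lam0, ... ending at lam_target."""
    if not (0.0 < eta < 1.0 and lam0 > lam_target > 0.0):
        raise ValueError("need 0 < eta < 1 and lam0 > lam_target > 0")
    lams = []
    lam = lam0
    while lam > lam_target:
        lams.append(lam)
        lam *= eta
    lams.append(lam_target)          # finish exactly at the target value
    return lams

rng = np.random.default_rng(0)
M = rng.standard_normal((10, 12))
mask = rng.random(M.shape) < 0.5

# Numerical check that the gradient is 1-Lipschitz on random pairs of points.
lipschitz_ok = True
for _ in range(50):
    X, Y = rng.standard_normal(M.shape), rng.standard_normal(M.shape)
    lhs = np.linalg.norm(grad_completion_loss(X, M, mask) - grad_completion_loss(Y, M, mask))
    lipschitz_ok = lipschitz_ok and bool(lhs <= np.linalg.norm(X - Y) + 1e-12)

lams = continuation_schedule(lam0=10.0, lam_target=0.01, eta=0.5)
print(lipschitz_ok, len(lams), lams[0], lams[-1])
```

In a continuation run, the solver would be warm-started at each new value of $\lambda$ from the solution obtained for the previous one.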

IV-A Low Rank Matrix Recovery on the Synthetic Data

We first compare the low rank matrix recovery performance of the nonconvex model (32) with that of the convex one using the nuclear norm [9] on synthetic data. We conduct two tasks. The first one is tested on the observed matrix $M$ without noise, while the other one is tested on $M$ with noise.

Fig. 6: Comparison of image recovery on more images. (a) Original images. (b) Images with noise. Recovered images by (c) APGL and (d) IRNN-$\ell_p$. Best viewed in an enlarged color PDF file.

For the noise free case, we generate the rank-$r$ matrix $M$ as the product of two factor matrices whose entries are generated by the Matlab command randn. We randomly set a portion of the elements of $M$ to be missing. The Augmented Lagrange Multiplier (ALM) [44] method is used to solve the noise free problem

$$\min_X\|X\|_*\quad\text{s.t.}\quad\mathcal{P}_\Omega(X)=\mathcal{P}_\Omega(M). \tag{33}$$

The default parameters in the released code of ALM are used. Problem (32) is solved by IRNN with fixed parameter settings. The algorithm is stopped when a predefined stopping tolerance is reached. The matrix recovery performance is evaluated by the Relative Error defined as

$$\text{Relative Error}=\frac{\|\hat{X}-M\|_F}{\|M\|_F}, \tag{34}$$

where $\hat{X}$ is the matrix recovered by the algorithm under test. If the Relative Error is smaller than a fixed threshold, then $\hat{X}$ is regarded as a successful recovery of $M$. For each rank $r$, we repeat the experiment multiple times. We then define the frequency of success as the number of successful recoveries divided by the number of trials. We also vary the underlying rank $r$ of $M$ from 20 to 33 for each algorithm. We show the frequency of success in Figure 4(a). The legend IRNN-$\ell_p$ in Figure 4(a) denotes the model (32) with the $\ell_p$ penalty solved by IRNN. It can be seen that IRNN for (32) with nonconvex rank surrogates significantly outperforms ALM for (33) with the convex rank surrogate. This is because the nonconvex surrogates approximate the rank function much better than the convex nuclear norm. This also verifies that our IRNN achieves good solutions of (32), though its optimal solutions are in general not computable.
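The two evaluation measures used here can be written compactly. This is a minimal sketch with hypothetical function names; the success threshold `tol` is left as a required argument because its value is not fixed in this example.

```python
import numpy as np

def relative_error(X_hat, M):
    """Frobenius-norm relative error between a recovered matrix and the ground truth."""
    return float(np.linalg.norm(X_hat - M, "fro") / np.linalg.norm(M, "fro"))

def frequency_of_success(errors, tol):
    """Fraction of trials whose relative error falls below the threshold tol."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean(errors < tol))

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))

# A perfect recovery has zero error; a 1% rescaling has relative error 0.01.
exact_error = relative_error(M, M)
scaled_error = relative_error(1.01 * M, M)
print(exact_error, round(scaled_error, 6))
```

For example, `frequency_of_success([exact_error, scaled_error], tol=0.005)` counts only the exact recovery as a success and returns one half.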

For the second task, we assume that the observed matrix $M$ is noisy. It is generated by adding Gaussian noise to the observed entries, i.e., $\mathcal{P}_\Omega(M)+0.1\times$randn. We compare IRNN for (32) with the convex Accelerated Proximal Gradient with Line search (APGL) [24], which solves the noisy problem

$$\min_X\lambda\|X\|_*+\frac{1}{2}\|\mathcal{P}_\Omega(X)-\mathcal{P}_\Omega(M)\|_F^2. \tag{35}$$

For this task, the parameters of IRNN are again fixed. We run the experiments 100 times and vary the underlying rank $r$ from 15 to 35. For each test, we compute the relative error in (34). Then we show the mean relative error over 100 tests in Figure 4(c). Similar to the noise free case, IRNN with nonconvex rank surrogates achieves much smaller recovery error than APGL for the convex problem (35).

It is worth mentioning that although Logarithm seems to perform better than the other nonconvex penalties for low rank matrix completion in Figure 4, it is still not clear which one is the best rank surrogate, since the obtained solutions are not globally optimal. Answering this question is beyond the scope of this work.

Figure 4(b) shows the running times of the compared methods. It can be seen that IRNN is slower than the convex ALM. This is due to the reinitialization of IRNN when using the continuation technique. Figure 4(d) plots the objective function values in each iteration of IRNN with different nonconvex penalties. As verified in theory, it can be seen that the values are decreasing.

Fig. 7: Comparison of (a) PSNR values; (b) Relative error; and (c) Running time (seconds) for image recovery by different matrix completion methods.

IV-B Application to Image Recovery

In this section, we apply the low rank matrix completion models (35) and (3) to image recovery. We follow the experimental settings in [31]. Here we consider two types of noise on the real images. The first one replaces a portion of the pixels with random values (sample image (1) in Figure 5(b)). The other one adds some unrelated text to the image (sample image (2) in Figure 5(b)). The goal is to remove the noise by using low rank matrix completion. In fact, real images may not be of low rank, but their top singular values carry the main information. Thus, the image can be approximately recovered by a low-rank matrix. A color image has three channels, and matrix completion is applied to each channel independently. We compare IRNN with some state-of-the-art methods on this task, including APGL, Low-Rank Matrix Fitting (LMaFit) [45] and Truncated Nuclear Norm Regularization (TNNR) [31]. For the obtained solution, we evaluate its quality by the Peak Signal-to-Noise Ratio (PSNR) and the relative error (34).

Figure 5 (c)-(g) show the images recovered by different methods. It can be seen that our IRNN method for nonconvex models achieves much better recovery performance than APGL and LMaFit. The performances of the low rank models (3) using different nonconvex surrogates are quite similar, so we only show the results of IRNN-$\ell_p$ and IRNN-SCAD due to limited space. Some more results are shown in Figure 6. Figure 7 shows the PSNR values, relative errors and running times of different methods on all the tested images. It can be seen that IRNN with all the evaluated nonconvex functions achieves higher PSNR values and smaller relative error. This verifies that the nonconvex penalty functions are effective in this situation. The nonconvex truncated nuclear norm is close to our methods, but its running time is 35 times of ours.

Fig. 8: Some example face images from (a) Extended Yale B and (b) UMIST databases.

IV-C Tensor Low-Rank Representation

In this section, we consider using the Tensor Low-Rank Representation (TLRR) (27) for face clustering [46, 36]. Problem (27) can be solved by the Accelerated Proximal Gradient (APG) [10] method with the optimal convergence rate $O(1/K^2)$, where $K$ is the number of iterations. The corresponding Nonconvex TLRR (NTLRR) related to (27) is


where we use the Logarithm function in Table I, since we find it achieves the best performance in the previous experiments. Problem (3