# Bilinear Factor Matrix Norm Minimization for Robust PCA: Algorithms and Applications

###### Abstract

The heavy-tailed distributions of corrupted outliers and singular values of all channels in low-level vision have proven effective priors for many applications such as background modeling, photometric stereo and image alignment. And they can be well modeled by a hyper-Laplacian. However, the use of such distributions generally leads to challenging non-convex, non-smooth and non-Lipschitz problems, and makes existing algorithms very slow for large-scale applications. Together with the analytic solutions to -norm minimization with two specific values of , i.e., and , we propose two novel bilinear factor matrix norm minimization models for robust principal component analysis. We first define the double nuclear norm and Frobenius/nuclear hybrid norm penalties, and then prove that they are in essence the Schatten- and quasi-norms, respectively, which lead to much more tractable and scalable Lipschitz optimization problems. Our experimental analysis shows that both our methods yield more accurate solutions than original Schatten quasi-norm minimization, even when the number of observations is very limited. Finally, we apply our penalties to various low-level vision problems, e.g., text removal, moving object detection, image alignment and inpainting, and show that our methods usually outperform the state-of-the-art methods.

## 1 Introduction

The sparse and low-rank priors have been widely used in many real-world applications in computer vision and pattern recognition, such as image restoration [1], face recognition [2], subspace clustering [3, 4, 5] and robust principal component analysis [6] (RPCA, also called low-rank and sparse matrix decomposition in [7, 8] or robust matrix completion in [9]). Sparsity plays an important role in various low-level vision tasks. For instance, it has been observed that the gradient of natural scene images can be better modeled with a heavy-tailed distribution such as hyper-Laplacian distributions (, typically with , which correspond to non-convex -norms) [10, 11], as exhibited by the sparse noise/outliers in low-level vision problems [12] shown in Fig. 1. To induce sparsity, a principled way is to use the convex -norm [6, 13, 14, 15, 16, 17], which is the closest convex relaxation of the sparser -norm, with compressed sensing being a prominent example. However, it has been shown in [18] that the -norm over-penalizes large entries of vectors and results in a biased solution. Compared with the -norm, many non-convex surrogates of the -norm listed in [19] give a closer approximation, e.g., SCAD [18] and MCP [20]. Although the use of hyper-Laplacian distributions makes the problems non-convex, fortunately an analytic solution can be derived for two specific values of , and , by finding the roots of a cubic and quartic polynomial, respectively [10, 21, 22]. The resulting algorithm can be several orders of magnitude faster than existing algorithms [10].

As an extension from vectors to matrices, the low-rank structure is the sparsity of the singular values of a matrix. Rank minimization is a crucial regularizer to induce a low-rank solution. To solve such a problem, the rank function is usually relaxed by its convex envelope [5, 23, 24, 25, 26, 27, 28, 29], the nuclear norm (i.e., the sum of the singular values, also known as the trace norm or Schatten- norm [26, 30]). By realizing the intimate relationship between the -norm and nuclear norm, the latter also over-penalizes large singular values, that is, it may make the solution deviate from the original solution as the -norm does [19, 31]. Compared with the nuclear norm, the Schatten- norm for is equivalent to the -norm on singular values and makes a closer approximation to the rank function [32, 33]. Nie et al. [31] presented an efficient augmented Lagrange multiplier (ALM) method to solve the joint -norm and Schatten- norm (LpSq) minimization. Lai et al. [32] and Lu et al. [33] proposed iteratively reweighted least squares methods for solving Schatten quasi-norm minimization problems. However, all these algorithms have to be solved iteratively, and involve singular value decomposition (SVD) in each iteration, which occupies the largest computation cost, [34, 35].

It has been shown in [36] that the singular values of nonlocal matrices in natural images usually exhibit a heavy-tailed distribution (), as well as the similar phenomena in natural scenes [37, 38], as shown in Fig. 2. Similar to the case of heavy-tailed distributions of sparse outliers, the analytic solutions can be derived for the two specific cases of , and . However, such algorithms have high per-iteration complexity . Thus, we naturally want to design equivalent, tractable and scalable forms for the two cases of the Schatten- quasi-norm, and , which can fit the heavy-tailed distribution of singular values closer than the nuclear norm, as analogous to the superiority of the quasi-norm (hyper-Laplacian priors) to the -norm (Laplacian priors).

We summarize the main contributions of this work as follows. 1) By taking into account the heavy-tailed distributions of both sparse noise/outliers and singular values of matrices, we propose two novel tractable bilinear factor matrix norm minimization models for RPCA, which can fit empirical distributions very well to corrupted data. 2) Different from the definitions in our previous work [34], we define the double nuclear norm and Frobenius/nuclear hybrid norm penalties as tractable low-rank regularizers. Then we prove that they are in essence the Schatten- and quasi-norms, respectively. The solution of the resulting minimization problems only requires SVDs on two much smaller factor matrices as compared with the much larger ones required by existing algorithms. Therefore, our algorithms can reduce the per-iteration complexity from to , where in general. In particular, our penalties are Lipschitz, and more tractable and scalable than original Schatten quasi-norm minimization, which is non-Lipschitz and generally NP-hard [32, 39]. 3) Moreover, we present the convergence property of the proposed algorithms for minimizing our RPCA models and provide their proofs. We also extend our algorithms to solve matrix completion problems, e.g., image inpainting. 4) We empirically study both of our bilinear factor matrix norm minimizations and show that they outperform original Schatten norm minimization, even with only a few observations. Finally, we apply the defined low-rank regularizers to address various low-level vision problems, e.g., text removal, moving object detection, and image alignment and inpainting, and obtain superior results than existing methods.

Sparsity | Low-rankness | |||
---|---|---|---|---|

-norm | rank | |||

-norm | norm | |||

-norm | nuclear norm | |||

Frobenius norm | Frobenius norm |

## 2 Related Work

In this section, we mainly discuss some recent advances in RPCA, and briefly review some existing work on RPCA and its applications in computer vision (readers may see [6] for a review). RPCA [24, 40] aims to recover a low-rank matrix () and a sparse matrix from corrupted observations as follows:

(1) |

where denotes the -norm^{1}^{1}1Strictly speaking, the -norm is not actually a norm, and is defined as the number of non-zero elements. When , strictly defines a norm which satisfies the three norm conditions, while it defines a quasi-norm when . Due to the relationship between and , the latter has the same cases as the former. and is a regularization parameter. Unfortunately, solving (1) is NP-hard. Thus, we usually use the convex or non-convex surrogates to replace both of the terms in (1), and formulate this problem into the following more general form:

(2) |

where in general , and are depicted in Table I and can be seen as the loss term and regularized term, respectively, and is the orthogonal projection onto the linear subspace of matrices supported on : if and otherwise. If is a small subset of the entries of the matrix, (2) is also known as the robust matrix completion problem as in [9], and it is impossible to exactly recover [41]. As analyzed in [42], we can easily see that the optimal solution , where is the complement of , i.e., the index set of unobserved entries. When and , (2) becomes a nuclear norm regularized least squares problem as in [43] (e.g., image inpainting in Sec. 6.3.4).

Model | Objective function | Constraints | Parameters | Convex? | Per-iteration |

Complexity | |||||

RPCA [6, 44] | Yes | ||||

PSVT [45] | No | ||||

WNNM [46] | |||||

LMaFit [47] | No | ||||

MTF [48] | |||||

RegL1 [16] | |||||

Unifying [49] | |||||

factEN [15] | |||||

LpSq [31] | No | ||||

No | |||||

### 2.1 Convex Nuclear Norm Minimization

In [6, 40, 44], both of the non-convex terms in (1) are replaced by their convex envelopes, i.e., the nuclear norm () and the -norm (), respectively.

(3) |

Wright et al. [40] and Candès et al. [6] proved that, under some mild conditions, the convex relaxation formulation (3) can exactly recover the low-rank and sparse matrices with high probability. The formulation (3) has been widely used in many computer vision applications, such as object detection and background subtraction [17], image alignment [50], low-rank texture analysis [29], image and video restoration [51], and subspace clustering [27]. This is mainly because the optimal solutions of the sub-problems involving both terms in (3) can be obtained by two well-known proximal operators: the singular value thresholding (SVT) operator [23] and the soft-thresholding operator [52]. The -norm penalty in (3) can also be replaced by the -norm as in outlier pursuit [28, 53, 54, 55] and subspace learning [5, 56, 57].

To efficiently solve the popular convex problem (3), various first-order optimization algorithms have been proposed, especially the alternating direction method of multipliers [58] (ADMM, or also called inexact ALM in [44]). However, they all involve computing the SVD of a large matrix of size in each iteration, and thus suffer from high computational cost, which severely limits their applicability to large-scale problems [59], as well as existing Schatten- quasi-norm () minimization algorithms such as LpSq [31]. While there have been many efforts towards fast SVD computation such as partial SVD [60], the performance of those methods is still unsatisfactory for many real applications [59, 61].

### 2.2 Non-Convex Formulations

To address this issue, Shen et al. [47] efficiently solved the RPCA problem by factorizing the low-rank component into two smaller factor matrices, i.e., as in [62], where , , and usually , as well as the matrix tri-factorization (MTF) [48] and factorized data [63] cases. In [16, 42, 64, 65, 66], the column-orthonormal constraint is imposed on the first factor matrix . According to the following matrix property, the original convex problem (3) can be reformulated as a smaller matrix nuclear norm minimization problem.

###### Property 1.

For any matrix and . If has orthonormal columns, i.e., , then .

Cabral et al. [49] and Kim et al. [15] replaced the nuclear norm regularizer in (3) with the equivalent non-convex formulation stated in Lemma 1, and proposed scalable bilinear spectral regularized models, similar to collaborative filtering applications in [26, 67]. In [15], an elastic-net regularized matrix factorization model was proposed for subspace learning and low-level vision problems.

Besides the popular nuclear norm, some variants of the nuclear norm were presented to yield better performance. Gu et al. [37] proposed a weighted nuclear norm (i.e., , where ), and assigned different weights to different singular values such that the shrinkage operator becomes more effective. Hu et al. [38] first used the truncated nuclear norm (i.e., ) to address image recovery problems. Subsequently, Oh et al. [45] proposed an efficient partial singular value thresholding (PSVT) algorithm to solve many RPCA problems of low-level vision. The formulations mentioned above are summarized in Table II.

## 3 Bilinear Factor Matrix Norm Minimization

In this section, we first define the double nuclear norm and Frobenius/nuclear hybrid norm penalties, and then prove the equivalence relationships between them and the Schatten quasi-norms. Incorporating with hyper-Laplacian priors of both sparse noise/outliers and singular values, we propose two novel bilinear factor matrix norm regularized models for RPCA. Although the two models are still non-convex and even non-smooth, they are more tractable and scalable optimization problems, and their each factor matrix term is convex. On the contrary, the original Schatten quasi-norm minimization problem is very difficult to solve because it is generally non-convex, non-smooth, and non-Lipschitz, as well as the quasi-norm [39].

As in some collaborative filtering applications [26, 67], the nuclear norm has the following alternative non-convex formulations.

###### Lemma 1.

For any matrix of rank at most , the following equalities hold:

(4) |

The bilinear spectral penalty in the first equality of (4) has been widely used in low-rank matrix completion and recovery problems, such as RPCA [15, 49], online RPCA [68], matrix completion [69], and image inpainting [25].

### 3.1 Double Nuclear Norm Penalty

Inspired by the equivalence relation between the nuclear norm and the bilinear spectral penalty, our double nuclear norm (D-N) penalty is defined as follows.

###### Definition 1.

For any matrix of rank at most , we decompose it into two factor matrices and such that . Then the double nuclear norm penalty of is defined as

(5) |

Different from the definition in [34, 70], i.e., , which cannot be used directly to solve practical problems, Definition 1 can be directly used in practical low-rank matrix completion and recovery problems, e.g., RPCA and image recovery. Analogous to the well-known Schatten- quasi-norm [31, 32, 33], the double nuclear norm penalty is also a quasi-norm, and their relationship is stated in the following theorem.

###### Theorem 1.

The double nuclear norm penalty is a quasi-norm, and also the Schatten- quasi-norm, i.e.,

(6) |

To prove Theorem 1, we first give the following lemma, which is mainly used to extend the well-known trace inequality of John von Neumann [71, 72].

###### Lemma 2.

Let be a symmetric positive semi-definite (PSD) matrix and its full SVD be with . Suppose is a diagonal matrix (i.e., ), and if and , then

Lemma 2 can be seen as a special case of the well-known von Neumann’s trace inequality, and its proof is provided in the Supplementary Materials.

###### Lemma 3.

For any matrix , and , the following inequality holds:

###### Proof:

Let and be the thin SVDs of and , where , , and . Let , where the columns of and are the left and right singular vectors associated with the top singular values of with rank at most , and . Suppose , and , we first construct the following PSD matrices and :

Because the trace of the product of two PSD matrices is always non-negative (see the proof of Lemma 6 in [73]), we have

By further simplifying the above expression, we obtain

(7) |

The detailed proof of Theorem 1 is provided in the Supplementary Materials. According to Theorem 1, it is easy to verify that the double nuclear norm penalty possesses the following property [34].

###### Property 2.

Given a matrix with , the following equalities hold:

### 3.2 Frobenius/Nuclear Norm Penalty

Inspired by the definitions of the bilinear spectral and double nuclear norm penalties mentioned above, we define a Frobenius/nuclear hybrid norm (F-N) penalty as follows.

###### Definition 2.

For any matrix of rank at most , we decompose it into two factor matrices and such that . Then the Frobenius/nuclear hybrid norm penalty of is defined as

(10) |

Different from the definition in [34], i.e., , Definition 2 can also be directly used in practical problems. Analogous to the double nuclear norm penalty, the Frobenius/nuclear hybrid norm penalty is also a quasi-norm, as stated in the following theorem.

###### Theorem 2.

The Frobenius/nuclear hybrid norm penalty, , is a quasi-norm, and is also the Schatten- quasi-norm, i.e.,

(11) |

To prove Theorem 2, we first give the following lemma.

###### Lemma 4.

For any matrix , and , the following inequality holds:

###### Proof:

To prove this lemma, we use the same notations as in the proof of Lemma 3, e.g., , and denote the thin SVDs of and , respectively. Suppose , and , we first construct the following PSD matrices and :

Similar to Lemma 3, we have the following inequality:

By further simplifying the above expression, we also obtain

(12) |

According to Theorem 2 (see the Supplementary Materials for its detailed proof), it is easy to verify that the Frobenius/nuclear hybrid norm penalty possesses the following property [34].

###### Property 3.

For any matrix with , the following equalities hold:

Similar to the relationship between the Frobenius norm and nuclear norm, i.e., [26], the bounds hold for between both the double nuclear norm and Frobenius/nuclear hybrid norm penalties and the nuclear norm, as stated in the following property.

###### Property 4.

For any matrix , the following inequalities hold:

The proof of Property 4 is similar to that in [34]. Moreover, both the double nuclear norm and Frobenius/nuclear hybrid norm penalties naturally satisfy many properties of quasi-norms, e.g., the unitary-invariant property. Obviously, we can find that Property 4 in turn implies that any low F-N or D-N penalty is also a low nuclear norm approximation.

### 3.3 Problem Formulations

Without loss of generality, we can assume that the unknown entries of are set to zero (i.e., ), and may be any values^{2}^{2}2Considering the optimal solution , must be set to 0 for the expected output . (i.e., ) such that . Thus, the constraint with the projection operator in (2) is considered instead of just as in [42]. Together with hyper-Laplacian priors of sparse components, we use and defined above to replace in (2), and present the following double nuclear norm and Frobenius/nuclear hybrid norm penalized RPCA models:

(15) |

(16) |

## 4 Optimization Algorithms

To efficiently solve both our challenging problems (15) and (16), we need to introduce the auxiliary variables and , or only to split the interdependent terms such that they can be solved independently. Thus, we can reformulate Problems (15) and (16) into the following equivalent forms:

(17) |

(18) |

### 4.1 Solving (17) via ADMM

Inspired by recent progress on alternating direction methods [44, 58], we mainly propose an efficient algorithm based on the alternating direction method of multipliers [58] (ADMM, also known as the inexact ALM [44]) to solve the more complex problem (17), whose augmented Lagrangian function is given by

where is the penalty parameter, represents the inner product operator, and , and are Lagrange multipliers.

#### 4.1.1 Updating and

To update