# Unified Scalable Equivalent Formulations for Schatten Quasi-Norms

Fanhua Shang, Yuanyuan Liu, and James Cheng

F. Shang, Y. Liu and J. Cheng are with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. E-mail: {fhshang, yyliu, jcheng}@cse.cuhk.edu.hk.

CUHK Technical Report CSE-ShangLC20160307, March 7, 2016.

###### Abstract

The Schatten quasi-norm can be used to bridge the gap between the nuclear norm and the rank function, and is a tighter approximation to the matrix rank. However, most existing Schatten quasi-norm minimization (SQNM) algorithms, like those for nuclear norm minimization (NNM), are too slow or even impractical for large-scale problems, because they require the singular value decomposition (SVD) or eigenvalue decomposition (EVD) of the whole matrix in each iteration. In this paper, we rigorously prove that for any $p, p_1, p_2>0$ satisfying $1/p=1/p_1+1/p_2$, the Schatten-$p$ quasi-norm of any matrix is equivalent to minimizing the product of the Schatten-$p_1$ norm (or quasi-norm) and Schatten-$p_2$ norm (or quasi-norm) of its two factor matrices. Then we present and prove the equivalence relationship between the product formula of the Schatten quasi-norm and its weighted sum formula for the two cases of $p_1$ and $p_2$: $p_1=p_2$ and $p_1\neq p_2$. In particular, when $p>1/2$, there is an equivalence between the Schatten-$p$ quasi-norm of any matrix and the Schatten-$2p$ norms of its two factor matrices, where the widely used equivalent formulation of the nuclear norm, i.e., $\|X\|_*=\min_{X=UV^T}(\|U\|_F^2+\|V\|_F^2)/2$, can be viewed as a special case. That is, various SQNM problems with $p>1/2$ can be transformed into one involving only smooth, convex norms of two factor matrices, which can lead to simpler and more efficient algorithms than conventional methods.

We further extend the theoretical results of two factor matrices to the cases of three and more factor matrices, from which we can see that for any $0<p<1$, the Schatten-$p$ quasi-norm of any matrix is the minimization of the mean of the Schatten-$(\lfloor 1/p\rfloor+1)p$ norms of all factor matrices, where $\lfloor 1/p\rfloor$ denotes the largest integer not exceeding $1/p$. In other words, for any $0<p<1$, the SQNM problem can be transformed into an optimization problem involving only the smooth, convex norms of multiple factor matrices. In addition, we present some representative examples for two and three factor matrices. Naturally, the bi-nuclear and Frobenius/nuclear quasi-norms defined in our previous paper [1] and the tri-nuclear quasi-norm defined in our previous paper [2] are three important special cases.

###### Index Terms

Schatten quasi-norm, nuclear norm, rank function, factor matrix, equivalent formulations.

## I Introduction

The affine rank minimization problem arises directly in various areas of science and engineering including statistics, machine learning, information theory, data mining, medical imaging and computer vision. Some representative applications include low-rank matrix completion (LRMC) [3], robust principal component analysis (RPCA) [4], low-rank representation [5], multivariate regression [6], multi-task learning [7] and system identification [8]. To efficiently solve such problems, we mainly relax the rank function to its tractable convex envelope, i.e., the nuclear norm (sum of the singular values, also known as the trace norm or Schatten-$1$ norm), which leads to a convex optimization problem [3, 9, 10, 11]. In fact, the nuclear norm of a matrix is the $\ell_1$-norm of the vector of its singular values, and thus it can promote a low-rank solution. However, it has been shown in [12, 13] that the $\ell_1$-norm over-penalizes large entries of vectors, and therefore results in a solution from a possibly biased solution space. By the relationship between the $\ell_1$-norm and the nuclear norm, the nuclear norm penalty shrinks all singular values equally, which also over-penalizes large singular values. That is, the nuclear norm may make the solution deviate from the original solution, as the $\ell_1$-norm does. Compared with the nuclear norm, the Schatten-$p$ quasi-norm with $0<p<1$ is non-convex, but it can give a closer approximation to the rank function. Thus, Schatten-$p$ quasi-norm minimization (SQNM) has received a significant amount of attention from researchers in various communities, such as image recovery [14], collaborative filtering [15, 16] and MRI analysis [17].

Recently, two classes of iterative reweighted least squares (IRLS) algorithms were proposed in [18] and [19] to approximate the associated Schatten-$p$ quasi-norm minimization problems. In addition, Lu et al. [14] proposed a family of iteratively reweighted nuclear norm (IRNN) algorithms to solve various non-convex surrogate (including the Schatten quasi-norm) minimization problems. In [14, 15, 20, 21], the Schatten-$p$ quasi-norm has been shown to be empirically superior to the nuclear norm for many different problems. Moreover, [22] theoretically proved that SQNM requires significantly fewer measurements than conventional nuclear norm minimization (NNM). However, the existing algorithms mentioned above, like those for NNM, are iterative and involve a singular value decomposition (SVD) or eigenvalue decomposition (EVD) in each iteration. Thus they suffer from high computational cost and are not applicable to large-scale problems [1, 2].

In contrast, the nuclear norm has a scalable equivalent formulation, also known as the bilinear spectral penalty [11, 23, 24], which has been successfully applied in many large-scale applications, such as collaborative filtering [16, 25, 26]. In addition, Zuo et al. [27] proposed a generalized shrinkage-thresholding operator to iteratively solve $\ell_p$ quasi-norm minimization with arbitrary $p$ values, i.e., $0<p<1$. Since the Schatten-$p$ quasi-norm of a matrix is equivalent to the $\ell_p$ quasi-norm on its singular values, we may naturally ask the following question: can we design a unified scalable equivalent formulation of the Schatten-$p$ quasi-norm with arbitrary $p$ values, i.e., $0<p<1$?

In this paper, we first present and prove the equivalence relationship between the Schatten-$p$ quasi-norm of any matrix and the minimization of the product of the Schatten-$p_1$ norm (or quasi-norm) and Schatten-$p_2$ norm (or quasi-norm) of its two factor matrices, for any $p, p_1, p_2>0$ satisfying $1/p=1/p_1+1/p_2$. In addition, we prove the equivalence relationship between the product formula of the Schatten quasi-norm and its weighted sum formula for the two cases of $p_1$ and $p_2$: $p_1=p_2$ and $p_1\neq p_2$. When $p>1/2$ and by setting the same value for $p_1$ and $p_2$, there is an equivalence between the Schatten-$p$ quasi-norm (or norm) of any matrix and the Schatten-$2p$ norms of its two factor matrices, where a representative example is the widely used equivalent formulation of the nuclear norm, i.e., $\|X\|_*=\min_{X=UV^T}(\|U\|_F^2+\|V\|_F^2)/2$. In other words, various SQNM problems with $p>1/2$ can be transformed into one involving only the smooth convex norms of two factor matrices, which can lead to simpler and more efficient algorithms than conventional methods [14, 15, 18, 19, 20, 21].

We further extend the theoretical results of two factor matrices to the cases of three and more factor matrices, from which we know that for any $0<p<1$, the Schatten-$p$ quasi-norm of any matrix is equivalent to the minimization of the mean of the Schatten-$(\lfloor 1/p\rfloor+1)p$ norms of all factor matrices, where $\lfloor 1/p\rfloor$ denotes the largest integer not exceeding $1/p$. Note that the norms of all factor matrices are convex and smooth. Besides the theoretical results, we also present several representative examples for two and three factor matrices. Naturally, the bi-nuclear and Frobenius/nuclear quasi-norms defined in our previous paper [1] and the tri-nuclear quasi-norm defined in our previous paper [2] are three important special cases.

## II Notations and Background

###### Definition 1.

The Schatten-$p$ norm ($0<p<\infty$) of a matrix $X\in\mathbb{R}^{m\times n}$ (without loss of generality, we can assume that $m\geq n$) is defined as

$$\|X\|_{S_p}=\left(\sum_{i=1}^{n}\sigma_i^{p}(X)\right)^{1/p}, \tag{1}$$

where $\sigma_i(X)$ denotes the $i$-th singular value of $X$.

When $p\geq 1$, Definition 1 defines a natural norm; for instance, the Schatten-$1$ norm is the so-called nuclear norm, $\|X\|_*$, and the Schatten-$2$ norm is the well-known Frobenius norm, $\|X\|_F$. For $0<p<1$, it defines a quasi-norm. As a non-convex surrogate for the rank function, the Schatten-$p$ quasi-norm is a better approximation than the nuclear norm [22], analogous to the superiority of the $\ell_p$ quasi-norm to the $\ell_1$-norm [19, 28].
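As a minimal sketch of Definition 1 (not part of the original report), the snippet below evaluates (1) from the singular values with NumPy. The function name `schatten` and the relative tolerance `tol` used to discard numerically zero singular values are assumptions of this sketch. It also illustrates why the case $0<p<1$ is only a quasi-norm: the triangle inequality can fail.

```python
import numpy as np

def schatten(X, p, tol=1e-10):
    """Schatten-p norm (p >= 1) or quasi-norm (0 < p < 1) of X, following (1)."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values in descending order
    s = s[s > tol * s[0]]                   # drop numerically zero values (sketch assumption)
    return float(np.sum(s ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

nuclear = schatten(X, 1.0)    # Schatten-1 norm = nuclear norm
frobenius = schatten(X, 2.0)  # Schatten-2 norm = Frobenius norm

# For p = 1/2 the triangle inequality fails, so (1) is only a quasi-norm.
A = np.diag([1.0, 0.0])
B = np.diag([0.0, 1.0])
lhs = schatten(A + B, 0.5)               # (1 + 1)^2
rhs = schatten(A, 0.5) + schatten(B, 0.5)
```

Here `lhs` exceeds `rhs`, which a true norm would not allow.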

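To recover a low-rank matrix from a small set of linear observations $b$, the general SQNM problem is formulated as follows:

$$\min_{X\in\mathbb{R}^{m\times n}}\|X\|_{S_p}^{p},\quad\text{subject to}\quad\mathcal{A}(X)=b \tag{2}$$

where $\mathcal{A}$ is a general linear operator. Alternatively, the Lagrangian version of (2) is

$$\min_{X\in\mathbb{R}^{m\times n}}\lambda\|X\|_{S_p}^{p}+f(\mathcal{A}(X)-b) \tag{3}$$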

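where $\lambda\geq 0$ is a regularization parameter, and the loss function $f(\cdot)$ measures the residual $\mathcal{A}(X)-b$. For instance, in LRMC problems [14, 18, 21, 29], $\mathcal{A}$ is the linear projection operator $\mathcal{P}_\Omega$ and $f(\cdot)=\|\cdot\|_F^2$, where $\mathcal{P}_\Omega$ is the orthogonal projection onto the linear subspace of matrices supported on the index set $\Omega$ of observed entries: $\mathcal{P}_\Omega(D)_{ij}=D_{ij}$ if $(i,j)\in\Omega$ and $\mathcal{P}_\Omega(D)_{ij}=0$ otherwise. In addition, for RPCA problems [4, 30, 31, 32, 33], $\mathcal{A}$ is the identity operator and $f(\cdot)=\|\cdot\|_1$. In the problem of multivariate regression [34], $\mathcal{A}(X)=AX$ with $A$ being a given matrix, and $f(\cdot)=\|\cdot\|_F^2$. The loss $f(\cdot)$ may also be chosen as the hinge loss in [23] or the $\ell_p$ quasi-norm in [15].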

Generally, the SQNM problem, such as (2) and (3), is non-convex, non-smooth and even non-Lipschitz [35]. So far, only a few algorithms, such as IRLS [18, 19] and IRNN [14], have been developed to solve such challenging problems. However, since most existing SQNM algorithms involve an SVD or EVD of the whole matrix in each iteration, they suffer from a high computational cost of $O(mn^2)$, which severely limits their applicability to large-scale problems [1, 2]. While there have been many efforts towards fast SVD or EVD computation, such as partial SVD [36], the performance of those methods is still unsatisfactory for many real applications [37]. As an alternative that reduces the computational complexity of the SVD or EVD of a large matrix, one can factorize $X$ into two smaller factor matrices, i.e., $X=UV^T$. According to the unitary invariance of norms, (2) and (3) can be reformulated as optimization problems over two much smaller matrices, as in [38, 39], which are still non-convex, non-smooth and even non-Lipschitz. Therefore, an important problem is how to transform challenging problems such as (2) and (3) into more tractable ones, which can be solved by simpler and more efficient algorithms.

## III Main Results

In this section, we first present and prove the equivalence relationship between the Schatten-$p$ quasi-norm of any matrix and the Schatten-$p_1$ and Schatten-$p_2$ quasi-norms (or norms) of its two factor matrices, where $1/p=1/p_1+1/p_2$ with any $p_1>0$ and $p_2>0$. Moreover, we prove the equivalence relationship between the product formula of the Schatten quasi-norm and its weighted sum formula for the two cases of $p_1$ and $p_2$: $p_1=p_2$ and $p_1\neq p_2$, respectively. For any $1/2<p\leq 1$, the Schatten-$p$ quasi-norm (or norm) of any matrix is equivalent to the minimization of the mean of the Schatten-$2p$ norms of both factor matrices, for instance $\|X\|_*=\min_{X=UV^T}(\|U\|_F^2+\|V\|_F^2)/2$, which can lead to simpler and more efficient algorithms than conventional methods. Finally, we extend the theoretical results of two factor matrices to the cases of three and more factor matrices. We can see that for any $0<p<1$, the Schatten-$p$ quasi-norm of any matrix is the minimization of the mean of the Schatten-$(\lfloor 1/p\rfloor+1)p$ norms of all factor matrices, where $\lfloor 1/p\rfloor$ denotes the largest integer not exceeding $1/p$.

### III-A Unified Schatten Quasi-Norm Formulations of Two Factor Matrices

###### Theorem 1.

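For any matrix $X\in\mathbb{R}^{m\times n}$ with $\textup{rank}(X)=r\leq d$, it can be decomposed into the product of two much smaller matrices $U\in\mathbb{R}^{m\times d}$ and $V\in\mathbb{R}^{n\times d}$, i.e., $X=UV^T$. For any $p>0$, $p_1>0$ and $p_2>0$ satisfying $1/p_1+1/p_2=1/p$, then

$$\|X\|_{S_p}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_{S_{p_1}}\|V\|_{S_{p_2}}. \tag{4}$$

The detailed proof of Theorem 1 is provided in Section IV-A. From Theorem 1, it is clear that for any $p_1>0$ and $p_2>0$ satisfying $1/p_1+1/p_2=1/p$, the Schatten-$p$ quasi-norm (or norm) of any matrix is equivalent to minimizing the product of the Schatten-$p_1$ norm (or quasi-norm) and Schatten-$p_2$ norm (or quasi-norm) of its two factor matrices.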
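The identity (4) can be checked numerically on a small example. The sketch below is illustrative only: it assumes the factor choice $U=L\Sigma^{p/p_1}$, $V=R\Sigma^{p/p_2}$ built from the thin SVD $X=L\Sigma R^T$ (a choice made for this sketch, under which the product of the two factor norms equals $(\sum_i\sigma_i^p)^{1/p_1+1/p_2}=\|X\|_{S_p}$), and the helper name `schatten` is hypothetical. It also builds a second factorization of the same $X$ and confirms that its product is not smaller.

```python
import numpy as np

def schatten(X, p, tol=1e-10):
    """Schatten-p (quasi-)norm from the singular values, as in (1)."""
    s = np.linalg.svd(X, compute_uv=False)
    s = s[s > tol * s[0]]  # ignore numerically zero singular values
    return float(np.sum(s ** p) ** (1.0 / p))

rng = np.random.default_rng(1)
m, n, r = 8, 6, 3
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix

p1, p2 = 1.0, 2.0
p = 1.0 / (1.0 / p1 + 1.0 / p2)  # 1/p = 1/p1 + 1/p2, here p = 2/3

L, s, Rt = np.linalg.svd(X, full_matrices=False)
L, s, R = L[:, :r], s[:r], Rt[:r].T

# Factor choice assumed in this sketch: split Sigma as Sigma^(p/p1) * Sigma^(p/p2).
U_star = L * s ** (p / p1)  # scales column i of L by s_i^(p/p1)
V_star = R * s ** (p / p2)

lhs = schatten(X, p)
prod_star = schatten(U_star, p1) * schatten(V_star, p2)

# Another factorization of the same X: (U G)(V G^{-T})^T = U V^T.
G = np.eye(r) + 0.1 * rng.standard_normal((r, r))
U_alt = U_star @ G
V_alt = V_star @ np.linalg.inv(G).T
prod_alt = schatten(U_alt, p1) * schatten(V_alt, p2)
```

In this run `prod_star` matches `lhs` up to rounding, while `prod_alt` is bounded below by `lhs`, as (4) predicts.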

Naturally, $p_1$ and $p_2$ may have the same value, i.e., $p_1=p_2=2p$, or different values, i.e., $p_1\neq p_2$. Next, we discuss these two cases for $p_1$ and $p_2$, i.e., $p_1=p_2$ and $p_1\neq p_2$.

#### III-A1 Case of $p_1=p_2$

First, we discuss the case when $p_1=p_2$. In fact, for any given $0<p\leq 1$, there exist infinitely many pairs of positive numbers $p_1$ and $p_2$ satisfying $1/p_1+1/p_2=1/p$, such that the equality (4) holds. By setting the same value for $p_1$ and $p_2$, i.e., $p_1=p_2=2p$, we give a unified scalable equivalent formulation for the Schatten-$p$ quasi-norm (or norm) as follows.

###### Theorem 2.

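Given any matrix $X\in\mathbb{R}^{m\times n}$ of $\textup{rank}(X)=r\leq d$, then the following equalities hold:

$$\|X\|_{S_p}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_{S_{2p}}\|V\|_{S_{2p}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\left(\frac{\|U\|_{S_{2p}}^{2p}+\|V\|_{S_{2p}}^{2p}}{2}\right)^{1/p}. \tag{5}$$

###### Remark 1.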

The detailed proof of Theorem 2 is provided in Section IV-B. From the second equality in (5), we know that, for any $0<p\leq 1$, the Schatten-$p$ quasi-norm (or norm) minimization problems in many low-rank matrix completion and recovery applications can be transformed into the problem of minimizing the mean of the Schatten-$2p$ norms (or quasi-norms) of both much smaller factor matrices. We note that when $1/2<p\leq 1$, the norms of both factor matrices are convex and smooth (see Example 2 below) due to $2p>1$, which can lead to simpler and more efficient algorithms than conventional methods [14, 15, 18, 19, 20, 21].
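The following sketch checks the case $p_1=p_2=2p$ numerically for $p\in\{1/2,\,2/3,\,1\}$. It is an illustration under stated assumptions: the mean form is written as $\big((\|U\|_{S_{2p}}^{2p}+\|V\|_{S_{2p}}^{2p})/2\big)^{1/p}$, which agrees with (6), (7) and (8) for $p=1,\,1/2,\,2/3$; the balanced factors $U=L\Sigma^{1/2}$, $V=R\Sigma^{1/2}$ and the helper names are choices of this sketch. It also shows that rescaling the factors (which leaves $X$ and the product unchanged) can only increase the mean form.

```python
import numpy as np

def schatten(X, p, tol=1e-10):
    """Schatten-p (quasi-)norm from the singular values, as in (1)."""
    s = np.linalg.svd(X, compute_uv=False)
    s = s[s > tol * s[0]]
    return float(np.sum(s ** p) ** (1.0 / p))

def mean_form(U, V, p):
    """((||U||_{S_2p}^{2p} + ||V||_{S_2p}^{2p}) / 2)^(1/p), the mean form assumed here."""
    a, b = schatten(U, 2 * p), schatten(V, 2 * p)
    return ((a ** (2 * p) + b ** (2 * p)) / 2.0) ** (1.0 / p)

rng = np.random.default_rng(2)
m, n, r = 7, 5, 3
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
L, s, Rt = np.linalg.svd(X, full_matrices=False)
L, s, R = L[:, :r], s[:r], Rt[:r].T
U, V = L * np.sqrt(s), R * np.sqrt(s)  # balanced factors with X = U V^T

results = []
for p in (0.5, 2.0 / 3.0, 1.0):
    target = schatten(X, p)
    product = schatten(U, 2 * p) * schatten(V, 2 * p)
    balanced = mean_form(U, V, p)
    unbalanced = mean_form(3.0 * U, V / 3.0, p)  # same X, rescaled factors
    results.append((p, target, product, balanced, unbalanced))
```

The rescaled pair keeps the product formula tight but not the mean formula, which is why the minimization in the mean form selects balanced factors.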

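When $p=1$ and $p_1=p_2=2$, the equalities in Theorem 2 become the following forms.

###### Corollary 1.

Given any matrix $X\in\mathbb{R}^{m\times n}$ with $\textup{rank}(X)=r\leq d$, the following equalities hold:

$$\|X\|_*=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_F\|V\|_F=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\frac{\|U\|_F^2+\|V\|_F^2}{2}. \tag{6}$$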

The bilinear spectral penalty in the second equality of (6) has been widely used in many low-rank matrix completion and recovery problems, such as collaborative filtering [11, 23], RPCA [40], online RPCA [41], and image recovery [42]. Note that the well-known equivalent formulations of the nuclear norm in Corollary 1 are just a special case of Theorem 2, i.e., $p=1$ and $p_1=p_2=2$. In the following, we give two more representative examples for the case of $p_1=p_2$.
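To illustrate why the bilinear spectral penalty in (6) scales well, the sketch below runs plain gradient descent on a small matrix completion objective in which the nuclear norm is replaced by $(\|U\|_F^2+\|V\|_F^2)/2$, so no SVD is needed in any iteration. This is a hedged illustration rather than an algorithm from this report: the squared loss on observed entries, the problem sizes, `lam`, the initialization, and the step-halving rule are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, d = 20, 15, 2, 4
D = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low-rank ground truth
mask = rng.random((m, n)) < 0.6                                # observed index set Omega
lam = 0.1                                                      # regularization weight

def objective(U, V):
    """0.5*||P_Omega(U V^T - D)||_F^2 + lam*(||U||_F^2 + ||V||_F^2)/2."""
    resid = mask * (U @ V.T - D)
    return 0.5 * np.sum(resid ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

U = 0.1 * rng.standard_normal((m, d))
V = 0.1 * rng.standard_normal((n, d))
obj_init = objective(U, V)

step = 0.1
for _ in range(300):
    resid = mask * (U @ V.T - D)
    grad_U = resid @ V + lam * U      # gradient with respect to U
    grad_V = resid.T @ U + lam * V    # gradient with respect to V
    current = objective(U, V)
    # Halve the step until the objective does not increase (simple backtracking).
    while objective(U - step * grad_U, V - step * grad_V) > current and step > 1e-12:
        step *= 0.5
    U, V = U - step * grad_U, V - step * grad_V

obj_final = objective(U, V)
```

Each iteration costs only matrix products with the $m\times d$ and $n\times d$ factors, in contrast to a full SVD of the $m\times n$ matrix.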

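Example 1: When $p=1/2$, and by setting $p_1=p_2=1$ and using Theorem 1, we have

$$\|X\|_{S_{1/2}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_*\|V\|_*.$$

Due to the basic inequality $xy\leq\left(\frac{x+y}{2}\right)^2$ for any real numbers $x$ and $y$, we obtain

$$\|X\|_{S_{1/2}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_*\|V\|_*\leq\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\left(\frac{\|U\|_*+\|V\|_*}{2}\right)^2.$$

Let $U_\star=L_X\Sigma_X^{1/2}$ and $V_\star=R_X\Sigma_X^{1/2}$ as in [1, 2], where $X=L_X\Sigma_XR_X^T$ is the thin SVD of $X$; then we have $X=U_\star V_\star^T$ and

$$\|X\|_{S_{1/2}}=\left(\textup{Tr}^{1/2}(\Sigma_X)\right)^2=\|U_\star\|_*\|V_\star\|_*=\left(\frac{\|U_\star\|_*+\|V_\star\|_*}{2}\right)^2$$

where $\textup{Tr}^{1/2}(\Sigma_X)=\sum_i\sigma_i^{1/2}(X)$. Therefore, under the constraint $X=UV^T$, we have the following property [1, 2].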

###### Property 1.
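
$$\|X\|_{S_{1/2}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_*\|V\|_*=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\left(\frac{\|U\|_*+\|V\|_*}{2}\right)^2. \tag{7}$$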

In our previous papers [1, 2], the scalable formulations in the above equalities are known as the bi-nuclear quasi-norm. In other words, the bi-nuclear quasi-norm is also a special case of Theorem 2, i.e., $p=1/2$ and $p_1=p_2=1$.

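Example 2: When $p=2/3$, and by setting $p_1=p_2=4/3$ and using Theorem 1, we have

$$\|X\|_{S_{2/3}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_{S_{4/3}}\|V\|_{S_{4/3}}.$$

Due to the basic inequality $xy\leq\frac{x^2+y^2}{2}$ for any real numbers $x$ and $y$, then

$$\|X\|_{S_{2/3}}=\min_{X=UV^T}\|U\|_{S_{4/3}}\|V\|_{S_{4/3}}=\min_{X=UV^T}\left(\|U\|_{S_{4/3}}^{2/3}\|V\|_{S_{4/3}}^{2/3}\right)^{3/2}\leq\min_{X=UV^T}\left(\frac{\|U\|_{S_{4/3}}^{4/3}+\|V\|_{S_{4/3}}^{4/3}}{2}\right)^{3/2},$$

where each minimization is over $U\in\mathbb{R}^{m\times d}$ and $V\in\mathbb{R}^{n\times d}$.

Let $U_\star=L_X\Sigma_X^{1/2}$ and $V_\star=R_X\Sigma_X^{1/2}$, where $X=L_X\Sigma_XR_X^T$ is the thin SVD of $X$; then we have $X=U_\star V_\star^T$ and

$$\|X\|_{S_{2/3}}=\left(\textup{Tr}^{2/3}(\Sigma_X)\right)^{3/2}=\|U_\star\|_{S_{4/3}}\|V_\star\|_{S_{4/3}}=\left(\frac{\|U_\star\|_{S_{4/3}}^{4/3}+\|V_\star\|_{S_{4/3}}^{4/3}}{2}\right)^{3/2}.$$

Together with the constraint $X=UV^T$, we thus have the following property.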

###### Property 2.
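
$$\|X\|_{S_{2/3}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_{S_{4/3}}\|V\|_{S_{4/3}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\left(\frac{\|U\|_{S_{4/3}}^{4/3}+\|V\|_{S_{4/3}}^{4/3}}{2}\right)^{3/2}. \tag{8}$$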

#### III-A2 Case of $p_1\neq p_2$

In this part, we discuss the case of $p_1\neq p_2$. Different from the case of $p_1=p_2$, we may set infinitely many different values for $p_1$ and $p_2$. For any given $0<p\leq 1$, there must exist $p_1, p_2>0$, at least one of which is no less than 1 (which means that the norm of one factor matrix can be convex), such that $1/p=1/p_1+1/p_2$. Indeed, for any $0<p\leq 1$, the values of $p_1$ and $p_2$ may be different, e.g., $p_1=1$ and $p_2=2$ for $p=2/3$; thus we give the following unified scalable equivalent formulations for the Schatten-$p$ quasi-norm (or norm).

###### Theorem 3.

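Given any matrix $X\in\mathbb{R}^{m\times n}$ of $\textup{rank}(X)=r\leq d$, and any $0<p\leq 1$, $p_1>0$ and $p_2>0$ satisfying $1/p_1+1/p_2=1/p$, then the following equalities hold:

$$\|X\|_{S_p}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_{S_{p_1}}\|V\|_{S_{p_2}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\left(\frac{p_2\|U\|_{S_{p_1}}^{p_1}+p_1\|V\|_{S_{p_2}}^{p_2}}{p_1+p_2}\right)^{1/p}. \tag{9}$$

###### Remark 2.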

The detailed proof of Theorem 3 is given in Section IV-C. Theorem 2 and Corollary 1 can be viewed as two special cases of Theorem 3, i.e., $p_1=p_2=2p$ and $p_1=p_2=2$, respectively. That is, Theorem 3 is the more general form of Theorem 2 and Corollary 1. From the second equality in (9), we can see that, for any $0<p\leq 1$, the Schatten-$p$ quasi-norm (or norm) minimization problem can be transformed into the problem of minimizing the weighted sum of the Schatten-$p_1$ norm (or quasi-norm) and Schatten-$p_2$ norm (or quasi-norm) of two much smaller factor matrices (see Example 3 and Example 4 below), where the weights of the two terms in the second equality of (9) are $p/p_1=p_2/(p_1+p_2)$ and $p/p_2=p_1/(p_1+p_2)$, respectively.

In the following, we give two representative examples for the case of $p_1\neq p_2$.

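Example 3: When $p=2/3$, and by setting $p_1=1$ and $p_2=2$, and using Theorem 1, then

$$\|X\|_{S_{2/3}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_*\|V\|_F.$$

Moreover,

$$\|U\|_*\|V\|_F=\sqrt{\|U\|_*}\sqrt{\|U\|_*}\,\|V\|_F\overset{a}{\leq}\left(\frac{\sqrt{\|U\|_*}+\sqrt{\|U\|_*}+\|V\|_F}{3}\right)^3=\left(\frac{2\sqrt{\|U\|_*}+\sqrt{\|V\|_F^2}}{3}\right)^3\overset{b}{\leq}\left(\frac{2\|U\|_*+\|V\|_F^2}{3}\right)^{3/2}$$

where the inequality $\overset{a}{\leq}$ holds due to the fact that $x_1x_2x_3\leq\left(\frac{x_1+x_2+x_3}{3}\right)^3$ for any nonnegative real numbers $x_1$, $x_2$ and $x_3$, and the inequality $\overset{b}{\leq}$ follows from Jensen's inequality for the concave function $g(x)=x^{1/2}$.

Let $U_\star=L_X\Sigma_X^{2/3}$ and $V_\star=R_X\Sigma_X^{1/3}$ as in [1], where $X=L_X\Sigma_XR_X^T$ is the thin SVD of $X$; then we have $X=U_\star V_\star^T$ and

$$\|X\|_{S_{2/3}}=\left(\textup{Tr}^{2/3}(\Sigma_X)\right)^{3/2}=\|U_\star\|_*\|V_\star\|_F=\left(\frac{2\|U_\star\|_*+\|V_\star\|_F^2}{3}\right)^{3/2}.$$

Therefore, together with the constraint $X=UV^T$, we have the following property [1].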

###### Property 3.
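
$$\|X\|_{S_{2/3}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_*\|V\|_F=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\left(\frac{2\|U\|_*+\|V\|_F^2}{3}\right)^{3/2}. \tag{10}$$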

In our previous paper [1], the scalable formulations in the above equalities are known as the Frobenius/nuclear hybrid quasi-norm. It is clear that the Frobenius/nuclear hybrid quasi-norm is also a special case of Theorem 3, i.e., $p=2/3$, $p_1=1$ and $p_2=2$. As shown in the above representative examples and our previous papers [1, 2], we can design more efficient algorithms for Schatten-$p$ quasi-norm minimization with $0<p<1$ than conventional methods [14, 15, 18, 19, 20, 21].

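Example 4: When $p=2/5$, and by setting $p_1=1/2$ and $p_2=2$, and using Theorem 1, we have

$$\|X\|_{S_{2/5}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_{S_{1/2}}\|V\|_F.$$

Moreover,

$$\|U\|_{S_{1/2}}\|V\|_F=\left(\|U\|_{S_{1/2}}^{1/4}\right)^4\|V\|_F\overset{a}{\leq}\left(\frac{4\sqrt{\|U\|_{S_{1/2}}^{1/2}}+\sqrt{\|V\|_F^2}}{5}\right)^5\overset{b}{\leq}\left(\frac{4\|U\|_{S_{1/2}}^{1/2}+\|V\|_F^2}{5}\right)^{5/2}$$

where the inequality $\overset{a}{\leq}$ holds due to the familiar inequality of arithmetic and geometric means, and the inequality $\overset{b}{\leq}$ follows from Jensen's inequality for the concave function $g(x)=x^{1/2}$.

Let $U_\star=L_X\Sigma_X^{4/5}$ and $V_\star=R_X\Sigma_X^{1/5}$, where $X=L_X\Sigma_XR_X^T$ is the thin SVD of $X$; then we have $X=U_\star V_\star^T$ and

$$\|X\|_{S_{2/5}}=\left(\textup{Tr}^{2/5}(\Sigma_X)\right)^{5/2}=\|U_\star\|_{S_{1/2}}\|V_\star\|_F=\left(\frac{4\|U_\star\|_{S_{1/2}}^{1/2}+\|V_\star\|_F^2}{5}\right)^{5/2}.$$

With the constraint $X=UV^T$, we thus have the following property.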

###### Property 4.
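
$$\|X\|_{S_{2/5}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\|U\|_{S_{1/2}}\|V\|_F=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{n\times d}:\,X=UV^T}\left(\frac{4\|U\|_{S_{1/2}}^{1/2}+\|V\|_F^2}{5}\right)^{5/2}. \tag{11}$$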
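The two unequal-exponent cases in (10) and (11) can be verified numerically with one routine. The sketch below is illustrative: it assumes the factors $U=L\Sigma^{p/p_1}$ and $V=R\Sigma^{p/p_2}$ from the thin SVD of $X$, and writes the weighted sum as $\big((p_2a^{p_1}+p_1b^{p_2})/(p_1+p_2)\big)^{1/p}$ with $a=\|U\|_{S_{p_1}}$ and $b=\|V\|_{S_{p_2}}$, which reduces to the right-hand sides of (10) and (11) for the two parameter pairs used. The helper names are hypothetical.

```python
import numpy as np

def schatten(X, p, tol=1e-10):
    """Schatten-p (quasi-)norm from the singular values, as in (1)."""
    s = np.linalg.svd(X, compute_uv=False)
    s = s[s > tol * s[0]]
    return float(np.sum(s ** p) ** (1.0 / p))

rng = np.random.default_rng(4)
m, n, r = 9, 6, 3
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
L, s, Rt = np.linalg.svd(X, full_matrices=False)
L, s, R = L[:, :r], s[:r], Rt[:r].T

checks = []
for p1, p2 in [(1.0, 2.0), (0.5, 2.0)]:   # p = 2/3 as in (10), p = 2/5 as in (11)
    p = 1.0 / (1.0 / p1 + 1.0 / p2)
    U = L * s ** (p / p1)                  # assumed factor choice for this sketch
    V = R * s ** (p / p2)
    a, b = schatten(U, p1), schatten(V, p2)
    product = a * b
    weighted = ((p2 * a ** p1 + p1 * b ** p2) / (p1 + p2)) ** (1.0 / p)
    # A rescaled factorization of the same X keeps the product but raises the sum.
    a2, b2 = schatten(2.0 * U, p1), schatten(V / 2.0, p2)
    weighted_rescaled = ((p2 * a2 ** p1 + p1 * b2 ** p2) / (p1 + p2)) ** (1.0 / p)
    checks.append((p, schatten(X, p), product, weighted, weighted_rescaled))
```

For both parameter pairs the product and the weighted sum coincide with $\|X\|_{S_p}$ at the chosen factors, while the rescaled pair gives a larger weighted sum.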

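### III-B Extensions to Multiple Factor Matrices

###### Theorem 4.

For any matrix $X\in\mathbb{R}^{m\times n}$ of $\textup{rank}(X)=r\leq d$, it can be decomposed into the product of three much smaller matrices $U\in\mathbb{R}^{m\times d}$, $V\in\mathbb{R}^{d\times d}$ and $W\in\mathbb{R}^{n\times d}$, i.e., $X=UVW^T$. For any $0<p\leq 1$ and $p_i>0$ for all $i=1,2,3$, satisfying $1/p_1+1/p_2+1/p_3=1/p$, then

$$\|X\|_{S_p}=\min_{U,V,W:\,X=UVW^T}\|U\|_{S_{p_1}}\|V\|_{S_{p_2}}\|W\|_{S_{p_3}}=\min_{U,V,W:\,X=UVW^T}\left(\frac{\|U\|_{S_{p_1}}^{p_1}/p_1+\|V\|_{S_{p_2}}^{p_2}/p_2+\|W\|_{S_{p_3}}^{p_3}/p_3}{1/p}\right)^{1/p}. \tag{12}$$

The detailed proof of Theorem 4 is provided in Section IV-D. From Theorem 4, we can see that for any $0<p\leq 1$ and $p_1,p_2,p_3>0$ satisfying $1/p_1+1/p_2+1/p_3=1/p$, the Schatten-$p$ quasi-norm (or norm) of any matrix is equivalent to minimizing the weighted sum of the Schatten-$p_1$ norm (or quasi-norm), Schatten-$p_2$ norm (or quasi-norm) and Schatten-$p_3$ norm (or quasi-norm) of these three much smaller factor matrices, where the weights of the three terms are $p/p_1$, $p/p_2$ and $p/p_3$, respectively. Similarly, we extend Theorem 4 to the case of more factor matrices as follows.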

###### Theorem 5.

For any matrix $X\in\mathbb{R}^{m\times n}$ of $\textup{rank}(X)=r\leq d$, it can be decomposed into the product of multiple much smaller matrices $U_i$, $i=1,\ldots,M$, i.e., $X=\prod_{i=1}^{M}U_i$. For any $0<p\leq 1$ and $p_i>0$ for all $i=1,\ldots,M$, satisfying $\sum_{i=1}^{M}1/p_i=1/p$, then

$$\|X\|_{S_p}=\min_{U_i:\,X=\prod_{i=1}^{M}U_i}\ \prod_{i=1}^{M}\|U_i\|_{S_{p_i}}=\min_{U_i:\,X=\prod_{i=1}^{M}U_i}\left(\frac{\sum_{i=1}^{M}\|U_i\|_{S_{p_i}}^{p_i}/p_i}{1/p}\right)^{1/p}. \tag{13}$$

The proof of Theorem 5 is very similar to that of Theorem 4 and is thus omitted. From Theorem 5, we know that for any $0<p\leq 1$ and $p_i>0$ for all $i=1,\ldots,M$, satisfying $\sum_{i=1}^{M}1/p_i=1/p$, the Schatten-$p$ quasi-norm (or norm) of any matrix is equivalent to the minimization of the weighted sum of the Schatten-$p_i$ norm (or quasi-norm) of each much smaller factor matrix, where the weights for these terms are $p/p_i$ for all $i=1,\ldots,M$.

Similar to the case of two factor matrices, for any given $0<p\leq 1$, there exist infinitely many positive numbers $p_1$, $p_2$ and $p_3$ such that $1/p_1+1/p_2+1/p_3=1/p$, and the equality (12) holds. By setting the same value for $p_1$, $p_2$ and $p_3$, i.e., $p_1=p_2=p_3=3p$, we give the following unified scalable equivalent formulations for the Schatten-$p$ quasi-norm (or norm).

###### Corollary 2.

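Given any matrix $X\in\mathbb{R}^{m\times n}$ of $\textup{rank}(X)=r\leq d$, then the following equalities hold:

$$\|X\|_{S_p}=\min_{U,V,W:\,X=UVW^T}\|U\|_{S_{3p}}\|V\|_{S_{3p}}\|W\|_{S_{3p}}=\min_{U,V,W:\,X=UVW^T}\left(\frac{\|U\|_{S_{3p}}^{3p}+\|V\|_{S_{3p}}^{3p}+\|W\|_{S_{3p}}^{3p}}{3}\right)^{1/p}. \tag{14}$$

###### Remark 3.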

The detailed proof of Corollary 2 is provided in Section IV-E. From the second equality in (14), we know that, for any $0<p\leq 1$, various Schatten-$p$ quasi-norm minimization problems in many low-rank matrix completion and recovery applications can be transformed into the problem of minimizing the mean of the Schatten-$3p$ norms (or quasi-norms) of three much smaller factor matrices. In addition, we note that when $1/3<p\leq 1$, the norms of the three factor matrices are convex and smooth due to $3p>1$, which can also lead to simpler and more efficient algorithms than conventional methods.

Example 5: In the following, we give a representative example. When $p=1/3$ and $p_1=p_2=p_3=1$, the equalities in Corollary 2 become the following forms [2].

###### Property 5.

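For any matrix $X\in\mathbb{R}^{m\times n}$ of $\textup{rank}(X)=r\leq d$, then the following equalities hold:

$$\|X\|_{S_{1/3}}=\min_{U\in\mathbb{R}^{m\times d},\,V\in\mathbb{R}^{d\times d},\,W\in\mathbb{R}^{n\times d}:\,X=UVW^T}\|U\|_*\|V\|_*\|W\|_*=\min_{U,V,W:\,X=UVW^T}\left(\frac{\|U\|_*+\|V\|_*+\|W\|_*}{3}\right)^3. \tag{15}$$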

From Property 5, we can see that the tri-nuclear quasi-norm defined in our previous paper [2] is also a special case of Corollary 2.
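As a small numerical illustration of (15), the sketch below builds three factors from the thin SVD $X=L\Sigma R^T$, namely $U=L\Sigma^{1/3}$, $V=\Sigma^{1/3}$ and $W=R\Sigma^{1/3}$ (a choice assumed for this sketch, with $d=r$), and compares the product and the cubed mean of their nuclear norms with $\|X\|_{S_{1/3}}$. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 7, 6, 3
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix

L, s, Rt = np.linalg.svd(X, full_matrices=False)
L, s, R = L[:, :r], s[:r], Rt[:r].T

c = np.cbrt(s)                        # sigma_i^(1/3)
U, V, W = L * c, np.diag(c), R * c    # X = U V W^T with d = r

def nuc(A):
    """Nuclear norm (sum of singular values)."""
    return float(np.linalg.norm(A, 'nuc'))

target = float(np.sum(c)) ** 3        # ||X||_{S_{1/3}} = (sum_i sigma_i^(1/3))^3
product = nuc(U) * nuc(V) * nuc(W)
mean_cubed = ((nuc(U) + nuc(V) + nuc(W)) / 3.0) ** 3
```

Each factor has nuclear norm $\sum_i\sigma_i^{1/3}$, so both expressions reduce to the same value.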

From Theorem 2, we know that for any $0<p\leq 1$, the Schatten-$p$ quasi-norm (or norm) of any matrix is equivalent to minimizing the mean of the Schatten-$2p$ norms of both factor matrices, and Corollary 2 gives the analogous result with the Schatten-$3p$ norms of three factor matrices. In other words, if $1/2<p\leq 1$ or $1/3<p\leq 1$, the original Schatten-$p$ quasi-norm (or norm) minimization problem can be transformed into a simpler one involving only the convex and smooth norms of two or three factor matrices, respectively. In addition, we extend the results of Theorem 2 and Corollary 2 to the case of more factor matrices, as shown in Corollary 3 below. The proof of Corollary 3 is very similar to that of Corollary 2 and is thus omitted. That is, for any $0<p<1$, the Schatten-$p$ quasi-norm of any matrix is equivalent to the minimization of the mean of the Schatten-$(Mp)$ norms of all factor matrices, where $M=\lfloor 1/p\rfloor+1$ and $\lfloor 1/p\rfloor$ denotes the largest integer not exceeding $1/p$. We emphasize that the norms of all factor matrices are convex and smooth due to $Mp>1$, which can help us to design simpler and more efficient algorithms.

###### Corollary 3.

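Given any matrix $X\in\mathbb{R}^{m\times n}$ of $\textup{rank}(X)=r\leq d$, then the following equalities hold:

$$\|X\|_{S_p}=\min_{U_i:\,X=\prod_{i=1}^{M}U_i}\ \prod_{i=1}^{M}\|U_i\|_{S_{Mp}}=\min_{U_i:\,X=\prod_{i=1}^{M}U_i}\left(\frac{\sum_{i=1}^{M}\|U_i\|_{S_{Mp}}^{Mp}}{M}\right)^{1/p}. \tag{16}$$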
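The number of factors suggested by the discussion above, $M=\lfloor 1/p\rfloor+1$, is the smallest integer with $Mp>1$. The short sketch below tabulates $M$ and the resulting exponent $Mp$ for a few values of $p$; exact rational arithmetic is used to avoid floating-point issues at values such as $p=1/3$, and the function name `num_factors` is hypothetical.

```python
from fractions import Fraction
from math import floor

def num_factors(p):
    """M = floor(1/p) + 1, the smallest integer M with M * p > 1."""
    return floor(1 / p) + 1

rows = []
for p in (Fraction(2, 3), Fraction(1, 2), Fraction(2, 5), Fraction(1, 3), Fraction(1, 10)):
    M = num_factors(p)
    rows.append((p, M, M * p))  # (p, number of factors, Schatten exponent M*p)

for p, M, Mp in rows:
    print(f"p = {p}: M = {M}, factor norms are Schatten-{Mp}")
```

For example, $p=2/3$ needs two factors with Schatten-$4/3$ norms (as in Example 2), whereas $p=1/2$ needs three factors with Schatten-$3/2$ norms, since two factors would only give the nuclear norm ($2p=1$).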

## IV Proofs

In this section, we give the detailed proofs for some important theorems and corollaries. We first introduce several important inequalities, namely Jensen's inequality, Hölder's inequality and Young's inequality, that we use throughout our proofs.

###### Lemma 1 (Jensen’s inequality).

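Assume that the function $g:\mathbb{R}_+\to\mathbb{R}_+$ is a continuous concave function on $[0,+\infty)$. For all $t_i\geq 0$ satisfying $\sum_{i=1}^{n}t_i=1$, and any $x_i\in\mathbb{R}_+$ for $i=1,\ldots,n$, then

$$g\left(\sum_{i=1}^{n}t_ix_i\right)\geq\sum_{i=1}^{n}t_ig(x_i). \tag{17}$$

###### Lemma 2 (Hölder’s inequality).

For any $p,q>1$ satisfying $1/p+1/q=1$, then for any $x_i$ and $y_i$, $i=1,\ldots,n$,

$$\sum_{i=1}^{n}|x_iy_i|\leq\left(\sum_{i=1}^{n}|x_i|^p\right)^{1/p}\left(\sum_{i=1}^{n}|y_i|^q\right)^{1/q} \tag{18}$$

with equality iff there is a constant $c$ such that each $|x_i|^p=c\,|y_i|^q$.

###### Lemma 3 (Young’s inequality).

Let $a,b\geq 0$ and $1<p,q<\infty$ be such that $1/p+1/q=1$. Then

$$\frac{a^p}{p}+\frac{b^q}{q}\geq ab \tag{19}$$

with equality iff $a^p=b^q$.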
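The three inequalities (17), (18) and (19) can be sanity-checked on random data with the standard library alone. The sketch below is only an illustration: the sample sizes, the exponents $p=3$, $q=3/2$, the concave function $g(x)=x^{1/2}$ and all variable names are choices made here, not taken from the proofs.

```python
import random

random.seed(0)
n = 5
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [random.uniform(0.0, 10.0) for _ in range(n)]

# Jensen (17) for the concave function g(x) = x^(1/2) with convex weights t_i.
w = [random.random() for _ in range(n)]
t = [wi / sum(w) for wi in w]
g = lambda v: v ** 0.5
jensen_gap = g(sum(ti * xi for ti, xi in zip(t, x))) - sum(ti * g(xi) for ti, xi in zip(t, x))

# Hoelder (18) with conjugate exponents 1/p + 1/q = 1.
p, q = 3.0, 1.5
holder_gap = (sum(abs(xi) ** p for xi in x) ** (1 / p)
              * sum(abs(yi) ** q for yi in y) ** (1 / q)
              - sum(abs(xi * yi) for xi, yi in zip(x, y)))

# Young (19): a^p/p + b^q/q >= ab, with equality when a^p = b^q.
a, b = 1.7, 0.4
young_gap = a ** p / p + b ** q / q - a * b
b_eq = a ** (p / q)                      # chosen so that b_eq^q = a^p
young_eq_gap = a ** p / p + b_eq ** q / q - a * b_eq
```

All three gaps are nonnegative, and `young_eq_gap` vanishes up to rounding, matching the stated equality condition.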

### IV-A Proof of Theorem 1

Before giving a complete proof for Theorem 1, we first present and prove the following lemma.

###### Lemma 4.

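Suppose that $Z\in\mathbb{R}^{m\times n}$ is a matrix of rank $r\leq\min(m,n)$, and we denote its thin SVD by $Z=L_Z\Sigma_ZR_Z^T$, where $L_Z\in\mathbb{R}^{m\times r}$, $R_Z\in\mathbb{R}^{n\times r}$ and $\Sigma_Z\in\mathbb{R}^{r\times r}$. For any $A\in\mathbb{R}^{r\times r}$ satisfying $AA^T=A^TA=I_{r\times r}$, and the given $p$ $(0<p\leq 1)$, then $(A\Sigma_ZA^T)_{k,k}\geq 0$ for all $k=1,\ldots,r$, and

$$\textup{Tr}^p(A\Sigma_ZA^T)\geq\textup{Tr}^p(\Sigma_Z)=\|Z\|_{S_p}^p,$$

where $\textup{Tr}^p(B)=\sum_iB_{ii}^p$.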

###### Proof:

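For any $k\in\{1,\ldots,r\}$, we have $(A\Sigma_ZA^T)_{k,k}=\sum_ia_{ki}^2\sigma_i\geq 0$, where $\sigma_i\geq 0$ is the $i$-th singular value of $Z$. Then

$$\textup{Tr}^p(A\Sigma_ZA^T)=\sum_k\left(\sum_ia_{ki}^2\sigma_i\right)^p. \tag{20}$$

Recall that $g(x)=x^p$ with $0<p<1$ is a concave function on $\mathbb{R}_+$. By using Jensen's inequality [43], as stated in Lemma 1, and $\sum_ia_{ki}^2=1$ for any $k\in\{1,\ldots,r\}$, we have

$$\left(\sum_ia_{ki}^2\sigma_i\right)^p\geq\sum_ia_{ki}^2\sigma_i^p.$$

Using the above inequality and $\sum_ka_{ki}^2=1$ for any $i\in\{1,\ldots,r\}$, (20) can be rewritten as

$$\textup{Tr}^p(A\Sigma_ZA^T)=\sum_k\left(\sum_ia_{ki}^2\sigma_i\right)^p\geq\sum_k\sum_ia_{ki}^2\sigma_i^p=\sum_i\sigma_i^p=\textup{Tr}^p(\Sigma_Z)=\|Z\|_{S_p}^p. \tag{21}$$

In addition, when $g(x)=x$, i.e., $p=1$, we obtain

$$\left(\sum_ia_{ki}^2\sigma_i\right)^p=\sum_ia_{ki}^2\sigma_i,$$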

which means that the inequality (21) is still satisfied. This completes the proof. ∎
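The statement of Lemma 4 can be illustrated numerically. The sketch below assumes that $A$ is an $r\times r$ orthogonal matrix (generated here by a QR factorization of a random matrix), which is the condition under which both $\sum_ia_{ki}^2=1$ and $\sum_ka_{ki}^2=1$ used in the proof hold; the values of `r`, `p` and the random singular values are arbitrary choices of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
r, p = 5, 0.5
sigma = np.sort(rng.random(r))[::-1] + 0.1        # positive "singular values"
A, _ = np.linalg.qr(rng.standard_normal((r, r)))  # random orthogonal matrix

diag_entries = np.diag(A @ np.diag(sigma) @ A.T)  # (A Sigma A^T)_{kk}
row_mix = (A ** 2) @ sigma                        # sum_i a_ki^2 sigma_i, as in the proof

lhs = float(np.sum(diag_entries ** p))            # Tr^p(A Sigma A^T)
rhs = float(np.sum(sigma ** p))                   # Tr^p(Sigma) = ||Z||_{S_p}^p
```

Here `lhs` is at least `rhs`, and the two ways of computing the diagonal entries agree, mirroring the first step of the proof.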

Proof of Theorem 1:

###### Proof:

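Let $U=L_U\Sigma_UR_U^T$ and $V=L_V\Sigma_VR_V^T$ be the thin SVDs of $U$ and $V$, respectively, where $L_U\in\mathbb{R}^{m\times d}$, $L_V\in\mathbb{R}^{n\times d}$, and $R_U,\Sigma_U,R_V,\Sigma_V\in\mathbb{R}^{d\times d}$.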