Fixed point and Bregman iterative methods for matrix rank minimization

Fixed point and Bregman iterative methods for matrix rank minimization

Shiqian Ma Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027. Email: {sm2756, goldfarb, lc2161}@columbia.edu
Research supported in part by NSF Grant DMS 06-06712, ONR Grants N00014-03-0514 and N00014-08-1-1118, and DOE Grants DE-FG01-92ER-25126 and DE-FG02-08ER-58562.
Donald Goldfarb Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027. Email: {sm2756, goldfarb, lc2161}@columbia.edu
Research supported in part by NSF Grant DMS 06-06712, ONR Grants N00014-03-0514 and N00014-08-1-1118, and DOE Grants DE-FG01-92ER-25126 and DE-FG02-08ER-58562.
Lifeng Chen Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027. Email: {sm2756, goldfarb, lc2161}@columbia.edu
Research supported in part by NSF Grant DMS 06-06712, ONR Grants N00014-03-0514 and N00014-08-1-1118, and DOE Grants DE-FG01-92ER-25126 and DE-FG02-08ER-58562.
Submitted October 27, 2008, Revised May 7, 2009
Abstract

The linearly constrained matrix rank minimization problem is widely applicable in many fields such as control, signal processing and system identification. The tightest convex relaxation of this problem is the linearly constrained nuclear norm minimization. Although the latter can be cast as a semidefinite programming problem, such an approach is computationally expensive to solve when the matrices are large. In this paper, we propose fixed point and Bregman iterative algorithms for solving the nuclear norm minimization problem and prove convergence of the first of these algorithms. By using a homotopy approach together with an approximate singular value decomposition procedure, we get a very fast, robust and powerful algorithm, which we call FPCA (Fixed Point Continuation with Approximate SVD), that can solve very large matrix rank minimization problems 111The code can be downloaded from http://www.columbia.edu/~sm2756/FPCA.htm for non-commercial use.. Our numerical results on randomly generated and real matrix completion problems demonstrate that this algorithm is much faster and provides much better recoverability than semidefinite programming solvers such as SDPT3. For example, our algorithm can recover matrices of rank 50 with a relative error of in about 3 minutes by sampling only 20 percent of the elements. We know of no other method that achieves as good recoverability. Numerical experiments on online recommendation, DNA microarray data set and image inpainting problems demonstrate the effectiveness of our algorithms.

Keywords:
Matrix Rank Minimization Matrix Completion Problem Nuclear Norm Minimization Fixed Point Iterative Method Bregman Distances Singular Value Decomposition
journal: Math. Program., Ser. A

AMS subject classification. 65K05, 90C25, 90C06, 93C41, 68Q32

1 Introduction

The matrix rank minimization problem can be written as

 minrank(X)s.t.X∈C,

where and is a convex set. This model has many applications such as determining a low-order controller for a plant (ElGhaoui-Gahinet-1993) and a minimum order linear system realization (Fazel-Hindi-Boyd-2001), and solving low-dimensional Euclidean embedding problems (Linial-London-Rabinovich-1995).

In this paper, we are interested in methods for solving the affinely constrained matrix rank minimization problem

 minrank(X)s.t.A(X)=b, (1.1)

where is the decision variable, and the linear map and vector are given.

The matrix completion problem

 minrank(X)s.t.Xij=Mij,(i,j)∈Ω (1.2)

is a special case of (1.1), where and are both matrices and is a subset of index pairs The so called collaborative filtering problem (Rennie-Srebro-2005; Srebro-thesis-2004) can be cast as a matrix completion problem. Suppose users in an online survey provide ratings of some movies. This yields a matrix with users as rows and movies as columns whose -th entry is the rating given by the -th user to the -th movie. Since most users rate only a small portion of the movies, we typically only know a small subset of the entries. Based on the known ratings of a user, we want to predict the user’s ratings of the movies that the user did not rate; i.e., we want to fill in the missing entries of the matrix. It is commonly believed that only a few factors contribute to an individual’s tastes or preferences for movies. Thus the rating matrix is likely to be of numerical low rank in the sense that relatively few of the top singular values account for most of the sum of all of the singular values. Finding such a low-rank matrix corresponds to solving the matrix completion problem (1.2).

1.1 Connections to compressed sensing

When the matrix is diagonal, problem (1.1) reduces to the cardinality minimization problem

 min∥x∥0s.t.Ax=b, (1.3)

where and denotes the number of nonzeros in the vector This problem finds the sparsest solution to an underdetermined system of equations and has a wide range of applications in signal processing. This problem is NP-hard (Natarajan-1995). To get a more computationally tractable problem, we can replace by its convex envelope.

Definition 1

The convex envelope of a function is defined as the largest convex function such that for all (see e.g., (Hiriart-Urruty-Lemarechal-1993)).

It is well known that the convex envelope of is , the norm of , which is the sum of the absolute values of all components of . Replacing the objective function in (1.3) by yields the so-called basis pursuit problem

 min∥x∥1s.t.Ax=b. (1.4)

The basis pursuit problem has received an increasing amount of attention since the emergence of the field of compressed sensing (CS) (Candes-Romberg-Tao-2006; Donoho-2006). Compressed sensing theories connect the NP-hard problem (1.3) to the convex and computationally tractable problem (1.4) and provide guarantees for when an optimal solution to (1.4) gives an optimal solution to (1.3). In the cardinality minimization and basis pursuit problems (1.3) and (1.4), is a vector of measurements of the signal obtained using the sampling matrix . The main result of compressed sensing is that when the signal is sparse, i.e., we can recover the signal by solving (1.4) with a very limited number of measurements, i.e., , when is a Gaussian random matrix or when it corresponds to a partial Fourier transformation. Note that if is contaminated by noise, the constraint in (1.4) must be relaxed, resulting in either the problem

 min∥x∥1s.t.∥Ax−b∥2≤θ (1.5)

or its Lagrangian version

 minμ∥x∥1+12∥Ax−b∥22, (1.6)

where and are parameters and denotes the Euclidean norm of a vector .. Algorithms for solving (1.4) and its variants (1.5) and (1.6) have been widely investigated and many algorithms have been suggested including convex optimization methods ((Candes-Romberg-2005-l1-magic; Figueiredo-Nowak-Wright-2007; Hale-Yin-Zhang-2007; Kim-Koh-Lustig-Boyd-Gorinevsky-2007; vandenBerg-Friedlander-2008)) and heuristic methods ((Tibshirani-1996; Donoho-Tsaig-Drori-Starck-2006; Tropp-2006; Donoho-Tsaig-2006; Dai-Milenkovie-2008)).

1.2 Nuclear norm minimization

The rank of a matrix is the number of its positive singular values. The matrix rank minimization (1.1) is NP-hard in general due to the combinational nature of the function . Similar to the cardinality function , we can replace by its convex envelope to get a convex and more computationally tractable approximation to (1.1). It turns out that the convex envelope of on the set is the nuclear norm (Fazel-thesis-2002), i.e., the nuclear norm is the best convex approximation of the rank function over the unit ball of matrices with norm less than one, where is the operator norm of . The nuclear norm and operator norm are defined as follows.

Definition 2

Nuclear norm and Operator norm. Assume that the matrix has positive singular values of . The nuclear norm of is defined as the sum of its singular values, i.e.,

 ∥X∥∗:=r∑i=1σi(X).

The operator norm of matrix is defined as the largest singular value of , i.e.,

 ∥X∥2:=σ1(X).

The nuclear norm is also known as Schatten 1-norm or Ky Fan norm. Using it as an approximation to in (1.1) yields the nuclear norm minimization problem

 min∥X∥∗s.t.A(X)=b. (1.7)

As in the basis pursuit problem, if is contaminated by noise, the constraint must be relaxed, resulting in either the problem

 min∥X∥∗s.t.∥A(X)−b∥2≤θ

or its Lagrangian version

 minμ∥X∥∗+12∥A(X)−b∥22, (1.8)

where and are parameters.

Note that if we write in vector form by stacking the columns of in a single vector , then we get the following equivalent formation of (1.7):

 min∥X∥∗s.t.A vec(X)=b, (1.9)

where is the matrix corresponding to the linear map An important question is: when will an optimal solution to the nuclear norm minimization problem (1.7) give an optimal solution to matrix rank minimization problem (1.1). In response to this question, Recht et al.(Recht-Fazel-Parrilo-2007) proved that if the entries of are suitably random, e.g., i.i.d. Gaussian, then with very high probability, most matrices of rank can be recovered by solving the nuclear norm minimization (1.7) or equivalently, (1.9), whenever where is a positive constant.

For the matrix completion problem (1.2), the corresponding nuclear norm minimization problem is

 min∥X∥∗s.t.Xij=Mij,(i,j)∈Ω. (1.10)

Candès et al.(Candes-Recht-2008) proved the following result.

Theorem 1.1

Let be an matrix of rank with SVD

 M=r∑k=1σkukv⊤k,

where the family is selected uniformly at random among all families of orthonormal vectors, and similarly for the family . Let . Suppose we observe entries of with locations sampled uniformly at random. Then there are constants and such that if

 m≥Cn5/4rlogn,

the minimizer to the problem (1.10) is unique and equal to with probability at least . In addition, if , then the recovery is exact with probability at least provided that

 m≥Cn6/5rlogn.

This theorem states that a surprisingly small number of entries are sufficient to complete a low-rank matrix with high probability.

Recently, this result was strengthened by Candès and Tao in (Candes-Tao-2009), where it is proved that under certain incoherence conditions, the number of samples that are required is only

The dual problem corresponding to the nuclear norm minimization problem (1.7) is

 maxb⊤zs.t.∥A∗(z)∥2≤1, (1.11)

where is the adjoint operator of . Both (1.7) and (1.11) can be rewritten as equivalent semidefinite programming (SDP) problems. The SDP formulation of (1.7) is:

 minX,W1,W212(Tr(W1)+Tr(W2))s.t.[W1XX⊤W2]⪰0A(X)=b, (1.12)

where denotes the trace of the square matrix . The SDP formulation of (1.11) is:

 maxzb⊤zs.t.[ImA∗(z)A∗(z)⊤In]⪰0. (1.13)

Thus to solve (1.12) and (1.13), we can use SDP solvers such as SeDuMi (Sturm-1999) and SDPT3 (Tutuncu-Toh-Todd-2003) to solve (1.12) and (1.13). Note that the number of variables in (1.12) is . SDP solvers cannot usually solve a problem when and are both much larger than

Recently, Liu and Vandenberghe (Liu-Vandenberghe-2008) proposed an interior-point method for another nuclear norm approximation problem

 min∥A(x)−B∥∗, (1.14)

where and

 A(x)=x1A1+x2A2+⋯+xpAp

is a linear mapping from to The equivalent SDP formulation of (1.14) is

 minx,W1,W212(Tr(W1)+Tr(W2))s.t.[W1(A(x)−B)⊤A(x)−BW2]⪰0. (1.15)

Liu and Vandenberghe (Liu-Vandenberghe-2008) proposed a customized method for computing the scaling direction in an interior point method for solving the SDP (1.15). The complexity of each iteration in their method was reduced from to when and ; thus they were able to solve problems up to dimension

Another algorithm for solving (1.7) is due to Burer and Monteiro (Burer-Monteiro-2003; Burer-Monteiro-2005), (see also Rennie and Srebro (Rennie-Srebro-2005; Srebro-thesis-2004)). This algorithm uses the low-rank factorization of the matrix , where and solves the optimization problem

 minL,R12(∥L∥2F+∥R∥2F)s.t.A(LR⊤)=b, (1.16)

where denotes the Frobenius norm of the matrix :

 ∥X∥F:=(r∑i=1σ2i)1/2=(∑i,jX2ij)1/2=(Tr(XX⊤))1/2.

It is known that as long as is chosen to be sufficiently larger than the rank of the optimal solution matrix of the nuclear norm problem (1.7), this low-rank factorization problem is equivalent to the nuclear norm problem (1.7) (see e.g., (Recht-Fazel-Parrilo-2007)). The advantage of this low-rank factorization formulation is that both the objective function and the constraints are differentiable. Thus gradient-based optimization algorithms such as conjugate gradient algorithms and augmented Lagrangian algorithms can be used to solve this problem. However, the constraints in this problem are nonconvex, so one can only be assured of obtaining a local minimizer. Also, how to choose is still an open question.

One very interesting algorithm is the so called singular value thresholding algorithm (SVT) (Cai-Candes-Shen-2008) which appeared almost simultaneously with our work. SVT is inspired by the linearized Bregman algorithms for compressed sensing and -regularized problems. In (Cai-Candes-Shen-2008) it is shown that SVT is efficient for large matrix completion problems. However, SVT only works well for very low rank matrix completion problems. For problems where the matrices are not of very low rank, SVT is slow and not robust therefore often fails.

Our algorithms have some similarity with the SVT algorithm in that they make use of matrix shrinkage (see Section 2). However, other than that, they are greatly different. All of our methods are based on a fixed point continuation (FPC) algorithm which uses an operator splitting technique for solving (1.8). By adopting a Monte Carlo approximate SVD in the FPC, we get an algorithm, which we call FPCA (Fixed Point Continuation with Approximate SVD), that usually gets the optimal solution to (1.1) even if the condition of Theorem 1.1, or those for the affine constrained case, are violated. Moreover, our algorithm is much faster than state-of-the-art SDP solvers such as SDPT3 applied to (1.12). Also, FPCA can recover matrices of moderate rank that cannot be recovered by SDPT3, SVT, etc. with the same amount of samples. For example, for matrices of size and rank 50, FPCA can recover them with a relative error of in about 3 minutes by sampling only 20 percent of the matrix elements. As far as we know, there is no other method that has as good a recoverability property.

1.3 Outline and Notation

Outline. The rest of this paper is organized as follows. In Section 2 we review the fixed-point continuation algorithm for -regularized problems. In Section 3 we give an analogous fixed-point iterative algorithm for the nuclear norm minimization problem and prove that it converges to an optimal solution. In Section 4 we discuss a continuation technique for accelerating the convergence of our algorithm. In Section 5 we propose a Bregman iterative algorithm for nuclear norm minimization extending the approach in (Yin-Osher-Goldfarb-Darbon-2008) for compressed sensing to the rank minimization problem. In Section 6 we incorporate a Monte-Carlo approximate SVD procedure into our fixed-point continuation algorithm to speed it up and improve its ability to recover low-rank matrices. Numerical results for both synthesized matrices and real problems are given in Section 7. We give conclusions in Section 8.

Notation. Throughout this paper, we always assume that the singular values are arranged in nonincreasing order, i.e., denotes the subdifferential of the function and is the gradient of function at the point . denotes the diagonal matrix whose diagonal elements are the elements of the vector . is the signum function of , i.e.,

 sgn(t):=⎧⎪⎨⎪⎩+1 if t>0,0 if t=0,−1 if t<0,

while the signum multifunction of is

 SGN(t):=∂|t|=⎧⎪⎨⎪⎩{+1}%ift>0, [−1,1] if t=0,{−1} if t<0.

We use to denote the elementwise multiplication of two vectors and . We use to denote the submatrix of consisting of the -th to -th column of . We use to denote the nonnegative orthant in

2 Fixed point iterative algorithm

Our fixed point iterative algorithm for solving (1.8) is the following simple two-line algorithm:

 {Yk=Xk−τg(Xk)Xk+1=Sτμ(Yk), (2.1)

where is the matrix shrinkage operator which will be defined later.

Our algorithm (2.1) is inspired by the fixed point iterative algorithm proposed in (Hale-Yin-Zhang-2007) for the -regularized problem (1.6). The idea behind this algorithm is an operator splitting technique. Note that is an optimal solution to (1.6) if and only if

 0∈μSGN(x∗)+g∗, (2.2)

where . For any , (2.2) is equivalent to

 0∈τμSGN(x∗)+τg(x∗). (2.3)

Note that the operator on the right hand side of (2.3) can be split into two parts: where and .

Letting , (2.3) is equivalent to

 0∈T1(x∗)−y=τμSGN(x∗)+x∗−y. (2.4)

Note that (2.4) is actually the optimality conditions for the following convex problem

 minx∗τμ∥x∗∥1+12∥x∗−y∥22. (2.5)

This problem has a closed form optimal solution given by the so called shrinkage operator:

 x∗=~sν(y),

where and shrinkage operator is given by

 ~sν(⋅)=sgn(⋅)⊙max{|⋅|−ν,0}. (2.6)

Thus, the fixed point iterative algorithm is given by

 xk+1=~sτμ(xk−τgk). (2.7)

Hale et al.(Hale-Yin-Zhang-2007) proved global and finite convergence of this algorithm to the optimal solution of the -regularized problem (1.6).

Motivated by this work, we develop a fixed point iterative algorithm for (1.8). Since the objective function in (1.8) is convex, is the optimal solution to (1.8) if and only if

 0∈μ∂∥X∗∥∗+g(X∗), (2.8)

where . Note that if the Singular Value Decomposition (SVD) of is , where then (see e.g., (Borwein-Lewis-2000-book; Bach-2008))

 ∂∥X∥∗={UV⊤+W:U⊤W=0,WV=0,∥W∥≤1}.

Hence, we get the following optimality conditions for (1.8):

Theorem 2.1

The matrix with singular value decomposition is optimal for the problem (1.8) if and only if there exists a matrix such that

 μ(UV⊤+W)+g(X)=0, (2.9a) U⊤W=0,WV=0,∥W∥2≤1. (2.9b)

Now based on the optimality conditions (2.8), we can develop a fixed point iterative scheme for solving (1.8) by adopting the operator splitting technique described at the beginning of this section. Note that (2.8) is equivalent to

 0∈τμ∂∥X∗∥∗+X∗−(X∗−τg(X∗)) (2.10)

for any . If we let

 Y∗=X∗−τg(X∗),

then (2.10) is reduced to

 0∈τμ∂∥X∗∥∗+X∗−Y∗, (2.11)

i.e., is the optimal solution to

 minX∈Rm×nτμ∥X∥∗+12∥X−Y∗∥2F (2.12)

In the following we will prove that the matrix shrinkage operator applied to gives the optimal solution to (2.12). First, we need the following definitions.

Definition 3 (Nonnegative Vector Shrinkage Operator)

Assume . For any , the nonnegative vector shrinkage operator is defined as

 sν(x):=¯x, with ¯xi={xi−ν, if xi−ν>00, o.w.
Definition 4 (Matrix Shrinkage Operator)

Assume and the SVD of is given by , For any , the matrix shrinkage operator is defined as

 Sν(X):=UDiag(¯σ)V⊤,with ¯σ=sν(σ).
Theorem 2.2

Given a matrix with , let its Singular Value Decomposition (SVD) be , where , and a scalar . Then

 X:=Sν(Y)=UYDiag(sν(γ))V⊤Y (2.13)

is an optimal solution of the problem

 minX∈Rm×nf(X):=ν∥X∥∗+12∥X−Y∥2F. (2.14)
Proof

Without loss of generality, we assume . Suppose that the solution to problem (2.14) has the SVD where . Hence, must satisfy the optimality conditions for (2.14) which are

 0∈ν∂∥X∥∗+X−Y;

i.e., there exists a matrix

 W=¯U[Diag(¯σ)0]¯V⊤,

where , and both and are orthogonal matrices, such that

 0=ν(UV⊤+W)+X−Y. (2.15)

Hence,

 ^U[νI+Diag(σ)000νDiag(¯σ)0]^V⊤−UYDiag(γ)V⊤Y=0. (2.16)

To verify that (2.13) satisfies (2.16), consider the following two cases:

Case 1: In this case, choosing as above, with and , where is a vector of ones, and choosing (i.e., ) satisfies (2.16).

Case 2: In this case, by choosing and and satisfy (2.16).

Note that in both cases, can be written as the form in (2.13) based on the way we construct . ∎

Based on the above we obtain the fixed point iterative scheme (2.1) stated at the beginning of this section for solving problem (1.8).

Moreover, from the discussion following Theorem 2.1 we have

Corollary 1

is an optimal solution to problem (1.8) if and only if , where

3 Convergence results

In this section, we analyze the convergence properties of the fixed point iterative scheme (2.1). Before we prove the main convergence result, we need some lemmas.

Lemma 1

The shrinkage operator is non-expansive, i.e., for any and ,

 ∥Sν(Y1)−Sν(Y2)∥F≤∥Y1−Y2∥F. (3.1)

Moreover,

 ∥Y1−Y2∥F=∥Sν(Y1)−Sν(Y2)∥F⟺Y1−Y2=Sν(Y1)−Sν(Y2). (3.2)
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters