Decomposition Approach for Low-rank Matrix Completion
Abstract
In this paper, we describe a low-rank matrix completion method based on matrix decomposition. An incomplete matrix is decomposed into submatrices, which are filled using a proposed trimming step and then recombined to form a low-rank completed matrix. This divide-and-conquer approach can significantly reduce computational complexity and storage requirements. Moreover, the proposed decomposition can be naturally incorporated into any existing matrix completion method to attain further gain. Unlike most existing approaches, the proposed method is based on neither norm minimization nor the SVD. It can therefore be applied beyond the real domain, to matrices over arbitrary fields, including finite fields.
1 Introduction
Consider a large matrix in which only a small portion of the entries is known. An interesting problem is to fill in the missing entries under the assumption that the matrix has low rank. This problem has several interesting applications, including the so-called collaborative filtering problem [1]. An example is the famous Netflix challenge, where a huge matrix represents the ratings given to movies by users. Of course, a typical user rates only very few movie titles. Therefore, an algorithm is needed to complete the matrix and thereby predict the ratings of all movies by all users.
It has been shown theoretically that, under certain assumptions, the matrix can be recovered with very high accuracy [2, 3, 4]. These approaches convert the rank minimization problem into a nuclear norm minimization problem, which can be solved using semidefinite programming (SDP). However, the complexity still grows rather rapidly with the size of the matrix. Several efficient algorithms have been proposed, including Singular Value Thresholding (SVT) [5], Atomic Decomposition for Minimum Rank Approximation (ADMiRA) [6], Fixed Point Continuation with Approximate SVD (FPCA) [7], Accelerated Proximal Gradient (APG) [8], Subspace Evolution and Transfer (SET) [9], Singular Value Projection (SVP) [10], OptSpace [4], and LMaFit [11], where OptSpace and SET are based on Grassmann manifold optimization, SVT and SVP use iterative hard thresholding (IHT) to facilitate matrix shrinkage, FPCA utilizes a Bregman iterative algorithm and a Monte Carlo approximate SVD, and LMaFit adopts successive over-relaxation (SOR).
In this paper, we propose a decomposition method that allows a very efficient divide-and-conquer approach when the known entries are relatively few. A simple “trimming” method is proposed to recover each decomposed “cluster” matrix. The decomposition method can also be combined with any other existing matrix completion technique to yield further gain. One advantage of the proposed approach is that, unlike most existing approaches, it does not use the SVD but relies only on basic vector operations. Therefore, the approach is immediately applicable to matrices over any field (including finite fields). This opens up opportunities for new applications.
The rest of the paper is organized as follows. In the next section, we will fix our notation, describe the problem precisely, and present several properties to be used in the later sections. Sections 3 and 4 will describe the decomposition procedures and present our main results. Section 5 will describe the trimming process.
2 Minimum Rank of Incomplete Matrix
Let us start with a few notes on our notation. When things are clear, lines of partition in matrices will not be shown; the sign may represent an unknown entry, a row or column of unknown entry, a matrix of unknown entry, etc; and similar for the sign.
Given a finite size matrix over field to be completed, let
(2.1) |
If is already completed, then . We define
(2.2) |
Such minimum exists because and hence such that
(2.3) |
If ,
(2.4) |
as we can always find from in (2.3). We list other properties about that will be quoted:
(2.5) | |||
(2.6) | |||
(2.7) | |||
(2.8) | |||
(2.9) | |||
(2.10) |
2.1 Junk Row and Junk Column
Definition 2.1.
A row (column) whose entries are all either zero or unknown will be referred to as a junk row (column).
Certainly, we have
(2.11) |
since we can always complete entirely by zero entries.
Theorem 2.1.
Let where is a junk column, then .
Thanks to (2.8), we have the following corollary:
Corollary 1.
if is a junk row.
2.2 Equivalence
3 Unknown-diagonalization
Define
(3.1) |
We say is , and both and contain at least one nonzero known entry.
Theorem 3.1.
Let , then .
Proof.
By (2.10), we can simply assume the first columns of form a basis of ; and the first columns of form a basis of . Without loss of generality, let us assume . We complete the matrix by filling up the columns:
(3.5) | |||||
(3.6) |
For , we make use of the fact that is a linear combination of . We fill
for | (3.7) |
Similarly,
(3.8) |
Remark 3.1.
Suppose has been completed by and . Then if the number of columns of , we can complete arbitrarily and do the completion in (3.5)-(3.8) as if . More generally, given with is completed. Then the completing process of can be stopped once we know that the final will not be greater than no matter how we do the remaining completion on . For example, if , where
(3.9) |
then we can complete arbitrarily to start with.
3.1 Percolation and Clusters
We call those in Theorem 2.1 clusters. In other words, clusters are matrices that cannot be u-diagonalized. They are not the clusters of the 2-d square lattice, where each point, not counting those on the boundary, has 4 neighbors. In our case, each entry in an matrix has neighbors, from the viewpoint of percolation. Despite that difference, the two models share the same percolation threshold at [12], where is the occupation rate. That means that if about of our entries are known, then there is probably only one cluster left and the matrix cannot be u-diagonalized. We estimate, through Monte Carlo simulation, the number of clusters as the size of the matrix and the number of known entries vary; the results are shown in Figure 3.1. We can see that the number of clusters increases as the number of known entries increases and peaks when the occupation ratio is at about , regardless of the size of the matrix.
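A Monte Carlo estimate of this kind can be sketched as follows. This is only an illustrative sketch, not the simulation used for Figure 3.1: the function names `count_clusters` and `mean_cluster_count`, the square matrix shape, the trial count, and the seed are all assumptions. It also assumes that every known entry is nonzero, so that a junk row or column is simply one with no known entry. Rows and columns are merged with a union-find structure whenever they share a known entry, and each resulting group is counted as one cluster.

```python
import random

def count_clusters(known):
    """Count clusters in a boolean mask (True = known entry).

    Assumption: known entries are nonzero, so rows/columns without any
    known entry are junk and are ignored.
    """
    n_rows = len(known)
    n_cols = len(known[0]) if known else 0
    # Nodes 0..n_rows-1 are rows; nodes n_rows..n_rows+n_cols-1 are columns.
    parent = list(range(n_rows + n_cols))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    touched = set()
    for i in range(n_rows):
        for j in range(n_cols):
            if known[i][j]:
                root_i, root_j = find(i), find(n_rows + j)
                if root_i != root_j:
                    parent[root_i] = root_j
                touched.update((i, n_rows + j))
    return len({find(x) for x in touched})

def mean_cluster_count(n, p, trials=20, seed=0):
    """Average cluster count of n-by-n masks with occupation rate p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        mask = [[rng.random() < p for _ in range(n)] for _ in range(n)]
        total += count_clusters(mask)
    return total / trials
```

Sweeping `p` over a grid for several values of `n` and plotting `mean_cluster_count(n, p)` would give curves of the kind described above.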
3.2 A Decomposition Algorithm
First of all, we set all junk rows and junk columns of the given matrix to zero and black them out. We are now working with a junk-free matrix.
We create a row set and a column set for the first cluster. We put the position of the first row into the row set, and the positions of the columns with known entries in that row into the column set. Thanks to (2.10), sorting is not necessary. We black out the row to avoid searching it again. For each newcomer to the column set, we search vertically for its known entries and put the corresponding row positions into the row set. After that, we black out the searched columns. Now the row set may have newcomers. We enlarge the column set in the same way that we enlarged the row set. Both sets keep growing until there are no more newcomers.
Then we create another row set and another column set and repeat the procedure for the next cluster, provided the remaining matrix has not been entirely blacked out.
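The procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the matrix is a list of rows with `None` marking an unknown entry, the names `is_junk` and `find_clusters` are hypothetical, and "blacking out" is modelled by removing indices from the sets of still-available rows and columns. Following the text, a known zero entry in a non-junk row and column counts as a known entry.

```python
from collections import deque

def is_junk(entries):
    """A junk row/column consists entirely of zero or unknown entries."""
    return all(e is None or e == 0 for e in entries)

def find_clusters(M):
    """Return a list of (row_set, col_set) pairs, one per cluster."""
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    # Black out junk rows and junk columns up front.
    live_rows = {i for i in range(n_rows) if not is_junk(M[i])}
    live_cols = {j for j in range(n_cols)
                 if not is_junk([M[i][j] for i in range(n_rows)])}
    clusters = []
    while live_rows:
        start = min(live_rows)          # first row not yet blacked out
        live_rows.discard(start)
        row_set, col_set = [start], []
        queue = deque([("row", start)])
        while queue:
            kind, k = queue.popleft()
            if kind == "row":
                # Search row k horizontally for known entries.
                newcomers = [j for j in sorted(live_cols) if M[k][j] is not None]
                for j in newcomers:
                    live_cols.discard(j)
                    col_set.append(j)
                    queue.append(("col", j))
            else:
                # Search column k vertically for known entries.
                newcomers = [i for i in sorted(live_rows) if M[i][k] is not None]
                for i in newcomers:
                    live_rows.discard(i)
                    row_set.append(i)
                    queue.append(("row", i))
        clusters.append((sorted(row_set), sorted(col_set)))
    return clusters

M = [[1,    None, None, None],
     [None, 2,    0,    None],
     [None, None, 3,    None],
     [None, None, None, 0]]
print(find_clusters(M))  # [([0], [0]), ([1, 2], [1, 2])]
```

In the example, the last row and last column are junk and are dropped, row 0 forms a cluster by itself, and rows 1 and 2 are joined through column 2.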
4 Sub Unknown-diagonalization
Definition 4.1.
Let be a vector and be a matrix. We define as if (c.f. (2.1)) with , then . Otherwise, we define as .
E.g. , . Notice that a necessary (but not sufficient) condition for is that must be a junk column. Hence we have , where .
Theorem 4.1.
Let , where are matrices, are vectors that is not a junk column, then .
Proof.
We will show the case when ; the cases of higher generalize easily. Let , where is not a junk column. If , then we have
(4.1) |
where the first inequality is by (2.9). If , then must be a junk column. Since is not a junk column, must contain a nonzero known entry . Let . By (2.7) and (2.9),
(4.2) |
So there are only two possibilities for . Assume . By (2.3) we can pick such that
(4.3) |
Therefore, as . Since , we have (c.f. def 4.1). But that will make because . We conclude that cannot equal and hence we must have
(c.f. (4.2)) | |||||
(c.f. Theorem 2.1) | (4.4) |
Without loss of generality, we may assume
(4.6) |
Pick s.t.
(4.7) |
and
(4.8) |
Then we complete the rest by filling up the columns:
(4.9) | |||||
(4.10) |
For , we can pick s.t. and fill the columns
(4.11) |
Similarly, for , we pick s.t. and fill the columns
(4.12) |
Then the completed matrix has rank , with the first columns forming a basis of its column space. By (2.5),
(c.f. (4.6), (4.7)) | ||||
as assumed. Together with (4.5), we get .
∎
Remark 4.1.
Suppose has been completed by and we have . Then if the number of columns of , we can complete arbitrarily and do the completion in (4.9)-(4.12) as if . More generally, the completing process of can be stopped once we know that the final will not be greater than no matter how we do the remaining completion on . For example, if (c.f. (3.9), def 4.1), then we can complete arbitrarily at the beginning and do the completion (4.9)-(4.12).
4.1 How to decompose sub u-diagonalizable matrix
Definition 4.2.
A matrix , which is not u-diagonalizable but becomes u-diagonalizable after deleting a row or a column, is called sub u-diagonalizable. The deleted row (column) is called a conjoined row (column).
For example, in Theorem 4.1 is a conjoined column and becomes u-diagonalizable without it.
Definition 4.3.
Given two vectors and of the same length, we say is a donor for iff all of the unknown positions of are also unknown in . In other words, after some row interchanging with and are completed. Clearly, and imply . However, if and , we do not have . Vectors and are said to be comparable if either or .
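The donor relation of Definition 4.3 can be checked directly from the unknown positions. The sketch below is illustrative only: vectors are Python lists with `None` marking an unknown entry, the names `unknown_positions`, `is_donor`, and `comparable` are hypothetical, and "comparable" is read here as "one of the two vectors is a donor for the other", which is an assumption about the symbols lost from the definition.

```python
def unknown_positions(v):
    """Set of indices at which v is unknown (None)."""
    return {k for k, e in enumerate(v) if e is None}

def is_donor(a, b):
    """True if a is a donor for b.

    That is, every unknown position of a is also unknown in b.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return unknown_positions(a) <= unknown_positions(b)

def comparable(a, b):
    """Assumed reading: a and b are comparable if either is a donor for the other."""
    return is_donor(a, b) or is_donor(b, a)

a = [1, None, 3]
b = [None, None, 5]
c = [2, 4, None]
print(is_donor(a, b))    # True: a is unknown only where b is unknown
print(is_donor(b, a))    # False: b is unknown at position 0, a is not
print(comparable(b, c))  # False: neither set of unknown positions contains the other
```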
Theorem 4.2.
A conjoined row (column) does not have donors among the other rows (columns) of the sub u-diagonalizable matrix.
Proof.
Let be the sub u-diagonalizable matrix. Then it must have the following structure (after some row and column interchanging): where cannot be entirely unknown, otherwise is u-diagonalizable. Now, rows in cannot be donors of , the conjoined row. Similarly, cannot be entirely unknown and hence rows in cannot be donors of either. ∎
Therefore, if is sub unknown-diagonalizable, we will not miss the chance of decomposing it if we test every row and column that does not have a donor. That is, we black out the suspicious row (column) and then carry out the decomposition described in Section 3.2. We call the decomposed components sub-clusters. For example, and are sub-clusters of the in the proposition.
Unlike clusters, which cannot be further unknown-diagonalized, sub-clusters can be sub unknown-diagonalizable. For example where both and are conjoined rows. In that case, we may first decompose into sub-clusters and . Then we may further decompose the latter into and , if necessary.
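By Theorem 4.2, only rows and columns without a donor need to be tested as possible conjoined rows or columns. A minimal sketch of collecting these candidates is given below. It assumes the matrix is a list of rows with `None` for unknown entries; the names `rows_without_donor` and `conjoined_candidates` are hypothetical. Each candidate would then be blacked out in turn and the decomposition of Section 3.2 rerun; that step is not shown here.

```python
def rows_without_donor(M):
    """Indices of rows of M that have no donor among the other rows."""
    unknown = [{j for j, e in enumerate(row) if e is None} for row in M]
    return [i for i, u in enumerate(unknown)
            if not any(k != i and unknown[k] <= u for k in range(len(M)))]

def conjoined_candidates(M):
    """Return (row candidates, column candidates) for the conjoined test."""
    columns = [list(col) for col in zip(*M)]  # transpose to reuse the row test
    return rows_without_donor(M), rows_without_donor(columns)

M = [[1,    None, 2],
     [3,    None, None],
     [None, 4,    None]]
print(conjoined_candidates(M))  # ([0, 2], [0, 1])
```

In the example, row 1 is excluded because row 0 is a donor for it, and column 2 is excluded because column 0 is a donor for it.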
5 Trimming
Lemma 5.1.
if , ,.
Theorem 5.1.
Let be the -th column of and donor (c.f. def 4.3) of a vector for , such that after some row interchanging, with and are completed. If , then .
Proof.
Thanks to (2.10), we may start with
5.1 Trimming Process
We test column by column to see whether we can use Theorem 5.1 to trim away some columns from a given matrix, which is probably a sub-cluster as described in the previous section. We call this process column trimming. When we find a column satisfying the condition of Theorem 5.1, we record, in order, the dependency relation between it and its donor (i.e. (5.1)). Then we black it out and move on to the next column.
Similarly, we have row trimming. An uninterrupted (c.f. Remarks 3.1 and 4.1) trimming process starts with a column trimming followed by a row trimming, or the other way round. Then we carry out these two kinds of trimming one after the other, until there is no more reduction in the matrix. After the trimmed matrix gets completed, we restore, in reverse order, the blackouts with the completed forms given by (5.2).
The following proposition is interesting in its own right and may be useful for our future study.
Lemma 5.2.
If is a column that , . Then .
Proof.
Proposition. Suppose is a matrix such that every two columns and of are comparable (c.f. def 4.3). Then one round of column trimming, followed by arbitrary completion and proper restoration (i.e. (5.2)) of the trimmed columns, completes to its minimum rank.
Proof.
Let be the trimmed . Every two columns of are also columns of and hence comparable. So there exists a column in such that for all . After some row interchanging and column interchanging, we have and with and are completed. Notice that , otherwise has been trimmed away already. Now Lemma 5.2 and (2.7) give . Repeating the argument, we get , . Together with Theorem 5.1 and (2.5), we conclude that , . Finally, the proper restoration (c.f. (5.2)) restores to such that . ∎
5.2 Trimming Process with Approximation
The trimming process stops when no more columns or rows fulfill the condition of Theorem 5.1. But we can always make an approximation by blacking out a column or a row as if it fulfilled the condition and continuing the trimming. With this approximation, the process stops when there are no more unknowns left in the trimming matrix. (We may even choose a row or column that has no donor (c.f. Thm 4.2) to black out and check for u-diagonalization.)
Then we restore the blackouts in reverse order. When we meet a blackout without dependency relation (i.e. (5.1)) to restore, we check for the condition of Theorem 5.1, again. The first time was with the uncompleted trimming matrix; this time is with the completed restoring matrix. If the condition is fulfilled, we restore the blackout with completed form given by (5.2). This will not compromise (further) the minimum rank that we can reach. Otherwise we restore the blackout with arbitrary completed form, which may cause one (more) rank deviation from the possible minimum.
References
[1] N. Srebro, “Learning with matrix factorizations,” Ph.D. dissertation, Citeseer, 2004.
[2] E. Candes and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.
[3] E. Candes and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,” arXiv, vol. 903, 2009.
[4] R. Keshavan, S. Oh, and A. Montanari, “Matrix completion from a few entries,” arXiv, vol. 901, 2009.
[5] J. Cai, E. Candes, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” preprint, 2008.
[6] K. Lee and Y. Bresler, “ADMiRA: Atomic decomposition for minimum rank approximation,” arXiv, vol. 905, 2009.
[7] S. Ma, D. Goldfarb, and L. Chen, “Fixed point and Bregman iterative methods for matrix rank minimization,” Mathematical Programming, pp. 1–33, 2009.
[8] K. Toh and S. Yun, “An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems,” preprint, 2009.
[9] W. Dai and O. Milenkovic, “SET: an algorithm for consistent matrix completion,” arXiv preprint arXiv:0909.2705, 2009.
[10] R. Meka, P. Jain, and I. Dhillon, “Guaranteed rank minimization via singular value projection,” 2009.
[11] Z. Wen, W. Yin, and Y. Zhang, “Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm.”
[12] D. Stauffer and A. Aharony, Introduction to percolation theory. CRC, 1994.