A Rank-Corrected Procedure for Matrix Completion with Fixed Basis Coefficients
For the problems of low-rank matrix completion, the efficiency of the widely-used nuclear norm technique may be challenged under many circumstances, especially when certain basis coefficients are fixed, for example, the low-rank correlation matrix completion in various fields such as the financial market and the low-rank density matrix completion from the quantum state tomography. To seek a solution of high recovery quality beyond the reach of the nuclear norm, in this paper, we propose a rank-corrected procedure using a nuclear semi-norm to generate a new estimator. For this new estimator, we establish a non-asymptotic recovery error bound. More importantly, we quantify the reduction of the recovery error bound for this rank-corrected procedure. Compared with the one obtained for the nuclear norm penalized least squares estimator, this reduction can be substantial (around ). We also provide necessary and sufficient conditions for rank consistency in the sense of Bach (2008). Very interestingly, these conditions are highly related to the concept of constraint
nondegeneracy in matrix optimization. As a byproduct, our results provide a theoretical foundation for the majorized penalty method of Gao and Sun (2010) and Gao (2010)
for structured low-rank matrix optimization problems. Extensive numerical experiments demonstrate that our proposed rank-corrected procedure can simultaneously achieve a high recovery accuracy and capture the low-rank structure.
Keywords: matrix completion, fixed basis coefficients, low-rank, convex optimization, rank consistency, constraint nondegeneracy.
The low-rank matrix completion is to recover an unknown low-rank matrix from the under-sampled observations with or without noises. This problem is of considerable interest in many application areas, from machine learning to quantum state tomography. A basic idea to address a low-rank matrix completion problem is to minimize the rank of a matrix subject to certain constraints from observations. Since the direct minimization of rank function is generally NP-hard, a widely-used convex relaxation approach is to replace the rank function with the nuclear norm — the convex envelope of the rank function over a unit ball of the spectral norm .
The nuclear norm technique has been observed to provide a low-rank solution in practice for a long time (see, e.g., [55, 54, 19]). The first remarkable theoretical characterization for the minimum rank solution via the nuclear norm minimization was given by Recht, Fazel and Parrilo , with the help of the concept of Restricted Isometric Property (RIP). Recognizing that the matrix completion problem does not obey the RIP, Candès and Recht  introduced the concept of incoherence property and proved that most low-rank matrices can be exactly recovered from a surprisingly small number of noiseless observations of randomly sampled entries via the nuclear norm minimization. The bound of the number of sampled entries was later improved to be near-optimal by Candès and Tao  through a counting argument. Such a bound was also obtained by Keshavan et al.  for their proposed OptSpace algorithm. Later, Gross  sharpened the bound by employing a novel technique from quantum information theory developed in , in which noiseless observations were extended from entries to coefficients relative to an arbitrary basis. This technique was also adapted by Recht , leading to a short and intelligible analysis. Besides the above results for the noiseless case, matrix completion with noise was first addressed by Candès and Plan . More recently, nuclear norm penalized estimators for matrix completion with noise have been well studied by Koltchinskii, Lounici and Tsybakov , Negahban and Wainwright , and Klopp  under different settings. Besides the nuclear norm, estimators with other penalties for matrix completion have also been considered in terms of recoverability in the literature, e.g., [68, 39, 43, 70, 25].
The nuclear norm technique has been demonstrated to be a successful approach to encourage a low-rank solution for matrix completion. However, its efficiency may be challenged in some circumstances. For example, Salakhutdinov and Srebro  showed that when certain rows and/or columns are sampled with high probability, the nuclear norm minimization may fail in the sense that the number of observations required for recovery is much more than the setting of most matrix completion problems. It means that the efficiency of the nuclear norm techniques could be highly weakened under a general sampling scheme. Negahban and Wainwright  also pointed out the impact of such heavy sampling schemes on the recovery error bound. As a remedy for this, a weighted nuclear norm (trace norm), based on row- and column-marginals of the sampling distribution, was suggested in [58, 69, 24] if the prior information on sampling distribution is available. Moreover, the conditions characterized by Bach  for rank consistency of the nuclear norm penalized least squares estimator may not be satisfied, especially when certain constraints are involved.
A concrete example of interest is to recover a density matrix of a quantum system from Pauli measurements in quantum state tomography (see, e.g., [31, 22, 74]). A density matrix is a Hermitian positive semidefinite matrix of trace one. Clearly, if the constraints of positive semidefiniteness and trace one are simultaneously imposed on the nuclear norm minimization, the nuclear norm completely fails in promoting a low-rank solution. Thus, one of the two constraints has to be abandoned in the nuclear norm minimization and then be restored in the post-processing stage. In fact, this idea has been much explored in [31, 22] and the numerical results there indicated its relative efficiency though it still has much room for improvement.
All the above examples motivate us to ask whether it is possible to go beyond the nuclear norm approach for practical use to seek for better performance in low-rank matrix completion. In this paper, we provide a positive answer to this question with both theoretical and empirical supports. We first establish a unified low-rank matrix completion model, which allows for the imposition of fixed basis coefficients so that the correlation and the density matrix completion are included as special cases. It means that in our setting, for any given basis of the matrix space, a few basis coefficients of the true matrix are assumed to be fixed due to a certain structure or some prior information, and the rest are allowed to be observed with noises under a general sampling scheme. To pursue a low-rank solution with a high recovery accuracy, we propose a rank-correction step to generate a new estimator. The rank-correction step solves a penalized least squares problem with its penalization being the nuclear norm minus a linear rank-correction term constructed on a reasonable initial estimator. A satisfactory choice of the initial estimator could be the nuclear norm penalized least squares estimator or one of its analogies. The resulting convex matrix optimization problem can be solved by the efficient algorithms recently developed in [21, 34, 35, 36] even for large-scale cases.
The idea of using a two-stage or even multi-stage procedure is not brand new for dealing with sparse recovery in the statistical and machine learning literature. The -norm penalized least squares method, also known as the Lasso , is very attractive and popular for variable selection in statistics, thanks to the invention of the fast and efficient LARS algorithm . On the other hand, the -norm penalty has long been known by statisticians to yield biased estimators and cannot achieve the best estimation performance [14, 18]. The issue of bias can be overcome by nonconvex penalization methods, see, e.g., [47, 13, 77]. A multi-stage procedure naturally occurs if the nonconvex problem obtained is solved by an iterative algorithm [81, 45]. In particular, once a good initial estimator is used, a two-stage estimator is enough to achieve the desired asymptotic efficiency, e.g., the adaptive Lasso proposed by Zou . There are also a number of important works along this line on variable selection, including [47, 53, 78, 33, 79, 52, 15], to name only a few. For a broad overview, the interested readers are referred to the recent survey papers [16, 17]. It is natural to extend the ideas from the vector case to the matrix case. Fazel, Hindi and Boyd  first proposed the reweighted trace minimization for minimizing the rank of a positive semidefinite matrix. In , Bach made an important step in extending the adaptive Lasso of Zou  to the matrix case for rank consistency. However, it is not clear how to apply Bach’s idea to our matrix completion model with fixed basis coefficients since the required rate of convergence of the initial estimator for achieving asymptotic properties is no longer valid, as far as we can see. More critically, there are numerical difficulties in efficiently solving the resulting optimization problems. Numerical difficulties also occur in the reweighted nuclear norm approach proposed by Mohan and Fazel  as an extension of  for rectangular matrices. Iterative reweighted least squares minimization is an alternative extension of  independently proposed by Mohan and Fazel  and Fornasier, Rauhut and Ward , taking advantage of the property that the rank of a matrix is equal to the rank of the product of this matrix and its transpose. However, the resulting smoothness of inner-iteration subproblems is weak in encouraging a low-rank solution so much more iterations are needed in general and thus the computational cost is high especially when hard constraints such as fixed basis coefficients are involved.
The rank-correction step to be proposed in this paper is for overcoming the above difficulties. This approach is inspired by the majorized penalty method proposed by Gao and Sun  for solving structured matrix optimization problems with a low-rank constraint. For our proposed rank-correction step, we establish a non-asymptotic recovery error bound in Frobenius norm, following a similar argument adopted by Klopp in . We also discuss the impact of adding the rank-correction term on recovery error. More importantly, we provide an affirmative guarantee that under mild condition the rank-correction step highly improves the recoverability, compared with the nuclear norm penalized least squares estimator. As the estimator is expected to be of low-rank, we also study the asymptotic property — rank consistency in the sense of Bach , under the setting that the matrix size is assumed to be fixed. This setting may not be ideal for analyzing asymptotic properties for matrix completion, but it does allow us to take the crucial first step to gain insights into the limitation of the nuclear norm penalization. Among others, the concept of constraint nondegeneracy for conic optimization problem plays a key role in our analysis. Interestingly, our results of recovery error bound and rank consistency suggest a consistent criterion for constructing a suitable rank-correction function. In particular, for the correlation and the density matrix completion problems, we prove that rank consistency automatically holds for a broad selection of rank-correction functions. For most cases, a single rank-correction step is sufficient for a substantial improvement, unless the sample ratio is rather low so that the rank-correction step may be iteratively used for two or three times to achieve the limit of improvement. Owing to this property, the advantage of our proposed method is more apparent in practical computations especially when fixed basis coefficients are involved. Finally, we remark that our results can also be used to provide a theoretical foundation in the statistical setting for the majorized penalty method of Gao and Sun  and Gao  for structured low-rank matrix optimization problems.
This paper is organized as follows. In Section 2, we introduce the observation model of matrix completion with fixed basis coefficients and formulate the rank-correction step. In Section 3, we establish a non-asymptotic recovery error bound for the estimator generated from the rank-correction step and provide a quantification of the improvement in recoverability. Section 4 provides necessary and sufficient conditions for rank consistency. Section 5 is devoted to the construction of the rank-correction function. In Section 6, we report numerical results to validate the efficiency of our proposed rank-corrected procedure. We conclude this paper in Section 7. All relevant material and all proofs of theorems are left in the appendices.
Notation. Here we provide a brief summary of the notation used in this paper.
Let and denote the space of all real and complex matrices, respectively. Let denote the set of all real symmetric (positive semidefinite, positive definite) matrices and denote the set of all Hermitian (positive semidefinite, positive definite) matrices.
Let represent , , or . We define for the previous two cases and stipulate for the latter two cases. Let be endowed with the trace inner product and its induced norm , i.e., for , where stands for the trace of a matrix and means the real part of a complex number.
For the real case, i.e., or , let represent ; and for the complex case, i.e., or , let represent .
For the real case, denotes the set of all real matrices with orthonormal columns, and for the complex case, denotes the set of all complex matrices with orthonormal columns. When , we write as for short.
The notation denotes the transpose for the real case and the conjugate transpose for the complex case. The notation means the adjoint of a linear operator.
For any index set , let denote the cardinality of , i.e., the number of elements in . For any , let denote the vector in whose -th component is , let denote the vector in whose -th component is and let denote the vector in whose -th component is .
For any given vector , denotes a rectangular diagonal matrix of suitable size with the -th diagonal entry being .
For any , let and denote the Euclidean norm and the maximum norm, respectively. For any , let and denote the spectral norm and the nuclear norm, respectively.
The notations , and mean almost sure convergence, convergence in probability and convergence in distribution, respectively. We write if is bounded in probability.
For any set , let denote the indicator function of , i.e., if , and otherwise. Let denote the identity matrix.
2 Problem formulation
In this section, we formulate the model of the matrix completion problem with fixed basis coefficients, and then propose an adaptive nuclear semi-norm penalized least squares estimator for solving this class of problems.
2.1 The observation model
Let be a given orthonormal basis of the given real inner product space . Then, any matrix can be uniquely expressed in the form of , where is called the basis coefficient of relative to . Throughout this paper, let be the unknown low-rank matrix to be recovered and let . In some practical applications, for example, the correlation and density matrix completion, a few basis coefficients of the unknown matrix are fixed (or assumed to be fixed) due to a certain structure or reliable prior information. We let denote the set of the indices relative to which the basis coefficients are fixed, and denote the complement of in , i.e., and . We define and .
When a few basis coefficients are fixed, one only needs to observe the rest for recovering the unknown matrix . Assume that we are given a collection of noisy observations of the basis coefficients relative to in the following form
where are the indices randomly sampled from the index set , are the independent and identically distributed (i.i.d.) noises with and , and controls the magnitude of noise. Unless otherwise stated, we assume a general weighted sampling (with replacement) scheme with the sampling distributions of as follows.
The indices are i.i.d. copies of a random variable that has a probability distribution over defined by
Note that each is assumed to be sampled with a positive probability in this sampling scheme. In particular, when the sampling probability of all are equal, i.e., , we say that the observations are sampled uniformly at random.
For notational simplicity, let be the multiset of all the sampled indices from the index set , i.e., . With a slight abuse on notation, we define the sampling operator : associated with by
Then, the observation model (1) can be expressed in the following vector form
where and denote the observation vector and the noise vector, respectively.
Next, we present some examples of low-rank matrix completion problems in the above settings.
Correlation matrix completion. A correlation matrix is an real symmetric or Hermitian positive semidefinite matrix with all diagonal entries being ones. Let be the vector with the -th entry being one and the others being zeros. Then, . The recovery of a correlation matrix is based on the observations of entries. For the real case, , , ,
and for the complex case, , , ,
Here, represents the imaginary unit. Of course, one may fix some off-diagonal entries in specific applications.
Density matrix completion. A density matrix of dimension for some positive integer is an Hermitian positive semidefinite matrix with trace one. In quantum state tomography, one aims to recover a density matrix from Pauli measurements (observations of the coefficients relative to the Pauli basis) [31, 22], given by
where “” means the Kronecker product of two matrices and
are the Pauli matrices. In this setting, , , , and .
Rectangular matrix completion. Assume that a few entries of a rectangular matrix are known and let be the index set of these entries. One aims to recover this rectangular matrix from the observations of the rest entries. For the real case, , , ,
and for the complex case, , , ,
Now we introduce some linear operators that are frequently used in the subsequent sections. For any given index set , say or , we define the linear operators : , : and : respectively, by
For convenience of discussions, in the rest of this paper, for any given , we denote by the singular value vector of arranged in the nonincreasing order and define
In particular, when , we denote by the eigenvalue vector of with and define
For any and any , we write and with , , and . In particular, for any and any , we write with and .
2.2 The rank-correction step
In many situations, the nuclear norm penalization performs well for matrix recovery, but its efficiency may be challenged if the observations are sampled at random obeying a general distribution such as the one considered in . The setting of fixed basis coefficients in our matrix completion model can also be regarded to be under an extreme sampling scheme. In particular, for the correlation and density matrix completion, the nuclear norm completely loses its efficiency since it reduces to a constant in these two cases. In order to overcome the shortcomings of the nuclear norm penalization, we propose a rank-correction step to generate an estimator in pursuit of a better recovery performance.
Recall that is the unknown true matrix of rank . Given an initial estimator of , say, the nuclear norm penalized least squares estimator or one of its analogies, our proposed rank-correction step is to solve the convex optimization problem
where is the penalty parameter (depending on the number of observations), is an upper bound of the magnitudes of basis coefficients of , is a closed convex set that contains , and is a spectral operator associated with a symmetric function . One may refer to Appendix A for more information on the concept of spectral operators. (Indeed, based on the subsequent analysis for better recovery performance, the choice is much preferred, for which the penalization is indeed a nuclear semi-norm. But this choice criterion is not compulsory). The bound restriction is very mild since such a bound is often available in applications, for example, the correlation and the density matrix completion. This boundedness setting can also be found in previous works done by Negahban and Wainwright  and Klopp .
Hereafter, we call the rank-correction function and the rank-correction term. Note that, when , the rank-correction step (3) reduces to the nuclear norm penalized least squares estimator, which equally penalizes singular values to promote a low-rank solution for matrix completion. Certainly, for this purpose, penalizing more on small singular values or even directly penalizing the rank function could serve better, but only theoretically rather than practically, due to the lack of convexity. Also note that an initial estimation, if deviates not too much from the true matrix, could contain some information of the singular values and/or the rank of the true matrix to a certain extent. Therefore, provided such an initial estimator is available, it is achievable to construct a rank-correction term with a suitable to substantially offset the penalization of large singular values from the nuclear norm penalty. Consequently, we can expect the rank-correction step (3) to have a better low-rank promoting ability and outperform the nuclear norm penalized least squares estimator.
The key issue is then how to construct a favored rank-correction function . In the next two sections, we provide theoretical supports to our proposed rank-correction step, from which some important guidelines on the construction of can be captured. In particular, if one chooses the nuclear norm penalized least squares estimator to be the initial estimator , and also suitably chooses the spectral operator so that is a semi-norm, called nuclear semi-norm, then the estimator generated from this two-stage procedure is called the adaptive nuclear semi-norm penalized least squares estimator associated with .
2.3 Relation with the majorized penalty approach
The rank-correction step above is inspired by the majorized penalty approach proposed by Gao and Sun  for solving the rank constrained matrix optimization problem:
where , is a given continuous function and is a closed convex set. Note that for any , the constraint is equivalent to
where denotes the Ky Fan -norm. The central idea of the majorized penalty approach is to solve the following penalized version of (4):
where is the penalty parameter. With the current iterate , the majorized penalty approach yields the next iterate by solving the convex optimization problem
where is a subgradient of the convex function at , and is a convex majorization function of at . By comparing with (3), one may notice that our proposed rank-correction step is close to a single step of the majorized penalty approach.
Note that the rank constrained least squares problem is of great consideration in matrix completion especially when the rank information is known. However, different from the noiseless case, for matrix completion with noise, the solution to the rank constrained least squares problem (assuming the uniqueness) is in general not the true matrix though quite close to it. Indeed, there may exist many candidate matrices surrounding the true matrix and having its rank. The rank constrained least squares solution is only one of them. It deviates the least from the noisy observations rather than the true matrix. Naturally, it is conceivable that some candidate matrices may deviate a bit more from the noisy observations but less from the true matrix. So, for the purpose of matrix completion, there is no need to aim precisely at the rank constrained least squares solution and find this solution accurately. An approach roughly towards it such as our proposed rank-correction step (3) is good enough to bring similar good recovery performance.
3 Error bounds
In this section, we aim to derive a recovery error bound in Frobenius norm for the estimator generated from the rank-correction step (3) and discuss the impact of the rank-correction term on the resulting bound. The analysis mainly follows Klopp’s arguments in , which is also in line with those used by Negahban and Wainwright .
We start the analysis by defining a quantity, which plays a key role in the subsequent analysis, as
A basic relation between the true matrix and its estimate can be obtained by using the optimality of to the problem (3) as follows.
For any , if then the following inequality holds:
We emphasize that is not restricted to be a constant in Theorem 1 but could be set to depend on the size of matrix. This realization is important as can be seen in the sequel. According to Theorem 1, the choice of the penalty parameter depends on the observation noises and the sampling operator . Therefore, we make the following assumption on the noises as follows:
The i.i.d. noise variables are sub-exponential, i.e., there exist positive constants , and such that for all ,
Moreover, based on Assumption 1, we further define quantities and that control the sampling probability for observations as
It is easy to obtain that and , according to the facts and , respectively. In general, the values of and depend on the sampling distribution. The more extreme the sampling distribution is, the larger these two values have to be. Assume that there exist some positive constants and such that , . Then we can easily set . The setting of is not universal for different cases. For example, consider the cases described in Section 2. For correlation matrix completion, we can set for the real case and for the complex case. For density matrix completion, we can set for any sampling distribution. For rectangular matrix completion, we can set for the real case and for the complex case. Note that for uniform sampling.
Theorem 1 reveals the key to deriving a recovery error bound in Frobenius norm, that is, to establish the relation between and . This can be achieved by looking into some RIP-like property of the sampling operator , as done previously in [58, 44, 40, 49]. Following this idea, we obtain an explicit recovery error bound as follows:
Theorem 2 shows that for any rank-correction function , controlling the recovery error only needs the samples size to be of roughly the degree of freedom of a rank matrix up to a logarithmic factor in the matrix size. Besides the information on the order of magnitude, Theorem 2 also provides us more details on the constant part in the recovery error bound, which also plays an important role in practice. The impact of different choices of rank-correction functions on recovery error is fully embodied with the value of . Note that the smaller is, the smaller the error bound (10) is for a fixed , and thus the smaller value this error bound can achieve for the best (as well as the best ). Therefore, we aim to establish an explicit relationship between and in the next theorem.
For any given such that , we have
It is immediate from Theorem 3 that
Recall that the nuclear norm penalized least squares estimator corresponds to the rank-correction step with so that . Therefore, Theorem 3 guarantees that if the initial estimator does not deviate too much from , the rank-correction step outperforms the nuclear norm penalized least squares estimator in the sense of recovery error, provided that is close to . For example, consider the case when the rank of the true matrix is known. One may simply choose to take advantage of the rank information. In this case, the requirement in (11) ensuring simply reduces to . Moreover, further suppose that is the nuclear norm penalized least squares estimator. Then, according to Theorems 2 and 3, one only needs samples with size
where . As can be seen, the larger the matrix size is, the easier becomes less than or even close to . If the rank of the true matrix is unknown, one could construct the rank-correction function on account of the tradeoff between optimality and robustness, to be discussed in Section 5. An experimental example of the relationship between and can be found in Table 1.
Next, we demonstrate the power of the rank-correction term with more details. It is interesting to notice that the value of (as well as ) has a substantial impact on the recovery error bound (10). The part related to the magnitude of noise increases as increases, while the part related to the upper bound of entries slightly decreases to its limit as increases. Therefore, our first target is to find the smallest error bound in terms of (10) among all possible . It is possible to work on the error bound (10) directly for its minimum in but the subsequent analysis is much more tedious. For simplicity of illustration, instead, we perform our analysis on a slightly relaxed version instead as
Direct calculation shows that over , attains its minimum
It is worthwhile to note that when , meaning that the optimal choice of is inversely proportional to rather than a simple constant. (This observation is important for achieving the rank consistency in Section 4.) In other words, for achieving the best possible recovery error, the penalty parameter chosen for the rank-correction step (3) with should be larger than that for the nuclear norm penalized least squares estimator. In addition, consider two extreme cases with and respectively:
By direct calculations, we obtain , where the lower bound is attained when and the upper bound is approached when or . This finding motivates us to wonder whether the recovery error can be reduced by around half in practice. This inference is further validated by numerical experiments in Section 6.
4 Rank consistency
In this section we consider the asymptotic behavior of the estimator generated from the rank-correction step (3) in term of its rank. We expect that the resulting has the same rank as the true matrix . Theorem 2 only reveals a flavored parameter in terms of the optimal order but rather its exact value. In practice, for a chosen parameter , there is hardly any clue to know the recovery performance of the resulting solution since the true matrix is unknown. However, if the rank property holds as expected, the observable rank information may be used to infer the recovery quality of the resulting solution of a parameter and thus help in parameter searching. Numerical experiments in Section 6 demonstrate the practicability of this idea.
For the purpose above, we study the rank consistency in the sense of Bach  under the setting that the matrix size is fixed. An estimator of the true matrix is said to be rank consistent if
Throughout this section, we make the following assumptions:
The spectral operator is continuous at .
The initial estimator satisfies as .
Epi-convergence in distribution gives us an elegant way in analyzing the asymptotic behavior of optimal solutions of a sequence of constrained optimization problems. Based on this technique, we obtain the following result.
If , then as .
We first focus on the characterization of necessary and sufficient conditions for rank consistency of . Unlike in the analysis of recovery error bound, additional information represented by the set could affect the path along which converges to and thus may break the rank consistency. In the sequel, we only discuss two most common cases: the rectangular case (recovering a rectangular matrix or a symmetric/Hermitian matrix) and the positive semidefinite case (recovering a symmetric/Hermitian positive semidefinite matrix).
For notational simplicity, we divide the index set into three subsets as
Then, we define a linear operator as
Here, we use the superscript “” because of its inverse-like property in terms of
By extending the arguments of Bach  for the nuclear norm penalized least squares estimator from the unconstrained case to the constrained case, we obtain the following results.
For the positive semidefinite case, the nuclear norm in (3) simply reduces to the trace . We assume that the Slater condition holds.
For the positive semidefinite case , the Slater condition holds, i.e., there exists some such that and .
Next, we provide a theoretical guarantee on the uniqueness of the solution to the linear systems (13) and (14) with the help of constraint nondegeneracy. The concept of constraint nondegeneracy was pioneered by Robinson  and later extensively developed by Bonnans and Shapiro . We say that the constraint nondegeneracy holds at to (3) with if
where . Meanwhile, we say that the constraint nondegeneracy holds at to (3) with if
where . One may refer to Appendix B for more details of constraint nondegeneracy.