Matrix rigidity and the ill-posedness of
Robust PCA and matrix completion
This publication is based on work partially supported by: the EPSRC I-CASE studentship (voucher 15220165) in partnership with Leonardo, and The Alan Turing Institute through EPSRC (EP/N510129/1).
Robust Principal Component Analysis (PCA) (Candès et al., 2011) and low-rank matrix completion (Recht et al., 2010) are extensions of PCA that allow for outliers and missing entries, respectively. It is well-known that solving these problems requires a low coherence between the low-rank matrix and the canonical basis, since in the extreme cases – when the low-rank matrix we wish to recover is also sparse – there is an inherent ambiguity. However, the well-posedness issue in both problems is an even more fundamental one: in some cases, both Robust PCA and matrix completion can fail to have any solutions at all due to the set of low-rank plus sparse matrices not being closed, which in turn is equivalent to the matrix rigidity function not being lower semicontinuous (Kumar et al., 2014). By constructing infinite families of matrices, we derive bounds on the rank and sparsity such that the set of low-rank plus sparse matrices is not closed. We also demonstrate numerically that a wide range of non-convex algorithms for both Robust PCA and matrix completion have diverging components when applied to our constructed matrices. An analogy can be drawn to the case of sets of higher order tensors not being closed under the canonical polyadic (CP) tensor rank, rendering the best low-rank tensor approximation unsolvable (de Silva and Lim, 2008) and hence encouraging the use of the multilinear tensor rank (De Lathauwer, 2000).
Principal Component Analysis (PCA) plays a crucial role in the analysis of high-dimensional data [43, 38, 1, 18] and is a widely used dimensionality reduction technique [23, 26, 36, 33]. It involves solving a low-rank approximation problem, which can be computed easily for moderate problem sizes via the singular value decomposition (SVD), or for larger problem sizes by using notions of sketching to compute leading portions of the SVD [22, 14, 47]. Over the last decade PCA has been extended to allow for missing data (matrix completion) or for data with a few entries corrupted or otherwise inconsistent with a low-rank model (Robust PCA). In this manuscript we show that the set of matrices which are the sum of low-rank and sparse matrices is not closed for a range of ranks, sparsities, and matrix dimensions; see Theorem 1.1. Consequently, there are a number of algorithms seeking such a decomposition whose constituents diverge even while their sum converges; see Section 3. We thereby highlight a previously unknown issue that practitioners might experience when using these techniques. The situation is analogous to the lack of closedness of sets of bounded CP-rank tensors [25, 24], which motivates the notion of multilinear rank approximation.
1.1 Prior work
Robust PCA (RPCA) solves a low-rank plus sparse matrix approximation with the sparse component allowing for few but arbitrarily large corruptions in the low-rank structure; that is, a matrix is decomposed into a low-rank matrix plus a sparse matrix
where is the set of matrices that can be expressed as a rank matrix plus a sparsity matrix
We omit the subscript and write where the matrix size is implied from the context, and use only a single subindex to denote sets of square matrices. Allowing the addition of a sparse matrix to the low-rank matrix can be viewed as modelling globally correlated structure in the low-rank component while allowing local inconsistencies, innovations, or corruptions. Exemplar applications of this model include image restoration, hyperspectral image denoising [17, 10, 45], face detection [32, 48], acceleration of dynamic MRI data acquisition [35, 49], analysis of medical imagery [2, 16], separation of moving objects in an otherwise static scene, and target detection [34, 39].
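The low-rank plus sparse model above can be made concrete with a short synthetic sketch; the dimensions, rank, sparsity level, and entry magnitudes below are arbitrary illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, s = 50, 3, 40

# Rank-r component: product of two thin Gaussian factor matrices.
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# s-sparse component: s large entries at uniformly drawn positions.
S = np.zeros((n, n))
S.flat[rng.choice(n * n, size=s, replace=False)] = 10.0 * rng.standard_normal(s)

# By construction, X is a member of the low-rank plus sparse set.
X = L + S
print(np.linalg.matrix_rank(L), np.count_nonzero(S))
```

By construction X admits such a decomposition exactly; the recovery problems discussed below ask whether the two components can be identified from X alone.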
where denotes the Schatten 1-norm (often also referred to as the nuclear norm) of a matrix, the sum of its singular values, and denotes the norm of a vectorised matrix, the sum of the absolute values of its entries. In , the authors show that exact decomposition of a low-rank plus sparse matrix is possible for randomly chosen sparsity locations, even when the sparsity level is a fixed fraction with . The work of  takes a deterministic approach in which corrupted entries can have arbitrary locations but must be sufficiently spread out that the sparsity fraction of each row and column does not exceed . In both the works of  and , as well as in subsequent extensions, it is common to impose conditions requiring the singular vectors of the low-rank component to be sufficiently uncorrelated with the canonical basis.
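The convex relaxation above is commonly solved by an alternating-directions scheme; the following is a minimal sketch, where the defaults for the regularisation and penalty parameters follow common conventions and are assumptions rather than requirements of the formulation.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox operator of tau * nuclear norm."""
    U, sig, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(sig - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Entrywise soft thresholding: the prox operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def pcp(X, lam=None, mu=None, iters=500):
    """Alternating-directions iteration for
    min ||L||_* + lam * ||S||_1  subject to  L + S = X."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(X.shape))
    if mu is None:
        mu = X.size / (4.0 * np.abs(X).sum())
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # dual variable for the constraint L + S = X
    for _ in range(iters):
        L = svt(X - S + Y / mu, 1.0 / mu)
        S = shrink(X - L + Y / mu, lam / mu)
        Y += mu * (X - L - S)
    return L, S
```

Each iteration applies singular value thresholding for the nuclear-norm term and entrywise soft thresholding for the l1 term, followed by a dual update enforcing the constraint.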
Robust PCA is closely related to the problem of recovering a low-rank matrix from incomplete observations, referred to as matrix completion. The main difference between the two is that, in the case of matrix completion, the indices of the missing entries are known, and the aim is to solve
where is entry-wise subsampling of observed entries of with indices in .
Similarly to the case of Robust PCA, matrix completion can be approached by solving a convex relaxation of the problem [7, 8, 37], but there are also a number of algorithms that solve the non-convex formulation directly while also providing recovery guarantees [5, 21, 29, 30, 40, 41, 46]. Such non-convex methods are typically observed to recover matrices of higher rank than is possible by solving the convex relaxation.
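As one representative of the non-convex approach, a rank-constrained projected gradient iteration on the observed entries can be sketched as follows; this is an illustrative sketch in the spirit of projection-based methods, not an implementation of any specific cited algorithm.

```python
import numpy as np

def project_rank(M, r):
    """Project M onto the set of matrices of rank at most r (truncated SVD)."""
    U, sig, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * sig[:r]) @ Vt[:r]

def complete_svp(X_obs, mask, r, eta, iters=400):
    """Projected gradient descent on the observed entries:
    L <- project_rank( L + eta * mask * (X_obs - L), r )."""
    L = np.zeros_like(X_obs)
    for _ in range(iters):
        L = project_rank(L + eta * mask * (X_obs - L), r)
    return L
```

A common choice for the step size eta is the reciprocal of the sampling rate, which makes the masked gradient an approximately unbiased estimate of the full gradient.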
1.2 Main contribution
It is well known that the model from (1.1) need not have a unique solution without further constraints, such as the singular vectors of the low-rank component being uncorrelated with the canonical basis as quantified by the incoherence condition with parameter
where is the singular value decomposition of the rank component of size . The incoherence condition for small values of ensures that the left and right singular vectors are well spread out and not sparse [7, 37].
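Under the standard definition, the incoherence parameter can be computed directly from the SVD factors; the helper below is a sketch based on that definition, scaling the largest squared row norm of each singular factor by dimension over rank.

```python
import numpy as np

def coherence(X, r):
    """Incoherence parameter of the rank-r SVD factors of X: the largest
    squared row norm of U (and of V), scaled by dimension / rank. It
    ranges from 1 (singular vectors maximally spread out) up to
    dimension / rank (singular vectors aligned with the canonical basis)."""
    m, n = X.shape
    U, sig, Vt = np.linalg.svd(X)
    mu_U = (m / r) * (U[:, :r] ** 2).sum(axis=1).max()
    mu_V = (n / r) * (Vt[:r] ** 2).sum(axis=0).max()
    return max(mu_U, mu_V)
```

For example, the all-ones rank-1 matrix attains the minimal value 1, while a rank-1 matrix supported on a single entry attains the maximal value, precisely the regime in which the identifiability issues described below arise.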
Trivial examples of matrices with non-unique decompositions in include any matrix with two nonzero entries in differing rows and columns, as such matrices are in for any and such that , with the entries of the matrix assigned to the sparse or low-rank components arbitrarily. Moreover, completion of a low-rank matrix is impossible for sampling patterns that are disjoint from the support of the matrix , which is likely for matrices with few nonzero entries. Both of the aforementioned problems are overcome by imposing a low coherence, which ensures that most entries of the singular vectors of the low-rank matrix are nonzero.
Herein we highlight the presence of a more fundamental difficulty: There are matrices for which Robust PCA and matrix completion can have no solution in that their constituents diverge even while the objective is minimized to zero. This is not because of the ambiguity between possible solutions or lack of information about the matrix, but instead because is not a closed set. Moreover, this is not an isolated phenomenon, as sequences of matrices converging outside of the set can be constructed for a wide range of ranks, sparsities and matrix sizes.
Theorem 1.1 ( is not closed).
The set of low-rank plus sparse matrices is not closed for , provided , or provided where is a multiple of a squared integer.
Theorem 1.1 implies that there are matrices such that problem (1.1) is ill-posed in that the objective can be decreased to zero with the constituents and diverging with unbounded energy. The problem size bounds in Theorem 1.1 allow matrices with to have a number of corruptions of order for , which for constant rank allows to be quadratic in , and allow matrices with to have a number of corruptions of order . In Section 1.2.1 we illustrate the non-closedness of and the consequent ill-posedness of the corresponding Robust PCA and low-rank matrix completion problems.
1.2.1 Simple example of being open
Consider solving for the optimal approximation to the following matrix, which is a special case of the construction given in  in the context of the matrix rigidity function not being lower semicontinuous.
Consider the following sequence of matrices
which can decrease the objective function to zero as , but at the cost of the constituents and diverging with unbounded energy. Moreover, the sequence which minimizes the error converges to a matrix lying outside of the feasible set and is in the set instead. As a consequence, Robust PCA as posed in (1.5) does not have a global minimum. As the objective function is decreased towards zero, the energy of both the low-rank and the sparse components diverges to infinity. Likewise, we could consider an instance of the matrix completion problem (1.3) in which the top left entry of is missing and a rank approximation is sought. A rank solution cannot be obtained, as there does not exist a choice for the top left entry that would reduce the rank of to . However, the sequence decreases the objective arbitrarily close to zero while the energy of the iterates grows without bound, .
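The matrix completion half of this phenomenon is easy to reproduce numerically. The 2-by-2 instance below is our own minimal analogue of the same mechanism, not the matrix displayed above: the top left entry is unobserved, no finite value for it yields a rank 1 matrix, yet rank 1 iterates fit the observed entries arbitrarily well while their energy diverges.

```python
import numpy as np

# Rank-1 completion of [[?, 1], [1, 0]] with the top left entry missing:
# [[x, 1], [1, 0]] has determinant -1 for every finite x, so no rank-1
# completion exists, yet the iterates below drive the fit error to zero.
for eps in [1e-1, 1e-2, 1e-3]:
    X_eps = np.array([[1.0 / eps, 1.0], [1.0, eps]])  # det = 0: rank 1
    fit_error = eps  # mismatch on the observed bottom-right entry
    energy = np.linalg.norm(X_eps)
    print(f"eps={eps:g}  rank={np.linalg.matrix_rank(X_eps)}  "
          f"fit error={fit_error:g}  ||X_eps||_F={energy:.1e}")
```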
1.3 Connection with matrix rigidity
Robust PCA is closely related to the notion of the matrix rigidity function , which was originally introduced in complexity theory by Valiant  and refers to the minimum number of entries of that must be changed in order to reduce its rank to or lower.
Matrix rigidity is upper bounded for any and rank as
due to elementary matrix properties. Matrices which achieve this upper bound for every are referred to as maximally rigid, and it was only recently shown in  how to construct them explicitly, resolving a long-standing open question originally posed by Valiant in 1977.
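The elementary upper bound can be checked numerically via a Schur complement argument: assuming the leading principal block of size equal to the target rank is invertible (which holds generically), overwriting only the trailing block leaves a matrix of exactly the target rank. The sketch below is our own illustration of this standard fact.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 3

A = rng.standard_normal((n, n))
A11, A12 = A[:r, :r], A[:r, r:]
A21 = A[r:, :r]

# Overwrite the bottom-right (n-r) x (n-r) block so that its Schur
# complement with respect to A11 vanishes; the modified matrix then
# factors as [I; A21 A11^{-1}] @ [A11, A12] and hence has rank exactly r.
B = A.copy()
B[r:, r:] = A21 @ np.linalg.solve(A11, A12)

print(np.count_nonzero(A != B), np.linalg.matrix_rank(B))
```

At most (n-r)^2 entries were changed, matching the upper bound on the rigidity function.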
Matrix rigidity has important consequences for the complexity of linear algebraic circuits but is also of interest for its mathematical properties. The work of  also provides an example of the rigidity function not being lower semicontinuous, which implies that the set is not closed. Here, we generalize the result, providing non-closedness examples for many ranks, sparsities and matrix sizes, and discuss the consequences for the Robust PCA and matrix completion problems. In Section 2 we prove Theorem 1.1 and in Section 3 we illustrate how this phenomenon can cause several Robust PCA and matrix completion algorithms to diverge.
2 Main result
We extend the example of with given in (1.5) by constructing and yet for which there exists a sequence of matrices which are in and = 0. Matrix as in (2.5) demonstrates that is not closed for (Lemma 2.2) and matrix as in (2.11) is constructed for (Lemma 2.3). In both cases we require to be sufficiently large in terms of and .
For the case , consider and of the following general form
where and denotes the matrix with all zero entries. These constructed matrices satisfy the following properties.
Lemma 2.1 (General form of ).
Let and be as defined in (2.1). Then . Furthermore
Remark 2.1 (Nested property of sets).
Note that sets form a partially ordered set
for any and . As a consequence implies that also for .
With Lemma 2.1 we give the general form of and such that for . It remains to show that, for a more specific choice of and , we also have . In particular, we construct and as follows.
where are matrices with all non-zero entries, are arbitrary non-singular matrices which may, but need not, be the same, and denote matrices with all entries equal to zero or one respectively, and we set , .
By construction, the matrix size is , due to the matrices and for each being of size , the top left zero matrix and columns of and .
is not closed for provided
It remains to prove that , which is equivalent to showing . We show that having a sparse component is insufficient for , because for any choice of such with at most non-zero entries, the matrix must have a minor with nonzero determinant, implying .
In order to establish we consider minors of each of size . For of these we select minors that include , , along with an additional column from the first columns and an additional row entry from row index to from ; and for the remaining minors we similarly choose a and an additional row and column as before.
These minors are of the form shown in (2.7), where the are chosen to be different entries from for each . This requires to be of size for . Recall that, by construction of , the have no zero entries and are each full rank. The are constructed as
where denotes the matrix with all entries equal to zero. Note that matrices do not have disjoint supports as they have some elements from the top left submatrix of in common. These are the left zero entries in the first row of for and the top zero entries in the first column of for . We refer to these entries as the intersecting part of .
We now consider the possible such that and show that any such must have at least nonzeros, thus . This follows by noting that although the have intersecting portions, restricted to the subminor associated with will have at least one distinct nonzero per . Consider the for associated with and and let be the corresponding sparsity mask of . It follows that must have at least one entry in the non-intersecting set otherwise is of the form
which is insufficient for to become rank deficient; similarly for .
Having shown we set , which then implies that . By the construction of in this argument we have
due to the matrices and each of size , the top left matrix and columns of or rows of respectively, and by zero padding of the matrix we can arbitrarily increase its size. Substituting and , we conclude that is not a closed set for provided
Turning to the case, we now build upon the construction of Lemma 2.2 by constructing matrices and as
where are identical full rank matrices and
have the same structure as in (2.5) but with replaced by and as a result , , , so while .
By construction, the size of is and the size of is .
is not closed for provided
Consider and from (2.11). By additivity of rank for block diagonal matrices, and , we have that .
It remains to show that by proving that . We show that having a sparse component is insufficient for , because for any such , the matrix must have at least one minor with non-zero determinant, implying .
We consider minors of size by diagonally appending a minor of of a similar structure as in (2.7) and the whole diagonal block
Because the matrices are picked from the block diagonal, the intersecting parts of the supports between are only the intersecting parts between the individual , as explained in (2.7) in the proof of Lemma 2.2. We show that, in order for , must have at least one non-zero in a part of that is disjoint from for . Either has at least one non-zero on a zero block or or . If the non-zero is in a zero block or , then these are disjoint, which implies at least non-zero entries. On the other hand, if the non-zero is in , then at least one entry of must be changed in the non-intersecting part of , as argued following equation (2.7). Therefore for every at least one distinct entry per must be changed using the corresponding sparsity component , and since , we must also change at least entries of . We thus have .
By the construction of in this argument we have
where the size of comes from times repeating the matrices and each of size , the top left matrix , the column and row respectively and times repeating matrix of size . By zero padding of the matrix we can arbitrarily increase its size. Substituting gives that is not a closed set for provided
The low-rank plus sparse set is not closed provided and , .
where the first inequality in (2.17) comes from an upper bound on the ceiling function , the second inequality follows from and the last inequality holds for .
The first inequality in (2.18) comes from an upper bound on the ceiling function and the second inequality holds for .
2.1 Quadratic sparsity
Note that the condition limits the order of and ; in particular if then , which for constrains to be at most linear in , . In Lemma 2.4 and Lemma 2.5, we extend the result so that for and we obtain , which for constant rank, , allows to be quadratic in .
Lemma 2.4 establishes a lower bound on the rigidity of block matrices in terms of the rigidity of a single block. Lemma 2.5 shows that the sequence converging to is an example of not being closed provided . Let
where matrices and are of the same structure as in (2.12) and where is constructed by repeating in row and column blocks.
For as in (2.19)
Let be the sparsity matrix corresponding to , such that
where denotes the sparsity matrix used in the place of the block. A necessary condition for is that also the rank of individual blocks is less than or equal to , that is
By definition of the rigidity function as the minimal sparsity of such that , we have that
Summing over all blocks yields the result
and consequently that
is not closed provided
and , .
Consider and as in (2.19). Repeating times in row and column blocks does not increase the rank, so and by additivity of sparsity we have that . By Lemma 2.4 and we have the strict lower bound on the rigidity of
which implies that while .
Recall that the size of as defined in (2.12) is and, since is repeated times, we obtain
where the inequality comes from zero padding of the matrix to arbitrarily expand its size. ∎
The low-rank plus sparse set is not closed provided
and , .
We weaken the condition of Lemma 2.5 and show that it suffices to have for not closed by substituting
where in the first line we substitute , the first inequality comes from an upper bound on the ceiling function, the second inequality follows from , and the last inequality holds for . ∎
2.2 Almost maximally rigid examples of non-closedness
We would like to prove the non-closedness of sets for ranks and sparsities as high as possible. There cannot be a maximally rigid sequence converging outside because corresponds to the set of all matrices. Similarly, it is necessary that both and hold, since sets of rank matrices and sets of sparsity matrices are both closed. As a consequence, the highest possible rank and sparsity for which we may hope to prove that is not closed corresponds to one strictly less than the maximal rigidity bound, i.e. for and also .
It is shown in  that the matrix rigidity function might not be lower semicontinuous even for maximally rigid matrices. This translates into the set not being closed, as we have which converges to by choosing
It is easy to check that for a general choice of , is maximally rigid with . However, since can be expressed in the following way
We therefore have that is not a closed set, which is the optimal result with the highest possible sparsity for sets of rank matrices of size . We pose the question as to whether this result can be generalized and the following conjecture holds.
Conjecture 2.1 (Almost maximally rigid non-closedness).
The low-rank plus sparse set is not closed provided
for and .
3 Numerical examples with divergent Robust PCA and matrix completion
Theorem 1.1 and the constructions in Section 2 indicate that there are matrices for which Robust PCA and matrix completion, as stated in (1.1) and (1.3) respectively, are not well defined. In particular, the objective can be driven to zero while the components diverge with unbounded norms. Herein we give examples of two simple matrices which are of a similar construction to in (1.5),
which are not in , but can be approximated by an arbitrarily close , and for which popular RPCA and MC algorithms exhibit this divergence. This is analogous to the problem of diverging components in the CP-rank decomposition of higher order tensors, which is especially pronounced for algorithms employing an alternating search between the individual components.
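The mechanism these experiments exhibit can be reproduced with an explicit epsilon-parametrised family; the 2-by-2 sketch below is our own illustration rather than the matrices used in the experiments. For this small example the limit still admits a rank-1 plus 1-sparse decomposition, so it illustrates only the diverging constituents; the constructions of Section 2 are needed for limits genuinely outside the set.

```python
import numpy as np

# A rank-1 plus 1-sparse family whose sum converges while both
# constituents blow up in norm.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
for eps in [1e-1, 1e-3, 1e-5]:
    L = np.array([[eps, 1.0], [1.0, 1.0 / eps]])   # rank 1: det = 0
    S = np.array([[0.0, 0.0], [0.0, -1.0 / eps]])  # a single nonzero entry
    print(f"||X-(L+S)||_F={np.linalg.norm(X - (L + S)):.0e}  "
          f"||L||_F={np.linalg.norm(L):.0e}  ||S||_F={np.linalg.norm(S):.0e}")
```

As epsilon decreases, the approximation error to X shrinks like epsilon while the Frobenius norms of both components grow like 1/epsilon, mirroring the divergent iterates observed in the algorithmic experiments.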