Algebraic Variety Models for High-Rank Matrix Completion
We consider a generalization of low-rank matrix completion to the case where the data belongs to an algebraic variety, i.e., each data point is a solution to a system of polynomial equations. In this case the original matrix is possibly high-rank, but it becomes low-rank after mapping each column to a higher dimensional space of monomial features. Many well-studied extensions of linear models, including affine subspaces and their union, can be described by a variety model. In addition, varieties can be used to model a richer class of nonlinear quadratic and higher degree curves and surfaces. We study the sampling requirements for matrix completion under a variety model with a focus on a union of affine subspaces. We also propose an efficient matrix completion algorithm that minimizes a convex or non-convex surrogate of the rank of the matrix of monomial features. Our algorithm uses the well-known “kernel trick” to avoid working directly with the high-dimensional monomial matrix. We show the proposed algorithm is able to recover synthetically generated data up to the predicted sampling complexity bounds. The proposed algorithm also outperforms standard low rank matrix completion and subspace clustering techniques in experiments with real data.
Work in the last decade on matrix completion has shown that it is possible to leverage linear structure in order to interpolate missing values in a low-rank matrix [?]. The high-level idea of this work is that if the data defining the matrix belongs to a structure having fewer degrees of freedom than the entire dataset, that structure provides redundancy that can be leveraged to complete the matrix. The assumption that the matrix is low-rank is equivalent to assuming the data lies on (or near) a low-dimensional linear subspace.
It is of great interest to generalize matrix completion to exploit low-complexity nonlinear structures in the data. Several avenues have been explored in the literature, from generic manifold learning [?], to unions of subspaces [?], to low-rank matrices perturbed by a nonlinear monotonic function [?]. In each case missing data has been considered, but a clear, unifying framework for these ideas is lacking.
In this work we study the problem of completing a matrix whose columns belong to an algebraic variety, i.e., the set of solutions to a system of polynomial equations [?]. This is a strict generalization of the linear (or affine) subspace model, which can be written as the set of points satisfying a system of linear equations. Unions of subspaces and unions of affine spaces are also algebraic varieties. In addition, a much richer class of nonlinear curves, surfaces, and their unions is captured by a variety model.
The matrix completion problem using a variety model can be formalized as follows. Let be a matrix of data points where each column . Define as the mapping that sends the vector to the vector of all monomials in of degree at most , and let denote the matrix that results after applying to each column of , which we call the lifted matrix. We will show the lifted matrix is rank deficient if and only if the columns of belong to an algebraic variety. This motivates the following matrix completion approach:
where represents a projection that restricts to some observation set . The rank of depends on the choice of the polynomial degree and the underlying “complexity” of the variety, in a sense we will make precise. Figure 1 shows two examples of datasets that are low-rank in the lifted space for different polynomial degrees.
In this work we investigate the factors that influence the sampling complexity of varieties as well as algorithms for completion. The challenges are (a) to characterize varieties having low-rank (and therefore few degrees of freedom) in the lifted space, i.e., determine when is low-rank, (b) devise efficient algorithms for solving that can exploit these few degrees of freedom in a matrix completion setting, and (c) determine the trade-offs relative to existing matrix completion approaches. This work contributes considerable progress towards these goals.
For a given variety model, we seek to describe the degrees of freedom that determine the sampling complexity of the model. For example, it is well-known that rank matrix can be completed from sampled entries under standard incoherence assumptions [?]. This is very close to the degrees of freedom for such a matrix. Similarly, the degrees of freedom in the lifted space is , where is the rank of the lifted matrix and is the number of higher-degree monomials. This is suggestive of the number of samples required for completion in the lifted space and in turn the number of samples required in the original observation space. We note that although and , for many varieties , implying potential for completion in the lifted space.
Our contributions are as follows. We identify bounds on the rank of a matrix when the columns of the data matrix belong to an algebraic variety. We study how many entries of such a matrix should be observed in order to recover the full matrix from an incomplete sample. We show as a case study that monomial representations produce low-rank representations of unions of subspaces, and we characterize the rank. The standard union of subspace representation as a discrete collection of individual subspaces is inherently non-smooth in nature, whereas the algebraic variety allows for a purely continuous parameterization. This leads to general algorithms for completion of a data matrix whose columns belong to a variety. The algorithms’ performance is showcased on data simulated as a union of subspaces, a union of low-dimensional parametric surfaces, and real data from a motion segmentation dataset and a motion capture dataset. The simulations show that the performance of our algorithm matches our predictions and outperforms other methods. In addition, the analysis of the degrees of freedom associated with the proposed representations introduces several new research avenues at the intersection of nonlinear algebraic geometry and random matrix theory.
There has been a great deal of research activity on matrix completion problems since [?], where the authors showed that one can recover an incomplete matrix from few entries using a convex relaxation of the rank minimization optimization problem. At this point it is even well-known that entries are necessary and sufficient [?] for almost every matrix as long as the measurement pattern satisfies certain deterministic conditions. However, these methods and theory are restricted to low-rank linear models. A great deal of real data exhibit nonlinear structure, and so it is of interest to generalize this approach.
Work in that direction has dealt with union of subspaces models [?], locally linear approximations [?], as well as low-rank models perturbed by an arbitrary nonlinear link function [?]. In this paper we instead seek a more general model that captures both linear and nonlinear structure. The variety model has as instances low-rank subspaces and their union as well as quadratic and higher degree curves and surfaces.
Work on kernel PCA (cf., [?]) leverages similar geometry to ours. In Kernel Spectral Curvature Clustering [?], the authors similarly consider clustering of data points via subspace clustering in a lifted space using kernels. These works are algorithmic in nature, with promising numerical experiments, but do not systematically consider missing data or analyze relative degrees of freedom.
This paper also has close ties to algebraic subspace clustering (ASC) [?], also known as generalized PCA. Similar to our approach, the ASC framework models unions of subspaces as an algebraic variety, and makes use of monomial liftings of the data to identify the subspaces. Characterizations of the rank of data belonging to union of subspaces under the monomial lifting are used in the ASC framework [?] based on results in [?]. The difference between the results in [?] and those in Prop. ? is that ours hold for monomial liftings of all degrees , not just , where is the number of subspaces. Also, the main focus of ASC is to recover unions of subspaces or unions of affine spaces, whereas we consider data belonging to a more general class of algebraic varieties. Finally, the ASC framework has not been adapted to the case of missing data, which is the main focus of this work.
As a simple example to illustrate our approach, consider a matrix
whose six columns satisfy the quadratic equation
for and some unknown constants that are not all zero. Generically, will be full rank. However, suppose we vertically expand each column of the matrix to make a matrix
i.e., we augment each column of with a and with the quadratic monomials , , . This allows us to re-express the polynomial equation as the matrix-vector product
where . In other words, is rank deficient. Suppose, for example, that we are missing entry of . Since is full rank, there is no way to uniquely complete the missing entry by leveraging linear structure alone. Instead, we ask: Can we complete using the linear structure present in ? Due to the missing entry , the first column of will have the following pattern of missing entries: . However, assuming the five complete columns in are linearly independent, we can uniquely determine the nullspace vector up to a scalar multiple. Then from we have
In general, this equation will yield at most two possibilities for . Moreover, there are conditions where we can uniquely recover , namely when and .
This example shows that even without a priori knowledge of the particular polynomial equation satisfied by the data, it is possible to uniquely recover missing entries in the original matrix by leveraging induced linear structure in the matrix of expanded monomials. We now show how to considerably generalize this example to the case of data belonging to an arbitrary algebraic variety.
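The toy example above can be sketched in a few lines. This is only an illustration under stated assumptions: the conic (a circle of radius 5), the six data points, and the helper names `lift2`, `nullspace_vector`, and `candidates_for_x1` are hypothetical choices, not taken from the paper. The five complete columns are lifted to degree-2 monomials, the one-dimensional null space is found by exact elimination, and the resulting quadratic is solved for the missing entry.

```python
from fractions import Fraction
from math import sqrt

def lift2(x1, x2):
    """Degree-2 monomial features of (x1, x2): (1, x1, x2, x1^2, x1*x2, x2^2)."""
    return [1, x1, x2, x1 * x1, x1 * x2, x2 * x2]

def nullspace_vector(rows):
    """Return a nonzero v with rows @ v = 0, assuming the null space is 1-dimensional."""
    a = [[Fraction(e) for e in row] for row in rows]
    n_cols = len(a[0])
    pivots, r = [], 0
    for c in range(n_cols):
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue  # no pivot here, so this column is a free variable
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [e / piv for e in a[r]]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [ei - f * er for ei, er in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == len(a):
            break
    free = [c for c in range(n_cols) if c not in pivots]
    if len(free) != 1:
        raise ValueError("expected a one-dimensional null space")
    v = [Fraction(0)] * n_cols
    v[free[0]] = Fraction(1)
    for i, c in enumerate(pivots):
        v[c] = -a[i][free[0]]
    return v

def candidates_for_x1(v, x2):
    """Solve v . lift2(x1, x2) = 0 for the unknown x1, given the observed x2."""
    a = v[3]
    b = v[1] + v[4] * x2
    c = v[0] + v[2] * x2 + v[5] * x2 * x2
    if a == 0:  # the equation is linear in x1
        return [float(-c / b)] if b != 0 else []
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = sqrt(disc)
    return sorted({float((-b - root) / (2 * a)), float((-b + root) / (2 * a))})

# Five fully observed columns on the (assumed) circle x1^2 + x2^2 = 25.
complete = [(4, 3), (-3, 4), (5, 0), (0, 5), (-4, -3)]
v = nullspace_vector([lift2(*p) for p in complete])

# Sixth column: true point (3, 4), with x1 missing and x2 = 4 observed.
candidates = candidates_for_x1(v, x2=4)
print(candidates)
```

In this instance the quadratic has two real roots, so the missing entry is narrowed to two candidates rather than recovered uniquely, in line with the "at most two possibilities" remark above.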
Let be a matrix of data points where each column . Define as the mapping that sends the vector to the vector of all monomials in of degree at most :
where is a multi-index of non-negative integers, with , and . In the context of kernel methods in machine learning, the map is often called a polynomial feature map [?]. Borrowing this terminology, we call a feature vector, the entries of features, and the range of feature space. Note that the number of features is given by , the number of unique monomials in variables of degree at most . When is an matrix, we use to denote the matrix .
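A minimal sketch of the polynomial feature map described above, using only the standard library. The function name `monomial_features` and the ordering of the monomials are assumptions made for illustration. The count of features is checked against the standard formula C(n + d, d) for the number of monomials of degree at most d in n variables.

```python
from itertools import combinations_with_replacement
from math import comb, prod

def monomial_features(x, d):
    """All monomials in the entries of x of total degree at most d (constant first)."""
    feats = []
    for degree in range(d + 1):
        # Each multiset of indices of size `degree` identifies one monomial.
        for idx in combinations_with_replacement(range(len(x)), degree):
            feats.append(prod(x[i] for i in idx))
    return feats

def lift_columns(columns, d):
    """Apply the feature map to every column of a data matrix (given as a list of columns)."""
    return [monomial_features(col, d) for col in columns]

example = monomial_features([2, 3], 2)   # monomials 1, x1, x2, x1^2, x1*x2, x2^2
n, d = 5, 3
num_features = len(monomial_features([1.0] * n, d))
print(example, num_features, comb(n + d, d))
```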
The problem we consider is this: can we complete a partially observed matrix under the assumption that is low-rank? This can be posed as the optimization problem given above in Equation . We give a practical algorithm for solving a relaxation of in Section 4. Similar to previous work cited above on using polynomial feature maps, our method leverages the kernel trick for efficient computations. However, it would be naïve to think of the associated analysis as applying known results on matrix completion sample complexities to our high-dimensional feature space. In particular, if we observe entries per column in a rank- matrix of size and apply the polynomial feature map, then in the feature space we have entries per column in a rank- matrix of size . Generally, the number of samples, rank, and dimension all grow in the mapping to feature space, but they grow at different rates depending on the underlying geometry; it is not immediately obvious what conditions on the geometry and sampling rates impact our ability to determine the missing entries. In the remainder of this section, we show how to relate the rank of to the underlying variety, and we study the sampling requirements necessary for the completion of the matrix in feature space.
To better understand the rank of the matrix , we introduce some additional notation and concepts from algebraic geometry. Let denote the space of all polynomials with real coefficients in variables . We model a collection of data as belonging to a real (affine) algebraic variety [?], which is defined as the common zero set of a system of polynomials :
Suppose the variety is defined by the finite set of polynomials , where each has degree at most . Let be the matrix whose columns are given by the vectorized coefficients of the polynomials in . Then the columns of belong to the variety if and only if . In particular, assuming the columns of are linearly independent, this shows that has rank . Hence, when the number of data points , then is rank deficient.
However, the exact rank of could be much smaller than , especially when the degree is large. This is because the coefficients of any polynomial that vanishes at every column of satisfy . We will find it useful to identify this space of coefficients with a finite dimensional vector space of polynomials. Let be the space of all polynomials in real variables of degree at most . We define the vanishing ideal of degree corresponding to a set , denoted by , to be the subspace of polynomials belonging to that vanish at all points in :
We also define the non-vanishing ideal of degree corresponding to , denoted by , to be the orthogonal complement of in :
where the inner product of polynomials is defined as the inner product of their coefficient vectors. Hence, the rank of a data matrix in feature space can be expressed in terms of the dimension of the non-vanishing ideal of degree corresponding to , the set of all columns of . Specifically, we have where
This follows from the rank-nullity theorem, since has dimension . In general the dimension of the space or is difficult to determine when is an arbitrary set of points. However, if we assume is a subset of a variety , since we immediately have the bound
In certain cases can be computed exactly or bounded using properties of the polynomials defining . For example, it is possible to compute the dimension of directly from a Gröbner basis for the vanishing ideal associated with [?]. In Section 3 we show how to bound the dimension of in the case where is a union of subspaces.
Informally, the degrees of freedom of a class of objects is the minimum number of free variables needed to describe an element in that class uniquely. For example, a rank matrix has degrees of freedom: parameters to describe linearly independent columns making up a basis of the column space, and parameters to describe the remaining columns in terms of this basis. It is impossible to uniquely complete a matrix in this class if we sample fewer than this many entries.
We can make a similar argument to specify the minimum number of samples needed to uniquely complete a matrix that is low-rank when mapped to feature space. First, we characterize how missing entries of the data matrix translate to missing entries in feature space. For simplicity, we will assume a sampling model where we sample a fixed number of entries from each column of the original data matrix. Let represent a single column of the data matrix, and with denote the indices of the sampled entries of . The pattern of revealed entries in corresponds to the set of multi-indices:
which has the same cardinality as the set of all monomials of degree at most in variables, i.e., . If we call this quantity , then the ratio of revealed entries in to the feature space dimension is
which is on the order of for small . More precisely, we have the bounds
In total, observing entries per column of the data matrix translates to entries per column in feature space. Suppose the lifted matrix is rank . By the preceding discussion, we need at least entries of the feature space matrix to complete it uniquely among the class of all matrices of rank . Hence, at minimum we need to satisfy
Let denote the minimal value of such that achieves the bound , and set . Dividing through by the feature space dimension and gives
and so from we see we can guarantee this bound by having
and this in fact will result in tight satisfaction of because for small and large .
At one extreme where the matrix is full rank, then or and according to we need , i.e., full sampling of every data column. At the other extreme where instead we have many more data points than the feature space rank, , then gives the asymptotic bound .
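The counting argument above can be turned into a small calculator. This is a sketch under assumptions: the symbols `n` (ambient dimension), `d` (degree), `R` (feature space rank), and `s` (number of columns) are names chosen here, and the degrees of freedom of a rank-R matrix of size N by s are taken to be R(N + s - R), i.e., RN parameters for a basis of the column space plus R(s - R) for the remaining columns. Observing m entries of a column reveals C(m + d, d) entries of its lifted version.

```python
from math import comb

def lifted_observation_ratio(m, n, d):
    """Fraction of lifted entries revealed when m of n entries of a column are observed."""
    return comb(m + d, d) / comb(n + d, d)

def min_entries_per_column(n, d, R, s):
    """Smallest m with s * C(m+d, d) >= R * (N + s - R), where N = C(n+d, d).

    This is a necessary-condition rule of thumb only; it does not guarantee
    that completion actually succeeds with this many samples.
    """
    N = comb(n + d, d)
    dof = R * (N + s - R)
    for m in range(n + 1):
        if s * comb(m + d, d) >= dof:
            return m
    return None  # only reachable if R exceeds min(N, s)

# Hypothetical setting: n = 20, d = 2, s = 1000 columns.
N = comb(20 + 2, 2)
m_low_rank = min_entries_per_column(20, 2, R=18, s=1000)   # R = 3 * C(2+2, 2)
m_full_rank = min_entries_per_column(20, 2, R=N, s=1000)   # full-rank lifted matrix
print(m_low_rank, m_full_rank, lifted_observation_ratio(m_low_rank, 20, 2))
```

The full-rank call returns m = n, matching the observation above that a full-rank lifted matrix requires sampling every entry of every column.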
The above discussion bounds the degrees of freedom of a matrix that is rank- in feature space. Of course, the proposed variety model has potentially fewer degrees of freedom than this, because additionally the columns of the lifted matrix are constrained to lie in the image of the feature map. We use the above bound only as a rule of thumb for sampling requirements on our matrix. Furthermore, we note that sample complexities for standard matrix completion often require that locations are observed uniformly at random, whereas in our problem the locations of observations in the lifted space will necessarily be structured. However, recent work shows that matrix completion can succeed without these assumptions [?], which gives reason to believe random samples in the original space may allow completion in the lifted space, and our empirical results support this rationale.
3 Case Study: Union of affine subspaces
A union of affine subspaces can also be modeled as an algebraic variety. For example, with , the union of the plane and the line is the zero-set of the quadratic polynomial . In general, if are affine spaces of dimension and , respectively, then we can write and where the and are linear, and their union can be expressed as the common zero set of all possible products of the and :
i.e., is the common zero set of a system of quadratic equations. This argument can be extended to show a union of affine subspaces of dimensions is a variety described by a system of polynomial equations of degree .
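The product construction above can be checked on a small instance. The specific plane and line below, and the helper names, are hypothetical choices for illustration: the plane {x3 = 1} is cut out by one linear polynomial and the line {x1 = 0, x2 = 0} by two, so their union is the common zero set of the two pairwise products.

```python
def products_of_generators(ps, qs):
    """All products p_i * q_j; their common zero set is the union of the two affine spaces."""
    return [lambda x, p=p, q=q: p(x) * q(x) for p in ps for q in qs]

# Assumed example in R^3 (0-based indexing: x[0], x[1], x[2]).
plane = [lambda x: x[2] - 1]              # {x3 = 1}
line = [lambda x: x[0], lambda x: x[1]]   # {x1 = 0, x2 = 0}
union_eqs = products_of_generators(plane, line)

def on_union(x):
    """True when every quadratic product vanishes at x."""
    return all(f(x) == 0 for f in union_eqs)

print(on_union((2, -5, 1)), on_union((0, 0, 7)), on_union((1, 0, 0)))
```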
In this section we establish bounds on the feature space rank for data belonging to a union of subspaces. We will make use of the following lemma that shows the dimension of a vanishing ideal is fixed under an affine change of variables:
We omit the proof for brevity, but the result is elementary and relies on the fact the degree of a polynomial is unchanged under an affine change of variables. Our next result establishes a bound on the feature space rank for a single affine subspace:
By Lemma ?, is preserved under an affine transformation of . Note that we can always find an affine change of variables with invertible and such that in the coordinates the variety becomes
For any polynomial , the only monomial terms in that do not vanish on are those having the form . Furthermore, any polynomial in just these monomials that vanishes on all of must be the zero polynomial, since the are free variables. Therefore,
i.e., the non-vanishing ideal coincides with the space of polynomials in variables of degree at most , which is , proving the claim.
We note that for sufficiently large, the bound in becomes an equality, provided the data points are distributed generically within the affine subspace, meaning they are not the solution to additional non-trivial polynomial equations of degree at most .
We now derive bounds on the dimension of the non-vanishing/vanishing ideals for a union of affine varieties. Below we give a more general argument for a union of arbitrary varieties, then specialize to affine spaces.
Let be any two varieties. It follows directly from definitions that
Applying orthogonal complements to both sides above gives
Therefore, we have the bound
In the case of an arbitrary union of varieties , repeated application of gives
Specializing to the case where each is an affine subspace of dimension at most and applying the result in Prop. ? gives the following result:
We remark that in some cases the bound is (nearly) tight. For example, if the data lies on the union of two -dimensional affine subspaces and that are mutually orthogonal, one can show . The rank is one less than the bound in because has dimension one, coinciding with the space of constant polynomials. Determining the exact rank for data belonging to an arbitrary finite union of subspaces appears to be an intricate problem; see [?] which studies the related problem of determining the Hilbert series of a union of subspaces. Empirically, we observe that the bound in is order-optimal with respect to and .
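The near-tightness remark can be illustrated numerically on a small case. The sketch below makes several assumptions: the bound is taken to have the form k · C(r + d, d) (k subspaces, each of dimension at most r), the data are points on the two coordinate axes of the plane (two orthogonal one-dimensional subspaces), and the helper names are chosen here. The rank of the lifted data is computed exactly with rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb, prod

def monomial_features(x, d):
    """All monomials in the entries of x of total degree at most d."""
    return [prod(x[i] for i in idx)
            for degree in range(d + 1)
            for idx in combinations_with_replacement(range(len(x)), degree)]

def exact_rank(rows):
    """Rank by Gaussian elimination over the rationals (no round-off)."""
    a = [[Fraction(e) for e in row] for row in rows]
    rank = 0
    for c in range(len(a[0])):
        p = next((i for i in range(rank, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue
        a[rank], a[p] = a[p], a[rank]
        for i in range(rank + 1, len(a)):
            f = a[i][c] / a[rank][c]
            a[i] = [ei - f * er for ei, er in zip(a[i], a[rank])]
        rank += 1
        if rank == len(a):
            break
    return rank

n, r, k, d = 2, 1, 2, 3
ts = [1, 2, 3, 4, -1, -2]
points = [(t, 0) for t in ts] + [(0, t) for t in ts]   # union of the two axes

# Each lifted point is stored as a row; the rank equals that of the lifted matrix.
lifted_rank = exact_rank([monomial_features(p, d) for p in points])
bound = k * comb(r + d, d)
N = comb(n + d, d)
print(lifted_rank, bound, N)
```

Here the lifted rank falls one short of the assumed bound, consistent with the two subspaces sharing only the constant polynomials.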
The feature space rank to dimension ratio in this case is given by
Recall that the minimum sampling rate is approximately for . Hence we would need
This rate compares favorably with that of low-rank matrix completion approaches, which need measurements per column for a union of subspaces having dimension . While this bound suggests it is always better to take the degree as large as possible, this is only true for sufficiently large . To take advantage of the improved sampling rate implied by , according to we need the number of data vectors per subspace to be . In other words, our model is able to accommodate more subspaces with larger but at the expense of requiring exponentially more data points per subspace. We note that if the number of data points is not an issue, we could take and require only observed entries per column. In this case, for moderately sized (e.g., ) we should choose we have or . In fact, we find that for these values of we get excellent empirical results, as we show in Section ?.
There are several existing matrix completion algorithms that could potentially be adapted to solve a relaxation of the rank minimization problem , such as singular value thresholding [?], or alternating minimization [?]. However, these approaches do not easily lend themselves to “kernelized” implementations, i.e., ones that do not require forming the high-dimensional lifted matrix explicitly, but instead make use of the efficiently computable kernel function for polynomial feature maps, $k_d(x, y) = (1 + x^T y)^d$.
For matrices $X$ and $Y$ with columns $x_i$ and $y_j$, we use $k_d(X, Y)$ to denote the matrix whose $(i,j)$-th entry is $k_d(x_i, y_j)$, equivalently, $k_d(X, Y) = (\mathbf{1} + X^T Y)^{\odot d}$,
where $\mathbf{1}$ is the matrix of all ones, and $(\cdot)^{\odot d}$ denotes the entrywise $d$-th power of a matrix. A kernelized implementation of the matrix completion algorithm is critical for large $d$, since the number of rows of the lifted matrix scales exponentially with $d$.
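A short check of the kernel trick for the polynomial kernel (1 + x·y)^d. As the footnote to this section notes, this kernel corresponds to a monomial map rescaled by multinomial coefficients rather than to the plain monomial map; the sketch below assumes the weights are square roots of the coefficients d! / (α_0! α_1! ⋯ α_n!) with α_0 = d − |α|, which follows from the multinomial theorem. Function names are illustrative.

```python
from itertools import product
from math import factorial, isclose, prod, sqrt

def weighted_features(x, d):
    """Monomials of degree <= d, each scaled by the square root of a multinomial coefficient."""
    feats = []
    for alpha in product(range(d + 1), repeat=len(x)):
        if sum(alpha) > d:
            continue
        denom = factorial(d - sum(alpha)) * prod(factorial(a) for a in alpha)
        coeff = factorial(d) // denom
        feats.append(sqrt(coeff) * prod(xi ** ai for xi, ai in zip(x, alpha)))
    return feats

def poly_kernel(x, y, d):
    """Kernel evaluated directly in the original space: (1 + <x, y>)^d."""
    return (1 + sum(xi * yi for xi, yi in zip(x, y))) ** d

def kernel_matrix(X_cols, Y_cols, d):
    """Matrix whose (i, j) entry is poly_kernel(x_i, y_j, d); inputs are lists of columns."""
    return [[poly_kernel(x, y, d) for y in Y_cols] for x in X_cols]

x, y, d = (0.5, -1.0, 2.0), (1.5, 0.25, -0.5), 3
explicit = sum(a * b for a, b in zip(weighted_features(x, d), weighted_features(y, d)))
implicit = poly_kernel(x, y, d)
print(explicit, implicit)
```

The explicit inner product touches C(n + d, d) features per vector, while the kernel evaluation costs one n-dimensional inner product and a power, which is the point of the kernelized formulation.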
One class of algorithm that kernelizes very naturally is the iterative reweighted least squares (IRLS) approach of [?] for low-rank matrix completion. The algorithm also has the advantage of being able to accommodate the non-convex Schatten- relaxation of the rank penalty, in addition to the convex nuclear norm relaxation. Specifically, we use an IRLS approach to solve
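where $\|Y\|_{S_p}$ is the Schatten-$p$ quasi-norm defined as $\|Y\|_{S_p} = \big(\sum_i \sigma_i(Y)^p\big)^{1/p}$,
with $\sigma_i(Y)$ denoting the $i$th singular value of $Y$. Note that for $p = 1$ we recover the nuclear norm. We call this optimization formulation variety-based matrix completion (VMC).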
The basic idea behind the IRLS approach can be illustrated in the case of the nuclear norm. First, we can re-express the nuclear norm as a weighted Frobenius norm:
and then attempt to minimize the nuclear norm of a matrix belonging to a constraint set by performing the iterations
Note the -update can be recast as a weighted least-squares problem subject to the iteratively updated weight matrix , lending the algorithm its name. To ensure the matrix defining is invertible, and to improve numerical stability, we can also introduce a smoothing parameter to the -update as , satisfying as .
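For reference, the standard identity behind this reweighting can be written out as follows. The symbols $X$, $W$, $\gamma_n$, and the constraint set $\mathcal{C}$ are assumed names, since the original displayed equations are not available here, and the identity presumes $X^T X$ is invertible, which is what the smoothing parameter is meant to ensure.

```latex
\begin{align*}
\|X\|_* &= \operatorname{tr}\!\big[(X^T X)^{1/2}\big]
         = \operatorname{tr}\!\big[(X^T X)(X^T X)^{-1/2}\big]
         = \|X W^{1/2}\|_F^2,
         \qquad W = (X^T X)^{-1/2}, \\
W_n &= (X_n^T X_n + \gamma_n I)^{-1/2},
\qquad
X_{n+1} = \arg\min_{X \in \mathcal{C}} \|X W_n^{1/2}\|_F^2,
\qquad \gamma_n \to 0 \ \text{as} \ n \to \infty .
\end{align*}
```

The last equality in the first line uses $\|X W^{1/2}\|_F^2 = \operatorname{tr}(W^{1/2} X^T X W^{1/2}) = \operatorname{tr}(X^T X\, W)$.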
Making the substitution , and replicating the steps above gives the following IRLS approach for solving with :
Rather than finding the exact minimum in the update, which could be costly, following the approach in [?], we instead take a single projected gradient descent step to update . A straightforward calculation shows that the gradient of the objective is given by , where denotes an entry-wise product. Hence a projected gradient step is given by
where is a step-size parameter.
- This is a consequence of the Hilbert basis theorem [?], which shows that every vanishing ideal has a finite generating set. For related discussion see Appendix C of [?].
- Strictly speaking, is not the kernel associated with the polynomial feature map as defined in . Instead, it is the kernel of the related map where are appropriately chosen multinomial coefficients.