Exact Solutions in Structured Low-Rank Approximation

Exact Solutions in Structured Low-Rank Approximation

Abstract

Structured low-rank approximation is the problem of minimizing a weighted Frobenius distance to a given matrix among all matrices of fixed rank in a linear space of matrices. We study the critical points of this optimization problem using algebraic geometry. A particular focus lies on Hankel matrices, Sylvester matrices and generic linear spaces.

1Introduction

Low-rank approximation in linear algebra refers to the following optimization problem:

Here, we are given a real data matrix of format , and we wish to find a matrix of rank at most that is closest to in a weighted Frobenius norm. The entries of the weight matrix are positive reals. If and the weight matrix is the all-one matrix then the solution to (Equation 1) is given by the singular value decomposition

Here are orthogonal matrices, and are the singular values of . By the Eckart-Young Theorem, the matrix of rank closest to equals

For weights , the situation is more complicated, as seen in the studies [17]. In particular, there can be many local minima. We discuss a small instance in Example ?.

In structured low-rank approximation [5], we are also given a linear subspace , typically containing the matrix . We consider the restricted problem:

A best-case scenario for is this: if lies in then so does . This happens for some subspaces , including symmetric and circulant matrices, but most subspaces do not enjoy this property (cf. [5]). Our problem is difficult even for .

Most practitioners use local methods to solve (Equation 3). These methods return a local minimum. There are many heuristics for ensuring that a local minimum is in fact a global minimum, but there is never a guarantee that this has been accomplished. Another approach is to set up sum of squares relaxations, which are then solved with semidefinite programming (cf. [2]). These SOS methods furnish certificates of global optimality whenever the relaxation is exact. While this does happen in many instances, there is no a-priori guarantee either.

How then can one reliably find all global optima to a polynomial optimization problem such as (Equation 3)? Aside from interval arithmetic and domain decomposition techniques, the only sure method we are aware of is to list and examine all the critical points. Algorithms that identify the critical points, notably Gröbner bases [8] and numerical algebraic geometry [1], find all solutions over the complex numbers and sort out the real solutions after the fact. The number of complex critical points is an intrinsic invariant of an optimization problem, and it is a good indicator of the running time needed to solve that problem exactly. The study of such algebraic degrees is an active area of research, and well-developed results are now available for semidefinite programming [21] and maximum likelihood estimation [4].

The present paper applies this philosophy to structured low-rank approximation. A general degree theory for closest points on algebraic varieties was introduced by Draisma et al. in [6]. Following their approach, our primary task is to compute the number of complex critical points of (Equation 3). Thus, we seek to find the Euclidean distance degree (ED degree) of

This determinantal variety is always regarded as a subvariety of the matrix space , and we use the -weighted Euclidean distance coming from . We write for the -weighted Euclidean distance degree of the variety . Thus is the number of complex critical points of the problem (Equation 3) for generic data matrices . The importance of keeping track of the weights was highlighted in [6], for the seemingly harmless situation when is the subspace of all symmetric matrices in .

Our initial focus lies on the unit ED degree, when is the all-one matrix, and on the generic ED degree, denoted , when the weight matrix is generic. Choosing generic weights ensures that the variety meets the isotropic quadric transversally, and it hence allows us to apply formulas from intersection theory such as [6].

This paper is organized as follows. In Section 2 we offer a computational study of our optimization problem (Equation 3) when the subspace is generic of codimension . Two cases are to be distinguished: either is a vector space, defined by homogeneous linear equations in the matrix entries, or is an affine space, defined by inhomogeneous linear equations. We refer to these as the linear case and affine case respectively. We present Gröbner basis methods for computing all complex critical points, and we report on their performance. From the complex critical points, one identifies all real critical points and all local minima.

In Section 3 we derive some explicit formulas for when is generic. We cover the four cases that arise by pairing the affine case and the linear case with either unit weights or generic weights. Here we are using techniques from algebraic geometry, including Chern classes and the analysis of singularities. In Section 4, we shift gears and we focus on special matrices, namely Hankel matrices and Sylvester matrices. Those spaces arise naturally from symmetric tensor decompositions and approximate GCD computations. These applications require the use of certain specific weight matrices other than .

We close the introduction with two examples that illustrate the concepts above.

William Rey [22] reports on numerical experiments with the optimization problem (Equation 1), and he asks whether the number of local minima is bounded above by . Our Example ? gives a negative answer: the number of local minima can exceed . This result highlights the value of our exact algebraic methods for practitioners of optimization.

2Gröbner Bases

The critical points of the low-rank approximation problem (Equation 3) can be computed as the solution set of a system of polynomial equations. In this section we derive these equations, and we demonstrate how to solve a range of instances using current Gröbner basis techniques. Here, our emphasis lies on the case when is a generic subspace, either linear or affine.

Starting with the linear case, let be a basis of , the space of linear forms on that vanish on . Thus , each derivative is a constant, and . The case when is an affine space can be treated with the same notation if we take each to be a linear form plus a constant.

The following implicit formulation of the critical equations is a variation on [6]. We begin with the case . Let denote the determinant of the -matrix . Given a data matrix , the critical points of on the determinantal hypersurface verify the following conditions. The matrix on the right has rows and columns:

Any singular point of also satisfies these conditions. The rank condition on the Jacobian matrix can be modeled by introducing Lagrange multipliers . These are new variables. We now consider the following polynomial system in variables:

Table ? shows the number of complex solutions to these equations. These numbers are obtained from the formulas in Section 3. We verified them using Gröbner bases.

The ED degree for the determinant of an -matrix with linear or affine entries.
2 3 4 5
4 15 28 45
2 31 92 205
0 39 188 605
33 260 1221
21 284 1805
9 284 2125
3 284 2205
0 284 2205
264 2205
204 2205
120 2205
52 2205
16 2205
4 2205
0 2205
The ED degree for the determinant of an -matrix with linear or affine entries.
2 3 4 5
6 15 28 45
4 31 92 205
2 39 188 605
39 260 1221
33 284 1805
21 284 2125
9 284 2205
3 284 2205
284 2205
264 2205
204 2205
120 2205
52 2205
16 2205
4 2205
The ED degree for the determinant of an -matrix with linear or affine entries.
6 39 284 2205
4 39 284 2205
2 39 284 2205
0 39 284 2205
33 284 2205
21 284 2205
9 284 2205
3 284 2205
0 284 2205
264 2205
204 2205
120 2205
52 2205
16 2205
4 2205
0 2205
The ED degree for the determinant of an -matrix with linear or affine entries.
6 39 284 2205
6 39 284 2205
4 39 284 2205
2 39 284 2205
39 284 2205
33 284 2205
21 284 2205
9 284 2205
3 284 2205
284 2205
264 2205
204 2205
120 2205
52 2205
16 2205
4 2205

We observe that Table ? has the following remarkable properties:

  • There is a shift between the ED degrees of affine and linear sections for . This phenomenon will be explained in Proposition ?.

  • For general, the third block of columns (linear entries) is constant for and the fourth one (affine entries) is constant for . This is explained in Corollaries ? and ?.

  • The differences between the first and the third block of columns (both with linear entries) equal those between the second and the fourth one (both with affine entries). This gap is expressed (conjecturally) with formula (Equation 10).

We prove the correctness of the formulation (Equation 4) and then discuss our computations.

We prove this for linear spaces . The argument is similar when is an affine space. Any solution of the system (Equation 4) corresponds to a point of where the Jacobian matrix of has a rank defect. There are two types of such points: the critical points of the distance function and singular points on the determinantal variety. Hence it suffices to prove that no point in the singular locus corresponds to a solution of (Equation 4). The matrix was assumed to be generic, so it has rank since is also generic.

If is a singular point of the linear section of the variety defined by , then there exists with such that

Let us assume by contradiction that extends to a solution of (Equation 4). Then

This means that belongs to and belongs to . Here denotes the Hadamard (coordinatewise) product of two matrices. The scalar product of and is zero. Since all coordinates live in , these conditions imply , and hence . We get a contradiction since has full rank, whereas .

The values of in Table ? can be verified computationally with the formulation (Equation 4). We used the implementation of Faugère’s Gröbner basis algorithm [8] in the maple package FGb. Computing Gröbner bases for (Equation 4) was fairly easy for , but difficult already for . For each of the cases in Table ?, we computed the ED degree by running FGb over the finite field with elements. However, due to substantial coefficient growth, this did not work over the field of rational numbers. Hence, to actually compute all critical points over and hence all local minima over , even for , a better formulation was required. In what follows we shall present two such improved formulations.

Duality plays a key role in the computation of the critical points of the Euclidean distance and was investigated in [6]. In what follows, we compute the critical points of the weighted Euclidean distance of the determinant by using this duality. In the following statement we are using the standing hypothesis that all are non-zero.

The critical points of (1) correspond to matrices such that the Hadamard product is perpendicular to the tangent space at of the variety of corank matrices. Recall, e.g. from [6], that the dual variety to is the variety of rank matrices. Hence, the critical points in (1) can be found by solving the linear equation on the conormal variety. That conormal variety is the set of all pairs such that , , , and . We can now express in terms of and the parameters by writing , where denotes the Hadamard (coordinatewise) inverse of the weight matrix . Using biduality, this means that is perpendicular to the tangent space at of the variety . This is equivalent to the statement that is a critical point of (2) on .

In both Propositions ? and ?, it is assumed that the given matrix is generic. Here the term generic is meant in the usual sense of algebraic geometry: lies in the complement of an algebraic hypersurface. In particular, that complement is dense in , so will be generic with probability one when drawn from a probability measure supported on . However, an exact characterization of genericity is difficult. The polynomial that defines the aforementioned hypersurface is the ED discriminant. As can be seen in [6], this is a very large polynomial of high degree, and we will rarely be able to identify it in an explicit way.

Proposition ? shows that weighted low-rank approximation can be solved by the dual problem. We focus now on the corank 1 case (whose dual problem is rank 1 approximation). For this, we use the parametrization of matrices of rank by

The parametrization (Equation 5) expresses the dual problem (for corank one) as an unconstrained optimization problem in variables:

Here, “maximize” is used in an unconventional way: what we seek is the critical point furthest to . That critical point need not be a local maximum; see e.g. [6]. We compute the critical points for (Equation 6) by applying Gröbner bases to the equations

The critical points of the primal problem are found by the formula .

This concludes our discussion of square matrices of rank or corank . We next consider the general case of rectangular matrices of format with general linear or affine entries. We assume and . Let be a complex -matrix of rank . Then is a smooth point in the variety of matrices of rank . Let and denote the left and right kernels of respectively. The normal space of at has dimension , and it equals [11]. Its orthogonal complement is the tangent space at , which has dimension .

In order to construct a polynomial system whose solutions are the critical points of on the smooth locus of , we introduce two matrices of unknowns:

For , , let be the rank matrix which is the product of the th column of and of the th row of . We consider

The rank condition on the matrix in (Equation 7) comes from the fact that is a critical point if the gradient of the distance function at belongs to the normal space of at . The first rows of the matrix span the normal space of at a smooth point. This formulation avoids saturating by the singular locus, which is often too costly.

This is derived from [6]. It is analogous to Proposition ?.

As in the corank case, for special data some critical points may be missed because our formulation computes only the critical points in a dense open subset of . However, the same fix as in Remark ? works here. We can redo the computations in any of the charts corresponding to the invertibility of pairs of square submatrices of and .

We next discuss our computational experience with Gröbner bases. In Table ?, we compare the efficiency of the different approaches on a specific problem: computing the weighted rank approximation of a matrix. The experimental setting is the following: we consider a matrix with integer entries picked uniformly at random in and a random weight matrix with positive integer entries chosen at random in . By Table ?, the generic ED degree is and the ED degree for is . We report in Table ? the timings for computing a lexicographical Gröbner basis with the maple package FGb [8]. Once a Gröbner basis is known, isolation techniques may be used to obtain the real roots. The maple package fgbrs provides implementations of such methods.

Symbolic computation of the weighted rank approximations of a matrix
Determinant
primal (Equation 4)
Parametric
dual (Equation 6)
Normal space
primal (Equation 7)
Normal space
dual (Equation 7)
generic, 5s 1.3s 6s 8.6s
generic, over day 891s 1327s 927s
, over 0.3s 0.2s 0.4s 0.5s

We examine three scenarios. In the first row, the computation is performed over a finite field. This gives information about the algebraic difficulty of the problem: there is no coefficient growth, and the timings indicate the number of arithmetic operations in Gröbner bases algorithms. However, finding local minima requires computing over . In rows 2 and 3 of Table ?, we compare the case of generic weights with the unweighted case (Equation 2) that corresponds to the singular value decomposition (). The dual problem is easiest to solve, in particular with the unconstrained formulation (Equation 6). Note that, for , such an unconstrained formulation is not available, since is generally not a unirational variety.

Symbolic computations for affine sections of determinantal varieties with .
/0.42s/1.8s /1.93s/744s /52.7s/– /349.2s/–
/0.2s/0.3s /0.3s/7.4s /1.2s/132s /1.5s/1120s
/0.3s/0.5s /0.5s/16s /2.1s/400s /7.1s/6038s
Symbolic computations for affine sections of determinantal varieties with .
/1474s/– /2739s/– /2961s/–
/2.2s/2696s /2.3s/4846s /2.1s/5764s
/16s/59091s /20s/160094s /20s/68164s
Symbolic computations for affine sections of determinantal varieties with .
/1816s/– /821s/– /349s/–
/2.2s/4570s /1.0s/1619s /0.8s/350s
/20s/99208s /20s/163532s /18s/263586s
Symbolic computations for affine sections of determinantal varieties with .
/92s/– /42s/450988s /1.4s/1970s
/0.3s/6.4s
/13s/67460s /1.9s/4568s /0.8s/114s

In Table ?, we report on some Gröbner basis computations with the maple package FGb for . Here we used the formulation (Equation 7). The ED degree, given in bold face, is followed by the time, measured in seconds, for computing the graded reverse lexicographic Gröbner basis. The first timing is obtained by performing the computation over the finite field ; the second one is obtained by computing over the field of rationals . The symbol “” means that we did not obtain the Gröbner basis after seven days of computation.

An important observation in Table ? is the correlation between the reported running times and the values of . The former tell us how many arithmetic operations are needed to find a Gröbner basis. This suggests that the ED degree is an accurate measure for the complexity of solving low-rank approximation problems with symbolic algorithms, and it serves as a key motivation for computing ED degrees using advanced tools from algebraic geometry. This will be carried out in the next section, both for generic and for . In particular, we shall arrive at theoretical explanations for the ED degrees in Tables ? and ?.

3Algebraic Geometry

The study of ED degrees for algebraic varieties was started in [6]. This section builds on and further develops the geometric theory in that paper. We focus on the low rank approximation problem (Equation 3), and we derive general formulas for the ED degrees in Tables ? and ?.

We recall that an affine variety is an affine cone if implies for every . The variety of -matrices of rank is an affine cone. If is an affine cone, then the corresponding projective variety is well defined. The ED degree of is the ED degree of its affine cone . The following proposition explains the shift between the third and fourth column of Table ?. More generally, it shows that we can restrict the analysis to linear sections, since the ED degree (for generic weights) in the affine case can be deduced from the linear case.

Let be the projective closure of . From [6], we have , since the transversality assumptions in that result are satisfied for general weights. From the equality , we conclude . Here, the second equality follows from .

Consider a projective variety embedded in with a generic system of coordinates. It was shown in [6] that is the sum of the degrees of the polar classes . Here, denotes the degree of the polar class of in dimension , as in [13]. Moreover, if is a generic linear subspace of codimension in then by [6]. We call -th sectional ED degree of the number . We denote by the dual variety of , as in [6], and already seen in the proof of Proposition ?.

This follows from results in Sections 5 and 6 in [6]. In order to compute we have to sum for . However, it is known that if .

A special role in [6] is played by the isotropic quadric in . If is smooth and transversal to then [6] gives an explicit formula for the ED degree in terms of Chern classes of . A thorough treatment of Chern classes can be found in [10]; the reader interested in the applications in this paper can be referred to the basics provided in [6]. By combining [6] with Corollary ?, we obtain

The inner sum is the polar class ; see the proof of [6].

We now apply Theorem ? to the situation when , , and is the Segre variety of matrices of rank in . The Chern polynomial of the tangent bundle of in the Chow ring equals . By [13], this implies

where is the coefficient of in the expansion of . Toric geometers may view as the sum of the normalized volumes of all -dimensional faces of the polytope ; see [6].

The following result explains the ED degrees in the third column in Table ?, and it allows us to determine this column for any desired value of , and :

The dual in to the Segre variety is the variety of matrices of rank . By [13], we have for all . With this duality of polar classes, the result follows from Corollary ? and [6].

Writing down closed formulas for intermediate values of is more difficult: it involves some Schubert calculus. However, can be conveniently computed with the following script in Macaulay2 [12]. It is a slight generalization of that in [6]:

loadPackage "Schubert2"ED=(m,n,r,s)->
(G = flagBundle({r,m-r}); (S,Q) = G.Bundles;
X=projectiveBundle (S^n); (sx,qx)=X.Bundles;
d=dim X; T=tangentBundle X;
sum(toList(s..m*n-2),i->sum(toList(i..d),j->(-1)^(d-j)*
   binomial(j+1,i+1)*integral(chern(d-j,T)*(chern(1,dual(sx)))^(j)))))

The function ED(m,n,r,s) computes the ED degree of the variety of matrices of rank , in general coordinates, cut with a generic linear space of codimension in . For this is precisely the function displayed in [6].

At this point, we wish to reiterate the main thesis of this paper, namely that knowing the ED degree ahead of time is useful for practitioners who seek to find and certify the global minimum in the optimization problem (Equation 3), and to bound the number of local minima. The following example illustrates this for one of the numbers 83 in the output in Example ?.

In Table ? and Example ? we observed that the sectional ED degree for generic does not depend on , provided is small. The following corollary explains this.

Let be the variety of matrices of rank . Its dual is the variety of matrices of rank and has codimension . This implies for . The assertion follows from Corollary ?.

Corollary ? can be stated informally like this: in the setting of generic weights and generic linear spaces of matrices with sufficiently high dimension, the algebraic complexity of structured low-rank approximation agrees with that of ordinary low-rank approximation.

Shifting gears, we now consider the case of unit weights . Thus, we fix as the isotropic quadric in . Let denote the Segre variety of matrices of rank in , and let denote the non-transversal locus of the intersection of with . The dual variety consists of all matrices of rank in . We conjecture that the following formula (put ) holds for the gap between the third and the first column of Table ?, (or between the fourth and the second, as well),

To compute the right-hand side, and to test this conjecture, we use

The Segre variety meets in the union of two irreducible components, and . The non-transversality locus is the intersection of these components.

Combining Lemma ?, Corollary ? and the proof of [6], and abbreviating , the right-hand side of (Equation 9) can be expressed as

Moreover, is equal to the coefficient of in the rational generating function

This computation allows us to extend Table ? to any desired value of , and .

Changing topics, we now consider the case when is the space of Hankel matrices. The computation of low-rank approximation of Hankel matrices will be our topic in Section 4, where we focus on algebraic geometry and formulas for generic ED degree.

Set and let denote the variety of Hankel matrices of rank . See (Equation 12) for examples. This variety lives in the projective space , whose points represent binary forms of degree . Thus is the rational normal curve of degree , and is the th secant variety of this curve. We have for .

The sum in ( ?) is the coefficient of in the generating function

The conormal variety of is the closure of the set

The homology class of is given by a binary form. We will show that the sum of its coefficients is the asserted coefficient of (Equation 11). By [6], this proves the claim.

Let , be the two projections. The images of the conormal variety are

We desingularize by considering . The desingularization map is given by the scheme-theoretic intersection of the rational normal curve of degree with a hyperplane. A point in , identified with a hyperplane, gives points on . Their linear span in defines a rank bundle on , known as the Schwarzenberger bundle [7]. This is the kernel of the bundle map . In the same way, we desingularize the conormal variety by the fiber product over of the projectivization of the Schwarzenberger bundle and of the projective bundle of . Exactly as in the proof of [3], the degrees of the polar classes of are

The total Segre class of is . The total Segre class of is . By multiplying them we obtain the degree sum of the polar classes, thus proving (Equation 11).

This corollary means that the ED degree of the Hankel determinant agrees with the ED degree of the general symmetric determinant. By ED duality [6], this also the ED degree of the second Veronese embedding of ; see [6]. If we consider Hankel matrices of fixed rank then we obtain polynomiality:

For example, we find the following explicit polynomials when the rank is small:

The values of these polynomials are the entries in the left columns in Table ? below.

4Hankel and Sylvester Matrices

In this section we study the weighted low-rank approximation problem for matrices with a special structure that is given by equating some matrix entries and setting others to zero. One such family consists of the Hurwitz matrices in [6]. We here discuss Hankel matrices, then catalecticants, and finally Sylvester matrices. The corresponding applications are low-rank approximation of symmetric tensors and approximate greatest common divisors.

The Hankel matrix of format has the entry in row and column . So, the total number of unknowns is . We are most interested in the case when this matrix is square or almost square. The Hankel matrix of order is if is odd, and it is if is even. We denote this matrix by . For instance,

For approximations by low-rank Hankel matrices, we consider three natural weights:

  • the matrix has entry in row and column ;

  • the matrix has all entries equal to ;

  • the matrix has in row and column .

We encountered these matrices for in Example ?. For we have

The weights represent the usual Euclidean distance in , the unit weights give the Frobenius distance in the ambient matrix space, and the weights give the natural metric in the space of symmetric -tensors. Such a tensor corresponds to a binary form

The Hankel matrix has rank if and only if is the st power of a linear form. More generally, if is the sum of powers of linear forms then has rank . As we saw in §3, this locus corresponds to the th secant variety of the rational normal curve in . Various ED degrees for our three weight matrices are displayed in Table ?.

Weighted ED degrees for Hankel matrices of order and rank .
4
7
10 13
13 34
16 64 40
19 103 142
22 151 334 121
Weighted ED degrees for Hankel matrices of order and rank .
2
7
6 9
13 34
10 38 34
19 103 142
14 103 246 113
Weighted ED degrees for Hankel matrices of order and rank .
2
3
4 7
5 16
6 28 20
7 43 62
8 61 134 53

The entries in the leftmost chart in Table ? come from Theorem ?. Indeed, the variety of Hankel matrices of rank is precisely the secant variety we discussed in Section 3. The weight matrix exhibits the generic ED degree for that variety. The columns on the left of Table ? are the values of the polynomials in Corollary ?, and the diagonal entries are given by Corollary ?.

All ED degrees in Table ? were verified using Gröbner basis computations over using the maple package FGb [8]. The running times are closely tied to the valued of the ED degrees, and they are similar to those reported in Table ?. Gröbner bases over can also be computed fairly easily whenever the ED degree is below , and for those cases we can locate all real critical points using fgbrs. However, for larger instances, exact symbolic solving over becomes a considerable challenge due to the growth in coefficient size.

Hankel matrices of rank correspond to symmetric -tensors of tensor rank , and these can be represented by binary forms that are sums of powers of linear forms. That is the point of the geometric discussion in Section 3. This interpretation extends to symmetric tensors of arbitrary format, with the rational normal curve replaced with the Veronese variety. For a general study of low-rank approximation of symmetric tensors see Friedland and Stawiska [9]. In general, there is no straightforward representation of low rank tensors by low rank matrices with special structure. However, there are some exceptions, notably for rank tensors, by the results of Raicu [20] and others in the recent tensor literature. We refer to Landsberg’s book [16], especially Chapters 3, 7 and 10. The resulting generalized Hankel matrices are known as catalecticants in the commutative algebra literature, or as moment matrices in the optimization literature. We now present a case study that arose from a particular application in biomedical imaging.

We consider the following catalecticant matrix of format :

The fifteen unknown entries are the coefficients of a ternary quartic