Variable projection methods for approximate (greatest) common divisor computations
We consider the problem of finding for a given -tuple of polynomials (real or complex) the closest -tuple that has a common divisor of degree at least . Extended weighted Euclidean seminorm of the coefficients is used as a measure of closeness. Two equivalent representations of the problem are considered: (i) direct parameterization over the common divisors and quotients (image representation), and (ii) Sylvester low-rank approximation (kernel representation). We use the duality between least-squares and least-norm problems to show that (i) and (ii) are closely related to mosaic Hankel low-rank approximation. This allows us to apply to the approximate common divisor problem recent results on complexity and accuracy of computations for mosaic Hankel low-rank approximation. We develop optimization methods based on the variable projection principle both for image and kernel representation. These methods have linear complexity in the degrees of the polynomials for small and large . We provide a software implementation of the developed methods, which is based on a software package for structured low-rank approximation.
keywords:approximate GCD; structured low-rank approximation; variable projection; mosaic Hankel matrices; least squares problem; weighted 2-norm
The problem of computing a greatest common divisor (GCD) of polynomials with real or complex coefficients appears in many applications: signal processing and system identification Agrawal.etal04SP-Common ; Gaubitch.etal05conf-Adaptive , computer-aided geometric design Emiris.etal13CD-Sparse , blind image deblurring Li.etal10conf-Blind , control of linear systems Khare.etal10conf-Real ; uncontr and approximate factorization of polynomials Zeng08-ApaTools . But, as noted in Pan01IaC-Computation , “computation of polynomial GCDs is an excellent example of numerically ill-posed problems”. Indeed, for any set of polynomials with non-trivial GCD, a generic perturbation of the coefficients makes them coprime. In the aforementioned applications, perturbations appear naturally due to limited precision of the numerical representation of the coefficients, or due to measurement errors. These reasons make it inevitable to use the notion of approximate greatest common divisor (AGCD).
There is a vast literature on the topic of AGCD (see the list of references), starting with different definitions of AGCD and finishing with different computational methods. Nevertheless, two main optimization-based formulations of AGCD are predominant. In what follows, we give these formulations for the case of two polynomials.
The first commonly accepted problem formulation is the problem of finding the so-called -GCD (see (Rupprecht99JPAA-algorithm, , Def. 2.2), (Bini.Boito10-fast, , Def 1.1), (Pan01IaC-Computation, , Eqn. (1)-(2))).
Problem 1.0 (-Gcd).
Given polynomials , and a threshold , find polynomials with a GCD that are solutions to
where is some distance measure for the pairs of polynomials.
The polynomial is conventionally called an -GCD. The distance in the definition of the -GCD is typically of the form
where is some norm. Various norms are used in the literature: the -norm (Cheze.etal11TCS-subdivision ; Agrawal.etal04SP-Common ; Corless.etal95conf-singular ; Kaltofen.etal07-Structured ), the -norm Hitz98PhD-Efficient and mixed /-norm in (Rupprecht99JPAA-algorithm, , Def. 2.2) and (Bini.Boito10-fast, , Def 1.1) (i.e., ).
Given and a number , find that are solutions to
Problem 2 appears in many contexts. First, it is often used in a combination with Problem 1: since Problem 1 typically has infinite number of optimal solutions , the closest pair of polynomials to the given ones is of interest. Thus Problem 2 is often referred to as refinement in the AGCD literature (Bini.Boito10-fast, , §3.2), Zeng11-Numerical .
Second, as noted in Rupprecht99JPAA-algorithm , being able to solve Problem 2, gives a solution to Problem 1. Indeed, if the minimum value of (ACD) for is less than or equal to a given and the minimum value of (ACD) for is strictly greater than , then is the solution of (AGCD). (For more details, see the discussion after (Rupprecht99JPAA-algorithm, , Def. 2.3).) This fact is illustrated in Fig. 1, where the feasible set for (AGCD) and (ACD) is shown (i.e., the values of and , for which exists a pair that satisfies ). The solutions of (ACD) correspond to the lowest points in each vertical line. The optimal solutions of (AGCD) correspond to red segment in Fig. 1 (the points on the rightmost vertical line that intersect the threshold horizontal line).
Finally, Problem 2 appears when we know an a priori bound on the degree of the GCD is given, which is a reasonable assumption in some applications Agrawal.etal04SP-Common , Markovsky.VanHuffel06conf-algorithm . Another common example is the problem of finding the nearest non-coprime polynomials Cheze.etal11TCS-subdivision ; uncontr , which corresponds to . In this paper, we focus on Problem 2.
1.1 Previous works
Two main approaches to Problem 2 can be identified in the literature, namely the direct parameterization approach (referred to as image representation in this paper) and the structured low-rank approximation (SLRA) approach (also referred to as kernel representation in this paper). Most of the algorithms were proposed for minimizing the weighted Euclidean distance.
The image representation approach is based on the direct representation of the polynomials as a product of a common factor and quotient polynomials, i.e., the cost function is minimized over all candidate common divisors (of degree ) and candidate quotient polynomials , (of degree ). The image representation approach is used as early as Corless.etal95conf-singular . However, the size of the search space makes minimization of expensive for general-purpose optimization routines. One of the main ways to reduce the complexity is elimination of variables, which is also named variable projection in the context of nonlinear least squares problems GolubPereyra73SJoNA-Differentiation .
In short, the variable projection principle is: for a fixed , the minimization of with respect to the other parameters is a linear least squares problem; thus other parameters can be eliminated, and a function with smaller number of parameters can be minimized. In the special case of , as shown in Karmarkar.Lakshman98JoSC-Approximate , the minimum of can be computed by minimizing a univariate polynomial (for real and ) or bivariate real polynomial (for complex and ). For the latter case a certified algorithm was presented in Cheze.etal11TCS-subdivision . If , but is small, as shown in Corless.etal95conf-singular ; Pan01IaC-Computation , can be computed efficiently (in linear time in the degree of the polynomials, if is small). Later (but independently) in Markovsky.VanHuffel06conf-algorithm , it was shown that the first derivative of can be evaluated with the same complexity. Independently, in Stoica.Soderstrom97A-Common ; Agrawal.etal04SP-Common , elimination of (instead of and ) was proposed. Finally, the variable projection was also implicitly used in Zhi.etal04JJoIaAM-Hybrid ; Li.Zhi13TCS-Computing for symbolic computation of nearest singular polynomial and semi-definite programming relaxations of the AGCD problem Li.etal08TCS-Approximate ; Kaltofen.etal08conf-Exact . There are other developments and extensions for the image representation. For example, in (Bini.Boito10-fast, , §3.2) it was shown that the Gauss-Newton step can be performed in quadratic time in the degrees of the polynomials.
Another popular approach to Problem 2 is the SLRA approach (kernel representation approach), which consists in reformulating Problem 2 as a problem of approximating a given structured matrix by a structured matrix of low rank (an SLRA problem Markovsky08A-Structured ). This reformulation is possible since the constraint on the GCD degree can be rewritten as a rank constraint on a structured matrix. For two polynomials, this is a Sylvester or a Sylvester subresultant matrix Zeng11-Numerical ; for several polynomials there are various generalizations of the Sylvester structure Kaltofen.etal06conf-Approximate ; Karcanias.etal06CMwA-Approximate (see Section 4 for more details).
Concerning algorithms for SLRA, the following methods were used in the context of the AGCD problem: structured total least norm (STLN) and its improvements Kaltofen.etal07-Structured ; Li.etal05JJSSaAC-Fast ; Winkler.Allan08JoCaAM-Structured ; Winkler.Hasan13JoCaAM-improved , Riemannian SVD Botting.etal05conf-Using , gradient projection Terui13TCS-GPGCD , alternating least squares with penalization Ishteva.etal13-Regularized , variable projection Markovsky.Usevich13JCAM-Software , Newton-like alternating projection algorithms Schost.Spaenlehauer13-Quadratically . In Ottaviani.etal13-Exact , a step toward global optimization was made by the authors who proposed to compute the number of complex critical points for the optimization problem, using symbolic computations.
1.2 Contribution and structure of this paper
In this paper, we consider generalization of Problem 2 to many polynomials. The main contributions of this paper are connections between image/kernel representation approaches to Problem 2 and structured low-rank approximation of mosaic-Hankel matrices. First, we show that the generalized Sylvester subresultant low-rank matrix approximation can be reduced to mosaic Hankel SLRA. Second, we show that the cost function in variable projection methods for AGCD in image representation has the same structure as the cost function in the variable projection method for mosaic Hankel SLRA Usevich.Markovsky13JCAM-Variable . These connections allow us to use, with small modification, the efficient algorithms developed in Usevich.Markovsky13JCAM-Variable . The algorithms have proven computational complexity and can handle real and complex polynomials. As a side result, we show that minimizing the relative distance between tuples of polynomials is equivalent to minimizing a distance based on angles between polynomials.
This structure of the paper is as follows. Sections 2–5 contain known results or their minor improvements. Sections 6–7 contain the main results and experiments. Section 2 contains the necessary background and a formal statement of an analogue of Problem 2 for many polynomials; we also introduce the spaces of homogeneous polynomials (polynomials with possible infinite roots) and operations with them, which are key ingredients of this paper. In Section 3 we review the image representation approach and the variable projection principle. In Section 4, we review the structured low-rank approximation (kernel representation) approaches adapted to our problem statement. In Section 5, we recall the mosaic Hankel structure and the results on variable projection methods of the corresponding SLRA problem. In Section 6, we present the main results of the paper. In Section 7, we provide numerical experiments that include comparison with the state-of-the-art methods. The methods developed in this paper are implemented in MATLAB and are based on the SLRA package SLRA described in Markovsky.Usevich13JCAM-Software . The source code of the methods and experiments is publicly available at http://github.com/slra/slra.
2 Main notation and the approximate common divisor problem
2.1 Polynomials with possible roots at infinity
Let denote the set of univariate polynomials over the field (where is or ). Let be the set of polynomials of degree at most , i.e.
Then the space is isomorphic to through the correspondence
With some possible abuse of notation, we will use shorthand notation .
The leading coefficient may be equal to . In this case, we will say that the polynomial has the root . By multiplicity of the root we denote the maximal number of consecutive zero leading coefficients. Thus every polynomial in has exactly roots in (the Riemannian sphere).
The following polynomial has two simple roots ( and ) and a double root :
In this paper, we call the elements of homogeneous polynomials, since can be viewed as the space of bivariate homogeneous polynomials.
2.2 Multiplication and division of homogeneous polynomials
The multiplication of polynomials is defined as (acting as ). It has the following matrix representation:
where ,, and is the multiplication matrix by :
which is a rectangular Toeplitz matrix, where the blank triangular parts stand for zeros.
For , we say that a polynomial divides a polynomial (or is a divisor of ), if there exists a polynomial such that . In particular, this definition includes the following special cases.
All , , are the divisors of the zero polynomial .
A nonzero polynomial of zero degree is a divisor of any polynomial.
The notion of divisor in the spaces differs from the notion of divisors for ordinary polynomials, due to possible presence of the roots.
Consider two polynomials
Although divides of , the polynomial does not, because it has the root .
2.3 N-tuples of polynomials and common divisors
Let be a vector of fixed degrees, , , and denote by the set of -tuples of polynomials with these degrees. We also adopt the notation , , for the elements of the -tuples. With some possible ambiguity of notation (as in (2)), we use the same letter for the -tuple , and for the stacked vector
Next, we introduce the operation of multiplication of an -tuple by a polynomial as follows:
The polynomials have a common divisor , if divides all the polynomials . The polynomial is called a greatest common divisor (GCD) if there are no common divisors in for any .
Since is a divisor of any polynomial, a GCD always exists (but it is not unique). We denote by the degree of GCDs, and denote by the set of all GCD (which is a subset of ). In particular, if all , then
Otherwise, is a punctured one-dimensional linear subspace of
2.4 The approximate common divisor problem statement
Define the set of -tuples that have a GCD of degree at least as follows
Finally, assume that is equipped with a distance , which is continuous in the Euclidean topology. We formulate the generalization of Problem 2 as follows.
Problem 2.0 (Approximate GCD with bounded degree).
Given and , find the distance
The set is closed, in particular, thanks to the fact that we use homogeneous polynomials and to the special definition of the GCD in Section 2.3.
2.5 Weigthed norm, missing and fixed values
In this paper, we use the following distance:
where is a tuple of weight vectors , and
is the weighted extended semi-norm on (see also Markovsky.Usevich13JCAM-Software ). If , then is a standard weighted -norm. The and weights have a special meaning:
If a weight is present, the solution does not depend on . Hence, we may assume that the coefficient is undefined (the case of missing data Markovsky.Usevich13SJMAA-Structured ).
3 Image representation and variable projection
3.1 Image representation
In this approach, the set is replaced by the set (of candidate factorizations)
where is a candidate common divisor, and are candidate quotient polynomials.
We refer to (7) as image representation, since is the image of the map
Then an analogue of Problem 3 is formulated as follows.
Given , find and that are solutions to
3.2 Variable projection methods
Denote the cost function in (8) as
The variable projection principle GolubPereyra73SJoNA-Differentiation is based on the fact that for one fixed variable (either or ), minimization of (9) is a linear least squares problem and has a closed form solution. This principle is further explained on each of the examples.
Example 3.1 (Variable projection with respect to a common divisor).
Example 3.2 (Variable projection with respect to quotient polynomials).
Remark 3.4 (On algorithms).
After eliminated of variables, for the reduced cost function () we can apply conventional smooth optimization methods, such as:
gradient-based methods, which require evaluation of the cost function and its gradient (including quasi-Newton methods, for example, BFGS Nocedal.Wright06-Numerical );
if the cost function can be represented as a sum of squares, i.e.,
where , the Gauss-Newton/Levenberg-Marquardt methods can be applied, which require evaluation of the vector function and its Jacobian at each iteration. In the context of nonlinear least squares problems, the Levenberg-Marquardt method has shown to be particularly effective, especially when combined with the variable projection GolubPereyra73SJoNA-Differentiation ; Usevich.Markovsky14A-Optimization .
The choice of the type of the variable projection depends on a particular problem and its dimensions. For example, the variable projection with respect to (Example 3.1) is reasonable when the degree of the common divisor is small, see Karmarkar.Lakshman98JoSC-Approximate ; Cheze.etal11TCS-subdivision ; Corless.etal95conf-singular ; Pan01IaC-Computation ; Markovsky.VanHuffel06conf-algorithm ; Zhi.etal04JJoIaAM-Hybrid ; Li.Zhi13TCS-Computing ; Li.etal08TCS-Approximate ; Kaltofen.etal08conf-Exact . For real polynomials and uniform weights, in Markovsky.VanHuffel06conf-algorithm it was shown that and its gradient can be evaluated in flops.
Variable projection with respect to is less common (used in framework of Common Factor Estimation Stoica.Soderstrom97A-Common ; Agrawal.etal04SP-Common ). But, in fact, the inner minimization problem (12) is well-known in the AGCD literature: it is exactly the so-called least squares division (see, for example, Zeng11-Numerical ).
4 Structured low-rank approximation approaches
In this section, we recall the structured low-rank approximation problem, and review the most popular parameterizations of the problem (4), adapted to our case.
4.1 SLRA problem
The structured low-rank approximation problem is formulated as follows Markovsky.Usevich13SJMAA-Structured . An affine matrix structure is an affine map from the structure parameter space to the space of matrices (where is or ), defined by
Problem 4.0 (Structured low-rank approximation).
Given an affine structure , data vector , and natural number
where is the weighted extended seminorm, see Section 2.5.
Thus if the are able to represent the set through the set of low-rank matrices, then Problem 2 can be reformulated as Problem 5. The classic theorem of Sylvester provides this correspondence for two polynomials.
Theorem 4.1 (Sylvester).
Two homogeneous polynomials and have a non-trivial common divisor if and only if the matrix
is rank deficient. The rank defect of the matrix is equal to the degree of the GCD.
There exists a generalization of Theorem 4.1, for so-called subresultant matrices.
4.2 Generalized Sylvester subresultant matrix
where is defined in Section 2.2. The matrix has rows and columns:
The matrix (15) is called the generalized Sylvester subresultant matrix.
For , the matrix has the form
It can be shown that the following lemma holds true.
For , , we have that
or, equivalently has rank deficiency at least .
The proof is given in B.∎
The following equality of sets holds true:
But, the set of -tuples with rank-deficient subresultant matrix
does not coincide with , in view of our definition of GCD. Indeed, if then the matrix becomes automatically rank-deficient.
4.3 Full Sylvester subresultant matrix
By Remark 4.3, if the solution of SLRA f has , then it does not give a desired solution to Problem 3. In order to handle properly this non-generic case, we can use an alternative structure, which can be constructed recursively from subresultants.
For an -tuple , define
Example 4.2 (Example 4.1, continued).
For , we have
Compared with the matrix (which has block rows), the matrix has block rows, i.e., all possible pairs of the polynomials are present. The structure of is similar to the structure of Young flattening of tensors (Landsberg12-Tensors, , §3.8), and probably has a similar algebraic description.
Next, we show that SLRA of is equivalent to Problem 3.
For , we have that
The proof is given in B.∎
Apart from the precise correspondence between the approximation problems (proved in Proposition 4.5), the structure can be used to construct initial approximation in the optimization methods (see Section 4.5). As shown in the following lemma, the quotient polynomials can be obtained from the kernel of .
If and , for , then , where .
The proof is given in B. ∎
4.4 Extended Sylvester matrix
Yet another elegant extension of the Sylvester matrix was proposed in Karcanias.etal06CMwA-Approximate . The extended Sylvester matrix, (Karcanias.etal06CMwA-Approximate, , (2.2b))) for a parameter , defined as
The number of rows does not exceed the number of columns . For the structure (19), the following theorem holds true.
Theorem 4.7 ((Karcanias.etal06CMwA-Approximate, , Thm. 1)).
For and ,
Suppose (for simplicity) that the polynomial has simple roots (excluding ). From (19), it follows that for the roots of , the vector
is in the right kernel of . It can be shown that the vectors in the right kernel are linear combinations of the vectors of the form (20), thus the rank defect of is .
Hence, the Problem 3 is equivalent to structured low-rank approximation of the matrix , and can be solved as an SLRA problem.
4.5 Initial approximations for the direct parameterization methods
The formulation (SLRA) also provides a heuristic to obtain an initial guess for and/or for the optimization methods in image or kernel representation (from Section 3). The common heuristic consists in replacing the SLRA problem by unstructured low-rank approximation of a structured matrix . The unstructured low-rank approximation can be computed, for example, using the SVD.
The first option is to use Lemma 4.6, and use an approximate solution of . Such a solution obtained by unstructured low-rank approximation will be denoted . We may assume that give an approximation of quotient polynomials (due to the fact that the condition represents generic points in ). Then an initial approximation of , can be found by finding the minimizer of (12). Let us summarize this option in the following algorithm.
Input: -tuple , . Output: initial approximation .
Compute — last right singular vector of (or );
In Algorithm 4.1, it may be preferable to use , because it contains each polynomial the same number of times.
Another option is to use the approximate kernel of the structure (19), described in Remark 4.8, and compute the initial approximation using the matrix pencil approach (modified matrix pencil method of Karcanias.etal06IJoC-Matrix ). In this case, the algorithm is as follows.
Input: -tuple , . Output: initial approximation .
Construct the extended Sylvester matrix defined in (19).
Compute the SVD , and define by the matrix composed of the last columns of .
Set (computed using the eigenvalue decomposition of ).
5 Mosaic Hankel low-rank approximation
In this section, we recall the definition of mosaic-Hankel matrices and results on mosaic Hankel SLRA Usevich.Markovsky13JCAM-Variable . Note that compared with Usevich.Markovsky13JCAM-Variable , we use transposed matrices.
5.1 Mosaic Hankel matrices
Let denote a Hankel matrix, generated from , i.e.
For two vectors , , vectors , and the combined vector
we define the mosaic Hankel matrix Usevich.Markovsky13JCAM-Variable :
5.2 Structured low-rank approximation and variable projection
such that . Then the following result holds true.
Theorem 5.1 ((Usevich.Markovsky13JCAM-Variable, , Thm. 1-3)).
The complexity (in flops) of the evaluation of , , and the Jacobian of with respect to is .
The complexity bound are lower for certain cases (evaluation of , in the case of uniform weights), but we stick to the bound in Theorem 5.1, since it gives complexity for the Gauss-Newton/Levenberg-Marquardt step.
6 Main results
In this section, we provide the main results of the paper on the connections between the ACD problem and mosaic-Hankel low-rank approximation.
6.1 Generalized Sylvester LRA as mosaic Hankel LRA
First, we consider the matrix and show how it can be represented in the form , where is full row rank matrix, and is a mosaic Hankel matrix. Thus SLRA for this structure can be solved with the methods of Markovsky.Usevich13JCAM-Software .
Denote , as in (16), , , and
where denotes the vector of zeros of length .
The generalized Sylvester subresultant matrix (15) can be represented as the following mosaic Hankel matrix:
For and we have
If we denote , then we have that
which completes the proof. ∎
The problem (SLRA) for the matrix (15) can be solved as a weighted mosaic low-rank approximation of the matrix in (26), if we fix the zero elements. This can be accomplished by taking the following weight vector:
where denotes the vector of ones of length . We note that the case of infinite weights can be handled by the methods of Markovsky.Usevich13JCAM-Software .
As in Proposition 6.1, the structure (19) can be represented as a mosaic Hankel structure. We do not consider this representation here, because SLRA of presents a difficulty for optimization methods based on kernel representation of the rank constraint Markovsky.Usevich13JCAM-Software , due to the nonlinear structure of the right kernel of a rank deficient (as shown in Remark 4.8). Recently Ishteva.etal13-Regularized , new me