Variable projection methods for approximate (greatest) common divisor computations
Abstract
We consider the problem of finding, for a given tuple of polynomials (real or complex), the closest tuple that has a common divisor of degree at least . An extended weighted Euclidean seminorm of the coefficients is used as a measure of closeness. Two equivalent representations of the problem are considered: (i) direct parameterization over the common divisors and quotients (image representation), and (ii) Sylvester low-rank approximation (kernel representation). We use the duality between least-squares and least-norm problems to show that (i) and (ii) are closely related to mosaic Hankel low-rank approximation. This allows us to apply to the approximate common divisor problem recent results on complexity and accuracy of computations for mosaic Hankel low-rank approximation. We develop optimization methods based on the variable projection principle, both for the image and the kernel representation. These methods have linear complexity in the degrees of the polynomials for small and large . We provide a software implementation of the developed methods, which is based on a software package for structured low-rank approximation.
keywords:
approximate GCD; structured low-rank approximation; variable projection; mosaic Hankel matrices; least squares problem; weighted 2-norm

1 Introduction
The problem of computing a greatest common divisor (GCD) of polynomials with real or complex coefficients appears in many applications: signal processing and system identification Agrawal.etal04SPCommon ; Gaubitch.etal05confAdaptive , computer-aided geometric design Emiris.etal13CDSparse , blind image deblurring Li.etal10confBlind , control of linear systems Khare.etal10confReal ; uncontr , and approximate factorization of polynomials Zeng08ApaTools . But, as noted in Pan01IaCComputation , “computation of polynomial GCDs is an excellent example of numerically ill-posed problems”. Indeed, for any set of polynomials with a nontrivial GCD, a generic perturbation of the coefficients makes them coprime. In the aforementioned applications, perturbations appear naturally due to the limited precision of the numerical representation of the coefficients, or due to measurement errors. These reasons make it inevitable to use the notion of an approximate greatest common divisor (AGCD).
There is a vast literature on the topic of AGCD (see the list of references), starting with different definitions of AGCD and finishing with different computational methods. Nevertheless, two main optimization-based formulations of the AGCD problem are predominant. In what follows, we give these formulations for the case of two polynomials.
The first commonly accepted problem formulation is the problem of finding the so-called ε-GCD (see (Rupprecht99JPAAalgorithm, Def. 2.2), (Bini.Boito10fast, Def. 1.1), (Pan01IaCComputation, Eqns. (1)–(2))).
Problem 1 (ε-GCD).
Given polynomials , and a threshold , find polynomials with a GCD that are solutions to
(AGCD) 
where is some distance measure for the pairs of polynomials.
The polynomial is conventionally called an ε-GCD. The distance in the definition of the ε-GCD is typically of the form
where is some norm. Various norms are used in the literature: the norm (Cheze.etal11TCSsubdivision ; Agrawal.etal04SPCommon ; Corless.etal95confsingular ; Kaltofen.etal07Structured ), the norm Hitz98PhDEfficient and mixed /norm in (Rupprecht99JPAAalgorithm, , Def. 2.2) and (Bini.Boito10fast, , Def 1.1) (i.e., ).
The second core problem formulation (Kaltofen.etal07Structured, Prob. 1.1) is a dual problem to (AGCD).
Problem 2.
Given and a number , find that are solutions to
(ACD) 
Problem 2 appears in many contexts. First, it is often used in combination with Problem 1: since Problem 1 typically has an infinite number of optimal solutions , the closest pair of polynomials to the given ones is of interest. Thus Problem 2 is often referred to as refinement in the AGCD literature (Bini.Boito10fast, §3.2), Zeng11Numerical .
Second, as noted in Rupprecht99JPAAalgorithm , being able to solve Problem 2 gives a solution to Problem 1. Indeed, if the minimum value of (ACD) for is less than or equal to a given and the minimum value of (ACD) for is strictly greater than , then is the solution of (AGCD). (For more details, see the discussion after (Rupprecht99JPAAalgorithm, Def. 2.3).) This fact is illustrated in Fig. 1, where the feasible set for (AGCD) and (ACD) is shown (i.e., the values of and for which there exists a pair that satisfies ). The solutions of (ACD) correspond to the lowest points on each vertical line. The optimal solutions of (AGCD) correspond to the red segment in Fig. 1 (the points on the rightmost vertical line that intersect the threshold horizontal line).
Thus, Problem 1 can be solved by solving Problem 2 for all possible (or by using, for example, bisection over ).
Finally, Problem 2 appears when an a priori bound on the degree of the GCD is available, which is a reasonable assumption in some applications Agrawal.etal04SPCommon , Markovsky.VanHuffel06confalgorithm . Another common example is the problem of finding the nearest non-coprime polynomials Cheze.etal11TCSsubdivision ; uncontr , which corresponds to . In this paper, we focus on Problem 2.
1.1 Previous works
Two main approaches to Problem 2 can be identified in the literature, namely the direct parameterization approach (referred to as image representation in this paper) and the structured low-rank approximation (SLRA) approach (also referred to as kernel representation in this paper). Most of the algorithms were proposed for minimizing the weighted Euclidean distance.
The image representation approach is based on the direct representation of the polynomials as a product of a common factor and quotient polynomials, i.e., the cost function is minimized over all candidate common divisors (of degree ) and candidate quotient polynomials , (of degree ). The image representation approach was used as early as in Corless.etal95confsingular . However, the size of the search space makes minimization of expensive for general-purpose optimization routines. One of the main ways to reduce the complexity is elimination of variables, also known as variable projection in the context of nonlinear least squares problems GolubPereyra73SJoNADifferentiation .
In short, the variable projection principle is the following: for a fixed , the minimization of with respect to the other parameters is a linear least squares problem; thus the other parameters can be eliminated, and a function with a smaller number of parameters can be minimized. In the special case of , as shown in Karmarkar.Lakshman98JoSCApproximate , the minimum of can be computed by minimizing a univariate polynomial (for real and ) or a bivariate real polynomial (for complex and ). For the latter case, a certified algorithm was presented in Cheze.etal11TCSsubdivision . If , but is small, as shown in Corless.etal95confsingular ; Pan01IaCComputation , can be computed efficiently (in linear time in the degree of the polynomials, if is small). Later (but independently), in Markovsky.VanHuffel06confalgorithm it was shown that the first derivative of can be evaluated with the same complexity. Independently, in Stoica.Soderstrom97ACommon ; Agrawal.etal04SPCommon , elimination of (instead of and ) was proposed. Finally, variable projection was also implicitly used in Zhi.etal04JJoIaAMHybrid ; Li.Zhi13TCSComputing for symbolic computation of the nearest singular polynomial and in semidefinite programming relaxations of the AGCD problem Li.etal08TCSApproximate ; Kaltofen.etal08confExact . There are other developments and extensions for the image representation. For example, in (Bini.Boito10fast, §3.2) it was shown that the Gauss-Newton step can be performed in quadratic time in the degrees of the polynomials.
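As a concrete illustration of the elimination step, the following sketch (Python/NumPy, with a low-to-high coefficient convention; the names `mult_matrix` and `vp_cost` are ours, not from the paper, and only the unweighted real case is shown) evaluates the projected cost for a fixed candidate common divisor: for each polynomial, the optimal quotient solves a linear least squares problem and is eliminated.

```python
import numpy as np

def mult_matrix(h, m):
    """Toeplitz multiplication matrix: mult_matrix(h, m) @ q gives the
    coefficients of the product h*q, for any q of degree at most m."""
    k = len(h) - 1
    M = np.zeros((k + m + 1, m + 1))
    for j in range(m + 1):
        M[j:j + k + 1, j] = h
    return M

def vp_cost(h, polys, quot_degs):
    """Projected cost for a fixed candidate divisor h: the optimal
    quotients solve decoupled linear least squares problems and are
    eliminated; only the residual norms are accumulated."""
    cost = 0.0
    for p, m in zip(polys, quot_degs):
        M = mult_matrix(h, m)
        q, *_ = np.linalg.lstsq(M, p, rcond=None)
        r = p - M @ q
        cost += r @ r
    return cost
```

For example, if both given polynomials are exact multiples of a candidate divisor, the projected cost at that divisor vanishes, while it is strictly positive at a candidate that divides neither polynomial.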
Another popular approach to Problem 2 is the SLRA approach (kernel representation approach), which consists in reformulating Problem 2 as a problem of approximating a given structured matrix by a structured matrix of low rank (an SLRA problem Markovsky08AStructured ). This reformulation is possible since the constraint on the GCD degree can be rewritten as a rank constraint on a structured matrix. For two polynomials, this is a Sylvester or a Sylvester subresultant matrix Zeng11Numerical ; for several polynomials, there are various generalizations of the Sylvester structure Kaltofen.etal06confApproximate ; Karcanias.etal06CMwAApproximate (see Section 4 for more details).
Concerning algorithms for SLRA, the following methods were used in the context of the AGCD problem: structured total least norm (STLN) and its improvements Kaltofen.etal07Structured ; Li.etal05JJSSaACFast ; Winkler.Allan08JoCaAMStructured ; Winkler.Hasan13JoCaAMimproved , Riemannian SVD Botting.etal05confUsing , gradient projection Terui13TCSGPGCD , alternating least squares with penalization Ishteva.etal13Regularized , variable projection Markovsky.Usevich13JCAMSoftware , Newton-like alternating projection algorithms Schost.Spaenlehauer13Quadratically . In Ottaviani.etal13Exact , a step toward global optimization was made: the authors proposed to compute the number of complex critical points of the optimization problem, using symbolic computations.
1.2 Contribution and structure of this paper
In this paper, we consider a generalization of Problem 2 to many polynomials. The main contributions of this paper are connections between the image/kernel representation approaches to Problem 2 and structured low-rank approximation of mosaic Hankel matrices. First, we show that generalized Sylvester subresultant low-rank matrix approximation can be reduced to mosaic Hankel SLRA. Second, we show that the cost function in variable projection methods for AGCD in the image representation has the same structure as the cost function in the variable projection method for mosaic Hankel SLRA Usevich.Markovsky13JCAMVariable . These connections allow us to use, with small modifications, the efficient algorithms developed in Usevich.Markovsky13JCAMVariable . The algorithms have proven computational complexity and can handle real and complex polynomials. As a side result, we show that minimizing the relative distance between tuples of polynomials is equivalent to minimizing a distance based on angles between polynomials.
The structure of the paper is as follows. Sections 2–5 contain known results or their minor improvements. Sections 6–7 contain the main results and experiments. Section 2 contains the necessary background and a formal statement of an analogue of Problem 2 for many polynomials; we also introduce the spaces of homogeneous polynomials (polynomials with possible roots at infinity) and operations with them, which are key ingredients of this paper. In Section 3, we review the image representation approach and the variable projection principle. In Section 4, we review the structured low-rank approximation (kernel representation) approaches adapted to our problem statement. In Section 5, we recall the mosaic Hankel structure and the results on variable projection methods for the corresponding SLRA problem. In Section 6, we present the main results of the paper. In Section 7, we provide numerical experiments that include a comparison with state-of-the-art methods. The methods developed in this paper are implemented in MATLAB and are based on the SLRA package SLRA described in Markovsky.Usevich13JCAMSoftware . The source code of the methods and experiments is publicly available at http://github.com/slra/slra.
2 Main notation and the approximate common divisor problem
2.1 Polynomials with possible roots at infinity
Let denote the set of univariate polynomials over the field (where is or ). Let be the set of polynomials of degree at most , i.e.
(1) 
Then the space is isomorphic to through the correspondence
(2) 
With some possible abuse of notation, we will use shorthand notation .
Remark 2.1.
The leading coefficient may be equal to . In this case, we will say that the polynomial has the root . By the multiplicity of the root we denote the maximal number of consecutive zero leading coefficients. Thus every polynomial in has exactly roots in (the Riemann sphere).
Example 2.1.
The following polynomial has two simple roots ( and ) and a double root :
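A small numerical illustration of this root count (Python/NumPy, with a low-to-high coefficient convention; the concrete polynomial below is our own example, not necessarily the one in the text): an element of the degree-at-most-4 space whose two leading coefficients vanish has a double root at infinity in addition to its finite roots.

```python
import numpy as np

# x + x^2 viewed as a polynomial of degree at most 4: the two zero
# leading coefficients encode a double root at infinity.
p = np.array([0.0, 1.0, 1.0, 0.0, 0.0])   # coefficients of 1, x, ..., x^4
finite_roots = np.roots(p[::-1])           # np.roots expects high-to-low order
roots_at_infinity = (len(p) - 1) - len(finite_roots)
# finite_roots is {0, -1}; together with the double root at infinity,
# the total number of roots on the Riemann sphere equals 4.
```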
Remark 2.2.
In this paper, we call the elements of homogeneous polynomials, since can be viewed as the space of bivariate homogeneous polynomials.
2.2 Multiplication and division of homogeneous polynomials
The multiplication of polynomials is defined as (acting as ). It has the following matrix representation:
where , , and is the multiplication matrix by :
which is a rectangular Toeplitz matrix, where the blank triangular parts stand for zeros.
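The multiplication matrix admits a direct sketch (Python/NumPy, low-to-high coefficient convention; the helper name `mult_matrix` is ours): its columns are shifted copies of the coefficient vector, and applying it to a coefficient vector is exactly polynomial convolution.

```python
import numpy as np

def mult_matrix(h, m):
    """Rectangular Toeplitz multiplication matrix of h: the product
    mult_matrix(h, m) @ q equals the coefficient vector of h*q,
    for any polynomial q of degree at most m."""
    n = len(h) - 1
    M = np.zeros((n + m + 1, m + 1))
    for j in range(m + 1):
        M[j:j + n + 1, j] = h   # j-th column: h shifted down by j
    return M

h = np.array([2.0, 0.0, 1.0])   # 2 + x^2
q = np.array([1.0, 3.0])        # 1 + 3x
assert np.allclose(mult_matrix(h, 1) @ q, np.convolve(h, q))
```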
For , we say that a polynomial divides a polynomial (or is a divisor of ), if there exists a polynomial such that . In particular, this definition includes the following special cases.

All , , are the divisors of the zero polynomial .

A nonzero polynomial of zero degree is a divisor of any polynomial.
The notion of divisor in the spaces differs from the notion of divisors for ordinary polynomials, due to the possible presence of roots at infinity.
Example 2.2.
Consider two polynomials
Although divides , the polynomial does not, because it has the root .
2.3 N-tuples of polynomials and common divisors
Let be a vector of fixed degrees, , , and denote by the set of tuples of polynomials with these degrees. We also adopt the notation , , for the elements of the tuples. With some possible ambiguity of notation (as in (2)), we use the same letter for the tuple , and for the stacked vector
Next, we introduce the operation of multiplication of an N-tuple by a polynomial as follows:
Definition 2.3.
The polynomials have a common divisor , if divides all the polynomials . The polynomial is called a greatest common divisor (GCD) if there are no common divisors in for any .
Since is a divisor of any polynomial, a GCD always exists (but it is not unique). We denote by the degree of the GCDs, and denote by the set of all GCDs (which is a subset of ). In particular, if all , then
Otherwise, is a punctured one-dimensional linear subspace of
2.4 The approximate common divisor problem statement
Define the set of tuples that have a GCD of degree at least as follows
(3) 
Finally, assume that is equipped with a distance , which is continuous in the Euclidean topology. We formulate the generalization of Problem 2 as follows.
Problem 3 (Approximate GCD with bounded degree).
Given and , find the distance
(4) 
Note that Problem 3 is well posed, since the set is closed in the Euclidean topology, by Lemma A.1 (see A).
Remark 2.4.
The set is closed, in particular, thanks to the fact that we use homogeneous polynomials and to the special definition of the GCD in Section 2.3.
2.5 Weighted norm, missing and fixed values
In this paper, we use the following distance:
(5) 
where is a tuple of weight vectors , and
(6) 
is the weighted extended seminorm on (see also Markovsky.Usevich13JCAMSoftware ). If , then is a standard weighted norm. The zero and infinite weights have a special meaning:

If a zero weight is present, the solution does not depend on . Hence, we may assume that the coefficient is undefined (the case of missing data Markovsky.Usevich13SJMAAStructured ).

If an infinite weight is present, the corresponding coefficient of the approximation is required to coincide with the given value (the case of fixed values).
3 Image representation and variable projection
3.1 Image representation
In this approach, the set is replaced by the set (of candidate factorizations)
(7) 
where is a candidate common divisor, and are candidate quotient polynomials.
We refer to (7) as image representation, since is the image of the map
Remark 3.1.
Then an analogue of Problem 3 is formulated as follows.
Problem 4.
Given , find and that are solutions to
(8) 
Remark 3.2.
3.2 Variable projection methods
Denote the cost function in (8) as
(9) 
Minimization of the cost function (9) is a nonlinear least squares problem, to which the variable projection principle GolubPereyra73SJoNADifferentiation (based on elimination of variables) can be applied.
The variable projection principle GolubPereyra73SJoNADifferentiation is based on the fact that, with one of the variables fixed (either or ), minimization of (9) is a linear least squares problem and has a closed-form solution. This principle is illustrated in each of the following examples.
Example 3.1 (Variable projection with respect to a common divisor).
Example 3.2 (Variable projection with respect to quotient polynomials).
Remark 3.4 (On algorithms).
After elimination of variables, we can apply conventional smooth optimization methods to the reduced cost function (), such as:

gradient-based methods, which require evaluation of the cost function and its gradient (including quasi-Newton methods, for example, BFGS Nocedal.Wright06Numerical );

if the cost function can be represented as a sum of squares, i.e.,
where , the Gauss-Newton/Levenberg-Marquardt methods can be applied, which require evaluation of the vector function and its Jacobian at each iteration. In the context of nonlinear least squares problems, the Levenberg-Marquardt method has been shown to be particularly effective, especially when combined with variable projection GolubPereyra73SJoNADifferentiation ; Usevich.Markovsky14AOptimization .
The choice of the type of the variable projection depends on a particular problem and its dimensions. For example, the variable projection with respect to (Example 3.1) is reasonable when the degree of the common divisor is small, see Karmarkar.Lakshman98JoSCApproximate ; Cheze.etal11TCSsubdivision ; Corless.etal95confsingular ; Pan01IaCComputation ; Markovsky.VanHuffel06confalgorithm ; Zhi.etal04JJoIaAMHybrid ; Li.Zhi13TCSComputing ; Li.etal08TCSApproximate ; Kaltofen.etal08confExact . For real polynomials and uniform weights, in Markovsky.VanHuffel06confalgorithm it was shown that and its gradient can be evaluated in flops.
Variable projection with respect to is less common (used in the framework of common factor estimation Stoica.Soderstrom97ACommon ; Agrawal.etal04SPCommon ). But, in fact, the inner minimization problem (12) is well known in the AGCD literature: it is exactly the so-called least squares division (see, for example, Zeng11Numerical ).
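A minimal sketch of least squares division (Python/NumPy, low-to-high coefficients; `lsq_division` is our name, and uniform weights are assumed): for fixed quotient polynomials, the candidate divisor minimizing the residual solves a single stacked linear least squares problem.

```python
import numpy as np

def lsq_division(polys, quots, d):
    """Least squares division: given fixed quotients g_i, find the
    degree-d divisor h minimizing sum_i ||p_i - g_i * h||^2 by stacking
    the multiplication matrices of the g_i into one least squares problem."""
    blocks, rhs = [], []
    for p, g in zip(polys, quots):
        n = len(g) - 1
        M = np.zeros((n + d + 1, d + 1))   # multiplication matrix of g
        for j in range(d + 1):
            M[j:j + n + 1, j] = g
        assert len(p) == n + d + 1          # degree bookkeeping
        blocks.append(M)
        rhs.append(p)
    h, *_ = np.linalg.lstsq(np.vstack(blocks), np.concatenate(rhs),
                            rcond=None)
    return h
```

When the tuple has an exact common divisor and the quotients are exact, the stacked system is consistent and the division recovers the divisor exactly.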
4 Structured lowrank approximation approaches
In this section, we recall the structured low-rank approximation problem, and review the most popular parameterizations of the problem (4), adapted to our case.
4.1 SLRA problem
The structured low-rank approximation problem is formulated as follows Markovsky.Usevich13SJMAAStructured . An affine matrix structure is an affine map from the structure parameter space to the space of matrices (where is or ), defined by
(13) 
Problem 5 (Structured low-rank approximation).
Given an affine structure , data vector , and natural number
(SLRA) 
where is the weighted extended seminorm, see Section 2.5.
Thus, if we are able to represent the set through a set of low-rank matrices, then Problem 3 can be reformulated as Problem 5. The classic theorem of Sylvester provides this correspondence for two polynomials.
Theorem 4.1 (Sylvester).
Two homogeneous polynomials and have a nontrivial common divisor if and only if the matrix
(14) 
is rank deficient. The rank defect of the matrix is equal to the degree of the GCD.
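Theorem 4.1 is easy to verify numerically. The sketch below (Python/NumPy, low-to-high coefficients; `sylvester` is our helper name) builds the classical Sylvester matrix of two polynomials sharing one common root and checks that its rank defect equals the GCD degree; a rank tolerance is supplied because floating-point singular values are only approximately zero.

```python
import numpy as np

def sylvester(p, q):
    """Classical (n+m) x (n+m) Sylvester matrix of p (degree n) and
    q (degree m): shifted copies of p and of q as columns."""
    n, m = len(p) - 1, len(q) - 1
    S = np.zeros((n + m, n + m))
    for j in range(m):                 # m shifted copies of p
        S[j:j + n + 1, j] = p
    for j in range(n):                 # n shifted copies of q
        S[j:j + m + 1, m + j] = q
    return S

p = np.convolve([1.0, 1.0], [1.0, 2.0])   # (1 + x)(1 + 2x)
q = np.convolve([1.0, 1.0], [2.0, 3.0])   # (1 + x)(2 + 3x)
S = sylvester(p, q)
rank_defect = S.shape[0] - np.linalg.matrix_rank(S, tol=1e-8)
# rank_defect == 1, the degree of the GCD (1 + x)
```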
There exists a generalization of Theorem 4.1 for so-called subresultant matrices.
4.2 Generalized Sylvester subresultant matrix
For several polynomials (a tuple ), the following matrix is typically considered Rupprecht99JPAAalgorithm ; Kaltofen.etal06confApproximate .
(15) 
where is defined in Section 2.2. The matrix has rows and columns:
(16) 
The matrix (15) is called the generalized Sylvester subresultant matrix.
Example 4.1.
For , the matrix has the form
It can be shown that the following lemma holds true.
Lemma 4.2.
For , , we have that
or, equivalently has rank deficiency at least .
Proof.
The proof is given in B.∎
Remark 4.3.
The following equality of sets holds true:
However, the set of tuples with a rank-deficient subresultant matrix
(17) 
does not coincide with , in view of our definition of the GCD. Indeed, if , then the matrix becomes automatically rank-deficient.
4.3 Full Sylvester subresultant matrix
By Remark 4.3, if the solution of the SLRA problem has , then it does not give a desired solution to Problem 3. In order to handle this non-generic case properly, we can use an alternative structure, which can be constructed recursively from subresultants.
For an tuple , define
where is defined in (15), and the number of columns of the zero block is . Thus has rows and columns, where is as in (16) and
Example 4.2 (Example 4.1, continued).
For , we have
Remark 4.4.
Compared with the matrix (which has block rows), the matrix has block rows, i.e., all possible pairs of the polynomials are present. The structure of is similar to the structure of the Young flattening of tensors (Landsberg12Tensors, §3.8), and probably has a similar algebraic description.
Next, we show that SLRA of is equivalent to Problem 3.
Proposition 4.5.
For , we have that
(18) 
Proof.
The proof is given in B.∎
Apart from the precise correspondence between the approximation problems (proved in Proposition 4.5), the structure can be used to construct an initial approximation for the optimization methods (see Section 4.5). As shown in the following lemma, the quotient polynomials can be obtained from the kernel of .
Lemma 4.6.
If and , for , then , where .
Proof.
The proof is given in B. ∎
4.4 Extended Sylvester matrix
Yet another elegant extension of the Sylvester matrix was proposed in Karcanias.etal06CMwAApproximate . The extended Sylvester matrix (Karcanias.etal06CMwAApproximate, (2.2b)), for a parameter , is defined as
(19) 
The number of rows does not exceed the number of columns . For the structure (19), the following theorem holds true.
Theorem 4.7 ((Karcanias.etal06CMwAApproximate, Thm. 1)).
For and ,
The proof of Theorem 4.7 (which can be found in (Karcanias.etal06CMwAApproximate, Thm. 1)) is based on the following fact about the right kernel of (an analogue of Lemma 4.6).
Remark 4.8.
Suppose (for simplicity) that the polynomial has simple roots (excluding ). From (19), it follows that for the roots of , the vector
(20) 
is in the right kernel of . It can be shown that the vectors in the right kernel are linear combinations of the vectors of the form (20), thus the rank defect of is .
Hence, Problem 3 is equivalent to structured low-rank approximation of the matrix , and can be solved as an SLRA problem.
Remark 4.9.
Note that the methods Karcanias.etal06CMwAApproximate ; Karcanias.etal06IJoCMatrix (matrix pencil methodologies) do not solve the structured low-rank approximation problem. Instead, they are based on unstructured relaxations of the problem (using the singular value decomposition).
4.5 Initial approximations for the direct parameterization methods
The formulation (SLRA) also provides a heuristic to obtain an initial guess for and/or for the optimization methods in the image or kernel representation (from Section 3). A common heuristic consists in replacing the SLRA problem by unstructured low-rank approximation of the structured matrix . The unstructured low-rank approximation can be computed, for example, using the SVD.
The first option is to use Lemma 4.6 and an approximate solution of . Such a solution, obtained by unstructured low-rank approximation, will be denoted . We may assume that gives an approximation of the quotient polynomials (due to the fact that the condition represents generic points in ). Then an initial approximation of can be found as the minimizer of (12). Let us summarize this option in the following algorithm.
Algorithm 4.1.
Input: tuple , . Output: initial approximation .

Compute , the last right singular vector of (or );

Set .
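A sketch of this initial approximation for two polynomials (Python/NumPy, low-to-high coefficients; `initial_guess` and `mult_matrix` are our names, and for simplicity we use the two-polynomial d-th Sylvester subresultant rather than the general N-polynomial structure): the last right singular vector yields approximate quotients, and the divisor follows by least squares division.

```python
import numpy as np

def mult_matrix(f, m):
    """Toeplitz multiplication matrix of f acting on degree-<=m inputs."""
    n = len(f) - 1
    M = np.zeros((n + m + 1, m + 1))
    for j in range(m + 1):
        M[j:j + n + 1, j] = f
    return M

def initial_guess(p, q, d):
    """SVD heuristic (unstructured relaxation): the last right singular
    vector of the d-th subresultant approximates the quotients; the
    divisor h is then recovered by least squares division."""
    n, m = len(p) - 1, len(q) - 1
    S = np.hstack([mult_matrix(p, m - d), mult_matrix(q, n - d)])
    _, _, Vt = np.linalg.svd(S)
    z = Vt[-1]                          # approximate kernel vector (g2, -g1)
    g2, g1 = z[:m - d + 1], -z[m - d + 1:]
    # least squares division: solve [M(g1); M(g2)] h ~= [p; q]
    A = np.vstack([mult_matrix(g1, d), mult_matrix(g2, d)])
    b = np.concatenate([p, q])
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h, g1, g2
```

The recovered factors are defined only up to a common scaling (and sign) inherited from the unit-norm singular vector, but the products h*g1 and h*g2 are scale-invariant.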
Remark 4.10.
In Algorithm 4.1, it may be preferable to use , because it contains each polynomial the same number of times.
Another option is to use the approximate kernel of the structure (19), described in Remark 4.8, and compute the initial approximation using the matrix pencil approach (modified matrix pencil method of Karcanias.etal06IJoCMatrix ). In this case, the algorithm is as follows.
Algorithm 4.2.
Input: tuple , . Output: initial approximation .

Construct the extended Sylvester matrix defined in (19).

Compute the SVD , and define by the matrix composed of the last columns of .

Define .

Set (computed using the eigenvalue decomposition of ).
5 Mosaic Hankel low-rank approximation
In this section, we recall the definition of mosaic Hankel matrices and the results on mosaic Hankel SLRA Usevich.Markovsky13JCAMVariable . Note that, compared with Usevich.Markovsky13JCAMVariable , we use transposed matrices.
5.1 Mosaic Hankel matrices
Let denote a Hankel matrix generated from , i.e.
For two vectors , , vectors , and the combined vector
(21) 
we define the mosaic Hankel matrix Usevich.Markovsky13JCAMVariable :
(22) 
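The block structure can be sketched as follows (Python/NumPy; `hankel_block` and `mosaic_hankel` are our names, and for simplicity each Hankel block receives its own parameter vector, listed row by row, which slightly simplifies the parameter indexing of (21)–(22)).

```python
import numpy as np

def hankel_block(v, m, n):
    """m x n Hankel matrix with entries H[i, j] = v[i + j]."""
    assert len(v) == m + n - 1
    return np.array([[v[i + j] for j in range(n)] for i in range(m)])

def mosaic_hankel(m_sizes, n_sizes, vecs):
    """Mosaic Hankel matrix: a grid of Hankel blocks of sizes
    m_i x n_j, filled from the parameter vectors row by row."""
    it = iter(vecs)
    rows = []
    for m in m_sizes:
        rows.append(np.hstack([hankel_block(next(it), m, n)
                               for n in n_sizes]))
    return np.vstack(rows)
```

For instance, a single block row with column sizes (2, 3) places two Hankel blocks side by side, one per parameter vector.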
5.2 Structured lowrank approximation and variable projection
We consider the problem (SLRA) for the real-valued structure (22) (i.e., ). We assume that , , and denote by the rank defect.
Then, following the variable projection approach, as described in Usevich.Markovsky13JCAMVariable , the problem (SLRA) can be rewritten as a bilevel optimization problem:
(23)  
(24) 
The problem (24) is a linear least-norm problem and has a closed-form solution. Define to be the optimal solution of (24), and
(25) 
such that . Then the following result holds true.
Theorem 5.1 ((Usevich.Markovsky13JCAMVariable, , Thm. 13)).
The complexity (in flops) of the evaluation of , , and the Jacobian of with respect to is .
Remark 5.2.
The complexity bounds are lower in certain cases (evaluation of in the case of uniform weights), but we stick to the bound in Theorem 5.1, since it gives the complexity of the Gauss-Newton/Levenberg-Marquardt step.
6 Main results
In this section, we provide the main results of the paper on the connections between the ACD problem and mosaic Hankel low-rank approximation.
6.1 Generalized Sylvester LRA as mosaic Hankel LRA
First, we consider the matrix and show how it can be represented in the form , where is a full-row-rank matrix and is a mosaic Hankel matrix. Thus, SLRA for this structure can be solved with the methods of Markovsky.Usevich13JCAMSoftware .
Proposition 6.1.
The generalized Sylvester subresultant matrix (15) can be represented as the following mosaic Hankel matrix:
(26) 
where and
Proof.
For and we have
If we denote , then we have that
which completes the proof. ∎
Remark 6.2.
The problem (SLRA) for the matrix (15) can be solved as a weighted mosaic Hankel low-rank approximation of the matrix in (26), if we fix the zero elements. This can be accomplished by taking the following weight vector:
where denotes the vector of ones of length . We note that the case of infinite weights can be handled by the methods of Markovsky.Usevich13JCAMSoftware .
Corollary 6.3.
The complexity of the Gauss-Newton/Levenberg-Marquardt step for the structure (26) (using the methods of Markovsky.Usevich13JCAMSoftware ) is .
Remark 6.4.
As in Proposition 6.1, the structure (19) can be represented as a mosaic Hankel structure. We do not consider this representation here, because SLRA of presents a difficulty for optimization methods based on kernel representation of the rank constraint Markovsky.Usevich13JCAMSoftware , due to the nonlinear structure of the right kernel of a rank-deficient (as shown in Remark 4.8). Recently Ishteva.etal13Regularized , new me