Nearly Optimal Computations with Structured Matrices

# Nearly Optimal Computations with Structured Matrices

Victor Y. Pan
Depts. of Mathematics and Computer Science
of the City University of New York
Bronx, NY 10468 USA http://comet.lehman.cuny.edu/vpan/ Elias P. Tsigaridas
PolSys Project
INRIA, Paris-Rocquencourt Center
UPMC, Univ Paris 06, LIP6
CNRS, UMR 7606, LIP6
Paris, France
###### Abstract

We estimate the Boolean complexity of multiplication of structured matrices by a vector and the solution of nonsingular linear systems of equations with these matrices. We study four basic most popular classes, that is, Toeplitz, Hankel, Cauchy and Vandermonde matrices, for which the cited computational problems are equivalent to the task of polynomial multiplication and division and polynomial and rational multipoint evaluation and interpolation. The Boolean cost estimates for the latter problems have been obtained by Kirrinnis in [11], except for rational interpolation, which we supply now. All known Boolean cost estimates for these problems rely on using Kronecker product. This implies the -fold precision increase for the -th degree output, but we avoid such an increase by relying on distinct techniques based on employing FFT. Furthermore we simplify the analysis and make it more transparent by combining the representation of our tasks and algorithms in terms of both structured matrices and polynomials and rational functions. This also enables further extensions of our estimates to cover Trummer’s important problem and computations with the popular classes of structured matrices that generalize the four cited basic matrix classes.

\numberofauthors

2

## 1 Introduction

Table 1 displays four classes of most popular structured matrices, which are omnipresent in modern computations for Sciences, Engineering, and Signal and Image Processing. These basic classes have been naturally extended to the four larger classes of matrices, , , , and , that have structures of Toeplitz, Hankel, Vandermonde and Cauchy types, respectively. They include many other important classes of structured matrices such as the products and inverses of the matrices of these four basic classes, as well as the companion, Sylvester, subresultant, Loewner, and Pick matrices. All these matrices can be readily expressed via their displacements of small ranks [16, Chapter 4], which implies their further attractive properties:

• Compressed representation of matrices as well as their products and inverses through a small number of parameters.

• Multiplication by a vector in nearly linear arithmetic time.

• Solution of nonsingular linear systems of equations with these matrices in quadratic or nearly linear arithmetic time.

These properties enable efficient computations, closely linked and frequently equivalent to fundamental computations with polynomials and rational polynomial functions, in particular to the multiplication, division, multipoint evaluation and interpolation [17]. Low arithmetic cost is surely attractive, but substantial growth of the computational precision quite frequently affects the known algorithms having low arithmetic cost (see, e.g., [5]). So the estimation of the complexity under the Boolean model is more informative, although technically more demanding.

To the best of our knowledge, the first Boolean complexity bounds for multipoint evaluation are due to Ritzmann [19]. We also wish to cite the papers [25] and [13], although their results have been superceded in the advanced work of 1998 by Kirrinnis, [11], apparently still not sufficiently well known. Namely in the process of studying approximate partial fraction decomposition he has estimated the Boolean complexity of the multipoint evaluation, interpolation, and the summation of rational functions. He required the input polynomials to be normalized, but actually this was not restrictive at all. We generalize his estimates. For simplicity we assume the evaluation at the points of small magnitude, but our estimates can be rather easily extended to the case of general input. Kirrinnis’ study as well as all previous estimates of the Boolean complexity of these computational problems rely on multiplying polynomials as integers, by using Kronecker’s product, aka binary segmentation, as proposed in [8]. This implies the -fold increase of the computational precision for the -th degree output. The results that we present rely on FFT algorithms for multiplying univariate polynomials and avoid this precision growth.

We represent our FFT-based estimates and algorithms in terms of operations with both structured matrices and polynomial and rational functions. In both representations the computational tasks and the solution algorithms are equivalent, and so the results of [11] for partial fraction decomposition can be extended to most although not all of these tasks. By using both representations, however, we make our analysis more transparent. Furthermore in Section 7 we extend Kirrinnis’ results to the solution of a Cauchy linear system of equations (which unlike [11] covers rational interpolation) and in Section 7.2 to the solution of Trummer’s celebrated problem [9], [10], [6], having important applications to mechanics (e.g., to particle simulation) and representing the secular equation, which is the basis for the MPSolve, the most efficient package of subroutines for polynomial root-finding [3].

Our estimates cover multiplication of the matrices of the four basic classes of Table 1 by a vector and solving Vandermonde and Cauchy linear systems of equations. (These tasks are equivalent to the listed tasks of the multiplication, division, multipoint evaluation and interpolation of polynomials and rational functions.) Expressing the solution of these problems in terms of matrices has a major advantage: it can be extended to matrices from the four larger matrix classes , , , and . Actually the algorithms for multiplication by vector can be extended quite readily, as we explain in Section 7. There we also briefly discuss the solution of linear systems of equations with the matrices of the cited classes, which can be a natural subject of our further study.

#### Notation

In what follows \OB, resp. \OO, means bit, resp. arithmetic, complexity and \sOB, resp. \sO, means that we are ignoring logarithmic factors. “Ops" stands for “arithmetic operations". For a polynomial , denotes its degree and the maximum bitsize of its coefficients, including a bit for the sign. For , is the maximum bitsize of the numerator and the denominator. denotes the bit complexity of multiplying two integers of size ; we have . is an upper bound on the magnitude of the roots of . We write or just to denote the minimum distance between a root of a polynomial and any other root. We call this quantity local separation bound. We also write instead of . or just denotes the separation bound, that is the minimum distance between all the roots of . The Mahler bound (or measure) of is , where runs through the complex roots of , e.g. [14, 26]. If and , then . If we evaluate a function (e.g. ) at a number using interval arithmetic, then we denote the resulting interval by , provided that we fix the evaluation algorithm and the precision of computing. We write . denotes a -approximation to a polynomial , such that . In particular denotes a -approximation to a constant such that . stands for .

## 2 Preliminaries

### 2.1 Univariate Separation Bounds

The following proposition provides upper and aggregate bounds for the roots of a univariate polynomial. There are various version of these bounds. We use the one presented in [23], to which we also refer the reader for further details and a discussion of the literature. For multivariate separation bounds we refer the reader to [7].

###### Proposition 1

Let be a square-free univariate polynomial of a degree such that . Let be any set of pairs of indices such that , let the complex roots of be , and let be the discriminant of . Then

 ∏(i,j)∈Ω|γi−γj|≥2k−d−d(d−1)2\absa0kM(f)1−d−k√|\disc(f)|≥2k−d−d(d−1)2\absa0k\normf1−d−k2√|\disc(f)|. (2)

If and the maximum coefficient bitsize is then

 2−τ−1≤\absγ≤2τ+1, (3)
 −lg∏(i,j)∈Ω|γi−γj|≤3d2+3dτ+4dlgd. (4)

The following lemma from [24] provides a lower bound on the evaluation of a polynomial that depends on the closest root and on aggregate separation bounds.

###### Lemma 2

Suppose , is a square-free polynomial, and its root is closest to . Then

### 2.2 Complex Interval arithmetic

We also need the following bounds for the width of complex intervals when we perform computations with interval arithmetic. We will use them to bound the error when we perform basic computation with complex (floating point) numbers. We refer the reader to [20] for further details.

###### Proposition 3 (Complex intervals)

Given complex intervals and , where , resp. , denotes the modulus of any complex number in the complex interval , resp. . If and , then , , and .

### 2.3 Approximate multiplication of two polynomials

We need the following two lemmas from [17] on the evaluation of a polynomial at the powers of a root of unity and on polynomial multiplication. A result similar to the first lemma appeared in [21, Section 3] where Bluestein’s technique from [4] is applied (see also [12, Chapter 4.3.3, Exercise 16]). We use that lemma to provide a bound on the Boolean complexity of multiplying two univariate polynomials when their coefficients are known up to a fixed precision. An algorithm for this problem appeared in [21, Theorem 2.2] based on employing Kronecker’s product, but instead we rely on FFT and the estimates of Corollary 4.1 from [2, Chapter 3].

###### Lemma 4

Suppose of a degree at most such that . Let for a positive integer . Assume that we know the coefficients of up to the precision ; that is the input is assumed to be a polynomial such that . Let denote a -th root of unity. Then we can evaluate the polynomial at in such that . Moreover, , for all .

###### Lemma 5

Let of degree at most , such that and . Let denote the product and let for a positive integer . Write . Assume that we know the coefficients of and up to the precision , that is that the input includes two polynomials and such that and . Then we can compute in a polynomial such that . Moreover, for all .

###### Remark 6

In the sequel, for simplicity we occasionally replace the value by its simple upper bound .

## 3 Approximate FFT-based polynomial division

In this section we present an efficient algorithm and its complexity analysis for dividing univariate polynomials approximately. This result is the main ingredient of the fast algorithms for multipoint evaluation and interpolation. The evaluation is involved into our record fast real root-refinement, but all these results are also interesting on their own right because, unlike the previous papers such as [21], [22] and [11], we keep the Boolean cost bounds of these computations at the record level by employing FFT rather than the Kronecker product and thus decreasing the precision of computing dramatically.

Assume two polynomials and such that , , and seek the quotient and the remainder of their division such that and . Further assume that . This is no loss of generality because we can divide the polynomial by its nonzero leading coefficient. We narrow our task to computing the quotient because as soon as the quotient is available, we can compute the remainder at the dominated cost by multiplying by and subtracting the result from .

The complexity analysis that we present relies on root bounds of , contrary to [17] where it relies on bounds on the infinity norm of . To keep the presentation self-contained we copy from [17] the matrix representation of the algorithm, which occupies the next two pages, up to to Lemma 9.

We begin with an algorithm for the exact evaluation of the quotient. Represent division with a remainder by the vector equation

 ⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣1tn−11⋮⋮t1t0t1⋯1t0t1⋮t1t0⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣qm−nqm−n−1⋮q1q0⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦+⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣rn−1rn−2⋮r0⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣smsm−1⋮snsn−1⋮s0⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦.

The first equations form the following vector equation,

 ⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣1tn−11⋮⋮t1t0t1⋯1⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣qm−nqm−n−1⋮q1q0⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣smsm−1⋮sn+1sn⋮sm−n−1⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦⇔Tq=s, (5)

where , , and is the nonsingular lower triangular Toeplitz matrix, defined by its first column vector , . Write and where is the second coordinate vector, and express the matrix as a polynomial in a generator matrix of size as follows,

 Z=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝0…01⋱⋮⋱⋱⋮⋱00…10⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠,  T=Z(t)=t(Z)=n∑i=0tiZi,  Zn+1=O.

The matrix is nonsingular because , and the latter equations imply that the inverse matrix is again a polynomial in , that is again a lower triangular Toeplitz matrix defined by its first column. We compute this column by applying a divide and conquer algorithm. Assume that is a power of two, for a positive integer . If this is not the case, embed the matrix into a lower triangular Toeplitz matrix for and with the leading (that is northwestern) block , such that , compute the inverse matrix and output its leading block .

Now represent as the block matrix, where and are Toeplitz submatrices of the Toeplitz matrix , is invertible, and observe that

 T−1=⎡⎢ ⎢⎣T0O\omit\span\omit\span\@@LTX@noalign\vspace10pt\omitT1T0⎤⎥ ⎥⎦−1=⎡⎢ ⎢ ⎢⎣T−10O\omit\span\omit\span\@@LTX@noalign\vspace10pt\omit−T−10T1T−10T−10⎤⎥ ⎥ ⎥⎦.

We only seek the first column of the matrix . Its computation amounts to solving the same problem for the half-size triangular Toeplitz matrix and to multiplication of each of the Toeplitz matrices and by a vector. Let and denote the arithmetic cost of triangular Toeplitz matrix inversion and multiplying an Toeplitz matrix by a vector, respectively. Then the above analysis implies that . Recursively apply this bound to for , and deduce that . The following simple lemma (cf. [16, equations (2.4.3) and (2.4.4)]) reduce Toeplitz-by-vector multiplication to polynomial multiplication and the extraction of a subvector of the coefficient vector of the product, thus implying that for a constant and consequently .

###### Lemma 7

The vector equation

 ⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝u0O⋮⋱⋮⋱u0um⋱⋮⋱⋮Oum⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠⎛⎜ ⎜⎝v0⋮vn⎞⎟ ⎟⎠=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝p0⋮⋮pm⋮pm+n⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠ (5)

is equivalent to the polynomial equation

 (m∑i=0uixi)(n∑i=0vixi)=m+n∑i=0pixi. (6)

We wish to estimate the Boolean (rather than arithmetic) cost of inverting a triangular Toeplitz matrix and then extend this to the Boolean cost bound of computing the vector and of polynomial division. So next we assume that the input polynomials are known up to some precision and employ the above reduction of the problem to recursive (approximate) polynomial multiplications.

To study the Boolean complexity of this procedure, we need the following corollary, which is a direct consequence of Lemma 5 and the inequality .

###### Corollary 8 (Bounds for the product P20p1)

Let a polynomial have a degree , let its coefficients be known up to a precision , and let . Similarly, let have the degree , let its coefficients be known up to a precision , and let . Then the polynomial has degree , its coefficients are known up to the precision , and .

The following lemma is a normalized version of Lemma 4.4 in [11].

###### Lemma 9

Let such that , let be an upper bound on the magnitude of roots of , and let with and . Then

 \normQ∞≤2m+lgm+mρ\normF∞  and  \normR∞≤2m+n+lgm+mρ\normF∞.
{proof}

To bring the roots inside the unit circle, transform the polynomials by scaling the variable as follows, , , , and . Now apply [11, Lemma 4.4] to the equation to obtain and .

Combine these inequalities with the equation and the inequalities and to deduce the claimed bounds.

We will estimate by induction the cost of inverting the matrix , by using Eq. (3) recursively. The proof of the following lemma could be found in the Appendix.

###### Lemma 10

Let for a positive integer and let be a lower triangular Toeplitz matrix of. Eq. (5), having ones on the diagonal. Let its subdiagonal entries be complex numbers of magnitude at most known up to a precision . Let be an upper bounds on the magnitude of the roots of the univariate polynomial associated with . Write . Then

 maxi,j\absT−1i,j≤2(ρ+1)n+lg(n)+1.

Furthermore, to compute the entries of up to the precision of bits, that is to compute a matrix such that , it is sufficient to know the entries of up to the precision of

or bits.

The computation of costs or .

As usual in estimating the complexity of approximate division we assume that to simplify our presentation. Recall that .

###### Theorem 11

Assume of degree at most and , such that , , and is an upper bound on the magnitude of the coefficients of . Assume that we know the coefficients of and up to a precision , that is that the input includes two polynomials and such that and , where or . Let denote the quotient and let denote remainder of the division of the polynomials by , that is where .

Then we can compute in or two polynomials and such that and , and .

{proof}

We compute the coefficients of using Eq. (5), ie . Each coefficient of the polynomial comes as the inner product of two vectors, ie .

From Lemma 10 we know that and for .

For the coefficients of the polynomials and , we have assumed the following bounds, , , , , , and for all and . Therefore

 lg\normq−˜q∞ ≤lg\abs∑jT−1i,jsj−∑j˜T−1i,j˜sj≤−λ+ℓ2+τ1+lgn ≤−λ+10τ2lgn+70lg2n+8(ρ+1)nlgn+τ1+lgn.

To compute the remainder we apply the formula . It involves an approximate polynomial multiplication and a subtraction. For the former we use Lemma 5 and obtain the inequality .

Let us also cover the impact of the subtraction. After some calculations and simplifications that make the bounds less scary (albeit less accurate wrt the constant involved), we obtain

 lg\normr−˜r∞ ≤−λ+τ1+2τ2+6lgn+26+ℓ2+2N ≤−λ+τ1+2τ2+6lgn+26+10τ2lgn +70lg2n+8(ρ+1)nlgn+2(n(ρ+1)+lgn+1) ≤−λ+τ1+12τ2lgn+80lg2n+10(ρ+1)nlgn+30.

By using Lemma 9 we bound the norms of the quotient and the remainder as follows:

The maximum number of bits that we need to compute with is or .

The complexity of computing is or .

According to Lemma 5 the complexity of computing the product is or .

###### Remark 12

We can eliminate the dependence of the bounds of Theorem 11 on by applying Vieta’s formulae and the following inequality, , where is the -th coefficient of . In this way, after some further simplifications, the required precision is and the complexity bound becomes or .

## 4 Multipoint polynomial evaluation

###### Problem 1

Multipoint polynomial evaluation. Given the coefficients of a polynomial and a set of knots , compute the values or equivalently compute the vector where , , and .

In the case where the knots are the -th roots of 1 for all , /n, and , Problem 1 turns into the problem of the DFT computation.

#### Solution:

The Moenck–Borodin algorithm of [15] solves Problem 1 in ops for in (2.4.1), (2.4.2) based on the two following simple observations.

###### Fact 13

for any polynomial and any scalar .

###### Fact 14

for any triple of polynomials , , and .

Algorithm 4: the Moenck–Borodin algorithm for multipoint polynomial evaluation.

#### initialization:

Write , , ; for (that is, pad the set of the moduli with ones, to make up a total of moduli). Write .

#### Computation:

1. Fan-in process (see Figure 1). Compute recursively the “supermoduli" , ; .

2. Fan-out process (see Figure 2). Compute recursively the remainders ,
; .

#### Output:

, .

Let us include a brief outline of the analysis of the algorithm (cf. [15]). To prove its correctness, first apply Fact 14 recursively to obtain that for all and . Now, correctness of the output follows from Fact 13.

To estimate the computational cost of the algorithm, represent its two stages by the same binary tree (see Figures 3.1 and 3.2), whose nodes are the “supermoduli" at the fan-in stage 1, but turn into the remainders at the fan-out stage 2.

At each level of the tree, the algorithm computes products of pairs of polynomials of degree at stage 1 and remainders of the division of polynomials of degree of at most by “supermoduli" of degree . Each time multiplication/division uses ops for in (2.4.1), (2.4.2). So we use ops at the -th level and ops at all levels. Recall that and obtain the claimed bound of ops. \qed

###### Remark 15

The fan-in computation at stage 1 depends only on the set and can be viewed as (cost-free) preprocessing if the knot set is fixed and only the polynomial varies. Similar observations hold for the solution of many other problems in this chapter.

###### Remark 16

Problem 1 and its solution algorithms are immediately extended to the case where we have points for or . The solution requires ops provided , , and ops are sufficient for the solution where . for a general set but decreases to , where for fixed scalars , and and for all . This also leads to a similar improvement of the estimates for the Boolean complexity [1].

### 4.1 Boolean complexity estimates

In the following two lemmata we present the bit complexity of the fan-in and the fan-out process. These results are of independent interest. We do not estimate the accuracy needed and the bit complexity bound of the algorithm for multipoint evaluation because in Lemma 21 we cover a more general algorithm. Multipoint evaluation is its special case.

###### Lemma 17 (Complexity of Fan-in process)

Assume that we are given complex numbers known up to a precision , that is , and that for a positive integer . At the cost the Fan-in process of the Moenck–Borodin algorithm approximates the “supermoduli" within the bounds for all and . Moreover,