In this paper we show that the weighted Bernstein-Walsh inequality in logarithmic potential theory is sharp up to some new universal constant, provided that the external field is given by a logarithmic potential. Our main tool for such results is a new technique of discretization of logarithmic potentials, where we take the same starting point as in earlier work of Totik and of Levin & Lubinsky, but add an important new ingredient, namely a new mean value property for the cumulative distribution function of the underlying measure.
As an application, we revisit the work of Beckermann & Kuijlaars on the superlinear convergence of conjugate gradients. These authors determined the asymptotic convergence factor for sequences of systems of linear equations with an asymptotic eigenvalue distribution. Numerical evidence suggested the conjecture that the integral mean of Green functions occurring in their work should also yield inequalities for the rate of convergence, provided one makes a suitable link between measures and the eigenvalues of a single matrix of coefficients. We prove this conjecture, at least for a class of measures which is of particular interest for applications.
On the sharpness of the weighted Bernstein-Walsh inequality, with applications to the superlinear convergence of conjugate gradients
Bernhard Beckermann (Laboratoire Painlevé UMR 8524, UFR Mathématiques, Univ. Lille, F-59655 Villeneuve d’Ascq CEDEX, France. E-mail: email@example.com. Supported in part by the Labex CEMPI (ANR-11-LABX-0007-01).) and Thomas Helart (Laboratoire Painlevé UMR 8524, UFR Mathématiques, Univ. Lille, F-59655 Villeneuve d’Ascq CEDEX, France. E-mail: firstname.lastname@example.org.)
Keywords: logarithmic potential theory, Bernstein-Walsh inequality, discretization of potential, conjugate gradients, superlinear convergence.
AMS subject classification: 15A18, 31A05, 31A15, 65F10
Conjugate gradients (CG) is a popular method for solving large systems of equations, with the matrix of coefficients being symmetric and positive definite . However, its convergence (at least in exact arithmetic) is not yet fully understood, despite a large number of research contributions, for instance [3, 4, 5, 2, 7, 8, 9, 10, 18]. Quite often there is a regime of convergence called superlinear convergence, which depends very much on the eigenvalue distribution of the matrix, see §1.2 for more details. People have been aware of this phenomenon for more than 40 years, but only in  a general theory based on logarithmic potential theory was suggested to quantify the rate of convergence, see also  for a more comprehensive summary. The drawback of this theory is that all results in  study only the so-called asymptotic convergence factor. In addition, this theory requires considering sequences of systems of equations with a joint eigenvalue distribution, and thus gives little information about the actual rate of convergence for a single matrix. Numerical evidence in [3, 4, 5] seemed to indicate that behind the asymptotic results there should be some hidden inequality valid for a single matrix, see Conjecture 1.4 below. To our knowledge, the present paper is the first to address this conjecture, at least for a suitable subclass of eigenvalue distributions.
This paper contains three main ingredients, all being connected with polynomial extremal problems and thus with logarithmic potential theory. First, we discuss in §1.1 the sharpness of the so-called weighted Bernstein-Walsh inequality for the particular case where the external field is the logarithmic potential of some measure. Here our main result in Theorem 1.3 indicates the existence of some new universal constant. Secondly, we give and discuss in §1.2 a new upper bound for the rate of convergence of conjugate gradients, and prove in our Theorem 1.7 the above conjecture for a particular class of eigenvalue distributions, which is illustrated by some (academic) numerical examples.
Our main technical result stated and proved in §2 is Theorem 2.1 on a new fine discretization of logarithmic potentials for a suitable class of measures, where in contrast to preceding work of Totik, Lubinsky and others we get (large but) explicit constants. Here an essential tool is a new mean value property stated in Theorem 2.6.
1.1 The weighted Bernstein-Walsh inequality
One of the appealing aspects of CG convergence is that there is a close link with polynomial extremal problems and extremal problems in logarithmic potential theory, which we discuss now.
Given a finite union of compact intervals , we denote by the set of Borel measures with support in and of total mass , and consider the logarithmic potential and energy
Given a weight defined on and continuous on together with an external field , it is known [16, Theorem I.1.3 and Theorem I.4.8] that there is a unique minimizer of the extremal problem
which is uniquely characterized by the existence of a constant such that
Logarithmic potential theory with external fields has been applied with success for getting asymptotics for various polynomial extremal problems , maybe one of the most prominent results being the weighted Bernstein-Walsh inequality [16, Theorem III.2.1]
and its sharpness, see, e.g., [16, Corollary III.1.10],
where denotes the set of polynomials of degree at most , and . One aim of this paper is to improve (4) for a particular class of external fields, see Theorem 1.3 below, namely to show that (3) is sharp up to some constant. Before giving some more details, let us first have a look at other classes of external fields where such constants are explicitly known. In what follows we will write to denote the Green function in for a compact set with pole at . We will be mainly interested in the special case of an interval where the Green function vanishes on and is strictly positive outside , and where explicit formulas are available.
Consider and , then an explicit formula is known for the minimizer in (1)
also called Robin equilibrium measure of the interval and denoted by . It is also known from, e.g., [16, Eqn. (I.4.8)] that , and thus (3) becomes the classical Bernstein-Walsh inequality. Taking with the Chebyshev polynomial of the first kind, one may also show that (3) is sharp up to a factor .
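The sharpness claim of this example can be checked numerically. The following is a minimal sketch, assuming that the interval is [-1, 1] (the formulas of the example are not reproduced in this text), so that the Green function with pole at infinity is g(x) = arccosh(x) for real x ≥ 1, and that the factor in question is 2. The names `green_interval` and `sharpness_ratio` are hypothetical helpers introduced for illustration only.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def green_interval(x):
    """Green function of the complement of [-1, 1] (assumed interval), real x >= 1."""
    # g(x) = log(x + sqrt(x^2 - 1)) = arccosh(x)
    return np.arccosh(x)

def sharpness_ratio(n, x):
    """Return |T_n(x)| / exp(n * g(x)) for the Chebyshev polynomial T_n."""
    T_n = Chebyshev.basis(n)  # T_n has sup norm 1 on [-1, 1]
    return np.abs(T_n(x)) / np.exp(n * green_interval(x))

# Sample points on and to the right of the interval endpoint x = 1.
x = np.linspace(1.0, 5.0, 401)
ratios = sharpness_ratio(10, x)
```

Since T_n(x) = cosh(n arccosh(x)) for x ≥ 1, the ratio equals (1 + exp(-2 n g(x)))/2, which lies between 1/2 and 1. In this assumed setting the Bernstein-Walsh bound is therefore attained up to a factor of 2.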
Consider , and with being a polynomial of degree , strictly positive on , compare with [14, chap 4.4]. Thus with an atomic measure of mass . Here the extremal measure in (1), (2) is given in [16, Example II.4.8] in terms of balayage onto , and it follows from [16, Eqn. (4.32)] that
Moreover, with the help of the factorization
, the polynomial of degree having all its roots outside the unit circle, it is known that defined by
is a polynomial of degree , showing that again (3) is sharp up to a factor .
We are interested in the case where the external field is a positive potential (not necessarily of an atomic measure), for instance if is a (power of a) polynomial. This includes the particular case on for , the starting point of an important research area about incomplete polynomials [16, §VI.1.1]. For external fields being a positive potential, we recall below how to solve the extremal problem, including the well-known pushing effect that the support of the equilibrium measure may be a proper subset of . We then state our main result on the sharpness of the weighted Bernstein-Walsh inequality.
Let be some integer, and on , with the Borel measure being compactly supported on . Consider on the strictly decreasing function
Moreover, the weighted Bernstein-Walsh inequality (3) is sharp up to some constant, that is, there exists a universal real constant such that, for all , we may construct a polynomial of degree such that, for all ,
Our proof of Theorem 1.3 presented in §3.1 is based on a fine discretization of the logarithmic potential . We will show in this paper that , but this is by no means optimal. The most remarkable fact for us seems to be that such a constant does not depend on the data nor on . In particular, we do not need any further assumptions on smoothness of , which are probably required by other techniques like a Riemann-Hilbert approach (which in any case would only allow one to discuss asymptotics).
1.2 Superlinear convergence for conjugate gradients
Conjugate gradients is a popular method for solving large sparse linear systems with symmetric positive definite , with spectrum , . Here one easily obtains the following error estimate for the th iterate (in general, (8) might considerably overestimate the error, but there exist right-hand sides with equality)
with the energy norm , where is any compact set containing the spectrum , for instance . Thus, in contrast to the polynomial extremal problem considered in §1.1, we have a trivial weight and take norms on discrete sets. One way of relating the two problems is to replace by an interval containing all eigenvalues, leading to the classical upper bound
compare with Example 1.1. It has however been known for a long time that there are eigenvalue distributions which lead to convergence faster than the one described in (9), namely so-called superlinear convergence, see for instance Figure 1. A first attempt to quantify such a convergence behavior was suggested by Kuijlaars and Beckermann , see also the review , or the review  from the perspective of discrete orthogonal polynomials. The key ingredient of this theory is to have at one's disposal a measure with continuous potential and compact support describing the eigenvalue distribution. In , this is quantified by supposing that there is a sequence of systems , with being the weak-star limit of normalized counting measures of the spectra of the symmetric and positive definite matrices ,
Under some additional weak assumptions for small eigenvalues, the authors establish in [3, Theorem 2.1] for the th iterate of conjugate gradients applied to the system the asymptotic upper bound
where is a decreasing family of compact subsets of the convex hull of the spectra, obtained from some constrained extremal problem in logarithmic potential theory, which we explain now.
For measures with compact support and continuous potential, and , according to [6, 15] there exists a unique minimizer of under all candidates with . This minimizer is uniquely characterized by the existence of a constant such that
Many Buyarov-Rakhmanov type properties are known about the measures for fixed and varying ; we just recall here from [3, Proof of Theorem 2.1] the fact that the measures are increasing in , and hence
As a consequence, the map is concave and describes superlinear convergence behavior. The compact sets may have a quite complicated shape, and the main finding of  roughly says that the th Ritz values of approximate well all eigenvalues in . There is a similar (rough) interpretation of (11): so-called "converged" eigenvalues which are already well approximated by th Ritz values should no longer contribute (in exact arithmetic) to the convergence of CG at later stages.
In many examples, numerical evidence led to the conjecture that the above upper bound (11) even holds (up to some modest constant) for a single matrix , without limits and without taking the -th root, see for instance [3, Eqn.(1.9) and Figures 1 and 4], [5, Eqn. (1.3)], or Figure 1. Of course, for a single matrix we cannot define through (10). This gives the following conjecture.
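The gap between the classical bound and superlinear convergence can be observed on a small experiment. The following is a minimal sketch under stated assumptions: we take (9) to be the standard bound 2((√κ-1)/(√κ+1))^n with κ the ratio of the extreme eigenvalues (the formula itself is not reproduced in this text), and we use a hypothetical diagonal test matrix with eigenvalues 1, 2, ..., 100 and a right-hand side of all ones. The helper `cg_energy_errors` is introduced for illustration only.

```python
import numpy as np

def cg_energy_errors(eigs, b, n_iter):
    """Plain CG for diag(eigs) x = b, starting from x0 = 0.

    Returns the energy-norm errors ||x_n - x||_A for n = 0..n_iter.
    """
    x_exact = b / eigs
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x
    p = r.copy()          # search direction
    errors = [np.sqrt(np.sum(eigs * (x - x_exact) ** 2))]
    for _ in range(n_iter):
        Ap = eigs * p     # matrix-vector product with a diagonal matrix
        rr = r @ r
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        p = r + ((r @ r) / rr) * p
        errors.append(np.sqrt(np.sum(eigs * (x - x_exact) ** 2)))
    return np.array(errors)

# Hypothetical test data: equidistant eigenvalues and a right-hand side of ones.
N, n_iter = 100, 40
eigs = np.arange(1, N + 1, dtype=float)
b = np.ones(N)

relative = cg_energy_errors(eigs, b, n_iter)
relative = relative / relative[0]

# Assumed form of the classical bound (9) based on the condition number.
kappa = eigs[-1] / eigs[0]
rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
classical = 2.0 * rho ** np.arange(n_iter + 1)
```

Plotting `relative` and `classical` on a logarithmic scale should show the CG error curve bending away from the straight line of the classical bound, which is the qualitative behavior described above.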
There is a (modest) constant and a technique of associating a measure with compact support and continuous potential to the spectrum of a positive definite matrix such that, for all sufficiently small,
It may be that this conjecture is wrong for measures where has a complicated shape. In our proof of the conjecture, following [3, Lemma 3.1(a)], we will impose sufficient conditions on such that for all .
Suppose that is supported on the interval with density with respect to Lebesgue measure denoted by , and suppose that vanishes at , and is strictly increasing in (it follows that has compact support and continuous potential). Then for all we have , with being the unique solution of the equation
in particular is strictly increasing.
Roughly speaking, having for sufficiently small means that there are so few eigenvalues around that they are the first eigenvalues which are well approximated by Ritz values of low order. One of the reasons to consider such sets is that, in any case, the superlinear convergence rate is only pronounced if small eigenvalues are well approximated by Ritz values, and the rate depends less on other "converging" eigenvalues, which to first order could be neglected. Another reason is that, if the system comes from discretizing an elliptic PDE, we might have only asymptotic knowledge on small eigenvalues of through a so-called Weyl formula. The final reason is that in the particular case the analysis becomes simpler, and also the upper bound is more explicit, since, by (9),
in terms of some "effective condition number" , compare with [5, Eqn. (2.27)].
In order to proceed, we first extend our definition (8) of to compact sets which do not necessarily contain the spectrum of : following , for a fixed matrix , a compact set , and sufficiently large , let
and then obviously . This inequality has been used for example in  or  in order to derive a CG convergence bound taking into account a few outliers represented by the set , where typically is the convex hull of the remaining eigenvalues.
In what follows we consider , and thus we prescribe as roots of the smallest eigenvalues . Understanding the modulus of the product of the corresponding linear factors as a weight, and setting , and , Theorem 1.3 gives the following upper bounds in terms of Green functions. The sharpness follows from the weighted Bernstein-Walsh inequality (3).
For any integer , let be equal to if , and else let be the unique solution of the equation . Then
being sharp up to the factor .
Corollary 1.6 gives us for each an upper bound for the function , each of them having the shape of a straight line for sufficiently large , with the slope of these straight lines decreasing with , but the abscissa in general increases. We thus hope that is close to the value of the concave lower envelope of these straight lines, which is true for the particular example of Figure 2. In fact, finding an optimal with minimal for given seems to be a difficult task; we will instead suggest an approximate solution in order to settle the above conjecture.
Let and for be as in Lemma 1.5, and be a symmetric positive definite matrix with spectrum .
If the integers and are such that
and thus Conjecture 1.4 holds.
The above choice (15) of is nearly optimal in the following sense: consider diagonal with eigenvalues satisfying for . Furthermore, let with , then (we write instead of in order to indicate that here we consider the spectrum of depending on )
It is also interesting to compare Theorem 1.7(a),(b) with [4, Theorem 3.1] which showed under the sole assumption (10) (and for quite general measures ) that, for any fixed compact set , the quantity is asymptotically greater than or equal to the right-hand side of (11). One of the consequences of our Theorem 1.7 is that, roughly, we can achieve equality for the interval .
Consider the probability density
in particular for , for , and for , in accordance with the right-hand plot of Figure 2.
In the previous example the small eigenvalues were approximately equidistant, with stepsize , and the convex hull of the spectrum given approximately by . Up to correct scaling, a similar behavior is true for the eigenvalues of the finite difference discretization of the 2D Laplacian on the unit square with Dirichlet boundary conditions, and thus the convergence curves should be similar. However, this is no longer true for higher dimensions , where we expect that grows like a constant times for small , which motivates the following example.
For a parameter , consider the density
We again choose for attaining equality in (14), however, there are no longer explicit formulas, and thus the have to be computed numerically. In Figure 3 we have plotted two examples for , on the left for and on the right for , where in both cases we have chosen the approximately optimal of Theorem 1.7(a), in accordance with the statement of Theorem 1.7. Notice also the well-known phenomenon that the convergence of CG improves dramatically as gets larger.
1.3 Structure of the paper
The remainder of the paper is organized as follows. Section 2 contains our results on discretizing the logarithmic potential of a class of measures including the extremal measure of Theorem 1.3. We first state our main Theorem 2.1, and then report in §2.1 on related results of Totik and of Lubinsky, and on the link with weighted quadrature formulas. Subsequently, we give in §2.2 the structure of the proof of Theorem 2.1, where following Totik we write the discretization error as a sum of three sums. We then state our original approach for dealing with these three sums, namely the mean value property of Theorem 2.6, and describe in §2.3 how to bound each of the three sums, with explicit constants.
In the third section we explain how to deduce Theorem 1.3 from Theorem 2.1 (§3.1), and Theorem 1.7 from Theorem 1.3 (§3.2). Subsequently, we give some concluding remarks. Our (quite technical) proof of Theorem 2.6 is postponed to Appendix A, and in Appendix B we gather some further technical results for dealing with our three sums.
2 Discretization of a potential
Consider a measure which has the density
for a function which is non-negative, concave and increasing on (in particular, is continuous and bounded on , thus we may extend to become a continuous, non-negative, concave and increasing function in ), such that is convex on . Then there exists a universal explicit constant such that for each we may construct a monic polynomial of degree such that
Another class of functions satisfying the assumptions of Theorem 2.1 for is given by
We will describe in §2.1 related work for discretizing potentials under various assumptions, but here the constants in general depend on , see for instance [16, §VI.4] for a summary. In §2.2 we give a proof of Theorem 2.1, where we initially follow the approach of Totik in [17, §2 and §9], see also the very accessible reference [13, Method 1] for the particular case on (up to a quadratic change of variables). Subsequently, we give in §2.3 a proof of three upper bounds we used in §2.2. Since the general case follows from a linear change of variables, we will suppose in what follows that in Theorem 2.1.
2.1 How to discretize a potential?
It is natural to approximate the logarithmic potential by a quadrature rule of the form
for instance a weighted rectangular or midpoint rule, where we first cut into subintervals , , of equal mass , and choose for . As long as and the density of does not vary too much, we may bound the error above and below, and may even show convergence to for for suitable choices of . In our case we have the additional difficulties that the density of may have singularities at , so that the interval lengths may strongly vary in size for , and in addition in case we have to deal with a logarithmic singularity of the integrand.
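The equal-mass cutting described above can be illustrated on a case with closed formulas. The following is a minimal sketch, assuming the arcsine (equilibrium) measure on [-1, 1] as a stand-in for the measure of the text, the convention U^μ(z) = ∫ log(1/|z-t|) dμ(t), and the center of mass of each cell as its node; none of these choices is claimed to coincide with the rule used in the paper. The helpers `center_of_mass_nodes`, `neg_potential` and `discretization_error` are hypothetical names.

```python
import numpy as np

def center_of_mass_nodes(n):
    """Nodes for the arcsine measure on [-1, 1], cut into n cells of mass 1/n."""
    # With t = cos(theta), the arcsine measure becomes d(theta)/pi on [0, pi],
    # so the equal-mass cut points are cos(j*pi/n), j = 0..n.
    theta = np.pi * np.arange(n + 1) / n
    # Center of mass of cell j: n * integral of cos(theta) d(theta)/pi over the cell.
    return (n / np.pi) * (np.sin(theta[1:]) - np.sin(theta[:-1]))

def neg_potential(z):
    """-U^mu(z) for the arcsine measure and real z > 1."""
    # Closed formula: integral of log|z - t| dmu(t) = log((z + sqrt(z^2 - 1)) / 2).
    return np.log((z + np.sqrt(z * z - 1.0)) / 2.0)

def discretization_error(n, z):
    """log|p_n(z)| + n * U^mu(z) for the monic polynomial p_n with the nodes above."""
    nodes = center_of_mass_nodes(n)
    return np.sum(np.log(np.abs(z - nodes))) - n * neg_potential(z)

err_far = discretization_error(20, 2.0)    # evaluation point away from the support
err_near = discretization_error(20, 1.1)   # evaluation point close to the endpoint
```

For real z outside the support, t ↦ log|z-t| is concave on each cell, so by Jensen's inequality the error is non-negative for center-of-mass nodes. The experiment shows in addition that it stays small in this example, which is the kind of two-sided control discussed in this section.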
Totik in [13, Method 1] used the weighted midpoint rule
for . In the particular case and , a proof of Theorem 2.1 can be found in [13, §2], which strongly relies on the explicit knowledge of asymptotics for the points and as a function of and for , and thus on the explicit knowledge of . In [16, Theorem VI.4.2] (see also the related result [17, Lemma 9.1] where the roots of are slightly shifted into the complex plane), Totik considered probability measures with densities which are continuous up to a finite number of singularities of the form for . These assumptions are true in the setting of Theorem 2.1. He then shows the existence of (non explicit) constants depending on but not on such that, for all ,
We see that the first inequality is as in Theorem 2.1(a), whereas the second one is clearly weaker than Theorem 2.1(b) for close to , since we get an additional term for some . Again, a proof of these statements uses heavily asymptotics for the points and as a function of and for , and thus quite a bit of information on .
Another technique of discretization has been considered by Lubinsky & Levin in  and , see also the very accessible reference [13, Method 2] for the particular case on (up to a quadratic change of variables). With as before, consider intermediate abscissae such that all intervals have the same mass . Given , the authors then apply the trapezoidal rule on most of the subintervals, corrected with suitable rectangle rules on the remaining 2 or 3 subintervals such that . Up to a (quadratic) change of variables, the authors of [11, Theorem 9.1] suppose that
with continuous and on , and the modulus of continuity satisfies that is bounded above by some for . In this case, for all ,
where are (non explicit) constants depending only on and the minimum and maximum of on . Note that the assumptions of [11, Theorem 9.1] and those of Theorem 2.1 are different and do not imply each other, see for instance Example 2.2 for . However, the above inequalities are quite close to those of Theorem 2.1, though our constants do not depend on , and our does not depend on , and we only allow .
In the particular case and in Theorem 2.1, we have explicit formulas
Here the midpoint approach of Totik gives the monic polynomial
which is not optimal for the one-sided approximation of in Theorem 2.1 or the sharpness of the classical Bernstein-Walsh inequality as discussed in Example 1.1, but good enough for concluding in Theorem 2.1.
The previous example is misleading in the sense that in general there is no such sufficiently explicit formula for the nor the which will allow us to conclude in Theorem 2.1.
2.2 Structure of the proof of Theorem 2.1
The following classical lemma shows Theorem 2.1(b).
Proof: Using the fact that is convex on by assumption on , we know that , and thus
where in the last equality we have used (17). Also, using the convexity of and the inequality we obtain
Integrating and using again (17) we conclude that
(a) The interested reader might have noticed that, by the same argument, the inequality of Theorem 2.1(b), namely , also holds for .
(b) For (and similarly for ), the right-hand side of Theorem 2.1(b) cannot be improved since, by Lemma 2.4 and Lemma B.8(c),
(c) For , it is not too difficult to show that satisfies
and hence by Lemma B.8(c)
Thus, for sufficiently large , the inequality of Theorem 2.1(b) also holds for non-real up to some arbitrarily small constant.