Convergence of adaptive ILG Methods

On the Convergence of
Adaptive Iterative Linearized Galerkin Methods

Pascal Heid and Thomas P. Wihler Mathematics Institute, University of Bern, Sidlerstrasse 5, CH-3012 Bern, Switzerland

A wide variety of different (fixed-point) iterative methods for the solution of nonlinear equations exists. In this work we will revisit a unified iteration scheme in Hilbert spaces from our previous work [12] that covers some prominent procedures (including the Zarantonello, Kačanov and Newton iteration methods). In combination with appropriate discretization methods so-called (adaptive) iterative linearized Galerkin (ILG) schemes are obtained. The main purpose of this paper is the derivation of an abstract convergence theory for the unified ILG approach (based on general adaptive Galerkin discretization methods) proposed in [12]. The theoretical results will be tested and compared for the aforementioned three iterative linearization schemes in the context of adaptive finite element discretizations of strongly monotone stationary conservation laws.

Key words and phrases:
Numerical solution methods for quasilinear elliptic PDE, monotone problems, fixed point iterations, linearization schemes, Kačanov method, Newton method, Galerkin discretizations, adaptive mesh refinement, convergence of adaptive finite element methods
2010 Mathematics Subject Classification:
35J62, 47J25, 47H05, 47H10, 49M15, 65J15, 65N12, 65N30, 65N50
The authors acknowledge the financial support of the Swiss National Science Foundation (SNF), Grant No. 200021 182524

1. Introduction

In this paper we analyze the convergence of adaptive iterative linearized Galerkin (ILG) methods for nonlinear problems with strongly monotone operators. To set the stage, we consider a real Hilbert space with inner product  and induced norm denoted by . Then, given a nonlinear operator , we focus on the equation


where  denotes the dual space of . In weak form, this problem reads


with  signifying the duality pairing in . For the purpose of this work, we suppose that satisfies the following conditions:

  1. The operator is Lipschitz continuous, i.e. it exists a constant such that

    for all .

  2. The operator is strongly monotone, i.e. there is a constant such that

    for all .

Given the properties (F1) and (F2), the main theorem of strongly monotone operators states that (1) has a unique solution ; see, e.g., [16, §3.3] or [18, Theorem 25.B].

Iterative linearization

The existence of a solution to the nonlinear equation (1) can be established in a constructive way. This can be accomplished, for instance, by transforming (1) into an appropriate fixed-point form, which, in turn, induces a potentially convergent fixed-point iteration scheme. To this end, following our approach in [12], for some given , we consider a linear and invertible preconditioning operator . Then, applying  to (1) leads to the fixed-point equation

For any suitable initial guess , the above identity motivates the iteration scheme

Equivalently, we have


For given , we emphasize that the above problem of solving for  is linear; consequently, we call (3) an iterative linearization scheme for (1). Letting


we may write


In order to discuss the weak form of (5), for a prescribed , we introduce the bilinear form


Then, based on , the solution  of (5) can be obtained from the weak formulation


Throughout this paper, for any , we assume that the bilinear form  is uniformly coercive and bounded. The latter two assumptions refer to the fact that there are two constants  independent of , such that




respectively. In particular, owing to the Lax-Milgram Theorem, these properties imply the well-posedness of the solution  of the linear equation (7), for any given .

Let us briefly review some prominent procedures that can be cast into the framework of the linearized fixed-point iteration (7): For instance, we point to the Zarantonello iteration given by


with  being a sufficiently small parameter; cf. Zarantonello’s original report [17], or the monographs [16, §3.3] and [18, §25.4]. A further example is the Kačanov scheme which reads


in the special case that is independent of . Finally, we mention the (damped) Newton method which is defined by


for a damping parameter . Here  signifies the Gâteaux derivative of  (provided that it exists). For any of the above three iterative procedures, we emphasize that convergence to the unique solution of (1) can be guaranteed under suitable conditions; see our previous work [12] for details.

The ILG approach

Consider a finite dimensional subspace . Then, the Galerkin approximation of (2) in  reads as follows:


We note that (13) has a unique solution  since the restriction still satisfies the conditions (F1) and (F2) above. The iterative linearized Galerkin (ILG) approach is based on discretizing the iteration scheme (7). Specifically, a Galerkin approximation of , based on a prescribed initial guess , is obtained by solving iteratively the linear discrete problem


for . For the resulting sequence  of discrete solutions it is possible, based on elliptic reconstruction techniques (cf., e.g., [13, 14]), to obtain general (abstract) a posteriori estimates for the difference to the exact solution, , of (1), i.e. for , , see [12, §3]. Based on such a posteriori error estimators, an adaptive ILG algorithm that exploits an efficient interplay of the iterative linearization scheme (14) and automatic Galerkin space enrichments was proposed in [12, §4]; see also [6]. We refer to some related works in the context of (inexact) Newton schemes [1, 2, 9, 8], or of the Kačanov iteration [4, 11].

Goal of this paper

The convergence of an adaptive Kačanov algorithm, which is based on a finite element discretization, for the numerical solution of quasi-linear elliptic partial differential equations has been studied in [11]. Furthermore, more recently, the authors of [10] have proposed and analyzed an adaptive algorithm for the numerical solution of (1) within the specific context of a finite element discretization of the Zarantonello iteration (10). The latter paper includes an analysis of the convergence rate which is related to the work [5] on optimal convergence for adaptive finite element methods within a more general abstract framework. The purpose of the current paper is to generalize the adaptive ILG algorithm from [10] to the framework of the unified iterative linearization scheme (5); furthermore, arbitrary (conforming) Galerkin discretizations will be considered. In order to provide a convergence analysis for the ILG scheme (14) within this general abstract setting, we will follow along the lines of [10], however, we emphasize that some significant modifications in the analysis are required. Indeed, whilst the theory in [10] relies on a contraction argument for the Zarantonello iteration, this favourable property is not available for the general iterative linearization scheme (5). To address this difficulty, we derive a contraction-like property instead. This observation will then suffice to establish the convergence of the adaptive ILG scheme, and to (uniformly) bound the number of linearization steps on each (fixed) Galerkin space similar to [10]; we note that the latter property constitutes a crucial ingredient with regards to the (linear) computational complexity of adaptive iterative linearized finite element schemes.


Section 2 contains a convergence analysis of the unified iteration scheme (5). On that account we will encounter a contraction-like property, which is key for the subsequent analysis of the convergence rate of the adaptive ILG algorithm in Section 3. Here, in addition, a (uniform) bound of the iterative linearization steps on each discrete space will be shown. In Section 4, we will test our ILG algorithm in the context of finite element discretizations of stationary conservation laws. Finally, we add a few concluding remarks in Section 5.

2. Iterative linearization

In this first section we will address the convergence of the linearized iteration (5). We begin with the following a posteriori error estimate.

Lemma 2.1.

Consider the sequence  generated by the iteration (5). If satisfies (F1)–(F2), and , for , fulfils (8)–(9), then it holds the bound


for any .


By invoking (F2), and since is the (unique) solution of (1), for , we find that

Employing (4), (7), and (9), we further get

and thus

By the triangle inequality, this leads to

which completes the proof. ∎

Remark 2.2.

We note that the above result equally holds if (2) and (5) are restricted to any closed subspace of .

2.1. Potentials

In addition to (F1) and (F2), let us make a further assumption on the (nonlinear) operator  from (1), which will play an essential role in the ensuing analysis.

  • The operator possesses a potential, i.e. it exists a Gâteaux differentiable functional such that .

We notice the following relation between the norm  and the potential .

Lemma 2.3.

Suppose that the operator satisfies (F1)–(F3), and denote by the unique solution of (1). Then, we have the estimate


In particular, takes its minimum at .


For fixed , define the real-valued function , for . Taking the derivative leads to

By invoking the fundamental theorem of calculus and implementing (1), this yields

Applying the assumptions (F1) and (F2) we can bound the integrand from above and below, respectively. Indeed, the strong monotonicity (F2) implies that

and therefore,

Likewise, by invoking (F1) instead of (F2), we find that

Combining the above bounds leads to (16). ∎

2.2. Contractivity

In order to state and prove the main result of this section, see Theorem 2.7 below, we impose a monotonicity condition on the sequence generated by the iterative linearization scheme (5).

  • There is a constant such that the sequence defined by (5) fulfils the bound


    where is the potential of introduced in (F3).

Proposition 2.4.

Suppose that (F1)–(F4) are satisfied. Furthermore, let the bilinear form from (7) be coercive and bounded, cf. (8) and (9), respectively. Then, for any , the sequence  from (5) satisfies the estimate




Moreover, the contraction-like property


holds true for any .

Before turning to the proof of the above proposition, we establish an auxiliary result.

Lemma 2.5.

Consider a sequence  which satisfies the estimate


for some constant . Then, it holds the bound , for any .


Let us define the sequence , . Using (21), we note that

for all . By induction, this implies that  for any . Therefore, we infer that

Rearranging terms completes the proof. ∎

Proof of Proposition 2.4.

Let be arbitrary. Then, we note the telescope sum

Thus, by virtue of (17), we infer that


We aim to bound the left-hand side. To this end, we employ Lemma 2.3, which implies that

This, together with Lemma 2.1, leads to


Combining (22) and (23) yields

Letting , we obtain (18). Moreover, upon setting  and , , the bound (18) takes the form (21). Hence, applying Lemma 2.5, we deduce (20). ∎

From (20) we immediately obtain the following result.

Corollary 2.6.

Under the assumptions of Proposition 2.4, it follows that is a null sequence as .

2.3. Convergence

We are now ready to state and prove the main result of this section.

Theorem 2.7.

Suppose that (F1)–(F4) as well as (8) and (9) hold true. Then, the sequence obtained from the iterative linearization procedure (5) converges to the unique solution of (1).


Combining Lemma 2.1 and Corollary 2.6 directly implies the convergence of the linearized iteration scheme (5). ∎

Remark 2.8.

In the proof of Theorem 2.7 the application of Lemma 2.1 can be replaced by using  [12, Proposition 2.1] instead. We note that the latter result does not require property (F1) to hold. Indeed, assume that (F2), (8) and (9) are satisfied, and that and are continuous mappings from into its dual space  with respect to the weak topology on . Then, if the sequence defined by (5) satisfies  as , it converges to the unique solution of (1).

2.4. Some remarks on condition (F4)

Suppose that the assumptions (F1)–(F3) are satisfied, and consider the sequence generated by the iteration (5). Analogously as in the proof of Lemma 2.3, for fixed , we define the real-valued function , for . Then, it holds the identity

Using (4) and (7), we note that


Consequently, if the bilinear form , for any given , is uniformly coercive with constant , cf. (8), where refers to the Lipschitz constant occurring in (F1), then we obtain that

i.e. (17) is satisfied with .

Proposition 2.9.

If satisfies (F1)–(F3), and the bilinear form from the unified iteration scheme (5) is coercive with coercivity constant , cf. (8), then (F4) holds true.

Remark 2.10.

For the Zarantonello iteration scheme (10) we note that , for , in (6). Then, we have that

which, upon using Proposition 2.9, shows that (F4) is satisfied for any . Under suitable assumptions, a similar observation can be made for the Newton method (12) provided that the damping parameter  is chosen sufficiently small; cf. [12, Theorem 2.6].

Remark 2.11.

The above Proposition 2.9 delivers a sufficient condition for (F4). We note, however, that it is not necessary. In particular, if the coercivity constant  in (8) is much smaller than the Lipschitz constant from (F1), then the bound on  in Proposition 2.9 is violated. Nonetheless, in that case, we can still satisfy (17) by imposing alternative assumptions; cf., e.g., (K2) in [12].

3. Adaptive ILG Discretizations

In this section, following the recent approach [10], we will present an adaptive ILG algorithm that exploits an interplay of the unified iterative linearization procedure (5) and abstract adaptive Galerkin discretizations thereof, cf. (14). Moreover, we will establish the (linear) convergence of the resulting sequence of approximations to the unique solution of (1), and comment on the uniform boundedness of the iterative linearization steps on each discrete space. We proceed along the ideas of [10, §4 and §5], and generalize those results to the abstract framework considered in the current paper. Throughout this section, we will assume that any iterative linearization is of the form (7), with (8) and (9) being satisfied.

3.1. Abstract error estimators

We generalize the assumptions on the finite element refinement indicator from [10, §4]. Let us consider a sequence of hierarchical finite dimensional Galerkin subspaces , i.e.


Suppose that, for any , there is a computable error estimator


which satisfies the following two properties:

  1. For all it holds that

  2. The error of the discrete solution  from (13) is controlled by the a posteriori error bound


    where  is the exact solution of (1).

Here, are two constants.

The following result shows that the two estimators for and are equivalent once the linearization error is small enough.

Lemma 3.1.

Suppose that satisfies (F1)–(F2), and that the a posteriori estimator fulfils (A1). Furthermore, for some , assume that


with a constant , where


Then, we have that


Moreover, the two error estimators  and  are equivalent in the sense that


Owing to Lemma 2.1 and Remark 2.2, and due to (28), it holds that


Invoking the Lipschitz continuity (A1), we obtain

Since we have that . Hence, manipulating the above inequality yields


Combining the two inequalities (32) and (33) gives the bound (30). Moreover, applying (A1), as before, and using (32), we infer that

Similarly, employing (33), it follows that

This completes the argument. ∎

3.2. Adaptive ILG algorithm

We focus on the adaptive algorithm from [10], which was studied in the context of finite element discretizations of the Zarantonello iteration (10). It is closely related to the general adaptive ILG scheme in [12]. The key idea is the same in both algorithms: On a given Galerkin space, we iterate the linearization scheme (14) as long as the linearization error dominates. Once the ratio of the linearization error and the a posteriori error bound is sufficiently small, we enrich the Galerkin space in a suitable way.

1:Prescribe a tolerance , and an adaptivity parameter . Moreover, set  and . Start with an initial Galerkin space , and an arbitrary initial guess .
3:     Set and .
4:     while  do
5:         Perform a single iterative linearization step (14) to obtain from .
6:         Update .
7:         Set , and compute the error estimator from (25).
8:     end while
Let , and enrich the Galerkin space appropriately based on the error estimator in order to obtain .
10:     Define by inclusion .
11:     Update , and set .
12:until .
13:return the sequence of discrete solutions .
Algorithm 1 Adaptive ILG algorithm
Remark 3.2.

We emphasize that we do not know (a priori) if the while loop of Algorithm 1 always terminates after finitely many steps. Moreover, it may happen that , for some , i.e. the algorithm terminates. Let us provide two comments on this issue:

  1. Suppose that there is an enrichment of generated by the above Algorithm 1 such that for all ; in this situation, the while loop will never end. Given the assumptions of Proposition 2.4, it follows from Corollary 2.6 that as . In addition, by virtue of Theorem 2.7 (applied to the discrete setting (13) and (14)), we have that as . Then, invoking the reliability (A2) and the continuity (A1), we conclude that

    It follows that , and therefore as . In particular, Algorithm 1 will generate an approximate solution which, for sufficiently large , is arbitrarily close to the exact solution of (1).

  2. If, for some , the while loop terminates, then we have the bound . Thus, in the special situation where , we directly obtain that . Then, employing Lemma 2.1 and Remark 2.2, we find that , i.e. . Consequently, recalling (A2), we deduce that

    We obtain that , i.e. the exact solution of (1) is found.

3.3. Perturbed contractivity

We will now turn to the proof of the convergence of Algorithm 1. More precisely, we will show that the sequence generated by the above ILG procedure converges, under certain assumptions, to the exact solution of (1). In view of Remark 3.2 we may assume that the while loop always terminates after finitely many steps with for all .

We begin with the following result, which corresponds to [10, Proposition 4.10]. Since we consider general Galerkin discretizations, an additional assumption, cf. the perturbed contraction (34) below, is imposed.

Proposition 3.3.

Let (F1)–(F2) and (A1) be satisfied, and  be given. Moreover, for each , assume that the while loop of Algorithm 1 terminates after finitely many steps, thereby yielding an output , with . Furthermore, suppose that there are constants and such that it holds the perturbed contraction bound


where is the unique solution of (13). Then, we have that as .


Set , and denote by the solution of the weak formulation

For any , Galerkin orthogonality reads for all . Thus, by the Lipschitz continuity (F1) and strong monotonicity (F2) we find that

for any . This results in the Céa type estimate


Recalling the nestedness (24) of the Galerkin spaces, and exploiting the definition of , the above bound (35) directly implies that for . Consequently, we deduce that for . Hence, by (34), the estimator , , is contractive up to a non-negative perturbation which tends to 0. This implies that as , see, e.g., [3, Lemma 2.3]. Since satisfies by construction of Algorithm 1, Lemma 3.1 yields the equivalence of and . Hence, we conclude that as . ∎

Remark 3.4.

We emphasize that the perturbed contraction property (34) is satisfied, for instance, in the special case of the finite element method; see [10] for details.

Combining the above Proposition 3.3 and Lemma 3.1 leads to the following result.

Corollary 3.5.

Given the same assumptions as in Proposition 3.3 and, additionally, (A2), then for , where the sequence  is generated by the Algorithm 1, and is the unique solution of (1).


Let , , be the output of Algorithm 1 based on the Galerkin space . Then, by virtue of (30), and due to (A2) and (31), we have




Applying Proposition 3.3 completes the proof. ∎

3.4. Linear convergence

In this section we show the linear convergence of the output sequence generated by Algorithm 1. Our analysis follows closely the work [10, Theorem 5.3]. Again, we formulate and prove the result within a more general setting, and, for this purpose, under the additional assumption (34) as before.



with  from (F2), we introduce the quantity


where  and  are the (unique) solution of (1) and its Galerkin approximation from (13), respectively. By virtue of Lemma 2.3, provided that (F1)–(F3) hold, we observe that

for any .

Theorem 3.6.

Let satisfy (F1)–(F3), and assume (A1)–(A2). Furthermore, suppose that there are constants and such that (34) holds true. Then, upon setting


with  from (38), and with , the following contraction property holds: If the while loop of Algorithm 1 terminates after finitely many steps with , for all , then we have the (linear) contraction property


Moreover, there exists a constant such that


i.e. the error estimators decay at a linear rate.


Given any integers , and corresponding Galerkin subspaces . Then, using Lemma 2.3, with  being replaced by , we have that