Pseudo-time regularization methods

Pseudo-time regularization for PDE with solution-dependent diffusion

Sara Pollock Department of Mathematics and Statistics
Wright State University
Dayton, OH 45435
September 17, 2019

This work unifies pseudo-time and inexact regularization techniques for nonmonotone classes of partial differential equations into a regularized pseudo-time framework. Convergence of the residual at the predicted rate is investigated through the idea of controlling the linearization error; regularization parameters are defined following this analysis and then assembled into an adaptive algorithm. The main innovations of this paper include the introduction of a Picard-like regularization term, scaled by its cancellation effect on the linearization error, to stabilize the Newton-like iteration; an updated analysis of the regularization parameters in terms of minimizing an appropriate quantity; and strategies to accelerate the algorithm into the asymptotic regime. Numerical experiments demonstrate the method on an anisotropic diffusion problem where the Jacobian is not continuously differentiable, and on a model problem with steep gradients and a thin diffusion layer.

Key words and phrases:
Adaptive methods, nonlinear diffusion, quasilinear equations, pseudo-time, Newton-like methods, inexact methods, regularization.

1. Introduction

This paper is concerned with the finite element approximation to second order quasilinear elliptic equations in divergence form,


for a polygonal domain. Nonlinear diffusion problems are ubiquitous throughout science and engineering, appearing in applications such as heat conduction, groundwater flow, diffusion of contaminants, and flow in porous media [5, 8, 15, 19]. Here, the nonlinear diffusion coefficient may be thought of as a scalar quantity or, in the more general anisotropic case, as a matrix coefficient, where (1.1) has the expansion


with the ellipticity condition: For some


for any , and all .

As remarked in [19], while (1.1) for scalar-valued diffusion may be solved by the Kirchhoff transform (see, e.g., [8]), this technique does not carry over to the anisotropic case, or to lower order solution-dependent terms. The nonlinear diffusion problem (1.1) is generally in the class of nonmonotone problems; that is, the monotonicity inequality for the associated operator is not guaranteed to hold for each pair of functions in the solution space. While convergence and optimality of finite element methods for monotone classes of quasilinear problems have been recently investigated in [3, 17], and the references therein, nonmonotone classes of problems are less understood. In particular, convergence results rely on sufficiently fine global mesh conditions and sufficiently close initial guesses to ensure convergence of the Newton iterations used to solve the discrete nonlinear problem [4, 11, 18, 21]. Other recent work [14] includes this problem class in an adaptive framework of incomplete linear and nonlinear solves, but with the implicit assumption that the sequence of solution iterates converges to the solution. As described in previous work by the author [25, 26, 27], the Newton iterations cannot realistically be assumed to converge, and are often observed to diverge, especially for problems of the form (1.1), which may contain steep gradients and thin internal layers in the solution-dependent diffusion coefficient. The interest of the current investigation is to develop a regularized adaptive method and to understand the residual convergence of the discrete nonlinear problem without these assumptions. It is of particular interest to allow the solution process to start on a coarse mesh which is refined adaptively, to uncover an efficient and accurate discretization. Future work will directly address convergence of the discrete solution to the weak solution of (1.1).

The weak form of (1.1)-(1.2) is given by: Find such that


for solution space and test space , with


Based on the analysis of [12, 19, 29], the following set of conditions, in addition to the uniform ellipticity (1.4), is sufficient to assure existence and uniqueness of the weak solution of (1.5).

Assumption 1.1.

Assume the data satisfy the following boundedness and Lipschitz conditions.

  1. Boundedness of the diffusion coefficient

  2. Boundedness of the source: satisfies

  3. Lipschitz continuity of the diffusion coefficient


For a scalar-valued diffusion coefficient, the above conditions are interpreted with the matrix quantities replaced by their scalar counterparts.

A finite dimensional, or discrete problem corresponding to  (1.6), is given by: Find such that


where and are finite dimensional subspaces of the solution and test spaces. For the remainder of this paper, and are assumed subsets of . The trial and test spaces , now referred to as , are taken as the continuous finite element spaces of polynomials of degree over each mesh element, corresponding to nested mesh partitions, , which are conforming in the sense of  [6].

In the current discussion, a Newton-like method is applied to solve the discrete nonlinear equations induced by (1.10). The advantages of a Newton-like approach include fast convergence in the asymptotic regime, and the mesh-independence principle, as in, for instance, [10]. To understand the convergence of the sequence of linear equations used to approximate the solution of the discrete nonlinear problem, we require some control of the Jacobian. For this reason, the following conditions on the problem data are considered, in addition to those for well-posedness of the PDE, namely, the ellipticity condition (1.4) and Assumption 1.1.

Assumption 1.2 (Assumptions on the problem data).

The following assumptions are made on the diffusion coefficient, componentwise.

  1. is bounded. In particular

  2. satisfies the Lipschitz condition

Remark 1.3 (Standard problem classes).

The two existing convergence results for adaptive methods for nonmonotone problems of the form (1.1) are developed in [21] using continuous Galerkin methods, in particular linear elements, and in [4] using discontinuous Galerkin methods.

The results of [21] are adapted from the analysis of uniform methods in [5]. Their approach assumes , regularity of the solution , and bounded and for all . The results of  [4] are adapted from the analysis of uniform methods in [18], based on the results of [11]. For these results, it is assumed that the solution , and is twice continuously differentiable with bounded and for all .

Both frameworks explicitly assume a uniformly small initial meshsize and either implicitly or explicitly assume the convergence of the iterative method used to solve the discrete nonlinear system on each refinement.

The following notation is used throughout the paper. Where not otherwise specified, the norm denotes the norm. The integral is sometimes denoted ; and, denotes a Euclidean product between vectors and .

The remainder of the paper is structured as follows. Section 2 describes a framework for pseudo-time regularization, exploiting the quasi-linear structure of (1.5), and specifies the regularized system under inexact assembly. Section 3 derives the expansion of the latest residual in terms of the previous residual, exposing the regularization and linearization error terms. Section 4 suggests a set of regularization parameter updates based on the representation of Section 3, together with an adaptive algorithm. Finally, Section 5 demonstrates the ideas with some numerical experiments.

2. Regularized formulation

The regularization framework is next described. First, an error decomposition is discussed to separate out the contributions to the error in each iterate; that is, each solution to a linearized problem used to approximate the solution to (1.5). The contributions to the error include those induced from added regularization, linearization, early termination of the iterations, inexact assembly, and discretization.

Then, in Section 2.2, a pseudo-time regularized iteration is described and fit into this framework. First, the pseudo-time regularization is introduced in a general sense; then the regularized iteration is derived from a linearization of the abstract formulation of the discrete problem (1.10). Section 2.3 then introduces notation for the inexact assembly of the discrete problem, leading to the regularized iteration in matrix form. The steps are separated out in this presentation to emphasize the relation between the practical, inexactly assembled equation that is actually solved computationally, and the abstract formulation of the problem that is thought of on the PDE level.

2.1. Error decomposition

The goal of the numerical method, finite element or otherwise, is to approximate the PDE solution; in this case, the solution to (1.5). The error may be understood by breaking it into components describing the discretization error, the quadrature error, and the so-called linearization or nonlinear iteration error. In the following notation, represents the (or a) solution to (1.5); and denotes the solution to the discrete nonlinear problem on the refinement of the initial mesh partition, under exact assembly. Neither is generally a computable quantity. The iterate is the solution, up to some set tolerance, of the discrete problem using a numerical and potentially inexact assembly procedure; it may or may not be computable. In terms of the inexactly assembled problem, is the terminal iterate, potentially before convergence to tolerance, and is the iterate on the refinement.


The frameworks developed in [4, 21] show convergence to zero of the first term of (2.1), for a restricted class of problems of the type (1.1) (see Remark 1.3), under the assumption of a sufficiently small meshsize. These asymptotic results motivate the current work, by demonstrating the approximation properties of a finite element solution to the PDE solution; however, the goal now is to develop a computational framework for which convergence still holds, and by which the discrete solution can be approximated by a computable sequence.
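With illustrative notation standing in for the symbols of (2.1) (here $u$ is the PDE solution, $u_k$ the exact-assembly discrete solution on refinement $k$, $\bar{u}_k$ the inexactly assembled solution to tolerance, and $\bar{u}_k^{\,n}$ the terminal iterate), the decomposition just described can be sketched as

```latex
u - \bar{u}_k^{\,n}
  \;=\; \underbrace{(u - u_k)}_{\text{discretization}}
  \;+\; \underbrace{(u_k - \bar{u}_k)}_{\text{inexact assembly}}
  \;+\; \underbrace{(\bar{u}_k - \bar{u}_k^{\,n})}_{\text{termination / linearization}} ,
```

a triangle-inequality splitting whose terms correspond, in order, to the error sources named above.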

To that end, a regularized iteration is introduced starting on the initial, presumably coarse, mesh. The expansion of the error incorporating the introduced regularization breaks the last term of (2.1) into two new terms. Let be the terminal iterate of the regularized problem on refinement . For ease of notation, the indices on are suppressed; however, it should be understood that is subject to regularization ; and likewise, on iteration of refinement , the iterate is subject to regularization .


The regularization addressed in this paper can be broken into two parts: a Jacobian regularization and a residual regularization term. Then the regularized Newton-like iteration described in the following sections can be written in terms of a standard Newton iteration for this problem with Jacobian and residual , , by


Eliminating the regularization from the iteration can then be described as sending the regularization terms to zero.
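A minimal computational sketch of this structure, assuming a generic finite-dimensional nonlinear system rather than the paper's finite element discretization (all names, the model problem, and the geometric decay of the regularization weight are hypothetical):

```python
import numpy as np

def regularized_newton(F, J, u0, R, alpha=1.0, tol=1e-10, max_iter=50):
    """Newton-like iteration with a Tikhonov-like Jacobian regularization:
    solve (J(u) + alpha*R) du = -F(u), then update u.

    alpha is decreased geometrically toward zero, so the iteration
    reverts to a standard Newton method, restoring consistency."""
    u = u0.copy()
    for _ in range(max_iter):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        # Regularized Jacobian system, cf. the structure (2.3)
        du = np.linalg.solve(J(u) + alpha * R, -r)
        u = u + du
        alpha *= 0.5   # send the regularization to zero
    return u

# Hypothetical model problem: F(u) = A u + u^3 - b, with a 1D Laplacian stencil
n = 5
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
F = lambda u: A @ u + u**3 - b
J = lambda u: A + np.diag(3*u**2)
u = regularized_newton(F, J, np.zeros(n), R=np.eye(n), alpha=1.0)
print(np.linalg.norm(F(u)))  # small terminal residual
```

The decay schedule for the regularization weight is one of many choices; the adaptive parameter updates of Section 4 replace this fixed geometric decay.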

Control of the last term of (2.2) requires assumptions on the PDE, for instance Assumption 1.2, as well as assumptions on the regularization; essentially, the sequence of regularized linear problems must be sufficiently stable.

Due to the regularization structure (2.3), control of the fourth term is established by eliminating the regularization for all iterations beyond some index. If the problem (1.5) is sufficiently well-posed, then the regularized iteration should limit to, and indeed revert to, a standard Newton iteration on each refinement after some level. This type of result is detailed for a similar regularized Newton-like method in [27]. It is noted, however, that sending the regularization to zero restores consistency of the iteration. If a problem features a potentially indefinite or badly conditioned Jacobian in the vicinity of a solution, it may be beneficial not to send the Jacobian regularization to zero, but rather to keep some background level of regularization that does not interfere with the convergence of the iteration.

The third term of (2.2) is the difference between the terminal iterate and one converged to tolerance. This term vanishes once the iteration proceeds to convergence, which may occur at or near the terminal iteration, under a suitable residual-reduction condition for terminating the nonlinear iterations on each refinement level. In the early stages of the solution process, however, the nonlinear iteration may be stopped far from convergence, and the difference may be non-negligible.

The second term of (2.2) describes the error induced by inexact integration. With the potential of a highly oscillatory diffusion coefficient in (1.1), this error term is not automatically assumed to be small, as it is in [4, 21] and the references therein, where the assumption of a sufficiently small global meshsize can control the level of accuracy. Generally, information can be lost both by the averaging of integration against basis functions, and by the secondary averaging of approximate integration by quadrature. Finally, the first term constitutes the discretization error: the difference between a solution to (1.5) and the solution of the exactly assembled discrete nonlinear problem on the given refinement. The analysis of these first two terms, which determines convergence of the error, is beyond the scope of this paper, which analyzes efficient convergence of the residual. The conditions under which the computed sequence converges to the exact discrete solution will be discussed elsewhere.

Remark 2.1 (Inexact linear solves).

Solving the linear equations by an iterative method yields yet another term in the expansion. For linear iteration , the error between the PDE solution and iteration , of regularized linear solve , on mesh refinement , may be decomposed into


The last error term is not addressed in the current paper, and the linear systems are assumed to be solved exactly. It is noted, however, that incomplete linear solves can be exploited for both their regularization properties and their efficiency, and this topic merits investigation.

2.2. Regularized abstract formulation

The regularized linear equations compatible with the error decomposition (2.2) are now derived through a pseudo-time discretization framework with respect to the abstract discrete problem (1.10). The pseudo-time framework developed in, for instance, [2, 7, 22], the book [9], and the references therein, suggests stabilizing the solution of the elliptic equation by introducing a pseudo-time dependence and solving the resulting evolution problem to steady state. To allow a preconditioned or more general regularized framework, the discrete nonlinear pseudo-time regularized problem is: Find such that


where . Here, it is assumed that the bilinear form is continuous with respect to the native norm , in this case taken to be the norm.


The continuity assures that the pseudo-time term decays to zero as the iteration approaches a steady state solution. Rather than coercive, the regularization form is assumed semi-definite


This allows for a more general regularization, as used in [25, 26], in which a cutoff function is used to allow the regularization to act only on select degrees of freedom. Upon matrix assembly, the role of the regularization can be viewed as improving the condition of the approximate Jacobian, and it should be chosen with this in mind. In the numerical experiments of Section 5, two different choices are illustrated. The first is weighted by the initial solution iterate on each refinement, thus adding more regularization locally to control steeper gradients in the diffusion. The second uses the standard Laplacian preconditioner, adding a uniform level of diffusion to stabilize the Jacobian. The regularization functional is left in general form for the remainder of the analysis to emphasize that these are two of many choices.
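A minimal sketch of these two options, assuming a simple 1D finite-difference setting rather than the paper's finite element assembly (all names and the cutoff threshold are hypothetical): the regularization matrix may be built either as a semi-definite cutoff operator acting only on flagged degrees of freedom, or as a uniform Laplacian.

```python
import numpy as np

def laplacian_1d(n, h=1.0):
    """Standard 1D Laplacian stiffness stencil (Dirichlet): uniform diffusion."""
    return (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def cutoff_regularizer(grad_u0, threshold):
    """Semi-definite diagonal regularizer: acts only on degrees of freedom
    where the initial iterate shows steep gradients, leaving the rest alone."""
    return np.diag(np.where(np.abs(grad_u0) > threshold, np.abs(grad_u0), 0.0))

# Hypothetical gradient magnitudes of the initial iterate on 5 DOFs
grad_u0 = np.array([0.1, 5.0, 0.2, 7.5, 0.05])
R_local = cutoff_regularizer(grad_u0, threshold=1.0)  # local, semi-definite
R_unif  = laplacian_1d(5)                             # uniform Laplacian
print(np.diag(R_local))  # nonzero only at the steep-gradient DOFs
```

Both choices are positive semi-definite, consistent with the assumption on the regularization form above.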

A generalization of the Newmark time integration strategy [24], exploiting the structure of the quasilinear equation (1.1), is now introduced to discretize (2.5) in pseudo-time


where and . Linearizing the second term on the right, and rewriting the third to isolate the dependence on yields


where the derivative denotes the Gateaux derivative in the first argument of the form, in the given direction. Rearranging so that all terms involving the update step appear on the left, and rescaling, yields


with the four regularization coefficients given by


The first coefficient is the rescaled reciprocal of the pseudo-time step, and sending it to zero corresponds to removing the pseudo-time regularization. It is remarked that the dissipation parameter corresponds to the method discussed in [26], where this parameter is taken greater than one to introduce an increase in numerical dissipation, or controlled damping, into the iteration. Further, one limiting choice of the parameters corresponds to an implicit, or backward Euler, discretization of (2.5); another corresponds to an explicit, or forward Euler, discretization of (2.5); and a third combination leads to a Picard iteration. It is also recognized that each of these choices yields a consistent pseudo-time discretization.
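With illustrative symbols standing in for the coefficients of the rescaled step (here $\lambda$ denotes the rescaled reciprocal pseudo-time step, $\gamma$ the Newmark-type parameter, $R$ the regularization form, and $B$ the quasilinear form; these are assumptions, not the paper's exact coefficients), the structure can be sketched as

```latex
\lambda\, R(\delta u, v) \;+\; \gamma\, B'[u^n](\delta u, v)
  \;=\; f(v) \;-\; B(u^n, v),
\qquad u^{n+1} \;=\; u^n + \delta u ,
```

so that, in this illustrative notation, $\gamma = 1$ corresponds to a backward Euler step, $\gamma = 0$ to a forward Euler step, replacing the derivative term by a frozen-coefficient form yields a Picard iteration, and sending $\lambda \to 0$ removes the pseudo-time regularization.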

2.3. Inexact assembly

Both the finite dimensional equation (1.10) and the pseudo-time regularized equation (2.5) are abstract equations rather than computable systems. To clarify which quantities are assumed computationally available, the following notation is introduced to describe the discrete system induced by inexact assembly, e.g., by quadrature, which may be assumed inexact for nonpolynomial integrands. Let be a discrete space, here a finite element space, with degrees of freedom, spanned by the basis functions . Supposing , each function has an exact expansion as a linear combination of basis functions; in particular with the vector of coefficients . Let be an inexact assembly of , with an error introduced by inexact integration, e.g., quadrature error. The source vector is formed by the inexact integral of the source function against each basis function . The matrix assembly of the regularization is denoted . Let represent the inexact assembly operator. The assembled systems under are denoted as follows.


The following commuting diagram holds for the discrete assembly procedure given by (2.12)-(2.13), with . That is, the inexact assembly operator commutes with the Gateaux derivative of the first argument of .


This justifies the use of Taylor's theorem in the error representation of the residual in Section 3. In general, the Gateaux derivative commutes with projection-type discretizations; see for example [20]. This includes assembly under inexact integration, assuming the integral approximation over each element falls in the general form of a weighted sum of point evaluations, for quadrature points in the interior of the element and corresponding weights.
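The commuting property can be checked numerically in a toy setting: assembly by quadrature is a weighted sum of point evaluations, so the Gateaux derivative of the assembled quantity equals the assembly of the pointwise derivative. The sketch below, with a hypothetical integrand and standard 3-point Gauss data on the reference interval, compares the assembled derivative to a finite-difference approximation.

```python
import numpy as np

# 3-point Gauss-Legendre weights and points on [-1, 1]
w = np.array([5/9, 8/9, 5/9])
x = np.array([-np.sqrt(3/5), 0.0, np.sqrt(3/5)])

def assemble(u_at_q):
    """Inexact integral of u^2 over the reference element by quadrature."""
    return np.sum(w * u_at_q**2)

def d_assemble(u_at_q, du_at_q):
    """Gateaux derivative of the assembled quantity in direction du:
    differentiation applied under the (linear) quadrature sum."""
    return np.sum(w * 2*u_at_q * du_at_q)

u, du = x + 1.0, np.ones(3)
eps = 1e-7
fd = (assemble(u + eps*du) - assemble(u)) / eps
print(abs(fd - d_assemble(u, du)) < 1e-5)  # derivative commutes with assembly
```

Because the quadrature sum is linear in the integrand values, the same argument applies to any assembly of this pointwise form.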

2.4. Regularized matrix equations

Processing the linear pseudo-time regularized equation (2.10) with the inexact assembly given by (2.12)-(2.16) yields the coefficients of the update step as the solution to a linear system of equations,


which may be written as


with the update .

With respect to (2.3), the formal representation of the regularization structure, the Jacobian part , and the residual part , of the regularization are given by


Consistency is restored by sending the residual regularization parameter to zero. Asymptotic efficiency is restored by sending the remaining regularization parameters to their limiting values, although this asymptotic efficiency may be at least partially sacrificed for stability, even into the asymptotic regime where the iterations converge to tolerance. This balance is understood in the next section, where the residual representation exposes the error contributions from regularization and linearization. So long as the regularization effectively controls the linearization error without increasing the norm of the residual, it is viewed as beneficial.

3. Residual representation of the matrix equation

The residual representation follows the standard method of applying Taylor's theorem to expand the latest residual about the previous one, justified by the commuting diagram of Section 2.3. This exposes the separate terms from the introduced regularization error and the intrinsic linearization error. The linearization error is bounded by a Lipschitz assumption on the problem data (1.12), although approaches with more general assumptions, such as a majorant condition, have also been developed for Newton iterations [16] and would be interesting to investigate in the present context. Unlike previous presentations by the author [25, 26, 27], here the structure of the quasilinear problem is exploited to separate the linear and nonlinear dependencies on the latest iterate . A choice of regularization parameters is then introduced in the context of minimizing an appropriate quantity to control the linearization error.

3.1. Residual representation under inexact integration

Expanding the residual about yields




Solving (2.19) for yields


where the floating-point arithmetic error is introduced from the solution of the linear system for coefficients . Applying (3.4) to (3.1) yields


Here describes the dominant term in the linearization error, and the secondary term, whose linear component is . The total linearization error is defined as


which agrees with the definition of the one-step linearization error given in [27]. The convergence of the residual then follows from control over the linearization error, assuming the floating point error is sufficiently negligible. A remark about this term follows.

Remark 3.1 (Floating-point error).

The last term in (3.5) denotes the floating point error, which cannot be controlled by the linearization, but neither can it be entirely ignored. It can be estimated, for instance, by the difference between two evaluations of the linearization error: one by (3.6), and the other by isolating it in (3.5). In the preasymptotic and coarse mesh regimes, where the iterate is sufficiently far from the solution, the floating point error observed in the numerical experiments in Section 5 remains small relative to the residual. However, approaching the asymptotic regime, the linearization error is no longer observed to be second order, even where analytically it should be; in this regime the computed linearization error is dominated by pollution from the floating-point error. In terms of practical impact on a computational method such as the one described here, the floating-point error limits the regime where the convergence rate can be accurately detected. This is immaterial so long as detecting that convergence rate is no longer necessary once, for instance, the residual falls below a set tolerance.
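The estimation strategy in this remark can be sketched for a generic, unregularized Newton step, where in exact arithmetic the residual after the step equals the Taylor remainder; the discrepancy between the two evaluations then estimates the floating-point term. All problem data below are hypothetical.

```python
import numpy as np

# For a Newton step J(u) du = -F(u), exact arithmetic gives
#   F(u + du) = E_L,  with  E_L := F(u + du) - F(u) - J(u) du,
# so the difference between the two evaluations of E_L estimates
# the floating-point error polluting the computed residual.

n = 4
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
F = lambda u: A @ u + np.sin(u) - b
J = lambda u: A + np.diag(np.cos(u))

u = 0.1 * np.ones(n)
du = np.linalg.solve(J(u), -F(u))
E_direct = F(u + du)                       # residual after the step
E_taylor = F(u + du) - F(u) - J(u) @ du    # Taylor-remainder evaluation
print(np.linalg.norm(E_direct - E_taylor)) # estimate of floating-point error
```

The printed quantity is on the order of machine precision here; in a large ill-conditioned system it can be substantially larger, which is the pollution effect described above.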

The control of the right-hand side linearization error is left to the choice of regularization terms. It is remarked that the linearization error does not necessarily need to be second order in the previous residual for convergence of the method: it only needs to be small enough not to interfere with the convergence rate.

Local convergence theory for Newton-like methods describes the convergence of the iterates in a neighborhood of the solution, and is addressed for regularized pseudo-time algorithms by the author in previous work [25, 26, 27], based in part on the analysis of [2, 7, 22] and [9]. In practice, however, the predicted convergence rate of iteration (2.19)-(2.21) is often observed from the first few iterations without a particularly good initial guess. Here the goal is to characterize the convergence rate when the iterate is not sufficiently close to the solution.

Lemma 3.2 (Convergence rate far from the solution).

Consider iteration (2.18) applied to the problem. Assume the regularization parameters satisfy the following properties.




Moreover, if it holds that


then the iteration (2.18) converges within tolerance of the predicted rate .


The bound (3.9) follows directly from applying hypotheses (3.7) and (3.8) to the residual representation (3.5). ∎

Ultimately, residual convergence at the predicted rate comes down to whether the linearization error can be controlled. The following discussion investigates when this is computationally reasonable. Based on Assumption 1.2 on the problem data , there exist positive constants and with


Applying (3.11)-(3.12) to the linearization error given by (3.6), one obtains


Then from iteration (2.19)


While estimate (3.14) is true and illustrates the role of the damping parameter, it may greatly overestimate the linearization error and is not useful as a predictor of when (3.10) will hold.

The source of the overestimate in this context is allowing the maximum Lipschitz constant and the maximum derivative bound to be achieved uniformly over the domain. Standard adaptive finite element methods are known to perform well with relatively few local high-contrast heterogeneities or singularities, but are not necessarily appropriate for globally high-contrast domains or coefficients, so it makes sense to understand how local high contrast can affect the convergence. Writing the linearization error in terms of the inexact assembly operator,


Freezing the analysis about the current iterate, expression (3.1) suggests partitioning the domain into one subdomain, where milder bounds are realized, and a second, where the extreme bounds are locally attained. Rewriting (3.1) in terms of such a partition,




From Assumption 1.2 on the data and the decomposition above, for each partition of the domain into the two subdomains there is a smallest constant with


where . With this structure in place, it follows that the condition (3.10) holds if there is a partition for which


for a given .

The ratio is related to the conditioning of the approximate Jacobian given by (2.20), and is explicitly dependent on the regularization parameters, as well as implicitly dependent upon the iterate. A large dissipation parameter can clearly control the scale of the linearization error at the start of the adaptive algorithm, and if the steep gradients are bounded away from zero, a small scaling parameter can control the remaining contribution. However, to attain convergence of the residual in some computationally available neighborhood of the solution, the measure of the set on which a large Lipschitz constant and a large bound on the first derivative of the diffusion are realized must be relatively small. Otherwise, the pseudo-time step may need to be small enough that it is computationally infeasible to find the basin of attraction.

A choice of regularization parameters is next described with respect to the numerically assembled iteration (2.19). In particular, the parameter guiding the Picard-like regularization is based on condition (3.8), and the Tikhonov-like regularization is based on condition (3.7), from Lemma 3.2.

4. Regularization parameter updates

A set of regularization parameters is now presented, along with a discussion of their properties. Two of the parameter definitions are consistent with those given in [27]. A different definition of a third parameter is given here than in [25, 26, 27], and the final parameter has not been previously introduced. It is noted, however, that this new parameter effectively adds diffusion to the linearized system by adding a Picard-like term to the Newton-like iteration. In [26], a corresponding parameter adds a frozen Newton-like iteration to stabilize the approximate Jacobian, essentially preventing small eigenvalues from changing sign at each step. The new parameter performs a similar role, but is more amenable to analysis, and appears to perform better in numerical experiments.

4.1. Update of numerical dissipation,

The first order regularization error is controlled by the numerical dissipation parameter, i.e., the Newmark parameter. The definition used here is recalled from [27], Definition 4.3, and framed in the context of a norm minimization as follows. Rewriting (3.5) by moving the residual terms involving the dissipation parameter to the left-hand side and taking the norm of both sides of the resulting equation,

An updated value of is chosen to minimize the norm on the left, namely

The update of is then defined by


for a user-set parameter , with .

The purpose of introducing this parameter is to enforce monotonicity of the sequence of dissipation parameters toward one at a given rate. As shown in [27], if the parameter is updated when the residual reduction satisfies the following condition, then there is a critical value after which it is assured to reduce at a linear rate. Those results are included in the following more general lemma, which features a condition on the direction cosine of consecutive residuals to determine predictable reduction of the parameter.
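A hedged sketch of such an update, assuming, hypothetically, that the residual terms involving the parameter enter linearly, so that the minimization reduces to a one-dimensional least-squares problem; the exact formula (4.1) depends on quantities not reproduced here, and the safeguard rate constant below is illustrative.

```python
import numpy as np

def gamma_ls(r_new, r_old):
    """Hypothetical 1D least-squares minimizer of ||r_new - (1 - g) r_old||,
    standing in for the norm minimization described in the text."""
    return 1.0 - np.dot(r_new, r_old) / np.dot(r_old, r_old)

def gamma_update(g_ls, g_prev, c=0.9):
    """Safeguarded update: cap the least-squares value so the sequence
    decreases monotonically toward one at a rate controlled by c < 1."""
    return max(1.0, min(g_ls, 1.0 + c * (g_prev - 1.0)))
```

For example, with a previous value of 2.0 and an uncapped least-squares value of 5.0, the safeguard returns 1.9, enforcing the geometric approach to one.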

Let the value of the parameter on a given iteration of a given refinement be denoted, for simplicity of notation, without its indices. For the update (4.1) to remain bounded under the conditions that follow, two user-set parameters, the maximum allowed value of the dissipation parameter and the rate tolerance used to determine whether it should be updated, are now introduced to satisfy the following condition.

Condition 4.1.

The adaptively-set regularization parameter , and the user-set parameters and must satisfy the relation


The next condition, which is the same as Condition (2) of Criteria (4) in [27], gives a necessary criterion for the update of the parameter, in order to establish the monotonicity result below.

Condition 4.2 (Condition for the update of ).

Given a rate tolerance satisfying Condition 4.1, the ratio of consecutive residual norms must satisfy


Then, the following result on the monotonicity of the update holds. This next lemma generalizes Corollary 4.7 of [27], which establishes the decrease in the parameter, as updated by (4.1), for sufficiently small starting values with respect to the user-set parameters. For practical purposes, however, one may want to start the computation with a larger value. The following result characterizes the decrease based on the direction cosine of consecutive residuals, where the direction cosine is given by

Lemma 4.3 (Preasymptotic decrease of ).

Given a fixed parameter , a number , and a rate tolerance satisfying Condition 4.1, if is computed by (4.1), specifically


upon satisfaction of Condition