A higher-order gradient flow scheme

A higher-order gradient flow scheme for a singular one-dimensional diffusion equation

Bertram Düring Department of Mathematics, University of Sussex, Pevensey II, Brighton BN1 9QH, United Kingdom b.during@sussex.ac.uk Philipp Fuchs Institute for Analysis and Scientific Computing, Vienna University of Technology, Wiedner Hauptstr. 8-10, 1040 Wien, Austria philipp.fuchs@tuwien.ac.at  and  Ansgar Jüngel Institute for Analysis and Scientific Computing, Vienna University of Technology, Wiedner Hauptstr. 8-10, 1040 Wien, Austria juengel@tuwien.ac.at

A nonlinear diffusion equation, interpreted as a Wasserstein gradient flow, is numerically solved in one space dimension using a higher-order minimizing movement scheme based on the BDF (backward differentiation formula) discretization. In each time step, the approximation is obtained as the solution of a constrained quadratic minimization problem on a finite-dimensional space consisting of piecewise quadratic basis functions. The numerical scheme conserves the mass and dissipates the -norm of the two-step BDF time approximation. Numerically, also the discrete entropy and variance are decaying. The decay turns out to be exponential in all cases. The corresponding decay rates are computed numerically for various grid numbers.

Key words and phrases:
Super-fast diffusion equation, Wasserstein gradient flow, minimizing movements, higher-order scheme, G-norm.
2000 Mathematics Subject Classification:
65M99, 35K20.
The last two authors acknowledge partial support from the Austrian Science Fund (FWF), grants P24304, P27352, and W1245

1. Introduction

The aim of this paper is to propose and study a fully discrete higher-order variant of the minimizing movement scheme in one space dimension for the nonlinear diffusion equation


with negative exponent , where is the -dimensional torus. Equation (1) can be written as the gradient flow of the entropy with respect to the Wasserstein distance. The gradient flow formulation gives rise to a natural semi-discretization in time of the evolution by means of the minimizing movement scheme, leading to a minimization problem for the sum of kinetic and potential energies.

Minimizing movement schemes for evolution equations with an underlying gradient flow structure were first suggested by De Giorgi [12] in an abstract framework. Jordan, Kinderlehrer, and Otto [17] have shown that the solution to the linear Fokker-Planck equation can be obtained by minimizing the logarithmic entropy in the Wasserstein space. Since then, many nonlinear evolution equations have been shown to constitute Wasserstein gradient flows, for instance the porous-medium equation [23], the Keller-Segel model [4], equations for interacting gases [10], and a nonlinear fourth-order equation for quantum fluids [14].

The minimizing movement scheme can be interpreted as an implicit Euler semi-discretization with respect to the Wasserstein gradient flow structure. While for some time this has been mainly used as an analytical tool, more recently numerical approximations of evolution equations using this scheme have been proposed. In one space dimension, the optimal transport metric becomes flat when re-parametrized by means of inverse cumulative functions, which simplifies the numerical solution; see e.g. [1, 4, 20, 22]. For multi-dimensional situations, one approach is based on the Eulerian representation of the discrete solution on a fixed grid. The resulting problem can be solved by using interior point methods [8], finite elements [7], or finite volumes [9]. Another approach employs the Lagrangian representation, which is well adapted to optimal transport. Examples are explicit marching schemes [15], moving meshes [6], linear finite elements for a fourth-order equation [13], reformulations in terms of evolving diffeomorphisms [11], and entropic smoothing using the Kullback-Leibler divergence [24]. The connection between Lagrangian schemes and the gradient flow structure was investigated in [19]. In this paper, we will use the Lagrangian viewpoint.

The minimizing movement scheme of De Giorgi is of first order in time only since it is based on the implicit Euler method. Concerning higher-order schemes, we are only aware of the paper [31]. There, second-order gradient flow schemes were suggested for the Euler equations, with finite differences in space and the two-step BDF (Backward Differentiation Formula) method or diagonally implicit Runge-Kutta (DIRK) schemes in time.

In this paper, we propose a fully discrete second-order minimizing movement scheme using quadratic finite elements in space and the two-step BDF method in time. We consider periodic point-symmetric solutions. The finite-dimensional minimization problem, constrained by the mass conservation, is solved by the method of Lagrange multipliers which leads to a sequential quadratic programming problem.

By construction, our numerical scheme is of second order both in time and space, it conserves the mass and dissipates the -norm of the approximation of the quadratic ansatz functions at time step ,


where the matrix is defined in the approximation of the Wasserstein metric on the space of the quadratic ansatz functions. We refer to Section 3 for details. It turns out that numerically, the relative -norm decays exponentially fast to zero. Although we cannot expect for the multistep scheme that the discrete entropy decays exponentially fast, this holds true for the numerical experiments performed in this paper. Furthermore, also the discrete variance of the original variable and the Lagrangian variable decay exponentially fast.

The numerical tests also indicate that the decay rate of the entropy is increasing in the number of grid points, i.e., the discrete decay rates are smaller than the (expected) value for the continuous equation. This result is in accordance with the findings of [21] for a finite-volume approximation of a one-dimensional linear Fokker-Planck equation.

Let us briefly review the literature for the diffusion equation (1) with . Equation (1) with appears in the modeling of heat conduction in solid Helium, where the solution corresponds to the inverse temperature [26]. When this equation is considered in the whole space or in a bounded domain with homogeneous Dirichlet boundary conditions, it is sometimes called the super-fast diffusion equation [28, Chap. 9]. The critical exponent of this equation in one space dimension is . For and , we have a smoothing property, namely for any [28, Section 9.1]. For , no solutions exist with data in . The non-existence range in dimensions contains even all negative exponents, [27]. However, if and , there is a weak smoothing effect. Indeed, given for , the solution exists and is locally bounded in . Furthermore, if then there is instantaneous extinction, i.e.  in for all [28, Theorem 9.3].

Clearly, such results cannot be expected when the super-fast diffusion equation is considered on the torus. We expect that global-in-time weak solutions exist, which converge to the constant steady state as . Since we could not find any results on the existence and large-time asymptotics in the literature in that situation and since the use of negative exponents is less standard, we provide a (short) proof for completeness in the appendix; see Section 2.1 for details.

The paper is organized as follows. The global existence result and the exponential decay of the solutions to the constant steady state as well as some basic properties of the Wasserstein distance for periodic functions are stated in Section 2. Section 3 is devoted to the description of the numerical scheme, and some numerical experiments are presented in Section 4. We conclude in Section 5. The appendix contains the proofs of the existence and large-time asymptotics theorems and the calculations of the coefficients of the matrix and the Hessian of the discrete entropy.

2. Prerequisites

2.1. Existence of solutions and large-time asymptotics

Equation (1) on the torus does not possess the non-existence or instantaneous extinction properties of the super-fast diffusion equation in the whole space since mass cannot get lost. In fact, we expect that for any , there exists a global weak solution. If the initial datum is nonnegative only, (1) is still a singular diffusion equation. However, because of the fast diffusion, the solution becomes positive for all positive times and, by parabolic regularity theory, also smooth.

Theorem 1 (Existence of weak solutions).

Let and let satisfy . If , we assume additionally that . Then there exists a unique weak solution to (1) satisfying , for all , and for , .

The proof of this theorem is based on a standard regularization procedure but we need to distinguish carefully the cases , , and . We present the (short) proof for completeness in Appendix A. For , the (smooth) solution converges to the constant steady state. Since this constant is positive, diffusion slows down when time increases. Therefore, we cannot expect instantaneous extinction phenomena. Still, we are able to prove that the convergence is exponentially fast with respect to the -norm. We introduce the entropy

The steady state of (1) is given by if .

Theorem 2 (Exponential decay).

Let be a smooth positive solution to (1) and let . Then

where and for , ,

and is the constant in the Beckner inequality (19); for , ,

In the first result, the decay rate depends on , which seems to be not optimal. The decay rate in the second result depends on the -norm of only but we need a particular value of . The proof is based on the entropy method; see Appendix A. Stronger decay results have been derived for the fast-diffusion equation in the whole space or in bounded domains; see, e.g., [5, 28]. However, our proof is very elementary and just an illustration for the qualitative behavior of the solutions to (1).

2.2. The Wasserstein distance for periodic functions

We recall some basic facts about mass transportation and Wasserstein distances following [13]. For more details, we refer to [2, 30]. Let be a Riemannian manifold with distance and let be the convex set of measures with fixed mass on . The -Wasserstein distance of two measures , is defined by


where is the set of transport plans connecting with , i.e. the set of all measures on with respective marginals and ,

for all measurable sets , . If the measures and possess densities and , respectively, with respect to a fixed measure on , then we write, slightly abusing the notation, instead of .

In one space dimension, there exists an explicit formula to compute . Let be a (possibly infinite) interval, and let , be two measures. We define their distribution functions

As these functions are right-continuous and monotonically increasing, they possess right-continuous increasing pseudo-inverse functions , given by

Then [29],


This formula does not extend to because of the topology induced by the periodic boundary conditions, . The reason is that mass can be transported either clock- or counter-clockwise (see [13] for details). However, if the densities and are point-symmetric, (4) still holds. More precisely, let for and . Then (4) holds, where is the inverse function of [13, Lemma 2.2].

3. Time discretization and Lagrangian coordinates

3.1. The semi-discrete BDF scheme

We introduce the second-order minimizing movement scheme. First, we explain the underlying idea for the finite-dimensional gradient flow


where is a smooth potential. This equation can be approximated by the following minimization problem:

where is the time step size and is an approximation of . The minimizer is a critical point and thus,

which corresponds to the implicit Euler scheme.

Instead of the Euler scheme, we wish to discretize (1) by a multistep method. As an example, consider the two-step BDF (or BDF-2) method,

where and are given. Writing this scheme as

we see that is a critical point of the functional

More generally, the BDF- approximation of (5),

for given can be formulated as

since , or as the minimization problem


In a similar way, we may formulate general multistep methods. We recall that BDF- schemes are consistent if , and zero-stable if and only if [25, Section 11.5].

The same idea as above is applicable to gradient flows in the -Wasserstein distance. For this, we replace the norm by the Wasserstein distance. For equation (1), scheme (6) turns into


where . Scheme (7) can be interpreted as a BDF- minimizing movement scheme.

3.2. Lagrangian coordinates

Before introducing the spatial discretization, we rewrite the scheme (7) in a more explicit manner, using the inverse distribution functions and of and , respectively, which were introduced in Section 2.2. The numerical procedure is similar to that in [13] but our higher-order scheme introduces some changes. We call the Lagrangian coordinate, which is conjugate to the Eulerian coordinate , and we refer to the inverse distribution function as the associated Lagrangian map. For a consistent change of variables, we need to express the entropy in terms of the Lagrangian coordinates. With the formula for the differential of an inverse function,

and the change of unknowns under the integral in , we obtain

where . Note that the exponent in the integrand is positive since . In terms of , the expression for the Wasserstein distance in (4) becomes (see [13, Section 2.3])

This expression is simply a quadratic form in and thus easy to implement in the numerical scheme. We summarize our results which slightly generalize Lemma 2.3 in [13].

Lemma 3.

Let the initial datum be point-symmetric. Then the solution to the BDF- scheme (7) is in one-to-one correspondence to the solution obtained from the inductive scheme


with initial condition and given obtained from a lower-order scheme. The argmin has to be taken over all measurable functions satisfying the mass constraint , and the function is given by


Moreover, and the functions and are related by


In Lagrangian coordinates, the problem has become a minimization problem in which is the sum of a quadratic form and a convex functional, hence it is convex. In the special case , the second integral in (9) is quadratic too which simplifies the numerical discretization. Therefore, we will consider mainly numerical examples with in Section 4.

3.3. Spatial discretization

We approximate the infinite-dimensional variational problem (8) by a finite-dimensional one. Minimization in (8) is performed over the finite-dimensional space of quadratic ansatz functions. This generalizes the approach in [13], where only linear ansatz functions were used. We define the ansatz space as follows.

Let and a mesh on be given with and . Using (10), we construct the mesh of . Then , , , and (point-symmmetry), where . Since we wish to introduce quadratic ansatz functions, we add the grid points for .

The basis functions are defined by

This set of piecewise linear functions is supplemented by the following piecewise quadratic basis functions:

The ansatz space is the set of all positive, piecewise quadratic functions of the form


We call the associated weight vector. By definition of , we have for . Moreover, as is point-symmetric, .

Now, for given mass and grid , we define the set as the set of weight vectors for which the associated interpolation from (11) satisfies the mass constraint,


The Wasserstein metric for functions approximated in this way becomes



The coefficients , , and can be computed explicitly. The explicit expressions are given in Appendix B.1. Setting , , , and defining the matrix


we can formulate the above sum as

As the matrices and are symmetric, is symmetric, too. The matrix corresponds to the linear approximation considered in [13].

3.4. Minimization

The numerical scheme consists of the following finite-dimensional variational problem:


and where ,


The functions and from (9) are related by with a piecewise quadratic function defined from by (11). Since is convex for and the set is convex, there exists a unique minimizer of (15).

3.5. Fully discrete Euler-Lagrange equations

The minimizer in (15) is subject to the mass constraint (12), by definition of the set . Therefore, instead of working on the set , it is more convenient to consider (15) as a constrained minimization problem for on the larger set , which is solved by the method of Lagrange multipliers using the Lagrange functional

A critical point of satisfies the conditions

The precise values for are given in Appendix B.2 for . The condition for the constraint is recovered from

The vector is the gradient of with respect to . We approximate a critical point numerically by applying the Newton method to the first-order optimality condition . This leads to a sequential quadratic programming method, since at every Newton iteration step a quadratic subproblem has to be solved.

3.6. Implementation

Let the solution at the th time step be given and let , . The iteration is as follows:

where is the solution to the linear system

where denotes the Hessian of , whose entries are given in Appendix B.2 for . The iteration is stopped if the norm of is smaller than a certain threshold (see Section 4 for details). In this case, we define and at the th time step. For the BDF- scheme, the values are computed from a lower-order scheme. In the numerical section below, we employ the BDF-2 scheme only such that is calculated by the implicit Euler method.

Note that the constrained minimization problem is exactly mass conserving by construction, but the Newton iteration introduces a small error which depends on the tolerance imposed in the Newton method.

In each iteration step, we need to invert the dense matrix which is the sum of and the Hessian of . This is not a numerical challenge in the one-dimensional case we consider but it may become critical in multi-dimensional discretizations on fine grids. For , however, and the Hessian of are constant matrices which significantly simplifies the Newton scheme.

3.7. Choice of the initial condition

In order to compute the initial condition in Lagrangian coordinates, we need to make precise the values of the vector and the points of the spatial lattice which is moving as the solution evolves. Let the mesh be given and set . Approximating the initial function by a linear ansatz function, we obtain


This is a system of linear equations in . Choosing the uniform grid , we can solve this system explicitly. Indeed, since

which can be solved for :

As is assumed to be point-symmetric, this is true for too. Finally, we approximate by the arithmetic mean , . Consequently, the weights vanish for in the expansion , which is consistent with our approximation (17). We also refer to the discussion in [13, Section 2.8].

4. Numerical experiments

In this section, we present some numerical results for (1) with , by employing the BDF-2 method with quadratic ansatz functions. We choose a uniform grid for with grid points, and the time step size . The Newton iterations are stopped if both the relative error in the -variables and the norm of are smaller than .

Figure 1 illustrates the temporal evolution of the solution with the initial conditions (left figure) and (right figure). We observe that, as expected, the solutions converge to the constant steady state. Because of the negative exponent , very small initial values increase quickly in time.

Figure 1. Time evolution of the solution to the diffusion equation (1) with for two different initial conditions.

A nice feature of the Wasserstein gradient flow scheme is that we may interpret the evolution as a process of redistribution of particles with spatio-temporal density on under the influence of a nonlinear particle interaction, which is described by . The way in which the initial density is “deformed” during the time evolution is illustrated in Figure 2. We have chosen 50 “test particles” for the solutions to (1) for the initial conditions chosen above. We stress the fact that the density of trajectories can generally be not identified with the density of the solution.

Figure 2. Particle trajectories in the Wasserstein gradient flow scheme, corresponding to the solutions of Figure 1 with .

We verify that the discretization is indeed of second order. Figure 3 shows the -error for various numbers of grid points . We have chosen the initial datum , the end time , and the time step size . The reference solution is computed by using , and . The differences and in the norm feature the expected second-order dependence on .

Figure 3. -error for (left) and (right) at for various numbers of grid points.

Next, we fix the number of grid points and compute the error for varying time step sizes ; see Figure 4. Because of the approximation of the initial datum as detailed in Section 3.7, the error will be not of second order initially. Therefore, we compute the error in the interval with . The errors are of second order, as expected.

Figure 4. -error for (left) and (right) for various time step sizes, with .

The time decay of the discrete version of the relative entropy is presented in Figure 5 (left) for various grid numbers. We observe that the decay is exponential until saturation. The saturation comes from the spatial error and the error from the Newton iteration. The decay rate is estimated in the linear regime from the difference quotient

Figure 5. Discrete relative entropy versus time with for various .

The numerical decay rates for are shown in Figure 6. The rate for (right) is much larger than the corresponding one for (left), since a smaller exponent yields a larger diffusion coefficient (if ) and thus, diffusion becomes faster. We also see that the decay rates become larger on a finer spatial grid. This behavior seems to confirm recent analytical results for spatial discretizations of Fokker-Planck equations; see [21, Section 5]. One may ask if a similar behavior can be observed for the decay rate as a function of the time step size. However, our numerical experiments do not show a monotonic dependence (figures not presented); rather the decay rates vary in a small range which seems to be determined by the other numerical error parts.

Figure 6. Estimated decay rate of the entropy versus number of grid points for (left) and (right).

In Figure 7, the decay of the square of the relative -norm is presented. The -norm is calculated according to (2), where the argument is given by and is the weight vector corresponding to the constant steady state. Again, the decay is much faster for because of the faster diffusion.

Figure 7. Relative -norm versus time for various grid numbers (left) and for two values of the exponent (right).

Finally, we present some results on the time decay of the discrete variance of and at time , defined by

where is the expectation value of (which equals the mass and is therefore constant in time). The discrete variance of is defined in a similar way. Interestingly, the variances are exponentially decaying (Figure 8), although it is not clear how to prove this property analytically.

Figure 8. Discrete variance of (left) and (right) versus time for various grid numbers .

5. Conclusion

We have proposed an extension of the standard minimizing movement scheme to higher-order BDF time discretizations. Quadratic finite elements have been used to discretise in (mass) space. As an example, we have considered a (singular) diffusion equation, but it should be possible to adapt the scheme to other partial differential equations which constitute Wasserstein gradient flows. The implicit numerical scheme is based on successive solution of constrained minimization problems. The quadratic convergence in space and time of our method has been confirmed numerically. It turns out that the relative entropy, -norm, and variance converge exponentially to zero. An interesting observation is that the decay rate depends monotonically on the number of grid numbers, but it seems that there is no monotonic behavior with respect to the time step size. Future work may be concerned with an analytical derivation of numerical decay rates using a discrete variant of the Bakry-Emery approach. First steps in this direction were presented in [21] but only for a semi-discretization. Possibly this approach has to be adapted to -stable schemes in the spirit of [18].

Appendix A Proofs of Theorems 1 and 2

Proof of Theorem 1.

The existence proof is based on a regularization of the initial datum and a fixed-point argument. Let and . Let and , Set

where will be determined later. The set is convex and, by Aubin’s lemma, compact in . Let and let be the weak solution to


This defines the fixed-point operator , . Standard arguments show that is continuous. We verify that . By the maximum principle, . Using as a test function in the weak formulation of (18) shows that , where is some constant depending on . Moreover, . Thus, setting, , we infer that . By the fixed-point theorem of Schauder, there exists a fixed point of .

In order to perform the limit , we need to derive -independent estimates for . To this end, we need to distinguish several cases. First, let . Employing the test function in the weak formulation of (18) with , we find that

The right-hand side is uniformly bounded as we assumed that . Since