# Implicit Neural Solver for Time-dependent Linear PDEs with Convergence Guarantee

###### Abstract

Fast and accurate solution of time-dependent partial differential equations (PDEs) is of key interest in many research fields, including physics, engineering, and biology. Generally, implicit schemes are preferred over explicit ones for better stability and correctness. Existing implicit schemes are usually iterative and employ a general-purpose solver, which may be sub-optimal for a specific class of PDEs. In this paper, we propose a neural solver to learn an optimal iterative scheme for a class of PDEs in a data-driven fashion. We achieve this goal by modifying an iteration of an existing semi-implicit solver using a deep neural network. Further, we prove theoretically that our approach preserves the correctness and convergence guarantees provided by existing iterative solvers. We also demonstrate that our model generalizes to parameter settings different from the one seen during training and achieves faster convergence than semi-implicit schemes.

## 1 Introduction

Time-dependent partial differential equations (PDEs) are an essential mathematical tool for describing numerous physical processes such as heat transfer, wave propagation, quantum transport, and tumor growth. Solving the initial-value problem (IVP) and boundary-value problem (BVP) accurately and efficiently is of primary research interest in computational science. Numerical solution of time-dependent PDEs requires appropriate spatio-temporal discretization. Spatial discretization is typically cast as a finite difference method (FDM), finite element method (FEM), or finite volume method (FVM), whereas temporal discretization relies on explicit, implicit, or semi-implicit methods. Explicit temporal update rules are generally a single or a few forward computation steps, while implicit or semi-implicit update rules, such as the Crank-Nicolson scheme, resort to a fixed-point iterative scheme. Fine temporal and spatial resolution facilitates a more accurate solution; however, it increases the computational burden at the same time. Moreover, the maximally allowed spatio-temporal resolution is constrained not only by the desired accuracy but also by numerical stability criteria. Note that implicit and semi-implicit methods offer relaxed stability constraints (they are sometimes unconditionally stable) at the expense of the increased computational cost caused by the iterative schemes.
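The stability contrast can be made concrete on the 1-D heat equation $u_t = u_{xx}$: the forward-Euler update is stable only when $\Delta t \le \Delta x^2/2$, whereas the backward-Euler update is unconditionally stable. The following numpy sketch (not from the paper; grid size and step sizes are illustrative assumptions) compares the spectral radii of the two update matrices:

```python
import numpy as np

def heat_update_matrix(n, dt, dx, implicit=False):
    """Update matrix for u_t = u_xx on n interior points with Dirichlet BCs."""
    # Second-order central-difference Laplacian.
    A = (np.diag(-2.0 * np.ones(n)) +
         np.diag(np.ones(n - 1), 1) +
         np.diag(np.ones(n - 1), -1)) / dx**2
    if implicit:
        # Backward Euler: (I - dt*A) u_{t+dt} = u_t
        return np.linalg.inv(np.eye(n) - dt * A)
    # Forward Euler: u_{t+dt} = (I + dt*A) u_t
    return np.eye(n) + dt * A

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

n, dx = 64, 1.0 / 65
dt = dx**2  # twice the forward-Euler stability limit dx**2 / 2
rho_explicit = spectral_radius(heat_update_matrix(n, dt, dx))
rho_implicit = spectral_radius(heat_update_matrix(n, dt, dx, implicit=True))
print(rho_explicit > 1.0, rho_implicit < 1.0)
```

With this time step the explicit update amplifies high-frequency modes (spectral radius above one), while the implicit update remains a contraction; halving $\Delta t$ below the limit makes the explicit scheme stable again.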

In recent times, deep neural networks raissi2019physics have gained significant attention in the numerical computation community due to their superior performance in solving forward simulations magill2018neural and inverse problems ezhov2019neural. Recent work by Tompson et al. tompson2017accelerating shows that a data-driven convolutional neural network (CNN) can accelerate fluid simulation compared to traditional numerical schemes. Long et al. long2017pde show that learned differential operators can outperform hand-designed discrete schemes. However, in contrast to the well-understood and theoretically grounded classical methods, deep learning-based approaches rely mainly on empirical validity. Recently, Hsieh et al. hsieh2018learning developed a promising way to learn a numerical solver while providing a theoretical convergence guarantee. They demonstrate that a feed-forward CNN, trained to mimic a single iteration of a linear solver, can deliver a faster solution than the handcrafted solver. Astonishingly, for time-dependent PDEs, the temporal update step of previously proposed neural schemes relies on the explicit forward Euler method, and none of them is capable of exploiting the advanced implicit and semi-implicit methods. This limitation restricts the general use of neural architectures in solving time-dependent PDEs.

In this paper, we introduce a novel neural solver for time-dependent linear PDEs. Motivated by hsieh2018learning, we construct a neural iterator from a semi-implicit update rule. We replace a single iteration of the semi-implicit scheme with a learnable parameterized function such that the fixed point of the algorithm is preserved. Leveraging these theoretical guarantees, we perform data-driven learning to enhance the convergence speed. As a result, our approach provides: (i) a theoretical guarantee of convergence to the correct stationary solution, (ii) faster convergence than existing solvers, and (iii) generalization to resolutions and parameter settings very different from the ones seen at training time.

## 2 Methodology

In this section, we first introduce the necessary background on the semi-implicit method for time-dependent linear PDEs and subsequently describe our proposed neural solver.

### 2.1 Background

In the following, we consider the general IVP form of a time-dependent linear PDE for the variable of interest $u$ within the computational domain $\Omega$, w.r.t. the spatial variable $x$ and temporal variable $t$, subject to the Dirichlet boundary condition $u_b$ at the boundary $\partial\Omega$:

$$\frac{\partial u}{\partial t} = F(u), \qquad u(x, 0) = u_0(x), \qquad u = u_b \;\text{on}\; \partial\Omega \tag{1}$$

$F$ is a linear operator and can be discretized as $\sum_k P_k D_k$, with uniform spatial discretization step $\Delta x$ and PDE parameter set $\theta = \{\theta_k\}$, where $P_k$ is a diagonal matrix comprising the coefficients $\theta_k$ of the discrete differential operator $D_k$ of order $k$. Let's call $\sum_k P_k D_k$ as $A$ for simplicity. A first-order semi-implicit update rule to get $u_{t+\Delta t}$ from $u_t$ (with time step $\Delta t$) is given by

$$u_{t+\Delta t} = u_t + \Delta t \, A \, u_{t+\Delta t} \tag{2}$$

To obtain $u_{t+\Delta t}$ one needs to solve the following linear system of equations

$$u_{t+\Delta t} = \frac{b + \Delta t\, B\, u_{t+\Delta t}}{1 - \Delta t\, a_0} \tag{3}$$

where $b = u_t$ is independent of $u_{t+\Delta t}$, $A = a_0 I + B$, and $a_0$ is the central element of the central-difference discretization of the operator $F$. Note that for the central-difference scheme, $B$ is a real, zero-diagonal, and either circulant or skew-circulant matrix. One can use an iterative scheme to compute $u_{t+\Delta t}$ from an arbitrary initialization on the right-hand side of Eq. 3. For simplicity of notation, we refer to $u_{t+\Delta t}$ as $u$, and we drop the subscript of $u_t$ and use a superscript $k$ to denote a single iteration at a time $t$. We enforce the Dirichlet boundary condition using a projection step with a binary boundary mask $M$.

$$u^{k+1} = M \odot \frac{b + \Delta t\, B\, u^{k}}{1 - \Delta t\, a_0} + (1 - M) \odot u_b \tag{4}$$

Eq. 4 can be seen as a linear operator $\Psi(u) = T u + c$. We can guarantee that the spectral radius of the linear transformation $T$ satisfies $\rho(T) < 1$ by appropriately selecting $\Delta t$ [see Appendix A], leading to a fixed-point algorithm.
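As a concrete illustration of this masked fixed-point iteration, here is a minimal 1-D numpy sketch. The stencil split $A = a_0 I + B$, the diffusion-type operator, the mask, and all step sizes are illustrative assumptions for this sketch, not the paper's exact setup; the boundary value is taken as zero, so the $(1-M) \odot u_b$ term vanishes.

```python
import numpy as np

# 1-D diffusion stencil split A = a0*I + B, with B zero-diagonal (cf. Eq. 3).
n, dx, dt = 64, 1.0 / 65, 1e-4
a0 = -2.0 / dx**2                                          # central stencil element
B = (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)) / dx**2

rng = np.random.default_rng(0)
u_t = rng.standard_normal(n)                               # previous time step (b = u_t)
mask = np.ones(n); mask[0] = mask[-1] = 0.0                # interior mask, Dirichlet u_b = 0

def psi(u):
    """One masked fixed-point sweep: u <- M * (u_t + dt*B u) / (1 - dt*a0)."""
    return mask * (u_t + dt * (B @ u)) / (1.0 - dt * a0)

u = np.zeros(n)
for _ in range(200):
    u = psi(u)

# At convergence, u solves the semi-implicit system on the interior.
residual = np.linalg.norm(mask * ((1.0 - dt * a0) * u - u_t - dt * (B @ u)))
print(residual < 1e-8)
```

Because the central element $a_0$ is negative for diffusion-type stencils, the denominator $1 - \Delta t\, a_0$ grows with $\Delta t$, which is what keeps the sweep contractive even for comparatively large time steps.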

### 2.2 Neural Solver

We propose the following iterator, using similar notation as in hsieh2018learning:

$$\Phi(u) = \Psi(u) + H w \tag{5}$$

where $w = \Psi(u) - u$, and $H$ is a learned linear operator, which satisfies $H w = 0$ for $w = 0$. Substituting $\Psi(u) = T u + c$ in Eq. 5 we get $\Phi(u) = T' u + c'$, where $c' = (I + H)\,c$ denotes the additive part which is independent of $u$, and $T' = T + H(T - I)$.

###### Lemma 1.

For a linear PDE problem and any choice of $H$, if $u^*$ is a fixed point of $\Psi$, then it is a fixed point of $\Phi$ in Eq. 5.

###### Proof.

If $u^*$ satisfies $\Psi(u^*) = u^*$, then $w = \Psi(u^*) - u^* = 0$. Therefore the iterative rule in Eq. 5 becomes $\Phi(u^*) = \Psi(u^*) + H\,0 = u^*$. ∎

Additionally, if $H = 0$ then $\Phi = \Psi$. Furthermore, if $H = T$, then $\Phi(u) = \Psi(\Psi(u))$, since

$$\Phi(u) = T u + c + T(T u + c - u) = T^2 u + (I + T)\,c = \Psi(\Psi(u)) \tag{6}$$

which is equal to two iterations of $\Psi$. Since computing $\Phi(u)$ requires two separate convolutions, i) $\Psi(u)$ and ii) $H(\Psi(u) - u)$, one iteration of $\Phi$ computes the same order of convolution operations as two iterations of $\Psi$. This shows that we can learn an $H$ such that our iterator performs at least as well as the standard solver $\Psi$. In the following theorem, we show that there is a convex open set of $H$ that the learning algorithm can explore.
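The two properties above ($H = 0$ recovers $\Psi$, and $H = T$ reproduces two sweeps of $\Psi$) can be checked numerically. The sketch below assumes a generic affine iterator $\Psi(u) = T u + c$ with a random contraction $T$; it is an illustration, not the paper's actual operator.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 32
# Any contraction T (spectral norm < 1) and offset c define psi(u) = T u + c.
T = rng.standard_normal((n, n))
T *= 0.9 / np.linalg.norm(T, 2)
c = rng.standard_normal(n)

def psi(u):
    return T @ u + c

def phi(u, H):
    """Learned iterator of Eq. 5: phi(u) = psi(u) + H (psi(u) - u)."""
    return psi(u) + H @ (psi(u) - u)

u = rng.standard_normal(n)
# H = 0 recovers psi; H = T reproduces two sweeps of psi (Eq. 6).
print(np.allclose(phi(u, np.zeros((n, n))), psi(u)))
print(np.allclose(phi(u, T), psi(psi(u))))
```

Expanding `phi(u, T)` gives `T @ (T @ u + c) + c` term by term, which is exactly `psi(psi(u))`, so one learned iteration matches two handcrafted sweeps at the same convolution count.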

###### Theorem 1.

For fixed $T$, $c$, and $\Delta t$, the spectral norm of $T'$ is a convex function of $H$, and the set of $H$ such that the spectral norm of $T'$ is less than $1$ is a convex open set.

###### Proof.

See Appendix B. ∎
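A quick numerical sanity check of the convexity claim, assuming the linear part of the learned iterator has the form $T' = T + H(T - I)$ as in hsieh2018learning: since $H \mapsto T'$ is affine and the spectral norm is convex, the norm at the midpoint of any segment in $H$ cannot exceed the average of the endpoint norms. All matrices below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
T = rng.standard_normal((n, n))
T *= 0.8 / np.linalg.norm(T, 2)        # contraction, as guaranteed by Eq. 4

def iterator_norm(H):
    """Spectral norm of T' = T + H (T - I), the linear part of the iterator."""
    return np.linalg.norm(T + H @ (T - np.eye(n)), 2)

# Convexity along an arbitrary segment H0 -> H1 in the space of H:
H0 = rng.standard_normal((n, n))
H1 = rng.standard_normal((n, n))
mid = iterator_norm(0.5 * (H0 + H1))
print(mid <= 0.5 * (iterator_norm(H0) + iterator_norm(H1)) + 1e-12)
```

This is only a spot check along one segment; the theorem's proof (Appendix B) establishes the property for all of $H$-space.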

In stark contrast with previous work hsieh2018learning, we have several sets of parameters, $\theta$, $\Delta t$, $\Delta x$, and $u_b$, attached to the PDE's governing equation. Although we train on a single parameter setting, the model possesses a generalization property, which we show in the following.

###### Proposition 1.

For fixed $\{D_k\}$, $\Delta x$, and $\Delta t$, and some $\theta$ and $u_b$, if $\Phi$ is a valid iterator for the PDE problem $(\theta, u_b)$, then for all $\theta'$ and $u_b'$ the iterator $\Phi$ is a valid iterator for the PDE problem $(\theta', u_b')$, provided the spectral-norm bound of Lemma 2 remains below $1$.

###### Proof.

See Appendix B. ∎

## 3 Experiments

We consider a 2-D advection-diffusion equation of the following form

$$\frac{\partial u}{\partial t} = -\mathbf{v} \cdot \nabla u + D\, \nabla^2 u \tag{7}$$

where $\mathbf{v}$ and $D$ are the advection velocity and diffusivity, respectively. We minimize the following loss function

$$\mathcal{L} = \sum_{t=1}^{N} \left\lVert u_t^{K} - u_t^{*} \right\rVert_2^2$$

where $N$ is the number of time steps, the iteration count for a single time step is denoted as $K$, $u_t^{K}$ is the iterate after $K$ iterations at time step $t$, and $u_t^{*}$ is the reference solution.

Data Generation: We consider a rectangular domain. The elements of $\mathbf{v}$ and $D$ are drawn from uniform distributions. The computational domain is discretized into a 64 x 64 regular mesh. We assume a zero Dirichlet boundary condition, and the initial value is generated according to long2017pde as a truncated Fourier series, $u_0 = \sum_{k,l} \lambda_{k,l} \cos(kx + ly) + \gamma_{k,l} \sin(kx + ly)$, where the amplitudes $\lambda_{k,l}$ and $\gamma_{k,l}$ are drawn from a normal distribution, and the frequencies $k$ and $l$ are drawn at random from a uniform distribution. We generate 200 simulations, each having 50 time steps, using FEniCS logg2012automated, alnaes2015FEniCS. Further, we split the simulated time series into train, test, and validation sets. A time series of one test datum is shown in Fig 1.
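For concreteness, a sketch of how such a random-Fourier-series initial value can be generated on the 64 x 64 mesh. The number of modes, the amplitude distribution, and the frequency range below are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def random_initial_value(n=64, max_freq=4, n_modes=5, seed=0):
    """Random truncated Fourier series on [0, 2*pi]^2, in the spirit of long2017pde.

    Amplitudes are drawn from a standard normal, integer frequencies uniformly
    from [-max_freq, max_freq]; these choices are illustrative placeholders.
    """
    rng = np.random.default_rng(seed)
    x, y = np.meshgrid(np.linspace(0, 2 * np.pi, n), np.linspace(0, 2 * np.pi, n))
    u0 = np.zeros((n, n))
    for _ in range(n_modes):
        lam, gam = rng.standard_normal(2)                        # amplitudes
        k, l = rng.integers(-max_freq, max_freq + 1, size=2)     # frequencies
        u0 += lam * np.cos(k * x + l * y) + gam * np.sin(k * x + l * y)
    return u0

u0 = random_initial_value()
print(u0.shape)
```

Each seed produces a distinct smooth field, giving a diverse set of initial conditions across the 200 simulations.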

Implementation Details: Following hsieh2018learning, we use a three-layer convolutional neural network to model the learned operator $H$. We use zero-padding to enforce the zero Dirichlet condition at the boundary and use a kernel size of 3 x 3. During training, we keep the time step, the spatial resolution, and the PDE parameters fixed. We use the PyTorch framework and train our network with the Adam optimizer kingma2014adam.
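A minimal numpy sketch of such an operator: three stacked 3 x 3 convolutions with zero padding. Since the theory requires $H$ to be a linear operator, no nonlinearity is inserted between layers; the kernel values here are random placeholders rather than trained weights, and the helper names are ours, not the paper's.

```python
import numpy as np

def conv3x3_zero_pad(u, k):
    """3x3 cross-correlation with zero padding (zero Dirichlet at the border)."""
    n, m = u.shape
    up = np.pad(u, 1)          # zero padding: the field is 0 outside the domain
    out = np.zeros_like(u)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * up[i:i + n, j:j + m]
    return out

rng = np.random.default_rng(3)
kernels = [0.1 * rng.standard_normal((3, 3)) for _ in range(3)]

def H(u):
    """Three stacked 3x3 convolutions; no activation, so H stays linear."""
    for k in kernels:
        u = conv3x3_zero_pad(u, k)
    return u

u = rng.standard_normal((64, 64))
v = rng.standard_normal((64, 64))
# H is linear -- H(a*u + v) = a*H(u) + H(v) -- and H(0) = 0, as Eq. 5 requires.
print(np.allclose(H(2.0 * u + v), 2.0 * H(u) + H(v)))
print(np.allclose(H(np.zeros((64, 64))), 0.0))
```

Stacking three 3 x 3 layers gives the operator a 7 x 7 effective receptive field while keeping it linear, which is what the convergence analysis of Section 2.2 relies on.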

### 3.1 Results and Discussion

Generalization for Different Parameters: We investigate the effect of parameter settings different from those used during training, to validate the generalizability of the neural scheme. We use the original test set and generate two additional test cases, varying one parameter at a time. The elements of $\mathbf{v}$ and $D$ are drawn from the same distributions as before. The average error propagation over ten time steps of the semi-implicit finite difference method and the proposed neural implicit solver is compared in Figure 2.

We observe that the error from the neural scheme (10 iterations per time step) is lower than the error from the semi-implicit FDM scheme (25 iterations per time step) for all three test sets. This affirms our hypothesis that the neural solver is more accurate than the semi-implicit FDM and, at the same time, generalizes to other parameter settings. We interpret that the learned operator $H$ explores an optimal discretization scheme near the manifold of standard discretization schemes by learning a sum of higher-order rules as described in dong2017image [c.f. Appendix C].

Run-time Comparison: We compare the run time of the neural solver (10 iterations per time step) and the semi-implicit scheme (25 iterations per time step) for one time step. The experiments are conducted on an Intel Xeon W-2123 CPU @ 3.60GHz, with code running on one of the four cores. We report that the trained neural solver takes circa 0.0148 s compared to 0.0141 s for the semi-implicit scheme, whereas the FEniCS solution takes 3.19 s for machine-precision convergence.

## 4 Conclusion

This abstract introduces a novel implicit neural scheme to solve linear time-dependent PDEs. We leverage an existing semi-implicit update rule to design a learnable iterator that provides theoretical guarantees. The learned iterator achieves faster convergence than the existing semi-implicit solver and produces a more accurate solution for a fixed computation budget. More importantly, we empirically demonstrate that training on a single parameter setting is enough to generalize to other parameter settings, which confirms our theoretical results. The learned neural solver can be a faster alternative for the simulation of various physical processes.

#### Acknowledgments

Suprosanna Shit and Ivan Ezhov are supported by the Translational Brain Imaging Training Network (TRABIT) under the European Union’s ‘Horizon 2020’ research & innovation program (Grant agreement ID: 765148).

## References

## Appendix A Convergence Criteria for Semi-implicit Update

The spectral radius of $T$ can be bounded as follows:

$$\rho(T) \le \lVert T \rVert \le \frac{\Delta t\, \lVert B \rVert}{\lvert 1 - \Delta t\, a_0 \rvert}$$

Given $\Delta t\, \lVert B \rVert < \lvert 1 - \Delta t\, a_0 \rvert$, we have $\rho(T) < 1$.

## Appendix B Proofs

###### Theorem 1 (restated).

###### Proof.

The proof is similar to Theorem 2 of hsieh2018learning. The spectral norm is convex from the sub-additivity property, and $T' = T + H(T - I)$ is linear in $H$. To prove that the set is open, observe that the spectral norm is a continuous function, so $\lVert T' \rVert$ is continuous in $H$. Given $T$, the set of $H$ such that $\lVert T' \rVert < 1$ is the preimage of the open interval $(-\infty, 1)$ under this continuous function, and the preimage of an open set under a continuous function must be open. ∎

###### Lemma 2.

The upper bound of the spectral norm of $T'$ is independent of $\theta$ and $u_b$, given $\lVert T \rVert < 1$.

###### Proof.

Considering the spectral norm of $T' = T + H(T - I)$ and invoking the product and triangle inequalities of norms, we obtain the following tight bound

$$\lVert T' \rVert \le \lVert T \rVert + \lVert H \rVert \left( \lVert T \rVert + 1 \right)$$

Given $\lVert T \rVert < 1$, we have $\lVert T' \rVert < 1 + 2 \lVert H \rVert$, which depends only on $H$.

∎

###### Proposition 1 (restated).

###### Proof.

From Theorem 1 of hsieh2018learning and Lemma 1, our iterator is valid if and only if $\rho(T') < 1$. From Lemma 2, the upper bound of the spectral norm of the iterator depends only on $H$, given $\lVert T \rVert < 1$. Moreover, for any matrix the spectral radius is upper bounded by its spectral norm. Thus, if the iterator is valid for some $\theta$ and $u_b$, then it is valid for any feasible choice of $\theta'$ and $u_b'$ satisfying the constraints. ∎

## Appendix C Geometric Interpretation

Surprisingly, we find that the form of $T'$ has a special structure. As the denominator is constant, the objective is to minimize $\lVert T + H(T - I) \rVert$ w.r.t. $H$. Invoking the triangle inequality of norms we have the lower bound

$$\lVert T + H(T - I) \rVert \ge \lVert H(T - I) \rVert - \lVert T \rVert$$

Taking the square of both sides, we have

$$\lVert T + H(T - I) \rVert^2 \ge \left( \lVert H(T - I) \rVert - \lVert T \rVert \right)^2$$

When the equality holds, the local optima lie on the surface of the hyper-sphere centered at $-T$ with radius $\lVert H(T - I) \rVert$. This leads us to believe that the learned operator $H$ explores an optimal discretization scheme near the manifold of standard discretization schemes by learning a sum of higher-order rules as described in dong2017image.