D3M: A deep domain decomposition method for partial differential equations

D3M: A deep domain decomposition method for partial differential equations

Ke Li School of Information Science and Technology, ShanghaiTech University, Shanghai, 200120, China. Equal contributions. ({like1, tangkj, liaoqf}@shanghaitech.edu.cn).    Kejun Tang 1    Tianfan Wu Viterbi School of Engineering, University of Southern California, Los Angeles, USA, (tianfanw@usc.edu).    Qifeng Liao1 Corresponding author.
1footnotemark: 1

A state-of-the-art deep domain decomposition method (D3M) based on the variational principle is proposed for partial differential equations (PDEs). The solution of PDEs can be formulated as the solution of a constrained optimization problem, and we design a multi-fidelity neural network framework to solve this optimization problem. Our contribution is to develop a systematical computational procedure for the underlying problem in parallel with domain decomposition. Our analysis shows that the D3M approximation solution converges to the exact solution of underlying PDEs. Our proposed framework establishes a foundation to use variational deep learning in large-scale engineering problems and designs. We present a general mathematical framework of D3M, validate its accuracy and demonstrate its efficiency with numerical experiments.


omain decomposition, Deep learning, Mesh-free, Multi-fidelity, Parallel computation, PDEs, Physics-constrained


65M55, 68T05, 65N55

1 Introduction

Partial differential equations (PDEs) are among the most ubiquitous tools employed in describing computational science and engineering problems. When modeling complex problems, the governing PDEs are typically expensive to solve through transitional numerical methods, e.g., the finite element methods [elman2014finite]. While principal component analysis [wold1987principal, scholkopf1997kernel] (PCA), proper orthogonal decomposition [berkooz1993proper, willcox2002balanced] (POD) and reduced basis methods [verpat02, quaroz07, elman2013reduced, cheroz14, liao2016reduced, chenjiang16] are classical approaches for model reduction to reduce the computational costs, deep learning [lecun2015deep] currently gains a lot of interests for efficiently solving PDEs. There are mathematical guarantees called universal approximation theorems [hornik1991approximation] stating that a single layer neural network can approximate most functions in Soblev spaces. Although there is still a lack of theoretical frameworks for explaining the effectiveness of multilayer neural networks, deep learning has become a widely used tool. Marvelous successful practices of deep neural networks encourages their applications to different areas, where the curse of dimensionality is a tormenting issue.

New approaches are actively proposed to solve PDEs based on deep learning techniques. E et al. [weinan2017proposal, weinan2018deep] connect deep learning with dynamic system and propose a deep Ritz method (DRM) for solving PDEs via variational methods. Raissi et al. [raissi2017physics1, raissi2017physics2, raissi2019physics] develop physics-informed neural networks which combine observed data with PDE models. By leveraging a prior knowledge that the underlying PDE model obeys a specific form, they can make accurate predictions with limited data. Long et al. [long2017pde] present a feed-forward neural network, called PDE-Net, to accomplish two tasks at the same time: predicting time-dependent behavior of an unknown PDE-governed dynamic system, and revealing the PDE model that generates observed data. Later, Sirignano et al. [sirignano2018dgm] propose a deep Galerkin method (DGM), which is a meshfree deep learning algorithm to solve PDEs without requiring observed data (solution samples of PDEs). When a steady-state high-dimensional parametric PDE system is considered, Zhu et al. [zhu2018bayesian, zhu2019physics] propose Bayesian deep convolutional encoder-decoder networks for problems with high-dimensional random inputs.

When considering computational problems arising in practical engineering, e.g. aeronautics and astronautics, systems are typically designed by multiple groups along disciplinary. The complexity of solving large-scale problems may take an expensive cost of hardware. The balance of accuracy and generalization is also hard to trade off. For this reason, decomposing a given system into component parts to manage the complexity is a strategy, and the domain decomposition method is a traditional numerical method to achieve this goal. Schwarz [schwarz1870ueber] proposes an iterative method for solving harmonic functions. Then this method is improved by S.L.Sobolev [sobolev1936schwarz], S.G.Michlin [mikhlin1951schwarz], P.L.Lions et al. [lions1988schwarz, lions1989schwarz]. Domain decomposition is also employed for optimal design or control [antil2010domain], for decomposing a complex design task (e.g., decomposition approaches to multidisciplinary optimization [liao2019parallel, kong2018efficient]), and for uncertainty analysis of models governed by PDEs [liao2015domain, chen2015local].

In this work, we propose a variational deep learning solver based on domain decomposition methods, which is referred to as the deep domain decomposition method (D3M) to implement parallel computations along physical subdomains. Especially, efficient treatments of complex boundary conditions are developed. Solving PDEs using D3M has several benefits:

Complexity and generalization. D3M manages complexity at the local level. Overfitting is a challenging problem in deep learning. The risk of overfitting can be reduced by splitting the physical domain into subdomains, so that each network focuses on a specific subdomain.

Mesh-free and data-free. D3M constructs and trains networks under variational formulation. So, it does not require given data, which can be potentially used for complex and high-dimensional problems.

Parallel computation. The computational procedures of D3M are in parallel for different subdomains. This feature helps D3M work efficiently on large-scale and multidisciplinary problems.

In this work, D3M is developed based on iterative domain decomposition methods. The development of using domain decomposition leads to an independent model-training procedure in each subdomain in an “offline” phase, followed by assembling global solution using pre-computed local information in an “online” phase. Section 2 reviews iterative overlapping domain decomposition methods. Section 3 presents the normal variational principle informed neural networks, and our D3M algorithms. A convergence analysis of D3M is discussed in Section 4, and a summary of our full approach is presented in Section 5. Numerical studies are discussed in Section 6. Finally, Section 7 concludes the paper.

2 Overlapping domain decomposition

The Schwarz method [schwarz1870ueber] is the most classical example of domain decomposition approach for PDEs, and it is still efficient with variant improvements [li2018multilevel, zampini2017multilevel, zhang2018bi].

Given a classical Poisson’s equation


We divide into two overlapping subdomains (see Figure 1), where

Figure 1: Partition into two overlapping subdomains.

We introduce the original formula named Schwarz alternating method here. Let be an initial guess defined in and vanishing on . For , we define sequences where denotes in . The is determined from an iteration algorithm:




3 Deep domain decomposition method with variational principle

Before introducing D3M, we first give a brief introduction of variational principle. In this section, we consider the Poisson’s equation and reformulate (1) as a constrained minimization problem, and then we introduce the D3M algorithm.

3.1 Variational principle

The Poisson’s equation with the homogeneous Dirichlet boundary condition is (1), and we consider the situation that and is a square domain in this section. The idea of the standard Deep Ritz method is based on the variational principle. That is, the PDE can be derived by a functional minimization problem as described in the following proposition.

Proposition 1

Solving the Poisson’s equation (1) is equivalent to an optimization problem


The Lagrangian formula of (5) is given by


where is the Lagrange multiplier.

Definition 1

denotes symmetric tensor-fields in space, in which functions are square integrable and have square integrable divergence.

We employ a mixed residual loss [zhu2019physics] following Hellinger-Reissner principle [ARNOLD1990281]. With an additional variable , which represents flux, we can turn Equation (1) into


The mixed residual loss is


3.2 Variational principle informed neural networks

Though the Poisson’s equation (1) is reformulated as an optimization problem, it is intractable to find the optimum in an infinite-dimensional function space. Instead, we seek to approximate the solution by neural networks. We utilize to approximate the solution and the flux in domain , where and are the parameters to train. The input is the spatial variable in , and the outputs represent the function value corresponding to the input. With these settings, we can train a neural network by variational principle to represent the solution of Poisson’s equation. The functional minimization problem (8) turns into the following optimization problem

Remark 1

In practical implementation, and are embedded in one network parameterized with , and the two outputs of denote the function values of and respectively.

Therefore, the infinite-dimensional optimization problem (5) is transformed into a finite-dimensional optimization problem (9). Our goal is to find the optimal (or sub optimal) parameters to minimize the loss in (9). To this end, we choose a mini-batch of points randomly sampled in . These data points can give an estimation of the integral in (9) and the gradient information to update the parameters . For example, a mini-batch points are drawn in randomly, where in and on . Then the parameters can be updated by using optimization approaches


3.3 Implementation details for neural networks

This section provides details for the architecture of our neural networks.

For giving a direct-viewing impression, we show the implementation with a plain vanilla densely connected neural network to introduce how the structure works in Figure 2. For illustration only, the network depicted consists of 2 layers with 6 neurons in each layer. The network takes input variables and outputs . The number of neurons in each layer is and denotes an element-wise operator


where is called the activation function. There are some commonly used activation functions such as the sigmoidal function, the tanh function, the rectified linear units (ReLU) function [nair2010rectified], and the Leaky ReLU function [maas2013rectifier]. We employ the tanh function, and the automatic differentiation is obtained by using PyTorch [paszke2017automatic]. The total loss function comprises the residual loss terms , and the Lagrangian term which guarantees the constraint conditions. The parameters are trained with backpropogating gradients of the loss function and the optimizer is L-BFGS [liu1989limited] where the learning rate is 0.5. In practice, the model architecture of neural networks is the residual network (ResNet) [he2016deep]. These residual networks are easier to optimize, and it can gain accuracy from considerably increased depth. The structure of ResNet improves the result of deep networks, because there are more previous information retained. A brief illustration of ResNet is in Figure 3.

Figure 2: Illustration of the neural networks. are inputs, are outputs and the dashed box with means the architecture of plain fully-connected neural networks.
Figure 3: The residual network building block of our method.

3.4 Deep domain decomposition method

In this part, we propose the main algorithms of our D3M. Because the physics-constrained neural network is mesh-free, we improve Schwarz alternating method with a better inhomogeneous D3M sampling method at junctions to accelerate convergence. We note the performance of normal deep variational networks and mixed residual networks can deteriorate when the underlying problem has inhomogeneous boundary conditions. Our treatment to overcome this weakness is to introduce the following boundary function.

Definition 2

(Boundary function) A smooth function is called a boundary function associated with if


where is a coefficient, the notation denotes the shortest Euclidean distance between and . If the point is on the boundary, . If not, the value of decreases to zero sharply. And we define , where satisfies


Letting on each subdomain, Equation (7) can be represented as


The mixed residual loss is


It should be noted that, the integration is completed by Monte Carlo, such that the domain decomposition reduces the variance of samples significantly with the same number of data because the area of samples becomes smaller.

1:Input: , , , , , , , .
2:Initialize: , , , , .
3:Divide the physical domain into .
4:while  do
5:     Run Algorithm (2) in each subdomain in parallel.
6:     .
7:     .
8:end while
9:Merge parts and get .
10:Return: .
Algorithm 1 Deep domain decomposition
1:Input: , , , , , .
2:Construct function using value of , .
3:for  steps do
4:     Sample minibatch of samples in .
5:     Sample minibatch of samples on .
6:     Update the parameters by descending its stochastic gradient:
7:end for
10:Return: , .
Algorithm 2 Training for subdomain

The procedure of D3M is as follows. We first divide the domain into subdomains, and each two neighboring subdomains are overlapping. The local solution of PDEs on each subdomain is replaced by neural networks which can be trained through the variational principle, where the global solution on the whole domain consists of these local solutions on subdomains. To be more precise, let denote decomposed junctions, is initial weights of neural networks, is the threshold of accuracy, and are the samples generated in and on interface to evaluate the output of networks in each iteration, are training samples in subdomain , are training samples on , is training time in each iteration, and are batch sizes, is the neural networks for , is the neural networks for , is the iteration time, and is the output of networks for subdomain in -th iteration. The formal description of D3M is presented in Algorithm 1.

4 Analysis

While mixed residual formulation is a special case, we consider the basic functional formulation first,

Definition 3

Given closed subspaces and , for any , and a proper, lower semi-continuous, coercive convex functional , we denote .

Assumption 1

and   s.t.


where J’ is uniformly continuous on .

Proposition 2

The alternating Schwarz method (3) and (4) converges to the solution of of Equation (14). The error bound of and can be estimated via maximum principle [kantorovich1960approximate, lions1989schwarz], such that for


where constants is close to one if the overlapping region is thin.

Lemma 1

On each subdomain , neural network with continuous derivatives up to order are universal approximators in Sobolev space with an order , which means .

Lemma 2

If the variational formulation (16) satisfies the assumption (1), then it follows that there exists a sequential obtained by Schwarz alternating method converges to the minimum of on each subdomain .

Theorem 1

denotes the objective function on the subdomain . Under above assumptions, for , , while iteration times , converges to optimal solution of in subdomain for a constant


For concision, we use to represent in the following part. {proof} denotes the sequential in Lemma 2, there exist


where is a constant, .
Then with Lemma 1 [hornik1991approximation], in each subdomain , neural network . If the training times in each subdomain is enough, the universal approximation ensures the distance between functions and is close enough in the Sobolev space, with the constant


While converging to the minimum of of , by the optimality conditions it is clear that


Under the assumption, and the difference between two iterations can be represented as


Consider equation (20), (21) and (23), we have

Theorem 2

For a given boundary function and a fixed , the optimal solution of Equation (9) and of Equation (14) satisfy .


We use and to denote and respectively.


Function we optimized is also the optimal solution with the formula .

Up to now, we prove that D3M can solve steady Poisson’s equation with variational formulations. Then, we extend D3M to more general quasilinear parabolic PDEs (26) with physics-constrained approaches.


where denotes , and are decomposed boundary sets with smooth boundaries . We recall the network space of subdomain with generated data according to [hornik1991approximation]


where is any activation function, is one set of generated data, , and denote coefficients of networks. Set . Under the universal approximation of neural networks Lemma 1, in each subdomain the neural networks satisfies


where and satisfy


For the following part of analysis, we make some assumptions.

Assumption 2
  • There is a constant and positive functions such that for all we have


    with for some .

  • and are Lipschitz continuous in .

  • In each subdomain, the derivatives of solutions from alternating Schwarz method (3), (4) converge to the derivative of solution . Precisely, there exists a constant , such that for iteration times

  • is continuously differentiable w.r.t. .

  • There is a positive constant such that


Theorem 3

Suppose the domain is decomposed into , denotes iteration times (omitted in notations for brief). denotes networks space space in subdomain , where subdomains are compact. Assume that target function (26) has unique solution in each subdomain, nonlinear terms and are locally Lipschitz in (), and uniformly converges to with . For , there such that there exists a set of neural networks satisfies the error as follow


In each subdomain , with iteration times , denotes the loss between and .


With Lemma 1, it is clear that the sum of last term is smaller than , where is a constant. We assumed that uniformly converges to with , and this means that


So that we have


where is a constant, and the error bound between and is proved from Theorem 7.1 in [sirignano2018dgm].


where is a constant depends on . While is also Lipschitz continuous, we can prove the upper bound of in the same formula with Equation (34), denoted as . Hence we can obtain

Theorem 4

Under Assumption 2 and Equation (29), with iteration times , the set of neural networks converge to the unique solutions to (26), strongly in for every . In addition, in each subdomain the sequence is bounded in n under the constraint of 2 and converges to .


In each subdomain, the convergence can be obtained from the Theorems 7.1 and 7.3 in [sirignano2018dgm]. With the Proposition 2, the sequence is uniform bounded in n, and the rates of convergence to the solution are related to overlapping areas.

Specially, the in means training times instead of iteration times.We leave the proof for time-dependent data-free variational formulations for future work.

5 D3M summary

To summarize the strategies in Section 3 and 4, the full procedure of D3M comprises the following steps:

  • pre-step: Set the architecture of neural networks in each subdomain;

  • offline step: Construct functions for boundary conditions, train networks and generate local solutions;

  • online step: Estimate target input data using neural networks. If solutions don’t converge, transfer information on interfaces and go back to the offline step.

The pre-step is cheap, because the setting of neural networks is an easy task. In the offline stage, the complex system is fully decomposed into component parts, which means that there is no data exchange between subdomains. Since we only approximate the data on interface with normal simple approach such as Fourier series and polynomial chaos [chen2015local, arngha11a], the approximation is also low costly. After decomposition, the requirement for number of samples for the Monte Carlo integration of the residual loss function (15) is significantly reduced, while the density of samples does not change. Since the number of samples decreasing and the domain becoming simpler, we can use neural networks with few layers to achieve a relatively high accuracy. If we keep the same number of samples and layers as the global setting, D3M should obtain a better accuracy. The cost of the online step is low, since no PDE solve is required. This full approach is also summarized in Figure 4, where transformations of data happen between adjacent subdomains.

Figure 4: D3M summary.

For the problems we focus on (systems governed by PDEs), the cost of the D3M is dominated by the local training procedures in the offline step. Here we present a rough order of magnitude analysis of the computational costs where denotes the cost of one block in each training epoch (i.e., the cost of any block with the same number of neurons in each training iteration is taken to be equal for simplicity). The dominant cost of D3M is the total number of blocks of neural network and training times, , where is sample size, is number of blocks and is training times. If we consider equal offline sample sizes, number of blocks and training epochs, for all subdomains , then total cost can be written as . The total cost is decreased by employing the idea of hierarchical neural networks [ng2014multifidelity, li2019hnh].

6 Numerical tests

Here we consider two classical problems, the Poisson’s equation and the time-independent Schrödinger equation, to verify the performance of our D3M. All timings conduct on an Intel Core i5-7500, 16GB RAM, Nvidia GTX 1080Ti processor with PyTorch 1.0.1 [paszke2017automatic] under Python 3.6.5. We train the networks only 30 epoch using L-BFGS in numerical tests (cost within two minutes).

6.1 Poisson’s equation


where the physical domain is . The domain decomposition setting is illustrated in Figure 5. To further improve the efficiency of D3M, we propose a new type of sampling methods. We randomly sample in each subdomain, and the number of samples increases with iteration times increase. The sample size on interfaces remains the same to provide an accurate solution for data exchange. An illustration of our D3M sampling method is represented in Figure 6

Figure 5: Illustrations of the physical domain with four overlapping components.
Figure 6: D3M sampling : a new type of mesh-free sampling.
Remark 2

According to the research on overfitting, the hypothesis set (w.r.t. the complexity of neural networks) should match the quantity and quality of data instead of target functions [abu2012learning]. So in the initial several iterations, the number of residual blocks is small. The number increases while the sample size in Figure 6 increases.

After decomposition, with designed satisfying Definition 1 for each subdomains, the function satisfies the homogeneous Poisson’s equation and follows (3) and (4).

(a) Solution of D3M
(b) Solution of DRM