D3M: A deep domain decomposition method for partial differential equations
Abstract
A state-of-the-art deep domain decomposition method (D3M) based on the variational principle is proposed for partial differential equations (PDEs). The solution of PDEs can be formulated as the solution of a constrained optimization problem, and we design a multi-fidelity neural network framework to solve this optimization problem. Our contribution is to develop a systematic computational procedure for the underlying problem in parallel with domain decomposition. Our analysis shows that the D3M approximate solution converges to the exact solution of the underlying PDEs. Our proposed framework establishes a foundation for using variational deep learning in large-scale engineering problems and designs. We present a general mathematical framework of D3M, validate its accuracy, and demonstrate its efficiency with numerical experiments.
Domain decomposition, Deep learning, Mesh-free, Multi-fidelity, Parallel computation, PDEs, Physics-constrained
65M55, 68T05, 65N55
1 Introduction
Partial differential equations (PDEs) are among the most ubiquitous tools employed in describing computational science and engineering problems. When modeling complex problems, the governing PDEs are typically expensive to solve through traditional numerical methods, e.g., the finite element method [elman2014finite]. While principal component analysis (PCA) [wold1987principal, scholkopf1997kernel], proper orthogonal decomposition (POD) [berkooz1993proper, willcox2002balanced] and reduced basis methods [verpat02, quaroz07, elman2013reduced, cheroz14, liao2016reduced, chenjiang16] are classical approaches for model reduction to reduce the computational cost, deep learning [lecun2015deep] has recently gained considerable interest for efficiently solving PDEs. There are mathematical guarantees, called universal approximation theorems [hornik1991approximation], stating that a single-layer neural network can approximate most functions in Sobolev spaces. Although there is still a lack of theoretical frameworks explaining the effectiveness of multilayer neural networks, deep learning has become a widely used tool. The marvelous success of deep neural networks encourages their application to different areas, where the curse of dimensionality is a tormenting issue.
New approaches based on deep learning techniques are actively being proposed for solving PDEs. E et al. [weinan2017proposal, weinan2018deep] connect deep learning with dynamical systems and propose a deep Ritz method (DRM) for solving PDEs via variational methods. Raissi et al. [raissi2017physics1, raissi2017physics2, raissi2019physics] develop physics-informed neural networks, which combine observed data with PDE models. By leveraging the prior knowledge that the underlying PDE model obeys a specific form, they can make accurate predictions with limited data. Long et al. [long2017pde] present a feed-forward neural network, called PDE-Net, to accomplish two tasks at the same time: predicting the time-dependent behavior of an unknown PDE-governed dynamic system, and revealing the PDE model that generates the observed data. Later, Sirignano et al. [sirignano2018dgm] propose a deep Galerkin method (DGM), which is a mesh-free deep learning algorithm to solve PDEs without requiring observed data (solution samples of PDEs). When a steady-state high-dimensional parametric PDE system is considered, Zhu et al. [zhu2018bayesian, zhu2019physics] propose Bayesian deep convolutional encoder-decoder networks for problems with high-dimensional random inputs.
When considering computational problems arising in practical engineering, e.g., aeronautics and astronautics, systems are typically designed by multiple groups along disciplinary lines. Solving large-scale problems can incur substantial hardware costs, and the trade-off between accuracy and generalization is also hard to balance. For these reasons, decomposing a given system into component parts to manage complexity is a natural strategy, and the domain decomposition method is a traditional numerical method to achieve this goal. Schwarz [schwarz1870ueber] proposed an iterative method for solving harmonic functions. This method was later improved by S. L. Sobolev [sobolev1936schwarz], S. G. Michlin [mikhlin1951schwarz], and P. L. Lions et al. [lions1988schwarz, lions1989schwarz]. Domain decomposition is also employed for optimal design or control [antil2010domain], for decomposing a complex design task (e.g., decomposition approaches to multidisciplinary optimization [liao2019parallel, kong2018efficient]), and for uncertainty analysis of models governed by PDEs [liao2015domain, chen2015local].
In this work, we propose a variational deep learning solver based on domain decomposition methods, referred to as the deep domain decomposition method (D3M), to implement parallel computations over physical subdomains. In particular, efficient treatments of complex boundary conditions are developed. Solving PDEs using D3M has several benefits:
Complexity and generalization. D3M manages complexity at the local level. Overfitting is a challenging problem in deep learning, and the risk of overfitting can be reduced by splitting the physical domain into subdomains so that each network focuses on a specific subdomain.
Mesh-free and data-free. D3M constructs and trains networks under the variational formulation, so it requires no given data and can therefore potentially be used for complex and high-dimensional problems.
Parallel computation. The computational procedures of D3M run in parallel over different subdomains. This feature helps D3M work efficiently on large-scale and multidisciplinary problems.
In this work, D3M is developed based on iterative domain decomposition methods. Using domain decomposition leads to an independent model-training procedure in each subdomain in an “offline” phase, followed by assembling the global solution from precomputed local information in an “online” phase. Section 2 reviews iterative overlapping domain decomposition methods. Section 3 presents variational-principle-informed neural networks and our D3M algorithms. A convergence analysis of D3M is discussed in Section 4, and a summary of our full approach is presented in Section 5. Numerical studies are discussed in Section 6. Finally, Section 7 concludes the paper.
2 Overlapping domain decomposition
The Schwarz method [schwarz1870ueber] is the most classical example of a domain decomposition approach for PDEs, and it remains efficient in its many improved variants [li2018multilevel, zampini2017multilevel, zhang2018bi].
Consider the classical Poisson’s equation
(1) \(-\Delta u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega.\)
We divide \(\Omega\) into two overlapping subdomains \(\Omega_1\) and \(\Omega_2\) (see Figure 1), where
(2) \(\Omega = \Omega_1 \cup \Omega_2, \qquad \Omega_1 \cap \Omega_2 \neq \emptyset.\)
We introduce the original formulation, named the Schwarz alternating method, here. Let \(u^0\) be an initial guess defined in \(\Omega\) and vanishing on \(\partial\Omega\). For \(n \geq 0\), we define sequences \(\{u_1^n\}\) and \(\{u_2^n\}\), where \(u_i^n\) denotes the approximation in \(\Omega_i\). The iterate \(u_i^{n+1}\) is determined from the following iteration algorithm:
(3) \(-\Delta u_1^{n+1} = f \ \text{in } \Omega_1, \qquad u_1^{n+1} = u_2^{n} \ \text{on } \partial\Omega_1 \cap \Omega_2, \qquad u_1^{n+1} = 0 \ \text{on } \partial\Omega_1 \cap \partial\Omega,\)
and
(4) \(-\Delta u_2^{n+1} = f \ \text{in } \Omega_2, \qquad u_2^{n+1} = u_1^{n+1} \ \text{on } \partial\Omega_2 \cap \Omega_1, \qquad u_2^{n+1} = 0 \ \text{on } \partial\Omega_2 \cap \partial\Omega.\)
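The alternating iteration (3)–(4) can be sketched in one dimension. The snippet below is a minimal illustration, assuming the model problem \(-u'' = 2\) on \((0,1)\) with homogeneous Dirichlet conditions and two overlapping subdomains; each local solve uses a standard tridiagonal (Thomas) solver and is not taken from the paper.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-, b = main-, c = super-diagonal."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_subdomain(f_vals, h, left, right):
    """Solve -u'' = f on the interior nodes of a subdomain with Dirichlet data."""
    n = len(f_vals)
    a, b, c = [-1.0] * n, [2.0] * n, [-1.0] * n
    d = [h * h * fv for fv in f_vals]
    d[0] += left
    d[-1] += right
    return thomas(a, b, c, d)

N = 100
h = 1.0 / N
xs = [i * h for i in range(N + 1)]
f = [2.0] * (N + 1)          # -u'' = 2 has the exact solution u = x(1 - x)
i1, i2 = 60, 40              # junctions: Omega_1 = [0, 0.6], Omega_2 = [0.4, 1]
u = [0.0] * (N + 1)          # zero initial guess, vanishing on the boundary

for _ in range(20):
    # (3): solve on Omega_1 using the current iterate at the junction x_{i1}
    u[1:i1] = solve_subdomain(f[1:i1], h, 0.0, u[i1])
    # (4): solve on Omega_2 using the fresh value at the junction x_{i2}
    u[i2 + 1:N] = solve_subdomain(f[i2 + 1:N], h, u[i2], 0.0)

err = max(abs(u[i] - xs[i] * (1 - xs[i])) for i in range(N + 1))
```

With overlap \([0.4, 0.6]\), the junction error contracts by a constant factor per sweep, so a few dozen iterations recover the discrete solution to near machine precision.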
3 Deep domain decomposition method with variational principle
Before introducing D3M, we first give a brief introduction to the variational principle. In this section, we consider the Poisson’s equation, reformulate (1) as a constrained minimization problem, and then introduce the D3M algorithm.
3.1 Variational principle
The Poisson’s equation with the homogeneous Dirichlet boundary condition is (1); in this section we consider the situation where the source term \(f\) is square integrable and \(\Omega\) is a square domain. The idea of the standard deep Ritz method is based on the variational principle: the PDE can be derived from a functional minimization problem, as described in the following proposition.
Proposition 1
Definition 1
\(H(\mathrm{div};\Omega)\) denotes the space of symmetric tensor fields whose components are square integrable and whose divergence is also square integrable.
We employ a mixed residual loss [zhu2019physics] following the Hellinger–Reissner principle [ARNOLD1990281]. With an additional variable \(\tau\), which represents the flux, we can turn Equation (1) into
(7) \(\tau - \nabla u = 0, \qquad \nabla \cdot \tau + f = 0 \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega.\)
The mixed residual loss is
(8) \(\mathcal{L}(u, \tau) = \|\tau - \nabla u\|_{L^2(\Omega)}^2 + \|\nabla \cdot \tau + f\|_{L^2(\Omega)}^2.\)
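To make the loss concrete, the following sketch estimates (8) by Monte Carlo for an analytic trial pair; the manufactured solution \(u = \sin(\pi x)\sin(\pi y)\) and the perturbed flux are assumptions for illustration only.

```python
import math, random

# Manufactured problem on the unit square (an assumption for illustration):
# u = sin(pi x) sin(pi y) solves -Laplace(u) = f with f = 2 pi^2 sin(pi x) sin(pi y).
def f(x, y):
    return 2 * math.pi ** 2 * math.sin(math.pi * x) * math.sin(math.pi * y)

def grad_u(x, y):
    # Gradient of the exact solution.
    return (math.pi * math.cos(math.pi * x) * math.sin(math.pi * y),
            math.pi * math.sin(math.pi * x) * math.cos(math.pi * y))

def tau_exact(x, y):
    return grad_u(x, y)      # the flux tau = grad u

def div_tau_exact(x, y):
    return -f(x, y)          # div tau = Laplace(u) = -f

def mixed_loss(tau, div_tau, n, rng):
    """Monte Carlo estimate of ||tau - grad u||^2 + ||div tau + f||^2."""
    acc = 0.0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        tx, ty = tau(x, y)
        gx, gy = grad_u(x, y)
        acc += (tx - gx) ** 2 + (ty - gy) ** 2 + (div_tau(x, y) + f(x, y)) ** 2
    return acc / n

rng = random.Random(0)
loss_exact = mixed_loss(tau_exact, div_tau_exact, 1000, rng)   # vanishes

# A flux perturbed by the constant 0.1 leaves a residual of exactly 0.1^2 = 0.01.
tau_bad = lambda x, y: (tau_exact(x, y)[0] + 0.1, tau_exact(x, y)[1])
loss_bad = mixed_loss(tau_bad, div_tau_exact, 1000, rng)
```

The loss vanishes exactly at the true solution–flux pair and is strictly positive otherwise, which is the property the training in the next subsection exploits.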
3.2 Variational principle informed neural networks
Though the Poisson’s equation (1) is reformulated as an optimization problem, it is intractable to find the optimum in an infinite-dimensional function space. Instead, we seek to approximate the solution by neural networks. We utilize networks \(u_\theta\) and \(\tau_\theta\) to approximate the solution and the flux in the domain \(\Omega\), where \(\theta\) collects the parameters to train. The input is the spatial variable in \(\Omega\), and the outputs represent the function values corresponding to the input. With these settings, we can train a neural network via the variational principle to represent the solution of the Poisson’s equation. The functional minimization problem (8) turns into the following optimization problem
(9) \(\min_{\theta} \mathcal{L}(u_\theta, \tau_\theta).\)
Remark 1
In practical implementation, \(u_\theta\) and \(\tau_\theta\) are embedded in one network parameterized with \(\theta\), and the two outputs of the network denote the function values of \(u\) and \(\tau\) respectively.
Therefore, the infinite-dimensional optimization problem (5) is transformed into the finite-dimensional optimization problem (9). Our goal is to find the optimal (or suboptimal) parameters \(\theta\) that minimize the loss in (9). To this end, we choose a mini-batch of points randomly sampled in \(\Omega\). These data points give an estimate of the integral in (9) and the gradient information needed to update the parameters \(\theta\). For example, a mini-batch of points is drawn randomly, with some points in \(\Omega\) and some on \(\partial\Omega\). Then the parameters can be updated using optimization approaches such as gradient descent,
(10) \(\theta^{k+1} = \theta^{k} - \eta \, \nabla_{\theta} \hat{\mathcal{L}}(\theta^{k}),\)
where \(\eta\) is the learning rate and \(\hat{\mathcal{L}}\) is the mini-batch estimate of the loss.
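A minimal instance of the mini-batch update idea in (10), assuming a hypothetical one-parameter-per-field ansatz in one dimension (a stand-in for the network) and finite-difference gradients in place of back-propagation:

```python
import math, random

PI = math.pi

def f(x):
    return PI ** 2 * math.sin(PI * x)      # exact solution: u = sin(pi x)

def loss(theta, pts):
    """Mini-batch mixed residual loss for the hypothetical ansatz
    u = t1 sin(pi x), tau = t2 pi cos(pi x)."""
    t1, t2 = theta
    acc = 0.0
    for x in pts:
        tau = t2 * PI * math.cos(PI * x)
        du = t1 * PI * math.cos(PI * x)          # u'
        dtau = -t2 * PI ** 2 * math.sin(PI * x)  # tau'
        acc += (tau - du) ** 2 + (dtau + f(x)) ** 2
    return acc / len(pts)

def grad(theta, pts, eps=1e-6):
    """Central finite-difference gradient (a stand-in for back-propagation)."""
    g = []
    for i in range(len(theta)):
        tp, tm = list(theta), list(theta)
        tp[i] += eps
        tm[i] -= eps
        g.append((loss(tp, pts) - loss(tm, pts)) / (2 * eps))
    return g

rng = random.Random(0)
theta = [0.0, 0.0]
lr = 0.005                                   # step size eta in (10)
for _ in range(3000):
    pts = [rng.random() for _ in range(32)]  # fresh mini-batch in (0, 1)
    theta = [t - lr * g for t, g in zip(theta, grad(theta, pts))]
# theta approaches (1, 1): u -> sin(pi x) and tau -> u'.
```

Because the residual vanishes only at the exact pair, the iteration drives both parameters to 1 regardless of the particular mini-batches drawn.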
3.3 Implementation details for neural networks
This section provides details for the architecture of our neural networks.
To give an intuitive impression, we show the implementation with a plain vanilla densely connected neural network and illustrate how the structure works in Figure 2. For illustration only, the network depicted consists of 2 layers with 6 neurons in each layer. The network takes the spatial variables as input and outputs the approximate function values. Each layer applies an affine map followed by an element-wise nonlinearity,
(11) \(z^{(l+1)} = \sigma\big(W^{(l)} z^{(l)} + b^{(l)}\big),\)
where \(\sigma\) is called the activation function. There are some commonly used activation functions, such as the sigmoid function, the tanh function, the rectified linear unit (ReLU) [nair2010rectified], and the Leaky ReLU [maas2013rectifier]. We employ the tanh function, and automatic differentiation is provided by PyTorch [paszke2017automatic]. The total loss function comprises the residual loss terms and the Lagrangian term, which guarantees the constraint conditions. The parameters are trained by backpropagating gradients of the loss function, and the optimizer is L-BFGS [liu1989limited] with a learning rate of 0.5. In practice, the model architecture is the residual network (ResNet) [he2016deep]. Residual networks are easier to optimize and can gain accuracy from considerably increased depth. The structure of ResNet improves the results of deep networks because more information from previous layers is retained. A brief illustration of ResNet is given in Figure 3.
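A single residual block of the kind described above can be sketched as follows; the two-dense-layers-per-block layout and the weight values are illustrative assumptions, not the trained architecture.

```python
import math, random

def affine(W, b, x):
    """Dense layer: returns W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def res_block(params, x):
    """Residual block: y = x + W2 tanh(W1 x + b1) + b2 (skip connection)."""
    W1, b1, W2, b2 = params
    h = [math.tanh(v) for v in affine(W1, b1, x)]
    return [xi + oi for xi, oi in zip(x, affine(W2, b2, h))]

width = 6
zeros = [0.0] * width

# With all-zero weights the block is exactly the identity map, which is one
# reason residual networks remain easy to optimize as depth grows.
zero_params = ([[0.0] * width for _ in range(width)], zeros,
               [[0.0] * width for _ in range(width)], zeros)

rng = random.Random(0)
rand_mat = lambda: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                    for _ in range(width)]
x = [rng.uniform(-1.0, 1.0) for _ in range(width)]
y = res_block((rand_mat(), zeros, rand_mat(), zeros), x)
```

The skip connection means each block only has to learn a correction to the identity, which is the property exploited when stacking many blocks.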
3.4 Deep domain decomposition method
In this part, we propose the main algorithms of our D3M. Because the physics-constrained neural network is mesh-free, we improve the Schwarz alternating method with a better inhomogeneous D3M sampling method at the junctions to accelerate convergence. We note that the performance of standard deep variational networks and mixed residual networks can deteriorate when the underlying problem has inhomogeneous boundary conditions. Our treatment to overcome this weakness is to introduce the following boundary function.
Definition 2
(Boundary function) A smooth function \(B\) is called a boundary function associated with the boundary data if
(12) 
where \(\beta\) is a coefficient and \(d(x, \partial\Omega)\) denotes the shortest Euclidean distance between \(x\) and the boundary \(\partial\Omega\). If the point \(x\) is on the boundary, the boundary function matches the prescribed boundary values; if not, its value decreases to zero sharply away from the boundary. We then decompose the solution using the boundary function, where the remaining part satisfies
(13) 
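One plausible realization of such a boundary function on the unit square, assuming the decay in (12) is exponential in the distance to the boundary (the exact form and the coefficient \(\beta = 50\) are assumptions here):

```python
import math

def dist_to_boundary(x, y):
    """Shortest Euclidean distance from (x, y) to the boundary of the unit square."""
    return min(x, 1.0 - x, y, 1.0 - y)

def boundary_function(x, y, g, beta=50.0):
    """B(x, y) = g(nearest boundary point) * exp(-beta * distance).

    On the boundary, B matches the data g; inside, it decays sharply to zero."""
    d = dist_to_boundary(x, y)
    # Project (x, y) onto the nearest edge of the square.
    cands = [(x, (0.0, y)), (1.0 - x, (1.0, y)),
             (y, (x, 0.0)), (1.0 - y, (x, 1.0))]
    _, (px, py) = min(cands, key=lambda c: c[0])
    return g(px, py) * math.exp(-beta * d)

g = lambda x, y: 1.0 + x    # illustrative inhomogeneous Dirichlet data
```

Subtracting such a function from the solution ansatz leaves a homogeneous boundary condition for the network to satisfy, which is the purpose of the decomposition below.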
Substituting this decomposition on each subdomain, Equation (7) can be represented as
(14) 
The mixed residual loss is
(15)  
It should be noted that the integration is computed by Monte Carlo sampling. Domain decomposition significantly reduces the variance of the samples for the same number of data points, because the sampling area becomes smaller.
The procedure of D3M is as follows. We first divide the domain into subdomains such that each two neighboring subdomains overlap. The local solution of the PDEs on each subdomain is represented by neural networks trained through the variational principle, and the global solution on the whole domain consists of these local solutions. To be more precise, the formal description of D3M, covering the decomposed junctions, the initial network weights, the accuracy threshold, the evaluation samples generated in each subdomain and on each interface, the training samples, the training time per iteration, the batch sizes, the subdomain networks, and their outputs at each iteration, is presented in Algorithm 1.
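The offline/online structure of Algorithm 1 can be sketched as follows; the local "training" is replaced by a hypothetical exact subdomain solve of \(-u'' = 2\), so that the interface exchange between independent subdomain solves is the only mechanism at work.

```python
def local_solve(a, b, ua, ub):
    """Hypothetical stand-in for training a local network: the exact solution
    of -u'' = 2 on [a, b] with u(a) = ua, u(b) = ub is u(x) = -x^2 + c x + d."""
    c = (ub - ua + b * b - a * a) / (b - a)
    d = ua + a * a - c * a
    return lambda x, c=c, d=d: -x * x + c * x + d

# Overlapping decomposition of [0, 1]; data is exchanged at the junctions.
junctions = [0.6, 0.4]     # right end of Omega_1, left end of Omega_2
trace = [0.0, 0.0]         # current interface values (zero initial guess)
tol, max_iter = 1e-10, 50

for _ in range(max_iter):
    # "Offline": each subdomain is handled independently (in parallel),
    # given the latest interface data.
    u1 = local_solve(0.0, 0.6, 0.0, trace[0])
    u2 = local_solve(0.4, 1.0, trace[1], 0.0)
    # "Online": exchange information on the interfaces and test convergence.
    new_trace = [u2(junctions[0]), u1(junctions[1])]
    gap = max(abs(p - q) for p, q in zip(new_trace, trace))
    trace = new_trace
    if gap < tol:
        break
# The assembled solution approaches the global solution u(x) = x(1 - x).
```

Unlike the sequential sweep of Section 2, both subdomain problems here use only the previous iterate's interface data, so they can be solved simultaneously, which mirrors the parallel offline phase of D3M.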
4 Analysis
While the mixed residual formulation is a special case, we first consider the basic functional formulation,
(16) 
Definition 3
Given closed subspaces \(V_1\) and \(V_2\) of a space \(V\) with \(V = V_1 + V_2\), and a proper, lower semicontinuous, coercive convex functional \(J\) on \(V\), we denote the associated minimization problem accordingly.
Assumption 1
There exist constants such that
(17) 
where \(J'\) is uniformly continuous.
Proposition 2
Lemma 1
On each subdomain \(\Omega_i\), neural networks whose activation functions have continuous derivatives up to order \(m\) are universal approximators in the Sobolev space of order \(m\).
Lemma 2
Theorem 1
Let \(J_i\) denote the objective function on the subdomain \(\Omega_i\). Under the above assumptions, as the number of iterations tends to infinity, the D3M approximation converges to the optimal solution of \(J_i\) in subdomain \(\Omega_i\), with a constant
(19) 
For concision, we use simplified notation in the remainder of this section. Proof. For the sequence defined in Lemma 2, there exists
(20) 
where is a constant, .
Then, by Lemma 1 [hornik1991approximation], in each subdomain \(\Omega_i\) the neural network approximates the target. If the number of training steps in each subdomain is large enough, universal approximation ensures that the distance between the two functions in the Sobolev space is sufficiently small, with the constant
(21) 
While converging to the minimum of \(J_i\), by the optimality conditions it is clear that
(22)  
Under the assumption, the difference between two iterations can be represented as
(23) 
Combining equations (20), (21) and (23), we have
(24)  
Theorem 2
We use and to denote and respectively.
(25)  
The function we optimized is therefore also the optimal solution of the corresponding formulation.
Up to now, we have proved that D3M can solve the steady Poisson’s equation with variational formulations. We now extend D3M to more general quasilinear parabolic PDEs (26) with physics-constrained approaches.
(26) 
where the decomposed boundary sets have smooth boundaries. We recall the network space of a subdomain with generated data, following [hornik1991approximation]:
(27) 
where \(\sigma\) is any activation function, and the remaining symbols denote one set of generated data and the coefficients of the networks. Under the universal approximation property of neural networks (Lemma 1), in each subdomain the neural networks satisfy
(28) 
where the quantities above satisfy
(29) 
For the remaining part of the analysis, we make some assumptions.
Assumption 2

There is a constant and positive functions such that for all we have
and
with for some .

and are Lipschitz continuous in .

is continuously differentiable w.r.t. .

There is a positive constant such that
and
Theorem 3
Suppose the domain is decomposed into subdomains, with the iteration count omitted from the notation for brevity. Let the network space on each subdomain be as above, where the subdomains are compact. Assume that the target equation (26) has a unique solution in each subdomain, that the nonlinear terms are locally Lipschitz, and that the network approximation converges uniformly to the true solution. Then, for any tolerance, there exists a set of neural networks satisfying the following error bound
(30) 
In each subdomain, for a fixed number of iterations, the loss measures the discrepancy between the network approximation and the true solution.
(31)  
By Lemma 1, the sum of the last terms is bounded by a constant multiple of the tolerance. We assumed that the approximation converges uniformly to the true solution, and this means that
(32) 
Therefore, we have
(33)  
where the factor is a constant, and the error bound between the two quantities follows from Theorem 7.1 in [sirignano2018dgm].
(34)  
where the constant depends on the domain. Since the nonlinear term is also Lipschitz continuous, an upper bound of the same form as Equation (34) can be proved for it. Hence we obtain
(35) 
Theorem 4
In each subdomain, the convergence can be obtained from Theorems 7.1 and 7.3 in [sirignano2018dgm]. With Proposition 2, the sequence is uniformly bounded in \(n\), and the rate of convergence to the solution is related to the overlapping areas.
Note that here \(n\) refers to the number of training steps rather than the number of iterations. We leave the proof for time-dependent data-free variational formulations for future work.
5 D3M summary
To summarize the strategies in Section 3 and 4, the full procedure of D3M comprises the following steps:

pre-step: Set the architecture of the neural networks in each subdomain;

offline step: Construct functions for the boundary conditions, train the networks, and generate local solutions;

online step: Estimate target input data using the neural networks. If the solutions do not converge, transfer information on the interfaces and go back to the offline step.
The pre-step is cheap, because setting up the neural networks is an easy task. In the offline stage, the complex system is fully decomposed into component parts, which means that no data is exchanged between subdomains. Since we only approximate the data on the interfaces with standard approaches such as Fourier series and polynomial chaos [chen2015local, arngha11a], this approximation is also inexpensive. After decomposition, the number of samples required for the Monte Carlo integration of the residual loss function (15) is significantly reduced, while the density of samples does not change. Since the number of samples decreases and each subdomain becomes simpler, we can use neural networks with fewer layers to achieve relatively high accuracy; if we keep the same number of samples and layers as in the global setting, D3M should obtain better accuracy. The cost of the online step is low, since no PDE solve is required. The full approach is also summarized in Figure 4, where data transfer happens between adjacent subdomains.
For the problems we focus on (systems governed by PDEs), the cost of D3M is dominated by the local training procedures in the offline step. Here we present a rough order-of-magnitude analysis of the computational cost, where the unit cost is that of one block in each training epoch (i.e., the cost of any block with the same number of neurons in each training iteration is taken to be equal, for simplicity). The dominant cost of D3M in each subdomain is the product of the sample size, the number of blocks, and the number of training epochs. If we consider equal offline sample sizes, block counts, and training epochs for all subdomains, then the total cost is this product multiplied by the number of subdomains. The total cost can be decreased further by employing the idea of hierarchical neural networks [ng2014multifidelity, li2019hnh].
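The cost model can be illustrated with hypothetical numbers (all sizes below are made up for the sake of the arithmetic):

```python
def offline_cost(sample_sizes, block_counts, epochs):
    """Total offline cost in 'block evaluations': sum over subdomains of
    (sample size) x (number of blocks) x (training epochs)."""
    return sum(s * b * t for s, b, t in zip(sample_sizes, block_counts, epochs))

# One large global model versus four smaller subdomain models.
global_cost = offline_cost([4000], [8], [30])          # 4000 * 8 * 30
d3m_cost = offline_cost([800] * 4, [4] * 4, [30] * 4)  # 4 * (800 * 4 * 30)
```

Under these assumed sizes, the decomposed training costs less than the global training because both the per-subdomain sample size and the per-subdomain network depth shrink.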
6 Numerical tests
Here we consider two classical problems, the Poisson’s equation and the time-independent Schrödinger equation, to verify the performance of our D3M. All timings are conducted on an Intel Core i5-7500 CPU with 16 GB RAM and an Nvidia GTX 1080Ti GPU, with PyTorch 1.0.1 [paszke2017automatic] under Python 3.6.5. We train the networks for only 30 epochs using L-BFGS in the numerical tests (costing within two minutes).
6.1 Poisson’s equation
(36) 
where the physical domain and the domain decomposition setting are illustrated in Figure 5. To further improve the efficiency of D3M, we propose a new type of sampling method: we randomly sample in each subdomain, and the number of samples grows as the number of iterations increases. The sample size on the interfaces remains the same, to provide an accurate solution for data exchange. An illustration of our D3M sampling method is presented in Figure 6.
Remark 2
According to research on overfitting, the hypothesis set (i.e., the complexity of the neural networks) should match the quantity and quality of the data rather than the target function [abu2012learning]. Thus, in the first several iterations the number of residual blocks is small, and it increases as the sample size in Figure 6 increases.
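The sampling schedule and the matching growth in network capacity can be sketched as follows; the initial sizes, the growth factor, and the block counts are hypothetical choices for illustration.

```python
import random

def plan(iteration, n0=200, growth=1.5, n_interface=64, b0=2):
    """Per-iteration plan: the interior sample size and the number of residual
    blocks grow with the iteration count, while the interface sample size is
    fixed so the exchanged interface data keeps a consistent accuracy."""
    n_interior = int(n0 * growth ** iteration)
    n_blocks = b0 + iteration      # capacity grows with the data (Remark 2)
    return n_interior, n_interface, n_blocks

def draw_interior(n, rng, xlo=0.0, xhi=0.5, ylo=0.0, yhi=1.0):
    """Uniform samples inside one rectangular subdomain."""
    return [(rng.uniform(xlo, xhi), rng.uniform(ylo, yhi)) for _ in range(n)]

rng = random.Random(0)
sizes = [plan(k)[0] for k in range(4)]   # 200, 300, 450, 675
pts = draw_interior(sizes[0], rng)
```

Growing the interior sample size (and the model capacity with it) keeps the early iterations cheap while the interface iterates are still far from converged.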
After decomposition, with a boundary function designed to satisfy Definition 2 for each subdomain, the remaining function satisfies the homogeneous Poisson’s equation and follows the iterations (3) and (4).