Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks
One of the open problems in scientific computing is the long-time integration of nonlinear stochastic partial differential equations (SPDEs), especially with arbitrary initial data. We address this problem by taking advantage of recent advances in scientific machine learning and the spectral dynamically orthogonal (DO) and bi-orthogonal (BO) methods for representing stochastic processes. The recently introduced DO/BO methods reduce the SPDE into solving a system of deterministic PDEs and a system of stochastic ordinary differential equations. Specifically, we propose two new Physics-Informed Neural Networks (PINNs) for solving time-dependent SPDEs, namely the NN-DO/BO methods. The proposed methods incorporate the DO/BO constraints into the loss function (along with the modal decomposition of the SPDE) with an implicit form instead of generating explicit expressions for the temporal derivatives of the DO/BO modes. Hence, the NN-DO/BO methods can overcome some of the drawbacks of the original DO/BO methods. For example, we do not need the assumption that the covariance matrix of the random coefficients is invertible as in the original DO method, and we can remove the assumption of no eigenvalue crossing as in the original BO method. Moreover, the NN-DO/BO methods can be used to solve time-dependent stochastic inverse problems with the same formulation and same computational complexity as for forward problems. We demonstrate the capability of the proposed methods via several numerical examples, namely: (1) A linear stochastic advection equation with deterministic initial condition: we obtain good results with the proposed methods while the original DO/BO methods cannot be applied directly in this case. (2) Long-time integration of the stochastic Burgers’ equation: we show the good performance of NN-DO/BO methods, especially the effectiveness of the NN-BO approach for such problems with many eigenvalue crossings during the whole time evolution, while the original BO method fails. (3) Nonlinear reaction diffusion equation: we consider both the forward problem and the inverse problems, including very noisy initial point values, to investigate the flexibility of the NN-DO/BO methods in handling inverse and mixed type problems. Taken together, these simulation results demonstrate that the NN-DO/BO methods can be employed to effectively quantify uncertainty propagation in a wide range of physical problems but future work should address the efficiency issue of PINNs for forward problems.
keywords:scientific machine learning, data-driven modeling, dynamical orthogonality, bi-orthogonality, uncertainty quantification, inverse problems
Physics-informed neural networks (PINNs) Raissi2018 are a special class of PDE-induced networks that encode the physics (expressed by the PDE) into a deep neural network (DNN) that shares parameters with a standard DNN that approximates the quantity of interest (QoI), e.g. the solution of the PDE. In practice, this implies that the loss function that expresses mismatch in the labelled data is augmented by the residual of the PDE, which is represented efficiently by automatic differentiation and is evaluated at random points in the time-space domain. This approximation of the nonlinear operators by the DNN is justified theoretically based on the pioneering work of ChenChen93; ChenChen95, which goes well beyond the universal function approximation theorem of Cybenko. This simple and easy to program algorithm has been shown to be successful for diverse problems in physics and fluid mechanics Raissi2018a; Raissi2017b; Raissi_JFM, especially for inverse problems and even for discovering hidden physics Hiddenfluidmechanics. The advantages of encoding the PDE itself into a DNN are multiple: (1) we require much less data to train the DNN since we are searching for the minima on the manifold-solution of the PDE; (2) we respect the conservation laws of mass, momentum and energy; and most importantly, (3) we can truly predict the state of the system, unlike the DNNs driven solely by data that can interpolate accurately only within the training domain. While there is still a lot of work to be done to make PINNs efficient simulation machines, one of the main open issues is uncertainty quantification in predicting the QoI, which will reflect the various sources of uncertainty, i.e., from the approximation of the DNN to the data and physical model uncertainties.
In dongkunNNapc2018 we addressed the issue of total uncertainty for first time and combined dropout and arbitrary polynomial chaos to model stochasticity in steady SPDEs. Here, we consider the more difficult case of time-dependent nonlinear SPDEs and hence we need to introduce a more effective way of dealing with the complexity of long-time integration of stochastic systems. To this end, we employ a generalized form of time-dependent Karhunen-loève (KL) decomposition, first introduced in SapsisPhd, appropriate for second-order random fields, which has the form:
This approach can evolve the time-dependent basis of modes and stochastic coefficients simultaneously, and it is different than the standard polynomial chaos methods Xiu2002; Xiu2005; Xiu2010; narayan_stochastic_2015. To remove the redundancy in this representation, we need some constraints. For example, this can be achieved by imposing dynamical constraints on the spatial basis, which is the so-called “dynamically orthogonal" (DO) methodology first proposed in Sapsis2009; Sapsis2012. Alternatively, by imposing static constraints on both the spatial and stochastic basis, the “bi-orthogonal" (BO) methodology was developed in Cheng2013; Cheng2013a. For both DO and BO we need to derive explicitly the evolution equations for all the components involved, i.e. the mean, spatial basis, and stochastic basis. The DO and BO formulations are mathematically equivalent Choi2014, but they exhibit computationally complimentary properties. Specifically, the BO formulation may fail due to crossing of the eigenvalues of the covariance matrix Choi2014, while both BO and DO become unstable when there is a high condition number of the covariance matrix or zero eigenvalues. A rigorous and sharp error bounds of DO method was first given by Zhou et al. in musharbash2014dynamically, where it was shown that the DO modes can capture the effective directions. For more applications and improvements of DO/BO methods, we refer to Sapsis2013jcp; Choi2014; Lermusiaux2016; Hessam2017 and references therein.
The purpose of this paper is to combine PINNs and the DO/BO methodologies together to obtain new effective methods for solving time-dependent SPDEs – we will refer to them as NN-DO/BO methods. Concretely, we first build a surrogate neural net for the solution of the time-dependent SPDEs based on the generalized KL expansion (Eq. 1). Then the DO/BO constraints are included into the loss function and we train the neural network by minimizing this loss function to obtain the solution. Compared with the original DO/BO method, the merits of the proposed methods are the following:
We do not need the assumption in our NN-DO approach that the covariance matrix of the random coefficients is invertible, even for SPDEs with deterministic initial conditions.
We can deal with eigenvalue crossing in the given time domain when applying our NN-BO approach.
The same NN-DO/BO formulation and computer code can be applied for solving time-dependent stochastic inverse problems or problems driven by sparse noisy data, with the same computational complexity.
The organization of this paper is as follows. In Section 2, we set up the time-dependent stochastic problems. In Section 3, we give a brief review of the dynamically orthogonal and bi-orthogonal methodologies. In Section 4, we formulate our NN-DO/BO framework after the introduction of the PINNs for solving deterministic differential equations. In Section 5, we provide a detailed study of the accuracy and performance of the NN-DO/BO approach with numerical examples. We include two benchmark cases that are specifically designed to have exact solution for the DO and BO representations, followed by a nonlinear stochastic forward problem with high input stochastic dimensionality and noisy data as the initial condition, and a nonlinear inverse problem where we try to identify the model parameters. Finally, we conclude with a brief discussion in Section 6.
2 Problem Setup
Let be a probability space, where is the sample space, is the -algebra of subsets of , and is a probability measure. Let be a bounded domain in ( or 3) whose boundary is denoted by , and be the time domain of interest. We consider the following time-dependent SPDE:
with initial and boundary conditions:
where is a differential operator and is a linear differential operator acting on the domain boundary. Assume that our quantity of interest, , is a second-order random field. The initial and boundary conditions for Eq. 2 are denoted by and . Our aim is to solve Eq. 2, and specifically, to evaluate the mean and standard deviation of the solution .
3 An Overview of the DO and BO Decomposition Methods
For a random field that evolves in time, the generalized Karhunen-Loève (KL) expansion at a given time is
where is the mean, () are zero-mean independent random variables, and are the largest eigenvalue and the corresponding eigenfunction of the covariance kernel, i,e., they solve the following eigenproblem:
Here is the covariance kernel of .
Next we consider a generalized expansion first proposed in Sapsis2009:
Similar to the KL expansion, the random field is decomposed into two parts: (i) the deterministic mean field function , and (ii) the random fluctuation part consists of an infinite summation of deterministic orthogonal fields with 0-mean stochastic coefficients . Formally, we have
We define the linear subspace as the linear space spanned by the first deterministic bases. For now let us assume that are linearly independent and is the linear subspace in spanned by the first stochastic coefficients. The truncated expansion , defined by
is the projection of to the subspace . Without making any assumptions on their form, the governing equations Eq. 2 and Eq. 8–10 represent the only information that can be utilized to derive the evolution equations of and . Note that both the stochastic coefficients and the orthogonal bases are time-dependent (and they are evolving according to the system dynamics), unlike the standard polynomial chaos where the stochastic coefficients are time-independent. There exists some redundancy in the Eq. 11, and therefore, additional constraints need to be imposed in order to formulate a well posed problem for the unknown quantities. Here we review the DO and BO approaches, which have different assumptions on the constraints.
3.1 Dynamically Orthogonal (DO) Representation
As first proposed in Sapsis2009, a natural constraint to overcome redundancy is that the evolution of the bases be orthogonal to the space ; this can be expressed through the following dynamically orthogonal (DO) condition:
Here is defined as the spatial inner product . Comparing Eq. 11 with the standard KL expansion Eq. 5, in the DO representation, we set to have unit length and carries the scaling coefficient as the result of the eigenvalues. Note that the DO condition preserves the orthonormality and the length of the bases since
It is proved in Sapsis2009 that the DO condition leads to a set of independent and explicit evolution equations for all the unknown quantities. Here we state the DO evolution equations without proof:
Theorem 3.1 (see Sapsis2009).
Under the assumptions of the DO representation, the original SPDE (Eq. 2) is reduced to the following system of equations:
The projection in the orthogonal complement of the linear subspace is defined as and the covariance of the stochastic coefficients is . The associated boundary conditions are determined by
and the initial conditions at for the DO components are given by
for all , where is the eigenfields of the standard KL expansion of .
It is shown in Sapsis2009 that by imposing suitable restrictions on the DO representation, the equations for methods such as Polynomial Chaos (PC) or Proper Orthogonal Decomposition (POD) can be recovered from the DO evolution equations. For example, PC can be recovered by setting , where is an orthogonal polynomial in terms of . Moreover, it is shown in ChoiThesis that there exists an one-to-one correspondence between the eigenvalues of the KL expansion for and the eigenvalues of the covariance matrix in the DO representation given any fixed time . Thus, the stochastic coefficients together with the modes provide the necessary information to describe both the shape and the magnitude of the uncertainty that characterizes a stochastic field but also the principle directions in which the stochasticity is distributed.
The moments of can be readily computed from the DO representation. For example, the first moment, i.e., the mean, can be trivially obtained from the first term , and the variance can be calculated as follows:
3.2 Bi-Orthogonal (BO) Representation
An alternative way to overcome the aforementioned redundancy in Eq. 11 is the Bi-Orthogonal (BO) condition, which imposes the static constraint that both the spatial basis functions and stochastic coefficients are orthogonal in time Cheng2013a. In other words, we have the following conditions:
where the s are eigenvalues of the covariance kernel and is the Dirac’s delta function. There is a slight difference between the DO and BO representation: the stochastic coefficients carry the eigenvalues of the covariance operator in the DO representation while the spatial bases carry the eigenvalues of the covariance operator in the BO representation.
Next, we define the matrix and whose entries are
Then by taking the time derivative of Eq. 18, we have
Here we state the BO evolution equations without proof:
Theorem 3.2 (see Cheng2013).
We assume that the bases and stochastic coefficients satisfy the BO condition. Then, the original SPDE (Eq. 2) is reduced to the following system of equations:
Moreover, if for , the -by- matrices and have closed form expression:
where the matrix is defined as .
Similar to the DO method, the boundary condition is given by
and the initial condition is generated from the KL expansion of .
3.3 A Brief Summary of DO/BO Methods
Let us note the following difference between the DO and BO condition: the spatial bases, , under the DO condition evolve to the direction which is normal to the space they expand (orthogonality is automatically maintained), while under the BO condition, only the mutual orthogonality within and within are required and there is no restriction to the direction of their evolution. As a compensation to the lack of constraints in spatial bases, the BO condition puts an additional orthogonality restriction on the random coefficients . However, Choi et al. Choi2014 have proved theoretically the equivalence between the DO and the BO methods, in a sense that one method is an exact reformulation of the other via a differential transformation.
Each method can be applied to a limited range of problems since the evolution equations are valid only if certain assumptions are satisfied. For the DO method, it is assumed that the covariance matrix of random coefficients, , is invertible. Therefore, it fails when applied to some benchmark problems such as a stochastic PDE with deterministic initial condition. This is because all coefficients are equal to 0 at the initial state and their covariance matrix is singular. For the BO method, it is assumed that there is no eigenvalue crossing in the given time domain in order to calculate the explicit expression of and . However, some strategies have been proposed to get around those issues, e.g., the hybrid gPC-DO method Choi2013 and the psedo-inverse hybrid BO-DO method Hessam2017 are developed to address the above limitations.
Inspired by the DO and BO methods, we introduce a new procedure for solving time-dependent stochastic PDEs within the framework of Physics-Informed Neural Networks (PINNs). The proposed methods inherit the similarities between the DO and the BO methods and can be implemented with either the DO or the BO version that are free from the aforementioned restrictions.
4.1 Physics-informed neural network
In this part, we briefly review using DNNs to solve the deterministic differential equations Lagaris1998; Lagaris2000; Raissi2017a, and its generalization for solving deterministic inverse problems in Raissi2017b. Suppose that we have a parameterized deterministic differential equation:
where is the solution and denotes the parameters.
A DNN, denoted by , is constructed as a surrogate of the solution , and it takes the coordinate as the input and outputs a vector that has the same dimension as . Here we use to denote the DNN parameters that will be tuned at the training stage, namely, contains all the weights and biases in . For this surrogate network , we can take its derivatives with respect to its input by applying the chain rule for differentiating compositions of functions using the automatic differentiation, which is conveniently integrated in many machine learning packages such as Tensorflow Abadi2016. The restrictions on is two-fold: first, given the set of scattered data of the observations, the network should be able to reproduce the observed value, when taking the associated as input; second, should comply with the physics imposed by Eq. 24. The second part is achieved by defining a residual network:
which is computed from straightforwardly with automatic differentiation. This residual network , also named the physics-informed neural network (PINN), shares the same parameters with network and should output the constant 0 for any input . Figure 1 shows a sketch of the PINN. At the training stage, the shared parameters (and also , if it is also to be inferred) are fine-tuned to minimize a loss function that reflects the above two constraints.
Suppose we have a total number of observations on , collected at location , and is the number of training points where we evaluate the residual . We shall use to represent a single instance of training data, where the first entry denotes the input and the second entry denotes the anticipated output (also called “label”). The workflow of solving a differential equation with PINN can be summarized as follows:
4.2 A weak formulation interpretation of the DO and BO methods
The derivation of equations for both the DO and the BO methods can be summarized into four steps as follows:
Apply operator on both sides of the SPDE:
Apply operator on both sides of the SPDE:
Due to the orthogonality of , they form a valid set of basis in the physical space . The random coefficients are also linearly independent as they are orthogonal under the BO representation, and the DO representation is equivalent to the BO representation so will not degenerate in the DO expansion either. Therefore, the random coefficients form a valid set of basis in the probability space . Consequently, Eq. 28–30 are the weak formulation of the original SPDE in the physical space and the probability space (note that Eq. 28 is the inner product of both side of the original SPDE on constant , which can be regarded as the basis in the probability space), and they provide all the necessary information to find the solution in .
4.3 NN-DO/BO Methods
In this section we formalize the algorithm of solving time-dependent stochastic PDEs using PINNs. First, we rewrite Eq. 11 as
while enforcing and . The time-dependent coefficients are scaling factors and play the role of when we compare Eq. 31 with the standard KL expansion Eq. 5. Suppose that the original SPDE is parameterized into a PDE that involves a finite set of random variables , then can be written as and can be written as . Four separate neural networks are constructed:
The neural net that takes and as the input and outputs ;
The neural net that takes as the input and outputs a -dimensional vector representing , for ;
The neural net that takes and as the input and outputs a -dimensional vector representing , for ;
The neural net that takes and as the input and outputs a -dimensional vector representing , for .
A surrogate neural net for the solution can be constructed from those four neural nets by substituting them into Eq. 31, yielding
Since the weak formulation of SPDE involves integration in both the physical and the probability spaces, the neural nets are evaluated at the physical training points and the probabilistic training points , where and are the numbers of training points. In the time domain we uniformly sample random points . Once we have constructed the computation graph, the derivatives of the quantity of interest with respect to time and space coordinate can be easily obtained via the auto-differentiation algorithm, and the integration terms can be evaluated by using a numerical quadrature rule.
The loss function is a weighted summation of four components: the weak formulation of SPDE, initial/boundary conditions, constraints on and , and the additional regularization terms. The loss function in each part consists of mean squared errors (MSEs) associated with the prescribed constraints, calculated from the sampled training points. Next, we will illustrate each of these four components of loss function and write down their explicit expressions.
4.3.1 Loss Function for the Weak Formulation of SPDE
4.3.2 Loss Function for Initial and Boundary Conditions
Let be the initial time of computation. The initial condition for the representation in Eq. 31 is similar to Eq. 16. The only difference is that are normalized to have unit variance, and the standard deviation of is assigned to be the initial value for . Here are the normalized KL modes for , and they are the initial value for . That is,
For deterministic initial condition, are set to be orthonormal bases satisfying the boundary condition, are set to be the gPC bases of with unit variance, and is set to be . The initial condition shall be imposed to the neural network by adding an extra penalty term , and it is calculated as follows:
The boundary condition is imposed by taking the weak formulation of Eq. 4 in the random space, i.e.,
Thus, the loss associated with the boundary condition is
where the expectations and covariance matrix shall be evaluated by using a numerical quadrature rule.
Note that the periodic boundary condition can be strictly imposed by modifying the neural nets and by replacing the input with the combination of and , where is the length of domain . This is because any continuous -periodic function can be written as a nonlinear function of and . This modification simplifies the loss function by removing the loss due to the periodic boundary condition.
4.3.3 Loss Function for the Constraints on and
This is the part where we can have different implementations in favor of the DO or the BO method. Both DO and BO representations require that , and thus the loss functions in both implementations should involve the term For the DO constraint, Eq. 12 should be satisfied. In addition, we require that
so that stay normalized with unit variance. The loss function for DO is:
For the BO constraints, Eq. 20 generates the following loss function:
Since we put all the scaling factor to and keep normalized, in Eq. 20 still holds true when is equal to .
4.3.4 Loss Function for Additional Regularization
Additional regularization terms shall be added to the loss function to reduce the risk of overfitting. Here we remark that it is helpful to add a penalty term from the original equation (Eq. 2) to speed up the training. The loss from the original equation is:
4.3.5 Putting the Loss Functions Together
A sketch of the computation graph for the loss functions and is shown in Figure 2. The loss function used for training the PINNs is the weighted summation of the aforementioned MSEs. Intuitively, we want to train the networks to gradually satisfy the weak formulation of the SPDE, while not violating the physical restrictions and the DO/BO constraints, as these are the cornerstones of the weak formulation. Therefore, we put a relatively large weight in front of , and , making their scale the same as that of , if not slightly larger. The idea behind this is to remove the redundancy of Eq. 31 in the first place. We put a small weight for the regularization term since it is only used to help speedup the training process, and is not essential. Nevertheless, the distribution of weights is still an open question for future research. In the numerical tests we train our neural nets by minimizing the following loss function:
This proposed algorithm can be implemented with the DO or the BO constraints, and we name them the NN-DO or NN-BO method, respectively. The proposed algorithm is summarized as follows:
We remark that the bottleneck of the original DO/BO method is to generate an explicit expression for the temporal derivatives of the bases (Step 4 in Section 4.2). For the standard DO method, it involves calculating the inverse of a covariance matrix which could be singular, and for the standard BO method, to obtain explicit expression for matrices and (Eq. 19), one has to assume no eigenvalue crossing. In the proposed NN-DO/BO algorithm, there is no need to derive explicit expressions from constraints, instead we only need to write the constraints into the loss functions as they are.
5 Simulation Results
We first test our NN-DO/BO methods with two benchmark cases that are especially designed to have exact solutions for the DO and BO representations. To demonstrate the advantage of the NN-DO/BO methods over the standard methods, we then solve a nonlinear diffusion-reaction equation with a 19-dimensional random input, where the problem is solved with very rough initial conditions given as discrete point values. Finally, an inverse problem is also considered to demonstrate the new capacity of the proposed NN-DO/BO methods. For all test cases we use deep feed-forward neural networks for , , and . The loss functions are defined in Eq. 44, and the Adam optimizer with learning rate 0.001 is used to train the networks.
5.1 Application to a Linear Stochastic Problem
In this section we present a pedagogical example by solving the linear stochastic advection equation using the NN-DO/BO methods. The stochastic advection equation with a random advection coefficient has the form
where the physical domain is and we obtain the solution until final time . Periodic boundary conditions are considered, such that , . The randomness comes from the advection velocity, which is modeled as a Gaussian random variable where we set to be 0.8.
The exact solutions for the mean and variance of the stochastic advection equation, Eq. 45, can be calculated, and the closed form formulas of the DO and BO expansion components and , can be derived ChoiThesis. Here we write down the exact solution and the expansion components without giving details of the derivation:
and the normalizing factors