Learning in Modal Space: Solving TimeDependent Stochastic PDEs Using PhysicsInformed Neural Networks
Abstract
One of the open problems in scientific computing is the longtime integration of nonlinear stochastic partial differential equations (SPDEs), especially with arbitrary initial data. We address this problem by taking advantage of recent advances in scientific machine learning and the spectral dynamically orthogonal (DO) and biorthogonal (BO) methods for representing stochastic processes. The recently introduced DO/BO methods reduce the SPDE into solving a system of deterministic PDEs and a system of stochastic ordinary differential equations. Specifically, we propose two new PhysicsInformed Neural Networks (PINNs) for solving timedependent SPDEs, namely the NNDO/BO methods. The proposed methods incorporate the DO/BO constraints into the loss function (along with the modal decomposition of the SPDE) with an implicit form instead of generating explicit expressions for the temporal derivatives of the DO/BO modes. Hence, the NNDO/BO methods can overcome some of the drawbacks of the original DO/BO methods. For example, we do not need the assumption that the covariance matrix of the random coefficients is invertible as in the original DO method, and we can remove the assumption of no eigenvalue crossing as in the original BO method. Moreover, the NNDO/BO methods can be used to solve timedependent stochastic inverse problems with the same formulation and same computational complexity as for forward problems. We demonstrate the capability of the proposed methods via several numerical examples, namely: (1) A linear stochastic advection equation with deterministic initial condition: we obtain good results with the proposed methods while the original DO/BO methods cannot be applied directly in this case. (2) Longtime integration of the stochastic Burgers’ equation: we show the good performance of NNDO/BO methods, especially the effectiveness of the NNBO approach for such problems with many eigenvalue crossings during the whole time evolution, while the original BO method fails. (3) Nonlinear reaction diffusion equation: we consider both the forward problem and the inverse problems, including very noisy initial point values, to investigate the flexibility of the NNDO/BO methods in handling inverse and mixed type problems. Taken together, these simulation results demonstrate that the NNDO/BO methods can be employed to effectively quantify uncertainty propagation in a wide range of physical problems but future work should address the efficiency issue of PINNs for forward problems.
keywords:
scientific machine learning, datadriven modeling, dynamical orthogonality, biorthogonality, uncertainty quantification, inverse problems1 Introduction
Physicsinformed neural networks (PINNs) Raissi and Karniadakis (2018) are a special class of PDEinduced networks that encode the physics (expressed by the PDE) into a deep neural network (DNN) that shares parameters with a standard DNN that approximates the quantity of interest (QoI), e.g. the solution of the PDE. In practice, this implies that the loss function that expresses mismatch in the labelled data is augmented by the residual of the PDE, which is represented efficiently by automatic differentiation and is evaluated at random points in the timespace domain. This approximation of the nonlinear operators by the DNN is justified theoretically based on the pioneering work of Chen and Chen (1993, 1995), which goes well beyond the universal function approximation theorem of Cybenko (1989). This simple and easy to program algorithm has been shown to be successful for diverse problems in physics and fluid mechanics Raissi (2018); Raissi et al. (2017, 2019), especially for inverse problems and even for discovering hidden physics Yazdani et al. (2018). The advantages of encoding the PDE itself into a DNN are multiple: (1) we require much less data to train the DNN since we are searching for the minima on the manifoldsolution of the PDE; (2) we respect the conservation laws of mass, momentum and energy; and most importantly, (3) we can truly predict the state of the system, unlike the DNNs driven solely by data that can interpolate accurately only within the training domain. While there is still a lot of work to be done to make PINNs efficient simulation machines, one of the main open issues is uncertainty quantification in predicting the QoI, which will reflect the various sources of uncertainty, i.e., from the approximation of the DNN to the data and physical model uncertainties.
In Zhang et al. (2018) we addressed the issue of total uncertainty for first time and combined dropout and arbitrary polynomial chaos to model stochasticity in steady SPDEs. Here, we consider the more difficult case of timedependent nonlinear SPDEs and hence we need to introduce a more effective way of dealing with the complexity of longtime integration of stochastic systems. To this end, we employ a generalized form of timedependent Karhunenloève (KL) decomposition, first introduced in Sapsis (2011), appropriate for secondorder random fields, which has the form:
(1) 
This approach can evolve the timedependent basis of modes and stochastic coefficients simultaneously, and it is different than the standard polynomial chaos methods Xiu and Karniadakis (2002); Xiu and Hesthaven (2005); Xiu (2010); Narayan and Zhou (2015). To remove the redundancy in this representation, we need some constraints. For example, this can be achieved by imposing dynamical constraints on the spatial basis, which is the socalled “dynamically orthogonal" (DO) methodology first proposed in Sapsis and Lermusiaux (2009, 2012). Alternatively, by imposing static constraints on both the spatial and stochastic basis, the “biorthogonal" (BO) methodology was developed in Cheng et al. (2013a, b). For both DO and BO we need to derive explicitly the evolution equations for all the components involved, i.e. the mean, spatial basis, and stochastic basis. The DO and BO formulations are mathematically equivalent Choi et al. (2014), but they exhibit computationally complimentary properties. Specifically, the BO formulation may fail due to crossing of the eigenvalues of the covariance matrix Choi et al. (2014), while both BO and DO become unstable when there is a high condition number of the covariance matrix or zero eigenvalues. A rigorous and sharp error bounds of DO method was first given by Zhou et al. in Musharbash et al. (2014), where it was shown that the DO modes can capture the effective directions. For more applications and improvements of DO/BO methods, we refer to Ueckermann et al. (2013); Choi et al. (2014); Subramani and Lermusiaux (2016); Babaee et al. (2017) and references therein.
The purpose of this paper is to combine PINNs and the DO/BO methodologies together to obtain new effective methods for solving timedependent SPDEs – we will refer to them as NNDO/BO methods. Concretely, we first build a surrogate neural net for the solution of the timedependent SPDEs based on the generalized KL expansion (Eq. 1). Then the DO/BO constraints are included into the loss function and we train the neural network by minimizing this loss function to obtain the solution. Compared with the original DO/BO method, the merits of the proposed methods are the following:

We do not need the assumption in our NNDO approach that the covariance matrix of the random coefficients is invertible, even for SPDEs with deterministic initial conditions.

We can deal with eigenvalue crossing in the given time domain when applying our NNBO approach.

The same NNDO/BO formulation and computer code can be applied for solving timedependent stochastic inverse problems or problems driven by sparse noisy data, with the same computational complexity.
The organization of this paper is as follows. In Section 2, we set up the timedependent stochastic problems. In Section 3, we give a brief review of the dynamically orthogonal and biorthogonal methodologies. In Section 4, we formulate our NNDO/BO framework after the introduction of the PINNs for solving deterministic differential equations. In Section 5, we provide a detailed study of the accuracy and performance of the NNDO/BO approach with numerical examples. We include two benchmark cases that are specifically designed to have exact solution for the DO and BO representations, followed by a nonlinear stochastic forward problem with high input stochastic dimensionality and noisy data as the initial condition, and a nonlinear inverse problem where we try to identify the model parameters. Finally, we conclude with a brief discussion in Section 6.
2 Problem Setup
Let be a probability space, where is the sample space, is the algebra of subsets of , and is a probability measure. Let be a bounded domain in ( or 3) whose boundary is denoted by , and be the time domain of interest. We consider the following timedependent SPDE:
(2) 
with initial and boundary conditions:
(3)  
(4) 
where is a differential operator and is a linear differential operator acting on the domain boundary. Assuming that our quantity of interest, , is a secondorder random field. The initial and boundary conditions for Eq. 2 are denoted by and . Our aim is to solve Eq. 2, and specifically, to evaluate the mean and standard deviation of the solution .
3 An Overview of the DO and BO Decomposition Methods
For a random field that evolves in time, the generalized KarhunenLoève (KL) expansion at a given time is
(5) 
where is the mean, and are the largest eigenvalue and the corresponding eigenfunction of the covariance kernel, i,e., they solve the following eigenproblem:
(6) 
Here is the covariance kernel of .
Next we consider a generalized expansion first proposed in Sapsis and Lermusiaux (2009):
(7) 
Similar to the KL expansion, the random field is decomposed into two parts: (i) the deterministic mean field function , and (ii) the random fluctuation part consists of an infinite summation of deterministic orthogonal fields with 0mean stochastic coefficients . Formally, we have
(8) 
(9) 
and
(10) 
We define the linear subspace as the linear space spanned by the first deterministic bases. For now let us assume that are linearly independent and is the linear subspace in spanned by the first stochastic coefficients. The truncated expansion , defined by
(11) 
is the projection of to the subspace . Without making any assumptions on their form, the governing equations Eq. 2 and Eq. 8–10 represent the only information that can be utilized to derive the evolution equations of and . Note that both the stochastic coefficients and the orthogonal bases are timedependent (and they are evolving according to the system dynamics), unlike the standard polynomial chaos where the stochastic coefficients are timeindependent. There exists some redundancy in the Eq. 11, and therefore, additional constraints need to be imposed in order to formulate a well posed problem for the unknown quantities. Here we review the DO and BO approaches, which have different assumptions on the constraints.
3.1 Dynamically Orthogonal (DO) Representation
As first proposed in Sapsis and Lermusiaux (2009), a natural constraint to overcome redundancy is that the evolution of the bases be orthogonal to the space ; this can be expressed through the following dynamically orthogonal (DO) condition:
(12) 
Comparing Eq. 11 with the classical KL expansion Eq. 5, in the DO representation, we set to have unit length and carry the scaling coefficient as the result of the eigenvalues. Note that the DO condition preserves the orthonormality and the length of the bases since
(13) 
It is proved in Sapsis and Lermusiaux (2009) that the DO condition leads to a set of independent and explicit evolution equations for all the unknown quantities. Here we state the DO evolution equations without proof:
Theorem 3.1 (see Sapsis and Lermusiaux (2009)).
Under the assumptions of the DO representation, the original SPDE (Eq. 2) is reduced to the following system of equations:
(14)  
The projection in the orthogonal complement of the linear subspace is defined as and the covariance of the stochastic coefficients is . The associated boundary conditions are determined by
(15) 
and the initial conditions at for the DO components are given by
(16)  
for all , where is the eigenfields of the classical KL expansion of .
It is shown in Sapsis and Lermusiaux (2009) that by imposing suitable restrictions on the DO representation, the equations for methods such as Polynomial Chaos (PC) or Proper Orthogonal Decomposition (POD) can be recovered from the DO evolution equations. For example, PC can be recovered by setting , where is an orthogonal polynomial in terms of . Moreover, it is shown in Choi (2014) that there exists an onetoone correspondence between the eigenvalues of the KL expansion for and the eigenvalues of the covariance matrix in the DO representation given any fixed time . Thus, the stochastic coefficients together with the modes provide the necessary information to describe both the shape and the magnitude of the uncertainty that characterizes a stochastic field but also the principle directions in which the stochasticity is distributed.
The moments of can be readily computed from the DO representation. For example, the first moment, i.e., the mean, can be trivially obtained from the first term , and the variance can be calculated as follows:
(17) 
3.2 BiOrthogonal (BO) Representation
In order to overcome the aforementioned redundancy, the BO condition imposes the static constraint, which is the biorthogonality of the spatial bases and stochastic coefficients in time Cheng et al. (2013b). In other words, we have the following conditions:
(18) 
where the s are eigenvalues of the solution. There is a slight difference between the DO and BO representation: the stochastic coefficients carry the eigenvalues of the covariance operator in the DO representation while the spatial bases carry the eigenvalues of the covariance operator in the BO representation.
Next, we define the matrix and whose entries are
(19) 
Then by taking the time derivative of Eq. 18, we have
(20)  
Here we state the BO evolution equations without proof:
Theorem 3.2 (see Cheng et al. (2013a)).
We assume that the bases and stochastic coefficients satisfy the BO condition. Then, the original SPDE (Eq. 2) is reduced to the following system of equations:
(21)  
Moreover, if for , the by matrices and have closed form expression:
(22)  
where the matrix is defined as .
Similar to the DO method, the boundary condition is given by
(23) 
and the initial condition is generated from the KL expansion of .
3.3 A Brief Summary of DO/BO Methods
Let us note the following difference between the DO and BO condition: the spatial bases, , under the DO condition evolve to the direction which is normal to the space they expand (orthogonality is automatically maintained), while under the BO condition, only the mutual orthogonality within and within are required and there is no restriction to the direction of their evolution. As a compensation to the lack of constraints in spatial bases, the BO condition puts an additional orthogonality restriction on the random coefficients . However, Choi et al. Choi et al. (2014) have proved theoretically the equivalence between the DO and the BO methods, in a sense that one method is an exact reformulation of the other via a differential transformation.
Each method can be applied to a limited range of problems since the evolution equations are valid only if certain assumptions are satisfied. For the DO method, it is assumed that the covariance matrix of random coefficients, , is invertible. Therefore, it fails when applied to some benchmark problems such as a stochastic PDE with deterministic initial condition. This is because all coefficients are equal to 0 at the initial state and their covariance matrix is undefined. For the BO method, it is assumed that there is no eigenvalue crossing in the given time domain in order to calculate the explicit expression of and . However, some strategies have been proposed to get around those issues, e.g., the hybrid gPCDO method Choi et al. (2013) and the psedoinverse hybrid BODO method Babaee et al. (2017) are developed to address the above limitations.
Inspired by the DO and BO methods, we introduce a new procedure for solving timedependent stochastic PDEs within the framework of PhysicsInformed Neural Networks (PINNs). The proposed methods inherit the similarities between the DO and the BO methods and can be implemented with either the DO or the BO version that are free from the aforementioned restrictions.
4 Methodology
4.1 Physicsinformed neural network
In this part, we briefly review using DNNs to solve the deterministic differential equations Lagaris et al. (1998, 2000); Raissi et al. (2017), and its generalization for solving deterministic inverse problems in Raissi et al. (2017). Suppose that we have a parameterized deterministic differential equation:
(24) 
where is the solution and denotes the parameters.
A DNN, denoted by , is constructed as a surrogate of the solution , and it takes the coordinate as the input and outputs a vector that has the same dimension as . Here we use to denote the DNN parameters that will be tuned at the training stage, namely, contains all the weights and biases in . For this surrogate network , we can take its derivatives with respect to its input by applying the chain rule for differentiating compositions of functions using the automatic differentiation, which is conveniently integrated in many machine learning packages such as Tensorflow Abadi et al. (2016). The restrictions on is twofold: first, given the set of scattered data of the observations, the network should be able to reproduce the observed value, when taking the associated as input; second, should comply with the physics imposed by Eq. 24. The second part is achieved by defining a residual network:
(25) 
which is computed from straightforwardly with automatic differentiation. This residual network , also named the physicsinformed neural network (PINN), shares the same parameters with network and should output the constant 0 for any input . Figure 1 shows a sketch of the PINN. At the training stage, the shared parameters (and also , if it is also to be inferred) are finetuned to minimize a loss function that reflects the above two constraints.
Suppose we have a total number of observations on , collected at location , and is the number of training points where we evaluate the residual . We shall use to represent a single instance of training data, where the first entry denotes the input and the second entry denotes the anticipated output (also called “label”). The workflow of solving a differential equation with PINN can be summarized as follows:
(26) 
(27) 
4.2 A weak formulation interpretation of the DO and BO methods
The derivation of equations for both the DO and the BO methods can be summarized into four steps as follows:

Apply operator on both sides of the SPDE:
(29) 
Apply operator on both sides of the SPDE:
(30)
Due to the orthogonality of , they form a valid set of basis in the physical space . The random coefficients are also linearly independent as they are orthogonal under the BO representation, and the DO representation is equivalent to the BO representation so will not degenerate in the DO expansion either. Therefore, the random coefficients form a valid set of basis in the probability space . Consequently, Eq. 28–30 are the weak formulation of the original SPDE in the physical space and the probability space (note that Eq. 28 is the inner product of both side of the original SPDE on constant , which can be regarded as the basis in the probability space), and they provide all the necessary information to find the solution in .
4.3 NNDO/BO Methods
In this section we formalize the algorithm of solving timedependent stochastic PDEs using PINNs. First, we rewrite Eq. 11 as
(31) 
while enforcing and . The timedependent coefficients are scaling factors and play the role of when we compare Eq. 31 with the classical KL expansion Eq. 5. Suppose that the original SPDE is parameterized into a PDE that involves a finite set of random variables , then can be written as and can be written as . Four separate neural networks are constructed:

The neural net that takes and as the input and outputs ;

The neural net that takes as the input and outputs a dimensional vector representing , for ;

The neural net that takes and as the input and outputs a dimensional vector representing , for ;

The neural net that takes and as the input and outputs a dimensional vector representing , for .
A surrogate neural net for the solution can be constructed from those four neural nets by substituting them into Eq. 31, yielding
(32) 
Since the weak formulation of SPDE involves integration in both the physical and the probability spaces, the neural nets are evaluated at the physical training points and the probabilistic training points , where and are the numbers of training points. In the time domain we uniformly sample random points . Once we have constructed the computation graph, the derivatives of the quantity of interest with respect to time and space coordinate can be easily obtained via the autodifferentiation algorithm, and the integration terms can be evaluated by using a numerical quadrature rule.
The loss function is a weighted summation of four components: the weak formulation of SPDE, initial/boundary conditions, constraints on and , and the additional regularization terms. The loss function in each part consists of mean squared errors (MSEs) associated with the prescribed constraints, calculated from the sampled training points. Next, we will illustrate each of these four components of loss function and write down their explicit expressions.
4.3.1 Loss Function for the Weak Formulation of SPDE
4.3.2 Loss Function for Initial and Boundary Conditions
Let be the initial time of computation. The initial condition for the representation in Eq. 31 is similar to Eq. 16. The only difference is that are normalized to have unit variance, and the standard deviation of is assigned to be the initial value for . Here are the normalized KL modes for , and they are the initial value for . That is,
(37)  
For deterministic initial condition, are set to be orthonormal bases satisfying the boundary condition, are set to be the gPC bases of with unit variance, and is set to be . The initial condition shall be imposed to the neural network by adding an extra penalty term , and it is calculated as follows:
(38)  
The boundary condition is imposed by taking the weak formulation of Eq. 4 in the random space, i.e.,
(39)  
Thus, the loss associated with the boundary condition is
(40)  
where the expectations and covariance matrix shall be evaluated by using a numerical quadrature rule.
Note that the periodic boundary condition can be strictly imposed by modifying the neural nets and by replacing the input with the combination of and , where is the length of domain . This is because any continuous periodic function can be written as a nonlinear function of and . This modification simplifies the loss function by removing the loss due to the periodic boundary condition.
4.3.3 Loss Function for the Constraints on and
This is the part where we can have different implementations in favor of the DO or the BO method. Both DO and BO representations require that , and thus the loss functions in both implementations should involve the term For the DO constraint, Eq. 12 should be satisfied. In addition, we require that
so that stay normalized with unit variance. The loss function for DO is:
(41)  
For the BO constraints, Eq. 20 generates the following loss function:
(42)  
Since we put all the scaling factor to and keep normalized, in Eq. 20 still holds true when is equal to .
4.3.4 Loss Function for Additional Regularization
Additional regularization terms shall be added to the loss function to reduce the risk of overfitting. Here we remark that it is helpful to add a penalty term from the original equation (Eq. 2) to speed up the training. The loss from the original equation is:
(43) 
4.3.5 Putting the Loss Functions Together
A sketch of the computation graph for the loss functions and is shown in Figure 2. The loss function used for training the PINNs is the weighted summation of the aforementioned MSEs. Intuitively, there would be too much redundancy in the expansion (Eq. 31) if the BO/DO constraints were not strictly enforced, and the training process would be meaningless. Therefore, we put large weights in the loss for DO/BO constraints and initial/boundary conditions to remove the redundancy in the first place, and we put a small weight for the regularization term since it is only used to help speedup the training process, and is not essential. Nevertheless, the distribution of weights is still an open question for future research. In the numerical tests we train our neural nets by minimizing the following loss function:
(44) 
This proposed algorithm can be implemented with the DO or the BO constraints, and we name them the NNDO or NNBO method, respectively. The proposed algorithm is summarized as follows:
We remark that the bottleneck of the original DO/BO method is to generate an explicit expression for the temporal derivatives of the bases (Step 4 in Section 4.2). For the standard DO method, it involves calculating the inverse of a covariance matrix which could be singular, and for the standard BO method, to obtain explicit expression for matrices and (Eq. 19), one has to assume no eigenvalue crossing. In the proposed NNDO/BO algorithm, there is no need to derive explicit expressions from constraints, instead we only need to write the constraints into the loss functions as they are.
5 Simulation Results
We first test our NNDO/BO methods with two benchmark cases that are especially designed to have exact solutions for the DO and BO representations. To demonstrate the advantage of the NNDO/BO methods over the classical methods, we then solve a nonlinear diffusionreaction equation with a 19dimensional random input, where the problem is solved with very rough initial conditions given as discrete point values. Finally, an inverse problem is also considered to demonstrate the new capacity of the proposed NNDO/BO methods. For all test cases we use deep feedforward neural networks for , , and . The loss functions are defined in Eq. 44, and the Adam optimizer with learning rate 0.001 is used to train the networks.
5.1 Application to a Linear Stochastic Problem
In this section we present a pedagogical example by solving the linear stochastic advection equation using the NNDO/BO methods. The stochastic advection equation with a random advection coefficient has the form
(45)  
where the physical domain is and we obtain the solution until final time . Periodic boundary conditions are considered, such that , . The randomness comes from the advection velocity, which is modeled as a Gaussian random variable where we set to be 0.8.
The exact solutions for the mean and variance of the stochastic advection equation, Eq. 45, can be calculated, and the closed form formulas of the DO and BO expansion components and , can be derived Choi (2014). Here we write down the exact solution and the expansion components without giving details of the derivation:

Exact solutions:
(46) 
DO components:
(47) where
(48) 
BO components:
(49) where
(50) and the normalizing factors
We set , and all to be 50. The data points in the time domain are sampled from a uniform distribution. The training points are equidistantly distributed in . For the training points in the stochastic space, instead of using the GaussHermite quadrature rule, we generate