Physics-informed Neural Networks for Solving Nonlinear Diffusivity and Biot’s equations
This paper presents the potential of applying physics-informed neural networks for solving nonlinear multiphysics problems, which are essential to many fields such as biomedical engineering, earthquake prediction, and underground energy harvesting. Specifically, we investigate how to extend the methodology of physics-informed neural networks to solve both the forward and inverse problems in relation to the nonlinear diffusivity and Biot’s equations. We explore the accuracy of the physics-informed neural networks with different training example sizes and choices of hyperparameters. The impacts of the stochastic variations between various training realizations are also investigated. In the inverse case, we also study the effects of noisy measurements. Furthermore, we address the challenge of selecting the hyperparameters of the inverse model and illustrate how this challenge is linked to the hyperparameters selection performed for the forward one.
physics-informed neural networks nonlinear Biot’s equations deep learning inverse problem
The volumetric displacement of a porous medium caused by the changes in fluid pressure inside the pore spaces is essential for many applications, including groundwater flow, underground heat mining, fossil fuel production, earthquake mechanics, and biomedical engineering [nick2013reactive, bisdom2016impact, lee2016pressure, juanes2016were, vinje2018fluid]. Such volumetric deformation may impact the hydraulic storability and permeability of porous material, which influences the fluid flow behavior. This multiphysics problem, i.e., the coupling between fluid flow and solid deformation, can be captured through the application of Biot’s equations of poroelasticity [biot1941general, biot1957elastic]. The Biot’s equations can be solved by analytical solutions for simple cases [terzaghi1951theoretical, wang2017theory]. In complex cases, the finite difference approximation [masson2010finite, wenzlau2009finite], finite volume discretization [nordbotten2014cell, sokolova2019multiscale], or more commonly finite element methods such as the mixed formulation or discontinuous/enriched Galerkin, [Haga2012, murad2013new, wheeler2014coupling, bouklas2015nonlinear, bouklas2015effect, choo2018enriched, liu2018lowest, liu2009coupled], can be used. These numerical methods, however, require significant computational resources, and the accuracy of the solution depends heavily on the quality of the generated mesh. Hence, these methods may not be suitable to handle an inverse problem [hansen2010discrete, hesthaven2016certified] or a forward problem with complex geometries [wang2017physics, raissi2018deep, raissi2019physics].
Recent proposals have speculated that neural-network-based approaches such as deep learning might be an appealing alternative in solving physical problems, which are governed by partial differential equations since they operate without a mesh and are scalable on multithreaded systems [kutz2017deep, wang2018model, zhang2019application, raissi2018deep, raissi2019physics]. Moreover, deep learning has been successfully applied to many applications [krizhevsky2012imagenet, alipanahi2015predicting, shen2017deep] because of its capability to handle highly nonlinear problems [lecun2015deep]. As this technique, in general, requires a significantly large data set to reach a reasonable accuracy [ahmed2015improved], its applicability to many scientific and industrial problems can be a challenge [raissi2017inferring]. Use of a priori knowledge, however, can reduce the amount of examples needed. For instance, the idea of encoding physical information into the architectures and loss functions of the deep neural networks has been successfully applied to computational fluid dynamics problems [xiao2016quantifying, wang2017physics, raissi2019physics, kutz2017deep, wang2018model, zhang2019application]. The published results illustrate that by incorporating the physical information in the form of regularization terms of the loss functions, the neural networks can be trained to provide good accuracy with a reasonably sized data set.
Since the coupled fluid and solid mechanics process is highly nonlinear and generally involves complex geometries [ruiz2019modelling, choo2018enriched, Kadeethum2019], it seems to fit well into the context of physics-informed neural networks (PINN). For this reason, we propose to apply PINN for solving the nonlinear diffusivity and Biot’s equations, both concerning forward and inverse modeling. The rest of the paper is organized as follows. The governing equations of the coupled solid and fluid mechanics are presented in the methodology section. Subsequently, the PINN architecture and its loss functions are defined. We then present forward and inverse modeling results for both the nonlinear diffusivity equation as well as Biot’s equations. We also study the impact of the stochastic variations of the training procedures on the accuracy of the predicted primary variables and estimated parameters. Finally, we conclude the findings and describe the possibilities that can enhance the capability of this model, which should be addressed in future works.
We are interested in solving the nonlinear Biot’s equations on the closed domain () which amounts to a time-dependent multiphysics problem coupling solid deformation with fluid flow. Let be the domain of interest in -dimensional space where or and bounded by boundary, . can be decomposed into displacement and traction boundaries, and , respectively, for the solid deformation problem. For the fluid flow problem, is decomposed into pressure and flux boundaries, and , respectively. In short, and represent the first-kind boundary condition or Dirichlet boundary condition (). The and , on the other hand, represent the second-kind boundary condition or Neumann boundary condition (). The time domain is denoted by with .
As just stated, the coupling between the fluid flow and solid deformation can be captured through the application of Biot’s equations of poroelasticity, which is composed of linear momentum and mass balance equations [biot1941general]. The linear momentum balance equation can be written as follows:
where is displacement, is fluid pressure, is body force. The bold-face letters or symbols denote tensors and vectors, and the normal letters or symbols denote scalar values. Here, is the total stress, which is defined as:
where is the identity tensor and is Biot’s coefficient defined as [Jaeger2010]:
with the bulk modulus of a rock matrix and the solid grains modulus . In addition, is an effective stress defined as:
where and are Lamé constants, is strain assuming infinitesimal displacements defined as:
We can write the linear momentum balance and its boundary conditions as:
where and are prescribed displacement and traction at boundaries, respectively, is a normal unit vector, and is time.
The mass balance equation is written as [coussy2004poromechanics, Kadeethum2019]:
where is fluid density, is initial porosity and remains constant throughout the simulation (the volumetric deformation is represented by ), is fluid compressibility, is a gravitational vector, is sink/source, and are specified pressure and flux, respectively, represents a nonlinear operator, and is hydraulic conductivity defined as:
where the tensor components characterize the transformation of the components of the gradient of fluid pressure into the components of the velocity vector. The , , and represent the matrix permeability in x-, y-, and z-direction, respectively. In this study, all off-diagonal terms are zero because we assume that a porous media is isotropic and homogeneous [durlofsky1991numerical, holden1992tensor]. Note that the diffusivity equation is a specific case of the Biot’s equations since Eqs (6) and (10) decouple when . The details of this equation are presented in the results and discussion section.
Physics-informed neural networks model
Neural network architecture
The neural network architecture used in this study is presented in Fig 1 [rumelhart1986learning, lecun2015deep, hinton2006reducing]. The number of input and output nodes in the neural networks are determined from the problem formulation; for example, if the problem is time-dependent poroelasticity (as discussed in the previous section) and bounded by , we have two input nodes ( and ) and two output nodes ( and ) where is coordinate in x-direction, is time, is the displacement in x-direction, and is fluid pressure. The number of hidden layers () and the number of neurons () act as so-called hyperparameters [goodfellow2016deep]. Each neuron (e.g., ) is connected to the nodes of the previous layer with adjustable weights and also has an adjustable bias. We denote the set of weights and biases as () and (), respectively. These variables are learned during a training phase [hinton2006reducing, goodfellow2016deep]. In this paper, we define all hidden layers to have the same number of neurons.
Physics-informed neural networks encode the information given by the differential operators as specific regularizing terms of the loss functions used when training the networks (see the section below). Because the training examples that are used to evaluate these extra regularizing terms, in general, are different from those used to train the network shown in Fig 1, one conceptually introduces an additional neural network - denoted the physical informed neural network [raissi2019physics]. This additional neural network is dependent on all the and of the first neural network, but it introduces some extra variables to be learned for the inverse modeling case. This is elaborated in more detail in the below section on training the PINN. Note that while the neural network shown in Fig 1 can be seen as point-to-point learning, the physics-informed regularization indirectly contributes to the local interactions (the stencil). The neural networks are built on the Tensorflow platform [tensorflow2015-whitepaper]. The results produced using either the rectified linear unit (ReLU) or the hyperbolic tangent () were comparable. Hence, we only present the results using the activation function in this paper.
We encode the underlying physical information to the neural networks through the so-called physics-informed function (), acting as additional regularizing terms in the loss function defined below.
For the linear momentum balance equation Eq (6), we define as follows:
and with reference to Eq (8) for its :
and for the mass balance equation Eq (10), we define the as:
and for its according to Eq (12)
Loss function definition
The loss function applied with the PINN scheme is composed of two parts (here we use a mean squared error - as the metric). The error on the training data () and the mean square value of the regularization term given by the physics-informed function ():
where , , , and correspond to the loss function of Eqs (15), (16), (17), and (18), respectively. The Dirichlet boundary conditions given on with respect to the linear momentum Eq (7) and mass balance Eq (11) equations are automatically incorporated into the .
A graphical presentation of how boundary points, initial points, as well as domain data, are used for training the neural networks is provided in Fig 2. For the forward model, the set of points that constitutes and contributes to the term of the loss function. We denote this set of training data as . Besides, we have a set of collocation points that are sampled from and . These training data are used to minimize the parts of the loss function, and we denote these data points as .
For the inverse model, we will have a set of known/measured data points that can be sampled from the whole domain, i.e., . We refer to this set of training data as . All this data will contribute to both the and the terms of the loss function. Also, we may include an extra set of collocation points (i.e., data where we would have no measured values), which would contribute only to the term.
Training the PINN
The structure of a PINN architecture is shown in Fig 3. We assume a simple case of two inputs, and , one output, , and one hidden layer with two neurons. Moreover, we here assume that is a function of , the first derivative of with respect to the set of inputs, and the set of physical parameters, . With we represent a training set, where the values are known for given and . This set is used to evaluate the term of Eq (19) at each training cycle. The part of Eq (19) is obtained by calculating for the collocation points ; see bottom part of Fig 3. Calculating involves knowing and its derivatives with respect to and . As these values are unknown for the collocation points, they are estimated using the current approximation of and its derivatives, which can be obtained using automatic differentiation [griewank1989automatic, corliss2002automatic]. In this way, the regularization term indirectly depends on the weights, , and biases, , of the neural architecture shown at the top of Fig 3. As a result, can be written as a function of , , , , and . If we denote this function, , we have . Note that the specific mapping of and to has been defined as the physical informed neural network in previous work [raissi2019physics] as opposed to the neural network shown in the top of Fig 3.
For both the forward and the inverse modeling cases, one trains the neural networks to establish a mapping from the input space given by and to the output space, , by minimizing and . The essential differences between the two cases come down to the type of training examples being available to train the networks, as discussed in Fig 2, and whether the physical parameters () are known or not. For the forward modeling, we apply to evaluate the term and to evaluate the term. The and are then adjusted to minimize the sum of these terms. One should emphasize that for the forward modeling, all the variables to be learned belong to the neural network depicted in the top of Fig 3.
For the inverse modeling, the aim is to estimate . We still, however, train a neural network to predict (similar to the forward case). That is, we are not using a loss function that involves measuring a distance between estimated values of and their ground truth values. Instead, the reasoning behind solving the inverse problem is that we expect the unknown to converge towards their true values during training because we also allow to be adjusted along with and . During a training phase, these variables are learned to minimize the combined sum of and . Specifically, the variables are adjusted by backpropagating the errors as calculated by Eq. (19) [hinton2006reducing, goodfellow2016deep]. Unlike the forward problem, the boundary and initial conditions are unknown and cannot be used to generate training examples. Instead, we provide training examples that, in real cases, would be obtained from measurements, which ideally correspond to solution points of the forward problem inside (see Fig 2b).
As discussed above, the solution of the inverse problem is based on training the neural network to establish a mapping from and to as we do in the forward case. This means that the hyperparameters estimated from the forward modeling may act as qualified estimates for the hyperparameters in the inverse case, assuming the number of training examples in both cases are similar. By definition of the inverse problem, we do not know the true values of , so we would have to assume that the exact values of have a limited influence on the hyperparameters. A more direct way of estimating the hyperparameters in the inverse case would be to divide the data into training and validation sets and then select the hyperparameters that minimize with respect to learning the mapping from and to . The downside of this is that we could then not spend all the measurement data on training the neural networks.
Results and Discussion
We first consider the results of forward and inverse models of the nonlinear diffusivity equation. Subsequently, we present the results of the nonlinear Biot’s equations for both the forward and inverse models.
Nonlinear diffusivity equation
We assume to decouple the momentum and mass balance equations and focus on Eq (10), which is reduced to
We take , , and choose the exact solution in as:
where and represent points in x-direction and time domain, respectively. The is assumed to be a scalar in this case, i.e., . All the physical constants are set to ; and subsequently, is chosen as:
to satisfy the exact solution. Furthermore, the homogeneous boundary conditions are applied to all boundaries and initial conditions using Eq (24). Combining Eqs (23) to (26), the physics-informed function () for the nonlinear diffusivity equation is defined:
We generate the solution based on an interval mesh having 2559 equidistant spatial intervals and 99 temporal ones; hence, in total, we have 256000 solution points, including the points on boundaries. Subsequently, we randomly draw training examples. Half of the remaining points are randomly selected as a validation set, and the remaining ones are used for testing. As an illustration, if we have 256000 solution points and use 100 examples to train the model. We then use 127950 examples for the validation and 127950 examples for the test set.
For the nonlinear diffusivity equation, the forward modeling with a neural network should calculate the at any given and from provided values of , , and . For the inverse case, the aim is to infer , , and from observed values of , , and . The architecture of the neural network corresponding to the top of Fig 3 is illustrated in Fig 4. We have two input nodes and one output node for this case. We use L-BFGS [liu1989limited]; a quasi-Newton, fullbatch gradient-based optimization algorithm to minimize the loss function with stop criteria as where and are previous and current iteration, respectively. The L-BFGS algorithm has several advantages that are suitable for this study; for example, it is more stable compared to stochastic gradient descent (SGD) and can handle large batch sizes well [liu1989limited, malouf2002comparison]. This setting is used for all the simulations presented in this paper unless stated otherwise.
To reiterate, the sampling strategies of the forward and inverse modelings are illustrated in Fig 2. To be specific, in the forward case, we must determine the solution of the partial differential equation based on known boundary values for the time interval of combined with the initial values at . These boundary and initial points are used to calculate the term of Eq. 19. The inner points act as collocation points and are used to calculate the term of Eq. 19. Note that for the forward case, we can pick as many points as we wish to calculate for both terms of the loss function. For the inverse model, we know a set of and their corresponding values of in advance. These examples are used to calculate the and terms of Eq. 19. In a real setting, the amount of training examples is limited by the available measurements.
Verification of the forward model
Following Eq (19) we obtain:
where refer to the initial and boundary training data on while denote the collocation points for , which are sampled using the Latin Hypercube method provided within the pyDOE package [baudin2015pydoe]. The is the combined number of initial and boundary training data, and is the number of collocation points.
In Table 1, we explore the accuracy of the PINN as a function of the number of training examples, i.e., different numbers of and for a given size of the network; and are fixed to four and ten, respectively. Due to the stochastic behavior of the training procedure of the neural networks, we calculate the results as an average over many realizations as it otherwise might be difficult to observe a clear trend of the relative error as a function of the number of training examples (see Panel A of Table 1). From Panels B and C of Table 1, we observe that the average accuracy of the PINN is improved as and are increased. Moreover, the combined number of and to obtain an average value of the error below is in the order of 100s. To illustrate the impact of terms, we consider the case where we simply train a neural network to interpolate a feed-forward solution based on a known set of solution points. Without regularization we obtain an average relative error of based on 96 training examples while the average error becomes less than when including .
Next, we perform a sensitivity analysis to explore how the PINN performance depends on the hyperparameters, and . We do this using a fixed size of the training set with and equal to 96 and 160, respectively. Ideally, the explored space of the hyperparameters should reveal a parameter set that produces a minimum of the loss function on a validation set. To deal with the stochastic nature of the training procedure, again, we calculate average values obtained over many realizations. The results are presented in Table 2. From panel C, we observe that selecting and corresponds to the best accuracy in the explored space. For a smaller and larger size of the architecture, the accuracy decreases, corresponding to underfitting and overfitting, respectively [tetko1995neural]. We now apply the trained PINN model using , , , and to the test set. The relative error is 1.43 , which is comparable to that of the validation set (see Table 2 - Panel C).
For the selected model (, , = 96, and = 160), we illustrate the behavior of the mean square training error of this model as function of the training iterations in Fig 5. One can observe that is higher than in the beginning, but later it approaches zero faster than . The converges steadily without any oscillations.
Inverse model of diffusivity equation
We rewrite Eq (27) in the parametrized form [hesthaven2016certified] as presented below:
Unlike the forward mode, where the weights and biases are the unknown parameters to be learned, we now have two more unknowns, and . Using Eq (19) we have:
where refers to the set of training data and is the number of training data. In contrast to the forward model, we do not need to specify specific collocation points to activate the dependent loss term; here we can just use the training examples used to calculate . All physical constants are set to one; hence, and . Note that and are considered constant throughout the domain.
We use the hyperparameters obtained from the sensitivity analysis of the forward model (see Panel C of Table 2), i.e., and . In Table 3, we illustrate that this choice also yields the least error of and percentage errors of and for the inverse problem, supporting the heuristic arguments presented above in the section "Training the PINN". Moreover, we find that the error of and percentage error of and are not much different with different combinations of the hyperparameters. Note that the error of and percentage errors of and presented in Table 3 represent average values over 24 realizations.
We depict the performance of the PINN model for solving the inverse problem as a function of in Fig 6. The reported error values of and are percentage errors, while the relative error is shown for . The error bars show the standard derivation (1 SD) based on 24 realizations. We observe that a minimum of is required by the PINN model to avoid substantial stochastic fluctuations in the estimated values of and . Moreover, we observe that the PINN model with provides average percentage errors of and less than 1 %, but with a large standard deviation. Over 24 realizations of the trained PINN with , the minimum percentage errors are and and the maximum are and , for and , respectively. To obtain 200 training examples in a real setting, i.e., lab experiments or field observations, is realistic; hence, the results illustrate the feasibility of PINN to solve the inverse problem based on a reasonably sized data set.
Next, we perform a systematic study of the effect of additive noise in data, which is created from the true data as follows [raissi2019physics]:
where and is the vector of the data with and without noise, respectively. The determines the noise level, represents a standard deviation operator, is a random value, which is sampled from the Gaussian distribution with mean and standard deviation of zero and one, respectively. The noise generated from this procedure is fully random and uncorrelated. The results are presented in Table 4 as average values over 10 training realizations. We can observe that the error increases with the noise level () while the error decreases as the is increased. As expected, the PINN model requires more data to accurately approximate the unknown physical parameters when the noise level is high.
Nonlinear Biot’s equations
From the nonlinear diffusivity equation section, we have shown that the PINN model can solve forward and inverse problems. We then progress to the multiphysics problem represented by the nonlinear Biot’s equations. We take , , and choose the exact solution in as:
for the displacement variable where and are displacements in x- and y-direction, respectively. Note that as we focus on the 2-Dimensional domain; therefore, is composed of two spatial components. For the pressure variable, we choose
Here , , and represent points in x-, y-direction, and time domain, respectively. The is chosen as:
where represent initial rock matrix conductivity. Again, we assume to be a scalar in this case, i.e., . The is the total volumetric strain defined as:
The choice of function is selected to represent the change in a volumetric strain that affects the porous media conductivity, and it is adapted from [Du2007, abou2013petroleum, kadeethum2019comparison]. All the physical constants are set to ; and subsequently, is chosen as:
We generate the exact solution points, Eqs (36) and (37), based on a rectangular mesh () with 99 equidistant intervals in both x- and y-direction, i.e. . Using 49 equidistant temporal intervals, in total, we have 500000 examples. Similar to the diffusivity equation case, we draw training examples randomly. Subsequently, we split the remaining examples equally for validation and test sets. Again, assuming we have 500000 solution points for the sake of illustration, we use 100 examples to train the model; we then have 249950 examples for both the validation and the test sets.
To recap, the forward modeling of Biot’s system aims to predict the displacement () and pressure () by specifying the initial and boundary conditions, collocation points, and as well as the physical parameters (, , , , , , and ). The inverse modeling, however, aims to estimate the physical parameters from observed values of and with their corresponding values of , , and .
In the case of the nonlinear Biot’s equations, the architecture of the neural network corresponding to the top of Fig 3 is presented in Fig 7. We have three input nodes and three output nodes for this case. For the forward modeling, the hyperparameters and are found using a sensitivity analysis, and as argued in the above section, "Training the PINN," we apply the same hyperparameters for the inverse model. Again, we use L-BFGS; a quasi-Newton, fullbatch gradient-based optimization algorithm to minimize the loss function [liu1989limited] for the forward model. For the inverse problem, we find that combining ADAM, stochastic gradient descent, and L-BFGS might provide faster convergence when training the neural network. Specifically, we use ADAM for the first 10000 iterations and then continue using L-BFGS until the stop criterion is met. Note that we apply the same stop criterion as described above for the diffusivity case. Since the ADAM is a first-order method compared to L-BFGS, which is a second-order model, ADAM is less computationally demanding. Initially, where the weights of the neural network are far from convergence, we speculate that the less computational effort of ADAM is an advantage. However, as we approach the minimum, L-BFGS is likely to provide a better estimate of the steepest descent. Whether these observations could be made when dealing with other types of partial differential equations is an open question, but the use of a similar combined optimization scheme has been reported in the literature for the case of the Navier-Stokes equations [raissi2019physics].
Verification of the forward model
Again, we apply Eq. (19) and obtain:
where refer to the initial and boundary training data. and specify the collocation points for and , as defined in Eqs (15) and (17). Similar to the diffusivity equation case, these collocation points are sampled using the Latin Hypercube method [baudin2015pydoe]. denotes the number of initial and boundary training data, and and are the number of collocation points for and , respectively. For the sake of simplification, in this investigation, we assume and .
In Fig 8, we illustrate an example of exact solutions of , , and , and compare them to the prediction values obtained from our test set using the PINN trained with , , , and . This figure demonstrates that the PINN provides good approximations of the exact solutions. The dependency of the prediction accuracy on and with the and fixed to four and ten, respectively is illustrated in Table 5. Again, to deal with the stochastic behavior of the neural networks, we calculate an average of the relative error over many realizations to obtain a clear pattern; see Panels B and C of Table 5. Similar to the diffusivity equation case, we observe the accuracy of the PINN is improved when and are increased. We also note that the total amount of training examples required to achieve high accuracy, i.e., an average error less than , is in the order of 100s. Again, we can illustrate the impact of the terms by considering the case where we train a neural network to interpolate a feed-forward solution based on a known set of solution points with and without making use of the regularization terms. Without using we obtain an average relative error of based on 96 solution points while the average error becomes less than when including .
In Table 6, we present a sensitivity analysis of and with a fixed size of the training set; and . Once again, the observed trend becomes more apparent by averaging over many training realizations. We can now identify an extremum in the explored space of the hyperparameters corresponding to have and ; see Panel C of Table 6. We then apply all of the 27 PINN networks trained with that choice of hyperparameters to the test set. We obtain the average error of the test set to be 9.05 , which is comparable to that of the validation set, 9.08 .
In Fig 9, we show the behavior of the different loss terms as function of the training iterations, when using ,