PDENet: Learning PDEs from Data
Abstract
Partial differential equations (PDEs) play a prominent role in many disciplines, such as applied mathematics, physics, chemistry, material science, and computer science. PDEs are commonly derived based on physical laws or empirical observations. However, the governing equations for many complex systems in modern applications are still not fully known. With the rapid development of sensors, computational power, and data storage in the past decade, huge quantities of data can be easily collected and efficiently stored. Such vast quantities of data offer new opportunities for data-driven discovery of hidden physical laws. Inspired by the latest developments in neural network design in deep learning, we propose a new feedforward deep network, called PDENet, to fulfill two objectives at the same time: to accurately predict the dynamics of complex systems and to uncover the underlying hidden PDE models. The basic idea of the proposed PDENet is to learn differential operators by learning convolution kernels (filters), and to apply neural networks or other machine learning methods to approximate the unknown nonlinear responses. Compared with existing approaches, which either assume that the form of the nonlinear response is known or fix certain finite difference approximations of the differential operators, our approach has the most flexibility by learning both the differential operators and the nonlinear responses. A special feature of the proposed PDENet is that all filters are properly constrained, which enables us to easily identify the governing PDE models while still maintaining the expressive and predictive power of the network. These constraints are carefully designed by fully exploiting the relation between the orders of differential operators and the orders of sum rules of filters (an important concept originating from wavelet theory).
We also discuss the relations of the PDENet to some existing networks in computer vision, such as Network-In-Network (NIN) and the Residual Neural Network (ResNet). Numerical experiments show that the PDENet has the potential to uncover the hidden PDE of the observed dynamics and to predict the dynamical behavior for a relatively long time, even in a noisy environment.
Zichao Long, Yiping Lu 

School of Mathematical Sciences 
Peking University, Beijing, China 
{zlong,luyiping9712}@pku.edu.cn 
Xianzhong Ma† († Equal contribution.) 

School of Mathematical Sciences, Peking University 
Beijing Computational Science Research Center 
Beijing, China 
xianzhongma@pku.edu.cn 
Bin Dong 

Beijing International Center for Mathematical Research, Peking University 
Center for Data Science, Peking University 
Beijing Institute of Big Data Research 
Beijing, China 
dongbin@math.pku.edu.cn 
1 Introduction
Differential equations, especially partial differential equations (PDEs), play a prominent role in many disciplines in describing the governing physical laws underlying a given system of interest. Traditionally, PDEs are derived based on simple physical principles, such as conservation laws and minimum energy principles, or based on empirical observations. Important examples include the Navier-Stokes equations in fluid dynamics, Maxwell's equations for electromagnetic propagation, and the Schrödinger equation in quantum mechanics. However, many complex systems in modern applications (such as many problems in climate science, neuroscience, finance, etc.) still have elusive mechanisms, and the governing equations of these systems are only partially known. With the rapid development of sensors, computational power, and data storage in the last decade, huge quantities of data can be easily collected and efficiently stored. Such vast quantities of data offer new opportunities for data-driven discovery of potentially new physical laws. One may then ask the following interesting and intriguing question: can we learn a PDE model (if one exists) from a given data set and perform accurate and efficient predictions using the learned model?
One of the earlier attempts at data-driven discovery of hidden physical laws was made by Bongard & Lipson (2007) and Schmidt & Lipson (2009). Their main idea is to compare numerical differentiations of the experimental data with analytic derivatives of candidate functions, and to apply symbolic regression and an evolutionary algorithm to determine the nonlinear dynamical system. Recently, Brunton et al. (2016), Schaeffer (2017), Rudy et al. (2017) and Wu & Zhang (2017) proposed an alternative approach using sparse regression. They construct a dictionary of simple functions and partial derivatives that are likely to appear in the unknown governing equations. Then, they take advantage of sparsity-promoting techniques to select the candidates that most accurately represent the data. When the form of the nonlinear response of a PDE is known, except for some scalar parameters, Raissi & Karniadakis (2017) presented a framework to learn these unknown parameters by introducing regularity between two consecutive time steps using Gaussian processes.
These recent works greatly advanced the progress on the problem. However, symbolic regression is expensive and does not scale well to large systems. The sparse regression method requires fixing certain numerical approximations of the spatial differentiations in the dictionary beforehand, which limits the expressive and predictive power of the dictionary. Although the framework presented by Raissi & Karniadakis (2017) is able to learn hidden physical laws using less data than the sparse regression approach, the explicit form of the PDEs is assumed to be known except for a few scalar learnable parameters. Therefore, extracting governing equations from data in a less restrictive setting remains a great challenge.
The main objective of this paper is to accurately predict the dynamics of complex systems and to uncover the underlying hidden PDE models (should they exist) at the same time, with minimal prior knowledge of the systems. Our inspiration comes from the latest developments of deep learning techniques in computer vision. An interesting fact is that some popular networks in computer vision, such as ResNet (He et al., 2016a; b), have a close relationship with PDEs (E, 2017; Haber & Ruthotto, 2017; Sonoda & Murata, 2017; Lu et al., 2017). Furthermore, the deeper the network, the more expressive power it possesses, which may enable us to learn more complex dynamics arising from fields other than computer vision. However, existing deep networks mostly emphasize expressive power and prediction accuracy. These networks are not transparent enough to reveal the underlying PDE models, although they may perfectly fit the observed data and perform accurate predictions. Therefore, we need to carefully design the network, by combining knowledge from deep learning and applied mathematics, so that we can learn the governing PDEs of the dynamics and make accurate predictions at the same time.
In this paper, we design a deep feedforward network, named PDENet, based on a generic nonlinear evolution PDE of the form $u_t = F(x, y, u, u_x, u_y, u_{xx}, u_{xy}, u_{yy}, \ldots)$; the precise form is given in (1) below.
The objective of the PDENet is to learn the form of the nonlinear response $F$ and to perform accurate predictions. Unlike the existing work, the proposed network requires only minor knowledge of the form of the nonlinear response function $F$, and requires no knowledge of the involved differential operators (except for their maximum possible order) and their associated discrete approximations. The only assumption we make is on the maximum order of the differential operators that can appear in $F$. The nonlinear response function $F$ can be learned using neural networks or other machine learning methods, while discrete approximations of the differential operators are learned using convolution kernels (i.e. filters) jointly with the learning of the response function $F$. However, if we have prior knowledge of the form of the response function $F$, we can easily adjust the network architecture by taking advantage of the additional information. This may simplify the training and improve the results. We will also discuss relations of the PDENet to some existing networks in computer vision, such as Network-In-Network (NIN) and ResNet. Details are given in Section 2.
In Section 3 and Section 4, we conduct numerical experiments on a linear PDE (a convection-diffusion equation) and a nonlinear PDE (a convection-diffusion equation with a nonlinear source). We generate a data set for each PDE using high precision numerical methods and add Gaussian noise to mimic real situations. Our numerical results show that the PDENet can uncover the hidden equations of the observed dynamics, and can predict the dynamical behavior for a relatively long time, even in a noisy environment.
A particular novelty of our approach is that we impose appropriate constraints on the learnable filters in order to easily identify the governing PDE models while still maintaining the expressive and predictive power of the network. This makes our approach different from existing deep convolutional networks, which mostly emphasize the prediction accuracy of the networks, as well as from all the existing approaches for learning PDEs from data, which assume either that the form of the response function is known or that the approximations of the differential operators are fixed. In other words, our proposed approach not only has vast flexibility in fitting observed dynamics and is able to accurately predict their future behavior, but is also able to reveal the hidden equations driving the observed dynamics. The constraints on the filters are motivated by the earlier work of Cai et al. (2012); Dong et al. (2017), where general relations between wavelet frame transforms and differential operators were established. In particular, it was observed in Dong et al. (2017) that we can relate filters and finite difference approximations of differential operators by examining the orders of sum rules of the filters (an important concept in wavelet theory, closely related to the vanishing moments of wavelet functions). These constraints on the filters may also be useful in network designs for machine learning tasks in computer vision.
2 PDENet: A Flexible Deep Architecture to Learn PDEs from Data
Given a series of measurements of some physical quantity $u(t_i, \cdot)$, $i = 0, 1, \ldots$, on the spatial domain $\Omega \subset \mathbb{R}^2$, we want to discover the governing PDEs of the data. We assume that the observed data are associated with a PDE that takes the following general form:
$$u_t(t, x, y) = F\big(x, y, u, u_x, u_y, u_{xx}, u_{xy}, u_{yy}, \ldots\big). \qquad (1)$$
Our objective is to design a feedforward network, named the PDENet, that approximates the PDE (1) in such a way that: 1) we can predict the dynamical behavior of the equation for as long a time as possible; 2) we are able to reveal the form of the response function $F$ and the differential operators involved. There are two main components of the PDENet that are combined together in the same network: one is the automatic determination of the differential operators involved in the PDE and their discrete approximations; the other is the approximation of the nonlinear response function $F$. In this section, we start with a discussion of the relation between convolutions and differentiations in the discrete setting.
2.1 Convolutions and Differentiations
A comprehensive analysis of the relations between convolutions and differentiations within the variational and PDE framework was laid out by Cai et al. (2012) and Dong et al. (2017), where the authors established general connections between the PDE-based approach and the wavelet frame based approach for image restoration problems. We demonstrate one of the key observations of their work using a simple example. Consider the 2-dimensional Haar wavelet frame filter bank, which contains one low-pass filter $a_{0,0}$ and three high-pass filters $a_{0,1}$, $a_{1,0}$ and $a_{1,1}$ (up to sign conventions):
$$a_{0,0} = \frac14\begin{pmatrix}1&1\\1&1\end{pmatrix},\quad a_{0,1} = \frac14\begin{pmatrix}-1&1\\-1&1\end{pmatrix},\quad a_{1,0} = \frac14\begin{pmatrix}-1&-1\\1&1\end{pmatrix},\quad a_{1,1} = \frac14\begin{pmatrix}1&-1\\-1&1\end{pmatrix}.$$
The associated Haar wavelet frame transform on an image $u$ is defined by $Wu = \{a_{i,j} \circledast u : 0 \le i, j \le 1\}$, where $\circledast$ denotes convolution. It is easy to verify using Taylor's expansion that the high-frequency coefficients of the Haar wavelet frame transform on $u$ are discrete approximations of differential operators:
$$a_{0,1} \circledast u \approx \frac{\delta x}{2}\,\frac{\partial u}{\partial x}, \qquad a_{1,0} \circledast u \approx \frac{\delta y}{2}\,\frac{\partial u}{\partial y}, \qquad a_{1,1} \circledast u \approx \frac{\delta x\,\delta y}{4}\,\frac{\partial^2 u}{\partial x\,\partial y}.$$
Here, $\delta x$ and $\delta y$ represent the horizontal and vertical spatial grid sizes respectively. For simplicity of notation, we use regular characters to denote both discrete and continuum functions, since there should be no confusion within the context.
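These Taylor-expansion identities are easy to check numerically. Below is a small sketch (the 2×2 filters, their signs, and the axis conventions are our assumptions, chosen to match the display above) verifying that the high-pass outputs recover the first derivatives of a linear function exactly:

```python
import numpy as np

# 2x2 Haar-type high-pass filters (signs are a convention)
a01 = 0.25 * np.array([[-1., 1.], [-1., 1.]])   # ~ (dx/2) du/dx
a10 = 0.25 * np.array([[-1., -1.], [1., 1.]])   # ~ (dy/2) du/dy

def filt2(u, k):
    """Valid-size 2-D correlation of u with a small filter k."""
    m, n = k.shape
    out = np.zeros((u.shape[0] - m + 1, u.shape[1] - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(u[i:i + m, j:j + n] * k)
    return out

dx = dy = 0.01
r = np.arange(0.0, 1.0, dx)
Yc, Xc = np.meshgrid(r, r, indexing="ij")   # rows ~ y, columns ~ x
u = 3.0 * Xc + 2.0 * Yc                     # du/dx = 3, du/dy = 2

ux = 2.0 * filt2(u, a01) / dx               # undo the dx/2 factor
uy = 2.0 * filt2(u, a10) / dy
```

For a linear function the approximation is exact, so `ux` and `uy` are constant arrays of 3 and 2.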
A profound relationship between convolutions and differentiations was presented in Dong et al. (2017), where the authors discussed the connection between the order of sum rules of filters and the orders of differential operators. Note that the order of sum rules is closely related to the order of vanishing moments in wavelet theory (Daubechies, 1992; Mallat, 1999). We first recall the definition of the order of sum rules.
Definition 2.1 (Order of Sum Rules).
For a filter $a$, we say $a$ has sum rules of order $\alpha = (\alpha_1, \alpha_2)$, where $\alpha \in \mathbb{Z}_+^2$, provided that
$$\sum_{k \in \mathbb{Z}^2} k^\beta a[k] = 0 \qquad (2)$$
for all $\beta \in \mathbb{Z}_+^2$ with $|\beta| < |\alpha|$ and for all $\beta \in \mathbb{Z}_+^2$ with $|\beta| = |\alpha|$ but $\beta \neq \alpha$.
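Definition 2.1 can be checked mechanically for any small filter. The sketch below (with filter indices centred at the origin; the Sobel filter is our own illustrative choice) tests the defining moment conditions:

```python
import numpy as np

def moment(q, beta):
    """sum_k k^beta q[k] with filter indices centred at the origin."""
    n1, n2 = q.shape
    K1, K2 = np.meshgrid(np.arange(n1) - n1 // 2,
                         np.arange(n2) - n2 // 2, indexing="ij")
    return float(np.sum(K1 ** beta[0] * K2 ** beta[1] * q))

def has_sum_rules(q, alpha, tol=1e-12):
    """Check the conditions of Definition 2.1 for the multi-index alpha."""
    s = alpha[0] + alpha[1]
    for b1 in range(s + 1):
        for b2 in range(s + 1 - b1):
            m = moment(q, (b1, b2))
            if b1 + b2 < s and abs(m) > tol:
                return False
            if b1 + b2 == s and (b1, b2) != tuple(alpha) and abs(m) > tol:
                return False
    return True

# the Sobel filter has sum rules of order (0, 1): it behaves like a first
# derivative along the second index (in our indexing convention)
sobel = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
```

Running `has_sum_rules(sobel, (0, 1))` returns `True`, while `(1, 0)` fails because the $(0,1)$ moment is nonzero.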
The following proposition from Dong et al. (2017) links the orders of sum rules with orders of differential operator.
Proposition 2.1.
Let $a$ be a filter with sum rules of order $\alpha \in \mathbb{Z}_+^2$. Then for a smooth function $F(x)$ on $\mathbb{R}^2$, we have
$$\frac{1}{\varepsilon^{|\alpha|}} \sum_{k \in \mathbb{Z}^2} a[k]\, F(x + \varepsilon k) = C_\alpha \frac{\partial^\alpha}{\partial x^\alpha} F(x) + O(\varepsilon), \quad \text{as } \varepsilon \to 0, \qquad (3)$$
where $C_\alpha$ is the constant defined by
$$C_\alpha = \frac{1}{\alpha!} \sum_{k \in \mathbb{Z}^2} a[k]\, k^\alpha.$$
If, in addition, $a$ has total sum rules of order $K$ for some $K > |\alpha|$, then
$$\frac{1}{\varepsilon^{|\alpha|}} \sum_{k \in \mathbb{Z}^2} a[k]\, F(x + \varepsilon k) = C_\alpha \frac{\partial^\alpha}{\partial x^\alpha} F(x) + O(\varepsilon^{K - |\alpha|}). \qquad (4)$$
According to Proposition 2.1, an $|\alpha|$-th order differential operator $\partial^\alpha$ can be approximated (up to a constant) by the convolution of a filter having sum rules of order $\alpha$. Furthermore, according to (4), one can obtain a high order approximation of a given differential operator if the corresponding filter has total sum rules of order $K$ with $K > |\alpha|$.
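The predicted approximation order can also be observed empirically. The 1-D central-difference stencil $(-\tfrac12, 0, \tfrac12)$ has sum rules of order 1 and total sum rules of order 3, so (4) predicts an $O(\varepsilon^2)$ error; halving the grid size should divide the error by roughly 4. A small check (the periodic grid and the test function `sin` are our choices):

```python
import numpy as np

def dx_err(h_target):
    """Max error of the (-1/2, 0, 1/2) stencil applied to sin on a periodic grid."""
    n = int(round(2 * np.pi / h_target))
    h = 2 * np.pi / n
    x = h * np.arange(n)
    u = np.sin(x)
    ux = (np.roll(u, -1) - np.roll(u, 1)) / (2 * h)   # central difference
    return float(np.max(np.abs(ux - np.cos(x))))

e_coarse, e_fine = dx_err(0.1), dx_err(0.05)
```

The observed error ratio `e_coarse / e_fine` is close to 4, matching the second-order prediction $K - |\alpha| = 3 - 1 = 2$.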
Now, we introduce the concept of the moment matrix of a given filter, which will be used to constrain the filters in the PDENet. For an $N \times N$ filter $q$, define the moment matrix of $q$ as
$$M(q) = (m_{i,j})_{N \times N}, \qquad m_{i,j} = \frac{1}{i!\,j!} \sum_{k \in \mathbb{Z}^2} k_1^i k_2^j\, q[k], \qquad (5)$$
for $0 \le i, j \le N - 1$. We shall call $m_{i,j}$ the $(i,j)$-th moment of $q$ for simplicity. Combining (5) and Proposition 2.1, one can easily see that the filter $q$ can be designed to approximate any differential operator at any given approximation order by imposing constraints on $M(q)$. For example, if we want to approximate $\frac{\partial u}{\partial x}$ (up to a constant) by the convolution $q \circledast u$, we can consider the following constraints on $M(q)$:
$$M(q) = \begin{pmatrix} 0 & 0 & \star \\ 1 & \star & \star \\ \star & \star & \star \end{pmatrix} \quad \text{or} \quad M(q) = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & \star \\ 0 & \star & \star \end{pmatrix}, \qquad (6)$$
where the first index is associated with $x$. Here, $\star$ means no constraint on the corresponding entry. The constraints described by the moment matrix on the left of (6) guarantee that the approximation accuracy is at least first order, and the ones on the right guarantee an approximation of at least second order. In particular, when all entries of $M(q)$ are constrained, e.g.
$$M(q) = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
the corresponding filter can be uniquely determined, in which case we call it a "frozen" filter. In the PDENet, which shall be introduced in the next subsection, all filters are learned subject to partial constraints on their associated moment matrices.
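The moment matrix (5) is straightforward to compute. The sketch below (using our own centred-index convention, with the first index associated with $x$ as in (6)) shows that a frozen central-difference filter for $\partial/\partial x$ satisfies the second-order constraints on the right of (6):

```python
import numpy as np
from math import factorial

def moment_matrix(q, N):
    """Moment matrix (5) of a filter q, with indices centred at the origin."""
    n1, n2 = q.shape
    K1, K2 = np.meshgrid(np.arange(n1) - n1 // 2,
                         np.arange(n2) - n2 // 2, indexing="ij")
    M = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            M[i, j] = np.sum(K1 ** i * K2 ** j * q) / (factorial(i) * factorial(j))
    return M

# frozen central-difference filter for d/dx (first index associated with x)
q_dx = np.array([[0., -0.5, 0.],
                 [0.,  0.,  0.],
                 [0.,  0.5, 0.]])
M = moment_matrix(q_dx, 3)
```

Here `M[1, 0] == 1` while all other moments of total order at most 2 vanish, i.e. the filter approximates $\partial/\partial x$ to at least second order.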
It is worth noticing that the approximation property of a filter is limited by its size. Generally speaking, larger filters can approximate higher-order differential operators, or lower-order differential operators with higher approximation orders. Taking the 1-dimensional case as an example, 3-element filters cannot approximate the fifth-order differential operator, whereas 7-element filters can. In other words, the larger the filters, the stronger their representation capability. However, larger filters lead to more memory overhead and higher computational cost; in practice one must balance this trade-off.
2.2 Architecture of PDENet
Given the evolution PDE (1), we consider forward Euler as the temporal discretization. One may consider more sophisticated temporal discretizations, which lead to different network architectures. For simplicity, we focus on forward Euler in this paper.
One Block:
Let $\tilde u(t_{i+1}, \cdot)$ be the predicted value of $u$ at time $t_{i+1}$ based on the value of $u$ at $t_i$. Then, we have
$$\tilde u(t_{i+1}, \cdot) = D_0 u(t_i, \cdot) + \Delta t \cdot F\big(x,\, D_{00} u,\, D_{01} u,\, D_{10} u,\, D_{11} u,\, D_{20} u,\, D_{02} u,\, \ldots\big). \qquad (7)$$
Here, the operators $D_0$ and $D_{ij}$ are convolution operators with the underlying filters denoted by $q_0$ and $q_{ij}$. The operators $D_{10}$, $D_{01}$, $D_{20}$, etc. approximate differential operators, i.e. $D_{ij} u \approx \frac{\partial^{i+j} u}{\partial x^i \partial y^j}$. The operators $D_0$ and $D_{00}$ are average operators. The purpose of introducing these average operators, instead of using the identity, is to improve the stability of the network and enable it to capture more complex dynamics. Other than the assumption that the observed dynamics is governed by a PDE of the form (1), we assume that the highest order of the PDE is less than some given positive integer. Then, the task of approximating $F$ is equivalent to a multivariate regression problem, which can be solved by a point-wise neural network (with shared weights across the computation domain $\Omega$) or other classical machine learning methods. Combining the approximation of differential operators and of the nonlinear function $F$, we achieve an approximation framework of (7), which will be referred to as a block (see Figure 1). Note that if we have prior knowledge of the form of the response function $F$, we can easily adjust the network architecture by taking advantage of the additional information. This may simplify the training and improve the results.
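To make the structure of one block concrete, here is a minimal NumPy sketch. The frozen finite-difference initialisations, the tiny point-wise tanh MLP standing in for $F$, the choice of derivative features, and the periodic boundary handling are all our illustrative assumptions; in the actual PDENet these quantities are trained under the moment constraints of Section 2.2:

```python
import numpy as np

def periodic_conv(u, k):
    """'Same'-size correlation of u with an odd-sized filter k, periodic boundary."""
    m, n = k.shape
    out = np.zeros_like(u)
    for di in range(m):
        for dj in range(n):
            out += k[di, dj] * np.roll(u, shift=(m // 2 - di, n // 2 - dj), axis=(0, 1))
    return out

class Block:
    """One block: u(t + dt) ~ D0 u + dt * F(D01 u, D10 u, D02 u, D20 u)."""

    def __init__(self, dt, h, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.dt, self.h = dt, h
        self.D0 = np.zeros((3, 3)); self.D0[1, 1] = 1.0                   # identity init
        self.D01 = np.array([[0., 0., 0.], [-.5, 0., .5], [0., 0., 0.]])  # d/dx (2nd axis)
        self.D10 = self.D01.T                                             # d/dy (1st axis)
        self.D02 = np.array([[0., 0., 0.], [1., -2., 1.], [0., 0., 0.]])  # d2/dx2
        self.D20 = self.D02.T                                             # d2/dy2
        self.W1 = rng.normal(scale=0.1, size=(4, hidden))                 # point-wise MLP,
        self.b1 = np.zeros(hidden)                                        # weights shared
        self.W2 = rng.normal(scale=0.1, size=(hidden, 1))                 # across the grid

    def forward(self, u):
        feats = np.stack([periodic_conv(u, self.D01) / self.h,
                          periodic_conv(u, self.D10) / self.h,
                          periodic_conv(u, self.D02) / self.h ** 2,
                          periodic_conv(u, self.D20) / self.h ** 2], axis=-1)
        F = (np.tanh(feats @ self.W1 + self.b1) @ self.W2)[..., 0]
        return periodic_conv(u, self.D0) + self.dt * F

# tiny usage example on a periodic grid
h = 2 * np.pi / 64
C = np.tile(h * np.arange(64), (64, 1))   # x-coordinate along the second axis
u = np.sin(C)
blk = Block(dt=0.01, h=h)
u_next = blk.forward(u)
```

The derivative filters here are frozen to their initial finite-difference values; training would update them subject to the moment-matrix constraints.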
PDENet (Multiple Blocks):
One block only guarantees the accuracy of one-step dynamics, which does not take error accumulation into consideration. To improve the stability of the network and enable long-term prediction, we stack multiple blocks into a deep network, and call this network the PDENet (see Figure 2). The importance of stacking multiple blocks will be demonstrated in Section 3.
The PDENet can be described simply as: (1) stacking one block multiple times; (2) sharing parameters across all blocks. Given an input $u(t_0, \cdot)$, training a PDENet with $n$ blocks amounts to minimizing the accumulated error $\sum_{k=1}^{n} \| u(t_k, \cdot) - \tilde u(t_k, \cdot) \|_2^2$, where $\tilde u(t_k, \cdot)$ is the output of the $k$-th block of the PDENet with input $u(t_0, \cdot)$. Thus, a PDENet with bigger $n$ enjoys longer-time stability. Note that sharing parameters is a common practice in deep learning, which decreases the number of parameters and leads to significant memory reduction (Goodfellow et al., 2016).
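The stacking scheme and the accumulated error can be sketched abstractly; the toy `heat_step` below is an illustrative stand-in for one trained block, not the PDENet itself:

```python
import numpy as np

def stack_blocks(step, n):
    """Compose one delta-t block with itself n times (parameters shared)."""
    def rollout(u0):
        traj = [u0]
        for _ in range(n):
            traj.append(step(traj[-1]))
        return traj
    return rollout

def accumulated_loss(step, u0, targets):
    """Accumulated error over all n intermediate predictions, as in training."""
    u, loss = u0, 0.0
    for tgt in targets:
        u = step(u)
        loss += float(np.sum((u - tgt) ** 2))
    return loss

# toy stand-in for one block: explicit heat step on a periodic 1-D grid
def heat_step(u, dt=0.1, h=1.0):
    lap = (np.roll(u, 1) + np.roll(u, -1) - 2.0 * u) / h ** 2
    return u + dt * lap

u0 = np.sin(2 * np.pi * np.arange(16) / 16)
traj = stack_blocks(heat_step, 3)(u0)
```

When the targets are produced by the same step function, the accumulated loss is exactly zero; with observed data it becomes the training objective.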
Loss Function and Constraints:
Consider the data set $\{u_j(t_i, \cdot)\}$, where $j$ indicates the $j$-th solution path with a certain initial condition of the unknown dynamics. We would like to train a PDENet with $n$ blocks. For a given $n$, every pair of data $\big(u_j(t_i, \cdot),\, \{u_j(t_{i+k}, \cdot)\}_{k=1}^{n}\big)$, for each $i$ and $j$, is a training sample, where $u_j(t_i, \cdot)$ is the input and the subsequent snapshots are the labels that we need to match with the outputs of the PDENet. We select the following simple loss function for training:
$$L = \sum_{i, j} \sum_{k=1}^{n} \big\| u_j(t_{i+k}, \cdot) - \tilde u_j(t_{i+k}, \cdot) \big\|_2^2,$$
where $\tilde u_j(t_{i+k}, \cdot)$ is the output of the $k$-th block of the PDENet with $u_j(t_i, \cdot)$ as the input.
All the filters involved in the PDENet are properly constrained using their associated moment matrices. Let $q_0$ and $q_{ij}$ be the underlying filters of $D_0$ and $D_{ij}$. We impose the following constraints: for $q_0$, the $(0,0)$-th moment is frozen to 1 (so that $D_0$ is an average operator); and for each $q_{ij}$, all moments of total order at most $i + j$ are frozen, equal to 1 at the $(i,j)$-th entry and 0 elsewhere, while the higher-order moments are left free. For example, for $3 \times 3$ filters, we have
$$M(q_0) = \begin{pmatrix} 1 & \star & \star \\ \star & \star & \star \\ \star & \star & \star \end{pmatrix} \quad \text{and} \quad M(q_{10}) = \begin{pmatrix} 0 & 0 & \star \\ 1 & \star & \star \\ \star & \star & \star \end{pmatrix}.$$
To demonstrate the necessity of learnable filters, we will compare the PDENet having the aforementioned constraints on the filters with the PDENet having frozen filters. To differentiate the two cases, we shall call the PDENet with frozen filters "the FrozenPDENet".
To further increase the expressive power and flexibility of the PDENet, we may associate multiple filters to approximate a given differential operator. However, in order not to impair the identifiability of the underlying PDE model, we may select only one of the filters to provide a correct approximation to the given differential operator, as described above. The rest of the filters are constrained so that they only contribute to modifying the local truncation errors. For example, consider two filters $q_1$ and $q_2$, both associated with $\frac{\partial}{\partial x}$, and constrain their moment matrices as follows:
$$M(q_1) = \begin{pmatrix} 0 & 0 & \star \\ 1 & \star & \star \\ \star & \star & \star \end{pmatrix}, \qquad M(q_2) = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & \star \\ 0 & \star & \star \end{pmatrix}.$$
Then, $q_2 \circledast u$ is potentially a better approximation to $\frac{\partial u}{\partial x}$ (up to a constant) than $q_1 \circledast u$. However, for simplicity, we only use one filter to approximate each differential operator in this paper.
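One simple way to keep a learnable filter inside its constraint set is to project a free filter onto the affine set of filters with the prescribed moments. This is our own sketch of such a projection (the paper does not commit to a specific parameterisation):

```python
import numpy as np
from math import factorial

def moment_map(n, entries):
    """Rows of the linear map from a flattened n x n filter to the listed
    moments m_{i,j} of (5), with indices centred at the filter's origin."""
    k = np.arange(n) - n // 2
    K1, K2 = np.meshgrid(k, k, indexing="ij")
    return np.array([(K1 ** i * K2 ** j / (factorial(i) * factorial(j))).ravel()
                     for (i, j) in entries], dtype=float)

def project(q_free, entries, values):
    """Minimal L2 correction of q_free so its listed moments equal `values`."""
    n = q_free.shape[0]
    A = moment_map(n, entries)
    b = np.asarray(values, dtype=float) - A @ q_free.ravel()
    corr = A.T @ np.linalg.solve(A @ A.T, b)   # least-norm correction
    return q_free + corr.reshape(n, n)

# constrain a random 3x3 filter to the first-order d/dx pattern of (6):
# m_00 = 0, m_10 = 1, m_01 = 0, all other entries free
rng = np.random.default_rng(1)
q = project(rng.normal(size=(3, 3)), [(0, 0), (1, 0), (0, 1)], [0.0, 1.0, 0.0])
```

After the projection the constrained moments hold exactly, while the remaining degrees of freedom stay learnable.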
Novelty of the PDENet:
Different from sparse regression methods (Schaeffer, 2017; Rudy et al., 2017), which fix the numerical approximations of the differentiations in advance, using learnable filters makes the PDENet more flexible, and enables more robust approximation of unknown dynamics and longer-time prediction (see the numerical experiments in Section 3 and Section 4). Furthermore, the specific form of the response function $F$ is also approximated from the data, rather than assumed to be known in advance (as in Raissi & Karniadakis (2017)). On the other hand, by imposing constraints on the moment matrices, we can identify which differential operators are included in the underlying PDE, which in turn helps with identifying the nonlinear response function $F$. This grants transparency to the PDENet and the potential to reveal hidden physical laws. Therefore, the proposed PDENet is distinct from existing learning-based methods to discover PDEs from data, as well as from networks designed for computer vision tasks.
2.3 Initialization and training
In the PDENet, parameters can be divided into three groups:

filters to approximate differential operators;

the parameters of the pointwise neural network to approximate ;

hyperparameters, such as the number of filters, the size of filters, the number of layers, etc.
The parameters of the point-wise neural network are shared across the computation domain $\Omega$, and are initialized by random sampling from a Gaussian distribution. The filters are initialized by freezing them to their corresponding differential operators. For example, if a filter is to approximate $\frac{\partial u}{\partial x}$, we freeze it by constraining its $(1,0)$-th moment to 1 and all other moments to 0. During the training process, we release the filters by switching to the constraints described in Section 2.2.
Instead of training an $n$-block PDENet directly, we adopt layer-wise training, which improves the training speed. To be more precise, we start by training the PDENet on the first block only, and then use the result of the first block as the initialization and restart training on the first two blocks. We repeat this until all $n$ blocks are included. Note that all the parameters are shared across blocks. In addition, we add a warm-up step before the training of the first block. The warm-up step is to obtain a good initial guess of the parameters of the point-wise neural network that approximates $F$, by using frozen filters.
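The layer-wise strategy can be summarised in a few lines. The sketch below is schematic: `optimize` stands in for an inner optimizer such as L-BFGS, and the toy quadratic loss merely exercises the warm-start logic:

```python
def numeric_gd(loss, p, lr=0.1, steps=200, eps=1e-6):
    """Toy inner optimizer: central-difference gradient descent on a scalar."""
    for _ in range(steps):
        g = (loss(p + eps) - loss(p - eps)) / (2 * eps)
        p -= lr * g
    return p

def layerwise_train(make_loss, params, max_blocks, optimize):
    """Layer-wise training: optimize the 1-block objective, then warm-start the
    2-block objective from the result, and so on up to max_blocks."""
    for n in range(1, max_blocks + 1):
        params = optimize(make_loss(n), params)
    return params

# toy check: fit a scalar parameter whose n-block loss is n * (p - 2)^2
fitted = layerwise_train(lambda n: (lambda p: n * (p - 2.0) ** 2), 0.0, 3, numeric_gd)
```

Each stage starts from the previous stage's solution, so the deeper objectives begin near a good minimum rather than from a random initialization.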
2.4 Relations to some existing networks
In recent years, a variety of deep neural networks have been introduced with great success in computer vision. The structure of the proposed PDENet is similar to some existing networks, such as Network-In-Network (NIN) (Lin et al., 2013) and the deep Residual Neural Network (ResNet) (He et al., 2016a; b).
The NIN is an improvement over traditional convolutional neural networks. One of the special designs of the NIN is the use of multilayer perceptron convolution (mlpconv) layers instead of ordinary convolution layers. An mlpconv layer contains convolutions and small point-wise neural networks. Such a design improves the ability of the network to extract nonlinear features from shallow layers. The inner structure of one block of the PDENet is similar to an mlpconv layer, and the multiple-block structure is similar to the NIN structure, except for the pooling and ReLU operations.
On the other hand, each block of the PDENet has two paths (see Figure 1 and Figure 2): one for the averaged quantity $D_0 u$ and the other for the increment $\Delta t \cdot F$. This structure coincides with the "residual block" introduced in ResNet. In fact, there have been substantial studies on the relation between ResNet and dynamical systems recently (E, 2017; Haber & Ruthotto, 2017; Sonoda & Murata, 2017).
3 Numerical Studies: Convection-Diffusion Equations
Convection-diffusion equations are classical PDEs that are used to describe physical phenomena where particles, energy, or other physical quantities are transferred inside a physical system due to two processes: diffusion and convection (Chandrasekhar, 1943). Convection-diffusion equations are widely applied in many scientific areas and industrial fields, such as the dispersion of pollutants in rivers or the atmosphere, solute transfer in porous media, and oil reservoir simulation. In practical situations, the physical and chemical properties at different locations are usually not the same (called "anisotropy" in physics), so it is more reasonable for the convection and diffusion coefficients to be variables instead of constants.
3.1 Simulated Data, Training and Testing
We consider a 2-dimensional linear variable-coefficient convection-diffusion equation on $\Omega$,
$$u_t = a(x, y)\, u_x + b(x, y)\, u_y + c\, \Delta u, \qquad (x, y) \in \Omega,\ t \in [0, T], \qquad (8)$$
where $a(x, y)$ and $b(x, y)$ are prescribed variable convection coefficients and $c > 0$ is a constant diffusion coefficient.
The computation domain is discretized using a regular mesh. Data is generated by solving problem (8) using a high precision numerical scheme, with a pseudo-spectral method for spatial discretization and 4th-order Runge-Kutta for temporal discretization (with a small time step size). We assume periodic boundary conditions, and the initial value is generated from
$$u_0(x, y) = \sum_{|k_1|, |k_2| \le N} \lambda_{k_1, k_2} \cos(k_1 x + k_2 y) + \gamma_{k_1, k_2} \sin(k_1 x + k_2 y), \qquad (9)$$
where the coefficients $\lambda_{k_1, k_2}$ and $\gamma_{k_1, k_2}$ are chosen randomly and $N$ is the maximum allowable frequency. In order to mimic real-world scenarios, we add noise to the generated data. For each sample sequence $u_j$, the noise is added as
$$\widehat{u}_j(t_i, \cdot) = u_j(t_i, \cdot) + \epsilon\, \sigma\big(u_j(t_i, \cdot)\big)\, W, \qquad (10)$$
where $\epsilon$ is the noise level, $\sigma(\cdot)$ denotes the standard deviation, and $W$ represents the standard normal distribution, sampled independently at every grid point.
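The data-generation recipe (random low-frequency initial values plus additive Gaussian noise) can be sketched as follows; the exact mode weights and noise level used in the experiments are not reproduced here, so the scalings below are illustrative assumptions:

```python
import numpy as np

def random_field(n, kmax, rng):
    """Random periodic field built from trigonometric modes up to frequency
    kmax (cf. (9)); the 1/kmax**2 weight is an illustrative smoothing choice."""
    x = 2 * np.pi * np.arange(n) / n
    X, Y = np.meshgrid(x, x, indexing="ij")
    u = np.zeros((n, n))
    for k1 in range(-kmax, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            lam, gam = rng.normal(size=2) / (kmax ** 2)
            u += lam * np.cos(k1 * X + k2 * Y) + gam * np.sin(k1 * X + k2 * Y)
    return u

def add_noise(u, level, rng):
    """Additive Gaussian noise scaled by the field's standard deviation (cf. (10))."""
    return u + level * np.std(u) * rng.standard_normal(u.shape)

rng = np.random.default_rng(0)
u0 = random_field(32, 4, rng)
u0_noisy = add_noise(u0, 0.01, rng)
```

Restricting the frequencies to $|k| \le k_{\max}$ is what makes the later generalization tests (feeding the network higher-frequency initial values than seen in training) meaningful.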
Suppose we know a priori that the underlying PDE is linear with order no more than 4. Then, the response function $F$ takes the following form
$$F = \sum_{0 \le i + j \le 4} f_{ij}(x, y)\, \frac{\partial^{i+j} u}{\partial x^i \partial y^j}.$$
Each block of the PDENet can be written as
$$\tilde u(t_{k+1}, \cdot) = D_0 u(t_k, \cdot) + \Delta t \sum_{0 \le i + j \le 4} f_{ij} \odot D_{ij} u(t_k, \cdot),$$
where the $D_{ij}$ are convolution operators, $\odot$ denotes point-wise multiplication, and the $f_{ij}$ are 2D arrays which approximate the coefficient functions on $\Omega$. The approximation is achieved using piecewise quadratic polynomial interpolation with smooth transitions at the boundaries of each piece. The filters associated to the convolution operators and the coefficients of the piecewise quadratic polynomials are the trainable parameters of the network.
During training and testing, the data is generated on-the-fly, i.e. we only generate the data needed, following the aforementioned procedure, when training and testing the PDENet. In our experiments, we test two sizes of learnable filters, a smaller and a larger one. During training, we use L-BFGS, instead of SGD, to optimize the parameters. We use 28 data samples per batch to train each layer (i.e. block), and we only construct the PDENet up to 20 layers, which requires a total of 560 data samples per batch. Note that the PDENet is designed under the assumption that it approximates a nonlinear evolution PDE, which is a stronger assumption than those underlying generic deep networks. Therefore, we require less training data, and L-BFGS performs better than SGD (which is widely adopted in deep learning). Furthermore, as will be shown by our numerical results, the learned PDENet generalizes very well: it can accurately predict the dynamics even when the initial data does not come from the same distribution as in the training process.
3.2 Results and Discussions
This section presents numerical results of training the PDENet using the data set described in the previous subsection. We will specifically observe how the learned PDENet performs in terms of prediction of dynamical behavior and identification of the underlying PDE model. Furthermore, we will investigate the effects of some of the hyperparameters (e.g. size of the filters, number of blocks) on the learned PDENet.
Predicting longtime dynamics
We demonstrate the ability of the trained PDENet in prediction, which in the language of machine learning is its ability to generalize. After the PDENet with $n$ blocks is trained, we randomly generate 560 initial values based on (9) and (10), feed them to the PDENet, and measure the relative error between the predicted dynamics (i.e. the output of the PDENet) and the actual dynamics (obtained by solving (8) using a high precision numerical scheme). The error plots are shown in Figure 3. Results of longer prediction for the PDENet with learnable filters are shown in Figure 4. Some of the images of the predicted dynamics are presented in Figure 5. From these results, we can see that:

Even when trained with noisy data, the PDENet is able to perform long-term prediction (see Figure 5);

Having multiple blocks helps with the stability of the PDENet and ensures long-term prediction (see Figure 3);

The PDENet performs significantly better than the FrozenPDENet, especially for larger filters (see Figure 3);
To further demonstrate how well the learned PDENet generalizes, we generate initial values following (9) with highest frequency equal to 12, followed by adding noise as in (10). Note that the maximum allowable frequency in the training set is 9. The results of long-time prediction and the estimated dynamics are shown in Figure 6. Although oscillations are observed in the prediction, the estimated dynamics still capture the main patterns of the true dynamics.
Discovering the hidden equation
For the linear problem, identifying the PDE amounts to finding the coefficient functions $f_{ij}(x, y)$ that approximate $F$. The coefficients of the trained PDENet are shown in Figure 7. Note that several of the terms allowed by the model are absent from the PDE (8), and the corresponding coefficients learned by the PDENet are indeed close to zero. In order to have a more concise demonstration of the results, we only show a subset of the coefficients in Figure 7.
Comparing the first three rows of Figure 7, the coefficients learned by the PDENet are close to the true coefficients except for some oscillations due to the presence of noise in the training data. Furthermore, the last row of Figure 7 indicates that having multiple blocks helps with estimation of the coefficients. However, having larger filters does not seem to improve the learning of the coefficients, though it helps tremendously in prolonging predictions of the PDENet.
Quick Summary:
In summary, the numerical experiments show that the PDENet is able to conduct accurate prediction and identify the underlying PDE model at the same time, even in a noisy environment. Multiple blocks, i.e. a deeper structure of the PDENet, make the network more stable and enable longer-time prediction. Comparisons between the PDENet and the FrozenPDENet demonstrate the importance of using learnable, yet partially constrained, filters, which is new to the literature. Furthermore, using larger filters helps with stability and can prolong reliable predictions.
4 Numerical Studies: Diffusion Equations with Nonlinear Source
When modeling physical processes like particle transportation or energy transfer, in addition to convection and diffusion, we have to consider source/sink terms. In some problems, the source/sink plays an important role. For example, when convection-diffusion equations are used to describe the distribution and flow of pollutants in water or the atmosphere, identifying the intensity of the pollution source is equivalent to finding the source term, which is important for environmental pollution control problems.
4.1 Simulated Data, Training and Testing
We consider a 2-dimensional linear diffusion equation with a nonlinear source on $\Omega$,
$$u_t = c\, \Delta u + f(u), \qquad (x, y) \in \Omega,\ t \in [0, T], \qquad (11)$$
where $c > 0$ is a constant diffusion coefficient and $f$ is a nonlinear source term. The computation domain is discretized using a regular mesh. Data is generated by solving problem (11) using forward Euler for temporal discretization (with a small time step size) and central differencing for spatial discretization on a finer mesh, and then restricted to the computation mesh. We assume zero boundary conditions, and the initial value is generated from (9) with maximum allowable frequency 6. As in the numerical setting of Section 3, Gaussian noise is added to each sample sequence as described by (10).
Suppose we know a priori that the underlying PDE is a convection-diffusion equation of order no more than 2, with a nonlinear source depending only on the variable $u$. Then, the response function takes the following form
$$F = \sum_{0 \le i + j \le 2} f_{ij}(x, y)\, \frac{\partial^{i+j} u}{\partial x^i \partial y^j} + f(u).$$
Each block of the PDENet can be written as
$$\tilde u(t_{k+1}, \cdot) = D_0 u(t_k, \cdot) + \Delta t \Big( \sum_{0 \le i + j \le 2} f_{ij} \odot D_{ij} u(t_k, \cdot) + f\big(D_{00} u(t_k, \cdot)\big) \Big),$$
where the $D_{ij}$ are convolution operators, $\odot$ denotes point-wise multiplication, and the $f_{ij}$ are 2D arrays which approximate the coefficient functions on $\Omega$. The approximation is achieved using piecewise quadratic polynomial interpolation with smooth transitions at the boundaries of each piece. The approximation of the source $f$ is obtained by a piecewise 4th-order polynomial approximation over a regular grid, with 40 grid points, of the relevant interval of $u$ values. The training and testing strategy is exactly the same as in Section 3. In our experiments, a single fixed filter size is used.
4.2 Results and Discussions
This section presents numerical results of the trained PDENet using the data set described in Section 4.1. We will observe how the trained PDENet performs in terms of prediction of dynamical behavior and identification of the underlying PDE model.
Predicting long-time dynamics
We demonstrate the ability of the trained PDENet in prediction, which in the language of machine learning is the ability to generalize. The testing method is exactly the same as the one described in Section 3. Comparisons between PDENet and FrozenPDENet are shown in Figure 8, where we can clearly see the advantage of learning the filters. Long-time predictions of the PDENet are shown in Figure 9, and we visualize the predicted dynamics in Figure 10. To further demonstrate how well the learned PDENet generalizes, we generate initial values following (9) with highest frequency equal to 10, followed by adding noise as in (10). Note that the maximum allowable frequency in the training set is only 6. The results of long-time prediction and the estimated dynamics are shown in Figure 11. All these results show that the learned PDENet performs well in prediction.
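Multi-step prediction of this kind amounts to iterating the learned one-step map and comparing the rollout against a reference trajectory. A minimal sketch, with hypothetical `step_true` and `step_learned` maps standing in for the exact dynamics and a trained PDENet block:

```python
import numpy as np

def rollout(step, u0, n_steps):
    """Iterate a one-step map to produce a long-time prediction."""
    traj = [u0]
    for _ in range(n_steps):
        traj.append(step(traj[-1]))
    return np.stack(traj)

def relative_errors(pred, ref):
    """Per-step relative L2 error between predicted and reference dynamics."""
    num = np.linalg.norm((pred - ref).reshape(len(pred), -1), axis=1)
    den = np.linalg.norm(ref.reshape(len(ref), -1), axis=1)
    return num / den

# Hypothetical stand-ins: exact decay map vs. a slightly perturbed learned map.
step_true = lambda u: 0.99 * u
step_learned = lambda u: 0.99 * u + 1e-4 * np.sin(u)

u0 = np.random.default_rng(2).standard_normal((16, 16))
ref = rollout(step_true, u0, 50)
pred = rollout(step_learned, u0, 50)
errs = relative_errors(pred, ref)  # error typically accumulates with rollout length
```

Tracking such per-step errors over long rollouts is how the prediction curves in the figures of this section are produced.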
Discovering the hidden equation
For the PDE (11), identifying the PDE amounts to finding the coefficient functions $c_{ij}$ that approximate the true coefficients of the differential terms, and the function $\hat f$ that approximates the source $f$. The computed coefficients of the trained PDENet are shown in Figure 12, and the computed $\hat f$ is shown in Figure 13 (left). Note that the first-order terms are absent from the PDE (11), and the corresponding coefficients learned by the PDENet are indeed close to zero. The approximation of $f$ is more accurate near the center of the interval than near its boundary. This is because the values of $u$ in the data set are mostly distributed near the center (Figure 13, right).
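Which differential operator a learned filter represents can be read off from its moment matrix, which is the discrete counterpart of the order of sum rules used to constrain the filters. A minimal sketch with an illustrative central-difference stencil (not a trained filter from the paper):

```python
import numpy as np

def moment_matrix(k, order=3):
    """Moments m[i, j] = sum_{x,y} x**i * y**j * k[y, x] of a (2r+1)x(2r+1) filter.
    For a filter approximating d^{i+j} / dx^i dy^j, the first nonvanishing
    moment appears at index (i, j), mirroring the filter's order of sum rules."""
    r = k.shape[0] // 2
    coords = np.arange(-r, r + 1)
    m = np.zeros((order, order))
    for i in range(order):
        for j in range(order):
            m[i, j] = sum(k[y + r, x + r] * x**i * y**j
                          for x in coords for y in coords)
    return m

# Central-difference stencil for d/dx (illustrative):
dx = np.array([[0., 0., 0.], [-0.5, 0., 0.5], [0., 0., 0.]])
M = moment_matrix(dx)
# M[0, 0] vanishes (the weights sum to zero) while M[1, 0] does not,
# identifying the filter as a first-order x-derivative.
```

Inspecting the moment matrices of trained filters in this way is one route to confirming that the constrained filters indeed approximate the intended derivatives.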
5 Conclusion and Discussion
In this paper, we designed a deep feedforward network, called the PDENet, to discover the hidden PDE model from observed dynamics and to predict the dynamical behavior. The PDENet consists of two major components which are jointly trained: approximating differential operators by convolutions with properly constrained filters, and approximating the nonlinear response by deep neural networks or other machine learning methods. The PDENet is suitable for learning PDEs as general as in (1). However, if we have prior knowledge on the form of the response function $F$, we can easily adjust the network architecture to take advantage of the additional information; this may simplify the training and improve the results. As an example, we considered a linear variable-coefficient convection-diffusion equation. The results show that the PDENet can uncover the hidden equation of the observed dynamics and predict the dynamical behavior for a relatively long time, even in a noisy environment. Furthermore, having a deep structure (i.e., multiple blocks) and larger learnable filters improves the stability of the PDENet and prolongs reliable predictions. As part of future work, we will try the proposed framework on real data sets. One important direction is to uncover hidden variables which cannot be measured by sensors directly, as in data assimilation. Another direction worth exploring is to learn stable and consistent numerical schemes for a given PDE model based on the architecture of the PDENet.
Acknowledgments
Bin Dong is supported in part by NSFC 11671022. Zichao Long is supported in part by The National Key Research and Development Program of China 2016YFC0207700. Yiping Lu is supported by the elite undergraduate training program of School of Mathematical Sciences in Peking University.