Neural network closures for nonlinear model order reduction
Abstract
Many reduced order models are neither robust with respect to the parameter changes nor costeffective enough for handling the nonlinear dependence of complex dynamical systems. In this study, we put forth a robust machine learning framework for projection based reduced order modeling of such nonlinear and nonstationary systems. As a demonstration, we focus on a nonlinear advectiondiffusion system given by the viscous Burgers equation, which is a prototype setting of more realistic fluid dynamics applications with the same quadratic nonlinearity. In our proposed methodology the effects of truncated modes are modeled using a singe layer feedforward neural network architecture. The neural network architecture is trained by utilizing both the Bayesian regularization and extreme learning machine approaches, where the latter one is found to be computationally more efficient. A particular effort is devoted to the selection of basis functions considering the proper orthogonal decomposition and Fourier bases. It is shown that the proposed models yield significant improvements in the accuracy over the standard Galerkin projection models with a negligibly small computational overhead and provide reliable predictions with respect to the parameter changes.
Keywords:
Machine learning Neural networks Extreme learning machine Model order reduction Projection methods Burgers equation∎\extrafloats100
1 Introduction
The everincreasing need for accurate prediction of many complex nonlinear processes leads to very largescale dynamical systems whose simulations and analyses make overwhelming and unmanageable demands on computational resources. Since the computational cost of traditional fullorder numerical simulations is extremely prohibitive, many successful model order reduction approaches have been introduced roychowdhury1999reduced (); fang2009pod (); buffoni2006low (); lucia2004reduced (); fortuna2012model (); silveira1996efficient (); freund1999reduced (); noack2011reduced (); dyke1996modeling (); kazantzis2010new (); akhtar2015using (). The purpose of such approaches is to reduce this computational burden and serve as surrogate models for efficient computational analysis of such systems, especially in settings where the traditional methods require repeated model evaluations over a large range of parameter values. Simplifying computational complexity of the underlying mathematical model these reduced order models offer promises in many prediction, identification, design, optimization, and control applications. However, they are neither robust with respect to the parameter changes nor lowcost to handle nonlinear dependence for complex nonlinear dynamical systems encountered in a wide range of physical phenomena. Therefore, reduced order modeling (ROM) remains an open challenge and the development of efficient and reliable model order reduction techniques is of paramount importance for both fundamental and applied science. In this study, we focus on the development of an accurate and robust machine learning framework for projection based nonlinear model order reduction of such transient nonlinear systems.
The basic philosophy of projection based approaches is the alleviation of the intense computational burden of the partial differential equations obtained from governing laws benner2015survey (). These approaches endeavor to reduce the high degrees of freedom of a governing law through an expansion in a transformed space, traditionally with orthogonal bases. The choice of the transformation aids the user in the identification of sparseness and thus allows them to obtain a dense (but low dimensional) representation of the same governing law. Among the large variety of reduced order modeling strategies, proper orthogonal decomposition (POD) has emerged as a popular technique for the study of dynamical systems amsallem2008interpolation (); aubry1988dynamics (); banyay2014proper (). With respect to terminology, POD is also referred to as the KarhunenLoéve expansion loeve1955probability (), principal component analysis hotelling1933analysis () or the empirical orthogonal function lorenz1956empirical (). A successful construction of quality POD bases requires a collection of high fidelity numerical simulations of the governing equations of the dynamic system being studied if an exact solution is not available. This information is used to devise optimal bases for the transformed space. Indeed, the construction of these bases also serve as a postprocessing tool for instance as a method of extraction of the large scale coherent structures in statistical pattern recognition lumley1967structures (). This is because the global POD modes may be spanned through a considerably truncated number of bases to resolves attractors and transients as well. This study, however, is oriented more towards the feedback flow control community which attempts to used POD as a degree of freedom reduction technique. We would like to emphasize here that there exist several noteworthy approaches to ROM implementation such as Dynamic Mode Decomposition (or DMD) which is a technique for determining a lowdimensional subspace from observed data similar to POD but with a description of the dynamics on that subspace schmid2010dynamic (). Another popular technique for determining a reduced subspace is the extended DMD (or EDMD). The reader is direct to rowley2017model (); taira2017modal () for an excellent discussion of these closely related approaches.
In our investigation, we implement the POD for our ROM analysis using a Galerkin projection (GP) approach. PODGP has extensively been used to provide fast and accurate simulations of large nonlinear systems borggaard2007interval (); kunisch2001galerkin (); weller2010numerical (). This approach is devised so as to generate a reduced model where the truncated modes are evolved through time as a set of ordinary differential equations. This may be considered to be a projection type approach as it is obtained by projecting the truncated modes (obtained from our POD) onto the governing equations. We must remark here that the an orthogonal projection has been used to move our problem to the reduced subspace. A nonorthogonal projection may also be employed which is generally known as the PetrovGalerkin projection antoulas2005approximation (). Another approach one may use for nonlinear systems is given by Koopman operator theory gaspard2005chaos () which identifies a subspace where the nonlinear dynamics are linearized and uncoupled. However, in this investigation we limit ourselves to the GP approach alone. The process of truncation (which is necessary for degree of freedom reduction) introduces a limitation to our chosen approach. The truncation procedure causes a significant loss of information particularly for highly nonstationary, convective and nonlinear problems cordier2010calibration (); el2016new (). Therefore, there have been considerable efforts to mitigate the drawbacks of this loss of information such as through the process of closure modeling wang2011two (); wang2012reduced (); san2013proper (); borggaard2016goal () where arguments are made in favor of modeling the effect of the discarded modes on the preserved ones. Another promising approach is to develop strategies for constructing more representative bases san2015stabilized (); bui2007goal (); carlberg2011low (); iollo2000stability (); kunisch2010optimal (); buffoni2006low (). We note that bases obtained from orthogonal Fourier modes have also been utilized in this investigation for added comparison.
By analogy with the large eddy simulations of turbulent flows, several eddy viscosity type of closure models have been introduced for ROMs aubry1988dynamics (); noack2002low (); ullmann2010pod (); sirisup2004spectral (); bergmann2009improvement (); borggaard2011artificial (); wang2011two (); wang2012proper (); akhtar2012new (); san2013proper (); san2015stabilized (). Recently, a new closure modeling framework for stabilization of ROMs has been proposed by using a robust Lyapunov control theory. It has been demonstrated that the Lyapunovbased closure model is robust to parametric uncertainties and the free parameters are optimized using a datadriven multiparametric extremum seeking algorithm benosman2016learning (); benosman2017learning (). Other stabilization models for ROMs have been also suggested in literature noack2011reduced (); balajewicz2012stabilization (); amsallem2012stabilization (); cordier2010calibration (); lassila2013model (); rowley2005model (); imtiaz2016closure (); benosman2016learning (); xie2017approximate (); wells2017evolve (). In addition to these models, Cazemier’s closure modeling approach cazemier1997proper (); cazemier1998proper (); san2013proper () uses the concept of energy conservation to account for the truncated modes.
The focus of the present investigation will be to utilize a datadriven supervised machine learning framework for the closure (and stabilization) of a highly truncated POD implemented using the GP approach. A single layer feedforward artificial neural network is used to estimate the effect of the discarded modes on the retained ones during the time integration of the ordinary differential equations obtained using PODGP. In brief, an artificial neural network (ANN) can be used to setup a nonlinear relationship between a desired set of inputs and targets provided a large set of benchmark data for the underlying statistical relationship is available. It is therefore no surprise that this subset of the machine learning field has seen wide application in function approximation, data classification, pattern recognition and dynamic systems control applications widrow1994neural (); demuth2014neural (). Before an ANN is deployed, it must be trained to accurately capture the nonlinear relationship between its inputs and outputs. For this purpose, a high fidelity solution of the Burgers equation is used to generate the ‘true’ modes. An optimization problem is solved to minimize a desired performance function (which generally represents a measure of the suitability of a network for prediction). While we shall provide information about the machine learning framework implemented here in far greater detail within, we note that we have used the Bayesian regularization foresee1997gauss () and extreme learning machine huang2006extreme () approaches for the training of the artificial neural network. Both methods have been developed in the machine learning community for the accurate capture of statistical trends in noisy data. While the former is famed for its accuracy, the latter is attractive due to its very low computational cost. Indeed, the extreme learning machine (or ELM) proves to be very exciting as a possible means to address the need for fast learning in dynamically adaptive systems (although the topic of which is left to a separate investigation).
While there have been investigations into the suitability of machine learning techniques for the purpose of ROM implementations narayanan1999low (); khibnik2000analysis (); sahan1997artificial (); moosavi2015efficient (); moosavi2017multivariate () and these approaches are quite well studied for use in feedback flow control where they are used to generate a direct mapping of flow measurements to actuator control systems gillies1995low (); gillies1998low (); gillies2001multiple (); faller1997unsteady (); hocevar2004experimental (); efe2004modeling (); efe2005control (); lee1997application (), it is our belief that the work presented in this document represents the first utilization of an ANN as closure model in the PODGP framework. As mentioned previously, information from the time evolution of our PDE (from its analytical solution) is leveraged to provide a supervised learning framework for the ANN to approximate the error in the modal evolution through GP. A regularized training ensures that no localized behavior is observed from the trained ANN with respect to the governing equations and the data they have produced. Training data sets are generated through different realizations of the viscosity of the viscous Burgers equation with a high degree of freedom. The trained ANN architecture is then tested for its ability to stabilize modal evolution for values of viscosity that are not in the training data set. Our investigations can be summarized by the following bullet points

A machine learning based closure model is implemented in the ROM framework (using a Galerkin projection) through the use of a single layer feedforward artificial neural network. Both optimal POD and Fourier modes are used as the bases of the transformation space.

The closure is ‘trained’ using high fidelity data obtained from the true solution of the time evolution of our partial differential equation. The Bayesian regularization and extreme learning machine approaches are used for a regularized training to capture underlying statistical data.

The proposed framework is tested through a ROM framework for the viscous Burgers equation for physical parameters which were not included in the training data set.
The rest of this paper is devoted to a mathematical development of the concepts we have introduced as well as a documentation of our results.
2 Mathematical model
The viscous Burgers equation is our governing law of choice for this study. We employ the conservative form of the equation given by
(1) 
where is the Reynolds number. This equation is generally considered a lowerdimensional equivalent for the full NavierStokes equations due to its advective and diffusive behavior. It is therefore, quite common, to use the Burgers equation for the preliminary evaluation of ROM frameworks prior to deployment for fluid flow problems kunisch1999control (); san2013proper (); imtiaz2016closure (). Indeed, it may be considered a challenging convective system for ROM assessments as it characterizes localized flow structures such as shock waves.
3 Basis selection
In this section, we detail the two types of bases chosen for the transformation space of our PDE. Some of the bases obtained using our two approaches are shown in 4 and we note that they are orthogonal to each other.
3.1 Fourier basis functions
As a first choice for our orthogonal bases, we utilize the Fourier series coefficients. In order to ensure that the boundary conditions of the PDE are respected we discard the Cosine components of the series and are left with
(2) 
where is our extent of the physical domain. We remark here that these bases may not be considered optimal as they are devised without any information from our exact solutions but through intuition. The condition for orthonormality between bases can be expressed as:
(3) 
3.2 POD basis functions
In this subsection, we detail the POD approach for ROM to capture unsteady, convective and nonperiodic dynamics of governing PDEs. The reader is directed to san2013proper () for a far more detailed discussion of this topic.
A POD can be constructed from the scalar field at different times (also known as snapshots). These snapshots are either obtained by solving the governing equations through a DNS or (as in this case) through the exact solution. In the following, we will utilize the index to indicate a particular snapshot in time. For the POD approach we utilize a total of snapshots for the field variable, i.e., for . The flow field data for our governing laws may thus be represented as
(4) 
where implies a temporal averaging of a particular point value in the field and contains the fluctuating quantity. We must remark here that is a function of space alone whereas is a function of both space and time. A correlation matrix may be constructed using the fluctuating components of the snapshots to give
(5) 
where is the entire spatial domain and and refer to the and snapshots. The correlation matrix is a nonnegative symmetric square matrix of side . If we define the inner product of any two fields and as
(6) 
we may express the correlation matrix as . In this study, we use the wellknown Simpson’s 3/8 integration rule for a numerical computation of the inner products. The optimal POD basis functions are obtained on performing an eigendecomposition for the matrix. This has been shown in detail in the POD literature (see, e.g., sirovich1987turbulence (); holmes1998turbulence (); ravindran2000reduced ()). The eigenvalue problem can be written in the following form:
(7) 
where is a diagonal matrix containing the eigenvalues of this decomposition and =[ , , …,], is an orthogonal basis consisting of the eigenvectors of this decomposition. The eigenvalues are stored in descending order, . The POD basis functions can be written as
(8) 
where is the th component of eigenvector . Therefore we emphasize that the POD modes are ranked according to the magnitude of their eigenvalue. The eigenvectors must also be normalized in order to satisfy the condition of orthonormality between bases given by Eq. (3). It can be shown that, for Eq. (3) to be true for the POD bases, the eigenvector must satisfy the following equation:
(9) 
In practice, most of the subroutines for solving the eigensystem given in Eq. (7) return the eigenvector matrix having all the eigenvectors normalized to unity. In that case, the orthogonal POD bases are given by
(10) 
where is the th POD basis function. The main motivation behind the construction of a ROM using the optimal POD bases is due to the fact that modes with high magnitudes of eigenvalues retain a greater proportion of the energy of the system. Hence, it is possible to construct a lower degree of freedom approximation of our sparse PDE in the space transformed by the aforementioned eigenvectors.
4 Projection based model order reduction
For our test case given by the Burgers equation, the Galerkin projection can be carried out in the following manner.
(11) 
where are the time dependent coefficients, and are the space dependent modes. To derive the Galerkin projection, we first rewrite the Burgers equation (i.e., Eq. (1)) in the following form
(12) 
where
(13) 
is the linear operator, and
(14) 
is the nonlinear operator. By applying this projection to our nonlinear system (i.e., multiplying Eq. (12) with the basis functions and integrating over the domain), we obtain our ROM, denoted PODGPROM when using the optimal POD modes and FourierGPROM when using Fourier modes:
(15) 
where we use Eq. (4) and Eq. (11) to yield
(16) 
and substituting Eq. (16) into Eq. (11), and simplifying the resulting equation by using the condition of orthogonality given in Eq. (3), the ROM implementation can be written as follows:
(17) 
where
(18)  
(19)  
(20) 
The GPROM given by Eq. (17) consists of coupled ODEs and can be solved by a standard numerical method (such as the thirdorder RungeKutta scheme that was used in this study). The number of degrees of freedom of the system is now significantly lower. The vectors, matrices and tensors in Eqs. (18)(20) are also precomputed quantities, which results in a dynamical system that can be solved very efficiently. To complete the dynamical system given by Eq. (17), the initial condition is given by using the following projection:
(21) 
where is the physical initial condition of the problem given in Eq. (1).
5 Artificial neural networks
5.1 Network architecture
The basic structure of the simple feedforward artificial neural network consists of layers with each layer possessing a predefined number of unit cells called neurons. Each of these layers has an associated transfer function and each unit cell has an associated bias. Any input to the neuron has a bias added to it followed by activation through the transfer function. To describe this process using equations, we have for a single neuron in the layer receiving a vector of inputs from the layer given by demuth2014neural ()
(22) 
where stands for a matrix of weights linking the and layers with being the output of the layer. The output of the layer is now given by
(23) 
where is the vector of biasing parameters for the layer. Every node (or unit cell) has an associated transfer function which acts on its input and bias to produce an output which is ‘fed forward’ the network. The nodes which take the raw input value of our training data set (i.e., the nodes of the first layer in the network) perform no computation (i.e., they do not have any biasing or activation through a transfer function). The next layers are a series of unit cells which have an associated bias and activation function which perform computation on their inputs. These are called the hidden layers with the individual unit cells known as neurons. Note that it is common in literature to consider these the only layers in the network. The final layer in the network is that of the outputs. The output layers generally have a linear activation function with a bias which implies a simple summation of inputs incident to a unit cell with its associated bias. In this investigation, we have used one hidden layer of neurons between the set of inputs and targets with a TanSigmoid activation function. The TanSigmoid function can be expressed as
(24) 
The transfer function calculates the neuron’s output given its net input. In theory, any differentiable function can qualify as an activation function zhang1998forecasting (), however, only a small number of functions which are bounded, monotonically increasing and differentiable are used for this purpose. The choice of using ANN is motivated by its excellent performance as a forecasting tool dawson1998artificial (); kim2003nonlinear () and its general suitability in the machine learning and function estimation domain (e.g., see haykin2009neural () and references therein).
5.2 ANN closure modeling for ROM
Eq. (17) can be rewritten as
(25) 
where
(26) 
where we have truncated our modal quantities to a reduced number (). At this point, a closure term must be introduced to account for the effect of truncation. This gives us
(27) 
where accounts for the residual effects of the discarded modes. An ANN architecture is introduced to model this term. For the purpose of training, we need to assess what our modal quantities would evolve like if there was no truncation (i.e., a pure evolution of the PDE in a transformed space). We may obtain this by using our underlying PDE
(28) 
and applying our familiar projection to get
(29) 
which gives us
(30) 
where our true projected right hand side becomes
(31) 
One may now compare Eqs. (27) and (31) to obtain
(32) 
Thus to summarize: is our standard truncated GP obtained using either POD or Fourier bases, is the closure we would like to model, is the true projection obtained by transforming the exact solution to the new space spanned by the POD or Fourier bases.
An ANN architecture is devised which aids us in developing a nonlinear relationship between a set of inputs given by the Reynolds number (), the present time () and the truncated GP projections (i.e., ). The outputs are given by the closures (i.e., ). The architecture of our ANN is shown in Fig. (5) and is our number of hidden neurons.
5.3 Bayesian regularization
The training of a desired ANN is carried out by minimizing the error between the target and the inputs to determine a set of best fit parameters. These best fit parameters are the biases and linear weights that have captured the underlying relationship between the targets and inputs and may now be used to predict target data for inputs a posteriori. The advantage of using the ANN approach over traditional statistical regression models is that comparatively smaller data sets for training are suitable. The objective function of a neural network training is given by:
(33) 
which is a classic mean squared error of the outputs and the targets. Note that our input data may be defined as:
(34) 
where is the number of sample data. Our output data can also be represented similarly as:
(35) 
There is an important caveat to the ‘meansquarederror’ based training of neural networks: regularization techniques are imperative for the prevention of overfitting noise in our data. In our investigation, we shall use the Bayesian regularization (BR) and extreme learning machine (ELM) approaches which have become popular for their regularization ability.
The Bayesian regularization training procedure augments the performance function of the ANN training by penalizing the sum of the squared weights of the network mackay1992bayesian (); foresee1997gauss (). This is in contrast to the traditional methods which are oriented to minimizing the sum of squared errors alone (which is dangerous for noisy data as it promotes excessive localization). This modified objective function is then implemented in a Bayesian framework where the parameters in the network (i.e., the weights and the biases) are considered as random variables. Mathematically we have,
(36) 
where is the sum of squared weights and are the weight coefficients which are dynamically adjusted during training. The scalar may be mathematically represented as the sum of the Frobenius norm of the layer 1 and layer 2 weight matrices:
(37) 
If , the training aims to reduce weight sizes at the expense of network errors thereby producing a smoother network response. The main challenge is the setting of the weight coefficients after which a regular gradient based optimization technique may be used to adjust the weights of the ANN for one iteration. This is done through the computation of a quantity known as the effective number of parameters () foresee1997gauss (). Once one step of the standard LevenbergMarquardt algorithm is completed from the initial condition where , and weights are initialized according to the NguyenWidrow method nguyen1990improving (), the effective number of parameters is computed from the knowledge of the Hessian matrix available from the LevenbergMarquardt algorithmlevenberg1944method (); marquardt1963algorithm ():
(38) 
where is the total number of parameters (i.e., weights and biases of the entire network) and is given by:
(39) 
where is the Jacobian matrix of the training set errors hagan1994training () and is the vector of all network parameters (refer equation 12.36 in demuth2014neural ()). The superscript represents the matrix transpose operation. The LevenbergMarquardt algorithm, may then be utilized to obtain a new estimate of the network parameters:
(40) 
Note that this algorithm is a hybrid between the Newton’s method and steepest descent approach to function minimization with the gradient of the objective function given by:
(41) 
with being the vector of network errors. When the scalar is small, the algorithm behaves more like the Newton’s method of root finding using the Hessian Matrix . When is large, the algorithm becomes steepest descent with a small step size. In brief, the parameter is decreased after each successful step that reduced the objective function and increased only if a tentative step increases the objective function. Following this step, new estimates for the weighting coefficients are obtained from
(42) 
Now that the weighting coefficients are updated one may iterate through the aforementioned steps till a desired convergence is reached. An updated value of and , gives us a newly weighted performance function which may be minimized by a variety of gradient based optimization algorithms (here we have used the LevenbergMarquardt algorithm) to obtain a new set of optimal weights and biases at the end of each iteration. Algorithm 1 summarizes the BR training methodology. The BR algorithm is equipped with a convergence criteria given by a maximum value of (implying that the minima of the objective function has been reached) or a value of the gradient is obtained which is lower than the minimum gradient threshold specified prior to the start of the minimization process.
5.4 Extreme learning machine
In this section we detail the extreme learning machine approach to generalized single layer feedforward ANN training. This methodology was proposed in huang2006extreme () for extremely fast training of a single layer feedforward ANN based on the principles of the least squares approximation. For the ease of description, let us define a few matrices for the single layer feedforward network. We remark that is a generalization of the architecture introduced in Section 5.1. Our input matrix is
(43) 
where is the sample (out of a total of samples) of a multidimensional input vector. Our weights connecting the inputs to the middle (hidden) layer are given by
(44) 
where is the neuron (out of a total of neurons) in the hidden layer. These are initialized to be small nonzero random numbers to enforce generalization. The extreme learning machine methodology prescribes biases only for hidden layer neurons and these may be given by can be given by
(45) 
The output of the hidden layer neurons becomes
(46) 
where implies an tansigmoid activation procedure on each element of a matrix . The weight matrix of the second layer may be given as
(47) 
Our outputs of the ELM may thus be represented as
(48) 
which must be trained against a set of targets corresponding to each input vector given by
(49) 
The ELM training mechanism is given as follows. In order to calculate the matrix , we must recognize that its optimal solution should satisfy
(50) 
or by taking a transpose of both sides
(51) 
which leads us to the following expression for the optimal weights
(52) 
The weights and biases are generated initially using random numbers. The matrix given by is calculated using a generalized MoorePenrose pseudoinverse serre2002matrices (). Once the optimal weights of the second layer are obtained, our network is trained for deployment. Due to the random number values chosen for the weights in the first layer, our network is well suited to highly effective abstraction of the training process due to a smaller degree of freedom of the overall ANN. The obvious advantage of this approach compared to the BR method is the substitution of an iterative minimization to a direct least squares approximation using the pseudoinverse. The procedure for the ELM method is given in Algorithm 2.
5.5 Performance comparison between BR and ELM
Before proceeding to the results of our investigation, it would be useful to compare the computational cost of the BR and ELM training algorithms. In addition, since both algorithms are designed for regularization ability, it would also be helpful to see their training performance for noisy data. For this purpose, we choose a simple one dimensional problem to test both approaches given by:
(53) 
A training data set is devised by obtaining 51 samples of the above function at equal intervals within . In addition two other training data sets are obtained with varying amounts of noise. We use a pseudorandom number generator and multiply the random number with an amplitude to mimic noise which is added to each training data point. To sum up, our test cases are those with zero noise, with an amplitude of 0.05 and 0.1. The ELM and BR algorithms were then tested on all training data sets. Five trials for each data set were run and training times were recorded as given in Table (1). It is apparent that an increased noise in the data causes an increased duration for the convergence of the BR algorithm. On the other hand, the ELM method relies on the calculation of a pseudoinverse of a matrix, the dimensions of which depend solely on the number of hidden neurons and the size of the input data set. The regularization ability of the algorithms can be examined through the performance of the trained ANN output as shown in Fig. (1). It can be observed that the ELM algorithm generally does a better job at extracting the underlying function behavior. However, we caution the reader that this behavior is not conclusive of either algorithm and a variety of test cases must be examined for stronger conclusions.
Amplitude = 0  Amplitude = 0.05  Amplitude = 0.1  

Trial Number  ELM  BR  ELM  BR  ELM  BR 
Trial 1  0.002514  0.91155  0.002409  2.03174  0.023618  1.91132 
Trial 2  0.001187  0.51236  0.001286  2.06923  0.001208  5.29371 
Trial 3  0.001373  0.70040  0.001565  2.86361  0.001638  1.64957 
Trial 4  0.001203  0.81693  0.001230  1.33273  0.001186  2.28243 
Trial 5  0.001335  1.24820  0.001099  2.26790  0.001328  2.06521 
6 Results
This section details the results of our investigation for the viscous Burgers equation problem. Our test case is given by the following initial and boundary conditions:
(54) 
where the length of our domain and maximum time . The PDE for the Burgers equation with the aforementioned boundary and initial conditions may be solved exactly to obtain an analytical formulation for the time evolution for the field variable given by maleewong2011line ()
(55) 
where . This exact expression is used to generate snapshot data for our ROM assessments. We compute inner products using the wellknown Simpson’s rule moin2010fundamentals () and a thirdorder RungeKutta scheme is utilized for solving the resulting ROM. The reader is directed to san2013proper () for a more detailed discussion of the numerical methods used in this investigation. A spacetime evolution of our solution field is presented in Fig. (2) for an . Fig. (3) shows the distribution of the eigenvalues as a function of the total energy captured. It is apparent that retaining the eigenvectors corresponding to the first 5 eigenvalues causes a 94% capture of energy for the system and the first 20 modes contribute to 99.93% of the total energy of the system at . Note that a variation in would manifest itself with a higher number of modes needed to capture an equivalent amount of energy (if increases) and a lower number of modes (if decreases).
Before proceeding with an elaboration of our results, it is worthwhile to elaborate on the definition of our models:

FourierGPROM: Reduced order model obtained by the standard Galerkin projection to Fourier modes.

PODGPROM: Reduced order model obtained by the standard Galerkin projection to POD modes.

FourierANNROM: Proposed reduced order model with ANN closure applied to Fourier modes.

PODANNROM: Proposed reduced order model with ANN closure applied to POD modes.

FourierTrue: Projection of exact solution of PDE into space spanned by Fourier modes.

PODTrue: Projection of exact solution of PDE into space spanned by POD modes.
Fig. (6) shows the time evolution of the solution field when using a ROM implementation of the Fourier modes. On increasing the number of retained modes , one can see a convergence to the true behavior of the PDE. However, for , some oscillations can still be observed for retained modes. When using the POD modes instead, as shown in Fig. (7), a quicker convergence is observed with increasing modes. This is expected since POD bases are generated from exact snapshot data.
Fig. (8) shows the performance of the FourierGPROM and PODGPROM methods against the FourierANNROM and PODANNROM approaches for the first five modes. The true projections are also shown for the purpose of comparison. It is immediately apparent that the use of the ANN closures lends a ‘stabilization’ effect to the modal evolution. We remark that the training data for the ANN architecture includes modal evolution data sets obtained by . At , which is a value of Reynolds number within the data set, an excellent performance is observed with the ANNROM approaches virtually indistinguishable from the true modes. We note that the BR training technique has been used for these figures.
The real challenge of our proposed closure lies in its performance for values of physical parameters which do not lie in the training data set. The ability to predict modal evolution for values of the Reynolds number which are not a part of the training data set would be favorable for optimal control applications where small differences in physical parameters wouldn’t require a complete high fidelity numerical simulation prior to ROM deployment. Fig. (9) describe the time series evolution of the first three modes for the ROM implementations with and without the proposed closures. The bases used for the Galerkin projection here are the Fourier bases. The ANN closure clearly leads to a much closer alignment with the true evolution of the modes. What is notable here is that for and , the ANN closure performs excellently as well in comparison to the FourierGPROM which can be seen to deviate from the trends of the true evolution. A similar conclusion can also be drawn from Fig. (10), where we have now used the POD bases for our transformation space. It can also be observed that the POD modes show a larger deviation from the true evolution without the use of the closure.
Finally, we also detail the effect of the training algorithm (i.e., whether BR or ELM) for determining the ANN weights and consequent performance. We have utilized three values of . We remark that none of the values are a part of the training data set. Also, and are outside the range of the training data. In particular, we are interested in the performance of ELM since it represents a massive opportunity for fast training requirements in dynamical systems. For the case of , given by Fig. (11), we can see the ELM approach progressively converges to the true evolution with increasing number of neurons . For , which lies within the range of our training data set, we can see that the ELM and BR approaches are more or less equivalent with excellent agreement with the true evolution (as shown in Fig. (12)). Fig. (13) shows the challenging case of , where a slight deviation from the true solution can be seen for ELM at lower number of hidden neurons. Note that this behavior is expected since the ELM procedure effectively reduces the degrees of freedom of the single layer feedforward ANN which might prove to be a handicap for a lower number of hidden layer neurons. However, the fast training time effectively offsets the slightly higher necessity for neurons.
7 Conclusions
This work demonstrates the value of utilizing a single layer feedforward artificial neural network as a closure for reduced order models of the viscous Burgers equation. A reduced order model for the Burgers equation is created by transforming our partial differential equation using a new set of orthonormal bases (given by either the optimal POD modes or the Fourier modes). While the Fourier modes are predefined, the POD space is constructed by using snapshot data generated from the exact solution of the system we are studying. The Galerkin projection methodology is then used to evolve a severely truncated system with and without our proposed ANN closure. Our proposed closure consists of an ANN architecture with one hidden layer of neurons which is trained using ‘true’ modal evolution values generated from the exact solution of the governing law. The nonlinear relationship approximates the true value of the right hand side of the Galerkin projection system of ordinary differential equations, given the truncated modes at the current time step. It thus acts as a stabilization procedure.
Our numerical assessments reveal that our closure hypothesis proves efficient in the closure of the ROM for both Fourier and POD bases. The ANN closure is also seen to perform robustly for the modal evolution of systems with Reynolds numbers which are different from the training data set. This highlights its potential utility as a closure for ROMGP based optimal control for fluid systems. In addition, two different training approaches are utilized prior to the deployment of the ANN for the purpose of closure. The BR and ELM training methodologies are selected for this study due to their ability to give regularized networks (i.e., networks which are less sensitive to noisy data). While the BR method is exceptional for its accuracy and has become a mainstay for ANN training for noisy data sets, the ELM approach is extremely popular due its high speed in training. It is observed that both approaches are capable of stabilizing modal evolution satisfactorily. In addition, an increasing number of neurons is seen to enhance the accuracy of the ELM training. This is because it has a slightly lower degree of freedom as compared to the BR method. The ELM approach is particularly promising due its application in dynamical systems which may require fast learning. A simple one dimensional training data is also examined with artificial noise using both BR and ELM approaches and the computational benefit of ELM is clearly established.
This investigation leads us to several meaningful insights about the future of hybrid data and physics driven ROM. For the case of the one dimensional Burgers equation, our proposed closure may be considered to be an adequate augmentation to a PODROM. However, further investigations are necessary to test the performance of the closure (for example for noisy data). The use of BR and ELM (i.e., regularized training) is motivated by the desire to extract the underlying statistics of the flow (which are ultimately physics driven) and they must be ideally tested against experimental data sets with their inherent background noise. In addition, we also plan to investigate the performance of this closure for higher dimensional systems and ultimately for real world flow control problems.
References
 (1) Akhtar, I., Borggaard, J., Burns, J.A., Imtiaz, H., Zietsman, L.: Using functional gains for effective sensor location in flow control: a reducedorder modelling approach. Journal of Fluid Mechanics 781, 622–656 (2015)
 (2) Akhtar, I., Wang, Z., Borggaard, J., Iliescu, T.: A new closure strategy for proper orthogonal decomposition reducedorder models. Journal of Computational and Nonlinear Dynamics 7(3), 034,503 (2012)
 (3) Amsallem, D., Farhat, C.: Interpolation method for adapting reducedorder models and application to aeroelasticity. AIAA Journal 46(7), 1803–1813 (2008)
 (4) Amsallem, D., Farhat, C.: Stabilization of projectionbased reducedorder models. International Journal for Numerical Methods in Engineering 91(4), 358–377 (2012)
 (5) Antoulas, A.C.: Approximation of largescale dynamical systems. SIAM (2005)
 (6) Aubry, N., Holmes, P., Lumley, J.L., Stone, E.: The dynamics of coherent structures in the wall region of a turbulent boundary layer. Journal of Fluid Mechanics 192(1), 115–173 (1988)
 (7) Balajewicz, M., Dowell, E.H.: Stabilization of projectionbased reduced order models of the Navier–Stokes. Nonlinear Dynamics 70(2), 1619–1632 (2012)
 (8) Banyay, G.A., Ahmadpoor, M., Brigham, J.C.: Proper orthogonal decomposition based reduced order modeling of the very high temperature reactor lower plenum hydrodynamics. In: ASME 2014 4th Joint USEuropean Fluids Engineering Division Summer Meeting collocated with the ASME 2014 12th International Conference on Nanochannels, Microchannels, and Minichannels, pp. V01DT27A013–V01DT27A013. American Society of Mechanical Engineers (2014)
 (9) Benner, P., Gugercin, S., Willcox, K.: A survey of projectionbased model reduction methods for parametric dynamical systems. SIAM Review 57, 483–531 (2015)
 (10) Benosman, M., Borggaard, J., San, O., Kramer, B.: Learningbased robust stabilization for reducedorder models of 2D and 3D Boussinesq equations. Applied Mathematical Modelling 49, 162–181 (2017)
 (11) Benosman, M., Kramer, B., Boufounos, P.T., Grover, P.: Learningbased reduced order model stabilization for partial differential equations: Application to the coupled Burgers’ equation. In: American Control Conference (ACC), 2016, pp. 1673–1678. IEEE (2016)
 (12) Bergmann, M., Bruneau, C.H., Iollo, A.: Improvement of reduced order modeling based on POD. Computational Fluid Dynamics 2008 pp. 779–784 (2009)
 (13) Borggaard, J., Hay, A., Pelletier, D.: Intervalbased reduced order models for unsteady fluid flow. Int. J. Numer. Anal. Model 4(34), 353–367 (2007)
 (14) Borggaard, J., Iliescu, T., Wang, Z.: Artificial viscosity proper orthogonal decomposition. Mathematical and Computer Modelling 53(1), 269–279 (2011)
 (15) Borggaard, J., Wang, Z., Zietsman, L.: A goaloriented reducedorder modeling approach for nonlinear systems. Computers & Mathematics with Applications 71(11), 2155–2169 (2016)
 (16) Buffoni, M., Camarri, S., Iollo, A., Salvetti, M.V.: Lowdimensional modelling of a confined threedimensional wake flow. Journal of Fluid Mechanics 569, 141–150 (2006)
 (17) BuiThanh, T., Willcox, K., Ghattas, O., van Bloemen Waanders, B.: Goaloriented, modelconstrained optimization for reduction of largescale systems. Journal of Computational Physics 224(2), 880–896 (2007)
 (18) Carlberg, K., Farhat, C.: A lowcost, goaloriented ‘compact proper orthogonal decomposition’ basis for model reduction of static systems. International Journal for Numerical Methods in Engineering 86(3), 381–402 (2011)
 (19) Cazemier, W.: Proper orthogonal decomposition and low dimensional models for turbulent flows. Groningen (1997)
 (20) Cazemier, W., Verstappen, R., Veldman, A.: Proper orthogonal decomposition and lowdimensional models for driven cavity flows. Physics of Fluids (1994present) 10(7), 1685–1699 (1998)
 (21) Cordier, L., Majd, E., Abou, B., Favier, J.: Calibration of POD reducedorder models using Tikhonov regularization. International Journal for Numerical Methods in Fluids 63(2), 269–296 (2010)
 (22) Dawson, C.W., Wilby, R.: An artificial neural network approach to rainfallrunoff modelling. Hydrological Sciences Journal 43(1), 47–66 (1998)
 (23) Demuth, H.B., Beale, M.H., De Jess, O., Hagan, M.T.: Neural network design. Martin Hagan (2014)
 (24) Dyke, S., Spencer Jr, B., Sain, M., Carlson, J.: Modeling and control of magnetorheological dampers for seismic response reduction. Smart Materials and Structures 5(5), 565 (1996)
 (25) Efe, M., Debiasi, M., Yan, P., Ozbay, H., Samimy, M.: Control of subsonic cavity flows by neural networksanalytical models and experimental validation. In: 43rd AIAA Aerospace Sciences Meeting and Exhibit, p. 294 (2005)
 (26) Efe, M.O., Debiasi, M., Ozbay, H., Samimy, M.: Modeling of subsonic cavity flows by neural networks. In: Mechatronics, 2004. ICM’04. Proceedings of the IEEE International Conference on, pp. 560–565. IEEE (2004)
 (27) El Majd, B.A., Cordier, L.: New regularization method for calibrated POD reducedorder models. Mathematical Modelling and Analysis 21(1), 47–62 (2016)
 (28) Faller, W.E., Schreck, S.J.: Unsteady fluid mechanics applications of neural networks. Journal of Aircraft 34(1), 48–55 (1997)
 (29) Fang, F., Pain, C., Navon, I., Gorman, G., Piggott, M., Allison, P., Farrell, P., Goddard, A.: A POD reduced order unstructured mesh ocean modelling method for moderate Reynolds number flows. Ocean Modelling 28(1), 127–136 (2009)
 (30) Foresee, F.D., Hagan, M.T.: GaussNewton approximation to Bayesian learning. In: Neural Networks, 1997., International Conference on, vol. 3, pp. 1930–1935. IEEE (1997)
 (31) Fortuna, L., Nunnari, G., Gallo, A.: Model order reduction techniques with applications in electrical engineering. Springer Science & Business Media (2012)
 (32) Freund, R.W.: Reducedorder modeling techniques based on Krylov subspaces and their use in circuit simulation. In: Applied and computational control, signals, and circuits, pp. 435–498. Springer (1999)
 (33) Gaspard, P.: Chaos, scattering and statistical mechanics, vol. 9. Cambridge University Press (2005)
 (34) Gillies, E.: Lowdimensional characterization and control of nonlinear wake flows. Ph.D. thesis, PhD. Dissertation, University of Glasgow, Scotland (1995)
 (35) Gillies, E.: Lowdimensional control of the circular cylinder wake. Journal of Fluid Mechanics 371, 157–178 (1998)
 (36) Gillies, E.: Multiple sensor control of vortex shedding. AIAA journal 39(4), 748–750 (2001)
 (37) Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks 5(6), 989–993 (1994)
 (38) Haykin, S.S., Haykin, S.S., Haykin, S.S., Haykin, S.S.: Neural networks and learning machines, vol. 3. Pearson Upper Saddle River, NJ, USA: (2009)
 (39) Hocevar, M., SÌirok, B., Grabec, I.: Experimental turbulent field modeling by visualization and neural networks. Journal of Fluids Engineering 126, 316–322 (2004)
 (40) Holmes, P., Lumley, J.L., Berkooz, G.: Turbulence, coherent structures, dynamical systems and symmetry. Cambridge University Press (1998)
 (41) Hotelling, H.: Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24(6), 417 (1933)
 (42) Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1), 489–501 (2006)
 (43) Imtiaz, H., Akhtar, I.: Closure modeling in reducedorder model of Burgers’ equation for control applications. Journal of Aerospace Engineering pp. 1–15 (2016)
 (44) Iollo, A., Lanteri, S., Désidéri, J.A.: Stability properties of POD–Galerkin approximations for the compressible Navier–Stokes equations. Theoretical and Computational Fluid Dynamics 13(6), 377–396 (2000)
 (45) Kazantzis, N., Kravaris, C., Syrou, L.: A new model reduction method for nonlinear dynamical systems. Nonlinear Dynamics 59(1), 183–194 (2010)
 (46) Khibnik, A., Narayanan, S., Jacobson, C., Lust, K.: Analysis of low dimensional dynamics of flow separation. In: Continuation Methods in Fluid Dynamics, vol. 74, pp. 167–178. Vieweg (2000)
 (47) Kim, T.W., Valdés, J.B.: Nonlinear model for drought forecasting based on a conjunction of wavelet transforms and neural networks. Journal of Hydrologic Engineering 8(6), 319–328 (2003)
 (48) Kunisch, K., Volkwein, S.: Control of the Burgers equation by a reducedorder approach using proper orthogonal decomposition. Journal of optimization theory and applications 102(2), 345–371 (1999)
 (49) Kunisch, K., Volkwein, S.: Galerkin proper orthogonal decomposition methods for parabolic problems. Numerische Mathematik 90(1), 117–148 (2001)
 (50) Kunisch, K., Volkwein, S.: Optimal snapshot location for computing POD basis functions. ESAIM: Mathematical Modelling and Numerical Analysis 44(3), 509–529 (2010)
 (51) Lassila, T., Manzoni, A., Quarteroni, A., Rozza, G.: Model order reduction in fluid dynamics: challenges and perspectives. In: A. Quarteroni, G. Rozza (eds.) Reduced Order Methods for modeling and computational reduction. Springer, Milano (2013)
 (52) Lee, C., Kim, J., Babcock, D., Goodman, R.: Application of neural networks to turbulence control for drag reduction. Physics of Fluids 9(6), 1740–1747 (1997)
 (53) Levenberg, K.: A method for the solution of certain nonlinear problems in least squares. Quarterly of Applied Mathematics 2(2), 164–168 (1944)
 (54) Loève, M.: Probability Theory; Foundations, Random Sequences. New York: D. Van Nostrand Company (1955)
 (55) Lorenz, E.N.: Empirical orthogonal functions and statistical weather prediction (1956)
 (56) Lucia, D.J., Beran, P.S., Silva, W.A.: Reducedorder modeling: new approaches for computational physics. Progress in Aerospace Sciences 40(1), 51–117 (2004)
 (57) Lumley, J.: The structures of inhomogeneous turbulent flow. In: Atmospheric Turbulence and Radio Wave Propagation, A. Yaglom and V. Tatarski, eds, pp. 160–178 (1967)
 (58) MacKay, D.J.: Bayesian interpolation. Neural computation 4(3), 415–447 (1992)
 (59) Maleewong, M., Sirisup, S.: Online and offline POD assisted projective integral for nonlinear problems: A case study with Burgers’ equation. International Journal of Mathematical, Computational, Physical, Electrical and Computer Engineering 5(7), 984–992 (2011)
 (60) Marquardt, D.W.: An algorithm for leastsquares estimation of nonlinear parameters. Journal of the society for Industrial and Applied Mathematics 11(2), 431–441 (1963)
 (61) Moin, P.: Fundamentals of engineering numerical analysis. Cambridge University Press (2010)
 (62) Moosavi, A., Stefanescu, R., Sandu, A.: Efficient Construction of Local Parametric Reduced Order Models Using Machine Learning Techniques. arXiv preprint arXiv:1511.02909 (2015)
 (63) Moosavi, A., Stefanescu, R., Sandu, A.: Multivariate predictions of local reducedordermodel errors and dimensions. arXiv preprint arXiv:1701.03720 (2017)
 (64) Narayanan, S., Khibnik, A., Jacobson, C., Kevrekedis, Y., RicoMartinez, R., Lust, K.: Lowdimensional models for active control of flow separation. In: Control Applications, 1999. Proceedings of the 1999 IEEE International Conference on, vol. 2, pp. 1151–1156. IEEE (1999)
 (65) Nguyen, D., Widrow, B.: Improving the learning speed of 2layer neural networks by choosing initial values of the adaptive weights. In: Neural Networks, 1990., 1990 IJCNN International Joint Conference on, pp. 21–26. IEEE (1990)
 (66) Noack, B., Papas, P., Monkewitz, P.: Lowdimensional Galerkin model of a laminar shearlayer. Tech. rep., Tech. Rep. 200201. Laboratoire de Mecanique des Fluides, Departement de Genie Mecanique, Ecole Polytechnique Fédérale de Lausanne, Switzerland (2002)
 (67) Noack, B.R., Morzynski, M., Tadmor, G.: Reducedorder modelling for flow control, vol. 528. Springer Verlag (2011)
 (68) Ravindran, S.S.: A reducedorder approach for optimal control of fluids using proper orthogonal decomposition. International Journal for Numerical Methods in Fluids 34(5), 425–448 (2000)
 (69) Rowley, C.W.: Model reduction for fluids, using balanced proper orthogonal decomposition. International Journal of Bifurcation and Chaos 15(03), 997–1013 (2005)
 (70) Rowley, C.W., Dawson, S.T.: Model reduction for flow analysis and control. Annual Review of Fluid Mechanics 49, 387–417 (2017)
 (71) Roychowdhury, J.: Reducedorder modeling of timevarying systems. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 46(10), 1273–1288 (1999)
 (72) Sahan, R., KocSahan, N., Albin, D., Liakopoulos, A.: Artificial neural networkbased modeling and intelligent control of transitional flows. In: Control Applications, 1997., Proceedings of the 1997 IEEE International Conference on, pp. 359–364. IEEE (1997)
 (73) San, O., Iliescu, T.: Proper orthogonal decomposition closure models for fluid flows: Burgers equation. International Journal of Numerical Analysis and Modeling 5, 217–237 (2014)
 (74) San, O., Iliescu, T.: A stabilized proper orthogonal decomposition reducedorder model for large scale quasigeostrophic ocean circulation. Advances in Computational Mathematics 41(5), 1289–1319 (2015)
 (75) Schmid, P.J.: Dynamic mode decomposition of numerical and experimental data. Journal of fluid mechanics 656, 5–28 (2010)
 (76) Serre, D.: Matrices: Theory and Applications. SpringerVerlag, New York (2002)
 (77) Silveira, L.M., Kamon, M., White, J.: Efficient reducedorder modeling of frequencydependent coupling inductances associated with 3D interconnect structures. IEEE Transactions on Components, Packaging, and Manufacturing Technology: Part B 19(2), 283–288 (1996)
 (78) Sirisup, S., Karniadakis, G.E.: A spectral viscosity method for correcting the longterm behavior of POD models. Journal of Computational Physics 194(1), 92–116 (2004)
 (79) Sirovich, L.: Turbulence and the dynamics of coherent structures. ICoherent structures. IISymmetries and transformations. IIIDynamics and scaling. Quarterly of Applied Mathematics 45, 561–571 (1987)
 (80) Taira, K., Brunton, S.L., Dawson, S., Rowley, C.W., Colonius, T., McKeon, B.J., Schmidt, O.T., Gordeyev, S., Theofilis, V., Ukeiley, L.S.: Modal analysis of fluid flows: An overview. arXiv preprint arXiv:1702.01453 (2017)
 (81) Ullmann, S., Lang, J.: A PODGalerkin reduced model with updated coefficients for Smagorinsky LES. In: V European conference on computational fluid dynamics, ECCOMAS CFD, p. 2010 (2010)
 (82) Wang, Z.: Reducedorder modeling of complex engineering and geophysical flows: analysis and computations. Ph.D. thesis, Virginia Tech (2012)
 (83) Wang, Z., Akhtar, I., Borggaard, J., Iliescu, T.: Twolevel discretizations of nonlinear closure models for proper orthogonal decomposition. Journal of Computational Physics 230(1), 126–146 (2011)
 (84) Wang, Z., Akhtar, I., Borggaard, J., Iliescu, T.: Proper orthogonal decomposition closure models for turbulent flows: a numerical comparison. Computer Methods in Applied Mechanics and Engineering 237, 10–26 (2012)
 (85) Weller, J., Lombardi, E., Bergmann, M., Iollo, A.: Numerical methods for loworder modeling of fluid flows based on POD. International Journal for Numerical Methods in Fluids 63(2), 249–268 (2010)
 (86) Wells, D., Wang, Z., Xie, X., Iliescu, T.: An evolvethenfilter regularized reduced order model for convectiondominated flows. International Journal for Numerical Methods in Fluids 10.1002/fld.4363, 1–20 (2017)
 (87) Widrow, B., Rumelhart, D.E., Lehr, M.A.: Neural networks: applications in industry, business and science. Communications of the ACM 37(3), 93–106 (1994)
 (88) Xie, X., Wells, D., Wang, Z., Iliescu, T.: Approximate deconvolution reduced order modeling. Computer Methods in Applied Mechanics and Engineering 313, 512–534 (2017)
 (89) Zhang, G., Patuwo, B.E., Hu, M.Y.: Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14(1), 35–62 (1998)