Construction of Reduced Order Models for Fluid Flows Using Deep Feedforward Neural Networks
Abstract
We present a numerical methodology for the construction of reduced order models (ROMs) of fluid flows through the combination of flow modal decomposition and regression analysis. Spectral proper orthogonal decomposition (SPOD) is applied to reduce the dimensionality of the model and, at the same time, filter the POD temporal modes. The regression step is performed by a deep feedforward neural network (DNN), and the current framework is implemented in a context similar to the sparse identification of nonlinear dynamics (SINDy) algorithm. A discussion on the optimization of the DNN hyperparameters is provided for obtaining the best ROMs, and an assessment of these models is presented for a canonical nonlinear oscillator and the compressible flow past a cylinder. Then, the method is tested on the reconstruction of a turbulent flow computed by a large eddy simulation of a plunging airfoil under dynamic stall. The reduced order model is able to capture the dynamics of the leading edge stall vortex and the subsequent trailing edge vortex. For the cases analyzed, the numerical framework allows the prediction of the flowfield beyond the training window using larger time increments than those employed by the full order model. We also demonstrate the robustness of the current ROMs constructed via deep feedforward neural networks through a comparison with sparse regression. The DNN approach is able to learn transient features of the flow and presents more accurate and stable long-term predictions compared to sparse regression.
1 Introduction
Current supercomputers allow the application of high fidelity numerical simulations of turbulent flows over large-scale industrial configurations. The results from these simulations certainly improve the understanding of complex physical phenomena such as mixing enhancement, drag reduction, heat transfer and noise generation, to name a few. Such simulations may lead to discretizations with billions of degrees of freedom in order to resolve the energetically relevant spatial and temporal scales. In these cases, the fluid flow data is generally obtained for long periods using small time steps to compute meaningful converged statistics of the flow with sufficient accuracy.
The analysis of unsteady flows by time-resolved simulations and experiments requires the acquisition and treatment of large datasets. In recent years, data-driven algorithms have been developed and applied to perform statistical post-processing of such datasets of unsteady fluid flows, allowing the investigation of complex physical mechanisms in turbulent flows and improving their analyses. For example, one can cite techniques of flow modal decomposition such as proper orthogonal decomposition (POD) and its variations (Lumley, 1967; Sieber et al., 2016; Ribeiro & Wolf, 2017; Towne et al., 2018), dynamic mode decomposition (DMD) and variations (Schmid, 2010; Tu et al., 2014; Clainche & Vega, 2017), Lagrangian coherent structures (LCS) (Haller, 2015; Green et al., 2007; Rockwood et al., 2016) and resolvent analysis (Luhar et al., 2014; Sharma et al., 2016), among others. Recently, Taira et al. (2017) published a review of such techniques in the context of fluid flows.
The previous techniques of modal analysis can also be employed for the construction of reduced order models (ROMs), which are appealing since they can be used in preliminary stages of design due to their inherent reduction in computational cost compared to large-scale simulations. Such techniques should also be useful in the context of optimization analyses and studies of flow control (Brunton & Noack, 2015). However, in order to be employed for these applications, ROMs should be able to reproduce the main physical aspects of the full-scale models. Several ROM techniques have been discussed in the literature, such as Galerkin projection (Rowley et al., 2004), least-squares Petrov–Galerkin projection (Carlberg et al., 2011), eigensystem realization analysis (Juang & Pappa, 1985) and sparse regression of nonlinear dynamics (Brunton et al., 2016). The previous techniques can be used to reduce the original sets of partial differential equations to sets of ordinary differential equations. Sparse regression has also been applied for discovering sets of partial differential equations through spatiotemporal data collection (Rudy et al., 2017). These techniques have been mostly applied to canonical problems and their application to more complex turbulent flows is still a challenging task.
In some cases, ROMs constructed using some of the above techniques may exhibit unstable behavior (Carlberg et al., 2011). For example, ROMs developed based on Galerkin projection employ POD to rewrite the Navier–Stokes equations as sets of dynamical systems for the evolution of the POD temporal modes. Due to the basis truncation typical of such methods, numerical instabilities can be created from the imbalance in the turbulent kinetic energy budget in the ROM. This issue may be addressed using turbulence models (Cazemier et al., 1998; Östh et al., 2014; Protas et al., 2015); however, this approach can destroy the consistency between the original partial differential equations and the ODE system of the ROM. Other alternatives have also been proposed to deal with this problem via minimal rotation of the projected subspace (Balajewicz et al., 2016). This approach is able to account for the contribution of the truncated modes while keeping the consistency between the full and reduced order models. Recently, San & Maulik (2018) employed machine learning via neural networks to compute optimal coefficients for an eddy viscosity model which is able to stabilize a POD–Galerkin ROM.
Machine learning is a field that finds application in several areas, from data classification to pattern recognition and nonlinear function approximation. Several groups have applied machine learning to investigations concerning fluid flows. Ling & Templeton (2015) evaluated different machine learning techniques to classify regions of high uncertainty in RANS calculations. Ling et al. (2016) presented a novel deep neural network (DNN) architecture to improve RANS turbulence models. In a similar context, Wang et al. (2017) developed a machine learning approach based on random forests for predicting discrepancies in Reynolds stresses obtained by RANS calculations. Ströfer et al. (2019) employed convolutional neural networks (CNNs), a machine learning technique well suited for image pattern recognition, to identify features in fluid flows.
Machine learning is also a natural candidate for the development of ROMs of large-scale dynamical systems, typical of numerical simulations of unsteady flows. A discussion on the application of deep learning in the context just described was presented by Kutz (2017). Recently, several authors have proposed algorithms for the prediction of high-dimensional complex dynamical systems using neural networks and their variants (Rudy et al., 2018; San & Maulik, 2018; Pan & Duraisamy, 2018; Wan et al., 2018; Vlachas et al., 2018). Rudy et al. (2018) presented a methodology that is able to learn the dynamics of a particular system and estimate the noise from measurements using feedforward neural networks (FNN). Pan & Duraisamy (2018) performed a comparison between FNNs and sparse regression for modeling of dynamical systems and demonstrated the benefits of the former in terms of adaptability. They computed the Frobenius norm of the Jacobian of the neural network as the regularization term of the cost function to improve accuracy and robustness of the framework. Wan et al. (2018) developed a reduced order model methodology capable of modeling extreme events occurring in dynamical systems. These authors employed a long short-term memory (LSTM) approach to regularize a recurrent neural network (RNN) which obtains the complementary dynamics of a nonlinear Galerkin projection of the dynamical system. Vlachas et al. (2018) combined an LSTM neural network with a mean stochastic model to propose a data-driven algorithm which has desirable short- and long-term prediction capabilities.
In this work, we present a numerical methodology which combines flow modal decomposition via POD and regression analysis using machine learning and FNNs. The framework is implemented in a context similar to that of the sparse identification of nonlinear dynamics (SINDy) algorithm (Brunton et al., 2016). In order to improve the regression step in the current approach, a spectral proper orthogonal decomposition (SPOD) (Sieber et al., 2016) is employed to extract a low-dimensional representation of the full order model. The SPOD technique is able to filter the temporal modes while preserving the information of the full order model by a redistribution of the energy of the system to higher POD modes. The current method is applied to the construction of ROMs of unsteady compressible flows. First, we test the method on a simple two degree of freedom nonlinear dynamical system. Then, we evaluate the capability of the method to reproduce the compressible flow past a cylinder, including its noise generation. In this case, we show that the current DNN approach is also able to reproduce transient stages of the flow. Finally, the method is tested on a turbulent flow involving dynamic stall of a plunging airfoil. We present a discussion on the optimization of hyperparameters for obtaining the best models for the deep neural networks. For both the cylinder and plunging airfoil cases, a comparison between DNN and sparse regression techniques is presented in terms of their predictive capability. The approach presented in this work allows prediction of the flowfield beyond the training window, with larger time increments than those used by the full order model, demonstrating the robustness of the current ROMs constructed via deep feedforward neural networks. The current numerical tool for construction of reduced order models via DNNs can be downloaded from <http://cces.unicamp.br/software/>.
2 Theoretical and Numerical Formulation
2.1 Flow modal decomposition
The equations governing an unsteady three-dimensional compressible flow contain partial derivatives with respect to both the spatial coordinates $\mathbf{x}$ and time $t$. Using the method of lines, one can first approximate the spatial derivatives, producing a system of nonlinear ordinary differential equations (ODEs). In the most general notation, for each mesh point, these ODEs would be expressed in the form

$$\frac{d\mathbf{q}}{dt} = \mathcal{F}(\mathbf{q}) \qquad (1)$$

where $\mathcal{F}$ is a nonlinear operator, $\mathbf{q} = [\rho \;\, u \;\, v \;\, w \;\, p]^{T}$ is the vector of flow variables, $\rho$ is the density, $u$, $v$ and $w$ are respectively the $x$, $y$ and $z$ components of the velocity vector, and $p$ is the pressure.
Here, we consider that a dynamical system given by Eq. 1 is solved at each mesh point. To determine the nonlinear operator $\mathcal{F}$ from data, we follow the ideas of the sparse identification of nonlinear dynamics (SINDy) algorithm developed by Brunton et al. (2016) with some modifications. A schematic of the current method can be seen in figure 1. First, we collect $N_t$ snapshots of the variables $\mathbf{q}$, which will be our training data. The data set is then arranged into a matrix as

$$\mathbf{Q} = \left[\, \mathbf{q}(t_1) \;\; \mathbf{q}(t_2) \;\; \cdots \;\; \mathbf{q}(t_{N_t}) \,\right] \qquad (2)$$

where $N_p$ is the number of grid points in the computational domain and $N_t$ is the number of snapshots.
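As an illustration, the assembly of the snapshot matrix of Eq. 2 can be sketched in NumPy; all sizes and the synthetic data below are hypothetical stand-ins for simulation output:

```python
import numpy as np

# Hypothetical sizes: n_pts grid points, 5 flow variables, nt snapshots.
n_pts, n_vars, nt = 100, 5, 40
rng = np.random.default_rng(0)

# Each snapshot stacks the flow variables [rho, u, v, w, p] at every grid
# point into a single column vector of length n_vars * n_pts.
snapshots = [rng.standard_normal(n_vars * n_pts) for _ in range(nt)]

# Arrange the snapshots as the columns of the data matrix Q (Eq. 2).
Q = np.column_stack(snapshots)
assert Q.shape == (n_vars * n_pts, nt)
```

In practice the columns would come from stored solution files of the full order model rather than a random generator.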
Because of the high dimensionality of the input data $\mathbf{Q}$, we first reduce the dimension of the dynamical system using the snapshot POD method (Sirovich, 1986). Proper orthogonal decomposition has been widely applied to obtain optimal bases that represent the most energetic content of the system dynamics with as few basis functions as possible (Lumley, 1967). The snapshot POD approach starts with a decomposition of the vector quantities into the mean flow, $\bar{\mathbf{q}}(\mathbf{x})$, and fluctuations, $\mathbf{q}'(\mathbf{x},t)$. The latter can be further expanded into a combination of spatial modes $\boldsymbol{\phi}_i(\mathbf{x})$ and their temporal coefficients $a_i(t)$ for a defined number of modes $N_m$ as

$$\mathbf{q}(\mathbf{x},t) = \bar{\mathbf{q}}(\mathbf{x}) + \mathbf{q}'(\mathbf{x},t) = \bar{\mathbf{q}}(\mathbf{x}) + \sum_{i=1}^{N_m} a_i(t)\,\boldsymbol{\phi}_i(\mathbf{x}) \qquad (3)$$
To calculate the POD correlation matrix of the data set, some specific norm must be used. For an incompressible flow, a kinetic energy norm provides an optimal result; however, for a compressible flow, other norms can be employed (Rowley et al., 2004). Hence, we define a vector $\mathbf{w}$ that determines which norm should be used to compute the correlation between two snapshots. For example, a pressure-based norm uses $\mathbf{w} = [0\;0\;0\;0\;1]^{T}$ and a kinetic energy norm uses $\mathbf{w} = [0\;1\;1\;1\;0]^{T}$. The correlation between two snapshots is computed using the inner product. Therefore, considering $\mathbf{q}'_i = \mathbf{q}'(\mathbf{x},t_i)$ and $\mathbf{q}'_j = \mathbf{q}'(\mathbf{x},t_j)$, the elements of the correlation matrix $\mathbf{C}$ are given by

$$C_{ij} = \frac{1}{N_t}\int_{\Omega} \left(\mathbf{w}\circ\mathbf{q}'_i\right)\cdot\left(\mathbf{w}\circ\mathbf{q}'_j\right)\, d\mathbf{x} \qquad (4)$$
where $\Omega$ is the fluid region of interest for the reconstruction and the matrix $\mathbf{C}$ is of size $N_t \times N_t$. For the problems studied in this work, we employ norms based on kinetic energy and pressure. Despite the changes in the POD modes computed for the different norms, we observed that the ROMs obtained by either kinetic energy or pressure norms were similar. In both cases, stable and accurate models could be reconstructed by the DNNs, and further comments are provided in the results section.
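A discrete version of the weighted correlation of Eq. 4 can be sketched as follows; the data, the uniform cell volumes used as quadrature weights, and the array layout are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pts, nt = 200, 30

# Synthetic fluctuation snapshots stored as (variable, grid point, snapshot)
# for the five variables [rho, u, v, w, p]; cell volumes act as quadrature
# weights for the spatial integral over the region of interest.
qp = rng.standard_normal((5, n_pts, nt))
vol = np.full(n_pts, 1.0 / n_pts)

# Norm selector w: the kinetic energy norm keeps only the velocity
# components (w has 0/1 entries, so applying it to one snapshot is
# equivalent to weighting both).
w = np.array([0.0, 1.0, 1.0, 1.0, 0.0])

# C_ij = (1/nt) * discrete integral of the weighted inner product between
# snapshots i and j (Eq. 4).
C = np.einsum("vgi,vgj,g->ij", w[:, None, None] * qp, qp, vol) / nt
```

The resulting matrix is symmetric, which is what permits the eigendecomposition used in the POD step below.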
In turbulent flows, the POD temporal modes may contain contributions from several frequencies, including high-frequency noise. For such cases, in order to provide a better identification of the individual modes and to smooth out the temporal coefficients, we employ the spectral proper orthogonal decomposition (SPOD) described by Sieber et al. (2016). The SPOD technique is able to filter the temporal modes while preserving the information of the full order model by a redistribution of the energy of the system to higher POD modes. The technique consists of applying a filter function to the POD correlation matrix, which results in a matrix $\mathbf{S}$ with elements given as

$$S_{ij} = \sum_{k=-N_f}^{N_f} g_k\, C_{i+k,\,j+k} \qquad (5)$$

Here, $g_k$ is the filter function and $2N_f + 1$ is the size of the filter window. We consider a periodic temporal series and assume that the correlation matrix is also periodic (see Ribeiro & Wolf (2017) for details). Hence, $C_{i+N_t,\,j+N_t} = C_{ij}$, where $N_t$ is the number of snapshots used in the SPOD filter. Following this notation, if we apply 50% of filter to the correlation matrix, it means that we are filtering 50% of the total number of snapshots. Several filter functions can be applied to the POD correlation matrix and their effects are reported by Ribeiro & Wolf (2017). The filter function employed in this work is the box filter, represented by $g_k = 1/(2N_f + 1)$.
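A minimal sketch of the periodic box filter of Eq. 5, using `np.roll` to implement the wrap-around indexing of the correlation matrix:

```python
import numpy as np

def spod_filter(C, nf):
    """Apply a periodic box filter along the main diagonal of the POD
    correlation matrix (Eq. 5); nf is the filter half-width."""
    g = 1.0 / (2 * nf + 1)                    # box-filter weight
    S = np.zeros_like(C)
    for k in range(-nf, nf + 1):
        # Periodic indexing: pick up C[i+k, j+k] with wrap-around modulo nt.
        S += g * np.roll(np.roll(C, -k, axis=0), -k, axis=1)
    return S

# nf = 0 recovers the unfiltered POD correlation matrix.
C = np.arange(16.0).reshape(4, 4)
assert np.allclose(spod_filter(C, 0), C)
```

Note that the filter redistributes entries along diagonals but preserves their total sum, consistent with SPOD preserving the overall energy of the system.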
The temporal coefficients $\mathbf{a}_i$ and the POD eigenvalues $\lambda_i$ are determined from the filtered correlation matrix as

$$\mathbf{S}\,\mathbf{a}_i = \lambda_i\,\mathbf{a}_i \qquad (6)$$
Singular value decomposition (SVD) can be employed to compute the eigenvalues and eigenvectors of $\mathbf{S}$ since the matrix is real, symmetric and positive-definite. The spatial modes are obtained from the projection of the fluctuation quantities onto the temporal coefficients

$$\boldsymbol{\phi}_i(\mathbf{x}) = \frac{1}{N_t\,\lambda_i}\sum_{j=1}^{N_t} a_i(t_j)\,\mathbf{q}'(\mathbf{x},t_j) \qquad (7)$$
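The eigendecomposition of Eq. 6 and the projection of Eq. 7 can be sketched together in NumPy; the data and the scaling convention (temporal coefficients normalized so that the modal expansion reproduces the fluctuations) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
m_pts, nt, nm = 50, 20, 3

# Synthetic fluctuation data: the columns of Qp are snapshots.
Qp = rng.standard_normal((m_pts, nt))
C = (Qp.T @ Qp) / nt                 # correlation matrix with unit weights

# Eigendecomposition of the symmetric correlation matrix (Eq. 6),
# sorted by decreasing energy.
lam, V = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Temporal coefficients scaled so that a_i . a_j = nt * lam_i * delta_ij,
# and spatial modes from the projection of the fluctuations (Eq. 7).
a = V[:, :nm] * np.sqrt(nt * lam[:nm])
phi = (Qp @ a) / (nt * lam[:nm])
```

With all modes retained, the outer product `phi @ a.T` reproduces `Qp` exactly; truncating to the first few energetic modes gives the low-dimensional representation used by the ROM.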
Finally, the temporal coefficients and spatial modes can be stored in matrices $\mathbf{A}$ and $\boldsymbol{\Phi}$ as

$$\mathbf{A} = \left[\, \mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \cdots \;\; \mathbf{a}_{N_m} \,\right] \qquad (8)$$

and

$$\boldsymbol{\Phi} = \left[\, \boldsymbol{\phi}_1 \;\; \boldsymbol{\phi}_2 \;\; \cdots \;\; \boldsymbol{\phi}_{N_m} \,\right]^{T} \qquad (9)$$

where the temporal coefficients are the columns of $\mathbf{A}$ and the spatial modes are the rows of $\boldsymbol{\Phi}$. Hence, the matrix of fluctuation quantities can be written as

$$\mathbf{Q}' = \boldsymbol{\Phi}^{T}\,\mathbf{A}^{T} \qquad (10)$$
Taking the time derivative of Eq. 3, we arrive at the following set of equations

$$\frac{d\mathbf{q}'}{dt} = \sum_{i=1}^{N_m} \frac{da_i(t)}{dt}\,\boldsymbol{\phi}_i(\mathbf{x}) \qquad (11)$$

The right-hand side of Eq. 11 represents the temporal evolution of the coefficients associated with the modes retained in the SPOD modal basis. We can express this system of coupled ODEs as

$$\frac{da_i}{dt} = f_i\!\left(a_1, a_2, \ldots, a_{N_m}\right), \quad i = 1, \ldots, N_m \qquad (12)$$
Next, we compute the derivative $da_i/dt$ numerically using the data for each temporal mode. Different numerical schemes are tested for the computation of the temporal derivatives, and a discussion on the application of explicit schemes and compact schemes will be provided in the results section. For now, let us consider that the numerical scheme employed for the temporal derivatives is a 10th-order accurate compact scheme (Lele, 1992), which provides high spectral resolution, being non-dissipative and low-dispersive. The derivative is then obtained as

$$\beta\,\dot{a}^{\,n-2} + \alpha\,\dot{a}^{\,n-1} + \dot{a}^{\,n} + \alpha\,\dot{a}^{\,n+1} + \beta\,\dot{a}^{\,n+2} = c\,\frac{a^{n+3} - a^{n-3}}{6\,\Delta t} + b\,\frac{a^{n+2} - a^{n-2}}{4\,\Delta t} + a\,\frac{a^{n+1} - a^{n-1}}{2\,\Delta t} \qquad (13)$$

In the equation above, $\Delta t$ is the time step, the superscripts denote time levels, $\dot{a}^{\,n}$ is the unknown derivative of a temporal coefficient at level $n$, and the coefficients of the numerical scheme are set as $\alpha = 1/2$, $\beta = 1/20$, $a = 17/12$, $b = 101/150$ and $c = 1/100$ for a 10th-order discretization. The system of Eqs. 13 written for each temporal mode can be solved as a pentadiagonal linear system for the unknown derivatives. The boundary data points can be computed from the interior points following Olson (2012).
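The compact scheme of Eq. 13 can be sketched for a periodic series; for brevity the cyclic pentadiagonal system is assembled densely and solved with a general solver, whereas a production code would use a dedicated pentadiagonal solver and the one-sided boundary closures cited above:

```python
import numpy as np

def compact_derivative_10(f, dt):
    """10th-order pentadiagonal compact finite difference (Eq. 13) for a
    periodic series f sampled with step dt."""
    alpha, beta = 1.0 / 2.0, 1.0 / 20.0
    a, b, c = 17.0 / 12.0, 101.0 / 150.0, 1.0 / 100.0
    n = f.size
    # Cyclic pentadiagonal left-hand side, including wrap-around corners.
    lhs = np.eye(n)
    for off, coef in ((1, alpha), (2, beta)):
        lhs += coef * (np.eye(n, k=off) + np.eye(n, k=-off)
                       + np.eye(n, k=n - off) + np.eye(n, k=-(n - off)))
    # Right-hand side uses periodic shifts of the sampled series.
    rhs = (a * (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dt)
           + b * (np.roll(f, -2) - np.roll(f, 2)) / (4.0 * dt)
           + c * (np.roll(f, -3) - np.roll(f, 3)) / (6.0 * dt))
    return np.linalg.solve(lhs, rhs)

# Sanity check on a resolved sine wave: d/dt sin(t) = cos(t).
t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
df = compact_derivative_10(np.sin(t), t[1] - t[0])
assert np.max(np.abs(df - np.cos(t))) < 1e-8
```

The near machine-precision error on a well-resolved wave illustrates the spectral-like resolution that motivates using a compact scheme for the temporal derivatives.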
The derivatives are then arranged into a matrix

$$\dot{\mathbf{A}} = \left[\, \dot{\mathbf{a}}_1 \;\; \dot{\mathbf{a}}_2 \;\; \cdots \;\; \dot{\mathbf{a}}_{N_m} \,\right] \qquad (14)$$
2.2 Regression via DNN
Once the matrix of temporal derivatives is computed, we can set up a regression problem to find the weights $\mathbf{W}$ and biases $\mathbf{b}$ that determine the function presented in Eq. 12

$$\dot{\mathbf{A}} = \boldsymbol{\Theta}(\mathbf{A})\,\mathbf{W} + \mathbf{b} \qquad (15)$$

where $\boldsymbol{\Theta}(\mathbf{A})$ is the matrix of features. In the SINDy algorithm, Brunton et al. (2016) suggest that $\boldsymbol{\Theta}$ may consist of constant, polynomial, exponential and trigonometric functions. However, in many cases, it is complicated to know what set of features should be extracted from the data. Hence, we use machine learning to circumvent the problem of finding the functions which represent the dynamics of the problem. Therefore, the methodology can discover not only the weights and biases but also the features $\boldsymbol{\Theta}$. The "learned" features often result in better performance when compared to those obtained using "engineered" features. A learning algorithm can find a proper set of features in minutes or hours, depending on the task complexity. On the other hand, manually engineered features would require a great amount of human time and effort for complex tasks (Goodfellow et al., 2016).
Deep learning methods are feature learning algorithms that can find a proper set of features using multiple layers, with higher-layer features defined in terms of lower-layer features. Automatically learning features at multiple processing layers allows the learning of complex functions, mapping the input to the output directly from a given data set (Bengio, 2009). In the present work, a deep feedforward neural network (DNN) is used to learn the weights $\mathbf{W}$, biases $\mathbf{b}$ and features of the dynamical systems investigated. Figure 2 shows a sample DNN architecture where the input of the DNN is the matrix $\mathbf{A}$ and the target is the matrix $\dot{\mathbf{A}}$. The DNN calculation procedure is presented in algorithm 1. In the current work, the open source machine learning framework TensorFlow (Abadi et al., 2015) is used for training the DNN.
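The forward propagation at the core of the DNN can be sketched in plain NumPy (the paper trains the network in TensorFlow; the layer sizes and random weights below are hypothetical, and ELU is used for the hidden activations):

```python
import numpy as np

def elu(z, alpha=1.0):
    # Exponential linear unit: identity for z > 0, alpha*(exp(z)-1) otherwise.
    return np.where(z > 0.0, z, alpha * (np.exp(np.minimum(z, 0.0)) - 1.0))

def forward(a, weights, biases):
    """Forward propagation: hidden layers apply the ELU activation, while
    the output layer is linear and approximates da/dt."""
    h = a
    for W, b in zip(weights[:-1], biases[:-1]):
        h = elu(h @ W + b)
    return h @ weights[-1] + biases[-1]

# Hypothetical network: 2 temporal modes in/out, two hidden layers of 16 units.
rng = np.random.default_rng(3)
sizes = [2, 16, 16, 2]
weights = [0.1 * rng.standard_normal((m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

a_batch = rng.standard_normal((8, 2))   # 8 samples of temporal coefficients
da_batch = forward(a_batch, weights, biases)
```

Training would then adjust `weights` and `biases` to minimize the regularized misfit between `da_batch` and the compact-scheme derivatives.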
Once we have learned the parameters $\mathbf{W}$ and $\mathbf{b}$ of our model (Eq. 15), we can use them to predict the temporal coefficients $a_i(t)$ given the initial conditions $a_i(t_1)$. The system of coupled ODEs (Eq. 12) can now be given by

$$\frac{d\mathbf{a}}{dt} = \boldsymbol{\Lambda}^{(\ell-1)}\,\mathbf{W}^{(\ell)} + \mathbf{b}^{(\ell)}$$

where $\ell$ is the index of the last layer, $\boldsymbol{\Lambda}^{(\ell-1)}$ is the activation function matrix of the penultimate layer, $\mathbf{W}^{(\ell)}$ is the weight matrix of the last layer, and $\mathbf{b}^{(\ell)}$ is the bias vector of the last layer.
This system of coupled ODEs is integrated using an explicit five-stage fourth-order Runge–Kutta scheme derived by Kennedy et al. (1999). As we can see in algorithm 1, $\boldsymbol{\Lambda}^{(\ell-1)}$ depends on the weights and biases from previous layers. Thus, for each stage of the fourth-order Runge–Kutta scheme, we need to perform forward propagation to obtain $d\mathbf{a}/dt$. Once we have the temporal coefficients $a_i(t)$, one can reconstruct the flowfield using Eq. 3. However, we are interested in using a reduced order model in circumstances other than simply reproducing the training data. The approach presented in this work allows prediction of the flowfield beyond the training window, since $\bar{\mathbf{q}}$ and $\boldsymbol{\phi}_i$ depend only on the spatial coordinates and are calculated using only the training data.
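The time marching of the ROM can be sketched as follows; for brevity the classical fourth-order Runge–Kutta scheme stands in for the five-stage Kennedy et al. (1999) variant used in the paper, and the right-hand side `f` stands for a forward propagation through the trained network:

```python
import numpy as np

def rk4_step(f, a, dt):
    """One classical 4th-order Runge-Kutta step; each stage evaluates f,
    i.e., one forward propagation through the trained DNN."""
    k1 = f(a)
    k2 = f(a + 0.5 * dt * k1)
    k3 = f(a + 0.5 * dt * k2)
    k4 = f(a + dt * k3)
    return a + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Check on the linear model da/dt = -a, whose exact solution is exp(-t).
a, dt = np.array([1.0]), 0.01
for _ in range(100):
    a = rk4_step(lambda x: -x, a, dt)
assert abs(a[0] - np.exp(-1.0)) < 1e-8
```

Because each stage only requires a network evaluation, the ROM can be advanced with time steps larger than those of the full order model, as exploited in the results section.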
2.3 Datadriven ROMs
As previously discussed in section 1, the construction of reduced order models is an active area of research and different methodologies for the development of ROMs are available. For example, POD–Galerkin based methods are directly related to the physics of the problem through the projection of the Navier–Stokes equations into a system of ordinary differential equations. Galerkin projection methods require the treatment of the linear and nonlinear spatial terms appearing in the full order model. Another issue with these methods relates to their expensive application for dynamical systems with strong nonlinearities, where one should employ, for example, hyper-reduction techniques (Chaturantabut & Sorensen, 2010; Carlberg et al., 2011; Zimmermann & Willcox, 2016).
In the current approach, we employ DNNs for the regression step in a context similar to the SINDy algorithm. Both the DNN and the original SINDy approaches are datadriven methods and, instead of having a direct connection with the physics of the problem, these techniques learn from data. An advantage of these methods is that the nonlinearities of the problem are considered in the temporal derivatives of the primitive variables, which are obtained from the full order model. Therefore, neither the SINDy nor the DNN method requires the treatment of spatial derivatives.
In general, the resulting reduced order models constructed via DNNs are not physically interpretable due to the nonlinearity of the matrix of features and their weights and biases, which are not sparse. On the other hand, in some cases, such as the flow past a cylinder, models constructed using sparse regression could lead to physically interpretable results (Brunton et al., 2016). As expected, the application of DNNs for the regression step adds a penalty cost compared to sparse regression. However, it is shown in this work that, for the cases analyzed, models obtained using neural networks present better long-term predictive capabilities compared to sparse regression. Both of these issues will be discussed in the results section.
3 Hyperparameter Optimization
The performance of the DNN described in algorithm 1 depends dramatically on the selection of hyperparameters such as the network depth, the number of hidden units in each layer, the regularization parameter and the learning rate. Indeed, finding an optimal set of hyperparameters which minimizes the loss function over a hyperparameter space is a challenging task given the substantial number of free parameters involved.
Manual search, grid search, random search (Bergstra & Bengio, 2012) and Bayesian optimization (Brochu et al., 2010) are the most widely used procedures for hyperparameter optimization. Manual search consists of a direct human trial and error procedure in the search for an optimal configuration of hyperparameters. This procedure is entirely based on the prior experience of the user and there is a high probability that an optimal set of hyperparameters is not found. However, manual search is still useful if the effect of a specific hyperparameter on the model performance can be monitored on the fly. In the grid search method, several combinations of hyperparameter values are tested over an evenly spaced range. This method is extremely time-consuming because the number of trials increases exponentially with the number of hyperparameters. In a random search, one randomly selects each hyperparameter from a defined range and evaluates the model performance. It is also time-consuming when a high-dimensional hyperparameter space is analyzed; however, Bergstra & Bengio (2012) empirically show that random search outperforms grid search for hyperparameter optimization both in terms of computational time and model performance.
One of the recent strategies to find an optimal set of hyperparameters is Bayesian optimization. It is a technique that involves constructing a probabilistic surrogate model of the data in order to determine the most promising hyperparameters to evaluate. Snoek et al. (2012) showed that Bayesian optimization was able to find optimal hyperparameters for a three-layer convolutional neural network considerably faster than previous approaches, and outperformed the state-of-the-art performance at selecting hyperparameters on the CIFAR-10 dataset (Krizhevsky, 2009).
In this work, we use two hyperparameter optimization strategies: random search and Bayesian optimization. For random search, the model generation procedure is presented in algorithm 2. Likewise, Bayesian optimization has the same inputs as random search. To report the performance of each model from a set of candidates, we compute the mean absolute error (MAE) over the training data. The candidate models with lower MAE values are most likely those which will provide the optimal parameters. It is possible, however, that the model with the lowest MAE suffers from overfitting.
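The random search loop can be sketched as follows; the hyperparameter names, their ranges and the toy objective standing in for DNN training are all hypothetical:

```python
import random

def random_search(train_fn, space, n_trials, seed=0):
    """Random search in the spirit of algorithm 2: sample each
    hyperparameter independently from its range and keep the candidate
    with the lowest training MAE."""
    rng = random.Random(seed)
    best_mae, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {
            "layers": rng.choice(space["layers"]),
            "units": rng.choice(space["units"]),
            "l2": 10.0 ** rng.uniform(*space["log10_l2"]),  # log-uniform
        }
        mae = train_fn(params)      # train the DNN and report training MAE
        if mae < best_mae:
            best_mae, best_params = mae, params
    return best_mae, best_params

# Toy stand-in for DNN training: pretend 4 layers of 64 units is optimal.
space = {"layers": [2, 3, 4, 5], "units": [16, 32, 64, 128],
         "log10_l2": (-6.0, -2.0)}
toy_mae = lambda p: abs(p["layers"] - 4) + abs(p["units"] - 64) / 64.0
best_mae, best_params = random_search(toy_mae, space, n_trials=50)
```

Bayesian optimization would replace the independent sampling with a surrogate model that proposes the next candidate based on previous evaluations.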
The current metric does not assess the generalization of the model. However, it is the only one available since we cannot split our data into training and validation sets. The validation dataset should provide an unbiased evaluation of the ROM. In our case, we employ the temporal coefficients of the POD modes for the training stage of the model. These modes are computed using a correlation of different snapshots and if we construct the ROM using POD modes obtained with data including the validation set, our model would use a biased set for the training stage. This occurs because the POD correlation matrix would be computed for snapshots of both the training and validation sets.
Table 1: Effects of the main hyperparameters on model performance.

| Hyperparameter | Improves performance if | Reason | Warning |
| --- | --- | --- | --- |
| Number of hidden units | Increased | Increasing the number of hidden units augments the capacity of the model to represent more complex functions | Increasing this parameter may cause overfitting to the training data |
| Number of layers | Increased | Same as above | Same as above. One should also be aware that a DNN with a large number of layers and a very small number of hidden units will not work properly |
| Regularization parameter | Reduced | Reducing the regularization parameter allows larger weights for the model features; one should expect that some of these features are the most relevant for the model | Reducing the regularization parameter makes the model more prone to overfitting to the training data |
| Learning rate | Tuned | If the learning rate is too small, the optimization process can be slow; if it is too high, the optimization method may overshoot local minima. This parameter is chosen by monitoring the learning curve | If the learning rate is too large, the learning curve will show strong oscillations; if it is too small, the learning curve may become stuck at a high value of the cost function |
| Number of iterations | Tuned | The number of iterations is strictly related to the learning rate and is also chosen by monitoring the learning curve | As the number of iterations increases, the model goes from underfitting to optimal and, then, to overfitting the training data |
An alternative to this issue is to use Akaike's information criterion (AIC) (Akaike, 1973) or the Bayesian information criterion (BIC) (Schwarz, 1978) as the model selection criteria. These methods try to balance the quality of the fit against model complexity, and their main advantage is that there is no need for a validation set. The downside is that AIC and BIC impose a penalty for model complexity which is related to the number of parameters. This can be a problem when dealing with deep neural networks due to their large number of parameters.
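For concreteness, the Gaussian forms of AIC and BIC can be computed from regression residuals as sketched below (the residual values are illustrative):

```python
import numpy as np

def aic_bic(residuals, n_params):
    """Gaussian AIC/BIC from regression residuals: both combine the
    maximized log-likelihood with a penalty on the number of free
    parameters, BIC penalizing more strongly for large samples."""
    n = residuals.size
    rss = np.sum(residuals ** 2)
    # Maximized Gaussian log-likelihood with variance rss/n.
    log_like = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    aic = 2.0 * n_params - 2.0 * log_like
    bic = n_params * np.log(n) - 2.0 * log_like
    return aic, bic

res = np.array([0.1, -0.2, 0.05, 0.0, -0.1])
aic3, bic3 = aic_bic(res, n_params=3)
aic30, bic30 = aic_bic(res, n_params=30)
assert aic30 > aic3 and bic30 > bic3   # same fit, more parameters
```

The example makes the drawback mentioned above explicit: with identical residuals, both criteria grow with the parameter count, which for a DNN can dominate the score.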
In our experiments, we noticed that a few of the hyperparameters employed in algorithm 1 need to be tuned for the best performance of the DNN. These are listed in algorithm 2. The remaining hyperparameters are determined according to the following procedures in order to reduce the hyperparameter search space. For instance, hyperparameters related to the Adam optimization, such as the exponential decay rates $\beta_1$ and $\beta_2$ and the constant for numerical stability $\epsilon$, are set as described by Kingma & Ba (2014). The learning rate and the number of iterations are chosen via manual search. Further details about this process can be found in Goodfellow et al. (2016, p. 295). As can be observed in the results section, we use the same values of these hyperparameters for all flow simulations studied in this work, and these values should serve as references for other flows. Following Goodfellow et al. (2016), we provide table 1 with information on the hyperparameters presented in algorithm 2 based on their effects on model performance. This table also presents a similar discussion for the hyperparameters chosen via manual search. We expect that it serves as a guideline for choosing the range of values to be explored for the hyperparameters presented in algorithm 2.
Here, we consider that the activation function is also a hyperparameter since it plays a key role in the DNN performance. There are several available activation functions such as sigmoid, hyperbolic tangent (tanh), rectified linear unit (ReLU) (Nair & Hinton, 2010), exponential linear unit (ELU) (Clevert et al., 2015), to name a few. For general regression problems, tanh and ELU are the most popular functions since they possess nonlinear properties and are continuously differentiable. Clevert et al. (2015) show that the ELU function reduces the vanishing gradient effect since its positive part returns the identity. Thus, in the positive part, the derivative is unitary and it is not contractive. On the other hand, tanh is contractive almost everywhere. Furthermore, in our experiments, the ELU function provided results with fewer iterations than a corresponding tanh network and, therefore, we use ELU as the activation function.
4 Results
In this section, the present method is first tested in the reconstruction of the dynamics of a damped cubic oscillator. Then, we evaluate the capability of the DNN-ROMs to reproduce the dynamics of a compressible flow past a cylinder including its noise generation. A comparison between the current DNN technique and sparse regression is also presented for the transient regime of an incompressible flow past a cylinder. Finally, the method is employed to create a ROM of a turbulent flow involving dynamic stall of a plunging airfoil. Again, the DNN solution is compared against that obtained by sparse regression. In each example, we show the ability and limitations of the methods to identify the dynamics of the different nonlinear systems, comparing the ROM and full order model (FOM) solutions.
4.1 Nonlinear oscillator
For this first example, we consider a canonical problem in system identification (Brunton et al., 2016; Rudy et al., 2018): the two-dimensional nonlinear oscillator. The system dynamics are given by

$$\dot{x} = -0.1\,x^{3} + 2\,y^{3}, \qquad \dot{y} = -2\,x^{3} - 0.1\,y^{3} \qquad (16)$$

with initial conditions $[x(0),\, y(0)]^{T} = [2,\, 0]^{T}$.

We generate 4000 snapshots by integrating equation 16 using an explicit five-stage fourth-order Runge–Kutta scheme (Kennedy et al., 1999) with a fixed time step. The first portion of the time series is used as the training window and the remaining data is used as the test set. The system reconstruction is obtained following the procedures defined in section 2. For this first example, it is not necessary to use POD due to the low dimensionality of the system. The parameters employed in algorithm 2 are presented in table 2 and the best set of hyperparameters obtained via random search is listed in table 3.
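The data generation step can be sketched as follows; the coefficients are those of the canonical cubic oscillator of Brunton et al. (2016), and the classical fourth-order Runge–Kutta scheme is used here in place of the five-stage Kennedy et al. variant:

```python
import numpy as np

def cubic_oscillator(state):
    # Damped cubic oscillator (Eq. 16), coefficients as in Brunton et al. (2016).
    x, y = state
    return np.array([-0.1 * x**3 + 2.0 * y**3,
                     -2.0 * x**3 - 0.1 * y**3])

# Integrate with classical RK4 and verify that V = x^4 + y^4 decays,
# since dV/dt = -0.4 (x^6 + y^6) <= 0 for this system (the "damping").
state, dt = np.array([2.0, 0.0]), 0.01
v0 = np.sum(state**4)
for _ in range(1000):
    k1 = cubic_oscillator(state)
    k2 = cubic_oscillator(state + 0.5 * dt * k1)
    k3 = cubic_oscillator(state + 0.5 * dt * k2)
    k4 = cubic_oscillator(state + dt * k3)
    state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
assert np.sum(state**4) < v0
```

The decaying quartic "energy" gives a simple physical check on both the generated snapshots and, later, on the ROM predictions.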
[Tables 2 and 3: input parameter ranges for algorithm 2 and the best DNN architecture obtained via random search; the selected activation function is the ELU.]
Figure 3 presents the solutions obtained by the FOM (true model) and ROM for the damped cubic oscillator. Results show that the proposed algorithm accurately reproduces the system dynamics during the training window and beyond, for the test set.
4.2 Construction of ROMs for fluid flow simulations
For the following investigations of the flow past a cylinder and the dynamic stall of a plunging SD7003 airfoil, the system dynamics are modeled by the compressible Navier–Stokes equations in 2D and 3D, respectively. To simulate the airfoil undergoing a prescribed motion, the equations are solved in a non-inertial frame. In this form, source terms emerge from the grid curvature and the frame movement. Here, all terms are solved in the full contravariant form to allow the use of a curvilinear coordinate system. For a frame of reference with varying linear velocity, the equations reduce to
$$\frac{\partial \rho}{\partial t} + \left(\rho\, u^{j}\right)_{,j} = 0 \qquad (17)$$

$$\frac{\partial \left(\rho\, u^{i}\right)}{\partial t} + \left(\rho\, u^{i} u^{j} + p\, g^{ij} - \tau^{ij}\right)_{,j} = -\rho\, \ddot{h}^{i} \qquad (18)$$

and

$$\frac{\partial E}{\partial t} + \left[\left(E + p\right) u^{j} - \tau^{ij}\, g_{ik}\, u^{k} + q^{j}\right]_{,j} = -\rho\, g_{ij}\, u^{i}\, \ddot{h}^{j} \qquad (19)$$

where $(\,\cdot\,)_{,j}$ denotes the covariant derivative, $\tau^{ij}$ is the viscous stress tensor, $q^{j}$ is the heat flux vector and $\ddot{h}^{i}$ is the contravariant component of the frame acceleration, nonzero along the plunge direction.
The set of equations above represents the continuity, momentum and energy equations. In order to close the system of equations, the following relations are employed

$$E = \frac{p}{\gamma - 1} + \frac{1}{2}\,\rho\, g_{ij}\, u^{i} u^{j} \qquad (20)$$

$$p = \frac{\rho\, T}{\gamma} \qquad (21)$$

and

$$\mu = T^{3/2}\,\frac{1 + S}{T + S} \qquad (22)$$

where $\gamma$ is the ratio of specific heats and Eq. 22 is Sutherland's law written in nondimensional form with constant $S$.
Here, $\rho$ represents the density, $u^{i}$ the $i$-th contravariant component of the velocity vector and $p$ is the pressure. The term $h$ is the frame position (cross-stream motion of the plunging airfoil), $E$ is the total energy, $\mu$ is the dynamic viscosity and $T$ is the temperature. The dots represent temporal derivatives of the frame position, i.e., frame velocity and acceleration. The aforementioned terms are nondimensionalized by freestream quantities such as density $\rho_{\infty}$ and speed of sound $a_{\infty}$. The length scales are made nondimensional by the cylinder diameter or the airfoil chord. In the above equations, $g_{ij}$ and $g^{ij}$ are the covariant and contravariant metric tensors, and $\Gamma^{i}_{jk}$ represents the Christoffel symbols of the second kind. Further details about the present formulation can be found in Aris (1989).
The numerical scheme for the spatial discretization of the equations is a high-resolution sixth-order accurate compact scheme (Nagarajan et al., 2003) implemented on a staggered grid. A sixth-order compact interpolation scheme (Lele, 1992) is also employed to obtain fluid properties on the different nodes of the staggered configuration. Due to the nondissipative characteristics of compact finite-difference schemes, numerical instabilities may arise from mesh nonuniformities, interpolations and boundary conditions; hence, the solution has to be filtered to preserve the stability of the numerical schemes. The high-wavenumber compact filter presented by Lele (1992) is applied at prescribed time intervals in flow regions far away from solid boundaries in order to control numerical instabilities.
The time integration of the fluid equations is carried out by the fully implicit second-order scheme of Beam & Warming (1978) in the near-wall region in order to overcome the time step restriction caused by the fine near-wall grid numerical stiffness. A third-order Runge-Kutta scheme is used for time advancement of the equations in flow regions far away from solid boundaries. No-slip adiabatic wall boundary conditions are applied along the solid surfaces, and characteristic plus sponge conditions are applied in the far field. For the study of dynamic stall, periodic boundary conditions are applied along the spanwise direction of the airfoil. In this case, we employ an implicit large eddy simulation to study the flow physics of deep dynamic stall over a plunging SD7003 airfoil, i.e., no subgrid-scale model is used. The numerical tool has been previously validated for several simulations of unsteady compressible flows (Wolf, 2011; Wolf et al., 2012a, b; D’Lelis et al., 2019).
4.2.1 Flow past a cylinder
In this case, the full order model (FOM) is obtained by solving the compressible Navier-Stokes equations as detailed in section 4.2. The numerical simulations are conducted for Reynolds and Mach numbers and , respectively. These nondimensional parameters are computed based on freestream quantities. The grid configuration consists of a body-fitted O-grid with 421 × 751 points in the streamwise and wall-normal directions, respectively.
The flow is recorded for 1120 snapshots with nondimensional time steps of = 0.05. The snapshots are collected after an initial transient period of the simulation is discarded. The reduced order model (ROM) is obtained following the procedure described in section 2 using the pressure norm for the POD correlation matrix. It is important to mention that the use of a kinetic energy norm produced similar results for this case. The training data comprises the first 280 snapshots of the FOM and the remaining data is employed as the test set. The set of hyperparameters employed in algorithm 2 is listed in table 4 and the best set of these parameters obtained via random search is presented in table 5. For this case, Bayesian optimization was also able to produce accurate models, but at a higher computational cost compared to random search. One can see that the best neural network architecture for this case has 10 layers. However, we also found several other stable and accurate models with, for example, 6 layers. The fact that the best model was found with 10 layers is a coincidence; it occurred because this particular model presented the lowest mean absolute error (MAE). The model selection procedure could be improved if the MAE were computed from the error of the reconstructed flowfield. Although this procedure would be more expensive, it would be a better way to evaluate the models.
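The two selection criteria discussed above can be sketched as follows. The function names are illustrative, and the reconstruction-based criterion is only the suggested improvement, not what algorithm 2 implements:

```python
import numpy as np

def mae_coefficients(a_true, a_pred):
    # Criterion used for model selection: mean absolute error on the
    # POD temporal coefficients predicted by a candidate ROM
    return np.mean(np.abs(a_true - a_pred))

def mae_reconstructed(a_true, a_pred, phi):
    # Suggested improvement: measure the error after projecting back
    # to the flow field, q ~ Phi a, so each mode is weighted by its
    # spatial footprint; phi holds the spatial modes (n_points x k)
    return np.mean(np.abs((a_true - a_pred) @ phi.T))
```

With orthonormal modes of comparable spatial norm the two criteria coincide; they differ when the modes carry unequal spatial weight.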
[Table 5: best set of hyperparameters — DNN architecture and ELU activation function]
Figures 4 and 5 show contours of vorticity and divergence of velocity, respectively, along the cylinder and wake regions at time , which is beyond the training window. The snapshots allow a comparison of the results between the FOM and ROM using 2 and 10 POD modes out of 280 modes. Hence, the flow is reconstructed using 0.7% and 3.5% of the total information available from the full order model. Although the current simulation is performed for a compressible flow at and , we could verify that the POD spatial eigenfunctions are almost identical to those from Noack et al. (2003), which were obtained for an incompressible flow and . A video comparing the FOM and ROM solutions is provided together with the manuscript. Reconstruction of the individual flow variables with 2 POD modes could recover between 50 and 80% of the total modal energy, depending on the variable. For example, density is reconstructed using 50% of the total energy of the system dynamics while momentum is reconstructed with 80% of the total modal energy. Here, we use the term “modal energy” to refer to the ratio between the sum of POD eigenvalues used in a particular reconstruction over the entire range of eigenvalues available, . For 10 POD modes, the reconstructions could recover 99% of the energy for all variables and, therefore, should lead to an accurate flow representation. For this particular case, the spectral proper orthogonal decomposition technique is not required since the modes are almost monochromatic and, hence, do not require filtering.
One can observe from the figures that the computations of the flow using the ROM framework show good agreement with those obtained by the FOM. For the current Reynolds number, the flow develops a typical von Kármán vortex street along the cylinder wake. The periodic pattern of the vortex shedding can be observed in the contours of vorticity. Noise generation also occurs in the current unsteady compressible flow simulation. In this case, pressure fluctuations along the cylinder surface are scattered to the far field and can be observed in the contours of dilatation shown in figure 5. Both the near-field hydrodynamics and the far-field acoustics are recovered by the ROM. The reconstruction using 10 POD modes shows an excellent agreement with the FOM. When 2 POD modes are employed in the flow reconstruction, discrepancies between the ROM and FOM are evident from the figures. However, the main features of the flow are still recovered by the model. One should note that, despite the use of only 2 modes, the dynamical system is still stable beyond the training region.
In order to provide a more quantitative evaluation of the model reconstructions, the density and momentum fluctuation time histories computed by the FOM and ROMs are presented in figures 6 and 7, respectively. The figures on the left column show results for a probe located just behind the cylinder, close to the surface, at . The cylinder has radius 0.5 and its center is positioned at the origin of the Cartesian system. On the right column, results are obtained for a probe downstream of the cylinder, in the wake, at . Results are shown for both the training window period and beyond. When 2 POD modes are employed, the solutions provide a less accurate representation of the dynamics observed in the FOM. The density reconstruction shows the highest discrepancy, which is attributed to the lower energetic content captured by the first 2 POD modes. One can notice that the reduced order model accurately reproduces the full order model results during and beyond the training window when 10 POD modes are employed in the reconstruction.
In order to test the robustness of the method, we employ the DNN approach for the reconstruction of the transient regime of an incompressible cylinder flow. For this study, the 2 most energetic POD modes, containing both the transient and limit cycle dynamics of the flow, are obtained from Brunton et al. (2016). Figure 8 shows the POD temporal modes used for training and testing the DNNs. The first 5000 temporal instants are used to train the models and one can see this dataset represented by the black line to the left of the vertical line marking the end of the training window. To the right of the training window, the black line represents the test data, which is the correct solution for the temporal dynamics. Figure 8 also shows the results obtained by the current DNN approach and by sparse regression. In the case of sparse regression, we employ the same model obtained by Brunton et al. (2016) in table 11 of their Supporting Information report. As one can observe, the DNN is able to accurately recover both the transient and limit cycle solutions of the test data for both POD modes. On the other hand, sparse regression reconstructs the transient portion of the dynamics but not the long-term prediction of the limit cycle. Several models obtained by the DNNs presented results similar to those shown in figure 8, and some had neural network architectures similar to that of table 5. These results show that the proposed DNN approach has good long-term predictive capabilities and can learn transient features of the flow.
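Both approaches share the same SINDy-style identification pipeline: differentiate the POD temporal coefficients in time, regress the derivative on the state, then integrate the identified model forward. The sketch below illustrates that pipeline; for brevity it uses a ridge regression on a cubic polynomial library in place of either a DNN or the LASSO, and all names, the two-mode library and the explicit-Euler rollout are illustrative assumptions:

```python
import numpy as np

def poly_library(A):
    # Candidate features up to cubic terms for a 2-mode system
    a1, a2 = A[:, 0], A[:, 1]
    return np.column_stack([np.ones_like(a1), a1, a2,
                            a1**2, a1 * a2, a2**2,
                            a1**3, a1**2 * a2, a1 * a2**2, a2**3])

def fit_dynamics(A, dt, reg=1e-6):
    # Regress the time derivative of the POD coefficients onto the
    # library (ridge regression here; the paper replaces this step
    # with a deep feedforward network)
    dA = np.gradient(A, dt, axis=0)        # 2nd-order finite differences
    Theta = poly_library(A)
    W = np.linalg.solve(Theta.T @ Theta + reg * np.eye(Theta.shape[1]),
                        Theta.T @ dA)
    return W

def rollout(a0, W, dt, n):
    # Integrate the identified model forward with explicit Euler
    A = [np.asarray(a0, float)]
    for _ in range(n - 1):
        A.append(A[-1] + dt * poly_library(A[-1][None, :])[0] @ W)
    return np.stack(A)
```

Swapping `fit_dynamics` for a trained network and thresholding `W` for sparsity recovers, respectively, the DNN and SINDy variants compared above.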
4.2.2 Deep dynamic stall of plunging airfoil
The present study concerns a plunging SD7003 airfoil in deep dynamic stall. The flow conditions have a reduced frequency , where is the plunging frequency, is the chord of the airfoil and is the reference free stream velocity. The motion amplitude is set as with a static angle of attack deg. The chord Reynolds number based on the freestream velocity is and the freestream Mach number is . This flow condition is relevant for micro air vehicle applications and this case is selected based on the availability of published results from other high fidelity simulations (Visbal, 2011) and particle image velocimetry data (Baik et al., 2009; Ol et al., 2009). In a previous work (D’Lelis et al., 2019), we performed a mesh refinement study and validation of the numerical solutions against the references above.
The present mesh configuration consists of a body-fitted O-grid with points in the streamwise, wall-normal and spanwise directions, respectively. The grid is generated with of the surface points located on the suction side of the airfoil to improve the capture of the finer flow scales developing along the turbulent region of the flow. Due to the favorable pressure gradients on the pressure side of the airfoil, the flow remains laminar along the entire cycle of the plunging motion. The trailing edge of the SD7003 airfoil is rounded in the current simulations with an arc of radius . This procedure is required to retain the smoothness of the metric terms computed by the high-order compact scheme. The spanwise domain is set as , similarly to Visbal (2011), and the nondimensional time step of the simulation is set as .
The plunging motion produces an effective angle of attack in the range of deg. deg. Defining as the angular position in the plunging cycle, at deg. the airfoil has no velocity in the direction and is at the topmost position of the plunging motion. At deg. it has the highest downward velocity and, at deg., it has no velocity and is at the bottommost position of the plunging motion. Finally, at deg. it has the highest upward velocity. In summary, during the downstroke, instabilities begin to form on the suction side of the airfoil, growing and eventually breaking into finer structures. While this takes place, the dynamic stall vortex forms along the leading edge and is transported over the airfoil suction side, increasing the overall lift and creating a nose-down pitching moment. As the leading-edge vortex (LEV) approaches the trailing edge, a trailing-edge vortex (TEV) forms, pushing the LEV away from the airfoil. A more complete discussion of the flow dynamics can be found in Visbal (2011) and D’Lelis et al. (2019).
Figure 9 shows iso-surfaces of Q-criterion colored by pressure, where it is possible to observe the main flow features described above. Figures 9(a) and (b) allow a comparison of the 3D solutions of the FOM and ROM, respectively, for the leading edge vortex formation. Figures 9(c) and (d) show a similar comparison for the instant when the trailing edge vortex forms. For both instants, one can observe that the ROM is able to reconstruct the larger-scale features of the 3D flow. A video comparing the FOM and ROM solutions for this flow configuration is provided together with the manuscript. The ROM is trained using the first 2 cycles of the plunging motion and the solutions presented in figure 9 are computed for the fourth cycle, showing that the model is able to reproduce the 3D flow dynamics beyond the training window. For this study, the parameters employed in algorithm 2 are presented in table 6 and the best set of hyperparameters, obtained via random search, is listed in table 7. It is important to mention that, for this case, Bayesian optimization was unable to produce stable and accurate models.
[Table 7: best set of hyperparameters — DNN architecture and ELU activation function]
The first 16 SPOD modes are employed to reduce the dimensionality of the input data in the 3D flow reconstruction. Reconstructions with different numbers of modes were also tested, and it was observed that the first 16 modes contained the main features of the leading and trailing edge vortex formation. For example, adding 10 more modes did not significantly improve the solution and, beyond mode 26, the SPOD temporal modes presented complex behavior, being composed of several frequencies, which complicated the training stage of the DNNs. This issue could be mitigated by running the simulation for a longer period with a smaller time step to improve the convergence of the POD modes. One should note that the spectral proper orthogonal decomposition allows an energy shift of the modes to obtain frequency-filtered temporal dynamics. In this sense, the high-frequency noise observed in the POD temporal modes is not discarded but is shifted to higher SPOD modes. This procedure allows a better pairing of POD modes if the coherent structures have periodicity. In the present study, due to the unsteady boundary conditions and the lack of symmetry in the flow, we do not expect pairing of the SPOD modes.
For the present turbulent flow, the temporal modes of the standard snapshot POD are composed of several frequencies and, moreover, contain some noise that deteriorates the training of the DNNs. The SPOD provides the most energetic modes for specific frequency bands, allowing a better identification of the individual modes and smoothing out the temporal coefficients. Here, a box filter is applied to 25% of the POD correlation matrix. We observed that filter widths in the range of 20% to 40% provide good results for the model reconstructions. If a smaller filter window is employed, noise can still be present in the temporal modes. On the other hand, larger filter windows may considerably modify the temporal modes, transforming them into quasi-sinusoidal signals. This is undesirable since the relevant dynamics of the flow can be lost and too many SPOD modes may be required for the construction of the ROM. An example of the impact of SPOD on the characteristics of temporal modes 2 and 16 can be seen in figure 10. In this example, the second POD mode exhibits high-frequency noise which is filtered by the SPOD. Meanwhile, POD mode 16 shows a complicated pattern due to the contribution of several frequencies. This temporal coefficient is considerably simplified by the application of SPOD, as can be observed in the figure. It is clear that, in contrast to the cylinder case, the POD modes for the plunging airfoil contain more complex dynamics composed of multiple Fourier modes.
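A minimal sketch of this filtering step, assuming the Sieber-type SPOD in which the snapshot correlation matrix is box-filtered along its diagonals (the loop-based implementation is unoptimized, for clarity; the filter-width convention is an assumption):

```python
import numpy as np

def spod_filter(R, frac=0.25):
    # Box-filter the snapshot correlation matrix R along its diagonals;
    # frac sets the filter width as a fraction of the matrix size
    n = R.shape[0]
    half = max(1, int(frac * n / 2))   # filter half-width (assumed convention)
    S = np.zeros_like(R)
    for i in range(n):
        for j in range(n):
            # window along the diagonal through (i, j), truncated at edges
            lo = max(-i, -j, -half)
            hi = min(n - 1 - i, n - 1 - j, half)
            window = [R[i + k, j + k] for k in range(lo, hi + 1)]
            S[i, j] = np.mean(window)
    return S
```

The eigenvectors of the filtered matrix `S` then provide the smoothed SPOD temporal coefficients in place of the noisy snapshot-POD ones.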
In order to further evaluate the present DNN approach, the ROM reconstructions are also performed for spanwise-averaged solutions of the flow, and the reduced order model is constructed following the procedures defined in section 2. We employ 6 plunging cycles (2244 snapshots) for computing the models, discarding initial flow transients. The training data contains the first two cycles (748 snapshots) of the FOM and the remaining data is used as the test set. The parameters employed in algorithm 2 are the same as those presented in table 6 and the best set of hyperparameters, obtained via random search, is listed in table 8. Here, we apply the box filter to 25% of the POD correlation matrix and employ the first 10 SPOD modes for the ROM construction, which is sufficient to recover the most important features of the dynamic stall solution for the spanwise-averaged study.
[Table 8: best set of hyperparameters — DNN architecture and ELU activation function]
Figures 11 and 12 present snapshots of density and x-component of momentum, respectively, for the FOM and ROM. A video comparing these solutions for the spanwise-averaged flow is also provided together with the manuscript. The flow features of the dynamic stall can be observed for different stages of the plunging cycle, given by different values of in the figures. It is possible to track the LEV over the suction side of the airfoil. The current ROM is able to recover the important dynamics of the leading edge vortex formation, transport and ejection, as well as the trailing edge vortex formation and ejection. One should note that the results presented in these figures are obtained for a plunging cycle beyond the training region. Therefore, the current ROM is capable of reproducing the flow dynamics for the test set. The high-wavenumber features shown in the FOM are not present in the SPOD modes and, hence, are not recovered by the ROM. These features would appear if additional SPOD modes were employed in the model reconstruction. However, the overall cost of the simulations would increase considerably if accurate higher POD modes were required. For the present study, we observed that the fine-scale flow dynamics are related to the higher modes, since most of the energetic content is related to the airfoil plunging motion represented by the first POD modes.
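The reconstructed fields shown in these figures follow from the standard rank-k modal expansion; a minimal sketch, assuming orthonormal discrete spatial modes and illustrative function names:

```python
import numpy as np

def reconstruct_field(q_mean, phi, a):
    # Rank-k reconstruction from k spatial modes phi (n_points x k)
    # and temporal coefficients a (n_snapshots x k):
    #   q(x, t_j) ~ q_mean(x) + sum_i a_ji phi_i(x)
    return q_mean[None, :] + a @ phi.T

def project(q, q_mean, phi):
    # Temporal coefficients by orthogonal projection (assumes the
    # discrete modes are orthonormal columns of phi)
    return (q - q_mean[None, :]) @ phi
```

Truncating `phi` to the first 10 SPOD modes reproduces the kind of low-order field shown here: the large-scale vortex dynamics survive, while content in the discarded modes is filtered out.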
Fluctuation time histories are presented in figure 13 for density and momentum. These properties are computed by the FOM and ROM at probe locations in the proximity of the leading and trailing edges, at and , respectively. It is important to mention that the airfoil leading edge is at the origin of the Cartesian system. The plots show the training and test regions demonstrating that the reduced order model is stable beyond the training set and can accurately reproduce the main dynamics of the flow. The fast oscillations resolved by the full order model are not represented due to the SPOD basis truncation. However, all the relevant features of the flow fluctuations are captured by the DNN model obtained with the first 10 SPOD modes.
In figure 14, one can observe a comparison between reduced order models built using the current DNN approach and sparse regression. For this case, both the LASSO and Ridge regression techniques were tested with the original SINDy algorithm. Although the LASSO presented a higher computational cost, it provided better models than the Ridge regression. The computational cost of the DNN regression is, on average, 40 times higher than that of SINDy. However, as shown for SPOD modes 1 and 8, the SINDy algorithm was not able to provide stable models, and the same is true for the other SPOD modes. Sparse regression can learn the dynamics during the training window and also for a few cycles in the test set, but its solution is not stable for long-time predictions. On the other hand, despite the higher cost, the DNN approach presents good long-time predictive capabilities with stable and accurate solutions beyond the training window.
In order to evaluate the robustness of the present framework, we test the capability of the ROM to reconstruct the flowfield using a time step different from that of the simulation. Figure 15 shows the reconstruction of SPOD temporal modes using different finite difference schemes and time steps. The derivatives of the temporal modes are computed via an explicit second-order scheme and by sixth- and tenth-order compact schemes. When the reduced order model is constructed using the same time step as the numerical simulation, the first 10 SPOD modes are highly resolved in time. For this case, all three schemes for computing the temporal derivatives provide accurate results and the random search is able to find a set of hyperparameters which results in an accurate ROM. However, when the DNN is trained using SPOD modes obtained with a time step five times larger than that of the simulation, the resolution of the higher POD temporal modes can become compromised. It is then important to use the compact schemes to obtain accurate representations of the derivatives. In this case, the random search could only find suitable ROMs using the compact schemes. As shown in figure 15, these models are still stable and reproduce the same dynamics observed with smaller time steps. This shows that the current ROM framework is robust and can be constructed using a subset of the full order model data, with lower temporal resolution.
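For a uniform grid, the interior sixth-order compact scheme of Lele (1992) can be sketched as below. The periodic closure via a circulant solve is an illustrative simplification: the temporal modes are not periodic, so one-sided boundary stencils would be needed in practice:

```python
import numpy as np
from scipy.linalg import solve_circulant

def compact_d1_periodic(f, h):
    # Sixth-order compact first derivative on a uniform periodic grid
    # (interior scheme of Lele, 1992):
    #   (1/3) f'_{i-1} + f'_i + (1/3) f'_{i+1}
    #     = (14/9)(f_{i+1} - f_{i-1})/(2h) + (1/9)(f_{i+2} - f_{i-2})/(4h)
    n = f.size
    alpha, a, b = 1.0 / 3.0, 14.0 / 9.0, 1.0 / 9.0
    rhs = (a * (np.roll(f, -1) - np.roll(f, 1)) / (2 * h)
           + b * (np.roll(f, -2) - np.roll(f, 2)) / (4 * h))
    # tridiagonal-with-wraparound system solved as a circulant matrix
    c = np.zeros(n)
    c[0], c[1], c[-1] = 1.0, alpha, alpha
    return solve_circulant(c, rhs)

def explicit_d1_periodic(f, h):
    # Second-order central difference, for comparison
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * h)
```

On marginally resolved signals the compact scheme retains spectral-like accuracy where the explicit second-order stencil does not, which mirrors the behavior observed with the coarsely sampled SPOD modes.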
Finally, we compute the phase-averaged aerodynamic coefficients for the plunging SD7003 airfoil. Lift and drag coefficients are shown in figure 16 for the FOM and ROM using the six cycles computed in the training and test sets. Both the pressure and friction forces are accounted for in the calculation of the aerodynamic coefficients; however, for this case, the pressure component dominates the aerodynamic loads. The effective incidence angles are depicted in the figures and the aerodynamic coefficients are calculated using the tenth-order compact scheme for the derivatives of the temporal modes and the first 10 SPOD modes. As a validation of the current solutions, the results from Visbal (2011) are also shown. Results are presented for the reduced order models computed using both the pressure and kinetic energy norms. The DNNs could find stable and accurate models for both POD norms and the aerodynamic coefficients obtained for the best models are compared in the figure. As one can see, the present ROM solutions agree well with both the current LES and Visbal (2011). The total simulation cost of the FOM for this case was 100,000 hours. On the other hand, the full cost of the ROM, including the optimization of the hyperparameters plus the training and evaluation of 700 models, required approximately 7 hours. Hence, the computational cost of the training procedure is around 100 models per hour on a single GPU. Once a reduced order model is chosen, the cost of the simulation is reduced to a few seconds.
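Phase averaging over the plunging cycles amounts to folding the time history cycle by cycle; a minimal sketch, assuming uniform sampling, an integer number of samples per cycle, and illustrative names:

```python
import numpy as np

def phase_average(signal, n_cycles, n_per_cycle):
    # Fold a uniformly sampled time history into n_cycles periodic
    # cycles and average sample-by-sample across the cycles
    s = np.asarray(signal, float)[:n_cycles * n_per_cycle]
    return s.reshape(n_cycles, n_per_cycle).mean(axis=0)
```

Applied to the lift or drag history, the result is one averaged cycle as a function of the phase angle, as plotted in figure 16.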
5 Conclusions
We present a methodology for constructing reduced order models combining flow modal decomposition and regression analysis via deep feedforward neural networks (DNNs). The framework is implemented in a context similar to that of the sparse identification of nonlinear dynamics (SINDy) algorithm recently proposed in the literature. The details of the methodology are described, including algorithm charts that facilitate the understanding and implementation of the proposed framework. The source code can be downloaded from <http://cces.unicamp.br/software/>.
The method is tested for different problems involving nonlinear dynamical systems: the canonical nonlinear damped oscillator, the flow past a circular cylinder at a low Reynolds number and the turbulent flow past a plunging SD7003 airfoil under deep dynamic stall. For the latter two cases, the compressible Navier-Stokes equations are solved in full contravariant form and additional noninertial terms are added to the equations to simulate the airfoil plunging motion. Numerical simulations are performed using a high-order compact finite difference flow solver. Then, the high fidelity results from the simulations are used as the input data for the construction of the reduced order models.
To create stable and accurate ROMs of more complex flows, the application of the spectral proper orthogonal decomposition was found to be necessary. Hence, in order to filter the high-frequency content present in the temporal modes of the classical snapshot POD, we apply a filter function to the correlation matrix (SPOD approach). The proposed numerical framework allows the prediction of the flowfield beyond the training window using larger time increments than those employed by the full order model, which demonstrates the robustness of the current ROMs constructed via deep feedforward neural networks. The resolution of the numerical schemes used to compute the POD temporal mode derivatives is shown to play an important role when larger time increments are employed in the construction of the reduced order models. In this case, the higher POD modes are composed of a broad range of frequencies and the accurate representation of the temporal derivatives is crucial for obtaining ROMs via regression analysis.
A discussion on the optimization of hyperparameters for obtaining the best ROMs via deep neural networks is provided, together with a description of the effects of individual hyperparameters on model performance. Here, we test random search and Bayesian optimization, which are the most widely used procedures for hyperparameter optimization. To report the performance of each model from a set of parameters, we compute the mean absolute error (MAE) over the training data and choose the candidate models with the lowest MAE values to provide the best parameters. Following this approach, random search produced the best models at lower computational costs for all cases investigated in this work. It is worth mentioning that, in order to reduce the set of parameters for optimization, we first selected the hyperparameters related to the optimization step, such as the learning rate and the number of iterations, by manual search. We also chose the exponential linear unit activation function over the hyperbolic tangent function for the current problems since it provided good results with fewer iterations.
Using 10 POD modes, 99% of the flow energy is recovered in the cylinder flow study. The ROM obtained for this case shows an excellent agreement with the FOM. Even when only 2 POD modes are employed in the reconstruction, the ROM is still stable beyond the training set and captures most of the dynamics of the vortex shedding and sound wave propagation. Both the DNN and the original SINDy approaches are tested for the reconstruction of the transient regime of an incompressible flow past a cylinder. In this case, regression is performed for the 2 most energetic POD modes of the flow, obtained from Brunton et al. (2016). The DNN model is able to accurately recover both the transient and limit cycle solutions of the test data for both POD modes. On the other hand, sparse regression reconstructs the transient portion of the dynamics but not the long-term prediction of the limit cycle.
In the dynamic stall configuration, the flow is turbulent along the airfoil suction side, mostly during the downstroke motion. The complex flow dynamics of this case exhibit unpaired POD modes with high-frequency noise, and we show how the SPOD approach modifies the temporal modes for this study. Reduced order models are constructed both for the full 3D and the spanwise-averaged flow solutions. In both cases, the ROMs are able to capture the dynamics of the leading edge stall vortex, including its formation, transport and ejection, and of the trailing edge vortex. The total simulation cost of the FOM for this case was 100,000 hours. On the other hand, the full cost of the ROM was approximately 7 hours on a single GPU; this cost already includes the optimization of hyperparameters, plus the training and evaluation of 700 models. When the best model is chosen, the cost of the simulation is reduced to a few seconds. The data obtained by the DNN-ROMs were used to compute aerodynamic coefficients of the dynamic stall, and good agreement was found with the LES used as the full order model. Again, we compare models obtained by DNNs and sparse regression and show that, despite its higher computational cost, the DNN approach presents good long-time predictive capabilities with stable and accurate solutions beyond the training window. In contrast, the best model obtained from sparse regression could learn the dynamics during the training window and for a few cycles in the test set, but its solution was not stable for long-time predictions. We expect that, in a future work, the current methodology can be further improved for applications in flow control and for extrapolation to flow configurations other than those used for training.
Acknowledgements
The authors acknowledge the financial support received from Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP, under Grants No. 2013/082937 and No. 2013/073750, and from Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq, under Grants No. 304335/20185 and 407842/20187. The support provided by CNPq through a scholarship for the first author is likewise gratefully appreciated. We also thank CEPIDCeMEAI for providing the computational resources used in this work. We gratefully acknowledge Brener D’Lelis and Tulio Ricciardi for providing the high fidelity dynamic stall and cylinder data employed in the current work.
References
 Abadi et al. (2015) Abadi, M. et al. 2015 TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
 Akaike (1973) Akaike, H. 1973 Information Theory and an Extension of the Maximum Likelihood Principle, pp. 199–213. New York, NY: Springer New York.
 Aris (1989) Aris, R. 1989 Vectors, Tensors, and the Basic Equations of Fluid Mechanics. Dover Publications.
 Baik et al. (2009) Baik, Y. S., Rausch, J., Bernal, L. & Ol, M. V. 2009 Experimental investigation of pitching and plunging airfoils at Reynolds numbers between 1 x 10^{4} and 6 x 10^{4}. In AIAA Paper.
 Balajewicz et al. (2016) Balajewicz, M., Tezaur, I. & Dowell, E. 2016 Minimal subspace rotation on the Stiefel manifold for stabilization and enhancement of projection-based reduced order models for the compressible Navier-Stokes equations. Journal of Computational Physics 321, 224–241.
 Beam & Warming (1978) Beam, R. M. & Warming, R. F. 1978 An implicit factored scheme for the compressible Navier-Stokes equations. AIAA Journal 16 (4), 393–402.
 Bengio (2009) Bengio, Y. 2009 Learning deep architectures for AI. Found. Trends Mach. Learn. 2 (1), 1–127.
 Bergstra & Bengio (2012) Bergstra, J. & Bengio, Y. 2012 Random search for hyperparameter optimization. J. Mach. Learn. Res. 13, 281–305.
 Brochu et al. (2010) Brochu, E., Cora, V. M. & Freitas, N. 2010 A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. CoRR abs/1012.2599, arXiv: 1012.2599.
 Brunton & Noack (2015) Brunton, S. L. & Noack, B. R. 2015 Closedloop turbulence control: Progress and challenges. Applied Mechanics Reviews 67, 050801.
 Brunton et al. (2016) Brunton, S. L., Proctor, J. L. & Kutz, N. J. 2016 Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), 3932–3937, arXiv: http://www.pnas.org/content/113/15/3932.full.pdf.
 Carlberg et al. (2011) Carlberg, K., Bou-Mosleh, C. & Farhat, C. 2011 Efficient nonlinear model reduction via a least-squares Petrov-Galerkin projection and compressive tensor approximations. International Journal for Numerical Methods in Engineering 86 (2), 155–181, arXiv: https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.3050.
 Cazemier et al. (1998) Cazemier, W., Verstappen, R. W. C. P. & Veldman, A. E. P. 1998 Proper orthogonal decomposition and low-dimensional models for driven cavity flows. Physics of Fluids 10 (7), 1685.
 Chaturantabut & Sorensen (2010) Chaturantabut, S. & Sorensen, D. 2010 Nonlinear model reduction via discrete empirical interpolation. SIAM Journal of Scientific Computing 32, 2737–2764.
 Clainche & Vega (2017) Clainche, S. & Vega, J. M. 2017 Higher order dynamic mode decomposition to identify and extrapolate flow patterns. Physics of Fluids 29, 084102.
 Clevert et al. (2015) Clevert, D., Unterthiner, T. & Hochreiter, S. 2015 Fast and accurate deep network learning by exponential linear units (ELUs). CoRR abs/1511.07289, arXiv: 1511.07289.
 D’Lelis et al. (2019) D’Lelis, B. O. R., Wolf, W. R., Yeh, C. & Taira, K. 2019 Modal analysis of deep dynamic stall for a plunging airfoil. In AIAA Scitech 2019. AIAA.
 Goodfellow et al. (2016) Goodfellow, I., Bengio, Y. & Courville, A. 2016 Deep Learning. The MIT Press.
 Green et al. (2007) Green, M. A., Rowley, C. W. & Haller, G. 2007 Detection of Lagrangian coherent structures in three-dimensional turbulence. Journal of Fluid Mechanics 572, 111–120.
 Haller (2015) Haller, G. 2015 Lagrangian coherent structures. Annual Review of Fluid Mechanics 47, 137–162.
 Juang & Pappa (1985) Juang, J. N. & Pappa, R. S. 1985 An eigensystem realization algorithm for modal parameter identification and model reduction. Journal of Guidance, Control, and Dynamics 8, 620–627.
 Kennedy et al. (1999) Kennedy, C. A., Carpenter, M. H. & Lewis, R. M. 1999 Lowstorage, explicit RungeKutta schemes for the compressible NavierStokes equations. ICASE Report No. 9922.
 Kingma & Ba (2014) Kingma, D. P. & Ba, J. 2014 Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR) , arXiv: 1412.6980.
 Krizhevsky (2009) Krizhevsky, A. 2009 Learning multiple layers of features from tiny images. Tech. Rep..
 Kutz (2017) Kutz, J. N. 2017 Deep learning in fluid dynamics. Journal of Fluid Mechanics 814, 1–4.
 Lele (1992) Lele, S. K. 1992 Compact finite difference schemes with spectrallike resolution. Journal of Computational Physics 103 (1), 16–42.
 Ling et al. (2016) Ling, J., Kurzawski, A. & Templeton, J. 2016 Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics 807, 155–166.
 Ling & Templeton (2015) Ling, J. & Templeton, J. 2015 Evaluation of machine learning algorithms for prediction of regions of high Reynolds averaged Navier Stokes uncertainty. Physics of Fluids 27, 085103.
 Luhar et al. (2014) Luhar, M., Sharma, A. S. & McKeon, B. J. 2014 On the structure and origin of pressure fluctuations in wall turbulence: predictions based on the resolvent analysis. Journal of Fluid Mechanics 751, 38–70.
 Lumley (1967) Lumley, J. L. 1967 The structure of inhomogeneous turbulence. In Atmospheric Turbulence and Wave Propagation, pp. 166–178. Moscow: Nauka.
 Nagarajan et al. (2003) Nagarajan, S., Lele, S. K. & Ferziger, J. H. 2003 A robust highorder compact method for large eddy simulation. Journal of Computational Physics 191 (2), 392–419.
 Nair & Hinton (2010) Nair, V. & Hinton, G. E. 2010 Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning, pp. 807–814. USA: Omnipress.
 Noack et al. (2003) Noack, B. R., Afanasiev, K., Morzynski, M., Tadmor, G. & Thiele, F. 2003 A hierarchy of lowdimensional models for the transient and posttransient cylinder wake. Journal of Fluid Mechanics 497, 335–363.
 Ol et al. (2009) Ol, M. V., Bernal, L., Kang, C. & Shyy, W. 2009 Shallow and deep dynamic stall for flapping low Reynolds number airfoils. Experiments in Fluids 46 (5), 883–901.
 Olson (2012) Olson, B. J. 2012 Largeeddy simulation of multimaterial mixing and overexpanded nozzle flow. PhD thesis, Stanford University.
 Östh et al. (2014) Östh, J., Noack, B. R., Krajnović, S., Barros, D. & Borée, J. 2014 On the need for a nonlinear subscale turbulence term in POD models as exemplified for a highReynoldsnumber flow over an Ahmed body. Journal of Fluid Mechanics 747, 518–544.
 Pan & Duraisamy (2018) Pan, S. & Duraisamy, K. 2018 Longtime predictive modeling of nonlinear dynamical systems using neural networks. ArXiv eprints , arXiv: 1805.12547.
 Protas et al. (2015) Protas, B., Noack, B. R. & Östh, J. 2015 Optimal nonlinear eddy viscosity in Galerkin models of turbulent flows. Journal of Fluid Mechanics 766, 337–367.
 Ribeiro & Wolf (2017) Ribeiro, J. H. M. & Wolf, W. R. 2017 Identification of coherent structures in the flow past a NACA0012 airfoil via proper orthogonal decomposition. Physics of Fluids 29 (8), 085104.
 Rockwood et al. (2016) Rockwood, M. P., Taira, K. & Green, M. A. 2016 Detecting vortex formation and shedding in cylinder wakes using Lagrangian coherent structures. AIAA Journal 55, 15–23.
 Rowley et al. (2004) Rowley, C. W., Colonius, T. & Murray, R. M. 2004 Model reduction for compressible flows using POD and Galerkin projection. Physica D: Nonlinear Phenomena 189 (1-2), 115–129.
 Rudy et al. (2017) Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. 2017 Datadriven discovery of partial differential equations. Science Advances 3 (4), arXiv: http://advances.sciencemag.org/content/3/4/e1602614.full.pdf.
 Rudy et al. (2018) Rudy, S. H., Kutz, J. N. & Brunton, S. L. 2018 Deep learning of dynamics and signalnoise decomposition with timestepping constraints. ArXiv eprints , arXiv: 1808.02578.
 San & Maulik (2018) San, O. & Maulik, R. 2018 Extreme learning machine for reduced order modeling of turbulent geophysical flows. Phys. Rev. E 97, 042322.
 Schmid (2010) Schmid, P. J. 2010 Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics 656, 5–28.
 Schwarz (1978) Schwarz, G. 1978 Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
 Sharma et al. (2016) Sharma, A. S., Moarref, R., McKeon, B. J., Park, J. S., Graham, M. D. & Willis, A. P. 2016 Lowdimensional representations of exact coherent states of the NavierStokes equations from the resolvent model of wall turbulence. Physical Review E 93, 021102.
 Sieber et al. (2016) Sieber, M., Paschereit, C. O. & Oberleithner, K. 2016 Spectral proper orthogonal decomposition. Journal of Fluid Mechanics 792, 798–828.
 Sirovich (1986) Sirovich, L. 1986 Turbulence and the dynamics of coherent structures. Part I: Coherent structures. Quarterly of Applied Mathematics 45, 561–571.
 Snoek et al. (2012) Snoek, J., Larochelle, H. & Adams, R. P. 2012 Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25, pp. 2951–2959. Curran Associates, Inc.
 Ströfer et al. (2019) Ströfer, C. M., Wu, J. L., Xiao, H. & Paterson, E. 2019 Datadriven, physics based feature extraction from fluid flow fields using convolutional neural networks. Communications in Computational Physics 25, 625–650.
 Taira et al. (2017) Taira, K., Brunton, S. L., Dawson, S. T. M., Rowley, C. W., Colonius, T., McKeon, B. J., Schmidt, O. T., Gordeyev, S., Theofilis, V. & Ukeiley, L. S. 2017 Modal analysis of fluid flows: An overview. AIAA Journal 55 (12), 4013–4041.
 Towne et al. (2018) Towne, A., Schmidt, O. T. & Colonius, T. 2018 Spectral proper orthogonal decomposition and its relationship to dynamic mode decomposition and resolvent analysis. Journal of Fluid Mechanics 847, 821–867.
 Tu et al. (2014) Tu, J. H., Rowley, C. W., Luchtenburg, D. M., Brunton, S. L. & Kutz, J. N. 2014 On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics 1, 391–421.
 Visbal (2011) Visbal, M. R. 2011 Numerical investigation of deep dynamic stall of a plunging airfoil. AIAA Journal 49, 2152–2170.
 Vlachas et al. (2018) Vlachas, P. R., Byeon, W., Wan, Z. Y., Sapsis, T. P. & Koumoutsakos, P. 2018 Datadriven forecasting of highdimensional chaotic systems with long shortterm memory networks. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 474 (2213), arXiv: http://rspa.royalsocietypublishing.org/content/474/2213/20170844.full.pdf.
 Wan et al. (2018) Wan, Z. Y., Vlachas, P., Koumoutsakos, P. & Sapsis, T. 2018 Dataassisted reducedorder modeling of extreme events in complex dynamical systems. PLOS ONE 13 (5), 1–22.
 Wang et al. (2017) Wang, J. X., Wu, J. L. & Xiao, H. 2017 Physicsinformed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data. Physical Review Fluids 2, 034603.
 Wolf (2011) Wolf, W. R. 2011 Airfoil aeroacoustics: Les and acoustic analogy predictions. PhD thesis, Stanford University.
 Wolf et al. (2012a) Wolf, W. R., Azevedo, J. L. F. & Lele, S. K. 2012a Convective effects and the role of quadrupole sources for aerofoil aeroacoustics. Journal of Fluid Mechanics 708, 502–538.
 Wolf et al. (2012b) Wolf, W. R., Lele, S. K., Jothiprasard, G. & Cheung, L. 2012b Investigation of noise generated by a DU96 airfoil. In 18th AIAA/CEAS Aeroacoustics Conference (33rd AIAA Aeroacoustics Conference).
 Glorot & Bengio (2010) Glorot, X. & Bengio, Y. 2010 Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (ed. Yee Whye Teh & Mike Titterington), Proceedings of Machine Learning Research, vol. 9, pp. 249–256. Chia Laguna Resort, Sardinia, Italy: PMLR.
 Zimmermann & Willcox (2016) Zimmermann, R. & Willcox, K. 2016 An accelerated greedy missing point estimation procedure. SIAM Journal on Scientific Computing 38, 2827–2850.