Learning physics-based reduced-order models for a single-injector combustion process

Learning physics-based reduced-order models for a single-injector combustion process


This paper presents a physics-based data-driven method to learn predictive reduced-order models (ROMs) from high-fidelity simulations, and illustrates it in the challenging context of a single-injector combustion process. The method combines the perspectives of model reduction and machine learning. Model reduction brings in the physics of the problem, constraining the ROM predictions to lie on a subspace defined by the governing equations. This is achieved by defining the ROM in proper orthogonal decomposition (POD) coordinates, which embed the rich physics information contained in solution snapshots of a high-fidelity computational fluid dynamics (CFD) model. The machine learning perspective brings the flexibility to use transformed physical variables to define the POD basis. This is in contrast to traditional model reduction approaches that are constrained to use the physical variables of the high-fidelity code. Combining the two perspectives, the approach identifies a set of transformed physical variables that expose quadratic structure in the combustion governing equations and learns a quadratic ROM from transformed snapshot data. This learning does not require access to the high-fidelity model implementation. Numerical experiments show that the ROM accurately predicts temperature, pressure, velocity, species concentrations, and the limit-cycle amplitude, with speedups of more than five orders of magnitude over high-fidelity models.

ROM-predicted pressure traces accurately match the phase of the pressure signal and yield good approximations of the limit-cycle amplitude.


 = System matrix for linear part
 = Input matrix
 = Matricized quadratic tensor
 = State vector in finite dimensions
 = External input vector
 = Snapshot matrix
 = Matrix of POD basis vectors
 = Spatial coordinates
 = Number of physical variables
 = Spatial discretization dimension
 = Reduced model dimension
 = Time
 = State vector in primitive, conservative, and learning variables
 = Pressure, continuous and discretized
 = Temperature, continuous and discretized
 = Density, continuous and discretized
 = Specific volume, continuous and discretized
 = Velocity in direction, continuous and discretized
 = Species mass fraction,
 = Species molar concentrations, , also denoted as for chemical species
 = Kronecker product
 = Notation for ROM quantities
CFD  = Computational Fluid Dynamics
GEMS  = General equation and mesh solver; a CFD code
PDE  = Partial differential equation
POD  = Proper orthogonal decomposition
ROM  = Reduced-order model

1 Introduction

This paper presents an approach to learning low-dimensional surrogate models for a complex, nonlinear, multi-physics, multi-scale problem in form of a multi-species combustion process. The need for repeated model evaluations in optimization, design, uncertainty quantification and control of aerospace systems has driven the development of reduced-order models (ROMs) for applications in aerodynamics [1, 2, 3, 4, 5, 6, 7], reacting flows [8, 9, 10] and combustion [11, 12, 13]. ROMs combine the rich information embedded in high-fidelity simulations with the efficiency of low-dimensional surrogate models; yet, effective and robust ROM methods for nonlinear, multi-scale applications such as combustion have remained an open challenge.

Most existing nonlinear model reduction methods are intrusive—that is, they derive the ROM by projecting the high-fidelity model operators onto a low-dimensional subspace. In doing so, the physics of the problem is embedded in the reduced-order representation. The proper orthogonal decomposition (POD) [14, 15] is the most common way to define the low-dimensional subspace, using the singular value decomposition to identify low-dimensional structure based on training data. For some problems, the projection approach is amenable to rigorous error analysis and structure-preservation guarantees [16, 17, 18, 19], but these rigorous guarantees do not apply to nonlinear, multi-physics, multi-scale models, for which projection-based ROMs remain challenging to implement (due to the need for access to the high-dimensional operators). Furthermore, ROMs for these problems typically require relatively high dimensionality (and thus high cost) to avoid problems with robustness and stability [12, 20]

There is increasing attention to non-intrusive model reduction methods (sometimes called black-box or data-driven methods) that learn a model based on training data, without requiring explicit access to the high-fidelity model operators. The non-intrusive philosophy aligns directly with the field of machine learning, where representations such as neural networks have been shown to induce nonlinear model forms that can approximate many physical processes [21]. However, neural networks require a large amount of training data, limiting their utility when the data comes from expensive large-scale partial differential equation (PDE) simulations [22]. Moreover, the parametrization of the learned low-dimensional model is critical to the predictive accuracy and success of the learned model—in particular, it is critical to determining whether the ROM can issue reliable predictions in regimes outside of the training data.

For large-scale PDE models, an important class of non-intrusive learning approaches tackle this challenge of model parametrization by embedding the structure of the problem into the learning formulation. Some approaches use sparse learning techniques to identify PDE model terms that explain the data [23, 24, 25]. When the model can be expressed in the form of a dynamical system with polynomial terms, then the learning problem can be formulated as a parameter estimation problem, as in the operator inference approach of [26]. An important advantage of non-intrusive learning approaches is that the user has the flexibility to choose the variables that drive the learning. This opens the way for variable transformations that expose system structure and, in doing so, transform the ROM learning task into a structured form. In some cases, the governing PDEs naturally admit variable transformations that reveal polynomial form, such as the specific volume representation of the Euler equations [27]. More generally, one can introduce new auxiliary variables to the problem—known as lifting—to produce a system that is polynomial in its expanded set of state variables [28, 29, 30]. This allows for a much broader class of nonlinear systems to be learned using the operator inference framework.

In this work, we build on the operator inference framework to learn structured, polynomial ROMs from simulated snapshot data of a single-injector combustion model. While in our case the polynomial model parametrization is a model approximation, we show that the predictive capabilities of the learned ROM are excellent beyond the training data. Our proposed approach follows the steps below, which we develop in detail in the ensuing sections:

  1. We obtain high-dimensional simulation snapshot data for a spatially two-dimensional combustion process from the General Equation and Mesh Solver (GEMS) CFD code [31] developed at Purdue University. The governing equations and combustion problem setup are described in Section 2.

  2. We identify a set of state variables in which many of the terms in the governing equations have quadratic form. We transform the snapshot data to these new state variables, as described in Section 3.3.

  3. We use operator inference to learn a ROM that evolves the combustion dynamics in a low-dimensional subspace. Details of the model learning are given in Section 3 and Section 4.2.

We present numerical results comparing our learned ROMs with GEMS test data in Section 4 and conclude the paper in Section 5.

2 Combustion model

Section 2.1 defines the computational domain under consideration, Section 2.2 presents the governing equations for the combustion model, and Section 2.3 briefly summarizes the numerical implementation. The combustion model follows the implementation of the General Equation and Mesh Solver (GEMS) CFD code [31] and more details can be found in Huang et al. [11].

2.1 Computational domain

A single-injector combustor as in [32] is shown in Figure 0(a), with the computational domain outlined in red dashed lines. Our domain is a simplified two-dimensional version of the computational domain, shown in Figure 0(b), which also shows the four locations where we monitor the state variables.

(a) Combustor assembly.
(b) The upper half (due to symmetry) of the computational domain and the monitor locations where we measure the state variables.
Figure 1: Setup and geometry of single-injector combustor.

2.2 Governing equations

The dynamics of the combustor are governed by the conservation equations for mass, momentum, energy and species mass fractions. For this two-dimensional problem, the conservation equations are


and they describe the evolution of the conservative variables

where is the density (), and are the and velocity (), is the total energy (), and is the th species mass fraction with and is defined as the number of chemical species that are included in the model.

The total energy is defined as


where pressure is given in (Pa), is the enthalpy corresponding to the th species and is a highly nonlinear function of temperature, , and is the stagnation enthalpy. The inviscid flux and viscous flux in Eq. (1) are

The two-dimensional viscous shear tensor is defined as


where is the mixture viscosity coefficient. The diffusive heat flux vector is defined as


where quantifies thermal conductivity and is the diffusion coefficient for the th species into the mixture, which is an approximation used to model the multi-component diffusion as the binary diffusion of each species into a mixture. The two terms in the definition of the heat flux (Eq. (4)) represent heat transfer due to conductivity and species diffusion. The diffusive mass flux vector of species is modeled as


The source term, in Eq. (1) is


and is defined by considering a 1-step combustion reaction governed by

as presented in [33], with . The corresponding general stoichiometric equation is defined as where , HO and is the net stoichiometric coefficients of each species with and . The molar concentration of the th species is denoted by . In our case, , so and are the molar concentrations. Here, we use the standard bracket notation to indicate molar concentration of a species. The general relationship between a species molar concentration, , and a species mass fraction, , is


where is the mass fraction of CH, is the mass fraction of O, is the mass fraction of CO and is the mass fraction of HO. The production rate of the th species in the source term in Eq. 6 is modeled as


where are chemical reaction source terms whose dynamics are described below and is the reaction rate. The molar mass of CH is , the molar mass of O is , the molar mass of CO is , and the molar mass of HO is . The reaction rate is approximated by

where is the number of reactants, is the rate coefficient and is the reaction order of the th reactant. In our case and . The rate coefficient, , is described by the Arrhenius equation as


where is the universal gas constant, is the pre-exponential constant and is the energy required to reach a chemical reaction, measured in Joules and referred to as the activation energy. In this work, we use the ideal gas state equation that relates density and pressure to temperature


where and is the average molar mass of the mixtures.

At the downstream end of the combustor, we impose a non-reflecting boundary condition while maintaining the chamber pressure via


where Pa, and Hz. The top and bottom wall boundary conditions are no-slip conditions, and for the upstream boundary we impose constant mass flow at the inlets.

2.3 Numerical model

GEMS uses the finite volume method to discretize the governing equations, which are solved in primitive variables. For a spatial discretization with cells, this results in a -dimensional system of nonlinear ordinary differential equations (ODEs)


for , where is the number of unknowns in the PDE governing equations and here (four flow variables and four species concentrations). In Eq. (12), is the discretized state vector at time (for GEMS, it is the discretization of the primitive variables ), are the specified initial conditions, and is the time derivative of the state vector at time . The inputs arise from the time-dependent boundary condition, defined in Eq. (11), applied at the combustor downstream end. The nonlinear function maps the discretized states and the input to the state time derivatives, representing the spatial discretization of the governing equations described in Section 2.2.

Solving these high-dimensional nonlinear ODEs is expensive, motivating the derivation of a ROM that can yield approximate solutions at reduced cost. The nonlinear multi-scale dynamics represented by these equations makes this a challenging task. To maintain computational efficiency in the ROM, state-of-the-art nonlinear model reduction methods combine POD with a sparse interpolation method (often called hyperreduction) by evaluating the nonlinear functions only at a select number of points. For instance, POD together with the discrete empirical interpolation method (DEIM) has had some success, but also encountered problems in combustion applications [12] . Of particular challenge is the need to include a large number of interpolation points in the POD-DEIM approximation, which means that the ROM loses its computational efficiency. Robustness and stability of the POD-DEIM models is also a challenge [20]. In the next section, we present a different approach that uses non-intrusive ROM learning to enable variable transformations that expose system structure. This structure is then exploited in the derivation of the ROM and removes the need for the DEIM approximation.

3 Non-intrusive learning of a combustion reduced model

This section presents our approach to learn ROMs for the unsteady combustion dynamics simulation from GEMS. Section 3.1 writes a general nonlinear system in a form that exposes the underlying structure of the governing equations and shows how projection preserves that structure. Section 3.2 presents the operator inference approach from [26], which learns structured ROM operators from simulation data. Section 3.3 describes variable transformations that lead to the desired polynomial structure for the combustion governing equations presented in Section 2. These transformations yield the structure needed to apply the operator inference approach.

3.1 Projection preserves polynomial structure in the governing equations

Consider a large-scale sytem of nonlinear ODEs written in polynomial form


Relating this equation to the general nonlinear system in Eq. (12), we see that are the terms in that are linear in the state , with ; are the terms in that are quadratic in , with ; are the terms in that are cubic in , with ; are the terms in that are linear in the input , with ; and are constant terms in that do not depend on state or input. The abbreviation “HOT” in Eq. (13) denotes higher-order terms, and represents terms that are quartic and higher order, as well as any other nonlinear terms that cannot be represented in polynomial form.

We emphasize that we are not (yet) introducing approximations—rather, we are explicitly writing out the discretized equations in the form (13) to expose the system structure that arises from the form of the terms in the governing PDEs. For example, a term such as in Eq. (1) is linear in the state , while a term such as is quadratic in the states and . Also note that the term is quadratic in the states and , highlighting the important point that the structure of the nonlinear model depends on the particular choice of state variables.

A projection-based ROM of Eq. (13) preserves the polynomial structure. Approximating the high-dimensional state in a low-dimensional basis , with , we write . Using a Galerkin projection, this yields the ROM of Eq. (13) as


where , , , and are the ROM operators corresponding respectively to , , , and , and is a constant vector. We note again that projection preserves polynomial structure, that is, (14) has the same polynomial form as (13), but in the reduced subspace defined by .

In what follows, we will work with a quadratic system in order to simplify notation. We note that the least squares learning approach described below applies directly to cubic, quartic and all higher-order polynomial terms (although it should be noted that the number of elements in the ROM operators scales with for the cubic operator, for the quartic operator, etc.). For terms in the governing equations that are not in polynomial form (such as terms involving , and the Arrhenius reaction terms) we discuss in Section 3.3 the introduction of variable transformations and auxiliary variables via the process of lifting [28, 29] to convert these terms to polynomial form.

3.2 Operator inference for learning reduced models

Here we summarize the steps of the operator inference approach from [26]. First, we collect snapshots of the state by solving the high-fidelity model. We store the snapshots and the inputs used to generate them in the matrices:


where and with . In general, , so the matrix is tall and skinny. Second, we identify the low-dimensional subspace in which we will learn the ROM. In this work, we use the POD to define the low-dimensional subspace, by computing the singular value decomposition of the snapshot matrix

where , and . The dimensional POD basis, , is given by the first columns of . Third, we project the state snapshot data onto the POD subspace spanned by the columns of and obtain the reduced snapshot matrices


where the columns of are computed from using any time derivative approximation (see, e.g., [34, 35, 36]), or can be obtained—if available—by collecting and projecting snapshots of .

Operator inference solves a least squares problem to find the reduced operators that yield the ROM that best matches the projected snapshot data in a minimum residual sense. For the quadratic ROM


operator inference solves the least squares problem


where is the length column vector with all entries set to unity. Note that this least squares problem is linear in the coefficients of the unknown ROM operators , , and . Also note that the operator inference approach permits us to compute the ROM operators , , and without needing explicit access to the original high-dimensional operators , , and .

We combine the unknown operators of Eq. (17) in the matrix

and the known low-dimensional data in the data matrix


and then solve the minimization problem


It was proven in [26] that Eq. (20) can be written as independent least squares problems of the form , for , where is a column of (row of ) and is a column of . This makes the operator inference approach efficient and scalable.

Regularization becomes necessary to avoid overfitting and to infer operators that produce a stable ROM. In this work, we use an L regularization penalty on the off-diagonal elements of the operator and on all elements of the remaining operators. With this regularization, our least squares problem becomes


where is the regularization parameter and is the identity matrix with the th diagonal set to zero so that we avoid regularizing the diagonal elements of . It should be noted that the regularization parameter, , is problem specific and should be chosen accordingly. In Section 4.2, we discuss details of the operator inference implementation, a method for selecting , and the removal of redundant terms in the least squares problem in Eq. (21).

3.3 A structure-exploiting ROM learning formulation for GEMS

A key contribution of this work is to recognize that the non-intrusive operator inference approach gives us complete flexibility in the set of physical variables we work with to define the ROM. We can identify choices of physical variables that expose the desired polynomial structure in the governing equations, and then extract snapshots for those variables by applying transformations to the snapshot data—we do not need to make any modifications to the high-fidelity CFD simulation model itself. In theory, a classical intrusive ROM approach could work with transformed variables (e.g., in the work of [37]); however, this would involve rewriting the high-fidelity simulator, a task that would be not only time-consuming but also fraught with mathematical pitfalls, especially for unusual choices of variables. This is where the data-driven perspective of machine learning becomes extremely valuable.

The Euler equations admit a quadratic representation in the specific volume variables; in that case, a transformation of the snapshots from conservative (or primitive) variables to specific volume variables can be exploited to create quadratic ROMs [27]. Other PDEs may not admit polynomial structure via such straightforward transformations, but the process of lifting the equations via the introduction of new auxiliary variables can produce a set of coordinates in which the governing equations become polynomial in the lifted state [28, 29, 30]. For example, the tubular reactor example of [29] includes Arrhenius-type reaction terms similar to those in Eq. (9). The introduction of auxiliary variables permits the governing equations to be written equivalently with quartic nonlinearity in the lifted variables.5

Lifting to polynomial form for the GEMS equations described in Section 2 is made difficult by several of the terms, in particular through some of the gas thermal properties such as the nonlinear dependence of enthalpy on temperature. A complete lifting that converts all equations to a polynomial form is possible, but would require the introduction of a large number of auxiliary variables and would also result in the introduction of some algebraic equations. However, as the analysis below shows, the GEMS governing equations admit a transformation for which many terms in the governing equations take polynomial form when we use the variables


Here is the specific volume, and recall that and are the molar concentrations with .

Below, we derive the governing PDEs for specific volume and velocities . These three governing PDEs all turn out to be quadratic in the learning variables . In Appendix A we present the lifting transformations for the source term dynamics in the vector in Eq. (6). In Appendix B we derive the equations governing the pressure and the species molar concentrations . These equations have some terms that are not polynomial in the chosen learning variables .

To keep notation clean in application of the chain rule, let the conservative variables be denoted as . Throughout, we frequently use the relationship


and similarly for . Note also that we are assuming the existence of these partial derivatives, that is, we do not consider the case of problems with discontinuities.

Specific Volume . We use the constitutive relationship for the density in Eq. (1) in the derivation:


Inserting Eq. (23) into the above, we obtain


which is quadratic in the learning variables .

Velocities : We have


and from Eq. (1) we have that as well as . Thus, we obtain


and we get a similar expression for . Both dynamics are quadratic in the learning variables .

As noted above, Appendix A and Appendix B present the derivations for the chemical source terms, pressure, and chemical species.

4 Numerical Results

We now apply the variable transformations and operator inference framework to learn a predictive ROM from GEMS high-fidelity combustion simulation data.6 Section 4.1 describes the problem setup and GEMS dataset. Section 4.2 discusses implementation details and Section 4.3 presents our numerical results. Additional numerical results can be found in [38].

4.1 GEMS Dataset

The computational domain shown in Figure 0(b) is discretized with spatial discretization points. Each CFD state solution thus has dimension . The problem considered here has fuel and oxidizer input streams with constant mass flow rates of and , respectively. The fuel is composed of gaseous methane and the oxidizer is 42% gaseous O and 58% gaseous HO, as described in [11]. The forcing input Eq. (11) is applied at the right side of the domain.

To generate training data, GEMS is simulated for a time duration of 1ms with a time step size of s resulting in snapshots. The GEMS output is transformed to the variables given in Eq. (22). The recorded snapshot matrix is thus

Our numerical experiments were parallelized on a cluster with two computing nodes. Each node has two 10-core Intel Xeon-E5 processors (20 cores per node) and 128 GB RAM. The training data generation took approximately 200h in CPU time for the 1ms, 10000 snapshots of high-fidelity CFD data.

The range of variable values for the training data is shown in Table 2. Note that the data covers a wide range of scales. Pressure is of the order while species concentrations can be as low as 10. This large scaling difference presents a challenge when learning models from data. To deal with the numerical issues related to large differences in scaling and small species concentrations and velocities, we scale each variable to the interval . Variables are scaled before computing the POD basis and projecting the data.

State variable Minimum Mean Maximum
Pressure in Pa
Velocity in -222.930 69.637 307.147
Velocity in -206.990 1.304 186.548
Specific volume in 0.0533 0.220 0.674
Molar concentration 0.0 0.063 1.169
Molar concentration 0.0 0.056 0.097
Molar concentration 0.0 0.002 0.012
Molar concentration 0.0 0.154 0.232
Table 2: Range of variable values for GEMS data.

To obtain snapshots of the projected state time derivative, we approximate the derivative with a five-point approximation . This approximation is fourth order accurate. The first two and last two time derivatives are computed using first-order forward and backward Euler approximations, respectively.

4.2 Learning a quadratic reduced-order model

To learn the operators of the quadratic ROM, we solve the regularized least squares problem shown in Eq. (21). In what follows next, we describe several important implementation details.

Singular value decomposition implementation and POD basis selection

Due to the large size of this dataset, we implement the randomized singular value decomposition algorithm, introduced in [39], to compute the leading 500 singular values and vectors of the snapshot matrix. The randomized singular value decomposition algorithm can be implemented in a scalable way for very large datasets. The POD basis is chosen as the leading left singular vectors. The dimension is typically chosen so that the cumulative energy contained in the subspace is greater than a user specified tolerance , i.e.,

where are the squared singular values of the data matrix . To guide the choice of we also use the relative projection error


Removing redundant terms in least squares problem

Due to the redundant terms computed in in equation (19), the least squares problem may become ill-posed. Thus, the Kronecker product is replaced with the term

where . Each vector is defined, according to [26], as

and is the th element of the vector . Now, instead of learning the operator , the operator is learned, which satisfies the equivalent least squares problem


The least squares problem is again of the form (20) but we use the data matrix and solve for the operators . Once we have solved for the operator we can easily transform it to obtain .


We use an L-regularization (also known as Tikhonov regularization or ridge regression) to solve the operator inference problem, as shown in Eq. (32). The regularization term introduces a trade-off between operators that fit the data well and operators with small values. This regularization is used to avoid overfitting to the data, which in this setting causes our learned ROMs to be unstable. The regularization parameter affects the performance of this algorithm—we require enough regularization to avoid overfitting, but if is too large, the data will be poorly fit.

To help determine appropriate values of , we consider the “L-curve” criterion discussed in [40]. The L-curve is a way of visualizing the effects of different values of on the norm of the residual (data fit) against the norm of the solution. The L-curve criterion recommends choosing a value for that lies in the corner of the curve, nearest the origin. In our numerical experiments, we compute the L-curve to help determine appropriate values for .

Regularization also helps to reduce the condition number of the regularized least squares data matrix, . In Figure 2 we show the condition number of the data matrix from Eq. (32) (with the data already scaled to [-1,1]) when we include and snapshots in the training set. The condition number is quite large for this application, yet it decreases as we add more training data. This effect is due to the fact that as we add more training data, the first POD basis vectors become richer. This in turn means that the projected data in are richer for a given dimension of the POD basis. Our numerical experiments reinforced this finding by confirming that training snapshots were required to achieve a sufficiently rich POD basis to obtain accurate ROMs, an indication of the complexity of the combustor dynamics.

Figure 2: The condition number of the data matrix, , vs. basis size for different sized training sets.

Error measures

To evaluate the ROM performance, we define appropriate error metrics for each variable. The error is computed after the ROM solutions have been reconstructed back into the full CFD model dimension and scaled back to the original variable ranges. Recall Table 2, which showed the range of values for each variable. Below we provide details on how error is computed for each variable:

  • For pressure and temperature, the values are always positive and well above zero, so we use a standard relative error, defined as


    where and denotes the th entry of a vector, and is the CFD solution variable and is the solution variable obtained from the ROM simulation.

  • Due to the small values of species concentrations (on the order of ), dividing by the true value can skew a small error. Similarly, velocities range from positive to negative, including zero. Thus, for species concentrations and velocities, we use a normalized absolute error, defined as


    where and denotes the maximum entry of , i.e., the maximum absolute value over the discretized spatial domain.

4.3 Learned reduced model performance

The given snapshots representing 1ms of GEMS simulation data are used to learn the ROM. We are also given another 1ms of testing data at the monitor locations shown in Figure 0(b), which we use to assess the predictive capabilities of our learned ROMs beyond the range of training data.

The cumulative energy of the singular values of the snapshot data matrix is shown in Figure 3. The singular values that correspond to a cumulative energy of 97.5% and 99% are indicated with the red triangle and green square, respectively. We use basis sizes of , capturing 97.5% of the total energy, and , capturing 99% of the total energy.

Figure 3: The cumulative energy computed with Eq. (31). The leading singular values capture 97.5% or the energy and the leading capture 99%.

In Figures 3(a)3(b) we show the L-curve for each basis size. The L-curve for a basis of is somewhat uninformative in this case. The regularization parameters chosen were E+04 and E+05, which lie in the middle of the L-curve and are therefore a recommended choice by the L-curve criterion discussed above. For a basis of size of , the L-curve indicates a regularization parameter around E+04. Stable systems are produced for E+04 and E+04.

(a) Basis size .
(b) Basis size .
Figure 4: The L-curve for different basis sizes and regularization parameters . The horizontal axis shows the squared norm of learned operators, , and the vertical axis shows the least squares residual, .

We simulate the learned ROM for the two model sizes and with the same initial value and time step size (s) as those used to generate the training set. Figures 5 and 6 compare the time trace of pressure computed by GEMS (our “truth” data) with the ROM predictions for all 20000 time steps at the cell located at in the domain (denoted as monitor location 1 in Figure 0(b)). The performance of the ROM on the training data (first 1ms of data) is good in both cases, although the ROM (Figure 6) is more accurate. For test data predictions beyond the training data, Figure 6 shows that a basis of size of yields accurate phase predictions and pressure oscillation amplitudes that are good approximations of the truth. We notice an improved amplitude prediction when using a smaller regularization value.

(a) .
(b) .
Figure 5: Pressure time traces for basis size . Training with 10000 snapshots. Black vertical line denotes the end of the training data and the beginning of the test data.
(a) .
(b) .
Figure 6: Pressure time traces for basis size of . Training with 10000 snapshots. Black vertical line denotes the end of the training data and the beginning of the test data.

We also compute the average error of each variable over the entire domain at the last time step of the training set (the 10000th time step). The normalized absolute error, defined in Eq. (34), is shown for each species and for and velocity in Figure 7. This figure also shows the relative error, defined in Eq. (33), for pressure and temperature. These plots show that overall, the error is decreasing with an increasing basis size. At a basis size of 18, 22, and 24 the systems were unstable and so these basis sizes are excluded from the figure. The cause of this may be due to the fact that the same regularization parameter was used for each of these, E+04, and ideally one would pick a parameter specific for the basis size.

Figure 7: Error measures vs. basis size, averaged over the spatial domain at the last time step of training data. Normalized absolute error (34) of species and and velocities; Relative error (33) given for pressure and temperature.

In Figure 8, we show the integrated species concentrations over time. To compute these, at each time step in our simulation, we sum all elements of a species vector. This measure monitors whether our ROM conserves species mass, an important feature of a physically meaningful simulation. As the discretization of the high-fidelity model becomes finer, point-wise error may become large and misleading if the mass is shifted slightly into the neighboring cells. The integrated species concentration complements the evaluation of point-wise errors and provides a global view of the error in the domain.

Figure 8: Integrated species at each time step for different basis sizes. Training with 10000 snapshots.

We also compare the state variables over the entire domain predicted by the learned ROM with the given GEMS data at the last time step (which corresponds to s). We provide the true field, the ROM-predicted field, and an error field for each variable in Figures 916. Again, for pressure and temperature, we use a relative error from Eq. (33). For and velocity and for species molar concentrations we use a normalized absolute error from Eq. (34). The plots show that the ROM predictions are, as expected, not perfect, but indeed they capture well the overall structure and many details of the solution fields.

(a) True pressure.
(b) Relative error of pressure.
(c) Predicted pressure.
Figure 9: Predictive results for pressure at the last time step of training data. Training with 10000 snapshots, a basis size of , and regularization set to .
(a) True velocity .
(b) Normalized absolute error of velocity .
(c) ROM-predicted velocity .
Figure 10: Predictive results for velocity at the last time step of training data. Training with 10000 snapshots, a basis size of , and regularization set to .
(a) True velocity .
(b) Normalized absolute error of velocity .
(c) ROM-predicted velocity .
Figure 11: Predictive results for velocity at the last time step of training data. Training with 10000 snapshots, a basis size of , and regularization set to .
(a) True temperature.
(b) Relative error of temperature.
(c) Predicted temperature.
Figure 12: Predictive results for temperature at the last time step of training data. Training with 10000 snapshots, a basis size of , and regularization set to .
(a) True CH.
(b) Normalized absolute error of CH.
(c) Predicted CH.
Figure 13: Predictive results for CH molar concentration at the last time step of training data. Training with 10000 snapshots, a basis size of , and regularization set to .
(a) True O.
(b) Normalized absolute error of O.
(c) Predicted O.
Figure 14: Predictive results for O molar concentration at the last time step of training data. Training with 10000 snapshots, a basis size of , and regularization set to .
(a) True CO.
(b) Normalized absolute error of CO.
(c) Predicted CO.
Figure 15: Predictive results for CO molar concentration at the last time step of training data. Training with 10000 snapshots, a basis size of , and regularization set to .
(a) True HO.
(b) Normalized absolute error of HO.
(c) Predicted HO.
Figure 16: Predictive results for HO molar concentration at the last time step of training data. Training with 10000 snapshots, a basis size of , and regularization set to .

Table 3 shows timing results for the ROM generation and simulation, as performed using python 3.6.4 on a dual-core Intel i5 processor with 2.3 GHz and 16GB RAM. We report the following CPU times: solving the operator inference least squares problem (32); ROM runtime for the two different basis sizes for 2ms of real time prediction; and reconstruction of the high-dimensional, unscaled combustion variables. The latter is required since the ROM is evolving the dynamics in the scaled variables, and thus a post-processing step is required to obtain the true magnitudes of the variables. In comparison with the approximately 200h of CPU time needed on a 40-core architecture (more details in Section 4.1) to compute the first 10000 snapshots of data, the ROMs provide five to six orders of magnitude in computational speedup.

ROM order LS solve of (21) ROM simulation Reconstructing high-dim. field
0.97s 2.88s 0.03s
6.05s 3.73s 0.05s
Table 3: CPU times for two ROMs with time step size and 20000 time steps.

5 Conclusion

Operator inference is a data-driven method for learning reduced-order models (ROMs) of dynamical systems with polynomial structure. This paper demonstrates how variable transformations can expose quadratic structure in the nonlinear system of partial differential equations describing a complex combustion model. This quadratic structure is preserved under projection, providing the mathematical justification for learning a quadratic ROM using operator inference. An important feature of the approach is that the learning of the ROM is non-intrusive—it requires state solutions generated by running the high-fidelity combustion model, but it does not require access to the discretized operators of the governing equations. This is important because it means that the variable transformations can be applied as a post-processing step to the simulation data set, but no intrusive modifications are needed to the high-fidelity code. While the quadratic model form is an approximation for this particular application problem, the numerical results show that the learned quadratic ROM can predict relevant quantities of interest and can also conserve species accurately. Many nonlinear equations in scientific and engineering applications admit variable transformations that expose polynomial structure. This combined with the non-intrusive nature of the approach make it a viable option for deriving ROMs for complex nonlinear applications where traditional intrusive model reduction is impractical and/or unreliable.

Appendix A: Lifting chemical source terms

The chemical source terms in in Eq. (6) are , see Eq. (8). The dynamics for the source terms are given by


To lift this system to polynomial form, we introduce the auxiliary variables