Lift & Learn: Physics-informed machine learning for large-scale nonlinear dynamical systems
We present Lift & Learn, a physics-informed method for learning low-dimensional models for large-scale dynamical systems. The method exploits knowledge of a system’s governing equations to identify a coordinate transformation in which the system dynamics have quadratic structure. This transformation is called a lifting map because it often adds auxiliary variables to the system state. The lifting map is applied to data obtained by evaluating a model for the original nonlinear system. This lifted data is projected onto its leading principal components, and low-dimensional linear and quadratic matrix operators are fit to the lifted reduced data using a least-squares operator inference procedure. Analysis of our method shows that the Lift & Learn models are able to capture the system physics in the lifted coordinates at least as accurately as traditional intrusive model reduction approaches. This preservation of system physics makes the Lift & Learn models robust to changes in inputs. Numerical experiments on the FitzHugh-Nagumo neuron activation model and the compressible Euler equations demonstrate the generalizability of our model.
keywords:data-driven model reduction, scientific machine learning, dynamical systems, partial differential equations, lifting map
The derivation of low-dimensional models for high-dimensional dynamical systems from data is an important task that makes feasible many-query computational analyses like uncertainty propagation, optimization, and control. Traditional model reduction methods rely on full knowledge of the system governing equations as well as the ability to intrusively manipulate solver codes. In contrast, classical machine learning methods fit models to data while treating the solver as a black box, ignoring knowledge of the problem physics. In this paper, we propose a new hybrid machine learning-model reduction method called Lift & Learn, in which knowledge of the system governing equations is exploited to identify a set of lifted coordinates in which a low-dimensional model can be learned.
In projection-based model reduction, the system governing equations are projected onto a low-dimensional approximation subspace to obtain a reduced model. The basis for the reduced space is computed from data; a common choice is the Proper Orthogonal Decomposition (POD) basis, comprised of the leading principal components of the data lumley1967structure; sirovich87turbulence; berkooz1993proper; holmes2012. However, the projection process is intrusive, requiring access to the codes that implement the high-dimensional operators of the original equations. One way to avoid the intrusive projection is to learn a map from input parameters to coefficients of the reduced basis, e.g., via gappy POD bui2004aerodynamic; mifsud2019fusing, or using radial basis functions Nair2013, a neural network wang2019non; mainini2017data; swischuk2018physics or a nearest-neighbors method swischuk2018physics. However, these approaches are agnostic to the dynamics of the system in that they do not model the evolution of the system state over time.
To learn models for the dynamics of a system, the work in schaeffer2018extracting uses compressed sensing methods to learn sparse, high-dimensional matrix operators from data. Techniques from compressed sensing are also used in brunton2016discovering; rudy2017data to fit terms of governing equations from a dictionary of possible nonlinear terms. However, these approaches do not reduce the dimension of the system. To learn a reduced model, dynamic mode decomposition (DMD) rowley2009spectral; schmid2010dynamic projects data onto a reduced basis and then fits to the reduced data a linear operator. Similarly, the operator inference approach of Peherstorfer16DataDriven fits linear and polynomial matrix operators to reduced data. However, if the true underlying dynamics of the system are non-polynomial, then the DMD and operator inference models may be inaccurate.
Several communities have explored using variable transformations to expose structure in a nonlinear system. In particular, Koopman operator theory, which states that every nonlinear dynamical system can be exactly described by an infinite-dimensional linear operator that acts on scalar observables of the system koopman1932dynamical, has been used to extend DMD to fit linear models for nonlinear systems in observable space defined by a dictionary williams2015data, kernel kevrekidis2016kernel, or dictionary learning method li2017extended. In contrast, lifting transformations mccormick1976computability are maps derived for specific governing equations that yield a finite-dimensional coordinate representation in which the system dynamics are quadratic gu2011qlmor. Lifting is exploited for model reduction in gu2011qlmor; kramer2018lifting; benner2015two; bennergoyal2016QBIRKA, where the quadratic operators of a high-dimensional lifted model are projected onto a reduced space to obtain a quadratic reduced model. However, in many settings, it is impossible or impractical to explicitly derive a high-dimensional lifted model, so the only available model is the original nonlinear one.
In Lift & Learn, we use the available nonlinear model to learn quadratic reduced model operators for the lifted system without requiring a high-dimensional lifted model to be available. We collect state trajectory data by evaluating the original nonlinear model, lift the data, project the lifted data onto a low-dimensional basis, and fit reduced quadratic operators to the data using the operator inference procedure of Peherstorfer16DataDriven. In this way, we learn a quadratic model that respects the physics of the original nonlinear system. Our contributions are thus:
the use of lifting to explicitly parametrize the reduced model in a form that can be learned using the operator inference approach of Peherstorfer16DataDriven,
the use of learning to non-intrusively obtain lifted reduced models from data generated by the original nonlinear model, enabling their use even in settings where lifted models are unavailable, and
the exploitation of the preservation of system physics to prove guarantees about the lifted model fit to data.
Our approach fits reduced quadratic operators to the data in state space. In the situation where frequency-domain data is available, the work in gosea2018LoewnerQB fits quadratic operators to data in frequency space.
Section 2 describes projection-based model reduction for nonlinear systems. Section 3 defines the lifting map and its properties and introduces our Lift & Learn model learning method. Section 4 derives a bound on the mismatch between the data and the Lift & Learn model dynamics. In Section 5 we apply our method to two examples: the FitzHugh-Nagumo prototypical neuron activation model and the Euler fluid dynamics equations. Our numerical results demonstrate the learned models’ reliability and ability to generalize outside their training sets.
2 Projection-based model reduction
Let denote a physical domain and let be a time domain for some final time . The nonlinear partial differential equation (PDE)
defines a dynamical system for the -dimensional vector state field
where , for , and where
is a nonlinear function that maps the state field to its time derivative. We assume that any spatial derivatives of appearing in are well-defined. Denote by the -dimensional product space , so that .
We consider a semi-discrete model where the spatially discretized state vector corresponds to the values of the state variables at some collection of spatial points , e.g., in a finite difference discretization or a finite element setting with a nodal basis. Let denote the product domain . Then, the semi-discrete full model is given by a system of ordinary differential equations:
where is a Lipschitz continuous function that discretizes .
Projection-based model reduction seeks a low-dimensional approximation to eq. 4 to achieve computational speed-ups. Denote by the state snapshot at time , i.e., the solution of eq. 4 at time . The state snapshot matrix, , collects snapshots as
Note that can contain states from multiple full model evaluations, e.g., from different initial conditions. The singular value decomposition (SVD) of is given by
where , , and . The Proper Orthogonal Decomposition (POD) basis of size is denoted by and defined by the leading columns of . The POD state approximation is , where is the reduced state. The POD reduced model is defined by Galerkin projection:
When has no particular structure, solving eq. 7 requires evaluating the -dimensional map , which is expensive. When the PDE contains only polynomial nonlinearities in the state, however, the POD-Galerkin model preserves this structure and can be efficiently evaluated without recourse to high-dimensional quantities. That is, let
where and are linear and quadratic operators, respectively. The discretized full model can then be written using matrix operators as follows:
where is a linear matrix operator, is a matricized tensor operator, and denotes the Kronecker product. The POD-Galerkin reduced model is then given by:
are reduced matrix operators. The cost of evaluating eq. 10 depends only on the reduced dimension . However, in many cases, including in our setting, the high-dimensional operators and are not available, so the reduced matrix operators cannot be computed as in eq. 11 and must be obtained through other means.
3 Lift & Learn: Reliable, generalizable predictive models for nonlinear PDEs
Lift & Learn is a method for learning quadratic reduced models for dynamical systems governed by nonlinear PDEs. The method exposes structure in nonlinear PDEs by identifying a lifting transformation in which the PDE admits a quadratic representation. Non-quadratic state data is obtained by evaluating the original nonlinear model (eq. 4) and the lifting transformation is applied to this data. Quadratic reduced model operators are then fit to the transformed data. The result is an efficiently evaluable quadratic reduced model for the original nonlinear PDE.
Section 3.1 introduces lifting transformations and describes their properties. Section 3.2 describes how lifted reduced state data are obtained from a full model simulation in the original non-polynomial system variables. Section 3.3 introduces the operator inference procedure of Peherstorfer16DataDriven used to learn the reduced model from the lifted data. Section 3.4 summarizes the Lift & Learn method.
3.1 Exposing structure via lifting transformations
We begin by defining lifting transformations.
Define the lifting map,
and let . is a quadratic lifting of eq. 1 if the following conditions are met:
the map is differentiable with respect to with bounded derivative, i.e., if is the Jacobian of with respect to , then
for some , and
the lifted state satisfies
for some linear functions and quadratic functions , .
The -dimensional vector field is called the lifted state and eq. 14 is the lifted PDE.
Note that this definition may be extended to PDEs that contain constant and input-dependent terms — see Section 5.1 for an example. Lifting maps that transform non-polynomial dynamics into higher-order polynomial dynamics are also possible, but we focus on the quadratic case in this work. We now define a reverse lifting map which embeds the lifted state in the original state space.
Given a lifting map , a map
is called a reverse lifting map if it is differentiable with respect to with bounded derivative on and satisfies for all .
We now present a simple illustrative example of lifting.
Consider the nonlinear PDE,
To lift eq. 17, we define the auxiliary variable . That is, the lifting map and its reverse map are given by
so that the lifted system is quadratic:
The lifting map must be specifically derived for the problem at hand. One strategy for doing so is to introduce auxiliary variables for the non-quadratic terms in the governing PDE and augment the system with evolution equations for these auxiliary variables gu2011qlmor. The work in gu2011qlmor shows that a large class of nonlinear terms which appear in engineering systems may be lifted to quadratic form in this way with , including monomial, sinusoidal, and exponential terms. In kramer2018lifting, this strategy is used to lift PDEs that govern systems in neuron modeling and combustion to quadratic form. Note that is generally non-unique (consider in Equation 19), and for a given , the reverse lifting map is also non-unique.
3.2 Lifted training data
This section presents a method for obtaining data in the lifted variables from the available non-lifted model (eq. 4).
State data We first collect original snapshot data by simulating the original full model eq. 4. Then, for each column of the data matrix (eq. 5), we apply the lifting map node-wise to the discrete state to obtain lifted snapshot data. That is, the lifted data matrix is given by
where denotes the discrete lifting map defined by applying node-wise to each spatial node.
We denote the SVD of the transformed data by , with , , . The -dimensional POD basis matrix is given by the leading columns of , denoted . Projection onto yields state data in the reduced space:
Time derivative data To learn the matrix operators of a lifted reduced model of the form in eq. 10, reduced state time derivative data are also required. To obtain data for dynamics that are Markovian in the reduced state, we adapt the procedure in P19ReProj to the Lift & Learn setting. For each ,
denotes the projection of the lifted state onto the subspace spanned by the POD basis. Denote by the discrete reverse lifting defined by applying node-wise for each spatial node. We assume that is a sufficiently rich basis that is well-defined for , and reverse the lifting for all to obtain a discrete non-lifted state that corresponds to the projected lifted state:
We then evaluate the available nonlinear full model to obtain new time derivative data, which we denote :
Let denote the Jacobian of with respect to . Applying the chain rule to eq. 24 yields a time derivative in the discrete lifted state:
Time derivative data in the reduced space is then obtained by projecting eq. 25 back onto the POD basis:
The reduced time derivative data matrix collects this data:
3.3 Least-squares operator inference procedure
Given reduced state snapshots (eq. 21) and the corresponding reduced time derivative data (eq. 27) and a postulated model form (eq. 10), operator inference Peherstorfer16DataDriven formulates the following minimization problem for learning the matrix operators of eq. 10:
where denotes the column-wise Kronecker product, also known as the Khatri-Rao product. Each column of , , and corresponds to a single component of the reduced state , yielding independent least-squares problems which each define one row of and . Each least-squares problem has degrees of freedom when the structural redundancy of the Kronecker product is accounted for.
In practice, to solve eq. 28, we form a data matrix,
where contains quadratic data with the redundant cross terms of the Kronecker product removed. We then solve the least-squares equation,
where contains coefficients of quadratic terms without redundancy, and we reconstruct the symmetric tensor by splitting the coefficients for quadratic cross terms in across the redundant terms. As long as has full column rank, eq. 30 has a unique solution golub1996matrix.
The Lift & Learn reduced model is then given by
Note that the linear-quadratic case is considered for simplicity and concreteness, but the operator inference procedure can be flexibly adapted to learn reduced models with constant, input-dependent, and higher-order polynomial terms. Section 5.1 contains an example in which input operators are learned.
3.4 Lift & Learn: Method summary
The Lift & Learn method is summarized in Algorithm 1. The first step is to identify an appropriate lifting transformation as described in Section 3.1. The available full model in the original non-lifted variables is then used to obtain lifted reduced state data as described in Section 3.2. The operator inference framework in Section 3.3 is then employed to learn a quadratic reduced model. Note that although this paper has primarily considered the case where the governing physics are described by nonlinear PDEs, the approach applies equally to systems where the governing equations are high-dimensional nonlinear ODEs.
4 Finite difference analysis
This section presents analysis of the Lift & Learn method in the setting where the full model arises from a consistent finite difference discretization of the original nonlinear PDE. Section 4.1 presents the setting of our analysis and Section 4.2 proves an upper bound on the residual of the Lift & Learn minimization.
4.1 Consistent finite difference models
We now provide a local error analysis for the setting where eq. 4 arises from a consistent finite-difference discretization of the state. Let denote the vector of original state values at the grid points :
Note that differs from in that is the exact continuous (strong) solution of the original PDE (eq. 1) evaluated at the grid points, and is the semi-discrete solution of the spatially discrete set of ODEs (eq. 4). If eq. 4 is a consistent order- discretization of eq. 1, then for any given there exists a constant such that
for and , where is defined as in eq. 3 and denotes the -th entry of the vector-valued nonlinear function , which corresponds to the time derivative of .
Now, let denote the discretization of the exact continuous lifted state on the same spatial grid:
is a consistent order- discretization of eq. 14 if for any given there exists a constant such that, for all ,
for and , where and denote the -th rows of and , respectively, which correspond to the time derivative of the -th lifted state at the -th spatial node.
We note that in Lift & Learn, we assume that the discretized nonlinear model (eq. 4) is available, and that it is stable. In contrast, for the lifted system, we assume only that the discrete operators and for a consistent discretization exist — they are used for analysis only, and availability is not required.
4.2 Theoretical results
We now prove that the minimum achieved by the Lift & Learn model in eq. 28 is bounded above by the objective value achieved by the intrusive lifted reduced model. This demonstrates two advantages of our method:
our non-intrusively obtained model is able to model the data at least as well as an intrusive reduced model, and
unlike many other learning methods, the quadratic form of the model we learn respects the model physics, enabling us to put an upper bound on the residual of the Lift & Learn model on the training data.
We begin with a consistency result.
Because is defined by applying node-wise, the -th row of contains partial derivatives of the -th lifted state at the -th spatial node with respect to . This corresponds to the -th row of evaluated at -th spatial node. That is, for all ,
for all , . Then, for all ,
Since is has bounded derivative, is bounded. Then, because is consistent, we have, for all ,
By definition, the lifted dynamics are exact for , so for all , and ,
for all , , and . Since there are discrete lifted states, we have for all the following bound (up to a constant):
Lemma 1 holds for any state for which the quadratic lifted dynamics of the corresponding are exactly equivalent to the non-lifted dynamics (eq. 43). Let be a collection of snapshots of these exact lifted states and suppose that is an -dimensional POD basis for the . We now prove an analogous bound for the projected snapshots :
Assume is also Lipschitz and define . Then, there exist constants , , and such that for all
where is the identity matrix of dimension .
We begin by breaking the left side of eq. 46 into the following terms using the triangle inequality
The middle term can be bounded by Lemma 1. For the first term, define . Note that since is polynomial, it is Lipschitz over the finite set of snapshots, so there exists a constant such that
For the third term on the right side of eq. 47, since and are Lipschitz, the function is also Lipschitz over the finite set of snapshots. Then, for some constant ,
This result lets us upper-bound the residual of the Lift & Learn minimization (eq. 28).
Let be the singular values of the snapshot data matrix . Then, the residual of the Lift & Learn operator inference is bounded as follows:
Apply Lemma 2 to each projected state snapshot :
Then, for each snapshot,
since because its columns are orthonormal vectors. Since the snapshots in the Lift & Learn data are used to compute the POD basis,
for all snapshots . Thus, taking the mean of the square over all snapshots,
The least-squares residual is a measure of the mismatch between the postulated quadratic model form and the true dynamics of the system. Theorem 1 shows that this mismatch has an upper bound dependent on two factors: the truncation error of the full model and the projection error of the POD basis.
In this section, the Lift & Learn method is applied to two different dynamical systems. Section 5.1 considers the FitzHugh-Nagumo prototypical neuron activation model and Section 5.2 considers the Euler equations for inviscid fluid flow.
5.1 Application to the FitzHugh-Nagumo system
The FitzHugh-Nagumo system was first proposed in 1961 fitzhugh1961impulses and then realized as a circuit for electronic pulse transmission in nagumo1962active. The system has been used to study exciteable oscillatory dynamics such as those found in cardiac rocsoreanu2012fitzhugh and neuron modeling longtin1993stochastic, and has become a benchmark problem in nonlinear model reduction deim2010. Section 5.1.1 introduces the FitzHugh-Nagumo equations and lifts the system to quadratic form. Section 5.1.2 describes the data used for learning and the performance of the model on training and test sets.
FitzHugh-Nagumo problem statement and lifting transformation
The FitzHugh-Nagumo equations are a simplified neuron activation model with two states: represents voltage and represents voltage recovery. We consider the equations using the same parameters as in deim2010:
where . The full model solves the system eq. 55 on the spatial domain for the timeframe . At , the states and are both zero everywhere, and the boundary conditions are given by
where is a time-dependent input which represents a neuron stimulus. Because eq. 55a contains a cubic nonlinear term, the model is lifted to quadratic form. Although the operator inference framework developed in Peherstorfer16DataDriven can, in principle, learn cubic and higher-order tensor terms, the number of unknowns in the learning problem can increase exponentially as the polynomial order of the model is increased, if the higher-order operators are dense.
The following lifting map is used in benner2015two; kramer2018lifting to lift the system to quadratic form:
The lifted system is then given by
To be consistent with the prescribed initial and boundary conditions for the original state, must be zero everywhere at and its boundary conditions are given by
We wish to model the response of the FitzHugh-Nagumo system to inputs of the form
where the parameter varies log-uniformly between 500 and 50000, and the parameter varies uniformly on the range . To train our Lift & Learn model of the form eq. 60, snapshot data from nine simulations of the original equations (eq. 55) corresponding to the parameters and are generated. The simulation outputs the system state and on a uniform spatial grid with nodes. The data are recorded every 0.01 seconds, yielding 400 state snapshots for each simulation, with a total of 3600 snapshots used for learning. The lifting map is applied to the state data to obtain lifted data for , , and . Separate POD bases are computed for each lifted state variable. That is, if denote the snapshot data in , , and , respectively, then
The number of modes to retain in our low-dimensional model is determined by examining the quantity
where are the singular values of the data. The quantity in eq. 63 is the relative energy of the unretained modes of the training data, or the energy lost in truncation. The energy spectrum of the training data in each separate lifted state variable is shown in Figure 1. The required numbers of modes needed to capture 99.9%, 99.99%, 99.999%, and 99.9999% of the energy in the data are tabulated in Table 1, along with the corresponding Lift & Learn model dimensions.
|# modes required|
For each of the model sizes in Table 1, the data are generated by evaluating the original non-quadratic model and lifting the data, as described in Section 3.2. The corresponding input data and bilinear state-input data are also collected. The state, time derivative, input, and bilinear state-input data are then used to learn a model of the form eq. 60. Additionally, we exploit knowledge of the governing equations to enforce block-sparsity in the least-squares solution; for example, there are only linear and constant terms in the evolution of , so for the reduced state components corresponding to , only the linear and constant terms are inferred.
The training set consists of the nine training trajectories described above. Two test sets are considered: one in which 100 trajectories are generated with and realizations randomly drawn from their distributions above, in the same regime as the training set, and a second test set in which 100 trajectories are generated from a different parameter regime, with varying log-uniformly on and . Training and test errors are shown in Figure 2. For each training and test input, the error relative to the solution of the original full model, , is calculated by solving the reduced model to generate predictions, then reconstructing the full lifted state from the Lift & Learn reduced trajectory, and finally reversing the lifting to obtain , the Lift & Learn prediction of the trajectory in the original full state space. The relative error is given by
The relative error of the intrusive lifted POD reduced model (obtained by discretizing the lifted PDE and then reducing, as in kramer2018lifting) is shown for reference.