Lift & Learn: Physicsinformed machine learning for largescale nonlinear dynamical systems
Abstract
We present Lift & Learn, a physicsinformed method for learning lowdimensional models for largescale dynamical systems. The method exploits knowledge of a system’s governing equations to identify a coordinate transformation in which the system dynamics have quadratic structure. This transformation is called a lifting map because it often adds auxiliary variables to the system state. The lifting map is applied to data obtained by evaluating a model for the original nonlinear system. This lifted data is projected onto its leading principal components, and lowdimensional linear and quadratic matrix operators are fit to the lifted reduced data using a leastsquares operator inference procedure. Analysis of our method shows that the Lift & Learn models are able to capture the system physics in the lifted coordinates at least as accurately as traditional intrusive model reduction approaches. This preservation of system physics makes the Lift & Learn models robust to changes in inputs. Numerical experiments on the FitzHughNagumo neuron activation model and the compressible Euler equations demonstrate the generalizability of our model.
keywords:
datadriven model reduction, scientific machine learning, dynamical systems, partial differential equations, lifting map1 Introduction
The derivation of lowdimensional models for highdimensional dynamical systems from data is an important task that makes feasible manyquery computational analyses like uncertainty propagation, optimization, and control. Traditional model reduction methods rely on full knowledge of the system governing equations as well as the ability to intrusively manipulate solver codes. In contrast, classical machine learning methods fit models to data while treating the solver as a black box, ignoring knowledge of the problem physics. In this paper, we propose a new hybrid machine learningmodel reduction method called Lift & Learn, in which knowledge of the system governing equations is exploited to identify a set of lifted coordinates in which a lowdimensional model can be learned.
In projectionbased model reduction, the system governing equations are projected onto a lowdimensional approximation subspace to obtain a reduced model. The basis for the reduced space is computed from data; a common choice is the Proper Orthogonal Decomposition (POD) basis, comprised of the leading principal components of the data lumley1967structure; sirovich87turbulence; berkooz1993proper; holmes2012. However, the projection process is intrusive, requiring access to the codes that implement the highdimensional operators of the original equations. One way to avoid the intrusive projection is to learn a map from input parameters to coefficients of the reduced basis, e.g., via gappy POD bui2004aerodynamic; mifsud2019fusing, or using radial basis functions Nair2013, a neural network wang2019non; mainini2017data; swischuk2018physics or a nearestneighbors method swischuk2018physics. However, these approaches are agnostic to the dynamics of the system in that they do not model the evolution of the system state over time.
To learn models for the dynamics of a system, the work in schaeffer2018extracting uses compressed sensing methods to learn sparse, highdimensional matrix operators from data. Techniques from compressed sensing are also used in brunton2016discovering; rudy2017data to fit terms of governing equations from a dictionary of possible nonlinear terms. However, these approaches do not reduce the dimension of the system. To learn a reduced model, dynamic mode decomposition (DMD) rowley2009spectral; schmid2010dynamic projects data onto a reduced basis and then fits to the reduced data a linear operator. Similarly, the operator inference approach of Peherstorfer16DataDriven fits linear and polynomial matrix operators to reduced data. However, if the true underlying dynamics of the system are nonpolynomial, then the DMD and operator inference models may be inaccurate.
Several communities have explored using variable transformations to expose structure in a nonlinear system. In particular, Koopman operator theory, which states that every nonlinear dynamical system can be exactly described by an infinitedimensional linear operator that acts on scalar observables of the system koopman1932dynamical, has been used to extend DMD to fit linear models for nonlinear systems in observable space defined by a dictionary williams2015data, kernel kevrekidis2016kernel, or dictionary learning method li2017extended. In contrast, lifting transformations mccormick1976computability are maps derived for specific governing equations that yield a finitedimensional coordinate representation in which the system dynamics are quadratic gu2011qlmor. Lifting is exploited for model reduction in gu2011qlmor; kramer2018lifting; benner2015two; bennergoyal2016QBIRKA, where the quadratic operators of a highdimensional lifted model are projected onto a reduced space to obtain a quadratic reduced model. However, in many settings, it is impossible or impractical to explicitly derive a highdimensional lifted model, so the only available model is the original nonlinear one.
In Lift & Learn, we use the available nonlinear model to learn quadratic reduced model operators for the lifted system without requiring a highdimensional lifted model to be available. We collect state trajectory data by evaluating the original nonlinear model, lift the data, project the lifted data onto a lowdimensional basis, and fit reduced quadratic operators to the data using the operator inference procedure of Peherstorfer16DataDriven. In this way, we learn a quadratic model that respects the physics of the original nonlinear system. Our contributions are thus:

the use of lifting to explicitly parametrize the reduced model in a form that can be learned using the operator inference approach of Peherstorfer16DataDriven,

the use of learning to nonintrusively obtain lifted reduced models from data generated by the original nonlinear model, enabling their use even in settings where lifted models are unavailable, and

the exploitation of the preservation of system physics to prove guarantees about the lifted model fit to data.
Our approach fits reduced quadratic operators to the data in state space. In the situation where frequencydomain data is available, the work in gosea2018LoewnerQB fits quadratic operators to data in frequency space.
Section 2 describes projectionbased model reduction for nonlinear systems. Section 3 defines the lifting map and its properties and introduces our Lift & Learn model learning method. Section 4 derives a bound on the mismatch between the data and the Lift & Learn model dynamics. In Section 5 we apply our method to two examples: the FitzHughNagumo prototypical neuron activation model and the Euler fluid dynamics equations. Our numerical results demonstrate the learned models’ reliability and ability to generalize outside their training sets.
2 Projectionbased model reduction
Let denote a physical domain and let be a time domain for some final time . The nonlinear partial differential equation (PDE)
(1) 
defines a dynamical system for the dimensional vector state field
(2) 
where , for , and where
(3) 
is a nonlinear function that maps the state field to its time derivative. We assume that any spatial derivatives of appearing in are welldefined. Denote by the dimensional product space , so that .
We consider a semidiscrete model where the spatially discretized state vector corresponds to the values of the state variables at some collection of spatial points , e.g., in a finite difference discretization or a finite element setting with a nodal basis. Let denote the product domain . Then, the semidiscrete full model is given by a system of ordinary differential equations:
(4) 
where is a Lipschitz continuous function that discretizes .
Projectionbased model reduction seeks a lowdimensional approximation to eq. 4 to achieve computational speedups. Denote by the state snapshot at time , i.e., the solution of eq. 4 at time . The state snapshot matrix, , collects snapshots as
(5) 
Note that can contain states from multiple full model evaluations, e.g., from different initial conditions. The singular value decomposition (SVD) of is given by
(6) 
where , , and . The Proper Orthogonal Decomposition (POD) basis of size is denoted by and defined by the leading columns of . The POD state approximation is , where is the reduced state. The POD reduced model is defined by Galerkin projection:
(7) 
When has no particular structure, solving eq. 7 requires evaluating the dimensional map , which is expensive. When the PDE contains only polynomial nonlinearities in the state, however, the PODGalerkin model preserves this structure and can be efficiently evaluated without recourse to highdimensional quantities. That is, let
(8) 
where and are linear and quadratic operators, respectively. The discretized full model can then be written using matrix operators as follows:
(9) 
where is a linear matrix operator, is a matricized tensor operator, and denotes the Kronecker product. The PODGalerkin reduced model is then given by:
(10) 
where
(11) 
are reduced matrix operators. The cost of evaluating eq. 10 depends only on the reduced dimension . However, in many cases, including in our setting, the highdimensional operators and are not available, so the reduced matrix operators cannot be computed as in eq. 11 and must be obtained through other means.
3 Lift & Learn: Reliable, generalizable predictive models for nonlinear PDEs
Lift & Learn is a method for learning quadratic reduced models for dynamical systems governed by nonlinear PDEs. The method exposes structure in nonlinear PDEs by identifying a lifting transformation in which the PDE admits a quadratic representation. Nonquadratic state data is obtained by evaluating the original nonlinear model (eq. 4) and the lifting transformation is applied to this data. Quadratic reduced model operators are then fit to the transformed data. The result is an efficiently evaluable quadratic reduced model for the original nonlinear PDE.
Section 3.1 introduces lifting transformations and describes their properties. Section 3.2 describes how lifted reduced state data are obtained from a full model simulation in the original nonpolynomial system variables. Section 3.3 introduces the operator inference procedure of Peherstorfer16DataDriven used to learn the reduced model from the lifted data. Section 3.4 summarizes the Lift & Learn method.
3.1 Exposing structure via lifting transformations
We begin by defining lifting transformations.
Definition 1.
Define the lifting map,
(12) 
and let . is a quadratic lifting of eq. 1 if the following conditions are met:

the map is differentiable with respect to with bounded derivative, i.e., if is the Jacobian of with respect to , then
(13) for some , and

the lifted state satisfies
(14) where
(15) for some linear functions and quadratic functions , .
The dimensional vector field is called the lifted state and eq. 14 is the lifted PDE.
Note that this definition may be extended to PDEs that contain constant and inputdependent terms — see Section 5.1 for an example. Lifting maps that transform nonpolynomial dynamics into higherorder polynomial dynamics are also possible, but we focus on the quadratic case in this work. We now define a reverse lifting map which embeds the lifted state in the original state space.
Definition 2.
Given a lifting map , a map
(16) 
is called a reverse lifting map if it is differentiable with respect to with bounded derivative on and satisfies for all .
We now present a simple illustrative example of lifting.
Example 1.
Consider the nonlinear PDE,
(17) 
To lift eq. 17, we define the auxiliary variable . That is, the lifting map and its reverse map are given by
(18) 
so that the lifted system is quadratic:
(19) 
The lifting map must be specifically derived for the problem at hand. One strategy for doing so is to introduce auxiliary variables for the nonquadratic terms in the governing PDE and augment the system with evolution equations for these auxiliary variables gu2011qlmor. The work in gu2011qlmor shows that a large class of nonlinear terms which appear in engineering systems may be lifted to quadratic form in this way with , including monomial, sinusoidal, and exponential terms. In kramer2018lifting, this strategy is used to lift PDEs that govern systems in neuron modeling and combustion to quadratic form. Note that is generally nonunique (consider in Equation 19), and for a given , the reverse lifting map is also nonunique.
3.2 Lifted training data
This section presents a method for obtaining data in the lifted variables from the available nonlifted model (eq. 4).
State data We first collect original snapshot data by simulating the original full model eq. 4. Then, for each column of the data matrix (eq. 5), we apply the lifting map nodewise to the discrete state to obtain lifted snapshot data. That is, the lifted data matrix is given by
(20) 
where denotes the discrete lifting map defined by applying nodewise to each spatial node.
We denote the SVD of the transformed data by , with , , . The dimensional POD basis matrix is given by the leading columns of , denoted . Projection onto yields state data in the reduced space:
(21) 
Time derivative data To learn the matrix operators of a lifted reduced model of the form in eq. 10, reduced state time derivative data are also required. To obtain data for dynamics that are Markovian in the reduced state, we adapt the procedure in P19ReProj to the Lift & Learn setting. For each ,
(22) 
denotes the projection of the lifted state onto the subspace spanned by the POD basis. Denote by the discrete reverse lifting defined by applying nodewise for each spatial node. We assume that is a sufficiently rich basis that is welldefined for , and reverse the lifting for all to obtain a discrete nonlifted state that corresponds to the projected lifted state:
(23) 
We then evaluate the available nonlinear full model to obtain new time derivative data, which we denote :
(24) 
Let denote the Jacobian of with respect to . Applying the chain rule to eq. 24 yields a time derivative in the discrete lifted state:
(25) 
Time derivative data in the reduced space is then obtained by projecting eq. 25 back onto the POD basis:
(26) 
The reduced time derivative data matrix collects this data:
(27) 
3.3 Leastsquares operator inference procedure
Given reduced state snapshots (eq. 21) and the corresponding reduced time derivative data (eq. 27) and a postulated model form (eq. 10), operator inference Peherstorfer16DataDriven formulates the following minimization problem for learning the matrix operators of eq. 10:
(28) 
where denotes the columnwise Kronecker product, also known as the KhatriRao product. Each column of , , and corresponds to a single component of the reduced state , yielding independent leastsquares problems which each define one row of and . Each leastsquares problem has degrees of freedom when the structural redundancy of the Kronecker product is accounted for.
In practice, to solve eq. 28, we form a data matrix,
(29) 
where contains quadratic data with the redundant cross terms of the Kronecker product removed. We then solve the leastsquares equation,
(30) 
where contains coefficients of quadratic terms without redundancy, and we reconstruct the symmetric tensor by splitting the coefficients for quadratic cross terms in across the redundant terms. As long as has full column rank, eq. 30 has a unique solution golub1996matrix.
The Lift & Learn reduced model is then given by
(31) 
Note that the linearquadratic case is considered for simplicity and concreteness, but the operator inference procedure can be flexibly adapted to learn reduced models with constant, inputdependent, and higherorder polynomial terms. Section 5.1 contains an example in which input operators are learned.
3.4 Lift & Learn: Method summary
The Lift & Learn method is summarized in Algorithm 1. The first step is to identify an appropriate lifting transformation as described in Section 3.1. The available full model in the original nonlifted variables is then used to obtain lifted reduced state data as described in Section 3.2. The operator inference framework in Section 3.3 is then employed to learn a quadratic reduced model. Note that although this paper has primarily considered the case where the governing physics are described by nonlinear PDEs, the approach applies equally to systems where the governing equations are highdimensional nonlinear ODEs.
4 Finite difference analysis
This section presents analysis of the Lift & Learn method in the setting where the full model arises from a consistent finite difference discretization of the original nonlinear PDE. Section 4.1 presents the setting of our analysis and Section 4.2 proves an upper bound on the residual of the Lift & Learn minimization.
4.1 Consistent finite difference models
We now provide a local error analysis for the setting where eq. 4 arises from a consistent finitedifference discretization of the state. Let denote the vector of original state values at the grid points :
(32) 
Note that differs from in that is the exact continuous (strong) solution of the original PDE (eq. 1) evaluated at the grid points, and is the semidiscrete solution of the spatially discrete set of ODEs (eq. 4). If eq. 4 is a consistent order discretization of eq. 1, then for any given there exists a constant such that
(33) 
for and , where is defined as in eq. 3 and denotes the th entry of the vectorvalued nonlinear function , which corresponds to the time derivative of .
Now, let denote the discretization of the exact continuous lifted state on the same spatial grid:
(34) 
Then,
(35) 
is a consistent order discretization of eq. 14 if for any given there exists a constant such that, for all ,
(36) 
for and , where and denote the th rows of and , respectively, which correspond to the time derivative of the th lifted state at the th spatial node.
We note that in Lift & Learn, we assume that the discretized nonlinear model (eq. 4) is available, and that it is stable. In contrast, for the lifted system, we assume only that the discrete operators and for a consistent discretization exist — they are used for analysis only, and availability is not required.
4.2 Theoretical results
We now prove that the minimum achieved by the Lift & Learn model in eq. 28 is bounded above by the objective value achieved by the intrusive lifted reduced model. This demonstrates two advantages of our method:

our nonintrusively obtained model is able to model the data at least as well as an intrusive reduced model, and

unlike many other learning methods, the quadratic form of the model we learn respects the model physics, enabling us to put an upper bound on the residual of the Lift & Learn model on the training data.
We begin with a consistency result.
Lemma 1.
Proof.
Because is defined by applying nodewise, the th row of contains partial derivatives of the th lifted state at the th spatial node with respect to . This corresponds to the th row of evaluated at th spatial node. That is, for all ,
(38)  
(39) 
for all , . Then, for all ,
(40)  
(41) 
Since is has bounded derivative, is bounded. Then, because is consistent, we have, for all ,
(42) 
By definition, the lifted dynamics are exact for , so for all , and ,
(43) 
Since , we can use the triangle inequality to combine eqs. 43, 42 and 36 as follows:
(44) 
for all , , and . Since there are discrete lifted states, we have for all the following bound (up to a constant):
(45) 
∎
Lemma 1 holds for any state for which the quadratic lifted dynamics of the corresponding are exactly equivalent to the nonlifted dynamics (eq. 43). Let be a collection of snapshots of these exact lifted states and suppose that is an dimensional POD basis for the . We now prove an analogous bound for the projected snapshots :
Lemma 2.
Assume is also Lipschitz and define . Then, there exist constants , , and such that for all
(46) 
where is the identity matrix of dimension .
Proof.
We begin by breaking the left side of eq. 46 into the following terms using the triangle inequality
(47)  
The middle term can be bounded by Lemma 1. For the first term, define . Note that since is polynomial, it is Lipschitz over the finite set of snapshots, so there exists a constant such that
(48) 
For the third term on the right side of eq. 47, since and are Lipschitz, the function is also Lipschitz over the finite set of snapshots. Then, for some constant ,
(49) 
This result lets us upperbound the residual of the Lift & Learn minimization (eq. 28).
Theorem 1.
Let be the singular values of the snapshot data matrix . Then, the residual of the Lift & Learn operator inference is bounded as follows:
(50) 
where .
Proof.
Apply Lemma 2 to each projected state snapshot :
(51) 
Then, for each snapshot,
(52) 
since because its columns are orthonormal vectors. Since the snapshots in the Lift & Learn data are used to compute the POD basis,
(53) 
for all snapshots . Thus, taking the mean of the square over all snapshots,
(54) 
Note that this sum is exactly the objective function in eq. 28. Thus, choosing and in eq. 28 would yield an objective value less than , so the minimizer must be bounded by this value as well. ∎
The leastsquares residual is a measure of the mismatch between the postulated quadratic model form and the true dynamics of the system. Theorem 1 shows that this mismatch has an upper bound dependent on two factors: the truncation error of the full model and the projection error of the POD basis.
5 Results
In this section, the Lift & Learn method is applied to two different dynamical systems. Section 5.1 considers the FitzHughNagumo prototypical neuron activation model and Section 5.2 considers the Euler equations for inviscid fluid flow.
5.1 Application to the FitzHughNagumo system
The FitzHughNagumo system was first proposed in 1961 fitzhugh1961impulses and then realized as a circuit for electronic pulse transmission in nagumo1962active. The system has been used to study exciteable oscillatory dynamics such as those found in cardiac rocsoreanu2012fitzhugh and neuron modeling longtin1993stochastic, and has become a benchmark problem in nonlinear model reduction deim2010. Section 5.1.1 introduces the FitzHughNagumo equations and lifts the system to quadratic form. Section 5.1.2 describes the data used for learning and the performance of the model on training and test sets.
FitzHughNagumo problem statement and lifting transformation
The FitzHughNagumo equations are a simplified neuron activation model with two states: represents voltage and represents voltage recovery. We consider the equations using the same parameters as in deim2010:
(55a)  
(55b) 
where . The full model solves the system eq. 55 on the spatial domain for the timeframe . At , the states and are both zero everywhere, and the boundary conditions are given by
(56) 
where is a timedependent input which represents a neuron stimulus. Because eq. 55a contains a cubic nonlinear term, the model is lifted to quadratic form. Although the operator inference framework developed in Peherstorfer16DataDriven can, in principle, learn cubic and higherorder tensor terms, the number of unknowns in the learning problem can increase exponentially as the polynomial order of the model is increased, if the higherorder operators are dense.
The following lifting map is used in benner2015two; kramer2018lifting to lift the system to quadratic form:
(57) 
The lifted system is then given by
(58a)  
(58b)  
(58c) 
To be consistent with the prescribed initial and boundary conditions for the original state, must be zero everywhere at and its boundary conditions are given by
(59) 
The lifted system in eq. 58 contains only quadratic nonlinear dependencies on the state. Note that no approximation has been introduced in lifting eq. 55 to eq. 58.
Numerical experiments
We wish to model the response of the FitzHughNagumo system to inputs of the form
(61) 
where the parameter varies loguniformly between 500 and 50000, and the parameter varies uniformly on the range . To train our Lift & Learn model of the form eq. 60, snapshot data from nine simulations of the original equations (eq. 55) corresponding to the parameters and are generated. The simulation outputs the system state and on a uniform spatial grid with nodes. The data are recorded every 0.01 seconds, yielding 400 state snapshots for each simulation, with a total of 3600 snapshots used for learning. The lifting map is applied to the state data to obtain lifted data for , , and . Separate POD bases are computed for each lifted state variable. That is, if denote the snapshot data in , , and , respectively, then
(62) 
The number of modes to retain in our lowdimensional model is determined by examining the quantity
(63) 
where are the singular values of the data. The quantity in eq. 63 is the relative energy of the unretained modes of the training data, or the energy lost in truncation. The energy spectrum of the training data in each separate lifted state variable is shown in Figure 1. The required numbers of modes needed to capture 99.9%, 99.99%, 99.999%, and 99.9999% of the energy in the data are tabulated in Table 1, along with the corresponding Lift & Learn model dimensions.
# modes required  
Retained energy  Total  
99.9%  1  1  1  3 
99.99%  2  1  3  6 
99.999%  3  3  4  10 
99.9999%  5  4  5  14 
For each of the model sizes in Table 1, the data are generated by evaluating the original nonquadratic model and lifting the data, as described in Section 3.2. The corresponding input data and bilinear stateinput data are also collected. The state, time derivative, input, and bilinear stateinput data are then used to learn a model of the form eq. 60. Additionally, we exploit knowledge of the governing equations to enforce blocksparsity in the leastsquares solution; for example, there are only linear and constant terms in the evolution of , so for the reduced state components corresponding to , only the linear and constant terms are inferred.
The training set consists of the nine training trajectories described above. Two test sets are considered: one in which 100 trajectories are generated with and realizations randomly drawn from their distributions above, in the same regime as the training set, and a second test set in which 100 trajectories are generated from a different parameter regime, with varying loguniformly on and . Training and test errors are shown in Figure 2. For each training and test input, the error relative to the solution of the original full model, , is calculated by solving the reduced model to generate predictions, then reconstructing the full lifted state from the Lift & Learn reduced trajectory, and finally reversing the lifting to obtain , the Lift & Learn prediction of the trajectory in the original full state space. The relative error is given by
(64) 
The relative error of the intrusive lifted POD reduced model (obtained by discretizing the lifted PDE and then reducing, as in kramer2018lifting) is shown for reference.