The Koopman operator is a linear but infinite dimensional operator that governs the evolution of scalar observables defined on the state space of an autonomous dynamical system, and is a powerful tool for the analysis and decomposition of nonlinear dynamical systems. In this manuscript, we present a data driven method for approximating the leading eigenvalues, eigenfunctions, and modes of the Koopman operator. The method requires a data set of snapshot pairs and a dictionary of scalar observables, but does not require explicit governing equations or interaction with a “black box” integrator. We will show that this approach is, in effect, an extension of Dynamic Mode Decomposition (DMD), which has been used to approximate the Koopman eigenvalues and modes. Furthermore, if the data provided to the method are generated by a Markov process instead of a deterministic dynamical system, the algorithm approximates the eigenfunctions of the Kolmogorov backward equation, which could be considered as the “stochastic Koopman operator”. Finally, four illustrative examples are presented: two that highlight the quantitative performance of the method when presented with either deterministic or stochastic data, and two that show potential applications of the Koopman eigenfunctions.
A Data–Driven Approximation of the Koopman Operator: Extending Dynamic Mode Decomposition
Running head: Approximating the Koopman Operator with Data
\subjclass Primary: 65P99; 37M25; Secondary: 47B33.
Matthew O. Williams
Program in Applied and Computational Mathematics (PACM)
Princeton University, NJ 08544, USA.
Ioannis G. Kevrekidis
Chemical and Biological Engineering Department & PACM
Princeton University, NJ 08544, USA.
Clarence W. Rowley
Mechanical and Aerospace Engineering Department
Princeton University, NJ 08544, USA.
In many mathematical and engineering applications, a phenomenon of interest can be summarized in different ways. For instance, to describe the state of a two dimensional incompressible fluid flow, one can either record velocity and pressure fields or streamfunction and vorticity. Furthermore, these states can often be approximated using a low dimensional set of Proper Orthogonal Decomposition (POD) modes, a set of Dynamic Modes [4, 5], or a finite collection of Lagrangian particles. A mathematical example is the linear time invariant (LTI) system provided by $x_{n+1} = A x_n$, where $x_n \in \mathbb{R}^N$ is the system state at the $n$-th timestep. Written as such, the evolution of $x_n$ is governed by the eigenvalues of $A$. One could also consider the invertible but nonlinear transformation $y = h(x)$, which generates a nonlinear evolution law for $y_n = h(x_n)$. Both approaches (i.e., evolving $x_n$ or $y_n$) describe the same fundamental behavior, yet one description may be preferable to others. For example, solving an LTI system is almost certainly preferable to evolving a nonlinear system from a computational standpoint.
In general, one measures (or computes) the state of a system using a set of scalar observables, which are functions defined on state space, and watches how the values of these functions evolve in time. Furthermore, provided the set of observations is rich enough, one can even write an evolution law for the dynamics of the set of observations, and use this system in lieu of the original one. Because the properties of this new dynamical system depend on our choice of variables (observables), it would be highly desirable if one could find a set of observables whose dynamics appear to be governed by a linear evolution law. If such a set could be identified, the dynamics would be completely determined by the spectrum of the evolution operator. Furthermore, this could enable the simple yet effective algorithms designed for linear systems, for example controller design [7, 8] or stability analysis [1, 9], to be applied to nonlinear systems.
Mathematically, the evolution of observables of the system state is governed by the Koopman operator [10, 11, 12, 13], which is a linear but infinite dimensional operator that is defined for an autonomous dynamical system. Of particular interest here is the “slow” subspace of the Koopman operator, which is the span of the eigenfunctions associated with eigenvalues near the unit circle in discrete time (or near the imaginary axis in continuous time). These eigenvalues and eigenfunctions capture the long term dynamics of observables that appear after the fast transients have subsided, and could serve as a low dimensional approximation of the otherwise infinite dimensional operator when a spectral gap, which clearly delineates the “fast” and “slow” temporal dynamics, is present. In addition to the eigenvalues and eigenfunctions, the final element of Koopman spectral analysis is the set of Koopman modes for the full state observable [13, 12], which are vectors that enable us to reconstruct the state of the system as a linear combination of the Koopman eigenfunctions. Overall, the “tuples” of Koopman eigenfunctions, eigenvalues, and modes enable us to: (a) transform state space so that the dynamics appear to be linear, (b) determine the temporal dynamics of the linear system, and (c) reconstruct the state of the original system from our new linear representation. In principle, this framework is quite broadly applicable, and useful even for problems with multiple attractors that cannot be accurately approximated using models based on local linearization.
There are several algorithms in the literature that can computationally approximate subsets of these quantities. Three examples are Generalized Laplace Analysis (GLA) [12, 14, 15], the Ulam Galerkin Method [16, 17], and Dynamic Mode Decomposition (DMD) [4, 18, 13]. None of these techniques require explicit governing equations, so all, in principle, can be applied directly to data. GLA can approximate both the Koopman modes and eigenfunctions, but it requires knowledge of the eigenvalues to do so [12, 14, 15]. The Ulam Galerkin method has been used to approximate the eigenfunctions and eigenvalues, though it is more frequently used to generate finite dimensional approximations of the Perron–Frobenius operator, which is the adjoint of the Koopman operator. Finally, DMD has been used to approximate the Koopman modes and eigenvalues [13, 18], but not the Koopman eigenfunctions.
Even in pairs instead of triplets, approximations of these quantities are useful. DMD and its variants [19, 20, 21] have been successfully used to analyze nonlinear fluid flows using data from both experiments and computation [4, 22, 23]. GLA and similar methods have been applied to extract meaningful spatio-temporal structures using sensor data from buildings and power systems [24, 25, 26, 27]. Finally, the Ulam Galerkin method has been used to identify coherent structures and almost invariant sets [28, 29, 30] based on the singular value decomposition of (a slight modification of) the Koopman operator.
In this manuscript, we present a data driven method that approximates the leading Koopman eigenfunctions, eigenvalues, and modes from a data set of successive “snapshot” pairs and a dictionary of observables that spans a subspace of the scalar observables. There are many possible ways to choose this dictionary, and it could be comprised of polynomials, Fourier modes, spectral elements, or other sets of functions of the full state observable. We will argue that this approach is an extension of DMD that can produce better approximations of the Koopman eigenfunctions; as such, we refer to it as Extended Dynamic Mode Decomposition (EDMD). One regime where the behavior of both EDMD and DMD can be formally analyzed and contrasted is in the limit of large data. In this regime, we will show that the numerical approximation of the Koopman eigenfunctions generated by EDMD converges to the numerical approximation we would obtain from a Galerkin method in that the residual is orthogonal to the subspace spanned by the elements of the dictionary. With finite amounts of data, we will demonstrate the effectiveness of EDMD on two deterministic examples: one that highlights the quantitative accuracy of the method, and a more practical application.
Because EDMD is an entirely data driven procedure, it can also be applied to data from stochastic systems without any algorithmic changes. If the underlying system is a Markov process, we will show that EDMD approximates the eigenfunctions of the Kolmogorov backward equation [32, 33], which has been called the stochastic Koopman operator (SKO). Once again, we will demonstrate the effectiveness of the EDMD procedure when the amount of data is limited by applying it to two stochastic examples: the first to test the accuracy of the method, and the second to highlight a potential application of EDMD as a nonlinear manifold learning technique. In the latter example, we highlight two forms of model reduction: reduction that occurs when the dynamics of the system state are constrained to a low-dimensional manifold, and reduction that occurs when the statistical moments of the stochastic dynamical system are effectively low dimensional.
In the remainder of the manuscript, we will detail the EDMD algorithm and show (when mathematically possible) or demonstrate through examples that it accurately approximates the leading Koopman eigenfunctions, eigenvalues, and modes for both deterministic and stochastic sets of data. In particular, in Sec. 2 the EDMD algorithm will be presented, and we will prove that it converges to a Galerkin approximation of the Koopman operator given a sufficiently large amount of data. In Sec. 3, we detail three choices of dictionary that we have found to be effective in a broad set of applications. In Sec. 4, we will demonstrate that the EDMD approximation can be accurate even with finite amounts of data, and can yield useful parameterizations of common dynamical structures such as systems with multiple basins of attraction when the underlying system is deterministic. In Sec. 5, we experiment by applying EDMD to stochastic data and show it approximates the eigenfunctions of the SKO for Markov processes. Though the interpretation of the eigenfunctions now differs, we demonstrate that they can still be used to accomplish useful tasks such as the parameterization of nonlinear manifolds. Finally, some brief concluding remarks are given in Sec. 6.
2 Dynamic Mode Decomposition and the Koopman Operator
Our ambition in this section is to establish the connection between the Koopman operator and what we call EDMD. To accomplish this, we will define the Koopman operator in Sec. 2.1. Using this definition, we will outline the EDMD algorithm in Sec. 2.2, and then show how it can be used to approximate the Koopman eigenvalues, eigenfunctions, and modes. Next, in Sec. 2.3, we will prove that the EDMD method almost surely converges to a Galerkin method in the limit of large data. Finally, in Sec. 2.4, we will highlight the connection between the EDMD algorithm and standard DMD.
2.1 The Koopman Operator
Because the Koopman operator is central to all that follows, we will define it along with the properties relevant to EDMD in this subsection. No new mathematical results are presented here; our only objective is to include for completeness the terms and definitions we will require later in the paper. To do so, we require the autonomous, discrete time dynamical system $(\mathcal{M}, n, F)$, where $\mathcal{M} \subseteq \mathbb{R}^N$ is the state space, $n \in \mathbb{Z}$ is (discrete) time, and $F : \mathcal{M} \to \mathcal{M}$ is the evolution operator. Unlike $F$, which acts on $\mathcal{M}$, the Koopman operator, $\mathcal{K}$, acts on functions of state space, $\psi \in \mathcal{F}$ with $\psi : \mathcal{M} \to \mathbb{C}$. The action of the Koopman operator is

$\mathcal{K}\psi = \psi \circ F.$
In essence, the Koopman operator defines a new dynamical system, $(\mathcal{F}, n, \mathcal{K})$, that governs the evolution of observables, $\psi \in \mathcal{F}$, in discrete time. In what follows, we assume that $\mathcal{F} = L^2(\mathcal{M}, \rho)$, where $\rho$ is a positive, single valued analytic function defined on $\mathcal{M}$, but not necessarily an invariant measure of the underlying dynamical system. This assumption, which has been made before in the literature [12, 11], is required so that the inner products in the Galerkin-like method we will present can be taken. Because it acts on functions, $\mathcal{K}$ is infinite dimensional even when $\mathcal{M}$ is finite dimensional, but it is also linear even when $F$ is nonlinear. The infinite dimensional nature of the Koopman operator is potentially problematic, but if it can, practically, be truncated without too great a loss of accuracy (e.g., if the system has multiple time scales), then the result would be a linear and finite dimensional approximation. Therefore, the promise of the Koopman approach is to take the tools developed for linear systems and apply them to the dynamical system defined by the Koopman operator; thus obtaining a linear approximation of a nonlinear system without directly linearizing around a fixed point.
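Because $\mathcal{K}\psi = \psi \circ F$ is simply composition with the dynamics, its action is easy to illustrate numerically. In the Python sketch below (the map $F$ and observable $\psi$ are arbitrary choices of ours, purely for illustration), evolving the state and then observing agrees with evolving the observable and then evaluating at the initial state:

```python
import numpy as np

def F(x):
    """Discrete-time dynamics x_{n+1} = F(x_n); an arbitrary nonlinear map."""
    return np.array([0.9 * x[0], 0.5 * x[1] + x[0] ** 2])

def psi(x):
    """A scalar observable on state space."""
    return x[0] * x[1]

def iterate(f, x, n):
    """Apply the map f to the state x a total of n times."""
    for _ in range(n):
        x = f(x)
    return x

x0 = np.array([1.0, 2.0])
# Top route: evolve the state three steps, then observe.
route_top = psi(iterate(F, x0, 3))
# Bottom route: evolve the observable (apply K three times), then evaluate.
K3_psi = lambda x: psi(iterate(F, x, 3))   # K^3 psi = psi o F o F o F
route_bottom = K3_psi(x0)
```

Both routes return the same number, which is the content of the commutative diagram in Fig. 1.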
The dynamical system defined by $F$ and the one defined by $\mathcal{K}$ are two different parameterizations of the same fundamental behavior. The link between these parameterizations is the “full state observable,” $g(x) = x$, and $\{(\mu_k, \varphi_k, v_k)\}_{k=1}^{N_K}$, the set of “tuples” of Koopman eigenvalues, eigenfunctions, and modes required to reconstruct the full state. Note that $N_K$ could (and often will) be infinite. Although $g$ is a vector valued observable, each component of it is a scalar valued observable, i.e., $g_i(x) = e_i^* x$, where $g_i$ is the $i$-th component of $g$ and $e_i$ is the $i$-th unit vector in $\mathbb{R}^N$. Assuming each $g_i$ is in the span of our set of eigenfunctions, $g_i = \sum_{k=1}^{N_K} v_{ik} \varphi_k$ with $v_{ik} \in \mathbb{C}$. Then $g$ can be obtained by “stacking” these weights into vectors (i.e., $v_k = [v_{1k}, v_{2k}, \ldots, v_{Nk}]^T$). As a result,

$x = g(x) = \sum_{k=1}^{N_K} v_k \varphi_k(x),$

where $v_k$ is the $k$-th Koopman mode, and $\varphi_k$ is the $k$-th Koopman eigenfunction. In doing this, we have assumed that each of the scalar observables that comprise $g$ are in the subspace of $\mathcal{F}$ spanned by our eigenfunctions, but we have not assumed that the eigenfunctions form a basis for $\mathcal{F}$.
The system state at future times can be obtained either by directly evolving $x$ with $F$ or by evolving the full state observable through the Koopman operator:

$F(x) = (\mathcal{K} g)(x) = \sum_{k=1}^{N_K} \mu_k v_k \varphi_k(x), \qquad F^n(x) = \sum_{k=1}^{N_K} \mu_k^n v_k \varphi_k(x).$

This representation of $F(x)$ is particularly advantageous because the dynamics associated with each eigenfunction are determined by its corresponding eigenvalue.
Figure 1 shows a commutative diagram that acts as a visual summary of this section. The top row shows the direct evolution of states, $x \mapsto F(x)$, governed by $F$; the bottom row shows the evolution of observables, $\psi \mapsto \mathcal{K}\psi$, governed by the Koopman operator. Although $F$ and $\mathcal{K}$ act on different spaces, they encapsulate the same dynamics. For example, once given a state $x$, to compute $g(F(x))$ one could either take the observable $g$, apply $\mathcal{K}$, and evaluate $\mathcal{K}g$ at $x$ (the bottom route), or use $F$ to compute $F(x)$ and then evaluate $g$ at this updated position (the top route). Similarly, to compute $F(x)$, one could either apply $F$ to $x$ (the top route) or apply $\mathcal{K}$ to the full state observable and evaluate $(\mathcal{K}g)(x)$ (the bottom route). As a result, one can either choose to work with a finite dimensional, nonlinear system or an infinite dimensional, linear system depending upon which “path” is simpler/more useful for a given problem.
2.2 Extended Dynamic Mode Decomposition
In this subsection, we outline Extended Dynamic Mode Decomposition (EDMD), which is a method that approximates the Koopman operator and therefore the Koopman eigenvalue, eigenfunction, and mode tuples defined in Sec. 2.1. The EDMD procedure requires: (a) a data set of snapshot pairs, i.e., $\{(x_m, y_m)\}_{m=1}^M$, that we will organize as a pair of data sets,

$X = [x_1\; x_2\; \cdots\; x_M], \qquad Y = [y_1\; y_2\; \cdots\; y_M], \qquad (4)$

where $x_m \in \mathcal{M}$ and $y_m \in \mathcal{M}$ are snapshots of the system state with $y_m = F(x_m)$, and (b) a dictionary of observables, $\mathcal{D} = \{\psi_1, \psi_2, \ldots, \psi_{N_K}\}$, where $\psi_k \in \mathcal{F}$, whose span we denote as $\mathcal{F}_{\mathcal{D}} \subset \mathcal{F}$; for brevity, we also define the vector valued function $\Psi : \mathcal{M} \to \mathbb{C}^{1 \times N_K}$, where

$\Psi(x) = [\psi_1(x)\; \psi_2(x)\; \cdots\; \psi_{N_K}(x)].$
The data set needed is typically constructed from multiple short bursts of simulation or from experimental data. For example, if the data were given as a single time series, then for a given snapshot $x_m$, the corresponding $y_m$ is the next snapshot in the time series. The optimal choice of dictionary elements remains an open question, but a short discussion including some pragmatic choices will be given in Sec. 3. For now, we assume that $\mathcal{F}_{\mathcal{D}}$ is “rich enough” to accurately approximate a few of the leading Koopman eigenfunctions.
2.2.1 Approximating the Koopman Operator and its Eigenfunctions
Now we seek to generate $K \in \mathbb{C}^{N_K \times N_K}$, a finite dimensional approximation of $\mathcal{K}$. By definition, a function $\psi \in \mathcal{F}_{\mathcal{D}}$ can be written as

$\psi = \sum_{k=1}^{N_K} a_k \psi_k = \Psi a,$

the linear superposition of the elements in the dictionary with the weights $a \in \mathbb{C}^{N_K}$. Because $\mathcal{F}_{\mathcal{D}}$ is typically not an invariant subspace of $\mathcal{K}$,

$\mathcal{K}\psi = (\Psi \circ F) a = \Psi (K a) + r, \qquad (7)$

which includes the residual term $r \in \mathcal{F}$. To determine $K$, we will minimize

$J = \frac{1}{2} \sum_{m=1}^{M} |r(x_m)|^2 = \frac{1}{2} \sum_{m=1}^{M} \left| \Psi(y_m) a - \Psi(x_m) K a \right|^2, \qquad (8)$

where $x_m$ is the $m$-th snapshot in $X$, and $y_m = F(x_m)$ is the $m$-th snapshot in $Y$. Equation (8) is a least squares problem, and therefore cannot have multiple isolated local minima; it must either have a unique global minimizer or a continuous family (or families) of minimizers. As a result, regularization (here via the truncated singular value decomposition) may be required to ensure the solution is unique, and the $K$ that minimizes (8) is

$K \triangleq G^+ A, \qquad (9)$

where $^+$ denotes the pseudoinverse and

$G = \frac{1}{M} \sum_{m=1}^{M} \Psi(x_m)^* \Psi(x_m), \qquad A = \frac{1}{M} \sum_{m=1}^{M} \Psi(x_m)^* \Psi(y_m), \qquad (10)$

with $K, G, A \in \mathbb{C}^{N_K \times N_K}$. As a result, $K$ is a finite dimensional approximation of $\mathcal{K}$ that maps $\psi \in \mathcal{F}_{\mathcal{D}}$ to some other $\hat{\psi} \in \mathcal{F}_{\mathcal{D}}$ by minimizing the residuals at the data points. As a consequence, if $\xi_j$ is the $j$-th eigenvector of $K$ with the eigenvalue $\mu_j$, then the EDMD approximation of an eigenfunction of $\mathcal{K}$ is

$\varphi_j = \Psi \xi_j. \qquad (11)$
Finally, in many applications the discrete time data in $X$ and $Y$ are generated by a continuous time process with a sampling interval of $\Delta t$. If this is the case, we define $\lambda_j = \ln(\mu_j)/\Delta t$ to approximate the eigenvalues of the continuous time system. In the remainder of the manuscript, we denote the eigenvalues of $K$ with the $\mu_j$ and (when applicable) the approximation of the corresponding continuous time eigenvalues as $\lambda_j$. Although both embody the same information, one choice is often more natural for a specific problem.
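The computation in (9)–(11) is short enough to state directly in code. The following Python sketch (the function name `edmd` and its calling convention are ours, not a fixed part of the method) builds $G$, $A$, and $K$ from the snapshot matrices and a user-supplied dictionary map:

```python
import numpy as np

def edmd(X, Y, psi):
    """EDMD approximation of the Koopman operator.

    X, Y : N x M arrays whose columns are the snapshot pairs (x_m, y_m).
    psi  : callable mapping a state x to the dictionary row Psi(x).
    Returns K = G^+ A, where G and A are the sample averages in (10)."""
    PsiX = np.array([psi(x) for x in X.T])     # M x N_K, row m is Psi(x_m)
    PsiY = np.array([psi(y) for y in Y.T])     # M x N_K, row m is Psi(y_m)
    M = PsiX.shape[0]
    G = PsiX.conj().T @ PsiX / M               # (1/M) sum Psi(x_m)^* Psi(x_m)
    A = PsiX.conj().T @ PsiY / M               # (1/M) sum Psi(x_m)^* Psi(y_m)
    return np.linalg.pinv(G) @ A               # pseudoinverse regularizes G

# Eigenpairs of K give the approximations mu_j and phi_j = Psi xi_j, e.g.:
# mu, Xi = np.linalg.eig(K); phi_j at x is psi(x) @ Xi[:, j]
```

For a linear map with the dictionary $\{1, x_1, x_2\}$, the eigenvalues of the returned $K$ are exactly $1$ together with the eigenvalues of the map, which is a convenient sanity check.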
2.2.2 Computing the Koopman Modes
Next, we will compute approximations of the Koopman modes for the full state observable using EDMD. Recall that the Koopman modes are the weights needed to express the full state in the Koopman eigenfunction basis. As such, we will proceed in two steps: first, we will express the full state observable using the elements of $\mathcal{D}$; then, we will find a mapping from the elements of $\mathcal{D}$ to the numerically computed eigenfunctions. Applying these two steps in sequence will yield the observables expressed as a linear combination of Koopman eigenfunctions, whose weights are, by definition, the Koopman modes for the full state observable.
Recall that the full state observable, $g(x) = x$, is a vector valued observable that can be generated by “stacking” scalar valued observables, $g_i(x) = e_i^* x$, as follows:

$g(x) = \sum_{i=1}^{N} g_i(x) e_i,$

where $e_i$ is the $i$-th unit vector in $\mathbb{R}^N$. At this time, we conveniently assume that all $g_i \in \mathcal{F}_{\mathcal{D}}$ so that $g_i = \Psi b_i$, where $b_i \in \mathbb{C}^{N_K}$ is some appropriate vector of weights. If this is not the case, approximate Koopman modes can be computed by projecting $g_i$ onto $\mathcal{F}_{\mathcal{D}}$, though the accuracy and usefulness of this fit clearly depends on the choice of $\mathcal{D}$. To avoid this issue, $g_i \in \mathcal{F}_{\mathcal{D}}$ for all $i$ in all examples that follow. In either case, the entire vector valued observable can be expressed (or approximated) in this manner as

$g(x) = B^T \Psi(x)^T, \qquad B = [b_1\; b_2\; \cdots\; b_N] \in \mathbb{C}^{N_K \times N}. \qquad (14)$

Next, we will express the $\psi_k$ in terms of all the $\varphi_k$, which are our numerical approximations of the Koopman eigenfunctions. For notational convenience, we define the vector valued function $\Phi : \mathcal{M} \to \mathbb{C}^{1 \times N_K}$, where

$\Phi(x) = [\varphi_1(x)\; \varphi_2(x)\; \cdots\; \varphi_{N_K}(x)] = \Psi(x) \Xi, \qquad \Xi = [\xi_1\; \xi_2\; \cdots\; \xi_{N_K}],$

where $\xi_j$ is the $j$-th eigenvector of $K$ associated with $\mu_j$. Therefore, we can determine the $\psi_k$ as a function of the $\varphi_k$ by inverting $\Xi$. Because $\Xi$ is a matrix of eigenvectors, its inverse is $\Xi^{-1} = W^*$, where $W = [w_1\; w_2\; \cdots\; w_{N_K}]$ and $w_j$ is the $j$-th left eigenvector of $K$, suitably scaled so that $w_i^* \xi_j = \delta_{ij}$. As a result, $\Psi = \Phi W^*$, and

$g(x) = B^T \Psi(x)^T = B^T (\Phi(x) W^*)^T = \sum_{k=1}^{N_K} v_k \varphi_k(x), \qquad v_k = (w_k^* B)^T,$

where $v_k$ is the $k$-th Koopman mode. This is the formula for the Koopman modes that we desired.
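A sketch of this computation in Python (with a hypothetical helper name of our own): given $K$ from EDMD and the coefficient matrix $B$ from (14), the modes are read off from the left eigenvectors.

```python
import numpy as np

def koopman_modes(K, B):
    """Koopman modes for g(x) = B^T Psi(x)^T, with column i of B holding
    the dictionary coefficients b_i of g_i.  Returns the eigenvalues mu,
    the eigenvector matrix Xi, and V whose column k is v_k = (w_k^* B)^T."""
    mu, Xi = np.linalg.eig(K)     # right eigenvectors xi_j in the columns of Xi
    W_star = np.linalg.inv(Xi)    # row j is w_j^*, so that w_i^* xi_j = delta_ij
    V = (W_star @ B).T            # column k is the Koopman mode v_k
    return mu, Xi, V
```

By construction, $\sum_k v_k \varphi_k(x) = B^T \Psi(x)^T$ for every $x$; that is, the modes reconstruct the full state exactly whenever each $g_i$ lies in $\mathcal{F}_{\mathcal{D}}$.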
In summary, EDMD requires a data set of snapshot pairs, $\{(x_m, y_m)\}_{m=1}^M$, that we represent as two data sets, $X$ and $Y$, as well as a dictionary of observables, $\mathcal{D}$. Furthermore, it assumes that the leading Koopman eigenfunctions are (nearly) contained within $\mathcal{F}_{\mathcal{D}}$, the subspace spanned by the elements of $\mathcal{D}$. With this information, a finite dimensional approximation of the Koopman operator, $K$, can be computed using (9). The eigenvalues of $K$ are the EDMD approximations of the Koopman eigenvalues. The right eigenvectors of $K$ generate the approximations of the eigenfunctions, and the left eigenvectors of $K$ generate the approximations of the Koopman modes.
2.3 Convergence of the EDMD Algorithm to a Galerkin Method
In this subsection, we relate EDMD to the Galerkin methods one would use to approximate the Koopman operator with complete information about the underlying dynamical system. In this context, a Galerkin method is a weighted residual method where the residual, as defined in (7), is orthogonal to the span of $\mathcal{D}$. In particular, we show that the EDMD approximation of the Koopman operator converges to the approximation that would be obtained from a Galerkin method when $M$ becomes sufficiently large and if: (1) the elements of $X$ are drawn from a distribution on $\mathcal{M}$ with density $\rho$, and (2) $\psi_i \in L^2(\mathcal{M}, \rho)$ for each $\psi_i \in \mathcal{D}$. The first assumption defines a process for adding new data points to our set, and could be replaced with other sampling schemes. The second assumption is required so that the inner products in the Galerkin method converge, which is relevant for problems where $\mathcal{M}$ is unbounded.
If EDMD were a Galerkin method, then the entries of $G$ and $A$ in (9) would be defined as

$G_{ij} = \langle \psi_i, \psi_j \rangle = \int_{\mathcal{M}} \psi_i(x)^* \psi_j(x)\, \rho(x)\,dx, \qquad A_{ij} = \langle \psi_i, \psi_j \circ F \rangle = \int_{\mathcal{M}} \psi_i(x)^* \psi_j(F(x))\, \rho(x)\,dx, \qquad (19)$

where $\langle \cdot, \cdot \rangle$ is the inner product on $L^2(\mathcal{M}, \rho)$, and the finite dimensional Galerkin approximation of the Koopman operator would be $K = G^+ A$. The performance of this method certainly depends upon the choice of $\mathcal{D}$ and $\rho$, but it is nevertheless a Galerkin method as the residual would be orthogonal to $\mathcal{F}_{\mathcal{D}}$ [31, 34]. There are nontrivial questions about what sets of observables and what measures, $\rho$, are required if the Galerkin method is to generate a useful approximation of the Koopman operator (e.g., when can we “trust” our eigenfunctions if $\rho$ is compactly supported but $\mathcal{M}$ is not?), but they are beyond the scope of this manuscript and will be the focus of future work.
For a finite $M$, the $(i,j)$-th element of $G$ is

$G_{ij} = \frac{1}{M} \sum_{m=1}^{M} \psi_i(x_m)^* \psi_j(x_m),$

which is the sample mean of $\psi_i^* \psi_j$ over the snapshots. Similarly,

$A_{ij} = \frac{1}{M} \sum_{m=1}^{M} \psi_i(x_m)^* \psi_j(y_m) = \frac{1}{M} \sum_{m=1}^{M} \psi_i(x_m)^* \psi_j(F(x_m)). \qquad (20)$

When $M$ is finite, (19) is approximated by (20). However, by the law of large numbers, the sample means almost surely converge to the expectation when the number of samples, $M$, becomes sufficiently large. For this system, the expectation can be written as

$\mathbb{E}\left[\psi_i(x)^* \psi_j(x)\right] = \int_{\mathcal{M}} \psi_i(x)^* \psi_j(x)\, \rho(x)\,dx,$

which reintroduces the integrals in (19). As a result, the entries of $G$ and $A$ converge to the values they would have if the integrals were taken analytically, and therefore, the output of the EDMD procedure will converge to the output of a Galerkin method. With randomly distributed initial data, the needed integrals are computed using Monte-Carlo integration, and the rate of convergence will be $\mathcal{O}(M^{-1/2})$. Other sampling choices, such as placing points on a uniform grid, effectively use different quadrature rules and could therefore obtain a better rate of convergence.
2.4 Relationship with DMD
When $M$ is not large, EDMD will not be an accurate Galerkin method because the quadrature errors generated by the Monte-Carlo integrator will be significant, and so the residual will probably not be orthogonal to $\mathcal{F}_{\mathcal{D}}$. However, it is still formally an extension of DMD, which has empirically been shown to yield meaningful results even without exhaustive data sets. In this section, we show that EDMD is equivalent to DMD for a very specific – and restrictive – choice of $\mathcal{D}$, in the sense that EDMD and DMD will then produce the same set of eigenvalues and modes for any set of snapshot pairs.
Because there are many conceptually equivalent but mathematically different definitions of DMD, the one we adopt here is taken from the recent literature, which defines the DMD modes as the eigenvectors of the matrix

$K_{\mathrm{DMD}} \triangleq Y X^+,$

where the $i$-th mode is associated with the $i$-th eigenvalue of $K_{\mathrm{DMD}}$, $\mu_i$. Here $K_{\mathrm{DMD}}$ is constructed using the data matrices in (4), where again $^+$ denotes the pseudoinverse. This definition is a generalization of preexisting DMD algorithms [4, 35], and does not require the data to be in the form of a single time series.
Now, we will prove that the Koopman modes computed using EDMD with $\Psi(x) = x^T$ (i.e., $\psi_i(x) = e_i^* x$), which is the special (if relatively restrictive) choice of dictionary alluded to earlier, are equivalent to the DMD modes by showing that the $j$-th Koopman mode, $v_j$, is also an eigenvector of $K_{\mathrm{DMD}}$ and, hence, a DMD mode. Because the elements of the full state observable are the dictionary elements, $B = I$ in (14). Then, the Koopman modes are the complex conjugates of the left eigenvectors of $K$, so $v_j = \bar{w}_j$. Furthermore, $G^T = \frac{1}{M} X X^*$ and $A^T = \frac{1}{M} Y X^*$. Then

$K_{\mathrm{DMD}} v_j = Y X^+ \bar{w}_j = A^T (G^T)^+ \bar{w}_j = (G^+ A)^T \bar{w}_j = K^T \bar{w}_j = \mu_j \bar{w}_j = \mu_j v_j.$

Therefore, $K_{\mathrm{DMD}} v_j = \mu_j v_j$, and all the Koopman modes computed by EDMD are eigenvectors of $K_{\mathrm{DMD}}$ and, thus, DMD modes. Once again, the choice of dictionary is critical; EDMD and DMD are only equivalent for this very specific $\mathcal{D}$, and other choices of $\mathcal{D}$ will generate different (and potentially more useful) results.
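This equivalence can be checked numerically. The sketch below (synthetic data from an arbitrary linear map of our own choosing) builds the DMD matrix $Y X^+$ and the EDMD matrix $K = G^+ A$ for the linear dictionary $\Psi(x) = x^T$; the two share their eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
A_dyn = np.array([[0.9, 0.1, 0.00],
                  [0.0, 0.8, 0.05],
                  [0.0, 0.0, 0.70]])   # illustrative linear dynamics
M = 200
X = rng.standard_normal((3, M))        # snapshots x_m in the columns
Y = A_dyn @ X                          # y_m = F(x_m) = A_dyn x_m

K_dmd = Y @ np.linalg.pinv(X)          # DMD matrix  Y X^+
G = np.conj(X) @ X.T / M               # G with Psi(x) = x^T
A = np.conj(X) @ Y.T / M               # A with Psi(x) = x^T
K = np.linalg.pinv(G) @ A              # EDMD matrix  G^+ A

mu_dmd = np.sort(np.linalg.eigvals(K_dmd).real)
mu_edmd = np.sort(np.linalg.eigvals(K).real)
```

Both spectra equal the eigenvalues of the map, here $\{0.7, 0.8, 0.9\}$.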
Conceptually, DMD can be thought of as producing an approximation of the Koopman eigenfunctions using the set of linear monomials as basis functions for $\mathcal{F}_{\mathcal{D}}$, which is analogous to a one–term Taylor expansion. For problems where the eigenfunctions can be approximated accurately using linear monomials (e.g., in some small neighborhood of a stable fixed point), DMD will produce an accurate local approximation of the Koopman eigenfunctions. However, this is certainly not the case for all systems (particularly beyond the region of validity for local linearization). EDMD can be thought of as an extension of DMD that retains additional terms in the expansion, where these additional terms are determined by the elements of $\mathcal{D}$. The quality of the resulting approximation is governed by $\mathcal{F}_{\mathcal{D}}$, and therefore, depends upon the choice of $\mathcal{D}$. However, the hope is that a more extensive $\mathcal{D}$ will produce a superior approximation of the Koopman eigenfunctions compared to the one produced by DMD simply because it uses a larger $\mathcal{F}_{\mathcal{D}}$. As a result, with the right choice of $\mathcal{D}$, EDMD should be applicable to a broader array of problems where the implicit choice of $\mathcal{D}$ made by DMD results in less accuracy than desired even if $M$ is not large enough for EDMD to be an accurate Galerkin method.
3 The Choice of the Dictionary
As with all spectral methods, the accuracy and rate of convergence of EDMD depends on $\mathcal{D}$, whose elements, which we refer to as trial or basis functions, span the subspace of observables, $\mathcal{F}_{\mathcal{D}}$. Possible choices for the elements of $\mathcal{D}$ include: polynomials, Fourier modes, radial basis functions, and spectral elements, but the optimal choice of basis functions likely depends on both the underlying dynamical system and the sampling strategy used to obtain the data. Any of these sets are, in principle, a useful choice for $\mathcal{D}$, though some care must be taken on infinite domains to ensure that any needed inner products will converge.
Choosing $\mathcal{D}$ for EDMD is, in some cases, more difficult than selecting a set of basis functions for use in a standard spectral method because the domain on which the underlying dynamical system is defined, $\mathcal{M}$, is not necessarily known. Typically, we can define a larger domain $\Omega \supseteq \mathcal{M}$ so that it contains all the data in $X$ and $Y$, e.g., pick $\Omega$ to be a “box” in $\mathbb{R}^N$ that contains every snapshot in $X$ and $Y$. Next, we choose the elements of $\mathcal{D}$ to be a basis for a space of functions that map $\Omega \to \mathbb{C}$. Because $\mathcal{M} \subseteq \Omega$, this choice of $\mathcal{D}$ can be used in the EDMD procedure, but there is no guarantee that the elements of $\mathcal{D}$, restricted to $\mathcal{M}$, form a basis for $\mathcal{F}_{\mathcal{D}}$ as there may be redundancies. The potential for these redundancies and the numerical issues they generate is why regularization, and hence the pseudoinverse, is required in (9). An example of these redundancies and their effects is given in App. A.
Hermite Polynomials — Problems defined on $\mathbb{R}^N$
Radial Basis Functions — Problems defined on irregular domains
Discontinuous Spectral Elements — Large problems where a block-diagonal $G$ is beneficial/computationally important
Although the optimal choice of $\mathcal{D}$ is unknown, there are three choices that are broadly applicable in our experience: Hermite polynomials, radial basis functions (RBFs), and discontinuous spectral elements. The Hermite polynomials are the simplest of the three sets, and are best suited to problems defined on $\mathbb{R}^N$ if the data in $X$ are normally distributed. The observables that comprise $\mathcal{D}$ are products of the Hermite polynomials in a single dimension (e.g., $\psi(x) = H_1(x_1) H_2(x_2)$, where $H_i$ is the $i$-th Hermite polynomial and $x = (x_1, x_2)$). This set of basis functions is simple to implement, and conceptually related to approximating the Koopman eigenfunctions with a Taylor expansion. Furthermore, because they are orthogonal with respect to Gaussian weights, $G$ will be (approximately) diagonal if the $x_m$ are drawn from a normal distribution, which can be beneficial numerically.
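Such a dictionary is straightforward to assemble with NumPy's probabilists' Hermite polynomials. The sketch below (our own helper, for a two dimensional state, using a total-degree truncation as one of several reasonable conventions) evaluates $\Psi(x)$:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def hermite_dictionary(x, order=5):
    """Evaluate Psi(x) for a 2D state using products He_i(x1) He_j(x2)
    of probabilists' Hermite polynomials with total degree i + j <= order."""
    He = lambda k, s: hermeval(s, [0.0] * k + [1.0])    # He_k evaluated at s
    return np.array([He(i, x[0]) * He(j, x[1])
                     for i in range(order + 1)
                     for j in range(order + 1 - i)])
```

With `order=5` this yields 21 dictionary elements, and `hermite_dictionary` can be passed directly to an EDMD routine as the map $x \mapsto \Psi(x)$.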
An alternative to the Hermite polynomials are discontinuous spectral elements. To use this set, we define a set of boxes, $B_1, B_2, \ldots, B_{N_B}$, such that $\bigcup_i B_i \supseteq \mathcal{M}$. Then, on each of the $B_i$, we define (suitably transformed) Legendre polynomials. For example, in one dimension, each basis function is of the form

$\psi_{ij}(x) = \begin{cases} L_j(x_i'), & x \in B_i, \\ 0, & \text{otherwise}, \end{cases}$

where $L_j$ is the $j$-th Legendre polynomial, and $x_i'$ is $x$ transformed such that the “edges” of the box $B_i$ are at $x_i' = \pm 1$. The advantage of this basis is that $G$ will be block diagonal, and therefore easy to invert even if a very large number of basis functions are employed.

With a fixed amount of data, an equally difficult task is choosing the $B_i$; the number and arrangement of the $B_i$ is a balance between the span of the basis functions (i.e., $h$-type convergence), which increases as the number of boxes is increased, and the accuracy of the quadrature rule, which decreases because smaller boxes contain fewer data points. To generate a covering of $\mathcal{M}$, we use a method similar to the one used by GAIO. Initially, all the data (i.e., $X$ and $Y$) are contained within a single user selected box, $B_0$. If this box contains more than a pre-specified number of data points, it is subdivided into $2^N$ subdomains of equal measure (e.g., in one dimension, a box is split into two equal halves). We then proceed recursively: if any of these subdomains contain more than a pre-specified number of points, then they too are subdivided; this proceeds until no box has an “overly large” number of data points. Any boxes that do not contain any data points are pruned, which after $\ell$ iterates leaves the set of subdomains, $B_1, \ldots, B_{N_B}$, on which we define the Legendre polynomials. The resulting trial functions are compactly supported and can be evaluated efficiently using $2^N$-ary trees, where $N$ is the dimension of a snapshot. Finally, the higher order polynomials used here allow for more rapid $p$-type convergence if the eigenfunctions happen to be smooth.
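A minimal recursive implementation of this GAIO-like covering might look as follows (our own sketch; the thresholds, depth limit, and half-open box convention are illustrative choices):

```python
import numpy as np

def subdivide(points, box, max_pts=50, depth=8):
    """Recursively split the half-open box (lo, hi) into 2^N equal subboxes
    until no leaf holds more than max_pts snapshots; empty boxes are pruned.
    Returns the list of leaf boxes on which Legendre polynomials are defined."""
    lo, hi = box
    inside = points[np.all((points >= lo) & (points < hi), axis=1)]
    if len(inside) == 0:
        return []                              # prune empty boxes
    if len(inside) <= max_pts or depth == 0:
        return [box]                           # small enough: keep as a leaf
    mid = 0.5 * (lo + hi)
    N = len(lo)
    leaves = []
    for corner in range(2 ** N):               # enumerate the 2^N subboxes
        upper = np.array([(corner >> d) & 1 for d in range(N)], dtype=bool)
        sub_lo = np.where(upper, mid, lo)
        sub_hi = np.where(upper, hi, mid)
        leaves += subdivide(inside, (sub_lo, sub_hi), max_pts, depth - 1)
    return leaves
```

Because the subboxes partition their parent, every snapshot lands in exactly one leaf, and no retained leaf holds more than `max_pts` points unless the depth limit is reached.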
The final choice of trial functions is a set of radial basis functions (RBFs), which appeal to previous work on “mesh-free” methods. Because these methods do not require a computational grid or mesh, they are particularly effective for problems where $\mathcal{M}$ has what might be called a complex geometry. Many different RBFs could be effective, but one particularly useful set of RBFs are the thin plate splines [36, 41] because they do not require the scaling parameter that other RBFs (e.g., Gaussians) do. However, we still must choose the “centers” about which the RBFs are defined, which we do with $k$-means clustering with a pre-specified value of $k$ on the combined data set. Although we make no claims of optimality, in our examples, the density of the RBF centers appears to be directly related to the density of data points, which is, intuitively, a reasonable method for distributing the RBF centers as regions with more samples will also have more spatial resolution.
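The sketch below builds such a dictionary (thin plate splines $r^2 \ln r$ about $k$-means centers); the tiny Lloyd iteration and the appended constant observable are our own illustrative choices:

```python
import numpy as np

def thin_plate_dictionary(data, k=10, iters=20, seed=0):
    """Thin-plate-spline dictionary psi_c(x) = r^2 ln r with r = ||x - c||,
    centers c chosen by a few Lloyd (k-means) iterations on the snapshots.
    data is an M x N array of snapshots; a constant observable is appended."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((data[:, None, :] - centers[None]) ** 2).sum(-1)
        label = d2.argmin(1)             # nearest center for each snapshot
        for j in range(k):
            pts = data[label == j]
            if len(pts):                 # keep the old center if a cluster empties
                centers[j] = pts.mean(0)

    def psi(x):
        r2 = ((x - centers) ** 2).sum(-1)
        rbf = 0.5 * r2 * np.log(np.where(r2 > 0, r2, 1.0))  # r^2 ln r, 0 at centers
        return np.concatenate(([1.0], rbf))
    return psi, centers
```

As with the other dictionaries, `psi` plays the role of the map $x \mapsto \Psi(x)$ in the EDMD procedure.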
There are, of course, other dictionaries that may prove more effective in other circumstances. For example, basis functions defined in polar coordinates are useful when limit cycles or other periodic orbits are present as they mimic the form of the Koopman eigenfunctions for simple limit cycles. How to choose the best set of trial functions is an important, yet open, question; fortunately, the EDMD method often produces useful results even with the relatively naive choices of trial functions presented in this section.
4 Deterministic Data and the Koopman Eigenfunctions
Most applications of DMD assume that the data sets were generated by a deterministic dynamical system. In Sec. 2, we showed that EDMD produces an approximation of the Koopman eigenfunctions, eigenvalues, and modes with large amounts of data. In this section, we demonstrate that EDMD can produce accurate approximations of the Koopman eigenfunctions, eigenvalues, and modes with limited amounts of data by applying the method to two illustrative examples. The first is a discrete time linear system, one where the eigenfunctions, eigenvalues, and modes are known analytically, and serves as a test case for the method. The second is the unforced Duffing oscillator. Our goal there is to demonstrate that the approximate Koopman eigenfunctions obtained via EDMD have the potential to serve as a data driven parameterization of a system with multiple basins of attraction.
4.1 A Linear Example
4.1.1 The Governing Equation, Data, and Analytically Obtained Eigenfunctions
One system where the Koopman eigenfunctions are known analytically is a simple LTI system of the form
with and . It is clear that an eigendecomposition yields complete information about the underlying system provided has a complete set of eigenvectors. Because the underlying dynamics are linear, it should not be surprising that the Koopman approach contains the eigendecomposition.
To show this, note that the action of the Koopman operator for this problem is
where . Assuming has a complete set of eigenvectors, it will have left eigenvectors, , that satisfy , where the -th eigenvector is associated with the eigenvalue . Then, the function
is an eigenfunction of the Koopman operator with the eigenvalue for . This is a well known result (see, e.g., Ref. ), so we show the proof of this only for the sake of completeness. We proceed directly:
where the -th Koopman mode, , is the -th eigenvector of suitably scaled so that . This is identical to writing in terms of the eigenvectors of ; inner products with the left eigenvectors determine the component in each direction, and the (right) eigenvectors allow the full state to be reconstructed.
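This argument is easy to verify numerically. In the sketch below (the matrix and all names are illustrative, not taken from the text), observables built from the left eigenvectors satisfy the Koopman eigenfunction relation, and the right eigenvectors act as Koopman modes that reconstruct the full state:

```python
import numpy as np

# A small diagonalizable matrix standing in for the system dynamics.
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])

mu, V = np.linalg.eig(A)   # right eigenvectors in the columns of V
W = np.linalg.inv(V)       # rows of W are left eigenvectors: W A = diag(mu) W

rng = np.random.default_rng(0)
x = rng.normal(size=2)

# phi_j(x) = w_j^* x is a Koopman eigenfunction: phi_j(A x) = mu_j phi_j(x)
for j in range(2):
    assert np.isclose(W[j] @ (A @ x), mu[j] * (W[j] @ x))

# Reconstruction via Koopman modes: x = sum_j phi_j(x) v_j
x_rec = sum((W[j] @ x) * V[:, j] for j in range(2))
```

The reconstruction step is exactly the statement in the text: the left eigenvectors give the components, and the right eigenvectors rebuild the state.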
As a concrete example, consider
where . From (27), the Koopman eigenfunctions and eigenvalues are
for . Figure 2 shows the first 8 nontrivial eigenfunctions sorted by their associated eigenvalue. The 0-th eigenfunction, with , was omitted because it is always an eigenfunction and will be recovered by EDMD if is included as a dictionary element.
To apply the EDMD procedure, one needs both data and a dictionary of observables. The data in consists of 100 normally distributed initial conditions, , and their images, , which we aggregate in the matrix , i.e., . The dictionary, , is chosen to contain the direct sum of Hermite polynomials in and that include up to the 5th order terms in and , i.e.,
where has been written this way to make the ordering of the dictionary elements apparent. The Hermite polynomials were chosen because they are an appropriate basis for Cauchy problems, and are orthogonal with respect to the weight function, , that is implicit in the normally distributed sampling strategy used here.
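A dictionary of this kind can be tabulated with NumPy's probabilists' Hermite polynomials. The tensor-product truncation below (orders 0 through 5 in each variable, 36 functions) is one plausible reading of the elided formula; the exact set of cross terms in the text may differ:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def hermite_dictionary(X, order=5):
    """Tensor products He_i(x) He_j(y) for 0 <= i, j <= order,
    evaluated at the rows of the M x 2 snapshot matrix X."""
    x, y = X[:, 0], X[:, 1]
    cols = []
    for i in range(order + 1):
        Hi = hermeval(x, [0.0] * i + [1.0])       # He_i(x)
        for j in range(order + 1):
            Hj = hermeval(y, [0.0] * j + [1.0])   # He_j(y)
            cols.append(Hi * Hj)
    return np.column_stack(cols)                  # M x 36 for order = 5

X = np.random.default_rng(0).normal(size=(100, 2))  # normally distributed samples
Psi = hermite_dictionary(X)
```

The first column is the constant function He_0(x)He_0(y) = 1, and the pure first-order columns recover the state itself, so the full state observable lies in the span of this dictionary.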
Figure 3 shows the same 8 eigenfunctions computed using the EDMD method. Overall, there is (as expected) excellent quantitative agreement, both in the eigenvalues and the eigenfunctions, with the analytical results presented in Fig. 2. On the domain shown, the eigenvalues are accurate to 10 digits, and the maximum pointwise difference between the true and computed eigenfunction is . In this problem, standard DMD also generates highly accurate approximations of and and their associated eigenvalues, but will not produce any of the other eigenfunctions; the standard choice of the dictionary only contains linear terms and, therefore, cannot reproduce eigenfunctions with constant terms or any nonlinear terms. As a result, expanding the basis allows EDMD to capture more of the Koopman eigenfunctions than standard DMD could. These additional eigenfunctions are not necessary for an LTI system, but are in principle needed in nonlinear settings.
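The underlying computation is the EDMD recipe summarized in Sec. 2: assemble the Galerkin matrices from snapshot pairs and form K = G⁺A. The minimal sketch below (our notation; the diagonal test system and monomial dictionary are illustrative) shows how enlarging the dictionary beyond linear terms exposes additional Koopman eigenvalues, here the products of the linear ones:

```python
import numpy as np

def edmd(X, Y, psi):
    """Minimal EDMD: given snapshot pairs (rows of X map to rows of Y)
    and a dictionary psi, return the finite-dimensional Koopman matrix
    K = G^+ A with G = Psi_X^* Psi_X / M and A = Psi_X^* Psi_Y / M."""
    PsiX, PsiY = psi(X), psi(Y)
    M = len(X)
    G = PsiX.conj().T @ PsiX / M
    A = PsiX.conj().T @ PsiY / M
    return np.linalg.pinv(G) @ A

# Linear test system x_{n+1} = J x_n with eigenvalues 0.9 and 0.5.
J = np.diag([0.9, 0.5])
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = X @ J.T

# Dictionary {1, x, y, x^2, xy, y^2} -- invariant under linear dynamics.
psi = lambda Z: np.column_stack([np.ones(len(Z)), Z[:, 0], Z[:, 1],
                                 Z[:, 0]**2, Z[:, 0] * Z[:, 1], Z[:, 1]**2])

K = edmd(X, Y, psi)
lams = np.sort(np.linalg.eigvals(K).real)
# Koopman eigenvalues: 1, 0.9, 0.5 and the products 0.81, 0.45, 0.25.
```

With only the linear terms, the same computation reduces to DMD and returns just 0.9 and 0.5, which is the comparison drawn in the paragraph above.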
This level of accuracy is in large part because the first nine eigenfunctions are in , the subspace of observables spanned by . When this is not the case, the result is either missing or erroneous eigenfunctions, like the examples shown in Fig. 4. The 9-th eigenfunction, with , is not captured by EDMD with the dictionary chosen here because it lacks the needed 5-th order monomials in and , similar to how DMD skips the 2nd Koopman eigenfunction due to a lack of quadratic terms.
The erroneous eigenfunction appears because is not invariant with respect to the action of the Koopman operator. In particular, contains the term whose image because . In most applications, there are small components of the eigenfunction that cannot be represented in the dictionary chosen, which results in errors in the eigenfunction such as the one seen here. Even in the limit of infinite data, we would compute the eigenfunctions of , where is the projection onto , rather than the eigenfunctions of . To verify that this is not a legitimate eigenfunction, we added and to , which removes this erroneous eigenfunction.
Finally, we compute the Koopman modes for the full state observable. Using the ordering of the dictionary elements given in (31), the weight matrix, in (14), needed to compute the Koopman modes for the full state observable, is
The Koopman mode associated with is , while the Koopman mode associated with is ; again, these are the eigenvectors of . The contribution of the other numerically computed eigenfunctions in reconstructing the full state observable is negligible (i.e., for ), so the Koopman/EDMD analysis is an eigenvalue/eigenvector decomposition once numerical errors are taken into consideration.
Although EDMD reveals a richer set of Koopman eigenfunctions that are analytically known to exist, their associated Koopman modes are zero and, hence, they can be neglected. Our goal in presenting this example is not to demonstrate any new phenomenon, but rather to demonstrate that there is good quantitative agreement between the analytically obtained Koopman modes, eigenvalues, and eigenfunctions and the approximations produced by EDMD. Furthermore, it allowed us to highlight the types of errors that appear when is not an invariant subspace of , which results in erroneous eigenfunctions, or when the dictionary is missing elements, which results in missing eigenfunctions.
4.2 The Duffing Oscillator
In this section, we will compute the Koopman eigenfunctions for the unforced Duffing Oscillator, which, for the parameter regime of interest here, has two stable spirals and a saddle point whose stable manifold defines the boundary between the basins of attraction. Following Ref.  and the references contained therein, the eigenvalues of the linearizations about the fixed points in the system are known to be a subset of the Koopman eigenvalues, and for each stable spiral, the magnitude and phase of the associated Koopman eigenfunction parameterize the relevant basin of attraction. Additionally, because basins of attraction are forward invariant sets, there will be two eigenfunctions with , each of which is supported on one of the two basins of attraction in this system (or, equivalently, there will be a trivial eigenfunction and another eigenfunction with whose level sets denote the basins of attraction). Ultimately, we are not interested in recovering highly accurate eigenfunctions in this example. Instead, we will demonstrate that the eigenfunctions computed by EDMD are accurate enough to identify and parameterize the basins of attraction present in this problem for the region of interest.
The governing equations for the unforced Duffing Oscillator are
which we will study using the parameters , , and . In this regime, there are two stable spirals at with , and a saddle at , so almost every initial condition (except for those on the stable manifold of the saddle) is drawn to one of the spirals. In what follows, the data consist of trajectories with 11 samples each with a sampling interval of (i.e., ), and initial conditions uniformly distributed over . With this sampling rate and initialization scheme, many trajectories will approach the stable spirals, but few will have (to numerical precision) reached the fixed points. As a result, the basins of attraction cannot be determined by observing the last snapshot in a given trajectory. Instead, EDMD will be used to “stitch” together this ensemble of trajectories to form a single coherent picture.
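The data-generation step described above can be sketched as follows. The specific parameter values did not survive extraction, so the ones below are assumptions chosen to place the system in the usual two-spiral regime with fixed points at (±1, 0), and a modest number of trajectories is used for illustration:

```python
import numpy as np

# Unforced Duffing vector field; delta, beta, alpha are illustrative.
delta, beta, alpha = 0.5, -1.0, 1.0

def duffing(z):
    x, v = z
    return np.array([v, -delta * v - x * (beta + alpha * x**2)])

def rk4_step(z, dt):
    """One classical Runge-Kutta step for the Duffing vector field."""
    k1 = duffing(z)
    k2 = duffing(z + 0.5 * dt * k1)
    k3 = duffing(z + 0.5 * dt * k2)
    k4 = duffing(z + dt * k3)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def trajectory(z0, n_samples=11, dt_sample=0.25, substeps=25):
    """Return n_samples states spaced dt_sample apart, starting at z0."""
    z = np.array(z0, float)
    out = [z.copy()]
    for _ in range(n_samples - 1):
        for _ in range(substeps):
            z = rk4_step(z, dt_sample / substeps)
        out.append(z.copy())
    return np.array(out)

rng = np.random.default_rng(0)
Z0 = rng.uniform([-2, -2], [2, 2], size=(200, 2))  # uniform initial conditions
data = np.array([trajectory(z0) for z0 in Z0])     # shape (200, 11, 2)
```

Consecutive rows of each trajectory supply the snapshot pairs handed to EDMD.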
However, because there are multiple basins of attraction, the leading eigenfunctions will be discontinuous , and supported only on the appropriate basin of attraction. In principle, our computation could be done “all at once” using a single and applying EDMD to the complete data set. To enforce the compactly supported nature of the eigenfunctions regardless of which dictionary we use, we will proceed in a two-tiered fashion. First, the basins of attraction will be identified using all of the data and a dictionary with support everywhere we have data. Once we have identified these basins, both state space and the data will be partitioned into subdomains based on the numerically identified basins. The EDMD procedure will then be run on each subdomain and the corresponding partitioned data set individually.
4.2.1 Locating Basins of Attraction
Figure 5 highlights the first step: partitioning of state space into basins of attraction. We used a dictionary consisting of the constant function and 1000 radial basis functions (RBFs) (the thin plate splines described in Sec. 3), where -means clustering  on the full data set was used to choose the RBF centers. RBFs were chosen here because of the geometry of the computational domain; indeed, RBFs are often a fundamental component of “mesh-free” methods that avoid the nontrivial task of generating a computational mesh .
The leading (continuous time) eigenvalue is , which corresponds to the constant function. The second eigenfunction, shown in the leftmost image of Fig. 5, has , which should be considered an approximation of zero. The discrepancy between the numerically computed eigenfunction and the theoretical one is due to the choice of the dictionary. The analytical eigenfunction possesses a discontinuity on the edge of the basin of attraction (i.e., the stable manifold of the saddle point at the origin), but discontinuous functions are not in the space spanned by RBFs. Therefore, the numerically computed approximation “blurs” this edge as shown in Fig. 5.
The scatter plot in the center of Fig. 5 shows the data points colored by the 1st nontrivial eigenfunction. There is good qualitative agreement between the numerically computed basin of attraction and the actual basin. By computing the mean value of on the data and using that value as the threshold that determines which basin of attraction a point belongs to, the EDMD approach mis-classifies only 46 of the data points, an error of 0.5%, as shown by the rightmost plot. As a result, the leading eigenfunctions computed by EDMD are sufficiently accurate to produce a meaningful partition of the data.
4.2.2 Parameterizing a Basin of Attraction
Now that the basins of attraction have been identified, the next task is to develop a coordinate system or parameterization of the individual basins. To do so, we will use the eigenfunctions associated with the eigenvalues of the system linearization about the corresponding fixed point. Because these fixed points are spirals, this parameterization can be realized using the amplitude and phase of one member of the complex-conjugate pair of eigenfunctions. To approximate these eigenfunctions, we first partition our data into two sets as mentioned above using the leading Koopman eigenfunctions (including the mis-classified data points). On each subset of the data, the -means procedure was run again to select a new set of 1000 RBF centers, and this “adjusted” basis along with the constant function comprised the dictionary used by EDMD. Figure 6 shows the amplitude and phase of the eigenfunction with eigenvalue closest to computed using the data in each basin of attraction. The computed eigenvalues agree favorably with the analytically obtained eigenvalue; the basin of the spiral at has the eigenvalue , and the basin of the spiral at has the eigenvalue .
Figure 6 demonstrates that the amplitude and phase of a Koopman eigenfunction form something analogous to an “action–angle” parameterization of the basin of attraction. Due to the nonlinearity in the Duffing oscillator, this parameterization is more complicated than an appropriately shifted polar coordinate system, and is, therefore, not the parameterization that would be generated by linearization about either . The level sets of the amplitude of this eigenfunction are the so-called “isostables” . One feature predicted in that manuscript is that the 0-level set of the isostable is the fixed point in the basin of attraction; this feature is reflected in Fig. 6 by the blue region, which corresponds to small values of the eigenfunction near the fixed points at . Additionally, a singularity in the phase can be observed there. The EDMD approach produces noticeable numerical errors near the edges of the basin. These errors can be due to a lack of data, or to the singularities in the eigenfunctions that can occur at unstable fixed points .
In this section, we applied the EDMD procedure to deterministic systems and showed that it produces an approximation of the Koopman operator. With a sensible choice of data and , we showed that EDMD generates a quantitatively accurate approximation of the Koopman eigenvalues, eigenfunctions, and modes for the linear example. In the second example, we used the Koopman eigenfunctions to identify and parameterize the basins of attraction of the Duffing Oscillator. Although the EDMD approximation of the eigenfunctions could be made more accurate with more data, it is still accurate enough to serve as an effective parameterization. As a result, the EDMD method can be useful outside of the large data limit, and should be considered an enabling technology for data driven approximations of the Koopman eigenvalues, eigenfunctions, and modes.
5 Stochastic Data and the Kolmogorov Backward Equation
The EDMD approach is entirely data driven, and will produce an output regardless of the nature of the data given to it. However, if the results of EDMD are to be meaningful, then certain assumptions must be made about the dynamical system that produced the data used. In the previous section, it was assumed the data were generated by a deterministic dynamical system; as a result, EDMD produced approximations of the tuples of Koopman eigenfunctions, eigenvalues, and modes.
Another interesting case to consider is if the underlying dynamical system is a Markov process, such as a stochastic differential equation (SDE). For such systems, the evolution of an observable is governed by the Kolmogorov backward (KB) equation , whose “right hand side” has been called the “stochastic Koopman operator” (SKO) . In this section, we will show that EDMD produces approximations of the eigenfunctions, eigenvalues, and modes of the SKO if the underlying dynamical system happens to be a Markov process.
To accomplish this, we will prove that the EDMD method converges to a Galerkin method in the large data limit. After that, we will demonstrate its accuracy with finite amounts of data by applying it to the model problem of a 1D SDE with a double well potential, where the SKO eigenfunctions can be computed using standard numerical methods.
Another proposed application of the Koopman operator is model reduction, which has been explored in Ref. . Model reduction based on the Koopman eigenfunctions is equally applicable in both deterministic and stochastic settings, but we choose to present it for stochastic systems to highlight the similarities between EDMD and manifold learning techniques such as diffusion maps [45, 46]. In particular, we apply EDMD to an SDE defined on a “Swiss Roll,” which is a nonlinear manifold often used to test manifold learning methods . The purpose of this example is twofold: first, we show that a data driven parameterization of the Swiss Roll can be obtained using EDMD, and second, we show that this parameterization will preferentially capture the “slow” dynamics on that manifold before the “fast” dynamics when the noise is made anisotropic.
5.1 EDMD with Stochastic Data
For a discrete time Markov process,
the SKO  is defined as
where is an element in the probability space associated with the stochastic dynamics (), denotes the expected value over that space, and is a scalar observable. The SKO  takes an observable of the full system state and returns the conditional expectation of the observable “one timestep in the future.” Note that this definition is compatible with the deterministic Koopman operator because if is deterministic.
As with the deterministic case, we assume the snapshots in were generated by randomly placing initial conditions on with the density of and that is sufficiently large. Once again, does not need to be an invariant measure of the underlying dynamical system; it is simply the sampling density of the data. Due to the stochastic nature of the system, there are two probability spaces involved: one related to the samples in and another for the stochastic dynamics. Because our system has “process” rather than “measurement” noise, the are known exactly, and the interpretation of the Gram matrix, , remains unchanged. Therefore,
by the law of large numbers when is large enough. This is identical to the deterministic case. However, the definition of will change. Assuming that the choice of and are independent,
The elements of now contain a second integral over the probability space that pertains to the stochastic dynamics, which produces the expectation of the observable in the expression above.
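In symbols (ours, since the original notation was lost in extraction), the two Monte Carlo sums therefore converge to different kinds of integrals: G to an ordinary weighted inner product, and A to an inner product against the conditional expectation of the dictionary one step ahead, with F(x; ω) denoting the stochastic map:

```latex
G_{ij} \;=\; \frac{1}{M}\sum_{m=1}^{M}\psi_i(x_m)^{*}\,\psi_j(x_m)
\;\longrightarrow\; \int_{\mathcal{M}}\psi_i(x)^{*}\,\psi_j(x)\,\rho(x)\,dx,
\qquad
A_{ij} \;=\; \frac{1}{M}\sum_{m=1}^{M}\psi_i(x_m)^{*}\,\psi_j(y_m)
\;\longrightarrow\; \int_{\mathcal{M}}\psi_i(x)^{*}\,
\mathbb{E}\big[\psi_j(F(x;\omega))\big]\,\rho(x)\,dx.
```

The expectation inside A is supplied implicitly by the data: each pair (x_m, y_m) is a single draw of the stochastic map, and the average over pairs performs the Monte Carlo integration over both probability spaces at once.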
The accuracy of the resulting method will depend on the dictionary, , the manifold on which the dynamical system is defined, the data, and the dynamics used to generate it. One interesting special case is when the basis functions are indicator functions supported on “boxes.” When this is the case, EDMD is equivalent to the widely used Ulam Galerkin method [39, 17]. This equivalence is lost for other choices of and , but as we will demonstrate in the subsequent sections, EDMD can produce accurate approximations of the eigenfunctions for many other choices of these quantities.
The “stochastic Koopman modes” can then be computed using (18), but they too must be reinterpreted as the weights needed to reconstruct the expected value of the full state observable using the eigenfunctions of the SKO. Due to the stochastic nature of the dynamics, the Koopman modes can no longer exactly specify the state of the system. However, they can be used as approximations of the Koopman modes that would be obtained in the “noise free” limit when some appropriate restrictions are placed on the nature of the noise and the underlying dynamical system. Indeed, these are the modes we are truly computing when we apply DMD or EDMD to experimental data, which by its very nature, contains some noise.
5.2 A Stochastic Differential Equation with a Double Well Potential
In this section, we will show that the EDMD procedure is capable of accurately approximating the eigenfunctions of the stochastic Koopman operator by applying it to an SDE with a double well potential. Although we do not have analytical solutions for the eigenfunctions, the problem is simple enough that we can accurately compute them using standard numerical methods.
5.2.1 The Double Well Problem and Data
First, consider an SDE with a double well potential. Let the governing equations for this system be
where is the state, the drift, and is the (constant) diffusion coefficient. Furthermore, no-flux boundary conditions are imposed at . For this problem, we let as shown in Fig. 7.
The Fokker-Planck equation associated with this SDE is
where is a probability density with due to the no-flux boundary conditions we impose, and is the Perron-Frobenius operator. The adjoint of the Perron-Frobenius operator determines the Kolmogorov backward equation, and thus defines the stochastic Koopman operator, . For this example,
with Neumann boundary conditions, . To directly approximate the Koopman eigenfunctions, (37) is discretized in space using a second order finite difference scheme with 1024 interior points. The eigenvalues and eigenvectors of the resulting finite dimensional approximation of the Koopman operator will be used to validate the EDMD computations.
The data are initial points on drawn from a uniform distribution, which constitute , and their positions after , which constitute . The evolution of each initial condition was accomplished through steps of the Euler–Maruyama method [48, 49] with a timestep of using the double well potential in Fig. 7. The dictionary chosen is a discontinuous spectral element basis that splits into four equally sized subdomains with up to tenth order Legendre polynomials on each subdomain (see Sec. 3) for a total of forty degrees of freedom.
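A hedged sketch of this data-generation step follows. The double-well potential U(x) = x⁴/4 − x²/2, the noise level, and the step counts are assumptions (the text defines the potential only through its Fig. 7), and the reflection step is one simple way to impose the no-flux boundaries:

```python
import numpy as np

# Illustrative double-well potential U(x) = x^4/4 - x^2/2; drift is -U'(x).
dU = lambda x: x**3 - x

def euler_maruyama(x0, sigma, dt, n_steps, rng, lo=-1.5, hi=1.5):
    """Integrate dx = -U'(x) dt + sigma dW with reflection at the walls."""
    x = np.array(x0, float)
    for _ in range(n_steps):
        x = x - dU(x) * dt + sigma * np.sqrt(dt) * rng.normal(size=x.shape)
        # reflecting (no-flux) boundaries
        x = np.where(x > hi, 2 * hi - x, x)
        x = np.where(x < lo, 2 * lo - x, x)
    return x

rng = np.random.default_rng(0)
X = rng.uniform(-1.5, 1.5, size=10000)   # uniformly distributed initial snapshots
Y = euler_maruyama(X, sigma=0.5, dt=1e-2, n_steps=10, rng=rng)
```

The columns X and Y are exactly the snapshot pairs EDMD consumes; each y_m is one realization of the stochastic map applied to x_m.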
5.2.2 Recovering the Koopman Eigenfunctions and Eigenvalues
Because the Koopman operator is infinite dimensional, we will clearly be unable to approximate all of the tuples. Instead, we focus on the leading (i.e., most slowly decaying) tuples, which govern the long-term dynamics of the underlying system. In this example, we seek to demonstrate that our approximation is: (a) quantitatively accurate, and (b) valid over a range of coefficients, , and not solely in the small (or large) noise limits.
Figure 8 shows the first six eigenvalues obtained using a finite difference discretization of the Koopman operator and the EDMD approximation as a function of . In this problem, , is always an eigenfunction of the Koopman operator with for all values of . Because contains the constant function, it should be no surprise the EDMD method is able to identify it as an eigenfunction. Though trivial, the existence of this eigenfunction is a “sanity check” for the method.
More interesting is the first nontrivial eigenfunction which has the eigenvalue . The change in as a function of is shown in Fig. 9. As with the eigenvalue, there is good agreement between EDMD and the directly computed eigenfunctions at different values of . For all values of , is an odd function; what changes is how rapidly transitions from its maximum to its minimum. When is small, this transition is rapid, and will approach a step function as . When grows, this eigenfunction is “smoothed out” and the transition becomes slower. In the limit as , the dynamics of the problem are dominated by the diffusion term, and will be proportional to as is implied by the rightmost plot in the figure.
In many system identification algorithms (e.g., Ref. ), one often constructs deterministic governing equations from inherently stochastic data (either due to measurement or process noise). Similarly, methods like DMD have been applied to noisy sets of data to produce approximations of the Koopman modes and eigenvalues under the assumption that the underlying system is deterministic. In this example, this is equivalent to using the output of EDMD with data taken with as an approximation of the Koopman tuples that would be obtained with .
For certain tuples, this is a reasonable approach. Taking , and are good approximations of their deterministic counterparts. In particular, and are one-to-one with their associated basin of attraction and appear to possess a zero at the stable fixed point. However, these approximate eigenfunctions lack some important features, such as a singularity at that occurs due to the unstable fixed point there. Therefore, both eigenfunctions are good approximations of their counterparts, but cannot be “trusted” in the vicinity of an unstable fixed point.
For other tuples, even a small amount of noise can be important. Consider the “slowest” non-zero eigenvalue, , which appears to approach as , but is not obtained by the EDMD method when . Formally, the existence of an eigenvalue of is not surprising. The fixed point at is unstable with , and in continuous time, if (, ) is an eigenvalue/eigenfunction pair then (, ) is, at least formally, an eigenvalue/eigenfunction pair for any scalar . Using an argument similar to Ref. , it can be shown that as where is chosen to normalize . However, this approaches a delta function as , and therefore leaves the subspace of observables spanned by our dictionary. When this occurs, this tuple appears to “vanish,” which is why it does not appear in the limit. As a result, when applying methods like EDMD or DMD to noisy data, the spectrum of the finite dimensional approximation is not necessarily a good approximation of the spectrum that would be obtained with noise-free data. Some of the tuples, such as those containing , , and , have eigenvalues that closely approximate the ones found in the deterministic problem. However, others, such as the tuple containing , do not. Furthermore, the only way to determine that can be neglected is to examine the eigenfunction directly. As a result, when we apply methods like DMD/EDMD to noisy data with the purpose of using the spectrum to determine the time scales and behaviors of the underlying system, we must keep in mind that not all of the eigenvalues obtained with noisy data will be present if “clean” data are used instead.
5.2.3 Rate of Convergence
Among other things, the performance of the EDMD method is dependent upon the number of snapshots provided to it, the distribution of the data, the underlying dynamical system, and the dictionary. In this section, we examine the convergence of EDMD to a Galerkin method as the number of snapshots increases in order to provide some intuition about the “usefulness” of the eigenfunctions obtained without an exhaustive amount of data. To do so, we generated a larger set of data consisting of initial conditions chosen from a spatially uniform distribution for the case with . Each initial condition was propagated using the Euler-Maruyama method described in the previous section. Then we applied EDMD using the same dictionary to subsets of the data, computed the leading nontrivial eigenvalue and eigenfunction, and compared the results to the “true” leading eigenfunction and eigenvalue computed using a finite difference approximation of the stochastic Koopman operator.
Figure 10 shows the convergence of the leading nontrivial eigenvalue and eigenfunction as a function of the number of snapshots, . In the rightmost plot, we define the error as after both eigenfunctions have been normalized so that . As expected, EDMD is inaccurate when is small (here, ); there is not enough data to accurately approximate the scalar products. For , the eigenfunction produced by EDMD has the right shape, and the eigenvalue is approaching its true value. For , there is no “visible” difference in the leading eigenvalue, and the error in the leading eigenfunction is less than .
To quantify the rate of convergence, we fit a line to the plot of error versus in the right panel of Fig. 10. As expected, EDMD converges like , which is very close to the predicted value of associated with Monte Carlo integration. Because this problem is stochastic, we cannot increase the rate of convergence by uniform sampling (the integral over the probability space associated with the stochastic dynamics will still converge like ), even though that is a simple method for enhancing the rate of convergence for deterministic problems.
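The quoted Monte Carlo rate is easy to reproduce in isolation. The toy experiment below (our own integrand and sample sizes) estimates the RMS error of a sample mean over repeated draws and fits the log-log slope, which should come out near −1/2:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.cos(x)          # test observable
true_mean = np.sin(1.0)          # E[cos(X)] for X ~ Uniform(-1, 1)

Ms = np.array([100, 1000, 10000, 100000])
errs = []
for M in Ms:
    # RMS error of the Monte Carlo mean over independent repeats
    reps = [abs(f(rng.uniform(-1, 1, M)).mean() - true_mean)
            for _ in range(100)]
    errs.append(np.sqrt(np.mean(np.square(reps))))

# Log-log slope of error vs. M; Monte Carlo predicts about -1/2.
slope = np.polyfit(np.log(Ms), np.log(errs), 1)[0]
```

The same M^{-1/2} scaling governs the integral over the noise realizations in EDMD, which is why uniform spatial sampling cannot improve the overall rate in the stochastic setting.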
To provide some intuition about what the eigenfunctions look like for a fixed value of , Fig. 11 plots the leading nontrivial eigenfunction for , 14384, and 1128837. EDMD is qualitatively accurate even at the smallest values of , but there are clearly some numerical issues at the edges of the domain and near where the discontinuities in the numerically computed eigenfunctions can occur with our choice of dictionary. To obtain a more quantitatively accurate solution, additional data points are required. When , the numerical issues at the boundaries and the discontinuity at have diminished. As shown in the plot with , this process continues until the EDMD eigenfunction is visually identical to the true eigenfunction.
5.3 Parameterizing Nonlinear Manifolds and Reducing Stochastic Dynamics
In this section, we will briefly demonstrate how the EDMD method can be used to parameterize nonlinear manifolds and reduce stochastic differential equations defined on those manifolds. Everything done here could also be done for a deterministic system; we chose to use an SDE rather than an ODE only to highlight the similarities between EDMD and methods like diffusion maps, and not because of any restriction on the Koopman approach. We proceed in two steps: first, we will show that data from an SDE defined on the Swiss Roll, which is a nonlinear manifold often used as a test of nonlinear manifold learning techniques [47, 46, 45, 52], in conjunction with the EDMD procedure can generate a data driven parameterization of that manifold. For this first example, isotropic diffusion is used, so there is no “fast” or “slow” solution component that can be meaningfully neglected. Instead, we will show that the leading eigenfunctions are one-to-one with the “length” and “width” of the Swiss Roll. Then we alter the SDE and introduce a “fast” component by making the diffusion term anisotropic. In this case, EDMD will “pick” the slower components before the faster ones. Both of these tasks can be accomplished using other methods such as Diffusion Maps (DMAPs) and its variants [46, 45, 52, 53], and it is only recently that the Koopman operator has also been applied to such problems .
5.3.1 Parameterizing a Nonlinear Manifold with a Diffusion Process
For this example, the data are generated by a diffusion process on a rectangular domain,
where is the state and is a two-dimensional Wiener process with and . No flux boundary conditions are imposed at the domain edges. If one had access to the true variables, the SKO could be written as
also with no-flux boundary conditions; in this particular problem, the SKO is self-adjoint and therefore equivalent to the Perron-Frobenius operator. The eigenfunctions should be with the eigenvalues . Note that the leading eigenfunctions, and are and , which are one-to-one with and on and respectively, and could be used to parameterize state space if and were not known.
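The elided expressions are standard results for reflecting Brownian motion on a rectangle [0, ℓ₁] × [0, ℓ₂] with generator (σ²/2)Δ and Neumann boundary conditions (our notation, an assumption consistent with the surrounding text):

```latex
\varphi_{jk}(y_1, y_2) \;=\; \cos\!\left(\frac{j\pi y_1}{\ell_1}\right)\cos\!\left(\frac{k\pi y_2}{\ell_2}\right),
\qquad
\lambda_{jk} \;=\; -\frac{\sigma^{2}\pi^{2}}{2}\left(\frac{j^{2}}{\ell_1^{2}} + \frac{k^{2}}{\ell_2^{2}}\right),
\qquad j,k = 0,1,2,\ldots
```

The two leading nontrivial eigenfunctions, cos(πy₁/ℓ₁) and cos(πy₂/ℓ₂), are monotone in y₁ and y₂ respectively, which is what makes them usable as coordinates on the manifold.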
In this example, these true data (i.e., the state expressed in terms of and ) on the rectangle are mapped onto a “Swiss Roll” via the transformation
which, among other things, has introduced a new spatial dimension. In all that follows, the EDMD approach is applied to the 3-dimensional, transformed variables and not the 2-dimensional, true variables. Our objective here is to determine a 2-parameter description of what initially appears to be 3-dimensional data, directly from the data.
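Since the transformation in (40) did not survive extraction, the sketch below uses the standard Swiss Roll embedding common in the manifold-learning literature; it is an assumption, not necessarily the paper's map:

```python
import numpy as np

def swiss_roll(s, t):
    """Map rectangle coordinates (s, t) to 3-D 'Swiss Roll' points.
    This is the standard manifold-learning test embedding, not
    necessarily the paper's Eq. (40)."""
    return np.column_stack([s * np.cos(s), t, s * np.sin(s)])

rng = np.random.default_rng(0)
s = rng.uniform(np.pi, 3 * np.pi, 2000)   # 'length' (arc) coordinate
t = rng.uniform(0.0, 2.0, 2000)           # 'width' coordinate
X3 = swiss_roll(s, t)                     # data handed to EDMD, shape (2000, 3)
```

EDMD sees only the three ambient coordinates; recovering a two-parameter description of these points is exactly the objective stated above.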
The data given to EDMD were generated by initial conditions uniformly distributed in and that were evolved for a total time of using the Euler-Maruyama method with 100 timesteps. Then both the initial and terminal states of the system were mapped into 3 dimensions using (40). Next, a dictionary must be defined. However, is unknown (indeed, parameterizing