Delay-coordinate maps and the spectra of Koopman operators

Suddhasattwa Das Courant Institute of Mathematical Sciences, New York University, New York, New York, USA dass@cims.nyu.edu    Dimitrios Giannakis Center for Atmosphere Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, New York, USA dimitris@cims.nyu.edu
Abstract

The Koopman operator induced by a dynamical system is inherently linear and provides an alternate method of studying many properties of the system, including attractor reconstruction and forecasting. Koopman eigenfunctions represent the non-mixing component of the dynamics. They factor the dynamics, which can be chaotic, into quasiperiodic rotation on tori. Here, we describe a method in which these eigenfunctions can be obtained from a kernel integral operator, which also annihilates the continuous spectrum. We show that incorporating a large number of delay coordinates in constructing the kernel of that operator results, in the limit of infinitely many delays, in the creation of a map into the discrete spectrum subspace of the Koopman operator. This enables efficient approximation of Koopman eigenfunctions from high-dimensional data in systems with point or mixed spectra.

1 Introduction

The tasks of dimension reduction and forecasting of time series are very common in engineering and the physical sciences, where the time series studied are often partial observations of a high-dimensional dynamical system. A classical example of such time series is data collected from the Earth's climate system, where many of the active degrees of freedom are difficult to access via direct observations (e.g., subsurface ocean circulation). Moreover, the observations that are available typically mix together different physical processes operating on a wide range of spatial and temporal scales; e.g., the seasonal cycle and the El Niño Southern Oscillation (the latter evolving on inter-annual timescales) both have strong associated signals in sea surface temperature [1]. A direct approach to this problem is through large-scale numerical simulation of the underlying dynamics; here, the main challenge is the large number of degrees of freedom, necessitating the use of parameterization schemes to represent unresolved dynamics. An alternative is provided by parametric and nonparametric empirical low-order methods. Parametric techniques [2, 3] are based on an explicit model for the dynamics, and the problem is reduced to tuning the parameters of the model so as to fit the data. These methods are particularly useful when there is some prior knowledge of the equations of motion. Our focus in this paper will be on nonparametric techniques [4, 5, 6, 7, 8, 9, 10, 11], where the approach is to utilize only the observed data, without assuming any explicit parametric form for the underlying dynamics.

Another major classification of data-driven techniques for dynamical systems is as state-space based or operator-theoretic approaches [12]. State-space based methods generally exploit the geometrical structure of the dynamical system, for example, by identifying "analog" states in past observations of the system [4, 10], by using a collection of local models on the attractor [5, 6, 7, 8], by constructing global nonlinear models for the dynamical evolution map [13], or by nonlinearly projecting the attractor to lower-dimensional Euclidean spaces and then studying the resulting reduced-order system [14]. In contrast, operator-theoretic techniques [15, 16] only deal with the space of observables of the system and attempt to determine the action induced by the dynamics on this space. The advantage of this approach is that irrespective of the underlying system, the operator is always linear. However, it is also infinite-dimensional, so the issue of finite-dimensional approximation of (potentially unbounded) operators becomes relevant.

The operator-theoretic approach relies on the use of either the Koopman [17, 16, 18, 19, 12, 20, 21, 22, 23, 24, 11] or the Perron-Frobenius (transfer) operators [15, 25, 26, 27, 28]. The idea common to these techniques is to approximate the spectrum of the operator in question by means of either a finite partition or a finite orbit, representing the operator as a matrix based on its action on functions supported on these finite sets. This approach can be used in conjunction with regularization techniques to perform spectral approximation of the operators. A common underlying assumption for these methods is ergodicity, which is explained below.

Ergodicity. When the underlying dynamics generating the time series is ergodic, it satisfies the ergodic hypothesis, namely that long-time averages are equivalent to expectation values with respect to the invariant, ergodic measure $\mu$ of the dynamics [29]. This property justifies the working principle that the global properties (with respect to $\mu$) of an observable $f$ can be obtained from a time series for $f$, namely, $\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(x_n) = \int_X f \, d\mu$, where $x_0, x_1, x_2, \ldots$ is a (possibly unobserved) trajectory of the dynamical system. For our purposes, ergodicity implies that inner products between observables can be approximated by computing time-correlations. Also, our methods rely on integral operators, and these can be approximated as matrices under the ergodic hypothesis.

We now make our assumptions more precise. Let $M$ be an $m$-dimensional manifold equipped with its Borel $\sigma$-algebra. $\Phi^t : M \to M$, $t \in \mathbb{R}$, is a flow on $M$ with a compact, invariant, attracting set $X$, an invariant ergodic measure $\mu$ with support equal to $X$, and a compact, forward-invariant neighborhood $\mathcal{U} \supseteq X$. $F : M \to \mathbb{R}^d$ will be a continuous measurement function through which we collect a time-ordered data set consisting of $N$ samples $y_n = F(x_n)$, each lying in the $d$-dimensional data space. Here, $x_n = \Phi^{n\,\Delta t}(x_0)$ with $0 \le n \le N - 1$, and $\Delta t > 0$ is a fixed sampling interval.
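To fix ideas, the following minimal sketch sets up such a sampled data set for a hypothetical ergodic flow on the 2-torus (a linear flow with rationally independent frequencies); the system, the observation map, and all variable names are illustrative assumptions rather than constructions from this paper, and several later sketches reuse the arrays defined here.

```python
# Minimal sketch of the sampling setup, assuming a hypothetical linear flow
# on the 2-torus with rationally independent frequencies alpha.
import numpy as np

alpha = np.array([1.0, np.sqrt(2.0)])  # frequency vector of the flow
dt = 0.05                              # sampling interval (Delta t)
N = 10_000                             # number of samples

def Phi(t, x):
    """Flow map Phi^t of the linear torus flow d(theta)/dt = alpha."""
    return np.mod(x + t * alpha, 2.0 * np.pi)

def F(x):
    """Observation map F : M -> R^d (here d = 3)."""
    th1, th2 = x
    return np.array([np.cos(th1), np.sin(th1), np.cos(th1 + th2)])

x0 = np.zeros(2)                                    # initial state
xs = np.stack([Phi(n * dt, x0) for n in range(N)])  # states x_n
ys = np.stack([F(x) for x in xs])                   # data y_n = F(x_n)
```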

The Koopman operator. Central to all our following discussions will be the concept of the Koopman operator. Koopman operators [30] act on observables by composition with the flow map, i.e., by time shifts. The space $H = L^2(X, \mu)$ of square-integrable, complex-valued functions on $X$ will be our space of observables. Given an observable $f \in H$ and time $t \in \mathbb{R}$, $U^t$ is the operator defined as

$U^t f = f \circ \Phi^t.$

$U^t$ is called the Koopman operator associated with the flow, at time $t$. For measure-preserving systems, $U^t$ is unitary, and has a well-defined spectral expansion consisting in general of both discrete (pure point) and continuous parts lying in the unit circle [16]. The problems of mode decomposition and non-parametric prediction can both be stated in terms of the Koopman operator [11].
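On a sampled orbit, the defining relation $U^t f = f \circ \Phi^t$ means that the Koopman operator simply shifts time series. A small sketch, continuing the torus example above (the observable chosen is arbitrary):

```python
# The Koopman operator acts on trajectory samples by time shift:
# (U^{q dt} f)(x_n) = f(x_{n+q}).
def koopman_shift(f_vals, q):
    """Values of U^{q dt} f along the orbit, given f at x_0, ..., x_{N-1}."""
    return f_vals[q:]  # entries f(x_{n+q}), n = 0, ..., N-1-q

f_vals = np.cos(xs[:, 0])           # an observable f sampled along the orbit
Uf_vals = koopman_shift(f_vals, 3)  # samples of U^{3 dt} f
```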

Physical measures. An important property that we need the invariant measure $\mu$ to have is that it is a physical measure [31], i.e., that there is a set $B \subseteq M$ with positive Lebesgue volume measure, such that for every $x \in B$ and every continuous function $f : M \to \mathbb{R}$,

$\lim_{T\to\infty} \frac{1}{T} \int_0^T f(\Phi^t(x)) \, dt = \int_X f \, d\mu. \qquad (1)$

The set $B$ is called the basin of the measure $\mu$. We will assume that

$\mathcal{U} \subseteq B. \qquad (2)$

The assumptions in (1) and (2) are not required for theorems 1, 2, and 3 to hold, but will be required later for the convergence of our data-driven methods. In fact, these assumptions guarantee that our data-driven methods converge even when the attractor $X$ is a set of zero Lebesgue measure and the orbit lies in the neighborhood $\mathcal{U}$, but not in $X$. Of course, our methods also converge if $X = M$ (i.e., $\Phi^t$ is an ergodic flow on a compact manifold), in which case $B = M$, and the orbit lies on the attractor. We will now describe an important tool for studying Koopman operators, namely their eigenfunctions.

Koopman eigenfunctions. Every eigenfunction $z \in H$ of $U^t$ satisfies the following equation for some $\omega \in \mathbb{R}$:

$U^t z = e^{i\omega t} z. \qquad (3)$

Koopman eigenfunctions are particularly useful for prediction and dimension reduction in dynamical systems. This is because, as seen in the equation above, knowledge of an eigenfunction $z$ at time 0 enables accurate predictions of $z$ up to any time $t$, since $U^t$ operates on $z$ as a multiplication operator, and the multiplication constant $e^{i\omega t}$ remains equal to 1 in modulus. Moreover, it is possible to construct a dimension reduction map, sending the high-dimensional data $F(x)$ to the vector $(z_1(x), \ldots, z_k(x)) \in \mathbb{C}^k$, where the $z_j$ are Koopman eigenfunctions corresponding to rationally independent frequencies $\omega_j$ [16, 22, 11]. In this representation, the $z_j$ can be thought of as "coordinates" corresponding to distinct physical processes operating at the timescales $2\pi/\omega_j$. Also of interest (and in some cases easier to compute) are the projections of the observation map onto the Koopman eigenfunctions, called Koopman modes [16]. Data-driven techniques for computing Koopman eigenvalues, eigenfunctions, and modes that have been explored in the past include methods based on generalized Laplace analysis [17, 16], dynamic mode decomposition (DMD) [32, 18, 19, 20], extended DMD (EDMD) [21, 33], Hankel matrix analysis [20, 24, 23], and data-driven Galerkin methods [22, 11]. The latter approach, as well as the related work in [9], additionally addresses the problem of non-parametric prediction of observables and probability densities.
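As a toy illustration of eigenfunction-based prediction, continuing the torus sketch: there the eigenfunction $z = e^{i\theta_1}$ and its frequency are known in closed form, whereas in practice they would come from the data-driven methods developed below.

```python
# Prediction with a Koopman eigenfunction, cf. (3): once z(x_0) and omega are
# known, z at any lead time is an exact phase rotation of its initial value.
omega = alpha[0]         # eigenfrequency of z = exp(i * theta_1)
z0 = np.exp(1j * x0[0])  # z(x_0)

def predict_z(t):
    return np.exp(1j * omega * t) * z0  # value of (U^t z)(x_0)

t_lead = 7.3
print(predict_z(t_lead), np.exp(1j * Phi(t_lead, x0)[0]))  # the two agree
```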

Let $D$ be the closed subspace of $H$ spanned by the eigenfunctions of $U^t$, and $D^\perp$ its orthogonal complement. Systems in which $D$ contains non-constant functions and $D^\perp$ is nonzero are called mixed-spectrum systems. The space $H$ of a general system admits the $U^t$-invariant decomposition

$H = D \oplus D^\perp. \qquad (4)$

Kernel integral operators. The method that we will describe in this paper relies heavily on kernel integral operators. In this context, a kernel is a function $k : M \times M \to \mathbb{R}$, measuring the similarity between pairs of points on $M$. Kernel functions can be of various designs, and are meant to capture the nonlinear geometric structures of data; see for example [34, 35, 36]. One advantage of using kernels is that they can be defined so as to operate directly on the data space, e.g., $k(x_1, x_2) = \kappa(F(x_1), F(x_2))$ for some function $\kappa : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ of appropriate smoothness. Defined in this manner, $k$ can be evaluated using measured quantities without explicit knowledge of the underlying state $x \in M$. An integral operator $K$ with kernel $k$ acts on a function $f \in H$ as

$K f(x) = \int_X k(x, y) \, f(y) \, d\mu(y). \qquad (5)$

Strategy. We will address the eigenvalue problem for $U^t$ by solving it for a kernel integral operator $P$ that, in a suitable limit, commutes with $U^t$. Note, in particular, that commuting operators leave each other's eigenspaces invariant. In what follows, we will construct $P$ by defining $K_Q$ from (5) using a kernel function operating on delay-coordinate mapped data [37] with $Q$ delays, and performing a Markov normalization of that operator via the procedure introduced in the diffusion maps algorithm [36]. In our following main result, we claim that in the limit of infinitely many delays, $Q \to \infty$, the eigenspaces of $P$ corresponding to nonzero eigenvalues are also invariant subspaces of $U^t$, spanned by Koopman eigenfunctions.

Theorem 1.

Let $\Phi^t : X \to X$ be a flow on a compact invariant set $X$ with an invariant, ergodic measure $\mu$. Let $F : X \to \mathbb{R}^d$ be a square-integrable observation map. Then, there exists a one-parameter family of real, ergodic, compact Markov operators $P_\epsilon : H \to H$, $\epsilon > 0$, with real eigenvalues, which commutes with $U^t$ and which is a limit of operators $P_{Q,\epsilon}$ (also compact, Markov, and with real eigenvalues) in the operator-norm topology as $Q \to \infty$. The operator $P_{Q,\epsilon}$ is obtained from a sequence of observations $F(x_0), F(x_1), F(x_2), \ldots$, for $\mu$-a.e. $x_0 \in X$, after application of delay-coordinate maps with $Q$ delays.

Theorem 2 below is a continuation of theorem 1, and can be used to deduce some useful properties of the operator $P_\epsilon$.

Theorem 2.

Let $\Phi^t$ be a flow on a compact invariant metric space $X$ with an invariant, ergodic measure $\mu$. Let $G$ be a compact, real, integral operator on $H$ commuting with $U^t$ (e.g., $G = P_\epsilon$ from theorem 1). Then, $\operatorname{ran} G \subseteq D$. Moreover, if $\operatorname{ran} G$ contains non-constant functions, there exists a map $\pi : X \to \mathbb{T}^m$ for some $m \in \mathbb{N}$, whose components consist of joint eigenfunctions of $V$ and $G$, such that:

  1. $\pi$ factors $\Phi^t$ into a rotation on the torus by a frequency vector $\alpha \in \mathbb{R}^m$, i.e., $\pi \circ \Phi^t = R_{t\alpha} \circ \pi$, where $R_{t\alpha}(\theta) = \theta + t\alpha \bmod 2\pi$.

  2. If $G$ is a kernel integral operator with a symmetric kernel $g$, there exists a symmetric kernel $\hat g$ on $\mathbb{T}^m \times \mathbb{T}^m$ such that $g = \hat g \circ (\pi \times \pi)$.

Theorem 3 below shows that under further assumptions, the kernels $p_Q$ and $p$ are continuous and the integral operators $P_{Q,\epsilon}$ converge to $P_\epsilon$ in the operator-norm topology.

Theorem 3.

Let the assumptions of theorem 1 hold; and in addition let:

  1. All Koopman eigenfunctions be continuous;

  2. The observation map $F$ be continuous;

  3. The discrete component $F_D$ of $F$ from (4) be expressible as a finite sum of Koopman eigenfunctions, $F_D = \sum_{j=1}^{L} c_j z_j$, for some $L \in \mathbb{N}$ and coefficients $c_j \in \mathbb{C}^d$.

Then, the operator $P_\epsilon$ from theorem 1 has a continuous kernel, and therefore maps $C(X)$ into itself. Moreover, the sequence $P_{Q,\epsilon}$ converges to $P_\epsilon$ in the operator-norm topology.

Corollary 4 (spectral convergence).

Under the assumptions of theorem 1, the following hold.

  1. For every nonzero eigenvalue $\lambda$ of $P_\epsilon$ with multiplicity $m_\lambda$, and every neighborhood $S \subset \mathbb{C}$ of $\lambda$ such that $S$ contains no other eigenvalues of $P_\epsilon$, there exists $Q_0 \in \mathbb{N}$ such that for all $Q \ge Q_0$, the spectrum of $P_{Q,\epsilon}$ in $S$ contains $m_\lambda$ elements (counted with multiplicity) converging as $Q \to \infty$ to $\lambda$.

  2. Let $\Pi_\lambda$ and $\Pi_{\lambda,Q}$ be orthogonal projectors to the eigenspace of $P_\epsilon$ associated with $\lambda$ and the eigenspaces of $P_{Q,\epsilon}$ associated with its eigenvalues in $S$, respectively. Then, $\Pi_{\lambda,Q}$ converges strongly to $\Pi_\lambda$.

Theorems 1, 2, 3 and corollary 4 are proved in section 4.

Remark.

One advantage of approaching the Koopman eigenvalue problem through the operator $P_\epsilon$, $\epsilon > 0$, is that this operator is a compact, Markov operator and hence has bounded eigenvalues with finite multiplicities converging to 0. Moreover, as will be shown later, under assumptions (1) and (2), $P_\epsilon$ can be approximated to any degree of accuracy by a data-driven operator, defined on a finite-dimensional Hilbert space associated with the sampling measure $\mu_N = \frac{1}{N}\sum_{n=0}^{N-1} \delta_{x_n}$ obtained from an observed time series $y_0, \ldots, y_{N-1}$. A result analogous to this theorem but restricted to smooth manifolds, smooth observation maps, and Koopman operators with pure point spectrum and smooth eigenfunctions was presented in [11]. Theorem 1 generalizes this result to non-smooth state spaces and Koopman operators with mixed spectrum.

Once the eigenfunctions and eigenvalues of $P_\epsilon$ are obtained, the eigenvalues of $U^t$ or its generator $V$ (defined in section 2 ahead) can be computed using stable Galerkin techniques [22, 11]. One such technique will be discussed later and is based on perturbation theory for self-adjoint operators (e.g., [38]). Figure 1 shows two examples with mixed spectra, described in (31) and (32), respectively. In both examples, we start with a vector field on a manifold $M$. In the first example, $M$ is a torus, so $X = M$; in the second example, $M = S^1 \times \mathbb{R}^3$ and $X = S^1 \times L$, where $L$ is the Lorenz 63 attractor embedded in $\mathbb{R}^3$.
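To indicate the kind of computation meant here, the sketch below assembles a matrix representation of the generator in a basis of eigenfunctions sampled along an orbit, with a first-order finite difference standing in for $V$; this is a simplified stand-in for the regularized Galerkin scheme of section 5, and `phi` is an assumed $N \times J$ array of basis function values.

```python
# Sketch: generator matrix V_ij = <phi_i, V phi_j> estimated along an orbit.
# A forward difference approximates V = d/dt; the paper's scheme additionally
# adds a small diffusion term for regularization (omitted here).
def generator_matrix(phi, dt):
    dphi = (phi[1:] - phi[:-1]) / dt                  # approx. V phi_j
    return phi[:-1].conj().T @ dphi / (len(phi) - 1)  # time-averaged products

# For eigenfunctions ordered as (constant, conjugate pairs), the result should
# be nearly antisymmetric with a zero first row and column, as in figure 1.
```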

Remark 5.

The class of integral operators studied in this work has been used in [39, 40] and independently in [41] for dimension reduction and mode decomposition of high-dimensional time series. In these works and subsequent applications to a variety of problems (e.g., [42]), a phenomenon called in [41] "timescale separation" was observed; namely, it was observed that at increasingly large numbers of delays $Q$, the eigenfunctions of $P_{Q,\epsilon}$ capture increasingly distinct timescales of a multi-scale input signal. Theorems 1 and 2 provide an interpretation of this observation from the point of view of spectral properties of Koopman operators; in particular, from the fact that $P_{Q,\epsilon}$ has, in the limit $Q \to \infty$, common eigenfunctions with $U^t$, and the latter capture distinct timescales associated with the eigenfrequencies $\omega_j$. It should also be noted that even though in this work we focus on the class of operators $P_{Q,\epsilon}$, analogous results should also hold for other classes of compact operators for data analysis that employ delays, in particular the covariance operators used in singular spectrum analysis [43, 44, 45] and the related Hankel matrix analysis [20, 23, 24].

Remark 6.

One class of dynamical systems that this work does not address is those with purely continuous spectrum (i.e., $D = \operatorname{span}\{1_X\}$, where $1_X$ is the function equal to 1 everywhere on $X$). In fact, theorem 2 shows that for such systems, in the limit of infinitely many delays, every non-constant observable will lie in the nullspace of the integral operator. See section 7 for further discussion. Moreover, for certain choices of the observation map $F$, the nullspace of $P_\epsilon$ can become the whole space other than the constant functions, even though $D$ contains non-constant functions. Corollary 30 in section 7 establishes necessary and sufficient "observability" conditions under which, in certain product dynamical systems, $P_\epsilon$ is not a trivial operator.

Figure 1: Eigenfunctions of $P_{Q,\epsilon}$ and the associated matrix representation of the generator $V$ for the torus-based flow in (31) (top panels) and the L63-based flow in (32) (bottom panels). The eigenfunctions $\phi_j$ of $P_{Q,\epsilon}$ from (12) have been computed from single orbits of these two systems using a large number of delays $Q$, and plotted as time series along these orbits in the middle and right-hand panels. These time series are periodic with frequencies equal to integer multiples of the basic rotation frequency $\alpha$, each frequency having multiplicity 2. This behavior is in agreement with the result from theorem 1 that $P_{Q,\epsilon}$ and $U^t$ commute in the limit $Q \to \infty$. The left-hand panels show the elements $V_{ij} = \langle \phi_i, V \phi_j \rangle$ of the matrix representation of the generator in the $\{\phi_j\}$ basis. Note that $V_{ij} = -V_{ji}$ since $V$ is a skew-symmetric operator. Moreover, because the first eigenfunction of $P_{Q,\epsilon}$ is the constant function $\phi_0 = 1_X$ and $V 1_X = 0$, the first column and row only have zero entries. The block-diagonal form of the matrices indicates that each of the eigenfunction pairs $\{\phi_{2j-1}, \phi_{2j}\}$ spans an eigenspace of $V$.
Remark 7.

Our data-driven implementation uses continuous kernels and hence can only approximate continuous Koopman eigenfunctions. In [46], Anosov and Katok construct discrete-time ergodic maps which are measure-theoretically isomorphic to systems with only discrete spectrum, but which have no continuous Koopman eigenfunctions. However, to our knowledge, no continuous-time ergodic flow with discontinuous Koopman eigenfunctions has been established. This paper does not address the problem of finding Koopman eigenvalues and eigenfunctions for this unknown class of systems.

Outline of the paper. In section 2, we will review some important concepts from the spectral theory of dynamical systems. In section 3, we will construct an integral operator which is the key tool of our methods and is also the operator described in theorems 1 and 2. Next, we prove these theorems in section 4. In section 5, we present a Galerkin method for the eigenvalue problem for the Koopman generator, with a small amount of diffusion added for regularization, formulated in the eigenbasis of $P_\epsilon$. In section 6, we describe the data-driven realization of $P_\epsilon$, and discuss its spectral convergence properties along with the convergence properties of the associated data-driven Galerkin scheme for the generator. In section 7, the methods will be applied to two mixed-spectrum flows, followed by a discussion of the results.

2 Overview of spectral methods for dynamical systems

In this section, we review some concepts from the spectral theory of dynamical systems and establish some facts about Koopman eigenfunctions. Henceforth, we use the notations $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ to represent the inner product and norm of $H = L^2(X, \mu)$, respectively.

Generator of a flow. The family of operators $\{U^t\}_{t\in\mathbb{R}}$ is a strongly continuous, 1-parameter group of unitary transformations of the Hilbert space $H$. By Stone's theorem [47], any such family has a generator $V$, which is a skew-adjoint operator with a dense domain $\operatorname{dom}(V) \subset H$, defined as

$V f = \lim_{t\to 0} \frac{1}{t}\left( U^t f - f \right). \qquad (6)$

Thus, in light of (3) and (6), we can interpret the quantity $\omega$ in an eigenvalue $i\omega$ of $V$ as a frequency intrinsic to the dynamical system. The operators $U^t$ and $V$ share the same eigenfunctions; in particular, an eigenfunction $z$ with $V z = i\omega z$ satisfies $U^t z = e^{i\omega t} z$.

Vector fields as generators. If we start with a vector field on an $m$-dimensional compact manifold $M$, then this vector field induces a flow $\Phi^t$ defined for all $t \in \mathbb{R}$. Suppose that there is an invariant set $X \subseteq M$ with an ergodic invariant measure $\mu$. This set is not necessarily a submanifold, and may not even have any differentiability properties. However, $(\Phi^t, X, \mu)$ is still an ergodic measure-preserving system, and the generator $V$ for the Koopman operators coincides with the directional derivative along the vector field, restricted to $X$. For example, in quasiperiodic systems, $X = M = \mathbb{T}^m$, the $m$-dimensional torus, and $\mu$ is equivalent to the Lebesgue volume measure. On the other hand, for the Lorenz attractor (see (30)), $M = \mathbb{R}^3$, $X$ is a compact subset with non-integer fractal dimension [48], and $\mu$ is supported on $X$.
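For concreteness, the following sketch integrates the Lorenz 63 vector field at the standard parameter values (the precise form of (30) is assumed here); after a spin-up interval, the orbit settles near the fractal attractor $X \subset \mathbb{R}^3$ carrying the physical measure $\mu$.

```python
# Sketch: the Lorenz 63 vector field; its flow has the fractal attractor X
# discussed above (standard parameters sigma, rho, beta are assumed).
import numpy as np
from scipy.integrate import solve_ivp

def lorenz63(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = u
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

# Discard a spin-up interval so the sampled orbit lies near the attractor.
sol = solve_ivp(lorenz63, (0.0, 110.0), [1.0, 1.0, 1.05], rtol=1e-9,
                t_eval=np.arange(10.0, 110.0, 0.01))
traj = sol.y.T  # samples of the orbit in R^3
```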

Eigenfunctions as factor maps. An important piece of information about the dynamics carried by a Koopman eigenfunction $z$ lies in its fibers, i.e., the subsets $z^{-1}(\{c\})$ with $c \in \mathbb{C}$. We state the following observations.

  1. Eigenfunctions of the Koopman generator at nonzero corresponding eigenvalues have zero mean with respect to the invariant measure $\mu$. This can be concisely expressed as $\langle 1_X, z \rangle = 0$.

  2. The flow $\Phi^t$ is semi-conjugate to the rotation by $e^{i\omega t}$ on the unit circle, with $z$ acting as a semi-conjugacy map. This follows directly from (3).

  3. Normalized eigenfunctions $z$ with $\|z\| = 1$ have $|z(x)| = 1$ for $\mu$-a.e. $x \in X$. This also follows from (3). As a result, the map $x \mapsto z(x)$ can now be viewed as a projection onto a circle, i.e., $z : X \to S^1$.

Eigenfunctions form a group. Another important property of Koopman eigenfunctions is that they form a group under multiplication. That is, the product of two eigenfunctions of $U^t$ is again an eigenfunction, because of the following relation:

$U^t (z_1 z_2) = (U^t z_1)(U^t z_2) = e^{i(\omega_1 + \omega_2) t} z_1 z_2.$

Moreover, an analogous relation holds for the eigenfunctions and eigenvalues of $V$. The following lemma is about products of an element of $D$ with an element of $D^\perp$. The proof is left to the reader.

Lemma 8.

Given a mixed-spectrum, ergodic dynamical system $(\Phi^t, X, \mu)$, for every Koopman eigenfunction $z \in D$ and every $f \in D^\perp$ for which $zf \in H$, the product $zf$ lies in $D^\perp$.

The eigenvalues of $V$ are closed under integer linear combinations and are generated by a finite set of rationally independent eigenvalues $i\alpha_1, \ldots, i\alpha_m$. That is, every eigenvalue of $V$ is simple, and has the form $i\, a \cdot \alpha$ for some $a \in \mathbb{Z}^m$, where $\alpha = (\alpha_1, \ldots, \alpha_m)$. Moreover, the corresponding eigenfunction is given by $z_a = z_1^{a_1} \cdots z_m^{a_m}$, where $z_j$ is the eigenfunction at eigenvalue $i\alpha_j$. The following is a generalization of property (ii) of Koopman eigenfunctions listed above.

Proposition 9.

Given an arbitrary collection $z_1, \ldots, z_k$ of Koopman eigenfunctions, the image of $X$ under the map $\pi = (z_1, \ldots, z_k)$ with $|z_j| = 1$ is a torus of dimension $m' \le k$. Moreover, the flow on $\pi(X)$ is semi-conjugate to a rotation on $\mathbb{T}^{m'}$ (i.e., $\pi \circ \Phi^t = R_{t\alpha} \circ \pi$) associated with a frequency vector whose components are a subset of $\{\omega_1, \ldots, \omega_k\}$. In addition, if $\omega_1, \ldots, \omega_k$ are rationally independent, then $m' = k$.
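A sketch of such a factor map for the torus example introduced earlier, where two unimodular eigenfunctions $z_j = e^{i\theta_j}$ are available in closed form (an illustrative assumption; in general the $z_j$ would come from the data-driven methods below):

```python
# Sketch: the map pi = (z_1, z_2) sends states to points on a 2-torus; with
# |z_j| = 1, each component defines an angle coordinate.
def pi_map(x):
    z = np.exp(1j * x)  # (z_1(x), z_2(x)) for the torus flow sketched above
    return np.angle(z)  # angle coordinates on T^2

angles = np.stack([pi_map(x) for x in xs])  # orbit mapped onto the torus
```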

Remark 10.

If $m \ge 2$, the set of eigenvalues is dense on the imaginary axis. This property adversely affects the stability of numerical approximations of Koopman eigenvalues and eigenfunctions even in systems with pure point spectrum, necessitating the use of regularization [11]. We will return to this point in section 5.

Given $f \in H$, $f_D$ will denote its orthogonal projection onto $D$, i.e.,

$f_D = \sum_{\omega} P_\omega f, \qquad (7)$

where $P_\omega$ is the orthogonal projector onto the Koopman eigenspace corresponding to the eigenfrequency $\omega$. Thus, $f_D$ is a countable sum of projections of $f$ onto the various eigenspaces of the Koopman operator. The following lemma shows that a projection onto an individual eigenspace occurs naturally as an exponentially weighted Birkhoff average of the function $f$.

Lemma 11 ([49], section 2.3).

Given a sampling interval $\Delta t > 0$, the orthogonal projection $P_\omega f$ of $f \in H$ onto the eigenspace of $U^{\Delta t}$ corresponding to the eigenvalue $e^{i\omega\,\Delta t}$ is given by the limit $P_\omega f = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} e^{-i\omega n\,\Delta t}\, U^{n\,\Delta t} f$. Moreover, $P_\omega f = 0$ if $e^{i\omega\,\Delta t}$ is not in the discrete spectrum. Otherwise, $P_\omega f = \langle z, f \rangle z$, where $z$ is the normalized eigenfunction at that eigenvalue.
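The weighted Birkhoff average in lemma 11 is straightforward to approximate from samples. A sketch, continuing the torus example (the probed frequencies are arbitrary choices):

```python
# Sketch: discrete approximation of P_omega f from lemma 11; the average is
# O(1) at an eigenfrequency of the system and tends to zero otherwise.
def harmonic_average(f_vals, omega, dt):
    n = np.arange(len(f_vals))
    return np.mean(np.exp(-1j * omega * n * dt) * f_vals)

f_vals = np.exp(1j * xs[:, 0])                 # eigenfunction z, frequency 1
print(abs(harmonic_average(f_vals, 1.0, dt)))  # ~1 (resonant frequency)
print(abs(harmonic_average(f_vals, 0.7, dt)))  # ~0 (non-resonant frequency)
```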

Mixing and weak mixing. An observable $f \in H$ is said to be mixing if for all $g \in H$, $\lim_{t\to\infty} \langle g, U^t f \rangle = \langle g, 1_X \rangle \langle 1_X, f \rangle$. The flow is said to be mixing if every $f \in H$ is mixing. An observable $f \in H$ is said to be weak mixing if for all $g \in H$, $\lim_{T\to\infty} \frac{1}{T} \int_0^T \left| \langle g, U^t f \rangle - \langle g, 1_X \rangle \langle 1_X, f \rangle \right| dt = 0$. In addition, weak mixing observables in $D^\perp$ and observables in $D$ have a useful pointwise decorrelation property:

Lemma 12.

Let $f \in D^\perp$ and $g \in D$. Then, for $\mu$-a.e. $x \in X$,

$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \overline{g(\Phi^{n\,\Delta t}(x))}\, f(\Phi^{n\,\Delta t}(x)) = 0.$

Proof.

Without loss of generality, we may assume that $g$ is an eigenfunction of $U^{\Delta t}$ with eigenvalue $e^{i\omega\,\Delta t}$. Then, the average above becomes $\overline{g(x)} \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} e^{-i\omega n\,\Delta t} f(\Phi^{n\,\Delta t}(x))$, which equals $\overline{g(x)}\, P_\omega f(x)$ by lemma 11. The latter is equal to zero since $f \in D^\perp$. ∎

3 Kernel integral operators from delay-coordinate mapped data

3.1 Choice of kernel

Consider a kernel integral operator $K$ of the class (5) associated with a kernel $k : M \times M \to \mathbb{R}$ whose restriction to $X \times X$ is continuous. Then, $k$ is necessarily in $L^2(X \times X, \mu \times \mu)$, and the following properties hold (e.g., [50, 51]):

  1. $K$ is a Hilbert-Schmidt, and therefore compact, operator on $H$.

  2. If $k$ is symmetric, then $K$ is self-adjoint.

  3. If $k$ is continuous, then $K f$ is also continuous for every $f \in H$.

  4. If $M$ is a $C^r$ manifold and $k$ is $C^r$, then $K f$ is also $C^r$ for every $f \in H$.

A common approach in machine learning and harmonic analysis [52, 36, 53] is to work with kernels of the form

$k(x_1, x_2) = h(d(x_1, x_2)),$

where $h$ is a continuous shape function on $[0, \infty)$, and $d$ is a metric or pseudo-metric on $M$, usually defined as a pullback of a distance-like function in data space. As a concrete example, in what follows we employ an exponential shape function parameterized by a bandwidth parameter $\epsilon > 0$; such functions are popular in manifold learning and other related geometrical data analysis techniques due to their localizing behavior as $\epsilon \to 0$. We also consider the family of distance-like functions

$d_Q^2(x_1, x_2) = \frac{1}{Q} \sum_{q=0}^{Q-1} \left\| F(\Phi^{-q\,\Delta t}(x_1)) - F(\Phi^{-q\,\Delta t}(x_2)) \right\|^2, \qquad (8)$

where $F$ is the observation map and $\|\cdot\|$ is the canonical 2-norm on $\mathbb{R}^d$. Note in particular that $d_Q$ corresponds to a distance (or, as we will see below, pseudo-distance) on delay-coordinate mapped data with $Q$ delays. Using the exponential shape function and the distance-like functions in (8) we arrive at the two-parameter family of kernels

$k_Q(x_1, x_2) = \exp\!\left( -\frac{d_Q^2(x_1, x_2)}{\epsilon} \right), \qquad k_\infty(x_1, x_2) = \exp\!\left( -\frac{\bar d^2(x_1, x_2)}{\epsilon} \right), \qquad \bar d^2 := \lim_{Q\to\infty} d_Q^2, \qquad (9)$

and the associated integral operators $K_Q$ and $K_\infty$, respectively, from (5); to lighten the notation, we suppress the dependence of these operators on $\epsilon$. Note that $k_Q$, and $k_\infty$ (see theorem 16), satisfy all four properties of kernels listed above.
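A sketch of the kernel construction in (8) and (9) on sampled data, reusing the arrays from the earlier torus example; the backward-in-time delays and the normalization by $Q$ follow the definitions above, while the parameter values are arbitrary:

```python
# Sketch: pairwise delay-coordinate distances d_Q^2 on sampled states and the
# corresponding Gaussian kernel matrix k_Q with bandwidth epsilon, cf. (8)-(9).
from scipy.spatial.distance import cdist

def delay_kernel(ys, Q, eps):
    d = ys.shape[1]
    # Row n: concatenated delay vector (y_{Q-1+n}, y_{Q-2+n}, ..., y_n),
    # i.e., the observations at Phi^{-q dt} of the state x_{Q-1+n}.
    Y = np.stack([ys[Q - 1 + n - np.arange(Q)].reshape(Q * d)
                  for n in range(len(ys) - Q + 1)])
    D2 = cdist(Y, Y, metric="sqeuclidean") / Q  # d_Q^2 between sampled states
    return np.exp(-D2 / eps)

K = delay_kernel(ys[:2000], Q=200, eps=1.0)  # dense kernel matrix
```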

As stated in remark 5, kernels of the class (9) have previously been used for data analysis in [39, 40, 41]; the analysis that follows will provide an interpretation of the behavior of these algorithms (in particular, their timescale separation capabilities) from the point of view of the spectral properties of Koopman operators. Note that our approach is applicable to other classes of kernels constructed from delay-coordinate mapped data; in particular covariance kernels associated with linear shape functions.

Before proceeding, we state a property of $K_Q$ and $K_\infty$ valid for the specific choice of kernels in (9), which is due to compactness of $X$ and boundedness and strict positivity of the exponential shape function. This property will be important in the construction of the Markov-normalized operators in theorems 1 and 2.

Lemma 13.

Let denote the constant function equal to 1 on . The functions and , are continuous, positive, and bounded away from zero. As a result, and are also continuous, positive, and bounded away from zero.

Remark 14.

In corollary 19 below, we will see that $K_\infty 1_X$ is, in fact, constant. On the other hand, at finite $Q$, $K_Q 1_X$ will generally be non-constant, and in applications it may be the case that it has a large range of values. In such situations, it may be warranted to replace (9) by a variable-bandwidth kernel of the form $k_Q(x_1, x_2) = \exp\!\left(-d_Q^2(x_1, x_2) / (\epsilon\, b(x_1)\, b(x_2))\right)$ with a bandwidth function $b : X \to \mathbb{R}_+$ introduced so as to control the decay of the kernel away from the diagonal, $x_1 = x_2$. Various types of bandwidth functions have been proposed in the literature, including functions based on neighborhood distances [54, 55], state space velocities [40, 56], and local density estimates [57]. While we do not study variable-bandwidth techniques in this work, our approach should be applicable in that setting too, so long as corollary 19 holds.

Remark 15.

In a number of applications such as statistical learning on manifolds [34, 36, 53, 57], one-parameter families of integral operators such as $K_Q$ and $K_\infty$ are studied in the limit $\epsilon \to 0$, where under certain conditions they can be used to approximate generators of Markov semigroups; one of the primary examples being the Laplace-Beltrami operator on Riemannian manifolds. Here, the fact that the state space $X$ may not (and in general, will not) be smooth precludes us from taking such limits unconditionally. However, according to theorem 2(ii), passing first to the limit $Q \to \infty$ allows us to view $K_\infty$ and $P$ as operators on functions on a smooth manifold, namely an $m$-dimensional torus, and study the small-$\epsilon$ behavior of these operators in that setting.

3.2 Asymptotic behavior in the infinite-delay limit

To study the behavior of $k_\infty$, and thus the behavior of $K_\infty$, in the limit of infinitely many delays, $Q \to \infty$, we first consider the properties of the function $\bar d^2 = \lim_{Q\to\infty} d_Q^2$. The latter can be studied in turn through a useful (nonlinear) map $\mathcal{D}$, which maps a given observation function $f$ into a (pseudo)metric on $X$, namely,

$\mathcal{D}(f)(x_1, x_2) := \lim_{Q\to\infty} \frac{1}{Q} \sum_{q=0}^{Q-1} \left\| f(\Phi^{-q\,\Delta t}(x_1)) - f(\Phi^{-q\,\Delta t}(x_2)) \right\|^2, \qquad (10)$

so that $\bar d^2 = \mathcal{D}(F)$.
Theorem 16.

Let $(\Phi^t, X, \mu)$ be the dynamical system and $F$ the observation map from theorem 1, and $F = F_D + F_{D^\perp}$ the componentwise decomposition of $F$ according to (4). Then, the map $\mathcal{D}(F)$ in (10) is well-defined as a function in $L^2(X \times X, \mu \times \mu)$. Moreover,

  1. For every $t \in \mathbb{R}$ and $\mu\times\mu$-a.e. $(x_1, x_2)$, $\mathcal{D}(F)(\Phi^t(x_1), \Phi^t(x_2)) = \mathcal{D}(F)(x_1, x_2)$.

  2. For $\mu\times\mu$-a.e. $(x_1, x_2)$, $\mathcal{D}(F)(x_1, x_2) = \mathcal{D}(F_D)(x_1, x_2) + \mathcal{D}(F_{D^\perp})(x_1, x_2)$.

  3. $\mathcal{D}(F_{D^\perp})$ is a constant almost everywhere and equals $2 \|F_{D^\perp}\|^2$. Therefore,

    $\bar d^2 = \mathcal{D}(F) = \mathcal{D}(F_D) + 2 \|F_{D^\perp}\|^2, \quad \mu\times\mu\text{-a.e.} \qquad (11)$

    In particular, $\bar d^2$ depends on $F$ only through its discrete component $F_D$, up to an additive constant.

  4. Under the additional assumptions in theorem 3, $\mathcal{D}(F)$ is continuous, and $d_Q^2$ converges to $\bar d^2$ uniformly on $X \times X$.

Proof.

To prove well-definition of $\mathcal{D}(F)$, note that the limit in (10) exists $\mu\times\mu$-a.e., since it is the pointwise limit of the Birkhoff averages of the continuous function $(x_1, x_2) \mapsto \|F(x_1) - F(x_2)\|^2$ with respect to the product flow $\Phi^t \times \Phi^t$ on $X \times X$. By compactness of $X$, each of the functions $d_Q^2$ is bounded above by $\max_{x_1, x_2 \in X} \|F(x_1) - F(x_2)\|^2$. Therefore, $\mathcal{D}(F)$ lies in $L^\infty(X \times X, \mu\times\mu)$, and thus in $L^2(X \times X, \mu\times\mu)$ since $\mu\times\mu$ is a probability measure.

Claim (i) follows from the invariance of the infinite Birkhoff averages under the flow.

To prove claim (ii), let $f$ and $g$ denote $F_D$ and $F_{D^\perp}$, respectively. Let $\hat f_q(x_1, x_2) := f(\Phi^{-q\,\Delta t}(x_1)) - f(\Phi^{-q\,\Delta t}(x_2))$, and similarly define $\hat g_q$. Expanding the right-hand side of (10), we obtain

$\mathcal{D}(F)(x_1, x_2) = \lim_{Q\to\infty} \frac{1}{Q} \sum_{q=0}^{Q-1} \left[ \|\hat f_q(x_1, x_2)\|^2 + \|\hat g_q(x_1, x_2)\|^2 + 2\, \hat f_q(x_1, x_2) \cdot \hat g_q(x_1, x_2) \right],$

and the first two terms in the equation above are $\mathcal{D}(F_D)$ and $\mathcal{D}(F_{D^\perp})$, respectively. Therefore, to prove claim (ii), it suffices to prove that the third term vanishes. This is equivalent to showing that for $\mu\times\mu$-a.e. $(x_1, x_2)$,

$\lim_{Q\to\infty} \frac{1}{Q} \sum_{q=0}^{Q-1} \hat f_q(x_1, x_2) \cdot \hat g_q(x_1, x_2) = 0,$

which follows from lemma 12. This completes the proof of claim (ii).

To prove claim (iii), let $g_1, g_2$ denote the functions on $X \times X$ given by $g_1(x_1, x_2) = F_{D^\perp}(x_1)$ and $g_2(x_1, x_2) = F_{D^\perp}(x_2)$, respectively. Then (10) can be re-written for $\mathcal{D}(F_{D^\perp})$ as

$\mathcal{D}(F_{D^\perp}) = \lim_{Q\to\infty} \frac{1}{Q} \sum_{q=0}^{Q-1} \left[ \|g_1\|^2 + \|g_2\|^2 - 2\, g_1 \cdot g_2 \right] \circ \left( \Phi^{-q\,\Delta t} \times \Phi^{-q\,\Delta t} \right).$

The first two terms each converge to the constant $\|F_{D^\perp}\|^2$. It is therefore sufficient to show that the last term vanishes. Note that the function $g_1 \cdot g_2$ lies in the continuous spectrum subspace of the product system $(\Phi^t \times \Phi^t, \mu \times \mu)$. Therefore, its infinite Birkhoff average vanishes $\mu\times\mu$-a.e., proving claim (iii).

The proof of claim (iv) requires the following important observations. First, $\mathcal{D}(F_D)$ is a continuous map. This follows from the fact that $F_D$ is a finite sum of continuous Koopman eigenfunctions, and for each pair of such eigenfunctions the corresponding infinite Birkhoff average is constant, equal to either the product of the eigenfunctions or zero. The details are left to the reader. Second, the family $\{d_Q^2 : Q \in \mathbb{N}\}$ is equicontinuous. To see why, note that $F = F_D + F_{D^\perp}$, and therefore $d_Q^2$ splits, as in the proof of claim (ii), into a sum involving only $F_D$ and a remainder involving $F_{D^\perp}$. Given any $\epsilon' > 0$, the remainder in the RHS can be made less than $\epsilon'/2$, uniformly in $Q$. Then, the first sum is a finite sum of uniformly continuous functions, so there is a $\delta > 0$ such that if $(x_1, x_2)$ and $(x_1', x_2')$ lie within distance $\delta$, then this sum changes by less than $\epsilon'/2$. Thus, the total change in $d_Q^2$ is less than $\epsilon'$, which establishes equicontinuity of $\{d_Q^2 : Q \in \mathbb{N}\}$. Observe now that the family $\{k_Q : Q \in \mathbb{N}\}$ is equicontinuous too. As a result, $\bar d^2$ is continuous, and the convergence $d_Q^2 \to \bar d^2$ is uniform, by a classic result of Krengel ([58], theorem 2.6). This completes the proof of claim (iv) and the theorem. ∎

Remark 17.

Theorem 16 establishes that $\bar d$ is well-defined and continuous. It can also be verified that $\bar d$ satisfies the triangle inequality and is non-negative. However, $\bar d$ is a degenerate metric, i.e., $\bar d(x_1, x_2)$ may vanish for some $x_1 \neq x_2$. In fact, it is easy to check that if $x_2$ lies in the stable manifold of $x_1$, then $\bar d(x_1, x_2) = 0$.

The following proposition shows that the operators $K_\infty$ and $P$ depend only on the "discrete component" $F_D$ of $F$, and is a direct consequence of theorem 16 and (11).

Proposition 18.

Let $\Phi^t$ and $F$ be as in theorem 1. Then, the integral operator $K_\infty$ is a constant scaling operator iff its kernel $k_\infty$ from (9) is a constant, which occurs iff $F_D$ is a constant.

3.3 Markov normalization

Next, we construct the Markov operators $P_Q$ and $P$ appearing in theorems 1 and 2 by normalization of $K_Q$ and $K_\infty$. Here, we use a procedure introduced in the diffusion maps algorithm [36] and further developed in [53], although there are also other normalization procedures with the same asymptotic behavior. Specifically, we define:

$\tilde K_Q f := K_Q\!\left( \frac{f}{r_Q} \right), \quad r_Q := K_Q 1_X; \qquad P_Q f := \frac{\tilde K_Q f}{l_Q}, \quad l_Q := \tilde K_Q 1_X, \qquad (12)$

and analogously $\tilde K_\infty$ and $P$ from $K_\infty$, with normalizing functions $r_\infty$ and $l_\infty$. In [53], the steps leading to $\tilde K_Q$ from $K_Q$ and to $P_Q$ from $\tilde K_Q$ are called right and left normalization, respectively. Due to lemma 13, the normalizing functions in each step are continuous, meaning that $P_Q$ and $P$ are both compact operators on $H$. Moreover, $P_Q$ is by construction a Markov operator preserving constant functions; that is, $P_Q 1_X = 1_X$. For the same reasons, $P$ is also a compact Markov operator, but in this case the effects of right normalization cancel by corollary 19, so it is sufficient to construct this operator directly via left normalization of $K_\infty$.

As with $K_Q$ and $K_\infty$, the operators $P_Q$ and $P$ admit a kernel integral representation of the form (5). The corresponding kernels are

$p_Q(x_1, x_2) = \frac{k_Q(x_1, x_2)}{l_Q(x_1)\, r_Q(x_2)}, \qquad p(x_1, x_2) = \frac{k_\infty(x_1, x_2)}{l_\infty(x_1)\, r_\infty(x_2)}. \qquad (13)$

Note that $r_\infty$ is constant by corollary 19 ahead. Moreover, $p_Q$ and $p$ are both bounded above and away from zero, and for all $x_1 \in X$ we have $\int_X p_Q(x_1, x_2) \, d\mu(x_2) = \int_X p(x_1, x_2) \, d\mu(x_2) = 1$.

The Markov kernel $p$ is symmetric by symmetry of $k_\infty$ and the fact that $r_\infty$ (and hence $l_\infty$) is constant. As a result, $P$ is self-adjoint, its eigenvalues admit the ordering $1 = \lambda_0 \ge \lambda_1 \ge \lambda_2 \ge \cdots$, with $\lambda_j \to 0$, and there exists a real orthonormal basis $\{\phi_0, \phi_1, \phi_2, \ldots\}$ of $H$ consisting of corresponding eigenfunctions, with $\phi_0$ being constant. Even though $p_Q$ is not symmetric and thus $P_Q$ is not self-adjoint, we will see in section 6 that the eigenvalues of this operator are real, and admit the same ordering as those of $P$, and there exists a real (non-orthogonal) basis of $H$ consisting of corresponding eigenfunctions. In the next section, we will show that the operators $P_Q$ converge as $Q \to \infty$ to $P$, and thereby prove theorem 1.
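The normalization in (12) and (13) has a direct matrix analogue on sampled data. A sketch, applied to the kernel matrix `K` from the earlier sketch (numerical safeguards, such as solving a symmetrized eigenproblem, are omitted):

```python
# Sketch: right and left Markov normalization of a kernel matrix, followed by
# extraction of the leading eigenpairs, cf. (12)-(13).
from scipy.linalg import eig

def markov_normalize(K):
    r = K.sum(axis=1)            # right normalization function r = K 1
    K_tilde = K / r[None, :]     # kernel k(x, y) / r(y)
    l = K_tilde.sum(axis=1)      # left normalization function
    return K_tilde / l[:, None]  # Markov kernel p(x, y); rows sum to one

P = markov_normalize(K)
lam, phi = eig(P)              # eigenvalues and eigenfunctions
order = np.argsort(-lam.real)  # lam_0 = 1 leading; real up to roundoff
lam, phi = lam[order], phi[:, order]
```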

We end this section with two important corollaries of theorem 16, which are central to both theorems 1 and 2.

Corollary 19.

Viewed as elements of $L^2(X \times X, \mu \times \mu)$, both $\bar d^2$ and $k_\infty$ are invariant under the product flow $\Phi^t \times \Phi^t$. Moreover, the function $r_\infty = K_\infty 1_X$ is invariant under $U^t$, and thus constant by ergodicity.

Corollary 20.

The operators $K_\infty$ and $U^t$ commute.

Proof.

Since $\mu$ is an invariant measure, for every $f$ in $H$ and $t \in \mathbb{R}$ we have

$U^t K_\infty f(x) = \int_X k_\infty(\Phi^t(x), y)\, f(y)\, d\mu(y) = \int_X k_\infty(\Phi^t(x), \Phi^t(y))\, U^t f(y)\, d\mu(y).$

It therefore follows from corollary 19 that

$U^t K_\infty f(x) = \int_X k_\infty(x, y)\, U^t f(y)\, d\mu(y) = K_\infty U^t f(x),$

and the claim of the corollary follows. ∎

4 Proof of theorems 1, 2, 3 and corollary 4

Proof of theorem 1. Since we know already from corollary 20 that $K_\infty$ and $U^t$ commute, it remains to show that $P_Q$ converges to $P$ in operator norm, as $Q \to \infty$. First, observe that theorem 16 and continuity of the exponential shape function imply $\mu\times\mu$-a.e. convergence of $k_Q$ to $k_\infty$ as $Q \to \infty$. Note also that for every $x_1, x_2 \in X$ we have $0 < k_Q(x_1, x_2) \le 1$, and a similar result holds for $k_\infty$; thus, by dominated convergence, $\|k_Q - k_\infty\|_{L^2(\mu\times\mu)} \to 0$. Moreover, for every $f \in H$,

$\|(K_Q - K_\infty) f\| \le \|k_Q - k_\infty\|_{L^2(\mu\times\mu)}\, \|f\|,$

where the last inequality follows from the Cauchy-Schwarz inequality (e.g., [51]). We thus have $\|K_Q - K_\infty\| \le \|k_Q - k_\infty\|_{L^2(\mu\times\mu)}$, so $\lim_{Q\to\infty} \|K_Q - K_\infty\| = 0$. Repeating these arguments for the kernels $\tilde k_Q$ and $p_Q$ (noting, in particular, lemma 13) establishes operator-norm convergence of $P_Q$ to $P$. This completes the proof of theorem 1. ∎

Proof of theorem 2.

This requires proving a series of claims.

Proposition 21.

For any nonzero eigenvalue $\lambda$ of $P$, the corresponding eigenspace $W_\lambda$ is invariant under the action of the Koopman generator $V$, and $V|_{W_\lambda}$ is diagonalizable. The constant function $1_X$ is an eigenfunction of $P$ corresponding to eigenvalue $\lambda_0 = 1$. Moreover, if $\lambda \neq \lambda_0$, $\dim W_\lambda$ is an even number greater than or equal to 2.

Proof.

Since $P$ is compact, every nonzero eigenvalue $\lambda$ has finite multiplicity and its corresponding eigenspace $W_\lambda$ has finite dimension, $\dim W_\lambda < \infty$. Since $P$ commutes with $U^t$, $U^t$ and hence $V$ leave $W_\lambda$ invariant. Similarly, since the constant function $1_X$ is an eigenfunction of every $U^t$, it is an eigenfunction of $P$ with some eigenvalue $\lambda_0$; because $P$ is Markov, $P 1_X = 1_X$, so $\lambda_0 = 1$.

Let $\lambda \neq \lambda_0$ be a nonzero eigenvalue of $P$. Then, $V|_{W_\lambda}$ is a skew-symmetric operator on a finite-dimensional space, and hence can be diagonalized w.r.t. a basis of simultaneous eigenfunctions of $V$ and $P$. Fix any element $z$ of this basis. By our choice of $\lambda$, $z$ is a non-constant eigenfunction of $V$, hence $z \notin \operatorname{span}\{1_X\}$. Therefore, by ergodicity of $\mu$, $V z = i\omega z$ for some $\omega \neq 0$. This implies that $z$ has non-zero real and imaginary parts. Hence the conjugate $z^*$ is linearly independent from $z$ and corresponds to eigenvalue $-i\omega$ of $V$. However, since $P$ is a real operator, $z^*$ is still in $W_\lambda$. We therefore conclude that $W_\lambda$ can be split into disjoint 2-dimensional spaces spanned by conjugate pairs of eigenfunctions $z$ and $z^*$. Therefore, $\dim W_\lambda$ is an even number $\ge 2$. ∎

Note that since the integral operator $P$ commutes with $U^t$, for $\mu$-a.e. $x \in X$,