Spatiotemporal pattern extraction by spectral analysis of vector-valued observables

# Spatiotemporal pattern extraction by spectral analysis of vector-valued observables

Dimitrios Giannakis Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA    Abbas Ourmazd Department of Physics, University of Wisconsin-Milwaukee, Milwaukee, WI 53211, USA    Joanna Slawinska Department of Physics, University of Wisconsin-Milwaukee, Milwaukee, WI 53211, USA    Zhizhen Zhao Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
July 3, 2019
###### Abstract

We develop a data-driven framework for extracting complex spatiotemporal patterns generated by ergodic dynamical systems. Our approach, called Vector-valued Spectral Analysis (VSA), is based on an eigendecomposition of a kernel integral operator acting on a Hilbert space of vector-valued observables of the system, taking values in a space of functions (scalar fields) on a spatial domain. This operator is constructed by combining aspects of the theory of operator-valued kernels for machine learning with delay-coordinate maps of dynamical systems. Specifically, delay-coordinate maps performed pointwise in the spatial domain induce an operator acting on functions on that domain for each pair of dynamical states. Unlike conventional eigendecomposition techniques, which decompose the input data into pairs of temporal and spatial modes with a separable, tensor product structure, the patterns recovered by VSA can be manifestly non-separable, requiring only a modest number of modes to represent signals with intermittency in both space and time. In addition, our kernel construction naturally quotients out any symmetries present in the data. We show that in the limit of infinitely many delays the kernel integral operator employed in our scheme commutes with a Koopman operator governing the evolution of vector-valued observables under the dynamics; as a result, in that limit our recovered patterns lie in simultaneous eigenspaces of these operators associated with the point spectrum of the dynamical system. We present applications of VSA to the Kuramoto-Sivashinsky model, which demonstrate considerable performance gains in efficient and meaningful decomposition over eigendecomposition techniques utilizing scalar-valued kernels.

## I Introduction

Spatiotemporal pattern formation is ubiquitous in physical, biological, and engineered systems, ranging from molecular-scale reaction-diffusion systems, to engineering- and geophysical-scale convective flows, and astrophysical flows, among many examples Cross and Hohenberg (1993); Ahlers et al. (2009); Slawinska et al. (2015); Fung et al. (2016). The mathematical models for such systems are generally formulated by means of partial differential equations (PDEs), or coupled ordinary differential equations, with dissipation playing an important role in the development of low-dimensional effective dynamics on attracting subsets of the state space Eckmann and Ruelle (1985); Foias et al. (1988); Constantin et al. (1989). In light of this property, many pattern forming systems are amenable to analysis by empirical, data-driven techniques, complementing the scientific understanding gained from first-principles approaches.

Historically, many of the classical proper orthogonal decomposition (POD) techniques for spatiotemporal pattern extraction have been based on the spectral properties of temporal and spatial covariance operators estimated from snapshot data (Aubry et al., 1991; Holmes et al., 1996). In Singular Spectrum Analysis (SSA) and related algorithms Broomhead and King (1986); Vautard and Ghil (1989), the combination of this approach with delay-coordinate maps of dynamical systems Packard et al. (1980); Takens (1981); Sauer et al. (1991); Robinson (2005); Deyle and Sugihara (2011) generally improves the representation of the information content of the data in terms of a few meaningful modes. More recently, advances in machine learning and applied harmonic analysis Schölkopf et al. (1998); Belkin and Niyogi (2003); Coifman et al. (2005); Coifman and Lafon (2006); Jones et al. (2008); Singer and Coifman (2008); von Luxburg et al. (2008); Talmon and Coifman (2013); Portegies (2014); Giannakis (2015); Berry and Harlim (2015); Berry and Sauer (2016a, b) have led to the development of techniques recovering temporal and spatial patterns through the eigenfunctions of kernel integral operators (e.g., heat operators) defined intrinsically in terms of a Riemannian geometric structure of the data. In particular, in a family of techniques called Nonlinear Laplacian Spectral Analysis (NLSA) Giannakis and Majda (2011, 2012, 2013), as well as in independent work in Berry et al. (2013), the diffusion maps algorithm Coifman and Lafon (2006) was combined with delay-coordinate maps to extract spatiotemporal patterns through the eigenfunctions of a kernel integral operator adept at capturing distinct and physically meaningful timescales in individual eigenmodes from multiscale high-dimensional signals.

At the same time, spatial and temporal patterns have been extracted from eigenfunctions of Koopman Mezić and Banaszuk (2004); Mezić (2005); Rowley et al. (2009); Tu et al. (2014); Arbabi and Mezić (2016); Brunton et al. (2017) and Perron-Frobenius Dellnitz and Junge (1999); Froyland and Dellnitz (2003) operators governing the evolution of observables and probability measures, respectively, in dynamical systems Sinai (2000); Budisić et al. (2012); Eisner et al. (2015). In Rowley et al. (2009), it was shown that the dynamic mode decomposition (DMD) algorithm Schmid and Sesterhenn (2008); Schmid (2010) can, under some conditions, approximate spatial patterns called Koopman modes, corresponding to projections of the data onto eigenfunctions of the Koopman operator. The DMD algorithm can be implemented as a singular value decomposition of a linear operator relating time-shifted snapshots to a reference snapshot dataset. As shown in Tu et al. (2014), this has connections with linear inverse modeling (LIM) techniques Penland (1989). In Williams et al. (2015), an extended formulation of DMD was developed using as an approximation space a dictionary of functions providing greater flexibility and stronger convergence guarantees than the snapshot-based dictionary employed in DMD. In Giannakis et al. (2015); Giannakis (2017); Das and Giannakis (2017), an approximation scheme targeting the generator of the Koopman group (as opposed to Koopman operators themselves) was developed using a data-driven orthonormal basis of the Hilbert space of the dynamical system acquired through diffusion maps.

In general, Koopman and Perron-Frobenius techniques target operators defined intrinsically for the dynamical system generating the data, and thus can, in principle, recover temporal and spatial patterns of higher physical interpretability and utility in predictive modeling than POD and kernel integral operator based approaches. In practice, however, the Koopman and Perron-Frobenius operators tend to have significantly more complicated spectral properties (e.g., non-isolated eigenvalues and/or continuous spectrum), hindering the stability and convergence of data-driven approximation techniques. In Giannakis (2017); Das and Giannakis (2017) these issues were addressed at least in the case of systems with pure point spectra and mixed spectra of their Koopman operators. There, it was shown that the eigenfunctions recovered by kernel integral operators defined on delay-coordinate mapped data (e.g., the covariance and heat operators in SSA and NLSA algorithms, respectively) in fact converge to Koopman eigenfunctions in the limit of infinitely many delays, indicating a deep connection between these two major branches of data analysis algorithms.

All of the techniques described above recover from the data set of temporal patterns and a corresponding set of spatial patterns, referred to in Aubry et al. (1991) as “chronos” and “topos” modes, respectively. In particular, for a dynamical system with a state space developing patterns in a physical domain , each chronos mode, , corresponds to a scalar- (real- or complex-) valued function on , and the corresponding topos mode, , corresponds to a scalar-valued function on . Spatiotemporal reconstructions of the data with these approaches thus correspond to linear combinations of tensor product patterns of the form , mapping pairs of points in the product space to the number . For a dynamical system possessing a compact invariant set with an ergodic invariant measure (e.g., a dissipative system) and states sampled “on the attractor”, the chronos modes become scalar-valued functions on , which may be of significantly smaller dimension than , increasing the feasibility of robust approximation of these modes from finite datasets. As discussed below, if the system additionally possesses a so-called physical measure Young (2002), these patterns can also be approximated from data lying near the invariant set, as would be the case in a typical experimental scenario.

Evidently, for spatiotemporal patterns of high complexity, tensor product patterns of the form with separable dependence on and can be highly inefficient in capturing the properties of the input signal. That is, the number of such patterns needed to recover via a linear superposition would generally be large, and none of the individual patterns would be representative of . In essence, the problem is similar to that of approximating a non-separable space-time signal in a tensor product basis of temporal and spatial basis functions. Another issue with tensor product decompositions based on scalar-valued eigenfunctions, which has received considerable attention Aubry et al. (1993); Holmes et al. (1996) is that in the presence of nontrivial spatial symmetries, the recovered patterns are oftentimes pure symmetry modes (e.g., Fourier modes in a periodic domain with translation invariance), with minimal dynamical significance and physical interpretability

Here, we present a framework for spatiotemporal pattern extraction, called Vector-valued Spectral Analysis (VSA), designed to alleviate the shortcomings mentioned above. Instead of producing separable space-time modes, our approach recovers patterns in a Hilbert space of vector-valued observables assigning a scalar field (spatial pattern) on for every dynamical state in , without imposing a separable structure. We recover data-driven patterns in this space through the eigenfunctions of a kernel integral operator, , constructed using the theory of operator-valued kernels Micchelli and Pontil (2005); Caponnetto et al. (2008); Carmeli et al. (2010). As in NLSA algorithms, operates on data in delay-coordinate space with delays, and has analogous relationships with Koopman operators as those established in Giannakis (2017); Das and Giannakis (2017). However, unlike NLSA and other techniques recovering scalar-valued temporal patterns, the kernel of our integral operator is a function of both the state of the dynamical system in and the spatial position in ; in particular, for every pair of dynamical states in , there is a bounded linear operator acting on functions on (as opposed to a real number as in the case of NLSA and comparable algorithms) constructed from .

As we will show, every eigenfunction of can be equivalently characterized as a function-valued function on , taking values in a Hilbert space of scalar functions on , or as a scalar-valued function on the set of delay-embedded sequences corresponding to all dynamical states in and spatial points in . In particular, can be understood as the base space of a topological bundle with the product space acting as the total space and the observation function in delay-coordinate space as the projection map. This structure naturally quotients out any spatiotemporal symmetries present in the delay-embedded data (which may depend on ), reducing the dimension of when such symmetries are present. Moreover, the eigenfunctions of can have a manifestly non-separable structure. In fact, we will see that for small to moderate numbers of delays the leading non-constant eigenspace of can capture a large portion of the input signal. Thus, our kernel acts as a matched filter for the spatiotemporal signal generated by the dynamical system. This property has useful applications requiring estimation of level sets of functions from noisy data, such as topological data analysis Carlsson (2009), and is mathematically related to nonlocal averaging techniques in image processing. Besides spatiotemporal data, this bundle construction may be useful in other scenarios, such as analysis of data generated by dynamical systems with varying parameters Yair et al. (2017).

Another limit of interest corresponds to infinitely many delays, , where a result that generalizes the results in Giannakis (2017); Das and Giannakis (2017) is found to hold. Namely, as , and converge to operators, and , respectively, that commute with the lift of the Koopman operator of the dynamical system on . This result implies that and have common eigenfunctions with the lifted Koopman operator, and thus act as filters recovering the component of the signal associated with the discrete Koopman spectrum. We demonstrate the utility of our approach in the context of the Kuramoto-Sivashinsky (KS) model Kuramoto and Tsuzuki (1976); Sivashinsky (1977) in chaotic regimes.

The plan of this paper is as follows. In Section II, we introduce the class of systems under study, and establish our notation. Section III presents the VSA framework for spatiotemporal pattern extraction using operator-valued kernels. Section IV discusses certain connections between our framework and the theory of topological bundles, which are useful for gaining intuition on the behavior of our technique, particularly in the presence of symmetries. In Section V, we examine the asymptotic behavior of VSA in the limit of no delays or infinitely-many delays, and establish connections with nonlocal averaging and Koopman operators, respectively. Section VI describes the data-driven implementation of our techniques. That section includes Theorem 6 and Corollary 7, which establish convergence of our data-driven algorithms to their counterparts in section III. In Section VII, we present our applications to the KS system. Our primary conclusions are described in Section VIII. The paper includes four appendices containing a summary of the properties of Koopman eigenfunctions and technical results. All numerical results presented here were obtained via a Matlab code which is available for download from the first author’s personal website, http://cims.nyu.edu/~dimitris.

## Ii Notation and preliminaries

Let , , be a continuous-time flow on a differentiable (of class ) manifold , generated by a Lipschitz vector field , possessing a compact invariant set and an ergodic invariant probability measure with support equal to . As with every such manifold, can be equipped with a metric ; in what follows we will not require an explicit specification of , but its existence will be used in the proof of Theorem 6. Let also be a compact, connected metric space equipped with a Borel probability measure . In what follows, will be the dynamical system generating the data and the spatial domain in which measurements are taken. The measure provides a notion of normalized volume on . In addition, we consider that we observe the system at a fixed sampling time interval through a function ; that is, for every , is a continuous scalar-valued function on , and the dependence of on is continuous. Associated with is a continuous scalar-valued function on such that . The product space can be equipped with a metric induced by and .

The assumptions stated above are sufficient to ensure that the VSA framework introduced below is theoretically well posed. Additional assumptions are needed to ensure that our scheme is amenable to data-driven approximation using finite numbers of sampled points in and . These assumptions will be stated in Section VI.1.

Next, we introduce groups of Koopman operators Sinai (2000); Budisić et al. (2012); Eisner et al. (2015), governing the evolution of scalar and vector-valued observables under the dynamics on . For that, we first specify appropriate function spaces. In the case of scalar-valued observables, , the standard space to consider is the Hilbert space associated with the invariant measure. Hereafter, we abbreviate our notation for this space using , and also use the notations and for its inner product and norm, respectively. Similarly, we consider scalar-valued observables on the spatial domain and the product space lying in the Hilbert spaces and , respectively, where is the product probability measure between the invariant measure of the dynamics and the volume measure of the spatial domain. The inner products and norms of these spaces are defined analogously to those of , and will be denoted using and subscripts, respectively. Because the measures and are both finite (and thus and are both separable), is isomorphic to the tensor product space ; that is, every element of can be expressed as a countable sum with and .

The Koopman operator on at time is the unitary, bounded operator that acts on observables via composition with the flow map, namely , or equivalently, , where the equality holds up to null sets in with respect to . Note that the unitarity of , , is a consequence of preserving , and we also have . The Koopman operators on trivially lift to the unitary Koopman operators on , which are composition operators associated with the flow . Note that “dynamics” on the spatial domain are trivial under this flow, as they are described by the identity map . Moreover, is manifestly non-ergodic, as every set of the form with is -invariant and has positive -measure. Further details on the properties of Koopman operators are included in Appendix A.

Koopman operators give rise to a distinguished class of observables through their eigenfunctions. In the case of , the Koopman eigenvalue problem reads

 Utz=Λtz,Λt∈C,z∈HA.

As discussed in Appendix A, in ergodic dynamical systems the eigenvalues are simple, lie on the unit circle, and take the form , where are real frequencies. Thus, under the action of , every such eigenfunction evolves by multiplication by a periodic phase factor; this makes Koopman eigenfunctions highly predictable observables. In addition, Koopman eigenfunctions have the important property of being intrinsic to the dynamical system; i.e., they do not depend on additional structures such as a Riemannian geometry often utilized in manifold learning algorithms. In measure-preserving systems, Koopman eigenfunctions corresponding to distinct eigenvalues are orthogonal by unitarity of . Hereafter, we will assume that all eigenfunctions of are continuous; while this is not necessarily the case, we are not aware of any continuous-time flows with discontinuous Koopman eigenfunctions (though discrete-time dynamical systems with discontinuous Koopman eigenfunctions are known to exist Anosov and Katok (1970)).

The eigenvalue problem for the lifted Koopman operator on ,

 ~Ut~z=~Λt~z,~Λt∈C,~z∈HM, (1)

is structurally very similar to that for ; in fact, it is a direct consequence of the definition of that its eigenvalues are the same as those of . However, in contrast to the one-dimensional eigenspaces of , the eigenspaces of are all infinite dimensional. In particular, given any eigenfunction of at eigenvalue and any spatial pattern , the function is an eigenfunction of at the same eigenvalue.

Here, in addition to spaces of scalar-valued observables, we will be interested in vector-valued observables lying in the space . This space consists of (equivalence classes of) functions on taking values in , which are square-integrable with respect to the invariant measure . That is, every observable satisfies . is a Hilbert space for the inner product . Given a point (an initial condition), we can associate a spatiotemporal pattern to every continuous observable through the relationship

 t↦f(Φt(x)),t∈R. (2)

This generalizes the standard notion of temporal patterns (including the chronos modes) constructed from scalar valued observables in or via analogous relationships to (2). As with the isomorphism between and noted above, the finiteness of the measures and implies the isomorphisms . Thus, we can equivalently characterize the observables of interest in this work either as vector-valued functions in , scalar-valued functions in , or elements of the tensor product space .

## Iii Vector-valued Spectral Analysis formalism

We now describe the VSA framework for spatiotemporal pattern extraction through operator-valued kernels. Due to the isomorphism between and stated in Section II, we first introduce our approach through a scalar-valued kernel construction on the product space , and then pass to an equivalent operator-valued kernel formulation on the invariant set .

### iii.1 Scalar-valued kernels and their associated integral operators

For the purposes of this work, a (scalar-valued) kernel is a function mapping pairs of points in to a positive real number. Intuitively, can be thought of as a measure of similarity between points in . Mathematically, we require that it has the following properties:

1. is continuous on . Hence, restricted to any compact subset of (in particular, ), it is bounded.

2. Restricted to , is bounded away from zero; that is, there exists a constant such that for all .

Sometimes, but not always, we consider symmetric kernels, i.e., kernels having the additional property for all . It should be noted that while the assumptions above are sufficient to ensure the theoretical well-posedness of our schemes, in Section VI below we will make additional assumptions in order to construct convergent data-driven approximations to these schemes. In particular, we will require that be continuous on a compact superset of of positive Lebesgue measure.

Associated with every kernel is an integral operator defined via , where

 Kf=∫Mk(⋅,ω)f(ω)dρ(ω). (3)

By continuity of , is a compact operator whose range is included in the space of continuous functions on . If, in addition, is symmetric, is also self-adjoint. As a result, in the symmetric case, the eigenfunctions of are continuous, and its eigenvalues are real, bounded, and can only accumulate at zero. Moreover, the nonzero eigenvalues of have finite multiplicities (i.e., the corresponding eigenspaces are finite-dimensional).

### iii.2 Kernels from pseudometric functions

We will focus on kernels constructed from scaled pseudometric functions. Below, a pseudometric will be a symmetric function , continuous on , and having the properties and for all . Note that can vanish even if , which would not be the case for a proper metric. Moreover, is not required here to obey the triangle inequality.

We first consider the class of pseudometric functions , , obtained from delay-coordinate mapped data with delays. Specifically, given and with and , we define

 d2Q(ω,ω′)=1QQ−1∑q=0|F(Φ−qτ(x),y)−F(Φ−qτ(x′),y′)|2.

Note that corresponds to a time-averaged squared pseudo-distance, evaluated by fixing the spatial points , and sampling the observation map on temporally-equispaced points along the dynamical trajectories starting from .

We will also be interested in the limiting behavior of as the number of delays increases, viz.

 d∞(ω,ω′)=limQ→∞dQ(ω,ω′). (4)

Unlike at finite , may not be continuous on . However, it can be shown that it is a well-defined function in ; in particular, the limit in (4) exists for almost every pair of points with respect to the product measure Das and Giannakis (2017). Moreover, may vanish despite being non-vanishing (e.g., if lies in the stable manifold of ). Additional details on the properties of can be found in Appendix C.

Next, as is common in machine learning algorithms Zelnik-Manor and Perona (2004); Giannakis and Majda (2012); Giannakis (2015); Berry and Harlim (2015); Berry and Sauer (2016a) (including NLSA), we introduce a positive scaling function , and form a rescaled pseudometric , given by

 ~dQ(ω,ω′)=dQ(ω,ω′)√sQ(ω)sQ(ω′). (5)

We require that, restricted to , is continuous (and therefore bounded). Under these conditions, has the required properties of pseudometrics stated above, and we can similarly study the limit

 ~d∞=limQ→∞~dQ; (6)

see Appendix C for additional details.

Intuitively, one can think of as a modification of the original pseudometric to account for variations in the sampling density and/or time tendency (“velocity”) of the data. For instance, we can choose such that it is large in regions of where the sampling density is large, and small in regions where the sampling density is small. With this choice, becomes relatively smaller (larger) than for pairs of points in regions of with small (large) sampling density. As a result, adaptively provides higher “resolution” in regions where we have plentiful observations and smaller resolution in regions where the observations are sparse. Bandwidth functions employed in the literature include functions based on near-neighbor distances Zelnik-Manor and Perona (2004); Berry and Sauer (2016a), local time tendencies of the data Giannakis and Majda (2012); Giannakis (2015), and kernel density estimates Berry and Harlim (2015); Berry and Sauer (2016a). In certain cases involving data on smooth Riemannian manifolds, the transformation associated with these techniques can be interpreted as a conformal change of metric Giannakis (2015); Berry and Sauer (2016a); that is, a change of volume on the data manifold without change of angles.

Here, we employ a class of scaling functions introduced in Giannakis (2017), which has both velocity- and density-dependent contributions. Explicit formulas and discussion on this class of bandwidth functions are included in Appendix B. Note that in Zelnik-Manor and Perona (2004); Berry and Sauer (2016a); Giannakis and Majda (2012); Giannakis (2015) the convention is to work with rescaled distance functions of the form

 ~dQ(ω,ω′)=dQ(ω,ω′)√s′Q(ω)s′Q(ω′), (7)

which is equivalent to (5) with , so long as is bounded away from zero. Here, we have opted to work with (5) in order to allow for the possibility that vanishes. As we will see below, this may be useful in the context of spatiotemporal data where the signal can exhibit constant or near-constant values in certain parts of the domain (e.g., if is the temperature field in a convection flow, and held to fixed values at the boundary of ).

Having specified the class of pseudometrics , we also select a continuous shape function , and define

 kQ(ω,ω′)=h(~dQ(ω,ω′)),k∞(ω,ω′)=h(~d∞(ω,ω′)).

By symmetry of pseudometrics, and are both symmetric kernels. In what follows, we will always use Gaussian shape functions, , parameterized by a positive bandwidth parameter . Explicitly, we have

 kQ(ω,ω′)=e−~d2Q(ω,ω′)/ϵ,k∞(ω,ω′)=e−~d2∞(ω,ω′)/ϵ. (8)

We denote the kernel integral operators from (3) associated with and by and , respectively. Note that in the context of exponentially decaying kernels such as (8) the function from (7) can be interpreted as a bandwidth function since, for a fixed base point , a large (small) value increases (decreases) the size of the neighborhood where is appreciably greater than zero.

The Gaussian class of shape functions has been widely used in machine learning applications Schölkopf et al. (1998); Belkin and Niyogi (2003); Coifman et al. (2005); Coifman and Lafon (2006); Berry and Sauer (2016b), including NLSA. Among their attractive properties is their localizing behavior at small (not true for quadratic shape functions associated with the covariance kernels employed in POD), which allows the associated kernel integral operators to approximate, as , heat kernels on Riemannian manifolds. The latter have, in turn, a number of useful properties for dimension reduction and clustering of data with a manifold geometric structure. The exponential decay of and also implies that and can be well approximated by sparse matrices. The latter property is particularly important in the problems studied here involving spatiotemporal data, as the number of sampled states in and spatial points in are generally both large (see Section VI). It is important to note that in the present work we do not assume that is a smooth manifold (since for nonlinear dissipative systems with fractal attractors is highly non-smooth), so we do not consider the limiting behavior of our kernels as .

### iii.3 Markov normalization

Following the approach taken in NLSA algorithms and in Berry et al. (2013); Giannakis et al. (2015); Giannakis (2017); Das and Giannakis (2017), we construct from and associated compact Markov operators, and , respectively, by applying the normalization procedure introduced in the diffusion maps algorithm Coifman and Lafon (2006) and further developed in the context of general exponentially decaying kernels in Berry and Sauer (2016b). Below, we describe the procedure for a general kernel integral operator of the class in (3) associated with a kernel .

We begin by defining the functions

 r=K1M,l=K(1M/r)

where is the function on everywhere equal to 1. By the properties of kernels listed in section III.1, both of the above functions are continuous, positive functions on , bounded away from zero. We then define the kernel integral operator (again of the class in (3)), associated with the kernel , where

 p(ω,ω′)=k(ω,ω′)l(ω)r(ω′). (9)

In Berry and Sauer (2016b), the division of by and to form is referred to as left and right normalization, respectively. Because and are both positive and bounded away from zero (which is in turn a consequence of property 2 of ), meets both conditions on kernels stated in Section III.1. Moreover, it has the Markov property,

 ∫Mp(ω,⋅)dρ=1,∀ω∈M,

by construction. As result, is a compact Markov operator on , preserving constant functions. In addition, since is bounded away from zero, is ergodic; i.e., it has a simple eigenvalue , and the corresponding eigenfunction, , is constant. Here, we are interested in the kernels and , and the corresponding Markov operators and , constructed by applying the diffusion maps normalization procedure to and , respectively.

In general, the kernels of the class in (9), including and , are not symmetric, and as a result the corresponding Markov operators are not self-adjoint. However, for a kernel constructed by applying the diffusion maps normalization procedure to a symmetric kernel (including the kernels introduced in Section III.1), the corresponding operator is related to a self-adjoint compact operator, by a similarity transformation. In particular, let be a bounded function in , and the corresponding multiplication operator by . That is, for , is the function equal to for -a.e. . Defining as the self-adjoint kernel integral operator from (3) with the symmetric kernel

 ^p(ω,ω′)=k(ω,ω′)^d(ω)^d(ω′),^d=√lr,

one can verify that can be obtained from through the similarity transformation

 ^P=T~d∘P∘T−1~d,~d=√l/r. (10)

Due to (10), and have the same eigenvalues , which are real by self-adjointness of , and admit the ordering , because is an ergodic Markov operator. Moreover, the eigenvalues have the properties , , and the eigenspaces corresponding to the nonzero are finite-dimensional.

Since is self-adjoint, there exists an orthonormal basis of consisting of real eigenfunctions of corresponding to the , which are continuous by the assumed continuity of kernels. Moreover, due to (10), for every element of this basis, the continuous functions and are eigenfunctions of and , respectively, corresponding to the same eigenvalue . The sets and are (non-orthogonal) bases of satisfying the bi-orthogonality relation . In particular, every can be expanded as with , and we have . For notational simplicity, in what follows we will use the same symbols, () to represent the eigenvalues (eigenfunctions) of either or ; the symmetric operators constructed by applying (10) to and , respectively. Moreover, the eigenfunctions of and will both be denoted by .

### iii.4 Operator-valued kernels and the associated spatiotemporal patterns

Having constructed the kernel integral operators of section III.2, we now make use of the isomorphism between and the Hilbert space of vector-valued observables to establish analogous operators associated with operator-valued kernels. Let be the Banach space of bounded linear operators on , equipped with the operator norm. For our purposes, an operator-valued kernel will be a continuous map , mapping each pair of states to a bounded linear operator in . Note that we do not impose a positive-definiteness property employed in the definition of operator-valued kernels in Micchelli and Pontil (2005); Caponnetto et al. (2008); Carmeli et al. (2010); that property is important in the construction of associated reproducing kernel Hilbert spaces of vector-valued observables, which we do not address in this work (though it would be an interesting topic for future work).

To every such operator-valued kernel there corresponds a kernel integral operator defined by , where

 Lf=∫Xl(⋅,x′)f(x′)dμ(x′),x∈A, (11)

and the above integral is a Bochner integral for Hilbert-space valued observables taking values in .

Given a scalar kernel from Section III.1, we can constructed an associated operator-valued kernel by observing that the functions , defined for every as , are kernels on the spatial domain . As a result, associated with every is a compact (hence bounded) kernel integral operator , defined analogously to (3) via

 ~Kxx′f=∫Y~kxx′(⋅,y′)f(y′)dν(y′).

The operator-valued kernel associated with is then defined as . It follows by construction of that if is an eigenfunction of the integral operator from (3) associated with corresponding to the eigenvalue , then the vector-valued function defined as

 →ϕj(x)(y)=ϕj((x,y)),x∈A,y∈Y, (12)

is an eigenfunction of corresponding to the same eigenvalue . Note that by continuity of and , both and are continuous.

In this work, we extract spatiotemporal patterns using the eigenfunctions of the kernel integral operator of the class (11), whose kernel is the operator-valued kernel associated with the Markov kernel from Section III.3. We also consider the analogous limiting operator with kernel associated with the Markov kernel , characterizing the behavior of in the limit of infinitely many delays. As with their counterparts and acting on scalar-valued functions in , there exists a (non-orthogonal) basis of consisting of real eigenfunctions of related to those of via (12). Moreover, to each such eigenfunction there corresponds a spatiotemporal pattern obtained via (2). In Section VII, we will demonstrate that these patterns can provide an efficient and physically meaningful decomposition of complex spatiotemporal signals.

## Iv Connections with topological bundles

To gain intuition on the behavior of the kernel integral operators introduced in Section III, it is useful to examine a natural topological bundle structure that acquires through the observation map in delay-coordinate space. First, we recall that a topological bundle is a triplet , where and are topological spaces, called the total and base spaces, respectively, and is a continuous surjective map. The preimage of a point is called the fiber of over . A topological bundle is called a fiber bundle if all fibers are homeomorphic to a reference fiber, . If it happens that , the fiber is said to be trivial. While not every fiber bundle will have this property, it is the case that fiber bundles are locally trivial; that is, for every there exists a neighborhood such that is homeomorphic to , and agrees with the natural projection onto the first factor of that Cartesian product; see, e.g., James (1995) for more details.

### iv.1 Bundles for finitely many delays

In what follows, we will be interested in topological bundles where the total space is the product space , and the base space and the projection map are constructed by application of delay-coordinate maps to . In particular, for , let be the continuous map defined by

 FQ((x,y))=(F((x,y)),F((Φ−τ(x)),y)),…,F((Φ−(q−1)τ(x),y)).

Let also be the image of under , and the map defined uniquely by for all . Then, is a topological bundle. Note that the invariant measure on induces a Borel measure on , where is the pushforward map on measures associated with .

The usefulness of this construction is that we can express the pseudometric on as a pullback of a Euclidean metric on . In particular, let be the Euclidean metric on scaled by the square root of the number of delays, i.e.,

 d′Q(b,b′)=∥b−b′∥RQ/Q1/2,b,b′∈RQ.

Then, for all we have

 dQ(ω,ω′)=d′Q(FQ(ω),FQ(ω′)), (13)

which expresses the fact that is a pullback of . The same conclusion holds for the restrictions of and to and , respectively, as stated above. Now, as with every pseudometric, partitions (and therefore ) into equivalence classes of points for which vanishes. In particular, the equivalence class associated with is given by . It follows from (13) that the set of all such equivalence classes in is homeomorphic (topologically equivalent) to ; that is, for every , the equivalence class is given by the preimage .

By similar arguments, we can also conclude that the kernel can be expressed as a pullback of the kernel constructed by applying the Markov normalization procedure described in Section III.3 to the kernel on . That is, we have

 pQ(ω,ω′)=p′Q(FQ(ω),FQ(ω′)),ω,ω′∈Ω.

As a result, the eigenfunctions of correspond to pullbacks of the eigenfunctions of the kernel integral operator whose kernel is . Explicitly, we have

 ϕj=φj∘πQ, (14)

where is an eigenfunction of satisfying

 P′Qφj=λjφj, P′Qφj(b)=∫BQp′Q(b,b′)φj(b′)dρQ(b′),b∈BQ.

In general, the “useful” patterns that can be extracted from a kernel integral operator are represented by eigenfunctions at nonzero corresponding eigenvalues (recall that such eigenfunctions lie in finite-dimensional eigenspaces, and thus can be robustly approximated from data). Therefore, it is useful to examine the closed subspace

 HQ=¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯span{ϕj:λj>0}⊆HM,

whose every element is expressible as a countable linear combination of such eigenfunctions of . Note that can be equivalently defined as the closure of the range of , or as a pullback under of the closure of the range of . In particular, whether or not is a strict subspace of depends on the number of delays , and importantly, it also depends on the presence of symmetries in the data.

To examine this in more detail, suppose that for every spatial point the observation map is generic in the sense of delay-coordinate mappings *PackardEtAl80,Takens81,SauerEtAl91,Robinson05,DeyleSugihara11; that is, there exists a minimal positive integer such that for all the image of the set is homeomorphic to . In that case, we can topologically reconstruct the attractor from time-lagged measurements taken at . Assume now that is finite, and observe that, for any , the base space of the bundle is given by . For , if the sets are disjoint, is homeomorphic to , and thus has a Cartesian product structure. In other words, in that case, becomes a trivial bundle with acting as the reference fiber. Moreover, under additional conditions on the form of the kernel and the properties of the measure , and may have no zero eigenvalues, in which case .

On the other hand, in a number of cases, including when nontrivial symmetries are present, some of the for may intersect or overlap. In such scenarios, will generally not have a Cartesian product structure, and may not be trivial (in fact, it may not even be a fiber bundle). Moreover, will be a strict subspace of . For data analysis purposes, this property may actually be desirable since functions in are pullbacks of functions on , and the latter is in turn homeomorphic to a “smaller”, quotient space, constructed from the equivalence classes on as described above. In effect, by factoring out redundancies in the data, functions on can be approximated more robustly than functions on , while still capturing the salient properties of the signal.

### iv.2 Spatial symmetries

We now examine one particular scenario encountered frequently in applications leading to the map being non-invertible, namely the presence of spatial symmetries. Throughout this section, we consider that the state space manifold is actually a subset of the Hilbert space . That would be the case, for instance, if is the solution space of a dissipative PDE defined on a spatial domain , and is an inertial manifold of that PDE Constantin et al. (1989); the KS system studied in Section VII is an example of such a system. In this setting, the vector-valued observation map is reduces to the inclusion map from to . For notational simplicity, given , we will identify with . We will also use the symbol to represent the space of continuous vector fields on .

#### iv.2.1 Group actions and their corresponding invariant pseudometrics

Let be a topological group with a left action on . By that, we mean that there exists a map with the following properties:

1. is continuous, and is a homeomorphism for all .

2. is compatible with the group structure of ; that is, for all and ,

 ΓY(gg′,y)=ΓY(g,ΓY(g′,y)),

and , where is the identity element of .

Henceforth, given , we will abbreviate the map by . Note that the (continuous) inverse of this map is given by .

The action of on induces a continuous left action on such that the action map sends to . As usual, we will consider to be a dynamical symmetry (e.g., Holmes et al. (1996)) if the following hold for all :

1. The state space manifold is invariant under . Thus, we obtain a left group action on by restriction of .

2. is differentiable for all , and the dynamical vector field is invariant under the pushforward map . That is, for every , , and , we have

 ΓgX∗(v|x)(f)=v|ΓgX(x)(f),

or, equivalently,

 v|x(f∘ΓgX)=v|ΓgX(x)(f).

Note that the well-definition of as a map on vector fields relies on the fact that is a diffeomorphism (which is in turn a consequence of the fact that is a differentiable group action).

Together, these two conditions lead to the following important commutation (equivariance) property between the group action on and the dynamical evolution map, namely,

 (ΓgX∘Φt)(x)=(Φt∘ΓgX)(x), (15)

which holds for all , , and . Equation (15) expresses the fact that if is a dynamical trajectory starting at , then is also a dynamical trajectory, starting at . In addition, upon evaluation at the point , where is arbitrary, (15) leads to the invariance property

 Φt(ΓgX(x))(ΓgY(y))=Φt(x)(y), (16)

which holds for all , , and .

Next, consider the left -action on the product space induced by the corresponding actions on and ; this is characterized by the action map , , given by

 ΓgΩ((x,y))=(ΓgX(x),ΓgY(y))=(x∘Γg−1Y,ΓgY(y)).

It follows from (15) that this group action and the dynamical evolution map (see Section II) satisfy

 (ΓgΩ∘~Φt)(ω)=(~Φt∘ΓgΩ)(ω), (17)

for all , , and .

The invariance property in (16) endows the pseudometrics and in delay-coordinate space with symmetries, which can be summarized as follows.

###### Proposition 1.

The pseudometric in delay-coordinate space satisfies

 dQ(ω,ω′)=dQ(ΓgΩ(ω),Γg′Ω(ω′)),

for all and . In particular, it vanishes along every -orbit of :

 dQ(ω,ΓgΩ(ω))=0,∀g∈G.
###### Proof.

It follows from (16) that for any , , and ,

The last equation and the fact that lead to the first claim of the Proposition. The second claim follows immediately from the first. ∎

###### Proposition 2.

For the class of scaling functions in Appendix B, the scaled pseudometric from (5) exhibits the analogous invariance properties, namely

 ~dQ(ω,ω′)=~dQ(ΓgΩ(ω),Γg′Ω(ω′)),

for all and .

A proof of Proposition 2 can be found in Appendix B.4.

To examine the implications of these results, consider the quotient set , consisting of all -orbits on . It is a direct consequence of Proposition 1 that the set of equivalence classes on with respect to the pseudometric contains as a subset. Since the delay-coordinate observation map maps all elements of an arbitrary equivalence class (and in particular, the elements of an equivalence class in ) to a single point , we interpret this behavior as our method “factoring out” the dynamical symmetry from the data.

In particular, given any observable , it follows from the definition of the kernel integral operator that for any and ,

 KQf(ω)=KQf(ΓgΩ(ω)).

Moreover, it is straightforward to check that the left- and right-normalization functions and are both invariant under , and as a result,

 PQf(ω)=PQf(ΓgΩ(ω)).

The last equation implies that the eigenfunctions of used for spatiotemporal pattern extraction are constant on -orbits. Note that the properties described above do not rely on any prior knowledge of the symmetry represented by , and would also apply for other types of group symmetries that the system exhibits besides the dynamical symmetries discussed here.

#### iv.2.2 Spectral characterization

We can gain additional intuition on the relationships between the spectra of and symmetry operators associated with acting on , under the additional assumptions that and preserve the measures and , respectively; that is for every and measurable set , and an analogous relationship holds for . An immediate consequence of these assumptions is that the product measure is preserved by . While these properties cannot be expected to hold in general (for example, the support of may not be invariant under ), if they are indeed true, then associated with are groups of unitary operators , , and , defined for every via composition with the corresponding group actions, i.e.,

 RgAfA=fA∘ΓgA,RgYfY=fY∘ΓgY, RgMfM=(R