Spatiotemporal pattern extraction by spectral analysis of vector-valued observables
Abstract
We develop a data-driven framework for extracting complex spatiotemporal patterns generated by ergodic dynamical systems. Our approach, called Vector-valued Spectral Analysis (VSA), is based on an eigendecomposition of a kernel integral operator acting on a Hilbert space of vector-valued observables of the system, taking values in a space of functions (scalar fields) on a spatial domain. This operator is constructed by combining aspects of the theory of operator-valued kernels for machine learning with delay-coordinate maps of dynamical systems. Specifically, delay-coordinate maps performed pointwise in the spatial domain induce an operator acting on functions on that domain for each pair of dynamical states. Unlike conventional eigendecomposition techniques, which decompose the input data into pairs of temporal and spatial modes with a separable, tensor product structure, the patterns recovered by VSA can be manifestly nonseparable, requiring only a modest number of modes to represent signals with intermittency in both space and time. In addition, our kernel construction naturally quotients out any symmetries present in the data. We show that in the limit of infinitely many delays the kernel integral operator employed in our scheme commutes with a Koopman operator governing the evolution of vector-valued observables under the dynamics; as a result, in that limit our recovered patterns lie in simultaneous eigenspaces of these operators associated with the point spectrum of the dynamical system. We present applications of VSA to the Kuramoto-Sivashinsky model, which demonstrate considerable performance gains in efficient and meaningful decomposition over eigendecomposition techniques utilizing scalar-valued kernels.
I Introduction
Spatiotemporal pattern formation is ubiquitous in physical, biological, and engineered systems, ranging from molecular-scale reaction-diffusion systems, to engineering- and geophysical-scale convective flows, and astrophysical flows, among many examples Cross and Hohenberg (1993); Ahlers et al. (2009); Slawinska et al. (2015); Fung et al. (2016). The mathematical models for such systems are generally formulated by means of partial differential equations (PDEs), or coupled ordinary differential equations, with dissipation playing an important role in the development of low-dimensional effective dynamics on attracting subsets of the state space Eckmann and Ruelle (1985); Foias et al. (1988); Constantin et al. (1989). In light of this property, many pattern-forming systems are amenable to analysis by empirical, data-driven techniques, complementing the scientific understanding gained from first-principles approaches.
Historically, many of the classical proper orthogonal decomposition (POD) techniques for spatiotemporal pattern extraction have been based on the spectral properties of temporal and spatial covariance operators estimated from snapshot data (Aubry et al., 1991; Holmes et al., 1996). In Singular Spectrum Analysis (SSA) and related algorithms Broomhead and King (1986); Vautard and Ghil (1989), the combination of this approach with delay-coordinate maps of dynamical systems Packard et al. (1980); Takens (1981); Sauer et al. (1991); Robinson (2005); Deyle and Sugihara (2011) generally improves the representation of the information content of the data in terms of a few meaningful modes. More recently, advances in machine learning and applied harmonic analysis Schölkopf et al. (1998); Belkin and Niyogi (2003); Coifman et al. (2005); Coifman and Lafon (2006); Jones et al. (2008); Singer and Coifman (2008); von Luxburg et al. (2008); Talmon and Coifman (2013); Portegies (2014); Giannakis (2015); Berry and Harlim (2015); Berry and Sauer (2016a, b) have led to the development of techniques recovering temporal and spatial patterns through the eigenfunctions of kernel integral operators (e.g., heat operators) defined intrinsically in terms of a Riemannian geometric structure of the data. In particular, in a family of techniques called Nonlinear Laplacian Spectral Analysis (NLSA) Giannakis and Majda (2011, 2012, 2013), as well as in independent work in Berry et al. (2013), the diffusion maps algorithm Coifman and Lafon (2006) was combined with delay-coordinate maps to extract spatiotemporal patterns through the eigenfunctions of a kernel integral operator adept at capturing distinct and physically meaningful timescales in individual eigenmodes from multiscale, high-dimensional signals.
At the same time, spatial and temporal patterns have been extracted from eigenfunctions of Koopman Mezić and Banaszuk (2004); Mezić (2005); Rowley et al. (2009); Tu et al. (2014); Arbabi and Mezić (2016); Brunton et al. (2017) and Perron-Frobenius Dellnitz and Junge (1999); Froyland and Dellnitz (2003) operators governing the evolution of observables and probability measures, respectively, in dynamical systems Sinai (2000); Budisić et al. (2012); Eisner et al. (2015). In Rowley et al. (2009), it was shown that the dynamic mode decomposition (DMD) algorithm Schmid and Sesterhenn (2008); Schmid (2010) can, under some conditions, approximate spatial patterns called Koopman modes, corresponding to projections of the data onto eigenfunctions of the Koopman operator. The DMD algorithm can be implemented as a singular value decomposition of a linear operator relating time-shifted snapshots to a reference snapshot dataset. As shown in Tu et al. (2014), this has connections with linear inverse modeling (LIM) techniques Penland (1989). In Williams et al. (2015), an extended formulation of DMD was developed using as an approximation space a dictionary of functions, providing greater flexibility and stronger convergence guarantees than the snapshot-based dictionary employed in DMD. In Giannakis et al. (2015); Giannakis (2017); Das and Giannakis (2017), an approximation scheme targeting the generator of the Koopman group (as opposed to the Koopman operators themselves) was developed using a data-driven orthonormal basis of the Hilbert space of the dynamical system acquired through diffusion maps.
In general, Koopman and Perron-Frobenius techniques target operators defined intrinsically for the dynamical system generating the data, and thus can, in principle, recover temporal and spatial patterns of higher physical interpretability and utility in predictive modeling than POD and kernel-integral-operator-based approaches. In practice, however, the Koopman and Perron-Frobenius operators tend to have significantly more complicated spectral properties (e.g., nonisolated eigenvalues and/or continuous spectrum), hindering the stability and convergence of data-driven approximation techniques. In Giannakis (2017); Das and Giannakis (2017), these issues were addressed at least in the case of systems with pure point spectra and mixed spectra of their Koopman operators. There, it was shown that the eigenfunctions recovered by kernel integral operators defined on delay-coordinate-mapped data (e.g., the covariance and heat operators in SSA and NLSA algorithms, respectively) in fact converge to Koopman eigenfunctions in the limit of infinitely many delays, indicating a deep connection between these two major branches of data analysis algorithms.
All of the techniques described above recover from the data a set of temporal patterns and a corresponding set of spatial patterns, referred to in Aubry et al. (1991) as “chronos” and “topos” modes, respectively. In particular, for a dynamical system with a state space developing patterns in a physical domain , each chronos mode, , corresponds to a scalar-valued (real or complex) function on , and the corresponding topos mode, , corresponds to a scalar-valued function on . Spatiotemporal reconstructions of the data with these approaches thus correspond to linear combinations of tensor product patterns of the form , mapping pairs of points in the product space to the number . For a dynamical system possessing a compact invariant set with an ergodic invariant measure (e.g., a dissipative system) and states sampled “on the attractor”, the chronos modes become scalar-valued functions on , which may be of significantly smaller dimension than , increasing the feasibility of robust approximation of these modes from finite datasets. As discussed below, if the system additionally possesses a so-called physical measure Young (2002), these patterns can also be approximated from data lying near the invariant set, as would be the case in a typical experimental scenario.
Evidently, for spatiotemporal patterns of high complexity, tensor product patterns of the form with separable dependence on and can be highly inefficient in capturing the properties of the input signal. That is, the number of such patterns needed to recover via a linear superposition would generally be large, and none of the individual patterns would be representative of . In essence, the problem is similar to that of approximating a nonseparable spacetime signal in a tensor product basis of temporal and spatial basis functions. Another issue with tensor product decompositions based on scalar-valued eigenfunctions, which has received considerable attention Aubry et al. (1993); Holmes et al. (1996), is that in the presence of nontrivial spatial symmetries the recovered patterns are oftentimes pure symmetry modes (e.g., Fourier modes in a periodic domain with translation invariance), with minimal dynamical significance and physical interpretability.
Here, we present a framework for spatiotemporal pattern extraction, called Vector-valued Spectral Analysis (VSA), designed to alleviate the shortcomings mentioned above. Instead of producing separable spacetime modes, our approach recovers patterns in a Hilbert space of vector-valued observables assigning a scalar field (spatial pattern) on for every dynamical state in , without imposing a separable structure. We recover data-driven patterns in this space through the eigenfunctions of a kernel integral operator, , constructed using the theory of operator-valued kernels Micchelli and Pontil (2005); Caponnetto et al. (2008); Carmeli et al. (2010). As in NLSA algorithms, operates on data in delay-coordinate space with delays, and has relationships with Koopman operators analogous to those established in Giannakis (2017); Das and Giannakis (2017). However, unlike NLSA and other techniques recovering scalar-valued temporal patterns, the kernel of our integral operator is a function of both the state of the dynamical system in and the spatial position in ; in particular, for every pair of dynamical states in , there is a bounded linear operator acting on functions on (as opposed to a real number, as in the case of NLSA and comparable algorithms) constructed from .
As we will show, every eigenfunction of can be equivalently characterized as a function-valued function on , taking values in a Hilbert space of scalar functions on , or as a scalar-valued function on the set of delay-embedded sequences corresponding to all dynamical states in and spatial points in . In particular, can be understood as the base space of a topological bundle, with the product space acting as the total space and the observation function in delay-coordinate space as the projection map. This structure naturally quotients out any spatiotemporal symmetries present in the delay-embedded data (which may depend on ), reducing the dimension of when such symmetries are present. Moreover, the eigenfunctions of can have a manifestly nonseparable structure. In fact, we will see that for small to moderate numbers of delays the leading nonconstant eigenspace of can capture a large portion of the input signal. Thus, our kernel acts as a matched filter for the spatiotemporal signal generated by the dynamical system. This property has useful applications requiring estimation of level sets of functions from noisy data, such as topological data analysis Carlsson (2009), and is mathematically related to nonlocal averaging techniques in image processing. Besides spatiotemporal data, this bundle construction may be useful in other scenarios, such as analysis of data generated by dynamical systems with varying parameters Yair et al. (2017).
Another limit of interest corresponds to infinitely many delays, , where a result generalizing those in Giannakis (2017); Das and Giannakis (2017) is found to hold. Namely, as , and converge to operators, and , respectively, that commute with the lift of the Koopman operator of the dynamical system on . This result implies that and have common eigenfunctions with the lifted Koopman operator, and thus act as filters recovering the component of the signal associated with the discrete Koopman spectrum. We demonstrate the utility of our approach in the context of the Kuramoto-Sivashinsky (KS) model Kuramoto and Tsuzuki (1976); Sivashinsky (1977) in chaotic regimes.
The plan of this paper is as follows. In Section II, we introduce the class of systems under study and establish our notation. Section III presents the VSA framework for spatiotemporal pattern extraction using operator-valued kernels. Section IV discusses certain connections between our framework and the theory of topological bundles, which are useful for gaining intuition on the behavior of our technique, particularly in the presence of symmetries. In Section V, we examine the asymptotic behavior of VSA in the limits of no delays and infinitely many delays, and establish connections with nonlocal averaging and Koopman operators, respectively. Section VI describes the data-driven implementation of our techniques. That section includes Theorem 6 and Corollary 7, which establish convergence of our data-driven algorithms to their counterparts in Section III. In Section VII, we present our applications to the KS system. Our primary conclusions are described in Section VIII. The paper includes four appendices containing a summary of the properties of Koopman eigenfunctions and technical results. All numerical results presented here were obtained via a Matlab code which is available for download from the first author’s personal website, http://cims.nyu.edu/~dimitris.
II Notation and preliminaries
Let , , be a continuous-time flow on a differentiable (of class ) manifold , generated by a Lipschitz vector field , possessing a compact invariant set and an ergodic invariant probability measure with support equal to . As with every such manifold, can be equipped with a metric ; in what follows, we will not require an explicit specification of , but its existence will be used in the proof of Theorem 6. Let also be a compact, connected metric space equipped with a Borel probability measure . In what follows, will be the dynamical system generating the data and the spatial domain in which measurements are taken. The measure provides a notion of normalized volume on . In addition, we consider that we observe the system at a fixed sampling time interval through a function ; that is, for every , is a continuous scalar-valued function on , and the dependence of on is continuous. Associated with is a continuous scalar-valued function on such that . The product space can be equipped with a metric induced by and .
The assumptions stated above are sufficient to ensure that the VSA framework introduced below is theoretically well posed. Additional assumptions are needed to ensure that our scheme is amenable to data-driven approximation using finite numbers of sampled points in and . These assumptions will be stated in Section VI.1.
Next, we introduce groups of Koopman operators Sinai (2000); Budisić et al. (2012); Eisner et al. (2015), governing the evolution of scalar- and vector-valued observables under the dynamics on . For that, we first specify appropriate function spaces. In the case of scalar-valued observables, , the standard space to consider is the Hilbert space associated with the invariant measure. Hereafter, we abbreviate our notation for this space using , and also use the notations and for its inner product and norm, respectively. Similarly, we consider scalar-valued observables on the spatial domain and the product space lying in the Hilbert spaces and , respectively, where is the product probability measure between the invariant measure of the dynamics and the volume measure of the spatial domain. The inner products and norms of these spaces are defined analogously to those of , and will be denoted using and subscripts, respectively. Because the measures and are both finite (and thus and are both separable), is isomorphic to the tensor product space ; that is, every element of can be expressed as a countable sum with and .
The Koopman operator on at time is the unitary, bounded operator that acts on observables via composition with the flow map, namely , or equivalently, , where the equality holds up to null sets in with respect to . Note that the unitarity of , , is a consequence of preserving , and we also have . The Koopman operators on trivially lift to the unitary Koopman operators on , which are composition operators associated with the flow . Note that “dynamics” on the spatial domain are trivial under this flow, as they are described by the identity map . Moreover, is manifestly nonergodic, as every set of the form with is invariant and has positive measure. Further details on the properties of Koopman operators are included in Appendix A.
Koopman operators give rise to a distinguished class of observables through their eigenfunctions. In the case of , the Koopman eigenvalue problem reads
As discussed in Appendix A, in ergodic dynamical systems the eigenvalues are simple, lie on the unit circle, and take the form , where are real frequencies. Thus, under the action of , every such eigenfunction evolves by multiplication by a periodic phase factor; this makes Koopman eigenfunctions highly predictable observables. In addition, Koopman eigenfunctions have the important property of being intrinsic to the dynamical system; i.e., they do not depend on additional structures, such as a Riemannian geometry, often utilized in manifold learning algorithms. In measure-preserving systems, Koopman eigenfunctions corresponding to distinct eigenvalues are orthogonal by unitarity of . Hereafter, we will assume that all eigenfunctions of are continuous; while this is not necessarily the case, we are not aware of any continuous-time flows with discontinuous Koopman eigenfunctions (though discrete-time dynamical systems with discontinuous Koopman eigenfunctions are known to exist Anosov and Katok (1970)).
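In symbols (chosen here for illustration, since the paper's own notation does not survive in this extraction: $U^t$ for the Koopman operator, $\Phi^t$ for the flow, $z_j$ for an eigenfunction with frequency $\omega_j$), the eigenvalue relation and the resulting phase-factor evolution read:

```latex
U^t z_j = e^{i\omega_j t} z_j,
\qquad \text{equivalently} \qquad
z_j\bigl(\Phi^t(a)\bigr) = e^{i\omega_j t}\, z_j(a) .
```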
The eigenvalue problem for the lifted Koopman operator on ,
(1) 
is structurally very similar to that for ; in fact, it is a direct consequence of the definition of that its eigenvalues are the same as those of . However, in contrast to the one-dimensional eigenspaces of , the eigenspaces of are all infinite-dimensional. In particular, given any eigenfunction of at eigenvalue and any spatial pattern , the function is an eigenfunction of at the same eigenvalue.
Here, in addition to spaces of scalar-valued observables, we will be interested in vector-valued observables lying in the space . This space consists of (equivalence classes of) functions on taking values in , which are square-integrable with respect to the invariant measure . That is, every observable satisfies . is a Hilbert space for the inner product . Given a point (an initial condition), we can associate a spatiotemporal pattern to every continuous observable through the relationship
(2) 
This generalizes the standard notion of temporal patterns (including the chronos modes) constructed from scalar-valued observables in or via relationships analogous to (2). As with the isomorphism between and noted above, the finiteness of the measures and implies the isomorphisms . Thus, we can equivalently characterize the observables of interest in this work either as vector-valued functions in , scalar-valued functions in , or elements of the tensor product space .
III Vector-valued Spectral Analysis formalism
We now describe the VSA framework for spatiotemporal pattern extraction through operator-valued kernels. Due to the isomorphism between and stated in Section II, we first introduce our approach through a scalar-valued kernel construction on the product space , and then pass to an equivalent operator-valued kernel formulation on the invariant set .
III.1 Scalar-valued kernels and their associated integral operators
For the purposes of this work, a (scalar-valued) kernel is a function mapping pairs of points in to a positive real number. Intuitively, can be thought of as a measure of similarity between points in . Mathematically, we require that it have the following properties:

1. is continuous on . Hence, restricted to any compact subset of (in particular, ), it is bounded.
2. Restricted to , is bounded away from zero; that is, there exists a constant such that for all .
Sometimes, but not always, we consider symmetric kernels, i.e., kernels having the additional property for all . It should be noted that while the assumptions above are sufficient to ensure the theoretical well-posedness of our schemes, in Section VI below we will make additional assumptions in order to construct convergent data-driven approximations to these schemes. In particular, we will require that be continuous on a compact superset of of positive Lebesgue measure.
Associated with every kernel is an integral operator defined via , where
(3) 
By continuity of , is a compact operator whose range is included in the space of continuous functions on . If, in addition, is symmetric, is also self-adjoint. As a result, in the symmetric case the eigenfunctions of are continuous, and its eigenvalues are real, bounded, and can only accumulate at zero. Moreover, the nonzero eigenvalues of have finite multiplicities (i.e., the corresponding eigenspaces are finite-dimensional).
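In a data-driven setting, the action of such an integral operator is approximated by a kernel matrix averaged over samples. A minimal sketch, with a hypothetical Gaussian kernel and synthetic data standing in for samples on the attractor:

```python
import numpy as np

def kernel_matrix(X, epsilon):
    """Symmetric Gaussian kernel matrix K[i, j] = exp(-|x_i - x_j|^2 / epsilon)
    evaluated on N sample points (the rows of X)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / epsilon)

def apply_integral_operator(K, f):
    """Monte Carlo approximation of (G f)(x_i): the average of k(x_i, x_j) f(x_j)
    over the samples replaces the integral against the sampling measure."""
    return K @ f / K.shape[1]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))   # 200 synthetic samples in R^3
f = np.sin(X[:, 0])                 # a continuous observable on the samples
K = kernel_matrix(X, epsilon=1.0)
Gf = apply_integral_operator(K, f)
```

The symmetry of `K` mirrors the self-adjointness of the continuous operator, so its eigendecomposition can be computed with a symmetric eigensolver.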
III.2 Kernels from pseudometric functions
We will focus on kernels constructed from scaled pseudometric functions. Below, a pseudometric will be a symmetric function , continuous on , and having the properties and for all . Note that can vanish even if , which would not be the case for a proper metric. Moreover, is not required here to obey the triangle inequality.
We first consider the class of pseudometric functions , , obtained from delay-coordinate-mapped data with delays. Specifically, given and with and , we define
Note that corresponds to a time-averaged squared pseudodistance, evaluated by fixing the spatial points , and sampling the observation map at temporally equispaced points along the dynamical trajectories starting from .
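For gridded data `u[t, x]` (an assumed sampling of the observation map at trajectory times and spatial points; the indices and array layout are illustrative, not the paper's), this pseudometric can be sketched as:

```python
import numpy as np

def delay_pseudometric_sq(u, i, k, j, l, Q):
    """Squared delay pseudometric between the spacetime samples (t_i, x_k)
    and (t_j, x_l): the squared difference of the observed field at the two
    fixed spatial points, averaged over Q lagged snapshots.  Requires
    i, j >= Q - 1 so that all lags exist in the record u[t, x]."""
    lags = np.arange(Q)
    return float(np.mean((u[i - lags, k] - u[j - lags, l]) ** 2))

rng = np.random.default_rng(1)
u = rng.standard_normal((50, 8))    # 50 time samples of an 8-point field
d2 = delay_pseudometric_sq(u, 20, 3, 40, 5, Q=10)
```

With `Q=1` this reduces to the plain squared difference of two samples; increasing `Q` averages along the two trajectories while the spatial points stay fixed.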
We will also be interested in the limiting behavior of as the number of delays increases, viz.
(4) 
Unlike at finite , may not be continuous on . However, it can be shown that it is a well-defined function in ; in particular, the limit in (4) exists for almost every pair of points with respect to the product measure Das and Giannakis (2017). Moreover, may vanish despite being nonvanishing (e.g., if lies in the stable manifold of ). Additional details on the properties of can be found in Appendix C.
Next, as is common in machine learning algorithms Zelnik-Manor and Perona (2004); Giannakis and Majda (2012); Giannakis (2015); Berry and Harlim (2015); Berry and Sauer (2016a) (including NLSA), we introduce a positive scaling function , and form a rescaled pseudometric , given by
(5) 
We require that, restricted to , is continuous (and therefore bounded). Under these conditions, has the required properties of pseudometrics stated above, and we can similarly study the limit
(6) 
see Appendix C for additional details.
Intuitively, one can think of as a modification of the original pseudometric to account for variations in the sampling density and/or time tendency (“velocity”) of the data. For instance, we can choose such that it is large in regions of where the sampling density is large, and small in regions where the sampling density is small. With this choice, becomes relatively smaller (larger) than for pairs of points in regions of with small (large) sampling density. As a result, adaptively provides higher “resolution” in regions where we have plentiful observations and lower resolution in regions where the observations are sparse. Bandwidth functions employed in the literature include functions based on near-neighbor distances Zelnik-Manor and Perona (2004); Berry and Sauer (2016a), local time tendencies of the data Giannakis and Majda (2012); Giannakis (2015), and kernel density estimates Berry and Harlim (2015); Berry and Sauer (2016a). In certain cases involving data on smooth Riemannian manifolds, the transformation associated with these techniques can be interpreted as a conformal change of metric Giannakis (2015); Berry and Sauer (2016a); that is, a change of volume on the data manifold without a change of angles.
Here, we employ a class of scaling functions introduced in Giannakis (2017), which has both velocity- and density-dependent contributions. Explicit formulas and a discussion of this class of bandwidth functions are included in Appendix B. Note that in Zelnik-Manor and Perona (2004); Berry and Sauer (2016a); Giannakis and Majda (2012); Giannakis (2015) the convention is to work with rescaled distance functions of the form
(7) 
which is equivalent to (5) with , so long as is bounded away from zero. Here, we have opted to work with (5) in order to allow for the possibility that vanishes. As we will see below, this may be useful in the context of spatiotemporal data, where the signal can exhibit constant or near-constant values in certain parts of the domain (e.g., if is the temperature field in a convection flow held to fixed values at the boundary of ).
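One common density-adaptive choice (a self-tuning bandwidth in the spirit of Zelnik-Manor and Perona (2004), not the specific velocity- and density-dependent function of Appendix B) sets the scaling at each point from a near-neighbor distance:

```python
import numpy as np

def self_tuning_bandwidth(D, n_neighbor=7):
    """Bandwidth r_i = distance from sample i to its n-th nearest neighbor
    (column n of the row-sorted distance matrix D); small in dense regions,
    large in sparse ones."""
    return np.sort(D, axis=1)[:, n_neighbor]

def rescaled_sq_pseudometric(D, r):
    """Rescaled squared pseudometric D_ij^2 / (r_i r_j), assuming all r_i > 0."""
    return D ** 2 / np.outer(r, r)

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 2))
D = np.sqrt(np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
r = self_tuning_bandwidth(D)
D2s = rescaled_sq_pseudometric(D, r)
```

The rescaled pseudometric remains symmetric, and neighborhoods are effectively enlarged where the data are sparse.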
Having specified the class of pseudometrics , we also select a continuous shape function , and define
By symmetry of pseudometrics, and are both symmetric kernels. In what follows, we will always use Gaussian shape functions, , parameterized by a positive bandwidth parameter . Explicitly, we have
(8) 
We denote the kernel integral operators from (3) associated with and by and , respectively. Note that in the context of exponentially decaying kernels such as (8) the function from (7) can be interpreted as a bandwidth function since, for a fixed base point , a large (small) value increases (decreases) the size of the neighborhood where is appreciably greater than zero.
The Gaussian class of shape functions has been widely used in machine learning applications Schölkopf et al. (1998); Belkin and Niyogi (2003); Coifman et al. (2005); Coifman and Lafon (2006); Berry and Sauer (2016b), including NLSA. Among their attractive properties is their localizing behavior at small bandwidth (not shared by the quadratic shape functions associated with the covariance kernels employed in POD), which allows the associated kernel integral operators to approximate, as , heat kernels on Riemannian manifolds. The latter have, in turn, a number of useful properties for dimension reduction and clustering of data with a manifold geometric structure. The exponential decay of and also implies that and can be well approximated by sparse matrices. The latter property is particularly important in the problems studied here involving spatiotemporal data, as the numbers of sampled states in and spatial points in are generally both large (see Section VI). It is important to note that in the present work we do not assume that is a smooth manifold (since for nonlinear dissipative systems with fractal attractors is highly nonsmooth), so we do not consider the limiting behavior of our kernels as .
III.3 Markov normalization
Following the approach taken in NLSA algorithms and in Berry et al. (2013); Giannakis et al. (2015); Giannakis (2017); Das and Giannakis (2017), we construct from and associated compact Markov operators, and , respectively, by applying the normalization procedure introduced in the diffusion maps algorithm Coifman and Lafon (2006) and further developed in the context of general exponentially decaying kernels in Berry and Sauer (2016b). Below, we describe the procedure for a general kernel integral operator of the class in (3) associated with a kernel .
We begin by defining the functions
where is the function on everywhere equal to 1. By the properties of kernels listed in Section III.1, both of the above functions are continuous, positive functions on , bounded away from zero. We then define the kernel integral operator (again of the class in (3)), associated with the kernel , where
(9) 
In Berry and Sauer (2016b), the division of by and to form is referred to as left and right normalization, respectively. Because and are both positive and bounded away from zero (which is in turn a consequence of property 2 of ), meets both conditions on kernels stated in Section III.1. Moreover, it has the Markov property,
by construction. As a result, is a compact Markov operator on , preserving constant functions. In addition, since is bounded away from zero, is ergodic; i.e., it has a simple eigenvalue , and the corresponding eigenfunction, , is constant. Here, we are interested in the kernels and , and the corresponding Markov operators and , constructed by applying the diffusion maps normalization procedure to and , respectively.
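In matrix form, the left and right normalizations amount to two rescalings of a symmetric kernel matrix; a sketch of the procedure described above, on a hypothetical Gaussian kernel:

```python
import numpy as np

def markov_normalize(K):
    """Diffusion-maps-style normalization of a symmetric kernel matrix K with
    positive entries: right-normalize by r (the row sums of K), then
    left-normalize by l, producing a row-stochastic Markov matrix P."""
    r = K.sum(axis=1)                 # right normalization function
    K_right = K / r[None, :]
    l = K_right.sum(axis=1)           # left normalization function
    P = K_right / l[:, None]
    return P, l, r

rng = np.random.default_rng(4)
X = rng.standard_normal((80, 2))
d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
P, l, r = markov_normalize(np.exp(-d2))
```

Rows of `P` sum to one, so `P` preserves constant functions; strict positivity of the Gaussian kernel makes the resulting Markov operator ergodic, with simple eigenvalue 1.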
In general, the kernels of the class in (9), including and , are not symmetric, and as a result the corresponding Markov operators are not self-adjoint. However, for a kernel constructed by applying the diffusion maps normalization procedure to a symmetric kernel (including the kernels introduced in Section III.1), the corresponding operator is related to a self-adjoint compact operator by a similarity transformation. In particular, let be a bounded function in , and the corresponding multiplication operator by . That is, for , is the function equal to for a.e. . Defining as the self-adjoint kernel integral operator from (3) with the symmetric kernel
one can verify that can be obtained from through the similarity transformation
(10) 
Due to (10), and have the same eigenvalues , which are real by self-adjointness of , and admit the ordering , because is an ergodic Markov operator. Moreover, the eigenvalues have the properties , , and the eigenspaces corresponding to the nonzero are finite-dimensional.
Since is self-adjoint, there exists an orthonormal basis of consisting of real eigenfunctions of corresponding to the , which are continuous by the assumed continuity of the kernels. Moreover, due to (10), for every element of this basis, the continuous functions and are eigenfunctions of and , respectively, corresponding to the same eigenvalue . The sets and are (nonorthogonal) bases of satisfying the biorthogonality relation . In particular, every can be expanded as with , and we have . For notational simplicity, in what follows we will use the same symbols, (), to represent the eigenvalues (eigenfunctions) of either or , the symmetric operators constructed by applying (10) to and , respectively. Moreover, the eigenfunctions of and will both be denoted by .
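Numerically, one diagonalizes the symmetric matrix and maps its orthonormal eigenvectors back through the similarity transformation. A sketch with illustrative variable names (the vector `h` plays the role of the conjugating multiplication operator):

```python
import numpy as np

def markov_eigs(K):
    """Eigendecomposition of the Markov matrix P = diag(1/l) K diag(1/r) via
    the symmetric matrix S_ij = K_ij / sqrt(l_i r_i l_j r_j), which is similar
    to P and hence shares its eigenvalues.  Returns eigenvalues (descending)
    and the biorthogonal families phi (eigenvectors of P) and psi
    (eigenvectors of P^T)."""
    r = K.sum(axis=1)
    l = (K / r[None, :]).sum(axis=1)
    w = np.sqrt(l * r)
    S = K / np.outer(w, w)
    evals, V = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, V = evals[order], V[:, order]
    h = np.sqrt(l / r)                # diagonal of the similarity transform
    return evals, V / h[:, None], V * h[:, None]

rng = np.random.default_rng(5)
X = rng.standard_normal((60, 2))
d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-d2)
evals, phi, psi = markov_eigs(K)
```

Because the eigenvectors of the symmetric matrix are orthonormal, the two recovered families satisfy the biorthogonality relation exactly, and the leading eigenfunction of the Markov matrix is constant.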
III.4 Operator-valued kernels and the associated spatiotemporal patterns
Having constructed the kernel integral operators of Section III.2, we now make use of the isomorphism between and the Hilbert space of vector-valued observables to establish analogous operators associated with operator-valued kernels. Let be the Banach space of bounded linear operators on , equipped with the operator norm. For our purposes, an operator-valued kernel will be a continuous map , mapping each pair of states to a bounded linear operator in . Note that we do not impose the positive-definiteness property employed in the definition of operator-valued kernels in Micchelli and Pontil (2005); Caponnetto et al. (2008); Carmeli et al. (2010); that property is important in the construction of associated reproducing kernel Hilbert spaces of vector-valued observables, which we do not address in this work (though it would be an interesting topic for future work).
To every such operatorvalued kernel there corresponds a kernel integral operator defined by , where
(11) 
and the above integral is a Bochner integral for Hilbert-space-valued observables taking values in .
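In finite dimensions, the Bochner integral in (11) reduces to an average of matrix-vector products. The sketch below is a toy model of our own, not the paper's construction: for transparency we use a separable operator-valued kernel, the product of a scalar kernel on states with a fixed spatial operator, whereas the kernels employed in VSA are state-dependent and generally nonseparable. The sketch shows how an eigenvector of the scalar kernel operator combines with an eigenvector of the spatial operator to yield an eigenfunction of the operator acting on vector-valued observables, in the spirit of (12).

```python
import numpy as np

rng = np.random.default_rng(6)
N, D = 40, 12                      # N sampled states, D spatial grid points
s = rng.standard_normal(N)         # surrogate one-dimensional "states"
x = np.linspace(0.0, 1.0, D)       # spatial grid

k = np.exp(-(s[:, None] - s[None, :])**2)   # scalar kernel on states
G = np.exp(-(x[:, None] - x[None, :])**2)   # fixed symmetric spatial operator

def T(F):
    # Discrete analog of (11): (T F)(a_i) = (1/N) * sum_j l(a_i, a_j) F(a_j),
    # with the separable stand-in l(a_i, a_j) = k[i, j] * G; the average over
    # samples replaces the Bochner integral against the invariant measure.
    return (k @ (F @ G)) / N       # G is symmetric, so F @ G applies G to each field

lam, U = np.linalg.eigh(k / N)     # eigenpairs of the scalar kernel operator
gam, W = np.linalg.eigh(G)         # eigenpairs of the spatial operator
F = np.outer(U[:, -1], W[:, -1])   # product eigenfield u_i * w(x)
assert np.allclose(T(F), lam[-1] * gam[-1] * F)   # eigenfunction of the vector-valued operator
```

In this separable toy case the eigenfunctions factor into temporal and spatial parts; the point of VSA is precisely that its state-dependent kernels produce eigenfunctions without this tensor product structure.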
Given a scalar kernel from Section III.1, we can construct an associated operator-valued kernel by observing that the functions , defined for every as , are kernels on the spatial domain . As a result, associated with every is a compact (hence bounded) kernel integral operator , defined analogously to (3) via
The operator-valued kernel associated with is then defined as . It follows by construction of that if is an eigenfunction of the integral operator from (3) associated with , corresponding to the eigenvalue , then the vector-valued function defined as
(12) 
is an eigenfunction of corresponding to the same eigenvalue . Note that by continuity of and , both and are continuous.
In this work, we extract spatiotemporal patterns using the eigenfunctions of the kernel integral operator of the class (11) whose kernel is the operator-valued kernel associated with the Markov kernel from Section III.3. We also consider the analogous limiting operator with kernel associated with the Markov kernel , characterizing the behavior of in the limit of infinitely many delays. As with their counterparts and acting on scalar-valued functions in , there exists a (non-orthogonal) basis of consisting of real eigenfunctions of related to those of via (12). Moreover, to each such eigenfunction there corresponds a spatiotemporal pattern obtained via (2). In Section VII, we will demonstrate that these patterns can provide an efficient and physically meaningful decomposition of complex spatiotemporal signals.
IV Connections with topological bundles
To gain intuition on the behavior of the kernel integral operators introduced in Section III, it is useful to examine a natural topological bundle structure that acquires through the observation map in delay-coordinate space. First, we recall that a topological bundle is a triplet , where and are topological spaces, called the total and base spaces, respectively, and is a continuous surjective map. The preimage of a point is called the fiber of over . A topological bundle is called a fiber bundle if all fibers are homeomorphic to a reference fiber . If it happens that , the bundle is said to be trivial. While not every fiber bundle has this property, every fiber bundle is locally trivial; that is, for every there exists a neighborhood such that is homeomorphic to , and agrees with the natural projection onto the first factor of that Cartesian product; see, e.g., James (1995) for more details.
IV.1 Bundles for finitely many delays
In what follows, we will be interested in topological bundles where the total space is the product space , and the base space and the projection map are constructed by applying delay-coordinate maps to . In particular, for , let be the continuous map defined by
Let also be the image of under , and the map defined uniquely by for all . Then, is a topological bundle. Note that the invariant measure on induces a Borel measure on , where is the pushforward map on measures associated with .
The usefulness of this construction is that we can express the pseudometric on as a pullback of a Euclidean metric on . In particular, let be the Euclidean metric on scaled by the square root of the number of delays, i.e.,
Then, for all we have
(13) 
which expresses the fact that is a pullback of . The same conclusion holds for the restrictions of and to and , respectively, as stated above. Now, as with every pseudometric, partitions (and therefore ) into equivalence classes of points for which vanishes. In particular, the equivalence class associated with is given by . It follows from (13) that the set of all such equivalence classes in is homeomorphic (topologically equivalent) to ; that is, for every , the equivalence class is given by the preimage .
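The equivalence classes induced by the vanishing of the pseudometric can be seen in a minimal example. The construction below is our own toy illustration, not drawn from the paper: a circle rotation observed through a function invariant under a half-turn. The delay-coordinate pseudometric then vanishes between distinct states related by that symmetry, so they fall in the same equivalence class and are mapped to the same point in delay-coordinate space.

```python
import numpy as np

alpha = 0.3          # rotation increment of the circle dynamics
Q = 20               # number of delays

def h(theta):        # observable invariant under theta -> theta + pi
    return np.cos(2 * theta)

def F(theta):        # delay-coordinate map at a single observation point
    return np.array([h(theta + q * alpha) for q in range(Q)])

def d(theta_a, theta_b):
    # pseudometric: Euclidean distance in delay space, scaled by sqrt(Q)
    return np.linalg.norm(F(theta_a) - F(theta_b)) / np.sqrt(Q)

theta = 1.0
assert np.isclose(d(theta, theta + np.pi), 0.0)   # distinct states, zero distance
assert d(theta, theta + 0.5) > 0                  # generic pair: positive distance
```

The states theta and theta + pi are distinct points of the state space, yet the pseudometric cannot separate them; the quotient by such equivalence classes is exactly the image of the delay-coordinate map.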
By similar arguments, we can also conclude that the kernel can be expressed as a pullback of the kernel constructed by applying the Markov normalization procedure described in Section III.3 to the kernel on . That is, we have
As a result, the eigenfunctions of correspond to pullbacks of the eigenfunctions of the kernel integral operator whose kernel is . Explicitly, we have
(14) 
where is an eigenfunction of satisfying
In general, the “useful” patterns that can be extracted from a kernel integral operator are represented by eigenfunctions corresponding to nonzero eigenvalues (recall that such eigenfunctions lie in finite-dimensional eigenspaces, and thus can be robustly approximated from data). Therefore, it is useful to examine the closed subspace
every element of which is expressible as a countable linear combination of such eigenfunctions of . Note that can be equivalently defined as the closure of the range of , or as a pullback under of the closure of the range of . In particular, whether or not is a strict subspace of depends on the number of delays , and, importantly, on the presence of symmetries in the data.
To examine this in more detail, suppose that for every spatial point the observation map is generic in the sense of delay-coordinate mappings Packard et al. (1980); Takens (1981); Sauer et al. (1991); Robinson (2005); Deyle and Sugihara (2011); that is, there exists a minimal positive integer such that for all the image of the set is homeomorphic to . In that case, we can topologically reconstruct the attractor from time-lagged measurements taken at . Assume now that is finite, and observe that, for any , the base space of the bundle is given by . For , if the sets are disjoint, is homeomorphic to , and thus has a Cartesian product structure. In other words, in that case, becomes a trivial bundle with acting as the reference fiber. Moreover, under additional conditions on the form of the kernel and the properties of the measure , and may have no zero eigenvalues, in which case .
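The role of the minimal number of delays can be illustrated numerically. In the sketch below (our own example, with assumed parameter values), states on a circle observed through the scalar function cos(theta) are not separated by a single delay, since theta and −theta give the same value, while two delays with a generic lag separate all states, mirroring the reconstruction property assumed above.

```python
import numpy as np

alpha = 0.3                                # generic delay lag
theta = 2 * np.pi * np.arange(200) / 200   # grid of distinct states on the circle

def delay_map(Q):
    # pointwise delay-coordinate map built from the observable cos(theta),
    # scaled by sqrt(Q) as in the pseudometric definition
    return np.stack([np.cos(theta + q * alpha) for q in range(Q)], axis=1) / np.sqrt(Q)

def min_separation(Q):
    # smallest distance in delay space between distinct grid states
    V = delay_map(Q)
    D2 = ((V[:, None, :] - V[None, :, :])**2).sum(-1)
    D2[np.diag_indices(200)] = np.inf      # ignore self-distances
    return np.sqrt(D2.min())

assert min_separation(1) < 1e-12   # Q = 1: theta and -theta collide, map not injective
assert min_separation(2) > 1e-4    # Q = 2: generic delays separate all grid states
```

With a single delay the delay map identifies symmetric pairs of states, so the bundle base is a quotient of the circle; with two delays the map becomes injective and the attractor is topologically reconstructed.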
On the other hand, in a number of cases, including when nontrivial symmetries are present, some of the for may intersect. In such scenarios, will generally not have a Cartesian product structure, and may not be trivial (in fact, it may not even be a fiber bundle). Moreover, will be a strict subspace of . For data analysis purposes, this property may actually be desirable, since functions in are pullbacks of functions on , and the latter is in turn homeomorphic to a “smaller” quotient space, constructed from the equivalence classes on as described above. In effect, by factoring out redundancies in the data, functions on can be approximated more robustly than functions on , while still capturing the salient properties of the signal.
IV.2 Spatial symmetries
We now examine one scenario encountered frequently in applications that leads to the map being noninvertible, namely the presence of spatial symmetries. Throughout this section, we assume that the state space manifold is a subset of the Hilbert space . This would be the case, for instance, if is the solution space of a dissipative PDE defined on a spatial domain , and is an inertial manifold of that PDE Constantin et al. (1989); the KS system studied in Section VII is an example of such a system. In this setting, the vector-valued observation map reduces to the inclusion map from to . For notational simplicity, given , we will identify with . We will also use the symbol to represent the space of continuous vector fields on .
IV.2.1 Group actions and their corresponding invariant pseudometrics
Let be a topological group with a left action on . By that, we mean that there exists a map with the following properties:

is continuous, and is a homeomorphism for all .

is compatible with the group structure of ; that is, for all and ,
and , where is the identity element of .
Henceforth, given , we will abbreviate the map by . Note that the (continuous) inverse of this map is given by .
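For a concrete instance of these axioms (a discretized example of our own, not taken from the paper), consider the group of cyclic translations acting on fields over a periodic spatial grid. The identity, compatibility, and inverse properties of a left group action can then be checked directly:

```python
import numpy as np

D = 16                                     # grid points on a periodic spatial domain
f = np.random.default_rng(1).standard_normal(D)

def R(s, g):
    # action of the cyclic translation group Z_D on a discretized periodic field
    return np.roll(g, s)

assert np.allclose(R(0, f), f)                  # identity element acts trivially
assert np.allclose(R(2, R(3, f)), R(5, f))      # compatibility with the group law
assert np.allclose(R(-3, R(3, f)), f)           # inverse: R_{g^{-1}} = R_g^{-1}
```

Each map R(s, ·) is invertible and continuous (here, a permutation of grid values), so it is a homeomorphism, as required by the first condition above.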
The action of on induces a continuous left action on such that the action map sends to . As usual, we will consider to be a dynamical symmetry (e.g., Holmes et al. (1996)) if the following hold for all :

The state space manifold is invariant under . Thus, we obtain a left group action on by restriction of .

is differentiable for all , and the dynamical vector field is invariant under the pushforward map . That is, for every , , and , we have
or, equivalently,
Note that the well-definedness of as a map on vector fields relies on the fact that is a diffeomorphism (which is in turn a consequence of being a differentiable group action).
Together, these two conditions lead to the following important commutation (equivariance) property between the group action on and the dynamical evolution map, namely,
(15) 
which holds for all , , and . Equation (15) expresses the fact that if is a dynamical trajectory starting at , then is also a dynamical trajectory, starting at . In addition, upon evaluation at the point , where is arbitrary, (15) leads to the invariance property
(16) 
which holds for all , , and .
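The equivariance property (15) and the induced invariance (16) can be verified numerically for a simple translation-invariant flow. The sketch below is our own toy model, not from the paper: an explicit Euler discretization of the periodic heat equation, whose circulant Laplacian commutes with grid translations, so that evolving and then translating agrees exactly with translating and then evolving.

```python
import numpy as np

D, dt = 32, 0.01
rng = np.random.default_rng(2)
u = rng.standard_normal(D)

def step(v):
    # one explicit Euler step of the periodic heat equation (translation-invariant)
    lap = np.roll(v, 1) - 2 * v + np.roll(v, -1)
    return v + dt * lap

def flow(v, n):
    # n-step discrete-time evolution map
    for _ in range(n):
        v = step(v)
    return v

s = 5   # group element: translation by 5 grid points
# equivariance (15): evolve then translate == translate then evolve
assert np.allclose(np.roll(flow(u, 50), s), flow(np.roll(u, s), 50))
```

This is exactly the statement that if a trajectory starts at u, then its translate is also a trajectory, starting at the translated state.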
Next, consider the left action on the product space induced by the corresponding actions on and ; this is characterized by the action map , , given by
It follows from (15) that this group action and the dynamical evolution map (see Section II) satisfy
(17) 
for all , , and .
The invariance property in (16) endows the pseudometrics and in delay-coordinate space with symmetries, which can be summarized as follows.
Proposition 1.
The pseudometric in delay-coordinate space satisfies
for all and . In particular, it vanishes along every orbit of :
Proof.
It follows from (16) that for any , , and ,
The last equation and the fact that lead to the first claim of the Proposition. The second claim follows immediately from the first. ∎
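Proposition 1 can be checked in a toy setting (a construction of our own, under the translation-invariant heat flow used as an example above): the pointwise delay vector of a translated state at a translated grid point coincides exactly with that of the original state at the original point, so the pseudometric vanishes along the orbit of the product group action.

```python
import numpy as np

D, Q, dt = 32, 10, 0.01
rng = np.random.default_rng(3)
a = rng.standard_normal(D)

def step(v):
    # explicit Euler step of the periodic heat equation (commutes with np.roll)
    lap = np.roll(v, 1) - 2 * v + np.roll(v, -1)
    return v + dt * lap

def delays(v, x):
    # time series of the field at grid point x over Q steps (pointwise delay vector)
    out = []
    for _ in range(Q):
        out.append(v[x])
        v = step(v)
    return np.array(out)

def d(v, x, w, y):
    # pseudometric on (state, point) pairs induced by pointwise delay coordinates
    return np.linalg.norm(delays(v, x) - delays(w, y)) / np.sqrt(Q)

s, x = 7, 4
b = np.roll(a, s)                                 # symmetry-translated state
assert np.isclose(d(a, x, b, (x + s) % D), 0.0)   # vanishes along the group orbit
assert d(a, x, b, x) > 0                          # generic pair: nonzero distance
```

The vanishing is exact here because the translated trajectory evaluated at the translated point reproduces the original time series sample by sample, which is the content of the invariance property (16).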
Proposition 2.
To examine the implications of these results, consider the quotient set , consisting of all orbits on . It is a direct consequence of Proposition 1 that the set of equivalence classes on with respect to the pseudometric contains as a subset. Since the delay-coordinate observation map sends all elements of an arbitrary equivalence class (and, in particular, the elements of an equivalence class in ) to a single point , we interpret this behavior as our method “factoring out” the dynamical symmetry from the data.
In particular, given any observable , it follows from the definition of the kernel integral operator that for any and ,
Moreover, it is straightforward to check that the left- and right-normalization functions and are both invariant under , and as a result,
The last equation implies that the eigenfunctions of used for spatiotemporal pattern extraction are constant on orbits. Note that the properties described above do not rely on any prior knowledge of the symmetry represented by , and would also apply to other types of group symmetries exhibited by the system besides the dynamical symmetries discussed here.
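The invariance of the Markov normalization, and hence the constancy of eigenfunctions on orbits, can be illustrated as follows. This is a schematic sketch of our own in which the symmetry is modeled by exact duplication of samples, i.e., by pairs of points at zero pseudometric distance; the kernel matrix then has identical rows on each orbit pair, and eigenvectors at nonzero eigenvalues must take equal values on the pair.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 30
base = rng.standard_normal((N, 5))
V = np.vstack([base, base])    # each sample duplicated: an exact symmetry "orbit" pair

d2 = ((V[:, None, :] - V[None, :, :])**2).sum(-1)   # invariant (pseudo)distances
K = np.exp(-d2)
deg = K.sum(axis=1)
S = K / np.sqrt(np.outer(deg, deg))                 # symmetrized Markov kernel

lam, phi = np.linalg.eigh(S)
for j in range(2 * N):
    if abs(lam[j]) > 1e-3:
        # eigenfunctions at nonzero eigenvalues are constant on orbit pairs (i, i+N)
        assert np.allclose(phi[:N, j], phi[N:, j], atol=1e-6)
```

The argument is elementary: if rows i and i+N of the kernel matrix coincide, then for any eigenvector with nonzero eigenvalue the eigenvalue equation forces the i-th and (i+N)-th components to agree, which is the discrete form of constancy on orbits.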
IV.2.2 Spectral characterization
We can gain additional intuition on the relationships between the spectra of and symmetry operators associated with acting on , under the additional assumptions that and preserve the measures and , respectively; that is, for every and measurable set , , and an analogous relationship holds for . An immediate consequence of these assumptions is that the product measure is preserved by . While these properties cannot be expected to hold in general (for example, the support of may not be invariant under ), when they do hold, associated with are groups of unitary operators , , and , defined for every via composition with the corresponding group actions, i.e.,