Nonlinear Laplacian spectral analysis: Capturing intermittent and low-frequency spatiotemporal patterns in high-dimensional data
251 Mercer St, New York, NY 10012
Abstract
We present a technique for spatiotemporal data analysis called nonlinear Laplacian spectral analysis (NLSA), which generalizes singular spectrum analysis (SSA) to take into account the nonlinear manifold structure of complex datasets. The key principle underlying NLSA is that the functions used to represent temporal patterns should exhibit a degree of smoothness on the nonlinear data manifold $\mathcal{M}$; a constraint absent from classical SSA. NLSA enforces such a notion of smoothness by requiring that temporal patterns belong in low-dimensional Hilbert spaces $V_l$ spanned by the leading $l$ Laplace–Beltrami eigenfunctions on $\mathcal{M}$. These eigenfunctions can be evaluated efficiently in high ambient-space dimensions using sparse graph-theoretic algorithms. Moreover, they provide orthonormal bases to expand a family of linear maps, whose singular value decomposition leads to sets of spatiotemporal patterns at progressively finer resolution on the data manifold. The Riemannian measure of $\mathcal{M}$ and an adaptive graph kernel width enhance the capability of NLSA to detect important nonlinear processes, including intermittency and rare events. The minimum dimension of $V_l$ required to capture these features while avoiding overfitting is estimated here using spectral entropy criteria.
As an application, we study the upper-ocean temperature in the North Pacific sector of a 700-year control run of the CCSM3 climate model. Besides the familiar annual and decadal modes, NLSA recovers a family of intermittent processes associated with the Kuroshio current and the subtropical and subpolar gyres. These processes carry little variance (and are therefore not captured by SSA), yet their dynamical role is expected to be significant.
Keywords
Laplacian eigenmaps, singular spectrum analysis, intermittency, decadal variability, manifold embedding
1 Introduction
In recent years, there has been a proliferation of data in geophysics and other applied sciences acquired through observations [1], reanalyses [2], and large-scale numerical models [3]. The availability of such data holds promise for advances in a number of important applications, such as decadal-range climate forecasts through improved understanding of regime transitions in the ocean [4] and weather forecasts beyond seven-day lead times through skillful models of organized convection in the tropics [5]. However, a major obstacle in using large-scale datasets in such applications is their sheer volume and complexity. Typically, the data are presented in the form of a high-dimensional time series, acquired through noisy, partial observations of a strongly nonlinear dynamical system. Thus, there exists a strong practical interest in developing novel data analysis techniques to decompose the data into a set of spatiotemporal patterns revealing the operating nonlinear processes. Such patterns can be used to gain scientific understanding of complex phenomena, or to build reduced dynamical models for prediction.
Machine learning methods, such as those based on kernels [6] and neural networks [7], are well-suited to capture the nonlinear features of data generated by complex systems, but in certain cases are prone to overfitting and/or poor scaling with the dimension of ambient space [8]. In contrast, classical linear approaches, such as singular spectrum analysis (SSA) and its variants [9, 10, 11, 12], have the advantages of direct physical interpretability and analysis through the tools of linear algebra, but are generally limited to detecting patterns that carry a large portion of the variance (energy) of the observed signal. Such patterns are likely to miss important nonlinear dynamical features, including intermittency and rare events [13, 14]. The latter carry low variance, but can play an important dynamical role by triggering large-scale regime transitions.
In [15] (building on preliminary work in [16]), a method called nonlinear Laplacian spectral analysis (NLSA) was developed, whose objective is to address the above shortcomings by combining aspects of both nonlinear and linear methods. Similarly to SSA, NLSA decomposes an observed signal through spectral analysis of linear operators mapping a space of temporal modes (the “chronos” space) to the corresponding space of spatial patterns (“topos” space) [10]. However, the linear operators used in NLSA, denoted here by $A_l$, differ crucially from SSA in that they are tailored to the nonlinear manifold structure of the data. Specifically, the chronos spaces in NLSA are low-dimensional subspaces of the Hilbert space of square-integrable functions on the data manifold $\mathcal{M}$, with the Riemannian measure $\mu$ induced by the embedding of $\mathcal{M}$ in high-dimensional ambient space. NLSA employs graph-theoretic algorithms [17, 18] to produce a set of orthonormal Laplace–Beltrami eigenfunctions, which form bases for a family of $l$-dimensional subspaces $V_l$, with $l$ controlling the scale (“resolution”) on the data manifold of the temporal patterns described by the corresponding operator $A_l$. A decomposition of the data into a set of spatial and temporal patterns then follows from singular value decomposition (SVD) of the matrix representing $A_l$ in the eigenfunction basis.
A key difference between NLSA and other nonlinear dimensionality reduction techniques in the literature is that the eigenfunctions are not used to define feature maps as done, e.g., in kernel PCA [19, 6]. Moreover, the diffusion process on the graph employed to evaluate the Laplace–Beltrami eigenfunctions has no implied relation to the actual dynamics (cf. [20, 21]). Rather, the graph Laplacian eigenfunctions are used in NLSA solely as a set of basis functions on the data manifold. The physical, spatiotemporal processes operating on the data manifold are obtained in the linear SVD step. Two further important ingredients of the scheme are:

Time-lagged embedding [22] to alleviate non-Markovianity of the input data due to partial observations, and to capture propagating coherent structures;

Adaptive kernels in the construction of the graph Laplacian (with Gaussian widths determined from the distances between temporal nearest neighbors), enhancing the capability of the algorithm to recover rapid transitions and rare events.
We demonstrate the efficacy of the scheme in an analysis of the North Pacific sector of the Community Climate System Model version 3 (CCSM3) [23], augmenting the work in [15, 16] with new practical criteria for choosing the dimension of the temporal spaces $V_l$. Using a 700-year equilibrated dataset of the upper 300 m of the ocean [24, 4, 25], we identify a number of qualitatively distinct spatiotemporal processes, each with a meaningful physical interpretation. These include the seasonal cycle, semiannual variability, as well as decadal-scale processes resembling the Pacific decadal oscillation (PDO) [26]. Besides these modes, which are familiar from SSA, the spectrum of NLSA also contains modes with a strongly intermittent behavior in the temporal domain, characterized by five-year periods of high-amplitude oscillations with annual and semiannual frequencies, separated by periods of quiescence. Spatially, these modes describe enhanced eastward transport in the region of the Kuroshio current, as well as retrograde (westward) propagating temperature anomalies and circulation patterns resembling the subpolar and subtropical gyres. The bursting-like behavior of these modes, a hallmark of nonlinear dynamics, means that they carry little variance of the raw signal (about an order of magnitude less than the seasonal and PDO modes), and as a result, they are not captured by linear SSA.
Here, we pay particular attention to the choice of the dimension $l$ of $V_l$. Introducing a spectral entropy measure $\delta_l$ characterizing the change in the energy distribution among the modes of $A_l$ as $l$ grows, we propose to select $l$ as the minimum value beyond which $\delta_l$ becomes small. We find this to be a particularly effective way to prevent overfitting the data (a common issue in machine learning methods [8]), while capturing the important features of the signal through the spatiotemporal modes of $A_l$. Another key theme of the framework presented here is to apply minimal or no preprocessing to the data prior to analysis. In particular, we find that removing the seasonal cycle from the data (as is commonly done in the geosciences) eliminates the intrinsically nonlinear intermittent modes from the spectrum of $A_l$; a fact which we attribute to the distortion of neighborhood relationships imparted by preprocessing a nonlinear data manifold through a linear operation such as seasonal detrending. These results should be useful in several disciplines where data detrending is common practice. Overall, the theoretical perspective in this paper complements our earlier work [15], emphasizing the role of regularity of the temporal basis functions on the nonlinear data manifold.
The plan of this paper is as follows. In Sec. 2, we describe the theoretical framework of NLSA algorithms. In Sec. 3, we apply this framework to the upper-ocean temperature in the North Pacific sector of CCSM3. We discuss the implications of these results in Sec. 4, and conclude in Sec. 5. A movie showing the dynamical evolution of the spatiotemporal patterns is provided as Additional Supporting Information.
2 Theoretical framework
We consider that we have at our disposal $s$ samples of a time series $x(t)$ of an $n$-dimensional variable, sampled uniformly with time step $\delta t$. Here, $x(t)$ is generated by a dynamical system, but observations of $x$ alone are not sufficient to uniquely determine the state of the system in phase space; i.e., our observations are incomplete. For instance, in Section 3 ahead, $x$ will be a depth-averaged ocean temperature field restricted to the North Pacific sector of CCSM3. Our objective is to produce a decomposition of $x(t)$ into a set of spatiotemporal patterns,
$$ x(t_i) = \sum_{m} \hat{x}_m(t_i), \qquad (1) $$
taking explicitly into account the fact that the underlying trajectory of the dynamical system lies on a nonlinear manifold in phase space.
2.1 Overview of NLSA
The methodology employed here to address this objective consists of four basic steps: (1) Embed the observed data in a vector space of dimension greater than $n$ via the method of delays; (2) construct a linear map taking a Hilbert space of scalar functions on the data manifold $\mathcal{M}$ (representing temporal patterns) to the corresponding spatial patterns in embedding space; (3) perform an SVD in a basis of orthonormal Laplacian eigenfunctions to extract the spatial and temporal modes associated with $A_l$; (4) project the modes from embedding space to physical space to obtain the spatiotemporal patterns in (1). Below, we provide a description of each step. Further details of the procedure, as well as pseudocode, are presented in [15]. Hereafter, we shall consider that $\mathcal{M}$ is compact and smooth, so that a well-defined spectral theory exists [27]. Even though these conditions may not be fulfilled in practice, eventually we will pass to a discrete, graph-theoretic description [28], where smoothness is not an issue.
Step (1) is familiar from the qualitative theory of dynamical systems [29, 30, 22, 31]. Under generic conditions, the image of the phase-space attractor in embedding space under the delayed-coordinate mapping,
$$ X(t) = \big( x(t),\, x(t - \delta t),\, \ldots,\, x(t - (q - 1)\,\delta t) \big), \qquad (2) $$
lies on a manifold $\mathcal{M}$ which is diffeomorphic to the attractor (i.e., indistinguishable from it from the point of view of differential geometry), provided that the dimension $nq$ of the embedding space is sufficiently large. Thus, given a sufficiently long embedding window spanned by the $q$ lags, we obtain a representation of the nonlinear manifold underlying our incomplete observations, which can be thought of as a curved hypersurface in Euclidean space $\mathbb{R}^{nq}$. That hypersurface inherits a Riemannian metric (i.e., an inner product between tangent vectors on $\mathcal{M}$ constructed from the canonical inner product of $\mathbb{R}^{nq}$) and a corresponding Riemannian measure $\mu$.
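As a concrete illustration, the delayed-coordinate mapping in (2) can be sketched as follows (a minimal example; the function name and array layout are our own, not part of the original method description):

```python
import numpy as np

def delay_embed(x, q):
    """Map an n-dimensional time series x (array of shape s_total x n) to
    delay-coordinate vectors X(t_i) = (x(t_i), x(t_{i-1}), ..., x(t_{i-q+1})).
    Returns an array of shape (s, n*q) with s = s_total - q + 1 samples."""
    s_total, n = x.shape
    s = s_total - q + 1
    X = np.empty((s, n * q))
    for i in range(s):
        # Concatenate q consecutive snapshots, most recent first.
        X[i] = x[i:i + q][::-1].reshape(-1)
    return X

# Toy example: a scalar time series with embedding window q = 3.
x = np.arange(6.0).reshape(-1, 1)
X = delay_embed(x, 3)
print(X.shape)  # (4, 3)
```

Each row of the output concatenates the current snapshot with its $q - 1$ predecessors, so the number of usable samples shrinks by $q - 1$.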
Steps (2) and (3) effectively constitute a generalization of SSA, adapted to nonlinear datasets. First, recall that SSA [10] views the data matrix
$$ X = \big( X_1\ \ X_2\ \ \cdots\ \ X_s \big), \qquad X_i = X(t_i), \qquad (3) $$
(an $nq \times s$ matrix for $s$ samples in $nq$-dimensional embedding space) as a linear map $A$ from the space $\mathbb{R}^s$ of temporal patterns (the so-called chronos space) to the space $\mathbb{R}^{nq}$ of spatial patterns (the topos space), defined as
$$ A f = \sum_{i=1}^{s} f_i\, X_i, \qquad f = (f_1, \ldots, f_s) \in \mathbb{R}^s. \qquad (4) $$
That is, the spatial pattern $Af$ corresponding to the temporal pattern $f$ is given by a sum of the data points $X_i$ weighted by the components $f_i$. Depending on the application at hand, $\mathbb{R}^s$ may be replaced by a more general Hilbert space of functions over the set of temporal observations. In either case, SSA produces a spatiotemporal decomposition of the signal through SVD of the linear map $A$, viz.
$$ A = \sum_{k} \sigma_k\, u_k\, v_k^{\mathsf{T}}, \qquad \sigma_1 \geq \sigma_2 \geq \cdots \geq 0, \qquad (5) $$
with
$$ u_k^{\mathsf{T}} u_m = \delta_{km}, \qquad v_k^{\mathsf{T}} v_m = \delta_{km}. \qquad (6) $$
This leads to a rank decomposition of the signal through
$$ X_i = \sum_{k} \sigma_k\, v_{ki}\, u_k, \qquad (7) $$
where $v_{ki}$ is the component of $v_k$ associated with time $t_i$. Similarly to (4), the spatial pattern $u_k$ corresponding to $v_k$ is given by a weighted sum of the input data,
$$ u_k = \frac{1}{\sigma_k} \sum_{i=1}^{s} v_{ki}\, X_i. \qquad (8) $$
Geometrically, the $u_k$ are the principal axes of the covariance ellipsoid of the data, and the $v_k$ are linear projections of the data onto those axes.
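In code, the SSA decomposition described above amounts to an SVD of the lagged data matrix; the following minimal sketch (our own naming, with rows of X as samples in embedding space) recovers the topos, chronos, and singular values:

```python
import numpy as np

def ssa_modes(X):
    """SSA via SVD of the lagged data matrix.  X has shape (s, nq): one row
    per sample in embedding space.  Returns spatial patterns (columns of U,
    the topos), temporal patterns (columns of V, the chronos), and the
    singular values in decreasing order."""
    U, sigma, Vt = np.linalg.svd(X.T, full_matrices=False)
    return U, Vt.T, sigma

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 12))      # s = 100 samples, nq = 12
U, V, sigma = ssa_modes(X)
# Rank decomposition: the sum of rank-one terms sigma_k u_k v_k^T
# reconstructs the data matrix exactly.
X_rec = (U * sigma) @ V.T
print(np.allclose(X_rec, X.T))  # True
```

Because the chronos modes here are obtained by purely linear projection, they carry no information about the curvature of the underlying data manifold, which is exactly the limitation the next paragraph addresses.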
Even though the temporal patterns determined by SSA are well-behaved functions of time [because they are square-integrable functions in the time domain], they exhibit no notion of regularity on the nonlinear data manifold. That is, the corresponding temporal patterns in SSA (acquired through linear projections) may develop rapid oscillations which are not related to the intrinsic geometry of the nonlinear data manifold (e.g., if the embedding of $\mathcal{M}$ in $\mathbb{R}^{nq}$ produces folds). In NLSA, geometrical regularity is viewed as an essential ingredient of an efficient description of high-dimensional complex data, and is enforced by replacing the chronos space of SSA with function spaces on the data manifold of sufficient smoothness. More specifically, the temporal modes in NLSA have well-behaved derivatives on the data manifold, in the sense that $\| \operatorname{grad} f \|^2 = g^{ij}\, \partial_i f\, \partial_j f$ is integrable on $\mathcal{M}$. Here and in (9), summation over the tensorial indices $i$ and $j$ is over the dimensions of $\mathcal{M}$, $g^{ij}$ are the components of the inverse Riemannian metric, and $\operatorname{grad}$ is the gradient operator associated with the Riemannian metric of $\mathcal{M}$. Moreover, we assume for now that $\mathcal{M}$ is smooth and compact, so that a well-defined spectral theory exists. Later, we will pass to a graph-theoretic description, where smoothness is not an issue.
A natural set of basis functions possessing this type of regularity are the eigenfunctions of the Laplace–Beltrami operator $\Delta$, defined as [27]
$$ \Delta f = -\frac{1}{\sqrt{\det g}}\, \partial_i \big( \sqrt{\det g}\; g^{ij}\, \partial_j f \big), \qquad (9) $$
with $g$ the Riemannian metric, and $f$ an element of the Hilbert space $L^2(\mathcal{M}, \mu)$ of square-integrable scalar functions on $\mathcal{M}$ with inner product inherited from the Riemannian measure $\mu$ [see (11) ahead]. The eigenfunctions $\phi_k$ of $\Delta$ are solutions of the eigenvalue problem
$$ \Delta \phi_k = \lambda_k\, \phi_k \qquad (10) $$
(together with appropriate boundary conditions if $\mathcal{M}$ has boundaries) with $0 = \lambda_0 \leq \lambda_1 \leq \lambda_2 \leq \cdots$. Moreover, the $\phi_k$ can be chosen to be orthonormal with respect to the inner product of $L^2(\mathcal{M}, \mu)$, i.e.,
$$ \langle \phi_j, \phi_k \rangle = \int_{\mathcal{M}} \phi_j\, \phi_k \, d\mu = \delta_{jk}. \qquad (11) $$
Let $V_l$ be the $l$-dimensional subspace of $L^2(\mathcal{M}, \mu)$ spanned by the leading $l$ eigenfunctions from (10) meeting the orthonormality condition in (11); i.e.,
$$ V_l = \operatorname{span}\{ \phi_0, \phi_1, \ldots, \phi_{l-1} \}. \qquad (12) $$
These spaces have the following important properties.

As $l \to \infty$, $V_l$ provides a dense coverage of $L^2(\mathcal{M}, \mu)$. Moreover, if $\mathcal{M}$ is a differentiable manifold, every basis element of $V_l$ is a smooth function with bounded covariant derivative. Heuristically, $l$ may be thought of as a parameter controlling the scale on the data manifold resolved by the eigenfunctions spanning $V_l$.

For sufficiently small $l$ and a large enough number of samples $s$, the values of the $\phi_k$ on the discrete samples of the data manifold in (3) can be computed efficiently using sparse graph-theoretic algorithms [17, 32, 18]. Even if $\mathcal{M}$ is not a smooth manifold (as is frequently the case in practice), the leading few eigenfunctions determined by graph-theoretic analysis can be associated with some smooth coarse-grained manifold reflecting the large-scale nonlinear geometry of the data.
Note that $l$ is in general not equal to the dimension of $\mathcal{M}$.
In NLSA, the family of subspaces $V_l$ with $l$ between one and some upper limit is employed to construct temporal spaces analogous to the chronos space in classical SSA. Specifically, given a trajectory $X_1, \ldots, X_s$ on the data manifold, a function
$$ f = \sum_{k=0}^{l-1} c_k\, \phi_k, \qquad (13) $$
with expansion coefficients $c_k$, gives rise to a temporal process $f(X_i)$ for $i = 1, \ldots, s$. Thus, introducing the Hilbert space of temporal patterns with weighted inner product
$$ \langle f, h \rangle_\mu = \sum_{i=1}^{s} \mu_i\, f(X_i)\, h(X_i), \qquad (14) $$
the $l$th temporal space in NLSA is the $l$-dimensional subspace consisting of temporal patterns generated by functions of the form in (13).
Similarly to (4), for each $l$ there exists a linear map $A_l$ linking the temporal modes to the spatial modes through the formula
$$ A_l f = \sum_{i=1}^{s} \mu_i\, f(X_i)\, X_i. \qquad (15) $$
Let $\{ \phi_0, \ldots, \phi_{l-1} \}$ be the orthonormal basis of $V_l$ from (10), and $\{ e_1, \ldots, e_{nq} \}$ an orthonormal basis of $\mathbb{R}^{nq}$. The matrix elements of $A_l$ with this choice of bases are
$$ [A_l]_{jk} = \langle e_j, A_l \phi_k \rangle = \sum_{i=1}^{s} \mu_i\, \phi_k(X_i)\, \langle e_j, X_i \rangle, \qquad (16) $$
where $\langle \cdot, \cdot \rangle$ is the canonical inner product of $\mathbb{R}^{nq}$, $j \in \{1, \ldots, nq\}$, and $k \in \{0, \ldots, l-1\}$. Computing the SVD of the matrix $[A_l]_{jk}$ then leads to a spectral decomposition of $A_l$ analogous to (5), viz.
$$ [A_l]_{jk} = \sum_{m} U_{jm}\, \sigma_m\, V_{km}, \qquad (17) $$
where $U_{jm}$ and $V_{km}$ are elements of matrices with orthonormal columns, and $\sigma_m$ are singular values (ordered by decreasing magnitude). Each column $u_m$ of $U$ is a spatial pattern in $\mathbb{R}^{nq}$ expanded in the $\{ e_j \}$ basis. Moreover, the entries $V_{km}$ are the expansion coefficients of the corresponding temporal pattern in the $\{ \phi_k \}$ basis for $V_l$, leading to the temporal process
$$ v_m(t_i) = \sum_{k=0}^{l-1} V_{km}\, \phi_k(X_i). \qquad (18) $$
Thus, the decomposition of the signal in terms of the chronos and topos modes of $A_l$, analogous to (7), is
$$ X_i \approx \sum_{m} \sigma_m\, v_m(t_i)\, u_m. \qquad (19) $$
Note that by completeness of the Laplace–Beltrami eigenfunctions, $A_l$ converges to $A$ as $l \to \infty$. Moreover, the spatial and temporal covariance operators in NLSA can be expressed as convolutions of the spatial and temporal two-point correlation functions with the spectral kernel, weighted by the Riemannian measure; see [15] for details.
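The operator construction and SVD steps above can be sketched numerically as follows, assuming the eigenvector values $\phi_k(X_i)$ and the measure $\mu_i$ have already been obtained from the graph-theoretic analysis described later (variable names and shapes are our own, for illustration only):

```python
import numpy as np

def nlsa_svd(X, phi, mu):
    """NLSA SVD step (sketch).  X: data in embedding space, shape (s, nq);
    phi: leading l Laplace-Beltrami eigenvectors sampled on the data,
    shape (s, l), orthonormal w.r.t. the measure mu (length s, summing
    to one).  Returns topos U (columns), chronos v (columns), sigma."""
    # Matrix elements [A_l]_{jk} = sum_i mu_i phi_k(X_i) <e_j, X_i>.
    A = X.T @ (phi * mu[:, None])              # shape (nq, l)
    U, sigma, Wt = np.linalg.svd(A, full_matrices=False)
    # Temporal processes: chronos modes expanded in the eigenfunction basis.
    v = phi @ Wt.T                             # shape (s, l)
    return U, v, sigma

# Synthetic check with a basis that is orthonormal w.r.t. a uniform measure.
rng = np.random.default_rng(1)
s, nq, l = 50, 8, 5
mu = np.full(s, 1.0 / s)
Q, _ = np.linalg.qr(rng.standard_normal((s, l)))
phi = Q * np.sqrt(s)                # sum_i mu_i phi_ik phi_im = delta_km
X = rng.standard_normal((s, nq))
U, v, sigma = nlsa_svd(X, phi, mu)
```

By construction, the chronos modes returned this way remain orthonormal with respect to the measure-weighted inner product of (14).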
To complete the procedure, in step (4) the spatial patterns $u_m$ are projected to $n$-dimensional physical space by writing
$$ u_m = \big( u_m^{(1)}, u_m^{(2)}, \ldots, u_m^{(q)} \big), \qquad u_m^{(j)} \in \mathbb{R}^n, \qquad (20) $$
and taking the average,
$$ \hat{x}_m(t_i) = \frac{\sigma_m\, v_m(t_i)}{q} \sum_{j=1}^{q} u_m^{(j)}. \qquad (21) $$
This leads to the decomposition in (1).
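The lag-averaging projection back to physical space can be sketched as a simple block average (an assumed concrete form of the averaging step described above; the helper name is hypothetical):

```python
import numpy as np

def project_to_physical(u, n, q):
    """Project a spatial (topos) pattern u living in nq-dimensional
    embedding space back to n-dimensional physical space by averaging
    its q lagged blocks of length n."""
    assert u.size == n * q
    return u.reshape(q, n).mean(axis=0)

u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # nq = 6 with n = 2, q = 3
print(project_to_physical(u, 2, 3))  # [3. 4.]
```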
2.2 Graphtheoretic analysis
In applications, the Laplace–Beltrami eigenfunctions for a finite dataset are computed by replacing the manifold $\mathcal{M}$ by a weighted graph with vertices at the samples $X_1, \ldots, X_s$, and solving the eigenproblem of a transition probability matrix $P$ defined on the vertex set of the graph, such that, for a large enough number of samples and a small enough kernel width, the right eigenvectors of $P$ approximate the corresponding Laplace–Beltrami eigenfunctions in (10); i.e.,
$$ \sum_{j=1}^{s} P_{ij}\, \phi_k(X_j) = \Lambda_k\, \phi_k(X_i), \qquad (22) $$
with $\Lambda_k$ the eigenvalues of $P$, ordered so that $1 = \Lambda_0 \geq \Lambda_1 \geq \cdots$. These eigenvectors satisfy an orthonormality condition which is the discrete analog of (11),
$$ \sum_{i=1}^{s} \mu_i\, \phi_k(X_i)\, \phi_m(X_i) = \delta_{km}, \qquad (23) $$
with the weights $\mu_i$ given by the invariant measure (leading left eigenvector) of $P$; namely,
$$ \sum_{i=1}^{s} \mu_i\, P_{ij} = \mu_j, \qquad (24) $$
where $\mu_j \geq 0$, $\sum_{j=1}^{s} \mu_j = 1$, and $j \in \{1, \ldots, s\}$.
In the present work, we evaluate $P$ using the diffusion map (DM) algorithm of Coifman and Lafon [18], with a simple but important modification in the calculation of the Gaussian weight matrix. Specifically, we assign to each sample $X_i$ a local velocity in embedding space, $\xi_i = X_i - X_{i-1}$, and evaluate the Gaussian weights
$$ W_{ij} = \exp\!\left( -\frac{\| X_i - X_j \|^2}{\epsilon\, \| \xi_i \|\, \| \xi_j \|} \right), \qquad (25) $$
where $\| \xi_i \|$ denotes the norm of $\xi_i$. This approach was motivated by the clustering algorithm developed in [33], with the difference that in the latter paper the local scaling factor is evaluated using spatial nearest neighbors, rather than the temporal nearest neighbors employed here. Locally-scaled kernels have also been used in other works in the literature; e.g., [21, 34].
In the standard implementation of DM, the kernel width $\epsilon$ must be sufficiently small in order for the diffusion process represented by $P$ to be sensitive only to local neighborhoods around each data point. Here, the normalization by $\| \xi_i \|\, \| \xi_j \|$ enforces geometrical localization even for moderate values of $\epsilon$. In [15], we found that this type of adaptive kernel significantly enhances the capability of NLSA to capture rare events. The remaining standard steps needed to compute $P$ given the weights $W_{ij}$ are [18]
$$ Q_i = \sum_{j=1}^{s} W_{ij}, \qquad (26a) $$
$$ \tilde{W}_{ij} = \frac{W_{ij}}{Q_i^{\alpha}\, Q_j^{\alpha}}, \qquad (26b) $$
$$ D_i = \sum_{j=1}^{s} \tilde{W}_{ij}, \qquad (26c) $$
$$ P_{ij} = \frac{\tilde{W}_{ij}}{D_i}. \qquad (26d) $$
Because the Riemannian measure in the DM family of algorithms is given by
$$ \mu_i = \frac{D_i}{\sum_{j=1}^{s} D_j}, \qquad (27) $$
for a given choice of the parameter $\alpha$, states with large kernel weights (in particular, states with large local velocity norms $\| \xi_i \|$) acquire larger Riemannian measure than in the uniform case. Hereafter, we work with the so-called Laplace–Beltrami normalization, $\alpha = 1$ [18]. Even though we have not performed such tests, we believe that suitable basis functions for NLSA may be constructed using a variety of other graph Laplacian algorithms (e.g., [17, 35]), so long as those algorithms permit the use of time-adaptive weights of the form of (25).
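Putting (25) and (26a)–(26d) together, a brute-force sketch of the adaptive-kernel diffusion-map construction is given below (our own minimal implementation for illustration; parameter values are arbitrary, and the first velocity is reused for the initial sample):

```python
import numpy as np

def diffusion_eigs(X, eps=2.0, alpha=1.0, l=10):
    """Sketch of the diffusion-map construction with a time-adaptive
    kernel: Gaussian widths scaled by the norms of local velocities
    xi_i = X_i - X_{i-1} in embedding space."""
    # Local velocity norms; reuse the first difference for sample 0.
    xi = np.linalg.norm(np.diff(X, axis=0), axis=1)
    xi = np.concatenate([[xi[0]], xi])
    # Pairwise squared distances and adaptive Gaussian weights, as in (25).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (eps * np.outer(xi, xi)))
    Q = W.sum(axis=1)
    W_tilde = W / np.outer(Q, Q) ** alpha     # alpha-normalization
    D = W_tilde.sum(axis=1)
    P = W_tilde / D[:, None]                  # row-stochastic Markov matrix
    # Right eigenvectors of P approximate Laplace-Beltrami eigenfunctions.
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)[:l]
    return evals.real[order], evecs.real[:, order]

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))
evals, evecs = diffusion_eigs(X, l=5)
print(round(evals[0], 6))  # leading eigenvalue of a Markov matrix is 1
```

The leading eigenvalue of the row-stochastic matrix is exactly one, with a constant right eigenvector, which provides a quick sanity check on an implementation.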
The scalability of this class of algorithms to large problem sizes has been widely demonstrated in the machine learning and data mining literature. In particular, as a result of the Gaussian decay of the weights, the matrix used in implementations is made sparse, e.g., by truncating to the $b$ largest nonzero elements per row with $b \ll s$, significantly reducing the cost of the eigenvalue problem for $P$. The least-favorable scaling involves the pairwise distance calculation between the data samples in embedding space, which scales quadratically with $s$ if done by brute force. Despite the quadratic scaling with $s$, the linear scaling with the ambient-space dimension is of key importance, as it guarantees that NLSA does not suffer from a “curse of dimension” as do neural-network-based methods [7]. Moreover, a more favorable scaling may be realized in the pairwise-distance calculation if the dimension of $\mathcal{M}$ is small enough for approximate tree-based algorithms to operate efficiently [36]. Note also that the number of elements of the matrix (17) appearing in the SVD step of NLSA is smaller by a factor of $s/l$ compared to the full data matrix (3) used in SSA, resulting in significant efficiency gains in practical applications with $l \ll s$. In the present study, all calculations were performed on a desktop workstation using brute-force evaluation of pairwise distances.
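The per-row truncation mentioned above can be sketched as follows (a generic kNN-style sparsification; the symbol b for the number of retained entries per row is our notation):

```python
import numpy as np

def knn_sparsify(W, b):
    """Keep only the b largest entries in each row of a dense, nonnegative
    kernel matrix W, then symmetrize by keeping an edge if either endpoint
    retains it.  The result can be stored sparsely, reducing the cost of
    the subsequent eigenvalue problem."""
    W_s = np.zeros_like(W)
    rows = np.arange(W.shape[0])[:, None]
    idx = np.argsort(-W, axis=1)[:, :b]
    W_s[rows, idx] = W[rows, idx]
    return np.maximum(W_s, W_s.T)

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 2))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-d2)                  # dense Gaussian kernel matrix
W_s = knn_sparsify(W, 3)
print(np.allclose(W_s, W_s.T))   # True: the sparsified matrix is symmetric
```

Symmetrizing with the elementwise maximum keeps the truncated graph undirected, so the subsequent normalizations behave as in the dense case.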
2.3 Selecting the dimension of $V_l$ via spectral entropy
An important question concerning parameter selection in NLSA is how to choose the parameter $l$ controlling the dimension of the temporal spaces $V_l$. Clearly, working at large $l$ is desirable, since the lengthscale on the data manifold resolved by the eigenfunctions spanning $V_l$ generally becomes smaller as $l$ grows. However, for a given finite number of samples $s$, the approximation error in the eigenvectors of the graph Laplacian in (22) also increases with $l$ [18]. In other words, the eigenfunctions determined through graph-theoretic analysis will generally depend more strongly on the particular finite sample for large $l$, resulting in an overfit of the discrete data manifold. Thus, in practical applications it is important to establish criteria that allow one to determine a reasonable tradeoff between improved resolution and risk of overfitting.
One way of addressing this issue is to monitor the growth of an operator norm for $A_l$ with $l$, seeking plateau or elbow-shaped behavior. A standard choice of operator norm in this context is the Frobenius norm, which may be evaluated conveniently via the matrix elements in (16), viz.
$$ \| A_l \|_F = \Big( \sum_{j,k} [A_l]_{jk}^2 \Big)^{1/2} = \Big( \sum_{m} \sigma_m^2 \Big)^{1/2}. \qquad (28) $$
However, as we will see below, this approach may lead to considerable uncertainty in the choice of $l$. Instead, we find that a more effective method is to choose $l$ by monitoring changes in the distribution of the singular values $\sigma_k$ with $l$. In particular, we propose to assess the behavior of the spectrum of $A_l$ as $l$ varies via a spectral entropy measure, as follows.
Let
$$ E_k^{(l)} = \frac{\sigma_k^2}{\sum_{k'=1}^{l} \sigma_{k'}^2}, \qquad k = 1, \ldots, l, \qquad (29) $$
be normalized weights measuring the distribution of energy among the spatiotemporal modes of $A_l$, which, as usual, are ordered by decreasing $\sigma_k$. Consider also the energy distribution $\tilde{E}^{(l)}$ over $l$ modes determined by replicating $E_k^{(l-1)}$ for $k < l$, and setting the energy in the $l$th mode equal to $E_l^{(l)}$. That is, we have
$$ \tilde{E}_k^{(l)} = c\, E_k^{(l-1)} \ \ (k = 1, \ldots, l-1), \qquad \tilde{E}_l^{(l)} = c\, E_l^{(l)}, \qquad (30) $$
where $c$ is a normalization constant ensuring $\sum_{k=1}^{l} \tilde{E}_k^{(l)} = 1$.
Here, we measure the change in the spectrum of $A_l$ relative to that of $A_{l-1}$ through the relative entropy between the energy distributions $E^{(l)}$ and $\tilde{E}^{(l)}$, i.e.,
$$ \delta_l = D\big( E^{(l)} \,\big\|\, \tilde{E}^{(l)} \big) = \sum_{k=1}^{l} E_k^{(l)} \log \frac{E_k^{(l)}}{\tilde{E}_k^{(l)}}. \qquad (31) $$
It is a standard result in information theory and statistics [37] that $\delta_l$ is a nonnegative quantity which vanishes if and only if $E_k^{(l)} = \tilde{E}_k^{(l)}$ for all $k$. In Sec. 4.2 we demonstrate that as $l$ grows, $\delta_l$ exhibits a sequence of spikes at small to moderate $l$ (as qualitatively new features appear in the spectrum of $A_l$), until it eventually settles to small values at higher $l$. The practical criterion proposed here is to set $l$ to values near that transition.
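A minimal numerical sketch of this criterion follows; it assumes (our reading of the construction above) that the reference distribution is formed by appending the newest mode's energy to the previous spectrum and renormalizing:

```python
import numpy as np

def energy_dist(sigma):
    """Normalized energies E_k = sigma_k^2 / sum_k' sigma_k'^2."""
    E = sigma**2
    return E / E.sum()

def spectral_entropy(sigma_prev, sigma_curr):
    """Relative entropy D(E_curr || E_ref) between the spectrum at
    truncation l and a reference built from the spectrum at l - 1:
    previous energies with the new mode's energy appended, then
    renormalized (an assumed form of the construction in the text)."""
    E_curr = energy_dist(sigma_curr)
    E_prev = energy_dist(sigma_prev)
    E_ref = np.concatenate([E_prev, [E_curr[-1]]])
    E_ref = E_ref / E_ref.sum()
    return float(np.sum(E_curr * np.log(E_curr / E_ref)))

# Nearly identical spectra yield (near-)zero entropy: no new information.
sig = np.array([3.0, 2.0, 1.0])
print(spectral_entropy(sig, np.append(sig, 1e-6)) < 1e-6)  # True
```

In practice one would evaluate this quantity at each successive truncation level and look for the value beyond which it stays small.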
3 Spatiotemporal patterns in the North Pacific sector of a comprehensive climate model
3.1 Dataset description
We apply the NLSA framework presented above to study variability in the North Pacific sector of CCSM3; specifically, variability of the mean upper 300 m sea temperature field in the 700 y equilibrated control integration used by Teng and Branstator [4] and Branstator and Teng [25] in work on the initial- and boundary-value predictability of subsurface temperature in that model. Here, our objective is to diagnose the prominent modes of variability in a time series generated by a coupled general circulation model. In this analysis, the observable is the mean upper 300 m temperature field sampled every month at the gridpoints of the model grid (native ocean grid mapped to the model's T42 atmospheric grid) in the region 20°N–65°N and 120°E–110°W.
3.2 Spatiotemporal patterns revealed by NLSA
Deferring a discussion of parameter selection to Sec. 4, we begin with a description of the spatiotemporal patterns determined via NLSA using a two-year lagged-embedding window and the leading $l$ Laplace–Beltrami eigenfunctions as basis functions for the temporal space $V_l$. The dimensions of the spatial and temporal spaces are then $nq$ and $l$, respectively. Throughout, we work with canonical Euclidean distances between vectors in embedding space,
$$ \| X_i - X_j \|^2 = \sum_{k=1}^{nq} \big( X_i^{(k)} - X_j^{(k)} \big)^2, \qquad (32) $$
where $X_i^{(k)}$ denotes the $k$th component of the system state at time $t_i$ in embedding space. For the evaluation of the graph-Laplacian eigenfunctions in (22) we nominally fix the adaptive kernel width $\epsilon$ and the number $b$ of edges per data point.
We display the singular values of the resulting operator $A_l$ in Fig. 1(a). To verify the robustness of our results, we evaluated the spectrum of $A_l$ for various values of $\epsilon$, $b$, and $l$ around their nominal values. In the examples shown in Fig. 1(b), reducing $b$ to 1000 with $\epsilon$ and $l$ fixed to their nominal values causes some reshuffling of the modes, but imparts little change to the singular values of the modes which are not reshuffled. Increasing $l$ to 47 reestablishes the mode ordering to some extent, bringing the two spectra to a near overlap for the first 17 modes. We will return to the question of setting $l$ in Sec. 4.2 ahead, but for the time being note that the modes which are most sensitive to changes of $b$ and/or $l$ in Fig. 1 lie on the SSA branch of the spectrum, i.e., they are modes carrying little spectral information in the sense of the relative-entropy metric in (31). In general, the properties of the leading modes in all three families described below exhibit small sensitivity to changes of $\epsilon$, $b$, and $l$ in the examined ranges, with $l$ selected according to the spectral entropy criterion.
For the remainder of Sec. 3, we work with the nominal values of $\epsilon$, $b$, and $l$. Studying the corresponding temporal patterns $v_m(t_i)$ from (18) in both the temporal and frequency (Fourier) domains, we find that the modes fall into three distinct families of periodic, low-frequency, and intermittent modes, illustrated in Fig. 2 and described below. The resulting spatiotemporal patterns from (1) are shown in Fig. 3 and, more clearly, in the dynamical evolution in Movie S1. Note that the time-lagged embedding in (2) is essential to the separability of the modes into these families; we will return to this point in Sec. 4.3.
Periodic modes. The periodic modes come in doubly degenerate pairs (see Fig. 1), with the two members of each pair in quadrature, and have the structure of sinusoidal waves with frequencies equal to integer multiples of $1\,\mathrm{y}^{-1}$. The leading pair of periodic modes represents the annual (seasonal) cycle in the data. In the physical domain [Fig. 3(c) and Movie S1(c)], these modes generate an annual oscillation of temperature anomalies, whose amplitude is largest in the western part of the basin and at midlatitudes. The second pair of periodic modes has semiannual variability. These modes exhibit significant amplitude in the western part of the domain [Fig. 3(e) and Movie S1(e)], but also along the West Coast of North America, which is consistent with semiannual variability of the upper ocean associated with the California current [38]. Together with the higher-frequency overtones, the modes in this family are the standard eigenfunctions of the Laplacian on the circle, suggesting that the data manifold has the geometry of a circle along one of its dimensions.
Low-frequency modes. The low-frequency modes are characterized by high spectral power over interannual to interdecadal timescales, and strongly suppressed power over annual or shorter timescales. As a result, these modes represent the low-frequency variability of the upper ocean, which has been well studied in the North Pacific sector of CCSM3 [24, 25, 4]. The leading mode in this family [see Fig. 2(b)] gives rise to a typical PDO pattern [Fig. 3(c) and Movie S1(c)], where the most prominent basin-scale structure is a horseshoe-like temperature anomaly pattern developing eastward along the Kuroshio current, together with an anomaly of the opposite sign along the west coast of North America. The higher modes in this family gradually develop smaller spatial features and spectral content over shorter timescales than the leading mode, but have no spectral peaks on annual or shorter timescales.
Intermittent modes. As illustrated in Fig. 3(f) and Movie S1(f), the key feature of modes in this family is temporal intermittency, arising out of oscillations at annual or higher frequency, which are modulated by relatively sharp envelopes with a temporal extent in the 2–10-year range. Like their periodic counterparts, the intermittent modes form nearly degenerate pairs (see Fig. 1), and their base frequency of oscillation is an integer multiple of $1\,\mathrm{y}^{-1}$. The resulting Fourier spectrum is dominated by a peak centered at the base frequency, exhibiting some skewness towards lower frequencies.
In the physical domain, these modes describe processes with relatively fine spatial structure, which are activated during the intermittent bursts, and become quiescent when the amplitude of the envelopes is small. The most physically recognizable aspect of these processes is enhanced transport along the Kuroshio current region, shown for the leading two intermittent modes in Fig. 3(d). This process features sustained eastward propagation of small-scale temperature anomalies during the intermittent bursts. The intermittent modes higher in the spectrum also encode rich spatiotemporal patterns, including retrograde (westward) propagating anomalies, and gyre-like patterns resembling the subpolar and subtropical gyres. These features are shown in Movie S1(f), which displays a composite temperature anomaly field consisting of the leading four intermittent modes (see Fig. 1).
4 Discussion
4.1 Intermittent processes and relation to SSA
The main result of this analysis, which highlights the importance of taking explicitly into account the nonlinear structure of complex high-dimensional datasets, is the existence of intermittent patterns of variability in the North Pacific sector of CCSM3, which are not accessible through SSA. This type of variability naturally emerges by restricting the temporal modes to lie in the low-dimensional subspaces $V_l$ spanned by the leading Laplace–Beltrami eigenfunctions on the data manifold $\mathcal{M}$. The inner product of these vector spaces, weighted by the Riemannian measure $\mu$ in (14), plays an important role in the skill of NLSA in capturing intermittency and rare events [15]. Heuristically, this is because $\mu$ weighs each state by the volume it occupies on the data manifold, which is expected to be large for rare and/or intermittent states.
As shown in Figs. 1 and 3, the spatiotemporal patterns determined through NLSA are in close agreement with SSA for the annual and low-frequency modes, but the intermittent modes have no SSA counterparts. In particular, instead of the qualitatively distinct families of processes described above, the SSA spectrum is characterized by a smooth decay involving modes of progressively higher spatiotemporal frequencies, but with no intermittent behavior analogous, e.g., to the intermittent mode in Fig. 2. The singular values associated with the intermittent modes and, correspondingly, their contribution to the variance of the temperature anomalies, are significantly smaller than those of the periodic or low-frequency modes. However, this is not to say that the dynamical significance of these modes is negligible. In fact, intermittent events carrying low variance are widely prevalent features of complex dynamical systems [13, 14, 15]. Being able to capture this intrinsically nonlinear behavior constitutes one of the major strengths of the NLSA methodology presented here.
4.2 Selecting the temporal-space dimension through spectral entropy
As discussed in Sec. 2.3, an important issue in NLSA is the selection of the dimension $l$ of the temporal space through the number of Laplace–Beltrami eigenfunctions used in (12). Setting $l$ too low will cause some important features to be lost, due to under-resolution on the data manifold. On the other hand, as $l$ grows, eventually the algorithm will overfit the data if the number of samples $s$ remains fixed. Here, we illustrate how the spectral entropy measure $\delta_l$ introduced in (31) can be used to inform the selection of suitable values of $l$. For ease of comparison, we normalize $\delta_l$ to lie in the unit interval by applying the standard transformation [39]
(33) 
Figure 4 shows the dependence of the normalized spectral entropy, as well as the Frobenius norm of the linear operators in NLSA, for a range of temporal-space dimensions and for several embedding windows (including 2 and 4 y). As expected, the norm increases with the dimension and, apart from one case, rapidly approaches values within 90% of its maximum. Compared with the corresponding behavior of the norm in truncated SSA expansions (also shown in Fig. 4), the NLSA norm follows a more staircase-like growth; but because the operator norm is dominated by the few leading singular values, which carry the majority of the energy of the signal, it is not immediately obvious when the temporal-space dimension has reached an optimal value. The spectral entropy measure, on the other hand, clearly transitions from a regime of sawtooth behavior with appreciable magnitude at small to moderate dimensions to a regime of negligible amplitude at larger values. That transition, which occurs at characteristic values of the dimension for the NLSA cases with 2 and 4 y embedding windows (and at different values for the case with no embedding and for SSA), indicates that increasing the dimension of the temporal space beyond those values introduces no qualitatively new spatiotemporal patterns. This justifies the truncation level used in Sec. 3. As illustrated in Fig. 5, the spectral gaps separating the low-frequency, semiannual, and intermittent modes are absent from the spectrum when the dimension significantly exceeds the threshold identified via the entropy measure.
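The role of an entropy-based truncation criterion can be illustrated with a small numerical sketch. The functions below are not the paper's exact definitions in (31) and (33): `spectral_entropy` computes the Shannon entropy of the normalized singular-value energies, and `to_unit_interval` applies one standard normalization in the spirit of [39]; all names are illustrative.

```python
import numpy as np

def spectral_entropy(sigma):
    """Shannon entropy of the normalized singular-value energies.

    p_k = sigma_k**2 / sum(sigma**2).  A spectrum dominated by a few
    modes has low entropy; a flat spectrum of k modes attains the
    maximum value log(k).
    """
    p = sigma**2 / np.sum(sigma**2)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p))

def to_unit_interval(D):
    """Map an entropy-like quantity D >= 0 into [0, 1)."""
    return np.sqrt(1.0 - np.exp(-2.0 * D))

# a concentrated spectrum versus a flat one
print(spectral_entropy(np.array([10.0, 0.1, 0.1])))  # small
print(spectral_entropy(np.ones(8)))                  # log(8) ~ 2.08
```

Scanning such a measure as the truncation level grows, and looking for the value beyond which it stops changing appreciably, mirrors the selection procedure described in the text.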
4.3 The role of lagged embedding
The lagged embedding in (2) of the input data is essential to the separability of the temporal modes into distinct families of processes. To illustrate this, in Figure 6 we display the temporal mode that most closely resembles the PDO mode of Fig. 3(d), evaluated without embedding. It is evident from both the temporal and Fourier representations of that mode that the decadal process recovered in Sec. 3.2 using a two-year embedding window has been contaminated with high-frequency variability; in particular, prominent spectral lines appear at integer multiples of 1 y⁻¹, up to the maximum frequency allowed by the monthly sampling of the data. An even stronger frequency mixing was found to take place in the corresponding temporal SSA modes.
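For concreteness, the lagged-embedding step can be sketched as follows; the function and variable names are illustrative, and the lag ordering is one of several equivalent conventions.

```python
import numpy as np

def lagged_embedding(x, q):
    """Map a time series of shape (n_samples, n_space) into lagged
    embedding space by concatenating q consecutive snapshots.

    Each embedded point then represents a spatiotemporal pattern
    whose temporal extent is q sampling intervals.
    """
    n, _ = x.shape
    # row t holds the snapshots x[t], x[t+1], ..., x[t+q-1]
    return np.hstack([x[i : n - q + 1 + i] for i in range(q)])

# toy example: 100 monthly samples of a 3-gridpoint field,
# embedded with a 24-month (two-year) window
x = np.random.default_rng(0).standard_normal((100, 3))
X = lagged_embedding(x, 24)
print(X.shape)  # (77, 72)
```

With monthly samples and a 24-month window, each embedded point is a 24-month spatiotemporal segment, which is why propagating structures with timescales up to the window length become representable.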
Optimal values for the embedding window are generally hard to determine a priori. However, the following guidelines are useful in the parameter selection process if some physical information is available about the dataset:

External factors and non-Markovian effects. If the data are known to be influenced by time-dependent external factors (e.g., solar forcing), or exhibit non-Markovian effects due to partial observations [29, 30, 40, 22], then a good initial choice for the embedding window is a value comparable to the timescales associated with those factors. The two-year choice made here is compatible with the annually varying solar forcing, which is one of the most important external factors in upper-ocean dynamics.

Timescales associated with propagating structures. If the dataset features propagating coherent structures (e.g., oceanic or atmospheric waves), then suitable candidate values for the embedding window are given by the natural timescales (e.g., period, decorrelation time) of those structures. Because each point in lagged embedding space corresponds to a spatiotemporal process whose temporal extent equals the embedding window, such propagating structures can be described via twofold-degenerate sets of modes evolving in temporal quadrature. For instance, the two-year window employed here is long enough to capture prominent low-frequency waves in the ocean, such as baroclinic Rossby waves and intraseasonal Kelvin waves [41]. See [42] for an example involving organized convection in the tropics.
The relative entropy measure of Sec. 2.3 can also be used to inform the selection of the embedding window, especially in situations with little a priori information about the dynamics. In separate calculations, we have verified that the modes separate into periodic, low-frequency, and intermittent processes for a range of embedding windows, including the cases displayed in Fig. 4.
4.4 Seasonally detrended data
A standard preprocessing step in the geosciences and other disciplines is to detrend the data by subtracting the monthly climatology, or by applying bandpass filtering, to isolate the signal of interest (see, e.g., [43, 44] for an overview of popular methods). In the context of NLSA algorithms, we argue that such preprocessing can in fact lead to significant loss of information and degradation of performance. Conceptually, preprocessing a nonlinear data manifold through a linear operation such as seasonal-cycle subtraction may distort the neighborhood relationships of the data, with adverse consequences for the extracted spatiotemporal patterns; this is precisely what we observe in the North Pacific data studied here.
Figure 7 shows the relative entropy measure obtained by applying NLSA, with the same two-year lagged embedding window, to the temperature field data of Sec. 3 with the monthly climatology (the average temperature anomaly field for each month) subtracted. As manifested by the faster decay of the entropy measure compared to the corresponding results in Fig. 4, detrending decreases the number of modes that carry significant information. Indeed, the temporal patterns of the detrended data all have the same qualitative structure as the low-frequency modes of Sec. 3.2, characterized by decaying frequency spectra with no obvious peaks; see Fig. 8. That seasonal detrending eliminates the annual periodic modes and their harmonics from the spectrum is intuitively obvious, but it turns out that the intermittent modes are removed as well. In particular, we have not been able to find modes with the spatial features characteristic of the intermittent modes in Fig. 3(e,f) in a number of experiments with temporal-space dimension up to 34. We believe that the intermittent modes depend crucially on phase relationships with the seasonal cycle and its higher harmonics, and as a result the algorithm fails to detect them if seasonality has been removed from the data.
5 Conclusions
Combining techniques from machine learning and the qualitative theory of dynamical systems, we have presented a scheme, called NLSA, for spatiotemporal analysis of high-dimensional time series, which explicitly takes into account the nonlinear geometrical structure of datasets arising in geophysics and other applied sciences. Like classical SSA [12], the method utilizes time-lagged embedding and SVD to decompose time series generated by partial observations of high-dimensional, complex dynamical systems into distinct spatiotemporal modes. However, the linear operators used here in the SVD step differ crucially from those of SSA in that their domains of definition are low-dimensional Hilbert spaces of square-integrable functions on the nonlinear manifold formed by the data (in a suitable coarse-grained representation via a graph). This family of spaces is tailored to the nonlinear geometry of the data manifold through its Riemannian measure, and has high skill in capturing intermittency and rare events. As its dimension grows, it provides a description of spatiotemporal patterns at increasingly fine resolution on the data manifold. Moreover, well-behaved orthonormal basis functions for these spaces can be computed efficiently via graph-Laplacian algorithms developed in data mining and machine learning [17, 18].
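To make the structure of the scheme concrete, the following is a minimal dense-algebra sketch of the pipeline summarized above, applied to data already in lagged-embedding space. It is not the paper's exact operator construction: the kernel, normalization, and measure weighting are simplified stand-ins for the sparse graph-theoretic algorithms of the text, and the bandwidth rule (distance to the k-th neighbor, in the spirit of [33]) is one simple adaptive choice; all names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh, svd

def nlsa_sketch(X, l, k_sigma=8):
    """Structural sketch of NLSA on lagged-embedded data X of shape (n, d).

    Returns spatial patterns U, singular values s, and temporal
    patterns expanded in the leading l graph-Laplacian eigenfunctions.
    """
    n = len(X)
    D = squareform(pdist(X))                   # pairwise distances
    eps = np.sort(D, axis=1)[:, k_sigma]       # adaptive kernel width
    W = np.exp(-D**2 / np.outer(eps, eps))     # Gaussian affinity
    d = W.sum(axis=1)
    # symmetric normalized graph Laplacian; its eigenvectors, rescaled
    # by d**-1/2, approximate Laplace-Beltrami eigenfunctions
    L = np.eye(n) - W / np.sqrt(np.outer(d, d))
    _, psi = eigh(L)                           # ascending eigenvalues
    phi = psi[:, :l] / np.sqrt(d)[:, None]     # leading eigenfunctions
    mu = d / d.sum()                           # discrete Riemannian measure
    A = X.T @ (phi * mu[:, None])              # operator in eigenfunction basis
    U, s, Vt = svd(A, full_matrices=False)
    return U, s, phi @ Vt.T                    # spatial modes, spectrum, temporal modes

rng = np.random.default_rng(1)
U, s, T = nlsa_sketch(rng.standard_normal((40, 6)), l=5)
print(U.shape, s.shape, T.shape)  # (6, 5) (5,) (40, 5)
```

The key structural difference from SSA is visible in the last three lines of the function: the SVD acts on an operator whose domain is spanned by the leading Laplacian eigenfunctions, weighted by the measure, rather than on the raw lagged-data matrix.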
Applying this scheme to the upper-ocean temperature in the North Pacific sector of the CCSM3 model, we find a family of intermittent processes which are not captured by SSA. These processes describe eastward-propagating, small-scale temperature anomalies in the Kuroshio current region, as well as retrograde-propagating structures at high latitudes and in the subtropics (see Movie S1). Moreover, they carry little of the variance of the raw signal, and display burst-like behavior characteristic of strongly nonlinear dynamics. The remaining identified modes include the familiar PDO pattern of low-frequency variability, as well as annual and semiannual periodic processes. The ability to recover the intermittent modes relies crucially on retaining the seasonal cycle (monthly climatology) of the data, suggesting that these modes operate in nontrivial phase relationships with the periodic modes.
The analysis presented here is purely diagnostic. In particular, we have not touched upon the dynamical role of these modes in reproducing the observed dynamics, e.g., by triggering large-scale regime transitions [14, 26]. This question was addressed to some extent in [15] in the context of a low-order chaotic model for the atmosphere, but remains open in applications involving high-dimensional complex dynamical systems with unknown equations of motion. Here, statistical modeling techniques, such as Bayesian hierarchical modeling [45], combined with information-theoretic methods for assessing predictability and model error [46, 47], are promising approaches for training empirical models of these processes and assessing their predictive skill. We plan to study these topics in future work.
Acknowledgments
This work was supported by NSF grant DMS0456713, ONR DRI grants N2574200F6607 and N000141010554, and DARPA grants N000140710750 and N000140811080. We thank G. Branstator and H. Teng for providing access to the CCSM3 dataset used in this analysis.
Supporting information available
Movie S1. Spatiotemporal patterns of the upper 300 m temperature anomaly field (annual mean subtracted at each gridpoint) in the North Pacific sector of CCSM3, evaluated using NLSA (see Fig. 1) and SSA. (a) Raw data. (b) Leading low-frequency mode from SSA. (c–f) Composite fields determined through NLSA. (c) Annual modes. (d) Leading low-frequency mode, describing the PDO. (e) Semiannual modes. (f) Leading four intermittent modes, describing variability of the Kuroshio current and retrograde (westward) propagating structures. The starting time of the animation is the same as in Fig. 2.
References
 Kyo [2011] ICSU World Data System. Kyoto University, Kyoto, Japan, 2011.
 Dee et al. [2011] D. P. Dee et al. The ERAinterim reanalysis: Configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc., 137:553–597, 2011.
 Taylor et al. [2011] K. E. Taylor, R. J. Stouffer, and G. A. Meehl. An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 2011. doi: 10.1175/bamsd1100094.1. early online release.
 Teng and Branstator [2010] H. Teng and G. Branstator. Initialvalue predictability of prominent modes of North Pacific subsurface temperature in a CGCM. Climate Dyn., 36(9–10):1813–1834, 2010. doi: 10.1007/s0038201007497.
 Roundy [2011] P. E. Roundy. Tropical–extratropical interactions. In W. K. M. Lau and D. E. Waliser, editors, Intraseasonal Variability in the Atmosphere–Ocean Climate System. SpringerVerlag, Berlin, 2011.
 Lima et al. [2009] C. H. R. Lima, U. Lall, T. Jebara, and A. G. Barnston. Statistical prediction of ENSO from subsurface sea temperature using a nonlinear dimensionality reduction. J. Clim., 22:4501–4519, 2009. doi: 10.1175/2009jcli2524.1.
 Hsieh [2007] W. W. Hsieh. Nonlinear principal component analysis of noisy data. Neural Networks, 20:434–443, 2007.
 Christiansen [2005] B. Christiansen. The shortcomings of nonlinear component analysis in identifying circulation regimes. J. Climate, 2005.
 Vautard and Ghil [1989] R. Vautard and M. Ghil. Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. Phys. D, 35:395–424, 1989. doi: 10.1016/01672789(89)900778.
 Aubry et al. [1991] N. Aubry, R. Guyonnet, and R. Lima. Spatiotemporal analysis of complex signals: Theory and applications. J. Stat. Phys., 64(3–4):683–739, 1991. doi: 10.1007/bf01048312.
 Golyandina et al. [2001] N. Golyandina, V. Nekrutkin, and A. Zhigljavsky. Analysis of Time Series Structure: SSA and Related Techniques. CRC Press, Boca Raton, 2001.
 Ghil et al. [2002] M. Ghil et al. Advanced spectral methods for climatic time series. Rev. Geophys., 40(1):1003, 2002. doi: 10.1029/2000rg000092.
 Aubry et al. [1993] N. Aubry, W.Y. Lian, and E. S. Titi. Preserving symmetries in the proper orthogonal decomposition. SIAM J. Sci. Comput., 14:483–505, 1993.
 Crommelin and Majda [2004] D. T. Crommelin and A. J. Majda. Strategies for model reduction: Comparing different optimal bases. J. Atmos. Sci., 61:2206, 2004.
 Giannakis and Majda [2012] D. Giannakis and A. J. Majda. Nonlinear Laplacian spectral analysis for time series with intermittency and lowfrequency variability. Proc. Natl. Acad. Sci., 109(7):2222–2227, 2012. doi: 10.1073/pnas.1118984109.
 Giannakis and Majda [2011a] D. Giannakis and A. J. Majda. Revealing qualitativelydistinct spatiotemporal processes in climate data through machine learning. In Conference on Intelligent Data Understanding (CIDU) 2011. NASA, 2011a.
 Belkin and Niyogi [2003] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput., 15(6):1373–1396, 2003. doi: 10.1162/089976603321780317.
 Coifman and Lafon [2006] R. R. Coifman and S. Lafon. Diffusion maps. Appl. Comput. Harmon. Anal., 21(1):5–30, 2006. doi: 10.1016/j.acha.2006.04.006.
 Schölkopf et al. [1998] B. Schölkopf, A. Smola, and K. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10:1299–1319, 1998. doi: 10.1162/089976698300017467.
 Nadler et al. [2006] B. Nadler, S. Lafon, R. R. Coifman, and I. G. Kevrekidis. Diffusion maps, spectral clustering, and reaction coordinates of dynamical systems. Appl. Comput. Harmon. Anal., 21:113–127, 2006.
 Singer et al. [2009] A. Singer, R. Erban, I. G. Kevrekidis, and R. R. Coifman. Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps. Proc. Natl. Acad. Sci., 106(38):16090–16095, 2009. doi: 10.1073/pnas.0905547106.
 Sauer et al. [1991] T. Sauer, J. A. Yorke, and M. Casdagli. Embedology. J. Stat. Phys., 65(3–4):579–616, 1991. doi: 10.1007/bf01053745.
 Collins et al. [2006] W. D. Collins et al. The community climate system model version 3 (CCSM3). J. Climate, 19:2122–2143, 2006. doi: 10.1175/jcli3761.1.
 Alexander et al. [2006] M. Alexander et al. Extratropical atmosphere–ocean variability in CCSM3. J. Climate, 19:2496–2525, 2006. doi: 10.1175/jcli3743.1.
 Branstator and Teng [2010] G. Branstator and H. Teng. Two limits of initialvalue decadal predictability in a CGCM. J. Climate, 23(23):6292–6311, 2010. doi: 10.1175/2010jcli3678.1.
 Overland et al. [2008] J. Overland, S. Rodionov, S. Minobe, and N. Bond. North Pacific regime shifts: Definitions, issues and recent transitions. Prog. Oceanog., 77:92–102, 2008. doi: 10.1016/j.pocean.2008.03.016.
 Bérard [1989] P. H. Bérard. Spectral Geometry: Direct and Inverse Problems, volume 1207 of Lecture Notes in Mathematics. SpringerVerlag, Berlin, 1989.
 Chung [1997] F. R. K. Chung. Spectral Graph Theory, volume 97 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, Providence, 1997.
 Packard et al. [1980] N. H. Packard et al. Geometry from a time series. Phys. Rev. Lett., 45:712–716, 1980. doi: 10.1103/PhysRevLett.45.712.
 Takens [1981] F. Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980, volume 898 of Lecture Notes in Mathematics, pages 366–381. Springer, Berlin, 1981. doi: 10.1007/bfb0091924.
 Deyle and Sugihara [2011] E. R. Deyle and G. Sugihara. Generalized theorems for nonlinear state space reconstruction. PLoS ONE, 6(3):e18295, 2011. doi: 10.1371/journal.pone.0018295.
 Coifman et al. [2005] R. R. Coifman et al. Geometric diffusions as a tool for harmonic analysis and structure definition on data. Proc. Natl. Acad. Sci., 102(21):7426–7431, 2005. doi: 10.1073/pnas.0500334102.
 ZelnikManor and Perona [2004] L. ZelnikManor and P. Perona. Selftuning spectral clustering. In Advances in neural information processing systems, volume 17, page 1601, 2004.
 Rohrdanz et al. [2011] M. A. Rohrdanz, W. Zheng, M. Maggioni, and C. Clementi. Determination of reaction coordinates via locally scaled diffusion map. J. Chem. Phys., 134:124116, 2011. doi: 10.1063/1.3569857.
 Lee and Verleysen [2007] J. A. Lee and M. Verleysen. Nonlinear Dimensionality Reduction. Springer, New York, 2007.
 Arya et al. [1998] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching. J. ACM, 45:891, 1998.
 Cover and Thomas [2006] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, Hoboken, 2nd edition, 2006.
 Mendelssohn et al. [2004] R. Mendelssohn, F. B. Schwing, and S. J. Bograd. Nonstationary seasonality of upper ocean temperature in the California Current. J. Geophys. Res., 109:C10015, 2004. doi: 10.1029/2004jc002330.
 Joe [1989] H. Joe. Relative entropy measures of multivariate dependence. J. Amer. Stat. Assoc., 84(405):157–164, 1989.
 Broomhead and King [1986] D. S. Broomhead and G. P. King. Extracting qualitative dynamics from experimental data. Phys. D, 20(2–3):217–236, 1986. doi: 10.1016/01672789(86)90031x.
 Chelton and Schlax [1996] D. B. Chelton and M. G. Schlax. Global observations of oceanic Rossby waves. Science, 272:234, 1996.
 Giannakis et al. [2012] D. Giannakis, W.-w. Tung, and A. J. Majda. Hierarchical structure of the Madden–Julian oscillation in infrared brightness temperature revealed through nonlinear Laplacian spectral analysis. In Conference on Intelligent Data Understanding (CIDU) 2012. NASA, 2012. submitted.
 von Storch and Zwiers [2002] H. von Storch and F. W. Zwiers. Statistical Analysis in Climate Research. Cambridge University Press, Cambridge, 2002.
 Waliser et al. [2009] D. Waliser et al. MJO simulation diagnostics. J. Clim., 22:3006–3030, 2009.
 Wikle and Berliner [2007] C. K. Wikle and L. M. Berliner. A Bayesian tutorial for data assimilation. Physica D, 230(1):1–16, 2007. doi: 10.1016/j.physd.2006.09.017.
 Giannakis and Majda [2011b] D. Giannakis and A. J. Majda. Quantifying the predictive skill in longrange forecasting. Part I: Coarsegrained predictions in a simple ocean model. J. Clim., 25(6):1793–1813, 2011b.
 Giannakis and Majda [2011c] D. Giannakis and A. J. Majda. Quantifying the predictive skill in longrange forecasting. Part II: Model error in coarsegrained Markov models with application to oceancirculation regimes. J. Clim., 25(6):1814–1826, 2011c.